LR²Depth: Large-Region Aggregation at Low Resolution for Efficient Monocular Depth Estimation

Monocular depth estimation (MDE) is crucial for many computer vision applications, but existing methods often struggle to balance inference speed and accuracy when aggregating visual information over large image regions. This paper introduces LR²Depth, a novel MDE method that addresses this challenge by applying large-kernel convolution to low-resolution feature maps for efficient large-region feature aggregation. Our approach exploits the fact that each pixel of a low-resolution feature map corresponds to a larger region of the original image, so a single large kernel covers a wide image area at low inference cost while keeping depth predictions accurate. Extensive experiments on the NYU-Depth-V2, KITTI, and SUN RGB-D datasets demonstrate that LR²Depth not only achieves state-of-the-art performance but also runs approximately twice as fast as previous MDE methods. Notably, at the time of submission (2024), LR²Depth ranked first on the KITTI depth prediction online benchmark.
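To make the resolution argument concrete, the following is a minimal back-of-the-envelope sketch, not the LR²Depth implementation. It assumes a depthwise convolution, ignores the receptive field built up by earlier layers, and uses a hypothetical input size, channel count, and target region; the helper names `image_extent`, `kernel_for_extent`, and `depthwise_macs` are introduced here purely for illustration.

```python
def image_extent(kernel_size: int, stride: int) -> int:
    """Approximate side length (in input-image pixels) spanned by a
    kernel_size x kernel_size window on a feature map downsampled by `stride`.
    Simplification: each feature-map pixel is taken to cover stride x stride pixels."""
    return kernel_size * stride


def kernel_for_extent(extent: int, stride: int) -> int:
    """Smallest kernel size whose window spans at least `extent` image pixels."""
    return -(-extent // stride)  # ceiling division


def depthwise_macs(height: int, width: int, channels: int,
                   kernel_size: int, stride: int) -> int:
    """Multiply-accumulates of one depthwise conv (assumed, not from the paper)
    applied to a feature map of size (height/stride) x (width/stride) x channels."""
    fh, fw = height // stride, width // stride
    return fh * fw * channels * kernel_size * kernel_size


# Hypothetical setting: 352 x 1216 input, 64 channels, 224-pixel target region.
H, W, C, TARGET = 352, 1216, 64, 224

for stride in (4, 8, 16, 32):
    k = kernel_for_extent(TARGET, stride)
    macs = depthwise_macs(H, W, C, k, stride)
    print(f"stride {stride:2d}: kernel {k:2d}x{k:<2d} "
          f"covers ~{image_extent(k, stride)} px, {macs:,} MACs")
```

Under these assumptions, covering the same image region at a coarser resolution needs both a smaller kernel and far fewer spatial positions, which is the intuition behind aggregating large regions at low resolution.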

(IROS 2025)

Demo

MDE using LR²Depth on a low-power mobile device

Abstract

Method

Main Results

Quantitative comparison

Qualitative comparison

3D Reconstruction