Align3R: Aligned Monocular Depth Estimation for Dynamic Videos

Lu, Jiahao; Huang, Tianyu; Li, Peng; Dou, Zhiyang; Lin, Cheng; Cui, Zhiming; Dong, Zhen; Yeung, Sai-Kit; Wang, Wenping; Liu, Yuan

arXiv:2412.03079 (cs)

[Submitted on 4 Dec 2024 (v1), last revised 5 Dec 2024 (this version, v2)]

Abstract: Recent monocular depth estimation methods produce high-quality depth for single-view images but fail to estimate consistent video depth across frames. Recent works address this problem by applying a video diffusion model to generate video depth conditioned on the input video, which is expensive to train and can only produce scale-invariant depth values without camera poses. In this paper, we propose a novel video-depth estimation method called Align3R to estimate temporally consistent depth maps for a dynamic video. Our key idea is to utilize the recent DUSt3R model to align estimated monocular depth maps of different timesteps. First, we fine-tune the DUSt3R model with additional estimated monocular depth as inputs for the dynamic scenes. Then, we apply optimization to reconstruct both depth maps and camera poses. Extensive experiments demonstrate that Align3R estimates consistent video depth and camera poses for a monocular video, outperforming baseline methods.
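To illustrate why per-frame monocular depth needs alignment at all, the sketch below fits a per-frame scale and shift that maps a scale-invariant depth prediction onto a set of reference depth values by ordinary least squares. This is not the Align3R method, which the abstract describes as a fine-tuned DUSt3R model followed by an optimization over depth maps and camera poses. It is only a minimal classical baseline, and the function name `fit_scale_shift`, the flat-list depth representation, and the availability of reference values are all assumptions made for illustration.

```python
from statistics import fmean


def fit_scale_shift(depth, reference):
    """Return (scale, shift) minimizing sum((scale * d + shift - r) ** 2).

    `depth` and `reference` are equal-length sequences of floats holding
    depth values for the same pixels (hypothetical inputs for this sketch).
    """
    if len(depth) != len(reference) or len(depth) < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")

    mean_d = fmean(depth)
    mean_r = fmean(reference)

    # Closed-form simple linear regression: scale = cov(d, r) / var(d).
    var_d = sum((d - mean_d) ** 2 for d in depth)
    if var_d == 0.0:
        raise ValueError("depth values are constant; scale is undetermined")
    cov_dr = sum((d - mean_d) * (r - mean_r) for d, r in zip(depth, reference))

    scale = cov_dr / var_d
    shift = mean_r - scale * mean_d
    return scale, shift


# Usage: a frame whose predicted depth differs from the reference
# by an unknown scale and shift.
predicted = [1.0, 2.0, 3.0, 4.0]
reference = [2.0 * d + 0.5 for d in predicted]
scale, shift = fit_scale_shift(predicted, reference)
aligned = [scale * d + shift for d in predicted]
```

A per-frame affine fit like this assumes the reference values are trustworthy and that a single scale and shift explain the whole frame, which is one reason a learned, pairwise alignment is attractive for dynamic scenes.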

Comments:	Project Page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2412.03079 [cs.CV]
	(or arXiv:2412.03079v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2412.03079

Submission history

From: Tianyu Huang
[v1] Wed, 4 Dec 2024 07:09:59 UTC (17,259 KB)
[v2] Thu, 5 Dec 2024 14:16:07 UTC (17,259 KB)
