DUSt3R: Geometric 3D Vision Made Easy

Shuzhe Wang*, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, Jerome Revaud

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Scientific › peer-review

60 Citations (Scopus)

Abstract

Multi-view stereo reconstruction (MVS) in the wild requires to first estimate the camera intrinsic and extrinsic parameters. These are usually tedious and cumbersome to obtain, yet they are mandatory to triangulate corresponding pixels in 3D space, which is at the core of all best performing MVS algorithms. In this work, we take an opposite stance and introduce DUSt3R, a radically novel paradigm for Dense and Unconstrained Stereo 3D Reconstruction of arbitrary image collections, operating without prior information about camera calibration nor viewpoint poses. We cast the pairwise reconstruction problem as a regression of pointmaps, relaxing the hard constraints of usual projective camera models. We show that this formulation smoothly unifies the monocular and binocular reconstruction cases. In the case where more than two images are provided, we further propose a simple yet effective global alignment strategy that expresses all pairwise pointmaps in a common reference frame. We base our network architecture on standard Transformer encoders and decoders, allowing us to leverage powerful pretrained models. Our formulation directly provides a 3D model of the scene as well as depth information, but interestingly, we can seamlessly recover from it, pixel matches, focal lengths, relative and absolute cameras. Extensive experiments on all these tasks showcase how DUSt3R effectively unifies various 3D vision tasks, setting new performance records on monocular & multi-view depth estimation as well as relative pose estimation. In summary, DUSt3R makes many geometric 3D vision tasks easy. Code and models at https://github.com/naver/dust3r.
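As a rough illustration of why a pointmap also carries calibration information, the sketch below back-projects a depth map into a pointmap and then recovers the focal length from that pointmap alone. It is not taken from the paper or from the dust3r codebase: the function names `depth_to_pointmap` and `estimate_focal` are hypothetical, and it assumes an ideal pinhole camera with square pixels, a principal point at the image centre, and a plain closed-form least-squares fit.

```python
import numpy as np


def centered_pixel_grid(height, width):
    """Pixel coordinates (u, v) measured from the image centre (assumed principal point)."""
    v, u = np.mgrid[0:height, 0:width].astype(np.float64)
    u -= (width - 1) / 2.0
    v -= (height - 1) / 2.0
    return u, v


def depth_to_pointmap(depth, focal):
    """Back-project a depth map (H, W) into a pointmap (H, W, 3) in the camera frame."""
    u, v = centered_pixel_grid(*depth.shape)
    x = u * depth / focal
    y = v * depth / focal
    return np.stack([x, y, depth], axis=-1)


def estimate_focal(pointmap):
    """Least-squares focal length under the pinhole model u = f*x/z, v = f*y/z.

    Minimising sum((u - f*a)^2 + (v - f*b)^2) with a = x/z and b = y/z
    gives the closed form f = sum(u*a + v*b) / sum(a^2 + b^2).
    """
    height, width, _ = pointmap.shape
    u, v = centered_pixel_grid(height, width)
    a = pointmap[..., 0] / pointmap[..., 2]
    b = pointmap[..., 1] / pointmap[..., 2]
    return float((u * a + v * b).sum() / (a * a + b * b).sum())


# Usage: a synthetic pointmap built with a known focal length.
rng = np.random.default_rng(0)
depth = rng.uniform(1.0, 5.0, size=(48, 64))
pointmap = depth_to_pointmap(depth, focal=500.0)
focal_hat = estimate_focal(pointmap)
```

On noise-free synthetic input the fit returns the focal length used to build the pointmap; a regressed pointmap would be noisy, so a robust estimator would likely be preferable in practice.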

Original language: English
Title of host publication: Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
Publisher: IEEE
Pages: 20697-20709
Number of pages: 13
ISBN (Electronic): 979-8-3503-5300-6
DOIs
Publication status: Published - 2024
MoE publication type: A4 Conference publication
Event: IEEE Conference on Computer Vision and Pattern Recognition - Seattle, United States
Duration: 16 Jun 2024 → 22 Jun 2024

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN (Print): 1063-6919

Conference

Conference: IEEE Conference on Computer Vision and Pattern Recognition
Abbreviated title: CVPR
Country/Territory: United States
City: Seattle
Period: 16/06/2024 → 22/06/2024

Keywords

  • 3D reconstruction
  • camera calibration
  • foundation model
  • monocular depth
  • multi-view depth
  • multi-view pose estimation
  • visual localization
