Abstract
We propose a novel method for depth estimation from multi-view image-pose pairs, where the model can leverage information from previous latent-space encodings of the scene. The model takes pairs of images and poses, which are passed through an encoder-decoder network for disparity estimation. The novelty lies in soft-constraining the bottleneck layer with a nonparametric Gaussian process (GP) prior. We propose a pose-kernel structure that encourages similar poses to have resembling latent spaces. The flexibility of the GP prior provides an adaptive memory for fusing information from nearby views. We train the encoder-decoder and the GP hyperparameters jointly end-to-end. In addition to a batch method, we derive a lightweight estimation scheme that circumvents standard pitfalls in scaling Gaussian process inference, and demonstrate how it can run in real time on smart devices.
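To make the pose-kernel idea concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of how a kernel over camera poses and a GP prior penalty on the latent bottleneck codes could be written in NumPy. The function names (`pose_kernel`, `gp_prior_nll`), the pose parameterisation (position plus unit quaternion), the squared-exponential kernel form, and the hyperparameters `ell_t`, `ell_r`, `sigma2` are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

def pose_kernel(poses_a, poses_b, ell_t=1.0, ell_r=1.0, sigma2=1.0):
    """Hypothetical pose kernel: covariance decays with both the
    translational and rotational distance between camera poses.

    poses_a, poses_b: arrays of shape (N, 7) / (M, 7) holding
    [tx, ty, tz, qw, qx, qy, qz] (position + unit quaternion).
    """
    t_a, q_a = poses_a[:, :3], poses_a[:, 3:]
    t_b, q_b = poses_b[:, :3], poses_b[:, 3:]
    # Squared Euclidean distance between camera centres
    d_t = np.sum((t_a[:, None, :] - t_b[None, :, :]) ** 2, axis=-1)
    # Angular distance between orientations via the quaternion dot product
    dot = np.clip(np.abs(q_a @ q_b.T), 0.0, 1.0)
    d_r = (2.0 * np.arccos(dot)) ** 2
    return sigma2 * np.exp(-0.5 * (d_t / ell_t**2 + d_r / ell_r**2))

def gp_prior_nll(latents, poses, jitter=1e-6):
    """Negative log-density of the flattened latent codes under a
    zero-mean GP prior across views; acts as a soft penalty on the
    bottleneck layer."""
    n = latents.shape[0]
    z = latents.reshape(n, -1)                      # (n_views, latent_dim)
    K = pose_kernel(poses, poses) + jitter * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, z))  # K^{-1} z
    # Quadratic term plus log-determinant, summed over latent dimensions
    return 0.5 * np.sum(z * alpha) + z.shape[1] * np.sum(np.log(np.diag(L)))

# Example: four views with random latent codes and nearby poses
rng = np.random.default_rng(0)
poses = np.hstack([rng.normal(size=(4, 3)), np.tile([1.0, 0, 0, 0], (4, 1))])
latents = rng.normal(size=(4, 8, 8, 16))
print(gp_prior_nll(latents, poses))
```

In a setup like the one described in the abstract, such a penalty would be added to the disparity reconstruction loss, and the kernel hyperparameters would be trained jointly with the encoder-decoder.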
Original language | English |
---|---|
Title of host publication | 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019) |
Publisher | IEEE |
Pages | 2651–2660 |
Number of pages | 10 |
ISBN (Electronic) | 9781728148038 |
DOIs | |
Publication status | Published - 2019 |
MoE publication type | A4 Article in a conference publication |
Event | IEEE International Conference on Computer Vision, Seoul, Korea, Republic of; duration: 27 Oct 2019 → 2 Nov 2019; http://iccv2019.thecvf.com/ |
Publication series
Name | Proceedings of the IEEE International Conference on Computer Vision |
---|---|
Volume | 2019-October |
ISSN (Electronic) | 1550-5499 |
Conference
Conference | IEEE International Conference on Computer Vision |
---|---|
Abbreviated title | ICCV |
Country/Territory | Korea, Republic of |
City | Seoul |
Period | 27/10/2019 → 02/11/2019 |
Internet address | http://iccv2019.thecvf.com/ |