Abstract
Visual-language models (VLMs) have recently been introduced in robotic mapping, where their latent representations, i.e., embeddings, are used to represent semantics in the map. They allow moving from a limited set of human-created labels toward open-vocabulary scene understanding, which is very useful for robots operating in complex real-world environments and interacting with humans. While there is anecdotal evidence that maps built this way support downstream tasks, such as navigation, a rigorous analysis of the quality of maps built with these embeddings is missing.
In this paper, we propose a way to analyze the quality of maps created using VLMs. We investigate two critical properties of map quality: queryability and distinctness. The evaluation of queryability addresses the ability to retrieve information from the embeddings. We investigate intra-map distinctness to study the ability of the embeddings to represent abstract semantic classes and inter-map distinctness to evaluate the generalization properties of the representation.
We propose metrics for these properties and use them to evaluate two state-of-the-art mapping methods, VLMaps and OpenScene, with two encoders, LSeg and OpenSeg, on real-world data from the Matterport3D data set. Our findings show that while 3D features improve queryability, they are not scale invariant, whereas image-based embeddings generalize to multiple map resolutions. This allows the image-based methods to maintain smaller map sizes, which can be crucial for real-world deployments. Furthermore, we show that the choice of encoder affects the results. The results imply that properly thresholding open-vocabulary queries is an open problem.
| Original language | English |
|---|---|
| Title of host publication | 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2025 |
| Publisher | IEEE |
| Pages | 4059-4066 |
| Number of pages | 8 |
| ISBN (Electronic) | 979-8-3315-4393-8 |
| DOIs | |
| Publication status | Published - Oct 2025 |
| MoE publication type | A4 Conference publication |
| Event | IEEE/RSJ International Conference on Intelligent Robots and Systems - Hangzhou, China; Duration: 19 Oct 2025 → 25 Oct 2025 |
Publication series
| Name | Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems |
|---|---|
| Publisher | IEEE |
| ISSN (Electronic) | 2153-0866 |
Conference
| Conference | IEEE/RSJ International Conference on Intelligent Robots and Systems |
|---|---|
| Abbreviated title | IROS |
| Country/Territory | China |
| City | Hangzhou |
| Period | 19/10/2025 → 25/10/2025 |
Projects associated with 'Do Visual-Language Grid Maps Capture Latent Semantics?'
- Hypermaps: closing the complexity gap in robotic mapping
  Verdoja, F. (Principal investigator), Nguyen, P. (Project Member) & Pekkanen, M. (Project Member)
  01/09/2023 → 31/08/2027
  Project: RCF Academy Research Fellow (new)
- SANTTU: Kumppanuusmalli - SANTTU - Aalto
  Kyrki, V. (Principal investigator), Chaubey, S. (Project Member), Blanco Mulero, D. (Project Member), Nguyen Le, T. (Project Member), Verdoja, F. (Project Member), Hannus, E. (Project Member), Arndt, K. (Project Member), Struckmeier, O. (Project Member), Nóbrega Barros, S. (Project Member) & Pekkanen, M. (Project Member)
  01/04/2022 → 31/03/2024
  Project: BF Co-Innovation