Evaluating the quality of robotic visual-language maps

Matti Pekkanen*, Tsvetomila Mihaylova, Francesco Verdoja, Ville Kyrki

*Corresponding author for this work

Research output: Article in book/conference proceedings › Conference article in proceedings › Professional

Abstract

Visual-language models (VLMs) have recently been introduced in robotic mapping by using the latent representations, i.e., embeddings, of the VLMs to represent the natural language semantics in the map. The main benefit is moving beyond a small set of human-created labels toward open-vocabulary scene understanding. While there is anecdotal evidence that maps built this way support downstream tasks, such as navigation, rigorous analysis of the quality of the maps using these embeddings is lacking. In this paper, we propose a way to analyze the quality of maps created using VLMs by evaluating two critical properties: queryability and consistency. We demonstrate the proposed method by evaluating the maps created by two state-of-the-art methods, VLMaps and OpenScene, using two encoders, LSeg and OpenSeg, using real-world data from the Matterport3D data set. We find that OpenScene outperforms VLMaps with both encoders, and LSeg outperforms OpenSeg with both methods.
Original language: English
Title of host publication: Workshop on Vision-Language Models for Navigation and Manipulation
Publisher: IEEE
Number of pages: 5
Publication status: Published - 17 May 2024
Ministry of Education and Culture (OKM) publication type: D3 Article in professional conference proceedings
Event: Workshop on Vision-Language Models for Navigation and Manipulation - Pacifico Yokohama, Yokohama, Japan
Duration: 17 May 2024 to 17 May 2024
https://vlmnm-workshop.github.io/

Workshop

Workshop: Workshop on Vision-Language Models for Navigation and Manipulation
Abbreviated title: VLMNM
Country/Territory: Japan
City: Yokohama
Period: 17/05/2024 to 17/05/2024
