Abstract
Effective modeling of human behavior is crucial for the safe and reliable coexistence of humans and autonomous vehicles. Traditional deep learning methods have limitations in capturing the complexities of pedestrian behavior, often relying on simplistic representations or indirect inference from visual cues, which hinders their explainability. To address this gap, we introduce PedVLM, a vision-language model that leverages multiple modalities (RGB images, optical flow, and text) to predict pedestrian intentions and to provide explanations for pedestrian behavior. PedVLM comprises a CLIP-based vision encoder and a text-to-text transfer transformer (T5) language model, which together extract and combine visual and text embeddings to predict pedestrian actions and enhance explainability. Furthermore, to complement PedVLM and facilitate further research, we publicly release the corresponding dataset, PedPrompt, which includes prompts in a question-answer (QA) template for pedestrian intention prediction. Evaluation on the PedPrompt, JAAD, and PIE datasets demonstrates PedVLM's efficacy compared with state-of-the-art methods. The dataset and code will be made available at https://github.com/munirfarzeen/PedVLM
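
The abstract mentions QA-template prompts and the combination of visual and text embeddings without giving details. The following is a minimal illustrative sketch only, not the authors' implementation: the `QAPrompt` schema, the question wording, and the choice to fuse by concatenating token embeddings along the sequence axis are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QAPrompt:
    """One question-answer pair about a pedestrian (hypothetical schema)."""
    question: str
    answer: str


def build_prompt(pedestrian_id: str, crossing: bool, reason: str) -> QAPrompt:
    # The wording below is illustrative; it is not taken from PedPrompt.
    question = f"Will pedestrian {pedestrian_id} cross the road?"
    verdict = "Yes" if crossing else "No"
    return QAPrompt(question, f"{verdict}, because {reason}.")


def fuse(visual_tokens: List[List[float]],
         text_tokens: List[List[float]]) -> List[List[float]]:
    """Concatenate visual and text token embeddings along the sequence axis.

    Concatenation is an assumed fusion strategy; the abstract only says the
    embeddings are combined.
    """
    widths = {len(token) for token in visual_tokens + text_tokens}
    if len(widths) > 1:
        raise ValueError("all embeddings must share one dimension")
    return visual_tokens + text_tokens
```

In this sketch the fused sequence would then be passed to a language model that generates the answer text, so the prediction and its explanation come from the same output.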
Original language | English |
---|---|
Pages | 393-406 |
Number of pages | 14 |
Journal | IEEE Open Journal of Intelligent Transportation Systems |
Volume | 6 |
DOIs | |
Publication status | Published - 2025 |
MoE publication type | A1 Original article in a scientific journal |
Fingerprint
Dive into the research topics of 'Pedestrian Vision Language Model for Intentions Prediction'. Together they form a unique fingerprint.
Projects
- 1 Finished
AIforLEssAuto: Artificial Intelligence for Urban Low-Emission Autonomous Traffic
Kyrki, V. (Principal investigator)
EU The Recovery and Resilience Facility (RRF)
01/01/2022 → 31/12/2024
Project: RCF Academy Project targeted call