Vision Transformer for Learning Driving Policies in Complex and Dynamic Environments

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, peer-review

Abstract

Driving in a complex and dynamic urban environment is a difficult task that requires a complex decision policy. To make informed decisions, the driving agent needs to understand the long-range context of the scene and the relative importance of other vehicles. In this work, we propose using a Vision Transformer (ViT) to learn a driving policy in urban settings from bird's-eye-view (BEV) input images. The ViT network learns the global context of the scene more effectively than previously proposed Convolutional Neural Networks (ConvNets). Furthermore, ViT's attention mechanism learns an attention map for the scene, which allows the ego car to determine which surrounding cars are important to its next decision. We demonstrate that a DQN agent with a ViT backbone outperforms baseline algorithms with ConvNet backbones pre-trained in various ways. In particular, the proposed method helps reinforcement learning algorithms learn faster and reach higher performance with less data than the baselines.
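To make the idea of a ViT backbone feeding a DQN head more concrete, the following is a minimal NumPy sketch of a forward pass only. It is not the authors' architecture: the class name `TinyViTQNet`, the image size, patch size, embedding width, single attention head, number of actions, and random weights are all assumptions chosen for illustration. It shows how a BEV image can be split into patches, how one self-attention layer mixes information across the whole scene, and how the class token yields both Q-values and a per-patch attention map.

```python
import numpy as np


def patchify(img, patch):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = img.shape
    assert h % patch == 0 and w % patch == 0
    gh, gw = h // patch, w // patch
    x = img.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(gh * gw, patch * patch * c)


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class TinyViTQNet:
    """Hypothetical single-layer, single-head ViT with a linear Q-value head."""

    def __init__(self, img_size=64, patch=16, channels=3, dim=32,
                 n_actions=5, seed=0):
        rng = np.random.default_rng(seed)
        n_patches = (img_size // patch) ** 2
        patch_dim = patch * patch * channels
        scale = 0.02  # assumed initialisation scale; weights are untrained
        self.patch = patch
        self.dim = dim
        self.w_embed = rng.normal(0, scale, (patch_dim, dim))
        self.cls = rng.normal(0, scale, (1, dim))
        self.pos = rng.normal(0, scale, (n_patches + 1, dim))
        self.w_q = rng.normal(0, scale, (dim, dim))
        self.w_k = rng.normal(0, scale, (dim, dim))
        self.w_v = rng.normal(0, scale, (dim, dim))
        self.w_head = rng.normal(0, scale, (dim, n_actions))

    def forward(self, img):
        # Embed each patch, prepend the class token, add position embeddings.
        tokens = patchify(img, self.patch) @ self.w_embed
        tokens = np.concatenate([self.cls, tokens], axis=0) + self.pos
        # Scaled dot-product self-attention over all tokens (global context).
        q, k, v = tokens @ self.w_q, tokens @ self.w_k, tokens @ self.w_v
        attn = softmax(q @ k.T / np.sqrt(self.dim))
        tokens = tokens + attn @ v  # residual connection
        # The class token summarises the scene; map it to one Q-value per action.
        q_values = tokens[0] @ self.w_head
        # Row 0 of the attention matrix: class-token attention to each patch.
        return q_values, attn[0, 1:]


net = TinyViTQNet()
bev = np.random.default_rng(1).random((64, 64, 3))  # stand-in BEV image
q_values, cls_attn = net.forward(bev)
action = int(np.argmax(q_values))  # greedy action, as a DQN agent would pick
```

In this sketch, `cls_attn` holds one weight per image patch, so reshaping it to the 4 x 4 patch grid gives a coarse map of which scene regions influenced the decision. A trained agent would learn the weights through Q-learning updates, which are omitted here.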

Original language: English
Title of host publication: 2022 IEEE Intelligent Vehicles Symposium, IV 2022
Publisher: IEEE
Pages: 1558-1564
Number of pages: 7
ISBN (Electronic): 9781665488211
Publication status: Published - 19 Jul 2022
MoE publication type: A4 Article in a conference publication
Event: IEEE Intelligent Vehicles Symposium - Aachen, Germany
Duration: 5 Jun 2022 to 9 Jun 2022

Publication series

Name: IEEE Intelligent Vehicles Symposium, Proceedings
Volume: 2022-June

Conference

Conference: IEEE Intelligent Vehicles Symposium
Abbreviated title: IV
Country/Territory: Germany
City: Aachen
Period: 05/06/2022 to 09/06/2022
