TY - GEN
T1 - Vision Transformer for Learning Driving Policies in Complex and Dynamic Environments
AU - Kargar, Eshagh
AU - Kyrki, Ville
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022/7/19
Y1 - 2022/7/19
AB - Driving in a complex and dynamic urban environment is a challenging task that requires a complex decision policy. To make informed decisions, one needs to understand the long-range context of the scene and the relative importance of other vehicles. In this work, we propose to use a Vision Transformer (ViT) to learn a driving policy in urban settings with bird's-eye-view (BEV) input images. The ViT network learns the global context of the scene more effectively than previously proposed Convolutional Neural Networks (ConvNets). Furthermore, ViT's attention mechanism yields an attention map of the scene, which allows the ego car to determine which surrounding vehicles are important to its next decision. We demonstrate that a DQN agent with a ViT backbone outperforms baseline algorithms with ConvNet backbones pre-trained in various ways. In particular, the proposed method helps the reinforcement learning agent learn faster, achieve higher performance, and use less data than the baselines.
UR - http://www.scopus.com/inward/record.url?scp=85135372101&partnerID=8YFLogxK
DO - 10.1109/IV51971.2022.9827348
M3 - Conference article in proceedings
AN - SCOPUS:85135372101
T3 - IEEE Intelligent Vehicles Symposium, Proceedings
SP - 1558
EP - 1564
BT - 2022 IEEE Intelligent Vehicles Symposium, IV 2022
PB - IEEE
T2 - IEEE Intelligent Vehicles Symposium
Y2 - 5 June 2022 through 9 June 2022
ER -