
Abstract

Reinforcement learning (RL) is able to solve complex sequential decision-making tasks but is currently limited by sample efficiency and required computation. To improve sample efficiency, recent work focuses on model-based RL, which interleaves model learning with planning. Recent methods further utilize policy learning, value estimation, and self-supervised learning as auxiliary objectives. In this paper we show that, surprisingly, a simple representation learning approach relying only on a latent dynamics model trained by latent temporal consistency is sufficient for high-performance RL. This applies both when using pure planning with a dynamics model conditioned on the representation and when utilizing the representation as policy and value function features in model-free RL. In experiments, our approach learns an accurate dynamics model to solve challenging high-dimensional locomotion tasks with online planners while being 4.1× faster to train compared to ensemble-based methods. With model-free RL without planning, especially on high-dimensional tasks such as the DeepMind Control Suite Humanoid and Dog tasks, our approach outperforms model-free methods by a large margin and matches model-based methods' sample efficiency while training 2.4× faster.
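The core objective the abstract describes, a latent dynamics model trained for latent temporal consistency, can be sketched as a loss computation. The following is a minimal, hypothetical NumPy illustration: the linear encoder, latent dynamics model, and slow-moving target encoder are stand-ins chosen for brevity, not the paper's actual architecture or training setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in dimensions and linear maps (assumptions, not from the paper):
# encoder f, latent dynamics model d, and a target encoder f_target that
# would normally be an exponential moving average of f.
obs_dim, act_dim, latent_dim = 8, 2, 4
W_enc = rng.normal(size=(latent_dim, obs_dim))
W_dyn = rng.normal(size=(latent_dim, latent_dim + act_dim))
W_tgt = W_enc.copy()  # target network: slow-moving copy of the encoder

def encode(obs, W):
    """Map an observation to a latent vector."""
    return W @ obs

def dynamics(z, a):
    """Predict the next latent from the current latent and action."""
    return W_dyn @ np.concatenate([z, a])

def temporal_consistency_loss(obs_t, act_t, obs_tp1):
    """Predict the next latent from (z_t, a_t) and match the target
    encoder's embedding of the observed next observation. In a real
    training loop the target embedding is treated as a constant
    (stop-gradient); here everything is plain NumPy."""
    z_t = encode(obs_t, W_enc)
    z_pred = dynamics(z_t, act_t)
    z_target = encode(obs_tp1, W_tgt)  # no gradient would flow here
    return float(np.mean((z_pred - z_target) ** 2))

# Example transition (o_t, a_t, o_{t+1}) drawn at random.
o_t = rng.normal(size=obs_dim)
a_t = rng.normal(size=act_dim)
o_tp1 = rng.normal(size=obs_dim)
loss = temporal_consistency_loss(o_t, a_t, o_tp1)
print(loss)
```

Minimizing this squared error over sampled transitions trains the encoder and dynamics model jointly; the learned latents can then feed an online planner or serve as policy/value features, as the abstract outlines.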
Original language: English
Title of host publication: Proceedings of the 40th International Conference on Machine Learning
Editors: Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, Jonathan Scarlett
Publisher: JMLR
Pages: 42227-42246
Number of pages: 20
Publication status: Published - Jul 2023
MoE publication type: A4 Conference publication
Event: International Conference on Machine Learning - Honolulu, United States
Duration: 23 Jul 2023 - 29 Jul 2023
Conference number: 40

Publication series

Name: Proceedings of Machine Learning Research
Publisher: PMLR
Volume: 202
ISSN (Electronic): 2640-3498

Conference

Conference: International Conference on Machine Learning
Abbreviated title: ICML
Country/Territory: United States
City: Honolulu
Period: 23/07/2023 - 29/07/2023

Fingerprint

Dive into the research topics of 'Simplified Temporal Consistency Reinforcement Learning'. Together they form a unique fingerprint.
