Sample-Efficient Methods for Real-World Deep Reinforcement Learning

Rinu Boney

Research output: Thesis › Doctoral Thesis › Collection of Articles


Reinforcement learning (RL) is a general framework for learning and evaluating intelligent behaviors in any domain. Deep reinforcement learning combines RL with deep learning to learn expressive nonlinear functions that can interpret rich sensory signals and produce complex behaviors. However, this comes at the cost of increased sample complexity and instability, limiting the practical impact of deep RL algorithms on real-world problems. This thesis presents advances towards improving the sample efficiency and benchmarking of deep RL algorithms on real-world problems, developing sample-efficient deep RL algorithms for three problem settings: multi-agent discrete control, continuous control, and continuous control from image observations.

For multi-agent discrete control, the thesis proposes a sample-efficient model-based approach that plans using known dynamics models to learn to play imperfect-information games with large state-action spaces. This is achieved by training a policy network from partial observations to imitate the actions of an oracle planner that has full observability.

For continuous control, the thesis demonstrates that trajectory optimization with learned dynamics models can lead to the optimization procedure exploiting the inaccuracies of the model. The thesis proposes two regularization strategies to prevent this, based on uncertainty estimates from a denoising autoencoder or an energy-based model, achieving rapid initial learning on a set of popular continuous control tasks. For continuous control problems with image observations, the thesis proposes an actor-critic method that learns feature point state representations, without any additional supervision, for improved sample efficiency.

The thesis also introduces two low-cost robot learning benchmarks to ground research on RL algorithms in real-world problems.
The first benchmark adapts Donkey car, an open-source RC car platform, to evaluate RL algorithms on continuous control: the car learns to drive around miniature tracks from image observations. The second benchmark is based on RealAnt, a low-cost quadruped robot developed in this thesis, and evaluates RL algorithms on continuous control of the robot's servos for basic tasks such as turning and walking. The thesis demonstrates sample-efficient deep RL with existing methods on both benchmarks.
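To make the regularized planning idea concrete, the following sketch (not the thesis's actual implementation; the toy dynamics, reward, and penalty functions are all hypothetical stand-ins) performs random-shooting trajectory optimization under a learned dynamics model and subtracts a denoising-autoencoder-style penalty for states that fall outside the training distribution, where the model's predictions are unreliable:

```python
import numpy as np

rng = np.random.default_rng(0)

def learned_dynamics(x, a):
    # Hypothetical learned model of 1-D dynamics: accurate near the
    # training data (|x| <= 1), with a systematic error that grows
    # for states outside that region.
    return x + a + 0.5 * np.maximum(np.abs(x) - 1.0, 0.0)

def reward(x):
    # Toy objective: drive the state toward x = 0.8.
    return -np.abs(x - 0.8)

def dae_penalty(x):
    # Stand-in for a denoising-autoencoder uncertainty estimate: in the
    # thesis this would be the reconstruction error of a trained DAE,
    # which grows for states far from the training distribution. Here we
    # hand-craft that behavior for states with |x| > 1.
    return np.maximum(np.abs(x) - 1.0, 0.0) ** 2

def plan(x0, horizon=5, n_candidates=256, alpha=0.0):
    """Random-shooting trajectory optimization with an optional penalty.

    Scores each candidate action sequence by rolling it out through the
    learned model; with alpha > 0, plans passing through off-distribution
    states are penalized, discouraging the optimizer from exploiting
    model inaccuracies there.
    """
    actions = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon))
    scores = np.zeros(n_candidates)
    for i in range(n_candidates):
        x = x0
        for a in actions[i]:
            x = learned_dynamics(x, a)
            scores[i] += reward(x) - alpha * dae_penalty(x)
    best = np.argmax(scores)
    return actions[best, 0]  # execute only the first action (MPC style)

a_plain = plan(0.0, alpha=0.0)   # unregularized planning
a_reg = plan(0.0, alpha=10.0)    # penalized, stays near the data
```

In the unregularized case the optimizer is free to pick action sequences whose predicted returns rely on the model's erroneous extrapolation; the penalty term shifts the objective so that such trajectories score poorly.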
Translated title of the contribution: Sample-Efficient Methods for Real-World Deep Reinforcement Learning
Original language: English
Qualification: Doctor's degree
Awarding Institution
  • Aalto University
Supervisors
  • Kannala, Juho, Supervising Professor
  • Ilin, Alexander, Supervising Professor
Print ISBNs: 978-952-64-0808-8
Electronic ISBNs: 978-952-64-0809-5
Publication status: Published - 2022
MoE publication type: G5 Doctoral dissertation (article)


  • reinforcement learning
  • deep learning
  • robot learning
  • sample-efficient learning

