The past decade has seen enormous progress in reinforcement learning. Intelligent agents have learned to perform a wide variety of tasks, including locomotion and imitation of human behavior, and have even outperformed human experts in board games and video games of varying complexity, such as Pong, Go, and Dota 2. However, all these tasks share one characteristic: they are either performed entirely in simulation or governed by simple rules that can be modeled perfectly in software. Moreover, reinforcement learning approaches that perform well in virtual environments cannot be applied directly to physical agents operating in the real world, such as robots, because they rely on collecting massive amounts of data. On a physical system, such training not only takes a long time and causes hardware wear, but often also carries a safety risk associated with active exploration: to decide on the best action, the agent must try a large number of possible actions, some of which can lead to catastrophic outcomes. One proposed solution is to train reinforcement learning policies for robots in simulation and later deploy the trained behavior policy on the real physical system. This approach, however, raises new issues: simulated dynamics and observations do not exactly match the real world, so behaviors learned in simulation often transfer poorly to the real system. This thesis formulates the sim-to-real transfer of robot policies as an augmented Markov decision process (MDP). Within the proposed framework, the problem is divided into individual subproblems, each of which is addressed separately. The thesis begins by discussing whether behavior policies can be transferred to the real world without any real-world data available to the algorithm.
The applicability of such methods to discrepancies in dynamics and in visual observations between the source and target domains is analyzed, and their limitations in both scenarios are discussed. The thesis then evaluates a range of methods that use real-world data to improve domain transfer accuracy in a data-efficient way, focusing on system parameter estimation, policy and model adaptation through meta-learning, and efficient collection of informative real-world data. Finally, the thesis addresses the safety aspects of sim-to-real adaptation by extending the augmented MDP framework. It explores how safe adaptation can be achieved through constraints on the action space and through cautious, safety-aware domain adaptation algorithms. It also discusses the safety considerations involved in finding optimal parameter distributions for training sim-to-real policies. Our experiments show that robot policies can be successfully transferred from simulation to the real world, and that each of the individual sim-to-real transfer issues can be addressed with dedicated algorithms, enabling safe and efficient real-world operation.
| Translated title of the contribution | Safe and efficient transfer of robot policies from simulation to the real world |
|---|---|
| Publication status | Published - 2023 |
| MoE publication type | G5 Doctoral dissertation (article) |
- machine learning
- reinforcement learning