Abstract
Multi-objective reinforcement learning (MORL) generalizes standard reinforcement learning (RL) to sequential decision-making problems with several, possibly conflicting, objectives. In such formulations there is generally no single policy that optimizes all objectives simultaneously; instead, a set of policies must be found, each optimal for a particular preference over the objectives. In this paper, we introduce a novel MORL approach that trains a meta-policy, a policy simultaneously trained with multiple tasks sampled from a task distribution, for a number of randomly sampled Markov decision processes (MDPs). In other words, MORL is framed as a meta-learning problem, with the task distribution given by a distribution over the preferences. We demonstrate that this formulation yields a better approximation of the Pareto optimal solutions in terms of both optimality and computational efficiency. We evaluate our method on obtaining Pareto optimal policies for a number of continuous control problems with high degrees of freedom.
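The framing in the abstract, where each task corresponds to a preference over objectives and the results are judged by how well they approximate the Pareto optimal solutions, can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the preference distribution or how preferences are combined with rewards, so the uniform sampling on the simplex and the linear (weighted-sum) scalarization below are assumptions, and all function names are hypothetical.

```python
import random


def sample_preference(num_objectives, rng):
    """Draw a preference vector uniformly from the probability simplex.

    Assumption: normalized exponential draws (a flat Dirichlet). The abstract
    does not state which preference distribution the paper uses.
    """
    draws = [rng.expovariate(1.0) for _ in range(num_objectives)]
    total = sum(draws)
    return [d / total for d in draws]


def scalarize(reward_vector, preference):
    """Assumed linear scalarization: weighted sum of per-objective rewards."""
    return sum(w * r for w, r in zip(preference, reward_vector))


def dominates(a, b):
    """True if return vector a Pareto-dominates b (all objectives maximized)."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Keep only the return vectors that no other vector dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Each sampled preference defines one task (one scalar-reward MDP).
    tasks = [sample_preference(2, rng) for _ in range(4)]
    # Hypothetical per-objective returns of four candidate policies.
    returns = [(1.0, 5.0), (2.0, 4.0), (1.5, 3.0), (3.0, 1.0)]
    print(tasks)
    print(pareto_front(returns))
```

In a meta-learning setup of the kind the abstract describes, each sampled preference would define the scalar reward of one training task, and the policies adapted to different preferences would then be filtered with a dominance check like `pareto_front` to estimate the Pareto front.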
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2019 |
| Publisher | IEEE |
| Pages | 977-983 |
| Number of pages | 7 |
| ISBN (Electronic) | 978-1-7281-4004-9 |
| Publication status | Published - 2019 |
| MoE publication type | A4 Conference publication |
| Event | IEEE/RSJ International Conference on Intelligent Robots and Systems, The Venetian Macao, Macau, China. Duration: 4 Nov 2019 → 8 Nov 2019. https://www.iros2019.org/ |
Publication series
| Name | Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems |
|---|---|
| ISSN (Print) | 2153-0858 |
| ISSN (Electronic) | 2153-0866 |
Conference
| Conference | IEEE/RSJ International Conference on Intelligent Robots and Systems |
|---|---|
| Abbreviated title | IROS |
| Country/Territory | China |
| City | Macau |
| Period | 04/11/2019 → 08/11/2019 |
Funding
This work is supported by the European Union's Horizon 2020 research and innovation program, the CENTAURO project (under grant agreement No. 644839), the socSMCs project (H2020-FETPROACT-2014), and the Academy of Finland through the DEEPEN project.
Keywords
- Decision making
- Learning
- Markov processes
- Pareto optimisation
Fingerprint
Dive into the research topics of 'Meta-Learning for Multi-objective Reinforcement Learning'. Together they form a unique fingerprint.
Projects
- 1 Finished
Deep reinforcement learning for physical agents
Kyrki, V. (Principal investigator), Yang, Y. (Project Member), Hazara, M. (Project Member), Arndt, K. (Project Member), Ghadirzadeh, A. (Project Member), Hämäläinen, A. (Project Member) & Struckmeier, O. (Project Member)
01/01/2018 → 31/12/2019
Project: Academy of Finland: Other research funding