Abstract
Many real-world reinforcement learning (RL) problems involve multiple conflicting objective functions that must be optimized simultaneously. Finding the optimal policies for different preferences over the objectives (known as Pareto optimal policies) requires extensive state space exploration. Obtaining a dense set of Pareto optimal policies is therefore challenging and often sample-inefficient. In this paper, we propose a hybrid multiobjective policy optimization approach for solving multiobjective reinforcement learning (MORL) problems with continuous actions. Our approach combines the fast convergence of multiobjective policy gradient (MOPG) with a surrogate-assisted multiobjective evolutionary algorithm (MOEA) to produce a dense set of Pareto optimal policies. The solutions found by the MOPG algorithm are used to build computationally inexpensive surrogate models, defined in the parameter space of the policies, that approximate the policies' returns. An MOEA then uses the surrogates' mean predictions and the uncertainty in those predictions to find approximately optimal policies. The final solution policies are then evaluated using the simulator and stored in an archive. Tests on multiobjective continuous-action RL benchmarks show that a hybrid surrogate-assisted multiobjective evolutionary optimizer with a robust selection criterion produces a dense set of Pareto optimal policies without extensively exploring the state space. We also apply the proposed approach to train Pareto optimal agents for autonomous driving, where the hybrid approach produced superior results compared to a state-of-the-art MOPG algorithm.
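As a rough illustration of the selection idea in the abstract, the sketch below filters candidate policies by Pareto dominance, once on the surrogate mean alone and once on an uncertainty-penalized score. It is not the paper's implementation: the abstract does not specify the robust selection criterion, so the `mean - kappa * std` penalty, the function names, and the numbers are all assumptions made for illustration.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if return vector a Pareto-dominates b (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(returns: List[Sequence[float]]) -> List[int]:
    """Indices of return vectors that no other vector dominates."""
    return [
        i
        for i, r in enumerate(returns)
        if not any(dominates(o, r) for j, o in enumerate(returns) if j != i)
    ]


def pessimistic_returns(
    mean: List[Sequence[float]], std: List[Sequence[float]], kappa: float = 1.0
) -> List[List[float]]:
    """Hypothetical robust score: penalize each predicted return by its uncertainty."""
    return [[m - kappa * s for m, s in zip(ms, ss)] for ms, ss in zip(mean, std)]


# Made-up surrogate predictions for four candidate policies and two objectives.
mean = [[1.0, 5.0], [3.0, 3.0], [2.0, 2.0], [5.0, 1.0]]
std = [[0.1, 0.1], [0.1, 0.1], [0.1, 0.1], [3.0, 0.1]]

front_mean_only = non_dominated(mean)
front_robust = non_dominated(pessimistic_returns(mean, std, kappa=1.0))

print(front_mean_only)  # candidate 3 survives on its mean prediction alone
print(front_robust)     # candidate 3 drops out once its uncertainty is penalized
```

In this toy data, the fourth candidate looks attractive on its predicted mean but has a highly uncertain first objective, so a pessimistic score removes it from the approximate front before any simulator evaluation would be spent on it.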
Original language | English |
---|---|
Title of host publication | Applications of Evolutionary Computation - 27th European Conference, EvoApplications 2024, Held as Part of EvoStar 2024, Proceedings |
Editors | Stephen Smith, João Correia, Christian Cintrano |
Publisher | Springer |
Pages | 61-75 |
Number of pages | 15 |
ISBN (Print) | 9783031568541 |
Publication status | Published - 2024 |
MoE publication type | A4 Conference publication |
Event | European Conference on Applications of Evolutionary Computation, Aberystwyth, United Kingdom. Duration: 3 Apr 2024 → 5 Apr 2024. Conference number: 27 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 14635 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | European Conference on Applications of Evolutionary Computation |
---|---|
Abbreviated title | EvoApplications |
Country/Territory | United Kingdom |
City | Aberystwyth |
Period | 03/04/2024 → 05/04/2024 |
Keywords
- multiobjective evolutionary optimization
- multiobjective policy gradient
- multiobjective reinforcement learning
- Pareto front
- surrogate assisted optimization