PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation

Perttu Hämäläinen, Amin Babadi, Xiaoxiao Ma, Jaakko Lehtinen

Research output: Chapter in Book/Report/Conference proceeding > Conference article in proceedings > Scientific > peer-review

29 Citations (Scopus)
378 Downloads (Pure)

Abstract

Proximal Policy Optimization (PPO) is a highly popular model-free reinforcement learning (RL) approach. However, we observe that in a continuous action space, PPO can prematurely shrink the exploration variance, which leads to slow progress and may make the algorithm prone to getting stuck in local optima. Drawing inspiration from CMA-ES, a black-box evolutionary optimization method designed for robustness in similar situations, we propose PPO-CMA, a proximal policy optimization approach that adaptively expands the exploration variance to speed up progress. With only minor changes to PPO, our algorithm considerably improves performance in Roboschool continuous control benchmarks. Our results also show that PPO-CMA, as opposed to PPO, is significantly less sensitive to the choice of hyperparameters, allowing one to use it in complex movement optimization tasks without requiring tedious tuning.
Original language: English
Title of host publication: Proceedings of the 2020 IEEE 30th International Workshop on Machine Learning for Signal Processing, MLSP 2020
Publisher: IEEE
Number of pages: 6
ISBN (Electronic): 978-1-7281-6662-9
Publication status: Published - Sept 2020
MoE publication type: A4 Conference publication
Event: IEEE International Workshop on Machine Learning for Signal Processing - Aalto University, Espoo, Finland
Duration: 21 Sept 2020 – 24 Sept 2020
Conference number: 30
https://ieeemlsp.cc

Workshop

Workshop: IEEE International Workshop on Machine Learning for Signal Processing
Abbreviated title: MLSP
Country/Territory: Finland
City: Espoo
Period: 21/09/2020 – 24/09/2020

Projects
  • Virtual Coach Based on Multibody Dynamics

    Hämäläinen, P. (Principal investigator), Rajamäki, J. (Project Member), Naderi, K. (Project Member), Takatalo, J. (Project Member) & Kaos, M. (Project Member)

    01/01/2017 – 31/12/2018

    Project: Academy of Finland: Other research funding

  • IMAI: Interactive Movement Artificial Intelligence (IMAI)

    Hämäläinen, P. (Principal investigator), Babadi, A. (Project Member), Rajamäki, J. (Project Member), Kaos, M. (Project Member), Takatalo, J. (Project Member), Toikka, J. (Project Member), Ikkala, A. (Project Member) & Naderi, K. (Project Member)

    01/09/2016 – 31/08/2020

    Project: Academy of Finland: Other research funding
