PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation

Research output: Chapter in book / conference proceedings › Conference contribution › Scientific › peer-reviewed

11 Citations (Scopus)
189 Downloads (Pure)

Abstract

Proximal Policy Optimization (PPO) is a highly popular model-free reinforcement learning (RL) approach. However, we observe that in a continuous action space, PPO can prematurely shrink the exploration variance, which leads to slow progress and may make the algorithm prone to getting stuck in local optima. Drawing inspiration from CMA-ES, a black-box evolutionary optimization method designed for robustness in similar situations, we propose PPO-CMA, a proximal policy optimization approach that adaptively expands the exploration variance to speed up progress. With only minor changes to PPO, our algorithm considerably improves performance in Roboschool continuous control benchmarks. Our results also show that PPO-CMA, as opposed to PPO, is significantly less sensitive to the choice of hyperparameters, allowing one to use it in complex movement optimization tasks without requiring tedious tuning.
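The variance-expansion idea described above can be illustrated with a small NumPy sketch. This is our own simplification for intuition, not the authors' implementation: it applies a CMA-ES-style rank-weighted update to a diagonal-Gaussian exploration distribution, weighting only positive-advantage actions and measuring deviations from the old mean, so that good actions far from the current mean increase the variance instead of shrinking it.

```python
import numpy as np

# Hedged sketch (not the paper's code): a CMA-ES-inspired update of a
# diagonal-Gaussian exploration distribution. Only positive-advantage
# actions receive weight; the variance update uses deviations from the
# OLD mean, so far-away good actions expand the variance rather than
# collapse it -- the effect PPO-CMA exploits to avoid premature
# shrinkage of exploration.

def cma_style_update(mean, var, actions, advantages, lr=0.5):
    """One Gaussian-exploration update from a batch of sampled actions.

    mean, var  : current exploration mean and diagonal variance, shape (dim,)
    actions    : sampled actions, shape (N, dim)
    advantages : advantage estimate per action, shape (N,)
    lr         : interpolation rate toward the batch statistics (assumed)
    """
    w = np.maximum(advantages, 0.0)      # keep only positive-advantage samples
    if w.sum() <= 0.0:
        return mean, var                  # nothing useful to learn from this batch
    w = w / w.sum()                       # normalize weights to sum to 1
    # Variance first, using deviations from the OLD mean (rank-mu style):
    new_var = (1 - lr) * var + lr * np.sum(w[:, None] * (actions - mean) ** 2, axis=0)
    # Then move the mean toward the advantage-weighted action average:
    new_mean = (1 - lr) * mean + lr * np.sum(w[:, None] * actions, axis=0)
    return new_mean, new_var

# Example: good actions lie far from the mean, so the variance grows.
mean, var = np.zeros(1), np.array([0.04])
actions = np.array([[1.0], [1.1]])
advantages = np.array([1.0, 1.0])
new_mean, new_var = cma_style_update(mean, var, actions, advantages)
```

In an actual policy-gradient setting the mean and variance would be outputs of neural networks trained on these weighted targets; the closed-form update above only demonstrates why deviations from the old mean expand exploration toward promising regions.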
Original language: English
Title of host publication: Proceedings of the 2020 IEEE 30th International Workshop on Machine Learning for Signal Processing, MLSP 2020
Publisher: IEEE
Number of pages: 6
ISBN (electronic): 978-1-7281-6662-9
DOIs
Publication status: Published - Sep 2020
MoEC publication type: A4 Article in a conference publication
Event: IEEE International Workshop on Machine Learning for Signal Processing - Aalto University, Espoo, Finland
Duration: 21 Sep 2020 – 24 Sep 2020
Conference number: 30
https://ieeemlsp.cc

Workshop

Workshop: IEEE International Workshop on Machine Learning for Signal Processing
Abbreviated title: MLSP
Country/Territory: Finland
City: Espoo
Period: 21/09/2020 – 24/09/2020
Web address

