GPU-Accelerated Policy Optimization via Batch Automatic Differentiation of Gaussian Processes for Real-World Control

Abdolreza Taheri, Joni Pajarinen, Reza Ghabcheloo

Research output: Article in book/conference proceedings › Conference contribution › Scientific › Peer-reviewed

1 Citation (Scopus)

Abstract

The ability of Gaussian processes (GPs) to predict the behavior of dynamical systems as a more sample-efficient alternative to parametric models seems promising for real-world robotics research. However, the computational complexity of GPs has made policy search a highly time- and memory-consuming process that has not been able to scale to larger problems. In this work, we develop a policy optimization method by leveraging fast predictive sampling methods to process batches of trajectories in every forward pass, and compute gradient updates over policy parameters by automatic differentiation of Monte Carlo evaluations, all on GPU. We demonstrate the effectiveness of our approach in training policies on a set of reference-tracking control experiments with a heavy-duty machine. Benchmark results show a significant speedup over exact methods and showcase the scalability of our method to larger policy networks, longer horizons, and up to thousands of trajectories with a sublinear drop in speed.

Original language: English
Title of host publication: 2022 IEEE International Conference on Robotics and Automation, ICRA 2022
Publisher: IEEE
Pages: 10557-10563
Number of pages: 7
ISBN (electronic): 9781728196817
DOI (permanent links):
Status: Published - 2022
OKM publication type: A4 Article in conference proceedings
Event: IEEE International Conference on Robotics and Automation - Philadelphia, United States
Duration: 23 May 2022 to 27 May 2022
Conference number: 39

Publication series

Name: Proceedings - IEEE International Conference on Robotics and Automation
ISSN (print): 1050-4729

Conference

Conference: IEEE International Conference on Robotics and Automation
Abbreviated title: ICRA
Country/Territory: United States
City: Philadelphia
Period: 23/05/2022 to 27/05/2022
