TY - GEN
T1 - Online vs. Offline Adaptive Domain Randomization Benchmark
AU - Tiboni, Gabriele
AU - Arndt, Karol
AU - Averta, Giuseppe
AU - Kyrki, Ville
AU - Tommasi, Tatiana
N1 - Funding Information:
Acknowledgments. We acknowledge the computational resources generously provided by HPC@POLITO and by the Aalto Science-IT project.
Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
N2 - Physics simulators have shown great promise for conveniently learning reinforcement learning policies in safe, unconstrained environments. However, transferring the acquired knowledge to the real world can be challenging due to the reality gap. To this end, several methods have recently been proposed to automatically tune simulator parameters with posterior distributions given real data, for use with domain randomization at training time. These approaches have been shown to work for various robotic tasks under different settings and assumptions. Nevertheless, the existing literature lacks a thorough comparison of adaptive domain randomization methods with respect to transfer performance and real-data efficiency. This work presents an open benchmark for both offline and online methods (SimOpt, BayRn, DROID, DROPO) to investigate current limitations across multiple settings and tasks. We found that online methods are limited by the quality of the policy learned so far at each iteration, while offline methods may sometimes fail when trajectories are replayed in simulation with open-loop commands. The code used is publicly available at https://github.com/gabrieletiboni/adr-benchmark.
AB - Physics simulators have shown great promise for conveniently learning reinforcement learning policies in safe, unconstrained environments. However, transferring the acquired knowledge to the real world can be challenging due to the reality gap. To this end, several methods have recently been proposed to automatically tune simulator parameters with posterior distributions given real data, for use with domain randomization at training time. These approaches have been shown to work for various robotic tasks under different settings and assumptions. Nevertheless, the existing literature lacks a thorough comparison of adaptive domain randomization methods with respect to transfer performance and real-data efficiency. This work presents an open benchmark for both offline and online methods (SimOpt, BayRn, DROID, DROPO) to investigate current limitations across multiple settings and tasks. We found that online methods are limited by the quality of the policy learned so far at each iteration, while offline methods may sometimes fail when trajectories are replayed in simulation with open-loop commands. The code used is publicly available at https://github.com/gabrieletiboni/adr-benchmark.
KW - Benchmark
KW - Domain randomization
KW - Robot learning
KW - Sim-to-real
UR - http://www.scopus.com/inward/record.url?scp=85147986185&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-22731-8_12
DO - 10.1007/978-3-031-22731-8_12
M3 - Conference contribution
AN - SCOPUS:85147986185
SN - 978-3-031-22730-1
T3 - Springer Proceedings in Advanced Robotics
SP - 158
EP - 173
BT - Human-Friendly Robotics 2022 - HFR 2022
A2 - Borja, Pablo
A2 - Della Santina, Cosimo
A2 - Peternel, Luka
A2 - Torta, Elena
PB - Springer
T2 - International Workshop on Human-Friendly Robotics
Y2 - 22 September 2022 through 23 September 2022
ER -