Abstract

Learning policies from multiple demonstrators is often difficult because different individuals perform the same task differently due to hidden factors such as preferences. In the context of policy learning, this leads to multimodal policies. Existing policy learning methods often converge to a single solution mode, failing to capture the diversity in the solution space. In this paper, we introduce an imitation-guided reinforcement learning framework to solve the multimodal policy learning problem from a limited number of state-only demonstrations. We then propose LfBD (Learning from Behaviourally diverse Demonstration), an algorithm that builds a parameterised solution space to capture the variability in the behaviour space defined by the demonstrations. To this end, we use a projection function based on the state density distributions of the demonstrations to define this space. Our goal is not only to learn to solve the task as the human demonstrators do but also to extrapolate beyond the provided demonstrations. In addition, we show that our method allows a post-hoc policy search in the built solution space to recover policies that satisfy specific constraints or to find a policy that matches a given (state-only) behaviour.
Original language: English
Title of host publication: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Publisher: IEEE
Pages: 1675-1682
Number of pages: 8
ISBN (electronic): 978-1-6654-9190-7
Status: Published - 2023
OKM publication type: A4 Article in conference proceedings
Event: IEEE/RSJ International Conference on Intelligent Robots and Systems, Detroit, United States
Duration: 1 Oct 2023 to 5 Oct 2023

Publication series

Name: Proceedings of the IEEE/RSJ international conference on intelligent robots and systems
Publisher: IEEE
ISSN (electronic): 2153-0866

Conference

Conference: IEEE/RSJ International Conference on Intelligent Robots and Systems
Abbreviated title: IROS
Country/Territory: United States
City: Detroit
Period: 01/10/2023 to 05/10/2023

Fingerprint

Dive into the research topics of 'Imitation-guided Multimodal Policy Generation from Behaviourally Diverse Demonstrations'. Together they form a unique fingerprint.
