ILPO-MP: Mode Priors Prevent Mode Collapse when Imitating Latent Policies from Observations

Oliver Struckmeier, Ville Kyrki

Research output: Journal article › Article › Scientific › Peer-reviewed


Imitation learning from observations (IfO) constrains the classic imitation learning setting to cases where expert observations are easy to obtain but no expert actions are available. Most existing IfO methods require access to task-specific cost functions or many interactions with the target environment. Learning a forward dynamics model in combination with a latent policy has been shown to solve these issues. However, the limited supervision in the IfO scenario can lead to mode collapse when learning the generative forward dynamics model and the corresponding latent policy. In this paper, we analyze the mode collapse problem in this setting and show that it is caused by a combination of deterministic expert data and poor initialization of the models. Under the assumption of piecewise continuous system dynamics, we propose ILPO-MP, a method that prevents mode collapse by clustering expert transitions to impose a mode prior on the generative model and the latent policy. We show that ILPO-MP prevents mode collapse and improves performance in a variety of environments.
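The mechanism described in the abstract, where deterministic expert data plus a poor initialization leads to mode collapse and clustering of expert transitions counters it, can be illustrated with a toy sketch. This is not the authors' implementation. The use of k-means, featurizing a transition by its state change s' - s, and the helper names `mode_prior` and `min_over_latents` are all hypothetical choices made for illustration. The sketch contrasts a hard "closest latent head wins" assignment, under which a badly initialized two-head forward model routes every transition to the same head (so the other latent never receives a training signal), with cluster labels that split the transitions by mode and yield an empirical prior p(z).

```python
import numpy as np


def kmeans(x, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on the rows of x; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct data points (fancy indexing copies).
    centroids = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return centroids, labels


def mode_prior(states, next_states, n_modes, seed=0):
    """Hypothetical helper: cluster expert transitions into modes.

    Transitions are featurized by the state change s' - s (an assumption),
    clustered with k-means, and cluster frequencies give a prior p(z).
    """
    deltas = next_states - states
    _, labels = kmeans(deltas, n_modes, seed=seed)
    prior = np.bincount(labels, minlength=n_modes) / len(labels)
    return labels, prior


def min_over_latents(head_preds, deltas):
    """Hypothetical helper: assign each transition to the closest latent head.

    head_preds: (n_modes, dim) predicted state changes, one per latent z.
    Returns the index of the winning head for each transition.
    """
    dists = ((deltas[:, None, :] - head_preds[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


# Toy deterministic expert data: from state s the expert moves to s - 1 or s + 1.
rng = np.random.default_rng(1)
states = rng.uniform(-5.0, 5.0, size=(200, 1))
moves = np.where(rng.random((200, 1)) < 0.5, -1.0, 1.0)
next_states = states + moves
deltas = next_states - states

# A poorly initialized two-head forward model: head 0 predicts 0, head 1 predicts 5.
bad_init = np.array([[0.0], [5.0]])
winners = min_over_latents(bad_init, deltas)
print("heads receiving data (closest-head rule):", np.unique(winners))

# Clustering the transitions instead splits them by direction of motion.
labels, prior = mode_prior(states, next_states, n_modes=2)
print("mode prior p(z):", prior)
```

With this initialization, head 1 is farther than head 0 from both the -1 and +1 transitions, so only head 0 ever wins, while the k-means labels separate leftward from rightward moves. In a full system, labels like these could act as targets for the latent policy and decide which forward-model head is trained on each transition; how ILPO-MP actually incorporates the mode prior is specified in the paper itself.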
Journal: Transactions on Machine Learning Research
Status: Published, 2 Nov 2023
OKM publication type: A1 Original article in a scientific journal

