ILPO-MP: Mode Priors Prevent Mode Collapse when Imitating Latent Policies from Observations

Oliver Struckmeier, Ville Kyrki

Research output: Journal article › Article › Scientific › peer-reviewed

Abstract

Imitation learning from observations (IfO) constrains the classic imitation learning setting to cases where expert observations are easy to obtain, but no expert actions are available. Most existing IfO methods require access to task-specific cost functions or many interactions with the target environment. Learning a forward dynamics model in combination with a latent policy has been shown to solve these issues. However, the limited supervision in the IfO scenario can lead to mode collapse when learning the generative forward dynamics model and the corresponding latent policy. In this paper, we analyze the mode collapse problem in this setting and show that it is caused by a combination of deterministic expert data and poor initialization of the models. Under the assumption of piecewise continuous system dynamics, we propose ILPO-MP, a method that prevents mode collapse by clustering expert transitions to impose a mode prior on the generative model and the latent policy. We show that ILPO-MP prevents mode collapse and improves performance in a variety of environments.
Original language: English
Number of pages: 17
Journal: Transactions on Machine Learning Research
Number: 2835-8856
Status: Published - 2 November 2023
Ministry of Education and Culture (OKM) publication type: A1 Original article in a scientific journal
