ILPO-MP: Mode Priors Prevent Mode Collapse when Imitating Latent Policies from Observations

Oliver Struckmeier, Ville Kyrki

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Imitation learning from observations (IfO) constrains the classic imitation learning setting to cases where expert observations are easy to obtain but no expert actions are available. Most existing IfO methods require access to task-specific cost functions or many interactions with the target environment. Learning a forward dynamics model in combination with a latent policy has been shown to address these issues. However, the limited supervision in the IfO scenario can lead to mode collapse when learning the generative forward dynamics model and the corresponding latent policy. In this paper, we analyze the mode collapse problem in this setting and show that it is caused by a combination of deterministic expert data and poor initialization of the models. Under the assumption of piecewise continuous system dynamics, we propose ILPO-MP, a method that prevents mode collapse by clustering expert transitions to impose a mode prior on the generative model and the latent policy. We show that ILPO-MP prevents mode collapse and improves performance in a variety of environments.
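
The abstract does not specify how the expert transitions are clustered or how the resulting prior is applied; the sketch below illustrates one plausible instantiation, assuming k-means over expert state-transition differences (s_{t+1} - s_t) whose cluster assignments serve as a mode prior for the latent actions of the forward dynamics model. All function names, parameters, and the toy data are hypothetical and not taken from the paper.

    import numpy as np
    from sklearn.cluster import KMeans

    def cluster_expert_transitions(states, next_states, n_modes):
        """Hypothetical sketch: cluster expert state transitions into modes.

        The cluster assignments can act as a prior over the latent actions of
        the generative forward model, discouraging all transitions from
        collapsing onto a single latent mode. This is an illustrative
        assumption, not the paper's exact procedure.
        """
        # Represent each transition by its state difference; under piecewise
        # continuous dynamics, similar deltas plausibly share a mode.
        deltas = next_states - states
        kmeans = KMeans(n_clusters=n_modes, n_init=10, random_state=0)
        mode_labels = kmeans.fit_predict(deltas)

        # Empirical mode prior: relative frequency of each cluster.
        mode_prior = np.bincount(mode_labels, minlength=n_modes) / len(mode_labels)
        return mode_labels, mode_prior

    if __name__ == "__main__":
        # Toy example: 1000 expert transitions in a 4-D state space with
        # two underlying motion modes (shift up or down by 1).
        rng = np.random.default_rng(0)
        states = rng.normal(size=(1000, 4))
        next_states = states + rng.choice([-1.0, 1.0], size=(1000, 1))
        labels, prior = cluster_expert_transitions(states, next_states, n_modes=2)
        print("mode prior:", prior)

In such a setup, the mode labels could be used as supervision or as a prior distribution over the latent action variable during training of the forward model and latent policy, which is the role the abstract attributes to the mode prior.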
Original language: English
Number of pages: 17
Journal: Transactions on Machine Learning Research
ISSN: 2835-8856
Publication status: Published - 2 Nov 2023
MoE publication type: A1 Journal article, refereed

Keywords

  • Imitation learning
