Contextual Policies Enable Efficient and Interpretable Inverse Reinforcement Learning for Populations

Ville Tanskanen, Chang Rajani, Perttu Hämäläinen, Christian Guckelsberger, Arto Klami

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Inverse reinforcement learning (IRL) methods learn a reward function from expert demonstrations such as human behavior, offering a practical solution for crafting reward functions for complex environments. However, IRL is computationally expensive when applied to large populations of demonstrators, as existing IRL algorithms require solving a separate reinforcement learning (RL) problem for each individual. We propose a new IRL approach that relies on contextual RL, where an optimal policy is learned for multiple contexts. We first learn a contextual policy that provides the RL solution directly for a parametric family of reward functions, and then re-use it for IRL on each individual within the population. We motivate our method within the scenario of AI-driven playtesting of videogames, and focus on an interpretable family of reward functions. We evaluate the method on a navigation task and the battle arena game Derk, where it successfully recovers distinct player reward preferences from a simulated population and provides substantial time savings compared to a solid baseline of adversarial IRL.
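The following is a minimal illustrative sketch of the two-stage idea described in the abstract, not the authors' implementation: a policy conditioned on a reward parameter theta stands in for a contextual policy trained once over a parametric reward family, and per-individual IRL then reduces to fitting theta against that demonstrator's trajectories with no further RL. All names (contextual_policy, rollout, irl_fit) and the toy 1D navigation task are assumptions made for illustration.

import numpy as np

rng = np.random.default_rng(0)
STATES = np.arange(10)          # positions 0..9 on a line
ACTIONS = np.array([-1, +1])    # step left / step right

def contextual_policy(state, theta, beta=3.0):
    """Softmax policy over actions, conditioned on reward parameter theta.
    Stands in for a contextual policy trained once for the whole reward family."""
    next_states = np.clip(state + ACTIONS, STATES.min(), STATES.max())
    q = -np.abs(next_states - theta)      # Q-value proxy: closeness to goal theta
    p = np.exp(beta * q)
    return p / p.sum()

def rollout(theta, start=0, horizon=8):
    """Simulate one demonstrator whose (hidden) reward preference is theta."""
    s, traj = start, []
    for _ in range(horizon):
        p = contextual_policy(s, theta)
        a_idx = rng.choice(len(ACTIONS), p=p)
        traj.append((s, a_idx))
        s = int(np.clip(s + ACTIONS[a_idx], STATES.min(), STATES.max()))
    return traj

def irl_fit(traj, candidates=STATES):
    """Stage 2: recover theta by maximizing demonstration likelihood under the
    already-trained contextual policy -- no per-individual RL problem is solved."""
    def loglik(theta):
        return sum(np.log(contextual_policy(s, theta)[a]) for s, a in traj)
    return max(candidates, key=loglik)

# A small simulated population with distinct preferences.
for true_theta in [2, 5, 9]:
    demo = rollout(true_theta, start=int(rng.integers(0, 10)))
    print(f"true theta={true_theta}, recovered theta={irl_fit(demo)}")

Because the contextual policy already covers the whole reward family, adding another demonstrator only costs one likelihood maximization over theta, which is where the reported time savings over per-individual adversarial IRL would come from.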
Original language: English
Number of pages: 23
Journal: Transactions on Machine Learning Research
Publication status: Published - 10 Jul 2024
MoE publication type: A1 Journal article-refereed

