State-Conditioned Adversarial Subgoal Generation

Vivienne Huiling Wang, Joni Pajarinen, Tinghuai Wang, Joni Kristian Kämäräinen

Research output: Chapter in Book/Report/Conference proceeding; Conference article in proceedings; Scientific; peer-review

Abstract

Hierarchical reinforcement learning (HRL) proposes to solve difficult tasks by performing decision-making and control at successively higher levels of temporal abstraction. However, off-policy HRL often suffers from a non-stationary high-level policy, since the low-level policy is constantly changing. In this paper, we propose a novel HRL approach that mitigates this non-stationarity by adversarially enforcing the high-level policy to generate subgoals compatible with the current instantiation of the low-level policy. In practice, the adversarial learning is implemented by training a simple state-conditioned discriminator network, which determines the compatibility level of subgoals, concurrently with the high-level policy. Comparison to state-of-the-art algorithms shows that our approach improves both learning efficiency and performance in challenging continuous control tasks.

Original language: English
Title of host publication: AAAI-23 Technical Tracks 8
Editors: Brian Williams, Yiling Chen, Jennifer Neville
Publisher: AAAI Press
Pages: 10184-10191
Number of pages: 8
ISBN (Electronic): 978-1-57735-880-0
Publication status: Published - 26 Jun 2023
MoE publication type: A4 Conference publication
Event: AAAI Conference on Artificial Intelligence - Walter E. Washington Convention Center, Washington, United States
Duration: 7 Feb 2023 – 14 Feb 2023
Conference number: 37
https://aaai-23.aaai.org/

Publication series

Name: Proceedings of the AAAI Conference on Artificial Intelligence
Volume: 37
ISSN (Electronic): 2374-3468

Conference

Conference: AAAI Conference on Artificial Intelligence
Abbreviated title: AAAI
Country/Territory: United States
City: Washington
Period: 07/02/2023 – 14/02/2023
