State-Conditioned Adversarial Subgoal Generation

Vivienne Huiling Wang, Joni Pajarinen, Tinghuai Wang, Joni Kristian Kämäräinen

Research output: Chapter in book/conference proceedings › Conference article in proceedings › Scientific › Peer-reviewed

8 Citations (Scopus)

Abstract

Hierarchical reinforcement learning (HRL) proposes to solve difficult tasks by performing decision-making and control at successively higher levels of temporal abstraction. However, off-policy HRL often suffers from the problem of a non-stationary high-level policy, since the low-level policy is constantly changing. In this paper, we propose a novel HRL approach that mitigates this non-stationarity by adversarially forcing the high-level policy to generate subgoals compatible with the current instantiation of the low-level policy. In practice, the adversarial learning is implemented by training a simple state-conditioned discriminator network, which determines the compatibility level of subgoals, concurrently with the high-level policy. Comparison to state-of-the-art algorithms shows that our approach improves both learning efficiency and performance in challenging continuous control tasks.
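The abstract describes the discriminator only at a high level. As a rough illustration, and not the authors' implementation, the following sketch assumes a linear discriminator D(s, g) over the concatenated state and subgoal, trained with binary cross-entropy to separate subgoals labeled "compatible" (hypothetically, subgoals the current low-level policy actually reaches) from subgoals proposed by the high-level policy. The high-level policy would then add the standard non-saturating GAN term -log D(s, g) to its objective. The class and function names, the linear model, the labeling scheme, and the toy data are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class StateConditionedDiscriminator:
    """Hypothetical linear discriminator D(s, g) = sigmoid(w . [s, g] + b).

    Returns the estimated probability that subgoal g is "compatible" with
    state s. The linear form is an assumption; it is not the paper's network.
    """

    def __init__(self, state_dim: int, goal_dim: int, lr: float = 0.1, seed: int = 0):
        rng = random.Random(seed)
        self.w = [rng.uniform(-0.01, 0.01) for _ in range(state_dim + goal_dim)]
        self.b = 0.0
        self.lr = lr

    def prob(self, s, g) -> float:
        # Condition on the state by concatenating it with the subgoal.
        x = list(s) + list(g)
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def update(self, s, g, label: float) -> float:
        """One SGD step on binary cross-entropy (label 1 = compatible, 0 = generated).

        Returns the example's loss measured before the step.
        """
        x = list(s) + list(g)
        p = self.prob(s, g)
        grad = p - label  # derivative of BCE with respect to the logit
        self.w = [wi - self.lr * grad * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * grad
        eps = 1e-12
        return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))


def adversarial_subgoal_loss(disc: StateConditionedDiscriminator, s, g) -> float:
    """Non-saturating adversarial term -log D(s, g) for the high-level policy (assumed form)."""
    return -math.log(disc.prob(s, g) + 1e-12)


def toy_demo(steps: int = 2000, seed: int = 0) -> StateConditionedDiscriminator:
    """Hypothetical 1-D toy: compatible subgoals lie 0.1 ahead of the state, generated ones 3.0 ahead."""
    rng = random.Random(seed)
    disc = StateConditionedDiscriminator(state_dim=1, goal_dim=1, seed=seed)
    for _ in range(steps):
        s = rng.uniform(-1.0, 1.0)
        disc.update([s], [s + 0.1], label=1.0)  # assumed reachable by the low-level policy
        disc.update([s], [s + 3.0], label=0.0)  # assumed proposal from the high-level policy
    return disc
```

After `disc = toy_demo()`, the discriminator should score the nearby subgoal `([0.0], [0.1])` as more compatible than the distant one `([0.0], [3.0])`, so `adversarial_subgoal_loss` penalizes the distant proposal more. A realistic setup would replace the linear model with a neural network trained on minibatches; the linear form only keeps the gradient explicit.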

Original language: English
Title of host publication: AAAI-23 Technical Tracks 8
Editors: Brian Williams, Yiling Chen, Jennifer Neville
Publisher: AAAI Press
Pages: 10184-10191
Number of pages: 8
ISBN (electronic): 978-1-57735-880-0
Publication status: Published - 26 Jun 2023
OKM publication type: A4 Article in conference proceedings
Event: AAAI Conference on Artificial Intelligence - Walter E. Washington Convention Center, Washington, United States
Duration: 7 Feb 2023 - 14 Feb 2023
Conference number: 37
https://aaai-23.aaai.org/

Publication series

Name: Proceedings of the AAAI Conference on Artificial Intelligence
Volume: 37
ISSN (electronic): 2374-3468

Conference

Conference: AAAI Conference on Artificial Intelligence
Abbreviated title: AAAI
Country/Territory: United States
City: Washington
Period: 07/02/2023 - 14/02/2023
