Topological Experience Replay

Zhang-Wei Hong, Tao Chen, Yen-Chen Lin, Joni Pajarinen, Pulkit Agrawal

Research output: Conference article in proceedings (Professional)

Abstract

State-of-the-art deep Q-learning methods update Q-values using state transition tuples sampled from the experience replay buffer. These methods typically sample transitions uniformly at random or prioritize them based on measures such as the temporal difference (TD) error. Such sampling strategies can be inefficient for learning the Q-function, because a state's correct Q-value depends on accurate Q-values at its successor states. Disregarding this dependency on successor values leads to wasted updates and can even result in learning incorrect values.
To expedite Q-learning, we maintain the dependency between states by organizing the agent's experience into a graph, in which each edge represents a transition between two connected states. We perform value backups via a breadth-first search that expands vertices in the graph, starting from the set of terminal states and successively moving backward. We empirically show that our method is substantially more data-efficient than several baselines on a diverse range of goal-reaching tasks. Notably, the proposed method also outperforms baselines that consume more batches of training experience.
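The record contains no code, but the backup scheme described in the abstract can be sketched in a few lines of Python. The names used here (TopologicalReplayBuffer, reverse_bfs_batches, q_backup) are hypothetical, and the tabular Q-update is an illustrative stand-in for the paper's deep Q-learning setup, not the authors' implementation: transitions are stored as incoming edges of their successor state, and updates proceed level by level backward from terminal states so that successor values are refreshed before the states that depend on them.

```python
import collections


class TopologicalReplayBuffer:
    """Stores transitions as edges of a state graph and replays them in
    reverse breadth-first order, starting from terminal states.
    (Hypothetical sketch; not the authors' implementation.)"""

    def __init__(self):
        # predecessors[s'] holds every stored transition (s, a, r, s', done)
        # that ends in state s', i.e. the incoming edges of s' in the graph.
        self.predecessors = collections.defaultdict(list)
        self.terminal_states = set()

    def add(self, state, action, reward, next_state, done):
        self.predecessors[next_state].append((state, action, reward, next_state, done))
        if done:
            self.terminal_states.add(next_state)

    def reverse_bfs_batches(self):
        """Yield one batch of transitions per BFS level, moving backward from
        the terminal states, so a state's successors are updated first."""
        frontier = list(self.terminal_states)
        visited = set(frontier)
        while frontier:
            batch, next_frontier = [], []
            for state in frontier:
                for transition in self.predecessors[state]:
                    batch.append(transition)
                    prev_state = transition[0]
                    if prev_state not in visited:
                        visited.add(prev_state)
                        next_frontier.append(prev_state)
            if batch:
                yield batch
            frontier = next_frontier


def q_backup(q_values, batch, alpha=0.5, gamma=0.99):
    """Tabular stand-in for the deep Q-learning update applied to one BFS level."""
    for state, action, reward, next_state, done in batch:
        target = reward if done else reward + gamma * max(
            q_values[next_state].values(), default=0.0
        )
        q_values[state][action] += alpha * (target - q_values[state][action])


if __name__ == "__main__":
    # Toy 3-state chain: 0 -> 1 -> 2 (terminal), reward 1.0 on reaching state 2.
    buffer = TopologicalReplayBuffer()
    buffer.add(0, "right", 0.0, 1, False)
    buffer.add(1, "right", 1.0, 2, True)

    q_values = collections.defaultdict(lambda: collections.defaultdict(float))
    for batch in buffer.reverse_bfs_batches():
        q_backup(q_values, batch)
    print(dict(q_values[0]), dict(q_values[1]))
```

Running the toy example updates state 1 before state 0, so the value of reaching the goal has already propagated to state 1 by the time state 0's update reads it; uniform sampling offers no such ordering guarantee.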
Original language: English
Title: International Conference on Learning Representations
Publisher: OpenReview.net
Number of pages: 24
Publication status: Published - 2022
MoE publication type: D3 Article in professional conference proceedings
Event: International Conference on Learning Representations - Virtual, Online
Duration: 25 Apr 2022 – 29 Apr 2022
Conference number: 10

Conference

Conference: International Conference on Learning Representations
Abbreviated title: ICLR
City: Virtual, Online
Period: 25/04/2022 – 29/04/2022
