Boosted Curriculum Reinforcement Learning

Pascal Klink, Carlo D'Eramo, Jan Peters, Joni Pajarinen

Research output: Chapter in Book/Report/Conference proceeding; Conference article in proceedings; Professional

Abstract

Curriculum value-based reinforcement learning (RL) solves a complex target task by reusing action-values across a tailored sequence of related tasks of increasing difficulty. However, finding an exact way of reusing action-values in this setting is still a poorly understood problem. In this paper, we introduce the concept of boosting to curriculum value-based RL, by approximating the action-value function as a sum of residuals trained on each task. This approach, which we refer to as boosted curriculum reinforcement learning (BCRL), has the benefit of naturally increasing the representativeness of the functional space by adding a new residual each time a new task is presented. This procedure allows reusing previous action-values while promoting expressiveness of the action-value function. We theoretically study BCRL as an approximate value iteration algorithm, discussing advantages over regular curriculum RL in terms of approximation accuracy and convergence to the optimal action-value function. Finally, we provide detailed empirical evidence of the benefits of BCRL in problems requiring curricula for accurate action-value estimation and targeted exploration.
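As a rough illustration of the idea described in the abstract, the Python sketch below represents the action-value function as a sum of residuals and trains one new residual per task while keeping earlier ones frozen. It is not the authors' implementation: the four-state chain MDP, the names `BoostedQ`, `make_reward`, and `reference_q`, and the use of tabular residuals in place of a function approximator are all assumptions made for this example.

```python
# Toy sketch (hypothetical names): Q is the sum of per-task residual tables,
# and only the newest residual is updated when a new task arrives.

GAMMA = 0.9
N_STATES, N_ACTIONS = 4, 2  # chain MDP: action 0 moves left, action 1 moves right


def step(state, action):
    """Deterministic chain dynamics shared by every task in the curriculum."""
    return max(state - 1, 0) if action == 0 else min(state + 1, N_STATES - 1)


def make_reward(goal):
    """A task pays reward 1 for landing on `goal`; moving the goal changes the task."""
    return lambda s, a: 1.0 if step(s, a) == goal else 0.0


class BoostedQ:
    def __init__(self):
        self.residuals = []  # one residual table per task seen so far

    def value(self, s, a):
        """Boosted estimate: sum of all residuals."""
        return sum(r[s][a] for r in self.residuals)

    def frozen_value(self, s, a):
        """Sum of the residuals learned on earlier tasks (kept fixed)."""
        return sum(r[s][a] for r in self.residuals[:-1])

    def train_on_task(self, reward, sweeps=200):
        # Add a fresh zero residual, so the sum starts from the previous estimate.
        self.residuals.append([[0.0] * N_ACTIONS for _ in range(N_STATES)])
        newest = self.residuals[-1]
        for _ in range(sweeps):
            for s in range(N_STATES):
                for a in range(N_ACTIONS):
                    s2 = step(s, a)
                    best_next = max(self.value(s2, b) for b in range(N_ACTIONS))
                    target = reward(s, a) + GAMMA * best_next
                    # Fit only the newest residual to what the frozen sum misses.
                    newest[s][a] = target - self.frozen_value(s, a)


def reference_q(reward, sweeps=200):
    """Plain tabular value iteration from scratch, used only as a sanity check."""
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(sweeps):
        for s in range(N_STATES):
            for a in range(N_ACTIONS):
                q[s][a] = reward(s, a) + GAMMA * max(q[step(s, a)])
    return q


# Curriculum: the goal moves further along the chain with each task.
curriculum = [make_reward(goal) for goal in (1, 2, 3)]
boosted = BoostedQ()
for task_reward in curriculum:
    boosted.train_on_task(task_reward)
```

In this tabular toy each residual can fit its target exactly, so the summed estimate ends up matching plain value iteration on the final task. The sketch therefore shows only the bookkeeping of adding and freezing residuals, not the approximation-accuracy benefits the abstract discusses.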
Original language: English
Title of host publication: International Conference on Learning Representations
Publisher: OpenReview.net
Number of pages: 22
Publication status: Published - 2022
MoE publication type: D3 Professional conference proceedings
Event: International Conference on Learning Representations - Virtual, Online
Duration: 25 Apr 2022 to 29 Apr 2022
Conference number: 10

Conference

Conference: International Conference on Learning Representations
Abbreviated title: ICLR
City: Virtual, Online
Period: 25/04/2022 to 29/04/2022
