A Gaussian process reinforcement learning algorithm with adaptability and minimal tuning requirements

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Scientific › peer-review

3 Citations (Scopus)


We present a novel Bayesian reinforcement learning algorithm that addresses model bias and exploration overhead. The algorithm combines aspects of several state-of-the-art model-based reinforcement learning methods that use Gaussian processes, in order to make better use of online data samples. The algorithm uses a smooth reward function, requiring the reward value to be derivable from the environment state. It works with continuous states and actions in a coherent way, with a minimal need for expert knowledge in parameter tuning. We analyse and discuss the practical benefits of the selected approach in comparison with more traditional methodological choices, and illustrate the use of the algorithm in a motor control problem involving a simulated two-link arm.
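The abstract describes learning a model of the environment with Gaussian process regression so that online samples are reused rather than discarded. The paper's own algorithm is not reproduced here, but the core building block — GP regression from (state, action) inputs to observed outcomes, with predictive mean and variance — can be sketched as follows. The kernel choice, lengthscale, noise level, and the toy 1-D dynamics target are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between two sets of row vectors."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

class GPDynamicsModel:
    """Minimal GP regression sketch: inputs -> scalar outcomes.

    In a model-based RL setting the inputs would be (state, action)
    pairs and the targets next-state changes or rewards; this is an
    illustrative assumption, not the paper's algorithm.
    """

    def __init__(self, noise=1e-2):
        self.noise = noise  # assumed observation-noise variance

    def fit(self, X, y):
        K = rbf_kernel(X, X) + self.noise * np.eye(len(X))
        self.X = X
        self.L = np.linalg.cholesky(K)           # K = L L^T
        # alpha = K^{-1} y via two triangular solves
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, y))
        return self

    def predict(self, Xs):
        """Posterior mean and variance at test inputs Xs."""
        Ks = rbf_kernel(Xs, self.X)
        mean = Ks @ self.alpha
        v = np.linalg.solve(self.L, Ks.T)
        var = rbf_kernel(Xs, Xs).diagonal() - (v ** 2).sum(0)
        return mean, var

# Toy usage: learn a smooth 1-D mapping from 30 samples.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X[:, 0])
model = GPDynamicsModel().fit(X, y)
mean, var = model.predict(np.array([[0.5]]))
```

The predictive variance is what makes this attractive for reinforcement learning: it quantifies model uncertainty at unvisited inputs, which can be used to temper model bias and guide exploration.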

Original language: English
Title of host publication: Artificial Neural Networks and Machine Learning, ICANN 2014 - 24th International Conference on Artificial Neural Networks, Proceedings
Number of pages: 8
Volume: 8681 LNCS
ISBN (Print): 9783319111780
Publication status: Published - 2014
MoE publication type: A4 Conference publication
Event: International Conference on Artificial Neural Networks - Hamburg, Germany
Duration: 15 Sept 2014 - 19 Sept 2014
Conference number: 24

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8681 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: International Conference on Artificial Neural Networks
Abbreviated title: ICANN


  • batch reinforcement learning
  • Bayesian reinforcement learning
  • Gaussian processes
  • minimal domain-expert knowledge
  • non-parametric reinforcement learning


