Interactive Reward Tuning: Interactive Visualization for Preference Elicitation

Research output: Chapter in Book/Report/Conference proceeding; Conference article in proceedings; Scientific; peer-review


Abstract

In reinforcement learning, tuning reward weights in the reward function is necessary to align behavior with user preferences. However, current approaches, which use pairwise comparisons for preference elicitation, are inefficient, because they miss much of the human ability to explore and judge groups of candidate solutions. The paper presents a novel visualization-based approach that better exploits the user’s ability to quickly recognize interesting directions for reward tuning. It breaks down the tuning problem by using the visual information-seeking principle: overview first, zoom and filter, then details-on-demand. Following this principle, we built a visualization system comprising two interactively linked views: 1) an embedding view showing a contextual overview of all sampled behaviors and 2) a sample view displaying selected behaviors and visualizations of the detailed time-series data. A user can efficiently explore large sets of samples by iterating between these two views. The paper demonstrates that the proposed approach is capable of tuning rewards for challenging behaviors. The simulation-based evaluation shows that the system can reach optimal solutions with fewer queries relative to baselines.
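The idea of choosing from a group of candidates and then narrowing the search can be illustrated with a small sketch. This is not the paper's system: it has no visualization, it assumes a reward that is a weighted sum of features, and it replaces the human with a simulated user who always picks the candidate closest to a hidden target. All function names and parameters (`tune`, `sample_around`, `group_size`, the shrink factor 0.6) are hypothetical.

```python
import random


def reward(weights, features):
    """Linear reward: weighted sum of feature values (an assumed reward form)."""
    return sum(w * f for w, f in zip(weights, features))


def normalize(weights):
    """Rescale weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def distance(a, b):
    """Euclidean distance between two weight vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def sample_around(center, radius, n, rng):
    """Draw n candidate weight vectors near `center` (a stand-in for zoom and filter)."""
    candidates = []
    for _ in range(n):
        perturbed = [max(1e-6, c + rng.uniform(-radius, radius)) for c in center]
        candidates.append(normalize(perturbed))
    return candidates


def simulated_user_pick(candidates, hidden_target):
    """Hypothetical user model: selects the candidate closest to the hidden preference."""
    return min(candidates, key=lambda w: distance(w, hidden_target))


def tune(hidden_target, rounds=8, group_size=20, seed=0):
    """Iterate: show a group of candidates, take one pick, then narrow the search."""
    rng = random.Random(seed)
    dim = len(hidden_target)
    center = [1.0 / dim] * dim  # start from uniform weights
    radius = 0.5
    for _ in range(rounds):
        candidates = sample_around(center, radius, group_size, rng)
        candidates.append(center)  # keep the current choice so a round never gets worse
        center = simulated_user_pick(candidates, hidden_target)
        radius *= 0.6  # shrink the sampling region around the selection
    return center
```

Each round costs one query but draws on a group of `group_size` candidates, whereas a pairwise comparison conveys a single binary choice per query. A real user would judge rendered behaviors rather than distances in weight space.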
Original language: English
Title of host publication: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Publisher: IEEE
Number of pages: 8
ISBN (Electronic): 979-8-3503-7770-5
DOIs
Publication status: Published - 2024
MoE publication type: A4 Conference publication
Event: IEEE/RSJ International Conference on Intelligent Robots and Systems - ADNEC, Abu Dhabi, United Arab Emirates
Duration: 14 Oct 2024 to 18 Oct 2024
https://iros2024-abudhabi.org/

Publication series

Name: Proceedings of the International Conference on Intelligent Robots and Systems
ISSN (Electronic): 2153-0866

Conference

Conference: IEEE/RSJ International Conference on Intelligent Robots and Systems
Abbreviated title: IROS
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 14/10/2024 to 18/10/2024
Internet address
