Preference-based Optimization

Project Overview

The Preference-based Optimization project aims to develop optimization algorithms that efficiently learn from human or system preferences, rather than relying solely on explicit numerical feedback. This approach is particularly valuable in scenarios where quantitative evaluation is difficult or subjective, including user experience design, personalized medicine, and robotics.

Demonstration

SelfSparring

The SelfSparring algorithm reduces the multi-dueling bandits problem to a conventional multi-armed bandit problem, which it solves with Thompson Sampling. Dependencies among arms are modeled with a Gaussian process prior.
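The reduction can be illustrated with a minimal sketch. It is a simplification rather than the published implementation: the Gaussian process prior is swapped for independent Beta posteriors per arm, so dependencies among arms are not modeled. The function name `self_sparring`, the `pref_matrix` input, and all parameter names are hypothetical and chosen only for this example.

```python
import numpy as np

def self_sparring(pref_matrix, n_rounds=2000, n_duel=2, seed=0):
    """Independent-arm self-sparring sketch (assumption: Beta posteriors, no GP).

    pref_matrix[i, j] is the probability that arm i beats arm j.
    Returns the Beta parameters (wins, losses) and per-arm selection counts.
    """
    rng = np.random.default_rng(seed)
    n_arms = pref_matrix.shape[0]
    wins = np.ones(n_arms)    # Beta(1, 1) prior: alpha parameters
    losses = np.ones(n_arms)  # Beta(1, 1) prior: beta parameters
    counts = np.zeros(n_arms, dtype=int)

    for _ in range(n_rounds):
        # Thompson Sampling: each slot draws its own posterior sample
        # and plays the arm that looks best under that sample.
        selected = [int(np.argmax(rng.beta(wins, losses))) for _ in range(n_duel)]
        for arm in selected:
            counts[arm] += 1

        # Duel every pair of selected arms and update both posteriors.
        for a in range(n_duel):
            for b in range(a + 1, n_duel):
                i, j = selected[a], selected[b]
                if i == j:
                    continue  # simplification: an arm dueling itself is skipped
                if rng.random() < pref_matrix[i, j]:
                    wins[i] += 1
                    losses[j] += 1
                else:
                    wins[j] += 1
                    losses[i] += 1
    return wins, losses, counts


# Usage: three arms with hypothetical utilities; arm 2 beats the others on average.
utilities = np.array([0.1, 0.5, 0.9])
pref = 0.5 + 0.5 * (utilities[:, None] - utilities[None, :])
wins, losses, counts = self_sparring(pref)
print("selection counts per arm:", counts)
```

Because each selected arm is scored only by how often it wins its duels, the multi-dueling problem is handled with the same posterior updates as an ordinary Bernoulli bandit.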

1-d Example

CoSpar

CoSpar is a preference-based learning method built on the SelfSparring algorithm. In addition to preference feedback, it accepts coactive feedback, in which the user actively suggests an improvement to an option they were shown.
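One way to picture how the two feedback types can be combined is sketched below. It assumes that a coactive suggestion is stored as an implied preference (the suggested option is preferred to the one it improves on), so both feedback types end up in a single list of preference pairs. The helper `record_feedback`, its arguments, and the integer action IDs are hypothetical and used only for illustration.

```python
from typing import List, Optional, Tuple

# A preference pair is (preferred_action, less_preferred_action).
Preference = Tuple[int, int]

def record_feedback(
    dataset: List[Preference],
    first: int,
    second: int,
    winner: int,
    suggestion: Optional[Tuple[int, int]] = None,
) -> List[Preference]:
    """Append one round of feedback to the preference dataset.

    first, second: the two actions presented to the user.
    winner: whichever of the two the user preferred.
    suggestion: optional (base_action, improved_action) coactive feedback.
    """
    if winner not in (first, second):
        raise ValueError("winner must be one of the presented actions")
    loser = second if winner == first else first
    dataset.append((winner, loser))

    if suggestion is not None:
        base, improved = suggestion
        # Assumption: a suggested improvement is treated as an implied preference.
        dataset.append((improved, base))
    return dataset


# Usage: the user prefers action 5 over action 3, then suggests action 6 as better than 5.
data: List[Preference] = []
record_feedback(data, first=3, second=5, winner=5, suggestion=(5, 6))
print(data)  # [(5, 3), (6, 5)]
```

Without coactive feedback, the `suggestion` argument is simply omitted and only the pairwise preference is recorded.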

With coactive feedback

Without coactive feedback

Research Papers

Multi-dueling bandits with dependent arms

Yanan Sui, Vincent Zhuang, Joel Burdick, Yisong Yue

Conference on Uncertainty in Artificial Intelligence (UAI), 2017


Preference-Based Learning for Exoskeleton Gait Optimization

Maegan Tucker, Ellen Novoseller, Claudia Kann, Yanan Sui, Yisong Yue, Joel Burdick, Aaron D. Ames

International Conference on Robotics and Automation (ICRA), 2020