SelfSparring
The SelfSparring algorithm reduces the multi-dueling bandits problem to a conventional bandit setting that can be solved with Thompson Sampling, and it can model dependencies between arms with a Gaussian process prior.
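As a rough illustration of that reduction, the following is a minimal sketch only, not the authors' implementation. It assumes independent arms with Beta posteriors instead of the Gaussian process prior, and it assumes a known preference matrix for simulating duels. The names `self_sparring`, `pref`, `m`, and `rounds` are hypothetical, and skipping self-duels is a simplification made for this sketch.

```python
import random


def self_sparring(pref, m=2, rounds=2000, seed=0):
    """Independent-arm SelfSparring sketch using Beta posteriors.

    pref[i][j] is an assumed probability that arm i beats arm j
    (used here only to simulate feedback).
    Returns per-arm win and loss counts after `rounds` iterations.
    """
    rng = random.Random(seed)
    k = len(pref)
    wins = [0] * k
    losses = [0] * k
    for _ in range(rounds):
        # Pick m arms; each pick is its own Thompson sample over all arms.
        chosen = []
        for _ in range(m):
            samples = [rng.betavariate(wins[a] + 1, losses[a] + 1)
                       for a in range(k)]
            chosen.append(max(range(k), key=samples.__getitem__))
        # Duel every pair of chosen arms and update the win/loss counts.
        for x in range(m):
            for y in range(x + 1, m):
                i, j = chosen[x], chosen[y]
                if i == j:
                    continue  # sketch simplification: ignore self-duels
                if rng.random() < pref[i][j]:
                    wins[i] += 1
                    losses[j] += 1
                else:
                    wins[j] += 1
                    losses[i] += 1
    return wins, losses


# Hypothetical 3-arm problem in which arm 2 is preferred over the others.
example_pref = [
    [0.5, 0.6, 0.1],
    [0.4, 0.5, 0.1],
    [0.9, 0.9, 0.5],
]
example_wins, example_losses = self_sparring(example_pref, m=2, rounds=500)
```

Each arm is treated as an ordinary Bernoulli bandit arm whose "reward" is winning a duel against the other sampled arms, which is what lets a standard Thompson Sampling update be reused. Modeling dependent arms would replace the per-arm Beta posteriors with a shared Gaussian process posterior, which this sketch does not attempt.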
The Preference-based Optimization project aims to develop optimization algorithms that efficiently learn from human or system preferences, rather than relying solely on explicit numerical feedback. This approach is particularly valuable in scenarios where quantitative evaluation is difficult or subjective, including user experience design, personalized medicine, and robotics.
CoSpar is a preference-based learning method. It builds on the SelfSparring algorithm and accepts coactive feedback from users in addition to preference feedback.
Yanan Sui, Vincent Zhuang, Joel Burdick, Yisong Yue
Conference on Uncertainty in Artificial Intelligence (UAI), 2017