Non-Compliant Bandits

By Branislav Kveton et al.
Published on Oct. 21, 2023

Table of Contents

1. Introduction
2. Setting
3. Algorithms
4. Analysis

Summary

Non-Compliant Bandits introduces non-compliant bandit algorithms, which learn rewarding actions that comply with downstream tasks. The paper proposes two algorithms, CompUCB and CompTS, to handle non-compliance in bandit settings. The algorithms act optimistically with respect to compliance probabilities and mean rewards. The reward model is linear and the compliance model is logistic, which makes the algorithms computationally efficient. The paper analyzes both algorithms, proving regret bounds and discussing the assumptions under which those bounds hold.
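
To make the idea of acting optimistically on both compliance and reward concrete, here is a minimal sketch. It is not the paper's CompUCB. It assumes (these are illustrative choices, not taken from the paper) that an action is scored by the product of an upper confidence bound on its logistic compliance probability and an upper confidence bound on its linear mean reward, and that both bounds share one confidence width. The names optimistic_index, choose_action, theta_hat, w_hat, cov_inv, and alpha are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, used here for the assumed compliance model."""
    return 1.0 / (1.0 + np.exp(-z))


def optimistic_index(x, theta_hat, w_hat, cov_inv, alpha=1.0):
    """Hypothetical score for one action with feature vector x.

    Assumption: score = (optimistic compliance probability)
                        * (optimistic mean reward).
    """
    # Confidence width; sharing it between both models is an assumption.
    width = alpha * np.sqrt(x @ cov_inv @ x)
    reward_ucb = x @ theta_hat + width          # linear reward model
    comply_ucb = sigmoid(x @ w_hat + width)     # logistic compliance model
    return comply_ucb * reward_ucb


def choose_action(features, theta_hat, w_hat, cov_inv, alpha=1.0):
    """Return the index of the action with the highest hypothetical score."""
    scores = [optimistic_index(x, theta_hat, w_hat, cov_inv, alpha)
              for x in features]
    return int(np.argmax(scores))


# Two actions: action 0 has the higher mean reward but is rarely complied
# with; action 1 has a lower mean reward but high compliance.
features = np.eye(2)
theta_hat = np.array([1.0, 0.5])    # estimated reward parameters
w_hat = np.array([-2.0, 2.0])       # estimated compliance parameters
cov_inv = 0.01 * np.eye(2)          # inverse of an assumed design matrix

chosen = choose_action(features, theta_hat, w_hat, cov_inv)
print(chosen)
```

In this toy instance the sketch picks action 1: its lower mean reward is outweighed by its much higher compliance probability, which is the kind of trade-off the summary describes.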