Summary
The paper introduces non-compliant bandits, a setting focused on learning actions that are both rewarding and compliant with downstream tasks. It proposes two algorithms, CompUCB and CompTS, which handle non-compliance by acting optimistically with respect to both estimated compliance probabilities and mean rewards. The reward model is linear and the compliance model is logistic, which keeps both algorithms computationally efficient. The paper analyzes the algorithms in detail, proving regret bounds and discussing the assumptions under which those bounds hold.
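
To make the high-level description concrete, here is a minimal Python sketch of the kind of index a CompUCB-style policy might compute: a ridge-regression estimate of a linear reward model, a logistic estimate of compliance, and an optimistic score that weights the reward UCB by the estimated compliance probability. The class name `CompUCBSketch`, the product form of the index, and all estimator details are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class CompUCBSketch:
    """Hypothetical sketch of a compliance-aware UCB index (not the paper's CompUCB)."""

    def __init__(self, dim, alpha=1.0, lam=1.0):
        self.alpha = alpha              # width of the optimism bonus (assumed form)
        self.A = lam * np.eye(dim)      # regularized Gram matrix for the linear reward model
        self.b = np.zeros(dim)          # running sum of feature * observed reward
        self.w = np.zeros(dim)          # logistic compliance-model parameters
        self.lr = 0.1                   # step size for the logistic update

    def index(self, x):
        theta = np.linalg.solve(self.A, self.b)                # ridge estimate of reward weights
        bonus = self.alpha * np.sqrt(x @ np.linalg.solve(self.A, x))
        p_comply = sigmoid(self.w @ x)                         # estimated compliance probability
        return p_comply * (x @ theta + bonus)                  # optimism weighted by compliance

    def update(self, x, complied, reward):
        # One stochastic gradient step on the logistic log-likelihood
        # of the observed compliance indicator.
        self.w += self.lr * (complied - sigmoid(self.w @ x)) * x
        # In this sketch, the reward model is updated only on compliant rounds.
        if complied:
            self.A += np.outer(x, x)
            self.b += reward * x

# Tiny synthetic loop showing how the policy would be used.
rng = np.random.default_rng(0)
bandit = CompUCBSketch(dim=3)
arms = rng.normal(size=(5, 3))                                 # five fixed feature vectors
for t in range(100):
    x = arms[int(np.argmax([bandit.index(a) for a in arms]))]
    complied = rng.random() < sigmoid(x @ np.ones(3))          # synthetic compliance model
    reward = float(x @ np.array([1.0, -0.5, 0.2]) + 0.1 * rng.normal())
    bandit.update(x, complied, reward)
```

Both model fits here are closed-form or single gradient steps, which is consistent with the summary's point that a linear reward model plus a logistic compliance model keeps the algorithms computationally efficient.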