BubbleRank

By Chang Li et al.

Table of Contents

Abstract
1 INTRODUCTION
2 BACKGROUND
2.1 CLICK MODELS
2.2 STOCHASTIC CLICK BANDIT
3 ONLINE LEARNING TO RE-RANK
3.1 ALGORITHM
4 THEORETICAL ANALYSIS
4.1 REGRET BOUND
4.2 SAFETY
4.3 DISCUSSION

Summary

BubbleRank is a bandit algorithm for safe online learning to re-rank that combines the strengths of the offline and online settings. It gradually improves an initial base list by exchanging neighboring items, raising the quality of the displayed lists over time, and it explores safely by bubbling up more attractive items. The theoretical analysis bounds the n-step regret and proves that the algorithm is safe. BubbleRank targets the re-ranking stage of production ranking systems, adaptively re-ranking the items returned by an upstream ranker. The paper reviews click models and the stochastic click bandit framework, then presents the BubbleRank algorithm with a detailed explanation and theoretical guarantees.
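The exchange step works like a randomized bubble sort over the displayed list. The Python sketch below illustrates that idea under a simulated cascade click model; the attraction values, the Hoeffding-style confidence rule, and the helper names (simulate_clicks, settled) are illustrative assumptions for this sketch, not the paper's exact specification.

```python
import math
import random
from collections import defaultdict


def simulate_clicks(displayed, attraction):
    """Simulate feedback with a simple cascade click model (an assumption for
    this sketch): the user scans top-down, clicks an item with probability
    equal to its attraction, and stops after the first click."""
    clicks = [0] * len(displayed)
    for pos, item in enumerate(displayed):
        if random.random() < attraction[item]:
            clicks[pos] = 1
            break
    return clicks


def settled(i, j, wins, trials, delta):
    """Illustrative Hoeffding-style test: a pair is considered settled once
    the observed click gap exceeds a confidence radius."""
    n = trials[(i, j)]
    if n == 0:
        return False
    gap = abs(wins[(i, j)] - wins[(j, i)])
    return gap > math.sqrt(2 * n * math.log(1 / delta))


def bubblerank(base_list, attraction, n_steps=5000, delta=0.05):
    """BubbleRank-style re-ranking sketch: explore by swapping undecided
    neighboring pairs, then bubble up items that collect clearly more clicks."""
    ranking = list(base_list)   # current list, improved in place
    wins = defaultdict(int)     # wins[(i, j)]: times i was clicked over j
    trials = defaultdict(int)   # trials[(i, j)]: times the pair was compared
    for _ in range(n_steps):
        # Randomly start at an even or odd position and swap each undecided
        # neighboring pair in the displayed list (exploration).
        offset = random.choice([0, 1])
        displayed = list(ranking)
        explored = []
        for k in range(offset, len(ranking) - 1, 2):
            i, j = ranking[k], ranking[k + 1]
            if not settled(i, j, wins, trials, delta):
                displayed[k], displayed[k + 1] = j, i
                explored.append(k)
        clicks = simulate_clicks(displayed, attraction)
        # Credit the clicked item of each explored pair.
        for k in explored:
            top, bottom = displayed[k], displayed[k + 1]
            if clicks[k] != clicks[k + 1]:
                winner, loser = (top, bottom) if clicks[k] else (bottom, top)
                wins[(winner, loser)] += 1
            trials[(top, bottom)] += 1
            trials[(bottom, top)] += 1
        # Bubble up: commit a swap once a lower item has won decisively.
        for k in range(len(ranking) - 1):
            i, j = ranking[k], ranking[k + 1]
            if settled(i, j, wins, trials, delta) and wins[(j, i)] > wins[(i, j)]:
                ranking[k], ranking[k + 1] = j, i
    return ranking


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical item attractions; the base list starts out of order.
    attraction = {"a": 0.10, "b": 0.70, "c": 0.30, "d": 0.05}
    print(bubblerank(["a", "c", "b", "d"], attraction))
```

In this sketch, safety shows up in the fact that every displayed list differs from the current list only by swaps of neighboring items, so no single step can show a list that is much worse than the base list.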