When Combinatorial Thompson Sampling meets Approximation Regret

02/22/2023
by Pierre Perrault, et al.

We study the Combinatorial Thompson Sampling policy (CTS) for combinatorial multi-armed bandit problems (CMAB), within an approximation regret setting. Although CTS has attracted a lot of interest, it has a drawback that other usual CMAB policies do not have when considering non-exact oracles: for some oracles, CTS has a poor approximation regret (scaling linearly with the time horizon T) [Wang and Chen, 2018]. A study is then necessary to discriminate the oracles on which CTS could learn. This study was started by Kong et al. [2021]: they gave the first approximation regret analysis of CTS for the greedy oracle, obtaining an upper bound of order 𝒪(log(T)/Δ^2), where Δ is some minimal reward gap. In this paper, our objective is to push this study further than the simple case of the greedy oracle. We provide the first 𝒪(log(T)/Δ) approximation regret upper bound for CTS, obtained under a specific condition on the approximation oracle, allowing a reduction to the exact oracle analysis. We thus term this condition REDUCE2EXACT, and observe that it is satisfied in many concrete examples. Moreover, it can be extended to the probabilistically triggered arms setting, thus capturing even more problems, such as online influence maximization.
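To make the setting concrete, here is a minimal sketch of a CTS loop. It is not taken from the paper. It assumes Bernoulli arms, independent Beta(1, 1) priors, semi-bandit feedback, and a reward equal to the sum of the chosen arms' means. The names `cts` and `top_m_oracle` and all parameters are hypothetical. The oracle is a plug-in argument: the example passes an exact top-m oracle, whereas the approximation regret setting discussed above corresponds to passing an oracle that only approximately maximizes the reward for the sampled means.

```python
import random


def top_m_oracle(theta, m):
    """Hypothetical exact oracle: indices of the m largest sampled means."""
    return sorted(range(len(theta)), key=lambda i: theta[i], reverse=True)[:m]


def cts(true_means, m, horizon, oracle=top_m_oracle, seed=0):
    """Sketch of Combinatorial Thompson Sampling with Bernoulli arms.

    Returns the number of times each base arm was played.
    """
    rng = random.Random(seed)
    n = len(true_means)
    alpha = [1] * n  # Beta posterior parameters, starting from Beta(1, 1)
    beta = [1] * n
    pulls = [0] * n
    for _ in range(horizon):
        # 1. Sample a mean vector from the current posterior.
        theta = [rng.betavariate(alpha[i], beta[i]) for i in range(n)]
        # 2. Ask the (possibly approximate) oracle for an action.
        action = oracle(theta, m)
        # 3. Observe semi-bandit feedback and update the played arms only.
        for i in action:
            outcome = 1 if rng.random() < true_means[i] else 0
            alpha[i] += outcome
            beta[i] += 1 - outcome
            pulls[i] += 1
    return pulls


if __name__ == "__main__":
    print(cts([0.9, 0.8, 0.2, 0.1], m=2, horizon=2000))
```

With an exact oracle such as this one, the posterior samples concentrate and the high-mean arms dominate the play counts. The linear regret reported for some non-exact oracles arises when the oracle's output for the sampled means is not informative enough for the policy to leave a bad action.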


