When Combinatorial Thompson Sampling meets Approximation Regret

by Pierre Perrault, et al.

We study the Combinatorial Thompson Sampling policy (CTS) for combinatorial multi-armed bandit (CMAB) problems, within an approximation regret setting. Although CTS has attracted a lot of interest, it has a drawback that other usual CMAB policies do not have when the oracle is non-exact: for some oracles, CTS suffers a poor approximation regret, scaling linearly with the time horizon T [Wang and Chen, 2018]. A study is therefore needed to identify the oracles under which CTS can learn. This study was initiated by Kong et al. [2021], who gave the first approximation regret analysis of CTS for the greedy oracle, obtaining an upper bound of order π’ͺ(log(T)/Ξ”^2), where Ξ” is some minimal reward gap. In this paper, our objective is to push this study beyond the simple case of the greedy oracle. We provide the first π’ͺ(log(T)/Ξ”) approximation regret upper bound for CTS, obtained under a specific condition on the approximation oracle that allows a reduction to the exact-oracle analysis. We therefore term this condition REDUCE2EXACT, and observe that it is satisfied in many concrete examples. Moreover, it extends to the probabilistically triggered arms setting, thus capturing even more problems, such as online influence maximization.
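To make the setting concrete, the following is a minimal sketch of CTS with semi-bandit feedback, using Beta posteriors over Bernoulli base arms and a greedy oracle. Under a simple cardinality constraint, the greedy oracle reduces to selecting the k arms with the largest sampled means; all names, parameters, and the reward model here are illustrative assumptions, not the paper's actual algorithm or code.

```python
import random

def cts(n_arms, k, true_means, horizon, seed=0):
    """Sketch of Combinatorial Thompson Sampling (CTS).

    Hypothetical setup: Bernoulli base arms, semi-bandit feedback,
    and a greedy oracle under a cardinality constraint (pick k arms),
    where greedy selection is exact and equals the top-k sampled means.
    """
    rng = random.Random(seed)
    # Beta(a_i, b_i) posterior for each base arm, starting from Beta(1, 1).
    a = [1.0] * n_arms
    b = [1.0] * n_arms
    total_reward = 0.0
    for _ in range(horizon):
        # Sample a mean estimate for each base arm from its posterior.
        theta = [rng.betavariate(a[i], b[i]) for i in range(n_arms)]
        # Greedy oracle on the sampled means: for a cardinality
        # constraint this is just the k arms with largest theta_i.
        action = sorted(range(n_arms), key=lambda i: theta[i],
                        reverse=True)[:k]
        # Semi-bandit feedback: observe each played arm's outcome
        # and update its posterior.
        for i in action:
            x = 1 if rng.random() < true_means[i] else 0
            a[i] += x
            b[i] += 1 - x
            total_reward += x
    return total_reward
```

On instances where the greedy oracle is exact, as here, CTS concentrates on the optimal super-arm; the paper's point is that with genuinely approximate oracles this is no longer automatic, and a condition like REDUCE2EXACT is needed to recover logarithmic approximation regret.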


Related papers:

βˆ™ The Hardness Analysis of Thompson Sampling for Combinatorial Semi-bandits with Greedy Oracle (11/08/2021)
βˆ™ Thompson Sampling for Combinatorial Semi-Bandits (03/13/2018)
βˆ™ Simple Combinatorial Algorithms for Combinatorial Bandits: Corruptions and Approximations (06/12/2021)
βˆ™ Combinatorial Multi-armed Bandit with Probabilistically Triggered Arms: A Case with Bounded Regret (07/24/2017)
βˆ™ Thompson Sampling for Combinatorial Network Optimization in Unknown Environments (07/07/2019)
βˆ™ Online Competitive Influence Maximization (06/24/2020)
βˆ™ Cost-aware Cascading Bandits (05/22/2018)
