Batch-Size Independent Regret Bounds for the Combinatorial Multi-Armed Bandit Problem

by   Nadav Merlis, et al.

We consider the combinatorial multi-armed bandit (CMAB) problem, where the reward function is nonlinear. In this setting, the agent chooses a batch of arms on each round and receives feedback from each arm of the batch. The reward that the agent aims to maximize is a function of the selected arms and their expectations. In many applications, the reward function is highly nonlinear, and the performance of existing algorithms relies on a global Lipschitz constant to encapsulate the function's nonlinearity. This may lead to loose regret bounds, since by itself, a large gradient does not necessarily cause a large regret, but only in regions where the uncertainty in the reward's parameters is high. To overcome this problem, we introduce a new smoothness criterion, which we term Gini-weighted smoothness, that takes into account both the nonlinearity of the reward and concentration properties of the arms. We show that a linear dependence of the regret in the batch size in existing algorithms can be replaced by this smoothness parameter. This, in turn, leads to much tighter regret bounds when the smoothness parameter is batch-size independent. For example, in the probabilistic maximum coverage (PMC) problem, that has many applications, including influence maximization, diverse recommendations and more, we achieve dramatic improvements in the upper bounds. We also prove matching lower bounds for the PMC problem and show that our algorithm is tight, up to a logarithmic factor in the problem's parameters.


page 1

page 2

page 3

page 4


Tight Lower Bounds for Combinatorial Multi-Armed Bandits

The Combinatorial Multi-Armed Bandit problem is a sequential decision-ma...

Multi-armed Bandit Algorithm against Strategic Replication

We consider a multi-armed bandit problem in which a set of arms is regis...

Improving Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms and Its Applications

We study combinatorial multi-armed bandit with probabilistically trigger...

Online Stochastic Optimization under Correlated Bandit Feedback

In this paper we consider the problem of online stochastic optimization ...

Dual-Mandate Patrols: Multi-Armed Bandits for Green Security

Conservation efforts in green security domains to protect wildlife and f...

Filtered Poisson Process Bandit on a Continuum

We consider a version of the continuum armed bandit where an action indu...

Repeated Principal-Agent Games with Unobserved Agent Rewards and Perfect-Knowledge Agents

Motivated by a number of real-world applications from domains like healt...

Please sign up or login with your details

Forgot password? Click here to reset