On Interpolating Experts and Multi-Armed Bandits

07/14/2023
by Houshuang Chen, et al.

Learning with expert advice and the multi-armed bandit problem are two classic online decision problems that differ in how information is observed in each round of the game. We study a family of problems interpolating between the two. For a vector $\mathbf{m}=(m_1,\dots,m_K)\in\mathbb{N}^K$, an instance of $\mathbf{m}$-MAB is one in which the arms are partitioned into $K$ groups, the $i$-th of which contains $m_i$ arms. Once an arm is pulled, the losses of all arms in the same group are observed. We prove tight minimax regret bounds for $\mathbf{m}$-MAB and design an optimal PAC algorithm for its pure exploration version, $\mathbf{m}$-BAI, where the goal is to identify the arm with minimum loss in as few rounds as possible. We show that the minimax regret of $\mathbf{m}$-MAB is $\Theta\left(\sqrt{T\sum_{k=1}^{K}\log(m_k+1)}\right)$ and that the minimum number of pulls for an $(\epsilon,0.05)$-PAC algorithm for $\mathbf{m}$-BAI is $\Theta\left(\frac{1}{\epsilon^2}\cdot\sum_{k=1}^{K}\log(m_k+1)\right)$. Both our upper and lower bounds for $\mathbf{m}$-MAB extend to a more general setting, namely the bandit with graph feedback, in terms of the clique cover and related graph parameters. As a consequence, we obtain tight minimax regret bounds for several families of feedback graphs.
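
To make the interpolation concrete, the following is a minimal sketch, not the authors' algorithm, that evaluates the quantities inside the two Θ(·) bounds stated above (with all constants dropped) and models the group feedback rule. The function names `regret_rate`, `sample_complexity_rate`, and `observed_arms`, the zero-based arm indexing, and the use of the natural logarithm are assumptions made for illustration only.

```python
import math


def regret_rate(m, T):
    """sqrt(T * sum_k log(m_k + 1)): the expression inside Theta(.), constants dropped."""
    return math.sqrt(T * sum(math.log(mk + 1) for mk in m))


def sample_complexity_rate(m, eps):
    """(1 / eps^2) * sum_k log(m_k + 1): the expression inside Theta(.), constants dropped."""
    return sum(math.log(mk + 1) for mk in m) / eps ** 2


def observed_arms(m, arm):
    """Arms whose losses are revealed when `arm` is pulled (its whole group).

    Arms are numbered 0, 1, ... group by group (an illustrative convention).
    """
    start = 0
    for size in m:
        if arm < start + size:
            return list(range(start, start + size))
        start += size
    raise IndexError("arm index out of range")


T, N = 10_000, 16
experts = [N]        # one group of N arms: full information
bandit = [1] * N     # N singleton groups: bandit feedback
mixed = [4, 4, 8]    # something in between

for name, m in [("experts", experts), ("mixed", mixed), ("bandit", bandit)]:
    print(name, round(regret_rate(m, T), 1))
```

With a single group of N arms the expression reduces to √(T log(N+1)), the familiar experts rate, and with N singleton groups it reduces to √(T·N·log 2), which matches the √(TN) bandit rate up to a constant. Intermediate partitions such as `[4, 4, 8]` fall between the two.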


