Best-of-Both-Worlds Algorithms for Partial Monitoring

07/29/2022
by Taira Tsuchiya, et al.

This paper considers the partial monitoring problem with k actions and d outcomes and provides the first best-of-both-worlds algorithms, whose regret is bounded polylogarithmically in the stochastic regime and near-optimally in the adversarial regime. Specifically, we show that for non-degenerate locally observable games, the regret in the stochastic regime is bounded by O(k^3 m^2 log(T) log(k_Π T) / Δ_min) and in the adversarial regime by O(k^{3/2} m √(T log(T) log k_Π)), where T is the number of rounds, m is the maximum number of distinct observations per action, Δ_min is the minimum optimality gap, and k_Π is the number of Pareto optimal actions. Moreover, we show that for non-degenerate globally observable games, the regret in the stochastic regime is bounded by O(max{c_𝒢^2 / k, c_𝒢} log(T) log(k_Π T) / Δ_min^2) and in the adversarial regime by O((max{c_𝒢^2 / k, c_𝒢} log(T) log(k_Π T))^{1/3} T^{2/3}), where c_𝒢 is a game-dependent constant. Our algorithms are based on the follow-the-regularized-leader framework, adapted to the structure of the partial monitoring problem and inspired by algorithms for online learning with feedback graphs.
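
For readers unfamiliar with follow-the-regularized-leader (FTRL), the sketch below shows the generic update in its simplest form. It is not the algorithm of the paper: it assumes a negative Shannon entropy regularizer and takes cumulative loss estimates as given, whereas the abstract does not specify the regularizer, learning-rate schedule, or loss estimators the authors use. The function name and parameters are hypothetical.

```python
import math


def ftrl_entropy_distribution(cumulative_losses, learning_rate):
    """Generic FTRL step with a negative Shannon entropy regularizer.

    Returns the minimizer over the probability simplex of
        <p, L> + (1 / learning_rate) * sum_i p_i * log(p_i),
    whose closed form is p_i proportional to exp(-learning_rate * L_i).
    This is an illustrative assumption, not the paper's regularizer.
    """
    # Shift by the smallest loss for numerical stability; the minimizer is unchanged.
    base = min(cumulative_losses)
    weights = [math.exp(-learning_rate * (loss - base)) for loss in cumulative_losses]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical cumulative loss estimates for three actions.
estimated_losses = [3.0, 1.0, 2.0]
action_distribution = ftrl_entropy_distribution(estimated_losses, learning_rate=0.5)
```

The action with the smallest cumulative loss estimate receives the largest probability, and a smaller learning rate keeps the distribution closer to uniform. In partial monitoring the losses are not observed directly, so the estimates fed into such an update would have to be built from the observed feedback symbols.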

research
06/02/2022

Nearly Optimal Best-of-Both-Worlds Algorithms for Online Learning with Feedback Graphs

This study considers online learning with general directed feedback grap...
research
04/25/2022

Uncoupled Learning Dynamics with O(log T) Swap Regret in Multiplayer Games

In this paper we establish efficient and uncoupled learning dynamics so ...
research
07/12/2019

Exploration by Optimisation in Partial Monitoring

We provide a simple and efficient algorithm for adversarial k-action d-o...
research
02/20/2023

A Blackbox Approach to Best of Both Worlds in Bandits and Beyond

Best-of-both-worlds algorithms for online learning which achieve near-op...
research
03/23/2021

Improved Analysis of Robustness of the Tsallis-INF Algorithm to Adversarial Corruptions in Stochastic Multiarmed Bandits

We derive improved regret bounds for the Tsallis-INF algorithm of Zimmer...
research
05/26/2023

Stability-penalty-adaptive Follow-the-regularized-leader: Sparsity, Game-dependency, and Best-of-both-worlds

Adaptivity to the difficulties of a problem is a key property in sequent...
research
02/24/2023

Best-of-Three-Worlds Linear Bandit Algorithm with Variance-Adaptive Regret Bounds

This paper proposes a linear bandit algorithm that is adaptive to enviro...
