No Discounted-Regret Learning in Adversarial Bandits with Delays

03/08/2021
by Ilai Bistritz, et al.

Consider a player who, in each round t of T rounds, chooses an action and observes the incurred cost after a delay of $d_t$ rounds. The cost functions and the delay sequence are chosen by an adversary. We show that even if the players' algorithms lose their "no regret" property because the delays are too large, the expected discounted ergodic distribution of play converges to the set of coarse correlated equilibria (CCE) if the algorithms have "no discounted-regret". For a zero-sum game, we show that no discounted-regret is sufficient for the discounted ergodic average of play to converge to the set of Nash equilibria. We prove that the FKM algorithm with n dimensions achieves a regret of $O(nT^{3/4} + \sqrt{n}\,T^{1/3}D^{1/3})$ and that the EXP3 algorithm with K arms achieves a regret of $O(\sqrt{\ln K\,(KT + D)})$, even when $D = \sum_{t=1}^{T} d_t$ and T are unknown. These bounds use a novel doubling trick that provably retains the regret bound that holds when D and T are known. Using these bounds, we show that EXP3 and FKM have no discounted-regret even for $d_t = O(t \log t)$. Therefore, the CCE of an unknown finite or convex game can be approximated via simulation even when only delayed bandit feedback is available.
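To make the delayed-feedback setting concrete, the following is a minimal sketch of an EXP3-style learner whose cost for round t only becomes available at the end of round t + d_t. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: the function name `exp3_with_delays`, the fixed step size `eta`, and the convention that feedback is processed at the end of its arrival round are all hypothetical choices, and the doubling trick for unknown D and T is not implemented.

```python
import math
import random
from collections import defaultdict


def exp3_with_delays(costs, delays, eta, seed=0):
    """EXP3-style play where the cost of round t arrives at round t + delays[t].

    costs[t][a] is the cost in [0, 1] of arm a in round t (fixed in advance here;
    an adversary could choose it differently). Returns (chosen arms, total cost).
    Assumption: eta is a fixed step size supplied by the caller.
    """
    rng = random.Random(seed)
    T, K = len(costs), len(costs[0])
    cum_est = [0.0] * K           # cumulative importance-weighted cost estimates
    arrivals = defaultdict(list)  # arrival round -> [(arm, probability, cost)]
    chosen, total = [], 0.0

    for t in range(T):
        # Exponential weights over the estimates (shifted by the min for stability).
        low = min(cum_est)
        weights = [math.exp(-eta * (c - low)) for c in cum_est]
        norm = sum(weights)
        probs = [w / norm for w in weights]

        arm = rng.choices(range(K), weights=probs)[0]
        chosen.append(arm)
        total += costs[t][arm]

        # The learner does not see this cost until round t + delays[t].
        arrivals[t + delays[t]].append((arm, probs[arm], costs[t][arm]))

        # Process every piece of feedback that arrives at the end of round t,
        # using the probability with which the arm was originally played.
        for a, prob, cost in arrivals.pop(t, []):
            cum_est[a] += cost / prob

    return chosen, total


# Hypothetical usage: 3 arms, arm 0 is always free, constant delay of 5 rounds.
T, K = 2000, 3
costs = [[0.0, 1.0, 1.0] for _ in range(T)]
delays = [5] * T
D = sum(delays)
# Assumed step size, chosen only to mirror the shape of the stated regret bound.
eta = math.sqrt(math.log(K) / (K * T + D))
chosen, total = exp3_with_delays(costs, delays, eta)
print(chosen.count(0), total)
```

Feedback whose arrival round falls beyond the last round is simply never used in this sketch. The step size above needs D and T in advance, which is exactly the knowledge the doubling trick mentioned in the abstract is meant to remove.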


research
10/01/2020

Unknown Delay for Adversarial Bandit Setting with Multiple Play

This paper addresses the problem of unknown delays in adversarial multi-...
research
02/27/2023

Equilibrium Bandits: Learning Optimal Equilibria of Unknown Dynamics

Consider a decision-maker that can pick one out of K actions to control ...
research
11/09/2018

Policy Regret in Repeated Games

The notion of policy regret in online learning is a well defined? perfor...
research
05/17/2022

Delaytron: Efficient Learning of Multiclass Classifiers with Delayed Bandit Feedbacks

In this paper, we present online algorithm called Delaytron for learning...
research
10/18/2021

Game Redesign in No-regret Game Playing

We study the game redesign problem in which an external designer has the...
research
05/20/2016

Adversarial Delays in Online Strongly-Convex Optimization

We consider the problem of strongly-convex online optimization in presen...
research
06/03/2019

Nonstochastic Multiarmed Bandits with Unrestricted Delays

We investigate multiarmed bandits with delayed feedback, where the delay...
