Banker Online Mirror Descent

06/16/2021
by Jiatai Huang, et al.

We propose Banker-OMD, a novel framework generalizing the classical Online Mirror Descent (OMD) technique in online learning algorithm design. Banker-OMD allows algorithms to robustly handle delayed feedback and offers a general methodology for achieving Õ(√(T) + √(D))-style regret bounds in a variety of delayed-feedback online learning tasks, where T is the time horizon and D is the total feedback delay. We demonstrate the power of Banker-OMD with applications to three delayed-feedback bandit scenarios: delayed adversarial multi-armed bandits (MAB), delayed adversarial linear bandits, and a novel delayed best-of-both-worlds MAB setting. Banker-OMD achieves nearly optimal performance in all three settings. In particular, it yields the first delayed adversarial linear bandit algorithm achieving Õ(poly(n)(√(T) + √(D))) regret.
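For context, the sketch below illustrates the classical baseline that Banker-OMD generalizes: online mirror descent with the negative-entropy regularizer (exponential weights) on the probability simplex, run on adversarial MAB where each importance-weighted loss estimate is applied only once its delayed feedback arrives. This is an illustrative toy under simplifying assumptions, not the paper's Banker-OMD algorithm; the names `delayed_omd_mab`, `loss_fn`, `delays`, and `eta` are hypothetical.

```python
import numpy as np

def delayed_omd_mab(loss_fn, delays, n_arms, horizon, eta=0.1, seed=0):
    """Entropic OMD (exponential weights) for adversarial MAB where the
    feedback of round t only becomes visible at round t + delays[t].

    loss_fn(t, arm) -> loss in [0, 1] of `arm` at round t (the adversary).
    delays[t]       -> nonnegative integer feedback delay of round t.
    """
    rng = rng = np.random.default_rng(seed)
    log_w = np.zeros(n_arms)   # mirror point: log-weights (neg-entropy OMD)
    pending = {}               # arrival round -> [(play round, arm, prob), ...]
    plays = []

    for t in range(horizon):
        # Current OMD iterate on the simplex, computed stably from log-weights.
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        arm = rng.choice(n_arms, p=p)
        plays.append(arm)
        # Feedback for round t is queued until round t + delays[t].
        pending.setdefault(t + delays[t], []).append((t, arm, p[arm]))

        # Apply OMD updates for all feedback that arrives at this round.
        for (s, a, prob) in pending.pop(t, []):
            est = loss_fn(s, a) / prob   # importance-weighted loss estimate
            log_w[a] -= eta * est        # entropic mirror-descent step
    return plays
```

With no delays (delays[t] = 0 for all t) this reduces to standard EXP3-style OMD; the point of the Banker-OMD framework is that it controls the regret cost of the queued, not-yet-applied updates, giving the Õ(√(T) + √(D)) dependence on the total delay D.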


