An Improved Best-of-both-worlds Algorithm for Bandits with Delayed Feedback

08/21/2023
by Saeed Masoudian, et al.

We propose a new best-of-both-worlds algorithm for bandits with variably delayed feedback. The algorithm improves on prior work by Masoudian et al. [2022] by eliminating the need for prior knowledge of the maximal delay d_max and by providing tighter regret bounds in both regimes. The algorithm and its regret bounds are based on counts of outstanding observations (a quantity that is observed at action time) rather than on delays or the maximal delay (quantities that are only observed when feedback arrives). One major contribution is a novel technique for controlling distribution drift, based on biased loss estimators and on skipping observations with excessively large delays. Another major contribution is demonstrating that the complexity of best-of-both-worlds bandits with delayed feedback is characterized by the cumulative count of outstanding observations after skipping observations with excessively large delays, rather than by the delays or the maximal delay.
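To make the central quantity concrete, here is a minimal sketch of how the number of outstanding observations at each round t (call it sigma_t) could be tracked online. It assumes that feedback for round s arrives at the end of round s + d_s; the function name `outstanding_counts` and the parameter `skip_after` are hypothetical. In particular, `skip_after` is a fixed skipping threshold used only for illustration and is not the paper's skipping rule. The delays are used only by the simulated environment; the count itself depends only on which feedback has not yet arrived, which is why it is available at action time.

```python
from collections import defaultdict
from typing import List, Optional


def outstanding_counts(delays: List[int], skip_after: Optional[int] = None) -> List[int]:
    """Return sigma_t, the number of outstanding observations at each round t.

    Assumed convention: feedback for round s arrives at the end of round
    s + delays[s]. `skip_after` is a hypothetical fixed threshold: an
    observation outstanding for more than `skip_after` rounds is skipped
    (no longer counted, and ignored if it arrives later).
    """
    pending = set()               # rounds played whose feedback is still missing
    arrivals = defaultdict(list)  # environment side: arrival round -> source rounds
    counts = []
    for t, d in enumerate(delays):
        if skip_after is not None:
            # Skipping uses only the age t - s, which the learner knows
            # without knowing the (not yet revealed) delay.
            pending = {s for s in pending if t - s <= skip_after}
        counts.append(len(pending))  # observable before choosing the action
        pending.add(t)
        arrivals[t + d].append(t)    # the learner does not see d at this point
        for s in arrivals.pop(t, []):
            pending.discard(s)       # feedback for round s arrives now
    return counts
```

For example, `outstanding_counts([2, 0, 1, 0])` returns `[0, 1, 1, 1]`. When all feedback arrives within the horizon, the cumulative count equals the sum of the delays (here 3), because round s is counted as outstanding in exactly d_s later rounds. Skipping caps each observation's contribution at `skip_after`, so the cumulative count after skipping can be much smaller than the total delay when a few delays are very large.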

06/03/2019

Nonstochastic Multiarmed Bandits with Unrestricted Delays

We investigate multiarmed bandits with delayed feedback, where the delay...
06/29/2022

A Best-of-Both-Worlds Algorithm for Bandits with Delayed Feedback

We present a modified tuning of the algorithm of Zimmert and Seldin [202...
10/14/2019

An Optimal Algorithm for Adversarial Bandits with Arbitrary Delays

We propose a new algorithm for adversarial multi-armed bandits with unre...
07/21/2022

Delayed Feedback in Generalised Linear Bandits Revisited

The stochastic generalised linear bandit is a well-understood model for ...
05/30/2023

Delayed Bandits: When Do Intermediate Observations Help?

We study a K-armed bandit with delayed feedback and intermediate observa...
02/24/2022

Thompson Sampling with Unrestricted Delays

We investigate properties of Thompson Sampling in the stochastic multi-a...
05/03/2023

Predictive Wand: a mathematical interface design for operations with delays

Action-feedback delay during operation reduces both task performance and...