Bandits with Deterministically Evolving States

07/21/2023
by Khashayar Khosravi, et al.

We propose a model, which we call Bandits with Deterministically Evolving States, for learning with bandit feedback while accounting for deterministically evolving and unobservable states. The workhorse applications of our model are learning for recommendation systems and learning for online ads. In both cases, the reward that the algorithm obtains at each round is a function of the short-term reward of the action chosen and of how “healthy” the system is (i.e., as measured by its state). For example, in recommendation systems, the reward that the platform obtains from a user's engagement with a particular type of content depends not only on the inherent features of the specific content, but also on how the user's preferences have evolved as a result of interacting with other types of content on the platform. Our general model accounts for the different rates $\lambda \in [0,1]$ at which the state evolves (e.g., how fast a user's preferences shift as a result of previous content consumption) and encompasses standard multi-armed bandits as a special case. The goal of the algorithm is to minimize a notion of regret against the best fixed sequence of arms pulled. We analyze online learning algorithms for any possible parametrization of the evolution rate $\lambda$. Specifically, the regret rates obtained are: for $\lambda \in [0, 1/T^2]$: $O(\sqrt{KT})$; for $\lambda = T^{-a/b}$ with $b < a < 2b$: $O(T^{b/a})$; for $\lambda \in (1/T, 1 - 1/\sqrt{T})$: $O(K^{1/3} T^{2/3})$; and for $\lambda \in [1 - 1/\sqrt{T}, 1]$: $O(K\sqrt{T})$.
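To make the four regimes in the abstract easier to compare, the following is a minimal sketch that maps an evolution rate, horizon T, and number of arms K to the stated regret rate. The function name `regret_rate` is hypothetical and not from the paper, the returned value ignores constant factors, and the requirement T >= 4 is an assumption added here so that the intervals are ordered as written. The abstract does not list a rate for the boundary point where the rate equals exactly 1/T (it requires b < a strictly), so the sketch rejects that input rather than guessing.

```python
import math


def regret_rate(lam: float, T: int, K: int) -> tuple[str, float]:
    """Return (regime label, bound value up to constants) for evolution rate lam.

    Hypothetical helper: it only encodes the regimes quoted in the abstract.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if T < 4 or K < 1:
        # Assumption: T >= 4 guarantees 1/T < 1 - 1/sqrt(T).
        raise ValueError("this sketch assumes T >= 4 and K >= 1")

    if lam <= 1.0 / T**2:
        # lam in [0, 1/T^2]
        return ("sqrt(KT)", math.sqrt(K * T))
    if lam < 1.0 / T:
        # lam = T^(-a/b) with b < a < 2b; recover the exponent a/b from lam.
        a_over_b = -math.log(lam) / math.log(T)
        return ("T^(b/a)", T ** (1.0 / a_over_b))
    if lam == 1.0 / T:
        # Boundary point not covered by the regimes listed in the abstract.
        raise ValueError("no rate is stated for lam == 1/T")
    if lam < 1.0 - 1.0 / math.sqrt(T):
        # lam in (1/T, 1 - 1/sqrt(T))
        return ("K^(1/3) T^(2/3)", K ** (1.0 / 3.0) * T ** (2.0 / 3.0))
    # lam in [1 - 1/sqrt(T), 1]
    return ("K sqrt(T)", K * math.sqrt(T))
```

For instance, `regret_rate(0.5, 10_000, 5)` falls in the third regime, while `regret_rate(0.995, 10_000, 5)` falls in the last one because 0.995 is at least 1 - 1/100.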

research
06/18/2020

Learning by Repetition: Stochastic Multi-armed Bandits under Priming Effect

We study the effect of persistence of engagement on learning in a stocha...
research
11/14/2019

Unreliable Multi-Armed Bandits: A Novel Approach to Recommendation Systems

We use a novel modification of Multi-Armed Bandits to create a new model...
research
02/12/2022

Online Bayesian Recommendation with No Regret

We introduce and study the online Bayesian recommendation problem for a ...
research
07/06/2018

Combinatorial Bandits for Incentivizing Agents with Dynamic Preferences

The design of personalized incentives or recommendations to improve user...
research
09/21/2020

Bandits Under The Influence (Extended Version)

Recommender systems should adapt to user interests as the latter evolve....
research
01/05/2021

Sequential Choice Bandits with Feedback for Personalizing users' experience

In this work, we study sequential choice bandits with feedback. We propo...
research
01/04/2018

Lazy Restless Bandits for Decision Making with Limited Observation Capability: Applications in Wireless Networks

In this work we formulate the problem of restless multi-armed bandits wi...
