Learning Contextual Bandits Through Perturbed Rewards

01/24/2022
by Yiling Jia, et al.

Thanks to the power of representation learning, neural contextual bandit algorithms demonstrate remarkable performance improvement over their classical counterparts. But because their exploration has to be performed in the entire neural network parameter space to obtain nearly optimal regret, the resulting computational cost is prohibitively high. We perturb the rewards when updating the neural network to eliminate the need for explicit exploration and its computational overhead. We prove that an Õ(d̃√T) regret upper bound is still achievable under standard regularity conditions, where T is the number of rounds of interaction and d̃ is the effective dimension of a neural tangent kernel matrix. Extensive comparisons with several benchmark contextual bandit algorithms, including two recent neural contextual bandit models, demonstrate the effectiveness and computational efficiency of our proposed neural bandit algorithm.
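The idea of exploring through perturbed rewards can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the two-layer ReLU network, the regularization toward the initial weights, the synthetic reward function, and all names and hyperparameters (`fit_perturbed`, `sigma`, `lam`, `lr`, `steps`) are assumptions chosen for illustration. In each round the model is refit on the observed rewards plus fresh Gaussian noise, and the arm with the highest predicted reward is pulled greedily, with no confidence bound or posterior sampling step.

```python
import numpy as np

def init_params(rng, dim, hidden):
    # Hypothetical two-layer ReLU network: f(x) = w2 . relu(W1 x)
    return {"W1": rng.normal(0.0, 1.0 / np.sqrt(dim), size=(hidden, dim)),
            "w2": rng.normal(0.0, 1.0 / np.sqrt(hidden), size=hidden)}

def predict(params, X):
    # X has shape (n, dim); returns n predicted rewards
    return np.maximum(X @ params["W1"].T, 0.0) @ params["w2"]

def fit_perturbed(init, X, r, sigma, lam, lr, steps, rng):
    # Perturb every observed reward with fresh Gaussian noise, then run
    # gradient descent from the initial weights on the regularized squared loss.
    y = r + rng.normal(0.0, sigma, size=r.shape)
    W1, w2 = init["W1"].copy(), init["w2"].copy()
    n = len(y)
    for _ in range(steps):
        Z = X @ W1.T                      # pre-activations, (n, hidden)
        H = np.maximum(Z, 0.0)            # hidden features
        err = H @ w2 - y                  # residuals against perturbed targets
        g_w2 = H.T @ err / n + lam * (w2 - init["w2"])
        g_Z = np.outer(err, w2) * (Z > 0)
        g_W1 = g_Z.T @ X / n + lam * (W1 - init["W1"])
        W1 -= lr * g_W1
        w2 -= lr * g_w2
    return {"W1": W1, "w2": w2}

def run(T=200, n_arms=4, dim=5, hidden=16, sigma=0.1, lam=0.01,
        lr=0.1, steps=50, seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=dim)
    theta /= np.linalg.norm(theta)
    init = init_params(rng, dim, hidden)
    params = init
    X_hist, r_hist, regret = [], [], 0.0
    for _ in range(T):
        arms = rng.normal(size=(n_arms, dim))
        arms /= np.linalg.norm(arms, axis=1, keepdims=True)
        mean = (arms @ theta) ** 2        # synthetic nonlinear reward (assumption)
        a = int(np.argmax(predict(params, arms)))   # purely greedy choice
        X_hist.append(arms[a])
        r_hist.append(mean[a] + rng.normal(0.0, 0.1))
        regret += mean.max() - mean[a]
        # Randomness in the refit targets is the only source of exploration.
        params = fit_perturbed(init, np.array(X_hist), np.array(r_hist),
                               sigma, lam, lr, steps, rng)
    return regret

if __name__ == "__main__":
    print("cumulative regret:", run())
```

The per-round cost in this sketch is one model refit and one forward pass per arm, which is the kind of saving the abstract attributes to dropping explicit exploration in the full parameter space.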


research
11/11/2019

Neural Contextual Bandits with Upper Confidence Bound-Based Exploration

We study the stochastic contextual bandit problem, where the reward is g...
research
02/20/2019

AdaLinUCB: Opportunistic Learning for Contextual Bandits

In this paper, we propose and study opportunistic contextual bandits - a...
research
06/29/2021

Regularized OFU: an Efficient UCB Estimator for Non-linear Contextual Bandit

Balancing exploration and exploitation (EE) is a fundamental problem in ...
research
12/03/2020

Neural Contextual Bandits with Deep Representation and Shallow Exploration

We study a general class of contextual bandits, where each context-actio...
research
06/13/2022

Scalable Exploration for Neural Online Learning to Rank with Perturbed Feedback

Deep neural networks (DNNs) demonstrate significant advantages in improv...
research
02/11/2022

Efficient Kernel UCB for Contextual Bandits

In this paper, we tackle the computational efficiency of kernelized UCB ...
research
02/20/2019

A Note on Bounding Regret of the C^2UCB Contextual Combinatorial Bandit

We revisit the proof by Qin et al. (2014) of bounded regret of the C^2UC...
