Combinatorial Neural Bandits

05/31/2023
by TaeHyun Hwang, et al.

We consider a contextual combinatorial bandit problem in which, in each round, a learning agent selects a subset of arms and receives feedback on the selected arms according to their scores. The score of an arm is an unknown function of the arm's feature. Approximating this unknown score function with deep neural networks, we propose two algorithms: Combinatorial Neural UCB and Combinatorial Neural Thompson Sampling. We prove that Combinatorial Neural UCB achieves $\tilde{\mathcal{O}}(\tilde{d}\sqrt{T})$ or $\tilde{\mathcal{O}}(\sqrt{\tilde{d}TK})$ regret, where $\tilde{d}$ is the effective dimension of a neural tangent kernel matrix, $K$ is the number of arms in each selected subset, and $T$ is the time horizon. For Combinatorial Neural Thompson Sampling, we adapt an optimistic sampling technique to ensure the optimism of the sampled combinatorial action, achieving a worst-case (frequentist) regret of $\tilde{\mathcal{O}}(\tilde{d}\sqrt{TK})$. To the best of our knowledge, these are the first combinatorial neural bandit algorithms with regret guarantees. In particular, Combinatorial Neural Thompson Sampling is the first Thompson sampling algorithm with worst-case regret guarantees for the general contextual combinatorial bandit problem. Numerical experiments demonstrate the superior performance of the proposed algorithms.
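The two selection rules described above can be illustrated with a small, self-contained sketch. It is not the authors' implementation and rests on several stated assumptions: the two-layer ReLU network is fixed rather than trained, the exploration width $\sqrt{g^\top Z^{-1} g}$ (with $g$ the network gradient for an arm) omits whatever width normalization and confidence constants the paper uses, the combinatorial oracle is assumed to be a plain top-K selection, and "optimistic sampling" is taken to mean keeping the largest of several Gaussian draws per arm. All names (`score_and_grad`, `ucb_select`, `ts_select`, `top_k`, `GAMMA`, `NU`, `N_SAMPLES`) are hypothetical.

```python
import math
import random

# Toy sizes and hyperparameters (all hypothetical, chosen only for illustration).
random.seed(0)
N, D, M_HIDDEN, K = 6, 3, 8, 2      # arms, feature dim, hidden width, subset size
LAMBDA, GAMMA, NU, N_SAMPLES = 1.0, 1.0, 1.0, 5
P = M_HIDDEN * (D + 1)              # number of network parameters

# Fixed (untrained) two-layer ReLU network: f(x) = (1/sqrt(m)) * sum_j v_j * relu(w_j . x).
W = [[random.gauss(0, 1) for _ in range(D)] for _ in range(M_HIDDEN)]
V = [random.gauss(0, 1) for _ in range(M_HIDDEN)]

# Z^{-1}, where Z = lambda * I + sum of g g^T over previously selected arms.
Z_inv = [[1.0 / LAMBDA if i == j else 0.0 for j in range(P)] for i in range(P)]


def score_and_grad(x):
    """Return f(x) and the flattened gradient of f with respect to (v, w)."""
    scale = 1.0 / math.sqrt(M_HIDDEN)
    f, grad_v, grad_w = 0.0, [], []
    for w_j, v_j in zip(W, V):
        pre = sum(a * b for a, b in zip(w_j, x))
        act = max(pre, 0.0)
        f += scale * v_j * act
        grad_v.append(scale * act)                                       # df/dv_j
        grad_w.extend(scale * v_j * xi if pre > 0 else 0.0 for xi in x)  # df/dw_j
    return f, grad_v + grad_w


def mat_vec(A, g):
    return [sum(a * b for a, b in zip(row, g)) for row in A]


def width(g):
    """Exploration width sqrt(g^T Z^{-1} g) for one arm."""
    return math.sqrt(max(sum(a * b for a, b in zip(g, mat_vec(Z_inv, g))), 0.0))


def update(g):
    """Sherman-Morrison: replace Z^{-1} by (Z + g g^T)^{-1} in place."""
    u = mat_vec(Z_inv, g)
    denom = 1.0 + sum(a * b for a, b in zip(g, u))
    for i in range(P):
        for j in range(P):
            Z_inv[i][j] -= u[i] * u[j] / denom


def top_k(scores, k):
    """Assumed oracle: the k arms with the largest scores."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def ucb_select(contexts):
    """UCB-style choice: optimistic score f(x) + GAMMA * width per arm, then the oracle."""
    scores = []
    for x in contexts:
        f, g = score_and_grad(x)
        scores.append(f + GAMMA * width(g))
    return top_k(scores, K)


def ts_select(contexts, rng):
    """TS-style choice with optimistic sampling: max of N_SAMPLES Gaussian draws per arm."""
    scores = []
    for x in contexts:
        f, g = score_and_grad(x)
        sigma = NU * width(g)
        scores.append(max(rng.gauss(f, sigma) for _ in range(N_SAMPLES)))
    return top_k(scores, K)


# One round: choose K arms, then add the gradient of every selected arm to Z
# (semi-bandit feedback gives one observation per selected arm).
contexts = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(N)]
chosen = ucb_select(contexts)
before = [width(score_and_grad(contexts[i])[1]) for i in chosen]
for i in chosen:
    update(score_and_grad(contexts[i])[1])
after = [width(score_and_grad(contexts[i])[1]) for i in chosen]
ts_chosen = ts_select(contexts, random.Random(1))
```

Because each selected arm's gradient is added to Z, the widths of those arms can only shrink after the update, which is how repeated selection reduces exploration. A full loop would also retrain the network on the observed per-arm scores every round; that step is omitted here.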

research
∙ 07/07/2019

Thompson Sampling for Combinatorial Network Optimization in Unknown Environments

Influence maximization, item recommendation, adaptive routing and dynami...
research
∙ 06/12/2021

Simple Combinatorial Algorithms for Combinatorial Bandits: Corruptions and Approximations

We consider the stochastic combinatorial semi-bandit problem with advers...
research
∙ 06/07/2020

Thompson Sampling for Multinomial Logit Contextual Bandits

We consider a dynamic assortment selection problem where the goal is to ...
research
∙ 10/05/2021

Contextual Combinatorial Volatile Bandits via Gaussian Processes

We consider a contextual bandit problem with a combinatorial action set ...
research
∙ 02/09/2021

Robust Bandit Learning with Imperfect Context

A standard assumption in contextual multi-arm bandit is that the true co...
research
∙ 09/05/2019

An Arm-wise Randomization Approach to Combinatorial Linear Semi-bandits

Combinatorial linear semi-bandits (CLS) are widely applicable frameworks...
research
∙ 02/20/2019

A Note on Bounding Regret of the C^2UCB Contextual Combinatorial Bandit

We revisit the proof by Qin et al. (2014) of bounded regret of the C^2UC...