Combinatorial Neural Bandits

by TaeHyun Hwang, et al.

We consider a contextual combinatorial bandit problem where, in each round, a learning agent selects a subset of arms and receives feedback on the selected arms according to their scores. The score of an arm is an unknown function of the arm's feature. Approximating this unknown score function with deep neural networks, we propose two algorithms: Combinatorial Neural UCB and Combinatorial Neural Thompson Sampling. We prove that Combinatorial Neural UCB achieves $\tilde{\mathcal{O}}(\tilde{d}\sqrt{T})$ or $\tilde{\mathcal{O}}(\sqrt{\tilde{d} T K})$ regret, where $\tilde{d}$ is the effective dimension of a neural tangent kernel matrix, $K$ is the size of a subset of arms, and $T$ is the time horizon. For Combinatorial Neural Thompson Sampling, we adapt an optimistic sampling technique to ensure the optimism of the sampled combinatorial action, achieving a worst-case (frequentist) regret of $\tilde{\mathcal{O}}(\tilde{d}\sqrt{TK})$. To the best of our knowledge, these are the first combinatorial neural bandit algorithms with regret performance guarantees. In particular, Combinatorial Neural Thompson Sampling is the first Thompson sampling algorithm with worst-case regret guarantees for the general contextual combinatorial bandit problem. The numerical experiments demonstrate the superior performance of our proposed algorithms.
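
The interaction protocol described in the abstract (pick a subset of K arms, observe feedback only on the selected arms) can be illustrated with a small sketch. This is not the paper's method: it swaps the neural score model for a plain per-arm running mean with a standard UCB bonus, ignores arm features, and assumes the reward of a subset is the sum of its arms' scores. The names `top_k_by_ucb`, `run`, and `true_scores`, the bonus constant 1.5, and the noise level 0.1 are all hypothetical choices made for illustration.

```python
import math
import random


def top_k_by_ucb(means, counts, t, k):
    """Pick the k arms with the largest optimistic score estimates."""
    def ucb(i):
        if counts[i] == 0:
            return float("inf")  # force every arm to be tried at least once
        return means[i] + math.sqrt(1.5 * math.log(t) / counts[i])
    return sorted(range(len(means)), key=ucb, reverse=True)[:k]


def run(true_scores, k, horizon, seed=0):
    """Simulate the loop; return the last chosen subset and cumulative regret."""
    rng = random.Random(seed)
    n = len(true_scores)
    means, counts = [0.0] * n, [0] * n
    # Assumption: the reward of a subset is the sum of its arms' scores.
    best = sum(sorted(true_scores, reverse=True)[:k])
    regret = 0.0
    chosen = []
    for t in range(1, horizon + 1):
        chosen = top_k_by_ucb(means, counts, t, k)
        for i in chosen:
            # Semi-bandit feedback: a noisy score is seen only for selected arms.
            y = true_scores[i] + rng.gauss(0.0, 0.1)
            counts[i] += 1
            means[i] += (y - means[i]) / counts[i]  # incremental mean update
        regret += best - sum(true_scores[i] for i in chosen)
    return chosen, regret


if __name__ == "__main__":
    chosen, regret = run([0.1, 0.9, 0.5, 0.8, 0.2], k=2, horizon=2000)
    print("last subset:", chosen, "cumulative regret:", round(regret, 2))
```

In the setting of the abstract, the per-arm running mean would be replaced by a neural network evaluated on each arm's feature, and the bonus by a confidence width derived from the network; the top-K selection step is the part this sketch is meant to show.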


∙ 07/07/2019

Thompson Sampling for Combinatorial Network Optimization in Unknown Environments

Influence maximization, item recommendation, adaptive routing and dynami...
∙ 06/12/2021

Simple Combinatorial Algorithms for Combinatorial Bandits: Corruptions and Approximations

We consider the stochastic combinatorial semi-bandit problem with advers...
∙ 06/07/2020

Thompson Sampling for Multinomial Logit Contextual Bandits

We consider a dynamic assortment selection problem where the goal is to ...
∙ 10/05/2021

Contextual Combinatorial Volatile Bandits via Gaussian Processes

We consider a contextual bandit problem with a combinatorial action set ...
∙ 02/09/2021

Robust Bandit Learning with Imperfect Context

A standard assumption in contextual multi-arm bandit is that the true co...
∙ 09/05/2019

An Arm-wise Randomization Approach to Combinatorial Linear Semi-bandits

Combinatorial linear semi-bandits (CLS) are widely applicable frameworks...
∙ 02/20/2019

A Note on Bounding Regret of the C^2UCB Contextual Combinatorial Bandit

We revisit the proof by Qin et al. (2014) of bounded regret of the C^2UC...
