Pareto Optimal Model Selection in Linear Bandits

by Yinglun Zhu et al.

We study a model selection problem in the linear bandit setting, where the learner must adapt on the fly to the dimension of the optimal hypothesis class while balancing exploration and exploitation. More specifically, we assume a sequence of nested linear hypothesis classes with dimensions d_1 < d_2 < …, and the goal is to automatically adapt to the smallest hypothesis class that contains the true linear model. Although previous papers provide various guarantees for this model selection problem, their analyses either apply only in favorable cases where one can cheaply conduct statistical tests to locate the right hypothesis class, or rely on "corralling" multiple base algorithms, which often performs relatively poorly in practice. These works also focus mainly on upper bounding the regret. In this paper, we first establish a lower bound showing that, even with a fixed action set, adaptation to the unknown intrinsic dimension d_⋆ comes at a cost: no algorithm can achieve the regret bound O(√(d_⋆ T)) simultaneously for all values of d_⋆. We also bring a new idea to the model selection problem in linear bandits: constructing virtual mixture-arms that effectively summarize useful information. Under a mild assumption on the action set, we design a Pareto optimal algorithm whose guarantees match the rate in the lower bound. Experimental results confirm our theoretical findings and show the advantages of our algorithm over prior work.
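To make the nested-hypothesis-class setting concrete, the following toy sketch (not from the paper; all names and parameters are illustrative assumptions) builds a sequence of nested linear models that each use only the first d coordinates of the feature vector. The true parameter is supported on the first d_star coordinates, so every class with d ≥ d_star contains the true model, while smaller classes are misspecified and incur irreducible error; adapting to the smallest sufficient class is exactly the model selection goal described in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
d_max, d_star, n = 16, 4, 2000   # ambient dim, true intrinsic dim, sample size

# True linear model lives in the first d_star coordinates.
theta = np.zeros(d_max)
theta[:d_star] = 1.0

# Observed contexts and noisy rewards (noise std 0.1).
X = rng.standard_normal((n, d_max))
y = X @ theta + 0.1 * rng.standard_normal(n)

# Held-out data to measure each nested class's prediction error.
X_test = rng.standard_normal((500, d_max))
y_test = X_test @ theta + 0.1 * rng.standard_normal(500)

def nested_class_mse(d):
    """Least-squares fit restricted to the first d features (class of dim d)."""
    w, *_ = np.linalg.lstsq(X[:, :d], y, rcond=None)
    residual = y_test - X_test[:, :d] @ w
    return float(np.mean(residual ** 2))

errors = {d: nested_class_mse(d) for d in (2, 4, 8, 16)}
for d, mse in errors.items():
    print(f"d = {d:2d}: test MSE = {mse:.3f}")
```

Classes with d < d_star pay a bias penalty from the dropped signal coordinates, while all classes with d ≥ d_star achieve roughly the noise floor; in the bandit setting the extra cost of using an unnecessarily large class shows up instead as regret scaling with d rather than d_⋆, which is the tension the paper's lower bound makes precise.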

