Scalable Generalized Linear Bandits: Online Computation and Hashing

by Kwang-Sung Jun, et al.

Generalized Linear Bandits (GLBs), a natural extension of stochastic linear bandits, have been popular and successful in recent years. However, existing GLBs scale poorly with the number of rounds and the number of arms, limiting their utility in practice. This paper proposes new, scalable solutions to the GLB problem in two respects. First, unlike existing GLBs, whose per-time-step space and time complexity grow at least linearly with time t, we propose a new algorithm that performs online computations and enjoys constant space and time complexity. At its heart is a novel Generalized Linear extension of the Online-to-confidence-set Conversion (the GLOC method) that takes any online learning algorithm and turns it into a GLB algorithm. As a special case, we apply GLOC to the online Newton step algorithm, which results in a low-regret GLB algorithm with much lower time and memory complexity than prior work. Second, for the case where the number N of arms is very large, we propose new algorithms in which each next arm is selected via an inner product search. Such methods can be implemented via hashing algorithms (i.e., they are "hash-amenable") and result in a time complexity sublinear in N. While a Thompson sampling extension of GLOC is hash-amenable, its regret bound for d-dimensional arm sets scales with d^{3/2}, whereas GLOC's regret bound scales with d. Towards closing this gap, we propose a new hash-amenable algorithm whose regret bound scales with d^{5/4}. Finally, we propose a fast approximate hash-key computation (inner product) with better accuracy than the state of the art, which may be of independent interest. We conclude the paper with preliminary experimental results confirming the merits of our methods.
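To illustrate what "hash-amenable" means in the abstract, the sketch below shows a Thompson-sampling-style selection step in which, once a parameter vector has been sampled, choosing the next arm reduces to a maximum inner product search (MIPS) over the arm set — exactly the query that hashing schemes can answer in time sublinear in N. This is a minimal illustrative sketch, not the paper's algorithm; the names (`theta_hat`, `V`) and the exhaustive `argmax` standing in for a hash-based search are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d, N = 5, 1000                      # arm dimension, number of arms
arms = rng.standard_normal((N, d))  # arm feature vectors (one row per arm)

# Placeholder statistics a GLB algorithm would maintain online:
theta_hat = rng.standard_normal(d)  # running parameter estimate (assumed)
V = np.eye(d)                       # regularized design matrix (assumed)

# Sample a perturbed parameter around the estimate. After this step the
# arm choice depends on the arms only through inner products, so a
# hashing scheme for MIPS could replace the exhaustive argmax below.
theta_tilde = rng.multivariate_normal(theta_hat, np.linalg.inv(V))
best_arm = int(np.argmax(arms @ theta_tilde))
```

The key design point is that all uncertainty is folded into the sampled vector `theta_tilde` before the search, so the per-round arm selection is a pure inner product query rather than an arm-dependent confidence-width computation.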

