On the Complexity of Representation Learning in Contextual Linear Bandits

by Andrea Tirinzoni, et al.

In contextual linear bandits, the reward function is assumed to be a linear combination of an unknown reward vector and a given embedding of context-arm pairs. In practice, the embedding is often learned at the same time as the reward vector, which leads to an online representation learning problem. Existing approaches to representation learning in contextual bandits are either very generic (e.g., model-selection techniques or algorithms for learning with arbitrary function classes) or specialized to particular structures (e.g., nested features or representations with certain spectral properties). As a result, the understanding of the cost of representation learning in contextual linear bandits is still limited. In this paper, we take a systematic approach to the problem and provide a comprehensive study from an instance-dependent perspective. We show that representation learning is fundamentally more complex than linear bandits (i.e., learning with a given representation). In particular, learning with a given set of representations is never simpler than learning with the worst realizable representation in the set, and we show cases where it can be arbitrarily harder. We complement this result with an extensive discussion of how it relates to the existing literature, and we illustrate positive instances where representation learning is as complex as learning with a fixed representation and where sub-logarithmic regret is achievable.
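
The setting described in the abstract can be written down compactly. The notation below is an assumption introduced for illustration and is not taken from the abstract: x denotes a context, a an arm, r the observed reward, \phi the embedding, \theta the unknown reward vector, and \Phi the set of candidate representations.

```latex
% Linear reward model with a given embedding \phi (notation assumed for illustration)
\mathbb{E}[r \mid x, a] = \langle \phi(x,a), \theta \rangle,
\qquad \phi(x,a) \in \mathbb{R}^{d},\ \theta \in \mathbb{R}^{d}.

% Representation learning: the learner is given only a candidate set \Phi.
% A representation \phi \in \Phi is called realizable if some reward vector fits it exactly:
\exists\, \theta_{\phi} \in \mathbb{R}^{d_{\phi}} :\quad
\mathbb{E}[r \mid x, a] = \langle \phi(x,a), \theta_{\phi} \rangle
\quad \text{for all } (x,a).
```

Under this reading, the lower-bound claim in the abstract compares the difficulty of learning over the whole set \Phi with the difficulty of running a linear bandit algorithm on the hardest realizable \phi in \Phi.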

