We consider the adversarial linear contextual bandit problem, where the ...
We propose a new best-of-both-worlds algorithm for bandits with variably...
Best-of-both-worlds algorithms for online learning which achieve near-op...
Policy optimization methods are popular reinforcement learning algorithm...
We study reinforcement learning in stochastic path (SP) problems. The go...
We present a modified tuning of the algorithm of Zimmert and Seldin [202...
We revisit the problem of stochastic online learning with feedback graph...
We revisit the classical online portfolio selection problem. It is widel...
Recent progress in model selection raises the question of the fundamenta...
We develop a model selection approach to tackle reinforcement learning w...
Multiclass logistic regression is a fundamental task in machine learning...
A major research direction in contextual bandits is to develop algorithm...
We provide improved gap-dependent regret bounds for reinforcement learni...
We study model selection in stochastic bandit problems. Our approach rel...
Existing multi-armed bandit (MAB) models make two implicit assumptions: ...
We propose a new algorithm for adversarial multi-armed bandits with unre...
The information-theoretic analysis by Russo and Van Roy (2014) in combin...
We develop the first general semi-bandit algorithm that simultaneously a...
We provide an algorithm that achieves the optimal (up to constants) fini...
We introduce the factored bandits model, which is a framework for learni...