Connections Between Mirror Descent, Thompson Sampling and the Information Ratio

05/28/2019
by Julian Zimmert, et al.

The information-theoretic analysis by Russo and Van Roy (2014), in combination with minimax duality, has proved a powerful tool for the analysis of online learning algorithms in full and partial information settings. In most applications there is a tantalising similarity to the classical analysis based on mirror descent. We make this connection formal, showing that the information-theoretic bounds in most applications can be derived from existing techniques for online convex optimisation. In addition, for k-armed adversarial bandits we provide an efficient algorithm with regret matching the best information-theoretic upper bound, and we improve the best known regret guarantees for online linear optimisation on ℓ_p-balls and for bandits with graph feedback.
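To illustrate the mirror-descent side of the connection, the sketch below implements the classical Exp3 algorithm for k-armed adversarial bandits: online mirror descent on the probability simplex with the negentropy regulariser and importance-weighted loss estimates. This is a textbook baseline, not the paper's own algorithm (which matches the tighter information-theoretic bound); the learning-rate tuning shown is the standard one for Exp3.

```python
import numpy as np

def exp3(loss_fn, k, n, eta=None, rng=None):
    """Exp3: online mirror descent with the negentropy regulariser on the
    probability simplex, using importance-weighted loss estimates so that
    only the played arm's loss needs to be observed (bandit feedback).

    loss_fn(t, a) returns the loss in [0, 1] of arm a at round t.
    Returns the total loss incurred over n rounds.
    """
    rng = np.random.default_rng(rng)
    if eta is None:
        # Standard tuning giving regret O(sqrt(n * k * log(k))).
        eta = np.sqrt(2.0 * np.log(k) / (n * k))
    L_hat = np.zeros(k)  # cumulative importance-weighted loss estimates
    total_loss = 0.0
    for t in range(n):
        # The mirror-descent step with negentropy has a closed form:
        # p_t is proportional to exp(-eta * L_hat).
        w = np.exp(-eta * (L_hat - L_hat.min()))  # shift for stability
        p = w / w.sum()
        a = rng.choice(k, p=p)          # sample an arm from p_t
        loss = loss_fn(t, a)            # observe only the played arm's loss
        total_loss += loss
        L_hat[a] += loss / p[a]         # unbiased loss estimate
    return total_loss
```

For example, on a three-armed Bernoulli instance where arm 0 has mean loss 0.2 and the others 0.5, the average loss after a few thousand rounds concentrates near the best arm's mean, consistent with the O(sqrt(n k log k)) regret bound for this tuning.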
