Optimal No-Regret Learning in Strongly Monotone Games with Bandit Feedback

by Tianyi Lin, et al.

We consider online no-regret learning in unknown games with bandit feedback, where each agent observes only its reward at each time step (determined by all players' current joint action) rather than its gradient. We focus on the class of smooth and strongly monotone games and study optimal no-regret learning therein. Leveraging self-concordant barrier functions, we first construct an online bandit convex optimization algorithm and show that it achieves the single-agent optimal regret of Θ̃(√T) under smooth and strongly concave payoff functions. We then show that if each agent applies this no-regret learning algorithm in strongly monotone games, the joint action converges in last iterate to the unique Nash equilibrium at a rate of Θ̃(1/√T). Prior to our work, the best-known convergence rate in the same class of games was O(1/T^{1/3}) (achieved by a different algorithm), which left open the problem of optimal no-regret learning algorithms, since the known lower bound is Ω(1/√T). Our results settle this open problem and contribute to the broad landscape of bandit game-theoretic learning by identifying the first doubly optimal bandit learning algorithm: it achieves, up to log factors, both optimal regret in single-agent learning and the optimal last-iterate convergence rate in multi-agent learning. We also present results from several simulation studies (Cournot competition, Kelly auctions, and distributed regularized logistic regression) to demonstrate the efficacy of our algorithm.
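To make the bandit feedback model concrete, the following is a minimal sketch of two agents learning in a Cournot duopoly while observing only their own realized payoffs. It is not the barrier-based algorithm of the paper: it swaps in a simpler projected single-point gradient estimator, and the demand parameters `A`, `B`, `C`, the step size `1/t`, the exploration radius `t^(-1/4)`, and the function names are all assumptions chosen for illustration.

```python
import random

# Hypothetical Cournot duopoly: price = A - B * (x_1 + x_2), unit cost C.
A, B, C = 2.0, 1.0, 0.5
NASH = (A - C) / (3 * B)  # interior Nash quantity of this linear duopoly (0.5)


def payoff(i, x):
    """Profit of player i at joint action x (the only feedback a player sees)."""
    price = A - B * sum(x)
    return x[i] * (price - C)


def run_bandit_cournot(T=20000, seed=0):
    """Projected single-point bandit gradient ascent on the action set [0, 1].

    Illustrative stand-in only; the paper's algorithm uses a self-concordant
    barrier instead of the projection step used here.
    """
    rng = random.Random(seed)
    x = [0.7, 0.3]  # pivot points, kept inside [delta, 1 - delta]
    for t in range(1, T + 1):
        delta = min(0.25, t ** -0.25)  # exploration radius (assumed schedule)
        eta = 1.0 / t                  # step size (assumed schedule)
        # Each player perturbs its pivot by +/- delta and plays the result.
        z = [rng.choice((-1.0, 1.0)) for _ in x]
        played = [xi + delta * zi for xi, zi in zip(x, z)]
        rewards = [payoff(i, played) for i in range(len(x))]
        for i in range(len(x)):
            # One-point estimate of the player's own payoff gradient.
            grad_est = rewards[i] * z[i] / delta
            # Ascent step, then projection so the next play stays in [0, 1].
            x[i] = min(1.0 - delta, max(delta, x[i] + eta * grad_est))
    return x


if __name__ == "__main__":
    print(run_bandit_cournot(), "Nash quantity:", NASH)
```

Each player uses only its own scalar reward, never a gradient, which is the information constraint described above. With these assumed parameters both quantities should drift toward the Nash quantity, though the simple estimator used here carries no guarantee of the optimal rate.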



Doubly Optimal No-Regret Learning in Monotone Games

We consider online learning in multi-player smooth monotone games. Exist...

Finite-Time Last-Iterate Convergence for Multi-Agent Learning in Games

We consider multi-agent learning via online gradient descent (OGD) in a ...

Bandit learning in concave N-person games

This paper examines the long-run behavior of learning with bandit feedba...

Taming Wild Price Fluctuations: Monotone Stochastic Convex Optimization with Bandit Feedback

Prices generated by automated price experimentation algorithms often dis...

No-regret learning for repeated non-cooperative games with lossy bandits

This paper considers no-regret learning for repeated continuous-kernel g...

Online Monotone Games

Algorithmic game theory (AGT) focuses on the design and analysis of algo...

A Tight and Unified Analysis of Extragradient for a Whole Spectrum of Differentiable Games

We consider differentiable games: multi-objective minimization problems,...
