Stochastic Stability of Reinforcement Learning in Positive-Utility Games
This paper considers a class of discrete-time reinforcement-learning dynamics and provides a stochastic-stability analysis in repeatedly played positive-utility (strategic-form) games. For this class of dynamics, convergence to pure Nash equilibria has previously been demonstrated only for the restricted class of potential games. Prior work establishes convergence properties primarily through stochastic approximation, where the asymptotic behavior can be associated with the limit points of an ordinary differential equation (ODE). However, analyzing global convergence through an ODE approximation requires the existence of a Lyapunov or a potential function, which naturally restricts the analysis to a narrow class of games. To overcome these limitations, this paper introduces an alternative framework for analyzing convergence under reinforcement learning that is based upon an explicit characterization of the invariant probability measure of the induced Markov chain. We further provide a methodology for computing the invariant probability measure in positive-utility games, together with an illustration in the context of coordination games.
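The following is a minimal simulation sketch of the kind of setting the abstract describes: a discrete-time reinforcement-learning rule in a 2x2 positive-utility coordination game, whose long-run occupancy of joint action profiles approximates the mass that the invariant probability measure places near each pure profile. The payoff values, step size, perturbation level, and update rule below are illustrative assumptions, not the paper's exact dynamics or methodology.

```python
import numpy as np

# Sketch (assumed parameters): a perturbed linear reward-type learning rule
# in a symmetric 2x2 coordination game with strictly positive utilities.
U = np.array([[1.0, 0.2],
              [0.2, 1.0]])   # u(a1, a2), identical for both players

step = 0.05        # constant learning step size (assumption)
lam = 0.01         # exploration/perturbation level (assumption)
T = 200_000        # simulation horizon
rng = np.random.default_rng(0)

# Each player maintains a mixed strategy over its two actions.
x = [np.array([0.5, 0.5]), np.array([0.5, 0.5])]

visits = np.zeros((2, 2))    # time spent at each joint pure-action profile

for t in range(T):
    # Perturbed strategies: mostly follow x_i, occasionally act uniformly.
    probs = [(1 - lam) * x[i] + lam * 0.5 for i in range(2)]
    a = [rng.choice(2, p=probs[i]) for i in range(2)]
    visits[a[0], a[1]] += 1

    # Reinforcement update: shift probability mass toward the played action
    # in proportion to the (positive) utility received.
    u_now = U[a[0], a[1]]
    for i in range(2):
        e = np.zeros(2)
        e[a[i]] = 1.0
        x[i] = x[i] + step * u_now * (e - x[i])

# Empirical occupancy of joint profiles; for small step size and perturbation,
# most of the mass concentrates near the pure Nash equilibria (0,0) and (1,1).
print(visits / T)
```

Running this sketch shows the empirical measure concentrating on the two coordination outcomes, which is the kind of concentration-on-pure-equilibria behavior that the paper's invariant-measure characterization is intended to capture analytically rather than by simulation.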