Linear-Quadratic Mean-Field Reinforcement Learning: Convergence of Policy Gradient Methods

10/09/2019
by René Carmona, et al.

We investigate reinforcement learning for mean-field control problems in discrete time, which can be viewed as Markov decision processes for a large number of exchangeable agents interacting in a mean-field manner. Such problems arise, for instance, when a large number of robots communicate through a central unit that dispatches the optimal policy computed by minimizing the overall social cost. An approximate solution is obtained by learning the optimal policy of a generic agent interacting with the statistical distribution of the states of the other agents. We rigorously prove the convergence of exact and model-free policy gradient methods in a mean-field linear-quadratic setting, and we provide graphical evidence of this convergence based on implementations of our algorithms.
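
To make the model-free approach concrete, here is a minimal sketch of a zeroth-order policy gradient loop on a population of agents with linear-quadratic dynamics coupled through the empirical state mean. Everything below is an illustrative assumption rather than the paper's exact formulation: the model matrices (A, Abar, B, Q, Qbar, R), the linear feedback form u_i = -K x_i - Kbar xbar, the horizon, and the Gaussian-smoothing gradient estimator are all chosen for the example.

```python
# A minimal sketch, assuming a simple LQ mean-field model and a zeroth-order
# (Gaussian-smoothing) policy gradient estimator; matrices, dimensions, and
# hyperparameters are illustrative, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

d_x, d_u, N, T = 2, 1, 200, 50   # state dim, control dim, population size, horizon

# Hypothetical dynamics and costs: x_{t+1} = A x + Abar xbar + B u + noise,
# per-agent cost x'Qx + (x - xbar)'Qbar(x - xbar) + u'Ru, xbar = empirical mean.
A    = np.array([[1.0, 0.1], [0.0, 1.0]])
Abar = 0.05 * np.eye(d_x)
B    = np.array([[0.0], [0.1]])
Q, Qbar, R = np.eye(d_x), 0.5 * np.eye(d_x), 0.1 * np.eye(d_u)

def social_cost(K, Kbar):
    """Average finite-horizon cost of the feedback u_i = -K x_i - Kbar xbar."""
    x, total = rng.normal(size=(N, d_x)), 0.0
    for _ in range(T):
        xbar = x.mean(axis=0)
        u = -x @ K.T - xbar @ Kbar.T                 # mean-field term broadcasts
        total += np.mean(np.einsum('ij,jk,ik->i', x, Q, x)
                         + np.einsum('ij,jk,ik->i', x - xbar, Qbar, x - xbar)
                         + np.einsum('ij,jk,ik->i', u, R, u))
        x = x @ A.T + xbar @ Abar.T + u @ B.T + 0.01 * rng.normal(size=(N, d_x))
    return total / T

def zeroth_order_grad(K, Kbar, radius=0.05, samples=20):
    """Model-free gradient estimate: perturb (K, Kbar) and observe costs only."""
    base = social_cost(K, Kbar)                      # baseline reduces variance
    gK, gKbar = np.zeros_like(K), np.zeros_like(Kbar)
    for _ in range(samples):
        dK, dKbar = rng.normal(size=K.shape), rng.normal(size=Kbar.shape)
        c = social_cost(K + radius * dK, Kbar + radius * dKbar)
        gK += (c - base) * dK
        gKbar += (c - base) * dKbar
    return gK / (radius * samples), gKbar / (radius * samples)

K, Kbar = np.zeros((d_u, d_x)), np.zeros((d_u, d_x))
for it in range(51):
    gK, gKbar = zeroth_order_grad(K, Kbar)
    K, Kbar = K - 1e-4 * gK, Kbar - 1e-4 * gKbar     # plain gradient descent
    if it % 10 == 0:
        print(f"iter {it:3d}  social cost ~ {social_cost(K, Kbar):.3f}")
```

The estimator only requires cost evaluations along perturbed policies, which is what makes the method model-free; replacing the empirical mean over the N agents with the exact state distribution of a generic agent would give the exact-gradient counterpart studied alongside it.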
