Policy Mirror Ascent for Efficient and Independent Learning in Mean Field Games

12/29/2022
by   Batuhan Yardim, et al.

Mean-field games have been used in the literature as a theoretical tool to obtain approximate Nash equilibria for symmetric and anonymous N-player games. However, existing theoretical results assume variations of a "population generative model", which allows the learning algorithm to modify the population distribution arbitrarily, limiting their applicability. Instead, we show that N agents running policy mirror ascent converge to the Nash equilibrium of the regularized game within 𝒪̃(ε^-2) samples from a single sample trajectory, without a population generative model, up to a standard 𝒪(1/√(N)) error due to the mean-field approximation. Departing from the literature, instead of working with the best-response map, we first show that a policy mirror ascent map can be used to construct a contractive operator whose fixed point is the Nash equilibrium. Next, we prove that conditional TD-learning in N-agent games can learn value functions within 𝒪̃(ε^-2) time steps. These results allow us to prove sample-complexity guarantees in the oracle-free setting, relying only on a sample path from the N-agent simulator. Furthermore, we demonstrate that our methodology allows for independent learning by N agents with finite-sample guarantees.
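As a rough illustration of the fixed-point view sketched in the abstract, the snippet below implements a toy finite mean-field game in which the population policy induces a mean field, an entropy-regularized Q-function is evaluated under that mean field, and the policy is updated with a KL (exponentiated-gradient) mirror-ascent step until the iteration stabilizes. Everything in the snippet (state and action sizes, the transition kernel P, the reward model, the step size eta, and the regularization tau) is an illustrative assumption; the paper's operator, its contraction argument, and the conditional TD-learning component are not reproduced here.

```python
# Minimal sketch (not the paper's implementation): policy mirror ascent viewed
# as a fixed-point iteration on policies in a toy entropy-regularized
# mean-field game with finite states and actions. All quantities are assumed.
import numpy as np

rng = np.random.default_rng(0)
S, A = 4, 3          # assumed small state/action space sizes
eta, tau = 0.5, 0.1  # mirror-ascent step size, entropy regularization weight

# Hypothetical transition kernel P[s, a, s'] and a mean-field-dependent reward.
P = rng.dirichlet(np.ones(S), size=(S, A))
base_r = rng.standard_normal((S, A))

def reward(mu):
    # Toy mean-field reward: a fixed term plus an aversion to crowded states.
    return base_r + (1.0 - mu)[:, None]

def stationary_mu(pi):
    # Stationary state distribution induced when the whole population plays pi.
    M = np.einsum('sa,sat->st', pi, P)  # state transition matrix under pi
    vals, vecs = np.linalg.eig(M.T)
    mu = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
    return mu / mu.sum()

def q_values(pi, mu, gamma=0.9, iters=200):
    # Policy evaluation by iterating the regularized Bellman operator
    # (the paper instead learns value functions via conditional TD-learning).
    r = reward(mu)
    Q = np.zeros((S, A))
    for _ in range(iters):
        V = np.einsum('sa,sa->s', pi, Q - tau * np.log(pi + 1e-12))
        Q = r + gamma * np.einsum('sat,t->sa', P, V)
    return Q

def mirror_ascent_step(pi, Q):
    # Exponentiated-gradient (KL mirror) update on the policy, per state.
    logits = np.log(pi + 1e-12) + eta * Q
    new_pi = np.exp(logits - logits.max(axis=1, keepdims=True))
    return new_pi / new_pi.sum(axis=1, keepdims=True)

# Fixed-point iteration: evaluate under the induced mean field, then mirror-ascend.
pi = np.full((S, A), 1.0 / A)
for t in range(100):
    mu = stationary_mu(pi)
    Q = q_values(pi, mu)
    new_pi = mirror_ascent_step(pi, Q)
    if np.max(np.abs(new_pi - pi)) < 1e-6:
        break
    pi = new_pi
print("approximate regularized mean-field equilibrium policy:\n", pi)
```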
