α^α-Rank: Scalable Multi-agent Evaluation through Evolution

09/25/2019
by   Yaodong Yang, et al.

Although challenging, strategy profile evaluation in large connected learner networks is crucial for enabling the next wave of machine learning applications. Recently, α-Rank, an evolutionary algorithm, was proposed as a solution for ranking joint policy profiles in multi-agent systems. α-Rank was claimed to be scalable on the basis of an implementation that runs in time polynomial in the total number of pure strategy profiles. In this paper, we formally prove that this claim is not grounded: because the number of pure strategy profiles itself grows exponentially with the number of agents, α-Rank exhibits exponential complexity in the number of agents, which hinders its application beyond a small finite number of joint profiles. Recognizing this limitation, we propose a scalable evaluation protocol that we call α^α-Rank. Our method combines evolutionary dynamics with stochastic optimization and double oracles to achieve truly scalable ranking, with time and memory complexities that are linear in the number of agents. Our contributions allow us, for the first time, to conduct large-scale evaluation experiments of multi-agent systems, where we show successful results on joint strategy spaces with sizes on the order of O(2^25) (i.e., ≈33 million joint strategy profiles), a setting that cannot be evaluated using current techniques.
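To make the scaling argument concrete, the following minimal sketch counts joint strategy profiles under the simplifying assumption that every agent has the same number of pure strategies. The function names and the reading of 2^25 as 25 agents with 2 strategies each are illustrative assumptions, not details taken from the abstract.

```python
def num_joint_profiles(num_agents: int, strategies_per_agent: int) -> int:
    """Number of pure joint strategy profiles when every agent has the
    same number of pure strategies (an assumption made for illustration)."""
    return strategies_per_agent ** num_agents


def num_individual_strategies(num_agents: int, strategies_per_agent: int) -> int:
    """Total number of per-agent strategies, which grows only linearly
    in the number of agents."""
    return num_agents * strategies_per_agent


if __name__ == "__main__":
    # Hypothetical setting: 2 strategies per agent, growing number of agents.
    for n in (5, 10, 20, 25):
        print(n, num_joint_profiles(n, 2), num_individual_strategies(n, 2))
```

In this hypothetical setting, 25 agents already yield 2^25 = 33,554,432 joint profiles, while the total number of individual strategies is only 50. Any procedure whose cost is polynomial in the number of joint profiles therefore remains exponential in the number of agents.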
