SC-PSRO: A Unified Strategy Learning Method for Normal-form Games

by Yudong Hu et al.

Computing Nash equilibria is the central challenge in normal-form games with large strategy spaces, and open-ended learning frameworks offer an efficient approach. Previous studies rely on diversity to drive strategy improvement. However, diversity-based algorithms only work in zero-sum games with cyclic dimensions, which limits their applicability. Here, we propose SC-PSRO (Self-Confirming Policy Space Response Oracle), a unified open-ended learning framework for both zero-sum and general-sum games. In particular, we introduce the advantage function as an improved evaluation metric for strategies, yielding a unified learning objective for agents in normal-form games. SC-PSRO comprises three components: 1) a Diversity Module, which prevents strategies from being trapped by the cyclic structure; 2) a LookAhead Module, which improves strategies along the transitive dimension and is theoretically guaranteed to learn strategies in the direction of the Nash equilibrium; 3) a Confirming-based Population Clipping Module, which addresses the equilibrium selection problem in general-sum games and can learn equilibria with optimal rewards, to our knowledge the first such improvement for general-sum games. Our experiments show that SC-PSRO achieves a considerable decrease in exploitability in zero-sum games and an increase in rewards in general-sum games, markedly surpassing previous methods. Code will be released upon acceptance.
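To make the advantage-based evaluation concrete, the following is a minimal sketch using rock-paper-scissors as the normal-form game. The function names and the exact form of the advantage (a strategy's payoff against the population meta-strategy minus the meta-strategy's payoff against itself) are illustrative assumptions, not the paper's implementation; the best-response step stands in for the oracle in a generic PSRO-style loop.

```python
import numpy as np

# Row player's payoff matrix for rock-paper-scissors (zero-sum, symmetric).
U = np.array([[ 0.0, -1.0,  1.0],   # rock
              [ 1.0,  0.0, -1.0],   # paper
              [-1.0,  1.0,  0.0]])  # scissors

def advantage(strategy, meta, U):
    """Payoff of `strategy` against the population meta-strategy, minus the
    meta-strategy's payoff against itself (hypothetical form for illustration)."""
    return strategy @ U @ meta - meta @ U @ meta

def best_response(U, meta):
    """Pure-strategy best response to the meta-strategy: the oracle step
    that proposes a new strategy for the population."""
    return int(np.argmax(U @ meta))

meta = np.array([0.2, 0.8, 0.0])      # population mix leaning on paper
scissors = np.array([0.0, 0.0, 1.0])
print(advantage(scissors, meta, U))   # ≈ 0.6: scissors improves on the mix
print(best_response(U, meta))         # index 2: the oracle adds scissors
```

A strategy with positive advantage strictly improves on the current population mix, which is what licenses adding it to the population; in a symmetric zero-sum game the self-play term vanishes, but it matters in the general-sum setting the paper targets.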


