SC-PSRO: A Unified Strategy Learning Method for Normal-form Games

08/24/2023
by Yudong Hu, et al.

Computing a Nash equilibrium is the key challenge in normal-form games with large strategy spaces, and open-ended learning frameworks provide an efficient approach to it. Previous studies invariably use diversity as the means of driving strategy improvement. However, diversity-based algorithms only work in zero-sum games with cyclic dimensions, which limits their applicability. Here, we propose SC-PSRO (Self-Confirming Policy Space Response Oracle), a unified open-ended learning framework for both zero-sum and general-sum games. In particular, we introduce the advantage function as an improved evaluation metric for strategies, which gives agents in normal-form games a unified learning objective. Concretely, SC-PSRO comprises three components: 1) A Diversity Module, which prevents strategies from being constrained by the cyclic structure. 2) A LookAhead Module, which improves strategies in the transitive dimension. This module is theoretically guaranteed to learn strategies in the direction of the Nash equilibrium. 3) A Confirming-based Population Clipping Module, which tackles the equilibrium selection problem in general-sum games. This module can be applied to learn equilibria with optimal rewards, which to our knowledge is the first improvement for general-sum games. Our experiments indicate that SC-PSRO achieves a considerable decrease in exploitability in zero-sum games and an increase in rewards in general-sum games, markedly surpassing previous methods. Code will be released upon acceptance.
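For readers unfamiliar with the exploitability metric mentioned in the abstract, the following is a minimal sketch of how it is commonly computed in a two-player zero-sum normal-form game. It is not taken from the paper: the function name `exploitability`, the rock-paper-scissors payoff matrix, and the choice to sum both players' best-response gains (some works halve this value) are assumptions made for illustration.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def exploitability(
    payoff: Matrix,
    row_strategy: Sequence[float],
    col_strategy: Sequence[float],
) -> float:
    """Sum of both players' best-response gains in a zero-sum matrix game.

    `payoff[i][j]` is the row player's payoff; the column player receives
    its negation. This definition is an assumption, not the paper's.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    # Row player's expected payoff for each pure action vs. the column mixture.
    row_values = [
        sum(payoff[i][j] * col_strategy[j] for j in range(n_cols))
        for i in range(n_rows)
    ]
    # Row player's expected payoff when the column player picks each pure
    # action vs. the row mixture; the column player wants this minimal.
    col_values = [
        sum(row_strategy[i] * payoff[i][j] for i in range(n_rows))
        for j in range(n_cols)
    ]
    # The value of the current profile cancels out of the two gains.
    return max(row_values) - min(col_values)


# Hypothetical example game: rock-paper-scissors (row player's payoffs).
RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
uniform = [1 / 3, 1 / 3, 1 / 3]
rock = [1.0, 0.0, 0.0]

print(exploitability(RPS, uniform, uniform))  # Nash equilibrium profile
print(exploitability(RPS, rock, rock))        # both players always play rock
```

Under this definition a profile is a Nash equilibrium exactly when its exploitability is zero, which is why a decrease in exploitability is used as a measure of progress in zero-sum games.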
