Policy Evaluation and Seeking for Multi-Agent Reinforcement Learning via Best Response

by Rui Yan, et al.

This paper introduces two metrics (a cycle-based metric and a memory-based metric), grounded in a dynamical game-theoretic solution concept called sink equilibrium, for the evaluation, ranking, and computation of policies in multi-agent learning. We adopt strict best response dynamics (SBRD) to model selfish behaviors at a meta-level for multi-agent reinforcement learning. Our approach can deal with dynamical cyclical behaviors (unlike approaches based on Nash equilibria and Elo ratings), and is more compatible with single-agent reinforcement learning than α-rank, which relies on weakly better responses. We first consider settings where the difference between the largest and second-largest values of the underlying metric has a known lower bound. With this knowledge, we propose a class of perturbed SBRD with the following property: only policies with maximum metric are observed with nonzero probability, for a broad class of stochastic games with finite memory. We then consider settings where the lower bound for the difference is unknown. For this setting, we propose a class of perturbed SBRD such that the metrics of the policies observed with nonzero probability differ from the optimum by at most any given tolerance. The proposed perturbed SBRD addresses opponent-induced non-stationarity by fixing the strategies of the other agents while one agent learns, and uses empirical game-theoretic analysis to estimate the payoffs of each strategy profile reached through the perturbation.
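To make the sink equilibrium idea concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a finite normal-form game with known payoffs, and it assumes that a strict best response step means one player unilaterally switching to a payoff-maximizing action that strictly improves on its current payoff. Under those assumptions, the sink equilibria are taken to be the sink strongly connected components of the resulting response graph. All function names, and the rock-paper-scissors example, are hypothetical illustrations; the perturbation, the payoff estimation, and the two metrics described in the abstract are not modeled here.

```python
from itertools import product


def deviate(profile, player, action):
    """Return the profile in which `player` switches to `action`."""
    return profile[:player] + (action,) + profile[player + 1:]


def strict_best_response_graph(payoffs, n_actions):
    """Build the response graph: profile -> set of successor profiles.

    payoffs[profile] is a tuple holding one payoff per player.
    An edge exists when a single player moves to a best response that
    strictly improves on its current payoff (assumed SBRD step).
    """
    graph = {}
    for profile in product(*(range(n) for n in n_actions)):
        successors = set()
        for player, n in enumerate(n_actions):
            current = payoffs[profile][player]
            values = [payoffs[deviate(profile, player, a)][player]
                      for a in range(n)]
            best = max(values)
            if best > current:  # strict improvement only
                for a, value in enumerate(values):
                    if value == best:
                        successors.add(deviate(profile, player, a))
        graph[profile] = successors
    return graph


def reachable(graph, start):
    """All profiles reachable from `start`, including `start` itself."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def sink_equilibria(graph):
    """Sink strongly connected components of the response graph.

    A profile lies in a sink component exactly when every profile it can
    reach can also reach it back; its reachable set is then the component.
    """
    reach = {v: reachable(graph, v) for v in graph}
    return {frozenset(reach[v]) for v in graph
            if all(v in reach[w] for w in reach[v])}


# Hypothetical example: rock-paper-scissors (0 = rock, 1 = paper, 2 = scissors).
row_payoff = [[0, -1, 1],
              [1, 0, -1],
              [-1, 1, 0]]
rps = {(i, j): (row_payoff[i][j], -row_payoff[i][j])
       for i in range(3) for j in range(3)}

sinks = sink_equilibria(strict_best_response_graph(rps, (3, 3)))
for component in sinks:
    print(sorted(component))
```

In this example there is no pure Nash equilibrium, yet the response graph still has a single sink component: a cycle through the six off-diagonal profiles. This is the kind of cyclical behavior the abstract says sink equilibria can capture.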
