A Sharp Analysis of Model-based Reinforcement Learning with Self-Play

by Qinghua Liu, et al.

Model-based algorithms—algorithms that decouple learning of the model and planning given the model—are widely used in reinforcement learning practice and are theoretically shown to achieve optimal sample efficiency for single-agent reinforcement learning in Markov Decision Processes (MDPs). However, for multi-agent reinforcement learning in Markov games, the current best known sample complexity for model-based algorithms is rather suboptimal and compares unfavorably against recent model-free approaches. In this paper, we present a sharp analysis of model-based self-play algorithms for multi-agent Markov games. We design an algorithm, Optimistic Nash Value Iteration (Nash-VI), for two-player zero-sum Markov games that is able to output an ϵ-approximate Nash policy in 𝒪̃(H^3SAB/ϵ^2) episodes of game playing, where S is the number of states, A and B are the numbers of actions for the two players respectively, and H is the horizon length. This is the first algorithm that matches the information-theoretic lower bound Ω(H^3S(A+B)/ϵ^2) up to a min{A,B} factor, and it compares favorably against the best known model-free algorithm if min{A,B}=o(H^3). In addition, Nash-VI outputs a single Markov policy with an optimality guarantee, whereas existing sample-efficient model-free algorithms output a nested mixture of Markov policies that is in general non-Markov and rather inconvenient to store and execute. We further adapt our analysis to design a provably efficient task-agnostic algorithm for zero-sum Markov games, as well as the first provably sample-efficient algorithms for multi-player general-sum Markov games.
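As a quick sanity check on the bounds stated above, the following sketch (illustrative arithmetic only, not code from the paper; constants and logarithmic factors are dropped) compares the Nash-VI episode count 𝒪̃(H^3SAB/ϵ^2) against the lower bound Ω(H^3S(A+B)/ϵ^2) and verifies that the gap factor AB/(A+B) never exceeds min{A,B}:

```python
# Illustrative comparison of the Nash-VI upper bound and the
# information-theoretic lower bound, ignoring constants and log factors.

def nash_vi_episodes(H, S, A, B, eps):
    """Order of episodes used by Nash-VI: H^3 * S * A * B / eps^2."""
    return H**3 * S * A * B / eps**2

def lower_bound_episodes(H, S, A, B, eps):
    """Order of the lower bound: H^3 * S * (A + B) / eps^2."""
    return H**3 * S * (A + B) / eps**2

# Hypothetical problem sizes, chosen only for illustration.
H, S, A, B, eps = 10, 100, 5, 20, 0.1

upper = nash_vi_episodes(H, S, A, B, eps)
lower = lower_bound_episodes(H, S, A, B, eps)

# The ratio of the two bounds is A*B/(A+B), which is at most min(A, B):
gap = upper / lower
assert gap == A * B / (A + B)
assert gap <= min(A, B)
print(f"gap factor: {gap:.2f} (min(A,B) = {min(A, B)})")
```

With A=5 and B=20 the gap factor is 4.0, strictly below min{A,B}=5, illustrating why the upper bound is said to match the lower bound up to a min{A,B} factor.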




