On the Complexity of Multi-Agent Decision Making: From Learning in Games to Partial Monitoring

by Dylan J. Foster, et al.

A central problem in the theory of multi-agent reinforcement learning (MARL) is to understand what structural conditions and algorithmic principles lead to sample-efficient learning guarantees, and how these considerations change as we move from few to many agents. We study this question in a general framework for interactive decision making with multiple agents, encompassing Markov games with function approximation and normal-form games with bandit feedback. We focus on equilibrium computation, in which a centralized learning algorithm aims to compute an equilibrium by controlling multiple agents that interact with an unknown environment. Our main contributions are:

- We provide upper and lower bounds on the optimal sample complexity for multi-agent decision making based on a multi-agent generalization of the Decision-Estimation Coefficient, a complexity measure introduced by Foster et al. (2021) in the single-agent counterpart to our setting. Compared to the best results for the single-agent setting, our bounds have additional gaps. We show that no "reasonable" complexity measure can close these gaps, highlighting a striking separation between single and multiple agents.

- We show that characterizing the statistical complexity for multi-agent decision making is equivalent to characterizing the statistical complexity of single-agent decision making, but with hidden (unobserved) rewards, a framework that subsumes variants of the partial monitoring problem. As a consequence, we characterize the statistical complexity for hidden-reward interactive decision making to the best extent possible.

Building on this development, we provide several new structural results, including 1) conditions under which the statistical complexity of multi-agent decision making can be reduced to that of single-agent, and 2) conditions under which the so-called curse of multiple agents can be avoided.




On the Complexity of Adversarial Decision Making

A central problem in online learning and decision making – from bandits ...

Learning in Stackelberg Games with Non-myopic Agents

We study Stackelberg games where a principal repeatedly interacts with a...

Prophet Inequality with Competing Agents

We introduce a model of competing agents in a prophet setting, where rew...

Artificial Decision Making Under Uncertainty in Intelligent Buildings

Our hypothesis is that by equipping certain agents in a multi-agent syst...

Decision Market Based Learning For Multi-agent Contextual Bandit Problems

Information is often stored in a distributed and proprietary form, and a...

Multi-agent Time-based Decision-making for the Search and Action Problem

Many robotic applications, such as search-and-rescue, require multiple a...

Equilibration Analysis and Control of Coordinating Decision-Making Populations

Whether a population of decision-making individuals will reach a state o...
