Anytime Planning for Decentralized POMDPs using Expectation Maximization

03/15/2012
by   Akshat Kumar, et al.

Decentralized POMDPs provide an expressive framework for multi-agent sequential decision making. While finite-horizon DEC-POMDPs have enjoyed significant success, progress remains slow for the infinite-horizon case mainly due to the inherent complexity of optimizing stochastic controllers representing agent policies. We present a promising new class of algorithms for the infinite-horizon case, which recasts the optimization problem as inference in a mixture of DBNs. An attractive feature of this approach is the straightforward adoption of existing inference techniques in DBNs for solving DEC-POMDPs and supporting richer representations such as factored or continuous states and actions. We also derive the Expectation Maximization (EM) algorithm to optimize the joint policy represented as DBNs. Experiments on benchmark domains show that EM compares favorably against the state-of-the-art solvers.
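The abstract does not spell out how policy optimization becomes inference in a mixture of DBNs. The sketch below illustrates one common planning-as-inference construction, which is assumed here and not taken from the abstract: rewards are rescaled to [0, 1] and read as the probability of a binary reward variable, and each mixture component is a chain of length T with geometric prior (1 - gamma) * gamma^T that emits the reward only at its last step. Under those assumptions the mixture likelihood is an affine function of the discounted value, so maximizing one maximizes the other. The 3-state chain, the reward vector, and all function names are hypothetical.

```python
def step(belief, P):
    """Push a state distribution one step through transition matrix P."""
    n = len(belief)
    return [sum(belief[i] * P[i][j] for i in range(n)) for j in range(n)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def discounted_value(P, r, b0, gamma, horizon):
    """Truncated discounted value: sum over t < horizon of gamma^t * E[r_t]."""
    value, belief = 0.0, list(b0)
    for t in range(horizon):
        value += (gamma ** t) * dot(belief, r)
        belief = step(belief, P)
    return value


def mixture_likelihood(P, r_hat, b0, gamma, horizon):
    """P(reward variable = 1) under the assumed mixture of finite-time chains.

    Component T has prior (1 - gamma) * gamma^T and emits the binary reward
    at its final step with probability r_hat[s] in state s.
    """
    lik, belief = 0.0, list(b0)
    for T in range(horizon):
        lik += (1 - gamma) * (gamma ** T) * dot(belief, r_hat)
        belief = step(belief, P)
    return lik


# Hypothetical Markov chain induced by some fixed joint policy.
P = [[0.9, 0.1, 0.0],
     [0.0, 0.5, 0.5],
     [0.2, 0.0, 0.8]]
r = [-1.0, 0.0, 4.0]          # hypothetical state rewards
b0 = [1.0, 0.0, 0.0]          # start in state 0
gamma = 0.9
H = 500                        # truncation horizon for both sums

r_min, r_max = min(r), max(r)
r_hat = [(x - r_min) / (r_max - r_min) for x in r]  # rescale to [0, 1]

V = discounted_value(P, r, b0, gamma, H)
L = mixture_likelihood(P, r_hat, b0, gamma, H)

# Affine relation between the two quantities over the same truncated horizon.
G = (1 - gamma ** H) / (1 - gamma)   # sum of gamma^t for t < H
L_from_V = (1 - gamma) * (V - r_min * G) / (r_max - r_min)
print(L, L_from_V)
```

The two printed numbers should agree up to floating-point error. An EM procedure of the kind the abstract describes would increase this likelihood with respect to controller parameters; that step is not shown here.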


research
11/24/2020

Geom-SPIDER-EM: Faster Variance Reduced Stochastic Expectation Maximization for Nonconvex Finite-Sum Optimization

The Expectation Maximization (EM) algorithm is a key reference for infer...
research
05/01/2015

Stick-Breaking Policy Learning in Dec-POMDPs

Expectation maximization (EM) has recently been shown to be an efficient...
research
11/30/2020

A Stochastic Path-Integrated Differential EstimatoR Expectation Maximization Algorithm

The Expectation Maximization (EM) algorithm is of key importance for inf...
research
09/17/2021

Solving infinite-horizon Dec-POMDPs using Finite State Controllers within JESP

This paper looks at solving collaborative planning problems formalized a...
research
05/31/2013

Expectation-maximization for logistic regression

We present a family of expectation-maximization (EM) algorithms for bina...
research
12/29/2020

Fast Incremental Expectation Maximization for finite-sum optimization: nonasymptotic convergence

Fast Incremental Expectation Maximization (FIEM) is a version of the EM ...
research
03/02/2023

PLUNDER: Probabilistic Program Synthesis for Learning from Unlabeled and Noisy Demonstrations

Learning from demonstration (LfD) is a widely researched paradigm for te...
