Learning for Multi-robot Cooperation in Partially Observable Stochastic Environments with Macro-actions

by Miao Liu, et al.
Northeastern University
Princeton University

This paper presents a data-driven approach for multi-robot coordination in partially observable domains based on Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) and macro-actions (MAs). Dec-POMDPs provide a general framework for cooperative sequential decision making under uncertainty, and MAs allow temporally extended and asynchronous action execution. To date, most methods assume that the underlying Dec-POMDP model is known a priori or that a full simulator is available at planning time. Previous methods that aim to address these issues suffer from local optimality and sensitivity to initial conditions. Additionally, few hardware demonstrations exist that involve a large team of heterogeneous robots and long planning horizons. This work addresses these gaps by proposing an iterative sampling-based Expectation-Maximization algorithm (iSEM) to learn policies using only trajectory data containing observations, MAs, and rewards. Our experiments show that the algorithm achieves better solution quality than state-of-the-art learning-based methods. We implement two variants of a multi-robot Search and Rescue (SAR) domain (with and without obstacles) on hardware to demonstrate that the learned policies can effectively control a team of distributed robots to cooperate in a partially observable stochastic environment.




Decentralized Control of Partially Observable Markov Decision Processes using Belief Space Macro-actions

The focus of this paper is on solving multi-robot planning problems in c...

Macro-Action-Based Deep Multi-Agent Reinforcement Learning

In real-world multi-robot systems, performing high-quality, collaborativ...

Semantic-level Decentralized Multi-Robot Decision-Making using Probabilistic Macro-Observations

Robust environment perception is essential for decision-making on robots...

Planning for Decentralized Control of Multiple Robots Under Uncertainty

We describe a probabilistic framework for synthesizing control policies ...

CAR-DESPOT: Causally-Informed Online POMDP Planning for Robots in Confounded Environments

Robots operating in real-world environments must reason about possible o...

Efficient Planning under Uncertainty with Macro-actions

Deciding how to act in partially observable environments remains an acti...

Deep Reinforcement Learning for Event-Driven Multi-Agent Decision Processes

The incorporation of macro-actions (temporally extended actions) into mu...
