BetaZero: Belief-State Planning for Long-Horizon POMDPs using Learned Approximations

05/31/2023
by Robert J. Moss, et al.

Real-world planning problems, including autonomous driving and sustainable energy applications such as carbon storage and resource exploration, have recently been modeled as partially observable Markov decision processes (POMDPs) and solved using approximate methods. To solve high-dimensional POMDPs in practice, state-of-the-art methods use online planning with problem-specific heuristics to reduce planning horizons and make the problems tractable. Algorithms that learn approximations to replace such heuristics have recently found success in large-scale problems in the fully observable domain. The key insight is the combination of online Monte Carlo tree search with offline neural network approximations of the optimal policy and value function. In this work, we bring this insight to partially observed domains and propose BetaZero, a belief-state planning algorithm for POMDPs. BetaZero learns offline approximations based on accurate belief models to enable online decision making in long-horizon problems. We address several challenges inherent in large-scale partially observable domains: transitioning in stochastic environments, prioritizing action branching under a limited search budget, and representing beliefs as input to the network. We apply BetaZero to various well-established benchmark POMDPs found in the literature. As a real-world case study, we test BetaZero on the high-dimensional geological problem of critical mineral exploration. Experiments show that BetaZero outperforms state-of-the-art POMDP solvers on a variety of tasks.
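The key insight named in the abstract, online Monte Carlo tree search guided by offline-learned policy and value approximations over belief states, can be sketched in a few lines. The toy problem, the `policy_value_net`, `simulate_step`, and `puct_search` names, and all numeric constants below are illustrative assumptions, not BetaZero's actual implementation; the sketch only shows how a learned prior focuses action branching (a PUCT-style rule) and how a learned value estimate truncates long-horizon rollouts.

```python
import math
import random

# Toy two-state problem: a belief is P(hidden state == 1).
# Everything here is a hypothetical stand-in for the learned components.
ACTIONS = ("listen", "act")

def policy_value_net(belief):
    """Stand-in for an offline-learned network: maps a belief to a
    policy prior over actions and a value estimate."""
    uncertainty = 1.0 - abs(2.0 * belief - 1.0)
    prior = {"listen": 0.5 + 0.4 * uncertainty, "act": 0.5 - 0.4 * uncertainty}
    total = sum(prior.values())
    prior = {a: p / total for a, p in prior.items()}
    value = abs(2.0 * belief - 1.0)  # confident beliefs are worth more
    return prior, value

def simulate_step(belief, action, rng):
    """Toy belief-MDP step: listening sharpens the belief (Bayes rule)
    at a small cost; acting cashes in confidence and ends the episode."""
    if action == "listen":
        like = 0.85 if rng.random() < belief else 0.15
        new_belief = like * belief / (like * belief + (1 - like) * (1 - belief))
        return new_belief, -0.1, False
    return belief, abs(2.0 * belief - 1.0), True

def puct_search(belief, n_simulations=200, c_puct=1.5, seed=0):
    """One-ply Monte Carlo tree search over the belief state: the network
    prior prioritizes action branching under a limited search budget, and
    the network value bootstraps in place of a long rollout."""
    rng = random.Random(seed)
    prior, _ = policy_value_net(belief)
    N = {a: 0 for a in ACTIONS}    # visit counts
    Q = {a: 0.0 for a in ACTIONS}  # running mean returns
    for _ in range(n_simulations):
        total_n = sum(N.values())
        # PUCT rule: exploit Q, explore proportionally to the prior.
        a = max(ACTIONS, key=lambda a: Q[a] + c_puct * prior[a]
                * math.sqrt(total_n + 1) / (1 + N[a]))
        b, r, done = simulate_step(belief, a, rng)
        value = r if done else r + policy_value_net(b)[1]  # bootstrap
        N[a] += 1
        Q[a] += (value - Q[a]) / N[a]
    return max(ACTIONS, key=lambda a: N[a])  # most-visited action
```

With an uncertain belief the search prefers the information-gathering action, and with a confident belief it commits; in BetaZero itself the network is trained offline from search statistics rather than hand-designed as here.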


Related research

- Probabilistic Inference in Planning for Partially Observable Long Horizon Problems (10/18/2021)
  "For autonomous service robots to successfully perform long horizon tasks..."
- Visual Learning-based Planning for Continuous High-Dimensional POMDPs (12/17/2021)
  "The Partially Observable Markov Decision Process (POMDP) is a powerful f..."
- Monte Carlo Planning in Hybrid Belief POMDPs (11/14/2022)
  "Real-world problems often require reasoning about hybrid beliefs, over b..."
- Task-Directed Exploration in Continuous POMDPs for Robotic Manipulation of Articulated Objects (12/08/2022)
  "Representing and reasoning about uncertainty is crucial for autonomous a..."
- Finding Approximate POMDP solutions Through Belief Compression (06/30/2011)
  "Standard value function approaches to finding policies for Partially Obs..."
- On the Linear Belief Compression of POMDPs: A re-examination of current methods (08/05/2015)
  "Belief compression improves the tractability of large-scale partially ob..."
- LeTS-Drive: Driving in a Crowd by Learning from Tree Search (05/29/2019)
  "Autonomous driving in a crowded environment, e.g., a busy traffic inters..."
