A Surprisingly Simple Continuous-Action POMDP Solver: Lazy Cross-Entropy Search Over Policy Trees

05/14/2023
by Marcus Hoerger, et al.

The Partially Observable Markov Decision Process (POMDP) provides a principled framework for decision making in stochastic, partially observable environments. However, computing good solutions for problems with continuous action spaces remains challenging. To ease this challenge, we propose a simple online POMDP solver called Lazy Cross-Entropy Search Over Policy Trees (LCEOPT). At each planning step, our method uses a lazy Cross-Entropy method to search the space of policy trees, which provide a simple policy representation. Specifically, we maintain a distribution over promising finite-horizon policy trees. This distribution is updated iteratively by sampling policies from it, evaluating them via Monte Carlo simulation, and refitting the distribution to the top-performing policies. Our method is lazy in the sense that it exploits the policy tree representation to avoid redundant computations in policy sampling, evaluation, and distribution update, which leads to computational savings of up to two orders of magnitude. LCEOPT is surprisingly simple compared with existing state-of-the-art methods, yet it empirically outperforms them on several continuous-action POMDP problems, particularly those with higher-dimensional action spaces.
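The sample, evaluate, refit loop described in the abstract can be illustrated with a minimal sketch of the plain Cross-Entropy method. This is not the authors' LCEOPT: it searches over a flat parameter vector rather than policy trees, it has none of the lazy optimizations, and the names `cross_entropy_search` and `noisy_return`, the Gaussian sampling distribution, and the toy objective are all assumptions made for illustration.

```python
import random
import statistics


def cross_entropy_search(evaluate, dim, iterations=30, population=60,
                         elite_frac=0.2, seed=0):
    """Plain Cross-Entropy search with an independent Gaussian per dimension.

    Hypothetical helper for illustration only; it is not the LCEOPT algorithm.
    """
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [1.0] * dim
    n_elite = max(2, int(population * elite_frac))
    for _ in range(iterations):
        # 1. Sample candidate parameter vectors from the current distribution.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(population)]
        # 2. Score each candidate once with a noisy Monte Carlo estimate.
        ranked = sorted(samples, key=lambda theta: evaluate(theta, rng),
                        reverse=True)
        elites = ranked[:n_elite]
        # 3. Refit the distribution to the top-performing candidates.
        for d in range(dim):
            column = [theta[d] for theta in elites]
            mean[d] = statistics.fmean(column)
            std[d] = statistics.pstdev(column) + 1e-3  # keep some exploration
    return mean


def noisy_return(theta, rng, rollouts=5):
    """Toy stand-in for a simulator: average of noisy returns over rollouts."""
    target = [0.5, -0.3]  # assumed optimum of the toy objective
    total = 0.0
    for _ in range(rollouts):
        cost = sum((t - g) ** 2 for t, g in zip(theta, target))
        total += -cost + rng.gauss(0.0, 0.02)
    return total / rollouts


best = cross_entropy_search(noisy_return, dim=2)
print(best)
```

In an actual POMDP planner, the evaluation step would instead simulate a sampled policy from states drawn from the current belief, which is where most of the computation goes and where avoiding redundant work would pay off.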

research
10/07/2020

Bayesian Optimized Monte Carlo Planning

Online solvers for partially observable Markov decision processes have d...
research
09/13/2022

Adaptive Discretization using Voronoi Trees for Continuous-Action POMDPs

Solving Partially Observable Markov Decision Processes (POMDPs) with con...
research
02/21/2023

Adaptive Discretization using Voronoi Trees for Continuous POMDPs

Solving continuous Partially Observable Markov Decision Processes (POMDP...
research
11/04/2020

An On-Line POMDP Solver for Continuous Observation Spaces

Planning under partial observability is essential for autonomous robots. ...
research
03/25/2019

Q-Learning for Continuous Actions with Cross-Entropy Guided Policies

Off-Policy reinforcement learning (RL) is an important class of methods ...
research
10/07/2020

Improved POMDP Tree Search Planning with Prioritized Action Branching

Online solvers for partially observable Markov decision processes have d...
research
03/16/2017

Scalable Accelerated Decentralized Multi-Robot Policy Search in Continuous Observation Spaces

This paper presents the first ever approach for solving continuous-obser...
