POLY-HOOT: Monte-Carlo Planning in Continuous Space MDPs with Non-Asymptotic Analysis

06/08/2020
by Weichao Mao, et al.

Monte-Carlo planning, as exemplified by Monte-Carlo Tree Search (MCTS), has demonstrated remarkable performance in applications with finite spaces. In this paper, we consider Monte-Carlo planning in an environment with continuous state-action spaces, a much less understood problem with important applications in control and robotics. We introduce POLY-HOOT, an algorithm that augments MCTS with a continuous-armed bandit strategy named Hierarchical Optimistic Optimization (HOO) (Bubeck et al., 2011). Specifically, we enhance HOO by using an appropriate polynomial, rather than logarithmic, bonus term in the upper confidence bounds. Such a polynomial bonus is motivated by its empirical successes in AlphaGo Zero (Silver et al., 2017b), as well as its significant role in achieving theoretical guarantees for finite-space MCTS (Shah et al., 2019). We investigate, for the first time, the regret of the enhanced HOO algorithm in non-stationary bandit problems. Using this result as a building block, we establish non-asymptotic convergence guarantees for POLY-HOOT: the value estimate converges to an arbitrarily small neighborhood of the optimal value function at a polynomial rate. We further provide experimental results that corroborate our theoretical findings.
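To make the contrast between a logarithmic and a polynomial bonus concrete, the following is a minimal sketch rather than the paper's implementation. The logarithmic form follows the standard HOO upper bound of Bubeck et al. (2011). The polynomial form `c * n**a / t**b` and its constants `c`, `a`, `b`, as well as the function names, are illustrative assumptions; the abstract does not state the exact exponents used by POLY-HOOT.

```python
import math
from typing import Callable


def log_bonus(n: int, t: int) -> float:
    """Logarithmic bonus of standard HOO: sqrt(2 ln n / t).

    n is the total number of rounds so far, t the visit count of the node.
    """
    return math.sqrt(2.0 * math.log(n) / t)


def poly_bonus(n: int, t: int, c: float = 1.0, a: float = 0.25, b: float = 0.5) -> float:
    """Polynomial bonus c * n**a / t**b.

    ASSUMPTION: c, a, b are placeholder values chosen for illustration only;
    they are not taken from the paper.
    """
    return c * n ** a / t ** b


def upper_bound(mean: float, n: int, t: int, depth: int,
                bonus: Callable[[int, int], float],
                nu1: float = 1.0, rho: float = 0.5) -> float:
    """HOO-style upper bound: empirical mean + exploration bonus + nu1 * rho**depth.

    The last term bounds the variation of the reward inside a node at the
    given depth; nu1 and rho are smoothness parameters (placeholder values here).
    """
    return mean + bonus(n, t) + nu1 * rho ** depth


if __name__ == "__main__":
    # A rarely visited node keeps a much larger bonus under the polynomial form.
    n, t = 10**6, 10
    print("log bonus: ", log_bonus(n, t))
    print("poly bonus:", poly_bonus(n, t))
```

With a fixed visit count, the polynomial bonus grows faster in `n` than the logarithmic one, so under-explored nodes are revisited more aggressively; how the exponents should be tuned is exactly what the paper's analysis addresses.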


