Online Reinforcement Learning for Real-Time Exploration in Continuous State and Action Markov Decision Processes

12/12/2016
by Ludovic Hofer, et al.

This paper presents a new method to learn online policies in continuous-state, continuous-action, model-free Markov decision processes, with two properties that are crucial for practical applications. First, the policies are implementable at a very low computational cost: once the policy is computed, the action corresponding to a given state is obtained in logarithmic time with respect to the number of samples used. Second, our method is versatile: it does not rely on any a priori knowledge of the structure of optimal policies. We build upon the Fitted Q-iteration algorithm, which represents the Q-value as the average of several regression trees. Our algorithm, the Fitted Policy Forest algorithm (FPF), computes a regression forest representing the Q-value and transforms it into a single tree representing the policy, while keeping control of the size of the policy using resampling and leaf merging. We introduce an adaptation of Multi-Resolution Exploration (MRE) which is particularly suited to FPF. We assess the performance of FPF on three classical benchmarks for reinforcement learning, the "Inverted Pendulum", the "Double Integrator" and "Car on the Hill", and show that FPF equals or outperforms other algorithms, even though these algorithms rely on particular representations of the policies, chosen specifically to fit each of the three problems. Finally, we show that the combination of FPF and MRE makes it possible to find nearly optimal solutions in problems where ϵ-greedy approaches would fail.
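
The logarithmic-time claim can be illustrated with a minimal sketch. It assumes a one-dimensional state, constant actions stored in the leaves, and a perfectly balanced tree; none of this comes from the paper. The names `Node`, `build_balanced` and `act` are hypothetical, and the construction is only a stand-in for the forest-to-tree step, not the FPF procedure itself. The point is that evaluating a tree policy is a single root-to-leaf walk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical policy-tree node: a split, or a leaf holding an action."""
    dim: int = 0                      # state dimension tested at a split
    threshold: float = 0.0            # split value
    low: Optional["Node"] = None      # child for state[dim] <= threshold
    high: Optional["Node"] = None     # child for state[dim] > threshold
    action: Optional[float] = None    # set only on leaves


def build_balanced(samples):
    """Build a balanced tree from (state, action) pairs sorted by a 1-D state.

    Illustrative stand-in only; this is NOT how FPF derives its policy tree.
    """
    if len(samples) == 1:
        return Node(action=samples[0][1])
    mid = len(samples) // 2
    # Split halfway between the two samples adjacent to the cut.
    threshold = (samples[mid - 1][0] + samples[mid][0]) / 2.0
    return Node(dim=0, threshold=threshold,
                low=build_balanced(samples[:mid]),
                high=build_balanced(samples[mid:]))


def act(root, state):
    """Return (action, comparisons made) using one root-to-leaf walk."""
    node, steps = root, 0
    while node.action is None:
        node = node.low if state[node.dim] <= node.threshold else node.high
        steps += 1
    return node.action, steps


# Toy data (assumed): 1024 states on [-1, 1) with a placeholder two-valued action.
samples = [(-1.0 + 2.0 * i / 1024, -1.0 if i < 512 else 1.0) for i in range(1024)]
policy = build_balanced(samples)
action, steps = act(policy, [0.3])
```

With 1024 leaves, the lookup above makes 10 comparisons, i.e. log2(1024). A real regression tree is not guaranteed to be balanced, so this cost should be read as the typical case under that assumption.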

research
09/17/2013

Models and algorithms for skip-free Markov decision processes on trees

We introduce a class of models for multidimensional control problems whi...
research
08/22/2019

Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes

Off-policy evaluation (OPE) in reinforcement learning allows one to eval...
research
05/24/2023

Regret-Optimal Model-Free Reinforcement Learning for Discounted MDPs with Short Burn-In Time

A crucial problem in reinforcement learning is learning the optimal poli...
research
09/05/2015

Reinforcement Learning with Parameterized Actions

We introduce a model-free algorithm for learning in Markov decision proc...
research
03/29/2016

Algorithms for Batch Hierarchical Reinforcement Learning

Hierarchical Reinforcement Learning (HRL) exploits temporal abstraction ...
research
06/03/2021

A Provably-Efficient Model-Free Algorithm for Constrained Markov Decision Processes

This paper presents the first model-free, simulator-free reinforcement l...
research
01/30/2013

Flexible Decomposition Algorithms for Weakly Coupled Markov Decision Problems

This paper presents two new approaches to decomposing and solving large ...
