On learning history based policies for controlling Markov decision processes

by Gandharv Patil, et al.

Reinforcement learning (RL) folklore suggests that history-based function approximation methods, such as recurrent neural networks or history-based state abstraction, perform better than their memory-less counterparts, because function approximation in a Markov decision process (MDP) can be viewed as inducing a partially observable MDP. However, there has been little formal analysis of such history-based algorithms, as most existing frameworks focus exclusively on memory-less features. In this paper, we introduce a theoretical framework for studying the behaviour of RL algorithms that learn to control an MDP using history-based feature abstraction mappings. Furthermore, we use this framework to design a practical RL algorithm, and we numerically evaluate its effectiveness on a set of continuous control tasks.
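To make the core idea concrete, the following is a minimal sketch (not the paper's algorithm) of a history-based feature abstraction: the last k observations are mapped to a single abstract state, over which ordinary tabular Q-learning is run. The `TMaze` environment, `HistoryFeatureMap` class, and `train` function are all illustrative constructions for this example; the environment aliases its observations so that a memory-less policy cannot act optimally, while a k=2 history window recovers the reward-relevant cue.

```python
import random
from collections import deque

class TMaze:
    """Tiny aliased environment: a binary cue is visible only at the first
    step, and the rewarded action at the final step must match that cue.
    A memory-less policy sees only the aliased observation 2 when it matters."""
    def reset(self):
        self.t = 0
        self.cue = random.randint(0, 1)
        return self.cue                      # cue observation
    def step(self, a):
        if self.t == 0:
            self.t = 1
            return 2, 0.0, False             # aliased corridor observation
        return 2, (1.0 if a == self.cue else 0.0), True

class HistoryFeatureMap:
    """History-based feature abstraction: map the last k observations
    to one abstract state (here, simply their tuple)."""
    def __init__(self, k):
        self.buffer = deque(maxlen=k)
    def reset(self, obs):
        self.buffer.clear()
        for _ in range(self.buffer.maxlen):
            self.buffer.append(obs)
        return tuple(self.buffer)
    def step(self, obs):
        self.buffer.append(obs)
        return tuple(self.buffer)

def train(env, actions, k=2, episodes=1000, alpha=0.5, gamma=0.9, eps=0.2):
    """Epsilon-greedy tabular Q-learning over history-abstracted states."""
    phi = HistoryFeatureMap(k)
    Q = {}
    for _ in range(episodes):
        s = phi.reset(env.reset())
        done = False
        while not done:
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda b: Q.get((s, b), 0.0))
            obs, r, done = env.step(a)
            s2 = phi.step(obs)
            target = r if done else r + gamma * max(Q.get((s2, b), 0.0)
                                                    for b in actions)
            Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
            s = s2
    return Q, phi
```

Running `train(TMaze(), actions=[0, 1], k=2)` and then acting greedily with respect to the learned Q-table solves the cue-matching task, whereas the same agent with k=1 (memory-less features) cannot distinguish the two aliased decision states; this is the POMDP view of function approximation that the abstract refers to.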


