Gradient-Aware Model-based Policy Search

09/09/2019
by   Pierluca D'Oro, et al.
0

Traditional model-based reinforcement learning approaches learn a model of the environment dynamics without explicitly considering how it will be used by the agent. In the presence of misspecified model classes, this can lead to poor estimates, as some relevant available information is ignored. In this paper, we introduce a novel model-based policy search approach that exploits the knowledge of the current agent policy to learn an approximate transition model, focusing on the portions of the environment that are most relevant for policy improvement. We leverage a weighting scheme, derived from the minimization of the error on the model-based policy gradient estimator, in order to define a suitable objective function that is optimized for learning the approximate transition model. Then, we integrate this procedure into a batch policy improvement algorithm, named Gradient-Aware Model-based Policy Search (GAMPS), which iteratively learns a transition model and uses it, together with the collected trajectories, to compute the new policy parameters. Finally, we empirically validate GAMPS on benchmark domains analyzing and discussing its properties.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/28/2020

Policy-Aware Model Learning for Policy Gradient Methods

This paper considers the problem of learning a model in model-based rein...
research
09/19/2019

Revisit Policy Optimization in Matrix Form

In tabular case, when the reward and environment dynamics are known, pol...
research
10/21/2022

Random Actions vs Random Policies: Bootstrapping Model-Based Direct Policy Search

This paper studies the impact of the initial data gathering method on th...
research
05/22/2023

TOM: Learning Policy-Aware Models for Model-Based Reinforcement Learning via Transition Occupancy Matching

Standard model-based reinforcement learning (MBRL) approaches fit a tran...
research
05/23/2016

Learning and Policy Search in Stochastic Dynamical Systems with Bayesian Neural Networks

We present an algorithm for model-based reinforcement learning that comb...
research
06/13/2012

Improving Gradient Estimation by Incorporating Sensor Data

An efficient policy search algorithm should estimate the local gradient ...
research
03/02/2021

Minimax Model Learning

We present a novel off-policy loss function for learning a transition mo...

Please sign up or login with your details

Forgot password? Click here to reset