Improving Gradient Estimation by Incorporating Sensor Data

06/13/2012

by Gregory Lawrence, et al.

An efficient policy search algorithm should estimate the local gradient of the objective function, with respect to the policy parameters, from as few trials as possible. Whereas most policy search methods estimate this gradient by observing the rewards obtained during policy trials, we show, both theoretically and empirically, that taking the sensor data into account as well gives better gradient estimates and hence faster learning. The reason is that rewards obtained during policy execution vary from trial to trial due to noise in the environment; sensor data, which correlates with the noise, can be used to partially correct for this variation, resulting in an estimator with lower variance.
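The variance argument above can be illustrated with a small toy experiment. The sketch below is not the authors' estimator. It assumes a linear reward model, Gaussian environment noise, and a scalar sensor reading that is a noisy copy of that noise and is independent of the parameter perturbation. All names (`estimate_gradient`, `run_trials`, `compare`) and all numeric settings are hypothetical. The gradient is estimated by least-squares regression of rewards on parameter perturbations, once without and once with the sensor reading as an extra regressor.

```python
import numpy as np


def estimate_gradient(dtheta, rewards, sensors=None):
    """Least-squares gradient estimate from perturbed policy trials.

    dtheta:  (n, d) parameter perturbations, one row per trial
    rewards: (n,) observed rewards
    sensors: optional (n,) or (n, k) sensor features used as extra regressors
    """
    n, d = dtheta.shape
    columns = [np.ones((n, 1)), dtheta]  # intercept + perturbations
    if sensors is not None:
        columns.append(np.asarray(sensors).reshape(n, -1))
    design = np.hstack(columns)
    coef, *_ = np.linalg.lstsq(design, rewards, rcond=None)
    return coef[1:1 + d]  # coefficients on the perturbations


def run_trials(rng, true_grad, n_trials, step=0.1, noise_scale=1.0,
               sensor_noise=0.1):
    """Simulate trials under the toy model (an assumption, not the paper's)."""
    d = true_grad.shape[0]
    dtheta = step * rng.standard_normal((n_trials, d))
    env_noise = noise_scale * rng.standard_normal(n_trials)
    rewards = dtheta @ true_grad + env_noise
    # The sensor sees the environment noise, corrupted by measurement noise.
    sensors = env_noise + sensor_noise * rng.standard_normal(n_trials)
    return dtheta, rewards, sensors


def compare(n_repeats=200, n_trials=20, seed=0):
    """Mean squared gradient error without and with the sensor regressor."""
    rng = np.random.default_rng(seed)
    true_grad = np.array([1.0, -2.0, 0.5])
    err_rewards_only, err_with_sensors = [], []
    for _ in range(n_repeats):
        dtheta, rewards, sensors = run_trials(rng, true_grad, n_trials)
        g_plain = estimate_gradient(dtheta, rewards)
        g_sensor = estimate_gradient(dtheta, rewards, sensors)
        err_rewards_only.append(np.sum((g_plain - true_grad) ** 2))
        err_with_sensors.append(np.sum((g_sensor - true_grad) ** 2))
    return float(np.mean(err_rewards_only)), float(np.mean(err_with_sensors))


if __name__ == "__main__":
    mse_plain, mse_sensor = compare()
    print("MSE, rewards only:", mse_plain)
    print("MSE, with sensors:", mse_sensor)
```

In this toy setting the sensor regressor absorbs most of the trial-to-trial reward variation, so the remaining residual that corrupts the gradient coefficients is much smaller. In a real system the sensor data may also depend on the policy parameters, which this sketch deliberately ignores.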

Gregory Lawrence
Stuart Russell

