Trust Region Value Optimization using Kalman Filtering

01/23/2019
by Shirli Di-Castro Shashua, et al.

Policy evaluation is a key process in reinforcement learning. It assesses a given policy by estimating the corresponding value function. When a parameterized function is used to approximate the value, it is common to optimize the parameters by minimizing the sum of squared Bellman Temporal Difference (TD) errors. However, this approach ignores certain distributional properties of both the errors and the value parameters. Taking these distributions into account during optimization can provide useful information about the confidence of the value estimates. In this work we propose to optimize the value by minimizing a regularized objective function that forms a trust region over its parameters. We present a novel optimization method, the Kalman Optimization for Value Approximation (KOVA), based on the Extended Kalman Filter. KOVA minimizes the regularized objective by adopting a Bayesian perspective over both the value parameters and the noisy observed returns. This distributional treatment provides information on parameter uncertainty in addition to value estimates. We provide theoretical results for our approach and analyze the performance of the proposed optimizer on domains with large state and action spaces.
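
The approach described above can be pictured as an Extended Kalman Filter whose hidden state is the vector of value-function parameters and whose observations are the noisy returns. Below is a minimal NumPy sketch of a single EKF-style parameter update in that spirit; the function name ekf_value_update, the linear-feature toy example, and the noise settings are illustrative assumptions, not the authors' KOVA implementation.

import numpy as np

def ekf_value_update(theta, P, y, h, jac_h, Q, R):
    """One EKF-style step treating the value parameters as the hidden state.

    theta : (d,)   current value-function parameters
    P     : (d, d) parameter covariance (uncertainty estimate)
    y     : (k,)   noisy observed returns / TD targets for a batch of states
    h     : callable, h(theta) -> (k,) predicted values for the batch
    jac_h : callable, jac_h(theta) -> (k, d) Jacobian of h at theta
    Q     : (d, d) process noise; acts like a trust-region term on the parameters
    R     : (k, k) observation-noise covariance of the returns
    """
    # Predict step: parameters follow a random walk, so only the covariance grows.
    P_pred = P + Q

    # Innovation: difference between observed returns and current value predictions.
    H = jac_h(theta)                       # (k, d)
    residual = y - h(theta)                # (k,)

    # Kalman gain balances parameter uncertainty against observation noise.
    S = H @ P_pred @ H.T + R               # (k, k) innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)    # (d, k)

    # Update the parameters and shrink their covariance.
    theta_new = theta + K @ residual
    P_new = (np.eye(len(theta)) - K @ H) @ P_pred
    return theta_new, P_new

# Toy usage with a linear value function V(s) = phi(s) @ theta; here the EKF
# reduces to a plain Kalman filter, whereas a neural value network would supply
# the Jacobian via automatic differentiation.
rng = np.random.default_rng(0)
d, k = 4, 8
phi = rng.normal(size=(k, d))              # features of a batch of states
theta, P = np.zeros(d), np.eye(d)
targets = phi @ np.array([1.0, -0.5, 0.3, 0.0]) + 0.1 * rng.normal(size=k)

theta, P = ekf_value_update(
    theta, P, targets,
    h=lambda t: phi @ t,
    jac_h=lambda t: phi,
    Q=1e-3 * np.eye(d),
    R=0.01 * np.eye(k),
)

In this sketch the process noise Q is what gives the update its trust-region flavor: larger Q lets the parameters move further per step, while the posterior covariance P tracks how confident the filter is in each parameter.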

Related research

02/17/2020 - Kalman meets Bellman: Improving Policy Evaluation through Value Tracking
Policy evaluation is a key process in Reinforcement Learning (RL). It as...

03/07/2017 - Deep Robust Kalman Filter
A Robust Markov Decision Process (RMDP) is a sequential decision making ...

07/31/2020 - Queueing Network Controls via Deep Reinforcement Learning
Novel advanced policy gradient (APG) methods with conservative policy it...

12/23/2019 - Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning
It is well known that quantifying uncertainty in the action-value estima...

05/30/2020 - MM-KTD: Multiple Model Kalman Temporal Differences for Reinforcement Learning
There has been an increasing surge of interest on development of advance...

07/31/2023 - Moreau-Yoshida Variational Transport: A General Framework For Solving Regularized Distributional Optimization Problems
We consider a general optimization problem of minimizing a composite obj...

04/16/2014 - An Analysis of State-Relevance Weights and Sampling Distributions on L1-Regularized Approximate Linear Programming Approximation Accuracy
Recent interest in the use of L_1 regularization in the use of value fun...
