Monte Carlo Matrix Inversion Policy Evaluation

10/19/2012
by Fletcher Lu et al.

Forsythe and Leibler (1950) introduced a statistical technique for finding the inverse of a matrix by characterizing the elements of the matrix inverse as expected values of a sequence of random walks. Barto and Duff (1994) subsequently showed how this technique relates to standard dynamic programming and temporal-difference methods. The advantage of the Monte Carlo matrix inversion (MCMI) approach is that it scales better with state-space size than the alternatives. In this paper, we introduce an algorithm for performing reinforcement learning policy evaluation using MCMI. We demonstrate that MCMI improves on the runtime of a maximum-likelihood model-based policy evaluation approach, and on both the runtime and accuracy of temporal-difference (TD) policy evaluation. We further improve MCMI policy evaluation by adding an importance sampling technique to our algorithm to reduce the variance of our estimator. Lastly, we illustrate techniques for scaling MCMI up to large state spaces in order to perform policy improvement.
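
The identity behind MCMI is concrete enough to sketch. For a fixed policy with transition matrix P, reward vector R, and discount factor gamma, the value function is V = (I - gamma*P)^{-1} R, and entry (i, j) of the Neumann series sum_t gamma^t P^t equals the expected number of visits to state j by a random walk that starts at i and survives each step with probability gamma. The Python sketch below is our own minimal illustration of that estimator, not the paper's algorithm (which is model-free), and it assumes a small, fully known transition matrix P; the function name and signature are hypothetical.

    import numpy as np

    def mcmi_policy_evaluation(P, R, gamma, n_walks=5000, seed=0):
        """Estimate V = (I - gamma*P)^{-1} R by Monte Carlo matrix inversion."""
        rng = np.random.default_rng(seed)
        n = len(R)
        N = np.zeros((n, n))          # Monte Carlo estimate of (I - gamma*P)^{-1}
        for i in range(n):
            for _ in range(n_walks):
                s = i
                N[i, s] += 1.0        # every walk visits its start state at time 0
                while rng.random() < gamma:       # survive each step with probability gamma
                    s = rng.choice(n, p=P[s])     # step according to the policy's kernel
                    N[i, s] += 1.0
        N /= n_walks
        return N @ R                  # V is approximately (I - gamma*P)^{-1} R

    # Sanity check against the exact linear solve on a two-state chain:
    P = np.array([[0.9, 0.1], [0.2, 0.8]])
    R = np.array([1.0, 0.0])
    print(mcmi_policy_evaluation(P, R, gamma=0.9))
    print(np.linalg.solve(np.eye(2) - 0.9 * P, R))

Because each row of N is estimated from independent walks, the cost grows with the number of walks rather than with a matrix factorization, which is the source of the favorable scaling in state-space size noted above.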
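
The abstract's variance-reduction step can be sketched in the same style: drive the walks with a proposal kernel Q instead of P and reweight each visit by the accumulated likelihood ratio, a standard importance-sampling construction. Again the names and signature below are ours, and Q must put positive probability on every transition that P does.

    import numpy as np

    def mcmi_is_policy_evaluation(P, Q, R, gamma, n_walks=5000, seed=0):
        """MCMI with importance sampling: walks follow Q, and visits are
        reweighted by P/Q, so the estimator still targets (I - gamma*P)^{-1} R."""
        rng = np.random.default_rng(seed)
        n = len(R)
        N = np.zeros((n, n))
        for i in range(n):
            for _ in range(n_walks):
                s, w = i, 1.0
                N[i, s] += w
                while rng.random() < gamma:
                    s_next = rng.choice(n, p=Q[s])       # sample from the proposal
                    w *= P[s, s_next] / Q[s, s_next]     # correct by the likelihood ratio
                    s = s_next
                    N[i, s] += w
        N /= n_walks
        return N @ R

With Q = P the weights stay at 1 and this reduces to the plain sketch above; a proposal that steers walks toward rarely visited states is what allows the variance reduction the paper describes.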

