Monte Carlo Matrix Inversion Policy Evaluation

10/19/2012


Forsythe and Leibler (1950) introduced a statistical technique for inverting a matrix by characterizing the elements of the inverse as expected values over a sequence of random walks. Barto and Duff (1994) subsequently related this technique to standard dynamic programming and temporal differencing (TD) methods. The advantage of the Monte Carlo matrix inversion (MCMI) approach is that it scales better with state-space size than alternative techniques. In this paper, we introduce an algorithm for reinforcement learning policy evaluation using MCMI. We demonstrate that MCMI improves on the runtime of a maximum likelihood model-based policy evaluation approach and on both the runtime and accuracy of the TD policy evaluation approach. We further improve MCMI policy evaluation by adding importance sampling to reduce the variance of our estimator. Lastly, we illustrate techniques for scaling MCMI up to large state spaces in order to perform policy improvement.
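To make the random-walk characterization concrete, here is a minimal sketch of one textbook way to estimate the value function V = (I - gamma P)^{-1} R by sampling. It is not the algorithm of the paper, and it omits the importance sampling step the abstract mentions. It assumes the transition matrix `P` and reward vector `R` under the fixed policy are known; the function name `mc_value_estimate`, the three-state chain, and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np

def mc_value_estimate(P, R, gamma, start, n_walks=4000, seed=0):
    """Estimate row `start` of (I - gamma*P)^{-1} @ R with random walks.

    Each walk follows P and stops after every step with probability
    1 - gamma, so a walk is still alive at step t with probability
    gamma**t. The expected undiscounted reward sum along a walk is then
    sum_t gamma**t * (P**t @ R)[start], the Neumann series of the inverse.
    """
    rng = np.random.default_rng(seed)
    n_states = P.shape[0]
    total = 0.0
    for _ in range(n_walks):
        s = start
        while True:
            total += R[s]                     # collect reward in current state
            if rng.random() >= gamma:         # terminate with prob. 1 - gamma
                break
            s = rng.choice(n_states, p=P[s])  # sample the next state from row s
    return total / n_walks

# Hypothetical 3-state Markov chain induced by some fixed policy.
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])
R = np.array([0.0, 1.0, 2.0])
gamma = 0.5

# Direct linear solve, used here only as a reference for comparison.
exact = np.linalg.solve(np.eye(3) - gamma * P, R)
estimate = np.array([mc_value_estimate(P, R, gamma, s) for s in range(3)])
print("exact   :", exact)
print("estimate:", estimate)
```

Each call touches only the states its walks visit, which gives an intuition for why sampling of this kind can avoid the cost of a full matrix inverse when only a few entries of the solution are needed.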



Related Research

Policy Learning and Evaluation with Randomized Quasi-Monte Carlo

Monte Carlo Rollout Policy for Recommendation Systems with Dynamic User Behavior

Model-based Policy Search for Partially Measurable Systems

Multilevel Monte Carlo estimation of expected information gains

Parallel Selected Inversion for Space-Time Gaussian Markov Random Fields

Inferring Smooth Control: Monte Carlo Posterior Policy Iteration with Gaussian Processes

The Accuracy vs. Sampling Overhead Trade-off in Quantum Error Mitigation Using Monte Carlo-Based Channel Inversion