Instrumental Variable Value Iteration for Causal Offline Reinforcement Learning

02/19/2021
by Luofeng Liao, et al.

In offline reinforcement learning (RL), an optimal policy is learned solely from previously collected observational data. In such data, however, actions are often confounded by unobserved variables. In the context of RL, instrumental variables (IVs) are variables whose influence on the state variables is entirely mediated through the action. When a valid instrument is present, the confounded transition dynamics can be recovered from observational data. We study a confounded Markov decision process in which the transition dynamics admit an additive nonlinear functional form. Using IVs, we derive a conditional moment restriction (CMR) through which the transition dynamics can be identified from observational data. We propose a provably efficient IV-aided Value Iteration (IVVI) algorithm based on a primal-dual reformulation of the CMR. To the best of our knowledge, this is the first provably efficient algorithm for instrument-aided offline RL.
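
To make the role of the instrument concrete, here is a minimal sketch of a deliberately simplified linear, one-step setting. It is not the IVVI algorithm and not the paper's nonlinear model; it swaps in classical two-stage least squares to show why a naive regression of the next state on the action is biased under confounding, and how an instrument removes that bias. Every name in it (`simulate`, `ols`, `two_stage_ls`, the coefficient `theta`, and the data-generating process itself) is a hypothetical choice made for illustration.

```python
import numpy as np

def simulate(n=20000, theta=0.8, seed=0):
    """Draw confounded one-step transitions (assumed toy model)."""
    rng = np.random.default_rng(seed)
    s = rng.normal(size=n)                  # current state (observed)
    z = rng.normal(size=n)                  # instrument (observed)
    u = rng.normal(size=n)                  # confounder (unobserved)
    # The action depends on both the instrument and the confounder.
    a = z + u + 0.5 * rng.normal(size=n)
    # Additive transition: the noise term u is correlated with the action,
    # while z affects s_next only through a.
    s_next = theta * a + 0.5 * s + u
    return s, a, z, s_next

def ols(X, y):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def two_stage_ls(X, Z, y):
    """Two-stage least squares: project X onto Z, then regress y on the projection."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]

s, a, z, s_next = simulate()
X = np.column_stack([a, s])   # regressors: action and state
Z = np.column_stack([z, s])   # instruments: z replaces the confounded action
theta_ols = ols(X, s_next)[0]             # biased by the confounder
theta_iv = two_stage_ls(X, Z, s_next)[0]  # close to the true theta
print(f"OLS estimate: {theta_ols:.3f}, IV estimate: {theta_iv:.3f}")
```

In this toy model the moment condition being exploited is that the additive noise is uncorrelated with the instrument, which is a linear special case of a conditional moment restriction. The nonlinear setting described in the abstract requires a different estimator.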


research
05/26/2022

Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes

We study offline reinforcement learning (RL) in partially observable Mar...
research
03/06/2021

Causal Reinforcement Learning: An Instrumental Variable Approach

In the standard data analysis framework, data is first collected (once f...
research
06/22/2020

Provably Efficient Causal Reinforcement Learning with Confounded Observational Data

Empowered by expressive function approximators such as neural networks, ...
research
09/18/2022

Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes

We study the offline reinforcement learning (RL) in the face of unmeasur...
research
12/12/2019

Provably Efficient Exploration in Policy Optimization

While policy-based reinforcement learning (RL) achieves tremendous succe...
research
02/24/2023

Provably Efficient Neural Offline Reinforcement Learning via Perturbed Rewards

We propose a novel offline reinforcement learning (RL) algorithm, namely...
research
02/22/2022

Off-Policy Confidence Interval Estimation with Confounded Markov Decision Process

This paper is concerned with constructing a confidence interval for a ta...
