Batch Policy Learning under Constraints

03/20/2019
by Hoang M. Le, et al.
When learning policies for real-world domains, two important questions arise: (i) how to efficiently use pre-collected, off-policy, suboptimal behavior data; and (ii) how to mediate among competing objectives and constraints. We thus study the problem of batch policy learning under multiple constraints, and offer a systematic solution. We first propose a flexible meta-algorithm that admits any batch reinforcement learning procedure and any online learning procedure as subroutines. We then present a specific algorithmic instantiation and provide performance guarantees for the main objective and all constraints. To certify constraint satisfaction, we propose a new and simple method for off-policy policy evaluation (OPE) and derive PAC-style bounds. Our algorithm achieves strong empirical results in different domains, including a challenging simulated car-driving problem subject to multiple constraints such as lane keeping and smooth driving. We also show experimentally that our OPE method outperforms other popular OPE techniques on a standalone basis, especially in a high-dimensional setting.
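To make the shape of such a meta-algorithm concrete, here is a minimal toy sketch in Python. It is not the paper's algorithm. It assumes a small finite set of candidate policies whose main cost and constraint costs have already been estimated from the batch data, so the "batch RL" subroutine reduces to picking the candidate that minimizes a Lagrangian, and the "online learning" subroutine is an exponentiated-gradient update on the multipliers. The function name `constrained_policy_search` and the parameters `bound`, `eta`, and `iters` are hypothetical.

```python
import math


def constrained_policy_search(candidates, thresholds, bound=10.0, eta=0.5, iters=200):
    """Toy Lagrangian game between a policy player and a multiplier player.

    candidates: list of (main_cost, [constraint_costs]) pairs, one per policy.
                ASSUMPTION: these costs were already estimated off-policy.
    thresholds: upper limits, one per constraint.
    Returns mixture weights over the candidates (a randomized policy).
    """
    m = len(thresholds)
    # Multipliers lie on a simplex scaled by `bound`; the extra last
    # coordinate is slack, so the multipliers may sum to less than `bound`.
    weights = [1.0] * (m + 1)
    counts = [0] * len(candidates)

    for _ in range(iters):
        total = sum(weights)
        lam = [bound * w / total for w in weights[:m]]

        def lagrangian(candidate):
            main, cons = candidate
            penalty = sum(l * (g - t) for l, g, t in zip(lam, cons, thresholds))
            return main + penalty

        # Policy player: best response to the current multipliers
        # (stand-in for a batch RL subroutine).
        best = min(range(len(candidates)), key=lambda i: lagrangian(candidates[i]))
        counts[best] += 1

        # Multiplier player: exponentiated-gradient step that raises the
        # multiplier of every constraint the chosen policy violates.
        _, cons = candidates[best]
        for j in range(m):
            weights[j] *= math.exp(eta * (cons[j] - thresholds[j]))

        # Renormalize for numerical stability; the ratios are unchanged.
        total = sum(weights)
        weights = [w / total for w in weights]

    # Average of the best responses played so far.
    return [c / iters for c in counts]
```

For example, with one cheap candidate that violates a single constraint and one expensive candidate that satisfies it, the returned mixture places only as much weight on the cheap candidate as the constraint allows. In a realistic setting the costs would themselves come from an OPE routine, which is the step this sketch takes as given.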

research
06/03/2020

Causality and Batch Reinforcement Learning: Complementary Approaches To Planning In Unknown Domains

Reinforcement learning algorithms have had tremendous successes in onlin...
research
06/20/2020

Accelerating Safe Reinforcement Learning with Constraint-mismatched Policies

We consider the problem of reinforcement learning when provided with a b...
research
02/19/2020

Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning

Off-policy reinforcement learning algorithms promise to be applicable in...
research
05/31/2021

Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs

We study the problem of Safe Policy Improvement (SPI) under constraints ...
research
03/19/2021

On a probabilistic approach to synthesize control policies from example datasets

This paper is concerned with the design of control policies from example...
research
10/20/2020

Robust Constrained Reinforcement Learning for Continuous Control with Model Misspecification

Many real-world physical control systems are required to satisfy constra...
research
05/05/2023

Knowledge Transfer from Teachers to Learners in Growing-Batch Reinforcement Learning

Standard approaches to sequential decision-making exploit an agent's abi...
