Stochastic convex optimization for provably efficient apprenticeship learning

12/31/2021
by Angeliki Kamoutsi, et al.

We consider large-scale Markov decision processes (MDPs) with an unknown cost function and use stochastic convex optimization tools to address the problem of imitation learning, that is, learning a policy from a finite set of expert demonstrations. We adopt the apprenticeship learning formalism, which assumes that the true cost function can be represented as a linear combination of known features. Existing inverse reinforcement learning algorithms come with strong theoretical guarantees but are computationally expensive, because they use reinforcement learning or planning algorithms as a subroutine. On the other hand, state-of-the-art policy-gradient-based algorithms (such as IM-REINFORCE, IM-TRPO, and GAIL) achieve significant empirical success on challenging benchmark tasks but are not well understood theoretically. With an emphasis on non-asymptotic performance guarantees, we propose a method that learns a policy directly from expert demonstrations, bypassing the intermediate step of learning the cost function, by formulating the problem as a single convex optimization problem over occupancy measures. We develop a computationally efficient algorithm and derive high-confidence regret bounds on the quality of the extracted policy, using results from stochastic convex optimization and recent work on approximate linear programming for solving forward MDPs.
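
To make the phrase "a single convex optimization problem over occupancy measures" concrete, the following is a minimal sketch of the standard apprenticeship learning formulation, not necessarily the exact program used in the paper. The abstract fixes none of the notation, so every symbol here is an assumption: a finite state-action space, a discount factor \gamma, an initial distribution \nu_0, a transition kernel P, known feature costs c_1, ..., c_n, a convex compact weight set W, and the expert's occupancy measure \mu_E.

```latex
% Assumed cost class: linear combinations of known features c_1, ..., c_n,
% with weights restricted to a convex compact set W (an assumption).
\mathcal{C} = \Big\{ c_w = \sum_{i=1}^{n} w_i c_i \;:\; w \in W \Big\}

% Assumed feasible set: normalized discounted occupancy measures,
% characterized by the Bellman flow constraints.
\mathcal{F} = \Big\{ \mu \ge 0 \;:\;
  \sum_{a} \mu(x', a) = (1-\gamma)\,\nu_0(x')
  + \gamma \sum_{x, a} P(x' \mid x, a)\, \mu(x, a)
  \quad \forall x' \Big\}

% Apprenticeship learning as one convex program over occupancy measures:
% find the occupancy measure whose worst-case cost gap to the expert is smallest.
\min_{\mu \in \mathcal{F}} \; \max_{w \in W} \;
  \sum_{i=1}^{n} w_i \,\langle \mu - \mu_E,\, c_i \rangle

% A policy can then be read off from a feasible \mu by normalization,
% wherever the denominator is positive:
\pi_\mu(a \mid x) = \frac{\mu(x, a)}{\sum_{a'} \mu(x, a')}
```

Under these assumptions the feasible set is a polytope and the objective is a pointwise maximum of functions linear in \mu, hence convex. This is why no inner reinforcement learning or planning loop is needed, and why the cost weights never have to be recovered explicitly.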


