Optimal Exploration for Model-Based RL in Nonlinear Systems

by Andrew Wagenmaker et al.

Learning to control unknown nonlinear dynamical systems is a fundamental problem in reinforcement learning and control theory. A commonly applied approach is to first explore the environment (exploration), learn an accurate model of it (system identification), and then compute a controller that minimizes cost on this estimated system (policy optimization). While existing work has shown that it is possible to learn a uniformly good model of the system <cit.>, in practice, if we aim to learn a good controller with low cost on the actual system, certain system parameters may be significantly more critical than others, and we therefore ought to focus our exploration on learning such parameters. In this work, we consider the setting of nonlinear dynamical systems and seek to formally quantify, in such settings, (a) which parameters are most relevant to learning a good controller, and (b) how we can best explore so as to minimize uncertainty in such parameters. Inspired by recent work in linear systems <cit.>, we show that minimizing the controller loss in nonlinear systems translates to estimating the system parameters in a particular, task-dependent metric. Motivated by this, we develop an algorithm that efficiently explores the system to reduce uncertainty in this metric, and prove a lower bound showing that our approach learns a controller at a near-instance-optimal rate. Our algorithm relies on a general reduction from policy optimization to optimal experiment design in arbitrary systems, and may be of independent interest. We conclude with experiments demonstrating the effectiveness of our method on realistic nonlinear robotic systems.
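The core idea in the abstract, that exploration should reduce parameter uncertainty in a task-dependent metric rather than uniformly, can be illustrated with a small numerical sketch. The sketch below is illustrative only and does not reproduce the paper's algorithm: it assumes toy dynamics linear in the unknown parameters, a hypothetical task Hessian `H` encoding how sensitive the controller's cost is to each parameter, and scores candidate exploration policies by the task-weighted uncertainty tr(H Σ), where Σ is the least-squares parameter covariance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting (illustrative assumption, not the paper's model):
# dynamics linear in unknown parameters theta,
#   x_{t+1} = theta^T phi(x_t, u_t) + noise,
# so the least-squares estimate of theta has covariance
#   Sigma = sigma^2 (Phi^T Phi)^{-1},
# where Phi stacks the features visited during exploration.

def estimate_covariance(Phi, noise_var=1.0):
    """Covariance of the least-squares parameter estimate."""
    return noise_var * np.linalg.inv(Phi.T @ Phi)

def task_weighted_uncertainty(Phi, H, noise_var=1.0):
    """tr(H Sigma): excess controller cost is governed by parameter
    error measured in the task-dependent metric H, not uniformly."""
    return np.trace(H @ estimate_covariance(Phi, noise_var))

# Two candidate exploration policies, each producing a batch of features.
Phi_a = rng.normal(size=(50, 3)) * np.array([3.0, 1.0, 0.1])  # excites dim 0
Phi_b = rng.normal(size=(50, 3)) * np.array([0.1, 1.0, 3.0])  # excites dim 2

# Hypothetical task Hessian: the controller's cost is most
# sensitive to parameter 2.
H = np.diag([0.01, 0.1, 10.0])

# The task-aware criterion prefers the policy that shrinks uncertainty
# in the directions the controller actually cares about.
score_a = task_weighted_uncertainty(Phi_a, H)
score_b = task_weighted_uncertainty(Phi_b, H)
best = "b" if score_b < score_a else "a"
```

Here policy `b` wins under the task-weighted criterion because it excites the direction with the largest weight in `H`, even though both policies would look similar under a uniform (identity-metric) criterion; choosing exploration inputs to minimize such a criterion is an instance of optimal experiment design.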




Related papers:

- Task-Optimal Exploration in Linear Dynamical Systems
- Analytic Estimation of Region of Attraction of an LQR Controller for Torque Limited Simple Pendulum
- Automatic Policy Synthesis to Improve the Safety of Nonlinear Dynamical Systems
- Stabilizing Dynamical Systems via Policy Gradient Methods
- CACTO: Continuous Actor-Critic with Trajectory Optimization – Towards Global Optimality
- Synthesis of Feedback Controller for Nonlinear Control Systems with Optimal Region of Attraction
- Agnostic System Identification for Model-Based Reinforcement Learning
