On the Search for Feedback in Reinforcement Learning

02/21/2020
by Ran Wang, et al.

This paper addresses the problem of learning the optimal feedback policy for a nonlinear stochastic dynamical system with continuous state space, continuous action space and unknown dynamics. Feedback policies are complex objects that typically need a high-dimensional parametrization, which makes Reinforcement Learning algorithms that search for an optimum in this large parameter space sample-inefficient and subject to high variance. We propose a "decoupling" principle that drastically reduces the feedback parameter space while remaining near-optimal to fourth order in a small noise parameter. Based on this principle, we propose a decoupled data-based control (D2C) algorithm that addresses the stochastic control problem in two steps: first, an open-loop deterministic trajectory optimization problem is solved using a black-box simulation model of the dynamical system. Then, a linear closed-loop controller is designed around this nominal trajectory, again using only the simulation model. Empirical evidence suggests a significant reduction in training time, as well as in training variance, compared to other state-of-the-art Reinforcement Learning algorithms.
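
The two-step structure described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy scalar dynamics in `step`, the cost weights, the finite-difference gradient descent used for the open-loop search, and the finite-difference linearization with a scalar LQR recursion used for the linear feedback. The only property it shares with the described approach is that the simulator is queried as a black box in both steps.

```python
import math
import random

DT, T = 0.1, 30            # step size and horizon (illustrative values)
X0, X_GOAL = 0.0, 1.0      # start state and target state (illustrative)

def step(x, u, noise=0.0):
    """Black-box simulator of a toy scalar system (hypothetical dynamics)."""
    return x + DT * (-math.sin(x) + u) + noise

def rollout(us):
    """Noise-free state trajectory produced by the control sequence us."""
    xs = [X0]
    for u in us:
        xs.append(step(xs[-1], u))
    return xs

def cost(us):
    """Control effort plus terminal distance to the goal (assumed weights)."""
    return sum(0.01 * u * u for u in us) + 10.0 * (rollout(us)[-1] - X_GOAL) ** 2

def open_loop(iters=200, lr=0.2, h=1e-4):
    """Step 1: optimize the control sequence using only simulator rollouts."""
    us = [0.0] * T
    for _ in range(iters):
        base = cost(us)
        grad = []
        for t in range(T):
            pert = list(us)
            pert[t] += h
            grad.append((cost(pert) - base) / h)   # forward difference
        us = [u - lr * g for u, g in zip(us, grad)]
    return us

def linearize(xs, us, h=1e-5):
    """Estimate A_t and B_t along the nominal trajectory by central differences."""
    A = [(step(x + h, u) - step(x - h, u)) / (2 * h) for x, u in zip(xs, us)]
    B = [(step(x, u + h) - step(x, u - h)) / (2 * h) for x, u in zip(xs, us)]
    return A, B

def lqr_gains(A, B, q=1.0, r=0.01, qf=10.0):
    """Step 2: time-varying LQR gains from a backward Riccati recursion (scalar)."""
    P, gains = qf, [0.0] * T
    for t in reversed(range(T)):
        K = B[t] * P * A[t] / (r + B[t] * P * B[t])
        P = q + A[t] * P * (A[t] - B[t] * K)
        gains[t] = K
    return gains

def evaluate(us, xs, gains, sigma=0.05, trials=200, seed=0):
    """Mean squared terminal error under process noise for u = u_nom - K (x - x_nom)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x = X0
        for t in range(T):
            u = us[t] - gains[t] * (x - xs[t])
            x = step(x, u, rng.gauss(0.0, sigma))
        total += (x - X_GOAL) ** 2
    return total / trials

if __name__ == "__main__":
    us = open_loop()
    xs = rollout(us)
    gains = lqr_gains(*linearize(xs, us))
    print("open-loop error:  ", evaluate(us, xs, [0.0] * T))
    print("closed-loop error:", evaluate(us, xs, gains))
```

In this sketch the search space is the `T` nominal controls plus `T` scalar gains, rather than the weights of a general nonlinear policy; that is the kind of reduction in feedback parametrization the abstract refers to, shown here only on a toy problem.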


research
04/17/2019

Decoupled Data Based Approach for Learning to Control Nonlinear Dynamical Systems

This paper addresses the problem of learning the optimal control policy ...
research
04/01/2020

Near Optimality and Tractability in Stochastic Nonlinear Control

We consider the problem of nonlinear stochastic optimal control. This is...
research
02/21/2020

Experiments with Tractable Feedback in Robotic Planning under Uncertainty: Insights over a wide range of noise regimes

We consider the problem of robotic planning under uncertainty. This prob...
research
12/21/2020

Explicitly Encouraging Low Fractional Dimensional Trajectories Via Reinforcement Learning

A key limitation in using various modern methods of machine learning in ...
research
01/31/2019

Contrasting Exploration in Parameter and Action Space: A Zeroth-Order Optimization Perspective

Black-box optimizers that explore in parameter space have often been sho...
research
11/21/2020

On the Convergence of Reinforcement Learning

We consider the problem of Reinforcement Learning for nonlinear stochast...
research
08/29/2019

Enabling Simulation-Based Optimization Through Machine Learning: A Case Study on Antenna Design

Complex phenomena are generally modeled with sophisticated simulators th...
