Regret Analysis of Certainty Equivalence Policies in Continuous-Time Linear-Quadratic Systems

This work studies theoretical performance guarantees of a ubiquitous reinforcement learning policy for controlling the canonical model of stochastic linear-quadratic systems. We show that the randomized certainty equivalence policy addresses the exploration-exploitation dilemma for minimizing quadratic costs in linear dynamical systems that evolve according to stochastic differential equations. More precisely, we establish square-root-of-time regret bounds, indicating that the randomized certainty equivalence policy learns optimal control actions quickly from a single state trajectory. Further, we show that the regret scales linearly with the number of unknown parameters. The presented analysis introduces novel and useful technical approaches and sheds light on fundamental challenges of continuous-time reinforcement learning.
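To make the setting concrete, below is a minimal sketch of a randomized certainty-equivalence control loop for a continuous-time LQ system, simulated with an Euler-Maruyama discretization. It is an illustration under assumptions, not the paper's exact algorithm: the doubling episode schedule, the randomization scale `sigma_r` and its `1/sqrt(k)` decay, and the least-squares estimator are all choices made here for concreteness. The Riccati equation is solved with SciPy's `solve_continuous_are`.

```python
# Sketch: randomized certainty-equivalence (CE) control of a continuous-time
# LQ system dx = (A x + B u) dt + dW with quadratic cost (Q, R).
# Assumptions (not from the paper): Euler-Maruyama simulation, least-squares
# identification, doubling re-estimation schedule, randomization scale sigma_r.
import numpy as np
from scipy.linalg import solve_continuous_are

rng = np.random.default_rng(0)

A_true = np.array([[0.0, 1.0], [-1.0, -0.5]])   # unknown drift matrix
B_true = np.array([[0.0], [1.0]])               # unknown input matrix
Q, R = np.eye(2), np.eye(1)                     # known quadratic cost
dt, n, m = 0.01, 2, 1

def ce_gain(A_hat, B_hat):
    """Solve the continuous-time algebraic Riccati equation for the
    estimated model and return the certainty-equivalent feedback gain K."""
    P = solve_continuous_are(A_hat, B_hat, Q, R)
    return np.linalg.solve(R, B_hat.T @ P)

x = np.zeros(n)
X, dX = [], []                    # regression data: [x; u] rows and increments
K = np.zeros((m, n))              # initial uninformed gain
sigma_r = 0.1                     # randomization scale (assumed)

for k in range(1, 50001):
    u = -K @ x
    x_next = (x + (A_true @ x + B_true @ u) * dt
              + np.sqrt(dt) * rng.standard_normal(n))
    X.append(np.concatenate([x, u]))
    dX.append(x_next - x)
    x = x_next

    # Re-estimate at doubling times (an assumed episode schedule).
    if k & (k - 1) == 0 and k >= 64:
        Z, D = np.array(X), np.array(dX)
        theta, *_ = np.linalg.lstsq(Z * dt, D, rcond=None)
        A_hat, B_hat = theta[:n].T, theta[n:].T
        # Randomize the estimate to force exploration, then act greedily
        # with respect to the perturbed model (certainty equivalence).
        A_hat += sigma_r / np.sqrt(k) * rng.standard_normal((n, n))
        B_hat += sigma_r / np.sqrt(k) * rng.standard_normal((n, m))
        K = ce_gain(A_hat, B_hat)
```

The loop acts greedily with respect to a randomly perturbed parameter estimate, which is the mechanism the abstract credits with balancing exploration and exploitation; the perturbation decays over time so the applied policy converges toward the optimal feedback for the true system.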
