Scalable regret for learning to control network-coupled subsystems with unknown dynamics

08/18/2021 ∙ by Sagar Sudhakara, et al.

We consider the problem of controlling an unknown linear quadratic Gaussian (LQG) system consisting of multiple subsystems connected over a network. Our goal is to minimize and quantify the regret (i.e., the loss in performance) of our strategy with respect to an oracle who knows the system model. Viewing the interconnected subsystems globally and directly using existing LQG learning algorithms for the global system results in a regret that increases super-linearly with the number of subsystems. Instead, we propose a new Thompson sampling based learning algorithm which exploits the structure of the underlying network. We show that the expected regret of the proposed algorithm is bounded by Õ(n√T), where n is the number of subsystems, T is the time horizon, and the Õ(·) notation hides logarithmic terms in n and T. Thus, the regret scales linearly with the number of subsystems. We present numerical experiments to illustrate the salient features of the proposed algorithm.
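
To make the notions of Thompson sampling and regret concrete, here is a minimal sketch for a single scalar subsystem. It is not the network-structured algorithm proposed in the paper. Everything in it is an illustrative assumption: the dynamics x' = a·x + b·u + w with unit-variance Gaussian noise, a known input gain b, a Gaussian prior on the unknown a, the doubling episode schedule, and the names `riccati_gain` and `thompson_lq`. Regret is measured against an oracle that knows a and therefore attains the optimal average cost p* per step.

```python
import math
import random


def riccati_gain(a, b, q=1.0, r=1.0, iters=500):
    """Iterate the scalar discrete-time Riccati equation.

    Returns (p, k), where p is the cost-to-go coefficient and the
    certainty-equivalent control law is u = k * x.
    """
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    k = -a * b * p / (r + b * b * p)
    return p, k


def thompson_lq(a_true=0.9, b=1.0, horizon=2000, seed=0, q=1.0, r=1.0):
    """Run Thompson sampling on a scalar LQ system and return the regret."""
    rng = random.Random(seed)
    mean, prec = 0.0, 1.0  # assumed Gaussian prior N(0, 1) on the unknown a
    x, total_cost, t = 0.0, 0.0, 0
    episode_len = 1
    while t < horizon:
        # Sample a model from the posterior and act optimally for it.
        a_hat = rng.gauss(mean, 1.0 / math.sqrt(prec))
        _, k = riccati_gain(a_hat, b, q, r)
        for _ in range(min(episode_len, horizon - t)):
            u = k * x
            total_cost += q * x * x + r * u * u
            x_next = a_true * x + b * u + rng.gauss(0.0, 1.0)
            # Conjugate update: y = x_next - b*u = a*x + w, with w ~ N(0, 1).
            y = x_next - b * u
            mean = (prec * mean + x * y) / (prec + x * x)
            prec += x * x
            x = x_next
            t += 1
        episode_len *= 2  # assumed schedule: resample at doubling intervals
    # With unit noise variance, the oracle's optimal average cost is p*.
    p_star, _ = riccati_gain(a_true, b, q, r)
    return total_cost - horizon * p_star


if __name__ == "__main__":
    print("regret over 2000 steps:", thompson_lq())
```

Applying a learner like this to the stacked global state of n coupled subsystems is the approach that the abstract says leads to super-linear growth in n; the paper's contribution is to exploit the network structure instead, which this sketch does not attempt.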


research
∙ 11/09/2020

Thompson sampling for linear quadratic mean-field teams

We consider optimal control of an unknown multi-agent linear quadratic (...
research
∙ 02/07/2022

On learning Whittle index policy for restless bandits with scalable regret

Reinforcement learning is an attractive approach to learn good resource ...
research
∙ 02/19/2020

Logarithmic Regret for Learning Linear Quadratic Regulators Efficiently

We consider the problem of learning in Linear Quadratic Control systems ...
research
∙ 09/29/2021

Minimal Expected Regret in Linear Quadratic Control

We consider the problem of online learning in Linear Quadratic Control s...
research
∙ 10/17/2022

Regret Bounds for Learning Decentralized Linear Quadratic Regulator with Partially Nested Information Structure

We study the problem of learning decentralized linear quadratic regulato...
research
∙ 08/19/2021

A relaxed technical assumption for posterior sampling-based reinforcement learning for control of unknown linear systems

We revisit the Thompson sampling algorithm to control an unknown linear ...
research
∙ 06/30/2020

Provably More Efficient Q-Learning in the Full-Feedback/One-Sided-Feedback Settings

We propose two new Q-learning algorithms, Full-Q-Learning (FQL) and Elim...
