Optimal Rates for Bandit Nonstochastic Control

05/24/2023
by Y. Jennifer Sun, et al.

Linear Quadratic Regulator (LQR) and Linear Quadratic Gaussian (LQG) control are foundational and extensively researched problems in optimal control. We investigate LQR and LQG problems with semi-adversarial perturbations and time-varying adversarial bandit loss functions. The best-known sublinear regret algorithm of <cit.> has a T^3/4 dependence on the time horizon, and its authors posed an open question about whether a tight rate of √(T) could be achieved. We answer in the affirmative, giving an algorithm for bandit LQR and LQG that attains the optimal √(T) regret (up to logarithmic factors) for both known and unknown systems. A central component of our method is a new scheme for bandit convex optimization with memory, which is of independent interest.
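To make the "bandit convex optimization with memory" setting concrete, the following is a minimal, generic sketch of one-point gradient estimation for a loss that depends on the last few decisions. It follows the standard spherical-perturbation estimator; the function names, parameters, and update rule are illustrative assumptions and are not the paper's specific scheme or its rate-optimal analysis.

```python
import numpy as np

def bco_with_memory(loss_with_memory, d, T, memory=3, eta=0.01, delta=0.1, radius=1.0):
    """Generic one-point bandit gradient descent for losses f_t(x_{t-m+1}, ..., x_t).

    loss_with_memory(t, xs) returns a scalar loss of the last `memory` played points.
    Only this scalar (bandit feedback) is observed; no gradient information is given.
    """
    x = np.zeros(d)                      # current decision point
    history = []                         # last `memory` played points
    total_loss = 0.0

    for t in range(T):
        u = np.random.randn(d)
        u /= np.linalg.norm(u)           # uniform direction on the unit sphere
        y = x + delta * u                # play a perturbed point
        history.append(y)
        if len(history) > memory:
            history.pop(0)

        f = loss_with_memory(t, list(history))   # single scalar observation
        total_loss += f

        g = (d / delta) * f * u          # one-point estimate of the gradient of
                                         # the smoothed loss at x
        x = x - eta * g                  # gradient step
        norm = np.linalg.norm(x)
        if norm > radius:                # project back onto the feasible ball
            x *= radius / norm

    return total_loss


# Hypothetical usage: a quadratic loss over the last three played points.
quadratic_loss = lambda t, xs: sum(float(np.dot(xi, xi)) for xi in xs)
total = bco_with_memory(quadratic_loss, d=5, T=1000)
```

In the control setting, the decision point would parameterize a controller and the memory would capture how past control inputs affect the current state; the paper's contribution is a scheme of this flavor whose regret matches the √(T) lower bound, rather than the T^3/4 rate typical of simpler estimators.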
