Convex Q-Learning, Part 1: Deterministic Optimal Control

08/08/2020
by Prashant G. Mehta, et al.

It is well known that the extension of Watkins' algorithm to general function-approximation settings is challenging: does the projected Bellman equation have a solution? If so, is the solution useful in the sense of generating a good policy? And, if the preceding questions are answered in the affirmative, is the algorithm consistent? These questions are unanswered even in the special case of Q-function approximations that are linear in the parameter. The challenge seems paradoxical, given the long history of convex analytic approaches to dynamic programming. The paper begins with a brief survey of linear programming approaches to optimal control, leading to a particular over-parameterization that lends itself to applications in reinforcement learning. The main conclusions are summarized as follows:

(i) A new class of convex Q-learning algorithms is introduced, based on a convex relaxation of the Bellman equation. Convergence is established under general conditions, including a linear function approximation for the Q-function.

(ii) A batch implementation appears similar to the famed DQN algorithm (one engine behind AlphaZero). It is shown that in fact the algorithms are very different: while convex Q-learning solves a convex program that approximates the Bellman equation, theory for DQN is no stronger than for Watkins' algorithm with function approximation: (a) it is shown that both seek solutions to the same fixed-point equation, and (b) the ODE approximations for the two algorithms coincide, and little is known about the stability of this ODE.

These results are obtained for deterministic nonlinear systems with a total-cost criterion. Many extensions are proposed, including kernel implementation and extension to MDP models.
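The abstract's description of the algorithm class can be made concrete with a small example. The sketch below, written against cvxpy, shows a batch LP relaxation of the Bellman equation for a toy scalar system with a Q-function that is linear in the parameter; the feature map psi, the dynamics, the cost, and the norm bound on theta are all hypothetical choices for illustration, not the construction used in the paper.

```python
# A minimal sketch of convex Q-learning in its LP form, for a deterministic
# system with a finite action set and a linear parameterization
# Q_theta(x, u) = theta^T psi(x, u).  The feature map, toy dynamics, sample
# set, and norm bound are illustrative assumptions, not the paper's exact
# construction.
import numpy as np
import cvxpy as cp

def psi(x, u):
    """Hypothetical quadratic feature map psi(x, u) for scalar x and u."""
    return np.array([x * x, x * u, u * u, x, u, 1.0])

F = lambda x, u: 0.8 * x + u          # deterministic dynamics x' = F(x, u)
c = lambda x, u: x * x + 0.1 * u * u  # stage cost c(x, u)

actions = [-1.0, 0.0, 1.0]            # finite input set
rng = np.random.default_rng(0)
samples = [(rng.uniform(-2.0, 2.0), rng.choice(actions)) for _ in range(200)]

theta = cp.Variable(len(psi(0.0, 0.0)))

# Bellman inequality: Q(x, u) <= c(x, u) + min_{u'} Q(F(x, u), u').
# Since y <= min_i z_i iff y <= z_i for every i, the min over a finite
# action set yields one linear constraint per successor action u'.
constraints = [cp.norm(theta, 2) <= 100.0]  # ad hoc bound to keep the program bounded
for x, u in samples:
    xn = F(x, u)
    for un in actions:
        constraints.append(psi(x, u) @ theta <= c(x, u) + psi(xn, un) @ theta)

# Classical LP approach to dynamic programming: push Q_theta up against
# the Bellman inequalities by maximizing its average over the samples.
objective = cp.Maximize(sum(psi(x, u) @ theta for x, u in samples) / len(samples))
cp.Problem(objective, constraints).solve()
print("fitted parameter:", theta.value)
```

Once theta is fitted, a greedy policy is obtained by minimizing Q_theta(x, u) over u at each state. The recursive, stochastic-approximation forms of convex Q-learning analyzed in the paper replace this one-shot batch solve.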

Related research

09/10/2023: Convex Q Learning in a Stochastic Environment: Extended Version
The paper introduces the first formulation of convex Q-learning for Mark...

10/11/2019: Zap Q-Learning With Nonlinear Function Approximation
The Zap stochastic approximation (SA) algorithm was introduced recently ...

07/15/2022: Approximation of Optimal Control Problems for the Navier-Stokes equation via multilinear HJB-POD
We consider the approximation of some optimal control problems for the N...

12/18/2014: Theoretical and Numerical Analysis of Approximate Dynamic Programming with Approximation Errors
This study is aimed at answering the famous question of how the approxim...

05/17/2023: The mathematical theory of Hughes' model: a survey of results
We provide an overview of the results on Hughes' model for pedestrian mo...

10/14/2022: Model-Free Characterizations of the Hamilton-Jacobi-Bellman Equation and Convex Q-Learning in Continuous Time
Convex Q-learning is a recent approach to reinforcement learning, motiva...

10/17/2022: Sufficient Exploration for Convex Q-learning
In recent years there has been a collective research effort to find new ...
