Convex Q Learning in a Stochastic Environment: Extended Version

09/10/2023
by Fan Lu, et al.

The paper introduces the first formulation of convex Q-learning for Markov decision processes with function approximation. The algorithms and theory rest on a relaxation of a dual of Manne's celebrated linear programming characterization of optimal control. The first set of contributions concerns properties of the relaxation, described as a deterministic convex program: we identify conditions for a bounded solution, and a significant relationship between the solution to the new convex program and the solution to standard Q-learning. The second set of contributions concerns algorithm design and analysis: (i) a direct model-free method for approximating the convex program for Q-learning shares properties with its ideal; in particular, a bounded solution is ensured subject to a simple property of the basis functions; (ii) the proposed algorithms are convergent, and new techniques are introduced to obtain the rate of convergence in a mean-square sense; (iii) the approach can be generalized to a range of performance criteria, and it is found that variance can be reduced by considering “relative” dynamic programming equations; (iv) the theory is illustrated with an application to a classical inventory control problem.
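To make the linear programming viewpoint concrete, here is a minimal sketch in plain Python. It is not the paper's algorithm. It assumes a discounted-cost criterion and uses a hypothetical two-state, two-action model whose costs and transition probabilities are invented for illustration. It uses one common way of writing the LP constraint, Q(x,u) <= c(x,u) + gamma * E[min_u' Q(X',u')]. It then checks that the optimal Q-function obtained by value iteration satisfies this constraint, and that raising Q above it breaks feasibility.

```python
# Hypothetical 2-state, 2-action discounted-cost MDP; every number below is
# an illustrative assumption, not data from the paper.
GAMMA = 0.9
STATES = (0, 1)
ACTIONS = (0, 1)
COST = [[1.0, 2.0], [0.5, 3.0]]            # COST[x][u]
P = [[[0.8, 0.2], [0.1, 0.9]],             # P[x][u][x_next]
     [[0.5, 0.5], [0.9, 0.1]]]
MU = [[0.25, 0.25], [0.25, 0.25]]          # positive weights in the LP objective


def bellman(q):
    """Apply the discounted-cost Bellman operator to a Q table."""
    v = [min(q[x]) for x in STATES]
    return [[COST[x][u] + GAMMA * sum(P[x][u][y] * v[y] for y in STATES)
             for u in ACTIONS] for x in STATES]


def value_iteration(n_iter=2000):
    """Iterate the Bellman operator from zero to approximate Q*."""
    q = [[0.0 for _ in ACTIONS] for _ in STATES]
    for _ in range(n_iter):
        q = bellman(q)
    return q


def is_feasible(q, tol=1e-9):
    """Check Q(x,u) <= c(x,u) + gamma * E[min_u' Q(X',u')] for all (x,u)."""
    tq = bellman(q)
    return all(q[x][u] <= tq[x][u] + tol for x in STATES for u in ACTIONS)


def objective(q):
    """Weighted sum <mu, Q>, the quantity the LP maximizes."""
    return sum(MU[x][u] * q[x][u] for x in STATES for u in ACTIONS)


def shift(q, delta):
    """Return a copy of q with delta added to every entry."""
    return [[q[x][u] + delta for u in ACTIONS] for x in STATES]


Q_STAR = value_iteration()
Q_LOW = shift(Q_STAR, -1.0)    # feasible, but with a smaller objective
Q_HIGH = shift(Q_STAR, +1.0)   # violates the constraint
```

In this toy model `is_feasible(Q_STAR)` and `is_feasible(Q_LOW)` hold while `is_feasible(Q_HIGH)` does not, consistent with the optimal Q-function being the maximal feasible point. Because this sketch uses the model `P` directly, it is not a model-free method of the kind the abstract describes.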


