Periodic Q-Learning

02/23/2020
by Donghwan Lee, et al.

Target networks are commonly used in deep reinforcement learning to stabilize training; however, theoretical understanding of this technique is still limited. In this paper, we study the so-called periodic Q-learning algorithm (PQ-learning for short), which resembles the technique used in deep Q-learning, for solving infinite-horizon discounted Markov decision processes (DMDP) in the tabular setting. PQ-learning maintains two separate Q-value estimates: the online estimate and the target estimate. The online estimate follows the standard Q-learning update, while the target estimate is updated periodically. In contrast to standard Q-learning, PQ-learning admits a simple finite-time analysis and achieves better sample complexity for finding an ε-optimal policy. Our result provides a preliminary justification for the effectiveness of target estimates or networks in Q-learning algorithms.
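The two-estimate scheme described in the abstract can be illustrated with a minimal tabular sketch. This is not the authors' implementation. It assumes that the online update bootstraps from the target estimate (as in deep Q-learning), that state-action pairs are sampled uniformly from a generative model, and that the step size is constant. The names `periodic_q_learning`, `sample_step`, `period`, and the toy two-state problem `toy_step` are hypothetical and chosen only for illustration.

```python
import random


def periodic_q_learning(n_states, n_actions, sample_step, gamma=0.9,
                        alpha=0.5, period=50, n_periods=200, seed=0):
    """Tabular sketch: an online Q-table plus a periodically synced target."""
    rng = random.Random(seed)
    online = [[0.0] * n_actions for _ in range(n_states)]
    target = [row[:] for row in online]
    for _ in range(n_periods):
        for _ in range(period):
            # Assumption: uniform sampling of (s, a) from a generative model.
            s = rng.randrange(n_states)
            a = rng.randrange(n_actions)
            r, s_next = sample_step(s, a, rng)
            # Assumption: the bootstrap term uses the frozen target estimate.
            td_target = r + gamma * max(target[s_next])
            online[s][a] += alpha * (td_target - online[s][a])
        # Periodic update: copy the online estimate into the target estimate.
        target = [row[:] for row in online]
    return target


def toy_step(s, a, rng):
    """Hypothetical toy MDP: action a moves to state a; state 1 pays reward 1."""
    s_next = a
    r = 1.0 if s_next == 1 else 0.0
    return r, s_next


q = periodic_q_learning(n_states=2, n_actions=2, sample_step=toy_step)
greedy = [max(range(2), key=lambda a: q[s][a]) for s in range(2)]
print(q)
print(greedy)
```

For this toy problem with discount factor 0.9, the optimal values are Q*(s, 1) = 10 and Q*(s, 0) = 9 in both states, so the greedy policy extracted from the returned table should select action 1 everywhere. Freezing the target within each period turns the inner loop into a simple stochastic averaging problem toward one Bellman backup of the target, which is the structure that makes this kind of scheme easier to analyze.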

research
04/24/2019

Target-Based Temporal Difference Learning

The use of target networks has been a popular and key component of recen...
research
09/23/2020

CertRL: Formalizing Convergence Proofs for Value and Policy Iteration in Coq

Reinforcement learning algorithms solve sequential decision-making probl...
research
10/27/2021

Finite Horizon Q-learning: Stability, Convergence and Simulations

Q-learning is a popular reinforcement learning algorithm. This algorithm...
research
12/08/2016

Stochastic Primal-Dual Methods and Sample Complexity of Reinforcement Learning

We study the online estimation of the optimal policy of a Markov decisio...
research
03/22/2022

A Note on Target Q-learning For Solving Finite MDPs with A Generative Oracle

Q-learning with function approximation could diverge in the off-policy s...
research
02/24/2023

Why Target Networks Stabilise Temporal Difference Methods

Integral to recent successes in deep reinforcement learning has been a c...
research
06/04/2021

Beyond Target Networks: Improving Deep Q-learning with Functional Regularization

Target networks are at the core of recent success in Reinforcement Learn...
