Smoothed Q-learning

03/15/2023
by David Barber, et al.

In reinforcement learning, the Q-learning algorithm provably converges to the optimal solution. However, as others have demonstrated, Q-learning can also overestimate the values and thereby spend too long exploring unhelpful states. Double Q-learning is a provably convergent alternative that mitigates some of the overestimation issues, though sometimes at the expense of slower convergence. We introduce an alternative algorithm that replaces the max operation with an average, which also yields a provably convergent off-policy algorithm that can mitigate overestimation while retaining convergence behaviour similar to that of standard Q-learning.
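To make the idea concrete, here is a minimal sketch of a tabular update in this spirit. It assumes a Boltzmann (softmax) smoothing distribution over next-state action values; the function names, the `temperature` parameter, and the default values are illustrative choices, not the paper's exact algorithm.

```python
import numpy as np

def softmax(x, temperature):
    # Numerically stable Boltzmann weights over action values.
    z = (x - np.max(x)) / temperature
    w = np.exp(z)
    return w / w.sum()

def smoothed_q_update(Q, s, a, r, s_next, done,
                      alpha=0.1, gamma=0.99, temperature=0.5):
    """One smoothed Q-learning step: the max over next-state action
    values in the standard target is replaced by an average under a
    smoothing distribution (here a softmax over Q[s_next])."""
    if done:
        bootstrap = 0.0
    else:
        probs = softmax(Q[s_next], temperature)   # smoothing distribution
        bootstrap = probs @ Q[s_next]             # average instead of max
    td_target = r + gamma * bootstrap
    Q[s, a] += alpha * (td_target - Q[s, a])
```

As the temperature approaches zero the softmax weights concentrate on the greedy action, so the averaged target approaches the max and standard Q-learning is recovered; a larger temperature spreads the weight across actions, which damps the upward bias introduced by the max operator.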


