Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics

05/08/2020
by Arsenii Kuznetsov, et al.

Overestimation bias is one of the major impediments to accurate off-policy learning. This paper investigates a novel way to alleviate overestimation bias in a continuous control setting. Our method, Truncated Quantile Critics (TQC), blends three ideas: a distributional representation of the critic, truncation of the critics' predictions, and ensembling of multiple critics. The distributional representation and truncation allow for arbitrarily granular control of overestimation, while ensembling provides additional score improvements. TQC outperforms the current state of the art on all environments from the continuous control benchmark suite, demonstrating a 25% improvement on the most challenging Humanoid environment.
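
To make the truncation idea in the abstract concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes each critic outputs a fixed number of quantile estimates ("atoms") for the next state-action pair. The function name `truncated_mixture_target` and the parameters `drop_per_critic` and `gamma` are hypothetical names chosen for illustration, and the entropy term used in soft actor-critic style methods is left out for brevity.

```python
import numpy as np


def truncated_mixture_target(atoms, reward, gamma=0.99, drop_per_critic=2):
    """Build target atoms from a truncated mixture of critic predictions.

    atoms: array-like of shape (n_critics, n_atoms) holding each critic's
           quantile estimates for the next state-action pair (assumed input).
    """
    atoms = np.asarray(atoms, dtype=float)
    n_critics, _ = atoms.shape

    # Ensembling: pool the atoms of all critics into one mixture and sort it.
    pooled = np.sort(atoms.reshape(-1))

    # Truncation: drop the largest atoms, which are the most likely to be
    # overestimates. More dropped atoms means a more pessimistic target.
    n_drop = drop_per_critic * n_critics
    if not 0 <= n_drop < pooled.size:
        raise ValueError("drop_per_critic must leave at least one atom")
    kept = pooled[: pooled.size - n_drop]

    # Distributional Bellman backup applied to every remaining atom.
    return reward + gamma * kept
```

For example, with two critics predicting `[1, 2, 3]` and `[4, 5, 6]`, dropping one atom per critic keeps `[1, 2, 3, 4]`, so the mean of the target atoms falls from 3.5 to 2.5 before discounting. Changing the number of dropped atoms one at a time is what gives the fine-grained control over overestimation that the abstract describes.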

research
11/17/2021

Aggressive Q-Learning with Ensembles: Achieving Both High Sample Efficiency and High Asymptotic Performance

Recently, Truncated Quantile Critics (TQC), using distributional represe...
research
11/24/2021

Adaptively Calibrated Critic Estimates for Deep Reinforcement Learning

Accurate value estimates are important for off-policy reinforcement lear...
research
12/29/2022

Invariance to Quantile Selection in Distributional Continuous Control

In recent years distributional reinforcement learning has produced many ...
research
04/21/2022

Revisiting Gaussian mixture critics in off-policy reinforcement learning: a sample-based approach

Actor-critic algorithms that make use of distributional policy evaluatio...
research
05/26/2023

Distributional Reinforcement Learning with Dual Expectile-Quantile Regression

Successful applications of distributional reinforcement learning with qu...
research
02/22/2021

Distributional data analysis via quantile functions and its application to modelling digital biomarkers of gait in Alzheimer's Disease

With the advent of continuous health monitoring via wearable devices, us...
research
12/14/2017

Nonparametric Adaptive CUSUM Chart for Detecting Arbitrary Distributional Changes

Nonparametric control charts that can detect arbitrary distributional ch...
