Anti-Overestimation Dialogue Policy Learning for Task-Completion Dialogue System

07/24/2022
by Chang Tian, et al.

A dialogue policy module is an essential part of task-completion dialogue systems. Recently, interest has increasingly focused on reinforcement learning (RL)-based dialogue policies, whose performance and sound action decisions rely on accurate estimates of action values. Overestimation is a widely known issue in RL: the estimate of the maximum action value tends to be larger than the ground truth, which results in an unstable learning process and a suboptimal policy. This problem is detrimental to RL-based dialogue policy learning. To mitigate it, this paper proposes a dynamic partial average estimator (DPAV) of the ground-truth maximum action value. DPAV calculates a partial average between the predicted maximum action value and the predicted minimum action value, where the weights are dynamically adaptive and problem-dependent. We incorporate DPAV into a deep Q-network as the dialogue policy and show that our method achieves better or comparable results relative to top baselines on three dialogue datasets from different domains, with a lower computational load. In addition, we theoretically prove convergence and derive upper and lower bounds on the bias, comparing them with those of other methods.
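To make the idea of a partial-average estimator concrete, here is a minimal Python sketch under stated assumptions. The abstract does not give DPAV's weighting rule, so the sketch uses a single fixed weight `beta` (a hypothetical parameter, not the paper's dynamic, problem-dependent weights) to form beta * max + (1 - beta) * min over the next-state action values and plugs it into a one-step DQN-style target. The helper names `partial_average_max`, `td_target`, and `estimator_bias` are illustrative only. `estimator_bias` is a toy Monte Carlo check of overestimation: every true action value is 0 and the estimates carry Gaussian noise, so any nonzero mean of the estimator is pure bias.

```python
import random
import statistics


def partial_average_max(q_values, beta):
    """Weighted ("partial") average of the largest and smallest action values.

    Hypothetical fixed weight beta: beta = 1.0 recovers the standard max
    operator used by DQN; beta = 0.5 is the midpoint of max and min.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return beta * max(q_values) + (1.0 - beta) * min(q_values)


def td_target(reward, next_q_values, gamma, beta, done):
    """One-step TD target that bootstraps from the partial average instead of max."""
    if done:
        # Terminal transition: no bootstrapping from the next state.
        return reward
    return reward + gamma * partial_average_max(next_q_values, beta)


def estimator_bias(n_actions, noise_std, beta, trials, seed=0):
    """Monte Carlo bias of the estimator when every true action value is 0.

    Each trial draws noisy estimates Q_hat(a) = 0 + noise for all actions.
    The true maximum is 0, so the mean of the estimator equals its bias.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        q_hat = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]
        estimates.append(partial_average_max(q_hat, beta))
    return statistics.fmean(estimates)


if __name__ == "__main__":
    # Compare the plain max (beta = 1.0) with partial averages.
    for beta in (1.0, 0.75, 0.5):
        print(beta, round(estimator_bias(10, 1.0, beta, 20000), 3))
```

With 10 actions and unit noise, beta = 1.0 (the plain max) yields a clearly positive bias, and the bias shrinks as beta decreases; in this symmetric toy setting beta = 0.5 is close to unbiased. When the true action values differ, a small beta underestimates instead, which is why a fixed weight is only a stand-in for the adaptive weighting the abstract describes.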
