Automatic Curriculum Learning With Over-repetition Penalty for Dialogue Policy Learning

by Yangyang Zhao, et al.

Dialogue policy learning based on reinforcement learning is difficult to apply with real users when training dialogue agents from scratch, because of the high cost. User simulators, which choose random user goals for the dialogue agent to train on, have been considered an affordable substitute for real users. However, this random sampling method ignores the principles of human learning, making the learned dialogue policy inefficient and unstable. We propose a novel framework, Automatic Curriculum Learning-based Deep Q-Network (ACL-DQN), which replaces the traditional random sampling method with a teacher policy model to realize automatic curriculum learning for the dialogue policy. The teacher model arranges a meaningful ordered curriculum and automatically adjusts it by monitoring the learning progress of the dialogue agent and the over-repetition penalty, without requiring any prior knowledge. The learning progress of the dialogue agent reflects the relationship between the dialogue agent's ability and the sampled goals' difficulty, which supports sample efficiency. The over-repetition penalty ensures the diversity of the sampled goals. Experiments show that ACL-DQN improves the effectiveness and stability of dialogue tasks by a statistically significant margin. Furthermore, the framework can be improved further by equipping it with different curriculum schedules, which demonstrates that the framework generalizes well.
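The idea of a teacher that rewards learning progress and penalizes over-repetition can be illustrated with a small sketch. This is not the ACL-DQN implementation: the abstract gives no formulas, so the class `GoalTeacher`, its parameters (`window`, `penalty`, `epsilon`, `lr`), the absolute-difference measure of progress, and the use of an epsilon-greedy bandit in place of a DQN teacher are all assumptions made for illustration.

```python
import random
from collections import defaultdict, deque


class GoalTeacher:
    """Toy teacher over user goals (hypothetical; a bandit, not a DQN)."""

    def __init__(self, goals, window=5, penalty=0.5, epsilon=0.1, lr=0.3, seed=0):
        self.goals = list(goals)
        self.recent = deque(maxlen=window)    # most recently sampled goals
        self.penalty = penalty                # assumed weight of the repetition term
        self.epsilon = epsilon                # exploration rate
        self.lr = lr                          # step size for the value estimates
        self.value = defaultdict(float)       # running teacher reward per goal
        self.last_score = defaultdict(float)  # agent's last score per goal
        self.rng = random.Random(seed)

    def repetition_penalty(self, goal):
        # Share of the recent window taken by this goal, scaled by the weight.
        if not self.recent:
            return 0.0
        return self.penalty * self.recent.count(goal) / len(self.recent)

    def choose(self):
        # Explore at random, otherwise pick the goal with the best
        # estimated reward after subtracting its repetition penalty.
        if self.rng.random() < self.epsilon:
            goal = self.rng.choice(self.goals)
        else:
            goal = max(
                self.goals,
                key=lambda g: self.value[g] - self.repetition_penalty(g),
            )
        self.recent.append(goal)
        return goal

    def update(self, goal, score):
        # Assumed proxy for learning progress: absolute change in the
        # agent's score on this goal since it was last sampled.
        progress = abs(score - self.last_score[goal])
        reward = progress - self.repetition_penalty(goal)
        self.value[goal] += self.lr * (reward - self.value[goal])
        self.last_score[goal] = score
        return reward
```

In a training loop, one would call `choose()` to pick the next user goal, run a dialogue episode with the agent on that goal, and pass the resulting score (for example, 1.0 for success and 0.0 for failure) to `update()`. The penalty term keeps the teacher from sampling the same goal repeatedly even when that goal currently shows the most progress.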


Dialogue Response Selection with Hierarchical Curriculum Learning

We study the learning of a matching model for dialogue response selectio...

Accuracy-based Curriculum Learning in Deep Reinforcement Learning

In this paper, we investigate a new form of automated curriculum learnin...

Integrating planning for task-completion dialogue policy learning

Training a task-completion dialogue agent with real users via reinforcem...

Trying AGAIN instead of Trying Longer: Prior Learning for Automatic Curriculum Learning

A major challenge in the Deep RL (DRL) community is to train agents able...

PCC: Paraphrasing with Bottom-k Sampling and Cyclic Learning for Curriculum Data Augmentation

Curriculum Data Augmentation (CDA) improves neural models by presenting ...

Mastering Rate based Curriculum Learning

Recent automatic curriculum learning algorithms, and in particular Teach...

It Takes Four to Tango: Multiagent Selfplay for Automatic Curriculum Generation

We are interested in training general-purpose reinforcement learning age...
