Iterative Policy Learning in End-to-End Trainable Task-Oriented Neural Dialog Models

09/18/2017
by Bing Liu, et al.

In this paper, we present a deep reinforcement learning (RL) framework for iterative dialog policy optimization in end-to-end task-oriented dialog systems. A popular approach to learning a dialog policy with RL is to let a dialog agent learn against a user simulator. Building a reliable user simulator, however, is not trivial and is often as difficult as building a good dialog agent. We address this challenge by jointly optimizing the dialog agent and the user simulator with deep RL, simulating dialogs between the two agents. We first bootstrap a basic dialog agent and a basic user simulator by learning directly from dialog corpora with supervised training. We then improve them further by letting the two agents conduct task-oriented dialogs and iteratively optimizing their policies with deep RL. Both the dialog agent and the user simulator are designed as neural network models that can be trained end-to-end. Our experimental results show that the proposed method leads to promising improvements in task success rate and total task reward compared to supervised training and single-agent RL training baseline models.
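
The two-stage recipe in the abstract (supervised bootstrap, then alternating RL updates for the two agents) can be illustrated with a toy sketch. This is not the paper's implementation: it swaps the neural models for small tabular softmax policies, uses plain REINFORCE without a baseline, and assumes a shared 0/1 task-success reward. The names `supervised_bootstrap`, `iterative_rl`, `run_episode`, and the toy sizes and learning rates are all hypothetical and chosen only for illustration.

```python
import math
import random

K = 3  # toy number of user goals, user messages, and agent actions (assumed)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample(probs, rng):
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def nudge(logits, chosen, scale):
    """In-place step along the gradient of log softmax(logits)[chosen]."""
    probs = softmax(logits)
    for i in range(len(logits)):
        logits[i] += scale * ((1.0 if i == chosen else 0.0) - probs[i])

def supervised_bootstrap(user, agent, corpus, lr=0.5, epochs=5):
    """Stage 1: imitate (goal, message, action) triples from a corpus."""
    for _ in range(epochs):
        for goal, message, action in corpus:
            nudge(user[goal], message, lr)     # user simulator imitates the user
            nudge(agent[message], action, lr)  # dialog agent imitates the agent

def run_episode(user, agent, rng):
    """One simulated single-turn dialog between the two policies."""
    goal = rng.randrange(K)
    message = sample(softmax(user[goal]), rng)
    action = sample(softmax(agent[message]), rng)
    reward = 1.0 if action == goal else 0.0  # assumed task-success reward
    return goal, message, action, reward

def iterative_rl(user, agent, rng, rounds=50, episodes=100, lr=0.2):
    """Stage 2: alternate REINFORCE updates, holding the other policy fixed."""
    for _ in range(rounds):
        for _ in range(episodes):  # update the dialog agent
            goal, message, action, reward = run_episode(user, agent, rng)
            nudge(agent[message], action, lr * reward)
        for _ in range(episodes):  # update the user simulator
            goal, message, action, reward = run_episode(user, agent, rng)
            nudge(user[goal], message, lr * reward)

def success_rate(user, agent, rng, n=2000):
    return sum(run_episode(user, agent, rng)[3] for _ in range(n)) / n

rng = random.Random(0)
user = [[0.0] * K for _ in range(K)]   # user[goal] -> logits over messages
agent = [[0.0] * K for _ in range(K)]  # agent[message] -> logits over actions
corpus = [(g, g, g) for g in range(K)]  # tiny made-up "dialog corpus"

supervised_bootstrap(user, agent, corpus)
before = success_rate(user, agent, rng)
iterative_rl(user, agent, rng)
after = success_rate(user, agent, rng)
print(f"success after supervised bootstrap: {before:.2f}")
print(f"success after iterative RL:         {after:.2f}")
```

In this toy setting the bootstrap leaves both policies only partly committed to the corpus behavior, and the alternating policy-gradient rounds then sharpen them jointly. Real dialogs are multi-turn and the reward design matters far more than it does here, so the sketch should be read only as an outline of the training schedule.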

research
09/03/2019

How to Build User Simulators to Train RL-based Dialog Systems

User simulators are essential for training reinforcement learning (RL) b...
research
04/08/2020

Multi-Agent Task-Oriented Dialog Policy Learning with Role-Aware Reward Decomposition

Many studies have applied reinforcement learning to train a dialog polic...
research
06/08/2016

Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning

This paper presents an end-to-end framework for task-oriented dialog sys...
research
06/03/2016

End-to-end LSTM-based dialog control optimized with supervised and reinforcement learning

This paper presents a model for end-to-end learning of task-oriented dia...
research
09/22/2020

SUMBT+LaRL: End-to-end Neural Task-oriented Dialog System with Reinforcement Learning

The recent advent of neural approaches for developing each dialog compon...
research
12/07/2017

End-to-End Offline Goal-Oriented Dialog Policy Learning via Policy Gradient

Learning a goal-oriented dialog policy is generally performed offline wi...
research
07/17/2019

Learning End-to-End Goal-Oriented Dialog with Maximal User Task Success and Minimal Human Agent Use

Neural end-to-end goal-oriented dialog systems showed promise to reduce ...
