Simulating User Satisfaction for the Evaluation of Task-oriented Dialogue Systems

by Weiwei Sun et al.

Evaluation is crucial in the development process of task-oriented dialogue systems. As an evaluation method, user simulation allows us to tackle issues such as scalability and cost-efficiency, making it a viable choice for large-scale automatic evaluation. To help build a human-like user simulator that can measure the quality of a dialogue, we propose the following task: simulating user satisfaction for the evaluation of task-oriented dialogue systems. The purpose of the task is to increase the evaluation power of user simulations and to make the simulation more human-like. To overcome the lack of annotated data, we introduce a user satisfaction annotation dataset, USS, that includes 6,800 dialogues sampled from multiple domains, spanning real-world e-commerce dialogues, task-oriented dialogues constructed through Wizard-of-Oz experiments, and movie recommendation dialogues. All user utterances in those dialogues, as well as the dialogues themselves, have been annotated on a 5-level satisfaction scale. We also provide three baseline methods for the user satisfaction prediction and action prediction tasks. Experiments conducted on the USS dataset suggest that distributed representations outperform feature-based methods. A model based on hierarchical GRUs achieves the best performance on in-domain user satisfaction prediction, while a BERT-based model has better cross-domain generalization ability.
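The hierarchical-GRU baseline mentioned above encodes each utterance word by word into a turn vector, then encodes the sequence of turn vectors into a dialogue vector, which is mapped to one of the five satisfaction levels. The sketch below is not the authors' implementation: it is a minimal numpy illustration of that hierarchy, with hypothetical dimensions, random untrained weights, and random vectors standing in for word embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One standard GRU recurrence step (biases omitted for brevity)."""
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sig(Wz @ x + Uz @ h)                 # update gate
    r = sig(Wr @ x + Ur @ h)                 # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h)) # candidate state
    return (1 - z) * h + z * h_tilde

def make_gru(in_dim, hid_dim):
    # Six weight matrices in the order gru_step expects: Wz, Uz, Wr, Ur, Wh, Uh
    shapes = [(hid_dim, in_dim), (hid_dim, hid_dim)] * 3
    return tuple(rng.normal(0.0, 0.1, s) for s in shapes)

def encode(seq, params, hid_dim):
    """Run a GRU over a sequence of vectors; return the final hidden state."""
    h = np.zeros(hid_dim)
    for x in seq:
        h = gru_step(x, h, *params)
    return h

EMB, HID, CLASSES = 8, 16, 5          # hypothetical sizes; 5 satisfaction levels
word_gru = make_gru(EMB, HID)         # lower level: words -> turn vector
turn_gru = make_gru(HID, HID)         # upper level: turns -> dialogue vector
W_out = rng.normal(0.0, 0.1, (CLASSES, HID))

def predict_satisfaction(dialogue):
    """dialogue: list of utterances, each a list of word-embedding vectors."""
    turn_vecs = [encode(utt, word_gru, HID) for utt in dialogue]
    d = encode(turn_vecs, turn_gru, HID)
    return int(np.argmax(W_out @ d)) + 1  # label on the 1..5 scale

# Toy dialogue: 3 utterances of 4 random "word embeddings" each.
dialogue = [[rng.normal(size=EMB) for _ in range(4)] for _ in range(3)]
label = predict_satisfaction(dialogue)
```

In practice the word embeddings would come from a pretrained lookup table and the weights would be trained with cross-entropy against the annotated satisfaction labels; turn-level labels can be predicted the same way from the turn vectors alone.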



