Sparse and Dense Approaches for the Full-rank Retrieval of Responses for Dialogues

by   Gustavo Penha, et al.

Ranking responses for a given dialogue context is a popular benchmark in which the setup is to re-rank the ground-truth response over a limited set of n responses, where n is typically 10. The predominance of this setup in conversation response ranking has lead to a great deal of attention to building neural re-rankers, while the first-stage retrieval step has been overlooked. Since the correct answer is always available in the candidate list of n responses, this artificial evaluation setup assumes that there is a first-stage retrieval step which is always able to rank the correct response in its top-n list. In this paper we focus on the more realistic task of full-rank retrieval of responses, where n can be up to millions of responses. We investigate both dialogue context and response expansion techniques for sparse retrieval, as well as zero-shot and fine-tuned dense retrieval approaches. Our findings based on three different information-seeking dialogue datasets reveal that a learned response expansion technique is a solid baseline for sparse retrieval. We find the best performing method overall to be dense retrieval with intermediate training, i.e. a step after the language model pre-training where sentence representations are learned, followed by fine-tuning on the target conversational data. We also investigate the intriguing phenomena that harder negatives sampling techniques lead to worse results for the fine-tuned dense retrieval models. The code and datasets are available at


Do the Findings of Document and Passage Retrieval Generalize to the Retrieval of Responses for Dialogues?

A number of learned sparse and dense retrieval approaches have recently ...

Exploring Dense Retrieval for Dialogue Response Selection

Recent research on dialogue response selection has been mainly focused o...

Polite Dialogue Generation Without Parallel Data

Stylistic dialogue response generation, with valuable applications in pe...

Benchmarking Middle-Trained Language Models for Neural Search

Middle training methods aim to bridge the gap between the Masked Languag...

A Benchmark for Understanding Dialogue Safety in Mental Health Support

Dialogue safety remains a pervasive challenge in open-domain human-machi...

Deploying a Retrieval based Response Model for Task Oriented Dialogues

Task-oriented dialogue systems in industry settings need to have high co...

N-best Response-based Analysis of Contradiction-awareness in Neural Response Generation Models

Avoiding the generation of responses that contradict the preceding conte...

Please sign up or login with your details

Forgot password? Click here to reset