Learning More From Less: Towards Strengthening Weak Supervision for Ad-Hoc Retrieval

07/19/2019
by   Dany Haddad, et al.

The limited availability of ground truth relevance labels has been a major impediment to the application of supervised methods to ad-hoc retrieval. As a result, unsupervised scoring methods, such as BM25, remain strong competitors to deep learning techniques, which have brought about dramatic improvements in other domains, such as computer vision and natural language processing. Recent works have shown that it is possible to take advantage of the performance of these unsupervised methods to generate training data for learning-to-rank models. The key limitation of this line of work is the size of the training set required to surpass the performance of the original unsupervised method, which can be as large as 10^13 training examples. Building on these insights, we propose two methods to reduce the amount of training data required. The first method takes inspiration from crowdsourcing and leverages multiple unsupervised rankers to generate soft, or noise-aware, training labels. The second identifies harmful, or mislabeled, training examples and removes them from the training set. We show that our methods allow us to surpass the performance of the unsupervised baseline with far fewer training examples than previous works.
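To make the idea of soft, noise-aware labels from several unsupervised rankers concrete, here is a minimal sketch. The abstract does not say how the rankers' outputs are combined, so the aggregation rule below (the fraction of rankers that prefer one document over another, with ties counted as half a vote) is an assumption made for illustration and not necessarily the authors' method. The function name `soft_pairwise_labels` and the example scores are hypothetical.

```python
from itertools import combinations


def soft_pairwise_labels(scores_by_ranker):
    """Turn the scores of several weak rankers into soft pairwise labels.

    scores_by_ranker: list of dicts, one per ranker, mapping doc_id -> score
    for the same query and the same set of documents.
    Returns a dict mapping (doc_a, doc_b) -> fraction of rankers that score
    doc_a above doc_b. Ties count as half a vote (an illustrative choice).
    """
    doc_ids = sorted(scores_by_ranker[0])
    labels = {}
    for a, b in combinations(doc_ids, 2):
        votes = 0.0
        for scores in scores_by_ranker:
            if scores[a] > scores[b]:
                votes += 1.0
            elif scores[a] == scores[b]:
                votes += 0.5
        labels[(a, b)] = votes / len(scores_by_ranker)
    return labels


# Hypothetical scores from three unsupervised rankers for one query.
example_scores = [
    {"d1": 2.0, "d2": 1.0, "d3": 0.5},
    {"d1": 1.0, "d2": 3.0, "d3": 0.2},
    {"d1": 5.0, "d2": 4.0, "d3": 4.0},
]
labels = soft_pairwise_labels(example_scores)
```

A label near 1.0 or 0.0 means the rankers agree on the preference, while a label near 0.5 signals disagreement. A learning-to-rank model trained with a cross-entropy loss on such targets is penalized less for pairs on which the weak rankers themselves are unsure.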
