Pyserini: An Easy-to-Use Python Toolkit to Support Replicable IR Research with Sparse and Dense Representations

02/19/2021
by Jimmy Lin, et al.

Pyserini is an easy-to-use Python toolkit that supports replicable IR research by providing effective first-stage retrieval in a multi-stage ranking architecture. Our toolkit is self-contained as a standard Python package and comes with queries, relevance judgments, pre-built indexes, and evaluation scripts for many commonly used IR test collections. We aim to support, out of the box, the entire research lifecycle of efforts aimed at improving ranking with modern neural approaches. In particular, Pyserini supports sparse retrieval (e.g., BM25 scoring using bag-of-words representations), dense retrieval (e.g., nearest-neighbor search on transformer-encoded representations), as well as hybrid retrieval that integrates both approaches. This paper provides an overview of toolkit features and presents empirical results that illustrate its effectiveness on two popular ranking tasks. We also describe how our group has built a culture of replicability through shared norms and tools that enable rigorous automated testing.
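The abstract names three retrieval modes: sparse (BM25 over bag-of-words), dense (nearest-neighbor search over encoded vectors), and hybrid. The following is a minimal plain-Python sketch of those three ideas on a toy corpus. It is not Pyserini's API or implementation. The corpus, the embeddings, and the function names (`bm25_search`, `dense_search`, `hybrid_search`) are hypothetical, and the fixed vectors only stand in for transformer encoder outputs. The defaults `k1=0.9`, `b=0.4` are assumed to mirror Anserini's usual BM25 settings. The fusion rule (dense score plus `alpha` times sparse score) is one simple weighted-sum scheme and is not necessarily the one Pyserini uses.

```python
import math
from collections import Counter

# Hypothetical toy corpus; a real collection would use a pre-built index.
DOCS = {
    "d1": "dense retrieval encodes text with a transformer",
    "d2": "bm25 ranks documents with bag of words statistics",
    "d3": "hybrid retrieval fuses sparse and dense scores",
}

# Hypothetical precomputed embeddings standing in for transformer encoder outputs.
DOC_VECS = {"d1": [0.9, 0.1, 0.0], "d2": [0.1, 0.8, 0.2], "d3": [0.5, 0.5, 0.6]}


def tokenize(text):
    """Lowercase whitespace tokenizer (no stemming or stopword removal)."""
    return text.lower().split()


def rank(scores, k):
    """Sort by descending score, break ties by document id, keep the top k."""
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


def bm25_search(query, docs, k=10, k1=0.9, b=0.4):
    """Sparse retrieval: Okapi BM25 over bag-of-words term statistics."""
    toks = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(toks)
    avgdl = sum(len(t) for t in toks.values()) / n_docs
    df = Counter(term for t in toks.values() for term in set(t))
    query_terms = set(tokenize(query))  # each distinct query term counted once
    scores = {}
    for doc_id, t in toks.items():
        tf = Counter(t)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(t) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        if score > 0:  # only documents matching at least one term are returned
            scores[doc_id] = score
    return rank(scores, k)


def dense_search(query_vec, doc_vecs, k=10):
    """Dense retrieval: exact (brute-force) nearest-neighbor search by inner product."""
    scores = {
        doc_id: sum(q * x for q, x in zip(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    }
    return rank(scores, k)


def hybrid_search(sparse_hits, dense_hits, alpha=0.5, k=10):
    """Hybrid retrieval: fuse two ranked lists as dense + alpha * sparse.

    A document missing from one list is assigned that list's lowest score.
    """
    sparse, dense = dict(sparse_hits), dict(dense_hits)
    s_min = min(sparse.values(), default=0.0)
    d_min = min(dense.values(), default=0.0)
    fused = {
        doc_id: dense.get(doc_id, d_min) + alpha * sparse.get(doc_id, s_min)
        for doc_id in set(sparse) | set(dense)
    }
    return rank(fused, k)


if __name__ == "__main__":
    query = "dense retrieval bag of words"
    query_vec = [0.2, 0.9, 0.1]  # hypothetical query embedding
    sparse = bm25_search(query, DOCS)
    dense = dense_search(query_vec, DOC_VECS)
    print("sparse:", sparse)
    print("dense: ", dense)
    print("hybrid:", hybrid_search(sparse, dense, alpha=0.5))
```

With these toy inputs, d1 and d3 receive identical BM25 scores because they have the same length and match the same query terms. The dense scores then break that tie in the fused ranking, so d3 ranks above d1. This illustrates why combining the two signals can help.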



research
04/24/2023

Anserini Gets Dense Retrieval: Integration of Lucene's HNSW Indexes

Anserini is a Lucene-based toolkit for reproducible information retrieva...
research
10/28/2020

Flexible retrieval with NMSLIB and FlexNeuART

Our objective is to introduce to the NLP community an existing k-NN sear...
research
03/11/2022

Tevatron: An Efficient and Flexible Toolkit for Dense Retrieval

Recent rapid advancements in deep pre-trained language models and the in...
research
05/24/2023

Ranger: A Toolkit for Effect-Size Based Multi-Task Evaluation

In this paper, we introduce Ranger - a toolkit to facilitate the easy us...
research
04/12/2021

A Replication Study of Dense Passage Retriever

Text retrieval using learned dense representations has recently emerged ...
research
06/28/2021

A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques

Recent developments in representational learning for information retriev...
research
04/19/2019

Critically Examining the "Neural Hype": Weak Baselines and the Additivity of Effectiveness Gains from Neural Ranking Models

Is neural IR mostly hype? In a recent SIGIR Forum article, Lin expressed...
