Spacerini: Plug-and-play Search Engines with Pyserini and Hugging Face

02/28/2023
by Christopher Akiki, et al.

We present Spacerini, a modular framework for seamlessly building and deploying interactive search applications, designed to facilitate the qualitative analysis of large-scale research datasets. Spacerini integrates features from both the Pyserini toolkit and the Hugging Face ecosystem to ease the indexing of text collections and their deployment as search engines for ad-hoc exploration, and to make the retrieval of relevant data points quick and efficient. The user-friendly interface enables searching through massive datasets in a no-code fashion, making Spacerini broadly accessible to anyone looking to qualitatively audit their text collections. This is useful both to IR researchers aiming to demonstrate the capabilities of their indexes in a simple and interactive way, and to NLP researchers looking to better understand and audit the failure modes of large language models. The framework is open source and available on GitHub at https://github.com/castorini/hf-spacerini, and includes utilities to load, pre-process, index, and deploy local and web search applications. A portfolio of applications created with Spacerini for a variety of use cases can be found at https://hf.co/spacerini.
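
To make the load, pre-process, index, and search steps named in the abstract concrete, here is a minimal pure-Python sketch of that pipeline. It is not Spacerini's or Pyserini's API: the `ToyIndex` class, the `tokenize` helper, the three sample documents, and the BM25 parameter values are all hypothetical stand-ins chosen for illustration, and a real deployment would presumably delegate indexing and ranking to Pyserini.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Pre-processing step: lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyIndex:
    """A tiny in-memory inverted index with BM25 ranking (illustrative only)."""

    def __init__(self, k1=0.9, b=0.4):
        # k1 and b are illustrative BM25 parameters, not values taken from the paper.
        self.k1, self.b = k1, b
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.doc_len = {}                  # doc_id -> number of tokens
        self.docs = {}                     # doc_id -> raw text

    def add(self, doc_id, text):
        """Indexing step: record term frequencies for one document."""
        tokens = tokenize(text)
        self.docs[doc_id] = text
        self.doc_len[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            self.postings[term][doc_id] = tf

    def search(self, query, k=3):
        """Search step: return up to k (doc_id, score) pairs, best first."""
        n = len(self.docs)
        if n == 0:
            return []
        avg_len = sum(self.doc_len.values()) / n
        scores = defaultdict(float)
        for term in set(tokenize(query)):
            plist = self.postings.get(term, {})
            if not plist:
                continue
            df = len(plist)
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            for doc_id, tf in plist.items():
                norm = 1 - self.b + self.b * self.doc_len[doc_id] / avg_len
                scores[doc_id] += idf * tf * (self.k1 + 1) / (tf + self.k1 * norm)
        # Sort by descending score, breaking ties by document id.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Load step: a hypothetical three-document collection.
collection = {
    "d1": "Pyserini supports sparse and dense retrieval",
    "d2": "Hugging Face hosts datasets and interactive demos",
    "d3": "Interactive search over large text datasets",
}

index = ToyIndex()
for doc_id, text in collection.items():
    index.add(doc_id, text)

hits = index.search("interactive search")
for doc_id, score in hits:
    print(doc_id, round(score, 3), index.docs[doc_id])
```

The sketch keeps everything in memory, which is only workable for toy collections; the point of a framework like the one described above is that the same four steps can be applied to datasets far too large for this approach.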

research
06/02/2023

GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration

Noticing the urgent need to provide tools for fast and user-friendly qua...
research
10/05/2018

Sifaka: Text Mining Above a Search API

Text mining and analytics software has become popular, but little attent...
research
05/12/2023

Knowledge Refinement via Interaction Between Search Engines and Large Language Models

Information retrieval (IR) plays a crucial role in locating relevant res...
research
08/28/2023

RefSearch: A Search Engine for Refactoring

Developers often refactor source code to improve its quality during soft...
research
12/04/2019

WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset

Over the past years, deep learning methods allowed for new state-of-the-...
research
06/17/2022

CLEAR: A Fully User-side Image Search System

We use many search engines on the Internet in our daily lives. However, ...
research
04/18/2019

Exquisitor: Interactive Learning at Large

Increasing scale is a dominant trend in today's multimedia collections, ...
