Atomized Search Length: Beyond User Models

01/05/2022
by John Alex, et al.
We argue that current IR metrics, modeled on optimizing user experience, measure too narrow a portion of the IR space. If IR systems are weak, these metrics undersample or completely filter out the deeper documents that need improvement. If IR systems are relatively strong, these metrics undersample deeper relevant documents that could underpin even stronger IR systems, ones that could present content from tens or hundreds of relevant documents in a user-digestible hierarchy or text summary. We reanalyze over 70 TREC tracks from the past 28 years, showing that roughly half undersample top-ranked documents and nearly all undersample tail documents. We show that in the 2020 Deep Learning tracks, neural systems were near-optimal on top-ranked documents but achieved only modest gains over BM25 on tail documents. Our analysis is based on a simple new systems-oriented metric, 'atomized search length', which is capable of accurately and evenly measuring all relevant documents at any depth.


