On Slicing Sorted Integer Sequences

07/01/2019
by Giulio Ermanno Pibiri, et al.

Representing sorted integer sequences in small space is a central problem for large-scale retrieval systems such as Web search engines. Efficient query resolution, e.g., intersection or random access, is achieved by carefully partitioning the sequences. In this work we describe and compare two different partitioning paradigms: partitioning by cardinality and partitioning by universe. Although the ideas behind these paradigms have been known in the coding and algorithmic communities for many years, inverted index compression has extensively adopted the former paradigm, whereas the latter has received little attention. As a result, an experimental comparison between the two is missing for the setting of inverted index compression. We also propose and implement a solution that recursively slices the universe of representation of a sequence to achieve compact storage and fast query execution. Albeit larger than some state-of-the-art representations, this slicing approach substantially improves the performance of list intersections and unions while operating in compressed space, thus offering an excellent space/time trade-off for the problem.
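
To make the two paradigms concrete, here is a minimal Python sketch. It is an illustration only, not the authors' implementation: the function names, the example sequences, and the slice width of 16 are all assumptions chosen for readability, and the slices are kept as plain lists rather than compressed. Partitioning by cardinality cuts the sequence into blocks holding a fixed number of integers, whereas partitioning by universe groups integers by the fixed-width range of values they fall into.

```python
def partition_by_cardinality(seq, block_size):
    """Split a sorted sequence into blocks of block_size integers each
    (the last block may be shorter)."""
    return [seq[i:i + block_size] for i in range(0, len(seq), block_size)]


def partition_by_universe(seq, span):
    """Group a sorted sequence by universe slice.

    Slice k covers the values [k * span, (k + 1) * span). Each value is
    stored relative to the start of its slice, so it fits in fewer bits.
    """
    slices = {}
    for x in seq:
        slices.setdefault(x // span, []).append(x % span)
    return slices


def intersect_sliced(a, b, span):
    """Intersect two universe-partitioned sequences.

    Only slices with the same index can share values, so every other
    slice is skipped without being inspected.
    """
    out = []
    for k in sorted(a.keys() & b.keys()):
        common = sorted(set(a[k]) & set(b[k]))
        out.extend(k * span + r for r in common)
    return out


# Hypothetical example data.
seq_a = [3, 4, 7, 13, 14, 15, 21, 25, 36, 38, 54, 62]
seq_b = [4, 13, 20, 21, 40, 62, 70]

blocks = partition_by_cardinality(seq_a, 4)
sliced_a = partition_by_universe(seq_a, 16)
sliced_b = partition_by_universe(seq_b, 16)
common = intersect_sliced(sliced_a, sliced_b, 16)
```

In this sketch the universe-based layout aligns the slices of different sequences on the same value boundaries, so an intersection can pair slices by index. With cardinality-based blocks the boundaries depend on each sequence's own contents, so no such alignment exists.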

