Substring Complexity in Sublinear Space

07/16/2020
by Giulia Bernardini, et al.

Shannon's entropy is a definitive lower bound for statistical compression. Unfortunately, no such clear measure exists for the compressibility of repetitive strings. Thus, ad-hoc measures are employed to estimate the repetitiveness of strings, e.g., the size z of the Lempel-Ziv parse or the number r of equal-letter runs of the Burrows-Wheeler transform. A more recent one is the size γ of a smallest string attractor. Unfortunately, Kempa and Prezza [STOC 2018] showed that computing γ is NP-hard. Kociumaka et al. [LATIN 2020] considered a new measure that is based on the function S_T, known as the substring complexity, which maps each length k to the number of distinct substrings of T of length k. This new measure is defined as δ = sup{S_T(k)/k : k ≥ 1} and lower bounds all the measures previously considered. In particular, δ ≤ γ always holds and δ can be computed in 𝒪(n) time using Ω(n) working space. Kociumaka et al. showed that if δ is given, one can construct an 𝒪(δ log(n/δ))-sized representation of T supporting efficient direct access and efficient pattern matching queries on T. Given that for highly compressible strings, δ is significantly smaller than n, it is natural to pose the following question: Can we compute δ efficiently using sublinear working space? It is straightforward to show that any algorithm computing δ using 𝒪(b) space requires Ω(n^{2-o(1)}/b) time through a reduction from the element distinctness problem [Yao, SIAM J. Comput. 1994]. We present the following results: an 𝒪(n^3/b^2)-time and 𝒪(b)-space algorithm to compute δ, for any b ∈ [1, n]; and an 𝒪̃(n^2/b)-time and 𝒪(b)-space algorithm to compute δ, for any b ∈ [n^{2/3}, n].
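
To make the definition of δ concrete, here is a minimal brute-force sketch in Python. It is not the algorithm from the paper and makes no attempt to save space: it stores every distinct substring of each length in a set, so both its time and its space are far from the bounds stated above. The function names `substring_complexity` and `delta` are illustrative choices, not names from the paper. Since S_T(k) = 0 for every k > n, taking the maximum over k = 1, ..., n is enough to obtain the supremum.

```python
from fractions import Fraction


def substring_complexity(text: str) -> list[int]:
    """Return [S_T(1), ..., S_T(n)]: the number of distinct substrings
    of each length k. Brute force; illustrative only."""
    n = len(text)
    return [
        len({text[i:i + k] for i in range(n - k + 1)})
        for k in range(1, n + 1)
    ]


def delta(text: str) -> Fraction:
    """Return delta = max over k of S_T(k) / k as an exact fraction."""
    if not text:
        raise ValueError("delta is computed here only for non-empty strings")
    counts = substring_complexity(text)
    # enumerate starts at 1 so that k matches the substring length.
    return max(Fraction(s, k) for k, s in enumerate(counts, start=1))


if __name__ == "__main__":
    for sample in ["aaaa", "abab", "0001011100"]:
        print(sample, substring_complexity(sample), delta(sample))
```

For the string `0001011100`, the maximum is attained at k = 3, where all 8 binary strings of length 3 occur, giving δ = 8/3; for `aaaa`, every length has exactly one distinct substring, so δ = 1.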


research
10/04/2019

Towards a Definitive Measure of Repetitiveness

Unlike in statistical compression, where Shannon's entropy is a definiti...
research
02/16/2022

An Optimal-Time RLBWT Construction in BWT-runs Bounded Space

The compression of highly repetitive strings (i.e., strings with many re...
research
01/17/2018

Nondeterminisic Sublinear Time Has Measure 0 in P

The measure hypothesis is a quantitative strengthening of the P != NP co...
research
04/08/2019

String Synchronizing Sets: Sublinear-Time BWT Construction and Optimal LCE Data Structure

Burrows-Wheeler transform (BWT) is an invertible text transformation tha...
research
11/13/2020

Substring Query Complexity of String Reconstruction

Suppose an oracle knows a string S that is unknown to us and we want to ...
research
06/29/2020

Pattern Masking for Dictionary Matching

In the Pattern Masking for Dictionary Matching (PMDM) problem, we are gi...
research
03/05/2018

Optimal Substring-Equality Queries with Applications to Sparse Text Indexing

We consider the problem of encoding a string of length n from an alphabe...
