Closing in on Time and Space Optimal Construction of Compressed Indexes

12/13/2017
by Dominik Kempa, et al.

Fast and space-efficient construction of compressed indexes such as the compressed suffix array (CSA) and the compressed suffix tree (CST) was a major open problem until recently, when Belazzougui [STOC 2014] described an algorithm able to build both of these data structures in $O(n)$ time (randomized; later improved by the same author to deterministic) and $O(n/\log_\sigma n)$ words of space, where $n$ is the length of the string and $\sigma$ is the alphabet size. Shortly after, Munro et al. [SODA 2017] described another deterministic construction using the same time and space, based on different techniques. It has remained an elusive open problem since then whether these bounds are optimal or whether, assuming non-wasteful text encoding, a construction achieving $O(n/\log_\sigma n)$ time and space is possible. In this paper we provide the first algorithm that can achieve these bounds. We show a deterministic algorithm that constructs the CSA and CST using $O(n/\log_\sigma n + r \log^{11} n)$ time and $O(n/\log_\sigma n + r \log^{10} n)$ working space, where $r$ is the number of runs in the Burrows-Wheeler transform of the input text. As one application of our techniques, we show how to compute the LZ77 parsing in $O(n/\log_\sigma n + r \log^{11} n + z \log^{10} n)$ time and $O(n/\log_\sigma n + r \log^{9} n)$ space, which is optimal for highly repetitive strings.
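To make the two repetitiveness measures in these bounds concrete, the following is a minimal Python sketch that computes the number of runs in the Burrows-Wheeler transform and the number of phrases in a greedy LZ77 parsing. It is a naive, quadratic-or-worse illustration and not the construction from the paper. The function names, the `$` sentinel, and the choice of a self-referential LZ77 variant without trailing literal characters are all assumptions made for this example; here `z` is taken to be the LZ77 phrase count.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive BWT via sorted rotations (assumes sentinel is absent from
    text and sorts before every character of text)."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_runs(s: str) -> int:
    """Number of maximal blocks of equal consecutive characters."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


def lz77_phrase_count(text: str) -> int:
    """Greedy LZ77 phrase count (assumed variant: a phrase is the longest
    prefix of the remaining text that also starts at an earlier position,
    overlaps allowed, or a single fresh character)."""
    i, z, n = 0, 0, len(text)
    while i < n:
        length = 0
        # Extend while text[i:i+length+1] has an occurrence starting before i.
        while (i + length < n
               and text.find(text[i:i + length + 1], 0, i + length) != -1):
            length += 1
        i += max(length, 1)  # a fresh character forms a phrase of length 1
        z += 1
    return z


if __name__ == "__main__":
    for t in ["banana", "a" * 16, "abracadabra" * 8]:
        r = count_runs(bwt(t))
        z = lz77_phrase_count(t)
        print(f"n={len(t):3d}  r={r:3d}  z={z:3d}")
```

Under these assumptions, `banana` gives r = 5 (counting the sentinel) and z = 4, while a string of sixteen `a` characters gives r = 2 and z = 2. On highly repetitive inputs both measures stay far below n, which is the regime in which the additive $r \log^{11} n$ and $z \log^{10} n$ terms are dominated by $n/\log_\sigma n$.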
