The Read-Optimized Burrows-Wheeler Transform

09/19/2018
by Travis Gagie, et al.

The advent of high-throughput sequencing has resulted in massive genomic datasets, some consisting of assembled genomes but others consisting of raw reads. We consider how to reduce the amount of space needed to index a set of reads, in particular how to reduce the number of runs in the Burrows-Wheeler Transform (BWT) that is the basis of FM-indexing. The best current fully functional index for repetitive collections (Gagie et al., SODA 2018) uses space proportional to this number of runs.
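
To make "the number of runs in the BWT" concrete, here is a minimal Python sketch. It is not the method of the paper: it builds the BWT naively by sorting rotations, then counts r, the number of maximal blocks of equal consecutive characters in the BWT. The helper names `bwt`, `count_runs` and `reads_to_text` are hypothetical, and encoding a read set by joining reads with `#` and appending a single `$` terminator is an assumption made only for illustration (reads must not contain `#` or `$`).

```python
from itertools import groupby


def bwt(text: str) -> str:
    """Naive BWT: sort all rotations of `text` and read off the last column.

    Assumes `text` ends with a unique terminator '$'. Runs in
    O(n^2 log n) time, so it is for illustration only.
    """
    if not text.endswith("$") or text.count("$") != 1:
        raise ValueError("text must end with a single unique '$'")
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_runs(s: str) -> int:
    """Number of maximal blocks of equal consecutive characters (r)."""
    return sum(1 for _ in groupby(s))


def reads_to_text(reads: list[str]) -> str:
    """Hypothetical encoding of a read set: join with '#', end with '$'."""
    return "#".join(reads) + "$"


if __name__ == "__main__":
    print(bwt("banana$"))              # annb$aa
    print(count_runs(bwt("banana$")))  # 5
    # Toy reads (made up): compare r for two orderings of the same reads.
    reads = ["ACGTAC", "ACGTAA", "CGTACG"]
    for order in (reads, list(reversed(reads))):
        print(order, count_runs(bwt(reads_to_text(order))))
```

The read set and the two orderings are toy inputs, and no particular run counts are claimed for them. With a collection, the choice of separators and the order of the strings can change the BWT and therefore r; practical tools also avoid rotation sorting and build the BWT from a suffix array (for example one computed with SA-IS) or with specialized constructions for string collections.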
