GateKeeper: A New Hardware Architecture for Accelerating Pre-Alignment in DNA Short Read Mapping

by Mohammed Alser, et al.

Motivation: High-throughput DNA sequencing (HTS) technologies generate an excessive number of small DNA segments, called short reads, that impose a significant computational burden. To analyze the entire genome, each of the billions of short reads must be mapped to a reference genome based on the similarity between the read and "candidate" locations in that reference genome. This similarity measurement, called alignment, is formulated as an approximate string matching problem. It is the computational bottleneck because (1) it is implemented using quadratic-time dynamic programming algorithms, and (2) the majority of candidate locations in the reference genome do not align with a given read due to high dissimilarity. Calculating the alignment of such incorrect candidate locations consumes an overwhelming majority of a modern read mapper's execution time. Therefore, it is crucial to develop a fast and effective filter that can detect incorrect candidate locations and eliminate them before invoking computationally costly alignment operations.

Results: We propose GateKeeper, a new hardware accelerator that functions as a pre-alignment step that quickly filters out most incorrect candidate locations. GateKeeper is the first design to accelerate pre-alignment using Field-Programmable Gate Arrays (FPGAs), which can perform pre-alignment much faster than software. GateKeeper can be integrated with any mapper that performs sequence alignment for verification. When implemented on a single FPGA chip, GateKeeper maintains high accuracy (on average >96%) while providing up to 105-fold and 215-fold speedup over the state-of-the-art software pre-alignment techniques, Adjacency Filter and Shifted Hamming Distance (SHD), respectively.

Availability: GateKeeper is available at:
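To make the filter-then-verify idea concrete, here is a minimal software sketch. It is a simplified model, not GateKeeper's FPGA design. It pairs a textbook Levenshtein dynamic program (the quadratic-time verification step) with a filter in the spirit of Shifted Hamming Distance. The filter builds one Hamming mask per shift in [-e, e], flips short streaks of zeros, ANDs the masks together, and rejects the candidate if more than e mismatch bits remain. The function names, the maximum streak length of 2, and treating out-of-range positions as matches are all illustrative assumptions.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic O(len(a) * len(b)) dynamic program.

    This is the costly verification step that pre-alignment tries to skip.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # match or substitution
        prev = cur
    return prev[-1]


def hamming_mask(read: str, ref: str, shift: int) -> List[int]:
    """1 where read[i] mismatches ref[i + shift].

    Assumption: positions shifted outside the reference segment count as
    matches (0), which keeps the filter permissive at the boundaries.
    """
    mask = []
    for i, base in enumerate(read):
        j = i + shift
        mask.append(1 if 0 <= j < len(ref) and base != ref[j] else 0)
    return mask


def amend(mask: List[int], max_streak: int = 2) -> List[int]:
    """Flip runs of at most max_streak zeros bounded by 1s on both sides.

    Such short 'matches' are treated as noise rather than real alignment.
    """
    out = mask[:]
    n = len(mask)
    i = 0
    while i < n:
        if mask[i] == 1:
            i += 1
            continue
        j = i
        while j < n and mask[j] == 0:
            j += 1
        # mask[i:j] is a zero run; i > 0 and j < n mean 1s on both sides
        if i > 0 and j < n and j - i <= max_streak:
            for k in range(i, j):
                out[k] = 1
        i = j
    return out


def pre_alignment_filter(read: str, ref: str, e: int) -> bool:
    """Return True (accept) if the candidate may align within e edits."""
    final = [1] * len(read)
    for shift in range(-e, e + 1):
        mask = amend(hamming_mask(read, ref, shift))
        final = [f & m for f, m in zip(final, mask)]  # bitwise AND of masks
    return sum(final) <= e


if __name__ == "__main__":
    read, ref = "ACGTGCATGA", "ACGTTGCATG"  # one deleted base plus one extra base
    print(pre_alignment_filter(read, ref, e=1), edit_distance(read, ref))
```

In the usage example, the shifted masks let the read pass the filter (True) even though its edit distance is 2, which exceeds e = 1. The candidate would therefore still reach the dynamic-programming verification step, where it is discarded. This illustrates the trade-off the abstract describes: the filter is cheap and may accept some incorrect candidates, while the expensive alignment is reserved for the survivors.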


Accelerating the Understanding of Life's Code Through Better Algorithms and Hardware Design

Calculating the similarities between a pair of genomic sequences is one ...

GateKeeper-GPU: Fast and Accurate Pre-Alignment Filtering in Short Read Mapping

At the last step of short read mapping, the candidate locations of the r...

FPGA Acceleration of Short Read Alignment

Aligning millions of short DNA or RNA reads, of 75 to 250 base pairs eac...

TargetCall: Eliminating the Wasted Computation in Basecalling via Pre-Basecalling Filtering

Basecalling is an essential step in nanopore sequencing analysis where t...

Large-scale Machine Learning for Metagenomics Sequence Classification

Metagenomics characterizes the taxonomic diversity of microbial communit...

Fast Exact String to D-Texts Alignments

In recent years, aligning a sequence to a pangenome has become a central...

Fast Characterization of Segmental Duplications in Genome Assemblies

Segmental duplications (SDs), or low-copy repeats (LCR), are segments of...
