GenStore: A High-Performance and Energy-Efficient In-Storage Computing System for Genome Sequence Analysis

by   Nika Mansouri-Ghiasi, et al.

Read mapping is a fundamental, yet computationally-expensive step in many genomics applications. It is used to identify potential matches and differences between fragments (called reads) of a sequenced genome and an already known genome (called a reference genome). To address the computational challenges in genome analysis, many prior works propose various approaches such as filters that select the reads that must undergo expensive computation, efficient heuristics, and hardware acceleration. While effective at reducing the computation overhead, all such approaches still require the costly movement of a large amount of data from storage to the rest of the system, which can significantly lower the end-to-end performance of read mapping in conventional and emerging genomics systems. We propose GenStore, the first in-storage processing system designed for genome sequence analysis that greatly reduces both data movement and computational overheads of genome sequence analysis by exploiting low-cost and accurate in-storage filters. GenStore leverages hardware/software co-design to address the challenges of in-storage processing, supporting reads with 1) different read lengths and error rates, and 2) different degrees of genetic variation. Through rigorous analysis of read mapping processes, we meticulously design low-cost hardware accelerators and data/computation flows inside a NAND flash-based SSD. Our evaluation using a wide range of real genomic datasets shows that GenStore, when implemented in three modern SSDs, significantly improves the read mapping performance of state-of-the-art software (hardware) baselines by 2.07-6.05× (1.52-3.32×) for read sets with high similarity to the reference genome and 1.45-33.63× (2.70-19.2×) for read sets with low similarity to the reference genome.


Accelerating Genome Analysis: A Primer on an Ongoing Journey

Genome analysis fundamentally starts with a process known as read mappin...

GenPIP: In-Memory Acceleration of Genome Analysis via Tight Integration of Basecalling and Read Mapping

Nanopore sequencing is a widely-used high-throughput genome sequencing t...

AirLift: A Fast and Comprehensive Technique for Translating Alignments between Reference Genomes

As genome sequencing tools and techniques improve, researchers are able ...

Taming Large-Scale Genomic Analyses via Sparsified Genomics

Searching for similar genomic sequences is an essential and fundamental ...

TargetCall: Eliminating the Wasted Computation in Basecalling via Pre-Basecalling Filtering

Basecalling is an essential step in nanopore sequencing analysis where t...

SeGraM: A Universal Hardware Accelerator for Genomic Sequence-to-Graph and Sequence-to-Sequence Mapping

A critical step of genome sequence analysis is the mapping of sequenced ...

Please sign up or login with your details

Forgot password? Click here to reset