Going From Molecules to Genomic Variations to Scientific Discovery:
Intelligent Algorithms and Architectures for Intelligent Genome Analysis
Now more than ever, we need to make genome analysis more intelligent. We need
to read, analyze, and interpret our genomes not only quickly, but also
accurately and efficiently enough to scale the analysis to population level.
There currently exist major computational bottlenecks and inefficiencies
throughout the entire genome analysis pipeline, because state-of-the-art genome
sequencing technologies are still not able to read a genome in its entirety. We
describe the ongoing journey in significantly improving the performance,
accuracy, and efficiency of genome analysis using intelligent algorithms and
hardware architectures. We explain state-of-the-art algorithmic methods and
hardware-based acceleration approaches for each step of the genome analysis
pipeline and provide experimental evaluations. Algorithmic approaches exploit
the structure of the genome as well as the structure of the underlying
hardware. Hardware-based acceleration approaches exploit specialized
microarchitectures or various execution paradigms (e.g., processing inside or
near memory) along with algorithmic changes, leading to new hardware/software
co-designed systems. We conclude by foreshadowing the future challenges,
benefits, and research directions triggered by the development of both very
low-cost yet highly error-prone new sequencing technologies and specialized
hardware chips for genomics. We hope that these efforts and the challenges we
discuss provide a foundation for future work in making genome analysis more
intelligent. The analysis script and data used in our experimental evaluation
are available at: https://github.com/CMU-SAFARI/Molecules2Variations