Extreme Scale De Novo Metagenome Assembly

09/19/2018
by Evangelos Georganas, et al.

Metagenome assembly is the process of transforming a set of short, overlapping, and potentially erroneous DNA segments from environmental samples into an accurate representation of the underlying microbiome's genomes. State-of-the-art tools require large shared-memory machines and cannot handle contemporary metagenome datasets that exceed terabytes in size. In this paper, we introduce the MetaHipMer pipeline, a high-quality and high-performance metagenome assembler that employs an iterative de Bruijn graph approach. MetaHipMer leverages a specialized scaffolding algorithm that produces long scaffolds and accommodates the idiosyncrasies of metagenomes. MetaHipMer is parallelized end to end in the Unified Parallel C language and can therefore run seamlessly on both shared-memory and distributed-memory systems. Experimental results show that MetaHipMer matches or outperforms state-of-the-art tools in terms of accuracy. Moreover, MetaHipMer scales efficiently to large concurrencies and is able to assemble previously intractable grand-challenge metagenomes. We demonstrate the unprecedented capability of MetaHipMer by computing the first full assembly of the Twitchell Wetlands dataset, which consists of 7.5 billion reads (2.6 TBytes in size).
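
To make the phrase "iterative de Bruijn graph approach" concrete, the following is a minimal single-process sketch in Python. It is not MetaHipMer's implementation: the function names (`build_graph`, `extract_contigs`, `iterative_assemble`), the k values, and the toy reads are all hypothetical, and the sketch ignores sequencing errors, reverse complements, scaffolding, and the Unified Parallel C parallelization described above. It only illustrates the general idea of building a de Bruijn graph for one k, extracting contigs, and feeding those contigs into the graph for the next, larger k.

```python
from collections import Counter, defaultdict


def build_graph(seqs, k, min_count=1):
    """Build a de Bruijn graph: nodes are (k-1)-mers, each kept k-mer is an edge."""
    counts = Counter(s[i:i + k] for s in seqs for i in range(len(s) - k + 1))
    graph = defaultdict(set)
    for kmer, n in counts.items():
        if n >= min_count:  # hypothetical threshold for dropping rare k-mers
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extract_contigs(graph):
    """Walk unambiguous paths, starting only from nodes with no predecessor.

    Simplification: paths that begin at a branch or lie on a cycle are skipped.
    """
    indeg = Counter(v for succs in graph.values() for v in succs)
    contigs = []
    for start in sorted(graph):
        if indeg[start] != 0:
            continue
        contig, node, seen = start, start, {start}
        # Extend while the current node has exactly one successor and that
        # successor has exactly one predecessor (no fork, no merge).
        while len(graph.get(node, ())) == 1:
            (nxt,) = graph[node]
            if indeg[nxt] != 1 or nxt in seen:
                break
            contig += nxt[-1]
            seen.add(nxt)
            node = nxt
        contigs.append(contig)
    return contigs


def iterative_assemble(reads, ks=(4, 5)):
    """Repeat graph construction for increasing k, reusing earlier contigs."""
    contigs = []
    for k in ks:
        graph = build_graph(list(reads) + contigs, k)
        contigs = extract_contigs(graph)
    return contigs


if __name__ == "__main__":
    toy_reads = ["ATGGCG", "GGCGTG", "CGTGCA"]  # made-up overlapping reads
    print(iterative_assemble(toy_reads))  # ['ATGGCGTGCA']
```

In this toy case the three overlapping reads collapse into a single contig. On real metagenomes the graph contains forks from sequencing errors, repeats, and strain variation, which is where the specialized handling mentioned in the abstract becomes necessary.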
