A step towards neural genome assembly

11/10/2020
by Lovro Vrček, et al.

De novo genome assembly focuses on finding connections between a vast number of short sequences in order to reconstruct the original genome. The central problem of genome assembly can be described as finding a Hamiltonian path through a large directed graph, with the constraint that an unknown number of nodes and edges should be avoided. However, due to local structures in the graph and biological features, the problem can be reduced to graph simplification, which includes the removal of redundant information. Motivated by recent advances in graph representation learning and neural execution of algorithms, in this work we train a message passing neural network (MPNN) with a max aggregator to execute several algorithms for graph simplification. We show that the algorithms were learned successfully and can be scaled to graphs up to 20 times larger than the ones used in training. We also test on graphs obtained from real-world genomic data: that of a lambda phage and E. coli.
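As a rough illustration of what max aggregation means in a message passing network, the sketch below runs one simplified propagation step on a toy directed graph. It is not the authors' model: the function name `max_aggregate_step`, the toy graph, and the feature values are hypothetical, and the learned message and update networks of a real MPNN are replaced here by the identity so that only the aggregation is visible.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def max_aggregate_step(
    features: Dict[str, Vector],
    edges: List[Tuple[str, str]],
) -> Dict[str, Vector]:
    """One simplified message passing step with element-wise max aggregation.

    Every node collects the feature vectors of its in-neighbours (the
    "messages") and combines them, together with its own features, using an
    element-wise maximum. Learned message/update functions are omitted
    (assumption: identity), so this only demonstrates the aggregator.
    """
    updated: Dict[str, Vector] = {}
    for node, own in features.items():
        # Messages come from the features *before* this step, so the
        # update is synchronous across all nodes.
        incoming = [features[src] for src, dst in edges if dst == node]
        aggregated = list(own)
        for message in incoming:
            aggregated = [max(a, m) for a, m in zip(aggregated, message)]
        updated[node] = aggregated
    return updated


# Toy directed graph (hypothetical): a -> b -> c plus a shortcut edge a -> c.
toy_features = {"a": [1.0, 0.0], "b": [0.0, 2.0], "c": [0.5, 0.5]}
toy_edges = [("a", "b"), ("b", "c"), ("a", "c")]

result = max_aggregate_step(toy_features, toy_edges)
print(result)
```

Because the maximum does not depend on how many neighbours a node has, the aggregated values stay on the same scale when the graph grows, which is one commonly cited reason for preferring it when a trained model is applied to larger graphs.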


