Metagenome2Vec: Building Contextualized Representations for Scalable Metagenome Analysis

Advances in next-generation metagenome sequencing have the potential to revolutionize the point-of-care diagnosis of novel pathogen infections, which could help prevent widespread transmission of diseases. Given the high volume of metagenome sequences, there is a need for scalable frameworks to analyze and segment metagenome sequences from clinical samples, which can be highly imbalanced. There is also an increased need for learning robust representations from metagenome reads, since pathogens within a family can have highly similar genome structures (some more than 90%); such representations could enable the segmentation and identification of novel pathogen sequences with limited labeled data. In this work, we propose Metagenome2Vec, a contextualized representation that captures the global structural properties inherent in metagenome data and local contextualized properties through self-supervised representation learning. We show that the learned representations can help detect six related pathogens from clinical samples with fewer than 100 labeled sequences. Extensive experiments on simulated and clinical metagenome data show that the proposed representation encodes compositional properties that can generalize beyond annotations to segment novel pathogens in an unsupervised setting.


