Multi-scale Sinusoidal Embeddings Enable Learning on High Resolution Mass Spectrometry Data

07/06/2022
by Gennady Voronov, et al.

Small molecules in biological samples are studied to provide information about disease states, environmental toxins, natural product drug discovery, and many other applications. The primary window into the composition of small molecule mixtures is tandem mass spectrometry (MS2), which produces highly sensitive data with parts-per-million resolution. We adopt multi-scale sinusoidal embeddings of the MS2 mass data, designed to meet the challenge of learning from the full resolution of these data. Using these embeddings, we provide a new state-of-the-art model for spectral library search, the standard task for initial evaluation of MS2 data. We also introduce a new task, chemical property prediction from MS2 data, which has natural applications in high-throughput MS2 experiments, and we show that an average R^2 of 80% can be achieved for novel compounds across 10 chemical properties prioritized by medicinal chemists. We use dimensionality reduction techniques and experiments with different floating point resolutions to show the essential role that multi-scale sinusoidal embeddings play in learning from MS2 data.
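To make the idea of a multi-scale sinusoidal embedding concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the formulation, so the function name `sinusoidal_embedding`, the embedding size, and the wavelength range (geometrically spaced from 1e-3 to 1e4 mass units) are all hypothetical choices. The sketch embeds each m/z value as sines and cosines at many scales, so that two peaks only about 10 ppm apart still receive clearly different vectors.

```python
import numpy as np

def sinusoidal_embedding(mz, dim=64, min_wavelength=1e-3, max_wavelength=1e4):
    """Embed m/z values as sines and cosines at geometrically spaced wavelengths.

    All parameter defaults are illustrative assumptions, not values from the paper.
    """
    mz = np.asarray(mz, dtype=np.float64)
    n_freq = dim // 2
    # Wavelengths run from fine (sub-ppm detail) to coarse (whole mass range).
    exponents = np.arange(n_freq) / (n_freq - 1)
    wavelengths = min_wavelength * (max_wavelength / min_wavelength) ** exponents
    # One angle per (m/z value, wavelength) pair.
    angles = 2.0 * np.pi * mz[..., None] / wavelengths
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

# Two hypothetical peaks about 10 ppm apart near m/z 500.
a, b = 500.000, 500.005
emb = sinusoidal_embedding([a, b])
distance = np.linalg.norm(emb[0] - emb[1])

# For comparison: the raw values collapse to the same number in half precision.
same_in_fp16 = np.float16(a) == np.float16(b)

print(emb.shape, distance, same_in_fp16)
```

Because every component is bounded in [-1, 1], fine mass differences are carried by the short-wavelength components rather than by the low-order digits of a single large number, which is one plausible reading of why such embeddings help at reduced floating point precision.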


