Deep Learning Methods for Protein Family Classification on PDB Sequencing Data

07/14/2022
by   Aaron Wang, et al.
0

Composed of amino acid chains that influence how they fold and thus dictating their function and features, proteins are a class of macromolecules that play a central role in major biological processes and are required for the structure, function, and regulation of the body's tissues. Understanding protein functions is vital to the development of therapeutics and precision medicine, and hence the ability to classify proteins and their functions based on measurable features is crucial; indeed, the automatic inference of a protein's properties from its sequence of amino acids, known as its primary structure, remains an important open problem within the field of bioinformatics, especially given the recent advancements in sequencing technologies and the extensive number of known but uncategorized proteins with unknown properties. In this work, we demonstrate and compare the performance of several deep learning frameworks, including novel bi-directional LSTM and convolutional models, on widely available sequencing data from the Protein Data Bank (PDB) of the Research Collaboratory for Structural Bioinformatics (RCSB), as well as benchmark this performance against classical machine learning approaches, including k-nearest neighbors and multinomial regression classifiers, trained on experimental data. Our results show that our deep learning models deliver superior performance to classical machine learning methods, with the convolutional architecture providing the most impressive inference performance.

READ FULL TEXT
research
11/05/2022

Learning the shape of protein micro-environments with a holographic convolutional neural network

Proteins play a central role in biology from immune recognition to brain...
research
09/16/2021

PDBench: Evaluating Computational Methods for Protein Sequence Design

Proteins perform critical processes in all living systems: converting so...
research
01/15/2019

Comparing two deep learning sequence-based models for protein-protein interaction prediction

Biological data are extremely diverse, complex but also quite sparse. Th...
research
08/13/2021

Biomedical data and deep learning computational models for predicting compound-protein relations

Researchers have developed a computational field called virtual screenin...
research
11/03/2021

Binary classification of proteins by a Machine Learning approach

In this work we present a system based on a Deep Learning approach, by u...
research
05/01/2019

Machine Learning for Classification of Protein Helix Capping Motifs

The biological function of a protein stems from its 3-dimensional struct...
research
01/16/2023

PlasmoFAB: A Benchmark to Foster Machine Learning for Plasmodium falciparum Protein Antigen Candidate Prediction

Motivation: Machine learning methods can be used to support scientific d...

Please sign up or login with your details

Forgot password? Click here to reset