Comparing two deep learning sequence-based models for protein-protein interaction prediction

01/15/2019
by Florian Richoux, et al.

Biological data are extremely diverse and complex, but also quite sparse. Recent developments in deep learning offer new possibilities for the analysis of complex data. However, it is easy to obtain a deep learning model that seems to perform well but is in fact overfitting either the training data or the validation data. In particular, overfitting the validation data, called "information leak", is almost never addressed in papers proposing deep learning models to predict protein-protein interactions (PPIs). In this work, we compare two carefully designed deep learning models and show pitfalls to avoid when predicting PPIs with machine learning methods. Our best model accurately predicts more than 78 conditions both for training and testing. The methodology we propose here gives us strong confidence in the ability of a model to scale up to larger datasets. This should enable sharper models once larger datasets become available, instead of current models, which are prone to information leaks. Our solid methodological foundations should be applicable to more organisms and to whole-proteome network predictions.
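
To make the "information leak" pitfall concrete, here is a minimal sketch of a leak-free evaluation protocol. It is a generic illustration, not the authors' pipeline: the helper names (`three_way_split`, `fit_threshold`, `select_and_test`), the split fractions, and the toy threshold classifier standing in for a deep network are all hypothetical. The point it shows is that the model is fitted on training data only, hyperparameters are chosen on validation data only, and the test set is evaluated exactly once, after every choice is frozen.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical record: (score for a protein pair, label 1 = interacts, 0 = not)
Example = Tuple[float, int]


def three_way_split(data: Sequence[Example], val_frac: float = 0.2,
                    test_frac: float = 0.2, seed: int = 0):
    """Shuffle once with a fixed seed, then cut into disjoint train/val/test lists."""
    rng = random.Random(seed)
    shuffled: List[Example] = list(data)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def fit_threshold(train: Sequence[Example], offset: float) -> float:
    """Toy 'training': midpoint of the two class means, shifted by a hyperparameter."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    if not pos or not neg:
        raise ValueError("training split must contain both classes")
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2 + offset


def accuracy(threshold: float, data: Sequence[Example]) -> float:
    """Fraction of examples where (score >= threshold) matches the label."""
    correct = sum(1 for x, y in data if int(x >= threshold) == y)
    return correct / len(data)


def select_and_test(data: Sequence[Example], offsets: Sequence[float]):
    train, val, test = three_way_split(data)
    # Fit every candidate on the training split only.
    thresholds = {o: fit_threshold(train, o) for o in offsets}
    # Model selection looks ONLY at the validation split.
    best = max(offsets, key=lambda o: accuracy(thresholds[o], val))
    # The test split is touched once, after all choices are frozen.
    return best, accuracy(thresholds[best], test)
```

Reusing the test split to pick hyperparameters, or re-splitting until the test score looks good, would turn it into a second validation set; that is the kind of leak the abstract warns about.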
