PlasmoFAB: A Benchmark to Foster Machine Learning for Plasmodium falciparum Protein Antigen Candidate Prediction

01/16/2023
by Jonas Christian Ditz, et al.

Motivation: Machine learning methods can be used to support scientific discovery in healthcare-related research fields. However, these methods can only be reliably used if they can be trained on high-quality and curated datasets. Currently, no such dataset for the exploration of Plasmodium falciparum protein antigen candidates exists. The parasite Plasmodium falciparum causes the infectious disease malaria. Thus, identifying potential antigens is of utmost importance for the development of antimalarial drugs and vaccines. Since exploring antigen candidates experimentally is an expensive and time-consuming process, applying machine learning methods to support this process has the potential to accelerate the development of drugs and vaccines which are needed for fighting and controlling malaria.

Results: We developed PlasmoFAB, a curated benchmark that can be used to train machine learning methods for the exploration of Plasmodium falciparum protein antigen candidates. We combined an extensive literature search with domain expertise to create high-quality labels for Plasmodium falciparum specific proteins that distinguish between antigen candidates and intracellular proteins. Additionally, we used our benchmark to compare different well-known prediction models and available protein localization prediction services on the task of identifying protein antigen candidates. We show that available general-purpose services are unable to provide sufficient performance on identifying protein antigen candidates and are outperformed by models that were trained on specialized data.

research
11/22/2019

Machine learning for protein folding and dynamics

Many aspects of the study of protein folding and dynamics have been affe...
research
08/23/2021

APObind: A Dataset of Ligand Unbound Protein Conformations for Machine Learning Applications in De Novo Drug Design

Protein-ligand complex structures have been utilised to design benchmark...
research
06/06/2023

AVIDa-hIL6: A Large-Scale VHH Dataset Produced from an Immunized Alpaca for Predicting Antigen-Antibody Interactions

Antibodies have become an important class of therapeutic agents to treat...
research
01/16/2023

Hybrid quantum-classical convolutional neural networks to improve molecular protein binding affinity predictions

One of the main challenges in drug discovery is to find molecules that b...
research
07/14/2022

Deep Learning Methods for Protein Family Classification on PDB Sequencing Data

Composed of amino acid chains that influence how they fold and thus dict...
research
05/28/2021

Bridge Data Center AI Systems with Edge Computing for Actionable Information Retrieval

Extremely high data rates at modern synchrotron and X-ray free-electron ...
research
06/25/2020

Machine-Learning Driven Drug Repurposing for COVID-19

The integration of machine learning methods into bioinformatics provides...
