xTrimoABFold: Improving Antibody Structure Prediction without Multiple Sequence Alignments

11/30/2022
by Yining Wang, et al.

In antibody engineering, an essential task is to design a novel antibody whose paratope binds a specific antigen at the correct epitope. Understanding an antibody's structure and its paratope facilitates a mechanistic understanding of its function, so predicting antibody structure from sequence alone has long been a highly valuable problem for de novo antibody design. AlphaFold2, a breakthrough in structural biology, predicts protein structure from protein sequences together with computationally expensive coevolutionary multiple sequence alignments (MSAs). However, its computational cost and its limited accuracy on antibodies, especially on the complementarity-determining regions (CDRs), limit its application in industrial high-throughput drug design. To learn an informative representation of antibodies, we trained a deep transformer-based antibody language model (ALM) on curated sequences from the Observed Antibody Space database. We also developed a novel model, xTrimoABFold, that predicts antibody structure from the antibody sequence using the pretrained ALM together with efficient Evoformer and structure modules. The model was trained end-to-end on the antibody structures in PDB by minimizing an ensemble loss that combines a domain-specific focal loss on the CDRs with the frame-aligned point loss. xTrimoABFold outperforms AlphaFold2 and other state-of-the-art methods based on protein language models, e.g., OmegaFold, HelixFold-Single, and IgFold, by a large, significant margin (30+% improvement in RMSD) while running 151 times faster than AlphaFold2. To the best of our knowledge, xTrimoABFold achieves state-of-the-art antibody structure prediction. Its improvements in both accuracy and efficiency make it a valuable tool for de novo antibody design and could support further advances in immuno-theory.
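
To make the RMSD figure concrete, here is a minimal Python sketch of how a coordinate RMSD and a relative improvement over a baseline could be computed. It is an illustration only, not the paper's evaluation code: the function names `rmsd` and `relative_improvement`, the toy coordinates, and the assumption that the predicted and reference structures are already superimposed are all hypothetical, and the abstract does not state which atoms or alignment procedure the authors used.

```python
import math


def rmsd(pred, ref):
    """Root-mean-square deviation between two equally sized lists of
    (x, y, z) coordinates. Assumes the structures are already superimposed
    (no alignment step is performed here)."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (p[i] - r[i]) ** 2 for p, r in zip(pred, ref) for i in range(3)
    )
    return math.sqrt(squared / len(pred))


def relative_improvement(baseline_rmsd, model_rmsd):
    """Fractional RMSD reduction of a model relative to a baseline."""
    return (baseline_rmsd - model_rmsd) / baseline_rmsd


# Toy data (hypothetical): three atoms on a line.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
# Baseline prediction: every atom displaced by 1.0 along z.
baseline_pred = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
# Model prediction: every atom displaced by 0.5 along z.
model_pred = [(0.0, 0.0, 0.5), (1.0, 0.0, 0.5), (2.0, 0.0, 0.5)]

baseline_score = rmsd(baseline_pred, reference)
model_score = rmsd(model_pred, reference)
improvement = relative_improvement(baseline_score, model_score)
print(f"baseline={baseline_score:.2f} model={model_score:.2f} "
      f"improvement={improvement:.0%}")
```

In this toy case the model halves the baseline RMSD. An improvement of "30+%" in the sense sketched here would correspond to `relative_improvement` returning at least 0.3.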


research
01/05/2023

Reprogramming Pretrained Language Models for Protein Sequence Representation Learning

Machine Learning-guided solutions for protein learning tasks have made s...
research
04/14/2022

Generative power of a protein language model trained on multiple sequence alignments

Computational models starting from large ensembles of evolutionarily rel...
research
08/14/2023

Pairing interacting protein sequences using masked language modeling

Predicting which proteins interact together from amino-acid sequences is...
research
10/11/2022

Model Cascading: Towards Jointly Improving Efficiency and Accuracy of NLP Systems

Do all instances need inference through the big models for a correct pre...
research
11/20/2021

Simple End-to-end Deep Learning Model for CDR-H3 Loop Structure Prediction

Predicting a structure of an antibody from its sequence is important sin...
research
02/08/2022

ECRECer: Enzyme Commission Number Recommendation and Benchmarking based on Multiagent Dual-core Learning

Enzyme Commission (EC) numbers, which associate a protein sequence with ...
research
08/20/2022

Few-Shot Learning of Accurate Folding Landscape for Protein Structure Prediction

Data-driven predictive methods which can efficiently and accurately tran...
