Pairing interacting protein sequences using masked language modeling

08/14/2023
by Umberto Lupo, et al.
Predicting which proteins interact from their amino-acid sequences is an important task. We develop a method to pair interacting protein sequences that leverages the power of protein language models trained on multiple sequence alignments, such as MSA Transformer and the EvoFormer module of AlphaFold. We formulate the problem of pairing interacting partners among the paralogs of two protein families in a differentiable way. We introduce a method called DiffPALM that solves this problem by exploiting the ability of MSA Transformer to fill in masked amino acids in multiple sequence alignments using the surrounding context. MSA Transformer encodes coevolution between functionally or structurally coupled amino acids. We show that it captures inter-chain coevolution even though it was trained on single-chain data, which means that it can be used out-of-distribution. Relying on MSA Transformer without fine-tuning, DiffPALM outperforms existing coevolution-based pairing methods on difficult benchmarks of shallow multiple sequence alignments extracted from ubiquitous prokaryotic protein datasets. It also outperforms an alternative method based on a state-of-the-art protein language model trained on single sequences. Paired alignments of interacting protein sequences are a crucial ingredient of supervised deep learning methods that predict the three-dimensional structure of protein complexes. DiffPALM substantially improves the structure prediction of some eukaryotic protein complexes by AlphaFold-Multimer, without significantly degrading the prediction for any of those we tested. Its performance is also competitive with that of orthology-based pairing.
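
The abstract does not say how the pairing problem is made differentiable. One common way to relax a one-to-one matching into something gradient-friendly is Sinkhorn normalization, which turns a square score matrix into an approximately doubly stochastic "soft permutation". The sketch below illustrates only that generic relaxation and should not be read as the authors' implementation; the names `sinkhorn`, `scores`, and `soft_perm` are hypothetical, and the random scores stand in for parameters that, in a method like DiffPALM, would be optimized against a masked language modeling loss.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log-sum-exp along one axis (dimension kept)."""
    m = a.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))


def sinkhorn(scores, n_iters=200, temperature=1.0):
    """Relax a square score matrix into an approximately doubly stochastic one.

    Rows and columns are alternately normalized in log space. Lower
    temperatures push the result closer to a hard permutation matrix.
    """
    log_p = scores / temperature
    for _ in range(n_iters):
        log_p = log_p - logsumexp(log_p, axis=1)  # each row sums to 1
        log_p = log_p - logsumexp(log_p, axis=0)  # each column sums to 1
    return np.exp(log_p)


# Toy example: 4 paralogs of family A to be matched with 4 paralogs of family B.
rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 4))      # hypothetical learnable pairing scores
soft_perm = sinkhorn(scores)          # entry (i, j): soft weight of pairing A_i with B_j
hard_guess = soft_perm.argmax(axis=1) # greedy readout; may not be a valid permutation
```

Because every step is differentiable, a loss computed from the softly paired alignment can be backpropagated to `scores`. The greedy `argmax` readout is only for inspection; a proper one-to-one assignment would need a matching algorithm.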
