DPCIPI: A pre-trained deep learning model for estimation of cross-immunity between drifted strains of Influenza A/H3N2

by Yiming Du, et al.

Motivation: This study develops the DNA Pretrained Cross-Immunity Protection Inference Model (DPCIPI), a deep learning model for predicting cross-immunity between influenza virus strains. Cross-immunity is traditionally measured with hemagglutination inhibition (HI) experiments, which are costly and time-consuming. DPCIPI instead uses a pre-trained neural network to vectorize viral gene sequences and predicts the degree of cross-immunity between them.

Method: The gene sequences of two viruses are first converted into k-mers. The two k-mer sequences are aligned, and k-mers that are identical at the same position are deleted. Each remaining k-mer is vectorized with DNABERT, a model pre-trained on human DNA. A BiLSTM encoder then encodes the two viruses into sequence representations and embeddings. An information fusion operation combines the two embeddings into a spliced vector, which is fed into a fully connected neural network for prediction. All model parameters are trained jointly.

Result: Binary cross-immunity prediction asks whether the HI titer between two viruses exceeds a threshold (in our case, an HI titer higher than 40). Compared with baseline methods such as logistic regression, perceptron, decision tree, and CNN-based models, DPCIPI achieves better performance (e.g., an F1 score of 88.14). In a multilevel task that predicts the HI titer interval between two strains, DPCIPI again surpasses the baseline models. The study concludes that DPCIPI has enormous potential for predicting cross-immunity between influenza virus strains.
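The preprocessing and labeling steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are invented here, k = 3 is an assumed k-mer size (DNABERT supports k of 3 to 6), and the two sequences are assumed to be pre-aligned and of equal length.

```python
def to_kmers(seq, k=3):
    """Split a gene sequence into overlapping k-mers.

    k=3 is an assumption for illustration; DNABERT models exist for k in 3-6.
    """
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def differing_kmers(seq_a, seq_b, k=3):
    """Compare two aligned sequences position by position and keep only the
    k-mer pairs that differ, deleting identical k-mers at the same position
    as the paper describes."""
    kmers_a = to_kmers(seq_a, k)
    kmers_b = to_kmers(seq_b, k)
    return [(a, b) for a, b in zip(kmers_a, kmers_b) if a != b]


def binary_label(hi_titer, threshold=40):
    """Binary cross-immunity label: 1 if the HI titer exceeds the threshold
    (40 in the paper's binary prediction task), else 0."""
    return int(hi_titer > threshold)


# Example: sequences differing at one base produce differing k-mers only at
# the positions covering that base; identical k-mers are dropped.
pairs = differing_kmers("ACGTA", "ACGGA")
print(pairs)          # [('CGT', 'CGG'), ('GTA', 'GGA')]
print(binary_label(80))  # 1
```

The surviving k-mer pairs are what would then be vectorized by DNABERT and passed to the BiLSTM encoder; that downstream model is omitted here.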


