i6mA-CNN: a convolution based computational approach towards identification of DNA N6-methyladenine sites in rice genome

07/20/2020
by Ruhul Amin, et al.

Motivation: N6-methylation of adenine in DNA (6mA) is a post-replication modification responsible for many biological functions. Experimental methods for genome-wide 6mA site detection are expensive and labour-intensive. Automated and accurate computational methods can help identify 6mA sites in long genomes, saving significant time and money.

Results: Our study develops i6mA-CNN, a convolutional neural network based tool capable of identifying 6mA sites in the rice genome. Our model combines multiple types of features: a PseAAC-inspired customized feature vector, multiple one-hot representations, and dinucleotide physicochemical properties. It achieves an area under the receiver operating characteristic curve of 0.98 and an overall accuracy of 0.94 using 5-fold cross-validation on the benchmark dataset. Finally, we evaluate our model on two other plant genome 6mA site identification datasets besides rice. The results suggest that the proposed tool generalizes its 6mA site identification ability to plant genomes irrespective of plant species.

Availability: The web tool for this research can be found at https://cutt.ly/Co6KuWG.

Contact: rafeed@cse.uiu.ac.bd

Supplementary information: Supplementary data (benchmark dataset, independent test dataset, comparison purpose dataset, trained model, physicochemical property values, attention mechanism details for motif finding) are available at https://cutt.ly/PpDdeDH.
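The abstract mentions "multiple one hot representations" among the input features but does not spell them out. As a minimal sketch, assuming these include a per-nucleotide and a per-dinucleotide encoding, the two could look like the following. The function names `one_hot_mono` and `one_hot_di` are hypothetical and are not taken from the i6mA-CNN code; the handling of unknown bases (all-zero rows) is also an assumption.

```python
from itertools import product

NUCLEOTIDES = "ACGT"
# All 16 ordered dinucleotides: AA, AC, AG, AT, CA, ...
DINUCLEOTIDES = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]


def one_hot_mono(seq):
    """Encode each nucleotide as a 4-element indicator vector.

    Assumption: bases outside A/C/G/T (e.g. N) become all-zero rows.
    """
    seq = seq.upper()
    return [[1 if base == n else 0 for n in NUCLEOTIDES] for base in seq]


def one_hot_di(seq):
    """Encode each overlapping dinucleotide as a 16-element indicator vector.

    A sequence of length L yields L - 1 rows.
    """
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return [[1 if pair == d else 0 for d in DINUCLEOTIDES] for pair in pairs]


if __name__ == "__main__":
    example = "GATCA"  # illustrative sequence, not from the benchmark dataset
    for base, row in zip(example, one_hot_mono(example)):
        print(base, row)
    print(len(one_hot_di(example)), "dinucleotide rows")
```

Matrices of this shape (sequence positions by channels) are the usual input form for a one-dimensional convolutional layer, which is why one-hot encodings pair naturally with a CNN; how i6mA-CNN actually combines them with the other feature types is described in the full text, not here.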


research
12/21/2019

iPromoter-BnCNN: a Novel Branched CNN Based Predictor for Identifying and Classifying Sigma Promoters

Promoter is a short region of DNA which is responsible for initiating tr...
research
02/12/2019

PLIT: An alignment-free computational tool for identification of long non-coding RNAs in plant transcriptomic datasets

Long non-coding RNAs (lncRNAs) are a class of non-coding RNAs which play...
research
04/13/2018

Classification of large DNA methylation datasets for identifying cancer drivers

DNA methylation is a well-studied genetic modification crucial to regula...
research
10/03/2017

Dilated Convolutions for Modeling Long-Distance Genomic Dependencies

We consider the task of detecting regulatory elements in the human genom...
research
11/01/2019

PtLnc-BXE: Prediction of plant lncRNAs using a Bagging-XGBoost-ensemble method with multiple features

Motivation: Long non-coding RNAs (lncRNAs) are a diverse class of RNA mo...
research
08/03/2015

Unsupervised Learning in Genome Informatics

With different genomes available, unsupervised learning algorithms are e...
research
11/03/2022

betaclust: a family of mixture models for beta valued DNA methylation data

The DNA methylation process has been extensively studied for its role in...
