PtLnc-BXE: Prediction of plant lncRNAs using a Bagging-XGBoost-ensemble method with multiple features

11/01/2019
by Guangyan Zhang, et al.

Motivation: Long non-coding RNAs (lncRNAs) are a diverse class of RNA molecules longer than 200 nucleotides that do not encode proteins. Since lncRNAs are involved in a wide range of functions in cellular and developmental processes, an increasing number of methods and tools for distinguishing lncRNAs from coding RNAs have been proposed. However, most existing methods are designed for lncRNAs in animal systems, and only a few focus on plant lncRNA identification. Plant lncRNAs have characteristics distinct from those of lncRNAs in animal systems. It is therefore desirable to develop a computational method for accurate and rapid identification of plant lncRNAs.

Results: Herein, we present a plant lncRNA prediction approach, PtLnc-BXE, which combines multiple sequence features in two steps to develop an ensemble model. First, a diverse set of plant lncRNA features is collected, filtered by feature selection, and subsequently used to represent RNA sequences. Then, the training dataset is sampled into several subsets using the bootstrapping technique, base learners are constructed on the data subsets using XGBoost, and the base learners are combined into a single meta-learner using logistic regression. PtLnc-BXE outperformed other state-of-the-art plant lncRNA prediction methods, achieving higher AUC (> 95.9 reveal that the different species have a high overlap between their selected features for modeling. Therefore, it is possible to build cross-species prediction models for plant lncRNAs.

Availability: The scripts and data can be downloaded at https://github.com/xxxxx

Contact: example@example.org

Supplementary information: Supplementary data are available at Bioinformatics online.
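The two-step ensemble described under Results (bootstrap subsets, one base learner per subset, a logistic-regression meta-learner on top) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: a simple threshold rule stands in for XGBoost so the example runs with the Python standard library only, and the two feature columns, the toy values, and all function names are hypothetical.

```python
import math
import random

def bootstrap(X, y, rng):
    """Draw len(X) rows with replacement (one bootstrap subset)."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]

def fit_stump(X, y):
    """Stand-in base learner (NOT XGBoost): the single-feature
    threshold rule with the fewest training errors."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for sign in (1, -1):
                errors = sum((sign * (row[f] - t) > 0) != bool(label)
                             for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, f, t, sign)
    _, f, t, sign = best
    return lambda row: 1.0 if sign * (row[f] - t) > 0 else 0.0

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(Z, y, lr=0.5, epochs=300):
    """Meta-learner: logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(Z[0]), 0.0, len(Z)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for z, label in zip(Z, y):
            err = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) - label
            for j, zj in enumerate(z):
                grad_w[j] += err * zj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def fit_ensemble(X, y, n_learners=5, seed=0):
    """Bagging step, then stacking of base-learner outputs."""
    rng = random.Random(seed)
    learners = [fit_stump(*bootstrap(X, y, rng)) for _ in range(n_learners)]
    Z = [[h(row) for h in learners] for row in X]  # base-learner outputs
    w, b = fit_logistic(Z, y)

    def predict_proba(row):
        z = [h(row) for h in learners]
        return sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)

    return predict_proba

# Hypothetical toy data: two made-up sequence features per transcript;
# label 1 = lncRNA, label 0 = coding RNA.
X = [[0.10, 0.40], [0.15, 0.45], [0.20, 0.42], [0.25, 0.50],
     [0.30, 0.46], [0.35, 0.41], [0.70, 0.48], [0.75, 0.52],
     [0.80, 0.44], [0.85, 0.47], [0.90, 0.43], [0.95, 0.51]]
y = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

model = fit_ensemble(X, y)
print(model([0.05, 0.43]), model([0.99, 0.46]))
```

In a setting closer to the one described, the threshold rule would be replaced by an XGBoost classifier trained on each bootstrap subset, and the meta-learner would be fitted on the base learners' predicted probabilities rather than on hard 0/1 outputs.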

