Cascade Neural Ensemble for Identifying Scientifically Sound Articles

Background: A significant barrier to conducting systematic reviews and meta-analyses is efficiently finding scientifically sound, relevant articles. Typically, less than 1% of articles qualify, which makes this a highly imbalanced task. Although feature-engineered models and early neural network models have been studied for this task, there is an opportunity to improve the results.

Methods: We framed the problem of filtering articles as a classification task, and trained and tested several ensemble architectures of SciBERT, a variant of BERT pre-trained on scientific articles, on a manually annotated dataset of about 50K articles from MEDLINE. Since scientifically sound articles are identified through a multi-step process, we proposed a novel cascade ensemble analogous to the selection process. We compared the performance of the cascade ensemble with a single integrated model and other types of ensembles, as well as with results from previous studies.

Results: The cascade ensemble architecture achieved an F-measure of 0.7505, an impressive 49.1% error rate reduction over models previously proposed and evaluated on a selected subset of the 50K articles. On the full dataset, the cascade ensemble achieved an F-measure of 0.7639, an error rate reduction of 19.7% relative to the previous study that used the full dataset.

Conclusion: Pre-trained contextual encoder neural networks (e.g. SciBERT) perform better than both the models studied previously and manually created search filters in filtering for scientifically sound, relevant articles. The superior performance achieved by the cascade ensemble is a significant result that generalizes beyond this task and dataset, and is analogous to query optimization in IR and databases.
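The cascade idea described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Stage` class, the `cascade_predict` and `f_measure` helpers, and the keyword-based scorers are all hypothetical stand-ins. In the paper's setting each stage would presumably be a fine-tuned SciBERT classifier for one step of the selection process; the abstract does not name those steps, so the two stages here are invented for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# A scorer maps an article's text to a score in [0, 1].
# Assumption: in practice this would wrap a trained classifier.
Scorer = Callable[[str], float]


@dataclass
class Stage:
    name: str            # hypothetical label for one selection step
    scorer: Scorer       # model (here: a stub) for that step
    threshold: float = 0.5


def cascade_predict(text: str, stages: Sequence[Stage]) -> bool:
    """Accept an article only if it passes every stage, in order."""
    for stage in stages:
        if stage.scorer(text) < stage.threshold:
            return False  # rejected early; later stages are never run
    return True


def f_measure(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Balanced F-measure (F1) for the positive class."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical keyword stubs standing in for trained classifiers.
STAGES = [
    Stage("mentions_trial", lambda t: 1.0 if "trial" in t.lower() else 0.0),
    Stage("mentions_randomized", lambda t: 1.0 if "randomized" in t.lower() else 0.0),
]

if __name__ == "__main__":
    articles = ["A randomized controlled trial of drug X", "A single case report"]
    print([cascade_predict(a, STAGES) for a in articles])
```

Because an article rejected at an early stage never reaches the later ones, cheap or high-recall stages can be placed first, which is the sense in which the abstract compares the cascade to query optimization.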
