BioNerFlair: biomedical named entity recognition using flair embedding and sequence tagger

11/03/2020
by Harsh Patel, et al.

Motivation: The proliferation of biomedical research articles has made information retrieval more important than ever. Scientists and researchers have difficulty finding articles that contain information relevant to them. Accurate extraction of biomedical entities such as disease, drug/chemical, species, and gene/protein can considerably improve the filtering of articles and so the extraction of relevant information. Performance on biomedical NER benchmarks has improved steadily thanks to advances in transformer-based models such as BERT, XLNet, and OpenAI's GPT-2. These models give excellent results; however, they are computationally expensive, and better scores can be achieved on domain-specific tasks using other contextual string-based models and an LSTM-CRF based sequence tagger.

Results: We introduce BioNerFlair, a method to train models for biomedical named entity recognition using Flair plus GloVe embeddings and a bidirectional LSTM-CRF based sequence tagger. With almost the same generic architecture that is widely used for named entity recognition, BioNerFlair outperforms previous state-of-the-art models. We performed experiments on 8 benchmark datasets for biomedical named entity recognition. Compared to current state-of-the-art models, BioNerFlair achieves the best F1-score on five corpora: 90.17 (previous best 84.72) on the BioCreative II gene mention (BC2GM) corpus, 94.03 (previous best 92.36) on the BioCreative IV chemical and drug (BC4CHEMD) corpus, 88.73 (previous best 78.58) on the JNLPBA corpus, 91.1 (previous best 89.71) on the NCBI disease corpus, and 85.48 (previous best 78.98) on the Species-800 corpus. Near-best results were observed on the BC5CDR-chem, BC5CDR-disease, and LINNAEUS corpora.
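To make the reported metric concrete, here is a minimal sketch of how an F1-score for NER is commonly computed from BIO-tagged sequences. It assumes exact-match, entity-level scoring, which the abstract does not specify. The function names `extract_spans` and `entity_f1` and the toy labels are illustrative only and are not taken from the paper.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start index, end index exclusive, entity type)


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        # An I- tag continues the open span only if its type matches.
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        # A B- tag, or an I- tag with no open span (lenient), opens a new span.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 over exactly matching (boundaries + type) entities."""
    tp = n_gold = n_pred = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = extract_spans(gold_tags), extract_spans(pred_tags)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision = tp / n_pred
    recall = tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Toy example (made-up tags, not data from any benchmark corpus).
gold = [["B-Disease", "I-Disease", "O", "B-Chemical"]]
pred = [["B-Disease", "I-Disease", "O", "O"]]
print(f"{entity_f1(gold, pred):.3f}")  # prints 0.667
```

In the toy example the prediction finds one of two gold entities with no false positives, so precision is 1.0, recall is 0.5, and F1 is 2/3. The scores in the abstract are on a 0 to 100 scale, which corresponds to this value multiplied by 100.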


