A hybrid deep-learning approach for complex biochemical named entity recognition

by Jian Liu, et al.

Named entity recognition (NER) of chemicals and drugs is a critical area of information extraction in biochemical research. NER supports text mining of biochemical reactions, including entity relation extraction, attribute extraction, and metabolic response relationship extraction. However, complex naming characteristics in the biomedical field, such as polysemy and special characters, make the NER task very challenging. Here, we propose a hybrid deep learning approach to improve the recognition accuracy of NER. Specifically, our approach applies the Bidirectional Encoder Representations from Transformers (BERT) model to extract the underlying features of the text, learns a representation of the context of the text through Bi-directional Long Short-Term Memory (BILSTM), and incorporates the multi-head attention (MHATT) mechanism to extract chapter-level features. In this approach, the MHATT mechanism aims to improve the recognition accuracy of abbreviations, which addresses the problem of inconsistent labels across the full text. Moreover, a conditional random field (CRF) is used to label sequence tags because this probabilistic method does not need strict independence assumptions and can accommodate arbitrary context information. The experimental evaluation on a publicly available dataset shows that the proposed hybrid approach achieves the best recognition performance; in particular, it substantially improves performance in recognizing abbreviations, polysemes, and low-frequency entities compared with state-of-the-art approaches. For instance, compared with the recognition accuracies for low-frequency entities produced by the BILSTM-CRF algorithm, those produced by the hybrid approach on two entity datasets (MULTIPLE and IDENTIFIER) increased by 80% and 21.69%, respectively.
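To illustrate why a CRF output layer can help with sequence tagging, the following is a minimal sketch of Viterbi decoding for a linear-chain CRF, which is the usual way such a layer picks a tag sequence. It is not the authors' implementation: the abstract does not describe the decoder, and the function name `viterbi_decode`, the BIO tag set, and all scores below are assumptions made up for illustration. In the described pipeline, the per-token scores would come from the BERT, BILSTM, and MHATT layers rather than being written by hand.

```python
from typing import Dict, List, Tuple

Tag = str


def viterbi_decode(
    emissions: List[Dict[Tag, float]],
    transitions: Dict[Tuple[Tag, Tag], float],
    tags: List[Tag],
) -> List[Tag]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][tag] is the (hypothetical) per-token score for `tag` at
    position t; transitions[(prev, cur)] is the score for moving from
    `prev` to `cur`. Missing transitions default to 0.0.
    """
    if not emissions:
        return []

    # best[t][tag]: best total score of any path that ends in `tag` at t.
    best = [{tag: emissions[0][tag] for tag in tags}]
    # back[t - 1][tag]: the previous tag on that best path.
    back: List[Dict[Tag, Tag]] = []

    for t in range(1, len(emissions)):
        scores: Dict[Tag, float] = {}
        pointers: Dict[Tag, Tag] = {}
        for cur in tags:
            prev = max(
                tags,
                key=lambda p: best[t - 1][p] + transitions.get((p, cur), 0.0),
            )
            scores[cur] = (
                best[t - 1][prev]
                + transitions.get((prev, cur), 0.0)
                + emissions[t][cur]
            )
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Illustrative values only: an entity may not continue ("I") right after "O".
TAGS = ["O", "B", "I"]
TRANSITIONS = {("O", "I"): -10.0}

# Taking the best tag per token independently would give ["O", "I"],
# which is not a valid BIO sequence.
EMISSIONS = [
    {"O": 1.0, "B": 0.8, "I": 0.0},
    {"O": 0.0, "B": 0.0, "I": 1.5},
]

decoded = viterbi_decode(EMISSIONS, TRANSITIONS, TAGS)
print(decoded)  # ['B', 'I']
```

Because the decoder scores whole sequences, the penalty on the "O" to "I" transition overrides the slightly higher per-token score for "O" on the first token. This is one concrete sense in which a CRF layer can use context without assuming that tags are independent.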




