ANER: Arabic and Arabizi Named Entity Recognition using Transformer-Based Approach
One of the main tasks of Natural Language Processing (NLP), is Named Entity Recognition (NER). It is used in many applications and also can be used as an intermediate step for other tasks. We present ANER, a web-based named entity recognizer for the Arabic, and Arabizi languages. The model is built upon BERT, which is a transformer-based encoder. It can recognize 50 different entity classes, covering various fields. We trained our model on the WikiFANE_Gold dataset which consists of Wikipedia articles. We achieved an F1 score of 88.7%, which beats CAMeL Tools' F1 score of 83% on the ANERcorp dataset, which has only 4 classes. We also got an F1 score of 77.7% on the NewsFANE_Gold dataset which contains out-of-domain data from News articles. The system is deployed on a user-friendly web interface that accepts users' inputs in Arabic, or Arabizi. It allows users to explore the entities in the text by highlighting them. It can also direct users to get information about entities through Wikipedia directly. We added the ability to do NER using our model, or CAMeL Tools' model through our website. ANER is publicly accessible at <http://www.aner.online>. We also deployed our model on HuggingFace at https://huggingface.co/boda/ANER, to allow developers to test and use it.
READ FULL TEXT