Hierarchical Classification for Spoken Arabic Dialect Identification using Prosody: Case of Algerian Dialects

03/29/2017
by   Soumia Bougrine, et al.

In daily communication, Arabs use local dialects that are hard to identify automatically with conventional classification methods. The task becomes even harder when dealing with under-resourced dialects belonging to the same country or region. In this paper, we start by statistically analyzing Algerian dialects in order to capture their specificities in terms of prosodic information, which is extracted at the utterance level after a coarse-grained consonant/vowel segmentation. Based on the findings of this analysis, we propose a Hierarchical classification approach for spoken Arabic Algerian Dialect IDentification (HADID). It takes advantage of the fact that dialects are naturally structured into a hierarchy. Within HADID, a top-down hierarchical classification is applied, in which we use Deep Neural Networks (DNNs) to build a local classifier for every parent node in the dialect hierarchy. Our framework is implemented and evaluated on an Algerian Arabic dialect corpus, and the dialect hierarchy is deduced from historical and linguistic knowledge. The results reveal that, within HADID, DNNs are a better classifier than Support Vector Machines. In addition, compared with a baseline flat classification system, HADID gives an improvement of 63.5 in precision. Furthermore, the overall results demonstrate the suitability of our prosody-based HADID for speaker-independent dialect identification while requiring test utterances shorter than 6 s.
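The top-down scheme described in the abstract (one local classifier per parent node, applied from the root down to a leaf dialect) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the two-level `HIERARCHY`, the dialect and group names, and the two-dimensional toy feature vectors are hypothetical and are not taken from the paper. A simple nearest-centroid classifier stands in for the DNN local classifiers so that the example needs only the Python standard library.

```python
from collections import defaultdict
from math import dist

# Hypothetical two-level hierarchy (parent -> children); NOT the paper's tree.
HIERARCHY = {
    "root": ["group_A", "group_B"],
    "group_A": ["dialect_1", "dialect_2"],
    "group_B": ["dialect_3", "dialect_4"],
}


class NearestCentroid:
    """Tiny stand-in for a local classifier (the paper uses DNNs here)."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for x, label in zip(X, y):
            if label not in sums:
                sums[label] = [0.0] * len(x)
            sums[label] = [s + v for s, v in zip(sums[label], x)]
            counts[label] += 1
        # One mean feature vector per class seen at this node.
        self.centroids = {
            c: tuple(s / counts[c] for s in sums[c]) for c in sums
        }
        return self

    def predict_one(self, x):
        return min(self.centroids, key=lambda c: dist(x, self.centroids[c]))


def path_to_leaf(leaf, hierarchy):
    """Return the (parent, child) edges on the path from the root to leaf."""
    parent_of = {c: p for p, children in hierarchy.items() for c in children}
    edges, node = [], leaf
    while node in parent_of:
        edges.append((parent_of[node], node))
        node = parent_of[node]
    return edges[::-1]


def train_local_classifiers(X, leaves, hierarchy):
    """Train one classifier per parent node; its labels are that node's children."""
    data = defaultdict(lambda: ([], []))
    for x, leaf in zip(X, leaves):
        for parent, child in path_to_leaf(leaf, hierarchy):
            data[parent][0].append(x)
            data[parent][1].append(child)
    return {p: NearestCentroid().fit(xs, ys) for p, (xs, ys) in data.items()}


def predict_top_down(x, classifiers, hierarchy, root="root"):
    """Descend from the root, letting each local classifier pick a child."""
    node = root
    while node in hierarchy:  # stop once a leaf (a dialect) is reached
        node = classifiers[node].predict_one(x)
    return node


# Toy utterance-level feature vectors (hypothetical, two features each).
X_train = [
    (0.0, 0.0), (0.2, 0.1),   # dialect_1
    (0.0, 2.0), (0.1, 2.2),   # dialect_2
    (5.0, 0.0), (5.1, 0.2),   # dialect_3
    (5.0, 2.0), (5.2, 2.1),   # dialect_4
]
y_train = [
    "dialect_1", "dialect_1", "dialect_2", "dialect_2",
    "dialect_3", "dialect_3", "dialect_4", "dialect_4",
]

classifiers = train_local_classifiers(X_train, y_train, HIERARCHY)
print(predict_top_down((0.1, 1.9), classifiers, HIERARCHY))
```

In this sketch each local classifier only has to separate the children of its own node, which is the property a top-down hierarchical approach relies on; a flat baseline would instead train a single classifier over all leaf dialects at once.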


research
09/23/2015

Automatic Dialect Detection in Arabic Broadcast Speech

We investigate different approaches for dialect identification in Arabic...
research
11/13/2020

Arabic Dialect Identification Using BERT-Based Domain Adaptation

Arabic is one of the most important and growing languages in the world. ...
research
05/30/2018

Improving Dialogue Act Classification for Spontaneous Arabic Speech and Instant Messages at Utterance Level

The ability to model and automatically detect dialogue act is an importa...
research
05/29/2018

Automatic Identification of Arabic expressions related to future events in Lebanon's economy

In this paper, we propose a method to automatically identify future even...
research
01/19/2022

Interpreting Arabic Transformer Models

Arabic is a Semitic language which is widely spoken with many dialects. ...
research
06/24/2015

Benchmark of structured machine learning methods for microbial identification from mass-spectrometry data

Microbial identification is a central issue in microbiology, in particul...
research
05/02/2023

From Local to Global: Navigating Linguistic Diversity in the African Context

The focus is on critical problems in NLP related to linguistic diversity...
