Speech Corpus of Ainu Folklore and End-to-end Speech Recognition for Ainu Language

02/16/2020
by Kohei Matsuura, et al.

Ainu is an unwritten language that has been spoken by the Ainu people, one of the ethnic groups of Japan. It is recognized as critically endangered by UNESCO, and archiving and documentation of its language heritage are of paramount importance. Although a considerable amount of voice recordings of Ainu folklore has been produced and accumulated to preserve their culture, only a small portion of them has been transcribed so far. We therefore started a project of automatic speech recognition (ASR) for the Ainu language in order to contribute to the development of annotated language archives. In this paper, we report the development of the speech corpus and the structure and performance of an end-to-end ASR system for Ainu. We investigated four modeling units (phone, syllable, word piece, and word) and found that the syllable-based model performed best in terms of both word and phone recognition accuracy, which were about 60% and over 85% respectively in the speaker-open condition. Furthermore, word and phone accuracy of 80% and 90% were achieved in the speaker-closed setting. We also found that multilingual ASR training with additional speech corpora of English and Japanese further improves the speaker-open test accuracy.
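To make the choice of modeling unit concrete, here is a minimal, hypothetical Python sketch that segments a single romanized Ainu word in three of the four ways compared in the paper. The example word "irankarapte" (a common greeting), the naive (C)V(C) syllable rule, and all function names are illustrative assumptions rather than the paper's actual preprocessing; word-piece segmentation is omitted because it would require a trained subword model (e.g. SentencePiece).

```python
import re

# Illustrative sketch only: segmenting romanized Ainu text into three of the
# four modeling units compared in the paper (word, syllable, phone).
# The (C)V(C) syllable rule below is a simplifying assumption and ignores
# glides, personal-affix boundaries ("="), and other orthographic details.

VOWELS = "aeiou"

def to_phones(word: str) -> list[str]:
    """Treat each romanized letter as one phone label (assumption)."""
    return list(word)

def to_syllables(word: str) -> list[str]:
    """Naive (C)V(C) split: keep a coda consonant only when it is
    followed by another consonant or ends the word."""
    pattern = rf"[^{VOWELS}]?[{VOWELS}](?:[^{VOWELS}](?=[^{VOWELS}]|$))?"
    return re.findall(pattern, word)

def segment(sentence: str, unit: str) -> list[str]:
    words = sentence.lower().split()
    if unit == "word":
        return words
    if unit == "syllable":
        return [s for w in words for s in to_syllables(w)]
    if unit == "phone":
        return [p for w in words for p in to_phones(w)]
    raise ValueError("word pieces need a trained subword model")

if __name__ == "__main__":
    phrase = "irankarapte"  # Ainu greeting, used here as a toy example
    for unit in ("word", "syllable", "phone"):
        print(unit, segment(phrase, unit))
    # word     ['irankarapte']
    # syllable ['i', 'ran', 'ka', 'rap', 'te']
    # phone    ['i', 'r', 'a', 'n', 'k', 'a', 'r', 'a', 'p', 't', 'e']
```

Under a syllable-based scheme like the one sketched above, the end-to-end model's output vocabulary is the inventory of syllable labels rather than whole words, which keeps the label set small for a low-resource corpus while remaining closer to word units than individual phones.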


