SANTLR: Speech Annotation Toolkit for Low Resource Languages

08/02/2019
by Xinjian Li, et al.

While low resource speech recognition has attracted a lot of attention from the speech community, few tools are available to facilitate low resource speech collection. In this work, we present SANTLR: Speech Annotation Toolkit for Low Resource Languages. It is a web-based toolkit that allows researchers to easily collect and annotate a speech corpus in a low resource language. Annotators may use this toolkit for two purposes: transcription or recording. In transcription, annotators transcribe audio files provided by the researchers; in recording, annotators record their voice by reading provided texts. We highlight two properties of this toolkit. First, SANTLR has a very user-friendly user interface (UI). Both researchers and annotators interact through this simple web interface, and annotators need no expertise in audio or text processing, because the toolkit handles all preprocessing and postprocessing steps. Second, we employ a multi-step ranking mechanism to facilitate the annotation process. In particular, the toolkit gives higher priority to utterances that are easier to annotate and more beneficial to achieving the goal of the annotation, e.g. quickly training an acoustic model.
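The abstract does not say how the multi-step ranking scores or combines its criteria. Purely as an illustration, assume each utterance already carries a hypothetical `ease` score (higher means easier to annotate) and a hypothetical `benefit` score (higher means more useful for the annotation goal, such as training an acoustic model). A two-step ranking could then shortlist the easiest utterances first and order that shortlist by benefit. The names `Utterance`, `rank_utterances`, and `shortlist_size` are illustrative and are not taken from SANTLR.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    uid: str
    ease: float     # hypothetical score: higher = easier to annotate
    benefit: float  # hypothetical score: higher = more useful for the annotation goal


def rank_utterances(utts: List[Utterance], shortlist_size: int) -> List[Utterance]:
    """Hypothetical two-step ranking sketch, not SANTLR's actual algorithm."""
    # Step 1: order all utterances by ease (easiest first) and keep a shortlist.
    by_ease = sorted(utts, key=lambda u: u.ease, reverse=True)
    shortlist, rest = by_ease[:shortlist_size], by_ease[shortlist_size:]
    # Step 2: within the shortlist, put the most beneficial utterances first,
    # breaking ties by ease; the remaining utterances follow in ease order.
    reranked = sorted(shortlist, key=lambda u: (u.benefit, u.ease), reverse=True)
    return reranked + rest


queue = rank_utterances(
    [Utterance("a", 0.9, 0.1), Utterance("b", 0.8, 0.7),
     Utterance("c", 0.2, 0.9), Utterance("d", 0.7, 0.4)],
    shortlist_size=3,
)
print([u.uid for u in queue])  # ['b', 'd', 'a', 'c']
```

Here utterance "c" has the highest benefit but is hard to annotate, so this sketch defers it until after the easier shortlisted items. A real system might instead use a weighted sum of the two scores or update the scores as annotations arrive.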



