RISC: A Corpus for Shout Type Classification and Shout Intensity Prediction

06/07/2023
by Takahiro Fukumori et al.

The detection of shouted speech is crucial in audio surveillance and monitoring. Although it is desirable for a security system to be able to identify emergencies, existing corpora provide only a binary label (i.e., shouted or normal) for each speech sample, making it difficult to predict shout intensity. Furthermore, most corpora comprise only utterances typical of hazardous situations, so classifiers cannot learn to discriminate such utterances from shouts typical of less hazardous situations, such as cheers. This paper therefore presents a novel research resource, the RItsumeikan Shout Corpus (RISC), which contains a wide variety of types of shouted speech samples collected in recording experiments. Each shouted speech sample in RISC is labeled with a shout type and is also assigned shout intensity ratings via a crowdsourcing service. We also present a comprehensive performance comparison among deep learning approaches for speech type classification tasks and a shout intensity prediction task. The results show that feature learning based on the spectral and cepstral domains achieves high performance regardless of the network architecture used. The results also demonstrate that shout type classification and intensity prediction remain challenging tasks, and RISC is expected to contribute to further development in this research area.


