Robust Acoustic Scene Classification in the Presence of Active Foreground Speech

08/02/2021
by Siyuan Song, et al.

We present an iVector-based Acoustic Scene Classification (ASC) system suited for real-life settings where active foreground speech can be present. In the proposed system, each recording is represented by a fixed-length iVector that models the recording's important properties. A regularized Gaussian backend classifier with class-specific covariance models is used to extract the relevant acoustic scene information from these iVectors. To alleviate the large performance degradation that occurs when a foreground speaker dominates the captured signal, we investigate the use of the iVector framework on Mel-Frequency Cepstral Coefficients (MFCCs) that are derived from an estimate of the noise power spectral density. This noise floor can be extracted in a statistical manner from single-channel recordings. We show that the use of noise-floor features is complementary to multi-condition training, in which foreground speech is added to the training signals to reduce the mismatch between training and testing conditions. Experimental results on the DCASE 2016 Task 1 dataset show that noise-floor-based features and multi-condition training yield significant classification accuracy gains of more than 25 percentage points (absolute) in the most adverse conditions. These promising results can further facilitate the integration of ASC in resource-constrained devices such as hearables.
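
The multi-condition training step described above, in which foreground speech is added to the training signals, can be illustrated with a minimal sketch. The abstract does not specify the mixing procedure or the levels used, so everything here is an assumption made for illustration: the function names `power` and `mix_at_snr`, the use of a speech-to-scene power ratio in dB as the control parameter, and the synthetic signals in the demo.

```python
import math
import random


def power(signal):
    """Mean power (mean of squared samples) of a sequence of floats."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, scene, snr_db):
    """Add foreground speech to a scene recording at a target power ratio.

    The speech is scaled so that 10*log10(P_speech / P_scene) equals snr_db,
    then added sample by sample to the unchanged scene signal.
    (Hypothetical helper; not taken from the paper.)
    """
    if len(speech) != len(scene):
        raise ValueError("signals must have the same length")
    p_speech, p_scene = power(speech), power(scene)
    if p_speech == 0.0 or p_scene == 0.0:
        raise ValueError("signals must have non-zero power")
    # Gain that brings the speech power to p_scene * 10^(snr_db / 10).
    gain = math.sqrt(p_scene / p_speech * 10.0 ** (snr_db / 10.0))
    return [b + gain * s for s, b in zip(speech, scene)]


if __name__ == "__main__":
    rate = 16000  # assumed sampling rate for the synthetic demo
    rng = random.Random(0)
    # Stand-ins for real data: white noise as the "scene", a tone as the "speech".
    scene = [rng.gauss(0.0, 0.1) for _ in range(rate)]
    speech = [math.sin(2 * math.pi * 200 * n / rate) for n in range(rate)]
    # Illustrative ratios only; the levels used in the paper are not given here.
    for snr_db in (-5.0, 0.0, 5.0):
        mixed = mix_at_snr(speech, scene, snr_db)
        print(snr_db, round(power(mixed), 4))
```

In a training pipeline, each clean scene recording would be kept alongside one or more such mixtures, so that the classifier sees both speech-free and speech-dominated versions of every scene class.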

