Speaker Diarization and Identification from Single-Channel Classroom Audio Recording Using Virtual Microphones

07/01/2022
by Antonio Gomez, et al.

Speaker identification in noisy audio recordings, particularly those from collaborative learning environments, can be extremely challenging. Individual students talking within a small group must be distinguished from other students talking at the same time. To address the problem, we assume a single microphone per student group and no access to large prior datasets for training. This dissertation proposes a method of speaker identification using cross-correlation patterns associated with an array of virtual microphones centered around the physical microphone. The virtual microphones are simulated using the approximate speaker geometry observed in a video recording. The patterns are constructed from estimates of the room impulse responses for each virtual microphone and are then used to identify the speakers. The proposed method is validated on classroom audio recordings and shown to substantially outperform the diarization services provided by Google Cloud and Amazon AWS.
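The idea of a cross-correlation pattern over virtual microphones can be illustrated with a small sketch. This is not the dissertation's implementation: it assumes a free-field, delay-only impulse response in place of the estimated room impulse responses, a ring of virtual microphones in 2D, and white noise standing in for speech. All names, positions, and constants below (`virtual_mic_positions`, `lag_pattern`, `identify`, the 0.5 m radius, the 16 kHz sample rate) are hypothetical and chosen only for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value
SAMPLE_RATE = 16000     # Hz, assumed recording rate


def virtual_mic_positions(center, radius, count):
    """Place `count` virtual microphones on a circle around the physical one."""
    angles = 2 * np.pi * np.arange(count) / count
    return center + radius * np.column_stack([np.cos(angles), np.sin(angles)])


def delay_samples(speaker, mics):
    """Propagation delay (in samples) from a speaker to each microphone."""
    distances = np.linalg.norm(mics - speaker, axis=1)
    return np.rint(distances / SPEED_OF_SOUND * SAMPLE_RATE).astype(int)


def simulate(signal, delay):
    """Apply a delay-only impulse response (assumption: no reverberation)."""
    h = np.zeros(delay + 1)
    h[delay] = 1.0
    return np.convolve(signal, h)


def lag_pattern(signal, speaker, center, mics):
    """Lag of the cross-correlation peak between each virtual mic and the physical mic."""
    ref = simulate(signal, delay_samples(speaker, center[None, :])[0])
    pattern = []
    for d in delay_samples(speaker, mics):
        sim = simulate(signal, d)
        n = max(len(ref), len(sim))
        a = np.pad(sim, (0, n - len(sim)))
        b = np.pad(ref, (0, n - len(ref)))
        xc = np.correlate(a, b, mode="full")
        pattern.append(int(np.argmax(xc)) - (n - 1))  # index n-1 is zero lag
    return np.array(pattern)


def identify(observed, candidates):
    """Pick the candidate whose expected pattern is closest (L1 distance)."""
    return min(candidates, key=lambda name: np.abs(candidates[name] - observed).sum())


# Hypothetical geometry: physical mic at the origin, two seated speakers.
center = np.array([0.0, 0.0])
mics = virtual_mic_positions(center, radius=0.5, count=8)
speakers = {"A": np.array([1.0, 0.0]), "B": np.array([0.0, 1.2])}

rng = np.random.default_rng(0)
noise = rng.standard_normal(2048)  # stand-in for a speech segment

patterns = {name: lag_pattern(noise, pos, center, mics) for name, pos in speakers.items()}
```

Because each speaker sits at a different position relative to the ring, the vector of peak lags differs between speakers, and `identify(patterns["A"], patterns)` selects the matching candidate. With real estimated room impulse responses, the patterns would presumably be richer than a single peak lag per microphone.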


research
06/22/2018

Weakly Supervised Training of Speaker Identification Models

We propose an approach for training speaker identification models in a w...
research
05/03/2019

Meeting Transcription Using Virtual Microphone Arrays

We describe a system that generates speaker-annotated transcripts of mee...
research
12/17/2020

Continuous Speech Separation Using Speaker Inventory for Long Multi-talker Recording

Leveraging additional speaker information to facilitate speech separatio...
research
09/18/2019

RTTD-ID: Tracked Captions with Multiple Speakers for Deaf Students

Students who are deaf and hard of hearing cannot hear in class and do no...
research
03/22/2018

Speaker Clustering With Neural Networks And Audio Processing

Speaker clustering is the task of differentiating speakers in a recordin...
research
05/18/2020

A Thousand Words are Worth More Than One Recording: NLP Based Speaker Change Point Detection

Speaker Diarization (SD) consists of splitting or segmenting an input au...
research
03/06/2020

Lightweight Speaker Verification for Online Identification of New Speakers with Short Segments

Verifying if two audio segments belong to the same speaker has been rece...
