Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information

11/28/2021
by Zhihao Du, et al.

Overlapping speech diarization is usually treated as a multi-label classification problem. In this paper, we reformulate the task as a single-label prediction problem by encoding the multi-speaker labels with the power set. Specifically, we propose the speaker embedding-aware neural diarization (SEND) method, which predicts the power-set-encoded labels according to the similarities between speech features and given speaker embeddings. We further extend the method and integrate it with downstream tasks by utilizing textual information, which has not been well studied in previous literature. Experimental results show that our method achieves a lower diarization error rate than target-speaker voice activity detection. When textual information is involved, the diarization errors can be further reduced. For the real meeting scenario, our method achieves a 34.11% relative improvement compared with the Bayesian hidden Markov model based clustering algorithm.
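
The power set encoding mentioned in the abstract can be illustrated with a minimal sketch. It assumes a simple bitmask mapping in which each subset of active speakers becomes one class index; the function names `powerset_encode` and `powerset_decode` are hypothetical, and the exact label ordering or any cap on the number of simultaneous speakers used in the paper may differ.

```python
from typing import List


def powerset_encode(active: List[int]) -> int:
    """Map a multi-hot frame label (one 0/1 flag per speaker) to one class index.

    Assumption: speaker k contributes bit k, so N speakers give 2**N classes.
    """
    index = 0
    for speaker, flag in enumerate(active):
        if flag:
            index |= 1 << speaker
    return index


def powerset_decode(index: int, num_speakers: int) -> List[int]:
    """Recover the multi-hot speaker activity from a class index."""
    return [(index >> speaker) & 1 for speaker in range(num_speakers)]


# Four frames for three speakers: silence, speaker 0 alone,
# speakers 0 and 1 overlapping, speakers 1 and 2 overlapping.
frames = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 1]]
encoded = [powerset_encode(frame) for frame in frames]
decoded = [powerset_decode(index, 3) for index in encoded]
print(encoded)  # [0, 1, 3, 6]
```

With this mapping, each frame carries exactly one target class, so a model can be trained with an ordinary single-label objective instead of one binary decision per speaker.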

research
03/18/2022

Speaker Embedding-aware Neural Diarization: an Efficient Framework for Overlapping Speech Diarization in Meeting Scenarios

Overlapping speech diarization has been traditionally treated as a multi...
research
11/18/2022

Speaker Overlap-aware Neural Diarization for Multi-party Meeting Analysis

Recently, hybrid systems of clustering and neural diarization models hav...
research
05/14/2020

Target-Speaker Voice Activity Detection: a Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario

Speaker diarization for real-life scenarios is an extremely challenging ...
research
03/08/2023

TOLD: A Novel Two-Stage Overlap-Aware Framework for Speaker Diarization

Recently, end-to-end neural diarization (EEND) is introduced and achieve...
research
02/24/2020

End-to-End Neural Diarization: Reformulating Speaker Diarization as Simple Multi-label Classification

The most common approach to speaker diarization is clustering of speaker...
research
11/03/2020

DOVER-Lap: A Method for Combining Overlap-aware Diarization Outputs

Several advances have been made recently towards handling overlapping sp...
research
10/28/2022

Dysfluencies Seldom Come Alone – Detection as a Multi-Label Problem

Specially adapted speech recognition models are necessary to handle stut...
