Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation

by Luyao Cheng et al.

Speaker diarization has gained considerable attention within the speech processing research community. Mainstream speaker diarization systems rely primarily on speakers' voice characteristics extracted from acoustic signals and often overlook the potential of semantic information. Given that speech signals efficiently convey the content of an utterance, we aim to fully exploit these semantic cues using language models. In this work, we propose a novel approach that effectively leverages semantic information in clustering-based speaker diarization systems. First, we introduce spoken language understanding modules to extract speaker-related semantic information and use it to construct pairwise constraints. Second, we present a novel framework that integrates these constraints into the speaker diarization pipeline, enhancing the performance of the entire system. Extensive experiments conducted on a public dataset demonstrate the consistent superiority of our proposed approach over acoustic-only speaker diarization systems.
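To make the idea of pairwise constraints in clustering-based diarization concrete, here is a minimal sketch of the simplest form of constraint integration: clamping the affinity matrix between speaker-segment embeddings with must-link and cannot-link pairs before spectral clustering. This is an illustrative simplification, not the paper's actual propagation framework; the constraint pairs, the toy embeddings, and the clamp-to-0/1 rule are all assumptions introduced for the example.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def constrained_affinity(embeddings, must_link, cannot_link):
    """Cosine-similarity affinity between segment embeddings,
    clamped by pairwise constraints (a simplified stand-in for
    constraint propagation)."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    A = (unit @ unit.T + 1.0) / 2.0      # map cosine [-1, 1] -> [0, 1]
    for i, j in must_link:               # semantic cue: same speaker
        A[i, j] = A[j, i] = 1.0
    for i, j in cannot_link:             # semantic cue: different speakers
        A[i, j] = A[j, i] = 0.0
    return A

# Toy data: 6 segments from two well-separated "speakers".
rng = np.random.default_rng(0)
spk_a = rng.normal(0.0, 0.05, size=(3, 8)) + 1.0
spk_b = rng.normal(0.0, 0.05, size=(3, 8)) - 1.0
X = np.vstack([spk_a, spk_b])

# Hypothetical constraints derived from semantic content.
A = constrained_affinity(X, must_link=[(0, 1)], cannot_link=[(0, 3)])
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(A)
```

In a real pipeline the must-link/cannot-link pairs would come from the spoken language understanding modules (e.g. detected speaker turns or addressee mentions), and the constraints would be propagated smoothly across the affinity graph rather than hard-clamped as above.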



