Attention Mechanism in Speaker Recognition: What Does It Learn in Deep Speaker Embedding?

09/25/2018
by Qiongqiong Wang, et al.

This paper presents an experimental study on deep speaker embedding with an attention mechanism, which has been found to be a powerful representation learning technique in speaker recognition. In this framework, an attention model works as a frame selector: it computes an attention weight for each frame-level feature vector, and the pooling layer of the speaker embedding network uses those weights to produce an utterance-level representation. In general, an attention model is trained together with the speaker embedding network on a single objective function, so the two components are tightly bound to one another. In this paper, we consider the possibility that the attention model might be decoupled from its parent network and assist other speaker embedding networks and even conventional i-vector extractors. This possibility is demonstrated through a series of experiments on a NIST Speaker Recognition Evaluation (SRE) task, with 9.0 min_Cprimary reduction when the attention weights are applied to i-vector extraction. Another experiment shows that DNN-based soft voice activity detection (VAD) can be effectively combined with the attention mechanism to yield further reduction of min_Cprimary by 6.6 embedding and i-vector systems, respectively.
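As a rough illustration of the frame-selector idea described in the abstract, the following minimal NumPy sketch computes one attention weight per frame and uses the weights to pool frame-level features into a single utterance-level vector. The abstract does not specify the form of the attention model, so the single tanh hidden layer followed by a softmax, the parameter names `W`, `b`, `v`, and the random toy data are all assumptions made for illustration, not the authors' configuration.

```python
import numpy as np


def attention_weights(frames, W, b, v):
    """Return one weight per frame; the weights sum to 1.

    frames: (T, D) frame-level feature vectors.
    W, b, v: parameters of an assumed one-hidden-layer scoring network.
    """
    scores = np.tanh(frames @ W + b) @ v   # (T,) one scalar score per frame
    scores = scores - scores.max()         # shift for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()   # softmax over the time axis


def weighted_mean_pooling(frames, weights):
    """Pool (T, D) frames into a (D,) utterance-level vector."""
    return weights @ frames


# Toy data: T frames of dimension D, hidden size H (all hypothetical).
rng = np.random.default_rng(0)
T, D, H = 200, 16, 8
frames = rng.standard_normal((T, D))
W = 0.1 * rng.standard_normal((D, H))
b = np.zeros(H)
v = rng.standard_normal(H)

alpha = attention_weights(frames, W, b, v)
utterance_vector = weighted_mean_pooling(frames, alpha)
```

Because `alpha` is computed separately from the pooling step, the same weights could in principle be handed to a different pooling or statistics-accumulation routine, which is the kind of decoupling the abstract investigates.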

research
03/29/2018

Attentive Statistics Pooling for Deep Speaker Embedding

This paper proposes attentive statistics pooling for deep speaker embedd...
research
08/12/2021

Xi-Vector Embedding for Speaker Recognition

We present a Bayesian formulation for deep speaker embedding, wherein th...
research
06/27/2022

Mushroom image recognition and distance generation based on attention-mechanism model and genetic information

The species identification of Macrofungi, i.e. mushrooms, has always bee...
research
09/24/2019

Improving Robustness In Speaker Identification Using A Two-Stage Attention Model

In this paper a novel framework to tackle speaker recognition using a tw...
research
08/11/2020

Compact Speaker Embedding: lrx-vector

Deep neural networks (DNN) have recently been widely used in speaker rec...
research
11/10/2020

Supervised attention for speaker recognition

The recently proposed self-attentive pooling (SAP) has shown good perfor...
research
05/25/2023

Ordered and Binary Speaker Embedding

Modern speaker recognition systems represent utterances by embedding vec...
