Multi-Frequency Information Enhanced Channel Attention Module for Speaker Representation Learning

07/10/2022
by Mufan Sang, et al.

Recently, attention mechanisms have been applied successfully in neural network-based speaker verification systems. Incorporating the Squeeze-and-Excitation (SE) block into convolutional neural networks has achieved remarkable performance. However, the SE block uses global average pooling (GAP) to simply average the features along the time and frequency dimensions, which cannot preserve sufficient speaker information in the feature maps. In this study, we show mathematically that GAP is a special case of the discrete cosine transform (DCT) on the time-frequency domain that uses only the lowest frequency component of the frequency decomposition. To strengthen speaker information extraction, we propose to utilize multi-frequency information and design two novel and effective attention modules: the Single-Frequency Single-Channel (SFSC) attention module and the Multi-Frequency Single-Channel (MFSC) attention module. Built on the DCT, the proposed attention modules can effectively capture more speaker information from multiple frequency components. We conduct comprehensive experiments on the VoxCeleb datasets and a probe evaluation on the 1st 48-UTD forensic corpus. Experimental results demonstrate that our proposed SFSC and MFSC attention modules can efficiently generate more discriminative speaker representations and outperform ResNet34-SE and ECAPA-TDNN systems with relative 20.9 [...] adding extra network parameters.
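The claim that GAP is the lowest-frequency DCT component can be checked numerically. The following is a minimal NumPy sketch, not the paper's implementation: it assumes an unnormalized 2D DCT-II basis over the frequency and time axes of a feature map, and the helper names `dct2_basis` and `dct_pool` are hypothetical. With frequency indices (0, 0) the basis is constant, so the pooled value equals GAP up to a scale factor; other indices yield additional components that GAP discards.

```python
import numpy as np


def dct2_basis(u, v, n_freq, n_time):
    """Unnormalized 2D DCT-II basis function for frequency indices (u, v).

    Hypothetical helper for illustration; returns an (n_freq, n_time) array.
    """
    f = np.arange(n_freq)
    t = np.arange(n_time)
    basis_f = np.cos(np.pi * u * (f + 0.5) / n_freq)
    basis_t = np.cos(np.pi * v * (t + 0.5) / n_time)
    return np.outer(basis_f, basis_t)


def dct_pool(x, u, v):
    """Project each channel of x onto one DCT basis function.

    x has shape (channels, n_freq, n_time); the result has shape (channels,).
    """
    _, n_freq, n_time = x.shape
    return np.einsum("cft,ft->c", x, dct2_basis(u, v, n_freq, n_time))


# Random stand-in for a feature map: 4 channels, 8 frequency bins, 10 frames.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 10))

# Global average pooling over the frequency and time axes.
gap = x.mean(axis=(1, 2))

# The (0, 0) basis is all ones, so the projection is the plain sum,
# i.e. GAP multiplied by the number of time-frequency bins.
lowest = dct_pool(x, 0, 0)
assert np.allclose(lowest, gap * 8 * 10)

# A higher-frequency component carries information that GAP does not.
higher = dct_pool(x, 1, 0)
```

A multi-frequency descriptor, under the same assumptions, would stack `dct_pool` outputs for several `(u, v)` pairs before the attention weights are computed; how the paper selects and combines those components is not specified in this abstract.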


