ResNeXt and Res2Net Structure for Speaker Verification

07/06/2020
by   Tianyan Zhou, et al.

ResNet-based architectures have been widely adopted as the speaker embedding extractor in speaker verification systems. Their standard topology and modularized design reduce the human effort spent on hyperparameter tuning, which leaves width and depth as the two major dimensions for further improving ResNet's representation power. However, simply increasing width or depth is not efficient. In this paper, we investigate the effectiveness of two newer structures, ResNeXt and Res2Net, for the speaker verification task. They introduce two additional dimensions for improving a model's representation capacity, called cardinality and scale, respectively. Experimental results on VoxCeleb data demonstrate that increasing these two dimensions is more efficient than going deeper or wider. Experiments on two internal test sets with mismatched acoustic conditions also show that the ResNeXt and Res2Net architectures generalize well. In particular, with the Res2Net structure, our best model achieved state-of-the-art performance on the VoxCeleb1 test set by reducing the EER by 18.5% relative. In addition, our system's modeling power for short utterances has been largely improved as a result of the Res2Net module's multi-scale feature representation ability.
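
As a rough illustration of why cardinality and scale can be cheaper than extra width, the sketch below counts the weights of a single 3x3 convolution stage under three layouts: a plain convolution, a grouped convolution (cardinality, as in ResNeXt), and a hierarchical channel split (scale, as in Res2Net). The channel count of 64, the cardinality of 8, and the scale of 4 are illustrative assumptions, not configurations taken from the paper, and the helper names are hypothetical.

```python
def conv_params(c_in, c_out, kernel=3, groups=1):
    """Weight count of a 2-D convolution (bias ignored)."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each group maps c_in/groups input channels to c_out/groups outputs.
    return kernel * kernel * (c_in // groups) * (c_out // groups) * groups


def res2net_stage_params(channels, scale, kernel=3):
    """Weight count of the 3x3 stage in a Res2Net-style block.

    The channels are split into `scale` equal subsets. The first subset is
    passed through unchanged; each remaining subset has its own convolution.
    """
    if channels % scale:
        raise ValueError("channels must be divisible by scale")
    width = channels // scale
    return (scale - 1) * conv_params(width, width, kernel)


# Assumed sizes for illustration only.
channels = 64
plain = conv_params(channels, channels)              # standard 3x3 conv
grouped = conv_params(channels, channels, groups=8)  # cardinality = 8
multi_scale = res2net_stage_params(channels, 4)      # scale = 4

print(plain, grouped, multi_scale)
```

At the same channel count, the grouped and multi-scale stages need far fewer weights than the plain one, so a block can spend that budget on more groups or more scales rather than on extra depth or width.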

research
11/16/2016

Aggregated Residual Transformations for Deep Neural Networks

We present a simple, highly modularized network architecture for image c...
research
04/07/2020

Multi-Scale Aggregation Using Feature Pyramid Module for Text-Independent Speaker Verification

Currently, the most widely used approach for speaker verification is the...
research
08/08/2020

NPU Speaker Verification System for INTERSPEECH 2020 Far-Field Speaker Verification Challenge

This paper describes the NPU system submitted to Interspeech 2020 Far-Fi...
research
04/07/2020

Improving Multi-Scale Aggregation Using Feature Pyramid Module for Robust Speaker Verification of Variable-Duration Utterances

Currently, the most widely used approach for speaker verification is the...
research
06/28/2023

MC-SpEx: Towards Effective Speaker Extraction with Multi-Scale Interfusion and Conditional Speaker Modulation

The previous SpEx+ has yielded outstanding performance in speaker extrac...
research
12/18/2019

ResNetX: a more disordered and deeper network architecture

Designing efficient network structures has always been the core content ...
research
10/09/2021

Towards Lightweight Applications: Asymmetric Enroll-Verify Structure for Speaker Verification

With the development of deep learning, automatic speaker verification ha...
