Utterance-level Aggregation For Speaker Recognition In The Wild

02/26/2019
by Weidi Xie, et al.

The objective of this paper is speaker recognition "in the wild", where utterances may be of variable length and may also contain irrelevant signals. Crucial elements in the design of deep networks for this task are the type of trunk (frame-level) network and the method of temporal aggregation. We propose a powerful speaker recognition deep network that uses a "thin-ResNet" trunk architecture and a dictionary-based NetVLAD or GhostVLAD layer to aggregate features across time, and that can be trained end-to-end. We show that our network achieves state-of-the-art performance by a significant margin on the VoxCeleb1 test set for speaker recognition, whilst requiring fewer parameters than previous methods. We also investigate the effect of utterance length on performance, and conclude that for "in the wild" data, a longer length is beneficial.
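To make the temporal aggregation step concrete, here is a minimal NumPy sketch of NetVLAD-style pooling with optional "ghost" clusters. It follows the commonly described formulation (soft assignment of each frame to a cluster, summed residuals, ghost clusters discarded, then normalization) and is not the authors' implementation. The function name `ghostvlad_pool`, its parameter names, and all dimensions are assumptions chosen for illustration; in a trained network the centers and assignment weights would be learned, whereas here they are random.

```python
import numpy as np

def ghostvlad_pool(frames, centers, assign_w, assign_b, num_ghost=0, eps=1e-12):
    """Aggregate (T, D) frame-level features into one fixed-length vector.

    centers:  (K + G, D) cluster centers (hypothetical, normally learned)
    assign_w: (D, K + G) soft-assignment weights
    assign_b: (K + G,)   soft-assignment bias
    num_ghost: G, the number of ghost clusters dropped after aggregation
    """
    # Soft-assign every frame to every cluster with a stable softmax.
    logits = frames @ assign_w + assign_b                 # (T, K + G)
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Weighted residual sum per cluster: sum_t a_tk * (x_t - c_k).
    vlad = weights.T @ frames - weights.sum(axis=0)[:, None] * centers

    # Ghost clusters absorb some assignments but are excluded from the output.
    if num_ghost > 0:
        vlad = vlad[:-num_ghost]

    # Intra-normalize each cluster residual, flatten, then L2-normalize.
    vlad = vlad / (np.linalg.norm(vlad, axis=1, keepdims=True) + eps)
    flat = vlad.reshape(-1)
    return flat / (np.linalg.norm(flat) + eps)

# Illustrative sizes only (assumptions, not values from the paper).
rng = np.random.default_rng(0)
D, K, G = 8, 4, 2
centers = rng.normal(size=(K + G, D))
assign_w = rng.normal(size=(D, K + G))
assign_b = np.zeros(K + G)

short_utt = rng.normal(size=(50, D))    # 50 frames
long_utt = rng.normal(size=(300, D))    # 300 frames
short_vec = ghostvlad_pool(short_utt, centers, assign_w, assign_b, num_ghost=G)
long_vec = ghostvlad_pool(long_utt, centers, assign_w, assign_b, num_ghost=G)
print(short_vec.shape, long_vec.shape)
```

Both utterances map to a vector of length K × D regardless of the number of frames, which is what lets a layer of this kind handle variable-length input.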


