A Novel Minimum Divergence Approach to Robust Speaker Identification

by Ayanendranath Basu, et al.

In this work, a novel solution to the speaker identification problem is proposed through minimization of statistical divergences between the probability distribution (g) of feature vectors from the test utterance and the probability distributions of the feature vectors corresponding to the speaker classes. The approach is made more robust to outliers through the use of suitably modified versions of the standard divergence measures. The resulting solutions are referred to as the minimum rescaled modified distance estimators (MRMDEs). Three measures are considered: the likelihood disparity, the Hellinger distance, and Pearson's chi-square distance. The proposed approach is motivated by the observation that, in the case of the likelihood disparity, when the empirical distribution function is used to estimate g, the method becomes equivalent to maximum likelihood classification with Gaussian Mixture Models (GMMs) for the speaker classes, a highly effective approach used, for example, by Reynolds [22] with Mel Frequency Cepstral Coefficients (MFCCs) as features. Significant improvement in classification accuracy is observed under this approach on the benchmark speech corpus NTIMIT and on a new bilingual speech corpus, NISIS, with MFCC features, both in isolation and in combination with delta MFCC features. Moreover, the ubiquitous principal component transformation, by itself and in conjunction with the principle of classifier combination, is found to further enhance the performance.
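The baseline that motivates the approach, maximum likelihood classification with one GMM per speaker, can be sketched as follows. This is only a minimal illustration, not the authors' implementation: it assumes diagonal covariances, uses made-up toy models and frames, and the names `gmm_log_density`, `identify_speaker`, and `speaker_models` are hypothetical. It does not implement the robust MRMDE modifications. With the empirical distribution standing in for g, picking the speaker with the highest average frame log-likelihood is the same as picking the one with the smallest likelihood disparity, because the two differ only by a term that does not depend on the speaker.

```python
import math


def gmm_log_density(x, weights, means, variances):
    """Log density of a diagonal-covariance GMM at feature vector x."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, m, v in zip(x, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        component_logs.append(log_p)
    # Log-sum-exp over components for numerical stability.
    top = max(component_logs)
    return top + math.log(sum(math.exp(c - top) for c in component_logs))


def identify_speaker(frames, speaker_models):
    """Return the speaker whose GMM maximizes the average frame log-likelihood."""
    scores = {
        name: sum(gmm_log_density(x, *model) for x in frames) / len(frames)
        for name, model in speaker_models.items()
    }
    return max(scores, key=scores.get), scores


# Toy, made-up models: (weights, means, variances) per speaker.
speaker_models = {
    "speaker_a": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "speaker_b": ([0.5, 0.5], [[5.0, 5.0], [6.0, 6.0]], [[1.0, 1.0], [1.0, 1.0]]),
}
# Toy test utterance: three 2-dimensional feature vectors (stand-ins for MFCCs).
test_frames = [[0.2, -0.1], [0.9, 1.2], [0.4, 0.5]]
best, scores = identify_speaker(test_frames, speaker_models)
```

In practice the per-speaker GMM parameters would be fitted on training utterances, and the frames would be MFCC vectors extracted from the test utterance.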




Pitch-synchronous DCT features: A pilot study on speaker identification

We propose a new feature, namely, pitchsynchronous discrete cosine trans...

Speaker Sincerity Detection based on Covariance Feature Vectors and Ensemble Methods

Automatic measuring of speaker sincerity degree is a novel research prob...

Histogram Transform-based Speaker Identification

A novel text-independent speaker identification (SI) method is proposed....

Minimum divergence estimators, Maximum Likelihood and the generalized bootstrap

This paper is an attempt to set a justification for making use of some d...

Robust Speaker Clustering using Mixtures of von Mises-Fisher Distributions for Naturalistic Audio Streams

Speaker Diarization (i.e. determining who spoke and when?) for multi-spe...

I-vector Based Features Embedding for Heart Sound Classification

Cardiovascular disease (CVD) is considered as one of the main causes of ...
