Estimating Uniqueness of Human Voice Using I-Vector Representation
We study the individuality of human voice with respect to a widely used feature representation of speech utterances, namely, the i-vector model. As a first step toward this goal, we compare and contrast uniqueness measures proposed for different biometric modalities. Then, we introduce a more appropriate uniqueness measure that evaluates the entropy of i-vectors while taking into account speaker-level variations. Estimates are obtained on two newly generated datasets designed to capture variabilities between and within speakers. The first dataset contains speech samples of more than 20 thousand speakers obtained from TEDx Talks videos. The second one includes samples of more than one and a half thousand actors that are extracted from movie dialogues. Using this data, we analyzed how several factors, such as the number of speakers, the number of samples per speaker, and different levels of within-speaker variation, affect the estimates. Most notably, we determined that the discretization of i-vector elements does not necessarily cause a reduction in speaker recognition performance. Our results show that the degree of uniqueness offered by the i-vector based representation may reach 43-52 bits in a confined setting; however, under less constrained variations the estimates reduce significantly to the 13-20 bit level, depending on the coarseness of quantization.
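The abstract does not specify how the entropy of discretized i-vectors is computed; a minimal illustrative sketch is given below, under two simplifying assumptions that the paper's actual measure need not share: uniform scalar quantization of each dimension, and independence across dimensions (so per-dimension entropies add up). The function name and parameters are hypothetical.

```python
import numpy as np

def quantized_entropy_bits(ivectors, n_levels=4):
    """Rough estimate of total entropy (in bits) of i-vectors after
    uniform scalar quantization, assuming independent dimensions.

    ivectors : (n_samples, dim) array of i-vectors.
    n_levels : number of quantization levels per dimension,
               i.e. the coarseness of the discretization.
    """
    total_bits = 0.0
    for col in ivectors.T:
        # Uniform bins over the observed range of this dimension;
        # inner edges map each value to a code in 0..n_levels-1.
        edges = np.linspace(col.min(), col.max(), n_levels + 1)
        codes = np.digitize(col, edges[1:-1])
        # Empirical (plug-in) entropy of the quantized symbols.
        p = np.bincount(codes, minlength=n_levels) / len(codes)
        p = p[p > 0]
        total_bits += -(p * np.log2(p)).sum()
    return total_bits

# Toy example with synthetic 400-dimensional "i-vectors"
# (real i-vectors are typically 400-600 dimensional).
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 400))
coarse = quantized_entropy_bits(X, n_levels=2)
fine = quantized_entropy_bits(X, n_levels=8)
print(f"2 levels: {coarse:.0f} bits, 8 levels: {fine:.0f} bits")
```

Coarser quantization yields fewer bits per dimension, which is consistent with the abstract's observation that the uniqueness estimate depends on the coarseness of quantization; the speaker-level within/between variation modeling described in the paper is not captured by this toy sketch.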