End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors

05/20/2020
by Shota Horiguchi, et al.

This paper addresses end-to-end speaker diarization for an unknown number of speakers. Recently proposed end-to-end speaker diarization has outperformed conventional clustering-based speaker diarization, but it has one drawback: it is less flexible in terms of the number of speakers. This paper proposes a method for encoder-decoder based attractor calculation (EDA), which first generates a flexible number of attractors from a speech embedding sequence. The generated attractors are then multiplied by the speech embedding sequence to produce the same number of speaker activities. The speech embedding sequence is extracted using the conventional self-attentive end-to-end neural speaker diarization (SA-EEND) network. In a two-speaker condition, our method achieved a 2.69 [...] 4.56 [...] method attained a 15.29 [...] method achieved a 19.43 [...]
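
The attractor step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the encoder-decoder that generates the attractors is omitted, and the function names, the per-attractor existence probabilities, the 0.5 threshold, and the sigmoid applied to the products are all assumptions made for illustration. It only shows how a variable number of attractors, multiplied by the frame embeddings, yields one activity track per attractor.

```python
import math
import random


def sigmoid(x):
    """Map a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def select_attractors(candidates, exist_probs, threshold=0.5):
    """Keep leading candidates until one falls below the threshold.

    Assumption: each candidate attractor comes with an existence
    probability, and generation stops at the first unlikely one.
    """
    kept = []
    for attractor, prob in zip(candidates, exist_probs):
        if prob < threshold:
            break
        kept.append(attractor)
    return kept


def speaker_activities(embeddings, attractors):
    """Dot each frame embedding with each attractor.

    embeddings: T frames, each a list of D floats
    attractors: S attractors, each a list of D floats
    returns:    T x S activity scores in (0, 1) (sigmoid is an assumption)
    """
    return [
        [sigmoid(sum(e * a for e, a in zip(frame, attractor)))
         for attractor in attractors]
        for frame in embeddings
    ]


# Toy data standing in for SA-EEND embeddings and decoder outputs.
random.seed(0)
T, D = 6, 4
embeddings = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
candidates = [[random.gauss(0, 1) for _ in range(D)] for _ in range(3)]
exist_probs = [0.9, 0.8, 0.1]  # hypothetical values; third candidate is rejected

attractors = select_attractors(candidates, exist_probs)
activities = speaker_activities(embeddings, attractors)
print(len(activities), len(activities[0]))  # 6 frames, 2 speakers
```

Because the number of output columns follows the number of attractors kept, the same code handles two, three, or more speakers without changing the shape of any fixed output layer.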
