Latent variable approach to diarization of audio recordings using ad-hoc randomly placed mobile devices
This paper considers diarization of audio recordings from ad-hoc mobile devices using spatial information. Each mobile device is assumed to provide a two-channel synchronous recording, from which directional statistics are computed separately at each device, frame by frame. The recordings across the mobile devices are asynchronous, but a coarse synchronization is performed by aligning the signals using acoustic events or a real-time clock. The directional statistics computed for all the devices are then modeled jointly using a Dirichlet mixture model, and the posterior probability over the mixture components is used to derive the diarization information. Experiments on real-life recordings using mobile phones show a diarization error rate of less than 14
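As a rough illustration of the final step described above, the sketch below computes the posterior probability over the components of a Dirichlet mixture for each frame and labels the frame with the most probable component. It is a minimal sketch, not the paper's implementation: it assumes each frame's directional statistics have already been normalized into a strictly positive probability vector over direction bins, and that the mixture weights and Dirichlet parameters are already known (the abstract does not say how they are estimated). The names `dirichlet_log_pdf`, `component_posteriors`, and `diarize`, and all numeric values, are hypothetical.

```python
import math


def dirichlet_log_pdf(x, alpha):
    """Log density of Dirichlet(alpha) at x, a strictly positive probability vector."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))


def component_posteriors(x, weights, alphas):
    """Posterior probability of each mixture component given one frame x."""
    log_joint = [
        math.log(w) + dirichlet_log_pdf(x, a) for w, a in zip(weights, alphas)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_joint)
    unnormalized = [math.exp(v - peak) for v in log_joint]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def diarize(frames, weights, alphas):
    """Label each frame with the index of its most probable component."""
    labels = []
    for x in frames:
        post = component_posteriors(x, weights, alphas)
        labels.append(max(range(len(post)), key=post.__getitem__))
    return labels


# Hypothetical example: two components (speakers), three direction bins.
weights = [0.5, 0.5]
alphas = [[8.0, 1.0, 1.0], [1.0, 1.0, 8.0]]
frames = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8], [0.7, 0.2, 0.1]]
labels = diarize(frames, weights, alphas)
```

Here each mixture component stands in for one speaker, so the per-frame labels form a simple speaker-activity sequence. In practice the mixture parameters would have to be fitted to the data, for example with an expectation-maximization procedure, which this sketch leaves out.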