Tackling real noisy reverberant meetings with all-neural source separation, counting, and diarization system

by Keisuke Kinoshita, et al.

Automatic meeting analysis is a fundamental technology required to let, e.g., smart devices follow and respond to our conversations. To achieve optimal automatic meeting analysis, we previously proposed an all-neural approach that jointly solves the source separation, speaker diarization, and source counting problems, in the sense that all three tasks can be jointly optimized through error back-propagation. That method was shown to handle simulated clean (noiseless and anechoic) dialog-like data well, and it achieved very good performance in comparison with several conventional methods. However, it was not clear whether such an all-neural approach would generalize to more complicated real meeting data containing more spontaneously speaking speakers, severe noise, and reverberation, or how it would perform in comparison with state-of-the-art systems in such scenarios. In this paper, we first consider the practical issues that must be addressed to improve the robustness of the all-neural approach, and then show experimentally that, even in real meeting scenarios, the all-neural approach can perform effective speech enhancement while simultaneously outperforming state-of-the-art systems.

