Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization

10/07/2022
by   Shota Horiguchi, et al.

Because multi-channel speech processing achieves high performance, the outputs of a multi-channel model can serve as teacher labels when training a single-channel model via knowledge distillation. Conversely, it is also known that single-channel speech data can benefit multi-channel models, either by mixing it into the multi-channel training data or by using it for model pretraining. This paper focuses on speaker diarization and proposes to perform this bi-directional knowledge transfer alternately. We first introduce an end-to-end neural diarization model that can handle both single- and multi-channel inputs. Using this model, we alternately conduct i) knowledge distillation from a multi-channel model to a single-channel model and ii) finetuning from the distilled single-channel model to a multi-channel model. Experimental results on two-speaker data show that the proposed method mutually improves single- and multi-channel speaker diarization performance.
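The alternating procedure in the abstract can be sketched as a loop over the two knowledge-transfer directions. The sketch below is purely illustrative: the scalar "models", the squared-error distillation loss, the learning rate, and the step counts are hypothetical placeholders, not the paper's actual architecture or training setup.

```python
# Minimal sketch of the mutual-learning loop: (i) distill a single-channel
# student from a multi-channel teacher, (ii) finetune a multi-channel model
# initialised from the distilled single-channel one. All numbers here are
# toy assumptions for illustration only.
import random

random.seed(0)

def distill(teacher_w, student_w, data, lr=0.1, steps=200):
    """Knowledge distillation: fit the student to the teacher's outputs.
    Both 'models' are a single scalar weight; loss is squared error."""
    for _ in range(steps):
        x = random.choice(data)
        # gradient of (student_w*x - teacher_w*x)^2 w.r.t. student_w
        grad = 2 * (student_w - teacher_w) * x * x
        student_w -= lr * grad
    return student_w

def finetune(init_w, data, target_w, lr=0.1, steps=200):
    """Finetune a model initialised from the distilled weight `init_w`
    on (multi-channel) data with reference labels from `target_w`."""
    w = init_w
    for _ in range(steps):
        x = random.choice(data)
        grad = 2 * (w - target_w) * x * x
        w -= lr * grad
    return w

# Toy single-/multi-channel 'datasets' (just input scalars here).
single_ch = [random.uniform(0.5, 1.5) for _ in range(32)]
multi_ch = [random.uniform(0.5, 1.5) for _ in range(32)]

multi_w, single_w = 0.2, 0.0   # initial multi-/single-channel models
true_w = 1.0                   # stand-in for reference diarization labels

for _ in range(3):  # alternate the two knowledge-transfer directions
    single_w = distill(multi_w, single_w, single_ch)  # (i) multi -> single
    multi_w = finetune(single_w, multi_ch, true_w)    # (ii) single -> multi
```

Each round, the single-channel model inherits knowledge from the stronger multi-channel teacher, and the multi-channel model in turn starts from the improved single-channel weights, so both directions of transfer reinforce each other.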


