Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation

03/29/2022
by Ryo Fukuda, et al.

Speech segmentation, which splits long speech into short segments, is essential for speech translation (ST). Conventional approaches rely on voice activity detection (VAD) tools such as WebRTC VAD, which segment speech at pauses. Unfortunately, pauses in speech do not necessarily match sentence boundaries, and sentences can be connected by a pause too short for VAD to detect. In this study, we propose a speech segmentation method that uses a binary classification model trained on a segmented bilingual speech corpus. We also propose a hybrid method that combines VAD with this segmentation method. Experimental results showed that the proposed method is more suitable for cascade and end-to-end ST systems than conventional segmentation methods, and that the hybrid approach further improved translation performance.
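The abstract does not say how the hybrid method combines VAD with the classifier, so the following is only a minimal sketch of one plausible combination rule, not the authors' algorithm. It assumes per-frame boundary probabilities from some binary classifier and per-frame speech flags from some VAD. The function name `hybrid_segments` and the parameters `prob_threshold` and `min_pause_frames` are hypothetical and chosen purely for illustration.

```python
from typing import List, Tuple


def hybrid_segments(
    boundary_prob: List[float],
    is_speech: List[bool],
    prob_threshold: float = 0.5,
    min_pause_frames: int = 3,
) -> List[Tuple[int, int]]:
    """Split a frame sequence into (start, end) segments, end exclusive.

    Assumed rule (illustrative only): a frame is a cut point if the
    classifier marks it as a boundary, or if it lies inside a VAD pause
    of at least `min_pause_frames` frames. Shorter pauses stay inside
    the surrounding segment.
    """
    n = len(boundary_prob)
    if n != len(is_speech):
        raise ValueError("boundary_prob and is_speech must have equal length")

    # Cut points proposed by the (hypothetical) boundary classifier.
    cut = [p >= prob_threshold for p in boundary_prob]

    # Cut points proposed by VAD: every frame of a sufficiently long pause.
    i = 0
    while i < n:
        if is_speech[i]:
            i += 1
            continue
        j = i
        while j < n and not is_speech[j]:
            j += 1
        if j - i >= min_pause_frames:
            for k in range(i, j):
                cut[k] = True
        i = j

    # Segments are the maximal runs of frames that are not cut points.
    segments: List[Tuple[int, int]] = []
    start = None
    for idx, is_cut in enumerate(cut):
        if not is_cut and start is None:
            start = idx
        elif is_cut and start is not None:
            segments.append((start, idx))
            start = None
    if start is not None:
        segments.append((start, n))
    return segments
```

Under this assumed rule, a one-frame pause between two sentences is ignored by the VAD branch but can still be split if the classifier assigns that frame a high boundary probability, which is the failure case of pause-based segmentation described above. Setting `prob_threshold` above 1.0 disables the classifier branch and reduces the sketch to pause-only segmentation.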


research
02/09/2022

SHAS: Approaching optimal Segmentation for End-to-End Speech Translation

Speech translation models are unable to directly process long audios, li...
research
02/24/2022

Speech segmentation using multilevel hybrid filters

A novel approach for speech segmentation is proposed, based on Multileve...
research
05/25/2023

End-to-End Simultaneous Speech Translation with Differentiable Segmentation

End-to-end simultaneous speech translation (SimulST) outputs translation...
research
05/04/2021

Speech Decomposition Based on a Hybrid Speech Model and Optimal Segmentation

In a hybrid speech model, both voiced and unvoiced components can coexis...
research
08/05/2020

Contextualized Translation of Automatically Segmented Speech

Direct speech-to-text translation (ST) models are usually trained on cor...
research
04/23/2021

Beyond Voice Activity Detection: Hybrid Audio Segmentation for Direct Speech Translation

The audio segmentation mismatch between training data and those seen at ...
research
05/27/2023

How Good is Automatic Segmentation as a Multimodal Discourse Annotation Aid?

Collaborative problem solving (CPS) in teams is tightly coupled with the...
