BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric

12/16/2022
by   Mingda Chen, et al.

End-to-end speech-to-speech translation (S2ST) is generally evaluated with text-based metrics. This means that generated speech has to be automatically transcribed, making the evaluation dependent on the availability and quality of automatic speech recognition (ASR) systems. In this paper, we propose a text-free evaluation metric for end-to-end S2ST, named BLASER, to avoid the dependency on ASR systems. BLASER leverages a multilingual multimodal encoder to directly encode the speech segments for the source input, translation output, and reference into a shared embedding space, and computes a score of the translation quality that can be used as a proxy for human evaluation. To evaluate our approach, we construct training and evaluation sets from more than 40k human annotations covering seven language directions. The best results of BLASER are achieved by training with supervision from human rating scores. We show that when evaluated at the sentence level, BLASER correlates significantly better with human judgment than ASR-dependent metrics, including ASR-SENTBLEU in all translation directions and ASR-COMET in five of them. Our analysis shows that combining speech and text as inputs to BLASER does not increase the correlation with human scores; rather, the best correlations are achieved when using speech alone, which motivates the goal of our research. Moreover, we show that using ASR for references is detrimental to text-based metrics.


