Toward Zero Oracle Word Error Rate on the Switchboard Benchmark

06/13/2022
by Arlo Faria, et al.

The "Switchboard benchmark" is a well-known test set in automatic speech recognition (ASR) research, establishing record-setting performance for systems that claim human-level transcription accuracy. This work highlights lesser-known practical considerations of this evaluation, demonstrating major improvements in word error rate (WER) by correcting the reference transcriptions and deviating from the official scoring methodology. In this more detailed and reproducible scheme, even commercial ASR systems can score below 5% [...] 2.3% [...] penalize deletions and appears to be more discriminating for human vs. machine performance. While commercial ASR systems are still below this threshold, a research system is shown to clearly surpass the accuracy of commercial human speech recognition. This work also explores using standardized scoring tools to compute oracle WER by selecting the best among a list of alternatives. A phrase alternatives representation is compared to utterance-level N-best lists and word-level data structures; using dense lattices and adding out-of-vocabulary words, this achieves an oracle WER of 0.18%.
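To make the notion of oracle WER concrete, here is a minimal sketch for the simplest case mentioned in the abstract, an utterance-level N-best list. It is not the standardized scoring tooling the paper refers to: it assumes plain whitespace tokenization, no text normalization, and a simple word-level Levenshtein alignment. The function names (`edit_distance`, `wer`, `oracle_wer`) and the toy transcripts are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance: substitutions + insertions + deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion (reference word missing)
                            curr[j - 1] + 1,      # insertion (extra hypothesis word)
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(refs: List[str], hyps: List[str]) -> float:
    """Corpus-level WER: total errors divided by total reference words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return errors / words


def oracle_wer(refs: List[str], nbest_lists: List[List[str]]) -> float:
    """Oracle WER: for each utterance, score only the alternative closest to the reference."""
    errors = sum(
        min(edit_distance(r.split(), h.split()) for h in nbest)
        for r, nbest in zip(refs, nbest_lists)
    )
    words = sum(len(r.split()) for r in refs)
    return errors / words


# Toy data (invented for illustration, not from Switchboard).
refs = ["yeah i think so", "that is right"]
nbest = [
    ["yeah i think", "yeah i think so"],
    ["that's right", "that is right now"],
]

one_best = wer(refs, [alts[0] for alts in nbest])
oracle = oracle_wer(refs, nbest)
print(f"1-best WER: {one_best:.3f}, oracle WER: {oracle:.3f}")
```

In this toy case the oracle picks the second alternative for both utterances, so the oracle WER is lower than the 1-best WER. Richer representations such as lattices or phrase alternatives encode many more candidate paths than a short N-best list, which is why they can drive the oracle error further down.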


