The CAPIO 2017 Conversational Speech Recognition System

12/29/2017
by Kyu J. Han, et al.

In this paper we show how we achieved state-of-the-art performance on the industry-standard NIST 2000 Hub5 English evaluation set. We explore densely connected LSTMs, inspired by the densely connected convolutional networks recently introduced for image classification tasks. We also propose an acoustic model adaptation scheme that simply averages the parameters of a seed neural network acoustic model and its adapted version. This method was applied with the CallHome training corpus and improved individual system performance by 6.1% (relative) on average on the CallHome portion of the evaluation set, with no performance loss on the Switchboard portion. With RNN-LM rescoring and lattice combination of the 5 systems trained across three different phone sets, our 2017 speech recognition system obtained word error rates of 5.0% and 9.1% on Switchboard and CallHome, respectively, both of which are the best reported thus far. According to IBM's latest work comparing human and machine transcriptions, our reported Switchboard word error rate can be considered to surpass human parity (5.1%) in transcribing conversational telephone speech.
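The adaptation scheme described above, averaging the parameters of a seed acoustic model and its adapted version, can be illustrated with a minimal sketch. The abstract does not say how the parameters are stored or whether the average is weighted, so the sketch assumes an equal-weight, element-wise mean over two models with identical architectures. The `Params` layout, the function name `average_parameters`, and the parameter names and values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical layout: parameter name -> flattened list of weights.
Params = Dict[str, List[float]]


def average_parameters(seed: Params, adapted: Params) -> Params:
    """Return the element-wise mean of two parameter sets.

    Assumes both models share the same architecture, so every parameter
    has the same name and the same number of elements in each model.
    """
    if seed.keys() != adapted.keys():
        raise ValueError("models must share the same parameter names")

    averaged: Params = {}
    for name, seed_values in seed.items():
        adapted_values = adapted[name]
        if len(seed_values) != len(adapted_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Equal-weight average (an assumption; the abstract only says "averages").
        averaged[name] = [
            (s + a) / 2.0 for s, a in zip(seed_values, adapted_values)
        ]
    return averaged


# Toy values standing in for a seed model and its CallHome-adapted version.
seed_model = {"lstm.weight": [0.25, -0.5], "lstm.bias": [0.0]}
adapted_model = {"lstm.weight": [0.75, 0.0], "lstm.bias": [1.0]}

averaged_model = average_parameters(seed_model, adapted_model)
```

In this toy case each averaged value lies halfway between the seed and adapted values, which is one plausible reading of why the averaged model could gain on CallHome without losing performance on Switchboard.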


