Pyramid Multi-branch Fusion DCNN with Multi-Head Self-Attention for Mandarin Speech Recognition

03/23/2023
by   Kai Liu, et al.
0

As one of the major branches of automatic speech recognition, attention-based models greatly improves the feature representation ability of the model. In particular, the multi-head mechanism is employed in the attention, hoping to learn speech features of more aspects in different attention subspaces. For speech recognition of complex languages, on the one hand, a small head size will lead to an obvious shortage of learnable aspects. On the other hand, we need to reduce the dimension of each subspace to keep the size of the overall feature space unchanged when we increase the number of heads, which will significantly weaken the ability to represent the feature of each subspace. Therefore, this paper explores how to use a small attention subspace to represent complete speech features while ensuring many heads. In this work we propose a novel neural network architecture, namely, pyramid multi-branch fusion DCNN with multi-head self-attention. The proposed architecture is inspired by Dilated Convolution Neural Networks (DCNN), it uses multiple branches with DCNN to extract the feature of the input speech under different receptive fields. To reduce the number of parameters, every two branches are merged until all the branches are merged into one. Thus, its shape is like a pyramid rotated 90 degrees. We demonstrate that on Aishell-1, a widely used Mandarin speech dataset, our model achieves a character error rate (CER) of 6.45

READ FULL TEXT
research
08/31/2021

Efficient conformer: Progressive downsampling and grouped attention for automatic speech recognition

The recently proposed Conformer architecture has shown state-of-the-art ...
research
04/14/2021

Efficient conformer-based speech recognition with linear attention

Recently, conformer-based end-to-end automatic speech recognition, which...
research
09/13/2022

Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition

Attention layers are an integral part of modern end-to-end automatic spe...
research
04/22/2018

Multi-Head Decoder for End-to-End Speech Recognition

This paper presents a new network architecture called multi-head decoder...
research
05/01/2020

Multi-head Monotonic Chunkwise Attention For Online Speech Recognition

The attention mechanism of the Listen, Attend and Spell (LAS) model requ...
research
07/10/2018

Big-Little Net: An Efficient Multi-Scale Feature Representation for Visual and Speech Recognition

In this paper, we propose a novel Convolutional Neural Network (CNN) arc...
research
09/01/2023

MuraNet: Multi-task Floor Plan Recognition with Relation Attention

The recognition of information in floor plan data requires the use of de...

Please sign up or login with your details

Forgot password? Click here to reset