Listening to the World Improves Speech Command Recognition

10/23/2017
by Brian McMahan, et al.

We study transfer learning in convolutional network architectures applied to audio recognition tasks such as environmental sound event classification and speech command recognition. Our key finding is that it is not only possible to transfer representations from an unrelated task like environmental sound classification to a voice-focused task like speech command recognition, but that doing so also improves accuracy significantly. We also investigate the effect of increased model capacity on transfer learning for audio. We first validate, on two audio datasets (UrbanSound8k and the newly released Google Speech Commands dataset), a known result from computer vision: increasingly deep networks achieve better accuracy. We then propose a simple multiscale input representation using dilated convolutions and show that it aggregates larger contexts and increases classification performance. Further, models trained using a combination of transfer learning and multiscale input representations need only 40% of the training data to achieve accuracies similar to those of a freshly trained model with 100% of the training data. Finally, we demonstrate a positive interaction effect between the multiscale input and transfer learning, making a case for the joint application of the two techniques.
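To make the multiscale idea concrete, here is a minimal sketch of how dilated convolutions widen the context seen by a fixed-size kernel. It is not the authors' implementation: the abstract does not give kernel sizes, dilation rates, or input dimensionality, so the 1D signal, the 3-tap kernel, the rates `(1, 2, 4)`, and the function names `dilated_conv1d`, `multiscale_features`, and `receptive_field` are all illustrative assumptions.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1D dilated convolution (cross-correlation form)."""
    k = len(kernel)
    span = dilation * (k - 1)   # distance between the first and last kernel tap
    left = span // 2            # split the padding so the output stays centered
    padded = [0.0] * left + list(signal) + [0.0] * (span - left)
    return [
        sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(signal))
    ]


def multiscale_features(signal, kernel, dilations=(1, 2, 4)):
    """Stack one output channel per dilation rate (hypothetical rates)."""
    return [dilated_conv1d(signal, kernel, d) for d in dilations]


def receptive_field(kernel_size, dilation):
    """Number of input positions spanned by one dilated kernel."""
    return dilation * (kernel_size - 1) + 1


# A unit impulse shows which input positions each dilation rate reaches.
impulse = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
features = multiscale_features(impulse, [1.0, 1.0, 1.0])
for rate, channel in zip((1, 2, 4), features):
    print(rate, receptive_field(3, rate), channel)
```

Each channel uses the same three weights, but the taps sit 1, 2, or 4 samples apart, so the stacked channels cover contexts of 3, 5, and 9 samples without any extra parameters. On spectrogram inputs the same idea would presumably apply in two dimensions.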


