Broadcasted Residual Learning for Efficient Keyword Spotting

06/08/2021
by Byeonggeun Kim, et al.

Keyword spotting is an important research field because it plays a key role in device wake-up and user interaction on smart devices. However, it is challenging to minimize errors while operating efficiently on devices with limited resources, such as mobile phones. We present a broadcasted residual learning method that achieves high accuracy with a small model size and computational load. Our method configures most of the residual functions as 1D temporal convolutions while still allowing 2D convolution, using a broadcasted-residual connection that expands the temporal output to the frequency-temporal dimension. This residual mapping enables the network to represent useful audio features effectively with much less computation than conventional convolutional neural networks. We also propose a novel network architecture, the broadcasting-residual network (BC-ResNet), based on broadcasted residual learning, and describe how to scale up the model according to the target device's resources. BC-ResNets achieve state-of-the-art 98.0% and 98.7% top-1 accuracy on Google speech command datasets v1 and v2, respectively, and consistently outperform previous approaches while using fewer computations and parameters.
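The broadcasting idea in the abstract can be illustrated with a minimal NumPy sketch. It is a simplification under stated assumptions, not the authors' implementation: the frequency axis is collapsed by averaging (an assumption; the abstract does not say how the temporal feature is obtained), the 2D convolution branch, normalization, and activations are left out, and the names `temporal_conv1d` and `broadcasted_residual` are hypothetical.

```python
import numpy as np


def temporal_conv1d(x, kernel):
    """Depthwise 1D convolution along time with zero 'same' padding.

    x: array of shape (channels, time)
    kernel: array of shape (channels, k), with k odd
    As in most deep learning code, this is a cross-correlation.
    """
    _, time = x.shape
    k = kernel.shape[1]
    pad = k // 2
    padded = np.pad(x, ((0, 0), (pad, pad)))
    out = np.zeros(x.shape, dtype=float)
    for i in range(k):
        # Each channel uses its own kernel tap (depthwise).
        out += kernel[:, i:i + 1] * padded[:, i:i + time]
    return out


def broadcasted_residual(x, kernel):
    """Simplified broadcasted residual on a (channels, freq, time) input.

    Assumption: frequency is collapsed by averaging before the 1D
    temporal convolution. The 2D branch is omitted in this sketch.
    """
    pooled = x.mean(axis=1)                     # (channels, time)
    temporal = temporal_conv1d(pooled, kernel)  # 1D temporal work only
    # Broadcast the temporal output back over the frequency axis and
    # add it to the frequency-temporal input as a residual.
    return x + temporal[:, None, :]


x = np.random.rand(4, 40, 101)   # e.g. 4 channels, 40 frequency bins, 101 frames
kernel = np.random.rand(4, 3)    # one 3-tap temporal filter per channel
y = broadcasted_residual(x, kernel)
print(y.shape)
```

In this sketch the temporal convolution touches channels × time values rather than channels × frequency × time, which is where the savings over a full 2D convolution would come from.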


