Separable Temporal Convolution plus Temporally Pooled Attention for Lightweight High-performance Keyword Spotting

08/27/2021
by   Shenghua Hu, et al.

Keyword spotting (KWS) on mobile devices generally requires a small memory footprint. However, most current models still maintain a large number of parameters in order to ensure good performance. In this paper, we propose a temporally pooled attention module that captures global features better than average pooling. In addition, we design a separable temporal convolution network that leverages depthwise separable and temporal convolution to reduce the number of parameters and calculations. Finally, taking advantage of separable temporal convolution and temporally pooled attention, an efficient neural network (ST-AttNet) is designed for KWS systems. We evaluate the models on the publicly available Google Speech Commands dataset V1. The proposed model has 48K parameters, about 1/6 of the state-of-the-art TC-ResNet14-1.5 model (305K). The proposed model achieves 96.6% accuracy, comparable to the TC-ResNet14-1.5 model (96.6%).
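
The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the layer sizes, the function names, and the dot-product scoring with a `query` vector are all assumptions made for illustration, and the paper's actual attention module may differ. The first pair of functions counts weights in a standard temporal convolution versus a depthwise separable one; the second pair contrasts uniform average pooling over time with a generic softmax-weighted (attention) pooling.

```python
import math


def conv1d_params(c_in, c_out, k):
    """Weights in a standard temporal (1D) convolution, bias omitted."""
    return c_in * c_out * k


def separable_conv1d_params(c_in, c_out, k):
    """Depthwise stage (one k-tap filter per input channel)
    followed by a 1x1 pointwise stage that mixes channels."""
    return c_in * k + c_in * c_out


def average_pool(frames):
    """Uniform mean over time; frames is a list of T feature vectors."""
    t = len(frames)
    return [sum(col) / t for col in zip(*frames)]


def attention_pool(frames, query):
    """Softmax-weighted mean over time.

    Each frame is scored by its dot product with `query`, a hypothetical
    learned vector (an assumption, not taken from the paper).
    """
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * x for w, x in zip(weights, col)) for col in zip(*frames)]


if __name__ == "__main__":
    # Illustrative sizes only: 64 channels in and out, kernel length 9.
    print(conv1d_params(64, 64, 9), separable_conv1d_params(64, 64, 9))
    frames = [[1.0, 2.0], [3.0, 4.0]]
    print(average_pool(frames), attention_pool(frames, [0.0, 0.0]))
```

With an all-zero query every frame receives the same weight, so attention pooling reduces to average pooling; a non-zero query lets some frames contribute more than others, which is the general motivation for replacing a plain average with an attention-based pooling.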

research
09/01/2021

A Separable Temporal Convolution Neural Network with Attention for Small-Footprint Keyword Spotting

Keyword spotting (KWS) on mobile devices generally requires a small memo...
research
04/25/2020

Depthwise Separable Convolutional ResNet with Squeeze-and-Excitation Blocks for Small-footprint Keyword Spotting

One difficult problem of keyword spotting is how to miniaturize its memo...
research
10/20/2020

Small-Footprint Keyword Spotting with Multi-Scale Temporal Convolution

Keyword Spotting (KWS) plays a vital role in human-computer interaction ...
research
08/01/2020

Neural ODE with Temporal Convolution and Time Delay Neural Networks for Small-Footprint Keyword Spotting

In this paper, we propose neural network models based on the neural ordi...
research
04/08/2019

Temporal Convolution for Real-time Keyword Spotting on Mobile Devices

Keyword spotting (KWS) plays a critical role in enabling speech-based us...
research
01/15/2022

ConvMixer: Feature Interactive Convolution with Curriculum Learning for Small Footprint and Noisy Far-field Keyword Spotting

Building efficient architecture in neural speech processing is paramount...
research
04/10/2019

C3AE: Exploring the Limits of Compact Model for Age Estimation

Age estimation is a classic learning problem in computer vision. Many la...
