Temporal Convolution for Real-time Keyword Spotting on Mobile Devices

04/08/2019
by   Seungwoo Choi, et al.

Keyword spotting (KWS) plays a critical role in enabling speech-based user interactions on smart devices. Recent developments in deep learning have led to wide adoption of convolutional neural networks (CNNs) in KWS systems due to their exceptional accuracy and robustness. The main challenge faced by KWS systems is the trade-off between high accuracy and low latency. Unfortunately, there has been little quantitative analysis of the actual latency of KWS models on mobile devices. This is especially concerning since conventional convolution-based KWS approaches are known to require a large number of operations to attain an adequate level of performance. In this paper, we propose a temporal convolution for real-time KWS on mobile devices. Unlike most 2D convolution-based KWS approaches, which require a deep architecture to fully capture both low- and high-frequency domains, we exploit temporal convolutions with a compact ResNet architecture. On the Google Speech Command Dataset, we achieve a speedup of more than 385x on Google Pixel 1 and surpass the accuracy of the state-of-the-art model. In addition, we release the implementations of the proposed and baseline models, including an end-to-end pipeline for training the models and evaluating them on mobile devices.
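As a rough illustration of the contrast drawn in the abstract, the sketch below treats the per-frame audio features as input channels and convolves along time only, then compares multiply-accumulate (MAC) counts against a 2D convolution over the same input. The function names, kernel layout, and all sizes are assumptions chosen for illustration; this is not the authors' released implementation.

```python
def conv2d_macs(t, f, c_in, c_out, k_t, k_f):
    """MACs for a stride-1, same-padded 2D convolution over a (t, f) feature map."""
    return t * f * c_in * c_out * k_t * k_f


def temporal_conv_macs(t, c_in, c_out, k_t):
    """MACs for a stride-1, same-padded 1D convolution along the time axis."""
    return t * c_in * c_out * k_t


def temporal_conv(frames, kernels):
    """Unpadded (valid) 1D convolution along time.

    frames:  list of T frames, each a list of F features (used as channels).
    kernels: list of C_out kernels; each kernel is a list of K taps,
             and each tap is a list of F weights.
    Returns T - K + 1 output frames with C_out values each.
    """
    k = len(kernels[0])
    out = []
    for start in range(len(frames) - k + 1):
        window = frames[start:start + k]
        out.append([
            sum(w * x
                for tap, frame in zip(kernel, window)
                for w, x in zip(tap, frame))
            for kernel in kernels
        ])
    return out


# Hypothetical sizes: 98 frames, 40 features per frame,
# 3-tap kernels, 16 output channels.
T, F, K, C_OUT = 98, 40, 3, 16
macs_2d = conv2d_macs(T, F, 1, C_OUT, K, K)   # features as a spatial axis
macs_1d = temporal_conv_macs(T, F, C_OUT, K)  # features as input channels
print(macs_2d, macs_1d)
```

Under these assumed sizes, the temporal layer needs one third of the MACs of the 2D layer, and its output has T x C_OUT values rather than T x F x C_OUT, which also shrinks the work of every layer that follows.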


research
06/08/2021

Broadcasted Residual Learning for Efficient Keyword Spotting

Keyword spotting is an important research field because it plays a key r...
research
05/12/2023

Monitoring and Adapting ML Models on Mobile Devices

ML models are increasingly being pushed to mobile devices, for low-laten...
research
03/03/2023

Unified Keyword Spotting and Audio Tagging on Mobile Devices with Transformers

Keyword spotting (KWS) is a core human-machine-interaction front-end tas...
research
08/27/2021

Separable Temporal Convolution plus Temporally Pooled Attention for Lightweight High-performance Keyword Spotting

Keyword spotting (KWS) on mobile devices generally requires a small memo...
research
12/10/2017

A Cascade Architecture for Keyword Spotting on Mobile Devices

We present a cascade architecture for keyword spotting with speaker veri...
research
06/17/2022

PICO: Pipeline Inference Framework for Versatile CNNs on Diverse Mobile Devices

Recent researches in artificial intelligence have proposed versatile con...
research
02/12/2023

LipLearner: Customizable Silent Speech Interactions on Mobile Devices

Silent speech interface is a promising technology that enables private c...
