Real time spectrogram inversion on mobile phone

03/01/2022
by Oleg Rybakov, et al.

With the growth of computing power on mobile phones and privacy concerns over user data, on-device real-time speech processing has become an important research topic. In this paper, we focus on methods for real-time spectrogram inversion, where an algorithm receives a portion of the input signal (e.g., one frame) and processes it incrementally, i.e., operates in streaming mode. We present a real-time Griffin-Lim (GL) algorithm using a sliding window approach in the STFT domain. The proposed algorithm is 2.4x faster than real time on the ARM CPU of a Pixel 4. In addition, we explore a neural vocoder operating in streaming mode and demonstrate the impact of lookahead on perceptual quality. As little as one hop size (12.5 ms) of lookahead significantly improves perceptual quality in comparison to a causal model. We compare GL with the neural vocoder and show different trade-offs in terms of perceptual quality, on-device latency, algorithmic delay, memory footprint and noise sensitivity. For a fair quality assessment of the GL approach, we use the log-magnitude spectrogram as input, without mel transformation. We evaluate the presented real-time spectrogram inversion approaches on clean, noisy and atypical speech.
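For readers unfamiliar with Griffin-Lim, the following is a minimal sketch of the classic offline (non-streaming) algorithm in NumPy. It is not the sliding-window streaming variant proposed in the paper, whose details are not given in the abstract. The function names (`stft`, `istft`, `griffin_lim`), the Hann window, and the default iteration count are illustrative assumptions. The sketch takes a linear magnitude spectrogram, so a log-magnitude input would first need to be exponentiated.

```python
import numpy as np


def stft(x, n_fft, hop, window):
    """Windowed frames -> one-sided FFT, shape (n_frames, n_fft // 2 + 1)."""
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def istft(spec, n_fft, hop, window):
    """Least-squares inverse STFT: windowed overlap-add divided by sum of window^2."""
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    n_frames = spec.shape[0]
    length = n_fft + hop * (n_frames - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i in range(n_frames):
        out[i * hop : i * hop + n_fft] += frames[i]
        norm[i * hop : i * hop + n_fft] += window ** 2
    # Guard against division by zero where the window sum vanishes (signal edges).
    return out / np.maximum(norm, 1e-8)


def griffin_lim(magnitude, n_fft, hop, n_iter=30, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `magnitude`.

    magnitude: linear (not log) magnitude spectrogram,
               shape (n_frames, n_fft // 2 + 1).
    """
    rng = np.random.default_rng(seed)
    window = np.hanning(n_fft)  # assumption: Hann analysis/synthesis window
    # Start from random phase, then alternate between the two constraints:
    # (1) the spectrogram must be the STFT of a real signal,
    # (2) its magnitude must equal the target magnitude.
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        signal = istft(magnitude * phase, n_fft, hop, window)
        spec = stft(signal, n_fft, hop, window)
        phase = np.exp(1j * np.angle(spec))
    return istft(magnitude * phase, n_fft, hop, window)
```

As a usage illustration, assuming a 16 kHz sample rate (not stated in the abstract), `hop=200` samples corresponds to the 12.5 ms hop size mentioned above, and `griffin_lim(mag, n_fft=400, hop=200)` returns a waveform of length `n_fft + hop * (n_frames - 1)`. Every iteration here processes the whole utterance, which is what a streaming variant has to avoid.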


