Low Bit-Rate Speech Coding with VQ-VAE and a WaveNet Decoder

10/14/2019
by Cristina Garbacea, et al.

To transmit and store speech efficiently, speech codecs create a minimally redundant representation of the input signal, which is then decoded at the receiver with the best possible perceptual quality. In this work, we demonstrate that a neural network architecture based on VQ-VAE with a WaveNet decoder can perform very low bit-rate speech coding with high reconstruction quality. A prosody-transparent, speaker-independent model trained on the LibriSpeech corpus and coding audio at 1.6 kbps achieves perceptual quality roughly halfway between that of the MELP codec at 2.4 kbps and the AMR-WB codec at 23.05 kbps. In addition, when trained on high-quality recorded speech with the test speaker included in the training set, a model coding speech at 1.6 kbps produces output whose perceptual quality is similar to that of AMR-WB at 23.05 kbps.
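
In a VQ-VAE codec, the "minimally redundant representation" comes from the vector-quantization bottleneck. Each encoder output is replaced by its nearest codebook entry, so only that entry's integer index has to be transmitted. The following plain-Python sketch illustrates only this nearest-neighbour lookup. It omits the encoder, the WaveNet decoder and codebook training. The helper names (nearest_code, quantize) and the tiny 2-D codebook are hypothetical toy values, not the paper's configuration.

```python
import math
from typing import List, Sequence, Tuple

def nearest_code(z: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codebook vector with the smallest squared Euclidean distance to z."""
    best_index, best_dist = 0, math.inf
    for k, e in enumerate(codebook):
        dist = sum((zi - ei) ** 2 for zi, ei in zip(z, e))
        if dist < best_dist:
            best_index, best_dist = k, dist
    return best_index

def quantize(
    latents: Sequence[Sequence[float]], codebook: Sequence[Sequence[float]]
) -> Tuple[List[int], List[List[float]]]:
    """Map each latent to a codebook index, then look each index up again."""
    indices = [nearest_code(z, codebook) for z in latents]  # what gets transmitted
    reconstructed = [list(codebook[i]) for i in indices]    # what the receiver recovers
    return indices, reconstructed

# Hypothetical toy values (not from the paper): a 3-entry codebook of 2-D vectors.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
latents = [[0.9, 1.2], [0.1, -0.2], [-0.8, 0.7]]
indices, reconstructed = quantize(latents, codebook)
print(indices)  # [1, 0, 2]
```

Only the index list crosses the channel. The receiver holds the same codebook, so it can rebuild the quantized latents and pass them to its decoder.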

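Bit rates like these follow from simple arithmetic. If the encoder emits one codebook index per latent time step, the rate equals the latent frame rate times log2 of the codebook size. The sketch below uses a hypothetical configuration (16 kHz input, a downsampling factor of 80, 256 codes) chosen only because it yields 1.6 kbps; the abstract does not state the model's actual settings. The sketch also assumes fixed-length index coding with no entropy coding, and it takes 16-bit PCM at 16 kHz as the uncompressed reference.

```python
import math

def bitrate_bps(sample_rate_hz: float, downsample_factor: int, codebook_size: int) -> float:
    """Bit rate of a stream of fixed-length codebook indices, one per latent step."""
    latent_rate_hz = sample_rate_hz / downsample_factor  # latent vectors per second
    bits_per_latent = math.log2(codebook_size)           # bits needed per index
    return latent_rate_hz * bits_per_latent

# Hypothetical configuration, chosen only because it lands on 1.6 kbps.
rate = bitrate_bps(16000, 80, 256)
print(rate)               # 1600.0
print(2400 / rate)        # 1.5      (MELP at 2.4 kbps)
print(23050 / rate)       # 14.40625 (AMR-WB at 23.05 kbps)
print(16000 * 16 / rate)  # 160.0    (16-bit PCM at 16 kHz)
```

Under these assumptions, the 1.6 kbps model uses about one fourteenth of the AMR-WB bit rate it is compared against.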