Complementing Handcrafted Features with Raw Waveform Using a Light-weight Auxiliary Model

09/06/2021
by Zhongwei Teng, et al.

An emerging trend in audio processing is learning low-level speech representations directly from raw waveforms. These representations have shown promising results on a variety of tasks, such as speech recognition and speech separation. In theory, learning speech features via backpropagation, rather than handcrafting them, gives the model greater flexibility in how it represents data for different tasks. However, empirical results show that in some tasks, such as voice spoof detection, handcrafted features are more competitive than learned features. Instead of evaluating handcrafted features and raw waveforms independently, this paper proposes an Auxiliary Rawnet model that complements handcrafted features with features learned from raw waveforms. A key benefit of the approach is that it can improve accuracy at a relatively low computational cost. The proposed Auxiliary Rawnet model is evaluated on the ASVspoof 2019 dataset, and the results indicate that a light-weight waveform encoder can potentially boost the performance of encoders based on handcrafted features in exchange for a small amount of additional computation.
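To make the idea of complementing handcrafted features with a light-weight waveform branch concrete, here is a minimal, dependency-free sketch. It is not the Auxiliary Rawnet architecture: the abstract does not say how the two branches are combined, so fusion by concatenation is an assumption, and every name here (`frame_log_energy`, `main_embedding`, `auxiliary_embedding`, `fused_embedding`) as well as the frame sizes, kernel count, and random untrained kernels are hypothetical placeholders.

```python
import math
import random


def mean_pool(values):
    """Average a non-empty list of floats over time."""
    return sum(values) / len(values)


def frame_log_energy(wave, frame_len=400, hop=160):
    """Stand-in for a handcrafted feature: log energy of each frame."""
    feats = []
    for start in range(0, len(wave) - frame_len + 1, hop):
        frame = wave[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        feats.append(math.log(energy + 1e-10))
    return feats


def main_embedding(wave):
    """Hypothetical main branch: mean and variance of the handcrafted features."""
    feats = frame_log_energy(wave)
    mu = mean_pool(feats)
    var = mean_pool([(f - mu) ** 2 for f in feats])
    return [mu, var]


def auxiliary_embedding(wave, kernels, stride=160):
    """Hypothetical light-weight branch: strided 1-D convolution over raw
    samples, ReLU, then mean pooling over time (one value per kernel)."""
    emb = []
    for kernel in kernels:
        width = len(kernel)
        outs = []
        for start in range(0, len(wave) - width + 1, stride):
            act = sum(w * x for w, x in zip(kernel, wave[start:start + width]))
            outs.append(max(0.0, act))
        emb.append(mean_pool(outs))
    return emb


def fused_embedding(wave, kernels):
    """Assumed fusion: concatenate the two branch embeddings."""
    return main_embedding(wave) + auxiliary_embedding(wave, kernels)


# One second of a 220 Hz tone at 16 kHz as placeholder audio.
rng = random.Random(0)
wave = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
# Four random, untrained kernels; a real model would learn these by backpropagation.
kernels = [[rng.uniform(-0.1, 0.1) for _ in range(64)] for _ in range(4)]
embedding = fused_embedding(wave, kernels)
print(len(embedding))
```

In this sketch the auxiliary branch adds only a handful of small kernels on top of the handcrafted branch, which mirrors the abstract's point that the extra computation can be kept small; a downstream classifier would consume the fused vector.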

research
09/28/2022

MeWEHV: Mel and Wave Embeddings for Human Voice Tasks

A recent trend in speech processing is the use of embeddings created thr...
research
11/27/2018

Learning to detect dysarthria from raw speech

Speech classifiers of paralinguistic traits traditionally learn from div...
research
09/11/2016

Wav2Letter: an End-to-End ConvNet-based Speech Recognition System

This paper presents a simple end-to-end model for speech recognition, co...
research
06/21/2019

Multi-Span Acoustic Modelling using Raw Waveform Signals

Traditional automatic speech recognition (ASR) systems often use an acou...
research
03/31/2022

DeepFry: Identifying Vocal Fry Using Deep Neural Networks

Vocal fry or creaky voice refers to a voice quality characterized by irr...
research
06/02/2023

Improved DeepFake Detection Using Whisper Features

With a recent influx of voice generation methods, the threat introduced ...
research
05/11/2019

Encrypted Speech Recognition using Deep Polynomial Networks

The cloud-based speech recognition/API provides developers or enterprise...
