Content Adaptive Front End For Audio Signal Processing

03/18/2023
by   Prateek Verma, et al.

We propose a learnable content-adaptive front end for audio signal processing. Before the advent of modern deep learning, fixed, non-learnable front ends such as the spectrogram or mel-spectrogram were used, with or without neural architectures. As convolutional architectures came to support various applications such as ASR and acoustic scene understanding, a shift to learnable front ends occurred, in which both the type of basis functions and their weights were learned from scratch and optimized for the particular task of interest. With the shift to transformer-based architectures with no convolutional blocks, a linear layer projects small waveform patches onto a small latent dimension before feeding them to a transformer architecture. In this work, we propose a way of computing a content-adaptive learnable time-frequency representation. We pass each audio signal through a bank of convolutional filters, each giving a fixed-dimensional vector. This is akin to learning a bank of finite impulse response filterbanks and passing the input signal through the optimum filterbank depending on its content. A content-adaptive learnable time-frequency representation may be more broadly applicable, beyond the experiments in this paper.
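The idea of routing a waveform patch through one of several FIR filterbanks can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how the filterbank is chosen, so the linear router (`router_weights`), the hard argmax selection, the max-pooling over time, and all sizes below are assumptions made for illustration. The filters here are random rather than learned.

```python
import numpy as np

def filterbank_features(patch, bank):
    """Filter a patch with every FIR filter in one bank, then max-pool
    the magnitude over time to get one value per filter."""
    # bank: (num_filters, filter_len), patch: (patch_len,)
    responses = np.stack([np.convolve(patch, h, mode="valid") for h in bank])
    return np.abs(responses).max(axis=1)  # shape (num_filters,)

def content_adaptive_front_end(patch, banks, router_weights):
    """Score each filterbank from the patch content and return the
    features of the highest-scoring bank (hypothetical hard routing)."""
    # router_weights: (num_banks, patch_len), an assumed linear router
    scores = router_weights @ patch
    scores = scores - scores.max()              # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    chosen = int(np.argmax(probs))
    return filterbank_features(patch, banks[chosen]), probs, chosen

# Example with random (untrained) parameters; all sizes are illustrative.
rng = np.random.default_rng(0)
patch_len, num_banks, num_filters, filter_len = 400, 4, 64, 25
patch = rng.standard_normal(patch_len)
banks = rng.standard_normal((num_banks, num_filters, filter_len))
router_weights = 0.01 * rng.standard_normal((num_banks, patch_len))

features, probs, chosen = content_adaptive_front_end(patch, banks, router_weights)
print(features.shape, chosen)
```

Whichever bank is selected, the output is a vector of the same dimension (`num_filters`), so a downstream model such as a transformer sees a fixed-size input per patch. In a trainable version, the hard argmax would typically be replaced by a differentiable alternative, for example a softmax-weighted mixture of the per-bank features.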

research
08/20/2023

Neural Architectures Learning Fourier Transforms, Signal Processing and Much More....

This report will explore and answer fundamental questions about taking F...
research
05/01/2021

Audio Transformers:Transformer Architectures For Large Scale Audio Understanding. Adieu Convolutions

Over the past two decades, CNN architectures have produced compelling mo...
research
11/17/2022

SpectNet : End-to-End Audio Signal Classification Using Learnable Spectrograms

Pattern recognition from audio signals is an active research topic encom...
research
03/29/2022

Learning neural audio features without supervision

Deep audio classification, traditionally cast as training a deep neural ...
research
10/25/2022

Artificial ASMR: A Cyber-Psychological Study

The popularity of Autonomous Sensory Meridian Response (ASMR) has skyroc...
research
08/01/2020

Singer Identification Using Convolutional Acoustic Motif Embeddings

Flamenco singing is characterized by pitch instability, micro-tonal orna...
research
07/30/2021

A Multi-Head Relevance Weighting Framework For Learning Raw Waveform Audio Representations

In this work, we propose a multi-head relevance weighting framework to l...
