SwishNet: A Fast Convolutional Neural Network for Speech, Music and Noise Classification and Segmentation

12/01/2018
by Md. Shamim Hussain, et al.

Speech, music and noise classification/segmentation is an important preprocessing step for audio processing and indexing. To this end, we propose SwishNet, a novel 1D Convolutional Neural Network (CNN). It is a fast and lightweight architecture that operates on MFCC features, which makes it suitable as a front-end to an audio processing pipeline. We showed that the performance of our network can be improved by distilling knowledge from a 2D CNN pretrained on ImageNet. We investigated the performance of our network on the MUSAN corpus, an openly available, comprehensive collection of noise, music and speech samples suitable for deep learning. The proposed network achieved high overall accuracy in clip (length of 0.5-2 s) classification (>97%) and frame-wise segmentation (>93%) (>99% [...]). [...] our model, we trained it on MUSAN and evaluated it on a different corpus, GTZAN, and found good accuracy with very little fine-tuning. We also demonstrated that our model is fast on both CPU and GPU, consumes little memory and is suitable for implementation in embedded systems.
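The abstract mentions distilling knowledge from a larger 2D CNN into the 1D network but does not spell out the loss used. As a rough illustration only, the sketch below shows the commonly used soft-target formulation of distillation (a weighted sum of a hard-label cross-entropy and a temperature-softened KL term) for a three-class speech/music/noise problem. The function names, the temperature of 2.0, the weight `alpha` and the example logits are all assumptions made for this sketch and are not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Generic soft-target distillation loss (hypothetical helper).

    alpha weights the hard-label term; (1 - alpha) weights the soft term.
    This is the textbook formulation, not necessarily the paper's.
    """
    # Hard-label term: cross-entropy against the ground-truth class index.
    p_student = softmax(student_logits)
    hard = -math.log(p_student[label])

    # Soft-target term: KL(teacher || student) at the given temperature.
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s)
               for t, s in zip(q_teacher, q_student) if t > 0)

    # Scaling by temperature**2 is the usual convention for keeping the
    # soft term's gradients comparable in size to the hard term's.
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft


# Illustrative class order (assumed): speech, music, noise.
CLASSES = ["speech", "music", "noise"]

# Student agrees with the teacher: the soft term vanishes.
loss_match = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0], label=0)

# Student disagrees with the teacher: the soft term adds a penalty.
loss_mismatch = distillation_loss([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0], label=0)
```

In this sketch the teacher's logits would come from the larger pretrained network and the student's from the lightweight 1D CNN; when the two agree, only the hard-label term contributes, so `loss_mismatch` is larger than `loss_match`.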

