Learning Frame Level Attention for Environmental Sound Classification

07/12/2020
by Zhichao Zhang, et al.

Environmental sound classification (ESC) is a challenging problem due to the complexity of sounds. Classification performance depends heavily on how effective the features extracted from the environmental sounds are. However, ESC often suffers from semantically irrelevant frames and silent frames. To deal with this, we employ a frame-level attention model that focuses on the semantically relevant and salient frames. Specifically, we first propose a convolutional recurrent neural network to learn spectro-temporal features and temporal correlations. Then, we extend our convolutional RNN model with a frame-level attention mechanism to learn discriminative feature representations for ESC. We investigated the classification performance when using different attention scaling functions and when applying attention at different layers. Experiments were conducted on the ESC-50 and ESC-10 datasets. The experimental results demonstrated the effectiveness of the proposed method, which achieved state-of-the-art or competitive classification accuracy with lower computational complexity. We also visualized the attention results and observed that the proposed attention mechanism was able to lead the network to focus on the semantically relevant parts of environmental sounds.
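
To make the idea of frame-level attention concrete, the following is a minimal sketch of attention-weighted pooling over time frames. It is not the authors' implementation: the abstract does not give the scoring function, so a linear score followed by a softmax is assumed here, and the names `frame_attention_pool`, `softmax`, `w`, and `b` are hypothetical. Each frame gets a scalar score, the scores are normalized across frames, and the frame features are averaged with those weights so that low-scoring frames (for example, silent ones) contribute little.

```python
import math


def softmax(scores):
    """Normalize scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def frame_attention_pool(frames, w, b=0.0):
    """Pool per-frame feature vectors into one vector using attention.

    frames: list of T feature vectors, each of length D
    w, b:   parameters of an assumed linear scoring function (hypothetical)
    Returns (pooled_vector, attention_weights).
    """
    # One scalar relevance score per frame.
    scores = [sum(x * wi for x, wi in zip(frame, w)) + b for frame in frames]
    # Softmax is one possible scaling function; others could be swapped in.
    weights = softmax(scores)
    dim = len(frames[0])
    # Attention-weighted sum of the frame features.
    pooled = [
        sum(a * frame[d] for a, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return pooled, weights


# Toy input: a silent frame, a high-energy frame, and a near-silent frame.
frames = [[0.0, 0.0], [1.0, 2.0], [0.1, 0.0]]
pooled, weights = frame_attention_pool(frames, w=[1.0, 1.0])
```

In this toy input the second frame receives the largest weight, so the pooled vector is dominated by it. In a trained network the scoring parameters would be learned jointly with the classifier rather than fixed by hand.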


research
07/04/2019

Attention based Convolutional Recurrent Neural Network for Environmental Sound Classification

Environmental sound classification (ESC) is a challenging problem due to...
research
08/25/2018

Deep Convolutional Neural Network with Mixup for Environmental Sound Classification

Environmental sound classification (ESC) is an important and challenging...
research
08/16/2019

Sub-Spectrogram Segmentation for Environmental Sound Classification via Convolutional Recurrent Neural Network and Score Level Fusion

Environmental Sound Classification (ESC) is an important and challenging...
research
04/14/2021

Revisiting the Onsets and Frames Model with Additive Attention

Recent advances in automatic music transcription (AMT) have achieved hig...
research
06/12/2020

Recursion and evolution: Part II

We examine the question of whether it is possible for a diagonalizing sy...
research
06/05/2019

Automated Classification of Seizures against Nonseizures: A Deep Learning Approach

In current clinical practice, electroencephalograms (EEG) are reviewed a...
research
11/10/2021

Learning to ignore: rethinking attention in CNNs

Recently, there has been an increasing interest in applying attention me...
