CLAR: Contrastive Learning of Auditory Representations

10/19/2020
by Haider Al-Tahan et al.

Learning rich visual representations with contrastive self-supervised learning has been extremely successful. However, it remains an open question whether a similar approach can be used to learn superior auditory representations. In this paper, we expand on prior work (SimCLR) to learn better auditory representations. We (1) introduce various data augmentations suitable for auditory data and evaluate their impact on predictive performance, (2) show that training with time-frequency audio features substantially improves the quality of the learned representations compared to training on raw signals, and (3) demonstrate that training with both supervised and contrastive losses simultaneously improves the learned representations compared to self-supervised pre-training followed by supervised fine-tuning. We show that by combining all of these methods, our framework (CLAR) achieves a significant improvement in predictive performance over a purely supervised approach while using substantially less labeled data. Moreover, compared to a purely self-supervised approach, our framework converges faster and learns significantly better representations.
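The core of point (3), jointly optimizing a contrastive and a supervised objective, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' released code: the contrastive term follows SimCLR's NT-Xent formulation, and names such as `encoder`, `projection_head`, `classifier`, and `alpha` are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style NT-Xent loss over two batches of projections (N, d)."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)              # (2N, d)
    sim = torch.mm(z, z.t()) / temperature      # temperature-scaled cosine similarities
    n = z1.size(0)
    # Mask out self-similarities so they never count as positives or negatives.
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim.masked_fill_(mask, float("-inf"))
    # The positive for sample i is its other augmented view.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

def clar_step(encoder, projection_head, classifier, x1, x2, labels, alpha=1.0):
    """One hypothetical training step combining contrastive and supervised losses.

    x1 and x2 are two augmented views of the same audio batch; `alpha` weights
    the supervised term. All module names are placeholders, not a released API.
    """
    h1, h2 = encoder(x1), encoder(x2)
    contrastive = nt_xent_loss(projection_head(h1), projection_head(h2))
    # Supervised cross-entropy on (for example) the first view's representation;
    # in a semi-supervised setting this term would only cover labeled samples.
    supervised = F.cross_entropy(classifier(h1), labels)
    return contrastive + alpha * supervised
```

Optimizing both terms in a single backward pass, rather than pre-training on the contrastive loss and fine-tuning with labels afterwards, is what the abstract refers to as training with both losses "simultaneously".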

