Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers?

05/23/2023
by Eklavya Sarkar, et al.

Self-supervised learning (SSL) models rely only on the intrinsic structure of the input signal, independent of its acoustic domain, to map the essential information in that signal into an embedding space. This implies that the utility of such representations is not limited to modeling human speech alone. Building on this understanding, this paper explores the cross-transferability of SSL neural representations learned from human speech to the analysis of bio-acoustic signals. We conduct a caller discrimination analysis and a caller detection study on marmoset vocalizations using eleven SSL models pre-trained with various pretext tasks. The results show that the embedding spaces carry meaningful caller information and can successfully distinguish the individual identities of marmoset callers without fine-tuning. This demonstrates that representations pre-trained on human speech can be effectively applied to the bio-acoustics domain, providing valuable insights for future investigations in this field.
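The evaluation protocol the abstract describes can be sketched as follows: pool frame-level SSL features into a single utterance vector, then compare utterances by cosine similarity against per-caller centroids. This is a minimal illustrative sketch, not the authors' exact pipeline; the SSL front end is replaced here by synthetic features, and the caller names, dimensions, and noise level are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def pool_embedding(frames: np.ndarray) -> np.ndarray:
    """Mean-pool frame-level features of shape (T, D) into one (D,) utterance vector."""
    return frames.mean(axis=0)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Synthetic stand-in for SSL embeddings: each caller occupies a distinct
# region of the embedding space (an assumption made for illustration;
# in the paper these would come from a pre-trained SSL model).
D = 16
callers = {c: rng.normal(size=D) for c in ["A", "B", "C"]}

def fake_utterance(caller: str, n_frames: int = 50) -> np.ndarray:
    # Frame-level features = caller centroid + noise.
    return callers[caller] + 0.3 * rng.normal(size=(n_frames, D))

# Build per-caller reference centroids from "training" utterances.
centroids = {c: pool_embedding(np.vstack([fake_utterance(c) for _ in range(5)]))
             for c in callers}

def predict_caller(frames: np.ndarray) -> str:
    """Assign an utterance to the caller whose centroid is most similar."""
    v = pool_embedding(frames)
    return max(centroids, key=lambda c: cosine(v, centroids[c]))
```

With well-separated synthetic centroids, `predict_caller(fake_utterance("B"))` returns `"B"`; the point is only to show how fixed (not fine-tuned) embeddings can support caller discrimination via simple similarity in the embedding space.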


