Improved Decoding of Attentional Selection in Multi-Talker Environments with Self-Supervised Learned Speech Representation

02/11/2023
by Cong Han, et al.

Auditory attention decoding (AAD) is a technique for identifying and amplifying the talker a listener is focused on in a noisy environment. It works by comparing the listener's brainwaves to representations of all the sound sources and finding the closest match. These representations are typically the waveform or spectrogram of the sounds, but how effective such representations are for AAD is uncertain. In this study, we examined whether self-supervised learned speech representations can improve the accuracy and speed of AAD. We recorded the brain activity of three subjects using invasive electrocorticography (ECoG) as they listened to two conversations and focused on one. We used WavLM to extract a latent representation of each talker and trained a spatiotemporal filter to map brain activity to these intermediate speech representations. During evaluation, the reconstructed representation was compared to each talker's representation to determine the attended talker. Our results indicate that the WavLM speech representation provides better decoding accuracy and speed than the speech envelope and spectrogram. These findings demonstrate the advantages of self-supervised learned speech representations for auditory attention decoding and pave the way for developing brain-controlled hearable technologies.
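As a rough illustration of the decoding pipeline the abstract describes (this is not the authors' implementation), the Python sketch below extracts WavLM latent features, fits a linear spatiotemporal (time-lagged) decoder with ridge regression, and classifies the attended talker by correlating the reconstructed features with each talker's features. The checkpoint name, number of lags, regularization strength, and the dummy random data standing in for real ECoG and audio are all assumptions made for illustration.

```python
# Minimal sketch of correlation-based stimulus-reconstruction AAD with WavLM features.
# All model/parameter choices and the synthetic data below are illustrative assumptions.
import numpy as np
import torch
from sklearn.linear_model import Ridge
from transformers import WavLMModel

wavlm = WavLMModel.from_pretrained("microsoft/wavlm-base-plus")
wavlm.eval()

def wavlm_features(waveform_16k: np.ndarray, layer: int = -1) -> np.ndarray:
    """Return a latent WavLM representation of shape (frames, dims), ~50 frames/s."""
    with torch.no_grad():
        x = torch.tensor(waveform_16k, dtype=torch.float32).unsqueeze(0)
        out = wavlm(x, output_hidden_states=True)
    return out.hidden_states[layer].squeeze(0).numpy()

def lagged(neural: np.ndarray, n_lags: int) -> np.ndarray:
    """Spatiotemporal design matrix: stack time-lagged copies of every channel."""
    # np.roll wraps around at the edges; acceptable for a short illustrative sketch.
    return np.concatenate([np.roll(neural, lag, axis=0) for lag in range(n_lags)], axis=1)

# Dummy stand-ins for real recordings: 16 kHz audio, neural frames aligned to WavLM frames.
rng = np.random.default_rng(0)
wav_attended_train = rng.standard_normal(16000 * 10).astype(np.float32)
wav_a = rng.standard_normal(16000 * 5).astype(np.float32)
wav_b = rng.standard_normal(16000 * 5).astype(np.float32)
ecog_train = rng.standard_normal((499, 64))   # (frames, channels)
ecog_test = rng.standard_normal((249, 64))

# Training: map lagged neural activity to the WavLM features of the attended talker.
y = wavlm_features(wav_attended_train)
X = lagged(ecog_train, n_lags=16)
n = min(len(X), len(y))
decoder = Ridge(alpha=1.0).fit(X[:n], y[:n])

# Evaluation: reconstruct features from held-out neural data, then pick the talker
# whose WavLM representation correlates best with the reconstruction.
recon = decoder.predict(lagged(ecog_test, n_lags=16))

def mean_corr(a: np.ndarray, b: np.ndarray) -> float:
    m = min(len(a), len(b))
    return float(np.mean([np.corrcoef(a[:m, d], b[:m, d])[0, 1] for d in range(a.shape[1])]))

scores = {"talker A": mean_corr(recon, wavlm_features(wav_a)),
          "talker B": mean_corr(recon, wavlm_features(wav_b))}
print("Decoded attention:", max(scores, key=scores.get), scores)
```

Picking the talker whose representation correlates best with the reconstruction is the standard decision rule in stimulus-reconstruction AAD; here it is simply applied to WavLM latent features instead of the envelope or spectrogram.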

Related research

Speech Resynthesis from Discrete Disentangled Self-Supervised Representations (04/01/2021)
Phone and speaker spatial organization in self-supervised speech representations (02/24/2023)
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning (10/16/2022)
Speaker Recognition Using Isomorphic Graph Attention Network Based Pooling on Self-Supervised Representation (08/09/2023)
Toward a realistic model of speech processing in the brain with self-supervised learning (06/03/2022)
Decoding Generic Visual Representations From Human Brain Activity using Machine Learning (11/05/2018)
Probing self-supervised speech models for phonetic and phonemic information: a case study in aspiration (06/09/2023)
