Learning to Separate Voices by Spatial Regions

by Zhongweiyang Xu, et al.

We consider the problem of audio voice separation for binaural applications, such as earphones and hearing aids. While today's neural networks perform remarkably well (separating 4+ sources with 2 microphones), they assume a known or fixed maximum number of sources, K. Moreover, today's models are trained in a supervised manner, using training data synthesized from generic sources, environments, and human head shapes. This paper intends to relax both these constraints at the expense of a slight alteration in the problem definition. We observe that, when a received mixture contains too many sources, it is still helpful to separate them by region, i.e., isolating signal mixtures from each conical sector around the user's head. This requires learning the fine-grained spatial properties of each region, including the signal distortions imposed by a person's head. We propose a two-stage self-supervised framework in which overheard voices from earphones are pre-processed to extract relatively clean personalized signals, which are then used to train a region-wise separation model. Results show promising performance, underscoring the importance of personalization over a generic supervised approach. (Audio samples are available at our project website: https://uiuc-earable-computing.github.io/binaural/.) We believe this result could help real-world applications in selective hearing, noise cancellation, and audio augmented reality.
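To make the region-based formulation concrete, the sketch below shows one simple way to map a source's azimuth to one of K equal conical sectors around the listener's head. This is an illustrative geometric helper under our own assumptions (uniform sector widths, sector 0 centered on the front), not the paper's implementation:

```python
def sector_index(azimuth_deg: float, num_sectors: int = 4) -> int:
    """Map a source azimuth (degrees, 0 = straight ahead, increasing
    clockwise) to one of `num_sectors` equal conical sectors around
    the listener's head. Purely illustrative; the paper's model learns
    region boundaries implicitly from personalized binaural signals."""
    width = 360.0 / num_sectors
    # Shift by half a sector so that sector 0 is centered on the front.
    return int(((azimuth_deg + width / 2) % 360.0) // width)

# Example: with 4 sectors, a voice directly ahead falls in sector 0,
# one to the right (90 degrees) in sector 1, one to the left in sector 3.
```

A region-wise separator would then output one signal mixture per sector index rather than one signal per individual source, which is what removes the dependence on a fixed source count K.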

