Visual Speech Enhancement using Noise-Invariant Training

11/23/2017
by Aviv Gabbay, et al.

Visual speech enhancement is used on videos shot in noisy environments to enhance the voice of a visible speaker and to reduce background noise. While most existing methods use audio-only inputs, we propose an audio-visual neural network model for this purpose. The visible mouth movements are used to separate the speaker's voice from the background sounds. Instead of training our speech enhancement model on a wide range of possible noise types, we train the model on videos where other speech samples of the target speaker are used as background noise. Training with this paradigm yields a model that generalizes well to various noise types, and it also substantially reduces training time. The proposed model outperforms prior audio-visual methods on two public lipreading datasets. It is also the first to be demonstrated on a general dataset not designed for lipreading: a dataset composed of weekly addresses of Barack Obama.
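The training paradigm described above (using other speech samples of the target speaker as background noise) can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `make_same_speaker_mixtures`, the representation of waveforms as plain lists of floats, the equal-gain additive mixing, and the looping of the interfering sample to match the clean length are all assumptions made for illustration; the abstract does not specify these details.

```python
import random


def make_same_speaker_mixtures(utterances, seed=0):
    """Build (mixture, clean) training pairs for one target speaker.

    Hypothetical helper: each clean utterance is mixed with a *different*
    utterance of the same speaker, which plays the role of background noise.
    Waveforms are plain lists of float samples (an assumption).
    """
    if len(utterances) < 2:
        raise ValueError("need at least two utterances of the target speaker")
    if any(len(u) == 0 for u in utterances):
        raise ValueError("utterances must be non-empty")

    rng = random.Random(seed)
    pairs = []
    for i, clean in enumerate(utterances):
        # Pick an interfering utterance of the same speaker, never the clean one.
        j = rng.choice([k for k in range(len(utterances)) if k != i])
        noise = utterances[j]
        # Loop the interfering sample so it covers the clean utterance (assumption).
        noise = [noise[t % len(noise)] for t in range(len(clean))]
        # Equal-gain additive mixing (assumption); the clean signal is the target.
        mixture = [c + n for c, n in zip(clean, noise)]
        pairs.append((mixture, clean))
    return pairs


if __name__ == "__main__":
    # Toy "waveforms" standing in for real speech samples.
    toy_utterances = [[0.1, 0.2, 0.3], [0.5, -0.5], [0.0, 0.4, 0.0, -0.4]]
    for mixture, clean in make_same_speaker_mixtures(toy_utterances):
        print(len(mixture), len(clean))
```

In a full pipeline, each mixture would be paired with the video frames of the clean utterance, so that the visible mouth movements indicate which of the two same-speaker voices to keep.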


