Vocoder-Based Speech Synthesis from Silent Videos

04/06/2020
by Daniel Michelsanti, et al.

Both acoustic and visual information influence human perception of speech. For this reason, the lack of audio in a video sequence results in extremely low speech intelligibility for untrained lip readers. In this paper, we present a way to synthesise speech from the silent video of a talker using deep learning. The system learns a mapping function from raw video frames to acoustic features and reconstructs the speech with a vocoder synthesis algorithm. To improve speech reconstruction performance, our model is also trained to predict text information in a multi-task learning fashion, and it is able to simultaneously reconstruct and recognise speech in real time. The results, in terms of estimated speech quality and intelligibility, show the effectiveness of our method, which improves on existing video-to-speech approaches.
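
The multi-task idea in the abstract can be illustrated with a minimal sketch of a combined training objective. This is not the authors' implementation: the abstract does not specify the loss functions, so the mean squared error on acoustic features, the frame-level cross-entropy on text labels, and the `text_weight` parameter are all assumptions chosen for illustration, and every function name here is hypothetical.

```python
import math

def mse(pred, target):
    """Mean squared error between two equal-length feature vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def cross_entropy(logits, label):
    """Softmax cross-entropy for one frame, computed stably via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]

def multitask_loss(acoustic_pred, acoustic_target, text_logits, text_label,
                   text_weight=0.5):
    """Weighted sum of an acoustic reconstruction loss and a text loss.

    text_weight is a hypothetical hyperparameter balancing the two tasks;
    both loss choices are assumptions, not taken from the paper.
    """
    reconstruction = mse(acoustic_pred, acoustic_target)
    recognition = cross_entropy(text_logits, text_label)
    return reconstruction + text_weight * recognition

# Example: one frame with 3 acoustic features and 4 candidate text symbols.
loss = multitask_loss([0.1, 0.4, 0.9], [0.0, 0.5, 1.0],
                      [2.0, 0.1, -1.0, 0.3], text_label=0)
print(f"combined loss: {loss:.4f}")
```

Setting `text_weight` to zero reduces the objective to pure speech reconstruction, which is one way to compare single-task and multi-task training under these assumptions.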

research
08/03/2020

Audiovisual Speech Synthesis using Tacotron2

Audiovisual speech synthesis is the problem of synthesizing a talking fa...
research
07/31/2023

Audio-visual video-to-speech synthesis with synthesized input audio

Video-to-speech synthesis involves reconstructing the speech signal of a...
research
05/15/2019

Speaker-Independent Speech-Driven Visual Speech Synthesis using Domain-Adapted Acoustic Models

Speech-driven visual speech synthesis involves mapping features extracte...
research
08/01/2017

Improved Speech Reconstruction from Silent Video

Speechreading is the task of inferring phonetic information from visuall...
research
11/19/2021

More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech

In this paper we present VDTTS, a Visually-Driven Text-to-Speech model. ...
research
12/22/2020

AudioViewer: Learning to Visualize Sound

Sensory substitution can help persons with perceptual deficits. In this ...
research
11/01/2022

Why Is It Hate Speech? Masked Rationale Prediction for Explainable Hate Speech Detection

In a hate speech detection model, we should consider two critical aspect...
