Unified Multimodal Punctuation Restoration Framework for Mixed-Modality Corpus

01/24/2022
by   Yaoming Zhu, et al.
0

The punctuation restoration task aims to correctly punctuate the output transcriptions of automatic speech recognition systems. Previous punctuation models, either using text only or demanding the corresponding audio, tend to be constrained by real scenes, where unpunctuated sentences are a mixture of those with and without audio. This paper proposes a unified multimodal punctuation restoration framework, named UniPunc, to punctuate the mixed sentences with a single model. UniPunc jointly represents audio and non-audio samples in a shared latent space, based on which the model learns a hybrid representation and punctuates both kinds of samples. We validate the effectiveness of the UniPunc on real-world datasets, which outperforms various strong baselines (e.g. BERT, MuSe) by at least 0.8 overall F1 scores, making a new state-of-the-art. Extensive experiments show that UniPunc's design is a pervasive solution: by grafting onto previous models, UniPunc enables them to punctuate on the mixed corpus. Our code is available at github.com/Yaoming95/UniPunc

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/26/2023

Efficient Ensemble Architecture for Multimodal Acoustic and Textual Embeddings in Punctuation Restoration using Time-Delay Neural Networks

Punctuation restoration plays an essential role in the post-processing p...
research
01/05/2022

Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction

Video recordings of speech contain correlated audio and visual informati...
research
11/21/2022

VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning

Although speech is a simple and effective way for humans to communicate ...
research
05/12/2023

Versatile Audio-Visual Learning for Handling Single and Multi Modalities in Emotion Regression and Classification Tasks

Most current audio-visual emotion recognition models lack the flexibilit...
research
03/03/2023

Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations

Speech restoration (SR) is a task of converting degraded speech signals ...
research
02/12/2021

Multimodal Punctuation Prediction with Contextual Dropout

Automatic speech recognition (ASR) is widely used in consumer electronic...
research
07/29/2020

Text-based classification of interviews for mental health – juxtaposing the state of the art

Currently, the state of the art for classification of psychiatric illnes...

Please sign up or login with your details

Forgot password? Click here to reset