Modeling Acoustic-Prosodic Cues for Word Importance Prediction in Spoken Dialogues

03/28/2019
by Sushant Kafle, et al.

Prosodic cues in conversational speech aid listeners in discerning a message. We investigate whether acoustic cues in spoken dialogue can be used to identify the importance of individual words to the meaning of a conversation turn. Individuals who are Deaf or Hard of Hearing often rely on real-time captions in live meetings. Word error rate, a traditional metric for evaluating automatic speech recognition (ASR), fails to capture that some words are more important for a system to transcribe correctly than others. We present and evaluate neural architectures that use acoustic features for 3-class word importance prediction. Our model performs competitively against state-of-the-art text-based word-importance prediction models, and it demonstrates particular benefits when operating on imperfect ASR output.
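
To make the point about word error rate concrete, here is a minimal sketch of the standard definition (word-level edit distance divided by reference length). The function name and the example sentences are illustrative assumptions and are not taken from the paper. Both hypotheses below contain a single substitution, so they receive the same score even though only one of them changes what the sentence means.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev is the previous row of the edit-distance table:
    # prev[j] = distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example sentences (not from the paper).
reference = "the meeting is moved to friday"
hyp_minor = "a meeting is moved to friday"    # error on a function word
hyp_major = "the meeting is moved to monday"  # error on a content word

wer_minor = word_error_rate(reference, hyp_minor)
wer_major = word_error_rate(reference, hyp_major)
```

Both calls yield one error over six reference words. A metric that weights errors by predicted word importance could tell the two cases apart, which is the motivation the abstract gives for word importance prediction.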

research
01/29/2018

A Corpus for Modeling Word Importance in Spoken Dialogue Transcripts

Motivated by a project to create a system for people who are deaf or har...
research
07/31/2020

Modular End-to-end Automatic Speech Recognition Framework for Acoustic-to-word Model

End-to-end (E2E) systems have played a more and more important role in a...
research
04/24/2017

Joint Modeling of Text and Acoustic-Prosodic Cues for Neural Parsing

In conversational speech, the acoustic signal provides cues that help li...
research
04/08/2019

Giving Attention to the Unexpected: Using Prosody Innovations in Disfluency Detection

Disfluencies in spontaneous speech are known to be associated with proso...
research
07/09/2019

Joint Speech Recognition and Speaker Diarization via Sequence Transduction

Speech applications dealing with conversations require not only recogniz...
research
04/23/2018

A Discriminative Acoustic-Prosodic Approach for Measuring Local Entrainment

Acoustic-prosodic entrainment describes the tendency of humans to align ...
research
05/29/2020

Prosody leaks into the memories of words

The average predictability (aka informativity) of a word in context has ...
