
Modeling Acoustic-Prosodic Cues for Word Importance Prediction in Spoken Dialogues

by Sushant Kafle, et al.
Rochester Institute of Technology

Prosodic cues in conversational speech aid listeners in discerning a message. We investigate whether acoustic cues in spoken dialogue can be used to identify the importance of individual words to the meaning of a conversation turn. Individuals who are Deaf and Hard of Hearing often rely on real-time captions in live meetings. Word error rate, a traditional metric for evaluating automatic speech recognition, fails to capture that some words are more important for a system to transcribe correctly than others. We present and evaluate neural architectures that use acoustic features for 3-class word importance prediction. Our model performs competitively against state-of-the-art text-based word-importance prediction models, and it demonstrates particular benefits when operating on imperfect ASR output.
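The point about word error rate can be illustrated with a minimal sketch. The code below computes a standard word-level WER using edit distance; the example sentences, and the judgment that "friday" matters more than "the", are assumptions made for illustration and are not taken from the paper. It is not the authors' evaluation code.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical utterance and two hypothetical ASR outputs.
reference = "the meeting is moved to friday"
hyp_function_word = "a meeting is moved to friday"   # error on "the"
hyp_content_word = "the meeting is moved to monday"  # error on "friday"

wer_a = wer(reference, hyp_function_word)
wer_b = wer(reference, hyp_content_word)
print(wer_a, wer_b)
```

Both hypotheses contain one substitution in six words and therefore receive the same WER, even though the second error plausibly changes the meaning of the turn far more than the first. A word importance score, such as the 3-class prediction described in the abstract, is one way to distinguish the two cases.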



