On the Difficulty of Segmenting Words with Attention

09/21/2021
by Ramon Sanabria, et al.

Word segmentation, the problem of finding word boundaries in speech, is of interest for a range of tasks. Previous papers have suggested that for sequence-to-sequence models trained on tasks such as speech translation or speech recognition, attention can be used to locate and segment the words. We show, however, that even on monolingual data this approach is brittle. In our experiments with different input types, data sizes, and segmentation algorithms, only models trained to predict phones from words succeed in the task. Models trained to predict words from either phones or speech (i.e., the opposite direction, which is the one needed to generalize to new data) yield much worse results, suggesting that attention-based segmentation is only useful in limited scenarios.
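To make the idea of attention-based segmentation concrete, the following is a minimal sketch of one simple way an attention matrix could be turned into boundaries: each input unit (a phone or speech frame) is assigned to the output token that attends to it most, and a boundary is placed wherever that assignment changes. The function name `segment_by_attention`, the matrix layout, and the toy weights are assumptions made for illustration; this is not necessarily one of the segmentation algorithms evaluated in the paper.

```python
from typing import List


def segment_by_attention(attention: List[List[float]]) -> List[int]:
    """Derive boundaries from an attention matrix (illustrative sketch).

    attention[t][i] is the weight that output token t places on input unit i.
    Returns the input indices at which a new segment starts (index 0 excluded).
    """
    if not attention or not attention[0]:
        return []
    num_inputs = len(attention[0])
    # Assign each input unit to the output token with the highest weight on it.
    owners = [
        max(range(len(attention)), key=lambda t: attention[t][i])
        for i in range(num_inputs)
    ]
    # Place a boundary wherever the owning output token changes.
    return [i for i in range(1, num_inputs) if owners[i] != owners[i - 1]]


# Toy example: 3 output tokens attending over 5 input units (made-up weights).
toy_attention = [
    [0.9, 0.8, 0.1, 0.0, 0.1],
    [0.1, 0.2, 0.9, 1.0, 0.2],
    [0.0, 0.0, 0.0, 0.0, 0.7],
]
print(segment_by_attention(toy_attention))  # [2, 4]
```

A hard assignment like this only yields sensible boundaries when the attention is sharp and roughly monotonic, which is consistent with the brittleness the abstract reports.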

