Are discrete units necessary for Spoken Language Modeling?

03/11/2022
by   Tu Anh Nguyen, et al.

Recent work in spoken language modeling shows the possibility of learning a language from raw audio without any text labels, in an unsupervised fashion. The approach relies first on transforming the audio into a sequence of discrete units (or pseudo-text) and then training a language model directly on such pseudo-text. Is such a discrete bottleneck necessary, potentially introducing irreversible errors in the encoding of the speech signal, or could we learn a language model without discrete units at all? In this work, we show that discretization is indeed essential for good results in spoken language modeling, but that the discrete bottleneck can be omitted if the model is trained on discrete target features from a higher level than the input features. We also show that an end-to-end model trained with discrete targets, like HuBERT, achieves results similar to the best language model trained on pseudo-text, on a set of zero-shot spoken language modeling metrics from the Zero Resource Speech Challenge 2021.
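To make the pseudo-text pipeline described in the abstract concrete, the sketch below shows the two stages in miniature: continuous frame-level speech features are clustered into discrete units, and a toy language model is then estimated on the resulting unit sequence. This is a minimal illustration only; it assumes frame-level features (e.g., from a CPC or HuBERT encoder) are already available as an array, and the function names, the k-means/bigram choices, and the random placeholder features are assumptions for the example, not details from the paper.

```python
# Minimal sketch of the pseudo-text pipeline: quantize continuous speech
# features into discrete units, then train a simple language model on them.
# Real feature extraction (CPC/HuBERT) is assumed to happen elsewhere;
# random features stand in for it here.

import numpy as np
from sklearn.cluster import KMeans


def quantize_features(features, n_units=50, seed=0):
    """Cluster frame-level features into discrete units (the 'pseudo-text')."""
    kmeans = KMeans(n_clusters=n_units, n_init=10, random_state=seed)
    kmeans.fit(features)
    return kmeans, kmeans.predict(features)


def train_bigram_lm(unit_sequence, n_units, smoothing=1.0):
    """Estimate a smoothed bigram model over the discrete unit vocabulary."""
    counts = np.full((n_units, n_units), smoothing)
    for prev, nxt in zip(unit_sequence[:-1], unit_sequence[1:]):
        counts[prev, nxt] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder for real speech features: 2000 frames x 256 dimensions.
    features = rng.normal(size=(2000, 256)).astype(np.float32)

    kmeans, units = quantize_features(features, n_units=50)
    lm = train_bigram_lm(units, n_units=50)

    # Score the unit sequence by average negative log-likelihood.
    nll = -np.mean([np.log(lm[p, n]) for p, n in zip(units[:-1], units[1:])])
    print(f"pseudo-text length: {len(units)}, bigram NLL: {nll:.3f}")
```

The paper compares far stronger components (self-supervised encoders and Transformer language models, evaluated on the Zero Resource Speech Challenge 2021 metrics); the k-means plus bigram combination above only illustrates the data flow from continuous features to discrete units to a language model, and where a discrete bottleneck could be removed or kept.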


