In recent years, Large Language Models (LLMs) have garnered significant ...
Spoken semantic parsing (SSP) involves generating machine-comprehensible...
Model adaptation is crucial to handle the discrepancy between proxy trai...
Automatic Speech Recognition (ASR) models need to be optimized for speci...
This paper studies contextual biasing with Large Language Models (LLMs),...
End-to-end (E2E) spoken language understanding (SLU) systems that genera...
Large language models have proven themselves highly flexible, able to so...
This paper presents a method for selecting appropriate synthetic speech ...
State space models (SSMs) have recently shown promising results on small...
End-to-end multilingual ASR has become more appealing because of several...
Recently, there has been an increasing interest in two-pass streaming en...
Neural transducers have gained popularity in production ASR systems, ach...
Neural network pruning can be effectively applied to compress automatic ...
There is growing interest in unifying the streaming and full-context aut...
We propose a novel deliberation-based approach to end-to-end (E2E) spoke...
Cross-device federated learning (FL) protects user privacy by collaborat...
Streaming ASR with strict latency constraints is required in many speech...
We propose Neural-FST Class Language Model (NFCLM) for end-to-end speech...
With 4.5 million hours of English speech from 10 different sources acros...
From wearables to powerful smart devices, modern automatic speech recogn...
Measuring automatic speech recognition (ASR) system quality is critical ...
This paper improves the streaming transformer transducer for speech reco...
Detection of common events and scenes from audio is useful for extractin...
Automatic speech recognition (ASR) has become increasingly ubiquitous on...
On-device speech recognition requires training models of different sizes...
Often, the storage and computational constraints of embedded devices dema...
As speech-enabled devices such as smartphones and smart speakers become ...
How to leverage dynamic contextual information in end-to-end speech reco...
We propose a dynamic encoder transducer (DET) for on-device speech recog...
Word Error Rate (WER) has been the predominant metric used to evaluate t...
In this paper, we tackle the problem of handling narrowband and wideband...