Graph neural networks (GNNs) have pioneered advancements in graph repres...
We study the implicit bias of batch normalization trained by gradient de...
Large Language Models (LLMs) have been applied in the speech domain, oft...
Large language models (LLMs) can learn to perform a wide range of natura...
We study (differentially) private federated learning (FL) of language mo...
Language models are increasingly being deployed for general problem solv...
Face clustering can provide pseudo-labels to the massive unlabeled face ...
Gradient regularization, as described in <cit.>, is a highly effective t...
Mixup, a simple data augmentation method that randomly mixes two data po...
Data multiplexing is a recently proposed method for improving a model's ...
The rapid scaling of language models is motivating research using low-bi...
The session-based recommendation (SBR) problem, which focuses on next-item p...
We propose AnyTOD, an end-to-end task-oriented dialog (TOD) system with ...
Most research on task-oriented dialog modeling is based on written text ...
Hashing has been widely researched to solve the large-scale approximate
...
Knowledge (including structured knowledge such as schema and ontology, a...
While large language models (LLMs) have demonstrated impressive capabili...
Recent works have demonstrated a double descent phenomenon in over-param...
Carefully-designed schemas describing how to collect and annotate dialog...
In this paper we share findings from our effort to build practical machi...
Label smoothing is ubiquitously applied in Neural Machine Translation (N...
Building universal dialogue systems that can seamlessly operate across m...
As deep learning becomes the mainstream in the field of natural language...
Multilingual neural machine translation models are trained to maximize t...
Modern neural networks often have great expressive power and can be trai...
Task-oriented dialogue (TOD) systems are required to identify key inform...
Achieving universal translation between all human language pairs is the ...
"Benign overfitting", where classifiers memorize noisy training data yet...
Encoder-decoder networks with attention have proven to be a powerful way...
Zero/few-shot transfer to unseen services is a critical challenge in tas...
Federated learning is used for decentralized training of machine learnin...
This paper explores zero-label learning in Natural Language Processing (...
Sequence-to-sequence models have been applied to a wide variety of NLP t...
Adaptive gradient methods such as Adam have gained increasing popularity...
With recent progress in joint modeling of visual and textual representat...
The presence of haze significantly reduces the quality of images. Resear...
Let 𝔽_q be the finite field of q elements and let D_2n=⟨ x,y | x^n=1, y^2...
Modern machine learning systems such as deep neural networks are often h...
Dialogue state tracking (DST) is a pivotal component in task-oriented di...
We propose automatic speech recognition (ASR) models inspired by echo st...
We consider a one-hidden-layer leaky ReLU network of arbitrary width tra...
The predictions of wide Bayesian neural networks are described by a Gaus...
Despite the widespread application of recurrent neural networks (RNNs) a...
One challenge of machine translation is how to quickly adapt to unseen d...
Most undeciphered lost languages exhibit two characteristics that pose s...
Massively multilingual models subsuming tens or even hundreds of languag...
We analyze the properties of gradient descent on convex surrogates for t...
For any positive integers m and k, existing literature only determines t...
We consider the problem of learning the best-fitting single neuron as me...
Over the last few years two promising research directions in low-resourc...