We introduce a new class of objectives for optimal transport computation...
Large Language Models (LLMs), despite their recent impressive accomplish...
The theory of greedy low-rank learning (GLRL) aims to explain the impres...
Transformer architecture has shown impressive performance in multiple re...
Large Language Models (LLMs), armed with billions of parameters, exhibit...
The high computational and memory requirements of large language model (...
Communication compression is a crucial technique for modern distributed...
Training foundation models, such as GPT-3 and PaLM, can be extremely exp...
Overparameterized neural networks generalize well but are expensive to t...
The accuracy and completeness of population estimation would significant...
Recent advances in efficient Transformers have exploited either the spar...
Softmax classifiers with a very large number of classes naturally occur ...
Dense embedding models are commonly deployed in commercial search engine...
Efficient inference for wide output layers (WOLs) is an essential yet ch...
Although convolutional neural networks (CNNs) are inspired by the mechan...
Stochastic Gradient Descent or SGD is the most popular optimization algo...
Deep Learning (DL) algorithms are the central focus of modern machine le...
Entity resolution identifies and removes duplicate entities in large, no...
WTA (Winner Take All) hashing has been successfully applied in many larg...