Vision and Language Models (VLMs), such as CLIP, have enabled visual
rec...
Vision and Language (VL) models offer an effective method for aligning
r...
Recently, large-scale pre-trained Vision and Language (VL) models have s...
Recent models such as XLS-R and Whisper have made multilingual speech
te...
Vision and Language (VL) models have demonstrated remarkable zero-shot
p...
Transformations based on domain expertise (expert transformations), such...
Large-scale pre-trained Vision Language (VL) models have shown remar...
Large scale Vision-Language (VL) models have shown tremendous success in...
Prompt tuning, in which a base pretrained model is adapted to each task ...
Scaling transformers has led to significant breakthroughs in many domain...
Action recognition models have achieved impressive results by incorporat...
The goal of Anomaly-Detection (AD) is to identify outliers, or outlying
...
Computer vision models suffer from a phenomenon known as catastrophic
fo...
Generalized Zero-Shot Learning (GZSL) aims to train a classifier that ca...
Vision and Language (VL) models have demonstrated remarkable zero-shot
p...
Recently, large-scale pre-trained Vision-and-Language (VL) foundation mo...
State-of-the-art (SOTA) semi-supervised learning (SSL) methods have been...
Multilingual text-video retrieval methods have improved significantly in...
Vision-language models trained on large, randomly collected data had
sig...
Foundation Models (FMs) have demonstrated unprecedented capabilities
inc...
This technical report describes the SViT approach for the Ego4D Point of...
Recent action recognition models have achieved impressive results by
int...
The ability to generalize learned representations across significantly
d...
Pre-training models on Imagenet or other massive datasets of real images...
The digital conversion of information stored in documents is a great sou...
Most existing works in few-shot learning rely on meta-learning the netwo...
Nowadays, there is an abundance of data involving images and surrounding...
Tremendous progress has been made in visual representation learning, not...
Temporal modelling is the key for efficient video action recognition. Wh...
As machine learning algorithms grow in popularity and diversify to many
...
Few-shot learning methods offer pre-training techniques optimized for ea...
Action recognition is an open and challenging problem in computer vision...
Data augmentation is one of the most important tools in training modern ...
In this paper, we propose a new few-shot learning method called StarNet,...
The field of Few-Shot Learning (FSL), or learning from very few (typical...
Recent progress on few-shot learning has largely re-lied on annotated da...
Few-Shot Learning (FSL) is a topic of rapidly growing interest. Typicall...
Learning from one or few visual examples is one of the key capabilities ...
Example synthesis is one of the leading methods to tackle the problem of...
Deep neural networks, trained with large amount of labeled data, can fai...
Learning to classify new categories based on just one or a few examples ...
Distance metric learning (DML) has been successfully applied to object
c...