Multi-modal large language models (MLLMs) are trained based on large lan...
Medical imaging has witnessed remarkable progress but usually requires a...
The recent work CLIPA presents an inverse scaling law for CLIP training ...
CLIP, the first foundation model that connects images and text, has enab...
There has been a longstanding belief that generation can facilitate a tr...
In recent years, camera-based 3D object detection has gained widespread...
This paper presents a simple and effective visual prompting method for a...
Despite the superior performance brought by vision-and-language pretrain...
Modern deep neural networks tend to be evaluated on static test sets. On...
Federated embodied agent learning protects the data privacy of individua...
Video-language pre-training is crucial for learning powerful multi-modal...
Adversarial training (AT) with samples generated by Fast Gradient Sign M...
This paper studies the potential of distilling knowledge from pre-traine...
Data mixing (e.g., Mixup, Cutmix, ResizeMix) is an essential component f...
The recent success of Vision Transformers is shaking the long dominance ...
Unlearnable examples (ULEs) aim to protect data from unauthorized usage ...
The score-based query attacks (SQAs) pose practical threats to deep neur...
Image pre-training, the current de-facto paradigm for a wide range of vi...
Adversarial Propagation (AdvProp) is an effective way to improve recogni...
Deep neural networks are powerful tools for representation learning, but...
The success of language Transformers is primarily attributed to the pret...
Transformer emerges as a powerful tool for visual recognition. In additi...
While neural symbolic methods demonstrate impressive performance in visu...
Most machine learning models are validated and tested on fixed datasets....
Data augmentation has become a de facto component for training high-perf...
Batch normalization (BN) is a fundamental unit in modern deep networks, ...
Shape and texture are two prominent and complementary cues for recognizi...
It is commonly believed that networks cannot be both accurate and robust...
Patch-based attacks introduce a perceptible but localized change to the ...
Non-Local (NL) blocks have been widely studied in various vision tasks. ...
Adversarial examples are commonly viewed as a threat to ConvNets. Here w...
In this paper, we study physical adversarial attacks on object detectors...
Adversarial training is one of the main defenses against adversarial att...
This paper focuses on learning transferable adversarial examples specifi...
The recent development of adversarial attack has proven that ensemble-ba...
Adversarial attacks to image classification systems present challenges t...
To accelerate research on adversarial examples and robustness of machine...
Though convolutional neural networks have achieved state-of-the-art perf...
We propose a novel single shot object detection network named Detection ...
It is very attractive to formulate vision in terms of pattern theory Mum...
Convolutional neural networks have demonstrated their powerful ability o...
In this paper, we study the task of detecting semantic parts of an objec...
In this paper, we address the task of detecting semantic parts on partia...
It has been well demonstrated that adversarial examples, i.e., natural i...