BDC-Adapter: Brownian Distance Covariance for Better Vision-Language Reasoning

09/03/2023
by Yi Zhang, et al.

Large-scale pre-trained Vision-Language Models (VLMs), such as CLIP and ALIGN, have introduced a new paradigm for learning transferable visual representations. Recently, there has been a surge of interest in lightweight fine-tuning techniques that adapt these models to downstream visual tasks. We observe that current state-of-the-art fine-tuning methods, such as Tip-Adapter, consider only the covariance between the query image feature and the features of the few-shot support training samples. Covariance captures only linear relations, so it can give a misleading impression of independence when the features are in fact non-linearly related. To address this issue, we introduce Brownian Distance Covariance (BDC) to the field of vision-language reasoning. The BDC metric can model non-linear as well as linear relations, providing a robust measure of feature dependence. Based on this, we present a method called BDC-Adapter, which integrates BDC prototype similarity reasoning and multi-modal reasoning network prediction to perform classification. Our extensive experimental results show that the proposed BDC-Adapter handles non-linear relations and fully characterizes independence, outperforming the current state-of-the-art methods by large margins.
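The contrast between ordinary covariance and distance covariance can be illustrated on toy data. The sketch below is not the BDC-Adapter implementation; it assumes the standard sample estimator of distance covariance (double-centered pairwise distance matrices) and uses hypothetical helper names (`_double_centered_distances`, `distance_correlation`). It compares the Pearson correlation with the distance correlation for a purely non-linear relation, y = x^2.

```python
import numpy as np


def _double_centered_distances(v):
    """Pairwise Euclidean distance matrix with row, column and grand means removed."""
    v = np.asarray(v, dtype=float)
    if v.ndim == 1:
        v = v[:, None]  # treat a 1-D input as n samples of dimension 1
    d = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=-1)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation (hypothetical helper, illustration only)."""
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    dcov2 = (a * b).mean()    # squared sample distance covariance
    dvar2_x = (a * a).mean()  # squared sample distance variance of x
    dvar2_y = (b * b).mean()  # squared sample distance variance of y
    denom = np.sqrt(dvar2_x * dvar2_y)
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


# y depends on x deterministically, but the dependence is not linear.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2

pearson = float(np.corrcoef(x, y)[0, 1])
dcor = distance_correlation(x, y)

print(f"Pearson correlation:  {pearson:.4f}")
print(f"Distance correlation: {dcor:.4f}")
```

On this symmetric grid the Pearson correlation is zero up to floating-point error, while the distance correlation is clearly positive, which is the failure mode of linear covariance that the abstract describes.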


