Supporting Vision-Language Model Inference with Causality-pruning Knowledge Prompt

by Jiangmeng Li et al.

Vision-language models are pre-trained by aligning image-text pairs in a common embedding space, which lets them handle open-set visual concepts by learning semantic information from textual labels. To boost the transferability of these models to downstream tasks in a zero-shot manner, recent works explore generating fixed or learnable prompts, i.e., classification weights synthesized from natural language describing task-relevant categories, to reduce the gap between the training and test phases. However, how and what kinds of prompts improve inference performance remains unclear. In this paper, we explicitly explore and clarify the importance of including semantic information in prompts, whereas existing prompt methods generate prompts without exploiting the semantic information of textual labels. A challenging issue is that manually constructing prompts with rich semantic information requires domain expertise and is extremely time-consuming. To this end, we propose Causality-pruning Knowledge Prompt (CapKP) for adapting pre-trained vision-language models to downstream image recognition. CapKP retrieves an ontological knowledge graph by treating the textual label as a query to explore task-relevant semantic information. To further refine the derived semantic information, CapKP introduces causality-pruning following the first principle of Granger causality. Empirically, we conduct extensive evaluations to demonstrate the effectiveness of CapKP, e.g., with 8 shots, CapKP outperforms the manual-prompt method by 12.51% and the learnable-prompt method by 1.39%. Experimental results further demonstrate the superiority of CapKP in domain generalization compared to benchmark approaches.
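The pipeline the abstract describes — query a knowledge graph with the class label, prune weakly related concepts, and compose a semantically rich prompt — can be sketched in a few lines. This is an illustrative toy, not the paper's method: the `KNOWLEDGE_GRAPH` dictionary, the relevance scores, and the fixed pruning threshold are hypothetical stand-ins for CapKP's ontological graph retrieval and Granger-causality-based pruning.

```python
# Toy "knowledge graph": label -> related concepts with a relevance score.
# The scores are a hypothetical stand-in for the causal contribution that
# CapKP estimates via Granger causality; real systems would retrieve these
# from an ontological knowledge base.
KNOWLEDGE_GRAPH = {
    "goldfish": [("freshwater fish", 0.9), ("orange scales", 0.8), ("bowl", 0.2)],
    "airliner": [("jet aircraft", 0.9), ("passenger cabin", 0.7), ("sky", 0.1)],
}

def prune_concepts(concepts, threshold=0.5):
    """Keep only strongly related concepts (mimicking causality pruning)."""
    return [concept for concept, score in concepts if score >= threshold]

def build_prompt(label, graph):
    """Compose a semantic prompt from the label and its pruned neighbors."""
    facts = prune_concepts(graph.get(label, []))
    if not facts:
        return f"a photo of a {label}."
    return f"a photo of a {label}, which has {', '.join(facts)}."

# One enriched prompt per class; in a CLIP-style model, each prompt would be
# encoded by the text encoder to form that class's classification weight.
prompts = {label: build_prompt(label, KNOWLEDGE_GRAPH) for label in KNOWLEDGE_GRAPH}
```

For example, the "goldfish" prompt retains "freshwater fish" and "orange scales" but drops the weakly related "bowl", so the text encoder receives task-relevant semantics rather than a bare class name.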




PRE: Vision-Language Prompt Learning with Reparameterization Encoder

Large pre-trained vision-language models such as CLIP have demonstrated ...

Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition

This work proposes POMP, a prompt pre-training method for vision-languag...

Pre-trained Language Model with Prompts for Temporal Knowledge Graph Completion

Temporal knowledge graph completion (TKGC) is a crucial task that involv...

Learning to Prompt for Vision-Language Models

Vision-language pre-training has recently emerged as a promising alterna...

OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal Regression

This paper presents a language-powered paradigm for ordinal regression. ...

CKG: Dynamic Representation Based on Context and Knowledge Graph

Recently, neural language representation models pre-trained on large cor...

How does a Pre-Trained Transformer Integrate Contextual Keywords? Application to Humanitarian Computing

In a classification task, dealing with text snippets and metadata usuall...
