Exploiting CLIP-based Multi-modal Approach for Artwork Classification and Retrieval

09/21/2023
by Alberto Baldrati, et al.

Recent advances in multimodal image pretraining show that visual models trained with semantically dense textual supervision tend to generalize better than those trained on categorical attributes or with unsupervised techniques. In this work we investigate how the recent CLIP model can be applied to several tasks in the artwork domain. We perform exhaustive experiments on NoisyArt, a dataset of artwork images crawled from public resources on the web. On this dataset CLIP achieves impressive results on (zero-shot) classification and promising results in both artwork-to-artwork and description-to-artwork retrieval.
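
The abstract does not describe the procedure in detail. As a rough illustration only, CLIP-style zero-shot classification and retrieval are commonly done by ranking cosine similarities between embeddings in a shared image-text space. The sketch below assumes the embeddings have already been computed; the 3-dimensional vectors, the class names, and the helper functions are hypothetical stand-ins, not the authors' code or data.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_scores(query, candidates):
    """Cosine similarity between one query vector and each candidate vector."""
    q = l2_normalize(query)
    return [sum(a * b for a, b in zip(q, l2_normalize(c))) for c in candidates]


def zero_shot_classify(image_embedding, class_text_embeddings):
    """Return the index of the class whose text embedding is closest to the image."""
    scores = cosine_scores(image_embedding, class_text_embeddings)
    return max(range(len(scores)), key=scores.__getitem__)


def retrieve(query_embedding, gallery_embeddings, k=2):
    """Return indices of the k gallery items most similar to the query."""
    scores = cosine_scores(query_embedding, gallery_embeddings)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical toy vectors standing in for text-encoder and image-encoder outputs.
class_names = ["artwork_a", "artwork_b", "artwork_c"]
text_embeddings = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
image_embedding = [0.2, 0.9, 0.1]

# Zero-shot classification: pick the class description nearest to the image.
predicted = class_names[zero_shot_classify(image_embedding, text_embeddings)]

# Retrieval: rank gallery items against a query embedding. The same ranking
# serves artwork-to-artwork (image query) and description-to-artwork (text query).
top_matches = retrieve([0.9, 0.2, 0.1], text_embeddings, k=2)
```

With a real model, the toy lists would be replaced by the outputs of the image and text encoders; the ranking logic stays the same.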


research
07/23/2020

ZSCRGAN: A GAN-based Expectation Maximization Model for Zero-Shot Retrieval of Images from Textual Descriptions

Most existing algorithms for cross-modal Information Retrieval are based...
research
12/27/2021

A Fistful of Words: Learning Transferable Visual Models from Bag-of-Words Supervision

Using natural language as a supervision for training visual recognition ...
research
02/01/2023

Learning Generalized Zero-Shot Learners for Open-Domain Image Geolocalization

Image geolocalization is the challenging task of predicting the geograph...
research
05/25/2023

DiffCLIP: Leveraging Stable Diffusion for Language Grounded 3D Classification

Large pre-trained models have had a significant impact on computer visio...
research
07/31/2023

Bridging the Gap: Exploring the Capabilities of Bridge-Architectures for Complex Visual Reasoning Tasks

In recent times there has been a surge of multi-modal architectures base...
research
07/08/2021

Exploiting the relationship between visual and textual features in social networks for image classification with zero-shot deep learning

One of the main issues related to unsupervised machine learning is the c...
research
05/19/2014

ESSP: An Efficient Approach to Minimizing Dense and Nonsubmodular Energy Functions

Many recent advances in computer vision have demonstrated the impressive...
