Significantly improving zero-shot X-ray pathology classification via fine-tuning pre-trained image-text encoders

by   Jongseong Jang, et al.

Deep neural networks have been successfully adopted to diverse domains including pathology classification based on medical images. However, large-scale and high-quality data to train powerful neural networks are rare in the medical domain as the labeling must be done by qualified experts. Researchers recently tackled this problem with some success by taking advantage of models pre-trained on large-scale general domain data. Specifically, researchers took contrastive image-text encoders (e.g., CLIP) and fine-tuned it with chest X-ray images and paired reports to perform zero-shot pathology classification, thus completely removing the need for pathology-annotated images to train a classification model. Existing studies, however, fine-tuned the pre-trained model with the same contrastive learning objective, and failed to exploit the multi-labeled nature of medical image-report pairs. In this paper, we propose a new fine-tuning strategy based on sentence sampling and positive-pair loss relaxation for improving the downstream zero-shot pathology classification performance, which can be applied to any pre-trained contrastive image-text encoders. Our method consistently showed dramatically improved zero-shot pathology classification performance on four different chest X-ray datasets and 3 different pre-trained models (5.77 particular, fine-tuning CLIP with our method showed much comparable or marginally outperformed to board-certified radiologists (0.619 vs 0.625 in F1 score and 0.530 vs 0.544 in MCC) in zero-shot classification of five prominent diseases from the CheXpert dataset.


LiT: Zero-Shot Transfer with Locked-image Text Tuning

This paper presents contrastive-tuning, a simple method employing contra...

Local Contrastive Learning for Medical Image Recognition

The proliferation of Deep Learning (DL)-based methods for radiographic i...

Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime

This paper explores training medical vision-language models (VLMs) – whe...

CLIP-ReIdent: Contrastive Training for Player Re-Identification

Sports analytics benefits from recent advances in machine learning provi...

Improving Zero-Shot Detection of Low Prevalence Chest Pathologies using Domain Pre-trained Language Models

Recent advances in zero-shot learning have enabled the use of paired ima...

COAD: Contrastive Pre-training with Adversarial Fine-tuning for Zero-shot Expert Linking

Expert finding, a popular service provided by many online websites such ...

Exploring Image Augmentations for Siamese Representation Learning with Chest X-Rays

Image augmentations are quintessential for effective visual representati...

Please sign up or login with your details

Forgot password? Click here to reset