Improving Joint Learning of Chest X-Ray and Radiology Report by Word Region Alignment

09/04/2021
by Zhanghexuan Ji, et al.
Self-supervised learning provides an opportunity to exploit, without manual supervision, the unlabeled chest X-rays and associated free-text reports that accumulate in clinical routine. This paper proposes a Joint Image Text Representation Learning Network (JoImTeRNet) for pre-training on chest X-ray images and their radiology reports. The model is pre-trained for visual-textual matching at both the global image-sentence level and the local image region-word level. Both levels are bidirectionally constrained by Cross-Entropy based and ranking-based Triplet Matching Losses. The region-word matching is calculated using the attention mechanism, without direct supervision of the mapping between regions and words. The pre-trained multi-modal representation paves the way for downstream tasks concerning image and/or text encoding. We demonstrate the quality of the learned representation by cross-modality retrieval and multi-label classification on two datasets: OpenI-IU and MIMIC-CXR.
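The two matching levels described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract does not give the exact loss formulation, so the hinge form, the margin of 0.2, the softmax temperature of 4.0, and the function names `bidirectional_triplet_loss` and `region_word_score` are all assumptions chosen for illustration. The Cross-Entropy based matching loss is not shown.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def bidirectional_triplet_loss(img_emb, txt_emb, margin=0.2):
    """Hinge-based ranking loss over a batch of paired global embeddings.

    Assumption: row i of img_emb matches row i of txt_emb, and every other
    row in the batch serves as a negative. The margin is a hypothetical value.
    """
    sim = l2_normalize(img_emb) @ l2_normalize(txt_emb).T  # (B, B)
    pos = np.diag(sim)  # similarity of each matched pair
    # Image -> text: the matched report should outrank the other reports.
    cost_i2t = np.maximum(0.0, margin + sim - pos[:, None])
    # Text -> image: the matched image should outrank the other images.
    cost_t2i = np.maximum(0.0, margin + sim - pos[None, :])
    off_diag = ~np.eye(sim.shape[0], dtype=bool)  # skip the positives
    return (cost_i2t[off_diag].sum() + cost_t2i[off_diag].sum()) / sim.shape[0]


def region_word_score(regions, words, temperature=4.0):
    """Attention-based local matching score without region-word labels.

    Each word attends over all image regions; the score is the mean cosine
    similarity between a word and its attended visual context.
    The temperature is a hypothetical value.
    """
    r = l2_normalize(regions)  # (R, D)
    w = l2_normalize(words)    # (W, D)
    logits = temperature * (w @ r.T)             # (W, R)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)      # softmax over regions
    context = l2_normalize(attn @ r)             # (W, D)
    return float(np.mean(np.sum(w * context, axis=1)))
```

In this sketch the loss is zero when every matched pair beats all negatives by at least the margin, and the local score rises as words align with specific regions, which is the behavior a region-word alignment objective is meant to reward.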

research
11/04/2021

Generalized Radiograph Representation Learning via Cross-supervision between Images and Free-text Radiology Reports

Pre-training lays the foundation for recent successes in radiograph anal...
research
03/18/2022

Graph-Text Multi-Modal Pre-training for Medical Representation Learning

As the volume of Electronic Health Records (EHR) sharply grows, there ha...
research
03/30/2021

Self-supervised Image-text Pre-training With Mixed Data In Chest X-rays

Pre-trained models, e.g., from ImageNet, have proven to be effective in ...
research
10/06/2021

Improving Pneumonia Localization via Cross-Attention on Medical Images and Reports

Localization and characterization of diseases like pneumonia are primary...
research
07/10/2020

EMIXER: End-to-end Multimodal X-ray Generation via Self-supervision

Deep generative models have enabled the automated synthesis of high-qual...
research
09/05/2018

Bimodal network architectures for automatic generation of image annotation from text

Medical image analysis practitioners have embraced big data methodologie...
research
11/21/2018

Unsupervised Multimodal Representation Learning across Medical Images and Reports

Joint embeddings between medical imaging modalities and associated radio...
