CAMANet: Class Activation Map Guided Attention Network for Radiology Report Generation

by Jun Wang, et al.
University of Warwick

Radiology report generation (RRG) has gained increasing research attention because of its potential to mitigate medical resource shortages and to support radiologists in diagnostic decision making. Recent advances in RRG are largely driven by improving models' capabilities in encoding single-modal feature representations, while few studies explicitly explore the cross-modal alignment between image regions and words. Radiologists typically focus first on abnormal image regions before composing the corresponding text descriptions, so cross-modal alignment is important for learning an abnormality-aware RRG model. Motivated by this, we propose a Class Activation Map guided Attention Network (CAMANet), which explicitly promotes cross-modal alignment by using aggregated class activation maps to supervise cross-modal attention learning while simultaneously enriching the discriminative information. Experimental results demonstrate that CAMANet outperforms previous state-of-the-art (SOTA) methods on two commonly used RRG benchmarks.
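The idea of using aggregated class activation maps as a supervision target for attention can be illustrated with a minimal sketch. The abstract does not specify how the maps are aggregated or which loss is used, so everything below is an assumption made for illustration: the function names, the probability threshold, the ReLU-and-sum aggregation, the min-max normalisation, and the mean squared error loss are hypothetical choices, not the authors' method.

```python
# Illustrative sketch only: all names and design choices here are assumptions,
# not taken from the CAMANet paper.

def class_activation_maps(features, class_weights):
    """Compute one activation map per class.

    features: list of N region feature vectors, each of length D.
    class_weights: list of C classifier weight vectors, each of length D.
    Returns C maps of length N, where map[c][i] = dot(class_weights[c], features[i]).
    """
    return [
        [sum(w * f for w, f in zip(w_c, region)) for region in features]
        for w_c in class_weights
    ]


def aggregate_cams(cams, class_probs, threshold=0.5):
    """Aggregate per-class maps into a single region-importance map in [0, 1].

    Assumed scheme: keep classes whose predicted probability reaches the
    threshold (falling back to the most probable class), sum their
    ReLU-clipped maps, then min-max normalise.
    """
    selected = [c for c, p in enumerate(class_probs) if p >= threshold]
    if not selected:
        selected = [max(range(len(class_probs)), key=class_probs.__getitem__)]
    n_regions = len(cams[0])
    agg = [sum(max(cams[c][i], 0.0) for c in selected) for i in range(n_regions)]
    lo, hi = min(agg), max(agg)
    if hi == lo:
        return [0.0] * n_regions
    return [(a - lo) / (hi - lo) for a in agg]


def attention_alignment_loss(attention, target):
    """Mean squared error between region attention scores and the aggregated map."""
    return sum((a - t) ** 2 for a, t in zip(attention, target)) / len(target)


# Toy example: 4 image regions, 2-dimensional features, 2 classes.
features = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 0.0]]
class_weights = [[1.0, 0.0], [0.0, -1.0]]
class_probs = [0.9, 0.2]

cams = class_activation_maps(features, class_weights)
target = aggregate_cams(cams, class_probs)
uniform_attention = [0.25] * 4
loss = attention_alignment_loss(uniform_attention, target)
```

In this toy setup only the first class passes the threshold, so the target highlights the third region most strongly, and a uniform attention distribution is penalised relative to one that matches the target. In a real model the attention scores would come from the cross-modal attention module and the loss would be added to the report generation objective; how CAMANet does this is not stated in the abstract.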




