Ultrasound Image Representation Learning by Modeling Sonographer Visual Attention

by Richard Droste et al.

Image representations are commonly learned from class labels, which are a simplistic approximation of human image understanding. In this paper we demonstrate that transferable representations of images can be learned without manual annotations by modeling human visual attention. The basis of our analyses is a unique gaze tracking dataset of sonographers performing routine clinical fetal anomaly screenings. Models of sonographer visual attention are learned by training a convolutional neural network (CNN) to predict gaze on ultrasound video frames through visual saliency prediction or gaze-point regression. We evaluate the transferability of the learned representations to the task of ultrasound standard plane detection in two contexts. Firstly, we perform transfer learning by fine-tuning the CNN with a limited number of labeled standard plane images. We find that fine-tuning the saliency predictor is superior to training from random initialization, with an average F1-score improvement of 9.6%. Secondly, we train a simple softmax regression on the feature activations of each CNN layer in order to evaluate the representations independently of transfer learning hyper-parameters. We find that the attention models derive strong representations, approaching the precision of a fully-supervised baseline model for all but the last layer.
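The layer-wise evaluation described above amounts to a linear probe: the attention model is frozen, activations of a given CNN layer are extracted, and a softmax (multinomial logistic) regression is fit on top. Below is a minimal NumPy sketch of such a probe; the function names, dimensions, and toy random features standing in for CNN activations are illustrative assumptions, not from the paper:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the class axis.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_softmax_probe(feats, labels, n_classes, lr=0.1, epochs=200):
    """Fit a softmax regression on frozen feature activations.

    feats:  (n_samples, n_features) pooled activations of one CNN layer
    labels: (n_samples,) integer standard-plane class labels
    """
    n, d = feats.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        probs = softmax(feats @ W + b)   # (n, n_classes)
        grad = probs - onehot            # gradient of cross-entropy w.r.t. logits
        W -= lr * feats.T @ grad / n
        b -= lr * grad.mean(axis=0)
    return W, b

# Toy example: two separable clusters as stand-in "layer activations".
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(-2, 1, (50, 8)), rng.normal(2, 1, (50, 8))])
labels = np.array([0] * 50 + [1] * 50)
W, b = fit_softmax_probe(feats, labels, n_classes=2)
acc = (softmax(feats @ W + b).argmax(axis=1) == labels).mean()
```

Because the probe is linear and shares no hyper-parameters with fine-tuning, its accuracy per layer isolates how much class-relevant structure each representation already contains.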

