Improving Medical Annotation Quality to Decrease Labeling Burden Using Stratified Noisy Cross-Validation

by Joy Hsu, et al.

As machine learning has become increasingly applied to medical imaging data, noise in training labels has emerged as an important challenge. Variability in the diagnosis of medical images is well established; in addition, variability in training and attention to task among medical labelers may exacerbate this issue. Methods for identifying and mitigating the impact of low-quality labels have been studied, but are not well characterized in medical imaging tasks. For instance, Noisy Cross-Validation splits the training data into halves and has been shown to identify low-quality labels in computer vision tasks, but it has not been applied to medical imaging tasks specifically. In this work, we introduce Stratified Noisy Cross-Validation (SNCV), an extension of Noisy Cross-Validation. SNCV can provide estimates of confidence in model predictions by assigning a quality score to each example, stratify labels to handle class imbalance, and identify likely low-quality labels so that their causes can be analyzed. We assess the performance of SNCV on diagnosis of glaucoma suspect risk from retinal fundus photographs, a clinically important yet nuanced labeling task. Using training data from a previously published deep learning model, we compute a continuous quality score (QS) for each training example. We relabel 1,277 low-QS examples using a trained glaucoma specialist; the new labels agree with the SNCV prediction over the initial label >85% of the time, indicating that low-QS examples mostly reflect labeler errors. We then quantify the impact of training with only high-QS labels, showing that strong model performance may be obtained with many fewer examples. By applying the method to a randomly sub-sampled training dataset, we show that our method can reduce labeling burden by approximately 50% while maintaining model performance comparable to that obtained using the full dataset on multiple held-out test sets.
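
The abstract describes the core loop only in outline: split the training data into two halves, train on one half, score the other, and swap. The sketch below illustrates that idea under several assumptions that are not from the paper. "Stratify labels" is taken to mean splitting each class separately so both halves keep the original class proportions. The quality score of an example is taken to be the held-out model's predicted probability of that example's assigned label. A toy nearest-centroid classifier stands in for the deep learning model. All function names (`stratified_halves`, `fit_centroids`, `predict_proba`, `sncv_quality_scores`) are hypothetical.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def stratified_halves(labels: Sequence[int], seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split indices into two halves, shuffling and alternating within each class
    (assumed meaning of 'stratify'), so both halves keep similar class balance."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    half_a: List[int] = []
    half_b: List[int] = []
    for y in sorted(by_class):
        idx = by_class[y][:]
        rng.shuffle(idx)
        half_a.extend(idx[0::2])
        half_b.extend(idx[1::2])
    return half_a, half_b


def fit_centroids(X: Sequence[Vector], y: Sequence[int], idx: Sequence[int]) -> Dict[int, List[float]]:
    """Toy stand-in model: the mean feature vector of each class in the training half."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for i in idx:
        acc = sums.setdefault(y[i], [0.0] * len(X[i]))
        sums[y[i]] = [s + v for s, v in zip(acc, X[i])]
        counts[y[i]] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_proba(centroids: Dict[int, List[float]], x: Vector) -> Dict[int, float]:
    """Softmax over negative squared distances to each class centroid."""
    scores = {c: -sum((a - b) ** 2 for a, b in zip(x, mu)) for c, mu in centroids.items()}
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


def sncv_quality_scores(X: Sequence[Vector], y: Sequence[int], seed: int = 0) -> List[float]:
    """Assumed QS: probability that a model trained on the *other* half assigns
    to the example's given label. Low QS flags a likely low-quality label."""
    half_a, half_b = stratified_halves(y, seed)
    qs = [0.0] * len(y)
    for train_idx, test_idx in ((half_a, half_b), (half_b, half_a)):
        centroids = fit_centroids(X, y, train_idx)
        for i in test_idx:
            # A class absent from the training half gets probability 0.
            qs[i] = predict_proba(centroids, X[i]).get(y[i], 0.0)
    return qs


# Toy data: two clusters; index 3 lies in the class-0 cluster but is labeled 1,
# simulating a labeler error.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [0.0, 0.0],
     [5.0, 5.1], [5.2, 5.0], [4.9, 5.2], [5.1, 4.9]]
y = [0, 0, 0, 1, 1, 1, 1, 1]

qs = sncv_quality_scores(X, y)
print([round(q, 3) for q in qs])  # the mislabeled example (index 3) gets a QS near 0
```

Examples with the lowest scores would be the candidates for relabeling by a specialist, as in the 1,277-example relabeling study above. In practice one would substitute the real classifier and possibly repeat the split with several seeds.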
