Performance assessment of the deep learning technologies in grading glaucoma severity
Objective: To validate and compare the performance of eight available deep learning architectures in grading the severity of glaucoma based on color fundus images. Materials and Methods: We retrospectively collected a dataset of 5978 fundus images, and the glaucoma severity of each image was annotated by the consensus of two experienced ophthalmologists. We preprocessed the images to generate global and local regions of interest (ROIs), namely the global field-of-view images and the local disc region images. We then divided the generated images into three independent subsets for training, validation, and testing. Using these datasets, eight convolutional neural networks (CNNs) (i.e., VGG16, VGG19, ResNet, DenseNet, InceptionV3, InceptionResNet, Xception, and NASNetMobile) were trained separately to grade glaucoma severity and were evaluated quantitatively using the area under the receiver operating characteristic (ROC) curve and the quadratic kappa score. Results: The CNNs, except VGG16 and VGG19, achieved average kappa scores of 80.36 when trained from scratch on global and local ROIs and 85.29 when fine-tuned from pre-trained weights. VGG16 and VGG19 achieved reasonable accuracy when trained from scratch, but they failed when using pre-trained weights on both global and local ROIs. Among these CNNs, DenseNet had the highest classification accuracy: 75.50 with pre-trained weights on global ROIs, compared with 65.50 on local ROIs. Conclusion: The experiments demonstrated the feasibility of deep learning technology in grading glaucoma severity. In particular, global field-of-view images contain relatively richer information that may be critical for glaucoma assessment, suggesting that the entire field of view of a fundus image should be used when training a deep learning network.
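As an illustration of the quadratic kappa score named above, the following is a minimal sketch of the standard quadratic weighted kappa for ordinal grades. The function name, the integer encoding of severity grades (0 to n_classes - 1), and the example labels are assumptions made for illustration; the abstract does not state the number of severity grades or the implementation the authors used.

```python
import numpy as np


def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Quadratic weighted kappa between two sets of ordinal grades.

    Grades are assumed to be integers in [0, n_classes - 1]
    (a hypothetical encoding, not taken from the paper).
    """
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)

    # Observed confusion matrix: rows are reference grades, columns are predictions.
    observed = np.zeros((n_classes, n_classes), dtype=float)
    for t, p in zip(y_true, y_pred):
        observed[t, p] += 1

    # Expected matrix if reference and predicted grades were independent.
    hist_true = observed.sum(axis=1)
    hist_pred = observed.sum(axis=0)
    expected = np.outer(hist_true, hist_pred) / observed.sum()

    # Quadratic penalty: disagreements cost more the further apart the grades are.
    idx = np.arange(n_classes)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n_classes - 1) ** 2

    denom = (weights * expected).sum()
    if denom == 0:
        # Both label sets contain one and the same grade: treat as full agreement.
        return 1.0
    return 1.0 - (weights * observed).sum() / denom


# Hypothetical grades for six images on a three-level severity scale.
reference = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 1]
kappa = quadratic_weighted_kappa(reference, predicted, n_classes=3)
```

Because the penalty grows with the squared distance between grades, confusing adjacent severity levels lowers the score less than confusing the mildest and most severe levels, which is why this metric suits ordinal grading tasks.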