Fundus Image Analysis for Age Related Macular Degeneration: ADAM-2020 Challenge Report
Sharath M Shankaranarayana
Zasti.AI [email protected]
Abstract.
Age related macular degeneration (AMD) is one of the major causes of blindness in the elderly population. In this report, we propose deep learning based methods for retinal analysis using color fundus images for computer aided diagnosis of AMD. We leverage recent state of the art deep networks to build a single fundus image based AMD classification pipeline. We also propose methods for other directly relevant and auxiliary tasks such as lesion detection and segmentation, fovea detection and optic disc segmentation. We propose the use of generative adversarial networks (GANs) for the tasks of segmentation and detection, and a novel method of fovea detection using GANs.
Keywords:
Age Related Macular Degeneration · deep learning · classification · detection · segmentation.

With the advancement of the medical field and the resulting increase in life expectancy, age-related diseases also tend to increase, placing a burden on healthcare providers. Age related macular degeneration (AMD) is one such disease, which affects the elderly and can potentially cause loss of vision. Early detection is very important for the prevention and treatment of AMD. Color fundus imaging (CFI) is one of the quickest retinal imaging modalities and is very useful in monitoring a large number of retinal diseases. However, conclusive diagnosis of AMD is done predominantly by examining another retinal imaging modality, optical coherence tomography (OCT), since most ophthalmologists find it difficult to accurately diagnose AMD based on CFI alone. Moreover, the task of detecting abnormalities in the retina such as drusen, exudates, hemorrhages etc. is a labor intensive and time consuming process even for experts. This necessitates automated methods for fundus image analysis for the detection of AMD.

There are only a few works on automated AMD detection using color fundus images. Recently, the authors of [6] proposed a deep convolutional neural network (CNN) based classification system for AMD, along with a large dataset for AMD related research. Instead of directly classifying retinal images as AMD or not, the authors use the Age-Related Eye Disease Study (AREDS) Simplified Severity Scale to predict the risk of progression to late AMD. In another recent
work [11], the authors propose a deep network which detects the presence of dry and wet AMD using both fundus images as well as OCT slices.

In this report, we propose methods for various tasks related to retinal image analysis to aid the detection of AMD. We propose methods for single image level grading of AMD using CFI, as well as methods for the segmentation of various kinds of lesions found in the retina. We also propose a novel method for the localization of the fovea in fundus images.

Fig. 1. Proposed pipeline for AMD classification
The first task of the challenge requires us to predict the probability of AMD for a given retinal fundus image. The dataset consists of training images, of which only a fraction have AMD and the rest do not. To overcome the inherent imbalance in the dataset, we resort to data augmentation along with oversampling of the AMD fundus images. For data augmentation, we employ the following techniques:

1. Random flipping and rotation
2. Photometric distortion
3. Specific histogram based image processing techniques such as histogram equalization, adaptive histogram equalization, intensity rescaling at different levels, and histogram matching by randomly selecting a few canonical images from the validation set.
With the augmented dataset, we then employ multiple pretrained deep convolutional neural networks (CNNs) for the task of binary classification. The classification networks employed are:

1. EfficientNet [10]: EfficientNets are a recently introduced class of networks that employ a model scaling method to scale up CNNs in a more structured way; they have been shown to surpass the performance of other deep networks on the ImageNet dataset with better efficiency. We use multiple classes of EfficientNets: EfficientNet-B4, EfficientNet-B5, EfficientNet-B6 and EfficientNet-B7. We use them because they are optimized for training at larger resolutions when compared to the standard resolution of other ImageNet pretrained networks.
2. Inception-Resnet [9]: We use the Inception-Resnet architecture since it combines the two most commonly used blocks: the inception block, which helps in multiscale feature extraction, and the residual block, which helps in faster convergence and alleviates vanishing gradients. Together, these blocks help in better feature extraction.
3. Resnext [12]: Resnext is another highly modularized network architecture for image classification, which has also proved to be state-of-the-art on the ImageNet classification task. Along with ImageNet pretrained Resnext, we also use pretrained weights obtained by weakly supervised learning on the Instagram dataset [4].
4. Squeeze and Excitation networks [3]: We use this class of network architectures since they consist of "Squeeze-and-Excitation" (SE) blocks that tend to generalize well across different datasets.

Since AMD is characterized by abnormalities in the macular region, we also crop out the macular region at various zoom levels. Finally, with all the networks trained for the AMD classification task, we ensemble the network predictions using simple averaging of posterior probabilities (the overall block diagram is shown in Fig. 1).
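The ensembling step amounts to averaging the softmax posteriors across the trained models. A minimal numpy sketch (the function name and the two-logit binary setup are illustrative assumptions, not the challenge code):

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_amd_probability(per_model_logits):
    """Average the posterior P(AMD) over an ensemble of binary classifiers.

    per_model_logits: list of arrays of shape (2,) holding
    [non-AMD, AMD] logits, one per trained network.
    """
    posteriors = np.stack([softmax(l) for l in per_model_logits])  # (M, 2)
    return posteriors.mean(axis=0)[1]  # mean probability of the AMD class
```

The same averaging applies unchanged when each model additionally contributes several test-time-augmented predictions: they simply become extra rows in the stack.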
For the second task, we train a deep network for the semantic segmentation of the optic disc (OD). We employ the same methodology proposed in our previous work [7] for training. However, instead of using the RGB color channels, we use inverted green channel images as proposed in [8], since the inverted green channel provides better contrast between the OD and the background. We use an adversarial training setting with ResUnet as our base architecture, as described in [7]; the reader is advised to refer to [7] for details. Once we train the network and predict the segmentation maps for the given retinal images, we perform some post-processing operations on the predicted maps. We first keep only the largest connected component in the binary segmentation map and remove the other smaller components, and then apply a convex-hull operator to obtain the final segmentation map.
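The two post-processing steps can be sketched with scipy; `postprocess_od_mask` is an illustrative name, and filling the hull via a Delaunay membership test is one way (of several) to realize the convex-hull operator on a pixel grid:

```python
import numpy as np
from scipy import ndimage
from scipy.spatial import Delaunay

def postprocess_od_mask(mask):
    """Keep only the largest connected component of a binary OD mask,
    then fill its convex hull."""
    labeled, n = ndimage.label(mask)
    if n == 0:
        return np.zeros_like(mask)
    # Component sizes for labels 1..n; keep the biggest one.
    sizes = ndimage.sum(mask, labeled, index=range(1, n + 1))
    largest = labeled == (np.argmax(sizes) + 1)
    pts = np.argwhere(largest)
    if len(pts) < 3:
        return largest.astype(mask.dtype)
    # A pixel lies inside the convex hull iff it falls in some Delaunay simplex.
    tri = Delaunay(pts)
    grid = np.indices(mask.shape).reshape(2, -1).T
    hull = (tri.find_simplex(grid) >= 0).reshape(mask.shape)
    return hull.astype(mask.dtype)
```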
Fig. 2.
Process of creating distance maps
Additionally, the task also requires us to perform optic disc detection. For this, we simply apply a threshold on the area of the largest connected component in the segmentation map during the post-processing stage, and discard the prediction if the area is smaller than the threshold.
This task requires us to predict the point coordinates of the location of the fovea. Instead of treating it as a standard coordinate regression problem, we convert it into an image to image translation problem. From the groundtruth point coordinates, we create distance maps having the same size as the fundus images. The distance maps are computed using the Euclidean distance transform from the fovea location, i.e., pixels farther from the fovea have higher values than pixels closer to it. For the purpose of easier training, we normalize the distance map and then invert it, so that pixels nearer to the fovea have higher values. We then truncate it so that it only contains a specific radius around the fovea. This is done to improve training and is illustrated in Fig. 2.

We then treat it as a paired image to image translation problem [13], as we did for the OD segmentation task, similar to our work in [7]. The overall block diagram of the generative adversarial network (GAN) framework used for the image translation task is shown in Fig. 3. We train the generator to learn a mapping from the input fundus image x to the fovea distance map y, G : x → y. We train the discriminator to distinguish between the generated distance map and the real distance map:

L_GAN(G, D) = E_{x,y}[log(D(y))] + E_x[log(1 − D(G(x)))]    (1)

where E_{x,y} represents the expectation of the log-likelihood of the pair (x, y) being sampled from the underlying probability distribution of real pairs of input fundus images and groundtruth distance maps.
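The target construction above (Euclidean distance transform, inversion, truncation) can be sketched as follows; normalizing by the truncation radius is an assumption about how the normalize-invert-truncate steps are combined, and the function name is illustrative:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def fovea_distance_map(shape, fovea_rc, radius):
    """Inverted, normalized Euclidean distance map truncated to
    `radius` pixels around the groundtruth fovea location `fovea_rc`."""
    seed = np.ones(shape, dtype=np.uint8)
    seed[fovea_rc] = 0                    # EDT measures distance to this zero
    dist = distance_transform_edt(seed)
    target = 1.0 - dist / radius          # invert: 1 at the fovea, falling off
    return np.clip(target, 0.0, 1.0)      # truncate outside the radius
```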
Fig. 3.
Overall block diagram of fovea distance map regression
Additionally, we also use an L1 loss between the generator predicted distance map and the groundtruth distance map, and therefore the final objective becomes

G* = arg min_G max_D L_GAN(G, D) + λ L_L1(G)    (2)

where λ balances the contribution of the two losses. In equation (2), the discriminator tries to maximize the expression by classifying whether the distance map is real or generated, while the generator tries to minimize both the adversarial loss and the L1 loss.

The architecture of the generator part of the GAN is described in the next section, since the base architecture is the same for this task as well as the lesion segmentation task. The discriminator uses a conventional CNN architecture used for classification. Let Ln denote a Convolution-BatchNorm-ReLU layer with n filters. The discriminator uses the following architecture: L64-L128-L256-L512-L512-L512.

Finally, with the predicted distance maps from the generator, we extract the fovea point coordinates by performing a post-processing operation. Ideally, we could just take the pixel with the highest intensity as the fovea coordinate, but doing so is sensitive to other erroneous high-intensity regions. Therefore we cluster the top one percent of intensities and segment out the largest cluster. The centroid of this largest cluster gives us the fovea coordinate.

For the fourth task, given a fundus image, we need to segment out various kinds of lesions such as drusen, exudate, hemorrhage, scar and others. Similar to the previous two tasks, we employ GAN based frameworks for lesion segmentation. The only difference between OD segmentation in Task 2 and this task is the generator architecture. We would like to mention once again that this generator architecture was used for the fovea localization task as well.
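The fovea coordinate extraction described above (top one percent of intensities, largest cluster, centroid) can be sketched as follows; `extract_fovea_coordinate` is an illustrative name, and taking the 99th percentile as the cluster threshold is an assumption about how "top one percent" is implemented:

```python
import numpy as np
from scipy import ndimage

def extract_fovea_coordinate(pred_map):
    """Recover the fovea (row, col) from a predicted distance map:
    keep the top 1% of intensities, take the largest connected cluster,
    and return its centroid."""
    thresh = np.quantile(pred_map, 0.99)
    top = pred_map >= thresh
    labeled, n = ndimage.label(top)
    sizes = ndimage.sum(top, labeled, index=range(1, n + 1))
    largest_label = int(np.argmax(sizes)) + 1
    return ndimage.center_of_mass(top, labeled, largest_label)  # (row, col)
```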
Fig. 4.
Base Architecture or Generator Architecture for GAN
Base Architecture:
The architecture of the base network (or the generator network) is shown in Fig. 4. The input image is first passed through three convolutional layers, with strided convolution (stride 2) used for downsampling in the second and third layers. These initial layers serve as coarse feature extraction layers. The network then consists of special blocks inspired by [1] and [7]. After a specific number of special blocks, the network consists of strided deconvolutional layers for upsampling and finally a convolution for mapping to the output. All layers are followed by batch normalization and ReLU operations, except the final layer, which is followed by a tanh activation. Additionally, we also employ long skip connections.

We train the GAN for each lesion segmentation task separately, since there is severe class imbalance and also overlap of the lesion annotations in some cases. Thus, we predict segmentation maps separately for each of the tasks. Finally, for lesion detection, we simply discard those segmentation predictions where the lesion area is less than a specific threshold value found empirically.
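The final area-based rejection step amounts to a simple check per lesion type; a minimal sketch (the function name and the boolean "detected" return value are illustrative assumptions):

```python
import numpy as np

def filter_lesion_prediction(mask, min_area):
    """Lesion detection by area thresholding: if the predicted lesion
    area is below an empirically chosen minimum, treat the image as
    containing no such lesion and discard the segmentation."""
    area = int(np.count_nonzero(mask))
    if area < min_area:
        return np.zeros_like(mask), False   # no detection
    return mask, True                        # lesion detected
```

This would be applied independently to each per-lesion prediction (drusen, exudate, hemorrhage, scar, others), since the maps are predicted separately.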
For the first task of AMD classification, we employ only the images provided by the challenge organizers for training, and a subset of the STARE dataset [2] for validation. We train models at different crop levels with respect to the macular region. All the networks are trained with the stochastic gradient descent (SGD) optimizer, except the EfficientNets, which are trained using the ADAM optimizer. We save the model giving the highest accuracy on the validation set. At test time, we perform test time augmentation (TTA) by applying various histogram processing based operations for each of the trained networks, and finally perform ensembling of all models by taking the mean of the posterior probabilities for AMD. The final ensembling gave us an AUC score of . in the validation stage of the competition.

For the second and third tasks, along with the challenge data, we employ images from the REFUGE challenge [5], while for the fourth task, we only use the dataset provided by the organizers of the competition. For the second and third tasks, we train the GAN models from scratch with weights initialized from a Gaussian distribution with mean 0 and standard deviation 0.02, since we have a sufficient number of images after augmentation. Since the number of images for the fourth task is smaller even after extensive augmentation, we initialize with the weights obtained from the fovea distance map regression task. This is also helpful since most of the abnormalities occur near the macular region rather than the OD region. We halve the learning rate at regular intervals and train the models at high resolution. We save the model which gives the best validation score. For the second task, we obtain an F score of . for disc detection and a Dice score of . for OD segmentation. For the third task, we obtain a Euclidean distance difference of . pixels.
We obtain Dice scores of . , . , . , . and . for the segmentation of drusen, exudate, hemorrhage, scar and other lesions, respectively.

In this report, we outlined methods for fundus image analysis for various AMD related tasks. All of our techniques employ recent advances in deep learning. We find that even directly employing the latest classification networks off the shelf provides a good initial score, and that the GAN based models perform very well on almost all of the other tasks. In future work, we would like to explore multi-task learning for obtaining better results.
We thank the organizers of the Automatic Detection challenge on Age-related Macular degeneration (ADAM) (https://amd.grand-challenge.org/) for hosting the challenge and kindly providing the dataset.
References
1. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778 (2016)
2. Hoover, A., Kouznetsova, V., Goldbaum, M.: Locating blood vessels in retinal images by piecewise threshold probing of a matched filter response. IEEE Transactions on Medical Imaging (3), 203–210 (2000)
3. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 7132–7141 (2018)
4. Mahajan, D., Girshick, R., Ramanathan, V., He, K., Paluri, M., Li, Y., Bharambe, A., van der Maaten, L.: Exploring the limits of weakly supervised pretraining. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 181–196 (2018)
5. Orlando, J.I., Fu, H., Breda, J.B., van Keer, K., Bathula, D.R., Diaz-Pinto, A., Fang, R., Heng, P.A., Kim, J., Lee, J., et al.: Refuge challenge: A unified framework for evaluating automated methods for glaucoma assessment from fundus photographs. Medical Image Analysis, 101570 (2020)
6. Peng, Y., Dharssi, S., Chen, Q., Keenan, T.D., Agrón, E., Wong, W.T., Chew, E.Y., Lu, Z.: Deepseenet: A deep learning model for automated classification of patient-based age-related macular degeneration severity from color fundus photographs. Ophthalmology (4), 565–575 (2019)
7. Shankaranarayana, S.M., Ram, K., Mitra, K., Sivaprakasam, M.: Joint optic disc and cup segmentation using fully convolutional and adversarial networks. In: Fetal, Infant and Ophthalmic Medical Image Analysis, pp. 168–176. Springer (2017)
8. Shankaranarayana, S.M., Ram, K., Mitra, K., Sivaprakasam, M.: Fully convolutional networks for monocular retinal depth estimation and optic disc-cup segmentation. IEEE Journal of Biomedical and Health Informatics 23