Exploiting the Transferability of Deep Learning Systems Across Multi-modal Retinal Scans for Extracting Retinopathy Lesions
Taimur Hassan†⋄, Muhammad Usman Akram⋄, Naoufel Werghi†
† Department of Electrical Engineering and Computer Sciences, Khalifa University, Abu Dhabi, United Arab Emirates.
⋄ Department of Computer and Software Engineering, National University of Sciences and Technology, Islamabad, Pakistan.
Abstract—Retinal lesions play a vital role in the accurate classification of retinal abnormalities. Many researchers have proposed deep lesion-aware screening systems that analyze and grade the progression of retinopathy. However, to the best of our knowledge, no literature exploits the tendency of these systems to generalize across multiple scanner specifications and multi-modal imagery. Towards this end, this paper presents a detailed evaluation of semantic segmentation, scene parsing and hybrid deep learning systems for extracting retinal lesions such as intra-retinal fluid, sub-retinal fluid, hard exudates, drusen, and other chorioretinal anomalies from fused fundus and optical coherence tomography (OCT) imagery. Furthermore, we present a novel strategy exploiting the transferability of these models across multiple retinal scanner specifications. A total of 363 fundus and 173,915 OCT scans from seven publicly available datasets were used in this research (from which 297 fundus and 59,593 OCT scans were used for testing purposes). Overall, a hybrid retinal analysis and grading network (RAGNet), backboned through ResNet, stood first for extracting the retinal lesions, achieving a mean dice coefficient score of 0.822. Moreover, the complete source code and its documentation are released at http://biomisa.org/index.php/downloads/.

Index Terms—Retinal Lesions, Ophthalmology, Convolutional Neural Networks, Fundus Photography, Optical Coherence Tomography.
I. INTRODUCTION
Retinopathy, or retinal disease, tends to damage the retina, which may result in a non-recoverable loss of vision or even blindness if not treated in time. Most of these diseases are associated with diabetes; however, they may also occur due to aging, uveitis, and cataract surgeries. The two common retinal diseases are macular edema (ME) and age-related macular degeneration (AMD). ME occurs due to fluid accumulation within the macula, mostly because of the associated hyperglycemia, uveitis, and cataract surgeries. ME caused by diabetes is often termed diabetic macular edema (DME), which is identified by examining the patient's diabetic history and also by analyzing the retinal thickening (caused by retinal fluid) or hard exudates (HE) within one disc diameter of the center of the macula (containing a small pit known as the fovea) [1]. The Early Treatment Diabetic Retinopathy Study (ETDRS) classified clinically significant ME as having: 1) either thickening within 500 µm of the macular center; 2) HE with adjacent thickening within 500 µm of the macular center; or 3) retinal thickening regions of one (or more) disc diameter, some part of which lies within one disc diameter of the macular center [2]. However, with the advent of new imaging techniques such as optical coherence tomography (OCT), the classification of DME has been redefined as centrally involved clinically significant macular edema (Ci-CSME) if the presence of retinal thickening, due to retinal fluid or hard exudates, is discovered within the central sub-field zone of the macula (having a diameter of 1 mm or greater). Otherwise, DME is classified as non-centrally involved [3]. AMD is another retinal syndrome mostly found in elderly people. It is typically classified into two stages, i.e., non-neovascular AMD and neovascular AMD. Non-neovascular AMD is the "dry" form of AMD in which small, medium or large-sized drusen can be observed. With increasing disease severity, abnormal blood vessels invade the retina, leading to chorioretinal anomalies such as fibrotic scars and choroidal neovascular membranes (CNVM). In such a case, AMD is classified as wet or neovascular AMD. Fig. 1 shows some fundus and OCT scans showing retinal lesions at different stages of AMD and DME.

This work is supported by a research fund from Khalifa University: Ref: CIRA-2019-047.

Figure 1: Retinal lesions in fundus and OCT scans of Ci-CSME (A, B) and dry AMD pathology (C, D).

II. RELATED WORK
In the literature, a large body of solutions assessing retinal regions employed feature extraction techniques coupled with classical machine learning (ML) tools. The majority of these methods were validated on a limited number of scans and thus exhibited limited reproducibility. More recently, with the advent of deep learning, a wide variety of end-to-end approaches, operating on much larger datasets, have been proposed.
Traditional Approaches: Fundus imagery has been the modality of choice for examining retinal pathology for a while [5] and is still used as a secondary examination technique in analyzing complex retinal pathologies. However, with the advent of OCT, most solutions for retinal image analysis have migrated towards this new modality due to its ability to present an objective visualization of retinal abnormalities in the early stages. Chiu et al. [6] developed a kernel regression with graph theory and dynamic programming (KR+GTDP) scheme to extract retinal layers and retinal fluid from DME-affected scans. In [7], a Random Forest-based framework was proposed for the automated extraction of retinal layers and fluid from scans affected by central serous retinopathy (CSR). Wilkins et al. [8] presented an automated method for the extraction of intra-retinal cystoid fluid using OCT images. Vidal et al. [9] used a linear discriminant classifier, support vector machines, and a Parzen window for the identification of intra-retinal fluid (IRF). Apart from this, we have also proposed several methods for extracting retinal layers and retinal fluid, and for classifying retinopathy using traditional ML techniques [10]–[14].

Deep Learning Methods:
Many researchers have applied deep learning for the extraction of retinal layers [15] and retinal lesions such as IRF [16], sub-retinal fluid (SRF) [17] and HE [21]. Seebock et al. [19] proposed a Bayesian UNet-based framework for recognizing different anomalies within retinal pathologies. Fang et al. [20] developed a lesion-aware convolutional neural network (LACNN) model for the accurate classification of DME, choroidal neovascularization, drusen (AMD) and normal pathologies. LACNN is composed of a lesion detection network (LDN) and a lesion-attention module, where the LDN first generates a soft attention map to weight the lesion-aware features extracted from the lesion-attention module, and these features are then used for the accurate classification of retinal pathologies. Apart from this, we have recently proposed a hybrid retinal analysis and grading architecture (RAGNet) [21] that utilizes a single feature extraction model for retinal lesion segmentation, lesion-aware classification and severity grading of retinopathy based on OCT images.

III. CONTRIBUTIONS
In this paper, we present a thorough evaluation of deep learning models for the extraction of IRF, SRF, HE, drusen, and other chorioretinal anomalies such as fibrotic scars and CNVM from multi-modal retinal images. Furthermore, we exploited the transferability of these models for retinal lesion extraction across multi-modal imagery. To the best of our knowledge, there is no literature available to date providing a thorough transferability analysis of encoder-decoder, fully convolutional, scene parsing, and hybrid deep learning systems for extracting this multitude of lesions in one go from multi-modal retinal imagery. Subsequently, the main contributions of this paper are:
• A first comprehensive evaluation of semantic segmentation, scene parsing and hybrid deep learning systems such as RAGNet [21], PSPNet [22], SegNet [23], UNet [24], and FCN (8s and 32s) [25] for extracting multiple lesions from multi-modal retinal imagery.
• A comprehensive study encompassing seven publicly available datasets and five different retinal pathologies, represented in a total of 363 fundus and 173,915 OCT scans, from which 297 fundus and 59,593 OCT scans were used for testing purposes.
• A detailed exploration of the transferability of these models across multiple scanner specifications.

IV. PROPOSED APPROACH
We propose a novel study to analyze the transferability of state-of-the-art deep learning frameworks across fused fundus and OCT imagery for extracting multiple retinal lesions in one go. The models we considered are as follows:
RAGNet is a hybrid convolutional network that can perform pixel-level segmentation and scan-level classification at the same time [21]. The uniqueness of the RAGNet architecture is that it uses the same feature extractor for both classification and segmentation purposes. So, if the problem demands segmentation and classification from the same image based upon similar features, then RAGNet is an ideal choice rather than using two separate models [21]. Here, we have only used the RAGNet segmentation unit, since we are focusing on retinal lesion segmentation.
PSPNet is a state-of-the-art scene parsing network that contains a pyramid pooling module to generate four pyramids of feature maps, representing coarser to finer details, to minimize the loss of global scene context while generating the latent representations [22]. The pooled outputs are then concatenated with the original feature maps to generate the final segmentation results.
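As a rough single-channel illustration of this idea (our simplification, not the paper's or PSPNet's actual implementation), each pyramid level block-averages the feature map to a coarser grid (1×1, 2×2, 3×3 and 6×6 bins, as in [22]), up-samples it back, and stacks it with the original map. The function name and the nearest-neighbour up-sampling are our assumptions; PSPNet operates on multi-channel tensors with 1×1 convolutions and bilinear interpolation.

```python
import numpy as np

def pyramid_pool(feat, bins=(1, 2, 3, 6)):
    """Block-average a square feature map to each bin size, up-sample the
    result back to full resolution (nearest neighbour), and stack all
    levels with the input map. Assumes every bin divides the map size."""
    h, w = feat.shape
    levels = [feat]
    for b in bins:
        # block mean over (h//b, w//b) windows
        pooled = feat.reshape(b, h // b, b, w // b).mean(axis=(1, 3))
        # nearest-neighbour up-sampling back to (h, w)
        up = np.repeat(np.repeat(pooled, h // b, axis=0), w // b, axis=1)
        levels.append(up)
    return np.stack(levels)  # shape: (1 + len(bins), h, w)
```

The coarsest (1×1) level carries the global scene context while the finer levels preserve sub-region detail, which is what keeps the global context from being lost in the latent representation.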
SegNet is an encoder-decoder network for semantic segmentation [23]. The uniqueness of the SegNet model is that it uses pooling indices from the corresponding encoder block to up-sample the feature maps at the decoder end in a non-linear fashion. Afterward, the feature maps are convolved with trainable filters to remove their sparsity. Moreover, SegNet has a smaller number of trainable parameters, which makes it computationally more efficient.
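A minimal single-channel NumPy sketch of this index-based up-sampling (our illustration with hypothetical function names, not SegNet's actual multi-channel implementation): the encoder records where each 2×2 maximum came from, and the decoder places values back at exactly those locations, producing the sparse maps that are then densified by trainable filters.

```python
import numpy as np

def max_pool_with_indices(x):
    """2x2 max pooling that also returns the flat index of each maximum,
    as a SegNet encoder block would."""
    h, w = x.shape
    pooled = np.zeros((h // 2, w // 2))
    idx = np.zeros((h // 2, w // 2), dtype=int)
    for i in range(h // 2):
        for j in range(w // 2):
            win = x[2 * i:2 * i + 2, 2 * j:2 * j + 2]
            k = win.argmax()
            pooled[i, j] = win.flat[k]
            idx[i, j] = (2 * i + k // 2) * w + (2 * j + k % 2)
    return pooled, idx

def unpool_with_indices(pooled, idx, shape):
    """Sparse up-sampling: place each value back at its recorded argmax
    location and leave the remaining positions zero."""
    out = np.zeros(shape)
    out.flat[idx.ravel()] = pooled.ravel()
    return out
```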
UNet is an auto-encoder inspired by FCN for semantic segmentation. The key feature of UNet is that it is fast and can generate good segmentation results with a small number of training samples because of its built-in data augmentation strategy [24]. UNet uses up-sampling instead of pooling operations and generates a large number of feature maps to propagate contextual information to the higher-resolution layers [24].
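This hand-off of context to higher-resolution layers can be pictured with a toy single-channel sketch (our illustration; UNet actually uses learned 2×2 up-convolutions on multi-channel tensors and crops the encoder map before concatenation):

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x up-sampling, standing in for UNet's learned
    up-convolution in this single-channel sketch."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def unet_skip_merge(decoder_feat, encoder_feat):
    """Up-sample the decoder map and stack it with the same-resolution
    encoder map, as UNet's skip connections do to carry contextual
    information to the higher-resolution layers."""
    up = upsample2x(decoder_feat)
    assert up.shape == encoder_feat.shape
    return np.stack([up, encoder_feat])  # channel-wise concatenation
```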
FCN is an end-to-end model proposed for semantic segmentation. FCN uses learned representations from pre-trained models, fine-tunes them, and generates finer pixel-level predictions in one go by up-sampling lower network layers with finer strides. In this study, we have utilized FCN-8 and FCN-32 (i.e., the finest and the coarsest versions of FCN) for retinal lesion extraction.

We have applied these models for extracting retinal lesions from both fundus and OCT imagery. Here, we note that our study covers some of the most complex and commonly occurring retinal pathologies, including non-neovascular AMD, neovascular AMD, Ci-CSME, and non-Ci-CSME. We also note that the related scans were collected using machines from different manufacturers and exhibit varying scan quality. To make the comparison objective and highly reproducible, we have used publicly available datasets in our investigations. Furthermore, we have tested the transferability of these models through an extensive cross-dataset validation. The series of experiments we conducted in this work provides a reliable benchmark for assessing the robustness and generalization capacity of each model.

V. EXPERIMENTAL SETUP
This section reports a detailed description of the datasets which have been used in this research. Furthermore, it contains implementation details and the performance metrics on which the models are evaluated.
A. Datasets
We have evaluated all the models on seven publicly available retinal image datasets, where the ground truths for retinal lesions were acquired through the Armed Forces Institute of Ophthalmology, Rawalpindi, Pakistan. A detailed summary of each dataset is presented below:
Rabbani-I [26] is one of the few datasets which contain both OCT and fundus images of each subject, reflecting AMD, DME, and normal pathologies. The dataset was acquired at Noor Eye Hospital, Tehran, Iran and contains a total of 4,241 OCT and 148 fundus scans from 50 normal, 48 dry AMD, and 50 DME-affected subjects. In this paper, we considered 37 fundus scans and 1,061 OCT scans for training and the rest for testing purposes.
Rabbani-II [27] contains 12,800 OCT and 100 color fundus scans from both eyes of 50 healthy subjects. Since Rabbani-II only contains scans from healthy subjects, it serves as an excellent benchmark to test the false positive rate of all the models (indicating how many false positives each model generates).
Duke-I [28] is one of the oldest retinal OCT datasets, containing a total of 38,400 scans, of which 26,900 scans reflect dry AMD and 11,500 scans show controlled (healthy) pathology. In the proposed study, a total of 300 scans have been used for training and 38,100 scans for testing purposes.
Duke-II [6] has a total of 610 OCT images from 10 subjects severely affected by DME. Moreover, the dataset contains highly detailed markings for retinal layers and fluid from two clinicians. In this paper, a total of 305 scans from the first five subjects were used for training, and the rest for testing.
Duke-III [29] is another dataset from Duke University which we used in our research. The dataset contains 723 scans reflecting dry AMD pathology, 1,101 scans reflecting DME pathology, and 1,407 scans showing normal retinal pathology. For the experimentation, we considered 3,048 scans for training and the remaining 183 for testing.
BIOMISA dataset [30] contains a total of 5,324 OCT scans (657 dry AMD, 2,195 ME, 904 normal, 407 wet AMD, and 1,161 CSR) and 115 fundus scans from 99 subjects (17 healthy, 31 ME, 8 dry AMD, 19 wet AMD, 24 CSR). In this study, a total of 1,299 OCT and 29 fundus images from the BIOMISA dataset were used for training and the rest for evaluation purposes.
Zhang dataset [31] contains 109,309 scans representing wet AMD (CNV), dry AMD (drusen), DME, and healthy pathologies. The dataset also presents a clear separation of 108,309 scans for training and 1,000 scans for testing, which we followed in our experimentation as well.
B. Implementation Details
All the models have been implemented using Keras with Python 3.7.4 on a machine with an Intel 8th-generation Core i5, an NVIDIA RTX 2080 GPU and 16 GB of RAM, where ResNet was used as a backbone. Moreover, the optimizer used during training was an adaptive learning rate method (ADADELTA) with default learning and decay rates. The source code has been released at http://biomisa.org/index.php/downloads/ for reproducibility.

C. Evaluation Metrics
In the proposed study, all the segmentation models have beenevaluated using the following metrics:
Mean Dice Coefficient: The dice coefficient (DC) computes the degree of similarity between the ground truth and the extracted results using the following relation: DC = 2TP / (2TP + FP + FN), where TP indicates the true positives, FP the false positives and FN the false negatives. After computing DC for each lesion class, the mean dice coefficient is computed for each network by taking an average of its per-class DC scores.

Mean Intersection-over-Union: The mean intersection-over-union (IoU) is computed by taking an average of the IoU scores for each lesion class, where the IoU scores are computed through IoU = TP / (TP + FP + FN).

Recall, Precision and F-score: To further evaluate the models, we computed the pixel-level recall TPR = TP / (TP + FN), precision PPV = TP / (TP + FP) and F-score F = 2 (TPR × PPV) / (TPR + PPV).

Qualitative Evaluations: The performance of all the models for lesion extraction has also been qualitatively evaluated through some visual examples.

VI. RESULTS AND DISCUSSION
The evaluation of segmentation models has been conducted on the combination of all seven datasets containing mixed OCT and fundus scans. In terms of TPR and F, as shown in Table I, RAGNet achieves 9.48% and 3.36% improvements as compared to UNet and PSPNet, respectively. However, in terms of precision, SegNet has a lead of 1.52% as compared to PSPNet. This indicates that SegNet produces fewer false positives as compared to the rest of the models. For the pixel-level comparison, we have excluded accuracy because it gives biased results towards a dominant negative class, i.e., the background.

Table I: Performance evaluations in terms of pixel-level recall, precision and F-scores on the combined dataset. Bold indicates the best performance.

Network | TPR    | PPV    | F
RAGNet  | —      | —      | —
PSPNet  | 0.7540 | 0.9200 | 0.8287
SegNet  | 0.6388 | —      | —
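For concreteness, the pixel-level scores reported here can be computed from binary lesion masks as in the following NumPy sketch (our illustration with hypothetical function names, not the paper's evaluation code):

```python
import numpy as np

def confusion_counts(pred, truth):
    """Pixel-level TP, FP, FN for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.logical_and(pred, truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    return tp, fp, fn

def dice(pred, truth):
    tp, fp, fn = confusion_counts(pred, truth)
    return 2 * tp / (2 * tp + fp + fn)

def iou(pred, truth):
    tp, fp, fn = confusion_counts(pred, truth)
    return tp / (tp + fp + fn)

def f_score(pred, truth):
    tp, fp, fn = confusion_counts(pred, truth)
    tpr = tp / (tp + fn)  # recall
    ppv = tp / (tp + fp)  # precision
    return 2 * tpr * ppv / (tpr + ppv)
```

Averaging these per-class scores over the lesion classes gives the mean DC and mean IoU used in the tables.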
Figure 2: Comparison of retinal lesions extraction on the combined dataset. (From left to right: original image, ground truth, RAGNet, PSPNet, UNet, FCN-8, FCN-32, SegNet.) Blue, red, yellow, green, and pink indicate HE, IRF, SRF, CA, and drusen, respectively.
Tables II and III report the performance of all the models for extracting retinal lesions in terms of mean DC and mean IoU, respectively. From Table II, it can be observed that RAGNet achieves the best mean DC score of 0.822, leading PSPNet by 4.5% and FCN-32 by 51.33%. Moreover, in terms of mean IoU, RAGNet also achieves the overall best performance (mean IoU: 0.710), showing a clear gap over its competitors for extracting IRF, SRF, and HE regions. In Table III, the second-best performance is achieved by PSPNet, which lags behind RAGNet by 6.9%. We also noticed that on fundus images UNet achieves optimal lesion extraction results, with an overall performance comparable to that of PSPNet. Fig. 2 shows the qualitative results of all the models when trained on multi-modal images from all seven datasets at once, where we can notice the best overall performance of RAGNet. It should be noted that extracting lesions accurately from both modalities at once is quite challenging, as their image features vary considerably.

Table II: Performance evaluations of deep segmentation models for retinal lesions extraction in terms of DC. Bold indicates the overall best performance.

Network | IRF   | SRF   | CA    | HE    | Drusen | Mean
RAGNet  | —     | —     | —     | —     | —      | —
SegNet  | 0.810 | 0.610 | 0.886 | 0.373 | 0.695  | 0.675
PSPNet  | 0.843 | 0.809 | —     | —     | —      | —

Table III: Performance evaluations of deep segmentation models for retinal lesions extraction in terms of IoU. Bold indicates the overall best performance.

Network | IRF   | SRF   | CA    | HE    | Drusen | Mean
RAGNet  | —     | —     | —     | —     | —      | —
SegNet  | 0.681 | 0.439 | 0.796 | 0.229 | 0.533  | 0.535
PSPNet  | 0.728 | 0.680 | —     | —     | —      | —

In the second series of experiments, we conducted a transferability analysis to assess the generalization capabilities of all the models. Here, we combined Duke-I, II, and III as one dataset (i.e., Duke) and Rabbani-I and Rabbani-II as Rabbani to avoid redundant combinations, as they have similar image features. We report the results in Table IV, where it can be observed that all the methods show good performance on the Duke and Zhang dataset pairs; this is natural because both datasets were acquired through Spectralis, Heidelberg Inc. Moreover, RAGNet achieved the overall best performance, as evident from Table IV, whereas PSPNet stood second best, with a performance comparable to that of UNet. In another experiment, we used the Rabbani-II dataset to test how many false positives each model produces. Since Rabbani-II contains only healthy scans, there are no actual lesions in this dataset. The best performance in this experiment is achieved by RAGNet with a true negative (TN) rate of 0.9999, indicating that it produces a minimal number of false lesions. Apart from this, the worst performance is achieved by FCN-32 (TN rate: 0.9379). Even the worst TN rate is above 90% because the ratio of TN pixels to FP pixels is extremely high. The results for this experiment are available in the codebase package for the readers at http://biomisa.org/index.php/downloads/.

Table IV: Transferability analysis (Training → Testing) for all models in terms of mean IoU. Bold and blue indicate the first and second-best performance, respectively. (Dataset names are coded as follows: R: Rabbani, D: Duke, Z: Zhang and B: BIOMISA. The rest of the abbreviations are RN: RAGNet, PN: PSPNet, SN: SegNet, UN: UNet, F-8: FCN-8, F-32: FCN-32.)

VII. CONCLUSION AND FUTURE RESEARCH
In this paper, we presented a thorough evaluation of semantic segmentation, scene parsing, and hybrid deep learning systems for extracting retinal lesions from fused fundus and OCT imagery. We also assessed the generalization capacity of each model through comprehensive cross-dataset validations, where RAGNet, due to its robustness in retaining lesion contextual information during scan decomposition, produced superior results as compared to the other models. Furthermore, the benchmarking performed in this work will be of great utility for both researchers and practitioners who want to employ deep learning models for lesion-aware grading of the retina. In the future, we plan to extend and exploit this study for the extraction of the optic disc and retinal layers in the optic nerve head region for glaucoma analysis.

REFERENCES

[1] G. M. Comers, "Cystoid macular edema," Kellogg Eye Center. Accessed: June 2019.
[2] "Diabetic macular edema," EyeWiki. Accessed: November 4, 2019.
[3] N. Relhan et al., "The early treatment diabetic retinopathy study historical review and relevance to today's management of diabetic macular edema," Current Opinion in Ophthalmology, Wolters Kluwer, May 2017.
[4] M. U. Akram et al., "An automated system for the grading of diabetic maculopathy in fundus images," November 12-15, 2012.
[5] T. Hassan et al., "Review of OCT and fundus images for detection of macular edema," IEEE International Conference on Imaging Systems and Techniques (IST), September 2015.
[6] S. J. Chiu et al., "Kernel regression based segmentation of optical coherence tomography images with diabetic macular edema," Biomedical Optics Express, vol. 6, no. 4, April 2015.
[7] D. Xiang et al., "Automatic retinal layer segmentation of OCT images with central serous retinopathy," IEEE Journal of Biomedical and Health Informatics, vol. 23, no. 1, January 2019.
[8] G. R. Wilkins et al., "Automated segmentation of intraretinal cystoid fluid in optical coherence tomography," IEEE Transactions on Biomedical Engineering, pp. 1109-1114, 2012.
[9] P. L. Vidal et al., "Intraretinal fluid identification via enhanced maps using optical coherence tomography images," Biomedical Optics Express, October 2018.
[10] S. Khalid et al., "Automated segmentation and quantification of drusen in fundus and optical coherence tomography images for detection of ARMD," Journal of Digital Imaging, December 2017.
[11] S. Khalid et al., "Fully automated robust system to detect retinal edema, central serous chorioretinopathy, and age related macular degeneration from optical coherence tomography images," BioMed Research International, March 2017.
[12] T. Hassan et al., "Automated segmentation of subretinal layers for the detection of macular edema," Applied Optics, 55, 454-461, 2016.
[13] B. Hassan et al., "Structure tensor based automated detection of macular edema and central serous retinopathy using optical coherence tomography images," Journal of the Optical Society of America A, 33, 455-463, 2016.
[14] A. M. Syed et al., "Automated diagnosis of macular edema and central serous retinopathy through robust reconstruction of 3D retinal surfaces," Computer Methods and Programs in Biomedicine, 137, 1-10, 2016.
[15] L. Fang et al., "Automatic segmentation of nine retinal layer boundaries in OCT images of non-exudative AMD patients using deep learning and graph search," Biomedical Optics Express, vol. 8, no. 5, May 2017.
[16] A. G. Roy et al., "ReLayNet: retinal layer and fluid segmentation of macular optical coherence tomography using fully convolutional networks," Biomedical Optics Express, vol. 8, no. 8, August 2017.
[17] T. Schlegl et al., "Fully automated detection and quantification of macular fluid in OCT using deep learning," Ophthalmology, vol. 125, no. 4, April 2018.
[18] B. Hassan et al., "Deep ensemble learning based objective grading of macular edema by extracting clinically significant findings from fused retinal imaging modalities," MDPI Sensors, July 2019.
[19] P. Seebock et al., "Exploiting epistemic uncertainty of anatomy segmentation for anomaly detection in retinal OCT," IEEE Transactions on Medical Imaging, May 2019.
[20] L. Fang et al., "Attention to lesion: Lesion-aware convolutional neural network for retinal optical coherence tomography image classification," IEEE Transactions on Medical Imaging, August 2019.
[21] T. Hassan et al., "RAG-FW: A hybrid convolutional framework for the automated extraction of retinal lesions and lesion-influenced grading of human retinal pathology," IEEE Journal of Biomedical and Health Informatics, March 2020.
[22] H. Zhao et al., "Pyramid scene parsing network," IEEE CVPR, 2017.
[23] V. Badrinarayanan et al., "SegNet: A deep convolutional encoder-decoder architecture for image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, December 2017.
[24] O. Ronneberger et al., "U-Net: Convolutional networks for biomedical image segmentation," MICCAI, 2015.
[25] J. Long et al., "Fully convolutional networks for semantic segmentation," IEEE CVPR, 2015.
[26] R. Rasti et al., "Macular OCT classification using a multi-scale convolutional neural network ensemble," IEEE Transactions on Medical Imaging, vol. 37, no. 4, pp. 1024-1034, April 2018.
[27] T. Mahmudi et al., "Comparison of macular OCTs in right and left eyes of normal people," Proc. SPIE, Medical Imaging, San Diego, California, United States, Feb. 15-20, 2014.
[28] S. Farsiu et al., "Quantitative classification of eyes with and without intermediate age-related macular degeneration using optical coherence tomography," Ophthalmology, 121(1), 162-172, January 2014.
[29] P. P. Srinivasan et al., "Fully automated detection of diabetic macular edema and dry age-related macular degeneration from optical coherence tomography images," Biomedical Optics Express, vol. 5, no. 10, DOI: 10.1364/BOE.5.003568, 12 Sep 2014.
[30] T. Hassan et al., "BIOMISA retinal image database for macular and ocular syndromes," ICIAR 2018, Portugal, June 2018.
[31] D. Kermany et al., "Identifying medical diagnoses and treatable diseases by image-based deep learning,"