Conditional GAN for Prediction of Glaucoma Progression with Macular Optical Coherence Tomography
Osama N. Hassan, Serhat Sahin, Vahid Mohammadzadeh, Xiaohe Yang, Navid Amini, Apoorva Mylavarapu, Jack Martinyan, Tae Hong, Golnoush Mahmoudinezhad, Daniel Rueckert, Kouros Nouri-Mahdavi, Fabien Scalzo
Computing Department, Imperial College London, London, UK
Department of Electrical and Computer Engineering, UCLA, Los Angeles, USA
Department of Ophthalmology, Jules Stein Eye Institute, Los Angeles, USA
Department of Computer Science, UCLA, Los Angeles, USA
David Geffen School of Medicine, UCLA, Los Angeles, USA
Technische Universität München, Munich, Germany
Department of Computer Science, California State University, Los Angeles, USA
Department of Neurology, UCLA, Los Angeles, USA
Abstract.
The estimation of glaucoma progression is a challenging task, as the rate of disease progression varies among individuals, in addition to other factors such as measurement variability and the lack of standardization in defining progression. Structural tests, such as thickness measurements of the retinal nerve fiber layer or the macula with optical coherence tomography (OCT), are able to detect anatomical changes in glaucomatous eyes. Such changes may be observed before any functional damage. In this work, we built a generative deep learning model using the conditional GAN architecture to predict glaucoma progression over time. The patient's OCT scan is predicted from three or two prior measurements. The predicted images demonstrate high similarity with the ground truth images. In addition, our results suggest that OCT scans obtained from only two prior visits may actually be sufficient to predict the next OCT scan of the patient after six months.
Keywords:
Generative models · CGAN · Glaucoma Progression · OCT.
Introduction

O. N. Hassan and S. Sahin: equal contribution.

Glaucoma is a progressive optic neuropathy and is the second leading cause of blindness worldwide [1]. The number of people with glaucoma worldwide was estimated to be about 60.5 million in 2010, and it is expected to reach 111.8 million in 2040 [2]. The retinal ganglion cell (RGC) machinery is located in the inner retina, and RGC axons form the optic nerve. The role of the optic nerve is to transmit visual information from the photoreceptors to the brain. Glaucoma is characterized by slow degeneration of the RGCs and their axons, which leads to functional visual loss in glaucoma patients [3,4]. The functional visual loss in glaucoma manifests as a progressive loss of vision, mainly in the periphery; if glaucoma is not treated, it can eventually lead to complete visual loss and blindness [5].

Due to the progressive and asymptomatic nature of glaucoma, it is crucial for clinicians to diagnose it in its early stages and to detect its progression in a timely manner to prevent progressive functional loss [5–9]. Glaucoma progression can be evaluated with structural and functional measures [10–17]. The estimation of glaucoma progression is challenging, as the rate of disease progression varies among individuals [18]. Moreover, measurement variability, the influence of age-related attrition, and the lack of standardization in defining progression make tracking disease deterioration a very challenging task with either structural or functional tests [19–21]. Standard achromatic perimetry and measurement of the visual field (VF) is the most common functional test used to evaluate glaucoma progression [22]. It quantifies visual degradation in the peripheral field of view of the patient. Patients may experience VF loss only after a substantial amount of structural change has occurred [23]. Structural tests, such as thickness measurements of the retinal nerve fiber layer or the macula (central retina) with optical coherence tomography (OCT), are able to detect anatomical changes in glaucomatous eyes; such changes may be observed before any functional damage; hence, they may be useful for glaucoma detection, especially in early stages [24, 25].
Clinicians also depend on structural tests for the detection of disease progression, especially in early to moderate stages [26, 27]. When the disease becomes more advanced, structural measurements may reach their floor and further changes might be difficult to detect [28]. At this stage, functional tests are considered to be more useful to track disease progression [29].

The gold standard for retinal imaging at present is an optical imaging modality called OCT [30]. OCT is non-invasive and is able to acquire high-resolution, in-vivo cross-sectional or 3D images from transparent or semi-transparent biological tissues. With the aid of OCT, it has become possible to image retinal anatomy, including individual layers such as the ganglion cell layer, and to diagnose glaucoma before visual field defects emerge. OCT systems can be classified into time-domain OCT (TD-OCT) and spectral-domain OCT (SD-OCT). SD-OCT systems have better resolution, are much faster, have higher reproducibility, and are more computationally efficient; therefore, SD-OCT has become the gold standard for imaging of the retina and the optic nerve head [29]. An example of a retinal OCT cross section is shown in Fig. 1.

The goal of our work is to provide a computational framework for the modeling of glaucoma progression over time based on macular OCT images. A dataset of longitudinal macular OCT images is used: macular OCT images of around one hundred eyes with more than two years of follow-up. We aim to predict structural and functional changes over time. More specifically, assume we have images x_1, x_2, ..., x_{n-1}, where each image represents a scan at a specific time
point i. The question we address here is what the image x_n looks like and whether the changes are beyond what is expected. A machine learning algorithm is used to make the prediction and reconstruction of image x_n. Our study primarily uses generative adversarial networks (GANs) to achieve this prediction goal. The framework of the generative model is like a minimax two-player game. The GAN consists of two components: a generator G and a discriminator D. The generator captures the data distribution and predicts the next time-point image based on the input images of previous time points. On the other hand, the discriminator tries to distinguish between the ground-truth image and the image predicted by the generator. The training succeeds when the discriminator is no longer able to tell any difference between the ground-truth images and the predicted images and the generator completely fools the discriminator. Both the generator and discriminator models are constructed using neural networks [31].

Fig. 1: (Right) A raw macular B-scan of optical coherence tomography passing through the fovea (center of the macula). (Left) An infrared image of the macula; the green square outlines the area in which all the B-scans will be transmitted.

Dataset

Our dataset consists of longitudinal macular OCT images of 109 eyes. Each eye is scanned at four to ten visits separated by six months. Each visit has a macular OCT volume that consists of 61 cross-sectional B-scans from the central retina spanning 30 x 25 degrees. The hierarchy of the dataset is depicted in Fig. 2.

The objective of this work is to predict glaucoma progression over time by the construction of a future macular cross-sectional image from past measurements. To elaborate, for a cross-sectional image that is available at 3 time points x_1, x_2, x_3, we reconstruct the cross-sectional image at time point x_4. In other words, the model's task is to learn the growth of glaucoma-related features of OCT images over different cross-sectional images in individual patients. Moreover, we set no constraints on the baseline, which implies that the input images can be at any stage of the disease, provided that subsequent images are separated by six months in time.
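As an illustration of how input/output sequences like (x_1, x_2, x_3) -> x_4 could be enumerated from a patient's chronologically ordered visits, a minimal sliding-window helper is sketched below; the function name and the sliding-window choice are assumptions for illustration, not the authors' actual data pipeline.

```python
def make_training_pairs(visits, n_inputs=3):
    """Slide a window over a patient's chronologically ordered visits:
    each group of n_inputs consecutive scans forms a model input, and
    the next scan is the prediction target (illustrative helper)."""
    pairs = []
    for start in range(len(visits) - n_inputs):
        inputs = visits[start:start + n_inputs]
        target = visits[start + n_inputs]
        pairs.append((inputs, target))
    return pairs

# A patient with five six-monthly visits yields two (3 -> 1) pairs.
visits = ["v1", "v2", "v3", "v4", "v5"]
for inputs, target in make_training_pairs(visits):
    print(inputs, "->", target)
```

The same helper with `n_inputs=2` would produce the pairs for the two-visit experiment described later.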
Fig. 2: The hierarchy of the dataset. Each patient, at each visit (date), has 61 cross-sectional images (B-scans) of the retina.
Methods

In this work, we adopted the image-to-image translation framework with a conditional generative adversarial network (cGAN) as presented in [32], with some minor modifications in our implementation of its architecture.
The motivation for using a GAN in this problem is its flexibility in specifying the objective of the network at a high level, by requiring the output of its generator to be indistinguishable from reality; the network then automatically learns the loss function necessary to achieve this through its adversarial mechanism. That is, the GAN learns a loss function through its discriminator, which attempts to classify the generated image as real or fake, while simultaneously training a generative model that tries to minimize this loss. This learning-based loss function provides a general framework for many tasks in which defining a loss function would otherwise be very difficult. In addition, we have chosen to use a conditional GAN in particular to constrain each generated output image to the corresponding input; in other words, the output of the GAN network is conditioned on the input images [32]. The cGAN model consists of:
Generator Model
The generator architecture can be divided into two blocks. First, a 3D convolutional neural network (3D-CNN) block learns the spatio-temporal features in the input image frames (see Fig. 3a) [33]. Second, similar to [32], a U-Net-based architecture, as originally proposed in [34], is used as the main block of the generative model. The general architecture is shown in Fig. 3b. The U-Net generator is an encoder-decoder network with long skip connections. The network consists of 4 encoding/down-sampling layers and 4 decoding/up-sampling layers. It uses a skip-connection mechanism that copies the learned features from layer i to layer n - i, where n is the total number of layers. At each layer of the generator, except for the last layer, rectified linear units (ReLU) are used in the up-sampling part of the network, and their leaky version is used in the down-sampling part. In addition, batch normalization layers [35] are added to accelerate the training process, and dropout layers are used within the up-sampling layers (except for the first and last layers) to add randomness to the generative process.

Fig. 3: The components of our proposed model showing the feature extraction block and the generator: (a) 3D convolutional block to extract the spatio-temporal features; (b) generator with U-Net-based architecture. Reproduced from [32].
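A minimal PyTorch sketch of this two-block generator follows. The channel counts, the 64 x 64 spatial size, and the exact dropout placement are illustrative assumptions; the paper does not specify its layer configuration at this level of detail.

```python
import torch
import torch.nn as nn

class FeatureBlock3D(nn.Module):
    """3D convolution over the stack of prior visits (time axis T)
    to capture spatio-temporal features, collapsing the time axis."""
    def __init__(self, n_visits=3, out_ch=16):
        super().__init__()
        self.conv = nn.Conv3d(1, out_ch, kernel_size=(n_visits, 3, 3),
                              padding=(0, 1, 1))

    def forward(self, x):                  # x: (B, 1, T, H, W)
        return self.conv(x).squeeze(2)     # -> (B, out_ch, H, W)

def down(cin, cout):
    """Encoder step: strided conv + batch norm + leaky ReLU."""
    return nn.Sequential(nn.Conv2d(cin, cout, 4, 2, 1),
                         nn.BatchNorm2d(cout), nn.LeakyReLU(0.2))

def up(cin, cout, drop=True):
    """Decoder step: transposed conv + batch norm + ReLU (+ dropout)."""
    layers = [nn.ConvTranspose2d(cin, cout, 4, 2, 1),
              nn.BatchNorm2d(cout), nn.ReLU()]
    if drop:
        layers.append(nn.Dropout(0.5))
    return nn.Sequential(*layers)

class UNetGenerator(nn.Module):
    """4-down / 4-up U-Net with long skip connections."""
    def __init__(self, in_ch=16):
        super().__init__()
        self.d1, self.d2 = down(in_ch, 32), down(32, 64)
        self.d3, self.d4 = down(64, 128), down(128, 256)
        self.u1 = up(256, 128, drop=False)     # no dropout in first up layer
        self.u2, self.u3 = up(256, 64), up(128, 32)
        self.u4 = nn.Sequential(nn.ConvTranspose2d(64, 1, 4, 2, 1),
                                nn.Tanh())     # last layer: no ReLU/BN
    def forward(self, x):
        e1 = self.d1(x)
        e2 = self.d2(e1)
        e3 = self.d3(e2)
        e4 = self.d4(e3)
        y = self.u1(e4)
        y = self.u2(torch.cat([y, e3], dim=1))  # skip: layer n - i
        y = self.u3(torch.cat([y, e2], dim=1))
        return self.u4(torch.cat([y, e1], dim=1))
```

For a batch of three 64 x 64 prior scans, `UNetGenerator()(FeatureBlock3D()(x))` maps a `(B, 1, 3, 64, 64)` input to a `(B, 1, 64, 64)` predicted scan.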
Discriminator Model
The discriminator is a fully convolutional (CNN) classifier. In this study, we adopted the five-layer PatchGAN discriminator proposed in [32] and originally discussed in [36].

Although using the L1 norm in the loss function does not preserve high frequencies and results in blurry images, it preserves the low-frequency content. Therefore, if we use an L1 loss, the GAN discriminator can be designed to be dedicated to preserving the structural and high-frequency content of the generated images, while leaving the low-frequency preservation task to the L1 loss. In order for the discriminator to preserve high frequencies, it does not classify the image as a whole. Instead, it treats the image as patches and classifies each patch as real or fake. In this way, the discriminator offers structured-loss functionality: it penalizes the joint configuration of the output and does not consider the output of each pixel to be conditionally independent in an unstructured fashion. This design of the discriminator is called PatchGAN, since it penalizes structure at the patch scale. This has the additional advantages of a discriminator with fewer parameters, which can be applied to arbitrarily large images. The PatchGAN classifies patches of size 70 x 70, as suggested by [32]. A concatenation of the input images to the generator and the image to be classified is fed to the discriminator (see Fig. 5) and passed through five down-sampling stages, resulting in a 2D map in which each pixel has a receptive field of 70 x 70; the corresponding patch in the input image is then classified as real or fake.

Fig. 4: PatchGAN-based discriminator network. Reproduced from [32].
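The 70 x 70 figure can be verified with standard receptive-field arithmetic. The helper below walks backwards through the five 4 x 4 convolutions of the pix2pix PatchGAN (strides 2, 2, 2, 1, 1, where the last two layers do not downsample); the helper itself is a generic sketch, not code from the paper.

```python
def receptive_field(layers):
    """Compute the receptive field of a stack of convolution layers,
    given (kernel, stride) per layer, walking from output to input:
    rf_in = rf_out * stride + (kernel - stride)."""
    rf = 1
    for k, s in reversed(layers):
        rf = rf * s + (k - s)
    return rf

# Five 4x4 convolutions with strides 2, 2, 2, 1, 1 -> 70x70 patches.
patchgan = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
print(receptive_field(patchgan))  # -> 70
```

Each pixel of the discriminator's 2D output map therefore judges one 70 x 70 patch of the input.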
In a vanilla GAN, the generator loss (L_G) and the discriminator loss (L_D) are defined as

L_G = F(D(ŷ), 1),
L_D = F(D(ŷ), 0) + F(D(y), 1),

where F can be a binary cross-entropy (BCE) loss or a mean squared error (MSE) loss, y is the real ground-truth target image, and ŷ is the predicted output of the generator. The discriminator input in this case is the output of the generator ŷ.

However, in a conditional GAN, the discriminator input includes both the generator input x and the generator output ŷ. In addition, we add an L1 norm loss to the generator loss to capture the low-frequency content, as explained earlier. This can be written as

L_G = F(D(x, ŷ), 1) + α · L1(ŷ, y),
L_D = F(D(x, ŷ), 0) + F(D(x, y), 1),        (1)

where the hyper-parameter α is used to emphasize the weight of the L1 loss and is optimized empirically.

Training

For training, 26,592 OCT cross-sections from 101 glaucomatous eyes were prepared. These eyes were imaged in at least four visits. We conducted two different experiments. In experiment A, the model was trained on sequences of four images, using the first 3 visits as the input and the fourth visit as the output of the model. In experiment B, the model was trained on sequences of three visits, using the first two visits as the input and the third visit as the output of the model. We arranged the training and validation split percentages as 75% and 15%, respectively. The fewer the visits required as input, the more useful the model becomes when limited data are available for a given patient.

Fig. 5: Training the conditional GAN. (Top) the generator optimization step and (bottom) the discriminator optimization step.

In GAN training, it is often seen that the discriminator detects the outputs of the generator as fake images at very early stages of the training process, which stops the generator from learning. To prevent this issue, we alternated between four optimization steps on the generator and one optimization step on the discriminator, as this experimentally resulted in optimum performance. The Adam optimizer was used for the generator with momentum parameters β1 = 0.5 and β2 = 0.999. For the discriminator, stochastic gradient descent (SGD) with momentum (0.5) was used. A batch size of eight was used in training, and dropout was applied.
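Equation (1) can be written out directly in code. The sketch below uses the MSE variant of F and, as an assumption, the pix2pix default weight α = 100 (the paper states only that α is tuned empirically); `d_fake` and `d_real` stand for the discriminator's patch maps on (x, ŷ) and (x, y).

```python
import numpy as np

def mse(pred, target):
    """F in Eq. (1), MSE variant (BCE is the other option)."""
    return float(np.mean((pred - target) ** 2))

def l1(a, b):
    """Mean absolute error, the L1 term of Eq. (1)."""
    return float(np.mean(np.abs(a - b)))

def generator_loss(d_fake, y_hat, y, alpha=100.0):
    """L_G = F(D(x, y_hat), 1) + alpha * L1(y_hat, y).
    alpha = 100 follows the pix2pix default and is an assumption."""
    return mse(d_fake, np.ones_like(d_fake)) + alpha * l1(y_hat, y)

def discriminator_loss(d_fake, d_real):
    """L_D = F(D(x, y_hat), 0) + F(D(x, y), 1)."""
    return (mse(d_fake, np.zeros_like(d_fake))
            + mse(d_real, np.ones_like(d_real)))
```

A discriminator that outputs all zeros on fakes and all ones on reals incurs zero loss; a generator whose output fools the discriminator (patch map of ones) and matches y exactly also incurs zero loss, which matches the fixed points of Eq. (1).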
Results

The test data split comprised the scans of 16 patients that were not used during training. For experiment A, a total of 2,379 input-output pairs were prepared from the 8 patients that had at least four visits, while for experiment B, a total of 3,111 input-output pairs were prepared from all 16 patients, as they all had at least three visits. Unlike the conventional protocol, we followed [32] in applying dropout and batch normalization, using the test batch statistics, at test time as well.

Fig. 6: Examples of ground-truth macular OCT images (left column) vs. the corresponding GAN-generated images (right column). The red lines highlight, from left to right, the nasal peak, the foveal pit, and the temporal peak.

Examples of ground-truth macular OCT images and the corresponding GAN-generated images are shown in Fig. 6. These cross-sections pass through the fovea, which is located in the center of the macula and where visual acuity is best. To evaluate the accuracy of the generated images, the similarity between the original B-scans (i.e., ground truth) and the constructed B-scans (i.e., predicted B-scans) is measured by the structural similarity index measure (SSIM). SSIM takes into account changes in luminance, contrast, and structure. The SSIM index ranges in [0, 1], where 0 indicates no similarity between two images and 1 implies perfect similarity. The SSIM index has been shown to be in accordance with human visual perception and human grading of image similarity [37]. Since SSIM is measured locally, it is less sensitive to noise compared with other image similarity measurements such as the mean squared error (MSE) and peak signal-to-noise ratio (PSNR) [38]. Another advantage of SSIM over MSE or PSNR is that SSIM measures the perceived change in structural information, taking into account the inter-dependencies of spatially proximate pixels and not just the error [39]. The SSIM results are summarized in Table 1.
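For reference, the SSIM formula from [37] can be evaluated over a pair of images as below. Note that the paper's evaluation uses the standard locally windowed SSIM, so this single-window (global-statistics) version is a simplification for illustration only.

```python
import numpy as np

def ssim_global(x, y, data_range=1.0):
    """Single-window SSIM over whole images: combines the luminance,
    contrast, and structure terms of Wang et al. [37] using global
    means, variances, and covariance (local windowing omitted)."""
    c1 = (0.01 * data_range) ** 2   # stabilizing constants from [37]
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return (((2 * mx * my + c1) * (2 * cov + c2))
            / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))

rng = np.random.default_rng(0)
img = rng.random((32, 32))
print(round(ssim_global(img, img), 6))  # identical images -> 1.0
```

Identical images score 1.0, and any perturbation lowers the score, which is the behavior the evaluation in Table 1 relies on.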
Table 1: Evaluation of the SSIM metric for the results of experiments "A" and "B".

Experiment           Average SSIM
A (with 3 visits)    0.8325
B (with 2 visits)    0.8336
Discussion

Visual inspection of the OCT images (i.e., ground truth) and the GAN-generated images (see Fig. 6) demonstrates good structural agreement between them. Furthermore, the network has a denoising effect on the images, which is evident when comparing the noise in the background of the generated and ground-truth images.

The SSIM results are above 0.83 for both experiments, which demonstrates the accuracy of our method. In addition, both experiments have very close SSIM values, suggesting that it is actually adequate to use two visits to make the predictions, and that adding a third visit does not help the model make better predictions. This is practically very useful, as it makes it possible to make predictions with a limited number of visits.

Fig. 7: An example of an artifact that can be generated by the GAN network and result in a corrupted image and wrong predictions. (Left) ground truth and (right) predicted image. A duplicate image representation can be observed.

A limitation of our method, although uncommon, is the artifacts that can exist in the predicted image. An example of an artifact is shown in Fig. 7, where the network superimposed duplicate cross-sections on top of each other. This is a weakness of current GAN methods and represents a potential area for further research. Increasing the training dataset size, constraining the cost function with more priors, or implementing a hybrid model of learning-based and rule-based models may help solve this problem in the future, but it remains, for now, an open problem for neural-network-based generative models in medical image analysis.
Conclusion

Glaucoma is an eye disease that results in irreversible vision loss and is the second leading cause of blindness worldwide. Monitoring glaucoma patients for signs of progression and slowing the rate of decay is the ultimate goal of glaucoma treatment. Clinicians depend on retinal structural information obtained with optical coherence tomography for tracking disease progression.

In this work, we built a learning-based generative model using a conditional GAN architecture to predict glaucoma progression over time by reconstructing macular cross-sectional images from three or two prior measurements separated by six-month intervals, with no constraints on the stage of the disease at the baseline. We conducted two experiments, one with the prior three visits as input to the model and the other with only two prior visits as input. In the first experiment, a total of 2,379 predictions were made for eight patients based on the previous three visits, and the predicted images demonstrated high similarity to the ground-truth images, with an SSIM of 0.8325. In the second experiment, a total of 3,111 predictions were made based on two prior visits, resulting in an SSIM of 0.8336. This shows that only two visits may actually be sufficient to make the predictions.

A limitation of our method is the duplicate image artifacts that were observed in some predicted images; future work may investigate this challenge. In addition, automated segmentation-based techniques tailored to this problem may be used as an alternative way to accurately measure layer thicknesses to evaluate the quality of the generated images.
References
1. Quigley, H.A., Broman, A.T.: The number of people with glaucoma worldwide in 2010 and 2020. British Journal of Ophthalmology, (3), 262–267 (2006)
2. Tham, Y.C., et al.: Global prevalence of glaucoma and projections of glaucoma burden through 2040: a systematic review and meta-analysis. Ophthalmology, (11), 2081–2090 (2014)
3. Quigley, H.A., Dunkelberger, G.R., Green, W.R.: Retinal ganglion cell atrophy correlated with automated perimetry in human eyes with glaucoma. American Journal of Ophthalmology, (5), 453–464 (1989)
4. Quigley, H.A., et al.: Retinal ganglion cell death in experimental glaucoma and after axotomy occurs by apoptosis. Investigative Ophthalmology & Visual Science, (5), 774–786 (1995)
5. Weinreb, R.N., Aung, T., Medeiros, F.A.: The pathophysiology and treatment of glaucoma: a review. JAMA, (18), 1901–1911 (2014)
6. Nouri-Mahdavi, K., Caprioli, J.: Measuring rates of structural and functional change in glaucoma. British Journal of Ophthalmology, (7), 893–898 (2015)
7. Coleman, A.: Glaucoma. The Lancet, (9192), 1803–1810 (1999)
8. Weinreb, R.N., Aung, T., Medeiros, F.A.: The pathophysiology and treatment of glaucoma: a review. JAMA, (18), 1901–1911 (2014)
9. Weinreb, R.N., et al.: Risk assessment in the management of patients with ocular hypertension. American Journal of Ophthalmology, (3), 458–467 (2004)
10. Raza, A.S., Hood, D.C.: Evaluation of the structure-function relationship in glaucoma using a novel method for estimating the number of retinal ganglion cells in the human retina. Investigative Ophthalmology & Visual Science, (9), 5548–5556 (2015)
11. Sharma, P., Sample, P.A., Zangwill, L.M., Schuman, J.S.: Diagnostic tools for glaucoma detection and management. Survey of Ophthalmology, (6), S17–S32 (2008)
12. Alexandrescu, C., et al.: Confocal scanning laser ophthalmoscopy in glaucoma diagnosis and management. Journal of Medicine and Life, (3), 229 (2010)
13. Andreou, P.A., et al.: A comparison of HRT II and GDx imaging for glaucoma detection in a primary care eye clinic setting. Eye, (8), 1050–1055 (2007)
14. Belghith, A., et al.: A unified framework for glaucoma progression detection using Heidelberg Retina Tomograph images. Computerized Medical Imaging and Graphics, (5), 411–420 (2014)
15. Lin, S.C., et al.: Optic nerve head and retinal nerve fiber layer analysis: a report by the American Academy of Ophthalmology. Ophthalmology, (10), 1937–1949 (2007)
16. Na, J.H., et al.: The glaucoma detection capability of spectral-domain OCT and GDx-VCC deviation maps in early glaucoma patients with localized visual field defects. Graefe's Archive for Clinical and Experimental Ophthalmology, (10), 2371–2382 (2013)
17. Stein, J.D., Talwar, N., LaVerne, A.M., Nan, B., Lichter, P.R.: Trends in use of ancillary glaucoma tests for patients with open-angle glaucoma from 2001 to 2009. Ophthalmology, (4), 748–758 (2012)
18. Lee, W.J., et al.: Rates of ganglion cell-inner plexiform layer thinning in normal, open-angle glaucoma and pseudoexfoliation glaucoma eyes: a trend-based analysis. Investigative Ophthalmology & Visual Science, (2), 599–604 (2019)
19. Wadhwani, M., et al.: Test-retest variability of retinal nerve fiber layer thickness and macular ganglion cell-inner plexiform layer thickness measurements using spectral-domain optical coherence tomography. Journal of Glaucoma, (5), e109–e115 (2015)
20. Heijl, A., Lindgren, A., Lindgren, G.: Test-retest variability in glaucomatous visual fields. American Journal of Ophthalmology, (2), 130–135 (1989)
21. Kim, K.E., Yoo, B.W., Jeoung, J.W., Park, K.H.: Long-term reproducibility of macular ganglion cell analysis in clinically stable glaucoma patients. Investigative Ophthalmology & Visual Science, (8), 4857–4864 (2015)
22. Caprioli, J., et al.: A method to measure and predict rates of regional visual field decay in glaucoma. Investigative Ophthalmology & Visual Science, (7), 4765–4773 (2011)
23.
Hood, D.C., Kardon, R.H.: A framework for comparing structural and functional measures of glaucomatous damage. Progress in Retinal and Eye Research, (6), 688–710 (2007)
24. Leung, C.K.S., et al.: Evaluation of retinal nerve fiber layer progression in glaucoma: a study on optical coherence tomography guided progression analysis. Investigative Ophthalmology & Visual Science, (1), 217–222 (2010)
25. Edlinger, F.S., et al.: Structural changes of macular inner retinal layers in early normal-tension and high-tension glaucoma by spectral-domain optical coherence tomography. Graefe's Archive for Clinical and Experimental Ophthalmology, (7), 1245–1256 (2018)
26. Anraku, A., et al.: Baseline thickness of macular ganglion cell complex predicts progression of visual field loss. Graefe's Archive for Clinical and Experimental Ophthalmology, (1), 109–115 (2014)
27. Zhang, X., et al.: Predicting development of glaucomatous visual field conversion using baseline Fourier-domain optical coherence tomography. American Journal of Ophthalmology, 163, 29–37 (2016)
28. Miraftabi, A., et al.: Macular SD-OCT outcome measures: comparison of local structure-function relationships and dynamic range. Investigative Ophthalmology & Visual Science, (11), 4815–4823 (2016)
29. Akman, A., Bayer, A., Nouri-Mahdavi, K.: Optical Coherence Tomography in Glaucoma: A Practical Guide. 1st edn. Springer (2018)
30. Parikh, R.S., et al.: Diagnostic capability of optical coherence tomography (Stratus OCT 3) in early glaucoma. Ophthalmology, (12), 2238–2243 (2007)
31. Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)
32. Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1125–1134 (2017)
33.
Tran, D., et al.: Learning spatiotemporal features with 3D convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4489–4497 (2015)
34. Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer, Cham (2015)
35. Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)
36. Li, C., Wand, M.: Precomputed real-time texture synthesis with Markovian generative adversarial networks. In: European Conference on Computer Vision, pp. 702–716. Springer, Cham (2016)
37. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, (4), 600–612 (2004)
38. Dosselmann, R., Yang, X.D.: A comprehensive assessment of the structural similarity index. Signal, Image and Video Processing, (1), 81–91 (2011)
39. Marson, A.M., Stern, A.: Horizontal resolution enhancement of autostereoscopy three-dimensional displayed image by chroma subpixel downsampling. Journal of Display Technology, 11