Multi-scale GCN-assisted two-stage network for joint segmentation of retinal layers and disc in peripapillary OCT images
Jiaxuan Li, Peiyao Jin, Jianfeng Zhu, Haidong Zou, Xun Xu, Min Tang, Minwen Zhou, Yu Gan, Jiangnan He, Yuye Ling, Yikai Su
John Hopcroft Center for Computer Science, Shanghai Jiao Tong University, Shanghai 200240, China
Department of Preventative Ophthalmology, Shanghai Eye Disease Prevention and Treatment Center, Shanghai Eye Hospital, Shanghai 200040, China
Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University, Shanghai 200080, China
Department of Electrical and Computer Engineering, The University of Alabama, AL 35487, USA
State Key Lab of Advanced Optical Communication Systems and Networks, Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
* [email protected] and [email protected]

Abstract:
An accurate and automated tissue segmentation algorithm for retinal optical coherence tomography (OCT) images is crucial for the diagnosis of glaucoma. However, due to the presence of the optic disc, the anatomical structure of the peripapillary region of the retina is complicated and challenging to segment. To address this issue, we developed a novel graph convolutional network (GCN)-assisted two-stage framework to simultaneously label the nine retinal layers and the optic disc. Specifically, a multi-scale global reasoning module is inserted between the encoder and decoder of a U-shape neural network to exploit anatomical prior knowledge and perform spatial reasoning. We conducted experiments on human peripapillary retinal OCT images. The Dice score of the proposed segmentation network is 0.820 on average.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement
1. Introduction
Glaucoma is the leading cause of irreversible blindness globally, affecting approximately 64.3 million individuals worldwide [1]. In China, around 13.12 million people were affected by glaucoma in 2015, and the number is projected to reach 25.16 million by 2050 [2], imposing a heavy burden on public health. Currently, the most effective way to prevent glaucoma-related vision loss is early diagnosis and early intervention. In particular, detecting small morphological changes of retinal layers, such as the thinning of the retinal nerve fiber layer (RNFL) and ganglion cell layer (GCL), has critical value for the precise diagnosis of glaucoma [3].

Optical coherence tomography (OCT), a non-invasive three-dimensional imaging modality, is commonly used in eye clinics for retinal inspection. Thanks to its micrometer-level axial resolution, it provides a unique capability to directly visualize the stratified structure of the retina and assess the corresponding layer thicknesses. The OCT-derived thickness of the peripapillary RNFL is a common indicator in early-stage glaucoma diagnosis [4]. Therefore, precise tissue segmentation of retinal OCT images is a critical step towards successful early diagnosis of glaucoma. However, manual segmentation is time-consuming and laborious, so an accurate automated algorithm is desired by both clinicians and researchers. Numerous automated retinal OCT segmentation techniques have been proposed in the past decades [5–17]. OCTExplorer is a prominent example for retinal layer boundary extraction, which is based on a conventional graph-theory algorithm [8]. Alonso-Caneiro et al. and Tian et al. designed graph-theory-based boundary detectors to extract the choroidal boundary after image pre-processing [6, 14]. Mayer et al. proposed an energy-minimization-based algorithm for retinal nerve fiber layer surface segmentation in circular OCT images [10]. Lang et al.
presented a random forest boundary classifier to segment eight retinal layers in macular cube images [9]. Chiu et al. reported a kernel regression-based segmentation method for retinal OCT images with diabetic macular edema [7].

Fig. 1. Comparison between (a) a macular OCT image and (b) a peripapillary OCT image. The peripapillary image is manually segmented. Ten labels, including RNFL, GCL, IPL, INL, OPL, ONL, IS/OS, RPE, choroid, and optic disc, are used and annotated by different colors. The layer structure follows an arrangement in which the optic nerve head is located in the center of the image, while much thinner retinal layers are stratified on both sides.
Recently, convolutional neural networks (CNNs) have been widely applied to segment images obtained from various modalities, enabling exciting applications [18–26]. Fully convolutional networks (FCN) [27] and U-Net [28] are two popular candidates for medical image segmentation. For retinal OCT image segmentation, most state-of-the-art models [29–38] can be considered variants of encoder-decoder architectures like FCN and U-Net. Roy et al. proposed ReLayNet for end-to-end segmentation of macular OCT B-scans into retinal layers and fluid masses [34]; this work is among the first deep learning-based methods for automated segmentation of retinal OCT images. Yang et al. designed an attention-guided channel-to-pixel convolution network for retinal layer segmentation with choroidal neovascularization [37]: a channel-to-pixel block and an "edge loss function" were used to segment retinal layers with blurry boundaries. To address the large morphological variations of the retinal layers, they also employed the attention mechanism. However, these two techniques mainly targeted macular retinal image segmentation rather than peripapillary retinal images. Zang et al. developed an automated method for peripapillary retinal layer segmentation [38]. The left and right boundaries of the optic disc were first determined based on the estimated position of Bruch's membrane opening in radially resampled B-scans; the retinal layer boundaries were then segmented by combining a convolutional neural network with a multi-weight graph search algorithm. Devalla et al. proposed a dilated-residual U-Net (DRUNET) to facilitate end-to-end segmentation of the individual neural and connective tissues of the optic nerve head [30]. However, they only segmented the retina into five layers and did not fully separate the optic disc from its connected tissues.
In summary, most aforementioned techniques perform segmentation based on the textural features of the OCT images, while the abundant anatomical priors available in peripapillary retinal OCT images are not utilized.

In this manuscript, we report our recent study on explicitly exploiting the prior knowledge embedded in peripapillary OCT images. We argue that all peripapillary OCT images obtained by following a strict clinical protocol should share a similar anatomical arrangement: the optic nerve head, which is a large structure, is located in the center region of the image, while much thinner retinal layers are stratified on both sides, as shown in Fig. 1. Inspired by Jamal et al.'s work that uses a graph to represent the domain knowledge and the structural relationship of the tissues [39], we designed a novel multi-scale graph convolutional network (GCN)-assisted two-stage network for joint segmentation of retinal layers and optic disc in peripapillary OCT images to fully take advantage of the anatomical priors. To show the efficacy of the proposed framework, experiments were conducted on a collected peripapillary OCT dataset, which consists of a total of 122 OCT B-scans from 61 patients, and on another public dataset [40]. The proposed model demonstrated superior performance on both datasets in comparison with the baselines and the state of the art. In the future, we plan to integrate the proposed segmentation framework into a diagnostic workflow for early-stage glaucoma detection. The dataset and the source code are publicly available online at https://github.com/Jiaxuan-Li/MGU-Net.
2. Method
In the current study, 10 labels, including retinal nerve fiber layer (RNFL), ganglion cell layer (GCL), inner plexiform layer (IPL), inner nuclear layer (INL), outer plexiform layer (OPL), outer nuclear layer (ONL), inner/outer photoreceptor segment (IS/OS), retinal pigment epithelium (RPE), choroid, and optic disc, were manually annotated on the OCT dataset to facilitate the training procedure, as illustrated in Fig. 1(b).
The schematic diagram of the proposed segmentation framework is given in Fig. 2. The entire framework consists of three components: the optic disc detection network, the retinal layer segmentation network, and the fusion module. An input OCT image is first processed by the optic disc detection network, which produces a mask indicating the location of the optic disc together with the corresponding feature map. We then apply the mask to the input image to generate a disc-free image, which is fed to the retinal layer segmentation network without being sliced. Similarly, a feature map that delineates the nine retinal layers is obtained and later concatenated with the optic disc feature map from the first stage. Finally, in the fusion module, a softmax activation function generates the segmented output based on the concatenated feature map. The entire framework is trained in an end-to-end fashion: two loss functions are defined to penalize both the intermediate disc detection and the final segmentation.

Specifically, the optic disc detection network and the retinal layer segmentation network are designed to exploit the anatomical priors of the peripapillary region of the retina and to address the segmentation challenges imposed by the variation of thicknesses among different retinal layers, as illustrated in Fig. 3. In the right panel of Fig. 3, the non-disc and disc regions are horizontally arranged in a "non-disc"-"disc"-"non-disc" fashion on the global scale. On the other hand, if we zoom in on the "non-disc" region, as shown in the left panel of Fig. 3, the retinal layers with various thicknesses are instead vertically arranged. Therefore, the optic disc detection and retinal layer segmentation networks are devised with graph reasoning blocks with different design goals: the former performs long-range horizontal spatial reasoning, while the latter captures the multi-level vertical structures.
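The two-stage flow described above can be sketched in PyTorch roughly as follows. This is a minimal illustration, not the authors' exact implementation: the class and parameter names are invented, and each stage network is simply assumed to return a feature map alongside its logits.

```python
import torch
import torch.nn as nn

class TwoStageSegmenter(nn.Module):
    """Sketch of the two-stage flow: disc detection, masking,
    layer segmentation, then fusion of the two feature maps."""

    def __init__(self, disc_net: nn.Module, layer_net: nn.Module, n_classes: int = 11):
        super().__init__()
        self.disc_net = disc_net    # stage 1: optic disc detection
        self.layer_net = layer_net  # stage 2: retinal layer segmentation
        # both stage nets are assumed to return (feature_map, logits)
        # with n_classes feature channels each
        self.fuse = nn.Conv2d(2 * n_classes, n_classes, kernel_size=1)

    def forward(self, x):
        disc_feat, disc_logits = self.disc_net(x)
        # binarize the disc prediction and blank out the disc region
        disc_mask = (disc_logits.argmax(1, keepdim=True) > 0).float()
        disc_free = x * (1.0 - disc_mask)
        layer_feat, _ = self.layer_net(disc_free)
        # concatenate the two stages' features and classify per pixel
        fused = self.fuse(torch.cat([disc_feat, layer_feat], dim=1))
        return torch.softmax(fused, dim=1), disc_logits
```

Both the intermediate `disc_logits` and the fused output are returned so that the two losses of the end-to-end training scheme can supervise each stage.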
The key module used in both the optic disc detection network and the retinal layer segmentation network is the graph reasoning block. Inspired by the graph-based global reasoning network [41, 42], we devised a graph reasoning block to effectively extract the global features of the nine retinal layers and the optic disc.

Fig. 2. Schematic diagram of the proposed segmentation framework. (a) The entire process consists of three main steps: (1) the first stage performs initial optic disc detection; (2) the second stage performs retinal layer segmentation; (3) the outputs from the previous two stages are then fused. (b) A simplified illustration of the multi-scale GCN-assisted U-shape network (MGU-Net). MGU-Net consists of a pair of encoder and decoder, with a multi-scale global reasoning module (MGRM) inserted in between. The detailed structures of MGU-Net and MGRM can be found in Fig. 6 and Fig. 5, respectively.

The schematic diagram of the graph reasoning block, which consists of four operations, is depicted in Fig. 4. First, a local feature map $X \in \mathbb{R}^{C \times H \times W}$ in the latent space is fed to two convolutional layers in parallel to generate two maps: one feature map with reduced dimension and one projection matrix. The reduced-dimension feature map is reshaped to $X_r \in \mathbb{R}^{C_r \times HW}$, while the projection matrix is reshaped and transposed to $X_a \in \mathbb{R}^{C_n \times HW}$. A matrix multiplication between $X_r$ and $X_a^\top$ is then performed to obtain a node feature map $H \in \mathbb{R}^{C_r \times C_n}$, which is sent to a GCN block. We further connect $X$ to a convolutional layer to create an inverse projection matrix $X_d \in \mathbb{R}^{C_n \times H \times W}$. The output of the GCN block is multiplied by the reshaped $X_d \in \mathbb{R}^{C_n \times HW}$ to transform back to the original latent space, and the result is reshaped and passed through another convolutional layer to eventually obtain the feature map $M \in \mathbb{R}^{C \times H \times W}$.
Finally, we perform an element-wise addition of $M$ and $X$ to acquire the new feature map $Y \in \mathbb{R}^{C \times H \times W}$. The new feature map $Y$ thus contains both the global and the original feature information, which enables the block to process long-range contextual information.

Fig. 3. Graph-based representation of a peripapillary retinal OCT image. The image possesses a horizontal layout of "non-disc"-"disc"-"non-disc" on the global scale. A zoomed-in view of the "non-disc" region instead presents a stratified structure. Our segmentation framework is designed to exploit the anatomical priors of the peripapillary region of the retina and to address the segmentation challenges caused by the variation of thickness among retinal layers.

Fig. 4. Schematic diagram of a graph reasoning block, which consists of four operations. First, the original features are projected into the node space. Graph convolutions are then performed on the node-space features to extract global node features. To fuse the global node features with the original features, they are inversely projected back to the original feature space before being fused with the original features.
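The four operations of the graph reasoning block (projection, graph convolution, inverse projection, fusion) can be sketched as a small PyTorch module. This is an illustrative reconstruction under the shapes stated above; the single-layer GCN here (a learned node-mixing step plus a state-update step) is a simplification of the actual GCN block.

```python
import torch
import torch.nn as nn

class GraphReasoningBlock(nn.Module):
    """Minimal sketch of the graph reasoning block:
    project -> graph-convolve -> inverse-project -> fuse."""

    def __init__(self, c: int, c_r: int, c_n: int):
        super().__init__()
        self.reduce = nn.Conv2d(c, c_r, 1)    # reduced-dimension features
        self.proj = nn.Conv2d(c, c_n, 1)      # projection matrix
        self.inv_proj = nn.Conv2d(c, c_n, 1)  # inverse projection matrix
        # single-layer GCN over C_n nodes with C_r-dim node states
        self.gcn_adj = nn.Conv1d(c_n, c_n, 1)     # learned node mixing
        self.gcn_weight = nn.Conv1d(c_r, c_r, 1)  # node state update
        self.expand = nn.Conv2d(c_r, c, 1)    # back to C channels

    def forward(self, x):
        b, c, h, w = x.shape
        x_r = self.reduce(x).flatten(2)              # (B, C_r, HW)
        x_a = self.proj(x).flatten(2)                # (B, C_n, HW)
        nodes = torch.bmm(x_r, x_a.transpose(1, 2))  # H: (B, C_r, C_n)
        # graph convolution: mix across nodes, then update node states
        nodes = nodes + self.gcn_adj(nodes.transpose(1, 2)).transpose(1, 2)
        nodes = torch.relu(self.gcn_weight(nodes))
        x_d = self.inv_proj(x).flatten(2)            # (B, C_n, HW)
        y = torch.bmm(nodes, x_d).view(b, -1, h, w)  # back to latent space
        return x + self.expand(y)                    # residual fusion with X
```

The residual addition at the end realizes the element-wise fusion of $M$ and $X$, so the output carries both the global node features and the original local features.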
To address the segmentation challenges caused by the large variation of retinal thicknesses between layers, we propose a multi-scale global reasoning module (MGRM) to conduct global reasoning on high-level semantic features across all nine retinal layers. MGRM is composed of multi-scale pooling operators and graph reasoning blocks, and uses multiple effective receptive fields to capture and learn the features of retinal layers with different thicknesses. The MGRM splits the input into four different paths, three of which are equipped with pooling layers of different kernel sizes followed by graph reasoning blocks; inspired by [43], the kernel sizes are set to 2 × 2, 3 × 3, and a larger third size.

Fig. 5. The multi-scale global reasoning module is composed of four branches. No pooling operator is used in the first branch; the other three branches contain pooling operators with different kernel sizes.

We use a U-shape network developed on the basis of the classic U-Net [28] as the backbone of the proposed multi-scale GCN-assisted U-shape network (MGU-Net). The schematic diagram of MGU-Net is presented in Fig. 6. The MGRM is located in the center of the network to connect the encoding and decoding paths. It captures additional long-range contextual features, which are difficult to acquire with a conventional convolutional network. After several convolution and max-pooling operations in the encoder, the feature map provides rich spatial features, which are informative for aggregating features and extracting nodes in the following MGRM. It should be noted that the sizes of the max-pooling kernels differ between the optic disc detection network and the retinal layer segmentation network: each max-pooling kernel is set to (2, 2, 2) in the retinal layer segmentation network, as in Fig. 6, while that of the optic disc detection network is set to (2, 4, 2) to better capture the larger-scale semantic information represented by the optic disc.
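The four-branch structure of the MGRM can be sketched as follows. This is an assumption-laden illustration: the pooling sizes `(2, 3, 6)` are placeholders (the paper specifies 2 × 2 and 3 × 3 plus a larger third size), and a small convolutional block stands in for the graph reasoning block on each branch so the example stays self-contained.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MGRM(nn.Module):
    """Sketch of the multi-scale global reasoning module: one branch
    without pooling plus three pooled branches, each followed by a
    reasoning block, then upsampling and channel-wise fusion."""

    def __init__(self, c: int, pool_sizes=(2, 3, 6)):
        super().__init__()
        self.pool_sizes = pool_sizes
        # stand-in for the graph reasoning block on each branch
        self.reason = nn.ModuleList(
            nn.Sequential(nn.Conv2d(c, c, 3, padding=1), nn.ReLU())
            for _ in range(1 + len(pool_sizes))
        )
        self.fuse = nn.Conv2d(c * (1 + len(pool_sizes)), c, 1)

    def forward(self, x):
        h, w = x.shape[2:]
        branches = [self.reason[0](x)]  # first branch: no pooling
        for i, k in enumerate(self.pool_sizes, start=1):
            p = F.avg_pool2d(x, kernel_size=k)  # shrink to a coarser scale
            p = self.reason[i](p)               # reason at this scale
            branches.append(F.interpolate(p, size=(h, w), mode="bilinear",
                                          align_corners=False))
        return self.fuse(torch.cat(branches, dim=1))
```

Each pooled branch sees a larger effective receptive field, which is the mechanism the text describes for handling retinal layers of very different thicknesses.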
In our two-stage segmentation framework, two loss functions $\mathcal{L}_{seg1}$ and $\mathcal{L}_{seg2}$ are proposed to supervise the two MGU-Nets and to enforce them to segment the optic disc and nine retinal layers more accurately in an end-to-end fashion. The total loss $\mathcal{L}$ in this study is the weighted sum of the two losses:

$$\mathcal{L} = \mathcal{L}_{seg1} + \lambda \mathcal{L}_{seg2} \quad (1)$$

Fig. 6. The structure of MGU-Net comprises an encoder, the MGRM, and a decoder. The skip connections concatenate low-level features from the encoding path to the corresponding high-level features in the decoding path.

where $\lambda$ weights the two losses. Each $\mathcal{L}_{seg}$ is defined as the sum of the Dice loss and the cross-entropy loss:

$$\mathcal{L}_{seg} = \mathcal{L}_{dice} + \mathcal{L}_{ce} \quad (2)$$

where

$$\mathcal{L}_{dice} = 1 - \frac{1}{M} \sum_{i=1}^{M} \frac{2\sum_{x \in \Omega} p_i(x)\, g_i(x)}{\sum_{x \in \Omega} p_i(x) + \sum_{x \in \Omega} g_i(x)}, \qquad \mathcal{L}_{ce} = -\frac{1}{M} \sum_{i=1}^{M} \sum_{x \in \Omega} g_i(x) \log p_i(x) \quad (3)$$

in which $g_i(x)$ and $p_i(x)$ denote the ground truth and the predicted probability of pixel $x$ belonging to class $i$, and $M$ is the number of classes in the segmentation network.
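A per-stage Dice-plus-cross-entropy loss of this form can be written compactly in PyTorch. This is a sketch consistent with the description above, not the authors' exact code; the function names are illustrative.

```python
import torch
import torch.nn.functional as F

def dice_ce_loss(logits, target, eps=1e-6):
    """Per-stage loss: Dice loss plus cross-entropy loss.

    logits: (B, M, H, W) raw network outputs; target: (B, H, W) class indices.
    """
    m = logits.shape[1]
    probs = torch.softmax(logits, dim=1)
    onehot = F.one_hot(target, m).permute(0, 3, 1, 2).float()
    inter = (probs * onehot).sum(dim=(2, 3))          # per-class overlap
    denom = probs.sum(dim=(2, 3)) + onehot.sum(dim=(2, 3))
    dice = 1.0 - (2.0 * inter / (denom + eps)).mean()  # averaged over classes
    ce = F.cross_entropy(logits, target)
    return dice + ce

def total_loss(logits_disc, target_disc, logits_layer, target_layer, lam=2.0):
    """Total loss: stage-1 loss plus lambda times the stage-2 loss
    (lambda is set to 2 in the experiments)."""
    return dice_ce_loss(logits_disc, target_disc) + \
        lam * dice_ce_loss(logits_layer, target_layer)
```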
3. Experiment Design
To verify the effectiveness of the framework, we conducted a series of experiments on collected peripapillary retinal OCT images. All images were de-identified, and the procedure was approved by the Internal Review Board of Shanghai General Hospital. The dataset covers 61 different subjects, for each of whom 12 radial OCT B-scans were collected at the Ophthalmology Department of Shanghai General Hospital using a DRI OCT-1 Atlantis (Topcon Corporation, Tokyo, Japan). The clinical characteristics of the dataset are provided in Table 1. The image size is 1024 × 992 pixels, corresponding to a lateral field of view of 20.48 mm.

Table 1. Clinical characteristics of subjects in the dataset.

Characteristics: Value
Subjects: 61
Demographic profile
  Male, n (%): 26 (42.6%)
  Age in yrs, mean ± standard deviation: 66.40 ±
  Mean ± standard deviation: 12.52 ±
  Mean ± standard deviation: 26.53 ±
Ophthalmological disease
  High myopia, n (%): 34 (55.7%)
  Peripapillary atrophy, n (%): 38 (62.3%)
  Cataract, n (%): 29 (47.5%)
In addition, we tested the proposed technique on the Duke SD-OCT dataset, which was collected by Chiu et al. using a Spectralis HRA+OCT (Heidelberg Engineering, Heidelberg, Germany) [40]. It consists of 110 OCT B-scans obtained from 10 patients with diabetic macular edema (DME), each with a size of 496 × 768 pixels. More details about this dataset can be found in [40].
The proposed method was implemented in PyTorch and trained on NVIDIA Tesla V100 GPUs. During training, the initial learning rate was 0.001 and was reduced by an order of magnitude after every 20 epochs; the total number of epochs was 50. Momentum and weight decay coefficients were set to 0.9 and 0.0001, respectively. We used the Adam optimizer to train the model in mini-batches of size 1. The parameter $\lambda$ in Eq. (1) was empirically set to 2. To ensure a fair comparison, the training hyperparameters were kept constant to achieve the best performance for all the comparative methods.

We compared our model with state-of-the-art techniques, including U-Net [28], ReLayNet [34], and DRUNET [30]. U-Net is a popular segmentation network for medical images. ReLayNet is specially designed for segmenting retinal layers and fluid in macular OCT images. DRUNET was proposed to segment optic nerve head tissues in peripapillary OCT images using a dilated and residual U-shape network. It is worth noting that U-Net has four down-sampling and four up-sampling operators, while ReLayNet and DRUNET both perform three down-sampling and three up-sampling operations.
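The training schedule above maps onto standard PyTorch components roughly as follows. This is a sketch, with a placeholder model; "momentum 0.9" is interpreted here as Adam's first-moment coefficient, which is an assumption on our part.

```python
import torch

# Placeholder model standing in for the two-stage network.
model = torch.nn.Conv2d(1, 11, kernel_size=1)

# Adam with beta1 = 0.9 (the stated "momentum") and weight decay 1e-4.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
                             betas=(0.9, 0.999), weight_decay=1e-4)

# Reduce the learning rate by an order of magnitude every 20 epochs,
# for 50 epochs in total.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)

for epoch in range(50):
    # ... one pass over the training set with batch size 1 ...
    scheduler.step()
```

After the two decays (at epochs 20 and 40), the learning rate ends at 1e-5.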
We repeated the experiments on the Duke SD-OCT dataset, which is macula-centered. Since the proposed two-stage framework was originally designed for segmenting peripapillary OCT B-scans, we removed the first stage and used only the second stage to compete against the state-of-the-art models, including U-Net, ReLayNet, DRUNET, and published results on this public dataset [34, 35, 40, 45].

3.4. Ablation study
To assess the contribution of each component of the proposed framework, we performed several ablation studies on the collected dataset. We compared the performance of the proposed model with that of (1) a one-stage baseline, (2) a two-stage baseline, (3) the framework without graph reasoning blocks in the multi-scale global reasoning module, and (4) the framework with a single-scale global reasoning module in the two-stage segmentation framework.
The Dice score (DSC) and pixel accuracy (PA) between predictions and segmentation references were used for quantitative evaluation of segmentation performance. They are calculated as:

$$\mathrm{DSC} = \frac{2\,|X \cap Y|}{|X| + |Y|} \quad (4)$$

and

$$\mathrm{PA} = \frac{|X \cap Y|}{|Y|} \quad (5)$$

where $X$ is the region of the prediction and $Y$ is the region of the ground truth. The Dice score measures the overlap between the prediction and the ground truth, while the pixel accuracy measures the true positive rate of the predicted results with respect to the ground truth.
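For a single binary class mask, the two metrics reduce to a few lines of NumPy. This sketch follows Eqs. (4) and (5) directly; the function names are our own.

```python
import numpy as np

def dice_score(pred, gt):
    """DSC = 2|X ∩ Y| / (|X| + |Y|) for one binary class mask."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def pixel_accuracy(pred, gt):
    """PA = |X ∩ Y| / |Y|: fraction of ground-truth pixels recovered."""
    inter = np.logical_and(pred, gt).sum()
    return inter / gt.sum()

# Toy check: prediction covers 2 of 3 ground-truth pixels plus 1 extra,
# so both DSC and PA come out to 2/3.
pred = np.array([[1, 1, 0], [1, 0, 0]], dtype=bool)
gt = np.array([[1, 1, 1], [0, 0, 0]], dtype=bool)
```

In a multi-class setting, these are evaluated per tissue class (one binary mask per label) and then averaged, which is how the per-tissue rows in the result tables are obtained.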
4. Results
The experimental results on the collected dataset are listed in Table 2 and Table 3. The proposed method outperforms the selected state-of-the-art methods in most optic nerve head tissue categories, except for the optic disc (both Dice and pixel accuracy) and the RPE (pixel accuracy): the overall average Dice score of the proposed network is 1.6%, 1.5%, and 1.4% higher than that of ReLayNet, U-Net, and DRUNET, respectively. A similar trend is observed in the pixel accuracy results.

Fig. 7 shows the segmentation results obtained by various techniques on a normal peripapillary OCT image. The segmented image obtained by the proposed method, shown in Fig. 7(f), presents the best visual quality, while various artifacts are visible in the others. We roughly categorize the artifacts into two groups:

1) layer discontinuities, where the stratified retinal layers are unexpectedly terminated in the horizontal direction, as observed in ReLayNet (Fig. 7(d)) and DRUNET (Fig. 7(e)) and marked by white stars;

2) scattered labels, where an isolated region enclosed by a certain layer is recognized as another, as pointed out by the yellow arrows in Fig. 7(c)-(d). This type of error is observed in all but the proposed method.

We suggest that the improved performance can be ascribed to the additional prior knowledge incorporated in the proposed framework. For conventional CNN-based algorithms, the pixel-level classification is often sensitive to textural details, whereas we regularize it with extra spatial constraints in the proposed technique: the segmentation results have to comply with the learned spatial layout, which dictates that the retinal layers must be arranged as a horizontally continuous and vertically stratified structure.

A similar observation can be made on a diseased sample with a retinal lesion, as illustrated in Fig. 8. It is clear that the blurred boundaries and the reduced contrast in the lesion area, circled out in Fig. 8(a), are challenging for conventional CNN-based algorithms. On the other hand, while mis-labelling is also observed in the proposed method, the stratified structure of the retinal layers is well preserved, which again demonstrates the efficacy of the proposed technique.

Table 2. Dice score (%) for the segmentation on the collected dataset obtained by different methods. The best performance is marked by "*", the second-best by "**". Improvement is defined as the difference between the proposed method and the best performance obtained among the other techniques.
[Tables 2 and 3: per-tissue Dice score and pixel accuracy (mean ± standard deviation) for U-Net, ReLayNet, DRUNET, and the proposed method, with rows for the average, layer average, RNFL, GCL, IPL, INL, OPL, ONL, IS/OS, RPE, choroid, and disc.]

The experimental results obtained on the Duke SD-OCT dataset are reported in Table 4. It is worth mentioning that the Duke dataset is macula-centered, while our segmentation framework is designed for disc-centered images; therefore, only one stage of the proposed MGU-Net is used in this experiment. Nonetheless, the proposed model achieved the best performance in the ONL-ISM layer and the second-best performance in four retinal layers. The average Dice score achieved by the proposed MGU-Net is the highest if we do not take the results reported by Roy et al. into account, and it outperforms the ReLayNet we reproduced. We also display the segmented images in Fig. 9. The proposed MGU-Net manifests better visual quality in comparison with other OCT retinal image segmentation methods. Consistent with the observations made in Section 4.1, artifacts such as layer discontinuities, marked by white stars, are present in the images segmented by U-Net and DRUNET in Fig. 9(c) and Fig. 9(e), respectively.
The quantitative results of the ablation study on our dataset are listed in Table 5 and Table 6. The baseline is a U-shape network with three down-sampling operations, which is one level shallower than the U-Net used in the previous section. We first compared the results of the two-stage baseline with those of the one-stage baseline: when a two-stage network is adopted to segment the optic disc and retinal layers separately, the Dice score improved by 0.4%, with a corresponding improvement in pixel accuracy.

Fig. 8. Segmentation of a peripapillary OCT image with a retinal lesion. (a) Original image. (b) Ground truth. (c) U-Net's prediction. (d) ReLayNet's prediction. (e) DRUNET's prediction. (f) Proposed method's prediction. Scattered labels are pointed out by yellow arrows, layer discontinuities are marked by white stars, and the retinal lesion is circled by a white dashed line. Magnified views are also provided for better visualization.
5. Discussion
In the current study, 122 OCT B-scans from 61 individuals were manually annotated and included in our experiment. While the size of the dataset is relatively small, it should be noted that

Table 4. Dice score for segmentation results on the Duke SD-OCT dataset by different methods and expert 2 annotations. The best performance is marked by "*", the second-best by "**".
Method                RNFL   GCL-IPL  INL     OPL     ONL-ISM  ISE    OS-RPE  Average
Manual expert 2 [45]  0.86   0.89     0.80    0.72    0.88     0.86   0.84    0.84
Chiu [40]             0.86   0.88     0.73    0.73    0.86     0.86   0.80    0.82
Chakravarty [45]      0.86   0.89     0.80    0.72    0.88     0.86   0.84    0.84
U-Net [34]            0.86   0.91     0.83**  0.81**  0.91     0.90   0.83    0.86
Roy [34]              0.90*  0.94*    0.87*   0.84*   0.93     0.92*  0.90*   0.90*
Wang [35]             0.86   0.90     0.78    0.78    0.94**   0.90   0.86    0.86
U-Net
ReLayNet
DRUNET
MGU-Net (ours)

qualified human data are difficult to acquire, and manually annotating 10 layers in one OCT image is very expensive and time-consuming. To partially overcome this issue, we performed data augmentation on the training set, including horizontal flipping, additive Gaussian noise, and contrast adjustment.
It is well known that commonly presented artifacts, including vessel shadows and retinal lesions, may influence automated segmentation algorithms. Such artifacts often cause blurred layer boundaries, diminished tissue texture, and altered image contrast, which can potentially lead to a decrease in segmentation accuracy.

Table 5. Ablation study of each part in our framework, comparing the Dice score (%). The best performance is marked by "*".

Method              One-stage  Two-stage  GRB  MSP
Baseline                ✓
Two-stage baseline      ✓          ✓
Proposed w/o MSP        ✓          ✓       ✓
Proposed w/o GRB        ✓          ✓            ✓
Proposed                ✓          ✓       ✓    ✓
[The average, RNFL, and GCL Dice columns report mean ± standard deviation for each configuration.]

A good illustration is provided in Fig. 8, where the retinal lesion is circled in the right panel of Fig. 8(a). For conventional deep learning-based algorithms such as U-Net, ReLayNet, and DRUNET, labelling errors are visible, as shown in Fig. 8(c)-(e). On the other hand, the proposed method is more robust to this perturbation. We believe this may be because these algorithms mainly depend on the texture details of the images to perform pixel-level classification, while the proposed method explicitly imposes spatial constraints that regularize the task and ensure a better visual outcome in this case, as illustrated in Fig. 8(f).

It is also worth mentioning that the proposed method might be affected by label noise. Due to limited resources, the manual segmentation of the OCT images was performed collaboratively by two graders under the supervision of a retinal specialist, such that each image was segmented by only one grader. Therefore, it is possible that small label noises were introduced, because the two graders may have different preferences and styles during annotation [45].
This might slightly impair the segmentation performance. To further address these issues, we plan to perform detection, removal, and inpainting of artifact regions and to tackle the label noise in future work [46, 47].

One potential limitation of the proposed framework is that it requires the input images to be well standardized so that all the anatomical assumptions and spatial constraints we have made remain valid. Take the first stage, the optic disc segmentation network, as an example: it relies on the presumption that the optic disc region and the non-disc regions are arranged in a proper horizontal order, as mentioned in Section 2.1. This could be addressed in the future by performing a registration process prior to segmentation, with the goal of registering the optic disc region to a retinal template.
6. Conclusion
To address the challenges imposed by the multi-scale features presented by the optic disc and the retinal layers of various thicknesses, as well as to exploit the existing anatomical priors, a multi-scale global reasoning module, which is capable of long-range contextual spatial reasoning, is proposed and integrated into a U-Net backbone. Specifically, a two-stage framework is constructed to sequentially segment the optic disc and the retinal layers in peripapillary OCT images. We validated the proposed framework on a collected dataset as well as on a public dataset. The experimental results on both datasets showed that the proposed method considerably improves the segmentation performance of optic nerve head tissues compared with other state-of-the-art techniques. The proposed method achieved 82.0% Dice score and 83.0% pixel accuracy on average, which are 1.6% and 2.8% higher than the performance of ReLayNet. More importantly, the visual quality of the segmented images is greatly enhanced, thanks to the anatomical constraints imposed by the multi-scale global reasoning module. In the future, we will incorporate the proposed segmentation network into the workflow of early-stage glaucoma diagnosis. We also believe the proposed architecture can be transferred to other biomedical image segmentation tasks where an abundance of anatomical priors is available. To facilitate the progress of the field, we make our segmentation dataset as well as the code publicly available. To the best of our knowledge, this is the first public peripapillary retinal OCT dataset.
7. Acknowledgement
The computations in this paper were run on the π cluster at Shanghai Jiao Tong University.
8. Disclosures
The authors declare no conflicts of interest.
References
1. S. Resnikoff, D. Pascolini, D. Etya'ale, I. Kocur, R. Pararajasegaram, G. P. Pokharel, and S. P. Mariotti, "Global data on visual impairment in the year 2002," Bull. World Heal. Organ., 844–851 (2004).
2. P. Song, J. Wang, K. Bucan, E. Theodoratou, I. Rudan, and K. Y. Chan, "National and subnational prevalence and burden of glaucoma in China: A systematic analysis," J. Glob. Heal., 020705 (2017).
3. A. V. Mantravadi and N. Vadhar, "Glaucoma," Prim. Care, 437–49 (2015).
4. V. Kansal, J. J. Armstrong, R. Pintwala, and C. Hutnik, "Optical coherence tomography for glaucoma diagnosis: An evidence based meta-analysis," PLoS One, e0190621 (2018).
5. F. A. Almobarak, N. O'Leary, A. S. Reis, G. P. Sharpe, D. M. Hutchison, M. T. Nicolela, and B. C. Chauhan, "Automated segmentation of optic nerve head structures with optical coherence tomography," Invest. Ophthalmol. Vis. Sci., 1161–8 (2014).
6. D. Alonso-Caneiro, S. A. Read, and M. J. Collins, "Automatic segmentation of choroidal thickness in optical coherence tomography," Biomed. Opt. Express, 2795–2812 (2013).
7. S. J. Chiu, M. J. Allingham, P. S. Mettu, S. W. Cousins, J. A. Izatt, and S. Farsiu, "Kernel regression based segmentation of optical coherence tomography images with diabetic macular edema," Biomed. Opt. Express, 1172–94 (2015).
8. M. K. Garvin, M. D. Abramoff, X. Wu, S. R. Russell, T. L. Burns, and M. Sonka, "Automated 3-D intraretinal layer segmentation of macular spectral-domain optical coherence tomography images," IEEE Trans. Med. Imaging, 1436–47 (2009).
9. A. Lang, A. Carass, M. Hauser, E. S. Sotirchos, P. A. Calabresi, H. S. Ying, and J. L. Prince, "Retinal layer segmentation of macular OCT images using boundary classification," Biomed. Opt. Express, 1133–52 (2013).
10. M. A. Mayer, J. Hornegger, C. Y. Mardin, and R. P. Tornow, "Retinal nerve fiber layer segmentation on FD-OCT scans of normal subjects and glaucoma patients," Biomed. Opt. Express, 1358–1383 (2010).
11. S. Naz, A. Ahmed, M. U. Akram, and S. A. Khan, "Automated segmentation of RPE layer for the detection of age macular degeneration using OCT images," in (2016), pp. 1–4.
12. S. Niu, Q. Chen, L. de Sisternes, D. L. Rubin, W. Zhang, and Q. Liu, "Automated retinal layers segmentation in SD-OCT images using dual-gradient and spatial correlation smoothness constraint," Comput. Biol. Med., 116–28 (2014).
13. P. P. Srinivasan, S. J. Heflin, J. A. Izatt, V. Y. Arshavsky, and S. Farsiu, "Automatic segmentation of up to ten layer boundaries in SD-OCT images of the mouse retina with and without missing layers due to pathology," Biomed. Opt. Express, 348–365 (2014).
14. J. Tian, P. Marziliano, M. Baskaran, T. A. Tun, and T. Aung, "Automatic segmentation of the choroid in enhanced depth imaging optical coherence tomography images," Biomed. Opt. Express, 397–411 (2013).
15. C. Wang, Y. X. Wang, and Y. Li, "Automatic choroidal layer segmentation using Markov random field and level set method," IEEE J. Biomed. Heal. Inform., 1694–1702 (2017).
16. J. Wang, M. Zhang, A. D. Pechauer, L. Liu, T. S. Hwang, D. J. Wilson, D. Li, and Y. Jia, "Automated volumetric segmentation of retinal fluid on optical coherence tomography," Biomed. Opt. Express, 1577–1589 (2016).
17. P. Zang, S. S. Gao, T. S. Hwang, C. J. Flaxel, D. J. Wilson, J. C. Morrison, D. Huang, D. Li, and Y. Jia, "Automated boundary detection of the optic disc and layer segmentation of the peripapillary retina in volumetric structural and angiographic optical coherence tomography," Biomed. Opt. Express, 1306–1318 (2017).
18. S. Borkovkina, A. Camino, W. Janpongsri, M. V. Sarunic, and Y. Jian, "Real-time retinal layer segmentation of OCT volumes with GPU accelerated inferencing using a compressed, low-latency neural network," Biomed. Opt. Express (2020).
19. J. De Fauw, J. R. Ledsam, B. Romera-Paredes, S. Nikolov, N. Tomasev, S. Blackwell, H. Askham, X. Glorot, B. O'Donoghue, D. Visentin, G. van den Driessche, B. Lakshminarayanan, C. Meyer, F. Mackinder, S. Bouton, K. Ayoub, R. Chopra, D. King, A. Karthikesalingam, C. O. Hughes, R. Raine, J. Hughes, D. A. Sim, C. Egan, A. Tufail, H. Montgomery, D. Hassabis, G. Rees, T. Back, P. T. Khaw, M. Suleyman, J. Cornebise, P. A. Keane, and O. Ronneberger, "Clinically applicable deep learning for diagnosis and referral in retinal disease," Nat. Med., 1342–1350 (2018).
20. H. Dong, G. Yang, F. Liu, Y. Mo, and Y. Guo, "Automatic brain tumor detection and segmentation using U-Net based fully convolutional networks," in Medical Image Understanding and Analysis,
M. Valdés Hernández andV. González-Castro, eds. (Springer International Publishing, Cham, 2017), pp. 506–517.21. J. Fan, J. Yang, Y. Wang, S. Yang, D. Ai, Y. Huang, H. Song, A. Hao, and Y. Wang, “Multichannel fully convolutionalnetwork for coronary artery segmentation in x-ray angiograms,” IEEE Access , 44635–44643 (2018).22. K. Kamnitsas, C. Ledig, V. F. J. Newcombe, J. P. Simpson, A. D. Kane, D. K. Menon, D. Rueckert, and B. Glocker,“Efficient multi-scale 3d cnn with fully connected crf for accurate brain lesion segmentation,” Med. Image Anal. ,61–78 (2017).23. X. Li, H. Chen, X. Qi, Q. Dou, C. W. Fu, and P. A. Heng, “H-denseunet: Hybrid densely connected unet for liver andtumor segmentation from ct volumes,” IEEE Trans. Med. Imaging , 2663–2674 (2018).24. Y. Ma, H. Hao, H. Fu, J. Zhang, J. Yang, J. Liu, Y. Zheng, and Y. Zhao, “ROSE: A Retinal OCT-Angiography VesselSegmentation Dataset and New Model,” arXiv: 2007.05201 (2020).25. P. Moeskops, M. A. Viergever, A. M. Mendrik, L. S. de Vries, M. J. Benders, and I. Isgum, “Automatic segmentationof mr brain images with a convolutional neural network,” IEEE Trans. Med. Imaging , 1252–1261 (2016).26. C. Wu, Y. Xie, L. Shao, J. Yang, D. Ai, H. Song, Y. Wang, and Y. Huang, “Automatic boundary segmentation ofvascular doppler optical coherence tomography images based on cascaded u-net architecture,” OSA Continuum (2019).27. J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” arXiv: 1411.4038(2014).28. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015,
N. Navab, J. Hornegger, W. M.Wells, and A. F. Frangi, eds. (Springer International Publishing), pp. 234–241.29. Z. Chai, K. Zhou, J. Yang, Y. Ma, Z. Chen, S. Gao, and J. Liu, “Perceptual-assisted adversarial adaptation for choroidsegmentation in optical coherence tomography,” in pp. 1966–1970.30. S. K. Devalla, P. K. Renukanand, B. K. Sreedhar, G. Subramanian, L. Zhang, S. Perera, J. M. Mari, K. S. Chin, T. A.Tun, N. G. Strouthidis, T. Aung, A. H. Thiery, and M. J. A. Girard, “Drunet: a dilated-residual u-net deep learningnetwork to segment optic nerve head tissues in optical coherence tomography images,” Biomed. Opt. Express ,3244–3265 (2018).31. Y. He, A. Carass, Y. Liu, B. M. Jedynak, S. D. Solomon, S. Saidha, P. A. Calabresi, and J. L. Prince, “Deep learningbased topology guaranteed surface and mme segmentation of multiple sclerosis subjects from retinal oct,” Biomed.Opt. Express , 5042–5058 (2019).32. M. Heisler, M. Bhalla, J. Lo, Z. Mammo, S. Lee, M. J. Ju, M. F. Beg, and M. V. Sarunic, “Semi-supervised deeplearning based 3d analysis of the peripapillary region,” Biomed Opt Express , 3843–3856 (2020).33. S. Krishna Devalla, T. H. Pham, S. K. Panda, L. Zhang, G. Subramanian, A. Swaminathan, C. Zhi Yun, M. Rajan,S. Mohan, R. Krishnadas, V. Senthil, J. M. S. de Leon, T. A. Tun, C.-Y. Cheng, L. Schmetterer, S. Perera, T. Aung,A. H. Thiery, and M. J. A. Girard, “Towards label-free 3d segmentation of optical coherence tomography images ofhe optic nerve head using deep learning,” arXiv e-prints p. arXiv:2002.09635 (2020).34. A. G. Roy, S. Conjeti, S. P. K. Karri, D. Sheet, A. Katouzian, C. Wachinger, and N. Navab, “Relaynet: retinal layerand fluid segmentation of macular optical coherence tomography using fully convolutional networks,” Biomed. Opt.Express , 3627 (2017).35. J. Wang, Z. Wang, F. Li, G. Qu, Y. Qiao, H. Lv, and X. Zhang, “Joint retina segmentation and classification for earlyglaucoma diagnosis,” Biomed. Opt. Express , 2639–2656 (2019).36. X. Xi, X. 
Meng, Z. Qin, X. Nie, Y. Yin, and X. Chen, “Ia-net: informative attention convolutional neural network forchoroidal neovascularization segmentation in oct images,” Biomed. Opt. Express (2020).37. X. Yang, X. Chen, and D. Xiang, “Attention-guided channel to pixel convolution network for retinal layer segmentationwith choroidal neovascularization,” in Medical Imaging 2020: Image Processing, vol. 11313 I. Išgum and B. A.Landman, eds., International Society for Optics and Photonics (SPIE, 2020), pp. 786 – 792.38. P. Zang, J. Wang, T. T. Hormel, L. Liu, D. Huang, and Y. Jia, “Automated segmentation of peripapillary retinalboundaries in oct combining a convolutional neural network and a multi-weights graph search,” Biomed. Opt. Express , 4340–4352 (2019).39. J. Atif, C. Hudelot, G. Fouquier, I. Bloch, and E. D. Angelini, “From generic knowledge to specific reasoning formedical image interpretation using graph based representations.” in IJCAI, (2007), pp. 224–229.40. S. J. Chiu, M. J. Allingham, P. S. Mettu, S. W. Cousins, J. A. Izatt, and S. Farsiu, “Kernel regression basedsegmentation of optical coherence tomography images with diabetic macular edema,” Biomed. Opt. Express ,1172–1194 (2015).41. Y. Chen, M. Rohrbach, Z. Yan, Y. Shuicheng, J. Feng, and Y. Kalantidis, “Graph-based global reasoning networks,”in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2019).42. X. Li, Y. Yang, Q. Zhao, T. Shen, Z. Lin, and H. Liu, “Spatial pyramid based graph reasoning for semanticsegmentation,” arXiv: 2003.10211 (2020).43. Z. Gu, J. Cheng, H. Fu, K. Zhou, H. Hao, Y. Zhao, T. Zhang, S. Gao, and J. Liu, “Ce-net: Context encoder networkfor 2d medical image segmentation,” IEEE Trans. Med. Imaging , 2281–2292 (2019).44. P. A. Yushkevich, J. Piven, H. C. Hazlett, R. G. Smith, S. Ho, J. C. Gee, and G. 
Gerig, “User-guided 3d active contoursegmentation of anatomical structures: significantly improved efficiency and reliability,” Neuroimage , 1116–28(2006).45. A. Chakravarty and J. Sivaswamy, “A supervised joint multi-layer segmentation framework for retinal opticalcoherence tomography images using conditional random field,” Comput. Methods Programs Biomed.165