Topology guaranteed segmentation of the human retina from OCT using convolutional neural networks

Yufan He, Aaron Carass, Bruno M. Jedynak, Sharon D. Solomon, Shiv Saidha, Peter A. Calabresi, and Jerry L. Prince

Dept. of Electrical and Computer Engineering and Dept. of Computer Science, The Johns Hopkins University, Baltimore, MD 21218, USA
Dept. of Mathematics & Statistics, Portland State University, Portland, OR 97201, USA
Wilmer Eye Institute and Dept. of Neurology, The Johns Hopkins University School of Medicine, MD 21287, USA
Abstract.
Optical coherence tomography (OCT) is a noninvasive imaging modality which can be used to obtain depth images of the retina. Changing layer thicknesses can thus be quantified by analyzing these OCT images; moreover, these changes have been shown to correlate with disease progression in multiple sclerosis. Recent automated retinal layer segmentation tools use machine learning methods to perform pixel-wise labeling and graph methods to guarantee the layer hierarchy or topology. However, graph parameters like distance and smoothness constraints must be experimentally assigned by retinal region and pathology, degrading the flexibility and time efficiency of the whole framework. In this paper, we develop cascaded deep networks to provide a topologically correct segmentation of the retinal layers in a single feed-forward propagation. The first network (S-Net) performs pixel-wise labeling and the second regression network (R-Net) takes the topologically unconstrained S-Net results and outputs layer thicknesses for each layer and each position. ReLU activation is used as the final operation of the R-Net, which guarantees non-negativity of the output layer thicknesses. Since each segmentation boundary position is acquired by summing up the corresponding non-negative layer thicknesses, the layer ordering (i.e., topology) of the reconstructed boundaries is guaranteed even at the fovea, where the distances between boundaries can be zero. The R-Net is trained using simulated masks and thus can be generalized to provide topology-guaranteed segmentation for other layered structures. This deep network has achieved a mean absolute boundary error (2.82 µm) comparable to state-of-the-art graph methods (2.83 µm).

Keywords: Retina OCT, Deep learning segmentation, Topology guarantee.
Introduction

Optical coherence tomography (OCT) is a widely used non-invasive and non-ionizing modality for retinal imaging which can obtain 3D retinal images rapidly [8]. The depth information of the retina from OCT enables measurements of layer thicknesses, which are known to change with certain diseases [11]. Fast automated retinal layer segmentation tools are crucial for large cohort studies of these diseases.

Automated methods for retinal layer segmentation have been well explored [1, 12]. State-of-the-art methods use machine learning (e.g., random forest (RF) [9]) for coarse pixel-wise labeling and then level set [3] or graph methods [6, 9] to guarantee the segmentation topology (i.e., the anatomically correct retinal layer ordering) and obtain the final boundary surfaces. They are limited by the manually selected features for the pixel-wise labeling task and the manually tuned parameters of the graph. To build the graph, boundary distances and smoothness constraints, which are spatially varying, need to be experimentally assigned. The manually selected features and fine-tuned graph parameters limit application across cohorts.

Deep learning automatically extracts relevant image features from the training data and performs the segmentation in a feed-forward fashion. The fully convolutional network (FCN) proposed by Long et al. [10] is a successful deep learning segmentation method and the U-Net variant [14] is widely used for medical image segmentation. Both Roy et al. [15] and He et al. [7] proposed FCNs for retinal layer segmentation (the former also included fluid segmentation). However, these FCN methods provide pixel-wise labeling without explicitly utilizing high-level priors like shape, and neither guarantees the correct topology. Examples of FCNs giving anatomically infeasible results are shown in Fig. 4.

In order to obtain structured output directly from deep networks, Zheng et al. [16] implemented a conditional random field as a recurrent neural network.
This method can provide better label consistency but cannot guarantee global topology. BenTaieb et al. [2] proposed to explicitly integrate topology priors into the loss function during training, and Romero et al. [13] used a second auto-encoder network to learn the output shape prior. Although these methods can improve segmentation results by utilizing shape and topology priors, they still cannot guarantee the correct topology.

To obtain a topologically correct segmentation of the retinal layers from a deep network in a single feed-forward propagation, we propose a cascaded FCN framework that transforms the layer segmentation problem from pixel labeling into a boundary position regression problem. Instead of outputting the boundary positions directly, we use the network to output the distance between two boundaries, i.e., the layer thickness. The first network (S-Net) performs pixel labeling and the second regression network (R-Net) takes the topologically unconstrained S-Net results and outputs layer thicknesses for each layer and each position. ReLU [4] activation is used as the final operation of R-Net, which guarantees the non-negativity of the output layer thicknesses. Since each boundary position is acquired by summing up the corresponding non-negative layer thicknesses, the ordering of the reconstructed boundaries is guaranteed even at the fovea, where the distances between boundaries can be zero.

Fig. 1. A schematic of the proposed method.
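The core topology argument (non-negative thicknesses accumulated into boundary positions) can be illustrated with a minimal NumPy sketch; the function name and the per-column top-boundary offset are our illustrative assumptions, not the paper's code:

```python
import numpy as np

def boundaries_from_thicknesses(raw_thickness, top_offset):
    """Reconstruct boundary positions from per-layer thickness regressions.

    raw_thickness : (num_layers, width) unconstrained network outputs.
    top_offset    : (width,) position of the topmost boundary per column
                    (assumed known here for illustration).
    Returns (num_layers + 1, width) boundary positions, top to bottom.
    """
    # ReLU makes every layer thickness non-negative.
    thickness = np.maximum(raw_thickness, 0.0)
    # Each boundary is the running sum of the thicknesses above it, so
    # boundary k+1 >= boundary k everywhere: the layer ordering (topology)
    # holds by construction, even where a thickness collapses to zero
    # (as at the fovea).
    cum = np.cumsum(thickness, axis=0)
    return np.vstack([top_offset, top_offset + cum])

raw = np.array([[3.0, 0.0, -1.0],   # a negative value would invert layers
                [2.0, 2.0,  2.0]])
b = boundaries_from_thicknesses(raw, top_offset=np.zeros(3))
assert np.all(np.diff(b, axis=0) >= 0)  # monotone ordering everywhere
```

Note that a direct regression of boundary positions would not enjoy this guarantee; the ordering comes entirely from the thickness parameterization plus the ReLU.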
Method

Fig. 1 shows a schematic of our framework. We describe each step in our processing below.
Preprocessing

From each retinal B-scan (496 pixels in depth), patches of size 128 × 128 are extracted and segmented by the deep network.
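The patch extraction can be sketched as follows; the B-scan width (1024), the evenly spaced patch starts, and the function name are our illustrative assumptions (the text only states that 20 overlapping 128-wide patches are taken per B-scan):

```python
import numpy as np

def extract_patches(bscan, patch_w=128, n_patches=20):
    """Slice a B-scan into overlapping patches of width patch_w.

    bscan : (height, width) image. Patch start columns are evenly spaced
    so the patches overlap and jointly cover every column.
    """
    h, w = bscan.shape
    starts = np.linspace(0, w - patch_w, n_patches).astype(int)
    return [bscan[:, s:s + patch_w] for s in starts], starts

img = np.zeros((496, 1024))  # hypothetical B-scan width
patches, starts = extract_patches(img)
assert len(patches) == 20 and patches[0].shape == (496, 128)
```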
Segmentation Network (S-Net) Overview

Our segmentation FCN (S-Net) is based on the U-Net [14]. It takes a 128 × 128 image as input and outputs a 10 × 128 × 128 segmentation probability map, which includes probability maps for the eight retinal layers and the background above and below the retina (vitreous and choroid, respectively). Fig. 2 shows the details of S-Net; specifically, four 2 × 2 pooling stages are used for downsampling.

Regression Net (R-Net) Overview
The R-Net consists of two parts: a U-Net identical to our S-Net (except for the input channels) and a dense layer. The input to the R-Net is the topologically unconstrained result from the S-Net. R-Net is applied to learn the shape and topology priors of the layer structures while being resistant to segmentation defects (see Fig. 4 for examples). The dense layer of the R-Net uses ReLU activation and thus guarantees a non-negative output vector, with one thickness per layer at each of the 128 columns.

Training

We train our framework in two steps. S-Net is trained with a common pixel-wise labeling scheme, because every pixel in the training data can be treated as an independent training sample and the total training data size is thereby enlarged [5]. R-Net is trained with augmented ground-truth masks to learn the shape and topology prior. An alternative way to train the R-Net would be to take the S-Net output as input and output the ground-truth layer thicknesses. However, training in this manner would be sub-optimal, since the S-Net output is not the ground-truth mask; the training pairs of S-Net outputs and ground-truth thicknesses are biased, which would bias the resultant R-Net. Therefore, we train both networks independently. We note that training the R-Net separately with simulated training masks allows this network to be generalized for use with other layered structures.
S-Net training

The S-Net is trained with a common pixel-wise labeling scheme, namely the cross-entropy loss function:

L = − Σ_{x∈Ω} Σ_l g_l(x) log(p_l(x; θ)).   (1)

Here, g_l(x) is an indicator function on the ground-truth label of pixel x, and p_l(x; θ) is the prediction probability from the deep network that pixel x belongs to layer l. Standard back-propagation is used to minimize the loss and update the network parameters θ.
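Eq. (1) can be written directly in NumPy; the toy shapes and the function name below are ours, for illustration only:

```python
import numpy as np

def pixelwise_cross_entropy(p, g, eps=1e-12):
    """Loss of Eq. (1): L = -sum_x sum_l g_l(x) log p_l(x; theta).

    p : (num_classes, H, W) predicted probabilities (softmax output).
    g : (H, W) integer ground-truth labels; the one-hot encoding of g
        plays the role of the indicator g_l(x).
    """
    num_classes = p.shape[0]
    onehot = np.eye(num_classes)[g]        # (H, W, C) indicator g_l(x)
    onehot = np.moveaxis(onehot, -1, 0)    # (C, H, W), matching p
    return -np.sum(onehot * np.log(p + eps))

# Two pixels, two classes: a perfect prediction gives (near) zero loss.
p = np.array([[[1.0, 0.0]],
              [[0.0, 1.0]]])              # shape (2, 1, 2)
g = np.array([[0, 1]])
assert pixelwise_cross_entropy(p, g) < 1e-6
```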
R-Net training

The purpose of the regression net is to find a mapping from the pixel-wise segmentation probability maps to layer thicknesses. We simulate topology defects on the ground-truth masks and use R-Net to recover the correct layer thicknesses. The training of R-Net is based on minimizing the mean squared loss function below with standard back-propagation:

L = ‖T(g(x)) − R(g(x) + s(x); θ)‖².   (2)

Here, g(x) is the ground-truth mask, T(g(x)) is the corresponding ground-truth layer thickness, R is the prediction from the regression net, and s(x) is the simulated defects and Gaussian noise [13] added to the ground-truth mask. The simulated defects are random ellipses of both positive and negative magnitude.

Fig. 2. A schematic of the S-Net.

Experiments

Ten fully delineated Spectralis spectral-domain OCT (SD-OCT) macular scans (496 pixels in depth, 49 B-scans each) were used for training. Twenty overlapping patches were extracted within each B-scan for training both networks, which yielded 9600 samples for training. Twenty SD-OCT macular scans of the same size were acquired for validation. Ten subjects in our validation cohort were diagnosed with multiple sclerosis (MS) and the remaining ten were healthy controls.
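The mask corruption s(x) used to train R-Net (random ellipses plus Gaussian noise added to the ground-truth mask) can be sketched as follows; all numeric settings (counts, radii, noise level) are illustrative, since the paper's exact ranges were not recovered in this copy:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_simulated_defects(mask, n_defects=3, max_radius=10, noise_std=0.05):
    """Corrupt a ground-truth probability mask, mimicking S-Net failures.

    Random elliptical blobs of positive or negative magnitude plus Gaussian
    noise are added to the (num_classes, H, W) mask, giving g(x) + s(x).
    """
    c, h, w = mask.shape
    out = mask.astype(float).copy()
    yy, xx = np.mgrid[0:h, 0:w]
    for _ in range(n_defects):
        cy, cx = rng.integers(0, h), rng.integers(0, w)
        ry, rx = rng.integers(2, max_radius, size=2)
        inside = ((yy - cy) / ry) ** 2 + ((xx - cx) / rx) ** 2 <= 1.0
        # Perturb one randomly chosen class map inside the ellipse.
        out[rng.integers(0, c)][inside] += rng.uniform(-1.0, 1.0)
    return out + rng.normal(0.0, noise_std, out.shape)

# Hypothetical ground-truth mask: 10 classes on a 128 x 128 patch.
gt = np.zeros((10, 128, 128))
gt[0] = 1.0
corrupted = add_simulated_defects(gt)
```

R-Net is then trained to map `corrupted` back to the true thickness profile T(g(x)), which is why it can be trained from simulated masks alone.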
Results

Boundary segmentation accuracy was evaluated by comparing the automatic segmentation results with manual delineation along every A-scan. The mean absolute distance (MAD), root mean square error (RMSE), and mean signed difference (MSD) were calculated for the state-of-the-art RF + graph method (RF+G) [9] and our proposed deep networks (S-Net + R-Net). The Wilcoxon signed-rank test was used to compare the two methods, and the 95% quantile of the MSD is also reported. These results are shown in Table 1. The depth resolution is 3.9 µm. From the table, both methods have MAD and RMSE of less than 1 pixel, and our proposed method achieves similar or slightly better results than the state-of-the-art graph method. The MSD and 95% quantile show that, compared to our proposed method, the graph method is more biased. Figs. 4 and 5 show examples in which, when the image is of poor quality or the boundaries are not clear, the S-Net results can be wrong, whereas R-Net guarantees the correct topology while maintaining state-of-the-art accuracy.

Code for Lang et al. downloaded from

The total segmentation time of our proposed deep network for one scan (496 pixels deep, 49 B-scans) is 10 s (preprocessing and reconstruction included), of which the deep network inference takes 5.85 s. The segmentation is performed in Python 3.6 and the preprocessing in Matlab R2016b, called directly from the Python environment. The RF+G method had a total segmentation time of 100 s in Matlab R2016b, of which RF classification took 62 s and the graph method 20 s.

Fig. 3. Top row: five g_l(x) masks. Bottom row: after adding noise and defects.
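The three evaluation metrics can be computed per boundary as below; the function name is ours, and the 3.9 µm depth resolution comes from the text:

```python
import numpy as np

def boundary_errors(pred, truth, depth_res=3.9):
    """MAD, RMSE, and MSD between predicted and manual boundaries.

    pred, truth : (n_ascans,) boundary positions in pixels along each
    A-scan; depth_res converts pixel errors to microns (3.9 um/pixel).
    """
    d = (pred - truth) * depth_res
    mad = np.mean(np.abs(d))           # mean absolute distance
    rmse = np.sqrt(np.mean(d ** 2))    # root mean square error
    msd = np.mean(d)                   # mean signed difference (bias)
    return mad, rmse, msd

pred = np.array([10.0, 11.0, 9.0])
truth = np.array([10.0, 10.0, 10.0])
mad, rmse, msd = boundary_errors(pred, truth)
```

Note how errors of +1 and −1 pixel cancel in the MSD but not the MAD, which is why the MSD (with its 95% quantile) is the bias indicator in Table 1.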
Table 1. Boundary accuracy evaluated on 20 fully delineated scans for RF+G [9] and our proposed method, S-Net followed by R-Net (S+R-Net). (MAD – mean absolute distance; RMSE – root mean square error; MSD – mean signed difference; p – p-value.)

                MAD (µm)                 RMSE (µm)
Boundary     RF+G   S+R-Net   p       RF+G   S+R-Net   p
Vitre-RNFL
RNFL-GCL
IPL-INL
INL-OPL
OPL-ONL
ELM
IS-OS
OS-RPE
RPE
Overall

                MSD (µm) (95% quantile)
Boundary     RF+G                  S+R-Net
Vitre-RNFL
RNFL-GCL     -0.37 (-9.14, 7.26)   -0.43 (-9.13, 7.21)
IPL-INL
INL-OPL      -0.04 (-8.70, 8.00)   -1.08 (-9.44, 6.89)
OPL-ONL
ELM
IS-OS
OS-RPE
RPE
Overall
Fig. 4. Left: S-Net results showing defects. Right: R-Net results with the correct topology.

Fig. 5. (a) Manual delineation, (b) RF+G, (c) S-Net output, and (d) R-Net results.
Conclusion

In this paper, we presented a fast, topology-guaranteed deep learning method for retinal OCT segmentation. Our method adds a thickness regression network after a conventional pixel-wise labeling network and utilizes ReLU activation to guarantee the non-negativity of the output thicknesses, and thus the topology. Since the R-Net is trained on masks that can be easily generated, our proposed framework can provide a topology-guaranteed segmentation solution for other layered structures.
Acknowledgments. This work was supported by the NIH/NEI under grant R01-EY024655.
References
1. Antony, B.J., Miri, M.S., Abràmoff, M.D., Kwon, Y.H., Garvin, M.K.: Automated 3D segmentation of multiple surfaces with a shared hole: segmentation of the neural canal opening in SD-OCT volumes. In: 17th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2014). Lecture Notes in Computer Science, vol. 8673, pp. 739–746. Springer Berlin Heidelberg (2014)
2. BenTaieb, A., Hamarneh, G.: Topology aware fully convolutional networks for histology gland segmentation. In: 19th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2016). Lecture Notes in Computer Science, vol. 9901, pp. 460–468. Springer Berlin Heidelberg (2016)
3. Carass, A., Lang, A., Hauser, M., Calabresi, P.A., Ying, H.S., Prince, J.L.: Multiple-object geometric deformable model for segmentation of macular OCT. Biomed. Opt. Express 5(4), 1062–1074 (2014)
4. Dahl, G.E., Sainath, T.N., Hinton, G.E.: Improving deep neural networks for LVCSR using rectified linear units and dropout. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). pp. 8609–8613. IEEE (2013)
5. Dou, Q., Yu, L., Chen, H., Jin, Y., Yang, X., Qin, J., Heng, P.A.: 3D deeply supervised network for automated segmentation of volumetric medical images. Medical Image Analysis (2017)
6. Garvin, M.K., Abràmoff, M.D., Wu, X., Russell, S.R., Burns, T.L., Sonka, M.: Automated 3-D intraretinal layer segmentation of macular spectral-domain optical coherence tomography images. IEEE Trans. Med. Imag. 28(9), 1436–1447 (2009)
7. He, Y., Carass, A., Yun, Y., Zhao, C., Jedynak, B.M., Solomon, S.D., Saidha, S., Calabresi, P.A., Prince, J.L.: Towards topological correct segmentation of macular OCT from cascaded FCNs. In: Fetal, Infant and Ophthalmic Medical Image Analysis, pp. 202–209. Springer (2017)
8. Hee, M.R., Izatt, J.A., Swanson, E.A., Huang, D., Schuman, J.S., Lin, C.P., Puliafito, C.A., Fujimoto, J.G.: Optical coherence tomography of the human retina. Arch. Ophthalmol. 113(3), 325–332 (1995)
9. Lang, A., Carass, A., Hauser, M., Sotirchos, E.S., Calabresi, P.A., Ying, H.S., Prince, J.L.: Retinal layer segmentation of macular OCT images using boundary classification. Biomed. Opt. Express 4(7), 1133–1152 (2013)
10. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 3431–3440 (June 2015)
11. Medeiros, F.A., Zangwill, L.M., Alencar, L.M., Bowd, C., Sample, P.A., Susanna Jr., R., Weinreb, R.N.: Detection of glaucoma progression with Stratus OCT retinal nerve fiber layer, optic nerve head, and macular thickness measurements. Invest. Ophthalmol. Vis. Sci. 50(12), 5741–5748 (2009)
12. Rathke, F., Schmidt, S., Schnörr, C.: Probabilistic intra-retinal layer segmentation in 3-D OCT images using global shape regularization. Medical Image Analysis 18(5), 781–794 (2014)
13. Romero, A., Drozdzal, M., Erraqabi, A., Jégou, S., Bengio, Y.: Image segmentation by iterative inference from conditional score estimation. arXiv preprint arXiv:1705.07450 (2017)
14. Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: 18th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2015). Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer (2015)
15. Roy, A.G., Conjeti, S., Karri, S.P.K., Sheet, D., Katouzian, A., Wachinger, C., Navab, N.: ReLayNet: retinal layer and fluid segmentation of macular optical coherence tomography using fully convolutional networks. Biomed. Opt. Express 8(8), 3627–3642 (2017)
16. Zheng, S., Jayasumana, S., Romera-Paredes, B., Vineet, V., Su, Z., Du, D., Huang, C., Torr, P.H.S.: Conditional random fields as recurrent neural networks. In: The IEEE International Conference on Computer Vision (ICCV). pp. 1529–1537 (2015)