3D Vessel Reconstruction in OCT-Angiography via Depth Map Estimation
Shuai Yu, Jianyang Xie, Jinkui Hao, Yalin Zheng, Jiong Zhang, Yan Hu, Jiang Liu, Yitian Zhao
Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, China
Department of Eye and Vision Science, University of Liverpool, Liverpool, UK
Keck School of Medicine, University of Southern California, Los Angeles, US
Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China
University of Chinese Academy of Sciences, Beijing, China
*Corresponding author. Email: [email protected]
ABSTRACT
Optical Coherence Tomography Angiography (OCTA) has been increasingly used in the management of eye and systemic diseases in recent years. Manual or automatic analysis of blood vessels in 2D OCTA images (en face angiograms) is commonly used in clinical practice; however, it may lose the rich 3D spatial distribution of blood vessels and capillaries that is useful for clinical decision-making. In this paper, we introduce a novel 3D vessel reconstruction framework based on the estimation of vessel depth maps from OCTA images. First, we design a network with structural constraints to predict the depth of blood vessels in OCTA images. To promote the accuracy of the predicted depth map at both the overall-structure and pixel level, we combine MSE and SSIM loss as the training loss function. Finally, 3D vessel reconstruction is achieved by utilizing the estimated depth map and 2D vessel segmentation results. Experimental results demonstrate that our method is effective in depth prediction and 3D vessel reconstruction for OCTA images.
Index Terms — OCTA, depth prediction, 3D vessel reconstruction
1. INTRODUCTION
The morphological changes of the vascular network distributed in the retina are essential signs for revealing and identifying many eye and systemic diseases. Optical Coherence Tomography Angiography (OCTA) is a fast and non-invasive imaging technique that is capable of acquiring blood flow information at the capillary level without injecting contrast agents. In addition, it produces high-resolution 3D images of the retinal blood vessels down to the capillary level, as shown in Fig. 1(a). Nowadays, OCTA en face angiograms, as shown in Fig. 1(b), have been increasingly used in studies and clinical diagnosis of various eye-related diseases such as artery and vein occlusions, age-related macular degeneration (AMD), diabetic retinopathy (DR), and glaucoma [1, 2].
Fig. 1. Visualization of (a) a sample 3D OCTA volume, (b) 2D en face angiogram, (c) different retinal layers, and (d) depth color-encoded map of (b).

Several studies [3, 4, 5, 6, 7, 8] have been performed to analyze retinal vessels in OCTA images in the last decade. Kim et al. [3] used an intensity-based algorithm to calculate indices of microvascular density and morphology on en face OCTA images. Mou et al. [6] proposed a deep learning-based model to segment the blood vessels in 40 OCTA images. Xie et al. [7] estimated the vascular topologies of paired color fundus and OCTA images, respectively, to classify retinal arteries and veins on en face
OCTA data. Ma et al. [8] constructed the first publicly available vessel segmentation dataset of retinal OCTA images, ROSE, with the aim of training and validating automated vessel segmentation algorithms. However, all these works focus on the 2D en face representation of OCTA only, and they cannot recover the 3D spatial information of the vessels.

Fig. 2. Overview of the proposed depth estimation method for OCTA images.

3D vessel analysis and visualization may be better at presenting the depth information of vessel abnormalities and can be very helpful for observing microvascular changes [9, 10, 11]. The rich depth information of blood vessels is inevitably obscured in 2D space because of overlaps during projection. Therefore, establishing 3D OCTA analysis is critical and remains a challenge. Zhang et al. [12] proposed a novel 3D surface-based microvascular segmentation and reconstruction framework, with subsequent shape modeling and analysis procedures. However, this study still suffers from shadow projection and may lead to improper reconstruction of the vessel structure, because directly processing the 3D OCTA volume for vessel reconstruction is challenging due to poor contrast, shadow projection, complex topological structures, and relatively small vessel diameters.

By means of OCTA imaging technology such as the CIRRUS HD-OCT 5000 System (Carl Zeiss Meditec), equipped with AngioPlex® OCT Angiography software, a depth-encoded color map is provided (we refer to it as the depth map in this paper), as illustrated in Fig. 1(d). The depth map is a color-encoded slab in which different colors represent different depth layers: superficial retina, deep retina, avascular retina, choriocapillaris, and choroid; these layers can be seen in Fig. 1(c). Similar to the range image in computer vision, a depth image refers to an image whose pixel value is the distance from the imaging sensor to each position in the scene. In the depth maps obtained by the CIRRUS HD-OCT 5000 System, red indicates vessels that are closer to the imaging sensor and blue represents vessels that are further away.

In this work, we aim to estimate the 3D vessel network of each OCTA en face angiogram through the prediction of its depth map. To the best of our knowledge, this is the first work to obtain a vessel depth map and to generate 3D vessel structure from an OCTA en face angiogram.
We introduce a deep neural network with structural constraints to predict the depth map of an en face angiogram, and a novel combined loss is utilized to train the network to ensure that the depth map of the blood vessels is close to its ground truth (generated by the CIRRUS HD-OCT 5000 System). The spatial position of the vessels in 3D space is obtained from the depth image, and the 3D vessel structures are finally reconstructed by ensembling 2D vessel segmentation results and the predicted depth maps.
2. METHOD

2.1. Depth Map Estimation Network
In this section, we present our structure-constrained CNN architecture for depth map estimation. As illustrated in Fig. 2, our network consists of three components: a feature encoder module, a feature decoder module, and the structure constraint blocks (SCB). We further design a specific loss function for training the model to promote the predicted depth map accuracy at both the overall-structure and pixel level.
In view of U-Net's [13] outstanding performance in processing medical images, we employ the same architecture configuration for the encoder and decoder. Since retinal blood vessels form a very complicated topological structure, the accuracy of depth prediction for the parts of the image containing blood vessels is particularly important and challenging. In order to predict the depth of each pixel located in a vessel structure, we further utilize a structure branch to process blood vessel information in the form of semantic structures. We enforce the structure branch to process only vessel-related information through our carefully designed SCB and local supervision.

Different from the skip connections used in [13], the SCB enables the decoder to process relevant information only, and we use an SCB after every block of the encoder. Let e_t denote the output of the t-th encoder block, and s̃_t the corresponding intermediate representation of the structure branch. We first obtain an attention map a_{t-1} by concatenating e_t and s̃_t, followed by convolutional layers, batch normalization, and nonlinear activation layers. Given the attention map a_{t-1}, an element-wise product is applied between e_t and a_{t-1} to acquire the weighted map. Note that upsampling is employed on e_t before concatenation, to ensure that e_t and s̃_t have the same size. Since they contain rich edge information, the low-level features from the first block of the encoder are used to obtain the initial weighted map. Intuitively, a_{t-1} can be seen as an attention map that gives more weight to areas with blood vessel information. The feature maps filtered by the SCB (i.e., output2 in Fig. 2) are cascaded with the corresponding decoder feature maps to provide refined structure-related information, and the output of the last SCB is subjected to upsampling and convolution operations to obtain the blood vessel prediction map.
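The gating step inside an SCB can be sketched as follows. This is a minimal NumPy illustration under our own simplifying assumptions: the convolution/batch-normalization stack is reduced to a single 1x1 convolution (a channel-mixing matrix `w`) followed by a sigmoid, and the function and argument names are ours, not the paper's.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def structure_constraint_block(e_t, s_t, w):
    """Sketch of an SCB: fuse encoder features e_t with structure-branch
    features s_t into an attention map, then reweight e_t element-wise.

    e_t, s_t: (H, W, C) feature maps (assumed already the same size).
    w: (2C, C) weights standing in for the 1x1-conv + BN + activation stack.
    """
    fused = np.concatenate([e_t, s_t], axis=-1)  # concat along channels -> (H, W, 2C)
    attn = sigmoid(fused @ w)                    # attention map in (0, 1), shape (H, W, C)
    return e_t * attn                            # weighted map: vessel areas emphasized
```

Because the attention values lie strictly in (0, 1), the block can only attenuate encoder responses, never amplify them, which matches its role as a filter that passes vessel-related information to the decoder.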
2.2. Loss Function

In this work, we jointly supervise depth map estimation and vessel segmentation during training. We use the mean squared error (MSE) loss on the predicted blood vessel map pred_seg:

L_seg = L_MSE(pred_seg, gt_seg),   (1)

where gt_seg ∈ R^{H×W} denotes the ground truth of the blood vessels, whose generation is introduced in Section 3.1. To promote the accuracy of the predicted depth maps, we constrain them at both the pixel and overall-structure level. Specifically, the MSE loss is utilized to ensure the pixel-level accuracy of the depth map:

L_accuracy = λ_1 L_MSE(v, ṽ) + λ_2 L_MSE(b, b̃),   (2)

where v and b represent the predicted depth maps of the vessel and background areas, generated by multiplying with the predicted masks of the structure branch, and ṽ and b̃ are the corresponding ground truths. To make the model pay more attention to the vessels than to the background, different weights λ_1 and λ_2 are used in L_accuracy; in our experiments, λ_1 and λ_2 are empirically set to 0.8 and 0.2, respectively.

Since the MSE has difficulty discriminating structural content in images, we add a structural similarity index measure (SSIM) [14] loss to ensure the accuracy of the predicted images at the overall-structure level. SSIM compares two images from the perspectives of brightness, contrast, and structure. We therefore define the structural loss between the predicted depth map pred_depth and the ground truth gt_depth as:

L_structure = 1 − SSIM(pred_depth, gt_depth),   (3)

and the final loss function is defined as:

L_total = L_seg + L_accuracy + L_structure.   (4)

The 3D vessel reconstruction process may be treated as a mapping problem from the segmented 2D vessels to 3D space.
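Before turning to the reconstruction step, the combined objective of Eqs. (1)-(4) can be made concrete with a short sketch. This is a minimal NumPy version under our own assumptions: the SSIM is computed over a single global window rather than the sliding windows of [14], the structural term is taken as `1 - SSIM` so that lower is better, and all function names are ours.

```python
import numpy as np

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-window SSIM (a simplification of the windowed SSIM in [14])."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def total_loss(pred_seg, gt_seg, pred_depth, gt_depth, vessel_mask,
               lam_v=0.8, lam_b=0.2):
    """Sketch of Eqs. (1)-(4): segmentation MSE, vessel/background-weighted
    depth MSE (lambda_1 = 0.8, lambda_2 = 0.2 as in the paper), and the
    structural term (assumed to be 1 - SSIM)."""
    l_seg = mse(pred_seg, gt_seg)                                  # Eq. (1)
    v, b = vessel_mask, 1.0 - vessel_mask
    l_acc = (lam_v * mse(pred_depth * v, gt_depth * v)             # Eq. (2)
             + lam_b * mse(pred_depth * b, gt_depth * b))
    l_struct = 1.0 - ssim(pred_depth, gt_depth)                    # Eq. (3)
    return l_seg + l_acc + l_struct                                # Eq. (4)
```

A perfect prediction drives every term to zero, since identical images have MSE 0 and SSIM 1.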
As aforementioned, our network is also able to detect vessel structures. We use a skeletonization method [15] to extract the centerlines of vessels in OCTA en face images, and the bifurcation points of the vessel network are extracted by locating intersection points (pixels with more than two neighbours). All the intersection points and their neighbours are then removed from the centerline map, in order to obtain an image with clearly separated vessel segments. 3D point clouds composed of the centerline points are then obtained by utilizing the predicted depth map, where adjacent segments are linked according to topology consistency, and the tube filter in the VTK software (https://vtk.org/) is applied to render the 3D OCTA vessel architecture. In each vessel segment, bilinear interpolation is utilized to ensure vessel continuity.
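The bifurcation detection and depth lifting steps above can be sketched as follows. This is a minimal NumPy sketch: it assumes a binary skeleton is already available (e.g., from [15]), the function names are ours, and the segment linking and VTK tube rendering are omitted.

```python
import numpy as np

def bifurcation_points(skeleton):
    """Flag skeleton pixels with more than two 8-connected neighbours
    (the intersection points described above). skeleton: binary (H, W)."""
    s = skeleton.astype(np.uint8)
    p = np.pad(s, 1)
    # sum of the 8 neighbours of every pixel
    nb = (p[:-2, :-2] + p[:-2, 1:-1] + p[:-2, 2:] +
          p[1:-1, :-2] +               p[1:-1, 2:] +
          p[2:, :-2]  + p[2:, 1:-1]  + p[2:, 2:])
    return (s == 1) & (nb > 2)

def to_point_cloud(skeleton, depth_map):
    """Lift 2D centerline pixels into a 3D point cloud (x, y, depth)
    using the predicted depth map."""
    ys, xs = np.nonzero(skeleton)
    return np.stack([xs, ys, depth_map[ys, xs]], axis=1)
```

For example, on a Y-shaped skeleton the junction pixel has three neighbours and is the only one flagged; removing it and its neighbours leaves three separated segments, which can then be lifted to 3D and linked by topology consistency.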
3. EXPERIMENTS

3.1. Dataset and Evaluation Metrics
A dataset comprising 80 pairs of OCTA en face angiograms and their depth maps was used in this work. All the OCTA images and depth maps were obtained by the CIRRUS HD-OCT 5000 System (Carl Zeiss Meditec Inc., USA), equipped with AngioPlex® OCT Angiography, with an image resolution of … × … pixels and a scan area of … × … mm centered on the fovea. We divided these 80 image pairs into training and testing sets: 56 were used for training and the rest for testing. It is worth noting that a state-of-the-art OCTA vessel segmentation model, OCTA-Net [8], was used to extract the vessels of the training set, and an image analysis expert further refined the vessel segmentation results as ground truth.

Seven metrics were employed to validate the performance of the proposed framework: five for the evaluation of depth prediction and two for the validation of 3D vessel reconstruction. The accuracy (ACC) metric δ [20] was employed to validate the proposed depth prediction method: δ = max(D_i / D*_i, D*_i / D_i) < T, where D_i and D*_i are the estimated depth and the corresponding ground-truth depth of the i-th pixel, respectively. As suggested in [18], three different thresholds T (1.25, 1.25², 1.25³) were used in the accuracy metric. As the most commonly used metrics in evaluating monocular depth estimation, the Absolute Relative Difference (ARD) and Root Mean Squared Error (RMSE) are also used in this work. In order to evaluate the 3D vessel reconstruction, the Chamfer Distance (CD) [21] and Hausdorff Distance (HD) [22] were further employed. CD and HD are both metrics that describe the similarity between two sets of points.

Fig. 3. Illustration of depth map and 3D vessel reconstruction. (a) En face OCTA angiogram. (b) Ground truth of depth map. (c) Predicted depth map by our method. (d)-(h) Reconstructed 3D vessels from different angles of view.
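The evaluation metrics described above can be sketched as follows. This is a minimal NumPy sketch under our own assumptions: the function names and the small `eps` stabilizer are ours, and the brute-force pairwise distance matrices for CD and HD are for illustration only (real point clouds would use a spatial index).

```python
import numpy as np

def depth_metrics(pred, gt, eps=1e-8):
    """Threshold accuracy (delta < 1.25, 1.25^2, 1.25^3), ARD, and RMSE
    for a predicted depth map against its ground truth."""
    pred, gt = pred.ravel().astype(float), gt.ravel().astype(float)
    ratio = np.maximum(pred / (gt + eps), gt / (pred + eps))
    acc = {t: float(np.mean(ratio < 1.25 ** t)) for t in (1, 2, 3)}
    ard = float(np.mean(np.abs(pred - gt) / (gt + eps)))
    rmse = float(np.sqrt(np.mean((pred - gt) ** 2)))
    return acc, ard, rmse

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a: (N, 3), b: (M, 3)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())

def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance between point sets a: (N, 3), b: (M, 3)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))
```

Both CD and HD are zero for identical reconstructions; CD averages nearest-neighbour errors, while HD reports the single worst mismatch, so HD is more sensitive to outlier vessel points.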
Methods compared: Eigen et al. [16], Eigen et al. [17], Laina et al. [18], Chen et al. [19], U-Net [13], and our method.
ACC (δ < 1.25²): 0.783, 0.854, 0.967, 0.985, 0.987
ACC (δ < 1.25³): 0.927, 0.941, 0.981, 0.991, 0.992
Table 1. Depth map estimation and 3D vessel reconstruction results by different methods.
3.2. Results

Fig. 3 illustrates the proposed framework in predicting the depth map and reconstructing 3D vessels. It may be seen that our method is able to generate a depth map similar to the ground-truth depth map, as demonstrated in Fig. 3(b) and (c). However, since it is difficult to demonstrate conclusively the superiority of the proposed depth estimation method purely by visual inspection, we compared our depth prediction results with those produced by four other state-of-the-art approaches, proposed by Eigen et al. [16], Eigen et al. [17], Laina et al. [18], and Chen et al. [19]. In addition, as the backbone of our work, U-Net [13] was also employed as one of the benchmark approaches. Table 1 reports the evaluation results in terms of five different depth prediction metrics. As can be observed, the proposed method reaches the best performance on all metrics by significant margins, with only a single exception: the ACC (δ < 1.25) metric, where it does not surpass Chen et al. [19].

Fig. 3(d)-(h) demonstrate the 3D vessel reconstruction results from different angles of view. Our method is able to produce high vessel visibility for both large vessels and small capillaries. As can be observed in Fig. 3(h-1) and (h-2), the enlarged representative regions show high preservation of vessels and bifurcation structures at different scales. The large and small vessels are distributed at different depths and gradually decrease from the parafovea to the fovea, which is in line with the anatomy of the retina. The CD and HD metrics also reveal that our method is superior in 3D vessel reconstruction when compared with the other methods. The structural constraints and combined loss of our method play a very important role in depth prediction.
4. CONCLUSION
In this work, we have proposed a novel framework to reconstruct 3D vessel structures in OCTA via a depth prediction network. The significance of our method is that it may be considered the first attempt to predict vessel depth information from 2D en face angiograms. The uncertainty of the position of blood vessels in the 3D spatial domain is estimated by combining 2D vessel segmentation and the predicted depth map. The high evaluation performance in terms of depth map prediction and 3D vessel reconstruction demonstrates the effectiveness of our method. It shows the great potential of exploring 3D vessel analysis in clinical practice, and we will focus on using the proposed framework for the diagnosis of eye-related diseases in clinical settings.

5. COMPLIANCE WITH ETHICAL STANDARDS
This study was performed in line with the principles of the Declaration of Helsinki. Approval was granted by the Ethics Committee of the Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences.
6. ACKNOWLEDGMENTS
This work was supported in part by the Zhejiang Provincial Natural Science Foundation of China under Grants LZ19F010001, LQ20F030002, and LQ19H180001, in part by the Ningbo Public Welfare Science and Technology Project (2019C50049), in part by the Ningbo 2025 S&T Megaprojects under Grants 2019B10033 and 2019B10061, and in part by the Ningbo Natural Science Foundation under Grant 2019A610354.
7. REFERENCES

[1] D. Carlo et al., "A review of optical coherence tomography angiography (OCTA)," Int. J. Retina Vitreous, vol. 1, no. 1, p. 5, 2015.
[2] A. H. Kashani et al., "Optical coherence tomography angiography: A comprehensive review of current methods and clinical applications," Prog. Retin. Eye Res., vol. 60, pp. 66-100, 2017.
[3] A. Y. Kim et al., "Quantifying microvascular density and morphology in diabetic retinopathy using spectral-domain optical coherence tomography angiography," Invest. Ophthalmol. Vis. Sci., vol. 57, no. 9, pp. OCT362-OCT370, 2016.
[4] Y. Zhao et al., "Automated vessel segmentation using infinite perimeter active contour model with hybrid region information with application to retinal images," IEEE Trans. Med. Imaging, vol. 34, no. 9, pp. 1797-1807, 2015.
[5] Y. Zhao et al., "Retinal vascular network topology reconstruction and artery/vein classification via dominant set clustering," IEEE Trans. Med. Imaging, vol. 39, no. 2, pp. 341-356, 2019.
[6] L. Mou et al., "CS-Net: Channel and spatial attention network for curvilinear structure segmentation," in MICCAI, pp. 721-730, Springer, 2019.
[7] J. Xie et al., "Classification of retinal vessels into artery-vein in OCT angiography guided by fundus images," in MICCAI, pp. 117-127, Springer, 2020.
[8] Y. Ma et al., "ROSE: A retinal OCT-angiography vessel segmentation dataset and new model," IEEE Trans. Med. Imaging, pp. 1-1, 2020.
[9] R. F. Spaide, "Volume-rendered optical coherence tomography of diabetic retinopathy pilot study," Am. J. Ophthalmol., vol. 160, no. 6, pp. 1200-1210, 2015.
[10] Y. Zhao et al., "Region-based saliency estimation for 3D shape analysis and understanding," Neurocomputing, vol. 197, pp. 1-13, 2016.
[11] Y. Zhao et al., "Automatic 2-D/3-D vessel enhancement in multiple modality images using a weighted symmetry filter," IEEE Trans. Med. Imaging, vol. 37, no. 2, pp. 438-450, 2017.
[12] J. Zhang et al., "3D shape modeling and analysis of retinal microvasculature in OCT-angiography images," IEEE Trans. Med. Imaging, vol. 39, no. 5, pp. 1335-1346, 2019.
[13] O. Ronneberger et al., "U-Net: Convolutional networks for biomedical image segmentation," in MICCAI, pp. 234-241, Springer, 2015.
[14] Z. Wang et al., "Image quality assessment: From error visibility to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600-612, 2004.
[15] P. Bankhead et al., "Fast retinal vessel detection and measurement using wavelets and edge location refinement," PLoS One, vol. 7, no. 3, p. e32435, 2012.
[16] D. Eigen et al., "Depth map prediction from a single image using a multi-scale deep network," in NeurIPS, pp. 2366-2374, 2014.
[17] D. Eigen et al., "Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture," in Proceedings of ICCV, pp. 2650-2658, 2015.
[18] I. Laina et al., "Deeper depth prediction with fully convolutional residual networks," in 3DV, pp. 239-248, IEEE, 2016.
[19] W. Chen et al., "Single-image depth perception in the wild," in NeurIPS, pp. 730-738, 2016.
[20] L. Ladicky et al., "Pulling things out of perspective," in Proceedings of CVPR, pp. 89-96, 2014.
[21] G. Borgefors, "Distance transformations in digital images," Computer Vision, Graphics, and Image Processing, vol. 34, no. 3, pp. 344-371, 1986.
[22] D. P. Huttenlocher et al., "Comparing images using the Hausdorff distance," IEEE Trans. Pattern Anal. Mach. Intell., vol. 15, no. 9, pp. 850-863, 1993.