DPN: Detail-Preserving Network with High Resolution Representation for Efficient Segmentation of Retinal Vessels
Song Guo
Abstract—Retinal vessels are important biomarkers for many ophthalmological and cardiovascular diseases, so it is of great significance to develop an accurate and fast vessel segmentation model for computer-aided diagnosis. Existing methods such as U-Net and FCN follow the encoder-decoder pipeline, in which detailed information is lost in the encoder in order to achieve a large field of view. Although detailed information can be recovered in the decoder via multi-scale fusion, it still contains noise. Different from existing methods, in this paper we propose a deep segmentation model, called the detail-preserving network (DPN), for efficient vessel segmentation. To preserve detailed spatial information and learn structural information at the same time, we design the detail-preserving block (DP-Block) and stack eight DP-Blocks together to form the DPN. More importantly, there are no down-sampling operations among these blocks, so the DPN maintains a high resolution during processing, which is helpful for locating the boundaries of thin vessels. To illustrate the effectiveness of our method, we conducted experiments on the DRIVE, STARE and CHASE_DB1 datasets. Experimental results show that, compared with state-of-the-art methods, our method achieves competitive or better performance in terms of segmentation accuracy, segmentation speed, extensibility and the number of parameters. Specifically: 1) the AUC of our method ranks first/second/third on the STARE/CHASE_DB1/DRIVE datasets, respectively; 2) our method requires only one forward pass to generate a vessel segmentation map, and its segmentation speed is 20-160x faster than that of other methods on the DRIVE dataset; 3) cross-training experiments demonstrate the extensibility of our method, revealing superior performance; 4) the number of parameters of our method is only around 96k, fewer than in all comparison methods. The source code of our method will be available at https://github.com/guomugong/DPN.

Index Terms—Retinal Vessel Segmentation, Fast Speed, High Resolution Representation, Fundus Image.

• S. Guo was with the School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an 710055, China. E-mail: [email protected]
1 INTRODUCTION

Retinal blood vessels are an important part of fundus images, and they can be applied to the diagnosis of many ophthalmological diseases, such as diabetic retinopathy [1], cataract [2], and hypertensive retinopathy [3]. Specifically, in patients with diffuse choroidal hemangioma, retinal blood vessels expand [4], and vascular structures in patients with cataract are unclear or even invisible [2]. In addition, since retinal blood vessels and cerebral blood vessels are similar in anatomical, physiological and embryological characteristics, retinal vessels are also important biomarkers for some cardiovascular diseases [5], [6]. Accurate segmentation of blood vessels is the basic step of efficient computer-aided diagnosis (CAD). However, manual segmentation of retinal vessels is time-consuming and relies heavily on human experience. Therefore, it is necessary to develop accurate and fast vessel segmentation methods for CAD.

Considering clinical application scenarios, a good vessel segmentation model for CAD should satisfy the following three conditions. 1)
High accuracy. The model needs to be capable of recognizing both thin and thick vessels, even extremely thin vessels of one-pixel width. For example, the appearance of neovascularization can be used to diagnose and grade diabetic retinopathy. 2)
Good extensibility. The model needs to show good extensibility/generalization ability once training is done. In other words, when the model is applied to clinical images, it needs to perform well, not just on the test set. 3)
Fast running speed. The model needs a fast processing speed to meet clinical demand, as faster speed means greater throughput and higher processing efficiency.

Existing vessel segmentation methods can be divided into two categories [7]: unsupervised methods and supervised methods. Unsupervised methods utilize manually designed low-level features and rules [8] and therefore show poor extensibility. Supervised methods utilize human-annotated training images, and their segmentation accuracy is usually higher than that of unsupervised methods [6]. Deep learning based supervised methods can learn high-level features in an end-to-end manner, and they show superior performance in terms of segmentation accuracy and extensibility [9], [10]. Most deep vessel segmentation models follow the architecture of the fully convolutional network (FCN) [11], in which the resolution of features is first down-sampled and then up-sampled to generate pixel-wise segmentation maps. However, detailed information is lost in the FCN. The U-Net [12] model was later proposed; it utilizes intermediate layers in the up-sampling path to fuse more spatial information and generate fine segmentation maps. Although detailed information can be utilized in the U-Net, extra noise is also introduced. Moreover, most U-Net variants [9] require multiple forward passes to generate a segmentation map for one testing image, since they split one fundus image into hundreds of small patches. As a result, they show slow segmentation speed, and the contextual information is not fully utilized.

To preserve detailed information and avoid the introduction of noise, in this paper we present the detail-preserving network (DPN). Inspired by HRNet [13], the DPN learns a high resolution representation directly rather than a low resolution representation. In this manner, the DPN can locate the boundaries of thin vessels accurately. To this end, on the one hand, we present the detail-preserving block (DP-Block), in which multi-scale features are fused in a cascaded manner so that more contextual information can be utilized, and the resolution of the input and output features of the DP-Block never changes, so that detailed spatial information is preserved. On the other hand, we stack several DP-Blocks together to form the DPN. We note that there are no down-sampling operations among these DP-Blocks, so the DPN can learn semantic features via a large field of view and preserve detailed information simultaneously. To validate the effectiveness of our method, we conducted experiments on the DRIVE, STARE and CHASE_DB1 datasets. Experimental results reveal that our method shows competitive or better performance compared with other state-of-the-art methods.

Overall, our contributions are summarized as follows.
1) We present the detail-preserving block, which can learn structural information and preserve detailed information via intra-block multi-scale fusion.
2) We present the detail-preserving network, which mainly consists of eight serially connected DP-Blocks and maintains high resolution representations during the whole process.
As a result, the DPN can learn semantic features and preserve detailed information simultaneously.
3) We conducted experiments on three public datasets. Experimental results reveal that our method achieves comparable or even superior performance over other methods in terms of segmentation accuracy, segmentation speed, extensibility, and the number of parameters.

The rest of this paper is organized as follows. Related works on vessel segmentation are introduced in Section 2. Our method is described in Section 3. Experimental results are analyzed in Section 4. Conclusions are drawn in Section 5.
2 RELATED WORKS
Retinal vessel segmentation is a pixel-wise binary classification problem, and the objective is to locate each vessel pixel accurately for further processing. According to whether annotations are used, existing methods can be divided into two categories: unsupervised methods and supervised methods.
Unsupervised methods usually utilize human-designed low-level features, such as edges, lines and color; manually annotated information is not utilized. Unsupervised methods can be roughly divided into matched filter based, vessel tracking based, threshold based and morphology based methods.

Wang et al. [14] proposed a multi-stage method for vessel segmentation: matched filtering was first adopted for vessel enhancement, and then vessels were located via a multi-scale hierarchical decomposition. Yin et al. [15] proposed a vessel tracking method, in which local grey-level information was utilized to select vessel edge points, and a Bayesian method was then used to determine the direction of vessels. Garg et al. [16] proposed a curvature-based method, in which vessel lines were first extracted using curvature information, and a region growing method was then used to generate the whole vessel tree. Li et al. [17] proposed an adaptive threshold method for vessel segmentation that can detect both large and small vessels. Christodoulidis et al. [18] utilized a line detector and tensor voting for vessel segmentation, and thin vessels were well detected.

A major limitation of unsupervised methods is that the features and rules are designed by humans. It is hard to design a satisfactory feature that works well on large-scale fundus images, so this kind of method may show poor generalization ability.
In contrast to unsupervised methods, supervised methods need annotation information to build vessel segmentation models. Before deep learning was applied to vessel segmentation, supervised methods usually consisted of two procedures: feature extraction and classification. In the first procedure, features were extracted by human-designed rules, just as in unsupervised methods. In the second procedure, supervised classifiers were employed to classify the extracted features into vessels or non-vessels. Deep learning methods unify feature extraction and classification, so they can extract more discriminative features.

Deep learning based methods can be roughly divided into classification-based methods and segmentation-based methods. For classification-based methods, the category of each pixel is determined by its surrounding small image patch [19], [20]; this kind of method does not make full use of contextual information. Segmentation-based methods follow the architecture of the FCN, where feature maps are first down-sampled to encode structural information and then up-sampled to generate pixel-wise segmentation maps. However, the down-sampling operation sacrifices detailed spatial information, which hampers the identification of thin vessels. To alleviate this problem, multi-scale fusion methods and graph models have been adopted. For instance, Maninis et al. [21] proposed an FCN for vessel segmentation that adopts multi-scale feature fusion to generate fine vessel maps. Fu et al. [22] adopted a holistically-nested edge detection model [23] to generate coarse segmentation maps, and then a conditional random field was adopted to model the relationships among long-range pixels and refine the segmentation maps. Besides the above methods, Ronneberger et al. proposed a u-shaped network, called U-Net, to preserve spatial information [12]. Similar to the FCN, the feature maps are first down-sampled to a low resolution and then up-sampled step by step; in each step, the high-resolution intermediate features of the encoder are utilized. Several methods based on U-Net have been proposed for vessel segmentation. For instance, Jin et al. [9] proposed DUNet, which uses deformable convolution rather than grid convolution in U-Net to capture the shape of vessels. Wu et al. [24] designed a two-branch network, where each branch consists of two U-Nets, and the output of their method is the average of the predictions of the two branches. In addition, different from [21] and [22], which used the entire image as training samples, both [9] and [24] used overlapped image patches of size 48 x 48 as training samples, and a re-composition procedure is required to complete a segmentation map during testing. Hence, they suffer from high computational complexity.
[Fig. 1 diagram. (a) DPN: front Conv, one DP-Block, DPR-Block1 ... DPR-Block7, with three auxiliary losses and the segmentation loss. (b) DP-Block: the H x W input is processed at H x W, H/2 x W/2 (pooling, stride 2) and H/4 x W/4 (pooling, stride 4) by 3 x 3 convolutions with C0, C1 and C2 filters, then fused back to an H x W output via 2x up-sampling, concatenation and convolution. (c) DPR-Block: as (b), with a residual connection.]
Fig. 1. (a) Overview of the proposed detail-preserving network (DPN). DPN consists of one DP-Block and seven DPR-Blocks, and it maintains high resolution representations during the whole process. (b) Overview of the proposed detail-preserving block (DP-Block), where C0, C1 and C2 denote the number of convolutional filters for each branch. (c) Overview of the proposed detail-preserving block with residual connection (DPR-Block).

Despite their success, the problem of losing spatial information in the down-sampling phase has not been fully addressed. Meanwhile, considering both computational complexity and segmentation accuracy, a fast and accurate vessel segmentation model is still lacking.

3 OUR METHOD
In this section, we describe our method in detail: the architecture of the proposed detail-preserving network, the detail-preserving block, and finally the loss function.
A good vessel segmentation model should segment both thick and thin vessels. This requires the model to learn structural semantic information and preserve detailed spatial information simultaneously. Structural information is beneficial for locating thick vessels and requires the model to have a large field of view, while detailed spatial information is important for locating vessel boundaries accurately, especially for thin vessels. However, it is easy to lose detailed information when learning structural information. For example, the structural information of U-Net [12] is learned by successive down-sampling operations, and the resolution of feature maps is decreased by a factor of 8 or even more (as can be seen in Fig. 2). Such low resolution implies that the spatial information of thin vessels is lost. U-Net utilizes intermediate features of the encoder to recover the spatial information; however, intermediate features may contain noise due to their small field of view.

Our study is motivated by the question of whether it is possible to preserve detailed information while still giving the network a large field of view. To this end, we present a high resolution representation network, called the detail-preserving network, for vessel segmentation. The architecture of our model is visualized in Fig. 1. The DPN mainly consists of a front convolution operation, eight detail-preserving blocks (specifically, one DP-Block and seven DPR-Blocks) and four loss functions. The DPN has four characteristics. 1) Different from U-Net, there are no down-sampling operations among the DP-Blocks, which implies that the resolution of features among these blocks stays the same. In other words, the DPN maintains a full/high resolution representation during the whole process (from input to output), thereby preserving detailed spatial information.
Fig. 2. The architecture of U-Net [12]. H and W denote the height and width of feature maps.

2) For the DP-Block, the receptive field of an output neuron can be as large as four times that of an input neuron, while detailed information is still preserved. Hence, the DPN achieves a large field of view via successive DP-Blocks, which ensures that it learns structural semantic information rather than only local information. The architecture of the DP-Block is described in the next section. 3) Different from U-Net variants that utilize VGGNet or ResNet as the backbone, which incurs a large number of parameters, the total number of parameters of the DPN is only about 96k. 4) The input of the DPN is the entire image, so it can integrate more contextual information than patch-level segmentation models. Meanwhile, our method needs only one forward pass to generate the complete segmentation map, so its inference speed is faster than that of patch-level models.
The DP-Block, as the key component of the DPN, can learn structural semantic information and preserve detailed spatial information at the same time. An overview of the DP-Block is visualized in Fig. 1(b). The input feature of the DP-Block is fed into three branches, and each branch is processed at a different scale; the output feature of the DP-Block is obtained by fusing the features of the three scales. The computing procedure of the DP-Block is as follows. For the first branch, a convolution operation with 3 x 3 kernels is applied to the input feature directly. For the second and third branches, the input feature is first down-sampled by a factor of 2 and 4 via pooling, respectively, and a 3 x 3 convolution is then applied. The output of the third branch is up-sampled by 2x and connected to the second branch, and the output of the second branch is further up-sampled and connected to the first branch. Here, we use the concatenation operation for feature fusion. We note that the resolution of the output feature of the DP-Block is the same as that of the input feature, so the DP-Block not only preserves detailed information but also learns multi-scale features.

Furthermore, we extend the DP-Block and propose the detail-preserving block with residual connection (DPR-Block). The residual connection is helpful for gradient propagation [25], as no pre-trained model is available to train the DPN. An overview of the DPR-Block is visualized in Fig. 1(c). The DPR-Block is built upon the DP-Block, except that the output of the DP-Block is summed with the input of the DPR-Block, and the result is connected to a convolution operation. Therefore, the size (Height x Width x Channel) of the output feature map of the DPR-Block is the same as that of the input feature map.
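For concreteness, the following is a minimal PyTorch sketch of the DP-Block and DPR-Block. The paper's implementation is in Caffe; max pooling, bilinear up-sampling, ReLU placement and the exact fusion convolutions are assumptions inferred from Fig. 1(b) and (c).

import torch
import torch.nn as nn
import torch.nn.functional as F

class DPBlock(nn.Module):
    """Three branches at H x W, H/2 x W/2 and H/4 x W/4, fused in cascade."""
    def __init__(self, in_ch, c0=16, c1=8, c2=8):  # C0/C1/C2 as in the text
        super().__init__()
        self.branch0 = nn.Conv2d(in_ch, c0, 3, padding=1)   # full resolution
        self.branch1 = nn.Conv2d(in_ch, c1, 3, padding=1)   # 1/2 resolution
        self.branch2 = nn.Conv2d(in_ch, c2, 3, padding=1)   # 1/4 resolution
        self.fuse21 = nn.Conv2d(c1 + c2, c1, 3, padding=1)  # fuse branch 3 into 2
        self.fuse10 = nn.Conv2d(c0 + c1, c0, 3, padding=1)  # fuse branch 2 into 1

    def forward(self, x):
        f0 = F.relu(self.branch0(x))
        f1 = F.relu(self.branch1(F.max_pool2d(x, 2)))
        f2 = F.relu(self.branch2(F.max_pool2d(x, 4)))
        f2 = F.interpolate(f2, scale_factor=2, mode="bilinear", align_corners=False)
        f1 = F.relu(self.fuse21(torch.cat([f1, f2], dim=1)))
        f1 = F.interpolate(f1, scale_factor=2, mode="bilinear", align_corners=False)
        return F.relu(self.fuse10(torch.cat([f0, f1], dim=1)))  # H x W x C0 output

class DPRBlock(nn.Module):
    """DP-Block plus a residual connection and an output convolution."""
    def __init__(self, ch=16):
        super().__init__()
        self.dp = DPBlock(ch, c0=ch, c1=8, c2=8)
        self.out_conv = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        return F.relu(self.out_conv(self.dp(x) + x))  # same H x W x C as the input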
The number of parameters. In our experiments, the numbers of convolutional filters C0, C1 and C2 for each branch of the DP-Block and DPR-Block were set to 16, 8 and 8, respectively. Suppose the dimension of the input feature of the DPR-Block is H x W x C0; then the number of parameters of each DPR-Block is only 11,592. In the DPN, the dimension of the output feature of the first convolution operation is H x W x 32, and the number of parameters of the DP-Block is then 13,880. Hence, the DP-Block and DPR-Block can be learned effectively even though the number of parameters is small.
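Under the assumptions of the sketch above, both counts can be reproduced directly:

dp = DPBlock(32)    # the front convolution of the DPN outputs 32 channels
dpr = DPRBlock(16)  # the DP-Block output has C0 = 16 channels
print(sum(p.numel() for p in dp.parameters()))   # 13880
print(sum(p.numel() for p in dpr.parameters()))  # 11592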
Relationship with the Inception module. Different from the Inception module [26], which uses parallel convolution operations with different kernel sizes to learn multi-scale features, our DP-Block adopts down-sampling first, so that the receptive field is further enlarged: the receptive field of each output neuron is four times that of the input neuron. As a result, the receptive field grows exponentially when multiple DP-Blocks are stacked. Furthermore, rather than processing the branches purely in parallel as in the Inception module, the DP-Block fuses the features of different branches in a cascaded manner to better learn multi-scale features.
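Stacking the blocks yields the full network. Below is a hedged sketch of the DPN of Fig. 1(a) under the same assumptions as above; the 1 x 1 side outputs after DPR-Blocks 1, 3, 5 and 7 anticipate the loss design described below. Notably, the total parameter count of this sketch is 95,988, consistent with the figure of about 96k quoted in this paper.

class DPN(nn.Module):
    """Front convolution, one DP-Block, seven DPR-Blocks, four 1x1 heads."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.front = nn.Conv2d(in_ch, 32, 3, padding=1)
        self.block0 = DPBlock(32)                          # 32 -> 16 channels
        self.blocks = nn.ModuleList([DPRBlock(16) for _ in range(7)])
        # 1x1 side outputs after DPR-Blocks 1, 3, 5 and 7 (see the loss section)
        self.heads = nn.ModuleDict({str(i): nn.Conv2d(16, 1, 1) for i in (1, 3, 5, 7)})

    def forward(self, x):
        f = self.block0(F.relu(self.front(x)))
        outs = []
        for i, block in enumerate(self.blocks, start=1):
            f = block(f)
            if str(i) in self.heads:
                outs.append(self.heads[str(i)](f))  # logits; sigmoid applied in the loss
        return outs  # outs[-1] is the segmentation output

print(sum(p.numel() for p in DPN().parameters()))  # 95988 under these assumptions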
Blood vessels account for only a small proportion of the entire image: the proportion of vessel pixels is 8.69%/7.6%/6.93% on the DRIVE/STARE/CHASE_DB1 datasets, respectively. There thus exists a class imbalance problem in vessel segmentation. To address this problem, we adopted the class-balanced cross-entropy loss [23], which uses a weight factor to balance vessel pixels and non-vessel pixels. The class-balanced cross-entropy loss is defined as

L(p, y | θ) = −β ∑_{y_j=1} log p_j − (1 − β) ∑_{y_j=0} log(1 − p_j)    (1)

where p is a probability map obtained by a sigmoid operation, and p_j denotes the probability that the j-th pixel belongs to a vessel. In addition, y denotes the ground truth, and θ denotes the model parameters. Rather than using a fixed value, the weight factor β is calculated at each iteration from the distribution of vessel pixels and non-vessel pixels:

β = N− / (N+ + N−)    (2)

where N+ denotes the number of vessel pixels and N− the number of non-vessel pixels. Since N− > N+, the weight for vessel pixels is larger than the weight for non-vessel pixels, so the model focuses more on vessel pixels than on non-vessel pixels. For example, on DRIVE about 8.69% of pixels are vessels, giving β ≈ 0.91, so each vessel pixel is weighted roughly ten times more heavily than a non-vessel pixel.

Besides the segmentation loss after the last layer of the DPN, we add three auxiliary losses to intermediate layers of the DPN to pass extra gradient signals and alleviate the gradient-vanishing problem, as done in DSN [27] and GoogLeNet [26]. As can be seen in Fig. 1, the first auxiliary loss is after DPR-Block1, the second after DPR-Block3, and the last after DPR-Block5; the segmentation loss is connected after DPR-Block7. Taking the first auxiliary loss as an example, we first adopt a convolution operation with one 1 x 1 kernel to generate an intermediate probability map, on which the loss of Eq. (1) is computed. The overall objective is

L_all(x, y | θ) = ∑_{i=1}^{4} L(p_i(x), y | θ) + λ‖θ‖²    (3)

where p_i denotes the probability map of the i-th loss function and λ denotes the weight decay coefficient.

In conclusion, we aim to minimize the above objective function during training. In the test phase, the output of the last segmentation head is taken as the segmentation result of the DPN, and the probability maps of the auxiliary losses are ignored.
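A minimal sketch of Eqs. (1)-(3), assuming the PyTorch setting of the earlier sketches (the paper's Caffe implementation may differ in reduction and numerical-stability details):

import torch

def class_balanced_bce(logits, target):
    """Eqs. (1)-(2): target is a binary map; beta is recomputed per image."""
    p = torch.sigmoid(logits).clamp(1e-7, 1 - 1e-7)
    n_pos = target.sum()
    n_neg = target.numel() - n_pos
    beta = n_neg / (n_pos + n_neg)                                   # Eq. (2)
    pos_term = -beta * (target * torch.log(p)).sum()
    neg_term = -(1 - beta) * ((1 - target) * torch.log(1 - p)).sum()
    return pos_term + neg_term

def dpn_loss(side_outputs, target):
    """Eq. (3): three auxiliary losses plus the segmentation loss; the
    weight decay term is usually delegated to the optimizer."""
    return sum(class_balanced_bce(o, target) for o in side_outputs)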
4 EXPERIMENTS

The performance of our method was evaluated on three public datasets: DRIVE [28], STARE [29] and CHASE_DB1 [30].

The DRIVE (Digital Retinal Images for Vessel Extraction) dataset contains 40 color fundus images captured with a 45° FOV (field of view). Each image has the same resolution of 565 x 584 (width x height). The dataset is officially partitioned into a training set and a test set, each containing 20 images. For the test set, two groups of annotations are provided; we used the annotations of the first group as ground truth to evaluate our model, just as other methods did. In addition, the FOV masks for calculating evaluation metrics are also provided.

The STARE (STructured Analysis of the REtina) dataset contains 20 equal-sized (700 x 605) color fundus images captured with a 35° FOV. The CHASE_DB1 dataset contains 28 color fundus images of resolution 999 x 960, captured with a 30° FOV. Neither STARE nor CHASE_DB1 provides a split of the training set and the test set. On STARE, we carried out two sets of experiments with different partition strategies, one of which is leave-one-out cross validation; the comparison below reports results under this setting. On CHASE_DB1, for fair comparison with other methods, we also did two sets of experiments: we adopted a 20/8 partition for the first set of experiments, where the first 20 images were selected for training and the remaining 8 images for testing, and for the other set of experiments we adopted a 14/14 (training/test) partition.
Fig. 3. Fundus images (first row) and the corresponding FOV masks (second row) from the DRIVE, STARE and CHASE_DB1 datasets, from left to right.

Fig. 4. (a) A fundus image from the STARE dataset. (b) Green channel of the color fundus image. (c) Image after CLAHE.

As FOV masks are not provided for STARE and CHASE_DB1, we created the masks manually. The FOV masks for DRIVE, STARE and CHASE_DB1 are presented in Fig. 3.
To avoid over-fitting, several transformations were adopted to augment the training set, including flipping (horizontal and vertical) and rotation (22°, 45°, 90°, 135°, 180°, 225°, 270°, 315°). As a result, the training images were augmented by a factor of 10 offline. Moreover, each training image was randomly mirrored at each iteration during training.

For the DRIVE and CHASE_DB1 datasets, no preprocessing was performed, and the raw color fundus images were fed into the segmentation model. For the STARE dataset, we applied contrast limited adaptive histogram equalization (CLAHE) [32] to the green channel of the fundus images to enhance low-contrast vessels, as can be seen in Fig. 4.
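A sketch of this preprocessing step with OpenCV; the clip limit and tile size are assumptions, as the paper does not state them:

import cv2

def preprocess_stare(path):
    """Green channel + CLAHE, as described above for the STARE images."""
    bgr = cv2.imread(path)                      # OpenCV loads images as BGR
    green = bgr[:, :, 1]                        # green channel: best vessel contrast
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(green)                   # enhanced single-channel image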
Our model was implemented based on the open-source deep learning framework Caffe [33], and it ran on a workstation equipped with one NVIDIA RTX 2080Ti GPU. We initialized the weights of our model with Xavier initialization [34]. The learning rate was initialized to 1e-3, and we trained our model for 100k/30k/100k iterations with ADAM [35] (batch size 1) and weight decay 0.0005 on the DRIVE/STARE/CHASE_DB1 datasets, respectively.

To reduce computational complexity, each training image was randomly cropped into 512 x 512 patches during training on the DRIVE and STARE datasets. For the CHASE_DB1 dataset, a 736 x 736 patch was cropped to use as much spatial information as possible. The crop operation was performed via the data layer of Caffe. When testing, the entire fundus image is fed into the network without cropping, so our model generates a segmentation map with only one forward pass.
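The random-crop step can be sketched as follows (NumPy illustration; the paper implements cropping inside Caffe's data layer):

import numpy as np

def random_crop(image, label, size=512):
    """Random training crop (512 for DRIVE/STARE, 736 for CHASE_DB1)."""
    h, w = image.shape[:2]
    top = np.random.randint(0, h - size + 1)
    left = np.random.randint(0, w - size + 1)
    return (image[top:top + size, left:left + size],
            label[top:top + size, left:left + size])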
We use five metrics to evaluate our method: sensitivity (Se), specificity (Sp), accuracy (Acc), the area under the receiver operating characteristic curve (AUC), and F1-score (F1). They are defined as

Se = TP / (TP + FN)    (4)

Sp = TN / (TN + FP)    (5)

Acc = (TP + TN) / (TP + FN + TN + FP)    (6)

F1 = (2 x Pr x Se) / (Pr + Se)    (7)

where Pr = TP / (TP + FP). True positives (TP) denote the number of vessel pixels classified correctly, and true negatives (TN) the number of non-vessel pixels classified correctly. Similarly, false positives (FP) denote the number of non-vessel pixels misclassified as vessels, and false negatives (FN) the number of vessel pixels misclassified as non-vessels. To calculate Se, Sp and Acc, we select the threshold corresponding to the optimal operating point of the receiver operating characteristic (ROC) curve to generate binary segmentation maps from a probability map. We also note that TP, FN, FP and TN are counted pixel by pixel, and only pixels inside the FOV mask are counted, not the whole fundus image. The ROC curve is obtained from multiple Se versus (1 − Sp) pairs by varying the threshold. The AUC evaluates the segmentation probability maps rather than the binary maps, which is more comprehensive; it ranges from 0 to 1, and the AUC of a perfect segmentation model is 1.

Besides these five evaluation metrics, we also report the segmentation speed of our model in fps (frames per second). The segmentation time t for each image is counted from reading the raw test image from the hard disk to writing the segmentation map to the hard disk; then fps = 1/t.
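The evaluation protocol can be sketched as follows; choosing the operating point that maximizes Se − (1 − Sp) (the Youden index) is our assumption for "optimal", as the paper does not define it:

import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def evaluate(prob_map, gt, fov_mask):
    """Se, Sp, Acc, AUC and F1 computed over FOV pixels only."""
    p = prob_map[fov_mask > 0].ravel()
    y = gt[fov_mask > 0].ravel().astype(int)
    auc = roc_auc_score(y, p)
    fpr, tpr, thresholds = roc_curve(y, p)
    t = thresholds[np.argmax(tpr - fpr)]        # assumed optimal operating point
    pred = (p >= t).astype(int)
    tp = np.sum((pred == 1) & (y == 1)); tn = np.sum((pred == 0) & (y == 0))
    fp = np.sum((pred == 1) & (y == 0)); fn = np.sum((pred == 0) & (y == 1))
    se = tp / (tp + fn)                          # Eq. (4)
    sp = tn / (tn + fp)                          # Eq. (5)
    acc = (tp + tn) / (tp + fn + tn + fp)        # Eq. (6)
    pr = tp / (tp + fp)
    f1 = 2 * pr * se / (pr + se)                 # Eq. (7)
    return se, sp, acc, auc, f1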
We compared our method with several state-of-the-art deep vessel segmentation methods on three public datasets in terms of segmentation performance, segmentation speed and the number of parameters. The comparison results are summarized in Table 1, Table 2 and Table 3.

As we can see from Table 1, compared with DRIU [21] and BTS-DSN [7], which need only one forward pass to generate the segmentation map during testing, our method achieves much higher Se, Acc, AUC and F1 at a very similar segmentation speed. Specifically, the Acc and AUC of our method are about 0.2% higher than those of DRIU and BTS-DSN. These two models utilize VGGNet [36] as the backbone and adopt multi-scale feature fusion to recover the spatial structure and detailed information of vessels. Different from these two models, our method learns a high resolution representation directly, which is friendly to thin vessel detection. We observe that our method achieves much higher Se, which means it can detect more vessel pixels, verifying the effectiveness of our DP-Block. In addition, compared with the other eight methods, which need multiple forward passes to generate the segmentation map for a single fundus image, the Se, Acc, AUC and F1 of our method are higher than those of six of the eight methods. Although the segmentation accuracy of our method is slightly lower than that of FCN [37] and Vessel-Net [38], the segmentation speed of our method is over 20x faster than that of FCN [37]. Specifically, our model can segment over 10 fundus images within 1 second, which greatly improves the throughput. Moreover, the number of parameters of our model is only 96k, much lower than that of all state-of-the-art models. Therefore, our model is more lightweight and more suitable for deployment on mobile devices.

On the STARE dataset, we carried out two sets of experiments, which differ in the partition strategy of the dataset. As we can observe from Table 2, our model achieves 0.8020, 0.9848, 0.9649, 0.9859, 0.8237, and 9.1 for Se, Sp, Acc, AUC, F1 and fps, respectively, under leave-one-out cross validation.
Among these metrics, our method shows superior performance in terms of Se, Acc, AUC, F1-score and fps compared with DUNet, U-Net and Three-stage FCN. Specifically, the segmentation speed of our model is 180x faster than that of DUNet [9], and the AUC of our model is 0.27% higher than that of DUNet. DUNet is a variant of U-Net that uses deformable convolution rather than grid convolution to better capture the shape characteristics of vessels, while our model uses the DP-Block to capture detailed information and structural information simultaneously. We argue that our method and DUNet are two different directions for improving segmentation accuracy, and the performance of our model might be further improved by replacing grid convolution with deformable convolution. Compared with DRIU, DeepVessel and BTS-DSN, although the fps of our model is slightly lower than that of image-level BTS-DSN, our model achieves the highest Sp, Acc, AUC and F1-score. In conclusion, our method shows superior performance on the STARE dataset.

On the CHASE_DB1 dataset, we compare our method with seven existing methods. Most existing state-of-the-art methods require multiple forward passes and a re-composition operation to generate a segmentation map for one fundus image, and thus show slow segmentation speed. DUNet [9] and DEU-Net [39] need over 10 s to segment a fundus image with resolution 999 x 960, whereas our model is over 70x faster than both. Meanwhile, our model achieves higher AUC than DUNet and DEU-Net. Compared with MS-NFN, Three-stage FCN, Vessel-Net and BTS-DSN, the Se, Acc and AUC of our model are lower only than those of Vessel-Net, while the Sp of our method is 0.27% higher than that of Vessel-Net. In summary, taking both segmentation accuracy and segmentation speed into consideration, our method shows competitive performance compared with other state-of-the-art methods.

To show the effectiveness of the proposed DPN, we present the segmentation probability maps and the corresponding binary maps in Fig. 5.
TABLE 1
Comparison results on the DRIVE dataset (for each metric, the top three scores are marked in red, green and blue, respectively).
Method | One Forward Pass? | Se | Sp | Acc | AUC | F1 | fps | Params (M)
FCN [37] | No | 0.8039 | 0.9804 | 0.9576 | 0.9821 | N.A | 0.5 | 0.2
U-Net [9] | No | 0.7849 | 0.9802 | 0.9554 | 0.9761 | 0.8175 | 0.32 | 3.4
DUNet [9] | No | 0.7963 | 0.9800 | 0.9566 | 0.9802 | 0.8237 | 0.07 | 0.9
DEU-Net [39] | No | 0.7940 | 0.9816 | 0.9567 | 0.9772 | 0.8270 | 0.15 | N.A
MS-NFN [24] | No | 0.7844 | 0.9819 | 0.9567 | 0.9807 | N.A | 0.1 | 0.4
Patch BTS-DSN [7] | No | 0.7891 | 0.9804 | 0.9561 | 0.9806 | 0.8249 | N.A | 7.8
Three-stage FCN [10] | No | 0.7631 | 0.9820 | 0.9538 | 0.9750 | N.A | N.A | 20.4
Vessel-Net [38] | No | 0.8038 | 0.9802 | 0.9578 | 0.9821 | N.A | N.A | 1.7
DRIU [21] | Yes | 0.7855 | 0.9799 | 0.9552 | 0.9793 | 0.8220 | N.A | 7.8
Image BTS-DSN [7] | Yes | 0.7800 | 0.9806 | 0.9551 | 0.9796 | 0.8208 | 12.3* |

N.A: Not Available. *: The metric was computed by ourselves.

TABLE 2
Comparison results on the STARE dataset (for each metric, the top three scores are marked in red, green and blue, respectively).
Method | One Forward Pass? | Se | Sp | Acc | AUC | F1 | fps | Split of dataset
DRIU [21] | Yes | 0.8036 | 0.9845 | 0.9658 | 0.9773 | 0.8310 | N.A | 10/10 (train/test)
DeepVessel [22] | Yes | 0.7412 | N.A | 0.9585 | N.A | N.A | N.A | 10/10 (train/test)
Image BTS-DSN [7] | Yes | 0.8201 | 0.9828 | 0.9660 | 0.9872 | 0.8362 | 9.3* |

N.A: Not Available. *: The metric was computed by ourselves.

TABLE 3
Comparison results on the CHASE_DB1 dataset (for each metric, the top three scores are marked in red, green and blue, respectively).
Method | One Forward Pass? | Se | Sp | Acc | AUC | F1 | fps | Split of dataset
MS-NFN [24] | No | 0.7538 | 0.9847 | 0.9637 | 0.9825 | N.A | < |

N.A: Not Available. *: The metric was computed by ourselves.

We can observe that our model detects both thin vessels and thick vessel trees, verifying the effectiveness of the proposed DP-Block and DPR-Block.

Moreover, we present three challenging cases in Fig. 6. We can observe that our model detects thin vessels of only one-pixel width, as the DPN always preserves spatial information. In addition, our model is able to segment some extremely thin, low-contrast vessels near the macula. In the third row of Fig. 6, there are two lumps of hemorrhage, which share similar local features with vessels; as the DPN captures structural information, it is robust to the presence of hemorrhages. Our model also segments well some true vessels that were not annotated. In summary, the proposed method segments both thick and thin vessels and is robust to noise.

A good vessel segmentation model should perform well not only on the test set of one dataset, but also on other datasets without retraining. To show the generalization ability of our model, we conducted cross-training experiments on the DRIVE and STARE datasets. Different from BTS-DSN [7], which retrained the model on the whole of one dataset and tested on another, we followed the cross-training setting of DUNet [9] and Three-stage FCN [10], in which the model trained on the DRIVE training set is applied to the whole STARE dataset without retraining, and vice versa. We compare with four methods, and the comparison results are summarized in Table 4.

When transferring our method from STARE to DRIVE, it achieves the highest Se and Acc. Specifically, the sensitivity of our model is nearly 4% higher than that of Three-stage FCN [10], while the specificity is almost the same. Hence, our model segments more vessels than Three-stage FCN, and high sensitivity is critical for clinical application.
Fig. 5. Visualization of the segmentation maps. The first, third and fifth columns correspond to the highest accuracy on the DRIVE, STARE and CHASE_DB1 datasets; the second, fourth and sixth columns correspond to the lowest accuracy on the DRIVE, STARE and CHASE_DB1 datasets. From row 1 to 4: fundus images, ground truth, probability maps and binary maps.
Fig. 6. Visualization of some challenging cases: (a) segmentation of extremely thin vessels; (b) segmentation of low-contrast vessels; (c) segmentation in the presence of hemorrhages. From left to right: fundus image patches, ground truth, and the segmentation probability maps generated by the proposed DPN.

Different from Three-stage FCN, which designs specialized network structures for thin and thick vessels, respectively, our proposed DP-Block captures thin vessels by learning high resolution features while simultaneously preserving global structural information for thick vessels. The cross-training experiments show the superior performance of our method over Three-stage FCN.

When transferring from DRIVE to STARE, our method shows poor performance. We argue that the reason is that we applied no preprocessing to the training samples, while there is a big gap between the two datasets in terms of color and illumination. We therefore retrained our model on the DRIVE training set using CLAHE, as was done for the STARE dataset. We can observe from Table 4 that all four evaluation metrics improved after adopting CLAHE preprocessing. Compared with the other four methods, our model shows superior performance in terms of Sp and AUC. Taking the cross-training results and segmentation speed into consideration, our method is more suitable for clinical application than existing methods.
To show the effectiveness of using auxiliary losses in the intermediate layers, we removed all three auxiliary losses from the DPN; the experimental results are summarized in Table 5. We can observe that almost all evaluation metrics improved after adopting the auxiliary losses. Specifically, the segmentation accuracy was improved by over 0.1% on all three datasets. This experiment verifies the rationality and effectiveness of adopting auxiliary losses.
TABLE 4
Results of the cross-training experiments.
Dataset | Methods | Se | Sp | Acc | AUC
STARE → DRIVE | Fraz [40] | 0.7242 | 0.9792 | 0.9456 | 0.9697
STARE → DRIVE | Li [41] | 0.7273 | 0.9810 | 0.9486 | 0.9677
STARE → DRIVE | Yan [10] | 0.7014 | 0.9802 | 0.9444 | 0.9568
STARE → DRIVE | Jin [9] | 0.6505 |
STARE → DRIVE | Our Method |
DRIVE → STARE | Fraz [40] | 0.7010 | 0.9770 | 0.9495 | 0.9660
DRIVE → STARE | Li [41] | 0.7027 | 0.9828 | 0.9545 | 0.9671
DRIVE → STARE | Yan [10] |
DRIVE → STARE | Our Method* |

*: Results obtained by retraining our model on the DRIVE training set (20 images) using a preprocessing strategy with CLAHE.

TABLE 5
Comparison results of employing auxiliary losses or not (best results shown in bold).
Dataset | Auxiliary Loss? | Se | Sp | Acc | AUC | F1
DRIVE | No | 0.7838 |
DRIVE | Yes |
STARE | No | 0.8075 | 0.9847 | 0.9662 | 0.9872 | 0.8361
STARE | Yes |
CHASE_DB1 | No | 0.7626 |
CHASE_DB1 | Yes |
5 CONCLUSION
Deep learning models have been applied to fundus vessel segmentation and achieve remarkable performance. In this paper, we propose a deep model, called DPN, to segment fundus vessel trees. Different from U-Net and FCN, in which the resolution of features is first down-sampled and then up-sampled, our method maintains a high resolution throughout the whole process, so that vessel boundaries can be located accurately. To accomplish this goal, we further proposed the DP-Block, in which multi-scale fusion is adopted to preserve detailed information and learn structural information. To show the effectiveness of our method, we trained the DPN from scratch on three publicly available datasets: DRIVE, STARE and CHASE_DB1. Experimental results show that our method achieves competitive or better performance with only about 96k parameters. Specifically, the segmentation speed of our method is 20-160x faster than that of other state-of-the-art methods on the DRIVE dataset. Moreover, to evaluate the generalization ability of our method, we conducted cross-training experiments; the results reveal that our method achieves competitive performance. Considering segmentation accuracy, segmentation speed and model generalization ability together, our model shows superior performance and is suitable for real-world application. In the future, we aim to extend our method and develop robust deep models for fundus microaneurysm segmentation.

REFERENCES
[1] T. Y. Wong, J. Sun, R. Kawasaki, P. Ruamviboonsuk, N. Gupta, V. C. Lansingh, M. Maia, W. Mathenge, S. Moreker, M. M. Muqit et al., "Guidelines on diabetic eye care: the international council of ophthalmology recommendations for screening, follow-up, referral, and treatment based on resource settings," Ophthalmology, vol. 125, no. 10, pp. 1608-1622, 2018.
[2] L. Cao, H. Li, Y. Zhang, L. Zhang, and L. Xu, "Hierarchical method for cataract grading based on retinal images using improved haar wavelet," Information Fusion, vol. 53, pp. 196-208, 2020.
[3] S. Irshad and M. U. Akram, "Classification of retinal vessels into arteries and veins for detection of hypertensive retinopathy," IEEE, 2014, pp. 133-136.
[4] I. U. Scott, G. Alexandrakis, G. J. Cordahi, and T. G. Murray, "Diffuse and circumscribed choroidal hemangiomas in a patient with sturge-weber syndrome," Archives of Ophthalmology, vol. 117, no. 3, pp. 406-407, 1999.
[5] T. Y. Wong, J. Coresh, R. Klein, P. Muntner, D. J. Couper, A. R. Sharrett, B. E. Klein, G. Heiss, L. D. Hubbard, and B. B. Duncan, "Retinal microvascular abnormalities and renal dysfunction: the atherosclerosis risk in communities study," Journal of the American Society of Nephrology, vol. 15, no. 9, pp. 2469-2476, 2004.
[6] U. Schmidt-Erfurth, A. Sadeghipour, B. S. Gerendas, S. M. Waldstein, and H. Bogunović, "Artificial intelligence in retina," Progress in Retinal and Eye Research, vol. 67, pp. 1-29, 2018.
[7] S. Guo, K. Wang, H. Kang, Y. Zhang, Y. Gao, and T. Li, "BTS-DSN: Deeply supervised neural network with short connections for retinal vessel segmentation," International Journal of Medical Informatics, vol. 126, pp. 105-113, 2019.
[8] G. Azzopardi and N. Petkov, "Automatic detection of vascular bifurcations in segmented retinal images using trainable COSFIRE filters," Pattern Recognition Letters, vol. 34, no. 8, pp. 922-933, 2013.
[9] Q. Jin, Z. Meng, T. D. Pham, Q. Chen, L. Wei, and R. Su, "DUNet: A deformable network for retinal vessel segmentation," Knowledge-Based Systems, vol. 178, pp. 149-162, 2019.
[10] Z. Yan, X. Yang, and K.-T. Cheng, "A three-stage deep learning model for accurate retinal vessel segmentation," IEEE Journal of Biomedical and Health Informatics, vol. 23, no. 4, pp. 1427-1436, 2018.
[11] E. Shelhamer, J. Long, and T. Darrell, "Fully convolutional networks for semantic segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 4, pp. 640-651, 2017.
[12] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234-241.
[13] J. Wang, K. Sun, T. Cheng, B. Jiang, C. Deng, Y. Zhao, D. Liu, Y. Mu, M. Tan, X. Wang et al., "Deep high-resolution representation learning for visual recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.
[14] Y. Wang, G. Ji, P. Lin, and E. Trucco, "Retinal vessel segmentation using multiwavelet kernels and multiscale hierarchical decomposition," Pattern Recognition, vol. 46, no. 8, pp. 2117-2133, 2013.
[15] Y. Yin, M. Adel, and S. Bourennane, "Retinal vessel segmentation using a probabilistic tracking method," Pattern Recognition, vol. 45, no. 4, pp. 1235-1244, 2012.
[16] S. Garg, J. Sivaswamy, and S. Chandra, "Unsupervised curvature-based retinal vessel segmentation," in IEEE International Symposium on Biomedical Imaging: From Nano to Macro. IEEE, 2007, pp. 344-347.
[17] Q. Li, J. You, L. Zhang, and P. Bhattacharya, "A multiscale approach to retinal vessel segmentation using Gabor filters and scale multiplication," in IEEE International Conference on Systems, Man and Cybernetics, vol. 4. IEEE, 2006, pp. 3521-3527.
[18] A. Christodoulidis, T. Hurtut, H. B. Tahar, and F. Cheriet, "A multi-scale tensor voting approach for small retinal vessel segmentation in high resolution fundus images," Computerized Medical Imaging and Graphics, vol. 52, pp. 28-43, 2016.
[19] P. Liskowski and K. Krawiec, "Segmenting retinal blood vessels with deep neural networks," IEEE Transactions on Medical Imaging, vol. 35, no. 11, pp. 2369-2380, 2016.
[20] X. Wang, X. Jiang, and J. Ren, "Blood vessel segmentation from fundus image by a cascade classification framework," Pattern Recognition, vol. 88, pp. 331-341, 2019.
[21] K.-K. Maninis, J. Pont-Tuset, P. Arbeláez, and L. Van Gool, "Deep retinal image understanding," in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2016, pp. 140-148.
[22] H. Fu, Y. Xu, S. Lin, D. W. K. Wong, and J. Liu, "DeepVessel: Retinal vessel segmentation via deep learning and conditional random field," in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2016, pp. 132-139.
[23] S. Xie and Z. Tu, "Holistically-nested edge detection," International Journal of Computer Vision, vol. 125, no. 1-3, pp. 3-18, 2017.
[24] Y. Wu, Y. Xia, Y. Song, Y. Zhang, and W. Cai, "Multiscale network followed network model for retinal vessel segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2018, pp. 119-126.
[25] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778.
[26] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1-9.
[27] C.-Y. Lee, S. Xie, P. Gallagher, Z. Zhang, and Z. Tu, "Deeply-supervised nets," in International Conference on Artificial Intelligence and Statistics, ser. Proceedings of Machine Learning Research, vol. 38, 2015, pp. 562-570. [Online]. Available: http://proceedings.mlr.press/v38/lee15a.html
[28] J. Staal, M. D. Abràmoff, M. Niemeijer, M. A. Viergever, and B. van Ginneken, "Ridge-based vessel segmentation in color images of the retina," IEEE Transactions on Medical Imaging, vol. 23, no. 4, pp. 501-509, 2004.
[29] A. W. Hoover, V. L. Kouznetsova, and M. H. Goldbaum, "Locating blood vessels in retinal images by piecewise threshold probing of a matched filter response," IEEE Transactions on Medical Imaging, vol. 19, no. 3, pp. 203-210, 2000.
[30] M. M. Fraz, P. Remagnino, A. Hoppe, B. Uyyanonvara, A. R. Rudnicka, C. G. Owen, and S. A. Barman, "Blood vessel segmentation methodologies in retinal images - a survey," Computer Methods and Programs in Biomedicine, vol. 108, no. 1, pp. 407-433, 2012.
[31] D. Marín, A. Aquino, M. E. Gegúndez-Arias, and J. M. Bravo, "A new supervised method for blood vessel segmentation in retinal images by using gray-level and moment invariants-based features," IEEE Transactions on Medical Imaging, vol. 30, no. 1, pp. 146-158, 2010.
[32] S. M. Pizer, E. P. Amburn, J. D. Austin, R. Cromartie, A. Geselowitz, T. Greer, B. ter Haar Romeny, J. B. Zimmerman, and K. Zuiderveld, "Adaptive histogram equalization and its variations," Computer Vision, Graphics, and Image Processing, vol. 39, no. 3, pp. 355-368, 1987.
[33] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, "Caffe: Convolutional architecture for fast feature embedding," in ACM International Conference on Multimedia, 2014, pp. 675-678.
[34] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in International Conference on Artificial Intelligence and Statistics (AISTATS), ser. Proceedings of Machine Learning Research, vol. 9. PMLR, 13-15 May 2010, pp. 249-256.
[35] D. P. Kingma and J. L. Ba, "Adam: A method for stochastic optimization," in International Conference on Learning Representations, 2015, pp. 1-13.
[36] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in International Conference on Learning Representations, 2015. [Online]. Available: http://arxiv.org/abs/1409.1556
[37] A. Oliveira, S. Pereira, and C. A. Silva, "Retinal vessel segmentation based on fully convolutional neural networks," Expert Systems with Applications, vol. 112, pp. 229-242, 2018.
[38] Y. Wu, Y. Xia, Y. Song, D. Zhang, D. Liu, C. Zhang, and W. Cai, "Vessel-Net: Retinal vessel segmentation under multi-path supervision," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019, pp. 264-272.
[39] B. Wang, S. Qiu, and H. He, "Dual encoding U-Net for retinal vessel segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019, pp. 84-92.
[40] M. M. Fraz, P. Remagnino, A. Hoppe, B. Uyyanonvara, A. R. Rudnicka, C. G. Owen, and S. A. Barman, "An ensemble classification-based approach applied to retinal blood vessel segmentation," IEEE Transactions on Biomedical Engineering, vol. 59, no. 9, pp. 2538-2548, 2012.
[41] Q. Li, B. Feng, L. Xie, P. Liang, H. Zhang, and T. Wang, "A cross-modality learning approach for vessel segmentation in retinal images," IEEE Transactions on Medical Imaging, vol. 35, no. 1, pp. 109-118, 2016.