VC-Net: Deep Volume-Composition Networks for Segmentation and Visualization of Highly Sparse and Noisy Image Data
Yifan Wang, Guoli Yan, Haikuan Zhu, Sagar Buch, Ying Wang, Ewart Mark Haacke, Jing Hua, Zichun Zhong
Abstract—The fundamental motivation of the proposed work is to present a new visualization-guided computing paradigm that combines direct 3D volume processing and volume-rendered clues for effective 3D exploration. For example, extracting and visualizing microstructures in-vivo has been a long-standing challenging problem. Due to the high sparseness and noisiness of cerebrovasculature data, as well as the highly complex geometric and topological variations of micro vessels, it is still extremely challenging to extract the complete 3D vessel structure and visualize it in 3D with high fidelity. In this paper, we present an end-to-end deep learning method, VC-Net, for robust extraction of 3D microvascular structure through embedding the image composition, generated by maximum intensity projection (MIP), into the 3D volumetric image learning process to enhance the overall performance. The core novelty is to automatically leverage the volume visualization technique (e.g., MIP, a volume rendering scheme for 3D volume images) to enhance 3D data exploration at the deep learning level. The MIP embedding features can enhance the local vessel signal (by canceling out the noise) and adapt to the geometric variability and scalability of vessels, which is of great importance in microvascular tracking. A multi-stream convolutional neural network (CNN) framework is proposed to effectively learn the 3D volume and 2D MIP feature vectors, respectively, and then explore their inter-dependencies in a joint volume-composition embedding space by unprojecting the 2D feature vectors into the 3D volume embedding space. It is noted that the proposed framework can better capture the small / micro vessels and improve vessel connectivity. To our knowledge, this is the first time that a deep learning framework has been proposed to construct a joint convolutional embedding space, where the computed vessel probabilities from the volume rendering based 2D projection and the 3D volume can be explored and integrated synergistically. Experimental results are evaluated and compared with traditional 3D vessel segmentation methods and the state-of-the-art in deep learning, using extensive public and real patient (micro-)cerebrovascular image datasets. The accurate segmentation and visualization of sparse and complicated 3D microvascular structure facilitated by our method demonstrates the potential of a powerful MR arteriogram and venogram diagnosis of vascular disease.
Index Terms—Deep neural network, 3D cerebrovascular segmentation and visualization, maximum intensity projection (MIP), joint embedding
1 Introduction
Nowadays, there is a pressing need for better visualizing and understanding microstructures in raw and wild datasets. For instance, the acquisition of in-vivo micro-level 3D vasculature from image data is a grand challenge. The notorious difficulties of microvascular data analytics lie in the high sparseness of vessel data in a large-sized 3D volume, e.g., the scattered vessel fragments in the angiographic data against the otherwise encompassing white and grey matter (as well as background and noise); high noisiness, such as a low signal-to-noise ratio (SNR), e.g., about 10:1 in cerebrovascular images; the tininess of micro-level vessels, e.g., a diameter in images of merely 1 ∼ 100 microns; and sophisticated vessel geometry and topology variations, e.g., local "crossing", "kissing", or "tortuous" vessel structures, etc. Currently, for such complex 3D micro-level data, it would be impossible from a timing perspective for clinicians to review all this data manually and label abnormalities slice by slice. The 3D structural / contextual information and quantitative metrics are still missing, although maximum intensity projection (MIP) [31], a widely-used approach for qualitatively visualizing and analyzing the 3D vasculature, has been employed to enhance the local vessel signal, allowing for geometric variability and scalability. The labor-intensive, time-consuming, and 3D global / contextual information-missing nature of the procedure makes it very challenging to fully take advantage of the large number of 3D datasets (images and shapes) available for reference and comparison, and to reach more informed and accurate decisions.

• Y. Wang, G. Yan, H. Zhu, J. Hua, and Z. Zhong are with the Department of Computer Science, Wayne State University, Detroit, MI 48202. E-mail: {yifan.wang2, guoliyan, hkzhu, jinghua, zichunzhong}@wayne.edu.
• S. Buch, Y. Wang, and E. M. Haacke are with the Department of Radiology, Wayne State University, Detroit, MI 48201. E-mail: {sagarbuchmri, neuying}@gmail.com, [email protected].

In recent decades, automatic model-driven vessel extraction and segmentation approaches have been proposed, such as multiscale filtering [12], region growing techniques [27], active contours [30], geometric flow [8], the level-set approach [11], the nonlinear subtraction (NLS) method [47], the template-based predictor-corrector algorithm [14], etc. However, these approaches are easily overwhelmed by tons of low-level handcrafted features and complicated manual parameter adjustment needed to overcome the aforementioned difficulties and subject variations.

Recently, data-driven approaches have been proposed to robustly investigate the correlations between different objects / instances without relying on hard-coded metrics. In medical image visualization and processing, several deep learning based methods have been proposed to extract vessels from 2D retinal images, such as DeepVessel [13], multi-level deep supervised networks [29], a deep neural network (DNN)-based method [26], a unified convolutional neural network (CNN) and graph neural network (GNN) [37], etc. These methods can perform 2D vessel segmentation tasks well, but are far from satisfactory in the 3D vessel scenario. There are still very few dedicated deep learning architectures for 3D vessel segmentation, such as Uception [34], DeepVesselNet [40], and VesselNet [22], etc.
Existing methods do not consider using visualization techniques in 3D vessel extraction and are not specifically designed for solving the aforementioned challenges in 3D micro-cerebrovascular segmentation.

The fundamental motivation of the proposed work is to present a new visualization-guided computing paradigm that combines direct 3D volume processing and volume-rendered clues for effective 3D exploration. In order to fill the gap in high-fidelity 3D micro-cerebrovascular segmentation and visualization for in-vivo medical data, we present a DNN method, VC-Net, for robustly extracting sparse microvascular structures through embedding the 2D image slice composition by MIP into the 3D volumetric image learning process, enhancing the overall performance of 3D vasculature segmentation. The core novelty is to automatically leverage the volume visualization technique (e.g., MIP, a volume rendering technique for 3D volume images) to enhance the qualitative 3D data exploration, especially for 3D in-vivo segmentation and visualization, at the deep learning level. It is noted that the proposed framework can better capture the micro vessels and improve the vessel connectivity. The key motivation of our network is to integrate the trustworthy auxiliary information from learned 2D MIP features into the 3D volume segmentation and visualization network, instead of empirically using more complicated networks. Experimental results are evaluated and compared with the traditional 3D vessel segmentation methods and the state-of-the-art in deep learning, using extensive public and real patient (micro-)cerebrovascular image datasets.
The key contributions of our work are as follows:

• It proposes an effective end-to-end deep learning method to segment and visualize high-fidelity 3D sparse microvascular structure with complicated geometry and topology variations from volumetric images with significant noise.
• A multi-stream CNN framework is designed to effectively learn the feature vectors of the 3D raw volume and the multislice composited 2D MIP (volume rendering), respectively, and to explore the inter-dependencies between 3D and 2D embedded features in a joint volume-composition embedding space by unprojecting (inverse volume rendering) the 2D features, learned from the MIP, into the 3D volume embedding space.
• To our knowledge, this is the first time that a deep learning framework is proposed to construct such a joint convolutional embedding space, where the computed joint vessel probabilities from the 2D projection and the 3D volume can be integrated synergistically.
• The application and experiments on accurate in-vivo segmentation and visualization of sparse and complicated 3D microvascular structure facilitated by our method demonstrate the potential of a novel and powerful MR arteriogram and venogram (MRAV) diagnosis of vascular disease.

2 Related Work
In this section, we review the most related work on 2D / 3D vessel extraction and segmentation in the visualization and medical imaging domains.
Traditionally, doctors have to manually segment each image slice to obtain accurate vessel structures, which is extremely tedious and time-consuming. Therefore, it is important to develop automatic vessel segmentation methods. For instance, Wilson and Noble [45] introduced a mixture distribution for the data, motivated by a physical model of blood flow, that is used in a two-stage segmentation algorithm with a statistical classifier and structural criteria. Chung and Noble [6] presented an extended version of the previous 3D cerebral vessel segmentation algorithm [45], introduced a Rician distribution for background noise modeling, and used a modified expectation-maximization (EM) algorithm for the parameter estimation procedure. Frangi et al. [12] developed a vessel enhancement filter by computing the multiscale second order local structure of an image (i.e., the Hessian); a vesselness measure is obtained on the basis of all eigenvalues of the Hessian. Based on the multiscale filtering method [12], Descoteaux et al. [8] developed a novel geometric flow for segmenting vasculature in proton-density images, which can also be applied to magnetic resonance angiography (MRA) or MRI data. Martínez-Pérez et al. [27] presented a retinal blood vessel segmentation method based on scale-space analysis, obtaining the vessel geometrical features from the first and second derivatives of the image intensity; they then used a multiple-pass region growing procedure that progressively segments the blood vessels. Nain et al. [30] combined image statistics and shape information to derive a region-based active contour that segments tubular structures and penalizes leakages. Liao et al. [25] introduced a fast marching approach with curvature regularization for vessel segmentation, since most vessels have a smooth path and curvature can be used to distinguish desired vessels. Florin et al. [10] proposed a particle filter based propagation approach for the segmentation of vascular structures in 3D volumes. To obtain posterior probability estimation of the vessel location, Wang et al. [43] employed sequential Monte Carlo tracking and proposed a vessel segmentation method by fusing multiple cues extracted from CT images for enhanced segments from global path minimization. Forkert et al. [11] presented and evaluated a level-set segmentation approach with vesselness-dependent anisotropic energy weights, which focuses on the exact segmentation of malformed as well as small vessels from time-of-flight (TOF) MRA datasets. Ye et al. [47] proposed the nonlinear subtraction (NLS) method, employed for selective MRA enhancement utilizing the flow-rephased and dephased images; the vessel label can then be obtained from an enhanced angiography map. Govyadinov et al. [14] described a template-based predictor-corrector method for tracing filaments that is robust in microvascular datasets, and applied a number of glyph-based visualization techniques to represent the aggregated and biologically relevant information of the extracted microvascular network; they further developed a bi-modal visualization framework [15], leveraging graph-based and geometry-based techniques to achieve interactive visualization of microvascular networks. However, these approaches are exhausted by handcrafted features (e.g., gradients of the intensity, second order local structures, maximum principal curvatures) and complicated manual parameter adjustment to adapt to subject variations. Therefore, their robustness and accuracy across subjects are limited.
Recently, there is an emerging trend to automatically extract, segment, and reconstruct shape objects of interest from input 2D / 3D images [9, 42, 44] or 3D meshes / point clouds [21, 23, 28, 32, 46] by deep neural networks (DNNs) [3, 18]. Particularly for vessel structures, several deep learning based methods have been proposed to extract vessels from 2D retinal images. DeepVessel [13] addresses retinal vessel segmentation as a boundary detection task that is solved using a CNN with a side-output layer to learn discriminative representations, and a conditional random field (CRF) layer that accounts for non-local pixel correlations. Li et al. [24] presented a supervised method for vessel segmentation using a cross-modality data transformation from the retinal image to the vessel map. Mo and Zhang [29] developed a deeply supervised fully convolutional network by leveraging multi-level hierarchical features of the deep networks for retinal vessel segmentation. Liskowski and Krawiec [26] proposed a supervised segmentation technique that uses a DNN trained on a large number of samples preprocessed with global contrast normalization and zero-phase whitening, and augmented using geometric transformations and gamma corrections. Shin et al. [37] incorporated a graph neural network (GNN) into a unified CNN architecture to jointly exploit both local appearances and global vessel structures; their framework has been evaluated on retinal image datasets and a coronary artery X-ray angiography dataset.
These methods can perform well on the 2D vessel segmentation task, but are far from satisfaction / feasibility in the 3D micro vessel scenario, since their designs either do not consider the correlation / inter-information between slices in 3D volumetric images or cannot afford the computational and memory burdens of the large 3D volume at the micro-level. As for deep learning based 3D vessel segmentation, Uception [34] presents a network inspired by the 3D U-Net [7] and the Inception modules [38] for segmentation of the cerebrovascular network in MRA images. DeepVesselNet [40] and VesselNet [22] propose 2D orthogonal cross-hair filters in the sagittal, coronal, and axial planes on each voxel to make use of 3D context information at a reduced computational burden and memory cost. However, the challenging problems in 3D micro-cerebrovascular segmentation are complicated vessel geometry and topology, high sparseness and noise of vessel data in a large-sized 3D volume, and the limited resource of 3D microvascular datasets. The above methods do not overcome these challenges.

3 Method

The fundamental inspiration of the proposed work is to mimic the observation of human exploration in 3D aided by volume rendering. Our work presents a new paradigm to combine direct 3D volume processing and volume rendered clues for effective 3D exploration. For instance, in Fig. 1, for a micro-cerebrovascular dataset, the 3D volume image can more accurately represent the 3D spatial information, but the desired task is easily confused by the challenging SNR and sparse vesselness, as shown in the raw image slices; while the volume rendered 2D MIP image can better enhance the local vessel signal by enforcing vessel continuity and adapt to the geometric variability and scalability of vessels. However, it always lacks 3D spatial sense, e.g., two "crossing" and "kissing" vessels as circled in red (in fact, the bigger one is above the smaller one in 3D space). So it is deficient to investigate the (sparse and noisy) 3D data from either the 3D volume or the volume rendered 2D MIP alone. In this work, we design a novel paradigm to support 3D data analytics, such as segmentation, by using visualization-guided computing. Instead of conducting the rendering / composition at the final stage as in traditional visualization pipelines, this paradigm qualitatively investigates the 3D volume data from 2D composited (rendered) images. Essentially, this procedure makes the visualization more important via an early and simultaneous involvement of volume rendering (composition). Finally, we explore the 3D data analytics in a joint volume-composition space. In the following, we introduce the components of the VC-Net model: network architecture and loss function, and dataset generation and preparation.

Fig. 1: A novel 3D data analytic paradigm in a joint volume-composition space, where volume rendered results are used to support the visualization-guided 3D volume processing by deep learning.
The proposed VC-Net mainly consists of a dual-stream component (i.e., a 3D volume segmentation stream and a 2D composited MIP segmentation stream) and the bi-directional operations between these two streams (i.e., 3D-to-2D projection and 2D-to-3D unprojection). The overall architecture is demonstrated in Fig. 2. The two-stream segmentation component can learn vessel feature vectors in the 3D volume and the corresponding multiple 2D MIP (an enhanced and dense depiction of 3D relationships via a 3D-to-2D projection computation) contexts, respectively. After that, the embedded features from the 2D composited MIP are transformed from the 2D MIP domain into the 3D volume domain through a 2D-to-3D unprojection process. Then, the extracted 2D and 3D embedded features from the two streams are integrated together, constructing a unified high-dimensional joint convolutional embedding space, which can strengthen the original sparse vessel features from the 3D volume. Finally, the vessel segmentation prediction is learned at the fusion stage in this joint convolutional embedding space.

In this work, we use a 3D U-Net [7] as the 3D volume segmentation stream and a half 2D U-Net [33] (in terms of feature channel numbers) as the 2D composited MIP segmentation stream, respectively. U-Net-like networks are the most commonly-used and robust medical imaging segmentation neural networks across different data modalities and varying organ / tissue geometries, which makes them suitable for justifying the benefits of our 2D-to-3D unprojection and joint embedding of the 3D volume and the 2D composited MIP. A U-Net-like network is essentially a convolutional encoder-decoder network, which first embeds the input into a high-dimensional feature vector through hierarchical convolution and pooling at the encoder stages, and then decodes the feature vector in the hidden space through hierarchical upsampling and convolution at the decoder stages, integrating the features directed from the different encoder stages through long-skipped feature concatenations. In Fig. 2, the layer output feature channel numbers are denoted in the corresponding blocks, and the layer input spatial dimensions are shown at the horizontal levels of every block.

Due to the limited data availability and large volume size of micro-cerebrovascular image datasets, we choose to train the network patch-wisely. Specifically, from the observation that most brain MRAs have much higher resolution in the axial plane than in the other planes, we adaptively train our network using non-cubic patches, which have a larger dimension size across the axial plane, instead of resizing the data into a uniform voxel spacing through interpolation before the network training, thus avoiding potential data corruption. As shown in Fig. 2, the key step in our network is the effective integration of the features from the two different streams / domains. To fuse the 2D composited MIP stream into the 3D segmentation main stream during the learning process, first note that the segmentation task is essentially a dense voxel (pixel)-wise classification problem, and the integration of embedded feature vectors from different learning domains and objectives (i.e., 3D volume segmentation and 2D composited MIP segmentation) must be fused voxel-wisely with the correct spatial correspondence in the volumetric domain. Accordingly, there are two main challenges that need to be overcome in this work. The first one is an effective format for the corresponding 2D composited MIPs from a randomly-extracted 3D volume patch that is adequate and suitable for delivering dense voxel correspondence in the simultaneous dual-stream learning design. The second one is an effective approach for mapping / unprojecting the feature vectors extracted from the composited MIP image plane pixels (in a dimension-reduced 2D) back to the corresponding 3D volume voxels. More details are introduced in the following subsections.
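The patch-wise training setup described above can be illustrated with a small sketch: a non-cubic patch is cropped directly from the anisotropic volume, so no resampling to uniform voxel spacing is needed. The function name and the example sizes below are ours, not the paper's exact values:

```python
import numpy as np

def extract_patch(volume, patch_size, rng=None):
    """Randomly crop a (possibly non-cubic) 3D patch from a volume.

    `volume` is an (H, W, D) array; a patch size like (96, 96, 32) keeps
    the larger in-plane (axial) dimensions typical of brain MRA, avoiding
    interpolation of the anisotropic voxels before training.
    """
    if rng is None:
        rng = np.random.default_rng()
    # pick a random corner so the patch fits entirely inside the volume
    corner = [rng.integers(0, v - p + 1) for v, p in zip(volume.shape, patch_size)]
    slices = tuple(slice(c, c + p) for c, p in zip(corner, patch_size))
    return volume[slices]

vol = np.zeros((256, 256, 64), dtype=np.float32)
patch = extract_patch(vol, (96, 96, 32))
print(patch.shape)  # (96, 96, 32)
```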
The major motivation for projecting the 3D volume space into the 2D MIP space is to enhance the local vessel probability. Given a randomly-extracted 3D volume patch V of size K1 × K2 × K3, where K3 is along the vertical axis, we compute s-sliced MIPs of V (s = 5 in our experiments, as suggested by domain experts) along the vertical axis with overlapping coverage at every t-slice interval. Consequently, we get a set of m consecutive / sliding MIPs, P = {P_1, P_2, ..., P_k, ..., P_{m−1}, P_m}, in which P_k is the MIP across the [(k − 1)t + 1]-th slice to the [(k − 1)t + s]-th slice of V. It is noted that in a 2D MIP, only the one voxel with the maximum intensity among the s voxels along the vertical axis in V is recorded, which is prone to information loss, considering that the segmentation task actually needs the information of every voxel. Consequently, we set t = 2 as a trade-off between computation cost and information completeness / denseness. We thus get m MIPs of size K1 × K2 for V, where the MIP number m is computed as:

m = ⌊(K3 − s) / t⌋ + 1. (1)

A MIP conveys denser vessel information and is also naturally suitable for 2D convolution. However, we now have m different MIPs and need to feed them to our network in the MIP stream in company with the 3D volume stream V as an input pair to our entire network. The information from the m MIPs is equally important, which means every pixel's information should be kept during learning for the later back projection. Intuitively stacking them into a K1 × K2 × m volume would make the 2D CNN (in the 2D MIP stream) essentially treat them as a 2D input of spatial dimension K1 × K2 with m different properties (feature channels), which is deficient in terms of the spatial domain size as well as the operation motivation; instead, we convert the m MIPs into a tiled MIP with a larger 2D spatial size (for m = 6, the MIPs are tiled into a 3 × 2 grid, as shown in Fig. 3 (a)).
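The sliding-MIP construction and the count of Eq. (1) can be sketched as follows. This is a minimal NumPy version with our own names; for simplicity it uses a regular stride throughout, whereas Fig. 3 (a) shifts the final window so it covers the very last slices:

```python
import numpy as np

def sliding_mips(patch, s=5, t=2):
    """Compute s-sliced MIPs along the vertical axis with stride t.

    Returns m = floor((K3 - s) / t) + 1 overlapping MIPs (Eq. 1), where
    K3 = patch.shape[2], plus the absolute source-slice index of each MIP
    pixel, kept so 2D features can later be unprojected to their voxels.
    """
    K3 = patch.shape[2]
    m = (K3 - s) // t + 1
    mips, idx = [], []
    for k in range(m):
        lo = k * t                       # 0-based; 1-based slices [(k-1)t+1, (k-1)t+s]
        block = patch[:, :, lo:lo + s]
        mips.append(block.max(axis=2))   # maximum intensity projection
        idx.append(lo + block.argmax(axis=2))
    return np.stack(mips), np.stack(idx), m

patch = np.random.rand(96, 96, 16)
mips, idx, m = sliding_mips(patch, s=5, t=2)
print(m, mips.shape)  # 6 (6, 96, 96) for K3 = 16, s = 5, t = 2
```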
In this case, the 2D convolution operates equally across the 2D composited MIP plane domain. The slice indices from which the MIP pixels are selected in the original V are also recorded, so as to effectively restore the pixel-wise information extracted from the MIP to the 3D volume space; this is used in the 2D-to-3D unprojection in the following process. The format of the 2D composited MIP (e.g., m = 6 consecutive MIPs) computed from a 3D volume patch is shown in Fig. 3 (a).

Fig. 2: The architecture of VC-Net. The major procedure includes obtaining the composited MIPs via 3D-to-2D projection, dual-stream segmentation learning for 3D volume and 2D composited MIP feature vectors, back projecting the 2D composited MIP feature vectors into the 3D volume feature space via 2D-to-3D unprojection, and building a joint convolutional embedding for learning the final vasculature mask. (The diagram depicts the 3D volume segmentation stream and the 2D composited MIP segmentation stream with their layer types — Conv3D 3×3×3, MaxPooling3D 2×2×2, Upsampling3D 2×2×2, Conv3D 1×1×1, Conv2D 3×3, MaxPooling2D 2×2, Upsampling2D 2×2, Conv2D 1×1, and concatenations — the back projection layers (2D-to-3D unprojection) guided by the MIP pixel index information, and the losses L_vox3D-2D and L_mip.)
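The conversion of the m per-patch MIPs into one larger tiled 2D composited MIP, described above, can be sketched as below. The 3 × 2 grid layout for m = 6 is our reading of Fig. 3 (a), so treat the exact arrangement as an assumption:

```python
import numpy as np

def tile_mips(mips, rows=2):
    """Tile m MIPs of size (K1, K2) into one larger 2D composited MIP.

    This gives the 2D CNN a genuinely larger spatial domain to convolve
    over, instead of treating the m MIPs as feature channels.  For m = 6
    and rows = 2, the result is a 3 x 2 grid, i.e., a (2*K1, 3*K2) image.
    """
    m, K1, K2 = mips.shape
    cols = m // rows
    assert rows * cols == m, "m must factor into the tile grid"
    # (rows, cols, K1, K2) -> (rows, K1, cols, K2) -> (rows*K1, cols*K2)
    return mips.reshape(rows, cols, K1, K2).transpose(0, 2, 1, 3).reshape(rows * K1, cols * K2)

mips = np.random.rand(6, 96, 96)
print(tile_mips(mips).shape)  # (192, 288)
```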
Once the 3D volume and 2D MIP streams have learned their segmentation features, we integrate them in a unified joint hidden feature embedding space to yield the final 3D segmentation prediction. To achieve this, we conduct several operations within our network to unproject (i.e., back project) the pixel features extracted from the composited MIP back to their corresponding 3D voxel feature space.

The final-stage hidden feature from the 2D composited MIP segmentation stream has the tiled-MIP spatial size with C channels (C = 32, as shown in Fig. 2), and is the input of the back projection layers. We first disassemble it to restore m C-channel features for the corresponding MIPs (e.g., P_1, ..., P_m, where m = 6 as illustrated in Fig. 3 (b)). Then we use the recorded index information to map the MIP pixel features back to where they were selected from V during the 2D composited MIP generation. Fig. 3 (b) shows how the feature vectors of two consecutive MIPs are disassembled from the composited MIP and how their pixel feature space (P_{m−j}, i.e., the j-th slice among the 5-sliced MIP P_m, 1 ≤ j ≤ 5) is unprojected back to the voxel feature space (S_n, i.e., the n-th slice in the input 3D patch). It is noted that the feature dimension is folded from 3D to 2D for a convenient illustration in Fig. 3 (b) (i.e., hiding the feature dimension).

For the features of overlapping slices, which are covered by multiple consecutive MIPs, we take the element-wise maximum value of the overlapping restorations across the feature channels:

F_{S_n}[i] = max(F_{P_{k_1}−j_1}[i], ..., F_{P_{k_r}−j_r}[i]), 1 ≤ i ≤ 32, (2)

where F_{S_n}[i] represents the i-th channel of the feature F at the n-th slice in the 3D patch, and P_{k_1}−j_1, ..., P_{k_r}−j_r are the restored MIP slices that map to slice n. For example, the feature of one slice is computed across the overlapping restored slices of three consecutive MIPs, as highlighted in pink in Fig. 3 (c). The whole process of the cross-MIP fusion in the feature channels of the 3D volume feature space is shown in Fig. 3 (c) in detail. After that, the unprojected 2D MIP features and the 3D volume features from the two streams are integrated together, constructing a unified high-dimensional joint convolutional embedding for predicting the final vessel segmentation.

The major learning objective of our VC-Net is to extract the sparse 3D vasculature structure from the 3D MRI volume image using a 3D segmentation network supplemented by information from multiple denser and more connected 2D MIPs. Consequently, the network loss function consists of two terms:

L = L_vox3D-2D + λ L_mip, (3)

where L_vox3D-2D is a joint 3D-2D segmentation Dice loss adopted in the 3D volume stream and defined as:

L_vox3D-2D = 1 − (2 Σ_{x∈V} p(x) g(x) + δ) / (Σ_{x∈V} p(x) + Σ_{x∈V} g(x) + δ), (4)

where p(x) and g(x) are the predicted voxel-wise vessel probability map and the ground truth binary labels within the query volume patch V, respectively, and δ is a small smoothing constant. L_mip is applied in the 2D composited MIP stream and acts as a regularization term during the learning process; it is also a Dice loss, defined (similarly to L_vox3D-2D) within the 2D composited MIP plane and supervised by the ground truth MIP vessel binary labels. λ is the constant coefficient of L_mip, set empirically for the best experimental performance.

In this work, we use two different real patient datasets to evaluate our proposed VC-Net method.
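Under our own naming (a sketch, not the paper's code), the 2D-to-3D back projection with cross-MIP max fusion of Eq. (2) and the two-term Dice objective of Eqs. (3) and (4) can be written as:

```python
import numpy as np

def unproject_and_fuse(mip_feats, idx, K3):
    """Scatter per-MIP pixel features back to their recorded source slices;
    slices covered by several MIPs take the element-wise channel maximum
    (Eq. 2).  mip_feats: (m, H, W, C) features; idx: (m, H, W) absolute
    source-slice index per MIP pixel; returns an (H, W, K3, C) volume feature."""
    m, H, W, C = mip_feats.shape
    out = np.full((H, W, K3, C), -np.inf, dtype=mip_feats.dtype)
    rows, cols = np.indices((H, W))
    for k in range(m):
        # unbuffered max-scatter: collisions across MIPs realize the fusion
        np.maximum.at(out, (rows, cols, idx[k]), mip_feats[k])
    out[np.isinf(out)] = 0.0  # voxels never selected by any MIP
    return out

def dice_loss(p, g, delta=1e-5):
    """Soft Dice loss of Eq. 4: 1 - (2*sum(p*g) + d) / (sum(p) + sum(g) + d)."""
    return 1.0 - (2.0 * (p * g).sum() + delta) / (p.sum() + g.sum() + delta)

def vcnet_loss(p_vox, g_vox, p_mip, g_mip, lam=0.5):
    """Total objective of Eq. 3, L = L_vox3D-2D + lambda * L_mip; the value
    of lambda here is illustrative, not the paper's tuned coefficient."""
    return dice_loss(p_vox, g_vox) + lam * dice_loss(p_mip, g_mip)
```

`np.maximum.at` performs an unbuffered in-place scatter, so pixels from different MIPs that land on the same voxel are fused by maximum rather than overwritten, which is exactly the cross-MIP fusion behavior Eq. (2) asks for.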
Novel MICRO-MRI Imaging and Dataset.
Some researchers have recently developed a next generation of microvascular imaging, i.e., Microvascular In-vivo Contrast Revealed Origins Magnetic Resonance Imaging (MICRO-MRI) [36, 41]. Thanks to MICRO-MRI, we became the first ever to be able to acquire such brain imaging datasets and observe the complicated micro cerebral vessels. This dataset is produced by neurologists and radiologists within our collaborative group. Data was acquired with an adapted 3D gradient echo susceptibility weighted imaging (SWI) sequence [5] collected from a 3T MR scanner. The post-contrast data were acquired during a gradual increase in dose (final concentration = 4 mg / kg). Eleven healthy volunteers were scanned in brain regions with a dual echo SWI sequence at four time points: the first was acquired pre-contrast and the remaining three were acquired post-contrast during a gradual increase in dose delivered over the time frame of 20 min; with the imaging parameters: echo time (TE)1 / TE2 / repetition time (TR) = 7.5 / 22.5 / 27 ms, bandwidth = 180 Hz / pxl, flip angle = ◦ (pre-contrast and final post-contrast data) and ◦ (first and second post-contrast data). The voxel spacing is . × . × mm with a volume size of × × voxels.

Major-level vessel data.
This protocol enables multiple image sources for producing both the MR arteriogram (MRAG) and venogram (MRVG). For the MRVG, the pre-contrast quantitative susceptibility mapping (QSM) and R2* constitute two different representations of veins. In order to obtain the pre-contrast QSM data, the original phase data was unwrapped using the 3D best path method [2]. The sophisticated harmonic artifact reduction for phase data (SHARP) method was used to estimate the background field and remove it from the unwrapped phase [35]. The truncated k-space inverse filter approach with an iterative geometric constraint (also known as iSWIM) was applied to the resultant phase to generate the QSM data [16, 39]. The QSM data was further refined by removing the strong phase gradients from the long TE phase data based on a quality phase mask. The resultant phase was used to obtain a QSM data QSM_TE2. The QSM of the short TE data, QSM_TE1, was also generated, but without using a quality map, since at a low TE the phase gradients were not that strong. Finally, the missing information in QSM_TE2 was filled in by applying an inverted quality mask to QSM_TE1. To obtain the pre-contrast R2*, the short and long TE magnitude data S(t) were fitted to the monoexponential equation S(t) = ρ e^{−t·R2*}, where ρ is the tissue intrinsic proton density.

Another MRVG was generated by subtracting the short TE magnitude data of the pre-contrast from the short TE magnitude data of the first post-contrast. The above-mentioned subtraction provides a venous-only map V_T. The QSM, R2*, and V_T maps were then normalized to values between 0 and 1, and an average of these different sources produced a high-quality MRVG referred to as MRVG_avg.

An MRAG was then calculated using a nonlinear subtraction (NLS) [47], i.e., MRAG_nls, of the long TE magnitude data S′ from the short TE magnitude data S of the pre-contrast as: MRAG_nls = S − αS′, where α is a constant with an empirically selected value. Due to the T2* effect, this subtraction also enhances the veins, but to a much smaller extent than the arteries. Nevertheless, any venous enhancement is discarded by using a mask generated from MRVG_avg. Finally, the ultimate ground truth vessel labels are obtained by integrating the enhanced angiography (i.e., arteriogram and venogram) maps [4] from the computed MRAG_nls and MRVG_avg, with a threshold-based method for the initial masks, followed by domain experts' post-manual labeling refinement using our developed cerebrovascular labeling and visualization tool. Supplemental Material and Video are included for demonstrating the interactive interface and basic functions in detail.

Fig. 3: (a) Illustration of the 3D-to-2D projection in the spatial domain for computing a 2D composited MIP from a 3D volume patch, using sliding 5-slice windows (e.g., Slice 1-5, Slice 3-7, Slice 5-9, Slice 7-11, Slice 9-13, Slice 12-16), where P_m denotes the m-th MIP of the input 3D patch, S_n the n-th slice in the input 3D patch, P_{m-j} the restored j-th slice among the 5 slices in MIP P_m, and F_x[i] the i-th dimension of the feature F of x. (b) and (c) Illustration of the detailed computations in the back projection layers for the 2D-to-3D unprojection process in the embedded feature domain, where the cross-MIP fusion takes the element-wise maximum over the features of the contributing MIPs (F_S[i] = max over the overlapping MIPs' features, 1 ≤ i ≤ 32). As illustrated in (b), consecutive MIPs with overlapping slice coverage contribute to information completeness in the 3D patch volume. A pixel location on the 5-sliced MIP 2D plane, which keeps the feature information of only one voxel out of five, can now be supplemented by the back projection of a neighboring MIP.
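For reference, the 3D-to-2D projection of Fig. 3 (a) — a sliding 5-slice maximum intensity projection over a 3D patch — can be sketched as follows. This is a minimal NumPy illustration; the function name is ours, and the stride-2 window layout with an adjusted final window is one plausible reading of the slice ranges shown in Fig. 3 (a):

```python
import numpy as np

def composited_mips(patch, window=5, stride=2):
    """Sliding maximum intensity projections (MIPs) along the slice axis.

    patch: (D, H, W) volume with D >= window; returns an (M, H, W) stack
    of 2D MIPs, e.g., slices 1-5, 3-7, ..., matching Fig. 3 (a).
    """
    d = patch.shape[0]
    starts = list(range(0, d - window + 1, stride))
    if starts[-1] != d - window:
        # adjust the last window so the final slices are still covered
        starts.append(d - window)
    return np.stack([patch[s:s + window].max(axis=0) for s in starts])
```

Because consecutive windows overlap, each slice contributes to more than one MIP, which is exactly what the back projection layers in Fig. 3 (b) exploit to recover per-slice features.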
Micro-level vessel data. SWI images were generated by homodyne high-pass filtering (filter size = × ) the phase images to generate a phase mask, which was multiplied with the original magnitude images four times, for all time points [17]. All the original magnitude and corrected phase data were then registered to the pre-contrast data. The short TE (7.5 ms) magnitude data of the pre-contrast and the first post-contrast time points were averaged. The long TE (22.5 ms) SWI data from the last post-contrast time point (4 mg / kg) was subtracted from this averaged magnitude data to enhance the vessels. The vessels were further enhanced on the resultant subtracted image by applying the vesselness algorithm [12] to obtain the micro-level vessel map. The micro-level vessels from this resultant vessel map were extracted using an adaptive threshold-based region growing method (ATRG) [20], i.e., SWI_ATRG, as the initial masks, followed by domain experts' manual inspection of the extracted vessels for quality control.
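The monoexponential model S(t) = ρ e^{−t·R2*} used earlier for the pre-contrast R2* map admits a closed-form solution when only two echoes are available. A minimal sketch follows; the function name is ours, and the default echo times of 7.5 / 22.5 ms are borrowed from the short / long TE values mentioned in this section:

```python
import numpy as np

def fit_r2star_two_echo(s1, s2, te1=7.5, te2=22.5):
    """Closed-form two-echo fit of S(t) = rho * exp(-t * R2*).

    s1, s2: magnitude data at echo times te1 < te2 (in ms);
    returns the R2* map (1/ms) and the intrinsic proton density rho.
    """
    eps = 1e-8  # guard against log(0) and division by zero
    r2star = np.log((s1 + eps) / (s2 + eps)) / (te2 - te1)
    rho = s1 * np.exp(te1 * r2star)  # back-solve the proton density
    return r2star, rho
```

With more than two echoes, the same model is typically fitted by linear regression of log S(t) against t, but the two-echo form above already conveys the idea.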
Public MRA Dataset. In order to compare with the existing methods, we use the public TubeTK Toolkit MRA dataset from the University of North Carolina at Chapel Hill [1], acquired by a 3T MR system. There are 42 patient cases in the whole dataset, which have manually-labeled vessel segmentation masks. The voxel spacing of the MRA images is . × . × . mm with a volume size of × × voxels.

RESULTS
For both the MRA TubeTK and MICRO-MRI datasets (the different modalities of input image examples are provided in Supplemental Material), we first apply the MR-based skull-stripping method [19] to extract the pure brain from each image. As we mentioned in Sec. 3.1, our VC-Net network is designed for patch-wise training, and the 3D training patches with imbalanced dimensions are randomly extracted with overlapping, focusing on the brain area in the whole 3D MRA / MICRO-MRI, e.g., 80 patches for each TubeTK case and 440 patches for each MICRO-MRI major-level vessel case. The random training / validation / testing case split is 33 / 3 / 6 and 6 / 2 / 3 for the TubeTK dataset and the MICRO-MRI major-level vessel case, respectively. All the numerical evaluations are reported in terms of the whole brain volume image patched with no overlapping.

Our VC-Net adopts the Adam optimizer with 0.0001 as the initial learning rate, 0.5 as the learning decay factor, and 10 epochs as the learning patience across all datasets. In our implementation, we restrict our batch size to 4 due to hardware limits. No batch normalization is adopted in either stream in VC-Net; we use ReLU (Rectified Linear Unit) activation for both 2D and 3D convolutional layers in the corresponding streams and sigmoid activation for the final vessel probability output from both 2D and 3D streams. The network is implemented in the TensorFlow framework, and the total training time is around 10 hours on two NVIDIA GeForce GTX 1080 GPUs with 8 GB GDDR5X memory. The inference time is given in the following subsection. Data and source code of this work will be made available.

The performance of our VC-Net and all methods in comparison is numerically evaluated by the following three quantitative metrics, which are defined from the classifier confusion matrix from different aspects:
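The learning schedule above (initial learning rate 1e-4, decay factor 0.5, patience 10 epochs) is a plateau-based learning-rate reduction. A minimal framework-independent sketch of that logic follows; the class name is ours, not part of the released code:

```python
class PlateauLRSchedule:
    """Halve the learning rate when the validation loss has not improved
    for `patience` epochs (factor 0.5, patience 10, as described above)."""

    def __init__(self, lr=1e-4, factor=0.5, patience=10):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best = float("inf")  # best validation loss seen so far
        self.wait = 0             # epochs without improvement

    def on_epoch_end(self, val_loss):
        if val_loss < self.best:
            self.best, self.wait = val_loss, 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr *= self.factor  # decay the learning rate
                self.wait = 0
        return self.lr
```

In TensorFlow / Keras, the equivalent behavior is available off the shelf via the `ReduceLROnPlateau` callback with `factor=0.5` and `patience=10`.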
Dice Similarity (Dice), 2TP/(2TP + FP + FN), (the same as the F1-score under most circumstances) generally measures the intersection over union between prediction and ground truth. It involves true positive (TP), false positive (FP), and false negative (FN), so as to be the most comprehensive indicator to evaluate the sparse vessel segmentation in a large portion of background, i.e., true negative (TN).

Precision, TP/(TP + FP), measures the model's ability to rule out the noise contributions and obtain the correct vessel voxels.

False Positive Rate (FPR), FP/(FP + TN), examines the model's ability to distinguish the real background and noise from vessels, which is crucial for the clinical purpose.

The best results in the tables are shown in bold font. Here we do not include the metric of Accuracy, due to its extremely high value for all methods. The reason is that it involves the dominant portion of background (true negative) together with a highly sparse target (e.g., the segmented vessels in our task) in computation and consequently loses its effectiveness for segmentation evaluation.

We first compare our VC-Net performance on the TubeTK dataset with four state-of-the-art deep learning based methods (i.e., 3D U-Net [7], 2D U-Net [33], DeepVesselNet [40], and Uception [34]) and one classical parametric intensity-based method (i.e., the vesselness algorithm [8, 12]) in 3D vessel segmentation. All deep learning methods in comparison are trained until convergence by using the same dataset split, or we use the results reported in their original publication (such as Uception). For 2D U-Net, we train it with ×
2D patches, whose amount is over 10 times the 3D patch amount extracted for the 3D CNN based methods in comparison, with on-the-fly data augmentation for a fair data acquisition. For DeepVesselNet, we have tried different combinations of their data pre-processing process and chosen the image intensity clipping for obtaining an optimal performance on the TubeTK dataset.

The quantitative performance comparison of these methods on the TubeTK dataset is shown in Tab. 1. '−' means 'not applicable' due to lack of their implementations or results. Here we also provide the per-volume inference time and the parameter number to evaluate the model efficiency besides the segmentation performance. From Tab. 1, we can see that our VC-Net has overall the best segmentation performance among all the methods on the TubeTK dataset. With the 2D composited MIP feature integration, our network performs better than a pure 3D U-Net [7] over the three different metrics on segmentation results. The qualitative comparison of MIP-wise (e.g., 5-sliced) segmentation results and 3D global vessel segmentation results between our VC-Net and 3D U-Net (one of the most robust state-of-the-art deep learning based methods for biomedical image segmentation) is shown in Fig. 4 (a). With the 2D composited MIP complementary information, the final vessel segmentation shows better connectivity and better small vessel capturing, as marked in red circles (3D global vessel segmentation visualization) and green circles (2D MIP vessel segmentation visualization). Besides the segmentation performance gain, the increase of time and space complexities in VC-Net is not high compared with a standalone 3D U-Net, as shown in Tab. 1, since only a half 2D U-Net (i.e., 7.8 million parameters) is involved in the 2D MIP stream. On the other hand, since the computational complexity of 3D convolution operations apparently outweighs that of 2D convolution operations, the 3D stream still dominates the computational complexity of the entire VC-Net.
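As a concrete reference for the scores reported in Tab. 1 and the tables that follow, the three confusion-matrix metrics can be computed directly from binary masks. A minimal NumPy sketch (the function name is ours):

```python
import numpy as np

def vessel_metrics(pred, gt):
    """Dice, Precision, and FPR from binary prediction / ground-truth masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)    # true positives
    fp = np.sum(pred & ~gt)   # false positives
    fn = np.sum(~pred & gt)   # false negatives
    tn = np.sum(~pred & ~gt)  # true negatives (dominant for sparse vessels)
    dice = 2 * tp / (2 * tp + fp + fn)
    precision = tp / (tp + fp)
    fpr = fp / (fp + tn)
    return dice, precision, fpr
```

Note that the huge TN term appears only in the FPR denominator, which is why FPR values in the tables are small even when the segmentations differ visibly; Accuracy, which also absorbs TN, is omitted for the same reason.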
Another observation is that 3D U-Net greatly outperforms 2D U-Net [33] even though the latter contains many more feature embedding channels, since the former method is able to capture the cross-slice continuity; that is why 3D CNNs should be involved in such sparse 3D object segmentation with complex topology. Moreover, the full 2D U-Net implies a much larger amount of 2D convolution operations and model parameters, which leads to unsatisfactory model efficiency. DeepVesselNet [40] fails to yield as good a performance as reported on their own dataset, which could result from the lack of the pre-training procedure, i.e., a relatively complicated data pre-processing, and the instability of their loss function, which is severely sensitive to training perturbation. In addition, their light-weight network only consists of five convolutional layers for high efficiency, and its simplicity may undermine its cross-dataset robustness. The qualitative comparison with DeepVesselNet is given in Fig. 4 (b), from which one can see that the DeepVesselNet result is much noisier and has more severe connectivity issues. Our method also outperforms the best Uception result reported in [34] on the TubeTK dataset with even less data pre-processing, which implies that empirical neural network modification to increase the model complexity does not guarantee a better performance all the time. To be comprehensive, we apply the vesselness algorithm [8, 12] as a traditional benchmark method for comparison based on the available data modality in the TubeTK dataset, which is a widely-used approach to segment cylindrical vessel structures in the medical field. As shown in Tab. 1, all deep learning methods greatly outperform the classical vesselness method on the TubeTK dataset with much higher efficiency, which benefits from the essence of deep learning techniques: mostly end-to-end, more adaptive non-linearity, and controllable ambiguity for feature extraction and integration, compared with traditional complex manual parameter-driven algorithms, to boost the overall robustness and accuracy for real-world image learning tasks. More qualitative comparisons with 3D U-Net and DeepVesselNet are provided in Supplemental Material.

Table 1: Quantitative performance evaluation of different methods on TubeTK dataset.
Methods / Metrics   Dice (%) ↑   Precision (%) ↑   FPR (%) ↓   Time (s) ↓   #Param ↓
Ours                −            −                 −           −
DeepVesselNet       64.12        63.75             0.1465
2D U-Net            65.10        70.05             0.1041      16.7         31 M
Vesselness          37.71        47.69             0.1393      186.6        −

As mentioned in Sec. 2, most of the recently related work on brain vasculature segmentation tasks is limited to major arteries (in Sec. 4.1), since currently most of the available brain MRI image datasets with adequate amount and consistent quality are MRAGs. However, with assistance from the neurologists and radiologists under our collaboration, we can now extend VC-Net from general artery segmentation to the vasculature segmentation of major artery and major vein, separately. More inspiringly, we also demonstrate that our VC-Net is capable of extracting the micro vessels in the complicated real patient MICRO-MRIs. It is noted that the segmentation of brain vessels becomes more challenging at the micro-level than the major-level, and more challenging in veins than in arteries. The following experiments show that our method has bigger improvements on the more challenging cases (i.e., micro-level vessel and major-level vein segmentations) compared with other methods.
The MRAGs in the clinic MICRO-MRI datasets under our collaboration, as mentioned in Sec. 3.2, focus on the midbrain area, where the major-level vessels are relatively denser and more observable. Currently our collaborative domain experts apply the state-of-the-art model-driven NLS method [47] (i.e., MRAG_nls) followed by case-wise threshold selection to extract the clean midbrain vessels, which requires different data modalities and tedious manual parameter-tuning as stated in Sec. 3.2. However, from Fig. 5 (a) we can see that its segmentation result still fails to be free from location-dependent interference, such as the superior sagittal sinus (red dotted circles in the 3D visualization and green dotted circles in the MIP visualization) and some random scattered voxel noise (red solid circles). Aiming to improve the segmentation performance with less manual parameter-tuning and less modality requirement, and to provide a much more efficient method for major artery extraction that is well applicable for future patient case collection, we train our VC-Net with only the TE1 pre-contrast SWI (a single-modal MRAG) data as input. From Tab. 2 we can see that our quantitative evaluation results outperform the MRAG_nls method on all metrics, even though the latter integrates and enhances the artery signal from several different data modalities. Here we also include our numerical comparison with 3D U-Net (under the same experiment setting), the most competitive method on the TubeTK dataset, to show our network's cross-dataset robustness and superiority. One can also observe that the numerical difference of the performance on the MICRO-MRI major-level artery dataset is not as obvious as on the major-level vein and micro-level vessel datasets shown in the following two subsections, which may result from the fact that the MRAGs are relatively clearer in terms of the image dose effect and the noise type. More qualitative comparisons with the MRAG_nls method are provided in Supplemental Material.
Fig. 4: Some qualitative comparison results from the TubeTK dataset: The 3D global vessel segmentations are shown from the superior direction. The MIP segmentations are visualized by 5-sliced MRA images, and the corresponding vessel masks in MIPs are marked in semi-transparent red. The highlighted comparison areas are marked in circles. Yellow-circled areas are some minor mistakes in the ground truth (discussed in Sec. 4.2.4).

Table 2: Quantitative performance evaluation of different methods on major-level artery segmentation.
Methods / Metrics   Dice (%) ↑   Precision (%) ↑   FPR (%) ↓
Ours
3D U-Net            82.64        83.52             0.0342
MRAG_nls
Currently, the MRVGs are not readily and directly available from the scanner, so there is no raw MRI image that can produce pure veins. There are different ways that people have used to derive it, such as the SWI, QSM, or R2* data, where the veins are highlighted. However, they all have background tissues as well as noise associated with them. In this work, the MRVGs from the MICRO-MRI dataset are ultimately acquired through the MRVG_avg method by enhancing vein signals from different data resources as mentioned in Sec. 3.2. Our collaborative domain experts compute the vein labels by post-manual case-wise threshold adjustment on the MRVGs. However, we can see from the major-level vein case in Fig. 5 (b) that the vein labels still fail to be free from artery artifacts, as marked in red / green dotted circles. In addition, the MRVGs overall have a more challenging noise type (e.g., very strong artery artifacts) than single-modal MRAGs due to the modality formulation; therefore, the corresponding intensity-based vein labels tend to have more fuzzy edges. However, our VC-Net is able to effectively overcome the aforementioned difficulty, as shown in Fig. 5 (b). Tab. 3 shows the numerical comparison among ours, MRVG_avg, and 3D U-Net under the same experiment setting. We can see that our numerical results overall outperform the other two methods. The generally lower numerical performance compared to artery segmentation (in the previous subsection) may result from the more challenging input data type and the sinus region (i.e., the large and thick vein-like area at the bottom of the 3D global segmentation). More qualitative comparisons with the MRVG_avg method are provided in Supplemental Material.

Table 3: Quantitative performance evaluation of different methods on major-level vein segmentation.
Methods / Metrics   Dice (%) ↑   Precision (%) ↑   FPR (%) ↓
Ours
3D U-Net            76.02        80.40             0.0946
MRVG_avg
As mentioned in the introduction, the micro-cerebrovasculature turns out to be a good physical indicator of many neurological disorders and vascular diseases; thus it is extremely important, and a breakthrough for MICRO MRAV diagnosis of vascular disease, to trace small vessels and analyze their topology, morphology, density, and distribution with direct visual inspection of microvascular abnormalities in-vivo. Besides the major-level vessel segmentation, our VC-Net shows great capability to track the micro vessels as well. In this experiment, it is noted that we have an input data modality format which is different from those in the previous experiments, as shown in the first column in Fig. 6 (c), i.e., the minimum intensity projection (MinIP) images. The different modalities of input image slice examples in the experiments are provided in Supplemental Material. All vessels, including major and micro ones, appear to be dark (very low voxel intensity) and show no contrast to the dark background, which may cause confusion to our network in the 2D composited MIP segmentation stream even if we accordingly switch to compute the MinIP instead. In order to keep the framework consistency and take advantage of the pre-trained network in the previous subsections, we invert the voxel intensity within the brain foreground area in the whole 3D SWI image and then extract 1320 random patches from each training image (considering the vessels are much denser in a large-sized 3D volume, more patches per case can make up for limited datasets, e.g., two training cases). By fine-tuning our VC-Net pre-trained on major-level MRI images with these patches, our network is capable of capturing the continuous micro vessels clearly, as shown in Fig. 6.

Fig. 5: Some qualitative comparison results from the MICRO-MRI major-level vessel dataset: The 3D global vessel segmentations are shown from the superior direction. The MIP segmentations are visualized by 5-sliced MICRO-MRI images, and the corresponding vessel masks in MIPs are marked in semi-transparent red. The highlighted comparison areas are marked in circles. The 3D MRAG / MRVG images from the MICRO-MRI dataset only focus on the midbrain area and thus have fewer vessels compared with the TubeTK dataset.

Currently, SWI is the only data modality available to capture the micro-level vessels, and it includes a large number of major-level vessels as well. Consequently, it is quite challenging to provide a rigorous numerical evaluation on pure micro-level vessels. Alternatively, Tab. 4 shows the numerical evaluation based on the whole SWI image for reference, in which our method quantitatively outperforms 3D U-Net (also fine-tuned from the weights pre-trained on the same major-level MRI images) and the SWI_ATRG method on the Dice, Precision, and FPR metrics. The SWI_ATRG method is a state-of-the-art algorithm that our collaborative domain experts are currently using, as described in Sec. 3.2.

Fig. 6 (a) and (b) show our whole brain segmentation result (in gold) accompanied by non-overlapping midbrain subarea segmentation results (in red) and their corresponding ground truth (in blue). Fig. 6 (c) and (d) visualize the qualitative performance of pair-wise comparisons. From Fig. 6 (d), we can see that the result from the SWI_ATRG method suffers severe voxel intensity noise (circled in yellow) and unexpectedly thicker vessels (circled in blue) due to the bold intensity threshold chosen to capture as many micro vessels as possible; however, it still lacks a satisfactory ability to detect micro vessels, as shown in the corresponding zoomed-in green patch error maps (white: true positive, red: false positive, blue: false negative, black: true negative). In addition, the SWI_ATRG method requires several data modalities which are acquired from different time points, as mentioned in Sec. 3.2; consequently, the corresponding computed vessel mask also has non-negligible registration errors (circled in white). Even though 3D U-Net can alleviate most of the issues that the SWI_ATRG method is faced with, it is still insufficient to track the super micro vessels without the 2D MIP complementary information, as shown in the first two zoomed-in patches (circled in yellow) and their corresponding error maps in Fig. 6 (c). Similar to the SWI_ATRG method, 3D U-Net also performs more boldly on covering major-level vessels (less blue on the error maps), as shown in the third zoomed-in patch in Fig. 6 (c). However, 3D U-Net incurs more noise (more red on the error maps) and thus has a worse precision.

Table 4: Quantitative performance evaluation of different methods on micro-level vessel segmentation.
Methods / Metrics   Dice (%) ↑   Precision (%) ↑   FPR (%) ↓
Ours
3D U-Net            74.08        72.70             0.7989
SWI_ATRG
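The brain-foreground intensity inversion used above to make the dark SWI vessels bright (so the MIP-based 2D stream remains applicable) can be sketched as follows. This is a simplified illustration; the function name and the choice of reflecting intensities about the foreground range are our assumptions:

```python
import numpy as np

def invert_foreground(volume, brain_mask):
    """Invert voxel intensities inside the brain foreground so that
    dark vessels (low intensity) become bright (high intensity)."""
    out = volume.copy().astype(np.float32)
    fg = volume[brain_mask]
    # reflect each foreground intensity about the foreground range,
    # leaving voxels outside the brain mask untouched
    out[brain_mask] = fg.max() + fg.min() - fg
    return out
```

After this inversion, the same sliding MIP computation used for the major-level MRAG data can be reused unchanged on the micro-level SWI volumes.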
It is interesting to note that in the TubeTK dataset even the ground truth vessel mask does not cover certain vessel continuity, which can be clearly traced on MIPs (such as some yellow circles in the ground truth MIPs in Fig. 4), since it is very difficult to label all the vessels in the corresponding MRA slices in a single slice-by-slice manipulation without referring to the MIP and 3D global visualization. However, based on the ground truth MIP labeling slices in the MICRO-MRI dataset from our experiments and collaborative evaluations, we can see that such an issue is greatly alleviated, since our cerebrovascular labeling and visualization tool is applied to generate (refine) these ground truth vessel labels. In the future, we will further refine the ground truth vessel labels of the TubeTK dataset by using our developed labeling tool (such as the examples shown in Supplemental Video) under the domain experts' guidance for better public sharing and use.

Last but not least, to our knowledge, we are the first to investigate and apply 3D brain vasculature segmentation to different vessel types and levels, especially the micro-level vessel segmentation; also, it is the first time that the whole brain vessels with different types / levels can be visualized in-vivo. Therefore, we have also designed a visualization tool for jointly showing different vasculature systems. Our tool enables the visualization of any combination of different vessel systems in user-defined color and lighting, and supports all essential auxiliary interactions such as rotation, translation, scaling, zooming in / out, clipping, etc., for better examination. Fig. 7 (a) shows the joint visualization of our prediction results of major-level midbrain arteries and veins; and Fig. 7 (b) shows all three vasculature systems aligned together, i.e., major-level midbrain arteries and veins, and micro-level vessels, from the MICRO-MRI dataset. Supplemental Video is included for demonstrating the dynamic visualization and interaction in detail.

Fig. 6: (a) Whole brain micro vessel segmentation result of our method from the superior direction. (b) Midbrain non-overlapping subarea detail visualization (in red) with the ground truth in comparison (in blue). (c) Qualitative comparison results between our method and 3D U-Net on the MICRO-MRI dataset, shown as three patch details extracted from three different 5-sliced MinIPs and their corresponding error maps on micro-level and major-level vessels. (d) Qualitative comparison results between our method and the SWI_ATRG method on the MICRO-MRI dataset, shown as two 5-sliced MinIP segmentations and their corresponding error maps on micro-level vessels. The highlighted comparison areas are marked in circles.

Fig. 7: Joint 3D visualization of our segmentation results on the MICRO-MRI dataset in two different testing cases: (a) Whole midbrain major-level arteries (in red) and veins (in blue). (b) Major-level arteries (in red), major-level veins (in blue), and micro-level vessels (in pink) from slice No. 20 to 40 within the midbrain area. Some large pink vessels are also major-level ones which are absent from the major-level MRAGs and MRVGs due to the different image acquisitions.
CONCLUSION
In this work, we have proposed VC-Net, a deep neural network to extract and visualize high-fidelity 3D cerebrovascular structure from highly sparse and noisy images. VC-Net has three major components, i.e., 3D and 2D dual-domain segmentation streams, 3D-to-2D projection for the two-stream design, and 2D-to-3D unprojection for the joint embedding operations. By unprojecting the learned multislice composited 2D MIP feature vectors into the 3D volume embedding space, the proposed framework can strengthen the sparse 3D vascular representation by better capturing the small / micro vessels as well as improving the vessel connectivity, which outperforms the state-of-the-art classical and deep learning based methods. In medical practice, this work can be used as the key function for real-time in-vivo segmentation and visualization of sparse and complicated 3D microvascular structure to improve MICRO MRAV diagnosis of vascular disease.

In the future, we will continue to explore research problems related to volume rendering supported 3D exploration and analysis to leverage both the 2D findings and the 3D knowledge and analytics by deep neural networks. We will extend the current MIP-based volume rendering (i.e., a special case of volume rendering) into more general volume rendering scenarios, such as X-ray projections, full RGB composition, multi-view MIPs, and flow modeling concepts.

ACKNOWLEDGMENTS
We would like to thank the reviewers for their valuable comments. We are grateful to Yongsheng Chen from Neurology for the early discussion of this work, Pavan K. Jella from Radiology for preparing and collecting the clinical datasets, and Michelle Hua from Cranbrook Schools for pre-processing the datasets and proofreading the paper. This work was partially supported by the NSF under Grant Numbers IIS-1816511, CNS-1647200, OAC-1657364, OAC-1845962, OAC-1910469, the Wayne State University Subaward 4207299A of CNS-1821962, NIH 1R56AG060822-01A1, NIH 1R44HL145826-01A1, ZJNSF LZ16F020002, and NSFC 61972353.
REFERENCES
[1] TubeTK MRA dataset. https://public.kitware.com/Wiki/TubeTK/Data.
[2] H. Abdul-Rahman, M. Gdeisat, D. Burton, M. Lalor, F. Lilley, and C. Moore. Fast and robust three-dimensional best path phase unwrapping algorithm. Applied Optics, 46(26):6623–6635, 2007.
[3] M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst. Geometric deep learning: going beyond Euclidean data. IEEE Signal Processing Magazine, 34(4):18–42, 2017.
[4] S. Buch, Y. Wang, M.-G. Park, P. Jella, J. Hu, Y. Chen, K. Shah, Y. Ge, and E. Haacke. Subvoxel vascular imaging of the midbrain using USPIO-enhanced MRI. NeuroImage, p. 117106, 2020.
[5] Y. Chen, S. Liu, S. Buch, J. Hu, Y. Kang, and E. Haacke. An interleaved sequence for simultaneous magnetic resonance angiography (MRA), susceptibility weighted imaging (SWI) and quantitative susceptibility mapping (QSM). Magnetic Resonance Imaging, 47:1–6, 2018.
[6] A. Chung and J. Noble. Statistical 3D vessel segmentation using a Rician distribution. In Proceedings of International Conference on Medical Image Computing and Computer Assisted Intervention, pp. 82–89, 1999.
[7] Ö. Çiçek, A. Abdulkadir, S. Lienkamp, T. Brox, and O. Ronneberger. 3D U-Net: learning dense volumetric segmentation from sparse annotation. In Proceedings of International Conference on Medical Image Computing and Computer Assisted Intervention, pp. 424–432, 2016.
[8] M. Descoteaux, D. Collins, and K. Siddiqi. A geometric flow for segmenting vasculature in proton-density weighted MRI. Medical Image Analysis, 12(4):497–513, 2008.
[9] H. Fan, H. Su, and L. Guibas. A point set generation network for 3D object reconstruction from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 605–613, 2017.
[10] C. Florin, N. Paragios, and J. Williams. Globally optimal active contours, sequential Monte Carlo and on-line learning for vessel segmentation. In Proceedings of European Conference on Computer Vision, pp. 476–489, 2006.
[11] N. Forkert, A. Schmidt-Richberg, J. Fiehler, T. Illies, D. Möller, D. Säring, H. Handels, and J. Ehrhardt. 3D cerebrovascular segmentation combining fuzzy vessel enhancement and level-sets with anisotropic energy weights. Magnetic Resonance Imaging, 31(2):262–271, 2013.
[12] A. Frangi, W. Niessen, K. Vincken, and M. Viergever. Multiscale vessel enhancement filtering. In Proceedings of International Conference on Medical Image Computing and Computer Assisted Intervention, pp. 130–137, 1998.
[13] H. Fu, Y. Xu, S. Lin, D. Wong, and J. Liu. DeepVessel: Retinal vessel segmentation via deep learning and conditional random field. In Proceedings of International Conference on Medical Image Computing and Computer Assisted Intervention, pp. 132–139, 2016.
[14] P. Govyadinov, T. Womack, J. Eriksen, G. Chen, and D. Mayerich. Robust tracing and visualization of heterogeneous microvascular networks. IEEE Transactions on Visualization and Computer Graphics, 25(4):1760–1773, 2018.
[15] P. Govyadinov, T. Womack, J. Eriksen, D. Mayerich, and G. Chen. Graph-assisted visualization of microvascular networks. In Proceedings of IEEE Visualization Conference, pp. 1–5, 2019.
[16] E. Haacke, J. Tang, J. Neelavalli, and Y. Cheng. Susceptibility mapping as a means to visualize veins and quantify oxygen saturation. Journal of Magnetic Resonance Imaging, 32(3):663–676, 2010.
[17] E. Haacke, Y. Xu, Y.-C. Cheng, and J. Reichenbach. Susceptibility weighted imaging (SWI). Magnetic Resonance in Medicine, 52(3):612–618, 2004.
[18] J. Hua, J. Hu, and Z. Zhong. Spectral Geometry of Shapes: Principles and Applications. Springer, 2019.
[19] M. Jenkinson, M. Pechaud, and S. Smith. BET2: MR-based estimation of brain, skull and scalp surfaces. 2005.
[20] J. Jiang, M. Dong, and E. Haacke. ARGDYP: an adaptive region growing and dynamic programming algorithm for stenosis detection in MRI. In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. ii–465, 2005.
[21] H. Jin, Y. Lian, and J. Hua. Learning facial expressions with 3D mesh convolutional neural network. ACM Transactions on Intelligent Systems and Technology, 10(1):1–22, 2018.
[22] T. Kitrungrotsakul, X.-H. Han, Y. Iwamoto, L. Lin, A. Foruzan, W. Xiong, and Y.-W. Chen. VesselNet: A deep convolutional neural network with multi pathways for robust hepatic vessel segmentation. Computerized Medical Imaging and Graphics, 75:74–83, 2019.
[23] A. Komarichev, Z. Zhong, and J. Hua. A-CNN: Annularly convolutional neural networks on point clouds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7421–7430, 2019.
[24] Q. Li, B. Feng, L. Xie, P. Liang, H. Zhang, and T. Wang. A cross-modality learning approach for vessel segmentation in retinal images. IEEE Transactions on Medical Imaging, 35(1):109–118, 2015.
[25] W. Liao, K. Rohr, and S. Wörz. Globally optimal curvature-regularized fast marching for vessel segmentation. In Proceedings of International Conference on Medical Image Computing and Computer Assisted Intervention, pp. 550–557, 2013.
[26] P. Liskowski and K. Krawiec. Segmenting retinal blood vessels with deep neural networks. IEEE Transactions on Medical Imaging, 35(11):2369–2380, 2016.
[27] M. Martínez-Pérez, A. Hughes, A. Stanton, S. Thom, A. Bharath, and K. Parker. Retinal blood vessel segmentation by means of scale-space analysis and region growing. In Proceedings of International Conference on Medical Image Computing and Computer Assisted Intervention, pp. 90–97, 1999.
[28] J. Masci, D. Boscaini, M. Bronstein, and P. Vandergheynst. Geodesic convolutional neural networks on Riemannian manifolds. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 37–45, 2015.
[29] J. Mo and L. Zhang. Multi-level deep supervised networks for retinal vessel segmentation. International Journal of Computer Assisted Radiology and Surgery, 12(12):2181–2193, 2017.
[30] D. Nain, A. Yezzi, and G. Turk. Vessel segmentation using a shape driven flow. In Proceedings of International Conference on Medical Image Computing and Computer Assisted Intervention, pp. 51–59, 2004.
[31] S. Napel, M. Marks, G. Rubin, M. Dake, C. McDonnell, S. Song, D. Enzmann, and R. Jeffrey Jr. CT angiography with spiral CT and maximum intensity projection. Radiology, 185(2):607–610, 1992.
[32] C. Qi, H. Su, K. Mo, and L. Guibas. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652–660, 2017.
[33] O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of International Conference on Medical Image Computing and Computer Assisted Intervention, pp. 234–241, 2015.
[34] P. Sanchesa, C. Meyer, V. Vigon, and B. Naegel. Cerebrovascular network segmentation of MRA images with deep learning. In Proceedings of IEEE International Symposium on Biomedical Imaging, pp. 768–771, 2019.
[35] F. Schweser, A. Deistung, B. Lehr, and J. Reichenbach. Quantitative imaging of intrinsic magnetic tissue properties using MRI signal phase: an approach to in vivo brain iron metabolism? Neuroimage, 54(4):2789–2807, 2011.
[36] Y. Shen, J. Hu, K. Eteer, Y. Chen, S. Buch, H. Alhourani, K. Shah, Q. Jiang, Y. Ge, and E. Haacke. Detecting sub-voxel microvasculature with USPIO-enhanced susceptibility-weighted MRI at 7 T.
MagneticResonance Imaging , 67:90–100, 2020.[37] S. Shin, S. Lee, I. Yun, and K. Lee. Deep vessel segmentation by learninggraphical connectivity.
Medical Image Analysis , 58:101556, 2019.[38] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi. Inception-v4, inception-resnet and the impact of residual connections on learning. In
Proceedingsof The AAAI Conference on Artificial Intelligence , 2017.[39] J. Tang, S. Liu, J. Neelavalli, Y. Cheng, S. Buch, and E. Haacke. Im-proving susceptibility mapping using a threshold-based k-space/image do-main iterative reconstruction approach.
Magnetic Resonance in Medicine ,69(5):1396–1407, 2013.[40] G. Tetteh, V. Efremov, N. Forkert, M. Schneider, J. Kirschke, B. Weber,C. Zimmer, M. Piraud, and B. Menze. DeepVesselNet: Vessel segmenta-tion, centerline prediction, and bifurcation detection in 3-D angiographicvolumes. arXiv preprint arXiv:1803.09340 , 2018.[41] H. Wang, Q. Jiang, Y. Shen, L. Zhang, E. Haacke, Y. Ge, S. Qi, andJ. Hu. The capability of detecting small vessels beyond the conventionalMRI sensitivity using iron-based contrast agent enhanced susceptibilityweighted imaging.
NMR in Biomedicine , 2020.[42] N. Wang, Y. Zhang, Z. Li, Y. Fu, W. Liu, and Y.-G. Jiang. Pixel2Mesh:Generating 3D mesh models from single RGB images. In
Proceedings ofthe European Conference on Computer Vision , pp. 52–67, 2018.[43] S. Wang, B. Peplinski, L. Lu, W. Zhang, J. Liu, Z. Wei, and R. Summers.Sequential Monte Carlo tracking for marginal artery segmentation on CTngiography by multiple cue fusion. In
Proceedings of International Con-ference on Medical Image Computing and Computer Assisted Intervention ,pp. 518–525, 2013.[44] Y. Wang, Z. Zhong, and J. Hua. DeepOrganNet: On-the-fly reconstructionand visualization of 3D/4D lung models from single-view projectionsby deep deformation network.
IEEE Transactions on Visualization andComputer Graphics , 26(1):960–970, 2019.[45] D. Wilson and J. Noble. Segmentation of cerebral vessels and aneurysmsfrom MR angiography data. In
Proceedings of Biennial InternationalConference on Information Processing in Medical Imaging , pp. 423–428,1997.[46] H. Xu, M. Dong, and Z. Zhong. Directionally convolutional networksfor 3D shape segmentation. In
Proceedings of the IEEE InternationalConference on Computer Vision , pp. 2698–2707, 2017.[47] Y. Ye, J. Hu, D. Wu, and E. Haacke. Noncontrast-enhanced magnetic reso-nance angiography and venography imaging with enhanced angiography.
Journal of Magnetic Resonance Imaging , 38(6):1539–1548, 2013.
SUPPLEMENTAL MATERIAL

S1 SUPPLEMENTAL FIGURES
Supplemental figures are included to demonstrate additional qualitative results from the TubeTK and MICRO-MRI datasets in Fig. S1, and the different modalities of input image examples from the TubeTK MRA and MICRO-MRI datasets in our VC-Net method in Fig. S2.
S2 ADDITIONAL QUANTITATIVE PERFORMANCE EVALUATION
In order to further demonstrate the effectiveness of our VC-Net (especially the 3D-to-2D projection in the dual stream and the 2D-to-3D unprojection for joint embedding in our proposed architecture), Tab. S1 shows the numerical analyses of some simple combinations of the final results from 3D U-Net and 2D U-Net through average and max operations on the probabilities.

Table S1: Quantitative performance evaluation between different combinations of 3D U-Net and 2D U-Net and our method on the TubeTK dataset.
Methods / Metrics    Dice (%) ↑
2D U-Net             65.10
3D U-Net             71.01
Average Fusion       65.15
Max Fusion           69.41
Ours
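The average and max fusion baselines in Tab. S1 simply combine the per-voxel vessel probabilities predicted by the two networks before thresholding. A minimal sketch of these two baselines (with tiny hypothetical probability arrays, not the paper's actual pipeline) is:

```python
import numpy as np

def average_fusion(prob_3d, prob_2d):
    """Average the per-voxel vessel probabilities of the two networks."""
    return (prob_3d + prob_2d) / 2.0

def max_fusion(prob_3d, prob_2d):
    """Take the per-voxel maximum of the two probability volumes."""
    return np.maximum(prob_3d, prob_2d)

# Hypothetical example: per-voxel probabilities from the two networks.
p3 = np.array([0.9, 0.2, 0.6])   # 3D U-Net output
p2 = np.array([0.5, 0.4, 0.1])   # 2D U-Net output

avg = average_fusion(p3, p2)      # [0.7, 0.3, 0.35]
mx = max_fusion(p3, p2)           # [0.9, 0.4, 0.6]
mask = (avg >= 0.5)               # thresholded binary segmentation
```

Note how average fusion can suppress a voxel that only one network is confident about (the third voxel above), while max fusion keeps it; neither operation exploits any joint feature learning, which is the point of the comparison in Tab. S1.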
From Tab. S1, we can see that our VC-Net overall outperforms both combination methods applied to the final results of 3D U-Net and 2D U-Net. As shown in Sec. 4.1 of the paper, 2D U-Net performs much worse than a standalone 3D U-Net on each metric. Unlike the 2D composited MIP stream in VC-Net, 2D U-Net itself does not involve any complementary or enhancement information, and its receptive field is restricted to an isolated 2D slice patch each time, thus lacking contextual information from the third dimension, which is fatal to sparse 3D vessel segmentation. Without a comprehensive 3D spatial neighborhood, 2D U-Net is more prone to strong noise perturbation (high-intensity true negatives) and insensitive to weak vessel signals (low-intensity true positives); as a result, 2D U-Net performs unsatisfactorily even when equipped with more feature embedding channels. Consequently, it may not be ideal to fuse the results from 3D U-Net and 2D U-Net through these simple combinations.

Here, Dice similarity is provided since it is the most comprehensive and effective metric for judging segmentation performance. It measures the overlap between the prediction and the ground truth, comprehensively taking into account true positives (TP), false negatives (FN), and false positives (FP). This is also why we (as well as many other research works) select Dice similarity as the loss function in our VC-Net.
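As a concrete illustration of the metric discussed above, the Dice coefficient on binary masks is 2·TP / (2·TP + FP + FN), and the corresponding differentiable "soft" form on predicted probabilities is commonly used as a loss. The sketch below shows both forms; it is a minimal illustration, not the paper's exact implementation:

```python
import numpy as np

def dice_similarity(pred, gt, eps=1e-7):
    """Dice coefficient on binary masks: 2*TP / (2*TP + FP + FN)."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    return 2.0 * tp / (2.0 * tp + fp + fn + eps)

def dice_loss(prob, gt, eps=1e-7):
    """Soft Dice loss on predicted probabilities (differentiable form);
    equals 0 when the prediction matches the ground truth exactly."""
    inter = (prob * gt).sum()
    return 1.0 - (2.0 * inter + eps) / (prob.sum() + gt.sum() + eps)
```

For example, a prediction with one TP, one FP, and one FN yields a Dice score of 2·1 / (2·1 + 1 + 1) = 0.5, which shows how both kinds of error are penalized symmetrically.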
S3 LABELING REFINEMENT AND VISUALIZATION TOOL
The interface and basic functions of our specifically-designed cerebrovascular labeling and visualization tool are shown in Fig. S3. Our tool enables slice-wise refinement based on the pre-computed vessel labels from the MRAG_nls, MRVG_avg, and SWI_ATRG methods, instead of labeling from scratch manually. The interactive vessel editing is conducted in the current image slice window, e.g., with a manual labeling / erasing brush and automatic labeling of connected components by a flood-fill method, as shown in Fig. S3 (a). The slice under editing is simultaneously visualized in solid red for a clearer examination in Fig. S3 (d). Unlike the operation in most general-purpose labeling / segmentation software, in which the current labeling (2D) slice is usually isolated from its (3D) context and thus lacks the crucial reference, the vessel labeling in our developed tool is comprehensively assisted and guided by the following specifically-designed functions: (1) a simultaneously updated 3D vasculature system from the beginning to the current slice, with several interactions, such as rotation and zooming in / out, to check the cross-plane 3D vessel connectivity (Fig. S3 (b)); (2) synchronized brain vessel volume rendering to trace the overall segmented vasculature system (Fig. S3 (c)); (3) an adaptive MIP labeling display (with a user-defined number of projection slices) that enables users to evaluate the contextual slices to strengthen the vessel connectivity and rule out noise (Fig. S3 (e)). Our tool can greatly facilitate continuous slice-wise labeling and reduce labeling ambiguity in some challenging areas of the micro-cerebrovascular structure, and it has been extensively tested and evaluated by our collaborating domain experts. A supplemental video is included to demonstrate the dynamic visualization and interaction in detail.
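The adaptive MIP display described above projects a user-defined window of contextual slices by taking the per-pixel maximum intensity. A minimal sketch of such a windowed MIP (using a hypothetical `mip_over_slices` helper, not the tool's actual API) is:

```python
import numpy as np

def mip_over_slices(volume, center, num_slices, axis=0):
    """Maximum intensity projection over a window of `num_slices`
    slices centered at index `center` along `axis`.
    Hypothetical helper illustrating the windowed MIP idea."""
    half = num_slices // 2
    lo = max(center - half, 0)
    hi = min(center + half + 1, volume.shape[axis])
    window = np.take(volume, range(lo, hi), axis=axis)
    # Collapse the window along the projection axis by per-pixel maximum.
    return window.max(axis=axis)
```

Widening `num_slices` pulls in more through-plane context, which helps trace vessel continuity across slices, while a single-slice window reduces to the current slice itself.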
S4 SUPPLEMENTAL VIDEO
A supplemental video is included to demonstrate the joint 3D visualization of the major-level and micro-level vessels in the midbrain and the whole brain on the MICRO-MRI dataset, as well as the dynamic visualization and interaction of our developed cerebrovascular labeling and visualization tool.
[Fig. S1 panel labels — Additional Results on TubeTK Dataset: Ours / 3D U-Net / Ground Truth and Ours / DeepVesselNet / Ground Truth; Additional Results on MICRO-MRI Dataset (Major-Level Artery, Major-Level Vein): Ours / MRAG_nls / Ground Truth and Ours / MRVG_avg / Ground Truth]
Fig. S1: Additional qualitative results from two datasets (top: TubeTK dataset, bottom: MICRO-MRI major-level vessel dataset). The 3D global vessel segmentations are shown from the superior direction. The MIP segmentations are visualized by 5-sliced MRA / MICRO-MRI images, and the corresponding vessel masks in the MIPs are marked in semi-transparent red. The highlighted comparison areas are marked with circles. The 3D MRAG / MRVG images from the MICRO-MRI dataset only cover the midbrain area and thus contain fewer vessels than the TubeTK dataset. It is noted that in the TubeTK dataset even the ground truth vessel label does not perfectly cover certain vessel continuity, which can be clearly traced in the MIPs (such as some yellow circles in the ground truth MIPs) and the corresponding MRA slices. We will further refine the ground truth vessel labels of the TubeTK dataset by using our developed labeling tool under the domain experts' guidance in our future work.

[Fig. S2 panel labels: TubeTK MRA (Whole Brain); MICRO-MRI Major-Level MRAG (Midbrain); MICRO-MRI Major-Level MRVG (Midbrain); Original MICRO-MRI Micro-Level SWI (Whole Brain); Pre-processed MICRO-MRI Micro-Level SWI (Whole Brain)]
Fig. S2: The different modalities of input image examples from the TubeTK MRA and MICRO-MRI datasets in our VC-Net method.

[Fig. S3 panels: (a) Image Slice with Transparent Label; (b) 3D Segmented Vessel Mask; (c) 3D Full Vessel Volume Rendering; (d) Image Slice with Solid Label; (e) Maximum Intensity Projection with Label]
[Fig. S3 annotations: position the current slice index to allow re-editing; customize the number of slices in a MIP]