Automatic Ischemic Stroke Lesion Segmentation from Computed Tomography Perfusion Images by Image Synthesis and Attention-Based Deep Neural Networks
Guotai Wang a,1,∗, Tao Song a,b,1, Qiang Dong c,d,e, Mei Cui c, Ning Huang b, Shaoting Zhang a,b

a School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu, China
b SenseTime Research, Shanghai, China
c Department of Neurology, Huashan Hospital, Fudan University, Shanghai, China
d The State Key Laboratory of Medical Neurobiology, Fudan University, Shanghai, China
e Department of Neurology, Jingan District Centre Hospital of Shanghai, Shanghai, China
Abstract
Ischemic stroke lesion segmentation from Computed Tomography Perfusion (CTP) images is important for accurate diagnosis of stroke in acute care units. However, it is challenged by low image contrast and resolution of the perfusion parameter maps, in addition to the complex appearance of the lesion. To deal with this problem, we propose a novel framework based on synthesized pseudo Diffusion-Weighted Imaging (DWI) from perfusion parameter maps to obtain better image quality for more accurate segmentation. Our framework consists of three components based on Convolutional Neural Networks (CNNs) and is trained end-to-end. First, a feature extractor is used to obtain both a low-level and a high-level compact representation of the raw spatiotemporal Computed Tomography Angiography (CTA) images. Second, a pseudo DWI generator takes as input the concatenation of CTP perfusion parameter maps and our extracted features to obtain the synthesized pseudo DWI. To achieve better synthesis quality, we propose a hybrid loss function that pays more attention to lesion regions and encourages high-level contextual consistency. Finally, we segment the lesion region from the synthesized pseudo DWI, where the segmentation network is based on switchable normalization and channel calibration for better performance. Experimental results showed that our framework achieved the top performance on the ISLES 2018 challenge and that: 1) our method using synthesized pseudo DWI outperformed methods segmenting the lesion from perfusion parameter maps directly; 2) the feature extractor exploiting additional spatiotemporal CTA images led to better synthesized pseudo DWI quality and higher segmentation accuracy; and 3) the proposed loss functions and network structure improved the pseudo DWI synthesis and lesion segmentation performance. The proposed framework has a potential for improving diagnosis and treatment of ischemic stroke where access to real DWI scanning is limited.

Keywords:
Ischemic stroke lesion, computed tomography perfusion, image synthesis, segmentation, deep learning
1. Introduction
Stroke is the most common cerebrovascular disease and one of the primary causes of mortality and long-term disability worldwide (Kissela et al., 2012). Ischemic stroke, the most common type of stroke, accounts for 75-85% of all stroke cases; it is an obstruction of the cerebral blood supply that leads to tissue hypoxia (under-perfusion) and tissue death within a few hours. The stages of stroke can be classified into acute (0 to 24 h), sub-acute (24 h to 2 w) and chronic (> 2 w). Among different medical imaging methods, Magnetic Resonance Imaging (MRI) sequences such as Fluid-Attenuated Inversion Recovery (FLAIR), T1 weighted, T2 weighted and Diffusion-Weighted Imaging (DWI) are preferred imaging modalities for ischemic stroke lesions due to their good soft tissue contrasts. In particular, DWI is considered the most sensitive method for detection of early acute stroke (Mezzapesa et al., 2006). However, MR imaging including DWI is relatively slow and often not accessible for acute stroke patients. Alternatively, Computed Tomography Perfusion (CTP) imaging offers insights into cerebral hemodynamics and enables differentiation of salvageable penumbra from irrevocably damaged infarct core (Donahue and Wintermark, 2015). CTP has advantages in speed and cost, leading to higher availability in acute care units (Gillebert et al., 2014). In CTP imaging, a sequence of Computed Tomography Angiography (CTA) images (i.e., spatiotemporal 4D images) is acquired during the perfusion process, which results in perfusion parameter maps such as Cerebral Blood Flow (CBF), Cerebral Blood Volume (CBV), Mean Transit Time (MTT) and Time to Peak (TTP, or Tmax) to help identify ischemic stroke lesions. Examples of perfusion parameter maps of two ischemic stroke patients are shown in Fig. 1. Segmentation of stroke lesions from medical images can provide quantitative measurements of the lesion region, which is important for quantitative treatment decision procedures. Manual segmentation of the lesion is time-consuming with low inter-rater agreement, and automatic stroke lesion segmentation is more efficient and has a potential to provide more reliable and reproducible segmentation results (Maier et al., 2017).

Considering the limited speed and availability of MRI for acute stroke patients, we aim to segment ischemic stroke lesions automatically from CTP perfusion parameter maps, which has a potential for improving diagnosis and treatment of ischemic stroke in a timely fashion.

∗ Corresponding author. Email address: [email protected] (Guotai Wang)
1 Equal contribution
Preprint submitted to Elsevier, July 8, 2020
However, this task is very difficult and the segmentation accuracy is confronted with many challenges. First, the appearance of stroke lesions varies considerably at different times, even within the same clinical stage of stroke (González et al., 2011). Second, the lesions have a large variation of location, shape, size and appearance in the brain, as shown in Fig. 1. Some lesions may be aligned with the vascular supply territories while others may not. The size of some small lesions can be only a few millimeters, while some large lesions may cover a complete hemisphere (Maier et al., 2017). The intensity is not homogeneous in the lesion region, and some other stroke-similar pathologies may lead to false positives in the segmentation result. Third, compared with DWI, the perfusion parameter maps (CBF, CBV, MTT and Tmax) are noisy with a lower spatial resolution, making it difficult to accurately identify the boundary of stroke lesions, as demonstrated in Fig. 1. In addition, the raw spatiotemporal 4D CTA images contain useful information about the ischemic stroke lesion but have a large data size. Using the perfusion parameter maps alone without considering the raw spatiotemporal CTA images may limit the segmentation accuracy, while directly taking raw spatiotemporal CTA images for lesion segmentation increases the computational cost. Therefore, extracting compact and useful features from the raw spatiotemporal CTA images is desirable for efficient and accurate ischemic stroke lesion segmentation.

Although automatic segmentation of ischemic stroke lesions has been widely studied, most existing methods were proposed to deal with multi-modal MR images (Maier et al., 2017; Winzeck et al., 2018). Only a few works have been reported on ischemic stroke lesion segmentation from CTP images (Gillebert et al., 2014; Yahiaoui and Bessaid, 2016; Abulnaga and Rubin, 2018).
Some traditional methods such as template-based methods (Gillebert et al., 2014) and fuzzy C-Means (Yahiaoui and Bessaid, 2016) are challenged by the complex appearance of stroke lesions. Recently, deep learning methods have achieved state-of-the-art performance for many medical image segmentation tasks (Shen et al., 2017), and have been applied to ischemic stroke lesion segmentation from CTP images (Pinheiro et al., 2018; Abulnaga and Rubin, 2018; Vikas Kumar Anand et al., 2018). However, due to the above-mentioned challenges, it remains difficult to segment the lesions directly from the perfusion parameter maps.

Inspired by the fact that ischemic stroke lesions in DWI are easier to identify and segment than those in perfusion parameter maps, it is desirable to synthesize pseudo DWI images from perfusion parameter maps to help the segmentation task. Though many methods have been proposed for general medical image synthesis (Frangi et al., 2018), synthesizing images with lesions is still not well addressed (Roy et al., 2010), as it is challenged by the complex variation of pathological lesions among patients. In particular, synthesizing pseudo DWI images from CTP images of ischemic stroke lesions has rarely been investigated.

This work is a substantial extension of our preliminary conference publication (Song and Huang, 2018) that won the MICCAI 2018 ischemic stroke lesion segmentation (ISLES) challenge. In this paper, we provide a detailed description and in-depth discussion of our segmentation framework and validate it with extensive experiments. The contributions of our work are summarized as follows.

First, we propose a novel elaborated framework for automatic ischemic stroke lesion segmentation from CTP images based on synthesized pseudo DWI. Compared with using only CTP perfusion parameter maps, our framework additionally exploits raw spatiotemporal CTA images for higher pseudo DWI synthesis quality and lesion segmentation accuracy.
Second, to make use of the raw spatiotemporal CTA images more efficiently, we propose a feature extractor that automatically obtains a more compact and high-level representation of the CTA images, which helps to reduce the required memory and computational time and improves the performance of our segmentation method. Third, we propose a novel method to synthesize pseudo DWI images with ischemic stroke lesions. We employ a high-level similarity loss function to encourage the pseudo DWI to be close to the ground truth in terms of both local details and global context, and propose an attention-guided synthesis strategy so that the generator focuses more on the lesion part, which benefits the final segmentation. Last but not least, to segment lesions from our synthesized pseudo DWI, we propose a Convolutional Neural Network (CNN) with channel calibration and Switchable Normalization (SN) (Luo et al., 2018) that is suitable for small training batch sizes, and combine it with a novel attention-based and hardness-aware loss function that helps to obtain more accurate segmentation of ischemic stroke lesions. Experimental results show that our method achieved state-of-the-art performance on the ISLES 2018 challenge and that it outperformed direct segmentation from CTP perfusion parameter maps as well as contemporary image synthesis-based methods for ischemic stroke lesion segmentation from CTP images (Liu, 2018).
2. Related Works
Segmentation of ischemic stroke lesions from medical images has attracted increasing attention in recent years (Rekik et al., 2012; Maier et al., 2017), and most existing works focus on segmentation from MR images. For example, the ISLES 2015-2017 challenges aimed at ischemic stroke lesion segmentation from multi-modal MR images including T1, T1-contrast, FLAIR and DWI sequences (Maier et al., 2017; Winzeck et al., 2018).

Figure 1: Examples of CTP and DWI images of two patients with ischemic stroke lesions. Columns 1-2: CTA images at different time points during perfusion. Columns 3-6: perfusion parameter maps. Column 7: lesions delineated in DWI images. Note that we aim to segment the lesions from perfusion parameter maps, and DWI is not available at test time in our study.

Some early works have used a range of methods for this segmentation task, such as the Markov random field model (Kabir et al., 2007), level set (Feng et al., 2015), random forest (Mitra et al., 2014) and support vector machine (Maier et al., 2014). However, their accuracy is challenged by the complicated segmentation problem (Maier et al., 2015). Recently, deep learning has been increasingly used for ischemic stroke lesion segmentation with better performance. For example, Kamnitsas et al. (2017) proposed a dual pathway 3D CNN combined with a fully connected Conditional Random Field (CRF) for brain lesion segmentation. Cui et al. (2019) proposed an adapted mean teacher model to learn from a combination of annotated and unannotated MR images for the segmentation task.
Dolz et al. (2018) combined DWI and CTP to segment ischemic stroke lesions and used a densely connected UNet with Inception modules (Szegedy et al., 2016) to handle the variation of lesion size. Despite their good performance, these methods rely on MRI and cannot be directly applied to stroke lesion segmentation from CTP images.

There have been few works on the challenging task of segmentation of ischemic stroke lesions from CTA or CTP perfusion parameter maps (Rekik et al., 2012). Some early works used histogram-based classifiers (Rekik et al., 2012) or template-based voxel-wise comparison (Gillebert et al., 2014) to deal with this problem. Yahiaoui and Bessaid (2016) used a multi-scale contrast enhancement algorithm and fuzzy C-Means for this task. Recently, Abulnaga and Rubin (2018) used CNNs with pyramid pooling to combine global and local contextual information for this task, where a focal loss was employed to enable the CNNs to focus more on hard samples. However, due to the lower signal-to-noise ratio of CTP perfusion parameter maps compared with DWI, it remains challenging to automatically segment the ischemic stroke lesion from CTP images.
A range of works have investigated the problem of synthesizing medical images from another modality (Frangi et al., 2018). For example, Burgos et al. (2014) synthesized CT images from MRI through a multi-atlas information propagation scheme. Bahrami et al. (2016) used dictionary learning to synthesize 7T-like images from 3T MRI. Jog et al. (2017) used regression random forest to synthesize T2 and FLAIR images from T1 images. Deep learning methods have also been increasingly used for medical image synthesis (Ker et al., 2017), such as deep neural network-based synthesis methods (Nguyen et al., 2015) and deep adversarial learning-based approaches (Nie et al., 2018). However, most existing works deal with general cross-modality image synthesis and have not well investigated the more challenging problem of synthesizing medical images with pathological lesions. Roy et al. (2010) used an atlas-based method to synthesize FLAIR images with white matter lesions. Chartsias et al. (2017) proposed a CNN for synthesizing multi-modal MR images of brain lesions. The effectiveness of these methods for pseudo DWI synthesis from CTP perfusion parameter maps of stroke lesions has rarely been demonstrated.
3. Method
The proposed framework for ischemic stroke lesion segmentation from CTP images is depicted in Fig. 2. Due to the large inter-slice spacing (9.48 mm on average) of the experimental images, the proposed method operates on 2D slices. It consists of a feature extractor, a pseudo DWI generator and a final lesion segmenter. First, to efficiently deal with the large raw spatiotemporal CTA images and reduce the computational requirements, we design a high-level feature extractor that uses a CNN to obtain a compact representation of the raw spatiotemporal CTA images. Additionally, we make use of a temporal Maximal Intensity Projection (MIP) of the CTA images as a low-level feature. Then, these features are concatenated with the perfusion parameter maps to serve as the input of the pseudo DWI generator, which obtains a pseudo DWI image with better contrast between the lesion and the background. To improve the synthesis quality near lesion regions, we use a high-level similarity-based loss function and enable the generator to pay more attention to the lesion. Finally, a segmenter takes the pseudo DWI image as input and produces a segmentation of the ischemic stroke lesion, where a CNN using channel calibration and switchable normalization trained with an attention-based and hardness-aware loss function is proposed to improve the performance. The three components are trained end-to-end. Details of these components are described in the following.

In CTP imaging, the raw spatiotemporal CTA images have been transformed into a simplified feature representation in terms of perfusion parameter maps including CBF, CBV, MTT and Tmax. Though these parameter maps are useful for detection of the stroke lesion, they are not a complete representation of the perfusion information in the raw spatiotemporal CTA images.
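As a structural sketch, the data flow through the three components can be written as follows; `forward_pipeline`, `extract_high`, `generate` and `segment` are illustrative placeholder names standing in for the CNNs Φe, Φg and Φs, not identifiers from the paper's code:

```python
import numpy as np

def forward_pipeline(cta_4d, perf_maps, extract_high, generate, segment):
    """Sketch of the three-stage data flow: feature extraction,
    pseudo DWI synthesis, and lesion segmentation.
    cta_4d: (T, D, H, W) raw spatiotemporal CTA frames.
    perf_maps: (4, D, H, W) stacked CBF, CBV, MTT and Tmax maps."""
    f_low = cta_4d.max(axis=0)                     # temporal MIP feature
    f_high = extract_high(cta_4d)                  # learned compact feature
    gen_in = np.concatenate(
        [perf_maps, f_low[None], f_high[None]], axis=0)  # 6-channel input
    pseudo_dwi = generate(gen_in)                  # synthesized pseudo DWI
    return segment(pseudo_dwi)                     # lesion probability map
```

Here each callable is a stand-in for the corresponding CNN; in the paper all three networks are trained jointly.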
Therefore, we do not ignore the raw spatiotemporal CTA images and try to mine additional features from them that are useful for the segmentation task.

Let I(x, y, z, t) represent a raw spatiotemporal CTA image obtained during the perfusion, where t ∈ [0, 1, ..., T − 1] and T is the total number of time points. Considering that the raw spatiotemporal CTA image has a large data size due to a large value of T, we use a feature extractor to obtain an additional low-level feature and a compact, high-level representation of the raw spatiotemporal CTA image to make efficient use of it. The feature extraction method is shown in Fig. 2. We extract both a manually designed low-level feature and a high-level feature that is automatically learned by a CNN.

First, the maximal intensity value of a voxel during perfusion may contain information related to the ischemic stroke lesion (Murayama et al., 2018). Therefore, in addition to the standard perfusion parameter maps, we apply a Maximal Intensity Projection (MIP) along the temporal axis to I to obtain a low-level feature map F_l:

F_l = max_t I(x, y, z, t)    (1)

Second, we use a CNN to extract high-level features of the raw spatiotemporal CTA image due to CNNs' good performance in automatic feature extraction (Shen et al., 2017). Though the start and end time points of perfusion do not affect the MIP image in theory, they are important for the high-level feature extractor, as the CNN is designed to take the frames during the perfusion as input. To reject frames that are not perfused in the raw spatiotemporal CTA image, we first need to detect these two time points. We define a curve of accumulated intensity over time as q(t) = Σ_{x,y,z} I(x, y, z, t). Let T_s and T_e denote the estimated start and end time points of the perfusion, respectively. They are determined by the following rules:

T_s = min{ t | 0 ≤ t < T − K, Σ_{k=0}^{K−1} H(q′(t + k)) = K }    (2)

T_e = max{ t | K ≤ t < T, Σ_{k=0}^{K−1} H(q′(t − k)) = 0 }    (3)

where H(·) is the Heaviside function that obtains 0 for negative inputs and 1 for positive inputs.
q′(t) is the first derivative of q(t), and K is a positive integer which is set to 5 in this paper. Therefore, T_s is defined as the earliest time point where the first derivative of q(t) keeps positive for its following K consecutive time points, and T_e is defined as the latest time point where the first derivative of q(t) keeps negative for its preceding K consecutive time points. Fig. 3 shows the curve of q(t) with T_s and T_e in two cases.

We extract the frames between T_s and T_e and obtain a temporally cropped subsequence that corresponds to the perfusion stage of the raw spatiotemporal CTA image. As the duration of the perfusion stage varies among different subjects, the temporally cropped subsequence can have different numbers of time points along the temporal axis. To deal with this problem and to reduce the computational cost, we uniformly down-sample the temporally cropped subsequence along the temporal axis into a fixed number of C_e time points. The temporally cropped and down-sampled CTA image is referred to as I*, which is used as the input of a CNN for high-level feature extraction. Let C_e × D × H × W represent the size of I*, where D, H and W represent the spatial depth, height and width of the input 4D image I*, respectively. We treat I* as a multi-channel 3D volume and use a 2D CNN for high-level feature extraction from each slice, as the images have a large inter-slice spacing (9.48 mm on average) in this study. Specifically, we used the UNet (Ronneberger et al., 2015) for the high-level feature extraction due to its good performance in a range of tasks (Abdulkadir et al., 2016; Li et al., 2018; Isensee et al., 2018). The UNet consists of an encoding path and a decoding path. The encoding path uses convolution and down-sampling through max-pooling layers to obtain features at different scales with reduced spatial resolution, and the decoding path uses up-sampling (deconvolution) layers to recover the spatial resolutions.
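The perfusion-window detection of Eqs. (2)-(3) and the temporal down-sampling to C_e frames can be sketched as follows; this is a minimal numpy version with illustrative function names, using a forward difference for q′(t):

```python
import numpy as np

def detect_perfusion_window(cta_4d, K=5):
    """Detect the start/end frames of the perfusion stage from the
    accumulated-intensity curve q(t): T_s is the earliest t whose next K
    derivative values are all positive, T_e the latest t whose preceding
    K derivative values are all non-positive (Eqs. 2-3)."""
    T = cta_4d.shape[0]
    q = cta_4d.reshape(T, -1).sum(axis=1)   # q(t) = sum over x, y, z
    dq = np.diff(q)                          # forward difference q'(t)
    H = (dq > 0).astype(int)                 # Heaviside of q'(t)
    Ts = next(t for t in range(T - K) if H[t:t + K].sum() == K)
    # q'(t) is defined up to T-2 with a forward difference
    Te = max(t for t in range(K, T - 1) if H[t - K + 1:t + 1].sum() == 0)
    return Ts, Te

def resample_window(cta_4d, Ts, Te, C_e=6):
    """Uniformly pick C_e frames from the cropped subsequence [Ts, Te]."""
    idx = np.linspace(Ts, Te, C_e).round().astype(int)
    return cta_4d[idx]
```

With real CTA data the curve q(t) is noisier, so the run-of-K rule above is what makes the detection robust to single-frame fluctuations.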
We set the output channel number of the extractor CNN to 1. Let F_h denote the CNN's output; it has a size of 1 × D × H × W and is a high-level representation of the input spatiotemporal CTA image I*:

F_h = Φ_e(I*, θ_e)    (4)

where Φ_e represents the feature extraction network and θ_e denotes the set of parameters of the network.

Inspired by recent works on CNN-based image synthesis with state-of-the-art performance (Frangi et al., 2018), we also use CNNs to generate pseudo DWI images, and select UNet (Ronneberger et al., 2015) as the backbone network structure due to its good performance. Differently from previous works that synthesized pseudo DWI images only from CTP perfusion parameter maps including CBF, CBV, MTT and Tmax (Liu, 2018), we additionally take advantage of the extracted low-level and high-level features (F_l and F_h) so that more information from the raw spatiotemporal CTA image can
Figure 2: Illustration of the proposed framework for ischemic stroke lesion segmentation from CTP images. We extract additional low-level features based on temporal MIP and high-level features based on a CNN from raw spatiotemporal CTA images, and concatenate them with perfusion parameter maps. The concatenated images are used to generate pseudo DWI, from which the lesion is finally segmented. Φ_e, Φ_g and Φ_s are three CNNs for high-level feature extraction, pseudo DWI generation and lesion segmentation, respectively.

Figure 3: Illustration of start time (T_s) and end time (T_e) detection of the perfusion stage.

help to improve the quality of the synthesized pseudo DWI. Let F_o represent the concatenation of CBF, CBV, MTT and Tmax. The input of our generator is the concatenation of F_o, F_l and F_h, and thus it has six channels. The generated pseudo DWI can be represented as:

I_g = Φ_g(F_o, F_l, F_h, θ_g)    (5)

where Φ_g represents the pseudo DWI generation network and θ_g denotes its parameter set.

Let I_d represent the DWI ground truth for synthesis. To train the generator Φ_g so that it can focus on the lesion region and the output I_g has a high-level similarity to the ground truth I_d, we propose a novel loss function L_g(I_g, I_d) that combines a low-level weighted pixel-wise loss ℓ_l(I_g, I_d) and a high-level contextual loss ℓ_h(I_g, I_d):

L_g(I_g, I_d) = ℓ_l(I_g, I_d) + γ ℓ_h(I_g, I_d)    (6)

ℓ_l(I_g, I_d) = ‖A · (I_g − I_d)‖_2    (7)

ℓ_h(I_g, I_d) = ‖Φ_c(I_g, θ_c) − Φ_c(I_d, θ_c)‖_1    (8)
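A minimal numpy sketch of the hybrid synthesis loss in Eqs. (6)-(8), under the assumption that ℓ_l is a weighted L2 norm and ℓ_h an L1 distance between encoder outputs; `encoder` is an illustrative stand-in for the trained projection network Φ_c:

```python
import numpy as np

def synthesis_loss(I_g, I_d, A, encoder, gamma=1.0):
    """Hybrid pseudo-DWI synthesis loss: attention-weighted pixel-wise
    term plus a high-level contextual term on encoder embeddings."""
    l_low = np.linalg.norm((A * (I_g - I_d)).ravel())    # ||A*(Ig - Id)||_2
    l_high = np.abs(encoder(I_g) - encoder(I_d)).sum()   # L1 on embeddings
    return l_low + gamma * l_high
```

Because the contextual term compares low-dimensional embeddings rather than pixels, it can only be driven to zero by matching global appearance, which is exactly the consistency the pixel-wise term does not enforce.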
Figure 4: Structure of the encoder Φ_c to obtain a high-level representation of an input image. The convolution kernels have a size of 3 × 3.

where γ is a weighting parameter for the contextual loss and A is a spatial weight map. ‖·‖_2 and ‖·‖_1 are the L2 and L1 norms, respectively. Φ_c is a CNN-based encoder with a parameter set θ_c, and it converts I_g and I_d into their high-level and compact (i.e., low-dimensional) representations, respectively. As ℓ_l(·) operates on individual voxel-wise predictions and does not guarantee global and high-level consistency, ℓ_h(·) based on the encoder Φ_c helps to overcome this problem by encouraging closeness between the lower-dimensional non-linear projections of I_g and I_d. Our encoder Φ_c consists of five convolutional layers and two adaptive average pooling layers, and its output is a vector of length 16. Details of Φ_c are shown in Fig. 4.

As our final goal is to segment the ischemic stroke lesion, a good synthesis quality around the lesion region is desirable. Therefore, we use the voxel-wise weight map A to make the
generator pay more attention to the lesion region and less attention to the background. Let F denote the set of lesion foreground voxels, and Eud(i, F) denote the shortest Euclidean distance between a voxel i and F. We use A_i to represent the weight of voxel i in the weight map A:

A_i = w, if i ∈ F;
A_i = (1 + exp(−Eud(i, F)/D)) / (exp(Eud(i, F)/D) + 1), otherwise    (9)

where w ≥ 1 is the weight for lesion foreground voxels and D is a positive parameter that controls the sharpness of the weight for background voxels. The background branch simplifies to exp(−Eud(i, F)/D), so A_i decays gradually with the increase of Eud(i, F), i.e., the weights for voxels that are further from the lesion region are lower. An example of A is shown in Fig. 2.

Figure 5: The proposed SLNet for ischemic stroke lesion segmentation with Switchable Normalization (SN) and Squeeze-and-Excitation (SE) blocks.

Our segmentation network takes the synthesized pseudo DWI image I_g as input and outputs a binary segmentation of the ischemic stroke lesion. Let Φ_s represent the segmentation network and θ_s denote its parameter set. The segmentation network's output probability map is formulated as:

P = Φ_s(I_g, θ_s)    (10)

where P has C channels and C equals the class number, which is 2 in our binary segmentation task. We select the UNet structure (Ronneberger et al., 2015) as the backbone and extend it in two aspects to obtain a better performance.

First, we replace Batch Normalization (BN) layers with Switchable Normalization (Luo et al., 2018) layers, which learn to automatically select suitable normalizers for different normalization layers of a CNN. Compared with traditional batch normalization, switchable normalization is more robust to a wide range of batch sizes and more suitable for small batch sizes (Luo et al., 2018). In our segmentation task, the large input patches and dense feature maps take a lot of memory, which limits the batch size to a small number. Therefore, switchable normalization is preferred to batch normalization.
Second, as different channels in a feature map may have different importance, we use a Squeeze-and-Excitation (SE) block (Hu et al., 2018) based on channel attention to calibrate channel-wise feature responses. The SE block explicitly models inter-channel dependencies by learning an attention weight for each channel so that the network relies more on the most important channels for segmentation. We use an SE block after each convolution block in the encoding path of the UNet (Ronneberger et al., 2015). The proposed network is referred to as SLNet, which is shown in Fig. 5.

To deal with the large range of ischemic stroke lesion sizes and with challenging training samples for the segmentation task, we propose a novel hybrid loss function to train the segmentation network. Let Y denote the one-hot ground truth label with channel number C. We use P_ci and Y_ci to denote the probability of voxel i belonging to class c in the prediction output and the ground truth, respectively. The proposed loss function is a combination of a weighted cross entropy loss function L_WCE and a hardness-aware generalized Dice loss function L_HGD:

L_s(P, Y) = L_WCE(P, Y, A) + L_HGD(P, Y)    (11)

L_WCE(P, Y, A) = [ Σ_{i}^{N} A_i ( Σ_{c}^{C} − Y_ci log P_ci ) ] / Σ_{i}^{N} A_i    (12)

L_HGD(P, Y) = − log( 1 − L_GD(P, Y) )    (13)

L_GD(P, Y) = 1 − 2 ( Σ_{c}^{C} m_c Σ_{i}^{N} Y_ci P_ci ) / ( Σ_{c}^{C} m_c Σ_{i}^{N} (Y_ci + P_ci) )    (14)

where N is the number of voxels. A is a voxel-wise weight map, and we use the same one as defined in Eq. 9, which drives the segmentation network to pay more attention to the lesion region than to the background. L_GD is the generalized Dice loss that automatically balances different classes by defining a class-wise weight m_c = 1 / (Σ_{i}^{N} Y_ci)² (Sudre et al., 2017). Inspired by the focal loss (Lin et al., 2017) that automatically penalizes hard samples in object detection tasks, we use − log(1 − L_GD) in Eq.
11; this term has the same monotonicity as L_GD but gets higher gradient values for large L_GD values, so that our segmentation loss function is also aware of hard image samples.

The overall pipeline of our feature extractor Φ_e, pseudo DWI generator Φ_g, image context encoder Φ_c and the final segmentation network Φ_s can be jointly trained in an end-to-end fashion. The overall loss function for training is therefore defined as:

L = L_s(P, Y) + α L_g(I_g, I_d) + β L_e(F_h, I_d)    (15)

where α and β are weighting parameters. The segmentation loss function L_s(P, Y) is defined in Eq. 11 and the pseudo DWI synthesis loss function L_g(I_g, I_d) is defined in Eq. 6. To obtain better synthesized pseudo DWI and lesion segmentation results, we add an extra explicit supervision on F_h, the output of the feature extractor Φ_e. Therefore, we introduce a loss L_e(F_h, I_d) = L_g(F_h, I_d) to encourage the similarity between F_h and I_d. The end-to-end training updates θ_e, θ_g, θ_c and θ_s simultaneously.

4. Experiments and Results

We used the dataset from the ISLES 2018 challenge to validate our segmentation framework. The ISLES 2018 dataset includes CTP scans of 103 patients in two centers who presented within 8 hours of stroke onset. For the CTP scanning, a contrast agent was administered to the patient and then sequential CTA images were acquired 1-2 seconds apart. Then the perfusion parameter maps CBF, CBV, MTT and Tmax were derived from the raw spatiotemporal CTA images. An MRI DWI scan was obtained within 3 hours after the CTP scanning for each patient. The intra-slice pixel spacing varied across scans (from 0.80 mm × 0.80 mm), and we set C_e = 6. For preprocessing, intensity values in each DWI volume were scaled to (0, 1) based on the minimal value and the 99-th percentile. Manual delineation of the stroke lesion from DWI images given by an expert was used as the segmentation ground truth. The training set consisted of 94 scans of CTP and DWI from 63 patients. The testing set consisted of 62 CTP scans from 40 patients, for which DWI images were not provided to participants of the challenge.

Our segmentation framework was implemented in PyTorch (https://pytorch.org) with an NVIDIA TITAN X GPU with 12 GB memory. The weights of all networks were initialized by the Xavier method (Glorot and Bengio, 2010) and trained with the RMSprop optimizer (Tieleman and Hinton, 2012), a batch size of 5 and 300 epochs. We initialized the learning rate as 0.002 and reduced it by a factor of 0.2 after 180 epochs. The parameter setting was: α = …, β = …, γ = …, w = … and D = …. For quantitative evaluation, we used Dice, Hausdorff Distance (HD) and Average Symmetric Surface Distance (ASSD).
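As an illustration of the training objective, the following numpy sketch combines a distance-based attention map in the spirit of Eq. (9) (here simplified to its exp(−Eud/D) decay) with the weighted cross entropy and hardness-aware generalized Dice terms of Eqs. (11)-(14); the w and D values are illustrative, not the paper's settings:

```python
import numpy as np

def attention_weight_map(lesion_mask, w=2.0, D=10.0):
    """Voxel-wise weights: w inside the lesion, a weight decaying with
    Euclidean distance to the lesion outside it (brute-force distance,
    for clarity rather than speed)."""
    fg = np.argwhere(lesion_mask > 0)            # lesion voxel coordinates
    A = np.empty(lesion_mask.shape, dtype=float)
    for idx in np.ndindex(lesion_mask.shape):
        if lesion_mask[idx] > 0:
            A[idx] = w
        else:
            d = np.sqrt(((fg - np.array(idx)) ** 2).sum(axis=1)).min()
            A[idx] = np.exp(-d / D)              # decays away from lesion
    return A

def hybrid_seg_loss(P, Y, A, eps=1e-8):
    """L_s = attention-weighted cross entropy + hardness-aware
    generalized Dice loss. P, Y: (C, N) probabilities / one-hot labels;
    A: (N,) voxel weights."""
    wce = (A * (-Y * np.log(P + eps)).sum(axis=0)).sum() / A.sum()
    m = 1.0 / ((Y.sum(axis=1) + eps) ** 2)       # class weights m_c
    inter = (m * (Y * P).sum(axis=1)).sum()
    union = (m * (Y + P).sum(axis=1)).sum()
    gd = 1.0 - 2.0 * inter / (union + eps)       # generalized Dice loss
    hgd = -np.log(1.0 - gd + eps)                # hardness-aware term
    return wce + hgd
```

The −log(1 − L_GD) wrapping leaves easy samples (small L_GD) almost unchanged but sharply amplifies the loss, and thus the gradient, for hard samples where L_GD approaches 1.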
Dice = 2TP / (2TP + FN + FP)    (16)

where TP, FP and FN are the numbers of true positive, false positive and false negative voxels, respectively.

HD = max{ max_{s∈S} d(s, G), max_{g∈G} d(g, S) }    (17)

ASSD = ( 1 / (|S| + |G|) ) ( Σ_{s∈S} d(s, G) + Σ_{g∈G} d(g, S) )    (18)

where S and G denote the sets of surface points of a segmentation result and of the ground truth, respectively, and d(s, G) is the shortest Euclidean distance between a point s ∈ S and all the points in G.

We first conducted ablation studies to validate different components of our segmentation framework. Since the ground truth segmentations of the ISLES testing images were not available to participants, we split the official ISLES training set at patient level into our local training, validation and testing sets, which contained images from 65, 6 and 23 scans, respectively. In this section, we report the experimental results obtained from our local testing images.

4.1. Effect of Different Loss Functions for Pseudo DWI Synthesis

First, we investigated the effect of different loss functions on pseudo DWI synthesis from the perfusion parameter maps F_o, i.e., the concatenation of CBF, CBV, MTT and Tmax. The proposed loss function L_g (Eq. 6) based on the weighted L2 loss and the high-level contextual loss (Eq. 8) is referred to as w-L2 + ℓ_h, and it is compared with: 1) L1 loss, which refers to ℓ_l in Eq. 7 being defined with the L1 norm and A_i = 1; 2) L2 loss, with ℓ_l defined with the L2 norm and A_i = 1; 3) w-L2, the weighted L2 loss with A_i being the weighting coefficients defined in Eq. 9; 4) adversarial training with Generative Adversarial Networks, referred to as GAN; 5) L2 + GAN, which combines the L2 loss and the GAN loss; and 6) a variant of the proposed L_g with ℓ_h based on the L2 norm, referred to as w-L2 + ℓ_h (L2). For the GAN method, we used the LSGAN framework proposed by Mao et al. (2017), and used a multi-scale discriminator (Ting-Chun Wang et al., 2018) to guide the generator (i.e., UNet) to produce realistic local details and global appearance. Fig.
6 shows a visual comparison of pseudo DWI generated by UNet trained with different loss functions, where the input images were the perfusion parameter maps (F_o) for all these variants. The synthesized pseudo DWI images are shown in the second row. It can be observed that the w-L2 and w-L2 + ℓ_h losses help to obtain a clearer lesion region and boundary, respectively. The results of GAN and L2 + GAN are less smoothed, but include some large artifacts, as highlighted by the light blue arrows. We additionally investigated the effect of the synthesized pseudo DWI images on segmentation, where we used the standard cross entropy loss to train a segmentation model (i.e., UNet (Ronneberger et al., 2015)) with each type of these synthesized pseudo DWI images respectively.

Figure 6: Visual comparison of pseudo DWI synthesis results (the second row) obtained by different loss functions and their effect on segmentation (the third row). First row: concatenation of perfusion parameter maps (F_o) was used as the input of UNet for synthesis. w-L2: weighted L2 loss defined in Eq. 7. w-L2 + ℓ_h: proposed hybrid loss based on Eq. 6 and Eq. 8. Light blue arrows highlight artifacts obtained by GAN-based methods, and red arrows highlight the segmentation differences. Green and yellow curves show the segmentation result and the ground truth, respectively.

Table 1: Quantitative evaluation of different training loss functions for pseudo DWI synthesis and their effect on segmentation. Concatenation of the CTP perfusion parameter maps (F_o) was used as the input for synthesis. Columns: Global SSIM, Local SSIM, Global PSNR, Local PSNR, Dice (%); rows: L1, L2, w-L2, GAN, L2 + GAN, L2 + ℓ_h, w-L2 + ℓ_h.

The last row in Fig.
6 shows that the segmentation based on the synthesized pseudo DWI images obtained by w-L2 + ℓ_h is more accurate than the others, as highlighted by the red arrows. For quantitative evaluation, the global and local SSIM and PSNR of the results obtained by the different synthesis loss functions and the Dice scores of their corresponding segmentation results are presented in Table 1, which shows that the proposed w-L2 + ℓ_h loss function obtains higher local SSIM and Dice than the others.

Effect of Feature Extractor on Pseudo DWI Synthesis

To investigate the effect of our feature extractor on the synthesized pseudo DWI, we compared the quality of pseudo DWI images generated from different inputs: 1) the standard CTP perfusion parameter maps (F_o) only, i.e., without using our feature extractor; 2) concatenation of F_o and our extracted low-level feature F_l defined in Eq. 1; 3) concatenation of F_o, F_l and F_h^*, where F_h^* denotes the high-level feature obtained by the CNN-based feature extractor Φ_e trained without explicit supervision, i.e., L_e is not used; and 4) concatenation of F_o, F_l and F_h, where F_h is the high-level feature obtained by Φ_e trained with explicit supervision through L_e. We used the proposed loss function L_g (i.e., w-L2 + ℓ_h) to train the synthesis network. To additionally investigate how these synthesized results affect the segmentation, we used the standard cross entropy loss to train a UNet (Ronneberger et al., 2015) with each type of these synthesized pseudo DWI images respectively. Fig. 7 shows a visual comparison of pseudo DWI synthesized from different input images. It can be observed that using the additional F_l and F_h helps to improve the local details of the synthesized pseudo DWI, and the result obtained by the concatenation of F_o, F_l and F_h with explicit supervision leads to better image quality than the other variants, as highlighted by the green arrows.
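The weighted-L2-plus-contextual idea behind the proposed w-L2 + ℓ_h loss can be sketched in PyTorch as follows. This is a minimal illustration, not the paper's exact Eqs. 6-9: the lesion weight value, the small fixed encoder `phi` standing in for the high-level contextual term, and the mixing weight `alpha` are illustrative assumptions.

```python
import torch
import torch.nn as nn

def weighted_l2_loss(pred, target, lesion_mask, lesion_weight=5.0):
    """L2 loss with extra weight on lesion voxels (a sketch of the w-L2 idea).
    Background voxels get weight 1, lesion voxels get lesion_weight."""
    w = 1.0 + (lesion_weight - 1.0) * lesion_mask
    return (w * (pred - target) ** 2).mean()

class ContextualLoss(nn.Module):
    """High-level consistency: compare features of prediction and target
    under a small fixed conv encoder phi (a stand-in for the paper's
    high-level contextual loss; the real feature network is not reproduced)."""
    def __init__(self):
        super().__init__()
        self.phi = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU())
        for p in self.phi.parameters():
            p.requires_grad = False  # phi is fixed, not trained

    def forward(self, pred, target):
        return (self.phi(pred) - self.phi(target)).pow(2).mean()

def hybrid_synthesis_loss(pred, target, lesion_mask, ctx, alpha=0.1):
    """w-L2 term plus a contextual term; alpha is an illustrative weight."""
    return weighted_l2_loss(pred, target, lesion_mask) + alpha * ctx(pred, target)
```

The lesion mask makes errors inside the lesion cost more than background errors, which matches the motivation that lesion appearance matters most for the downstream segmentation.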
Table 2 presents a quantitative comparison between these different inputs for pseudo DWI synthesis and the downstream segmentation, which shows that using the additional low-level feature F_l leads to an improvement of global and local SSIM and PSNR over using the CTP perfusion parameter maps F_o only. The high-level feature F_h extracted by the CNN with explicit supervision through L_e further improves the SSIM and PSNR values, which demonstrates that the proposed feature extractor making use of the raw spatiotemporal CTA images helps to obtain better synthesized pseudo DWI images. Fig. 7 and Table 2 also show that synthesis based on F_o, F_l and F_h leads to higher segmentation accuracy than the other variants.

Table 2: Quantitative evaluation of different inputs for pseudo DWI synthesis and their effect on segmentation. F_o: concatenation of perfusion parameter maps. F_l: MIP of spatiotemporal CTA images. F_h^* and F_h are the high-level features obtained by the CNN-based feature extractor Φ_e trained without and with explicit supervision through L_e, respectively. The proposed hybrid loss function L_g was used for training. Columns: Global SSIM, Local SSIM, Global PSNR, Local PSNR, Dice (%); rows: F_o; F_o, F_l; F_o, F_l, F_h^*; F_o, F_l, F_h; Real DWI.

Table 3: Quantitative evaluation of different networks for ischemic stroke lesion segmentation. SLNet: the proposed network for ischemic stroke lesion segmentation. Concatenation of the CTP perfusion parameter maps (F_o) was used as the input, and the cross entropy loss function was used for training. Columns: Parameter (M), Precision (%), Recall (%), Dice (%), HD (mm), ASSD (mm); rows: FCN (Long et al., 2015), UNet, R2UNet, ResUNet, SLNet (w/o SE), SLNet (w/o SN), SLNet.
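For reference, the channel-calibration component compared in Table 3 follows the Squeeze-and-Excitation design of Hu et al. (2018), which can be sketched as below. The reduction ratio is a common default and not necessarily the setting used in SLNet.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel calibration (Hu et al., 2018):
    global-average-pool each channel, pass the result through a small
    bottleneck MLP, and rescale channels by the resulting sigmoid weights."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        b, c = x.shape[:2]
        s = x.mean(dim=(2, 3))           # squeeze: (b, c) channel descriptors
        w = self.fc(s).view(b, c, 1, 1)  # excitation: per-channel weights in (0, 1)
        return x * w                     # recalibrate the feature maps
```

Because the excitation weights lie in (0, 1), the block can only attenuate channels; the network learns which channels to keep near full strength, which is what "channel calibration" refers to here.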
Figure 7: Visual comparison of pseudo DWI synthesized (top row) from different input images and their effect on segmentation (bottom row). F_o: concatenation of perfusion parameter maps. F_l: MIP of spatiotemporal CTA images. F_h^* and F_h are the high-level features obtained by the CNN-based feature extractor Φ_e trained without and with explicit supervision through L_e, respectively. The proposed hybrid loss function L_g defined in Eq. 6 was used for training. Green arrows highlight local differences of the pseudo DWI, and red arrows highlight the segmentation differences. Green and yellow curves show the segmentation result and the ground truth, respectively.

Effect of Different Networks for Segmentation

To investigate the effect of the network structure on our ischemic stroke lesion segmentation task, we compared our proposed SLNet with 1) SLNet w/o SE, where the SE blocks are not used in SLNet; 2) SLNet w/o SN, where the switchable normalization layers are replaced with traditional batch normalization layers in SLNet; 3) the Fully Convolutional Network (FCN) (Long et al., 2015); 4) UNet (Ronneberger et al., 2015); 5) Recurrent Residual UNet (R2UNet) (Alom et al., 2018); and 6) Residual UNet (ResUNet) (Xiao et al., 2018). We trained these networks with the CTP perfusion parameter maps F_o as input and used the cross entropy loss function for training.

Fig. 8 shows a visual comparison of segmentation results obtained by these networks, where the lesions are shown with the corresponding real DWI images for better visualization. It can be observed that it is challenging for all these networks to obtain a very accurate segmentation of the ischemic stroke lesion. However, the results of our SLNet have a better overlap with the ground truth than the others. In the first row, the difference between the networks is relatively small.
In the second row, SLNet w/o SE, SLNet w/o SN and UNet produced more under-segmentation than SLNet, while FCN, R2UNet and ResUNet produced more over-segmentation than SLNet. A quantitative comparison between these networks is shown in Table 3. The proposed SLNet achieved the highest average Dice score and Recall among all the compared networks, while SLNet w/o SE achieved slightly better HD and ASSD results.

Effect of Different Training Loss Functions for Segmentation

We also investigated the effect of different training loss functions for the segmentation network. We refer to our proposed combination of the weighted cross entropy loss and the hardness-aware generalized Dice loss as L_WCE + L_HGD and compare it with 1) the cross entropy loss L_CE; 2) the Dice loss L_DICE (Milletari et al., 2016); 3) the generalized Dice loss L_GD (Sudre et al., 2017); 4) the hardness-weighted L_GD, which is defined in Eq. 13 and referred to as L_HGD; and 5) a variant of the proposed loss that does not pay attention to the lesion foreground (i.e., A_i is 1 for every voxel), which is referred to as L_CE + L_HGD. We used each of these loss functions to train our SLNet to segment the ischemic stroke lesion from the CTP perfusion parameter maps F_o.

Quantitative evaluation results of these segmentation loss functions are listed in Table 4. It can be observed that the combination of L_CE and L_HGD outperforms using a single loss of L_CE or L_HGD. By enabling the network to focus more on the lesion region through L_WCE + L_HGD, the Recall and Dice values are improved. Our proposed L_WCE + L_HGD achieved the highest average Dice score of 59.37%, which is a large improvement from the 54.45% achieved by the L_CE baseline.
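The idea of combining a foreground-weighted cross entropy with a hardness-weighted generalized Dice term can be sketched as follows. This is an illustrative sketch only: the foreground weight and the hardness weighting 1 + |p − g| are assumptions and do not reproduce the paper's exact Eq. 13.

```python
import torch

def weighted_ce(prob, target, fg_weight=3.0, eps=1e-6):
    """Binary cross entropy with extra weight on lesion (foreground) voxels;
    the weight value is an illustrative choice, not the paper's setting."""
    w = 1.0 + (fg_weight - 1.0) * target
    ce = -(target * torch.log(prob + eps)
           + (1.0 - target) * torch.log(1.0 - prob + eps))
    return (w * ce).mean()

def hardness_generalized_dice(prob, target, eps=1e-6):
    """Generalized Dice (Sudre et al., 2017) with a simple hardness weight:
    voxels with larger prediction error count more (a sketch of the
    hardness-aware idea behind L_HGD)."""
    hard = 1.0 + (prob - target).abs()   # hardness-aware voxel weight
    inter = (hard * prob * target).sum()
    denom = (hard * (prob + target)).sum()
    return 1.0 - 2.0 * inter / (denom + eps)

def segmentation_loss(prob, target):
    """Pixel-wise (weighted CE) plus region-level (hardness Dice) terms."""
    return weighted_ce(prob, target) + hardness_generalized_dice(prob, target)
```

The cross entropy term penalizes every voxel independently, while the Dice term scores the lesion as a region; combining both is what lets the loss account for pixel-wise and region-level accuracy simultaneously.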
Effect of Feature Extractor and Pseudo DWI Generator on Segmentation

With our proposed feature extraction and image synthesis method, we evaluated the value of the pseudo DWI generated from F_o, F_l and F_h for ischemic stroke lesion segmentation, where this pseudo DWI is referred to as DWI_{o,l,h}. We compared
Figure 8: Visual comparison of different networks for ischemic stroke lesion segmentation. Concatenation of the CTP perfusion parameter maps (F_o) was used as the input of the CNNs and the cross entropy loss function was used for training. For better visualization, the segmentation results are shown with the real DWI images.

Table 4: Quantitative evaluation of different training loss functions for ischemic stroke lesion segmentation based on our proposed SLNet. Concatenation of perfusion parameter maps (F_o) was used as the input. L_CE: cross entropy loss. L_WCE: weighted cross entropy loss. L_DICE: Dice loss. L_GD: generalized Dice loss. L_HGD: hardness-aware generalized Dice loss.
Columns: Precision (%), Recall (%), Dice (%), HD (mm), ASSD (mm); rows: L_CE, L_DICE, L_GD, L_HGD, L_CE + L_HGD, L_WCE + L_HGD.

segmentation from DWI_{o,l,h} with segmentation from 1) the raw CTA images that were temporally cropped and down-sampled (i.e., I* as described in Section 3.1); 2) the CTP perfusion parameter maps F_o; 3) DWI_o, the pseudo DWI generated from F_o; 4) DWI_{o,l}, the pseudo DWI generated from F_o and F_l; and 5) the concatenation of F_o and DWI_{o,l,h}. We used each of these settings for end-to-end training respectively, where the overall loss function in Eq. 15 combined with our SLNet was used for segmentation. We also compared DWI_{o,l,h} with its variant DWI_{o,l,h}(s), in which Φ_e, Φ_g and Φ_s were trained sequentially rather than end-to-end. Additionally, we trained SLNet with real DWI images to investigate the gap between segmentation from synthesized pseudo DWI images and from real DWI images.

Fig. 9 presents a visual comparison of ischemic stroke lesion segmentation results from different input images, which shows that the results segmented from our synthesized pseudo DWI images are better than those of the other variants. Table 5 presents the quantitative evaluation results. It shows that using DWI_o generated from the CTP perfusion parameter maps leads to slightly decreased segmentation accuracy. By using the additional features F_l and F_h extracted from the raw spatiotemporal CTA images for synthesis, DWI_{o,l} and DWI_{o,l,h} each improve the Dice score. Table 5 shows that DWI_{o,l,h} outperformed the other variants. The average Dice scores for segmentation from the original CTA images, the perfusion parameter maps (i.e., F_o), the synthesized pseudo DWI based on our proposed method (i.e., DWI_{o,l,h}) and the real DWI are 56.10%, 59.37%, 62.11% and 79.72%, respectively.
The corresponding Hausdorff Distance values are 25.25 mm, 22.29 mm, 19.27 mm and 15.90 mm, respectively. We found that adding F_o to DWI_{o,l,h} leads to reduced segmentation performance compared with using DWI_{o,l,h} only. This is because F_o alone performs worse than DWI_{o,l,h}, and a combination of the two only obtains a segmentation accuracy above that of F_o and below that of DWI_{o,l,h}. It can be observed from Table 5 that DWI_{o,l,h} and DWI_{o,l,h}(s) obtained very close segmentation accuracy in terms of Dice. However, DWI_{o,l,h} achieved smaller HD and ASSD values than DWI_{o,l,h}(s). As ischemic stroke lesions vary largely in size, we also investigated the segmentation performance at different lesion scales. We divided the local testing set into three groups: 1) 9 images with small lesions (<
10 CC), 2) 10 images with medium lesions (10-50 CC) and 3) 4 images with large lesions (>
50 CC). For evaluation, we additionally measured the Relative Volume Error (RVE):
$RVE = \left| V_g - V_s \right| / V_g$, where $V_g$ and $V_s$ are the volumes of a ground truth lesion and the segmented lesion, respectively. Table 5 shows that DWI_{o,l,h} obtained a lower average RVE value than the others, except for the real DWI. Fig. 10 shows the distributions of Dice and RVE in these three groups. The average Dice values achieved by our proposed method (i.e., DWI_{o,l,h}) for these three groups were 59.50%, 68.87% and 56.44% respectively. The lower performance in the small and large groups indicates that it remains difficult for the proposed method to deal with extreme cases with small and very large lesions.
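The evaluation metrics Dice (Eq. 16), HD (Eq. 17), ASSD (Eq. 18) and RVE can be implemented directly from their definitions. The sketch below operates on binary masks and point sets; for brevity it takes all foreground voxel coordinates as the point sets, whereas the paper uses surface points extracted from the masks.

```python
import numpy as np

def dice(seg, gt):
    """Dice = 2*TP / (2*TP + FP + FN), equivalently 2*|A∩B| / (|A|+|B|)."""
    tp = np.logical_and(seg, gt).sum()
    return 2.0 * tp / (seg.sum() + gt.sum())

def _nearest_dists(a, b):
    """For each point in a, Euclidean distance to the nearest point in b."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1)

def hd_assd(seg_pts, gt_pts):
    """Hausdorff distance (Eq. 17) and average symmetric surface distance
    (Eq. 18) between two point sets (here: foreground voxel coordinates)."""
    d_sg = _nearest_dists(seg_pts, gt_pts)
    d_gs = _nearest_dists(gt_pts, seg_pts)
    hd = max(d_sg.max(), d_gs.max())
    assd = (d_sg.sum() + d_gs.sum()) / (len(seg_pts) + len(gt_pts))
    return hd, assd

def rve(seg, gt):
    """Relative volume error: |V_g - V_s| / V_g."""
    vg, vs = gt.sum(), seg.sum()
    return abs(float(vg) - float(vs)) / float(vg)
```

The brute-force pairwise distance is fine for small lesions; for full volumes, distance-transform-based implementations are the usual choice.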
Figure 9: Visual comparison of ischemic stroke lesion segmentation from different input images. Yellow and green curves show the segmentation and the ground truth, respectively. F_o: CTP perfusion parameter maps. DWI_o, DWI_{o,l} and DWI_{o,l,h} are pseudo DWI images generated from F_o, (F_o, F_l), and (F_o, F_l, F_h) respectively. DWI_{o,l,h}(s) is a variant of DWI_{o,l,h} in which Φ_e, Φ_g and Φ_s were trained sequentially rather than end-to-end. For better visualization, the segmentation results are shown with the real DWI images.

Table 5: Quantitative comparison of ischemic stroke lesion segmentation from different input images. F_o: perfusion parameter maps. DWI_{o,l,h} is our proposed method with pseudo DWI synthesized from (F_o, F_l, F_h) as shown in Fig. 2. DWI_{o,l,h}(s) is a variant of DWI_{o,l,h} in which Φ_e, Φ_g and Φ_s were trained sequentially rather than end-to-end. The results are based on our proposed SLNet and the loss function L_s defined in Eq. 11. Columns: Precision (%), Recall (%), Dice (%), HD (mm), ASSD (mm), RVE; rows: CTA, F_o, DWI_o, DWI_{o,l}, DWI_{o,l,h}, DWI_{o,l,h}(s), F_o + DWI_{o,l,h}, Real DWI.

Table 6: Quantitative comparison of the top five methods on the ISLES 2018 testing set.
Columns: Dice, Precision, Recall; rows: ours and the four other top-ranked methods.

We also trained our proposed method with the entire ISLES 2018 training set, and submitted the segmentation results for the ISLES 2018 testing set to the online evaluation platform for quantitative evaluation. According to the ISLES 2018 leaderboard, our method achieved the top performance among 62 teams. Table 6 lists the quantitative evaluation results of the top five methods of ISLES 2018, where our method outperformed the others with an average Dice score of 0.51. Liu (2018) also used a CNN to generate pseudo DWI for segmentation, but only from the CTP perfusion parameter maps with a GAN, and the achieved Dice and Recall are lower than ours. The other three methods segmented the ischemic stroke lesion directly from the CTP perfusion parameter maps. Chen et al. used an ensemble of multiple networks combined with several data augmentation methods. Hu et al. proposed a multi-level 3D refinement module trained with curriculum learning. Clerigues et al. also used an ensemble of multiple networks, and employed a patch sampling strategy to alleviate class imbalance.
5. Discussion and Conclusion
Due to the low contrast and low resolution of CTP perfusion parameter maps, it is challenging to directly use these images for ischemic stroke lesion segmentation. Transferring the perfusion parameter maps to pseudo DWI images via image synthesis is a promising way to approach the segmentation task, as DWI images have a better contrast between the lesion and the background, and they are the images used for obtaining the ground truth ischemic stroke lesion region. The ISLES 2018 finalists and our experiments showed that pseudo DWI-based segmentation methods outperformed direct segmentation from perfusion parameter maps.

Figure 10: Dice and RVE for lesions at three scales ((a) small, <10 CC; (b) medium, 10 to 50 CC; (c) large, >50 CC) segmented from different types of images. F_o: perfusion parameter maps. DWI_{o,l,h} is our proposed method with pseudo DWI synthesized from (F_o, F_l, F_h) as shown in Fig. 2. DWI_{o,l,h}(s) is a variant of DWI_{o,l,h} in which Φ_e, Φ_g and Φ_s were trained sequentially rather than end-to-end. The results are based on our proposed SLNet and loss function L_s.

The quality of the synthesized pseudo DWI images has a large impact on the segmentation performance. A good contrast with enhanced and preserved lesion information in the pseudo DWI is important for good segmentation results. Though deep learning for image synthesis has achieved very good performance in other tasks (Frangi et al., 2018), the synthesis of pseudo DWI with ischemic stroke lesions in this study is still challenging due to the low quality of the perfusion parameter maps and the small number of training images. To alleviate this problem, we used two strategies. First, we exploited information in the raw spatiotemporal CTA images by extracting low-level and high-level features in addition to the perfusion parameter maps. Results show that this helps to obtain higher pseudo DWI quality and higher segmentation accuracy than using the perfusion parameter maps only, as demonstrated in Table 2 and Table 5. From Fig. 7 and Table 2, we find that using explicit supervision on the feature extractor leads to some improvement of segmentation accuracy, but the difference was not significant. This phenomenon is expected, as the explicit supervision serves as a form of deep supervision.
When it is not used, the feature extractor can still be updated based on the loss function, and the deep supervision mainly helps to improve convergence during training. Second, we designed a weighted loss function that pays attention to the lesion region so that the quality of the generated lesion is emphasized. It is combined with a high-level contextual loss function that encourages global and high-level consistency between the generated pseudo DWI and the ground truth DWI. Results in Table 1 show that this leads to an improvement of local SSIM around the lesion region. However, we found that our synthesized pseudo DWI images are still not as good as the real DWI images. For example, Table 1 and Table 2 indicate that the PSNR values are not very high. This is mainly because the high-frequency components in the real DWI images are not well synthesized, as shown in Fig. 6 and Fig. 7. The high-frequency components are related to local fine-grained details, noise and some artifacts. As demonstrated by Xu et al. (2019), CNNs capture low-frequency components at the early stage of training, and then capture high-frequency components and tend to overfit at the late stage of training. During training with our relatively small dataset, we used the best-performing checkpoint on the validation set for testing to minimize the risk of under-fitting or over-fitting. As an incidental effect, we found that the synthesized pseudo DWI images from that checkpoint did not contain many high-frequency components. It is therefore of interest to further improve the pseudo DWI quality, which promises to yield better segmentation results.
As the synthesized pseudo DWI and real DWI can be regarded as coming from two different domains, domain adaptation methods (Perone et al., 2019) could be used in the future to obtain better segmentation performance with pseudo DWI.

For the segmentation network, using switchable normalization and the SE block based on channel attention improved the segmentation Dice and Recall with only a marginal increase in the number of parameters, as shown in Table 3. The loss function for training the segmentation network also has a large impact on the segmentation performance. Our weighted cross entropy loss function L_WCE pays more attention to the lesion region and helps to alleviate the imbalance between the foreground and the background. The hardness-aware generalized Dice loss L_HGD pays more attention to hard voxels. The combination of L_WCE and L_HGD considers pixel-wise and region-level accuracy simultaneously, which leads to better Dice, Recall and ASSD than the other variants, as shown in Table 4. It should be noted that the Hausdorff distance of our results is still high. To address this problem, Hausdorff distance-based loss functions (Kervadec et al., 2019a) or high-level constraints (Oktay et al., 2018) are potential solutions.

Our high-level feature extraction, pseudo DWI generation and lesion segmentation modules are trained end-to-end, so that they are updated simultaneously and adapt to each other with high coherence. This makes the training process more efficient than training these modules sequentially. Results in Fig. 9 and Table 5 show that the end-to-end training also benefits the final segmentation performance. However, a drawback of end-to-end training is that these modules become less portable, as a change of the segmentation network requires the whole system to be trained again. Sequential training would make the system more modular and is preferable in scenarios where there is a high demand for replacing some of these modules.
For example, the segmentation network could be replaced when more training images become available, without retraining the feature extractor and pseudo DWI generator. In this paper, as the training set was small and fixed during the study, we chose the end-to-end training strategy for its efficiency and better segmentation performance.

Comparing Table 5 and Table 6, we observe a performance drop between our local testing set and the official testing set of ISLES 2018, which indicates some overfitting of the proposed method. The overfitting could be attributed to a couple of reasons. First, the training set was relatively small and each image contained only 5.34 slices on average. Second, our method relies on image synthesis as an intermediate step, and there might be a domain shift between the synthesized pseudo DWI images and real DWI images. The two steps of synthesis and segmentation are prone to accumulating prediction errors and increasing the possibility of overfitting. To deal with this problem, advanced data augmentation methods (Abdulkadir et al., 2016; Frid-Adar et al., 2018) and additional regularization such as auxiliary tasks (Myronenko, 2018) and volume constraints (Kervadec et al., 2019b) could be potential approaches. Fig. 10 shows that the proposed method did not segment large lesions well, which is mainly because the large lesion group contained only a few cases (i.e., 4 images for testing), so a statistically meaningful evaluation of the segmentation performance for that group was not possible. In the future, a larger dataset could be used for a better evaluation.

In conclusion, to deal with the problem of ischemic stroke lesion segmentation from CTP images, we propose a novel framework using synthesized pseudo DWI images for better segmentation results.
We propose a feature extractor that obtains both a low-level and a high-level compact representation of the raw spatiotemporal CTA images, and combine them with the CTP perfusion parameter maps for better pseudo DWI synthesis quality. We also propose to pay more attention to the lesion region and encourage high-level similarity for the synthesis of pseudo DWI with stroke lesions. A network with switchable normalization and channel calibration, trained with the hardness-aware generalized Dice loss, is proposed for the final segmentation from the synthesized pseudo DWI. Extensive experimental results on the ISLES 2018 dataset showed that our method using synthesized pseudo DWI outperformed methods using CTA images or perfusion parameter maps directly for ischemic stroke lesion segmentation, and demonstrated that our feature extractor helps to obtain better synthesized pseudo DWI quality, which leads to higher segmentation accuracy. The proposed automatic segmentation framework has the potential to improve diagnosis and treatment of ischemic stroke in a timely fashion, especially in acute units with limited availability of DWI scanning.
6. Acknowledgements
This work was supported by National Natural Science Foundation of China funding [81771921, 61901084].
References
Abdulkadir, A., Lienkamp, S.S., Brox, T., Ronneberger, O., 2016. 3D U-Net: Learning dense volumetric segmentation from sparse annotation, in: MICCAI, pp. 424–432.
Abulnaga, S.M., Rubin, J., 2018. Ischemic stroke lesion segmentation in CT perfusion scans using pyramid pooling and focal loss, in: Int. MICCAI Brainlesion Work., pp. 352–363.
Alom, M.Z., Hasan, M., Yakopcic, C., Taha, T.M., Asari, V.K., 2018. Recurrent residual convolutional neural network based on U-Net (R2U-Net) for medical image segmentation. arXiv Prepr. arXiv:1802.06955.
Bahrami, K., Shi, F., Zong, X., Shin, H.W., An, H., Shen, D., 2016. Reconstruction of 7T-like images from 3T MRI. IEEE Trans. Med. Imaging 35, 2085–2097.
Burgos, N., Cardoso, M.J., Thielemans, K., Modat, M., Pedemonte, S., Dickson, J., Barnes, A., Ahmed, R., Mahoney, C.J., Schott, J.M., Duncan, J.S., Atkinson, D., Arridge, S.R., Hutton, B.F., Ourselin, S., 2014. Attenuation correction synthesis for hybrid PET-MR scanners: application to brain studies. IEEE Trans. Med. Imaging 33, 2332–2341.
Chartsias, A., Joyce, T., Giuffrida, M.V., Tsaftaris, S.A., 2017. Multimodal MR synthesis via modality-invariant latent representation. IEEE Trans. Med. Imaging 37, 803–814.
Cui, W., Liu, Y., Li, Y., Guo, M., Li, Y., Li, X., Wang, T., Zeng, X., Ye, C., 2019. Semi-supervised brain lesion segmentation with an adapted mean teacher model, in: IPMI, pp. 554–565.
Dolz, J., Ben Ayed, I., Desrosiers, C., 2018. Dense multi-path U-Net for ischemic stroke lesion segmentation in multiple image modalities, in: Int. MICCAI Brainlesion Work., pp. 271–282.
Donahue, J., Wintermark, M., 2015. Perfusion CT and acute stroke imaging: foundations, applications, and literature review. J. Neuroradiol. 42, 21–29.
Feng, C., Zhao, D., Huang, M., 2015. Segmentation of ischemic stroke lesions in multi-spectral MR images using weighting suppressed FCM and three phase level set, in: Int. Work. Brainlesion Glioma, Mult. Sclerosis, Stroke Trauma. Brain Inj., pp.
233–245.
Frangi, A.F., Tsaftaris, S.A., Prince, J.L., 2018. Simulation and synthesis in medical imaging. IEEE Trans. Med. Imaging 37, 673–679.
Frid-Adar, M., Diamant, I., Klang, E., Amitai, M., Goldberger, J., Greenspan, H., 2018. GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing 321, 321–331.
Ghosh, A., Kumar, H., Sastry, P.S., 2017. Robust loss functions under label noise for deep neural networks, in: AAAI, pp. 1919–1925.
Gillebert, C.R., Humphreys, G.W., Mantini, D., 2014. Automated delineation of stroke lesions using brain CT images. NeuroImage Clin. 4, 540–548.
Glorot, X., Bengio, Y., 2010. Understanding the difficulty of training deep feedforward neural networks, in: AISTATS, pp. 249–256.
González, R.G., Hirsch, J.A., Lev, M.H., Schaefer, P.W., Schwamm, L.H., 2011. Acute ischemic stroke: imaging and intervention. Springer, Berlin, Heidelberg.
Hu, J., Shen, L., Sun, G., 2018. Squeeze-and-excitation networks, in: CVPR, pp. 7132–7141.
Isensee, F., Kickingereder, P., Wick, W., Bendszus, M., Maier-Hein, K.H., 2018. No new-net, in: Int. MICCAI Brainlesion Work., pp. 234–244.
Jog, A., Carass, A., Roy, S., Pham, D.L., Prince, J.L., 2017. Random forest regression for magnetic resonance image synthesis. Med. Image Anal. 35, 475–488.
Kabir, Y., Dojat, M., Scherrer, B., Forbes, F., Garbay, C., 2007. Multimodal MRI segmentation of ischemic stroke lesions, in: EMBS, pp. 1595–1598.
Kamnitsas, K., Ledig, C., Newcombe, V.F.J., Simpson, J.P., Kane, A.D., Menon, D.K., Rueckert, D., Glocker, B., 2017. Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Med. Image Anal. 36, 61–78.
Ker, J., Wang, L., Rao, J., Lim, T., 2017. Deep learning applications in medical image analysis. IEEE Access 6, 9375–9389.
Kervadec, H., Bouchtiba, J., Desrosiers, C., Granger, E., Dolz, J., Ayed, I.B., 2019a. Boundary loss for highly unbalanced segmentation, in: Int. Conf. Med. Imaging with Deep Learn., pp.
285–296.
Kervadec, H., Dolz, J., Tang, M., Granger, E., Boykov, Y., Ben Ayed, I., 2019b. Constrained-CNN losses for weakly supervised segmentation. Med. Image Anal. 54, 88–99.
Kissela, B.M., Khoury, J.C., Alwell, K., Moomaw, C.J., Woo, D., Adeoye, O., Flaherty, M.L., Khatri, P., Ferioli, S., De Los Rios La Rosa, F., Broderick, J.P., Kleindorfer, D.O., 2012. Age at stroke: temporal trends in stroke incidence in a large, biracial population. Neurology 79, 1781–1787.
Li, X., Chen, H., Qi, X., Dou, Q., Fu, C.W., Heng, P.A., 2018. H-DenseUNet: Hybrid densely connected UNet for liver and liver tumor segmentation from CT volumes. IEEE Trans. Med. Imaging 37, 2663–2674.
Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollar, P., 2017. Focal loss for dense object detection, in: ICCV, pp. 2980–2988.
Liu, P., 2018. Stroke lesion segmentation with 2D novel CNN pipeline and novel loss function, in: Int. MICCAI Brainlesion Work., pp. 253–262.
Long, J., Shelhamer, E., Darrell, T., 2015. Fully convolutional networks for semantic segmentation, in: CVPR, pp. 3431–3440.
Luo, P., Ren, J., Peng, Z., Zhang, R., Li, J., 2018. Differentiable learning-to-normalize via switchable normalization. arXiv Prepr. arXiv:1806.10779.
Maier, O., Menze, B.H., von der Gablentz, J., Häni, L., Heinrich, M.P., Liebrand, M., Winzeck, S., Basit, A., Bentley, P., Chen, L., Christiaens, D., Dutil, F., Egger, K., Feng, C., Glocker, B., Götz, M., Haeck, T., Halme, H.L., Havaei, M., Iftekharuddin, K.M., Jodoin, P.M., Kamnitsas, K., Kellner, E., Korvenoja, A., Larochelle, H., Ledig, C., Lee, J.H., Maes, F., Mahmood, Q., Maier-Hein, K.H., McKinley, R., Muschelli, J., Pal, C., Pei, L., Rangarajan, J.R., Reza, S.M., Robben, D., Rueckert, D., Salli, E., Suetens, P., Wang, C.W., Wilms, M., Kirschke, J.S., Krämer, U.M., Münte, T.F., Schramm, P., Wiest, R., Handels, H., Reyes, M., 2017. ISLES 2015 - A public evaluation benchmark for ischemic stroke lesion segmentation from multispectral MRI. Med. Image Anal.
35, 250–269.
Maier, O., Schröder, C., Forkert, N.D., Martinetz, T., Handels, H., 2015. Classifiers for ischemic stroke lesion segmentation: a comparison study. PLoS One 10, e0145118.
Maier, O., Wilms, M., von der Gablentz, J., Krämer, U., Handels, H., 2014. Ischemic stroke lesion segmentation in multi-spectral MR images with support vector machine classifiers, in: SPIE Med. Imaging 2014 Comput. Diagnosis, p. 903504.
Mao, X., Li, Q., Xie, H., Lau, R.Y.K., Wang, Z., 2017. Least squares generative adversarial networks, in: ICCV, pp. 2794–2802.
Mezzapesa, D.M., Petruzzellis, M., Lucivero, V., Prontera, M., Tinelli, A., Sancilio, M., Carella, A., Federico, F., 2006. Multimodal MR examination in acute ischemic stroke. Neuroradiology 48, 238–246.
Milletari, F., Navab, N., Ahmadi, S.A., 2016. V-Net: Fully convolutional neural networks for volumetric medical image segmentation, in: IC3DV, pp. 565–571.
Mitra, J., Bourgeat, P., Fripp, J., Ghose, S., Rose, S., Salvado, O., Connelly, A., Campbell, B., Palmer, S., Sharma, G., Christensen, S., Carey, L., 2014. Lesion segmentation from multimodal MRI using random forest following ischemic stroke. Neuroimage 98, 324–335.
Murayama, K., Suzuki, S., Matsukiyo, R., Takenaka, A., Hayakawa, M., Tsutsumi, T., Fujii, K., Katada, K., Toyama, H., 2018. Preliminary study of time maximum intensity projection computed tomography imaging for the detection of early ischemic change in patient with acute ischemic stroke. Medicine (Baltimore) 97, e9906.
Myronenko, A., 2018. 3D MRI brain tumor segmentation using autoencoder regularization, in: Int. MICCAI Brainlesion Work., pp. 349–356.
Nguyen, H.V., Zhou, K., Vemulapalli, R., 2015. Cross-domain synthesis of medical images using efficient location-sensitive deep network, in: MICCAI, pp. 677–684.
Nie, D., Trullo, R., Lian, J., Wang, L., Petitjean, C., Ruan, S., Wang, Q., Shen, D., 2018. Medical image synthesis with deep convolutional adversarial networks. IEEE Trans. Biomed. Eng.
65, 2720–2730.Oktay, O., Ferrante, E., Kamnitsas, K., Heinrich, M., Bai, W., Caballero, J.,Cook, S., Marvao, A.D., Dawes, T., Regan, D.O., Kainz, B., Glocker, B.,Rueckert, D., 2018. Anatomically constrained neural networks (ACNN):Application to cardiac image enhancement and segmentation. IEEE Trans.Med. Imaging 37, 384–395.Perone, C.S., Ballester, P., Barros, R.C., Cohen-Adad, J., 2019. Unsuperviseddomain adaptation for medical imaging segmentation with self-ensembling.Neuroimage 194, 1–11.Pinheiro, G.R., Voltoline, R., Bento, M., Rittner, L., 2018. V-Net and U-Netfor ischemic stroke lesion segmentation in a small dataset of perfusion data,in: Int. MICCAI Brainlesion Work., pp. 301–309.Rekik, I., Allassonni`ere, S., Carpenter, T.K., Wardlaw, J.M., 2012. Medicalimage analysis methods in MR / CT-imaged acute-subacute ischemic strokelesion: Segmentation, prediction and insights into dynamic evolution simu-lation models. A critical appraisal. NeuroImage Clin. 1, 164–178.Ronneberger, O., Fischer, P., Brox, T., 2015. U-Net: Convolutional networksfor biomedical image segmentation, in: MICCAI, pp. 234–241.Roy, S., Carass, A., Shiee, N., Pham, D.L., Prince, J.L., 2010. MR contrastsynthesis for lesion segmentation, in: ISBI, IEEE. pp. 932–935.Shen, D., Wu, G., Suk, H.I., 2017. Deep learning in medical image analysis.Annu. Rev. Biomed. Eng. 19, 221–248.Song, T., Huang, N., 2018. Integrated extractor, generator and segmentor forischemic stroke lesion segmentation, in: Int. MICCAI Brainlesion Work.,pp. 310–318.Sudre, C.H., Li, W., Vercauteren, T., Ourselin, S., Cardoso, M.J., 2017. Gen-eralised Dice overlap as a deep learning loss function for highly unbalancedsegmentations, in: Deep Learn. Med. Image Anal. Multimodal Learn. Clin.Decis. Support, pp. 240–248.Szegedy, C., Vanhoucke, V., Io ff e, S., Shlens, J., Wojna, Z., 2016. Rethinkingthe Inception Architecture for Computer Vision, in: CVPR, pp. 2818–2826.Tieleman, T., Hinton, G., 2012. 
Lecture 6.5-RMSProp, COURSERA: Neuralnetworks for machine learning. Technical Report. University of Toronto.Ting-Chun Wang, Liu, M.Y., Jun-Yan Zhu, Andrew Tao, Jan Kautz, BryanCatanzaro, 2018. High-resolution image synthesis and semantic manipu-lation with conditional GANs, in: CVPR, pp. 8798–8807.Vikas Kumar Anand, Khened, M., Alex, V., Krishnamurthi, G., 2018. Fullyautomatic segmentation for ischemic stroke using CT perfusion maps, in:Int. MICCAI Brainlesion Work., pp. 328–334.Winzeck, S., Hakim, A., McKinley, R., Pinto, J.A., Alves, V., Silva, C., Pisov,M., Krivov, E., Belyaev, M., Monteiro, M., Oliveira, A., Choi, Y., Paik,M.C., Kwon, Y., Lee, H., Kim, B.J., Won, J.H., Islam, M., Ren, H., Robben,D., Suetens, P., Gong, E., Niu, Y., Xu, J., Pauly, J.M., Lucas, C., Heinrich,M.P., Rivera, L.C., Castillo, L.S., Daza, L.A., Beers, A.L., Arbelaezs, P.,Maier, O., Chang, K., Brown, J.M., Kalpathy-Cramer, J., Zaharchuk, G.,Wiest, R., Reyes, M., 2018. ISLES 2016 and 2017-benchmarking ischemicstroke lesion outcome prediction based on multispectral MRI. Front. Neurol.9, 679.Xiao, X., Lian, S., Luo, Z., Li, S., 2018. Weighted Res-UNet for high-quality retina vessel segmentation, in: Int. Conf. Inf. Technol. Med. Educ.,Hangzhou. pp. 327–331.Xu, Z.Q.J., Zhang, Y., Xiao, Y., 2019. Training behavior of deep neural networkin frequency domain, in: ICONIP, pp. 264–274.Yahiaoui, A.F.Z., Bessaid, A., 2016. Segmentation of ischemic stroke area fromCT brain images. ISIVC , 13–17.Zaharchuk, G., El Mogy, I.S., Fischbein, N.J., Albers, G.W., 2012. Comparisonof arterial spin labeling and bolus perfusion-weighted imaging for detectingmismatch in acute stroke. Stroke 43, 1843–1848.e, S., Shlens, J., Wojna, Z., 2016. Rethinkingthe Inception Architecture for Computer Vision, in: CVPR, pp. 2818–2826.Tieleman, T., Hinton, G., 2012. Lecture 6.5-RMSProp, COURSERA: Neuralnetworks for machine learning. Technical Report. 
University of Toronto.Ting-Chun Wang, Liu, M.Y., Jun-Yan Zhu, Andrew Tao, Jan Kautz, BryanCatanzaro, 2018. High-resolution image synthesis and semantic manipu-lation with conditional GANs, in: CVPR, pp. 8798–8807.Vikas Kumar Anand, Khened, M., Alex, V., Krishnamurthi, G., 2018. Fullyautomatic segmentation for ischemic stroke using CT perfusion maps, in:Int. MICCAI Brainlesion Work., pp. 328–334.Winzeck, S., Hakim, A., McKinley, R., Pinto, J.A., Alves, V., Silva, C., Pisov,M., Krivov, E., Belyaev, M., Monteiro, M., Oliveira, A., Choi, Y., Paik,M.C., Kwon, Y., Lee, H., Kim, B.J., Won, J.H., Islam, M., Ren, H., Robben,D., Suetens, P., Gong, E., Niu, Y., Xu, J., Pauly, J.M., Lucas, C., Heinrich,M.P., Rivera, L.C., Castillo, L.S., Daza, L.A., Beers, A.L., Arbelaezs, P.,Maier, O., Chang, K., Brown, J.M., Kalpathy-Cramer, J., Zaharchuk, G.,Wiest, R., Reyes, M., 2018. ISLES 2016 and 2017-benchmarking ischemicstroke lesion outcome prediction based on multispectral MRI. Front. Neurol.9, 679.Xiao, X., Lian, S., Luo, Z., Li, S., 2018. Weighted Res-UNet for high-quality retina vessel segmentation, in: Int. Conf. Inf. Technol. Med. Educ.,Hangzhou. pp. 327–331.Xu, Z.Q.J., Zhang, Y., Xiao, Y., 2019. Training behavior of deep neural networkin frequency domain, in: ICONIP, pp. 264–274.Yahiaoui, A.F.Z., Bessaid, A., 2016. Segmentation of ischemic stroke area fromCT brain images. ISIVC , 13–17.Zaharchuk, G., El Mogy, I.S., Fischbein, N.J., Albers, G.W., 2012. Comparisonof arterial spin labeling and bolus perfusion-weighted imaging for detectingmismatch in acute stroke. Stroke 43, 1843–1848.