Spectral Decomposition in Deep Networks for Segmentation of Dynamic Medical Images

Abstract

Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) is a multi-phase technique routinely used in clinical practice. DCE and similar dynamic medical datasets tend to contain redundant information in their spatial and temporal components that may not be relevant for detecting the object of interest. This redundancy results in unnecessarily complex computer models with long training times, which may also under-perform at test time due to the abundance of noisy, heterogeneous data. This work attempts to increase the training efficacy and performance of deep networks by identifying redundant information in the spatial and spectral components, and shows that segmentation accuracy can be maintained and potentially improved. Reported experiments evaluate training/testing efficacy on a heterogeneous dataset of abdominal images from pediatric DCE patients, showing that drastic data reduction (more than 80%) can preserve the dynamic information and the performance of the segmentation model while effectively suppressing noise and unwanted portions of the images.


Edgar A. Rios Piedra, Morteza Mardani, Frank Ong, Ukash Nakarmi, Joseph Y. Cheng, and Shreyas Vasanawala

Department of Radiology, Stanford University, CA, United States
Department of Electrical Engineering, Stanford University, CA, United States


Keywords:

Deep networks, data reduction, kidney, heterogeneity

Dynamic contrast-enhanced (DCE) MRI is a multi-phase imaging technique that generates time-series images of the body and is widely used in the clinic. These images usually undergo manual segmentation of the object of interest (e.g., organ, tumor, other) to later extract clinically meaningful biological information that is used for diagnosis (e.g., functional status, anatomical characteristics, other bio-markers). In lieu of manual segmentation, deep neural networks have been shown to outperform previous methods [1,2] by significant margins, but several challenges remain.

One challenge is the complexity of the data that goes into the model, which is general to the domain of medical images and especially applicable to dynamic information. Much of the large amount of acquired data can be redundant in both the spatial and temporal components, which may unnecessarily increase the complexity of the model and also substantially increase the time it takes the radiologist to manually process the data [10].

Additionally, image heterogeneity poses a special challenge in medical imaging given the variety of sources from which it can originate, including physical artifacts (ringing, chemical shift, ghosting), motion blur (respiratory, cardiac, patient movement), presence of pathological cases, and resolution and contrast between tissues, among others [4,5]. Image artifacts and heterogeneous images are inherently more likely to be observed in time-series dynamic data such as DCE-MRI, which generates large 4D datasets with multiple observations of the area of interest.

For this work, we hypothesize that large dynamic datasets can be used more efficiently by applying a series of spatial and spectral processing steps to reduce the heterogeneity observed in the temporal domain (across phases) and the spatial domain (noise, movement, artifacts).
More specifically, the main spectral components together with localized segmentation can lead to a smaller, noise-reduced and time-invariant input for the subsequent deep architecture (U-Net, V-Net, CNNs, etc.).

Pediatric kidney DCE-MRI represents an appropriate domain to test this hypothesis given that the proportion of acquired versus useful data is large [?]. These kinds of datasets are more likely to be affected by movement over time (infants tend to move more than adults) and, additionally, show a higher degree of heterogeneity resulting from the presence of some abnormality (e.g., polycystic kidney disease, genetic variations, cysts, tumors, etc.) [4,7].

Reported experiments include the evaluation of training/testing efficacy on a heterogeneous DCE-MRI dataset composed of abdominal (kidney) images of pathologic pediatric patients. Results compare the performance of a modified U-Net architecture under different scenarios, including the reference performance when the full DCE input is used for segmentation and the performance when a low-rank spectral decomposition at different image spatial scales is used. The main contribution of this paper is that we were able to show that this methodology can be applied to dynamic medical datasets to effectively suppress noise and irrelevant data while achieving a segmentation performance comparable to that obtained when using the full data.

During a DCE scan, multiple 3D MRI scans (phases) are acquired after the intravenous injection of a contrast agent (e.g., Gadolinium) with the goal of observing its flow throughout the kidney to detect regions of abnormality (e.g., segmentation of the renal compartments) [6,8].
During each phase, the flow of the contrast agent is captured as the relaxation characteristics of nearby tissues change over time; these changes are observed as hyper-intense structures in proportion to the amount of contrast agent present (perfusion through blood flow). Figure 1 (upper rows) shows the overall DCE process.

The dataset was collected with IRB approval and consists of 40 high-resolution multi-contrast pediatric DCE cases, of which 25 were used for training, 5 for validation and 10 for testing. These cases included some degree of abnormality (e.g., hydronephrosis, polycystic kidney disease, congenital anomalies) that introduces heterogeneity into our dataset. Manually delineated regions of interest (ROI) were generated for both kidneys by an expert technologist, with subsequent radiologist editing, to train the system and assess its performance.

Imaging was performed using a multi-phase 3D modified SPGR sequence with motion navigation and intermittent spectrally selective fat-inversion pulses; VDRad sampling patterns were used during the contrast injection. Acquisition parameters were: minimum echo time (TE) 1.2-1.6 ms, repetition time (TR) 3.0-3.7 ms, flip angle 15 degrees, bandwidth (BW) 100 kHz, slice thickness 0.9-1.2 mm, FOV 20-44 cm, spatial resolution 0.8 x 0.8 to 1.4 x 1.4 mm, and a total acceleration factor of 7.8-8.0. A total of 50 phases of 100 images each were acquired for each case.

Network architecture:

The architecture used is a multi-channel U-Net based on [9], as it has been shown to perform robustly and produce accurate results in volumetric medical imaging scenarios [10,11,12]. The model consists of three 3x3 convolution layers on the contracting path, with a ReLU and a max pooling layer (stride 2) at the end of each block. The upstream network has a similar configuration, with a 2x2 up-convolution step and 3x3 convolution layers. The final layer at the end of the upstream side is a 1x1 fully connected layer that produces pixel-wise scores for the ROIs of size x_i, y_i for a number N of slices that matches the input image volumes. An illustration of the network and its different inputs is shown in Figure 2.

The network was trained with a cross-entropy loss function and the Adam optimizer using Keras with TensorFlow on an NVIDIA GTX TITAN GPU, with images of size 256x256 pixels and patches of size 48, 24, 12 and 6 on both the down- and up-streams, for 100 epochs (early stopping enabled).

This section describes the process used to perform segmentation at multiple image scales around the kidney region, the spectral decomposition of the DCE-MRI into its main components, the different combinations tested in each scenario as different sets of n main components are used to train the system, and the comparison with the reference segmentation performance obtained using the full DCE data.

Fig. 1. Comparison of DCE and spectral decomposition. The upper rows show some of the phases that compose the 4D DCE-MRI. It can be observed how the contrast agent flows from the vascular system to perfuse the kidneys (cortex, then the inner medulla) and later moves on to the collecting system. The lower rows show the reconstruction of the images once the main components have been calculated using SVD. Different areas of hyperintensity can be seen in the first sets of components, representing the dynamic enhancement with reduced heterogeneity. Background noise is captured in the latter components.

To decompose the input dynamic dataset we employed the well-known singular value decomposition (SVD) method [13]. This method can be used to separate the signal into its main components with respect to their contribution to the overall variation encoded in the image. In the case of dynamic DCE-MRI, the input is reshaped into a 2D matrix M of shape P × (x · y · z), where P is the number of phases (50 in this case) and x, y, and z represent the 3D dimensions of each phase. The singular value decomposition of M is then given by

$$ M = U \Sigma V^{T} \qquad (1) $$

where U contains the left singular vectors, Σ is a diagonal matrix holding the singular values, and V^T contains the right singular vectors.

Afterward, the information contained in U, ordered according to the relative importance of each component encoded in Σ, can be reshaped into SVD input images, as its coefficients determine how much information from the input is retained in each component. In summary, the first (low-rank) components contain the most relevant information observed across the 50 phases, leaving noise (random information) and artifacts in the last components, as these provide the least information. Consequently, the SVD output is time-invariant, as the contrast enhancement information is aggregated in the main components. Figure 1 (lower rows) shows the output images from this step, which are used as input to the deep-learning model.

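
The decomposition in Eq. (1) can be sketched with NumPy as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the function name `svd_components`, the parameter `n_keep`, and the random rank-2 test series are all assumptions made for the example. Note also that when M is laid out as P × (x · y · z), as stated above, the spatial patterns are the rows of V^T (scaled here by their singular values); with the transposed layout they would be the columns of U, which matches the wording of the text.

```python
import numpy as np

def svd_components(dce, n_keep):
    """Return the first n_keep spatial SVD components of a 4D DCE series.

    dce has shape (P, z, y, x): one 3D volume per phase.
    Assumes n_keep <= min(P, z*y*x).
    """
    n_phases = dce.shape[0]
    # Flatten every phase into one row, so M has shape P x (x*y*z).
    M = dce.reshape(n_phases, -1)
    # Thin SVD; singular values in s are returned in descending order.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Rows of Vt are spatial patterns; weight each by its singular value.
    comps = s[:n_keep, None] * Vt[:n_keep]
    return comps.reshape((n_keep,) + dce.shape[1:]), s

# Synthetic stand-in for a DCE series: two enhancement curves times two maps.
rng = np.random.default_rng(0)
P, vol_shape = 50, (4, 8, 8)
curves = rng.random((P, 2))
maps = rng.random((2, 4 * 8 * 8))
dce = (curves @ maps).reshape((P,) + vol_shape)

comps, s = svd_components(dce, n_keep=8)
# Cumulative fraction of the total variation captured by the first k components.
energy = np.cumsum(s**2) / np.sum(s**2)
print(comps.shape, energy[:3])
```

Because the synthetic series is exactly rank 2, the first two components capture essentially all of the variation; on real DCE data the cumulative `energy` curve is one possible guide for choosing how many components to keep.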
Spatial data reduction can further help to limit the regions where feature learning occurs and allow more specific texture patterns to be detected. For this purpose, a localized-segmentation approach that segments the input images at multiple scales was used. The overall idea is to center the learning process on the fragments that contain the region of interest and to test the performance under different image partitions or sizes during training. This approach has been found useful in MRI and other image domains in cases where local features can provide better results than the patterns observed at the global scale [14,15].

Three different iterations of the network were created, each using a different degree of locality and each centered around the kidney region (structure of interest): a small region that included only the internal kidney components (e.g., medulla, cortex), a medium one that also contained some of the periphery outside the kidney cortex, and a larger one that covered a wider portion of the thoracic region [16]. Figure 2 shows the network architecture as well as the multi-scale and SVD inputs to the network. For each case, a kidney region of interest (ROI) was found and then compared to the manual gold standard to obtain the segmentation accuracy according to the Dice coefficient [17],

$$ D = \frac{2\,TP}{2\,TP + FP + FN} = \frac{2\,|A \cap B|}{|A| + |B|} \qquad (2) $$

where D is the Dice coefficient, TP, FP and FN are the numbers of true-positive, false-positive and false-negative pixels, A is the automatically generated segmentation and B is the manually generated gold standard.
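
As a concrete reading of Eq. (2), the following sketch computes the Dice coefficient for two small binary masks. The function name `dice`, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; the paper does not describe its evaluation code.

```python
import numpy as np

def dice(pred, truth):
    """Dice coefficient between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.count_nonzero(pred & truth)    # pixels labeled in both masks
    fp = np.count_nonzero(pred & ~truth)   # labeled only in the prediction
    fn = np.count_nonzero(~pred & truth)   # labeled only in the gold standard
    denom = 2 * tp + fp + fn               # equals |A| + |B|
    # Both masks empty: treated as perfect agreement (an assumed convention).
    return 1.0 if denom == 0 else 2.0 * tp / denom

a = np.array([[1, 1, 0],
              [0, 1, 0]])   # toy automatic segmentation (A)
b = np.array([[1, 0, 0],
              [0, 1, 1]])   # toy manual gold standard (B)
print(dice(a, b))
```

Here TP = 2, FP = 1 and FN = 1, so D = 4 / 6, which agrees with the set form 2|A ∩ B| / (|A| + |B|) = 2 · 2 / (3 + 3).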

Fig. 2. Overall diagram of the system architecture and the different inputs used in the experiments. A modified U-Net based on [9] was used to obtain the ROI. Case A represents the inputs of the original DCE-MRI images. Case B represents the different components generated by the SVD. Cases C and D show the inputs where different sub-scales of the images around the region of interest are used.
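
One way the sub-scale inputs of Cases C and D could be produced is by cropping each image to the bounding box of the kidney ROI with a scale-dependent margin. The paper does not state how the regions were extracted, so everything in this sketch is an assumption: the function name `crop_around_roi`, the availability of a non-empty ROI mask, and the margin values for the three scales are hypothetical.

```python
import numpy as np

def crop_around_roi(image, mask, margin):
    """Crop a 2D image to the bounding box of a non-empty mask plus a margin."""
    rows, cols = np.nonzero(mask)
    # Expand the bounding box by the margin and clip it to the image borders.
    r0 = max(rows.min() - margin, 0)
    r1 = min(rows.max() + margin + 1, image.shape[0])
    c0 = max(cols.min() - margin, 0)
    c1 = min(cols.max() + margin + 1, image.shape[1])
    return image[r0:r1, c0:c1]

# Toy 256x256 slice with a rectangular stand-in for the kidney ROI.
image = np.arange(256 * 256, dtype=float).reshape(256, 256)
mask = np.zeros((256, 256), dtype=bool)
mask[100:140, 60:90] = True

# Hypothetical margins (in pixels) for the three locality scales.
scales = {"small": 0, "medium": 16, "large": 48}
crops = {name: crop_around_roi(image, mask, m) for name, m in scales.items()}
for name, crop in crops.items():
    print(name, crop.shape)
```

In practice the crops would likely be resized or padded to a common shape before being fed to the network; that step is omitted here.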

The overall process included training and evaluating the individual and combined performance of the spectral and spatial decomposition operations with the deep neural network as the number of main input components was increased in each evaluation. We tested each number of components from one to ten and then increased the number in steps of five until all were included (fifty phases for these experiments). A five-fold cross-validation was also performed to avoid model overfitting. Finally, each of the experiments was compared to the performance observed when training and segmenting the images using all phases of the original DCE-MRI (shown in blue in Figure 4 and in the second column of Table 1).

Figure 4 shows the distribution of results observed for the spectral decomposition experiments. Across the training and testing sets, the best performance was obtained when using an interval that included the first 10 components. We also observed that in many cases the performance was comparable to or higher than the segmentation performance using all DCE phases. The average Dice obtained across all test subjects was about 0.10 lower than the reference (0.6342 vs 0.7333), but it is worth noting that this was achieved using less than 20% of the available data (8 out of 50 components). Results after the 25th component showed poorer performance (in practice, adding noise to the training) and were omitted for better data visualization.

Lastly, Table 1 shows the summarized results using localized segmentation, spectral decomposition, and a combination of both. It shows that, at the training stage, using the first main components (SVD column) can achieve better performance than the reference DCE. Comparatively, the localized segmentation did not manage to outperform the reference in any scenario, but its performance increased when it was applied to the spectral data (MS-SVD column), specifically in the case where the intermediate scale was used (covering the kidney medulla, cortex, and immediate kidney periphery).

Fig. 3.

Table 1. Summary of results obtained in the different experiments for the training, validation and test sets. The DCE and L-DCE columns show the performance when all 50 DCE-MRI phases are used for training. The SVD and L-SVD columns show the best performance obtained when using the first eight input components, as commented in the discussion section. L-DCE = Localized DCE, L-SVD = Localized SVD.

Fig. 4. Performance of a sample of test subjects and average results obtained for all test cases as different singular values are added into the network. Performance is measured as the Dice coefficient between the predicted mask and the expert manual segmentation. The best-performing result for each case is shown in a stronger color.

In this project, we attempted to use spatial data reduction and spectral decomposition to constrain the amount of information used from heterogeneous datasets and to evaluate the effect on segmentation accuracy. A multiple-scaling approach was used to evaluate the spatial utility of different localities of the input dataset. Additionally, we used low-rank singular value decomposition to evaluate the spectral information that is relevant for the input kidney data (number of singular values to use) in order to de-noise, reduce the input data size, and make the input time-invariant. These experiments were performed using a modified U-Net architecture, and the results (Table 1) showed a segmentation performance similar to the reference DCE-MRI performance using the full data. The main contribution of this paper is that we were able to show that spatial and spectral decomposition can be applied to dynamic medical datasets to effectively suppress noise and irrelevant data, achieving a segmentation performance similar to the reference full-data model (less than 15% difference) using only a fraction (8 main components, 16%) of the original data size. This can be of great utility when working with big sets of dynamic medical data in other applications such as functional MRI and diffusion scans, as well as in non-medical applications.
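
The headline figures can be cross-checked with a few lines of arithmetic. The sketch below rebuilds the component schedule described in the experiments and recomputes the data fraction and Dice gap from the numbers reported in the text. Reading "less than 15% difference" as the gap relative to the reference Dice is an assumption, and the variable names are illustrative.

```python
# Component counts evaluated: 1..10, then steps of five up to all 50 phases.
n_phases = 50
schedule = list(range(1, 11)) + list(range(15, n_phases + 1, 5))

n_best = 8                       # components in the best-performing SVD input
retained = n_best / n_phases     # fraction of the phase dimension that is kept
reduction = 1.0 - retained       # fraction of the phase dimension discarded

dice_svd, dice_ref = 0.6342, 0.7333   # average test Dice values from the text
abs_gap = dice_ref - dice_svd         # absolute difference in Dice
rel_gap = abs_gap / dice_ref          # difference relative to the reference

print(schedule)
print(f"retained {retained:.0%}, reduction {reduction:.0%}")
print(f"absolute gap {abs_gap:.4f}, relative gap {rel_gap:.1%}")
```

Under these assumptions, eight of fifty components correspond to 16% of the data (an 84% reduction, consistent with "higher than 80%"), and the Dice gap is about 0.099 in absolute terms or about 13.5% relative to the reference.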