Deep Generative SToRM model for dynamic imaging
Qing Zou, Abdul Haseeb Ahmed, Prashant Nagpal, Stanley Kruger, Mathews Jacob
University of Iowa
ABSTRACT
We introduce a novel generative smoothness regularization on manifolds (SToRM) model for the recovery of dynamic image data from highly undersampled measurements. The proposed generative framework represents the image time series as a smooth non-linear function of low-dimensional latent vectors that capture the cardiac and respiratory phases. The non-linear function is represented using a deep convolutional neural network (CNN). Unlike popular CNN approaches that require extensive fully-sampled training data, which is not available in this setting, the parameters of the CNN generator as well as the latent vectors are jointly estimated from the undersampled measurements using stochastic gradient descent. We penalize the norm of the gradient of the generator to encourage the learning of a smooth surface/manifold, while temporal gradients of the latent vectors are penalized to encourage the time series to be smooth. The main benefits of the proposed scheme are (a) a quite significant reduction in memory demand compared to the analysis-based SToRM model, and (b) the spatial regularization brought in by the CNN model. We also introduce efficient progressive approaches to minimize the computational complexity of the algorithm.
1. INTRODUCTION
The quest for high spatial and temporal resolution is central to several dynamic imaging problems, ranging from MRI and video imaging to microscopy. A popular approach to improve spatio-temporal resolution is self-gating, where cardiac and respiratory information is estimated from navigators or central k-space using bandpass filtering or clustering, followed by binning and reconstruction [1, 2]. Several authors have also introduced smooth manifold regularization, which models the images in the time series as points on a high-dimensional manifold [3, 4, 5]. This approach may be viewed as an implicit soft-gating alternative to self-gating methods. Manifold methods, including our smoothness regularization on manifolds (SToRM) approach, have been demonstrated in a variety of dynamic imaging applications with good performance [3, 4, 5]. Since the data is not explicitly binned into specific phases, manifold methods are not vulnerable to potential errors in clustering the time series based on navigators. Despite these benefits, a key challenge with current manifold methods is their high memory demand. Unlike self-gating methods that only recover specific phases, manifold schemes recover the entire time series. This restricts the extension of the framework to higher-dimensional problems. The high memory demand also makes it difficult to use additional spatial and temporal regularization.

The main focus of this work is to exploit the power of deep convolutional neural networks (CNNs) to introduce an improved and memory-efficient generative/synthesis formulation of SToRM. Unlike current manifold and self-gating methods, this approach does not require k-space navigators to estimate the motion states. Moreover, unlike traditional CNN-based approaches, the proposed scheme does not require extensive training data, which is challenging to acquire in free-breathing applications. We note that current manifold methods can be viewed as an analysis formulation.
Specifically, a non-linear injective mapping is applied to the images such that the mapped points of the alias-free images lie on a low-dimensional subspace. When recovering from undersampled data, a nuclear norm prior is applied in the transform domain to encourage the non-linear mappings to lie in a subspace. Unfortunately, this analysis approach requires the storage of all the image frames in the time series. In this work, we model the images in the time series as non-linear mappings ρ_t = G_θ(z_t), where the z_t are vectors that live in a very low-dimensional subspace. The dimension of the subspace can be very small (e.g., 2-4) in practical applications. We represent the non-linear mapping using a convolutional neural network with weights θ. The memory footprint of the algorithm depends on the number of parameters θ and z, which is orders of magnitude smaller than that of traditional manifold methods.

We propose to jointly optimize the network parameters θ and the latent vectors z such that the cost ∑_i ‖A_i(G_θ(z_i)) − b_i‖² is minimized during image reconstruction. The smoothness of the manifold generated by G_θ(z) depends on the gradient of G_θ with respect to its input. To obtain a smooth manifold, we regularize the gradient of the mapping, ‖∇_z G_θ‖. Similarly, the images in the time series are expected to vary smoothly in time; hence, we also use a Tikhonov smoothness penalty on the latent vectors z_t to further constrain the solutions. Unlike traditional CNN methods that are fast during testing/inference, the direct application of this scheme to the dynamic MRI setting is computationally expensive. We use a three-step progressive-in-time approach to significantly reduce the computational complexity of the algorithm. Specifically, we grow the number of frames in the dataset during the optimization process. The latent vectors from the previous stage are linearly interpolated to initialize the latent vectors of the next stage. We observe that the use of the progressive-in-time approach significantly reduces the computational complexity of the algorithm.

Fig. 1. (a) Analysis SToRM and (b) Generative SToRM. The analysis formulation [4, 6] in (a) minimizes the nuclear norm of the non-linear mappings ϕ(x_i) of the images x_i to encourage them to lie in a subspace. By contrast, the proposed formulation expresses the images as non-linear mappings G_θ(z_i) of the low-dimensional latent vectors z_i. The main benefit of the generative model is its ability to compress the data, thus offering a memory-efficient algorithm.

The proposed approach is inspired by the deep image prior (DIP) [7], which was introduced for static imaging problems. We note that the extension of DIP to dynamic imaging was considered in [8]. A key difference of the proposed formulation from the above work is the joint optimization of the latent variables z, unlike the above method, which chooses z as random vectors or interpolated versions of random vectors. Another key distinction is the use of regularization priors on the network parameters and latent vectors, which ensures that the scheme learns meaningful latent vectors and that the performance of the network does not degrade with iterations, as in traditional DIP methods.
2. METHODS
Smooth manifold methods model the images x_i in the dynamic time series as points on a smooth manifold. In SToRM, the exponential (injective) mappings ϕ(x_i) of the alias-free images are assumed to lie on a low-dimensional subspace; see Fig. 1(a). The joint recovery of the images, denoted by the matrix X = [x_1, .., x_N], from undersampled data is posed as the nuclear norm minimization problem

X* = arg min_X ‖A(X) − B‖² + λ ‖[ϕ(x_1), .., ϕ(x_N)]‖_*.   (1)

To overcome the challenges with the above analysis scheme, we propose to model the images in the time series as

x_i = G_θ(z_i),   (2)

where G_θ is a non-linear mapping. We realize G_θ using a deep convolutional neural network, inspired by the extensive work on generative image models. Here, the z_i are latent vectors that lie in a low-dimensional subspace. As the z_i vary in the subspace, their non-linear mappings trace out the image manifold. The mapping G_θ may be viewed as the inverse of the injective mapping ϕ considered in analysis SToRM; rather than mapping the images to a low-dimensional subspace as in classical SToRM methods, we now propose to express the images as non-linear functions of latent variables living in a low-dimensional subspace; see Fig. 1(b).

The smoothness of the manifold is determined by the gradient of the non-linear mapping, denoted by ∇_z G_θ. A mapping with large gradients can map very similar latent vectors to very different images. To minimize this risk, we propose to penalize the ℓ₂ norm of the gradients of the network, ‖∇_z G_θ‖; we term this prior the network regularizer. Since we expect adjacent frames in the time series to be similar, we also propose a temporal smoothness regularizer on the latent vectors.
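To make the contrast between the two formulations concrete, the following toy sketch (with arbitrary, illustrative dimensions, not taken from the paper) shows the low-rank structure exploited by the analysis prior in Eq. (1): when the mapped features ϕ(x_i) lie in a low-dimensional subspace, the feature matrix has low rank and a small nuclear norm.

```python
import numpy as np

# Toy illustration of the analysis-SToRM prior in Eq. (1). All dimensions
# below are arbitrary illustrative values, not from the paper.
rng = np.random.default_rng(1)
feat_dim, N, r = 50, 100, 3     # feature dim, number of frames, subspace dim

U = rng.standard_normal((feat_dim, r))   # basis of the r-dim subspace
C = rng.standard_normal((r, N))          # per-frame subspace coefficients
Phi_subspace = U @ C                     # features confined to the subspace
Phi_generic = rng.standard_normal((feat_dim, N))  # generic features

# The subspace-constrained feature matrix is low-rank, so its nuclear norm
# (sum of singular values) is much smaller than that of a generic matrix.
print(np.linalg.matrix_rank(Phi_subspace))
print(np.linalg.norm(Phi_subspace, 'nuc') < np.linalg.norm(Phi_generic, 'nuc'))
```

Minimizing the nuclear norm in Eq. (1) thus pushes the mapped frames toward such a subspace, which is exactly the structure the generative formulation encodes directly through the latent vectors.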
The parameters of the network θ as well as the low-dimensional latent vectors z are estimated from the measured data by minimizing

C(z, θ) = ∑_{i=1}^{N} ‖A_i(G_θ[z_i]) − b_i‖² + λ₁ ‖∇_z G_θ‖² + λ₂ ‖∇_t z_t‖²   (3)

with respect to z and θ; the λ₁ term is the network regularization and the λ₂ term is the temporal regularization. We initialize the network parameters and latent vectors as random variables. We use the ADAM optimizer to determine the optimal parameters. Note that the first and second terms in the expression are separable over i; to keep the memory demand of the algorithm low, we propose to choose mini-batches consisting of random subsets of frames. A key benefit of this framework over conventional neural network schemes is that it does not require any training data; note that it is often impossible to acquire fully-sampled training data in dynamic imaging applications.

The main benefit of this model is the compression offered by the representation: the number of parameters of the model in (2) is orders of magnitude smaller than the number of pixels in the dataset. The dramatic compression offered by the representation, together with mini-batch training, provides a memory-efficient alternative to analysis SToRM [3, 4]. Although our focus in this paper is on establishing the utility of the scheme in 2-D settings, the approach can be readily translated to higher-dimensional applications. Another benefit is the implicit spatial regularization brought in by the generative CNN: CNNs are ideally suited to representing images rather than noise-like alias artifacts [7].
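A minimal numerical sketch of the cost in (3) is given below, with a toy one-layer "generator" standing in for the CNN G_θ, random pixel masks standing in for the MRI measurement operators A_i, and a finite-difference estimate of ∇_z G_θ. All names, dimensions, and weight values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, npix, N = 2, 64, 10          # latent dim, pixels per frame, frames

W = rng.standard_normal((npix, d)) * 0.1   # "theta": toy generator weights
z = rng.standard_normal((N, d)) * 0.1      # latent vectors, one per frame

def G(W, z_t):
    """Toy stand-in for the CNN generator G_theta: latent vector -> image."""
    return np.tanh(W @ z_t)

# Per-frame undersampling operators A_i: random pixel masks (toy stand-in
# for the NUFFT/coil operators in MRI), and simulated measurements b_i.
masks = rng.random((N, npix)) < 0.3
b = [masks[i] * G(W, z[i]) for i in range(N)]

def cost(W, z, lam1=1e-2, lam2=1e-2, eps=1e-4):
    # Data term: sum_i || A_i(G_theta[z_i]) - b_i ||^2
    data = sum(np.sum((masks[i] * G(W, z[i]) - b[i]) ** 2) for i in range(N))
    # Network regularizer ||grad_z G_theta||^2, via central finite differences
    jac = 0.0
    for i in range(N):
        for k in range(d):
            e = np.zeros(d); e[k] = eps
            jac += np.sum(((G(W, z[i] + e) - G(W, z[i] - e)) / (2 * eps)) ** 2)
    # Temporal regularizer ||grad_t z_t||^2: smooth latent trajectories
    temp = np.sum(np.diff(z, axis=0) ** 2)
    return data + lam1 * jac + lam2 * temp

print(cost(W, z))
```

In the actual method, both the data and Jacobian terms are separable over the frame index i, which is what allows the mini-batch (random subset of frames) updates described above.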
Fig. 2. Reconstruction performance (SER in dB vs. running time in seconds) with and without progressive training in time. The plot shows that progressive training in time produces better results with much less running time than training without the progressive-in-time strategy.
2.1. Progressive training in time

While the generative SToRM approach significantly reduces the memory demand, a challenge with this approach is the increased computational complexity. To minimize the complexity, we propose a progressive optimization strategy. Specifically, we solve for a sequence of latent vectors z_1, z_2, .., z_M, each corresponding to an increasing number of time frames. For instance, in this work we start by recovering a single average image G_θ(z_1) = x from the entire data. We solve for the optimal θ and z_1 by minimizing (3). Since we are solving for a single image, this optimization is fast. Following convergence, the latent vector z_1 is linearly interpolated to the size of z_2 and used along with θ as the initialization while solving for {θ, z_2}. This approach significantly reduces the computational complexity, as seen from our experiments.
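The interpolation step of the progressive strategy can be sketched as follows; the function name and stage sizes are illustrative assumptions, not the authors' code.

```python
import numpy as np

def interpolate_latents(z_coarse, n_fine):
    """Linearly interpolate an (M, d) latent sequence in time to (n_fine, d).

    This initializes the latent vectors of the next, finer stage from the
    converged latents of the previous, coarser stage.
    """
    M, d = z_coarse.shape
    t_coarse = np.linspace(0.0, 1.0, M)
    t_fine = np.linspace(0.0, 1.0, n_fine)
    return np.stack([np.interp(t_fine, t_coarse, z_coarse[:, k])
                     for k in range(d)], axis=1)

# Illustrative stages: one "average" frame, then progressively more frames.
z1 = np.array([[0.5, -0.5]])           # stage 1: single latent vector (1, 2)
z2 = interpolate_latents(z1, 25)       # stage 2: constant init, shape (25, 2)
z3 = interpolate_latents(z2, 150)      # stage 3: shape (150, 2)
print(z3.shape)
```

Each stage thus starts from a temporally smooth initialization that is already consistent with the coarser solution, which is why the later, expensive stages converge quickly.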
3. EXPERIMENTS

3.1. Dataset and imaging experiments
All the experiments in this paper are based on a whole-heart multi-slice dataset collected in the free-breathing mode using a golden-angle spiral trajectory. The data was acquired on a GE 3T scanner with the sequence parameters TR = 8.4 ms, FOV = 320 mm × 320 mm, flip angle = 18 degrees, and slice thickness = 8 mm.

Fig. 3. Impact of network regularization and latent-variable regularization. The SER vs. epoch plots are shown, along with two of the reconstructed images, their time profiles, and the recovered latent variables. We note that the blue curve captures respiratory motion, while the orange one captures cardiac motion.

Results were generated using an Intel Xeon CPU at 2.40 GHz and a Tesla P100-PCIE 16 GB GPU. Results in §4.2 and §4.3 were based on the first slice in the dataset, and results in §4.4 and §4.5 were based on the second slice. We binned the data from six spiral interleaves per frame, corresponding to a 50 ms temporal resolution. The entire dataset corresponds to 522 frames. We omit the first 22 frames and use the remaining 500 frames for the SToRM reconstruction, which is used as the ground truth for comparisons. In all the studies, we assume the latent variables to be two-dimensional, since the main sources of variability in the data correspond to cardiac and respiratory motion.
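As a quick sanity check, the quoted ~50 ms temporal resolution follows directly from the sequence parameters in the text (TR = 8.4 ms, six spiral interleaves per binned frame):

```python
# Binning arithmetic, using only values quoted in the text above.
TR_ms = 8.4                   # repetition time per interleave
interleaves_per_frame = 6     # interleaves binned into one frame
frame_duration_ms = TR_ms * interleaves_per_frame
print(round(frame_duration_ms, 1))   # 50.4 ms, i.e. the ~50 ms resolution
```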
We demonstrate in Fig. 2 the quite significant reduction in running time offered by the progressive training strategy described in Section 2.1. Here, we consider the recovery from 150 frames with and without the progressive strategy. We plot the reconstruction performance, measured by the signal-to-error ratio (SER), against the running time. The results show that the proposed scheme offers good reconstructions in a small fraction of the time taken by the direct approach, which requires more than 2000 seconds.

We study the impact of the network regularization prior in Fig. 3(a), where we show the reconstruction performance with respect to the number of epochs; the recovered latent variables are also shown in the plots. We chose λ₁ = 2 in this experiment. We note that, unlike the case without network regularization, the SER of the regularized reconstruction increases with iterations. The case without regularization starts to fit the noise with iterations, as in the deep image prior. We note that with regularization, the latent variables capture cardiac (orange curve) and respiratory (blue curve) motion, even though no explicit priors or additional information (e.g., navigators) about the cardiac or respiratory rates were used. Without network regularization, we observe increased mixing of the cardiac and respiratory patterns in the latent vectors.

Fig. 4. Comparison of Generative SToRM, Analysis SToRM, and the time-dependent deep image prior.

The cost function (3) also includes the temporal smoothness regularization of the latent variables. We compare λ₂ = 2 against λ₂ = 0, with λ₁ held fixed. Similar to the network regularization setting, we observe that the performance of the un-regularized algorithm falls with iterations, while the performance of the regularized approach increases or plateaus with iterations. We also observed significant mixing between the cardiac and respiratory patterns in the latent variables when no regularization is used.
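The SER figure of merit used in these plots can be sketched as follows; the exact definition (20 log₁₀ of the signal-to-error norm ratio, in dB) is a standard convention that is assumed here rather than spelled out in the excerpt.

```python
import numpy as np

def ser_db(x, x_hat):
    """Signal-to-error ratio in dB: SER = 20 * log10(||x|| / ||x - x_hat||).

    Assumed definition; higher is better, with the ground-truth SToRM
    reconstruction playing the role of x in the experiments.
    """
    return 20.0 * np.log10(np.linalg.norm(x) / np.linalg.norm(x - x_hat))

x = np.ones(100)
print(ser_db(x, x + 0.1))   # error 10x smaller than the signal -> 20 dB
```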
We compare the proposed generative SToRM approach with analysis SToRM [6] and the time-dependent deep image prior (Time-DIP) algorithm [8]. We use the k-space data of 150 frames for the reconstructions. The reconstruction results are shown in Fig. 4. The results show that the generative SToRM approach is able to reduce noise and alias artifacts compared to analysis SToRM, offering around 1 dB improvement in performance. We attribute the improved performance to the spatial regularization offered by the CNN generator, which is absent in the analysis SToRM formulation. The reconstruction times of the two algorithms are comparable. The Time-DIP scheme, which fixes the latent variables to random values, results in increased artifacts and blurring of motion details. We note that, unlike the analysis schemes, the proposed scheme does not use k-space navigators to estimate the motion states; the latent variables are estimated from the measured k-space data itself.
4. CONCLUSION
We introduced a generative manifold representation for the recovery of dynamic image data from highly undersampled measurements. A deep CNN generator is used to lift low-dimensional latent vectors to the smooth image manifold, and the proposed scheme does not require fully-sampled training data. We jointly optimize the CNN generator parameters and the latent vectors based on the undersampled data. We also proposed a progressive training-in-time approach to minimize the computational complexity of the algorithm. During training, the norm of the gradients of the generator is penalized to encourage the learning of a smooth surface/manifold, while the temporal gradients of the latent vectors are penalized to encourage the time series to be smooth. Comparisons with existing methods suggest the utility of the proposed scheme in dynamic imaging.
5. COMPLIANCE WITH ETHICAL STANDARDS
This research study was conducted using human subject data. The institutional review board at the local institution approved the acquisition of the data, and written consent was obtained from the subject.
6. ACKNOWLEDGMENTS
This work is supported by grants NIH 1R01EB019961-01A1 and R01EB019961-02S. The authors declare that there are no conflicts of interest.
7. REFERENCES

[1] Li Feng, Robert Grimm, Kai Tobias Block, Hersh Chandarana, Sungheon Kim, Jian Xu, Leon Axel, Daniel K Sodickson, and Ricardo Otazo, "Golden-angle radial sparse parallel MRI: combination of compressed sensing, parallel imaging, and golden-angle radial sampling for fast and flexible dynamic volumetric MRI," Magnetic Resonance in Medicine, vol. 72, no. 3, pp. 707–717, 2014.

[2] Anthony G Christodoulou, Jaime L Shaw, Christopher Nguyen, Qi Yang, Yibin Xie, Nan Wang, and Debiao Li, "Magnetic resonance multitasking for motion-resolved quantitative cardiovascular imaging," Nature Biomedical Engineering, vol. 2, no. 4, pp. 215–226, 2018.

[3] Sunrita Poddar and Mathews Jacob, "Dynamic MRI using smoothness regularization on manifolds (SToRM)," IEEE Transactions on Medical Imaging, vol. 35, no. 4, pp. 1106–1115, 2015.

[4] Sunrita Poddar, Yasir Q Mohsin, Deidra Ansah, Bijoy Thattaliyath, Ravi Ashwath, and Mathews Jacob, "Manifold recovery using kernel low-rank regularization: Application to dynamic imaging," IEEE Transactions on Computational Imaging, vol. 5, no. 3, pp. 478–491, 2019.

[5] Ukash Nakarmi, Yanhua Wang, Jingyuan Lyu, Dong Liang, and Leslie Ying, "A kernel-based low-rank (KLR) model for low-dimensional manifold recovery in highly accelerated dynamic MRI," IEEE Transactions on Medical Imaging, vol. 36, no. 11, pp. 2297–2307, 2017.

[6] Abdul Haseeb Ahmed, Ruixi Zhou, Yang Yang, Prashant Nagpal, Michael Salerno, and Mathews Jacob, "Free-breathing and ungated dynamic MRI using navigator-less spiral SToRM," IEEE Transactions on Medical Imaging, 2020.

[7] Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky, "Deep image prior," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 9446–9454.

[8] Kyong Hwan Jin, Harshit Gupta, Jerome Yerly, Matthias Stuber, and Michael Unser, "Time-dependent deep image prior for dynamic MRI," arXiv preprint arXiv:1910.01684, 2019.