Deep Residual Dense U-Net for Resolution Enhancement in Accelerated MRI Acquisition
Pak Lun Kevin Ding (a), Zhiqiang Li (b), Yuxiang Zhou (c), and Baoxin Li (a)
(a) School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ 85281
(b) Dept. of Neuroradiology, Barrow Neurological Institute, Phoenix, AZ 85013
(c) Dept. of Radiology, Mayo Clinic Arizona, Phoenix, AZ 85054
ABSTRACT
A typical Magnetic Resonance Imaging (MRI) scan may take 20 to 60 minutes. Reducing MRI scan time is beneficial for both patient experience and cost considerations. Accelerated MRI acquisition may be achieved by acquiring less k-space data (down-sampling in k-space). However, this leads to lower resolution and aliasing artifacts in the reconstructed images. Many existing approaches attempt to reconstruct high-quality images from down-sampled k-space data, with varying complexity and performance. In recent years, deep-learning approaches have been proposed for this task, and promising results have been reported. Still, the problem remains challenging, especially because of the high-fidelity requirement in most medical applications employing reconstructed MRI images. In this work, we propose a deep-learning approach aimed at reconstructing high-quality images from accelerated MRI acquisition. Specifically, we use a Convolutional Neural Network (CNN) with a U-Net-like architecture to learn the differences between the aliased images and the original images. Further, a micro-architecture termed Residual Dense Block (RDB) is introduced to learn a better feature representation than the plain U-Net. Considering the peculiarity of the down-sampled k-space data, we introduce a new term into the loss function, which effectively employs the given k-space data during training to provide additional regularization on the update of the network weights. To evaluate the proposed approach, we compare it with other state-of-the-art methods. In both visual inspection and evaluation using standard metrics, the proposed approach delivers improved performance, demonstrating its potential for providing an effective solution.
Keywords:
Accelerated MRI Acquisition, Deep Learning, U-Net
1. INTRODUCTION
Magnetic resonance imaging (MRI) is among the most important imaging methods for medical diagnosis. Obtaining fully sampled MRI data requires a relatively long scan time. To shorten MRI scan time for improved patient experience and reduced cost, researchers have investigated accelerated MRI acquisition. One basic idea for acceleration is to under-sample in k-space, which may cause aliasing in the reconstructed images. Parallel MRI [1, 2] and compressed sensing (CS) MRI [3, 4] are two popular techniques. A representative parallel-MRI technique, generalized auto-calibrating partial parallel acquisition (GRAPPA), uses interpolation to fill in the missing k-space data with the surrounding data from all the coils, while CS-MRI randomly samples the k-space data to approximate the original image.

Approaches using low-rank matrix completion to solve the CS-MRI/parallel-MRI problem have also been investigated. Representatives include SAKE and the annihilating filter based low-rank Hankel matrix approach (ALOHA). However, these algorithms have high complexity and require k-space data during reconstruction, making them inapplicable to cases with only image-domain inputs.

Further author information: P.L.K. Ding: [email protected]; Z. Li: [email protected]; Y. Zhou: [email protected]; B. Li: [email protected]. Submitted to SPIE Medical Imaging, February 2019.

Figure 1. Illustration of the proposed RD-U-Net. Blue arrow: 3×3 convolution + Batch Normalization + nonlinear activation; red arrow: 2×2 max pooling with stride 2; yellow arrow: 2×2 up-convolution; green arrow: skip connection with residual dense block.

In recent years, deep learning has become one of the most important tools for visual computing research,
with great performance in image classification, segmentation, recognition, super-resolution, etc. [7, 8]. Therefore, some researchers have started to utilize deep-learning techniques for medical image reconstruction. Wang et al. trained a convolutional neural network (CNN) to learn the mapping from the aliased image to the original fully sampled reconstruction; the output of the network can be used as an initial guess or a regularization term in conventional CS approaches. In ref. 14, the authors proposed a multilayer perceptron for parallel MRI, and in ref. 15 the researchers applied a CNN within a CS algorithm. Kang et al. applied the CNN technique to computed tomography (CT).

The authors of a recent paper used U-Net with residual learning to learn the relationship between the aliased and original images, and the proposed framework outperforms traditional methods such as SENSE and ALOHA. However, since U-Net was originally developed for medical image segmentation, directly using it for image reconstruction may not give the best performance. For this reason, we propose Residual Dense U-Net (RD-U-Net), a U-Net-based deep neural network, to further improve the quality of the reconstructed image. Inspired by DenseNet, a residual dense block is introduced for refinement, improving the quality of the reconstructed image. Furthermore, we impose a Fourier constraint in the loss function. Experimental results show that, both visually and numerically, our proposed architecture achieves better performance.
2. RELATED WORKS
In this section, we review works related to our proposed RD-U-Net model for accelerated MRI reconstruction. We note that there is a line of work on super-resolution using sparse representation, which has delivered superior performance for super-resolution in natural images. However, such work is not amenable to the task of MR image reconstruction in this paper, since the degradation involves structured aliasing.

Figure 2. Illustration of the proposed Residual Dense Block (RDB). Blue arrow: 3×3 convolution + Batch Normalization + nonlinear activation; red arrow: skip connection. Dense part: the input (blue) is first convolved and concatenated to itself, then the second convolution is applied. Residual part: the input and output of the dense part are summed together to learn the residual.

U-Net was first proposed for biomedical image segmentation; it incorporates skip connections and downsampling/upsampling layers. The skip connections aim to preserve local information, while the encoding/decoding procedure provides global information. Having obtained state-of-the-art results, U-Net has been applied to other visual computing fields such as accelerated MRI reconstruction and pansharpening.

Optimization is an important part of deep learning. A major challenge in optimization is the vanishing gradient problem. To overcome this issue, the concept of residual learning was introduced in the residual network (ResNet). In this model, a shortcut connection (skip connection) is used in every basic residual block, which keeps the gradient flow in the network relatively stable. It is also shown in ref. 21 that adding skip connections leads to flatter loss surfaces. ResNet has provided very promising results in many applications.
Many works have shown that networks perform better if there are connections between layers close to the input and layers close to the output. The authors of ref. 18 propose the Dense Convolutional Network (DenseNet), which connects each layer to every other layer using skip connections. Differing from a traditional convolutional network, every layer in DenseNet takes the feature maps of all preceding layers as input, and its own output feature maps serve as inputs to all subsequent layers. DenseNet achieves state-of-the-art performance on many real-world problems.
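The dense connectivity just described can be sketched in a few lines of PyTorch (a minimal illustration, not the full DenseNet; the class name and growth rate are arbitrary choices made here):

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """One DenseNet-style layer: new feature maps are concatenated with
    the input, so every later layer sees all earlier features."""
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        # BN -> ReLU -> Conv ordering, as in DenseNet
        self.conv = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, growth_rate, kernel_size=3, padding=1),
        )

    def forward(self, x):
        # concatenate the new features onto the running feature stack
        return torch.cat([x, self.conv(x)], dim=1)

x = torch.randn(1, 16, 32, 32)
block = nn.Sequential(DenseLayer(16, 12), DenseLayer(28, 12))
y = block(x)
print(y.shape)  # channels grow: 16 -> 28 -> 40
```

Because every layer receives the full feature stack, the channel count grows by the growth rate at each layer, which is why DenseNet implementations periodically compress channels between dense blocks.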
3. PROPOSED APPROACH
For traditional super-resolution, patch-based approaches [12, 22] are usually used. However, for accelerated MRI reconstruction, the aliasing artifacts are global in nature. To diminish these global artifacts, we can use the whole image as the input. The authors of ref. 17 use U-Net with residual learning to handle this problem. To be precise, let x be the input image and f̃ be the function represented by the U-Net; the output of the model can be defined as

y = f(x) = x + f̃(x)   (1)

where f is the function representing the whole network. In this case f̃(x) = y − x, and since x and y are the low-resolution and high-resolution images, respectively, f̃ maps x to the residual. Learning such an f̃ leads to faster convergence.

Figure 3. Testing loss versus the number of training epochs in one trial: (a) testing loss for different values of α; (b) testing loss for U-Net, RD-U-Net without the Fourier constraint (denoted by α = 0), and RD-U-Net with the Fourier constraint.

Table 1. The MSE (mean and standard deviation) over 5 trials for GRAPPA, U-Net, RD-U-Net without the Fourier constraint (denoted by α = 0), and RD-U-Net with the Fourier constraint.

The proposed network architecture is illustrated in Fig. 1. Similar to the U-Net-based approach, it consists of an encoding path and a decoding path. The encoding path is a traditional convolutional neural network: each stage consists of two 3×3 convolutions, each followed by a batch normalization (BN) layer and a nonlinear activation layer. After that, a 2×2 max pooling layer with stride 2 is applied for downsampling, and the number of channels is doubled after the downsampling. In the decoding path, every stage consists of a 2×2 deconvolution, which upsamples the feature map and reduces the number of channels by half.
After upsampling, feature maps from the same level of the encoding path are fed to the Residual Dense Block, and the corresponding output is concatenated, followed by two 3×3 convolutional layers, a BN layer, and a nonlinear activation layer. A 1×1 convolutional layer is used at the end to map all the features to a single channel.

U-Net has limitations in extracting high-frequency data: since the high-frequency content is only contained in the upper part of the network (the earlier stages of the encoding path and the later stages of the decoding path), the network is not deep enough to extract high-frequency features. Inspired by DenseNet, we introduce the Residual Dense Block (RDB) to refine the feature maps. An RDB is formed by a dense part and a residual part. In the dense part, the input is first passed through a convolutional layer, a batch normalization layer, and a nonlinear activation layer, where the number of filters in the convolutional layer equals the number of channels of the input. The output is then concatenated with the input of the RDB and passed through another convolutional layer, which reduces the number of channels by half. At the end, the input of the RDB is added to the output of the dense part, forming the residual part. Fig. 2 gives an illustration.

Instead of using a plain skip connection (copying the feature maps from the encoding path to the decoding path), adding such a refinement is a more reasonable choice: theoretically, the plain skip connection is a special case of the RDB, obtained by setting all the weights of the second convolution in the RDB to zero.

3.2 Employing Fourier Constraints

The L2 loss is usually used in image reconstruction tasks, defined as follows:

min_f ‖y − f(x)‖₂²   (2)

where f is the mapping represented by the neural network.
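A minimal PyTorch sketch of such an RDB, under stated assumptions (the paper gives no layer names; ReLU stands in for the PoLU activation used in the experiments, and no BN/activation is assumed after the second convolution):

```python
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    """Sketch of the RDB: a dense part (conv/BN/activation, concat with
    the input, second conv halving the channels) plus a residual sum."""
    def __init__(self, channels):
        super().__init__()
        # first conv keeps the channel count equal to the input's
        self.conv1 = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # second conv maps the concatenated 2*channels back to channels,
        # i.e., "reduces the number of channels by half"
        self.conv2 = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        dense = self.conv2(torch.cat([x, self.conv1(x)], dim=1))
        return x + dense  # residual part

x = torch.randn(1, 64, 16, 16)
out = ResidualDenseBlock(64)(x)
```

Note that zeroing the weights and bias of the second convolution makes the block an identity skip connection, which is the special-case argument made above.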
However, given that the degradation of the images comes from missing columns (or rows) of the k-space data, we can use this prior to improve performance by using the following loss:

min_f ‖y − f(x)‖₂² + α ‖F(y) − F(f(x))‖₁   (3)

where y and x represent the ground-truth and degraded images, respectively, f is the mapping represented by the neural network, F is the Fourier transform, and α is a constant. As we expect the lost information to come only from part of the rows in the Fourier domain, we use the L1 norm for the Fourier regularization term. With this loss function, we can effectively regularize the network learning by the error coming from the missing k-space data.
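This loss can be sketched in PyTorch as follows (using the modern torch.fft API rather than the PyTorch 0.4 API of the original experiments; `fourier_constrained_loss` is a hypothetical name, and the default `alpha` is a placeholder rather than the tuned value):

```python
import torch
import torch.nn.functional as F

def fourier_constrained_loss(pred, target, alpha=0.005):
    # L2 reconstruction term of Eq. (3)
    recon = F.mse_loss(pred, target)
    # L1 penalty on the k-space difference: the error is expected to
    # concentrate in the never-sampled rows/columns of k-space
    k_diff = torch.fft.fft2(pred) - torch.fft.fft2(target)
    return recon + alpha * k_diff.abs().mean()

y_hat = torch.randn(1, 1, 8, 8)  # network output (toy data)
y = torch.randn(1, 1, 8, 8)      # ground truth (toy data)
loss = fourier_constrained_loss(y_hat, y)
```

Because the Fourier transform is linear and differentiable, the extra term back-propagates through the network exactly like the image-domain term.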
4. EXPERIMENTS
In this section, we introduce the dataset we use, followed by the network settings and comparative studies.
We use the fully sampled knee datasets from mridata.org to evaluate our RD-U-Net. There are 20 datasets, acquired in Cartesian coordinates on a GE clinical 3T scanner with the following parameters: receiver bandwidth = 50.0, number of coils = 8, acquisition matrix size = 320 × … . Each image is normalized as

x ← (x − mean(x)) / std(x)   (4)

so that, after the transformation, the pixels of an image have zero mean and unit variance.

We train the network for 200 epochs with batch size 3. Stochastic gradient descent (SGD) is used with an initial learning rate of 0.02 and momentum 0.5; the learning rate is halved every 20 epochs. We use PoLU as the activation function. To measure the error, we use the mean squared error (MSE), defined as

MSE = ‖y − f(x)‖²_F / N   (5)

where y and x represent the ground-truth and degraded images, N is the number of pixels, f denotes the function represented by the learned neural network, and ‖·‖_F is the Frobenius norm.

To determine the value of α, we first test our model with three candidate values (including 0.005), with one trial for each value, using the loss function stated in Eq. 3. We use the α with the least MSE for the remaining experiments. The testing loss for the different values of α is plotted in Fig. 3(a). To evaluate the performance of the RD-U-Net model, we compare the reconstruction results with GRAPPA and U-Net. Tab. 1 reports the results for 5 different runs, and Fig. 5 shows sample visual results.

Figure 4. (a) The sampling pattern (4× acceleration, with 16 ACS lines, 5% of total PE); (b) the reconstructed image from fully sampled k-space data; (c) the reconstructed image from zero-filled under-sampled k-space data.

Figure 5. (a) Results from GRAPPA; (b) results from U-Net; (c) results from RD-U-Net; (d) results from RD-U-Net with the Fourier constraint. Row 1: the reconstructed images; Row 2: the difference between the reconstructed images and the ground truth.

For the zero-filled reconstruction, there are many aliasing artifacts (see Fig. 4(c)). Although GRAPPA removes the aliasing artifacts, the reconstructed image still contains a lot of noise (Fig. 5(a)). The U-Net approach is better than GRAPPA, but its result still lacks detail (Fig. 5(b)). The proposed RD-U-Net provides a better reconstructed image owing to the residual dense block: in Fig. 5(c), the result is sharper at the edges and clearer in the details. The Fourier constraint is also useful for obtaining a better reconstruction; as shown in Fig. 3(b), adding this regularization leads to lower MSE.

The network was implemented using PyTorch 0.4.0 with Python 3.6.3 on Ubuntu 16.04. All experiments were performed on a computer with an NVIDIA GTX 1080 GPU and an Intel Xeon E5-2603 CPU, although the code was not optimized with respect to the particular hardware. The reconstruction time for GRAPPA is about 20 seconds. The training times for U-Net and RD-U-Net (without/with the Fourier constraint) are about 13, 15, and 15 hours, respectively. The reconstruction time for all three neural networks is less than 1 second.
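For reference, the per-image normalization of Eq. (4) and the MSE of Eq. (5) amount to only a few lines of NumPy (a sketch on made-up data, not the actual evaluation code):

```python
import numpy as np

def normalize(x):
    # Eq. (4): zero-mean, unit-variance per image
    return (x - x.mean()) / x.std()

def mse(y, y_hat):
    # Eq. (5): squared Frobenius norm of the difference over pixel count
    diff = y - y_hat
    return float(np.sum(diff ** 2) / diff.size)

img = np.random.rand(320, 320)
z = normalize(img)
print(round(z.mean(), 6), round(z.std(), 6))  # approximately 0 and 1
print(mse(np.ones((4, 4)), np.ones((4, 4)) + 0.5))  # 0.25
```

Dividing by the pixel count makes the metric comparable across images of different sizes.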
5. CONCLUSION
In this paper, we proposed a new architecture to approximate fully sampled MR images from down-sampled ones. The architecture is based on U-Net, and it achieves low NMSE in the reconstruction of MR images, owing to the residual dense refinement and the Fourier regularization. The visual results show that our proposed model reduces more aliasing artifacts.
REFERENCES
[1] Pruessmann, K. P., Weiger, M., Scheidegger, M. B., and Boesiger, P., "SENSE: sensitivity encoding for fast MRI," Magnetic Resonance in Medicine (5), 952–962 (1999).
[2] Griswold, M. A., Jakob, P. M., Heidemann, R. M., Nittka, M., Jellus, V., Wang, J., Kiefer, B., and Haase, A., "Generalized autocalibrating partially parallel acquisitions (GRAPPA)," Magnetic Resonance in Medicine (6), 1202–1210 (2002).
[3] Donoho, D. L., "Compressed sensing," IEEE Transactions on Information Theory (4), 1289–1306 (2006).
[4] Lustig, M., Donoho, D. L., Santos, J. M., and Pauly, J. M., "Compressed sensing MRI," IEEE Signal Processing Magazine (2), 72–82 (2008).
[5] Shin, P. J., Larson, P. E., Ohliger, M. A., Elad, M., Pauly, J. M., Vigneron, D. B., and Lustig, M., "Calibrationless parallel imaging reconstruction based on structured low-rank matrix completion," Magnetic Resonance in Medicine (4), 959–970 (2014).
[6] Lee, D., Jin, K. H., Kim, E. Y., Park, S.-H., and Ye, J. C., "Acceleration of MR parameter mapping using annihilating filter-based low rank Hankel matrix (ALOHA)," Magnetic Resonance in Medicine (6), 1848–1864 (2016).
[7] Ren, S., He, K., Girshick, R., and Sun, J., "Faster R-CNN: Towards real-time object detection with region proposal networks," in Advances in Neural Information Processing Systems, 91–99 (2015).
[8] He, K., Gkioxari, G., Dollár, P., and Girshick, R., "Mask R-CNN," in Computer Vision (ICCV), 2017 IEEE International Conference on, 2980–2988, IEEE (2017).
[9] Krizhevsky, A., Sutskever, I., and Hinton, G. E., "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 1097–1105 (2012).
[10] Ronneberger, O., Fischer, P., and Brox, T., "U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention, 234–241, Springer (2015).
[11] He, K., Zhang, X., Ren, S., and Sun, J., "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778 (2016).
[12] Tong, T., Li, G., Liu, X., and Gao, Q., "Image super-resolution using dense skip connections," in Computer Vision (ICCV), 2017 IEEE International Conference on, 4809–4817, IEEE (2017).
[13] Wang, S., Su, Z., Ying, L., Peng, X., Zhu, S., Liang, F., Feng, D., and Liang, D., "Accelerating magnetic resonance imaging via deep learning," in Biomedical Imaging (ISBI), 2016 IEEE 13th International Symposium on, 514–517, IEEE (2016).
[14] Kwon, K., Kim, D., and Park, H., "A parallel MR imaging method using multilayer perceptron," Medical Physics (12), 6209–6224 (2017).
[15] Hammernik, K., Klatzer, T., Kobler, E., Recht, M. P., Sodickson, D. K., Pock, T., and Knoll, F., "Learning a variational network for reconstruction of accelerated MRI data," Magnetic Resonance in Medicine (6), 3055–3071 (2018).
[16] Kang, E., Min, J., and Ye, J. C., "A deep convolutional neural network using directional wavelets for low-dose x-ray CT reconstruction," Medical Physics (10) (2017).
[17] Lee, D., Yoo, J., and Ye, J. C., "Deep residual learning for compressed sensing MRI," in Biomedical Imaging (ISBI 2017), 2017 IEEE 14th International Symposium on, 15–18, IEEE (2017).
[18] Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K. Q., "Densely connected convolutional networks," in CVPR (2017).
[19] Kulkarni, N., Nagesh, P., Gowda, R., and Li, B., "Understanding compressive sensing and sparse representation-based super-resolution," IEEE Transactions on Circuits and Systems for Video Technology (5), 778–789 (2012).
[20] Yao, W., Zeng, Z., Lian, C., and Tang, H., "Pixel-wise regression using U-Net and its application on pansharpening," Neurocomputing (2018).
[21] Li, H., Xu, Z., Taylor, G., Studer, C., and Goldstein, T., "Visualizing the loss landscape of neural nets," in Advances in Neural Information Processing Systems, 6391–6401 (2018).
[22] Ding, P. L. K., Li, B., and Chang, K., "Convex dictionary learning for single image super-resolution," in [ ], 4058–4062 (Sep. 2017).
[23] Li, Y., Ding, P. L. K., and Li, B., "Training neural networks by using power linear units (PoLUs)," arXiv preprint arXiv:1802.00212 (2018).