ShuffleUNet: Super resolution of diffusion-weighted MRIs using deep learning
Soumick Chatterjee, Alessandro Sciarra, Max Dünnwald, Raghava Vinaykanth Mushunuri, Ranadheer Podishetti, Rajatha Nagaraja Rao, Geetha Doddapaneni Gopinath, Steffen Oeltze-Jafra, Oliver Speck, Andreas Nürnberger
∗ Data and Knowledge Engineering Group, Otto von Guericke University Magdeburg, Germany
† Faculty of Computer Science, Otto von Guericke University Magdeburg, Germany
‡ Department of Biomedical Magnetic Resonance, Otto von Guericke University Magdeburg, Germany
§ MedDigit, Department of Neurology, Medical Faculty, University Hospital Magdeburg, Germany
¶ German Centre for Neurodegenerative Diseases, Magdeburg, Germany
‖ Center for Behavioral Brain Sciences, Magdeburg, Germany
∗∗ Leibniz Institute for Neurobiology, Magdeburg, Germany
S. Chatterjee and A. Sciarra contributed equally.
Abstract—Diffusion-weighted magnetic resonance imaging (DW-MRI) can be used to characterise the microstructure of nervous tissue, e.g. to delineate brain white matter connections in a non-invasive manner via fibre tracking. Magnetic Resonance Imaging (MRI) at high spatial resolution would play an important role in visualising such fibre tracts in a superior manner. However, obtaining an image of such resolution comes at the expense of longer scan time, which can be associated with an increase in motion artefacts due to the patient's psychological and physical condition. Single Image Super-Resolution (SISR), a technique aimed at obtaining high-resolution (HR) details from one single low-resolution (LR) input image, achieved with deep learning, is the focus of this study. Compared to interpolation techniques or sparse-coding algorithms, deep learning extracts prior knowledge from large datasets and produces superior MRI images from their low-resolution counterparts. In this research, a deep learning based super-resolution technique is proposed and has been applied to DW-MRI. Images from the IXI dataset were used as the ground truth and were artificially downsampled to simulate the low-resolution images. The proposed method has shown statistically significant improvement over the baselines and achieved an SSIM of . ± . .

Index Terms—super-resolution, deep learning, DWI, DTI, MRI
I. INTRODUCTION
Non-invasive brain imaging techniques have been advancing for the past few decades and are used for detecting various diseases and to study brain anatomy and its functions [1]. Diffusion tensor imaging (DTI) or diffusion-weighted imaging (DWI) is one of the MR techniques employed in diagnosing white matter diseases, cancer or stroke. The proportion of water in the human body is approximately – and it is distributed over intra- and extracellular compartments. Different tissues in the human body exhibit varying diffusion characteristics. DWI is a technique that generates signals based on changes in Brownian motion, providing information regarding the diffusion properties. Axon membranes present in brain white matter limit the molecular movement perpendicular to the fibre, thus resulting in anisotropic diffusion in white matter. DWI uses this property to provide information on white matter integrity and structural details of white matter tracts [2], [3]. Though DWI is advancing rapidly and is used widely in medical diagnosis, obtaining high-resolution scans is rather difficult since it requires longer acquisition times, which may lead to motion artefacts. Single Image Super-Resolution (SISR), a technique aimed at obtaining high-resolution (HR) details from one single low-resolution (LR) input image, is the focus of this study.
A. Related Work
Deep learning has emerged in recent times as one of the most important tools for various applications, and SISR is one of the hot topics in the world of deep learning. Various deep learning based SISR techniques have been developed over the years [4], [5]. UNet [6], [7], originally proposed for image segmentation, has gained popularity in a wide array of applications since its inception, including as a solution to inverse problems such as SISR or MRI reconstruction [8], [9]. Han et al. [10] proposed a tight-frame UNet architecture exploiting wavelet decomposition, which improves the performance of UNet for inverse problems (shown for 2D sparse-view CT). Even though UNet (and its variants) is one of the most common architectures for the task of SISR and other inverse problems, it has its limitations, such as smoothed output and checkerboard artefacts. Aitken et al. [11] attributed such problems to the following components of UNet: the deconvolution layer for upsampling, strided convolutions or max-pooling for downsampling, and random weight initialisation. It has been observed that upsampling from low-resolution feature maps to higher resolution, commonly referred to as the deconvolution step, can produce checkerboard patterns. Sub-pixel convolution [12] is observed to have an upper hand over the other methods in avoiding such artefacts [11]. Max-pooling is conventionally used for the downsampling operations in UNet-based models; it keeps only the maximum value within the given kernel, so pixel information is lost, which is undesirable as there is a high possibility of losing sensitive information. Toutounchi et al. [13] proposed a lossless pooling layer to deal with this blurring problem (loss of information) of the max-pool operation.
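To make this information-loss argument concrete, the following minimal NumPy sketch (an illustration, not code from any of the cited works) contrasts 2×2 max-pooling, which discards three out of every four values, with a pixel-unshuffle rearrangement, which retains all of them as extra channels:

```python
import numpy as np

x = np.arange(16.0).reshape(4, 4)

# 2x2 max-pooling: only 1 of every 4 values survives -> information is lost
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))          # shape (2, 2)

# Pixel unshuffle: the same 16 values are rearranged into 4 channels
# of a 2x2 grid -> the operation is lossless and invertible
unshuffled = x.reshape(2, 2, 2, 2).transpose(1, 3, 0, 2).reshape(4, 2, 2)

print(pooled.size, unshuffled.size)  # 4 16
```

A network downsampling with the second operation can, in principle, recover every input value, which is the motivation behind lossless pooling layers.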
Furthermore, it has been shown that batch normalisation layers (part of the UNet architecture) can negatively impact the result, as they restrict the range flexibility of the networks by normalising the features and also increase the GPU memory requirements by around 40% [14], [15]. One final point to be noted is that deep learning based techniques typically require large training datasets. Employing patch-based techniques (such as patch-based super-resolution: PBSR) might be able to deal with the problem of limited training data [16], while also reducing the GPU memory requirements.

B. Contribution
This research work proposes the ShuffleUNet architecture for 3D volumetric images, inspired by the tight-frame UNet, which addresses the aforementioned problems of blurred or smoothed output and checkerboard artefacts by replacing the strided convolutions with lossless pooling layers (pixel unshuffle), by replacing deconvolution (transposed convolution or interpolation) with a sub-pixel convolution operation (pixel shuffle), and by removing the batch normalisation layers. This paper validates the proposed model by performing patch-based single-image super-resolution of diffusion-weighted images and shows the advantage of such an approach in the visualisation of fibre tracts and other derived data.

II. METHODOLOGY
A. Network Architecture
The proposed ShuffleUNet consists of four blocks in the contraction path, each of which downsamples the input by half in all dimensions. Each block consists of three sub-blocks: double convolution, convolutional decomposition, and pixel unshuffle. The input goes to the double convolution and its output serves as the input of the convolutional decomposition sub-block. The input of this sub-block is provided to each convolution of this block and four different outputs are obtained; these outputs are referred to as the convolutional decomposition of the input of this sub-block. A pseudo-lossless downsampling operation, pixel unshuffle (see Eq. 1), is applied on the fourth output, which downsamples the input by a factor of two in all dimensions, and the rest of the outputs are directly forwarded as skip-connections to the expansion path.

I_D = f_l(I_L) = PU(W_l × f_{l−1}(I_L) + b_l)   (1)

where PU denotes the pixel-unshuffle operation, which rearranges all the elements of a tensor of dimension (n, c, rH, rW, rD) into a tensor of dimension (n, r³ × c, H, W, D), with r being the scaling factor. After the contraction path, one double convolution sub-block is applied to the output of the final pixel unshuffle as the latent convolution. The output of this latent convolution is passed to the expansion path.

Similar to the contraction path, the expansion path also contains four blocks, each of which upsamples the input by a factor of two in all dimensions. Each of these blocks contains three sub-blocks: pixel shuffle, convolutional decomposition, and double convolution. Pixel shuffle sub-blocks upscale the given input by a factor of k (here k = 2) in all dimensions by reducing the number of filters by a factor of 1/k³.
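The pixel-unshuffle rearrangement of Eq. (1) and its inverse, pixel shuffle, can be sketched in NumPy as pure reshape/transpose operations (a minimal illustration assuming the (n, c, H, W, D) layout; the paper's actual layers are implemented in PyTorch and carry learnable parameters):

```python
import numpy as np

def pixel_unshuffle_3d(x, r):
    """(n, c, rH, rW, rD) -> (n, r**3 * c, H, W, D); a pure, lossless rearrangement."""
    n, c, rH, rW, rD = x.shape
    H, W, D = rH // r, rW // r, rD // r
    x = x.reshape(n, c, H, r, W, r, D, r)
    return x.transpose(0, 1, 3, 5, 7, 2, 4, 6).reshape(n, r**3 * c, H, W, D)

def pixel_shuffle_3d(x, k):
    """(n, k**3 * c, H, W, D) -> (n, c, kH, kW, kD); the exact inverse of the above."""
    n, kc, H, W, D = x.shape
    c = kc // k**3
    x = x.reshape(n, c, k, k, k, H, W, D)
    return x.transpose(0, 1, 5, 2, 6, 3, 7, 4).reshape(n, c, k * H, k * W, k * D)

vol = np.random.rand(1, 4, 8, 8, 8)
down = pixel_unshuffle_3d(vol, 2)      # (1, 32, 4, 4, 4): half the size, 8x the channels
restored = pixel_shuffle_3d(down, 2)   # (1, 4, 8, 8, 8): no information was lost
```

Because both operations only rearrange elements, unshuffle followed by shuffle reproduces the input exactly, which is what distinguishes them from max-pooling and deconvolution.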
A tensor of dimensions (n, k³ × c, H, W, D) is upsampled with a periodic shuffling operation in feature space, as explained by the following equation:

I_S = f_l(I_L) = PS(W_l × f_{l−1}(I_L) + b_l)   (2)

where PS denotes the pixel-shuffle operation, which rearranges all the elements of a tensor with dimensions (n, k³ × c, H, W, D) into a tensor with dimensions (n, c, kH, kW, kD), with k being the scaling factor. W_l and b_l denote the weight and bias matrices, f_{l−1} represents the feature maps in the low-resolution feature space from the previous layer, and f_l represents the upsampled feature space after pixel shuffling. The output of the pixel shuffle sub-block is forwarded to the convolutional decomposition sub-block to obtain four different outputs, which are then added to the incoming skip-connections from the same level of the contraction path; finally, these four results are concatenated together. Then, this is further concatenated with the output of the skip-connection coming from the pixel unshuffle operations of the contraction path, which is then forwarded to the double convolution sub-block. The final output of the model is obtained after the fully-connected convolution layer.

The initial number of filters of the network (first convolution layer) is , and the number of features is doubled by each of the contraction path blocks after every downsampling step and is reduced by half in every upsampling step of the expansion path. It is also noteworthy that here both the pixel shuffle and pixel unshuffle operators have learnable parameters. Furthermore, the weight initialisation (convolution, pixel shuffle and pixel unshuffle layers) was performed using a normal distribution (also known as Kaiming Normal) [17], rather than the typical choice of a uniform distribution (Kaiming Uniform), to help convergence and to avoid degrading effects arising from random weight initialisation [11].

B. Implementation
The implementation was done using PyTorch and the network was trained using an Nvidia V100 GPU. The volumes were divided into patches with a patch size of × × with the help of TorchIO [18]. The network was trained for 80 epochs (when it converged) with a batch size of four. The loss was calculated using the L1 loss (mean absolute error) and was optimised using the Adam optimiser with a learning rate of − .

C. Evaluation
Fig. 1. Proposed network architecture: ShuffleUNet

The results of the proposed model and the baselines (sinc interpolation, trilinear interpolation and a UNet model) were compared to the ground-truth data using three metrics: structural similarity index (SSIM) [19], root-mean-square error (RMSE), and universal quality index (UQI) [20]. Furthermore, derived voxel-wise data, such as axial diffusivity (AD), fractional anisotropy (FA), mean diffusivity (MD), and diffusion tensors (E) [21], were obtained using DIPY [22]. These derived data can be utilised to generate the fibre tracts and assess the integrity of the white-matter microstructure [21]. The derived data from the ShuffleUNet results and the baseline results were compared against the derived data obtained from the ground-truth volumes quantitatively using RMSE and UQI.
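As a reference for what these metrics compute, the following NumPy sketch shows minimal global (single-window) variants of RMSE, UQI and SSIM; the study's actual evaluation may use windowed implementations, so this is an illustration of the formulas rather than the exact pipeline:

```python
import numpy as np

def rmse(x, y):
    """Root-mean-square error between two volumes."""
    return float(np.sqrt(np.mean((x - y) ** 2)))

def uqi(x, y):
    """Universal Quality Index (Wang & Bovik, 2002), computed globally."""
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return float(4 * cov * mx * my / ((x.var() + y.var()) * (mx**2 + my**2)))

def ssim_global(x, y, data_range=1.0):
    """Single-window SSIM (Wang et al., 2004) over the whole volume."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return float((2 * mx * my + c1) * (2 * cov + c2)
                 / ((mx**2 + my**2 + c1) * (x.var() + y.var() + c2)))

gt = np.linspace(0.0, 1.0, 8 * 8 * 8).reshape(8, 8, 8)
pred = gt + 0.05   # a hypothetical reconstruction with a constant offset
```

All three are full-reference metrics: identical volumes give RMSE 0 and UQI/SSIM 1, and any deviation moves the scores away from those ideals.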
D. Dataset
For this research, DWIs from the IXI dataset were used, which were acquired with 1.5T and 3T MRI scanners with diffusion directions for healthy subjects. The volumes in the dataset are originally of anisotropic spatial resolutions . × . × . mm and . × . × mm . The 3D volumes were under-sampled by a factor of two in all dimensions using FSL [23] to simulate the low-resolution dataset, resulting in spatial resolutions of . × . × . mm and . × . × . mm respectively, which is . of the original data. UNet-based models require the same size for input and output; hence, the low-resolution dataset was then interpolated using sinc interpolation to achieve the same pixel dimensions as the high-resolution volumes before being supplied as input. The dataset was split into three subsets: training, test and validation, with , and subjects respectively.

III. RESULTS
The results have been quantitatively and qualitatively compared, initially using the actual structural output of the models and then by comparing the derived data obtained.

IXI Dataset: https://brain-development.org/ixi-dataset/
A. Output Evaluation
Fig. 2 portrays the resultant SSIM, RMSE and UQI values for the three baselines (trilinear, sinc and UNet) and for ShuffleUNet, while comparing the structural results against the high-resolution ground-truth volumes. It can be observed that trilinear interpolation resulted in the lowest values for all three metrics ( . ± . , . ± . , . ± . ), while ShuffleUNet outperformed all the baselines ( . ± . , . ± . , . ± . ). It is to be noted that for two of the metrics (RMSE and UQI), UNet achieved the second position after ShuffleUNet, but for SSIM, UNet achieved the third position behind sinc interpolation.

Fig. 2. Metrics comparison of the baseline methods and the proposed ShuffleUNet
B. Evaluation of the Derived Data
Fig. 3 shows the results of the derived data for qualitative analysis and Fig. 5 shows the resultant UQI and RMSE while comparing the derived data obtained from the baseline and ShuffleUNet results against the ground truth. It can be observed from both metrics that the proposed ShuffleUNet outperformed all the baselines for all four types of derived data. Moreover, Fig. 4 shows an example of the fibre tracts obtained from a ShuffleUNet result.

Fig. 3. Comparison of derived data (ShuffleUNet, baseline methods and ground-truth). AD: Axial Diffusivity, FA: Fractional Anisotropy, MD: Mean Diffusivity, and E1 to E6: Diffusion Tensors

Fig. 4. Three-dimensional visualisation of the fibre tracts, generated from the derived data obtained from a ShuffleUNet result
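The derived scalar maps compared here follow standard closed-form definitions in terms of the diffusion-tensor eigenvalues; a minimal NumPy sketch of those definitions (the paper itself obtains these maps with DIPY) is:

```python
import numpy as np

def dti_scalars(evals):
    """AD, MD and FA from sorted eigenvalues (l1 >= l2 >= l3) of a diffusion tensor."""
    l1, l2, l3 = evals
    ad = l1                      # axial diffusivity: the largest eigenvalue
    md = (l1 + l2 + l3) / 3.0    # mean diffusivity: average eigenvalue
    fa = np.sqrt(0.5 * ((l1 - l2) ** 2 + (l2 - l3) ** 2 + (l3 - l1) ** 2)
                 / (l1**2 + l2**2 + l3**2))
    return ad, md, float(fa)

# Isotropic diffusion -> FA = 0; a single dominant direction -> FA = 1
iso = dti_scalars((1.0, 1.0, 1.0))
stick = dti_scalars((1.0, 0.0, 0.0))
```

Because FA is bounded in [0, 1] and is highly sensitive to the eigenvalue spread, errors in the super-resolved volumes propagate directly into these maps, which is why they serve as a stringent downstream evaluation.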
Fig. 5. Quantitative evaluation of the derived data obtained from the ShuffleUNet and the baseline methods

C. Statistical Hypothesis Testing
The significance of the improvements observed with ShuffleUNet was analysed using the independent two-sample t-test, on the basis of the resultant UQI values. Initially, the structural output of the methods was compared, and then the derived data. The resultant p-values obtained while comparing the UQI values of ShuffleUNet against the values of the three baseline methods were always < . . Hence, it can be said that ShuffleUNet achieved significant (p-values < . ) improvements over all the baseline methods explored here, for the structural output and also for the derived data.

IV. CONCLUSION
This paper presents a modified tight-frame UNet architecture, ShuffleUNet, incorporating pixel unshuffle and pixel shuffle operations for improved down- and upsampling capabilities, thereby reducing artefacts introduced by UNet-based processing pipelines. The method has been evaluated for the task of patch-based single-image super-resolution of diffusion-weighted MRIs. Images from the IXI dataset were downsampled by a factor of two in all dimensions, resulting in a theoretical MRI acceleration factor of eight. The proposed model successfully super-resolves such low-resolution images and achieved statistically significant improvements over the baselines. The evaluation metrics obtained on the derived data indicate that, after super-resolving the artificially downsampled images, fibre tracts similar to those of the ground-truth images can be derived. Hence, images can be acquired at those downsampled resolutions, which will reduce the scan time.

ACKNOWLEDGEMENT
This work was partially conducted within the context of the International Graduate School MEMoRIAL at Otto von Guericke University (OVGU) Magdeburg, Germany, kindly supported by the European Structural and Investment Funds (ESF) under the programme "Sachsen-Anhalt WISSENSCHAFT Internationalisierung" (project no. ZS/2016/08/80646). This work was also partially conducted within the context of the Initial Training Network programme, HiMR, funded by the FP7 Marie Curie Actions of the European Commission, grant number FP7-PEOPLE-2012-ITN-316716, and supported by the NIH grant number 1R01-DA021146, and by the State of Saxony-Anhalt under grant number 'I 88'.
REFERENCES

[1] I. Despotović, B. Goossens, and W. Philips, "MRI segmentation of the human brain: challenges, methods, and applications," Computational and Mathematical Methods in Medicine, vol. 2015, 2015.
[2] D. Le Bihan, "Diffusion MRI: what water tells us about the brain," EMBO Molecular Medicine, vol. 6, no. 5, pp. 569–573, 2014.
[3] V. Baliyan, C. J. Das, R. Sharma, and A. K. Gupta, "Diffusion weighted imaging: technique and applications," World Journal of Radiology, vol. 8, no. 9, p. 785, 2016.
[4] K. Zeng, H. Zheng, C. Cai, Y. Yang, K. Zhang, and Z. Chen, "Simultaneous single- and multi-contrast super-resolution for brain MRI images based on a convolutional neural network," Computers in Biology and Medicine, vol. 99, pp. 133–141, 2018.
[5] X. He, Y. Lei, Y. Fu, H. Mao, W. J. Curran, T. Liu, and X. Yang, "Super-resolution magnetic resonance imaging reconstruction using deep attention networks," in Medical Imaging 2020: Image Processing, vol. 11313. International Society for Optics and Photonics, 2020, p. 113132J.
[6] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.
[7] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger, "3D U-Net: learning dense volumetric segmentation from sparse annotation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2016, pp. 424–432.
[8] Z. Iqbal, D. Nguyen, G. Hangel, S. Motyka, W. Bogner, and S. Jiang, "Super-resolution 1H magnetic resonance spectroscopic imaging utilizing deep learning," Frontiers in Oncology, vol. 9, 2019.
[9] C. Sarasaen, S. Chatterjee, M. Breitkopf, G. Rose, A. Nürnberger, and O. Speck, "Fine-tuning deep learning model parameters for improved super-resolution of dynamic MRI with prior-knowledge," arXiv preprint arXiv:2102.02711, 2021.
[10] Y. Han and J. C. Ye, "Framing U-Net via deep convolutional framelets: Application to sparse-view CT," IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1418–1429, 2018.
[11] A. Aitken, C. Ledig, L. Theis, J. Caballero, Z. Wang, and W. Shi, "Checkerboard artifact free sub-pixel convolution: A note on sub-pixel convolution, resize convolution and convolution resize," arXiv preprint arXiv:1707.02937, 2017.
[12] W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang, "Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
[13] F. Toutounchi and E. Izquierdo, "Advanced super-resolution using lossless pooling convolutional networks," in . IEEE, 2019, pp. 1562–1568.
[14] S. Nah, T. Hyun Kim, and K. Mu Lee, "Deep multi-scale convolutional neural network for dynamic scene deblurring," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 3883–3891.
[15] B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee, "Enhanced deep residual networks for single image super-resolution," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017, pp. 136–144.
[16] S. Jain, D. M. Sima, F. Sanaei Nezhad, G. Hangel, W. Bogner, S. Williams, S. Van Huffel, F. Maes, and D. Smeets, "Patch-based super-resolution of MR spectroscopic images: application to multiple sclerosis," Frontiers in Neuroscience, vol. 11, p. 13, 2017.
[17] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1026–1034.
[18] F. Pérez-García, R. Sparks, and S. Ourselin, "TorchIO: a Python library for efficient loading, preprocessing, augmentation and patch-based sampling of medical images in deep learning," arXiv preprint arXiv:2003.04696, 2020.
[19] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
[20] Z. Wang and A. C. Bovik, "A universal image quality index," IEEE Signal Processing Letters, vol. 9, no. 3, pp. 81–84, 2002.
[21] N. Solowij, A. Zalesky, V. Lorenzetti, and M. Yücel, "Chronic cannabis use and axonal fiber connectivity," in Handbook of Cannabis and Related Pathologies. Elsevier, 2017, pp. 391–400.
[22] E. Garyfallidis, M. Brett, B. Amirbekian, A. Rokem, S. Van Der Walt, M. Descoteaux, and I. Nimmo-Smith, "DIPY, a library for the analysis of diffusion MRI data," Frontiers in Neuroinformatics, vol. 8, p. 8, 2014.
[23] M. Jenkinson, C. F. Beckmann, T. E. Behrens, M. W. Woolrich, and S. M. Smith, "FSL,"