ShuffleUNet: Super resolution of diffusion-weighted MRIs using deep learning
Soumick Chatterjee, Alessandro Sciarra, Max Dünnwald, Raghava Vinaykanth Mushunuri, Ranadheer Podishetti, Rajatha Nagaraja Rao, Geetha Doddapaneni Gopinath, Steffen Oeltze-Jafra, Oliver Speck, Andreas Nürnberger
∗ Data and Knowledge Engineering Group, Otto von Guericke University Magdeburg, Germany
† Faculty of Computer Science, Otto von Guericke University Magdeburg, Germany
‡ Department of Biomedical Magnetic Resonance, Otto von Guericke University Magdeburg, Germany
§ MedDigit, Department of Neurology, Medical Faculty, University Hospital Magdeburg, Germany
¶ German Centre for Neurodegenerative Diseases, Magdeburg, Germany
‖ Center for Behavioral Brain Sciences, Magdeburg, Germany
∗∗ Leibniz Institute for Neurobiology, Magdeburg, Germany
S. Chatterjee and A. Sciarra contributed equally.
Abstract—Diffusion-weighted magnetic resonance imaging (DW-MRI) can be used to characterise the microstructure of nervous tissue, e.g. to delineate brain white matter connections in a non-invasive manner via fibre tracking. Magnetic Resonance Imaging (MRI) at high spatial resolution would play an important role in visualising such fibre tracts in a superior manner. However, obtaining an image of such resolution comes at the expense of longer scan time, which can be associated with an increase in motion artefacts due to the patient's psychological and physical condition. Single Image Super-Resolution (SISR), a technique aimed at obtaining high-resolution (HR) details from one single low-resolution (LR) input image, achieved with deep learning, is the focus of this study. Compared to interpolation techniques or sparse-coding algorithms, deep learning extracts prior knowledge from large datasets and produces superior MRI images from their low-resolution counterparts. In this research, a deep learning based super-resolution technique is proposed and has been applied to DW-MRI. Images from the IXI dataset were used as the ground truth and were artificially downsampled to simulate the low-resolution images. The proposed method has shown statistically significant improvement over the baselines and achieved an SSIM of . ± . .

Index Terms—super-resolution, deep learning, DWI, DTI, MRI
I. INTRODUCTION
Non-invasive brain imaging techniques have been advancing for the past few decades and are used for detecting various diseases and to study brain anatomy and its functions [1]. Diffusion tensor imaging (DTI) or diffusion-weighted imaging (DWI) is one of the MR techniques employed in diagnosing white matter diseases, cancer or stroke. The proportion of water in the human body is approximately – and it is distributed over intra- and extracellular compartments. Different tissues in the human body exhibit varying diffusion characteristics. DWI is a technique that generates signals based on changes in Brownian motion, providing information regarding the diffusion properties. Axon membranes present in brain white matter limit the molecular movement perpendicular to the fibre, thus resulting in anisotropic diffusion in white matter. DWI uses this property to provide information on white matter integrity and structural details of white matter tracts [2], [3]. Though DWI is advancing rapidly and is used widely in medical diagnosis, obtaining high-resolution scans is rather difficult since it requires longer acquisition times, which may lead to motion artefacts. Single Image Super-Resolution (SISR), a technique aimed at obtaining high-resolution (HR) details from one single low-resolution (LR) input image, is the focus of this study.
A. Related Work
Deep learning has emerged in recent times as one of the most important tools for various applications, and SISR is one of the hot topics in the world of deep learning. Various deep learning based SISR techniques have been developed over the years [4], [5]. UNet [6], [7], originally proposed for image segmentation, has gained popularity in a wide array of applications since its inception, including as a solution to inverse problems such as SISR or MRI reconstruction [8], [9]. Han et al. [10] proposed a tight-frame UNet architecture exploiting wavelet decomposition, which improves the performance of UNet for inverse problems (shown for 2D sparse-view CT). Even though UNet (and its variants) is one of the most common architectures for the task of SISR and other inverse problems, it has its limitations, such as smoothed output and checkerboard artefacts. Aitken et al. [11] attributed such problems to the following components of UNet: the deconvolution layer for upsampling, strided convolutions or max-pooling for downsampling, and random weight initialisation. It has been observed that upsampling from low-resolution feature maps to higher resolution, commonly referred to as the deconvolution step, can produce checkerboard patterns. Sub-pixel convolution [12] is observed to have an upper hand over the other methods in avoiding such artefacts [11]. Max-pooling is conventionally used for the downsampling operations in UNet-based models; it keeps only the maximum value within the given kernel, so pixel information is lost, which is undesirable as there is a high possibility of losing sensitive information. Toutounchi et al. [13] proposed a lossless pooling layer to deal with this blurring problem (loss of information) of the max-pool operation.
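To make this information-loss argument concrete, the following minimal NumPy sketch (an illustration, not code from any of the cited works) contrasts 2×2 max-pooling, which discards three out of every four values, with a pixel-unshuffle rearrangement, which retains all of them as extra channels:

```python
import numpy as np

x = np.arange(16.0).reshape(4, 4)

# 2x2 max-pooling: only 1 of every 4 values survives -> information is lost
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))          # shape (2, 2)

# Pixel unshuffle: the same 16 values are rearranged into 4 channels
# of a 2x2 grid -> the operation is lossless and invertible
unshuffled = x.reshape(2, 2, 2, 2).transpose(1, 3, 0, 2).reshape(4, 2, 2)

print(pooled.size, unshuffled.size)  # 4 16
```

A network downsampling with the second operation can, in principle, recover every input value, which is the motivation behind lossless pooling layers.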
Furthermore, it has been shown that batch normalisation layers (part of the UNet architecture) can negatively impact the result, as they restrict the range flexibility of the networks by normalising the features and also increase the GPU memory requirements by around 40% [14], [15]. One final point to be noted is that deep learning based techniques typically require large training datasets. Employing patch-based techniques (such as patch-based super-resolution: PBSR) might be able to deal with the problem of limited training data [16], while also reducing the GPU memory requirements.

B. Contribution
This research work proposes the ShuffleUNet architecture for 3D volumetric images, inspired by the tight-frame UNet, which addresses the aforementioned problems of blurred or smoothed output and checkerboard artefacts by replacing the strided convolutions with lossless pooling layers (pixel unshuffle), by replacing deconvolution (transposed convolution or interpolation) with a sub-pixel convolution operation (pixel shuffle), and by removing the batch normalisation layers. This paper validates the proposed model by performing patch-based single-image super-resolution of diffusion-weighted images and shows the advantage of such an approach in the visualisation of fibre tracts and other derived data.

II. METHODOLOGY
A. Network Architecture
The proposed ShuffleUNet consists of four blocks in the contraction path, each of which downsamples the input by half in all dimensions. Each block consists of three sub-blocks: double convolution, convolutional decomposition, and pixel unshuffle. The input goes to the double convolution and its output serves as the input of the convolutional decomposition sub-block. The input of this sub-block is provided to each convolution of this block and four different outputs are obtained; these outputs are referred to as the convolutional decomposition of the input of this sub-block. A pseudo-lossless downsampling operation, pixel unshuffle (see Eq. 1), is applied on the fourth output, which downsamples the input by a factor of two in all dimensions, and the rest of the outputs are directly forwarded as skip-connections to the expansion path.

I_D = f_l(I_L) = PU(W_l × f_{l−1}(I_L) + b_l)   (1)

where PU denotes the pixel-unshuffle operation, which rearranges all the elements of a tensor of dimension (n, c, rH, rW, rD) into a tensor of dimension (n, r³ × c, H, W, D), with r being the scaling factor. After the contraction path, one double convolution sub-block is applied to the output of the final pixel unshuffle as the latent convolution. The output of this latent convolution is passed to the expansion path.

Similar to the contraction path, the expansion path also contains four blocks, each of which upsamples the input by a factor of two in all dimensions. Each of these blocks contains three sub-blocks: pixel shuffle, convolutional decomposition, and double convolution. Pixel shuffle sub-blocks upscale the given input by a factor of k (here k = 2) in all dimensions by reducing the number of filters by a factor of 1/k³.
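The pixel-unshuffle rearrangement of Eq. (1) and its inverse, pixel shuffle, can be sketched in NumPy as pure reshape/transpose operations (a minimal illustration assuming the (n, c, H, W, D) layout; the paper's actual layers are implemented in PyTorch and carry learnable parameters):

```python
import numpy as np

def pixel_unshuffle_3d(x, r):
    """(n, c, rH, rW, rD) -> (n, r**3 * c, H, W, D); a pure, lossless rearrangement."""
    n, c, rH, rW, rD = x.shape
    H, W, D = rH // r, rW // r, rD // r
    x = x.reshape(n, c, H, r, W, r, D, r)
    return x.transpose(0, 1, 3, 5, 7, 2, 4, 6).reshape(n, r**3 * c, H, W, D)

def pixel_shuffle_3d(x, k):
    """(n, k**3 * c, H, W, D) -> (n, c, kH, kW, kD); the exact inverse of the above."""
    n, kc, H, W, D = x.shape
    c = kc // k**3
    x = x.reshape(n, c, k, k, k, H, W, D)
    return x.transpose(0, 1, 5, 2, 6, 3, 7, 4).reshape(n, c, k * H, k * W, k * D)

vol = np.random.rand(1, 4, 8, 8, 8)
down = pixel_unshuffle_3d(vol, 2)      # (1, 32, 4, 4, 4): half the size, 8x the channels
restored = pixel_shuffle_3d(down, 2)   # (1, 4, 8, 8, 8): no information was lost
```

Because both operations only rearrange elements, unshuffle followed by shuffle reproduces the input exactly, which is what distinguishes them from max-pooling and deconvolution.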
A tensor of dimensions (n, k³ × c, H, W, D) is upsampled with a periodic shuffling operation in feature space, as explained by the following equation:

I_S = f_l(I_L) = PS(W_l × f_{l−1}(I_L) + b_l)   (2)

where PS denotes the pixel-shuffle operation, which rearranges all the elements of a tensor with dimensions (n, k³ × c, H, W, D) into a tensor with dimensions (n, c, kH, kW, kD), with k being the scaling factor. W_l and b_l denote the weight and bias matrices, f_{l−1} represents the feature maps in the low-resolution feature space from the previous layer, and f_l represents the upsampled feature space after pixel shuffling. The output of the pixel shuffle sub-block is forwarded to the convolutional decomposition sub-block to obtain four different outputs, which are then added to the incoming skip-connections from the same level of the contraction path; finally, these four results are concatenated together. Then, this is further concatenated with the output of the skip-connection coming from the pixel unshuffle operations of the contraction path, which is then forwarded to the double convolution sub-block. The final output of the model is obtained after the fully-connected convolution layer.

The initial number of filters of the network (first convolution layer) is , and the number of features is doubled by each of the contraction path blocks after every downsampling step and is reduced by half in every upsampling step of the expansion path. It is also noteworthy that here both the pixel shuffle and pixel unshuffle operators have learnable parameters. Furthermore, the weight initialisation (convolution, pixel shuffle and pixel unshuffle layers) was performed using a normal distribution (also known as Kaiming Normal) [17], rather than the typical choice of a uniform distribution (Kaiming Uniform), to help convergence and to avoid degrading effects arising from random weight initialisation [11].

B. Implementation
The implementation was done using PyTorch and the network was trained using an Nvidia V100 GPU. The volumes were divided into patches with a patch size of × × with the help of TorchIO [18]. The network was trained for 80 epochs (when it converged) with a batch size of four. The loss was calculated using the L1 loss (mean absolute error) and was optimised using the Adam optimiser with a learning rate of − .

C. Evaluation
Fig. 1. Proposed network architecture: ShuffleUNet

The results of the proposed model and the baselines (sinc interpolation, trilinear interpolation and a UNet model) were compared to the ground-truth data using three metrics: structural similarity index (SSIM) [19], root-mean-square error (RMSE), and universal quality index (UQI) [20]. Furthermore, derived voxel-wise data, such as axial diffusivity (AD), fractional anisotropy (FA), mean diffusivity (MD), and diffusion tensors (E) [21], were obtained using DIPY [22]. These derived data can be utilised to generate the fibre tracts and assess the integrity of the white-matter microstructure [21]. The derived data from the ShuffleUNet results and the baseline results were compared against the derived data obtained from the ground-truth volumes quantitatively using RMSE and UQI.
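As a reference for what these metrics compute, the following NumPy sketch shows minimal global (single-window) variants of RMSE, UQI and SSIM; the study's actual evaluation may use windowed implementations, so this is an illustration of the formulas rather than the exact pipeline:

```python
import numpy as np

def rmse(x, y):
    """Root-mean-square error between two volumes."""
    return float(np.sqrt(np.mean((x - y) ** 2)))

def uqi(x, y):
    """Universal Quality Index (Wang & Bovik, 2002), computed globally."""
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return float(4 * cov * mx * my / ((x.var() + y.var()) * (mx**2 + my**2)))

def ssim_global(x, y, data_range=1.0):
    """Single-window SSIM (Wang et al., 2004) over the whole volume."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return float((2 * mx * my + c1) * (2 * cov + c2)
                 / ((mx**2 + my**2 + c1) * (x.var() + y.var() + c2)))

gt = np.linspace(0.0, 1.0, 8 * 8 * 8).reshape(8, 8, 8)
pred = gt + 0.05   # a hypothetical reconstruction with a constant offset
```

All three are full-reference metrics: identical volumes give RMSE 0 and UQI/SSIM 1, and any deviation moves the scores away from those ideals.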
D. Dataset
For this research, DWIs from the IXI dataset were used, which were acquired with 1.5T and 3T MRI scanners with diffusion directions for healthy subjects. The volumes in the dataset are originally of anisotropic spatial resolutions . × . × . mm and . × . × mm . The 3D volumes were under-sampled by a factor of two in all dimensions using FSL [23] to simulate the low-resolution dataset, resulting in spatial resolutions of . × . × . mm and . × . × . mm respectively, which is . of the original data. UNet-based models require the same size for input and output; hence, the low-resolution dataset was then interpolated using sinc interpolation to achieve the same pixel dimensions as the high-resolution volumes before being supplied as input. The dataset was split into three subsets: training, test and validation, with , and subjects respectively.

III. RESULTS
The results have been quantitatively and qualitatively compared, initially using the actual structural output of the models and then by comparing the derived data obtained.

IXI Dataset: https://brain-development.org/ixi-dataset/
A. Output Evaluation
Fig. 2 portrays the resultant SSIM, RMSE and UQI values for the three baselines (trilinear, sinc and UNet) and for ShuffleUNet, while comparing the structural results against the high-resolution ground-truth volumes. It can be observed that trilinear interpolation resulted in the lowest values for all three metrics ( . ± . , . ± . , . ± . ), while ShuffleUNet outperformed all the baselines ( . ± . , . ± . , . ± . ). It is to be noted that for two of the metrics (RMSE and UQI), UNet achieved the second position after ShuffleUNet, but for SSIM, UNet achieved the third position behind sinc interpolation.

Fig. 2. Metrics comparison of the baseline methods and the proposed ShuffleUNet
B. Evaluation of the Derived Data
Fig. 3 shows the results of the derived data for qualitative analysis and Fig. 5 shows the resultant UQI and RMSE while comparing the derived data obtained from the baseline and ShuffleUNet results against the ground truth. It can be observed from both metrics that the proposed ShuffleUNet outperformed all the baselines for all four types of derived data. Moreover, Fig. 4 shows an example of the fibre tracts obtained from a ShuffleUNet result.

Fig. 3. Comparison of derived data (ShuffleUNet, baseline methods and ground-truth). AD: Axial Diffusivity, FA: Fractional Anisotropy, MD: Mean Diffusivity, and E1 to E6: Diffusion Tensors

Fig. 4. Three-dimensional visualisation of the fibre tracts, generated from the derived data obtained from a ShuffleUNet result
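The derived scalar maps compared here follow standard closed-form definitions in terms of the diffusion-tensor eigenvalues; a minimal NumPy sketch of those definitions (the paper itself obtains these maps with DIPY) is:

```python
import numpy as np

def dti_scalars(evals):
    """AD, MD and FA from sorted eigenvalues (l1 >= l2 >= l3) of a diffusion tensor."""
    l1, l2, l3 = evals
    ad = l1                      # axial diffusivity: the largest eigenvalue
    md = (l1 + l2 + l3) / 3.0    # mean diffusivity: average eigenvalue
    fa = np.sqrt(0.5 * ((l1 - l2) ** 2 + (l2 - l3) ** 2 + (l3 - l1) ** 2)
                 / (l1**2 + l2**2 + l3**2))
    return ad, md, float(fa)

# Isotropic diffusion -> FA = 0; a single dominant direction -> FA = 1
iso = dti_scalars((1.0, 1.0, 1.0))
stick = dti_scalars((1.0, 0.0, 0.0))
```

Because FA is bounded in [0, 1] and is highly sensitive to the eigenvalue spread, errors in the super-resolved volumes propagate directly into these maps, which is why they serve as a stringent downstream evaluation.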
Fig. 5. Quantitative evaluation of the derived data obtained from the ShuffleUNet and the baseline methods

C. Statistical Hypothesis Testing
The significance of the improvements observed with ShuffleUNet was analysed using the independent two-sample t-test, on the basis of the resultant UQI values. Initially, the structural output of the methods was compared, and then the derived data. The resultant p-values obtained while comparing the UQI values of ShuffleUNet against the values of the three baseline methods were always < . . Hence, it can be said that ShuffleUNet achieved significant (p-values < . ) improvements over all the baseline methods explored here, for the structural output and also for the derived data.

IV. CONCLUSION
This paper presents a modified tight-frame UNet architecture, ShuffleUNet, incorporating pixel unshuffle and pixel shuffle operations for improved down- and upsampling capabilities, thereby reducing artefacts introduced by UNet-based processing pipelines. The method has been evaluated for the task of patch-based single-image super-resolution of diffusion-weighted MRIs. Images from the IXI dataset were downsampled by a factor of two in all dimensions, resulting in a theoretical MRI acceleration factor of eight. The proposed model successfully super-resolves such low-resolution images and achieved statistically significant improvements over the baselines. The evaluation metrics obtained on the derived data indicate that, after super-resolving the artificially downsampled images, fibre tracts similar to those of the ground-truth images can be derived. Hence, images can be acquired at those downsampled resolutions, which will reduce the scan time.

ACKNOWLEDGEMENT
This work was partially conducted within the context of the International Graduate School MEMoRIAL at Otto von Guericke University (OVGU) Magdeburg, Germany, kindly supported by the European Structural and Investment Funds (ESF) under the programme "Sachsen-Anhalt WISSENSCHAFT Internationalisierung" (project no. ZS/2016/08/80646). This work was also partially conducted within the context of the Initial Training Network programme, HiMR, funded by the FP7 Marie Curie Actions of the European Commission, grant number FP7-PEOPLE-2012-ITN-316716, and supported by the NIH grant number 1R01-DA021146, and by the State of Saxony-Anhalt under grant number 'I 88'.
REFERENCES

[1] I. Despotović, B. Goossens, and W. Philips, "MRI segmentation of the human brain: challenges, methods, and applications," Computational and Mathematical Methods in Medicine, vol. 2015, 2015.
[2] D. Le Bihan, "Diffusion MRI: what water tells us about the brain," EMBO Molecular Medicine, vol. 6, no. 5, pp. 569–573, 2014.
[3] V. Baliyan, C. J. Das, R. Sharma, and A. K. Gupta, "Diffusion weighted imaging: technique and applications," World Journal of Radiology, vol. 8, no. 9, p. 785, 2016.
[4] K. Zeng, H. Zheng, C. Cai, Y. Yang, K. Zhang, and Z. Chen, "Simultaneous single- and multi-contrast super-resolution for brain MRI images based on a convolutional neural network," Computers in Biology and Medicine, vol. 99, pp. 133–141, 2018.
[5] X. He, Y. Lei, Y. Fu, H. Mao, W. J. Curran, T. Liu, and X. Yang, "Super-resolution magnetic resonance imaging reconstruction using deep attention networks," in Medical Imaging 2020: Image Processing, vol. 11313. International Society for Optics and Photonics, 2020, p. 113132J.
[6] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.
[7] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger, "3D U-Net: learning dense volumetric segmentation from sparse annotation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2016, pp. 424–432.
[8] Z. Iqbal, D. Nguyen, G. Hangel, S. Motyka, W. Bogner, and S. Jiang, "Super-resolution 1H magnetic resonance spectroscopic imaging utilizing deep learning," Frontiers in Oncology, vol. 9, 2019.
[9] C. Sarasaen, S. Chatterjee, M. Breitkopf, G. Rose, A. Nürnberger, and O. Speck, "Fine-tuning deep learning model parameters for improved super-resolution of dynamic MRI with prior-knowledge," arXiv preprint arXiv:2102.02711, 2021.
[10] Y. Han and J. C. Ye, "Framing U-Net via deep convolutional framelets: Application to sparse-view CT," IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1418–1429, 2018.
[11] A. Aitken, C. Ledig, L. Theis, J. Caballero, Z. Wang, and W. Shi, "Checkerboard artifact free sub-pixel convolution: A note on sub-pixel convolution, resize convolution and convolution resize," arXiv preprint arXiv:1707.02937, 2017.
[12] W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang, "Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
[13] F. Toutounchi and E. Izquierdo, "Advanced super-resolution using lossless pooling convolutional networks," in . IEEE, 2019, pp. 1562–1568.
[14] S. Nah, T. Hyun Kim, and K. Mu Lee, "Deep multi-scale convolutional neural network for dynamic scene deblurring," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 3883–3891.
[15] B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee, "Enhanced deep residual networks for single image super-resolution," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017, pp. 136–144.
[16] S. Jain, D. M. Sima, F. Sanaei Nezhad, G. Hangel, W. Bogner, S. Williams, S. Van Huffel, F. Maes, and D. Smeets, "Patch-based super-resolution of MR spectroscopic images: application to multiple sclerosis," Frontiers in Neuroscience, vol. 11, p. 13, 2017.
[17] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1026–1034.
[18] F. Pérez-García, R. Sparks, and S. Ourselin, "TorchIO: a Python library for efficient loading, preprocessing, augmentation and patch-based sampling of medical images in deep learning," arXiv preprint arXiv:2003.04696, 2020.
[19] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
[20] Z. Wang and A. C. Bovik, "A universal image quality index," IEEE Signal Processing Letters, vol. 9, no. 3, pp. 81–84, 2002.
[21] N. Solowij, A. Zalesky, V. Lorenzetti, and M. Yücel, "Chronic cannabis use and axonal fiber connectivity," in Handbook of Cannabis and Related Pathologies. Elsevier, 2017, pp. 391–400.
[22] E. Garyfallidis, M. Brett, B. Amirbekian, A. Rokem, S. Van Der Walt, M. Descoteaux, and I. Nimmo-Smith, "DIPY, a library for the analysis of diffusion MRI data," Frontiers in Neuroinformatics, vol. 8, p. 8, 2014.
[23] M. Jenkinson, C. F. Beckmann, T. E. Behrens, M. W. Woolrich, and S. M. Smith, "FSL,"