A Tool for Super-Resolving Multimodal Clinical MRI

Mikael Brudfors^a,∗, Yaël Balbastre^a, Parashkev Nachev^b, John Ashburner^a

^a Wellcome Centre for Human Neuroimaging, University College London, London WC1N 3BG, UK
^b Institute of Neurology, University College London, London WC1N 3BG, UK

∗ Corresponding author: Mikael Brudfors, [email protected]
Abstract
We present a tool for resolution recovery in multimodal clinical magnetic resonance imaging (MRI). Such images exhibit great variability, both biological and instrumental. This variability makes automated processing with neuroimaging analysis software very challenging. This leaves intelligence extractable only from large-scale analyses of clinical data untapped, and impedes the introduction of automated predictive systems in clinical care. The tool presented in this paper enables such processing, via inference in a generative model of thick-sliced, multi-contrast MR scans. All model parameters are estimated from the observed data, without the need for manual tuning. The model-driven nature of the approach means that no type of training is needed for applicability to the diversity of MR contrasts present in a clinical context. We show on simulated data that the proposed approach outperforms conventional model-based techniques, and on a large hospital dataset of multimodal MRIs that the tool can successfully super-resolve very thick-sliced images. The implementation is available from https://github.com/brudfors/spm_superres.

Keywords:
Generative modelling, Super-resolution, MRI, Clinical data, Multi-channel total variation
1. Introduction
Automated analysis of magnetic resonance imaging (MRI) data enables studies of the brain that would be either very time consuming, or simply not possible, if performed manually. These automated methods can be used to, e.g., investigate differences in tissue composition between groups of subjects (Ashburner and Friston, 2000), examine changes over time due to ageing and neurodegenerative disorders (Reuter et al., 2012), make predictions between healthy and diseased patients (Klöppel et al., 2008), or delineate to a clinician the location of some pathology (Kamnitsas et al., 2017). Many neuroimaging software packages used for such analysis, e.g., SPM (Ashburner and Friston, 2005), FSL (Zhang et al., 2001) and FreeSurfer (Fischl et al., 2004), are tailored for small, almost isotropic voxels, with high contrast between grey matter (GM) and white matter (WM). As they require more time and high field strengths, these types of features are mostly found in images acquired in a controlled, research context.

MR images acquired in a clinical setting have much greater variability than their research counterparts, as clinicians do not target reproducibility, but aim at maximising sensitivity to a suspected pathology. This variability can be disentangled into instrumental and biological. The instrumental variability stems from: (1) clinicians favouring speed over volume resolution and therefore typically acquiring images with high in-plane resolution, but fewer and thicker slices; and (2) the scans being acquired across a diverse set of MR sequences, sensitive to different tissue properties and, therefore, different pathologies. On the other hand, the biological variability relates to the diverse morphological variability in patient brains due to different clinical conditions, as well as a wide age distribution.
All in all, this leads to a huge diversity in clinical MRI data. A comparison between MR images acquired in a research and a clinical context is shown in Figure 1. Applying automated analysis pipelines, successfully, to images acquired in a routine clinical setting is much more challenging than applying them to research images. For example, the aforementioned software packages were recently evaluated at the task of brain GM and WM volume estimation from isotropic and anisotropic volumes acquired in the same subjects, and results were improved by several percentage points when isotropic volumes were used over thick-sliced ones (Adduru et al., 2017).

Conversely, researchers interested in applying automated analysis methods to MR data trade speed for resolution and therefore tend to acquire high-resolution (HR) volumetric images. This makes large-scale studies, with thousands of subjects, difficult to perform due to the expense of scanning with such parameters. For example, UK BioBank took a decade of planning and acquisition (Ollier et al., 2005), at a cost of more than £60 million (Palmer, 2007). However, such studies can provide great opportunities for making new discoveries about the brain (Miller et al., 2016; Smith and Nichols, 2018; Van Essen et al., 2013). Clinical data does not have this problem, as huge amounts of population-representative scans exist (Roobottom et al., 2010), accumulated over years of clinical service. Furthermore, these scans are available for research at cost neutrality and have a much higher prevalence of pathology, which is of interest if disease is to be investigated rather than normality.

Commonly, for analyses that require isotropic voxels, thick-sliced volumes are upsampled using simple interpolation, even though this can introduce artefacts such as aliasing (Aganj et al., 2012) and bias subsequent analyses. In this paper we propose a method for reconstructing HR images with isotropic voxels from multimodal, low-resolution (LR) clinical scans, based on a principled generative model. Carefully crafted generative models have been shown to generalise well in the case of medical image segmentation (Ashburner and Friston, 2005; Fischl et al., 2004; Zhang et al., 2001), and we here developed such a model for clinical MRI super-resolution. The objective of our model is to merge information that is distributed across all MR scans of a patient, in order to generate closer-to-research quality images from clinical scans. Reconstructing isotropic HR images in such a way could in turn better enable the aforementioned large-scale studies.

Preprint submitted to NeuroImage, September 4, 2019.
The multi-sequence capability of the proposed model stems from a cross-channel functional that has been studied extensively in the computer vision community; in MRI, it has been used for parallel image reconstruction (Chatnuntawech et al., 2016) and was recently introduced for multimodal super-resolution (Brudfors et al., 2018).

The model-based approach we propose in this paper is data agnostic and does not require any complex learning. Learning a representative distribution of the large number of sequences used in clinical MR imaging is extremely challenging and would make the model much less general. Having to train a model anew, each time a new combination of MR contrasts is encountered, or a different scaling factor is needed, would make a general tool much more difficult to develop.

We show on a large number of patient scans that the model is capable of super-resolving very thick-sliced clinical MRIs. The implementation is available from https://github.com/brudfors/spm_superres; we intend to shortly incorporate it into the SPM12 software.

(Footnote: by population-representative we do not mean the general population – rather the population who are likely to have had strokes, tumours, etc.)

(Footnote: we consider the different MR contrasts acquired in a patient as being equivalent to colour channels in computer vision, e.g., RGB.)
2. Super-Resolution in Brain MRI
In image processing, the task of improving the resolution of an image after it has been acquired is known as super-resolution. There are two cardinal ways of super-resolving: the first is to combine information from a multiplicity of images of the same subject, the second is to introduce guidance from a population of other subjects. For both these methods it is common to use some kind of inductive bias given mathematically or conceptually, e.g., regularisation.

Super-resolution was first studied in the computer vision field in the early 80s and has since grown to become an active area of research using tools from, e.g., inverse problems and machine learning (Huang and Yang, 2010; Yue et al., 2016). In the early 2000s, super-resolution found applications in the medical imaging community, in particular for brain MRI (Van Reeth et al., 2012). This was partly due to multi-LR super-resolution's dependency on exact image alignment, where for brain imaging it is possible to obtain good alignment simply by rigid registration. It was furthermore shown that super-resolution improved resolution and signal-to-noise ratio favourably, compared with direct HR acquisition (Plenge et al., 2012).

The earliest works investigating super-resolution applied to brain MR images defined an observation model that did not take into account multiple MR contrasts, but rather specified how a set of LR images, of the same contrast, were generated from a HR image (Greenspan et al., 2001; Peled and Yeshurun, 2001). The basic idea was that it should be more beneficial (with respect to the trade-off between acquisition time and signal-to-noise ratio) to acquire multiple LR images, and then combine them using super-resolution, than to acquire one HR image. The observation model was constructed by linear operations (shift, down-sampling, blurring) and additive Gaussian noise. A maximum likelihood (ML) estimate of the unknown HR image could then be found by a gradient descent style algorithm.
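The shift/blur/down-sample observation model and its ML solution can be made concrete with a toy 1D sketch (a hedged illustration, not any of the cited implementations; the symmetric blur kernel, the phase-shifted decimation and the fixed step size are all assumptions of this sketch):

```python
import numpy as np

# Symmetric blur kernel: with 'same' convolution this operator is
# self-adjoint, which keeps the gradient computation simple.
KERNEL = np.array([0.25, 0.5, 0.25])

def observe(y, phase, factor=2):
    """A_i y: blur the HR signal, then keep every `factor`-th sample
    starting at `phase` (a shifted, down-sampled LR observation)."""
    return np.convolve(y, KERNEL, mode="same")[phase::factor]

def adjoint(x, phase, n, factor=2):
    """A_i^T x: zero-fill at the sampled positions, then blur again."""
    up = np.zeros(n)
    up[phase::factor] = x
    return np.convolve(up, KERNEL, mode="same")

def ml_reconstruct(xs, phases, n, iters=300, step=1.0):
    """Gradient descent on the ML objective sum_i 0.5*||x_i - A_i y||^2."""
    y = np.zeros(n)
    for _ in range(iters):
        grad = np.zeros(n)
        for x, p in zip(xs, phases):
            grad += adjoint(observe(y, p) - x, p, n)
        y -= step * grad
    return y
```

With two LR observations at complementary phases, the data misfit of the reconstruction shrinks towards zero, illustrating why multiple LR acquisitions can substitute for one HR acquisition.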
Figure 1: Comparing research and clinical MRI. The top row shows PD-weighted (PDw), T1-weighted (T1w) and T2-weighted (T2w) MR images of a subject from the high-resolution IXI dataset, which is used in the simulated results section of this paper. The three scans have close to 1 mm isotropic voxels. The bottom row shows FLAIR (coronal), T1w (sagittal) and T2w (axial) MR images of a subject from the clinical dataset we use in our evaluation. This subject's scans have thick slices (6.5 mm), with different slice-select directions, and partial brain coverage.

(Footnote: the term super-resolution can also be used to describe techniques attempting to transcend the diffraction limit of an optical system, e.g., super-resolution microscopy.)

These early methods were limited to handling scans of the same orientation, differing only by subpixel offsets. Shilling et al. (2009) extended the observation-model-based methods to handle multiple LR images of different slice-select directions. The work of Poot et al. (2010) subsequently generalised this approach to images not necessarily rotated around a common frequency encoding axis, allowing for any slice orientation. They additionally solved the optimisation problem efficiently using a conjugate gradient algorithm.

The ML methods were quickly extended to include some form of regularisation via maximum a posteriori (MAP) estimation, in order to reduce the ill-posedness of the solution, as well as incorporating prior knowledge (e.g., that neighbouring voxels should look similar). The work of Peeters et al. (2004), Kainz et al. (2015) and Rousseau et al. (2010b) did so using edge-preserving functionals, such as the Huber function or anisotropic diffusion, to regularise their solutions, while Zhang et al. (2008) and Poot et al. (2010) used the Euclidean norm of the image gradients. In Bai et al. (2004) a Markov random field was used, and in Shi et al. (2015) two sorts of regularisation were combined: the edge-preserving total variation and a low-rank term that enabled utilisation of information throughout the image. Also Tourbier et al. (2015) used total variation as a regulariser, with an adaptive regularisation scheme. Besides including some sort of regularisation in the objective function, replacing its mean-squared error norm has also been explored, with the aim of increased robustness to incorrect noise models (Gholipour et al., 2010).

The methods discussed up until this point are non-optimal for processing clinical MRIs, as they assume multiple LR images of the same contrast and hence do not utilise the fact that routine clinical scans often are of different contrasts. A method utilising information from a HR reference image of a different contrast was first proposed by Rousseau et al. (2010a).
The super-resolved image was obtained by iteratively denoising the current estimate of the super-resolved image with a patch-based technique, using the HR image as a reference, and then solving an optimisation problem, where the denoised image regularised the solution. The patch-based work of Manjón et al. (2010a) used the information from a HR reference in a similar way, but instead of solving for the super-resolved image in an optimisation setting, they used an iterative reconstruction-correction scheme. Another patch-based approach estimated weights from a HR image of a given contrast and then regressed a HR image from a LR image with a different contrast (Zheng et al., 2017). Numerous other works on super-resolving MRIs have used these patch-based approaches, utilising the pattern redundancy present in image patches (Coupé et al., 2013; Jog et al., 2014; Manjón et al., 2010b; Plenge et al., 2013). However, although these methods enable super-resolving across MR channels, they require access to HR data. Such data is seldom available in a clinical setting. To mitigate this problem, methods have been developed that utilise the property that clinical MR images are inherently anisotropic to learn a regression between LR and HR images (Jog et al., 2016; Zhao et al., 2018); however, these use only a single contrast.

Data-driven (or learning-based) methods have also been thoroughly investigated for the task of super-resolving brain MRIs. These methods aim at learning a LR to HR mapping from training data. One of the first such methods was proposed by Rueda et al. (2013), using sparse dictionary learning. Also, regression-based techniques have been explored, such as patch-based random forest regression (Alexander et al., 2014; Sindel et al., 2018).
Convolutional neural networks are another class of supervised methods that has gained interest in MRI super-resolution (Cengiz et al., 2017; Pham et al., 2017; Tanno et al., 2017). Deep generative models have also been used to directly learn a LR to HR distribution for super-resolution recovery (Chen et al., 2018). To combat the fact that HR training data is difficult to obtain, Dalca et al. (2018) proposed a generative model for sparse image patches, where LR clinical scans were used as training data. They additionally dealt with missing data in an elegant way. However, to the best of our knowledge, there is not yet a learning-based super-resolution tool useable without the need for training anew when faced with unseen contrasts.
3. Methods
Compared with some of the super-resolution approaches mentioned in the previous section, the model we propose here is multimodal, and does not require any complex learning, nor any HR reference data. We additionally avoid any learning from training data with the hope of improved generalisability when applying the model to diverse clinical MR scans. The model instead relies on the definition of a joint distribution of a patient's multimodal MR images. A generative model that integrates multimodal images requires a component that relates signal across the various modalities. Here, this component is a novel prior in the context of MRI super-resolution, which promotes combining image information distributed across a patient's MR images. In this section, we consider that each MR contrast (e.g., T1w, T2w, PDw) constitutes one channel of a multi-channel volume. When multiple images of the same contrast are acquired, they are considered as multiple noisy observations of the same channel.

Figure 2: Graphical model of the joint probability distribution in equation (1). Random variables are in circles, observed variables are shaded, plates indicate replication. There are C unknown HR images (y_c) and I_c observed LR images (x_ci) for each c. Model parameters are dotted and are the Gaussian noise precisions τ_ci, projection matrices A_ci, and regularisation parameters λ_c.

The model assumes that each LR image is generated by selecting thick slices, arbitrarily rotated and/or translated, from a HR image, with the addition of random noise. This assumption can be written as a conditional probability distribution known as the data likelihood. We also assume that each HR image is the result of a random process, characterised by a probability distribution known as the prior. This generative model is formalised by the joint probability distribution:

    p(X, Y) = p(X | Y) p(Y) = ∏_{c=1}^{C} ∏_{i=1}^{I_c} p(x_ci | y_c) p(Y),    (1)

where p(X | Y) is the likelihood and p(Y) the prior, Y = {y_c}_{c=1}^{C}, with y_c ∈ R^N, denotes the unknown HR images of C different channels, and X = {{x_ci}_{i=1}^{I_c}}_{c=1}^{C}, with x_ci ∈ R^{N_ci}, denotes a set of LR images.
The variable I_c is the number of observed LR images of channel c, N is the number of voxels in the HR images, and N_ci is the number of voxels in the i-th LR image of the c-th channel. Note that we allow for some values of observed LR voxels to be assumed missing (i.e., Not a Number), which enables these values to be filled in during model fitting. Prior to super-resolving a set of MRIs we perform a rigid registration of the observed data to a common reference using the spm_coreg routine of SPM12.

We cast estimating the HR images (Y), given a set of observed LR images (X), as MAP estimation in the joint probability distribution defined by equation (1):

    p(Y | X) = p(X | Y) p(Y) / p(X)
    ⇒ argmin_Y {−ln p(Y | X)} = argmin_Y {−ln p(X | Y) p(Y)}.    (2)

For the model to generalise, we estimate its parameters from either the observed data (likelihood and prior hyper-parameters) or set them in a general and consistent way (projection matrices). A graphical representation of the generative model is shown in Figure 2. As, in practice, there is rarely more than one observation of the same channel, we will from now on drop the summations over i and assume only one observation of each channel. All derivations stay the same, except for additional summations over conditional distributions of LR images. The individual components of our generative model will be further explained in the next three sections.

The likelihood function should describe the data generating process of LR images (x_c) from unknown HR images. Its main component is a deterministic projection matrix (A_c) that encodes the slice-selection parameters (orientation, thickness, gap, profile) of an LR image. It is a linear operator that, when applied to the corresponding HR image, creates a noiseless LR version.
The second component of the generative process encodes acquisition noise. As MR images are usually reconstructed as the magnitude of an image that was originally complex – and Gaussian noise on complex data gives a Rice distribution in the magnitude image – a Rician noise model would be suitable (Aja-Fernández and Tristán-Vega, 2013). However, it has been shown that the mathematically more tractable Gaussian distribution closely approximates the true Rician noise distribution in MRI (Gudbjartsson and Patz, 1995). We therefore assume the following forward model:

    x_c = A_c y_c + ε,    ε ~ N(0, τ_c^{-1} I),    (3)

so that the conditional distribution of an observed LR image, given an unknown HR image, is a multivariate Gaussian distribution:

    p(x_c | y_c) = N(x_c | A_c y_c, τ_c^{-1} I)
                 = τ_c^{N_c/2} / (2π)^{N_c/2} exp(−(τ_c/2) ‖x_c − A_c y_c‖²),    (4)

where τ_c is the precision of the noise of LR image c and A_c ∈ R^{N_c×N} is the linear operator mapping from HR to LR space. Dropping all terms that do not depend on y_c, the negative log-likelihood can be written as:

    −ln p(x_c | y_c) = (τ_c/2) ‖x_c − A_c y_c‖² + const.    (5)

The multivariate Gaussian distribution in equation (4) is a likelihood function that is already well-established in the super-resolution literature (Greenspan et al., 2001; Poot et al., 2010; Shilling et al., 2009).

(Footnote: a more elegant way of handling registration would be to include it inside the generative model, such that fitting the model would optimise also some registration parameters; see Discussion.)

(Footnote: note that multi-coil MR images generally do not have Rician noise because of the way the images are reconstructed. The Rician assumption is therefore good for older MR scanners, but becomes a bit of an approximation for more modern systems.)

The prior probability should encode our belief about the unknown HR images (Y). Many types of priors have been devised for image reconstruction problems. The most popular alternative is perhaps the Tikhonov (or ℓ2) prior, which penalises the squared norm of some image features:

    p(y_c) = N(y_c | 0, (λ_c L)^{-1}),    (6)

where λ_c is a channel-specific regularisation parameter and the precision matrix (L) is designed so as to induce correlations between image voxels. This type of prior probability favours images that are smooth when the precision matrix encodes some differential operator, L = DᵀD, so that:

    −ln p(y_c) = (λ_c/2) ‖D y_c‖² + const.    (7)

If the differential operator encodes a first-order derivative, then the negative log-likelihood of this prior is known as a first-order Tikhonov regulariser. The differential operator has dimension D ∈ R^{NG×N}, where G is the number of differential features.

The underlying assumption of the Tikhonov prior is that the gradients in the HR image have a Gaussian distribution.

Figure 3: Empirical investigation of the distribution of MR image gradients. By computing the x-, y- and z-gradients of a MR image, and fitting a Gaussian (yellow) and a Laplace distribution (red) to the histogram of these gradients (blue), it can be seen that the Laplace distribution more accurately captures the empirical distribution of MR image gradients.
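A minimal numerical sketch of equations (3) and (5): the `apply_A` stand-in projection below, a plain blur-and-decimate along the slice axis, is an assumption of this sketch, not the tool's projection matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

def apply_A(y, factor=2):
    """Stand-in A_c: average `factor` consecutive slices along axis 0,
    which both blurs and decimates the HR image."""
    return np.add.reduceat(y, np.arange(0, y.shape[0], factor), axis=0) / factor

def neg_log_likelihood(x, y, tau, factor=2):
    """Eq. (5): (tau/2) * ||x - A y||^2, with the constant dropped."""
    r = (x - apply_A(y, factor)).ravel()
    return 0.5 * tau * (r @ r)

# Simulate eq. (3): x = A y + eps, eps ~ N(0, tau^{-1})
tau = 100.0
y = rng.standard_normal((8, 4))
x = apply_A(y) + rng.normal(0.0, tau ** -0.5, size=(4, 4))
```

Under this likelihood, the true HR image scores a much lower negative log-likelihood than an unrelated candidate, which is what drives the data term during fitting.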
However, by studying the empirical gradient distribution of a high-resolution, noise-free MR image, we see that a Laplace distribution is more suitable (see Figure 3). Discarding terms that do not depend on y_c, this gives the following prior distribution:

    p(y_c) ∝ ∏_{n=1}^{N} exp(−λ_c ‖D_n y_c‖),    (8)

    −ln p(y_c) = λ_c ∑_{n=1}^{N} ‖D_n y_c‖ + const,    (9)

where λ_c is the inverse of the scale parameter of the Laplace distribution, and the operator D_n ∈ R^{G×N} returns the gradients at the n-th data point of y_c. The log of the Laplace distribution in equation (9) is known as (isotropic) total variation (TV) and is another popular method for regularising image reconstruction problems (Rudin et al., 1992). Rather than favouring smooth reconstructions, TV retains edges and therefore leads to less blurry results. Here, to avoid biasing the reconstruction, we extract both the forward and backward first-order finite differences along each dimension, giving G = 6 in 3D.

If we assume that the unknown HR images follow the distribution in equation (8), then we do not have any dependencies between channels. Clearly this assumption is false for MR images of the same subject, where most edges should be shared between channels. We therefore propose using the multi-channel total variation (MTV) functional (Sapiro and Ringach, 1996) as our prior probability:

    p(Y) ∝ ∏_{n=1}^{N} exp(−√(∑_{c=1}^{C} ‖λ_c D_n y_c‖²)),    (10)

    −ln p(Y) = ∑_{n=1}^{N} √(∑_{c=1}^{C} ‖λ_c D_n y_c‖²) + const.    (11)

The summation over channels (C) inside of the square root ensures that the channels are 'mixed', making the assumption that the MR images have large smooth regions and a few sharp edges, in similar places. Note that TV is a special case of MTV, when C = 1.
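The MTV functional in equation (11) is easy to evaluate directly; the sketch below is an illustration only, using forward differences (so G equals the number of dimensions) rather than the paper's forward-and-backward G = 6, with TV recovered as the C = 1 special case:

```python
import numpy as np

def mtv_penalty(ys, lams):
    """Eq. (11): sum_n sqrt( sum_c || lam_c * grad_n y_c ||^2 ).

    ys:   list of C channel images, all of the same shape
    lams: list of C regularisation parameters lam_c
    """
    sq = np.zeros(ys[0].shape)
    for y, lam in zip(ys, lams):
        for ax in range(y.ndim):
            d = np.diff(y, axis=ax, append=0)  # forward finite difference
            sq += (lam * d) ** 2               # accumulate over channels/axes
    return np.sqrt(sq).sum()                   # mix channels inside the sqrt

def tv_penalty(y, lam):
    """TV is the C = 1 special case of MTV."""
    return mtv_penalty([y], [lam])
```

Because √(a + b) ≤ √a + √b pointwise, the mixed penalty of two channels is never larger than the sum of their individual TV penalties: shared edges are rewarded, which is exactly the coupling the prior is designed to provide.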
Defining the generative model has given us a set of parameters: the projection matrices (A), noise precisions (τ) and prior parameters (λ). These parameters are all image-specific and it is critical that we set them in a principled way for our model to generalise – it is not feasible to expect users to do manual tuning. This section gives more detail about how the model parameters were chosen.

Projection matrices: The projection matrix (A_c) is a linear mapping from HR to LR image space. It should reproduce the slice-selection process of the MRI scanner. In this work we design the projection matrix as three linear operators, applied in succession as:

    A_c = R_c S_c T_c.    (12)

The operator T_c resamples from the HR image's field of view (FOV) to the LR image's FOV, but keeping the voxel size of the HR image. If the slice orientation is at an angle with respect to the HR grid, the corresponding rotation is accounted for. Furthermore, to improve numerical properties of the projection operator, we slightly increase the FOV of the resampled image, which is then adjusted for in the R_c operator.

The operator S_c should simulate the slice profile of the MRI acquisition. The slice profile depends on the shape of the radio-frequency (RF) pulse applied during slice selection. This RF pulse can vary a lot from one sequence to the next. Therefore, there is not a single slice profile that suits all acquisitions (Liu et al., 2002). Here, we make the assumption that the slice profile is Gaussian. Applying the T_c operator before the Gaussian convolution ensures that the kernel is applied in the correct directions. We set the full width at half maximum (FWHM) of the Gaussian kernel to zero in the in-plane directions and to the width of the thick slice in the thick-slice direction. We additionally modulate the thick-slice direction FWHM by subtracting a slice gap. We computed an estimate of this slice gap from 29,026 patients' MRIs. The estimate we obtain for the slice gap is one third of the width of the slice thickness. This information could be read from the DICOM header of the images. But as our tool works on NIfTI data, this information may not be present, as it can be lost during DICOM-to-NIfTI conversion. The slice gap can however be provided, when available.

The operator R_c performs the down-sampling operation from HR to LR space.

Figure 4: Illustration of the super-resolution forward model (A_c). The operator T_c resamples from the HR image's FOV to the LR image's FOV, keeping the voxel size of the HR image. The operator S_c simulates the slice profile of the MRI acquisition. The operator R_c performs the down-sampling operation from HR to LR space.
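Under the simplifying assumptions of an axis-aligned LR scan (so T_c reduces to the identity) and a Gaussian slice profile, the S_c and R_c steps might be sketched as follows (the function and argument names are hypothetical, and the one-third slice-gap fraction is the estimate quoted above):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def project(y_hr, vox_hr, slice_thickness, slice_axis=0, gap_frac=1.0 / 3.0):
    """Sketch of S_c then R_c for an axis-aligned LR scan.

    S_c: Gaussian slice profile; FWHM is zero in-plane and equals the slice
         thickness minus the slice gap along the slice-select axis.
    R_c: decimation along the slice-select axis.
    """
    fwhm = slice_thickness * (1.0 - gap_frac)       # modulate FWHM by slice gap
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # FWHM -> std. deviation
    sigma_vox = sigma / vox_hr                      # express in HR voxel units
    blurred = gaussian_filter1d(y_hr, sigma_vox, axis=slice_axis)
    step = int(round(slice_thickness / vox_hr))     # LR slices per HR voxels
    sl = [slice(None)] * y_hr.ndim
    sl[slice_axis] = slice(step // 2, None, step)   # sample slice centres
    return blurred[tuple(sl)]
```

A constant HR volume should map to a constant LR volume (the Gaussian kernel integrates to one), which is a quick sanity check on any implementation of S_c.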
The process of applying A_c to a HR image is illustrated in Figure 4.

Noise precision:
MR scans are usually reconstructed as the magnitude of an image that was originally complex. The Gaussian noise model in equation (3) is therefore just an approximation to a Rician noise model. Hence, we want to estimate the amount of Rician noise in each observed LR image (τ_c). We do so by fitting a mixture of two Rician distributions to the intensity histogram of each MR scan (Ashburner and Ridgway, 2013). We then calculate the precision of the image noise (τ_c) from the class with the smallest noncentrality parameter, which should correspond to air. An example fit is shown in Figure 5a.

(Footnote: our implementation of the projection matrix, however, gives a user the option to easily change the slice profile assumption by simply changing the convolution kernel.)

(Footnote: from the DICOM representation of a MR image, whose header contains both the fields Spacing Between Slices (0018, 0088) and Slice Thickness (0018, 0050), the slice gap can be computed as the Spacing Between Slices minus the Slice Thickness.)
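A drastically simplified stand-in for this estimate (not the paper's two-class Rician mixture fit) uses the fact that, in air, the magnitude of complex Gaussian noise is approximately Rayleigh distributed with mean σ√(π/2); the half-the-mean air threshold is an assumption of this sketch:

```python
import numpy as np

def estimate_tau(img):
    """Crude noise-precision estimate: treat voxels darker than half the mean
    intensity as air, assume their values are Rayleigh distributed, and invert
    mean = sigma * sqrt(pi / 2) to recover the noise standard deviation."""
    air = img[img < 0.5 * img.mean()]
    sigma = air.mean() / np.sqrt(np.pi / 2.0)
    return 1.0 / sigma ** 2  # precision tau_c = 1 / sigma^2
```

On a synthetic image of Rayleigh background plus bright tissue, this recovers the noise level to within a modest bias (the threshold clips the Rayleigh upper tail), which is adequate for illustration.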
Regularisation parameters: Each unknown HR image has a corresponding regularisation parameter (λ_c). First, note that gradient values are, in general, correlated with intensity values; the regularisation parameter should therefore be set with respect to the mean intensity in a channel. We observed empirically a linear relationship between the standard deviation of gradient magnitudes σ_c and the mean tissue intensity of an image μ_c. We computed the mean tissue intensity of 1,728 scans from the IXI dataset by fitting a two-class Rician distribution and taking the noncentrality parameter of its non-air component, and regressed the standard deviation of the first-order gradients against it. This gave us the value of the coefficient k_λ = σ/μ; the fit is shown in Figure 5b. Since the variance of a Laplace distribution relates to the scale parameter through σ² = 2/λ², we can set the parameter λ_c according to:

    λ_c = √2 / σ_c = √2 / (k_λ μ_c).    (13)

Figure 5: Estimating model parameters for the super-resolution model. (a) A two-class Rician mixture model was fit to the intensity histogram of each MR image. The image noise was computed as the variance of the air class. Note that image intensities in MRI are non-quantitative. (b) A straight line was fit between the mean tissue intensities and the standard deviations of image gradients, computed from the IXI dataset; its coefficient gives us the value of the parameter k_λ.

With all the individual components of the model defined, the negative log posterior probability can be written as:

    −ln(p(X | Y) p(Y)) = ∑_{c=1}^{C} (τ_c/2) ‖x_c − A_c y_c‖²
                       + ∑_{n=1}^{N} √(∑_{c=1}^{C} ‖λ_c D_n y_c‖²) + const,    (14)

where the first term on the right-hand side is known as the data term and the second term as the penalty term. The expression in equation (14) is what we want to minimise to obtain the C super-resolved images:

    argmin_Y {−ln p(X | Y) p(Y)}.    (15)

This optimisation problem is hard to solve because the MTV penalty term is non-differentiable (nonsmooth). It is, however, convex with a global optimum.

A change of variables allows the unconstrained minimisation problem in equation (15) to be rewritten as a constrained minimisation:

    min_Y ∑_{c=1}^{C} (τ_c/2) ‖x_c − A_c y_c‖² + ∑_{n=1}^{N} √(∑_{c=1}^{C} ‖z_nc‖²)
    s.t. λ_c D_n y_c = z_nc, for all n and c,    (16)

where the smooth data term and the nonsmooth penalty term have been decoupled. The constrained form of equation (15) can now be solved efficiently by an algorithm known as the alternating direction method of multipliers (ADMM).

ADMM is part of a class of optimisation methods called proximal algorithms (Boyd et al., 2011). In short, an augmented Lagrangian is formulated from a constrained minimisation problem. This Lagrangian is then minimised in an alternating fashion until a convergence criterion is met. Many methods exist for solving TV problems.
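Equations (13) and (14) can be sketched numerically as follows (a hedged illustration: the above-median proxy for the tissue mean, the forward-difference gradients, and all function names are assumptions of this sketch, not the paper's estimators):

```python
import numpy as np

def set_lambda(img, k_lam):
    """Eq. (13): lambda_c = sqrt(2) / sigma_c = sqrt(2) / (k_lam * mu_c).
    mu_c is crudely taken as the mean of above-median voxels here; the paper
    instead uses the noncentrality parameter of a Rician mixture fit."""
    mu = img[img > np.median(img)].mean()
    return np.sqrt(2.0) / (k_lam * mu)

def map_objective(xs, ys, apply_As, taus, lams):
    """Eq. (14): data term sum_c (tau_c/2)||x_c - A_c y_c||^2 plus the MTV
    penalty sum_n sqrt(sum_c ||lam_c * grad_n y_c||^2)."""
    data = 0.0
    for x, y, A, tau in zip(xs, ys, apply_As, taus):
        r = (x - A(y)).ravel()
        data += 0.5 * tau * (r @ r)
    sq = np.zeros(ys[0].shape)
    for y, lam in zip(ys, lams):
        for ax in range(y.ndim):
            sq += (lam * np.diff(y, axis=ax, append=0)) ** 2
    return data + np.sqrt(sq).sum()
```

Note that λ_c scales inversely with image intensity, so rescaling a channel's intensities by a factor simply divides its regularisation parameter by the same factor, keeping the penalty balanced across channels.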
We chose an ADMM algorithm because it is straightforward to implement and suitable for the type of optimisation problem in equation (15).

Deriving the general form of the ADMM algorithm starts with an unconstrained minimisation problem:

    min_y data(y) + penalty(y),    (17)

which is equivalent to the constrained problem:

    min_{y,z} data(y) + penalty(z)  s.t.  F_d y + F_p z = d,    (18)

with constraints defined by F_d, F_p and d. Note how the optimisation problem in equation (18) has the same form as the super-resolution minimisation in equation (16). From equation (18), the augmented Lagrangian can be formulated as:

    L_ρ(y, z, w) = data(y) + penalty(z) + wᵀ(F_d y + F_p z − d) + (ρ/2) ‖F_d y + F_p z − d‖²,    (19)

where w ∈ R^P holds the Lagrange multipliers and ρ > 0 is a descent step-size. From the augmented Lagrangian in equation (19), the ADMM updates are given as:

    y^{k+1} := argmin_y L_ρ(y, z^k, w^k),    (20)
    z^{k+1} := argmin_z L_ρ(y^{k+1}, z, w^k),    (21)
    w^{k+1} := w^k + ρ (F_d y^{k+1} + F_p z^{k+1} − d),    (22)

where equation (20) and equation (21) are known as the proximal operators at parameter ρ for the data and penalty term, respectively. These ADMM updates are usually iterated until some convergence criterion is fulfilled.

In this section we derive the ADMM updates for our super-resolution model. We will work with two tensors Z and W, which are both of dimensions N × C × G. We extract column vectors from these tensors, where the subscript of a vector indicates its length: z_n = vec(Z[n, :, :]), z_c = vec(Z[:, c, :]), z_nc = vec(Z[n, c, :]), w_n = vec(W[n, :, :]) and w_c = vec(W[:, c, :]).
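The generic updates (20)-(22) can be illustrated on a small problem of the same form as equation (18), here an ℓ1-regularised least-squares toy with F_d = I, F_p = −I and d = 0 (an illustration of the ADMM scaffold, not the paper's super-resolution solver):

```python
import numpy as np

def soft(v, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_lasso(A, x, alpha, rho=1.0, iters=100):
    """ADMM for min_y 0.5*||A y - x||^2 + alpha*||y||_1, split as
    data(y) + penalty(z) s.t. y - z = 0 (so F_d = I, F_p = -I, d = 0)."""
    n = A.shape[1]
    y, z, w = np.zeros(n), np.zeros(n), np.zeros(n)
    AtA, Atx = A.T @ A, A.T @ x
    H = np.linalg.inv(AtA + rho * np.eye(n))   # fine for small toy problems
    for _ in range(iters):
        y = H @ (Atx - w + rho * z)            # eq. (20): data proximal step
        z = soft(y + w / rho, alpha / rho)     # eq. (21): penalty proximal step
        w = w + rho * (y - z)                  # eq. (22): multiplier ascent
    return z
```

With A = I the exact minimiser is the soft-thresholded data, so the iteration can be checked against a known answer; the super-resolution model replaces the quadratic data term with equation (25) and the ℓ1 penalty with the channel-mixed ℓ2 norm of equation (26).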
With the notation introduced, we formulate the augmented Lagrangian in equation (19) from equation (16) as:

L_\rho(Y, Z, W) = \sum_{c=1}^{C} \frac{\tau_c}{2} \lVert x_c - A_c y_c \rVert_2^2 + \sum_{n=1}^{N} \lVert z_n \rVert_2 + \sum_{c=1}^{C} w_c^T (\lambda_c D y_c - z_c) + \frac{\rho}{2} \sum_{c=1}^{C} \lVert \lambda_c D y_c - z_c \rVert_2^2, \quad (23)

where we have made use of the fact that, for a fixed n:

\sqrt{\sum_c \lVert z_{nc} \rVert_2^2} = \sqrt{\sum_c \sum_g (z_{ncg})^2} = \lVert z_n \rVert_2. \quad (24)

From the augmented Lagrangian in equation (23) the ADMM updates can be derived as:

y_c^{k+1} := \operatorname*{argmin}_{y_c} \left( \frac{\tau_c}{2} \lVert x_c - A_c y_c \rVert_2^2 + \lambda_c (w_c^k)^T D y_c + \frac{\rho}{2} \left\lVert \lambda_c D y_c - z_c^k \right\rVert_2^2 \right), \quad (25)

z_n^{k+1} := \operatorname*{argmin}_{z_n} \left( \lVert z_n \rVert_2 - (w_n^k)^T z_n + \frac{\rho}{2} \left\lVert \lambda_c D_n y_c^{k+1} - z_n \right\rVert_2^2 \right). \quad (26)

The updates in equation (25) and equation (26) give C optimisation problems for the HR images Y and N optimisation problems for the variables Z. From equation (22) the estimate of the Lagrange multiplier is:

w_c^{k+1} := w_c^k + \rho \left( \lambda_c D y_c^{k+1} - z_c^{k+1} \right), \quad \text{for all } c. \quad (27)

ADMM update for Y: The optimisation problem in equation (25) is simply regularised least-squares, with a closed-form solution given by:

y_c^{k+1} = \left( \tau_c A_c^T A_c + \rho \lambda_c^2 D^T D \right)^{-1} \left( \tau_c A_c^T x_c - \lambda_c D^T (w_c - \rho z_c) \right). \quad (28)

This system is too large to be inverted directly, and a conjugate gradient method is often used as an alternative (Hestenes and Stiefel, 1952). Here, for increased speed and convergence, we instead use a Newton's method. Writing the objective function as:

L(y_c) = \frac{\tau_c}{2} \lVert x_c - A_c y_c \rVert_2^2 + \lambda_c w_c^T D y_c + \frac{\rho}{2} \lVert \lambda_c D y_c - z_c \rVert_2^2, \quad (29)

we get its gradient as:

\frac{\partial L(y_c)}{\partial y_c} = \left( \tau_c A_c^T A_c + \rho \lambda_c^2 D^T D \right) y_c + \lambda_c D^T (w_c - \rho z_c) - \tau_c A_c^T x_c, \quad (30)

and Hessian as:

\frac{\partial^2 L(y_c)}{\partial y_c \partial y_c^T} = \tau_c A_c^T A_c + \rho \lambda_c^2 D^T D.
(31)

Since the problem is quadratic, Newton's method would solve it in one step, which is equivalent to computing the closed-form solution in equation (28). However, obtaining the full Hessian is computationally difficult. We therefore replace the true Hessian by a majorising matrix:

H_c = \mathrm{diag}\left( \tau_c A_c^T A_c \mathbf{1} \right) + \rho \lambda_c^2 D^T D, \quad (32)

where \mathbf{1} ∈ R^N is a vector of ones. This approximation, which can be inverted in linear time using a multigrid method (Ashburner, 2007), is more positive-definite than the true Hessian in the Loewner ordering sense (see Lemma S.3 in Chun and Fessler (2017)) and therefore ensures convergence. This gives us the following update step:

y_c^{k+1} = y_c - H_c^{-1} \frac{\partial L(y_c)}{\partial y_c}. \quad (33)

ADMM update for Z: By some algebraic manipulations we can rewrite equation (26) in the equivalent form:

z_n^{k+1} = \operatorname*{argmin}_{z_n} \left( \frac{\rho}{2} \left\lVert z_n - \left( \frac{1}{\rho} w_n + \lambda_c D_n y_c \right) \right\rVert_2^2 + \lVert z_n \rVert_2 + \mathrm{const} \right), \quad (34)

where the constant contains terms that do not depend on z_n. This gives us an optimisation problem with an ℓ2 data term and an ℓ1 penalty term.

Algorithm 1: Multimodal super-resolution
  Coregister input images (X).
  Estimate model parameters (τ, λ).
  Initialise variables to zero (Y, Z, W).
  while not converged do
    for c = 1, ..., C do
      Compute y_c^{k+1} by equation (33), given z_c^k and w_c^k.
    end for
    for n = 1, ..., N do
      Compute z_n^{k+1} by equation (37), given y^{k+1} and w_n^k.
    end for
    for c = 1, ..., C do
      Compute w_c^{k+1} by equation (27), given y_c^{k+1} and z_c^{k+1}.
    end for
  end while

A more general formulation of this problem is:

\operatorname*{argmin}_{s} \left( \frac{\beta}{2} \lVert s - t \rVert_2^2 + \alpha \lVert s \rVert_2 \right), \quad (35)

which has a closed-form solution given by (proof given in, e.g., Yang et al.
(2009)):

s(t) = \max\left\{ \lVert t \rVert_2 - \frac{\alpha}{\beta},\ 0 \right\} \odot \frac{t}{\lVert t \rVert_2}, \quad (36)

where ⊙ denotes the Hadamard product. The solution to the optimisation problem in equation (34) is therefore given by:

z_n^{k+1} = \max\left\{ \left\lVert \frac{1}{\rho} w_n + \lambda_c D_n y_c \right\rVert_2 - \frac{1}{\rho},\ 0 \right\} \odot \frac{\frac{1}{\rho} w_n + \lambda_c D_n y_c}{\left\lVert \frac{1}{\rho} w_n + \lambda_c D_n y_c \right\rVert_2}. \quad (37)

Algorithm 1 shows the steps for super-resolving a set of thick-sliced patient scans with the proposed model. The process is computationally intensive and we take care to provide an efficient implementation. The software is written in a mixture of MATLAB and C code. There are large memory requirements, which are likely to exceed the random-access memory (RAM) of some workstations. To save memory, all the computations are therefore performed using single precision floating point. This lower precision has a negligible effect on numerical stability. We solve for Y efficiently using a multigrid technique (Ashburner, 2007). The Newton update can be iterated over to get a closer fit to the solution. In this work, however, a single update is used to reduce computational time. Furthermore, the loop solving for each Y is distributed, so that a separate process handles each iteration. Finding the optimal step-size ρ is an open problem (Dohmatob et al., 2014), and though under mild conditions ADMM converges for any value of ρ, the convergence rate depends on ρ. Here, we use the heuristic:

\rho = \sqrt{ \frac{\mathrm{mean}\left( \{\lambda_c\}_{c=1}^{C} \right)}{\mathrm{mean}\left( \{\tau_c\}_{c=1}^{C} \right)} }, \quad (38)

which we observe empirically gives good convergence properties. Furthermore, algorithm convergence is defined by the relative change in objective value, 2(l^k - l^{k+1}) / (l^k + l^{k+1}), falling below a fixed tolerance; the objective value (l) is computed from equation (14). Finally, we define the HR images' FOV as the bounding box that contains all LR images' FOVs, and we set the voxel size of the HR images to 1 mm isotropic.
4. Evaluation
The aim of this section is to assess the effectiveness of the MTV prior, compared with classically used regularisers for super-resolving multi-contrast MR datasets, and to illustrate the proposed model's ability to process a large clinical dataset.
The IXI dataset (http://brain-development.org/ixi-dataset/) contains multimodal MR volumes, with T1w, T2w and PDw images, of 576 healthy subjects, acquired on 1.5T and 3T systems at three different centres. All images have close to 1 mm isotropic resolution. This dataset was used to (1) validate the robustness of the noise variance estimation by adding known amounts of Rician noise and estimating it; (2) compare our heuristic for setting regularisation parameters with optimal values obtained by grid-search; (3) compare the efficiency of two iterative methods (conjugate gradient and approximate Newton) for solving a quadratic optimisation problem; and (4) compare four different methods for super-resolving MRIs: 4th order b-spline (BS) interpolation, first-order Tikhonov (FOT), TV and MTV.

LR images were generated by applying the forward projection operator (A) to HR images, such that LR images had thick slices in one direction. Thick-slice directions were picked randomly, but were always orthogonal across contrasts (e.g., axial for T1w, coronal for T2w, sagittal for PDw). Figure 6 shows an example of simulated LR images from HR images. Comparisons between methods are
Figure 6: Example of simulating and super-resolving from the IXI dataset. Two LR images (4 mm and 8 mm thick-sliced) were simulated from two HR images (PDw and T2w). The HR images were then reconstructed using four methods: BS, FOT and TV are single-channel SR techniques, capable of combining LR images of only one channel; MTV, on the other hand, a multi-channel method, combines information from both channels. The PSNR of each reconstruction is shown in the figure.

based on the root-mean-square error (RMSE):
\mathrm{RMSE} = \sqrt{ \frac{1}{N} \sum_{n=1}^{N} \left( y_n^{\mathrm{recon}} - y_n^{\mathrm{ref}} \right)^2 }, \quad (39)

and the peak signal-to-noise ratio (PSNR):

\mathrm{PSNR} = 20 \log_{10} \left( \frac{\max\left( y^{\mathrm{ref}} \right)}{\mathrm{RMSE}} \right), \quad (40)

two widely used metrics for image reconstruction. These metrics were computed for each contrast. We also computed Dice scores between GM and WM segmentations obtained by applying SPM12's unified segmentation (with default parameters) to the super-resolved and HR images. Segmentation is a typical processing task in neuroimaging as it allows for morphometric analyses to be conducted.

FOT, defined in equation (7), and TV, defined in equation (9), were implemented within the same framework as MTV. Therefore, they used the same projection matrices and parameters. B-spline interpolation is not technically a super-resolution technique, but is often used in practice to reslice low-resolution images prior to automated processing. For b-spline interpolation, if multiple LR images of the same contrast are available, they are simply averaged.

Noise Precision:
All 1,728 IXI scans were used in this experiment. A known amount of Rician noise, as a percentage of the mean image intensity (1%, 2.5%, 5%, 10%), was added to each image. The two-class Rician mixture model was then fit to the intensity histogram of the noisy images. The variance of the cluster with the lowest noncentrality parameter was used as an estimate of the image noise percentage. Table 1 shows the mean and standard deviation, across subjects, of the estimated noise variance percentage. These results show that the Rician mixture model can accurately estimate a wide range of noise levels. Note that there is already Rician noise present in the images; however, the amount of Rician noise that is added is an order of magnitude greater than the intrinsic noise.
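The way Rician noise of a known level can be added to a magnitude image follows directly from its definition: the magnitude of the true signal plus complex Gaussian noise. A minimal sketch (our own helper, not the paper's mixture-fitting code, which estimates rather than simulates the noise):

```python
import numpy as np

def add_rician_noise(img, sigma, rng):
    # Rician-distributed intensities arise as the magnitude of the true signal
    # plus complex Gaussian noise with standard deviation sigma per component.
    real = img + rng.normal(0.0, sigma, img.shape)
    imag = rng.normal(0.0, sigma, img.shape)
    return np.sqrt(real ** 2 + imag ** 2)

rng = np.random.default_rng(1)
img = np.full(100_000, 100.0)       # constant "tissue" intensity
sigma = 0.05 * img.mean()           # 5% of the mean intensity, as in the experiment
noisy = add_rician_noise(img, sigma, rng)
```

At high signal-to-noise ratio the Rician distribution is close to Gaussian, so the sample standard deviation of the noisy intensities recovers sigma, which is the quantity the two-class mixture model estimates from the histogram.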
Regularisation Parameter:
Four images from different subjects and contrasts (T1w, T2w, PDw and diffusion-weighted (DW)) were used to validate whether the heuristic devised in section 3.1.3 to estimate the prior parameter (λ) yields proper regularisation values. LR images with seven millimetre slice-thickness were generated, and a grid-search over the regularisation parameter was performed on a regular grid of values. For each value, the PSNR between the resulting MTV super-resolved images and the known ground-truth was computed. A DW image was included to investigate whether the heuristic generalises to contrasts not part of the training set. The result of each grid-search, with the corresponding heuristic estimates marked, can be seen in Figure 7. This experiment shows that the method for estimating the regularisation parameter allows near-optimal reconstructions to be produced, even for unseen contrasts.

Approximate Newton Solver:
The update step for Y entails solving a large linear system, which is often done iteratively using the conjugate gradient method (Hestenes and Stiefel, 1952). Here, we show that, for this particular problem, an approximate Newton method (based on a majoriser of the full Hessian and solved with a multigrid algorithm) converges faster than the conjugate gradient method. T1w, T2w and PDw images were simulated, with 6 mm slice thicknesses and different thick-slice directions. The MTV super-resolution algorithm was then run, where the update for Y was solved using either the conjugate gradient or the approximate Newton method. Both solution methods were iterated for a fixed number of iterations (50). Figure 9 shows the evolution of the negative log-likelihood over computation time. This total computation time takes into account both the number of iterations and the computation time per iteration. The approximate Newton solver converges faster than the conjugate gradient solver. Note that we did not use pre-conditioning, which may have led to increased conjugate gradient solver performance.

Table 1: Validating the estimate of the noise precision parameter (for 1,728 subjects). Showing the simulated noise percentages, and the noise percentages estimated by fitting the two-class Rician mixture model. Shown as mean ± std.

Ground-truth (%)   1          2.5         5           10
Estimated (%)      . ± .76    2. ± .51    5. ± .01    10. ± .

Figure 7: Validation of estimating the regularisation parameter. A grid-search over the regularisation parameter (λ) was performed for four different contrasts, and the PSNR between the reconstruction and reference image was computed. The regularisation parameter obtained heuristically is marked by a dot.

Multi-channel Super-Resolution:
All IXI subjects were used to simulate LR images. For each subject, the slice direction and thickness (between two and eight millimetres) were chosen at random. The simulated LR images were super-resolved using BS, FOT, TV and MTV. Among these, only MTV makes joint use of information distributed across contrasts. Example reconstructions obtained with each method can be seen in Figure 6.
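Simulating a thick-sliced LR acquisition from an HR volume amounts to applying the forward projection operator along the chosen slice direction. A hedged sketch, assuming a plain group-average slice profile with no slice gap (the paper's operator A additionally models the slice profile):

```python
import numpy as np

def simulate_lr(hr, thickness, axis):
    # Toy forward projection: average groups of `thickness` slices along `axis`,
    # mimicking a thick-sliced acquisition (slice profile and gap ignored).
    hr = np.moveaxis(hr, axis, 0)
    n_groups = hr.shape[0] // thickness
    lr = hr[:n_groups * thickness].reshape(n_groups, thickness, *hr.shape[1:]).mean(axis=1)
    return np.moveaxis(lr, 0, axis)

hr = np.arange(24.0).reshape(4, 6)       # stand-in for an HR image
lr_ax0 = simulate_lr(hr, 2, axis=0)      # 2-slice-thick in the first direction
lr_ax1 = simulate_lr(hr, 3, axis=1)      # 3-slice-thick in the second direction
```

Choosing a different `axis` per contrast reproduces the orthogonal thick-slice directions used in the experiment.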
Figure 8:
Super-resolving images of different slice-thicknesses (for 20 subjects). As the simulated slice-thickness increases (top to bottom), MTV super-resolution results improve favourably compared with the single-channel methods (BS, FOT, TV).
For each contrast, the PSNR was computed between the reference and super-resolved images. Table 2 shows the average PSNR; MTV obtained the greatest mean and lowest standard deviation. Figure 8 additionally shows average PSNRs for different slice thicknesses, in which MTV once again performs favourably. Reference and super-resolved images were also segmented into GM, WM and cerebrospinal fluid (CSF) using SPM12, with the reference segmentation considered as ground-truth, and the Dice coefficient was computed. The results can be seen in Table 3, where MTV obtained the highest Dice score.
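The three comparison metrics above are simple to compute; a minimal sketch of equations (39)-(40) and the Dice overlap (helper names are ours):

```python
import numpy as np

def rmse(recon, ref):
    # Equation (39): root-mean-square error over the N voxels.
    return np.sqrt(np.mean((recon - ref) ** 2))

def psnr(recon, ref):
    # Equation (40): peak signal-to-noise ratio, in decibels.
    return 20.0 * np.log10(ref.max() / rmse(recon, ref))

def dice(a, b):
    # Dice overlap 2|A ∩ B| / (|A| + |B|) between binary segmentations.
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

ref = np.array([0.0, 2.0, 4.0, 2.0])
recon = ref + 0.1                         # a reconstruction off by 0.1 everywhere
seg_a = np.array([1, 1, 1, 0, 0, 0])      # toy "reference" segmentation
seg_b = np.array([1, 1, 0, 0, 0, 1])      # toy "super-resolved" segmentation
```

In the experiments, `dice` would be applied per tissue class (GM, WM, CSF) with the HR segmentation as reference.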
In the previous section we showed that multi-channel super-resolution outperforms some established single-channel techniques on simulated data. This result is promising as we now move on to evaluating the method on real, clinical-grade MR data, a far more challenging scenario. In contrast to simulated data, there is no ground truth for clinical data. Therefore, we propose to evaluate the quality of the reconstructed images implicitly, by using them as input to machine learning models to predict known, noise-free features: age and sex.

Table 2: Results comparing single- vs multi-channel super-resolution. BS, FOT and TV are all single-channel techniques; MTV is multi-channel. PSNRs are shown as mean ± std.

Channel   BS         FOT         TV          MTV
T1w       . ± .35    30. ± .76   30. ± .     . ± .
T2w       . ± .02    28. ± .85   28. ± .     . ± .
PDw       . ± .85    28. ± .02   28. ± .     . ± .

Table 3: Results for segmenting super-resolved images (for 576 subjects). Dice scores for different reconstruction methods were computed for GM, WM and CSF tissue segmentations, using HR segmentations as references. Results shown as mean ± std.

Tissue class   BS          FOT         TV          MTV
GM             . ± .001    0. ± .001   0. ± .      . ± .
WM             . ± .001    0. ± .002   0. ± .      . ± .
CSF            . ± .002    0. ± .001   0. ± .      . ± .

Figure 9: Conjugate gradients vs approximate Newton for solving the super-resolution problem. Time in seconds is plotted against negative log-likelihood. It can be seen that the approximate Newton method has faster convergence.

The dataset we used consists of 1,046 patients' MRIs, with 615 males and 431 females. The dataset was acquired on a diversity of scanners and clinical indications at UCLH (University College London Hospitals, London, UK) in the context of routine clinical care. The dataset comprises three contrasts (T1w, T2w and FLAIR), each with a different thick-slice direction; the average slice-thickness over the whole dataset is 6.5 mm. Commonly, the images have only partial brain coverage in the thick-slice direction, with the outermost slices excluded. The age distribution of the dataset is shown in Figure 12a.

The gist of our validation relies on training machine learning models to predict age and sex from tissue segmentations of patient MR images. It is now well known that these features can be very accurately predicted from brain images (Cole et al., 2017; He et al., 2018; Monté-Rubio et al., 2018; Smith et al., 2019). Here, we follow the procedure of Monté-Rubio et al. (2018). Since (accurate) normalised segmentations capture relevant and important anatomical features of the MRIs, predictive accuracy can be used as a measure of the quality of a super-resolution model.

For each subject in the dataset, the LR images were super-resolved to 1 millimetre isotropic with BS (4th order) or MTV, and segmented using SPM12's unified segmentation routine (Ashburner and Friston, 2005), with default parameters. This routine outputs normalised, non-modulated, GM, WM and Other (1 - GM - WM) maps, which were smoothed with a Gaussian kernel of 12 mm full-width at half-maximum (FWHM). The concatenated smoothed maps were used as a feature vector for machine learning. Two Gaussian process models were trained, using 10-fold cross-validation, to predict age and sex from this feature vector. The PRoNTo toolbox (Schrouff et al., 2013) was used to make these predictions.
Using a convolutional neural network based model could have been an option, but recent studies suggest that kernel methods perform comparably on predictive tasks similar to the ones performed here (He et al., 2018).

The results of both the regression and classification tasks are shown in Table 4. Age regression results are reported in years using the root-mean-square error (RMSE), error standard deviation (SD), mean error (bias) and Pearson's correlation coefficient; sex classification accuracy is reported as a percentage, and the lower and upper bounds over the 10 folds are included. Furthermore, for the MTV case, a scatter plot of the individual predictions for the regression task is shown in Figure 12b, and a receiver operating characteristic (ROC) curve for the classification task is shown in Figure 12c. Images and non-smoothed native-space segmentations, for an example patient, are shown in Figure 10.

Our results make it quite clear that the segmentations produced from the super-resolved images have more predictive power. This is true for both the regression and classification tasks (cf. Table 4), and particularly so for the classification task, where the improvement in accuracy is 3.3 percentage points. It is also evident from the example segmentations in Figure 10 that anatomical features are more clearly defined in the super-resolved segmentations. Furthermore, comparing, e.g., the super-resolved close-up of the T1w image with its baseline counterpart, it can be seen that the mismatched fields of view have been filled in for the super-resolution case. Finally, the runtime of the algorithm to super-resolve three LR images, as was performed in this evaluation, is under 30 minutes on a modern workstation. The runtime scales linearly with the number of channels.
[Figure 10 panels: input thick-sliced clinical MRIs; high-res by baseline method; high-res by super-resolution method; segmentations from baseline images; segmentations from super-resolved images (FLAIR, T1w and T2w scans and GM/WM maps, each with close-ups).]

Figure 10: Example results for baseline vs super-resolution, for a randomly selected patient. The box on top shows the three input clinical scans (FLAIR, T1w and T2w). These three scans are reconstructed to 1 mm isotropic voxel size, using either the baseline (left) or the super-resolution method (right). It is clear that the multi-channel segmentation output (GM and WM), produced from the super-resolved images, has better anatomical detail. For example, the WM has a clearer delineation close to the cortex.

[Figure 11 panels: S = 1,046 patients; ~6.5 mm thick-sliced low-res inputs (FLAIR coronal, T1w sagittal, T2w axial); high-res (1 mm) images by super-resolution or baseline; normalised GM, WM and Other segmentations smoothed with FWHM = 12 mm; Gaussian process models trained and tested on age (regression) and sex (classification) targets.]

Figure 11: Evaluation of the super-resolution method on clinical data. Feature vectors are obtained by concatenating smoothed gray matter (GM), white matter (WM) and other segmentations. These segmentations are produced from multi-channel MRIs that have been reconstructed to 1 mm isotropic voxel size using either super-resolution or a baseline method. The prediction targets are either the age or sex of the patients. Predictive performance is then evaluated, for both methods, using Gaussian process models in the PRoNTo toolbox.

Figure 12: Results for evaluating the super-resolution model. (a) Age distribution of the 1,046 patients in the clinical dataset (the mean and standard deviation in years are given in the figure). (b) Scatter plot of the individual predictions for the regression task. (c) ROC curve for the classification task, where an AUC of 0.97 was obtained.
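The AUC reported in Figure 12c can be computed without plotting the ROC curve, via its rank interpretation. A minimal sketch with toy scores (not the study's predictions):

```python
import numpy as np

def roc_auc(scores, labels):
    # AUC via the Mann-Whitney statistic: the probability that a randomly
    # chosen positive case scores higher than a randomly chosen negative
    # one, with ties counted as one half.
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return (greater + 0.5 * ties) / (pos.size * neg.size)

labels = np.array([1, 1, 1, 0, 0, 0])                 # toy sex labels
scores = np.array([0.9, 0.8, 0.4, 0.7, 0.3, 0.1])     # toy classifier scores
auc = roc_auc(scores, labels)
```

An AUC of 1 means the classes are perfectly separated by the scores; 0.5 corresponds to chance.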
5. Discussion
This paper presents a super-resolution tool that can be applied to large datasets of clinical MR scans. Commonly in such datasets, each patient has a collection of images acquired with different MR sequences. Currently, when performing some multi-channel analysis, these images are often simply interpolated to the same size. The model proposed here is an alternative to interpolation that better leverages these multiple MR contrasts. The model builds on a principled probabilistic generative model. The novelty of the model lies in the MTV prior, which allows a joint probability distribution across MR contrasts to be modelled. All model parameters are set automatically, and no fine-tuning is necessary, allowing large datasets with variable slice-thicknesses and MR contrasts to be processed. We showed that, when the super-resolution model is used as a preprocessing step, subsequent machine-learning tasks show improved results.

Data-driven models currently show state-of-the-art performance in many image processing tasks and could be an alternative to the approach proposed in this paper. However, generalisability is still an issue with many such methods: they excel in scenarios where the unseen data is close to the data the model was trained on, but do not extrapolate well to out-of-sample test data. This is because data-driven models construct an empirical prior from the data, which leads to a flexible, highly parametrised model, where only minor prior assumptions are necessary, but where overfitting to a training population can be an issue. This is especially risky with pathological imaging, because pathology adds further diversity to already hugely diverse normal biology. Model-based methods, on the other hand, require prior assumptions to be made. To design priors as flexible as ones learnt by data-driven techniques is extremely challenging, and simplified assumptions are therefore often made.
If the prior assumptions are close to the data-generating process, and the forward model and the assumed statistical properties of the data are too, then model-based methods can perform well on out-of-sample data (and will not require any retraining). Interestingly, methods have been developed that combine model- and data-driven approaches (Adler and Öktem, 2018; Brudfors et al., 2019; Dalca et al., 2019). This combination would be an interesting future direction for the super-resolution model proposed in this paper.

Table 4: Results for predicting age and sex from normalised brain segmentations (for 1,046 patients). We compared the proposed super-resolution model to a baseline method, which reslices using 4th order b-spline interpolation. Age regression results are reported in years using the root-mean-square error (RMSE), error standard deviation (SD), mean error (bias) and Pearson's correlation coefficient. Sex classification accuracy is reported as a percentage, where the lower and upper bounds over the 10 folds are included.

                   Age (years)                           Sex (%)
Method             RMSE   SD     Bias    Correlation    Accuracy   Lower bound   Upper bound
Baseline           6.91   9.89   -1.73   0.85           87.8       85.6          89.6
Super-resolution   6.32   8.66   -1.18   0.88           91.1       89.2          92.7

The model we have proposed could be of value in translating methods that have shown good results on research data to clinical imaging. For example, many techniques based on machine (deep) learning show promising results on analysing neuroimaging data (Litjens et al., 2017). However, a model trained on HR data may struggle when given LR clinical data as input. Applying our super-resolution model as a preprocessing step, reconstructing HR versions from the LR input, could facilitate this transition. The evaluation also showed that the tissue segmentation performance of a widely used neuroimaging package improves when the multi-channel input images are super-resolved, compared with simply being interpolated. Furthermore, the model could easily be used for multi-channel denoising by replacing the projection matrix in equation (4) with the identity.

However, metric scores such as those used in this study are specific to the data that was used to evaluate the model. Any claim that a method generalises to the huge variability present in clinical MR scans should therefore be taken with a pinch of salt.
This is not only because the imaging data varies greatly, in image contrast, intra-subject alignment and voxel size, but also because the scanner acquiring the images may not have computed the correct voxel sizes, slice-thicknesses, etc. In our evaluation, we used a clinical dataset with a large variability among patient scans. Still, this is no guarantee that our model will work on any clinical MR data. However, data specificity is likely to be greater the more flexible the model is, such that highly parameterised models may suffer more from this specificity.

The model proposed in this paper could possibly be made more robust in a few ways. One is related to our assumptions regarding the slice profile and slice gap. These parameters are highly variable, and assuming them fixed, as is currently done, can lead to inexact super-resolved images. Of course, users could change these parameters themselves if they know their values. However, they sometimes cannot be obtained. A principled solution to this problem would be to extend the parametrisation of the projection matrices to model both slice gap and slice profile. Although a non-trivial modelling problem, this would allow us to estimate the most likely such parameters. (Running the denoising version of the model is an option in our software implementation.)

Misalignment between scans could be another explanation for poor super-resolved images, as our edge-based prior distribution is highly dependent on well-registered LR images. Rather than performing an initial rigid registration of the LR images (as is currently done), improved alignment could be achieved by modifying the forward model in equation (3) to incorporate a rigid transformation.
The optimal parameters could then be found by Gauss-Newton optimisation, similar to what is done in Ashburner and Ridgway (2013).

A well-known fact is that TV introduces stair-casing effects in flat areas, which are abundant in brain MRI. However, the MTV regularisation proposed in this paper seems not to suffer from such artefacts. This is probably because MTV uses gradients distributed over all contrasts, which often have orthogonal thick-slice directions. Hence, an image area containing flat gradients in one channel may very well have more informative gradients in another channel. Another factor that suppresses these stair-casing artefacts is our parametrisation of the differential operator, which computes the gradient using both the forward and the backward finite differences.

Many different algorithms are available to solve nonsmooth optimisation problems, such as the one that arises when using a nonsmooth TV prior. Here, we used an ADMM algorithm because it is relatively fast and easy to implement, and because the quadratic problem that arises can be easily solved with a multigrid solver that assumes a stationary penalty over gradients. However, it makes proper marginalisation of the (latent) HR image difficult. An alternative could be to use an approach based on reweighted least-squares (RLS) (Bach et al., 2012; Daubechies et al., 2010; Grohs and Sprecher, 2016), which makes use of the bound:

\lVert z \rVert_2 = \min_{w > 0} \left\{ \frac{\lVert z \rVert_2^2}{2w} + \frac{w}{2} \right\}. \quad (41)

We could substitute this bound, with z = D_n Y, into the joint log-distribution. However, this would require extending the multigrid solver to handle non-stationary penalties. Furthermore, instead of looking for a MAP solution, an approximate Gaussian posterior that factorises over voxels and channels could be obtained using variational Bayes (Attias, 2000). When used in combination with RLS, this posterior would be bounded by a Gaussian.
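The bound in equation (41) can be checked numerically: the right-hand side is minimised at w = ||z||, where it equals ||z|| exactly. A brief sketch (variable names are ours):

```python
import numpy as np

def rls_bound(z_norm, w):
    # Right-hand side of equation (41): ||z||^2 / (2w) + w / 2, valid for w > 0.
    return z_norm ** 2 / (2.0 * w) + w / 2.0

z_norm = 3.0
ws = np.linspace(0.01, 10.0, 10_000)   # grid over the auxiliary weight w > 0
vals = rls_bound(z_norm, ws)
w_star = ws[np.argmin(vals)]           # minimiser, attained at w = ||z||
```

Setting the derivative to zero gives w = ||z|| and the minimum value ||z||/2 + ||z||/2 = ||z||, which is why substituting the bound turns the nonsmooth norm into a quadratic term in z at the cost of an extra weight variable.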
Finally, commonalities between channels do not solely reduce to edges; intensities co-vary as well. Mutual information between channels could be taken into account by introducing an additional, independent, prior over the reconstructed image in the form of a multivariate Gaussian mixture (Brudfors et al., 2019). This would be, in practice, a joint reconstruction and segmentation framework. When multiple images are available, commonalities across subjects could also be learnt by using a learnable shape and intensity model (Ashburner et al., 2019; Blaiotta et al., 2018). Such an extended Bayesian framework may eventually be able to compete with (deep) learning-based approaches, or could even be defined so as to include such models in the generative process (Brudfors et al., 2019).

Clinical Data:
The clinical data used in this paper is a sample of anonymised MR studies obtained within the clinical routine, for which ethical and regulatory approval has been obtained from the Health Research Authority (HRA).
Acknowledgements:
MB was funded by the EPSRC-funded UCL Centre for Doctoral Training in Medical Imaging (EP/L016478/1) and the Department of Health's NIHR-funded Biomedical Research Centre at University College London Hospitals. YB was funded by the MRC and Spinal Research Charity through the ERA-NET Neuron joint call (MR/R000050/1). PN was funded by the Wellcome Trust and the NIHR UCLH Biomedical Research Centre. MB and JA were funded by the EU Human Brain Project's Grant Agreement No 785907 (SGA2).
References
Adduru, V. R., Michael, A. M., Helguera, M., Baum, S. A., Moore, G. J., 2017. Leveraging clinical imaging archives for radiomics: Reliability of automated methods for brain volume measurement. Radiology 284 (3), 862–869.
Adler, J., Öktem, O., 2018. Learned primal-dual reconstruction. IEEE Transactions on Medical Imaging 37 (6), 1322–1332.
Aganj, I., Yeo, B. T. T., Sabuncu, M. R., Fischl, B., 2012. On removing interpolation and resampling artifacts in rigid image registration. IEEE Transactions on Image Processing 22 (2), 816–827.
Aja-Fernández, S., Tristán-Vega, A., 2013. A review on statistical noise models for magnetic resonance imaging. LPI, ETSI Telecomunicacion, Universidad de Valladolid, Spain, Tech. Rep.
Alexander, D. C., Zikic, D., Zhang, J., et al., 2014. Image quality transfer via random forest regression: applications in diffusion MRI. In: MICCAI. Springer, pp. 225–232.
Ashburner, J., 2007. A fast diffeomorphic image registration algorithm. NeuroImage 38 (1), 95–113.
Ashburner, J., Brudfors, M., Bronik, K., Balbastre, Y., 2019. An algorithm for learning shape and appearance models without annotations. Medical Image Analysis 55, 197.
Ashburner, J., Friston, K. J., 2000. Voxel-based morphometry: the methods. NeuroImage 11 (6), 805–821.
Ashburner, J., Friston, K. J., 2005. Unified segmentation. NeuroImage 26 (3), 839–851.
Ashburner, J., Ridgway, G. R., 2013. Symmetric diffeomorphic modeling of longitudinal structural MRI. Frontiers in Neuroscience 6, 197.
Attias, H., 2000. A variational Bayesian framework for graphical models. In: Advances in Neural Information Processing Systems. pp. 209–215.
Bach, F., Jenatton, R., Mairal, J., Obozinski, G., et al., 2012. Optimization with sparsity-inducing penalties. Foundations and Trends in Machine Learning 4 (1), 1–106.
Bai, Y., Han, X., Prince, J. L., 2004. Super-resolution reconstruction of MR brain images. In: Proc. of 38th Annual Conference on Information Sciences and Systems (CISS04). pp. 1358–1363.
Blaiotta, C., Freund, P., Cardoso, M. J., Ashburner, J., 2018. Generative diffeomorphic modelling of large MRI data sets for probabilistic template construction. NeuroImage 166, 117–134.
Boyd, S., Parikh, N., Chu, E., et al., 2011. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning 3 (1), 1–122.
Brudfors, M., Ashburner, J., Nachev, P., Balbastre, Y., Aug 2019. Empirical Bayesian mixture models for medical image translation. arXiv e-prints, arXiv:1908.05926.
Brudfors, M., Balbastre, Y., Ashburner, J., 2019. Nonlinear Markov random fields learned via backpropagation. In: International Conference on Information Processing in Medical Imaging. Springer, pp. 805–817.
Brudfors, M., Balbastre, Y., Nachev, P., Ashburner, J., 2018. MRI super-resolution using multi-channel total variation. In: MIUA. Southampton, UK.
Cengiz, S., Valdes-Hernandez, M. d. C., Ozturk-Isik, E., 2017. Super resolution convolutional neural networks for increasing spatial resolution of 1H magnetic resonance spectroscopic imaging. In: Annual Conference on Medical Image Understanding and Analysis. Springer, pp. 641–650.
Chatnuntawech, I., Martin, A., Bilgic, B., Setsompop, K., Adalsteinsson, E., Schiavi, E., 2016. Vectorial total generalized variation for accelerated multi-channel multi-contrast MRI. Magnetic Resonance Imaging 34 (8), 1161–1170.
Chen, Y., Shi, F., Christodoulou, A. G., Xie, Y., Zhou, Z., Li, D., 2018. Efficient and accurate MRI super-resolution using a generative adversarial network and 3D multi-level densely connected network. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, pp. 91–99.
Chun, I. Y., Fessler, J. A., 2017. Convolutional dictionary learning: Acceleration and convergence. IEEE Transactions on Image Processing 27 (4), 1697–1712.
Cole, J. H., Poudel, R. P., Tsagkrasoulis, D., Caan, M. W., Steves, C., Spector, T. D., Montana, G., 2017. Predicting brain age with deep learning from raw imaging data results in a reliable and heritable biomarker. NeuroImage 163, 115–124.
Coupé, P., Manjón, J. V., Chamberland, M., Descoteaux, M., Hiba, B., 2013. Collaborative patch-based super-resolution for diffusion-weighted images. NeuroImage 83, 245–261.
Dalca, A. V., Bouman, K. L., Freeman, W. T., Rost, N. S., Sabuncu, M. R., Golland, P., 2018. Medical image imputation from image collections. IEEE Transactions on Medical Imaging.
Dalca, A. V., Yu, E., Golland, P., Fischl, B., Sabuncu, M. R., Iglesias, J. E., 2019. Unsupervised deep learning for Bayesian brain MRI segmentation. arXiv preprint arXiv:1904.11319.
Daubechies, I., DeVore, R., Fornasier, M., Güntürk, C. S., 2010. Iteratively reweighted least squares minimization for sparse recovery. Communications on Pure and Applied Mathematics 63 (1), 1–38.
Dohmatob, E. D., Gramfort, A., Thirion, B., et al., 2014. Benchmarking solvers for TV-ℓ1 least-squares and logistic regression in brain imaging. In: Pattern Recognition in Neuroimaging (PRNI). IEEE.
Greenspan, H., Peled, S., Oz, G., Kiryati, N., 2001. MRI inter-slice reconstruction using super-resolution. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, pp. 1204–1206.
Grohs, P., Sprecher, M., 2016. Total variation regularization on Riemannian manifolds by iteratively reweighted minimization. Information and Inference 4, 353–378.
Gudbjartsson, H., Patz, S., 1995. The Rician distribution of noisy MRI data. Magnetic Resonance in Medicine 34 (6), 910–914.
He, T., Kong, R., Holmes, A., Nguyen, M., Sabuncu, M., Eickhoff, S. B., Bzdok, D., Feng, J., Yeo, B. T., 2018. Do deep neural networks outperform kernel regression for functional connectivity prediction of behavior? BioRxiv, 473603.
Hestenes, M. R., Stiefel, E., 1952. Methods of conjugate gradients for solving linear systems. Vol. 49. NBS Washington, DC.
Huang, T., Yang, J., 2010. Image super-resolution: Historical overview and future challenges.
In: Super-resolution imaging.CRC Press, pp. 19–52.Jog, A., Carass, A., Prince, J. L., 2014. Improving magnetic reso-nance resolution with supervised learning. In: Biomedical Imaging(ISBI), 2014 IEEE 11th International Symposium on. IEEE, pp.987–990.Jog, A., Carass, A., Prince, J. L., 2016. Self super-resolution for mag-netic resonance images. In: International Conference on MedicalImage Computing and Computer-Assisted Intervention. Springer,pp. 553–560.Kainz, B., Steinberger, M., Wein, W., Kuklisova-Murgasova, M.,Malamateniou, C., Keraudren, K., Torsney-Weir, T., Rutherford,M., Aljabar, P., Hajnal, J. V., et al., 2015. Fast volume reconstruc-tion from motion corrupted stacks of 2D slices. IEEE transactionson medical imaging 34 (9), 1901–1913.Kamnitsas, K., Ledig, C., Newcombe, V. F., Simpson, J. P., Kane,A. D., Menon, D. K., Rueckert, D., Glocker, B., 2017. Efficientmulti-scale 3D CNN with fully connected CRF for accurate brainlesion segmentation. Medical image analysis 36, 61–78.Klöppel, S., Stonnington, C. M., Chu, C., Draganski, B., Scahill,R. I., Rohrer, J. D., Fox, N. C., Jack Jr, C. R., Ashburner, J.,Frackowiak, R. S., 2008. Automatic classification of MR scans inAlzheimer’s disease. Brain 131 (3), 681–689.Litjens, G., Kooi, T., Bejnordi, B. E., Setio, A. A. A., Ciompi, F.,Ghafoorian, M., Van Der Laak, J. A., Van Ginneken, B., Sánchez,C. I., 2017. A survey on deep learning in medical image analysis.Medical image analysis 42, 60–88.Liu, H., Michel, E., Casey, S. O., Truwit, C. L., 2002. Actual imagingslice profile of 2D MRI. In: Medical Imaging 2002: Physics ofMedical Imaging. Vol. 4682. International Society for Optics andPhotonics, pp. 767–773.Manjón, J. V., Coupé, P., Buades, A., et al., 2010a. MRI superres-olution using self-similarity and image priors. Int J Biomed Imag2010, 17.Manjón, J. V., Coupé, P., Buades, A., et al., 2010b. Non-local MRIupsampling. Med Image Anal 14 (6), 784–792.Miller, K. L., Alfaro-Almagro, F., Bangerter, N. K., Thomas, D. 
L.,Yacoub, E., Xu, J., Bartsch, A. J., Jbabdi, S., Sotiropoulos, S. N.,Andersson, J. L., et al., 2016. Multimodal population brain imag-ing in the UK Biobank prospective epidemiological study. Natureneuroscience 19 (11), 1523.Monté-Rubio, G. C., Falcón, C., Pomarol-Clotet, E., Ashburner, J.,2018. A comparison of various MRI feature types for characteriz-ing whole brain anatomical differences using linear pattern recog-nition methods. NeuroImage 178, 753–768.Ollier, W., Sprosen, T., Peakman, T., 2005. UK Biobank: from con-cept to reality.Palmer, L. J., 2007. UK Biobank: bank on it. The Lancet 369 (9578),1980–1982.Peeters, R. R., Kornprobst, P., Nikolova, M., Sunaert, S., Vieville,T., Malandain, G., Deriche, R., Faugeras, O., Ng, M., Van Hecke,P., 2004. The use of super-resolution techniques to reduce slicethickness in functional MRI. International Journal of Imaging Sys-tems and Technology 14 (3), 131–138.Peled, S., Yeshurun, Y., 2001. Super-resolution in MRI: application to human white matter fiber tract visualization by diffusion tensorimaging. Magnetic Resonance in Medicine: An Official Journalof the International Society for Magnetic Resonance in Medicine45 (1), 29–35.Pham, C.-H., Ducournau, A., Fablet, R., Rousseau, F., 2017. BrainMRI super-resolution using deep 3D convolutional networks. In:Biomedical Imaging (ISBI 2017), 2017 IEEE 14th InternationalSymposium on. IEEE, pp. 197–200.Plenge, E., Poot, D. H., Bernsen, M., Kotek, G., et al., 2012.Super-resolution methods in MRI: Can they improve the trade-off between resolution, signal-to-noise ratio, and acquisition time?Magn Reson Med 68 (6), 1983–1993.Plenge, E., Poot, D. H., Niessen, W. J., Meijering, E., 2013. Super-resolution reconstruction using cross-scale self-similarity in multi-slice MRI. In: International Conference on Medical Image Com-puting and Computer-Assisted Intervention. Springer, pp. 123–130.Poot, D. H., Van Meir, V., Sijbers, J., 2010. 
General and efficientsuper-resolution method for multi-slice MRI. In: Internationalconference on medical image computing and computer-assistedintervention. Springer, pp. 615–622.Reuter, M., Schmansky, N. J., Rosas, H. D., Fischl, B., 2012. Within-subject template estimation for unbiased longitudinal image anal-ysis. Neuroimage 61 (4), 1402–1418.Roobottom, C., Mitchell, G., Morgan-Hughes, G., 2010. Radiation-reduction strategies in cardiac computed tomographic angiogra-phy. Clinical radiology 65 (11), 859–867.Rousseau, F., Initiative, A. D. N., et al., 2010a. A non-local approachfor image super-resolution using intermodality priors. Medical im-age analysis 14 (4), 594–605.Rousseau, F., Kim, K., Studholme, C., Koob, M., Dietemann, J.-L.,2010b. On super-resolution for fetal brain MRI. In: InternationalConference on Medical Image Computing and Computer-AssistedIntervention. Springer, pp. 355–362.Rudin, L. I., Osher, S., Fatemi, E., 1992. Nonlinear total variationbased noise removal algorithms. Physica D: nonlinear phenomena60 (1-4), 259–268.Rueda, A., Malpica, N., Romero, E., 2013. Single-image super-resolution of brain MR images using overcomplete dictionaries.Medical image analysis 17 (1), 113–132.Sapiro, G., Ringach, D. L., 1996. Anisotropic diffusion of multivaluedimages with applications to color filtering. IEEE T Image Process5 (11), 1582–1586.Schrouff, J., Rosa, M. J., Rondina, J. M., Marquand, A. F., Chu,C., Ashburner, J., Phillips, C., Richiardi, J., Mourao-Miranda,J., 2013. PRoNTo: pattern recognition for neuroimaging toolbox.Neuroinformatics 11 (3), 319–337.Shi, F., Cheng, J., Wang, L., Yap, P.-T., Shen, D., 2015. LRTV:MR image super-resolution with low-rank and total variation reg-ularizations. IEEE transactions on medical imaging 34 (12), 2459–2466.Shilling, R. Z., Robbie, T. Q., Bailloeul, T., Mewes, K., Mersereau,R. M., Brummer, M. E., 2009. A super-resolution framework for 3-D high-resolution and high-contrast imaging using 2-D multisliceMRI. 
IEEE transactions on medical imaging 28 (5), 633–644.Sindel, A., Breininger, K., Käßer, J., Hess, A., Maier, A., Köhler,T., 2018. Learning from a Handful Volumes: MRI ResolutionEnhancement with Volumetric Super-Resolution Forests. arXivpreprint arXiv:1802.05518.Smith, S. M., Nichols, T. E., 2018. Statistical challenges in “BigData” human neuroimaging. Neuron 97 (2), 263–268.Smith, S. M., Vidaurre, D., Alfaro-Almagro, F., Nichols, T. E.,Miller, K. L., 2019. Estimation of brain age delta from brain imag-ing. NeuroImage.Tanno, R., Worrall, D. E., Ghosh, A., Kaden, E., Sotiropoulos,S. N., Criminisi, A., Alexander, D. C., 2017. Bayesian imagequality transfer with cnns: Exploring uncertainty in dmri super-resolution. In: International Conference on Medical Image Com-puting and Computer-Assisted Intervention. Springer, pp. 611–619.Tourbier, S., Bresson, X., Hagmann, P., Thiran, J.-P., Meuli, R., uadra, M. B., 2015. An efficient total variation algorithm forsuper-resolution in fetal brain MRI with adaptive regularization.NeuroImage 118, 584–597.Van Essen, D. C., Smith, S. M., Barch, D. M., Behrens, T. E., Ya-coub, E., Ugurbil, K., Consortium, W.-M. H., et al., 2013. TheWU-Minn human connectome project: an overview. Neuroimage80, 62–79.Van Reeth, E., Tham, I. W., Tan, C. H., et al., 2012. Super-resolutionin magnetic resonance imaging: A review. Concepts Magn Reson40 (6), 306–325.Yang, J., Yin, W., Zhang, Y., Wang, Y., 2009. A fast algorithm foredge-preserving variational multichannel image restoration. SIAMJ Imaging Sci 2 (2), 569–592.Yue, L., Shen, H., Li, J., Yuan, Q., Zhang, H., Zhang, L., 2016.Image super-resolution: The techniques, applications, and future.Signal Processing 128, 389–408.Zhang, X., Lam, E. Y., Wu, E. X., Wong, K. K., 2008. Applicationof Tikhonov regularization to super-resolution reconstruction ofbrain MRI images. In: Medical imaging and informatics. Springer,pp. 51–56.Zhang, Y., Brady, M., Smith, S., 2001. 
Segmentation of brainMR images through a hidden Markov random field model andthe expectation-maximization algorithm. IEEE Trans Med Imag20 (1), 45–57.Zhao, C., Carass, A., Dewey, B. E., Woo, J., Oh, J., Calabresi,P. A., Reich, D. S., Sati, P., Pham, D. L., Prince, J. L., 2018. Adeep learning based anti-aliasing self super-resolution algorithmfor MRI. In: International Conference on Medical Image Comput-ing and Computer-Assisted Intervention. Springer, pp. 100–108.Zheng, H., Qu, X., Bai, Z., Liu, Y., Guo, D., Dong, J., Peng, X.,Chen, Z., 2017. Multi-contrast brain magnetic resonance imagesuper-resolution using the local weight similarity. BMC medicalimaging 17 (1), 6.uadra, M. B., 2015. An efficient total variation algorithm forsuper-resolution in fetal brain MRI with adaptive regularization.NeuroImage 118, 584–597.Van Essen, D. C., Smith, S. M., Barch, D. M., Behrens, T. E., Ya-coub, E., Ugurbil, K., Consortium, W.-M. H., et al., 2013. TheWU-Minn human connectome project: an overview. Neuroimage80, 62–79.Van Reeth, E., Tham, I. W., Tan, C. H., et al., 2012. Super-resolutionin magnetic resonance imaging: A review. Concepts Magn Reson40 (6), 306–325.Yang, J., Yin, W., Zhang, Y., Wang, Y., 2009. A fast algorithm foredge-preserving variational multichannel image restoration. SIAMJ Imaging Sci 2 (2), 569–592.Yue, L., Shen, H., Li, J., Yuan, Q., Zhang, H., Zhang, L., 2016.Image super-resolution: The techniques, applications, and future.Signal Processing 128, 389–408.Zhang, X., Lam, E. Y., Wu, E. X., Wong, K. K., 2008. Applicationof Tikhonov regularization to super-resolution reconstruction ofbrain MRI images. In: Medical imaging and informatics. Springer,pp. 51–56.Zhang, Y., Brady, M., Smith, S., 2001. Segmentation of brainMR images through a hidden Markov random field model andthe expectation-maximization algorithm. IEEE Trans Med Imag20 (1), 45–57.Zhao, C., Carass, A., Dewey, B. E., Woo, J., Oh, J., Calabresi,P. A., Reich, D. S., Sati, P., Pham, D. L., Prince, J. 
L., 2018. Adeep learning based anti-aliasing self super-resolution algorithmfor MRI. In: International Conference on Medical Image Comput-ing and Computer-Assisted Intervention. Springer, pp. 100–108.Zheng, H., Qu, X., Bai, Z., Liu, Y., Guo, D., Dong, J., Peng, X.,Chen, Z., 2017. Multi-contrast brain magnetic resonance imagesuper-resolution using the local weight similarity. BMC medicalimaging 17 (1), 6.