Learning image registration without images
Malte Hoffmann
MGH, HMS
Benjamin Billot
UCL
Juan Eugenio Iglesias
MGH, HMS, UCL, and CSAIL, MIT
Bruce Fischl
MGH, HMS, and CSAIL, MIT
Adrian V. Dalca
MGH, HMS, and CSAIL, MIT
Abstract
We introduce a learning strategy for contrast-invariant image registration without requiring imaging data. While classical registration methods accurately estimate the spatial correspondence between images, they solve a costly optimization problem for every image pair. Learning-based techniques are fast at test time, but can only register images with image contrast and geometric content that are similar to those available during training. We focus on removing this image-data dependency of learning methods. Our approach leverages a generative model for diverse label maps and images that exposes networks to a wide range of variability during training, forcing them to learn features invariant to image type (contrast). This strategy results in powerful networks trained to generalize to a broad array of real input images. We present extensive experiments, with a focus on 3D neuroimaging, showing that this strategy enables robust registration of arbitrary image contrasts without the need to retrain for new modalities. We demonstrate registration accuracy that most often surpasses the state of the art both within and across modalities, using a single model. Critically, we show that input labels from which we synthesize images need not be of actual anatomy: training on randomly generated geometric shapes also results in competitive registration performance, albeit slightly less accurate, while alleviating the dependency on real data of any kind. Our code is available at http://voxelmorph.csail.mit.edu.

Introduction

Image registration estimates spatial correspondences between image pairs and is a fundamental component of many medical image analysis pipelines involving data acquired across time, subjects, and modalities. For example, neuroimaging segmentation packages [21, 23, 50] often perform deformable registration to a probabilistic atlas before neuroanatomical structures are labeled. This work focuses on frameworks that are agnostic to image modality, and excel both at uni-modal registration (e.g. between two T1-weighted MRI scans) and multi-modal registration (e.g. between MRI acquired with different contrasts). Both are important in neuroimaging, where different contrasts are commonly acquired, such as T1w for visualizing anatomy or T2w for detecting abnormal fluids [45]. These scans need to be aligned within subjects and to an external template, possibly of another modality [62]. Throughout this paper, we use the terms modality and contrast interchangeably, to designate a mode of acquisition such as MRI variants, which can yield images of very different appearance for the same anatomy.

Classical registration approaches estimate a deformation field between two images by optimizing an objective that balances image similarity with field regularity [3, 5, 43, 48, 56, 59, 67]. While these methods provide strong theoretical background and can yield good results, the expensive optimization needs to be repeated for each image pair, and the optimization objective and strategy typically need to be adapted to suit the types of images being registered. Instead, learning-based registration uses
datasets of images to learn a function that maps an image pair to a deformation field aligning the images [6, 18, 24, 42, 55, 61, 72, 76]. These methods achieve sub-second GPU runtimes, and can improve accuracy and robustness to local minima. Unfortunately, they are limited to the image contrast available during training, and therefore do not perform well on unobserved (new) image types. For example, a model trained on pairs of T1w and T2w MRI scans will not accurately register T1w and proton-density weighted (PDw) pairs. With a focus on neuroimaging, we remove this constraint that learning methods register only image types available during training, and design an approach that generalizes to any unseen image contrast at test time.

Preprint. Under review.

Contribution.
In this work we present SynthMorph, a general learning framework for modality-agnostic registration that can handle a wide variety of unseen image contrasts at test time. SynthMorph enables registration of image pairs both within and across contrasts, without the need for any real imaging data during training. We first introduce a generative model for random label maps of variable geometric shapes. Second, conditioned on these maps, or optionally given other maps of interest, we build on recent methods to generate images of arbitrary contrasts, deformations, and artifacts [10]. Third, we replace the typical image-based objective with a contrast-agnostic loss that measures label overlap. The resulting network learns general features contained within the synthesized image data, invariant to the contrast of a specific modality.

We focus on neuroimaging and demonstrate state-of-the-art performance and runtimes compared to existing learning-based and classical baselines. The proposed framework outperforms existing methods across MRI contrasts, even though the network has never been exposed to any real imaging data. We also study various aspects of the proposed method, including feature invariance and the effects of hyperparameters, architecture size, and simulation aspects. The code is available as part of the VoxelMorph library at http://voxelmorph.csail.mit.edu.

Related work

Classical methods.
Deformable registration has been widely studied [3, 5, 7, 59, 67]. Classical strategies undergo an iterative procedure that estimates an optimal deformation field for each image pair. This involves maximizing an image-similarity metric, which compares the warped moving and fixed images, and a regularization term that encourages desirable deformation properties such as preservation of topology [43, 48, 56, 59]. Cost function and optimization strategies are typically chosen to suit a particular task. Simple similarity metrics such as mean squared difference (MSD) or normalized cross-correlation (CC) [5] are widely used, and often provide excellent accuracy for image pairs of the same contrast [37].

For registration across modalities, metrics such as mutual information (MI) [68] and correlation ratio [54] are often employed, although the accuracy achieved with these cross-contrast measures is not on par with the within-contrast accuracy of CC and MSD [34]. For some tasks, e.g. registering intra-operative ultrasound to MRI, estimating even approximate correspondences can be challenging [64]. Although not often used in neuroimaging, metrics based on patch similarity [26, 28, 46, 69, 74] and normalized gradient fields [38, 39, 60] outperform simpler metrics, e.g. on abdominal computed tomography (CT). Other methods convert images to a supervoxel representation, which is then spatially matched instead of the images [29, 36]. Although our work also employs geometric shapes, we do not generate supervoxels from input images, but synthesize arbitrary patterns (and images) from scratch during training to encourage learning contrast-invariant features for spatial correspondence.

Unfortunately, classical registration methods solve an independent optimization problem for each image pair, omitting advantages of learning across data. Neuroimaging scans typically have millions of voxels, leading to runtimes of several minutes to hours, depending on computational resources and packages used.
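To make the classical formulation concrete, the following toy 1D sketch (an illustration of the generic objective, not the algorithm of any package cited above; all names are ours) aligns two signals by gradient descent on a mean-squared-difference term plus a smoothness penalty on the displacement field:

```python
import numpy as np

def warp_1d(image, disp):
    """Resample a 1D image at x + disp(x) using linear interpolation."""
    x = np.arange(image.size, dtype=float)
    return np.interp(x + disp, x, image)

def register_1d(moving, fixed, lam=0.05, lr=1.0, steps=200):
    """Estimate a displacement u minimizing
    MSD(moving ∘ (Id + u), fixed) + lam * ||grad u||^2 by gradient descent."""
    u = np.zeros_like(moving)
    for _ in range(steps):
        warped = warp_1d(moving, u)
        # Gradient of the MSD term: 2 (m(x+u) - f(x)) m'(x+u)
        dm = np.gradient(moving)
        grad_sim = 2 * (warped - fixed) * warp_1d(dm, u)
        # Gradient of the smoothness term is -2 * Laplacian(u)
        # (np.roll gives a periodic boundary, fine for this toy example).
        lap = np.roll(u, 1) + np.roll(u, -1) - 2 * u
        u -= lr * (grad_sim - 2 * lam * lap)
    return u

# Toy example: a Gaussian bump shifted by 3 voxels.
x = np.arange(100, dtype=float)
fixed = np.exp(-0.5 * ((x - 50) / 5) ** 2)
moving = np.exp(-0.5 * ((x - 47) / 5) ** 2)
u = register_1d(moving, fixed)
err_before = np.mean((moving - fixed) ** 2)
err_after = np.mean((warp_1d(moving, u) - fixed) ** 2)
```

Real packages add multi-resolution schemes, robust metrics, and careful optimizers; the sketch only illustrates why this optimization must be re-run from scratch for every image pair.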
Learning approaches.
Learning-based techniques mostly use convolutional neural networks (CNNs) to learn a function that directly outputs the deformation field given an image pair. After training, evaluating this function is efficient, enabling fast registration. Supervised models learn to reproduce simulated warps or deformation fields estimated by classical methods [20, 41, 55, 61, 75, 76]. In contrast, unsupervised models minimize a loss similar to classical cost functions [6, 16, 17, 40], such as normalized MI (NMI) [25] for cross-contrast registration. In another multi-modal registration paradigm, networks are used to synthesize one modality from the other, so that within-modality losses can be used in subsequent nonlinear registration [9, 14, 34, 52, 58, 63].

Figure 1: Generation of random input label maps. A set of smoothly varying 3D noise images p_j (j ∈ {1, 2, ..., J}) is sampled from a standard distribution, then warped by random deformation fields φ_j to cover a range of spatial scales and shapes. A label map s is synthesized from the warped images p̃_j = p_j ∘ φ_j: for each voxel k of s, we determine in which p̃_j that voxel has the highest intensity, and assign the corresponding label j, i.e. s_k = argmax_j([p̃_j]_k). The example uses J = 26.

Recent approaches also use segmentation-driven losses for registering different imaging modalities labeled during training, such as T2w MRI and 3D ultrasound, within the same subject [32, 33], or aiding existing formulations with auxiliary segmentation data [6, 30, 44]. Unfortunately, all these network-based approaches learn from image data seen during training and, consequently, do not perform well on unobserved modalities. Data augmentation strategies expose a model to a wider range of variability than the training data encompasses, for example by randomly altering voxel intensities or applying deformations [13, 57, 73, 78].
However, even these methods still need to sample data acquired with the target modality during training. Similarly, transfer learning can be used to extend a trained network to new modalities, but does not remove the need for some training data with the target contrast [35].

Method

Let m and f be a moving and a fixed 3D image, respectively. We build on unsupervised learning-based registration frameworks and focus on deformable (non-linear) registration. These use a CNN h_θ with parameters θ that outputs the deformation φ_θ = h_θ(m, f) for image pair {m, f}. In training, the network h_θ is given a pair of images {m, f} at each iteration, and parameters are updated by optimizing a loss function L(θ; m, f, φ_θ) similar to classical cost functions, using stochastic gradient descent. Typically, the loss contains an image dissimilarity term L_dis(m ∘ φ_θ, f), which penalizes the difference in appearance between the warped image and the fixed image, and a regularization term L_reg(φ) that encourages smooth deformations:

    L(θ; m, f, φ_θ) = L_dis(m ∘ φ_θ, f) + λ L_reg(φ_θ),    (1)

where φ_θ = h_θ(m, f) is the network output, and λ controls the weighting of the terms. Unfortunately, networks trained this way only predict reasonable deformations for images with contrasts and shapes similar to the data observed during training. In our framework, we tackle this dependency.

Figure 2: Data synthesis. Top: from random shapes. Bottom: if available, the synthesis can be initialized with anatomical labels. From an input label map s, we generate a pair of label maps {s_m, s_f} and from them images {m, f} with arbitrary contrast. The registration network takes {m, f} as input to predict the displacement field u_{m→f}. In practice, if anatomical labels are available, we generate {s_m, s_f} from two segmentations s from separate subjects.

Figure 3: Proposed unsupervised learning framework for modality-agnostic diffeomorphic registration. We synthesize a pair of 3D label maps {s_m, s_f} and then the corresponding 3D images {m, f} based on a generative model that covers a wide range of shapes and contrasts. The images are used to train a U-Net-style network and the label maps are used in the loss to measure alignment.

We achieve contrast-invariance and robustness to anatomical variability by requiring no real training data, but instead synthesizing arbitrary contrasts and shapes through a generative process (Figure 3). We start from scratch, synthesizing two paired 3D label maps {s_m, s_f} using a function g_s(z) = {s_m, s_f} given random seed z. We then define the function g_i(s_m, s_f, z) = {m, f} that synthesizes two 3D intensity volumes {m, f} based on the maps {s_m, s_f} and seed z.

This generative process resolves the limitations of existing methods as follows. First, training a registration network h_θ(m, f) using the generated images exposes it to arbitrary contrasts and shapes at each iteration, removing network dependency on specific modalities. Second, because we first synthesize label maps, we can use a similarity loss that measures label overlap independent of image contrast, thereby obviating the need for a cost function that depends on the contrasts being registered at that iteration.
In our experiments, we use the (soft) Dice metric [47]

    L′_dis(φ, s_m, s_f) = −(2/J) Σ_{j=1}^{J} |(s_m^j ∘ φ) ⊙ s_f^j| / |(s_m^j ∘ φ) ⊕ s_f^j|,    (2)

where s^j represents the one-hot encoded label j ∈ {1, 2, ..., J} of label map s, and ⊙ and ⊕ denote voxel-wise multiplication and addition, respectively.

Figure 4: Convolutional U-Net architecture implementing φ_θ = h_θ(m, f). Each block of the encoder features a 3D convolution with n = 256 filters and a LeakyReLU layer with parameter 0.2. The encoding stride-2 convolutions each halve the resolution relative to the input. In the decoder, each convolution is followed by an upsampling layer and a skip connection to the corresponding encoder block (indicated by long arrows). The SVF v_θ is obtained after three more convolutions at half resolution, yielding the deformation field φ_θ through integration and upsampling. All convolutional layers use 3×3×3 kernels, and the final convolution leverages n = 3 filters to bring v_θ to the desired shape.

Table 1: Hyperparameters. Spatial measures are given in voxels. In our experiments, images and label maps share a common volume size. If fields are sampled at lower resolution r, the low-resolution volume size is obtained by multiplying each dimension by r and rounding up.

Parameter  r_p   b_p  a_µ  b_µ  a_σ  b_σ  r_B   b_B  b_K  σ_γ   r_v   b_v
Value      1:32  100  25   225  5    25   1:40  0.3  1    0.25  1:16  2
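As an illustration, the soft Dice overlap of equation (2) can be sketched in NumPy as follows (a simplified stand-in for the VoxelMorph implementation; the function and variable names are ours):

```python
import numpy as np

def soft_dice_loss(probs_moving, probs_fixed):
    """Negated mean soft Dice between warped-moving and fixed label
    probabilities, each shaped (J, *volume): one channel per label."""
    axes = tuple(range(1, probs_moving.ndim))               # sum over voxels
    top = 2 * np.sum(probs_moving * probs_fixed, axis=axes)  # voxel-wise product
    bottom = np.sum(probs_moving + probs_fixed, axis=axes)   # voxel-wise sum
    dice = top / np.maximum(bottom, 1e-8)                    # per-label Dice in [0, 1]
    return -np.mean(dice)                                    # average over J labels

# One-hot example: identical label maps yield Dice 1, i.e. a loss of -1.
labels = (np.arange(512).reshape(8, 8, 8)) % 4
onehot = np.stack([(labels == j).astype(float) for j in range(4)])
loss_same = soft_dice_loss(onehot, onehot)
```

Because the loss only compares label probabilities, it is entirely independent of the synthesized image intensities, which is what makes the objective contrast-agnostic.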
While the framework can be used with any parametrization of the deformation field φ, in this work we use a stationary velocity field (SVF) v, which is integrated within the network to obtain a diffeomorphism [2, 3, 16], which is invertible by design. We regularize φ using L_reg(φ) = ‖∇u‖², where u is the displacement of the deformation field φ = Id + u.

To generate input label maps with J labels of random geometric shapes, we first draw J smoothly varying noise images p_j (j ∈ {1, 2, ..., J}) by sampling voxels from a standard distribution at reduced resolution r_p and upsampling to full size (Figure 1). Second, each image p_j is warped with a random smooth deformation field φ_j (described below) to obtain images p̃_j = p_j ∘ φ_j. Third, we create an input label map s by assigning, for each voxel k of s, the label j among the images p̃_j that has the highest intensity, i.e. s_k = argmax_j([p̃_j]_k) (Figure 1).

Given a selected label map s, we generate two new label maps. First, we deform s with a random smooth diffeomorphic transformation φ_m (described below) using nearest-neighbor interpolation to produce the moving segmentation map s_m = s ∘ φ_m. An analogous process yields the fixed map s_f. Alternatively, anatomical segmentations for the anatomy of interest can be used, for example from a public dataset. In this case, we select segmentation maps from two separate subjects at random and further deform them as described above (Figure 2). We emphasize that no acquired images, such as T1w or T2w MRI scans, are involved.

Synthetic images.
From the pair of label maps {s_m, s_f}, we synthesize gray-scale images {m, f} building on generative models of MR images used for Bayesian segmentation [4, 10, 65, 71, 77] (Figure 2). Given a segmentation map s, we draw the intensities of all image voxels that are associated with label j as independent samples from the normal distribution N(µ_j, σ_j). We sample the mean µ_j and standard deviation (SD) σ_j for each label from continuous distributions U(a_µ, b_µ) and U(a_σ, b_σ), respectively, where a_µ, b_µ, a_σ, and b_σ are hyperparameters. To simulate partial volume effects [66], we convolve the image with an anisotropic Gaussian kernel K(σ_i), where σ_i ∼ U(0, b_K) for each dimension i = 1, 2, 3.

We further corrupt the image with a spatially varying intensity-bias field B [8, 31]. We independently sample the voxels of B from a normal distribution N(0, σ_B) at reduced resolution r_B relative to the full image size (described below), where σ_B ∼ U(0, b_B). We upsample B to full size, and take the exponential of each voxel to yield non-negative values before we apply B using element-wise multiplication. We obtain the final images m and f through min-max normalization and contrast augmentation via exponentiation, using the normally distributed parameter γ ∼ N(0, σ_γ) such that m = m̃^exp(γ), where m̃ is the normalized moving image, and similarly for the fixed image (Figure 2).

Random transforms.
We obtain the transforms φ_j (j = 1, 2, ..., J) for noise image p_j by integrating random SVFs v_j [2, 3, 16, 40]. We draw each voxel of v_j as an independent sample of a normal distribution N(0, σ_j) at reduced resolution r_p, where σ_j ∼ U(0, b_p) is sampled uniformly, and each SVF is upsampled to full size. Analogous processes yield the transforms φ_m and φ_f based on hyperparameters r_v and b_v.

The generative process encompasses a number of parameters. During training, we sample these based on the hyperparameters presented in Table 1. Their values are not chosen to mimic realistic anatomy or a particular modality. Instead, we select hyperparameters visually to yield shapes and contrasts that far exceed the range of realistic medical images, forcing the networks to learn generalizable features that are independent of the characteristics of a specific modality [13]. We thoroughly analyze the impact of varying hyperparameters in our experiments.
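The generative steps above can be sketched as follows, here simplified to 2D and omitting the random warps and bias field; the function names and the SciPy-based resampling are our choices for illustration:

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)

def random_label_map(shape=(64, 64), num_labels=4, r=8):
    """Draw smooth noise images at reduced resolution, upsample them, and
    assign each voxel the index of the image with the highest intensity."""
    low = [s // r for s in shape]
    noise = [ndimage.zoom(rng.standard_normal(low), r, order=3)
             for _ in range(num_labels)]
    return np.argmax(np.stack(noise), axis=0)   # label map s

def synthesize_image(labels, a_mu=25, b_mu=225, a_sig=5, b_sig=25, b_k=1.0):
    """Sample per-label Gaussian intensities, then blur to mimic partial volume."""
    image = np.zeros(labels.shape)
    for j in range(labels.max() + 1):
        mask = labels == j
        mu = rng.uniform(a_mu, b_mu)            # label-wise mean
        sig = rng.uniform(a_sig, b_sig)         # label-wise SD
        image[mask] = rng.normal(mu, sig, size=mask.sum())
    image = ndimage.gaussian_filter(image, rng.uniform(0, b_k))
    # Min-max normalization as in the paper (bias field and warps omitted here).
    return (image - image.min()) / (image.max() - image.min())

s = random_label_map()
m = synthesize_image(s)   # two calls give two contrasts of the same geometry
f = synthesize_image(s)
```

Calling synthesize_image twice on the same (or two warped) label maps yields a training pair whose geometry matches but whose contrasts differ arbitrarily, which is the core of the contrast-invariance argument.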
Architecture.
The network implementations follow the VoxelMorph architecture [6, 16]: a convolutional U-Net [57] predicts an SVF v_θ from the input {m, f}. As shown in Figure 4, the encoder has 4 blocks consisting of a stride-2 convolution and a LeakyReLU layer (parameter 0.2), that each halve the resolution relative to their inputs. The decoder features 3 blocks that each include a stride-1 convolution, an upsampling layer, and a skip connection to the corresponding encoder block. We obtain the SVF v_θ after 3 further convolutions at half resolution, and the deformation φ_θ after integration and upsampling. All convolutional layers in the network use n = 256 filters of size 3×3×3, except for the last one (n = 3).

Implementation.
Networks are implemented using Keras [15] with a TensorFlow backend [1]. We use a GPU version [16, 40] of scaling and squaring with 5 steps to integrate SVFs [2, 3].
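Scaling and squaring integrates an SVF by scaling it down by 2^k and then composing the resulting near-identity displacement with itself k times. A 2D NumPy/SciPy sketch of the principle (an illustration, not the paper's GPU implementation; all names are ours):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def compose(disp_a, disp_b):
    """Displacement of φ_a ∘ φ_b, i.e. u_b(x) + u_a(x + u_b(x))."""
    grid = np.stack(np.meshgrid(*[np.arange(s) for s in disp_a.shape[1:]],
                                indexing="ij"))
    coords = grid + disp_b                       # sample locations x + u_b(x)
    warped_a = np.stack([map_coordinates(c, coords, order=1, mode="nearest")
                         for c in disp_a])
    return disp_b + warped_a

def integrate_svf(velocity, steps=5):
    """Scaling and squaring: φ = exp(v) via φ ← φ ∘ φ, repeated `steps` times."""
    disp = velocity / (2 ** steps)   # scale down so the first step is near-identity
    for _ in range(steps):
        disp = compose(disp, disp)   # square
    return disp

# A constant velocity field integrates to (approximately) the same translation.
v = np.zeros((2, 32, 32))
v[0] = 1.5                           # shift by 1.5 voxels along the first axis
phi = integrate_svf(v)
```

Repeated squaring makes the cost logarithmic in the number of composition steps, which is why 5 steps suffice in practice.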
Experiments

We evaluate several variants of the proposed registration network and compare their performance to a set of baselines. The datasets include a variety of image contrasts and levels of processing to assess method robustness. Our main goal is for SynthMorph to generalize well to unseen real contrasts during rapid test-time registration, while matching or exceeding the accuracy of classical methods, or of learning methods tested on the same data type they were trained on.
We compile 3D brain-MRI datasets from the Human Connectome Aging Project (HCP-A) [11, 27] and the Alzheimer's Disease Neuroimaging Initiative (ADNI) [70]. HCP-A includes T1w MPRAGE [49] and T2w T2SPACE scans of the same subjects without preprocessing. ADNI includes T1w gradient-unwarped and bias-field corrected MPRAGE scans. We also use skull-stripped in-house PDw scans from 8 subjects. For image synthesis for one model variant, we use 40 label maps from distinct-subject MPRAGE scans of the Buckner40 dataset [22]. Brain and non-brain labels for skull-stripping, image synthesis, and evaluation are derived using SAMSEG [50], except for the PDw data, which include manual maps. As we focus on deformable registration, all images are mapped to a common affine space [21, 53] at 1 mm isotropic resolution.
Setup.
For each contrast, we run experiments on 30 image pairs, where each image is from a different subject, except for T1-PD pairs, of which we have only 8. To assess robustness to non-brain structures, we evaluate registration within and across datasets, with and without skull-stripping, using additional held-out datasets of the same size.
Baselines.
We test classical registration with ANTs (SyN) [5], using default parameters [51] with the NCC similarity metric within contrast and MI across contrasts. We test NiftyReg [48] with its default cost function (NMI) and parameters, and enable its diffeomorphic model with SVF integration as in our approach. Both ANTs and NiftyReg focus on neuroimaging, leading to appropriate default parameters for our tasks. We also run the deedsBCV patch-similarity method, which we tune for neuroimaging. To match the spatial scales of brain structures, we reduce the default grid spacing, search radius, and quantization step, which improves registration in our experiments.

As a learning baseline, we train VoxelMorph (vm), using the same architecture as SynthMorph but with NCC, on 100 skull-stripped T1w HCP-A images that do not overlap with validation data, and another model trained with NMI on random combinations of 100 T1w and 100 T2w images (including e.g. T2w-T2w pairs).
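For intuition on why the NCC-trained baseline is contrast-dependent, a global (non-windowed) NCC can be sketched as follows; note that vm-ncc uses a local, windowed variant as in VoxelMorph, so this is a simplification with names of our choosing:

```python
import numpy as np

def ncc(a, b, eps=1e-8):
    """Global normalized cross-correlation between two volumes, in [-1, 1]."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.sqrt((a * a).sum() * (b * b).sum()) + eps))

rng = np.random.default_rng(1)
img = rng.random((16, 16, 16))
same = ncc(img, 2.0 * img + 5.0)   # affine intensity remap: correlation preserved
flipped = ncc(img, 1.0 - img)      # contrast inversion: correlation flips sign
```

As a loss one minimizes −NCC. The metric is invariant to affine intensity remapping but flips sign under contrast inversion, which is why it works well within contrast yet fails for arbitrary cross-contrast pairs, motivating MI/NMI there.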
SynthMorph variants.
For imaging-data and shape-agnostic training, we sample {s_m, s_f} from one of 100 random-shape segmentations s at each iteration (sm-shapes). Each s contains J = 26 labels that we all include in the loss L_dis. Since brain segmentations are often available, even if not for the target modality, we also test our framework when starting with a set of brain-anatomy label maps (sm-brains). In this case, we sample {s_m, s_f} from two distinct maps at each iteration and further deform these label maps using synthetic warps. The segmentations include non-brain labels, and we optimize the J = 26 largest brain labels in L_dis (see below). We emphasize that no real images are used during SynthMorph training.

Figure 5: Registration accuracy. Each box shows overlap of anatomical structures for 30 test-image pairs across distinct subjects (8 when using PD contrast). The letters b and x indicate skull-stripped data and registration across datasets (e.g. between ADNI and HCP-A), respectively.
Assessment.
We measure registration accuracy using the Dice metric D [19] across a representative subset of brain structures: amygdala, brainstem, caudate, ventral DC, cerebellar white matter and cortex, pallidum, cerebral white matter (WM) and cortex, hippocampus, lateral ventricle, putamen, thalamus, 3rd and 4th ventricles, and choroid plexus. Bilateral structure scores are averaged. We also compute the proportion of voxels where the deformation φ folds, i.e. with det(J_φ) ≤ 0 for voxel Jacobian J_φ.

Framework analysis.
To evaluate network invariance to realistic MRI contrast, we perform the following procedure for 10 subject pairs. We obtain 8 spoiled gradient-echo [12] images for the same brain, progressing from T1w to PDw as shown in Figure 7, using Bloch-equation simulations with acquired parametric maps (T1, T2*, PD). We run a separate registration between each contrast and the most T1w-like contrast of another subject, and analyze the variation of the features of the last network layer (before the SVF is formed). Specifically, we compute the root-mean-square difference d (RMSD) between the layer outputs of the first and each other contrast over space, averaged over contrasts, features, and subjects. Original MRI sequence details are: spoiled gradient echo, 2 ms echo time (TE), 20 ms repetition time (TR), 2–40° flip angle (FA).

In a validation set, we explore the effect of various hyperparameters. First, we train with regularization weights λ ∈ [0, 2] and evaluate accuracy across: (1) all brain labels, (2) only the largest 26 (bilateral) structures optimized in L_dis. Second, we train variants of our model with varied deformation range b_v, image smoothness b_K, and number of features n per layer. Third, for the case that brain segmentations are available, we analyze the effect of training with (1) full-head labels, (2) brain labels only, or (3) a mixture of both.

Figure 6: Typical registration results for
SynthMorph (sm-brains) and classical methods. Each row shows a registration pair from the source datasets indicated on the left, where the letters b and x mark skull-stripped data and registration across datasets (e.g. between ADNI and HCP-A), respectively. For each dataset, we show the best-performing classical baseline: NiftyReg on the 1st row, ANTs on the 2nd, and deedsBCV on all other rows.

Figure 7: MRI contrasts used to assess network invariance. We obtain images with real MR contrasts progressing from proton-density weighted (PD, top left) to T1-weighted for the same brain using Bloch-equation simulations with acquired parametric maps (T1, T2*, PD). For each subject pair, we compile and illustrate 8 contrasts.

Figure 8: Variability of network layers in response to 10 realistic MRI-contrast pairs from distinct subjects as shown in Figure 7. We use the normalized root-mean-square deviation (RMSD) d between each contrast and the most T1w-like, averaged over contrasts, features, and subjects. For the analysis, networks use the same architecture with n = 64 filters per convolutional layer. Error bars indicate standard deviation across features.

Results

Typical registration results are shown in Figure 6. Figure 5 compares mean Dice scores across structures for all methods, while Figure 12 shows scores for individual structures. By exploiting the anatomical information in a set of brain labels, sm-brains achieves the best accuracy across all datasets, even though no real images are used during training. First, sm-brains outperforms classical methods on all tasks by at least 2 Dice points, and often much more. Second, it matches the state-of-the-art accuracy of vm-ncc for T1-T1 registration, which was trained on real T1 images.
Across contrasts, sm-brains outperforms all other methods by 3 or more Dice points, demonstrating its ability to generalize across contrasts, especially compared to the baseline learning methods, which cannot generalize to contrasts unseen during training.

The shape- and contrast-agnostic network sm-shapes matches the performance of the best classical method for each dataset except T1-T1 registration, where it slightly underperforms, despite never having been exposed to either imaging data or even brain anatomical shapes. Like sm-brains, sm-shapes generalizes well to multi-contrast registration, matching or exceeding the accuracy of all baselines.

The baseline learning methods vm-ncc and vm-nmi perform well and clearly outperform classical methods when faced with contrasts similar to those used in training. However, as expected, these approaches break down when tested on a pair of new contrasts.

In our experiments, learning-based models require less than 1 second per 3D registration on an Nvidia Tesla V100 GPU. Using default settings, NiftyReg and
ANTs typically take on the order of hours per registration on a 3.3-GHz Intel Xeon CPU (single-threaded), whereas deedsBCV typically requires a few minutes.

Figure 8 illustrates the variability of the response of each network layer to varying MRI contrast of the same anatomy shown in Figure 7. Figure 9 contains example feature maps.
SynthMorph models exhibit substantially less variability than baseline learning models for each feature, indicating that the proposed strategy encourages contrast invariance.

Figure 10 shows registration performance for various training settings. For sm-brains, best results are obtained with low deformation strength b_v, when label maps s from two different subjects are used at each iteration (Figure 10A). A larger value of b_v = 2 is optimal for sm-shapes, for which {s_m, s_f} are generated from a single s, thus lacking the inter-subject deformation.

Figure 9: Representative features of the last network layer (before forming the SVF) in response to the realistic MRI-contrast pairs shown in Figure 7. Left: VoxelMorph using normalized mutual information (NMI), exhibiting variability among contrasts. Right: contrast-invariant SynthMorph (sm-brains). For the analysis, both networks use the same architecture with n = 64 filters per convolutional layer.

Figure 10: Framework analysis showing the effect of training settings on registration accuracy: A. Maximum velocity-field SD b_v. B. Maximum image-smoothing SD b_K. C. Number of filters n per convolutional layer.

Random blurring of the images {m, f} improves robustness to data with different smoothing levels, with optimal accuracy at b_K = 1 (Figure 10B). Higher numbers of filters n per convolutional layer boost the accuracy at the cost of increasing training times from days to weeks (Figure 10C), indicating that richer networks are better able to capture and generalize from synthesized data. Finally, training on full-head as compared to skull-stripped images had little impact on accuracy (not shown).

Figure 11A shows that with reduced regularization, accuracy increases for the large structures used in L_dis. When we include smaller structures, the mean overlap D decreases for small λ, as the network then focuses on optimizing the training structures. This does not apply to sm-shapes, which is agnostic to anatomy and trained on all synthetic labels present in the random maps. Figure 11B shows a small proportion of locations where the warp field folds, decreasing with increasing λ. For test results, we use a value of λ = 1, where the proportion of folding voxels is negligible at our numerical precision. Unless indicated otherwise, all hyperparameters are tested using n = 64 convolutional filters per layer to enable analysis.

Discussion

We propose
SynthMorph , an unsupervised learning framework for modality-invariant image registra-tion, training networks with samples of a generative model with widely varying synthetic contrast andshape. In our experiments, we find
SynthMorph variants to achieve registration accuracy matching oroutperforming classical methods, and are significantly faster. If brain labels are available, they enableeven further improvement. While learning-based methods like
VoxelMorph yield results comparableto sm-brains for contrasts observed during training, they break down when presented with othercontrasts, whereas
SynthMorph remains robust.A significant challenge in deployment of neural networks is their generalizability.
SynthMorph performance is achieved although our network has access neither to the contrasts present in the test set nor indeed to any real MRI data, obviating the need to retrain for a new modality. We empirically observe this invariance directly by examining network channel outputs.

Figure 11: Regularization analysis. A. Registration accuracy. B. Proportion of locations where the deformation field φ folds, i.e. with a Jacobian J_φ such that det(J_φ) ≤ 0. C. Average Jacobian determinant. For sufficiently large λ, the deviation from the ideal value is negligible.

We therefore believe this strategy can be broadly applied to registration networks to limit the need for training data while simultaneously improving applicability.

While the proposed technique addresses important drawbacks of classical and learning-based methods for within- and across-contrast registration, it can be expanded in several ways. Since some data is often available at training, we will investigate whether combining real images and synthetic data might improve registration accuracy in popular use cases, while maintaining invariance to unseen data. Combining our training loss with classical cost functions may also improve accuracy by providing better signal in label interiors. Finally, while we focus on neuroimaging, this approach promises to be extensible to images of other body parts, and may eventually lead to a single registration model that can accurately align unprocessed images of any modality and body part, or even images outside medical imaging.

Broader impact
Any image registration algorithm that depends on a collection of images can be biased by the demographic makeup of the available subjects. For example, concerns have been expressed in the neuroimaging community that atlases used to instantiate a reference coordinate system are ripe for bias due to the relatively homogeneous subject populations used to construct them.

In this work we focus on the development of a general-purpose tool for nonlinear contrast-agnostic (e.g. T1w-to-PDw MRI) registration to fill an unmet need in medical imaging. We achieve this via two approaches: first, using brain label maps (no images) that are then warped based on deformations that are substantially more extreme than what is seen in the real world, and second, using no real data but instead training a network with labels and images generated purely from noise. We believe that these approaches have technical promise and, from a societal perspective, can considerably reduce bias by eliminating the need for any training population and hence a potentially biased demographic sample.

For proper deployment, the framework needs to be evaluated thoroughly, including the range of geometric and imaging features that are important for the safe generalizability of such networks. Because the method presents a new training paradigm, it needs to be analyzed on a representative, broad set of imaging data to ensure consistent results across patient demographics as well as anatomical and pathological variability. Although more research is needed for safe deployment, we believe these approaches hold promise in providing fair and impartial algorithms.

Finally, we also hope that the method will eliminate the need to acquire large datasets for registering a particular modality, relieving environmental burdens due to the technologies involved in acquisition, as well as scientific and societal burdens, as it has the potential to free researchers from both extra acquisitions and manually labeling those data for evaluation or training.
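The folding measure reported in Figure 11B can be computed for any dense deformation φ(x) = x + u(x) from the determinant of its Jacobian. A minimal NumPy sketch, where the function names, array layout, and central-difference gradient scheme are illustrative assumptions rather than the implementation used in this work:

```python
import numpy as np

def jacobian_determinant(disp):
    """Jacobian determinant of a 3D deformation phi(x) = x + u(x).

    disp: displacement field u of shape (X, Y, Z, 3), in voxel units.
    Returns an (X, Y, Z) array of det(J_phi) values.
    """
    # Finite-difference gradients: grads[i][j] = d(u_i)/d(x_j).
    grads = [np.gradient(disp[..., i], axis=(0, 1, 2)) for i in range(3)]
    # Jacobian of phi = identity + displacement gradient.
    J = np.empty(disp.shape[:3] + (3, 3))
    for i in range(3):
        for j in range(3):
            J[..., i, j] = grads[i][j] + (1.0 if i == j else 0.0)
    # np.linalg.det broadcasts over the leading spatial dimensions.
    return np.linalg.det(J)

def folding_fraction(disp):
    """Fraction of voxels where the field folds, i.e. det(J_phi) <= 0."""
    return float(np.mean(jacobian_determinant(disp) <= 0))
```

For an identity transform (zero displacement) the determinant is 1 everywhere and the folding fraction is 0; locations with det(J_φ) ≤ 0 are those counted as folds under the criterion in the Figure 11 caption.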
Acknowledgments
This research was supported by the European Research Council (ERC Starting Grant 677697, project BUNGEE-TOOLS) and by the National Institutes of Health (NIH) grant U01-AG052564. Further support was provided in part by the BRAIN Initiative Cell Census Network grant U01-MH117023; the National Institute for Biomedical Imaging and Bioengineering (P41-EB015896, 1R01-EB023281, R01-EB006758, R21-EB018907, R01-EB019956); the National Institute on Aging (1R56-AG064027, 1R01-AG064027, 5R01-AG008122, R01-AG016495); the National Institute of Mental Health; the National Institute of Diabetes and Digestive and Kidney Diseases (1-R21-DK-108277-01); and the National Institute for Neurological Disorders and Stroke (R01-NS0525851, R21-NS072652, R01-NS070963, R01-NS083534, 5U01-NS086625, 5U24-NS10059103, R01-NS105820). The project also benefited from the resources provided by Shared Instrumentation Grants 1S10-RR023401, 1S10-RR019307, and 1S10-RR023043. Additional support was provided by the NIH Blueprint for Neuroscience Research (5U01-MH093765), part of the multi-institutional Human Connectome Project. BF has a financial interest in CorticoMetrics, a company focusing on brain imaging and measurement technologies. BF's interests were reviewed and are managed by Massachusetts General Hospital and Partners HealthCare in accordance with their conflict of interest policies.
Figure: Registration accuracy of anatomical structures for full-head T1w-to-T1w registration (HCP-A). Each box summarizes the label overlap for a specific structure across test-image pairs of distinct subjects.