Higher-Order Momentum Distributions and Locally Affine LDDMM Registration
aa r X i v : . [ c s . C V ] N ov HIGHER-ORDER MOMENTUM DISTRIBUTIONS AND LOCALLYAFFINE LDDMM REGISTRATION
STEFAN SOMMER ∗ , MADS NIELSEN ∗ , † , SUNE DARKNER ∗ , AND
XAVIER PENNEC ‡ Abstract.
To achieve sparse parametrizations that allows intuitive analysis, we aim to representdeformation with a basis containing interpretable elements, and we wish to use elements that havethe description capacity to represent the deformation compactly . To accomplish this, we introduce inthis paper higher-order momentum distributions in the LDDMM registration framework. While thezeroth order moments previously used in LDDMM only describe local displacement, the first-ordermomenta that are proposed here represent a basis that allows local description of affine transfor-mations and subsequent compact description of non-translational movement in a globally non-rigiddeformation. The resulting representation contains directly interpretable information from bothmathematical and modeling perspectives. We develop the mathematical construction of the registra-tion framework with higher-order momenta, we show the implications for sparse image registrationand deformation description, and we provide examples of how the parametrization enables registra-tion with a very low number of parameters. The capacity and interpretability of the parametrizationusing higher-order momenta lead to natural modeling of articulated movement, and the methodpromises to be useful for quantifying ventricle expansion and progressing atrophy during Alzheimer’sdisease.
Key words.
LDDMM, diffeomorphic registration, RHKS, kernels, momentum, computationalanatomy
AMS subject classifications.
1. Introduction.
In many image registration applications, we wish to describethe deformation using as few parameters as possible and with a representation thatallows intuitive analysis: we search for parametrizations with basis elements that havethe capacity to describe deformation sparsely while being directly interpretable . Forinstance, we wish to use such a representation to compactly describe the progressiveatrophy that occurs in the human brain suffering from Alzheimer’s disease and thatcan be detected by the expansion of the ventricles [19, 13].Image registration algorithms often represent translational movement in a densesampling of the image domain. Such approaches fail to satisfy the above goals: lowdimensional deformations such as expansion of the ventricles will not be representedsparsely; the registration algorithm must optimize a large number of parameters; andthe expansion cannot easily be interpreted from the registration result.In this paper, we use higher-order momentum distributions in the LDDMM regis-tration framework to obtain a deformation parametrization that increases the capacity of sparse approaches with a basis consisting of interpretable elements. We show howthe higher-order representation model locally affine transformations, and we use thecompact deformation description to register points and images using very few param-eters. We illustrate how the deformation coded by the higher-order momenta can bedirectly interpreted and that it represents information directly useful in applications:with low numbers of control points, we can detect the expanding ventricles of thepatient shown in Figure 1.1.
Most of the methods for non-rigid registration in medicalimaging model the displacement of each spatial position by either a combination of ∗ Dept. of Computer Science, Univ. of Copenhagen, Denmark ( [email protected] ) † BiomedIQ, Copenhagen, Denmark ‡ Asclepios Project-Team, INRIA Sophia-Antipolis, France1
50 100 150 200 25050100150200250 (a) Baseline with control points.
50 100 150 200 25050100150200250 1020304050607080 (b) Follow up (box marking zoom area, figure(c) and (d)).
90 100 110 120 130 140 150 160 170708090100110120130140 −0.04−0.0200.020.040.060.08 (c) log-Jacobian in ventricle area.
80 90 100 110 120 130 140 150 160 170 180708090100110120130140 (d) Initial deformation field in ventricle area.
Fig. 1.1 . Progressing Alzheimer’s disease cause atrophy and expansion of the ventricles. Byplacing five deformation atoms in the ventricle area of the baseline MRI scan [20] and by using higher-order momenta , we can detect the expansion. (a) The position of the deformation atomsshown in the baseline scan; (b) the follow up scan; (c) the log -Jacobian determinant of the generateddeformation in the ventricle area (red box in (b)); (d) the vector field at t = 0 of the generateddeformation. The logarithm of the Jacobian determinant and the divergence at the deformationatoms are positive which is in line with the expected ventricle expansion, confer also Figure 7.5. suitable basis functions for the displacement itself or for the velocity of the voxels.The number of control points vary between one for each voxel [2, 17, 7] and fewer withlarger basis functions [25, 5, 11]. For all methods, the infinite-dimensional space ofdeformations is approximated by the finite- but high-dimensional subspace spannedby the parametrization of the individual method. The approximation will be goodif the underlying deformation is close to this subspace, and the representation willbe compact, if few basis functions describe the deformation well. The choice of ba-sis functions play a crucial role, and we will in the rest of the paper denote them deformation atoms . Two main observations constitute the motivation for the workpresented in this paper: Observation 1: Order of the Deformation Model.
In the majority of registrationmethods, the deformation atoms model the local translation of each point. We wisha richer representation which is in particular able to model locally linear componentsin addition to local translations. The Polyaffine and Log-Euclidean Polyaffine [3,1] frameworks pursue this by representing the velocity of a path of deformationslocally by matrix logarithms. Ideas from the Polyaffine methods have recently beenincorporated in e.g. the Demons algorithm [32] but, to the best of our knowledge, notin the LDDMM registration framework. We wish to extend the set of deformationatoms used in LDDMM to allow representation of first- and higher-order structure andhence incorporate the benefits of the Polyaffine methods in the LDDMM framework.
Observation 2: Order of the Similarity Measure.
When registering DT images,the reorientation is a function of the derivative of the warp; curve normals also containdirectional information which is dependent on the warp derivative and airway treescontain directional information in the tree structure which can be used for measuringsimilarity. These are examples of similarity measures containing higher-order infor-mation . For the case of image registration, the warp derivative may also enter theequation either directly in the similarity measure [24, 22] or to allow use of more imageinformation than provided by a sampling of the warp. Consider an image similaritymeasure on the form U ( ϕ ) = R Ω F ( I m ( ϕ − ( x )) , I f ( x )) dx . A finite sampling of thedomain Ω can approximate this with˜ U ( ϕ ) = 1 N N X k =1 F ( I m ( ϕ − ( x k )) , I f ( x k )) . Letting { p , . . . , p P } be uniformly distributed points around 0, we can increase theamount of image information used in ˜ U ( ϕ ) without additional sampling of the warpby using a first-order approximation of ϕ − :˜ U ( ϕ ) = 1 N P N X k =1 N X l =1 F ( I m ( Dϕ − p l + ϕ − ( x k )) , I f ( p l + x k )) . This can be considered an increase from zeroth to first-order in the approximation of U . Besides including more image information than provided by the initial samplingof the warp, the increase in order allows capture of non-translational information -e.g. rotation and dilation - in the similarity measure. The approach can be seen as aspecific case of similarity smoothing; more examples of smoothing in intensity basedimage registration can be found in [9].We focus on deformation modeling with the Large Deformation DiffeomorphicMetric Mapping (LDDMM) registration framework which has the benefit of bothproviding good registrations and drawing strong theoretical links with Lie group the-ory and evolution equations in physical modeling [8, 34]. Most often, high-dimensionalvoxel-wise representations are used for LDDMM although recent interest in compact representations [11, 28] show that the number of parameters can be much reduced.These methods use interpolation of the velocity field by deformation atoms to repre-sent translational movement but deformation by other parts of the affine group cannotbe compactly represented.The deformation atoms are called kernels in LDDMM. The kernels are centered atdifferent spatial positions and parameters determine the contribution of each kernel.In this paper, we use the partial derivative reproducing property [35] to show thatpartial derivatives of kernels fit naturally in the LDDMM framework and constitutedeformation atoms along with the original kernels. In particular, these deformationatoms have a singular higher-order momentum and the momentum stays singularwhen transported by the EPDiff evolution equations. We show how the higher-ordermomenta allow modeling locally affine deformations, and they hence extend the capac-ity of sparsely discretized LDDMM methods. In addition, they comprise the naturalvehicle for incorporating first-order similarity measures in the framework. A number of methods for non-rigid registration have beendeveloped during the last decades including non-linear elastic methods [21], parametriza-tions using static velocity fields [2, 17], the demons algorithm [29, 32], and spline-basedmethods [25, 5]. For the particular case of LDDMM, the groundbreaking work ap-peared with the deformable template model by Grenander [16] and the flow approachby Christensen et al. [7] together with the theoretical contributions of Dupuis etal. and Trouv´e [10, 30]. Algorithms for computing optimal diffeomorphisms havebeen developed in [4], and [31] uses the momentum representation for statistics anddevelops a momentum based algorithm for the landmark matching problem.Locally affine deformations can be modeled using the Polyaffine and Log-EuclideanPolyaffine [3, 1] frameworks. The velocity of a path of deformations is here computedusing matrix logarithms, and the resulting diffeomorphism flowed forward by integrat-ing the velocity. Ideas from the Polyaffine methods have recently been incorporated ine.g. the Demons algorithm [32, 26]. In LDDMM, the deformation atoms, the kernels,represent translational movement and the non-translational part of affine transforma-tions cannot directly be represented. We will show how partial derivatives of kernelsconstitute deformation atoms which allow representing the linear parts of affine trans-formations. From a mathematical point of view, this is possible due to the partialderivative reproducing property (Zhou [35]). The partial derivative reproducing prop-erty, partial derivatives of kernels, and first-order momenta have previously been usedin [6] to derive variations of flow equations for LDDMM DTI registration, in [14] tomatch landmarks with vector features, and in [15] to match surfaces with currents.Confer the monograph [34] for information on RKHSs and their role in LDDMM.In order to reduce the dimensionality of the parametrization used in LDDMM,Durrleman et al. [11] introduced a control point formulation of the registration prob-lem by choosing a finite set of control points and constraining the momentum to beconcentrated as Dirac measures at the point trajectories. As we will see, higher-ordermomenta make a finite control point formulation possible which is different in impor-tant aspects. Younes [33] in addition considers evolution in constrained subspaces.Higher-order momenta increase the capacity of the deformation parametrization,a goal which is also treated in sparse multi-scale methods such as the kernel bundleframework [28]. This method concerns the size of the kernel in contrast to the orderwhich we deal with here. As we will discuss in the experiments section, the sizeof the kernel is important when using the higher-order representation as well, andrepresentations using higher-order momenta will likely complement the kernel bundlemethod if applied together.
We start the paper with an overview of LDDMMregistration and the mathematical constructs behind the method. In the followingsection, we describe registration using higher-order image information and parame-terization using higher-order momentum distributions. We then turn to the mathe-matical background of the method and describe the evolution of the momentum andvelocity fields governed by the EPDiff evolution equations in the first-order case. Thenext sections describes the relation to polyaffine approaches, the effect of varyingthe initial conditions, and the backwards gradient transport. We then provide ex-amples and illustrate how the deformation represented by first-order atoms can beinterpreted when registering human brains with progressing atrophy. The paper endswith concluding remarks and outlook.
2. LDDMM Registration, Kernels, and Evolution Equations.
In the LD-DMM framework, registration is performed through the action of diffeomorphisms ongeometric objects. This approach is very general and allows the framework to be ap-plied to both landmarks, curves, surfaces, images, and tensors. In the case of images,the action of a diffeomorphism ϕ on the image I : Ω → R takes the form ϕ.I = I ◦ ϕ − ,and given a fixed image I f and moving image I m , the registration amounts to a searchfor ϕ such that ϕ.I m ∼ I f . In exact matching, we wish ϕ.I m be exactly equal to I f but, more frequently, we allow some amount of inexactness to account for noise in theimages and allow for smoother diffeomorphisms. This is done by defining a similaritymeasure U ( ϕ ) = U ( ϕ.I m , I f ) on images and a regularization measure E to give acombined energy E ( ϕ ) = E ( ϕ ) + λU ( ϕ.I m , I f ) . (2.1)Here λ is a positive real representing the trade-off between regularity and goodnessof fit. The similarity measure U is in the simplest form the L -error R Ω | ϕ.I m ( x ) − I f ( x ) | dx but more advanced measures can be used (e.g. [23, 18, 9]).In order to define the regularization term E , we introduce some notations in thefollowing: Let the domain Ω be a subset of R d with d = 2 ,
3, and let V denote aHilbert space of vector fields v : Ω → R d such that V with associated norm k · k V isincluded in L (Ω , R d ) and admissible [34, Chap. 9], i.e. sufficiently smooth. Given atime-dependent vector field t v t with Z k v t k V dt < ∞ (2.2)the associated differential equation ∂ t ϕ t = v t ◦ ϕ t has with initial condition ϕ s adiffeomorphism ϕ vst as unique solution at time t . The set G V of diffeomorphismsbuilt from V by such differential equations is a Lie group, and V is its tangent spaceat the identity. Using the group structure, V is isomorphic to the tangent space ateach point ϕ ∈ G V . The inner product on V associated to a norm k · k V makes G V a Riemannian manifold with right-invariant metric. Setting ϕ v = Id Ω , the map t ϕ v t is a path from Id Ω to ϕ with energy given by (2.2) and generated by v t . Wewill use this notation extensively in the following. A critical path for the energy (2.2)is a geodesic on G V , and the regularization term E is defined using the energy by E ( ϕ ) = min v t ∈ V,ϕ v = ϕ Z k v s k V ds , (2.3)i.e. it measures the minimal energy of diffeomorphism paths from Id Ω to ϕ . Sincethe energy is high for paths with great variation, the term penalizes highly varyingpaths, and a low value of E ( ϕ ) thus implies that ϕ is regular. As a consequence of the assumed admissibilityof V , the evaluation functionals δ x : v v ( x ) ∈ R d is well-defined and continuousfor any x ∈ Ω. Thus, for any z ∈ R d the map z ⊗ δ x : v z T v ( x ) belongs to thetopological dual V ∗ , i.e. the continuous linear maps on V . This in turn implies theexistence of spatially dependent matrices K : Ω × Ω → R d × d , the kernel , such that,for any constant vector z ∈ R d , the vector field K ( · , x ) z ∈ V represents z ⊗ δ x and h K ( · , x ) z, v i V = z ⊗ δ x ( v ) for any v ∈ V , point x ∈ Ω and vector z ∈ R d . Thislatter property is denoted the reproducing property and gives V the structure of areproducing kernel Hilbert space (RKHS). Tightly connected to the norm and kernelsis the notion of momentum given by the linear momentum operator L : V → V ∗ ⊂ L (Ω , R d ) which satisfies h Lv, w i L (Ω , R d ) = Z Ω (cid:0) Lv ( x ) (cid:1) T w ( x ) dx = h v, w i V for all v, w ∈ V . The momentum operator connects the inner product on V with theinner product in L (Ω , R d ), and the image Lv of an element v ∈ V is denoted themomentum of v . The momentum Lv might be singular and in fact L (cid:0) K ( · , y ) z (cid:1) ( x ) isthe Dirac measure δ y ( x ) z . Considering K as the map z R Ω K ( · , x ) z ( x ) dx , L canbe viewed as the inverse of K . We will use the symbol ρ for the momentum whenconsidered as a functional in V ∗ while we switch to the symbol z when the momentumis realized as a vector field on Ω or for the parameters when the momentum consistsof a finite number of singular point measures.Instead of deriving the kernel from V , the opposite approach can be used: build V from a kernel, and hence impose the regularization in the framework from the kernel.With this approach, the kernel is often chosen to ensure rotational and translationalinvariance [34] and the scalar Gaussian kernel K ( x, y ) = exp( − k x − y k σ )Id d is an oftenused choice. Confer [12] for details on the construction of V from Gaussian kernels. The relation be-tween norm and momentum leads to convenient equations for minimizers of the energy(2.1). In particular, the EPDiff equations for the evolution of the momentum z t foroptimal paths assert that if ϕ t is a path minimizing E ( ϕ ) with ϕ = ϕ minimizing E ( ϕ ) and v t is the derivative of ϕ t then v t satisfies the system v t = Z Ω K ( · , x ) z t ( x ) dx ,ddt z t = − Dz t v t − z t ∇ · v t − ( Dv t ) T z t with Dz t and Dv t denoting spatial differentiation of the momentum and velocityfields, respectively. The first equation connects the momentum z t with the velocity v t , and the second equation describes the time evolution of the momentum. In themost general form, the EPDiff equations describe the evolution of the momentumusing the adjoint map. Following [34], define the adjoint Ad ϕ v ( x ) = ( Dϕ v ) ◦ ϕ − ( x )for v ∈ V . The dual of the adjoint is the functional Ad ∗ ϕ on the dual V ∗ of V definedby (Ad ∗ ϕ ρ | v ) = ( ρ | Ad ϕ ( v )). Define in addition Ad Tϕ v = K (Ad ∗ ϕ ( Lv )) which thensatisfies D Ad Tϕ v, w E = (Ad ∗ ϕ ( Lv ) | w ), and let ∇ ϕ U denote the gradient of the similaritymeasure U with respect to the inner product on V so that h∇ ϕ U, v i V = ∂ ǫ U ( ψ v ǫ ◦ ϕ ) forany variation v ∈ V and diffeomorphism path ψ v ǫ with derivative v . For optimal paths v t , the EPDiff equations assert that v t = Ad Tϕ vt v with v = − ∇ ϕ v U which leadsto the conservation of momentum property for optimal paths. Conversely, the EPDiffequations reduce to simpler forms for certain objects. For landmarks x , . . . , x N , the Here and in the following, we will use the notation ( p | v ) := p ( v ) for evaluation of the functional p ∈ V ∗ on the vector field v ∈ V . momentum will be concentrated at point trajectories x t,k := ϕ t ( x k ) as Dirac measures z t,k ⊗ δ x t,k leading to the finite dimensional system of ODE’s v t = N X l =1 K ( · , x t,l )) z t,l , ddt ϕ t ( x k ) = v t ( x t,k ) ,ddt z t,k = − N X l =1 ∇ K ( x t,l , x t,k ) z Tt,k z t,l . (2.4)
3. Registration with Higher-Order Information.
We here introduce higher-order momentum distributions for registration using higher-order information withthe LDDMM framework. We start by motivating the construction by considering theapproximation used when computing the similarity measure. We then describe thedeformation encoded by higher-order momenta and the evolution equations in thefinite case, and we use this to derive a registration algorithm using first-order infor-mation. The mathematical background behind the method will be presented in thefollowing sections.We will motivate the introduction of higher-order momenta by considering a spe-cific case of image registration: we take on the goal of using a control point formulation[11] when solving the registration problem (2.1) and hence aim for using a relativelysparse sampling of the velocity or momentum field. To achieve this, we will considerthe coupling between the transported control points { ϕ − ( x ) , . . . , ϕ − ( x N ) } and thesimilarity measure in order to ensure the momentum stays singular and localized atthe point trajectories while removing the need for warping the entire image at everyiteration of the optimization process.Considering a similarity measure U ( ϕ ) = R Ω F ( I m ( ϕ − ( x )) , I f ( x )) dx as discussedin the introduction, and a finite discretization ˜ U ( ϕ ) = 1 /N P Nk =1 F ( ϕ.I m ( x k ) , I f ( x k ))with a sparse set of control points { x k } . While using ˜ U ( ϕ ) to drive registration ofthe images will be very efficient in evaluating the warp in few points, it will suffercorrespondingly from only using image information present in those points. Apartfrom not being robust under the presence of noise in the images, the discretizationimplies that local dilation or rotation around the points ϕ − ( x k ) cannot be detected:any variation v ∈ V of ϕ keeping ϕ − ( x k ) constant for all k = 1 , . . . , N will not change˜ U ( ϕ ). Formally, if ψ ǫ is a diffeomorphism path that is equal to ϕ at t = 0 and hasderivative v at t = 0, i.e. ∂ ǫ ψ ǫ = v and ψ = ϕ , then ∂ ǫ F ( ψ ǫ .I m ( x k ) , I f ( x k )) = ∂ F ( ϕ.I m ( x k ) , I f ( x k )) · (cid:0) ∇ ϕ − ( x k ) I m (cid:1) T v ( ϕ − ( x k ))which vanishes if v ( ϕ − ( x k )) = 0. Here ∂ F denotes the derivative of F : R → R with respect to the first variable.A simple way to include more image information in the similarity measure is toconvolve with a kernel K s , and thus extend ˜ U to U ( ϕ ) = 1 N N X k =1 c K s Z Ω K s ( p + x k , x k ) F ( ϕ.I m ( p + x k ) , I f ( p + x k )) dp with c K s a normalization constant. If K s is a box kernel, this amounts to a finersampling of both the image and warp, and hence a finer discretization of the Riemannintegral. The kernel K s should not be confused with the RKHS kernel connected tothe norm on V that is used when generating the V -gradient. A Gaussian kernel maybe used for K s , and more information on using smoothing kernels for intensity basedimage registration can be found in [9, 36].The measure U ( ϕ ) is problematic since a variation of ϕ would affect not onlythe point ϕ − ( x k ) but also ϕ.I m ( p + x k ), and U ( ϕ ) will therefore be dependent on ϕ.I m ( p + x k ) for any p where K s ( p, x k ) is non-zero. In this situation, the momentum isno longer concentrated in Dirac measures located at ϕ − t ( x k ), and it will be necessaryto increase the sampling of the warp. However, a first-order expansion of ϕ − yieldsthe approximation˜ U ( ϕ ) = 1 N N X k =1 c K s Z Ω K s ( p + x k , x k ) F ( I m ( D x k ϕ − p + ϕ − ( x k )) , I f ( p + x k )) dp . (3.1)The measure ˜ U ( ϕ ) is now again local depending only on ϕ − ( x k ) and the first-orderderivatives D x k ϕ − . It offers the stability provided by the convolution with K s ,and, importantly, variations v of ϕ keeping ϕ − ( x k ) constant but changing D x k ϕ − do indeed affect the similarity measure. This implies that ˜ U ( ϕ ) is able to catchrotations and dilations and drive the search for optimal ϕ accordingly. Please notethe differences with the approach of Durrleman et al. [11]: when using ˜ U ( ϕ ) asoutlined here, the need for flowing the entire moving image forward is removed andthe momentum field will stay singular directly thus removing the need for constrainingthe form of the velocity field. Thedependence on Dϕ in the similarity measure ˜ U ( ϕ ) raises the question of how torepresent variations of Dϕ in the LDDMM framework. As we will outline here, higher-order momenta appear as the natural choice for such a representation that keeps thebenefits of the finite control point formulation. Mathematical details will follow inthe next sections.Recall the reproducing property of the RKHS structure, i.e. h K ( · , x ) z, v i V = z ⊗ δ x ( v ) for v ∈ V , x ∈ Ω and z ∈ R d . Let us define the maps z ⊗ D αx : V → R that extend the Diracs z ⊗ δ x ( v ) by measuring the derivative of v at x . These will bedenoted higher-order Diracs , and we say that the momentum distribution is of higher-order if it is a sum of higher order Diracs. When applying the momentum operator L to the higher-order Diracs, we will get partial derivatives D αx K of the RHKS kernel K . In particular, we will see that when using similarity measures such as ˜ U ( ϕ ), themomentum field will be a linear combination of higher-order Diracs and the velocityfield will, correspondingly, be a linear combination of partial derivatives of K . Thiswill imply that the finite dimensional system of ODE’s (2.4) describing the EPDiffequations in the landmark case will be extended so that the velocity v t will containpartial derivatives D αx K . In the first-order case, we will get the velocity v ( · ) = N X l =1 (cid:0) K ( · , x l ) z l + d X j =1 D j K ( · , x l ) z jl (cid:1) (3.2)where z i denotes the coefficients of the Dirac measures as in (2.4) but now the ad-ditional vectors z ji denote the coefficients of the first-order Diracs z ji ⊗ D jx i for eachof the d dimensions j = 1 , . . . , d . We will later show how these coefficients evolve. (a) The RHKS kernel encodeslocal translation. (b) Ensembles of kernels canapproximate locally affine de-formation. (c) Derivatives of the kerneldirectly encode locally affinedeformation. Fig. 3.1 . The deformation encoded by the kernel: (a) the RHKS kernel, here a Gaussian ofscale 8 in grid units, encodes local translation; (b) locally affine deformation, here expansion, can beapproximated by placing kernels close together. When moving these kernels infinitesimally close, thederivative of the kernel arises in the limit, and (c) the derivative encode locally affine deformationdirectly. With higher-order momenta, we will use derivatives of the kernel as deformation atoms.
Combined with knowledge of how variations of z i and z ji affect the system, we cantransport variational information along the optimal paths specified by the EPDiffequations and thus provide the necessary building blocks for a first-order registrationalgorithm.Figure 3.1 illustrates how the local translation encoded by the kernel is com-plemented by locally affine deformation when incorporating first-order momenta andcorresponding partial derivatives of the kernel. Using the language of deformationatoms, the first-order constructions adds partial derivatives of kernels to the usual setof atoms, and the deformation atoms are thus able to compactly encode expansion,contraction, rotation etc. We can directly interpret the coefficients of the first-ordermomenta as controlling the magnitude of these first-order deformations. In Figure 7.1in the experiments section, we give additional illustrations of the deformation encodedby the new atoms. In this section, we will derivea registration algorithm for similarity measures incorporating first-order informationsuch as ˜ U ( ϕ ). Since the algorithm works for general first-order measures, we willagain let U denote the similarity measure with ˜ U ( ϕ ) being just a particular example.There exists various choices of optimization algorithms for LDDMM registration.Roughly, they can be divided into two groups based on whether they represent theinitial momentum/velocity or the entire path ϕ t . Here, we take the approach ofincorporating first-order momenta with the shooting method of e.g. Vaillant et al.[31]. The algorithm will take a guess for the initial momentum, integrate the EPDiffequations forward, compute the similarity measure gradient ∇ U ( ϕ ), and flow thegradient backwards to provide an improved guess.The registration problem (2.1) consists of both the similarity measure and theminimal path energy E . For e.g. landmark based registration, the similarity U ( ϕ ) ismost often expressed in terms of ϕ directly whether as the similarity measure is usuallydependent on the inverse ϕ − for image registration. In the first case, the gradient ∇ ϕ U is known, and, given the initial momentum z , we can obtain the gradient ∇ z U for a gradient descent based optimisation procedure from the backwards transportequations that we derive in Section 6. For the energy part, it is a fundamental propertyof critical paths in the LDDMM framework that the energy stays constant along the0path. Thus, we can easily compute the gradient from the expressions provided inSection 4. Given this, the zeroth order matching algorithm in the initial momentumis generalized to zeroth and first-order momenta in Algorithm 1. Algorithm 1
Matching with Zeroth and First-Order Momenta. z ← initial guess for initial momentum repeat Solve EPDiff equations forwardCompute similarity U Solve backwards the transpose equationsCompute the energy gradient ∇k v k Update z from ∇k v k + ∇ z U until convergenceTraditionally, the similarity measure U ( ϕ ) is in image matching formulated usingthe inverse of ϕ , and this approach was taken when formulating the approximation˜ U ( ϕ ) in (3.1). For this reason, at finite control point formulation is naturally ex-pressed using a sampling { x , . . . , x N } in the target image with the algorithm opti-mizing for the momentum z at time t = 1. The evaluation points ϕ − ( x k ) are thengenerated by flowing backwards from t = 1 to t = 0, and the gradient of U ( ϕ ) canbe computed in ϕ − ( x k ) before being flowed forwards to update z . Algorithm 1 willaccommodate this situation by just reversing the integration directions. The controlpoints can be chosen either at e.g. anatomically important locations, at random, or ona regular grid. In the experiments, we will register expanding ventricles using controlpoints placed in the ventricles. The integration of the flow equations can beperformed with standard Runge-Kutta integrators such as Matlabs ode45 procedure.With zeroth order momenta only and N points, the forward and backwards systemconsist of 2 dN equations. With zeroth and first-order momenta, the forward systemis extended to N (2 d + d ) and the backwards system to 2 N ( d + d ). For d = 3, thisimplies an 2 . d = 3 require at least four zeroth-order atoms for each first-order atom makingthe size of the approximating system equivalent to the first-order system. Due to thenon-linearity of the systems, the effect of the approximation introduced with such asan approach is not presently established.In addition to the increase in the size of the systems, the extra floating point op-erations necessary for computing the more complicated evolution equations should beconsidered. The additional computational effort should, however, be viewed againstthe fact that the finite dimensional system can contain orders of magnitude fewercontrol points, and the added capacity of deformation description included in thederivative information. In addition and in contrast to previous approaches, we trans-port the similarity gradient only at the control point trajectories, again an order ofmagnitude reduction of transported information.
4. Higher-Order Momentum Distributions.
We now link partial derivativesof kernels to higher-order momenta using the derivative reproducing property, and weprovide details on the EPDiff evolution equations that we outlined in the previous1section. We underline that the analytical of structure of LDDMM is not changedwhen incorporating higher-order momenta, and the evolution equations will thus beparticular instances of the general EPDiff equations. These equations in Hamiltonianform constitutes the explicit expressions that allows implementation of the registrationalgorithm.We will restrict to scalar kernels when appropriate for simplifying the notation.Scalar kernels are diagonal matrices where all diagonal elements are equal. Thus, wecan consider K ( x, y ) both a matrix and a scalar so that the entries K ji ( x, y ) of thekernel in matrix form equals the scalar K ( x, y ) if and only if i = j and 0 otherwise. Recall the reproducing property ofthe RKHS structure, i.e. h K ( · , x ) z, v i V = z ⊗ δ x ( v ) for v ∈ V , x ∈ Ω and z ∈ R d .Zhou [35] shows that this property holds not only for the kernel but also for its partialderivatives. Letting D αx v denote the derivative of v ∈ V at x ∈ Ω with respect to themulti-index α , D αx v = ∂ | α | ∂ α x . . . ∂ α d x d v ( x )and defining ( D αx Kz )( y ) = D αx ( K ( · , y ) z ) for z ∈ R d , Zhou proves that D αx Kz ∈ V and that the partial derivative reproducing property h D αx Kz, v i V = z T D αx ( v ) (4.1)holds when the maps in V are sufficiently smooth for the derivatives to exist. Wedenote the maps z ⊗ D αx : V → R defined by z ⊗ D αx ( v ) := z T D αx v higher-orderDiracs , and we say that the momentum distribution is of higher order if it is a sumof higher-order Diracs. It follows that z ⊗ D αx = (cid:0) v
7→ h D αx Kz, v i V (cid:1) ∈ V ∗ . As a consequence, we can connect higher-order momenta and partial derivatives D αx K of the kernel. Recall that the momentum map L : V → V ∗ satisfies h Lv, w i L = h v, w i V . With the higher-order momenta, h LD αx Kz, v i L = h D αx Kz, v i V = z ⊗ D αx ( v ) = h z ⊗ D αx , v i L . Thus LD αx Kz = z ⊗ D αx or, shorter, LD αx K = D αx . That is, partial derivatives of thekernel and higher-order momenta corresponds just as the kernels and Diracs measuresin the usual RKHS sense.Consider a map on diffeomorphisms U : G V → R e.g. an image similarity measuredependent on ϕ . In a finite dimensional setting with N evaluation points x k , U woulddecompose as U ( ϕ ) = P ◦ Q ( ϕ ) with Q ( ϕ ) = ( ϕ ( x ) , . . . , ϕ ( x N )) and P : R dN → R . Introducing higher-order momenta, we let Q ( ϕ ) = ( D α x ( ϕ ) , . . . , D α J x N ( ϕ )) with J multi-indices α j , and decompose U as U ( ϕ ) = P ◦ Q ( ϕ ) with P : R dNJ → R . Weallow α j to be empty and hence incorporate the standard zeroth order case. Thepartial derivative reproducing property now lets us compute the V -gradient of U asa sum of partial derivatives of the kernel. Proposition 4.1.
Let ∇ kj P denote the gradient with respect to the variableindexed by D α j x k ( ϕ ) in the expression for Q . Then the gradient ∇ ϕ U ∈ V of U withrespect to the inner product in V is given by ∇ ϕ U = P Nk =1 P Jj =1 D α j x k K ∇ kjQ ( ϕ ) P . Proof . The gradient ∇ ϕ U at ϕ is defined by h∇ ϕ U, v i = ∂ ǫ U ( ǫv + ϕ ) for allvariations v ∈ V . For such v , we get using (4.1) that ∂ ǫ U ( ǫv + ϕ ) = ∂ ǫ P ◦ Q ( ǫv + ϕ ) = ∂ ǫ P ( D α j x k ( ǫv + ϕ )) = ∂ ǫ P ( ǫD α j x k v + D α j x k ϕ )= N X k =1 J X j =1 ( ∇ kjQ ( ϕ ) P ) T D α j x k v = * N X k =1 J X j =1 D α j x k ∇ kjQ ( ϕ ) P, v + V . As a result of Proposition 4.1, the momentumof the gradient of U is L ∇ ϕ U = P Nk =1 P Jj =1 ∇ kjQ ( ϕ ) P ⊗ D α j x k . In general, if v ∈ V isrepresented by a sum of higher-order momenta, the energy k v k V can be computedusing (4.1) as a sum of partial derivatives of the kernel evaluated at the points x k .To keep the notation brief, we restrict to sums of zeroth and first-order momenta inthe following. If v ( · ) = P Nk =1 (cid:0) K ( x k , · ) z k + P dj =1 D j K ( x k , · ) z jk (cid:1) , we get the energy k v k V = * N X k =1 (cid:0) K ( x k , · ) z k + d X j =1 D j K ( x k , · ) z jk (cid:1) , N X k =1 (cid:0) K ( x k , · ) z k + d X j =1 D j K ( x k , · ) z jk (cid:1)+ V = N X k,l =1 h K ( x l , · ) z l , K ( x k , · ) z k i V + N X k,l =1 d X j,i =1 D D j K ( x l , · ) z jl , D i K ( x k , · ) z ik E V + 2 N X k,l =1 d X j =1 D D j K ( x l , · ) z jl , K ( x k , · ) z k E V = N X k,l =1 (cid:0) z Tl K ( x l , x k ) z k + d X j,i =1 z i,Tk D i D j K ( x l , x k ) z jl + 2 d X j =1 z Tk D j K ( x l , x k ) z jl (cid:1) (4.2)with D jq K ( · , · ) denoting differentiation with the respect to the q th variable, q = 1 , j th coordinate, j = 1 , . . . , d . For scalar symmetric kernels, this expression reducesto k v k V = N X k,l =1 (cid:0) z Tl K ( x l , x k ) z k + d X j,i =1 (cid:0) D ∇ K ( x l , x k ) (cid:1) ij z i,Tk z jl + 2 d X j =1 ( ∇ K ( x l , x l )) j z Tk z jl (cid:1) . It is important to note that higher-order momentaoffer a convenient representation for the gradients of maps U incorporating derivativeinformation but since the partial derivatives of kernels are members of V and thehigher order momentum in the dual V ∗ , the analytical of structure of LDDMM is notchanged. In particular, the adjoint form of the EPDiff equations, i.e. that optimalpaths v t satisfy v t = Ad Tϕ vt v with v = − ∇ ϕ v U , is still valid. The momentum ρ = Lv is transported to the momentum ρ t by Ad ∗ ϕ vt p . Because( ρ t | w ) = ( ρ | Ad ϕ vt ( w )) = ( ρ | ( Dϕ vt w ) ◦ ( ϕ vt ) − ) , ρ is a sum of higher-order Diracs, ρ t will be sum of higher-order Diracs for all t .However, since the time evolution of ( ρ t | w ) with the above relation involves derivativesof Dϕ vt , this form is inconvenient for computing ρ t . Instead, we make use of theHamiltonian form of the EPDiff equations [34, P. 265]. Here, the momentum ρ t is pulled back to ρ but with a coordinate change of the evaluation vector field:the Hamiltonian form µ t is defined by (cid:0) µ t (cid:12)(cid:12) w (cid:1) := (cid:0) ρ (cid:12)(cid:12) ( Dϕ v t ) − ( y ) w ( y ) (cid:1) y where thesubscript stresses that ( Dϕ v t ) − ( y ) w ( y ) is evaluated as a y -dependent vector field.To simplify the notation, we write just ϕ t instead of ϕ v t . Using this notation, theevolution equations become ∂ t ϕ t ( y ) = d X j =1 (cid:0) µ t (cid:12)(cid:12) K j ( ϕ t ( x ) , ϕ t ( y )) (cid:1) x e j (cid:0) ∂ t µ t (cid:12)(cid:12) w (cid:1) = − d X j =1 (cid:0) µ t (cid:12)(cid:12)(cid:0) µ t (cid:12)(cid:12) D K j ( ϕ t ( x ) , ϕ t ( y )) w ( y ) (cid:1) x e j (cid:1) y . (4.3)The system forms an ordinary differential equation describing the evolution of thepath and momentum [34] when ( ρ | w ) does not involve derivatives of w , e.g. when ρ and hence ρ t is a vector field z t and the first equation therefore is an integral ∂ t ϕ t ( y ) = Z Ω K ( ϕ t ( y ) , ϕ t ( x )) z t ( x ) dx . For the higher-order case, we will need to incorporate additional information in thesystem.Again we restrict to finite sums of zeroth and first-order point measures, and wetherefore work with initial momenta on the form ρ = N X k =1 z ,k ⊗ δ x ,k + N X k =1 d X j =1 z j ,k ⊗ D j δ x ,k (4.4)with x t,r as usual denoting the point positions ϕ t ( x i ) at time t . Then (cid:0) µ t (cid:12)(cid:12) w (cid:1) = (cid:0) ρ (cid:12)(cid:12) Dϕ t ( y ) − w ( y ) (cid:1) y = Z Ω (cid:16) N X k =1 z ,k ⊗ δ x ,k + N X k =1 d X j =1 z j ,k ⊗ D j δ x ,k (cid:17) Dϕ t ( y ) − w ( y ) dy = N X k =1 (cid:0)(cid:0) Dϕ t ( x ,k ) − ,T z ,k + d X j =1 (cid:0) D j Dϕ t ( x ,k ) − (cid:1) T z j ,k (cid:1) ⊗ δ x ,k (cid:12)(cid:12) w (cid:1) + N X k =1 d X j =1 (cid:0) Dϕ t ( x ,k ) − ,T z j ,k ⊗ D j δ x ,k (cid:12)(cid:12) w (cid:1) showing that µ t = P Nk =1 µ t,k ⊗ δ x ,k + P Nk =1 P dj =1 µ jt,k ⊗ D j δ x ,k with µ t,k = Dϕ t ( x ,k ) − ,T z ,k + d X j =1 (cid:0) D j Dϕ t ( x ,k ) − (cid:1) T z j ,k µ jt,k = Dϕ t ( x ,k ) − ,T z j ,k . (4.5)4The momentum ρ t can the be recovered as (cid:0) ρ t (cid:12)(cid:12) w (cid:1) = (cid:0) µ t (cid:12)(cid:12) w ◦ ϕ t (cid:1) = (cid:0) N X k =1 µ t,k ⊗ δ x ,k + N X k =1 d X j =1 µ jt,k ⊗ D j δ x ,k (cid:1) w ◦ ϕ t = N X k =1 µ t,k ⊗ δ x t,k w + N X k =1 d X j =1 µ j,Tt,k Dw ( D j ϕ t )( x ,k )= N X k =1 µ t,k ⊗ δ x t,k w + N X k =1 d X j =1 (cid:0) d X i =1 ( D i ϕ t )( x ,k ) j µ it,k (cid:1) ⊗ D j δ x t,k w and hence the coefficients of the momentum z t,k and z jt,k (confer (4.4)) are given by z t,k = µ t,k and z jt,k = P di =1 ( D i ϕ t )( x ,k ) j µ it,k . We note that both z jt,k and µ jt,k arecoordinate vectors of the first-order parts of the momentum in ordinary and Hamil-tonian form respectively. For each point k and time t , these coordinate vectors thusrepresent two d × d tensors. Even though µ t,k in (4.5) depend on the second orderderivative of ϕ , we will show that the complete evolution in the zeroth and first-ordercase can be determined by solving for the points ϕ t ( x k, ), the matrices Dϕ t ( x k, ),and the vectors µ t,k . This will provide the computational representation we will usewhen implementing the systems.Using (4.3), ϕ t evolves according to ∂ t ϕ t ( y ) = d X i =1 Z Ω N X k =1 (cid:0) µ Tt,k ⊗ δ x ,k + d X j =1 µ jt,k ⊗ D j δ x ,k (cid:1) K i ( ϕ t ( x ) , ϕ t ( y )) dx e i = d X i =1 N X k =1 (cid:0) µ Tt,k K i ( ϕ t ( x ,k ) , ϕ t ( y )) + d X j =1 µ j,Tt,k D K i ( ϕ t ( x ,k ) , ϕ t ( y )) D j ϕ t ( x ,k ) (cid:1) e i . With scalar kernels, the trajectories x t,k are given by ∂ t ϕ t ( x ,k ) = N X l =1 (cid:0) K ( ϕ t ( x ,l ) , ϕ t ( x ,k )) µ t,l + d X j =1 ∇ K ( ϕ t ( x ,l ) , ϕ t ( x ,k )) T D j ϕ t ( x ,l ) µ jt,l (cid:1) . It is shown in [34] that the evolution of the matrix Dϕ t ( x k, ) is governed by ∂ t Dϕ t ( y ) a = d X i =1 (cid:0) µ t (cid:12)(cid:12) D K i ( ϕ t ( x ) , ϕ t ( y )) Dϕ t ( y ) a (cid:1) x e i . Inserting the Hamiltonian form of the higher-order momentum, each component ( r, c )5( r ow/ c olumn) of the matrix Dϕ t ( y ) thus evolves according to ∂ t Dϕ t ( y ) cr = (cid:0) µ t (cid:12)(cid:12) D K r ( ϕ t ( x ) , ϕ t ( y )) Dϕ t ( y ) e c (cid:1) x = Z Ω N X k =1 (cid:0) µ t,k ⊗ δ x ,k + d X j =1 µ jt,k ⊗ D j δ x ,k (cid:1) D K r ( ϕ t ( x ) , ϕ t ( y )) Dϕ t ( y ) e c dx = N X k =1 µ Tt,k D K r ( ϕ t ( x ,k ) , ϕ t ( y )) Dϕ t ( y ) e c + N X k =1 d X j =1 µ j,Tt,k (cid:0) d X i =1 (cid:0) D i D K r ( ϕ t ( x ,k ) , ϕ t ( y )) (cid:1)(cid:0) D j ϕ t ( x ,k ) (cid:1) i (cid:1) Dϕ t ( y ) e c . With scalar kernels, the evolution at the trajectories is then ∂ t Dϕ t ( x ,k ) c = N X l =1 (cid:16) ∇ K ( ϕ t ( x ,l ) , ϕ t ( x ,k )) T D c ϕ t ( x ,k ) µ t,l + d X j =1 (cid:0) D ∇ K ( ϕ t ( x ,l ) , ϕ t ( x ,k )) D j ϕ t ( x ,l ) (cid:1) T D c ϕ t ( x ,k ) µ jt,l (cid:17) . The complete derivation of the evolution of µ t is notationally heavy and can befound in the supplementary material for the paper. Combining the evolution of µ t with the expressions above, we arrive at the following result: Proposition 4.2.
The EPDiff equations in the scalar case with zeroth and first-order momenta are given in Hamiltonian form by the system ∂ t ϕ t ( x ,k ) = N X l =1 (cid:0) K ( x t,l , x t,k ) µ t,l + d X j =1 ∇ K ( x t,l , x t,k ) T D j ϕ t ( x ,l ) µ jt,l (cid:1) ∂ t Dϕ t ( x ,k ) c = N X l =1 (cid:16) ∇ K ( x t,l , x t,k ) T D c ϕ t ( x ,k ) µ t,l + d X j =1 (cid:0) D ∇ K ( x t,l , x t,k ) D j ϕ t ( x ,l ) (cid:1) T D c ϕ t ( x ,k ) µ jt,l (cid:17) ∂ t µ t,k = − N X l =1 (cid:16)(cid:0) µ Tt,k µ t,l (cid:1) ∇ K ( x t,l , x t,k )+ d X j =1 (cid:0) µ j,Tt,k µ t,l (cid:1) D ∇ K ( x t,l , x t,k ) D j ϕ t ( x ,k )+ d X j =1 (cid:0) µ Tt,k µ jt,l (cid:1) D ∇ K ( x t,l , x t,k ) D j ϕ t ( x ,l )+ d X j,j ′ =1 (cid:0) µ j ′ ,Tt,k µ jt,l (cid:1) D (cid:0) D ∇ K ( x t,l , x t,k ) D j ϕ t ( x ,l ) (cid:1) D j ′ ϕ t ( x ,k ) (cid:17) µ jt,k = Dϕ t ( x ,k ) − ,T z j ,k . (4.6)Note that both x ,k = ϕ v ( x ,k ) and Dϕ v ( x ,k ) are provided by the systemand hence can be used to evaluate a similarity measure that incorporates first-order6information. As in the zeroth order case, the entire evolution can be recovered by theinitial conditions for the momentum.
5. Locally Affine Transformations.
The Polyaffine and Log-Euclidean Poly-affine [3, 1] frameworks model locally affine transformations using matrix logarithms.The higher-order momenta and partial derivatives of kernels can be seen as the LD-DMM sibling of the Polyaffine methods, and diffeomorphism paths generated byhigher-order momenta, in particular, momenta of zeroth and first-order, can locallyapproximate all affine transformations with linear component having positive deter-minant. The approximation will depend only on how fast the kernel approaches zerotowards infinity. The manifold structure of G V provides this result immediately. In-deed, let ϕ ( x ) = Ax + b be an affine transformation with det( A ) >
0. We define apath ϕ t of finite energy such that ϕ ≈ ϕ which shows that ϕ ∈ G V and can bereached in the framework. The matrices of positive determinant is path connectedso we can let ψ t be a path from Id d to A and define ˜ ψ t ( x ) = ψ t x + bt . Then with˜ v t ( x ) = ( ∂ t ψ t ) ˜ ψ − t ( x ) + b , we have ∂ t ˜ ψ t ( x ) = ( ∂ t ψ t ) x + b = ˜ v t ◦ ˜ ψ t ( x ) and x + Z ˜ v t ◦ ˜ ψ t ( x ) dt = x + Z ( ∂ t ψ t ) x + bdt = ϕ ( x ) . Now use that ( ∂ t ψ t ) ˜ ψ − t ( x ) = ( ∂ t ψ t )( ψ t ) − ( x − bt ) and let the M t = ( m ,t . . . m d,t ) bethe t -dependent matrix ( ∂ t ψ t )( ψ t ) − so that the first term of ˜ v t ( x ) equals M t ( x − bt ).Then choose a radial kernel, e.g. a Gaussian K σ , and define the approximation v t of˜ v t by v t ( x ) = d X j =1 D j ˜ ψ t (0) K σ ( x ) m j,t + K σ ( ˜ ψ t (0) , x ) b . (5.1)The path ϕ v generated by v t then has finite energy, and ϕ v ( x ) = x + Z v t ◦ ϕ v t ( x ) dt ≈ ϕ ( x )with the approximation depending only on the kernel scale σ . Note that the affinetransformations with linear components having negative determinant can in a similarway be reached by starting the integration at a diffeomorphism with negative Jacobiandeterminant.In the experiments section, we will illustrate the locally affine transformationsencoded by zeroth and first-order momenta, and, therefore, it will be useful to intro-duce a notation for these momenta. We encode the translational part of either themomentum or velocity using the notationTsl x ( b ) = K σ ( x, · ) b and the linear part by Lin x ( M ) = d X j =1 D jx K σ ( · ) m j with m , m j being the columns of the matrix M . Equation (5.1) can then be written v t ( x ) = Lin ˜ ψ t (0) ( M t ) + Tsl ˜ ψ t (0) ( b ) . (5.2)7We emphasize that though we mainly focus on zeroth and first-order momenta, themathematical construction allows any order momenta permitted by the smoothnessof the kernel at order zero.
6. Variations of the Initial Conditions.
In Algorithm 1, we used the varia-tion of the EPDiff equations when varying the initial conditions and in particular thebackwards gradient transport. We discuss both issues here.A variation δρ of the initial momentum will induce a variation of the system(4.6). By differentiating that system, we get the time evolution of the variation. Toease notation, we assume the kernel is scalar on the form K ( x, y ) = γ ( | x − y | ) andwrite γ t,lk = K ( x t,l , x t,k ). Variations of the kernel and kernel derivatives such as theentity δ ∇ K ( x t,l , x t,k ) below depend only on the variation of point trajectories δx t,l and δx t,k . The full expressions for these parts are provided in supplementary materialfor the paper. The variation of the point trajectories in the derived system then takesthe form ∂ t δϕ t ( x ,k ) = N X l =1 (cid:0) δK ( x t,l , x t,k ) µ t,l + γ t,lk δµ t,l (cid:1) + N X l =1 d X j =1 (cid:0) δ ∇ K ( x t,l , x t,k ) T D j ϕ t ( x ,l ) µ jt,l + ∇ K ( x t,l , x t,k ) T δD j ϕ t ( x ,l ) µ jt,l + ∇ K ( x t,l , x t,k ) T D j ϕ t ( x ,l ) δµ jt,l (cid:1) The similar expressions for the evolution of δµ t,k and δDϕ t ( x ,k ) are provided in thesupplementary material. The variation of µ jt,k is available as δµ jt,k = − (cid:0) Dϕ t ( x ,k ) − δDϕ t ( x ,k ) Dϕ t ( x ,k ) − (cid:1) T z j ,k + Dϕ t ( x ,k ) − ,T δz j ,k . However, when computing the backwards transport, we will need to remove the de-pendency on δz j ,k which is only available for forward integration. Instead, by writingthe evolution of µ jt,k in the form ∂ t µ jt,k = ∂ t Dϕ t ( x ,k ) − ,T z j ,k = − (cid:0) Dϕ t ( x ,k ) − ∂ t Dϕ t ( x ,k ) Dϕ t ( x ,k ) − (cid:1) T z j ,k = − Dϕ t ( x ,k ) − ,T ∂ t Dϕ t ( x ,k ) T µ jt,k , we get the variation ∂ t δµ jt,k = − δDϕ t ( x ,k ) − ,T ∂ t Dϕ t ( x ,k ) T µ jt,k − Dϕ t ( x ,k ) − ,T ∂ t δDϕ t ( x ,k ) T µ jt,k − Dϕ t ( x ,k ) − ,T ∂ t Dϕ t ( x ,k ) T δµ jt,k . The correspondence between initial momentum ρ and end diffeomorphism ϕ v asserted by the EPDiff equations allows us to view thesimilarity measure U ( ϕ v ) as a function of ρ . Let A denote the result of integratingthe system for the variation of the initial conditions from t = 0 to t = 1 such that w = Aδρ ∈ V for a variation δρ . We then get a corresponding variation δU in thesimilarity measure. To compute the gradient of U as a function of ρ , we have δU ( ϕ v ) = (cid:10) ∇ ϕ v U, w (cid:11) V = (cid:10) ∇ ϕ v U, Aδρ (cid:11) V = (cid:10) A T ∇ ϕ v U, δρ (cid:11) V ∗ . The subscript notation is used in accordance with [34]. Please note that γ t,lk contains three separate indices, i.e. the time t and the point indices l and k . V ∗ -gradient of ∇ ρ U is given by A T ∇ ϕ v U . The gradient can equivalentlybe computed in momentum space at both endpoints of the diffeomorphism path usingthe map P defined in Proposition 4.1.The complete system for the variation of the initial conditions is a linear ODE,and, therefore, there exists a time-dependent matrix M t such that the ODE ∂ t y t = M t y t has the variation as a solution y t . It is shown in [34] that, in such cases, solving thebackwards transpose system ∂ t w t = − M Tt w t (6.1)from t = 1 to t = 0 provides the value of A T w . Therefore, we can obtain ∇ ρ U bysolving the transpose system backwards. The components of M t can be identifiedby writing the evolution equations for the variation in matrix form. This provides M Tt and allows the backwards integration of the system 6.1. The components of thetranspose matrix M t are provided in the supplementary material for the paper.
7. Experiments.
In order to demonstrate the efficiency, compactness, and in-terpretability of representations using higher-order momenta, we perform four sets ofexperiments. First, we provide four examples illustrating the type of deformationsproduced by zeroth and first-order momenta and the relation to the Polyaffine frame-work. We then use point based matching using first-order information to show howcomplicated warps that would require many parameters with zeroth order deforma-tion atoms can be generated with very compact representations using higher-ordermomenta. We underline the point that higher-order momenta allow low-dimensionaltransformations to be registered using correspondingly low-dimensional representa-tions: we show how synthetic test images generated by a low-dimensional transfor-mation can be registered using only one deformation atom when representing usingfirst-order momenta and using the first-order similarity measure approximation (3.1).We further emphasize this point by registering articulated movement using only onedeformation atom per rigid part, and thus exemplify a natural representation thatreduces the number of deformation atoms and the ambiguity in the placement of theatoms while also reducing the degrees of freedom in the representation. Finally, weillustrate how higher-order momenta in a natural way allow registration of humanbrains with progressing atrophy. We describe the deformation field throughout theventricles using few deformation atoms, and we thereby suggest a method for detectinganatomical change using few degrees of freedom. In addition, the volume expansioncan be directly interpreted from the parameters of the deformation atoms. We startby briefly describing the similarity measures used throughout the experiments.For the point examples below, we register moving points x , . . . , x N against fixedpoints y , . . . , y N . In addition, we match first-order information by specifying valuesof D jx k ϕ . This is done compactly by providing matrices Y k so that we seek D x k ϕ = Y k for all k = 1 , . . . , N . The similarity measure is simple sum of squares, i.e. U ( ϕ ) = N X i =1 k ϕ ( x k ) − y k k + k D x k ϕ − Y k k using the matrix 2-norm. This amounts to fitting ϕ against a locally affine map withtranslational components y k and linear components Y k . For the image cases, we use L -similarity to build the first-order approximation (3.1) with the smoothing kernel K s being Gaussian of the same scale as the LDDMM kernel.9 (a) Expansion (b) Contraction(c) Rotation ( − π/
2) (d) Two rotations ( π/ Fig. 7.1 . The effect of the generated deformation on an initially square grid for several initialfirst-order momenta: Using the notation of section 5, (a) expansion ρ = Lin (Id ) ; (b) contraction ρ = Lin ( − Id ) ; (c) rotation ρ = Lin (Rot( v )) , v = − π/ ; (d) two rotations v = π/ . The kernelis Gaussian with σ = 8 in grid units, and the grids are colored with the trace of Cauchy-Green straintensor (log-scale). Notice the locality of the deformation caused by the finite scale of the kernel, andthat the deformation stays diffeomorphic even when two rotations force conflicting movements. To visually illustrate the deformation gener-ated by higher-order momenta, we show in Figure 7.1 the generated deformationson an initially square grid with four different first-order initial momenta. The de-formation locally model the linear part of affine transformations and the the localityis determined by the Gaussian kernel that in the examples has scale σ = 8 in gridunits. Notice for the rotations that the deformation stays diffeomorphic in the pres-ence of conflicting forces. The similarity between the examples and the deformationsgenerated in the Polyaffine framework [1] underlines the viewpoint that the registra-tion using higher-order momenta constitutes the LDDMM sibling of the Polyaffineframework. Figure 7.2 presents simple point basedmatching results with first-order information. The lower points (red) are matchedagainst the upper points (black) with match against expansion D ϕ ( x k ) = 2Id androtation D ϕ ( x k ) = Rot( v ) = (cid:18) cos( v ) , sin( v ) − sin( v ) , cos( v ) (cid:19) for v = ∓ π/
2. The optimal dif-feomorphisms exhibit the expected expanding and turning effect, respectively. Westress that the deformations are generated using only two deformations atoms with0 (a) Match with dilations (expansion) (b) Match with rotations ( − π/ π/ Fig. 7.2 . Two moving points (red) are matched against two fixed points (black) with results(green) and with match against (a) expansion D ϕ ( x k ) = 2Id , i = 1 , ; and (b) rotation D ϕ ( x k ) =Rot( v ) , v = ∓ π/ , i = 1 , . The kernel is Gaussian with σ = 8 in grid units, and the grids arecolored with the trace of Cauchy-Green strain tensor (log-scale). combined 12 parameters. Representing equivalent deformation using zeroth ordermomenta would require a significantly increased number of atoms and a correspondincrease in the number of parameters. We now exemplify how higher-order momenta allow low-dimensional transformations to be registered using corre-spondingly low-dimensional representations. We generate two test images by applyingtwo linear transformations, an dilation and a rotation, to a binary image of a square,confer the moving images (a) and (e) in Figure 7.3. By placing one deformation atomin the center of each fixed image and by using the similarity measure approxima-tion (3.1), we can successfully register the moving and fixed images. The result anddifference plots are shown in Figure 7.3. The dimensionality of the linear transfor-mations generating the moving images is equal to the number of parameters for thedeformation atom. A registration using zeroth order momenta would need more thanone deformation atom which would result in a number of parameters larger than thedimensionality. The scale of the Gaussian kernel used for the registration is 50 pixels.
The articulated motion of the finger in Figure 7.4(a) and (b) can be described by three locally linear transformations. With higher-order momenta, we can place deformation atoms at the center of the bones in themoving and fixed images, and use the point positions together with the direction ofthe bones to drive a registration. This natural and low dimensional representationallows a fairly good match of the images resembling the use of the Polyaffine affineframework for articulated registration [26]. A similar registration using zeroth ordermomenta would need two deformation atoms per bone and lacking a natural wayto place such atoms, the positions would need to be optimized. With higher-ordermomenta, the deformation atoms can be placed in a natural and consistent way, and X-ray frames from
10 20 30 40 50 60 70 801020304050607080 (a) Moving image
10 20 30 40 50 60 70 801020304050607080 (b) Fixed image
10 20 30 40 50 60 70 801020304050607080 (c) Registration result
10 20 30 40 50 60 70 801020304050607080 (d) Difference
10 20 30 40 50 60 70 801020304050607080 (e) Moving image
10 20 30 40 50 60 70 801020304050607080 (f) Fixed image
10 20 30 40 50 60 70 801020304050607080 (g) Registration result
10 20 30 40 50 60 70 801020304050607080 (h) Difference
Fig. 7.3 . With linear transformations, the dimensionality of the higher-order representationmatches the dimensionality of the transformation. A dilation (e) and rotation (d) is applied to thefixed binary images (b) and (f), respectively. The registration results (c) and (g) subtracted fromthe fixed images are shown in the difference pictures (d) and (h). The registration is performedwith a single first-order momenta in the center of the pictures, and the number of parameters forthe registration thus matches the dimensionality of the linear representations. The slight differencesbetween results and fixed images are caused by the first-order approximation in (3.1) . Increasing thekernel size, adding more control points, or using second order momenta would imply less difference.
20 40 60 80 100 120102030405060708090100 (a) Moving
20 40 60 80 100 120102030405060708090100 (b) Fixed
20 40 60 80 100 120102030405060708090100 (c) Result, first-order
Fig. 7.4 . Registering articulated movement using directional information of the bones: thelandmarks and bone orientations (red points and arrows) in the moving image (a) are matchedagainst the landmarks and bone orientations (green points and arrows) in the fixed image (b). Theresult using first-order momenta (c) can be obtained with a low number of deformation atoms thatcan be consistently placed at the center of the bones. A corresponding zeroth order representationwould use a higher number of atoms with a corresponding increase in the number of parameters. the total number of free parameters is lower than a zeroth order representation usingtwo atoms per bone.
Atrophy occurs in the human brain among patientssuffering from Alzheimer’s disease, and the progressing atrophy can be detected bythe expansion of the ventricles [19, 13]. Since first-order momenta offer compactdescription of expansion, this makes a parametrization of the registration based onhigher-order momenta suited for describing the expansion of the ventricles, and, in2addition, the deformation represented by the momenta will be easily interpretable. Inthis experiment, we therefore suggest a registration method that using few degrees offreedom describes the expansion of the ventricles, and does so in a way that can beinterpreted when doing further analysis of e.g. the volume change.We use the publicly available Oasis dataset [20], and, in order to illustrate theuse of higher-order momenta, we select a small number of patients from which twobaseline scans are acquired at the same day together with a later follow up scan. Thepatients are in various stages of dementia, and, for each patient, we rigidly registerthe two baseline and one follow up scan [9].The expanding ventricles can be registered by placing deformation atoms in thecenter of the ventricles of the fixed image as shown in Figure 1.1. For each patient,we manually place five deformation atoms in the ventricle area of the first baseline 3Dvolume. It is important to note that though we localize the description of the defor-mation to the deformation atoms, the atoms control the deformation field throughoutthe ventricle area. Based on the size of the ventricles, we use 3D Gaussian kernelswith a scale of 15 voxels, and we let the regularization weight in (2.1) be λ = 16.The effect of these choices is discussed below. Each deformation atom consists of azeroth and first-order momenta. We use L similarity to drive the registration [9] and, for each patient, we perform two registrations: we register the two baseline scansacquired at the same day, and we register one baseline scan against the follow up scan.Thus, the baseline-baseline registration should indicate no ventricle expansion, and weexpect the baseline-follow up registration to indicate ventricle expansion. Figure 1.1shows for one patient the placement of the control points in the baseline image, thefollow up image, the log-Jacobian determinant in the ventricle area of the generateddeformation, and the initial vector field driving the registration.The use of first-order momenta allows us to interpret the result of the registrationsand to relate the results to possible expansion of the ventricles. The volume change isindicated by the Jacobian determinant of the generated deformation at the deforma-tion atoms as well as by the divergence of the first-order momenta. The latter is avail-able directly from the registration parameters. We plot in Figure 7.5 the logarithm ofthe Jacobian determinant and the divergence for both the same day baseline-baselineregistrations and for the baseline-follow up registrations. Patient 1 − See also http://image.diku.dk/darkner/LOI . a v e r age l og − J a c ob i an de t e r m i nan t non−dementeddementedbaseline−baselinebaseline−follow up (a) The average log-Jacobian of the final de-formation at the 5 deformation atoms for thebaseline-baseline and baseline-follow up regis-trations a v e r age de f o r m a t i on a t o m d i v e r gen c e non−dementeddementedbaseline−baselinebaseline−follow up (b) The average divergence at the deformationatoms for the baseline-baseline and baseline-follow up registrations Fig. 7.5 . Indicated volume change: (a) The average log -Jacobian determinant of the generateddeformation at the 5 deformation atoms for six patients (1-4 demented, 5-6 non-demented); (b)divergence of the 5 higher-order momenta representing the deformation. The divergence can beextracted directly from the parameters of the first-order momenta, and the correlation between the log -Jacobian and the divergence as seen by the similarity between (a) and (b) therefore shows theinterpretability of the deformation atoms. The time-span between baseline and follow up scans are1.5-2 years with the exception of 3 years for patient four (arrows). term in (2.1) will affect the amount of expansion captured in the registration. Becauseof the low number of control points, we can in practice set the contribution of theregularization term to zero without experiencing non-diffeomorphic results. It will beinteresting in the future to estimate the actual volume expansion directly using theparameters of the deformation atoms with this less biased model.
8. Conclusion and Outlook.
We have introduced higher-order momenta in theLDDMM registration framework. The momenta allow compact representation of lo-cally affine transformations by increasing the capacity of the deformation description.Coupled with similarity measures incorporating first-order information, the higher-order momenta improve the range of deformations reached by sparsely discretizedLDDMM methods, and they allow direct capture of first-order information such asexpansion and contraction. In addition, the constitute deformation atoms for whichthe generated deformation is directly interpretable.We have shown how the partial derivative reproducing property implies singularmomentum for the higher-order momenta, and we used this to derive the EPDiffevolution equations. By computing the forward and backward variational equations,we are able to transport gradient information and derive a matching algorithm. Weprovide examples showing typical deformation coded by first-order momenta and howimages can be registered using a very few parameters, and we have applied the methodto register human brains with progressing atrophy.The experiments included here show only a first step in the application of higher-order momenta: the representation may be applied to register entire images; mergingthe method with multi-scale approaches will increase the description capacity and maylead to further reduction in the dimensionality of the representation. Combined withefficient implementations, higher-order momenta promise to provide a step forward incompact deformation description for image registration.4
REFERENCES[1]
Vincent Arsigny, Olivier Commowick, Nicholas Ayache, and Xavier Pennec , A fast andLog-Euclidean polyaffine framework for locally linear registration , J. Math. Imaging Vis.,33 (2009), pp. 222–238.[2]
Vincent Arsigny, Olivier Commowick, Xavier Pennec, and Nicholas Ayache , A Log-Euclidean framework for statistics on diffeomorphisms , in MICCAI 2006, 2006, pp. 924–931.[3]
Vincent Arsigny, Xavier Pennec, and Nicholas Ayache , Polyrigid and polyaffine trans-formations: A novel geometrical tool to deal with non-rigid deformations – application tothe registration of histological slices , Medical Image Analysis, 9 (2005), pp. 507–523.[4]
M. Faisal Beg, Michael I. Miller, Alain Trouv´e, and Laurent Younes , Computing largedeformation metric mappings via geodesic flows of diffeomorphisms , IJCV, 61 (2005),pp. 139–157.[5]
F. L. Bookstein , Linear methods for nonlinear maps: Procrustes fits, thin-plate splines,and the biometric analysis of shape variability , in Brain warping, Academic Press, 1999,pp. 157–181.[6]
Yan Cao, M. I Miller, R. L Winslow, and L. Younes , Large deformation diffeomorphic met-ric mapping of vector fields , IEEE Transactions on Medical Imaging, 24 (2005), pp. 1216–1230.[7]
GE Christensen, RD Rabbitt, and MI Miller , Deformable templates using large deforma-tion kinematics , Image Processing, IEEE Transactions on, 5 (2002).[8]
Colin J Cotter and Darryl D Holm , Singular solutions, momentum maps and computa-tional anatomy , nlin/0605020, (2006).[9]
Sune Darkner and Jon Sporring , Generalized partial volume: An inferior density estima-tor to parzen windows for normalized mutual information , in Information Processing inMedical Imaging, vol. 6801, Springer, 2011, pp. 436–447.[10]
Paul Dupuis, Ulf Grenander, and Michael I Miller , Variational problems on flows ofdiffeomorphisms for image matching , (1998).[11]
Stanley Durrleman, Marcel Prastawa, Guido Gerig, and Sarang Joshi , Optimal data-driven sparse parameterization of diffeomorphisms for population analysis , Proc. of theInformation Processing in Medical Imaging Conference, IPMI, 22 (2011), pp. 123–134.[12]
Gregory E. Fasshauer and Qi Ye , Reproducing kernels of generalized sobolev spaces via agreen function approach with distributional operators , Numerische Mathematik, (2011).[13]
N. C. Fox, E. K. Warrington, P. A. Freeborough, P. Hartikainen, A. M. Kennedy,J. M. Stevens, and M. N. Rossor , Presymptomatic hippocampal atrophy in alzheimer’sdisease , Brain, 119 (1996), pp. 2001 –2007.[14]
Laurent Garcin , Thechniques de Mise en Correspondance et D´etection de Changements , PhDthesis, 2005.[15]
Joan Glaun`es , Transport par diff´eomorphismes de points, de mesures et de courants pour lacomparaison de formes et l’anatomie num´erique , PhD thesis, Universit´e Paris 13, Villeta-neuse, France, 2005.[16]
Ulf Grenander , General Pattern Theory: A Mathematical Study of Regular Structures , Ox-ford University Press, USA, Feb. 1994.[17]
Monica Hernandez, Matias Bossa, and Salvador Olmos , Registration of anatomical imagesusing paths of diffeomorphisms parameterized with stationary vector field flows , Interna-tional Journal of Computer Vision, 85 (2009), pp. 291–306.[18]
William M. Wells III, Paul Viola, Hideki Atsumi, Shin Nakajima, and Ron Kikinis , Multi-Modal volume registration by maximization of mutual information , Medical ImageAnalysis, 1 (1996), pp. 35 – 51.[19]
Clifford R. Jack, Ronald C. Petersen, Yue Cheng Xu, Stephen C. Waring, Peter C.O’Brien, Eric G. Tangalos, Glenn E. Smith, Robert J. Ivnik, and Emre Kokmen , Medial temporal atrophy on MRI in normal aging and very mild alzheimer’s disease , Neu-rology, 49 (1997), pp. 786 –794.[20]
Daniel S Marcus, Anthony F Fotenos, John G Csernansky, John C Morris, andRandy L Buckner , Open access series of imaging studies: longitudinal MRI data innondemented and demented older adults , Journal of Cognitive Neuroscience, 22 (2010),pp. 2677–2684. PMID: 19929323.[21]
X. Pennec, R. Stefanescu, V. Arsigny, P. Fillard, and N. Ayache , Riemannian elasticity:A statistical regularization framework for non-linear registration , in MICCAI 2005, 2005,pp. 943–950.[22]
J. P.W Pluim, J. B.A Maintz, and M. A Viergever , Image registration by maximization of combined mutual information and gradient information , IEEE Transactions on MedicalImaging, 19 (2000), pp. 809–814.[23] Alexis Roche, Gr´egoire Malandain, Xavier Pennec, and Nicholas Ayache , The correla-tion ratio as a new similarity measure for multimodal image registration , in Proceedings ofthe First International Conference on Medical Image Computing and Computer-AssistedIntervention, MICCAI ’98, Springer-Verlag, 1998, p. 1115–1124. ACM ID: 709612.[24]
Alexis Roche, Xavier Pennec, Gr´egoire Malandain, and Nicholas Ayache , Rigid reg-istration of 3D ultrasound with MR images: a new approach combining intensity andgradient information , IEEE Transactions on Medical Imaging, 20 (2001), pp. 1038—1049.[25]
D Rueckert, L I Sonoda, C Hayes, D L Hill, M O Leach, and D J Hawkes , Nonrigid regis-tration using free-form deformations: application to breast MR images , IEEE Transactionson Medical Imaging, 18 (1999), pp. 712–721. PMID: 10534053.[26]
Christof Seiler, Xavier Pennec, and Mauricio Reyes , Geometry-Aware multiscale imageregistration via OBBTree-Based polyaffine Log-Demons , in Medical Image Computing andComputer-Assisted Intervention - MICCAI, Toronto, Canada, 2011.[27]
Stefan Sommer, Mads Nielsen, Francois Lauze, and Xavier Pennec , A Multi-Scale kernelbundle for LDDMM: towards sparse deformation description across space and scales , inIPMI 2011, Springer, 2011.[28]
Stefan Sommer, M. Nielsen, and X. Pennec , Sparse multi-scale diffeomorphic registration:the kernel bundle framework , To appear in Journal of Mathematical Imaging and Vision.[29]
J.-P. Thirion , Image matching as a diffusion process: an analogy with maxwell’s demons ,Medical Image Analysis, 2 (1998), pp. 243–260.[30]
Alain Trouv´e , An infinite dimensional group approach for physics based models in patternsrecognition , 1995.[31]
M. Vaillant, M.I. Miller, L. Younes, and A. Trouv´e , Statistics on diffeomorphisms viatangent space representations , NeuroImage, 23 (2004), pp. S161–S169.[32]
Tom Vercauteren, Xavier Pennec, Aymeric Perchant, and Nicholas Ayache , Diffeo-morphic demons: efficient non-parametric image registration , NeuroImage, 45 (2009),pp. 61–72.[33]
Younes , Constrained diffeomorphic shape evolution , submitted to foundations Comp Math,(2011).[34]
Laurent Younes , Shapes and Diffeomorphisms , Springer, 2010.[35]
Ding-Xuan Zhou , Derivative reproducing properties for kernel methods in learning theory ,Journal of Computational and Applied Mathematics, 220 (2008), pp. 456–463.[36]