An inexact Newton-Krylov algorithm for constrained diffeomorphic image registration
ANDREAS MANG∗ AND GEORGE BIROS∗

Abstract.
We propose numerical algorithms for solving large deformation diffeomorphic image registration problems. We formulate the nonrigid image registration problem as a problem of optimal control. This leads to an infinite-dimensional partial differential equation (PDE) constrained optimization problem. The PDE constraint consists, in its simplest form, of a hyperbolic transport equation for the evolution of the image intensity. The control variable is the velocity field. Tikhonov regularization on the control ensures well-posedness. We consider standard smoothness regularization based on H¹- or H²-seminorms. We augment this regularization scheme with a constraint on the divergence of the velocity field (control variable), rendering the deformation incompressible (Stokes regularization scheme) and thus ensuring that the determinant of the deformation gradient is equal to one, up to the numerical error.

We use a Fourier pseudospectral discretization in space and a Chebyshev pseudospectral discretization in time. The latter allows us to reduce the number of unknowns and enables the time-adaptive inversion for nonstationary velocity fields. We use a preconditioned, globalized, matrix-free, inexact Newton-Krylov method for numerical optimization. A parameter continuation is designed to estimate an optimal regularization parameter. Regularity is ensured by controlling the geometric properties of the deformation field. Overall, we arrive at a black-box solver that exploits computational tools that are precisely tailored for solving the optimality system. We study spectral properties of the Hessian, grid convergence, numerical accuracy, computational efficiency, and deformation regularity of our scheme. We compare the designed Newton-Krylov methods with a globalized Picard method (preconditioned gradient descent).
We study the influence of a varying number of unknowns in time. The reported results demonstrate excellent numerical accuracy, guaranteed local deformation regularity, and computational efficiency, with an optional control on local mass conservation. The Newton-Krylov methods clearly outperform the Picard method if high accuracy of the inversion is required. Our method provides equally good results for stationary and nonstationary velocity fields for two-image registration problems.

Key words. large deformation diffeomorphic image registration, optimal control, PDE constrained optimization, Stokes regularization, Newton-Krylov method, pseudospectral Galerkin method, Stokes solver
1. Introduction and Motivation.
Image registration has become a key area of research in computer vision and (medical) image analysis [54, 65]. The task is to establish spatial correspondence between two images m_R : Ω̄ → R, x ↦ m_R(x), and m_T : Ω̄ → R, x ↦ m_T(x), with compact support on some domain Ω := (−π, π)^d ⊂ R^d, via a mapping y : R^d → R^d, x ↦ y(x), such that m_T ∘ y ≈ m_R [22, 27], where ∘ denotes function composition. Here, m_T is referred to as the template image (the image to be registered), m_R is referred to as the reference image (the fixed image), and d ∈ {2, 3} is the data dimensionality. We limit ourselves to nonrigid image registration. The search for y is typically formulated as a variational optimization problem [27, 54],

(1.1)    min_{y ∈ Y} { J[y] := (1/2) ||m_R − m_T ∘ y||²_{L²(Ω)} + β S[y] }.

The proximity between m_R and m_T ∘ y is measured on the basis of an L²-distance (other measures can be considered [54, 65]). The functional S in (1.1) is a regularization model that is introduced to overcome ill-posedness. The regularization parameter β > 0 controls the contribution of S. Various regularization models S have been considered (see [13, 14, 19, 22, 25, 26, 28, 42, 63] for examples).

A key requirement in many image registration problems is that the mapping y is a diffeomorphism [4, 14, 23, 68, 69, 70]. This translates into the necessary condition det(∇y) > 0, where ∇y ∈ R^{d×d} is the deformation gradient (also referred to as the Jacobian matrix). An intuitive approach is to explicitly safeguard against nondiffeomorphic mappings y by adding a constraint to (1.1) [39, 48, 61]. Another strategy is to perform the optimization on the manifold of diffeomorphic mappings [2, 3, 4, 44, 52, 53, 69, 70]. The latter models, in general, do not control geometric properties of the deformation field and may result in fields that are close to being nondiffeomorphic. Further, for certain image registration problems, restricting the search space to the manifold of diffeomorphisms does not necessarily guarantee that y is physically meaningful. Some applications may benefit from extending these types of models by introducing additional constraints. One example for such a constraint is incompressibility (i.e. enforcing det(∇y) = 1) or, more generally, a control on the deviation of det(∇y) from identity. This will be the topic of a follow-up paper, in which we will extend our formulation. Here, we focus on the algorithmic issues of incorporating the incompressibility constraint. Furthermore, we remark that our optimal control formulation can naturally be extended to account for more complex constraints on the velocity field, for example, constraints related to biophysical models (examples of such models can be found in [31, 40, 45, 51, 66]).

In what follows, we outline our method (see §1.1), summarize the key contributions (see §1.2), list the limitations of our method (see §1.3), and relate our work to existing approaches (see §1.4).

∗The Institute for Computational Engineering and Sciences, The University of Texas at Austin, Austin, Texas, 78712-0027, US ([email protected], [email protected])

1.1. Outline of the Method. We assume the images are compactly supported functions, defined on the open set Ω := (−π, π)^d ⊂ R^d with boundary ∂Ω and closure Ω̄ := Ω ∪ ∂Ω. The deformation is modeled in an Eulerian frame of reference.
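To make the mismatch term in (1.1) concrete, the sketch below evaluates the squared L²-distance between m_R and m_T ∘ y for a discrete map on a periodic 2D grid. This is an illustration only: the function name `ssd`, the linear interpolation used for the composition, and the normalization by the number of grid points are our choices here, not the discretization used in the paper (which is pseudospectral).

```python
import numpy as np
from scipy.ndimage import map_coordinates

def ssd(m_R, m_T, y):
    """0.5 * mean of (m_R - m_T o y)^2 on a periodic grid.

    m_R, m_T : 2D image arrays sampled on a regular grid over Omega.
    y        : array of shape (2, n, n), the map in grid (index) coordinates.
    """
    # compose m_T with y by linear interpolation; periodicity via mode="wrap"
    m_T_of_y = map_coordinates(m_T, y, order=1, mode="wrap")
    return 0.5 * np.mean((m_R - m_T_of_y) ** 2)

n = 32
x = np.linspace(-np.pi, np.pi, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")
m = np.sin(X) * np.cos(Y)
# identity map in index coordinates: the mismatch of an image with itself is zero
ident = np.stack(np.meshgrid(np.arange(n), np.arange(n), indexing="ij")).astype(float)
```

For the identity map, `ssd(m, m, ident)` vanishes; any map that actually moves intensities yields a positive mismatch.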
We introduce a pseudotime variable t > 0 and solve for a smooth velocity field v : Ω̄ × [0, 1] → R^d, (x, t) ↦ v(x, t), as follows:

(1.2a)    min_{v ∈ V} { J[v] := (1/2) ||m_R − m_1||²_{L²(Ω)} + ∫₀¹ S[v] dt  subject to  C[m, v] = 0 },

where

(1.2b)    C[m, v] := ( ∂_t m + ∇m · v   in Ω × (0, 1];
                       m − m_T          in Ω × {0};
                       γ(∇ · v)         in Ω × [0, 1] ),

with periodic boundary conditions on ∂Ω. The parameter γ ∈ {0, 1} in (1.2b) is introduced to enable or disable the constraint on the divergence of the control v. In PDE constrained optimization theory, m is referred to as the state variable and v as the control variable. The first equation in (1.2b), in combination with its initial condition (second equation), models the flow of m_T subject to v, where m : Ω̄ × [0, 1] → R, (x, t) ↦ m(x, t), represents the transported intensities of m_T. Accordingly, the final state m_1 := m(·, 1), m_1 : Ω̄ → R, x ↦ m_1(x), corresponds to m_T ∘ y in (1.1). We measure the proximity between the deformed template image m_1 and the reference image m_R in terms of an L²-distance. Once we have found v, we can compute y from v as a post-processing step (this is also true for the deformation gradient ∇y; see §D for details). The third equation in (1.2b) is a control on the divergence of v and guarantees that the flow is incompressible (Stokes flow), i.e. the volume is conserved. This is equivalent to enforcing det(∇y) = 1, up to numerical errors.

We use an H¹- or an H²-seminorm for the smoothness regularization S (resulting in Laplacian or biharmonic vector operators, respectively; see §3.1). We report results for standard H¹- and H²-regularization (neglecting ∇ · v = 0, i.e. γ = 0) as well as for the incompressible case (enforcing ∇ · v = 0, i.e. γ = 1).

The optimality conditions of (1.2) form a system of PDEs for the image m, the velocity v, and the adjoint variables for the transport and the divergence condition. Efficiently solving this system is a significant challenge. For example, when we include the incompressibility constraint, the equation for the velocity ends up being a linear Stokes equation. We solve the first-order optimality conditions using a globalized, matrix-free, preconditioned Newton-Krylov method for the Schur complement of the velocity v (a linearized Stokes problem driven by the image mismatch). We first derive the optimality conditions and then discretize using a pseudospectral discretization in space with a Fourier basis. We use a second-order Runge-Kutta method in time. The preconditioner for the Newton-Krylov schemes is based on the exact spectral inverse of the second variation of S.

1.2. Contributions. The fundamental contributions are:
• We design a numerical scheme for (1.2) with the following key features:
  – We use an adjoint-based Newton-type solver.
  – We provide a fast Stokes solver.
  – We introduce a spectral Galerkin method in time.
  – We design a parameter continuation method for automatically selecting the regularization parameter β.
  – Our framework guarantees deformation regularity.
• We provide a numerical study of the designed framework. We compare a globalized Picard method (preconditioned gradient descent) to an inexact Newton-Krylov and a Gauss-Newton-Krylov method. We report results for synthetic and real-world problems. We study spectral properties, grid convergence, and numerical accuracy of the proposed scheme. We study the effects of compressible (plain H¹- and H²-regularization) and incompressible (Stokes regularization) deformation models. We report results for a varying number of unknowns in time (i.e. inverting for stationary and nonstationary velocity fields).

The numerical discretization (pseudospectral) allows for an efficient solution of the Stokes-like equations by eliminating the pressure (i.e. the adjoint variable for the incompressibility constraint ∇ · v = 0). In fact, for smooth images, our scheme is spectrally accurate in space and second-order accurate in time. We will see that we can numerically enforce incompressibility up to almost machine precision. Also, our scheme allows for efficient preconditioning of the Hessian: at the cost of a diagonal scaling we obtain a problem with a bounded condition number. Overall, we demonstrate that the designed framework (i) is efficient and accurate, (ii) features a precise control of the deformation regularity, and (iii) does not require manual tuning of parameters.

1.3. Limitations. The main limitations of our method are:
• The considered model assumes a constant illumination of m_R and m_T (a consequence of the transport equation and the L²-distance in (1.2)). Therefore, it is (in its current state) not directly applicable to multi-modal registration problems. Nevertheless, let us remark that the L²-distance is commonly used in practice [4, 16, 28, 41, 44, 48, 57, 71].
• The efficient use of a Fourier discretization for the PDEs requires periodic boundary conditions. If the images are not periodic, we artificially introduce periodic boundary conditions by mollification and zero padding.
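To illustrate the pressure elimination mentioned above, the sketch below (a 2D numpy-only toy with illustrative names, not the paper's implementation) applies a Leray-type projection in Fourier space: the gradient part of a periodic velocity field is removed by a diagonal scaling per wave number, which enforces ∇ · v = 0 up to machine precision.

```python
import numpy as np

def _wavenumbers(n):
    k = np.fft.fftfreq(n, d=1.0 / n)   # integer wave numbers for a 2*pi-periodic grid
    return np.meshgrid(k, k, indexing="ij")

def divergence(v):
    """Spectral divergence of a periodic 2D vector field of shape (2, n, n)."""
    kx, ky = _wavenumbers(v.shape[-1])
    vh = np.fft.fft2(v)
    return np.fft.ifft2(1j * (kx * vh[0] + ky * vh[1])).real

def leray_project(v):
    """Remove the gradient part per wave number: w_hat = v_hat - k (k.v_hat)/|k|^2.
    The mean (k = 0) mode is left untouched."""
    kx, ky = _wavenumbers(v.shape[-1])
    k2 = kx**2 + ky**2
    k2[0, 0] = 1.0                      # guard the zero mode against division by zero
    vh = np.fft.fft2(v)
    dh = kx * vh[0] + ky * vh[1]        # (k . v_hat); the factor i cancels out
    vh[0] -= kx * dh / k2
    vh[1] -= ky * dh / k2
    return np.fft.ifft2(vh).real

n = 64
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")
v = np.stack([np.sin(X) * np.cos(Y), np.cos(X) * np.sin(Y)])  # compressible field
w = leray_project(v)                    # divergence-free up to roundoff
```

The projection is exact per Fourier mode, which mirrors the claim that incompressibility can be enforced up to almost machine precision at the cost of a diagonal scaling.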
1.4. Related Work. Due to the vast body of literature, it is not possible to provide a comprehensive review of numerical methods for nonrigid image registration. Background on image registration formulations and numerics can be found in [27, 54, 65]. We limit the discussion to approaches that (i) model the deformation via a velocity field v, (ii) view image registration as a problem of optimal control, and/or (iii) constrain v to be divergence-free (i.e. introduce a mass conservation equation as an additional constraint).

Fluid mechanical models have been introduced [18, 19] to overcome limitations of small deformation models [13, 25, 26]. The work in [18, 19] has been extended in [4, 23, 53, 68] using concepts from differential geometry. This class of approaches is referred to as large deformation diffeomorphic metric mapping (LDDMM).

Note that controlling the magnitude of det(∇y) is not sufficient to guarantee that y is locally well behaved. Therefore, our framework features geometric constraints that guarantee a nice, locally diffeomorphic mapping y. In particular, we control the shear angle of the cells within the deformed grid during the parameter continuation in β.

The inf-sup condition is an important requirement when solving Stokes-like problems via the finite element method (see [12, p. 200ff.]; examples for a finite element discretization of similar problem formulations can be found in [16, 62]). Satisfying the inf-sup condition ensures that the finite element solution exists, is stable, and is optimal. Essentially, two different finite element spaces are required for the discretization of the pressure and the velocity; the inf-sup condition is key for the decision on an adequate pair of spaces. For our scheme, we can use the same basis (Fourier) for the discretization of the pressure and the velocity. This is also very efficient, since we can eliminate the pressure from the optimality system (see §4.3.4).
Under the assumption that v is adequately smooth, it is guaranteed that y is a diffeomorphism [23, 68]. The associated smoothness requirements are enforced by the regularization model S (typically an H²-norm) [3, 4]. The optimization is performed on the space of nonstationary velocity fields [4]. To reduce the number of unknowns, it has been suggested to perform the optimization either on the space of stationary velocity fields [1, 2, 44] or with respect to an initial momentum that entirely defines the flow of the map y [3, 71, 74].

The idea of parameterizing a diffeomorphism y via a stationary velocity field [1] has also been introduced to the demons registration framework [52, 69, 70]. Here, optimization is performed in a sequential fashion, alternating between updates resulting from the distance measure (forcing term) and the application of a smoothing operator to regularize the problem (typically through Gaussian smoothing [67, 69, 70]). This scheme is somewhat equivalent to the Picard scheme we discuss in our paper, but it is unclear how one couples it with line search or trust region techniques.

Approaches that more closely reflect an optimal control PDE constrained optimization formulation (1.2) are described in [10, 16, 41, 48, 49, 57, 62, 71]. The model in [16] is equivalent to (1.2). The model in [62] follows the traditional optical flow formulation [46]. The conceptual difference between our formulation and the latter is that in optical flow, the transport equation constraint appears in the objective (see e.g. [46, 47, 62]) and is therefore only fulfilled approximately in an L² least squares sense. We treat it as a hard constraint instead. In [57] an optimal mass transport formulation is described that is based on the Monge-Kantorovich problem. The formulations in [41, 71] do not account for incompressibility. The optical flow approach in [10] treats incompressibility as an L²-penalty.
An optimal control formulation for a constant-in-time velocity field was proposed in [48, 49], in which the divergence of the velocity field is penalized along with smoothness constraints.

What sets our work apart are the numerical algorithms and the discretization scheme. Almost all existing efforts on large deformation diffeomorphic image registration that are closely related to our optimal control formulation exclusively use first-order information for numerical optimization [2, 4, 10, 16, 18, 19, 41, 44, 48, 49, 57, 62, 71, 74]. We use second-order information. The only work in the context of large deformation diffeomorphic image registration that, to our knowledge, uses second-order information is [3]. The model in [3] is based on the LDDMM framework [4, 23, 44, 48, 49, 53, 68]. The inversion is, likewise to [71], performed with respect to an initial momentum. No additional constraints on v are considered. Another difference is that we use a Galerkin method in time to reduce the number of unknowns. This allows us to invert for stationary [1, 2, 44, 52, 69, 70] as well as time-varying [4, 10, 16, 41] velocity fields. Nothing changes in our formulation other than the number of unknowns. Furthermore, we globalize our methods with a line search strategy (i.e. we guarantee a sufficient decrease of the objective J). This is a standard, yet important, ingredient for guaranteeing convergence, which is often not accounted for [2, 16, 17, 41, 52, 69, 70].

We, likewise to [10, 16, 48, 49, 52, 62], consider incompressibility as an optional constraint (see (1.2b)). Operating with divergence-free velocity fields is equivalent to enforcing det(∇y) = 1 (approaches to control det(∇y) can be found in [14, 36, 37, 39, 50, 55, 58, 60, 61, 64, 73]). Unlike [10, 48, 49], which penalize the divergence of the velocity, we treat it exactly. We are not arguing that this approach is better per se. The use of penalties is adequate unless one has reasons to insist on an incompressible velocity field.
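The line search globalization mentioned above (guaranteeing a sufficient decrease of J) is a standard backtracking procedure; a generic sketch, with illustrative parameter defaults rather than the solver's actual settings, looks as follows:

```python
import numpy as np

def armijo_backtracking(J, g_dot_s, v, s, alpha0=1.0, c1=1e-4, shrink=0.5, maxit=20):
    """Backtrack until J(v + a*s) <= J(v) + c1*a*<g, s> (Armijo condition).

    g_dot_s must be the inner product of the gradient with the search
    direction (negative for a descent direction). In the registration
    setting each trial evaluation of J costs one forward solve, so few
    trials are desirable.
    """
    J0 = J(v)
    alpha = alpha0
    for _ in range(maxit):
        if J(v + alpha * s) <= J0 + c1 * alpha * g_dot_s:
            break
        alpha *= shrink
    return alpha

# toy quadratic J(v) = 0.5*||v||^2 with Newton step s = -v:
v = np.array([2.0, -1.0])
s = -v
alpha = armijo_backtracking(lambda u: 0.5 * float(u @ u), float(v @ s), v, s)
# the full step alpha = 1.0 is accepted
```

For a Newton direction on a well-scaled problem the full step is accepted immediately, which is why Newton-type methods pair well with this globalization.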
In that case, a penalty method results in ill-conditioning.

Finally, our pseudospectral formulation in space allows us to resolve several numerical difficulties related to the incompressibility constraint. For example, the inf-sup condition for pressure spaces is not an issue with our scheme. Regarding accuracy, for smooth images, our scheme is spectrally accurate in space and second-order accurate in time. We do not have to use different discretization models [16, 62] for solving the individual subsystems of the mixed-type (hyperbolic-elliptic) optimality conditions. Since we use second-order explicit time stepping in combination with Fourier spectral methods, we have at hand a scheme that displays minimal numerical diffusion and does not require flux limiters [10, 16, 41, 71].

After the submission of our work, another contribution on second-order numerical optimization for LDDMM appeared [43].

Table 1: Notation (frequently used acronyms and symbols).
Notation    Description
GN          Gauss-Newton
KKT system  Karush-Kuhn-Tucker system
PCG         preconditioned conjugate gradient method
PDE         partial differential equation
PDE solve   solution of hyperbolic PDEs of optimality systems (4.1) and (4.3)
m_R         reference (fixed) image (m_R : R^d → R)
m_T         template image (image to be registered; m_T : R^d → R)
y           mapping (deformation; y : R^d → R^d)
v           velocity field (control variable; v : R^d × [0, 1] → R^d)
m           state variable (transported image; m : R^d × [0, 1] → R)
m_1         state variable at t = 1 (m_1 : R^d → R)
λ           adjoint variable (transported mismatch; λ : R^d × [0, 1] → R)
f           body force (drives the registration; f : R^d × [0, 1] → R^d)
F_1         deformation gradient (tensor field) at t = 1 (F_1 : R^d → R^{d×d})
J           objective functional
S           regularization functional
A           differential operator (first and second variation of S)
β           regularization parameter
γ           parameter that enables (γ = 1) or disables (γ = 0) the incompressibility constraint
n_t         number of time points (discretization)
n_c         number of coefficient fields (spectral Galerkin method in time)
n_x         number of grid points (discretization; n_x = (n_x^1, . . . , n_x^d)^T)
g           reduced gradient (first variation of Lagrangian with respect to v)
H           reduced Hessian (second variation of Lagrangian with respect to v)

As we will see, the conditioning of the Hessian that appears in our Newton-Krylov scheme can be quite bad. Although the literature on preconditioners for PDE constrained optimization problems is quite rich (e.g. [6, 9, 10, 34]), none of these methods directly applies to our formulation. Developing effective preconditioning schemes for our formulation is ongoing work in our group.
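To make the conditioning discussion concrete, the toy below runs matrix-free CG on a stand-in diagonal system H = βA + id, where the diagonal a plays the role of the symbol of a spectral differential operator; this is not the actual reduced Hessian, which is only available through PDE solves. Preconditioning by the inverse of the regularization part, a plain diagonal scaling, sharply reduces the iteration count.

```python
import numpy as np

def pcg(matvec, b, precond=lambda r: r, tol=1e-10, maxit=500):
    """Matrix-free preconditioned CG for an SPD system H x = b.
    Returns the approximate solution and the iteration count."""
    x = np.zeros_like(b)
    r = b.copy()
    z = precond(r)
    p = z.copy()
    rz = r @ z
    it = 0
    while np.linalg.norm(r) > tol * np.linalg.norm(b) and it < maxit:
        Hp = matvec(p)
        alpha = rz / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        z = precond(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
        it += 1
    return x, it

# stand-in diagonal system: H = beta*A + identity, with the diagonal a
# playing the role of the symbol of a spectral differential operator
n, beta = 100, 1.0
a = np.arange(1, n + 1, dtype=float) ** 2   # Laplacian-like symbol
H = beta * a + 1.0
b = np.ones(n)

x_plain, it_plain = pcg(lambda x: H * x, b)
# precondition with the inverse of the regularization part (diagonal scaling)
x_prec, it_prec = pcg(lambda x: H * x, b, precond=lambda r: r / (beta * a))
```

In this toy the preconditioned operator has eigenvalues 1 + 1/(β a_i), i.e. a bounded condition number, so PCG converges in a handful of iterations regardless of n.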
2. Outline.
In §3 the mathematical model is developed. The numerical strategies are described in §4. In particular, we specify (i) the optimality conditions (see §4.1), (ii) strategies for numerical optimization (see §4.2), and (iii) implementation details (see §4.3). Numerical experiments on synthetic and real-world data are reported in §5. Final remarks can be found in §6.
3. Continuous Problem Formulation.
We provide a summary of the basic notation in Tab. 1. The original problem formulation is stated in §1.1. The only missing building block is the considered choices for S in (1.2a). This is what we discuss next. Note that we neglect any technicalities with respect to the associated function spaces; we assume that the considered functions are adequately smooth (i.e. sufficiently many derivatives exist and are bounded). In contrast to [10] we do not explicitly enforce continuity in time. We relax the model to an L²-integrability instead (see (1.2a)). This relaxation still yields a velocity field that varies smoothly in time [16].

Quadratic smoothing regularization models are commonly used in nonrigid image registration [27, 54, 56, 65]. They can be defined as

(3.1)    S[v] := (β/2) ||v||²_W = (β/2) ⟨B[v], B[v]⟩_{L²(Ω)},

where B is a differential operator that (together with its dual) defines the function space W, and β > 0 controls the contribution of S. As images are functions of bounded variation, regularity requirements on v ∈ V, V := L²([0, 1]; W) (i.e. the choice of W in (3.1)) have to be considered with care (for an analytical result see [16]). Experimental analysis suggests that an H¹-seminorm is appropriate if incompressibility is considered (i.e. γ = 1); in this case B[v] = ∇v. This choice is motivated from continuum mechanics and yields a viscous model of linear Stokes flow (see §4.1; Stokes regularization). If we neglect the incompressibility constraint (i.e. γ = 0), we use B[v] = Δv instead. This choice is motivated by the fact that H²-norm based quadratic regularization is commonly used in large deformation diffeomorphic image registration [4, 41, 44].
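For periodic data, the operators B and A act as diagonal scalings in Fourier space. The sketch below (2D, applied per component, with illustrative function names) applies βA for the H¹ case (symbol |k|², i.e. −Δ) and the H² case (symbol |k|⁴, i.e. the biharmonic operator), together with the exact spectral inverse on mean-free fields, of the kind later used for preconditioning.

```python
import numpy as np

def _symbol(n, order):
    """Fourier symbol of -Laplacian (|k|^2, order 1) or biharmonic (|k|^4, order 2)."""
    k = np.fft.fftfreq(n, d=1.0 / n)           # integer wave numbers
    kx, ky = np.meshgrid(k, k, indexing="ij")
    k2 = kx**2 + ky**2
    return k2 if order == 1 else k2**2

def apply_A(u, beta, order=1):
    """Apply beta*A to one component of a periodic 2D field."""
    sym = _symbol(u.shape[-1], order)
    return np.fft.ifft2(beta * sym * np.fft.fft2(u)).real

def apply_Ainv(u, beta, order=1):
    """Exact spectral inverse of beta*A on mean-free fields (zero mode untouched)."""
    sym = _symbol(u.shape[-1], order)
    sym[0, 0] = 1.0                            # guard the zero mode
    return np.fft.ifft2(np.fft.fft2(u) / (beta * sym)).real

n = 32
x = np.linspace(-np.pi, np.pi, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")
u = np.sin(X) * np.cos(Y)   # one field component; -Lap(u) = 2*u for this mode
```

Because both A and its inverse are diagonal scalings, applying or inverting the regularization operator costs no more than a pair of FFTs.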
4. Numerics.
We describe the numerical methods used to solve (1.2) next. Whenever discretized quantities are considered, a superscript h is added to the continuous variables and operators (i.e. the discretized representation of v is denoted by v^h). Likewise, if we refer to a discrete variable at a particular iteration, we will add the iteration index as a subscript (i.e. v^h at iteration k is denoted by v^h_k).

We discretize the data on a nodal grid in space and time. The number of spatial grid points is denoted by n_x := (n_x^1, . . . , n_x^d)^T ∈ N^d with spatial step size h_x := (h_x^1, . . . , h_x^d) ∈ R^d_{>0}. The number of time points is denoted by n_t ∈ N with step size h_t = 1/n_t, h_t > 0.

4.1. Optimality Conditions. We use the method of Lagrange multipliers to numerically solve (1.2), with Lagrange multipliers λ : Ω̄ × [0, 1] → R, (x, t) ↦ λ(x, t) (for the hyperbolic transport equation in (1.2b)), and p : Ω̄ × [0, 1] → R, (x, t) ↦ p(x, t) (pressure; for the incompressibility constraint in (1.2b)). We use an optimize-then-discretize approach (for a discussion on advantages and disadvantages see [32, p. 57ff.]). The resulting optimality conditions are described next. Computing variations of the Lagrangian with respect to perturbations of the state (m), adjoint (λ and p) and control (v) variables, respectively, yields the (necessary) first-order optimality (KKT) conditions (in strong form)

(4.1a)    ∂_t m + ∇m · v = 0             in Ω × (0, 1],
(4.1b)    −∂_t λ − ∇ · (vλ) = 0          in Ω × [0, 1),
(4.1c)    γ(∇ · v) = 0                   in Ω × [0, 1],
(4.1d)    g := βA[v] + γ∇p + f = 0       in Ω × [0, 1],

subject to the initial and terminal conditions m = m_T in Ω × {0} and λ = −(m − m_R) in Ω × {1}, and periodic boundary conditions on ∂Ω; (4.1d) is referred to as the reduced gradient, where f := λ∇m, f : Ω̄ × [0, 1] → R^d, (x, t) ↦ f(x, t), is the applied body force and A = B^H B is the Gâteaux derivative of S. In particular, we have

(4.2a)    A[v] = −Δv     (H¹-regularization; γ = 1),
(4.2b)    A[v] = Δ²v     (H²-regularization; γ = 0),

respectively. We refer to (4.1a) (hyperbolic initial value problem) as the state equation, to (4.1b) as the adjoint equation (hyperbolic final value problem) and to (4.1d) as the control equation (elliptic problem). Note that the adjoint equation (4.1b) is, likewise to (4.1a), a scalar conservation law; it flows the mismatch between m_R and m_1 backward in time. If we neglect the incompressibility constraint in (1.2b), γ in (4.1) is set to zero (i.e. (4.1) consists only of (4.1a), (4.1b) and (4.1d)).

The Newton step requires the second variations of the Lagrangian. The associated incremental (KKT) system reads

(4.3a)    ∂_t m̃ + ∇m̃ · v + ∇m · ṽ = 0              in Ω × (0, 1],
(4.3b)    −∂_t λ̃ − ∇ · (vλ̃) − ∇ · (ṽλ) = 0         in Ω × [0, 1),
(4.3c)    γ(∇ · ṽ) = 0                              in Ω × [0, 1],
(4.3d)    H[ṽ] := βA[ṽ] + γ∇p̃ + f̃ = −g             in Ω × [0, 1],

subject to initial and terminal conditions m̃_0 := m̃(·, 0) = 0, m̃_0 : Ω̄ → R, x ↦ m̃_0(x), and λ̃_1 := λ̃(·, 1) = −m̃_1, λ̃_1 : Ω̄ → R, x ↦ λ̃_1(x), m̃_1 := m̃(·, 1), m̃_1 : Ω̄ → R, x ↦ m̃_1(x), respectively, and periodic boundary conditions on ∂Ω. Here, (4.3a), (4.3b) and (4.3d) are referred to as the incremental state, adjoint and control equations, respectively; the incremental variables are denoted with a tilde. Further, H in (4.3d) is referred to as the reduced Hessian and f̃ := λ̃∇m + λ∇m̃, f̃ : Ω̄ × [0, 1] → R^d, (x, t) ↦ f̃(x, t), is the incremental body force. The operator A in (4.3d) represents the second variation of S with respect to the control v. We use the same symbol as in (4.1), since the second variation of S with respect to v is identical to its first variation (the corresponding vectorial differential operators are given in (4.2a) and (4.2b), respectively).

4.2. Numerical Optimization. We discuss strategies for numerical optimization next. We consider second-order Newton-Krylov methods (see §4.2.1) and a first-order Picard method (see §4.2.3). We use a backtracking line search subject to the
Armijo condition with search direction s_k ∈ R^n and step size α_k > 0 at iteration k ∈ N to ensure a sequence of monotonically decreasing objective values J^h (we use default parameters; see [59, Algorithm 3.1, p. 37]). Note that each evaluation of J^h requires a forward solve (i.e. the solution of (4.1a) to obtain m^h given some trial solution v^h_k ∈ R^n). Therefore, it is desirable to keep the number of line search steps at a minimum.

4.2.1. Newton-Krylov Methods. Applying Newton's method to (4.1) yields a large KKT system that has to be solved numerically at each outer iteration k. We will refer to the iterative solution of this system as inner iterations (as opposed to the steps for updating v^h_k, which we refer to as outer iterations; see Alg. 1). In reduced space methods, incremental adjoint and state variables are eliminated from the system via block elimination (under the assumption that state and adjoint equations are fulfilled exactly) [8, 9]. We obtain the reduced KKT system

(4.4)    H^h_k ṽ^h_k = −g^h_k,    k ∈ N,

where H^h_k ∈ R^{n×n} corresponds to the reduced Hessian in (4.3d) (i.e. the Schur complement of the full Hessian for the control variable v^h) and ṽ^h_k ∈ R^n to the incremental control variable in (4.3) (which is nothing but the search direction s_k mentioned earlier). Further, the right-hand side g^h_k ∈ R^n corresponds to the reduced gradient in (4.1d).

The numerical scheme amounts to a sequential solution of the optimality conditions (4.1) and (4.3). Alg. 1 illustrates a realization of an outer iteration. Note that we eliminate (4.1c) and (4.3c) from the optimality conditions (see §4.3.4). The inner iteration (i.e. the solution of (4.4)) is what we discuss next. Note that the scheme in Alg. 1 also applies to the Picard method (see §4.2.3); the only difference is the way we compute the search direction s_k in line 5.

Algorithm 1 Outer iteration of the designed inexact Newton-Krylov method.
1: v^h ← 0; compute m^h, λ^h, J^h(v^h) and g^h; k ← 0
2: while true do
3:   stop ← check termination criteria (4.9)
4:   if stop break
5:   s_k ← solve (4.4) given m^h_k, λ^h_k, v^h_k and g^h_k                    ▹ Newton step
6:   α_k ← perform line search on s_k
7:   v^h_{k+1} ← v^h_k + α_k s_k,  m^h_{k+1}(t = 0) ← m^h_T
8:   m^h_{k+1} ← solve (4.1a) forward in time given v^h_{k+1}                 ▹ forward solve
9:   λ^h_{k+1}(t = 1) ← −(m^h_{k+1}(t = 1) − m^h_R)
10:  λ^h_{k+1} ← solve (4.1b) backward in time given v^h_{k+1} and m^h_{k+1}  ▹ adjoint solve
11:  compute J^h(v^h_{k+1}) and g^h_{k+1} given m^h_{k+1}, λ^h_{k+1} and v^h_{k+1}
12:  k ← k + 1
13: end while

Forming or storing H^h in (4.4) is computationally prohibitive. Therefore, it is desirable to use an iterative solver for which H^h does not have to be assembled in practice. Krylov-subspace methods are a popular choice [5, 8, 9, 15, 34], as they only require matrix-vector products. We use a PCG method, exploiting the fact that H^h is symmetric and positive definite (i.e. H^h ≻ 0; see §4.2.2 for a discussion). Solving (4.4) exactly can be prohibitively expensive and might not be justified if an iterate is far from the (true) solution [20]. A common strategy is to perform inexact solves. That is, starting with a large tolerance for the Krylov-subspace method, we successively reduce the tolerance and thereby solve more accurately for the search direction as we approach a (local) minimizer [21, 24]. This can be achieved with the termination criterion

(4.5)    ||H^h_ι ṽ^h_ι + g^h_k|| =: ||r_ι|| ≤ η_k ||g^h_k||

for the Krylov-subspace method. Here, η_k := min(0.5, (||g^h_k|| / ||g^h_0||)^{1/2}), η_k ∈ [0, 1), is referred to as the forcing sequence (assuming superlinear convergence; details can be found in [59, p. 165ff.]); ι ∈ N in (4.5) is the iteration index of the inner iteration (i.e. for the iterative solution of (4.4)) at a given outer iteration k.

The course of an inner iteration follows the standard PCG steps (see e.g. [59, p. 119, Alg. 5.3]). During each inner iteration ι we have to apply H^h in (4.3d) to a vector. We summarize this matrix-vector product in Alg. 2. As can be seen, each application of H^h requires an additional forward and adjoint solve (i.e. the solution of the incremental state and adjoint equations (4.3a) and (4.3b), respectively). This is a direct consequence of the block elimination in reduced space methods.

The number of inner iterations essentially depends on the spectrum of the operator H^h. Typically, H^h displays poor conditioning. An optimal preconditioner P ∈ R^{n×n} renders the number of iterations independent of n and β. The design of such a preconditioner is an open area of research [6, 7, 8, 9, 34]. Standard techniques like incomplete factorizations or algebraic multigrid are not applicable, as they require the assembling of H^h in (4.4). Geometric, matrix-free preconditioners are a valid option. This is something we will investigate in the future. Here, we consider a left preconditioner based on the exact spectral inverse of the regularization part of H^h, that is, P := βA^h (implementation details can be found in §4.3.3). Note that the PCG method only requires the action of P^{-1} on a vector (i.e. a matrix-free implementation is in place). Since we use a Fourier spectral method, the cost of our preconditioning amounts to a spectral diagonal scaling. We will refer to this algorithm as the Newton-PCG (N-PCG) method.

Algorithm 2
Hessian matrix-vector product of the designed inexact Newton-Krylov algorithm at outeriteration k ∈ N . We illustrate the computational steps required for applying H h in (4.3d) to the PCGsearch direction at inner iteration index ι ∈ N . ˜ m h ι ( t = ) ← ˜ m h ι ← solve (4.3a) forward in time given m hk , v hk and ˜ v h ι (cid:66) incremental forward solve ˜ λ h ι ( t = ) ← − ˜ m h ι ( t = ) ˜ λ h ι ← solve (4.3b) backward in time given λ hk , v hk and ˜ v h ι (cid:66) incremental adjoint solve apply H h ι in (4.3d) to the PCG search direction given λ hk , ˜ λ h ι , m hk and ˜ m h ι NEWTON-KRYLOV ALGORITHM FOR CONSTRAINED IMAGE REGISTRATION Even though H h is in the proximity of a (local) minimum by constructionpositive semidefinite (i.e. H h (cid:23)
0) it can be indefinite or singular far away from the solution. Accordingly,the search direction is not guaranteed to be a descent direction. One remedy is to terminate the inneriteration whenever negative curvature occurs [21]. Another approach is to use a Quasi-Newton approxi-mation. We consider a GN approximation H h GN instead; we drop certain expressions of H h , which in turnguarantees that H h GN (cid:31)
0. In particular, we drop all expressions in (4.3) in which λ appears. Accordingly, we obtain the (continuous) system

(4.6a)  ∂_t m̃ + ∇m̃ · v + ∇m · ṽ = 0    in Ω × (0, 1],
(4.6b)  −∂_t λ̃ − ∇ · (v λ̃) = 0          in Ω × [0, 1),
(4.6c)  γ(∇ · ṽ) = 0                     in Ω × [0, 1],
(4.6d)  H_GN[ṽ] := βA[ṽ] + γ∇p̃ + λ̃∇m = −g    in Ω × [0, 1].

We expect the rate of convergence to drop from quadratic to (super)linear when turning to (4.6). However, if the L²-distance can be driven to zero, we recover fast local convergence close to the true solution v⋆, even if the adjoint variable is neglected. This is due to the fact that (4.1b) models the flow of the mismatch backward in time, so that λ → 0 and thus v → v⋆. We refer to this method as the (inexact) GN-PCG method [8, 9]. We remark that all algorithmic details described in this note apply to both Newton-Krylov methods.

We consider a globalized Picard iteration (fixed point iteration) in addition to the described Newton-Krylov methods. Based on (4.1d) we have

(4.7)  v^h_{k+1} = −(βA^h)^{-1}[γ∇^h p^h_k + f^h_k].

Since we use Fourier spectral methods, the inversion of A^h in (4.7) comes at the cost of a diagonal scaling (implementation details can be found in §4.3.3). Accordingly, this scheme does not require the (iterative) solution of a linear system. However, it potentially results in a larger number of outer iterations until convergence, as we expect the optimization problem to be poorly conditioned.
We do not directly use the solution of (4.7) as a new iterate but compute a search direction s_k instead; that is, we take the difference between the new and the former iterate. This in turn allows us to perform a line search on s_k. This scheme can be viewed as a gradient descent in the function space induced by W (i.e. a preconditioned gradient descent scheme; see §C).
Note that s_k is, in contrast to Newton methods, arbitrarily scaled. Therefore, we provide an augmented implementation that tries to estimate an optimal scaling during the course of the optimization. Details can be found in §4.3.5.

The termination criteria are in accordance with [56] (see [29, 305 ff.]
for a discussion). They are given by

(4.8)
  (C1)  J^h(v^h_{k−1}) − J^h(v^h_k) < τ_J (1 + J^h(v^h_k)),
  (C2)  ‖v^h_{k−1} − v^h_k‖_∞ < √τ_J (1 + ‖v^h_k‖_∞),
  (C3)  ‖g^h_k‖_∞ < τ_J^{1/3} (1 + J^h(v^h_k)),
  (C4)  ‖g^h_k‖_∞ < ε_mach,
  (C5)  k > n_opt.

Here, τ_J > 0 is a user-defined tolerance, ε_mach > 0 is the machine precision, and n_opt ∈ N is the maximal number of outer iterations. The algorithm is terminated if

(4.9)  {(C1) ∧ (C2) ∧ (C3)} ∨ (C4) ∨ (C5),

where ∨ denotes the logical or and ∧ the logical and operator, respectively.
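As an illustration, the combined stopping rule (4.8)/(4.9) can be sketched in a few lines. The function below is a minimal stand-in (names, array types, and the cube-root gradient test following Gill-Murray-Wright-type criteria are our own reading, not the reference implementation):

```python
import numpy as np

def converged(J_prev, J_curr, v_prev, v_curr, g_curr,
              tol=1e-3, eps_mach=np.finfo(float).eps, k=0, n_opt=10**6):
    """Sketch of the stopping rule (4.8)/(4.9); all names are assumptions."""
    # (C1) relative decrease of the objective
    C1 = (J_prev - J_curr) < tol * (1.0 + J_curr)
    # (C2) relative change of the control variable (sup-norm)
    C2 = np.max(np.abs(v_prev - v_curr)) < np.sqrt(tol) * (1.0 + np.max(np.abs(v_curr)))
    # (C3) relative norm of the reduced gradient
    C3 = np.max(np.abs(g_curr)) < tol ** (1.0 / 3.0) * (1.0 + J_curr)
    # (C4) gradient at machine precision; (C5) iteration budget exhausted
    C4 = np.max(np.abs(g_curr)) < eps_mach
    C5 = k > n_opt
    return (C1 and C2 and C3) or C4 or C5
```

The three relative tests must hold jointly; the machine-precision and budget tests each terminate on their own, mirroring (4.9).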
This section provides additional specifics on the implementation. In particular, we describe (i) the numerical discretization (see §4.3.1), (ii) the parameterization in time (see §4.3.2), (iii) the inversion of the operator A^h (see §4.3.3), and (iv) strategies for the parameter selection (see §4.3.5).

We use a (regular) nodal grid for the discretization in space and time. The problem is defined on the space-time interval Ω × [0, 1], where Ω := (−π, π)^d. Accordingly, we obtain the time step size via h_t = 1/n_t. The cell size (pixel or voxel size) h_x := (h^1_x, …, h^d_x) ∈ R^d_{>0} for a spatial grid cell can be computed via h^i_x = 2π/n^i_x, i = 1, …, d, where n^i_x is the number of grid points along the i-th spatial direction x_i.
The derivative operators are discretized via Fourier spectral methods [11]. The time integrator for the forward and adjoint solves is an explicit second-order Runge-Kutta (RK2) method, which, in connection with Fourier spectral methods, displays minimal numerical diffusion. Following standard numerical theory for hyperbolic equations, the step size h_t is subject to a CFL condition, h_t ≤ h_{t,max} := c̃_CFL / max(‖v^h‖_∞ ⊘ h_x), where ⊘ denotes a Hadamard division; the bound h_{t,max} is attained for c̃_CFL = 1. We choose n_t (and therefore h_t) for the forward and adjoint solves as required by the CFL condition.

To reduce the number of unknowns, v is expanded in time in terms of basis functions b_l : [0, 1] → R, t ↦ b_l(t), l = 1, …, n_c,

(4.10)  v(x, t) = Σ_{l=1}^{n_c} b_l(t) v_l(x),

where v_l : R^d → R^d, x ↦ v_l(x), is a coefficient field. The coefficients v_l are the new unknowns of our problem. This reduces the number of unknowns in time from n_t to n_c, where n_c ≪ n_t. Thus, we can invert for a stationary (n_c = 1) or a nonstationary velocity field as required. Nothing changes in our formulation; only the number of unknowns does.
We use Chebyshev polynomials as basis functions b_l on account of their excellent approximation properties as well as their orthogonality (see §A for details). The expansion (4.10) solely affects S and the (incremental) control equation (i.e. (4.1d) and (4.3d)); v is computed from the coefficient fields v_l, l = 1, …, n_c, during the forward and adjoint solves according to (4.10).

The Picard iteration in (4.7) as well as the preconditioning of (4.4) require the inversion of the differential operator A^h. Since we use Fourier spectral methods, this inversion can be accomplished at the cost of a spectral diagonal scaling. However, A^h has a nontrivial kernel (which contains only the constant functions due to the periodic boundary conditions). We make A^h invertible by setting the base frequency of the inverse of A^h (including the scaling by β) to one. This ensures not only invertibility but also that the constant part of the (incremental) body force f (or f̃, respectively) remains in the kernel of our regularization scheme. This in turn allows us to invert for constant velocity fields.

Elimination of p and p̃. In our numerical scheme, we eliminate p, and by that (4.1c), from (4.1). Details on the derivation can be found in §B. We obtain

(4.11)  ĝ := βA[v] + K[f] = 0,    K[f] := −∇(Δ^{-1}(∇ · f)) + f,

to replace (4.1d), where f is the body force as defined in §4.1 and A the first variation of S with respect to v (see (4.2a) and (4.2b), respectively). It immediately follows that we obtain

(4.12)  Ĥ[ṽ] := βA[ṽ] + K[f̃] = −ĝ

to replace (4.3d); f̃ denotes the incremental body force defined in §4.1.

To the extent possible, it is desirable to design a numerical scheme that does not require a selection of parameters (a black-box solver). This is challenging for previously unseen data. In general, the user should only be required to decide on the following:
• The desired accuracy of the inversion (controlled by the tolerance τ_J; see §4.2.4).
• The desired properties of the mapping y (controlled by θ̃ or F̃, respectively; see below).
• The budget we are willing to assign to the computation (controlled by n_opt; see §4.2.4).
For the purpose of this numerical study we proceed as follows:

Optimization: We set the maximum number of iterations n_opt (see (4.8)) to 1E6, as we do not want our algorithm to terminate early (i.e. we make sure that we terminate only if either we reach the defined tolerances or we no longer observe a decrease in J^h). For the convergence study in §5, we use the relative change of the ℓ∞-norm of the gradient g^h as a stopping criterion, as we are interested in studying convergence properties. This enables an unbiased comparison in terms of the required work to solve an optimization problem up to a desired accuracy. In particular, we terminate the optimization once the relative reduction of the reduced gradient g^h reaches at least three orders of magnitude.
Following standard textbook literature [29, 56], we use the stopping criteria in (4.8) for the remainder of the experiments. We set the tolerance to τ_J = 1E−3. We qualitatively did not observe significant differences in the final results for the experiments performed in this study when turning to smaller tolerances. We will further elaborate on the required accuracy for the inversion (i.e. the registration quality) in a follow-up paper.
The tolerance of the PCG method is set as discussed in §4.2.1 (see (4.5)). The maximal number of iterations for the PCG method is set to n (the order of the reduced KKT system in (4.4)). In theory, this guarantees that the PCG method converges to a solution. This choice not only ensures that we provide an unbiased study (i.e. we do not terminate early) but also makes sure that we do not miss any issues in the implementation or parameter selection. All experiments conducted in this study converged after only a fraction of n inner iterations. This statement is confirmed by the reported number of PDE solves (a "PDE solve" refers to the solution of one of the hyperbolic PDEs that appear in the optimality systems (4.1) and (4.3)).
For all our experiments we initialized the line search with a unit step (for the Newton-Krylov methods we expect the step length α_k to be 1). However, this is not the case for the Picard scheme (i.e. the preconditioned gradient descent). Our implementation features an option to memorize the scaling of s_k for the next outer iteration. That is, we introduce an additional scaling factor α̃_k > 0 applied to s_k before entering the line search (initialized with α̃_k = 1). If the line search returns a step α_k < 1, we downscale α̃_k by α_k. On the contrary, we upscale α̃_k by a factor of two if α_k = 1.

PDE Solver: The number of time steps n_t is bounded from below due to stability requirements (see §4.3.1). Since we use an expansion in time (see §4.3.2), it is possible to adaptively adjust n_t so that numerical stability is attained. However, we fix n_t for the numerical experiments in §5.3, as we are interested in studying the convergence behavior with respect to the employed grid size. We set n_t to 4 max(n_x). This is a pessimistic choice. If we still encounter instabilities (as judged by monitoring the CFL condition (see §4.3.1)), s_k is scaled by a factor of 0.5 until numerical stability is attained, before entering the line search. For all numerical experiments conducted in this study, we did not observe any instabilities for the Newton-Krylov methods. However, for the Picard method we observed instabilities in the case when we did not consider the rescaling procedure detailed above. This is due to the fact that s_k is arbitrarily scaled for first-order methods (as opposed to second-order methods). By introducing the additional scaling parameter α̃_k we could stabilize the Picard method; we did not observe a violation of the CFL condition for any of the conducted experiments (for n_t fixed).

Regularization: Estimating an optimal value for β is an area of research by itself. A variety of methods has been designed (see e.g. [72]). A key difficulty is computational complexity. Methods based on the assumption that differences between model output and observed data are associated with random noise (such as generalized cross validation) might not be reliable in the context of nonrigid image registration. This is due to the fact that the noise in the images is likely to be highly structured [38]. Another possibility is to estimate the regularization parameter on the basis of the spectral properties of the Hessian (see §5.3.1). That is, we can estimate the condition number of the problem during the PCG solves for the unregularized problem using the Lanczos algorithm (see [30, p. 528]). We can do this very efficiently by initializing the problem with a zero velocity field. Given that v is zero, the application of the Hessian within the PCG is computationally inexpensive, as many of the terms in the optimality systems drop (see §4.1). However, the level of regularization depends not only on properties of the data but also on regularity requirements on y.
Another common strategy is to perform a parameter continuation in β (see e.g. [35, 38]). In [38] it has been suggested to inform the algorithm about the required regularity of a solution on the basis of a lower bound on the L²-distance between the reference and the deformed template image. The decision on such a bound, however, might not be intuitive for practitioners. Further, one is ultimately interested not only in a small residual but also in a bounded determinant of the deformation gradient. Therefore, we propose to inform the algorithm on regularity requirements in terms of a lower bound F̃ ∈ (0, 1) on det(F^h) (i.e. a bound on the tolerable compression of a volume element). If the Stokes regularization scheme (γ = 1; S is an H¹-seminorm) is considered, bounds on geometric constraints of the deformation of a volume element can be used. In particular, we use a lower bound θ̃ > 0 and, correspondingly, an upper bound π − θ̃ on the angles of a deformed volume element. Note that it is actually necessary to monitor geometric properties to guarantee a local diffeomorphism; a lower bound on det(F^h) is not sufficient.
Our algorithm proceeds as follows: In a first step, the registration problem is solved for a large value of β. Subsequently, β is reduced by one order of magnitude until we breach F̃ (or θ̃). From there on a binary search is performed. The algorithm is terminated if the change in β is below 5% of the value of β for which F̃ (or θ̃) was breached. We add a lower bound on β as well as a lower bound on the relative change of the L²-distance.

Presmoothing: A numerical challenge in image computing is that images are functions of bounded variation. Therefore, an accurate computation of the derivatives becomes more involved. A common approach to ensure numerical stability and avoid Gibbs phenomena is to reduce high-frequency information in the data. We use a Gaussian smoothing, which is parametrized by a user-defined standard deviation σ > 0. We experimentally found a value of σ = π/min(n_x) to be adequate for the problems at hand. However, we note that we increased σ by a factor of 2 for one set of experiments in §5.3.2. We also note that we implemented a method for grid and scale continuation for the images. This avoids the problem of deciding on σ. We will additionally investigate an automatic selection strategy for σ in a follow-up paper.
It is important to note that the sensitivity of second-order derivatives to noise in the data is problematic. Therefore, we refrain from applying the N-PCG method to nonsmooth images.
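The β-continuation described above can be sketched as follows. Here `solve` and `min_det_grad` are hypothetical stand-ins for a full registration solve at fixed β and for the evaluation of min det(F^h); the bound F̃ = 0.2 and all other defaults are example values of our own, not from the paper:

```python
def beta_continuation(solve, min_det_grad, F_bound=0.2, beta0=1.0, rel_change=0.05):
    """Sketch of the parameter continuation in beta (names are assumptions):
    reduce beta by one order of magnitude until the bound on det(F) is
    breached, then binary-search; stop when the change in beta is below
    `rel_change` times the value at which the bound was breached."""
    beta_ok = beta0              # large initial beta, assumed admissible
    beta = beta0 / 10.0
    # reduce by one order of magnitude until the bound is breached
    while min_det_grad(solve(beta)) >= F_bound:
        beta_ok, beta = beta, beta / 10.0
    beta_bad = beta              # first value for which the bound was breached
    # binary search between last admissible and first inadmissible value
    while (beta_ok - beta_bad) > rel_change * beta_bad:
        beta_mid = 0.5 * (beta_ok + beta_bad)
        if min_det_grad(solve(beta_mid)) >= F_bound:
            beta_ok = beta_mid
        else:
            beta_bad = beta_mid
    return beta_ok
```

In practice one would also enforce the lower bounds on β and on the relative change of the L²-distance mentioned above; they are omitted here for brevity.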
5. Numerical Experiments.
We report results only in two dimensions. We test the algorithm on real-world and synthetic registration problems (see §5.1). The measures to analyze the registration results are summarized in §5.2. We conduct a numerical study (see §5.3), which includes an analysis of (i) the spectral properties of the Hessian (see §5.3.1), (ii) grid convergence (see §5.3.2) and (iii) the effects of varying the number of the unknowns in time (see §5.3.3). We additionally report results for a fully automatic registration on high-resolution images based on the designed parameter continuation in β (see §5.4).

We consider synthetic and real-world registration problems. These are illustrated in Fig. 1. All images have been normalized to an intensity range of [
0, 1]. The synthetic problems are constructed by solving the forward problem to create an artificial template image m_T given some image m_R and some velocity field v⋆ (the sinusoidal images and UT images in Fig. 1). Here, v⋆ is chosen to live on the manifold of divergence-free velocity fields to provide a testing environment for the Stokes regularization scheme. Further, v⋆ is by construction constant in time (i.e. n_c = 1); it is defined in (5.1) as a smooth, trigonometric velocity profile (constant for all t ∈ [0, 1] and given componentwise for i = 1, …, d). Note that for large β the optimization problem is almost quadratic, so that Newton-Krylov methods converge quickly. The hand images are taken from [56].

Fig. 1. Synthetic and real-world registration problems. All images have been normalized to an intensity range of [0, 1]. The registration problems are referred to as sinusoidal images (top row, left), UT images (top row, right), hand images (bottom row, left) and brain images (bottom row, right). Each row displays the reference image m_R, the template (deformable) image m_T and a map of their pointwise difference |m_R − m_T| (from left to right as identified by the inset in the images). We provide an illustration of the deformation pattern y (overlaid onto m_T) for the synthetic problems. This mapping is computed from v⋆ in (5.1) via (D.1).

We report the number of the (outer) iterations and the number of the hyperbolic PDE solves to assess the computational work load. The latter is a good proxy for the wall clock time. It provides a transparent comparison between the designed first- and second-order methods, given that the number of the hyperbolic PDE solves varies between these methods: Two hyperbolic PDE solves are required during each iteration of the Picard method (we have to solve (4.1a) and (4.1b) to evaluate the reduced gradient in (4.1d); see also Alg. 1). For the Newton-Krylov methods, we require an additional two hyperbolic PDE solves per inner iteration (we have to solve (4.3a) and (4.3b) to compute the Hessian matrix-vector product given in (4.3d); see also Alg. 2). Each evaluation of J^h in (1.2) (i.e. each line search step) requires an additional hyperbolic PDE solve (we have to compute the deformed template image at t = 1). We consider (i) the L²-distance, (ii) the objective functional J^h and (iii) the ℓ∞-norm of the (reduced) gradient g^h to assess the quality of the inversion. We additionally report values for the determinant of the deformation gradient to study local deformation properties. These measures are defined more explicitly in Tab. 2.
We visually support this quantitative analysis on the basis of snapshots of the registration results. Information on the reconstruction accuracy can be obtained from pointwise maps of the residual difference between m_R and m. The deformation regularity and the mass conservation can be assessed via images of the pointwise determinant of the deformation gradient and/or of a deformed grid overlaid onto m. Details on how these are obtained and on how to interpret them can be found in §D.

We study the spectral properties of the Hessian (see §5.3.1), grid convergence (see §5.3.2) as well as the influence of an increase in the number of the unknowns in time (see §5.3.3).
Purpose: We study the ill-posedness and the ill-conditioning of the problem at hand. We report spectral properties of H^h. We study the eigenvalues and the eigenvectors with respect to different choices for β. We study the differences between plain H²-regularization and the Stokes regularization scheme (H¹-regularization).

Table 2. Overview of the quantitative measures that are used to assess the registration performance. We report the number of outer iterations (steps for updating the control variable v^h) and the number of the hyperbolic PDE solves (i.e. how often we have to solve one of the hyperbolic PDEs (4.1a), (4.1b), (4.3a) and (4.3b) that appear in the optimality systems) to assess the work load. We report the relative change of the L²-distance, the objective and the reduced gradient to assess the quality of the inversion. We report values for the determinant of the deformation gradient to assess the regularity of the computed deformation map. We report the relative power spectrum of the coefficients v^h_l to assess which of the coefficients of the expansion in (4.10) are significant.

  Description                              Definition
  number of (outer) iterations             k⋆
  number of hyperbolic PDE solves          n_PDE
  relative change of L²-distance           ‖m^h − m^h_R‖_rel := ‖m^h − m^h_R‖ / ‖m^h_T − m^h_R‖
  relative change of objective value       δJ^h_rel := J^h(v^h_{k⋆}) / J^h(v^h_0)
  relative change of reduced gradient      ‖g^h‖_{∞,rel} := ‖g^h_{k⋆}‖_∞ / ‖g^h_0‖_∞
  determinant of deformation gradient      det(F^h)
  relative power spectrum of v^h_l         ‖v^h_l‖_rel := ‖v^h_l‖ / ‖{v^h_{l'}}_{l'=1}^{n_c}‖

Fig. 2. Trend of the absolute value of the eigenvalues Λ_ii, i = 1, …, 8192, of the reduced Hessian H^h for plain H²-regularization (γ = 0; left) and the Stokes regularization scheme (γ = 1; H¹-regularization; right) for β ∈ {0, 1E−2, 1E6} (as indicated in the legend of the plots). We report the trend of the entire set of 8192 eigenvalues. The test problem is the UT images (see §5.1 for details on the construction of this synthetic registration problem; n_x = (64, 64)^T and n_c = 1). The Hessian is computed at the true solution v^{h,⋆} to ensure that H^h ≻ 0 (this statement is confirmed by the values reported in Tab. 3). The eigenvalues (absolute value) are sorted in descending order for the unregularized problem (i.e. for β = 0) and in ascending order otherwise.

Setup: This study is based on the UT images (the true solution v^{h,⋆} is divergence-free; see §5.1 for more details on the construction of this synthetic registration problem; n_x = (64, 64)^T and n_c = 1). The eigendecomposition H^h = VΛV^{-1} ∈ R^{n×n}, n = 8192, V = (ν_i)_{i=1}^n, ν_i ∈ R^n, ‖ν_i‖ = 1, Λ = diag(Λ_11, …, Λ_nn), is computed at the true solution v^{h,⋆} to guarantee that H^h ≻ 0. The spectrum is computed for three different choices of β: for the unregularized problem (β = 0), for a moderate regularization (β = 1E−3) and solely for the regularization model (β = 1E6).

Results: Fig. 2 displays the trend of the absolute value of the eigenvalues Λ_ii, i = 1, …, n. They are sorted in descending order for β = 0 and in ascending order otherwise. If an eigenvalue drops below machine precision, it is set to 1E−16 (only for visualization purposes). The extremal real and imaginary parts of the eigenvalues are summarized in Tab. 3. Fig. 3 provides the spatial variation of the eigenvectors ν_i ∈ R^n that correspond to the eigenvalues Λ_ii, i ∈ {1, 5, 20, 100, 1000}, in Fig. 2 with respect to different choices for β and different regularization schemes. We only display the first component ν_i^1 of the coefficient field ν_i := (ν_i^1, ν_i^2). The pattern for the second component is (qualitatively) alike.

Fig. 3. Eigenvector plots of the reduced Hessian H^h ∈ R^{n×n}, n = 8192, for β ∈ {0, 1E−3, 1E6} for plain H²-regularization (γ = 0; top) and for the Stokes regularization scheme (γ = 1; H¹-regularization; bottom). The results correspond to the eigenvalue plots reported in Fig. 2. We refer to Fig. 2 and the text for details on the experimental setup. Each plot provides the spatial variation of the portion of an eigenvector ν_i ∈ R^n associated with the first component of the coefficient field v^h_l, l = n_c, n_c = 1. The individual plots correspond to the eigenvalues Λ_ii, i = 1, 5, 20, 100, 1000, in Fig. 2. The range of the values for ν_i^1 is provided below each plot.

Table 3. Extrema of the eigenvalues Λ_ii, i = 1, …, 8192, of the reduced Hessian reported in Fig. 2. We report values for plain H²-regularization (γ = 0; top block) and the Stokes regularization scheme (γ = 1; H¹-regularization; bottom block). We refer to Fig. 2 and the text for details on the experimental setup. We report the smallest and the largest real part as well as the largest absolute value of the imaginary part of the eigenvalues Λ_ii with respect to different choices of the regularization parameter β ∈ {0, 1E−2, 1E6}.

Observations: The most important observations are that (i) the regularization schemes display a very similar behavior (as judged by the clustering of the eigenvalues as well as the spatial variation of the eigenvectors for β = 0), (ii) the smoothness of the eigenvectors decreases with a decreasing regularization parameter β and increasing eigenvalues, and (iii) it is less clear how to identify the smooth eigenvectors within the eigenspace of the Stokes regularization scheme.
The eigenvalues Λ_ii, i = 1, …, 8192, drop rapidly for the unregularized problem, approaching almost machine precision. Increasing β shifts the trend of |Λ_ii| to larger numbers. The values in Tab. 3 confirm that H^h ≻ 0 at v^{h,⋆}.
Turning to the eigenvector plots, we can see that the first eigenvector of the unregularized problem displays a delta-peak-like structure for both regularization schemes, since there is no local coupling of the spatial information. For the regularized problem we can observe a smooth spatial variation for the eigenvectors associated with large eigenvalues for both regularization schemes. The first eigenvector plot is almost constant. As i increases, the eigenvectors become more oscillatory. We can observe strong differences between the two schemes for a moderate regularization (β = 1E−3; middle row of each block in Fig. 3). Also, we can observe a more complex structure for the Stokes regularization scheme for small eigenvalues (i.e. it is difficult to identify where the smoothest eigenvectors are located within the eigenspace).

Conclusion: We conclude that the Hessian operator is singular if we do not include a smoothness regularization model for the control variable. For practical values of the regularization parameter, the Hessian behaves as a compact operator; larger eigenvalues are associated with smooth eigenvectors. It is well known that designing a preconditioner for such operators is challenging.
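The Lanczos-based estimation of extremal eigenvalues mentioned in §4.3.5, which underlies spectral studies of this kind, can be sketched for a generic symmetric, matrix-free operator. The routine below is a simplified sketch of our own (with full reorthogonalization for robustness), not the implementation used in the paper:

```python
import numpy as np

def lanczos_extremes(matvec, n, m=40, seed=0):
    """Estimate extremal eigenvalues of a symmetric operator given only
    its action `matvec` (e.g. a Hessian applied as in Alg. 2), using m
    Lanczos steps. Names and defaults are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    Q = [q]                      # Lanczos basis (kept for reorthogonalization)
    alphas, betas = [], []
    for _ in range(m):
        w = matvec(Q[-1])
        alphas.append(Q[-1] @ w)          # diagonal entry of T
        for qi in Q:                       # full reorthogonalization
            w -= (qi @ w) * qi
        beta = np.linalg.norm(w)
        if beta < 1e-12:                   # invariant subspace found
            break
        betas.append(beta)                 # off-diagonal entry of T
        Q.append(w / beta)
    # eigenvalues of the tridiagonal T approximate extremal eigenvalues
    T = np.diag(alphas)
    off = np.array(betas[:len(alphas) - 1])
    if off.size:
        T += np.diag(off, 1) + np.diag(off, -1)
    ev = np.linalg.eigvalsh(T)
    return ev[0], ev[-1]
```

The ratio of the two returned Ritz values gives a (lower) estimate of the condition number of the operator, which is the quantity of interest in the discussion above.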
We study the grid convergence of the considered iterative optimization methods on the basis of synthetic registration problems. We use a rigid setting to prevent bias originating from adaptive changes during the computations. That is, the results are computed on a single resolution level. No grid, scale or parameter continuation is applied. The number of the time points is fixed to n_t = 4 max(n_x) for all experiments. Further, we use empirically determined values for the regularization parameter β, namely β ∈ {1E−2, 1E−3}. Since we are interested in studying the convergence properties of our method, we consider the relative change of the ℓ∞-norm of the reduced gradient g^h as a stopping criterion. This yields a fair comparison between the different optimization methods, as a reduction in the norm of g^h directly reflects how well an optimization problem is solved. We terminate if the relative reduction of the ℓ∞-norm of g^h is at least three orders of magnitude. However, since the Picard method tends to converge slowly for low tolerances with respect to the gradient, we stop if we detect a stagnation in the objective. In particular, we terminate the optimization if the change in the objective in ten consecutive iterations was equal to or below 1E−6. We solve for a stationary velocity field (i.e. n_c = 1).

Table 4. Quantitative analysis of the convergence of the Picard, N-PCG and GN-PCG methods. The test problem is the sinusoidal images (see §5.1 for details on the construction of this synthetic registration problem). We compare convergence results for plain H²-regularization (γ = 0; top block) and the Stokes regularization scheme (γ = 1; H¹-regularization; bottom block). We report results for different grid sizes n_x = (n^1_x, n^2_x)^T, n^i_x ∈ {64, 128, 256}, i = 1, 2. We invert for a stationary velocity field (i.e. n_c = 1). We terminate the optimization if the relative change of the ℓ∞-norm of the reduced gradient g^h is at least three orders of magnitude or if the change in J^h between ten successive iterations is below or equal to 1E−6 (i.e. the algorithm stagnates). The regularization parameter is empirically set to β = 1E−3. The number of the (outer) iterations (k⋆), the number of the hyperbolic PDE solves (n_PDE), the relative change of (i) the L²-distance (‖m^h_R − m^h‖_rel), (ii) the objective (δJ^h_rel), and (iii) the (reduced) gradient (‖g^h‖_{∞,rel}), and the average number of required line search steps ᾱ are reported. Note that we introduced a memory for the step size into the Picard method (preconditioned gradient descent) to stabilize the optimization (see §4.3.5 and the description of the results). The definitions for the reported measures are summarized in Tab. 2.

C∞ Registration Problem. Purpose: We study the numerical behavior for smooth registration problems. We report results for grid convergence and deformation regularity. We compare the Picard, the GN-PCG and the N-PCG method.
Setup : This experiment is based on the sinusoidal images (see §5.1 for more details on the constructionof this synthetic registration problem). Therefore, m T , m R ∈ C ∞ ( Ω ) and v (cid:63) ∈ L ([
0, 1 ] ; C ∞ ( Ω ) d ) so thatthe excellent convergence properties of Fourier spectral methods are expected to pay off. Additionally, itis not problematic to apply the N-PCG method. We report results for different grid sizes n x = ( n x , n x ) T , n ix ∈ {
64, 128, 256 } , i =
i = 1, 2; the number of time steps n_t is chosen as a function of n_x. No pre-smoothing is applied. We use an experimentally determined value for β.

Results: The grid convergence results are summarized in Tab. 4. Values derived from the deformation gradient F^h are reported in Tab. 5. Exemplary results for plain H^2-regularization (γ = 0) and the Stokes regularization scheme (γ = 1; H^1-regularization) are displayed in Fig. 4. The definitions of the quantitative measures reported in Tab. 4 and Tab. 5 can be found in Tab. 2.

Observations: The most important observations are: (i) there are significant differences in computational work between the Picard and the Newton-Krylov methods, with the latter being much more efficient; (ii) the differences between the Newton-Krylov methods are insignificant; (iii) the rate of convergence is independent of the grid resolution; and (iv) the numerical accuracy is almost at the order of machine precision.

The registered images are quantitatively (see Tab. 4) and qualitatively (see Fig. 4) in excellent agreement. For the considered tolerance (reduction of the ℓ∞-norm of the reduced gradient by three orders of magnitude) we can reduce the L^2-distance by between three (compressible deformation) and four (incompressible deformation) orders of magnitude (see Tab. 4).

Fig. 4. Qualitative comparison of exemplary registration results of the convergence study reported in Tab. 4. In particular, we display the results for the N-PCG method for a fixed grid size (see Tab. 4). We refer to Tab. 4 and the text for details on the experimental setup. We report results for plain H^2-regularization (γ = 0; images to the left) and the Stokes regularization scheme (γ = 1; H^1-regularization; images to the right). We display the deformed template image m, a pointwise map of the residual differences between m_R and m (which appears completely white, as the residual differences are extremely small), as well as a pointwise map of the determinant of the deformation gradient det(F) (from left to right, as identified by the inset in the images). The values for det(F) are reported in Tab. 5. Information on how to interpret these images can be found in §D.

Table 5. Obtained values for det(F^h) (minimum, maximum, mean, and standard deviation) of exemplary registration results of the convergence study reported in Tab. 4 with respect to the different iterative optimization methods (Picard, N-PCG and GN-PCG). We refer to Tab. 4 and the text for details on the experimental setup. We report results for plain H^2-regularization (γ = 0; top block) and for the Stokes regularization scheme (γ = 1; H^1-regularization; bottom block).

The search direction of the Newton-Krylov methods is nicely scaled; no additional line search steps are necessary. We require 1.57 to 1.71 line search steps for the Picard iteration (on average). Note that we prescale the search direction of the Picard method by an additional parameter α̃_k, which is estimated during the computation (see §4.3.5 for details). Otherwise, the number of line search steps would be seven to eight on average for the Picard method. The Picard method did stagnate during the computations; this is why the gradient has not been reduced by three orders of magnitude for the Picard method. It is in general possible to reduce the gradient accordingly, but we report results only until stagnation, as the number of iterations would increase significantly without making any real progress.

The Newton-Krylov methods display quick convergence. Only five outer iterations are necessary to reduce the gradient by more than four orders of magnitude. The results demonstrate a significant difference in computational work between first- and second-order methods for the considered tolerance. The reconstruction quality improves by approximately one order of magnitude when switching from plain H^2-regularization (γ = 0) to the Stokes regularization scheme (γ = 1; H^1-regularization), as judged by the relative change in the L^2-distance. This is expected, since the synthetic problem has been created under the assumption of mass conservation (i.e. ∇·v⋆ = 0), which the plain H^2-regularization model does not impose on the solution for the same values of β.

From a theoretical point of view, we expect N-PCG to outperform GN-PCG (quadratic vs. superlinear convergence). The reported results demonstrate an almost identical performance. This is due to the fact that we can drive the residual almost to zero, so that we recover fast local convergence for the GN-PCG method (see §4.2.2 for details).

The Picard method converges faster for the Stokes regularization scheme. However, the differences between the Picard and Newton-Krylov methods are still significant, with an approximately four-fold difference in n_PDE. For the Newton-Krylov methods, we globally observe a slight increase in the number of inner iterations when switching from plain H^2-regularization to the Stokes regularization scheme. These differences have to be attributed to a varying relative change in the reduced gradient.

The results reported in Tab. 5 demonstrate an excellent numerical accuracy for mass conservation for all numerical schemes. The error in the determinant of the deformation gradient is almost at the order of machine precision for the finest grid resolution.

Conclusion: We conclude that we can use the Newton-Krylov methods interchangeably. Therefore, given that N-PCG is more sensitive to noise and discontinuities in the data, we will exclusively consider GN-PCG for the remainder of the experiments. Also, if we require an inversion with high accuracy, the Newton-Krylov methods clearly outperform the Picard method (i.e. the preconditioned gradient descent).
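To make the algorithmic comparison concrete, the following sketch shows the structure of a matrix-free inexact (Gauss-)Newton-Krylov iteration of the kind evaluated above: the reduced gradient drives an inner CG solve whose tolerance is tied to the gradient norm (a superlinear forcing term), followed by an Armijo line search. The problem setup, function names, and parameter values are illustrative assumptions, not the paper's implementation; in the actual solver the gradient and each Hessian matvec require hyperbolic PDE (transport) solves.

```python
import numpy as np

def cg(Avec, b, rtol, maxit=200):
    """Plain conjugate gradient for A x = b with a matrix-free operator,
    terminated at a relative residual rtol (this is the 'inexactness')."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rr = r @ r
    b0 = np.sqrt(rr)
    for _ in range(maxit):
        if np.sqrt(rr) <= rtol * b0:
            break
        Ap = Avec(p)
        alpha = rr / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x

def inexact_gauss_newton(residual, jacvec, jtvec, x0, tol=1e-9, maxit=50):
    """Matrix-free inexact Gauss-Newton-Krylov iteration for
    min 0.5*||r(x)||^2. The Gauss-Newton Hessian-vector product
    J^T J v is assembled on the fly from the two matvec callbacks."""
    x = x0.copy()
    g = jtvec(x, residual(x))
    g0 = np.linalg.norm(g, np.inf)
    for _ in range(maxit):
        gnorm = np.linalg.norm(g, np.inf)
        if gnorm <= tol * g0:  # relative reduction of the reduced gradient
            break
        eta = min(0.5, np.sqrt(gnorm / g0))  # superlinear forcing term
        step = cg(lambda v: jtvec(x, jacvec(x, v)), -g, rtol=eta)
        # Armijo backtracking; Newton steps are typically well scaled,
        # so alpha = 1 is almost always accepted
        f0 = 0.5 * residual(x) @ residual(x)
        alpha = 1.0
        for _ in range(30):
            fa = 0.5 * residual(x + alpha * step) @ residual(x + alpha * step)
            if fa <= f0 + 1e-4 * alpha * (g @ step):
                break
            alpha *= 0.5
        x = x + alpha * step
        g = jtvec(x, residual(x))
    return x
```

A linear least-squares problem (where the Gauss-Newton Hessian is exact) recovers the solution of the normal equations in a handful of outer iterations; the nonlinear registration problem replaces the three callbacks with transport solves.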
Images with Sharp Features.

Purpose: We study the grid convergence and deformation regularity for an image with sharp features. We compare the Picard and the GN-PCG methods.
Setup: We consider the UT images (see §5.1 for details on the construction of this synthetic registration problem). We report results for experimentally determined values of β ∈ {1E−2, 1E−3} with respect to different grid resolution levels n^x = (n_x^1, n_x^2)^T, n_x^i ∈ {64, 128, 256}, i = 1, 2; the number of time steps n_t is chosen as a function of n_x. The remainder of the parameters are chosen as stated in the introduction of this section and in §4.3.5. Both plain H^2-regularization (γ = 0) and the Stokes regularization scheme (γ = 1; H^1-regularization) are considered.

For the largest images and the Stokes regularization scheme, we observed difficulties in the inversion (only the number of outer and inner iterations increased; the algorithm still converges to the same solution), due to a strong forcing (i.e. the sharp features pushed the solver at an early stage to a solution that was far away from the final minimizer). We increased the smoothing by a factor of two as a remedy. This is not an issue for the practical application of our algorithm, as our framework features a method for performing a scale continuation as well as a continuation in the regularization parameter; the user therefore does not have to decide on σ or on β. In addition, we are currently investigating adaptive approaches to automatically detect insufficient smoothness during the course of the optimization to prevent a deterioration in the convergence behavior.

Results: Tab. 6 summarizes the results of the convergence study. We illustrate intermediate results with respect to the first 13 (outer) iterations k in Fig. 5 (plain H^2-regularization; γ = 0). We plot the trend of J^h (contributions of the L^2-distance and the regularization model S^h) in Fig. 6. We report measures of deformation regularity in Tab. 7.

Observations: The most important observations are: (i) the GN-PCG method displays a quicker convergence than the Picard method, (ii) we cannot achieve the same inversion accuracy with the Picard method as compared to the GN-PCG method, and (iii) the number of (inner and outer) iterations increases and is no longer independent of the resolution level.

The rate of convergence decreases compared to the results reported in the former section (see Tab. 4). Overall, we require more outer and inner iterations to solve the registration problem. The residual differences between m_R and m clearly depend on the choice of β (see Tab. 6 and Fig. 6). We achieve a similar reduction in the L^2-distance for both the Picard and the GN-PCG method (two to four orders of magnitude). The residual differences are less pronounced when switching from plain H^2-regularization to the Stokes regularization scheme, as compared to the results reported in §5.3.2.

We cannot guarantee that it is possible to reduce the gradient by three orders of magnitude if we use the Picard method. Even if we do not include a condition to terminate upon stagnation (i.e. when the change in J^h falls below the stagnation tolerance), it is not possible to reduce the gradient by three orders of magnitude, as the changes of the objective hit our numerical accuracy (which causes the line search to fail). We do not observe this issue when considering the GN-PCG method. Note that the tolerance of the Krylov subspace method, and therefore the number of inner iterations, depends on the gradient (see (4.5)).

Further, there are significant differences in terms of computational work. If we do not account for the stagnation of the Picard method, we observe an excessive number of hyperbolic PDE solves. Clearly, in a practical application we would terminate the Picard method at an earlier stage, as we no longer make significant progress; in this part of the study, however, we are interested in the convergence properties. This experiment demonstrates that we cannot guarantee a high inversion accuracy (i.e. a significant reduction in the gradient) when turning to first-order methods. Note that we have stabilized the Picard method by introducing an additional scaling parameter for the search direction that prevents additional line search steps (see §4.3.5). If we neglect this scaling, we observe seven to nine line search steps on average (results not included in this study) for the considered problem; also, the optimization fails at an early stage. The search direction obtained via the GN-PCG method is nicely scaled; no additional line search steps are necessary.

Table 6. Quantitative analysis of the convergence for the Picard and the GN-PCG method. The test problem is the UT images (see §5.1 for more details on the construction of this synthetic registration problem). We compare convergence results for plain H^2-regularization (γ = 0; top block) and the Stokes regularization scheme (γ = 1; H^1-regularization; bottom block) for empirically chosen regularization parameters β ∈ {1E−2, 1E−3}. We report results for different grid sizes n^x = (n_x^1, n_x^2)^T, n_x^i ∈ {64, 128, 256}, i = 1, 2. We invert for a stationary velocity field (i.e. n_c = 1). We terminate the optimization if the relative change of the ℓ∞-norm of the reduced gradient g^h is larger than or equal to three orders of magnitude or if the change in J^h between ten successive iterations falls below the stagnation tolerance (i.e. the algorithm stagnates). We report the number of (outer) iterations (k⋆), the number of hyperbolic PDE solves (n_PDE), and the relative change of (i) the L^2-distance (‖m_R^h − m^h‖_rel), (ii) the objective (δJ^h_rel), and (iii) the (reduced) gradient (‖g^h‖_∞,rel), as well as the average number of line search steps ᾱ. Note that we introduced a memory for the step size into the Picard method to stabilize the optimization (see §4.3.5 and the description of the results). The definitions for the reported measures can be found in Tab. 2. This study directly relates to the results for the smooth registration problem (see §5.3.2, in particular Tab. 4).

Table 7. Values for the determinant of the deformation gradient det(F^h) for exemplary results of the convergence study reported in Tab. 6. We report results for the Picard and the GN-PCG method. We refer to Tab. 6 and the text for details on the experimental setup. We report results for the Stokes regularization scheme (γ = 1; H^1-regularization) for a fixed, empirically chosen regularization parameter and grid size. These results directly relate to those reported for the smooth registration problem (see §5.3.2, in particular Tab. 5).

Fig. 5. Illustration of the course of the optimization for the Picard (top block) and the GN-PCG (bottom block) method with respect to the (outer) iteration index k for exemplary results of the convergence study reported in Tab. 6. We refer to Tab. 6 and the text for details on the experimental setup. We report results for plain H^2-regularization (γ = 0) with an empirically chosen regularization parameter; results are shown until convergence of the GN-PCG method. We display the deformed template m (top row) and a map of the pointwise difference between m_R and m (bottom row) for both iterative optimization methods (as identified by the inset on the right of the images). Information on how to interpret these images can be found in §D.

The trends of J^h, the L^2-distance and S^h in Fig. 6 confirm these observations. The plots in Fig. 6 illustrate that the Picard and the GN-PCG method perform similarly during the first few outer iterations. However, after about four outer iterations the differences between the methods manifest, in particular with respect to the reduction of the L^2-distance. This observation confirms standard numerical optimization theory on the convergence properties of Picard and inexact Newton-Krylov methods.

Focusing on the GN-PCG method, we observe that the number of outer iterations is almost constant across different grid sizes. However, the effectiveness of the spectral preconditioner decreases with an increasing grid size as well as with a reduction of the regularization parameter (as judged by an increase in the number of inner iterations). This demonstrates that the preconditioner is not optimal. A similar behavior can be observed for the Picard method.

The numerical accuracy of the incompressibility constraint deteriorates slightly, but not significantly, as compared to the results reported in the former section (see Tab. 7).

Conclusion: We conclude that the GN-PCG method is less sensitive, provides a better inversion accuracy, and overall displays quicker convergence if a high accuracy of the inversion is required and, therefore, is to be preferred.
Note that the Picard method is a gradient descent scheme in the function space induced by W. We can interpret the inverse of A^h as a preconditioner acting on the body force f. This operator is exactly the spectral preconditioner we use for the Newton-Krylov methods, which explains the similar behavior.
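For a pseudospectral discretization, applying the inverse of the regularization operator A^h amounts to a diagonal scaling in Fourier space. The 1D sketch below illustrates this; the concrete symbol k⁴ + 1 is an illustrative H²-type stand-in and not necessarily the operator used in the paper:

```python
import numpy as np

def spectral_precond(g, beta, n, L=2 * np.pi):
    """Apply the inverse of a biharmonic-type regularization operator
    A = beta*(Lap^2 + id) to a periodic 1D field g by diagonal scaling
    in Fourier space. (Illustrative symbol; the actual A depends on the
    chosen H^1/H^2 seminorm.)"""
    k = np.fft.fftfreq(n, d=L / n) * 2 * np.pi  # angular wave numbers
    symbol = beta * (k**4 + 1.0)                # Fourier symbol of A
    return np.real(np.fft.ifft(np.fft.fft(g) / symbol))
```

Applying this operator to A^h g recovers g up to round-off, i.e. the preconditioner is exact for the regularization part of the Hessian; the data term it does not capture is what makes the preconditioner degrade for small β.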
Fig. 6. Trend of the objective J^h, the L^2-distance, and the regularization model S^h (logarithmic scale) for the Picard and the GN-PCG method with respect to the (outer) iteration index k for exemplary results of the convergence study reported in Tab. 6. We refer to Tab. 6 and the text for details on the experimental setup. The trend of the functionals is plotted for the different (empirically determined) choices of β (left and right columns) and a fixed grid size. We report results for plain H^2-regularization (γ = 0; top row) and the Stokes regularization scheme (γ = 1; H^1-regularization; bottom row).

Purpose: It is not immediately evident how the number of coefficient fields v_l^h : R^d → R^d, l = 1, . . . , n_c, affects the registration quality. We study the effects of varying n_c on the reconstruction quality and the rate of convergence. We also provide advice on how to decide on n_c.

Setup: We report results for registration problems of varying complexity. The analysis is limited to the GN-PCG method. We consider the hand images and the brain images. The number of time steps is fixed as a function of n_x. The regularization parameter is set empirically for each data set. We consider the full set of stopping conditions in (4.8) with τ_J = 1E−3, as we no longer compare different methods. The remainder of the parameters are set as stated in §4.3.5.

One possibility to estimate an adequate number of coefficients for the registration of unseen images m_R and m_T is to compute the relative spectral power (see Tab. 2) of an individual coefficient field v_l^h for different choices of n_c. If only a small number of coefficients is necessary to recover the deformation, this energy should decrease rapidly with increasing l. The problem is stationary for n_c = 1.

Results: The trend of the relative ℓ^2-norm (i.e. the spectral power) of an individual coefficient field v_l^h for different choices of n_c ∈ {1, 2, 4, 8, 16} is plotted in Fig. 7. A qualitative comparison of the registration results for different choices of n_c can be found in Fig. 8. Convergence results are reported in Tab. 8.

Table 8. Comparison of the inversion results for the GN-PCG method for a varying number of spatial coefficient fields v_l^h : R^d → R^d, l = 1, . . . , n_c (i.e. we change the number of unknowns in time). We report results for plain H^2-regularization. We consider the hand images (top block) and the brain images (bottom block). We consider the full set of stopping conditions in (4.8) with τ_J = 1E−3, as we no longer study grid convergence and/or compare different optimization methods. We report the number of (outer) iterations (k⋆), the number of hyperbolic PDE solves (n_PDE), and the relative change of (i) the L^2-distance (‖m_R^h − m^h‖_rel), (ii) the objective (δJ^h_rel), and (iii) the (reduced) gradient (‖g^h‖_∞,rel), as well as the minimal and maximal values of the determinant of the deformation gradient. The definitions of these measures can be found in Tab. 2. The number of coefficient fields n_c used to solve the individual registration problems is chosen in {1, 2, 4, 8, 16}.

Fig. 7. Relative power spectrum of the individual coefficient fields v_l^h : R^d → R^d, l = 1, . . . , n_c, for different choices of n_c used to solve the considered registration problems. We report results for plain H^2-regularization. The reported results correspond to Tab. 8. We refer to Tab. 8 and the text for details on the experimental setup. We report exemplary results for the brain images (left) and the hand images (right). We choose n_c in {1, 2, 4, 8, 16}, as indicated in the legend of each plot. The definition of the relative ℓ^2-norm (relative power spectrum) of v_l^h can be found in Tab. 2.

Observations: The most important observation is that we obtain the same results for stationary as well as time-varying velocity fields for two-image registration problems. Qualitatively, we cannot observe any differences for a varying number of coefficient fields (see Fig. 8). This observation is confirmed by the values for the relative reduction in the L^2-distance in Tab. 8. Increasing the number of coefficients slightly reduces the L^2-distance; these differences, however, are practically insignificant for both the hand images and the brain images (plain H^2-regularization). Also, we obtain identical deformation patterns, as judged by careful visual inspection (see Fig. 8) and by the variations in the determinant of the deformation gradient. We obtain identical results for the UT images (for plain H^2-regularization and the Stokes regularization scheme; results not included in this study).

Fig. 8. Qualitative comparison of exemplary registration results for the smallest and the largest considered n_c of the study reported in Tab. 8. We refer to Tab. 8 and the text for details on the experimental setup. We report results for the hand images and the brain images for plain H^2-regularization. We display (for each experiment) the deformed template image m, a pointwise map of the absolute difference between m_R and m, and a map of the determinant of the deformation gradient det(F) (from left to right, as identified by the inset in the images). Information on how to interpret these images can be found in §D.

Turning to the required workload, we observe that the differences are also insignificant. The number of outer iterations is almost constant; only the number of inner iterations varies. In particular, we require 7 outer iterations with (on average) roughly 280 inner iterations (hand images; plain H^2-regularization; γ = 0) and 20-21 outer iterations with (on average) roughly 693 inner iterations (brain images; plain H^2-regularization; γ = 0). The cost per iteration increases as n_c increases: we have to store more coefficient fields v_l^h (to all of which the regularization operator has to be applied). The cost of the forward and adjoint solves (which is the key bottleneck), however, is (almost) the same, since we expand v^h (note that this expansion is not necessary for n_c = 1; see §4.3.2).

The power spectrum of the coefficient fields drops quickly (see Fig. 7). This also indicates that only a small number of coefficients is required to obtain an excellent agreement between the images. However, we expect the differences to manifest when registering time series of images (multiple time frames); here, we might benefit from being able to invert for a time-varying velocity field.
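The selection heuristic for n_c can be made explicit: compute, for each candidate n_c, the relative ℓ²-power of every coefficient field and check how fast it decays in l. A sketch, where the normalization ‖v_l‖₂ / Σ_j ‖v_j‖₂ is our reading of the measure in Tab. 2:

```python
import numpy as np

def relative_power(coeff_fields):
    """Relative l2-power of each temporal coefficient field v_l.
    `coeff_fields` is a sequence of arrays (one per basis function in
    time); a rapid decay in l suggests that fewer coefficients, or even
    a stationary field (n_c = 1), suffice."""
    norms = np.array([np.linalg.norm(v) for v in coeff_fields])
    return norms / norms.sum()
```

The returned fractions sum to one, so plotting them on a logarithmic scale (as in Fig. 7) makes the decay rate directly comparable across different choices of n_c.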
Conclusion: We conclude that it is sufficient to use stationary velocity fields for two-image registration problems.

Parameter Continuation in β.

Purpose: We study the stability and accuracy of the designed parameter continuation method (see §4.3.5) and the associated control over the properties of the mapping. That is, we study how the quantities of interest (the determinant of the deformation gradient and the L^2-distance) behave during the course of the parameter continuation and how close we actually approach the given bounds.

Setup: The registration problems are solved on images of fixed grid size. The number of time points is adapted as required by monitoring the CFL condition (see §4.3.1). We use the full set of stopping conditions (see §4.2.4) with a tolerance of τ_J = 1E−3. We consider the hand images and the brain images (see Fig. 1). We invert for a stationary velocity field (i.e. n_c = 1). For H^1- and H^2-regularization (smoothness regularization; γ = 0) we set the lower bound on det(F^h) for the hand images and the brain images, respectively (the bounds are reported in Tab. 9). For the case of the Stokes regularization (γ = 1; H^1-regularization) we set the bound on the grid angle to θ̃ = π/16 (11.25°). The remainder of the parameters are set as described in §4.3.5.

Results: We report the obtained estimates for β as well as results for the reconstruction quality and deformation regularity in Tab. 9. We provide an exemplary illustration of the obtained registration results in Fig. 9. We report results for the course of the parameter continuation in Fig. 10.

Table 9. Quantitative analysis of the parameter continuation in β. We report results for the hand images and the brain images for different regularization schemes (see Fig. 1). The number of time points n_t is chosen adaptively (see §4.3.5 for details). We use the full set of stopping conditions (see §4.2.4) with a tolerance of τ_J = 1E−3. We report results for plain H^1- and H^2-regularization (γ = 0; top block) as well as for the Stokes regularization scheme (γ = 1; H^1-regularization; bottom block). We invert for a stationary velocity field (i.e. n_c = 1). We report (i) the considered lower bound on the deformation gradient (F̃) or the grid angle (θ̃), (ii) the number of required estimation steps, (iii) the minimal value of the deformation gradient for the optimal regularization parameter, (iv) the computed optimal value for β (β⋆), (v) the minimal change in β (δβ_min), as well as (vi) the relative change in the L^2-distance (‖m_R^h − m^h‖_rel).

Observations: The most important observation is that we can precisely control the properties of our mapping without having to manually tune any parameters. We only have to decide on geometric bounds (the smallest tolerable deformation of a grid element or a bound on the shear angle of a grid cell), a decision that is intuitive for practitioners.

The accuracy of our method (in space) is only limited by the grid resolution (i.e. how many frequencies we can resolve; this statement is confirmed by the experiments conducted in §5.3.2) as well as by the defined bounds on the binary search used to estimate β (see §4.3.5).
Clearly, the desired level of accuracy competes with the computational work we are willing to invest.

For plain H^1- and H^2-regularization, we achieve an excellent agreement between m_R and m (see Fig. 9), with a reduction of the L^2-distance by approximately half an order of magnitude for the brain images and 1.5 orders of magnitude for the hand images (see Tab. 9). The discrepancy between the lower bound F̃ and min(det(F^h)) for the obtained optimal value of β is small; the absolute differences (reported in Tab. 9) are well above the attainable accuracy reported in §5.3.2.

For the results reported for the Stokes regularization scheme, we can qualitatively (see Fig. 9) and quantitatively (see Tab. 9) observe that enforcing incompressibility up to numerical accuracy is too strong a prior for the considered problem. However, the key observation and intention of this experiment is to demonstrate that we attain a deformation that is very well behaved (see the map of det(F^h) in Fig. 9). If we further decrease the bound on det(F^h), we will lose control and generate a mapping that locally is close to being nondiffeomorphic. We again emphasize that the intention of this work is the study of algorithmic properties. We will address the practical benefit of exploiting a model of (near-)incompressible flow in a follow-up paper and refer to [10, 16, 52, 62] for potential applications. This exemplary result on real-world data demonstrates that it might be beneficial to consider a relaxation of the incompressibility constraint in order to improve the mismatch between the considered images while maintaining as much control on the deformation regularity as possible.

In future work, we will focus on improvements of the computational efficiency for estimating β.
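The estimation of β reduces to a bracketed search: lower β until the geometric bound is violated, then bisect between the last accepted and first rejected values. A sketch under the (strong) simplification that a full registration solve is replaced by a callback returning min(det(F^h)); the names and defaults are ours, not the paper's:

```python
def continuation_in_beta(solve, det_bound, beta0=1.0, dbeta_min=1e-3, reduce=10.0):
    """Binary-search continuation in the regularization weight beta.
    `solve(beta)` stands in for a full registration solve and returns
    min(det(F^h)) for that weight; beta0 is assumed large enough that
    the geometric bound is satisfied at the start."""
    beta = beta0
    accepted = beta0
    # coarse phase: divide beta until the bound is violated
    while solve(beta) >= det_bound:
        accepted = beta
        beta /= reduce
    lo, hi = beta, accepted  # lo was rejected, hi was accepted
    # fine phase: bisect until the bracket is tighter than dbeta_min
    while hi - lo > dbeta_min:
        mid = 0.5 * (lo + hi)
        if solve(mid) >= det_bound:
            hi = mid  # accepted: tighten from above
        else:
            lo = mid  # rejected
    return hi  # smallest accepted beta found
```

Since every trial value is a full solve, warm-starting each trial from the previous velocity field (as a continuation scheme does) is what keeps this search affordable.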
We have tested combining it with a grid continuation but could not observe strong improvements. We will also investigate the idea of providing a coarse estimate of β via the spectral properties of the Hessian and from there on doing a parameter continuation (see §4.3.5 for additional comments).

Conclusion: We conclude that the designed framework is highly accurate, is stable, guarantees deformation regularity (assuming that the user-defined tolerance is sufficiently bounded away from irregularities), and does not require any additional tuning of parameters. The user merely has to provide a lower bound on an acceptable volume change or a bound on the distortion of a volume element (shear angle), a decision that is intuitive for practitioners.

Fig. 9. Qualitative illustration of exemplary registration results for the parameter continuation in β reported in Tab. 9. We refer to Tab. 9 and the text for details on the experimental setup. We report results for the brain images (top row; plain smoothness regularization; γ = 0) and the hand images (middle row: plain smoothness regularization (γ = 0); bottom row: Stokes regularization scheme (γ = 1; H^1-regularization)). We display the deformed template image m, a map of the absolute difference between m_R and m_T and between m_R and m, and a map of the determinant of the deformation gradient det(F) (from left to right, as indicated by the inset in the images). Information on how to interpret these images can be found in §D.
6. Conclusions.
We have presented numerical methods for large deformation diffeomorphic nonrigid image registration that (i) operate in a black-box fashion, (ii) introduce novel algorithmic features (including a second-order Newton-Krylov method (a full Newton method and a Gauss-Newton approximation), spectral preconditioning, an efficient solver for Stokes problems, and a spectral Galerkin method in time), (iii) are stable and efficient, and (iv) guarantee deformation regularity with an explicit control on the quality of the deformation.

In addition, we have conducted a detailed numerical study to demonstrate computational performance and numerical behavior on synthetic and real-world problems. The most important observations of our study are the following:

• The Newton-Krylov methods outperform the globalized Picard method (see §5.3.2) as we increase the image size and the registration fidelity.
Fig. 10. Exemplary illustration of the course of the parameter continuation in β for the quantitative results reported in Tab. 9. We refer to Tab. 9 and the text for details on the experimental setup. We report results for the brain images (top row) and the hand images (bottom row) for plain smoothness regularization. For each experiment, we display (from left to right) (i) the trend of the minimal value of the determinant of the deformation gradient (the dashed line indicates the user-defined lower bound on det(F^h)), (ii) the trend of the L^2-distance (h^d is the grid cell volume and r := m_R^h − m_T^h, r ∈ R^ñ, ñ = ∏_i n_x^i), and (iii) the trend of β, all with respect to the parameter continuation step. We indicate our judgment on the results in color: if a result is accepted (i.e. min(det(F^h)) ≥ F̃), we plot the marker in green; if a result is rejected (i.e. min(det(F^h)) < F̃), we plot the marker in red. The optimal value is plotted in blue. The plots correspond to the results reported in Tab. 9.

• We can enforce incompressibility with high accuracy. The numerical accuracy (in space) is only limited by the resolution of the data (see §5.3.2).

• We can compute deformations that are guaranteed to be regular (i.e. a local diffeomorphism) up to user specifications. Controlling the magnitude of det(F) is not sufficient, as volume elements might still collapse (deformation fields with strong shear). Therefore, we introduced a parameter continuation in β that can be interfaced not only with lower bounds on the determinant of the deformation gradient but also with bounds on the geometric properties of the grid cells (in particular, the shear angle of a grid cell; see §5.4).
• The experiments reported in this study demonstrate that it is adequate to limit the inversion to stationary velocity fields when considering two-image registration problems. This observation is in accordance with results reported for other classes of large deformation image registration algorithms [1, 3, 44, 52, 69, 70]. We have additionally provided advice on how to decide on the number of unknowns in time (see §5.3.3).

The control equation for the velocity is a space-time nonlinear elliptic system, but the main cost in our formulation is the solution of transport problems to compute the image transformation and the adjoint variables. That is, we require two hyperbolic PDE solves for computing the gradient (which essentially corresponds to one Picard iteration or one outer iteration of a Newton-Krylov method; see Alg. 1) and an additional two hyperbolic PDE solves for evaluating the incremental control equation (the Hessian matrix-vector product in the Newton-Krylov methods; see Alg. 2) in each inner iteration of the Krylov subspace method. Because we use a pseudospectral discretization in space, the elliptic solve for the Picard iteration is (for quadratic regularization models) only at the cost of a spectral diagonal scaling. For the Newton-Krylov methods, we have to solve a linear system using an iterative solver. The Picard scheme has a lower cost per iteration but requires more iterations than the Newton-Krylov scheme.

Our results demonstrate that there is a significant difference in stability, computational work, and accuracy between the Picard and Newton-Krylov methods, especially when we require a high accuracy of the inversion. If we only require an inaccurate solution or use a strong regularization, the differences between the Picard and Newton-Krylov methods are less pronounced. Better preconditioning of the Hessian
would make the Newton-Krylov approach preferable across the spectrum of accuracy requirements. The Newton-Krylov approach is not significantly more complex, since we essentially use the same numerical tools that have been used for the solution of the first-order optimality conditions. Also, the individual building blocks (the (incremental) forcing term, the regularization operator A, and the projection operator K) that appear in the first- and second-order optimality systems are very similar. Therefore, the difference between solving the first- or the second-order optimality conditions essentially amounts to interfacing a Krylov subspace method to solve the saddle point problem.

By formulating nonrigid image registration as a problem of optimal control, we target the design of a generic, biophysically constrained framework for large deformation diffeomorphic image registration. Further, there are many applications that do require incompressible or near-incompressible deformations, for example, in medical image analysis. Our framework provides such a technology.

We report results only in two dimensions. Nothing in our formulation and numerical approximation is specific to the 2D case. The next steps will be its extension to 3D and to problems that have time sequences of images. For such cases, we expect to have to invert for a nonstationary velocity field. In addition, we aim at designing a framework that allows for a relaxation of the incompressibility constraint, as we observed in this study that incompressibility might be too strong a prior. We also observed (results not included in this study) that the use of an incompressibility constraint can promote shear. In a follow-up paper, we will target this problem by introducing a novel continuum mechanical model that allows us to control the shear inside the deformation field.

Acknowledgments.
We would like to thank Amir Gholaminejad, Georg Stadler, and Bryan Quaife for helpful discussions and comments. We would like to thank Florian Tramnitzke for his initial work on this project. This material is based upon work supported by AFOSR grants FA9550-12-10484 and FA9550-11-10339; by NSF grants CCF-1337393 and OCI-1029022; by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under Award Numbers DE-SC0010518, DE-SC0009286, and DE-FG02-08ER2585; and by NIH grant 10042242. Any opinions, findings, and conclusions or recommendations expressed herein are those of the authors and do not necessarily reflect the views of the AFOSR or the NSF.
Appendix A. Expansion in Time: Derivation.
This section summarizes the modifications of the regularization operator as well as the (incremental) control equation on account of the expansion in time (see §4.3.2). Inserting (4.10) into (3.1) yields

(A.1)  ∫_0^1 S[v] dt = β ∑_{l=1}^{n_c} ∑_{l'=1}^{n_c} c_{ll'} ∫_Ω ⟨B[v_l], B[v_{l'}]⟩ dx,  where c_{ll'} := ∫_0^1 b_l b_{l'} dt ∀ l, l' ∈ N.

Taking first and second variations with respect to the l-th expansion coefficient v_l yields the control equation

(A.2)  β ∑_{l'=1}^{n_c} c_{ll'} A[v_{l'}] + ∫_0^1 b_l K[f] dt = 0,  l = 1, ..., n_c,

and the incremental control equation

(A.3)  H_l[ṽ_l] := β ∑_{l'=1}^{n_c} c_{ll'} A[ṽ_{l'}] + ∫_0^1 b_l K[f̃] dt,  l = 1, ..., n_c,

respectively. Accordingly, the operators B and A simply act on v_l instead of v. The definitions of these operators can be found in §4.1. We use a global basis on the unit time horizon for the expansion (see §4.3.2). We use Chebyshev polynomials as basis functions in (4.10) on account of their excellent approximation properties as well as their orthogonality. The latter property considerably reduces the computational complexity, since only the diagonal coefficients c_{ll}, l' = l, are nonzero if Chebyshev-Gauss-Lobatto nodes are used.
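The discrete orthogonality underlying this simplification can be verified numerically. The following sketch checks that the coupling matrix assembled from Chebyshev polynomials sampled at Chebyshev-Gauss-Lobatto nodes is diagonal up to round-off; the node count N and the number of basis functions n_c are illustrative choices, not values taken from the paper.

```python
import numpy as np

# Discrete orthogonality of Chebyshev polynomials at Chebyshev-Gauss-Lobatto
# (CGL) nodes: the discrete coupling matrix C[l, l'] is diagonal, which is the
# property that decouples the expansion coefficients.
N = 16                                    # number of CGL intervals (example)
j = np.arange(N + 1)
x = np.cos(np.pi * j / N)                 # CGL nodes on [-1, 1]

w = np.ones(N + 1)                        # trapezoidal-type quadrature weights
w[0] = w[-1] = 0.5                        # endpoints halved

n_c = 5                                   # number of basis functions (example)
T = np.array([np.cos(l * np.arccos(x)) for l in range(n_c)])   # T_l(x_j)

# discrete coupling matrix C[l, l'] = sum_j'' T_l(x_j) T_{l'}(x_j)
C = T @ np.diag(w) @ T.T

off_diag = np.max(np.abs(C - np.diag(np.diag(C))))
print(off_diag)                           # ~ 0 up to round-off
```

With a diagonal coupling matrix, the sum over l' in (A.2) collapses to a single term, so each expansion coefficient can be treated by its own elliptic operator; this is the computational saving the text refers to.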
Appendix B. Incompressibility Constraint: Elimination.
Here, we derive the elimination of p and p̃ from the optimality systems for the Stokes regularization scheme. We only consider a quadratic H^1-regularization for the velocity v. However, the same line of arguments applies to the H^2-regularization model. Applying the divergence to (4.1d) results in

−∇·(β ∇^2 v) + ∇^2 p + ∇·f = 0  in Ω × [0, 1].

Under the optimality assumption ∇·v = 0 we obtain p = −∇^{−2}(∇·f). Inserting this expression into (4.1d) projects v onto the manifold of divergence-free velocity fields and as such eliminates (4.1c) (assuming that the initial v is divergence-free). Accordingly, we obtain the control equation (reduced gradient)

(B.1)  −β ∇^2 v + K[f] = 0,  where K[f] := −∇(∇^{−2}(∇·f)) + f,

to replace (4.1d). This expression is equivalent to (4.11) in the case when an H^1-regularization model is used (i.e., A = −∇^2). As stated above, the derivation also holds for the H^2-regularization operator; we only have to replace −β ∇^2 v by β ∇^4 v. Computing the second variations of the weak form of the eliminated system yields the incremental control equation

−β ∇^2 ṽ + K[f̃] = −ĝ,

where ĝ is the reduced gradient in (B.1).

Appendix C. Relation to LDDMM.
In this section we relate our work to [41] and by that to approaches based on LDDMM [3, 4, 23, 53, 68]. Since the work in [4, 41] is based on first-order information, we only consider the reduced gradient in (4.1d) (setting γ = 0):

∫_0^1 ⟨g, ṽ⟩_{L^2(Ω)} dt = ∫_0^1 ⟨β (BB^H)[v] + f, ṽ⟩_{L^2(Ω)} dt = ∫_0^1 ⟨β v + (BB^H)^{−1}[f], ṽ⟩_W dt.

The expression β v + (BB^H)^{−1}[f] is, up to a rescaling by 1/β, equal to v + (β BB^H)^{−1}[f], which is exactly the gradient in the function space W that has been used in [41]. This expression yields the preconditioned gradient descent scheme

v^h_{k+1} = v^h_k − α_k ((β B^h B^{h,H})^{−1}[f^h_k] + v^h_k),

where −(β B^h B^{h,H})^{−1}[f^h_k] is nothing but a Picard iterate (see (4.7)). Subtracting v^h_k translates this iterate into an update. This is exactly the formulation we have used in this work (see §4.2.3), so that the considered first-order method is equivalent to the solver used in [41] (under the assumption that α_k = 1, i.e., if we neglect the line search). Accordingly, the same line of arguments used in [41] to relate their work to LDDMM applies to our numerical framework.
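The projection operator K that appears in (B.1) and in the Picard iterate admits a simple spectral realization on a periodic domain. The following sketch, which is not the authors' implementation (the grid size and the test field are illustrative choices), applies K[f] = f − ∇(∇^{−2}(∇·f)) via FFTs and verifies that the result is divergence-free.

```python
import numpy as np

# FFT-based sketch of the projection K[f] = f - grad(inv_laplacian(div f)) on
# a periodic 2D domain [0, 2*pi)^2 (a Leray-type projection onto the space of
# divergence-free vector fields). Grid size and test field are illustrative.
n = 64
k = np.fft.fftfreq(n, d=1.0 / n)                 # integer wave numbers
kx, ky = np.meshgrid(k, k, indexing="ij")
k2 = kx**2 + ky**2
k2[0, 0] = 1.0                                   # avoid 0/0 for the mean mode

def project_div_free(f1, f2):
    """Remove the curl-free (gradient) part of the vector field (f1, f2)."""
    F1, F2 = np.fft.fft2(f1), np.fft.fft2(f2)
    div_hat = 1j * kx * F1 + 1j * ky * F2        # Fourier symbol of div f
    phi_hat = div_hat / (-k2)                    # inv_laplacian(div f)
    return (np.fft.ifft2(F1 - 1j * kx * phi_hat).real,
            np.fft.ifft2(F2 - 1j * ky * phi_hat).real)

x = np.linspace(0, 2 * np.pi, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")
f1, f2 = np.sin(X) * np.cos(Y), np.cos(X) * np.sin(Y)   # not divergence-free
g1, g2 = project_div_free(f1, f2)

# the divergence of the projected field (spectral derivative) should vanish
G1, G2 = np.fft.fft2(g1), np.fft.fft2(g2)
div = np.fft.ifft2(1j * kx * G1 + 1j * ky * G2).real
print(np.max(np.abs(div)))                       # ~ 0 up to round-off
```

Because the inverse Laplacian is a diagonal scaling in Fourier space, applying K costs only a few FFTs; this is consistent with the observation in the paper that the elliptic solves reduce to spectral diagonal scalings.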
Appendix D. Measures of Deformation Regularity.
Since we discuss the implementation of the incompressibility constraint, we set γ = 1.
D.1. Deformation Map.
To visualize the deformation pattern, y has to be inferred from v. This can be done by solving

(D.1)  ∂_t u + (∇u) v = v  in Ω × (0, 1],  u = 0  in Ω × {0},

with periodic boundary conditions on ∂Ω. Here, u : Ω̄ × [0, 1] → R^d, (x, t) ↦ u(x, t), is a displacement field and y := x − u_1, y : Ω̄ → R^d, where u_1 := u(·, t = 1), u_1 : Ω̄ → R^d, x ↦ u_1(x).

Visualization: As can be seen in the visualization of the deformed grids, the mapping y actually corresponds to the inverse of the deformation map applied to an image. This reflects the fact that our model is formulated in an Eulerian frame of reference. Note that all images reported are high-resolution vector graphics. Zooming in on the digital version of the paper will reveal local properties of the deformation map.

D.2. Deformation Gradient.
It is well known from calculus that the determinant of the Jacobian matrix det(∇y) can be used to assess the invertibility of y as well as local volume change, provided that y ∈ C^1(Ω)^d. In the framework of continuum mechanics, we can obtain this information from the deformation tensor field F : Ω̄ × [0, 1] → R^{d×d}, where F is related to v by

(D.2)  ∂_t F + (v·∇) F = (∇v) F  in Ω × (0, 1],  F = I  in Ω × {0},

with periodic boundary conditions on ∂Ω. Here, I = diag(1, ..., 1) ∈ R^{d×d}; det(F_1) is equivalent to det(∇y), where F_1 := F(·, t = 1), F_1 : Ω̄ → R^{d×d}, x ↦ F_1(x).

Visualization: We limit the color map for the display of det(F_1) to [0, 2]. In particular, the color map ranges from black (compression: det(F_1) ∈ (0, 1); black corresponds to values of 0 or below (due to clipping), which represents a singularity or the loss of mass, respectively) to orange (mass conservation: det(F_1) = 1) to white (expansion: det(F_1) > 1; white represents values of 2 or greater (due to clipping)).
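As a small illustration of this regularity measure, the following sketch evaluates det(∇y) for a synthetic, volume-preserving shear map on a periodic grid using spectral differentiation. The map and grid size are illustrative choices; in the paper, F is obtained by integrating (D.2) for the computed velocity field.

```python
import numpy as np

# Sketch: evaluate the regularity measure det(grad y) for a displacement field
# on a periodic 2D grid via spectral differentiation. The pure-shear map below
# is an illustrative, volume-preserving example (not data from the paper).
n = 64
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
X1, X2 = np.meshgrid(x, x, indexing="ij")

# displacement u and deformation map y = x - u (Eulerian convention, cf. D.1)
u1, u2 = -0.3 * np.sin(X2), np.zeros_like(X1)    # pure shear along x1
y1, y2 = X1 - u1, X2 - u2

k = np.fft.fftfreq(n, d=1.0 / n)
KX1, KX2 = np.meshgrid(k, k, indexing="ij")

def ddx(f, K):
    """Spectral partial derivative of a periodic field f."""
    return np.fft.ifft2(1j * K * np.fft.fft2(f)).real

# Jacobian of y: since y = x - u, differentiate only the periodic part u
# (y itself is not periodic, so its derivative is I minus grad u)
J11 = 1.0 - ddx(u1, KX1)
J12 = -ddx(u1, KX2)
J21 = -ddx(u2, KX1)
J22 = 1.0 - ddx(u2, KX2)

det = J11 * J22 - J12 * J21
print(det.min(), det.max())     # both ~ 1 for a volume-preserving map
```

A shear map like this one is exactly the case the conclusions warn about: det(∇y) stays at one everywhere even though the cells are strongly sheared, which is why the bound on det(F) is complemented by a bound on the shear angle of the grid cells.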
REFERENCES
[1] V. Arsigny, O. Commowick, X. Pennec, and N. Ayache, A Log-Euclidean framework for statistics on diffeomorphisms, in Lect Notes Comput Sc, vol. 4190, 2006, pp. 924–931.
[2] J. Ashburner, A fast diffeomorphic image registration algorithm, NeuroImage, 38 (2007), pp. 95–113.
[3] J. Ashburner and K. J. Friston, Diffeomorphic registration using geodesic shooting and Gauss-Newton optimisation, NeuroImage, 55 (2011), pp. 954–967.
[4] M. F. Beg, M. I. Miller, A. Trouvé, and L. Younes, Computing large deformation metric mappings via geodesic flows of diffeomorphisms, Int J Comput Vis, 61 (2005), pp. 139–157.
[5] M. Benzi, G. H. Golub, and J. Liesen, Numerical solution of saddle point problems, Acta Numerica, 14 (2005), pp. 1–137.
[6] M. Benzi, E. Haber, and L. Taralli, A preconditioning technique for a class of PDE-constrained optimization problems, Adv Comput Math, 35 (2011), pp. 149–173.
[7] G. Biros and G. Doğan, A multilevel algorithm for inverse problems with PDE constraints, Inverse Probl, 24 (2008).
[8] G. Biros and O. Ghattas, Parallel Lagrange-Newton-Krylov-Schur methods for PDE-constrained optimization. Part I: The Krylov-Schur solver, SIAM J Sci Comput, 27 (2005), pp. 687–713.
[9] G. Biros and O. Ghattas, Parallel Lagrange-Newton-Krylov-Schur methods for PDE-constrained optimization. Part II: The Lagrange-Newton solver and its application to optimal control of steady viscous flows, SIAM J Sci Comput, 27 (2005), pp. 714–739.
[10] A. Borzì, K. Ito, and K. Kunisch, Optimal control formulation for determining optical flow, SIAM J Sci Comput, 24 (2002), pp. 818–847.
[11] J. P. Boyd, Chebyshev and Fourier spectral methods, Dover, Mineola, New York, US, 2000.
[12] F. Brezzi and M. Fortin, eds., Mixed and hybrid finite element methods, Springer, 1991.
[13] C. Broit, Optimal registration of deformed images, PhD thesis, Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania, US, 1981.
[14] M. Burger, J. Modersitzki, and L. Ruthotto, A hyperelastic regularization energy for image registration, SIAM J Sci Comput, 35 (2013), pp. B132–B148.
[15] R. H. Byrd, F. E. Curtis, and J. Nocedal, An inexact SQP method for equality constrained optimization, SIAM J Optim, 19 (2008), pp. 351–369.
[16] K. Chen and D. A. Lorenz, Image sequence interpolation using optimal control, J Math Imag Vis, 41 (2011), pp. 222–238.
[17] K. Chen and D. A. Lorenz, Image sequence interpolation based on optical flow, segmentation and optimal control, IEEE T Image Process, 21 (2012), pp. 1020–1030.
[18] G. E. Christensen, R. D. Rabbitt, and M. I. Miller, 3D brain mapping using a deformable neuroanatomy, Phys Med Biol, 39 (1994), pp. 609–618.
[19] G. E. Christensen, R. D. Rabbitt, and M. I. Miller, Deformable templates using large deformation kinematics, IEEE T Image Process, 5 (1996), pp. 1435–1447.
[20] R. S. Dembo, S. C. Eisenstat, and T. Steihaug, Inexact Newton methods, SIAM J Numer Anal, 19 (1982), pp. 400–408.
[21] R. S. Dembo and T. Steihaug, Truncated-Newton algorithms for large-scale unconstrained optimization, Math Program, 26 (1983), pp. 190–212.
[22] M. Droske and M. Rumpf, A variational approach to non-rigid morphological registration, SIAM Appl Math, 64 (2003), pp. 668–687.
[23] P. Dupuis, U. Grenander, and M. I. Miller, Variational problems on flows of diffeomorphisms for image matching, Quart Appl Math, 56 (1998), pp. 587–600.
[24] S. C. Eisenstat and H. F. Walker, Choosing the forcing terms in an inexact Newton method, SIAM J Sci Comput, 17 (1996), pp. 16–32.
[25] B. Fischer and J. Modersitzki, Fast diffusion registration, Contemp Math, 313 (2002), pp. 117–129.
[26] B. Fischer and J. Modersitzki, Curvature based image registration, J Math Imag Vis, 18 (2003), pp. 81–85.
[27] B. Fischer and J. Modersitzki, Ill-posed medicine – an introduction to image registration, Inverse Probl, 24 (2008), pp. 1–16.
[28] C. Frohn-Schauf, S. Henn, and K. Witsch, Multigrid based total variation image registration, Comput Visual Sci, 11 (2008), pp. 101–113.
[29] P. E. Gill, W. Murray, and M. H. Wright, Practical optimization, Academic Press, Waltham, Massachusetts, US, 1981.
[30] G. H. Golub and C. F. Van Loan, Matrix Computations, The Johns Hopkins University Press, 3rd ed., 1996.
[31] A. Gooya, K. M. Pohl, M. Bilello, L. Cirillo, G. Biros, E. R. Melhem, and C. Davatzikos, GLISTR: Glioma image segmentation and registration, IEEE T Med Imaging, 31 (2013), pp. 1941–1954.
[32] M. D. Gunzburger, Perspectives in flow control and optimization, SIAM, Philadelphia, Pennsylvania, US, 2003.
[33] M. E. Gurtin, An introduction to continuum mechanics, vol. 158 of Mathematics in Science and Engineering, Academic Press, 1981.
[34] E. Haber and U. M. Ascher, Preconditioned all-at-once methods for large, sparse parameter estimation problems, Inverse Probl, 17 (2001), pp. 1847–1864.
[35] E. Haber, U. M. Ascher, and D. Oldenburg, On optimization techniques for solving nonlinear inverse problems, Inverse Probl, 16 (2000), pp. 1263–1280.
[36] E. Haber, R. Horesh, and J. Modersitzki, Numerical optimization for constrained image registration, Numer Linear Algebr, 17 (2010), pp. 343–359.
[37] E. Haber and J. Modersitzki, Numerical methods for volume preserving image registration, Inverse Probl, 20 (2004), pp. 1621–1638.
[38] E. Haber and J. Modersitzki, A multilevel method for image registration, SIAM J Sci Comput, 27 (2006), pp. 1594–1607.
[39] E. Haber and J. Modersitzki, Image registration with guaranteed displacement regularity, Int J Comput Vis, 71 (2007), pp. 361–372.
[40] L. Han, J. H. Hipwell, B. Eiben, D. Barratt, M. Modat, S. Ourselin, and D. J. Hawkes, A nonlinear biomechanical model based registration method for aligning prone and supine MR breast images, IEEE T Med Imaging, 33 (2014), pp. 682–694.
[41] G. L. Hart, C. Zach, and M. Niethammer, An optimal control approach for deformable registration, in Proc CVPR IEEE, 2009, pp. 9–16.
[42] S. Henn, A multigrid method for a fourth-order diffusion equation with application to image processing, SIAM J Sci Comput, 27 (2005), pp. 831–849.
[43] M. Hernandez, Gauss-Newton inspired preconditioned optimization in large deformation diffeomorphic metric mapping, Phys Med Biol, 59 (2014), pp. 6085–6115.
[44] M. Hernandez, M. N. Bossa, and S. Olmos, Registration of anatomical images using paths of diffeomorphisms parameterized with stationary vector field flows, Int J Comput Vis, 85 (2009), pp. 291–306.
[45] C. Hogea, C. Davatzikos, and G. Biros, Brain-tumor interaction biophysical models for medical image registration, SIAM J Sci Comput, 30 (2008), pp. 3050–3072.
[46] B. K. P. Horn and B. G. Schunck, Determining optical flow, Artif Intell, 17 (1981), pp. 185–203.
[47] E. M. Kalmoun, L. Garrido, and V. Caselles, Line search multilevel optimization as computational methods for dense optical flow, SIAM J Imag Sci, 4 (2011), pp. 695–722.
[48] E. Lee and M. Gunzburger, An optimal control formulation of an image registration problem, J Math Imag Vis, 36 (2010), pp. 69–80.
[49] E. Lee and M. Gunzburger, Analysis of finite element discretization of an optimal control formulation of the image registration problem, SIAM J Numer Anal, 49 (2011), pp. 1321–1349.
[50] D. Loeckx, F. Maes, D. Vandermeulen, and P. Suetens, Nonrigid image registration using free-form deformations with a local rigidity constraint, in Lect Notes Comput Sc, vol. 3216, 2004, pp. 639–646.
[51] A. Mang, A. Toma, T. A. Schuetz, S. Becker, T. Eckey, C. Mohr, D. Petersen, and T. M. Buzug, Biophysical modeling of brain tumor progression: from unconditionally stable explicit time integration to an inverse problem with parabolic PDE constraints for model calibration, Med Phys, 39 (2012), pp. 4444–4459.
[52] T. Mansi, X. Pennec, M. Sermesant, H. Delingette, and N. Ayache, iLogDemons: A demons-based registration algorithm for tracking incompressible elastic biological tissues, Int J Comput Vis, 92 (2011), pp. 92–111.
[53] M. I. Miller, Computational anatomy: Shape, growth and atrophy comparison via diffeomorphisms, NeuroImage, 23 (2004), pp. S19–S33.
[54] J. Modersitzki, Numerical methods for image registration, Oxford University Press, New York, 2004.
[55] J. Modersitzki, FLIRT with rigidity—image registration with a local non-rigidity penalty, Int J Comput Vis, 76 (2008), pp. 153–163.
[56] J. Modersitzki, FAIR: Flexible algorithms for image registration, SIAM, Philadelphia, Pennsylvania, US, 2009.
[57] O. Museyko, M. Stiglmayr, K. Klamroth, and G. Leugering, On the application of the Monge-Kantorovich problem to image registration, SIAM J Imaging Sci, 2 (2009), pp. 1068–1097.
[58] M. Nielsen, P. Johansen, A. D. Jackson, and B. Lautrup, Brownian warps: A least committed prior for non-rigid registration, in Lect Notes Comput Sc, vol. 2489, 2002, pp. 557–564.
[59] J. Nocedal and S. J. Wright, Numerical Optimization, Springer, New York, New York, US, 2006.
[60] X. Pennec, R. Stefanescu, V. Arsigny, P. Fillard, and N. Ayache, Riemannian elasticity: A statistical regularization framework for non-linear registration, in Lect Notes Comput Sc, vol. 3750, 2005, pp. 943–950.
[61] T. Rohlfing, C. R. Maurer, D. A. Bluemke, and M. A. Jacobs, Volume-preserving nonrigid registration of MR breast images using free-form deformation with an incompressibility constraint, IEEE T Med Imaging, 22 (2003), pp. 730–741.
[62] P. Ruhnau and C. Schnörr, Optical Stokes flow estimation: An imaging-based control approach, Exp Fluids, 42 (2007), pp. 61–78.
[63] M. Rumpf and B. Wirth, A nonlinear elastic shape averaging approach, SIAM J Imaging Sci, 2 (2009), pp. 800–833.
[64] M. Sdika, A fast nonrigid image registration with constraints on the Jacobian using large scale constrained optimization, IEEE T Med Imaging, 27 (2008), pp. 271–281.
[65] A. Sotiras, C. Davatzikos, and N. Paragios, Deformable medical image registration: A survey, IEEE T Med Imaging, 32 (2013), pp. 1153–1190.
[66] H. Sundar, C. Davatzikos, and G. Biros, Biomechanically constrained 4D estimation of myocardial motion, in Lect Notes Comput Sc, vol. 5762, 2009, pp. 257–265.
[67] J. P. Thirion, Image matching as a diffusion process: An analogy with Maxwell's demons, Med Image Anal, 2 (1998), pp. 243–260.
[68] A. Trouvé, Diffeomorphism groups and pattern matching in image analysis, Int J Comput Vis, 28 (1998), pp. 213–221.
[69] T. Vercauteren, X. Pennec, A. Perchant, and N. Ayache, Symmetric log-domain diffeomorphic registration: A demons-based approach, in Lect Notes Comput Sc, vol. 5241, 2008, pp. 754–761.
[70] T. Vercauteren, X. Pennec, A. Perchant, and N. Ayache,