[PDF] Elastic 3D-2D Image Registration

Abstract

We propose a method to non-rigidly align a three-dimensional (3D) volumetric image with a two-dimensional (2D) planar image representing a projection of the deformed volume. The application in mind comes from biological studies in which 2D intravital microscopy videos of living tissue are recorded, after which the tissue is excised and a more detailed 3D volume microscopy is performed. Coregistration of both data sets allows to combine the temporal (but 2D) information with more detailed spatial 3D information. Our approach is variational and uses a hyperelastic deformation regularization, as is appropriate for biological material. As a particular feature, the out of plane deformation is estimated based on the out of focus blur inside the 2D microscopy image. The approach becomes computationally feasible through the use of a coarse-to-fine optimization strategy and higher order optimization methods.

Full PDF

EElastic 3D-2D Image Registration

Paul Striewski ∗ Benedikt Wirth † Abstract

We propose a method to non-rigidly align a three-dimensional (3D) volumetric image witha two-dimensional (2D) planar image representing a projection of the deformed volume. Theapplication in mind comes from biological studies in which 2D intravital microscopy videosof living tissue are recorded, after which the tissue is excised and a more detailed 3D volumemicroscopy is performed. Coregistration of both data sets allows to combine the temporal(but 2D) information with more detailed spatial 3D information.Our approach is variational and uses a hyperelastic deformation regularization, as is ap-propriate for biological material. As a particular feature, the out of plane deformation isestimated based on the out of focus blur inside the 2D microscopy image. The approachbecomes computationally feasible through the use of a coarse-to-ﬁne optimization strategyand higher order optimization methods.

In this article we propose an elastic 3D-2D image registration technique, which is motivated by achallenging image processing problem arising in biological studies of leucocyte extravasation intoinﬂamed tissue in mice. A common experimental procedure consists of irritating a certain tissueregion and performing two-dimensional intravital microscopy (IVM) to monitor the reactions onthe cellular level, see for instance [4]. After a suﬃciently long video sequence has been recorded,the animal is sacriﬁced, the tissue is removed, stained, and ﬁxed, and a single, volumetric 3Dimage is acquired using a 3D confocal microscope. The two obtained datasets therefore showessentially the same tissue region, one being an elastically deformed conﬁguration of the other.While the IVM recording shows in addition the temporal behaviour of the moving leucocytes, theconfocal microscopy image contains more detailed structures such as the blood vessel basementmembranes. Aligning and merging both datasets allows to combine the enhanced structural infor-mation obtained by the 3D confocal microscope with temporal information gained from the IVM.Both the 2D and 3D dataset use ﬂourescent staining and thus possess multiple colour channelscorresponding to diﬀerent ﬂourescent dyes. In particular, one channel of both images containsthe same biological structures (for instance the endothelial cell membranes) so that those can beused for registration. Our approach for registering the images is based on mimicking the imageacquisition system of the 2D microscope, including a non-rigid tissue deformation and a blurringand projection step. The intensities of the blurred and projected image volume are then comparedwith those of a 2D reference image (for instance a single frame of the IVM video), see ﬁg. 1 for theoverall procedure. The goal of the registration process is then to ﬁnd a deformation transformingthe volumetric image in such a way that the similarity of the two datasets is maximal. The set ofadmissible deformations is chosen in a way that reﬂects physical properties of the underlying tissue.In this article, we present a complete mathematical description of the aforementioned procedureand, using the direct method in the calculus of variations, guarantee the existence of an admissibledeformation which maximizes the similarity between the datasets. Although rigid or parameterized ∗ Applied Mathematics M¨unster,

[email protected] † Applied Mathematics M¨unster,

[email protected] a r X i v : . [ m a t h . O C ] J a n igure 1: Forward model of the relation between the 2D and 3D datasets. Given a specimen ofthe tissue, 3D confocal microscopy yields a corresponding 3D image of voxels. Taking the tissueconﬁguration during confocal microscopy as a reference, the 2D microscopy image is obtained by a(geometrically nonlinear) deformation and a subsequent 2D image acquisition in which structuresoutside the focus plane are blurred during the projection onto the 2D image plane.3D-2D registration [18, 25] and elastic 3D-3D registration [3] has been studied before, elastic 3D-2Dregistration has so far not been attempted to the best of our knowledge.Xu and Wan presented an intensity based 3D-2D registration technique for the matching of a CTvolume to 2D X-Ray images and a description of how their method can be implemented on GPUs[24]. Studies on CT to X-Ray Fluoroscopy registration for guidance of surgical intervention werepublished by Otake et al. [18] and Uneri et al. [23]. All three articles employ rigid motions toobtain optimal alignment, which can be expected to yield satisfactory registration results for thisspeciﬁc type of problem.Also descriptions of nonrigid methods can be found in the literature. Zheng and Yu presenteda spline based 3D-2D registration technique [25] for matching 3D CT data to 2D X-Ray imagesusing a statistical deformation model. The article also references a number of publications whichdescribe similar methods. In contrast to our setting, in all above studies the 2D images result froma 3D volume via the X-Ray transform, while our 2D images stem from optical microscopy and thusdepict structures outside the focus plane with blur. On the one hand this blur makes the lateralalignment of structures between the 3D and 2D image more diﬃcult (also computationally), onthe other hand it can provide additional information about the deformation along the viewingdirection, which in the above studies must come solely from the deformation model.Heldmann and Papenberg presented a variational method to register 3D CT data to a sequence of2D ultrasound slices [10]. Note that their 2D images thus stem from point- or slicewise evaluationof a 3D volume rather than from an integral transform as in our and the above approaches. Thewell-posedness of this problem thus requires high regularity of the 3D image to be registered and thecorresponding deformation, which the authors provide employing curvature regularization. Theyinclude results of the application of their technique to clinical data, however, no in depth analysisof the presented energy functional is presented. Berkels et al. studied 2D-3D surface registration,where graph representations of cortical surfaces had to be matched to 2D brain images. Unlikein our case, an optimal deformation ψ : R → R had to be found. For the regularization ofthis highly ill-posed problem, the authors suggested a regularizer based on the second order thinplate spline energy of ψ [2]. A diﬀerent, yet still related, situation can arise in surface to surfacematching problems, if surface regions are locally described by two dimensional images which thenin turn can be processed further by 2D-2D image matching algorithms. Such methods were forexample described by Merelli et al. [14, 15].Our method will estimate displacement in viewing direction from the severity of out of focus blur.A closely related problem, known as depth-from-defocus , is to estimate the distance between anobject and the image acquisition device from a whole image stack (rather than just one image asin our case) that was generated by capturing the object under varying focal settings. The forwardmodel described by Persch et al. [20], which mimics the image acquisition of a thin lens camera, isomparable to our 2D microscope model. Likewise related is the article by Aguet et al. [1], whichsuggests a method of how an image stack, produced by moving a sample through the diﬀerentfocal planes of a 2D microscope, can be combined into a single feature enriched image.For further methods we refer to the extensive overview [13] of 3D-2D registration techniques witha focus on applications in image guided surgery.In the next section we present the mathematical model and the analysis of existence of solutionsin detail. Section 3 describes the numerical implementation, while section 4 shows the behaviourof the method in carefully chosen test cases and a real biological dataset. We continue with the mathematical description of our registration problem. We ﬁrst brieﬂydescribe the forward operator of the 2D microscope and subsequently a physically reasonablemodel of tissue deformations.For simplicity, our exposition will consider the 3D and 2D unit domains Ω = [ − , andΩ = [ − , , respectively. The volumetric image stemming from 3D confocal microscopy isdenoted u : Ω → [0 , ∞ ) and is considered to represent the tissue reference conﬁguration.In comparison to the IVM the quality of the 3D microscopy process is typically good enough toassume u noise- and blurfree. The 2D microscopy image u : Ω → [0 , ∞ ) will be interpretedas resulting from a non-rigid deformation of u and a subsequent projection into the plane.Above, both u and u are assumed to represent only that colour channel in which the samebiological structures are visible; furthermore we will assume both images to be uniformly bounded(which is appropriate since the image intensities can be interpreted as the local concentration ofﬂuorescent molecules).The forward operator of 2D microscopy can be modelled by F : L ∞ (Ω ) → L ∞ (Ω ) , ( F u )( x , x ) = [ χ ∗ u ]( x , x , , where the blurring kernel χ ∈ L ( R ) has compact support and encodes the microscope’s pointspread function. Above, L p denotes the standard Lebesgue function space, and χ ∗ u denotesconvolution (after extending u by zero outside Ω ). Due to | [ χ ∗ u ]( x ) − [ χ ∗ u ](˜ x ) | = (cid:12)(cid:12)(cid:12)(cid:12)(cid:90) R ( χ ( x − z ) − χ (˜ x − z )) u ( z ) d z (cid:12)(cid:12)(cid:12)(cid:12) ≤ (cid:107) χ ( x −· ) − χ (˜ x −· ) (cid:107) L (cid:107) u (cid:107) L ∞ → ˜ x → x χ ∗ u ∈ C c ( R ) (the space of continuous functions with compact support) so that F iswell-deﬁned and even maps into C (Ω ) (the space of continuous functions on Ω ). The value χ (( x , x , − z ) can be interpreted as the amount of light recorded at ( x , x ) by the 2D microscopefrom a point source of light at z , which is readily seen from ( F u )( x , x ) = (cid:82) R χ (( x , x , − z ) u ( z ) d z . In our model we assume the blurring kernel to be spatially independent so that theobserved blur of a point z ∈ Ω only depends on its height z . A simple example (used in ourcalculations due to lack of a properly measured point spread function) is given by χ ( x ) = (cid:40) πc x if | x | ≤ , x + x ≤ c x , z outside the focus plane R × { } uniformly onto a disc of radius cz . Remark 1 (Compactness of forward operator) . The previous calculations even show that F is acompact operator from L ∞ (Ω ) to C (Ω ) (as expected for convolutions, even though we onlyextract a slice from the convolution result). Indeed, due to | [ χ ∗ u ]( x ) − [ χ ∗ u ](˜ x ) | ≤ (cid:107) χ ( x − · ) − χ (˜ x − · ) (cid:107) L (cid:107) u (cid:107) L ∞ , the image under F of a bounded subset A ⊂ L ∞ (Ω ) is equicontinuous andthus compact in C (Ω ) by the Arzel`a–Ascoli theorem. However, in our analysis we will notmake use of this fact. emark 2 (Less image regularity) . With additional conditions on the blurring kernel χ one mayreduce the image regularity to u ∈ L q (Ω ) and obtain a continuous linear forward operator F : L q (Ω ) → L q (Ω ) for any q ≥ . Indeed, assume that (cid:107) χ ( · , · , x ) (cid:107) L ≤ C uniformly foralmost all x ∈ R and some C > (note that this is satisﬁed by our example kernel), then for u ∈ L q (Ω ) we have F u ( x , x ) = (cid:90) R u ( · , · , x ) ∗ χ ( · , · , − x ) d x using Fubini’s theorem and thus |F u ( x , x ) | q ≤ K (cid:82) R | u ( · , · , x ) ∗ χ ( · , · , − x ) | q d x by Jensen’sinequality for some K > depending on the support of χ . Therefore, by Fubini’s theorem andYoung’s convolution inequality we have (cid:107)F u (cid:107) qL q = (cid:90) Ω |F u | q ( x , x ) d( x , x ) ≤ K (cid:90) R (cid:90) Ω | u ( · , · , x ) ∗ χ ( · , · , − x ) | q d( x , x ) d x ≤ K (cid:90) R (cid:107) u ( · , · , x ) (cid:107) qL q (cid:107) χ ( · , · , − x ) (cid:107) qL d x ≤ KC q (cid:90) R (cid:107) u ( · , · , x ) (cid:107) qL q d x ≤ KC q (cid:107) u (cid:107) qL q . Between the 2D and the 3D image acquisition the tissue is deformed by a deformation y : Ω → R , where for any point x ∈ Ω the value y ( x ) shall be interpreted as the new position assumedduring the 3D image acquisition. The inverse deformation y − thus moves the reference conﬁgu-ration back into the conﬁguration during the 2D microscopy so that we expect u = F ( u ◦ y ).However, due to noise in the image acquisition and additional artefacts not included in our model(such as diﬀuse background signals) one cannot expect equality. Instead we shall seek a deforma-tion y that leads to a small dissimilarity measure J d [ y ] = (cid:90) Ω d ( F ( u ◦ y )( x ) , u ( x )) d x , where d : R × R → [0 , ∞ ) measures the distance between two image intensities (again u isextended by zero outside Ω ). The optimal choice of d is in general determined by the type ofnoise contained in the image data. We will focus on the L and L distance measures obtainedwith d ( a, b ) = | a − b | , d ( a, b ) = | a − b | , respectively. The former is well-known to be appropriate if the data contains strong outliers, whilethe latter is appropriate for additive Gaussian noise. Other choices include correlation-based dis-tance measures (see for instance [16, Sct. 6.1]) or the Kullback–Leibler Divergence d KL ( a, b ) = b − a + a log ab in case of Poisson noise.The minimization of J d [ y ] for general mappings y : Ω → R is neither physically reasonablenor well-posed. Indeed, y might be discontinuous or not even measurable so that the composition u ◦ y does not make sense. Thus we have to regularize the deformation y by imposing additionalconstraints and adding an extra energy term. Concerning physical constraints, y should neitherreverse the orientation of the deformed body Ω nor should it introduce self-intersections of thematerial. The ﬁrst condition translates todet ∇ y > (cid:90) Ω det ∇ y ( x ) d x ≤ vol( y (Ω )) , where vol shall denote the three-dimensional Lebesgue measure. The advantage of this constraintis that it guarantees the injectivity of y if combined with det ∇ y > y k ) k satisfying the constraint, see for example [5, Thm. 7.9-1]. We shallalso constrain the maximal possible displacement according to (cid:107) y (cid:107) L ∞ ≤ diam(Ω )in order not to shift the three-dimensional volume out of the visible area. In addition to those hardconstraints, unreasonable growth or shrinkage of lengths, areas, or volumes by the deformationshould be penalized. Modelling the biological tissue as an elastic material, such a penalization canbe achieved by adding an elastic deformation energy R [ y ] = (cid:90) Ω W ( ∇ y ) d x as regularization, which will prevent physically unreasonable deformations with too high elasticenergy. Here, W : R × → [0 , ∞ ) represents the stored energy function . Since biological tissueis very soft, it typically undergoes considerable nonlinear deformation so that the elastic modelshould be geometrically nonlinear and, in particular, rigid motion invariant. Furthermore, dueto lack of better information we may assume a homogeneous and isotropic elastic constitutivelaw. Together, the above assumptions imply that the stored energy function can be written as afunction of the singular values or of the invariants of its argument (see for example [5, Ch. 4]) sothat W ( A ) = ˆ W ( (cid:107) A (cid:107) F , (cid:107) cof A (cid:107) F , det A )with the Frobenius norm (cid:107) A (cid:107) F = (cid:112) tr( A T A ) and the cofactor matrix cof A = det AA − T . Note that (cid:107) A (cid:107) F , (cid:107) cof A (cid:107) F , and det A control length, area, and volume changes respectively. Furthermore,biological tissue is almost incompressible so that deviation from det ∇ y = 1 should be stronglypenalized. In our numerical experiments we will use the particular example W ( ∇ y ) = c (cid:107)∇ y (cid:107) F + g (det ∇ y )for a positive constant c and a function g : R → [0 , ∞ ) such that g ( x ) → ∞ for x →

0. Tofavour deformations obeying det ∇ y ≈

1, a term like | det ∇ y − | can be incorporated into thedeﬁnition of g . Note that this function is polyconvex (that is, can be written as a convex functionof ( ∇ y, cof ∇ y, det ∇ y )), which ensures weak lower semi-continuity of R as will be needed for theexistence analysis. Summarizing, we arrive at the optimization problem E d [ y ] = J d [ y ] + R [ y ] → min!whose solution is the sought matching deformation. Remark 3 (Bayesian perspective) . The task of registering the two datasets can also be viewed as aninverse problem, where a measurement u is given alongside with a model of the forward operator y (cid:55)→ F ( u ◦ y ) , which maps a deformation of the tissue into the corresponding projection onto aplanar image. Due to measurement noise and unknown modelling errors in the forward operator,the measurement is a random variable. Likewise, the sought deformation y can be interpreted asa random variable with diﬀerent outcomes in diﬀerent repetitions of the measurement. From aBayesian perspective, one would now like to maximize the conditional probability P ( F ( u ◦ y ) | u ) = P ( u |F ( u ◦ y )) P ( y ) P ( u ) (we formally use P like a probability density over an inﬁnite-dimensional space). After taking thenegative logarithm, the optimization problem thus turns into min y − log P ( F ( u ◦ y ) | u ) = min y [ − log P ( u |F ( u ◦ y )) − log P ( y ) + log P ( u ) ] . The term log P ( u ) is independent of y and can be neglected, and for the probability distributionof deformations it is reasonable to assume a Boltzmann-type distribution P ( y ) ∼ exp( −R [ y ]) ,n which the probability decreases exponentially with increasing deformation energy. Likewise, theprobability distribution of the measurement ( u ) k in the k th image pixel is taken as P (( u ) k | ( v ) k ) ∼ exp( − d ( v k , ( u ) k )) , where v k = ( F ( u ◦ y )) k is the expected value in pixel k . For instance, if thenoise distribution is Gaussian we have P (( u ) k | ( v ) k ) ∼ exp( − d ( v k , ( u ) k )) . Integrating overall pixels we obtain P ( u |F ( u ◦ y )) ∼ exp( − (cid:82) Ω d ( F ( u ◦ y )( x ) , u ( x )) d x ) = exp( −J d [ y ]) .Summarizing, we arrive at the same optimization problem, min y J d [ y ] + R [ y ] . Theorem 1 (Existence of minimizer) . Let u ∈ L ∞ (Ω ) and u ∈ L ∞ (Ω ) . Also, let E d = J d + R for a ﬁnite nonnegative dissimilarity d ( · , · ) , convex and lower semi-continuous in itsﬁrst argument, and a polyconvex lower semi-continuous stored energy function W satisfying W ( A ) ≥ C (cid:0) (cid:107) A (cid:107) pF + max { , det A } − r (cid:1) − β for some exponents p > , r > and constants C, β > . Then E d admits a minimizing deforma-tion y on the set of admissible deformations A = (cid:26) y ∈ W ,p (Ω , R ) (cid:12)(cid:12)(cid:12)(cid:12) (cid:107) y (cid:107) L ∞ ≤ diam(Ω ) , det ∇ y > almost everywhere, (cid:90) Ω det ∇ y ( x ) d x ≤ vol( y (Ω )) (cid:27) ( W ,p denotes the standard Sobolev space). Furthermore, y is almost everywhere injective.Proof. We follow the direct method of the calculus of variations. First note that E d is boundedbelow by − β vol(Ω ) and that inf A E d is ﬁnite due to E d [id] < ∞ . Compactness:

Consider a minimizing sequence y k ∈ A , k = 1 , , . . . , with E d [ y k ] → inf A E d monotonically as k → ∞ . Due to the growth condition on W we have E d [ y ] ≥ E d [ y k ] ≥ C (cid:107)∇ y k (cid:107) p − β vol(Ω )so that (cid:107)∇ y k (cid:107) L p is uniformly bounded. Together with the admissibility condition (cid:107) y k (cid:107) L ∞ ≤ diam(Ω ) we obtain uniform boundedness of (cid:107) y k (cid:107) W ,p so that we can extract a weakly convergingsubsequence (still indexed by k ) y k (cid:42) y in W ,p (Ω ; R ). Due to p >

3, by Sobolev embeddingwe may even assume y k → y strongly in the space C ,α (Ω ; R ) of H¨older continuous functionswith exponent α < − p/ Lower semi-continuity of R : We have R [ y ] ≤ lim inf k →∞ R [ y k ] by the properties of W . Indeed,by H¨older’s inequality cof ∇ y k and det ∇ y k are uniformly bounded in L p/ (Ω ) and L p/ (Ω ),respectively, and thus converge for a subsequence. By [5, Thm. 7.6-1] we even havecof ∇ y k (cid:42) cof ∇ y in L p/ (Ω ) and det ∇ y k (cid:42) det ∇ y in L p/ (Ω ) . Now Mazur’s lemma implies the existence of a sequence of strongly and pointwise almost every-where converging convex combinations N n (cid:88) i = k a ki ( ∇ y i , cof ∇ y i , det ∇ y i ) → ( ∇ y, cof ∇ y, det ∇ y )as k → ∞ , where N n ≥ k and the nonnegative coeﬃcients a ki sum up to one. Since W is polyconvexwe can write W ( A ) = ˜ W ( ∇ A, cof ∇ A, det ∇ A ) for a convex function ˜ W : R × × R × × R → R .Thus, with Fatou’s lemma and the lower semi-continuity of W we now have R [ y ] = (cid:90) Ω ˜ W ( ∇ y, cof ∇ y, det ∇ y ) d x = (cid:90) Ω ˜ W (cid:32) lim k →∞ N n (cid:88) i = k a ki ( ∇ y i , cof ∇ y i , det ∇ y i ) (cid:33) d x ≤ (cid:90) Ω lim inf k →∞ N n (cid:88) i = k a ki ˜ W ( ∇ y i , cof ∇ y i , det ∇ y i ) d x ≤ lim inf k →∞ N n (cid:88) i = k a ki (cid:90) Ω ˜ W ( ∇ y i , cof ∇ y i , det ∇ y i ) d x = lim inf k →∞ N n (cid:88) i = k a ki R [ y i ] ≥ lim inf k →∞ R [ y k ] . roperties of limit function: The limit function y lies in A . Indeed, by the uniform convergence y k → y we have (cid:107) y (cid:107) L ∞ = lim k →∞ (cid:107) y k (cid:107) L ∞ ≤ diam(Ω ). To see that det ∇ y > S ε = { x ∈ Ω | det ∇ y ( x ) < ε } for ε >

0. The growth condition on W and the lower semi-continuity of R implyvol( S ε )( Cε − r − β ) ≤ R [ y ] ≤ lim inf k →∞ R [ y k ] ≤ lim inf k →∞ E d [ y k ] ≤ E d [ y ] . Thus, vol( S ε ) → ε → ∇ y > ∇ y k (cid:42) det ∇ y and the convergence y k → y in C ,α (Ω ) we have (cid:90) Ω det ∇ y d x = lim k →∞ (cid:90) Ω det ∇ y k d x ≤ lim k →∞ vol( y k (Ω )) = vol( y (Ω )) . Furthermore, y is injective almost everywhere, that is, the cardinality N ( y | v ) = card( y − ( { v } ))equals 1 for almost every v ∈ y (Ω ). Indeed, by the change of variables formula for Sobolevfunctions [12, Thm. 2] we havevol( y (Ω )) ≤ (cid:90) y (Ω ) N ( y | v ) d v = (cid:90) Ω det ∇ y ( x ) d x . Since also the opposite inequality holds, we must have equality and N ( y | v ) = 1 almost everywhere. Lower semi-continuity of J d : Note that we have u ◦ y k → u ◦ y as k → ∞ in any L q (Ω )with q ∈ [1 , ∞ ). Indeed, for a Dirac sequence G δ of smooth molliﬁers we have (cid:107) u ◦ y k − u ◦ y (cid:107) L q ≤ (cid:107) u ◦ y k − ( G δ ∗ u ) ◦ y k (cid:107) L q + (cid:107) ( G δ ∗ u ) ◦ y k − ( G δ ∗ u ) ◦ y (cid:107) L q + (cid:107) ( G δ ∗ u ) ◦ y − u ◦ y (cid:107) L q . Abbreviating s = ( r + 1) /r and employing H¨older’s inequality, for the ﬁrst summand we obtain (cid:90) Ω | u ◦ y k − ( G δ ∗ u ) ◦ y k | q d x ≤ (cid:90) Ω (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) | u ◦ y k − ( G δ ∗ u ) ◦ y k | q det ∇ y s k det ∇ y s k (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) d x ≤ (cid:18)(cid:90) Ω | u ◦ y k − ( G δ ∗ u ) ◦ y k | qs det ∇ y k d x (cid:19) s (cid:18)(cid:90) Ω det ∇ y − s − k d x (cid:19) − s = (cid:32)(cid:90) y k (Ω ) | u − G δ ∗ u | qs N ( y k | v ) d v (cid:33) s (cid:18)(cid:90) Ω det ∇ y − rk d x (cid:19) r +1 ≤ (cid:107) u − G δ ∗ u (cid:107) qL qs ( R ) (cid:18) R [ y k ] + β vol(Ω ) C (cid:19) r +1 , where we used the change of variables for Sobolev functions [12, Thm. 2] as well as N ( y k | v ) = 1for almost all v ∈ y k (Ω ) (by the same argument as for y ). Since u ∈ L qs (Ω ) and R [ y k ] ≤E d [ y k ] ≤ E d [ y ], the right-hand side converges to 0 as k → ∞ and then δ →

0. For the secondsummand we observe (cid:107) ( G δ ∗ u ) ◦ y k − ( G δ ∗ u ) ◦ y (cid:107) qL q ≤ L qδ (cid:107) y k − y (cid:107) qL q ,L δ being the Lipschitz constant of G δ ∗ u . Again, letting ﬁrst k → ∞ and then δ → u ◦ y k → u ◦ y .ue to u ∈ L ∞ (Ω ), the composition u ◦ y k is uniformly bounded in L ∞ (Ω ) so that anysubsequence contains another weakly-* converging subsequence in L ∞ (Ω ). Due to the strongconvergence u ◦ y k → u ◦ y in L q (Ω ), the limit must be the same and thus u ◦ y k ∗ (cid:42) u ◦ y in L ∞ (Ω )for the whole sequence. Furthermore it is straightforward to check that F is the adjoint operatorto F (cid:48) : L (Ω ) → L (Ω ) , g (cid:55)→ ( x (cid:55)→ [ g ∗ χ ( −· , −· , − x )]( x , x )) , which is a bounded linear operator due to (cid:107)F (cid:48) g (cid:107) L = (cid:90) − (cid:107) g ∗ χ ( −· , −· , − x ) (cid:107) L d x ≤ (cid:90) − (cid:107) g (cid:107) L (cid:107) χ ( −· , −· , − x ) (cid:107) L d x = (cid:107) g (cid:107) L (cid:107) χ (cid:107) L by Young’s convolution inequality. As a consequence, F ( u ◦ y k ) ∗ (cid:42) F ( u ◦ y ) in L ∞ (Ω ), sincefor any g ∈ L (Ω ) we have (cid:90) Ω g F ( u ◦ y k ) d x = (cid:90) Ω F (cid:48) ( g ) u ◦ y k d x → (cid:90) Ω F (cid:48) ( g ) u ◦ y d x = (cid:90) Ω g F ( u ◦ y ) d x as k → ∞ . The convexity of d in its ﬁrst argument now implies lim inf k →∞ J d [ y k ] ≥ J d [ y ], asdesired.Summarizing, E d [ y ] = J d [ y ] + R [ y ] ≤ lim inf k →∞ J d [ y k ] + R [ y k ] = lim inf k →∞ E d [ y k ] = inf A E d so that y ∈ A must be a minimizer. In this section we discuss the discretization and numerical minimization of the energy functional E d [ y ] = (cid:90) Ω d ( F ( u ◦ y )( x ) , u ( x ) ) d x + (cid:90) Ω W ( ∇ y ( x )) d x , which is nontrivial due to the nonlocal convolution operator in F , the composition of discretizedfunctions, and the nondiﬀerentiability of the discretized functions. As before, d denotes either theEuclidean distance d ( x, y ) = | x − y | or its square d ( x, y ) = | x − y | , but other choices can beimplemented in the same way. To obtain a diﬀerentiable functional in the former case (which willallow simpler numerics), we make the modiﬁcation d ( x, y ) = (cid:112) ( x − y ) + δ with δ > W has the form W ( A ) = c (cid:107) A (cid:107) F + c (det A ) − + c (1 − det A ) + D (1)with constants c , c , c ≥ D ∈ R such that W ≥ W ( S ) = 0 for rotation matrices S .This speciﬁc choice violates the growth condition of theorem 1 (which was needed to apply achange of variables formula for Sobolev functions, while the lower semi-continuity of R could alsobe obtained for weaker growth conditions [19, Thm. 3.6]), however, we observed no indication ofdegeneration of the deformations in our numerical experiments so that the above choice seemedsuﬃcient. Note that other stored energy functions can be implemented just as well, for instancethe strain energy densities of Neo-Hookean materials, W ( A ) = 12 µ ( (cid:107) A (cid:107) F − A )) + λ − det A ) − µ , which only diﬀer from our choice by the slightly weaker penalty term for volume compression.Note also that in our experiments the constraints (cid:82) Ω det ∇ y ( x ) d x ≤ vol( y (Ω )) and (cid:107) y (cid:107) L ∞ ≤ diam(Ω ) were always satisﬁed without explicit enforcement.ssuming a twice diﬀerentiable 3D image u , the ﬁrst and second Gˆateaux derivatives of J d in y ∈ A for suitable variations φ and ψ are ∂ J d [ y ]( φ ) = (cid:90) Ω ∂ d ( F ( u ◦ y )( x ) , u ( x )) ( χ ∗ [ φ · ( ∇ u ) ◦ y ]) ( x , x ,

0) d x ,∂ J d [ y ]( φ, ψ ) = H L ( φ, ψ ) + H NL ( φ, ψ ) with H L ( φ, ψ ) = (cid:90) Ω ∂ d ( F ( u ◦ y )( x ) , u ( x )) (cid:0) χ ∗ (cid:2) ψ T ((Hess u ) ◦ y ) φ (cid:3)(cid:1) ( x,

0) d x , H NL ( φ, ψ ) = (cid:90) Ω ∂ d ( F ( u ◦ y )( x ) , u ( x )) ( χ ∗ [ φ · ( ∇ u ) ◦ y ]) ( x , x ,

0) ( χ ∗ [ ψ · ( ∇ u ) ◦ y ]) ( x , x ,

0) d x , where Hess u denotes the Hessian of u . While for convex d the second summand in ∂ J d [ y ]is always positive semi-deﬁnite, the ﬁrst summand may destroy this deﬁniteness. For the sake ofcompleteness, we also write down the expressions for the ﬁrst and second Gˆateaux derivative ofthe hyperelastic regularizer. Rewriting W deﬁned in (1) in the form W ( A ) = ¯ W ( (cid:107) A (cid:107) F , det A ) with W ( I , I ) = c I + c I − + c (1 − I ) + D , its partial derivatives are given by ∂ ¯ W ( I , I ) = c , ∂ ¯ W ( I , I ) = − c I − − c (1 − I ) ,∂ ¯ W ( I , I ) = ∂ ∂ ¯ W ( I , I ) = 0 , ∂ ¯ W ( I , I ) = 2 c I − + 2 c . With the help of the identities ∂ A det( A )( B ) = tr( B T cof A ) and ∂ A ( (cid:107) A (cid:107) F )( B ) = 2tr( B T A ) andabbreviating I = (cid:107)∇ y (cid:107) F and I = det ∇ y , the Gˆateaux derivatives of the hyperelastic regularizerread ∂ R [ y ]( φ ) = (cid:90) Ω ∂ ¯ W ( I , I ) tr( ∇ φ T ∇ y ) + ∂ ¯ W ( I , I ) tr( ∇ φ T cof ∇ y ) d x ,∂ R [ y ]( φ, ψ ) = (cid:90) Ω ∂ ¯ W ( I , I )tr( ∇ φ T ∇ ψ ) + ∂ ¯ W ( I , I ) tr( ∇ ψ T cof ∇ y ) tr( ∇ φ T cof ∇ y )+ ∂ ¯ W ( I , I ) I (cid:2) tr( ∇ ψ T cof ∇ y ) tr( ∇ φ T cof ∇ y ) − tr( ∇ φ T cof ∇ y ∇ ψ T cof ∇ y ) (cid:3) d x . Spatial discretization.

The image domains Ω and Ω are discretized by two dyadicallynested hierarchies of regular rectilinear grids ( T n ) n =1 ,...,N and ( T n ) n =1 ,...,N , respectively. Thenumber of nodes in the n th grid along each coordinate direction is M n = 2 n + 1, where theresolution of the given discrete input images determines N , the level of the ﬁnest grids T N and T N .Introducing multilinear Finite Element basis functions on each grid gives rise to a hierarchy of C -Finite Element spaces X n = ( X n × X n ) n =1 ,...,N with X n ⊂ X m whenever n ≤ m andcorresponding restriction and prolongation operators chosen as follows. On a one-dimensionalgrid with M n nodes a Finite Element function ξ can be identiﬁed with the vector ( ξ , . . . , ξ M n )of its nodal values. On such a grid we deﬁne the one-dimensional restriction and prolongationoperators as R n : ( ξ j ) ≤ j ≤ M n (cid:55)→ (cid:16) ξ j − +2 ξ j − + ξ j (cid:17) ≤ j ≤ ( M n +1) / , P n : ( ξ j ) ≤ j ≤ M n (cid:55)→ ( ξ , ξ + ξ , ξ , ξ + ξ , . . . , ξ M n − , ξ Mn − + ξ Mn , ξ M n )(for ease of notation we set ξ = ξ and ξ M n +1 = ξ M n ). The restriction and prolongation operators R n : X n → X n − , P n : X n → X n +1 re then obtained by applying consecutively R n and P n along each coordinate direction of thegrid.Fixing a grid T n with N n = (2 n + 1) nodes ( x , . . . , x N n ), we denote the Finite Element basisfunctions on T n by φ n , . . . , φ nN n . The Finite Element representation of u in X N is taken as u N = (cid:80) N N j =1 u ( x j ) φ Nj and in X n as u n = R n +1 · · · R N u N (note that we will denote discretizedfunctions on grid level n with a superscript n ). Similarly, the discretized deformation is expressedas y n = (cid:80) N n j =1 y nj φ nj with nodal coeﬃcients y nj ∈ R . The discretized version u n of u on grid T n is deﬁned in an analogous way. Discretized cost functional.

The evaluation of the data term J d and its derivatives requiresthe computation of F ( u ◦ y ). On the discretized level, the convolution contained in F is computedwith the help of the discrete Fourier transform (DFT). To this end, u n ◦ y n and the discretizedconvolution kernel χ n are evaluated at all grid points, and the resulting grid functions are paddedwith zeros so as to emulate the DFT using the fast Fourier transform on a periodic grid (in ourcase using routines of the FFTW project ). The resulting grid functionspeciﬁes the nodal values of a multilinear Finite Element function, which is then restricted to the x - x -plane to yield the discretized forward operator F n ( u n ◦ y n ). Using second order Gaussianquadrature on each element, the discrete analogue of J d is now computed as J dn [ y n ] = A n (cid:88) q w q d ( F n ( u n ◦ y n )( q ) , u n ( q )) , where A n stands for the area of each element in T n , w q denotes the quadrature weights, andthe sum is taken over all quadrature points q . Similarly, the discretized regularizer is evaluatedvia R n [ y n ] = V n (cid:88) q w q W ( ∇ y n ( q ))with V n the volume of each element in T n . Finally, E dn = J dn + R n . Discretized functional derivatives.

For the numerical evaluation of ∂ J d we exploit that theadjoint operator to a convolution is the cross-correlation. In more detail, for functions f : Ω → R , g : Ω → R we have (cid:90) Ω f ( x )[ χ ∗ g ]( x , x ,

0) d x = (cid:90) Ω (cid:90) Ω f ( x ) χ (( x , x , − y ) g ( y ) d y d x = (cid:90) Ω (cid:90) Ω f ( x ) χ (( x , x , − y ) d x g ( y ) d y = (cid:90) Ω [ χ (cid:5) f ]( y ) g ( y ) d y , where we used Fubini’s theorem and [ χ (cid:5) f ]( y ) = [ χ ( −· , −· , − y ) ∗ f ]( y , y ) denotes the adjointoperator to the convolution evaluated in the x - x -plane. Denoting by ∗ n our discrete approxima-tion of the convolution described previously, our discrete approximation of (cid:5) applied to two FiniteElement functions f n ∈ X n and χ n ∈ X n is computed as the Finite Element function χ n (cid:5) n f n = χ n ∗ n Bf n , where B : X n → X n is the operator copying all nodal function values from T n into the x - x -plane of T n and leaving all other nodal values 0. Using the above notation we can write ∂ J d [ y ]( φ ) = (cid:90) Ω [ φ · ( ∇ u ) ◦ y ] ( x ) [ χ (cid:5) ∂ d ( F ( u ◦ y, u )] ( x ) d x . Correspondingly, using second order Gaussian quadrature, the discretized derivative is calculatedfor each Finite Element basis function φ n,kj = e k φ nj (with e k the k th Cartesian unit vector) as ∂J dn [ y n ]( φ n,kj ) = V n (cid:88) q w q (cid:104) φ n,kj · ( ∇ u n ) ◦ y n (cid:105) ( q ) [ χ n (cid:5) n ( ∂ d ( F n ( u n ◦ y n , u n ))]( q ) . he discretized analogue of ∂ R can be written as ∂R n [ y n ]( φ k,nj ) = V n (cid:88) q w q tr (cid:16) ( ∇ φ k,nj ) T ( q ) (cid:2) c ∇ y n ( q ) − (cid:2) c det( ∇ y n ( q )) − + 2 c (1 − det( ∇ y n ( q ))) (cid:3) cof( ∇ y n ( q )) (cid:3)(cid:17) , and ∂E dn = ∂J dn + ∂R n . Discretized second derivatives.

The second derivative of J d can be written as the sum ∂ J d [ y ] = H L + H NL of a local and a nonlocal linear operator, which are discretized separately.Indeed, while H L ( φ, ψ ) is nonzero only if φ and ψ have overlapping support, the sparsity of a matrixrepresentation of H NL ( φ, ψ ) depends on the size of the blurring kernel χ and will typically be verylow, making the storage of the assembled matrix impractical. However, during our numericaloptimization we will only apply iterative solvers like BiCGStab, which only require the evaluationof matrix-vector products. Due to the tensor product structure of the integrand of H NL , theapplication of the linear operator can be implemented eﬃciently as follows. Again exploiting therelation between convolution and the operator (cid:5) we can rewrite H NL ( φ, ψ ) = (cid:90) Ω (cid:2) χ (cid:5) (cid:0) ∂ d ( F ( u ◦ y ) , u )( χ ∗ [ ψ · ( ∇ u ) ◦ y ])( · , · , (cid:1)(cid:3) ( x ) [ φ · ( ∇ u ) ◦ y ]( x ) d x . Thus, given a Finite Element function ψ n ∈ ( X n ) , for each Finite Element basis function φ k,nj = e k φ nj we can compute the discretized analogue H NL n ( φ k,nj , ψ n ) = V n (cid:88) q w q (cid:2) χ n (cid:5) n (cid:0) ∂ d ( F n ( u n ◦ y n ) , u n )( χ n ∗ n [ ψ n · ( ∇ u n ) ◦ y n ])( · , · , (cid:1)(cid:3) ( q ) [ φ k,nj · ( ∇ u n ) ◦ y n ]( q ) . The operator H L is discretized as H L n ( φ k,nj , φ k,ni ) = V n (cid:88) q w q (cid:104) φ k,nj · ((Hess n u n ) ◦ y n ) φ k,ni (cid:105) ( q ) [ χ n (cid:5) n ∂ d ( F n ( u n ◦ y n , u n )] ( q ) , where Hess n u n is deﬁned weakly as in mixed Finite Elements approaches. Indeed, as an artefactof our discretization, the piecewise multilinear Finite Element function u n does not possess aweak second derivative (part of its distributional second derivative is concentrated on the elementboundaries), yet second order information Hess u is helpful for the registration and should notbe neglected in the discretization. Thus we deﬁne Hess n u n via (cid:90) Ω tr (cid:0) ψ T Hess n u n (cid:1) d x = (cid:90) ∂ Ω n T ψ ∇ u n d x − (cid:90) Ω div ψ · ∇ u n d x for all ψ ∈ ( X n ) × , where n denotes the unit outward normal to ∂ Ω . This amounts to solving a linear system of theform M V = LU for the nodal value vector V of Hess n u n with M a mass matrix, L a stiﬀnessmatrix, and U the vector of nodal values of u n . Note that the matrix representation of H L n isjust a weighted mass matrix. Numerical optimization.

We tested and compared the performance of gradient-based meth-ods, in particular nonlinear conjugate gradient and quasi-Newton methods, and a second orderline search or trust region Newton method. Below we provide a few details on the latter (theimplementation of the former being straightforward).The line search Newton method for the numerical minimization of E dn over ( X n ) takes the form y nk +1 = y nk − γ k ( ∂ E dn [ y nk ]) − ∂E dn [ y nk ] , k ≥ , where we compute the step size γ k > . . . . E n e r g y Reduced HessianFull Hessian (a) Energy decay in Newton’s method using the fullHessian and the approximation ∂ E dn ≈ H L + ∂ R n . (b) Computational costs of each step of Newton’smethod. Figure 2: Performance of Newton’s method using a fully and partially assembled Hessian operator(left) and a breakdown of the computational costs of a generic step of Newton’s method (right).in H NL n , the linear system has to be solved iteratively). However, while H NL n is always positivesemideﬁnite, H L n and ∂ R n can be indeﬁnite so that ∂ E dn may be so as well. Consequently, theNewton step may not be a descent direction, and the iteration might converge to a saddle point.To compensate a possible lack of positive deﬁniteness we add a scalar multiple λ k of the identityto the Hessian operator, y nk +1 = y nk − γ k ( ∂ E dn [ y nk ] + λ k id) − ∂E dn [ y nk ] , k ≥ , where − λ k should approximate the most negative eigenvalue of H L [ y nk ] + ∂ R n [ y nk ]. To determine λ k , we use the procedure described in [7, Sct. 8.5.2, Thm. 8.5.1], which consists of a number oftruncated Lanczos iterations to obtain a symmetric tridiagonal approximation T ∈ R m × m to H L [ y nk ] + ∂ R n [ y nk ] and a subsequent computation of its characteristic polynomial, whose smallestzero is found via a bisection method.Further modiﬁcations of the Newton iteration allow a further reduction of the computationalcomplexity of each Newton step. Since ∂ d( F ( u ◦ y ) , u ) is zero for perfectly aligned images,the contribution of H L for closely aligned images is negligible and ∂ J dn can be approximated by H NL + ∂ R n . We refer to [21] for a detailed discussion of this strategy in the context of hyperelastic3D-3D image registration. Figure 2a compares the energy decrease of Newton’s method and thismodiﬁcation when applied to the datasets shown in Figure 8.Figure 2b shows that the major cost of each Newton iteration lies in the numerical solution of thelinear system. To improve convergence of the BiCGStab solver, we tested several preconditioningmethods. Jacobi, geometric scaling, and incomplete LU preconditioning – applied to the part H L [ y nk ] + ∂ R n [ y nk ] of the Hessian matrix that can be assembled – turned out to be inferior to amultigrid preconditioner applied to the entire Hessian operator ∂ E dn . Note that this operationdoes not require the assembly of ∂ E dn , since iterative solvers like BiCGStab or GMRES – thistime without preconditioning – can be employed for the pre- and postsmoothing steps.A line search Newton method is prone to getting stuck or at least slowing down at saddle points(even despite the compensation for indeﬁniteness). This can be avoided by using a trust regionNewton method, which can also minimize indeﬁnite quadratic functions within its trust region. Tothis end we solved the Newton system via preconditioned truncated Lanczos iterations [8, 26] (sincethis simultaneously allows to use the above-mentioned technique for compensating indeﬁniteness),where we applied the same preconditioners as in the line search approach.Overall, as already suggested by ﬁg. 2b, the solution of the linear system in Newton’s method (linesearch or trust region) turns out to consume so much time that a mere quasi-Newton method withFGS updates (see e. g. [17, Ch. 6.1]) is more eﬃcient. In fact, a plain conjugate gradient descentwith Polak–Ribi´ere updates (see e. g. [6, Sct. 8.5]) performed best in our experiments. Preprocessing.

Before starting the minimization of E dn we rigidly align u to u by minimiz-ing J dn [ y n ] among all rigid deformations (for which actually E dn = J dn ). In fact, to accommodatea potential slight mismatch in magniﬁcation between the 2D intravital and the 3D confocal mi-croscopy as well as diﬀerent resolutions in x -, x -, and x -direction (or to make up for a badchoice of the blurring kernel χ ), we additionally allow a rescaling along the coordinate directions.Thus, we minimize J dn among all deformations x (cid:55)→ R ( γ ) R ( β ) R ( α ) diag( s , s , s ) x + t , parameterized by a translation vector t ∈ R as well as scalings s , s , s along and rotation angles α, β, γ ∈ [0 , π [ about the three coordinate directions ( R i ( δ ) ∈ SO (3) denotes the rotation aboutthe i th axis by angle δ ). We now use the grid hierarchy T n , n = 1 , . . . , N , to iteratively ﬁnd theoptimal parameters for each grid level n via a quasi-Newton method initialized with the optimalparameters from level n −

1. After the optimal deformation on level N is found, we replace u by its composition with that deformation so that the new u now has the correct length scalesand already is optimally aligned. Multilevel optimization problems.

We have already detailed how by replacing J d , R , and E d with discretized analogues J dn , E dn , and R n we arrive at a set of optimization problemsmin y n ∈ ( X n ) E dn [ y n ] , n = 1 , . . . , N , that are numerically solved using the nonlinear conjugate gradient method. As so often in imageregistration methods, the use of a multilevel approach is essential for the quality of the results aswell as for computational eﬃciency. Owing to its nonconvexity, the functional E d can be expectedto have a large number of local minima, which is in general linked to the resolution of the givenimage data and the number of image features that promote regional alignment. Downsamplingof the image data reduces the number of image features (and thus of local minima) in the inputdatasets as well as the complexity of the optimization procedure (see the experimental illustrationin ﬁg. 4). The results of the less costly optimization on coarser grids can then be used as goodinitial values for the higher level optimization problems. We make use of this strategy by ﬁrstminimizing E dn on a low grid level n and then successively solving the optimization problem onhigher grid levels.In addition to the use of hierarchical grids we will use a smoothing-based multiscale strategy.A drawback of local, pixel-based distance measures is their inability to align corresponding butnon-overlapping image features. As illustrated in ﬁg. 4, blurring of the datasets can compensatethe lack of overlap at the expense of a diminished data ﬁdelity. We therefore solve the registrationproblem for blurred versions of u and u with successively decreasing blur radius. In ourimplementation, we convolved u and the slices of u in x - x -direction with a Gaussian kernel K s ( x ) = s √ π exp( − x s ) via the fast Fourier transform. Experimentally, a good sequence ofdecreasing kernel radii turned out to be s = 8 h, h, h,

0, where h denotes the grid width of thevolumetric input image. Full algorithm.

The full algorithmic workﬂow is depicted in ﬁg. 5. Note that the iteration overthe diﬀerent grid levels always starts with that level n as the coarsest one on which the grid sizecorresponds to the current blurring kernel radius. The actual implementation was based on the QuocMesh Library , a C ++ Finite Element library which supports quadratic, cuboid and simplicialelements. D F ( u D ) r e s . ×

129 65 ×

65 33 ×

33 17 ×

17 9 × Translation in x -direction E n e r g y × × × × × × × × × × horizontal translation t J d [ y t ] u s D F ( u s D ) b l u r s = 0 s = h s = 2 h s = 4 h horizontal translation t J d [ y t ] horizontal translation t J d [ y t ] Figure 3: Multigrid and multiscale strategies simplify the energy landscape and thereby facilitateregistration: Both top and bottom experiment use a three-dimensional image u consisting ofthree cuboids at a resolution of 256 × × u as theprojection of a simple translation, u = F ( u ◦ y . ) for y t ( x ) = ( x , x , x + t ) (note that theimage width of 256 pixels corresponds to length 1). The right graphs show the registration energyas a function of the translation t in the range [0 , . x , x -resolution of both u and u was decreased repeatedly using the restriction operators R n , while in the bottomexperiment the images are obtained by blurring in x , x -direction with the Gaussian kernel K s of scale s , u s = K s ∗ u , u s = K s ∗ u . Clearly, spurious local minima are alleviated oreven eliminated at coarser resolution or stronger blur. (cid:45)(cid:45)(cid:45)(cid:45)(cid:45) Figure 4: Left: A reference image u r (red) and a template image u t (green) show the same structureat diﬀerent locations. Due to the lack of overlap, the gradient of the registration functional y (cid:55)→ (cid:82) R d ( u t ◦ y, u r ) d x at y = id is zero. Right: After blurring both images the structuresoverlap, resulting in a nonzero gradient. The negative gradient, which points into the direction ofan improved registration, can be interpreted as a perturbation of y = id which produces a slightrightward deformation in the overlapping region, as indicated by the arrows. reate hierarchical imagedata representationPreprocessing (rigidregistration + scaling)Rigid inter-mediate result Elastic reg-istration Artiﬁcialblurringinvolved inprevious step?Decreaseartiﬁcial blur StopRegistrationresultapply computed deformation noiterate over grid levels iterate over grid levels yesupdate initial deformation Figure 5: Schematic overview of the overall registration procedure.

In all following examples we took Ω = [0 , × [0 , Z ] for some Z ∈ ]0 ,

1] and used the blurringkernel χ ( x ) = (cid:40) ( π [ x − Z ] ) − (cid:112) x + x ≤ (cid:12)(cid:12) x − Z (cid:12)(cid:12) , Synthetic data.

To test the performance of the elastic regularizer, we applied our techniqueto synthetic datasets (ﬁgs. 6 to 8), representing cuboids and a deformed vessel structure made ofsome elastic material.In the ﬁrst two test cases (ﬁgs. 6 to 7), the simplicity of the shapes made it possible to generate thethree-dimensional deformed and undeformed scenes u and u ◦ y by setting the pixel intensitiesmanually. The two-dimensional reference images u were then obtained by applying the forwardoperator to the undeformed scenes. Figure 6 depicts the results of a test assessing the algorithm’sability to compute non-rigid lateral deformations. The resolutions of u and u in this exampleare 129 × ×

17 and 129 × × ×

129 and129 × x (cid:55)→ (cid:40)(cid:0) x , x , x + (cid:1) , if 0 ≤ x ≤ x, else , which is incompatible with the regularizer R , explains this phenomenon. As the hyperelasticregularizer favours more regular deformations, its contribution to the overall energy will outweighthose of the data term, if, as in this case, a transformation introduces too much shear.Since vessel structures constitute the predominant image features in our microscope images, weapply the algorithm to another, more realistic test case to see whether branched tube-like structuresnitial conﬁguration u (perspective view) Two-dimensional ref-erence image u Projected registrationresult F ( u ◦ y ) Registration result u ◦ y (perspective view)Figure 6: Inplane deformation of a pair of elastic cuboids. In this experiment and the one shownin ﬁg. 7, the parameters of the stored energy function W were chosen to make the two cuboidsbehave like they were embedded in some soft, gel-like substance.Initial conﬁguration u (side view) Two-dimensional ref-erence image u Projected registrationresult F ( u ◦ y ) Registration result u ◦ y (side view)Figure 7: Displacement of one of a pair of elastic cubes, surrounded by a soft material.nitial conﬁguration u (perspective view) Two-dimensional ref-erence image u Projected registrationresult F ( u ◦ y ) Registration result u ◦ y (perspective view)Figure 8: Idealized vessel structure used as a more realistic test case.Projected templateimage u Two-dimensional ref-erence image u Overlay of u and u Overall registrationresultFigure 9: Microscopy images acquired with 3D confocal and 2D IVM microscopy and their elasticregistration. Data courtesy of Lydia Sorokin, Konrad Buscher, Jian Song (Institute of Physiolog-ical Chemistry and Pathobiochemistry, M¨unster, Germany).are registered equally well as in the previous cases. The synthetic dataset shown in ﬁg. 8 wasgenerated with the help of

VascuSynth [9], a software package capable of generating realisticallylooking synthetic vessel structures. To obtain a volumetric template dataset u , the generatedvessel structure was deformed using a CGAL [11] implementation of the algorithm described in[22], which is capable of generating triangular mesh deformations in real-time under the constraintthat the resulting deformation acts as rigidly as possible on each triangle. The deformed andundeformed triangular meshes were then turned into grayscale image stacks. As in the previoustests, the application of the forward operator to the undeformed volume image stack generatedthe two-dimensional reference image u . The dimensions of the input datasets were the sameas in our ﬁrst test case. It turned out that in this example, a preprocessing step as describedin section 3, preceding the elastic registration procedure, was necessary to obtain a satisfactorydata alignment. The comparison of the reference image u and the projected registration result F ( u ◦ y ) in ﬁg. 8 indicates a faultless overall alignment, but spurious small-scale deformationssimilar to those encountered in the ﬁrst test case. Microscopy data.

In ﬁg. 9 the technique was ﬁnally applied to a microscopy dataset as de-scribed in the introduction.The centre region of the reference image was obscured by diﬀused ﬂuorescence dye, interferingwith the registration process. As a consequence, the data term had to be augmented by a mask m : Ω → (0 ,

1] taking small values in the degraded image region, J d [ y ] = (cid:90) Ω m ( x ) d ( F ( u ◦ y )( x ) , u ( x )) d x . As can be seen from overlaying u and F ( u ◦ y ) in ﬁg. 9, right, the alignment of the bloodigure 10: Registration result from ﬁg. 9 showing additional channels combining temporal infor-mation from two-dimensional intravital microscopy with well-resolved spatial information fromthree-dimensional confocal microscopy images acquired after tissue excision. Magenta shows fatcells (from confocal microscopy), blue shows leukocytes (from intravital microscopy).vessel structures is satisfactory. Note that projecting the original, undeformed three-dimensionalconﬁguration, as shown in ﬁg. 9 left, one obtains dark regions on the right middle part of the image,where vessels lie outside the focus plane. This is corrected by the registration so that the overallregistration result on the right of ﬁg. 9 shows both a strong red and green signal in that region.Similarly, the in-plane distortion on the left side of the image is corrected by the registration. Thealignment of the three-dimensional with the two-dimensional images allows information of othercolour channels to be integrated into the single dataset as shown in ﬁg. 10. Thereby one cancombine temporal information from intravital microscopy with well-resolved spatial informationfrom confocal microscopy that can only be obtained after tissue excision. Here, clusters of fat cellsthat surround the larger blood vessels were stained after tissue excision and are shown in magenta,while individual migrating leukocytes were observed during intravital microscopy and are shownin blue. Acknowledgements

The authors thank Konrad Buscher, Jian Song, and Lydia Sorokin for discussions relating tothe biological motivation and for providing the data of ﬁg. 9. This work was supported by theDeutsche Forschungsgemeinschaft (DFG), within the Cells-in-Motion Cluster of Excellence (EXC1003-CiM), University of M¨unster, Germany, and under Germany’s Excellence Strategy EXC 2044– 390685587, Mathematics M¨unster: Dynamics-Geometry-Structure. The research was furthersupported by the Alfried Krupp Prize for Young University Teachers awarded by the AlfriedKrupp von Bohlen und Halbach-Stiftung.

References [1] F. Aguet, D. Van De Ville, and M. Unser. Model-based 2.5-d deconvolution for extended depthof ﬁeld in brightﬁeld microscopy.

IEEE Transactions on Image Processing , 17(7):1144–1153,July 2008.[2] Benjamin Berkels, Ivan Cabrilo, Sven Haller, Martin Rumpf, and Karl Schaller. Co-registration of intra-operative brain surface photographs and pre-operative mr images.

Inter-national Journal of Computer Assisted Radiology and Surgery , 9(3):387–400, May 2014.[3] M. Burger, J. Modersitzki, and L. Ruthotto. A hyperelastic regularization energy for imageregistration.

SIAM Journal on Scientiﬁc Computing , 35(1):B132–B148, 2013.[4] Konrad Buscher, Huiyu Wang, Xueli Zhang, Paul Striewski, Benedikt Wirth, GurpannaSaggu, Stefan L¨utke-Enking, Tanya N. Mayadas, Klaus Ley, Lydia Sorokin, and Jian Song.Protection from septic peritonitis by rapid neutrophil recruitment through omental high en-dothelial venules.

Nature Communications , 7:10828, 2016.5] P.G. Ciarlet.

Three-Dimensional Elasticity . Mathematical Elasticity. Elsevier Science, 1988.[6] P.G. Ciarlet, A. Buttigieg, D.G. Crighton, A. Buttigieg, B. Miara, J.M. Thomas, M.J.Ablowitz, S.H. Davis, E.J. Hinch, A. Iserles, et al.

Introduction to Numerical Linear Al-gebra and Optimisation . Cambridge Texts in Applied Mathematics. Cambridge UniversityPress, 1989.[7] Gene H. Golub and Charles F. Van Loan.

Matrix Computations (3rd Ed.) . Johns HopkinsUniversity Press, Baltimore, MD, USA, 1996.[8] Nicholas IM Gould, Stefano Lucidi, Massimo Roma, and Philippe L Toint. Solving the trust-region subproblem using the lanczos method.

SIAM Journal on Optimization , 9(2):504–525,1999.[9] Ghassan Hamarneh and Preet Jassi. Vascusynth: Simulating vascular trees for generatingvolumetric image data with ground-truth segmentation and tree analysis.

Comp. Med. Imag.and Graph. , 34(8):605–616, 2010.[10] Stefan Heldmann and Nils Papenberg. A variational approach for volume-to-slice registra-tion. In

Scale Space and Variational Methods in Computer Vision , pages 624–635, Berlin,Heidelberg, 2009. Springer Berlin Heidelberg.[11] S´ebastien Loriot, Olga Sorkine-Hornung, Yin Xu, and Ilker O. Yaz. Triangulated surface meshdeformation. In

CGAL User and Reference Manual . CGAL Editorial Board, 5.1.1 edition,2020.[12] M. Marcus and V. J. Mizel. Transformations by functions in Sobolev spaces and lowersemicontinuity for parametric variational problems.

Bull. Amer. Math. Soc. , 79(4):790–795,07 1973.[13] P. Markelj, D. Tomaˇzeviˇc, B. Likar, and F. Pernuˇs. A review of 3d/2d registration methodsfor image-guided interventions.

Medical Image Analysis , 16(3):642 – 661, 2012. ComputerAssisted Interventions.[14] I. Merelli, P. Cozzi, D. D’Agostino, A. Cleamatis, and L. Milanesi. Images based system forsurface matching in macromolecular screening. In , pages 397–401, Nov 2008.[15] I. Merelli, P. Cozzi, D. D’Agostino, A. Clematis, and L. Milanesi. Image-based surface match-ing algorithm oriented to structural biology.

IEEE/ACM Transactions on ComputationalBiology and Bioinformatics , 8(4):1004–1016, July 2011.[16] J. Modersitzki.

Numerical Methods for Image Registration . Oxford University Press, 2004.[17] J. Nocedal and S. Wright.

Numerical Optimization . Springer Series in Operations Researchand Financial Engineering. Springer New York, 2006.[18] Yoshito Otake, Adam S Wang, J Webster Stayman, Ali Uneri, Gerhard Kleinszig, SebastianVogt, A Jay Khanna, Ziya L Gokaslan, and Jeﬀrey H Siewerdsen. Robust 3d–2d image regis-tration: application to spine interventions and vertebral labeling in the presence of anatomicaldeformation.

Physics in Medicine & Biology , 58(23):8535, 2013.[19] P. Pedregal.

Variational Methods in Nonlinear Elasticity . Society for Industrial and AppliedMathematics, 2000.[20] Nico Persch, Christopher Schroers, Simon Setzer, and Joachim Weickert. Physically inspireddepth-from-defocus.

Image Vision Comput. , 57:114–129, 2017.21] Lars Ruthotto, Chen Greif, and Jan Modersitzki. A stabilized multigrid solver for hyperelasticimage registration.

Numerical Linear Algebra with Applications , 24(5):e2095, 2017. e2095nla.2095.[22] Olga Sorkine and Marc Alexa. As-rigid-as-possible surface modeling. In

Proceedings of EURO-GRAPHICS/ACM SIGGRAPH Symposium on Geometry Processing , pages 109–116, 2007.[23] A Uneri, A S Wang, Y Otake, G Kleinszig, S Vogt, A J Khanna, G L Gallia, Z L Gokaslan,and J H Siewerdsen. Evaluation of low-dose limits in 3d-2d rigid registration for surgicalguidance.

Physics in Medicine & Biology , 59(18):5329, 2014.[24] Lin Xu and J. W. L. Wan. Real-time intensity-based rigid 2d-3d medical image registra-tion using rapidmind multi-core development platform. In , pages 5382–5385, Aug2008.[25] Weimin Yu, Moritz Tannast, and Guoyan Zheng. Non-rigid free-form 2d–3d registration usinga b-spline-based statistical deformation model.

Pattern Recognition , 63:689 – 699, 2017.[26] Lei-Hong Zhang, Chungen Shen, and Ren-Cang Li. On the generalized lanczos trust-regionmethod.