Image registration and super resolution from first principles

Abstract

Image registration is the inference of transformations relating noisy and distorted images. It is fundamental in computer vision, experimental physics, and medical imaging. Many algorithms and analyses exist for inferring shift, rotation, and nonlinear transformations between image coordinates. Even in the simplest case of translation, however, all known algorithms are biased and none have achieved the precision limit of the Cramer Rao bound (CRB). Following Bayesian inference, we prove that the standard method of shifting one image to match another cannot reach the CRB. We show that the bias can be cured and the CRB reached if, instead, we use Super Registration: learning an optimal model for the underlying image and shifting that to match the data. Our theory shows that coarse-graining oversampled images can improve registration precision of the standard method. For oversampled data, our method does not yield striking improvements as measured by eye. In these cases, however, we show our new registration method can lead to dramatic improvements in extractable information, for example, inferring 10× more precise particle positions.


Colin B. Clement, Matthew Bierbaum, and James P. Sethna
Laboratory of Atomic and Solid State Physics, Cornell University, Ithaca, New York 14853-2501, USA

Index Terms: Image registration, statistical learning, inference algorithms, Cramer-Rao bounds, parameter estimation

I. INTRODUCTION

Image registration is the problem of inferring the coordinate transformation between two (or more) noisy and shifted (or distorted) signals or images. This deceptively simple process is fundamental for stereo vision [1], autonomous vehicles [2], gravitational astronomy [3], remote sensing [4], [5], medical imaging [6], [7], microscopy [8], and nondestructive strain measurement [9]. At the cutting edge of microscopy, imaging sensitive biological materials [10], [11] and metal organic frameworks [12], [13] with Transmission Electron Microscopy requires combining multiple low-dose, high-noise images to obtain a viable signal without destroying the sample. While most techniques for registering and combining images are accurate for low noise, errors significantly larger than theoretical bounds can occur for a signal-to-noise ratio as low as 20 (noise 5% of the signal amplitude); so far a general explanation of this error has been elusive.

Much has been written about the uncertainty of shift estimations by analyzing the information theoretic limit known as the Cramer-Rao bound (CRB) [14], [15], [16]. These works observed that no known estimators achieve the CRB for image registration. This sub-optimal performance has been blamed on biased estimators: some claim interpolation errors explain the bias [17], [18], [4], [19] and others claim that the problem is inherently biased [14]. More works have explored non-perturbative estimations of the uncertainty, which yield larger estimates more consistent with measured error, but also rely on assumptions about the latent image [20], [21], [22].

Here we solve these problems by studying the naïve maximum likelihood formulation of image registration. We explore a new derivation of the standard method (shifting one image to match the other) by integrating out the underlying true image.
We treat the standard method as a statistical field theory in which two images fluctuate around each other, showing that the shift uncertainty should scale quadratically with image noise ($\sigma_\Delta \propto \sigma^2$), while the naïve CRB is linear ($\sigma_\Delta \propto \sigma$). We also show that bias in image registration is due to the image edges. Our theory makes the novel prediction that coarse-graining images can dramatically improve shift precision, which we confirm numerically. While coarse-graining helps, it requires oversampled images and knowledge of the highest frequencies of the underlying image. We overcome this limitation, and reach the true CRB, by shifting a learned model for the underlying image to match the data. We use Bayesian model selection to find the model most supported by the data, effectively learning the amount of necessary coarse-graining. We demonstrate the optimality of our new method, called Super Registration (SR), with periodic images. We also demonstrate clear improvements in error and removal of bias for general non-periodic images with Chebyshev image models. Finally, we show that particle tracking is 10-20× more precise when performed on images combined with SR. We conclude by discussing the implications of our theory for more general nonlinear registration, and for registration of images captured with different imaging modes.

II. THEORY OF IMAGE FORMATION

In this work, image registration will be restricted to the task of inferring a rigid shift relating two (or more) discretely sampled noisy images with sub-pixel precision. More general transformations are accommodated by our subsequent arguments through application of the chain rule. Defining some true image (latent, to be discovered) intensity function $I(\mathbf{x})$ with $\mathbf{x} \in \mathbb{R}^2$, we measure at least two images by sampling discretely:

$$\phi_i = I(\mathbf{x}_i) + \xi_i, \qquad \psi_i = I(\mathbf{x}_i + \boldsymbol{\Delta}) + \xi_i, \tag{1}$$

where $\phi_i$ is the $i$th pixel of image $\phi$, the $\xi_i$ are white noise with zero mean and variance $\sigma^2$, and $\boldsymbol{\Delta}$ is the shift between the images which we intend to infer.

Fig. 1. Illustrations of image registration techniques. (a) Standard Fourier Shift: a schematic of the standard method of image registration, which measures the shift $\boldsymbol{\Delta}$ between noisy data (grayscale images) by shifting one to match the other. (b) Super Registration: a schematic of our proposed method, which infers the shift $\boldsymbol{\Delta}$ instead by learning the underlying image $I$ (green contours) and shifting the coordinates until the model image best fits the data (grayscale images).

Equation 1 is our model, which we can express as the likelihood $p(\phi, \psi \,|\, \boldsymbol{\Delta}, I)$ of measuring $\phi$ and $\psi$ given $\boldsymbol{\Delta}$ and $I$:

$$p(\phi, \psi \,|\, \boldsymbol{\Delta}, I) \propto \exp\left(-\frac{1}{2\sigma^2}\left(\|\phi - I\|^2 + \|\psi - T_{\boldsymbol{\Delta}} I\|^2\right)\right), \tag{2}$$

where $\|x\|^2 = \sum_i x_i^2$ and $T_{\boldsymbol{\Delta}}$ represents the operator which translates its argument by $\boldsymbol{\Delta}$; for a continuous image, $T_{\boldsymbol{\Delta}} I(\mathbf{x}) = I(\mathbf{x} - \boldsymbol{\Delta})$. We interpret this distribution as our image model fluctuating around the data. Note that Eq. 2 accommodates multi-image registration by multiplying in more terms comparing images to the shifted latent image $I$.

In order to infer $\boldsymbol{\Delta}$ after measuring the images $\phi$ and $\psi$ we must reverse the conditional probability in Eq. 2 using Bayes' theorem. The posterior (post-measurement) probability $p(\boldsymbol{\Delta}, I \,|\, \phi, \psi)$ of $\boldsymbol{\Delta}$ and $I$ is

$$p(\boldsymbol{\Delta}, I \,|\, \phi, \psi) = \frac{p(\phi, \psi \,|\, \boldsymbol{\Delta}, I)\, p(\boldsymbol{\Delta}, I)}{p(\phi, \psi)}. \tag{3}$$

Here $p(\boldsymbol{\Delta}, I)$ is called the prior probability and $p(\phi, \psi)$ is called the evidence because, as we later show, it can be interpreted as the probability of our data given our choice of model. The task of inferring $\boldsymbol{\Delta}$ is achieved by maximizing this posterior probability. We define the maximum likelihood estimator of $\boldsymbol{\Delta}$ to be

$$\boldsymbol{\Delta}^\star = \operatorname*{arg\,max}_{\boldsymbol{\Delta}, I}\; p(\boldsymbol{\Delta}, I \,|\, \phi, \psi) = \operatorname*{arg\,max}_{\boldsymbol{\Delta}, I}\; p(\phi, \psi \,|\, \boldsymbol{\Delta}, I)\, p(\boldsymbol{\Delta}, I), \tag{4}$$

where the second equality is possible because the evidence is independent of $\boldsymbol{\Delta}$ and $I$.

How accurately should we be able to measure $\boldsymbol{\Delta}$? If we assume we know the underlying image $I$, the answer is given by the Cramer-Rao bound (CRB) [23].
For any parameter vector $\theta$, the CRB of $\theta$ is $\sigma_\theta^2 \ge \theta^T g^{-1} \theta$, where the Fisher information matrix (FIM) is

$$g_{\mu\nu} = \left\langle \frac{\partial^2 \log p}{\partial\theta_\mu\, \partial\theta_\nu} \right\rangle. \tag{5}$$

The posterior $p = p(\boldsymbol{\Delta}, I \,|\, \phi, \psi)$ is given by Eq. 3 and $\theta_\mu$ are the parameters, i.e. $\boldsymbol{\Delta}$ and $I$. We can calculate the naïve CRB for image registration: assuming we know the underlying image $I$, and that $\partial I/\partial x$ and $\partial I/\partial y$ are uncorrelated, the smallest possible variance on the estimation of the $x$-direction shift $\Delta_x$ is

$$\sigma_x^2 \ge \sigma^2 \Big/ \int d^2x \left(\frac{\partial I}{\partial x}\right)^2. \tag{6}$$

In other words, if the data are very noisy or if the underlying image has no features, it will be difficult to measure the shifts. Note that the CRB predicts that the shift error will scale linearly with noise ($\sigma_\Delta \propto \sigma$). We reiterate that this is the CRB of the shifts assuming knowledge of the true image $I$. Since this is an unrealistic assumption for real data, we call Eq. 6 and its discrete analog the naïve CRB. For previous derivations and discussions of the naïve CRB for image registration, see [14], [15]. When discussing the CRB below we use the definition related to Eq. 5 and not the intuitive result of Eq. 6.

A. Deriving the standard method of image registration

In an experiment we have no access to the latent image $I$. We offer a new derivation of the standard method for overcoming this by marginalizing, or integrating out, $I$:

$$p(\boldsymbol{\Delta} \,|\, \phi, \psi) \propto \int dI\; p(\phi, \psi \,|\, \boldsymbol{\Delta}, I)\, p(I). \tag{7}$$

If we assume that $p(I) \propto 1$, i.e. all images are equally likely, we can perform the integral by first recognizing that $\|\psi - T_{\boldsymbol{\Delta}} I\|^2 = \|T_{-\boldsymbol{\Delta}} \psi - I\|^2$ if $T_{\boldsymbol{\Delta}}$ is a unitary transformation (preserves the L2 norm). Transforming discrete data will require interpolation. Linear, quadratic, cubic, bi-cubic, and other local interpolation schemes previously studied for this problem [17], [18], [4], [19] are not unitary, neatly explaining some of their observed bias. In this work we will consider only unitary interpolation by using Fourier shifting; however, our ultimate solution will obviate this discussion by directly employing Eq. 2. Now the posterior $p(\boldsymbol{\Delta} \,|\, \phi, \psi)$ is a product of integrals of the form

$$\int dx\; e^{-\frac{1}{2\sigma^2}\left((x-a)^2 + (x-b)^2\right)} \propto \exp\left(-\frac{(a-b)^2}{4\sigma^2}\right). \tag{8}$$

Applying this to each pixel in the data we arrive at the marginal likelihood

$$p(\boldsymbol{\Delta} \,|\, \phi, \psi) \propto \exp\left(-\frac{1}{4\sigma^2} \|\psi - T_{-\boldsymbol{\Delta}} \phi\|^2\right). \tag{9}$$

We have derived the standard least-squares similarity measure (it is usually written down intuitively), in which one simply shifts one image until it most closely matches the other. This process is illustrated by Fig. 1(a), which shows a pair of synthetic data which will serve as $I$ in our numerical studies of periodic registration. It was calculated by sampling a 64 × 64 image from the power spectrum

$$P(|I(\mathbf{k})|) \sim k^{-\ldots}\, e^{-(k/k_c)^2}, \tag{10}$$

damped by a Gaussian with a scale $k_c$ set to a fraction of $k_{\mathrm{Nyquist}}$ to ensure a smooth cutoff approaching the Nyquist limit, preventing aliasing.

Notice that if $T_{\boldsymbol{\Delta}}$ is not unitary, this objective is different depending on whether you shift one measured image or the other. Note also that in general image registration this inverse transformation may not exist; in such cases this method will fail.
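As an illustrative sketch (not the authors' implementation), the Fourier shifting and the least-squares objective of Eq. 9 might be written in Python with NumPy as follows. The names `fourier_shift` and `standard_objective` are hypothetical, the image is assumed to be periodic, and the shift convention $T_{\boldsymbol{\Delta}} I(\mathbf{x}) = I(\mathbf{x} - \boldsymbol{\Delta})$ is taken from the text.

```python
import numpy as np

def fourier_shift(image, delta):
    """Translate a periodic 2D image by delta = (dy, dx) pixels: out(x) = image(x - delta)."""
    ky = 2 * np.pi * np.fft.fftfreq(image.shape[0])[:, None]  # radians per pixel, axis 0
    kx = 2 * np.pi * np.fft.fftfreq(image.shape[1])[None, :]  # radians per pixel, axis 1
    phase = np.exp(-1j * (ky * delta[0] + kx * delta[1]))     # unit-modulus phase ramp
    return np.fft.ifft2(phase * np.fft.fft2(image)).real

def standard_objective(delta, phi, psi):
    """Squared mismatch ||psi - T_{-delta} phi||^2 between psi and the back-shifted phi."""
    back_shifted = fourier_shift(phi, (-delta[0], -delta[1]))
    return float(np.sum((psi - back_shifted) ** 2))
```

For a noise-free pair built as `psi = fourier_shift(phi, (-dy, -dx))`, the objective vanishes at `delta = (dy, dx)`; with noisy data it would instead be minimized numerically over `delta`. For odd image sizes the phase ramp preserves the L2 norm exactly, which is the unitarity property the derivation relies on; for even sizes the Nyquist mode needs extra care when the shift is not an integer.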
The literature features multiple implementations of Eq. 9 using Fourier interpolation by either shifting the data [24] or upsampling by padding in Fourier space and finding the maximum cross-correlation [25]. The latter method can only be as accurate as the factor of upsampling, e.g. quadrupling (in 2D) the number of Fourier modes allows evaluating shifts of half a pixel. While sophisticated extrapolations have been used to overcome the arbitrary choice of how much to upscale, we will exactly shift the data and optimize Eq. 12 directly. Writing the 2D Fourier transform operator as $\mathcal{F}$, we implement $T_{\boldsymbol{\Delta}} \phi$ as

$$T_{\boldsymbol{\Delta}} \phi = \mathcal{F}^{-1} e^{-i \mathbf{k} \cdot \boldsymbol{\Delta}} \mathcal{F} \phi. \tag{11}$$

Another important result of our theory is the $4\sigma^2 = 2(2\sigma^2)$ in the denominator of Eq. 9: this likelihood function is for data with twice the variance of our original problem, which is consistent with taking the difference of two noisy signals. Some of the reported discrepancy ($\sqrt{2} \sim 1.4$) between the CRB and observed error [14], [26], [15] can be explained by the absence of this factor. Those studying multi-image registration have also neglected this modification of the noise fluctuations in their estimates of shift precision [26]. By integrating out the latent image $I$ we have obtained a distribution which depends only on our data $\phi$ and $\psi$ and the unknown shift $\boldsymbol{\Delta}$. We can now define $\boldsymbol{\Delta}^\star_m$, the marginal maximum likelihood (ML) solution, which we will now refer to as the standard Fourier shift (FS) method:

$$\boldsymbol{\Delta}^\star_m = \operatorname*{arg\,max}_{\boldsymbol{\Delta}}\; p(\boldsymbol{\Delta} \,|\, \phi, \psi) = \operatorname*{arg\,min}_{\boldsymbol{\Delta}}\; \|\psi - T_{-\boldsymbol{\Delta}} \phi\|^2. \tag{12}$$

This new derivation of the standard method of image registration highlights and clarifies some important limitations. First, only unitary (L2-preserving) interpolation for shifting images will lead to unbiased shift estimation; otherwise we are simply optimizing a corrupted likelihood. Second, comparing the squared error between shifted images is only correct if the noise in the images is Gaussian. If we were studying images with Poisson-distributed noise, for instance, the likelihood in Eq. 2 should be a Poisson distribution. The standard method is often successfully employed for non-Gaussian noise. We do not doubt its efficacy, but instead claim that the standard method cannot be optimal in this case because such data violate its implicit assumption of Gaussian noise.

III. STATISTICAL PROPERTIES OF THE STANDARD METHOD

It is well documented in the literature that the errors in shift inference via FS are much larger than the naïve CRB. Figure 2 shows the noise-averaged error (pink dots) of inferring the shifts as measured using the standard Fourier shift method in Eq. 12. The measured error grows quadratically with the Gaussian additive noise $\sigma$, dwarfing the naïve CRB (shaded pink region). The following section will derive a theory (black dotted) to predict this quadratic error growth.

Fig. 2. Shift error ($\Delta$ error, in pixels) versus noise $\sigma$, comparing the noise-averaged errors of the inferred shift $\Delta$ measured by the standard Fourier Shift (FS) method and Super Registration (SR) in the case of aligning synthetic periodic images. For each noise level, we generate an ensemble of 1000 images of 64 × 64 pixels statistically similar to Fig. 1 ($I(k) \sim k^{-\ldots}$), measuring the average error for both methods, along with the minimum expected error, the CRB. The error of the standard method (pink dots) grows quadratically with noise, whereas the naïve CRB (pink shaded region) predicts a linear relationship. Our theory (black dashed line) accurately describes the quadratic dependence in the error, matching numerical experiments. Super Registration (green pluses) demonstrates much lower error, recovers the linear relationship between error and noise, and reaches its CRB (green shaded region).

Say we measure the fields $\psi_i$ and $\phi_i$; then the log-marginal posterior is (up to a constant) proportional to

$$\mathcal{L} = \frac{1}{2} \sum_i \left(\psi_i - T_{-\Delta} \phi_i\right)^2 = \frac{1}{2} \sum_k \left|\tilde\psi_k - e^{ik\Delta} \tilde\phi_k\right|^2, \tag{13}$$

where $\tilde\phi_k$ and $\tilde\psi_k$ are the Fourier transforms of our data. Our measurements fluctuate around the true latent image $I$ according to

$$p(\psi) \propto \exp\left(-\frac{1}{2\sigma^2} \|\psi - I(x)\|^2\right), \qquad p(\phi) \propto \exp\left(-\frac{1}{2\sigma^2} \|\phi - I(x - \Delta_0)\|^2\right), \tag{14}$$

where $\Delta_0$ is the latent shift and $\sigma^2$ is the variance of the noise. Near the true shift $\Delta_0$ we can expand the marginal likelihood as

$$\mathcal{L}(\Delta) = \mathcal{L}(\Delta_0) + (\Delta - \Delta_0) \frac{\partial \mathcal{L}}{\partial \Delta} + \frac{1}{2} (\Delta - \Delta_0)^2 \frac{\partial^2 \mathcal{L}}{\partial \Delta^2} + \ldots, \tag{15}$$

which is approximately minimized by

$$\Delta - \Delta_0 = -\frac{\partial \mathcal{L}}{\partial \Delta} \Big/ \frac{\partial^2 \mathcal{L}}{\partial \Delta^2} = -i\, \frac{\sum_k k\, \tilde\psi_k e^{-ik\Delta_0} \tilde\phi_{-k}}{\sum_k k^2\, \tilde\psi_k e^{-ik\Delta_0} \tilde\phi_{-k}}. \tag{16}$$

We can calculate the error of the standard method by averaging Eq. 16 and its square over the distributions in Eq. 14.

A. Bias of the standard method (1D)

Writing Eq. 16 as $A/B$, we can Taylor expand about $A = \langle A \rangle$ and $B = \langle B \rangle$, then average over the noise to find

$$\left\langle \frac{A}{B} \right\rangle = \frac{\langle A \rangle}{\langle B \rangle} \left(1 + \frac{\mathrm{var}(B)}{\langle B \rangle^2}\right) - \frac{\mathrm{cov}(A, B)}{\langle B \rangle^2} + \ldots, \tag{17}$$

where $\langle \cdot \rangle$ denotes integration over the distributions of Eq. 14. Notice that

$$\langle A \rangle = \left\langle \sum_k k\, \tilde\psi_k e^{-ik\Delta_0} \tilde\phi_{-k} \right\rangle = \sum_k k\, I_k I_{-k} = 0, \tag{18}$$

which is zero because the summand is odd in $k$. Therefore the average bias for periodic images is to lowest order

$$\left\langle \frac{A}{B} \right\rangle = -\frac{\langle AB \rangle}{\langle B \rangle^2}. \tag{19}$$

In general for non-periodic images $\langle A \rangle \neq 0$. Examining the continuum limit of $\langle A \rangle$ in real space, we find

$$\langle A \rangle = \int dx\; I \frac{\partial I}{\partial x} = \frac{1}{2} \int dx\; \frac{\partial}{\partial x} I^2 = \frac{1}{2} \left(I^2(x_N) - I^2(x_0)\right), \tag{20}$$

where $x_N$ and $x_0$ are the endpoints of the domain; $\langle A \rangle$ is a total derivative depending only on the edges of the image. Therefore we hypothesize that the bias of the standard FS method of image registration shown in Fig. 5 will be dominated by the edges of the data. Ziv and Zakai in 1969 [21] and others [14], [3] share this speculation; however, whereas they argued that impingement of shift fluctuations onto the limits of the domain caused bias, our theory suggests that structures at the edges of the images themselves cause bias.

Evaluating the remaining moments of Eq. 19 we find

$$\langle B \rangle = \sum_k k^2\, I_k I_{-k}, \tag{21}$$

which is the roughness of the latent image $I$, found in the denominator of the naïve CRB in Eq. 6.
The last correlation for the average bias is

$$\langle AB \rangle = \sum_{k, k'} k\, k'^2\, e^{-i(k + k')\Delta_0} \langle \tilde\psi_k \tilde\psi_{k'} \rangle \langle \tilde\phi_{-k} \tilde\phi_{-k'} \rangle, \tag{22}$$

which can be evaluated using the moments

$$\langle \tilde\psi_k \rangle = I_k, \qquad \langle \tilde\phi_k \rangle = e^{-ik\Delta_0} I_k, \tag{23}$$

$$\langle \tilde\psi_k \tilde\psi_k \rangle = I_k I_k, \qquad \langle \tilde\phi_k \tilde\phi_k \rangle = e^{-2ik\Delta_0} I_k I_k, \tag{24}$$

$$\langle \tilde\psi_k \tilde\psi_{-k} \rangle = I_k I_{-k} + \sigma^2, \qquad \langle \tilde\phi_k \tilde\phi_{-k} \rangle = I_k I_{-k} + \sigma^2. \tag{25}$$

Considering the sum in Eq. 22 in three cases, $k' = -k$, $k' = k$, and $k' \neq \pm k$, we can apply the moments to find

$$\langle AB \rangle = \sum_k \left( k^3 \left( (I_k I_{-k} + \sigma^2)^2 + (I_k I_{-k})^2 \right) + k \sum_{k' \neq \pm k} k'^2\, I_k I_{-k} I_{k'} I_{-k'} \right) = 0, \tag{26}$$

from which we conclude the entire correlation function vanishes due to each term of the summand being odd in $k$. Further, numerical evidence and inspection of higher order terms in the expansion of Eq. 17 support the conclusion that for periodic images the standard Fourier shift method of image registration is unbiased.

B. Variance of the standard method (1D)

Turning our attention to the variance, or expected error, of the shift estimate given by Eq. 16, an expansion and average of $(A/B)^2$ (simplifying for $\langle A \rangle = 0$) yields to lowest order

$$\mathrm{var}\left(\frac{A}{B}\right) = \frac{\langle A^2 \rangle}{\langle B \rangle^2}. \tag{27}$$

Equation 21 gives us $\langle B \rangle$, so we need only compute the correlation function $\langle A^2 \rangle$:

$$\langle A^2 \rangle = -\sum_k \sum_{k'} k k'\, e^{-i(k + k')\Delta_0} \langle \tilde\psi_k \tilde\psi_{k'} \rangle \langle \tilde\phi_{-k} \tilde\phi_{-k'} \rangle = -\underbrace{\sum_k \sum_{k' \neq \pm k} k k'\, |I_k|^2 |I_{k'}|^2}_{=\,0} + \sum_k k^2 \left( (I_k I_{-k} + \sigma^2)^2 - (I_k I_{-k})^2 \right), \tag{28}$$

where as before we have decomposed the sum into terms for which $k' \neq \pm k$, $k' = -k$, and $k' = k$; the double sum vanishes because its summand is odd in $k$. We find that the variance of the bias (which is also the variance of the estimated shifts since we have shown $\langle \Delta \rangle = \Delta_0$) is approximately

$$\sigma_\Delta^2 = \left\langle (\Delta - \Delta_0)^2 \right\rangle = \frac{2\sigma^2}{D} + \frac{L\pi^2}{3} \frac{\sigma^4}{D^2}, \tag{29}$$

where $D = \sum_k k^2 I_k I_{-k}$ is the roughness of the image. We used the fact that $\sum_k k^2 = (2 + L^2)\pi^2/3L \approx L\pi^2/3$ for a one-dimensional signal with $L$ points. The lowest order term in Eq. 29 is twice the naïve CRB shown in Eq. 6, consistent with the fact that the marginal posterior in Eq. 9 has twice the variance of the noise. We have shown that the standard Fourier shift method cannot achieve the naïve CRB. Notice that the variance grows beyond the CRB at a rate proportional to $\sigma^4$ and the image size $L$, so that the error grows quadratically with noise.
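To make the variance prediction concrete, the following hedged sketch evaluates the naïve CRB and the predicted standard-method variance for a known 1D periodic signal. It assumes a unitary (`norm="ortho"`) discrete Fourier transform, so that white noise of variance $\sigma^2$ contributes $\sigma^2$ to each mode, and wavenumbers measured in radians per pixel. The function name `shift_variance_1d` is hypothetical, and the exact sum over $k^2$ is used in place of its large-$L$ approximation.

```python
import numpy as np

def shift_variance_1d(latent, sigma):
    """Return (naive CRB variance, predicted standard-method variance) for a periodic 1D signal."""
    L = latent.size
    k = 2 * np.pi * np.fft.fftfreq(L)         # wavenumbers in radians per pixel
    I_k = np.fft.fft(latent, norm="ortho")    # unitary DFT of the latent signal
    D = np.sum(k**2 * np.abs(I_k) ** 2)       # roughness D = sum_k k^2 I_k I_{-k}
    crb = sigma**2 / D                        # naive CRB on the shift variance
    # Leading term is twice the naive CRB; the correction grows as sigma^4 * sum_k k^2.
    predicted = 2 * sigma**2 / D + np.sum(k**2) * sigma**4 / D**2
    return crb, predicted
```

In this sketch the ratio `predicted / crb` tends to 2 as `sigma` goes to zero and grows with both the noise level and the signal length, which is the behavior described in the text.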
This extra factor of the image volume means that sampling a band-limited image (one whose content lies below the Nyquist limit) at a higher rate, increasing the resolution without increasing information content, can actually decrease the registration precision for the standard Fourier shift method. We discuss and verify this observation following an extension of this theory to two dimensions.

C. Variance of the standard method in two dimensions

Generalizing our expansion of the marginal likelihood we find

$$\mathcal{L}(\boldsymbol{\Delta}) = \mathcal{L}(\boldsymbol{\Delta}_0) + (\boldsymbol{\Delta} - \boldsymbol{\Delta}_0)^T \nabla \mathcal{L} + \frac{1}{2} (\boldsymbol{\Delta} - \boldsymbol{\Delta}_0)^T\, \nabla^2 \mathcal{L}\, (\boldsymbol{\Delta} - \boldsymbol{\Delta}_0) + \ldots, \tag{30}$$

from which we conclude that the two-dimensional analogue of Eq. 16 is

$$\boldsymbol{\Delta} - \boldsymbol{\Delta}_0 = -\left(\nabla^2 \mathcal{L}\right)^{-1} \nabla \mathcal{L}. \tag{31}$$

If the off-diagonal terms of the Hessian $\nabla^2 \mathcal{L}$ are small compared to the diagonal terms (the image is approximately isotropic), the two dimensions decouple into an application of Eq. 29 for each dimension. This is generally a good approximation except for contrived data. In this case we find the precision of two-dimensional image registration is approximately

$$\left\langle (\boldsymbol{\Delta} - \boldsymbol{\Delta}_0)^2 \right\rangle = \begin{pmatrix} \dfrac{2\sigma^2}{D_x} + \dfrac{N\pi^2}{3} \dfrac{\sigma^4}{D_x^2} \\[2ex] \dfrac{2\sigma^2}{D_y} + \dfrac{N\pi^2}{3} \dfrac{\sigma^4}{D_y^2} \end{pmatrix}, \tag{32}$$

where $N$ is the number of pixels in one of the measured images, and $D_x = \sum_{\mathbf{k}} k_x^2 I_{\mathbf{k}} I_{-\mathbf{k}}$ and $D_y = \sum_{\mathbf{k}} k_y^2 I_{\mathbf{k}} I_{-\mathbf{k}}$ are the horizontal and vertical image roughness. Eq. 32 is used in Fig. 2 (black dotted) where we see excellent agreement with the numerically measured error (pink dots). The excellent agreement, in spite of ignoring the cross terms, can be explained by expanding Eq. 31 for small values of the off-diagonal terms: the lowest order correction averages to zero.

Our analysis has shown that the errors of shift estimates of the standard Fourier shift method grow much faster than the CRB. Why do the errors scale quadratically with noise? MacKay found that in general and especially for ill-posed problems (like distinguishing noise from signal), integrating over parameters can yield distributions with stretched and skewed peaks, biasing the maximum and leading to large errors [27]. We integrated over all possible images in order to derive the standard FS registration method. Did this choice sabotage our effort to achieve the ultimate precision? For exponential functions (like a Gaussian or our likelihoods above), there is a deep relationship between optimization and integration through Laplace's method or the method of steepest descent [28].
By integrating over all possible images, we essentially maximized $\log p(\phi, \psi \,|\, I, \boldsymbol{\Delta})$ over $I$, estimating the latent image, and used that estimate for predicting the shift. This estimate is, however, unreliable as it makes no distinction between the signal and the noise. The high frequency modes of the data, dominated by noise and ironically most discriminating for shift localization, cause the fluctuation of our inferred shifts to be much larger than the CRB. This is illuminated by the following section, which considers the process of coarse-graining or binning image data.

D. Coarse Graining Data can Improve Precision

Our theory for the variance of the shift predicts that $\sigma_\Delta^2 = \frac{2\sigma^2}{D} \left(1 + \frac{N\pi^2 \sigma^2}{6D}\right)$. The factor of the image volume $N$ in the correction term inspired us to consider reducing $N$ without changing $\sigma$ or $D$. Coarse-graining the data by some linear factor $a$, shown schematically in Fig. 3(a), should not change the CRB assuming the latent image $I$ is smooth on that length scale (or, equivalently, assuming that the data is sampled at least $a$ times the Nyquist frequency). Assuming that each pixel of the data has noise of variance $\sigma^2$, the variance of noise for each $a \times a$ block should be $a^2\sigma^2$ (variances of uncorrelated noise add). The denominator of the naïve CRB, $D = \sum_k k^2 |I_k|^2$, is subtler: the amplitude of each pixel increases by a factor of $a^2$ ($I_k \to a^2 I_k$), and the block sum only removed Fourier modes with zero amplitude by our assumption above, so $D \to a^4 D$. Finally the coarse-grained image will have its coordinates expanded by $a$, so that the variance should be rescaled by $a^2$. Therefore coarsening should modify our variance prediction of the Fourier shift method accordingly:

$$\sigma_\Delta^2 = a^2 \cdot \frac{2 a^2 \sigma^2}{a^4 D} \left(1 + \frac{\pi^2 N/a^2}{6} \frac{a^2 \sigma^2}{a^4 D}\right) = \frac{2\sigma^2}{D} \left(1 + \frac{\pi^2 N}{6 a^4} \frac{\sigma^2}{D}\right). \tag{33}$$

Our theory predicts that coarse-graining over-sampled images can improve shift inference by reducing the correction term, but that the method can at best yield a variance equal to twice the naïve CRB. This result may explain improvements in registration precision from re-binning image intensities observed in other works [29], [30]. Figure 3(b) confirms the predicted relationship, where the black dots indicate the variance of a $N = 1024$ image which was oversampled by a factor of 20. Each lighter colored dot series is the variance after coarsening by some factor $a$, and the solid lines are given by Eq. 33. We see excellent agreement with our theory, and a convergence of the variances onto the $2\sigma^2/D$ line. Note that the original image ($a = 1$) variances differ from our theory for large noise: perhaps the limits of large images and large noise are where our approximations in truncating the Taylor expansion in Eq. 27 break down.

Fig. 3. (a) An oversampled image (the image varies on a scale many times smaller than the Nyquist frequency limit) with 5% additive white Gaussian noise, then coarse-grained by summing over $a \times a$ blocks. Shown are $a = 1$, $a = 4$, and $a = 16$, representing a drastic reduction in image size while not removing any information which localizes the shifts between images. (b) The error in inferred shifts $\Delta_y$ versus noise $\sigma$ (dots) for the standard Fourier shift method applied to the image after coarsening by 1, 2, 4, 8, and 16 blocks. The original image was chosen to be smooth enough that coarsening by a factor of 16 would not violate the Nyquist sampling theorem. The solid lines are the prediction of our theory, and the dotted line is $\sqrt{2}$ times the naïve CRB, $\sqrt{2}\,\sigma/\sqrt{D_y}$.

Coarsening smooth images only throws away information which is dominated by noise. When we use the coarsened images in the standard FS method, we implicitly estimate the underlying image but with fewer noisy modes, and will get a more reliable estimate. In a real experiment without knowledge of the true length scale of the image, we will not know the optimal coarsening length scale.
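The block-sum coarse-graining described in this section can be sketched as below. This is a minimal illustration rather than the authors' code: it assumes the image dimensions are divisible by the factor `a`, and the helper name `coarse_grain` is hypothetical. The short demonstration checks the statement that summing uncorrelated noise over each $a \times a$ block multiplies its variance by $a^2$.

```python
import numpy as np

def coarse_grain(image, a):
    """Sum a 2D image over non-overlapping a x a blocks."""
    ny, nx = image.shape
    if ny % a or nx % a:
        raise ValueError("image dimensions must be divisible by a")
    # Split each axis into (block index, offset within block) and sum over the offsets.
    return image.reshape(ny // a, a, nx // a, a).sum(axis=(1, 3))

rng = np.random.default_rng(0)
noise = rng.normal(scale=0.5, size=(512, 512))  # per-pixel noise variance 0.25
blocks = coarse_grain(noise, 4)                 # block variance should be near 16 * 0.25
```

The reshape-and-sum form avoids explicit loops and leaves the input array untouched.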
In the following section we propose our generative model, which will use Bayesian model selection to infer the image complexity supported by the data.

IV. SUPER REGISTRATION

How can we achieve the ultimate precision for image registration as predicted by the CRB? We have seen that the standard FS method of image registration, which directly compares two images, has a variance in its shift prediction of the form $\sigma_\Delta^2 = 2\sigma_{\mathrm{CRB}}^2 (1 + N\pi^2 \sigma_{\mathrm{CRB}}^2/6)$, where the CRB is $\sigma_{\mathrm{CRB}}^2 = \sigma^2 / \sum_k k^2 I_k I_{-k}$. We are still studying periodic images, so it is natural to consider removing noise with a filter like the optimal Wiener filter. This manifests by modifying our log-marginal likelihood in Eq. 13 with the rule $\tilde\psi_k \to A_k \tilde\psi_k$ and $\tilde\phi_k \to A_k \tilde\phi_k$, for some filter function $A_k$. This modification simply changes $\sigma_{\mathrm{CRB}}^2 \to \sigma^2 / \sum_k k^2 A_k I_k A_{-k} I_{-k}$, and since $A_k A_{-k} \le 1$ (a filter only reduces power), this can only increase $\sigma_{\mathrm{CRB}}^2$ and thus reduce our precision.

Faced with this fact we abandon the standard method of image registration and return to first principles by studying the likelihood defined by the image formation model in Eq. 2. Instead of shifting the data, we will model the image and shift that, as shown schematically in Fig. 1. This method will result in a de-noised and, depending on the data, a super-resolution estimate of the latent image. Inspired by the inextricable relationship between registration and super-resolution that we have discovered, we call our new method Super Registration (SR). Our success depends on using all that Bayesian inference has to offer, and so we proceed with a discussion of evidence-based model selection.

A. Bayesian inference and model selection

Following MacKay's discussion on integration versus optimization in inference with hyperparameters, we will choose a model space and from this select the best model by comparing the model evidence, $p(\phi, \psi)$. The evidence is simply the normalization constant of the posterior in Eq. 3; its utility for selecting the best model can be exposed by a seemingly erudite increase in notational complexity which makes manifest more of the assumptions in our model. Consider a model of image formation for the case of periodic image registration, expressed as the likelihood of measuring two images $p(\phi, \psi \,|\, \boldsymbol{\Delta}, I)$. Now that we are optimizing over $I$ instead of integrating, we must choose some parameterization $I \in \mathcal{H}$, where $\mathcal{H}$ is some space of image models, e.g. a Fourier series or sums of polynomials. This choice must be reflected in the conditionals of our probabilities, so that the likelihood of measuring $\phi$ and $\psi$ must now be written $p(\phi, \psi \,|\, \boldsymbol{\Delta}, I, \mathcal{H}_\lambda)$, where $\mathcal{H}_\lambda$ represents a specific choice of image model.

Proceeding with the inference task at hand by writing again (with our new notation) the result of Bayes' theorem shown in Eq. 3, we see that the posterior now reads

$$p(\boldsymbol{\Delta}, I \,|\, \phi, \psi, \mathcal{H}_\lambda) = \frac{p(\phi, \psi \,|\, \boldsymbol{\Delta}, I, \mathcal{H}_\lambda)\, p(\boldsymbol{\Delta}, I \,|\, \mathcal{H}_\lambda)}{p(\phi, \psi \,|\, \mathcal{H}_\lambda)}. \tag{34}$$

The solution to our problem still lies in studying this posterior distribution, but we now must also infer the best model $\mathcal{H}_\lambda$. We again apply Bayes' theorem, finding the probability that our model is true given our measured images:

$$p(\mathcal{H}_\lambda \,|\, \phi, \psi) \propto p(\phi, \psi \,|\, \mathcal{H}_\lambda)\, p(\mathcal{H}_\lambda). \tag{35}$$

We have explicitly ignored the normalization constant $p(\phi, \psi)$. Assuming we have no prior preference for some models over others, $p(\mathcal{H}_\lambda) \sim 1$, so inferring which model is most likely given the data is equivalent to maximizing $p(\phi, \psi \,|\, \mathcal{H}_\lambda)$, which is the normalization of Eq. 34.

Therefore Bayesian inference for image registration consists of the following steps given some data $\phi$ and $\psi$.

1) Choose some model $\mathcal{H}_\lambda$ and evaluate Eq. 34, the posterior $p(\boldsymbol{\Delta}, I \,|\, \phi, \psi, \mathcal{H}_\lambda)$.
2) Summarize the posterior by calculating the position and widths of the maximum likelihood $\boldsymbol{\Delta}$ and $I$.
3) Evaluate Eq. 35, the model evidence $p(\phi, \psi \,|\, \mathcal{H}_\lambda)$, by estimating the normalization of the posterior.
4) Repeat steps 1-3 with some subset of the model space $\mathcal{H}$.
5) Choose the model $\mathcal{H}_\lambda$ with the largest evidence and examine its concomitant posterior distribution.

The final (unlisted) step is to examine and decide whether the residuals and the maximum likelihood image and shifts are reasonable.

This recursive process of acknowledging all the context and conditions of our model and inverting them with Bayes' theorem can go on forever. We could for instance consider a probability over the parameters $\theta$ of our model $\mathcal{H}_\lambda(\theta)$, adding another integration or optimization to the steps above. Fortunately, the deeper these model assumptions go, the less these decisions affect the outcome of our inference [27]. Bayesian inference does not exclude the experience of the researcher; we will terminate the inference recursion with our own judgement.

B. Super Registration for periodic images

Returning to our periodic image registration problem, let us pursue the inference steps above in a concrete example. The natural model space for periodic images consists of Fourier series, indexed by the maximum frequency allowed. Given two images $\phi$ and $\psi$, the log-likelihood of measuring these images given some latent image $I$ and shift $\Delta$ is

$$\log p(\phi, \psi \mid \Delta, I, \mathcal{H}_\lambda) = -\frac{1}{2\sigma^2} \sum_{k=0}^{\lambda} \left( |\phi_k - I_k|^2 + |\psi_k - e^{-i k \cdot \Delta} I_k|^2 \right) - \log Z_L, \tag{36}$$

where $\lambda$ indexes the complexity of the model, $\phi_k$, $\psi_k$, and $I_k$ are the Fourier components of the data and of our image model, and $Z_L$ is the normalization. Assuming a constant prior on shifts and images, the maximum likelihood shift and image are the solution of

$$\Delta_{\mathrm{ML}}, I_{\mathrm{ML}} = \arg\min_{\Delta, I} \sum_{k=0}^{\lambda} \left( |\phi_k - I_k|^2 + |\psi_k - e^{-i k \cdot \Delta} I_k|^2 \right). \tag{37}$$

Equation 37 is in the standard form of a nonlinear least squares problem, which we solve by alternating linear least squares for $I_k$ and using Levenberg-Marquardt for $\Delta$. For a given image model $\mathcal{H}_\lambda$ we can find the most likely shift and image by evaluating Eq. 37, calculate the covariance, and compute the evidence. Assuming flat priors on $\Delta$, $I_k$, and $\mathcal{H}_\lambda$, the evidence is the integral of our likelihood over $\Delta$ and $I$:

$$Z_L = \int \mathrm{d}I_k \, \mathrm{d}\Delta \; p(\phi, \psi \mid \Delta, I, \mathcal{H}_\lambda). \tag{38}$$

$Z_L$ can be computed by applying Laplace's method of integration using the Jacobian of the least squares problem.

Footnote: The normalization constant ignored in Eq. 35 is $p(\phi, \psi) = \sum_i p(\phi, \psi \mid \mathcal{H}_i)\, p(\mathcal{H}_i)$. This constant changes when we consider more models, which naturally must happen when we obtain more data, but does not influence the preference of one model over another.

[Figure 4: plot titled "Maximum evidence minimizes error ($\sigma = 0.$…)". Horizontal axis: complexity of image model $\lambda$, with the evidence maximum marked $\lambda^\star$. Left vertical axis: $\Delta$ error ($\sigma_\Delta$). Right vertical axis: negative log evidence. Legend: evidence $-\log p(\phi, \psi)$, measured error, estimated CRB, true CRB.]

Fig. 4. Using 1000 pairs of 64 × 64 images with additive Gaussian noise and $I(k) \sim k^{-\ldots}$, we computed the model evidence $p(\mathcal{H}_\lambda \mid \phi, \psi)$ (black curve) for all Fourier cutoffs indexed by $\lambda$, showing that when the evidence is maximized the actual shift error (green crosses) is minimized. Further, this error is nearly indistinguishable from the CRB (green dashed). Finally, the naïve estimate of the CRB (solid green) is computed from the curvature of the posterior using the Fisher information of Eq. 5. During a real experiment only the evidence (black curve) and the naïve curvature estimate of the CRB (solid green) are available, but when the evidence is maximized all estimates of the error match.

Figure 4 shows the result of step 4 of our algorithm for the periodic data used in all numerical experiments so far (shown in Fig. 1), where we have used every possible Fourier cutoff. We have inverted the evidence to guide the eye, so that the minimum of the black curve is the most likely model. For this true image and noise level the most likely model is $\lambda = 15$ (15 × 15 sinusoids). The smallest observed error (green crosses) in shift inference is also precisely at $\lambda = 15$, and is consistent with the CRB (green dashed). The most likely model provides the most precise inference of the shifts. The maximum evidence solution has been interpreted to embody Occam's Razor, the principle that the simplest explanation is most likely [31]. Therefore evidence-based model selection can systematically infer the number of degrees of freedom supported by the data, avoiding over-fitting and errors larger than the CRB.

The solid green line of Fig. 4 is the CRB estimated by evaluating the second derivative of the log-likelihood; notice that this erroneously continues to decrease with increasing complexity. In a real experiment we only have access to the evidence (solid black line) and this curvature estimate of the CRB (solid green line). The maximum evidence model is also where all of our estimates of the shift error agree, further motivating the evidence-based choice of model complexity. Finally, note that when the complexity is chosen to be 64 (i.e. all Fourier modes are used) the measured error is $\sigma_\Delta \approx$ …. In Fig. 1, when the noise is $\sigma = 0.$…, the same as in the evidence experiment above, the observed error of the standard FS method is also $\sigma_\Delta \approx$ …. Therefore we see numerical correspondence between integration over the underlying image and optimization without selecting model complexity by considering the evidence.

C. General non-periodic Super Registration

Following the clarity of studying image registration in the periodic case, we turn our attention to general non-periodic images. Here there is no clearly natural model; images are extremely complicated. While there are exciting candidates in the form of deep convolutional neural networks, these objects cannot (currently) be evaluated at arbitrary points in space; they have no notion of continuous locality [32]. In general the researcher's knowledge about the physical objects being imaged should inspire the model space. A very specific and successful example is Parameter Extraction by Modeling Images (PERI), which modeled almost every aspect of a confocal microscope, extracting enough information from a light microscope to infer the parameters of the van der Waals interaction [33]. Lacking such specific inspiration, we chose sums of Chebyshev polynomials, in part because of their excellent approximation properties [34].

We generated non-periodic data from the same distribution in Eq. 10, sampled twice as large (128 × 128), shifted by $\Delta$, cropped out a 64 × 64 region, and added noise. Figure 5 shows results for the error (pink dots and green crosses) and bias (pink and green lines) using these synthetic data, as a function of both true shift $\Delta$ (Fig. 5(a)) and noise $\sigma$ (Fig. 5(b)). Pink denotes the standard FS method and green denotes Super Registration. Figure 5(a) shows that the standard FS method has an oscillating bias which is zero at whole and half pixels, and an oscillating error which is largest at whole-pixel shifts and smallest at half-pixel shifts. The pink shaded region is the CRB of the FS method. Figure 5(b) shows super-linear error growth (pink) for FS, compared with our theory from Eq. 32 (black dotted), and a bias (pink line) deviating slowly but consistently from zero.

Figure 5(a) shows that Super Registration has a nearly constant bias (green line) and error (green crosses) as a function of true shift $\Delta$, and a bias smaller than its CRB (green shaded). The error is much smaller than that of the standard FS method, and is one-third the error of the FS method when $\sigma = 0.1$ (10% noise). Finally, we see in Fig. 5(b) that the error of SR grows linearly with noise. While SR here does not reach the CRB, it scales the same as the CRB. A better image model should result in errors more consistent with the CRB. Because we generated data by randomly sampling in Fourier space, shifting, then cropping, our Chebyshev polynomials cannot perfectly represent that signal. This is an important reminder that the CRB depends on the chosen model. Since the CRB is defined as the inverse of the Fisher information in Eq. 5, the CRB is model-dependent, and thus the standard FS method and SR have different bounds.

[Figure 5: two panels plotting the $\Delta_y$ error and bias. (a) Shift-dependence for $\sigma = 0.075$; horizontal axis: true $y$-direction shift $\Delta$. (b) $\Delta = (0.94, -1.42)$; horizontal axis: noise $\sigma$. Legend: Super Registration (SR) bias, SR error, SR CRB, Fourier shift bias, FS error, FS CRB, FS theory error.]

Fig. 5. Comparing the error and bias of the standard Fourier shift (FS) method and Super Registration (SR) for non-periodic data. The synthetic data were generated by the model $I(k) \sim k^{-\ldots}$, twice as large as necessary, Fourier shifted and then cropped to produce non-periodic images. Errors and biases were measured from 1500 64 × 64 noise samples. (a) The $\Delta_y$ biases, errors, and CRBs for the standard FS (pink) and SR (green) are shown as a function of the true shift $\Delta$. The standard method suffers from errors (pink dots) and bias (pink line) that are periodic in $\Delta$. Super Registration shows almost zero bias (green line) and no periodic structure in the error (green crosses). Similarly to the periodic case, SR is much closer to its CRB (green shaded) than the standard FS method is to its CRB (pink shaded). (b) The biases, errors, and CRBs for the FS and SR methods as a function of noise for a fixed random shift $\Delta = (0.94, -1.42)$. The standard FS method has super-linear error growth with noise (pink dots), and a monotonic bias (pink line) larger than its CRB (pink shaded). Super Registration has linear error growth (green crosses) at about twice its CRB (green shaded), and a bias (green line) consistent with zero.

How would Super Registration perform on data which has non-Gaussian noise? We cannot guarantee optimal precision in this case, because our model assumes the noise is Gaussian. SR would provide reliable results, however, in the same way that the standard FS method provides reliable results in this case. We can claim this because optimization (SR) and integration (FS) are the same, following the method of steepest descent or Laplace's method of integration, so that a fully complex image model (one degree of freedom for each pixel) would be statistically the same as shifting one image to match the other. The evidence maximization procedure, however, is not guaranteed to be effective, as we know the model assumes the incorrect noise distribution.

For many experimental images, Super Registration offers only a marginal improvement in the image quality as measured by eye. For a small shift error $\Delta - \Delta_0$ (inferred minus true shift) the image intensity reconstruction error is $\Delta I \approx (\Delta - \Delta_0) \cdot \vec{\nabla} I$. For smooth, highly sampled images the visual changes will be small. Most experiments do not operate in the undersampled regime, where the sampling rate is too low to resolve the structure of the sample.
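The first-order estimate of the intensity error can be checked numerically. The sketch below is an illustration under our own assumptions: a smooth periodic test image, a hypothetical shift error of a few hundredths of a pixel, and spectral (Fourier) shifting and differentiation. The helper names `fourier_shift` and `spectral_gradient` are ours. With the convention that a shift by $\delta$ maps $I(x)$ to $I(x - \delta)$, the predicted change is $-\delta \cdot \vec{\nabla} I$; the overall sign depends only on that convention.

```python
import numpy as np

def fourier_shift(img, dy, dx):
    """Shift a periodic image by (dy, dx) pixels using Fourier phases."""
    ky = 2 * np.pi * np.fft.fftfreq(img.shape[0])[:, None]
    kx = 2 * np.pi * np.fft.fftfreq(img.shape[1])[None, :]
    phase = np.exp(-1j * (ky * dy + kx * dx))
    return np.fft.ifft2(np.fft.fft2(img) * phase).real

def spectral_gradient(img):
    """Return (dI/dy, dI/dx) of a periodic image via Fourier differentiation."""
    ky = 2 * np.pi * np.fft.fftfreq(img.shape[0])[:, None]
    kx = 2 * np.pi * np.fft.fftfreq(img.shape[1])[None, :]
    f = np.fft.fft2(img)
    return np.fft.ifft2(1j * ky * f).real, np.fft.ifft2(1j * kx * f).real

n = 64
y, x = np.mgrid[0:n, 0:n]
# Smooth, highly sampled test image with unit amplitude (illustrative choice).
img = np.sin(2 * np.pi * x / n) * np.cos(4 * np.pi * y / n)

shift_error = (0.03, -0.02)        # hypothetical (dy, dx) registration error
actual = fourier_shift(img, *shift_error) - img

grad_y, grad_x = spectral_gradient(img)
predicted = -(shift_error[0] * grad_y + shift_error[1] * grad_x)

# Relative disagreement between the exact change and the gradient estimate.
rel = np.max(np.abs(actual - predicted)) / np.max(np.abs(actual))
print(np.max(np.abs(actual)), rel)
```

For this test image the intensity change stays below about one percent of the image amplitude, which is why such shift errors are hard to see by eye even when they matter for downstream parameter inference.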
Although the reconstructions for many experiments will not vary dramatically visually, we show that the shift errors can dramatically interfere with the information extracted from the reconstructions. When inferring parameters from data such as object sizes, positions, and orientations, correlation functions, and local contrast, the precision of these quantities will be limited by the quality of the registration. To emphasize the scale of these errors, in the next section we demonstrate a dramatic improvement in particle position inference from correctly registered images.

V. PARTICLE TRACKING ERRORS

A very common task in image processing is tracking particle positions. High precision, especially in atomic-scale TEM and STEM, is important for understanding real-space structure. For example, charge density waves cause atoms to deviate from their lattice by tiny amounts, and can be studied by carefully measuring the positions of the atoms in real space [35]. For High-angle Annular Dark Field (HAADF) STEM, the image of an atom is well-approximated by a 2D Gaussian [36]. In TEM and STEM, noise is often Poisson-distributed. Both SR and the standard method assume image noise is Gaussian, and achieving optimality for Poisson noise will require modeling the noise correctly by modifying the likelihood in Eq. 2.

Assuming Gaussian noise, then, we created synthetic data of a pair of Gaussian particles, shown in Fig. 6(a) with 10% additive noise. Simulating drift in a realistic STEM experiment, we created 8 copies of the two-particle image, randomly shifted. For each noise level we sampled 1000 noise instances, and for each instance reconstructed the underlying image with both FS and our Chebyshev-polynomial-based Super Registration.

Figure 6(b) shows the error of inferring the position of the larger particle using both the FS reconstruction (pink line) and the SR reconstruction (green line). For $\sigma = 0.3$, or 30% noise, we see that the precision of the particle position is 10× better using SR than FS. Further, the SR method, not even using the correct model (a sum of Gaussian particles), is only about twice the CRB for particle position inference (black dotted). Finally, we show the result when using shifts inferred from the same data coarsened by a factor $a = 3$, which was chosen to have the lowest error without being biased. In summary, we see that even though small shift errors do not have a dramatic effect on the reconstructed image as measured by eye, there are drastic effects on the precision of extractable information from the reconstructions.

A. Computational complexity

The standard Fourier shift method requires a Fourier transform of one of the images for each iteration of the optimization, ultimately scaling in time as $O(N \log N)$, where $N$ is the number of pixels in one image. Super Registration requires estimating the underlying image, and thus requires $O(NM)$ time, where $M$ is the number of polynomials used in the image model. SR requires trying multiple values of $M$, and multiple models, to find the greatest evidence. For two … × … images, FS takes less than a second on a modern computer. SR requires an hour or more to try multiple values of $M$, but only a few minutes to find the shifts and image for a given $M$. For multi-image registration, optimal FS requires comparing all pairs of images, and so scales as $O(L^2 N \log N)$ for $L$ images, whereas SR scales as $O(LNM)$, as it compares the data only to the model. Memory requirements depend on the algorithm used. In this work we used Levenberg-Marquardt nonlinear least-squares optimization, which requires $O(LNM)$ memory to store the Jacobian, and so images larger than … × … are impractical.

[Figure 6: (a) Model $I$ with $\sigma = 0.1$. (b) Precision of position inference: average $y$-position error versus noise $\sigma$. Legend: CRB, Super Registration, FS reconstruction, FS coarsened.]

Fig. 6. (a) A model image of two Gaussian particles with 10% Gaussian additive noise. Eight of these images with random sub-pixel relative shifts were generated, and 1000 noise samples were drawn. For each noise sample, the underlying image was reconstructed either by the standard Fourier shift (FS) reconstruction or with Super Registration. To each reconstruction we fit the Gaussian models which generated the data, inferring the most likely particle position and width. (b) The average error of inferring the $y$-position of the larger particle from images reconstructed with the standard FS method (pink line), a coarse-grained image (pink dotted), and Super Registration (green line).

There are several open opportunities for improving the performance of Super Registration. Memory consumption and computational time can be improved by using Variational Inference and Stochastic Gradient Descent, which scales as $O(LN)$ in memory, and will be the subject of future work. A local image model (where each image parameter only modifies a small area of the image), such as radial basis functions, would scale even better than the Fast Fourier Transform, as $O(N)$. Finally, GPUs are designed to perform image calculations efficiently, and SR could achieve at least 10× the performance of a CPU (by naïve FLOP counts).

VI. CONCLUSION

Through a statistical theory of image formation, we have derived the standard method of image registration, which shifts one image to match another. Our theory predicts that shift errors for the standard FS method grow quadratically with noise, much faster than the linear relationship of the CRB. Our explanation for the deviation between the naïve CRB and the standard method comes from a deep relationship between integration and optimization. The resulting formula is useful for designing experiments which require image registration and must be performed using the standard method. Our analysis leads to the surprising fact that coarse-graining the data can improve the shift errors.

We develop a new method of image registration, which models the underlying image, shifts that to match the data, and follows Bayesian inference to select the image model for which there is the most evidence. Our theory reveals an inextricable relationship between image registration and super-resolution: ultimate shift precision is predicated on selecting a probable model. Therefore we named our new method Super Registration. We showed for periodic images that a Fourier series image model achieves errors consistent with the CRB. We demonstrated superior bias and expected error performance for general non-periodic images, and discussed the shortcomings of our general model. Finally, we showed that, despite marginal improvements in image quality as measured by eye, particle tracking experiments can be 10× more precise when using Super Registration reconstructions.

Our results can be extended to more general transformations: by application of the chain rule, each term in our calculation of the average bias and variance will be modified by partial derivatives. It is reasonable to assume that the same problems (nonzero bias and errors which are much larger than the CRB) will persist for transformations like affine skews, rotations, and non-rigid registrations. Super Registration can accommodate all of these problems by constructing the forward transformation instead of reconstructing the inverse transformation.

Finally, medical imaging consists of lining up images of the same tissue from different modes like X-ray and Magnetic Resonance Imaging (MRI) [6], [7]. The Super Registration method involves constructing a generative model for the data, and this perspective reminds us that contrast and features in X-ray and MRI will be different because they respond to different underlying tissue structures. Bias and large errors for this problem have been observed and attributed to this fact [37]. Therefore some underlying model of tissue component densities and a model of image formation (Super Registration) will be critical for accurately and precisely registering these images.

Image registration is a very important and fundamental problem in medical imaging, remote sensing, self-driving automobiles, non-destructive stress measurement, microscopy, and more. Our theoretical study of the fundamental problem of rigid shift registration in the presence of noise answers long-standing questions on the precision and accuracy of shift inference, elucidates an inextricable link between registration and super-resolution, and inspires a solution to these problems with wide applicability.

ACKNOWLEDGEMENTS

Thanks to Ismail El Baggari, S.B. Kachuck, K.P. O'Keeffe and D.B. Liarte for useful conversations in the preparation of this manuscript. This work was supported by the NSF Center for Bright Beams, award

REFERENCES

[1] B. D. Lucas, T. Kanade et al., “An iterative image registration technique with an application to stereo vision,” 1981.
[2] R. W. Wolcott and R. M. Eustice, “Visual localization within lidar maps for automated urban driving,” in Intelligent Robots and Systems (IROS 2014), 2014 IEEE/RSJ International Conference on. IEEE, 2014, pp. 176–183.
[3] D. Nicholson and A. Vecchio, “Bayesian bounds on parameter estimation accuracy for compact coalescing binary gravitational wave signals,” Physical Review D, vol. 57, no. 8, p. 4588, 1998.
[4] J. Inglada, V. Muron, D. Pichard, and T. Feuvrier, “Analysis of artifacts in subpixel remote sensing image registration,” IEEE Transactions on Geoscience and Remote Sensing, vol. 45, no. 1, pp. 254–264, 2007.
[5] M. Debella-Gilo and A. Kääb, “Sub-pixel precision image matching for measuring surface displacements on mass movements using normalized cross-correlation,” Remote Sensing of Environment, vol. 115, no. 1, pp. 130–142, 2011.
[6] L. Zöllei, J. W. Fisher III, and W. M. Wells III, “A unified statistical and information theoretic framework for multi-modal image registration,” in IPMI. Springer, 2003, pp. 366–377.
[7] M. E. Leventon and W. E. L. Grimson, “Multi-modal volume registration using joint intensity distributions,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 1998, pp. 1057–1066.
[8] B. H. Savitzky, I. E. Baggari, C. B. Clement, E. Waite, B. H. Goodge, D. J. Baek, J. P. Sheckelton, C. Pasco, H. Nair, N. J. Schreiber, J. Hoffman, A. S. Admasu, J. Kim, S.-W. Cheong, A. Bhattacharya, D. G. Schlom, T. M. McQueen, R. Hovden, and L. F. Kourkoutis, “Image registration of low signal-to-noise cryo-STEM data,” Ultramicroscopy.
[9] … Experimental Mechanics, vol. 53, no. 9, pp. 1743–1761, 2013.
[10] A. Bartesaghi, A. Merk, S. Banerjee, D. Matthies, X. Wu, J. L. Milne, and S. Subramaniam, “2.2 Å resolution cryo-EM structure of β-galactosidase in complex with a cell-permeant inhibitor,” Science, vol. 348, no. 6239, pp. 1147–1151, 2015.
[11] A. Bartesaghi, D. Matthies, S. Banerjee, A. Merk, and S. Subramaniam, “Structure of β-galactosidase at 3.2-Å resolution obtained by cryo-electron microscopy,” Proceedings of the National Academy of Sciences, vol. 111, no. 32, pp. 11709–11714, 2014.
[12] D. Zhang, Y. Zhu, L. Liu, X. Ying, C.-E. Hsiung, R. Sougrat, K. Li, and Y. Han, “Atomic-resolution transmission electron microscopy of electron beam–sensitive crystalline materials,” Science, vol. 359, no. 6376, pp. 675–679, 2018.
[13] Y. Zhu, J. Ciston, B. Zheng, X. Miao, C. Czarnik, Y. Pan, R. Sougrat, Z. Lai, C.-E. Hsiung, K. Yao et al., “Unravelling surface and interfacial structures of a metal–organic framework by transmission electron microscopy,” Nature Materials, vol. 16, no. 5, p. 532, 2017.
[14] D. Robinson and P. Milanfar, “Fundamental performance limits in image registration,” IEEE Transactions on Image Processing, vol. 13, no. 9, pp. 1185–1199, 2004.
[15] I. S. Yetik and A. Nehorai, “Performance bounds on image registration,” IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1737–1749, 2006.
[16] T. Q. Pham, M. Bezuijen, L. J. Van Vliet, C. Luengo Hendriks, and K. Schutte, “Performance of optimal registration estimators,” Proceedings of SPIE, vol. 5817, 2005.
[17] G. K. Rohde, A. Aldroubi, and D. M. Healy, “Interpolation artifacts in sub-pixel image registration,” IEEE Transactions on Image Processing, vol. 18, no. 2, pp. 333–345, 2009.
[18] H. W. Schreier, J. R. Braasch, and M. A. Sutton, “Systematic errors in digital image correlation caused by intensity interpolation,” Optical Engineering, vol. 39, no. 11, pp. 2915–2922, 2000.
[19] D. G. Bailey, A. Gilman, and R. Browne, “Bias characteristics of bilinear interpolation based registration,” in TENCON 2005 2005 IEEE Region 10. IEEE, 2005, pp. 1–6.
[20] M. L. Uss, B. Vozel, V. A. Dushepa, V. A. Komjak, and K. Chehdi, “A precise lower bound on image subpixel registration accuracy,” IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 6, pp. 3333–3345, 2014.
[21] J. Ziv and M. Zakai, “Some lower bounds on signal parameter estimation,” IEEE Transactions on Information Theory, vol. 15, no. 3, pp. 386–391, 1969.
[22] M. Xu, H. Chen, and P. K. Varshney, “Ziv–Zakai bounds on image registration,” IEEE Transactions on Signal Processing, vol. 57, no. 5, pp. 1745–1755, 2009.
[23] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley & Sons, 2012.
[24] G. Jacovitti and G. Scarano, “Discrete time techniques for time delay estimation,” IEEE Transactions on Signal Processing, vol. 41, no. 2, pp. 525–533, 1993.
[25] M. Guizar-Sicairos, S. T. Thurman, and J. R. Fienup, “Efficient subpixel image registration algorithms,” Optics Letters, vol. 33, no. 2, pp. 156–158, 2008.
[26] C. Aguerrebere, M. Delbracio, A. Bartesaghi, and G. Sapiro, “Fundamental limits in multi-image alignment,” IEEE Transactions on Signal Processing, vol. 64, no. 21, pp. 5707–5722, 2016.
[27] D. J. MacKay, “Hyperparameters: Optimize, or integrate out?” in Maximum Entropy and Bayesian Methods. Springer, 1996, pp. 43–59.
[28] N. G. De Bruijn, Asymptotic Methods in Analysis. Courier Corporation, 1970, vol. 4.
[29] B. F. Hutton and M. Braun, “Software for image registration: algorithms, accuracy, efficacy,” in Seminars in Nuclear Medicine, vol. 33, no. 3. Elsevier, 2003, pp. 180–192.
[30] T. C. Pekin, C. Gammer, J. Ciston, A. M. Minor, and C. Ophus, “Optimizing disk registration algorithms for nanobeam electron diffraction strain mapping,” Ultramicroscopy, vol. 176, pp. 170–176, 2017.
[31] V. Balasubramanian, “Statistical inference, Occam's razor, and statistical mechanics on the space of probability distributions,” Neural Computation, vol. 9, no. 2, pp. 349–368, 1997.
[32] D. Ulyanov, A. Vedaldi, and V. Lempitsky, “Deep image prior,” arXiv preprint arXiv:1711.10925, 2017.
[33] M. Bierbaum, B. D. Leahy, A. A. Alemi, I. Cohen, and J. P. Sethna, “Light microscopy at maximal precision,” Physical Review X, vol. 7, no. 4, p. 041007, 2017.
[34] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes 3rd Edition: The Art of Scientific Computing. Cambridge University Press, 2007.
[35] I. El Baggari, B. H. Savitzky, A. S. Admasu, J. Kim, S.-W. Cheong, R. Hovden, and L. F. Kourkoutis, “Nature and evolution of incommensurate charge order in manganites visualized with cryogenic scanning transmission electron microscopy,” Proceedings of the National Academy of Sciences, vol. 115, no. 7, pp. 1445–1450, 2018.
[36] A. B. Yankovich, B. Berkels, W. Dahmen, P. Binev, S. I. Sanchez, S. A. Bradley, A. Li, I. Szlufarska, and P. M. Voyles, “Picometre-precision analysis of scanning transmission electron microscopy images of platinum nanocatalysts,” Nature Communications, vol. 5, p. 4155, 2014.
[37] D. W. Tyler, “Intrinsic bias in Fisher information calculations for multi-mode image registration,”