Estimation in the group action channel
Emmanuel Abbe, João M. Pereira, and Amit Singer

The Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ, USA
Electrical Engineering Department, Princeton University, Princeton, NJ, USA
Department of Mathematics, Princeton University, Princeton, NJ, USA
Abstract—We analyze the problem of estimating a signal from multiple measurements on a group action channel, which linearly transforms a signal by a random group action followed by a fixed projection and additive Gaussian noise. This channel is motivated by applications such as multi-reference alignment and cryo-electron microscopy. We focus on the large noise regime prevalent in these applications. We give a lower bound on the mean square error (MSE) of any asymptotically unbiased estimator of the signal's orbit in terms of the signal's moment tensors, which implies that the MSE is bounded away from 0 when N/σ^{2d̄} is bounded from above, where N is the number of observations, σ is the noise standard deviation, and d̄ is the so-called moment order cutoff. In contrast, the maximum likelihood estimator is shown to be consistent if N/σ^{2d̄} diverges.

Index Terms—Multi-reference alignment, cryo-EM, Chapman-Robbins bound.
I. INTRODUCTION
In this paper, we consider the problem of estimating x ∈ R^L with N measurements from the group action channel, defined as

    Y_j = P G_j x + σ Z_j ∈ R^K,  j = 1, …, N,  (I.1)

where the Z_j are i.i.d. and drawn from N(0, I_K), i.e. Z_j ∈ R^K and its entries are i.i.d. standard Gaussian variables; P ∈ R^{K×L} is a known projection matrix; and the G_j are i.i.d. matrices drawn from a distribution θ on a compact subgroup Θ of O(L), the group of orthogonal matrices in R^L. The distribution θ is not known; however, the main goal is to estimate the signal x ∈ R^L.

The goal of this paper is to understand the sample complexity of (I.1), i.e. the relation between the number of measurements and the noise standard deviation such that an estimator X̂ of x converges in probability to the true value, up to a group action, as N diverges. Allowing for a group action is intrinsic to the problem: if we apply an element g of Θ to x, and its inverse g^{-1} to the right of θ, we produce exactly the same samples; thus no estimator X̂ is able to distinguish the observations that originate from x from the ones that originate from gx.

The model (I.1) is a generalization of multi-reference alignment (MRA), which arises in a variety of engineering and scientific applications, among them structural biology [1], [2], [3], radar [4], [5], robotics [6] and image processing [7], [8], [9]. The one-dimensional MRA problem, where Θ is the group generated by the matrix R that cyclically shifts the elements of the signal, i.e. maps (x_1, …, x_L) ↦ (x_L, x_1, …, x_{L−1}), has recently been a topic of active research. In [10], it was shown that the sample complexity is ω(σ^6) when θ is the uniform distribution and the projection matrix is the identity. A provable polynomial time estimator that achieves this sample complexity was presented in [11], while [12] presented a non-convex optimization framework that is more efficient in practice. Note that, when the projection matrix is the identity, we can always enforce a uniform distribution on Θ by applying a random group action, i.i.d. and drawn from the uniform distribution, to the observations. In [13], it was shown that ω(σ^6) is also the sample complexity if θ is unknown beforehand but is uniform or periodic, that is, θ = R^ℓ θ for some 1 ≤ ℓ ≤ L−1. However, if θ is aperiodic, the sample complexity is ω(σ^4). An efficient estimator is also presented there that uses the first and second moments of the signal over the group, which can be estimated with order σ^2 and σ^4 observations, respectively, thus achieving the sample complexity. The main result in this paper is a generalization of the information lower bound presented in [13]; the proof techniques remain the same.

We can also use (I.1) to model the problem of single particle reconstruction in cryo-electron microscopy (cryo-EM), in which a three-dimensional volume is recovered from two-dimensional noisy projections taken at unknown viewing directions [14], [15]. Here x is a linear combination of products of spherical harmonics and radial basis functions, Θ ≡ SO(3), and its elements act on x by rotating the basis functions. Finally, P is a tomographic projection onto the xy plane. The paper [16] considers the problem (I.1) with θ known and uniform. It obtains the same result for the sample complexity as this paper, and together with results from computational algebra and invariant theory verifies that in many cases the sample complexity for the considered cryo-EM model is ω(σ^6), and at least ω(σ^4) more generally. It also considers the problem of heterogeneity in cryo-EM.
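As a concrete illustration of the channel (I.1), the following minimal sketch simulates one-dimensional MRA observations, with Θ the cyclic shift group and P a coordinate projection; the function and parameter names are illustrative, not from the paper.

```python
import numpy as np

def simulate_channel(x, K, sigma, N, rng):
    """Draw N observations Y_j = P G_j x + sigma Z_j from (I.1), where G_j is
    a uniformly random cyclic shift and P keeps the first K coordinates."""
    L = len(x)
    shifts = rng.integers(0, L, size=N)        # G_j ~ uniform distribution theta
    Y = np.empty((N, K))
    for j, s in enumerate(shifts):
        Gx = np.roll(x, s)                     # group action: cyclic shift
        Y[j] = Gx[:K] + sigma * rng.standard_normal(K)   # projection + noise
    return Y

rng = np.random.default_rng(0)
x = np.array([0.0, 1.0, 2.0])
Y = simulate_channel(x, K=2, sigma=3.0, N=100_000, rng=rng)
print(Y.mean(axis=0))   # approximates the first moment, here (1, 1)
```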
II. THE MAIN RESULT

Since we can only determine x up to a group action, we define the best alignment of X̂ with x by

    φ_x(X̂) = argmin_{z ∈ {gX̂}_{g∈Θ}} ‖z − x‖²,  (II.1)

and the mean square error (MSE) as

    MSE := E[ min_{g∈Θ} ‖gX̂ − x‖² ] = E[ ‖φ_x(X̂) − x‖² ].  (II.2)

The expectation is taken over X̂, which is a function of the observations with distribution determined by (I.1). Since we are interested in estimators that converge to an orbit of x in probability as N diverges, we only consider estimators which are asymptotically unbiased, i.e., E[φ_x(X̂)] → x as N → ∞. However, the results presented in this paper can be adapted to biased estimators (see Theorem III.2).

Let us introduce some notation regarding tensors. For a vector x ∈ R^L, we denote by x^{⊗n} the tensor of order n and dimension L whose entry indexed by k = (k_1, …, k_n) ∈ Z^n_L is given by ∏_{j=1}^n x[k_j]. The space of order-n tensors forms a vector space, with sum and multiplication defined entry-wise. This vector space has inner product and norm defined by ⟨A, B⟩ = Σ_{k∈Z^n_L} A[k]B[k] and ‖A‖² = ⟨A, A⟩, respectively.
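For a finite group, the quantities in (II.1)–(II.2) and the tensor notation above can be computed directly; a minimal sketch (helper names are ours):

```python
import numpy as np
from functools import reduce

def tensor_power(x, n):
    """x^{(x)n}: order-n tensor with entries prod_j x[k_j]."""
    return reduce(np.multiply.outer, [x] * n)

def best_alignment_error(x_hat, x, group):
    """min_{g in Theta} ||g x_hat - x||^2 over a finite list of group matrices."""
    return min(np.sum((g @ x_hat - x) ** 2) for g in group)

L = 3
R = np.roll(np.eye(L), 1, axis=0)          # cyclic shift matrix
group = [np.linalg.matrix_power(R, k) for k in range(L)]

x = np.array([0.0, 1.0, 2.0])
A, B = tensor_power(x, 3), tensor_power(x + 0.1, 3)
print(np.sum(A * B))                                   # inner product <A, B>
print(best_alignment_error(np.roll(x, 1), x, group))   # 0: same orbit as x
```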
Definition II.1.
The n-th order moment of x over θ is the tensor of order n and dimension K, defined by

    M^n_{x,θ} := E[(PGx)^{⊗n}],  (II.3)

where G ∼ θ.

In this paper, we provide lower bounds for the MSE in terms of the noise standard deviation and the number of observations. We show a lower bound on the MSE in terms of N/σ^{2d̄}, where d̄ is the moment order cutoff, defined as the smallest order such that the moment tensors up to order d̄ define x unequivocally. We also show that if N ≫ σ^{2d̄}, then the marginalized maximum likelihood estimator (MLE) converges in probability to the true signal (up to a group action). We now present the main result of the paper.
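Definition II.1 can be evaluated exactly for a finite group by averaging over its elements; a sketch (illustrative names):

```python
import numpy as np
from functools import reduce

def moment_tensor(x, P, group, probs, n):
    """M^n_{x,theta} = E[(P G x)^{(x)n}] for a finite group Theta,
    where probs[i] is the probability theta assigns to group[i]."""
    M = 0.0
    for g, p in zip(group, probs):
        v = P @ (g @ x)                               # one projected orbit point
        M = M + p * reduce(np.multiply.outer, [v] * n)
    return M

L = 3
R = np.roll(np.eye(L), 1, axis=0)                     # cyclic shift
group = [np.linalg.matrix_power(R, k) for k in range(L)]
P = np.eye(L)[:2]                                     # keep first two coordinates
x = np.array([0.0, 1.0, 2.0])
print(moment_tensor(x, P, group, [1/3] * 3, 2))       # 2x2 second moment matrix
```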
Theorem II.2. Consider the estimation problem given by equation (I.1). For any signal x̃ ∈ R^L such that φ_x(x̃) ≠ x and for any group distribution θ̃, let

    K^n_{x̃,θ̃} = (1/n!) ‖M^n_{x̃,θ̃} − M^n_{x,θ}‖²,  d_{x̃,θ̃} = inf{ n : K^n_{x̃,θ̃} > 0 },

and define the moment order cutoff as d̄ = max_{x̃,θ̃} d_{x̃,θ̃}. Finally, let λ^m_N = N/σ^{2m}. We have
    MSE ≥ sup_{x̃,θ̃ : d_{x̃,θ̃} = d̄} ‖φ_x(x̃) − x‖² / ( exp(λ^d̄_N K^d̄_{x̃,θ̃}) − 1 + O(λ^d̄_N σ^{−2}) ),  (II.4)

thus the MSE is bounded away from zero if λ^d̄_N is bounded from above. Moreover, if lim_{N→∞} λ^d̄_N = ∞, then the MLE converges in probability to gx, for some element g ∈ Θ.

A. Taking the limit (x̃, θ̃) → (x, θ)

Theorem II.2 is an application of a modified Chapman-Robbins bound, presented later in Theorem III.2. On the other hand, the classical Cramér-Rao bound [17], which gives a lower bound on the variance of an estimator Ŝ of a parameter s ∈ R, can be obtained from the Chapman-Robbins bound by taking the limit s̃ → s. We present an analogous version of Theorem II.2, obtained by taking a similar limit.
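As a numerical illustration of (II.4), here is a sketch that evaluates the bound for a single alternative (x̃, θ̃), dropping the O(λ^d̄_N σ^{−2}) correction; all numerical values are made up for illustration:

```python
import numpy as np

def mse_lower_bound(dist2, K_bar, N, sigma, d_bar):
    """||phi_x(x~) - x||^2 / (exp(lambda * K) - 1), the leading term of (II.4)."""
    lam = N / sigma ** (2 * d_bar)        # lambda^{d_bar}_N = N / sigma^{2 d_bar}
    return dist2 / np.expm1(lam * K_bar)

# With d_bar = 3, keeping N of order sigma^4 (below the sigma^6 cutoff) makes
# lambda -> 0, and the lower bound does not vanish as the noise grows:
for sigma in (10.0, 20.0, 40.0):
    N = int(sigma ** 4)
    print(sigma, mse_lower_bound(dist2=1.0, K_bar=0.5, N=N, sigma=sigma, d_bar=3))
```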
Corollary II.3. Under the conditions of Theorem II.2, let x_h := (1−h)x + h x̃, θ_h := (1−h)θ + h θ̃,

    Q^n_{x̃,θ̃} = lim_{h→0} (1/(n! h²)) ‖M^n_{x_h,θ_h} − M^n_{x,θ}‖²,  q_{x̃,θ̃} = inf{ n : Q^n_{x̃,θ̃} > 0 },

and q̄ = max q_{x̃,θ̃}. Then

    MSE ≥ sup_{x̃,θ̃ : q_{x̃,θ̃} = q̄} ‖φ_x(x̃) − x‖² / ( λ^q̄_N Q^q̄_{x̃,θ̃} + O(λ^q̄_N σ^{−2}) ).  (II.5)

We leave the proof of this corollary to [13, Appendix C]. It is interesting to compare this bound with (II.4) when λ^d̄_N or λ^q̄_N diverge. If q̄ ≥ d̄, then (II.5) dominates (II.4), and the lower bound for the MSE is inversely proportional to λ^q̄_N, which is a behavior typical of estimation problems with continuous model parameters. On the other hand, if d̄ > q̄, then (II.4) dominates (II.5); the MSE then depends exponentially on λ^d̄_N, which is a behavior typical of discrete parameter estimation problems [18]. One can show that d̄ > q̄ only happens when the supremum in (II.4) is attained by some x* not in the orbit of x. The exponential decay in (II.4) is the same as the probability of error of the hypothesis test which decides whether the observations come from x or x*.

We conjecture that the lower bounds presented in this paper can be achieved asymptotically by the MLE. In fact, when the search space is discrete, the MLE achieves the least probability of error (assuming a uniform prior on the parameters), which behaves like (II.4). Also, when the search space is continuous, the MLE is asymptotically efficient, which means it achieves the Cramér-Rao lower bound. This bound is obtained from the Chapman-Robbins lower bound (which we use in this paper) by taking a similar limit as in (II.5), and it also scales inversely proportionally to the number of observations.

B. Prior Knowledge
The results presented can be adapted to improve the bound if we have prior knowledge about the signal and group distribution. If we know beforehand that (x, θ) ∈ A (for instance, x has a zero element or θ is the uniform distribution on Θ), we can instead define d̄ = max_{(x̃,θ̃)∈A} d_{x̃,θ̃} and restrict the supremum in (II.4) to (x̃, θ̃) in A.

C. Examples
1) Let x = (a, b, c) ∈ R^3; let Θ be the group generated by the cyclic shift matrix R that maps (a, b, c) ↦ (b, c, a); and let P project x onto its first two elements, i.e. P(a, b, c) = (a, b). Furthermore, we know a priori that one, and only one, of the elements of x is 0 (let us assume, without loss of generality, that a = 0), that the other two elements are distinct, and that θ is uniform, i.e. P(G = I) = P(G = R) = P(G = R²) = 1/3. We have

    M¹_{x,θ} = E[PGx] = (1/3)(0, b) + (1/3)(b, c) + (1/3)(c, 0) = ((b+c)/3, (b+c)/3)

and

    M²_{x,θ} = E[(PGx)(PGx)^T] = (1/3)[0 0; 0 b²] + (1/3)[b² bc; bc c²] + (1/3)[c² 0; 0 0] = (1/3)[b²+c² bc; bc b²+c²],

writing [· ·; · ·] for a 2×2 matrix. From these two moments we can solve for b and c; however, all these equations are symmetric in b and c, so we cannot identify which of the two values obtained is b and which is c. In other words, both x = (0, b, c) and x* = (0, c, b) are candidate solutions. However, M³_{x,θ} differs from M³_{x*,θ}: if we look at the entry of M³_{x,θ} indexed by (1, 2, 2), we note that

    M³_{x,θ}[1, 2, 2] = E[(PGx)_1 (PGx)²_2] = (1/3)·0·b² + (1/3)·b·c² + (1/3)·c·0² = (1/3) bc²,

and analogously M³_{x*,θ}[1, 2, 2] = (1/3) cb². Of the 8 entries of M³_{x,θ}, 2 are equal to the corresponding entries of M³_{x*,θ} and 6 differ by (1/3)|bc² − cb²| in absolute value, so ‖M³_{x*,θ} − M³_{x,θ}‖² = (2/3)(bc² − cb²)². This means d̄ = 3 and q̄ = 2, so if λ³_N is bounded, the lower bound (II.4) dominates (II.5); the supremum is attained at x* and

    MSE ≥ ‖φ_x(x*) − x‖² / ( exp( (1/9) λ³_N (bc² − cb²)² ) − 1 + O(λ³_N σ^{−2}) ).

Note that ‖φ_x(x*) − x‖² = 2 min(b², c², (b − c)²).

2) Let x = (a, b) ∈ R²; let Θ be the group generated by the cyclic shift matrix R that maps (a, b) ↦ (b, a); and let P project x onto its first element, i.e. P(a, b) = a. Furthermore, we know a priori that θ is uniform, i.e. P(G = I) = P(G = R) = 1/2. We have

    M¹_{x,θ} = E[PGx] = (1/2)a + (1/2)b = (a + b)/2

and

    M²_{x,θ} = E[(PGx)²] = (1/2)a² + (1/2)b² = (a² + b²)/2.

From these two moments we can determine a and b up to an action of the group. Now take x_h = (a + h, b − h), so that M¹_{x_h,θ} = M¹_{x,θ}. We have

    lim_{h→0} (1/h)(M²_{x_h,θ} − M²_{x,θ}) = a − b.

Here q̄ = d̄ = 2, so if λ²_N diverges, (II.5) dominates (II.4); taking x̃ = x + t(1, −1) with t small, for which ‖φ_x(x̃) − x‖² = 2t² and Q²_{x̃,θ} = t²(a − b)²/2, the lower bound is

    MSE ≥ 4 / ( λ²_N (a − b)² + O(λ²_N σ^{−2}) ).
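A quick numerical check of Example 1 (a sketch; the moments are computed as in Definition II.1 for this finite group): the first two moments of x = (0, b, c) and x* = (0, c, b) coincide, while the third moments differ, giving d̄ = 3.

```python
import numpy as np
from functools import reduce

def moment(x, n):
    """M^n_{x,theta}: uniform cyclic group on R^3, P = first two coordinates."""
    points = [np.roll(x, k)[:2] for k in range(3)]    # the projected orbit P G x
    return sum(reduce(np.multiply.outer, [v] * n) for v in points) / 3

b, c = 1.0, 2.0
x_true = np.array([0.0, b, c])
x_swap = np.array([0.0, c, b])                        # the candidate x*

for n in (1, 2, 3):
    print(n, np.sum((moment(x_true, n) - moment(x_swap, n)) ** 2))
# prints 0, 0, and (2/3)*(b*c**2 - c*b**2)**2 = 8/3 for these values of b, c
```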
III. PROOF TECHNIQUES
The outline of the proof is as follows. In Section III-A we use an adaptation of the Chapman-Robbins lower bound [19] to derive a lower bound on the MSE in terms of the χ² divergence; this is Theorem III.2. Then, in Section III-B, we express the χ² divergence in terms of the Taylor expansion of the posterior probability density and the moment tensors, obtaining Lemma III.3. Finally, in Section III-C we combine Theorem III.2 and Lemma III.3 to obtain (II.4), use Lemma III.3 to obtain a similar Taylor expansion for the Kullback-Leibler (KL) divergence, and use this to show that the MLE is consistent.

Throughout the paper we denote the expectation by E, and use capital letters for random variables and lower case letters for instances of these random variables. Let Y^N ∈ R^{K×N} be the collection of all measurements as columns in a matrix. Let us denote by f^N_{x,θ} the probability density of the posterior distribution of Y^N,

    f^N_{x,θ}(y^N) = ∏_{j=1}^N f_{x,θ}(y_j),  (III.1)

and the expectation of a function g of the measurements under the measure f^N_{x,θ} by

    E_{x,θ}[g(Y^N)] := ∫_{R^{K×N}} g(y^N) f^N_{x,θ}(y^N) dy^N.

For ease of notation, we write E[g(Y^N)] when the signal and distribution are implicit. The bias-variance decomposition of the MSE is given by

    MSE = tr(Cov[φ_x(X̂)]) + ‖E[φ_x(X̂)] − x‖²,  (III.2)

with

    Cov[φ_x(X̂)] = E[φ_x(X̂) φ_x(X̂)^T] − E[φ_x(X̂)] E[φ_x(X̂)]^T.  (III.3)

Our last definition is of the χ² divergence, which gives a measure of how "far" apart two probability distributions are.
Definition III.1. The χ² divergence between two probability densities f_A and f_B is defined by

    χ²(f_A || f_B) := E[ ( f_A(B)/f_B(B) − 1 )² ],

where B ∼ f_B.

Due to equation (III.1), the relation between the χ² divergence for N observations and for one observation is given by

    χ²(f^N_{x̃,θ̃} || f^N_{x,θ}) = (1 + χ²(f_{x̃,θ̃} || f_{x,θ}))^N − 1.  (III.4)
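Since f_{x,θ} is a Gaussian mixture over the group, χ²(f_{x̃,θ̃} || f_{x,θ}) can be estimated by Monte Carlo, sampling B ∼ f_{x,θ} as in Definition III.1; a sketch for the cyclic example (parameter choices are ours):

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_pdf(y, x, sigma, P):
    """f_{x,theta}(y) for uniform theta over cyclic shifts of x."""
    L = len(x)
    comps = [multivariate_normal.pdf(y, mean=(P @ np.roll(x, k)), cov=sigma**2)
             for k in range(L)]
    return np.mean(comps, axis=0)

rng = np.random.default_rng(1)
sigma, n_mc = 2.0, 200_000
x  = np.array([0.0, 1.0, 2.0])
xt = np.array([0.0, 2.0, 1.0])                # the alternative signal x~
P  = np.eye(3)[:2]

# Sample B ~ f_{x,theta}, then average (f_{x~,theta}(B)/f_{x,theta}(B) - 1)^2.
shifts = rng.integers(0, 3, size=n_mc)
B = np.array([P @ np.roll(x, s) for s in shifts]) \
    + sigma * rng.standard_normal((n_mc, 2))
ratio = mixture_pdf(B, xt, sigma, P) / mixture_pdf(B, x, sigma, P)
print(np.mean((ratio - 1.0) ** 2))
# As sigma grows, this approaches sigma^{-6}/3! * ||M^3_{x~} - M^3_x||^2,
# in line with Lemma III.3 below (here d = 3).
```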
A. Chapman-Robbins lower bound for an orbit

The classical Chapman-Robbins bound gives a lower bound on an error metric of the form E[‖X̂ − x‖²], hence we modify it to accommodate the group invariant metric defined in (II.2). We point out that Cov[φ_x(X̂)] is related to the MSE by (III.2).

Theorem III.2 (Chapman-Robbins for orbits). For any x̃ ∈ R^L and group distribution θ̃ on Θ, we have

    Cov[φ_x(X̂)] ⪰ zz^T / χ²(f^N_{x̃,θ̃} || f^N_{x,θ}),

where z = E_{x̃,θ̃}[φ_x(X̂)] − E_{x,θ}[φ_x(X̂)].

Proof. The proof mimics that of the classical Chapman and Robbins bound, and is also presented in [13, Appendix A]. Define

    V := f^N_{x̃,θ̃}(Y^N) / f^N_{x,θ}(Y^N),

and note that
• E_{x,θ}[g(Y^N) V] = E_{x̃,θ̃}[g(Y^N)],
• E_{x,θ}[V − 1] = 0,
• E_{x,θ}[(V − 1)²] = χ²(f^N_{x̃,θ̃} || f^N_{x,θ}).

We have, for any w ∈ R^L,

    w^T ( E_{x̃,θ̃}[φ_x(X̂)] − E_{x,θ}[φ_x(X̂)] ) = E_{x,θ}[ w^T ( φ_x(X̂) − E_{x,θ}[φ_x(X̂)] ) (V − 1) ],

and by Cauchy-Schwarz,

    [ w^T ( E_{x̃,θ̃}[φ_x(X̂)] − E_{x,θ}[φ_x(X̂)] ) ]² ≤ E_{x,θ}[ ( w^T (φ_x(X̂) − E_{x,θ}[φ_x(X̂)]) )² ] χ²(f^N_{x̃,θ̃} || f^N_{x,θ}),

which is equivalent to the statement. ∎

B. χ² divergence and moment tensors

In this subsection we give a characterization of the χ² divergence, which appears in the Chapman-Robbins bound, in terms of the moment tensors.

Instead of considering the posterior probability density of Y^N, we consider its normalized version Ỹ^N = Y^N/σ. We then have

    Ỹ_j = γ P G_j x + Z_j,  (III.5)

where γ = 1/σ, G_j ∼ θ and Z_j ∼ N(0, I). While this change of variable does not change the χ² divergence, we can now take the Taylor expansion of the probability density around γ = 0, that is,

    f_{x,θ}(y; γ) = f_Z(y) Σ_{j=0}^∞ α^j_{x,θ}(y) γ^j / j!,  (III.6)

where f_Z(y) = f_{x,θ}(y; 0) is the probability density of Z_j (since when γ = 0, Ỹ_j = Z_j) and

    α^j_{x,θ}(y) := (1/f_Z(y)) ∂^j f_{x,θ}/∂γ^j (y; 0),  (III.7)

thus α^0_{x,θ}(y) = 1. We note that f_{x,θ}(y; γ) is infinitely differentiable for all y ∈ R^K, thus α^j_{x,θ}(y) is always well-defined. We now use (III.6) to give an expression of the χ² divergence in terms of the moment tensors.
Lemma III.3. The divergence χ²(f_{x̃,θ̃} || f_{x,θ}) is expressed in terms of the moment tensors as

    χ²(f_{x̃,θ̃} || f_{x,θ}) = σ^{−2d}/(d!)² E[ ( α^d_{x̃,θ̃}(Z) − α^d_{x,θ}(Z) )² ] + O(σ^{−2d−2})  (III.8)
                            = σ^{−2d}/d! ‖M^d_{x̃,θ̃} − M^d_{x,θ}‖² + O(σ^{−2d−2}),  (III.9)

where d = inf{ n : ‖M^n_{x̃,θ̃} − M^n_{x,θ}‖ > 0 }.

Proof. This proof is presented in more detail in [13, Appendix B]. Equation (III.8) is obtained by Taylor expanding the χ² divergence around γ = 0, using (III.6) and the fact that α^n_{x̃,θ̃}(z) = α^n_{x,θ}(z) almost surely for all n < d, which follows from the definition of d and equation (III.10) below. Now, to prove (III.9), it is enough to show that

    E[ α^d_{x̃,θ̃}(Z) α^d_{x,θ}(Z) ] = d! ⟨M^d_{x̃,θ̃}, M^d_{x,θ}⟩.  (III.10)

Let G and G̃ be two independent random variables such that G ∼ θ and G̃ ∼ θ̃. On one hand, we have

    ⟨M^d_{x̃,θ̃}, M^d_{x,θ}⟩ = E[ ⟨P G̃ x̃, P G x⟩^d ].  (III.11)

On the other hand, we can write f_{x,θ} explicitly as

    f_{x,θ}(y) = E_G[ f_Z(y − γ P G x) ],  (III.12)

where G ∼ θ, and using equation (III.7) we can write

    E[ α^d_{x̃,θ̃}(Z) α^d_{x,θ}(Z) ]
    = ∂^d/∂γ̃^d ∂^d/∂γ^d E[ (f_Z(Z − γ̃ P G̃ x̃)/f_Z(Z)) (f_Z(Z − γ P G x)/f_Z(Z)) ] |_{γ̃,γ=0}
    = E[ ∂^d/∂γ̃^d ∂^d/∂γ^d exp( γ γ̃ ⟨P G̃ x̃, P G x⟩ ) ] |_{γ̃,γ=0}
    = d! E[ ⟨P G̃ x̃, P G x⟩^d ],

where G and G̃ are defined as in (III.11), and (III.10) finally follows from equation (III.11). ∎
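The Gaussian identity used in the last display, E_Z[f_Z(Z − u) f_Z(Z − v)/f_Z(Z)²] = exp(⟨u, v⟩) for Z ∼ N(0, I), can be checked numerically; a sketch with arbitrary test vectors:

```python
import numpy as np

rng = np.random.default_rng(2)
u = np.array([0.3, -0.2])
v = np.array([0.1, 0.4])

# For standard Gaussian f_Z, the ratio f_Z(Z-u) f_Z(Z-v) / f_Z(Z)^2
# simplifies to exp(<Z, u+v> - (||u||^2 + ||v||^2)/2).
Z = rng.standard_normal((1_000_000, 2))
ratio = np.exp(Z @ (u + v) - (u @ u + v @ v) / 2)
print(ratio.mean(), np.exp(u @ v))   # the two numbers should agree closely
```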
C. Final details of the proof of Theorem II.2

By Theorem III.2, Lemma III.3, and equations (III.3) and (III.4), we obtain
    MSE ≥ ‖φ_x(x̃) − x‖² ( (1 + σ^{−2d} K^d_{x̃,θ̃} + O(σ^{−2d−2}))^N − 1 )^{−1}.  (III.13)

Equation (II.4) now follows from

    (1 + σ^{−2d} K^d_{x̃,θ̃} + O(σ^{−2d−2}))^N = exp(λ^d_N K^d_{x̃,θ̃}) + O(λ^d_N σ^{−2})

and taking the supremum over x̃ and θ̃.

Finally, we prove that the MLE is consistent, i.e. it converges to the true signal in probability (up to a group action), when lim_{N→∞} λ^d̄_N = ∞. Let

    L_N(x̃, θ̃) := (σ^{2d̄}/N) Σ_{i=1}^N log[ f_{x̃,θ̃}(Ỹ_i) / f_{x,θ}(Ỹ_i) ].  (III.14)

The MLE is given by X̂_MLE = argmax_{x̃} max_{θ̃} L_N(x̃, θ̃). Fix x̃ and θ̃, and for ease of notation let d = d_{x̃,θ̃}. We can write

    L_N(x̃, θ̃) = σ^{2(d̄−d)} (1/(N/σ^{2d})) Σ_{i=1}^{N/σ^{2d}} (1/σ^{2d}) Σ_{j=1}^{σ^{2d}} log[ f_{x̃,θ̃}(Ỹ_{σ^{2d}(i−1)+j}) / f_{x,θ}(Ỹ_{σ^{2d}(i−1)+j}) ].

We have

    E[ Σ_{j=1}^{σ^{2d}} log( f_{x̃,θ̃}(Ỹ_j) / f_{x,θ}(Ỹ_j) ) ] = σ^{2d} E[ log( f_{x̃,θ̃}(Ỹ_1) / f_{x,θ}(Ỹ_1) ) ] = −σ^{2d} D(f_{x,θ} || f_{x̃,θ̃}),

where D denotes the KL divergence, defined for two probability densities f_A and f_B as

    D(f_A || f_B) := E[ log( f_A(A) / f_B(A) ) ],

where A ∼ f_A. Using (III.6), with γ = 1/σ, we have f_{x̃,θ̃} → f_{x,θ} as γ → 0, which implies by [20, Section F, Theorem 9] that

    lim_{γ→0} D(f_{x,θ} || f_{x̃,θ̃}) / χ²(f_{x̃,θ̃} || f_{x,θ}) = 1/2,

and thus

    D(f_{x,θ} || f_{x̃,θ̃}) = σ^{−2d}/(2 · d!) ‖M^d_{x̃,θ̃} − M^d_{x,θ}‖² + O(σ^{−2d−2}).

Thus, by the law of large numbers, since N/σ^{2d̄} diverges,

    L_N(x̃, θ̃) → −∞ if d_{x̃,θ̃} < d̄, and L_N(x̃, θ̃) → −(1/(2 · d̄!)) ‖M^d̄_{x̃,θ̃} − M^d̄_{x,θ}‖² otherwise.

As N → ∞, the maximum of L_N(x̃, θ̃) tends to 0 in probability, and is achieved when φ_x(x̃) = x; thus the MLE must converge in probability to gx for some g ∈ Θ.
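The marginalized MLE itself can be sketched for the cyclic example; this is a simplified candidate-comparison illustration with P = I (not the paper's algorithm, which would optimize over both x̃ and θ̃):

```python
import numpy as np
from scipy.special import logsumexp

def marginal_loglik(x_cand, Y, sigma):
    """Marginalized log-likelihood sum_i log f_{x,theta}(Y_i), up to an additive
    constant, for uniform theta over cyclic shifts and P = identity."""
    L = len(x_cand)
    lls = np.stack([-np.sum((Y - np.roll(x_cand, k)) ** 2, axis=1) / (2 * sigma**2)
                    for k in range(L)])       # log-density per shift, per sample
    return np.sum(logsumexp(lls, axis=0) - np.log(L))

rng = np.random.default_rng(3)
x = np.array([0.0, 1.0, 2.0])
sigma, N = 1.0, 5000
Y = np.array([np.roll(x, s) for s in rng.integers(0, 3, size=N)]) \
    + sigma * rng.standard_normal((N, 3))

# Compare the true signal against the "swapped" alternative from Example 1:
for cand in (x, np.array([0.0, 2.0, 1.0])):
    print(cand, marginal_loglik(cand, Y, sigma))   # the true orbit should win
```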
ACKNOWLEDGMENTS

EA was partly supported by the NSF CAREER Award CCF-1552131, ARO grant W911NF-16-1-0051, and the NSF Center for the Science of Information CCF-0939370. JP and AS were partially supported by Award Number R01GM090200 from the NIGMS, the Simons Foundation Investigator Award and Simons Collaborations on Algorithms and Geometry, the Moore Foundation Data-Driven Discovery Investigator Award, and AFOSR FA9550-17-1-0291. We would like to thank Afonso Bandeira, Tamir Bendory, Joseph Kileel, William Leeb and Nir Sharon for many insightful discussions.
REFERENCES

[1] W. Park, C. R. Midgett, D. R. Madden, and G. S. Chirikjian, "A stochastic kinematic model of class averaging in single-particle electron microscopy," The International Journal of Robotics Research, vol. 30, no. 6, pp. 730–754, 2011.
[2] W. Park and G. S. Chirikjian, "An assembly automation approach to alignment of noncircular projections in electron microscopy," IEEE Transactions on Automation Science and Engineering, vol. 11, no. 3, pp. 668–679, 2014.
[3] S. H. Scheres, M. Valle, R. Nuñez, C. O. Sorzano, R. Marabini, G. T. Herman, and J.-M. Carazo, "Maximum-likelihood multi-reference refinement for electron microscopy images," Journal of Molecular Biology, vol. 348, no. 1, pp. 139–149, 2005.
[4] J. P. Zwart, R. van der Heiden, S. Gelsema, and F. Groen, "Fast translation invariant classification of HRR range profiles in a zero phase representation," IEE Proceedings - Radar, Sonar and Navigation, vol. 150, no. 6, pp. 411–418, 2003.
[5] R. Gil-Pita, M. Rosa-Zurera, P. Jarabo-Amores, and F. López-Ferreras, "Using multilayer perceptrons to align high range resolution radar signals," in International Conference on Artificial Neural Networks, pp. 911–916, Springer, 2005.
[6] D. M. Rosen, L. Carlone, A. S. Bandeira, and J. J. Leonard, "A certifiably correct algorithm for synchronization over the special Euclidean group," arXiv:1611.00128, 2016.
[7] I. L. Dryden and K. V. Mardia, Statistical Shape Analysis, vol. 4. J. Wiley Chichester, 1998.
[8] H. Foroosh, J. B. Zerubia, and M. Berthod, "Extension of phase correlation to subpixel registration," IEEE Transactions on Image Processing, vol. 11, no. 3, pp. 188–200, 2002.
[9] D. Robinson, S. Farsiu, and P. Milanfar, "Optimal registration of aliased images using variable projection with applications to super-resolution," The Computer Journal, vol. 52, no. 1, pp. 31–42, 2009.
[10] A. Bandeira, P. Rigollet, and J. Weed, "Optimal rates of estimation for multi-reference alignment," arXiv:1702.08546, 2017.
[11] A. Perry, J. Weed, A. Bandeira, P. Rigollet, and A. Singer, "The sample complexity of multi-reference alignment," arXiv:1707.00943, 2017.
[12] T. Bendory, N. Boumal, C. Ma, Z. Zhao, and A. Singer, "Bispectrum inversion with application to multireference alignment," IEEE Transactions on Signal Processing. To appear.
[13] E. Abbe, T. Bendory, W. Leeb, J. M. Pereira, N. Sharon, and A. Singer, "Multireference alignment is easier with an aperiodic translation distribution," arXiv:1710.02793, 2017.
[14] A. Bartesaghi, A. Merk, S. Banerjee, D. Matthies, X. Wu, J. L. Milne, and S. Subramaniam, "2.2 Å resolution cryo-EM structure of β-galactosidase in complex with a cell-permeant inhibitor," Science, vol. 348, no. 6239, pp. 1147–1151, 2015.
[15] D. Sirohi, Z. Chen, L. Sun, T. Klose, T. C. Pierson, M. G. Rossmann, and R. J. Kuhn, "The 3.8 Å resolution cryo-EM structure of Zika virus," Science, vol. 352, no. 6284, pp. 467–470, 2016.
[16] A. S. Bandeira, B. Blum-Smith, A. Perry, J. Weed, and A. S. Wein, "Estimation under group actions: recovering orbits from invariants," arXiv:1712.10163, 2017.
[17] H. Cramér, Mathematical Methods of Statistics (PMS-9), vol. 9. Princeton University Press, 2016.
[18] E. Abbe, J. M. Pereira, and A. Singer, "Sample complexity of the Boolean multireference alignment problem," in 2017 IEEE International Symposium on Information Theory (ISIT), pp. 1316–1320, June 2017.
[19] D. G. Chapman and H. Robbins, "Minimum variance estimation without regularity assumptions," The Annals of Mathematical Statistics, vol. 22, no. 4, pp. 581–586, 1951.