Steerable ePCA: Rotationally Invariant Exponential Family PCA
Zhizhen Zhao, Member, IEEE, Lydia T. Liu, and Amit Singer
Abstract—In photon-limited imaging, the pixel intensities are affected by photon count noise. Many applications, such as 3-D reconstruction using correlation analysis in X-ray free electron laser (XFEL) single molecule imaging, require an accurate estimation of the covariance of the underlying 2-D clean images. Accurate estimation of the covariance from low-photon-count images must take into account that pixel intensities are Poisson distributed, hence the classical sample covariance estimator is sub-optimal. Moreover, in single molecule imaging, including in-plane rotated copies of all images could further improve the accuracy of covariance estimation. In this paper we introduce an efficient and accurate algorithm for covariance matrix estimation of count noise 2-D images, including their uniform planar rotations and possibly reflections. Our procedure, steerable ePCA, combines in a novel way two recently introduced innovations. The first is a methodology for principal component analysis (PCA) for Poisson distributions, and more generally, exponential family distributions, called ePCA. The second is steerable PCA, a fast and accurate procedure for including all planar rotations in PCA. The resulting principal components are invariant to the rotation and reflection of the input images. We demonstrate the efficiency and accuracy of steerable ePCA in numerical experiments involving simulated XFEL datasets and rotated Yale B face data.
Index Terms—Poisson noise, X-ray free electron laser, steerable PCA, eigenvalue shrinkage, autocorrelation analysis, image denoising.
I. INTRODUCTION
X-ray free electron laser (XFEL) is an emerging imaging technique for elucidating the three-dimensional structure of molecules [1], [2]. Single molecule XFEL imaging collects two-dimensional diffraction patterns of single particles at random orientations. The images are very noisy due to the low photon count, and the detector count noise follows an approximately Poisson distribution. Since only one diffraction pattern is captured per particle and the particle orientations are unknown, it is challenging to reconstruct the 3-D structure at low signal-to-noise ratio (SNR). One approach is to use expectation-maximization (EM) [3], [4], but it has a high computational cost at low SNR. Alternatively, assuming that the orientations of the particles are uniformly distributed over the special orthogonal group SO(3), Kam's correlation analysis [5]–[12] bypasses orientation estimation and requires just one pass over the data, thus alleviating the computational cost. Kam's method requires a robust estimation of the covariance matrix of the noiseless 2-D images. This serves as our main motivation for developing efficient and accurate covariance estimation and denoising methods for Poisson data. Nevertheless, the methods presented here are quite general and can be applied to other imaging modalities involving Poisson distributions.

Principal Component Analysis (PCA) is widely used for dimension reduction and denoising of large datasets [13], [14]. However, it is most naturally designed for Gaussian data, and there is no commonly agreed upon extension to non-Gaussian settings such as exponential family distributions [13, Sec. 14.4]. For denoising with non-Gaussian noise, popular approaches reduce it to the Gaussian case by a wavelet transform such as a Haar transform [15]; by adaptive wavelet shrinkage [15], [16]; or by approximate variance stabilization such as the Anscombe transform [17]. The latter is known to work well for Poisson signals with large parameters, due to approximate normality. However, the normal approximation breaks down for Poisson distributions with a small parameter, such as photon-limited XFEL [18, Sec. 6.6].

ZZ is with the Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, 61820 USA (e-mail: [email protected]). LTL is with the Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, CA, 94720 (e-mail: [email protected]). AS is with the Department of Mathematics, and Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ, 08544 (e-mail: [email protected]).
Other methods are based on alternating minimization [19]–[21], singular value thresholding (SVT) [22], [23], and Bayesian techniques [24]. Many of the aforementioned methods, such as [19], are computationally intractable for large datasets and do not have statistical guarantees for covariance estimation. In particular, [21] applied the methodology of [19] to denoise a single image by performing PCA on clusters of patches extracted from the image; it is not suitable for our problem setting, where the goal is to simultaneously denoise a large number of images and estimate their covariance. Recently, [25] introduced exponential family PCA (ePCA), which extends PCA to a wider class of distributions. It involves the eigendecomposition of a new covariance matrix estimator, constructed in a deterministic and non-iterative way using moment calculations and shrinkage. ePCA was shown to be more accurate than PCA and its alternatives for exponential families. Its computational cost is similar to that of PCA, it has substantial theoretical justification building on random matrix theory, and its output is interpretable. We refer readers to [25] for experiments that benchmark ePCA against previous methods.

In XFEL imaging, the orientations of the particles are uniformly distributed over SO(3), and it is therefore equally likely to observe any planar rotation of a given 2-D diffraction pattern. Therefore, it makes sense to include all possible in-plane rotations of the images when performing ePCA. To this end, we incorporate steerability in ePCA by adapting the steerable PCA algorithm, which avoids duplicating rotated images [26]. The concept of a steerable filter was introduced in [27], and various methods were proposed for computing data-adaptive steerable filters [28]–[31]. We take into account the action of the group O(2) on diffraction patterns by in-plane rotation (and reflection if needed).
The resulting principal components are invariant to any O(2) transformation of the input images.

The new algorithm, which we refer to as steerable ePCA, combines ePCA and steerable PCA in a natural way. Steerable ePCA does not involve any optimization; all steps involve only basic linear algebra operations. The various steps include expansion in a steerable basis, eigendecomposition, eigenvalue shrinkage, and different normalizations. The mathematical and statistical rationale for all steps is provided in Section II.

We illustrate the improvement in estimating the covariance matrix through image denoising in Section III. Specifically, we introduce a Wiener-type filtering using the principal components. Rotation invariance enhances the effectiveness of ePCA in covariance estimation and thus achieves better denoising. In addition, the denoised expansion coefficients are useful in building rotationally invariant image features (i.e., bispectrum-like features [32]). We verify the benefits of steerable ePCA through numerical experiments with simulated XFEL diffraction patterns and a natural image dataset, the Yale B face database [33], [34]. As in the case of standard PCA, the computational complexity of steerable ePCA is lower than that of ePCA.

An implementation of steerable ePCA in MATLAB is publicly available at github.com/zhizhenz/sepca/.

II. METHODS
The goal of steerable ePCA is to estimate the rotationally invariant covariance matrix from images whose pixel intensities are affected by photon count noise. To develop this estimator, we combine our previous works on steerable PCA and ePCA in a novel way. The main challenge is that steerable PCA involves a (nearly orthogonal) change of basis from a Cartesian grid to a steerable basis. While multidimensional Gaussian distributions are invariant to orthogonal transformations, the multidimensional Poisson distribution is not. Therefore, steerable PCA and ePCA cannot be naively combined in a straightforward manner. Algorithm 1 details the steps of the combined procedure that achieves the goal of estimating a rotationally invariant covariance matrix from Cartesian grid images with pixel intensities following a Poisson distribution. The following subsections explain the main concepts underlying ePCA and steerable PCA, and how they are woven together in a manner that overcomes the aforementioned challenge.
A. The observation model and homogenization
We adopt the same observation model introduced in [25]. We observe n noisy images Y_i ∈ R^p (i.e., p is the number of pixels), for i = 1, ..., n. These are random vectors sampled from a hierarchical model defined as follows. First, a latent vector, or hyperparameter, ω ∈ R^p is drawn from a probability distribution P with mean µ_ω and covariance matrix Σ_ω. Conditional on ω, each coordinate of Y = (Y(1), ..., Y(p))^⊤ is drawn independently from a canonical one-parameter exponential family,

Y(j) | ω(j) ∼ p_{ω(j)}(y),  Y = (Y(1), ..., Y(p))^⊤,  (1)

with density

p_{ω(j)}(y) = exp[ω(j) y − A(ω(j))]  (2)

with respect to a σ-finite measure ν on R, where the j-th entry of the latent vector, ω(j) ∈ R, is the natural parameter of the family and A(ω(j)) = log ∫ exp(ω(j) y) dν(y) is the corresponding log-partition function. The mean and variance of the random variable Y(j) can be expressed as A′(ω(j)) and A′′(ω(j)), where we denote A′(ω) = dA(ω)/dω. Therefore, the mean of Y conditional on ω is

X := E(Y | ω) = (A′(ω(1)), ..., A′(ω(p)))^⊤ = A′(ω),

so the noisy data vector Y can be expressed as Y = A′(ω) + ε̃, with E(ε̃ | ω) = 0, and the marginal mean of Y is E[Y] = E[A′(ω)]. Thus, one can think of Y as a noisy realization of the clean vector X = A′(ω). However, the latent vector ω is also random and varies from sample to sample. In the XFEL application, X = A′(ω) are the unobserved noiseless images, and their randomness stems from the random (and unobserved) orientation of the molecule. We may write Y = A′(ω) + diag[A′′(ω)]^{1/2} ε, where the coordinates of ε are conditionally independent and standardized given ω.
The covariance of Y is given by the law of total covariance:

Cov[Y] = Cov[E(Y | ω)] + E[Cov[Y | ω]] = Cov[A′(ω)] + E diag[A′′(ω)].  (3)

(Here diag[x] for x ∈ R^p denotes a p × p diagonal matrix whose diagonal entries are x(j) for j = 1, ..., p.) The off-diagonal entries of the covariance matrix of the noisy images are therefore the same as those of the clean images. However, the diagonal of the covariance matrix (i.e., the variance) of the noisy images is further inflated by the noise variance. Unlike white noise, the noise variance here often changes from one coordinate to another (i.e., it is heteroscedastic). In ePCA, the homogenization step is a particular weighting method that improves the signal-to-noise ratio [25, Section 4.2.2]. Specifically, the homogenized vector is defined as

Z = diag[E A′′(ω)]^{−1/2} Y = diag[E A′′(ω)]^{−1/2} A′(ω) + ε.  (4)

Then the corresponding homogenized covariance matrix becomes

Cov[Z] = diag[E A′′(ω)]^{−1/2} Cov[A′(ω)] diag[E A′′(ω)]^{−1/2} + I.  (5)

This step is commonly called "prewhitening" in signal and image processing, while "homogenization" is more commonly used in statistics; the two terms are synonyms in the context of this paper. In Section II-C, we discuss how to estimate the homogenized rotationally invariant covariance matrix.

For the special case of Poisson observations Y ∼ Poisson_p(X), where X ∈ R^p is random, we can write Y = X + diag(X)^{1/2} ε. The natural parameter is the vector ω with ω(j) = log X(j), and A′(ω(j)) = A′′(ω(j)) = exp(ω(j)) = X(j). Therefore, we have E[Y] = E[X], and

Cov[Y] = Cov[X] + diag[E X].  (6)

In words, while the mean of the noisy images agrees with the mean of the clean images, their covariance matrices differ by a diagonal matrix that depends solely on the mean image.
If we homogenize the data by Z = diag(E[X])^{−1/2} Y, then Eq. (6) becomes

Cov[Z] = diag(E[X])^{−1/2} Cov[Y] diag(E[X])^{−1/2} = diag(E[X])^{−1/2} Cov[X] diag(E[X])^{−1/2} + I.  (7)

We can estimate the covariance of X from the homogenized covariance according to Eq. (7), i.e., Cov[X] = diag(E[X])^{1/2} (Cov[Z] − I) diag(E[X])^{1/2}. In Section II-E, we detail the corresponding recoloring step to estimate Cov[X]. In Alg. 1, Steps 1 and 5, with D_n = diag[f̄], correspond to the homogenization procedure in ePCA. We provide more details on how to estimate the homogenized rotationally invariant covariance matrix (Steps 1–5) in Sections II-B and II-C.
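For the Poisson case, the identities in Eqs. (6) and (7) are easy to verify numerically. The following is an illustrative sketch only (the paper's implementation is in MATLAB; the NumPy code and all variable names here are our own): it draws random clean vectors X, Poisson counts Y, prewhitens by diag(E[X])^{−1/2}, and compares the sample covariance of the homogenized data with the right-hand side of Eq. (7).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative check of Eqs. (6)-(7): Y ~ Poisson(X) with random clean X,
# and Z = diag(E[X])^{-1/2} Y is the homogenized (prewhitened) vector.
p, n = 5, 1_000_000
mu = rng.uniform(1.0, 3.0, size=p)            # mean clean image E[X]
X = mu * rng.uniform(0.5, 1.5, size=(n, p))   # random clean vectors
Y = rng.poisson(X)                            # photon-count observations

d = 1.0 / np.sqrt(mu)                         # diagonal of diag(E[X])^{-1/2}
Z = Y * d                                     # homogenized data

cov_Z = np.cov(Z, rowvar=False)
# Eq. (7): Cov[Z] = diag(E[X])^{-1/2} Cov[X] diag(E[X])^{-1/2} + I
pred = np.outer(d, d) * np.cov(X, rowvar=False) + np.eye(p)
print(np.max(np.abs(cov_Z - pred)))           # small (sampling error only)
```

Up to sampling error, the covariance of the homogenized data is the whitened clean covariance plus the identity, which is what makes the white-noise random matrix machinery of Section II-D applicable.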
B. Steerable basis expansion
Under the observation model in Section II-A, we develop a method that estimates the rotationally invariant Cov[X] efficiently and accurately from the image dataset Y. We assume that a digital image I is composed of discrete samples from a continuous function f with band limit c. The Fourier transform of f, denoted F(f), can be expanded in any orthogonal basis for the class of square-integrable functions on a disk of radius c. For the purpose of steerable PCA, it is beneficial to choose a basis whose elements are products of radial functions with Fourier angular modes, such as the Fourier-Bessel functions or 2-D prolate functions [35]. For concreteness, in the following we use the Fourier-Bessel functions, given by

ψ_c^{k,q}(ξ, θ) = N_{k,q} J_k(R_{k,q} ξ/c) e^{ıkθ} for ξ ≤ c, and 0 for ξ > c,  (8)

where (ξ, θ) are polar coordinates in the Fourier domain (i.e., ξ_1 = ξ cos θ, ξ_2 = ξ sin θ, ξ ≥ 0, and θ ∈ [0, 2π)); N_{k,q} = (c √π |J_{k+1}(R_{k,q})|)^{−1} is the normalization factor; J_k is the Bessel function of the first kind of integer order k; and R_{k,q} is the q-th root of the Bessel function J_k. We also assume that the functions of interest are concentrated in a disk of radius R in the real domain. In order to avoid aliasing, we only use Fourier-Bessel functions that satisfy the criterion [26], [36]

R_{k,q+1} ≤ 2πcR.  (9)

For each angular frequency k, we denote by p_k the number of components satisfying Eq. (9). The total number of components is p = Σ_{k=−k_max}^{k_max} p_k, where k_max is the maximal value of k satisfying Eq. (9).

Algorithm 1: Steerable ePCA (sePCA) and denoising
Input: Image data Y containing n images of size L × L.
Output: Rotationally invariant covariance estimator of the noiseless images and denoised images.
1: Compute the sample mean Ȳ = (1/n) Σ_{i=1}^n Y_i (ePCA)
2: Estimate the support size R and band limit c for the mean image (sPCA)
3: Compute the Fourier-Bessel expansion coefficients of F(Ȳ) and estimate the rotationally invariant sample mean f̄ as in Eq. (14) (sePCA)
4: Compute the variance estimate D_n = diag[f̄] (sePCA)
5: Prewhiten the image data, Z = D_n^{−1/2} Y (ePCA)
6: Estimate the band limit c for the whitened images (sPCA)
7: Compute the truncated Fourier-Bessel expansion coefficients of F(Z) and form the coefficient matrices A^{(k)}, for k = 0, ..., k_max (sPCA)
8: for k = 0, 1, ..., k_max do
9:   Compute the prewhitened sample covariance matrix S_h^{(k)} as in Eq. (16) and its eigendecomposition S_h^{(k)} = Û Λ Û* (ePCA)
10:  Shrink the top r_k eigenvalues according to Eq. (19), S_{h,η_{γ_k}}^{(k)} = Û η_{γ_k}(Λ) Û* (ePCA, sPCA)
11:  Compute the recoloring matrix B^{(k)} in Eq. (23) and D^{(k)} in Eq. (26) (sePCA)
12:  Recolor the covariance matrix, S_{he}^{(k)} = (B^{(k)})* · S_{h,η_{γ_k}}^{(k)} · B^{(k)} (sePCA)
13:  Compute the scaling coefficients α̂ in Eq. (27) and keep the components with α̂ > 0 (sePCA)
14:  Scale the covariance matrix, S_s^{(k)} = Σ_i α̂_i^{(k)} v̂_i^{(k)} (v̂_i^{(k)})*, where the eigendecomposition of S_{he}^{(k)} is Σ_i v̂_i^{(k)} (v̂_i^{(k)})* (ePCA)
15:  Denoise {A^{(k)}}_{k=0}^{k_max} as in Eqs. (29) and (30) (sePCA)
16: end for
17: Form the rotationally invariant covariance matrix estimator Ŝ((x,y),(x′,y′)) = G^{(0)}(x,y) S_s^{(0)} G^{(0)}(x′,y′)* + 2 Σ_{k=1}^{k_max} G^{(k)}(x,y) S_s^{(k)} G^{(k)}(x′,y′)* (sPCA)
18: Reconstruct the denoised images using Eq. (31) (sPCA)
We also denote γ_k = p_k/n for k > 0 and γ_0 = p_0/n. The inverse Fourier transform (IFT) of ψ_c^{k,q} is

F^{−1}(ψ_c^{k,q})(r, φ) = 2c√π (−1)^q ı^k R_{k,q} J_k(2πcr) / ((2πcr)^2 − R_{k,q}^2) e^{ıkφ} ≡ g_c^{k,q}(r) e^{ıkφ},  (10)

where g_c^{k,q}(r) is the radial part of the inverse Fourier transform of the Fourier-Bessel function. Therefore, we can approximate f using the truncated expansion

f(r, φ) ≈ Σ_{k=−k_max}^{k_max} Σ_{q=1}^{p_k} a_{k,q} g_c^{k,q}(r) e^{ıkφ}.  (11)

The approximation error in Eq. (11) is due to the finite truncation of the Fourier-Bessel expansion. For essentially bandlimited functions, the approximation error is controlled using the asymptotic behavior of the Bessel functions; see [37, Section 2] for more details. We evaluate the Fourier-Bessel expansion coefficients numerically as in [26] using a quadrature rule that consists of equally spaced points in the angular direction, θ_l = 2πl/n_θ with l = 0, ..., n_θ − 1, and a Gaussian quadrature rule in the radial direction, ξ_j for j = 1, ..., n_ξ, with associated weights w(ξ_j). Using the sampling criterion introduced in [26], the values of n_ξ and n_θ depend on the compact support radius R and the band limit c. Our previous work found that using n_ξ = ⌈4cR⌉ and n_θ = ⌈16cR⌉ results in highly accurate numerical evaluation of the expansion coefficients. To evaluate a_{k,q}, we need to sample the discrete Fourier transform of the image I, denoted F(I), at the quadrature nodes,

F(I)(ξ_j, θ_l) = (1/(2R)^2) Σ_{i_1=−R}^{R−1} Σ_{i_2=−R}^{R−1} I(i_1, i_2) exp(−2πı(ξ_j cos θ_l · i_1 + ξ_j sin θ_l · i_2)),  (12)

which can be evaluated efficiently using the nonuniform discrete Fourier transform [38], and we get

a_{k,q} ≈ Σ_{j=1}^{n_ξ} N_{k,q} J_k(R_{k,q} ξ_j / c) F̂(I)(ξ_j, k) ξ_j w(ξ_j),  (13)

where F̂(I)(ξ_j, k) is the 1-D FFT of F(I) on the concentric circle of radius ξ_j.
For real-valued images, it is sufficient to evaluate the coefficients with k ≥ 0, since a_{−k,q} = a*_{k,q}. In addition, the coefficients have the following properties: under counterclockwise rotation by an angle α, a_{k,q} changes to a_{k,q} e^{−ıkα}; and under reflection, a_{k,q} changes to a_{−k,q}. The numerical integration error in Eq. (13) was analyzed in [26] and is negligibly small for the chosen values of n_ξ and n_θ. The steerable basis expansion is applied in two parts of steerable ePCA: (1) the rotationally invariant sample mean estimation in Step 3 of Alg. 1, and (2) the expansion of the whitened images in Step 7 of Alg. 1.
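The component counts p_k and the cutoff k_max induced by the criterion in Eq. (9) can be computed directly from the Bessel roots. The following is a minimal Python sketch (not the authors' code; the function name and the example values c = 0.5 and R = 61 are illustrative):

```python
import numpy as np
from scipy.special import jn_zeros

def fb_component_counts(c, R, max_k=500, max_q=200):
    """Count Fourier-Bessel components per angular frequency k >= 0.

    Component (k, q) is kept when R_{k,q+1} <= 2*pi*c*R (Eq. (9)).
    Returns a dict {k: p_k} for every k with p_k > 0.
    """
    bound = 2 * np.pi * c * R
    counts = {}
    for k in range(max_k + 1):
        roots = jn_zeros(k, max_q)          # R_{k,1}, ..., R_{k,max_q}
        p_k = int(np.sum(roots[1:] <= bound))
        if p_k == 0:
            break          # Bessel roots interlace, so p_k is non-increasing in k
        counts[k] = p_k
    return counts

counts = fb_component_counts(c=0.5, R=61)
k_max = max(counts)
# total dimension p sums over k = -k_max, ..., k_max
p_total = counts[0] + 2 * sum(counts[k] for k in range(1, k_max + 1))
print(k_max, p_total)
```

Because the roots of J_k and J_{k+1} interlace, p_k decreases with k, so the loop can stop at the first empty frequency.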
C. The sample rotationally invariant homogenized covariance matrix
Suppose I_1, ..., I_n are n discretized input images sampled from f_1, ..., f_n. Here, the observation vectors Y_i for ePCA are simply Y_i = I_i. In ePCA, the first step is to prewhiten the data using the sample mean, as suggested by Eqs. (4) and (6). However, the sample mean Ȳ = (1/n) Σ_{i=1}^n Y_i is not necessarily rotationally invariant. With the band limit and support size estimated in Step 2, we compute the truncated Fourier-Bessel expansion coefficients of F(Ȳ), denoted by ā_{k,q}. The rotationally invariant sample mean can be evaluated from ā_{0,q},

f̄(r, φ) = (1/2π) ∫_0^{2π} (1/n) Σ_{i=1}^n f_i(r, φ − α) dα ≈ Σ_{q=1}^{p_0} ā_{0,q} g_c^{0,q}(r).  (14)

The approximation error in Eq. (14) follows directly from that of Eq. (11). The rotationally invariant sample mean is circularly symmetric. We denote by Ā a vector that contains all the coefficients ā_{0,q}, ordered by the radial index q. Although the input images are non-negative, the finite truncation may result in small negative values in the mean estimation, so we threshold any negative entries to zero. As mentioned in Section II-A, the covariance matrix of the Poisson observations differs from that of the clean images by a diagonal matrix whose diagonal entries are equal to the mean image. Therefore, in Step 4 of Alg. 1, we have D_n = diag[f̄], and it is used in Step 5 to compute the homogenized vectors, similar to Eq. (4). We prewhiten the images by the estimated mean image to create new images Z_1, ..., Z_n as Z_i(x, y) = f̄(x, y)^{−1/2} Y_i(x, y) when f̄(x, y) > 0, and Z_i(x, y) = 0 otherwise. The whitening step might change the band limit; therefore, we estimate the band limit of the whitened images in Step 6. Combining Eqs.
(9) and (13), we compute the truncated Fourier-Bessel expansion coefficients a^i_{k,q} of F(Z_i). Let us denote by A^{(k)} the matrix of expansion coefficients with angular frequency k, obtained by putting the a^i_{k,q} into a matrix whose columns are indexed by the image number i and whose rows are ordered by the radial index q. The coefficient matrix A^{(k)} is of size p_k × n.

We use Z̄ to represent the rotationally invariant sample mean of the whitened images. Under the action of the group O(2), i.e., counterclockwise rotation by an angle α ∈ [0, 2π) and reflection β ∈ {+, −}, where '+' indicates no reflection and '−' indicates reflection, the image Z_i is transformed to Z_i^{α,β}. Since the truncated Fourier-Bessel transform is almost unitary [26], the rotationally invariant covariance kernel S((x,y),(x′,y′)) built from the whitened image data with all possible in-plane rotations and reflections, defined as

S((x,y),(x′,y′)) = (1/(4πn)) Σ_{i=1}^n Σ_{β∈{+,−}} ∫_0^{2π} (Z_i^{α,β}(x,y) − Z̄(x,y)) (Z_i^{α,β}(x′,y′) − Z̄(x′,y′)) dα,  (15)

can be computed in terms of the IFT of the Fourier-Bessel basis and the associated expansion coefficients. Subtracting the sample mean is equivalent to subtracting (1/n) Σ_{j=1}^n a^j_{0,q} from the coefficients a^i_{0,q}, while keeping the other coefficients unchanged. Therefore, we first update the zero angular frequency coefficients by a^i_{0,q} ← a^i_{0,q} − (1/n) Σ_{j=1}^n a^j_{0,q}. In terms of the expansion coefficients, the rotationally invariant homogenized sample covariance matrix is S_h = ⊕_{k=−k_max}^{k_max} S_h^{(k)}, with

S_h^{(k)} = (1/n) Re{A^{(k)} (A^{(k)})*}.  (16)

We further denote the eigenvalues and eigenvectors of S_h^{(k)} by λ_i^{(k)} and û_i^{(k)}, that is,

S_h^{(k)} = Σ_{i=1}^{p_k} λ_i^{(k)} û_i^{(k)} (û_i^{(k)})*.  (17)

The procedure of homogenization for steerable ePCA is detailed in Steps 1–5 of Alg. 1. Step 9 of Alg. 1 computes the rotationally invariant homogenized sample covariance matrix.
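Because S_h decouples across angular frequencies, Step 9 never forms a full p × p covariance; it works with one small p_k × n coefficient matrix per frequency. A small synthetic sketch of Eq. (16) (illustrative NumPy; random complex matrices stand in for the Fourier-Bessel coefficients of the whitened images):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
p_ks = [8, 6, 4]                    # toy values of p_k for k = 0, 1, 2
A = [rng.standard_normal((pk, n)) + 1j * rng.standard_normal((pk, n))
     for pk in p_ks]

# k = 0 only: center the zero-frequency coefficients by their sample mean
A[0] = A[0] - A[0].mean(axis=1, keepdims=True)

# Eq. (16): S_h^(k) = (1/n) Re{ A^(k) (A^(k))^* }, '*' = conjugate transpose
S_h = [np.real(Ak @ Ak.conj().T) / n for Ak in A]

for pk, S in zip(p_ks, S_h):
    assert S.shape == (pk, pk)      # each block is small: p_k x p_k
    assert np.allclose(S, S.T)      # real symmetric
```

The eigendecompositions in Steps 9 and 10 are then performed block by block, which is the source of the complexity advantage over plain ePCA noted in Section II-F.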
D. Eigenvalue shrinkage
For data corrupted by additive white noise (with noise variance 1 in each coordinate), previous works [39]–[42] show that if the population eigenvalue ℓ is above the Baik, Ben Arous, Péché (BBP) phase transition, then the sample eigenvalue pops outside of the Marčenko-Pastur distribution of the "noise" eigenvalues. The sample eigenvalue converges to the value given by the spike forward map as p_k, n → ∞ with p_k/n = γ_k:

λ(ℓ; γ_k) = (1 + ℓ)(1 + γ_k/ℓ) if ℓ > √γ_k, and (1 + √γ_k)^2 otherwise.  (18)

The underlying clean population covariance eigenvalues can be estimated by solving the quadratic equation in Eq. (18),

ℓ̂ = η_{γ_k}(λ) = (1/2)(λ − 1 − γ_k + √((λ − 1 − γ_k)^2 − 4γ_k)) if λ > (1 + √γ_k)^2, and 0 if λ ≤ (1 + √γ_k)^2.  (19)

Shrinking the eigenvalues improves the estimation of the sample covariance matrix [43]. Since the homogenized sample covariance matrix S_h is decoupled into small sub-blocks S_h^{(k)}, the shrinkers are defined for each frequency k separately. The shrinker η_{γ_k}(λ) sets to zero all noise eigenvalues, i.e., those λ within the support of the Marčenko-Pastur distribution, and reduces the other eigenvalues according to Eq. (19). The denoised covariance matrices are then

S_{h,η}^{(k)} = Σ_{i=1}^{r_k} η_{γ_k}(λ_i^{(k)}) û_i^{(k)} (û_i^{(k)})*,  (20)

where r_k is the number of components with η_{γ_k}(λ) > 0. The empirical eigenvector û^{(k)} of S_h^{(k)} is an inconsistent estimator of the true eigenvector. We can heuristically quantify the inconsistency based on results from the Gaussian standard spiked model, even though the noise is non-Gaussian.
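The spike forward map of Eq. (18) and the shrinker of Eq. (19) that inverts it can be sketched as follows (illustrative Python; the function names are our own):

```python
import numpy as np

def spike_forward(ell, gamma):
    """Eq. (18): limiting sample eigenvalue for a population spike ell."""
    if ell > np.sqrt(gamma):
        return (1 + ell) * (1 + gamma / ell)
    return (1 + np.sqrt(gamma)) ** 2           # edge of the MP bulk

def eta(lam, gamma):
    """Eq. (19): estimate the population spike from a sample eigenvalue."""
    if lam <= (1 + np.sqrt(gamma)) ** 2:
        return 0.0                             # inside the Marcenko-Pastur bulk
    b = lam - 1 - gamma
    return 0.5 * (b + np.sqrt(b * b - 4 * gamma))

gamma = 0.2
for ell in [0.1, 0.5, 1.0, 3.0]:
    lam = spike_forward(ell, gamma)
    # eta recovers ell exactly when ell > sqrt(gamma); otherwise it returns 0
    print(ell, lam, eta(lam, gamma))
```

Algebraically, substituting λ = (1 + ℓ)(1 + γ/ℓ) into Eq. (19) gives (λ − 1 − γ)² − 4γ = (ℓ − γ/ℓ)², so η_γ(λ) = ℓ above the transition, which is exactly the inversion the shrinker performs.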
Under this model, the empirical and true eigenvectors have an asymptotically deterministic angle: ((u^{(k)})* û^{(k)})^2 → c^2(ℓ; γ_k) almost surely, where c^2(ℓ; γ_k) is the cosine forward map given by [41], [42]:

c^2(ℓ; γ_k) = (1 − γ_k/ℓ^2) / (1 + γ_k/ℓ) if ℓ > √γ_k, and 0 otherwise.  (21)

Therefore, asymptotically, for the population eigenvectors beyond the BBP phase transition, the sample eigenvectors have positive correlation with the population eigenvectors, but this correlation is less than 1 [44]–[46]. We denote by ĉ^2 an estimate of c^2 using the estimated clean covariance eigenvalues ℓ̂ in Eq. (19), and ŝ^2 = 1 − ĉ^2.

In short, Step 10 of Alg. 1 improves the estimation of the rotationally invariant homogenized covariance matrix through eigenvalue shrinkage.

E. Recoloring
Homogenization changes the direction of the clean eigenvectors. Therefore, after eigenvalue shrinkage, we recolor (heterogenize) the covariance matrix by conjugating the recoloring matrix B with S_{h,η}: S_{he} = B* · S_{h,η} · B. The recoloring matrix is derived as

B_{k_1,q_1; k_2,q_2} = ∫_0^R ∫_0^{2π} √(f̄(r)) F^{−1}(ψ_c^{k_1,q_1})(r, θ) (F^{−1}(ψ_c^{k_2,q_2})(r, θ))* r dr dθ = δ_{k_1,k_2} ∫_0^R √(f̄(r)) g_c^{k_1,q_1}(r) g_c^{k_2,q_2}(r) r dr,  (22)

since the angular integral of e^{ı(k_1−k_2)θ} vanishes unless k_1 = k_2. The matrix B therefore has a block diagonal structure and is decoupled for each angular frequency, B = ⊕_{k=−k_max}^{k_max} B^{(k)}, with

B^{(k)}_{q_1,q_2} = ∫_0^R √(f̄(r)) g_c^{k,q_1}(r) g_c^{k,q_2}(r) r dr.  (23)

The radial integral in Eq. (23) is numerically evaluated using the Gauss-Legendre quadrature rule [47, Chap. 4], which determines the locations of n_r = ⌈4cR⌉ points {r_j}_{j=1}^{n_r} on the interval [0, R] and the associated weights w(r_j). The integral in Eq. (23) is thus approximated by

B^{(k)}_{q_1,q_2} ≈ Σ_{j=1}^{n_r} √(f̄(r_j)) g_c^{k,q_1}(r_j) g_c^{k,q_2}(r_j) r_j w(r_j).  (24)

The recoloring step is also decoupled for each angular frequency sub-block. The heterogenized covariance estimators are

S_{he}^{(k)} = (B^{(k)})* · S_{h,η_{γ_k}}^{(k)} · B^{(k)}.  (25)

Similar to Eq. (23), we define D^{(k)}, which will be used to scale the heterogenized covariance matrix estimator (see Eq. (27)) and to denoise the expansion coefficients (see Eqs. (29) and (30)),

D^{(k)}_{q_1,q_2} = ∫_0^R f̄(r) g_c^{k,q_1}(r) g_c^{k,q_2}(r) r dr ≈ Σ_{j=1}^{n_r} f̄(r_j) g_c^{k,q_1}(r_j) g_c^{k,q_2}(r_j) r_j w(r_j).  (26)

In Alg. 1, Steps 11 and 12 summarize the recoloring procedure.

Fig. 1: Sample clean and noisy images of the XFEL dataset, with mean per-pixel photon count 0.01.

Fig. 2: Estimating R and c from simulated noisy XFEL diffraction intensity maps of lysozyme. (a) The radial profile of the rotationally invariant sample mean image; the radius of compact support is chosen at R = 61. (b) Mean radial power spectrum of the whitened noisy images; the curve levels off at σ² = 1.

Fig. 3: Eigenimages estimated from noisy XFEL data using PCA, steerable PCA (sPCA), ePCA, and steerable ePCA (sePCA), ordered by eigenvalues. Input images are corrupted by Poisson noise with mean photon count 0.01 (shown in Figure 1b).

F. Scaling
The eigendecomposition of S_{he}^{(k)} gives S_{he}^{(k)} = Σ_{i=1}^{r_k} v̂_i^{(k)} (v̂_i^{(k)})*. The empirical eigenvalues are t̂_i = ‖v̂_i^{(k)}‖^2, which are biased estimates of the true eigenvalues of the clean covariance matrix Σ_X. In [25, Sec. 4.2.3], a scaling rule was proposed to correct the bias. We extend it in Steps 13 and 14 of Alg. 1 to the steerable case and scale each eigenvalue of S_{he}^{(k)} by a parameter α̂^{(k)},

α̂^{(k)} = (1 − ŝ^2 τ^{(k)}) / ĉ^2 for 1 − ŝ^2 τ^{(k)} > 0 and ĉ^2 > 0, and 0 otherwise,  (27)

where the parameter τ^{(k)} = (tr D^{(k)} / p_k) · (ℓ̂ / ‖v̂^{(k)}‖^2). The scaled covariance matrices are

S_s^{(k)} = Σ_{i=1}^{r_k} α̂_i^{(k)} v̂_i^{(k)} (v̂_i^{(k)})*.  (28)

The rotationally invariant covariance kernel S((x,y),(x′,y′)) is well approximated by Σ_{k=−k_max}^{k_max} G^{(k)}(x,y) S_s^{(k)} (G^{(k)}(x′,y′))*, where G^{(k)} contains the IFT of all ψ_c^{k,q} with angular frequency k (see Step 17 of Alg. 1). The computational complexity of steerable ePCA is O(nL^3 + L^4), the same as steerable PCA, and it is lower than the complexity of ePCA, which is O(min(nL^4 + L^6, n^2 L^2 + n^3)).

In summary, Step 14 scales the heterogenized covariance matrix S_{he}. The covariance matrix in the original pixel domain is efficiently computed from S_s^{(k)} in Step 17.

G. Denoising
As an application of steerable ePCA, we develop a method to denoise photon-limited images. Since the Fourier-Bessel expansion coefficients are computed from the prewhitened images, we first recolor the coefficients by multiplying B^{(k)} with A^{(k)}, and then apply Wiener-type filtering to denoise the coefficients. For the steerable basis expansion coefficients A^{(k)} with angular frequency k ≠ 0,

Â^{(k)} = S_s^{(k)} (D^{(k)} + S_s^{(k)})^{−1} B^{(k)} A^{(k)}.  (29)

For k = 0, we need to take into account the rotationally invariant mean expansion coefficients,

Â^{(0)} = S_s^{(0)} (D^{(0)} + S_s^{(0)})^{−1} B^{(0)} A^{(0)} + D^{(0)} (D^{(0)} + S_s^{(0)})^{−1} Ā 1_n^⊤,  (30)

where 1_n denotes the length-n all-ones vector, so that the rotationally invariant mean coefficients Ā are added back for every image. The denoised images sampled on the Cartesian grid (x, y) in the real domain are computed from the filtered expansion coefficients â^i_{k,q},

X̂_i(x, y) = Σ_{q=1}^{p_0} â^i_{0,q} g_c^{0,q}(r_{x,y}) + 2 Re[Σ_{k=1}^{k_max} Σ_{q=1}^{p_k} â^i_{k,q} g_c^{k,q}(r_{x,y}) e^{−ıkθ_{x,y}}],  (31)

where r_{x,y} = √(x^2 + y^2) and θ_{x,y} = tan^{−1}(y/x). In essence, we first denoise the recolored steerable expansion coefficients in Step 15 of Alg. 1 according to Eqs. (29) and (30), and then reconstruct the images using Eq. (31) in Step 18 of Alg. 1.

Fig. 4: Error of covariance matrix estimation, measured as the (a) operator norm and (b) Frobenius norm of the difference between each covariance estimate and the true covariance matrix. The sample size n ranges from 100 to 140,000.

Fig. 5: The estimated number of signal principal components using PCA, sPCA, ePCA, and sePCA. Images are corrupted by Poisson noise with mean per-pixel photon count 0.01.

Fig. 6: The runtimes for computing the principal components using PCA, sPCA, ePCA, and sePCA. Images are corrupted by Poisson noise with mean per-pixel photon count 0.01.

Fig. 7: Sample clean and noisy images of randomly rotated Yale B face data.
Image size is 64 × 64 with mean intensity 2.3 photons per pixel.

III. NUMERICAL RESULTS
We apply PCA, ePCA, steerable PCA, and steerable ePCA to a simulated XFEL dataset and compare the results for covariance estimation and denoising. The algorithms are implemented in MATLAB on a machine with 60 cores, running at 2.3 GHz, with a total RAM of 1.5 TB. Only 8 cores were used
Fig. 8: Eigenimages estimated from noisy rotated Yale B face data using PCA, sPCA, ePCA, and sePCA, ordered by eigenvalues. Input images are corrupted by Poisson noise with mean intensity 2.3 photons per pixel (shown in Fig. 7b).

in our experiments.

We simulate n = 140,000 noiseless XFEL diffraction intensity maps of a lysozyme (Protein Data Bank entry 1AKI) with Condor [48]. The average pixel intensity is rescaled to 0.01 for image size 128 × 128 pixels, such that shot noise dominates [49]. To sample an arbitrary number n of noisy diffraction patterns, we sample an intensity map at random, and then sample the photon count of each detector pixel from a Poisson distribution whose mean is the pixel intensity. Figs. 1a and 1b illustrate the clean intensity maps and the resulting noisy diffraction patterns.

We estimate the radius of the concentration of the diffraction intensities in the real domain and the band limit in the Fourier domain from the noisy images in the following way. The data variance is proportional to the sample mean. Therefore, we estimate the rotationally invariant sample mean and show its radial profile in Fig. 2a, which is also the variance map of the dataset averaged in the angular direction. At large r, the radial part of the rotationally invariant sample mean levels off at a constant. We compute the cumulative variance by integrating the radial sample mean over r with the Jacobian weight r dr. The fraction of the cumulative mean exceeds 99.9% at r = 61, and therefore R was chosen to be 61 (see Fig. 2a). We compute the whitened projection images using the rotationally invariant sample mean. For the whitened images, we compute the angular average of the mean 2-D power spectrum. The curve in Fig. 2b levels off at the noise variance σ² = 1 when ξ is large. We use the same method as before to compute the cumulative radial power spectrum, and choose the band limit c as the frequency at which the fraction reaches 99.9%. The band limit c and support radius R are used in both steerable PCA and steerable ePCA.
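The 99.9% cutoff rule used above for both the support radius R and the band limit c can be sketched in a few lines. This is a minimal numpy illustration under our own naming, not the paper's MATLAB code; the Gaussian radial profile is a toy stand-in for the rotationally invariant sample mean.

```python
import numpy as np

def cutoff_radius(radial_profile, radii, fraction=0.999):
    # Cumulative energy of the radial profile with the Jacobian weight r dr;
    # return the smallest radius containing `fraction` of the total energy.
    weighted = radial_profile * radii
    cum = np.cumsum(weighted)
    cum /= cum[-1]
    return radii[np.searchsorted(cum, fraction)]

# Toy radial sample mean: a Gaussian falloff sampled on a radial grid.
radii = np.linspace(0.0, 64.0, 641)[1:]   # skip r = 0
profile = np.exp(-(radii / 15.0) ** 2)

R = cutoff_radius(profile, radii)          # support radius estimate
```

The same routine applied to the cumulative radial power spectrum of the whitened images (with frequency ξ in place of r) yields the band limit c.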
A. Covariance estimation and principal components
Fig. 3 shows the top 12 eigenimages for clean XFEL diffraction patterns (Fig. 3e) and for noisy diffraction patterns with mean photon count per pixel 0.01 (Figs. 3a–3d and 3f–3i), using PCA, steerable PCA, ePCA, and steerable ePCA. The true eigenimages in Fig. 3e are computed from 70000 clean diffraction patterns whose orientations are uniformly distributed over SO(3). Figs. 3a–3d and 3f–3i are computed from noisy images. Since the number of samples is much smaller than the size of the image and the noise is non-Gaussian, PCA can only recover the first two or three components. ePCA improves the estimation and is able to extract the top 7 eigenimages. Moreover, steerable PCA and steerable ePCA achieve much better estimation of the underlying true eigenimages for a given sample size. Steerable ePCA achieves the best performance in estimating both the eigenvalues and the eigenimages.

Furthermore, we compare the operator norms and Frobenius norms of the difference between the covariance estimates and the true covariance matrix. Fig. 4 shows that steerable ePCA significantly improves the covariance estimation, especially when the sample size is small.

For experiments using ePCA, we use a permutation-bootstrap-based method to estimate the rank of the covariance matrix, following e.g. [50]. By randomly permuting each column of the mean-subtracted data matrix, we completely destroy structural information, including linear structure, while the noise statistics remain the same (see [51], [52] for an analysis). Singular values of the randomly permuted matrices reveal

Fig. 9: Sample denoised images of the XFEL dataset illustrated in Fig. 1 using NLPCA, PCA, sPCA, ePCA, and sePCA; panel (a) shows single-image NLPCA, panels (b)–(e) show the four methods for n = 1000, and panels (f)–(i) for n = 70000.
Image size is 128 × 128.

Fig. 10: Comparing the denoising quality of the XFEL dataset (p = 16384) with varying numbers of images.

what should be the largest covariance matrix eigenvalues that correspond to noise, up to a user-selected confidence level ρ. This can replace other rank estimation methods that assume a Gaussian distribution when the noise model is non-Gaussian, as in our case. Empirically, we choose ρ to give the best performance in covariance estimation. For steerable ePCA, we estimate the number of components using the right edge of the Marčenko–Pastur distribution for the homogenized covariance matrices S^(k)_h, and include only the components whose scaling factors α̂^(k) are above zero.

Steerable ePCA is able to recover more signal principal components from noisy images than PCA, steerable PCA, and ePCA, for both n = 1000 and n = 70000 (see Fig. 5). Fig. 6 shows that steerable ePCA is more efficient than ePCA and PCA. Because steerable ePCA contains extra steps such as prewhitening, recoloring, and scaling, its runtime is slightly longer than that of steerable PCA. When n = 140000, steerable ePCA is 8 times faster than ePCA.

In addition to the XFEL diffraction intensity data, we include a natural image dataset, the Yale B face database [33], [34], to illustrate the efficacy of the proposed method. The database contains 5760 single-light-source images of 10 subjects in 9 poses. For every subject in a particular pose, images were captured under 64 different ambient illuminations. The original images of a single face under different lighting conditions inhabit an approximately 9-dimensional linear space [53]. In
Fig. 11: Sample denoised images of the face dataset illustrated in Fig. 7 using NLPCA, PCA, sPCA, ePCA, and sePCA; panel (a) shows single-image NLPCA, panels (b)–(e) show the four methods at the smaller sample size, and panels (f)–(i) at the larger sample size. Image size is 64 × 64. The intensity range is [0, 2.6] for (a), and is [0, 11.1] for (b)–(i).

the experiments, we take one subject at a particular pose with 64 different illumination conditions. We downsample the original images to 64 × 64 pixels and scale the intensity such that the mean photon count for the whole dataset is 2.3 photons per pixel (i.e., "faces in the dark"). Since we need to rotate the images, we mask them by a disk of radius R = 32. We uniformly sample n clean images from the original images and apply a random in-plane rotation to each sample. Fig. 7 shows 12 samples of the clean data and the corresponding noisy observations. The corresponding principal components are illustrated in Fig. 8. The true eigenimages are evaluated using the clean images rotated at every 0.7 degrees.

B. Denoising
We compare the denoising performance of steerable ePCA with PCA, steerable PCA, ePCA, and patch-based single-image non-local PCA (NLPCA) [21], using the mean squared error, MSE := (pn)^{−1} Σ_{i=1}^n ‖X̂_i − X_i‖². We perform "ePCA denoising" using the empirical best linear predictor (EBLP) [25], which has been shown to outperform "PCA denoising," i.e., orthogonal projection onto sample or ePCA/heterogenized eigenimages, as well as the exponential family PCA method proposed by [19]. Note that in our implementation of ePCA Wiener-type filtering, to avoid inverting a singular matrix (when some coordinates have zero sample mean and sample variance), we compute diag[Ȳ] + S_s with regularization,

diag[Ȳ] + S_s ← S_s + (1 − ε) diag[Ȳ] + ε m I,

where ε is a small regularization parameter, m = p^{−1} Σ_{j=1}^p Ȳ_j, and I is the p × p identity matrix. The number of components is estimated by the permutation rank estimation described in the previous section.

Fig. 9 shows examples of denoised XFEL images for sample sizes n = 1000 and n = 70000. For robustness, we repeated the numerical experiment on another dataset simulated from the small protein chignolin (Protein Data Bank entry 1UAO) and obtained qualitatively similar results. Steerable ePCA recovers the images with lower MSE than PCA, steerable PCA, and ePCA (see Fig. 10), especially when the sample size n is small.

Fig. 11 shows examples of the denoised face images for two sample sizes. We use the permutation rank estimation described in the previous section for PCA, sPCA, and ePCA. In this example, we see that when the number of images is small, PCA and ePCA recover only rough contours of the faces, whereas sPCA and sePCA recover finer details. The MSEs of the denoised images for varying numbers of samples are plotted in Fig. 12. In this experiment, the data contains only 64 unrotated clean images and the mean photon count is 2.3 per pixel.
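The regularized Wiener-type (EBLP) step used in the denoising comparisons above can be sketched in numpy. This is an illustrative sketch, not the paper's MATLAB implementation: the rank-1 "clean" covariance, the sample-mean vector, and ε = 0.1 are toy placeholders we introduce, and the mean term follows the generic EBLP form x̂ = S_s(D + S_s)^{−1}y + D(D + S_s)^{−1}Ȳ.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 16                                     # toy pixel count

# Toy ingredients: rank-1 "clean" covariance S_s and sample mean Y_bar,
# with one coordinate whose sample mean (hence noise variance) is zero.
u = rng.standard_normal(p)
S_s = np.outer(u, u)
Y_bar = np.abs(rng.standard_normal(p))
Y_bar[0] = 0.0

# Regularization: diag[Y_bar] <- (1 - eps) diag[Y_bar] + eps * m * I.
eps = 0.1                                  # illustrative value, not the paper's
m = Y_bar.mean()                           # average sample mean over pixels
D = (1.0 - eps) * np.diag(Y_bar) + eps * m * np.eye(p)

# EBLP: x_hat = S_s (D + S_s)^{-1} y + D (D + S_s)^{-1} Y_bar.
y = rng.poisson(5.0, size=p).astype(float)
F = np.linalg.solve((D + S_s).T, S_s.T).T  # F = S_s (D + S_s)^{-1}
x_hat = F @ y + (np.eye(p) - F) @ Y_bar
```

Writing D(D + S_s)^{−1} as I − F keeps a single linear solve; without the ε-regularization, the zero coordinate of Ȳ could make diag[Ȳ] + S_s singular.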
The face images, at 2.3 photons per pixel, are far brighter than the diffraction patterns, at 0.01 photons per pixel. Therefore, sePCA is able to sufficiently capture the clean data subspace with only a small number of images, resulting in a much flatter MSE curve with respect to the number of images.

Fig. 12: Comparing the denoising quality of the face dataset (p = 4096) with varying numbers of images.

Fig. 13: Sensitivity of the denoising performance to the parameters c and R: (a) XFEL images, (b) rotated Yale B faces. For the face data, we cropped the images by the disk of radius R = 32.

We also test how sensitive our method is to the choice of parameters, i.e., the band limit c and the support radius R. For the diffraction intensity data, we use n = 10000 diffraction patterns with mean photon count 0.01 per pixel. The underlying clean diffraction patterns are very smooth. We vary the band limit c and let the support radius R range from 50 to 63. With the parameters chosen as in Fig. 2, the MSE of the denoised images is small (see Fig. 13a). Although the MSEs vary when we increase c above the chosen band limit and change R within the range of 50 to 63, the errors remain relatively small compared to those of ePCA and sPCA. For the face data, we compare the results with n = 8192 noisy images at 2.3 photons per pixel. Since we crop out the region within a disk of radius R = L/2, the parameter R is fixed at 32 and we vary only the band limit c. The best denoising performance is obtained at a larger band limit than for the XFEL data (see Fig. 13b), which agrees with the fact that the face images contain more high-frequency information than the XFEL data.

IV. CONCLUSION
We presented steerable ePCA, a method for principal component analysis of a set of low-photon-count images and their uniform in-plane rotations. This work has been mostly motivated by its application to XFEL, but it is relevant to other low-photon-count imaging applications. The computational complexity of the new algorithm is O(nL³ + L⁴), whereas that of ePCA is O(min(nL⁴ + L⁶, n²L² + n³)). Incorporating rotational invariance allows more robust estimation of the true eigenvalues and eigenvectors. Our numerical experiments showed that steerable ePCA more accurately estimates the covariance matrix and achieves better denoising results. Finally, we remark that the Fourier-Bessel basis can be replaced with other suitable bases, for example, the 2-D prolate spheroidal wave functions (PSWF) on a disk [54], [55]. Nevertheless, our method has certain limitations. For example, if the noise distribution is not specified in advance, it is hard to estimate the noise covariance E diag[A″(ω)], which affects the homogenization step of the proposed method. In addition, the optimal shrinkage function for the covariance estimation depends on how we measure the error, for example using the Frobenius norm, the operator norm, or other criteria. The corresponding theoretical guarantee in the non-Gaussian case is still an open problem. Thus, statistical optimality is beyond the scope of this paper. For future work, we will study how sensitive the proposed method is to misspecification of the noise distribution, and the statistical optimality of our procedure in the non-Gaussian case.

ACKNOWLEDGMENTS
The authors would like to thank the associate editor and the anonymous referees for their valuable comments. The authors would also like to thank Edgar Dobriban and Boris Landa for discussions. ZZ was partially supported by the National Center for Supercomputing Applications Faculty Fellowship and the University of Illinois at Urbana-Champaign College of Engineering Strategic Research Initiative. AS was partially supported by Award Number R01GM090200 from the NIGMS, FA9550-17-1-0291 from AFOSR, the Simons Investigator Award, the Moore Foundation Data-Driven Discovery Investigator Award, and NSF BIGDATA Award IIS-1837992.

REFERENCES

[1] V. Favre-Nicolin, J. Baruchel, H. Renevier, J. Eymery, and A. Borbély, "XTOP: high-resolution X-ray diffraction and imaging," Journal of Applied Crystallography, vol. 48, no. 3, pp. 620–620, 2015.
[2] F. R. N. C. Maia and J. Hajdu, "The trickle before the torrent: diffraction data from X-ray lasers," Scientific Data, vol. 3, 2016.
[3] S. H. W. Scheres, H. Gao, M. Valle, G. T. Herman, P. P. B. Eggermont, J. Frank, and J.-M. Carazo, "Disentangling conformational states of macromolecules in 3D-EM through likelihood optimization," Nature Methods, vol. 4, no. 1, pp. 27–29, 2007.
[4] D. N.-T. Loh and V. Elser, "Reconstruction algorithm for single-particle diffraction imaging experiments," Phys. Rev. E, vol. 80, p. 026705, Aug 2009.
[5] Z. Kam, "Determination of macromolecular structure in solution by spatial correlation of scattering fluctuations," Macromolecules, vol. 10, no. 5, pp. 927–934, 1977.
[6] D. K. Saldin, V. L. Shneerson, R. Fung, and A. Ourmazd, "Structure of isolated biomolecules obtained from ultrashort X-ray pulses: exploiting the symmetry of random orientations," Journal of Physics: Condensed Matter, vol. 21, no. 13, 2009.
[7] D. Starodub, A. Aquila, S. Bajt, M. Barthelmess, A. Barty, C. Bostedt, J. D. Bozek, N. Coppola, R. B. Doak, S. W. Epp et al., "Single-particle structure determination by correlations of snapshot X-ray diffraction patterns," Nature Communications, vol. 3, p. 1276, 2012.
[8] K. Pande, P. Schwander, M. Schmidt, and D. K. Saldin, "Deducing fast electron density changes in randomly orientated uncrystallized biomolecules in a pump-probe experiment," Phil. Trans. R. Soc. B, vol. 369, no. 1647, p. 20130332, 2014.
[9] R. P. Kurta, J. J. Donatelli, C. H. Yoon, P. Berntsen, J. Bielecki, B. J. Daurer, H. DeMirci, P. Fromme, M. F. Hantke et al., "Correlations in scattered X-ray laser pulses reveal nanoscale structural features of viruses," Physical Review Letters, vol. 119, no. 15, p. 158102, 2017.
[10] J. J. Donatelli, J. A. Sethian, and P. H. Zwart, "Reconstruction from limited single-particle diffraction data via simultaneous determination of state, orientation, intensity, and phase," Proceedings of the National Academy of Sciences, vol. 114, no. 28, pp. 7222–7227, 2017.
[11] B. von Ardenne, M. Mechelke, and H. Grubmüller, "Structure determination from single molecule X-ray scattering with three photons per image," Nat Commun., vol. 9, no. 1, p. 2375, 2018.
[12] K. Pande, J. J. Donatelli, E. Malmerberg, L. Foucar, C. Bostedt, I. Schlichting, and P. H. Zwart, "Ab initio structure determination from experimental fluctuation X-ray scattering data," Proceedings of the National Academy of Sciences.
[13] I. T. Jolliffe, Principal Component Analysis. Wiley Online Library, 2002.
[14] T. W. Anderson, An Introduction to Multivariate Statistical Analysis. Wiley New York, 2003.
[15] R. D. Nowak and R. G. Baraniuk, "Wavelet-domain filtering for photon imaging systems," IEEE Transactions on Image Processing, vol. 8, no. 5, pp. 666–678, 1999.
[16] F. Luisier, T. Blu, and M. Unser, "A new SURE approach to image denoising: Interscale orthonormal wavelet thresholding," IEEE Transactions on Image Processing, vol. 16, no. 3, pp. 593–606, 2007.
[17] F. J. Anscombe, "The transformation of Poisson, binomial and negative-binomial data," Biometrika, vol. 35, no. 3-4, p. 246, 1948.
[18] J.-L. Starck, F. Murtagh, and J. M. Fadili, Sparse Image and Signal Processing: Wavelets, Curvelets, Morphological Diversity. Cambridge University Press, 2010.
[19] M. Collins, S. Dasgupta, and R. E. Schapire, "A generalization of principal component analysis to the exponential family," Advances in Neural Information Processing Systems (NIPS), 2001.
[20] A. P. Singh and G. J. Gordon, "A unified view of matrix factorization models," in Machine Learning and Knowledge Discovery in Databases, W. Daelemans, B. Goethals, and K. Morik, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008, pp. 358–373.
[21] J. Salmon, Z. Harmany, C.-A. Deledalle, and R. Willett, "Poisson noise reduction with non-local PCA," Journal of Mathematical Imaging and Vision, vol. 48, no. 2, pp. 279–294, 2014.
[22] T. Furnival, R. K. Leary, and P. A. Midgley, "Denoising time-resolved microscopy image sequences with singular value thresholding," Ultramicroscopy, 2016.
[23] Y. Cao and Y. Xie, "Low-rank matrix recovery in Poisson noise," in Signal and Information Processing (GlobalSIP), 2014 IEEE Global Conference on. IEEE, 2014, pp. 384–388.
[24] M. Sonnleitner, J. Jeffers, and S. M. Barnett, "Local retrodiction models for photon-noise-limited images," in SPIE Photonics Europe. International Society for Optics and Photonics, 2016, pp. 98960V–98960V.
[25] L. T. Liu, E. Dobriban, and A. Singer, "ePCA: High dimensional exponential family PCA," Annals of Applied Statistics, vol. 12, no. 4, pp. 2121–2150, 2018.
[26] Z. Zhao, Y. Shkolnisky, and A. Singer, "Fast steerable principal component analysis," IEEE Transactions on Computational Imaging, vol. 2, no. 1, pp. 1–12, 2016.
[27] W. T. Freeman and E. H. Adelson, "The design and use of steerable filters," IEEE Transactions on Pattern Analysis & Machine Intelligence, no. 9, pp. 891–906, 1991.
[28] R. Hilai and J. Rubinstein, "Recognition of rotated images by invariant Karhunen-Loève expansion," J. Opt. Soc. Am. A, vol. 11, no. 5, pp. 1610–1618, 1994.
[29] P. Perona, "Deformable kernels for early vision," IEEE Trans. Pattern Anal. Mach. Intell., vol. 17, no. 5, pp. 488–499, 1995.
[30] M. Uenohara and T. Kanade, "Optimal approximation of uniformly rotated images: Relationship between Karhunen-Loève expansion and discrete cosine transform," IEEE Trans. Image Process., vol. 7, no. 1, pp. 116–119, 1998.
[31] C. Ponce and A. Singer, "Computing steerable principal components of a large set of images and their rotations," IEEE Transactions on Image Processing, vol. 20, no. 11, pp. 3051–3062, 2011.
[32] Z. Zhao and A. Singer, "Rotationally invariant image representation for viewing direction classification in cryo-EM," Journal of Structural Biology, vol. 186, no. 1, pp. 153–166, 2014.
[33] K.-C. Lee, J. Ho, and D. J. Kriegman, "Acquiring linear subspaces for face recognition under variable lighting," IEEE Transactions on Pattern Analysis & Machine Intelligence, no. 5, pp. 684–698, 2005.
[34] A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman, "From few to many: Illumination cone models for face recognition under variable lighting and pose," IEEE Transactions on Pattern Analysis & Machine Intelligence, no. 6, pp. 643–660, 2001.
[35] B. Landa and Y. Shkolnisky, "Approximation scheme for essentially bandlimited and space-concentrated functions on a disk," Applied and Computational Harmonic Analysis, vol. 43, no. 3, pp. 381–403, 2017.
[36] A. Klug and R. A. Crowther, "Three-dimensional image reconstruction from the viewpoint of information theory," Nature, vol. 238, pp. 435–440, 1972.
[37] Q. Wang, O. Ronneberger, and H. Burkhardt, "Rotational invariance based on Fourier analysis in polar and spherical coordinates," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 9, pp. 1715–1722, 2009.
[38] L. Greengard and J. Lee, "Accelerating the nonuniform fast Fourier transform," SIAM Review, vol. 46, no. 3, pp. 443–454, 2004.
[39] J. Baik, G. Ben Arous, and S. Péché, "Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices," Annals of Probability, vol. 33, no. 5, pp. 1643–1697, 2005.
[40] J. Baik and J. W. Silverstein, "Eigenvalues of large sample covariance matrices of spiked population models," Journal of Multivariate Analysis, vol. 97, no. 6, pp. 1382–1408, 2006.
[41] D. Paul, "Asymptotics of sample eigenstructure for a large dimensional spiked covariance model," Statistica Sinica, vol. 17, no. 4, pp. 1617–1642, 2007.
[42] F. Benaych-Georges and R. R. Nadakuditi, "The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices," Advances in Mathematics, vol. 227, no. 1, pp. 494–521, 2011.
[43] D. L. Donoho, M. Gavish, and I. M. Johnstone, "Optimal shrinkage of eigenvalues in the spiked covariance model," Annals of Statistics, vol. 46, no. 4, p. 1742, 2018.
[44] I. M. Johnstone and A. Y. Lu, "On consistency and sparsity for principal components analysis in high dimensions," Journal of the American Statistical Association, vol. 104, no. 486, pp. 682–693, 2009.
[45] S. Jung, J. S. Marron et al., "PCA consistency in high dimension, low sample size context," The Annals of Statistics, vol. 37, no. 6B, pp. 4104–4130, 2009.
[46] I. M. Johnstone and D. Paul, "PCA in high dimensions: An orientation," Proceedings of the IEEE, vol. 106, no. 8, pp. 1277–1292, 2018.
[47] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in Fortran 90. Cambridge University Press, 1996, vol. 2.
[48] M. F. Hantke, T. Ekeberg, and F. R. N. C. Maia, "Condor: A simulation tool for flash X-ray imaging," Journal of Applied Crystallography, vol. 49, no. 4, pp. 1356–1362, 2016.
[49] P. Schwander, D. Giannakis, C. H. Yoon, and A. Ourmazd, "The symmetries of image formation by scattering. II. Applications," Opt. Express, vol. 20, no. 12, pp. 12827–12849, Jun 2012.
[50] J. Landgrebe, W. Wurst, and G. Welzl, "Permutation-validated principal components analysis of microarray data," Genome Biology, vol. 3, no. 4, 2002.
[51] E. Dobriban, "Permutation methods for factor analysis and PCA," arXiv preprint arXiv:1710.00479, Oct. 2017.
[52] E. Dobriban and A. B. Owen, "Deterministic parallel analysis: an improved method for selecting factors and principal components," Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2018.
[53] R. Basri and D. W. Jacobs, "Lambertian reflectance and linear subspaces," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 2, pp. 218–233, 2003.
[54] D. Slepian, "Prolate spheroidal wave functions, Fourier analysis, and uncertainty–IV: Extensions to many dimensions; generalized prolate spheroidal wave functions," Bell System Technical Journal, vol. 43, pp. 3009–3057, 1964.
[55] B. Landa and Y. Shkolnisky, "Steerable principal components for space-frequency localized images,"