Cross-Modal Fusion Between Data in SAXS and Cryo-EM for Biomolecular Structure Determination
Shengnan Lyu, Christian Wülker, Yuqing Pan, Amitesh S. Jayaraman, Jianhao Zheng, Yilin Cai, Gregory S. Chirikjian
CCross-Modal Fusion Between Data in SAXS and Cryo-EMfor Biomolecular Structure Determination
Shengnan Lyu , Christian W¨ulker , Yuqing Pan , Amitesh S. Jayaraman ,Jianhao Zheng , Yilin Cai , and Gregory S. Chirikjian ∗ Department of Mechanical Engineering, National University of Singapore, Singapore School of Mechanical Engineering and Automation, Beihang University, Beijing, China Department of Mechanical Engineering, Johns Hopkins University, Baltimore, MD, USA School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, ChinaAugust 12, 2019
Abstract
Cryo-Electron Microscopy (cryo-EM) has become an extremely powerful method forresolving structural details of large biomolecular complexes. However, challenging problemsin single-particle methods remain open because of (1) the low signal-to-noise ratio in EM;and (2) the potential anisotropy and lack of coverage of projection directions relative to thebody-fixed coordinate system for some complexes. Whereas (1) is usually addressed by classaveraging (and increasingly due to rapid advances in microscope and sensor technology), (2)is an artifact of the mechanics of interaction of biomolecular complexes and the vitrificationprocess. In the absence of tilt series, (2) remains a problem, which is addressed here bysupplementing EM data with Small-Angle X-Ray Scattering (SAXS). Whereas SAXS is ofrelatively low resolution and contains much lower information content than EM, we showthat it is nevertheless possible to use SAXS to fill in blind spots in EM in difficult caseswhere the range of projection directions is limited.
The goal of this paper is the fusion of information from cryo-electron microscopy (cryo EM) and small-angle X-ray scattering (SAXS) in order to exploit the synergies of both techniques. Bothof these methods are experimental biophysical techniques used to glean information about thestructure of (large) biomolecular complexes in the native state. The data acquisition in thesemethods can be briefly summarized as follows: in so-called single-particle EM, we are presentedwith noisy projections of the electron density of a molecular structure from a finite number of a priori unknown different directions relative to its body-fixed frame. The three-dimensionalshape is to be reconstructed from these projection data, i. e. , EM is a tomographic imagingtechnique. SAXS data acquisition, on the other hand, can be interpreted as taking the electrondensity of the structure — or rather the result of a convolution of the density with a reflectedversion of itself — and subsequent averaging over concentric spheres centered at the origin. This ∗ Corresponding author. E-mail: [email protected] a r X i v : . [ q - b i o . B M ] A ug s equivalent to square averaging the Fourier spectrum of the electron density on these spheres.For a detailed discussion of the mathematical aspects of SAXS and EM data acquisition, werefer the reader to Dong et al. [2015a]; Afsari et al. [2015].The information content and resolution of EM projections, either after class averaging or dueto new advances in imaging technology, are both significantly higher than SAXS. Nevertheless,we make the case that there are potential benefits to fuse these information sources.The main tool that we use in this work to fuse the information from EM and SAXS is thewell-known Fourier slice theorem . In two dimensions, this theorem states that the results of thefollowing two calculations are equal:1. Taking a two-dimensional function f ( x, y ), projecting it onto some (one-dimensional) line,and doing a Fourier transform of that projection along that line.2. Taking the function f ( x, y ), performing a two-dimensional Fourier transform, and then slicingit through its origin with a line parallel to the projection line mentioned above.This theorem directly generalizes into higher dimensions; particularly, it also holds in thethree-dimensional case.The starting point of this work is the Fourier slice theorem: we can interpret EM data asspectral data given on a finite number of planes passing through the origin as parts of theFourier spectrum of the electron density of the molecular structure of interest. The main issuehere is that there might be significantly large gaps between these planes in the spectral EMdata, because the number of directions in which projection data acquired in EM can be verylimited. This makes reconstruction via the two-dimensional inverse Fourier transform potentiallyproblematic. As indicated above, SAXS adds to the spectral EM data, for here the data canbe seen as constant values on concentric spheres centered at the origin of the Fourier space,resulting from square averaging of the spectrum over these spheres. Therefore, both EM andSAXS data can be considered in this context as partial information on the Fourier spectrum ofthe electron density to be reconstructed.The remainder of this paper is organized as follows: in Section 3, we derive our technique forSAXS-EM data fusion and in Section 4 we explore some properties of the multiple admissiblesolutions for the reconstructed data of the SAXS-EM fusion procedure. Section 5 presentsnumerical results, validating the practical applicability of our method. In Section 6, we discussour results and give an outlook on future developments. Macromolecular structure determination is of central importance in biology because of the closerelationship between shape and function. As the fields of biology and biophysics increasinglyfocus on the study of cellular phenomena, understanding the structure and motion of largebiomolecular complexes is becoming critical. Large biomolecular complexes bridge the lengthscales of single-molecule studies and cellular phenomena.Two of the main methods used in this context are small-angle X-ray scattering (SAXS)Stuhrmann [1970]; Feigin et al. [1987]; Blanchet and Svergun [2013]; Svergun and Koch [2003];Dong et al. [2015b] and cryo-electron microscopy (cryo EM) Crowther et al. [1970]; Van Heeland Frank [1981]; Penczek et al. [1992]; Frank [2006]; Wang and Sigworth [2006]; Scheres [2012].SAXS is known for its relatively low resolution and low information content in comparison tocryo-EM. Nevertheless, the information provided in these methods can be used to complement2ach other provided that the structural features in both experimental conditions are compatibleKim et al. [2017].The data acquisition processes in these methods are also quite different. A SAXS experimentis fast to perform (given access to a synchrotron) and requires relatively little sample preparation.But it provides very low resolution data. In contrast, a cryo-EM experiment can usually beperformed locally on microscopes owned by most research-intensive universities, and provideslarge amounts of data. But EM needs more effort for sample preparation and moreover, post-processing computations are necessary to classify, denoise and/or class-average images Bhamreet al. [2016]; Park and Chirikjian [2014]; Park et al. [2011]; Penczek [2002]; Schatz and Van Heel[1990]; Sigworth [1998]. After this, the images are aligned to yield a 3D density map at highresolution Penczek et al. [1996]; Scheres et al. [2005]; Shkolnisky and Singer [2012]; Singer et al.[2011]; Wang and Sigworth [2006].Usually either SAXS or cryo-EM is used, but here we investigate the “fusion” of structuralinformation obtained from SAXS and EM experiments. Although the type of data and thedata acquisition processes in these methods are quite different, they do share some commonfeatures. Both SAXS and EM differ from crystallography in that no crystallization is needed(at the expense of lower resolution). The advantage is that finding appropriate crystallizationconditions in many cases is either a very lengthy process or the process may capture the moleculesin conditions that are not biologically relevant.That said, in SAXS experiments, the biomolecular complexes of interest are in solution andare exposed to X-ray beams, which leads to the scattering of X-ray photons. Unlike in X-ray crystallography where the macromolecules are regularly positioned and oriented, in SAXSthe molecules can move and rotate randomly in solution. Hence, roughly speaking, the lowresolution in SAXS can be attributed to the fact that the SAXS data is the spherical averageof the scattering pattern of the complex under study.In cryo-EM, similar to standard tomography, we obtain numerous two-dimensional (2D)projections of a 3D complex. A specimen grid, with the molecules of interest in it, is preparedand exposed to high-energy electron beams to obtain millions of projected 2D images. However,in contrast to tomography, the projections are at random (unknown) directions; moreover,the projections are extremely noisy. As a result, the 3D volume reconstruction in cryo-EMinvolves complicated postprocessing and still the resolution is not very high as compared tocrystallographic structures (although it is much higher than SAXS).Both SAXS and EM have been successfully used for the structural study of large biomolecularcomplexes. However, as explained earlier, they have their own disadvantages for the study offunctionstructure relationships. To better investigate the relationship between the function andthe shape of a biomolecular complex, it may be desirable to fuse the information from these twoexperimental data modalities, which is the goal of this paper.
In this section, we elaborate on the theory of SAXS-EM data fusion in the two-dimensional andthree-dimensional case. We consider the two-dimensional case first, because we will later seethat our idea directly generalizes to the three-dimensional case, which simplifies the derivation.
Though proteins are 3D structures, before going into the mathematical details of 3D case, wefirst develop a conceptual framework illustrated with a 2D toy model.3n the two-dimensional case, the task is to reconstruct the unknown electron density ρ ( x, y )of a fictitious 2D structure of interest from partial Fourier data. The idea of information fusionis to combine the information from both EM and SAXS to fill in the missing values of the Fourierspectrum ˆ ρ ( ω , ω ) in the plane, so that ρ can be reconstructed via a two-dimensional inverseFourier transform. Figure 1 shows the basic two-dimensional setting. In Fourier space, fromEM and the Fourier slice theorem, the values of the spectrum ˆ ρ on each line passing throughthe origin are known. On the other hand, the values of the square average of ˆ ρ on each of theconcentric circles are known from SAXS. For simplicity, Fig. 1 shows only one such circle, withradius equal to one. Many such circles will be used to cover the plane in Fourier space. Figure 1.
SAXS-EM data in the two-dimensional setting (conceptual plot).
Switching to polar coordinates, r and θ , we see that the value of ˆ ρ ( r cos θ i , r sin θ i ) at theintersection between each line and the circle is known, because from the Fourier slice theorem,the EM data can be seen as the values of ˆ ρ on lines passing through the origin, perpendicular tothe respective projection direction (in 3D the slice would be a plane and the projection directionwould be normal to this plane). The sought-after density function ρ : R → R ≥ has the Fouriertransform ˆ ρ ( ω ) = (cid:90) R ρ ( x )e − i ω T x d x . The original function ρ can be reconstructed from its Fourier transform via the inversetransform, ρ ( x ) = (2 π ) − (cid:90) R ˆ ρ ( ω )e i ω T x d ω . (1)In polar coordinates, the frequency vector can be written as ω = r · (cos θ, sin θ ) T , where r = (cid:107) ω (cid:107) is the Euclidean 2-norm of ω . The plane is divided into an infinite number of circleswith radius r ∈ R ≥ . Without loss of generality, let us choose in Fig. 1 the unit circle S withradius r = 1, so that ω = (cos θ, sin θ ) T . In this case, we can use the short hand ˆ ρ ( ω ) ≡ ˆ ρ ( θ ),4o that ˆ ρ ( θ ) describes a function on the unit circle in the ω -plane. Since − (cos θ, sin θ ) =(cos( θ + π ) , sin( θ + π )), we get directly from the definition of the Fourier transformˆ ρ ( θ ) = ˆ ρ ( θ + π ) . (2)From the various projections in EM, only the values of the function ˆ ρ along specific lines in R is obtained, and these lines are not uniformly spaced with respect to their angle. Our goal isto interpolate the data so as to match the projection data of the experiment, while also gettingmeaningful “in-between” values of the function ˆ ρ on the whole circle. The SAXS experimentprovides the square average of the function ˆ ρ along the circle, which is the information that weshall incorporate in the interpolation process.For the unit circle S with radius r = 1, the 2D version of the SAXS experiment gives thesingle value I := (cid:90) S | ˆ ρ ( θ ) | d θ = (cid:90) S ˆ ρ ( θ )ˆ ρ ( θ )d θ. (3)On the other hand, data ˆ ρ ( θ i ) and ˆ ρ ( π + θ i ) = ˆ ρ ( θ i ) at specific points θ i , i = 1 , , . . . , m , onthe unit circle are provided by EM, where m is the number of different projection angles. Weapproximate the function ˆ ρ with a truncated Fourier series ( i. e. , a trigonometric polynomial),ˆ ρ ( θ ) = n (cid:88) k = − n ˜ ρ k e i kθ , (4)where ˜ ρ k = (2 π ) − (cid:90) π ˆ ρ ( θ )e − i kθ d θ (5)are the (classical) Fourier coefficients of the function ˆ ρ on the unit circle S , and n ∈ { , , , . . . } is the approximation order. We note that˜ ρ − k = ( − k ˜ ρ k , (6)due to (2).Now we substitute (4) into (3), and derive I = (cid:90) π (cid:18) n (cid:88) k = − n ˜ ρ k e i kθ (cid:19)(cid:18) n (cid:88) k (cid:48) = − n ˜ ρ k (cid:48) e i k (cid:48) θ (cid:19) d θ (7)= n (cid:88) k,k (cid:48) = − n ˜ ρ k ˜ ρ k (cid:48) (cid:90) π e i kθ e − i k (cid:48) θ d θ = 2 π n (cid:88) k,k (cid:48) = − n ˜ ρ k ˜ ρ k (cid:48) δ k,k (cid:48) = 2 π n (cid:88) k = − n | ˜ ρ k | , where δ k,k (cid:48) is the Kronecker delta symbol, which is equal to one if k = k (cid:48) and equal to zerootherwise. Equation (7) is an approximation to the well-known Parseval equality. Now let usbring this equation into matrix-vector notation. To this end, we introduce the vectors5 ρ := (cid:0) ˜ ρ − n , . . . , ˜ ρ − | ˜ ρ | ˜ ρ , . . . , ˜ ρ n (cid:1) T . ˆ ρ = (cid:0) ˆ ρ ( θ ) , ˆ ρ ( θ ) , . . . , ˆ ρ ( θ m ) (cid:1) T . With this vector, (7) attains the form I = 2 π ˜ ρ H ˜ ρ , where ˜ ρ H = ˜ ρ T is the Hermitian transpose of ˜ ρ .Now turning back to (4), we find that ˆ ρ = A ˜ ρ . (8)where A = e − i nθ . . . e − i θ i θ . . . e i nθ e − i nθ . . . e − i θ i θ . . . e i nθ ... ... ... ... ...e − i nθ m . . . e − i θ m i θ m . . . e i nθ m ∈ C m × (2 n +1) . It can be seen from (5) that the unknown ˜ ρ is real. We bring this unknown quantity to theleft-hand side of (8), ˆ ρ − ˜ ρ = F ˜ ρ (cid:48) (9)where is the vector of length 2 m containing only ones, F = e − i nθ . . . e − i θ e i θ . . . e i nθ e − i nθ . . . e − i θ e i θ . . . e i nθ ... ... ... ...e − i nθ m . . . e − i θ m e i θ m . . . e i nθ m ∈ C m × n , ˜ ρ (cid:48) = (cid:0) ˜ ρ − n · · · ˜ ρ − | ˜ ρ · · · ˜ ρ n (cid:1) T . (10)Suppose the matrix F is invertible (by imposing the constraint m = n ), then ˜ ρ (cid:48) = F − ( ˆ ρ − ˜ ρ ) . (11)Substituting (11) into (7), we finally get˜ ρ + [ F − ( ˆ ρ − ˜ ρ )] H [ F − ( ˆ ρ − ˜ ρ )] = (2 π ) − I. (12)This is a quadratic equation in ˜ ρ , so we can solve it for ˜ ρ first. Subsequently, we can solve(11) for ˜ ρ (cid:48) , so that all the Fourier coefficients ˜ ρ are found. This in turn allows for interpolationof the given spectral values ˆ ρ ( θ ) , . . . , ˆ ρ ( θ m ) on the circle. Note that the symmetry relations(2) and (6) can be exploited here. Alternatively, an adequate algorithm for solving the system(11) is the non-equispaced fast Fourier Transform (NFFT) of Potts et al. [2001]. This algorithmand its so-called adjoint could also be used in the first step in an iterative scheme for solvingthe quadratic equation (12) for ˜ ρ . Performing the above-described procedure on a suitablenumber of concentric circles, we can potentially interpolate the complete spectrum ˆ ρ of ρ . Inany case, from the interpolated spectral data, we can reconstruct the sought-after density ρ from6ts spectrum ˆ ρ via a discretized inverse Fourier transform, cf. (1). This last step can be carriedout using either a classical inverse FFT or the inverse NFFT of Potts et al., depending on thepoints at which the spectrum ˆ ρ is interpolated.Before we show how the above method for SAXS-EM data fusion generalizes to the three-dimensional case, an important remark is in order. Of course, the above matrix F will beinvertible if and only if m = n and if the projection angles θ i are pairwise different from eachother. If the number m of EM projection angles is smaller than the approximation order n , thanthe Moore-Penrose pseudo inverse of the matrix F , or any other method for solving the linearsystem (9) could be used. However, it clearly makes sense to choose the approximation order n such that it matches m , so that the above-described method for SAXS-EM data fusion resultsin a quadratic and invertible matrix F . If some of the EM projection angles θ i are very similarto each other, it could be numerically more stable to exclude the “redundant” angles, and tochoose the approximation order n accordingly. The over-determined case m > n is neither ofpractical nor of theoretical interest for here the information of SAXS would theoretically alreadybe included in the EM data.Due to the geometric properties of the two-dimensional unit sphere, which are different fromthose of the three-dimensional unit sphere, the situation in the three-dimensional case is slightlymore complicated than this, as we will see later. But the same conceptual framework applies. In this subsection, the basic principle behind SAXS-EM data fusion is illustrated numerically.To show the general concept, the experiment is performed in real space. But the calculationmethod can be implemented in Fourier space as well.The following function on a unit circle is used as a testing model, f ( θ ) = (2 sin θ + cos θ + 2 cos θ ) In the experiment, it is supposed that values of the function are known when θ ∈ [30 ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ], which is used as EM data.The SAXS information can then be calculated as I (1) = (cid:90) π | f ( θ ) | d θ (13)Both the EM reconstruction and SAXS-EM fusion reconstruction have been performed.Results are shown in Fig. 2, in which the two dashed lines (in red and green) are results fromthe SAXS-EM fusion, the black dotted line shows the EM result, and the blue dashed line withblack dots is the original function. 7 igure 2. Reconstruction results of EM and SAXS-EM comparing with the original function
It can be seen from the figure, the SAXS-EM fusion technique provides two very differentresults. One fits the original function pretty well, but the other one goes far from the groundtruth. The two result phenomenon can be explained since (12) is a quadratic equation solvingfor ˜ ρ , which yields two solutions. However, as shown in the figure, one of the two solutions isdefinitely wrong.In this 1D example, the value of the function can be used as the filter. Since the originalfunction indicates the density of a model must be positive in each point, the red dash linecan be easily eliminated. In the 2D and 3D cases, the minimum mean square error has beenadopted to choose the solution which gives a more stable intensity value, I ( ρ ). However, thisis a computationally intensive procedure and the origin of the two solutions obtained as well asthe filtering steps will be explained in detail later in Section 4. In single-particle cryo EM, we have many copies of the same biomolecular structure (each calleda “particle”) oriented in many different ways. If ρ : R → R ≥ is the density function of theparticle as described in its own body-fixed frame, and if R i is the rotation matrix describingthe orientation of the i th particle relative to the laboratory frame, then the projection along thelaboratory z -axis will be π i ( x, y ) := (cid:90) ∞−∞ ρ ( R T i x )d z = (cid:90) ∞−∞ ρ ( xR T i e + yR T i e + zR T i e )d z, where x = ( x, y, z ) T ∈ R , while e , e , e ∈ R are respectively the canonical unit vectors in x -, y -, and z -direction.This means that in the body-fixed frame of the i th particle, the projection direction n i is n i = R Ti e . π i of the projection π i ( x, y ) is a slice of the three-dimensional Fourier transform,ˆ ρ ( ω ) = (cid:90) R ρ ( x )e − i ω T x d x of the density function ρ in the plane through the origin with normal n i . Explicitly, the Fourierslice theorem here readsˆ π i ( ω ) = ˆ ρ ( ω R T i e + ω R T i e ) , ω = ( ω , ω ) T ∈ R . (14)The density function ρ can be recovered from its Fourier transform ˆ ρ via the inverse Fouriertransform, ρ ( x ) = (2 π ) − (cid:90) R ˆ ρ ( ω )e i ω T x d ω . (15)Let us assume that an existing EM reconstruction method has determined R , . . . , R n , where n is the total number of particles. Our goal is to fuse the experimental EM information for i = 1 , ..., n given in the left-hand-side of (14) with the SAXS experimental measurements. TheSAXS data is the square average of the Fourier transform ˆ ρ over spheres centered at the origin, I ( r ) := (cid:90) S | ˆ ρ ( r u ) | d u = (cid:90) S ˆ ρ ( r u )ˆ ρ ( r u )d u , (16)where S is the two-dimensional unit sphere in R . Figure 3.
Geometric situation in the three-dimensional case (conceptual plot).
Our approach for SAXS-EM data fusion in the three-dimensional case will be a direct gener-alization of the two-dimensional case discussed in the previous section. We wish to reconstruct ˆ ρ by viewing the Fourier space as consisting of an infinite number of concentric spheres. On eachof these spheres will be a certain number of so-called great circles, arising from the intersection9f a plane containing the origin with the respective sphere, cf. Fig. 3. The values of the spectrumˆ ρ on these great circles are provided by EM data, cf. (14). The SAXS information (16) will serveto improve the reconstruction in the potentially large “gaps” between the great circles.Apart from that, the main difference between the three-dimensional and the planar case isthat instead of the classical Fourier basis e i kθ , we will now employ spherical harmonics that webriefly review below. On the two-dimensional unit sphere S in R , let us introduce spherical coordinates θ and φ such that for any point u = ( x, y, z ) on the unit sphere S it holds x = sin θ cos φ,y = sin θ sin φ,z = cos θ. The angle θ ∈ [0 , π ] is called the polar angle, while φ ∈ [0 , π ) is referred to as the azimuthalangle. The integral of any integrable function f : S → C can be computed via (cid:90) S f ( u )d u = (cid:90) π (cid:90) π f ( θ, φ )d φ sin θ d θ. If a function f is square-integrable over S , i. e. , (cid:90) S | f ( u ) | d u < ∞ , then it can be expanded in terms of the spherical harmonics. These are defined as Y lm ( θ, φ ) := (cid:115) (2 l + 1)4 π ( l − m )!( l + m )! P ml (cos θ )e i mφ , l ∈ { , , , . . . } , m ∈ {− l, . . . , l } , where P ml ( x ) := ( − m l l ! (1 − x ) m/ d l + m d x l + m ( x − l , x ∈ [ − , , is an associated Legenrde polynomial. This is possible because the spherical harmonics consti-tute an orthonormal basis for L ( S ), the set of all square-integrable functions on the sphere.Such an expansion reads as f = ∞ (cid:88) l =0 l (cid:88) m = − l ˜ f lm Y lm , (17)where ˜ f lm = (cid:90) S f ( u ) Y lm ( u )d u (18)are the generalized, spherical Fourier coefficients of the function f . Equality in (17) holds in the L sense, but in general not pointwise. However, as is commonly done, we can use a truncated10ersion of the Fourier series as a pointwise approximation to the function f of interest, whichwill be smooth in general.Without going into too much detail, we note that the spherical harmonics have nice propertiesunder rotation. Indeed, for a given rotation matrix R we find that Y lm ( R T u ) = l (cid:88) m (cid:48) = − l D ( l ) mm (cid:48) ( R ) Y lm (cid:48) ( u ) , where D ( l ) mm (cid:48) are the well-studied Wigner-D functions (see Biedenharn and Louck [1981]; Chirikjianand Kyatkin [2016] for example). In particular, this means that a rotated spherical harmoniccan be written as a linear combination of spherical harmonics of the same degree. As in the two-dimensional case, without loss of generality, let us confine ourselves to the unitsphere S , i. e. , the sphere with radius r = 1 in R . We can approximate the sought-after densityspectrum ˆ ρ restricted to the unit sphere with a truncated spherical Fourier series,ˆ ρ | S = N − (cid:88) l =0 l (cid:88) m = − l ˜ ρ lm Y lm , where l ∈ { , , , . . . } is the approximation order, and ˜ ρ lm are the spherical Fourier coefficientsof ˆ ρ | S , cf. (18).The SAXS experiment gives I := I (1) = (cid:90) S (cid:18) N − (cid:88) l =0 l (cid:88) m = − l ˜ ρ lm Y lm ( u ) (cid:19)(cid:18) N − (cid:88) l (cid:48) =0 l (cid:48) (cid:88) m (cid:48) = − l (cid:48) ˜ ρ l (cid:48) m (cid:48) Y l (cid:48) m (cid:48) ( u ) (cid:19) d u (19)= N − (cid:88) l =0 l (cid:88) m = − l N − (cid:88) l (cid:48) =0 l (cid:48) (cid:88) m (cid:48) = − l (cid:48) ˜ ρ lm ˜ ρ l (cid:48) m (cid:48) (cid:90) S Y lm ( u ) Y l (cid:48) m (cid:48) ( u )d u = N − (cid:88) l =0 l (cid:88) m = − l N − (cid:88) l (cid:48) =0 l (cid:48) (cid:88) m (cid:48) = − l (cid:48) ˜ ρ lm ˜ ρ l (cid:48) m (cid:48) δ ll (cid:48) δ mm (cid:48) = N − (cid:88) l =0 l (cid:88) m = − l | ˜ ρ lm | . Let us assume that the EM data ˆ ρ ( u ) , . . . , ˆ ρ ( u M ) are given on the unit sphere at points u , . . . , u M ∈ S . Note that these points will lie on a certain number of great circles, cf. Fig. 3.Analogous to (2), in the three-dimensional case we have thatˆ ρ ( u ) = ˆ ρ ( − u ) . (20)Note that in spherical coordinates, − u parity transforms a point u with coordinates ( θ, φ )to ( π − θ, π + φ ). From (20) and with the well-known relation Y lm = ( − m Y l, − m , it can furtherbe derived that (cf. (6)) ˜ ρ lm = ( − l + m ˜ ρ l, − m . (21)11et us introduce the EM-data vector ˆ ρ := (ˆ ρ ( u ) , . . . , ˆ ρ ( u M )) T , as well as the sought-after frequency vector ˜ ρ := (˜ ρ , | ˜ ρ , − , ˜ ρ , , ˜ ρ , | ˜ ρ , − , . . . , ˜ ρ , | . . . | ˜ ρ N − , − N , . . . , ˜ ρ N − ,N − ) T . The SAXS data (19) thus attain the form I = ˜ ρ H ˜ ρ . (22)Now we can state the EM reconstruction problem in matrix-vector notation as ˆ ρ = B ˜ ρ (23)where B = Y , ( u ) Y , − ( u ) Y , ( u ) Y , ( u ) . . . Y N − ,N − ( u ) Y , ( u ) Y , − ( u ) Y , ( u ) Y , ( u ) . . . Y N − ,N − ( u )... ... ... ... ... Y , ( u M ) Y , − ( u M ) Y , ( u M ) Y , ( u M ) . . . Y N − ,N − ( u M ) ∈ C M × N . When l = m = 0, from (21), we have ˜ ρ , = ˜ ρ , and so ˜ ρ , is real. Bringing this unknownto the left-hand side while noting that Y , = (4 π ) − / , the system (23) transforms into ˆ ρ − ˜ ρ , √ π = F ˜ ρ (cid:48) (24)where is the vector of length M containing only ones, F = Y , − ( u ) Y , ( u ) Y , ( u ) . . . Y N − ,N − ( u ) Y , − ( u ) Y , ( u ) Y , ( u ) . . . Y N − ,N − ( u )... ... ... ... Y , − ( u M ) Y , ( u M ) Y , ( u M ) . . . Y N − ,N − ( u M ) ∈ C M × ( N − , ˜ ρ (cid:48) = (cid:0) ˜ ρ , − ˜ ρ , ˜ ρ , | · · · | ˜ ρ N − ,N − (cid:1) T . With the Moore-Penrose pseudo inverse F † of F , the system (24) can be solved as ˜ ρ (cid:48) = F † ( ˆ ρ − ˜ ρ , / √ π ) . (25)Inserting this equation into (22) then yields˜ ρ , + [ F † ( ˆ ρ − ˜ ρ , / √ π )] H [ F † ( ˆ ρ − ˜ ρ , / √ π )] = I. (26)As in the two-dimensional case, this is a quadratic equation with respect to the first unknownFourier coefficient ˜ ρ , . After solving this quadratic equation, the remaining Fourier coefficients ˜ ρ (cid:48) can be determined by solving the linear system (25) subject to (20) and (21), so that all theFourier coefficients ˜ ρ are found by inverting (23), which we do via the pseduo inverse based onthe SVD. An alternative tool for this task is the well-known fast non-equispaced spherical Fouriertransform of Kunis and Potts [2003]. This fast algorithm and its adjoint can also be used in the12rst step, in which the quadratic equation (26) would then be solved for ˜ ρ , with a suitableiterative technique. Once the spherical Fourier coefficients ˜ ρ are determined, the spectral EMdata ˆ ρ can be interpolated on the sphere. After this is done on a suitable number of concentricspheres, the spectrum ˆ ρ of ρ can potentially be interpolated in R . The sought-after densityfunction ρ can now be reconstructed from the interpolated spectral data via a discretized inverseFourier transform, cf. (15).To close this section, let us comment on the relation between the number M of EM datapoints considered on the unit sphere S above and the approximation order N . In the two-dimensional case, we saw that this relation was very clear: As long as the different projectionangles θ i in EM are not too similar to each other, the approximation order m should be chosen tobe equal to the number of projection angles n , for this will result in a quadratic and invertibleFourier matrix F in (9). As mentioned above, the situation in the three-dimensional case isslightly more complicated than that. In particular, it would in general not be sufficient tochoose the approximation order N equal to √ M , although this would mean that we had asmany data points as spherical Fourier coefficients that we wish to reconstruct ( N ). The reasonfor this is that the EM data will always lie on great circles (again, cf. Fig. 3), which is an inherentproblem in EM imaging if reconstruction is performed using concentric spheres. Indeed, it iswell known from constructive approximation theory on the sphere that N pairwise differentsample points on one great circle are not sufficient for reconstructing a spherical polynomial ofdegree N − N Fourier coefficients (see Freeden et al. [1998], for instance). The situation isslightly relaxed by the fact that there will be more than one great circle in practical applications.The number of different projection angles, however, can be quite limited, and there might belarge gaps between the projection planes, where information is missing. This makes includinginformation available from SAXS even more desirable.
This numerical experiment shows the difference between the EM reconstruction and the SAXS-EM reconstruction. The aim is to illustrate that when additional data from SAXS experimentsis supplemented, the reconstruction is better for the three-dimensional case. The calculation isperformed in Fourier space directly.The 3D function used in this experiment is f ( x ) = e − x − x o ) − e − x − x o ) where x o = (cid:2) / √ / √ (cid:3) , x o = (cid:2) − (cid:3) .The value of the function on a unit sphere ( r = 1) is shown in Fig. 4. In the numericalexperiment, 20 projections of the function were randomly generated. On each projection, whichis a unit sphere, 50 evenly distributed points are selected with known values. This constitutesthe EM data. 13 igure 4. Geometric situation in the three-dimensional case (conceptual plot).
The roots that the quadratic equation in (26) is expected to generate can be compared withan analytical solution calculated using (18),˜ ρ , = π √ π ( 1 − e − × − − e − ×
12 )and hence, ˜ ρ , = 0.Figure 6(a) shows the reconstruction result using only the EM data. Qualitatively thereconstruction creates fictitious spots on the sphere and does not reproduce the peaks of theGaussian correctly. This is reflected in the absolute error plot adjoining it. The reconstructionresult from SAXS and EM fusion and the corresponding absolute error are shown in Fig. 5. (a) (b) Figure 5.
Result of SAXS-EM reconstruction and its absolute error a) (b) Figure 6.
Result of EM reconstruction and its absolute error
By comparing Fig. 6(b) and Fig. 5(b), the reconstruction error has been reduced 50% byfusing SAXS data. Furthermore, it is worth to note that the coarser the EM data sampling,the greater the reconstruction error. In such instances, the role of SAXS data becomes morevaluable.
In this section, we describe in greater detail the origin, underlying structure as well as themethod of selection employed to choose between the roots of the quadratic equations in (12)and (26).Let a represent the coefficient of the zero-order term in the Fourier or spherical harmonicexpansion ( i. e. ρ and ρ , respectively). The two roots of the quadratic equation, a and a ,and they can be used to derive two different sets of ˜ ρ from (11) and (25): ˜ ρ and ˜ ρ . From theintensity constraint ((19) for 2D and (22) for 3D), we observe that for any ˜ ρ that satisfies theconstraint, a transformed coefficient vector, U ˜ ρ where U is a unitary matrix ( i. e. U U H = I )also satisfies the constraint. Hence, we can write ˜ ρ = U ˜ ρ . We now concentrate on the 2Dcase, although a similar analysis can be performed for the 3D scenario as well. We let 2 n + 1 denote the dimension of the ˜ ρ vector (the superscript will henceforth be omittedand the relationship between the two solutions be made clear through the unitary matrix) andlet m = 2 n be the dimension of the EM data, ˆ ρ . As we proceed to explain, the differenceof 1 between the two dimensions leaves the system under-determined and therefore allows theintensity constraint to close the system of equations. If the difference between the two dimensionswere larger, one would require the Moore-Penrose pseudo inverse to approximate the solutions(as is performed in the three-dimensional scenario). We proceed by making some observationsregarding the under-determined nature of the system of equations. Since ˜ ρ and U ˜ ρ both satisfyequation (8), 15 ρ = A ˜ ρ = AU ˜ ρ . (27)This implies that, A ( U − I ) ˜ ρ = . (28)Hence, the matrix U − I transforms a non-zero ˜ ρ such that it finds itself in the null space of A .This result implies that A has linearly independent rows but linearly dependent columns. This isa consequence of finite discretization: EM data that is known at some points can allow for morethan one interpolation of the Fourier transform between the data points. The quadratic natureof the intensity constraint narrows the space of admissible solutions (that satisfy the discreteset of equations) to two. Roots generated after gradual refinement of the EM data with moreslices can be expected to converge towards the true root. A related method to select the trueroot without needing refined EM data would be to selectively block data obtained at variousangles and evaluate the roots. The spurious root would be expected to fluctuate significantlywhereas the true root would be fairly stable throughout this blocking and calculation procedure.However such a method is also computationally intensive especially in two or three dimensionalcases. In such scenarios other techniques of root selection would be important, and these wouldbe highlighted later. A similar procedure can be performed for 3D although the situation is more complicated. We let N denote the dimension of the ˜ ρ vector and let M be the dimension of the EM data, ˆ ρ . Againthe set of equations would be under-determined since due to the interpolation conditions on asphere, √ N data points are not sufficient to interpolate an N − B in (23). Note that the fundamental assumption is that the system of equations isoriginally under-determined and allows for the (approximate) evaluation of a root. As mentionedearlier, over-determined systems would not be of practical interest. The quadratic equation (12) in the two-dimensional problem can be expanded as, (cid:0) H M (cid:1) ˜ ρ − ρ H M ˜ ρ + ˆ ρ H M ˆ ρ − I π = 0 (29)where M = ( F F H ) − is a Hermitian matrix that contains information about the distributionof the EM sampled points. The roots of this equation are,˜ ρ = ˆ ρ H M ± (cid:113) (1 + H M )[ I π − ˆ ρ H ( M − M H M H M )ˆ ρ ]1 + H M . (30)We also observe that the intensity only appears in the discriminant, which is proportional tothe term under the square root in (30). In fact, since H M ≥ M , the nature of roots is solely controlled by the sign of[ I π − ˆ ρ H ( M − M H M H M )ˆ ρ ]. 16ow, we define I ∗ as the following expression, I ∗ = 2 π ˆ ρ H (cid:18) M − M H M H M (cid:19) ˆ ρ. (31)Here, I ∗ can be interpreted as an intensity since it has a similar form as the original intensityconstraint in (22). Moreover, it will be shown that I ∗ is in fact the lowest possible intensitythat the EM data can admit, and is associated with the Fourier interpolation with the lowestnorm that is used to fit the EM data. All values of I (cid:54) = I ∗ will allow for two different rootsto exist. If I < I ∗ then the roots are complex whereas when I ≥ I ∗ the roots are real. Infact, the difference between the two roots would be proportional to √ I − I ∗ . The constantof proportionality would be related to the sampling of angles in the EM data since the scalar1 + H M is related to the angular spacing between various sampling points (and will have aconsiderably simple structure if the samples are uniformly spaced). A similar form of I ∗ existsfor the three-dimensional problem where we have, I ∗ = ˆ ρ H (cid:18) M − M H M π + H M (cid:19) ˆ ρ. (32)where M = ( F F H ) † and is now expressed in terms of the pseudo-inverse. The special case of complex conjugate roots:
Since the matrix M = ( F F H ) − is Her-mitian, the coefficients of the quadratic equation (12) are real (a similar conclusion can alsobe drawn for the three-dimensional case in (26)). The three possible solutions of ˜ ρ would berepeated real roots, distinct real roots and a complex conjugate pair. Here, we make a remarkon the significance of the complex conjugate pair of roots. A complex conjugate pair of rootswould imply that ˜ ρ = ˜ ρ . Relating the two vectors by the unitary matrix transformation, weget, ˜ ρ = U ˜ ρ . (33)Multiplying both sides by the inverse of the Hermitian matrix, U H and taking the conjugateof both sides, ˜ ρ = U T ˜ ρ . (34)Equating the right hand side of (33) and (34), we obtain U = U T for a non-zero ˜ ρ . We alsopoint out that a symmetric unitary matrix has many interesting properties, especially under thedecomposition, U = A + iB for symmetric real matrices, A and B . We observe from U U H = I that A + B = I and AB = BA . Particularly, if we define Y = B + iA , then Y U = U Y = iI .However, the existence of complex roots violates the important symmetry condition in equation(6) and its corresponding form in the three-dimensional case (21). A complex root suggests thatno real value of ˜ ρ can satisfy both the intensity as well as the EM data fitting constraint inthe given set of EM data. Complex roots (since it appears in pairs) are yet again symptoms ofdiscretisation introduced by the finite set of EM data and will disappear with finer sampling. Asimple numerical technique to deal with complex roots would be to find the closest real root toapproximate a given complex root that in this case would be the real component of the complexroot since it corresponds to the minimum point of the parabola representing the quadraticexpression. However it must be noted that by removing the imaginary component of the root,one also removes the information provided by the SAXS fusion data (since the intensity appearsin the discriminant, which finds itself only in the imaginary component of the complex root). In17uch scenarios, the SAXS-EM fusion would degenerate to a reconstruction from EM data alone.The situations when the SAXS information fusion degenerates to an EM reconstruction will beexplained in the following section in greater detail. In this subsection, we aim to establish a connection between the results obtained from theSAXS-EM fusion and those obtained from EM data alone. By doing so, it is possible to shedlight on the characteristics of the roots obtained in (30).Let a vector, ˜ ρ , be evaluated in the EM experiment such that, ˜ ρ = C † ˆ ρ . (35)Here, C is a matrix that corresponds to A (8) in the two-dimensional case or B (23) in thethree-dimensional case. It is important to point out that since C † is usually under-determined,(35) would have an infinite number of solutions and without any other constraint, the Moore-Penrose pseudo inverse would select the solution, ˜ ρ with the lowest Euclidean norm. The valueof the norm is proportional to the intensity obtained from the EM calculations, i. e. , I EM = 2 π ˜ ρ H ˜ ρ . (36)Substituting (35) in (36) and expressing C † as C H ( CC H ) − (which is true for under-determinedsystems), we obtain, I EM = 2 π ˆ ρ H ( CC H ) − ˆ ρ (37)using the Hermitian property of CC H . The matrix, CC H , can be related to the F matrixdefined in equations (10) and (24) as CC H = F F H + H where is a column vector of ones.The inverse of CC H appears in (37) and the formula derived by Sherman and Morrison [1950]allows us to express, ( CC H ) − = ( F F H + H ) − = ( F F H ) − − ( F F H ) − H ( F F H ) − H ( F F H ) − ) . (38)Here tr represents the trace operator. Substituting M = ( F F H ) − in (38) and using theresult in (37), we see that, I EM = I ∗ (39)where I ∗ was defined in (32).The result has important implications in understanding the usefulness of the SAXS intensityinformation in the fusion procedure. Firstly, we emphasize that I EM sets a minimum boundto the intensity value that can be used to set a constraint on a given set of EM data. Whenthe SAXS intensity information is greater than this value, i. e. I > I EM , one obtains tworeal solutions. As shown earlier, these solutions are related by a unitary transformation andgeometrically they can be visualized as two points on a constant intensity sphere in ˜ ρ spacethat can be related through reflections or rotations. When I = I EM the SAXS-EM fusiondegenerates to an EM reconstruction alone and the SAXS intensity information adds no furtherinformation to the EM data. Geometrically, the solution space collapses to a point and the onlyunitary transformation that can exist is the identity transformation. When I < I EM , it is notpossible to fuse the SAXS information with the given set of EM data without violating some18f the constraints (in this case, the symmetry conditions). In practical usage, the imaginaryterm is dropped such that we yet again obtain the situation where I = I EM and the SAXS-EMfusion degenerates to an EM reconstruction. Hence, SAXS intensity information can only addadditional information to the results from the EM experiment when I > I EM .We also make a final remark that when the EM data is sampled over concentric circlesor spheres, the lower bound of intensity applies separately for each radius. In overall, it isreasonable to expect that for some, but not necessarily all, radii the SAXS-EM fusion woulddegenerate to an EM reconstruction. Additionally, this result as of yet makes no predictionon whether the SAXS-EM reconstruction is “better” than the EM reconstruction. This woulddepend on other factors as well such as sampling, form of function to be reconstructed and orderof the harmonic interpolation and on the conditioning of the relevant matrices. In the development of the numerical algorithm to implement SAXS-EM fusion, the importantquestion of how to select the “correct” root needs to be answered. For the case of complex andrepeated roots, this is not a problem since in both cases the real part of the two roots are thesame. However, for the case of real but distinct roots, an additional root selection method needsto be devised. The underlying concept behind the creation of such a method is to observe thatthe spurious root would be more highly dependent on discretiszation parameters such as thesample points and their spacing. On the other hand, the true root would tend to be stable as themesh is refined or the sample points modified. For instance, consider the Fig. 7 that shows thevariations of the two roots with increasing refinement of samples of EM data for the problem inSection 3.3.3: roughly, the higher the number of great circles sampled, the lesser the differencein the absolute values of the two roots. Both roots converge to the analytic solution of zero asthe number of great circles increases. However, the convergence is not monotonic since the greatcircles are randomly chosen for each step of sample refinement. Hence, the fluctuations dependon the sensitivity of a given root to the set of sample points; the greater the fluctuation thegreater the likelihood that a given root can be considered as “spurious”. The yellow markersindicate the root that is ultimately selected by the algorithm to perform the SAXS-EM fusionfor the given set of data. In this specific case, root 1 appears to experience fluctuations of loweramplitude and is selected in preference to root 2 for most part. The selection was carried outby a direct summation method, which will be explained further in the subsequent paragraphs.19 igure 7.
Convergence of the absolute magnitudes of the roots with increasing number of sampledgreat circles.
The problem of selecting the correct root is complicated by the fact that with the discrete setof EM data, one does not usually have a priori knowledge of the final reconstructed function. Insome cases, it may be possible to assume that the final reconstruction should be fairly smoothand hence spurious Fourier interpolations that fluctuate greatly over the domain of EM datacan be eliminated. This can be done by integrating (5) or (18) numerically, using the EMdata, to obtain an approximate value for ρ or ρ , respectively. This value forms a guess ora “selector” and would then be compared with the roots of the quadratic equation and thevalue with a lower absolute difference is chosen. This working assumption would then be thatthe numerical integration of the original EM data to obtain the zeroth order Fourier coefficientwould be similar to the numerical integration of the interpolated data and this may be true,roughly, if the fluctuations have a period greater than the average spacing between the discreteEM data samples.There are many ways to carry out this integration. One way would be to interpolate the orig-inal EM data (using a polynomial for instance) over the sphere of interest and then evaluate theintegrals in (5) and (18) through a quadrature rule. Another method would be to approximatethe integral as a summation over the EM data as the following in three-dimensions,˜ ρ , ≈ √ π m (cid:88) i =1 ˆ ρ ( θ i , φ i ) sin θ i ∆ θ ∆ φ (40)and as, ˜ ρ ≈ π m (cid:88) i =1 ˆ ρ ( θ i )∆ θ. (41)for the two-dimensional case.Here, there are 2 m points in the EM data set and ∆ θ and ∆ φ represent an average gridspacing in the polar and azimuthal directions respectively. The following plot presents the20onvergence rates of the roots along with that of the selector, again applied to the example inSection 3.3.3. Figure 8.
The variation of the real parts of the roots as well as that of the root selectors (evaluatedby direct summation or interpolation) with increasing number of sampled great circles. The imaginaryparts of the roots are of order 10 − and have been neglected. From the figure it is clear that interpolating the EM data over the sphere followed by in-tegration to obtain the zeroth order Fourier series coefficient is worse as a selector as opposedto approximating the root through a direct sum method in (40) or (41). The analytic solutionfor the root is zero for this particular example and it can be seen that the selector evaluatedthrough an interpolation followed by numerical integration converges too slowly with samplerefinement. The roots are therefore selected through (40) and plotted as the green circles. Theroot evaluated directly from the EM data is also plotted; this corresponds to the value of ρ , that results in a unique real solution for (26). Since we know from Section 4.4 that this solutionis the average of the two roots evaluated from SAXS-EM fusion, the EM root would not selectone root in the preference of another as both SAXS-EM fusion roots differ by the same absolutemagnitude from it; hence, the EM solution forms an unbiased reference to compare various rootselection strategies against each other.Another approach to selecting roots is to repeatedly block angles of EM information, onedata point at a time, and calculate the roots for the angles that remain. This is repeated forall angles in the set of EM data and the root whose value is nearly stable in this blocking andcalculating procedure is selected. Although this does allow for a good reconstruction, as shownin Section 5.1, it is a computationally intensive procedure and was replaced by (40) and (41)for more complicated shapes such as the “smiley” and “minion” tested later.21 Numerical Experiments
An experiment involving the reconstruction of a sum of two two-dimensional Gaussian functionsis performed in this section. The original function in real space is displayed in Figure 9 and hasthe following expression, f ( x, y ) = 12 πσ σ exp (cid:18) − (cid:18) x σ + y σ (cid:19)(cid:19) + 12 πσ σ exp (cid:18) − (cid:18) x σ + y σ (cid:19)(cid:19) (42)where σ = 0 . σ = 0 . σ = 0 .
05 and σ = 0 . Figure 9.
Sum of two co-origin Gaussian Functions in real space (conceptual plot).
The Fourier transform of this function is the sum of another two Gaussian functions, asshown in Fig. 10. The analytical expression for this Fourier transform would then be, F ( ω , ω ) = exp (cid:18) − σ ω − σ ω (cid:19) + exp (cid:18) − σ ω − σ ω (cid:19) . (43)22 igure 10. Fourier transform of the sum of two co-origin Gaussian Functions in Fourier space (con-ceptual plot).
A comparison of the Fourier transform of the original density with that obtained by SAXS-EM fusion and EM alone with the projection angle gap [60 ◦ , ◦ ] is shown in Fig. 11. Theknown EM data is evenly distributed in the meshed range, while the unmeshed region indicatesthe gap in the data. Figure 11.
Sampling of EM data
Figure 12 shows the result with the projection angle gap [60, 110]. As can be seen fromthe figure, SAXS-EM fusion gives results close to the original function while EM alone cannotreconstruct the values well. This result therefore shows a success of the synergism betweencryo-EM and SAXS in the 2D case. More tests were run with different original functions, inputangles, regions and widths of gaps, showing consistent results.23 igure 12.
Comparison of the Fourier transform of original density with that obtained by SAXS-EMfusion and EM alone
A two-dimensional smiley face was constructed using (47) and substituting N = 2 in the defi-nition of the Gaussian in (45). Here, the mean vector, µ k would represent the two-dimensionalcoordinates of a point on a discretized smiley face, with each coordinate in the smiley facemapping to a µ k and a corresponding Gaussian, f ( x ; µ k , Σ k ). The weighting coefficient A k was adjusted for each Gaussian to modify the amplitude of each function and a smiley face wascreated as shown in Fig. 13. The analytical Fourier transform, F ( ω ) was evaluated for eachGaussian using (46) with N = 2 and summed using (48). Figure 13.
Smiley Face constructed by sum of 2D Gaussian distributions
Since the SAXS experiment provides the square average of the function F ( ω ) along circleswith different radii, we let ω = [ r cos θ, r sin θ ]; the Fourier transform plotted in polar coordinatesis shown in Fig. 14. Then, the SAXS data can be obtained by integrating the square of theamplitude of the analytical Fourier transform along each circle with radius r . Similar to (13),the SAXS information is, I ( r ) = (cid:90) π | F ( r, θ ) | d θ . (44)24 igure 14. Geometric situation of Smiley Faceafter Fourier transform
Figure 15.
Reconstruction of 2D smiley facewith no projection gap
In our numerical experiment, the Fourier transform of the original smiley face is discretizedinto 65 circles whose radii are evenly spaced. Along each circle, 65 points are distributed evenly.Following this, a gap is introduced in the projection directions in order to simulate the EMexperiment. All numerical simulations were performed with a MATLAB R2018b, i5-4690 CPU.Both the EM reconstruction and SAXS-EM fusion reconstruction were completed along eachcircle where a quadratic (29) was solved. Equation (30) gives the analytical expression of theroot. But instead of selecting the true root by selectively blocking data obtained at variousangles and choosing the fairly stable one, we obtain an approximate value of the root calculatedby (41) as a reference to to select root. The one closer to this approximated value is chosen asthe right root. On the other hand, if the roots are complex, we only take the real part of the root.During the numerical experiment, when the radius in Fourier space gets larger, the root tendsto transform from real to complex, introducing relatively large errors when compared with theanalytical solution for the true roots. But in a real EM experiment, we have no way to do such acomparison with an analytical result because only discrete data from EM is provided. Therefore,the root solved from the quadratic equation that is closest to the approximated selecting rootis the one that we use to fuse the SAXS and EM information. ◦ Projection Angle Gap
Figure 15 provides the ground truth of the smiley face reconstruction using complete EMdata alone ( i. e. there are no missing wedges in the EM data). Figure 16 shows the result withthe projection angle gap [60 ◦ , ◦ ]. As can be seen from the figure, SAXS-EM fusion restoresmost information in the projection gap that EM alone cannot. EM data alone also gives a falserestoration at the edge of the region. These high frequency artefacts in Fourier space correspondto sharp discontinuities or changes in the reconstructed smiley face in real space. The presenceof such artefacts can also lower the resolution of the final reconstructed smiley face.25 igure 16. Comparison of Fourier transform of original density with that obtained by SAXS-EMfusion and EM alone
With the results provided by the two methods above, a real smiley face can be reconstructedby interpolating from the discrete data in polar coordinates to Cartesian coordinates to set-upthe data for an inverse FFT. To improve the accuracy of coordinate transformation, oversamplingis used here to increase the density of radius and angle distribution. Based on the intensive data,two-dimensional cubic spline interpolation is used to evaluate the value at each point given inthe Cartesian coordinate. The output domain of the polar to Cartesian transformation andinterpolation is a square whose side length equals to the diameter of the biggest circle that canbe drawn in the polar coordinates; outside this circle, the Fourier transform values are set tozero, consistent with the analytical condition.Figure 17 gives the results and comparison of reconstruction of the smiley face obtained bySAXS-EM fusion and EM alone. In Fourier space, the EM reconstruction adds two more peaksin addition to the central peak, resulting in large errors when filling the missing wedge. Themean error in magnitude compared of the SAXS-EM result compared with the analytical smileyface Gaussian distribution is 0.82591 while EM alone without the SAXS fusion gives an errorof 2.0562 when compared with the analytical smiley face result. This shows a success of thesynergism between cryo-EM and SAXS in 2D case when reconstructing to real space.Figure 18 depicts the results and comparison of reconstruction with a 30 ◦ projection anglegap when the start angle is 5 ◦ . The result shows a higher error in SAXS-EM fusion reconstructionin the case when information in [5 ◦ , ◦ ] is missing as opposed to that when [60 ◦ , ◦ ] is missing;nevertheless, the SAXS-EM fused result outperforms the EM-only reconstruction. Figure 19shows the results and comparison of a reconstruction with a 30 ◦ projection angle gap when thestart angle is 100 ◦ . In such a condition, the SAXS-EM fusion does not show any distinctiveadvantage and both EM alone and SAXS-EM reconstruct a relatively good smiley face. Inconclusion, the reconstruction effect is related to the information contained in the projectiongap. Only when a certain amount of key information is contained in the gap will the advantageof SAXS-EM fusion be more pronounced. 26 igure 17. Comparison of smiley face reconstruction obtained by SAXS-EM fusion and EM alone for[60 ◦ , ◦ ] missing wedge Figure 18.
Comparison of smiley face reconstruction obtained by SAXS-EM fusion and EM alone[5 ◦ , ◦ ] missing wedge igure 19. Comparison of smiley face reconstruction obtained by SAXS-EM fusion and EM aloneunder [100 ◦ , ◦ ] missing wedge Two Projection Angle Gaps
Up to this point, we have investigated the case of one gap in the projection angle. In thissubsection, two projection angle gaps of 15 ◦ and 25 ◦ each are considered. Figure 20 showsthe result when the gaps are [5 ◦ , ◦ ] and [60 ◦ , ◦ ]. The EM experiment helps to reconstruct asmoother smiley face while SAXS-EM fusion provides additional artefacts that result in a highermean error. But judging from the color, the SAXS-EM fusion result is closer to the ground truthin the outline and the eyes of the face. Since 15 ◦ is so small that each gap does not containmuch information, we increased the missing wedge and tested [5 ◦ , ◦ ] and [60 ◦ , ◦ ]. Figure 21shows the reconstruction result of two wedges when each angle gap is 25 ◦ . By using SAXS-EMfusion, most of the smiley face is reconstructed except for some diagonal streaks in the centre,which are also present in the EM results. Nevertheless, SAXS-EM fusion has a considerablylower error. This illustrates again that the difference between a SAXS-EM fusion reconstructionand an EM-alone reconstruction is greater if the missing wedges contain important informationabout the object. Figure 20.
Comparison of smiley face reconstruction obtained by SAXS-EM fusion and EM aloneunder two gaps of 15 ◦ each. igure 21. Comparison of smiley face reconstruction obtained by SAXS-EM fusion and EM aloneunder two gaps of 25 ◦ each. Larger Gaps and Fewer Sampling points
In the cases of a larger projection angle gap such as [60 ◦ , ◦ ], Fig. 22 gives the results whereboth SAXS-EM and EM-alone reconstructions fail to reconstruct a smiley face. In Fourier space,we find that the largest error occurs when the points in the gap are far from the centre. However,the reconstructed smiley face using SAXS-EM information fusion still has some advantages dueto the display of the face outline and the lower mean error compared with that given by usingEM data alone. In other cases, when the projection angle gap gets smaller ( i. e. ◦ ), the twomethods can both reconstruct the 2D smiley face successfully with low error.Another remark is that the reconstruction effect is also related to grid size in real space.When we sample 33 different radii and 33 discrete points evenly spaced along each circle, SAXS-EM fusion fails to show its superiority in the smiley face reconstruction. This is partly because ofan ill-conditioning in the matrix, M − = ( F F H ), leads to numerical errors in the computation of M in (29) at certain angle gaps and starting angles. Moreover, since the root selection procedureis not entirely foolproof, it is possible that for some radii a “wrong” root is selected in a coarsegrid —where the sampling is not fine enough to ensure that both roots have converged—allowingthe possibility of the SAXS reconstruction performing worse than the EM reconstruction.29 igure 22. Comparison of smiley face reconstruction obtained by SAXS-EM fusion and EM aloneunder 50 ◦ missing wedge Figure 23.
Comparison of smiley face reconstruction obtained by SAXS-EM fusion and EM aloneunder 50 ◦ missing wedge .3 3D smiley face A three-dimensional smiley face was constructed using (47) and substituting N = 3 in thedefinition of the Gaussian in (45). Here, the µ of each Gaussian distribution is similar to thatin the two-dimensional case in Sec. 5.2. The only difference between the two-dimensional andthree-dimensional mean vector is in the component of µ perpendicular to the plane containingthe two-dimensional smiley face explored in Sec. 5.2. In this example, we construct a three-dimensional smiley face such that µ = ( µ x , µ y , µ z ) and ( µ x , µ y ) is the mean vector correspondingto a given Gaussian on the two-dimensional smiley face in Sec. 5.2. Then, µ z is defined as 10values divided evenly from 0 to 1.4. In effect, the three-dimensional smiley face is constructedby extruding the two-dimensional smiley face and discretizing it into spherical Gaussians; theresult would be a cylindrical object as shown in Fig. 24. The Fourier transform of the three-dimensional smiley face is then calculated by (46) with N = 3 and summed using (48). Figure 24.
3D smiley face
The Fourier transform of the smiley face in spherical coordinate for several values of radiuscan be seen in Fig. 25. 31 igure 25.
Fourier transform of the smiley face ( r = 1 , , , Since it is possible to deduce the Fourier transform of the 3D smiley face analytically, we cangenerate multiple great circles and obtain the value of the Fourier transform on sampled pointson each great circle using the analytical function. This sampling was done at discrete pointsand great circles to simulate the incompleteness of the data generated by the EM experiment.Figure 26 shows all the sampled points from all great circles considered in Fourier space. Atthe moment, the number of sampled points for each sphere with different radii in Fourier spaceis the same but this need not always be the case, and the case where the number of sampledpoints is a non-constant function of sphere radius will be discussed later.
Figure 26.
Sampled points in Fourier space
Now, we proceed to discuss the results obtained in Fourier space by SAXS-EM fusion and anEM-alone experiment. With that, a real smiley face can be reconstructed by interpolating fromdiscrete data in spherical coordinate to Cartesian coordinate that is prepared for inverse FFT.Based on the data in spherical coordinates, a three-dimensional cubic spline interpolation isused to evaluate the value at each point given in Cartesian coordinates. Like the 2D case (5.2),the whole region of Cartesian coordinate is a cube whose side length equals the diameter of thebiggest circle given in spherical coordinates whereas outside the spherical range the value of the32ourier transform is set to zero, consistent with the analytical condition. After that, we takethe inverse Fourier transform to real space. The reconstruction obtained by the two methodsis showed in Fig. 27 and Fig. 28. In Fig. 27 and Fig. 28, the first row shows three slices of thereconstruction to make a better comparison and the second row is the 3D model in a differentview. To show a better comparison, we define “bad points” as the points at which the errorbetween the original and the reconstructed function is greater than 20% of the original value.
Figure 27.
Result of SAXS-EM reconstruction
Figure 28.
Result of EM reconstruction
We can see that the two reconstructions are all good since there are many sampled points inFourier space and the simulation covers almost all the information. However, we may lose someinformation in a real experiment.
To simulate what happens in the real experiment, we also set some cones in the spherein Fourier space and exclude all the great circles whose directions are outside these cones. Asshown in Fig. 29, the red points represent the direction perpendicular to the plane containing thegreat circles and the blue cones represent the collection of those directions whose correspondinggreat circles are sampled in the EM data. Those red points which are not in the blue cones areexcluded. The center axes of these blue cones are generated by several different sets of θ and ϕ ,where ϕ ∈ (cid:2) π π (cid:3) θ ∈ (cid:2) π π π π (cid:3) . The angle around each center axes is 9 ◦ . 33 igure 29. Direction of great circles removed
Figure 30.
Sample points after exclusion
The reconstruction result by two methods are showed in Figure 32 and Figure 31.
Figure 31.
SAXS-EM reconstruction by randomgreat circle and constant sampling points
Figure 32.
EM reconstruction by random greatcircle and constant sampling points
The reconstruction by SAXS-EM fusion is more like a smiley face than that obtained byEM alone especially in z = 0 plane. Though we may not see an obvious difference between tworesults, the number of bad points can show the advantage of SAXS-EM fusion. Uniformly Distributed Great Circles with Constant Sampling Number
In the simulation above, since the normal direction of great circles are generated randomly,the normal vectors will tend aggregate in some areas and be sparsely distributed in other regions.Hence, we generate the great circles evenly on the sphere, as shown in Fig. 33, and remove theinformation obtained by great circles whose normal vectors, which are the red points, are in theblue cones. 34 igure 33.
Normal direction of great circles re-moved
Figure 34.
Sample points after exclusion
With the same process mentioned above, we could get the reconstruction of smiley face inreal space. The result obtained by SAXS-EM and EM alone can be seen in Fig. 35 and Fig. 36.
Figure 35.
SAXS-EM reconstruction by uniformgreat circles and constantly sampling
Figure 36.
EM reconstruction by uniform greatcircles and constantly sampling
We can see that the outline of the smiley face is missed in plane z = 0 and z = 1 . Uniformly Distributed Great Circles with Radius Dependent Sampling Number
In the process of reconstruction above, the sampled points for each great circle in Fourierspace is constant for each sphere with different radii. Since the perimeter of a great circle is linearwith respect to the radius, if we take sample points like that, we may miss much information insphere with a larger radius and get too much information in sphere with smaller radius, whichis not true in the real experiment. 35ence, we modify the number of sample points in each great circle to be linear to the radiusof the sphere. We take the same great circles and missing wedges as showed in Fig. 33. In thiscase, the number of the sampled points in each great circles is linear to the radius, num = (cid:100) kr (cid:101) .We set k = 1 . Figure 37.
Sampled points in sphere with different radius
With same process, we can get the reconstruction of two methods and the results obtainedby SAXS-EM and EM alone are in Fig. 38 and Fig. 39.We can see that the reconstruction obtained by SAXS-EM looks quite similar to the originalone while we can hardly say that the reconstruction obtained by EM alone looks like a smileyface. Also, the number of bad points of the reconstruction by using EM alone is much greaterthan that by using SAXS-EM fusion.At a higher value of k , when k = 3, there is a greater improvement in the results. In thiscase, for the SAXS-EM reconstruction, one can clearly see the smiley face. Using EM alone, wecan also get a good reconstruction since we obtain enough information with a higher value of k .The results are shown in Fig. 40 and Fig. 41. 36 igure 38. SAXS-EM reconstruction (k=1.5)
Figure 39.
EM reconstruction (k=1.5)
Figure 40.
SAXS-EM reconstruction (k=3)
Figure 41.
EM reconstruction (k=3)
A lower value of k , where k = 1, also been simulated. The result can be seen in Fig. 42 andFig. 43. In this case, the EM reconstruction looks messy. Moreover, the SAXS-EM reconstruc-tion also does not look like a smiley face. Nevertheless, we can see the outline of the smiley facein SAXS-EM reconstruction and the number of bad points is also much less in the SAXS-EMreconstruction. 37 igure 42. SAXS-EM reconstruction (k=1)
Figure 43.
EM reconstruction (k=1)
Before we apply the technique to a real biological macromolecule, we create a minion as the sumof a set of 3D Gaussian distributions by separating different means of the Gaussian functionin space similar to the 3D smiley face in Sec. 5.3. The center of each Gaussian function is theposition of atoms and we seek to reconstruct the electron density in these atoms. As Fig. 44shows, the colorbar from yellow to red indicates the magnitude of Gaussian function at certainpoints which represents the density in real space. We also use the magnitude to set the size ofeach points in this space for better visual effect. We can clearly see the different body parts ofthis character. Actually, the whole space is full of points indicating the density but when thedensity gets very low, the points there are too small to be visible. Hence, most atoms center atthe outline of the minion.
Figure 44.
Minion created in real space
Figure 45 shows what the minion is like in Fourier space. We pick spheres with radius 0.5,1 and 2 to illustrate its Fourier transform magnitude. Similar to what we have experimentedin 3D smiley face first, Fig. 46 shows the reconstruction results of the minion without cones toexclude sampling points in Fourier space. 38 igure 45.
Fourier transform of minion in spheres of radius 0.5, 1 and 2
Figure 46.
Reconstruction of Minion without exclusion of sampling points
And then cones are set in the sphere to exclude all the great circles whose normal directionare inside these cones in Fourier space. The center axis direction vector is defined in Sec. 5.3.2and the great circles are generated uniformly in space. We still keep the radius dependentsampling number on each great circle here. The reconstruction result is shown in Fig. 47.39 igure 47.
Comparison of Minion reconstruction by SAXS-EM fusion and EM-alone
SAXS-EM fusion successfully restores the main body of the minion and especially the twohands which are the two clouds of black points on both sides of the body. Although we cansee fuzzy blue eyes and feet by EM reconstruction alone, the main yellow body is separatedin the whole space. It means the places where density should be extremely small are givencomparatively higher density. 40
Discussion, Conclusion, and Outlook
In this paper, we presented a novel method of fusing information obtained from Small AngleX-ray Scattering (SAXS) and cryo-EM experiments. Despite the fact that SAXS data is ofa lower resolution than EM data, it can be shown that information fusion greatly improvesan EM reconstruction especially when the EM data is sparse or misses important regions ofhigh information density. Numerical experiments with shapes in two and three-dimensionsdemonstrated the advantages of implementing a SAXS-EM fusion as opposed to an EM-onlyreconstruction. On the other hand, there also exists a theoretical lower bound for the minimumintensity that a given sample of EM data can accommodate and the SAXS information fusionwould only add new information if the intensity is above this value for a given radius in Fourierspace. Finally, we make the following set of observations that we hope would spur furtherresearch in the capabilities of SAXS-EM fusion and particularly, with applications to problemsinvolving the elucidation of macromolecule structure from sparse EM data:1. Application of the SAXS-EM fusion to complex biological macromolecules. In this paper,we tested the SAXS-EM fusion with a model of a “minion”, with encouraging results. It revealsthat the SAXS-EM fusion technique would be useful to obtain important information thatEM experiments may otherwise miss by virtue of having the slice planes not pass throughimportant structural details of biological macromolecules. Since the SAXS information is aspherical average, outstanding structural information would be recorded and can be used toenrich the EM data.2. The matrices involved in the solution of the quadratic equation tend to become ill-conditioned when the angles sampled are too close to each other. To allow the possibilityof finely sampled EM data, the solution method to obtain the roots for the governing quadraticequation may need improvement so as to avoid the need to deal with ill-conditioned matrices.3. Allowing the number of sample points on a given great circle to vary linearly with theradius of the sphere leads to large improvements in the SAXS-EM reconstruction, and thistechnique can be utilized in further implementations of SAXS-EM fusion.4. The root selection procedure proposed in the paper was computationally efficient and alsoaccurate. Nevertheless, it may also be possible to formulate more robust selection rules throughintegral quadratures (specifically, the stable and accurate integration over scattered data on asphere) and allow for a more accurate identification of spurious roots. In fact, it may be possibleto add further terms to “correct” the roots obtained from the quadratic equation to obtain moreaccurate results.
Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this article.
Acknowledgement
Research reported in this publication was supported by the National Institute of General MedicalSciences of the National Institutes of Health under award number R01GM113240.41 ppendix A - The N -dimensional Gaussian The definition of the N -dimensional Gaussian is reviewed in the following. Shapes in two-dimensions and three-dimensions can be constructed as a superposition of two-dimensional andthree-dimensional Gaussians respectively. The N -dimensional Gaussian function defined at x ,and with a mean of µ and covariance matrix, Σ, is, f ( x ; µ , Σ) = 1(2 π ) N | det Σ | exp (cid:26) −
12 (x − µ ) T Σ − (x − µ ) (cid:27) (45)The Fourier transform of the Gaussian function is also a Gaussian function in Fourier space,and can be written as a function of ω as,ˆ f ( ω ; µ , Σ) = exp (cid:18) − ω T Σ ω − i µ (cid:19) . (46)Hence, it is possible to approximate a general shape φ ( x ) as a linear combination of n Gaussians through, φ ( x ) ≈ n (cid:88) k =1 A k f ( x ; µ k , Σ k ) (47)where A k is the area of the k th Gaussian function, and used to control the weight of the Gaussianand thereby alter its contribution to the total sum. Since the Fourier transform is a linearoperator, the Fourier transform of φ ( x ) would be,ˆ φ ( ω ) ≈ n (cid:88) k =1 A k ˆ f ( ω ; µ k , Σ k ) . (48) References
Afsari, B., Kim, J. S., and Chirikjian, G. S. (2015). Cross-validation of data in SAXS and cryo-EM. In IEEE
International Conference on Bioinformatics and Biomedicine (BIBM), pages1224–1230.Bhamre, Tejal, Zhang, Teng, and Singer, Amit (2016). Denoising and covariance estimation ofsingle particle cryo-em images.
Journal of structural biology , 195(1):72–81.Biedenharn, L. C., and Louck, J. D. (1981).
Angular Momentum in Quantum Physics: Theoryand Application . Addison-Wesley, Boston, MA, USA.Blanchet, Clement E, and Svergun, Dmitri I (2013). Small-angle x-ray scattering on biologicalmacromolecules and nanocomposites in solution.
Annual review of physical chemistry , 64:37–54.Chirikjian, Gregory S, and Kyatkin, Alexander B (2016).
Harmonic Analysis for Engineers andApplied Scientists: Updated and Expanded Edition . Courier Dover Publications.Crowther, Richard Anthony, DeRosier, DJ, and Klug, Aaron (1970). The reconstruc-tion of a three-dimensional structure from projections and its application to electron mi-croscopy.
Proceedings of the Royal Society of London. A. Mathematical and Physical Sciences ,317(1530):319–340.Dong, H., Kim, J. S., and Chirikjian, G. S. (2015a). Computational analysis of SAXS dataacquisition.
J. Comput. Biol. , 22(9):787–805.42ong, Hui, Kim, Jin Seob, and Chirikjian, Gregory S (2015b). Computational analysis of saxsdata acquisition.
Journal of Computational Biology , 22(9):787–805.Feigin, LA, Svergun, Dimitrij I et al. (1987).
Structure analysis by small-angle X-ray and neutronscattering , Volume 1. Springer.Frank, Joachim (2006).
Three-dimensional electron microscopy of macromolecular assemblies:visualization of biological molecules in their native state . Oxford University Press.Freeden, Willi, Gervens, Theodor, and Schreiner, Michael (1998).
Constructive Approximationon the Sphere . Oxford University Press, Oxford, UK.Kim, Jin Seob, Afsari, Bijan, and Chirikjian, Gregory S (2017). Cross-validation of data com-patibility between small angle x-ray scattering and cryo-electron microscopy.
Journal of Com-putational Biology , 24(1):13–30.Kunis, Stefan, and Potts, Daniel (2003). Fast spherical Fourier algorithms.
J. Comput. Appl.Math. , 161(1):75–98.Park, Wooram, and Chirikjian, Gregory S (2014). An assembly automation approach to align-ment of noncircular projections in electron microscopy.
IEEE Transactions on AutomationScience and Engineering , 11(3):668–679.Park, Wooram, Midgett, Charles R, Madden, Dean R, and Chirikjian, Gregory S (2011). Astochastic kinematic model of class averaging in single-particle electron microscopy.
TheInternational journal of robotics research , 30(6):730–754.Penczek, Pawel, Radermacher, Michael, and Frank, Joachim (1992). Three-dimensional recon-struction of single particles embedded in ice.
Ultramicroscopy , 40(1):33–53.Penczek, Pawel A (2002). Three-dimensional spectral signal-to-noise ratio for a class of recon-struction algorithms.
Journal of structural biology , 138(1-2):34–46.Penczek, Pawel A, Zhu, Jun, and Frank, Joachim (1996). A common-lines based method fordetermining orientations for n¿ 3 particle projections simultaneously.
Ultramicroscopy , 63(3-4):205–218.Potts, Daniel, Steidl, Gabriele, and Tasche, Manfred (2001). Fast Fourier transforms for noneq-uispaced data: a tutorial. In Benedetto, John J., and Ferreira, Paulo J. S. G., editors,
ModernSampling Theory , Applied and Numerical Harmonic Analysis, pages 247–270. Birkh¨auserBoston, MA, USA.Schatz, Michael, and Van Heel, Marin (1990). Invariant classification of molecular views inelectron micrographs.
Ultramicroscopy , 32(3):255–264.Scheres, Sjors HW (2012). Relion: implementation of a bayesian approach to cryo-em structuredetermination.
Journal of structural biology , 180(3):519–530.Scheres, Sjors HW, Valle, Mikel, Nu˜nez, Rafael, Sorzano, Carlos OS, Marabini, Roberto, Her-man, Gabor T, and Carazo, Jose-Maria (2005). Maximum-likelihood multi-reference refine-ment for electron microscopy images.
Journal of molecular biology , 348(1):139–149.Sherman, Jack, and Morrison, Winifred J. (1950). Adjustment of an inverse matrix correspond-ing to a change in one element of a given matrix.
The Annals of Mathematical Statistics ,21(1):124–127.Shkolnisky, Yoel, and Singer, Amit (2012). Viewing direction estimation in cryo-em usingsynchronization.
SIAM journal on imaging sciences , 5(3):1088–1110.Sigworth, Fred J (1998). A maximum-likelihood approach to single-particle image refinement.
Journal of structural biology , 122(3):328–339.Singer, Amit, Zhao, Zhizhen, Shkolnisky, Yoel, and Hadani, Ronny (2011). Viewing angleclassification of cryo-electron microscopy images using eigenvectors.
SIAM Journal on ImagingSciences , 4(2):723–759.Stuhrmann, HEINRICH B (1970). Interpretation of small-angle scattering functions of dilute43olutions and gases. a representation of the structures related to a one-particle scatteringfunction.
Acta Crystallographica Section A: Crystal Physics, Diffraction, Theoretical andGeneral Crystallography , 26(3):297–306.Svergun, Dmitri I, and Koch, Michel HJ (2003). Small-angle scattering studies of biologicalmacromolecules in solution.
Reports on Progress in Physics , 66(10):1735.Van Heel, Marin, and Frank, Joachim (1981). Use of multivariates statistics in analysing theimages of biological macromolecules.
Ultramicroscopy , 6(1):187–194.Wang, Liguo, and Sigworth, Fred J (2006). Cryo-em and single particles.