24
Pඋඈർ. Iඇඍ. Cඈඇ. ඈൿ Mൺඍ. – 2018 Rio de Janeiro, Vol. 4 (4013–4032) MATHEMATICS FOR CRYO-ELECTRON MICROSCOPY Aආංඍ Sංඇൾඋ Abstract Single-particle cryo-electron microscopy (cryo-EM) has recently joined X-ray crys- tallography and NMR spectroscopy as a high-resolution structural method for biolog- ical macromolecules. Cryo-EM was selected by Nature Methods as Method of the Year 2015, large scale investments in cryo-EM facilities are being made all over the world, and the Nobel Prize in Chemistry 2017 was awarded to Jacques Dubochet, Joachim Frank and Richard Henderson “for developing cryo-electron microscopy for the high-resolution structure determination of biomolecules in solution”. This paper focuses on the mathematical principles underlying existing algorithms for structure determination using single particle cryo-EM. 1 Introduction The field of structural biology is currently undergoing a transformative change Kühlbrandt [2014] and Smith and Rubinstein [2014]. Structures of many biomolecular targets previ- ously insurmountable by X-ray crystallography are now being obtained using single par- ticle cryo-EM to resolutions beyond 4Å on a regular basis Liao, Cao, Julius, and Cheng [2013], Amunts et al. [2014], and Bartesaghi, Merk, Banerjee, Matthies, X. Wu, Milne, and Subramaniam [2015]. This leap in cryo-EM technology, as recognized by the 2017 Nobel Prize in Chemistry, is mainly due to hardware advancements including the inven- tion of the direct electron detector and the methodological development of algorithms for data processing. Cryo-EM is a very general and powerful technique because it does not require the formation of crystalline arrays of macromolecules. In addition, unlike X-ray crystallography and nuclear magnetic resonance (NMR) that measure ensembles of parti- cles, single particle cryo-EM produces images of individual particles. Cryo-EM therefore Partially supported by Award Number R01GM090200 from the NIGMS, FA9550-17-1-0291 from AFOSR, Simons Foundation Math+X Investigator Award, and the Moore Foundation Data-Driven Discovery Investigator Award. MSC2010: primary 92C55; secondary 68U10, 44A12, 62H12, 33C55, 90C22. 4013

Mathematics for cryo-electron microscopyP.I.C. M.–2018 RiodeJaneiro,Vol.4(4013–4032) MATHEMATICSFORCRYO-ELECTRONMICROSCOPY AS Abstract Single-particlecryo-electronmicroscopy(cryo-EM)hasrecentlyjoinedX-raycrys

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Mathematics for cryo-electron microscopyP.I.C. M.–2018 RiodeJaneiro,Vol.4(4013–4032) MATHEMATICSFORCRYO-ELECTRONMICROSCOPY AS Abstract Single-particlecryo-electronmicroscopy(cryo-EM)hasrecentlyjoinedX-raycrys

P . I . C . M . – 2018Rio de Janeiro, Vol. 4 (4013–4032)

MATHEMATICS FOR CRYO-ELECTRON MICROSCOPY

A S

Abstract

Single-particle cryo-electronmicroscopy (cryo-EM) has recently joinedX-ray crys-tallography and NMR spectroscopy as a high-resolution structural method for biolog-ical macromolecules. Cryo-EM was selected by Nature Methods as Method of theYear 2015, large scale investments in cryo-EM facilities are being made all over theworld, and the Nobel Prize in Chemistry 2017 was awarded to Jacques Dubochet,Joachim Frank and Richard Henderson “for developing cryo-electron microscopy forthe high-resolution structure determination of biomolecules in solution”. This paperfocuses on the mathematical principles underlying existing algorithms for structuredetermination using single particle cryo-EM.

1 Introduction

The field of structural biology is currently undergoing a transformative change Kühlbrandt[2014] and Smith and Rubinstein [2014]. Structures of many biomolecular targets previ-ously insurmountable by X-ray crystallography are now being obtained using single par-ticle cryo-EM to resolutions beyond 4Å on a regular basis Liao, Cao, Julius, and Cheng[2013], Amunts et al. [2014], and Bartesaghi, Merk, Banerjee, Matthies, X. Wu, Milne,and Subramaniam [2015]. This leap in cryo-EM technology, as recognized by the 2017Nobel Prize in Chemistry, is mainly due to hardware advancements including the inven-tion of the direct electron detector and the methodological development of algorithms fordata processing. Cryo-EM is a very general and powerful technique because it does notrequire the formation of crystalline arrays of macromolecules. In addition, unlike X-raycrystallography and nuclear magnetic resonance (NMR) that measure ensembles of parti-cles, single particle cryo-EM produces images of individual particles. Cryo-EM therefore

Partially supported by Award Number R01GM090200 from the NIGMS, FA9550-17-1-0291 from AFOSR,Simons FoundationMath+X Investigator Award, and theMoore Foundation Data-Driven Discovery InvestigatorAward.MSC2010: primary 92C55; secondary 68U10, 44A12, 62H12, 33C55, 90C22.

4013

Page 2: Mathematics for cryo-electron microscopyP.I.C. M.–2018 RiodeJaneiro,Vol.4(4013–4032) MATHEMATICSFORCRYO-ELECTRONMICROSCOPY AS Abstract Single-particlecryo-electronmicroscopy(cryo-EM)hasrecentlyjoinedX-raycrys

4014 AMIT SINGER

has the potential to analyze conformational changes and energy landscapes associated withstructures of complexes in different functional states.

As there exist many excellent review articles and textbooks on single particle cryo-EMFrank [2006], van Heel, Gowen, et al. [2000], Nogales [2016], Glaeser [2016], Subrama-niam, Kühlbrandt, and Henderson [2016], C. O. S. Sorzano and Carazo [2017], and F. J.Sigworth [2016], we choose to solely focus here on the mathematical foundations of thistechnique. Topics of great importance to practitioners, such as the physics and optics ofthe electron microscope, sample preparation, and data acquisition are not treated here.

In cryo-EM, biological macromolecules are imaged in an electron microscope. Themolecules are rapidly frozen in a thin layer of vitreous ice, trapping them in a nearly-physiological state. The molecules are randomly oriented and positioned within the icelayer. The electronmicroscope produces a two-dimensional tomographic projection image(called a micrograph) of the molecules embedded in the ice layer. More specifically, whatis being measured by the detector is the integral in the direction of the beaming electronsof the electrostatic potential of the individual molecules.

Cryo-EM images, however, have very low contrast, due to the absence of heavy-metalstains or other contrast enhancements, and have very high noise due to the small electrondoses that can be applied to the specimen without causing too much radiation damage.The first step in the computational pipeline is to select “particles” from the micrographs,that is, to crop from each micrograph several small size images each containing a singleprojection image, ideally centered. The molecule orientations associated with the particleimages are unknown. In addition, particle images are not perfectly centered, but this wouldbe of lesser concern to us for now.

The imaging modality is akin to the parallel beam model in Computerized Tomogra-phy (CT) of medical images, where a three-dimensional density map of an organ needsto be estimated from tomographic images. There are two aspects that make single parti-cle reconstruction (SPR) from cryo-EM more challenging compared to classical CT. First,in medical imaging the patient avoids movement, hence viewing directions of individualprojections are known to the scanning device, whereas in cryo-EM the viewing directionsare unknown. Electron Tomography (ET) employs tilting and is often used for cellularimaging, providing reconstructions of lower resolution due to increased radiation damagefor the entire tilt series. While it is possible to tilt the specimen and register relative view-ing directions among images within a tilt series, radiation damage destroys high frequencycontent and it is much more difficult to obtain high resolution reconstructions using ET.In SPR, each particle image corresponds to a different molecule, ideally of the same struc-ture, but at different and unknown orientation. Second, the signal-to-noise ratio (SNR)typical of cryo-EM images is smaller than one (more noise than signal). Thus, to obtain areliable three-dimensional density map of a molecule, the information from many imagesof identical molecules must be combined.

Page 3: Mathematics for cryo-electron microscopyP.I.C. M.–2018 RiodeJaneiro,Vol.4(4013–4032) MATHEMATICSFORCRYO-ELECTRONMICROSCOPY AS Abstract Single-particlecryo-electronmicroscopy(cryo-EM)hasrecentlyjoinedX-raycrys

MATHEMATICS FOR CRYO-ELECTRON MICROSCOPY 4015Image Formation Model and Inverse Problem

Projection Ii

Molecule φ

Electronsource

Ri =

− R1

i

T−

− R2

i

T−

− R3

i

T−

∈ SO(3)

jection images ) =−∞

xR yR zR dz + “noise”.

7→ is the scattering potential of the molecule.

Cryo-EM core problem: Estimate and , . . . , given , . . . ,

Amit Singer (Princeton University) June 2016 7 / 32

Figure 1: Schematic drawing of the imaging process: every projection image corre-sponds to some unknown rotation of the unknown molecule. The effect of the pointspread function is not shown here.

2 Image formation model and inverse problems

The mathematical image formation model is as follows (Figure 1). Let � : R3 ! R be theelectrostatic potential of the molecule. Suppose that following the step of particle picking,the dataset contains n particle images, denoted I1; : : : ; In. The image Ii is formed byfirst rotating � by a rotation Ri in SO(3), then projecting the rotated molecule in the z-direction, convolving it with a point spread function Hi , sampling on a Cartesian grid ofpixels of size L � L, and contaminating with noise:

(1) Ii (x; y) = Hi ?

Z 1

�1

�(RTi r) dz + “noise”; r = (x; y; z)T :

The rotations R1; : : : ; Rn 2 SO(3) are unknown. The Fourier transform of the pointspread function is called the contrast transfer function (CTF), and it is typically known,or can be estimated from the data, at least approximately, although it may vary from oneimage to another. Equivalently, we may rewrite the forward model (Equation (1)) as

(2) Ii = Hi ? PR ı � + “noise”;

whereRı�(r) = �(RT r) andP is the tomographic projection operator in the z-direction,Pf (x; y) =

RR f (x; y; z) dz. We write “noise” in Equations (1) and (2) as a full discus-

sion of the noise statistics and its possible dependence on the structure itself (i.e., structuralnoise) are beyond the scope of this paper.

The basic cryo-EM inverse problem, called the cryo-EM reconstruction problem, isto estimate � given I1; : : : ; In and H1; : : : ; Hn, without knowing R1; : : : ; Rn. Noticethat cryo-EM reconstruction is a non-linear inverse problem, because the rotations areunknown; if the rotations were known, then it would become a linear inverse problem, for

Page 4: Mathematics for cryo-electron microscopyP.I.C. M.–2018 RiodeJaneiro,Vol.4(4013–4032) MATHEMATICSFORCRYO-ELECTRONMICROSCOPY AS Abstract Single-particlecryo-electronmicroscopy(cryo-EM)hasrecentlyjoinedX-raycrys

4016 AMIT SINGER

which there exist many classical solvers. Because images are finitely sampled, � cannotbe estimated beyond the resolution of the input images.

An even more challenging inverse problem is the so-called heterogeneity cryo-EMproblem. Here each image may originate from a different molecular structure correspond-ing to possible structural variations. That is, to each image Ii there may correspond adifferent molecular structure �i . The goal is then to estimate �1; : : : ; �n from I1; : : : ; In,again, without knowing the rotations R1; : : : ; Rn. Clearly, as stated, this is an ill-posed in-verse problem, sincewe are required to estimatemore output parameters (three-dimensionalstructures) than input data (two-dimensional images). In order to have any hope of makingprogress with this problem, we would need to make some restrictive assumptions aboutthe potential functions �1; : : : ; �n. For example, the assumption of discrete variability im-plies that there is only a finite number of distinct conformations from which the potentialfunctions are sampled from. Then, the goal is to estimate the number of conformations,the conformations themselves, and their distribution. Another popular assumption is thatof continuous variability with a small number of flexible motions, so that �1; : : : ; �n aresampled from a low-dimensional manifold of conformations. Either way, the problem ispotentially well-posed only by assuming an underlying low-dimensional structure on thedistribution of possible conformations.

In order to make this exposition less technical, we are going to make an unrealisticassumption of ideally localized point spread functions, or equivalently, constant contrasttransfer functions, so that H1; : : : ; Hn are eliminated from all further consideration here.All methods and analyses considered below can be generalized to include the effect ofnon-ideal CTFs, unless specifically mentioned otherwise.

3 Solving the basic cryo-EM inverse problem for clean images

Even with clean projection images, the reconstruction problem is not completely obvi-ous (Figure 2). A key element to determining the rotations of the images is the Fourierprojection slice theorem Natterer [1986] that states that the two-dimensional Fourier trans-form of a tomographic projection image is the restriction of the three-dimensional Fouriertransform of � to a planar central slice perpendicular to the viewing direction:

(3) F PR ı � = SR ı F �;

where F denotes the Fourier transform (over R2 on the left hand side of Equation (3), andover R3 on the right hand side of Equation (3)), and S is the restriction operator to thexy-plane (z = 0).

The Fourier slice theorem implies the common line property: the intersection of two(non-identical) central slices is a line. Therefore, for any pair of projection images, there

Page 5: Mathematics for cryo-electron microscopyP.I.C. M.–2018 RiodeJaneiro,Vol.4(4013–4032) MATHEMATICSFORCRYO-ELECTRONMICROSCOPY AS Abstract Single-particlecryo-electronmicroscopy(cryo-EM)hasrecentlyjoinedX-raycrys

MATHEMATICS FOR CRYO-ELECTRON MICROSCOPY 4017

Figure 2: Illustration of the basic cryo-EM inverse problem for clean images. Howto estimate the three-dimensional structure (top) from clean projection images takenat unknown viewing angles (bottom)?

Page 6: Mathematics for cryo-electron microscopyP.I.C. M.–2018 RiodeJaneiro,Vol.4(4013–4032) MATHEMATICSFORCRYO-ELECTRONMICROSCOPY AS Abstract Single-particlecryo-electronmicroscopy(cryo-EM)hasrecentlyjoinedX-raycrys

4018 AMIT SINGER

Figure 3: Left: Illustration of the Fourier slice theorem and the common line prop-erty. Right: Angular reconstitution

is a pair of central lines (one in each image) on which their Fourier transforms agree (Fig-ure 3, left panel). For non-symmetric generic molecular structures it is possible to uniquelyidentify the common-line, for example, by cross-correlating all possible central lines inone image with all possible central lines in the other image, and choosing the pair of lineswith maximum cross-correlation. The common line pins down two out of the three Eu-ler angles associated with the relative rotation R�1

i Rj between images Ii and Ij . Theangle between the two central planes is not determined by the common line. In order todetermine it, a third image is added, and the three common line pairs between the threeimages uniquely determine their relative rotations up to a global reflection (Figure 3, rightpanel). This procedure is known as “angular reconstitution”, and it was proposed inde-pendently by Vainshtein and A. Goncharov [1986] and van Heel [1987]. Notice that thehandedness of the molecule cannot be determined by single particle cryo-EM, becausethe original three-dimensional object and its reflection give rise to identical sets of pro-jection images with rotations related by the following conjugation, Ri = JRi J

�1, withJ = J �1 = diag(1; 1; �1). For molecules with non-trivial point group symmetry, e.g.,cyclic symmetry, there are multiple common lines between pairs of images, and even self-common lines that enable rotation assignment from fewer images.

As a side comment, notice that for the analog problem in lower dimension of recon-structing a two-dimensional object from its one-dimensional tomographic projections takenat unknown directions, the Fourier slice theorem does not help in determining the viewingdirections, because it only has a trivial geometric implication that the Fourier transform ofthe line projections intersect a point, the zero frequency, corresponding to the total mass ofthe density. Yet, it is possible to uniquely determine the viewing directions by relating themoments of the projections with those of the original object, as originally proposed by A.

Page 7: Mathematics for cryo-electron microscopyP.I.C. M.–2018 RiodeJaneiro,Vol.4(4013–4032) MATHEMATICSFORCRYO-ELECTRONMICROSCOPY AS Abstract Single-particlecryo-electronmicroscopy(cryo-EM)hasrecentlyjoinedX-raycrys

MATHEMATICS FOR CRYO-ELECTRON MICROSCOPY 4019

(a) Clean (b) SNR=20 (c) SNR=2�1 (d) SNR=2�2 (e) SNR=2�3

(f) SNR=2�4 (g) SNR=2�5 (h) SNR=2�6 (i) SNR=2�7 (j) SNR=2�8

Figure 4: Simulated projections of size 129 � 129 pixels at various levels of SNR.

Goncharov [1987] and further improved and analyzed by Basu and Bresler [2000a,b]. Anextension of the moment method to 3-D cryo-EM reconstruction was also proposed A. B.Goncharov [1988] and A. Goncharov and Gelfand [1988]. As the moment based methodis very sensitive to noise and cannot handle varying CTF in a straightforward manner, itmostly remained a theoretical curiosity.

4 Solving the basic cryo-EM inverse problem for noisy images

For noisy images it is more difficult to correctly identify the common lines. Figure 4 showsa simulated clean projection image contaminated by white Gaussian noise at various levelsof SNR, defined as the ratio between the signal variance to noise variance. Section 4specifies the fraction of correctly identified common lines as a function of the SNR forthe simulated images, where a common line is considered to be correctly identified ifboth central lines deviate by no more than 10ı from their true directions. The fractionof correctly identified common lines deteriorates quickly with the SNR. For SNR valuestypical of experimental images, the fraction of correctly identified common lines is around0:1, and can be even lower for smaller molecules of lower SNR. As angular reconstitutionrequires three pairs of common lines to be correctly identified, its probability to succeedis only 10�3. Moreover, the procedure of estimating the rotations of additional imagessequentially using their common lines with the previously rotationally assigned imagesquickly accumulates errors.

Page 8: Mathematics for cryo-electron microscopyP.I.C. M.–2018 RiodeJaneiro,Vol.4(4013–4032) MATHEMATICSFORCRYO-ELECTRONMICROSCOPY AS Abstract Single-particlecryo-electronmicroscopy(cryo-EM)hasrecentlyjoinedX-raycrys

4020 AMIT SINGER

log2(SNR) p

20 0.9970 0.980-1 0.956-2 0.890-3 0.764-4 0.575-5 0.345-6 0.157-7 0.064-8 0.028-9 0.019

Table 1: Fraction p of correctly identified common lines as a function of the SNR.

The failure of angular reconstitution at low SNR, raises the question of how to solvethe cryo-EM reconstruction problem at low SNR. One possibility is to use better commonline approaches that instead of working their way sequentially like angular reconstitutionuse the entire information between all common lines at once, in an attempt to find a set ofrotations for all images simultaneously. Another option is to first denoise the images inorder to boost the SNR and improve the detection rate of common lines. Denoising canbe achieved for example by a procedure called 2-D classification and averaging, in whichimages of presumably similar viewing directions are identified, rotationally aligned, andaveraged, thus diminishing the noise while maintaining the common signal. While thesetechniques certainly help, and in many cases lead to successful ab-initio three-dimensionalmodeling (at least at low resolution), for small molecules with very low SNR they still fail.

The failure of these algorithms is not due to their lack of sophistication, but rather afundamental one: It is impossible to accurately estimate the image rotations at very lowSNR, regardless of the algorithmic procedure being used. To understand this inherent limi-tation, consider an oracle that knows the molecular structure �. Even the oracle would notbe able to accurately estimate image rotations at very low SNR. In an attempt to estimatethe rotations, the oracle would produce template projection images of the form PR ı �,and for each noisy reference experimental image, the oracle would look for its best matchamong the template images, that is, the rotation R that minimizes the distance betweenthe template PR ı � ad the reference image. At very low SNR, the random contributionof the noise dominates the distance, and the oracle would be often fooled to assign wrongrotations with large errors.

Page 9: Mathematics for cryo-electron microscopyP.I.C. M.–2018 RiodeJaneiro,Vol.4(4013–4032) MATHEMATICSFORCRYO-ELECTRONMICROSCOPY AS Abstract Single-particlecryo-electronmicroscopy(cryo-EM)hasrecentlyjoinedX-raycrys

MATHEMATICS FOR CRYO-ELECTRON MICROSCOPY 4021

Since at very low SNR even an oracle cannot assign rotations reliably, we should giveup on any hope for a sophisticated algorithm that would succeed in estimating the rotationsat any SNR. Instead, we shouldmainly focus on algorithms that try to estimate the structure� without estimating rotations. This would be the topic of the next section. Still, becausein practice algorithms for estimating rotations are quite useful for large size molecules, wewould quickly survey those first.

4.1 Common-line approaches. There are several procedures that attempt to simulta-neously estimate all rotations R1; : : : ; Rn from the common lines between all pairs ofimages at once Singer, Coifman, F. J. Sigworth, Chester, and Shkolnisky [2010], Singerand Shkolnisky [2011], Shkolnisky and Singer [2012], andWang, Singer, andWen [2013].Due to space limitations, we only briefly explain the semidefinite programming (SDP) re-laxation approach Singer and Shkolnisky [2011]. Let (xij ; yij ) be a point on the unit circleindicating the location of the common line between images Ii and Ij in the local coordi-nate system of image Ii (see Figure 3, left panel). Also, let cij = (xij ; yij ; 0)T . Then, thecommon-line property implies that Ri cij = Rj cj i . Such a linear equation can be writtenfor every pair of images, resulting an overdetermined system, because the number of equa-tions is O(n2), whereas the number of variables associated with the unknown rotations isonly O(n). The least squares estimator is the solution to minimization problem

(4) minR1;R2;:::;Rn2SO(3)

Xi¤j

kRi cij � Rj cj i k2:

This is a non-convex optimization problem over an exponentially large search space. TheSDP relaxation and its rounding procedure are similar in spirit to the Goemans-WilliamsonSDP approximation algorithm for Max-Cut Goemans and Williamson [1995]. Specifi-cally, it consists of optimizing over a set of positive definite matrices with entries relatedto the rotation ratios RT

i Rj and satisfying the block diagonal constraints RTi Ri = I ,

while relaxing the rank-3 constraint. There is also a spectral relaxation variant, which ismuch more efficient to compute than SDP and its performance can be quantified using rep-resentation theory Hadani and Singer [2011], but requires the distribution of the viewingdirections to be uniform.

A more recent procedure A. S. Bandeira, Y. Chen, and Singer [2015] attempts to solvean optimization problem of the form

(5) minR1;R2;:::;Rn2SO(3)

Xi¤j

fij (RTi Rj )

using an SDP relaxation that generalizes an SDP-based algorithm for unique gamesCharikar,K. Makarychev, and Y. Makarychev [2006] to SO(3) via classical representation theory.

Page 10: Mathematics for cryo-electron microscopyP.I.C. M.–2018 RiodeJaneiro,Vol.4(4013–4032) MATHEMATICSFORCRYO-ELECTRONMICROSCOPY AS Abstract Single-particlecryo-electronmicroscopy(cryo-EM)hasrecentlyjoinedX-raycrys

4022 AMIT SINGER

The functions fij encode the cost for the common line implied by the rotation ratioRTi Rj

for images Ii and Ij . The unique feature of this approach is that the common lines donot need to be identified, but rather all possibilities are taken into account and weightedaccording to the pre-computed functions fij .

4.2 2-D classification and averaging. If images corresponding to similar viewing di-rection can be identified, then they can be rotationally (and translationally) aligned andaveraged to produce “2-D class averages” that enjoy a higher SNR. The 2-D class aver-ages can be used as input to common-line based approaches for rotation assignment, astemplates in semi-automatic procedures for particle picking, and to provide a quick assess-ment of the particles.

There are several computational challenges associated with the 2-D classification prob-lem. First, due to the low SNR, it is difficult to detect neighboring images in terms oftheir viewing directions. It is also not obvious what metric should be used to compare im-ages. Another difficulty is associated with the computational complexity of comparing allpairs of images and finding their optimal in-plane alignment, especially for large datasetsconsisting of hundreds of thousands of particle images.

Principal component analysis (PCA) of the images offers an efficient way to reducethe dimensionality of the images and is often used in 2-D classification procedures VanHeel and Frank [1981]. Since particle images are just as likely to appear in any in-planerotation (e.g., by rotating the detector), it makes sense to perform PCA for all images andtheir uniformly distributed in-plane rotations. The resulting covariance matrix commuteswith the group action of in-plane rotation. Therefore, it is block-diagonal in any steerablebasis of functions in the form of outer products of radial functions and Fourier angularmodes. The resulting procedure, called steerable PCA is therefore more efficiently com-puted compared to standard PCA Zhao, Shkolnisky, and Singer [2016]. In addition, theblock diagonal structure implies a considerable reduction in dimensionality: for imagesof size L � L, the largest block size is O(L � L), whereas the original covariance is ofsize L2 � L2. Using results from the spiked covariance model in high dimensional PCAJohnstone [2001], this implies that the principal components and their eigenvalues are bet-ter estimated using steerable PCA, and modern eigenvalue shrinkage procedures can beapplied with great success Bhamre, Zhang, and Singer [2016].

The steerable PCA framework also paves the way to a natural rotational invariant rep-resentation of the image Zhao and Singer [2014]. Images can therefore be compared usingtheir rotational invariant representation, saving the cost associated with rotational align-ment. In addition, efficient algorithms for approximate nearest neighbors search can beapplied for initial classification of the images. The classification can be further improved

Page 11: Mathematics for cryo-electron microscopyP.I.C. M.–2018 RiodeJaneiro,Vol.4(4013–4032) MATHEMATICSFORCRYO-ELECTRONMICROSCOPY AS Abstract Single-particlecryo-electronmicroscopy(cryo-EM)hasrecentlyjoinedX-raycrys

MATHEMATICS FOR CRYO-ELECTRON MICROSCOPY 4023

by applying vector diffusion maps Singer and H.-T. Wu [2012] and Singer, Zhao, Shkol-nisky, and Hadani [2011], a non-linear dimensionality reduction method that generalizesLaplacian eigenmaps Belkin and Niyogi [2002] and diffusion maps Coifman and Lafon[2006] by also exploiting the optimal in-plane transformation between neighboring im-ages.

5 How to solve the cryo-EM problem at very low SNR?

The most popular approach for cryo-EM reconstruction is iterative refinement. Itera-tive refinement methods date back to (at least) Harauz and Ottensmeyer [1983, 1984]and are the cornerstone of modern software packages for single particle analysis Shaikh,Gao, Baxter, Asturias, Boisset, Leith, and Frank [2008], van Heel, Harauz, Orlova, R.Schmidt, and Schatz [1996], C. Sorzano, Marabini, Velázquez-Muriel, Bilbao-Castro,S. H. Scheres, Carazo, and Pascual-Montano [2004], Tang, Peng, Baldwin, Mann, Jiang,Rees, and Ludtke [2007], Hohn et al. [2007], Grigorieff [2007], S. Scheres [2012], andPunjani, Rubinstein, Fleet, and Brubaker [2017]. Iterative refinement starts with someinitial 3-D structure �0 and at each iteration project the current structure at many differentviewing directions to produce template images, then match the noisy reference imageswith the template images in order to assign rotations to the noisy images, and finally per-form a 3-D tomographic reconstruction using the noisy images and their assigned rota-tions. Instead of hard assignment of rotations, a soft assignment in which each rotation isassigned a distribution rather than just the best match, can be interpreted as an expectation-maximization procedure for maximum likelihood of the structure � while marginalizingover the rotations, which are treated as nuisance parameters. The maximum likelihoodframework was introduced to the cryo-EM field by F. Sigworth [1998] and its implemen-tation in the RELION software package S. Scheres [2012] is perhaps most widely usednowadays. Notice that a requirement for the maximum likelihood estimator (MLE) to beconsistent is that the number of parameters to be estimated does not grow indefinitely withthe number of samples (i.e., number of images in our case). The Neyman-Scott “paradox”Neyman and Scott [1948] is an example where maximum likelihood is inconsistent whenthe number of parameters grows with the sample size. The MLE of � and R1; : : : ; Rn istherefore not guaranteed to be consistent. On the other hand, the MLE of � when treatingthe rotations as hidden parameters is consistent.

The MLE approach has been proven very successful in practice. Yet, it suffers fromseveral important shortcomings. First, expectation-maximization and other existing opti-mization procedures are only guaranteed to converge to a local optimum, not necessary theglobal one. Stochastic gradient descent Punjani, Rubinstein, Fleet, and Brubaker [2017]

Page 12: Mathematics for cryo-electron microscopyP.I.C. M.–2018 RiodeJaneiro,Vol.4(4013–4032) MATHEMATICSFORCRYO-ELECTRONMICROSCOPY AS Abstract Single-particlecryo-electronmicroscopy(cryo-EM)hasrecentlyjoinedX-raycrys

4024 AMIT SINGER

and frequency marching Barnett, Greengard, Pataki, and Spivak [2017] attempt to miti-gate that problem. MLE requires an initial starting model, and convergence may dependon that model, a phenomenon known as “model bias”. MLE can be quite slow to com-pute, as many iterations may be required for convergence, with each iteration performinga computationally expensive projection template-reference matching and tomographic re-construction, although running times are significantly reduced in modern GPU implemen-tations. From a mathematical standpoint, it is difficult to analyze the MLE. In particular,what is the sample complexity of the cryo-EM reconstruction problem? That is, howmanynoisy images are needed for successful reconstruction?

6 Kam’s autocorrelation analysis

About 40 years ago, Kam [1980] proposed amethod for 3-D ab-initio reconstructionwhichis based on computing the autocorrelation and higher order correlation functions of the 3-D structure in Fourier space from the 2-D noisy projection images. Remarkably, it wasrecently shown in A. S. Bandeira, Blum-Smith, Perry, Weed, and Wein [2017] that thesecorrelation functions determine the 3-D structure uniquely (or at least up to a finite numberof possibilities). Kam’s method completely bypasses the estimation of particle rotationsand estimates the 3-D structure directly. The most striking advantage of Kam’s methodover iterative refinement methods is that it requires only one pass over the data for com-puting the correlation functions, and as a result it is extremely fast and can operate ina streaming mode in which data is processed on the fly while being acquired. Kam’smethod can be regarded as a method of moments approach for estimating the structure �.The MLE is asymptotically efficient, therefore its mean squared error is typically smallerthan that of the method of moments estimator. However, for the cryo-EM reconstructionproblem the method of moments estimator of Kam is much faster to compute comparedto the MLE. In addition, Kam’s method does not require a starting model. From a theo-retical standpoint, Kam’s theory sheds light on the sample complexity of the problem asa function of the SNR. For example, using Kam’s method in conjunction with tools fromalgebraic geometry and information theory, it was shown that in the case of uniformly dis-tributed rotations, the sample complexity scales as 1/SNR3 in the low SNR regime A. S.Bandeira, Blum-Smith, Perry, Weed, and Wein [ibid.].

Interest in Kam’s theory has been recently revived due to its potential application toX-ray free electron lasers (XFEL) Kam [1977], Liu, B. K. Poon, Saldin, Spence, andZwart [2013], Starodub et al. [2012], Saldin, Shneerson, et al. [2010], Saldin, H.-C. Poon,Schwander, Uddin, and M. Schmidt [2011], and Kurta et al. [2017]. However, Kam’smethod has so far received little attention in the EM community. It is an idea that wasclearly ahead of its time: There was simply not enough data to accurately estimate second

Page 13: Mathematics for cryo-electron microscopyP.I.C. M.–2018 RiodeJaneiro,Vol.4(4013–4032) MATHEMATICSFORCRYO-ELECTRONMICROSCOPY AS Abstract Single-particlecryo-electronmicroscopy(cryo-EM)hasrecentlyjoinedX-raycrys

MATHEMATICS FOR CRYO-ELECTRON MICROSCOPY 4025

and third order statistics from the small datasets that were available at the time (e.g. typ-ically just dozens of particles). Moreover, accurate estimation of such statistics requiresmodern techniques from high dimensional statistical analysis such as eigenvalue shrinkagein the spiked covariance model that have only been introduced in the past two decades. Es-timation is also challenging due to the varying CTF between micrographs and non-perfectcentering of the images. Finally, Kam’s method requires a uniform distribution of particleorientations in the sample, an assumption that usually does not hold in practice.

In Bhamre, Zhang, and Singer [2016], we have already addressed the challenge of vary-ing CTF and also improved the accuracy and efficiency of estimating the covariance ma-trix from projection images by combining the steerable PCA framework Zhao and Singer[2013] and Zhao, Shkolnisky, and Singer [2016] with optimal eigenvalue shrinkage proce-dures Johnstone [2001], Donoho, Gavish, and Johnstone [2013], and Gavish and Donoho[2017]. Despite this progress, the challenges of non-perfect centering of the images thatlimits the resolution and the stringent requirement for uniformly distributed viewing di-rections, still put severe limitations on the applicability of Kam’s method in cryo-EM.

Here is a very brief account of Kam’s theory. Kam showed that the Fourier projectionslice theorem implies that if the viewing directions of the projection images are uniformlydistributed, then the autocorrelation function of the 3-D volumewith itself over the rotationgroup SO(3) can be directly computed from the covariance matrix of the 2-D images,i.e. through PCA. Specifically, consider the spherical harmonics expansion of the Fouriertransform of �

(6) F �(k; �; ') =

1Xl=0

lXm=�l

Alm(k)Y ml (�; ');

where Y ml

are the spherical harmonics, and Alm are functions of the radial frequency k.Kam showed that from the covariance matrix of the 2-D Fourier transform of the 2-Dprojection images it is possible to extract matrices Cl (l = 0; 1; 2; : : :) that are related tothe radial functions Alm through

(7) Cl(k1; k2) =

lXm=�l

Alm(k1)Alm(k2):

For images sampled on a Cartesian grid of pixels, each Cl is a matrix of size Kl � Kl ,where Kl is determined by a sampling criterion dating back to Klug and Crowther [1972]to avoid aliasing. Kl is a monotonic decreasing function of l , and we set L as the largest l

in the spherical harmonics expansion for which Kl � l . In matrix notation, Equation (7)is equivalent to

(8) Cl = AlA�l ;

Page 14: Mathematics for cryo-electron microscopyP.I.C. M.–2018 RiodeJaneiro,Vol.4(4013–4032) MATHEMATICSFORCRYO-ELECTRONMICROSCOPY AS Abstract Single-particlecryo-electronmicroscopy(cryo-EM)hasrecentlyjoinedX-raycrys

4026 AMIT SINGER

whereAl is a matrix of sizeKl �(2l+1)whosem’th column is the vectorAlm and whoserows are indexed by the radial frequency k, and whereA� is the Hermitian conjugate ofA.However, the factorization of Cl in Equation (8), also known as the Cholesky decomposi-tion, is not unique: If Al satisfies Equation (8), then for any (2l + 1) � (2l + 1) unitarymatrix U (i.e., U satisfies U U � = U �U = I2l+1), also AlU satisfies Equation (8).In fact, since the molecular density � is real-valued, its Fourier transform is conjugate-symmetric, and hence the matrices Al are purely real for even l , and purely imaginary forodd l . Therefore, Equation (8) determines Al uniquely up to an orthogonal matrix Ol ofsize (2l+1)�(2l+1) (i.e.,Ol is a real valuedmatrix satisfyingOlO

Tl

= OTl

Ol = I2l+1).Formally, we take a Cholesky decomposition of the estimated Cl to obtain a Kl � (2l +1)

matrix Fl satisfying Cl = FlF�l. Accordingly, Al = FlOl for some unknown orthogonal

matrix Ol .In other words, from the covariance matrix of the 2-D projection images we can re-

trieve, for each l , the radial functions Alm (m = �l; : : : ; l) up to an orthogonal matrix Ol .This serves as a considerable reduction of the parameter space: Originally, a completespecification of the structure requires, for each l , a matrix Al of size Kl � (2l +1), but theadditional knowledge of Cl reduces the parameter space to that of an orthogonal matrixof size (2l + 1) � (2l + 1) which has only l(2l + 1) degrees of freedom, and typicallyKl � l .

In Bhamre, Zhang, and Singer [2015] we showed that the missing orthogonal matricesO1; O2; : : : ; OL can be retrieved by “orthogonal extension”, a process that relies on theexistence of a previously solved similar structure and in which the orthogonal matricesare grafted from the previously resolved similar structure to the unknown structure. How-ever, the structure of a similar molecule is usually unavailable. We also offered anothermethod for retrieving the orthogonal matrices using “orthogonal replacement”, inspiredby molecular replacement in X-ray crystallography. While orthogonal replacement doesnot require any knowledge of a similar structure, it assumes knowledge of a structure thatcan bind to the molecule (e.g., an antibody fragment of known structure that binds to aprotein).

An alternative approach for determining the orthogonal matrices was already proposedby Kam [1980] and Kam and Gafni [1985], who suggested using higher order correla-tions. Specifically, Kam proposed using triple products of the form I 2(k1)I (k2) andquadruple products of the form I 2(k1)I 2(k2), where I is the 2-D Fourier transform ofimage I . The main disadvantage of using higher order correlations is noise amplification:Methods based on triple correlations require number of images that scale as 1/SNR3, andeven more badly as 1/SNR4 in the case of quadruple correlation. The higher correlationterms that Kam proposed are not complete. In general, a triple product takes the form

Page 15: Mathematics for cryo-electron microscopyP.I.C. M.–2018 RiodeJaneiro,Vol.4(4013–4032) MATHEMATICSFORCRYO-ELECTRONMICROSCOPY AS Abstract Single-particlecryo-electronmicroscopy(cryo-EM)hasrecentlyjoinedX-raycrys

MATHEMATICS FOR CRYO-ELECTRON MICROSCOPY 4027

I (k1)I (k2)I (k3). Kam is using only a slice of the possible triple products (namely, set-ting k3 = k1) due to the large number of coefficients it results in. This is closely relatedto restricting the bispectrum due to its high dimensionality Marabini and Carazo [1996].In that respect we note that a vast reduction in the dimensionality of the triple correlation(or bispectrum coefficients) can be achieved by only using triple products of the steerablePCA coefficients. The number of meaningful PCA expansion coefficient is typically ofthe order of a few hundreds (depending on the noise level), much smaller than the numberof pixels in the images.

7 A mathematical toy model: multi-reference alignment

The problem of multi-reference alignment serves as a mathematical toy model for ana-lyzing the cryo-EM reconstruction and heterogeneity problems. In the multi-referencealignment model, a signal is observed by the action of a random circular translation andthe addition of Gaussian noise. The goal is to recover the signal’s orbit by accessing mul-tiple independent observations (Figure 5). Specifically, the measurement model is of theform

(9) yi = Ri x + "i ; x; yi ; "i 2 RL; "i ∼ N(0; �2IL�L); i = 1; 2; : : : ; n:

While pairwise alignment succeeds at high SNR, accurate estimation of rotations isimpossible at low SNR, similar to the fundamental limitation in cryo-EM. Two naturalquestions arise: First, how to estimate the underlying signal at very low SNR, and howmany measurements are required for accurate estimation.

Just like in the cryo-EM reconstruction problem, an expectation-maximization typealgorithm can be used to compute the MLE of the signal x, treating the cyclic shifts as nui-sance parameters. Alternatively, a method of moments approach would consist of estimat-ing correlation functions that are invariant to the group action. Specifically, the followingare invariant features (in Fourier / real space), and the number of observations needed foraccurate estimation by the central limit theorem:

• Zero frequency / average pixel value:

(10)1

n

nXi=1

yi (0) ! x(0) as n ! 1: Need n & �2:

• Power spectrum / autocorrelation:

(11)1

n

nXi=1

jyi (k)j2

! jx(k)j2 + �2 as n ! 1: Need n & �4:

Page 16: Mathematics for cryo-electron microscopyP.I.C. M.–2018 RiodeJaneiro,Vol.4(4013–4032) MATHEMATICSFORCRYO-ELECTRONMICROSCOPY AS Abstract Single-particlecryo-electronmicroscopy(cryo-EM)hasrecentlyjoinedX-raycrys

4028 AMIT SINGER

=0 =0.1 =1.2

Figure 5: Multi-reference alignment of 1-D periodic signals, at different noise levels� .

• Bispectrum / triple correlation Tukey [1953]:

(12)1

n

nXi=1

yi (k1)yi (k2)yi (�k1 � k2) !x(k1)x(k2)x(�k1 � k2)

as n ! 1Need n & �6:

The bispectrum Bx(k1; k2) = x(k1)x(k2)x(�k1 � k2) contains phase informationand is generically invertible (up to global shift) Kakarala [1993] and Sadler and Gian-nakis [1992]. It is therefore possible to accurately reconstruct the signal from sufficientlymany noisy shifted copies for arbitrarily low SNR without estimating the shifts and evenwhen estimation of shifts is poor. Notice that if shifts are known, then n & 1/SNR issufficient for accurate estimation of the signal. However, not knowing the shifts make abig difference in terms of the sample complexity, and n & 1/SNR3 for the shift-invariantmethod. In fact, no method can succeed with asymptotically fewer measurements (as afunction of the SNR) in the case of uniform distribution of shifts Perry, Weed, A. Ban-deira, Rigollet, and Singer [2017], A. Bandeira, Rigollet, and Weed [2017], and Abbe,J. M. Pereira, and Singer [2017]. The computational complexity and stability of a vari-ety of bispectrum inversion algorithms was studied in Bendory, Boumal, Ma, Zhao, andSinger [2017] and H. Chen, Zehni, and Zhao [2018]. A somewhat surprising result isthat multi-reference alignment with non-uniform (more precisely, non-periodic) distribu-tion of shifts can be solved with just the first two moments and the sample complexityis proportional to 1/SNR2 Abbe, Bendory, Leeb, J. Pereira, Sharon, and Singer [2017]and Abbe, J. M. Pereira, and Singer [2018]. The method of moments can also be applied

Page 17: Mathematics for cryo-electron microscopyP.I.C. M.–2018 RiodeJaneiro,Vol.4(4013–4032) MATHEMATICSFORCRYO-ELECTRONMICROSCOPY AS Abstract Single-particlecryo-electronmicroscopy(cryo-EM)hasrecentlyjoinedX-raycrys

MATHEMATICS FOR CRYO-ELECTRON MICROSCOPY 4029

to multi-reference alignment in the heterogeneous setup, and avoids both shift estimationand clustering of the measurements Boumal, Bendory, Lederman, and Singer [2017].

The analysis of the multi-reference alignment model provides key theoretical insightsinto Kam’smethod for cryo-EM reconstruction. In addition, themulti-reference alignmentproblem also offers a test bed for optimization algorithms and computational tools beforetheir application to the more challenging problems of cryo-EM.

8 Summary

Computational tools are a vital component of the cryo-EM structure determination processthat follows data collection. Still, there are many computational aspects that are either un-resolved or that require further research and development. New computational challengesconstantly emerge from attempts to further push cryo-EM technology towards higher res-olution, higher throughput, smaller molecules, and highly flexible molecules. Importantcomputational challenges include mapping conformational landscapes, structure valida-tion, dealingwith low SNR for small molecule reconstruction, motion correction and videoprocessing, ab-initio modeling, and sub-tomogram averaging, among others. We empha-size that this paper is of limited scope, and therefore addressed only a few core elementsof the reconstruction pipeline, mainly focusing on the cryo-EM reconstruction problem.

Moreover, the paper did not aim to present any new algorithms and techniques, butinstead provide a review of some of the already existing methods and their analysis, withperhaps some new commentary. Although the heterogeneity problem is arguably one ofthe most important challenges in cryo-EM analysis nowadays, techniques for addressingthis problemwere not discussed here mainly for space limitations. Another reason to deferthe review of methods for the heterogeneity problem is that techniques are still being de-veloped, and that aspect of the cryo-EM analysis is less mature and not as well understoodcompared to the basic cryo-EM reconstruction problem.

To conclude, mathematics plays a significant role in the design and analysis of algo-rithms for cryo-EM. Different aspects of representation theory, tomography and integralgeometry, high dimensional statistics, random matrix theory, information theory, alge-braic geometry, signal and image processing, dimensionality reduction, manifold learn-ing, numerical linear algebra, and fast algorithms, all come together in helping structuralbiologists discover new biology using cryo-electron microscopy.

Page 18: Mathematics for cryo-electron microscopyP.I.C. M.–2018 RiodeJaneiro,Vol.4(4013–4032) MATHEMATICSFORCRYO-ELECTRONMICROSCOPY AS Abstract Single-particlecryo-electronmicroscopy(cryo-EM)hasrecentlyjoinedX-raycrys

4030 AMIT SINGER

References

Emmanuel Abbe, Tamir Bendory, William Leeb, João Pereira, Nir Sharon, and AmitSinger (2017). “Multireference Alignment is Easier with an Aperiodic Translation Dis-tribution”. arXiv: 1710.02793 (cit. on p. 4028).

Emmanuel Abbe, João M Pereira, and Amit Singer (2017). “Sample complexity of theboolean multireference alignment problem”. In: Information Theory (ISIT), 2017 IEEEInternational Symposium on. IEEE, pp. 1316–1320 (cit. on p. 4028).

– (2018). “Estimation in the group action channel”. arXiv: 1801.04366 (cit. on p. 4028).Alexey Amunts et al. (2014). “Structure of the yeast mitochondrial large ribosomal sub-

unit”. Science 343.6178, pp. 1485–1489 (cit. on p. 4013).Afonso S Bandeira, Ben Blum-Smith, Amelia Perry, Jonathan Weed, and Alexander S

Wein (2017). “Estimation under group actions: recovering orbits from invariants”. arXiv:1712.10163 (cit. on p. 4024).

Afonso S Bandeira, Yutong Chen, and Amit Singer (2015). “Non-unique games overcompact groups and orientation estimation in cryo-em”. arXiv: 1505.03840 (cit. onp. 4021).

Afonso Bandeira, Philippe Rigollet, and Jonathan Weed (2017). “Optimal rates of estima-tion for multi-reference alignment”. arXiv: 1702.08546 (cit. on p. 4028).

Alex Barnett, Leslie Greengard, Andras Pataki, and Marina Spivak (2017). “Rapid Solu-tion of the Cryo-EM Reconstruction Problem by Frequency Marching”. SIAM Journalon Imaging Sciences 10.3, pp. 1170–1195 (cit. on p. 4024).

Alberto Bartesaghi, Alan Merk, Soojay Banerjee, Doreen Matthies, XiongwuWu, Jacque-line LSMilne, and Sriram Subramaniam (2015). “2.2Å resolution cryo-EM structure ofˇ-galactosidase in complexwith a cell-permeant inhibitor”. Science 348.6239, pp. 1147–1151 (cit. on p. 4013).

Samit Basu and Yoram Bresler (2000a). “Feasibility of tomography with unknown viewangles”. IEEE Transactions on Image Processing 9.6, pp. 1107–1122 (cit. on p. 4019).

– (2000b). “Uniqueness of tomography with unknown view angles”. IEEE Transactionson Image Processing 9.6, pp. 1094–1106 (cit. on p. 4019).

Mikhail Belkin and Partha Niyogi (2002). “Laplacian eigenmaps and spectral techniquesfor embedding and clustering”. In: Advances in neural information processing systems,pp. 585–591 (cit. on p. 4023).

Tamir Bendory, Nicolas Boumal, Chao Ma, Zhizhen Zhao, and Amit Singer (2017). “Bis-pectrum inversion with application to multireference alignment”. IEEE Transactionson Signal Processing 66.4, pp. 1037–1050 (cit. on p. 4028).

Tejal Bhamre, Teng Zhang, and Amit Singer (2015). “Orthogonal matrix retrieval in cryo-electron microscopy”. In: Biomedical Imaging (ISBI), 2015 IEEE 12th InternationalSymposium on. IEEE, pp. 1048–1052 (cit. on p. 4026).

Page 19: Mathematics for cryo-electron microscopyP.I.C. M.–2018 RiodeJaneiro,Vol.4(4013–4032) MATHEMATICSFORCRYO-ELECTRONMICROSCOPY AS Abstract Single-particlecryo-electronmicroscopy(cryo-EM)hasrecentlyjoinedX-raycrys

MATHEMATICS FOR CRYO-ELECTRON MICROSCOPY 4031

– (2016). “Denoising and covariance estimation of single particle cryo-EM images”. Jour-nal of structural biology 195.1, pp. 72–81 (cit. on pp. 4022, 4025).

Nicolas Boumal, Tamir Bendory, Roy R Lederman, and Amit Singer (2017). “Heteroge-neous multireference alignment: a single pass approach”. arXiv: 1710.02590 (cit. onp. 4029).

Moses Charikar, Konstantin Makarychev, and Yury Makarychev (2006). “Near-optimalalgorithms for unique games”. In: Proceedings of the thirty-eighth annual ACM sym-posium on Theory of computing. ACM, pp. 205–214 (cit. on p. 4021).

Hua Chen, Mona Zehni, and Zhizhen Zhao (2018). “A Spectral Method for Stable Bispec-trum Inversion with Application to Multireference Alignment”. arXiv: 1802.10493(cit. on p. 4028).

Ronald R Coifman and Stéphane Lafon (2006). “Diffusion maps”. Applied and computa-tional harmonic analysis 21.1, pp. 5–30 (cit. on p. 4023).

David L Donoho, Matan Gavish, and Iain M Johnstone (2013). “Optimal shrinkage ofeigenvalues in the spiked covariance model”. arXiv: 1311.0851 (cit. on p. 4025).

J. Frank (2006). Three-dimensional electron microscopy of macromolecular assemblies.Academic Press (cit. on p. 4014).

Matan Gavish and David L Donoho (2017). “Optimal shrinkage of singular values”. IEEETransactions on Information Theory 63.4, pp. 2137–2152 (cit. on p. 4025).

RobertMGlaeser (2016). “Howgood can cryo-EMbecome?”NatureMethods 13.1, pp. 28–32 (cit. on p. 4014).

Michel XGoemans andDavid PWilliamson (1995). “Improved approximation algorithmsfor maximum cut and satisfiability problems using semidefinite programming”. Jour-nal of the ACM (JACM) 42.6, pp. 1115–1145 (cit. on p. 4021).

A. B. Goncharov (1988). “Integral geometry and three-dimensional reconstruction of ran-domly oriented identical particles from their electron microphotos”. Acta Appl. Math.11.3, pp. 199–211 (cit. on p. 4019).

AB Goncharov (1987). “Methods of integral geometry and finding the relative orientationof identical particles arbitrarily arranged in a plane from their projections onto a straightline”. In: Soviet Physics Doklady. Vol. 32, p. 177 (cit. on p. 4018).

AB Goncharov and MS Gelfand (1988). “Determination of mutual orientation of identi-cal particles from their projections by the moments method”. Ultramicroscopy 25.4,pp. 317–327 (cit. on p. 4019).

Nikolaus Grigorieff (2007). “FREALIGN: high-resolution refinement of single particlestructures”. Journal of structural biology 157.1, pp. 117–125 (cit. on p. 4023).

Ronny Hadani and Amit Singer (2011). “Representation theoretic patterns in three dimen-sional Cryo-Electron Microscopy I: The intrinsic reconstitution algorithm”. Annals ofmathematics 174.2, p. 1219 (cit. on p. 4021).

Page 20: Mathematics for cryo-electron microscopyP.I.C. M.–2018 RiodeJaneiro,Vol.4(4013–4032) MATHEMATICSFORCRYO-ELECTRONMICROSCOPY AS Abstract Single-particlecryo-electronmicroscopy(cryo-EM)hasrecentlyjoinedX-raycrys

4032 AMIT SINGER

George Harauz and FP Ottensmeyer (1983). “Direct three-dimensional reconstruction formacromolecular complexes from electronmicrographs”.Ultramicroscopy 12.4, pp. 309–319 (cit. on p. 4023).

– (1984). “Nucleosome reconstruction via phosphorous mapping”. Science 226, pp. 936–941 (cit. on p. 4023).

Marin van Heel, Brent Gowen, et al. (2000). “Single-particle electron cryo-microscopy:towards atomic resolution”. Quarterly reviews of biophysics 33.4, pp. 307–369 (cit. onp. 4014).

Marin van Heel, George Harauz, Elena V Orlova, Ralf Schmidt, and Michael Schatz(1996). “A new generation of the IMAGIC image processing system”. Journal of struc-tural biology 116.1, pp. 17–24 (cit. on p. 4023).

Michael Hohn et al. (2007). “SPARX, a new environment for Cryo-EM image processing”.Journal of structural biology 157.1, pp. 47–55 (cit. on p. 4023).

Iain M Johnstone (2001). “On the distribution of the largest eigenvalue in principal com-ponents analysis”. Annals of statistics, pp. 295–327 (cit. on pp. 4022, 4025).

Ramakrishna Kakarala (1993). “A group-theoretic approach to the triple correlation”. In:Higher-Order Statistics, 1993., IEEE Signal Processing Workshop on. IEEE, pp. 28–32(cit. on p. 4028).

Z. Kam (1980). “The reconstruction of structure from electron micrographs of randomlyoriented particles”. J. Theor. Biol. 82.1, pp. 15–39 (cit. on pp. 4024, 4026).

Z Kam and I Gafni (1985). “Three-dimensional reconstruction of the shape of human wartvirus using spatial correlations”. Ultramicroscopy 17.3, pp. 251–262 (cit. on p. 4026).

Zvi Kam (1977). “Determination of macromolecular structure in solution by spatial corre-lation of scattering fluctuations”.Macromolecules 10.5, pp. 927–934 (cit. on p. 4024).

A Klug and R A Crowther (1972). “Three-dimensional image reconstruction from theviewpoint of information theory”. Nature 238.5365, pp. 435–440 (cit. on p. 4025).

Werner Kühlbrandt (2014). “The Resolution Revolution”. Science 343.6178, pp. 1443–1444 (cit. on p. 4013).

Ruslan P Kurta et al. (2017). “Correlations in scattered x-ray laser pulses reveal nanoscalestructural features of viruses”.Physical review letters 119.15, p. 158102 (cit. on p. 4024).

Maofu Liao, Erhu Cao, David Julius, and Yifan Cheng (2013). “Structure of the TRPV1ion channel determined by electron cryo-microscopy”. Nature 504.7478, pp. 107–112(cit. on p. 4013).

Haiguang Liu, Billy K Poon, Dilano K Saldin, John CH Spence, and Peter H Zwart (2013).“Three-dimensional single-particle imaging using angular correlations fromX-ray laserdata”.ActaCrystallographica Section A: Foundations of Crystallography 69.4, pp. 365–373 (cit. on p. 4024).

Page 21: Mathematics for cryo-electron microscopyP.I.C. M.–2018 RiodeJaneiro,Vol.4(4013–4032) MATHEMATICSFORCRYO-ELECTRONMICROSCOPY AS Abstract Single-particlecryo-electronmicroscopy(cryo-EM)hasrecentlyjoinedX-raycrys

MATHEMATICS FOR CRYO-ELECTRON MICROSCOPY 4033

Roberto Marabini and José Marıa Carazo (1996). “On a new computationally fast imageinvariant based on bispectral projections”. Pattern recognition letters 17.9, pp. 959–967 (cit. on p. 4027).

F. Natterer (1986). Themathematics of computerized tomography. Springer (cit. on p. 4016).Jerzy Neyman and Elizabeth L Scott (1948). “Consistent estimates based on partially con-

sistent observations”. Econometrica: Journal of the Econometric Society, pp. 1–32 (cit.on p. 4023).

Eva Nogales (2016). “The development of cryo-EM into a mainstream structural biologytechnique”. Nature Methods 13.1, pp. 24–27 (cit. on p. 4014).

Amelia Perry, JonathanWeed, AfonsoBandeira, PhilippeRigollet, andAmit Singer (2017).“The sample complexity of multi-reference alignment”. arXiv: 1707.00943 (cit. onp. 4028).

Ali Punjani, John L Rubinstein, David J Fleet, and Marcus A Brubaker (2017). “cryo-SPARC: algorithms for rapid unsupervised cryo-EM structure determination”. NatureMethods 14.3, pp. 290–296 (cit. on p. 4023).

Brian M Sadler and Georgios B Giannakis (1992). “Shift-and rotation-invariant objectreconstruction using the bispectrum”. JOSA A 9.1, pp. 57–69 (cit. on p. 4028).

D K Saldin, V L Shneerson, et al. (2010). “Structure of a single particle from scatteringby many particles randomly oriented about an axis: toward structure solution withoutcrystallization?” New J. Phys. 12.3, p. 035014 (cit. on p. 4024).

D. K. Saldin, H.-C. Poon, P. Schwander, M. Uddin, and M. Schmidt (Aug. 2011). “Re-constructing an icosahedral virus from single-particle diffraction experiments”. Opt.Express 19.18, p. 17318 (cit. on p. 4024).

S. Scheres (2012). “RELION: implementation of a Bayesian approach to cryo-EM struc-ture determination”. Journal of structural biology 180.3, pp. 519–530 (cit. on p. 4023).

Tanvir R Shaikh, Haixiao Gao, William T Baxter, Francisco J Asturias, Nicolas Bois-set, Ardean Leith, and Joachim Frank (2008). “SPIDER image processing for single-particle reconstruction of biological macromolecules from electron micrographs”. Na-ture protocols 3.12, pp. 1941–1974 (cit. on p. 4023).

Yoel Shkolnisky and Amit Singer (2012). “Viewing direction estimation in cryo-EM us-ing synchronization”. SIAM journal on imaging sciences 5.3, pp. 1088–1110 (cit. onp. 4021).

FJ Sigworth (1998). “A maximum-likelihood approach to single-particle image refine-ment”. Journal of structural biology 122.3, pp. 328–339 (cit. on p. 4023).

Fred J Sigworth (2016). “Principles of cryo-EM single-particle image processing”. Mi-croscopy 65.1, pp. 57–67 (cit. on p. 4014).

Amit Singer, Ronald R Coifman, Fred J Sigworth, David W Chester, and Yoel Shkolnisky(2010). “Detecting consistent common lines in cryo-EM by voting”. Journal of struc-tural biology 169.3, pp. 312–322 (cit. on p. 4021).

Page 22: Mathematics for cryo-electron microscopyP.I.C. M.–2018 RiodeJaneiro,Vol.4(4013–4032) MATHEMATICSFORCRYO-ELECTRONMICROSCOPY AS Abstract Single-particlecryo-electronmicroscopy(cryo-EM)hasrecentlyjoinedX-raycrys

4034 AMIT SINGER

Amit Singer and Yoel Shkolnisky (2011). “Three-dimensional structure determinationfrom common lines in cryo-EMby eigenvectors and semidefinite programming”. SIAMjournal on imaging sciences 4.2, pp. 543–572 (cit. on p. 4021).

Amit Singer and H-T Wu (2012). “Vector diffusion maps and the connection Laplacian”.Communications on pure and appliedmathematics 65.8, pp. 1067–1144 (cit. on p. 4023).

Amit Singer, Zhizhen Zhao, Yoel Shkolnisky, and Ronny Hadani (2011). “Viewing angleclassification of cryo-electron microscopy images using eigenvectors”. SIAM Journalon Imaging Sciences 4.2, pp. 723–759 (cit. on p. 4023).

Martin TJ Smith and John L Rubinstein (2014). “Beyond blob-ology”. Science 345.6197,pp. 617–619 (cit. on p. 4013).

Carlos Oscar S Sorzano and Jose Maria Carazo (2017). “Challenges ahead Electron Mi-croscopy for Structural Biology from the Image Processing point of view”. arXiv: 1701.00326 (cit. on p. 4014).

COS Sorzano, Roberto Marabini, Javier Velázquez-Muriel, José Román Bilbao-Castro,Sjors HW Scheres, José M Carazo, and Alberto Pascual-Montano (2004). “XMIPP: anew generation of an open-source image processing package for electron microscopy”.Journal of structural biology 148.2, pp. 194–204 (cit. on p. 4023).

Dmitri Starodub et al. (2012). “Single-particle structure determination by correlationsof snapshot X-ray diffraction patterns”. Nature communications 3, p. 1276 (cit. onp. 4024).

Sriram Subramaniam, Werner Kühlbrandt, and Richard Henderson (2016). “CryoEM atIUCrJ: a new era”. IUCrJ 3.Pt 1, pp. 3–7 (cit. on p. 4014).

Guang Tang, Liwei Peng, Philip R Baldwin, Deepinder S Mann, Wen Jiang, Ian Rees, andSteven J Ludtke (2007). “EMAN2: an extensible image processing suite for electronmicroscopy”. Journal of structural biology 157.1, pp. 38–46 (cit. on p. 4023).

JW Tukey (1953). “The spectral representation and transformation properties of the highermoments of stationary time series”. Reprinted in The Collected Works of John W. Tukey1, pp. 165–184 (cit. on p. 4028).

B.K. Vainshtein and A.B. Goncharov (1986). “Determination of the spatial orientationof arbitrarily arranged identical particles of unknown structure from their projections”.Soviet Physics Doklady 31, p. 278 (cit. on p. 4018).

M. van Heel (1987). “Angular reconstitution: A posteriori assignment of projection direc-tions for 3D reconstruction”. Ultramicroscopy 21.2, pp. 111–123 (cit. on p. 4018).

Marin Van Heel and Joachim Frank (1981). “Use of multivariate statistics in analysingthe images of biological macromolecules”. Ultramicroscopy 6.2, pp. 187–194 (cit. onp. 4022).

Lanhui Wang, Amit Singer, and Zaiwen Wen (2013). “Orientation determination of cryo-EM images using least unsquared deviations”. SIAM journal on imaging sciences 6.4,pp. 2450–2483 (cit. on p. 4021).

Page 23: Mathematics for cryo-electron microscopyP.I.C. M.–2018 RiodeJaneiro,Vol.4(4013–4032) MATHEMATICSFORCRYO-ELECTRONMICROSCOPY AS Abstract Single-particlecryo-electronmicroscopy(cryo-EM)hasrecentlyjoinedX-raycrys

MATHEMATICS FOR CRYO-ELECTRON MICROSCOPY 4035

Zhizhen Zhao, Yoel Shkolnisky, and Amit Singer (2016). “Fast steerable principal com-ponent analysis”. IEEE transactions on computational imaging 2.1, pp. 1–12 (cit. onpp. 4022, 4025).

Zhizhen Zhao and Amit Singer (2013). “Fourier–Bessel rotational invariant eigenimages”.JOSA A 30.5, pp. 871–877 (cit. on p. 4025).

– (2014). “Rotationally invariant image representation for viewing direction classifica-tion in cryo-EM”. Journal of structural biology 186.1, pp. 153–166 (cit. on p. 4022).

Received 2018-03-10.

A SD MP A C MP [email protected]

Page 24: Mathematics for cryo-electron microscopyP.I.C. M.–2018 RiodeJaneiro,Vol.4(4013–4032) MATHEMATICSFORCRYO-ELECTRONMICROSCOPY AS Abstract Single-particlecryo-electronmicroscopy(cryo-EM)hasrecentlyjoinedX-raycrys