Overview of the Phase Problem - Illinois Institute of ...agni.phys.iit.edu/~howard/ACASchool/lectures07/Phasing2.pdfFrom Glusker, Lewis and Rossi Note: F P = protein F H = heavy atom

19 Feb 2008, Biology 555: Crystallographic Phasing II

[Diagram linking protein data, phases and the crystal structure]

Overview of the Phase Problem

John Rose, ACA Summer School 2006

Reorganized by Andy Howard, Biology 555, Spring 2008, Part 2 of 2

Remember
• We can measure reflection intensities
• We can calculate structure-factor amplitudes |F| from the intensities (|F| ∝ √I)
• We can calculate complete structure factors (amplitudes and phases) from atomic positions
• We need phase information to generate the image


From Glusker, Lewis and Rossi

$$P(u,v,w) = \frac{1}{V} \sum_{hkl} |F_{hkl}|^2 \cos 2\pi (hu + kv + lw)$$

Finding the Heavy Atoms or Anomalous Scatterers

The Patterson function: a Fourier transform of |F|² with φ = 0, i.e. a vector map
• Uses u,v,w instead of x,y,z
• Maps all interatomic vectors, giving N² vectors!! (where N = number of atoms)
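A minimal one-dimensional sketch in Python shows why a map built from |F|² alone, with no phases, peaks at interatomic vectors. It assumes a cell of unit length, two point atoms at hypothetical fractional positions, and reflections h = −hmax…hmax; the function names are illustrative and not taken from any crystallographic package.

```python
import cmath
import math


def structure_factors(positions, weights, hmax):
    """F(h) = sum_j f_j exp(2*pi*i*h*x_j) for a 1-D cell (fractional coordinates)."""
    return {
        h: sum(f * cmath.exp(2j * math.pi * h * x) for x, f in zip(positions, weights))
        for h in range(-hmax, hmax + 1)
    }


def patterson_1d(F, npts=200):
    """P(u) = sum_h |F(h)|^2 cos(2*pi*h*u), with V = 1; only |F|^2 is used, no phases."""
    grid = [i / npts for i in range(npts)]
    P = [sum(abs(Fh) ** 2 * math.cos(2 * math.pi * h * u) for h, Fh in F.items())
         for u in grid]
    return grid, P


# Two equal (hypothetical) atoms at x = 0.10 and x = 0.35
F = structure_factors([0.10, 0.35], [1.0, 1.0], hmax=20)
grid, P = patterson_1d(F)

# Largest peak away from the origin: the interatomic vector +/-(0.35 - 0.10)
window = [i for i, u in enumerate(grid) if 0.05 <= u <= 0.95]
best = max(window, key=lambda i: P[i])
print(grid[best])  # 0.25 or 0.75 (the same vector seen from either atom)
```

With two equal atoms the origin peak (each atom with itself) comes out about twice as high as the cross peak at u = 0.25, in line with peak heights proportional to Z_i Z_j.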


The Difference Patterson Map

SIR: $|\Delta F|^2 = (|F_{nat}| - |F_{der}|)^2$
SAS: $|\Delta F|^2 = (|F_{hkl}| - |F_{-h-k-l}|)^2$

• The Patterson map is centrosymmetric: peaks appear at (u,v,w) and (−u,−v,−w)
• Peak height is proportional to Z_i Z_j
• The (u,v,w) coordinates of the peaks give the heavy-atom (x,y,z) coordinates via Harker analysis
• The origin peak (0,0,0) maps the vector from each atom to itself
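The two kinds of difference coefficients can be written down directly; they replace |F|² in the Patterson sum. This Python sketch assumes amplitudes stored in dictionaries keyed by (h, k, l) and already on a common scale; the function names and numbers are hypothetical, and a real program would also scale the data sets and reject outliers.

```python
def sir_coefficients(F_nat, F_der):
    """Isomorphous difference coefficients (|F_der| - |F_nat|)**2, keyed by (h, k, l)."""
    return {hkl: (F_der[hkl] - F_nat[hkl]) ** 2 for hkl in F_nat if hkl in F_der}


def sas_coefficients(F_obs):
    """Anomalous difference coefficients (|F(hkl)| - |F(-h-k-l)|)**2, one per Friedel pair."""
    coeffs = {}
    for (h, k, l), amp in F_obs.items():
        mate = (-h, -k, -l)
        # keep each pair once (tuple comparison picks one member) and skip unpaired data
        if mate in F_obs and (h, k, l) > mate:
            coeffs[(h, k, l)] = (amp - F_obs[mate]) ** 2
    return coeffs


# Hypothetical amplitudes, already on a common scale
native = {(1, 0, 0): 100.0, (0, 1, 0): 50.0}
derivative = {(1, 0, 0): 110.0, (0, 1, 0): 45.0}
friedel = {(1, 2, 3): 80.0, (-1, -2, -3): 76.0}
print(sir_coefficients(native, derivative))  # {(1, 0, 0): 100.0, (0, 1, 0): 25.0}
print(sas_coefficients(friedel))             # {(1, 2, 3): 16.0}
```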


Harker analysis
• Certain relationships apply in Patterson maps that enable us to determine some of the coordinates of our heavy atoms
• They depend on looking at differences between atomic positions
• These relationships were worked out by Lindo Patterson and David Harker
• Patterson space is centrosymmetric but otherwise has the symmetry of the original space group with the translational parts removed: screw axes become rotation axes and glide planes become mirror planes (lattice centering is retained)

[Photo: David Harker]


Example: space group P2₁

• P2₁ has equivalent positions R1 = (x, y, z) and R2 = (−x, y+½, −z)
• Therefore we’ll get Patterson (difference) peaks at R1−R1, R1−R2, R2−R1, R2−R2:
• (0,0,0), (2x, −½, 2z), (−2x, ½, −2z), (0,0,0)
• So if we look at the section of the map at v = ½, we can find peaks at (−2x, ½, −2z) and thereby discern the x and z coordinates of a real atom (the peak at (2x, −½, 2z) is equivalent to (2x, ½, 2z) by a lattice translation, so it lies in the same section)
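The P2₁ bookkeeping can be checked with exact fractions. This Python sketch (function names hypothetical) lists the four Patterson vectors for one site and then inverts a Harker peak (u, ½, w) using u = −2x, w = −2z (mod 1). Halving modulo 1 gives two answers per coordinate, so four (x, z) candidates come back; they differ by the permitted origin shifts of ½ in x and z, and y is not determined from this section at all.

```python
from fractions import Fraction as Fr


def p21_patterson_vectors(x, y, z):
    """All R_i - R_j for the two P2_1 equivalent positions, reduced mod 1."""
    r1 = (x, y, z)
    r2 = (-x, y + Fr(1, 2), -z)
    return [tuple((a - b) % 1 for a, b in zip(ri, rj)) for ri in (r1, r2) for rj in (r1, r2)]


def xz_from_harker_peak(u, w):
    """Candidate (x, z) from a peak at (u, 1/2, w), using u = -2x and w = -2z (mod 1)."""
    xs = [(-u / 2) % 1, (-u / 2 + Fr(1, 2)) % 1]
    zs = [(-w / 2) % 1, (-w / 2 + Fr(1, 2)) % 1]
    return [(xc, zc) for xc in xs for zc in zs]


x, y, z = Fr(1, 10), Fr(1, 5), Fr(3, 10)   # hypothetical heavy-atom site
for v in p21_patterson_vectors(x, y, z):
    print(v)
print(xz_from_harker_peak(Fr(4, 5), Fr(2, 5)))  # includes (1/10, 3/10)
```

Reading the other peak in the same section, (2x, ½, 2z), with the same formula gives the inverted site (−x, −z), which is one way the handedness question arises.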


How do we actually use this?

• Compute a difference Patterson map, i.e. a map with coefficients derived from |FPH(hkl)| − |FP(hkl)| or |F(hkl)| − |F(−h−k−l)|
• Examine Harker sections
• Peaks in Harker sections tell us where the heavy atoms or anomalous scatterers are
• Automated programs like BNP, SOLVE and SHELX can do the heavy lifting for us


A Note About Handedness

• We identify each reflection by an index, hkl.
• The hkl also tells us the relative location of that reflection in a reciprocal-space coordinate system.
• The indexed reflection has the correct handedness only if the data processing program assigns the indices correctly.
• The handedness of the molecule in the crystal is tied to the handedness assigned to the data, which may be right or wrong!
• Note: not all data processing programs assign handedness correctly!
• Be careful with your data processing.


[Figure: phase-triangle construction with points O, L, M, N and Q; ΔOLM = ΔOLN]

FPH = FP + FH

We need the value of FH. (Figure from Glusker, Lewis and Rossi)

The Phase Triangle Relationship

FP, FPH, FH and −FH are vectors (they have direction):
• |FP| is obtained from the native data
• |FPH| is obtained from the derivative or anomalous data
• FH (amplitude and phase) is obtained from the heavy-atom positions found by Patterson analysis
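Treating structure factors as complex numbers makes the triangle concrete. In this Python sketch the values are hypothetical; it builds FPH = FP + FH and checks that the two measured lengths obey the law of cosines, |FPH|² = |FP|² + |FH|² + 2|FP||FH|cos(φP − φH), which is the relation isomorphous replacement solves for φP.

```python
import cmath
import math

# Hypothetical structure factors (arbitrary units, phases in degrees)
F_P = cmath.rect(100.0, math.radians(40.0))    # protein
F_H = cmath.rect(30.0, math.radians(160.0))    # heavy atoms, known from the Patterson solution
F_PH = F_P + F_H                                # derivative: FPH = FP + FH

# Only the lengths |FP| and |FPH| are measured; they obey the law of cosines
phi_P, phi_H = cmath.phase(F_P), cmath.phase(F_H)
lhs = abs(F_PH) ** 2
rhs = abs(F_P) ** 2 + abs(F_H) ** 2 + 2 * abs(F_P) * abs(F_H) * math.cos(phi_P - phi_H)
print(round(lhs, 6), round(rhs, 6))  # both 7900.0
```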




The Phase Triangle Relationship

• In simplest terms, isomorphous replacement finds the orientation of the phase triangle from the orientation of one of its sides. It turns out, however, that there are two possible ways to orient the triangle if we fix the orientation of one of its sides.


X1 = φtrue or φfalse; X2 = φtrue or φfalse

Note: FP = protein, FH = heavy atom, FP1 = heavy atom derivative

The center of the FP1 circle is placed at the end of the vector −FH1.

Single Isomorphous Replacement

• The situation of two possible SIR phases is called the “phase ambiguity” problem, since we obtain both a true and a false phase for each reflection. Both phase solutions are equally probable, i.e. the phase probability distribution is bimodal.
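Solving the law-of-cosines relation for φP shows where the two SIR solutions come from: φP = φH ± arccos[(|FPH|² − |FP|² − |FH|²) / (2|FP||FH|)]. A minimal Python sketch, with a hypothetical function name and error-free hypothetical data:

```python
import cmath
import math


def sir_phases(F_P_amp, F_PH_amp, F_H):
    """Return the two protein phases (radians) consistent with |FPH| = |FP + FH|.

    From |FPH|^2 = |FP|^2 + |FH|^2 + 2|FP||FH|cos(phi_P - phi_H):
    phi_P = phi_H +/- arccos(...), i.e. a true and a false solution.
    """
    F_H_amp, phi_H = abs(F_H), cmath.phase(F_H)
    cos_term = (F_PH_amp**2 - F_P_amp**2 - F_H_amp**2) / (2 * F_P_amp * F_H_amp)
    cos_term = max(-1.0, min(1.0, cos_term))  # clamp: measurement errors can push it past +/-1
    delta = math.acos(cos_term)
    return ((phi_H + delta) % (2 * math.pi), (phi_H - delta) % (2 * math.pi))


F_P = cmath.rect(100.0, math.radians(40.0))     # hypothetical "true" protein structure factor
F_H = cmath.rect(30.0, math.radians(160.0))     # hypothetical heavy-atom structure factor
for phi in sir_phases(abs(F_P), abs(F_P + F_H), F_H):
    print(round(math.degrees(phi), 3))          # 280.0 and 40.0: one true, one false
```

Nothing in the single-derivative data distinguishes 40° from 280°, which is the bimodal distribution described above.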



Resolving the Phase Ambiguity

Add more information:
(1) Add another derivative (Multiple Isomorphous Replacement)
(2) Use a density modification technique (solvent flattening)
(3) Add anomalous data (SIR with anomalous scattering)


X1 = φtrue, X2 = φfalse, X3 = φfalse

Exact overlap at X1 depends on the accuracy of the data and of the heavy-atom (HA) model; the deviation from exact overlap is called lack of closure.

Note: FP = protein, FH1 = heavy atom #1, FH2 = heavy atom #2, FP1 = heavy atom derivative #1, FP2 = heavy atom derivative #2

The centers of the FP1 and FP2 circles are placed at the ends of the vectors −FH1 and −FH2, respectively.

Multiple Isomorphous Replacement

• We still get two solutions, one true and one false, for each reflection from the second derivative. The true solutions should be consistent between the two derivatives, while the false solutions should show a random variation.
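The MIR selection rule can be sketched by computing the two candidates from each derivative and keeping the pair that agrees best. This Python sketch assumes error-free hypothetical data and two derivatives; the function names are illustrative, and real programs combine phase probability distributions rather than picking a single pair.

```python
import cmath
import math


def sir_phases(F_P_amp, F_PH_amp, F_H):
    """Two candidate protein phases from one derivative (law-of-cosines construction)."""
    F_H_amp, phi_H = abs(F_H), cmath.phase(F_H)
    c = (F_PH_amp**2 - F_P_amp**2 - F_H_amp**2) / (2 * F_P_amp * F_H_amp)
    delta = math.acos(max(-1.0, min(1.0, c)))
    return ((phi_H + delta) % (2 * math.pi), (phi_H - delta) % (2 * math.pi))


def angular_diff(a, b):
    """Smallest absolute difference between two angles, in radians."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def mir_phase(F_P_amp, derivatives):
    """Pick the SIR solution that two derivatives agree on.

    derivatives: [(|FPH1|, FH1), (|FPH2|, FH2)]. Returns (phase, disagreement); with
    real data the residual disagreement reflects lack of closure.
    """
    (amp1, FH1), (amp2, FH2) = derivatives
    pairs = [(a, b) for a in sir_phases(F_P_amp, amp1, FH1)
             for b in sir_phases(F_P_amp, amp2, FH2)]
    a, b = min(pairs, key=lambda p: angular_diff(*p))
    return a, angular_diff(a, b)


# Hypothetical error-free data: true phase 40 degrees, two different heavy-atom vectors
F_P = cmath.rect(100.0, math.radians(40.0))
FH1 = cmath.rect(30.0, math.radians(160.0))
FH2 = cmath.rect(25.0, math.radians(70.0))
phi, closure = mir_phase(abs(F_P), [(abs(F_P + FH1), FH1), (abs(F_P + FH2), FH2)])
print(round(math.degrees(phi), 3), round(closure, 6))  # 40.0 0.0
```

Here derivative 1 offers 40° or 280° and derivative 2 offers 40° or 100°; only 40° is common to both.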


B.C. Wang, 1985


Solvent Flattening
• Similar to noise filtering
• Resolves the SIR or SAS phase ambiguity
• Electron density can’t be negative
• Uses an iterative process to enhance the true phases!


How does solvent flattening resolve the phase ambiguity?

• Solvent flattening can locate and enhance the protein image: whatever is not solvent must be protein!

• From the protein image, the phases of the structure factors of the protein can be calculated

• These calculated phases are then used to select the true phases from the sets of true and false phases
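One cycle of the idea can be sketched on a one-dimensional grid: compute a map, flatten the solvent to its mean, truncate negative protein density, and transform back to get new phases. This is a toy Python sketch assuming a known solvent mask and a plain O(n²) Fourier sum; it is not the ISIR/ISAS, DM or PHASES algorithm, which add envelope determination, phase weighting and many cycles. In an SIR/SAS setting, the new phases would then be compared with the two candidates for each reflection.

```python
import cmath
import math


def density_from(amps, phases):
    """rho(x) = (1/n) sum_h |F(h)| exp(i*phi(h)) exp(-2*pi*i*h*x/n), real part (1-D, V = 1)."""
    n = len(amps)
    F = [a * cmath.exp(1j * p) for a, p in zip(amps, phases)]
    return [(sum(F[h] * cmath.exp(-2j * math.pi * h * x / n) for h in range(n)) / n).real
            for x in range(n)]


def structure_factors(rho):
    """F(h) = sum_x rho(x) exp(2*pi*i*h*x/n) on an n-point grid."""
    n = len(rho)
    return [sum(rho[x] * cmath.exp(2j * math.pi * h * x / n) for x in range(n))
            for h in range(n)]


def solvent_flatten_cycle(amps_obs, phases, solvent_mask):
    """Map -> flatten solvent and truncate negative protein density -> new phases."""
    rho = density_from(amps_obs, phases)
    solvent = [r for r, s in zip(rho, solvent_mask) if s]
    mean_solvent = sum(solvent) / len(solvent)
    rho_mod = [mean_solvent if s else max(r, 0.0) for r, s in zip(rho, solvent_mask)]
    new_phases = [cmath.phase(F) for F in structure_factors(rho_mod)]
    return rho_mod, new_phases  # next cycle: combine new_phases with amps_obs again


# Toy data (hypothetical): protein in the first half of a 32-point cell, flat solvent after
n = 32
rho_true = [1.0 + 0.5 * math.sin(x) if x < n // 2 else 0.0 for x in range(n)]
mask = [x >= n // 2 for x in range(n)]
F_true = structure_factors(rho_true)
amps = [abs(F) for F in F_true]
# Starting from the true phases, flattening leaves the map unchanged (a fixed point)
rho_mod, new_phases = solvent_flatten_cycle(amps, [cmath.phase(F) for F in F_true], mask)
print(max(abs(a - b) for a, b in zip(rho_mod, rho_true)) < 1e-9)  # True
```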


Using the structure to solve the phase ambiguity

• Thus, in essence, the phase ambiguity is resolved by the protein image itself!

• This solvent-flattening process was made practical by the introduction of the ISIR/ISAS program suite (Wang, 1985), and other phasing programs such as DM and PHASES are based on this approach.


Handedness from solvent flattening

• The ISAS process is performed twice, once with the heavy-atom sites at their refined locations and once at their inverted locations

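Data | Handedness | FOM¹ | FOM² | R factor | Corr. coeff.
RHE | Correct | 0.54 | 0.82 | 0.26 | 0.958
RHE | Wrong | 0.54 | 0.80 | 0.30 | 0.940
NP + I³ | Correct | 0.54 | 0.80 | 0.27 | 0.955
NP + I³ | Wrong | 0.54 | 0.76 | 0.36 | 0.919
NP + I + S⁴ | Correct | 0.56 | 0.82 | 0.24 | 0.964
NP + I + S⁴ | Wrong | 0.56 | 0.78 | 0.35 | 0.926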


Notes on the handedness table

• 1: Figure of merit before solvent flattening
• 2: Figure of merit after one filter and four cycles of solvent flattening
• 3: Four iodine atoms were used for phasing
• 4: Four iodine and 56 sulfur atoms were used for phasing
• Heavy Atom Handedness and Protein Structure Determination using Single-wavelength Anomalous Scattering Data, ACA Annual Meeting, Montreal, July 25, 1995.


Does the correct hand make a difference?

• Yes!
• The wrong hand will give the mirror image!


Anomalous Dispersion Methods

• All elements display an anomalous dispersion (AD) effect in X-ray diffraction

• For light elements (H, C, N, O), anomalous dispersion effects are negligible; they’re small even for S and P at typical X-ray energies

• For heavier elements, especially when the X-ray wavelength approaches an atomic absorption edge of the element, these AD effects can be very large.


Scattering power when anomalous scattering exists

The scattering power of an atom exhibiting AD effects is:
$$f_{AD} = f_n + \Delta f' + i\,\Delta f''$$
where:
• fn is the normal scattering power of the atom in the absence of AD effects
• Δf' arises from the AD effect and is a real (±-signed) term added to fn
• Δf" is an imaginary term which also arises from the AD effect
• Δf" is always positive and 90° ahead of (fn + Δf') in phase angle


Δf’ and Δf”

• The values of Δf' and Δf" are highly dependent on the wavelength of the X-radiation.

• In the absence of AD effects, Ihkl = I-h-k-l (Friedel’s Law).

• With AD effects, Ihkl ≠ I-h-k-l (Friedel’s Law breaks down).

• Accurate measurement of Friedel pair differences can be used to extract starting phases if the AD effect is large enough.
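The breakdown of Friedel’s law follows from putting fAD = fn + Δf' + iΔf" into the structure-factor sum. This Python sketch uses a hypothetical two-atom structure and illustrative scattering-factor values (not tabulated ones). The key point is that the iΔf" term is the same for hkl and −h−k−l, so F(−h−k−l) is no longer the complex conjugate of F(hkl).

```python
import cmath
import math


def structure_factor(hkl, atoms):
    """F(hkl) = sum_j f_j exp(2*pi*i*(h*x + k*y + l*z)); f_j may be complex."""
    h, k, l = hkl
    return sum(f * cmath.exp(2j * math.pi * (h * x + k * y + l * z)) for (x, y, z), f in atoms)


# Hypothetical two-atom structure; the scattering factors are illustrative numbers only
f_n, df1, df2 = 34.0, -8.0, 4.0    # fn, delta-f', delta-f" for the anomalous atom
atoms_normal = [((0.10, 0.20, 0.30), 6.0), ((0.40, 0.15, 0.70), f_n)]
atoms_anom = [((0.10, 0.20, 0.30), 6.0), ((0.40, 0.15, 0.70), complex(f_n + df1, df2))]

for label, atoms in (("no AD", atoms_normal), ("with AD", atoms_anom)):
    Fp = abs(structure_factor((1, 2, 3), atoms))
    Fm = abs(structure_factor((-1, -2, -3), atoms))
    print(label, round(Fp, 3), round(Fm, 3))  # no AD: equal pair; with AD: unequal pair
```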


[Figure 6. Illustration of the effect of anomalous dispersion, which produces different vector lengths for Fhkl and F-h-k-l. Vector labels: Fn, f’, Δf” and F+++ (left); F-n, f’, Δf” and F--- (right).]

Breakdown of Friedel’s Law

(Fhkl Left) Fn represents the total scattering by "normal" atoms without AD effects, f’ represents the sum of the normal and real AD scattering values (fn + Δf'), Δf" is the imaginary AD component and appears 90° (at a right angle) ahead of the f’ vector, and the total scattering is the vector F+++.

(F-h-k-l Right) F-n is the mirror image of Fn (at -Φhkl) and the f’ vector is likewise mirrored, while the Δf" vector is once again 90° ahead of f’. The resultant vector, F--- in this case, is obviously shorter than the F+++ vector.


Collecting Anomalous Scattering Data

• Anomalous scatterers, such as selenium, are generally incorporated into the protein during expression of the protein or are soaked into the crystals in a manner similar to preparing a heavy atom derivative.

• Bromine, iodine, xenon and traditional heavy atom compounds are also good anomalous scatterers.


How strong is the signal?

• The anomalous signal, the difference between |F+++| and |F---|, is generally about one order of magnitude smaller than that between |FPH(hkl)| and |FP(hkl)|.

• Thus, the signal-to-noise (S/n) level in the data plays a critical role in the success of anomalous scattering experiments, i.e. the higher the S/n in the data, the greater the probability of producing an interpretable electron density map.


Why does it work at all?

The lack of isomorphism problem is much milder for anomalous data than for isomorphous replacement:

• One sample, not two or more
• Unit cell is by definition (?) identical
• Molecule is in the same place within that unit cell
• That partly compensates for the low S/N


Why is selenium a good choice?

• Methionine is a relatively rare amino acid: 2.4% (vs. average of 5%)

• So there aren’t a huge number of Mets in a typical protein, but there generally are a few

• It’s possible to make E. coli auxotrophic for methionine and then feed it selenomethionine in its place

• This incorporates SeMet stoichiometrically and covalently, which is definitely good!


Anomalous data collection
• The anomalous signal can be optimized by data collection at or near the absorption edge of the anomalous scatterer. This requires a tunable X-ray source such as a synchrotron.

• The S/n of the data can also be increased by collecting redundant data.

• The two common anomalous scattering experiments are Multiwavelength Anomalous Dispersion (MAD) and single-wavelength anomalous scattering/diffraction (SAS or SAD)

• The SAS technique is becoming more popular since it does not require a tunable X-ray source.
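Why redundancy helps can be shown with a quick simulation: averaging n independent measurements shrinks the random error roughly as σ/√n. This Python sketch assumes purely random, independent Gaussian errors; systematic errors such as absorption or radiation damage do not average out this way. The function name and parameters are hypothetical.

```python
import random
import statistics


def spread_of_mean(n, trials=4000, sigma=1.0, seed=0):
    """Standard deviation of the average of n noisy measurements, estimated by simulation.

    Assumes independent Gaussian errors of standard deviation sigma on each measurement.
    """
    rng = random.Random(seed)
    means = [statistics.fmean([rng.gauss(0.0, sigma) for _ in range(n)])
             for _ in range(trials)]
    return statistics.pstdev(means)


for n in (1, 4, 16):
    print(n, round(spread_of_mean(n), 3))  # close to sigma / sqrt(n): about 1, 0.5, 0.25
```

Since the anomalous difference is small, a fourfold increase in multiplicity (halving the error) can matter more for anomalous data than for ordinary amplitudes.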


Increasing Number of SAS Structures
[Figure: number of structures solved by MAD and by SAD]


Increasing S/n with Redundancy




Multiwavelength Anomalous Dispersion

Note: FP = protein, FH1 = heavy atom, F+PH = F+++, F-PH = F---, F+H” = Δf”+++, F-H” = Δf”---

The centers of the F+PH and F-PH circles are placed at the ends of the vectors -F+H” and -F-H”, respectively.

• In the MAD experiment, a strong anomalous scatterer is introduced into the crystal and data are recorded at several wavelengths (peak, inflection and remote) near the X-ray absorption edge of the anomalous scatterer. The phase ambiguity is resolved in a manner similar to the use of multiple derivatives in the MIR technique.


Single Wavelength Anomalous Scattering

• The SAS method, which combines the use of SAS data and solvent flattening to resolve the phase ambiguity, was first introduced in the ISAS program (Wang, 1985). The technique is very similar to resolving the phase ambiguity in SIR data.

• The SAS method does not require a tunable source, and successful structure determination can be carried out using a home X-ray source on crystals containing anomalous scatterers with sufficiently large Δf”, such as iron, copper, iodine, xenon and many heavy atom salts.


Sulfur S-SAS: experimental realities

• The ultimate goal of the SAS method is the use of S-SAS to phase protein data, since most proteins contain sulfur. However, sulfur has a very weak anomalous scattering signal, with Δf” = 0.56 e- for Cu X-rays. The S-SAS method requires careful data collection and crystals that diffract to 2 Å resolution.

• A high-symmetry space group (more internal symmetry equivalents) increases the chance of success.

• The use of softer X-rays such as Cr Kα (λ = 2.2909 Å) doubles the sulfur signal (Δf” = 1.14 e-).

• There are over 20 S-SAS structures in the Protein Data Bank.


What is the Limit of the SAS Method?

• Electron density maps of Rhe by sulfur-ISAS
• Calculated using simulated data in 1983
• Δf” = 0.56 e- using Cu Kα X-rays

Wang (1985), Methods Enzymol. 115: 90-112


Molecular Replacement

• Molecular replacement has proven effective for solving macromolecular crystal structures based upon the knowledge of homologous structures.

• The method is straightforward and reduces the time and effort required for structure determination because there is no need to prepare heavy atom derivatives and collect their data.

• Model building is also simplified, since little or no chain tracing is required.


Molecular Replacement: Practical Considerations

• The 3-dimensional structure of the search model must be very close (< 1.7 Å r.m.s.d.) to that of the unknown structure for the technique to work.

• Sequence homology between the model and unknown protein is helpful but not strictly required. Success has been observed using search models having as low as 17% sequence similarity.

• Several computer programs, such as AMoRe, X-PLOR/CNS and PHASER, are available for MR calculations.
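The r.m.s.d. criterion compares matched atoms of the model and the target. A minimal Python sketch, assuming the two coordinate lists are already matched atom-for-atom (e.g. Cα to Cα) and already superposed; a real comparison would first find the optimal superposition. The coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between matched atoms (assumes already superposed)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two equal-length, non-empty coordinate lists")
    sq = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
             for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))


model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]    # hypothetical, already superposed
target = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0)]
print(rmsd(model, target))  # 1.0 (every atom is displaced by 1 Å), below the 1.7 Å guideline
```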


px.cryst.bbk.ac.uk/03/sample/molrep.htm

How Molecular Replacement Works
• Use a model of the protein to estimate phases

• Must be a structural homologue (RMSD < 1.7 Å)

• Two-step process: rotation and translation

• Find orientation of model (red → black)

• Find location of oriented model (black → blue)


Using a protein model to estimate phases: the rotation function

• We need to determine the model’s orientation in X1’s unit cell

• We use a Patterson search approach in (α, β, γ), which are Euler angles associated with the rotational space


Euler angles for rotation function

The coordinate system is rotated by:

• an angle α around the original z axis;

• then by an angle β around the new y axis;

• and then by an angle γ around the final z axis.

zyz convention
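The zyz rotation described above can be written as a product of elementary rotation matrices. This Python sketch uses plain lists for 3 × 3 matrices and assumes rotations about the moving (intrinsic) axes, for which the matrices multiply as Rz(α)·Ry(β)·Rz(γ). Programs differ in sign and active/passive conventions, so this is not guaranteed to match the convention of any particular MR package.

```python
import math


def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_y(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def euler_zyz(alpha, beta, gamma):
    """alpha about z, then beta about the new y, then gamma about the new z (radians).

    For intrinsic (moving-axis) rotations the matrices multiply in the same order:
    R = Rz(alpha) @ Ry(beta) @ Rz(gamma).
    """
    return matmul(matmul(rot_z(alpha), rot_y(beta)), rot_z(gamma))


def rotate(R, v):
    """Apply a 3 x 3 rotation matrix to a coordinate triple."""
    return tuple(sum(R[i][k] * v[k] for k in range(3)) for i in range(3))


print(rotate(euler_zyz(math.pi / 2, 0.0, 0.0), (1.0, 0.0, 0.0)))  # approximately (0, 1, 0)
```

With β = 0 the two z rotations simply add (α + γ), which is one reason the rotation function is often sampled with care near β = 0.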


Using a protein model to estimate phases: translation function

• We need to determine the oriented model’s location in X1’s unit cell

• We do this with an R-factor search, where $R = \sum \big||F_{obs}| - k|F_{calc}|\big| / \sum |F_{obs}|$ and k is a scale factor


Translation functions
• The oriented model is stepped through the X1 unit cell using small increments in x, y, and z (e.g. x → x + step)

• The point where R is lowest represents the correct location

• There exists an alternative method that uses maximum likelihood to find the translation peak; this notion is embodied in the software package PHASER by Randy Read
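A toy version of the R-factor translation search, in one dimension. It assumes a cell with an inversion centre at the origin, standing in for real crystal symmetry: in P1 a translation changes only the phases, so an amplitude-based R factor could not locate the model. The model, weights and grid are hypothetical, and the scale k is fitted by least squares at each step. Because the centre at ½ is an equally valid origin, t and t + ½ give the same R.

```python
import math


def centric_amplitudes(positions, weights, t, hmax):
    """|F(h)|, h = 1..hmax, for a 1-D cell with an inversion centre at the origin.

    The oriented model's atoms sit at x + t and their inversion mates at -(x + t), so
    F(h) = sum_j 2 f_j cos(2*pi*h*(x_j + t)) is real.
    """
    return [abs(sum(2 * f * math.cos(2 * math.pi * h * (x + t))
                    for x, f in zip(positions, weights)))
            for h in range(1, hmax + 1)]


def r_factor(f_obs, f_calc):
    """R = sum| |Fobs| - k|Fcalc| | / sum |Fobs|, with a least-squares scale factor k."""
    k = sum(o * c for o, c in zip(f_obs, f_calc)) / sum(c * c for c in f_calc)
    return sum(abs(o - k * c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


def translation_search(f_obs, positions, weights, hmax, n_steps=100):
    """Step the oriented model through the cell (t = i/n_steps); return (lowest R, t)."""
    return min((r_factor(f_obs, centric_amplitudes(positions, weights, i / n_steps, hmax)),
                i / n_steps)
               for i in range(n_steps))


# Hypothetical oriented model (fractional x) and "observed" data generated at t = 0.30
model, weights = [0.05, 0.11, 0.26], [6.0, 6.0, 6.0]
f_obs = centric_amplitudes(model, weights, 0.30, hmax=15)
print(translation_search(f_obs, model, weights, hmax=15))  # R ~ 0 at t = 0.3 or 0.8
```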
