
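Electron Density

• The structure factor of the unit cell is defined as:

$F_{\text{unit cell}}(\mathbf{S}) = \int \rho(\mathbf{r}) \, \exp(2\pi i \, \mathbf{r} \cdot \mathbf{S}) \, d\mathbf{r}$

• Using the inverse Fourier transform:

$\rho(\mathbf{r}) = \int F(\mathbf{S}) \, \exp(-2\pi i \, \mathbf{r} \cdot \mathbf{S}) \, d\mathbf{S}$

• In practice you make a discrete inverse Fourier transform:

$\rho(\mathbf{r}) = \sum_{hkl} F_{hkl} \, \exp(-2\pi i \, \mathbf{r} \cdot \mathbf{S}_{hkl})$, where $F_{hkl} = |F_{hkl}| \exp(i \alpha_{hkl})$

• But you measure the X-ray diffraction intensities:

$I_{hkl} \propto |F_{hkl}|^2$

• The fact that you cannot measure the phase $\alpha_{hkl}$ directly is called

THE PHASE PROBLEM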
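The effect of losing the phases can be illustrated with a small one-dimensional sketch. It is not part of the original slides: the two point atoms, their scattering factors and the resolution cut-off `h_max` are hypothetical values chosen for illustration. The map computed with the true phases peaks at the heavier atom, whereas a map computed from the amplitudes alone (all phases set to zero) peaks at the origin.

```python
import cmath
import math

def structure_factors(atoms, h_max):
    """F_h = sum_j f_j exp(2*pi*i*h*x_j) for 1D point atoms, h = -h_max..h_max."""
    return {h: sum(f * cmath.exp(2j * math.pi * h * x) for f, x in atoms)
            for h in range(-h_max, h_max + 1)}

def density(F, x):
    """Discrete inverse transform: rho(x) = sum_h F_h exp(-2*pi*i*h*x)."""
    return sum(Fh * cmath.exp(-2j * math.pi * h * x) for h, Fh in F.items()).real

# Hypothetical (scattering factor, fractional coordinate) pairs.
atoms = [(6.0, 0.20), (8.0, 0.55)]
F = structure_factors(atoms, h_max=15)
grid = [n / 100 for n in range(100)]

# Map calculated with the true complex structure factors.
rho_true = [density(F, x) for x in grid]
peak_true = grid[rho_true.index(max(rho_true))]

# Map calculated from amplitudes only, i.e. every phase set to zero.
F_amplitudes_only = {h: abs(Fh) for h, Fh in F.items()}
rho_amp = [density(F_amplitudes_only, x) for x in grid]
peak_amp = grid[rho_amp.index(max(rho_amp))]

print("highest peak with true phases:", peak_true)
print("highest peak with zero phases:", peak_amp)
```

The amplitudes-only map is not random: it is dominated by a peak at the origin, which is why some independent source of phase information is needed.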

Solutions to the Phase Problem

• Direct methods:
- Based upon systematic relations between certain reflections.
- Need high resolution data & relatively small systems.
- Overwhelmingly the most popular method for small molecule structures.

• Molecular replacement:
- Find a molecule of known structure which is close enough to your protein of interest to provide a good first guess.
- Becoming more popular as the spectrum of possible structures is filled up.

• Heavy atom methods:
- Soak an atom which is a strong scatterer (eg. Hg, Fe, Pb, I, Se ...) into your crystal.
- Replace the methionines in your protein with selenomethionine derivatives.
- Use Multiple Isomorphous Replacement.
- Use Multiple or Single Anomalous Diffraction.
- An old & powerful method for finding phases.

Heavy atom methods

• You must search around for a heavy atom which binds within your crystal & doesn’t destroy the crystal lattice.
- Can be extremely frustrating!
- Many soaking, freezing & diffraction experiments.
- The heavy atom must bind in an ordered way to the protein.

• Suppose you have a protein with structure factor $F_P$.

• For every X-ray intensity measured, the addition of the heavy atom adds a term $F_H$ to the scattering: $F_{PH} = F_P + F_H$.

[Figure: vector diagram of F_P, F_H and F_PH]

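Heavy atom methods

• Assuming that $F_H$ has a phase angle $\alpha_H$ and that $|F_H|$ is small compared with $|F_P|$, the structure factor amplitude will be perturbed by

$\Delta F_{PH} = |F_{PH}| - |F_P| \approx |F_H| \cos(\alpha_P - \alpha_H)$

[Figure: vector triangle of F_P, F_H and F_PH, with the angle α_PH - α_H marked]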

Heavy atom methods

• If you can find the location of your heavy atom $\mathbf{r}_H$, then you can calculate the heavy atom structure factor

$F_H = f_H \exp(2\pi i \, \mathbf{r}_H \cdot \mathbf{S}_{hkl})$

just like any other atom within the protein.

• If you have measured $|F_P|$ and $|F_{PH}|$ and have calculated $F_H$, you can recover a constraint on the phases for the protein:

[Figure: two vector diagrams of F_P, F_H and F_PH]
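The constraint from a single derivative can be sketched with the cosine rule applied to the triangle $F_{PH} = F_P + F_H$. This is a minimal illustration and not the lecturer's code: the function name `sir_phase_candidates` and all numerical values are hypothetical. Given the two measured amplitudes and a calculated $F_H$, it returns the two protein phases that close the triangle.

```python
import cmath
import math

def sir_phase_candidates(fp_amp, fph_amp, f_h):
    """Return the two phases of F_P (radians) consistent with one derivative.

    Uses |F_PH|^2 = |F_P|^2 + |F_H|^2 + 2 |F_P| |F_H| cos(alpha_P - alpha_H).
    """
    fh_amp, alpha_h = abs(f_h), cmath.phase(f_h)
    cos_diff = (fph_amp**2 - fp_amp**2 - fh_amp**2) / (2.0 * fp_amp * fh_amp)
    # Measurement errors can push the cosine slightly outside [-1, 1].
    cos_diff = max(-1.0, min(1.0, cos_diff))
    diff = math.acos(cos_diff)
    return alpha_h + diff, alpha_h - diff

# Hypothetical "true" protein structure factor, used only to simulate the data.
true_fp = cmath.rect(100.0, math.radians(40.0))
f_h = cmath.rect(20.0, math.radians(110.0))
fph_amp = abs(true_fp + f_h)          # what the derivative crystal would give

candidates = [math.degrees(a) % 360.0
              for a in sir_phase_candidates(abs(true_fp), fph_amp, f_h)]
print("possible protein phases (degrees):", candidates)
```

The two candidates sit symmetrically about $\alpha_H$; one of them is the true phase, and a single derivative cannot tell which.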

Additional constraints

• A single heavy atom derivative gives you two possible phases for every reflection (h,k,l), only one of which is correct.
• For the approach to work the crystals must be isomorphous (ie. same a, b, c, α, β, γ & space group).
• In order to determine which phase is correct you need to find additional derivatives.
• This is called Multiple Isomorphous Replacement (MIR).

Additional constraints

• If you have a second constraint:
- It is better to draw the diagram with $F_P$ centred at the origin rather than $F_H$ & $F_{PH}$.
- This then represents $F_P = F_{PH} - F_H$.

• You should recover one place where three circles intersect.
• This is your solution for the phases.

[Figure: circle diagram centred on F_P, marking the possible solution for the light green & light blue measurements and the possible solution for the dark green & dark blue measurements]

Combining phase information

• In practice there are errors with respect to each heavy atom experiment.
- Therefore you recover a probability distribution for a phase, rather than an absolute phase.
- Best to take a weighted sum of the probabilities to determine the experimental phases.

• m is the length of the probability-weighted experimental phase vector & is called the "figure of merit".

• Ideally m = 1, but in practice m < 1.

[Figure: phase probability distributions P(α_hkl) plotted against α_hkl, with the figure of merit m indicated]
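The idea of a probability-weighted phase can be sketched as follows. The slides do not give the form of the probability distribution, so the Gaussian error model on the lack of closure used here is an assumption, as are the function names, the error estimate `sigma` and all numerical values. With one derivative the distribution has two peaks and m is low; a second derivative removes the ambiguity and m approaches 1.

```python
import cmath
import math

def phase_probability(fp_amp, derivatives, sigma, n_steps=360):
    """Sample P(alpha) on a grid of phase angles.

    derivatives: list of (|F_PH|, F_H) pairs, with F_H complex.
    Assumed model: P(alpha) is proportional to exp(-closure^2 / (2 sigma^2)),
    multiplied over all derivatives.
    """
    angles = [2.0 * math.pi * k / n_steps for k in range(n_steps)]
    weights = []
    for alpha in angles:
        log_p = 0.0
        for fph_amp, f_h in derivatives:
            closure = abs(cmath.rect(fp_amp, alpha) + f_h) - fph_amp
            log_p -= closure**2 / (2.0 * sigma**2)
        weights.append(math.exp(log_p))
    total = sum(weights)
    return angles, [w / total for w in weights]

def figure_of_merit(angles, probs):
    """Length and direction of the probability-weighted mean phase vector."""
    centroid = sum(p * cmath.exp(1j * a) for a, p in zip(angles, probs))
    return abs(centroid), cmath.phase(centroid)

# Hypothetical reflection with a true phase of 40 degrees and two derivatives.
fp_amp = 100.0
true_fp = cmath.rect(fp_amp, math.radians(40.0))
fh1 = cmath.rect(20.0, math.radians(110.0))
fh2 = cmath.rect(25.0, math.radians(200.0))
deriv1 = (abs(true_fp + fh1), fh1)
deriv2 = (abs(true_fp + fh2), fh2)

m_one, best_one = figure_of_merit(*phase_probability(fp_amp, [deriv1], sigma=2.0))
m_two, best_two = figure_of_merit(*phase_probability(fp_amp, [deriv1, deriv2], sigma=2.0))
print(f"one derivative:  m = {m_one:.2f}")
print(f"two derivatives: m = {m_two:.2f}, phase = {math.degrees(best_two):.1f} degrees")
```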

Finding Heavy Atom Positions

• The previous treatment relied on finding the heavy atom positions.
• In practice this is usually solvable using the Patterson function:

$P(u,v,w) = \sum_{hkl} |F_{hkl}|^2 \exp(2\pi i \, \mathbf{u} \cdot \mathbf{S})$

• You can calculate it without any phase information, directly from the measured intensities.

• Note that $|F_{hkl}|^2 = F_{hkl} \times F_{hkl}^*$, hence

$P(u,v,w) = \rho(\mathbf{r}) \ast \rho^{\dagger}(\mathbf{r}) = \int \rho(\mathbf{r}) \, \rho(\mathbf{r} + \mathbf{u}) \, d\mathbf{r}$

where we use the convolution theorem for Fourier transforms.
• $\rho^{\dagger}(\mathbf{r}) = \rho(-\mathbf{r})$ represents the real electron density inverted through the origin.
• eg. If you had two heavy atoms in a unit cell the Patterson function would look like:

[Figure: Patterson map for two heavy atoms, with the inter-atomic vector u marked]

Patterson Function

• Since

$P(u,v,w) = \rho(\mathbf{r}) \ast \rho(-\mathbf{r}) = \int \rho(\mathbf{r}) \, \rho(\mathbf{r} + \mathbf{u}) \, d\mathbf{r}$

the Patterson map gives you:
• A central peak at u = 0, since the density sits upon itself for all atoms (including the protein).
• An additional peak whenever the shift u places one heavy atom upon another.
- It rapidly becomes very complicated but can be solved (this used to be done by inspection) if you have a small number of heavy atoms.
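A one-dimensional sketch shows these peaks emerging from intensities alone. The two point atoms, their scattering factor of 80 and the cut-off `h_max` are hypothetical; no phase is used anywhere in the calculation of the map.

```python
import cmath
import math

# Two hypothetical heavy atoms: (scattering factor, fractional coordinate).
atoms = [(80.0, 0.10), (80.0, 0.40)]
h_max = 20

# "Measured" intensities |F_h|^2; the phases are discarded at this point.
intensities = {
    h: abs(sum(f * cmath.exp(2j * math.pi * h * x) for f, x in atoms)) ** 2
    for h in range(-h_max, h_max + 1)
}

def patterson(u):
    """P(u) = sum_h |F_h|^2 exp(2*pi*i*h*u), evaluated at fractional shift u."""
    return sum(I * cmath.exp(2j * math.pi * h * u)
               for h, I in intensities.items()).real

grid = [n / 100 for n in range(100)]
P = [patterson(u) for u in grid]

# Keep local maxima that rise well above the series-termination ripples.
threshold = 0.4 * max(P)
peaks = [grid[n] for n in range(100)
         if P[n] > threshold and P[n] > P[n - 1] and P[n] > P[(n + 1) % 100]]
print("Patterson peaks at u =", peaks)
```

The peaks fall at the origin and at plus and minus the inter-atomic vector (0.30, equivalent to 0.70 by lattice translation), not at the atomic positions themselves.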

Difference Patterson Function

• Since you are looking for the scattering from the heavy atoms & the protein adds only background, it’s more convenient to calculate the difference Patterson:

$\Delta P(u,v,w) = \sum_{hkl} \left( |F_{PH}|^2 - |F_P|^2 \right) \exp(2\pi i \, \mathbf{u} \cdot \mathbf{S})$

• This difference Patterson map gives you the vectors between the heavy atoms directly.
- Somewhat easier to interpret than the Patterson for $F_{PH}$ itself since it has less background from protein-protein distances.

Solving the Difference Patterson

• If you succeed in obtaining a derivative & calculating a Patterson function then you must solve it to proceed.
- The Patterson function gives you a set of constraints in 3D which are the inter-atomic vectors between heavy atoms.

• There exist algorithms for finding unique solutions which work for a reasonable number of heavy atoms
- usually ten or fewer, although I think the record may be more than twenty.
- The number of non-origin peaks is N(N-1) for N heavy atoms.

[Figure: inverting the Patterson map (⇒ invert)]
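The N(N-1) count can be checked by listing the inter-atomic vectors explicitly. The three heavy atom positions below are hypothetical fractional coordinates; exact fractions are used so that vectors can be compared without rounding problems.

```python
from fractions import Fraction
from itertools import permutations

def patterson_vectors(positions):
    """All non-origin inter-atomic vectors (b - a), wrapped into the unit cell."""
    return [tuple((b_i - a_i) % 1 for a_i, b_i in zip(a, b))
            for a, b in permutations(positions, 2)]

# Three hypothetical heavy atom sites in fractional coordinates.
heavy_atoms = [
    (Fraction(1, 10), Fraction(1, 5), Fraction(3, 10)),
    (Fraction(2, 5), Fraction(1, 2), Fraction(1, 10)),
    (Fraction(7, 10), Fraction(1, 10), Fraction(3, 5)),
]

vectors = patterson_vectors(heavy_atoms)
n = len(heavy_atoms)
print(f"{n} atoms give {len(vectors)} non-origin peaks")

# Every vector u is accompanied by -u, so the Patterson map is centrosymmetric.
centrosymmetric = all(tuple((-c) % 1 for c in v) in vectors for v in vectors)
```

Solving the Patterson means going the other way, from this list of vectors back to the atomic positions, which is why the problem becomes hard as N grows.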

Magnitude of intensity changes with heavy atoms

• You may ask how one (or a few) heavy atoms within a protein of eg. 50 kDa could be seen?
- eg. Hg has 80 e-
- A protein is a sea of eg. 4,000 non-hydrogen atoms with $f_{\text{average}} \approx 7$ e- (ie. 28,000 e- in total).
- How can you see 80 e- / 28,000 e- = 0.3 %?

• On average:

$F_{\text{protein}} \approx f_{\text{average}} \times \sqrt{N}$

(N is the number of non-hydrogen atoms; this is a random walk result), hence

$I_{\text{protein}} \propto |F_{\text{protein}}|^2 \approx (f_{\text{average}})^2 \times N$

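• But for the heavy atom

$I_H \propto F_H^2$

• Now

$|F_{\text{protein}} + F_H| \approx f_{\text{average}} \times \sqrt{N} + |F_H|$

• Hence

$I_{PH} \approx |f_{\text{average}} \times \sqrt{N} + F_H|^2 \approx (f_{\text{average}} \times \sqrt{N})^2 + 2 (f_{\text{average}} \times \sqrt{N}) \times F_H + F_H^2$

• Therefore

$\Delta I_{PH} = I_{PH} - I_{\text{protein}} \approx (f_{\text{average}} \times \sqrt{N})^2 + 2 (f_{\text{average}} \times \sqrt{N}) \times F_H + F_H^2 - (f_{\text{average}} \times \sqrt{N})^2$

$= 2 (f_{\text{average}} \times \sqrt{N}) \times F_H + F_H^2$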

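• The first term can be evaluated for eg. 4,000 non-hydrogen atoms:

$2 (f_{\text{average}} \times \sqrt{N}) \times F_H = 2 \times 7 \times \sqrt{4000} \times 80 = 70{,}835$

• The second term

$F_H^2 = 80^2 = 6{,}400$

• And

$I_{\text{protein}} \propto |F_{\text{protein}}|^2 \approx (f_{\text{average}})^2 \times N = 196{,}000$

• Hence in this case, keeping only the first term,

$\Delta I_{PH} / I_{\text{protein}} \approx 71 / 196 = 36\ \%$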

Average intensity change if one heavy atom is bound to a protein:

Protein weight    100 % occupancy    50 % occupancy
14,000            51 %               25 %
28,000            36 %               18 %
56,000            25 %               12 %
112,000           18 %               9 %
224,000           13 %               6 %
448,000           9 %                4 %

In practice a change of 14 % gives a good phasing measurement.
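The trend in the table can be reproduced from the leading term of the derivation, $\Delta I / I \approx 2 F_H / (f_{\text{average}} \sqrt{N})$. This is a sketch under stated assumptions: the function name is hypothetical, the atom counts are chosen so that 4,000 atoms corresponds to the 28,000 row, and partial occupancy is modelled by simply scaling $F_H$. The results agree with the tabulated values to within about one percentage point; the residual differences come from rounding.

```python
import math

def relative_intensity_change(n_atoms, f_average=7.0, f_heavy=80.0, occupancy=1.0):
    """Leading term of delta I / I: 2 * occupancy * f_heavy / (f_average * sqrt(N))."""
    protein_amplitude = f_average * math.sqrt(n_atoms)
    return 2.0 * occupancy * f_heavy / protein_amplitude

# Assumed atom counts: 7 electrons per atom, so 4,000 atoms = 28,000 electrons.
for n_atoms in (2000, 4000, 8000, 16000, 32000, 64000):
    full = relative_intensity_change(n_atoms)
    half = relative_intensity_change(n_atoms, occupancy=0.5)
    print(f"{7 * n_atoms:7d}  {100 * full:5.1f} %  {100 * half:5.1f} %")
```

Because the change scales as $1/\sqrt{N}$ rather than $1/N$, doubling the size of the protein only reduces the signal by a factor of about 1.4.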

Lack of Closure

• In practice when you have found the phases experimentally there is some mismatch.
• The mismatch is called the lack of closure & is given the symbol ε.

[Figure: vector triangles of F_P, F_H and F_PH. Ideal case: the triangle closes. Reality: the triangle does not close.]

Phasing power

• From the mismatch you can estimate the phasing power

$\text{Phasing Power} = \sqrt{ \sum |F_H|^2 \, / \, \sum \varepsilon^2 }$

• A phasing power of 4 is excellent & is rare.
• A value between 1 & 2 is acceptable & means that the scattering of the heavy atom is larger than the lack of closure.

[Figure: vector diagram of F_P, F_H and F_PH]
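As a minimal sketch, the lack of closure and the phasing power can be written as two short functions. The function names and the example numbers are hypothetical, and the sums are taken over all reflections supplied.

```python
import cmath
import math

def lack_of_closure(fp_amp, alpha_p, f_h, fph_amp):
    """Mismatch between the calculated |F_P + F_H| and the measured |F_PH|."""
    return abs(cmath.rect(fp_amp, alpha_p) + f_h) - fph_amp

def phasing_power(fh_amps, closure_errors):
    """Root of (sum of |F_H|^2) over (sum of lack-of-closure squared)."""
    return math.sqrt(sum(a * a for a in fh_amps) /
                     sum(e * e for e in closure_errors))

# A triangle that closes exactly has zero lack of closure.
ideal = lack_of_closure(100.0, 0.0, 20.0 + 0.0j, 120.0)

# Hypothetical heavy atom amplitudes and closure errors for two reflections.
power = phasing_power([20.0, 30.0], [10.0, 15.0])
print("lack of closure (ideal case):", ideal)
print("phasing power:", power)
```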

First Map

• Once you have recovered the experimental phases you can make a Fourier transform.
• At that point, if the electron density is good enough, you can build a structural model into the density.
• If this goes well you can then make iterative rounds of structural refinement, phase improvement, and more model building until you recover a satisfactory model.

Anomalous scattering

• An assumption to date was that the X-ray scattering factor of an atom, $f_j$, was a real number:
- In physics this is equivalent to assuming that all electrons can be treated as free electrons, which therefore contribute a phase change of π to the scattered X-ray.

• When you go close to the atomic energy levels of certain atoms you can no longer assume that this holds.
• X-rays scatter with an anomalous scattering term near an absorption edge.

[Figure: two panels. A transition from K to L shell electrons. A photoelectron ejected from the K shell.]

Atomic absorption coefficient

• For example copper has a K-absorption edge at 1.38 Å due to the photoelectric effect.
• There are also transitions from K to L at 1.43 Å.
• If you have Cu in your protein & you record near 1.43 Å you will have anomalous scattering from this atom.

[Figure: X-ray absorption plotted against X-ray wavelength (Å), with axis ticks at 1.0 and 2.0 Å]

Anomalous scattering

• To describe the anomalous scattering (which is wavelength dependent) we modify the atomic scattering factor for the particular heavy atom:

$f_{\text{anom}} = f + \Delta f + i f'' = f + f' + i f''$

• Two different symbols ($\Delta f$ or $f'$) are used for the second term depending on what book you look at.

[Figure: f, f' and f'' adding up to f_anom in the complex plane]

Consequence for Friedel pairs

• Earlier we showed that the reflection (h,k,l) & its opposite (-h,-k,-l) have the same intensity since

$F_{-h-k-l} = \sum_j f_j(\mathbf{S}_{hkl}) \exp(2\pi i \, \mathbf{r}_j \cdot \mathbf{S}_{-h-k-l}) = (F_{hkl})^*$

& therefore $I_{hkl} = I_{-h-k-l}$.
- These are called Friedel pairs & have the same intensity.

• When you have anomalous scattering this no longer holds:

[Figure: two vector diagrams of F_hkl, F_-h-k-l, F_H, F*_H, F_PH(hkl) and F_PH(-h-k-l). Panels: no anomalous scattering; with anomalous scattering.]
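The breakdown of Friedel's law can be checked numerically in one dimension. The atoms and the anomalous corrections f' = -8 and f'' = 4 below are hypothetical values chosen for illustration. With purely real scattering factors the two amplitudes agree; once one atom has an imaginary component they differ.

```python
import cmath
import math

def structure_factor(atoms, h):
    """F_h = sum_j f_j exp(2*pi*i*h*x_j); f_j may be complex."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for f, x in atoms)

# Hypothetical light atoms: (scattering factor, fractional coordinate).
light_atoms = [(6.0, 0.13), (7.0, 0.42), (8.0, 0.77)]

# The same heavy atom without and with an anomalous contribution f + f' + i f''.
heavy_normal = (34.0, 0.25)
heavy_anomalous = (34.0 - 8.0 + 4.0j, 0.25)

h = 3
for label, heavy in (("no anomalous scattering", heavy_normal),
                     ("with anomalous scattering", heavy_anomalous)):
    atoms = light_atoms + [heavy]
    plus = abs(structure_factor(atoms, h))
    minus = abs(structure_factor(atoms, -h))
    print(f"{label}: |F(h)| = {plus:.2f}, |F(-h)| = {minus:.2f}")

normal_atoms = light_atoms + [heavy_normal]
anomalous_atoms = light_atoms + [heavy_anomalous]
normal_difference = abs(structure_factor(normal_atoms, h)) - abs(structure_factor(normal_atoms, -h))
anomalous_difference = abs(structure_factor(anomalous_atoms, h)) - abs(structure_factor(anomalous_atoms, -h))
```

The difference in the anomalous case comes entirely from the imaginary term f'', since the real part of the scattering factor still obeys Friedel's law.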

• When you have anomalous scattering this no longer holds:

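• The picture is frequently drawn with $F_{PH}(-h-k-l)$ reflected in the real axis.

• When scaling data it is possible to measure these differences due to the anomalous signal.

[Figure: vector diagram of F_hkl, F_H, F_PH(hkl) and the reflected F_PH(-h-k-l), showing the f' and ±f'' contributions and the difference |F_PH(hkl)| - |F_PH(-h-k-l)|. Inset: f, f' and f'' adding up to f_anom.]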

|ΔF|ano Patterson map

• If we make a measurement and are careful to measure all the Friedel pairs we can define:

$|\Delta F_{\text{ano}}| = \{ |F_{PH}(hkl)| - |F_{PH}(-h-k-l)| \} \times f' / 2f''$

where the scaling factor $f'/2f''$ is put in for technical normalisation reasons.
• If we then calculate the Patterson map using

$P_{\text{ano}}(\mathbf{u}) = \sum_{hkl} |\Delta F_{\text{ano}}|^2 \exp(2\pi i \, \mathbf{u} \cdot \mathbf{S})$

this gives us the inter-atomic distance constraints for the anomalous scatterers.
- This is powerful since you only need one set of observations & not two, so the noise level is low even if $|\Delta F_{\text{ano}}|$ is small.

Multiple Anomalous Diffraction

• It is possible to accurately tune synchrotron radiation.
- It is therefore possible to collect diffraction data from the same crystal at different wavelengths.

• Near an X-ray absorption edge the real & imaginary components f’ & f’’ change rapidly:

[Figure: f’’ (electrons) and f’ (electrons) plotted against X-ray energy (keV), from 12.4 to 12.8 keV]

Multiple Anomalous Diffraction

• As such it is possible to record three diffraction data sets from a single crystal:
- At the peak where f’’ has its maximum.
- At the point where f’ has its minimum (ie. most negative).
- Far removed from the above two wavelengths.

• This is called Multiple Anomalous Diffraction (MAD) since you use three wavelengths to get your data.
- You can solve the structure from a single crystal (ie. you don’t need additional derivatives or another native data set).
- No problems with crystals being non-isomorphous.

• The text book has a detailed description of the algebra of solving structures using MAD, but for this course you can assume it’s (more or less) the same as having one native plus two derivative data sets.

SAD & SIR

• In addition to MIR & MAD there is Single Isomorphous Replacement (SIR) or Single Anomalous Diffraction (SAD).
- The same as MIR or MAD but with one data set rather than several.

• In SIR, if your data is good, you take the phases from one heavy atom to be the sum of the two possible phases, each weighted with 50 % probability.
- Then go ahead & calculate a map.
- The wrong choice of phases adds noise.
- Relies on the data being good enough to overcome the additional noise.

• SAD is like SIR but with the variation that you also use the anomalous signal to give additional information.

[Figure: vector diagram of F_P, F_H and F_PH]

Molecular Replacement

• If you believe that your protein may have a structure similar to that of another protein of known structure you can use that information.
- Solving the structure by making a good guess!

• The problem is to place the candidate structure correctly within the unit cell. You then calculate a first electron density map using phases calculated from the known structure.

• Also useful if you have sub-domains of known structure.

The Rotation Function

• Once you choose a molecular replacement model you first need to determine its orientation.
- Done by comparing the experimental Patterson map with one calculated from your candidate structure.
- The Patterson map is sensitive to the orientation of the molecule but not its position within the unit cell.

[Figure: Rotation]

Cross Rotation Function

• An overlap function R of the experimental Patterson P(u) with the rotated version of the candidate model Patterson P’r(ur) is defined as:

$R(\alpha, \beta, \gamma) = \int P(\mathbf{u}) \times P'_r(\mathbf{u}_r) \, d\mathbf{u}$

• Where (α, β, γ) are normally the Euler angles describing the rotation.
• R(α, β, γ) is maximal when the rotation angle is correct & is not affected by where in the unit cell the candidate structural model is placed.

[Figure: two panels. Poor correlation because the overlap is not excellent. If rotated you can recover perfect overlap, ie. R(α, β, γ) is a maximum.]
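A two-dimensional toy version of the rotation search illustrates why the overlap does not depend on translation. This is a simplification and not the integral above: instead of full Patterson maps it compares sets of inter-atomic vectors, scoring their overlap with Gaussians of an assumed width. The model coordinates, the true rotation of 35 degrees and the shift are hypothetical. Because a Patterson is centrosymmetric, a rotation in 2D is only defined modulo 180 degrees, so the search covers 0 to 175 degrees.

```python
import math
from itertools import permutations

def interatomic_vectors(atoms):
    """All vectors between distinct atoms; these are the Patterson peak positions."""
    return [(bx - ax, by - ay) for (ax, ay), (bx, by) in permutations(atoms, 2)]

def rotate(v, angle):
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def overlap(target_vectors, model_vectors, angle, width=0.3):
    """Sum of Gaussian overlaps between target and rotated model vectors."""
    score = 0.0
    for m in model_vectors:
        mx, my = rotate(m, angle)
        for tx, ty in target_vectors:
            score += math.exp(-((mx - tx) ** 2 + (my - ty) ** 2) / (2.0 * width ** 2))
    return score

# Hypothetical search model (coordinates in Angstrom).
model = [(0.0, 0.0), (3.0, 0.0), (3.0, 1.5), (1.0, 4.0)]

# "Unknown" structure: the model rotated by 35 degrees and shifted in the cell.
true_angle, shift = math.radians(35.0), (7.0, -2.0)
target = [(x + shift[0], y + shift[1])
          for x, y in (rotate(atom, true_angle) for atom in model)]

model_vectors = interatomic_vectors(model)
target_vectors = interatomic_vectors(target)   # the shift cancels out here

scores = {deg: overlap(target_vectors, model_vectors, math.radians(deg))
          for deg in range(0, 180, 5)}
best_angle = max(scores, key=scores.get)
print("best rotation (degrees):", best_angle)
```

The shift never enters the score because it cancels when the inter-atomic vectors are formed, which is the property that lets the orientation be found before the position.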

Translation function

• Once you have found the optimal rotational orientation you next need to find the correct translation.
- In this case the Patterson function is useless since it’s insensitive to translation.

• What one does is move the molecule around in the unit cell, calculate the theoretical $F_{hkl}^{\text{calc}}$ values & compare these with the experimental $F_{hkl}^{\text{obs}}$.

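R-factor & Correlation Coefficient

• Two numbers are optimised:
• The R-factor

$R = \sum_{hkl} \big| \, |F_{\text{obs}}| - k |F_{\text{calc}}| \, \big| \; / \; \sum_{hkl} |F_{\text{obs}}|$

which should be minimised
- ie. calculated structure factors are as close as possible to the experimental structure factors.

• The Standard Linear Correlation Coefficient

$C = \sum_{hkl} \left( |F_{\text{obs}}|^2 - \langle |F_{\text{obs}}|^2 \rangle \right) \times \left( |F_{\text{calc}}|^2 - \langle |F_{\text{calc}}|^2 \rangle \right) \times \left\{ \sum_{hkl} \left( |F_{\text{obs}}|^2 - \langle |F_{\text{obs}}|^2 \rangle \right)^2 \times \sum_{hkl} \left( |F_{\text{calc}}|^2 - \langle |F_{\text{calc}}|^2 \rangle \right)^2 \right\}^{-1/2}$

which should be maximised.
- ie. When Fobs is much greater than the average, Fcalc should also be greater than the average etc.

• Useful values are C > 30 % & R < 55 %.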
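Both scores are straightforward to compute from lists of amplitudes. The sketch below follows the two formulas above; the function names and the example amplitudes are hypothetical, and the scale factor k is simply passed in rather than refined.

```python
import math

def r_factor(f_obs, f_calc, k=1.0):
    """R = sum | |F_obs| - k |F_calc| | / sum |F_obs|."""
    return sum(abs(o - k * c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)

def correlation_coefficient(f_obs, f_calc):
    """Linear correlation coefficient between |F_obs|^2 and |F_calc|^2."""
    i_obs = [o * o for o in f_obs]
    i_calc = [c * c for c in f_calc]
    mean_obs = sum(i_obs) / len(i_obs)
    mean_calc = sum(i_calc) / len(i_calc)
    numerator = sum((o - mean_obs) * (c - mean_calc)
                    for o, c in zip(i_obs, i_calc))
    denominator = math.sqrt(sum((o - mean_obs) ** 2 for o in i_obs) *
                            sum((c - mean_calc) ** 2 for c in i_calc))
    return numerator / denominator

# Hypothetical observed and calculated amplitudes for four reflections.
f_obs = [10.0, 20.0, 30.0, 40.0]
f_calc = [12.0, 18.0, 33.0, 37.0]

print("R =", r_factor(f_obs, f_calc))
print("C =", correlation_coefficient(f_obs, f_calc))
```

In a translation search these two functions would be evaluated at every trial position of the model, keeping the position with the lowest R and highest C.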

Direct methods

• If you have a small number of atoms & very good resolution you may recover a structure from a native data set.
• The concept is that there are phase relations between different Bragg reflections:

$\alpha(\mathbf{h}_1) + \alpha(\mathbf{h}_2) + \alpha(-\mathbf{h}_1 - \mathbf{h}_2) \approx 0$

where $\alpha(\mathbf{h})$ is the phase of reflection $\mathbf{h}$.

• Geometrical arguments can be used to show that this holds when atoms sit on the lattice planes but not in-between lattice planes (chapter 11 of the textbook).
- In practice this assumption holds approximately for very strong reflections, which are strong since most of the atoms are scattering in phase & therefore sit on the same lattice planes.
- This assumption itself breaks down for large proteins.

Shake & Bake

• Direct methods are very successful for small molecule crystallography.
- In practice you pick a few phases & derive the rest from the triplet relations for a limited number of strong reflections.
- For molecules with > 150 non-hydrogen atoms the unit cell is so evenly filled with atoms that the phase triplet relation doesn’t work.

• An algorithm has been written to extend this up to about 1000 non-hydrogen atoms, but it requires about 1.2 Å data.
- The principle is that the phase triplet is no longer set to zero but obeys a probability distribution.
- You then shake the phase angles in reciprocal space & bake out the low density regions in real space.

Protein Data Bank Entries

• This month the number of entries in the Protein Data Bank will surpass 30,000.
• Current deposit rate is approaching 6,000 per year or 20 per day.
- Not all are unique!

• X-ray crystallography is responsible for approx 75 %
- Phasing methods have been very successful!
