
A stochastic approach to Molecular Replacement.

Nicholas M. Glykos & Michael Kokkinidis

IMBB, FORTH, Heraklion, Crete, GREECE


“Why ? What’s wrong with AMoRe ?”


“Interesting.

Can we now go back to the AMoRe .log file ?”

stochastic adj.

1. determined by a random distribution of probabilities.

2. (of a process) characterized by a sequence of random variables.

3. governed by the laws of probability.

Etymology : Gk stokhastikos, f. stokhazomai aim at, guess, f. stokhos aim.

crystal2 ~ file Stochastic.ppt

Stochastic.ppt : c program text with garbage

crystal2 ~

“The classical approach to the problem of placing n copies of a search model in the asymmetric unit of a target crystal structure, is to divide this 6n-dimensional optimisation problem into a succession of three-dimensional searches.”


Acta Cryst. (2000), D56, 169-174


The method(s) :

I. Treat all translational & orientational parameters of all molecules as variables whose values are to be simultaneously and independently determined.


II. Assume that the correct solution corresponds to the (pronounced) global minimum of a suitable (?) statistic (like the R-factor, or the linear correlation coefficient between Fo’s and Fc’s, or, Fo² and Fc², or, …).
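For concreteness, the statistics named above can be computed as follows. This is a minimal Python sketch, not code from Queen of Spades, and it assumes the |Fc|'s have already been scaled to the |Fo|'s. Note that the correlation coefficient is maximised (not minimised) at the correct solution, so a minimiser would work with something like 1 - CC.

```python
import math
from typing import Sequence


def r_factor(fo: Sequence[float], fc: Sequence[float]) -> float:
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|); assumes Fc is already on the Fo scale."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(fo, fc))
    denominator = sum(abs(o) for o in fo)
    return numerator / denominator


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cc_intensities(fo: Sequence[float], fc: Sequence[float]) -> float:
    """Correlation between Fo^2 and Fc^2 instead of between amplitudes."""
    return correlation([o * o for o in fo], [c * c for c in fc])
```

The correlation coefficient is insensitive to an overall scale factor, which is one reason it is attractive when the scale between Fo's and Fc's is uncertain.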


III. Use simulated annealing (in the form of a modified reverse Monte Carlo method) to explore the 6n-dimensional parameter space.


Other published optimisation techniques include : a genetic algorithm approach (Chang & Lewis, 1997), an evolutionary search methodology (Kissinger et al., 1999) and a systematic 6D search (Sheriff et al., 1999).

The program :

Name : “Queen of Spades”

Availability : absolutely free, no warranties whatsoever. The distribution includes source code plus precompiled executables for Irix, OSF, Linux, Solaris, VMS & windoze.

Download the latest version via http://origin.imbb.forth.gr/~glykos/

Current stable version : α, Release 0.9.

The reverse Monte Carlo method :

1. Assign random initial positions & orientations to all molecules present in the asymmetric unit of the target crystal structure. Calculate Fc’s from this arrangement.

2. Calculate the R-factor between the Fo’s and the Fc’s. Call this Rold.


3. Randomly choose one of the molecules and alter its orientation and position. Calculate the R-factor resulting from the new arrangement (Rnew).

4. If Rnew < Rold, the new arrangement is accepted and we start again from (3).

5. If the new R-factor is worse, we still accept the move with probability exp[ –(Rnew – Rold) / T ].
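Steps 1 to 5 amount to a Metropolis acceptance loop. The sketch below assumes each molecule is stored as six numbers (three orientation angles plus three fractional translations) and takes the R-factor calculation as a caller-supplied function; `reverse_monte_carlo` and its parameters are illustrative names, not the program's interface.

```python
import math
import random
from typing import Callable, List, Tuple

# One molecule = 3 orientation angles + 3 fractional translations (assumed layout).
Molecule = List[float]


def reverse_monte_carlo(
    molecules: List[Molecule],
    cost: Callable[[List[Molecule]], float],
    max_step: float,
    temperature: float,
    n_steps: int,
    rng: random.Random,
) -> Tuple[List[Molecule], float]:
    """Metropolis search at a fixed temperature T > 0.

    Step 1 (a random starting arrangement) is left to the caller.
    Returns the final arrangement and its cost.
    """
    current = [list(m) for m in molecules]  # work on a copy of the input
    r_old = cost(current)                   # step 2
    for _ in range(n_steps):
        # Step 3: pick one molecule at random and perturb all six of its parameters.
        i = rng.randrange(len(current))
        trial = [list(m) for m in current]
        trial[i] = [p + rng.uniform(-max_step, max_step) for p in trial[i]]
        r_new = cost(trial)
        # Steps 4-5: always accept improvements; accept worse moves
        # with probability exp(-(Rnew - Rold) / T).
        if r_new < r_old or rng.random() < math.exp(-(r_new - r_old) / temperature):
            current, r_old = trial, r_new
    return current, r_old
```

With a toy cost such as the squared distance of every parameter from 0.5 and a very small T, the returned cost ends up below its starting value; in a real search `cost` would compute the Fc's and the R-factor for the trial arrangement.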


Speeding it up :

Avoid FFTs : calculate and store (in core) the molecular transform of the search model.

Keep a table containing the contribution of each molecule to each reflection.

CPU time per step ~ Number of reflections in P1.
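The bookkeeping behind the last two points can be sketched as a table with one row per molecule: moving a molecule only requires replacing its row and correcting the running Fc sums, so a step costs time proportional to the number of reflections. As a deliberate simplification, each "molecule" here is a single unit point scatterer in P1 (no symmetry mates, no interpolation of a stored molecular transform); `ContributionTable` and `point_contribution` are hypothetical names.

```python
import cmath
import math
from typing import List, Sequence, Tuple

HKL = Tuple[int, int, int]


def point_contribution(hkl: HKL, xyz: Sequence[float]) -> complex:
    """Toy stand-in for one molecule's contribution: a unit point scatterer
    at fractional coordinates xyz, P1, no symmetry mates."""
    phase = 2.0 * math.pi * sum(h * x for h, x in zip(hkl, xyz))
    return cmath.exp(1j * phase)


class ContributionTable:
    def __init__(self, reflections: List[HKL], positions: List[Sequence[float]]):
        self.reflections = reflections
        # table[m][r] = contribution of molecule m to reflection r
        self.table = [[point_contribution(hkl, pos) for hkl in reflections]
                      for pos in positions]
        # Fc for each reflection = sum of the column over all molecules
        self.fc = [sum(column) for column in zip(*self.table)]

    def move(self, m: int, new_pos: Sequence[float]) -> None:
        """Replace molecule m's row; cost is O(number of reflections)."""
        new_row = [point_contribution(hkl, new_pos) for hkl in self.reflections]
        for r, (old, new) in enumerate(zip(self.table[m], new_row)):
            self.fc[r] += new - old  # running sums may drift slightly over many moves
        self.table[m] = new_row

    def amplitudes(self) -> List[float]:
        return [abs(f) for f in self.fc]
```

A real implementation might occasionally recompute the sums from the table to limit accumulated rounding error.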

Annealing schedules :

Constant temperature run. Linear temperature gradient (slow cooling). “Heating bath” mode.

At T=0.3125000, average R=0.59937

At T=0.1562500, average R=0.59707

At T=0.0781250, average R=0.59861

At T=0.0390625, average R=0.59028

At T=0.0195312, average R=0.58783

At T=0.0097656, average R=0.57545

At T=0.0048828, average R=0.55527

At T=0.0024414, average R=0.53016

At T=0.0012207, average R=0.52038

At T=0.0006104, average R=0.51799

At T=0.0003052, average R=0.51524
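The temperatures in this sample log halve from one report to the next, i.e. they follow a geometric schedule with factor 0.5. The helpers below generate constant, linear and geometric schedules; they are an illustrative sketch only, and the "heating bath" mode is not reproduced because its details are not given here.

```python
from typing import List


def constant_schedule(t: float, n_steps: int) -> List[float]:
    """Constant temperature run."""
    return [t] * n_steps


def linear_schedule(t_start: float, t_end: float, n_steps: int) -> List[float]:
    """Linear temperature gradient from t_start to t_end (slow cooling)."""
    if n_steps == 1:
        return [t_start]
    return [t_start + (t_end - t_start) * k / (n_steps - 1) for k in range(n_steps)]


def geometric_schedule(t_start: float, factor: float, n_stages: int) -> List[float]:
    """Temperature multiplied by `factor` at each stage (0.5 halves it)."""
    return [t_start * factor ** k for k in range(n_stages)]
```

Formatting `geometric_schedule(0.3125, 0.5, 11)` with seven decimals reproduces the eleven temperatures in the log.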


Move size control :

Constant move size : $\max(\Delta t) = d_{\min} / \max(a,b,c)$, $\max(\Delta\kappa) = d_{\min}$ (in degrees).

Move size linearly dependent on current R-factor and time step :

$\max(\Delta t) = 0.5\,R\,(1.0 - t/t_{\mathrm{total}})$

$\max(\Delta\kappa) = \pi\,R\,(1.0 - t/t_{\mathrm{total}})$
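Both move-size rules translate directly into code. In this sketch the translational limit is assumed to be in fractional units and, in the time-dependent rule, the rotational limit is assumed to be in radians (hence the factor π); the function names are hypothetical.

```python
import math
from typing import Tuple


def constant_move_sizes(d_min: float, a: float, b: float, c: float) -> Tuple[float, float]:
    """Fixed limits: max(dt) = d_min / max(a, b, c), max(dkappa) = d_min degrees."""
    max_dt = d_min / max(a, b, c)  # assumed fractional units
    max_dkappa_deg = d_min         # numerically equal to d_min, in degrees
    return max_dt, max_dkappa_deg


def adaptive_move_sizes(r: float, step: int, total_steps: int) -> Tuple[float, float]:
    """Limits proportional to the current R-factor, shrinking linearly with elapsed steps."""
    remaining = 1.0 - step / total_steps
    max_dt = 0.5 * r * remaining              # assumed fractional units
    max_dkappa_rad = math.pi * r * remaining  # assumed radians
    return max_dt, max_dkappa_rad
```

For example, with 4 Å data and a longest cell edge of 80 Å the constant rule allows translations of up to 0.05 (fractional) and rotations of up to 4°.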

Scaling : To B or not to B ?

The default is to scale |Fc|’s to |Fo|’s using both a scale and a temperature factor, but …
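One simple way to obtain such a scale and temperature factor is a straight-line fit of ln(|Fo|/|Fc|) against sin²θ/λ² = 1/(4d²), assuming |Fo| ≈ k exp(-B/(4d²)) |Fc|. This is a sketch of the general technique, not necessarily what the program does, and it requires strictly positive amplitudes; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def fit_scale_and_b(fo: Sequence[float], fc: Sequence[float],
                    d: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of ln(Fo/Fc) = ln(k) - B * s2, with s2 = 1/(4 d^2)."""
    xs = [1.0 / (4.0 * di * di) for di in d]           # sin^2(theta)/lambda^2
    ys = [math.log(o / c) for o, c in zip(fo, fc)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope                  # (k, B)


def apply_scale(fc: Sequence[float], d: Sequence[float], k: float, b: float) -> List[float]:
    """Put |Fc| on the |Fo| scale: k * exp(-B / (4 d^2)) * |Fc|."""
    return [k * math.exp(-b / (4.0 * di * di)) * c for c, di in zip(fc, d)]
```

Dropping the B term ("not to B") reduces the fit to a single overall scale factor.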


0.32±0.02 23±5

Bulk solvent correction :

The absence of a bulk-solvent correction from this type of calculation is a serious problem : it introduces systematic errors in the data up to about 5Å resolution, and makes a low-resolution cutoff necessary.


The exponential scaling model algorithm allows a computationally efficient and model-independent correction to be applied : $F_{\mathrm{corrected}} = F_p \, \{ 1.0 - k_{\mathrm{sol}} \exp[ -B_{\mathrm{sol}} / d^2 ] \}$
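Applied reflection by reflection, the correction is a one-liner. The sketch follows the slide's form, with the exponent written as -B_sol/d²; other programs commonly write it as -B_sol s²/4 with s = 1/d, so B_sol values are not directly comparable between the two conventions. The function name is hypothetical.

```python
import math


def bulk_solvent_correct(fp: float, d: float, k_sol: float, b_sol: float) -> float:
    """F_corrected = F_p * (1 - k_sol * exp(-B_sol / d^2)), as written on the slide.

    fp is the model amplitude, d the resolution (in Angstrom) of the reflection.
    """
    return fp * (1.0 - k_sol * math.exp(-b_sol / (d * d)))
```

At low resolution (large d) the exponential tends to 1 and the model amplitude is reduced by a factor of about (1 - k_sol); at high resolution the correction vanishes.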


Bulk solvent correction ?

Acta Cryst. (2000), D56, 1070-1072


Using the program :

Input : a .pdb file, and a formatted file containing h,k,l,F,σ(F).

Running the program : $ Qs -auto 1, or, $ Qs -auto 2, etc. (no scripts), or, $ Qs <script.file>

Output : .pdb files containing the final coordinates for each model, plus a packing diagram for each solution.
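The reflection file can be read with a few lines of code. The sketch assumes one reflection per line with whitespace-separated h, k, l, F and σ(F), and skips lines with fewer than five fields; the exact layout the program accepts is not specified here, so treat this format and the names below as assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Reflection:
    h: int
    k: int
    l: int
    f: float
    sigma: float


def read_reflections(lines: Iterable[str]) -> List[Reflection]:
    """Parse free-format 'h k l F sigma(F)' lines (assumed layout)."""
    reflections = []
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # assumed: skip blank or incomplete lines
        h, k, l = (int(v) for v in fields[:3])
        reflections.append(Reflection(h, k, l, float(fields[3]), float(fields[4])))
    return reflections
```

In practice one would pass an open file object, e.g. `read_reflections(open("data.hkl"))`, where the file name is hypothetical.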

Examples : A 5D problem.

One molecule of lysozyme per a.u.

Monoclinic space group (C2), 4Å data.

rms deviation of model 1.4Å.

Up to ±20% noise added to error-free data.

About 90 seconds of CPU time per minimisation.

Examples : A 6D problem (1).

Target structure 1bvx, search model 2lz2 (rms deviation 1.3Å).

One molecule of lysozyme per a.u.

Tetragonal space group (P4₃2₁2).

Real 15-4Å data. About 3.8 hours of CPU time per minimisation.

Examples : A 6D problem (2).

Target structure 1b6q. 30% solvent. Search model : incomplete poly-Ala. One monomer of Rop per a.u.

Orthorhombic space group (C222₁). Real 15-4Å data. About 40 minutes of CPU time per run.


Examples : An 11D problem.

Target structure 1lys, model 2ihl (rmsd 1.52 & 1.56Å).

Two molecules of lysozyme per asymmetric unit.

Monoclinic space group (P2₁), 4Å data.

±20% noise added to error-free data.

Solutions appear after ~3.8 hours of CPU time.
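The dimensionalities quoted in these examples follow from 6n minus the number of directions along which the origin is free to float: in the polar space groups C2 and P2₁ one translational parameter of the first molecule can be fixed, while P4₃2₁2 and C222₁ have no floating origin. A sketch, with a small hand-written lookup table (a hypothetical helper) covering only the space groups mentioned here plus P1:

```python
# Number of floating-origin directions for a few space groups (hand-written subset).
FLOATING_ORIGIN_AXES = {
    "P1": 3,      # origin completely free
    "P21": 1,     # free along the 2-fold (b) axis
    "C2": 1,      # free along the 2-fold (b) axis
    "P43212": 0,
    "C2221": 0,
}


def search_dimensionality(n_molecules: int, space_group: str) -> int:
    """6 parameters per molecule, minus translations fixed by the arbitrary origin."""
    return 6 * n_molecules - FLOATING_ORIGIN_AXES[space_group]
```

This gives 5 for one molecule in C2, 6 for one molecule in P4₃2₁2 or C222₁, and 11 for two molecules in P2₁.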

Disadvantages :

In most cases, treating the problem as 6n-dimensional is a waste of CPU time.

You can only have one search model (i.e. you cannot search simultaneously with your DNA & protein models).

The structure of the search model is kept fixed throughout the calculation.


The (putative) evidence from the self-rotation function and/or the native Patterson function is ignored (but, in a way, for n > 1 it is also ignored by the traditional methods).

When the starting model deviates significantly from the target structure, (i) there is no guarantee that the global minimum of any chosen statistic will correspond to the correct solution, (ii) traditional methods may be more sensitive in identifying the correct solution.

Advantages :

If there are just one or two molecules per asymmetric unit and CPU time is not a problem, the method can be used as a last-ditch effort to show conclusively that there is no such thing as a pronounced global minimum (or otherwise ?).

The automatic (black box) mode is really black: no keywords, no scripts, just a .pdb file containing the model and an ASCII file containing h,k,l,F,σ(F).


The computational procedures differ so much from those used in conventional methods that the results obtained can be considered independent.

The method is honest in the sense that it is rather unlikely to find a wrong solution that gives a simultaneous, sudden drop of both R and Rfree and also has a reasonable packing arrangement.

A word of caution …

Res R Corr

0.020 0.57 0.66

0.030 0.68 0.43

0.040 0.61 0.58

0.050 0.66 0.43

0.060 0.64 0.50

0.111 0.61 0.42

----- ---- ----

0.64 0.51


Conclusion :

Substituting computing for thinking will probably fail for n ≥ 3.

Recommended