A stochastic approach to Molecular Replacement. Nicholas M. Glykos & Michael Kokkinidis IMBB,...

A stochastic approach to Molecular Replacement.

Nicholas M. Glykos & Michael Kokkinidis

IMBB, FORTH, Heraklion, Crete, GREECE

“Why ? What’s wrong with AMoRe ?”

“Interesting.

Can we now go back to the AMoRe .log file ?”

stochastic adj.

1. determined by a random distribution of probabilities.

2. (of a process) characterized by a sequence of random variables.

3. governed by the laws of probability.

Etymology : Gk stokhastikos, f. stokhazomai aim at, guess, f. stokhos aim.

crystal2 ~ file Stochastic.ppt

Stochastic.ppt : c program text with garbage

crystal2 ~

“The classical approach to the problem of placing n copies of a search model in the asymmetric unit of a target crystal structure, is to divide this 6n-dimensional optimisation problem into a succession of three-dimensional searches.”

Acta Cryst. (2000), D56, 169-174

The method(s) :

I. Treat all translational & orientational parameters of all molecules as variables whose values are to be simultaneously and independently determined.

II. Assume that the correct solution corresponds to the (pronounced) global minimum of a suitable (?) statistic (like the R-factor, or the linear correlation coefficient between Fo’s and Fc’s, or, Fo² and Fc², or, …).

III. Use simulated annealing (in the form of a modified reverse Monte Carlo method) to explore the 6n-dimensional parameter space.
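
The two statistics named in point II can be sketched as follows. This is a minimal illustration, assuming the |Fc| values have already been scaled to the |Fo| values; the function names are hypothetical and are not taken from the program.

```python
import math


def r_factor(f_obs, f_calc):
    """Conventional R = sum| |Fo| - |Fc| | / sum |Fo| (amplitudes already scaled)."""
    numerator = sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(f_obs)


def linear_correlation(x, y):
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Made-up amplitudes, for illustration only.
fo = [10.0, 20.0, 30.0]
fc = [12.0, 18.0, 33.0]
print(r_factor(fo, fc), linear_correlation(fo, fc))
```

Since the annealing searches for a minimum, a correlation-based target would have to be turned into something to minimise, for example 1 - correlation; that choice is an assumption of this sketch.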

Other published optimisation techniques include : a genetic algorithm approach (Chang & Lewis, 1997), an evolutionary search methodology (Kissinger et al., 1999) and a systematic 6D search (Sheriff et al., 1999).

The program :

Name : “Queen of Spades”

Availability : absolutely free, no warranties whatsoever. The distribution includes source code plus pre-compiled executables for Irix, OSF, Linux, Solaris, VMS & windoze.

Download the latest version via http://origin.imbb.forth.gr/~glykos/

Current stable version : α , Release 0.9.

The reverse Monte Carlo method :

1. Assign random initial positions & orientations to all molecules present in the asymmetric unit of the target crystal structure. Calculate Fc’s from this arrangement.

2. Calculate the R-factor between the Fo’s and the Fc’s. Call this Rold.

3. Randomly choose one of the molecules and alter its orientation and position. Calculate the R-factor resulting from the new arrangement (Rnew).

4. If Rnew < Rold, the new arrangement is accepted and we start again from (3).

5. If the new R-factor is worse, we still accept the move with probability exp[ –(Rnew – Rold) / T ].
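
The acceptance rule in steps 4 and 5 is the Metropolis criterion. The sketch below shows the loop on a toy one-parameter problem; the quadratic `toy_score` stands in for the R-factor and `toy_propose` for the random rigid-body move, and both are assumptions made purely for illustration.

```python
import math
import random


def accept_move(r_old, r_new, temperature, rng=random):
    """Steps 4 and 5: always accept an improvement, otherwise accept
    with probability exp(-(r_new - r_old) / T)."""
    if r_new < r_old:
        return True
    if temperature <= 0.0:
        return False  # T = 0 degenerates to a pure downhill search
    return rng.random() < math.exp(-(r_new - r_old) / temperature)


def reverse_monte_carlo(score, propose, state, temperature, n_steps, seed=0):
    """Repeat steps (3)-(5) n_steps times at a fixed temperature.

    score(state)        -> value to minimise (the R-factor on the slide)
    propose(state, rng) -> a randomly altered copy of the state
    """
    rng = random.Random(seed)
    r_old = score(state)
    for _ in range(n_steps):
        trial = propose(state, rng)
        r_new = score(trial)
        if accept_move(r_old, r_new, temperature, rng):
            state, r_old = trial, r_new
    return state, r_old


def toy_score(x):
    """Toy stand-in for the R-factor: one parameter, minimum at x = 2."""
    return (x - 2.0) ** 2


def toy_propose(x, rng):
    """Toy stand-in for a random move: shift x by at most 0.5."""
    return x + rng.uniform(-0.5, 0.5)


final_x, final_r = reverse_monte_carlo(toy_score, toy_propose, -5.0, 0.01, 5000)
print(final_x, final_r)
```

In the real problem the state would hold the orientation and position of every molecule, and each proposal would alter only one of them.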

Speeding it up :

Avoid FFTs : calculate and store (in core) the molecular transform of the search model.

Keep a table containing the contribution of each molecule to each reflection.

CPU time per step ~ Number of reflections in P1.
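
The bookkeeping behind the per-molecule table can be sketched as below. When one molecule moves, only its row of the table changes, so the summed structure factors are updated with one pass over the reflections. The names and the made-up complex values are hypothetical; obtaining the new contribution from the stored molecular transform is outside the scope of this sketch.

```python
def update_total(total_fc, table, mol, new_contrib):
    """Swap molecule `mol`'s stored contribution for `new_contrib` and
    update the summed Fc in place. Cost: one pass over the reflections."""
    old_contrib = table[mol]
    for h in range(len(total_fc)):
        total_fc[h] += new_contrib[h] - old_contrib[h]
    table[mol] = list(new_contrib)


# Two molecules, two reflections (illustrative complex structure factors).
table = [[1 + 1j, 2 + 0j],
         [0 + 0.5j, 1 - 1j]]
total = [sum(column) for column in zip(*table)]

# Molecule 1 is moved; its new contribution replaces the old one.
update_total(total, table, 1, [2 + 0j, 0 + 0j])

# The incremental result agrees with a full recomputation from the table.
recomputed = [sum(column) for column in zip(*table)]
print(total, recomputed)
```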

Annealing schedules :

Constant temperature run.

Linear temperature gradient (slow cooling).

“Heating bath” mode.

At T=0.3125000, average R=0.59937

At T=0.1562500, average R=0.59707

At T=0.0781250, average R=0.59861

At T=0.0390625, average R=0.59028

At T=0.0195312, average R=0.58783

At T=0.0097656, average R=0.57545

At T=0.0048828, average R=0.55527

At T=0.0024414, average R=0.53016

At T=0.0012207, average R=0.52038

At T=0.0006104, average R=0.51799

At T=0.0003052, average R=0.51524
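
The first two schedules can be written down directly, and the temperatures in the log above halve from one line to the next. The sketch below generates all three sequences; the function names are hypothetical, and the “heating bath” mode is not implemented because the slides do not define it.

```python
def constant_schedule(temperature, n_steps):
    """Constant temperature run: the same T at every step."""
    return [temperature] * n_steps


def linear_schedule(t_start, t_end, n_steps):
    """Linear temperature gradient (slow cooling) from t_start to t_end."""
    if n_steps == 1:
        return [t_start]
    return [t_start + (t_end - t_start) * i / (n_steps - 1)
            for i in range(n_steps)]


def halving_schedule(t_start, n_levels):
    """Temperatures that halve at each level, as in the log listing."""
    return [t_start / 2 ** i for i in range(n_levels)]


for t in halving_schedule(0.3125, 11):
    print(f"T={t:.7f}")
```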

Move size control :

Constant move size :

max(Δt) = dmin / max(a,b,c)

max(Δκ) = dmin (in degrees)

Move size linearly dependent on current R-factor and time step :

max(Δt) = 0.5 R (1.0 - t/ttotal )

max(Δκ) = π R (1.0 - t/ttotal )
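
Both move-size rules are simple enough to evaluate by hand; the sketch below just encodes them. It assumes that the translation limit is in fractional coordinates and that the adaptive rotation limit is in radians (because of the factor π); the function names and the example cell are hypothetical.

```python
import math


def constant_max_moves(d_min, a, b, c):
    """Constant move size: (max translation, max rotation in degrees)."""
    return d_min / max(a, b, c), d_min


def adaptive_max_moves(r_factor, step, total_steps):
    """Move size shrinking linearly with the current R-factor and with time."""
    remaining = 1.0 - step / total_steps
    return 0.5 * r_factor * remaining, math.pi * r_factor * remaining


# 4 A data in a hypothetical 40 x 80 x 50 A cell.
print(constant_max_moves(4.0, 40.0, 80.0, 50.0))
# Halfway through a run with a current R-factor of 0.60.
print(adaptive_max_moves(0.60, 500, 1000))
```

With the adaptive rule, the moves become smaller both as the fit improves and as the run approaches its end, reaching zero at the final step.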

Scaling : To B or not to B ?

The default is to scale |Fc|’s to |Fo|’s using both a scale and a temperature factor, but …

0.32±0.02 23±5
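
One common way to obtain both a scale and a temperature factor is a straight-line fit of ln(|Fo|/|Fc|) against 1/(4d²), assuming |Fo| ≈ k |Fc| exp[ –B / (4d²) ]. The sketch below illustrates that idea only; it is an assumption and not necessarily the procedure used by the program, and all names and numbers are made up.

```python
import math


def fit_scale_and_b(f_obs, f_calc, d_spacings):
    """Least-squares line through ln(Fo/Fc) versus 1/(4 d^2).

    Returns (k, B) for the model Fo = k * Fc * exp(-B / (4 d^2)).
    """
    xs = [1.0 / (4.0 * d * d) for d in d_spacings]
    ys = [math.log(fo / fc) for fo, fc in zip(f_obs, f_calc)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic check: build Fo from known k and B, then recover them.
true_k, true_b = 2.0, 20.0
d_values = [15.0, 10.0, 6.0, 4.0]
fc_values = [100.0, 80.0, 60.0, 40.0]
fo_values = [true_k * fc * math.exp(-true_b / (4.0 * d * d))
             for fc, d in zip(fc_values, d_values)]
k_fit, b_fit = fit_scale_and_b(fo_values, fc_values, d_values)
print(k_fit, b_fit)
```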

Bulk solvent correction :

The absence of a bulk-solvent correction from this type of calculation is a serious problem : it introduces systematic errors up to 5Å and makes a low-resolution cutoff necessary.

The exponential scaling model algorithm allows a computationally efficient and model-independent correction to be applied : Fcorrected = Fp { 1.0 – ksol exp[ –Bsol / d² ] }
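
The correction is a resolution-dependent multiplier applied to each protein structure factor. The sketch below evaluates it exactly as the formula is written on the slide; the function name and the values of ksol and Bsol are hypothetical and chosen only to show the trend.

```python
import math


def bulk_solvent_corrected(f_protein, d, k_sol, b_sol):
    """Fcorrected = Fp * (1 - ksol * exp(-Bsol / d^2)), with d in Angstrom."""
    return f_protein * (1.0 - k_sol * math.exp(-b_sol / (d * d)))


# Illustrative parameters: the correction is strong at low resolution
# and fades away towards high resolution.
for d in (20.0, 10.0, 5.0, 3.0):
    print(d, bulk_solvent_corrected(100.0, d, 0.8, 200.0))
```

Because the multiplier depends only on the resolution of each reflection, it can be applied without building a solvent mask, which is why it suits a calculation that is repeated at every Monte Carlo step.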

Bulk solvent correction ?

Acta Cryst. (2000), D56, 1070-1072

Using the program :

Input : a .pdb file, and a formatted file containing h,k,l,F,σ(F).

Running the program :

$ Qs -auto 1 , or,

$ Qs -auto 2 , etc. (no scripts), or,

$ Qs <script.file>

Output : .pdb files containing the final coordinates for each model, plus a packing diagram for each solution.

Examples : A 5D problem.

One molecule of lysozyme per a.u.

Monoclinic space group (C2), 4Å data.

rms deviation of model 1.4Å.

Up to ±20% noise added to error-free data.

About 90 seconds of CPU time per minimisation.

Examples : A 6D problem (1).

Target structure 1bvx, search model 2lz2 (rms deviation 1.3Å).

One molecule of lysozyme per a.u.

Tetragonal space group (P4₃2₁2).

Real 15-4Å data.

About 3.8 hours of CPU time per minimisation.

Examples : A 6D problem (2).

Target structure 1b6q. 30% solvent.

Search model : incomplete poly-Ala.

One monomer of Rop per a.u.

Orthorhombic space group (C222₁).

Real 15-4Å data.

About 40 minutes of CPU time per run.

Examples : An 11D problem.

Target structure 1lys, model 2ihl (rmsd 1.52 & 1.56Å).

Two molecules of lysozyme per asymmetric unit.

Monoclinic space group (P2₁), 4Å data.

±20% noise added to error-free data.

Solutions appear after ~3.8 hours of CPU time.

Disadvantages :

In most cases, treating the problem as 6n-dimensional is a waste of CPU time.

You can only have one search model (i.e. you cannot search simultaneously with your DNA & protein models).

The structure of the search model is kept fixed throughout the calculation.

The (putative) evidence from the self-rotation function and/or the native Patterson function is ignored (but, in a way, for n > 1 it is also ignored by the traditional methods).

When the starting model deviates significantly from the target structure, (i) there is no guarantee that the global minimum of any chosen statistic will correspond to the correct solution, (ii) traditional methods may be more sensitive in identifying the correct solution.

Advantages :

If there are just one or two molecules per asymmetric unit and CPU time is not a problem, the method can be used as a last ditch effort to conclusively show that there is no such thing as a pronounced global minimum (or otherwise ?).

The automatic (black box) mode is really black: no keywords, no scripts, just a .pdb file containing the model and an ASCII file containing h,k,l,F,σ(F).

The computational procedures differ so much from those used in conventional methods, that the results obtained can be considered as independent.

The method is honest in the sense that it is rather unlikely to find a wrong solution which will give a simultaneous sudden drop of both the R and Rfree leading to a solution with a reasonable packing arrangement.

A word of caution …

Res     R     Corr
0.020   0.57  0.66
0.030   0.68  0.43
0.040   0.61  0.58
0.050   0.66  0.43
0.060   0.64  0.50
0.111   0.61  0.42
-----   ----  ----
        0.64  0.51

Conclusion :

Substituting computing for thinking will probably fail for n ≥ 3.
