Computing Protein Structures from Electron Density Maps: The Missing Loop Problem I. Lotan, H. van den Bedem, A. Beacon and J.C. Latombe


Protein Structure: Experimental Techniques

Nuclear Magnetic Resonance (NMR) spectroscopy – limited to short sequences.

X-ray crystallography

X-ray Crystallography

Crystallize protein samples

Collect X-ray diffraction images

Calculate the electron density – a 3-D Electron Density Map (EDM)

Electron Density Map

3-D “image” of atomic structure
– High value (electron density) at atom centers
– Density falls off exponentially away from center
– Limited resolution, sampled on 3D grid
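The properties above can be illustrated with a minimal sketch that samples a model density on a regular 3-D grid. The isotropic Gaussian fall-off, the `peak` and `width` parameters, and the function names are assumptions made for illustration; the slides do not specify the atomic density model.

```python
import math

def atom_density(distance, peak=1.0, width=0.7):
    """Density contributed by one atom at `distance` (in Å) from its center.

    Assumption: an isotropic Gaussian profile; peak and width are illustrative.
    """
    return peak * math.exp(-(distance ** 2) / (2.0 * width ** 2))

def sample_density(atom_centers, grid_origin, spacing, shape):
    """Sum the contributions of all atoms at every point of a regular 3-D grid."""
    nx, ny, nz = shape
    grid = {}
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                point = (grid_origin[0] + i * spacing,
                         grid_origin[1] + j * spacing,
                         grid_origin[2] + k * spacing)
                grid[(i, j, k)] = sum(
                    atom_density(math.dist(point, center))
                    for center in atom_centers
                )
    return grid

# One atom sitting exactly on grid node (1, 1, 1) of a 3x3x3 grid.
edm = sample_density([(1.0, 1.0, 1.0)], (0.0, 0.0, 0.0), 1.0, (3, 3, 3))
```

The value is highest at the node that coincides with the atom center and decreases with distance, which is the behaviour the slide describes.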

The End Goal: Build Protein Model from EDM

Completeness of automatically generated models varies with experimental data quality:

High resolution: 90% completeness.
Low resolution: 2/3 completeness.

Completing the missing fragments manually is time consuming.

Experimental Data Quality Varies

Recovering the phases of the diffracted beams introduces error.

Resolution at which the data were collected (high-resolution images cannot be obtained for all proteins)

Not all replicas of the protein in the crystal are identical:
– Mobility of molecule fragments
– Temperature-dependent atomic vibration

Existing Techniques

Existing software relies on:
– Pattern recognition techniques
– Unambiguous density
– Elementary stereochemical constraints

Model Refinement

Standard Maximum Likelihood (ML) algorithms exploit experimental and model phase information to build new refined models.

Iterating model building and refinement steps improves the completeness and quality of models.
The problem: missing fragments (usually loops).
The solution: fill the gaps at an early stage.

Goal: Propose Candidates for Missing Fragments

Input:
– EDM
– Known structure
– Anchor residues
– The amino acid sequence

Output: a proposed structure that falls within the radius of convergence of existing refinement tools (1-1.5Å)

Model

Standard Phi-Psi model. Compute the backbone; ignore side chains except Cβ and O atoms.

Loop closure: mobile anchor vs. stationary anchor. Closure is measured as the RMSD of the mobile anchor atoms from the stationary anchor atoms.
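The closure measure can be written down in a few lines. This is a minimal sketch, assuming both anchors are given as equally long lists of (x, y, z) coordinates in the same atom order; the function name is illustrative.

```python
import math

def closure_rmsd(mobile_atoms, stationary_atoms):
    """RMSD (in Å) between corresponding mobile and stationary anchor atoms."""
    if not mobile_atoms or len(mobile_atoms) != len(stationary_atoms):
        raise ValueError("anchors must be non-empty and of equal length")
    squared = sum(math.dist(m, s) ** 2
                  for m, s in zip(mobile_atoms, stationary_atoms))
    return math.sqrt(squared / len(mobile_atoms))

stationary = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.4, 0.0)]
# The same three atoms displaced by 2 Å along z.
mobile = [(x, y, z + 2.0) for x, y, z in stationary]
gap = closure_rmsd(mobile, stationary)
```

A loop counts as closed when this value drops below the chosen tolerance.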

[Figure labels: Stationary Anchor, Mobile Anchor]

IK + EDM Loop Structure

Two-stage algorithm:
1. Guided by the EDM, sample closing conformations.
2. Refine the top-ranking conformations using local optimization, while maintaining loop closure.

Conformation ranking – density fit and conformational likelihood.

Stage 1: Generating Loop Candidates

Employ the cyclic coordinate descent (CCD) method to obtain closing conformations, up to a tolerance distance d_close.

Starting conformations are obtained by a random procedure, biased by PDB-derived distributions.

The best-scoring conformations (95th percentile) are submitted to stage 2.

Cyclic Coordinate Descent (CCD)
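CCD adjusts one rotatable bond at a time, each time choosing the angle that brings the mobile anchor closest to its target. The sketch below uses a generic bead chain in which every bond is a rotation axis, which is a simplification of the Phi-Psi backbone model; all function names and the tolerance are illustrative assumptions. The closed-form angle comes from maximizing a·cos θ + b·sin θ, the only θ-dependent part of the summed squared distances.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def scale(a, s):
    return tuple(x * s for x in a)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def rotate_about_axis(point, origin, axis, angle):
    """Rotate point by angle about the unit axis through origin (Rodrigues)."""
    v = sub(point, origin)
    c, s = math.cos(angle), math.sin(angle)
    rotated = add(add(scale(v, c), scale(cross(axis, v), s)),
                  scale(axis, dot(axis, v) * (1.0 - c)))
    return add(origin, rotated)

def ccd_angle(moving, targets, origin, axis):
    """Angle about axis that minimizes the summed squared distance to targets."""
    a = b = 0.0
    for m, f in zip(moving, targets):
        mv, fv = sub(m, origin), sub(f, origin)
        m_perp = sub(mv, scale(axis, dot(axis, mv)))  # component off the axis
        a += dot(fv, m_perp)
        b += dot(fv, cross(axis, mv))
    return math.atan2(b, a)

def rmsd(a, b):
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))

def ccd_close(points, targets, tol=0.1, max_sweeps=200):
    """Sweep over the bonds until the last len(targets) atoms reach targets."""
    pts = [tuple(p) for p in points]
    n_anchor = len(targets)
    for _ in range(max_sweeps):
        for i in range(len(pts) - n_anchor):
            origin = pts[i]
            bond = sub(pts[i + 1], origin)
            axis = scale(bond, 1.0 / math.sqrt(dot(bond, bond)))
            angle = ccd_angle(pts[-n_anchor:], targets, origin, axis)
            # Atoms i and i+1 lie on the axis; everything downstream rotates.
            for j in range(i + 2, len(pts)):
                pts[j] = rotate_about_axis(pts[j], origin, axis, angle)
        if rmsd(pts[-n_anchor:], targets) <= tol:
            return pts, True
    return pts, False
```

Because a zero rotation is always among the candidates, no single step can increase the closure distance, and because each step is a rigid rotation of the downstream atoms, bond lengths are preserved.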

Adding the Electron Density Constraints

We would like to guide the loop closing to fit the EDM.

For residue i, CCD proposes the distance-minimizing dihedral angles (Φ,Ψ)_i^p.

Find a pair (Φ,Ψ)_i in a square neighborhood of (Φ,Ψ)_i^p that maximizes the local fit to the EDM.

The neighborhood’s size is reduced linearly with CCD iterations to allow closure.
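The density-guided step can be sketched as a small grid search around the pair that CCD proposes. The names here are hypothetical: `local_fit` stands for any callable that scores a (Φ,Ψ) pair against the EDM, and the number of samples per axis and the shrinkage to zero width at the last iteration are assumptions, since the slides state only that the size decreases linearly.

```python
def neighborhood_half_width(iteration, max_iterations, initial_half_width):
    """Half-width of the square neighborhood, shrinking linearly to zero."""
    remaining = max(0.0, 1.0 - iteration / max_iterations)
    return initial_half_width * remaining

def best_pair_in_neighborhood(proposed, half_width, local_fit, steps=5):
    """Return the (phi, psi) pair near `proposed` with the highest local fit.

    `local_fit(phi, psi)` is a hypothetical scoring callable supplied by the
    caller; `steps` samples are taken along each side of the square.
    """
    if half_width <= 0 or steps < 2:
        return proposed
    phi0, psi0 = proposed
    offsets = [-half_width + 2.0 * half_width * k / (steps - 1)
               for k in range(steps)]
    candidates = [(phi0 + d_phi, psi0 + d_psi)
                  for d_phi in offsets for d_psi in offsets]
    return max(candidates, key=lambda pair: local_fit(*pair))

# Toy score that peaks at (10, -20) degrees, standing in for a density fit.
def toy_fit(phi, psi):
    return -((phi - 10.0) ** 2 + (psi + 20.0) ** 2)

chosen = best_pair_in_neighborhood((0.0, 0.0), 20.0, toy_fit)
```

Early iterations let the density pull the angles far from the CCD proposal; as the width shrinks, the search collapses onto the proposal and closure takes priority.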

[Equation labels: atoms that are changed by angle pair i and not by pair i+1; center of atom A_j]

Stage 2: Refining Loop Candidates

Improve the model's fit to the experimental data (this time for the model as a whole, as opposed to the local fit in stage 1).

Maintain loop-closure constraint during optimization process.

Target Function

For conformation q, the target function T(q) is the sum of the squared differences between the observed density and the calculated density at each grid point in some volume V around the loop.

Terms of T(q):
– Scaling factors
– Calculated density (sum of contributions of atoms within a cutoff distance from g_i)
– Observed density
– Grid points g_i in the volume V
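A minimal sketch of such a target function follows. The exact form of the scaling factors is not recoverable from the slides; a multiplicative `scale` and an additive `offset` applied to the observed density are assumed here, as are the Gaussian atom profile, the default cutoff, and all names.

```python
import math

def calculated_density(grid_point, atoms, cutoff=3.0, width=0.7):
    """Sum of the contributions of atoms within `cutoff` Å of the grid point.

    Assumption: each atom contributes an isotropic Gaussian of the given width.
    """
    total = 0.0
    for center in atoms:
        d = math.dist(grid_point, center)
        if d <= cutoff:
            total += math.exp(-(d * d) / (2.0 * width * width))
    return total

def target_function(atoms, observed, scale=1.0, offset=0.0, cutoff=3.0):
    """Sum of squared differences between observed and calculated density.

    `observed` maps each grid point g_i in the volume V to its observed
    density; `scale` and `offset` are the assumed scaling factors.
    """
    return sum(
        (scale * rho_obs + offset - calculated_density(g, atoms, cutoff)) ** 2
        for g, rho_obs in observed.items()
    )

# Synthetic check: the "observed" map is generated from a known conformation.
true_atoms = [(1.0, 1.0, 1.0), (2.0, 1.5, 1.0)]
volume = [(float(i), float(j), float(k))
          for i in range(4) for j in range(4) for k in range(4)]
observed_map = {g: calculated_density(g, true_atoms) for g in volume}
shifted_atoms = [(1.5, 1.0, 1.0), (2.0, 1.5, 1.5)]
```

On this synthetic map the true conformation scores zero and any displaced conformation scores higher, which is the property the refinement stage relies on.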

Optimization with Closure Constraints

Generic approach: optimize an objective function (T(q)) while performing a given task (loop closure) by taking advantage of manipulator redundancy (extra DoFs).

f(q) : forward kinematics equation

J(q) : 6-by-n Jacobian of f(q)

dx : the change to the end of the chain

J⁺(q) : pseudo-inverse of J(q), standing in for J⁻¹(q)

N(q) : orthonormal basis for the null space of J(q) (n−6 dimensions)

y = ∂T(q)/∂q : gradient vector of the objective function T(q)
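These quantities are usually combined in the standard redundancy-resolution update from robotics. It is given here as a sketch rather than as the authors' exact formulation; the symbols dq (change in the dihedral angles), dx (change to the end of the chain) and the step size α are notation assumed for this illustration.

```latex
\[
  dq \;=\; J^{+}(q)\,dx \;+\; N(q)\,N(q)^{T}\,y
\]
% Keeping the loop closed means dx = 0. A descent step of size \alpha > 0
% along the projected negative gradient is then
\[
  dq \;=\; -\,\alpha\,N(q)\,N(q)^{T}\,y ,
  \qquad
  J(q)\,dq = 0 ,
  \qquad
  y^{T} dq \;=\; -\,\alpha\,\bigl\lVert N(q)^{T} y \bigr\rVert^{2} \;\le\; 0 .
\]
```

The second line shows why the null space is useful: since J(q)N(q) = 0, the step leaves the end of the chain fixed to first order, while the projected gradient still lowers T(q).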

Minimization Procedure: Monte Carlo and Simulated Annealing

Choose a random sub-chain with at least 8 DoFs.
Propose a random move with magnitude proportional to the current temperature:
– High temperature: use an exact IK solver (Dill)
– Low temperature: pick a random direction in the null space
Minimize the resulting conformation (gradient descent).
Accept using the Metropolis criterion: P(accept q_new) = e^[(T(q_prev) − T(q_new)) / temp]
Use simulated annealing: at each step, decrease the pseudo-temperature.
At each step, verify that the closure constraint is satisfied within tolerance.
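The procedure above can be sketched as a generic annealing loop. The callables `propose`, `minimize` and `is_closed` are hypothetical placeholders for the move generator (exact IK or null-space move, depending on temperature), the gradient-descent minimizer and the closure check. The geometric cooling schedule and all default values are assumptions. The acceptance probability is capped at 1, which is the usual reading of the Metropolis criterion.

```python
import math
import random

def metropolis_accept(t_prev, t_new, temperature, rng=random):
    """Accept improvements always, and worse moves with probability
    exp((t_prev - t_new) / temperature)."""
    if t_new <= t_prev:
        return True
    return rng.random() < math.exp((t_prev - t_new) / temperature)

def anneal(q0, target, propose, minimize, is_closed,
           t_start=1.0, cooling=0.95, steps=200, rng=random):
    """Monte Carlo minimization of `target` with a decreasing pseudo-temperature."""
    q, t_q = q0, target(q0)
    best, t_best = q, t_q
    temperature = t_start
    for _ in range(steps):
        candidate = minimize(propose(q, temperature))
        # Candidates that break the closure constraint are rejected outright.
        if is_closed(candidate):
            t_c = target(candidate)
            if metropolis_accept(t_q, t_c, temperature, rng):
                q, t_q = candidate, t_c
                if t_q < t_best:
                    best, t_best = q, t_q
        temperature *= cooling  # assumed geometric cooling schedule
    return best, t_best

# Toy usage on a 1-D "conformation": minimize (q - 3)^2 from q = 0.
toy_rng = random.Random(0)
best_q, best_t = anneal(
    0.0,
    target=lambda q: (q - 3.0) ** 2,
    propose=lambda q, temp: q + toy_rng.uniform(-temp, temp),
    minimize=lambda q: q,          # identity stands in for gradient descent
    is_closed=lambda q: True,      # no closure constraint in the toy problem
    rng=toy_rng,
)
```

Tracking the best conformation separately from the current one means the returned score can never be worse than the starting score, even though individual uphill moves are accepted along the way.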

Results – High Resolution Data

Applying RESOLVE to the high-resolution data yielded an initial model that was 88% complete.
The algorithm was applied to a gap of 12 residues.
Magenta – the structure from the PDB
Cyan – best-scoring structure, RMSD = 0.25Å
The lowest RMSD for a 7-residue gap at the end of stage 1 is 0.35Å.

Results – Low Resolution Data

Applying RESOLVE yielded a model with 61% completeness.

The algorithm was applied to a gap of 12 residues.

Magenta – the highest scoring, RMSD 0.6Å.

Yellow – starting conformation (end of stage 1), RMSD = 2.1Å (the lowest)
