2. Introduction to Rosetta and structural modeling
(From Ora Schueler-Furman)
• Approaches for structural modeling of proteins
• The Rosetta framework and its prediction modes
• Cartesian and polar coordinates
• Sampling (finding the structure) and scoring (selecting the structure)
Structural Modeling of Proteins - Approaches
Prediction of Structure from Sequence
Flowchart: compare the query sequence to the nr database
• Similar to a sequence of known structure?
– Yes → Homology Modeling (Comparative Modeling)
– No → Fits a known fold?
   – Yes → Fold Recognition (Threading)
   – No → Ab initio prediction
The Rosetta framework and its prediction modes
A short history of Rosetta
• In the beginning: ab initio modeling of protein structure starting from sequence
– Short fragments of known proteins are assembled by a Monte Carlo strategy to yield native-like protein conformations
– Reliable fold identification for short proteins; recently improved to high-resolution models (within 2 Å RMSD)
(Example input sequence: ATCSFFGRKLL…)
• The success of the ab initio protocol led to extensions to:
– Protein design; design of a new fold: TOP7
– Protein loop modeling; homology modeling
– Protein-protein docking; protein interface design
– Protein-ligand docking
– Protein-DNA interactions; RNA modeling
– Many more, e.g. solving the phase problem in X-ray crystallography
The Rosetta Strategy
• Observation: local sequence preferences bias, but do not uniquely define, the local structure of a protein
• Goal: mimic the interplay of local and global interactions that determine protein structure
• Local interactions: fragments derived from known structures (selected for similar sequence and secondary structure propensity)
• Global (non-local) interactions: buried hydrophobic residues, paired β-strands, specific side-chain interactions, etc.
• Local interactions: fragments
– Fragment library representing the accessible local structures for all short sequences in a protein chain, derived from known structures
• Global (non-local) interactions: scoring function
– Derived from conformational statistics of known structures
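The fragment-assembly idea can be sketched as a toy Monte Carlo search. This is not Rosetta's implementation: the one-torsion-per-residue conformation, the fragment-library format (a dict from window start to candidate fragments), the all-zero starting chain, `kT` and the toy score are assumptions chosen for illustration. Each step inserts a random fragment and keeps it according to the Metropolis criterion.

```python
import math
import random


def fragment_assembly(n_res, frag_lib, score, n_steps=2000, kT=1.0, seed=0):
    """Toy Monte Carlo fragment assembly with the Metropolis criterion.

    n_res: number of residues; a conformation is one torsion value per residue
    frag_lib: {window_start: [fragment, ...]}, each fragment a tuple of torsions
    score: function mapping a conformation (list of floats) to an energy
    Returns the lowest-scoring conformation seen and its score.
    """
    rng = random.Random(seed)
    conf = [0.0] * n_res                      # hypothetical starting chain
    current = score(conf)
    best_conf, best = list(conf), current
    starts = sorted(frag_lib)
    for _ in range(n_steps):
        start = rng.choice(starts)
        frag = rng.choice(frag_lib[start])
        trial = list(conf)
        trial[start:start + len(frag)] = frag  # replace one window's torsions
        e = score(trial)
        # Metropolis: accept downhill moves, uphill ones with prob exp(-dE/kT)
        if e <= current or rng.random() < math.exp(-(e - current) / kT):
            conf, current = trial, e
            if e < best:
                best_conf, best = list(trial), e
    return best_conf, best


# Hypothetical "native" torsions and a 3-residue fragment library per window
target = [-60.0] * 4 + [120.0] * 4
frag_lib = {i: [tuple(target[i:i + 3]), (0.0,) * 3, (180.0,) * 3] for i in range(6)}


def toy_score(conf):
    return sum((a - t) ** 2 for a, t in zip(conf, target)) / 100.0


best_conf, best = fragment_assembly(8, frag_lib, toy_score)
```

Because uphill moves are occasionally accepted, the search can leave shallow minima; the function therefore remembers the best conformation seen rather than returning the last one.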
Scoring and Sampling
The basic assumption in structure prediction:
The native structure is located at the global minimum (free) energy conformation (GMEC)
➜ A good energy function can select the correct model among decoys
➜ A good sampling technique can find the GMEC in the rugged landscape
[Figure: energy E versus conformation space, with E_GMEC at the global minimum]
Two-Step Procedure
1. Low-resolution step locates potential minima (fast)
2. Cluster analysis identifies the broadest basins in the landscape
3. High-resolution step can identify the lowest-energy minimum within these basins (slow)
[Figure: energy E versus conformation space, with the GMEC marked]
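The clustering step can be illustrated with a simple greedy neighbor-count scheme. This is one common way to find the most populated region of decoy space, shown as an assumption rather than Rosetta's exact clustering code; the decoys here are 1D numbers with a hypothetical distance function and energies, where real decoys would be compared by RMSD.

```python
def greedy_cluster(decoys, dist, threshold):
    """Greedy clustering of decoys by neighbor count.

    decoys: list of conformations; dist: distance function (e.g. an RMSD);
    threshold: neighbor cutoff. The decoy with the most unclustered neighbors
    becomes a cluster center; it and its neighbors are removed and the process
    repeats. Returns clusters (lists of indices, center first), largest first.
    """
    remaining = set(range(len(decoys)))
    clusters = []
    while remaining:
        neighbors = {
            i: [j for j in remaining if dist(decoys[i], decoys[j]) <= threshold]
            for i in remaining
        }
        # most neighbors wins; ties go to the lowest index
        center = max(sorted(remaining), key=lambda i: len(neighbors[i]))
        members = [center] + sorted(j for j in neighbors[center] if j != center)
        clusters.append(members)
        remaining -= set(members)
    return clusters


# Toy 1D "conformations" with hypothetical low-resolution energies
decoys = [0.0, 0.1, 0.2, 0.15, 5.0, 5.1, 9.0]
energies = [-10.0, -12.0, -11.0, -9.0, -20.0, -19.0, -5.0]
clusters = greedy_cluster(decoys, lambda a, b: abs(a - b), threshold=0.5)
broadest = clusters[0]
start = min(broadest, key=lambda i: energies[i])   # candidate for refinement
```

In this toy, the single lowest low-resolution energy (decoy 4) sits in a two-member cluster, while the broadest basin is the four-member cluster; the high-resolution step would then start from its members.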
Low-Resolution Step
Structure representation:
• Equilibrium bond lengths and angles (Engh & Huber 1991)
• Centroid: average location of the side-chain center of mass (Centroid | aa, φ, ψ)
• No modeling of side chains
• Fast
Low-Resolution Scoring Function
Bayes' theorem:
P(str | seq) = P(str) * P(seq | str) / P(seq)
• Independent components prevent over-counting
In this expression, P(seq) is constant for a given sequence, P(seq | str) contains the sequence-dependent features, and P(str) contains the structure-dependent features.
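Taking the negative logarithm turns the Bayes product into the additive score used on the following slides. A short derivation, assuming (as the "independent components" note suggests) that the factors are treated as independent so that their logarithms add:

```latex
\begin{aligned}
-\ln P(\mathrm{str} \mid \mathrm{seq})
  &= -\ln P(\mathrm{str}) - \ln P(\mathrm{seq} \mid \mathrm{str}) + \ln P(\mathrm{seq}) \\
  &= \underbrace{-\ln P(\mathrm{seq} \mid \mathrm{str})}_{S_{env} + S_{pair} + \dots}
     \;\underbrace{-\ln P(\mathrm{str})}_{S_{rg} + S_{cb} + S_{vdw} + S_{ss} + \dots}
     \;+\; \mathrm{const}
\end{aligned}
```

Since P(seq) is fixed for the query sequence, ln P(seq) is a constant and does not change the ranking of decoys: minimizing the score maximizes P(str | seq).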
Sequence-Dependent Components
Bayes' theorem: P(str | seq) = P(str) * P(seq | str) / P(seq)
Score = Senv + Spair + …
(neighbors: Cβ-Cβ < 10 Å)
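The neighbor definition above (Cβ-Cβ < 10 Å) can be turned into a per-residue burial count, the kind of quantity an environment term such as Senv is built on. A minimal sketch: how Rosetta bins these counts into Senv is not shown on the slides and is not reproduced here, and the coordinates are hypothetical.

```python
import math


def neighbor_counts(cb_coords, cutoff=10.0):
    """For each residue, the number of other residues whose C-beta lies
    within `cutoff` angstroms of its own C-beta (a burial measure)."""
    n = len(cb_coords)
    counts = [0] * n
    for i in range(n):
        for j in range(i + 1, n):          # visit each pair once
            if math.dist(cb_coords[i], cb_coords[j]) < cutoff:
                counts[i] += 1
                counts[j] += 1
    return counts


# Hypothetical C-beta coordinates (angstroms)
cb = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (9.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(neighbor_counts(cb))   # [2, 2, 2, 0]
```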
(Rohl et al. (2004) Methods in Enzymology 383:66; origin: Simons et al., JMB 1997; Simons et al., Proteins 1999)
Structure-Dependent Components
P(str | seq) = P(str) * P(seq | str) / P(seq)
Score = … + Srg + Scb + Svdw + …
Score = … + Sss + …
Score = … + Ssheet + Shs + … + Srama
High-Resolution Step
• Slow, exact step
• Locates the global energy minimum
Structure representation:
• All-atom (including polar and non-polar hydrogens, but no water)
• Side chains as rotamers from a backbone-dependent library
• Side-chain conformations adjusted frequently
High-Resolution Step: Rotamer Libraries
• Side chains have preferred conformations
• These are summarized in rotamer libraries (Dunbrack 1997)
• Select one rotamer for each position
• Best conformation: lowest-energy combination of rotamers
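Selecting the lowest-energy combination of rotamers can be sketched by exhaustive enumeration. The energy tables below are hypothetical, and brute force is only practical for a handful of positions; real packers use smarter searches (Rosetta's packer uses Monte Carlo simulated annealing).

```python
from itertools import product


def best_rotamers(self_e, pair_e):
    """Exhaustive search for the lowest-energy rotamer combination.

    self_e[i][r]: energy of rotamer r at position i (e.g. its rotamer preference)
    pair_e[(i, j)][r][s]: interaction energy of rotamer r at i with rotamer s at j
    The number of combinations is the product of the rotamer counts.
    """
    best_e, best_assign = float("inf"), None
    for assign in product(*(range(len(rots)) for rots in self_e)):
        e = sum(self_e[i][r] for i, r in enumerate(assign))
        e += sum(table[assign[i]][assign[j]] for (i, j), table in pair_e.items())
        if e < best_e:
            best_e, best_assign = e, assign
    return best_assign, best_e


# Hypothetical energies: two positions with three rotamers each
self_e = [[0.0, 1.0, 2.0], [0.5, 0.2, 1.0]]
# Rotamer 0 at position 0 clashes with rotamers 0 and 1 at position 1
pair_e = {(0, 1): [[5.0, 5.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]}
print(best_rotamers(self_e, pair_e))   # ((0, 2), 1.0)
```

Each position's individually best rotamer (0 and 1) clashes when combined, so the best combination is not simply the set of individual optima; this is why the rotamers must be chosen jointly.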
Serine χ1 preferences: t = 180°, g- = -60°, g+ = +60°
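Assigning an observed χ1 angle to one of these three wells needs only the shortest angular distance on the circle. A small sketch using the slide's labels; the function name is illustrative.

```python
def chi1_rotamer(chi1):
    """Name of the nearest staggered chi1 well for an angle in degrees.

    Labels follow the slide (g+ = +60, g- = -60); some references swap them.
    """
    wells = {"t": 180.0, "g-": -60.0, "g+": 60.0}

    def angular_diff(a, b):
        d = (a - b) % 360.0          # wrap the difference onto [0, 360)
        return min(d, 360.0 - d)     # shortest way around the circle

    return min(wells, key=lambda name: angular_diff(chi1, wells[name]))


print([chi1_rotamer(a) for a in (-175.0, -70.0, 55.0, 300.0)])   # ['t', 'g-', 'g+', 'g-']
```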
High-Resolution Scoring Function
• Major contributions:
– Burial of hydrophobic groups away from water
– Void-free packing of buried groups and atoms
– Buried polar atoms form intramolecular hydrogen bonds
Packing interactions
Score = SLJ(atr + rep) + …
• Lennard-Jones potential as a function of the interatomic distance r_ij
• Linearized repulsive part
• ε: well depth from CHARMM19
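A sketch of the packing term, assuming the common 6-12 form E = ε[(r0/r)^12 - 2(r0/r)^6] with its minimum -ε at r0. The switch point `lin_frac * r0` below which the repulsion is continued linearly, and the split of the curve into attractive and repulsive parts at r0, are assumptions for illustration; the slides do not give Rosetta's exact constants.

```python
def lj_atr_rep(r, r0, eps, lin_frac=0.6):
    """Attractive and repulsive parts of a 6-12 Lennard-Jones term.

    r: distance r_ij; r0: distance of the energy minimum; eps: well depth.
    Below lin_frac * r0 (a hypothetical switch point) the repulsion is
    continued as a straight line instead of growing as r**-12.
    """
    def lj(x):
        s6 = (r0 / x) ** 6
        return eps * (s6 * s6 - 2.0 * s6)

    def dlj(x):                       # derivative of lj with respect to x
        s6 = (r0 / x) ** 6
        return 12.0 * eps / x * (s6 - s6 * s6)

    r_lin = lin_frac * r0
    if r < r_lin:
        full = lj(r_lin) + dlj(r_lin) * (r - r_lin)   # linear extrapolation
    else:
        full = lj(r)
    if r >= r0:
        return full, 0.0              # attractive region: no repulsion
    return -eps, full + eps           # repulsive region: attraction capped at -eps


atr, rep = lj_atr_rep(4.0, 4.0, 0.2)   # at the minimum: atr = -eps, rep = 0
```

Linearizing the repulsion keeps small clashes from producing astronomically large energies, which would otherwise dominate the score of an approximate model.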
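Implicit solvation
Score = … + Ssolvation + …
(Lazaridis & Karplus, Proteins 1999)
Each atom i has a Gaussian solvation free energy density; its solvation is reduced by the volume V_j of each neighboring atom j (R_i: van der Waals radius of i; λ_i: correlation length):
$$S_{solvation} = \sum_i \Delta G_i^{ref} - \sum_{i<j} \left[ \frac{2\Delta G_i^{free}}{4\pi^{3/2}\lambda_i r_{ij}^2} e^{-x_{ij}^2} V_j + \frac{2\Delta G_j^{free}}{4\pi^{3/2}\lambda_j r_{ij}^2} e^{-x_{ji}^2} V_i \right], \qquad x_{ij} = \frac{r_{ij} - R_i}{\lambda_i}$$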
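The pair term of the solvation expression above can be evaluated for a single atom pair as follows. The parameter values in the usage line are hypothetical, the ΔG_ref self term is omitted, and practical implementations typically add a distance cutoff that is not shown here.

```python
import math


def lk_pair(r, dg_free_i, lam_i, R_i, vol_i, dg_free_j, lam_j, R_j, vol_j):
    """Pairwise Lazaridis-Karplus desolvation for atoms i and j at distance r.

    dg_free: solvation free energy of the isolated group (Delta G^free)
    lam: correlation length (lambda); R: van der Waals radius; vol: atomic volume
    """
    def density(dg_free, lam, R):
        # solvation free energy density of one atom, evaluated at distance r
        x = (r - R) / lam
        return 2.0 * dg_free * math.exp(-x * x) / (4.0 * math.pi ** 1.5 * lam * r * r)

    # i loses solvation to j's volume, and j loses solvation to i's volume
    return -(density(dg_free_i, lam_i, R_i) * vol_j
             + density(dg_free_j, lam_j, R_j) * vol_i)


# Hypothetical parameters for two polar atoms (favorable solvation, dg_free < 0)
polar = (-5.0, 3.5, 1.6, 10.0)
print(lk_pair(3.0, *polar, *polar) > lk_pair(8.0, *polar, *polar) > 0)   # True
```

For polar atoms the result is a positive penalty that fades with distance: bringing two well-solvated groups together costs solvation free energy, which must be repaid, for example, by hydrogen bonds.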
Hydrogen Bonds (original function)
Score = … + Shb(srbb + lrbb + sc) + …
• srbb: short-range backbone H-bond
• lrbb: long-range backbone H-bond
• sc: H-bond involving a side-chain atom
[Figure: hydrogen bond between an N-H donor and an O acceptor]
(Kortemme, 2003; Morozov 2004)
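Hydrogen Bonding Energy
Based on statistics from high-resolution structures in the Protein Data Bank (rcsb.org):
$$\Delta G = -kT \ln P$$
$$E_{HB} = \sum_{HB} W \left[ E(\delta_{HA}) + E(\Theta) + E(\Psi) + E(\chi) \right]$$
where δ_HA is the hydrogen-acceptor distance, Θ and Ψ are the angles at the hydrogen and at the acceptor, χ is a dihedral angle, and W is a weight.
(Kortemme, Morozov & Baker 2003 JMB; slide from Jeff Gray)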
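The statistical idea behind ΔG = -kT ln P can be sketched by turning a histogram of one hydrogen-bond geometry feature into energies. The counts are hypothetical, `kT = 1` (energies in kT units) and the pseudocount for empty bins are assumptions, and Kortemme et al.'s actual binning is not reproduced.

```python
import math


def knowledge_based_energy(counts, kT=1.0, pseudocount=1.0):
    """E = -kT ln P for each histogram bin, with P the (pseudocount-smoothed)
    observed frequency of that bin."""
    total = sum(counts) + pseudocount * len(counts)
    return [-kT * math.log((c + pseudocount) / total) for c in counts]


# Hypothetical counts of hydrogen-acceptor distances in four distance bins
counts = [5, 50, 20, 1]
energies = knowledge_based_energy(counts)
# The most frequently observed bin gets the lowest energy
best_bin = min(range(len(counts)), key=lambda i: energies[i])
```

Energy differences depend only on ratios of frequencies: a bin observed ten times more often is favored by kT ln 10.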
Rotamer preference
Score = … + Sdunbrack + …
(Dunbrack, 1997)
Scoring Function: Summary
One long, generic function …
Score = Senv + Spair + Srg + Scb + Svdw + Sss + Ssheet + Shs + Srama + Shb(srbb + lrbb) + docking_score + Sdisulf_cent + Srs + Sco + Scontact_prediction + Sdipolar + Sprojection + Spc + Stether + Sfy + Sw + Ssymmetry + Ssplicemsd + …
docking_score = Sd_env + Sd_pair + Sd_contact + Sd_vdw + Sd_site_constr + Sd + Sfab_score
Score = SLJ(atr + rep) + Ssolvation + Shb(srbb + lrbb + sc) + Sdunbrack + Spair - Sref + Sprob1b + Sintrares + Sgb_elec + Sgsolt + Sh2o(solv + hb) + S_plane
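A generic function of this kind can be read as a weighted sum of terms, Score = Σ w_t S_t, so switching between the low- and high-resolution functions amounts to switching the weight set. A minimal sketch; the term values and weights below are hypothetical, with keys named after the slide's terms.

```python
def total_score(terms, weights):
    """Score = sum over terms of weight * term value; unweighted terms count as 0."""
    return sum(weights.get(name, 0.0) * value for name, value in terms.items())


# Hypothetical term values for one model, keyed by the slide's term names
terms = {"Senv": -12.0, "Spair": -3.0, "SLJ_atr": -100.0,
         "SLJ_rep": 20.0, "Shb": -10.0, "Ssolvation": 30.0}
# Hypothetical weight sets: low-resolution vs. high-resolution scoring
low_res = {"Senv": 1.0, "Spair": 1.0}
high_res = {"SLJ_atr": 1.0, "SLJ_rep": 0.5, "Shb": 2.0, "Ssolvation": 1.0}
print(total_score(terms, low_res))    # -15.0
print(total_score(terms, high_res))   # -80.0
```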
Representations of protein structure: Cartesian and polar coordinates
Position  PHI    PSI     OMEGA    CHI1    CHI2  CHI3  CHI4
1         0.00   -60.00  -180.00  -60.00  0.00  0.00  0.00
2         …
3         …

PDB x y z
ATOM 490 N  GLN A 31 52.013 -87.359  -8.797 1.00  7.06 N
ATOM 491 CA GLN A 31 52.134 -87.762 -10.201 1.00  8.67 C
ATOM 492 C  GLN A 31 51.726 -89.222 -10.343 1.00 10.90 C
ATOM 493 O  GLN A 31 51.015 -89.601 -11.275 1.00  9.63 O
…
2 ways to represent the protein structure
Cartesian coordinates (x, y, z; PDB format)
+ Intuitive: look at molecules in space
+ Easy calculation of the energy score (based on atom-atom distances)
– Difficult to change the conformation of the structure (while keeping bond lengths and bond angles unchanged)
Polar coordinates (φ, ψ, ω; equilibrium angles and bond lengths)
+ Compact (3 values per residue)
+ Easy changes of protein structure (rotate around one or more dihedral angles)
– Non-intuitive
– Difficult to evaluate the energy score (calculation of the neighbor matrix is complicated)
A snake in the 2D world
• Cartesian representation:
– points: (0,0), (1,1), (1,2), (2,2), (3,3)
– connections (predefined): 1-2, 2-3, 3-4, 4-5
[Figure: points 1-5 plotted on x-y axes, joined by the predefined connections]
• Internal coordinates:
– bond lengths (predefined): √2, 1, 1, √2
– angles: 45°, 90°, 0°, 45° (the direction of each bond, measured from the x-axis)
[Figure: the snake with its bond lengths and angles labeled; from Wikipedia]
A snake wiggling in the 2D world
• Constraint: keep bond length fixed
• Move in Cartesian representation:
(0,0), (1,1), (1,2), (2,2), (3,3) → (0,0), (1,1), (1,2), (2,2), (3,0)
Bond length changed! The last bond, from (2,2) to (3,0), now has length √5 instead of √2.
• Move in polar coordinates:
45°, 90°, 0°, 45° → 45°, 90°, 45°, 45°
Bond lengths unchanged! But a large impact on structure: changing the third angle moves every point after it (points 4 and 5).
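The two snake moves can be checked numerically. A small sketch, assuming (as in the figures) that each bond is described by its length and its direction measured from the x-axis; `to_cartesian` and `bond_lengths` are illustrative helpers.

```python
import math


def to_cartesian(lengths, angles_deg, start=(0.0, 0.0)):
    """Rebuild the snake from bond lengths and bond directions (degrees from the x-axis)."""
    x, y = start
    points = [(x, y)]
    for r, a in zip(lengths, angles_deg):
        x += r * math.cos(math.radians(a))
        y += r * math.sin(math.radians(a))
        points.append((x, y))
    return points


def bond_lengths(points):
    """Distances between consecutive points."""
    return [math.dist(p, q) for p, q in zip(points, points[1:])]


lengths = [math.sqrt(2), 1.0, 1.0, math.sqrt(2)]

# Move in Cartesian coordinates: drag the last point from (3,3) to (3,0)
cart_moved = [(0, 0), (1, 1), (1, 2), (2, 2), (3, 0)]
print(bond_lengths(cart_moved))    # last bond is now sqrt(5) instead of sqrt(2)

# Move in polar coordinates: third bond direction 0 -> 45 degrees
polar_moved = to_cartesian(lengths, [45, 90, 45, 45])
print(bond_lengths(polar_moved))   # same lengths as before; points 4 and 5 have moved
```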
Polar → Cartesian coordinates
Convert r and θ to x and y: x = r cos θ, y = r sin θ, added bond by bond starting from (0,0)
√2, 1, 1, √2 and 45°, 90°, 0°, 45° → (0,0), (1,1), (1,2), (2,2), (3,3)
(Figure from Wikipedia)
Cartesian → polar coordinates
Convert x and y to r and θ for each bond vector (Δx, Δy): r = √(Δx² + Δy²), θ = atan2(Δy, Δx)
(0,0), (1,1), (1,2), (2,2), (3,3) → √2, 1, 1, √2 and 45°, 90°, 0°, 45°
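Both conversions can be written for the 2D snake as follows; the helper names are illustrative. Note that the polar form only stores bonds, so the start point (here (0,0)) must be supplied to rebuild the Cartesian positions.

```python
import math


def to_polar(points):
    """Cartesian -> polar: length and direction (degrees from the x-axis) of each bond."""
    lengths, angles = [], []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        lengths.append(math.hypot(dx, dy))                 # r = sqrt(dx^2 + dy^2)
        angles.append(math.degrees(math.atan2(dy, dx)))    # theta = atan2(dy, dx)
    return lengths, angles


def to_cartesian(lengths, angles, start=(0.0, 0.0)):
    """Polar -> Cartesian: x += r cos(theta), y += r sin(theta), bond by bond."""
    x, y = start
    points = [(x, y)]
    for r, theta in zip(lengths, angles):
        x += r * math.cos(math.radians(theta))
        y += r * math.sin(math.radians(theta))
        points.append((x, y))
    return points


snake = [(0, 0), (1, 1), (1, 2), (2, 2), (3, 3)]
lengths, angles = to_polar(snake)     # about sqrt(2), 1, 1, sqrt(2) and 45, 90, 0, 45
rebuilt = to_cartesian(lengths, angles)
```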
Moving the snake to the 3D world
• Cartesian representation: points with an additional z-axis
– (0,0,0), (1,1,0), (1,2,0), (2,2,0), (3,3,0)
– connections (predefined): 1-2, 2-3, 3-4, 4-5
• Internal coordinates:
– bond lengths (predefined): √2, 1, 1, √2
– angles: 45°, 90°, 0°, 45° (bond directions in the x-y plane, as in 2D)
– dihedral angles: 180°, 180°
Proteins: bond lengths and angles are fixed; only dihedral angles are varied.
Dihedral angles
• Dihedral angle: defines the geometry of 4 consecutive atoms (given bond lengths and angles)
• Dihedral angles χ1-χ4 define the side chain
(Figure from Wikipedia)
What we learned from our snake
• Cartesian representation: easy to look at, difficult to move
– Moves do not preserve bond lengths (and angles in 3D)
• Internal coordinates: easy to move, difficult to see
– Calculating distances between points is not trivial
Solution: toggle
CALCULATE ENERGY - Cartesian coordinates:
• Derive the distance matrix (neighbor list) for energy score calculation
• Transform: build positions in space (PDB x, y, z) according to the dihedral angles
MOVE STRUCTURE - Polar coordinates:
• Introduce changes in structure by rotating around dihedral angle(s) (change φ, ψ values)
• Transform: calculate dihedral angles (PHI, PSI, OMEGA, CHI1-CHI4) from coordinates
Cartesian → polar coordinates
(0,0), (1,1), (1,2), (2,2), (3,3) → 45°, 90°, 0°, 45°
Position  PHI     PSI     OMEGA    CHI1  CHI2  CHI3  CHI4
…
32        -59.00  -60.00  -180.00  0.00  0.00  0.00  0.00
33        …
34        …

PDB x y z
…
ATOM 490 C  GLN A 31 52.013 -87.359  -8.797 1.00  7.06 N
ATOM 491 N  GLY A 32 52.134 -87.762 -10.201 1.00  8.67 C
ATOM 492 CA GLY A 32 51.726 -89.222 -10.343 1.00 10.90 C
ATOM 493 O  GLY A 32 51.015 -89.601 -11.275 1.00  9.63 O
…
How to calculate polar from Cartesian coordinates, example φ (C'-N-Cα-C):
– Define the plane perpendicular to the N-Cα (b2) vector
– Calculate the projections of Cα-C (b3) and C'-N (b1) onto this plane
– Calculate the angle between the projections
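The three projection steps above can be sketched in plain Python. The sign convention (positive when the near bond must be turned clockwise to eclipse the far bond, viewed along the central bond) is the usual IUPAC convention and is an assumption here; the slides do not specify it. As a check, the planar 3D snake gives ±180° for both of its dihedrals, matching the 180°, 180° listed for it.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees, in [-180, 180].

    For phi: p0 = C' (previous residue), p1 = N, p2 = CA, p3 = C.
    """
    b1 = sub(p0, p1)                     # C'-N bond, pointing from N to C'
    b2 = sub(p2, p1)                     # N-CA bond: the rotation axis
    b3 = sub(p3, p2)                     # CA-C bond
    length = math.sqrt(dot(b2, b2))
    b2 = tuple(x / length for x in b2)
    # project b1 and b3 onto the plane perpendicular to the N-CA axis
    v = sub(b1, tuple(dot(b1, b2) * x for x in b2))
    w = sub(b3, tuple(dot(b3, b2) * x for x in b2))
    # signed angle between the two projections, measured around the axis
    return math.degrees(math.atan2(dot(cross(b2, v), w), dot(v, w)))


# The planar 3D snake: both dihedrals are +/-180 degrees (trans)
snake = [(0, 0, 0), (1, 1, 0), (1, 2, 0), (2, 2, 0), (3, 3, 0)]
print([round(abs(dihedral(*snake[i:i + 4])), 6) for i in range(2)])   # [180.0, 180.0]
```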
Polar → Cartesian coordinates
Find the x, y, z coordinates of C, based on the atom positions of C', N and Cα and a given φ value (φ: C'-N-Cα-C):
• Create the Cα-C vector:
– length Cα-C = 1.51 Å (equilibrium bond length)
– angle N-Cα-C = 111° (equilibrium value for the N-Cα-C angle)
• Rotate the vector around the N-Cα axis until the projections of Cα-C and N-C' give the wanted φ
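The construction on this slide (place C from C', N, Cα, a bond length, a bond angle and φ) is what is often called the natural extension reference frame (NeRF) method. A sketch using the slide's values (1.51 Å, 111°); the input coordinates are hypothetical, and a projection-based `dihedral` is included only to check the result.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)


def place_atom(a, b, c, bond_length, bond_angle, torsion):
    """Position of d such that |cd| = bond_length, angle b-c-d = bond_angle and
    dihedral a-b-c-d = torsion (degrees). Slide case: a=C', b=N, c=CA, d=C."""
    bc = unit(sub(c, b))                 # the rotation axis (N-CA)
    n = unit(cross(sub(b, a), bc))       # normal of the a-b-c plane
    m = cross(n, bc)                     # completes a right-handed frame
    theta, tau = math.radians(bond_angle), math.radians(torsion)
    # d in the local frame: set by the bond angle, then rotated by the torsion
    local = (-bond_length * math.cos(theta),
             bond_length * math.sin(theta) * math.cos(tau),
             bond_length * math.sin(theta) * math.sin(tau))
    return tuple(c[k] + local[0] * bc[k] + local[1] * m[k] + local[2] * n[k]
                 for k in range(3))


def dihedral(p0, p1, p2, p3):
    """Signed dihedral p0-p1-p2-p3 in degrees (projection method)."""
    axis = unit(sub(p2, p1))
    v, w = sub(p0, p1), sub(p3, p2)
    v = sub(v, tuple(dot(v, axis) * x for x in axis))
    w = sub(w, tuple(dot(w, axis) * x for x in axis))
    return math.degrees(math.atan2(dot(cross(axis, v), w), dot(v, w)))


# Hypothetical positions of C' (previous residue), N and CA, in angstroms
c_prev, n_atom, ca = (-1.2, 0.8, 0.3), (0.0, 0.0, 0.0), (1.46, 0.0, 0.0)
c_atom = place_atom(c_prev, n_atom, ca, 1.51, 111.0, -60.0)
print(round(dihedral(c_prev, n_atom, ca, c_atom), 6))   # -60.0
```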
Representation of protein structure in Rosetta folding
• 3 backbone dihedral angles per residue
• Sampling and minimization in TORSIONAL space: change an angle and rebuild, starting from the changed angle
• Build the coordinates of the structure starting from the first atom, according to the dihedral angles (and equilibrium bond lengths and angles)
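Building a whole backbone from its torsions, and reading the torsions back, is the toggle between the two representations. A sketch assuming ideal N, Cα, C backbone geometry: the Cα-C length and N-Cα-C angle are the slide's values, the other lengths and angles are approximate values chosen for illustration, and side chains and the carbonyl O are ignored.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)


def place_atom(a, b, c, length, angle, torsion):
    """Position d with |cd| = length, angle b-c-d = angle, dihedral a-b-c-d = torsion."""
    bc = unit(sub(c, b))
    n = unit(cross(sub(b, a), bc))
    m = cross(n, bc)
    th, tau = math.radians(angle), math.radians(torsion)
    loc = (-length * math.cos(th), length * math.sin(th) * math.cos(tau),
           length * math.sin(th) * math.sin(tau))
    return tuple(c[k] + loc[0] * bc[k] + loc[1] * m[k] + loc[2] * n[k] for k in range(3))


def dihedral(p0, p1, p2, p3):
    axis = unit(sub(p2, p1))
    v, w = sub(p0, p1), sub(p3, p2)
    v = sub(v, tuple(dot(v, axis) * x for x in axis))
    w = sub(w, tuple(dot(w, axis) * x for x in axis))
    return math.degrees(math.atan2(dot(cross(axis, v), w), dot(v, w)))


# Assumed geometry: CA-C and N-CA-C from the slides, the rest approximate
N_CA, CA_C, C_N = 1.46, 1.51, 1.33                         # bond lengths (angstroms)
ANG_N_CA_C, ANG_CA_C_N, ANG_C_N_CA = 111.0, 116.0, 122.0   # bond angles (degrees)


def build_backbone(torsions):
    """N, CA, C coordinates from per-residue (phi, psi, omega) in degrees.

    The first residue's phi is unused. The chain is built from the first atom
    onward, so changing a torsion only moves atoms placed after it.
    """
    t = math.radians(180.0 - ANG_N_CA_C)
    atoms = [(0.0, 0.0, 0.0), (N_CA, 0.0, 0.0),
             (N_CA + CA_C * math.cos(t), CA_C * math.sin(t), 0.0)]
    for i in range(1, len(torsions)):
        _, psi_prev, omega_prev = torsions[i - 1]
        atoms.append(place_atom(*atoms[-3:], C_N, ANG_CA_C_N, psi_prev))         # N_i
        atoms.append(place_atom(*atoms[-3:], N_CA, ANG_C_N_CA, omega_prev))      # CA_i
        atoms.append(place_atom(*atoms[-3:], CA_C, ANG_N_CA_C, torsions[i][0]))  # C_i
    return atoms


def measure_torsions(atoms):
    """Back to internal coordinates: (phi, psi, omega) per residue, None if undefined."""
    n_res = len(atoms) // 3
    out = []
    for i in range(n_res):
        n, ca, c = atoms[3 * i:3 * i + 3]
        phi = dihedral(atoms[3 * i - 1], n, ca, c) if i > 0 else None
        if i < n_res - 1:
            psi = dihedral(n, ca, c, atoms[3 * i + 3])
            omega = dihedral(ca, c, atoms[3 * i + 3], atoms[3 * i + 4])
        else:
            psi = omega = None
        out.append((phi, psi, omega))
    return out
```

Changing one torsion and rebuilding leaves every atom placed before it untouched and moves everything after it, which is why a single angle change can have a large effect on the structure.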
Based on slides by Chu Wang