2. Introduction to Rosetta and structural modeling
• Approaches for structural modeling of proteins
• The Rosetta framework and its prediction modes
• Cartesian and polar coordinates
• Sampling (finding the structure) and scoring (selecting the structure)
Structural Modeling of Proteins - Approaches
Prediction of Structure from Sequence
Flowchart:
1. Comparison of query sequence to nr database
2. Similar to a sequence of known structure?
   Yes: Homology Modeling (Comparative Modeling)
   No: Fold Recognition (Threading). Fits a known fold?
      Yes: model by threading onto that fold
      No: Ab initio prediction
Protocols: ab initio, loops, side chains, active sites….
The Rosetta framework and its prediction modes
The Rosetta Strategy
• Observation: local sequence preferences bias, but do not uniquely define the local structure of a protein
• Goal: mimic interplay of local and global interactions that determine protein structure
Local interactions: fragments
• Derived from known structures
• Sampled for similar sequences/secondary structure propensity
• Fragment library represents accessible local structures for a short sequence
Global (non-local) interactions: scoring function
• Buried hydrophobic residues, paired strands, specific side chain interactions, etc.
• Derived from known structures (statistics on preferred conformations)
• Boltzmann’s principle relates frequency to energy
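The link between frequency and energy can be illustrated with an inverse Boltzmann relation, E = -kT ln(P_observed / P_reference). The sketch below is only an illustration: the counts, the uniform reference state and the function name are assumptions, not values or code from Rosetta.

```python
import math

def boltzmann_energy(p_observed, p_reference, kT=1.0):
    """Inverse Boltzmann: features seen more often than expected get negative energy."""
    return -kT * math.log(p_observed / p_reference)

# Hypothetical counts: how often one residue type is found buried vs. exposed.
counts = {"buried": 800, "exposed": 200}
total = sum(counts.values())
p_ref = 1.0 / len(counts)  # assumed uniform reference state

energies = {state: boltzmann_energy(n / total, p_ref) for state, n in counts.items()}
```

With these made-up counts the frequently observed state ("buried") receives a favorable (negative) energy and the rare state an unfavorable one.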
A short history of Rosetta
In the beginning: ab initio modeling of protein structure starting from sequence
• Short fragments of known proteins are assembled by a Monte Carlo strategy to yield native-like protein conformations
• Reliable fold identification for short proteins. Recently improved to high-resolution models (within 2 Å RMSD)
Example sequence: ATCSFFGRKLL…..
Success of the ab initio protocol led to extension to:
• Protein design
• Design of a new fold: TOP7
• Protein loop modeling; homology modeling
• Protein-protein docking; protein interface design
• Protein-ligand docking
• Protein-DNA interactions; RNA modeling
• Many more, e.g. solving the phase problem in X-ray crystallography
More recent additions
• BOINC (Rosetta@home)
• FoldIt
• RosettaScripts; RosettaDiagrams
• PyRosetta
Scoring and Sampling
The basic assumption in structure prediction
The native structure is located at the global minimum (free) energy conformation (GMEC)
➜ A good energy function can select the correct model among decoys
➜ A good sampling technique can find the GMEC in the rugged landscape
[Figure: energy landscape, E plotted over conformation space, with the GMEC marked]
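Sampling a rugged landscape can be sketched with a Metropolis Monte Carlo search. The example is a toy: the one-dimensional energy function, the temperature and the step size are all assumptions chosen for illustration, and it is not Rosetta's sampler.

```python
import math
import random

def energy(x):
    """Toy rugged landscape: a broad funnel plus ripples (hypothetical)."""
    return 0.1 * x * x + math.sin(5.0 * x)

def metropolis(steps=20000, kT=0.5, step_size=0.5, seed=0):
    rng = random.Random(seed)
    x = 8.0                      # start far from the funnel bottom
    e = energy(x)
    best_x, best_e = x, e
    for _ in range(steps):
        x_new = x + rng.uniform(-step_size, step_size)
        e_new = energy(x_new)
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / kT):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

best_x, best_e = metropolis()
```

Accepting some uphill moves is what lets the search leave local minima; the energy function then decides which of the visited conformations is reported as best.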
Two-Step Procedure
1. Low-resolution step locates potential minima (fast)
2. Cluster analysis identifies broadest basins in landscape
3. High-resolution step can identify lowest energy minimum in the basins (slow)
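The cluster-analysis step can be sketched as a greedy clustering of low-resolution decoys. This is a minimal sketch under strong assumptions: each decoy is reduced to a single hypothetical coordinate, and the distance between two numbers stands in for a structural distance such as RMSD.

```python
def greedy_clusters(decoys, radius):
    """Repeatedly take the decoy with the most neighbors within radius as a cluster center."""
    remaining = list(decoys)
    clusters = []
    while remaining:
        center = max(remaining, key=lambda c: sum(abs(c - d) <= radius for d in remaining))
        members = [d for d in remaining if abs(center - d) <= radius]
        clusters.append((center, members))
        remaining = [d for d in remaining if abs(center - d) > radius]
    return clusters

# Hypothetical decoys: five fall into one broad basin, two into another, one is alone.
decoys = [0.9, 1.0, 1.1, 1.2, 1.05, 4.0, 4.1, 7.5]
clusters = greedy_clusters(decoys, radius=0.35)
sizes = [len(members) for _, members in clusters]
```

The largest cluster comes first, which corresponds to the broadest basin that would be passed on to the high-resolution step.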
[Figure: energy landscape, E plotted over conformation space, with the GMEC marked]
Nature uses one scoring function…
Aim: one generic function for different applications
Optimization of parameters:
• Originally: from small molecules (experiments & quantum mechanical calculations)
• Today: use of protein structures solved at high accuracy
How are scoring terms optimized?
Benchmarks:
• Discriminate ground state from alternative conformations
• Identify correct side chain conformation
• Predict the effect of point mutations on stability (ΔΔG)
Leaver-Fay, …, & Baker (2013) Methods in Enzymology 523:109
Low-Resolution Step (e.g. score4)
Structure Representation:
• Equilibrium bonds and angles (Engh & Huber 1991)
• Centroid: average location of center of mass of side-chain (Centroid | aa, φ, ψ)
• No modeling of side chains
• Fast
Bayes' theorem:
P(str | seq) = P(str) * P(seq | str) / P(seq)
• Independent components prevent over-counting
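One way to see how Bayes' theorem leads to an additive score is to take the negative logarithm of both sides. The derivation below assumes that the score is defined as a negative log-probability; that convention is an assumption here, not something stated on the slide.

```latex
\begin{aligned}
P(\mathrm{str}\mid\mathrm{seq}) &= \frac{P(\mathrm{str})\,P(\mathrm{seq}\mid\mathrm{str})}{P(\mathrm{seq})} \\
-\ln P(\mathrm{str}\mid\mathrm{seq}) &= -\ln P(\mathrm{str}) \;-\; \ln P(\mathrm{seq}\mid\mathrm{str}) \;+\; \ln P(\mathrm{seq})
\end{aligned}
```

For a fixed sequence, ln P(seq) is the same for every candidate structure, so only the P(str) and P(seq | str) terms matter when ranking structures. If the components are independent, each probability factorizes and its logarithm becomes a sum of separate terms.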
Low-Resolution Scoring Function
• P(seq): constant
• P(seq | str): sequence-dependent features
• P(str): structure-dependent features
Sequence-Dependent Components
Bayes' theorem: P(str | seq) = P(str) * P(seq | str) / P(seq); this slide covers the P(seq | str) term
Score = Senv + Spair + …
neighbors: Cβ-Cβ < 10 Å
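The neighbor count behind an environment-type term can be sketched as a simple distance test. The coordinates below are hypothetical, and the function only counts neighbors within the 10 Å cutoff; how the count is converted into Senv is not shown.

```python
import math

def count_neighbors(points, cutoff=10.0):
    """For each point, count the other points closer than cutoff (in Å)."""
    counts = []
    for i, p in enumerate(points):
        n = sum(1 for j, q in enumerate(points) if i != j and math.dist(p, q) < cutoff)
        counts.append(n)
    return counts

# Hypothetical Cβ coordinates (Å) for four residues along a line.
cb = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (9.0, 0.0, 0.0), (25.0, 0.0, 0.0)]
neighbors = count_neighbors(cb)
```

The last residue has no neighbors within the cutoff, so it would be treated as exposed, while the first three would be treated as more buried.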
Rohl et al. (2004) Methods in Enzymology 383:66
Origin: Simons et al., JMB 1997; Simons et al., Proteins 1999
Structure-Dependent Components
Bayes' theorem: P(str | seq) = P(str) * P(seq | str) / P(seq); this slide covers the P(str) term
Score = … + Srg + Sc + Svdw + …
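A compactness term such as Srg is built on the radius of gyration. The sketch below computes only that geometric quantity for two hypothetical sets of points; the way Rosetta turns it into a score is not reproduced here.

```python
import math

def radius_of_gyration(points):
    """Root-mean-square distance of the points from their centroid."""
    n = len(points)
    center = [sum(p[k] for p in points) / n for k in range(3)]
    mean_sq = sum(math.dist(p, center) ** 2 for p in points) / n
    return math.sqrt(mean_sq)

# Hypothetical coordinates: a compact square and an extended line of four points.
compact = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
extended = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
rg_compact = radius_of_gyration(compact)
rg_extended = radius_of_gyration(extended)
```

The compact arrangement has the smaller radius of gyration, which is the property a compactness term rewards.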
Score = … + Srama
High-Resolution Step (e.g. score12; Talaris; …)
• Slow, exact step
• Locates global energy minimum
Structure Representation:
• All-atom (including polar and non-polar hydrogens, but no water)
• Side chains as rotamers from backbone-dependent library
• Side chain conformation adjusted frequently
High-Resolution Step: Rotamer Libraries (Dunbrack 1997)
• Side chains have preferred conformations
• They are summarized in rotamer libraries
• Select one rotamer for each position
• Best conformation: lowest-energy combination of rotamers
Serine χ1 preferences: t = 180°, g- = -60°, g+ = +60°
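Choosing the lowest-energy combination of rotamers can be sketched by exhaustive enumeration on a toy problem. All positions, rotamer names and energies below are hypothetical; enumeration is only feasible for tiny cases and is not how a production packer searches.

```python
from itertools import product

# Hypothetical rotamer choices per position and made-up energies.
rotamers = {1: ["t", "g-", "g+"], 2: ["t", "g-"]}
self_energy = {(1, "t"): 0.0, (1, "g-"): -0.5, (1, "g+"): 0.3,
               (2, "t"): -0.2, (2, "g-"): 0.1}
pair_energy = {((1, "g-"), (2, "t")): 2.0}  # these two rotamers clash

def total_energy(assignment):
    """Sum of one-body energies plus pairwise energies for one rotamer per position."""
    e = sum(self_energy[choice] for choice in assignment)
    for i in range(len(assignment)):
        for j in range(i + 1, len(assignment)):
            e += pair_energy.get((assignment[i], assignment[j]), 0.0)
    return e

positions = sorted(rotamers)
candidates = [tuple(zip(positions, combo))
              for combo in product(*(rotamers[p] for p in positions))]
best = min(candidates, key=total_energy)
```

Picking the best rotamer at each position independently would give g- at position 1 and t at position 2, which clash here; the best combination is a different one, which is why the rotamers have to be selected together.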
High-Resolution Scoring Function
• Major contributions:
  – Burial of hydrophobic groups away from water
  – Void-free packing of buried groups and atoms
  – Buried polar atoms form intra-molecular hydrogen bonds
Packing interactions
Score = SLJ(atr + rep) + ….
• rij: distance between atoms i and j
• Linearized repulsive part
• ε: well depth from CHARMm19
(new in score12’: starts from minimum)
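A Lennard-Jones term with a linearized repulsive part can be sketched as follows. This is a generic 12-6 form with well depth eps at distance r_min; the switch point (a fraction of r_min, set to 0.6 here) and all function names are assumptions for illustration, not the exact Rosetta definition.

```python
def lj(r, r_min, eps):
    """12-6 Lennard-Jones energy with minimum -eps at r = r_min."""
    x = r_min / r
    return eps * (x ** 12 - 2.0 * x ** 6)

def lj_slope(r, r_min, eps):
    """Derivative dE/dr of the 12-6 form above."""
    x = r_min / r
    return 12.0 * eps * (x ** 6 - x ** 12) / r

def lj_linearized(r, r_min, eps, switch=0.6):
    """Below switch * r_min, continue the curve as a straight line (assumed switch)."""
    r_switch = switch * r_min
    if r >= r_switch:
        return lj(r, r_min, eps)
    return lj(r_switch, r_min, eps) + lj_slope(r_switch, r_min, eps) * (r - r_switch)

e_at_minimum = lj_linearized(4.0, 4.0, 0.1)
e_clash_linear = lj_linearized(1.0, 4.0, 0.1)
e_clash_plain = lj(1.0, 4.0, 0.1)
```

Linearizing the repulsive wall keeps the penalty for a small overlap finite and smooth, so slightly clashing models are not rejected outright during sampling.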
Implicit solvation
Score = … + Ssolvation + ….
Lazaridis & Karplus, Proteins 1999
• Solvation free energy density of atom i, written in terms of xij = (rij - Ri)/λi
• The pair term contains both xij² and xji²
[Figure labels: polar]
Hydrogen Bonding Energy
Based on statistics from high-resolution structures in the PDB
(Kortemme, Morozov & Baker 2003 JMB)
Slide from Jeff Gray
Score = …. + Shb(srbb+lrbb+sc) + ….
srbb: short range, backbone HB
lrbb: long range, backbone HB
sc: HB with side chain atom
Rotamer preference
Score = … + Sdunbrack + ….
Dunbrack, 1997
One long, generic function ….
Score = Senv+ Spair + Srg + Sc+ Svdw + Sss+ Ssheet+ Shs + Srama + Shb (srbb + lrbb) + docking_score + Sdisulf_cent+ Sr+ Sco + Scontact_prediction + Sdipolar+ Sprojection + Spc+ Stether+ S+ S+ Ssymmetry + Ssplicemsd + …..
docking_score = Sd env+ Sd pair + Sd contact+ Sd vdw+ Sd site constr + Sd + Sfab score
Score = SLJ(atr + rep) + Ssolvation + Shb(srbb+lrbb+sc) + Sdunbrack + Spair – Sref + Sprob1b + Sintrares + Sgb_elec + Sgsolt + Sh2o(solv + hb) + S_plane
Scoring Function: Summary
One long, generic function …. A weighted sum of different terms
Score12 = w1*SLJatr + w2*SLJrep + w3*Ssolvation + w4*Shb(srbb+lrbb+sc) + w5*Sdunbrack + w6*Spair – Sref
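The weighted sum can be sketched in a few lines. The weights, the term values and the reference energy below are hypothetical placeholders, not the actual score12 parameters.

```python
# Hypothetical weights w1..w6 and raw term values for one model.
weights = {"lj_atr": 0.8, "lj_rep": 0.4, "solvation": 0.6,
           "hbond": 1.0, "dunbrack": 0.5, "pair": 0.5}
terms = {"lj_atr": -10.0, "lj_rep": 2.0, "solvation": 3.0,
         "hbond": -4.0, "dunbrack": 1.0, "pair": -1.0}

def total_score(terms, weights, reference=0.0):
    """Weighted sum of score terms minus a reference energy (Sref)."""
    return sum(weights[name] * value for name, value in terms.items()) - reference

score = total_score(terms, weights, reference=1.0)
```

Keeping the weights separate from the term values is what makes it possible to re-fit the weights without touching the individual terms.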
Leaver-Fay, …, & Baker (2013) Methods in Enzymology 523:109
How can it be improved?
• Feature Analysis Tool: improve parameters
• OptE: optimize weights
Feature Analysis: improve scoring term
Leaver-Fay, …, & Baker (2013) Methods in Enzymology 523:109
Aim: similar distributions in crystal structures and models
e.g. HB distance H-O in Ser & Thr
After correction: distributions in native & model structures overlap
OptE: optimize weights
Leaver-Fay, …, & Baker (2013) Methods in Enzymology 523:109
Score12 = w1*SLJatr + w2*SLJrep + w3*Ssolvation + w4*Shb(srbb+lrbb+sc) + w5*Sdunbrack + w6*Spair – Sref
Maximum Likelihood Parameter Estimation
Benchmarks:
• Discriminate ground state from alternative conformations
• Identify correct side chain conformation
• Sequence recovery in design: choose correct amino acid residue
• Predict the effect of point mutations on stability (ΔΔG)
• & more …
Aim: Best score for correct prediction
Representations of protein structure: Cartesian and polar coordinates
Polar (torsion) representation:
Position  PHI    PSI     OMEGA    CHI1    CHI2  CHI3  CHI4
1         0.00   -60.00  -180.00  -60.00  0.00  0.00  0.00
2
3
….

PDB (Cartesian) representation; the coordinate columns are x, y, z:
ATOM  490  N   GLN A  31   52.013  -87.359   -8.797  1.00   7.06  N
ATOM  491  CA  GLN A  31   52.134  -87.762  -10.201  1.00   8.67  C
ATOM  492  C   GLN A  31   51.726  -89.222  -10.343  1.00  10.90  C
ATOM  493  O   GLN A  31   51.015  -89.601  -11.275  1.00   9.63  O
….
2 ways to represent the protein structure
Cartesian coordinates (x,y,z; pdb format)
+ Intuitive: look at molecules in space
+ Easy calculation of energy score (based on atom-atom distances)
– Difficult to change conformation of structure (while keeping bond length and bond angle unchanged)
Polar coordinates (dihedral angles; equilibrium angles and bond lengths)
+ Compact (3 values/residue)
+ Easy changes of protein structure (turn around one or more dihedral angles)
– Non-intuitive
– Difficult to evaluate energy score (calculation of neighboring matrix complicated)
A snake in the 2D world
• Cartesian representation:
  points: (0,0),(1,1),(1,2),(2,2),(3,3)
  connections (predefined): 1-2, 2-3, 3-4, 4-5
[Figure: the snake drawn in the x-y plane, points 1 to 5 joined by connections 1-2, 2-3, 3-4, 4-5]
A snake in the 2D world
• Internal coordinates:
  bond lengths (predefined): √2, 1, 1, √2
  angles: 45°, 90°, 0°, 45°
[Figure: the snake in the x-y plane with bond lengths and angles marked; polar coordinate figure from Wikipedia]
A snake wiggling in the 2D world
• Constraint: keep bond length fixed
• Move in Cartesian representation:
  (0,0),(1,1),(1,2),(2,2),(3,3) → (0,0),(1,1),(1,2),(2,2),(3,0)
Bond length changed! (the last bond, from (2,2) to (3,0), now has length √5 instead of √2)
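The effect of the Cartesian move on the bond lengths can be checked directly. This is a small sketch using the snake coordinates from the slide; the helper name is an assumption.

```python
import math

def bond_lengths(points):
    """Distances between consecutive points of the chain."""
    return [math.dist(a, b) for a, b in zip(points, points[1:])]

before = [(0, 0), (1, 1), (1, 2), (2, 2), (3, 3)]
after = [(0, 0), (1, 1), (1, 2), (2, 2), (3, 0)]  # only the last point was moved

lengths_before = bond_lengths(before)
lengths_after = bond_lengths(after)
```

Moving a single point in Cartesian space leaves the first three bonds untouched but changes the length of the last one, which violates the fixed-bond-length constraint.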
A snake wiggling in the 2D world
• Constraint: keep bond length fixed
• Move in polar coordinates:
  45°, 90°, 0°, 45° → 45°, 90°, 45°, 45°
Bond length unchanged! Large impact on structure
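The same kind of move in internal coordinates can be sketched by rebuilding the snake from bond lengths and angles. The angles are interpreted here as bond directions measured from the x-axis, which is an assumption that reproduces the points given on the slides.

```python
import math

def build_snake(lengths, angles_deg, start=(0.0, 0.0)):
    """Place each point from the previous one using x = r*cos(theta), y = r*sin(theta)."""
    points = [start]
    for r, theta in zip(lengths, angles_deg):
        x, y = points[-1]
        t = math.radians(theta)
        points.append((x + r * math.cos(t), y + r * math.sin(t)))
    return points

lengths = [math.sqrt(2), 1.0, 1.0, math.sqrt(2)]
original = build_snake(lengths, [45, 90, 0, 45])
moved = build_snake(lengths, [45, 90, 45, 45])   # only the third angle changed

moved_lengths = [math.dist(a, b) for a, b in zip(moved, moved[1:])]
```

Changing one angle moves every point downstream of it, yet all bond lengths stay exactly as predefined.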
Polar → Cartesian coordinates
Convert r and θ to x and y
bond lengths √2, 1, 1, √2 and angles 45°, 90°, 0°, 45° → points (0,0),(1,1),(1,2),(2,2),(3,3)
[Figure: polar coordinate conversion, from Wikipedia]
Cartesian → polar coordinates
Convert x and y to r and θ
points (0,0),(1,1),(1,2),(2,2),(3,3) → bond lengths √2, 1, 1, √2 and angles 45°, 90°, 0°, 45°
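The conversion from Cartesian points to bond lengths and angles can be sketched with `math.hypot` and `math.atan2`. As before, each angle is taken as the bond direction relative to the x-axis, an assumption that matches the slide values.

```python
import math

def to_internal(points):
    """Return (bond lengths, bond direction angles in degrees) for a 2D chain."""
    lengths, angles = [], []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        lengths.append(math.hypot(dx, dy))
        angles.append(math.degrees(math.atan2(dy, dx)))
    return lengths, angles

snake = [(0, 0), (1, 1), (1, 2), (2, 2), (3, 3)]
lengths, angles = to_internal(snake)
```

Using `atan2` instead of a plain arctangent keeps the correct quadrant, so a vertical bond gives 90° rather than a division by zero.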
Moving the snake to the 3D world
• Cartesian representation:
  points: additional z-axis: (0,0,0),(1,1,0),(1,2,0),(2,2,0),(3,3,0)
  connections (predefined): 1-2, 2-3, 3-4, 4-5
• Internal coordinates:
  bond lengths (predefined): √2, 1, 1, √2
  angles: 45°, 90°, 0°, 45°
  dihedral angles: 180°, 180°
Proteins: bond lengths and angles fixed. Only dihedral angles are varied
Dihedral angles
• Dihedral angle: defines geometry of 4 consecutive atoms (given bond lengths and angles)
• Dihedral angles χ1-χ4 define the side chain
[Figure: dihedral angle, from Wikipedia]
What we learned from our snake
• Cartesian representation: Easy to look at, difficult to move
  – Moves do not preserve bond length (and angles in 3D)
• Internal coordinates: Easy to move, difficult to see
  – calculation of distances between points not trivial
Proteins: bond lengths and angles fixed. Only dihedral angles are varied
Solution: toggle
CALCULATE ENERGY - Cartesian coordinates:
Derive distance matrix (neighbor list) for energy score calculation
PDB (x y z):
ATOM  490  N   GLN A  31   52.013  -87.359   -8.797  1.00   7.06  N
ATOM  491  CA  GLN A  31   52.134  -87.762  -10.201  1.00   8.67  C
ATOM  492  C   GLN A  31   51.726  -89.222  -10.343  1.00  10.90  C
ATOM  493  O   GLN A  31   51.015  -89.601  -11.275  1.00   9.63  O
….
Transform (Cartesian → polar): calculate dihedral angles from coordinates

MOVE STRUCTURE - Polar coordinates:
introduce changes in structure by rotating around dihedral angle(s) (change values)
Position  PHI    PSI     OMEGA    CHI1    CHI2  CHI3  CHI4
1         0.00   -60.00  -180.00  -60.00  0.00  0.00  0.00
2
3
….
Transform (polar → Cartesian): build positions in space according to dihedral angles

Snake example: (0,0),(1,1),(1,2),(2,2),(3,3) ↔ 45°, 90°, 0°, 45°
Cartesian → polar coordinates
PDB (x y z):
…
ATOM  490  C   GLN A  31   52.013  -87.359   -8.797  1.00   7.06  N
ATOM  491  N   GLY A  32   52.134  -87.762  -10.201  1.00   8.67  C
ATOM  492  CA  GLY A  32   51.726  -89.222  -10.343  1.00  10.90  C
ATOM  493  O   GLY A  32   51.015  -89.601  -11.275  1.00   9.63  O
….
Position  PHI     PSI     OMEGA    CHI1  CHI2  CHI3  CHI4
…..
32        -59.00  -60.00  -180.00  0.00  0.00  0.00  0.00
33
34
….
How to calculate polar from Cartesian coordinates: example φ: C’-N-Cα-C
– define plane perpendicular to N-Cα (b2) vector
– calculate projection of Cα-C (b3) and C’-N (b1) onto plane
– calculate angle between projections
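The three steps above can be sketched as a small dihedral function. The vector helpers and the test coordinates are assumptions for illustration; the sign convention is chosen so that a cis arrangement gives 0° and a trans arrangement gives ±180°.

```python
import math

def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def scale(a, s): return tuple(x * s for x in a)
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def dihedral(p1, p2, p3, p4):
    """Signed dihedral angle (degrees) defined by four consecutive atoms."""
    b1 = sub(p1, p2)                       # from atom 2 back to atom 1
    b2 = sub(p3, p2)                       # central bond, defines the plane normal
    b3 = sub(p4, p3)
    u = scale(b2, 1.0 / math.sqrt(dot(b2, b2)))
    v = sub(b1, scale(u, dot(b1, u)))      # projection of b1 onto the plane
    w = sub(b3, scale(u, dot(b3, u)))      # projection of b3 onto the plane
    return math.degrees(math.atan2(dot(cross(u, v), w), dot(v, w)))

# Hypothetical atoms: central bond along z, first atom along x.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
plus90 = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
minus90 = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, -1, 1))
```

Working with the two projections and `atan2` gives the sign of the angle as well as its size, which a plain arccosine of the angle between the projections would lose.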
Polar → Cartesian coordinates
Position  PHI     PSI     OMEGA    CHI1  CHI2  CHI3  CHI4
…..
32        -59.00  -60.00  -180.00  0.00  0.00  0.00  0.00
33
34
….
PDB (x y z):
…
ATOM  490  C   GLN A  31   52.013  -87.359   -8.797  1.00   7.06  N
ATOM  491  N   GLY A  32   52.134  -87.762  -10.201  1.00   8.67  C
ATOM  492  CA  GLY A  32   51.726  -89.222  -10.343  1.00  10.90  C
ATOM  493  O   GLY A  32   51.015  -89.601  -11.275  1.00   9.63  O
….
Find x,y,z coordinates of C, based on atom positions of C’, N and Cα, and a given φ value (φ: C’-N-Cα-C)
• create Cα-C vector:
  – size Cα-C = 1.51 Å (equilibrium bond length)
  – angle N-Cα-C = 111° (equilibrium value for N-Cα-C angle)
• rotate vector around N-Cα axis to obtain projections of Cα-C and N-C’ with wanted φ
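Placing the next atom from three known atoms, a bond length, a bond angle and a dihedral can be sketched as below. The bond length 1.51 Å and angle 111° are the values quoted above; the coordinates of C’, N and Cα, the φ value of -60° and all function names are hypothetical. The `dihedral` function is included only to check the result.

```python
import math

def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def add(a, b): return tuple(x + y for x, y in zip(a, b))
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def scale(a, s): return tuple(x * s for x in a)
def unit(a): return scale(a, 1.0 / math.sqrt(dot(a, a)))
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def place_atom(a, b, c, bond_length, bond_angle_deg, dihedral_deg):
    """Place atom d so that |c-d|, angle b-c-d and dihedral a-b-c-d have the given values."""
    theta, phi = math.radians(bond_angle_deg), math.radians(dihedral_deg)
    bc = unit(sub(c, b))                   # rotation axis (N to Cα in the example)
    n = unit(cross(sub(b, a), bc))         # normal of the a-b-c plane
    m = cross(n, bc)                       # in-plane direction, pointing to a's side
    along = -bond_length * math.cos(theta)
    radial = bond_length * math.sin(theta)
    offset = add(scale(bc, along),
                 add(scale(m, radial * math.cos(phi)), scale(n, radial * math.sin(phi))))
    return add(c, offset)

def dihedral(p1, p2, p3, p4):
    u = unit(sub(p3, p2))
    b1, b3 = sub(p1, p2), sub(p4, p3)
    v = sub(b1, scale(u, dot(b1, u)))
    w = sub(b3, scale(u, dot(b3, u)))
    return math.degrees(math.atan2(dot(cross(u, v), w), dot(v, w)))

# Hypothetical positions of C' (previous residue), N and Cα, in Å.
c_prev, n_atom, ca = (-0.6, 1.2, 0.0), (0.0, 0.0, 0.0), (1.46, 0.0, 0.0)
c_atom = place_atom(c_prev, n_atom, ca, 1.51, 111.0, -60.0)
phi_check = dihedral(c_prev, n_atom, ca, c_atom)
```

Repeating this placement atom by atom along the chain is one way to rebuild Cartesian coordinates from a list of dihedral angles with fixed bond lengths and angles.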
Representation of protein structure
Rosetta folding
• 3 backbone dihedral angles per residue
• Sampling and minimization in TORSIONAL space: change angle and rebuild, starting from changed angle
• Build coordinates of structure starting from first atom, according to dihedral angles (and equilibrium bond length and angle)
[Figure: chain of residues 1 to 8]
Based on slides by Chu Wang
Representation of protein structure
Rosetta folding
• 3 backbone dihedral angles per residue
• Sampling and minimization in TORSIONAL space
Rosetta docking
• Backbone dihedral angles fixed (rigid-body)
• 6 rigid-body DOFs: 3 translational vectors, 3 rotational angles
• Sampling and minimization in RIGID-BODY space
[Figure: two chains, residues 1 to 8 and 1’ to 8’]
How can those two types of degrees of freedom be combined?
Fold tree representation
• “peptide” edge – 3 backbone dihedral angles
• “long-range” edge – 6 rigid-body DOFs
[Figure: chains 1 to 8 and 1’ to 8’ connected by peptide edges within each chain and a long-range edge between them]
Example: fold-tree based docking
Originally developed to improve sampling of strand registers in β-sheet proteins. Allows simultaneous optimization of rigid-body and backbone/sidechain torsional degrees of freedom.
Fold tree: Bradley and Baker, Proteins (2006)
Construct fold-trees to treat a variety of protein folding and docking problems.
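The idea of combining peptide edges and a long-range edge can be sketched with a small data structure. Everything here (the `Edge` class, the function names, the residue numbering with chain 1’ to 8’ stored as 9 to 16) is a hypothetical illustration and not the Rosetta or PyRosetta fold-tree API.

```python
from dataclasses import dataclass

PEPTIDE = "peptide"  # 3 backbone dihedral angles per residue
JUMP = "jump"        # long-range edge: 6 rigid-body DOFs

@dataclass(frozen=True)
class Edge:
    start: int
    stop: int
    kind: str

def docking_fold_tree(len_a, len_b, anchor_a, anchor_b):
    """Two chains numbered consecutively, joined by one jump between two anchors."""
    first_b, last_b = len_a + 1, len_a + len_b
    edges = [Edge(1, len_a, PEPTIDE), Edge(anchor_a, anchor_b, JUMP)]
    if anchor_b > first_b:
        edges.append(Edge(anchor_b, first_b, PEPTIDE))   # build backwards to chain start
    if anchor_b < last_b:
        edges.append(Edge(anchor_b, last_b, PEPTIDE))    # build forwards to chain end
    return edges

def built_residues(edges, root=1):
    """Order in which residues are built when the tree is traversed from the root."""
    built = [root]
    for e in edges:
        if e.kind == JUMP:
            built.append(e.stop)
        else:
            step = 1 if e.stop > e.start else -1
            built.extend(range(e.start + step, e.stop + step, step))
    return built

tree = docking_fold_tree(len_a=8, len_b=8, anchor_a=4, anchor_b=12)
order = built_residues(tree)
```

In this sketch every residue is reached exactly once from the root, either along a peptide edge or across the single jump, so torsional moves and rigid-body moves act on the same tree.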
Fold-trees for different modeling tasks
Legend:
• N: N-terminal; C: C-terminal; X: chain break; O: root of the tree
• Edges: flexible “peptide” edge; rigid “peptide” edge; rigid “jump” (1 to 1’); flexible “jump” (1 to 1’)
• Color – flexible bb; Gray – fixed bb; Pale – symmetry operation
• Filled colored circles – flexible sc
• Empty colored circles – flexible amino acid: design
Tasks shown:
• protein folding
• loop modeling
• fully flexible docking
• docking w/ hinge motion
• docking w/ loop modeling
[Figures: fold-tree diagrams for each task]
Rosetta3: Object-oriented architecture
Description of object-oriented organization in Rosetta3: Leaver-Fay et al. Methods in Enzymology (2013)
The Rosetta sampling strategy: A general overview