2. Introduction to Rosetta and structural modeling (From Ora Schueler-Furman): Approaches for structural modeling of proteins; The Rosetta framework and its prediction modes; Cartesian and polar coordinates; Sampling (finding the structure) and scoring (selecting the structure)


2. Introduction to Rosetta and structural modeling

(From Ora Schueler-Furman)

• Approaches for structural modeling of proteins
• The Rosetta framework and its prediction modes
• Cartesian and polar coordinates
• Sampling (finding the structure) and scoring (selecting the structure)


Structural Modeling of Proteins - Approaches

Prediction of Structure from Sequence

Flowchart:

1. Comparison of query sequence to nr database
2. Similar to a sequence of known structure?
– Yes: Homology Modeling (Comparative Modeling)
– No: continue to step 3
3. Fits a known fold?
– Yes: Fold Recognition (Threading)
– No: Ab initio prediction


The Rosetta framework and its prediction modes

A short history of Rosetta

• In the beginning: ab initio modeling of protein structure starting from sequence
– Short fragments of known proteins are assembled by a Monte Carlo strategy to yield native-like protein conformations
– Reliable fold identification for short proteins. Recently improved to high-resolution models (within 2 Å RMSD)

ATCSFFGRKLL…..

A short history of Rosetta

Success of the ab initio protocol led to its extension to:

• Protein design
– Design of new fold: TOP7
• Protein loop modeling; homology modeling
• Protein-protein docking; protein interface design
• Protein-ligand docking
• Protein-DNA interactions; RNA modeling
• Many more, e.g. solving the phase problem in X-ray crystallography


The Rosetta Strategy

• Observation: local sequence preferences bias, but do not uniquely define, the local structure of a protein

• Goal: mimic interplay of local and global interactions that determine protein structure

• Local interactions: fragments derived from known structures (sampled for similar sequences/secondary structure propensity)

• Global (non-local) interactions: buried hydrophobic residues, paired β strands, specific side chain interactions, etc.

The Rosetta Strategy

• Local interactions – fragments
– Fragment library representing accessible local structures for all short sequences in a protein chain, derived from known structures
• Global (non-local) interactions – scoring function
– Derived from conformational statistics of known structures


Scoring and Sampling


The basic assumption in structure prediction

The native structure is located at the global minimum (free) energy conformation (GMEC)

➜ A good energy function can select the correct model among decoys

➜ A good sampling technique can find the GMEC in the rugged landscape

[Figure: energy (E) plotted over conformation space, with the GMEC at the global minimum]
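The interplay of sampling and scoring on a rugged landscape can be illustrated with a toy example. The sketch below is not Rosetta code: the one-dimensional `rugged_energy` function, the step size, and the temperature `kT` are all made-up assumptions, chosen only to show a Metropolis Monte Carlo search that proposes moves (sampling) and accepts or rejects them by energy (scoring).

```python
import math
import random


def rugged_energy(x):
    """Toy 1D 'energy landscape' (hypothetical): a broad well with ripples."""
    return 0.1 * (x - 2.0) ** 2 + math.sin(5.0 * x)


def metropolis_search(energy, x0, n_steps=20000, step=0.5, kT=1.0, seed=0):
    """Metropolis Monte Carlo; returns the lowest-energy point visited."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)  # sampling: propose a small move
        e_new = energy(x_new)                 # scoring: evaluate the proposal
        # Always accept downhill moves; accept uphill moves with
        # probability exp(-dE / kT) so the walk can leave local minima.
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / kT):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


best_x, best_e = metropolis_search(rugged_energy, x0=-8.0)
print(f"lowest energy found: E = {best_e:.3f} at x = {best_x:.3f}")
```

With a poor energy function the lowest-scoring point would not be the "native" one, and with poor sampling the search could stay trapped in a ripple far from the broad well; both ingredients are needed.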


Two-Step Procedure

1. Low-resolution step locates potential minima (fast)

2. Cluster analysis identifies broadest basins in landscape

3. High-resolution step can identify lowest energy minimum in the basins (slow)

[Figure: energy (E) over conformation space, with the GMEC marked]

Low-Resolution Step

Structure Representation:
• Equilibrium bonds and angles (Engh & Huber 1991)
• Centroid: average location of the center of mass of the side chain (Centroid | aa, φ, ψ)
• No modeling of side chains
• Fast

Low-Resolution Scoring Function

Bayes Theorem:
• Independent components prevent over-counting

P(str | seq) = P(str) * P(seq | str) / P(seq)

– P(seq): constant
– P(seq | str): sequence-dependent features
– P(str): structure-dependent features
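One way to read the Bayes decomposition as a sum of score terms is sketched below. It assumes that the score is the negative logarithm of the probability, which is a common convention for knowledge-based potentials but is an assumption here rather than something stated on the slide.

```latex
\begin{align}
P(\mathrm{str}\mid\mathrm{seq})
  &= \frac{P(\mathrm{str})\,P(\mathrm{seq}\mid\mathrm{str})}{P(\mathrm{seq})} \\
-\ln P(\mathrm{str}\mid\mathrm{seq})
  &= \underbrace{-\ln P(\mathrm{seq}\mid\mathrm{str})}_{\text{sequence-dependent}}
   \;\underbrace{-\,\ln P(\mathrm{str})}_{\text{structure-dependent}}
   \;+\;\underbrace{\ln P(\mathrm{seq})}_{\text{constant}}
\end{align}
```

Because ln P(seq) is the same for every conformation of a given sequence, it does not affect the ranking of models and can be dropped. If each probability factorizes into independent components, its logarithm becomes a sum of separate score terms.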

Sequence-Dependent Components

Bayes Theorem: P(str | seq) = P(str) * P(seq | str) / P(seq)

Score = Senv + Spair + …

neighbors: Cβ-Cβ < 10 Å

Rohl et al. (2004) Methods in Enzymology 383:66
Origin: Simons et al., JMB 1997; Simons et al., Proteins 1999

Structure-Dependent Components

P(str | seq) = P(str) * P(seq | str) / P(seq)

Score = … + Srg + Scb + Svdw + …

Structure-Dependent Components

P(str | seq) = P(str) * P(seq | str) / P(seq)

Score = … + Sss + …

Structure-Dependent Components

P(str | seq) = P(str) * P(seq | str) / P(seq)

Score = … + Ssheet + Shs + … + Srama

High-Resolution Step

Slow, exact step
• Locates global energy minimum

Structure Representation:
• All-atom (including polar and non-polar hydrogens, but no water)
• Side chains as rotamers from backbone-dependent library (Dunbrack 1997)
• Side chain conformation adjusted frequently

High-Resolution Step: Rotamer Libraries

• Side chains have preferred conformations
• They are summarized in rotamer libraries
• Select one rotamer for each position
• Best conformation: lowest-energy combination of rotamers

Serine χ1 preferences: t = 180°, g− = −60°, g+ = +60°
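The idea of "one rotamer per position, lowest-energy combination" can be sketched with an exhaustive search over a tiny, invented example. The rotamer lists, the `SELF_ENERGY` table, and the clash penalty below are hypothetical toy values, not a real rotamer library or Rosetta's energy function.

```python
from itertools import product

# Hypothetical chi1 rotamers (degrees) available at three positions.
ROTAMERS_PER_POSITION = [
    [180.0, -60.0, 60.0],
    [180.0, -60.0, 60.0],
    [180.0, -60.0],
]

# Hypothetical per-rotamer energies (lower is better).
SELF_ENERGY = {180.0: 0.5, -60.0: 0.0, 60.0: 1.0}


def toy_energy(chis):
    """Self term per rotamer plus a toy clash penalty for equal neighbours."""
    energy = sum(SELF_ENERGY[c] for c in chis)
    energy += sum(2.0 for a, b in zip(chis, chis[1:]) if a == b)
    return energy


def best_combination(rotamers_per_position, energy):
    """Score every combination (one rotamer per position); keep the lowest."""
    return min(product(*rotamers_per_position), key=energy)


best = best_combination(ROTAMERS_PER_POSITION, toy_energy)
print(best, toy_energy(best))
```

The number of combinations grows exponentially with the number of positions, so enumeration is only feasible for toy cases like this one; practical side-chain packing relies on search heuristics instead.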

High-Resolution Scoring Function

• Major contributions:
– Burial of hydrophobic groups away from water
– Void-free packing of buried groups and atoms
– Buried polar atoms form intra-molecular hydrogen bonds

High-Resolution Scoring Function

Packing interactions

Score = SLJ(atr + rep) + ….

• rij: distance between atoms i and j
• Linearized repulsive part
• ε: well depth from CHARMm19
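A linearized repulsive part can be sketched as follows. This is an illustration under stated assumptions, not Rosetta's implementation: it uses the standard 12-6 Lennard-Jones form with the minimum at `sigma`, and below a switch distance (here assumed to be `0.6 * sigma`) it continues the curve along its tangent line so that overlapping atoms get a large but finite penalty. The `sigma` and `eps` values in the usage lines are arbitrary.

```python
def lj_12_6(r, sigma, eps):
    """12-6 Lennard-Jones energy with its minimum of -eps at r = sigma."""
    q = (sigma / r) ** 6
    return eps * (q * q - 2.0 * q)


def lj_12_6_slope(r, sigma, eps):
    """Derivative dE/dr of lj_12_6."""
    q = (sigma / r) ** 6
    return eps * (12.0 * q - 12.0 * q * q) / r


def lj_linearized(r, sigma, eps, switch=0.6):
    """Lennard-Jones energy whose steep repulsive wall is replaced by a line.

    For r below r_switch = switch * sigma (an assumed cutoff), the energy
    follows the tangent to the 12-6 curve at r_switch.
    """
    r_switch = switch * sigma
    if r >= r_switch:
        return lj_12_6(r, sigma, eps)
    e_switch = lj_12_6(r_switch, sigma, eps)
    slope = lj_12_6_slope(r_switch, sigma, eps)
    return e_switch + slope * (r - r_switch)


# Arbitrary example values: sigma in Angstrom, eps in energy units.
for r in (4.0, 3.0, 2.4, 1.0, 0.0):
    print(r, round(lj_linearized(r, sigma=4.0, eps=0.1), 3))
```

The attractive part (atr) is the region where the energy is negative; the linear extension keeps the repulsive part (rep) finite even at r = 0, which makes small clashes in otherwise good models less catastrophic for the score.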

High-Resolution Scoring Function

Implicit solvation

Score = … + Ssolvation + ….

Lazaridis & Karplus, Proteins 1999

• Solvation free energy density of i, with xij = (rij − Ri)/λi

[Figure: solvation equation with terms in xij² and xji²; polar groups labeled]

High-Resolution Scoring Function

Hydrogen Bonds (original function)

Score = …. + Shb(srbb+lrbb+sc) + ….

• srbb: short range, backbone HB
• lrbb: long range, backbone HB
• sc: HB with side chain atom

(Kortemme, 2003; Morozov 2004)

[Figure: hydrogen-bond geometry between an N-H donor and an O=C acceptor, with distance δ]

Hydrogen Bonding Energy

Based on statistics from high-resolution structures in the Protein Data Bank (rcsb.org)

ΔG = −kT ln P

E_HB = W_HB [E(δ_HA) + E(Θ) + E(Ψ) + E(X)]

(Kortemme, Morozov & Baker 2003 JMB)

Slide from Jeff Gray

High-Resolution Scoring Function

Rotamer preference

Score = … + Sdunbrack + ….

Dunbrack, 1997

Scoring Function: Summary

One long, generic function ….

Score = Senv + Spair + Srg + Scb + Svdw + Sss + Ssheet + Shs + Srama + Shb(srbb + lrbb) + docking_score + Sdisulf_cent + Srs + Sco + Scontact_prediction + Sdipolar + Sprojection + Spc + Stether + Sφψ + Sω + Ssymmetry + Ssplicemsd + …..

docking_score = Sd env + Sd pair + Sd contact + Sd vdw + Sd site constr + Sd + Sfab score

Score = SLJ(atr + rep) + Ssolvation + Shb(srbb+lrbb+sc) + Sdunbrack + Spair – Sref + Sprob1b + Sintrares + Sgb_elec + Sgsolt + Sh2o(solv + hb) + S_plane

Representations of protein structure: Cartesian and polar coordinates

Position  PHI     PSI     OMEGA    CHI1    CHI2  CHI3  CHI4
1         0.00    -60.00  -180.00  -60.00  0.00  0.00  0.00
2
3
….……

PDB                          x        y        z
ATOM    490  N   GLN A  31   52.013  -87.359   -8.797  1.00   7.06  N
ATOM    491  CA  GLN A  31   52.134  -87.762  -10.201  1.00   8.67  C
ATOM    492  C   GLN A  31   51.726  -89.222  -10.343  1.00  10.90  C
ATOM    493  O   GLN A  31   51.015  -89.601  -11.275  1.00   9.63  O
…..….

2 ways to represent the protein structure

Cartesian coordinates (x, y, z; PDB format)
+ Intuitive: look at molecules in space
+ Easy calculation of energy score (based on atom-atom distances)
– Difficult to change conformation of structure (while keeping bond length and bond angle unchanged)

Polar coordinates (Φ-Ψ-Ω; equilibrium angles and bond lengths)
+ Compact (3 values/residue)
+ Easy changes of protein structure (turn around one or more dihedral angles)
– Non-intuitive
– Difficult to evaluate energy score (calculation of neighboring matrix complicated)

A snake in the 2D world

• Cartesian representation:
– points: (0,0), (1,1), (1,2), (2,2), (3,3)
– connections (predefined): 1-2, 2-3, 3-4, 4-5

[Figure: the five points plotted on x-y axes, numbered 1-5 and joined by bonds 1-2, 2-3, 3-4, 4-5]

A snake in the 2D world

• Internal coordinates:
– bond lengths (predefined): √2, 1, 1, √2
– angles: 45°, 90°, 0°, 45°

[Figure: the snake on x-y axes with bond lengths √2, 1, 1, √2 and angles 45°, 90°, 45° marked; polar-coordinate illustration from Wikipedia]

A snake wiggling in the 2D world

• Constraint: keep bond length fixed
• Move in Cartesian representation: (0,0),(1,1),(1,2),(2,2),(3,3) → (0,0),(1,1),(1,2),(2,2),(3,0)

Bond length changed! (bond 4-5: √2 → √5)

A snake wiggling in the 2D world

• Constraint: keep bond length fixed
• Move in polar coordinates: 45°, 90°, 0°, 45° → 45°, 90°, 45°, 45°

Bond length unchanged!
Large impact on structure

Polar → Cartesian coordinates

Convert r and θ to x and y: x = r cos θ, y = r sin θ (applied bond by bond, starting from the previous point)

√2, 1, 1, √2 and 45°, 90°, 0°, 45° → (0,0),(1,1),(1,2),(2,2),(3,3)

[Figure from Wikipedia]

Cartesian → polar coordinates

Convert x and y to r and θ: r = √(Δx² + Δy²), θ = atan2(Δy, Δx) (Δx, Δy taken between consecutive points)

(0,0),(1,1),(1,2),(2,2),(3,3) → √2, 1, 1, √2 and 45°, 90°, 0°, 45°
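Both directions of the conversion can be written in a few lines for the 2D snake. This is a minimal sketch: the function names are invented here, and each angle is measured as the bond's direction relative to the x-axis, which reproduces the 45°, 90°, 0°, 45° listed on the slides.

```python
import math


def cartesian_to_internal(points):
    """Return (bond lengths, bond directions in degrees) for a 2D chain."""
    lengths, angles = [], []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        lengths.append(math.hypot(dx, dy))               # r
        angles.append(math.degrees(math.atan2(dy, dx)))  # theta from x-axis
    return lengths, angles


def internal_to_cartesian(lengths, angles_deg, origin=(0.0, 0.0)):
    """Rebuild the chain point by point: x += r cos(theta), y += r sin(theta)."""
    points = [origin]
    for r, theta in zip(lengths, angles_deg):
        x, y = points[-1]
        t = math.radians(theta)
        points.append((x + r * math.cos(t), y + r * math.sin(t)))
    return points


snake = [(0, 0), (1, 1), (1, 2), (2, 2), (3, 3)]
lengths, angles = cartesian_to_internal(snake)
rebuilt = internal_to_cartesian(lengths, angles)

print([round(r, 3) for r in lengths])
print([round(a, 1) for a in angles])
print([(round(x, 3), round(y, 3)) for x, y in rebuilt])
```

Note that the internal description alone does not fix where the chain sits in space; the `origin` argument supplies the position of the first point.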

Moving the snake to the 3D world

• Cartesian representation:
– points (additional z-axis): (0,0,0), (1,1,0), (1,2,0), (2,2,0), (3,3,0)
– connections (predefined): 1-2, 2-3, 3-4, 4-5
• Internal coordinates:
– bond lengths (predefined): √2, 1, 1, √2
– angles: 45°, 90°, 0°, 45°
– dihedral angles: 180°, 180°

Proteins: bond lengths and angles fixed. Only dihedral angles are varied

Dihedral angles

• Dihedral angle: defines geometry of 4 consecutive atoms (given bond lengths and angles)
• Dihedral angles χ1-χ4 define side chain

[Figure from Wikipedia]

What we learned from our snake

• Cartesian representation: Easy to look at, difficult to move
– Moves do not preserve bond length (and angles in 3D)
• Internal coordinates: Easy to move, difficult to see
– Calculation of distances between points not trivial

Proteins: bond lengths and angles fixed. Only dihedral angles are varied

Solution: toggle

CALCULATE ENERGY - Cartesian coordinates:
Derive distance matrix (neighbor list) for energy score calculation

Transform: build positions in space according to dihedral angles

PDB                          x        y        z
ATOM    490  N   GLN A  31   52.013  -87.359   -8.797  1.00   7.06  N
ATOM    491  CA  GLN A  31   52.134  -87.762  -10.201  1.00   8.67  C
ATOM    492  C   GLN A  31   51.726  -89.222  -10.343  1.00  10.90  C
ATOM    493  O   GLN A  31   51.015  -89.601  -11.275  1.00   9.63  O
…..….

MOVE STRUCTURE - Polar coordinates:
introduce changes in structure by rotating around dihedral angle(s) (change Φ-Ψ values)

Position  PHI     PSI     OMEGA    CHI1    CHI2  CHI3  CHI4
1         0.00    -60.00  -180.00  -60.00  0.00  0.00  0.00
2
3
….……

Transform: calculate dihedral angles from coordinates

(0,0),(1,1),(1,2),(2,2),(3,3) ↔ 45°, 90°, 0°, 45°

Cartesian → polar coordinates

Position  PHI     PSI     OMEGA    CHI1  CHI2  CHI3  CHI4
…..
32        -59.00  -60.00  -180.00  0.00  0.00  0.00  0.00
33
34
….……

PDB                          x        y        z
…
ATOM    490  C   GLN A  31   52.013  -87.359   -8.797  1.00   7.06  C
ATOM    491  N   GLY A  32   52.134  -87.762  -10.201  1.00   8.67  N
ATOM    492  CA  GLY A  32   51.726  -89.222  -10.343  1.00  10.90  C
ATOM    493  O   GLY A  32   51.015  -89.601  -11.275  1.00   9.63  O
…..….

How to calculate polar from Cartesian coordinates: example Φ: C’-N-Cα-C
– define plane perpendicular to N-Cα (b2) vector
– calculate projection of Cα-C (b3) and C’-N (b1) onto plane
– calculate angle between projections

(0,0),(1,1),(1,2),(2,2),(3,3) → 45°, 90°, 0°, 45°
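The projection recipe for a dihedral angle can be sketched in plain Python. The vector helpers and the function name `dihedral_deg` are invented for this illustration, and the test points are simple synthetic coordinates rather than the PDB rows shown on the slide. The sign follows the usual convention (positive for a clockwise rotation when looking along the central bond), which is an assumption the slide does not spell out.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def scale(a, s):
    return tuple(x * s for x in a)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p1, p2, p3, p4):
    """Dihedral angle p1-p2-p3-p4 in degrees, via projections onto the
    plane perpendicular to the central bond p2-p3."""
    b1 = sub(p1, p2)  # from atom 2 back to atom 1 (e.g. N -> C')
    b2 = sub(p3, p2)  # central bond (e.g. N -> CA)
    b3 = sub(p4, p3)  # from atom 3 to atom 4 (e.g. CA -> C)
    u = scale(b2, 1.0 / math.sqrt(dot(b2, b2)))  # unit vector along b2
    v = sub(b1, scale(u, dot(b1, u)))            # projection of b1
    w = sub(b3, scale(u, dot(b3, u)))            # projection of b3
    x = dot(v, w)
    y = dot(cross(u, v), w)
    return math.degrees(math.atan2(y, x))        # signed angle between them


# Synthetic check: the first four points of the planar 3D snake are trans.
print(dihedral_deg((0, 0, 0), (1, 1, 0), (1, 2, 0), (2, 2, 0)))
```

Using `atan2` on the two projected components, rather than `acos` of a dot product, keeps the sign of the angle and avoids numerical trouble near 0° and 180°.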

Polar → Cartesian coordinates

Position  PHI     PSI     OMEGA    CHI1  CHI2  CHI3  CHI4
…..
32        -59.00  -60.00  -180.00  0.00  0.00  0.00  0.00
33
34
….……

PDB                          x        y        z
…
ATOM    490  C   GLN A  31   52.013  -87.359   -8.797  1.00   7.06  C
ATOM    491  N   GLY A  32   52.134  -87.762  -10.201  1.00   8.67  N
ATOM    492  CA  GLY A  32   51.726  -89.222  -10.343  1.00  10.90  C
ATOM    493  O   GLY A  32   51.015  -89.601  -11.275  1.00   9.63  O
…..….

Find x,y,z coordinates of C, based on atom positions of C’, N and Cα, and a given Φ value (Φ: C’-N-Cα-C)

• create Cα-C vector:
– size Cα-C = 1.51 Å (equilibrium bond length)
– angle N-Cα-C = 111° (equilibrium value for N-Cα-C angle)
• rotate vector around N-Cα axis to obtain projections of Cα-C and N-C’ with wanted Φ

(0,0),(1,1),(1,2),(2,2),(3,3) ← 45°, 90°, 0°, 45°
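Placing the next atom from three known atoms, a bond length, a bond angle, and a dihedral can be sketched as below. This is one standard construction (a local frame built on the last bond), not necessarily how Rosetta does it. The function names and the coordinates used for C’, N and Cα are hypothetical; only the 1.51 Å bond length and 111° angle come from the slide. A small `dihedral_deg` helper is included so the result can be checked by converting back.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)


def place_atom(a, b, c, bond, angle_deg, torsion_deg):
    """Return d with |c-d| = bond, angle b-c-d = angle_deg and
    dihedral a-b-c-d = torsion_deg."""
    theta = math.radians(angle_deg)
    phi = math.radians(torsion_deg)
    bc = unit(sub(c, b))                 # axis of rotation (e.g. N -> CA)
    n = unit(cross(sub(b, a), bc))       # normal of the a-b-c plane
    m = cross(n, bc)                     # completes the local frame
    dx = -bond * math.cos(theta)         # component along the axis
    dy = bond * math.sin(theta) * math.cos(phi)
    dz = bond * math.sin(theta) * math.sin(phi)
    return tuple(c[i] + dx * bc[i] + dy * m[i] + dz * n[i] for i in range(3))


def dihedral_deg(p1, p2, p3, p4):
    """Signed dihedral p1-p2-p3-p4 in degrees (used here only as a check)."""
    u = unit(sub(p3, p2))
    b1, b3 = sub(p1, p2), sub(p4, p3)
    v = sub(b1, tuple(x * dot(b1, u) for x in u))
    w = sub(b3, tuple(x * dot(b3, u) for x in u))
    return math.degrees(math.atan2(dot(cross(u, v), w), dot(v, w)))


# Hypothetical positions for C' (previous residue), N and CA.
c_prev, n_atom, ca = (-1.2, 0.8, 0.0), (0.0, 0.0, 0.0), (1.45, 0.0, 0.0)
c_atom = place_atom(c_prev, n_atom, ca, bond=1.51, angle_deg=111.0, torsion_deg=-60.0)

print("C position:", tuple(round(x, 3) for x in c_atom))
print("bond length:", round(math.dist(ca, c_atom), 3))
print("phi check:", round(dihedral_deg(c_prev, n_atom, ca, c_atom), 1))
```

Repeating this placement atom by atom along the chain rebuilds a whole backbone from its dihedral angles, with bond lengths and angles held at their equilibrium values.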

Representation of protein structure

Rosetta folding

• 3 backbone dihedral angles per residue
• Sampling and minimization in TORSIONAL space: change angle and rebuild, starting from changed angle
• Build coordinates of structure starting from first atom, according to dihedral angles (and equilibrium bond length and angle)

[Figure: chain of atoms numbered 1-8, rebuilt from the changed angle onward]

Based on slides by Chu Wang