Structure prediction: Ab-initio Lecture 9 Structural Bioinformatics Dr. Avraham Samson 81-871 Let’s think!


Structure prediction: Ab-initio

Lecture 9
Structural Bioinformatics

Dr. Avraham Samson
81-871

Let’s think!


Levinthal's paradox

In 1969, Cyrus Levinthal noted that, because of the very large number of degrees of freedom in a polypeptide chain, the molecule has an astronomical number of possible conformations. For example, a polypeptide of 100 residues has 99 peptide bonds, and therefore 198 phi and psi bond angles. If each of these bond angles can be in one of three stable conformations, the protein may adopt up to 3^198 (roughly 10^94) different conformations. If the polypeptide had to find its native conformation by sampling all of them in turn, it would require a time longer than the age of the universe, even if conformations were sampled at rapid (picosecond) rates. The "paradox" is that most small proteins fold spontaneously on a millisecond or even microsecond time scale.
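The arithmetic behind this argument can be checked directly. The following is a minimal sketch, assuming three states per angle, one picosecond per conformation, and an age of the universe of about 13.8 billion years; the function and constant names are illustrative only.

```python
import math

AGE_OF_UNIVERSE_S = 4.35e17  # assumption: roughly 13.8 billion years, in seconds

def levinthal_estimate(n_residues, states_per_angle=3, seconds_per_conformation=1e-12):
    """Return (number of conformations, exhaustive search time in seconds)."""
    n_angles = 2 * (n_residues - 1)                  # one phi and one psi per peptide bond
    n_conformations = states_per_angle ** n_angles   # exact Python integer
    search_time_s = n_conformations * seconds_per_conformation
    return n_conformations, search_time_s

n_conf, t_search = levinthal_estimate(100)
print(f"conformations: about 10^{math.log10(n_conf):.1f}")
print(f"exhaustive search time: about 10^{math.log10(t_search):.1f} s")
print(f"ratio to the age of the universe: about 10^{math.log10(t_search / AGE_OF_UNIVERSE_S):.1f}")
```

Changing `states_per_angle` or `seconds_per_conformation` by a few orders of magnitude does not alter the conclusion, which is the point of the paradox.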


Protein Structure Prediction

• Two main categories of protein structure prediction methods:
– Homology modeling (last week's class)
– Ab-initio methods (today's class)

• Methods can also be characterized as:
– Based on physical principles (simulations)
– Based on statistics derived from known structures (knowledge-based)


Secondary Structure Prediction

• Methods attempt to decide which type of secondary structure (helix, strand or coil) each amino acid in a protein sequence is likely to adopt.

• The best methods are currently able to achieve success rates of over 75%, based on sequence profiles.
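A success rate for secondary structure prediction is commonly reported as a per-residue, three-state accuracy (often called Q3). The sketch below assumes that reading of the figure quoted above; the two example strings are made up for illustration.

```python
def q3_accuracy(predicted, observed):
    """Fraction of residues whose predicted state (H, E or C) matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have the same length")
    if not observed:
        raise ValueError("empty input")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)

# Hypothetical 12-residue example: H = helix, E = strand, C = coil
observed = "CHHHHHCCEEEC"
predicted = "CHHHHCCCEEEE"
print(f"Q3 = {q3_accuracy(predicted, observed):.2f}")
```

Here 10 of the 12 residues agree, so the score is 10/12.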


Folding Simulations

• Accurate folding simulations would, in principle, allow us to predict the structure of any protein.

• However, this approach is impractical due to limitations of computing power.

• Our understanding of the principles of protein folding is also far short of the level needed to achieve this.


Homology Modeling

• Sometimes referred to as “comparative modeling”
• The most reliable technique for predicting protein structure
• Compares the sequence of the new protein with the sequences of proteins of known structure
– Strong similarity (% identity, % similarity, alignment)
– No strong similarity: comparative modeling cannot be used.

• Similar sequences → almost identical structures


Predicting Small Conformational Changes

• Even between very similar proteins, there are differences.

• Some of these differences might be functionally important (different binding loop conformations)

• Predicting the effects of these small structural changes is the real challenge in modeling.

• The native fold of a protein can be found by finding the conformation that has the lowest energy, as defined by a suitable potential energy function.


Ab initio Prediction

• Ab initio (i.e. ‘from scratch’)
• Use only the information in the target sequence itself
• Two branches
– Knowledge-based methods
  • Predict structure by applying statistical rules
  • Rules: observations made on known protein structures
– Simulation methods
  • Predict structures by applying physical parameters (van der Waals, dipole-dipole, etc.)


Simulation Methods

• Most ambitious approach
• Simulate the protein-folding process using basic physics
• Only useful for short peptides and small molecules
• Very useful for predicting unknown loop conformations as part of homology modeling


Energy Function

• The exact form of this energy function is as yet unknown

• It is reasonable to assume that it would incorporate terms pertaining to the types of interactions observed in protein structures
– Hydrogen bonding
– Van der Waals effects

• Find a potential function
• Construct an algorithm capable of finding the global minimum of this function
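As an illustration of what one term of such a potential might look like, the sketch below uses the 12-6 Lennard-Jones form, a common textbook model for van der Waals interactions. The choice of this form and the parameter values (`epsilon`, `sigma` in reduced units) are assumptions, not something the slides specify.

```python
import math
from itertools import combinations

def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones energy of one atom pair at distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def total_energy(coords, epsilon=1.0, sigma=1.0):
    """Sum the pair energy over all distinct atom pairs."""
    return sum(
        lennard_jones(math.dist(a, b), epsilon, sigma)
        for a, b in combinations(coords, 2)
    )

# For a single pair the minimum is known analytically: r = 2**(1/6) * sigma, E = -epsilon
r_min = 2 ** (1 / 6)
print(lennard_jones(r_min))

# Three atoms on an equilateral triangle with side r_min: three optimal pairs
triangle = [(0.0, 0.0, 0.0), (r_min, 0.0, 0.0), (r_min / 2, r_min * math.sqrt(3) / 2, 0.0)]
print(total_energy(triangle))
```

For two or three atoms the global minimum can be written down by hand; for a protein-sized system the number of local minima grows so quickly that the search algorithm becomes the hard part.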


Searching Conformational Space

• Consider a protein chain of N residues
• The size of its conformational space is roughly 10^N states
• This assumes about 10 main-chain torsion angle triples for each residue

• This does not yet consider the additional conformational space provided by the side-chain torsion angles.


How to Find Global Energy Minimum Efficiently

• Clearly proteins do not fold by searching their entire conformational space (Levinthal’s paradox)

• Proteins fold by means of a folding pathway encoded in the protein sequence?

• Short-chain segments (5-7 residues) could quite easily locate their global minimum.

• Location of the native fold is driven by the folding of such short fragments?


One Subtle Point

• The native conformation need not necessarily correspond to the global minimum of free energy.


Secondary Structure Prediction

• Although predicting just the secondary structure of a protein is a long way from predicting its tertiary structure, information on the locations of helices and strands in a protein can provide useful insights as to its possible overall fold.

• It is also worth noting that the origins of the protein structure prediction field lie in this area


Intrinsic Propensities for Secondary Structure Formation

• Are some residues more likely to form α-helices or β-strands than others?

• Yes
– Ex. proline residues are not often found in α-helices

• 1974: statistical analysis of 15 proteins with known 3-D structures

• For each of the 20 amino acids, calculate the probability of finding that amino acid in α-helices and in β-strands

• Also calculate the probability of finding any residue in α-helices and in β-strands


Example (Chou and Fasman, 1974)

• Suppose there was a total of 2000 residues in their 15 protein data set

Total number of residues         2000
Number of alanines                100
Number of helical residues        500
Number of alanines in helices      50

We would calculate the propensity of alanine for helix formation as follows:

P(Ala in Helix) = 50/500 = 0.1

P(Ala) = 100/2000 = 0.05

Helix propensity (PA) of Ala = P(Ala in Helix)/P(Ala) = 0.1/0.05 = 2
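The same calculation can be written as a small function and reused for any amino acid and any secondary structure state. This is a minimal sketch; the function and argument names are illustrative, and the counts are the hypothetical ones from the example above.

```python
def propensity(n_aa_in_state, n_state, n_aa, n_total):
    """Propensity = P(amino acid | state) / P(amino acid)."""
    p_aa_in_state = n_aa_in_state / n_state   # e.g. fraction of helical residues that are Ala
    p_aa = n_aa / n_total                     # overall fraction of residues that are Ala
    return p_aa_in_state / p_aa

helix_propensity_ala = propensity(n_aa_in_state=50, n_state=500, n_aa=100, n_total=2000)
print(helix_propensity_ala)
```

A value above 1 means the amino acid is found in that state more often than its overall abundance would suggest, and a value below 1 means it is found there less often.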


AVVTW...GTTWVR

ab-initio prediction

• Prediction from sequence using first principles


Ab-initio prediction

• “In theory”, we should be able to build native structures from first principles using sequence information and molecular dynamics simulations: “Ab-initio prediction of structure”

– Simulation of the villin headpiece (36 residues). (Pande et al.)

http://www.youtube.com/watch?v=1eSwDKZQpok&feature=related
http://www.youtube.com/watch?NR=1&v=meNEUTn9Atg&feature=endscreen


... the bad news ...

• It is not possible to extend simulations to the “seconds” range

• Simulations are limited to small systems and fast folding/unfolding events in known structures
– steered dynamics
– biased molecular dynamics

• Simplified systems


typical shortcuts

• Reduce conformational space
– 1 or 2 atoms per residue
– fixed lattices

• Statistical force fields obtained from known structures
– Average distances between residues
– Interactions

• Use building blocks: 3-9 residues from PDB structures


“lattice” folding (2D)

Self-avoidance is easily monitored! Energy is easily calculated
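Both checks are short on a square lattice. The sketch below assumes the HP (hydrophobic/polar) model, a common choice for 2D lattice folding that the slide does not name explicitly: each pair of H residues that are lattice neighbours but not chain neighbours contributes -1. The move encoding and function names are illustrative.

```python
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def walk_to_coords(moves):
    """Convert a string of lattice moves into the list of occupied (x, y) sites."""
    coords = [(0, 0)]
    for m in moves:
        dx, dy = MOVES[m]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    return coords

def is_self_avoiding(coords):
    """A chain is self-avoiding when no lattice site is visited twice."""
    return len(set(coords)) == len(coords)

def hp_energy(sequence, coords):
    """Score -1 per H-H lattice contact between residues that are not chain neighbours.

    Assumes coords is self-avoiding.
    """
    position = {site: i for i, site in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so each contact is counted once
        for dx, dy in ((1, 0), (0, 1)):
            j = position.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                energy -= 1
    return energy

sequence = "HPPH"
folded = walk_to_coords("RUL")     # square: (0,0) (1,0) (1,1) (0,1)
extended = walk_to_coords("RRR")   # straight line
print(is_self_avoiding(folded), hp_energy(sequence, folded))
print(is_self_avoiding(extended), hp_energy(sequence, extended))
print(is_self_avoiding(walk_to_coords("RL")))   # steps back onto the start site
```

The folded square brings the two H residues into contact, so it scores lower than the extended chain.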


Example PROSA potential

Figure (not reproduced): PROSA potential plot with series labeled Total, Hydrophobic, and C-C, on a scale from “Very stable” to “Low stability”.

http://lore.came.sbg.ac.at:8080/CAME/CAME_EXTERN/ProsaII/index_html


Results from ab-initio

Some protein from E. coli predicted at 7.6 Å (CASP3, H. Scheraga)

• Average error 5 Å - 10 Å
• Long simulations


Ab initio PDB

“loops” in homology modeling


Final test

• The model must justify experimental data (i.e. differences between unknown sequence and templates) and be useful to understand function.


Rosetta energy function

• Residue environment (solvation)
• Residue pair interaction (electrostatic, disulfides)
• Steric repulsion
• Radius of gyration (vdW attraction, solvation)
• Cβ density (solvation, correction for excluded volume)
• SS pairing (hydrogen bonding)
• Strand arrangement into sheet
• Helix-strand packing
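Of the terms listed, the radius of gyration is the simplest to compute: it measures how compact a set of points is. The sketch below only computes the quantity itself, with one equally weighted point per residue; how Rosetta converts it into a score is not given in the slides and is not modeled here.

```python
import math

def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their centroid (equal weights)."""
    n = len(coords)
    centroid = [sum(c[k] for c in coords) / n for k in range(3)]
    mean_sq = sum(math.dist(c, centroid) ** 2 for c in coords) / n
    return math.sqrt(mean_sq)

# Hypothetical four-residue chains with unit spacing
extended = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
compact = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
print(radius_of_gyration(extended))
print(radius_of_gyration(compact))
```

The compact arrangement gives the smaller value, which is why a term of this kind favors collapsed, globular conformations.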


Protein Structure Prediction using ROSETTA


Worldwide distributed computing


Ab Initio Methods

• Ab initio: “From the beginning”.
• Assumption 1: All the information about the structure of a protein is contained in its sequence of amino acids.

• Assumption 2: The structure that a (globular) protein folds into is the structure with the lowest free energy.

• Finding native-like conformations requires:
– A scoring function (potential).
– A search strategy.


Rosetta

• The scoring function is a model generated using various contributions. It has a sequence-dependent part (including, for example, a term for hydrophobic burial) and a sequence-independent part (including, for example, a term for strand-strand packing).

• The search is carried out using simulated annealing. The move set is defined by a fragment library for each three and nine residue segment of the chain. The fragments are extracted from observed structures in the PDB.
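A fragment move of this kind can be pictured as overwriting a window of backbone torsion angles with the angles of a library fragment. The sketch below is a toy illustration of that idea, not Rosetta's implementation: the chain is a list of (phi, psi) pairs, and the library layout, angle values, and function names are all hypothetical.

```python
import random

def insert_fragment(torsions, fragment, start):
    """Return a copy of the chain with the fragment's (phi, psi) pairs written at `start`."""
    if start < 0 or start + len(fragment) > len(torsions):
        raise ValueError("fragment does not fit at this position")
    new_torsions = list(torsions)
    new_torsions[start:start + len(fragment)] = fragment
    return new_torsions

def random_fragment_move(torsions, library, rng=random):
    """Pick a random window that has fragments in the library and apply one of them."""
    start = rng.choice(sorted(library))
    fragment = rng.choice(library[start])
    return insert_fragment(torsions, fragment, start)

# Hypothetical data: a 6-residue extended chain and 3-residue fragments,
# keyed by the chain position at which they may be inserted.
chain = [(-150.0, 150.0)] * 6
library = {
    0: [[(-60.0, -45.0)] * 3],                          # helix-like fragment
    3: [[(-60.0, -45.0)] * 3, [(-120.0, 130.0)] * 3],   # helix-like or strand-like
}
trial = random_fragment_move(chain, library, random.Random(0))
print(trial)
```

In a full search, each trial conformation produced this way would be scored and then accepted or rejected by the annealing schedule; because every move copies angles from real structures, the local geometry stays protein-like throughout.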


The Rosetta Scoring Function


Hydrophobic Burial


Residue Pair Interaction


The Sequence Independent Term

vector representation


Strand Packing – Helps!

Estimated distribution


Sheer Angles – Help not!


Parameter Estimation


Fragment Selection


Validation Data Set


CASP3 Protocol

• Construct a multiple sequence alignment from PSI-BLAST.
• Edit the multiple sequence alignment.
• Identify the ab initio targets from the sequence.
• Search the literature for biological and functional information.
• Generate 1200 structures, each the result of 100,000 cycles.
• Analyze the top 50 or so structures by an all-atom scoring function (also using clustering data).
• Rank the top 5 structures according to protein-like appearance and/or expectations from the literature.


CASP3 Predictions


Monte Carlo (Random Sampling)

• Randomly (or pseudorandomly) pick a configuration and evaluate its energy.

• If acceptably low, store result.
• If not, move a distance away from that point as a function of the energy (Metropolis criterion, as used in simulated annealing) and evaluate again.

• When some convergence threshold or time limit is met, stop and return stored results.

http://www.chemistryexplained.com/images/chfa_03_img0571.jpg
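The loop described on this slide can be sketched on a one-dimensional toy problem. This is an illustration under stated assumptions, not the procedure used by any particular program: the energy function, the geometric cooling schedule, and all parameter values are hypothetical.

```python
import math
import random

def metropolis_accept(delta_e, temperature, rng=random):
    """Always accept downhill moves; accept uphill moves with probability exp(-dE/T)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)

def simulated_annealing(energy, x0, steps=20000, t_start=2.0, t_end=0.01,
                        step_size=0.5, seed=0):
    """Minimise a 1-D energy function with Metropolis moves and geometric cooling."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    cooling = (t_end / t_start) ** (1.0 / steps)
    temperature = t_start
    for _ in range(steps):
        x_new = x + rng.uniform(-step_size, step_size)
        e_new = energy(x_new)
        if metropolis_accept(e_new - e, temperature, rng):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e   # store the lowest-energy state seen so far
        temperature *= cooling
    return best_x, best_e

def toy_energy(x):
    """Hypothetical rugged landscape: a parabola with sinusoidal ripples."""
    return 0.1 * x * x + math.sin(3.0 * x)

best_x, best_e = simulated_annealing(toy_energy, x0=4.0)
print(f"lowest energy found: {best_e:.3f} at x = {best_x:.3f}")
```

Accepting some uphill moves while the temperature is high is what lets the search escape local minima; lowering the temperature gradually turns it into a downhill refinement.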

Why is Rosetta so fast?


What have we learned?

• Can tackle sampling today

• Force fields sufficient?
– Folding to the native state
– Folding rate prediction

• Role of water
– Explicit solvent not crucial to rate determination?
– Compare to explicit solvent simulation

• Universal mechanism of folding?
– Maybe no universal mechanism: all proteins could be different?

Figure (not reproduced): predicted folding time (nanoseconds) versus experimental measurement (nanoseconds), both axes running from 1 to 100000 in powers of ten, with points labeled PPA, alpha helix, beta hairpin, villin, and BBAW.