Protein Structure Prediction


Historical Perspective

Protein Folding: From the Levinthal Paradox to Structure Prediction, Barry Honig, 1999

A personal perspective on advances and developments in protein folding over the last 40 years

Levinthal Paradox

Cyrus Levinthal, Columbia University, 1968

Observed that there is insufficient time for a protein to randomly search its entire conformational space
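A back-of-the-envelope version of Levinthal's argument shows the scale involved. The numbers are illustrative assumptions, not values from the slides: a 100-residue chain, 3 backbone states per residue, and one conformation tested every 10^-13 s.

```python
# Levinthal-style estimate of exhaustive conformational search time.
# All inputs are illustrative assumptions, not measured values.
n_residues = 100
states_per_residue = 3
samples_per_second = 1e13

n_conformations = states_per_residue ** n_residues  # exact integer, roughly 5e47
seconds = n_conformations / samples_per_second
years = seconds / (365.25 * 24 * 3600)

print(f"conformations: {n_conformations:.2e}")
print(f"exhaustive search time: {years:.2e} years")  # on the order of 1e27 years
```

Even with generous assumptions the search time exceeds the age of the universe (about 1.4e10 years) by many orders of magnitude, which is the heart of the paradox.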

Resolution: Proteins have to fold through some directed process

Goal is to understand the dynamics of this process

Old vs. New Views

Old: Hierarchical view of protein folding

Secondary structures form, then interact to form tertiary structures

General order of events

New: Statistical ensembles of states

Potential energy landscape

Folding “funnel”

The two views are not all that different; most of the important ideas were theorized many years ago

Secondary Structures

Consensus view is that secondary structure formation is the earliest part of the folding process

Numerous studies indicate that local sequence codes for local structures

Helical sequences in a folded protein tend to be helical in isolation

SSE prediction algorithms of the time were about 70% correct (1993)

The failures indicate that tertiary interactions play some role in stabilizing SSEs
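Accuracy figures like this are usually quoted as per-residue three-state accuracy (Q3): the fraction of residues whose predicted state (helix H, strand E, coil C) matches the observed one. Assuming that is the measure meant here, a minimal sketch:

```python
def q3(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state (H, E or C) matches the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Example with made-up state strings: 8 of 10 residues agree.
print(q3("HHHHCCEEEC", "HHHCCCEEEE"))  # prints 0.8
```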

However…

It is not clear which sequence elements code for the overall topology

One factor is the existence of hydrophobic faces on the surface of SSEs

Predicting the topology of SSEs remains challenging, even when the protein class is known
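One way to quantify a hydrophobic face is the hydrophobic moment introduced by Eisenberg and co-workers: each residue's hydrophobicity is treated as a vector pointing away from the helix axis, rotated by 100° per residue for an ideal alpha-helix, and the magnitude of the vector sum is large when hydrophobic residues cluster on one side. This sketch uses the Kyte-Doolittle hydropathy scale as a stand-in (an assumption; other scales give different absolute values).

```python
import math

# Kyte-Doolittle hydropathy values (substituted scale; not necessarily the one used in the cited work).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(seq: str, angle_deg: float = 100.0) -> float:
    """Magnitude of the vector sum of hydrophobicities around an ideal helix."""
    x = y = 0.0
    for k, aa in enumerate(seq):
        theta = math.radians(angle_deg * k)  # residue k sits k * 100 degrees around the axis
        x += KD[aa] * math.cos(theta)
        y += KD[aa] * math.sin(theta)
    return math.hypot(x, y)
```

A uniformly hydrophobic 18-residue helix (exactly five turns) has a moment near zero, because its vectors cancel, while a sequence with hydrophobic residues on one face and charged residues on the other scores high.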

Atomic level calculations

Molecular calculations have had a great impact on our understanding of protein folding

Harold Scheraga, 1968

Shneior Lifson, 1969

Martin Karplus’s laboratory, ~1979

Early calculations had trouble dealing with solvent effects

Secondary Structure

Many of the essential elements of protein energetics can be derived from looking at SSE formation

Early experimental work:

Ingwall et al., 1968, and Baldwin et al., 1989, worked on stabilizing shorter helices

Dyson and Wright, 1991, demonstrated that even short peptides in solution can be partially structured

Results

Yang and Honig, 1995

Alpha-helices are stabilized by hydrophobic interactions and close packing; hydrogen bonding has little effect

Beta-sheets stabilized by non-polar interactions between residues on adjacent strands

This work supports the idea that SSEs are coded for locally in the sequence

Folding Pathways

SSEs can change conformation in the presence of a relatively small number of tertiary interactions

Free-energy difference between alpha-helix, beta-sheet, and coil is not great

Individual helices can be changed into beta-sheets by changing just a few amino acids

This suggests that proteins have a “structural plasticity” which allows for changes in conformation
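To see why small free-energy differences matter, the sketch below converts free energies into equilibrium populations with the Boltzmann factor exp(-G/RT). The state energies are hypothetical values chosen for illustration, not numbers from Yang and Honig.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def populations(free_energies: dict, temperature: float = 298.15) -> dict:
    """Equilibrium fraction of each state from its free energy (kcal/mol)."""
    rt = R_KCAL * temperature
    weights = {state: math.exp(-g / rt) for state, g in free_energies.items()}
    z = sum(weights.values())  # partition function
    return {state: w / z for state, w in weights.items()}


# Hypothetical free energies (kcal/mol), for illustration only.
example = {"helix": -1.0, "sheet": 0.0, "coil": 0.0}
print(populations(example))
```

With a preference of only 1 kcal/mol the helix accounts for roughly three quarters of the population, and a mutation that shifts the balance by a similar amount can flip which state dominates.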

Folding Pathways

Early in folding processes, many different combinations of SSEs have very similar stabilities

In the end, it is the tertiary interactions which drive towards the native topology

Early in folding, SSEs “flicker”; they are eventually stabilized by tertiary interactions and converge to the native state

Suggests that multiple folding pathways exist, which can all lead to the same end result once stabilized

Structure Prediction

Recently, the field has split in two:

Protein prediction problem

Trying to predict the end result of folding, using extensive comparison between known and unknown structures

Protein folding problem

Trying to understand the folding pathway that leads to the end result, typically by MD simulations or energy calculations

The author contends that both areas will need to be used together to fully understand protein folding

PrISM

Yang and Honig, 1999

Software suite that integrates prediction based on simulations with known information about structures:

Sequence analysis

Structure-based sequence alignment

Fast structure-structure superposition using a structural domain database

Multiple structure alignment

Fold recognition and homology model building

Used to make predictions for all 43 targets of CASP3 conference (more on CASP later)

Conclusions

Much of the current understanding of protein folding was theorized long ago

Vague and speculative ideas have been replaced by carefully defined theoretical concepts and rigorous experimental observations

Conclusions

Polypeptide backbone is the most important determinant of structure

SSEs are “meta-stable”; the statement that sequence determines structure is not wholly accurate

More accurate statement is that sequence chooses from a limited set of available SSEs and determines how they are ordered in space

Conclusions

Free-energy differences between alternate conformations are not large; this may provide a basis for rapid evolutionary change

CASP

A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction, John Moult

CASP = Critical Assessment of Structure Prediction

First held in 1994, and every 2 years afterwards

Teams make structure predictions from sequences alone

CASP

Two categories of predictors:

Automated

Automatic servers, which must complete their analysis within 48 hours

Shows what is possible through computer analysis alone

Non-automated

Groups spend considerable time and effort on each target

Use both computational techniques and human analysis

CASP

CASP6, 2004

200 prediction teams from 24 countries

Over 30,000 predictions for 64 protein targets collected and evaluated

A conference is held afterwards to discuss the results, with many teams presenting their individual results and methodologies

Helps to steer future work

Modeling classes

Comparative modeling based on a clear sequence relationship

Modeling based on more distant evolutionary relationships

Modeling based on non-homologous fold relationships

Template free modeling

Comparative modeling based on a clear sequence relationship

An easily detectable sequence relationship exists between the target protein and one or more known protein structures, typically found through BLAST

Copy from the template; however:

Target and template sequences must be aligned

In general, reliably building regions not present in the template is still a challenge

Side-chain accuracy is poor

Refinement remains a challenge
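At its simplest, aligning target and template is a global pairwise alignment. The sketch below is a plain Needleman-Wunsch implementation with assumed toy scores (match +1, mismatch -1, linear gap -2); production modeling pipelines use substitution matrices such as BLOSUM62, affine gap penalties, and profile information.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,                           # gap in b
                score[i][j - 1] + gap,                           # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```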

Comparative modeling based on a clear sequence relationship

Progress in MD is needed for refinement

Models are useful for identifying which members of a protein family have similar functions and which differ

Modeling based on more distant evolutionary relationships

Makes use of PSI-BLAST and hidden Markov models

Compile a profile for the target sequence and compare it to other known profiles

Allows structures to be predicted even when the sequence similarity is low

Between CASP4 and CASP5, the use of metaservers to find consensus structures led to improved accuracy
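The profile idea can be illustrated with a toy position-specific scoring matrix (PSSM): count residues in each column of a multiple alignment, add pseudocounts, and convert the frequencies to log-odds against a background. This is a simplified sketch, not PSI-BLAST's actual procedure (which adds sequence weighting and substitution-matrix-based pseudocounts); the uniform background, the pseudocount of 1, and the example alignment are assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment: list, pseudocount: float = 1.0) -> list:
    """Log-odds scores (bits) per column of an equal-length alignment; '-' gaps are ignored."""
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background frequency
    pssm = []
    for col in range(len(alignment[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in alignment:
            if seq[col] in counts:
                counts[seq[col]] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log2((counts[aa] / total) / background) for aa in AMINO_ACIDS})
    return pssm


def score(pssm: list, seq: str) -> float:
    """Ungapped score of seq against the profile."""
    return sum(column[aa] for column, aa in zip(pssm, seq))


# Hypothetical alignment of a short conserved segment.
msa = ["MKVLA", "MKILA", "MRVLA", "MKVIA"]
pssm = build_pssm(msa)
print(score(pssm, "MKVLA"), score(pssm, "GGGGG"))
```

A sequence resembling the family scores well even where it differs from any single member, which is what lets profile methods detect more distant relationships than pairwise comparison.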

Modeling based on more distant evolutionary relationships

Limitations:

The correct template may not be identified

Aligning the target sequence to the template is not trivial

A significant fraction of residues will have no structural equivalent in the template; modeling of these regions is hit or miss

Although target and template are similar, they are not identical; the greater the difference, the higher the error

Details are therefore not accurate, but the overall structure can be useful

For further improvement, these methods must be combined with template-free methodologies


Modeling based on non-homologous fold relationships

Protein “threading”

In recent CASP experiments, these methods have not been competitive with template-free models
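Threading asks how well a sequence fits the structural environments of a known fold, rather than how similar it is to the fold's sequence. A deliberately minimal, gapless sketch under assumed rules: each template position is labeled buried (B) or exposed (E), hydrophobic residues are rewarded in buried positions and polar residues in exposed ones, and the sequence is slid along the template to find the best-scoring offset. Real threading methods use gapped alignment and statistically derived pairwise potentials; the residue set and scores here are hypothetical.

```python
HYDROPHOBIC = set("AVILMFWC")  # assumed hydrophobic set, for illustration


def environment_score(residue: str, env: str) -> float:
    """Toy compatibility of one residue with a buried ('B') or exposed ('E') site."""
    hydrophobic = residue in HYDROPHOBIC
    if env == "B":
        return 1.0 if hydrophobic else -1.0
    return 0.5 if not hydrophobic else -0.5


def thread(sequence: str, template_env: str):
    """Best (score, offset) for ungapped placement of the template along the sequence.

    Returns None if the sequence is shorter than the template.
    """
    best = None
    for offset in range(len(sequence) - len(template_env) + 1):
        window = sequence[offset:offset + len(template_env)]
        s = sum(environment_score(r, e) for r, e in zip(window, template_env))
        if best is None or s > best[0]:
            best = (s, offset)
    return best


print(thread("KKKLLKKLLKKK", "BBEEBB"))  # (5.0, 3)
```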

Template-free Modeling

For sequences where no template is available

Historically, physics-based approaches were used

Newer methods focus on substructures

While we have not seen all folds, we have probably seen nearly all substructures

Make use of substructure relationships, from a few residues through SSEs to super-secondary structures

Template-free Modeling

A range of possible conformations is considered

The most successful package has been ROSETTA

For proteins of fewer than ~100 residues, it produces one or several approximately correct structures (4-6 Å RMSD for Cα atoms)
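The 4-6 Å figure is a Cα RMSD measured after optimally superimposing model and reference. A minimal sketch using the Kabsch algorithm with NumPy; it assumes both coordinate arrays list the same residues in the same order.

```python
import numpy as np


def kabsch_rmsd(P, Q) -> float:
    """RMSD between N x 3 coordinate sets P and Q after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P_c = P - P.mean(axis=0)  # remove translation
    Q_c = Q - Q.mean(axis=0)
    H = P_c.T @ Q_c  # 3 x 3 covariance matrix
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against an improper rotation (reflection)
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T  # rotation that maps P_c onto Q_c
    diff = P_c @ R.T - Q_c
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

A rotated and translated copy of a structure gives an RMSD of essentially zero, so the measure reflects only differences in shape.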

Selecting the most accurate structures from all the candidates remains unsolved; current methods typically rely on clustering
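The clustering idea can be sketched simply: among many decoys, pick the one with the most neighbors within an RMSD cutoff, on the reasoning that near-native conformations tend to be sampled more often. The cutoff and the precomputed pairwise-distance input are assumptions; ROSETTA's actual clustering protocol differs in detail.

```python
def select_by_clustering(dist: list, cutoff: float):
    """Return (index, neighbor_count) of the decoy with the most neighbors within cutoff.

    dist is a symmetric matrix of pairwise RMSDs; ties go to the lowest index.
    """
    n = len(dist)
    best_index, best_count = -1, -1
    for i in range(n):
        count = sum(1 for j in range(n) if j != i and dist[i][j] <= cutoff)
        if count > best_count:
            best_index, best_count = i, count
    return best_index, best_count


# Hypothetical RMSD matrix (Å): decoys 0-2 are similar, decoy 3 is an outlier.
dist = [
    [0.0, 1.0, 2.0, 10.0],
    [1.0, 0.0, 1.5, 10.0],
    [2.0, 1.5, 0.0, 10.0],
    [10.0, 10.0, 10.0, 0.0],
]
print(select_by_clustering(dist, cutoff=3.0))  # (0, 2)
```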

Development of atomic models is crucial to further progress


CASP Progress