Visualisation/prediction of 3D structures

Recognition ability is the basis of biological function

3D structure is key for recognition

Objectives

Visualize / understand 3D structures and their interactions

Derive structure-function relationships

Predict 3D structure

[Figures: Total entries; Protein folds]

Structure prediction: aim

Structure prediction tries to build models of 3D structures of proteins that could be useful for understanding structure-function relationships.

The protein folding problem

The information for 3D structures is coded in the protein sequence

Proteins fold into their native structure in seconds

Native structures are both thermodynamically stable and kinetically accessible

AVVTW...GTTWVR

ab-initio prediction

Prediction from sequence using first principles

Ab-initio prediction

“In theory”, we should be able to build native structures from first principles using sequence information and molecular dynamics simulations: “Ab-initio prediction of structure”

1 µs simulation of the folding of a model protein (Duan-Kollman: Science, 277, 1793, 1998).

Simulations of reversible peptide folding (20-200 ns) (Daura et al., Angew. Chem., 38, 236, 1999).

Distributed folding simulations of villin (36 residues) (Zagrovic et al., JMB, 323, 927, 2002).

... the bad news ...

It is not possible to extend simulations to the “seconds” range

Simulations are limited to small systems and to fast folding/unfolding events in known structures (steered dynamics, biased molecular dynamics)

Simplified systems
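A back-of-the-envelope count of integration steps shows why the “seconds” range is out of reach. The sketch below assumes a 2 fs time step and a throughput of 100 ns of simulated time per day; both numbers are illustrative assumptions, not values taken from these slides.

```python
# Rough cost of reaching a target simulated time with molecular dynamics.
# ASSUMPTIONS (not from the slides): 2 fs integration step, 100 ns/day throughput.

FEMTOSECOND = 1e-15
NANOSECOND = 1e-9


def steps_needed(target_seconds, timestep_seconds=2 * FEMTOSECOND):
    """Number of integration steps required to cover target_seconds."""
    return target_seconds / timestep_seconds


def wall_clock_days(target_seconds, ns_per_day=100.0):
    """Days of computing needed at a given (hypothetical) throughput."""
    return target_seconds / (ns_per_day * NANOSECOND)


if __name__ == "__main__":
    for label, t in [("1 microsecond", 1e-6), ("1 second", 1.0)]:
        print(f"{label}: {steps_needed(t):.1e} steps, "
              f"{wall_clock_days(t):.1e} days at 100 ns/day")
```

Under these assumptions a microsecond costs days of computing, while a full second would need a number of steps many orders of magnitude larger.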

Some protein from E. coli predicted at 7.6 Å

(CASP3, H. Scheraga)

Results from ab-initio

Average error 5 Å - 10 Å

Function cannot be predicted

Long simulations

comparative modelling

The most efficient way to predict protein structure is to compare with known 3D structures

Basic concept

In a given protein, 3D structure is a more conserved characteristic than sequence

Some amino acids are “equivalent” to each other

Evolutionary pressure allows only amino acid substitutions that keep the 3D structure largely unaltered

Two proteins of “similar” sequences must have the “same” 3D structure

Possible scenarios

1. Homology can be recognized using sequence comparison tools or protein family databases (BLAST, Clustal, Pfam, ...).

Structural and functional predictions are feasible

2. Homology exists but cannot be recognized easily (PSI-BLAST, threading)

Low resolution fold predictions are possible. No functional information.

3. No homology

1D predictions. Sequence motifs. Limited functional prediction. Ab-initio prediction

fold prediction

3D struc. prediction

1D prediction

Prediction is based on averaging amino acid properties

AGGCFHIKLAAGIHLLVILVVKLGFSTRDEEASS

Average over a window
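The windowed averaging can be sketched in a few lines. The slides do not name a property scale at this point, so the example assumes the Kyte-Doolittle hydropathy scale and a window of 7 residues; both choices are illustrative.

```python
# Sliding-window average of a per-residue property.
# ASSUMPTION: Kyte-Doolittle hydropathy values are used as the example scale;
# the slides do not name a specific scale or window size.

KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_average(sequence, scale, window=7):
    """Return one averaged value per full window along the sequence."""
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    values = [scale[res] for res in sequence]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    seq = "AGGCFHIKLAAGIHLLVILVVKLGFSTRDEEASS"  # sequence shown on the slide
    window = 7
    profile = window_average(seq, KYTE_DOOLITTLE, window)
    for offset, value in enumerate(profile):
        centre = offset + window // 2  # residue at the middle of the window
        print(f"{centre + 1:3d} {seq[centre]} {value:6.2f}")
```

Each value is reported at the central residue of its window, so the first and last `window // 2` residues get no value in this simple version.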

1D prediction. Properties

Secondary structure propensities

Hydrophobicity

Accessibility

...

Amino acid  P(α)  P(β)  P(turn)
Ala         1.29  0.90  0.78
Cys         1.11  0.74  0.80
Leu         1.30  1.02  0.59
Met         1.47  0.97  0.39
Glu         1.44  0.75  1.00
Gln         1.27  0.80  0.97
His         1.22  1.08  0.69
Lys         1.23  0.77  0.96
Val         0.91  1.49  0.47
Ile         0.97  1.45  0.51
Phe         1.07  1.32  0.58
Tyr         0.72  1.25  1.05
Trp         0.99  1.14  0.75
Thr         0.82  1.21  1.03
Gly         0.56  0.92  1.64
Ser         0.82  0.95  1.33
Asp         1.04  0.72  1.41
Asn         0.90  0.76  1.23
Pro         0.52  0.64  1.91
Arg         0.96  0.99  0.88

Chou-Fasman propensities (Biochemistry 17, 4277, 1978)
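As a minimal sketch of how such propensities can be used, the code below averages the three values listed on this slide over a short window and assigns each residue the state with the highest average. This is a deliberate simplification: the actual Chou-Fasman procedure relies on nucleation and extension rules, and the window size of 5 is an assumption.

```python
# Simplified propensity-based secondary-structure assignment.
# Values are the (P_alpha, P_beta, P_turn) propensities listed on this slide.
# ASSUMPTION: the state with the highest window-averaged propensity wins;
# the real Chou-Fasman procedure uses nucleation and extension rules instead.

PROPENSITY = {
    "A": (1.29, 0.90, 0.78), "C": (1.11, 0.74, 0.80), "L": (1.30, 1.02, 0.59),
    "M": (1.47, 0.97, 0.39), "E": (1.44, 0.75, 1.00), "Q": (1.27, 0.80, 0.97),
    "H": (1.22, 1.08, 0.69), "K": (1.23, 0.77, 0.96), "V": (0.91, 1.49, 0.47),
    "I": (0.97, 1.45, 0.51), "F": (1.07, 1.32, 0.58), "Y": (0.72, 1.25, 1.05),
    "W": (0.99, 1.14, 0.75), "T": (0.82, 1.21, 1.03), "G": (0.56, 0.92, 1.64),
    "S": (0.82, 0.95, 1.33), "D": (1.04, 0.72, 1.41), "N": (0.90, 0.76, 1.23),
    "P": (0.52, 0.64, 1.91), "R": (0.96, 0.99, 0.88),
}

STATES = "HET"  # H = helix, E = beta strand, T = turn


def predict_states(sequence, window=5):
    """Assign H/E/T to every residue from window-averaged propensities."""
    half = window // 2
    result = []
    for i in range(len(sequence)):
        # The window is truncated at the sequence ends.
        segment = sequence[max(0, i - half): i + half + 1]
        averages = [
            sum(PROPENSITY[res][k] for res in segment) / len(segment)
            for k in range(3)
        ]
        result.append(STATES[averages.index(max(averages))])
    return "".join(result)


if __name__ == "__main__":
    seq = "AGGCFHIKLAAGIHLLVILVVKLGFSTRDEEASS"
    print(seq)
    print(predict_states(seq))
```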


Some programs (www.expasy.org)

BCM PSSP - Baylor College of Medicine

Prof - Cascaded Multiple Classifiers for Secondary Structure Prediction

GOR I (Garnier et al, 1978) [At PBIL or at SBDS]

GOR II (Gibrat et al, 1987)

GOR IV (Garnier et al, 1996)

HNN - Hierarchical Neural Network method (Guermeur, 1997)

Jpred - A consensus method for protein secondary structure prediction at University of Dundee

nnPredict - University of California at San Francisco (UCSF)

PredictProtein - PHDsec, PHDacc, PHDhtm, PHDtopology, PHDthreader, MaxHom, EvalSec from Columbia University

PSA - BioMolecular Engineering Research Center (BMERC) / Boston

PSIpred - Various protein structure prediction methods at Brunel University

SOPM (Geourjon and Deléage, 1994)

SOPMA (Geourjon and Deléage, 1995)

AGADIR - An algorithm to predict the helical content of peptides

1D Prediction

Original methods: 1 sequence and uniform parameters (25-30%)

Early improvements: parameters specific to protein classes

Present methods use sequence profiles obtained from multiple alignments, and neural networks to extract parameters (70-75%, 98% for transmembrane helices)
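The percentages quoted for these methods are presumably per-residue accuracies, often reported as Q3 (the share of residues whose three-state assignment matches the observed one); that reading is an assumption. A minimal sketch of the measure, with made-up example strings:

```python
# Per-residue accuracy of a secondary-structure prediction.
# ASSUMPTION: the percentages on the slide are read as this kind of
# per-residue score; the example strings below are invented.


def q3_accuracy(predicted, observed):
    """Percentage of positions where predicted and observed states agree."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("empty assignment")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


if __name__ == "__main__":
    predicted = "HHHHCCEE"
    observed = "HHHCCCEE"
    print(f"{q3_accuracy(predicted, observed):.1f}% of residues correct")
```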

PredictProtein (PHD)

1. Building of a multiple alignment using Swissprot, prosite, and domain databases

2. 1D prediction from the generated profile using neural networks

3. Fold recognition

4. Confidence evaluation

PredictProtein: available information

Signal peptides                SignalP
O-glycosylation                NetOglyc
Chloroplast import signal      ChloroP
Consensus secondary structure  JPRED
Transmembrane                  TMHMM, TOPPRED
SwissModel

Methods for remote homology

Homology can be recognized using PSI-Blast

Fold prediction is possible using threading methods

Accurate 3D prediction is not possible: no structure-function relationship can be inferred from the models

Threading

The unknown sequence is “folded” onto a number of known structures

Scoring functions evaluate the fit between sequence and structure according to statistical functions and sequence comparison

[Figure: threading scheme]

Query:

ATTWV....PRKSCT  Sequence
HHHHH....CCBBBB  Pred. sec. struc.
eeebb....eeebeb  Pred. accessibility

Known structures:

Sequence  GGTV....ATTW ........... ATTVL....FFRK
Obs SS    BBBB....CCHH ........... HHHB.....CBCB
Obs Acc.  EEBE.....BBEB ........... BBEBB....EBBE

Scores: 10.5, 5.2, ... → SELECTED HIT

Technical aspects

Alignment: Dynamic programming (Needleman & Wunsch, 1970)

Scoring function:

$w_{seq} \cdot P_{seq} + w_{str} \cdot (P_{SS} + P_{AC})$

$P_{seq}$: Dayhoff matrix; $P_{SS}$ and $P_{AC}$: probability model on pred. SS and AC
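The alignment and scoring described above can be sketched as a Needleman-Wunsch global alignment whose per-position score is w_seq·P_seq + w_str·(P_SS + P_AC). In this sketch P_seq, P_SS and P_AC are crude 1/0 match indicators standing in for the Dayhoff matrix and the probability models, and the weights, the linear gap penalty and the helper names are all hypothetical choices made for illustration.

```python
# Global alignment (Needleman-Wunsch) of a query against one template,
# scored as w_seq * P_seq + w_str * (P_SS + P_AC).
# ASSUMPTIONS: P_seq, P_SS and P_AC are 1/0 match indicators standing in for
# the Dayhoff matrix and the probability models; weights and the linear gap
# penalty are arbitrary illustrative values.

from typing import NamedTuple


class Position(NamedTuple):
    residue: str        # one-letter amino acid code
    sec_struct: str     # H, B or C (predicted for query, observed for template)
    accessibility: str  # e (exposed) or b (buried); case is ignored


def make_positions(seq, ss, acc):
    """Zip three equal-length strings into a list of Position records."""
    return [Position(r, s, a) for r, s, a in zip(seq, ss, acc)]


def pair_score(q, t, w_seq=1.0, w_str=0.5):
    """Score for placing query position q on template position t."""
    p_seq = 1.0 if q.residue == t.residue else 0.0
    p_ss = 1.0 if q.sec_struct == t.sec_struct else 0.0
    p_ac = 1.0 if q.accessibility.lower() == t.accessibility.lower() else 0.0
    return w_seq * p_seq + w_str * (p_ss + p_ac)


def needleman_wunsch(query, template, gap=-1.0, score=pair_score):
    """Return the optimal global alignment score (dynamic programming)."""
    n, m = len(query), len(template)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(query[i - 1], template[j - 1]),
                dp[i - 1][j] + gap,   # gap in the template
                dp[i][j - 1] + gap,   # gap in the query
            )
    return dp[n][m]


if __name__ == "__main__":
    query = make_positions("ATTWV", "HHHHH", "eeebb")
    # Hypothetical mini-library of templates (sequence, observed SS, observed Acc).
    library = {
        "template_1": make_positions("ATWV", "HHHH", "EEBB"),
        "template_2": make_positions("GGTV", "BBBB", "EEBE"),
    }
    scores = {name: needleman_wunsch(query, t) for name, t in library.items()}
    best = max(scores, key=scores.get)
    print(scores, "selected hit:", best)
```

Ranking the templates by this score and keeping the highest one corresponds to picking the selected hit; a real threading program would also return the alignment itself via traceback.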

Threading accuracy

[Figure: % correct hits (axis ticks 0 to 0.35) versus % sequence identity (5 to 25)]

3D-PSSM Steps

Building of 1D/superfamily profile

Building of 3D/superfamily profile

Determine/predict secondary structure and accessibility

Best score from

1. Structure vs. query PSSM

2. Query vs. 1D-PSSM structures

3. Query vs. 3D-PSSM structures

Comparative modelling

Good for sequence identity > 30%

Accuracy is very high for sequence identity > 60%

Reminder

The model must be USEFUL

Only the “interesting” regions of the protein need to be modelled

Expected accuracy

Strongly dependent on the quality of the sequence alignment

Strongly dependent on the identity with “template” structures. Very good structures if identity > 60-70%.

Quality of the model is better in the backbone than side chains

Quality of the model is better in conserved regions
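Since expected accuracy hinges on identity with the template, a quick check of an alignment is useful before modelling. The sketch below computes percent identity over aligned (non-gap) columns and maps it onto the 30% and 60% thresholds quoted in these slides; the identity convention, the category labels and the example alignment are assumptions.

```python
# Percent identity of a pairwise alignment and the regime it falls into.
# Thresholds (30 % and 60 %) follow the slides; computing identity over
# non-gap columns is one of several conventions and is an ASSUMPTION here.


def percent_identity(aligned_query, aligned_template, gap="-"):
    """Identity in percent over columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    pairs = [
        (a, b)
        for a, b in zip(aligned_query, aligned_template)
        if a != gap and b != gap
    ]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def expected_quality(identity):
    """Map percent identity onto the regimes described in the slides."""
    if identity > 60:
        return "very good model expected"
    if identity > 30:
        return "comparative modelling feasible"
    return "remote homology: fold recognition only"


if __name__ == "__main__":
    # Invented alignment, for illustration only.
    identity = percent_identity("ATTWV-RK", "ATSWVGRK")
    print(f"{identity:.1f}% identity: {expected_quality(identity)}")
```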

Steps

1. Alignment of template structures

2. Alignment of unknown sequence against template alignment

3. Build structure of conserved regions (SCR)

4. Build of unconserved regions (“loops” usually)

Optimization

1. Optimize side chain conformation

• Energy minimization restricted to standard conformers and VdW energy

2. Optimize everything

• Global energy minimization with restraints

• Molecular dynamics
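The first optimization step, choosing among standard side chain conformers by VdW energy, can be sketched as follows. The 12-6 Lennard-Jones form, its parameters, the single-atom "conformers" and the coordinates are all hypothetical placeholders, not values from any force field or modelling package.

```python
# Pick the side-chain conformer with the lowest van der Waals energy.
# ASSUMPTIONS: a 12-6 Lennard-Jones term with one made-up (epsilon, sigma)
# pair for every atom pair; conformers and coordinates are invented.

import math


def lennard_jones(r, epsilon=0.2, sigma=3.4):
    """12-6 Lennard-Jones energy for two atoms at distance r (units of sigma)."""
    ratio6 = (sigma / r) ** 6
    return 4.0 * epsilon * (ratio6 * ratio6 - ratio6)


def vdw_energy(conformer_atoms, environment_atoms):
    """Sum of pair energies between conformer atoms and fixed environment atoms."""
    return sum(
        lennard_jones(math.dist(a, b))
        for a in conformer_atoms
        for b in environment_atoms
    )


def best_conformer(conformers, environment_atoms):
    """Return the name of the conformer with the lowest VdW energy."""
    return min(
        conformers,
        key=lambda name: vdw_energy(conformers[name], environment_atoms),
    )


if __name__ == "__main__":
    environment = [(0.0, 0.0, 0.0)]
    conformers = {
        "clash": [(2.0, 0.0, 0.0)],     # too close: strong repulsion
        "contact": [(3.8, 0.0, 0.0)],   # near the energy minimum
        "far": [(10.0, 0.0, 0.0)],      # almost no interaction
    }
    print(best_conformer(conformers, environment))
```

Restricting the search to a discrete set of conformers keeps this step cheap; the global minimization and molecular dynamics of step 2 then relax the whole model.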

Quality test

Energy does not discriminate between a correct and a wrong model

The structure must be “chemically correct” to use it in quantitative predictions

Prediction software

SwissModel (automatic) http://www.expasy.org/swissmod/

SwissModel Repository http://swissmodel.expasy.org/repository/

3D-JIGSAW (M. Sternberg) http://www.bmm.icnet.uk/servers/3djigsaw/

Modeller (A.Sali) http://salilab.org/modeller/modeller.html

MODBASE (A. Sali) http://alto.compbio.ucsf.edu/modbase-cgi/index.cgi

Resultspdbv

Final test

The model must justify experimental data (i.e. differences between unknown sequence and templates) and be useful to understand function.