Homology Modeling Lu Chih-Hao 1

Homology Modeling Lu Chih-Hao 1. Why study protein structure? Proteins play crucial functional roles in all biological processes: enzymatic catalysis,

Homology Modeling

Lu Chih-Hao

Why study protein structure?

• Proteins play crucial functional roles in all biological processes: enzymatic catalysis, signaling messengers …

• Function depends on 3D structure.

• Protein sequences are easy to obtain; structures are difficult to determine.

Where to find the data?

• Protein Data Bank (PDB)
– http://www.rcsb.org/pdb/
– More than ~65,500 protein structures

• Each entry is a text file containing the X, Y, Z coordinates of each heavy atom, from the first residue to the last
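As a minimal sketch of how the per-atom coordinates can be read from such a text file, the Python below parses ATOM records using the fixed column positions of the PDB format. The function names and the two demo records are assumptions made up for illustration; they are not taken from any real PDB entry.

```python
def format_atom(serial, name, res_name, chain, res_seq, x, y, z):
    """Build one ATOM record in the fixed-column PDB layout."""
    return ("ATOM  {:5d} {:<4s} {:>3s} {:1s}{:4d}    {:8.3f}{:8.3f}{:8.3f}"
            .format(serial, name, res_name, chain, res_seq, x, y, z))


def parse_atom(line):
    """Read atom name, residue and X, Y, Z from one ATOM record."""
    return {
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }


def read_atoms(pdb_text):
    """Collect all ATOM records of a PDB-format text, in file order."""
    return [parse_atom(l) for l in pdb_text.splitlines() if l.startswith("ATOM")]


# Two made-up records (hypothetical coordinates, for illustration only).
demo = "\n".join([
    format_atom(1, " N", "LYS", "A", 1, 1.5, -2.25, 3.0),
    format_atom(2, " CA", "LYS", "A", 1, 2.0, -1.0, 4.125),
    "END",
])
for atom in read_atoms(demo):
    print(atom["res_name"], atom["res_seq"], atom["name"], atom["xyz"])
```

Slicing by column rather than splitting on whitespace matters because adjacent fields in real files can touch without a separating space.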

PDB Statistics

TIM barrel

How to determine the protein structure?

• By experimentation
– X-ray crystallography
– NMR (nuclear magnetic resonance spectroscopy)

• Sequence-Structure gap

Protein Structure Prediction

• The primary sequence already contains all the information necessary to define the 3D structure.

• The 3D protein structure can be predicted according to three main categories of methods (Rost & O’Donoghue, 1997): (1) homology modeling; (2) fold recognition (threading); (3) ab initio techniques.

• Homology modeling is currently the most accurate method to predict protein 3D structure (Tramontano, 1998).

Protein Structure Prediction

[Flowchart]
Sequence → sequence homology to a known fold?
– >30%: homology modeling → model
– <30%: threading → match found?
  – Yes: model
  – No: ab initio
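The decision logic of this flowchart can be sketched in a few lines of Python. The function name, the argument names, and the handling of exactly 30% identity (treated here like the <30% branch) are assumptions for illustration; the slide only gives the >30% / <30% split.

```python
def choose_method(identity_percent, threading_match_found):
    """Pick a prediction route following the flowchart's branches."""
    if identity_percent > 30:
        return "homology modeling"
    # Assumption: exactly 30% is handled like the <30% branch.
    if threading_match_found:
        return "threading"
    return "ab initio"


for identity, match in [(45, False), (22, True), (12, False)]:
    print(identity, match, "->", choose_method(identity, match))
```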

[Figure: percentage sequence identity/similarity (0 to 100) versus number of residues aligned (0 to 250), with the safe zone marked. Source: B. Rost, Columbia, New York]

Sequence identity implies structural similarity

Sequence similarity implies structural similarity?

Homology Modeling

• Basis
– Structure is much more conserved than sequence during evolution

• Limited applicability
– A large number of proteins and ORFs have no similarity to proteins with known structure

What is Homology Modeling?

Target (structure unknown: ??):
KQFTKCELSQNLYDIDGYGRIALPELICTMFHTSGYDTQAIVENDESTEYGLFQISNALWCKSSQSPQSRNICDITCDKFLDDDITDDIMCAKKILDIKGIDYWIAHKALCTEKLEQWLCEKE

Template (known structure):
KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDVQAWIRGCRL

Structure labels in the figure: 8lyz, 1alc

Target and template share a similar sequence → they are homologous → use the known structure as template

Structure prediction by homology modeling

[Figure: four-step workflow, Step 1 to Step 4]

Homology detection and template selection

• Homology detection
– Detect the fold of a probe sequence by searching a library of known folds.

• Three types of sequence-based methods:
– Pairwise sequence-sequence comparison
  • FASTA, BLAST
– Sequence-profile comparison
  • PSI-BLAST, IMPALA, HMMER, SAM
– Profile-profile comparison
  • prof_sim, COMPASS

Q T

Sequence-Sequence comparison

BLAST, FASTA, SSEARCH
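To make pairwise sequence-sequence comparison concrete, here is a minimal sketch of a global alignment (Needleman-Wunsch) followed by a percent-identity calculation. This is not how BLAST or FASTA work internally: they are heuristic local searches with substitution matrices. The simple match/mismatch/gap scores and the identity definition (identical pairs over gap-free aligned pairs) are assumptions chosen to keep the example short.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align two sequences; return both gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Identical residue pairs as a percentage of gap-free aligned pairs."""
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


top, bottom = needleman_wunsch("ACGT", "AGT")
print(top)
print(bottom)
print(percent_identity(top, bottom))
```

The resulting percentage is the kind of value that is compared against an identity threshold when deciding whether a template is usable.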

Q T

Profile-Sequence comparison

PSI-BLAST

PSI-BLAST Overview

Q T

Sequence-Profile comparison

RPS-BLAST, IMPALA, HMMER, SAM

Q T

Profile-Profile comparison

prof_sim, COMPASS

Method_1
1lmb3 <-> 1pou shift = 9.34 σ = 39.62
LEDARRLKAIYEKKKNELGLSQESVADKMGMGQSGVGALFNGINALNAYNAALLAKILKVSVEEFSPSIAREIYEMYEA
HHHHHHHHHHHHHHHHHCCCChhhhhhhhccchhhhhhhhccccccchhhhhhhhhhhccchhhcchhhhhhhhhhhhh
||||||||||||||||||||| ++++++++ + ++++++++++++ ++++++++
000000000000000000000 99999999 X XXXXXXXXXXXX XXXXXXXX
HHHHHHHHHHHHHHHHHHCCC---------cchhhhhhhhhcccccc---chhhhhhhcccccccchhhhhhhhhhhhh
LEELEQFAKTFKQRRIKLGFT---------QGDVGLAMGKLYGNDFS---QTTISRFEALNLSFKNMCKLKPLLEKWLN

Method_2
1lmb3 <-> 1pou Shift = 0.67 σ = 60.78
LEDARRLKAIYEKKKNELGLS----QESVADKMG--MGQSGVGALFN-GINALNAYNAALLAKILKVSVEEFS
HHHHHHHHHHHHHHHHHCCCC----hhhhhhhhc--cCHHHHHHHHC-cccccchhhhhhhhhhhccchhhcc
||||||||||||||||||||| ---- |||||||||| -- ++++++++ ++
000000000000000000000 4444 0000000000 11 11111111 44
HHHHHHHHHHHHHHHHHHCCCcchhhhhhhhhcccccCCHHHHHHHCccccccchhhhhhhhhhh---hhhcc
LEELEQFAKTFKQRRIKLGFTQGDVGLAMGKLYGNDFSQTTISRFEALNLSFKNMCKLKPLLEKW---LNDAE

The importance of the sequence alignment

SCR: structurally conserved region; SVR: structurally variable region

Backbone generation

• Rigid-body assembly
– Building the model core

Construction of loops might be done by:

Ab initio methods, without any prior knowledge. A large number of conformations are checked, and each is evaluated with an empirical scoring function.

Wedemeyer & Scheraga, J. Comput. Chem. 20, 819-844 (1999)

Construction of loops might be done by:

Using a database of loops that appear in known structures. The loops can be categorized by their length or sequence.

[Figure: data → clustered data → library]

Scan the database and search for protein fragments with the correct number of residues and the correct end-to-end distance
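The database scan described here can be sketched as a simple filter. The fragment library layout (a dict mapping a fragment name to a list of Cα coordinates), the tolerance parameter, and the toy coordinates are all assumptions for illustration; real loop libraries also check stem geometry and steric clashes.

```python
import math


def end_to_end(coords):
    """Distance between the first and last point of a fragment."""
    return math.dist(coords[0], coords[-1])


def find_loop_candidates(fragments, n_residues, target_distance, tolerance=0.5):
    """Return names of fragments with the right length and end-to-end distance."""
    hits = []
    for name, coords in fragments.items():
        if len(coords) != n_residues:
            continue  # wrong number of residues
        if abs(end_to_end(coords) - target_distance) <= tolerance:
            hits.append(name)
    return hits


# Hypothetical three-fragment library with made-up coordinates (in Å).
library = {
    "frag_a": [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0)],
    "frag_b": [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)],
    "frag_c": [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
}
print(find_loop_candidates(library, n_residues=3, target_distance=5.0))
```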

Loop Modeling: A database approach

[Figure: cRMS (Å) versus loop length]

The method breaks down for loops longer than 9 residues.

Target: 2bj7A

Predicted model with long loop: GDT_TS = 45.96
Without loop: GDT_TS = 60.48

Errors in Homology Modeling

a) Side chain packing
b) Distortions and shifts
c) No template

[Figure legend: Template, Model, True structure]

Errors in Homology Modeling

d) Misalignments
e) Incorrect template

[Figure legend: Template, Model, True structure]

(Marti-Renom et al., 2000)

PROCHECK, Verify3D, Prosa, Anolea, Bala …

PROCHECK

[Figure: plot with β and α regions labeled]

http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html

Verify3D

• Verify3D analyzes the compatibility of an atomic model (3D) with its own amino acid sequence (1D).

Luethy et al., 1992

ProQ Server

• ProQ is a neural network-based predictor
– It predicts the quality of a protein model from its structural features.

Arne Elofsson's group: http://www.sbc.su.se/~bjorn/ProQ/

          Correct   Good    Very good
LGscore   > 1.5     > 3     > 5
MaxSub    > 0.1     > 0.5   > 0.8
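The ProQ thresholds listed here translate directly into a lookup. The sketch below applies them to each score separately; the function names and the label "incorrect" for values at or below the lowest threshold are assumptions, since the slide names only the three positive categories.

```python
# (label, LGscore threshold, MaxSub threshold), strictest category first.
PROQ_LEVELS = [
    ("very good", 5.0, 0.8),
    ("good", 3.0, 0.5),
    ("correct", 1.5, 0.1),
]


def classify_lgscore(lgscore):
    """Map a predicted LGscore to the slide's quality categories."""
    for label, lg_min, _ in PROQ_LEVELS:
        if lgscore > lg_min:
            return label
    return "incorrect"  # assumed label, not given on the slide


def classify_maxsub(maxsub):
    """Map a predicted MaxSub score to the slide's quality categories."""
    for label, _, ms_min in PROQ_LEVELS:
        if maxsub > ms_min:
            return label
    return "incorrect"  # assumed label, not given on the slide


print(classify_lgscore(4.0), classify_maxsub(0.85))
```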

Modeling accuracy

(Marti-Renom et al., 2000)

Utility of Structural Information

(PS)2: protein structure prediction server

Consensus strategy

• The idea of consensus analysis is to gather predictions from a set of different methods.

• The performance of consensus methods is significantly higher than that of individual methods.

3d-shotgun (Fischer D., 2003)
3d-jury (Ginalski K et al., 2003)
Pmodeller (Bjorn W et al., 2003)
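One simple way to realize the consensus idea is to prefer the model that agrees most with the models produced by the other methods. The sketch below is only an illustration of that idea, not the algorithm of any of the cited tools; the similarity values and the method names are made up.

```python
def consensus_pick(similarity):
    """Return the model whose summed similarity to all other models is highest.

    similarity[a][b] is a pairwise similarity score between models a and b.
    """
    totals = {
        model: sum(s for other, s in row.items() if other != model)
        for model, row in similarity.items()
    }
    return max(totals, key=totals.get)


# Hypothetical pairwise similarities between models from three methods.
pairwise = {
    "method_A": {"method_B": 0.8, "method_C": 0.3},
    "method_B": {"method_A": 0.8, "method_C": 0.4},
    "method_C": {"method_A": 0.3, "method_B": 0.4},
}
print(consensus_pick(pairwise))
```

A model that is an outlier among the predictions receives a low total and is not selected, which is the intuition behind gathering predictions from several methods.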

Structure prediction by homology modeling

[Figure: four-step workflow, Step 1 to Step 4]

Overview of the (PS)2 method

Step 1: Template search/selection by the consensus of PSI-BLAST and IMPALA

Step 2: Target-template alignment by the consensus of T-Coffee, PSI-BLAST, and IMPALA

Step 3: Model building by MODELLER, and structure evaluation and visualization by CHIME and Raster3D

Figure 1. Overview of the protein structure prediction server, (PS)2. Panels (a) to (d).

Alignment method

Input: target and template sequences

Output: target-template aligned sequences

Step 1: Initialize all entries of the alignment matrix to 0. Align the target and template sequences using PSI-BLAST, IMPALA, and T-Coffee.

Step 2: For each position, sum the alignment scores of the three alignments, using a different scoring weight for each method.

Step 3: Take the positions with the highest score as aligned points of the final target-template alignment (e.g., the highest score is 9 in the 1st cycle in (b)).

Step 4: Identify the unfeasible positions (4 and 2 in (b)).

Step 5: Set the scores of the unfeasible positions and of the aligned points to 0.

Step 6: Repeat Steps 3 to 5 until all entries are 0.

Step 7: Output the path through the aligned points as the target-template alignment.

[Figure legend: aligned path of PSI-BLAST; aligned path of T-Coffee; aligned path of IMPALA; final aligned path]

[Figure legend, panel (b): 9: aligned in 1st cycle; 7: aligned in 2nd cycle; 5: aligned in 3rd cycle; 3: aligned in 4th cycle; 4 and 2: unfeasible solutions]
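The consensus alignment steps above can be sketched as follows. Several details are assumptions: each input alignment is given as a list of (target index, template index) pairs; the weights 4, 3, 2 are hypothetical, chosen only because their sums can produce the scores 9, 7, 5, 3, 4, 2 shown in the legend; a position is treated as unfeasible when it would break the sequential order relative to an already chosen aligned point; and ties are broken by taking the first maximum, one point per cycle.

```python
def consensus_alignment(alignments, weights, n_target, n_template):
    """Merge several pairwise alignments into one order-preserving path."""
    # Steps 1-2: start from zeros, then add each method's weight per position.
    score = [[0] * n_template for _ in range(n_target)]
    for pairs, weight in zip(alignments, weights):
        for i, j in pairs:
            score[i][j] += weight

    chosen = []
    while True:
        # Step 3: find the highest remaining score.
        best, bi, bj = 0, -1, -1
        for i in range(n_target):
            for j in range(n_template):
                if score[i][j] > best:
                    best, bi, bj = score[i][j], i, j
        if best == 0:  # Step 6: stop once all entries are 0.
            break
        chosen.append((bi, bj))
        # Steps 4-5: zero the aligned point and every position that cannot
        # coexist with it (not strictly before or strictly after in both).
        for i in range(n_target):
            for j in range(n_template):
                feasible = (i < bi and j < bj) or (i > bi and j > bj)
                if not feasible:
                    score[i][j] = 0
    # Step 7: output the aligned points in sequence order.
    return sorted(chosen)


# Three hypothetical alignments of a 4-residue target to a 4-residue template.
method_1 = [(0, 0), (1, 1), (2, 2), (3, 3)]
method_2 = [(0, 0), (1, 1), (2, 3)]
method_3 = [(0, 0), (1, 2), (3, 3)]
path = consensus_alignment([method_1, method_2, method_3], [4, 3, 2], 4, 4)
print(path)
```

In this toy input, positions supported by more methods are fixed first, and weaker conflicting positions such as (1, 2) and (2, 3) are discarded as unfeasible.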

http://predictioncenter.org/

CASP3 servers registered:

1. 3D-PSSM (Sternberg)
2. Karplus
3. frsvr (Fischer)
4. pscan (Elofsson)
5. BASIC (Godzik)
6. GenTHREADER
7. Valentina di Francesco
8. TOPITS (Rost)
9. Bork

CASP8 servers registered:

Model Evaluation

• Performance evaluation
– The 47 CM targets of CASP6 are used to compare performance with the other groups.

• GDT_TS score

$$\mathrm{GDT\_TS}\ (\%) = \frac{100}{4} \sum_{d \in \{1, 2, 4, 8\}} \frac{\mathrm{GDT}_d}{N}$$

– N is the total number of residues of the target (native structure)
– GDT_d is the number of aligned residues whose Cα-atom distance between the target and the predicted model is less than d
– d is 1, 2, 4, or 8 Å
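The GDT_TS definition on this slide can be computed directly once per-residue distances are available. The sketch below assumes the model has already been superimposed on the native structure and that the Cα-Cα distances of the aligned residues are passed in as a list; the superposition search that real GDT programs perform is not included, and the example distances are made up.

```python
def gdt_ts(distances, n_target, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """GDT_TS in percent from Cα distances (Å) of aligned residues.

    distances: one distance per aligned residue, after superposition.
    n_target:  total number of residues N in the target (native structure).
    """
    # GDT_d for each cutoff d: residues closer than d (strictly less).
    counts = [sum(1 for dist in distances if dist < d) for d in cutoffs]
    return 100.0 * sum(counts) / (len(cutoffs) * n_target)


# Hypothetical distances for a 5-residue target, all residues aligned.
example = [0.5, 1.5, 3.0, 6.0, 10.0]
print(gdt_ts(example, n_target=5))
```

Because N counts all target residues, residues left unaligned lower the score even when the aligned part is accurate.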

T0264 (1wde)

[Figure panels: native structure; PSI-BLAST model, GDT_TS = 64.97; IMPALA model, GDT_TS = 63.32; T-Coffee model, GDT_TS = 65.14; (PS)2 model, GDT_TS = 67.22. One further value, GDT_TS = 66.00, appears without a recoverable label. Panel annotations give the residue ranges 6-294 and 10-272 and aligned rates of 100 % and 91.00 %.]

Figure 3. Comparison of (PS)2 with PSI-BLAST, IMPALA, and T-Coffee in prediction accuracy (global / local GDT_TS scores) on target T0264.

Figure 4. Comparison of (PS)2 models with all automated servers in CASP6.

[Figure: GDT_TS score (%), 0 to 100, per target. Targets: T0196, T0199_1, T0200, T0204, T0205, T0208, T0211, T0222_1, T0223_1, T0226, T0226_1, T0229, T0229_1, T0229_2, T0231, T0233, T0233_1, T0233_2, T0234, T0235_1, T0240, T0246, T0247, T0247_1, T0247_2, T0247_3, T0264, T0264_1, T0264_2, T0266, T0267, T0268, T0268_1, T0268_2, T0269, T0269_1, T0269_2, T0271, T0274, T0275, T0276, T0277, T0279, T0279_1, T0279_2, T0280_1, T0282]

Table 1. Comparison with the other groups in CASP6

Group            (PS)2   RBTA    ESYP    3DJR    MGTH    3DJS    PROS    PMO5    PRCM    PCO5    PCOB
Average GDT_TS   65.89   64.92   63.14   62.54   61.27   61.08   58.11   57.93   57.62   56.37   37.57

• Cases
– T0269, template 1prxA, (PS)2 model, GDT_TS: 85.76
– T0269, template 1qq2A, ESYP model, GDT_TS: 78.48

http://ps2.life.nctu.edu.tw
