Upload
berniece-harmon
View
213
Download
0
Tags:
Embed Size (px)
Citation preview
Homology Modeling
Lu Chih-Hao
1
Why study protein structure?
• Proteins play crucial functional roles in all biological processes: enzymatic catalysis, signaling messengers …
• Function depends on 3D structure.
• Easy to obtain protein sequences, difficult to determine structure.
2
Where find the data?
• Protein Data Bank (PDB)– http://www.rcsb.org/pdb/– > ~65,500 structures of proteins
• Text file contain: coordinates for each heavy atom from the first residue to the last
X Y Z
3
PDB Statistics
4
TIM barrel
5
How to determine the protein structure?
• By experimentation– X-Ray– NMR (nuclear magnetic resonance spectroscopy)
• Sequence-Structure gap
6
Protein Structure Prediction
• The primary sequence already contain all the information necessary to define 3D structure.
• The 3D protein structure can be predicted according to three main categories of methods (Rost & O’Donoghue, 1997): (1) homology modeling; (2) fold recognition (threading); (3) ab initio techniques.
• Homology modeling is currently the most accurate method to predict protein 3D structure (Tramontano, 1998).
7
Protein Structure Prediction
Sequence
Sequence HomologyTo known fold
HomologyModeling
>30%
Threading
Match Found?
Ab initio
No
Model
Yes
<30%
8
.
0
20
40
60
80
100
0 50 100 150 200 250
identity
Number of residues aligned
Perc
enta
ge s
equence
identi
ty/s
imila
rity
(B.Rost, Columbia, NewYork)
Sequence identity implies structural similarity
Sequence similarity implies structural similarity?
Safe zone
9
Homology Modeling
• Basis– Structure is much more conserved than sequence
during evolution
• Limited applicability– A large number of proteins and ORFs have no
similarity to proteins with known structure
10
??
KQFTKCELSQNLYDIDGYGRIALPELICTMFHTSGYDTQAIVENDESTEYGLFQISNALWCKSSQSPQSRNICDITCDKFLDDDITDDIMCAKKILDIKGIDYWIAHKALCTEKLEQWLCEKE
Use as template
8lyz1alc
KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDVQAWIRGCRLShare Similar
Sequence
Homologous
What is Homology Modeling?
Target Template
11
Structure prediction by homology modeling
12
Step 1
Step 2
Step 3
Step 4
Homology detection and template selection
• Homology detection– To detect the fold of a probe sequence from a library
of known target fold.
• The three type of sequence based methods:– Pair-wise sequence-sequence comparison
• FASTA, BLAST
– Sequence profile comparison• PSI-BLAST, IMPALA, HMMER, SAM
– Profile-profile comparison• prof_sim, COMPASS
13
Q T
Sequence-Sequence comparison
BLAST, FASTA, SSEARCH
14
Q T
Profile-Sequence comparison
PSI-BLAST15
PSI-BLAST Overview
16
Q T
Sequence-Profile comparison
RPS-BLAST, IMPALA, HMMER, SAM
17
Q T
Profile-Profile comparison
prof_sim, COMPASS18
Method_11lmb3 <-> 1pou shift = 9.34 σ = 39.62LEDARRLKAIYEKKKNELGLSQESVADKMGMGQSGVGALFNGINALNAYNAALLAKILKVSVEEFSPSIAREIYEMYEAHHHHHHHHHHHHHHHHHCCCChhhhhhhhccchhhhhhhhccccccchhhhhhhhhhhccchhhcchhhhhhhhhhhhh||||||||||||||||||||| ++++++++ + ++++++++++++ ++++++++000000000000000000000 99999999 X XXXXXXXXXXXX XXXXXXXXHHHHHHHHHHHHHHHHHHCCC---------cchhhhhhhhhcccccc---chhhhhhhcccccccchhhhhhhhhhhhhLEELEQFAKTFKQRRIKLGFT---------QGDVGLAMGKLYGNDFS---QTTISRFEALNLSFKNMCKLKPLLEKWLN
Method_21lmb3 <-> 1pou Shift = 0.67 σ = 60.78LEDARRLKAIYEKKKNELGLS----QESVADKMG--MGQSGVGALFN-GINALNAYNAALLAKILKVSVEEFSHHHHHHHHHHHHHHHHHCCCC----hhhhhhhhc--cCHHHHHHHHC-cccccchhhhhhhhhhhccchhhcc||||||||||||||||||||| ---- |||||||||| -- ++++++++ ++ 000000000000000000000 4444 0000000000 11 11111111 44 HHHHHHHHHHHHHHHHHHCCCcchhhhhhhhhcccccCCHHHHHHHCccccccchhhhhhhhhhh---hhhccLEELEQFAKTFKQRRIKLGFTQGDVGLAMGKLYGNDFSQTTISRFEALNLSFKNMCKLKPLLEKW---LNDAE
The importance of the sequence alignment
SCR; structure conserved region SVR; structure variable region
19
Backbone generation
• Rigid-body assembly– Building model core
20
21
Construction of loops might be done by:
Wedemeyer,ScheragaJ. Comput. Chem.20, 819-844(1999)
Ab initio methods - without any prior knowledge. This is done by empirical scoring functions that check large number of conformations and evaluates each of them.
22
data clustereddata
library
Construction of loops might be done by:
Using database of loops which appear in known structures. The loops could be categorized by their length or sequence
23
Scan database and search protein fragments with correct number of residuesand correct end-to-end distances
24
25
26
Loop length
cRM
S (Ǻ
)
Method breaksdown for loopslarger than 9
Loop Modeling: A database approach
27
GDT_TS = 45.96 GDT_TS = 60.48
Predicted model with long loop Without loop
Target: 2bj7A
28
29
Errors in Homology Modeling
a) Side chain packing b)Distortions and shifts c) No template
Template ModelTrue structure30
Errors in Homology Modeling
d) Misalignments e) Incorrect template
(Marti-Renom et al., 2000)
Template ModelTrue structure31
PROCHECK, Verify3D, Prosa, Anolea, Bala …
32
PROCHECK
β
α http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html
33
Verify3D
• Verify3D analyzes the compatibility of an atomic model (3D) with its own amino acid sequence (1D).
Luethy et al., 1992 34
ProQ Server
• ProQ is a neural network-based predictor
– Structural features quality of a protein model.
Arne Elofssons group: http://www.sbc.su.se/~bjorn/ProQ/
Correct Good Very goodLGscore > 1.5 LGscore > 3 LGscore > 5MaxSub > 0.1 MaxSub > 0.5 MaxSub > 0.8
35
Modeling accuracy
(Marti-Renom et al., 2000)
36
Utility of Structural Information
37
38
39
(PS)2: protein structure prediction server
40
Consensus strategy
• The idea of consensus analysis is to gather predictions from a set of different methods.
• The performance of consensus methods is significantly higher than for individual methods.
3d-shotgun (Fischer D., 2003)3d-jury (Ginalski K et al., 2003)Pmodeller (Bjorn W et al., 2003)
41
Structure prediction by homology modeling
42
Step 1
Step 2
Step 3
Step 4
Figure 1. Overview of the protein structure prediction server, (PS)2.
Overview of the (PS)2 method
Step1: Template search/selection by the
consensus of PSI-BLAST and IMPALA
Step2: Target-template alignment by the consensus of T-Coffee, PSI-BLAST,
and IMPALA
Step3: Model building by MODELLER and structure evaluation and visualization
by CHIME and Raster3D
(d)
(a)
(b) (c)
43
: Aligned path of PSI-BLAST
: Aligned path of T-Coffee
: Aligned path of IMPALA
: Final aligned path
9: aligned in 1st cycle7: aligned in 2nd cycle5: aligned in 3rd cycle3: aligned in 4th cycle4 and 2: unfeasible solution
Input: target and template sequences
Output: target-template aligned sequences
Step 1: Initial all entries of the aligned matrix to 0. Align target and template sequences using PSI-BLAST, IMPALA, and T-Coffee.
Step 2: Sum aligned scores of these three alignments for each position with different scoring weights.
Step 3: Take the positions with the highest score as the aligned points to build the final target-template alignment. (e.g., the highest scoring is 9 for the 1st cycle in (b) )
Step 4: Identify the unfeasible positions. ( 4 and 2 in (b))
Step 5: Change the scores of unfeasible positions and the aligned points to 0.
Step 6: Repeatedly Steps 3 and 5 until all entries are 0.
Step 7: Output the path with the aligned points as the target-template alignment
(b)(a)
Alignment method
44
http://predictioncenter.org/
45
CASP3 servers registered:
1. 3D-PSSM (Sternberg) [email protected] 2. Karplus [email protected] 3. frsvr (Fischer) [email protected] 4. pscan (Eloffson) [email protected] 5. BASIC (Godzik) [email protected] 6. GenTHREADER [email protected] 7. Valentina di Francesco [email protected] 8. TOPITS (Rost) [email protected] 9. Bork
46
CASP8 servers registered:
47
}8 4, 2, 1,{(%)4
100_
dN
GDT
TSGDT d
d
- N is the total number residues of the target (native structure)- GDTd is the number of aligned residues whose Cα-atom distance
between the target and predicted model is less than d- d is 1, 2, 4, or 8 Å.
Model Evaluation
• Performance evaluation– Comparing the 47 CM targets to evaluate the
performance with the other groups in CASP6.
• GDT_TS Score
48
10 272
6 294 Native structure
PSI-BLAST modelGDT_TS = 64.97
272(PS)2 modelGDT_TS = 67.22
GDT_TS = 66.00
10
10 272 IMPALA modelGDT_TS = 63.32
294 T-Coffee modelGDT_TS = 65.14
T0264 (1wde)
Aligned rate: 91.00 %
Aligned rate: 91.00 %
Aligned rate: 100 %6
Aligned rate: 100 %6 294
Figure 3. Comparison (PS)2 with PSI-BLAST, IMPALA, and T-Coffee of the prediction accuracies (global / local GDT_TS scores) on target T0264.
49
Figure 4. Comparison of (PS)2 models with all automated servers in CASP6.
0
20
40
60
80
100
T0196
T0199_1
T0200
T0204
T0205
T0208
T0211
T0222_1
T0223_1
T0226
T0226_1
T0229
T0229_1
T0229_2
T0231
T0233
T0233_1
T0233_2
T0234
T0235_1
T0240
T0246
T0247
T0247_1
T0247_2
T0247_3
T0264
T0264_1
T0264_2
T0266
T0267
T0268
T0268_1
T0268_2
T0269
T0269_1
T0269_2
T0271
T0274
T0275
T0276
T0277
T0279
T0279_1
T0279_2
T0280_1
T0282
Targets
GD
T_T
S Sc
ore
(%)
50
51
Table 1. Compare with the other groups in CASP6
(PS)2 RBTA ESYP 3DJR MGTH 3DJS PROS PMO5 PRCM PCO5 PCOB
Average GDT_TS
65.89 64.92 63.14 62.54 61.27 61.08 58.11 57.93 57.62 56.37 37.57
• Cases
T0269, Template 1prxA(PS)2 model, GDT_TS: 85.76
T0269, Template 1qq2AESYP model, GDT_TS: 78.48
http://ps2.life.nctu.edu.tw
52
53
54