/113© Burkhard Rost
title: Computational Biology 1 - Protein structure: Intro: 3D comparisons
short title: cb1_intro_3dcompare
lecture: Protein Prediction 1 - Protein structure Computational Biology 1 - TUM Summer 2015
Protein Prediction - Part 1: Structure
1 Introduction (contd.)
Notation: protein structure 1D, 2D, 3D
[Figure: three panels labelled 1D, 2D, 3D. 1D: sequence (PQITLWQRPLVTIKIGGQLKEALLDTGADDTVL) with per-residue annotation (E); 2D: residue-by-residue matrix, residues 1-90, scale 0 to -5 kcal/mol; 3D: structure]
Doyle et al (1998) Science 280:69-77 - The structure of the potassium channel
Comparing 3D structures
Blue and red similar?
Doyle et al. (1998) Science 280:69-77 - The structure of the potassium channel: molecular basis of K+ conduction and selectivity
Similarity now clearer?
Doyle et al. (1998) Science 280:69-77 - The structure of the potassium channel: molecular basis of K+ conduction and selectivity
3D comparisons: how to?
Matching shapes
© http://www.whattoexpect.com/toddler/photo-gallery/best-toys-for-toddlers.aspx#/slide-3
How to match?
?
Differences for corresponding points
Difference = d1 + d2 + ... + d8 = |r1a - r1b| + ... + |r8a - r8b|
RMSD (root mean square deviation) = SQRT[ ((r1a - r1b)^2 + ... + (r8a - r8b)^2) / 8 ]
\mathrm{RMSD}(a,b) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(r_{ia} - r_{ib}\right)^{2}}
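A minimal sketch of the RMSD over corresponding points, assuming both shapes are given as equally long lists of coordinates in the same order. The function name `rmsd` and the example shapes are hypothetical, chosen only for illustration.

```python
import math

def rmsd(points_a, points_b):
    """Root mean square deviation between two equally long lists of points."""
    if len(points_a) != len(points_b) or not points_a:
        raise ValueError("need two non-empty point lists of equal length")
    total = 0.0
    for p, q in zip(points_a, points_b):
        # squared distance between the i-th point of a and the i-th point of b
        total += sum((x - y) ** 2 for x, y in zip(p, q))
    return math.sqrt(total / len(points_a))

# Hypothetical example: shape b is shape a shifted by 3 along x and 4 along y.
shape_a = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
shape_b = [(x + 3.0, y + 4.0) for x, y in shape_a]
print(rmsd(shape_a, shape_b))  # every point is displaced by 5, so this prints 5.0
```

Without prior superposition, a pure shift already gives a large RMSD although the shapes are identical.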
Actual algorithm: order inverted
1st: find corresponding points
2nd: superimpose
\mathrm{RMSD}(a,b) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(r_{ia} - r_{ib}\right)^{2}}
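The two steps can be sketched for 2D shapes, where the best rotation has a closed form. This is only an illustration under stated assumptions: correspondences are already known (same order in both lists), only translation and rotation are allowed, and the names `centered` and `superimposed_rmsd` are hypothetical. For 3D coordinates one would typically use the Kabsch algorithm instead.

```python
import math

def centered(points):
    """Shift 2D points so that their centroid lies at the origin."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return [(x - cx, y - cy) for x, y in points]

def superimposed_rmsd(points_a, points_b):
    """RMSD after the best rigid 2D fit (translation + rotation) of a onto b."""
    a, b = centered(points_a), centered(points_b)
    dot = sum(ax * bx + ay * by for (ax, ay), (bx, by) in zip(a, b))
    cross = sum(ax * by - ay * bx for (ax, ay), (bx, by) in zip(a, b))
    theta = math.atan2(cross, dot)  # rotation angle that minimises the RMSD
    c, s = math.cos(theta), math.sin(theta)
    rotated = [(c * x - s * y, s * x + c * y) for x, y in a]
    sq = sum((x - bx) ** 2 + (y - by) ** 2 for (x, y), (bx, by) in zip(rotated, b))
    return math.sqrt(sq / len(a))

# Hypothetical example: b is a rotated by 90 degrees and shifted by (5, 5).
shape_a = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0)]
shape_b = [(5.0 - y, 5.0 + x) for x, y in shape_a]
print(round(superimposed_rmsd(shape_a, shape_b), 6))  # 0.0
```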
fit now?
Scaling easy for simple shapes
x^2+y^2=r^2
Proteins: points are defined -> no scaling
Doyle et al (1998) Science 280:69-77 - The structure of the potassium channel
Global vs. local comparisons
global solution 1:
global solution 2:
cut into “units”
Trouble: where to stop?
Valid “unit” for comparison?
How to decide what is a valid unit?
Valid “unit” for comparison?
Decision upon validity
Valid or not?
Scientifically significant: some expert says
Statistically significant: signal compared with background
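One common way to compare a signal against a background is a z-score. The sketch below is an assumption about how such a test could look, not the procedure used on the slide; the background RMSD values and the name `z_score` are hypothetical.

```python
import statistics

def z_score(observed, background):
    """Distance of the observed score from the background mean, in background standard deviations."""
    mean = statistics.mean(background)
    sd = statistics.stdev(background)
    return (observed - mean) / sd

# Hypothetical RMSD values (in Angstrom) for randomly paired, unrelated units.
background_rmsd = [8.0, 9.0, 10.0, 11.0, 12.0]
print(round(z_score(3.0, background_rmsd), 2))  # -4.43: far below the background
```

A strongly negative z-score here means the observed RMSD is much lower than what unrelated units typically give.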
Cut, match, compare by RMSD
\mathrm{RMSD}(a,b) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(r_{ia} - r_{ib}\right)^{2}}
Only Cartesian RMSD comparison?
2D: difference matrix
[Figure: matrix of pairwise differences between points 1-8, rows and columns labelled 1 to 8]
Comparison 2D: differences of differences
Total of 8 x 8 differences
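A sketch of the "differences of differences" idea, assuming that each shape is first turned into its own matrix of pairwise distances and that the two matrices are then compared entry by entry. The names `distance_matrix` and `distance_matrix_rmsd` are hypothetical.

```python
import math

def distance_matrix(points):
    """All pairwise distances within one shape (the 2D representation)."""
    return [[math.dist(p, q) for q in points] for p in points]

def distance_matrix_rmsd(points_a, points_b):
    """Root mean square difference between the two N x N distance matrices."""
    da, db = distance_matrix(points_a), distance_matrix(points_b)
    n = len(points_a)
    sq = sum((da[i][j] - db[i][j]) ** 2 for i in range(n) for j in range(n))
    return math.sqrt(sq / (n * n))

# Hypothetical example: b is a rotated by 90 degrees and shifted by (5, 5).
shape_a = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
shape_b = [(5.0 - y, 5.0 + x) for x, y in shape_a]
print(round(distance_matrix_rmsd(shape_a, shape_b), 6))  # 0.0, no superposition needed
```

Because internal distances do not change under rotation or translation, no superposition step is required. The price is that a shape and its mirror image give the same matrix.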
3D comparisons: biology
Structure alignment
Slides taken from Patrice Koehl, UC Davis
Structure alignment: two steps
1. Identify equivalent positions (residues that match in 3D)
2. Find superposition independent of domain movements
Patrice Koehl, UC Davis: http://nook.cs.ucdavis.edu:8080/~koehl/Classes/CSB/lecture6.pdf
© Patrice Koehl, UC Davis
Root mean square displacement (rmsd)
Step 1: find corresponding points in proteins A and B
d(i) are the distances between all corresponding points (typically: Calpha, all atoms)
\mathrm{rmsd}(A,B) = \sqrt{\frac{1}{N}\sum_{i=1}^{N} d_i^{2}}
RMSD is not a metric
A similar to B and B similar to C does NOT imply A similar to C
cRMSD = 2.8 Å = 0.28 nm
cRMSD = 2.85 Å = 0.285 nm
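A toy illustration of why "similar" defined through a cutoff need not carry over from A-B and B-C to A-C. The structures are replaced by hypothetical positions on a line, and the cutoff of 3.0 is an arbitrary assumption; the two distances 2.8 and 2.85 are borrowed from the slide only as convenient numbers.

```python
def similar(x, y, cutoff=3.0):
    """Call two objects similar if their distance is at most the cutoff."""
    return abs(x - y) <= cutoff

# Toy 1D stand-ins for structures A, B and C (hypothetical positions on a line).
a, b, c = 0.0, 2.8, 5.65
print(similar(a, b), similar(b, c), similar(a, c))  # True True False
```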
SSAP 3D alignment
Taylor & Orengo
SSAP concept
S_{ik} = \frac{a}{\sum_{m=-n}^{+n} \left| d^{A}_{i,i+m} - d^{B}_{k,k+m} \right| + b}
WR Taylor & CA Orengo (1989) Protein structure alignment JMB 208:1-22
Problem: loss of information about direction
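A sketch of an SSAP-like residue score S_ik, assuming d is the plain distance between residue i and residue i+m within the same protein. The values of `a`, `b` and `n`, the handling of chain ends, and the coordinates are all hypothetical. The example shows the stated problem: a chain and its mirror image receive the maximal score, because distances carry no direction.

```python
import math

def ssap_like_score(coords_a, i, coords_b, k, n=2, a=10.0, b=1.0):
    """S_ik = a / (sum over m = -n..n of |dA(i, i+m) - dB(k, k+m)| + b)."""
    total = 0.0
    for m in range(-n, n + 1):
        ia, kb = i + m, k + m
        # positions beyond either chain end are skipped (an assumption of this sketch)
        if 0 <= ia < len(coords_a) and 0 <= kb < len(coords_b):
            d_a = math.dist(coords_a[i], coords_a[ia])
            d_b = math.dist(coords_b[k], coords_b[kb])
            total += abs(d_a - d_b)
    return a / (total + b)

# Hypothetical C-alpha trace and its mirror image (z flipped).
chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.0, 1.0), (8.0, 4.0, 3.0), (9.0, 7.0, 2.0)]
mirrored = [(x, y, -z) for x, y, z in chain]
print(ssap_like_score(chain, 2, mirrored, 2))  # 10.0, the maximum a / b
```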
DALI 3D alignment
Holm & Sander
Structural alignment: DALI
Distance matrix ALIgnment algorithm: Monte Carlo on all-against-all for hexapeptides (5)
L Holm & C Sander (1993) JMB 233:123-38: Fig. 1
Vorolign 3D alignment
Birzele & Zimmer
Structural alignment: VOROLIGN
Dynamic programming on Voronoi environments
F Birzele, JE Gewehr, G Csaba & R Zimmer (2006) Bioinformatics 23:e205-11: Fig. 2
3D comparisons: others
Two forms of calcium-bound Calmodulin:
Ligand free
Complexed with trifluoperazine
Patrice Koehl, UC Davis: http://nook.cs.ucdavis.edu:8080/~koehl/Classes/CSB/lecture6.pdf
© Patrice Koehl, UC Davis
Global alignment: RMSD = 15 Å / 143 residues
Local alignment: RMSD = 0.9 Å / 62 residues
Patrice Koehl, UC Davis: http://nook.cs.ucdavis.edu:8080/~koehl/Classes/CSB/lecture6.pdf
© Patrice Koehl, UC Davis
Structure alignment methods
SSAP: WR Taylor & CA Orengo 1989 JMB 208:1-22
DALI: L Holm & C Sander 1993 JMB 233:123-38
STRUCTAL: S Subbiah, DV Laurents & M Levitt 1993 Curr Biol 3:141-8
CE: IN Shindyalov & P Bourne 1998 Prot Engng 11:739-47
VAST: JF Gibrat, T Madej & SH Bryant 1996 Curr Opin Struct Biol 6:377-85
LSQMAN: GJ Kleywegt 1996 Acta Crystallogr. D 52:842-57
SSM: E Krissinel & K Henrick 2004 Acta Crystallogr. D 60:2256-68
SKAN: A Yan, D Petrey & B Honig, unpublished
…
Comparison of structure alignments
R Kolodny, P Koehl & M Levitt (2004) Comprehensive Evaluation of Protein Structure Alignment Methods: Scoring by Geometric Measures. JMB 346:1173-88
Rachel Kolodny, Univ of Haifa
Patrice Koehl, UC Davis
Michael Levitt, Stanford Univ
How to assess 3D comparisons? standard-of-truth?
Comparison of structure alignments
dashed lines: original method; solid lines: SAS measure
R Kolodny, P Koehl & M Levitt (2004) JMB 346:1173-88 (Fig. 1A)
Best-of-All
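A short sketch of the SAS measure, assuming the definition SAS = 100 x RMSD / number of aligned residues (lower is better). The function name `sas` is hypothetical; the two inputs are the calmodulin numbers quoted earlier in this lecture.

```python
def sas(rmsd, n_aligned):
    """Structural Alignment Score: RMSD scaled to 100 aligned residues (lower is better)."""
    return rmsd * 100.0 / n_aligned

print(round(sas(15.0, 143), 2))  # global calmodulin alignment: 10.49
print(round(sas(0.9, 62), 2))    # local calmodulin alignment: 1.45
```

The measure rewards alignments that are both accurate and long, so a short but very precise local alignment can still score better than a long, poor global one.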
3D comparisons: protein space and databases
Structural universe
B Rost 1998 Structure 6:259-263
Unit of the universe: a domain
3D modules
Multiple 3D alignment identifies consensus secondary structure
Guessing domains from sequence
[Figure: protein A, protein B, protein C, protein D, protein E, protein F; regions marked domain 1 and domain 2]
Evolution of pieces
© Andrei Lupas MPI Tuebingen
Structure evolves without leaps?
© Nick V Grishin HHMI + Univ Dallas
Fig. 1: NV Grishin 2001 NAR 29:638-43
Structural universe: no islands, really
B Rost 1998 Structure 6:259-263
3D classifications: goals
Similar 3D -> similar function
Learn from 3D about function
Learn about evolution
Classify
3D modules
Multiple 3D alignment identifies consensus secondary structure
© Christine Orengo
Fold of a protein
Some structures are observed more often than others
Limited number of shapes?
Fold remains an assumption (that increasingly seems to be proven inappropriate)
Multiple 3D alignment identifies consensus secondary structure
© Christine Orengo
Protein structure comparisons
1bww, 3sdh, 1xne
All-alpha, All-beta, Alpha-Beta
How to recognize the similarity?
3D classification databases
SCOP: http://scop.mrc-lmb.cam.ac.uk/scop/ [A Murzin et al. (1995) JMB 247, 536-540]
CATH: http://www.cathdb.info/ [AL Cuff et al. (2009) NAR 37, D310-314; CA Orengo et al. (1997) Structure 5, 1093-1108]
COPS - QSCOP - TopMatch: http://cops.services.came.sbg.ac.at [SJ Suhrer et al. (2009) NAR 37, W539-W44]
Classify protein structure: SCOP
SCOP (hierarchy): http://scop.mrc-lmb.cam.ac.uk/scop/ [Murzin et al. (1995) J. Mol. Biol. 247, 536-540]
SCOP hierarchy
Structure similarity increases (from class downward)
Level: Example
class: a {All-alpha}
SCOP classes
alpha
beta
alpha and beta (a/b - interspersed)
alpha plus beta (a+b - segregated)
multidomain proteins
membrane and cell-surface proteins
small proteins
coiled coil proteins
low-resolution protein structures
peptides
designed proteins
SCOP class
TIM beta/alpha barrel (1sw0-TIM); NAD(P)-binding Rossmann-fold domains
CLASS = alpha and beta (a/b)
SCOP hierarchy
Structure similarity increases (from class downward)
Level: Example
class: a {All-alpha}
fold: a.1 {Globin-like}
SCOP fold definition
same major secondary structures
• in the same arrangement
• with the same topological connections
peripheral elements may differ
• up to 50% peripheral
• turns and secondary structure elements
evolutionary relationship unclear
SCOP fold
CLASS = alpha and beta (a/b)
FOLD = TIM beta/alpha-barrel
SCOP hierarchy
Structure similarity increases (from class downward)
Level: Example
class: a {All-alpha}
fold: a.1 {Globin-like}
superfamily: a.1.2 {alpha-helical ferredoxin}
SCOP superfamily definition
probable common evolutionary origin
low similarities, but
• share the same fold
• have similar functions
SCOP hierarchy
TRIOSEPHOSPHATE ISOMERASE (1sw0)
QUINOLINIC ACID PHOSPHORIBOSYLTRANSFERASE (1qap)
PHOSPHATE ALDOLASE (1p1x)
SCOP hierarchy
Structure similarity increases (from class downward)
Level: Example
class: c {Alpha and beta a/b}
fold: c.1 {TIM beta/alpha-barrel}
superfamily: c.1.1 {Triosephosphate isomerase}
family (sequence based): c.1.1.1 {Triosephosphate isomerase}
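The dotted SCOP identifier encodes the hierarchy directly: each additional field descends one level. A minimal sketch, assuming identifiers of the form shown on the slide (for example c.1.1.1); the names `SCOP_LEVELS` and `parse_sccs` are hypothetical.

```python
SCOP_LEVELS = ("class", "fold", "superfamily", "family")

def parse_sccs(sccs):
    """Split a SCOP identifier such as 'c.1.1.1' into its cumulative hierarchy levels."""
    parts = sccs.split(".")
    if not 1 <= len(parts) <= len(SCOP_LEVELS):
        raise ValueError(f"unexpected SCOP identifier: {sccs!r}")
    # level i is identified by the first i+1 fields joined with dots
    return {level: ".".join(parts[: i + 1]) for i, level in enumerate(SCOP_LEVELS[: len(parts)])}

print(parse_sccs("c.1.1.1"))
# {'class': 'c', 'fold': 'c.1', 'superfamily': 'c.1.1', 'family': 'c.1.1.1'}
```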
SCOP family definition
clear evolutionary relationship
Sequence identity often > 30%, but not necessarily, e.g. globins: < 15% sequence identity for some members
Classify protein structure: CATH
CATH: http://www.cathdb.info/ [AL Cuff et al. (2009) NAR 37, D310-314; CA Orengo et al. (1997) Structure 5, 1093-1108]
Universe of protein structures
Christine Orengo et al. 1997 Structure 5:1093-1108
CATH
Class, Architecture, Topology, Homology
CATH
Class: mostly alpha, mostly beta, mixed alpha/beta, few regular secondary structure
Architecture: classification according to overall shape, ignoring connectivity
Topology: fold groups = shape & connectivity
Homology: evolutionarily related superfamily
CATH stats (2011/05)
folds: 1,282
superfamilies: 2,549
sequence families: 11,330
domains: 24,232
CATH: steps involved
Find domain
Multiple 3D alignment identifies consensus secondary structure
© Christine Orengo
CATH: steps involved
Find domains
• ab initio: consensus of three methods:
  DETECTIVE: hydrophobic interior
  PUU: likely separation motion
  DOMAK: count internal and external contacts
  Problem: only 20% consistent!
• Based on prior knowledge: CATHEDRAL
  GT: secondary structure matching
  DDP: structural alignment
Redfern, O.C. et al. (2007) PLoS Comp. Biol., 3, e232
CATH: steps involved
Find domain
From domain to superfamily
PDB id: 1gcq (SH3 domains)
PDB id: 1gcqA0 (SH3 domain)
http://www.cathdb.info/domain/1gcqA00
Ⓒ CATH tutorial (www.cathdb.info)
CATH vs SCOP
at 80% residue domain overlap: 70% of proteins in PDB have similar domains
G Csaba et al. (2009) Systematic comparison of SCOP and CATH: a new gold standard for protein structure analysis. BMC Struct Biol 9:23.
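A sketch of how a residue overlap between two domain definitions could be computed. The exact overlap definition of Csaba et al. is not given on the slide, so the formula here (shared residues divided by the size of the larger domain) is an assumption, as are the residue ranges and the name `residue_overlap`.

```python
def residue_overlap(domain_a, domain_b):
    """Fraction of shared residues relative to the larger of the two domains."""
    a, b = set(domain_a), set(domain_b)
    return len(a & b) / max(len(a), len(b))

scop_like = range(1, 101)   # hypothetical domain: residues 1-100
cath_like = range(11, 111)  # hypothetical domain: residues 11-110
print(residue_overlap(scop_like, cath_like))  # 0.9, so "similar" at an 80% cutoff
```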
CATH: 50 structures - 1 superfamily
Ⓒ CATH
superfamily 3.40.640.10
Type I PLP-dependent aspartate aminotransferase-like (Major domain)
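The four numbers of a CATH code follow the levels Class, Architecture, Topology, Homology, so two domains can be compared by how deep their codes agree. A minimal sketch; the function name `deepest_shared_level` is hypothetical, and the second code (3.40.50.720) is picked only for illustration.

```python
CATH_LEVELS = ("Class", "Architecture", "Topology", "Homology")

def deepest_shared_level(code_a, code_b):
    """Return the deepest CATH level at which two codes still agree, or None."""
    shared = None
    for level, x, y in zip(CATH_LEVELS, code_a.split("."), code_b.split(".")):
        if x != y:
            break
        shared = level
    return shared

print(deepest_shared_level("3.40.640.10", "3.40.50.720"))  # Architecture
```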
[Figure: percentage of all domain sequences (y-axis) against rank by family size (x-axis)]
Structural families (i.e. one or more solved structures CATH/SCOP)
BIG families (currently Pfam)
Remaining families (new BIG)
Ⓒ Christine Orengo
CATH: Christine Orengo & Janet Thornton
Classify protein structure: COPS/QSCOP/TopMatch
COPS hierarchy
COPS = Classification Of Protein Structures
Based on quantified structural comparison
2007: additional info for SCOP domains: qSCOP
2009: workbench based on PDB chains: TopSearch http://topsearch.services.came.sbg.ac.at/
SJ Suhrer et al. & M Sippl (2009) COPS--a novel workbench for explorations in fold space. Nucl Acids Res 37:W539-44.
COPS metric
Triangle inequality/transitivity:
A≈B (A similar to B), B≈C (B similar to C)
⇏ A≈C (does not imply: A similar to C)
protein A, protein B, protein C
PDB updates 2008/08/19-2009/04/14
SJ Suhrer et al. (2009) NAR 37:W539-W544
novelty:
PDB diversity in light of COPS
SJ Suhrer et al. (2009) NAR 37:W539-W544
COPS domain parsing
SJ Suhrer et al. (2009) NAR 37:W539-W544
Apaf-1 PDB id 1z6t
COPS:
c1z6tA1 (CARD domain) - c2a5yB1
c1z6tA2 (α/β domain) - c2a5yB
c1z6tA3 (helical domain I) - c2a5yB3
c1z6tA4 (winged-helix domain) - c2a5yB4
PDB id 1z6t-A with 2a5y-B
COPS <-> TopSearch
No domain decomposition
But:
• Complete structure comparisons
• Biological units
• New metric
MJ Sippl & M Wiederstein (2012) Detection of spatial correlations in protein structures and molecular complexes. Structure 20:718-28
3D classification databases
SCOP: http://scop.mrc-lmb.cam.ac.uk/scop/ A Murzin et al. 1995 JMB 247:536-540 (Alexey Murzin et al., Cambridge UK)
CATH: http://www.cathdb.info/ AL Cuff et al. 2009 NAR 37:D310-314; CA Orengo et al. 1997 Structure 5:1093-1108 (Christine Orengo & Janet Thornton et al., UCL UK)
COPS: http://cops.services.came.sbg.ac.at SJ Suhrer et al. 2009 NAR 37:W539-W44 (Manfred Sippl et al., Salzburg)