© Burkhard Rost
title: Membrane structure prediction 1
short title: cb1_tmh1
lecture: Computational Biology 1 - Protein structure (for Informatics) - TUM summer semester
Videos: YouTube / www.rostlab.org
THANKS:
EXERCISES:
Special lectures:
• 07/xx Predrag Radivojac - Indiana Univ.
• 06/xx Yana Bromberg - Rutgers Univ.
No lecture:
• 05/09 no lecture
• 05/23 Student assembly (SVV)
• 05/25 Ascension day
• 06/06 Whitsun holiday
• 06/15 Corpus Christi
LAST lecture: bef: Jul 11 after: Jul 28
Examen: WEDNESDAY(!!) July 12: 18:00-19:30 TBA
• Makeup: TBC: Oct 17 & 19, 2017 - lecture time
CONTACT: Lothar Richter [email protected]
Dmitrij Nechaev
Lothar Richter
Christian Dallago
Recap: 1D secondary structure prediction
Goal of structure prediction
Epstein & Anfinsen, 1961: sequence uniquely determines structure
• INPUT: sequence
• OUTPUT: 3D structure and function
Zones
[Figure: sequence similar -> structure similar. Three zones of sequence similarity: Daylight Zone, Twilight Zone, Midnight Zone. Comparison methods shown: sequence - sequence, sequence - profile, profile - profile.]
B Rost (1997) Fold Des 2:S19-24
B Rost (1999) Protein Eng 12:85-94
Structure (PDB id 4lpk): JM Ostrem et al. & KM Shokat (2013) Nature 503:548-51
Comparative modeling predicts 3D structure in silico
pretein seqwence
priteen peqwinse
Query
PDB
Predicts 3D structure for short regions in 50% of all known proteins
Secondary structure prediction: 1.+2. Generation
single residues (1. generation)
• Chou-Fasman, GOR 1957-70/80
• 50-55% accuracy
segments (2. generation)
• GORIII 1986-92
• 55-60% accuracy
problems
• < 100% may be: 65% max
• < 40% may be: strand non-local
• short segments
B Rost and C Sander (2000) Methods in Molecular Biology 143: 71-95
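Neural Network for secondary structure
[Figure: neural network reading a window of residues coded over the alphabet ACDEFGHIKLMNPQRSTVWY. with three output units H, E, L. Example window with observed states: D (L), R (E), Q (E), G (E), F (E), V (E), P (E), A (H), A (H), Y (H), V (E), K (E), K (E).]
B Rost (1996) Methods in Enzymology 266: 525-39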
Secondary structure predictions of 1. and 2. generation
single residues (1. generation)
• Chou-Fasman, GOR 1957-70/80
• 50-55% accuracy
segments (2. generation)
• GORIII 1986-92
• 55-60% accuracy
problems
• < 100% they said: 65% max
• < 40% they said: strand non-local
• short segments
PHDsec: structure-to-structure network
[Figure: structure-to-structure network with units H, E, L; example residues V (E), P (E), A (H).]
B Rost (1996) Methods Enzymol 266:525-39
Secondary structure predictions of 1. and 2. generation
single residues (1. generation)
• Chou-Fasman, GOR 1957-70/80
• 50-55% accuracy
segments (2. generation)
• GORIII 1986-92
• 55-60% accuracy
problems
• < 100% they said: 65% max
• < 40% problem was NOT locality but distribution
• short segments
STILL ONLY 60+ε% accuracy.
How to improve beyond that?
PHD: Neural network & evolutionary information
[Figure: the protein sequence (:GYIYDPEDGDPDDGVNPGTDF:) is shown with its alignments (:GYIYDPAVGDPDNGVEPGTEF:, :GYIYDPEVGDPTQNIPPGTKF:, :GYEYDPAEGDPDNGVKPGTSF:, :GYEYDPAEGDPDNGVKPGTAF:). The alignments are compiled into a profile table with columns GSAPD NTEKQ CVHIR LMYFW. The profile feeds the input layer, followed by the first or hidden layer (J1) and the second or output layer (J2) with units H, E, L; the maximal unit is picked as the current prediction. One input block corresponds to the 21*3 bits coding for the profile of one residue.]
B Rost & C Sander (1993) PNAS 90:7558-62
B Rost (1996) Methods Enzymol 266:525-39
PHDsec: more details
[Figure: two-level PHDsec architecture, each level with input layer, hidden layer and output layer.
• first level: sequence-to-structure (label 21+3)
• second level: structure-to-structure (label 4+1)
• output units H, L, E (example output values 0.5, 0.1, 0.4)
• input local in sequence: local alignment, 13 adjacent residues
• input global in sequence: global statistics over the whole protein
  - percentage of each amino acid in protein (%AA)
  - length of protein (≤60, ≤120, ≤240, >240)
  - distance: centre, N-term (≤40, ≤30, ≤20, ≤10)
  - distance: centre, C-term (≤40, ≤30, ≤20, ≤10)]
Alignment columns shown: AAA, AA., LLL, LII, AAG, CCS, GVV
Profile table:
A    C    L    I    G    S    V    ins  del  cons
100  0    0    0    0    0    0    0    0    1.17
100  0    0    0    0    0    0    33   0    0.42
0    0    100  0    0    0    0    0    33   0.92
0    0    33   66   0    0    0    0    0    0.74
66   0    0    0    33   0    0    0    0    1.17
0    66   0    0    0    33   0    0    0    0.74
0    0    0    33   0    0    66   0    0    0.48
B Rost (1996) Methods Enzymol 266:525-39
Does global information help?
With/without global information
Input                                   Q3
Only sliding window, i.e. only local    72 %
Local & global                          72 %
Protein structure comparisons
1bww 3sdh 1xne
All-alpha All-beta AlphaBeta
Predict protein class
B Rost (1996) Methods Enzymol 266:525-39
[Figure: the PHDsec network (in) with an added output (out) for protein class: All-alpha, All-beta, AlphaBeta.]
With/without global information
Input                                   Q3 (per-residue)   Q4 (per-protein)
Only sliding window, i.e. only local    72 %               70 %
Local & global                          72 %               75 %
1D: TM transmembrane helix prediction
Intro: membranes
What to put around a cell?
Roshni Nelson: Cell Membranes
© Roshni Nelson UT Southwestern Dallas http://www.roshninelson.com
http://utsouthwestern.edu/STARS - vimeo.com/31412291
Roshni Nelson: Phospholipids
© Roshni Nelson UT Southwestern Dallas http://www.roshninelson.com
http://utsouthwestern.edu/STARS - vimeo.com/31412291
Cellular compartments
[Figure: Prokaryotic Cell (bacillus type) and Eukaryotic Cell]
K Rogers (2011) Britannica
© Tatyana Goldberg (TUM Munich)
How to separate outside/inside?
Lipid bilayer
Wikipedia © http://en.wikipedia.org/wiki/Lipid_bilayer
Hydrophobic core of a protein
[Figure: hydrophobic residues (H) cluster in the protein core; charged residues (+ and -) sit on the surface, facing the solvent.]
Lipid bilayer: hydrophobic on the inside
© Wikipedia http://en.wikipedia.org/wiki/Lipid_bilayer
Lipid bilayer: hydrophobic on the inside, easy to pull around horizontally
© Wikipedia http://en.wikipedia.org/wiki/Lipid_bilayer
Lipid bilayer: hydrophobic on the inside, hard to enter
© Wikipedia http://en.wikipedia.org/wiki/Lipid_bilayer
Bacterial injection needles
Model of type VI secretion system (T6SS) in gram-negative bacteria
Marek Basler Biozentrum Basel
Borenstein DB, Ringel P, Basler M, Wingreen NS (2015) Established Microbial Colonies Can Survive Type VI Secretion Assault. PLoS Comput Biol 11(10): e1004520. doi:10.1371/
Shot through two membranes
Marek Basler
Biozentrum Basel
Marek Basler, BT Ho, JJ Mekalanos (2013) Cell 152:884-894
Tit-for-tat: type 6 secretion system counter-attack
Marek Basler
Biozentrum Basel
Localization for drug targets
TM Bakheet and AJ Doig (2008) Bioinformatics
Localization      Share of drug targets
Membrane          57 %
Cytoplasm         13 %
Extra-cellular    13 %
ER                7 %
Nucleus           3 %
Mito              2 %
Other             2 %
Microsome         2 %
Pero              1 %
Drug targets tend to be found in membranes, cytoplasm or are extra-cellular!
© Tatyana Goldberg (TUM Munich)
TMH (Transmembrane helix) background
1JB0 Cyanobacterial Photosystem I (Jordan P, Krauss N)
1E7P Fumarate Reductase (Lancaster CD, Michel H)
[Figure labels: periplasm, cytoplasm, Cytoplasm (stromal side), ?]
Membrane prediction
TM prediction wait for db growth ...
[Figure panels labeled 1993, 1996, 1999]
Topology for membrane helical proteins.
[Figure: proteins A, B and C span the lipid membrane bilayer. One side is extra-cytoplasmic (out, outside cytoplasm), the other intra-cytoplasmic (in, inside cytoplasm). The C-term of each protein is marked.]
TMH prediction
Membrane helices are helices, right?
[Figure: PHDsec two-level architecture (first level sequence-to-structure, second level structure-to-structure) with local alignment input and global statistics input; output units H, L, E.]
B Rost (1996) Methods Enzymol 266:525-39
PHDsec “success” on Poly-Valine
HEADER LIPOPROTEIN(SURFACE FILM)
COMPND PULMONARY SURFACTANT-ASSOCIATED POLYPEPTIDE C(SP-C)
SOURCE PIG (SUS SCROFA)
AUTHOR J.JOHANSSON,T.SZYPERSKI,T.CURSTEDT,K.WUTHRICH
AA      LRIPCCPVNLKRLLVVVVVVVLVVVVTVGALLMGL
OBS sec HHHHHHHHHHHHHHHHHHHHHHHHH
PHD sec EEEEEEEEEEEEEEEEEEEEEEE
Goes wrong because swap: outside/inside
[Figure: two proteins compared. Membrane protein in LIPID: hydrophobic residues (H) on the outside, facing the lipid. Non-membrane (globular water-soluble) protein in Water: hydrophobic residues (H) in the core, hydrophilic residues (L) on the outside. H = hydrophobic, L = hydrophilic.]
Hydrophobic core of a protein
[Figure repeated: hydrophobic residues (H) in the core, charged residues (+ and -) on the surface facing the solvent.]
Topology for membrane helical proteins.
[Figure repeated: proteins A, B and C in the lipid membrane bilayer, with extra-cytoplasmic (out) and intra-cytoplasmic (in) sides and C-termini marked.]
Hydrophobic side chains
Eisenberg hydrophobicity index
AA-3   AA-1   Eisenberg
Ile    I       1.38
Phe    F       1.19
Val    V       1.08
Leu    L       1.06
Trp    W       0.81
Met    M       0.64
Ala    A       0.62
Gly    G       0.48
Cys    C       0.29
Tyr    Y       0.26
Pro    P       0.12
Thr    T      -0.05
Ser    S      -0.18
His    H      -0.40
Glu    E      -0.74
Asn    N      -0.78
Gln    Q      -0.85
Asp    D      -0.90
Lys    K      -1.50
Arg    R      -2.53
David Eisenberg, UCLA © https://www.uclaaccess.ucla.edu/uploads/image/faculty/134.jpg
D Eisenberg et al. (1984) J Mol Biol 179:125-42
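A minimal sketch of how such an index can be turned into a per-residue hydrophobicity profile by averaging over a sliding window, and how long hydrophobic stretches can then be flagged. The scale values are the Eisenberg values listed above; the function names, the window length of 7, the threshold of 0.8 and the minimum segment length of 10 are illustrative assumptions, not parameters given in the lecture. The example sequence is the SP-C peptide from the Poly-Valine slides.

```python
# Eisenberg hydrophobicity index (values as listed on the slide).
EISENBERG = {
    "I": 1.38, "F": 1.19, "V": 1.08, "L": 1.06, "W": 0.81,
    "M": 0.64, "A": 0.62, "G": 0.48, "C": 0.29, "Y": 0.26,
    "P": 0.12, "T": -0.05, "S": -0.18, "H": -0.40, "E": -0.74,
    "N": -0.78, "Q": -0.85, "D": -0.90, "K": -1.50, "R": -2.53,
}

# SP-C peptide from the Poly-Valine slides.
SP_C = "LRIPCCPVNLKRLLVVVVVVVLVVVVTVGALLMGL"


def window_hydrophobicity(seq, window=7):
    """Mean hydrophobicity in a window centred on each residue.

    Windows are truncated at the sequence ends (an assumption of this sketch).
    """
    half = window // 2
    profile = []
    for i in range(len(seq)):
        lo = max(0, i - half)
        hi = min(len(seq), i + half + 1)
        values = [EISENBERG[aa] for aa in seq[lo:hi]]
        profile.append(sum(values) / len(values))
    return profile


def hydrophobic_segments(profile, threshold=0.8, min_len=10):
    """Return (start, end) index pairs (end exclusive) of runs >= threshold."""
    segments = []
    start = None
    for i, value in enumerate(profile):
        if value >= threshold and start is None:
            start = i
        elif value < threshold and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    if start is not None and len(profile) - start >= min_len:
        segments.append((start, len(profile)))
    return segments


if __name__ == "__main__":
    profile = window_hydrophobicity(SP_C)
    for start, end in hydrophobic_segments(profile):
        print(start + 1, end, SP_C[start:end])
```

Real transmembrane helix predictors typically use longer windows (about the length of a membrane-spanning helix); the short window here only keeps the toy example readable for a 35-residue peptide.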
Pure hydrophobicity scales
[Figure: hydrophobicity scales GES, EISEN and KYDO plotted per amino acid (A R N D C Q E G H I L K M F P S T W Y V); y-axis from -6.75 to 9.00.]
5 Hydrophobicity/tm/occupancy scales
[Figure: scales GES, EISEN, KYDO, OOI and HEIJNE plotted per amino acid (A R N D C Q E G H I L K M F P S T W Y V); y-axis from 0.00 to 1.00.]
Many indices exist
K Tomii and M Kanehisa (1996) Analysis of amino acid indices and mutation matrices for sequence comparison and structure prediction of proteins. Protein Eng 9:27-36: Fig. 2 (402 indices)
PHDsec success on Poly-Valine
HEADER LIPOPROTEIN(SURFACE FILM)
COMPND PULMONARY SURFACTANT-ASSOCIATED POLYPEPTIDE C(SP-C)
SOURCE PIG (SUS SCROFA)
AUTHOR J.JOHANSSON,T.SZYPERSKI,T.CURSTEDT,K.WUTHRICH
AA      LRIPCCPVNLKRLLVVVVVVVLVVVVTVGALLMGL
OBS sec HHHHHHHHHHHHHHHHHHHHHHHHH
PHD sec EEEEEEEEEEEEEEEEEEEEEEE
NLKRLLVVVVVVVLVVVVTVGALL
 h  hhhhhhhhhhhhhh h hhh
h: hydrophobic
Identify hydrophobic regions
G von Heijne (1992) Membrane protein structure prediction. Hydrophobicity analysis and the positive-inside rule. J Mol Biol 225: 487-94: Fig. 4
Topology for membrane helical proteins.
[Figure repeated: proteins A, B and C in the lipid membrane bilayer, with extra-cytoplasmic (out) and intra-cytoplasmic (in) sides and C-termini marked.]
Positive-inside rule
[Figure panels: cytosolic loops, periplasmic loops]
G von Heijne (1986) The distribution of positively charged residues in bacterial inner membrane proteins correlates with the trans-membrane topology. EMBO J 5:3021-7 Fig. 2
Topology for membrane helical proteins.
[Figure repeated: proteins A, B and C in the lipid membrane bilayer, with extra-cytoplasmic (out) and intra-cytoplasmic (in) sides and C-termini marked.]
Heijne rule: positive inside out
[Figure: worked example of topology assignment across the lipid membrane bilayer (extra-cytoplasmic vs. intra-cytoplasmic side), from N-term to C-term.
• Eight best HTM's, with scores 0.92, 0.95, 0.93, 0.91, 0.90, 0.92, 0.87, 0.89
• Candidate models: µ=0: 0 HTM; µ=1: 1 HTM; µ=2: 2 HTM; µ=3: 3 HTM
• Loop lengths: 5, 30, 6, 5
• Charge: number of R+K in loops 1-4: Σ=2, Σ=5, Σ=3, Σ=1
• final prediction: ∆ = (5+1) - (2+3) > 0 => first loop out]
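The final-prediction step of this slide can be sketched as follows: count R+K in each loop, sum the counts over alternating loops, and place the more positive set of loops inside. The function names are hypothetical, and the assignment of the counts 2, 5, 3, 1 to loops 1 to 4 in that order is an assumption read off the slide's ∆ = (5+1) - (2+3); the real PHDhtm refinement also chooses among candidate helix models, which this sketch leaves out.

```python
def count_positive(loop_sequence):
    """Number of arginine (R) and lysine (K) residues in one loop."""
    return sum(loop_sequence.count(aa) for aa in "RK")


def first_loop_location(loop_charges):
    """Apply the positive-inside rule to per-loop R+K counts.

    loop_charges[0] is the first (N-terminal) loop. Loops alternate sides
    of the membrane, so loops 1, 3, ... share one side and loops 2, 4, ...
    share the other. The side with more R+K is taken to be inside.
    """
    odd_loops = sum(loop_charges[0::2])    # loops 1, 3, ...
    even_loops = sum(loop_charges[1::2])   # loops 2, 4, ...
    delta = even_loops - odd_loops
    if delta > 0:
        return "out"   # even loops are more positive, so loop 1 is outside
    if delta < 0:
        return "in"    # odd loops are more positive, so loop 1 is inside
    return "undetermined"


if __name__ == "__main__":
    # R+K counts as on the slide (order of loops is an assumption).
    slide_example = [2, 5, 3, 1]
    print(first_loop_location(slide_example))
```

A tie (∆ = 0) is reported as undetermined here; how a real predictor breaks such ties is not stated on the slide.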
Identify hydrophobic regions
1. predict <H>
2. assign positive inside-out
3. choose threshold to optimize inside-out difference
G von Heijne (1992) Membrane protein structure prediction. Hydrophobicity analysis and the positive-inside rule. J Mol Biol 225: 487-94: Fig. 4
Hydrophobicity-based
idea: optimize hydrophobicity scale for prediction
S Jayasinghe, K Hristova, SH White (2001) Energetics, stability, and prediction of transmembrane helices. J Mol Biol 312:927-34
PHDsec success on Poly-Valine
HEADER LIPOPROTEIN(SURFACE FILM)
COMPND PULMONARY SURFACTANT-ASSOCIATED POLYPEPTIDE C(SP-C)
SOURCE PIG (SUS SCROFA)
AUTHOR J.JOHANSSON,T.SZYPERSKI,T.CURSTEDT,K.WUTHRICH
AA      LRIPCCPVNLKRLLVVVVVVVLVVVVTVGALLMGL
OBS sec HHHHHHHHHHHHHHHHHHHHHHHHH
PHD sec EEEEEEEEEEEEEEEEEEEEEEE
PHDhtm
[Figure: PHDhtm uses the two-level architecture of PHDsec (first level sequence-to-structure, label 21+3; second level structure-to-structure, label 3+1), each with input layer, hidden layer and output layer, and the same local alignment input (13 adjacent residues) and global statistics input (%AA, length, ∆ N-term, ∆ C-term). The output units are HTM and nonHTM.]
Dynamic programming on NN ‘energy’
[Figure: two panels with y-axis from 0 to 1 and x-axis residue number, labeled T and N.]
PHDhtm refine
[Figure: worked example of topology assignment across the lipid membrane bilayer (extra-cytoplasmic vs. intra-cytoplasmic side), from N-term to C-term.
• Eight best HTM's, with scores 0.92, 0.95, 0.93, 0.91, 0.90, 0.92, 0.87, 0.89
• Candidate models: µ=0: 0 HTM; µ=1: 1 HTM; µ=2: 2 HTM; µ=3: 3 HTM
• Loop lengths: 5, 30, 6, 5
• Charge: number of R+K in loops 1-4: Σ=2, Σ=5, Σ=3, Σ=1
• final prediction: ∆ = (5+1) - (2+3) > 0 => first loop out]
PHDhtm on Poly-Valine
HEADER LIPOPROTEIN(SURFACE FILM)
COMPND PULMONARY SURFACTANT-ASSOCIATED POLYPEPTIDE C(SP-C)
SOURCE PIG (SUS SCROFA)
AUTHOR J.JOHANSSON,T.SZYPERSKI,T.CURSTEDT,K.WUTHRICH
AA      LRIPCCPVNLKRLLVVVVVVVLVVVVTVGALLMGL
OBS htm TTTTTTTTTTTTTTTTTTTTTTTTT
PHD htm TTTTTTTTTTTTTTTTTTTTTTTT
Membrane helix prediction: TMHMM
[Figure panels: TMHMM: sketch; details: inside/outside loop; details: TM core]
A Krogh, B Larsson, G von Heijne, EL Sonnhammer (2001) J Mol Biol 305:567-80, Fig. 1
Membrane helix prediction: HMMTOP
Gabor E Tusnady & Istvan Simon (2001) The HMMTOP transmembrane topology prediction server. Bioinformatics 17:849-850.
TMHs (helices) correctly predicted?
C-P Chen, A Kernytsky & B Rost 2002 Protein Science 11, 2774-91
Observed Helix 1 (O1) O2 O3
Predicted Helix 1 (P1) P2 P3
TMHs (helices) correctly predicted: if at most ±5 residues overlap
C-P Chen, A Kernytsky & B Rost 2002 Protein Science 11, 2774-91
Observed Helix 1 (O1) O2 O3
Predicted Helix 1 (P1) P2 P3
here=0
Prediction of membrane helices
[Figure: y-axis Qok: % of protein with all TMH right]
J Reeb, E Kloppmann, M Bernhofer & B Rost (2015) Proteins 83:473-84
Other problems unravelled by recent structures
Kingdoms similar in length
[Figure legend: eukaryotes, bacteria, archaea]
J Liu. & B Rost (2001) Prot. Sci. 10, 1970-1979.
B Rost (2002) Curr Op Struct Biol, 12, 409-416
Kingdoms similar in amino acid usage
[Figure legend: eukaryotes, bacteria, archaea]
J Liu. & B Rost (2001) Prot Sci 10, 1970-1979.
B Rost (2002) Curr Op Struct Biol, 12, 409-416
Inventory of life: membrane proteins
[Figure: bar chart of %mem (axis 0 to 30) per proteome, grouped into eukaryotes, bacteria and archaea. Proteomes shown: A pernix, A fulgidus, M jannaschii, M thermoautotrophicu, P abyssi, P horikoshii, A aeolicus, B subtilis, B burgdorferi, C jejuni, C pneumoniae, C trachomatis, D radiodurans, E coli, H influenzae, H pylori, M genitalium, M pneumoniae, M tuberculosis, N meningitidis, R prowazekii, S PCC6803, T maritima, T pallidum, U urealyticum, S cerevisiae, C elegans, D melanogaster, H sapiens (SP/TrEmbl, H sapiens (chr 22).]
J Liu. & B Rost (2001) Prot. Sci. 10, 1970-1979.
2013 note: some issues with data (incomplete sequences?), e.g. human has more than 18%
Inventory of life: coiled-coil proteins
[Figure: bar charts of %mem (axis 0 to 30) and %coils (axis 0 to 12) per proteome, from A pernix through H sapiens (chr 22), grouped into eukaryotes, bacteria and archaea.]
J Liu. & B Rost (2001) Prot. Sci. 10, 1970-1979.
TMH proteins: reminders
statistics for PDB in June 2010:
• 67,086 structures in PDB (June 2010)
• 1,197 transmembrane
• 1,014 alpha helical
• 179 beta barrel
-> < 2% BUT: >20% of all proteins!
GE Tusnady, ZS Dosztanyi & I Simon (2005) Bioinformatics 21:1276-7
TMH proteins: reminders
statistics for PDB in June 2010:
• 67,086 structures in PDB (June 2010)
• 246 unique* transmembrane
-> way less than 2% BUT: >20% of all proteins!
• * unique=non-identical sequence (can have PIDE>99.5%!)
S Jayasinghe, K Hristova, SH White (2001) Protein Sci 10:455-8
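The "< 2%" statements can be checked directly from the June 2010 counts quoted on these slides. A small sketch; the constant and function names are illustrative only.

```python
# Counts quoted on the slides (PDB, June 2010).
PDB_TOTAL = 67_086   # all structures in PDB
TM_ALL = 1_197       # transmembrane structures
TM_UNIQUE = 246      # unique (non-identical sequence) transmembrane structures


def percent(part, whole):
    """Share of `part` in `whole`, in percent."""
    return 100.0 * part / whole


if __name__ == "__main__":
    print(f"all TM structures:    {percent(TM_ALL, PDB_TOTAL):.2f} % of PDB")
    print(f"unique TM structures: {percent(TM_UNIQUE, PDB_TOTAL):.2f} % of PDB")
```

Both shares stay below 2%, against the more than 20% of all proteins that the slides give for membrane proteins.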
TMH proteins: reminders
Edda Kloppmann & Marco Punta: 1,035 PDB unique TM structures (Jan 2012) -> 107 Pfam families
E Kloppmann, M Punta & B Rost (2012) Curr Op Struct Biol 22:326-32
Thanks to Arne Elofsson
Following slides taken from Arne Elofsson, Stockholm Univ
Interface helices (Granseth, JMB 2005)
• 78 interface helices
• ~50% of chains contain interface helix
• Average length ~ 9 aa
• Longest is 19 aa
• Most frequent in photosynthetic reaction center
E Granseth, G von Heijne & A Elofsson (2005) J Mol Biol 346:377-85
© Arne Elofsson (Stockholm Univ)
Re-entry regions
• 36 reentrant helices
  • 20 in new classification
• 24% contain reentry
• 72% on the outside
• Length 3-32 residues
• Loops 11-117 residues
H Viklund, E Granseth & A Elofsson (2006) J Mol Biol 361:591-603
© Arne Elofsson (Stockholm Univ)
36 reentry regions in 3 classes
• Helix-coil/Coil-helix
• Helix-coil-helix
• Coil
H Viklund, E Granseth & A Elofsson (2006) J Mol Biol 361:591-603
© Arne Elofsson (Stockholm Univ)
Predict re-entry regions
H Viklund, E Granseth & A Elofsson (2006) J Mol Biol 361:591-603: Fig. 5
Re-entry predicted in entire genomes
Genome                 Reentrants in   Reentrants out   Reentrant fraction / Proteins
Observed in dataset    0.28            0.72             0.24079
E. coli                0.52            0.48             0.167773
S. cerevisiae          0.40            0.60             0.10757
H. sapiens             0.54            0.46             0.154181
Fraction: 0.31, 0.22, 0.11, 0.07
Classes: Channels, Active transporters, Electron transporters, Signal receptors
H Viklund, E Granseth & A Elofsson (2006) J Mol Biol 361:591-603
© Arne Elofsson (Stockholm Univ)
The not so simple TM proteins
Membrane protein structures are complex
• TM-helices end at different locations
• Different angles
• Neighboring helices often interact
• Interface helices
• Reentrant regions
No sheets close to the membrane
H Viklund, E Granseth & A Elofsson (2006) J Mol Biol 361:591-603
© Arne Elofsson (Stockholm Univ)
More complex structures need new prediction methods
[Figure labels: Nout, Cin, C, N, cytoplasm, periplasm, Membrane]
H Viklund, E Granseth & A Elofsson (2006) J Mol Biol 361:591-603
© Arne Elofsson (Stockholm Univ)
The Z-coordinate
Z-coordinate: distance from residue to membrane center
[Figure: Z axis with ticks 0, 15 and -15; sides labeled Periplasm and Cytoplasm]
H Viklund, E Granseth & A Elofsson (2006) J Mol Biol 361:591-603
© Arne Elofsson (Stockholm Univ)
Lecture plan (CB1 structure)
01: 04/25 Tue: no lecture
02: 04/27 Thu: no lecture
03: 05/02 Tue: Intro 1: organization of lecture: intro into cells & biology
04: 05/04 Thu: Intro 2: amino acids, protein structure (comparison), domains
05: 05/09 Tue: No lecture
06: 05/11 Thu: Alignment 1
07: 05/16 Tue: Alignment 2
08: 05/18 Thu: Comparative modeling & exp structure determination & secondary structure assignment
09: 05/23 Tue: SKIP: student assembly (SVV)
10: 05/25 Thu: SKIP: Ascension Day
11: 05/30 Tue: 1D: Secondary structure prediction 1
12: 06/01 Thu: 1D: Secondary structure prediction 2
13: 06/06 Tue: SKIP: Whitsun holiday (06/03-06)
14: 06/08 Thu: 1D: Secondary structure prediction 3
15: 06/13 Tue: 1D: Transmembrane structure prediction 1 (today)
16: 06/15 Thu: SKIP: Corpus Christi
17: 06/20 Tue: 1D: Transmembrane structure prediction 2 / Solvent accessibility prediction
18: 06/22 Thu: 1D: Disorder prediction
19: 06/27 Tue: 2D prediction
20: 06/29 Thu: 3D prediction / Nobel prize symposium
21: 07/04 Tue: TBA
22: 07/06 Thu: recap 1
23: 07/11 Tue: recap 2
24: 07/12 Wed: examen
25: 07/13 Tue: TBA
26: 07/18 Thu: TBA
27: 07/20 Tue: TBA
28: 07/22 Thu: TBA
29: 07/25 Thu: TBA