Structure and evolution of IDPs
Peter Tompa
Institute of Enzymology, Hungarian Academy of Sciences
Budapest, Hungary
Why do we want to characterize/predict IDPs?
1) Find new ones (460 in DisProt vs. tens of thousands)
2) Describe our protein
Extend the structure-function paradigm
Why do we want to describe the structure of IDPs in detail?
To characterize…
Structure
In the bound state
In the free state
Structural levels
Structure
Local (secondary)
Global (tertiary)
Sequence (primary)
1) Primary structure
Dunker et al. (2001) J. Mol. Graph. Model. 19, 26
Primary structure (sequence) of IDPs
Low-complexity regions in proteins
Wootton (1994) Comp. Chem. 18, 269
Low complexity: Drosophila mastermind
MDAGGLPVFQSASQAAAAAVAQQQQQQQQQQQQQQQQQQQHLNLQLHQQHLQQQQSLGIHLQQQQQLQLQQQQQHNAQAQQQ
QQLQVQQQQQQRQQQQQQQQQHSLYNANLAAAGGIVGGLVPGGNGAGGVALQQVFGGPNGNNNSNNNNNSNNNSININNGNI
SPGDGLPTKRQPILDRLRRRMENYRRRQTDCVPRYEQTFSTVCEQQNHETSALQKRFLESKNKRAAKKTEKKLPETQQQAQT
QMLAGQLQSSVHVQQKILKRPADDVDNGAENYEPPQKLPNNNNNNNNNNNNNNNSSSGVGGGSENLTKFSVEIVQQLEFTTS
AANSQPQQISTNVTVKALTNTSVKSEPGVGGGRGRHQQQQQHQQHQQQQHQQQQHQQHQQHQQQQQHQQQQHQQQQHQQQQQ
QHHHQQQQQQGGGLGGLGNNGRGGGGPGGGGHMATGPGGVGVGMGPNMMSAQQKSALGNLANLVECKREPDHDFPDLGSLAK
DGANGQFPGFPDLLGDDNSENNDTFKDLINNLHDFNPSFLDGFDEKPLLDIKTEDGIKVEPPNAQDLINSLNVKSETGLGHG
FGGFGVGLGLDPQSMKMRPGVGFQNGPNGNANAGNGGPTAGGGGGGNGPGGLMSEHSLAAQTLKQMAEQHQHKSAMGGMGGF
HVPPHGMQQQQPQQQQQAPQQQQQQHGQMMGGPGQGQQQQQQQQPRYNDYGGGFPNDFAMGPNPTQQQQQHLPPQFHQKAPG
GGPGMNVQQNFLDIKQELFYSSPNDFDLKHLQQQQAMQQQQQQQQQQQQQQQHHAQQQQQHPNGPNMGVPMGGAGNFAKQQQ
QQVPTPQQQQQQQLQQQQQQYSPFSNQNANANFLNCPPRGGPQGNQAPGNMPQQQQQQPQQQQQPPRGPQSNPNAVPGGNAA
NATQQQQQQQQQQQQQQQQQQQQQQQATTTTLQMKQTQQLHISQQGGGSHGIQVSAGQHLHLSSDMKSNVSVAAQQGVFFSQ
QQAAQQQQQQQQQPGNAGPNPQQQQQQPHGGNAGANGGGPNGPQQQQPNQNMNNSNVPSDGFSLSQSQSMNFTQQQQQQAAA
AAAAAAAAQQQQAAAAQQQQQQVPPNMRQRQTQAQAAAAAAAAAAAQAQAAANANGGPGGNVPLMQQQQQTPGGVPVGAGSG
NASVGVPVSAGGPNNGAMNQLGGPMGGMPGMQMGGPGGVPINPMQMNPNGGAPNAQMMMGGNGGGPVPAASQAKFLQQQQIM
RAQAMQHQQQVQQHMAGARPPPPEYNATKAQLMQAQMMQQTVGGGGGGGVGVGVGVGGGVGGGGGAGRFPNSAAQAAAMRRM
TQQPIPPSGPMMRPQHAAMYMQQHGGAGGGPRGGMGGPYGGGGVGGAGGPMGGGGGGQQQQQRPPNVQVTPDGMPMGSQQEW
RHMMMTQQQQQMGFGPGGPMRQGPGGFNGGNFMPNGAPNAPGNGPNGGGGGGMMPGPNGPQMQLTPAQMQQQHMRQQQQQQH
MGPGGGGGGGGGNMQMQQLLQQQQNAAAGGGGGMMATQMQMTSIHMSQTQQQQQLTMQQQQFVQSTSTTTTHQQQQQLQLQM
QSQSGGPGGNGPSNNNGANQAGGVGVGVGVGVGVGVVGSSATIASASSISQTINSVVANSNDLCLEFLDNLPDGNFSTQDLI
NSLDNDNFNIQDILQ
Drosophila mastermind
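The low complexity of the mastermind sequence can be made quantitative with a sliding-window compositional entropy. The sketch below is loosely modelled on that idea and is not Wootton's SEG algorithm itself. The function names, the window length of 12 and the entropy threshold are assumptions chosen for illustration, and the test fragment is a made-up toy string, not the sequence above.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits per residue) of the amino-acid composition."""
    n = len(window)
    counts = Counter(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(seq: str, window: int = 12, threshold: float = 2.2):
    """Start indices (0-based) of windows whose entropy is <= threshold.

    The default window and threshold are illustrative assumptions.
    """
    return [
        i
        for i in range(len(seq) - window + 1)
        if shannon_entropy(seq[i:i + window]) <= threshold
    ]


# Toy fragment: a mixed-composition stretch followed by a poly-Q run.
toy = "MDAGGLPVFQSASQ" + "Q" * 12
hits = low_complexity_windows(toy, window=12, threshold=1.0)
```

A window made of a single residue type has zero entropy, while a window with many different residues approaches log2(20), about 4.3 bits. Long Q, N, G or A runs like those in mastermind therefore fall well below any reasonable threshold.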
2) Secondary structure
Structure in the free state (3 examples)
CREB-KID - CBP-KIX binding and NMR
Radhakrishnan et al. (1998) FEBS Lett. 430, 317
FlgM: evidence for disorder in vivo
Plaxco and Gross (1997) Nature, 386, 657
Sorenson (2004) Mol. Cell 14, 127
FlgM - sigma 28 binding and NMR
p27 – CycA/Cdk2 binding (NMR, MD)
Sivakolundu et al. (2005) JMB 353, 1118
Wikipedia
SH3-PPII
And a fourth: polyproline II helix
PPII helix conformation is common in IDPs
Raman optical activity (ROA)
Syme et al. (2002) EJB 269, 148
PPII
Dominates in: casein, synuclein, tau, wheat gluten
2) Secondary structure
Structure in the bound state
Complexes of IDPs in PDB
[Figure: structures of IDPs bound to their partners: p27Kip1 with CycA/Cdk2, IA3 with aspartic proteinase, FnBP with fibronectin, Tcf3 with β-catenin]
IUP                  SP code      Length  Partner                     Method
CREB                 P16220       28      CBP KIX                     NMR
DFF45                O00273       89      DFF40                       NMR
E-cadherin           P09803       57      β-catenin                   X-ray
FCP1                 AAC64549.1*  21      TFIIF/RAP74                 NMR
FnBPA                Q53971       24      Fibronectin                 NMR
IA3                  P01094       29      Proteinase A                X-ray
Killer toxin         P19972       77      Killer toxin chain          X-ray
Bob-1                P10636       13      Pin1 WW                     NMR
MAP tau              P25912       86      DNA                         X-ray
MAX                  Q16633       22      Oct-1 POU/DNA               X-ray
p27Kip1              P46527       69      CycA-Cdk2                   X-ray
p53                  P04637       11      MDM2                        X-ray
Phe-tRNA synthetase  P27001       79      Phe-tRNA synthetase + tRNA  X-ray
PKI                  P04541       20      PKA                         X-ray
RB3                  Q9H169       91      tubulin                     X-ray
RNA pol II           P04050       17      mRNA capping enzyme         X-ray
SNAP-25              P13795-2     77      neuronal fusion complex     X-ray
SV40 virus coat      P03087       66      assembled coat              X-ray
TAFII230             P51123       67      TBP                         NMR
TBS virus coat       P11795       34      assembled coat              X-ray
Tcf3                 CAA67686*    41      β-catenin                   X-ray
Tcf4                 Q9NQB0       24      β-catenin                   X-ray
Troponin I           P19429       17      Troponin C                  NMR
Vitamin D3R          P11473       89      DNA                         X-ray
[Figure: distribution of secondary structure content in globular proteins vs. IDPs, one panel each for Helix, Extended, Turn and Coil; x-axis: % of secondary structure, y-axis: % of proteins]
Secondary structural elements
31.3 %, 44.8 %
21.9 %, 10.9 %
Helix
Comparison of free and bound states: what does it tell us?
Local secondary structural elements in IDPs: molecular recognition
1) disorder pattern: molecular recognition element (MoRE, MoRF)
2) consensus sequence: linear motif (LM, ELM, SLiM)
3) local predictable structure: preformed structural element (PSE)
1) Disorder pattern: MoRE in tumor suppressor p53
Uversky et al. (2005) J. Mol. Recogn. 18, 343
2) Consensus sequences: ELMs
ELMs and local disorder
Fuxreiter et al (2006) Bioinformatics, 23, 950
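Because a linear motif is defined by a short consensus sequence, it can be located by pattern matching. The sketch below scans a sequence with a regular expression and reports every hit, including overlapping ones. The function name is hypothetical, and the pattern `P..P` (a PxxP-type proline-rich core) and the test sequences are assumptions chosen for illustration, not motif definitions taken from the ELM resource.

```python
import re


def find_motif(seq: str, pattern: str):
    """Return (1-based position, matched text) for every match of pattern.

    A zero-width lookahead is used so that overlapping matches are reported.
    """
    lookahead = re.compile(f"(?=({pattern}))")
    return [(m.start() + 1, m.group(1)) for m in lookahead.finditer(seq)]


# Illustrative pattern: proline, any two residues, proline.
hits = find_motif("AAPPLPAA", "P..P")
```

Such short patterns match by chance very often. Combining the match with a local disorder prediction is one way to filter hits, which is the connection between ELMs and local disorder made on this slide.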
3) Predictability of structure: preformed structural elements, PSEs
PSE: predictability of secondary structure
[Figure: accuracy (%) of secondary structure prediction, measured by Q3 and SOV, for IDPs vs. their partners]
Fuxreiter et al. (2004) JMB 338, 1015
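Q3, one of the two accuracy measures in the figure, is the percentage of residues whose predicted three-state class (helix, strand, coil) agrees with the observed one. A minimal sketch follows. The function name and the H/E/C string encoding are assumptions, and the segment-based SOV measure is not implemented here.

```python
def q3(predicted: str, observed: str) -> float:
    """Per-residue three-state accuracy in percent.

    Both strings encode one state per residue (e.g. H, E, C) and must
    have the same, non-zero length.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("empty input")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Prediction for the free IDP compared with the structure seen in the complex.
score = q3("HHHECC", "HHEECC")
```

In the PSE analysis the "observed" string would come from the bound-state structure in the complex and the "predicted" string from a secondary structure predictor run on the IDP sequence alone.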
MoRE, LM, PSE: devices of effective recognition
MoRE
PSE
Lacy et al (2004) NSMB 11, 358
Sequential mechanism of p27 binding
3) Tertiary structure
Dedmon et al. (2005) JACS 127, 476
Structural ensemble of α-synuclein
(NMR paramagnetic relaxation enhancement)
SAXS distance-distribution function and topology of cellulase E
Von Ossowski et al. (2005) Biophys. J. 88, 2823
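[Figure: hydrodynamic volume (Å³) vs. number of residues for globular proteins in the native, molten globule (MG), pre-molten globule (PMG) and unfolded random coil (U (RC)) states, and for PMG-like and RC-like IUPs]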
Uversky (2002) Prot. Sci. 11, 739
Global (tertiary) structure of IUPs
A lesson from denatured states of globular proteins:
Gillespie et al (1997) JMB 268, 170
spatial topology in denatured state resembles native structure (David Shortle)
p27
Models
Protein trinity (Dunker): ordered, molten globule, random coil
Protein quartet (Uversky): ordered, MG, PMG, RC
The evolution of protein disorder
Evolution
Generation
Disorder in complete genomes (PONDR)
Dunker et al. (2000) Genome Inf. 11, 161
Disorder in complete genomes (DISOPRED)
Ward et al. (2004) JMB 337, 635
IDPs: high frequency in proteomes
Tompa et al. (2006) J. Prot. Res 5, 1996
[Figure labels: E. coli, yeast]
Structural disorder: evolutionary success story
[Figure: percentage of proteins with a long disordered region (LDR, longer than 40 residues) by domain of life: bacteria (B), archaea (A), eukaryotes (E)]
Vucetic et al. (2002) Proteins 52, 573
The evolution of protein disorder
Evolution
Generation
de novo generation
gene duplication
lateral gene transfer, LGT
The evolution of protein disorder
Evolution
Generation
Mutations
Point mutation
de novo generation
gene duplication
lateral gene transfer, LGT
Brown et al. (2002) J. Mol. Evol. 55, 104
Rapid evolution by point mutations
[Figure: number of families in which the evolutionary variability of the IUP region is smaller than, the same as, or larger than that of the globular region]
Non-synonymous vs. synonymous substitutions
Point mutations
Synonymous (Ks)
Non-synonymous (Ka)
Nonsense
Evolution (Ka/Ks):
0.1-0.2: "functional"
≈ 1.0: "neutral"
> 1.0: "adaptive"
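The three classes of point mutation listed above can be told apart by translating aligned codons with the standard genetic code. The sketch below only counts codon differences between two aligned coding sequences. It does not compute Ka and Ks proper, which normalise these counts by the number of non-synonymous and synonymous sites. The function names, the tolerance around 1.0 and the test sequences are assumptions for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(("".join(c) for c in product(BASES, repeat=3)), AMINO_ACIDS))


def classify_substitutions(ref: str, alt: str) -> dict:
    """Count synonymous, non-synonymous and nonsense codon differences."""
    if len(ref) != len(alt) or len(ref) % 3:
        raise ValueError("sequences must be aligned, in frame, and equal in length")
    counts = {"synonymous": 0, "nonsynonymous": 0, "nonsense": 0}
    for i in range(0, len(ref), 3):
        c1, c2 = ref[i:i + 3], alt[i:i + 3]
        if c1 == c2:
            continue
        aa1, aa2 = CODON_TABLE[c1], CODON_TABLE[c2]
        if aa1 == aa2:
            counts["synonymous"] += 1      # same amino acid
        elif aa2 == "*":
            counts["nonsense"] += 1        # new stop codon
        else:
            counts["nonsynonymous"] += 1   # amino acid replaced
    return counts


def interpret_ka_ks(ratio: float, tolerance: float = 0.1) -> str:
    """Map a Ka/Ks ratio to the labels used on the slide (tolerance is arbitrary)."""
    if ratio > 1 + tolerance:
        return "adaptive"
    if ratio < 1 - tolerance:
        return "functional"
    return "neutral"


# Q Q K  ->  Q H stop
counts = classify_substitutions("CAGCAGAAA", "CAACATTAA")
```

In practice Ka/Ks is estimated with dedicated methods that correct for multiple substitutions at the same site. The raw counts here are only meant to show what is being compared.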
Rapid evolution of SRY gene
SRY: sex determining region on the Y chromosome (testis determining factor)
The evolution of protein disorder
Evolution
Generation
Mutations
Point mutation
Repeat expansion
de novo generation
gene duplication
lateral gene transfer, LGT
RNA polymerase II
TFs
Initiation
Elongation
Termination
CTDK
RNAP II CTD: coordination of 5’ capping, splicing, 3’ polyadenylation of mRNA
IGTGAFDVMIDEESLVKYMPEQKITEIEDGQDGGV
TPYSNESGLVNADLDVKDELMFSPLVDSGSNDAMA
GGFTAYGGADYGEATSPFGAYGEAPTSPGFGVSSP
GFSPTSPTYSPTSPAYSPTSPSYSPTSPSYSPTSP
SYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSP
SYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSP
SYSPTSPSYSPTSPAYSPTSPSYSPTSPSYSPTSP
SYSPTSPSYSPTSPNYSPTSPSYSPTSPGYSPGSP
AYSPKQDEQKHNENENSR
Yeast RNAP II CTD
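The heptad periodicity of the CTD can be checked by looking for the longest uninterrupted run of a repeat unit. The sketch below does this with a regular expression. The function name, the degenerate pattern `YSPTSP[SA]` and the toy sequence are assumptions for illustration. No repeat count for the yeast sequence above is claimed here.

```python
import re


def longest_tandem_run(seq: str, unit_pattern: str, unit_len: int) -> int:
    """Largest number of back-to-back copies of a fixed-length repeat unit.

    unit_pattern is a regular expression that matches exactly unit_len
    residues, for example a heptad with one degenerate position.
    """
    runs = re.findall(f"(?:{unit_pattern})+", seq)
    return max((len(run) // unit_len for run in runs), default=0)


# Toy sequence: three consensus heptads, one variant heptad, flanking residues.
toy = "AA" + "YSPTSPS" * 3 + "YSPTSPA" + "GG"
exact = longest_tandem_run(toy, "YSPTSPS", 7)
degenerate = longest_tandem_run(toy, "YSPTSP[SA]", 7)
```

Allowing a degenerate position matters for real CTDs, where variant heptads interrupt an otherwise perfect array and would split it into shorter runs under exact matching.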
RNAP II CTD evolution
[Figure: number of -SPSYSPT- repeats in the RNAP II CTD vs. time (Gyr)]
Repeats in IUPs and other datasets
[Figure: frequency (%) of repeats, by proteins and by residues, in four protein datasets: Swiss-Prot, Yeast, Human, IUP]
Tompa (2003) BioEssays 25, 847
Functional microsatellites (short repeats) in IDPs

Protein (repeat region)  Repeat sequence       Repetition  Function                                         Type
Calreticulin             E/D2-8K/R1-3          6           weak, large-capacity calcium binding             I
Cdk p57                  AP                    43          linker between domains                           I
RS protein SC-35         (K)RS                 50          mRNA splicing                                    I
mastermind               G1-7V/A               7           linker/spacer                                    I
TF GAL11                 Q                     23          assembly of transcription preinitiation complex  I
CPEB                     Q1-13R/I/L/S          15          regulation of mRNA translation                   I
Sup35p                   Q2X2NN/Y              14          nonsense mutation suppression                    I
Sry                      (QQK)0,1Q2-13FHDH1-5  1–19        transactivator domain of sex-determining factor  I
MPRVYIGRLSYNVREKDIQRFFSGYGRLLEVDLKN
GYGFVEFEDSRDADDAVYELNGKELCGERVIVEHA
RGPRRDRDGYSYGSRSGGGGYSSRRTSGRDKYGPP
VRTEYRLIVENLSSRCSWQDLKDFMRQAGEVTYAD
AHKERTNEGVIEFRSYSDMKRALDKLDGTEINGRN
IRLIEDKPRTSHRRSYSGSRSRSRSRRRSRSRSRR
SSRSRSRSISKSRSRSRSRSKGRSRSRSKGRKSRS
KSKSKPKSDRGSHSHSRSRSKDEYEKSRSRSRSRS
PKENGKGDIKSKSRSRSQSRSNSPLPVPPSKARSVSPPPKRATSRSRSRSRSKSRSRSRSSSRD
SFRS6_HUMAN Splicing factor
MEGHVKRPMNAFMVWSRGERHKLAQQNPSMQNTEISKQLGCRWKSLTEAEKRPFFQEAQRLKILHREKYPNYKYQPHRRAKVSQRSGILQPAVASTKLYNLLQWDRNPHAITYRQDWSRAAHLYSKNQQSFYWQPVDIPTGHLQQQQQQQQQQQFHNHHQQQQQFYDHHQQQQQQQQQQQQFHDHHQQKQQFHDHHQQQQQFHDHHHHHQEQQFHDHHQQQQQFHDHQQQQQQQQQQQFHDHHQQKQQFHDHHHHQQQQQFHDHQQQQQQFHDHQQQQHQFHDHPQQKQQFHDHPQQQQQFHDHHHQQQQKQQFHDHHQQKQQFHDHHQQKQQFHDHHQQQQQFHDHHQQQQQQQQQQQQQFHDQQLTYLLTADITGEHTYQEHLSTALWLAVS
Mouse SRY (testis determining factor)
Functional minisatellites (long repeats) in IDPs

Protein (repeat region)                Repeat sequence                            Repetition  Function                                                      Type
fibronectin-binding protein A (Du-D4)  EDT/SX9,10GGX3,4I/VDF                      2–5         fibronectin binding                                           I
involucrin (Q-region)                  QEGQLK/EH/LL/PEQ                           24–63       transglutaminase cross-linking to form keratinocyte envelope  I
neurofilament-H (KSP domain)           XKSPY1-3K                                  42–55       entropic sidearm of neurofilaments                            I
prion protein (octarepeats)            PQ/HGGGWGQ                                 3–14        copper binding                                                III
RNA polymerase II (CTD)                YSPTSPS                                    11–52       coordination of transcription and mRNA processing             II
salivary PRPs                          PPPGKPQGPPPQGGNKPQGPP                      6–33        binding of polyphenolic plant compounds (tannins)             I
tau protein                            VQ/K/TSKI/CGSL/T/KD/E/GNI/LK/H/THV/KQPGGG  3–5         microtubule-binding, polymerization                           I
titin (PEVK)                           PEV/APKEVVPEKKA/VPVAPPKKPEV/APPVKV         5–60        providing entropic elasticity during sarcomere stretch        I
MSQQHTLPVTLSPALSQELLKTVPPPVNTHQEQMKQPTPLPPPCQKVPVELPVEVPSKQEEKHMTAVKGLPEQECEQQQKEPQEQELQQQHWEQHEEYQKAENPEQQLKQEKTQRDQQLNKQLEEEKKLLDQQLDQELVKRDEQLGMKKEQLLELPEQQEGHLKHLEQQEGQLKHPEQQEGQLELPEQQEGQLELPEQQEGQLELPEQQEGQLELPEQQEGQLELPQQQEGQLELSEQQEGQLELSEQQEGQLELSEQQEGQLKHLEHQEGQLEVPEEQMGQLKYLEQQEGQLKHLDQQEQEGQLEQLEEQEGQLKHLEQQEGQLEHLEHQEGQLGLPEQQVLQLKQLEKQQGQPKHLEEEEGQLKHLVQQEGQLKHLVQQEGQLEQQERQVEHLEQQVGQLKHLEEQEGQLKHLEQQQGQLEVPEQQVGQPKNLEQEEKQLELPEQQEGQVKHLEKQEAQLELPEQQVGQPKHLEQQEKHLEHPEQQDGQLKHLEQQEGQLKDLEQQKGQLEQPVFAPAPGQVQDIQPALPTKGEVLLPVEHQQQKQEVQWPPKHK
INVO_HUMAN Involucrin
................SDLGLCKKRPKPGGWNTGG
SRYPGQGSPGGNRYPPQGGGGWGQPHGGGWGQPHG
GGWGQPHGGGWGQPHGGGWGQGGGTHSQWNKPSKP
KTNMKHMAGAAAAGAVVGGLGGYMLGSAMSRPIIH
FGSDYEDRYYRENMHRYPNQVYYRPMDEYSNQNNF
VHDCVNITIKQHTVTTTTKGENFTETDVKMMERVV
EQMCITQYERESQAYYQRGSSMVLFSSPPVILLIS
FLIFLIVG
PRIO_HUMAN major prion protein
IUPs often evolve by repeat expansion
Basic mechanisms of repeat expansion
Meiotic: replication slippage (micro)
Mitotic: unequal crossing over (mini)
Wells RD (2001) JBC 271, 2875
Replication slippage
(Unequal) crossing over
Morgan 1916
Evolution of repetitive regions in IUPs
Tompa (2003) BioEssays 25, 847
Type I
Type II
Type III