Centre for Integrative Bioinformatics VU
Anton Feenstra, 23 Nov 2006
Sequence Entropy
Sequence Analysis
Significance of Alignment Positions
• An observed occurrence of amino acids at a position in an alignment that deviates from what is expected may indicate (functional) significance
• What does ‘deviates from expected’ mean?
• unlikely occurrences
• What is unlikely?
• an observed result that can be obtained in only (relatively) few ways
Counting…
• Number of possibilities for finding given numbers $N_x$ of amino-acid types $x$, where $N = \sum_x N_x$ is the number of sequences:
$$\frac{N!}{\prod_x N_x!}$$
• Problem: evaluates to huge numbers even for modest numbers of sequences…
Sequence Entropy
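• Entropy:
$$S = \ln \frac{N!}{\prod_x N_x!}$$
• Using Stirling's approximation: $\ln M! \approx M(\ln M - 1)$ for $M \gg 1$
• so that, per sequence and with $p_x = N_x / N$:
$$S = -\sum_x p_x \log p_x$$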
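The counting argument can be checked numerically. The sketch below is an illustration, not part of the original slides: it compares the exact logarithm of the number of possibilities with N times the per-sequence entropy. The function names and the example counts are assumptions chosen for this example.

```python
import math

def log_multiplicity(counts):
    """Exact ln( N! / prod_x N_x! ), using lgamma(n + 1) = ln n!."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)

def entropy_nats(counts):
    """Per-sequence entropy -sum_x p_x ln p_x with p_x = N_x / N."""
    n = sum(counts)
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)

# Hypothetical counts of three amino-acid types in one alignment column.
counts = [600, 300, 100]
exact = log_multiplicity(counts)
approx = sum(counts) * entropy_nats(counts)
print(f"exact ln W = {exact:.1f}, N * S = {approx:.1f}")
```

Working with logarithms avoids the huge numbers that the factorials themselves would produce.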
Shannon’s ‘Information Entropy’:
• ‘A Mathematical Theory of Communication’, The Bell System Technical Journal, Vol. 27, 1948.
“Can we define a quantity which will measure, in some sense, how much information is ‘produced’ by such a process, or better, at what rate information is produced?”
• He was thinking about the Transmission of Information, i.e., from a Source through some Channel to a Destination.
Choice, Uncertainty and Entropy
• A set of ‘events’ with probabilities p1, p2, …, pn
• Is there a measure that indicates how much ‘choice’ is possible, given those probabilities?
• If there is, it should be:
• continuous for all pi
• monotonically increasing in n if all probabilities are equal
• additive for ‘sub-events’
Additivity:
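[Diagram: a single choice among three outcomes with probabilities 1/2, 1/3 and 1/6 is decomposed into a first choice between two outcomes with probabilities 1/2 and 1/2, followed in one of the two branches by a second choice with probabilities 2/3 and 1/3]
$$H(1/2, 1/3, 1/6) = H(1/2, 1/2) + \tfrac{1}{2} H(2/3, 1/3)$$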
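As a quick numerical check of the additivity requirement, the following sketch evaluates both sides of the decomposition using the entropy formula with base-2 logarithms. The helper name `H` is an assumption made for this illustration.

```python
import math

def H(*probs):
    """Shannon entropy in bits of the given probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# One three-way choice versus a two-way choice followed,
# half of the time, by a second two-way choice.
direct = H(1/2, 1/3, 1/6)
staged = H(1/2, 1/2) + 1/2 * H(2/3, 1/3)
print(direct, staged)
```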
Solution: Entropy
• the entropy of a set of probabilities $p_i$:
$$H = -\sum_{i=1}^{n} p_i \log p_i$$
• measures information, choice and uncertainty
• zero only if only one $p_i$ is not zero
• there is only one choice
• maximal if all $p_i$ are equal
• most ‘uncertain’ situation: all options are possible
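The two limiting properties listed here are easy to confirm with a small sketch. The function name `entropy` and the example distributions are assumptions for illustration; base-2 logarithms are used, so the maximum for n equally likely outcomes is log2(n).

```python
import math

def entropy(probs):
    """H = -sum_i p_i log2 p_i, skipping zero probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

certain = entropy([1.0, 0.0, 0.0, 0.0])   # only one possible outcome
uniform = entropy([0.25, 0.25, 0.25, 0.25])  # all outcomes equally likely
skewed = entropy([0.7, 0.1, 0.1, 0.1])    # somewhere in between
print(certain, uniform, skewed)
```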
Information Content
• Shannon was thinking about the Transmission of Information, i.e., from a Source through some Channel to a Destination.
• …but it applies equally well to any type of ‘message’
• We can use it to measure the level of conservation in columns in an alignment
Simple Example: Sequence Entropy
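[Figure: example alignment columns containing only A and L, running from A A A A A A L through A A A A A L L, A A A A L L L, A A A L L L L and A A L L L L L to A L L L L L L, shown next to a plot of H (0 to 1.0) against p (0 to 1.0)]
• $p_1 = f(\text{‘L’})$, $p_2 = f(\text{‘A’})$: the observed frequencies of L and A in a column
• H is zero where $p_1 = 0$ or $p_2 = 0$, and maximal where $p_1 = p_2 = ½$
$$H = -\sum_{i=1}^{n} p_i \log p_i$$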
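The example columns can be scored directly. This is a minimal sketch, assuming base-2 logarithms and observed frequencies as probabilities; `column_entropy` is a name introduced here for illustration.

```python
import math
from collections import Counter

def column_entropy(column):
    """Entropy in bits of the residue frequencies in one alignment column."""
    n = len(column)
    return -sum(c / n * math.log2(c / n) for c in Counter(column).values())

# The six example columns, from six A and one L to one A and six L.
columns = ["AAAAAAL", "AAAAALL", "AAAALLL", "AAALLLL", "AALLLLL", "ALLLLLL"]
for col in columns:
    print(col, round(column_entropy(col), 3))
```

The values are symmetric around the middle of the series, since swapping A and L leaves the set of frequencies unchanged.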
Conditional Probability / Prior Knowledge
• Suppose we observe two things (A and B), and we suspect a relation (A causes B)
• The fundamental question is then: How likely is A when we know B?
• (n.b., this is Bayesian statistics…)
• Or, what is the uncertainty (entropy) of A knowing B?
Conditional Entropy
• Entropy of joint occurrence of value i for event x and value j for event y:
$$H(x,y) = -\sum_{i,j} p(i,j) \log p(i,j)$$
• and
$$H(x) = -\sum_{i,j} p(i,j) \log \sum_j p(i,j), \qquad H(y) = -\sum_{i,j} p(i,j) \log \sum_i p(i,j)$$
• so that
$$H(x,y) \le H(x) + H(y)$$
• i.e., the entropy of a joint event is less than or equal to the sum of the individual entropies
• it is equal only if the events are independent
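The inequality can be illustrated with two small joint tables, one independent and one fully coupled. This is a sketch under the assumption that the joint distribution is given as a nested list `joint[i][j] = p(i, j)`; the function names and tables are hypothetical.

```python
import math

def H(probs):
    """Entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def entropies(joint):
    """Return H(x,y), H(x), H(y) for joint[i][j] = p(i, j)."""
    p_x = [sum(row) for row in joint]        # marginal over j
    p_y = [sum(col) for col in zip(*joint)]  # marginal over i
    h_xy = H([p for row in joint for p in row])
    return h_xy, H(p_x), H(p_y)

independent = [[0.25, 0.25], [0.25, 0.25]]
coupled = [[0.5, 0.0], [0.0, 0.5]]
print(entropies(independent))
print(entropies(coupled))
```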
Co-occurrence in practice
• Measure of ‘mutual information’, by relative entropy:
$$H_y(x) = -\sum_{i,j} p(i,j) \log \frac{p(i,j)}{\sum_i p(i,j)}$$
• more often written like:
$$H(x\|y) = \sum_x p(x) \log \frac{p(x)}{p(y)}$$
• i.e., what is the entropy in x, given y
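As a sketch of "the entropy in x, given y", the entropy of x conditional on y can be computed from a joint table as the negative sum of p(i,j) log2 [p(i,j) / sum_i p(i,j)]. The nested-list layout `joint[i][j]` and the function name are assumptions for this illustration.

```python
import math

def conditional_entropy_x_given_y(joint):
    """Entropy of x given y, for joint[i][j] = p(i, j) (i indexes x, j indexes y)."""
    p_y = [sum(col) for col in zip(*joint)]  # marginal of y: sum over i
    total = 0.0
    for row in joint:
        for j, p_ij in enumerate(row):
            if p_ij > 0:
                total -= p_ij * math.log2(p_ij / p_y[j])
    return total

coupled = [[0.5, 0.0], [0.0, 0.5]]            # knowing y fixes x
independent = [[0.25, 0.25], [0.25, 0.25]]    # knowing y says nothing about x
print(conditional_entropy_x_given_y(coupled))
print(conditional_entropy_x_given_y(independent))
```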
Relative Entropy in Sequence Analysis
• Many biological problems relate to questions like:
“Why do these proteins do this, and those proteins not?”
• or
“Why do these patients get sick, and those not?”
• The answer can be related to similarities and differences between sequences
• Similarities (conservation) relate to functionally critical positions
• Differences can explain functional differences
Comparing groups of Sequences
• For each position i in an alignment, we calculate the relative entropy for group A vs. B from the frequencies p (observed probabilities) of all n amino-acid types x, as follows:
$$H_i^{A/B} = \sum_{x=1}^{n} p^A_{i,x} \log \frac{p^A_{i,x}}{p^B_{i,x}}$$
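A minimal sketch of the relative entropy at a single position, assuming the two groups are given as strings of residues from one alignment column and that base-2 logarithms are used. The function names are hypothetical; returning infinity when group B lacks a residue seen in group A is the usual convention for this quantity.

```python
import math
from collections import Counter

def frequencies(column):
    """Observed residue frequencies in one alignment column."""
    n = len(column)
    return {aa: c / n for aa, c in Counter(column).items()}

def relative_entropy(col_a, col_b):
    """Sum over x of p_A(x) * log2( p_A(x) / p_B(x) ) at one position."""
    p_a, p_b = frequencies(col_a), frequencies(col_b)
    total = 0.0
    for aa, pa in p_a.items():
        pb = p_b.get(aa, 0.0)
        if pb == 0.0:
            return math.inf  # residue present in A but absent in B
        total += pa * math.log2(pa / pb)
    return total

print(relative_entropy("AAAL", "AALL"))
print(relative_entropy("AALL", "AAAL"))
print(relative_entropy("AAAA", "LLLL"))
```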
Simple Example: Relative Entropy ‘A/B’
[Figure: plot with y-axis ‘Entropy / Harmony’ (0.0 to 3.0) for an example alignment of two groups, A and B, built from the residues A and L]
Plotted: $\sum_x p^A_{i,x} \log \frac{p^A_{i,x}}{p^B_{i,x}}$ and $\sum_x p^B_{i,x} \log \frac{p^B_{i,x}}{p^A_{i,x}}$
Relative Entropy
• Captures similarities and differences
• Is infinite for completely dissimilar positions
• e.g., A vs. L
• Not symmetrical: $H_x(y) \neq H_y(x)$
• May therefore not be easy to use for selecting dissimilar positions
Simple Example: Relative Entropy ‘A/AB’
[Figure: plot with y-axis ‘Entropy / Harmony’ (0.0 to 3.0) for the same example alignment of two groups, A and B, built from the residues A and L]
Plotted: $\sum_x p^A_{i,x} \log \frac{p^A_{i,x}}{p^{AB}_{i,x}}$ and $\sum_x p^B_{i,x} \log \frac{p^B_{i,x}}{p^{AB}_{i,x}}$
Relative Entropy (A/AB)
• Also captures similarities and differences
• Is no longer infinite for completely dissimilar positions
• Only symmetrical for equal size groups
• in practice, not symmetrical
• Maybe still not too easy for selecting dissimilar positions
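The following sketch illustrates why the A/AB variant stays finite. It assumes that the AB frequencies are obtained by pooling the sequences of both groups, which is an interpretation and not stated explicitly on the slides; the function names are hypothetical.

```python
import math
from collections import Counter

def frequencies(column):
    """Observed residue frequencies in one alignment column."""
    n = len(column)
    return {aa: c / n for aa, c in Counter(column).items()}

def relative_entropy_to_pool(col_a, col_b):
    """Sum over x of p_A(x) * log2( p_A(x) / p_AB(x) ), AB = A and B pooled."""
    p_a = frequencies(col_a)
    p_ab = frequencies(col_a + col_b)
    # Every residue in A also occurs in the pool, so no division by zero.
    return sum(pa * math.log2(pa / p_ab[aa]) for aa, pa in p_a.items())

# Completely dissimilar position, equal group sizes: same value both ways.
print(relative_entropy_to_pool("AAAA", "LLLL"), relative_entropy_to_pool("LLLL", "AAAA"))
# Completely dissimilar position, unequal group sizes: values differ.
print(relative_entropy_to_pool("AAAA", "LL"), relative_entropy_to_pool("LL", "AAAA"))
```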
Measuring Overlapping Distributions
• Weigh both groups equally; take $p^A + p^B$ instead of $p^{AB}$:
$$SH_i^{A/B} = -\sum_x p^A_{i,x} \log \frac{p^A_{i,x}}{p^A_{i,x} + p^B_{i,x}}$$
• Fixed interval [0,1] (for base-2 logarithms), but not completely symmetrical
Entropy vs. Sequence Harmony: Example
[Figure: plot with y-axis ‘Entropy / Harmony’ (0.0 to 3.0) for the example alignment of two groups, A and B, built from the residues A and L]
Plotted: $-\sum_x p^A_{i,x} \log \frac{p^A_{i,x}}{p^A_{i,x} + p^B_{i,x}}$ and $-\sum_x p^B_{i,x} \log \frac{p^B_{i,x}}{p^A_{i,x} + p^B_{i,x}}$
Sequence Harmony
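• Introduce symmetry by averaging:
$$SH_i^{A/B} = \tfrac{1}{2}\left(SH_i^{A/AB} + SH_i^{B/AB}\right)$$
• May seem a trivial choice, but:
$$\begin{aligned}
SH_i^{A/B} &= -\tfrac{1}{2}\left[\sum_x p^A_{i,x} \log \frac{p^A_{i,x}}{p^A_{i,x} + p^B_{i,x}} + \sum_x p^B_{i,x} \log \frac{p^B_{i,x}}{p^A_{i,x} + p^B_{i,x}}\right] \\
&= -\tfrac{1}{2}\sum_x \left[p^A_{i,x} \log p^A_{i,x} + p^B_{i,x} \log p^B_{i,x} - \left(p^A_{i,x} + p^B_{i,x}\right) \log\left(p^A_{i,x} + p^B_{i,x}\right)\right] \\
&= \tfrac{1}{2}\left[H^A + H^B + \sum_x \left(p^A_{i,x} + p^B_{i,x}\right) \log\left(p^A_{i,x} + p^B_{i,x}\right)\right] \\
&= \tfrac{1}{2}\left(H^A + H^B\right) - H^{A+B} + \log 2
\end{aligned}$$
where $H^A = -\sum_x p^A_{i,x} \log p^A_{i,x}$ (likewise $H^B$), and $H^{A+B}$ is the entropy of the averaged distribution $\tfrac{1}{2}(p^A_{i,x} + p^B_{i,x})$.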
Pirovano, Feenstra & Heringa. “Sequence Comparison by Sequence Harmony Identifies Subtype Specific Functional Sites”,
Nucleic Acids Res., in press (2006).
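A minimal sketch of Sequence Harmony at one alignment position, assuming base-2 logarithms and the sign convention in which the value runs from 0 (no overlap between the groups) to 1 (identical compositions). The function names and the string-per-group input are assumptions for illustration, not the authors' implementation.

```python
import math
from collections import Counter

def frequencies(column):
    """Observed residue frequencies in one alignment column."""
    n = len(column)
    return {aa: c / n for aa, c in Counter(column).items()}

def sequence_harmony(col_a, col_b):
    """Average of the two one-sided sums -sum_x p log2( p / (p_A + p_B) )."""
    p_a, p_b = frequencies(col_a), frequencies(col_b)
    sh = 0.0
    for aa in set(p_a) | set(p_b):
        pa, pb = p_a.get(aa, 0.0), p_b.get(aa, 0.0)
        if pa > 0:
            sh -= 0.5 * pa * math.log2(pa / (pa + pb))
        if pb > 0:
            sh -= 0.5 * pb * math.log2(pb / (pa + pb))
    return sh

print(sequence_harmony("AAAA", "AAAA"))  # identical compositions
print(sequence_harmony("AAAA", "LLLL"))  # no overlap
print(sequence_harmony("AAAL", "AALL"))  # partial overlap
```

Because both one-sided sums are included, swapping the two groups leaves the result unchanged, and unlike the plain relative entropy it stays finite when a residue is missing from one group.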
Entropy vs. Sequence Harmony: Example
[Figure: plot with y-axis ‘Entropy / Harmony’ (0.0 to 3.0) for the example alignment of two groups, A and B, built from the residues A and L]
Plotted: $\tfrac{1}{2}\left(H^A + H^B\right) - H^{A+B} + \log 2$, where $H^{A+B}$ is the entropy of the averaged distribution of groups A and B
Analyzing multiple groups
• Relative Entropy and Sequence Harmony are defined to compare a pair of groups (N=2)
• problem for multiple groups (N>2)
• A solution: Entropy-variability plots
• variability is the number of different amino-acid types in a certain alignment position
• problem: variability always tends towards the maximum (20) for larger numbers of sequences
• Another solution: Two-entropy plots
• Total family entropy vs. sum of sub-family entropy
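Both kinds of plot need only a few per-position quantities. The sketch below is an illustration under stated assumptions: sub-families are given as a dictionary of equal-length aligned sequences, entropies use base-2 logarithms, and the sub-family entropies are simply summed. The function names and the toy data are hypothetical and not taken from the cited work.

```python
import math
from collections import Counter

def column_entropy(column):
    """Entropy in bits of the residue frequencies in one alignment column."""
    n = len(column)
    return -sum(c / n * math.log2(c / n) for c in Counter(column).values())

def variability(column):
    """Number of different residue types observed in the column."""
    return len(set(column))

def two_entropies(subfamilies, position):
    """Return (total family entropy, sum of sub-family entropies) at one position."""
    sub_columns = ["".join(seq[position] for seq in seqs) for seqs in subfamilies.values()]
    total = column_entropy("".join(sub_columns))
    return total, sum(column_entropy(col) for col in sub_columns)

# Toy family: position 0 differs between sub-families, position 1 is conserved.
family = {"sub1": ["AK", "AK"], "sub2": ["LK", "LK"]}
print(two_entropies(family, 0), two_entropies(family, 1))
```

In this toy case a position that is conserved within each sub-family but differs between them has a high total entropy and a zero sub-family sum, while a fully conserved position is zero on both axes.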
Two-Entropies analysis of GPCRs
• Goal: to identify the function of individual positions
Ye, Lameijer, Beukers & IJzerman. “A Two-Entropies Analysis to Identify Functional Positions in the Transmembrane Region of Class A G Protein-Coupled Receptors”, Proteins 63, pp1018–1030 (2006)
G-protein Coupled Receptors (GPCRs)
• Huge family of integral cell-membrane proteins
• crucial in signal transduction
• 70 subfamilies
• 1935 Class A GPCRs
• Three main regions:
• extracellular side
• transmembrane (TM)
• cytoplasmic side
Two-entropies vs. Entropy-variability: upper vs. lower domain
Red: upper domain Blue: lower domain
Two-entropies vs. Entropy-variability: relative solvent accessibility (RSA)
Red: RSA < 15% Blue: RSA > 15%
Two-entropies vs. Entropy-variability: ligand binding site vs. other positions
Red: <4 Å from retinal Blue: >4 Å from retinal
Subfamily specific binding
Common activation mechanism
GPCR Two-entropies analysis
Red: ligand binding Green: coupling & activation Blue: others
GPCR Functional Site Prediction: Method Comparison
Finding known sites for ligand binding in Bovine Rhodopsin (left) and Aminergic Receptors (right)
Smad-MH2 Alignment & Sequence Harmony
• Walter Pirovano*, K. Anton Feenstra* and Jaap Heringa. “Sequence Comparison by Sequence Harmony Identifies Subtype Specific Functional Sites”, Nucleic Acids Res., in press (2006).
• K. Anton Feenstra, Walter Pirovano and Jaap Heringa. “Sub-type Specific Sites for SMAD Receptor Binding Identified by Sequence Comparison using ‘Sequence Harmony’ ”. in: From Computational Biophysics to Systems Biology. pp. 73-78. Eds. U.H.E. Hansmann, J. Meinke, S. Mohanty and O. Zimmermann, Jülich, NIC Series, Vol. 34, 2006.
• Elena Marchiori*, Walter Pirovano, Jaap Heringa and K. Anton Feenstra*. “A Feature Selection Algorithm for Detecting Subtype Specific Sites for Smad Receptor Binding”, Bio-ICMLA06, accepted (2006).
Smad2 H.sapiens D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L
Smad2 D.melanogaster D A A P V M Y H E P A F W C S I S Y Y E L N T R V G E T F H A S Q P S I T V D G F T D P S N S E - R F C L G L
Smad2 D.rerio D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L
Smad2 C.auratus D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L
Smad2 R.norvegicus D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L
Smad2 M.musculus D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L
Smad2 D.rerio D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L C L
Smad3 S.scrofa D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L
Smad3 X.laevis D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L
Smad3 H.sapiens D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L
Smad3 M.musculus D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R L C L G L
Smad3 C.auratus D L Q P V T Y C E S A F W C S I S Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N A E - R F C L G L
Smad3 G.gallus D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L
Smad3 S.scrofa D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L
Smad3 R.norvegicus D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L
Smad1 S.mansoni T M H P V N Y Q E P K Y W C S I V Y Y E L N N R V G E A F N A S Q L S I I I D G F T D P S N N S D R F C L G L
Smad1 M.musculus D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L
Smad1 H.sapiens D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L
Smad1 S.scrofa D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L
Smad1 R.norvegicus D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L
Smad1 X.tropicalis D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N R N R F C L G L
Smad1 G.gallus D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S I L V D G F T D P S N N K N R F C L G L
Smad1 D.rerio D V H P V A Y Q E P K H W C S I V Y Y E L N N R V G E A F L A S S T S V L V D G F T D P S N N R N R F C L G L
Smad1 C.coturnix D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S I L V D G F T D P S N N K N R F C L G L
Smad5 H.sapiens D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K S R F C L G L
Smad5 M.musculus D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K S R F C L G L
Smad5 R.norvegicus D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P A N N K S R F C L G L
Smad5 G.gallus D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L
Smad5 D.rerio D V Q P V E Y Q E P S H W C S I V Y Y E L N N R V G E A Y H A S S T S V L V D G F T D P S N N K N R F C L G L
Smad8 M.musculus D F R P V C Y E E P Q H W C S V A Y Y E L N N R V G E T F Q A S S R S V L I D G F T D P S N N R N R F C L G L
Smad8 R.norvegicus D F R P V C Y E E P L H W C S V A Y Y E L N N R V G E T F Q A S S R S V L I D G F T D P S N N R N R F C L G L
Smad8 G.gallus N F R P V C Y E E P Q H W C S V A Y Y E L N N R V G E T F Q A S S R S I L I D G F T D P S N N K N R F C L G L
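(alignment positions 1 to 55 above, ruler ticks at 10, 20, 30, 40 and 50; the unlabelled blocks that follow list the rows in the same order)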
L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K
L S N V N R N E V V E Q T R R H I G K G V R L Y Y I G G E V F A E C L S D S S I F V Q S P N C N Q R Y G W H P A T V C K
L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K
L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y D W H P A T V C K
L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K
L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K
L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K
L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K
L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D N A I F V Q S P N C N Q R Y G W H P A T V C K
L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K
L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K
L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K
L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K
L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K
L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K
L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H N F H P T T V C K
L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K
L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K
L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K
L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K
L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N F H H G F H P T T V C K
L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K
L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K
L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K
L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N F H H G F H P T T V C K
L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N F H H G F H P T T V C K
L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N F H H G F H P T T V C K
L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K
L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D T S I F V Q S R N C N Y H H G F H P T T V C K
L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C V S D S S I F V Q S R N C N Y Q H G F H P A T V C K
L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C V S D S S I F V Q S R N C N Y Q H G F H P A T V C K
L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C V S D S S I F V Q S R N C N Y Q H G F H P A T V C K
(alignment positions 56 to 115 above, ruler ticks at 60, 70, 80, 90, 100 and 110)
I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V
I P P G C N L K I F N N Q E F A A - - - - L L S Q S V S Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V
I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V
I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V
I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V
I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V
I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V
I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V
I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V
I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V
I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V
I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y R L T R M C T I R M S F V K G W G A E Y R R Q T V
I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V
I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V
I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V
I P P G C S L K I F S N Q E F A H - - - - L L S R T V H H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V
I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V
I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V
I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V
I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V
I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V
I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T L R M S F V K G W G A E Y H R Q D V
I P S R C S L K I F N N Q E F A E - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A K Y H R Q D V
I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T L R M S F V K G W G A E Y H R Q D V
I P S S C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V
I P S S C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V
I P S S C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V
I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V
I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V
I P S G C S L K V F N N Q L F A Q - - - - L L A Q S V H H G F E V V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V
I P S G C S L K V F N N Q L F A Q L L A Q L L A Q S V H H G F E V V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V
I P S G C S L K I F N N Q L F A Q - - - - P L A Q S V N H G F E V V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V
(alignment positions 116 to 175 above, ruler ticks at 120, 130, 140, 150, 160 and 170)
T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S
T S T P C W I E L H L N G P L Q W L D R V L T Q M G S P R L P C S S M S
T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S
T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S
T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S
T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S
T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S
T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S
T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S
T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S
T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S
T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P N L R C S S V S
T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S
T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S
T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S
T S T P C W V E I H L N G P L Q W L D R V L T Q M G T P R N P I S S V S
T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S
T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S
T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S
T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S
T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S
T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S
T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S
T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S
T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P L N P I S S V S
T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P L N P I S S V S
T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P L N P I S S V S
T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P L N P I S S V S
T S T P C W I E V H L H G P L Q W L D K V L T Q M G S P L N P I S S V S
T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S
T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S
T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S
(alignment positions 176 to 211 above, ruler ticks at 180, 190, 200 and 210)
[Plot below the alignment: x-axis from 260 to 460, y-axis from 0 to 1]
Smad2 H.sapiens D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L
Smad2 D.melanogaster D A A P V M Y H E P A F W C S I S Y Y E L N T R V G E T F H A S Q P S I T V D G F T D P S N S E - R F C L G L
Smad2 D.rerio D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L
Smad2 C.auratus D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L
Smad2 R.norvegicus D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L
Smad2 M.musculus D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L
Smad2 D.rerio D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L C L
Smad3 S.scrofa D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L
Smad3 X.laevis D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L
Smad3 H.sapiens D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L
Smad3 M.musculus D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R L C L G L
Smad3 C.auratus D L Q P V T Y C E S A F W C S I S Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N A E - R F C L G L
Smad3 G.gallus D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L
Smad3 S.scrofa D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L
Smad3 R.norvegicus D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L
Smad1 S.mansoni T M H P V N Y Q E P K Y W C S I V Y Y E L N N R V G E A F N A S Q L S I I I D G F T D P S N N S D R F C L G L
Smad1 M.musculus D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L
Smad1 H.sapiens D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L
Smad1 S.scrofa D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L
Smad1 R.norvegicus D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L
Smad1 X.tropicalis D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N R N R F C L G L
Smad1 G.gallus D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S I L V D G F T D P S N N K N R F C L G L
Smad1 D.rerio D V H P V A Y Q E P K H W C S I V Y Y E L N N R V G E A F L A S S T S V L V D G F T D P S N N R N R F C L G L
Smad1 C.coturnix D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S I L V D G F T D P S N N K N R F C L G L
Smad5 H.sapiens D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K S R F C L G L
Smad5 M.musculus D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K S R F C L G L
Smad5 R.norvegicus D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P A N N K S R F C L G L
Smad5 G.gallus D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L
Smad5 D.rerio D V Q P V E Y Q E P S H W C S I V Y Y E L N N R V G E A Y H A S S T S V L V D G F T D P S N N K N R F C L G L
Smad8 M.musculus D F R P V C Y E E P Q H W C S V A Y Y E L N N R V G E T F Q A S S R S V L I D G F T D P S N N R N R F C L G L
Smad8 R.norvegicus D F R P V C Y E E P L H W C S V A Y Y E L N N R V G E T F Q A S S R S V L I D G F T D P S N N R N R F C L G L
Smad8 G.gallus N F R P V C Y E E P Q H W C S V A Y Y E L N N R V G E T F Q A S S R S I L I D G F T D P S N N K N R F C L G L
Residue-number ticks: 262, 270, 280, 290, 300, 310. Group labels: AR, BR.
Smads: Comparing two Groups
[34]
Smad-MH2 Alignment & Functionally Specific Sites
• 29 known sites of functional specificity
• based mostly on site-specific mutants, characterized by their binding affinity for the BMPR-I vs. TβR-I receptor types
Smad2 H.sapiens D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L
Smad2 D.melanogaster D A A P V M Y H E P A F W C S I S Y Y E L N T R V G E T F H A S Q P S I T V D G F T D P S N S E - R F C L G L
Smad2 D.rerio D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L
Smad2 C.auratus D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L
Smad2 R.norvegicus D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L
Smad2 M.musculus D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L
Smad2 D.rerio D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L C L
Smad3 S.scrofa D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L
Smad3 X.laevis D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L
Smad3 H.sapiens D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L
Smad3 M.musculus D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R L C L G L
Smad3 C.auratus D L Q P V T Y C E S A F W C S I S Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N A E - R F C L G L
Smad3 G.gallus D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L
Smad3 S.scrofa D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L
Smad3 R.norvegicus D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L
Smad1 S.mansoni T M H P V N Y Q E P K Y W C S I V Y Y E L N N R V G E A F N A S Q L S I I I D G F T D P S N N S D R F C L G L
Smad1 M.musculus D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L
Smad1 H.sapiens D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L
Smad1 S.scrofa D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L
Smad1 R.norvegicus D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L
Smad1 X.tropicalis D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N R N R F C L G L
Smad1 G.gallus D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S I L V D G F T D P S N N K N R F C L G L
Smad1 D.rerio D V H P V A Y Q E P K H W C S I V Y Y E L N N R V G E A F L A S S T S V L V D G F T D P S N N R N R F C L G L
Smad1 C.coturnix D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S I L V D G F T D P S N N K N R F C L G L
Smad5 H.sapiens D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K S R F C L G L
Smad5 M.musculus D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K S R F C L G L
Smad5 R.norvegicus D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P A N N K S R F C L G L
Smad5 G.gallus D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L
Smad5 D.rerio D V Q P V E Y Q E P S H W C S I V Y Y E L N N R V G E A Y H A S S T S V L V D G F T D P S N N K N R F C L G L
Smad8 M.musculus D F R P V C Y E E P Q H W C S V A Y Y E L N N R V G E T F Q A S S R S V L I D G F T D P S N N R N R F C L G L
Smad8 R.norvegicus D F R P V C Y E E P L H W C S V A Y Y E L N N R V G E T F Q A S S R S V L I D G F T D P S N N R N R F C L G L
Smad8 G.gallus N F R P V C Y E E P Q H W C S V A Y Y E L N N R V G E T F Q A S S R S I L I D G F T D P S N N K N R F C L G L
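Block above: alignment columns 1-55 (ticks at 10, 20, 30, 40, 50). The unlabeled blocks below continue the same 32 sequences in the same row order.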
L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K
L S N V N R N E V V E Q T R R H I G K G V R L Y Y I G G E V F A E C L S D S S I F V Q S P N C N Q R Y G W H P A T V C K
L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K
L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y D W H P A T V C K
L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K
L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K
L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K
L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K
L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D N A I F V Q S P N C N Q R Y G W H P A T V C K
L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K
L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K
L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K
L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K
L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K
L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K
L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H N F H P T T V C K
L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K
L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K
L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K
L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K
L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N F H H G F H P T T V C K
L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K
L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K
L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K
L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N F H H G F H P T T V C K
L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N F H H G F H P T T V C K
L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N F H H G F H P T T V C K
L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K
L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D T S I F V Q S R N C N Y H H G F H P T T V C K
L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C V S D S S I F V Q S R N C N Y Q H G F H P A T V C K
L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C V S D S S I F V Q S R N C N Y Q H G F H P A T V C K
L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C V S D S S I F V Q S R N C N Y Q H G F H P A T V C K
Block above: alignment columns 56-115 (ticks at 60, 70, 80, 90, 100, 110).
I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V
I P P G C N L K I F N N Q E F A A - - - - L L S Q S V S Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V
I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V
I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V
I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V
I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V
I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V
I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V
I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V
I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V
I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V
I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y R L T R M C T I R M S F V K G W G A E Y R R Q T V
I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V
I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V
I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V
I P P G C S L K I F S N Q E F A H - - - - L L S R T V H H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V
I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V
I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V
I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V
I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V
I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V
I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T L R M S F V K G W G A E Y H R Q D V
I P S R C S L K I F N N Q E F A E - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A K Y H R Q D V
I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T L R M S F V K G W G A E Y H R Q D V
I P S S C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V
I P S S C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V
I P S S C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V
I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V
I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V
I P S G C S L K V F N N Q L F A Q - - - - L L A Q S V H H G F E V V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V
I P S G C S L K V F N N Q L F A Q L L A Q L L A Q S V H H G F E V V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V
I P S G C S L K I F N N Q L F A Q - - - - P L A Q S V N H G F E V V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V
Block above: alignment columns 116-175 (ticks at 120, 130, 140, 150, 160, 170).
T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S
T S T P C W I E L H L N G P L Q W L D R V L T Q M G S P R L P C S S M S
T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S
T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S
T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S
T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S
T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S
T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S
T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S
T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S
T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S
T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P N L R C S S V S
T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S
T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S
T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S
T S T P C W V E I H L N G P L Q W L D R V L T Q M G T P R N P I S S V S
T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S
T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S
T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S
T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S
T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S
T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S
T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S
T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S
T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P L N P I S S V S
T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P L N P I S S V S
T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P L N P I S S V S
T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P L N P I S S V S
T S T P C W I E V H L H G P L Q W L D K V L T Q M G S P L N P I S S V S
T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S
T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S
T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S
Block above: alignment columns 176-211 (ticks at 180, 190, 200, 210).
Method Predict Specificity Error
AMAS 6 21% 3%
TreeDet 21 52% 21%
SDPpred 12 31% 10%
[35]
Finding Low-harmony sites in Smad-MH2
Pos.  Sec.str.  SH    AR   BR    Interaction
L263  B1’       0     La   Vfm   SARA
T267  B1’       0     Tm   Acen  SARA
S269  loop      0     CSh  Eq    ?? (putative)
A272  loop      0     A    Kqls  ?? (putative)
F273  loop      0     F    Hy    ?? (putative)
Q284  B2        0     Qt   N     TβR-I
Q294  loop      0.16  Q    Sq    c-Ski/SnoN
P295  B3        0     P    Trl   c-Ski/SnoN
L297  B3        0.11  LMi  Vi    c-Ski/SnoN
T298  B3        0     T    Li    c-Ski/SnoN
S308  L1        0     Sa   N     c-Ski/SnoN
–     L1        0     –    Nsd   c-Ski/SnoN
E309  L1        0     E    Krs   c-Ski/SnoN
A323  H1        0     Ae   S     ALK1/2
V325  H1        0     V    I     ?? (putative)
M327  H1        0     LMq  N     ALK1/2
R334  loop      0.18  Rk   K     ?? (putative)
R337  B5        0     R    H     ?? (putative)
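The SH values in the site table can be checked against individual alignment columns. The following is a minimal Python sketch, not the authors' implementation: it assumes that Sequence Harmony equals one minus the Jensen-Shannon divergence (in bits) between the residue compositions of the two groups, and it treats a gap as an ordinary symbol. The function names are hypothetical. The example columns are read off the labelled alignment (15 AR rows from Smad2/3, 17 BR rows from Smad1/5/8).

```python
from collections import Counter
from math import log2


def frequencies(column):
    """Relative residue frequencies in one alignment column (a string)."""
    counts = Counter(column)
    total = sum(counts.values())
    return {residue: n / total for residue, n in counts.items()}


def entropy(freqs):
    """Shannon entropy in bits of a {residue: probability} mapping."""
    return -sum(p * log2(p) for p in freqs.values() if p > 0)


def sequence_harmony(col_a, col_b):
    """Assumed form: SH = 1 - Jensen-Shannon divergence (base 2).

    0 means the two groups share no residue types at this position,
    1 means their compositions are identical.
    """
    p_a, p_b = frequencies(col_a), frequencies(col_b)
    residues = set(p_a) | set(p_b)
    mix = {r: 0.5 * (p_a.get(r, 0.0) + p_b.get(r, 0.0)) for r in residues}
    divergence = entropy(mix) - 0.5 * (entropy(p_a) + entropy(p_b))
    return 1.0 - divergence


# Position Q294: AR is all Q; BR has one Q and sixteen S.
sh_q294 = sequence_harmony("Q" * 15, "Q" + "S" * 16)

# Position L263: AR is L with one A; BR is V with one M and three F.
sh_l263 = sequence_harmony("L" * 14 + "A", "M" + "V" * 13 + "F" * 3)

print(f"Q294: {sh_q294:.2f}")
print(f"L263: {abs(sh_l263):.2f}")
```

Under this assumption the Q294 column evaluates to about 0.16, in line with the table, and the L263 column, where the two groups share no residue type, evaluates to 0.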
[Figure: excerpt of the Smad-MH2 alignment, showing the first 40 columns (residues 262 to 301) of the same 32 sequences as the full alignment; residue-number ticks at 270, 280, 290, 300]
[36]
Finding Low-harmony sites in Smad-MH2
Method                     Predict  Specificity  Error
AMAS                       6        21%          3%
TreeDet                    21       52%          21%
SDPpred                    12       31%          10%
Sequence Harmony (SH=0)    32       79%          28%
Sequence Harmony (SH<0.2)  40       93%          33%
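The two Sequence Harmony rows differ only in the cutoff applied to the per-site score. As a small illustration, the Python sketch below applies both cutoffs to the 18 sites listed in the site table (positions L263 to R337). The SH values are copied from that table; the label used for the unnumbered gap row is a placeholder. It covers only those listed sites, not the full sets of 32 and 40 predictions.

```python
# (position, SH) pairs copied from the site table; "gap(L1)" is a
# placeholder label for the unnumbered gap row in loop L1.
sites = [
    ("L263", 0.0), ("T267", 0.0), ("S269", 0.0), ("A272", 0.0),
    ("F273", 0.0), ("Q284", 0.0), ("Q294", 0.16), ("P295", 0.0),
    ("L297", 0.11), ("T298", 0.0), ("S308", 0.0), ("gap(L1)", 0.0),
    ("E309", 0.0), ("A323", 0.0), ("V325", 0.0), ("M327", 0.0),
    ("R334", 0.18), ("R337", 0.0),
]

# Strict cutoff: the two groups share no residue type at the position.
zero_sites = [pos for pos, sh in sites if sh == 0.0]

# Relaxed cutoff: a small overlap in composition is tolerated.
low_sites = [pos for pos, sh in sites if sh < 0.2]

# Sites gained by relaxing the cutoff from SH=0 to SH<0.2.
gained = [pos for pos in low_sites if pos not in zero_sites]

print(len(zero_sites), len(low_sites), gained)
```

Among the listed sites, relaxing the cutoff adds the three positions with a small compositional overlap between the groups.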
Pirovano, Feenstra & Heringa. “Sequence Comparison by Sequence Harmony Identifies Subtype Specific Functional Sites”, Nucleic Acids Res., in press (2006).
www.few.vu.nl/~feenstra/articles/NAR 2006 Sequence Harmony.pdf
[37]
Smad-MH2: Low Harmony Patches
[38]
Smad-MH2: Functional Clusters
[Figure labels]
Residues: R462, C463, Q400, R410, W368, Y366, A392, S269, F273, N443, Q294, Q309, L297, L440, N381, A354, V461, S460, Q407, Q364, P360, R365, T267, A272, I341, P295, S308, T298, R337, F346, P378, Q284, V325, A323, R427, M327, T430, R334
Interaction partners: FAST1, Mixer, SARA; c-Ski/SnoN; SARA; TβR-I/ALK1/2; TβR-I/BMPR-I; ?; SARA/Mixer; TβR-I/BMPR-I/ALK1/2; ?
Functional classes: receptor-binding; retention & transcription factors; co-repressors
[39]
Conclusions Smad-MH2
• 40 Sites of Low Sequence Harmony in Smad-MH2
• different between the AR (TGF-β) and BR (BMP) sub-type Smads
• Low Harmony sites in Smad-MH2 are functionally relevant
• Other methods cannot select all known sites!
Functional Sites are Interaction Surfaces on the Protein Surface. Next: Analyze Interaction Partners in the Pathway
• 14 Low Harmony Sites in Smad-MH2 of unknown function
• 11 putative functions from structural considerations
• promising candidates that determine TGF-β/BMP specificity
• confirm (or refute) putative functions?
[40]
Sequence Harmony Webserver
http://www.ibi.vu.nl/programs/seqharmwww1-b/
[41]
Sequence Harmony Webserver: Groups
[42]
Sequence Harmony Webserver: Reference
[43]
Sequence Harmony Webserver: Structure
[44]
SMAD Sequence Harmony: Raw Table
[45]
SMAD Sequence Harmony: Results
[46]
SMAD Sequence Harmony: Structure