Centre for Integrative Bioinformatics VU
Anton Feenstra, 23 nov 2006
Sequence Entropy
Sequence Analysis



Significance of Alignment Positions

• An observed occurrence of amino acids at some position in an alignment that deviates from what is expected may indicate some (functional) significance

• What does ‘deviates from expected’ mean?

• unlikely occurrences

• What is unlikely?

• only (relatively) few possibilities to obtain the observed result


Counting…

• Number of possibilities for finding given numbers N_x of amino acid types x:

$$ \frac{\left(\sum_x N_x\right)!}{\prod_x N_x!} $$

• Problem: evaluates to huge numbers even for modest numbers of sequences…
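The count above can be evaluated directly for small cases. The following is a minimal Python sketch; the function name `count_arrangements` and the example counts are illustrative assumptions, not taken from the slides.

```python
from math import factorial, prod

def count_arrangements(counts):
    """Number of distinct ways to arrange a column with the given per-type counts."""
    total = sum(counts)
    # (sum_x N_x)! / prod_x N_x!  (always an exact integer)
    return factorial(total) // prod(factorial(n) for n in counts)

# 7 sequences: 4 x 'A' and 3 x 'L' at one position
small = count_arrangements([4, 3])

# 100 sequences spread evenly over 20 amino acid types (hypothetical example)
large = count_arrangements([5] * 20)

print(small)
print(len(str(large)), "decimal digits")
```

Even the second, still modest, case gives a number with more than a hundred digits, which is why the logarithm of the count is more practical to work with.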


Sequence Entropy

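• Entropy:

$$ S = \ln \frac{\left(\sum_x N_x\right)!}{\prod_x N_x!} $$

• Using Stirling's approximation, ln M! ≈ M (ln M − 1) for M ≫ 1, and writing p_x = N_x / Σ_x N_x, this becomes (per sequence):

$$ S = -\sum_x p_x \log p_x $$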
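How good the Stirling step is can be checked numerically. This is a small sketch under the assumption that S is compared per sequence (the exact log-count divided by the number of sequences); the function names and counts are made up for illustration.

```python
from math import lgamma, log

def log_arrangements(counts):
    """ln( (sum_x N_x)! / prod_x N_x! ), using lgamma(n + 1) = ln n!."""
    total = sum(counts)
    return lgamma(total + 1) - sum(lgamma(n + 1) for n in counts)

def entropy_nats(counts):
    """-sum_x p_x ln p_x with p_x = N_x / N."""
    total = sum(counts)
    return -sum(n / total * log(n / total) for n in counts if n > 0)

counts = [6000, 3000, 1000]          # hypothetical counts of three residue types
n_total = sum(counts)

exact_per_sequence = log_arrangements(counts) / n_total
stirling = entropy_nats(counts)

print(exact_per_sequence, stirling)
```

For counts of this size the two values agree to about three decimal places; for very small counts the approximation is noticeably worse.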


Shannon’s ‘Information Entropy’:

• ‘A Mathematical Theory of Communication’, The Bell System Technical Journal, Vol. 27, 1948.

“ Can we define a quantity which will measure, in some sense, how much information is ‘produced’ by such a process, or better, at what rate information is produced? ”

• He was thinking about the Transmission of Information, i.e., from a Source through some Channel to a Destination.


Choice, Uncertainty and Entropy

• A set of ‘events’ with probabilities p1, p2, …, pn

• Is there a measure that indicates how much ‘choice’ is possible, given those probabilities?

• If there is, it should be:

• continuous for all pi

• monotonic in n if all probabilities are equal

• additive for ‘sub-events’


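Additivity:

[Figure: two decision trees. Left: a single choice among three outcomes with probabilities 1/2, 1/3 and 1/6. Right: a first choice between two branches with probabilities 1/2 and 1/2, followed on one branch by a second choice with probabilities 2/3 and 1/3, giving the same final probabilities 1/2, 1/3 and 1/6.]

$$ H(1/2, 1/3, 1/6) = H(1/2, 1/2) + \tfrac{1}{2}\, H(2/3, 1/3) $$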
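The additivity relation can be verified numerically with the entropy formula −Σ p log p. A minimal sketch, assuming base-2 logarithms (the relation holds for any base); the helper name `H` is chosen here to mirror the slide notation.

```python
from math import log2

def H(*probs):
    """Shannon entropy (in bits) of the given probabilities."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

# One choice among three outcomes ...
direct = H(1/2, 1/3, 1/6)

# ... versus a first choice (1/2, 1/2) followed, half of the time,
# by a second choice (2/3, 1/3).
staged = H(1/2, 1/2) + 1/2 * H(2/3, 1/3)

print(direct, staged)
```

The second choice is weighted by 1/2 because it only occurs in half of the cases.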


Solution: Entropy

• the entropy of a set of probabilities p_i:

$$ H = -\sum_{i=1}^{n} p_i \log p_i $$

• measures information, choice and uncertainty

• zero only if only one p_i is not zero

• there is only one choice

• maximal if all p_i are equal

• most ‘uncertain’ situation: all options are possible
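The two limiting cases listed above can be illustrated with a few lines of Python. This is a sketch with made-up probability sets and base-2 logarithms as an assumption; the function name `entropy` is illustrative.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits; terms with p = 0 contribute nothing."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

certain = entropy([1.0, 0.0, 0.0, 0.0])   # only one p_i is non-zero
uniform = entropy([0.25] * 4)             # all p_i equal
skewed = entropy([0.7, 0.1, 0.1, 0.1])    # somewhere in between

print(certain, skewed, uniform)
```

With n equally likely options the maximum is log n, here log2(4) = 2 bits.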


Information Content

• Shannon was thinking about the Transmission of Information, i.e., from a Source through some Channel to a Destination.

• …but it applies equally well to any type of ‘message’

• We can use it to measure the level of conservation in columns in an alignment


Simple Example: Sequence Entropy

A A A A A A L
A A A A A L L
A A A A L L L
A A A L L L L
A A L L L L L
A L L L L L L

[Figure: entropy H (axis 0 to 1.0) plotted against probability p (axis 0 to 1.0), with p1 = f(‘L’) and p2 = f(‘A’). H is zero where p1 = 0 or p2 = 0, and maximal where p1 = p2 = ½.]

$$ H = -\sum_{i=1}^{n} p_i \log p_i $$
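The example can be reproduced in a few lines of Python. This sketch assumes that each row of the example is one sequence and each column one alignment position, and uses base-2 logarithms so that the maximum for two residue types is 1; `column_entropy` is an illustrative name.

```python
from collections import Counter
from math import log2

alignment = [
    "AAAAAAL",
    "AAAAALL",
    "AAAALLL",
    "AAALLLL",
    "AALLLLL",
    "ALLLLLL",
]

def column_entropy(column):
    """Shannon entropy (bits) of the residue frequencies in one column."""
    n = len(column)
    return sum(c / n * log2(n / c) for c in Counter(column).values())

# zip(*alignment) walks over the alignment column by column
entropies = [column_entropy(col) for col in zip(*alignment)]

for position, h in enumerate(entropies, start=1):
    print(position, round(h, 3))
```

The fully conserved first and last columns give zero, and the middle column, with equal numbers of ‘A’ and ‘L’, gives the maximum.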


Conditional Probability / Prior Knowledge

• Suppose we observe two things (A and B), and we suspect a relation (A causes B)

• The fundamental question is then: How likely is A when we know B?

• (n.b., this is Bayesian statistics…)

• Or, what is the uncertainty (entropy) of A knowing B?


Conditional Entropy

• Entropy of joint occurrence of value i for event x and value j for event y:

$$ H(x,y) = -\sum_{i,j} p(i,j) \log p(i,j) $$

• and

$$ H(x) = -\sum_{i,j} p(i,j) \log \sum_j p(i,j) $$

$$ H(y) = -\sum_{i,j} p(i,j) \log \sum_i p(i,j) $$

• so that

$$ H(x,y) \le H(x) + H(y) $$

• i.e., the entropy of a joint event is less than or equal to the sum of the individual entropies

• it is equal only if the events are independent
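The inequality can be checked on a small joint distribution. The following sketch uses a made-up 2×2 table of joint probabilities p(i,j) and base-2 logarithms; all names are illustrative.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits of a flat list of probabilities."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

# Hypothetical joint probabilities p(i, j): rows are values of x, columns values of y
joint = [[0.4, 0.1],
         [0.1, 0.4]]

p_x = [sum(row) for row in joint]            # marginal over j
p_y = [sum(col) for col in zip(*joint)]      # marginal over i

h_xy = entropy([p for row in joint for p in row])
h_x, h_y = entropy(p_x), entropy(p_y)

# Independent events with the same marginals: p(i, j) = p(i) * p(j)
independent = [[a * b for b in p_y] for a in p_x]
h_independent = entropy([p for row in independent for p in row])

print(h_xy, h_x + h_y, h_independent)
```

For the correlated table the joint entropy is below the sum of the marginal entropies; for the independent table the two are equal.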


Co-occurrence in practice

• Measure of ‘mutual information’, by relative entropy:

$$ H_x(y) = -\sum_{i,j} p(i,j) \log \frac{p(i,j)}{\sum_j p(i,j)} $$

• more often written like:

$$ H(x\|y) = \sum_x p(x) \log \frac{p(x)}{p(y)} $$

• i.e., what is the entropy in x, given y


Relative Entropy in Sequence Analysis

• Many biological problems relate to questions like:

“Why do these proteins do this, and those proteins not?”

• or

“Why do these patients get sick, and those not?”

• The answer can be related to similarities and differences between sequences

• Similarities (conservation) relate to functionally critical positions

• Differences can explain functional differences


Comparing groups of Sequences

• For each position i in an alignment, we calculate the relative entropy for group A vs. B from the frequencies p (observed probabilities) of all amino acid types x, as follows:

$$ H_i^{A/B} = \sum_{x=1}^{n} p_{i,x}^{A} \log \frac{p_{i,x}^{A}}{p_{i,x}^{B}} $$
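As an illustration of the per-position calculation, here is a minimal Python sketch. The two small groups are invented, base-2 logarithms are an assumption, and the function names are hypothetical; a residue type that occurs in A but not in B is reported as infinite.

```python
from collections import Counter
from math import log2, inf

def frequencies(column):
    """Observed residue frequencies in one alignment column."""
    n = len(column)
    return {aa: c / n for aa, c in Counter(column).items()}

def relative_entropy_per_position(group_a, group_b):
    """H_i^{A/B} for every position i of two aligned groups of sequences."""
    result = []
    for col_a, col_b in zip(zip(*group_a), zip(*group_b)):
        p_a, p_b = frequencies(col_a), frequencies(col_b)
        h = 0.0
        for aa, p in p_a.items():
            q = p_b.get(aa, 0.0)
            if q == 0.0:          # residue seen in A but never in B
                h = inf
                break
            h += p * log2(p / q)
        result.append(h)
    return result

# Hypothetical groups, three positions each
group_a = ["AAL", "AAL", "ALL", "ALL"]
group_b = ["AAA", "ALA", "ALA", "ALA"]

values = relative_entropy_per_position(group_a, group_b)
print(values)
```

Position 1 is identical in both groups (zero), position 2 differs only in composition (small positive value), and position 3 has no overlap at all (infinite).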


Simple Example: Relative Entropy ‘A/B’

[Figure: per-position plot, y-axis ‘Entropy / Harmony’ from 0.0 to 3.0, for the example alignment below with groups labelled A and B. Plotted quantities:]

$$ \sum_x p_{i,x}^{A} \log \frac{p_{i,x}^{A}}{p_{i,x}^{B}} \qquad \sum_x p_{i,x}^{B} \log \frac{p_{i,x}^{B}}{p_{i,x}^{A}} $$

A A A A A A L A A A A A A L
A A A A A L L A A A A A L L
A A A A L L L A A A A L L L
A A A L L L L A A A L L L L
A A L L L L L A A L L L L L
A L L L L L L A L L L L L L
A A A A A A A A A A A A A A
A A A A A A A A A A A A A A
A A A A A A A L L L L L L L


Relative Entropy

• Captures similarities and differences

• Is infinite for completely dissimilar positions

• e.g., A vs. L

• Not symmetrical: $H_x(y) \neq H_y(x)$

• Maybe not easy for selecting dissimilar positions
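Both properties, the asymmetry and the infinite value for non-overlapping compositions, show up directly in a small calculation. A sketch with invented frequencies and base-2 logarithms; `relative_entropy` is an illustrative name.

```python
from math import log2, inf

def relative_entropy(p, q):
    """Sum over x of p[x] * log2(p[x] / q[x]); infinite if q[x] = 0 where p[x] > 0."""
    total = 0.0
    for x, px in p.items():
        if px == 0.0:
            continue
        qx = q.get(x, 0.0)
        if qx == 0.0:
            return inf
        total += px * log2(px / qx)
    return total

# Hypothetical residue frequencies at one position in groups A and B
p_a = {"A": 0.9, "L": 0.1}
p_b = {"A": 0.5, "L": 0.5}

a_vs_b = relative_entropy(p_a, p_b)
b_vs_a = relative_entropy(p_b, p_a)

# Completely dissimilar position: only 'A' in one group, only 'L' in the other
disjoint = relative_entropy({"A": 1.0}, {"L": 1.0})

print(a_vs_b, b_vs_a, disjoint)
```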


Simple Example: Relative Entropy ‘A/AB’

[Figure: per-position plot, y-axis ‘Entropy / Harmony’ from 0.0 to 3.0, for the same example alignment of groups A and B as in the ‘A/B’ example. Plotted quantities:]

$$ \sum_x p_{i,x}^{A} \log \frac{p_{i,x}^{A}}{p_{i,x}^{AB}} \qquad \sum_x p_{i,x}^{B} \log \frac{p_{i,x}^{B}}{p_{i,x}^{AB}} $$


Relative Entropy (A/AB)

• Also captures similarities and differences

• Is no longer infinite for completely dissimilar positions

• Only symmetrical for equal size groups

• in practice, not symmetrical

• Maybe still not too easy for selecting dissimilar positions
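The points above can be illustrated by comparing each group with the pooled group AB. In this sketch the pooled frequencies are taken from the concatenated columns of A and B (an assumption about how p_AB is formed), with base-2 logarithms and invented columns.

```python
from collections import Counter
from math import log2

def frequencies(column):
    n = len(column)
    return {aa: c / n for aa, c in Counter(column).items()}

def relative_entropy_to_pool(column, pooled):
    """Sum over x of p[x] * log2(p[x] / p_pool[x]); finite because the pool contains the group."""
    p, p_pool = frequencies(column), frequencies(pooled)
    return sum(v * log2(v / p_pool[aa]) for aa, v in p.items())

# Completely dissimilar position, equal group sizes
col_a, col_b = "AAAA", "LLLL"
equal_a = relative_entropy_to_pool(col_a, col_a + col_b)
equal_b = relative_entropy_to_pool(col_b, col_a + col_b)

# Same position, but group B has only two sequences
small_b = "LL"
unequal_a = relative_entropy_to_pool(col_a, col_a + small_b)
unequal_b = relative_entropy_to_pool(small_b, col_a + small_b)

print(equal_a, equal_b, unequal_a, unequal_b)
```

With equal group sizes both directions give the same finite value; with unequal sizes the larger group dominates the pool and the two values differ.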


Measuring Overlapping Distributions

• Weigh both groups equally; take p_A + p_B instead of p_AB:

$$ SH_i^{A/B} = -\sum_x p_{i,x}^{A} \log \frac{p_{i,x}^{A}}{p_{i,x}^{A} + p_{i,x}^{B}} $$

• Fixed interval [0,1], but not completely symmetrical


Entropy vs. Sequence Harmony: Example

[Figure: per-position plot, y-axis ‘Entropy / Harmony’ from 0.0 to 3.0, for the same example alignment of groups A and B as in the ‘A/B’ example. Plotted quantities:]

$$ -\sum_x p_{i,x}^{A} \log \frac{p_{i,x}^{A}}{p_{i,x}^{A} + p_{i,x}^{B}} \qquad -\sum_x p_{i,x}^{B} \log \frac{p_{i,x}^{B}}{p_{i,x}^{A} + p_{i,x}^{B}} $$


Sequence Harmony

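• Introduce symmetry by averaging:

$$ SH_i^{A/B} = \tfrac{1}{2}\left( SH_i^{A/AB} + SH_i^{B/AB} \right) $$

• May seem a trivial choice, but:

$$ SH_i^{A/B} = -\frac{1}{2}\left[ \sum_x p_{i,x}^{A} \log \frac{p_{i,x}^{A}}{p_{i,x}^{A} + p_{i,x}^{B}} + \sum_x p_{i,x}^{B} \log \frac{p_{i,x}^{B}}{p_{i,x}^{A} + p_{i,x}^{B}} \right] $$

$$ = -\frac{1}{2} \sum_x \left[ p_{i,x}^{A} \log p_{i,x}^{A} + p_{i,x}^{B} \log p_{i,x}^{B} - \left(p_{i,x}^{A} + p_{i,x}^{B}\right) \log \left(p_{i,x}^{A} + p_{i,x}^{B}\right) \right] $$

$$ = \frac{1}{2} \left[ H_i^{A} + H_i^{B} + \sum_x \left(p_{i,x}^{A} + p_{i,x}^{B}\right) \log \left(p_{i,x}^{A} + p_{i,x}^{B}\right) \right] $$

$$ = \tfrac{1}{2}\left( H_i^{A} + H_i^{B} \right) - H_i^{A+B} + \log 2 $$

where H_i^{A+B} is taken to be the entropy of the averaged distribution ½(p^A + p^B).

Pirovano, Feenstra & Heringa. “Sequence Comparison by Sequence Harmony Identifies Subtype Specific Functional Sites”, Nucleic Acids Res., in press (2006).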
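The averaged measure can be sketched in Python as a numerical check. This is an illustration under stated assumptions, not the published implementation: base-2 logarithms, a leading minus sign so that values fall in [0,1], and the combined entropy taken over the averaged distribution ½(p_A + p_B). Function names and example columns are invented.

```python
from collections import Counter
from math import log2

def frequencies(column):
    n = len(column)
    return {aa: c / n for aa, c in Counter(column).items()}

def entropy(p):
    return sum(v * log2(1 / v) for v in p.values() if v > 0)

def sequence_harmony(col_a, col_b):
    """Averaged form: -1/2 * sum_x [ pA log(pA/(pA+pB)) + pB log(pB/(pA+pB)) ]."""
    p_a, p_b = frequencies(col_a), frequencies(col_b)
    sh = 0.0
    for x in set(p_a) | set(p_b):
        a, b = p_a.get(x, 0.0), p_b.get(x, 0.0)
        if a > 0:
            sh -= 0.5 * a * log2(a / (a + b))
        if b > 0:
            sh -= 0.5 * b * log2(b / (a + b))
    return sh

def sequence_harmony_from_entropies(col_a, col_b):
    """Same quantity written as 1/2 (H_A + H_B) - H_{A+B} + log 2."""
    p_a, p_b = frequencies(col_a), frequencies(col_b)
    averaged = {x: 0.5 * (p_a.get(x, 0.0) + p_b.get(x, 0.0))
                for x in set(p_a) | set(p_b)}
    return 0.5 * (entropy(p_a) + entropy(p_b)) - entropy(averaged) + 1.0  # log2(2) = 1

identical = sequence_harmony("AAAL", "AAAL")
disjoint = sequence_harmony("AAAA", "LLLL")
mixed = sequence_harmony("AAAL", "ALLL")
mixed_check = sequence_harmony_from_entropies("AAAL", "ALLL")

print(identical, disjoint, mixed, mixed_check)
```

Under these assumptions identical compositions give 1, non-overlapping compositions give 0, and swapping the two groups does not change the value.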


Entropy vs. Sequence Harmony: Example

[Figure: per-position plot, y-axis ‘Entropy / Harmony’ from 0.0 to 3.0, for the same example alignment of groups A and B as in the ‘A/B’ example. Plotted quantity:]

$$ \tfrac{1}{2}\left( H^{A} + H^{B} \right) - H^{A+B} + \log 2 $$


Analyzing multiple groups

• Relative Entropy and Sequence Harmony are defined to compare a pair of groups (N = 2)

• problem for multiple groups (N > 2)

• A solution: Entropy-variability plots

• variability is the number of different amino acid types at a certain alignment position

• problem: variability always tends towards the maximum (20) for larger numbers of sequences

• Another solution: Two-entropy plots

• Total family entropy vs. sum of sub-family entropies
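The quantities named above can be computed per position as follows. A rough sketch only: the sub-family columns are invented, base-2 logarithms are assumed, and details such as gap handling or normalisation are left out; `position_summary` is a hypothetical helper.

```python
from collections import Counter
from math import log2

def column_entropy(column):
    """Shannon entropy (bits) of the residue frequencies in one column."""
    n = len(column)
    return sum(c / n * log2(n / c) for c in Counter(column).values())

def position_summary(subfamily_columns):
    """Variability, total family entropy and summed sub-family entropy for one position."""
    pooled = "".join(subfamily_columns)
    return {
        "variability": len(set(pooled)),
        "total_entropy": column_entropy(pooled),
        "subfamily_entropy_sum": sum(column_entropy(c) for c in subfamily_columns),
    }

# One alignment position, split into three hypothetical sub-families
conserved = position_summary(["AAAA", "AAAA", "AAAA"])
specific = position_summary(["AAAA", "LLLL", "VVVV"])
variable = position_summary(["ALVG", "ALVG", "ALVG"])

for name, summary in [("conserved", conserved), ("specific", specific), ("variable", variable)]:
    print(name, summary)
```

A position that is conserved within each sub-family but differs between them has a high total entropy and a low summed sub-family entropy, which separates it from both fully conserved and fully variable positions.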


Two-Entropies analysis of GPCRs

• Goal: to identify the function of individual positions

Ye, Lameijer, Beukers & IJzerman. “A Two-Entropies Analysis to Identify Functional Positions in the Transmembrane Region of Class A G Protein-Coupled Receptors”, Proteins 63, pp. 1018–1030 (2006).


G-protein Coupled Receptors (GPCRs)

• Huge family of integral cell-membrane proteins

• crucial in signal transduction

• 70 subfamilies

• 1935 Class A GPCRs

• Three main regions:

• extracellular side

• transmembrane (TM)

• cytoplasmic side


Two-entropies vs. Entropy-variability: upper vs. lower domain

Red: upper domain Blue: lower domain


Two-entropies vs. Entropy-variability: relative solvent accessibility (RSA)

Red: RSA < 15% Blue: RSA > 15%


Two-entropies vs. Entropy-variability: ligand binding site vs. other positions

Red: <4 Å from retinal Blue: >4 Å from retinal

Subfamily specific binding

Common activation mechanism


GPCR Two-entropies analysis

Red: ligand binding Green: coupling & activation Blue: others


GPCR Functional Site Prediction: Method Comparison

Finding known sites for ligand binding in Bovine Rhodopsin (left) and Aminergic Receptors (right)


Smad-MH2 Alignment & Sequence Harmony

• Walter Pirovano*, K. Anton Feenstra* and Jaap Heringa. “Sequence Comparison by Sequence Harmony Identifies Subtype Specific Functional Sites”, Nucleic Acids Res., in press (2006).

• K. Anton Feenstra, Walter Pirovano and Jaap Heringa. “Sub-type Specific Sites for SMAD Receptor Binding Identified by Sequence Comparison using ‘Sequence Harmony’ ”. in: From Computational Biophysics to Systems Biology. pp. 73-78. Eds. U.H.E. Hansmann, J. Meinke, S. Mohanty and O. Zimmermann, Jülich, NIC Series, Vol. 34, 2006.

• Elena Marchiori*, Walter Pirovano, Jaap Heringa and K. Anton Feenstra*. “A Feature Selection Algorithm for Detecting Subtype Specific Sites for Smad Receptor Binding”, Bio-ICMLA06, accepted (2006).

Smad2 H.sapiens  D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L

Smad2 D.melanogaster   D A A P V M Y H E P A F W C S I S Y Y E L N T R V G E T F H A S Q P S I T V D G F T D P S N S E - R F C L G L

Smad2 D.rerio   D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L

Smad2 C.auratus   D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L

Smad2 R.norvegicus   D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L

Smad2 M.musculus   D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L

Smad2 D.rerio   D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L C L

Smad3 S.scrofa   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad3 X.laevis   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad3 H.sapiens   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad3 M.musculus   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R L C L G L

Smad3 C.auratus   D L Q P V T Y C E S A F W C S I S Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N A E - R F C L G L

Smad3 G.gallus   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad3 S.scrofa   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad3 R.norvegicus   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad1 S.mansoni   T M H P V N Y Q E P K Y W C S I V Y Y E L N N R V G E A F N A S Q L S I I I D G F T D P S N N S D R F C L G L

Smad1 M.musculus   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad1 H.sapiens   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad1 S.scrofa   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad1 R.norvegicus   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad1 X.tropicalis   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N R N R F C L G L

Smad1 G.gallus   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S I L V D G F T D P S N N K N R F C L G L

Smad1 D.rerio   D V H P V A Y Q E P K H W C S I V Y Y E L N N R V G E A F L A S S T S V L V D G F T D P S N N R N R F C L G L

Smad1 C.coturnix   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S I L V D G F T D P S N N K N R F C L G L

Smad5 H.sapiens   D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K S R F C L G L

Smad5 M.musculus   D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K S R F C L G L

Smad5 R.norvegicus   D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P A N N K S R F C L G L

Smad5 G.gallus   D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad5 D.rerio   D V Q P V E Y Q E P S H W C S I V Y Y E L N N R V G E A Y H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad8 M.musculus   D F R P V C Y E E P Q H W C S V A Y Y E L N N R V G E T F Q A S S R S V L I D G F T D P S N N R N R F C L G L

Smad8 R.norvegicus   D F R P V C Y E E P L H W C S V A Y Y E L N N R V G E T F Q A S S R S V L I D G F T D P S N N R N R F C L G L

Smad8 G.gallus   N F R P V C Y E E P Q H W C S V A Y Y E L N N R V G E T F Q A S S R S I L I D G F T D P S N N K N R F C L G L

[Alignment position ruler: ticks at 10, 20, 30, 40 and 50]

L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N E V V E Q T R R H I G K G V R L Y Y I G G E V F A E C L S D S S I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y D W H P A T V C K

L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D N A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H N F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N F H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N F H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N F H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N F H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D T S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C V S D S S I F V Q S R N C N Y Q H G F H P A T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C V S D S S I F V Q S R N C N Y Q H G F H P A T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C V S D S S I F V Q S R N C N Y Q H G F H P A T V C K

[Alignment position ruler: ticks at 60, 70, 80, 90, 100 and 110]

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L S Q S V S Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y R L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C S L K I F S N Q E F A H - - - - L L S R T V H H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T L R M S F V K G W G A E Y H R Q D V

I P S R C S L K I F N N Q E F A E - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A K Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T L R M S F V K G W G A E Y H R Q D V

I P S S C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S S C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S S C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K V F N N Q L F A Q - - - - L L A Q S V H H G F E V V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K V F N N Q L F A Q L L A Q L L A Q S V H H G F E V V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q L F A Q - - - - P L A Q S V N H G F E V V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

[Alignment position ruler: ticks at 120, 130, 140, 150, 160 and 170]

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S

T S T P C W I E L H L N G P L Q W L D R V L T Q M G S P R L P C S S M S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P N L R C S S V S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S

T S T P C W V E I H L N G P L Q W L D R V L T Q M G T P R N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P L N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P L N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P L N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P L N P I S S V S

T S T P C W I E V H L H G P L Q W L D K V L T Q M G S P L N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

Alignment column ticks: 180, 190, 200, 210

[Plot axes: horizontal 260 to 460 in steps of 20; vertical 0 to 1]

Page 33

Smad2 H.sapiens  D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L

Smad2 D.melanogaster   D A A P V M Y H E P A F W C S I S Y Y E L N T R V G E T F H A S Q P S I T V D G F T D P S N S E - R F C L G L

Smad2 D.rerio   D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L

Smad2 C.auratus   D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L

Smad2 R.norvegicus   D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L

Smad2 M.musculus   D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L

Smad2 D.rerio   D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L C L

Smad3 S.scrofa   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad3 X.laevis   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad3 H.sapiens   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad3 M.musculus   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R L C L G L

Smad3 C.auratus   D L Q P V T Y C E S A F W C S I S Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N A E - R F C L G L

Smad3 G.gallus   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad3 S.scrofa   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad3 R.norvegicus   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad1 S.mansoni   T M H P V N Y Q E P K Y W C S I V Y Y E L N N R V G E A F N A S Q L S I I I D G F T D P S N N S D R F C L G L

Smad1 M.musculus   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad1 H.sapiens   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad1 S.scrofa   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad1 R.norvegicus   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad1 X.tropicalis   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N R N R F C L G L

Smad1 G.gallus   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S I L V D G F T D P S N N K N R F C L G L

Smad1 D.rerio   D V H P V A Y Q E P K H W C S I V Y Y E L N N R V G E A F L A S S T S V L V D G F T D P S N N R N R F C L G L

Smad1 C.coturnix   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S I L V D G F T D P S N N K N R F C L G L

Smad5 H.sapiens   D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K S R F C L G L

Smad5 M.musculus   D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K S R F C L G L

Smad5 R.norvegicus   D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P A N N K S R F C L G L

Smad5 G.gallus   D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad5 D.rerio   D V Q P V E Y Q E P S H W C S I V Y Y E L N N R V G E A Y H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad8 M.musculus   D F R P V C Y E E P Q H W C S V A Y Y E L N N R V G E T F Q A S S R S V L I D G F T D P S N N R N R F C L G L

Smad8 R.norvegicus   D F R P V C Y E E P L H W C S V A Y Y E L N N R V G E T F Q A S S R S V L I D G F T D P S N N R N R F C L G L

Smad8 G.gallus   N F R P V C Y E E P Q H W C S V A Y Y E L N N R V G E T F Q A S S R S I L I D G F T D P S N N K N R F C L G L

Residue position ticks: 262, 270, 280, 290, 300, 310

AR: Smad2 and Smad3 rows

BR: Smad1, Smad5 and Smad8 rows

Smads: Comparing two Groups
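The two-group comparison can be made concrete by counting residue types per group in a single alignment column. The following is a minimal sketch, not the lecture's own code: the rows are the first 12 columns of the alignment above, the assignment of Smad2/Smad3 to AR and Smad1/Smad5/Smad8 to BR is inferred from the slide labels, and the names `ROWS`, `GROUPS`, `FIRST_RESIDUE` and `column_counts` are hypothetical.

```python
from collections import Counter

# First 12 alignment columns (Smad2 residues 262-273) as read from the slide.
# Identical rows are listed once with a repeat count to keep the listing short.
ROWS = (
    [("Smad2", "DLQPVTYSEPAF")] * 6
    + [("Smad2", "DAAPVMYHEPAF")]
    + [("Smad3", "DLQPVTYCEPAF")] * 7
    + [("Smad3", "DLQPVTYCESAF")]
    + [("Smad1", "TMHPVNYQEPKY")]
    + [("Smad1", "DVQAVAYEEPKH")] * 7
    + [("Smad1", "DVHPVAYQEPKH")]
    + [("Smad5", "DVQPVAYEEPKH")] * 4
    + [("Smad5", "DVQPVEYQEPSH")]
    + [("Smad8", "DFRPVCYEEPQH"),
       ("Smad8", "DFRPVCYEEPLH"),
       ("Smad8", "NFRPVCYEEPQH")]
)

# Assumed grouping, inferred from the AR/BR labels on the slide.
GROUPS = {"AR": {"Smad2", "Smad3"}, "BR": {"Smad1", "Smad5", "Smad8"}}

# Smad2 residue number of the first column shown (assumption from the ticks).
FIRST_RESIDUE = 262


def column_counts(rows, families, residue):
    """Count residue types at one Smad2 position within the given families."""
    col = residue - FIRST_RESIDUE
    return Counter(seq[col] for family, seq in rows if family in families)


if __name__ == "__main__":
    for name, families in GROUPS.items():
        print(name, dict(column_counts(ROWS, families, 263)))
```

For position 263 this gives L (14) and A (1) in the AR group against V (13), F (3) and M (1) in the BR group, so the two groups share no residue type at that position.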

Page 34

Smad-MH2 Alignment & Functionally Specific Sites

• 29 known sites of functional specificity

• based mostly on site-specific mutants, characterized by their affinity for binding to BMPR-I vs. TβR-I receptor types

Smad2 H.sapiens  D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L

Smad2 D.melanogaster   D A A P V M Y H E P A F W C S I S Y Y E L N T R V G E T F H A S Q P S I T V D G F T D P S N S E - R F C L G L

Smad2 D.rerio   D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L

Smad2 C.auratus   D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L

Smad2 R.norvegicus   D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L

Smad2 M.musculus   D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L

Smad2 D.rerio   D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L C L

Smad3 S.scrofa   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad3 X.laevis   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad3 H.sapiens   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad3 M.musculus   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R L C L G L

Smad3 C.auratus   D L Q P V T Y C E S A F W C S I S Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N A E - R F C L G L

Smad3 G.gallus   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad3 S.scrofa   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad3 R.norvegicus   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad1 S.mansoni   T M H P V N Y Q E P K Y W C S I V Y Y E L N N R V G E A F N A S Q L S I I I D G F T D P S N N S D R F C L G L

Smad1 M.musculus   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad1 H.sapiens   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad1 S.scrofa   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad1 R.norvegicus   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad1 X.tropicalis   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N R N R F C L G L

Smad1 G.gallus   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S I L V D G F T D P S N N K N R F C L G L

Smad1 D.rerio   D V H P V A Y Q E P K H W C S I V Y Y E L N N R V G E A F L A S S T S V L V D G F T D P S N N R N R F C L G L

Smad1 C.coturnix   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S I L V D G F T D P S N N K N R F C L G L

Smad5 H.sapiens   D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K S R F C L G L

Smad5 M.musculus   D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K S R F C L G L

Smad5 R.norvegicus   D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P A N N K S R F C L G L

Smad5 G.gallus   D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad5 D.rerio   D V Q P V E Y Q E P S H W C S I V Y Y E L N N R V G E A Y H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad8 M.musculus   D F R P V C Y E E P Q H W C S V A Y Y E L N N R V G E T F Q A S S R S V L I D G F T D P S N N R N R F C L G L

Smad8 R.norvegicus   D F R P V C Y E E P L H W C S V A Y Y E L N N R V G E T F Q A S S R S V L I D G F T D P S N N R N R F C L G L

Smad8 G.gallus   N F R P V C Y E E P Q H W C S V A Y Y E L N N R V G E T F Q A S S R S I L I D G F T D P S N N K N R F C L G L

Alignment column ticks: 10, 20, 30, 40, 50

L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N E V V E Q T R R H I G K G V R L Y Y I G G E V F A E C L S D S S I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y D W H P A T V C K

L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D N A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H N F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N F H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N F H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N F H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N F H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D T S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C V S D S S I F V Q S R N C N Y Q H G F H P A T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C V S D S S I F V Q S R N C N Y Q H G F H P A T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C V S D S S I F V Q S R N C N Y Q H G F H P A T V C K

Alignment column ticks: 60, 70, 80, 90, 100, 110

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L S Q S V S Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y R L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C S L K I F S N Q E F A H - - - - L L S R T V H H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T L R M S F V K G W G A E Y H R Q D V

I P S R C S L K I F N N Q E F A E - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A K Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T L R M S F V K G W G A E Y H R Q D V

I P S S C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S S C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S S C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K V F N N Q L F A Q - - - - L L A Q S V H H G F E V V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K V F N N Q L F A Q L L A Q L L A Q S V H H G F E V V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q L F A Q - - - - P L A Q S V N H G F E V V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

Alignment column ticks: 120, 130, 140, 150, 160, 170

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S

T S T P C W I E L H L N G P L Q W L D R V L T Q M G S P R L P C S S M S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P N L R C S S V S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S

T S T P C W V E I H L N G P L Q W L D R V L T Q M G T P R N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P L N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P L N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P L N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P L N P I S S V S

T S T P C W I E V H L H G P L Q W L D K V L T Q M G S P L N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

Alignment column ticks: 180, 190, 200, 210

Method    Predict  Specificity  Error
AMAS      6        21%          3%
TreeDet   21       52%          21%
SDPpred   12       31%          10%

Page 35

Finding Low-harmony sites in Smad-MH2

Pos.  Sec.str.  SH    AR   BR    Interaction
L263  B1’       0     La   Vfm   SARA
T267  B1’       0     Tm   Acen  SARA
S269  loop      0     CSh  Eq    ?? (putative)
A272  loop      0     A    Kqls  ?? (putative)
F273  loop      0     F    Hy    ?? (putative)
Q284  B2        0     Qt   N     TβR-I
Q294  loop      0.16  Q    Sq    c-Ski/SnoN
P295  B3        0     P    Trl   c-Ski/SnoN
L297  B3        0.11  LMi  Vi    c-Ski/SnoN
T298  B3        0     T    Li    c-Ski/SnoN
S308  L1        0     Sa   N     c-Ski/SnoN
–     L1        0     –    Nsd   c-Ski/SnoN
E309  L1        0     E    Krs   c-Ski/SnoN
A323  H1        0     Ae   S     ALK1/2
V325  H1        0     V    I     ?? (putative)
M327  H1        0     LMq  N     ALK1/2
R334  Loop      0.18  Rk   K     ?? (putative)
R337  B5        0     R    H     ?? (putative)
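The SH column can be checked numerically from the per-group residue counts at a position. The sketch below is an illustration under a stated assumption: it takes SH to be one minus the base-2 Jensen-Shannon divergence between the two groups' residue frequencies, which gives 0 for non-overlapping compositions and 1 for identical ones. That form should be checked against the definition used in the lecture and the paper. The function names are hypothetical, and the counts for Q294 (AR: Q in all 15 rows; BR: S in 16 rows, Q in 1) are read off the alignment.

```python
import math
from collections import Counter


def frequencies(counts):
    """Turn residue counts into relative frequencies."""
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def entropy(freqs):
    """Shannon entropy in bits of a frequency distribution."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


def sequence_harmony(counts_a, counts_b):
    """Assumed form: SH = 1 - Jensen-Shannon divergence (base 2)."""
    pa, pb = frequencies(counts_a), frequencies(counts_b)
    mix = {aa: 0.5 * (pa.get(aa, 0.0) + pb.get(aa, 0.0))
           for aa in set(pa) | set(pb)}
    jsd = entropy(mix) - 0.5 * (entropy(pa) + entropy(pb))
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 1.0 - jsd)


if __name__ == "__main__":
    # Q294: the groups overlap only through a single Q in the BR group.
    q294 = sequence_harmony(Counter(Q=15), Counter(S=16, Q=1))
    print(round(q294, 2))  # 0.16

    # L263: no residue type is shared between the groups.
    l263 = sequence_harmony(Counter(L=14, A=1), Counter(V=13, F=3, M=1))
    print(l263 < 1e-9)
```

Under this assumption the value for Q294 comes out at about 0.16, in line with the table, and positions whose AR and BR columns share no residue type score 0.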

[Alignment excerpt: first 40 columns (residues 262 to 301) of the 32 Smad sequences, with residue position ticks at 270, 280, 290 and 300]

Page 36

Finding Low-harmony sites in Smad-MH2

Method                      Predict  Specificity  Error
AMAS                        6        21%          3%
TreeDet                     21       52%          21%
SDPpred                     12       31%          10%
Sequence Harmony (SH=0)     32       79%          28%
Sequence Harmony (SH<0.2)   40       93%          33%

Pirovano, Feenstra & Heringa. “Sequence Comparison by Sequence Harmony Identifies Subtype Specific Functional Sites”, Nucleic Acids Res., in press (2006). www.few.vu.nl/~feenstra/articles/NAR 2006 Sequence Harmony.pdf

Page 37

Smad-MH2: Low Harmony Patches

Page 38

Smad-MH2: Functional Clusters

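[Figure: Smad-MH2 structure with labelled residues grouped into functional clusters]

Labelled residues: T267, S269, A272, F273, Q284, Q294, P295, L297, T298, S308, Q309, A323, V325, M327, R334, R337, I341, F346, A354, P360, Q364, R365, Y366, W368, P378, N381, A392, Q400, Q407, R410, R427, T430, L440, N443, S460, V461, R462, C463

Cluster labels: FAST1, Mixer, SARA; c-Ski/SnoN; SARA; TβR-I/ALK1/2; TβR-I/BMPR-I; ?; SARA/Mixer; TβR-I/BMPR-I/ALK1/2; ?

Functional classes: receptor-binding; retention & transcription factors; co-repressors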

Page 39

Conclusions Smad-MH2

• 40 sites of low Sequence Harmony in Smad-MH2

• different between the AR (TGF-β) and BR (BMP) sub-type Smads

• Low Harmony sites in Smad-MH2 are functionally relevant

• Other methods cannot select all known sites!

• 14 Low Harmony sites in Smad-MH2 of unknown function

• 11 putative functions from structural considerations

• promising candidates that determine TGF-β/BMP specificity

• confirm (or refute) putative functions?

Functional sites are interaction surfaces on the protein surface. Next: analyze interaction partners in the pathway

Page 40

Sequence Harmony Webserver

http://www.ibi.vu.nl/programs/seqharmwww1-b/

Page 41

Sequence Harmony Webserver: Groups

Page 42

Sequence Harmony Webserver: Reference

Page 43

Sequence Harmony Webserver: Structure

Page 44

SMAD Sequence Harmony: Raw Table

Page 45

SMAD Sequence Harmony: Results

Page 46

SMAD Sequence Harmony: Structure