
Page 1

Entropy and Information

Eduardo Eyras Computational Genomics

Pompeu Fabra University - ICREA Barcelona, Spain

Master in Bioinformatics UPF 2017-2018

Page 2

What are the best variables to describe our model? (feature/attribute selection)

Can we compare two probabilistic models?

Is our model informative (i.e., different from random)?

Which model is more informative?

Page 3

E.g. see “The Information”, by James Gleick

We can use information

Page 4

Information

Messages (e.g. nucleotide sequences) can be encoded in different ways for transmission. E.g. binary encoding: the bit (binary digit) is a variable that can assume the value 0 or 1. Consider a 2-bit encoding of the nucleotides:

A = 00, C = 01, G = 10, T = 11

Any string of nucleotides can then be expressed as a string of 0s and 1s, but always with 2 bits per symbol. E.g. consider the following sequence:

ACATGAAC = 0001001110000001

We have used 16 binary digits to encode 8 symbols, thus 2 bits per symbol. That is the expected number of bits per symbol using this encoding.

However, this assumes that all nucleotides are equally probable. This would not be an optimal encoding if in our sequences one of the symbols, e.g. A, occurs more frequently than the others.

Page 5

Information

Consider a discrete random variable X with possible values {x1, ..., xn}. The Shannon self-information of an outcome is defined as:

I(x_i) = − log2 P(x_i)

It is measured in bits. It is also called the surprisal of X for a given value x_i. If P(x_i) is very low, I(x_i) is very high (we are highly surprised to see x_i). If P(x_i) is close to 1, I(x_i) is almost zero (we are not surprised to see x_i at all). I(x_i) measures the optimal code length assigned to a symbol x_i of probability P(x_i).

[Plot: f(x) = − log2 x]

Page 6

Information

Consider a sequence where the nucleotides appear with the following probabilities:

P(A) = 1/2, P(C) = 1/4, P(G) = 1/8, P(T) = 1/8

According to information theory, the optimal-length encoding is:

− log2 P(A) = 1 bit, − log2 P(C) = 2 bits, − log2 P(G) = 3 bits, − log2 P(T) = 3 bits

Considering the corresponding recoding

A = 1, C = 01, G = 000, T = 001

we now have ACATGAAC = 10110010001101. We have used 14 binary digits to encode 8 symbols, hence 14/8 = 1.75 bits per symbol. We obtain a lower expected number of bits per symbol.

We should use an encoding such that the more frequent the symbol, the fewer bits we use for it. The average encoding length is then minimized.
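As a quick check of the 1.75 figure, here is a minimal Python sketch (illustrative, not from the slides) computing the expected code length, i.e. the sum of P(x) times (− log2 P(x)), for this distribution:

import math

# Nucleotide probabilities from the example above
p = {"A": 1/2, "C": 1/4, "G": 1/8, "T": 1/8}

# Each symbol x contributes P(x) * (-log2 P(x)) bits on average
expected_bits = sum(prob * -math.log2(prob) for prob in p.values())
print(expected_bits)  # 1.75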

Page 7

Expected values

The expected value of a random variable X that takes on numerical values x is defined as:

E[X] = Σ_{i=1}^{n} x_i P(x_i)

which is the same thing as the mean. We can also calculate the expected value of a function of a random variable:

E[g(X)] = Σ_{i=1}^{n} g(x_i) P(x_i)

Page 8

Entropy

Consider a string of values taken from {x1, ..., xN} such that each value x_i appears n_i times, and

M = Σ_{i=1}^{N} n_i

The average number of bits per symbol needed to encode the message is written as:

(1/M) Σ_{i=1}^{N} n_i I(x_i),   where I(x_i) = − log2 P(x_i)

(1/M) Σ_{i=1}^{N} n_i I(x_i) = Σ_{i=1}^{N} (n_i / M) I(x_i)  →  Σ_{i=1}^{N} P(x_i) I(x_i) = − Σ_{i=1}^{N} P(x_i) log2 P(x_i)   as M → ∞

The average number of bits per symbol converges to the expected value of the surprisal for a large number of observations.

Page 9

Entropy

H(X) = − Σ_{i=1}^{N} P(x_i) log2 P(x_i)

Thus the entropy is defined as the expected (average) number of bits per symbol needed to encode a string of symbols x_i drawn from a set of possible ones {x1, ..., xN}.

The log is taken in base 2 and the entropy is measured in bits. Entropy is also a measure of the uncertainty associated with a variable...
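A minimal Python sketch of this definition (the function name is illustrative, not from the slides):

import math

def entropy(probs):
    # Shannon entropy in bits of a discrete distribution given as probabilities;
    # terms with p = 0 are skipped, using the limit x*log(x) -> 0
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([1/2, 1/4, 1/8, 1/8]))  # 1.75 bits, the distribution from Page 6
print(entropy([0.25] * 4))            # 2.0 bits, the uniform case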

Page 10

Entropy and uncertainty

The entropy is minimal if we are absolutely certain about the outcome of a value from the distribution, for instance P(x_j) = 1 for a specific j, and P(x_i) = 0 for x_i ≠ x_j:

H(X) = − Σ_{i=1}^{N} P(x_i) log2 P(x_i) = − log2 P(x_j) = 0,   since P(x_j) = 1 and lim_{x→0} x log x = 0

Intuitively: we do not need any information to know the transmitted message.

Page 11

Entropy and uncertainty

Entropy achieves its unique maximum for the uniform distribution. That is, the entropy is maximal when all outcomes are equally probable, or equivalently, when we are maximally uncertain about the outcome of the random variable. For P(x_i) = 1/N:

H(X) = − Σ_{i=1}^{N} P(x_i) log2 P(x_i) = − N (1/N) log2(1/N) = log2 N

Intuitively: under equiprobability, the Shannon information of a character in an alphabet of size N is I = log2(N), i.e. we must wait for all binary bits to know the transmitted message.

The maximal entropy depends only on the number of symbols N.

Page 12

Entropy and uncertainty

For N = 2, with 0 < p, q < 1 and p + q = 1:

H(X) = − p log2 p − q log2 q = − p log2 p − (1 − p) log2(1 − p)

[Plot: H(X) as a function of p, rising from 0 at p = 0 to a maximum of 1 bit at p = 0.5 and back to 0 at p = 1]

Page 13

Information content

If you are told the outcome of an event, the uncertainty is reduced from H to zero. A reduction of uncertainty is equivalent to an increase of information (an increase of the entropy always implies a loss of information). We define the information content as the reduction of the uncertainty after some message has been received, that is, the change in entropy:

Ic(X) = H_before − H_after

Page 14

Information content

If we start from maximal uncertainty:

Ic(X) = H_before − H_after = log2 N + Σ_{i=1}^{N} P(x_i) log2 P(x_i)

Note that the uncertainty is not necessarily reduced to zero.

What is the maximal reduction of entropy (maximum information content)? With H_before = log2 N and H_after = 0:

Ic(X) = H_before − H_after = log2 N

The maximum information is log2 N.

Page 15

Information content

[Diagram: gene structure with start codon (ATG), donor sites (GT), acceptor sites (AG), exons and introns, and stop codon (TGA)]

Stop codons: TGA 50%, TAA 25%, TAG 25%

Positions: 1 2 3 4 5 6 7 ...

Pos  P(n)                      Ic
1    P(A) = 0.25
5    P(T) = 1
6    P(G) = P(A) = 0.5
7    P(A) = 0.75, P(G) = 0.25

Page 16

Information content

Ic(X) = H_before − H_after = log2 N + Σ_{i=1}^{N} P(x_i) log2 P(x_i)

The more conserved the position, the higher the information content.

[Diagram: gene structure with start codon (ATG), donor sites (GT), acceptor sites (AG), exons and introns, and stop codon (TGA)]

Positions: 1 2 3 4 5 6 7 ...

Pos  P(n)                      Ic
1    P(A) = 0.25               0
5    P(T) = 1                  2
6    P(G) = P(A) = 0.5         1
7    P(A) = 0.75, P(G) = 0.25  1.18
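A small Python sketch (illustrative, not from the slides) reproducing the Ic column as Ic = log2(4) − H for each position's nucleotide distribution:

import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

positions = {
    1: [0.25, 0.25, 0.25, 0.25],   # all nucleotides equally likely
    5: [1.0],                      # P(T) = 1
    6: [0.5, 0.5],                 # P(G) = P(A) = 0.5
    7: [0.75, 0.25],               # P(A) = 0.75, P(G) = 0.25
}

for pos, probs in positions.items():
    ic = math.log2(4) - entropy(probs)   # H_before = log2(4) = 2 bits
    print(pos, round(ic, 2))             # 0.0, 2.0, 1.0, 1.19 (the slide rounds the last to 1.18)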

Page 17

Information content

The information content of each position can be represented graphically using sequence logos: http://weblogo.berkeley.edu/

Frequency plot: the height of each letter is proportional to its frequency at that position.

Information content plot: the height of the column of letters is proportional to the information at that position.

[Logos annotated as: invariant, informative, almost random]

Page 18

Exercise: Consider the following 5 positions in a set of sequences

1 2 3 4 5
C T G A G
G T A G A
T T G A C
A T A G T
G T G A G
C T A A A
T T G A C
A T A A T

Each position i = 1, 2, 3, 4, 5 can be considered to correspond to a probabilistic model Pi(X) on the nucleotides. Calculate the entropy for each one of the positions. What is the maximum possible value of the entropy? Can you extract any information about each position from the entropy values?
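One way to compute the per-position entropies asked for here (a sketch; splitting the slide's string into eight 5-mers is an assumption based on the column numbering):

import math
from collections import Counter

seqs = ["CTGAG", "GTAGA", "TTGAC", "ATAGT",
        "GTGAG", "CTAAA", "TTGAC", "ATAAT"]  # assumed row split of the slide's string

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

for i in range(5):
    column = [s[i] for s in seqs]
    counts = Counter(column)
    probs = [c / len(column) for c in counts.values()]
    print(f"position {i + 1}: H = {entropy(probs):.2f} bits")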

Page 19

Entropy-based measures

Page 20

Joint entropy

Given variables X, Y, taking N possible values, their joint entropy is defined as:

H(X,Y) = − Σ_x Σ_y P(x,y) log2 P(x,y)

Entropy is additive for independent variables: if P(X,Y) = P(X)P(Y), then H(X,Y) = H(X) + H(Y).

Proof:

H(X,Y) = − Σ_x Σ_y P(x,y) log2 P(x,y)
       = − Σ_x Σ_y P(x)P(y) ( log2 P(x) + log2 P(y) )
       = − Σ_y P(y) Σ_x P(x) log2 P(x) − Σ_x P(x) Σ_y P(y) log2 P(y)
       = − Σ_x P(x) log2 P(x) − Σ_y P(y) log2 P(y)
       = H(X) + H(Y)

Page 21

Conditional entropy

The entropy of Y conditioned on X quantifies the amount of information related to Y given that X is known. It is defined as:

H(Y|X) = − Σ_x Σ_y P(x,y) log2 [ P(x,y) / P(x) ]

Similarly, the entropy of X conditioned on Y is:

H(X|Y) = − Σ_x Σ_y P(x,y) log2 [ P(x,y) / P(y) ]

1) If the value of Y is completely determined by the value of X ⇒ H(Y|X) = 0
2) If Y and X are independent ⇒ H(Y|X) = H(Y)

Exercise: using the definition of conditional entropy, show the chain-rule relations given on the next page.

Page 22

The relation between joint and conditional entropy

The chain rule:

H(X,Y) − H(Y) = H(X|Y)   and   H(X,Y) − H(X) = H(Y|X)

(a special case of this is H(X,Y) = H(X) + H(Y) when the variables are independent)

Proof:

H(Y|X) = − Σ_x Σ_y P(x,y) log2 [ P(x,y) / P(x) ]
       = − Σ_x Σ_y P(x,y) log2 P(x,y) + Σ_x Σ_y P(x,y) log2 P(x)   (definition of joint entropy)
       = H(X,Y) + Σ_x P(x) log2 P(x)                                (using P(x) = Σ_y P(x,y))
       = H(X,Y) − H(X)

Page 23

Other properties:

The joint entropy is always larger than (or equal to) the individual entropies, and it is always smaller than (or equal to) the sum of the individual entropies:

H(X,Y) ≥ max{ H(X), H(Y) }
H(X,Y) ≤ H(X) + H(Y)

The same holds for any number of variables:

H(X1, ..., XN) ≥ max{ H(X1), ..., H(XN) }
H(X1, ..., XN) ≤ H(X1) + ... + H(XN)

Equality in the upper bound holds only when the variables are independent. So the difference is due to the dependencies between the variables, which can be measured with the Mutual Information...

Page 24

Mutual information

MI(X,Y) = Σ_x Σ_y P(x,y) log2 [ P(x,y) / (P(x)P(y)) ]

The mutual information is the difference between the individual entropies of two variables and their joint entropy:

MI(X,Y) = H(X) + H(Y) − H(X,Y)
        = − Σ_x P(x) log2 P(x) − Σ_y P(y) log2 P(y) + Σ_x Σ_y P(x,y) log2 P(x,y)
        = − Σ_x Σ_y P(x,y) log2 P(x) − Σ_y Σ_x P(x,y) log2 P(y) + Σ_x Σ_y P(x,y) log2 P(x,y)
        = Σ_x Σ_y P(x,y) log2 [ P(x,y) / (P(x)P(y)) ]

Page 25

Mutual information

MI(X,Y) = Σ_x Σ_y P(x,y) log2 [ P(x,y) / (P(x)P(y)) ]

Mutual information measures the dependencies between two variables:

MI(X,Y) measures the information in X that is shared with Y.

If X and Y are independent, MI takes the value zero (knowing one does not help knowing the other), i.e. H(X) + H(Y) = H(X,Y).

MI is symmetric: MI(X,Y) = MI(Y,X).

If the two variables are identical, knowing one does not add anything to the other, hence MI is equal to the entropy of a single variable.

E.g. X and Y take as values the nucleotides at two different positions, and the sum is carried out over the alphabet of nucleotides. Positions X and Y do not need to be contiguous. It is not easy to extend to more than 2 positions.

Page 26

Using the relation between joint and conditional entropy, H(X,Y) − H(Y) = H(X|Y) or H(X,Y) − H(X) = H(Y|X), we can rewrite the mutual information

MI(X,Y) = H(X) + H(Y) − H(X,Y)

in terms of the conditional entropy:

MI(X,Y) = H(X) − H(X|Y)
MI(X,Y) = H(Y) − H(Y|X)
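A short Python sketch (illustrative; the joint-distribution representation and names are assumptions, not from the slides) computing MI from a joint distribution:

import math

def mutual_information(joint):
    # MI(X,Y) in bits from a joint distribution given as {x: {y: P(x,y)}}
    px = {x: sum(row.values()) for x, row in joint.items()}
    py = {}
    for row in joint.values():
        for y, p in row.items():
            py[y] = py.get(y, 0.0) + p
    mi = 0.0
    for x, row in joint.items():
        for y, pxy in row.items():
            if pxy > 0:
                mi += pxy * math.log2(pxy / (px[x] * py[y]))
    return mi

# Two perfectly correlated binary variables: MI equals the entropy of one variable (1 bit)
print(mutual_information({"A": {"A": 0.5, "B": 0.0}, "B": {"A": 0.0, "B": 0.5}}))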

Page 27

Mutual information

Joint entropy: the entropy given by both variables,

H(X,Y) = H(X|Y) + H(Y) = H(Y|X) + H(X)

Once we know Y, H(X|Y) is the entropy that remains in X. The contribution from both together:

MI(X,Y) = H(X,Y) − H(X|Y) − H(Y|X)
H(X,Y) = MI(X,Y) + H(X|Y) + H(Y|X)

[Diagram: H(X) and H(Y) drawn as overlapping areas, with H(X|Y), MI(X,Y) and H(Y|X) labelling the three regions]

Page 28

Exercise:

Consider the following multiple alignment (with just two symbols, A and B). Consider the positions X and Y:

X Z Y W
A B A B
A A A B
A B B A
A A B B
B B B A
B A B A

Calculate: (a) H(X), H(Y) (b) H(X|Y), H(Y|X) (c) H(X,Y) (d) H(Y) − H(Y|X) (e) MI(X,Y)

Recall:

Entropy: H(X) = − Σ_x P(x) log2 P(x)
Joint entropy: H(X,Y) = − Σ_x Σ_y P(x,y) log2 P(x,y)
Conditional entropy: H(Y|X) = − Σ_x Σ_y P(x,y) log2 [ P(x,y) / P(x) ]
Mutual information: MI(X,Y) = Σ_x Σ_y P(x,y) log2 [ P(x,y) / (P(x)P(y)) ]

Page 29

Exercise:

Consider the following multiple alignment (with just two symbols, A and B). Consider the positions X and Y:

X Z Y W
A B A B
A A A B
A B B A
A A B B
B B B A
B A B A

P(X=A) = 2/3,  P(X=A, Y=B) = 1/3

H(X,Y) = − Σ_{x∈{A,B}} Σ_{y∈{A,B}} P(x,y) log2 P(x,y)
       = − P(A,A) log2 P(A,A) − P(A,B) log2 P(A,B) − P(B,A) log2 P(B,A) − P(B,B) log2 P(B,B)

Here P(B,A) means P(X=B, Y=A), etc.
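A compact Python sketch (illustrative; the column split of the alignment is taken from the slide) of how the quantities in this exercise can be computed:

import math
from collections import Counter

rows = ["ABAB", "AAAB", "ABBA", "AABB", "BBBA", "BABA"]  # columns are X, Z, Y, W
X = [r[0] for r in rows]
Y = [r[2] for r in rows]
n = len(rows)

def H(counts):
    return -sum((c / n) * math.log2(c / n) for c in counts.values() if c)

hx, hy = H(Counter(X)), H(Counter(Y))
hxy = H(Counter(zip(X, Y)))
# H(Y|X) = H(X,Y) - H(X), H(X|Y) = H(X,Y) - H(Y), MI = H(X) + H(Y) - H(X,Y)
print(hx, hy, hxy, hxy - hx, hxy - hy, hx + hy - hxy)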

Page 30

Kullback-Leibler divergence of two distributions

Also called the relative entropy, it is the expected value of the log-ratio of two distributions, log2 L = log2 [ P(x) / Q(x) ]:

DKL(P || Q) = E[log2 L] = Σ_{i=1}^{n} P(x_i) log2 [ P(x_i) / Q(x_i) ]

The relative entropy is defined for two probability distributions that take values over the same alphabet (same symbols):

DKL(P || Q) = Σ_x P(x) log2 [ P(x) / Q(x) ]

Page 31

Kullback-Leibler divergence of two distributions

DKL(P || Q) = Σ_x P(x) log2 [ P(x) / Q(x) ]

The relative entropy is not a distance, but it measures how different two distributions are. Its value is never negative, and it is zero when the two distributions are identical:

DKL(P || Q) ≥ 0, with equality only for P = Q

It is not symmetric: DKL(P || Q) ≠ DKL(Q || P)

The relative entropy provides a measure of the information content gained with the distribution P with respect to the distribution Q. Its applications are similar to those of the information content.
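A minimal Python sketch of DKL (illustrative; it assumes both distributions are over the same symbols and that Q(x) > 0 wherever P(x) > 0):

import math

def kl_divergence(P, Q):
    # D_KL(P || Q) in bits for two dicts mapping symbols to probabilities
    return sum(p * math.log2(p / Q[x]) for x, p in P.items() if p > 0)

P = {"A": 0.5, "C": 0.25, "G": 0.125, "T": 0.125}
Q = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # uniform background
print(kl_divergence(P, Q))  # relative entropy of P with respect to the uniform Q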

Page 32

Exercise (exam 2013):

Consider two discrete probability distributions P and Q, such that

Σ_i P(x_i) = 1   and   Σ_i Q(x_i) = 1

Show that the relative entropy DKL(P || Q) is equivalent to the information content of P when the distribution Q is uniform.

Page 33

Jensen-Shannon divergence

Provides another way of measuring the similarity of two probability distributions:

JS(P,Q) = (1/2) DKL(P || M) + (1/2) DKL(Q || M),   with M = (1/2)(P + Q)
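A Python sketch of this definition (illustrative; it reuses a DKL helper like the one above):

import math

def kl(P, Q):
    return sum(p * math.log2(p / Q[x]) for x, p in P.items() if p > 0)

def js_divergence(P, Q):
    # JS(P,Q) = 0.5*DKL(P||M) + 0.5*DKL(Q||M), with M the average distribution
    M = {x: 0.5 * (P.get(x, 0.0) + Q.get(x, 0.0)) for x in set(P) | set(Q)}
    return 0.5 * kl(P, M) + 0.5 * kl(Q, M)

P = {"A": 0.4, "C": 0.3, "G": 0.2, "T": 0.1}
Q = {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4}
print(js_divergence(P, Q))             # between 0 and 1 bit
print(math.sqrt(js_divergence(P, Q)))  # the square root is a metric (see Page 35)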

Page 34

Jensen-Shannon divergence

JS(P,Q) = (1/2) DKL(P || M) + (1/2) DKL(Q || M),   with M = (1/2)(P + Q) and DKL(P || Q) = Σ_x P(x) log2 [ P(x) / Q(x) ]

JS(P,Q) = (1/2) Σ_x P(x) log2 [ 2P(x) / (P(x) + Q(x)) ] + (1/2) Σ_x Q(x) log2 [ 2Q(x) / (P(x) + Q(x)) ]
        = − Σ_x [ (P(x) + Q(x)) / 2 ] log2 [ (P(x) + Q(x)) / 2 ] + (1/2) Σ_x P(x) log2 P(x) + (1/2) Σ_x Q(x) log2 Q(x)
        = H( (P + Q)/2 ) − (1/2) H(P) − (1/2) H(Q)

You can generalize this to N variables (distributions):

JS(X1, ..., XN) = H( (X1 + ... + XN) / N ) − (1/N) ( H(X1) + ... + H(XN) )

Page 35

Jensen-Shannon divergence

JS(P,Q) = (1/2) DKL(P || M) + (1/2) DKL(Q || M)

JS(X,Y) is symmetric: JS(X,Y) = JS(Y,X)
It is non-negative: JS(X,Y) ≥ 0, with JS(X,Y) = 0 ⇔ X = Y

The square root, d(X,Y) = sqrt( JS(X,Y) ), is a metric (i.e. a distance) and distributes normally.

Properties of a metric:
d(X,Y) ≥ 0
d(X,Y) = 0 ⇔ X = Y
d(X,Y) = d(Y,X)
d(X,Y) ≤ d(X,Z) + d(Z,Y)

Since 0 ≤ d(X,Y) ≤ 1, a similarity can be defined as s(X,Y) = 1 − d(X,Y).

Page 36

Example: the JS divergence can be used to compute a dissimilarity between distributions.

Gene expression values e(s,g):

           Gene 1  Gene 2  Gene 3  Gene 4  Gene 5
Sample 1   4.00    3.00    2.00    1.00    0.10
Sample 2   0.10    1.00    2.00    3.00    4.00
Sample 3   5.00    2.00    5.00    1.00    3.00
Sample 4   2.00    2.00    2.00    2.00    2.00

Normalize gene expression per sample: P(sample = s, gene = g) = e(s,g) / Σ_{g'} e(s,g'). E.g. for Sample 1, Gene 1: 4.00 / 10.10 ≈ 0.40.

           Gene 1  Gene 2  Gene 3  Gene 4  Gene 5
Sample 1   0.40    0.30    0.20    0.10    0.01
Sample 2   0.01    0.10    0.20    0.30    0.40
Sample 3   0.31    0.13    0.31    0.06    0.19
Sample 4   0.20    0.20    0.20    0.20    0.20

See: Berretta R, Moscato P. Cancer biomarker discovery: the entropic hallmark. PLoS One. 2010 Aug 18;5(8):e12262.

Page 37

Example:

           Gene 1  Gene 2  Gene 3  Gene 4  Gene 5
Sample 1   0.40    0.30    0.20    0.10    0.01
Sample 2   0.01    0.10    0.20    0.30    0.40
Sample 3   0.31    0.13    0.31    0.06    0.19
Sample 4   0.20    0.20    0.20    0.20    0.20

Which samples are most similar? And which are most different?

Page 38

Example:

Entropy of each sample's distribution P(g), H(s) = − Σ_g P(g) log2 P(g), and the same value normalized to 1 by dividing by log2(5):

           H      H / log2(5)
Sample 1   1.91   0.82
Sample 2   1.91   0.82
Sample 3   2.13   0.92
Sample 4   2.32   1.00

Sample 1 and Sample 2 have the same entropy but different gene expression profiles.

Entropy describes how expression is distributed, but it is not a good measure of distance/similarity.

Page 39

Example:

JS(1,2) = H( (P1 + P2)/2 ) − (1/2) H(P1) − (1/2) H(P2)

Pairwise values between the sample distributions P(g):

           Sample 1  Sample 2  Sample 3  Sample 4
Sample 1   0         0.28      0.07      0.82
Sample 2   0.28      0         0.28      0.82
Sample 3   0.07      0.28      0         0.03
Sample 4   0.82      0.82      0.03      0

The closest expression profiles are Sample 3 and Sample 4. The most distant ones are Sample 1 (or 2) and Sample 4.
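A sketch (illustrative) of how such a pairwise table can be computed from the normalized profiles; the exact numbers depend on the rounding of the input and on whether JS or its square root is reported:

import math

samples = {
    1: [0.40, 0.30, 0.20, 0.10, 0.01],
    2: [0.01, 0.10, 0.20, 0.30, 0.40],
    3: [0.31, 0.13, 0.31, 0.06, 0.19],
    4: [0.20, 0.20, 0.20, 0.20, 0.20],
}

def H(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

def js(p, q):
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return H(m) - 0.5 * H(p) - 0.5 * H(q)

for i in samples:
    for j in samples:
        if i < j:
            print(f"JS(Sample {i}, Sample {j}) = {js(samples[i], samples[j]):.2f}")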

Page 40

The JS divergence (its square root) has been used to establish the similarities between expression patterns in tissues from different species:

Merkin J, Russell C, Chen P, Burge CB. Evolutionary dynamics of gene and isoform regulation in mammalian tissues. Science. 2012 Dec 21;338(6114):1593-9.

Page 41

Examples of application of the JSD in tissue-specific expression:

Cabili MN, Trapnell C, Goff L, Koziol M, Tazon-Vega B, Regev A, Rinn JL. Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev. 2011 Sep 15;25(18):1915-27.

Page 42

Entropy and Classification

Page 43

Entropy and classification

Entropy can be interpreted as a measure of the homogeneity of the examples according to the classification values.

Consider S, a sample of training examples in a binary classification problem, with p+ the proportion of positive cases and p− the proportion of negative cases:

H(S) = − p+ log2 p+ − p− log2 p−

Entropy measures the impurity of the set S:
H ≈ 0: mostly one class
H ≈ Hmax: random mixture of classes

[Plot: Entropy(S) as a function of p+, maximal (1 bit) at p+ = 0.5]

Page 44

Entropy and classification

Consider a collection of 14 examples (with 4 attributes) for a boolean classification (PlayTennis = yes/no).

Page 45

Entropy and classification

The entropy of this classification is (yes = 9, no = 5):

H(S) = − (9/14) log2(9/14) − (5/14) log2(5/14) = 0.940

Recall: the entropy is minimal (zero) if all members belong to the same class; the entropy is maximal (= log2(N)) if there is an equal number of members in each class.

Generally, if the classification can take N different values:

H(S) = − Σ_{i=1}^{N} P(s_i) log2 P(s_i)

where P(s_i) is the proportion of cases in S belonging to the class s_i.

Page 46

Entropy and classification

We can measure the effectiveness of an attribute in classifying the training data as the "information gain". The information gain of an attribute A relative to a collection S is defined as the mutual information of the collection and the attribute:

IG(S,A) = MI(S,A) = H(S) − H(S|A)

IG(S,A) measures how much information we gain in the classification by knowing the value of attribute A. IG(S,A) is the expected reduction in entropy caused by partitioning the examples according to one attribute.

Page 47

Entropy and classification

Information gain: IG(S,A) = MI(S,A) = H(S) − H(S|A)

H(S) = − Σ_{s∈{classes}} P(s) log2 P(s) is the total entropy of the system according to the classes.

The conditional entropy is

H(S|A) = − Σ_{a∈{values}} Σ_{s∈{classes}} P(s,a) log2 [ P(s,a) / P(a) ] = − Σ_{a∈{values}} P(a) Σ_{s∈{classes}} P(s|a) log2 P(s|a)

where P(a) is the proportion of examples with each value a of attribute A, and the inner sum is the entropy according to the classes restricted to a specific value of attribute A.

Page 48

Entropy and classification

Information gain: IG(S,A) = MI(S,A) = H(S) − H(S|A)

Using

H(S) = − Σ_{s∈{classes}} P(s) log2 P(s)
H(S|A) = − Σ_{a∈{values}} P(a) Σ_{s∈{classes}} P(s|a) log2 P(s|a)

we can rewrite it as:

MI(S,A) = − Σ_{s∈{classes}} P(s) log2 P(s) + Σ_{a∈{values}} P(a) Σ_{s∈{classes}} P(s|a) log2 P(s|a) = H(S) − Σ_{a∈{values}} (|Sa| / |S|) H(Sa)

where Sa is the subset of the collection S for which attribute A has value a: Sa = {s ∈ S | A(s) = a}. Therefore:

IG(S,A) = H(S) − Σ_{a∈Values(A)} (|Sa| / |S|) H(Sa)

Page 49

Entropy and classification

IG(S,A) = MI(S,A) = H(S) − H(S|A) = H(S) − Σ_{a∈Values(A)} (|Sa| / |S|) H(Sa)

H(S) is the total entropy of the system. Values(A) is the set of all possible values for attribute A (e.g. Outlook = {rain, overcast, sunny}). Sa is the subset of the collection S for which attribute A has value a:

Sa = {s ∈ S | A(s) = a}

|Sa| / |S| is the fraction of the collection for which attribute A has value a.

Page 50

Entropy and classification

IG(S,A) = MI(S,A) = H(S) − H(S|A) = H(S) − Σ_{a∈Values(A)} (|Sa| / |S|) H(Sa)

The second term contains the entropy of the elements with a given value of attribute A:

H(Sa) = − Σ_{i=1}^{N} P(s_i | a) log2 P(s_i | a)

where P(s|a) is the proportion of cases with value A = a that are classified in class s. The second term is thus the sum of the entropies of each subset Sa, weighted by the fraction of cases.

IG(S,A) is the information (reduction in entropy) provided by knowing the value of an attribute (weighted by the proportions of the attribute values).

Page 51

Entropy and classification

Information Gain (IG) is defined as the mutual information between the group labels of the training set S and the values of a feature (or attribute) A:

IG(S,A) = MI(S,A) = H(S) − H(S|A)

Gain Ratio (GR) is the mutual information of the group labels and the attribute, normalized by the entropy contribution from the proportions of the samples according to the partitioning by the attribute:

GR(S,A) = MI(S,A) / H(A)

Symmetrical Uncertainty (SU) provides a symmetric measure of the correlation of a feature with the labels and compensates possible biases of the other two measures:

SU(S,A) = 2 MI(S,A) / ( H(S) + H(A) )

See: Hall M. Correlation-based feature selection for discrete and numeric class machine learning. ICML 2000, Proceedings of the Seventeenth International Conference on Machine Learning, pages 359-366.

Page 52

Example: [Table of 14 training examples with attributes Outlook, Temperature, Humidity and Wind, and target PlayTennis = yes/no]

Page 53

Entropy and classification

Consider a collection of 14 examples S: [9+, 5−] (9 Yes, 5 No).

Page 54

Entropy and classification

Values(Wind) = {strong, weak}
S = [9+, 5−]
S(weak) = [6+, 2−]
S(strong) = [3+, 3−]

Page 55

Entropy and classification

With H(S) = − Σ_{i=1}^{N} P(s_i) log2 P(s_i):

IG(S, wind) = H(S) − Σ_{v∈{weak,strong}} (|Sv| / |S|) H(Sv)
            = H(S) − (8/14) H(S_weak) − (6/14) H(S_strong)
            = 0.940 − (8/14) 0.811 − (6/14) 1.00 = 0.048

IG(S, humidity) = H(S) − Σ_{v∈{high,normal}} (|Sv| / |S|) H(Sv)
                = H(S) − (7/14) H(S_high) − (7/14) H(S_normal)
                = 0.940 − (7/14) 0.985 − (7/14) 0.592
                = 0.151

IG(S, humidity) > IG(S, wind). What are the implications of this?
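A short Python sketch (illustrative) reproducing these numbers from the class counts; the Humidity split [3+,4−] / [6+,1−] is inferred from the entropies quoted above and on the following slides:

import math

def H(pos, neg):
    # Entropy in bits of a set with `pos` positive and `neg` negative examples
    total = pos + neg
    return -sum((c / total) * math.log2(c / total) for c in (pos, neg) if c)

def info_gain(parent, subsets):
    # IG = H(parent) - sum over subsets of (|Sa|/|S|) * H(Sa)
    n = sum(parent)
    return H(*parent) - sum((p + q) / n * H(p, q) for p, q in subsets)

S = (9, 5)                             # [9+, 5-]
print(info_gain(S, [(6, 2), (3, 3)]))  # Wind: weak [6+,2-], strong [3+,3-]      -> ~0.048
print(info_gain(S, [(3, 4), (6, 1)]))  # Humidity: high [3+,4-], normal [6+,1-]  -> ~0.151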

Page 56

Entropy and classification

[Diagram: splitting S = [9+,5−] by Humidity gives branches with entropies 0.985 (high) and 0.592 (normal); splitting by Wind gives 0.811 (weak) and 1.0 (strong)]

"Humidity" provides a greater information gain than "Wind" relative to the target classification (yes/no).

Page 57

Entropy and classification

[Diagram: as on the previous page; the Humidity = normal branch is [6+,1−] with entropy 0.592]

"Humidity" provides a greater information gain than "Wind" relative to the target classification (yes/no). The attribute "Humidity" is the better classifier: if we only use "Humidity" to classify, we are closer to the target classification (yes/no).

Page 58

Entropy and classification

Which attribute would be the best classifier if tested alone?

IG(S, outlook) = 0.246
IG(S, humidity) = 0.151
IG(S, wind) = 0.048
IG(S, temperature) = 0.029

Outlook gives the best prediction of the target value "play tennis".

Page 59

Entropy and classification

Every "overcast" example is labeled as yes → leaf node with classification "yes".

Page 60

Entropy and classification

Every "overcast" example is labeled as yes → leaf node with classification "yes".

The other descendants (sunny and rain) still have non-zero entropy → continue down these nodes.

Which attribute should be tested here?

Page 61

Entropy and classification

Every "overcast" example is labeled as yes → leaf node with classification "yes". The other descendants (sunny and rain) still have non-zero entropy → continue down these nodes.

S_sunny = {D1, D2, D8, D9, D11}

IG(S_sunny, humidity) = 0.970 − (3/5) 0.0 − (2/5) 0.0 = 0.970
IG(S_sunny, temperature) = 0.970 − (2/5) 0.0 − (2/5) 1.0 = 0.570
IG(S_sunny, wind) = 0.970 − (2/5) 1.0 − (3/5) 0.918 = 0.019

Page 62

Incorporating continuous-valued attributes

So far we have used attributes with discrete values (e.g. Wind = weak, strong). We can dynamically define discrete-valued attributes by partitioning the continuous attribute values A into a discrete set of intervals. We then define a new boolean attribute Ac that is true if A < c and false otherwise. Consider:

Temperature (°C)   5    10   15   20   25   30
PlayTennis         No   No   Yes  Yes  Yes  No

Pick a threshold that produces the largest information gain:
Sort the examples according to the attribute.
Test only the boundaries between adjacent examples with different target classification.
Choose the boundary with the largest information gain.

Page 63

Incorporating continuous-valued attributes

Temperature (°C)   5    10   15   20   25   30
PlayTennis         No   No   Yes  Yes  Yes  No

Pick a threshold that produces the largest information gain: sort the examples according to the attribute, test only the boundaries between adjacent examples with different target classification, and choose the boundary with the largest information gain. This dynamically creates a boolean attribute. Here the candidate boundaries are:

(10 + 15)/2 = 12.5 ⇒ temperature > 12.5
(25 + 30)/2 = 27.5 ⇒ temperature > 27.5

and the chosen attribute is temperature > 12.5. An alternative is to use multiple (discrete) intervals.

Page 64

Incorporating continuous-valued attributes

Temperature (°C)   5    10   15   20   25   30
PlayTennis         No   No   Yes  Yes  Yes  No

Equivalently, pick the point a0 that produces the minimum entropy after separating the attribute values by this threshold, i.e. minimize

( |S_{a<a0}| / |S| ) H(S_{a<a0}) + ( |S_{a>a0}| / |S| ) H(S_{a>a0})

That is, we minimize the second term (the weighted entropy) in the IG definition:

IG(S,A) = H(S) − Σ_{a∈Values(A)} (|Sa| / |S|) H(Sa)

See e.g. Fayyad U, Irani K (1993). Multi-interval discretization of continuous-valued attributes for classification learning. Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, pages 1022-1027. Morgan Kaufmann.
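A small Python sketch (illustrative) of this threshold search on the temperature example, scoring each candidate midpoint by the weighted entropy it leaves:

import math
from collections import Counter

temps = [5, 10, 15, 20, 25, 30]
labels = ["No", "No", "Yes", "Yes", "Yes", "No"]

def H(subset):
    n = len(subset)
    return -sum((c / n) * math.log2(c / n) for c in Counter(subset).values()) if n else 0.0

# Candidate thresholds: midpoints between adjacent examples with different labels
candidates = [(temps[i] + temps[i + 1]) / 2
              for i in range(len(temps) - 1) if labels[i] != labels[i + 1]]

def weighted_entropy(c):
    left = [l for t, l in zip(temps, labels) if t < c]
    right = [l for t, l in zip(temps, labels) if t > c]
    n = len(labels)
    return len(left) / n * H(left) + len(right) / n * H(right)

best = min(candidates, key=weighted_entropy)
print(candidates, best)  # candidates 12.5 and 27.5; the best one minimizes the weighted entropy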

Page 65

Entropy and classification

[Decision tree: Outlook at the root with branches Sunny, Overcast and Rain; Sunny → Humidity (High → No, Normal → Yes); Overcast → Yes; Rain → Wind (Strong → No, Weak → Yes)]

Information gain allows you to find the attributes (variables) that are most informative to test/measure for the classification problem.

Page 66

Entropy and classification

[Decision tree: Outlook at the root with branches Sunny, Overcast and Rain; Sunny → Humidity (High → No, Normal → Yes); Overcast → Yes; Rain → Wind (Strong → No, Weak → Yes)]

Repeating this process allows you to build a tree. Any attribute can appear only once along any path in the tree. The process continues until either:
1) every attribute has already been included along this path, or
2) all training examples have the same target value (entropy is zero).

Page 67

Decision trees

[Decision tree: Outlook at the root with branches Sunny, Overcast and Rain; Sunny → Humidity (High → No, Normal → Yes); Overcast → Yes; Rain → Wind (Strong → No, Weak → Yes)]

Decision nodes: specify a test on a single attribute, with one branch for each outcome of the test.

Leaf nodes: value of the target (classification: in general, take the most probable classification).

A decision tree is used to classify a new instance (example) by starting from the root of the tree and moving down through it until a leaf node is reached.

Page 68

Decision trees

1) Decision trees are best suited for instances that are represented by attribute-value pairs, e.g. attribute = temp, Values(temp) = {hot, mild, cold}.
2) The target classification should take discrete values (2 or more), e.g. "yes"/"no", although this can be extended to real-valued outputs.
3) Decision trees naturally represent disjunctions of conjunctions of descriptions. E.g. I play tennis if:

(outlook = sunny AND humidity = normal) OR (outlook = overcast) OR (outlook = rain AND wind = weak)

4) Decision trees are robust to errors in the training data.
5) Decision trees can be used even when there is some missing data.

Page 69

Overfitting

We build the tree deep enough to perfectly classify the training examples. Too few (or too noisy) examples may cause overfitting. There is overfitting when we add new training data that makes the model reproduce the training data perfectly, but at the cost of performing worse on, or not being valid for, new cases. This can be detected using cross-validation.

Page 70

Overfitting

To avoid overfitting:
1) stop growing the tree before it reaches perfection, or
2) fully grow the tree, and then post-prune some branches.

Consider one extra (noisy) example: (outlook = sunny, temperature = hot, humidity = normal, wind = strong, play tennis = no). How does it affect our earlier tree?

[Decision tree: Outlook at the root with branches Sunny, Overcast and Rain; Sunny → Humidity (High → No, Normal → Yes); Overcast → Yes; Rain → Wind (Strong → No, Weak → Yes)]

The new tree would fit the training data perfectly, but the earlier tree will in general perform better on new examples.

Page 71

ID3 Algorithm

Consider a classification of examples with two class values: "+" or "-".

Page 72

ID3 Algorithm

ID3 (Examples S, Target_labels (classes), attributes)

Page 73

ID3 Algorithm

First we deal with the extreme cases:

ID3 (Examples S, Target_labels (classes), attributes)
  Create a root node for the tree
  If all examples are positive
    return single-node tree with label "+"
  Else If all examples are negative
    return single-node tree with label "-"
  Else If attributes is empty
    return single-node tree with most common label in Examples
  Else ...

Page 74

ID3 Algorithm

ID3 (Examples S, Target_labels (classes), attributes)
  Create a root node for the tree
  If all examples are positive
    return single-node tree with label "+"
  Else If all examples are negative
    return single-node tree with label "-"
  Else If attributes is empty
    return single-node tree with most common label in Examples
  Else
    pick the attribute A that best classifies Examples (maximizes Gain(S,A))
    assign attribute A to the root of the tree
    For each value a of A
      add a new tree branch below, corresponding to the test A = a
      consider Examples(a), the subset of Examples that have value a for A
      If Examples(a) is empty
        add a leaf node with the most common label from Examples
      Else
        add the subtree ID3(Examples(a), Target_labels(a), attributes - A)
  End
  Return tree

Page 75

ID3 Algorithm

(Same listing as on the previous page.) Annotation: the "most common label in Examples" corresponds to the prior probability of the classification.

Page 76

ID3 Algorithm

(Same listing.) Annotation: if we run out of attributes and the entropy is still non-zero, we choose the most common target label of this subset of examples.

Page 77

ID3 Algorithm

(Same listing.) Annotation: if one of the attribute values does not appear in the subpopulation (Examples(a) is empty), we choose a default, which is the most common target label over the entire tree (the most probable label).
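A compact Python sketch of the ID3 idea (illustrative only; the data representation, function names and the invented mini data set are assumptions, not part of the slides):

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(examples, labels, attr):
    n = len(labels)
    remainder = 0.0
    for value in set(ex[attr] for ex in examples):
        subset = [lab for ex, lab in zip(examples, labels) if ex[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def id3(examples, labels, attributes):
    # Extreme cases: pure node, or no attributes left -> return a leaf label
    if len(set(labels)) == 1:
        return labels[0]
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Pick the attribute with maximal information gain and split on it
    best = max(attributes, key=lambda a: info_gain(examples, labels, a))
    tree = {best: {}}
    for value in set(ex[best] for ex in examples):
        idx = [i for i, ex in enumerate(examples) if ex[best] == value]
        tree[best][value] = id3([examples[i] for i in idx],
                                [labels[i] for i in idx],
                                [a for a in attributes if a != best])
    return tree

# Invented mini data set, just to show the call
examples = [{"outlook": "sunny", "wind": "weak"}, {"outlook": "sunny", "wind": "strong"},
            {"outlook": "overcast", "wind": "weak"}, {"outlook": "rain", "wind": "strong"}]
labels = ["no", "no", "yes", "no"]
print(id3(examples, labels, ["outlook", "wind"]))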

Page 78

Decision trees

Example: a decision tree to predict protein-protein interactions.

Each data point is a gene pair (e.g. A-B) associated with some attributes. Some attributes take on real values (e.g. genomic distance); other attributes take on discrete values (e.g. shared localization?). The target value of the classification is "yes" (interact) or "no" (do not interact).

Page 79

Decision trees

Binary classification on continuous values is based on a threshold.

[Figure: decision tree for predicting protein-protein interactions; each leaf shows the proportion of each value in the training set]

Page 80

Decision trees

Binary classification on continuous values is based on a threshold.

[Figure: as on the previous page, with the proportion of each value in the training set shown at each leaf]

New examples are predicted to interact if they arrive at a leaf with a higher proportion of green, or not to interact if they arrive at a predominantly red leaf.

Page 81

Exercise (exam from 2014): We would like to build a decision tree model to predict cell proliferation based on the gene expression of two genes, NUMB and SRSF1. Our experiments have been recorded in the following table:

[Table of recorded experiments]

Which of the attributes would you test first in the decision tree? Explain why. Help: you can use log2(3) ≈ 1.6.

Page 82

References

Machine Learning. Tom Mitchell. McGraw Hill, 1997. http://www.cs.cmu.edu/~tom/mlbook.html

Computational Molecular Biology: An Introduction. Peter Clote and Rolf Backofen. Wiley, 2000.

What are decision trees? Kingsford C, Salzberg SL. Nat Biotechnol. 2008 Sep;26(9):1011-3. Review.

Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Richard Durbin, Sean R. Eddy, Anders Krogh, and Graeme Mitchison. Cambridge University Press, 1999.

Problems and Solutions in Biological Sequence Analysis. Mark Borodovsky, Svetlana Ekisheva. Cambridge University Press, 2006.