Inference in DBNs with non-disjoint clusters

Source: perso.crans.org/~genest/CFF.pdf · 2015. 10. 1.

Matthieu Pichené

Introduction

[Figure: overview diagram. A biological system (the apoptosis pathway, including Mcl1) is described by a mathematical formalism; a method is applied to it for simulations and analysis.]

Method: (approximate) abstraction of the low-level biochemical model.

DBNs

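Running example: $S + E \underset{k_{-1}}{\overset{k_{+1}}{\rightleftharpoons}} ES \overset{k_{+2}}{\longrightarrow} P + E$

[Figure: the DBN unrolled over time points t0, t1, t2, t3, with one node per species (S, E, ES, P) at each time point; each node takes a value in {1, 2, 3, 4, 5}.]

Each species at time point t is a random variable over a discrete number of values.

Number of configurations at each time point: $\text{Values}^{\text{Species}}$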

DBNs

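[Figure: the same DBN over t0, t1, t2, t3 for S + E <-> ES -> P + E (rates k+1, k-1, k+2), with a conditional probability table (CPT) attached to each node.]

Each CPT has one column for the node, one column per parent, and a probability column. CPT of S (parents S, E, ES):

S    S  E  ES    Pr
1    1  1  1     0.1
1    2  1  2     0.2
2    2  3  3     0.1
…

The other CPTs have the same layout (rows truncated on the slide):

ES   S  E  ES  P    Pr
E    S  E  ES       Pr
P    ES  P          Pr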

DBNs

Complexity of exact inference: at least $\text{Values}^{\text{Species}}$

DBNs

• We need an approximation: express the probability of a configuration as a product of probabilities

• Simplest idea: treat all species as independent (Factored Frontier)

Factored Frontier

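[Figure: the DBN over t0, t1, t2, t3 for S + E <-> ES -> P + E (rates k+1, k-1, k+2).]

Hypothesis: all species are independent.

Complexity of FF inference: $\text{Species} \times \text{Values}^{\text{NbPar}+1}$

$P_{t_2}(P = h) = f\big(P_{t_1}(P),\ P_{t_1}(ES),\ \text{CPT}\big)$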

Low accuracy
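
The update f above can be sketched as follows. This is only an illustration of a Factored Frontier step under the independence hypothesis, not the authors' program: the function name `ff_step`, the two-value domain and all CPT numbers are hypothetical.

```python
from itertools import product

def ff_step(marginals, parents, cpt, values):
    """One Factored Frontier step for a single node.

    marginals: {parent name: {value: probability}} at time t
    cpt: {tuple of parent values: {next value: probability}}
    """
    new = {v: 0.0 for v in values}
    for config in product(values, repeat=len(parents)):
        # Independence hypothesis: the probability of a parent
        # configuration is the product of the parents' marginals.
        weight = 1.0
        for name, val in zip(parents, config):
            weight *= marginals[name][val]
        for v in values:
            new[v] += weight * cpt[config][v]
    return new

# Hypothetical example: node P with parents (ES, P), values l (low) / h (high).
values = ("l", "h")
marginals_t1 = {"ES": {"l": 0.5, "h": 0.5}, "P": {"l": 0.9, "h": 0.1}}
cpt_p = {
    ("l", "l"): {"l": 1.0, "h": 0.0},
    ("l", "h"): {"l": 0.0, "h": 1.0},
    ("h", "l"): {"l": 0.7, "h": 0.3},
    ("h", "h"): {"l": 0.0, "h": 1.0},
}
p_t2 = ff_step(marginals_t1, ("ES", "P"), cpt_p, values)
```

The loop visits Values^NbPar parent configurations and Values next values per node, which matches the Values^(NbPar+1) factor in the complexity stated above.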

Clustered Factored Frontier

• Use clusters containing the species that share the most mutual information

• Clusters may vary over time

• The probability of every combination of states of the species in a cluster is computed (which limits the size of clusters)

Clustered Factored Frontier

• Use information theory (Eric) to identify the important relations between species

• We (Eric) choose the tree that minimizes the distance

• A tree implies clusters of size 2
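
One possible reading of the bullets above, given here only as a sketch: score each pair of species by mutual information, then keep a maximum-weight spanning tree (a Chow-Liu style construction). The slides do not say which distance or tree algorithm is used, so the Kruskal procedure, the function names and the edge weights below are all assumptions.

```python
from math import log2

def mutual_information(joint):
    """Mutual information in bits of a joint distribution {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

def max_spanning_tree(nodes, weights):
    """Kruskal: take edges by decreasing weight, skipping those that close a cycle."""
    root = {n: n for n in nodes}

    def find(n):
        while root[n] != n:
            n = root[n]
        return n

    tree = []
    for (a, b), _ in sorted(weights.items(), key=lambda kv: -kv[1]):
        ra, rb = find(a), find(b)
        if ra != rb:
            root[ra] = rb
            tree.append((a, b))
    return tree

# Two perfectly correlated binary variables share exactly 1 bit.
mi_example = mutual_information({("l", "l"): 0.5, ("h", "h"): 0.5})

# Hypothetical pairwise scores for the species of the running example.
weights = {("S", "ES"): 0.9, ("E", "ES"): 0.8, ("ES", "P"): 0.7,
           ("S", "E"): 0.3, ("E", "P"): 0.2, ("S", "P"): 0.1}
tree = max_spanning_tree(["S", "E", "ES", "P"], weights)
```

Each edge of the resulting tree is a cluster of size 2, and neighbouring clusters share a species, hence non-disjoint clusters.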

[Figure: two matrices of pairwise mutual information between the species of the apoptosis pathway (rows and columns indexed by species, color scale from 0 to 3): "Mutual information on the whole graph" and "Mutual Information on the Tree Approximation".]

Species correlations (Eric)

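Hypothesis:

$$\Pr(S_t = h, ES_t = l, E_t = m, P_t = h) = \frac{\Pr(S_t = h, ES_t = l)\,\Pr(ES_t = l, E_t = m)\,\Pr(ES_t = l, P_t = h)}{\Pr(ES_t = l)^2}$$

[Figure: tree with ES linked to S, E and P.]

Clustered Factored Frontier: we assume that relations not in the tree are irrelevant.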
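
As an illustration of the hypothesis, the sketch below rebuilds the four-species joint from the three pairwise cluster distributions and the marginal of ES. All numbers are hypothetical, and the distribution is constructed so that S, E and P really are independent given ES, in which case the formula is exact.

```python
from itertools import product

VALUES = ("l", "h")

def tree_joint(pairs, marg_es, s, es, e, p):
    """Pr(S, ES, E, P) from the clusters {S,ES}, {ES,E}, {ES,P} and Pr(ES)."""
    if marg_es[es] == 0.0:
        return 0.0
    return (pairs["S,ES"][(s, es)] * pairs["ES,E"][(es, e)]
            * pairs["ES,P"][(es, p)] / marg_es[es] ** 2)

# Hypothetical marginal of ES and conditionals cond[X][es][x] = Pr(X = x | ES = es).
marg_es = {"l": 0.4, "h": 0.6}
cond = {
    "S": {"l": {"l": 0.2, "h": 0.8}, "h": {"l": 0.7, "h": 0.3}},
    "E": {"l": {"l": 0.6, "h": 0.4}, "h": {"l": 0.5, "h": 0.5}},
    "P": {"l": {"l": 0.9, "h": 0.1}, "h": {"l": 0.1, "h": 0.9}},
}

# Pairwise cluster distributions: Pr(X = x, ES = es) = Pr(ES = es) Pr(X = x | ES = es).
pairs = {
    "S,ES": {(x, es): marg_es[es] * cond["S"][es][x] for x, es in product(VALUES, repeat=2)},
    "ES,E": {(es, x): marg_es[es] * cond["E"][es][x] for es, x in product(VALUES, repeat=2)},
    "ES,P": {(es, x): marg_es[es] * cond["P"][es][x] for es, x in product(VALUES, repeat=2)},
}

total = sum(tree_joint(pairs, marg_es, s, es, e, p)
            for s, es, e, p in product(VALUES, repeat=4))
one_state = tree_joint(pairs, marg_es, "l", "h", "l", "h")
```

Only three tables of size Values^2 are stored instead of one table of size Values^4; the price is that any dependence outside the tree is ignored.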

Apoptosis pathway

[Figure: graph of the apoptosis pathway, drawn on axes ranging from −1.5 to 1.5, with nodes numbered 1 to 57.]

Node numbering:

1 R, 2 R*, 3 flip, 4 pC8, 5 C8, 6 Bar, 7 pC3, 8 C3, 9 pC6, 10 C6,
11 XIAP, 12 PARP, 13 cPARP, 14 Bid, 15 tBid, 16 Mcl1, 17 Bax, 18 Bax*, 19 Bax*m, 20 Bax2,
21 Bax4, 22 Bcl2, 23 Pore, 24 Pore*, 25 CyCm, 26 CyCr, 27 CyC, 28 Smacm, 29 Smacr, 30 Smac,
31 Apaf, 32 Apaf*, 33 pC9, 34 Apop, 35 C3U, 36 L:R, 37 R*:flip, 38 R*:pC8, 39 C8:Bar, 40 C8:pC3,
41 C3:pC6, 42 C6:pC8, 43 C3:XIAP, 44 C3:PARP, 45 C8:Bid, 46 tBid:Mcl1, 47 tBid:Bax, 48 Bax*m:Bcl2, 49 Bax2:Bcl2, 50 Bax4:Bcl2,
51 Bax4:M, 52 M*:CyCm, 53 M*:Smacm, 54 CyC:Apaf, 55 Apop:pC3, 56 Apop:XIAP, 57 Smac:XIAP

Clustered Factored Frontier

[Figure: the DBN over t0, t1, t2, t3 for S + E <-> ES -> P + E (rates k+1, k-1, k+2), with a CPT attached to each node.]

$$P_{t_1}(s', es') = \sum_{s,\,es,\,e} P_{t_0}(s, es, e)\,\text{CPT}(s, es, e, s')\,\text{CPT}(s, es, e, es')$$
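
The cluster update above can be written as a short loop. This is a sketch under assumptions: `cluster_step` is a hypothetical name, the domain has two values, and the CPT entries are invented for the example.

```python
from itertools import product

def cluster_step(joint_t0, cpt_s, cpt_es, values):
    """P_t1(s', es') = sum over (s, es, e) of P_t0(s, es, e) CPT(s, es, e, s') CPT(s, es, e, es')."""
    new = {pair: 0.0 for pair in product(values, repeat=2)}
    for (s, es, e), p in joint_t0.items():
        for s2, es2 in new:
            new[(s2, es2)] += p * cpt_s[(s, es, e)][s2] * cpt_es[(s, es, e)][es2]
    return new

# Hypothetical example: all the mass on one parent configuration (S = h, ES = l, E = h).
values = ("l", "h")
joint_t0 = {("h", "l", "h"): 1.0}
cpt_s = {("h", "l", "h"): {"h": 0.8, "l": 0.2}}
cpt_es = {("h", "l", "h"): {"h": 0.2, "l": 0.8}}
joint_t1 = cluster_step(joint_t0, cpt_s, cpt_es, values)
```

The input `joint_t0` is the distribution over the parents of the cluster {S, ES}, so its size, not the size of the whole network, drives the cost of the step.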

How to compute P(parents(Cluster))

Proposition:

$$P(X_p = v_p, X_L = v_L, X_R = v_R) = \frac{P(X_p = v_p, X_L = v_L)\, P(X_p = v_p, X_R = v_R)}{P(X_p = v_p)}$$

[Figure: tree with a node p and its two neighbours L and R.]
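
A short derivation of the proposition, assuming that $X_L$ and $X_R$ are conditionally independent given $X_p$ (which is what the tree structure encodes) and that $P(X_p = v_p) > 0$:

```latex
\begin{align*}
P(X_p = v_p, X_L = v_L, X_R = v_R)
  &= P(X_p = v_p)\, P(X_L = v_L \mid X_p = v_p)\, P(X_R = v_R \mid X_p = v_p) \\
  &= P(X_p = v_p)\, \frac{P(X_p = v_p, X_L = v_L)}{P(X_p = v_p)}\,
     \frac{P(X_p = v_p, X_R = v_R)}{P(X_p = v_p)} \\
  &= \frac{P(X_p = v_p, X_L = v_L)\, P(X_p = v_p, X_R = v_R)}{P(X_p = v_p)}
\end{align*}
```

Applying the same step once per additional neighbour of $X_p$ gives one more division by $P(X_p = v_p)$ each time, which is where the squared denominator in the four-species hypothesis comes from.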

Parent_Cluster: the set of nodes whose values are needed to apply the CPTs of the nodes in the cluster.

Independence between trees

Complexity: $\text{Species} \times \text{Values}^{\text{Parents\_Cluster}+1}$

Algorithm comparison

             FF                                Clustered FF                            Exact computation
Complexity   Species x Values^(NbParents+1)    Species x Values^(Parents_Cluster+1)    > Values^Species
Accuracy     Low                               ? (but better than FF)                  Exact
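
To give a sense of scale, the three complexity expressions can be evaluated numerically. The figures below are assumptions chosen for illustration: 57 species (the number of numbered nodes in the pathway figure), 5 values per species, 3 parents per node and 5 nodes in a Parent_Cluster. The FF exponent NbPar+1 is the one given on the Factored Frontier slide.

```python
def exact_cost(species, values):
    """Number of configurations that exact inference must handle."""
    return values ** species

def ff_cost(species, values, nb_parents):
    """Factored Frontier: Species x Values^(NbPar+1)."""
    return species * values ** (nb_parents + 1)

def clustered_ff_cost(species, values, parents_cluster):
    """Clustered Factored Frontier: Species x Values^(Parents_Cluster+1)."""
    return species * values ** (parents_cluster + 1)

# Hypothetical sizes, see the assumptions stated above.
SPECIES, VALUES, NB_PARENTS, PARENTS_CLUSTER = 57, 5, 3, 5

ff = ff_cost(SPECIES, VALUES, NB_PARENTS)
cff = clustered_ff_cost(SPECIES, VALUES, PARENTS_CLUSTER)
exact = exact_cost(SPECIES, VALUES)
print(f"FF: {ff}, clustered FF: {cff}, exact: about 10^{len(str(exact)) - 1}")
```

With these assumed sizes, the clustered variant costs a small constant factor more than FF, while both remain far below the exact computation.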

Conclusion

• Our program is still being written; the results will tell whether the accuracy is good.

• Once the first results are obtained, we will extend it to accept bigger clusters and non-tree graphs.

Order S x N

How our algorithm works

• For each time point T, the groups of clusters are found

• The most efficient path to compute each cluster is found

• Probabilities are computed using the CPTs

• Results are saved; cluster probabilities are kept in memory

Clustered Factored Frontier

Example: A <-> A*

CPT:

Current state     Next A            Next A*
A = h, A* = l     h: 98%, l: 2%     h: 2%, l: 98%
A = l, A* = h     h: 2%, l: 98%     h: 98%, l: 2%

Initial distribution: 50% (A = h, A* = l), 50% (A = l, A* = h)

Distribution after one step, for each initial state (columns give the next state as (A, A*)):

Initial state     (h, l)    (l, h)    (h, h)    (l, l)
A = h, A* = l     96.04%    0.04%     1.96%     1.96%
A = l, A* = h     0.04%     96.04%    1.96%     1.96%

Clustered Factored Frontier

Example: A <-> A*, with the same CPT and the same initial distribution (50% A = h, A* = l; 50% A = l, A* = h).

Resulting joint distribution after one step:

48.04% A = h , A* = l
48.04% A = l , A* = h
1.96% A = h , A* = h
1.96% A = l , A* = l
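
The arithmetic of this example can be checked with a few lines. The sketch uses only the CPT entries listed on the slide; the function name `step` and the dictionary layout are illustrative choices. It reproduces the per-state figures 96.04%, 0.04% and 1.96%, then mixes them over the 50/50 initial distribution.

```python
from itertools import product

# CPT entries from the slide, keyed by the current state (A, A*).
CPT_A = {("h", "l"): {"h": 0.98, "l": 0.02},
         ("l", "h"): {"h": 0.02, "l": 0.98}}
CPT_A_STAR = {("h", "l"): {"h": 0.02, "l": 0.98},
              ("l", "h"): {"h": 0.98, "l": 0.02}}

def step(joint):
    """One step for the cluster {A, A*}: sum over current states of P(state) x CPT x CPT."""
    new = {}
    for state, p in joint.items():
        for a2, s2 in product(("h", "l"), repeat=2):
            q = p * CPT_A[state][a2] * CPT_A_STAR[state][s2]
            new[(a2, s2)] = new.get((a2, s2), 0.0) + q
    return new

from_hl = step({("h", "l"): 1.0})   # start from A = h, A* = l
from_lh = step({("l", "h"): 1.0})   # start from A = l, A* = h
mixed = step({("h", "l"): 0.5, ("l", "h"): 0.5})

# Marginal of A = h in the mixed result.
marginal_a_h = mixed[("h", "l")] + mixed[("h", "h")]
```

In `mixed`, the marginals of A and A* are both 50/50, so a product of marginals would give 25% to each joint state; keeping the pair as a cluster is what preserves the strong anti-correlation between A and A*.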
