
Decomposition for Reasoning with Biological Network

Gauvain Bourgne, Katsumi Inoue
ISSSB’11, Shonan Village, November 13th-17th, 2011

Automated Problem Decomposition 2

Motivation

In bioinformatics, we need to reason over huge amounts of data
◦ Huge networks (e.g. metabolic pathways, signaling pathways…)

On such problems, centralized methods suffer from
◦ Long computation time
◦ Memory overflow

Problem decomposition
◦ Divide into smaller problems or steps, then recompose a global solution
◦ Need for (1) an automated process to decompose and (2) an algorithm to solve local problems and recompose the global solution

Example Problem (Krebs Cycle)

[Figure: metabolic network around the Krebs cycle. Nodes are metabolites (succinate, fumarate, citrate, isocitrate, trans-aconitate, acetyl-CoA, pyruvate, lactate, glucose, acetate, urea, arginine, ornithine, citrulline, creatine, creatinine, sarcosine, and others) plus a glycolysis node; edges are reactions labeled with EC numbers such as 4.2.1.2, 4.2.1.3, 2.3.3.1 and 1.1.1.27.]

Example Problem (Krebs Cycle)

[Figure: the same Krebs cycle network split among six agents, Ag0 to Ag5. The reactions labeled separately on this slide are EC 4.2.1.2, 1.1.1.42, 4.1.1.20, 2.3.3.1, 4.3.1.6, 2.1.3.1, 2.1.3.3, 3.5.3.1, 1.5.99.1 and 1.3.99.1.]

Overview
◦ Reasoning task
◦ Partition-based algorithm
◦ Automated decomposition
◦ Experimental evaluation
◦ Conclusion


Logical representation

Metabolic pathways: set of reactions Ri:
Ri: m1, m2, …, mp → p1, p2, …, pn

Such reactions can be represented as
◦ an activation rule: ¬m1 ∨ ¬m2 ∨ … ∨ ¬mp ∨ Ri
◦ n production rules: ¬Ri ∨ p1, ¬Ri ∨ p2, …, ¬Ri ∨ pn

Together, these clauses form a clausal theory.
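The translation from a reaction to clauses can be sketched in a few lines of Python. This is a minimal illustration only: the string encoding of literals (a leading "-" for negation) and the function name `reaction_to_clauses` are assumptions made for this example, not the authors' file format.

```python
from typing import FrozenSet, List

Clause = FrozenSet[str]  # assumed encoding: "-x" stands for the literal ¬x


def reaction_to_clauses(name: str, reactants: List[str], products: List[str]) -> List[Clause]:
    """Translate a reaction `name: reactants -> products` into clauses."""
    # Activation rule: ¬m1 ∨ ... ∨ ¬mp ∨ Ri
    activation = frozenset(["-" + m for m in reactants] + [name])
    # One production rule per product: ¬Ri ∨ pj
    productions = [frozenset(["-" + name, p]) for p in products]
    return [activation] + productions


# Hypothetical reaction R1: m1, m2 -> p1, p2
clauses = reaction_to_clauses("R1", ["m1", "m2"], ["p1", "p2"])
for clause in clauses:
    print(sorted(clause))
```

A reaction with p reactants and n products therefore yields n + 1 clauses, one of size p + 1 and n of size 2.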


Problems

(Conditional) accessibility problems
◦ Sources (si), conditional sources (ci), targets (ti)
◦ Find which ti can be produced from the si, possibly with the addition of some ci as new sources
◦ Find all consequences of the form ¬ci ∨ … ∨ ¬ck ∨ tj

Extraction of sub-networks

Pathway completion (abduction)
◦ Find reactions (sets of clauses)

Hypotheses on the state of reactions given experiments

All of these are consequence finding problems (with a specific form of consequence).
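To make the accessibility question concrete, here is a small sketch that works directly on reactions by forward chaining rather than on clauses. It only checks what becomes producible for one given set of sources, so it is a simplification of the consequence finding formulation above; the reaction names and the helper `accessible` are hypothetical.

```python
def accessible(reactions, sources):
    """Return every metabolite producible from `sources`.

    `reactions` maps a reaction name to a (reactants, products) pair.
    """
    produced = set(sources)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions.values():
            # A reaction fires once all of its reactants are available.
            if set(reactants) <= produced and not set(products) <= produced:
                produced |= set(products)
                changed = True
    return produced


# Toy network (assumed for illustration): R1: a -> b, R2: b, c -> d
reactions = {"R1": (["a"], ["b"]), "R2": (["b", "c"], ["d"])}

without_c = accessible(reactions, {"a"})
with_c = accessible(reactions, {"a", "c"})
print(sorted(without_c), sorted(with_c))
```

In this toy network the target d is reachable only once the conditional source c is added, which corresponds to a consequence of the form ¬c ∨ d.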


Main reasoning task

Consequence Finding (CF) in clausal theories
◦ Input
  A clausal theory T
  A production field P = <L, Cond>
    L is a list of literals
    Cond is a condition (maximal length of the consequences, or number of occurrences of some literals)
◦ Output
  All the consequences of T that are subsumption-minimal and belong to P (formed with literals of L and respecting condition Cond): Carc(T, P)
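For small propositional theories, Carc(T, P) can be computed naively by saturating T under resolution, keeping the clauses that belong to P, and discarding subsumed ones. The sketch below does exactly that. It is an illustration under assumptions, not the solver used in the talk: literals are strings with a leading "-" for negation, Cond is restricted to an optional maximal length, and the cost is exponential.

```python
from itertools import combinations


def negate(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit


def is_tautology(clause):
    return any(negate(lit) in clause for lit in clause)


def resolvents(c1, c2):
    """All non-tautological resolvents of two clauses."""
    out = set()
    for lit in c1:
        if negate(lit) in c2:
            r = (c1 - {lit}) | (c2 - {negate(lit)})
            if not is_tautology(r):
                out.add(frozenset(r))
    return out


def saturate(theory):
    """Close the theory under binary resolution."""
    clauses = {frozenset(c) for c in theory}
    clauses = {c for c in clauses if not is_tautology(c)}
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            new |= resolvents(c1, c2)
        if new <= clauses:
            return clauses
        clauses |= new


def carc(theory, literals, max_len=None):
    """Subsumption-minimal consequences built only from `literals`."""
    allowed = set(literals)
    candidates = {c for c in saturate(theory)
                  if c <= allowed and (max_len is None or len(c) <= max_len)}
    # For propositional clauses, d subsumes c when d is a subset of c.
    return {c for c in candidates if not any(d < c for d in candidates)}


result = carc([{"-a", "b"}, {"-b", "c"}], ["-a", "c"])
print([sorted(c) for c in result])
```

With T = {¬a ∨ b, ¬b ∨ c} and L = {¬a, c}, the only characteristic clause is ¬a ∨ c.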


Partition-based CF

The task
◦ Consequence Finding (CF) in clausal theories
Input
◦ A set of clausal theories Ti such that ∪Ti = T, and a set of reasoners ai, one associated with each partition
◦ A production field P = <L, Cond>
Output
◦ Carc(T, P)
Where
◦ The output should be produced through local computations and interactions between reasoners (message exchange)

Partition-based Consequence Finding

Generalization of Partition-based Theorem Proving [Amir & McIlraith, 2005]
◦ Based on Craig’s Interpolation Theorem: if C entails D, then there is a formula F involving only symbols common to C and D such that C entails F and F entails D (C ⊨ F ⊨ D).

Principles
◦ Identify common symbols (communication languages)
◦ Build a tree structure (cycle-cut)
◦ Forward relevant consequences from leaf to root

Communication languages

Graph induced from the partition
Problem: eliminate cycles from it while ensuring a proper labeling.

Cycle-cut
While (G not acyclic)
  Take a minimal cycle S = (i1,i2), (i2,i3), …, (ip,i1).
  Choose (i,j) in S s.t. Σ_{(p,q)∈S, (p,q)≠(i,j)} |l(p,q) ∪ l(i,j)| is minimal
  For each (q,r) ≠ (i,j) in S, l(q,r) ← l(q,r) ∪ l(i,j)
  Remove (i,j) from E

[Figure: example graph with nodes labeled abc, bfg, ade and acdf, and edges labeled with shared symbols (a, ac, b, f, ad).]
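The cycle-cut loop above can be sketched as follows. Two points are assumptions of this illustration: "minimal cycle" is read as a shortest cycle, and the graph is stored as a dictionary from undirected edges (frozensets of two nodes) to label sets. The helper names are hypothetical.

```python
from collections import deque


def shortest_cycle(edges):
    """Return a shortest cycle as a list of edges, or None if the graph is acyclic."""
    best = None
    for edge in edges:
        u, v = tuple(edge)
        # Adjacency of the graph without `edge`.
        adj = {}
        for e in edges:
            if e != edge:
                a, b = tuple(e)
                adj.setdefault(a, []).append(b)
                adj.setdefault(b, []).append(a)
        # Breadth-first search from u towards v.
        parent = {u: None}
        queue = deque([u])
        while queue and v not in parent:
            x = queue.popleft()
            for y in adj.get(x, []):
                if y not in parent:
                    parent[y] = x
                    queue.append(y)
        if v in parent:
            path, node = [], v
            while parent[node] is not None:
                path.append(frozenset({node, parent[node]}))
                node = parent[node]
            cycle = path + [edge]
            if best is None or len(cycle) < len(best):
                best = cycle
    return best


def cycle_cut(edges):
    """Remove edges until the graph is acyclic, merging labels as in the slide."""
    edges = {e: set(label) for e, label in edges.items()}
    while True:
        cycle = shortest_cycle(edges)
        if cycle is None:
            return edges

        def cost(cut):
            # Total label size on the rest of the cycle after merging l(cut).
            return sum(len(edges[e] | edges[cut]) for e in cycle if e != cut)

        cut = min(cycle, key=cost)
        for e in cycle:
            if e != cut:
                edges[e] |= edges[cut]
        del edges[cut]


# Hypothetical triangle of three partitions 1, 2, 3.
graph = {
    frozenset({1, 2}): {"a"},
    frozenset({2, 3}): {"b"},
    frozenset({1, 3}): {"a", "c"},
}
tree = cycle_cut(graph)
print({tuple(sorted(e)): sorted(label) for e, label in tree.items()})
```

On this triangle, cutting the edge (1,2) costs 4 while either other choice costs 5, so (1,2) is removed and its label a is added to the two remaining edges.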


Forward Message-passing Algorithm (Sequential)

Preprocessing
◦ Determine initial l(i,j)
◦ Apply Cycle-cut
◦ Determine Pi
  Non-root agents ai (with parent aj): Pi = <L ∪ l(i,j)>
  Root ak: Pk = P

Consequence-Finding
◦ From leaves to root
  Determine Cni = Carc(Σi, Pi)
  Forward Cni

[Figure: tree of agents, each computing Carc and forwarding its results towards the root.]
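A compact sketch of the sequential forward pass is given below. It assumes that the tree and the link languages l(i,j) have already been computed, ignores the Cond part of the production field, and uses a naive saturation-based `carc`. The names `forward_cf`, `parent` and `link_symbols` are hypothetical, chosen for this illustration.

```python
from itertools import combinations


def negate(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit


def carc(theory, literals):
    """Naive Carc: saturate by resolution, filter on literals, keep minimal clauses."""
    clauses = {frozenset(c) for c in theory}
    clauses = {c for c in clauses if not any(negate(l) in c for l in c)}
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for lit in c1:
                if negate(lit) in c2:
                    r = (c1 - {lit}) | (c2 - {negate(lit)})
                    if not any(negate(l) in r for l in r):
                        new.add(r)
        if new <= clauses:
            break
        clauses |= new
    allowed = set(literals)
    candidates = {c for c in clauses if c <= allowed}
    return {c for c in candidates if not any(d < c for d in candidates)}


def forward_cf(theories, parent, link_symbols, literals):
    """Compute consequences from the leaves to the root of the agent tree."""
    children = {agent: [] for agent in theories}
    root = None
    for agent, par in parent.items():
        if par is None:
            root = agent
        else:
            children[par].append(agent)

    def solve(agent):
        local = [frozenset(c) for c in theories[agent]]
        for child in children[agent]:
            local.extend(solve(child))  # messages received from the subtree
        if parent[agent] is None:
            field = set(literals)  # root: Pk = P
        else:
            link = link_symbols[(agent, parent[agent])]
            # Non-root: literals of L plus both polarities of l(i,j)
            field = set(literals) | link | {negate(s) for s in link}
        return carc(local, field)

    return solve(root)


theories = {"a1": [{"-a", "b"}], "a2": [{"-b", "c"}]}
parent = {"a1": "a2", "a2": None}
link_symbols = {("a1", "a2"): {"b"}}
result = forward_cf(theories, parent, link_symbols, ["-a", "c"])
print([sorted(c) for c in result])
```

Here the leaf a1 forwards ¬a ∨ b over the link language {b}, and the root a2 resolves it with ¬b ∨ c to obtain ¬a ∨ c, the same answer a centralized computation gives on the union of the two theories.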


Parallel Variant

[Figure: tree of agents running Carc, followed by Newcarc steps.]

Incremental computations: Newcarc(T ∪ C, P) = Carc(T ∪ C, P) \ Carc(T, P)


Decomposition of clausal theories

Given a clausal theory T, find a set of partitions Ti such that
◦ ∪Ti = T
◦ Reasoning is easier, i.e. the application of the partition-based algorithm to this decomposition is as efficient as possible.
  Minimize the size of the communication languages
  Ensure that some simplification can be done locally

Partitions should be cohesive and loosely coupled.

Graph representation

A clausal theory can be represented as a graph.
Focus on common symbols.

c1: ¬b∨c∨e∨f
c2: ¬a∨d∨e
c3: ¬d∨g∨h
c4: ¬e∨g
c5: ¬g∨¬h∨i

[Figure: the example drawn as a graph over clauses c1 to c5 and symbols a to i, then reduced to a graph over the clauses only, with edges labeled by common symbols (e, e, d, {g,h}, g) and finally by weights (1, 1, 1, 2, 1).]
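The reduced clause graph for c1 to c5 can be rebuilt with the short sketch below. The linking rule is an assumption: two clauses are connected when one contains a literal and the other its negation. That rule reproduces the five edge labels and weights listed for this example, but the slides do not state it explicitly.

```python
from itertools import combinations

# Example clauses; "-x" encodes the literal ¬x (encoding assumed for this sketch).
clauses = {
    "c1": {"-b", "c", "e", "f"},
    "c2": {"-a", "d", "e"},
    "c3": {"-d", "g", "h"},
    "c4": {"-e", "g"},
    "c5": {"-g", "-h", "i"},
}


def negate(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit


def clause_graph(clauses):
    """Map each pair of linked clauses to the symbols they can resolve on."""
    graph = {}
    for (n1, c1), (n2, c2) in combinations(sorted(clauses.items()), 2):
        shared = {lit.lstrip("-") for lit in c1 if negate(lit) in c2}
        if shared:
            graph[(n1, n2)] = shared
    return graph


graph = clause_graph(clauses)
weights = {edge: len(symbols) for edge, symbols in graph.items()}
for edge in sorted(graph):
    print(edge, sorted(graph[edge]), weights[edge])
```

The weights are the kind of edge weights a graph partitioner can then minimize across partition boundaries.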


Architecture

[Figure: processing pipeline from the initial theory (.sol file) through buildGraph, the reduced graph representation, kmetis, the partitioned graph, graph2dcf and the partitioned clausal theory (.dcf file) to partition-based CF, which produces the solution. Additional inputs shown: number of partitions, root.]

Root choice heuristic: choose the root with maximal average clause size
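The root choice heuristic is simple enough to sketch directly. The data layout (a dictionary from agent name to a list of clauses) and the function name `choose_root` are assumptions for this illustration, and every partition is assumed to be non-empty.

```python
def choose_root(partitions):
    """Pick the partition whose clauses have the largest average size."""

    def average_clause_size(name):
        clauses = partitions[name]
        return sum(len(c) for c in clauses) / len(clauses)

    return max(partitions, key=average_clause_size)


# Hypothetical partitioned theory with two agents.
partitions = {
    "ag0": [{"-a", "b"}, {"-b", "c", "d"}],  # average size 2.5
    "ag1": [{"-c", "e"}],                     # average size 2.0
}
root = choose_root(partitions)
print(root)
```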


Problem Decomposition

[Figure: the Krebs cycle network from the example, including the glycolysis path, partitioned among agents ag0 to ag5.]


Benchmark Problems

Biological networks

TPTP problems
◦ Production field
  Vocabulary of conjecture (+ removing conjecture)
  Full vocabulary with length limit

SAT problems
◦ Production field
  Based on frequency of literals: N% most/least frequent literals
◦ Size
  Problems still not tractable as CF problems
  Solving only a cohesive sub-problem (obtained by partition of the clause graph)

Problem characteristics

Results – Biological Networks

2 682 252 (3 321 857)


Results – SAT problems


Results – TPTP problems


Results - summary

[Figure: scatter plot with both axes on a logarithmic scale from 100 to 100000000; series Seq-best and Par-best, plus a reference line.]

Results - summary

[Figure: scatter plot with both axes on a logarithmic scale from 100 to 100000000; series Seq-heur and Par-heur, plus a reference line.]

Results

For almost all problems, decomposition can reduce the number of resolve operations needed.
◦ In particular, it can solve some problems that could not be solved without it

Time is not often improved
◦ Due to communication time (parsing and such)

Approximate decomposition with metis: ok.

Root choice heuristic: still insufficient, though not bad for biological network problems.


Conclusion

A sound and complete algorithm combined with automated problem decomposition
◦ Can increase efficiency (number of operations) for almost all problems
◦ But results depend on the choice of root

Future work

Partition-based algorithm
◦ Variant for Newcarc computations
◦ Common theories for first-order representations
◦ Ordered partitions to break cycles (without removing links)

Decomposition
◦ Directly from metabolic pathway
◦ Root choice heuristic
  Learning a preference relation on root choice
◦ Choosing the number of partitions

Thank you for your attention

Any questions?
