Structure Refinement in First Order Conditional Influence Language


Sriraam Natarajan, Weng-Keen Wong, Prasad Tadepalli
School of EECS, Oregon State University

Weighted Mean{
  If {task(t), doc(d), role(d,r,t)} then t.id, r.id Qinf (Mean) d.folder
  If {doc(s), doc(d), source(s,d)} then s.folder Qinf (Mean) d.folder
}
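As a rough illustration of how the combining rules named in the program above might be evaluated: every instantiation of a rule yields a distribution over d.folder, the instantiations of one rule are averaged (Mean), and the per-rule results are merged with a Weighted Mean. The sketch assumes distributions are plain dictionaries and that the rule weights are already known. The function names, folder names, weights, and probabilities are all hypothetical and do not come from the poster.

```python
from collections import defaultdict


def mean(dists):
    """Average several distributions over the same variable (the Mean rule)."""
    out = defaultdict(float)
    for dist in dists:
        for value, p in dist.items():
            out[value] += p / len(dists)
    return dict(out)


def weighted_mean(dists, weights):
    """Combine one distribution per rule using normalized rule weights."""
    total = sum(weights)
    out = defaultdict(float)
    for dist, w in zip(dists, weights):
        for value, p in dist.items():
            out[value] += (w / total) * p
    return dict(out)


# Hypothetical distributions over d.folder, one per rule instantiation.
task_role_rule = [{"A": 0.7, "B": 0.3}, {"A": 0.5, "B": 0.5}]  # (t1, r1), (t2, r2)
source_rule = [{"A": 0.2, "B": 0.8}, {"A": 0.4, "B": 0.6}]     # s1, s2

# Mean within each rule, then Weighted Mean across rules (weights assumed).
per_rule = [mean(task_role_rule), mean(source_rule)]
folder_dist = weighted_mean(per_rule, weights=[0.6, 0.4])
print(folder_dist)
```

Normalizing the weights inside `weighted_mean` keeps the result a proper distribution even if the supplied weights do not sum to one.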

[Figure: network over t1.id, r1.id, t2.id, r2.id, s1.folder, s2.folder, and d.folder, with weights W1 and W2]

“Unrolled” Network for Folder Prediction

First-order Conditional Influence Language (FOCIL)

[Figure: network over t1.id, t2.id, r1.id, r2.id, t1.cd, t2.cd, t1.la, t2.la, s1.f, s2.f, s1.la, s2.la, and d.f]

Prior Network

[Figure: network over t1.id, t2.id, r1.id, r2.id, s1.f, s2.f, and d.f]

Learned Network

Prior Program

Learned Program

• Conditional BIC score = -2 * CLL + d_m log N

• Different instantiations of the same rule share parameters

• Conditional Likelihood: EM – Maximize the joint likelihood

• CBIC score with penalty scaled down

• Greedy search with random restarts
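The scoring and search bullets above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it treats a structure as a subset of candidate parent attributes, uses a toggle-one-attribute neighborhood, and takes lower CBIC as better (the sign convention implied by -2 * CLL plus a penalty). The `penalty_scale` parameter, the `toy_score` function, and all numbers in it are hypothetical stand-ins for a score that would normally come from parameter learning on data.

```python
import math
import random


def cbic(cll, num_params, num_examples, penalty_scale=1.0):
    """Conditional BIC: -2 * CLL + penalty_scale * d * log N (lower is better)."""
    return -2.0 * cll + penalty_scale * num_params * math.log(num_examples)


def hill_climb_with_restarts(candidates, score, restarts=5, seed=0):
    """Greedy search over attribute subsets, restarted from random subsets."""
    rng = random.Random(seed)
    best_set, best_score = None, math.inf
    for _ in range(restarts):
        # Random starting structure: include each attribute with probability 0.5.
        current = frozenset(a for a in candidates if rng.random() < 0.5)
        current_score = score(current)
        improved = True
        while improved:
            improved = False
            for a in candidates:
                neighbor = current ^ {a}  # add or remove one attribute
                s = score(neighbor)
                if s < current_score:
                    current, current_score, improved = neighbor, s, True
        if current_score < best_score:
            best_set, best_score = current, current_score
    return best_set, best_score


# Hypothetical score: each missing relevant attribute costs conditional
# log-likelihood, and each included attribute adds parameters.
RELEVANT = {"t.id", "r.id", "s.folder"}


def toy_score(attrs):
    cll = -100.0 - 20.0 * len(RELEVANT - attrs)
    return cbic(cll, num_params=10 * len(attrs), num_examples=500,
                penalty_scale=0.5)


candidates = ["t.id", "r.id", "t.creationDate", "t.lastAccessed",
              "s.folder", "s.lastAccessed"]
best_set, best_score = hill_climb_with_restarts(candidates, toy_score)
print(sorted(best_set), round(best_score, 2))
```

With this toy score the search keeps the three relevant attributes and drops the rest. In a real setting the restarts matter because the score surface need not be as well behaved as in this example.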

Scoring metric

Folder Prediction

Rank    Exhaustive - R    HC+RR - R    Exhaustive - I    HC+RR - I
1       349               354          312               311
2       107               98           128               130
3       22                26           26                26
4       15                12           20                23
5       6                 4            3                 4
6       0                 0            1                 1
7       1                 4            2                 0
8       0                 2            1                 2
9       0                 0            0                 0
10      0                 0            0                 0
11      0                 0            2                 3
Score   0.8299            0.8325       0.7926            0.7841
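The poster does not state how the Score row is computed. One reading, offered here as an assumption only, is the mean reciprocal rank over the rank histogram: each prediction contributes 1/rank, averaged over all predictions. This reproduces the Exhaustive - R value; the other columns come out within a few thousandths of the reported scores, so the authors' exact definition may differ. The function name below is hypothetical.

```python
def mean_reciprocal_rank(rank_counts):
    """Average of 1/rank over all predictions, given counts per rank."""
    total = sum(rank_counts.values())
    return sum(count / rank for rank, count in rank_counts.items()) / total


# Rank histogram from the Exhaustive - R column of the table above.
exhaustive_r = {1: 349, 2: 107, 3: 22, 4: 15, 5: 6, 6: 0,
                7: 1, 8: 0, 9: 0, 10: 0, 11: 0}

score = mean_reciprocal_rank(exhaustive_r)
print(round(score, 4))
```

For this column the result rounds to 0.8299, matching the table.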

Synthetic Data Set

Irrelevant attributes

Relevant attributes

• Data is expensive – Exploit prior knowledge in structure search

• Derived the CBIC score for our setting

• Learned the “true” network in the synthetic dataset

• Folder dataset: Learned the best network with only relevant attributes

• Folder dataset with irrelevant attributes:

Conclusions

CBIC_learned < CBIC_best

• Different scoring metrics

  • BDeu

  • Bias/Variance

• Choose the best combining rule that fits the data

• Structure refinement in large real-world domains

Future work

• What is the correct complexity penalty in the presence of multi-valued variables?

• Counting the # of parameters may not be the right solution

• What is the right scoring metric in relational setting for classification?

• Can the search space be intelligently pruned?

Issues

$$ d_m = (l - 1) \sum_{i} N_{X_i} $$

$$ \mathrm{CBICScore} = -2 \sum_{r=1}^{N} \log P\left(Y^{r} \mid X^{r}_{1,1}, \ldots, X^{r}_{m,k}\right) + d_m \log N $$

Weighted Mean{
  If {task(t), doc(d), role(d,r,t)} then t.id, r.id Qinf (Mean) d.folder
  If {doc(s), doc(d), source(s,d)} then s.folder Qinf (Mean) d.folder
}

Weighted Mean{
  If {task(t), doc(d), role(d,r,t)} then t.id, r.id, t.creationDate, t.lastAccessed Qinf (Mean) d.folder
  If {doc(s), doc(d), source(s,d)} then s.folder, s.lastAccessed Qinf (Mean) d.folder
}

Documents
