1
Learning the Structure of Markov Logic
Networks
Stanley Kok
2
Overview
  Introduction
  CLAUDIEN, CRFs
  Algorithm
    Evaluation Measure
    Clause Construction
    Search Strategies
  Speedup Techniques
  Experiments
3
Introduction
  Richardson & Domingos (2004) learned MLN structure in two disjoint steps:
    Learn first-order clauses with an off-the-shelf ILP system (CLAUDIEN)
    Learn clause weights by optimizing pseudo-likelihood
  We develop an algorithm that:
    Learns first-order clauses by directly optimizing pseudo-likelihood
    Is fast enough to be practical
    Learns better structure than R&D, pure ILP, and purely probabilistic and purely KB approaches
4
CLAUDIEN (CLAUsal DIscovery ENgine)
  Starts with the trivially false clause
  Repeatedly refines current clauses by adding literals
  Adds clauses that satisfy minimum accuracy and coverage to the KB

  true => false
  m => false    f => false    h => false
  m^f => false    m => h    m => f    m^h => false
  f => h    f => m    f^h => false    h => f    h => m
  h => m v f
5
CLAUDIEN
  Language bias: clause templates
  Can refine a handcrafted KB. Example:
    Professor(P) <= AdvisedBy(S,P) in KB
    dlab_template('1-2:[Professor(P),Student(S)] <- AdvisedBy(S,P)')
    yields Professor(P) v Student(S) <= AdvisedBy(S,P)
6
Conditional Random Fields
  Markov networks used to compute P(y|x) (McCallum, 2003)
  Model: P(y|x) = (1/Z_x) exp(sum_k lambda_k f_k(x, y))
  Features f_k, e.g. "current word is capitalized and next word is Inc"

  [Figure: linear chain of labels y1, y2, ..., yn over the observed sequence x1, x2, ..., xn; e.g. "IBM hired Alice ..." labeled Org, Misc, Person, Misc]
7
CRF – Feature Induction
  Set of atomic features (word=the, capitalized, etc.)
  Starts from an empty CRF
  While convergence criterion is not met:
    Create a list of new features consisting of:
      Atomic features
      Binary conjunctions of atomic features
      Conjunctions of atomic features with features already in the model
    Evaluate the gain in P(y|x) of adding each feature to the model
    Add the best K features to the model (100s-1000s of features)
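The induction loop above can be sketched as follows, with features represented as frozensets of atomic feature names read as conjunctions. The gain function is supplied by the caller; this is an illustrative sketch, not McCallum's implementation.

```python
def induce_features(atomic, gain, rounds=2, top_k=2):
    """Greedy feature-induction sketch: features are frozensets of
    atomic feature names; candidates each round are atoms, pairs of
    atoms, and conjunctions of an atom with an existing model feature."""
    model = []
    for _ in range(rounds):
        # Candidate features for this round.
        cands = {frozenset([a]) for a in atomic}
        cands |= {frozenset([a, b]) for a in atomic for b in atomic if a != b}
        cands |= {f | {a} for f in model for a in atomic}
        cands -= set(model)
        # Keep the top_k candidates with positive gain.
        best = [f for f in sorted(cands, key=gain, reverse=True)[:top_k]
                if gain(f) > 0]
        if not best:
            break
        model.extend(best)
    return model
```

With `gain=len` (conjunction size as a toy gain), the model grows from pairs of atoms toward their full conjunction.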
8
Algorithm
  High-level algorithm:

  Repeat
    Clauses <- FindBestClauses(MLN)
    Add Clauses to MLN
  Until Clauses = {}

  FindBestClauses(MLN)
    Search for, and create, candidate clauses
    For each candidate clause c
      Compute gain in evaluation measure of adding c to MLN
    Return k clauses with highest gain
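The high-level loop above can be sketched in Python; the MLN is just a list of clause strings here, and `toy_find_best` is a hypothetical stand-in for FindBestClauses (the real one scores candidates by gain in the evaluation measure).

```python
def learn_structure(mln, find_best_clauses, max_iters=100):
    """Top-level loop: repeatedly add the best candidate clauses
    until no clause improves the evaluation measure."""
    for _ in range(max_iters):
        clauses = find_best_clauses(mln)
        if not clauses:          # Clauses = {} : stop
            break
        mln.extend(clauses)
    return mln

def toy_find_best(mln):
    """Stub: proposes one illustrative clause per round, then stops."""
    proposals = ["!Student(S) v AdvBy(S,P)", "!Prof(P) v AdvBy(S,P)"]
    return [c for c in proposals if c not in mln][:1]
```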
9
Evaluation Measure
  Ideally use log-likelihood, but it is slow to compute
  Recall:
    Value:    log P_w(X=x) = sum_i w_i n_i(x) - log Z_w
    Gradient: d(log P_w(X=x))/dw_i = n_i(x) - E_w[n_i(x)]
  Both the partition function Z_w and the expected counts E_w[n_i(x)] require inference over the whole model
10
Evaluation Measure
  Use pseudo-log-likelihood (R&D, 2004):
    log P*_w(X=x) = sum_l log P_w(X_l = x_l | MB_x(X_l))
  But the sum runs over every ground predicate l, so it gives undue weight to predicates with a large number of groundings
11
Evaluation Measure
  Use weighted pseudo-log-likelihood (WPLL):
    log P*_w(X=x) = sum_r c_r sum_{g in G_r} log P_w(X_g = x_g | MB_x(X_g))
  where r ranges over first-order predicates, G_r is the set of groundings of r, and c_r = 1/|G_r|, so each predicate contributes equally regardless of its number of groundings
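A minimal sketch of the WPLL computation, assuming the per-ground-atom conditional log-likelihoods have already been computed. The `cll_by_ground_atom` dictionary and its `(predicate, args)` key format are illustrative, not from the original system.

```python
from collections import defaultdict

def wpll(cll_by_ground_atom):
    """Weighted pseudo-log-likelihood: each first-order predicate r
    gets weight c_r = 1/|G_r|, i.e. the CLLs of its groundings are
    averaged, so no predicate dominates by sheer grounding count."""
    by_pred = defaultdict(list)
    for (pred, _args), cll in cll_by_ground_atom.items():
        by_pred[pred].append(cll)
    # Sum of per-predicate averages.
    return sum(sum(clls) / len(clls) for clls in by_pred.values())
```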
12
Algorithm
  High-level algorithm:

  Repeat
    Clauses <- FindBestClauses(MLN)
    Add Clauses to MLN
  Until Clauses = {}

  FindBestClauses(MLN)
    Search for, and create, candidate clauses
    For each candidate clause c
      Compute gain in evaluation measure of adding c to MLN
    Return k clauses with highest gain
13
Clause Construction
  Add a literal (negative/positive)
    All possible ways the new literal's variables can be shared with those of the clause
    e.g. !Student(S) v AdvBy(S,P)
  Remove a literal (when refining an MLN)
    Removes spurious conditions from rules
    e.g. !Student(S) v !YrInPgm(S,5) v TA(S,C) v TmpAdvBy(S,P)
14
Clause Construction
  Flip signs of literals (when refining an MLN)
    Moves literals on the wrong side of an implication
    e.g. !CseQtr(C1,Q1) v !CseQtr(C2,Q2) v !SameCse(C1,C2) v !SameQtr(Q1,Q2)
    Done at the beginning of the algorithm; expensive, optional
  Limit # of distinct variables to restrict the search space
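The "add a literal" operator can be sketched as follows. `literal_additions` is a hypothetical helper that only handles variable sharing and the distinct-variable cap; the real system also respects predicate typing and other constraints.

```python
from itertools import product

def literal_additions(clause_vars, pred, arity, max_vars=4):
    """Enumerate every way a new literal's arguments can share
    variables with the clause; allow at most one fresh variable,
    and cap the number of distinct variables (max_vars) to
    restrict the search space."""
    pool = list(clause_vars)
    if len(clause_vars) < max_vars:
        pool.append("V%d" % len(clause_vars))  # one fresh variable
    lits = []
    for args in product(pool, repeat=arity):
        atom = "%s(%s)" % (pred, ",".join(args))
        lits.append(atom)        # positive literal
        lits.append("!" + atom)  # negative literal
    return lits
```

For a clause over variables S and P, adding a binary predicate yields 3^2 argument tuples, each in positive and negated form.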
15
Algorithm
  High-level algorithm:

  Repeat
    Clauses <- FindBestClauses(MLN)
    Add Clauses to MLN
  Until Clauses = {}

  FindBestClauses(MLN)
    Search for, and create, candidate clauses
    For each candidate clause c
      Compute gain in evaluation measure of adding c to MLN
    Return k clauses with highest gain
16
Search Strategies
  Shortest-first search (SFS)
    1. Find the gain of each clause in the candidate set
    2. Sort the clauses by gain
    3. Return the top 5 with positive gain (e.g. !AdvBy(S,P) v Stu(S))
    4. Add the 5 clauses to the MLN (wt1, !AdvBy(S,P); wt2, clause2; ...)
    5. Retrain the weights of the MLN
  (Yikes! All length-2 clauses have gains <= 0)
17
Shortest-First Search
  a. Extend the 20 length-2 clauses with the highest gains
     e.g. !AdvBy(S,P) v Stu(S) becomes !AdvBy(S,P) v Stu(S) v Prof(P)
  b. Form a new candidate set
  c. Keep the 1000 clauses with the highest gains
18
Shortest-First Search
  Shortest-first search (SFS)
    Repeat the process
    Extend all length-2 clauses before length-3 ones
  How do you refine a non-empty MLN?
19
SFS – MLN Refinement
  a. Extend the 20 length-2 clauses with the highest gains
  b. Extend the length-2 clauses in the MLN
  c. Remove a predicate from length-4 clauses in the MLN
  d. Flip signs of length-3 clauses in the MLN (optional)
  e. Clauses produced by b, c, d replace the original clause in the MLN
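One SFS round can be sketched under the simplifying assumption that all current candidates have the same length. `sfs_round`, the gain function, and the extend operator are illustrative stand-ins, not the original implementation.

```python
def sfs_round(candidates, gain, extend, top_k=5, extend_top=20, beam=1000):
    """One round of shortest-first search over same-length candidates:
    add the top_k positive-gain clauses, then extend the extend_top
    best clauses to seed the next (longer) candidate set, keeping
    only `beam` of the extensions."""
    ranked = sorted(candidates, key=gain, reverse=True)
    added = [c for c in ranked[:top_k] if gain(c) > 0]
    next_cands = [e for c in ranked[:extend_top] for e in extend(c)]
    next_cands.sort(key=gain, reverse=True)
    return added, next_cands[:beam]
```

A toy run, with clauses as tuples of integers and gain = sum: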
20
Search Strategies
  Beam search
    1. Keep a beam of the 5 clauses with the highest gains
    2. Track the best clause
    3. Stop when the best clause does not change after two consecutive iterations
  How do you refine a non-empty MLN?
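A minimal beam-search sketch with the stopping rule from the slide; `gain` and `extend` are supplied by the caller and are assumptions, not the original implementation.

```python
def beam_search(initial, gain, extend, beam_width=5, patience=2):
    """Keep a beam of the best refinements; track the single best
    clause; stop when it is unchanged for `patience` consecutive
    iterations (the slide uses two)."""
    beam = sorted(initial, key=gain, reverse=True)[:beam_width]
    best = beam[0]
    stale = 0
    while stale < patience:
        refinements = [r for c in beam for r in extend(c)]
        if not refinements:      # nothing left to refine
            break
        beam = sorted(refinements, key=gain, reverse=True)[:beam_width]
        if gain(beam[0]) > gain(best):
            best, stale = beam[0], 0
        else:
            stale += 1
    return best
```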
21
Algorithm
  High-level algorithm:

  Repeat
    Clauses <- FindBestClauses(MLN)
    Add Clauses to MLN
  Until Clauses = {}

  FindBestClauses(MLN)
    Search for, and create, candidate clauses
    For each candidate clause c
      Compute gain in evaluation measure of adding c to MLN
    Return k clauses with highest gain
22
Difference from CRF – Feature Induction
  CRF feature induction (recap):
    Set of atomic features (word=the, capitalized, etc.)
    Start from an empty CRF
    While convergence criterion is not met:
      Create a list of new features (atomic features, binary conjunctions of atomic features, conjunctions of atomic features with features already in the model)
      Evaluate the gain in P(y|x) of adding each feature to the model
      Add the best K features to the model (100s-1000s of features)
  Differences:
    We can refine a non-empty MLN
    We use pseudo-likelihood; different optimizations
    Applicable to arbitrary Markov networks (not only linear chains)
    We maintain a separate candidate set
    We add only the best ~10s of clauses to the model
    Flexible enough to fit different search algorithms
23
Overview
  Introduction
  CLAUDIEN, CRFs
  Algorithm
    Evaluation Measure
    Clause Construction
    Search Strategies
  Speedup Techniques
  Experiments
24
Speedup Techniques
  Recall:
    FindBestClauses(MLN)
      Search for, and create, candidate clauses
      For each candidate clause c
        Compute gain in WPLL of adding c to MLN
      Return k clauses with highest gain

  LearnWeights(MLN+c) optimizes the WPLL with L-BFGS
  L-BFGS computes the value and gradient of the WPLL
  There are many candidate clauses, so it is important to compute the WPLL and its gradient efficiently
25
Speedup Techniques
  The WPLL is a sum of conditional log-likelihoods (CLLs), one per ground predicate
  When computing a ground predicate's CLL, ignore clauses in which the predicate does not appear
    e.g. if predicate l does not appear in clause 1, clause 1 cannot affect l's CLL
26
Speedup Techniques
  A ground predicate's CLL is affected only by the clauses that contain it
  Most clause weights do not change significantly between iterations
    So most CLLs do not change much
    So we don't have to recompute all CLLs
  Store the WPLL and the CLLs
    Recompute a CLL only if the weights affecting it change beyond some threshold
    Subtract the old CLLs from, and add the new CLLs to, the WPLL
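The caching idea can be sketched as follows. This is a simplified model: one CLL per ground atom, the per-predicate 1/|G_r| weighting is omitted for brevity, and `compute_cll` is a hypothetical callback.

```python
class CLLCache:
    """Cache ground-atom CLLs; recompute one only when the change in
    the weights of clauses containing that atom crosses a threshold,
    and patch the stored WPLL by the difference."""
    def __init__(self, compute_cll, threshold=1e-4):
        self.compute_cll = compute_cll
        self.threshold = threshold
        self.cll = {}
        self.wpll = 0.0

    def refresh(self, atom, weight_delta):
        if atom in self.cll and abs(weight_delta) < self.threshold:
            return self.cll[atom]  # weights barely moved: reuse old CLL
        new = self.compute_cll(atom)
        # Subtract the old CLL, add the new one.
        self.wpll += new - self.cll.get(atom, 0.0)
        self.cll[atom] = new
        return new
```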
27
Speedup Techniques
  The WPLL is a sum over all ground predicates
  Estimate the WPLL by uniformly sampling the groundings of each first-order predicate
    Sample x% of the groundings, subject to a minimum and maximum
    Extrapolate from the sample average
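A sketch of the subsampling estimate for one predicate, assuming its groundings' CLLs are given as a list; the function and parameter names are illustrative.

```python
import random

def estimated_predicate_cll(clls, frac=0.05, min_n=10, max_n=1000, seed=0):
    """Estimate a predicate's total CLL contribution by uniformly
    sampling a fraction of its groundings' CLLs (clamped to
    [min_n, max_n]) and extrapolating the sample average."""
    rng = random.Random(seed)
    n = int(frac * len(clls))
    n = max(min_n, min(max_n, n))   # clamp to [min_n, max_n]
    n = min(n, len(clls))           # can't sample more than we have
    sample = rng.sample(clls, n)
    return sum(sample) / n * len(clls)  # mean * total groundings
```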
28
Speedup Techniques
  The WPLL and its gradient require computing the number of true groundings of a clause
    a #P-complete problem
  Karp & Luby (1983)'s Monte-Carlo algorithm
    Gives an estimate that is within epsilon of the true value with probability 1 - delta
    Draws random samples of the clause's groundings
  We found that the estimate converges faster than the algorithm specifies
    So we apply a convergence test (DeGroot & Schervish, 2002) after every 100 samples
    Earlier termination
29
Speedup Techniques
  L-BFGS is used to learn clause weights to optimize the WPLL
  Two parameters:
    Maximum number of iterations
    Convergence threshold
  Use a smaller maximum number of iterations and a looser convergence threshold when evaluating a candidate clause's gain
    Faster termination
30
Speedup Techniques
  Impose a lexicographic ordering on clauses
    Avoids redundant computations for clauses that are syntactically the same
    Does not detect semantically identical but syntactically different clauses (an NP-complete problem)
  Cache new clauses
    Avoids recomputation
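The syntactic-duplicate check can be sketched by canonicalizing each clause before caching it; the clause representation (a list of literal strings) is an assumption.

```python
def canonical(clause):
    """Lexicographic canonical form of a clause (a collection of
    literal strings): sorting makes syntactically identical clauses
    compare equal regardless of literal order.  Semantically identical
    but syntactically different clauses are NOT detected."""
    return tuple(sorted(clause))

def is_new(clause, seen):
    """True (and records the clause) only the first time this
    syntactic clause is proposed, avoiding recomputing its gain."""
    key = canonical(clause)
    if key in seen:
        return False
    seen.add(key)
    return True
```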
31
Speedup Techniques
  Also used R&D (2004)'s techniques for the WPLL gradient:
    Ignore predicates that don't appear in the ith formula
    Ignore ground formulas whose truth value is unaffected by changing the truth value of any single literal
    Compute the number of true groundings of a clause once and cache it
32
Overview
  Introduction
  CLAUDIEN, CRFs
  Algorithm
    Evaluation Measure
    Clause Construction
    Search Strategies
  Speedup Techniques
  Experiments
33
Experiments
  UW-CSE domain
    22 predicates, e.g. AdvisedBy, Professor, etc.
    10 types, e.g. Person, Course, Quarter, etc.
    Total # ground predicates: about 4 million
    # true ground predicates (in DB): 3212
    Handcrafted KB with 94 formulas, e.g.:
      Each student has at most one advisor
      If a student is an author of a paper, so is her advisor
34
Experiments
  Cora domain
    1295 citations to 112 CS research papers
    Author, Venue, Title, Year fields
    5 predicates, viz. SameCitation, SameAuthor, SameVenue, SameTitle, SameYear
    Evidence predicates, e.g. WordsInCommonInTitle20%(title1, title2)
    Total # ground predicates: about 5 million
    # true ground predicates (in DB): 378,589
    Handcrafted KB with 26 clauses, e.g.:
      If two citations are the same, then they have the same authors, titles, etc., and vice versa
      If two titles have many words in common, then they are the same, etc.
35
Systems
  MLN(KB): weight-learning applied to the handcrafted KB
  MLN(CL): structure-learning with CLAUDIEN; then weight-learning
  MLN(KB+CL): structure-learning with CLAUDIEN, using the handcrafted KB as its language bias; then weight-learning
  MLN(SLB): structure-learning with beam search, starting from an empty MLN
  MLN(KB+SLB): ditto, starting from the handcrafted KB
  MLN(SLB+KB): structure-learning with beam search, starting from an empty MLN, allowing handcrafted clauses to be added in a first search step
  MLN(SLS): structure-learning with SFS, starting from an empty MLN
36
Systems
  CL: CLAUDIEN alone
  KB: handcrafted KB alone
  KB+CL: CLAUDIEN with the KB as its language bias
  NB: naive Bayes
  BN: Bayesian networks
37
Methodology
  UW-CSE domain
    DB divided into 5 areas: AI, graphics, languages, systems, theory
    Leave-one-out testing by area
  Cora domain
    5 different train-test splits
  Measured:
    Average CLL of the predicates
    Average area under the precision-recall curve of the predicates (AUC)
38
Results
  MLN(SLS), MLN(SLB) better than MLN(CL), MLN(KB), CL, KB, NB, BN

  [Bar charts: AUC and CLL (negated) per system on UW-CSE]
39
Results
  MLN(SLS), MLN(SLB) better than MLN(CL), MLN(KB), CL, KB, NB, BN

  [Bar charts: AUC and CLL (negated) per system on Cora]
40
Results
  MLN(SLB+KB) better than MLN(KB+CL), KB+CL

  [Bar charts: AUC and CLL (negated) per system on UW-CSE]
41
Results
  MLN(SLB+KB) better than MLN(KB+CL), KB+CL

  [Bar charts: AUC and CLL (negated) per system on Cora]
42
Results
  MLN(<system>) does better than the corresponding <system>

  [Bar charts: AUC and CLL (negated) per system on UW-CSE]
43
Results
  MLN(<system>) does better than the corresponding <system>

  [Bar charts: AUC and CLL (negated) per system on Cora]
44
Results
  MLN(SLS) on UW-CSE; cluster of 15 dual-CPU 2.8 GHz Pentium 4 machines
    With speedups: 5.3 hrs
    Without speedups: didn't finish running in 24 hrs
  MLN(SLB) on UW-CSE; single 2.8 GHz Pentium 4 machine
    With speedups: 8.8 hrs
    Without speedups: 13.7 hrs
45
Future Work
  Speeding up the counting of the # of true groundings of a clause
  Probabilistically bounding the loss in accuracy due to subsampling
  Probabilistic predicate discovery
46
Conclusion
  We developed an algorithm that:
    Learns first-order clauses by directly optimizing pseudo-likelihood
    Is fast enough to be practical
    Learns better structure than R&D, pure ILP, and purely probabilistic and purely KB approaches