Efficient Inference Methods for Probabilistic Logical Models
Sriraam Natarajan
Dept of Computer Science, University of Wisconsin-Madison
Take-Away Message
Inference in SRL models is very hard! This talk presents 3 different yet related inference methods
The methods are independent of the underlying formalism
They have been applied to different kinds of problems
The World is inherently Uncertain
Graphical models (here, a Bayesian network) model uncertainty explicitly by representing the joint distribution
[Figure: Bayesian network over the random variables Influenza, Fever, and Ache; edges denote direct influences.]
Propositional model!
Real-World Data (Dramatically Simplified)
PatientID | Gender | Birthdate
P1 | M | 3/22/63

PatientID | Date | Physician | Symptoms | Diagnosis
P1 | 1/1/01 | Smith | palpitations | hypoglycemic
P1 | 2/1/03 | Jones | fever, aches | influenza

PatientID | Date | Lab Test | Result
P1 | 1/1/01 | blood glucose | 42
P1 | 1/9/01 | blood glucose | 45

PatientID | SNP1 | SNP2 | … | SNP500K
P1 | AA | AB | … | BB
P2 | AB | BB | … | AA

PatientID | Date Prescribed | Date Filled | Physician | Medication | Dose | Duration
P1 | 5/17/98 | 5/18/98 | Jones | prilosec | 10mg | 3 months
Non-i.i.d.
Multi-Relational
Solution: First-Order Logic / Relational Databases
Shared Parameters
Logic + Probability = Probabilistic Logic aka Statistical Relational Learning Models
[Figure: adding probabilities to logic, or adding relations to probabilities, leads to Statistical Relational Learning (SRL).]
Uncertainty in SRL Models is captured by probabilities, weights or potential functions
Alphabetic Soup => Endless Possibilities
Web data (web), biological data (bio), social network analysis (soc), bibliographic data (cite), epidemiological data (epi), communication data (comm), customer networks (cust), collaborative filtering problems (cf), trust networks (trust), …
Fall 2003: Dietterich @ OSU; Spring 2004: Page @ UW; Spring 2007: Neville @ Purdue; Fall 2008: Pedro @ CMU
Probabilistic Relational Models (PRM), Bayesian Logic Programs (BLP), PRISM, Stochastic Logic Programs (SLP), Independent Choice Logic (ICL), Markov Logic Networks (MLN), Relational Markov Nets (RMN), CLP-BN, Relational Bayes Nets (RBN), Probabilistic Logic Program (PLP), ProbLog, …
Key Problem - Inference
Equivalent to counting 3SAT models => #P-complete
More pronounced in SRL models: prohibitively large numbers of objects and relations
Inference has been the biggest bottleneck for the use of SRL models in practice
Grounding / Propositionalization
Satisfaction(S) :- Difficulty(C,D), Grade(S,C,G), with 1 student s1 and 10 courses
[Figure: grounding the rule makes 20 atoms parents of Satisfaction(s1): Diff(c1,d1), Diff(c2,d1), Diff(c3,d2), Diff(c4,d4), Diff(c5,d1), Diff(c6,d3), Diff(c7,d2), Diff(c8,d2), Diff(c9,d4), Diff(c10,d2), and Grade(s1,c1,B), Grade(s1,c2,A), Grade(s1,c3,B), Grade(s1,c4,A), Grade(s1,c5,A), Grade(s1,c6,B), Grade(s1,c7,A), Grade(s1,c8,A), Grade(s1,c9,A), Grade(s1,c10,A).]
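A minimal sketch of this grounding step, assuming a hypothetical dict-based encoding of the slide's evidence (the names `difficulty`, `grade`, and `ground_parents` are illustrative, not from any SRL system). It lists the ground parents of Satisfaction(s1):

```python
# Evidence from the slide, encoded by hand (hypothetical dict-based encoding).
difficulty = {"c1": "d1", "c2": "d1", "c3": "d2", "c4": "d4", "c5": "d1",
              "c6": "d3", "c7": "d2", "c8": "d2", "c9": "d4", "c10": "d2"}
grade = {("s1", "c1"): "B", ("s1", "c2"): "A", ("s1", "c3"): "B",
         ("s1", "c4"): "A", ("s1", "c5"): "A", ("s1", "c6"): "B",
         ("s1", "c7"): "A", ("s1", "c8"): "A", ("s1", "c9"): "A",
         ("s1", "c10"): "A"}


def ground_parents(student):
    """Ground the body of Satisfaction(S) :- Difficulty(C,D), Grade(S,C,G)
    for one student: every graded course contributes two parent atoms."""
    parents = []
    for course, level in difficulty.items():
        if (student, course) in grade:
            parents.append(f"Diff({course},{level})")
            parents.append(f"Grade({student},{course},{grade[(student, course)]})")
    return parents


parents = ground_parents("s1")
print(f"Satisfaction(s1) has {len(parents)} ground parents")
```

One student already needs 20 parent atoms, and each additional student adds ten more Grade atoms, which is why grounding quickly blows up on realistic domains.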
Recent Advances in SRL Inference
Preprocessing for inference: FROG, Shavlik & Natarajan (2009)
Lifted exact inference: Lifted Variable Elimination, Poole (2003), Braz et al. (2005), Milch et al. (2008); Lifted VE + Aggregation, Kisynski & Poole (2009)
Sampling methods: MCMC techniques, Milch & Russell (2006); Logical Particle Filter, Natarajan et al. (2008), Zettlemoyer et al. (2007); Lazy Inference, Poon et al. (2008)
Approximate methods: Lifted First-Order Belief Propagation, Singla & Domingos (2008); Counting Belief Propagation, Kersting et al. (2009); MAP Inference, Riedel (2008)
Bounds propagation: Anytime Belief Propagation, Braz et al. (2009)
Outline: (1) Fast Reduction of Grounded MLNs, (2) Counting Belief Propagation, (3) Anytime Lifted Belief Propagation, (4) Conclusion
Markov Logic Networks
Weighted logic
Standard approach
1) Assume finite number of constants
2) Create all possible groundings
3) Perform statistical inference (often via sampling)
1.5  ∀x: Smokes(x) ⇒ Cancer(x)
1.1  ∀x, y: Friends(x, y) ⇒ (Smokes(x) ⇔ Smokes(y))
$$P(X = x) = \frac{1}{Z} \exp\left(\sum_i w_i \, n_i(x)\right)$$
where w_i is the weight of formula i and n_i(x) is the number of true groundings of formula i in x.
(Richardson & Domingos, MLJ 2005)
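To make the formula concrete, here is a sketch that grounds the two weighted formulas over a hypothetical two-person domain {A, B} and computes P(Cancer(A)) by enumerating every world. Exhaustive enumeration stands in for step 3 only because the domain is tiny (8 ground atoms, 256 worlds); practical systems sample instead. All names are illustrative assumptions.

```python
import itertools
import math

PEOPLE = ["A", "B"]  # hypothetical two-constant domain
W_CANCER, W_FRIENDS = 1.5, 1.1  # weights of the two formulas above

# Step 2 of the standard approach: create all ground atoms.
ATOMS = ([("Smokes", x) for x in PEOPLE] + [("Cancer", x) for x in PEOPLE]
         + [("Friends", x, y) for x in PEOPLE for y in PEOPLE])


def true_groundings(world):
    """n_i(x): numbers of true groundings of each formula in a world."""
    n_cancer = sum(1 for x in PEOPLE
                   if not world[("Smokes", x)] or world[("Cancer", x)])
    n_friends = sum(1 for x in PEOPLE for y in PEOPLE
                    if not world[("Friends", x, y)]
                    or world[("Smokes", x)] == world[("Smokes", y)])
    return n_cancer, n_friends


def weight(world):
    """Unnormalized probability exp(sum_i w_i n_i(x))."""
    n_cancer, n_friends = true_groundings(world)
    return math.exp(W_CANCER * n_cancer + W_FRIENDS * n_friends)


# Step 3, done exactly here: enumerate all 2^8 worlds instead of sampling.
worlds = [dict(zip(ATOMS, values))
          for values in itertools.product([False, True], repeat=len(ATOMS))]
Z = sum(weight(w) for w in worlds)
p_cancer_a = sum(weight(w) for w in worlds if w[("Cancer", "A")]) / Z
print(f"P(Cancer(A)) = {p_cancer_a:.4f}")
```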
Counting Satisfied Groundings
Typically lots of redundancy in FOL sentences
∀x, y, z: p(x) ∧ q(x, y, z) ∧ r(z) ⇒ w(x, y, z)
If p(John) = false, then the formula is true for all Y and Z values
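Factoring Out the Evidence
Let A = weighted sum of the formula groundings satisfied by the evidence
Let B_i = weighted sum of the formula groundings in world i not satisfied by the evidence
$$\text{Prob}(\text{world } i) = \frac{e^{A + B_i}}{e^{A + B_1} + \cdots + e^{A + B_n}} = \frac{e^{B_i}}{e^{B_1} + \cdots + e^{B_n}}$$
Because A is the same in every world consistent with the evidence, the factor e^A cancels.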
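A quick numerical check of the cancellation, with hypothetical values for A and the B_i: the world probabilities are identical whether or not the evidence-satisfied weight A is included.

```python
import math


def world_probs(a, bs):
    """Prob(world i) = e^(A + B_i) / sum_j e^(A + B_j)."""
    scores = [math.exp(a + b) for b in bs]
    total = sum(scores)
    return [s / total for s in scores]


bs = [0.5, 1.2, -0.3]             # hypothetical B_i values for three worlds
with_a = world_probs(40.0, bs)    # A = 40: evidence satisfies many groundings
without_a = world_probs(0.0, bs)  # A dropped, as when satisfied groundings are factored out
print(with_a)
print(without_a)
```

Dropping A also helps numerically: with millions of satisfied groundings, e^A would overflow a double long before the ratio could be computed.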
Take-Away Message - I
Efficiently factor out those formula groundings that evidence satisfies
Can potentially eliminate the need for approximate inference
Worked Example
∀x, y, z: GradStudent(x) ∧ Prof(y) ∧ Prof(z) ∧ TA(x, z) ∧ SameGroup(y, z) ⇒ AdvisedBy(x, y)
10,000 people at some school
2000 graduate students
1000 professors
1000 TAs
500 pairs of professors in the same group
Total number of groundings = |x| × |y| × |z| = 10^12
The Evidence
GradStudent(x): GradStudent(P1), ¬GradStudent(P2), GradStudent(P3), ¬GradStudent(P4), …
True: GradStudent(P1), GradStudent(P3), … (2000 grad students)
False: ¬GradStudent(P2), ¬GradStudent(P4), … (8000 others)
All the false values for X satisfy the clause, regardless of Y and Z, so FROG keeps only the true X values
Instead of 10^4 values for X, we have 2 × 10^3
Number of groundings: 10^12 → 2 × 10^11
Prof(y): ¬Prof(P1), Prof(P2), …
True: Prof(P2), … (1000 professors)
False: ¬Prof(P1), … (9000 others)
Number of groundings: 2 × 10^11 → 2 × 10^10
Prof(z): same as Prof(y)
Number of groundings: 2 × 10^10 → 2 × 10^9
SameGroup(y, z): 10^6 combinations
True: SameGroup(P1, P2), … (1000 true SameGroup's)
False: ¬SameGroup(P2, P5), … (10^6 − 1000 others)
Remaining: 2000 values of X, 1000 Y:Z combinations
Number of groundings: 2 × 10^9 → 2 × 10^6
TA(x, z): 2 × 10^6 combinations
True: TA(P7, P5), … (1000 TA's)
False: ¬TA(P8, P4), … (2 × 10^6 − 1000 others)
Remaining: ≤ 1000 values of X, ≤ 1000 Y:Z combinations
Number of groundings: 2 × 10^6 → ≤ 10^6
Original number of groundings = 10^12
Final number of groundings ≤ 10^6
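The count that survives the evidence can be sketched as a join over the few true atoms, assuming closed-world evidence on GradStudent, Prof, TA, and SameGroup and no evidence on AdvisedBy. This is an illustration of the quantity FROG tracks, not FROG's own literal-by-literal procedure; the tiny dataset and function names are hypothetical.

```python
from itertools import product


def remaining_groundings(grad, prof, ta, same_group):
    """Count groundings of
        GradStudent(x) ^ Prof(y) ^ Prof(z) ^ TA(x,z) ^ SameGroup(y,z) => AdvisedBy(x,y)
    whose body is not falsified by closed-world evidence (AdvisedBy unobserved).
    Joins on the true atoms instead of looping over all x, y, z."""
    partners = {}  # z -> professors y with SameGroup(y, z)
    for y, z in same_group:
        if y in prof and z in prof:
            partners.setdefault(z, []).append(y)
    return sum(len(partners.get(z, [])) for x, z in ta if x in grad and z in prof)


def brute_force_remaining(people, grad, prof, ta, same_group):
    """Same count by checking every one of |people|^3 groundings."""
    return sum(1 for x, y, z in product(people, repeat=3)
               if x in grad and y in prof and z in prof
               and (x, z) in ta and (y, z) in same_group)


# Tiny hypothetical evidence; every atom not listed is false.
people = [f"P{i}" for i in range(1, 11)]
grad = {"P1", "P2", "P3"}
prof = {"P4", "P5", "P6"}
ta = {("P1", "P5"), ("P2", "P6"), ("P7", "P5")}
same_group = {("P4", "P5"), ("P5", "P4")}

print(len(people) ** 3, remaining_groundings(grad, prof, ta, same_group))
```

Only one of the 1000 groundings survives: P7 is not a grad student, and no professor shares a group with P6. The join touches only the true atoms, which mirrors why the reduced count in the worked example can be far below 10^12.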
Sample Results: UWash-CSE
[Figure: number of groundings versus number of constants on UWash-CSE for the fully grounded net, FROG's reduced net, and FROG's reduced net without one challenging rule, advisedBy(x,y) ∧ advisedBy(x,z) ⇒ samePerson(y,z).]
Part 2: Counting Belief Propagation
Belief Propagation
Message-passing algorithm for inference on graphical models, formulated on factor graphs
Exact if the factor graph is a tree; approximate when it has cycles
Loopy BP does not guarantee convergence, but has been found to be very useful in practice
[Figure: factor graph with variables X1, X2, X3 and factors f1, f2.]
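A minimal sum-product sketch, assuming the factor graph is the chain X1 – f1 – X2 – f2 – X3 over binary variables (the figure's exact wiring is not recoverable) with hypothetical factor tables. Because the chain is a tree, BP's beliefs should match brute-force marginals exactly.

```python
from itertools import product

# Hypothetical pairwise factor tables over binary variables, indexed [first][second].
f1 = [[1.0, 2.0], [3.0, 1.0]]  # f1(X1, X2)
f2 = [[2.0, 1.0], [1.0, 4.0]]  # f2(X2, X3)


def normalize(v):
    total = sum(v)
    return [x / total for x in v]


def bp_marginals():
    """Sum-product BP on the chain X1 - f1 - X2 - f2 - X3."""
    # Leaf variables send all-ones messages to their only factor.
    m_x1_to_f1 = [1.0, 1.0]
    m_x3_to_f2 = [1.0, 1.0]
    # Factor-to-variable messages sum out the factor's other argument.
    m_f1_to_x2 = [sum(f1[a][b] * m_x1_to_f1[a] for a in range(2)) for b in range(2)]
    m_f2_to_x2 = [sum(f2[b][c] * m_x3_to_f2[c] for c in range(2)) for b in range(2)]
    # X2 has two factors, so each outgoing message is the other incoming one.
    m_x2_to_f1, m_x2_to_f2 = m_f2_to_x2, m_f1_to_x2
    m_f1_to_x1 = [sum(f1[a][b] * m_x2_to_f1[b] for b in range(2)) for a in range(2)]
    m_f2_to_x3 = [sum(f2[b][c] * m_x2_to_f2[b] for b in range(2)) for c in range(2)]
    # A belief is the normalized product of all incoming factor messages.
    return [normalize(m_f1_to_x1),
            normalize([m_f1_to_x2[b] * m_f2_to_x2[b] for b in range(2)]),
            normalize(m_f2_to_x3)]


def brute_force_marginals():
    """Exact marginals by summing the joint f1 * f2 over all 8 assignments."""
    marginals = [[0.0, 0.0] for _ in range(3)]
    for a, b, c in product(range(2), repeat=3):
        weight = f1[a][b] * f2[b][c]
        marginals[0][a] += weight
        marginals[1][b] += weight
        marginals[2][c] += weight
    return [normalize(m) for m in marginals]


print(bp_marginals())
print(brute_force_marginals())
```

On a graph with cycles the same message updates would simply be iterated (loopy BP), with no exactness or convergence guarantee.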
Take-Away Message – II
Counting shared factors can result in great efficiency gains for (loopy) belief propagation
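The core saving can be illustrated on a star: a variable X connected to k neighbors Y_i through identical factors f(X, Y_i), each Y_i carrying the same hypothetical evidence factor g. Every neighbor sends the same message, so a counting scheme can compute it once and raise it to the power k. This is a minimal illustration of counting identical messages, not the full Counting BP algorithm (which also discovers such groups automatically); all names and tables are assumptions.

```python
import math
from itertools import product

f = [[2.0, 1.0], [1.0, 3.0]]  # hypothetical shared factor f(X, Y_i)
g = [0.3, 0.7]                # hypothetical shared evidence factor on each Y_i


def normalize(v):
    total = sum(v)
    return [x / total for x in v]


def message():
    """Message from one factor f(X, Y_i) to X, after Y_i absorbs g."""
    return [sum(f[x][y] * g[y] for y in range(2)) for x in range(2)]


def ground_belief(k):
    """Ground BP: compute and multiply k separate (identical) messages."""
    belief = [1.0, 1.0]
    for _ in range(k):
        m = message()
        belief = normalize([belief[x] * m[x] for x in range(2)])
    return belief


def counted_belief(k):
    """Counting: compute the shared message once and raise it to the power k.
    Works in log space so that large k does not overflow."""
    logs = [k * math.log(v) for v in message()]
    top = max(logs)
    return normalize([math.exp(l - top) for l in logs])


def brute_force(k):
    """Exact marginal of X by enumerating all 2^(k+1) assignments."""
    marg = [0.0, 0.0]
    for x, *ys in product(range(2), repeat=k + 1):
        p = 1.0
        for y in ys:
            p *= f[x][y] * g[y]
        marg[x] += p
    return normalize(marg)


for k in (1, 2, 3, 4):
    print(k, ground_belief(k), counted_belief(k), brute_force(k))
```

The counted version costs the same for k = 4 or k = 1000, while ground BP grows linearly with k and brute force exponentially.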
Factored Frontier (FF)
Probabilistic inference over time is central to many AI problems
In contrast to static domains, we need approximation: variables easily become correlated over time by virtue of sharing common influences in the past
Factored Frontier [Murphy and Weiss 01]: unroll the DBN and run (loopy) BP
Lifted First-Order FF: use CBP in place of BP
Lifted First-order Factored Frontier
20 people over 10 time steps
Max number of friends: 5
Cancer never observed
Time step randomly selected
Successor fluent
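A propositional sketch of the factored-frontier idea, assuming a hypothetical DBN with two binary variables per slice, no intra-slice arcs, and no observations. FF keeps only per-variable marginals and treats them as independent; the exact filter keeps the full joint. After one step from an independent prior the two agree; later they can drift apart as X and Y become correlated through shared past influences.

```python
# Hypothetical DBN with two binary variables per time slice and no intra-slice arcs:
# P_X[x][y] = P(X_t = 1 | X_{t-1} = x, Y_{t-1} = y), and likewise for P_Y.
P_X = [[0.1, 0.8], [0.7, 0.95]]
P_Y = [[0.2, 0.6], [0.9, 0.3]]


def bern(q, v):
    """Probability that a binary variable with P(=1) = q takes value v."""
    return q if v else 1.0 - q


def exact_step(joint):
    """Propagate the full joint over (X, Y) one step (no observations)."""
    new = {(a, b): 0.0 for a in range(2) for b in range(2)}
    for (x, y), p in joint.items():
        for a in range(2):
            for b in range(2):
                new[(a, b)] += p * bern(P_X[x][y], a) * bern(P_Y[x][y], b)
    return new


def ff_step(qx, qy):
    """Factored frontier: keep only marginals and treat them as independent."""
    nx = sum(bern(qx, x) * bern(qy, y) * P_X[x][y] for x in range(2) for y in range(2))
    ny = sum(bern(qx, x) * bern(qy, y) * P_Y[x][y] for x in range(2) for y in range(2))
    return nx, ny


def marginals(joint):
    """P(X = 1) and P(Y = 1) from a joint table."""
    return (sum(p for (x, _), p in joint.items() if x == 1),
            sum(p for (_, y), p in joint.items() if y == 1))


qx, qy = 0.5, 0.5
joint = {(x, y): bern(qx, x) * bern(qy, y) for x in range(2) for y in range(2)}
trace = []
for t in range(1, 6):
    joint = exact_step(joint)
    qx, qy = ff_step(qx, qy)
    trace.append((t, marginals(joint), (qx, qy)))
    print(t, marginals(joint), (qx, qy))
```

The exact joint has 2^n entries for n state variables, while FF stores n marginals; the lifted variant replaces the per-variable updates with CBP.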
Part 3: Anytime Lifted Belief Propagation
The Need for Shattering
Lifted BP depends on clusters of variables being symmetric, that is, sending and receiving identical messages
In other words, it requires dividing random variables into cases, a process called "shattering"
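A simplified sketch of shattering one cluster against one ground atom, using the example that appears later in this talk: in(home, Town) split by the evidence in(home, sf) into in(home, sf) and in(home, Town) with Town ≠ sf. The `Cluster` representation and the capitalized-means-variable convention are assumptions for illustration; lifted BP shatters parfactors, not single atoms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """pred(args) where capitalized args are logical variables; `excluded`
    holds (variable, constant) pairs meaning variable != constant."""
    pred: str
    args: tuple
    excluded: frozenset = frozenset()


def is_var(term):
    return term[:1].isupper()


def shatter(cluster, atom):
    """Split `cluster` against a ground atom (pred, constants).

    Returns clusters that are each either identical to the atom or disjoint
    from it; a cluster that does not unify with the atom is returned unchanged."""
    pred, consts = atom
    if pred != cluster.pred or len(consts) != len(cluster.args):
        return [cluster]
    binding = {}
    for term, const in zip(cluster.args, consts):
        if is_var(term):
            if (term, const) in cluster.excluded or binding.setdefault(term, const) != const:
                return [cluster]
        elif term != const:
            return [cluster]
    pieces, current = [], cluster
    for var, const in binding.items():
        # The part where var != const is disjoint from the atom.
        pieces.append(Cluster(current.pred, current.args,
                              current.excluded | {(var, const)}))
        # Continue with var bound to const; its old exclusions no longer apply.
        current = Cluster(current.pred,
                          tuple(const if t == var else t for t in current.args),
                          frozenset(e for e in current.excluded if e[0] != var))
    pieces.append(current)
    return pieces


def show(cluster):
    text = f"{cluster.pred}({', '.join(cluster.args)})"
    for var, const in sorted(cluster.excluded):
        text += f", {var} != {const}"
    return text


for piece in shatter(Cluster("in", ("home", "Town")), ("in", ("home", "sf"))):
    print(show(piece))
```

With two free variables, as in in(House, Town), the same call yields three pieces (House ≠ home; House = home with Town ≠ sf; and in(home, sf) itself), which is how splitting compounds across a model.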
Intuition for Anytime Lifted BP
[Figure: first-order model with atoms alarm(House), earthquake(Town), in(House, Town), burglary(House), next(House, Another), lives(Another, Neighbor), saw(Neighbor, Someone), masked(Someone), in(House, Item), missing(Item), partOf(Entrance, House), broken(Entrance).]
Alarm can go off due to an earthquake
Alarm can go off due to burglary
A "prior" factor makes the alarm going off unlikely without those causes
Intuition for Anytime Lifted BP
Given: a home in sf with home2 and home3 next to it; neighbors jim and mary, each seeing person1 and person2; several items in home, including a missing ring and non-missing cash; broken front but not broken back entrances to home; and an earthquake in sf. What is the probability that home's alarm goes off?
Lifted Belief Propagation
[Figure: the fully shattered model: alarm(home), burglary(home), earthquake(sf), in(home, sf); partOf(front, home), broken(front), partOf(back, home), broken(back); next(home, home2), lives(home2, jim), saw(jim, person1), masked(person1); next(home, home3), lives(home2, mary), saw(mary, person2), masked(person2); in(home, ring), missing(ring), in(home, cash), missing(cash), and in(home, Item), missing(Item) for Item not in {ring, cash, …}; …]
Complete shattering before belief propagation starts
Message passing over the entire model before obtaining the query answer
Model for house ≠ home and town ≠ sf not shown
Intuition for Anytime Lifted BP
[Figure: the same fully shattered model, with alarm(home) marked as the query and the observed atoms marked as evidence.]
Given the earthquake, we already have a good lower bound, regardless of the burglary branch
Wasted shattering!
Using only a portion of a model
By using only a portion, we don’t have to shatter other parts of the model
How can we use only a portion? A solution for propositional models already exists: box propagation (Mooij & Kappen, NIPS '08)
Box Propagation
A way of getting bounds on the query without examining the entire network.
[Figure, four steps: query A is connected through factor f1 to B, whose further factors f2, f3, … are added step by step. The bound on A narrows from [0.36, 0.67] (with B still in [0, 1]) to [0.38, 0.50] and then [0.41, 0.44], while B's bound narrows from [0, 1] to [0.05, 0.5] and then [0.17, 0.3]; the boxes on the other variables shrink similarly.]
Convergence after all messages are collected: A = 0.42, B = 0.21
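The bounding step can be sketched for a single pairwise factor: if the message arriving from B is only known to put mass b ∈ [lo, hi] on B = 1, the resulting belief P(A = 1) is a linear-fractional function of b with a positive denominator, hence monotone, so its extremes sit at the interval endpoints. This is a simplified one-factor illustration of the box idea, not Mooij & Kappen's full algorithm; the factor table is hypothetical.

```python
def belief_a(f, b):
    """P(A=1) when B's incoming message puts mass b on B=1; f[a][b] is the factor."""
    num = f[1][0] * (1 - b) + f[1][1] * b
    den = num + f[0][0] * (1 - b) + f[0][1] * b
    return num / den


def bound_a(f, lo, hi):
    """Bound P(A=1) given that B's message mass on B=1 lies in [lo, hi].
    belief_a is monotone in b, so evaluating both endpoints suffices."""
    ends = (belief_a(f, lo), belief_a(f, hi))
    return min(ends), max(ends)


f1 = [[3.0, 1.0], [1.0, 2.0]]  # hypothetical factor f1(A, B)
boxes = [(0.0, 1.0), (0.2, 0.6), (0.35, 0.45)]  # B's box shrinks as more of the net is seen
bounds = [bound_a(f1, lo, hi) for lo, hi in boxes]
for box, bound in zip(boxes, bounds):
    print(box, bound)
```

As B's box shrinks, A's bound shrinks with it, which is what lets the query be answered to a desired precision before the whole network has been visited.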
Anytime Lifted Belief Propagation
Start from the query alone: alarm(home), with bound [0, 1]
The algorithm works by picking a cluster variable and including the factors in its blanket
[Figure: alarm(home), with burglary(home) as a neighboring cluster.]
Anytime Lifted Belief Propagation
[Figure: alarm(home) connected to earthquake(Town) and in(home, Town); bound on the query [0.1, 0.9].]
The factor (alarm(home), in(home, Town), earthquake(Town)) is obtained by unifying alarm(home) with alarm(House) in (alarm(House), in(House, Town), earthquake(Town)), producing the constraint House = home
Again, through unification
Blanket factors alone can determine a bound on the query
Anytime Lifted Belief Propagation
[Figure: alarm(home) connected to earthquake(sf) and in(home, sf), and to earthquake(Town) and in(home, Town) with Town ≠ sf; evidence factor (in(home, sf)); burglary(home); bound on the query [0.1, 0.9].]
Cluster in(home, Town) unifies with in(home, sf) in the factor (in(home, sf)), which represents evidence, splitting the cluster around Town = sf
The bound remains the same because we still haven't considered evidence on earthquakes
Anytime Lifted Belief Propagation
[Figure: the same model, now including the evidence factor (earthquake(sf)); bound on the query [0.8, 0.9].]
(earthquake(sf)) represents the evidence that there was an earthquake
Now the query bound becomes narrow
If the bound is good enough, there is no need to further expand (and shatter) other branches
Anytime Lifted Belief Propagation
[Figure: alarm(home) with earthquake(sf), in(home, sf), earthquake(Town) and in(home, Town) with Town ≠ sf, burglary(home), and now partOf(front, home) and broken(front); bound on the query [0.85, 0.9].]
We can keep expanding at will for narrower bounds…
Anytime Lifted Belief Propagation
[Figure: the fully expanded model, with the same atoms as the fully shattered model; the query alarm(home) converges to 0.8725.]
… until convergence, if desired.
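The anytime behavior can be mimicked on a propositional toy, assuming (hypothetically) that alarm is a noisy-OR of independent causes expanded one at a time: earthquake first (observed), then burglary, then a broken entrance. P(alarm) increases in each cause's probability, so setting every unexpanded cause to 0 or 1 gives a lower or upper bound. The noisy-OR model, the expansion order, and all numbers are assumptions; the algorithm in the slides works on lifted clusters with box propagation.

```python
def noisy_or(leak, strengths, probs):
    """P(alarm=1) = 1 - (1 - leak) * prod_i (1 - s_i * q_i), causes independent."""
    p_off = 1.0 - leak
    for s, q in zip(strengths, probs):
        p_off *= 1.0 - s * q
    return 1.0 - p_off


def anytime_bounds(leak, strengths, cause_probs, tol):
    """Expand causes in order; each unexpanded cause has probability in [0, 1].
    Returns (causes expanded, lower, upper) per step, stopping once the
    bound is at most tol wide."""
    n = len(strengths)
    history = []
    for k in range(n + 1):
        known = list(cause_probs[:k])
        lower = noisy_or(leak, strengths, known + [0.0] * (n - k))
        upper = noisy_or(leak, strengths, known + [1.0] * (n - k))
        history.append((k, lower, upper))
        if upper - lower <= tol:
            break
    return history


# Hypothetical causes: earthquake (observed, q = 1.0), burglary, broken entrance.
strengths, probs = [0.8, 0.9, 0.5], [1.0, 0.05, 0.3]
history = anytime_bounds(0.01, strengths, probs, tol=0.1)
for k, lower, upper in history:
    print(k, round(lower, 4), round(upper, 4))
```

Expanding only the earthquake cause already lifts the lower bound above 0.8, echoing the slide's point that the burglary branch need not be shattered to get a useful answer.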
Connection to Resolution Refutation
Incremental shattering corresponds to building a proof tree
[Figure: proof tree rooted at alarm(home), with nodes earthquake(sf), in(home, sf), earthquake(L) and in(home, L) with L not in {sf}, burglary(home), and true; …]
Conclusion
Inference is the key issue in several SRL formalisms
FROG: keeps the count of unsatisfied groundings; order-of-magnitude reduction in the number of groundings; compares favorably to Alchemy in different domains
Counting BP: BP + grouping nodes that send and receive identical messages; conceptually easy, scalable BP algorithm; applications to challenging AI tasks
Anytime BP: incremental shattering + box propagation; only the most necessary fraction of the model is considered and shattered; status: implementation and evaluation
Conclusion
Algorithms are independent of representation
Variety of applications: parameter learning of relational models, social networks, object recognition, link prediction, activity recognition, model counting, bio-medical applications, relational RL
Future Work
FROG: combine with lifted inference; exploit commonality across rules
CBP: integrate with parameter learning in SRL models; extend to multi-agent RL; lifted pairwise BP
Anytime BP: heuristic to expand the network; understand closer connections to resolution
SRL models: learning dynamic SRL models; structure learning remains an open issue
Acknowledgements (ordered by last name)
Babak Ahmadi, Fraunhofer Institute
Rodrigo de Salvo Braz, SRI International
Hung Bui, SRI International
Vitor Santos Costa, U Porto
Kristian Kersting, Fraunhofer Institute
Gautam Kunapuli, UW Madison
David Page, UW Madison
Stuart Russell, UC Berkeley
Jude Shavlik, UW Madison
Prasad Tadepalli, Oregon State University