Efficient Inference Methods for Probabilistic Logical Models

Sriraam Natarajan

Dept of Computer Science, University of Wisconsin-Madison

Take-Away Message

Inference in SRL models is very hard!

This talk presents 3 different yet related inference methods

The methods are independent of the underlying formalism

They have been applied to different kinds of problems

The World is inherently Uncertain

Graphical models (here, e.g., a Bayesian network) model uncertainty explicitly by representing the joint distribution

[Figure: Bayesian network over the random variables Influenza, Fever and Ache, with edges marking direct influences.]

Propositional model!

Real-World Data (Dramatically Simplified)

PatientID  Gender  Birthdate
P1         M       3/22/63

PatientID  Date    Physician  Symptoms      Diagnosis
P1         1/1/01  Smith      palpitations  hypoglycemic
P1         2/1/03  Jones      fever, aches  influenza

PatientID  Date    Lab Test       Result
P1         1/1/01  blood glucose  42
P1         1/9/01  blood glucose  45

PatientID  SNP1  SNP2  …  SNP500K
P1         AA    AB    …  BB
P2         AB    BB    …  AA

PatientID  Date Prescribed  Date Filled  Physician  Medication  Dose  Duration
P1         5/17/98          5/18/98      Jones      prilosec    10mg  3 months

Non-i.i.d.

Multi-Relational

Solution: First-Order Logic / Relational Databases

Shared Parameters

Logic + Probability = Probabilistic Logic aka Statistical Relational Learning Models

Logic

Probabilities

Add Probabilities

Add Relations

Statistical Relational Learning (SRL)

Uncertainty in SRL Models is captured by probabilities, weights or potential functions

Alphabet Soup => Endless Possibilities

Web data (web)
Biological data (bio)
Social network analysis (soc)
Bibliographic data (cite)
Epidemiological data (epi)
Communication data (comm)
Customer networks (cust)
Collaborative filtering problems (cf)
Trust networks (trust)
…

Fall 2003: Dietterich @ OSU; Spring 2004: Page @ UW; Spring 2007: Neville @ Purdue; Fall 2008: Pedro @ CMU

Probabilistic Relational Models (PRM)
Bayesian Logic Programs (BLP)
PRISM
Stochastic Logic Programs (SLP)
Independent Choice Logic (ICL)
Markov Logic Networks (MLN)
Relational Markov Nets (RMN)
CLP-BN
Relational Bayes Nets (RBN)
Probabilistic Logic Program (PLP)
ProbLog
…

Key Problem - Inference

Equivalent to counting 3SAT models => #P-complete

More pronounced in SRL models
  Prohibitively large number of objects and relations
  Inference has been the biggest bottleneck for the use of SRL models in practice

Grounding / Propositionalization

Difficulty(C,D), Grade(S,C,G) :- Satisfaction(S)

1 student s1, 10 courses

[Figure: ground network for student s1. The ground atoms shown include Diff(c1,d1), Diff(c2,d1), Diff(c3,d2), Diff(c8,d2), Diff(c9,d4), Diff(c10,d2) and Grade(s1,c1,B), Grade(s1,c2,A), Grade(s1,c3,B), Grade(s1,c4,A), Grade(s1,c5,A), Grade(s1,c6,B), Grade(s1,c7,A), Grade(s1,c8,A), Grade(s1,c9,A), Grade(s1,c10,A), all connected to Satisfaction(S).]

Realistic Example – Gene-fold Prediction

Thanks to Irene Ong

Recent Advances in SRL Inference

Preprocessing for Inference
  FROG - Shavlik & Natarajan (2009)

Lifted Exact Inference
  Lifted Variable Elimination - Poole (2003), Braz et al (2005), Milch et al (2008)
  Lifted VE + Aggregation - Kisynski & Poole (2009)

Sampling Methods
  MCMC techniques - Milch & Russell (2006)
  Logical Particle Filter - Natarajan et al (2008), Zettlemoyer et al (2007)
  Lazy Inference - Poon et al (2008)

Approximate Methods
  Lifted First-Order Belief Propagation - Singla & Domingos (2008)
  Counting Belief Propagation - Kersting et al (2009)
  MAP Inference - Riedel (2008)

Bounds Propagation
  Anytime Belief Propagation - Braz et al (2009)

Fast Reduction of Grounded MLNs
Counting Belief Propagation
Anytime Lifted Belief Propagation
Conclusion


Markov Logic Networks

Weighted logic

Standard approach

1) Assume finite number of constants

2) Create all possible groundings

3) Perform statistical inference (often via sampling)

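1.5  $\forall x\; \mathit{Smokes}(x) \Rightarrow \mathit{Cancer}(x)$

1.1  $\forall x, y\; \mathit{Friends}(x, y) \Rightarrow (\mathit{Smokes}(x) \Leftrightarrow \mathit{Smokes}(y))$

$P(x) = \frac{1}{Z} \exp\left( \sum_i w_i \, n_i(x) \right)$

where $w_i$ is the weight of formula $i$ and $n_i(x)$ is the number of true groundings of formula $i$ in $x$.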

(Richardson & Domingos, MLJ 2005)
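The standard approach above (finite constants, all groundings, then inference) can be sketched by brute force on a tiny domain. This is only an illustration: the two constants `Anna` and `Bob` are hypothetical, the weights 1.5 and 1.1 are the ones on the slide, and exhaustive enumeration stands in for the sampling a real system would use.

```python
import itertools
import math

CONSTANTS = ["Anna", "Bob"]      # hypothetical constants, not from the slides
W_CANCER, W_FRIENDS = 1.5, 1.1   # formula weights from the slide
PAIRS = list(itertools.product(CONSTANTS, repeat=2))


def implies(a, b):
    return (not a) or b


def counts(smokes, cancer, friends):
    """Number of true groundings n_i(x) of each formula in one world x."""
    n_cancer = sum(implies(smokes[x], cancer[x]) for x in CONSTANTS)
    n_friends = sum(implies(friends[p], smokes[p[0]] == smokes[p[1]])
                    for p in PAIRS)
    return n_cancer, n_friends


def all_worlds():
    """Enumerate every truth assignment to the ground atoms."""
    k = len(CONSTANTS)
    n_atoms = 2 * k + len(PAIRS)
    for bits in itertools.product([False, True], repeat=n_atoms):
        smokes = dict(zip(CONSTANTS, bits[:k]))
        cancer = dict(zip(CONSTANTS, bits[k:2 * k]))
        friends = dict(zip(PAIRS, bits[2 * k:]))
        yield smokes, cancer, friends


def unnormalized(world):
    """exp(sum_i w_i * n_i(x)) for one world."""
    n_cancer, n_friends = counts(*world)
    return math.exp(W_CANCER * n_cancer + W_FRIENDS * n_friends)


# Partition function Z and the resulting distribution over worlds.
Z = sum(unnormalized(w) for w in all_worlds())
probabilities = [unnormalized(w) / Z for w in all_worlds()]

# Marginal probability of the ground atom Cancer(Anna).
p_cancer_anna = sum(unnormalized(w) / Z for w in all_worlds() if w[1]["Anna"])
```

Even with two constants there are 8 ground atoms and 256 worlds, which is why enumeration like this does not scale and the rest of the talk looks for ways to avoid it.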

Counting Satisfied Groundings

Typically lots of redundancy in FOL sentences

∀ x, y, z  p(x) ⋀ q(x, y, z) ⋀ r(z) ⇒ w(x, y, z)

If p(John) = false, then the formula is true for all Y and Z values

Factoring Out the Evidence

Let $A$ = weighted sum of formulas satisfied by evidence

Let $B_i$ = weighted sum of formulas in world $i$ not satisfied by evidence

$\mathrm{Prob}(\text{world } i) = \frac{e^{A + B_i}}{e^{A + B_1} + \dots + e^{A + B_n}} = \frac{e^{B_i}}{e^{B_1} + \dots + e^{B_n}}$
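A quick numeric check of the cancellation: the term for formulas already satisfied by the evidence appears in every world's weight, so it drops out of the normalized probability. The values of `A` and `BS` below are made up for illustration.

```python
import math


def world_probabilities(a, bs):
    """Prob(world i) computed with the evidence term A left in."""
    weights = [math.exp(a + b) for b in bs]
    total = sum(weights)
    return [w / total for w in weights]


def factored_probabilities(bs):
    """Prob(world i) computed after factoring e^A out of every term."""
    weights = [math.exp(b) for b in bs]
    total = sum(weights)
    return [w / total for w in weights]


A = 7.3                 # hypothetical weighted sum satisfied by evidence
BS = [0.5, 1.2, -0.3]   # hypothetical per-world weighted sums B_i

with_a = world_probabilities(A, BS)
without_a = factored_probabilities(BS)
```

Both lists agree up to floating-point error, so groundings that the evidence already satisfies never need to be materialized.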

Take-Away Message - I

Efficiently factor out those formula groundings that evidence satisfies

Can potentially eliminate the need for approximate inference

Worked Example

∀ x, y, z  GradStudent(x) ⋀ Prof(y) ⋀ Prof(z) ⋀ TA(x, z) ⋀ SameGroup(y, z) ⇒ AdvisedBy(x, y)

10,000 people at some school
2000 graduate students
1000 professors
1000 TAs
500 pairs of professors in the same group

Total number of groundings = |x| × |y| × |z| = 10^12

The Evidence: GradStudent(x)

Groundings: 10^12 → 2 × 10^11

GradStudent(x) evidence: GradStudent(P1), ¬ GradStudent(P2), GradStudent(P3), …
  True: GradStudent(P1), GradStudent(P3), … (2000 grad students)
  False: ¬ GradStudent(P2), ¬ GradStudent(P4), … (8000 others)

All the false values for X satisfy the clause, regardless of Y and Z

GradStudent(x) ⋀ Prof(y) ⋀ Prof(z) ⋀ TA(x,z) ⋀ SameGroup(y,z) ⇒ AdvisedBy(x,y)

FROG keeps only the true X values: instead of 10^4 values for X, we have 2 × 10^3

Prof(y)

Groundings: 2 × 10^11 → 2 × 10^10

Prof(y) evidence: ¬ Prof(P1), Prof(P2), …
  True: Prof(P2), … (1000 professors)
  False: ¬ Prof(P1), … (9000 others)

GradStudent(x) ⋀ Prof(y) ⋀ Prof(z) ⋀ TA(x,z) ⋀ SameGroup(y,z) ⇒ AdvisedBy(x,y)

Prof(z)

Groundings: 2 × 10^10 → 2 × 10^9

<<< Same as Prof(y) >>>

SameGroup(y, z)

Groundings: 2 × 10^9 → 2 × 10^6

10^6 combinations
  True: SameGroup(P1, P2), … (1000 true SameGroup’s)
  False: ¬ SameGroup(P2, P5), … (10^6 - 1000 others)

2000 values of X, 1000 Y:Z combinations

TA(x, z)

Groundings: 2 × 10^6 → ≤ 10^6

2 × 10^6 combinations
  True: TA(P7,P5), … (1000 TA’s)
  False: ¬ TA(P8,P4), … (2 × 10^6 - 1000 others)

≤ 1000 values of X, ≤ 1000 Y:Z combinations

Original number of groundings = 10^12

Final number of groundings ≤ 10^6
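The arithmetic behind the worked example can be replayed in a few lines. This is a sketch of the counting only, not of FROG itself; the constant names are illustrative, the figures are the ones on the slides, and the last step assumes the upper bound is the product of at most 1000 remaining X values and 1000 Y:Z combinations.

```python
PEOPLE = 10_000
GRAD_STUDENTS = 2_000
PROFESSORS = 1_000
SAME_GROUP_PAIRS = 1_000   # true SameGroup(y, z) facts, per the slide
TA_PAIRS = 1_000           # true TA(x, z) facts, per the slide

# Each entry: (literal just processed, groundings still to consider).
steps = [
    ("all groundings", PEOPLE ** 3),
    ("GradStudent(x)", GRAD_STUDENTS * PEOPLE * PEOPLE),
    ("Prof(y)", GRAD_STUDENTS * PROFESSORS * PEOPLE),
    ("Prof(z)", GRAD_STUDENTS * PROFESSORS * PROFESSORS),
    ("SameGroup(y, z)", GRAD_STUDENTS * SAME_GROUP_PAIRS),
    # Upper bound: at most 1000 X values times 1000 Y:Z combinations.
    ("TA(x, z)", min(GRAD_STUDENTS, TA_PAIRS) * SAME_GROUP_PAIRS),
]

for literal, remaining in steps:
    print(f"{literal:>16}: {remaining:.0e}")
```

Each literal whose evidence is known shrinks the set of groundings that can still be unsatisfied, taking the count from 10^12 down to at most 10^6.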

Sample Results: UWash-CSE

[Figure: number of groundings (log scale, 1,000 to 10,000,000,000) versus number of constants (0 to 800), with three curves: Fully Grounded Net, FROG’s Reduced Net, and FROG’s Reduced Net without One Challenging Rule.]

Rule shown: advisedBy(x,y) ⋀ advisedBy(x,z) ⇒ samePerson(y,z)


Belief Propagation

Message passing algorithm for inference on graphical models

For factor graphs:
  Exact if the factor graph is a tree
  Approximate when it has cycles
  Loopy BP does not guarantee convergence, but is found to be very useful in practice
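As a small illustration of the tree case, the sketch below runs sum-product message passing on a chain X1 - f1 - X2 - f2 - X3 over binary variables and compares the marginal of X2 with brute-force enumeration. The potential tables `F1` and `F2` are made-up numbers, not taken from the talk.

```python
import itertools

# Hypothetical pairwise potentials: F1[x1][x2] and F2[x2][x3].
F1 = [[3.0, 1.0], [1.0, 2.0]]
F2 = [[2.0, 1.0], [1.0, 4.0]]


def normalize(values):
    total = sum(values)
    return [v / total for v in values]


def bp_marginal_x2():
    """Sum-product on the chain: leaves send uniform messages to factors."""
    msg_f1_to_x2 = [sum(F1[x1][x2] for x1 in (0, 1)) for x2 in (0, 1)]
    msg_f2_to_x2 = [sum(F2[x2][x3] for x3 in (0, 1)) for x2 in (0, 1)]
    # The belief at X2 is the normalized product of incoming messages.
    return normalize([a * b for a, b in zip(msg_f1_to_x2, msg_f2_to_x2)])


def brute_force_marginal_x2():
    """Sum the joint over all assignments, grouped by the value of X2."""
    totals = [0.0, 0.0]
    for x1, x2, x3 in itertools.product((0, 1), repeat=3):
        totals[x2] += F1[x1][x2] * F2[x2][x3]
    return normalize(totals)


bp_result = bp_marginal_x2()
exact_result = brute_force_marginal_x2()
```

On this tree the two results coincide; on a graph with cycles the same message updates would be iterated and would only approximate the marginals.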

[Figure: factor graph over the variables X1, X2, X3 with factors f1 and f2, annotated “Identical Factors”.]

Take-Away Message – II

Counting shared factors can result in great efficiency gains for (loopy) belief propagation

Counting Belief Propagation

Two Steps
1. Compress Factor Graph
2. Run modified BP

Step 1: Compression

Step 2: Modified Belief Propagation
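The compression step can be illustrated with a simplified color-passing routine: variables and factors repeatedly exchange "colors", and nodes that end up with the same color would send and receive identical messages, so they can be grouped. This is a rough sketch under my own assumptions (the data layout, the names `compress`, `relabel` and `clusters`, and the stopping rule are all illustrative), not the published CBP implementation.

```python
def relabel(signatures):
    """Map each distinct signature to a small integer color."""
    ids = {}
    return {name: ids.setdefault(repr(sig), len(ids))
            for name, sig in signatures.items()}


def clusters(colors):
    """Group node names by color, in a deterministic order."""
    groups = {}
    for name, color in colors.items():
        groups.setdefault(color, []).append(name)
    return sorted(sorted(group) for group in groups.values())


def compress(variables, factors, max_iters=20):
    """variables: name -> initial label (e.g. evidence state).
    factors: name -> (potential id, tuple of argument variable names)."""
    var_color = relabel(variables)
    fac_color = relabel({f: pot for f, (pot, _) in factors.items()})
    for _ in range(max_iters):
        # A factor's signature: its own color plus its arguments' colors.
        fac_sig = {f: (fac_color[f], tuple(var_color[v] for v in args))
                   for f, (_, args) in factors.items()}
        # A variable's signature: its own color plus the sorted multiset of
        # (neighboring factor signature, argument position).
        var_sig = {}
        for v in var_color:
            incoming = sorted((repr(fac_sig[f]), pos)
                              for f, (_, args) in factors.items()
                              for pos, arg in enumerate(args) if arg == v)
            var_sig[v] = (var_color[v], tuple(incoming))
        new_var, new_fac = relabel(var_sig), relabel(fac_sig)
        # Colors only ever split, so unchanged counts mean a fixed point.
        stable = (len(set(new_var.values())) == len(set(var_color.values()))
                  and len(set(new_fac.values())) == len(set(fac_color.values())))
        var_color, fac_color = new_var, new_fac
        if stable:
            break
    return clusters(var_color), clusters(fac_color)


# Hypothetical example: two identical factors sharing X2 in the same position.
example_variables = {"X1": "unobserved", "X2": "unobserved", "X3": "unobserved"}
example_factors = {"f1": ("phi", ("X1", "X2")), "f2": ("phi", ("X3", "X2"))}
var_clusters, fac_clusters = compress(example_variables, example_factors)
```

In the example, X1 and X3 fall into one cluster and f1 and f2 into another, so modified BP would need to compute messages for one representative of each cluster and weight them by counts.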

Factored Frontier (FF)

Probabilistic inference over time is central to many AI problems
In contrast to static domains, we need approximation
  Variables easily become correlated over time by virtue of sharing common influences in the past

Factored Frontier [Murphy and Weiss 01]
  Unroll DBN
  Run (loopy) BP

Lifted First-Order FF: use CBP in place of BP

Lifted First-order Factored Frontier

20 people over 10 time steps
Max number of friends: 5
Cancer never observed
Time step randomly selected
Successor fluent


The Need for Shattering

Lifted BP depends on clusters of variables being symmetric, that is, sending and receiving identical messages

In other words, it is about dividing random variables into cases; this is called “shattering”

Intuition for Anytime Lifted BP

[Figure: first-order model over the atoms alarm(House), earthquake(Town), in(House, Town), burglary(House), next(House,Another), lives(Another,Neighbor), saw(Neighbor,Someone), masked(Someone), in(House,Item), missing(Item), partOf(Entrance,House), broken(Entrance).]

Alarm can go off due to an earthquake

Alarm can go off due to burglary

A “prior” factor makes alarm going off unlikely without those causes


Given a home in sf with home2 and home3 next to it, with neighbors jim and mary, each seeing person1 and person2, several items in home, including a missing ring and non-missing cash, broken front but not broken back entrances to home, and an earthquake in sf, what is the probability that home’s alarm goes off?


Lifted Belief Propagation

[Figure: the model shattered around the query and the evidence, with query and evidence nodes marked. Atoms shown: alarm(home), burglary(home), earthquake(sf), in(home, sf), partOf(front,home), broken(front), partOf(back,home), broken(back), next(home,home2), lives(home2,jim), saw(jim,person1), masked(person1), lives(home2,mary), saw(mary,person2), in(home,ring), missing(ring), in(home,cash), missing(cash), in(home,Item) with Item not in { ring,cash,…}, … The model for house ≠ home and town ≠ sf is not shown.]

Complete shattering before belief propagation starts

Message passing over entire model before obtaining query answer

Given earthquake, we already have a good lower bound, regardless of burglary branch

Wasted shattering!

Using only a portion of a model

By using only a portion, we don’t have to shatter other parts of the model

How can we use only a portion?

A solution for propositional models already exists: box propagation (Mooij & Kappen, NIPS ’08)

Box Propagation

A way of getting bounds on query without examining entire network.

[Figure sequence: the network around query A is included step by step, and the bounds on the messages tighten.]

Step 1: A alone; bound [0, 1]
Step 2: A, f1, B; bounds [0, 1] and [0.36, 0.67]
Step 3: f2 ..., f3 ... added; bounds [0.05, 0.5], [0.38, 0.50], [0.32, 0.4], [0.1, 0.6], [0, 1], [0, 1]
Step 4: bounds [0.17, 0.3], [0.41, 0.44], [0.32, 0.4], [0.3, 0.4], [0.2, 0.8], [0, 1]
Step 5: point values 0.21, 0.42, 0.36, 0.32, 0.45, 0.3

Convergence after all messages are collected
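The bounding idea can be illustrated on the smallest possible case: one binary query A connected through a single factor to one binary neighbor B whose incoming message is only known to lie in an interval. This is a minimal sketch of why interval inputs give interval outputs, not Mooij & Kappen's algorithm; the factor table `PHI` and the function names are made up.

```python
# Hypothetical factor table PHI[a][b] over binary A and B.
PHI = [[4.0, 1.0], [1.0, 4.0]]


def belief_a(phi, p_b):
    """P(A=1) when the factor receives the message (1 - p_b, p_b) from B."""
    weight_1 = phi[1][0] * (1.0 - p_b) + phi[1][1] * p_b
    weight_0 = phi[0][0] * (1.0 - p_b) + phi[0][1] * p_b
    return weight_1 / (weight_0 + weight_1)


def bound_a(phi, lo, hi):
    """Bounds on P(A=1) when p_b is only known to lie in [lo, hi].

    belief_a is a ratio of linear functions of p_b with a positive
    denominator, hence monotone in p_b, so the extremes sit at the ends.
    """
    ends = (belief_a(phi, lo), belief_a(phi, hi))
    return min(ends), max(ends)


loose = bound_a(PHI, 0.0, 1.0)   # nothing known about B's message
tight = bound_a(PHI, 0.5, 1.0)   # B's own bound has been narrowed
```

With nothing known about B the query is already confined to [0.2, 0.8], and narrowing B's interval narrows the query's, which mirrors how the bounds on the slides tighten as more of the network is examined.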

Take-Away Message - III

Anytime BP = Incremental Shattering + Box Propagation

Anytime Lifted Belief Propagation


Start from query alone

Bound on query: [0, 1]

The algorithm works by picking a cluster variable and including the factors in its blanket





Factor included: (alarm(home), in(home,Town), earthquake(Town)), obtained after unifying alarm(home) and alarm(House) in (alarm(House), in(House,Town), earthquake(Town)), producing constraint House = home

Again, through unification

Blanket factors alone can determine a bound on query: [0.1, 0.9]
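The unification step can be sketched for function-free atoms. The representation is my own assumption for illustration: an atom is a `(predicate, args)` tuple, and a term starting with an uppercase letter is a logical variable, following the slides' naming.

```python
def is_var(term):
    """Terms starting with an uppercase letter are logical variables."""
    return term[0].isupper()


def walk(term, subst):
    """Follow variable bindings until an unbound term is reached."""
    while term in subst:
        term = subst[term]
    return term


def unify(atom_a, atom_b):
    """Return a substitution making the two atoms equal, or None."""
    (pred_a, args_a), (pred_b, args_b) = atom_a, atom_b
    if pred_a != pred_b or len(args_a) != len(args_b):
        return None
    subst = {}
    for a, b in zip(args_a, args_b):
        a, b = walk(a, subst), walk(b, subst)
        if a == b:
            continue
        if is_var(a):
            subst[a] = b
        elif is_var(b):
            subst[b] = a
        else:
            return None  # two different constants cannot be unified
    return subst


# Query against the factor's atom: yields the constraint House = home.
query_match = unify(("alarm", ("home",)), ("alarm", ("House",)))
# Cluster against evidence: yields the split point Town = sf.
evidence_match = unify(("in", ("home", "Town")), ("in", ("home", "sf")))
```

The returned bindings are exactly the constraints used to split a cluster: the matching part (for example Town = sf) is separated from the remainder (Town ≠ sf), which stays lifted.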







Cluster in(home, Town) unifies with in(home, sf) in (in(home, sf)) (which represents evidence), splitting cluster around Town = sf and leaving the remainder with Town ≠ sf

Bound on query: [0.1, 0.9]

Bound remains the same because we still haven’t considered evidence on earthquakes







(earthquake(sf)) represents the evidence that there was an earthquake

Now query bound becomes narrow: [0.8, 0.9]

If bound is good enough, there is no need to further expand (and shatter) other branches








Expanding partOf(front,home): bound on query becomes [0.85, 0.9]

We can keep expanding at will for narrower bounds…

… until convergence, if desired.

Query value at convergence: 0.8725

Connection to Resolution Refutation

Incremental shattering corresponds to building a proof tree

alarm(home)

earthquake(sf)

in(home, sf)

earthquake(L), L not in {sf}

in(home,L), L not in {sf}

burglary(home)

true

…


Conclusion

Inference is the key issue in several SRL formalisms

FROG - keeps the count of unsatisfied groundings
  Order-of-magnitude reduction in number of groundings
  Compares favorably to Alchemy in different domains

Counting BP - BP + grouping nodes sending and receiving identical messages
  Conceptually easy, scalable BP algorithm
  Applications to challenging AI tasks

Anytime BP - Incremental Shattering + Box Propagation
  Only the most necessary fraction of model considered and shattered
  Status: implementation and evaluation

Conclusion

Algorithms are independent of representation

Variety of applications
  Parameter Learning of Relational Models
  Social Networks
  Object Recognition
  Link Prediction
  Activity Recognition
  Model Counting
  Bio-Medical Applications
  Relational RL

Future Work

FROG
  Combine with Lifted Inference
  Exploit commonality across rules

CBP
  Integrate with Parameter Learning in SRL Models
  Extend to Multi-Agent RL, Lifted Pairwise BP

Anytime BP
  Heuristic to expand the network
  Understand closer connections to Resolution

SRL Models
  Learning Dynamic SRL Models
  Structure Learning remains an open issue

Acknowledgements*

Babak Ahmadi - Fraunhofer Institute
Rodrigo de Salvo Braz - SRI International
Hung Bui - SRI International
Vitor Santos Costa - U Porto
Kristian Kersting - Fraunhofer Institute
Gautam Kunapuli - UW Madison
David Page - UW Madison
Stuart Russell - UC Berkeley
Jude Shavlik - UW Madison
Prasad Tadepalli - Oregon State University

* Ordered by Last name

Thanks!
