First-Order Probabilistic Inference Rodrigo de Salvo Braz


First-Order Probabilistic Inference

Rodrigo de Salvo Braz

2

A remark: Feel free to ask clarification questions!

3

Slides online: Just search for "Rodrigo de Salvo Braz" and check the presentations page (www.cs.berkeley.edu/~braz).

4

Outline
- A goal: First-order Probabilistic Inference
  - An ultimate goal
  - Propositionalization
- Lifted Inference
  - Inversion Elimination
  - Counting Elimination
- DBLOG
  - BLOG background
  - DBLOG
  - Retro-instantiation
  - Improving BLOG inference
- Future Directions and Conclusion

5

Outline
- A goal: First-order Probabilistic Inference
  - An ultimate goal
  - Propositionalization
- Lifted Inference
  - Inversion Elimination
  - Counting Elimination
- DBLOG
  - BLOG background
  - DBLOG
  - Retro-instantiation
  - Improving BLOG inference
- Future Directions and Conclusion

6

How to solve commonsense reasoning: Domain Knowledge + Inference and Learning
How to solve Natural Language Processing: Language Knowledge + Inference and Learning
How to solve Planning: Domain & Planning Knowledge + Inference and Learning
How to solve Vision: Objects & Optics Knowledge + Inference and Learning

Abstracting Inference and Learning: AI problems (commonsense reasoning, Natural Language Processing, Planning, Vision, ...) share a single Inference and Learning module.

7

Inference and Learning: What capabilities should it have?

It should represent and use:
• predicates (relations) and quantification over objects:
  ∀X tank(X) ∨ plane(X)
  ∀X tank(X) ⇒ color(X)=green
  color(o121) = red
• varying degrees of uncertainty:
  ∀X { tank(X) ⇒ color(X)=green }
  since it may hold often but not absolutely;
  essential for language, vision, pattern recognition, commonsense reasoning etc.;
• modal knowledge, utilities, and more.

8

Abstracting Inference and Learning

Commonsense reasoning. Domain knowledge:
  tank(X) ∨ plane(X)
  { tank(X) ⇒ color(X)=green }
  color(o121) = red
  ...

Natural Language Processing. Language knowledge:
  { Sentence(S) ∧ Verb(S,"attacks") ∧ Subject(S,X) ∧ Object(S,Y) ⇒ attacking(X,Y) }
  ...

Sharing the inference and learning module.

9

Abstracting Inference and Learning

Commonsense reasoning WITH Natural Language Processing.

Domain knowledge:
  tank(X) ∨ plane(X)
  { tank(X) ⇒ color(X)=green }
  color(o121) = red
  ...

Language knowledge:
  { Sentence(S) ∧ Verb(S,"attacks") ∧ Subject(S,X) ∧ Object(S,Y) ⇒ attacking(X,Y) }
  ...
  Verb(s13,"attacks"), Subject(s13,o121), ...

Joint solution made simpler by the shared Inference and Learning module.

10

Standard solutions fall short

Logic: has objects and properties, but statements are absolute.

Graphical models and machine learning: have varying degrees of uncertainty, but are propositional; treating objects and quantification is awkward.

11

Standard solutions fall short: Graphical models and objects

First-order knowledge:
  tank(X) ∨ plane(X)
  { tank(X) ⇒ color(X)=green }
  { next(X,Y) ∧ tank(X) ⇒ tank(Y) }
  color(o121)=red   color(o135)=green

Grounded graphical model over: color(o121), tank(o121), plane(o121), next(o121,o135), tank(o135), plane(o135), color(o135)

• The transformation (propositionalization) is not part of the graphical model framework;
• Graphical models have only random variables, not objects;
• Different data creates distinct, formally unrelated models.

12

Outline
- A goal: First-order Probabilistic Inference
  - An ultimate goal
  - Propositionalization
- Lifted Inference
  - Inversion Elimination
  - Counting Elimination
- DBLOG
  - BLOG background
  - DBLOG
  - Retro-instantiation
  - Improving BLOG inference
- Future Directions and Conclusion

13

First-Order Probabilistic Inference: Many languages have been proposed

  Knowledge-based model construction (Breese, 1992)
  Probabilistic Logic Programming (Ng & Subrahmanian, 1992)
  Probabilistic Logic Programming (Ngo & Haddawy, 1995)
  Probabilistic Relational Models (Friedman et al., 1999)
  Relational Dependency Networks (Neville & Jensen, 2004)
  Bayesian Logic (BLOG) (Milch & Russell, 2004)
  Bayesian Logic Programs (Kersting & De Raedt, 2001)
  DAPER (Heckerman, Meek & Koller, 2007)
  Markov Logic Networks (Richardson & Domingos, 2004)

"Smart" propositionalization is the main inference method.

14

Propositionalization: Bayesian Logic Programs

Prolog-like statements (with | instead of :-) such as

  rescue(Battalion) | in(Soldier,Battalion), wounded(Soldier).

  wounded(Soldier) | in(Soldier,Battalion), attacked(Battalion).

associated with CPTs and combination rules.

15

Propositionalization

  rescue(Battalion) | in(Soldier,Battalion), wounded(Soldier).   (plus CPT and combination rule)

BN grounded from the and-or tree:

  rescue(bat13)
  in(john,bat13)   wounded(john)
  in(peter,bat13)  wounded(peter)
  ...
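To make the grounding step concrete, here is a minimal Python sketch of propositionalizing the rescue rule into a grounded CPT for bat13 (the soldier domain, the 0.9 probability, and the noisy-OR combination rule are illustrative assumptions, not taken from the slides):

```python
# Minimal sketch: ground rescue(Battalion) | in(Soldier,Battalion), wounded(Soldier)
# for bat13 over an assumed soldier domain, using an assumed noisy-OR combination rule.
from itertools import product

soldiers = ["john", "peter"]          # assumed Soldier domain
battalion = "bat13"

# Grounded parent variables of rescue(bat13).
parent_vars = [v for s in soldiers
               for v in (f"in({s},{battalion})", f"wounded({s})")]

def rescue_cpt(assignment, p_cause=0.9):
    """P(rescue(bat13)=true | parents) with a noisy-OR over in(s,bat13) & wounded(s)."""
    causes = sum(assignment[f"in({s},{battalion})"] and assignment[f"wounded({s})"]
                 for s in soldiers)
    return 1.0 - (1.0 - p_cause) ** causes

# Enumerate the rows of the grounded CPT, as propositionalization would.
for values in product([False, True], repeat=len(parent_vars)):
    assignment = dict(zip(parent_vars, values))
    print(assignment, "->", rescue_cpt(assignment))
```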

16

Propositionalization: Multi-Entity Bayesian Networks (Laskey, 2004)

in(Soldier,Battalion) wounded(Soldier)

rescue(Battalion)

in(Soldier,Battalion) attacked(Battalion)

wounded(Soldier)

17

Propositionalization: Multi-Entity Bayesian Networks (Laskey, 2004)

in(Soldier,Battalion) wounded(Soldier)

rescue(Battalion)

in(john,bat13) wounded(john)

rescue(bat13)

in(peter,bat13) wounded(peter)

rescue(bat13)

...

first-order fragments

instantiated

18

Propositionalization: Multi-Entity Bayesian Networks (Laskey, 2004)

in(john,bat13) wounded(john)

rescue(bat13)

in(peter,bat13) wounded(peter)

...

in(john,bat13) wounded(john)

rescue(bat13)

in(peter,bat13) wounded(peter)

rescue(bat13)

...

19

Outline
- A goal: First-order Probabilistic Inference
  - An ultimate goal
  - Propositionalization
- Lifted Inference
  - Inversion Elimination
  - Counting Elimination
- DBLOG
  - BLOG background
  - DBLOG
  - Retro-instantiation
  - Improving BLOG inference
- Future Directions and Conclusion

20

Lifted Inference

rescue(bat13)

rescue(Battalion) ⇐ in(Soldier,Battalion) ∧ wounded(Soldier)
{ wounded(Soldier) ⇐ in(Soldier,Battalion) ∧ attacked(Battalion) }

wounded(Soldier)

attacked(bat13)

in(Soldier, bat13)

Unbound, general variable

• Faster
• More compact
• More intuitive
• Higher level: more information/structure available for optimization

21

Bayesian Networks (directed)

epidemic

P(sick_john, sick_mary, sick_bob, epidemic)= P(sick_john | epidemic) * P(sick_mary | epidemic) * P(sick_bob | epidemic) * P(epidemic)

sick_john sick_mary sick_bob

P(sick_bob | epidemic)

P(epidemic)

P(sick_john | epidemic)

P(sick_mary | epidemic)

22

Factor Networks (undirected)

epidemic_france

P(epi_france, epi_belgium, epi_uk, epi_germany) ∝ φ(epi_france, epi_belgium) * φ(epi_france, epi_uk) * φ(epi_france, epi_germany) * φ(epi_belgium, epi_germany)

epidemic_belgium epidemic_germany

epidemic_uk

φ: factor, or potential function

23

Bayesian Nets as Factor Networks

epidemic

P(sick_john, sick_mary, sick_bob, epidemic) ∝ P(sick_john | epidemic) * P(sick_mary | epidemic) * P(sick_bob | epidemic) * P(epidemic)

sick_john sick_mary sick_bob

P(sick_bob | epidemic)

P(epidemic)

P(sick_john | epidemic)

P(sick_mary | epidemic)

24

Inference: Marginalization

epidemic

P(sick_john) ∝ Σ_epidemic Σ_sick_mary Σ_sick_bob P(sick_john | epidemic) * P(sick_mary | epidemic) * P(sick_bob | epidemic) * P(epidemic)

sick_john sick_mary sick_bob

P(sick_bob | epidemic)

P(epidemic)

P(sick_john | epidemic)

P(sick_mary | epidemic)

25

epidemic

sick_john sick_mary sick_bob

P(sick_bob | epidemic)

P(epidemic)

P(sick_john | epidemic)

P(sick_mary | epidemic)

P(sick_john) ∝ Σ_epidemic P(sick_john | epidemic) * P(epidemic)

              * Σ_sick_mary P(sick_mary | epidemic)

              * Σ_sick_bob P(sick_bob | epidemic)

Inference: Variable Elimination (VE)

26

epidemic

sick_john sick_mary

φ(epidemic)

P(epidemic)

P(sick_john | epidemic)

P(sick_mary | epidemic)

P(sick_john) ∝ Σ_epidemic P(sick_john | epidemic) * P(epidemic)

              * Σ_sick_mary P(sick_mary | epidemic) * φ(epidemic)

Inference: Variable Elimination (VE)

27

epidemic

sick_john sick_mary

φ(epidemic)

P(epidemic)

P(sick_john | epidemic)

P(sick_mary | epidemic)

P(sick_john) ∝ Σ_epidemic P(sick_john | epidemic) * P(epidemic) * φ(epidemic)

              * Σ_sick_mary P(sick_mary | epidemic)

Inference: Variable Elimination (VE)

28

epidemic

sick_john

φ(epidemic)

P(epidemic)

P(sick_john | epidemic)

P(sick_john) ∝ Σ_epidemic P(sick_john | epidemic) * P(epidemic) * φ(epidemic) * φ'(epidemic)

φ'(epidemic)

Inference: Variable Elimination (VE)

29

sick_john

φ(sick_john)

P(sick_john) ∝ φ(sick_john)

Inference: Variable Elimination (VE)
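A minimal Python sketch of this elimination order on the epidemic example, using made-up CPT numbers (the slides give none). Eliminating sick_bob and sick_mary yields the intermediate φ(epidemic) and φ'(epidemic) factors (all ones here, since they are unobserved leaves), and eliminating epidemic then gives the unnormalized marginal of sick_john:

```python
# Variable elimination sketch; the CPT numbers are illustrative assumptions.
p_epidemic = {True: 0.1, False: 0.9}
p_sick_given_epi = {True: 0.6, False: 0.05}   # shared by john, mary and bob

def eliminate_person():
    """Sum one person's sickness variable out of its CPT, producing phi(epidemic)."""
    return {e: sum(p_sick_given_epi[e] if s else 1 - p_sick_given_epi[e]
                   for s in (True, False))
            for e in (True, False)}

phi_bob  = eliminate_person()   # phi(epidemic); all ones because sick_bob is unobserved
phi_mary = eliminate_person()   # phi'(epidemic)

# Eliminate epidemic last: P(sick_john) ∝ Σ_e P(sick_john|e) * P(e) * phi(e) * phi'(e)
marginal = {sj: sum((p_sick_given_epi[e] if sj else 1 - p_sick_given_epi[e])
                    * p_epidemic[e] * phi_bob[e] * phi_mary[e]
                    for e in (True, False))
            for sj in (True, False)}
print(marginal)   # {True: 0.105, False: 0.895} with these assumed numbers
```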

30

A Factor Network

sick(mary,measles)

hospital(mary)

epidemic(measles) epidemic(flu)

sick(mary,flu)

… sick(bob,measles)

hospital(bob)

sick(bob,flu)   ...

31

First-order Representation

sick(P,D)

hospital(P)

epidemic(D)

Atoms represent a set of random variables.

Logical variables parameterize random variables.

Parfactors: parameterized factors.

Constraints on logical variables, e.g. P ≠ john.

32

Semantics

sick(mary,measles)

hospital(mary)

epidemic(measles) epidemic(flu)

sick(mary,flu)

… sick(bob,measles)

hospital(bob)

sick(bob,flu)   ...

φ(sick(mary,measles), epidemic(measles))

33

Outline
- A goal: First-order Probabilistic Inference
  - An ultimate goal
  - Propositionalization
- Lifted Inference
  - Inversion Elimination
  - Counting Elimination
- DBLOG
  - BLOG background
  - DBLOG
  - Retro-instantiation
  - Improving BLOG inference
- Future Directions and Conclusion

34

Inversion Elimination (IE)

sick(P, D)

epidemic(D)

D ≠ measles

IE

…sick(john, flu)

epidemic(flu)

sick(mary, rubella)

epidemic(rubella)…

Grounding

…sick(john, flu)

epidemic(flu)

sick(mary, rubella)

epidemic(rubella)…

VE

Abstraction

epidemic(D)

D ≠ measles

sick(P, D)

Σ_sick(P,D) φ(epidemic(D), sick(P,D))
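The identity that makes this work: since each sick(P,D) appears in exactly one instance of the parfactor, the sum over all joint assignments factorizes, i.e. Σ over all sick(p,d) of Π_{p,d} φ(epidemic(d), sick(p,d)) = Π_{p,d} Σ_v φ(epidemic(d), v). A small Python sketch (with an arbitrary illustrative potential) checking the lifted computation against brute-force grounding:

```python
# Numeric check of inversion elimination on a tiny domain; phi is an arbitrary
# illustrative potential, not taken from the slides.
from itertools import product

people   = ["john", "mary"]
diseases = ["flu", "rubella"]                 # D != measles
phi = {(e, s): 1.0 + 0.5 * e + 0.2 * s for e in (0, 1) for s in (0, 1)}

def grounded(epi):
    """Brute force: sum over all joint assignments to the sick(p,d) variables."""
    rvs = list(product(people, diseases))
    total = 0.0
    for values in product((0, 1), repeat=len(rvs)):
        prod = 1.0
        for (p, d), s in zip(rvs, values):
            prod *= phi[(epi[d], s)]
        total += prod
    return total

def lifted(epi):
    """Inversion elimination: sum each sick(p,d) out of its own parfactor instance."""
    result = 1.0
    for p in people:
        for d in diseases:
            result *= sum(phi[(epi[d], s)] for s in (0, 1))
    return result

epi = {"flu": 1, "rubella": 0}                # one assignment to the epidemic(D) variables
assert abs(grounded(epi) - lifted(epi)) < 1e-9
```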

35

Inversion Elimination (IE)
• Proposed by Poole (2003); lacked formalization; did not identify the cases in which it is incorrect.
• Formalized, with identified correctness conditions (with Dan Roth and Eyal Amir).

36

Requires eliminated RVs to occur in separate instances of parfactor

…sick(mary, flu)

epidemic(flu)

sick(mary, rubella)

epidemic(rubella)…

sick(mary, D)

epidemic(D)

D ≠ measles

Inversion Elimination correct

Inversion Elimination - Limitations

37

IE not correct

Requires eliminated RVs to occur in separate instances of parfactor

Inversion Elimination - Limitations

epidemic(D2)

epidemic(D1)

D1 ≠ D2    month

epidemic(measles)

epidemic(flu)

epidemic(rubella)

…month

epidemic(D2)

epidemic(D1)

D1 ≠ D2    month

38

Outline
- A goal: First-order Probabilistic Inference
  - An ultimate goal
  - Propositionalization
- Lifted Inference
  - Inversion Elimination
  - Counting Elimination
- DBLOG
  - BLOG background
  - DBLOG
  - Retro-instantiation
  - Improving BLOG inference
- Future Directions and Conclusion

39

Counting Elimination
• Need to consider joint assignments;
• There is an exponential number of those;
• But the potential actually depends only on the histogram of values in the assignment: 00101 is treated the same as 11000;
• So only a polynomial number of histograms needs to be considered.

epidemic(D2)

epidemic(D1)

D1 ≠ D2    month

epidemic(measles)

epidemic(flu)

epidemic(rubella)

…month
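A small Python sketch of the counting argument (with an illustrative symmetric pair potential, not the actual counting elimination procedure): the product over all pairs epidemic(D1), epidemic(D2) with D1 ≠ D2 depends only on how many epidemic variables are true, so summing over histograms with binomial multiplicities reproduces the brute-force sum over all assignments.

```python
# Counting argument sketch; the pair potential values are illustrative assumptions.
from itertools import product, combinations
from math import comb

n = 6                                                        # number of diseases (illustrative)
phi = {(0, 0): 1.0, (0, 1): 0.7, (1, 0): 0.7, (1, 1): 0.2}   # symmetric pair potential

def brute_force():
    """Sum over all 2^n joint assignments to epidemic(D1), ..., epidemic(Dn)."""
    total = 0.0
    for values in product((0, 1), repeat=n):
        prod = 1.0
        for i, j in combinations(range(n), 2):               # every pair with D1 != D2
            prod *= phi[(values[i], values[j])]
        total += prod
    return total

def counting():
    """Sum over histograms: only the number k of true epidemic variables matters."""
    total = 0.0
    for k in range(n + 1):
        value = (phi[(1, 1)] ** comb(k, 2)
                 * phi[(0, 0)] ** comb(n - k, 2)
                 * phi[(0, 1)] ** (k * (n - k)))
        total += comb(n, k) * value                          # multiplicity of this histogram
    return total

assert abs(brute_force() - counting()) < 1e-9
```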

40

A Simple Experiment

41

Outline
- A goal: First-order Probabilistic Inference
  - An ultimate goal
  - Propositionalization
- Lifted Inference
  - Inversion Elimination
  - Counting Elimination
- DBLOG
  - BLOG background
  - DBLOG
  - Retro-instantiation
  - Improving BLOG inference
- Future Directions and Conclusion

42

Outline
- A goal: First-order Probabilistic Inference
  - An ultimate goal
  - Propositionalization
- Lifted Inference
  - Inversion Elimination
  - Counting Elimination
- DBLOG
  - BLOG background
  - DBLOG
  - Retro-instantiation
  - Improving BLOG inference
- Future Directions and Conclusion

43

BLOG (Bayesian LOGic)
• Milch & Russell (2004);
• A probabilistic logic language;
• Current inference is propositional sampling;
• Special characteristics: Open Universe; expressive language.

44

BLOG (Bayesian LOGic): Open Universe; expressive language.

  type Battalion;
  #Battalion ~ Uniform(1,10);

  random Boolean Large(Battalion bat) ~ Bernoulli(0.6);

  random NaturalNum Region(Battalion bat) ~ TabularCPD[[0.3, 0.4, 0.2, 0.1]]();

  type Soldier;
  #Soldier(BattalionOf = Battalion bat)
    if Large(bat) then ~ Poisson(1500);
    else if Region(bat) = 2 then = 300;
    else ~ Poisson(500);

  query Average( {#Soldier(b) for Battalion b} );
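Since current BLOG inference is sampling-based, the generative semantics of this model can be sketched as a forward sampler in Python (this is an illustrative re-implementation of the model above, not the BLOG engine; numpy is used for the distributions):

```python
# Forward sampler for the Battalion/Soldier model above (illustrative sketch).
import numpy as np
rng = np.random.default_rng(0)

def sample_world():
    """Sample one possible world and return the per-battalion soldier counts."""
    n_battalions = rng.integers(1, 11)                  # #Battalion ~ Uniform(1,10)
    counts = []
    for _ in range(n_battalions):
        large = rng.random() < 0.6                      # Large(bat) ~ Bernoulli(0.6)
        region = rng.choice(4, p=[0.3, 0.4, 0.2, 0.1])  # Region(bat) ~ TabularCPD
        if large:
            counts.append(rng.poisson(1500))            # #Soldier(bat) ~ Poisson(1500)
        elif region == 2:
            counts.append(300)
        else:
            counts.append(rng.poisson(500))
    return counts

# Monte Carlo estimate of the query Average({#Soldier(b) for Battalion b}).
averages = [np.mean(sample_world()) for _ in range(10_000)]
print("estimated expected average #Soldier per battalion:", np.mean(averages))
```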

45

Inference in BLOG

  random Boolean Attacked(Battalion bat) ~ Bernoulli(0.7);

  random Boolean Wounded(Soldier sol)
    if Attacked(BattalionOf(sol)) then
      if Experienced(sol) then ~ Bernoulli(0.1);
      else ~ Bernoulli(0.4);
    else = False;

  random Boolean Experienced(Soldier sol) ~ Bernoulli(0.1);

  random Boolean Rescue(Battalion bat)
    = exists Soldier sol BattalionOf(sol) = bat & Wounded(sol);

  guaranteed Battalion bat13;
  query Rescue(bat13);

46

Inference in BLOG - Sampling

  Attacked(bat13): sampled false
  #Soldier(bat13): Open Universe (the number of soldiers is itself sampled)
  Wounded(soldier1): false   ...   Wounded(soldier73): false
  Rescue(bat13): false

47

Inference in BLOG - Sampling

  Attacked(bat13): sampled true
  #Soldier(bat13): Open Universe
  Experienced(soldier1): sampled
  Wounded(soldier1): true   ...   Wounded(soldier49)
  Rescue(bat13): true

48

Outline
- A goal: First-order Probabilistic Inference
  - An ultimate goal
  - Propositionalization
- Lifted Inference
  - Inversion Elimination
  - Counting Elimination
- DBLOG
  - BLOG background
  - DBLOG
  - Retro-instantiation
  - Improving BLOG inference
- Future Directions and Conclusion

49

Temporal models in BLOG: expressive enough to directly write temporal models

  type Aircraft;
  #Aircraft ~ Poisson[3];

  random Real Position(Aircraft a, NaturalNum t)
    if t = 0 then ~ UniformReal[-10, 10]()
    else ~ Gaussian(Position(a, Pred(t)), 2);

  type Blip;
  // num of blips from aircraft a is 0 or 1
  #Blip(Source = Aircraft a, Time = NaturalNum t) ~ TabularCPD[[0.2, 0.8]]();

  // num of false alarms has a Poisson distribution.
  #Blip(Time = NaturalNum t) ~ Poisson[2];

  random Real ApparentPos(Blip b, NaturalNum t)
    ~ Gaussian(Position(Source(b), Pred(t)), 2);

50

Temporal models in BLOG
• The inference algorithm does not use the temporal structure:
  - Markov property: the state depends on the previous state only;
  - evidence and queries come in successive time steps;
• Dynamic BLOG (DBLOG): analogous to Dynamic Bayesian Networks (DBNs).

51

DBLOG Syntax: the only change in syntax is a special Timestep type, with the same semantics as NaturalNum:

  type Aircraft;
  #Aircraft ~ Poisson[3];

  random Real Position(Aircraft a, Timestep t)
    if t = @0 then ~ UniformReal[-10, 10]()
    else ~ Gaussian(Position(a, Prev(t)), 2);

  type Blip;
  // num of blips from aircraft a is 0 or 1
  #Blip(Source = Aircraft a, Time = Timestep t) ~ TabularCPD[[0.2, 0.8]]();

  // num of false alarms has a Poisson distribution.
  #Blip(Time = Timestep t) ~ Poisson[2];

  random Real ApparentPos(Blip b, Timestep t)
    ~ Gaussian(Position(Source(b), Prev(t)), 2);

52

DBLOG Particle Filtering: similar idea to DBN Particle Filtering

  State samples for t-1
    → likelihood weighting: samples weighted by the evidence at t
    → resampling
    → State samples for t
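A minimal bootstrap particle filter in Python for a single aircraft's Position, following this loop (generic DBN-style filtering with the model's Gaussian transition and observation noise; an illustrative sketch, not the DBLOG implementation):

```python
# Bootstrap particle filter sketch for Position(a, t); illustrative, not DBLOG itself.
import numpy as np
rng = np.random.default_rng(0)

N = 1000                                           # number of particles

def step(particles, observation):
    """Propagate, weight by the evidence likelihood, and resample."""
    # Transition: Position(a, t) ~ Gaussian(Position(a, t-1), 2)
    proposed = particles + rng.normal(0.0, np.sqrt(2.0), size=N)
    # Likelihood weighting: ApparentPos ~ Gaussian(Position, 2)
    weights = np.exp(-(observation - proposed) ** 2 / (2 * 2.0))
    weights /= weights.sum()
    # Resampling: draw N particles in proportion to their weights.
    return proposed[rng.choice(N, size=N, p=weights)]

# Prior: Position(a, 0) ~ UniformReal[-10, 10]
particles = rng.uniform(-10, 10, size=N)
for obs in [3.2, 2.9, 3.5]:                        # illustrative ApparentPos observations
    particles = step(particles, obs)
print("posterior mean position:", particles.mean())
```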

53

Weighting by evidence in DBNs

evidence

state

t-1 t

54

Weighting evidence in DBLOG

evidence

state Position(a1,@1)

Position(a2,@1)

#Aircraft

t-1 t

obs {Blip b : Source(b) = a1} = {B1};

obs ApparentPos(B1, @2) = 3.2;

ApparentPos(B1,@2)

#Blip(a1,@2)

Position(a1,@2)   B1

Lazy instantiation

55

Outline A goal: First-order Probabilistic Inference

An ultimate goal Propositionalization

Lifted Inference Inversion Elimination Counting Elimination

DBLOG BLOG background DBLOG Retro-instantiation Improving BLOG inference

Future Directions and Conclusion

56

Retro-instantiation in DBLOG

Position(a1,@1)

Position(a2,@1)

#Aircraft

ApparentPos(B1,@2)

#Blip(a1,@2)

Position(a1,@2)   B1

ApparentPos(B8,@9)

#Blip(a2,@9)

Position(a2,@9)

B8

Position(a2,@2)

...

obs {Blip b : Source(b) = a2} = {B8};

obs ApparentPos(B8,@2) = 2.1;

...

...

57

Coping with retro-instantiation
• Analyse the possible queries and evidence (provided by the user) and pre-instantiate random variables that will be necessary later;
• Use mixing time to decide when to stop sampling back.

58

Outline
- A goal: First-order Probabilistic Inference
  - An ultimate goal
  - Propositionalization
- Lifted Inference
  - Inversion Elimination
  - Counting Elimination
- DBLOG
  - BLOG background
  - DBLOG
  - Retro-instantiation
  - Improving BLOG inference
- Future Directions and Conclusion

59

Improving BLOG inference: Rejection sampling

Sampled values are compared against the evidence: we count only the samples that match the evidence.

60

Improving BLOG inference: Likelihood weighting

We count all samples, using the evidence likelihood as the weight of each sample.
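To make the contrast concrete, here is a toy Python sketch of both estimators on a tiny epidemic/sick model with assumed probabilities (not BLOG's actual sampler): rejection sampling keeps only samples whose sampled evidence matches, while likelihood weighting never samples the evidence variable and weights each sample by its likelihood.

```python
# Rejection sampling vs. likelihood weighting on a toy model (assumed numbers).
import random
random.seed(0)

P_EPI = 0.1
P_SICK = {True: 0.6, False: 0.05}     # P(sick=true | epidemic)

def rejection_sampling(n=100_000):
    """Estimate P(epidemic | sick=true) by counting samples that match the evidence."""
    kept = hits = 0
    for _ in range(n):
        epi = random.random() < P_EPI
        sick = random.random() < P_SICK[epi]
        if sick:                       # keep only samples matching the evidence
            kept += 1
            hits += epi
    return hits / kept

def likelihood_weighting(n=100_000):
    """Same query; the evidence is not sampled, each sample is weighted by its likelihood."""
    num = den = 0.0
    for _ in range(n):
        epi = random.random() < P_EPI
        w = P_SICK[epi]                # likelihood of the evidence sick=true
        num += w * epi
        den += w
    return num / den

print(rejection_sampling(), likelihood_weighting())   # both close to 0.06 / 0.105 ≈ 0.57
```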

61

Improving BLOG inference

  Evidence: {Happy(p) : Person p} = {true, true, false}
  Sample:   Happy(john) = true, Happy(mary) = false, Happy(peter) = false

The evidence is a deterministic function of the sampled variables, so its likelihood is always 0 or 1. Likelihood weighting is then no better than rejection sampling.

62

Improving BLOG inference

  Evidence: {Salary(p) : Person p} = {90,000, 75,000, 30,000}
  Sample:   Salary(john) = 100,000, Salary(mary) = 80,000, Salary(peter) = 500,000

The more unlikely the evidence, the worse it gets.

63

Improving BLOG inference

  Evidence: {ApparentPos(b) : Blip b} = {1.2, 0.5, 5.1}
  Sample:   ApparentPos(b1) = 2.1, ApparentPos(b2) = 3.3, ApparentPos(b3) = 3.5

It doesn't work at all for continuous evidence: the probability of sampling the exact observed values is zero.

64

Structure-dependent automatic importance sampling

  Evidence: {ApparentPos(b) : Blip b} = {1.2, 0.5, 5.1}

  Sampled:  Position(Source(b1)) = 1.5, Position(Source(b2)) = 3.9, Position(Source(b3)) = 4.0
  Assigned: ApparentPos(b1) = 1.2, ApparentPos(b2) = 5.1, ApparentPos(b3) = 0.5

The high-level language allows the algorithm to automatically apply better sampling schemes.
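A sketch of this idea in Python for the blip example (illustrative numbers; the actual algorithm also reasons about which evidence value is associated with which blip): instead of sampling ApparentPos and hoping it matches, the observed values are plugged in directly and each sample of the positions is weighted by the density of the evidence.

```python
# Evidence-driven importance sampling sketch for the blip example (illustrative).
import numpy as np
rng = np.random.default_rng(0)

evidence = np.array([1.2, 5.1, 0.5])       # observed {ApparentPos(b) : Blip b}
VAR = 2.0                                  # ApparentPos ~ Gaussian(Position, 2)

def gaussian_density(x, mean, var):
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def weighted_sample():
    """Sample the positions, set ApparentPos to the evidence, weight by its density."""
    positions = rng.uniform(-10, 10, size=3)            # prior over Position(Source(b))
    # Importance weight: density of the observed values given the sampled positions
    # (the blip-to-evidence association is kept fixed here for simplicity).
    weight = np.prod(gaussian_density(evidence, positions, VAR))
    return positions, weight

positions, weights = zip(*(weighted_sample() for _ in range(20_000)))
positions, weights = np.array(positions), np.array(weights)
posterior_mean = (weights[:, None] * positions).sum(axis=0) / weights.sum()
print("posterior mean positions:", posterior_mean)      # pulled toward the evidence values
```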

65

Outline
- A goal: First-order Probabilistic Inference
  - An ultimate goal
  - Propositionalization
- Lifted Inference
  - Inversion Elimination
  - Counting Elimination
- DBLOG
  - BLOG background
  - DBLOG
  - Retro-instantiation
  - Improving BLOG inference
- Future Directions and Conclusion

66

For the future
• Lifted inference works on symmetric, indistinguishable objects only; adapt approximations for dealing with merely similar objects;
• Parameterized queries: P(sick(X, measles)) = ?
• Function symbols: φ(diabetes(X), diabetes(father(X)));
• Equality (paramodulation);
• Learning.

67

Conclusion
• Expressive, first-order probabilistic representations, in inference too!
• Lifted inference is equivalent to grounded inference: much faster, yet yielding the same exact answer;
• DBLOG brings techniques for reasoning about temporal processes to first-order probabilistic reasoning.

68