
Page 1

Bayesian Networks Chris Amato Northeastern University Some images and slides are used from: Rob Platt, CS188 UC Berkeley, AIMA, Mykel Kochenderfer

Page 2

Models describe how (a portion of) the world works. Models are always simplifications:

May not account for every variable. May not account for all interactions between variables. "All models are wrong; but some are useful."

– George E. P. Box

What do we do with probabilistic models?

We (or our agents) need to reason about unknown variables, given evidence

Example: explanation (diagnostic reasoning). Example: prediction (causal reasoning). Example: value of information

Probabilistic models

Page 3

Two problems with using full joint distribution tables as our probabilistic models: Unless there are only a few variables, the joint is WAY too big to represent explicitly. Hard to learn (estimate) anything empirically about more than a few variables at a time.

Bayes' nets: a technique for describing complex joint distributions (models) using simple, local distributions (conditional probabilities). More properly called graphical models. We describe how variables locally interact. Local interactions chain together to give global, indirect interactions.

Bayes’ nets: Big picture

Page 4

Nodes: variables (with domains)

Can be assigned (observed) or unassigned (unobserved)

Arcs: interactions

Similar to CSP constraints. Indicate "direct influence" between variables. Formally: encode conditional independence (more later)

Graphical model notation

Page 5

§ A directed, acyclic graph, one node per random variable
§ A conditional probability table (CPT) for each node

§ A collection of distributions over X, one for each combination of parents' values

§ Bayes' nets implicitly encode joint distributions

§ As a product of local conditional distributions
§ To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together: P(x1, …, xn) = ∏i P(xi | parents(Xi))

Bayes’ net semantics

Bayesian networks: A compact representation of a joint probability distribution

• Each node corresponds to a random variable
• Arrows connect pairs of nodes, and there are no directed cycles
• Associated with each node Xi is a distribution P(Xi | Pa_Xi)

[Network diagram: B (battery failure) and S (solar panel failure) point to E (electrical system failure); E points to D (trajectory deviation) and C (communication loss)] (DMU 2.1.4)

Representation of joint distribution

[Same network, annotated with independent-parameter counts: P(B): 1, P(S): 1, P(E | B, S): 4, P(D | E): 2, P(C | E): 2]

• Chain rule: P(B, S, E, D, C) = P(B) P(S) P(E | B, S) P(D | E) P(C | E)

• How many independent parameters in general? 2^5 − 1 = 31
• How many independent parameters in the Bayes net? 1 + 1 + 4 + 2 + 2 = 10

Bayesian networks can reduce the number of parameters (DMU 2.1.4)
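To make the chain rule concrete in code, here is a minimal Python sketch of this network. The structure and parameter counts come from the slide; the CPT values themselves are hypothetical placeholders, chosen only for illustration.

from itertools import product

# Structure from the slide: B -> E <- S, E -> D, E -> C.
# All probability values below are made up, not from the slides.
P_B = {True: 0.01, False: 0.99}                   # P(B): 1 independent parameter
P_S = {True: 0.02, False: 0.98}                   # P(S): 1 parameter
P_E = {(True, True): 0.99, (True, False): 0.90,   # P(+e | B, S): 4 parameters
       (False, True): 0.80, (False, False): 0.01}
P_D = {True: 0.96, False: 0.03}                   # P(+d | E): 2 parameters
P_C = {True: 0.95, False: 0.02}                   # P(+c | E): 2 parameters

def joint(b, s, e, d, c):
    """Chain rule: P(B,S,E,D,C) = P(B) P(S) P(E|B,S) P(D|E) P(C|E)."""
    pe = P_E[(b, s)] if e else 1.0 - P_E[(b, s)]
    pd = P_D[e] if d else 1.0 - P_D[e]
    pc = P_C[e] if c else 1.0 - P_C[e]
    return P_B[b] * P_S[s] * pe * pd * pc

# Sanity check: the 10-parameter product defines a proper joint over all 2^5 rows.
print(sum(joint(*row) for row in product((True, False), repeat=5)))  # 1.0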

Page 6

N independent coin flips. No interactions between variables: absolute independence

[Network: nodes X1, X2, …, Xn with no arcs]

Example: Coin flips

Page 7

Variables:
R: It rains
T: There is traffic

Model 1: independence
§ Model 2: rain causes traffic

[Diagrams: Model 1 has R and T with no arc; Model 2 has R → T]

Why is an agent using model 2 better?

Example: Traffic

Page 8

Bayes' nets implicitly encode joint distributions

As a product of local conditional distributions

To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together: P(x1, …, xn) = ∏i P(xi | parents(Xi))

Example:

Probabilities in Bayes’ nets

Page 9

Why are we guaranteed that setting P(x1, …, xn) = ∏i P(xi | parents(Xi))

results in a proper joint distribution?

Chain rule (valid for all distributions): P(x1, …, xn) = ∏i P(xi | x1, …, xi−1). Assume conditional independences: P(xi | x1, …, xi−1) = P(xi | parents(Xi)). → Consequence: P(x1, …, xn) = ∏i P(xi | parents(Xi))

Not every BN can represent every joint distribution

The topology enforces certain conditional independencies

Probabilities in Bayes’ nets

Page 10

Only distributions whose variables are absolutely independent can be represented by a Bayes' net with no arcs.

[Each node Xi has the CPT: h 0.5, t 0.5]

X1 X2 … Xn

Example: Coin flips

Page 11

[Network: R → T]

P(R): +r 1/4, −r 3/4

P(T | R): +r: +t 3/4, −t 1/4; −r: +t 1/2, −t 1/2

Example: Traffic

Page 12

[Network: Burglary → Alarm ← Earthquake; Alarm → John calls; Alarm → Mary calls]

B P(B)

+b 0.001

-b 0.999

E P(E)

+e 0.002

-e 0.998

B E A P(A|B,E)

+b +e +a 0.95

+b +e -a 0.05

+b -e +a 0.94

+b -e -a 0.06

-b +e +a 0.29

-b +e -a 0.71

-b -e +a 0.001

-b -e -a 0.999

A J P(J|A)

+a +j 0.9

+a -j 0.1

-a +j 0.05

-a -j 0.95

A M P(M|A)

+a +m 0.7

+a -m 0.3

-a +m 0.01

-a -m 0.99

Example: Alarm network
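The tables above translate directly into code. A small sketch (True stands for +, False for −; the numbers are copied from the CPTs) that scores one full assignment by multiplying the relevant conditionals:

P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,    # P(+a | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                    # P(+j | A)
P_M = {True: 0.70, False: 0.01}                    # P(+m | A)

def joint(b, e, a, j, m):
    """P(b, e, a, j, m) = P(b) P(e) P(a | b, e) P(j | a) P(m | a)."""
    pa = P_A[(b, e)] if a else 1.0 - P_A[(b, e)]
    pj = P_J[a] if j else 1.0 - P_J[a]
    pm = P_M[a] if m else 1.0 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

# Full assignment: no burglary, no earthquake, alarm rings, both neighbors call.
print(joint(False, False, True, True, True))
# 0.999 * 0.998 * 0.001 * 0.9 * 0.7, roughly 0.000628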

Page 13

Causal direction: R → T

P(R): +r 1/4, −r 3/4

P(T | R): +r: +t 3/4, −t 1/4; −r: +t 1/2, −t 1/2

Joint P(R, T): +r +t 3/16, +r −t 1/16, −r +t 6/16, −r −t 6/16

Example: Traffic

Page 14

Reverse causality? T → R

P(T): +t 9/16, −t 7/16

P(R | T): +t: +r 1/3, −r 2/3; −t: +r 1/7, −r 6/7

Joint P(R, T): +r +t 3/16, +r −t 1/16, −r +t 6/16, −r −t 6/16 (the same joint as before)

Example: Reverse traffic

Page 15

When Bayes' nets reflect the true causal patterns:

Often simpler (nodes have fewer parents). Often easier to think about. Often easier to elicit from experts.

BNs need not actually be causal

Sometimes no causal net exists over the domain (especially if variables are missing)

E.g., consider the variable Traffic. You end up with arrows that reflect correlation, not causation.

What do the arrows really mean?

Topology may happen to encode causal structure. Topology really encodes conditional independence.

Causality?

Page 16

§ How big is a joint distribution over N Boolean variables?

2^N

§ How big is an N-node net if nodes have up to k parents?

O(N · 2^(k+1))

§ Both give you the power to calculate any query P(X1, …, XN)

§ BNs: Huge space savings! E.g., with N = 20 and k = 3: 2^20 ≈ 1,000,000 joint entries vs. 20 · 2^4 = 320 CPT entries

§ Also easier to elicit local CPTs

§ Also faster to answer queries (coming)

Size of a Bayes’ net

Page 17

§ Assumptions we are required to make to define the Bayes net when given the graph:

P(xi | x1 · · · xi−1) = P(xi | parents(Xi))

§ Beyond the above "chain rule → Bayes net" conditional independence assumptions

§ Often additional conditional independences

§ They can be read off the graph

§ Important for modeling: understand assumptions made when choosing a Bayes net graph

Bayes’ nets: Assumptions

Page 18

§ Conditional independence assumptions directly from simplifications in the chain rule
§ Additional implied conditional independence assumptions?

Example

Page 19

§ Important question about a BN: Are two nodes independent given certain evidence?
§ If yes, can prove using algebra (tedious in general)
§ If no, can prove with a counterexample
§ Example: X → Y → Z

§ Question: are X and Z necessarily independent?
§ Answer: no. Example: low pressure causes rain, which causes traffic.
§ X can influence Z, Z can influence X (via Y)
§ Addendum: they could be independent: how?

Independence in a Bayes’ net

Page 20

§ Study independence properties for triples
§ Analyze complex cases in terms of member triples
§ D-separation: a condition/algorithm for answering such queries

D-separation: Outline

Page 21

§ This configuration is a "causal chain": X → Y → Z

X: Low pressure, Y: Rain, Z: Traffic

§ Guaranteed X independent of Z? § No!

§ One example set of CPTs for which X is not independent of Z is sufficient to show this independence is not guaranteed.

§ Example:

§ Low pressure causes rain causes traffic; high pressure causes no rain causes no traffic

§ In numbers:

P(+y | +x) = 1, P(−y | −x) = 1, P(+z | +y) = 1, P(−z | −y) = 1

Causal chains

Page 22

§ This configuration is a "causal chain": X → Y → Z

§ Guaranteed X independent of Z given Y?

§ Evidence along the chain "blocks" the influence

Yes!

X: Low pressure, Y: Rain, Z: Traffic

Causal chains
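To see why, a one-line check using the chain's factorization P(x, y, z) = P(x) P(y | x) P(z | y) (a step the slide leaves implicit):

P(z | x, y) = P(x, y, z) / P(x, y) = P(x) P(y | x) P(z | y) / (P(x) P(y | x)) = P(z | y)

Once Y is observed, Z carries no further dependence on X.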

Page 23

§ This configuration is a "common cause": X ← Y → Z

§ Guaranteed X independent of Z? § No!

§ One example set of CPTs for which X is not independent of Z is sufficient to show this independence is not guaranteed.

§ Example:

§ Project due causes both forums busy and lab full

§ In numbers:

P(+x | +y) = 1, P(−x | −y) = 1, P(+z | +y) = 1, P(−z | −y) = 1

Y: Project due

X: Forums busy, Z: Lab full

Common cause

Page 24

§ This configuration is a "common cause": X ← Y → Z

§ Guaranteed X and Z independent given Y?

§ Observing the cause blocks influence between effects.

Yes!

Y: Project due

X: Forums busy, Z: Lab full

Common cause

Page 25

§ Last configuration: two causes of one effect (v-structures): X → Z ← Y

Z: Traffic

§ Are X and Y independent?

§ Yes: the ballgame and the rain cause traffic, but they are not correlated

§ Still need to prove they must be (try it!)

§ Are X and Y independent given Z?

§ No: seeing traffic puts the rain and the ballgame in competition as explanations.

§ This is backwards from the other cases

§ Observing an effect activates influence between possible causes.

X: Raining, Y: Ball game

Common effect

Page 26

Conditional independence: Battery example

[Recall the battery network: B → E ← S, E → D, E → C, with the CPTs and parameter counts shown earlier]

Page 27

C is independent of B given E

Knowing that you have a battery failure does not affect your belief that there is a communication loss if you know that there has been an electrical system failure

D is independent of S given E

If you know there is an electrical failure, observing a trajectory deviation does not affect your belief that there has been a solar panel failure

Bayes nets: conditional independence example

Conditional independence

• The structure of a Bayes net encodes conditional independence assumptions

• X and Y are independent if and only if P(X, Y) = P(X) P(Y)

• Equivalently, P(X) = P(X | Y)
• X and Y are conditionally independent given Z if and only if P(X, Y | Z) = P(X | Z) P(Y | Z)
• Equivalently, P(X | Z) = P(X | Y, Z)

Independence assumptions reduce the number of parameters

DMU 2.1.5


Page 28

B is independent of S

Knowing there is a battery failure does not affect your belief about whether there has been a solar panel failure

B is not independent of S given E

If you know there has been an electrical failure and there has not been a solar panel failure, then it is more likely there was a battery failure

Influence only flows through B → E ← S (a v-structure) when E (or one of its descendants) is known

Conditional independence in v-structures

Page 29

Two sets of nodes, A and B, are conditionally independent given node set C (we say they are d-separated in the Bayes net) if every path between A and B is blocked, i.e., for each path one of the following holds:

•  The path contains a chain of nodes, A → X → B, such that X is in C

•  The path contains a fork, A ← X → B, such that X is in C

•  The path contains a v-structure, A → X ← B, such that X is not in C and no descendant of X is in C

Markov blanket: the parents, the children and the parents of the children

•  A node is independent of all other nodes in the graph given its Markov blanket

Independence concepts: General case
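A sketch of a d-separation checker built directly from the three rules above; it enumerates all undirected paths, which is fine for small graphs (the function names and the path-enumeration strategy are my own; real systems use the linear-time Bayes-ball algorithm). The battery network serves as a test case:

def d_separated(graph, x, y, evidence):
    """True iff every undirected path between x and y is blocked given `evidence`.
    `graph` maps each node to the set of its parents."""
    parents = graph
    children = {n: set() for n in graph}
    for n, ps in graph.items():
        for p in ps:
            children[p].add(n)

    def descendants(n):
        out, stack = set(), [n]
        while stack:
            for c in children[stack.pop()]:
                if c not in out:
                    out.add(c)
                    stack.append(c)
        return out

    def paths(a, b, visited):
        if a == b:
            yield [a]
            return
        for nxt in parents[a] | children[a]:
            if nxt not in visited:
                for rest in paths(nxt, b, visited | {nxt}):
                    yield [a] + rest

    def blocked(path):
        for i in range(1, len(path) - 1):
            prev, node, nxt = path[i - 1], path[i], path[i + 1]
            if prev in parents[node] and nxt in parents[node]:
                # v-structure prev -> node <- nxt: blocks unless node or one of
                # its descendants is observed
                if node not in evidence and not (descendants(node) & set(evidence)):
                    return True
            elif node in evidence:
                # chain or fork through an observed node is blocked
                return True
        return False

    return all(blocked(p) for p in paths(x, y, {x}))

# Battery network: B -> E <- S, E -> D, E -> C
g = {"B": set(), "S": set(), "E": {"B", "S"}, "D": {"E"}, "C": {"E"}}
print(d_separated(g, "B", "S", set()))   # True: marginally independent
print(d_separated(g, "B", "S", {"E"}))   # False: observing E activates the v-structure
print(d_separated(g, "D", "C", {"E"}))   # True: common cause observed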

Page 30

Yes

Example

Page 31

Yes

Yes

Yes

Example

Page 32

§ Variables:
§ R: Raining
§ T: Traffic
§ D: Roof drips
§ S: I'm sad

§ Questions:

Yes

Example

Page 33

§ Given a Bayes net structure, we can run the d-separation algorithm to build a complete list of conditional independences that are necessarily true, of the form

Xi ⊥ Xj | {Xk1, …, Xkn}

§ This list determines the set of probability distributions that can be represented

Structure implications

Page 34

§ Given some graph topology G, only certain joint distributions can be encoded

§ The graph structure guarantees certain (conditional) independences

§ (There might be more independence)

§ Adding arcs increases the set of distributions, but has several costs

§ Full conditioning can encode any distribution

Topology limits distributions

Page 35

§ We want to track multiple variables over time, using multiple sources of evidence

§ Idea: Repeat a fixed Bayes net structure at each time
§ Variables from time t can condition on those from t−1

§ Dynamic Bayes nets are a generalization of HMMs

Dynamic Bayes Nets (DBNs)

Page 36

§ A particle is a complete sample for a time step

§ Initialize: Generate prior samples for the t = 1 Bayes net
§ Example particle: G1_a = (3,3), G1_b = (5,3)

§ Elapse time: Sample a successor for each particle
§ Example successor: G2_a = (2,3), G2_b = (6,3)

§ Observe: Weight each entire sample by the likelihood of the evidence conditioned on the sample
§ Likelihood: P(E1_a | G1_a) × P(E1_b | G1_b)

§ Resample: Select prior samples (tuples of values) in proportion to their likelihood

DBN particle filters

Page 37

Video: Pacman sonar – Ghost DBN

Page 38

§ Examples:

§ Posterior probability: P(Q | E1 = e1, …, Ek = ek)

§ Most likely explanation: argmax_q P(Q = q | E1 = e1, …, Ek = ek)

Inference

§ Inference: calculating some useful quantity from a joint probability distribution

Page 39

Initial evidence: car won’t start

Testable variables (green), “broken, so fix it” variables (orange)

Hidden variables (gray) ensure sparse structure, reduce parameters

Bayes net for car diagnosis

Example: Car diagnosis


[Figure: car-diagnosis network over the variables battery age, alternator broken, fanbelt broken, battery dead, no charging, battery flat, no oil, no gas, fuel line blocked, starter broken, lights, oil light, gas gauge, battery meter, dipstick, car won't start] (AIMA Chapter 14.1–3)

Page 40

Suppose we want to infer the distribution P(B=true | D=true, C=true)

Query variables: B

Evidence variables: D, C

Hidden variables: S, E

Inference

Example: Inference in temporal models

[Diagram: X0 → X1 → X2, each Xt with an observation Ot]

• Filtering: P(Xk | O0:k) (Kalman filter / particle filter)
• Prediction: P(Xk | O0:t) where k > t
• Smoothing: P(Xk | O0:t) where k < t (forward-backward)
• Most likely explanation: argmax_{X0:t} P(X0:t | O0:t) (Viterbi)

DMU 2.2.2


Page 41

Burglar alarm example

[AIMA figure: Burglary → Alarm ← Earthquake; Alarm → JohnCalls and MaryCalls, with CPTs P(B) = .001, P(E) = .002, P(A | B, E) = .95/.94/.29/.001, P(J | A) = .90/.05, P(M | A) = .70/.01] (Chapter 14.1–3)

Page 42

B P(B)

+b 0.001

-b 0.999

E P(E)

+e 0.002

-e 0.998

B E A P(A|B,E)

+b +e +a 0.95

+b +e -a 0.05

+b -e +a 0.94

+b -e -a 0.06

-b +e +a 0.29

-b +e -a 0.71

-b -e +a 0.001

-b -e -a 0.999

A J P(J|A)

+a +j 0.9

+a -j 0.1

-a +j 0.05

-a -j 0.95

A M P(M|A)

+a +m 0.7

+a -m 0.3

-a +m 0.01

-a -m 0.99

Burglar alarm example

Page 43

What if I want to calculate P(B|j,m)?


Page 44

General case:
Evidence variables: E1 = e1, …, Ek = ek
Query* variable: Q
Hidden variables: H1, …, Hr
(All variables)

* Works fine with multiple query variables, too

§ We want: P(Q | e1, …, ek)

§ Step 1: Select the entries consistent with the evidence

§ Step 2: Sum out the hidden variables H to get the joint of query and evidence

§ Step 3: Normalize (multiply by 1/Z)

Inference by enumeration

Page 45

P(W)?

P(W | winter)?

P(W | winter, hot)?

S      T    W    P
summer hot  sun  0.30
summer hot  rain 0.05
summer cold sun  0.10
summer cold rain 0.05
winter hot  sun  0.10
winter hot  rain 0.05
winter cold sun  0.15
winter cold rain 0.20

Inference by enumeration
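A short sketch that answers all three queries from the joint table by the select / sum out / normalize recipe of the previous slide (the table values are transcribed from above):

joint = {
    ("summer", "hot",  "sun"): 0.30, ("summer", "hot",  "rain"): 0.05,
    ("summer", "cold", "sun"): 0.10, ("summer", "cold", "rain"): 0.05,
    ("winter", "hot",  "sun"): 0.10, ("winter", "hot",  "rain"): 0.05,
    ("winter", "cold", "sun"): 0.15, ("winter", "cold", "rain"): 0.20,
}

def query_w(**evidence):
    """P(W | evidence) by enumeration over the joint table."""
    totals = {}
    for (s, t, w), p in joint.items():
        row = {"S": s, "T": t}
        # Step 1: select entries consistent with the evidence.
        if all(row[var] == val for var, val in evidence.items()):
            # Step 2: sum out the hidden variables (everything but W).
            totals[w] = totals.get(w, 0.0) + p
    # Step 3: normalize.
    z = sum(totals.values())
    return {w: p / z for w, p in totals.items()}

print(query_w())                     # P(W): sun 0.65, rain 0.35
print(query_w(S="winter"))           # P(W | winter): sun 0.5, rain 0.5
print(query_w(S="winter", T="hot"))  # P(W | winter, hot): sun 2/3, rain 1/3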

Page 46

Slightly intelligent way to sum out variables from the joint without actually constructing its explicit representation

P(B | j, m)
= P(B, j, m) / P(j, m)
= α P(B, j, m)
= α Σ_e Σ_a P(B, e, a, j, m)

Inference by enumeration

Page 47

Slightly intelligent way to sum out variables from the joint without actually constructing its explicit representation

Rewrite full joint entries using product of CPT entries:

P(B | j, m)
= α Σ_e Σ_a P(B) P(e) P(a | B, e) P(j | a) P(m | a)
= α P(B) Σ_e P(e) Σ_a P(a | B, e) P(j | a) P(m | a)

Recursive depth-first enumeration: O(n) space, O(d^n) time

Inference by enumeration
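The nested-sum line above can be evaluated directly. A sketch using the alarm CPTs from the earlier tables; it recovers P(+b | +j, +m) ≈ 0.284:

P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}
TF = (True, False)

# alpha * P(B) * sum_e P(e) * sum_a P(a | B, e) P(j | a) P(m | a)
unnorm = {}
for b in TF:
    unnorm[b] = P_B[b] * sum(
        P_E[e] * sum(
            (P_A[(b, e)] if a else 1.0 - P_A[(b, e)]) * P_J[a] * P_M[a]
            for a in TF)
        for e in TF)

alpha = 1.0 / sum(unnorm.values())
print({b: alpha * p for b, p in unnorm.items()})
# {True: 0.284..., False: 0.715...}: a burglary is still unlikely given both calls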

Page 48

Enumeration is inefficient: repeated computation

e.g., computes P(j|a)P(m|a) for each value of e

[Evaluation tree figure: the tree branches on b (.001), then e (.002/.998), then a (.95/.05/.94/.06), with leaves P(j|a) = .90, P(j|¬a) = .05, P(m|a) = .70, P(m|¬a) = .01; the subtree evaluating P(j|a) P(m|a) is recomputed for each value of e] (Chapter 14.4–5)

Page 49

P(Antilock | observed variables) = ?

Inference by enumeration?

Page 50

§ Why is inference by enumeration so slow?
§ You join up the whole joint distribution before you sum out the hidden variables

§ Idea: interleave joining and marginalizing!
§ Called "Variable Elimination"
§ Sum right-to-left, storing intermediate results (factors) to avoid recomputation
§ Still NP-hard, but usually much faster than inference by enumeration

§ First we'll need some new notation: factors

Inference by enumeration vs. variable elimination

Page 51

§ In general, when we write P(Y1 … YN | X1 … XM)

§ It is a "factor," a multi-dimensional array

§ Its values are P(y1 … yN | x1 … xM)

§ Any assigned (= lower-case) X or Y is a dimension missing (selected) from the array

Factors

Page 52

§ Random variables
§ R: Raining
§ T: Traffic
§ L: Late for class!

P(R): +r 0.1, −r 0.9
P(T | R): +r +t 0.8, +r −t 0.2, −r +t 0.1, −r −t 0.9
P(L | T): +t +l 0.3, +t −l 0.7, −t +l 0.1, −t −l 0.9

P(L) = ? = Σ_{r,t} P(r, t, L) = Σ_{r,t} P(r) P(t | r) P(L | t)

Example: Traffic domain

Page 53

§ Track objects called factors
§ Initial factors are local CPTs (one per node)
§ Any known values are selected

§ E.g., if we know L = +l, the initial factors are:

§ Procedure: Join all factors, then eliminate all hidden variables

Initial factors:
P(R): +r 0.1, −r 0.9
P(T | R): +r +t 0.8, +r −t 0.2, −r +t 0.1, −r −t 0.9
P(L | T): +t +l 0.3, +t −l 0.7, −t +l 0.1, −t −l 0.9

With L = +l selected:
P(R): +r 0.1, −r 0.9
P(T | R): +r +t 0.8, +r −t 0.2, −r +t 0.1, −r −t 0.9
P(+l | T): +t +l 0.3, −t +l 0.1

Inference by enumeration: Procedural outline

Page 54

§ First basic operation: joining factors
§ Combining factors:
§ Just like a database join
§ Get all factors over the joining variable
§ Build a new factor over the union of the variables involved

§ Example: Join on R

§ Computation for each entry: pointwise products

P(R): +r 0.1, −r 0.9
P(T | R): +r +t 0.8, +r −t 0.2, −r +t 0.1, −r −t 0.9
→ P(R, T): +r +t 0.08, +r −t 0.02, −r +t 0.09, −r −t 0.81

Operation 1: Join factors

Page 55

Join R: P(R) × P(T | R) → P(R, T): +r +t 0.08, +r −t 0.02, −r +t 0.09, −r −t 0.81 (P(L | T) unchanged)

Join T: P(R, T) × P(L | T) → P(R, T, L): +r +t +l 0.024, +r +t −l 0.056, +r −t +l 0.002, +r −t −l 0.018, −r +t +l 0.027, −r +t −l 0.063, −r −t +l 0.081, −r −t −l 0.729

Example: Multiple joins

Page 56

§ Second basic operation: marginalization
§ Take a factor and sum out a variable
§ Shrinks a factor to a smaller one
§ A projection operation

§ Example: summing R out of P(R, T) (+r +t 0.08, +r −t 0.02, −r +t 0.09, −r −t 0.81) gives P(T): +t 0.17, −t 0.83

Operation 2: Eliminate
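A minimal sketch of both basic operations, hard-coded on the P(R) and P(T | R) tables above (a general version appears with the full algorithm later):

P_R = {"+r": 0.1, "-r": 0.9}
P_T = {("+r", "+t"): 0.8, ("+r", "-t"): 0.2,
       ("-r", "+t"): 0.1, ("-r", "-t"): 0.9}

# Operation 1 (join): pointwise product over the union of variables, giving P(R, T).
P_RT = {(r, t): P_R[r] * p for (r, t), p in P_T.items()}
print(P_RT)  # {('+r','+t'): 0.08, ('+r','-t'): 0.02, ('-r','+t'): 0.09, ('-r','-t'): 0.81}

# Operation 2 (eliminate): sum out R, projecting P(R, T) down onto T.
P_T_only = {}
for (r, t), p in P_RT.items():
    P_T_only[t] = P_T_only.get(t, 0.0) + p
print(P_T_only)  # {'+t': 0.17, '-t': 0.83}, matching the slide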

Page 57

Sum out R: P(R, T, L) (table above) → P(T, L): +t +l 0.051, +t −l 0.119, −t +l 0.083, −t −l 0.747

Sum out T: → P(L): +l 0.134, −l 0.866

Multiple elimination

Page 58

Thus far: Multiple join, multiple eliminate (= inference by enumeration)

Page 59

Marginalizing early (= variable elimination)

Page 60

§ Inference by Enumeration: P(L) = Σ_t Σ_r P(L | t) P(r) P(t | r)

§ Variable Elimination: P(L) = Σ_t P(L | t) Σ_r P(r) P(t | r)

Traffic domain

Page 61

Join R: P(R) × P(T | R) → P(R, T): +r +t 0.08, +r −t 0.02, −r +t 0.09, −r −t 0.81 (P(L | T) unchanged)

Sum out R: → P(T): +t 0.17, −t 0.83 (P(L | T) unchanged)

Join T: P(T) × P(L | T) → P(T, L): +t +l 0.051, +t −l 0.119, −t +l 0.083, −t −l 0.747

Sum out T: → P(L): +l 0.134, −l 0.866

Variable elimination

Page 62

§ If evidence, start with factors that select that evidence
§ No evidence uses these initial factors:
P(R): +r 0.1, −r 0.9
P(T | R): +r +t 0.8, +r −t 0.2, −r +t 0.1, −r −t 0.9
P(L | T): +t +l 0.3, +t −l 0.7, −t +l 0.1, −t −l 0.9

§ Computing P(L | +r), the initial factors become:
P(+r): +r 0.1
P(T | +r): +r +t 0.8, +r −t 0.2
P(L | T): +t +l 0.3, +t −l 0.7, −t +l 0.1, −t −l 0.9

§ We eliminate all variables other than query + evidence

Evidence

Page 63

§ Result will be a selected joint of query and evidence
§ E.g., for P(L | +r), we would end up with:

P(+r, L): +r +l 0.026, +r −l 0.074 → (normalize) → P(L | +r): +l 0.26, −l 0.74

§ To get our answer, just normalize this!

§ That's it!

Evidence

Page 64

§ Query: P(Q | e1, …, ek)
§ Start with initial factors:
§ Local CPTs (but instantiated by evidence)
§ While there are still hidden variables (not Q or evidence):
§ Pick a hidden variable H
§ Join all factors mentioning H
§ Eliminate (sum out) H
§ Join all remaining factors and normalize

General variable elimination
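A compact, generic sketch of this loop. Factors are (variable list, table) pairs; the helper names are my own. Run on the traffic domain with the evidence +r already selected into the initial factors, it reproduces the 0.26 / 0.74 result from the previous slides:

from functools import reduce

def join(f1, f2):
    """Pointwise product of two factors over the union of their variables."""
    v1, t1 = f1
    v2, t2 = f2
    out_v = v1 + [v for v in v2 if v not in v1]
    out_t = {}
    for k1, p1 in t1.items():
        r1 = dict(zip(v1, k1))
        for k2, p2 in t2.items():
            r2 = dict(zip(v2, k2))
            if all(r1[v] == r2[v] for v in set(v1) & set(v2)):
                row = {**r1, **r2}
                out_t[tuple(row[v] for v in out_v)] = p1 * p2
    return out_v, out_t

def sum_out(f, h):
    """Eliminate hidden variable h from factor f by summation."""
    v, t = f
    out_v = [x for x in v if x != h]
    out_t = {}
    for k, p in t.items():
        row = dict(zip(v, k))
        key = tuple(row[x] for x in out_v)
        out_t[key] = out_t.get(key, 0.0) + p
    return out_v, out_t

def variable_elimination(factors, hidden):
    for h in hidden:                      # pick a hidden variable H
        touching = [f for f in factors if h in f[0]]
        rest = [f for f in factors if h not in f[0]]
        factors = rest + [sum_out(reduce(join, touching), h)]  # join, then sum out H
    v, t = reduce(join, factors)          # join all remaining factors
    z = sum(t.values())                   # ... and normalize
    return v, {k: p / z for k, p in t.items()}

# Traffic domain, evidence +r already selected into the initial factors:
f_r = (["R"], {("+r",): 0.1})
f_t = (["R", "T"], {("+r", "+t"): 0.8, ("+r", "-t"): 0.2})
f_l = (["T", "L"], {("+t", "+l"): 0.3, ("+t", "-l"): 0.7,
                    ("-t", "+l"): 0.1, ("-t", "-l"): 0.9})
print(variable_elimination([f_r, f_t, f_l], hidden=["R", "T"]))
# (['L'], {('+l',): 0.26, ('-l',): 0.74})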

Page 65

Choose A

Example

Page 66

Choose E

Finish with B

Example

Page 67

marginal can be obtained from joint by summing out
use Bayes' net joint distribution expression
use x*(y+z) = xy + xz: joining on a, and then summing out, gives f1
use x*(y+z) = xy + xz: joining on e, and then summing out, gives f2

All we are doing is exploiting uwy + uwz + uxy + uxz + vwy + vwz + vxy + vxz = (u+v)(w+x)(y+z) to improve computational efficiency!

Same example in equations

Page 68

P(B | j, m) = α P(B) Σ_e P(e) Σ_a P(a | B, e) P(j | a) P(m | a)

P(B | j, m) = α P(B) Σ_a P(j | a) P(m | a) Σ_e P(e) P(a | B, e)

Complexity depends on factor size

Variable ordering

Page 69

Consider the query P(JohnCalls|Burglary = true)

Sum over m is identically 1; M is irrelevant to the query

Thm 1: Y is irrelevant unless Y ∈ Ancestors(Q ∪ E)

Here, Q = {JohnCalls}, E = {Burglary}, and Ancestors(Q ∪ E) = {Alarm,Earthquake}

so MaryCalls is irrelevant

Irrelevant variables

P(J | b) = α P(b) Σ_e P(e) Σ_a P(a | b, e) P(J | a) Σ_m P(m | a)

Page 70

§ The computational and space complexity of variable elimination is determined by the largest factor

§ The elimination ordering can greatly affect the size of the largest factor
§ E.g., 2^n vs. 2

§ Does there always exist an ordering that only results in small factors?

§ No!

VE: Computation and space complexity

Page 71

Complexity in polytrees is linear in number of variables

A polytree has no undirected cycles

In general, inference in Bayesian networks is NP hard

In worst case, it is probably exponential

Variable elimination algorithm relies on a heuristic ordering of variables to eliminate in sequence

Often linear, sometimes exponential

Belief propagation propagates "messages" through the network: Linear for polytrees, not exact for other graphs

Complexity of exact inference

Page 72

Suppose we want to infer the distribution P(B=true | D=true, C=true)

Query variables: B

Evidence variables: D, C

Hidden variables: S, E

Approximate inference


Page 73

List the nodes in order

If there is an edge A → B, then A comes before B in the list

Topological sort
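One standard way to compute such an ordering is Kahn's algorithm; a sketch, using the battery network's parent sets (the encoding is my own):

from collections import deque

def topological_order(parents):
    """Return an order in which every edge A -> B has A before B."""
    children = {n: [] for n in parents}
    indegree = {n: len(ps) for n, ps in parents.items()}
    for n, ps in parents.items():
        for p in ps:
            children[p].append(n)
    queue = deque(n for n, d in indegree.items() if d == 0)  # nodes with no parents
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for c in children[n]:           # "remove" n's outgoing edges
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    return order

print(topological_order({"B": [], "S": [], "E": ["B", "S"], "D": ["E"], "C": ["E"]}))
# e.g. ['B', 'S', 'E', 'D', 'C']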


Page 74

Direct sampling in a Bayes Net

Approximate inference through sampling

Page 75

In topological order

Sample from the conditional probability distribution of X, given the sampled parent values

Approximate inference through sampling (Direct sampling)

Sample so far: B = 1 (DMU 2.2.5)

Page 76

Sample so far: B = 1, S = 1; then B = 1, S = 1, E = 1 (DMU 2.2.5)


Page 78

Sample so far: B = 1, S = 1, E = 1, D = 0

Completed samples:
B S E D C
1 1 1 0 0
0 1 0 1 0
0 0 0 1 1
1 0 1 1 1
1 0 0 0 0
0 0 0 0 1
1 0 0 0 1
0 0 1 0 0

Only 2 rows are useful for estimating P(B=true | D=true, C=true)!

If likelihood of evidence is small, then many samples are required

(DMU 2.2.5)
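A runnable sketch of the whole procedure on the battery network (the CPT numbers here are hypothetical, not from the slides): draw samples in topological order, then estimate P(B=true | D=true, C=true) by keeping only samples consistent with the evidence:

import random

# Each CPT entry gives P(var = 1 | parent values); values are made up.
parents = {"B": (), "S": (), "E": ("B", "S"), "D": ("E",), "C": ("E",)}
cpt = {
    "B": {(): 0.5},
    "S": {(): 0.5},
    "E": {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 0.95},
    "D": {(0,): 0.2, (1,): 0.9},
    "C": {(0,): 0.15, (1,): 0.85},
}
order = ["B", "S", "E", "D", "C"]   # a topological order of the graph

def direct_sample():
    """Sample each variable, in topological order, given its sampled parents."""
    sample = {}
    for var in order:
        key = tuple(sample[p] for p in parents[var])
        sample[var] = 1 if random.random() < cpt[var][key] else 0
    return sample

# Rejection-style estimate: discard samples inconsistent with the evidence.
kept = [s for s in (direct_sample() for _ in range(100_000))
        if s["D"] == 1 and s["C"] == 1]
print(sum(s["B"] for s in kept) / len(kept))  # about 0.69 with these made-up numbers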

Page 79

What is the current approximation for P(B=true | D=true, C=true)? (From the table: of the 2 samples consistent with the evidence, 1 has B = 1, so the estimate is 1/2.)

Approximate inference through sampling (Direct sampling)


Page 80

If likelihood of evidence is small, then many samples are required

Approximate inference through sampling (Direct sampling)


Page 81

Likelihood weighting involves generating samples that are consistent with evidence and weighting them

Gibbs sampling involves making random local changes to samples (a form of Markov chain Monte Carlo)

Many other approaches exist

Approximate sampling approaches
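A likelihood-weighting sketch on the same hypothetical network as the direct-sampling example: evidence variables are clamped to their observed values instead of sampled, and each sample is weighted by the probability of that evidence given its sampled parents:

import random

parents = {"B": (), "S": (), "E": ("B", "S"), "D": ("E",), "C": ("E",)}
cpt = {                              # P(var = 1 | parents); hypothetical values
    "B": {(): 0.5}, "S": {(): 0.5},
    "E": {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 0.95},
    "D": {(0,): 0.2, (1,): 0.9},
    "C": {(0,): 0.15, (1,): 0.85},
}
order = ["B", "S", "E", "D", "C"]

def weighted_sample(evidence):
    sample, weight = {}, 1.0
    for var in order:
        key = tuple(sample[p] for p in parents[var])
        p_true = cpt[var][key]
        if var in evidence:
            sample[var] = evidence[var]  # clamp: always consistent with evidence
            weight *= p_true if evidence[var] == 1 else 1.0 - p_true
        else:
            sample[var] = 1 if random.random() < p_true else 0
    return sample, weight

# Estimate P(B=1 | D=1, C=1) as the weighted fraction of samples with B = 1.
evidence = {"D": 1, "C": 1}
num = den = 0.0
for _ in range(100_000):
    s, w = weighted_sample(evidence)
    num += w * s["B"]
    den += w
print(num / den)  # converges to the same ~0.69 as the rejection estimate above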