Artificial Intelligence: Probabilistic reasoning
Fall 2008
Professor: Luigi Ceccaroni
Bayesian networks
• A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions.
• Syntax:
  – a set of nodes, one per variable
  – a directed, acyclic graph (links ≈ "directly influences")
  – a conditional distribution for each node given its parents: P(Xi | Parents(Xi))
• In the simplest case, the conditional distribution is represented as a conditional probability table (CPT), giving the distribution over Xi for each combination of parent values.
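As a minimal sketch, a CPT can be stored as a dictionary mapping each combination of parent values to a distribution over the variable. The variable names and numbers below are illustrative assumptions, not values from these slides:

```python
# A CPT sketch: for each tuple of parent values, a distribution over
# the child variable. Numbers here are illustrative assumptions.
cpt_toothache = {
    # Parents(Toothache) = (Cavity,); keys are tuples of parent values.
    (True,):  {True: 0.6, False: 0.4},   # P(Toothache | Cavity = true)
    (False,): {True: 0.1, False: 0.9},   # P(Toothache | Cavity = false)
}

def lookup(cpt, value, parent_values):
    """Return P(X = value | Parents(X) = parent_values)."""
    return cpt[parent_values][value]

print(lookup(cpt_toothache, True, (True,)))  # 0.6
```

Each row of the table is one entry of the dictionary, and each row's probabilities sum to 1.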
Example
• Topology of network encodes conditional independence assertions:
• Weather is independent of the other variables.
• Toothache and Catch are conditionally independent given Cavity.
Example
• What is the probability of having a heart attack?
• This probability depends on four variables:
  – Sport
  – Diet
  – Blood pressure
  – Smoking
• Knowing the dependencies among these variables lets us build a Bayesian network.
Constructing Bayesian networks
• 1. Choose an ordering of variables X1, …, Xn
• 2. For i = 1 to n:
  – add Xi to the network
  – select parents from X1, …, Xi-1 such that
    P(Xi | Parents(Xi)) = P(Xi | X1, …, Xi-1)

This choice of parents guarantees:

P(X1, …, Xn) = ∏i=1..n P(Xi | X1, …, Xi-1)   (chain rule)
             = ∏i=1..n P(Xi | Parents(Xi))   (by construction)
Example
[Network diagram: Sport and Diet are parents of Blood pressure; Blood pressure and Smoking are parents of Heart attack.]

Diet          P(Di)
balanced      0.4
unbalanced    0.6

Sport         P(Sp)
yes           0.1
no            0.9

Smoking       P(Sm)
yes           0.4
no            0.6

Diet     Sport   P(Bp = high)   P(Bp = normal)
bal.     yes     0.01           0.99
unbal.   yes     0.2            0.8
bal.     no      0.25           0.75
unbal.   no      0.7            0.3

Bp       Sm      P(Ha = yes)    P(Ha = no)
high     yes     0.8            0.2
norm.    yes     0.6            0.4
high     no      0.7            0.3
norm.    no      0.3            0.7
Compactness
• A CPT for Boolean Xi with k Boolean parents has 2^k rows, one for each combination of parent values.
• Each row requires one number p for Xi = true (the number for Xi = false is just 1 − p).
• If each of the n variables has no more than k parents (k << n), the complete network requires O(n · 2^k) numbers.
Representation cost
• The network grows linearly with n, vs. O(2^n) for the full joint distribution.
• Examples:
  – With 10 variables and at most 3 parents: 80 vs. 1024 numbers.
  – With 100 variables and at most 5 parents: 3200 vs. ≈ 10^30 numbers.
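These counts can be checked in a couple of lines (a sketch; the function names are chosen here for illustration):

```python
def network_cost(n, k):
    """Upper bound on numbers needed by a network of n Boolean variables,
    each with at most k parents: n * 2**k."""
    return n * 2**k

def joint_cost(n):
    """Numbers needed by the full joint distribution over n Boolean variables."""
    return 2**n

print(network_cost(10, 3), joint_cost(10))   # 80 1024
print(network_cost(100, 5))                  # 3200
print(joint_cost(100) > 10**30)              # True
```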
Semantics
The full joint distribution is defined as the product of the local conditional distributions:

P(X1, …, Xn) = ∏i=1..n P(Xi | Parents(Xi))

Example:
P(sp ∧ Di=balanced ∧ Bp=high ∧ ¬sm ∧ ¬ha) =
= P(sp) P(Di=balanced) P(Bp=high | sp, Di=balanced) P(¬sm) P(¬ha | Bp=high, ¬sm)
Bayesian networks – Joint distribution - Example
P(ha ∧ Bp=high ∧ sm ∧ sp ∧ Di=balanced) =
= P(ha | Bp=high, sm) P(Bp=high | sp, Di=balanced) P(sm) P(sp) P(Di=balanced)
= 0.8 × 0.01 × 0.4 × 0.1 × 0.4
= 0.000128
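This product can be reproduced from the CPTs of the example network. A sketch, with variable encodings (booleans and strings) chosen here for illustration:

```python
# CPTs taken from the example's tables.
P_Sp = {True: 0.1, False: 0.9}
P_Di = {"balanced": 0.4, "unbalanced": 0.6}
P_Sm = {True: 0.4, False: 0.6}
# P(Bp = high | Sp, Di); P(Bp = normal | ...) is the complement.
P_Bp_high = {(True, "balanced"): 0.01, (True, "unbalanced"): 0.2,
             (False, "balanced"): 0.25, (False, "unbalanced"): 0.7}
# P(Ha = yes | Bp, Sm); P(Ha = no | ...) is the complement.
P_Ha_yes = {("high", True): 0.8, ("normal", True): 0.6,
            ("high", False): 0.7, ("normal", False): 0.3}

def joint(sp, di, bp, sm, ha):
    """P(Sp, Di, Bp, Sm, Ha) as the product of the local CPTs."""
    p_bp = P_Bp_high[(sp, di)] if bp == "high" else 1 - P_Bp_high[(sp, di)]
    p_ha = P_Ha_yes[(bp, sm)] if ha else 1 - P_Ha_yes[(bp, sm)]
    return P_Sp[sp] * P_Di[di] * p_bp * P_Sm[sm] * p_ha

print(joint(True, "balanced", "high", True, True))  # ≈ 0.000128
```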
Exact inference in Bayesian networks: example
• Inference by enumeration:
P(X | e) = α P(X, e) = α Σy P(X, e, y)
• Let's calculate: P(Smoking | Heart attack = yes, Sport = no)
• The full joint distribution of the network is:
P(Sp, Di, Bp, Sm, Ha) = P(Sp) P(Di) P(Bp | Sp, Di) P(Sm) P(Ha | Bp, Sm)
• That is, we want to calculate P(Sm | ha, ¬sp).
Exact inference in Bayesian networks: example
P(Sm | ha, ¬sp) = α P(Sm, ha, ¬sp) =
= α ΣDi∈{b,¬b} ΣBp∈{h,n} P(Sm, ha, ¬sp, Di, Bp) =
= α P(¬sp) P(Sm) ΣDi∈{b,¬b} P(Di) ΣBp∈{h,n} P(Bp | ¬sp, Di) P(ha | Bp, Sm) =
= α <0.9 × 0.4 × (0.4 × (0.25 × 0.8 + 0.75 × 0.6) + 0.6 × (0.7 × 0.8 + 0.3 × 0.6)),
    0.9 × 0.6 × (0.4 × (0.25 × 0.7 + 0.75 × 0.3) + 0.6 × (0.7 × 0.7 + 0.3 × 0.3))> =
= α <0.253, 0.274> = <0.48, 0.52>
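The same enumeration can be sketched in code: sum the full joint over the hidden variables Di and Bp, then normalize. The CPT dictionaries below restate the example's tables; the encodings are chosen here for illustration:

```python
from itertools import product

# CPTs from the example's tables.
P_Sp = {True: 0.1, False: 0.9}
P_Di = {"balanced": 0.4, "unbalanced": 0.6}
P_Sm = {True: 0.4, False: 0.6}
P_Bp_high = {(True, "balanced"): 0.01, (True, "unbalanced"): 0.2,
             (False, "balanced"): 0.25, (False, "unbalanced"): 0.7}
P_Ha_yes = {("high", True): 0.8, ("normal", True): 0.6,
            ("high", False): 0.7, ("normal", False): 0.3}

def joint(sp, di, bp, sm, ha):
    """P(Sp, Di, Bp, Sm, Ha) as the product of the local CPTs."""
    p_bp = P_Bp_high[(sp, di)] if bp == "high" else 1 - P_Bp_high[(sp, di)]
    p_ha = P_Ha_yes[(bp, sm)] if ha else 1 - P_Ha_yes[(bp, sm)]
    return P_Sp[sp] * P_Di[di] * p_bp * P_Sm[sm] * p_ha

def query_sm(ha, sp):
    """P(Sm | Ha = ha, Sp = sp) by enumerating the hidden variables Di, Bp."""
    score = {sm: sum(joint(sp, di, bp, sm, ha)
                     for di, bp in product(("balanced", "unbalanced"),
                                           ("high", "normal")))
             for sm in (True, False)}
    z = sum(score.values())          # the normalization constant 1/α
    return {sm: s / z for sm, s in score.items()}

dist = query_sm(ha=True, sp=False)
print(round(dist[True], 2), round(dist[False], 2))  # 0.48 0.52
```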
Variable elimination algorithm
• The variable elimination algorithm avoids the repeated calculations of inference by enumeration.
• Each variable is represented by a factor.
• Intermediate results are saved to be reused later.
• Non-relevant variables, being constant factors, are not directly computed.
Variable elimination algorithm
• CALCULA-FACTOR generates the factor corresponding to variable var as a function of the joint probability distribution.
• PRODUCTO-Y-SUMA multiplies factors and sums over the hidden variable.
• PRODUCTO multiplies a set of factors.
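A minimal sketch of the two factor operations the algorithm relies on. The names `multiply` and `sum_out` are chosen here; they play the roles of PRODUCTO and the summation step of PRODUCTO-Y-SUMA:

```python
from itertools import product

# A factor is a pair (variables, table): `variables` is a tuple of names,
# and `table` maps each tuple of values (in that order) to a number.
DOMAINS = {"Bp": ("high", "normal"), "Sm": ("yes", "no"),
           "Di": ("balanced", "unbalanced")}

def multiply(f1, f2):
    """Pointwise product of two factors over the union of their variables."""
    (v1, t1), (v2, t2) = f1, f2
    out_vars = v1 + tuple(v for v in v2 if v not in v1)
    table = {}
    for vals in product(*(DOMAINS[v] for v in out_vars)):
        a = dict(zip(out_vars, vals))
        table[vals] = (t1[tuple(a[v] for v in v1)] *
                       t2[tuple(a[v] for v in v2)])
    return (out_vars, table)

def sum_out(f, var):
    """Sum a factor over all values of `var`, eliminating it."""
    vs, t = f
    out_vars = tuple(v for v in vs if v != var)
    table = {}
    for vals, p in t.items():
        key = tuple(val for v, val in zip(vs, vals) if v != var)
        table[key] = table.get(key, 0.0) + p
    return (out_vars, table)

# Reproduce the example's first step: fHa Bp(Sm, Di) from fHa and fBp.
f_Ha = (("Bp", "Sm"), {("high", "yes"): 0.8, ("high", "no"): 0.7,
                       ("normal", "yes"): 0.6, ("normal", "no"): 0.3})
f_Bp = (("Bp", "Di"), {("high", "balanced"): 0.25, ("high", "unbalanced"): 0.7,
                       ("normal", "balanced"): 0.75, ("normal", "unbalanced"): 0.3})
f = sum_out(multiply(f_Ha, f_Bp), "Bp")
print(round(f[1][("yes", "balanced")], 2))  # 0.65
```

Eliminating a variable is always this pattern: multiply every factor that mentions it, then sum it out of the product.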
Variable elimination algorithm - Example
α P(¬sp) P(Sm) ΣDi∈{b,¬b} P(Di) ΣBp∈{h,n} P(Bp | ¬sp, Di) P(ha | Bp, Sm)
• Factor for variable Heart attack, P(ha | Bp, Sm), fHa(Bp, Sm):

Bp       Sm    fHa(Bp, Sm)
high     yes   0.8
high     no    0.7
normal   yes   0.6
normal   no    0.3
Variable elimination algorithm - Example
• Factor for variable Blood pressure, P(Bp | ¬sp, Di), fBp(Bp, Di):

Bp       Di           fBp(Bp, Di)
high     balanced     0.25
high     unbalanced   0.7
normal   balanced     0.75
normal   unbalanced   0.3

• To combine the factors just obtained, we calculate their product:
fHa Bp(Bp, Sm, Di) = fHa(Bp, Sm) × fBp(Bp, Di)
Variable elimination algorithm - Example
Bp       Sm    Di           fHa Bp(Bp, Sm, Di)
high     yes   balanced     0.8 × 0.25
high     yes   unbalanced   0.8 × 0.7
high     no    balanced     0.7 × 0.25
high     no    unbalanced   0.7 × 0.7
normal   yes   balanced     0.6 × 0.75
normal   yes   unbalanced   0.6 × 0.3
normal   no    balanced     0.3 × 0.75
normal   no    unbalanced   0.3 × 0.3

• We sum over the values of variable Bp to obtain factor fHa Bp(Sm, Di).
• Factor for variable Di, fDi(Di):
Variable elimination algorithm - Example
Sm    Di           fHa Bp(Sm, Di)
yes   balanced     0.8 × 0.25 + 0.6 × 0.75 = 0.65
yes   unbalanced   0.8 × 0.7 + 0.6 × 0.3 = 0.74
no    balanced     0.7 × 0.25 + 0.3 × 0.75 = 0.4
no    unbalanced   0.7 × 0.7 + 0.3 × 0.3 = 0.58

Di           fDi(Di)
balanced     0.4
unbalanced   0.6

• fHa Di Bp(Sm, Di) = fDi(Di) × fHa Bp(Sm, Di)
• We sum over the values of variable Di to obtain factor fHa Di Bp(Sm).
Variable elimination algorithm - Example
Sm    Di           fHa Di Bp(Sm, Di)
yes   balanced     0.65 × 0.4
yes   unbalanced   0.74 × 0.6
no    balanced     0.4 × 0.4
no    unbalanced   0.58 × 0.6

Sm    fHa Di Bp(Sm)
yes   0.65 × 0.4 + 0.74 × 0.6 = 0.704
no    0.4 × 0.4 + 0.58 × 0.6 = 0.508
Variable elimination algorithm - Example
• Factor for variable Sm, fSm(Sm):

Sm    fSm(Sm)
yes   0.4
no    0.6

• fHa Sm Di Bp(Sm) = fSm(Sm) × fHa Di Bp(Sm)

Sm    fHa Sm Di Bp(Sm)
yes   0.4 × 0.704 ≈ 0.282
no    0.6 × 0.508 ≈ 0.305

• Normalizing, we obtain:

Sm    P(Sm | ha, ¬sp)
yes   0.48
no    0.52
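The final normalization step can be sketched as:

```python
# Unnormalized values of fHa Sm Di Bp(Sm) from the table above.
unnormalized = {"yes": 0.282, "no": 0.305}
z = sum(unnormalized.values())                # normalization constant 1/α
posterior = {sm: round(v / z, 2) for sm, v in unnormalized.items()}
print(posterior)  # {'yes': 0.48, 'no': 0.52}
```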
Summary
• Bayesian networks provide a natural representation for (causally induced) conditional independence.
• Topology + CPTs = compact representation of the joint distribution.
• Bayesian networks are generally easy for domain experts to construct.