Ρ cavity toothache catch - Queen's University...toothache) = (0.108+0.012) (0.108+0.012+0.016+0.064) =0.6 Ρ(cavity, toothache, catch) Ο(d n) AIMA 3e: 13 Simplifying our representation

Full Joint Distribution

(13.3) Product Rule:

$$P(a \mid b) = \frac{P(a \land b)}{P(b)}$$

For example: the probability of a cavity, given evidence of a toothache.

$$P(cavity \mid toothache) = \frac{P(cavity \land toothache)}{P(toothache)} = \frac{0.108 + 0.012}{0.108 + 0.012 + 0.016 + 0.064} = 0.6$$

Given n variables, and d as an upper bound on the number of values per variable, the size of the full joint distribution table, for example $\mathbf{P}(Cavity, Toothache, Catch)$, and the cost of processing it are both $O(d^n)$.

AIMA 3e: 13
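The toothache calculation can be reproduced by summing entries of the full joint table. The sketch below is illustrative only: the four toothache entries are the ones used on the slide, while the four remaining entries are assumed from the textbook's dentist example, and the helper name `prob` is hypothetical.

```python
# Full joint distribution over (cavity, toothache, catch).
# The four entries with toothache=True appear on the slide; the other four
# are assumed from the textbook's dentist example so the table sums to 1.
JOINT = {
    (True, True, True): 0.108,
    (True, True, False): 0.012,
    (False, True, True): 0.016,
    (False, True, False): 0.064,
    (True, False, True): 0.072,
    (True, False, False): 0.008,
    (False, False, True): 0.144,
    (False, False, False): 0.576,
}


def prob(predicate):
    """Sum the joint entries whose (cavity, toothache, catch) satisfy predicate."""
    return sum(p for world, p in JOINT.items() if predicate(*world))


p_toothache = prob(lambda cavity, toothache, catch: toothache)
p_cavity_and_toothache = prob(lambda cavity, toothache, catch: cavity and toothache)

# Product rule rearranged: P(a | b) = P(a ^ b) / P(b)
p_cavity_given_toothache = p_cavity_and_toothache / p_toothache
print(round(p_cavity_given_toothache, 3))  # 0.6
```

Every query costs a pass over the whole table, which is where the $O(d^n)$ bound comes from.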

AIMA 3e: 13

Simplifying our representation

Absolute Independence

$$P(a \mid b) = P(a)$$

$$P(cloudy \mid toothache, catch, cavity) = P(cloudy)$$

Bayes’ Rule

Allows computing probabilities, typically of a cause given an effect, from known conditional probabilities.

$$P(cause \mid effect) = \frac{P(effect \mid cause)\,P(cause)}{P(effect)}$$

Conditional Independence

$$\mathbf{P}(X, Y \mid Z) = \mathbf{P}(X \mid Z)\,\mathbf{P}(Y \mid Z)$$

$$\mathbf{P}(Toothache, Catch \mid Cavity) = \mathbf{P}(Toothache \mid Cavity)\,\mathbf{P}(Catch \mid Cavity)$$
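As a rough illustration of Bayes' rule in the cause/effect direction, the sketch below plugs in made-up numbers for a rare disease and a common symptom. The function name `bayes` and all three probabilities are assumptions chosen for illustration, not values from the slides.

```python
def bayes(p_effect_given_cause, p_cause, p_effect):
    """Return P(cause | effect) = P(effect | cause) * P(cause) / P(effect)."""
    return p_effect_given_cause * p_cause / p_effect


# Hypothetical numbers: the symptom shows up in 70% of disease cases,
# the disease affects 1 in 50,000 people, the symptom 1 in 100 people.
p_disease_given_symptom = bayes(0.7, 1 / 50000, 0.01)
print(round(p_disease_given_symptom, 4))  # 0.0014
```

Even though the symptom is a strong indicator in the causal direction, the diagnostic probability stays small because the prior on the cause is so low.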

AIMA 3e: 14.1

Bayesian Network

Bayesian networks can represent essentially any full joint distribution

Specifications:

Each node corresponds to a random variable, which may be discrete or continuous.

A set of directed links or arrows connects pairs of nodes. If there is an arrow from node X to node Y, X is said to be a parent of Y. The graph has no directed cycles.

Each node $X_i$ has a conditional probability distribution $\mathbf{P}(X_i \mid Parents(X_i))$ that quantifies the effect of the parents on the node.

[Figure: example network; labels: Absolute, Conditional]

AIMA 3e: 14.2

Semantics

Bayesian network full joint distribution

$$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid parents(X_i)) \qquad (14.2)$$

For example: the probability that the alarm has sounded, but neither a burglary nor an earthquake has occurred, and both John and Mary call.

$$P(j, m, a, \lnot b, \lnot e) = P(j \mid a)\,P(m \mid a)\,P(a \mid \lnot b \land \lnot e)\,P(\lnot b)\,P(\lnot e)$$

$$= 0.90 \times 0.70 \times 0.001 \times 0.999 \times 0.998 \approx 0.000628$$
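The product in equation 14.2 can be sketched directly in code. The five values used in the worked example come from the slide; the remaining CPT entries are assumed from the textbook's burglary network figure, and the names `bernoulli` and `joint` are hypothetical helpers.

```python
# CPTs for the burglary network. Entries not used on the slide are
# assumed from the textbook's burglary example.
P_B = 0.001                      # P(burglary)
P_E = 0.002                      # P(earthquake)
P_A = {                          # P(alarm | burglary, earthquake)
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}
P_J = {True: 0.90, False: 0.05}  # P(johnCalls | alarm)
P_M = {True: 0.70, False: 0.01}  # P(maryCalls | alarm)


def bernoulli(p_true, value):
    """Probability of a Boolean value given the probability of True."""
    return p_true if value else 1.0 - p_true


def joint(b, e, a, j, m):
    """Equation 14.2: product of each node's CPT entry given its parents."""
    return (
        bernoulli(P_B, b)
        * bernoulli(P_E, e)
        * bernoulli(P_A[(b, e)], a)
        * bernoulli(P_J[a], j)
        * bernoulli(P_M[a], m)
    )


p = joint(b=False, e=False, a=True, j=True, m=True)
print(f"{p:.6f}")  # 0.000628
```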

Semantics

Problem Bayesian network

A Bayesian network is a correct representation of the domain if each node is conditionally independent of its other predecessors in the node ordering, given its parents.

Nodes: Choose the required variables $X_1, \ldots, X_n$ and order them such that cause precedes effect.

Links: For i = 1 to n:

  • Choose, from $X_1, \ldots, X_{i-1}$, a minimal set of parents for $X_i$ satisfying $\mathbf{P}(X_i \mid X_{i-1}, \ldots, X_1) = \mathbf{P}(X_i \mid Parents(X_i))$.

  • For each parent, insert a link from the parent to $X_i$.

  • Write in the conditional probability table (CPT).

AIMA 3e: 14.2

Benefits

Each node is connected only to earlier nodes, which guarantees that the network is acyclic.

There are no redundant probability values, therefore no chance for inconsistency.

It is impossible for the knowledge engineer or domain expert to create a Bayesian network that violates the axioms of probability.

AIMA 3e: 14.2

AIMA 3e: 14.2

Compactness

A Bayesian network is a locally structured (sparse) system.

In a network with n Boolean variables, the CPT for a variable $X_i$ with at most k parents will have at most $2^k$ values.

The Bayesian network as a whole then requires only $n 2^k$ values.

If each variable were directly influenced by all others, the network would require $2^n$ values, equivalent to the full joint distribution.
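To get a feel for the gap between the two counts, the following sketch compares them for an illustrative network. The sizes n = 30 and k = 5 are assumptions picked for the example, and the function names are hypothetical.

```python
def network_size(n, k):
    """Upper bound on CPT entries for n Boolean nodes with at most k parents each."""
    return n * 2 ** k


def full_joint_size(n):
    """Number of entries in the full joint table over n Boolean variables."""
    return 2 ** n


# Illustrative sizes (assumed): 30 variables, at most 5 parents per node.
print(network_size(30, 5), full_joint_size(30))  # 960 1073741824
```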

[Figure: two alternative networks. First network: 2 extra links, 3 extra probabilities. Second network: 6 extra links, 21 extra probabilities.]

These specify the exact same joint distribution.

If we stick to a causal model, we end up having to specify fewer numbers, and the numbers will often be easier to come up with.

AIMA 3e: 14.2

AIMA 3e: 14.2

Each node X is conditionally independent of its non-descendants given its parents.

Each node is conditionally independent of all others given its Markov blanket: its parents, its children, and its children's parents.
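The Markov blanket can be read off the graph structure alone. A minimal sketch, assuming the network is stored as a dictionary from each node to its set of parents (the representation and the name `markov_blanket` are illustrative choices, shown here on the burglary network):

```python
def markov_blanket(parents, node):
    """Return the parents, children, and children's other parents of node.

    parents maps every node name to the set of its parent names.
    """
    children = {child for child, ps in parents.items() if node in ps}
    co_parents = {p for child in children for p in parents[child]}
    return (set(parents[node]) | children | co_parents) - {node}


# Structure of the burglary network: node -> set of parents.
BURGLARY_NET = {
    "Burglary": set(),
    "Earthquake": set(),
    "Alarm": {"Burglary", "Earthquake"},
    "JohnCalls": {"Alarm"},
    "MaryCalls": {"Alarm"},
}

print(sorted(markov_blanket(BURGLARY_NET, "Burglary")))
# ['Alarm', 'Earthquake']
```

Earthquake ends up in Burglary's blanket only because the two share the child Alarm.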

AIMA 3e: 14.3

Efficient Representation

When the relationship between the parents and children is arbitrary, the CPT can become unmanageably large.

Ideally, the relationships should be describable by a canonical distribution that can be used to specify the complete table.

Certain Relationships

Deterministic nodes: the value can be specified exactly by the values of the parents.

[Figure: parent nodes Canadian, American, Mexican; child node North American]

$$NorthAmerican \leftarrow Canadian \lor American \lor Mexican$$

AIMA 3e: 14.3

Uncertain Relationships

Noisy-OR distribution: uncertainty in the ability of each parent to affect the child.

The child is false if all true parents are completely inhibited.

Assumptions:

  • All the possible causes are included.

  • The factor inhibiting one parent is independent of the factor inhibiting any other.

$$q_{cold} = P(\lnot fever \mid cold, \lnot flu, \lnot malaria) = 0.6$$

$$q_{flu} = P(\lnot fever \mid \lnot cold, flu, \lnot malaria) = 0.2$$

$$q_{malaria} = P(\lnot fever \mid \lnot cold, \lnot flu, malaria) = 0.1$$

$$P(x_i \mid parents(X_i)) = 1 - \prod_{\{j : X_j = true\}} q_j$$
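The noisy-OR formula lets the whole fever CPT be generated from the three inhibition probabilities. A minimal sketch using the slide's values for q; the function name `noisy_or` and the table layout are illustrative assumptions.

```python
from itertools import product
from math import prod

# Inhibition probabilities q_j from the slide.
Q = {"cold": 0.6, "flu": 0.2, "malaria": 0.1}


def noisy_or(true_parents):
    """P(fever | parents) = 1 - product of q_j over the parents that are true."""
    return 1.0 - prod(Q[name] for name in true_parents)


# Expand the 3 parameters into the full 8-row conditional probability table.
for values in product((False, True), repeat=len(Q)):
    active = [name for name, value in zip(Q, values) if value]
    print(dict(zip(Q, values)), round(noisy_or(active), 3))

print(round(noisy_or(["cold", "flu", "malaria"]), 3))  # 0.988
```

With no true parent the empty product is 1, so the child is false with certainty, which matches the assumption that all possible causes are included.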

AIMA 3e: 14.3

Much of the time when there are k parents, the relationship can be described using only $O(k)$ parameters, even though $O(2^k)$ are needed for the full conditional probability table.

This is very useful in diagnostic medicine.

AIMA 3e: 14.3

Continuous Variables

Continuous variables have an infinite set of values, making CPTs impossible to specify.

How do we deal with this?

We could perform discretization, which is essentially subdividing the domain into a finite set of intervals, e.g. for temperature: {<0 degrees, 0-100 degrees, >100 degrees}.

Or we could define a probability density function using a finite set of parameters.

Hybrid Bayesian Networks

[Figure legend: Continuous, Discrete]

AIMA 3e: 14.3

Continuous Variable with Discrete or Continuous Parents

Linear Gaussian

One distribution for each value of Subsidy.

Discrete Variable with Continuous Parent

Probit Distribution: integral of a Gaussian

Soft threshold.
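Both distributions can be written with a handful of parameters. The sketch below is illustrative only: it assumes a continuous child whose mean is a linear function of one continuous parent (with a separate parameter set per value of the discrete parent, such as Subsidy), and a Boolean child whose probability falls off as a continuous parent grows. All parameter values and function names are hypothetical.

```python
import math


def linear_gaussian_pdf(child, parent, a, b, sigma):
    """Density of the child when its mean is a * parent + b with std dev sigma."""
    z = (child - (a * parent + b)) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def standard_normal_cdf(x):
    """Integral of the standard Gaussian from -infinity to x."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def probit(parent, mu, sigma):
    """P(child = true | parent): a soft threshold around mu, decreasing in parent."""
    return standard_normal_cdf((mu - parent) / sigma)


# One (a, b, sigma) triple per value of the discrete parent (made-up numbers).
PARAMS = {True: (-0.5, 5.0, 0.5), False: (-0.5, 10.0, 1.0)}

a, b, sigma = PARAMS[True]
print(linear_gaussian_pdf(child=4.0, parent=2.0, a=a, b=b, sigma=sigma))
print(probit(parent=6.0, mu=6.0, sigma=1.0))  # 0.5
```

Setting sigma close to zero in `probit` turns the soft threshold into a hard one.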

AIMA 3e: 14.4

Exact Inference

The inference procedure involves a query $\mathbf{P}(X \mid \mathbf{e})$, where X denotes a query variable and e a particular observed event. For example: $\mathbf{P}(Burglary \mid JohnCalls = true, MaryCalls = true)$.

We know how to do this for a full joint distribution by summing entries in the table.

A query can be answered using a Bayesian network by computing sums of products of conditional probabilities from the network (using equation 14.2).

$$P(b \mid j, m) = \alpha\, P(b) \sum_{e} P(e) \sum_{a} P(a \mid b, e)\, P(j \mid a)\, P(m \mid a) \qquad (14.4)$$

In this way we reproduce the necessary values from the joint distribution without explicitly constructing the table.

Algorithm complexity with n Boolean variables: $O(2^n)$.
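Inference by enumeration for the burglary query can be sketched as follows. This is a minimal illustration rather than the textbook's algorithm verbatim: the CPT values are assumed from the textbook's burglary network, and the names `joint` and `query_burglary` are hypothetical.

```python
from itertools import product

# CPTs assumed from the textbook's burglary network.
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}


def bernoulli(p_true, value):
    """Probability of a Boolean value given the probability of True."""
    return p_true if value else 1.0 - p_true


def joint(b, e, a, j, m):
    """Product of the CPT entries for one complete assignment."""
    return (bernoulli(P_B, b) * bernoulli(P_E, e) * bernoulli(P_A[(b, e)], a)
            * bernoulli(P_J[a], j) * bernoulli(P_M[a], m))


def query_burglary(j, m):
    """P(Burglary | j, m): sum out the hidden variables E and A, then normalize."""
    unnormalized = {
        b: sum(joint(b, e, a, j, m) for e, a in product((True, False), repeat=2))
        for b in (True, False)
    }
    alpha = 1.0 / sum(unnormalized.values())
    return {b: alpha * p for b, p in unnormalized.items()}


posterior = query_burglary(j=True, m=True)
print(round(posterior[True], 3))  # 0.284
```

The nested sum visits every assignment of the hidden variables, which is the source of the exponential cost noted above.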

AIMA 3e: 14.4

Repeated subexpressions

AIMA 3e: 14.4

Variable Elimination

Bottom up evaluation to avoid repeated calculation.

$$P(b \mid j, m) = \alpha\, \underbrace{P(b)}_{f_1(B)} \sum_{e} \underbrace{P(e)}_{f_2(E)} \sum_{a} \underbrace{P(a \mid b, e)}_{f_3(A,B,E)}\, \underbrace{P(j \mid a)}_{f_4(A)}\, \underbrace{P(m \mid a)}_{f_5(A)}$$

Notice factors 4 and 5 only depend on A since J and M are set in the query.

We substitute the factors in, then sum over a, then sum over e. We are left with a product of two sums, but we need to do a number of pointwise products.

The pointwise product of two factors f1 and f2 yields a new factor f whose variables are the union of the variables in f1 and f2 and whose elements are given by the product of the corresponding elements in the two factors.

$$f(X_1 \ldots X_j, Y_1 \ldots Y_k, Z_1 \ldots Z_l) = f_1(X_1 \ldots X_j, Y_1 \ldots Y_k)\, f_2(Y_1 \ldots Y_k, Z_1 \ldots Z_l)$$

The pointwise product has $2^{j+k+l}$ entries.

See Figure 14.10 for a computational example of $f_1(A,B) \times f_2(B,C) = f_3(A,B,C)$.
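A pointwise product over Boolean factors can be sketched with dictionaries keyed by truth-value tuples. The factor representation, the function name `pointwise_product`, and the numbers in `f1` and `f2` are assumptions made for illustration, not values taken from the slides.

```python
from itertools import product


def pointwise_product(vars1, table1, vars2, table2):
    """Multiply two Boolean factors; the result ranges over the union of variables."""
    out_vars = list(vars1) + [v for v in vars2 if v not in vars1]
    out_table = {}
    for values in product((True, False), repeat=len(out_vars)):
        assignment = dict(zip(out_vars, values))
        key1 = tuple(assignment[v] for v in vars1)  # row of f1 that matches
        key2 = tuple(assignment[v] for v in vars2)  # row of f2 that matches
        out_table[values] = table1[key1] * table2[key2]
    return out_vars, out_table


# Illustrative factors f1(A, B) and f2(B, C) with made-up entries.
f1 = {(True, True): 0.3, (True, False): 0.7,
      (False, True): 0.9, (False, False): 0.1}
f2 = {(True, True): 0.2, (True, False): 0.8,
      (False, True): 0.6, (False, False): 0.4}

variables, f3 = pointwise_product(["A", "B"], f1, ["B", "C"], f2)
print(variables, len(f3))  # ['A', 'B', 'C'] 8
```

Here j = k = l = 1, so the result has 2^3 = 8 entries, in line with the entry count stated above.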

AIMA 3e: 14.4

More Variable Elimination

Although variable ordering does affect evaluation time, it is intractable to determine the optimal ordering. Some generally effective heuristics exist.

See Figure 14.11 for pseudo code of a variable elimination algorithm.

Every variable that is not an ancestor of a query variable or evidence variable is irrelevant to the query and can be trimmed.
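The trimming rule amounts to keeping only the query and evidence variables together with their ancestors. A minimal sketch, assuming the network is stored as a dictionary from each node to its set of parents (the representation and the name `relevant_variables` are hypothetical):

```python
def relevant_variables(parents, query_and_evidence):
    """Return the given variables plus all of their ancestors in the network."""
    relevant = set()
    stack = list(query_and_evidence)
    while stack:
        node = stack.pop()
        if node not in relevant:
            relevant.add(node)
            stack.extend(parents[node])  # walk upward through the parents
    return relevant


# Structure of the burglary network: node -> set of parents.
BURGLARY_NET = {
    "Burglary": set(),
    "Earthquake": set(),
    "Alarm": {"Burglary", "Earthquake"},
    "JohnCalls": {"Alarm"},
    "MaryCalls": {"Alarm"},
}

# Query JohnCalls with evidence on Burglary: MaryCalls can be trimmed.
keep = relevant_variables(BURGLARY_NET, ["JohnCalls", "Burglary"])
print(sorted(set(BURGLARY_NET) - keep))  # ['MaryCalls']
```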

Complexity of Exact Inference

Singly connected networks (polytrees)

  • At most one undirected path between any two nodes.

  • Time and space complexity are linear in n.

  • Example: Burglary Network

Multiply connected networks

  • Time and space complexity are exponential.

  • Very similar to solving a CSP.

  • Inference is NP-hard.

  • Example: Wet Grass Network

AIMA 3e: 14.4

Clustering Algorithms

If we want to query every non-evidence variable, our time complexity jumps from $O(n)$ to $O(n^2)$.

However, we can avoid this increase by using a clustering algorithm to transform a multiply connected network into a polytree by creating meganodes.