CS 401R: Intro. to Probabilistic Graphical Models
Lecture #6: Useful Distributions; Reasoning with Joint Distributions
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.
Some slides (14 onward) were adapted from slides originally created by Andrew W. Moore of CMU. See the acknowledgments message below.
Announcements

- Assignment 0.1: due today
- Reading Report #2: due Wednesday
- Assignment 0.2 (mathematical exercises): early submission Friday; due next Monday
Objectives

- Understand 4 important discrete distributions
- Describe uncertain worlds with joint probability distributions
- Reel with terror at the intractability of reasoning with joint distributions
- Prepare to build models of natural phenomena as Bayes Nets
Parametric Distributions
e.g., Normal Distribution
Bernoulli Distribution: 2 possible outcomes

$$P(x) = B(x; p) = \begin{cases} p^x (1-p)^{1-x} & \text{when } x \in \{0, 1\} \\ 0 & \text{otherwise} \end{cases} \;=\; \begin{cases} p & \text{when } x = 1 \\ 1 - p & \text{when } x = 0 \\ 0 & \text{otherwise} \end{cases}$$
“What’s the probability of a single binary event x, if a ‘positive’ event has probability p?”
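Not part of the original slides, but as a minimal sketch in Python, the Bernoulli PMF is just a two-way case split:

```python
def bernoulli_pmf(x, p):
    """P(x) = p^x * (1 - p)^(1 - x) for x in {0, 1}; 0 otherwise."""
    if x not in (0, 1):
        return 0.0
    return p if x == 1 else 1.0 - p

# A 'positive' event with probability p = 0.3
print(bernoulli_pmf(1, 0.3))  # 0.3
print(bernoulli_pmf(0, 0.3))  # 0.7
```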
Categorical Distribution: extension for m possible outcomes
“What’s the probability of a single event x (containing a 1 in only one position), if outcomes 1, 2, …, and m, are specified by p = [p1, p2, …, pm]?”
Note: pi must sum to 1
$$P(x) = \mathrm{Cat}(x; p) = \begin{cases} \prod_{i=1}^{m} p_i^{x_i} & \text{when each } x_i \in \{0, 1\} \text{ and exactly one } x_i = 1 \\ 0 & \text{otherwise} \end{cases}$$

Equivalently, encoding the outcome as an integer:

$$P(x) = C(x; p) = \begin{cases} p_x & \text{when } x \in \{1, \ldots, m\} \\ 0 & \text{otherwise} \end{cases}$$

Great for language models, where each value corresponds to a word or an n-gram of words (e.g., value ‘1’ corresponds to ‘the’).
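A minimal Python sketch of the integer-coded form; the three-word vocabulary below is a made-up stand-in for the language-model use case:

```python
def categorical_pmf(x, p):
    """P(x) = p[x-1] for an outcome x in {1, ..., m}; 0 otherwise."""
    if x not in range(1, len(p) + 1):
        return 0.0
    return p[x - 1]

# Hypothetical vocabulary: 1 = 'the', 2 = 'cat', 3 = 'sat'
p = [0.5, 0.3, 0.2]   # must sum to 1
print(categorical_pmf(1, p))  # 0.5, i.e. P(word = 'the')
```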
Binomial Distribution: 2 possible outcomes; N trials
“What’s the probability in N independent Bernoulli events that x of them will come up ‘positive’, if a ‘positive’ event has probability p?”
$$P(x) = \mathrm{Bin}(x; N, p) = \frac{N!}{x!\,(N - x)!}\, p^x (1-p)^{N-x} = \binom{N}{x} p^x (1-p)^{N-x}$$
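A minimal Python sketch of the binomial PMF, using only the standard library:

```python
from math import comb

def binomial_pmf(x, N, p):
    """P(x) = C(N, x) * p^x * (1 - p)^(N - x) for x in {0, ..., N}; 0 otherwise."""
    if x < 0 or x > N:
        return 0.0
    return comb(N, x) * p**x * (1 - p)**(N - x)

# Probability of exactly 3 'positives' in N = 10 trials with p = 0.3
print(binomial_pmf(3, 10, 0.3))  # ~0.2668
```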
Multinomial Distribution: extension for m possible outcomes; N trials
“What’s the probability in N independent categorical events that value 1 will occur x1 times, …, and that value m will occur xm times, if the probabilities of the possible values are specified by p = [p1, p2, …, pm]?”
Note: pi must sum to 1
$$P(x) = \mathrm{Mult}(x; N, p) = \begin{cases} \dfrac{N!}{x_1!\, x_2! \cdots x_m!}\, p_1^{x_1} p_2^{x_2} \cdots p_m^{x_m} \;=\; \dfrac{N!}{\prod_{i=1}^{m} x_i!} \prod_{i=1}^{m} p_i^{x_i} & \text{when } \sum_{i=1}^{m} x_i = N \\ 0 & \text{otherwise} \end{cases}$$
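A minimal Python sketch of the multinomial PMF (the counts and probabilities in the example call are made up):

```python
from math import factorial

def multinomial_pmf(x, N, p):
    """P(x1, ..., xm) = N! / (x1! ... xm!) * p1^x1 * ... * pm^xm when the counts sum to N."""
    if sum(x) != N or any(xi < 0 for xi in x):
        return 0.0
    coef = factorial(N)
    for xi in x:
        coef //= factorial(xi)
    prob = 1.0
    for xi, pi in zip(x, p):
        prob *= pi ** xi
    return coef * prob

# 10 categorical trials with p = [0.5, 0.3, 0.2]:
# probability that outcome 1 occurs 5 times, outcome 2 occurs 3 times, outcome 3 occurs 2 times
print(multinomial_pmf([5, 3, 2], 10, [0.5, 0.3, 0.2]))  # ~0.0851
```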
Acknowledgments
Note to other teachers and users of the following slides: Andrew Moore would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew’s tutorials: http://www.cs.cmu.edu/~awm/tutorials. Comments and corrections gratefully received.
Why Bayes Nets Matter

Andrew Moore (Google, formerly CMU):

- One of the most important technologies in the Machine Learning / AI field to have emerged in the last 20 years
- A clean, clear, manageable language: express what you’re certain and uncertain about
- Many practical applications in medicine, factories, helpdesks, robotics, and NLP!
  - Inference: P(diagnosis | these symptoms)
  - Anomaly detection: anomalousness of this observation
  - Active data collection: next diagnostic test | current observations
The Joint Distribution

Recipe for making a joint distribution of M variables (example: Boolean variables X, Y, Z):
1. Make a truth table listing all combinations of values of your variables (if there are M Boolean variables, the table will have 2^M rows).
2. For each combination of values, indicate the probability.
3. Per the axioms of probability, those numbers must sum to 1.
Note: You could be economical and specify only 2^M - 1 of the probabilities, since the last one is determined by the requirement that they sum to 1.
X Y Z Prob
0 0 0 0.30
0 0 1 0.05
0 1 0 0.10
0 1 1 0.05
1 0 0 0.05
1 0 1 0.10
1 1 0 0.25
1 1 1 0.10
[Figure: the same joint distribution over Boolean X, Y, Z drawn as an area diagram, with the events X=1, Y=1, and Z=1 shown as overlapping regions labeled by their probabilities.]
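Not from the slides, but as a minimal sketch: the table above maps directly onto a Python dictionary keyed by value combinations, which also makes the sum-to-1 axiom easy to check.

```python
# Joint distribution over Boolean X, Y, Z from the table above,
# keyed by (x, y, z) tuples.
joint = {
    (0, 0, 0): 0.30, (0, 0, 1): 0.05,
    (0, 1, 0): 0.10, (0, 1, 1): 0.05,
    (1, 0, 0): 0.05, (1, 0, 1): 0.10,
    (1, 1, 0): 0.25, (1, 1, 1): 0.10,
}

# Axiom check: the 2^M entries must sum to 1.
assert abs(sum(joint.values()) - 1.0) < 1e-9
```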
Using the Joint Distribution
Once you have the joint dist., you can ask for the probability of any logical expression involving any of the “attributes”.
$$P(e) \;=\; \sum_{\text{rows } r:\; r \text{ matches } e} P(r) \;=\; \sum_{\text{all values of vars not in } e} P(g, h, w)$$

(The running example is a joint distribution over three attributes: gender g, hours worked h, and wealth w, with values such as Male and Poor.)
What is this summation called?
P(Poor, Male) = 0.4654
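The same “sum the matching rows” computation, sketched on the toy X, Y, Z `joint` dictionary from the earlier snippet:

```python
def prob(joint, matches):
    """P(e) = sum of P(r) over all rows r that match the event e.
    `matches` is a predicate over a row tuple (x, y, z)."""
    return sum(p for row, p in joint.items() if matches(row))

print(prob(joint, lambda r: r[0] == 1))                # P(X=1)      = 0.50
print(prob(joint, lambda r: r[0] == 1 and r[1] == 1))  # P(X=1, Y=1) = 0.35
```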
Try this: P(Poor) = ?
P(Poor) = 0.7604
Inference with the Joint Dist.
$$P(e_1 \mid e_2) \;=\; \frac{P(e_1, e_2)}{P(e_2)} \;=\; \frac{\sum_{\text{rows } r \text{ matching } e_1 \text{ and } e_2} P(r)}{\sum_{\text{rows } r \text{ matching } e_2} P(r)}$$

P(Male | Poor) = ?
P(Male | Poor) = 0.4654 / 0.7604 = 0.612
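Conditional queries follow the same pattern; a sketch, again on the toy X, Y, Z `joint` from the earlier snippets:

```python
def conditional(joint, matches_e1, matches_e2):
    """P(e1 | e2) = P(e1, e2) / P(e2), each computed by summing matching rows."""
    p_e2 = sum(p for row, p in joint.items() if matches_e2(row))
    p_both = sum(p for row, p in joint.items()
                 if matches_e1(row) and matches_e2(row))
    return p_both / p_e2

# e.g., P(X=1 | Z=1) = 0.20 / 0.30 ~ 0.667
print(conditional(joint, lambda r: r[0] == 1, lambda r: r[2] == 1))
```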
News

- Good: Once you have a joint distribution, you can ask important questions about uncertain events.
- Bad: Impossible to create for more than about ten attributes, because there are so many numbers needed when you build the distribution.
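To put numbers on the bad news: 10 Boolean attributes already require 2^10 = 1,024 probabilities, and 30 attributes would require 2^30, roughly 1.07 billion.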
Next

- Address our efficiency problem by making independence assumptions!
- Use the Bayes Net methodology to build joint distributions in manageable chunks.