Abstraction
Random variable, x
it has no set value, until you ‘realize’ it
its properties are described by a probability, P
pot of an infinite number of x’s
Drawing one x from the pot “realizes” x
One way to think about it
Describing P
If x can take on only discrete values,
say (1, 2, 3, 4, or 5)
then a table would work:
x 1 2 3 4 5
P 10% 30% 40% 15% 5%
Probabilities should sum to 100%
15% probability that x=4
Sometimes you see probabilities written as fractions, instead of percentages
x 1 2 3 4 5
P 0.10 0.30 0.40 0.15 0.05
Probabilities should sum to 1
0.15 probability that x=4
And sometimes you see probabilities plotted as a histogram
[Figure: histogram of P(x) against x for x = 1 to 5]
0.15 probability that x=4
If x can take on any value, then use a smooth function (or “distribution”) p(x) instead of a table
[Figure: curve p(x) against x, with the area under the curve between x1 and x2 shaded]
probability that x is between x1 and x2 is proportional to this area
mathematically $P(x_1 < x < x_2) = \int_{x_1}^{x_2} p(x)\, dx$
Probability that x is between -∞ and +∞ is 100%, so total area = 1
Mathematically $\int_{-\infty}^{+\infty} p(x)\, dx = 1$
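The two integrals above can be checked numerically. This is a minimal sketch, assuming a triangular density on [0, 2] as the example p(x). That density and the helper name `integrate` are illustrative choices, not something taken from the slides.

```python
def p(x):
    """Triangular density on [0, 2] (an assumed example, not from the slides)."""
    if 0.0 <= x <= 1.0:
        return x
    if 1.0 < x <= 2.0:
        return 2.0 - x
    return 0.0


def integrate(f, a, b, n=2000):
    """Trapezoid-rule approximation of the integral of f from a to b."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


# p(x) is zero outside [0, 2], so [-1, 3] stands in for (-inf, +inf).
total_area = integrate(p, -1.0, 3.0)

# Probability that x lies between x1 = 0.5 and x2 = 1.5.
prob_between = integrate(p, 0.5, 1.5)

print("total area:", total_area)
print("P(0.5 < x < 1.5):", prob_between)
```

For this density the shaded area between 0.5 and 1.5 works out to three quarters of the total.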
One Reason Why all this is relevant …
Any measurement of data that contains noise is treated as a random variable, d
and …
The distribution p(d) embodies both the ‘true value’ of the datum being measured and the measurement noise
and …
The algebra of random variables allows you to understand how …
… measurement noise affects inferences made from the data
Basic Description of Distributions
want two basic numbers
1) something that describes what x’s commonly occur
2) something that describes the variability of the x’s
The most popular car in the US is the Honda CR-V
But the next car you see on the highway will probably not be a Honda CR-V
Where’s a CR-V?
Honda CR-V
[Figure: p(x) against x, with the peak marked at x_mode]
But modes can be deceptive …
100 realizations of x
x      N
0-1    3
1-2    18
2-3    11
3-4    8
4-5    11
5-6    14
6-7    8
7-8    7
8-9    11
9-10   9
Sure, the 1-2 range has the most counts, but most of the measurements are bigger than 2!
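The claim about the 100 realizations can be checked directly from the counts. A small sketch, with the bin labels and counts copied from the table and the variable names chosen here for illustration:

```python
# Bin labels and counts from the table of 100 realizations of x.
bins = ["0-1", "1-2", "2-3", "3-4", "4-5", "5-6", "6-7", "7-8", "8-9", "9-10"]
counts = [3, 18, 11, 8, 11, 14, 8, 7, 11, 9]

total = sum(counts)

# The mode is the bin with the largest count.
modal_bin = bins[counts.index(max(counts))]

# Everything from the "2-3" bin upward is bigger than 2.
count_above_2 = sum(counts[2:])
fraction_above_2 = count_above_2 / total

print("modal bin:", modal_bin)
print("fraction of measurements bigger than 2:", fraction_above_2)
```

The modal bin holds 18 of the 100 measurements, while 79 of them lie above 2.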
Median
50% chance x is smaller than x_median
50% chance x is bigger than x_median
No special reason the median needs to coincide with the peak
[Figure: p(x) against x, with x_median splitting the area into 50% on each side]
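To see that the median need not sit at the peak, one can locate the bin that contains the median of the 100 binned realizations shown earlier. This is a rough sketch: `median_bin` is a hypothetical helper, and it only identifies the bin, not a value inside it.

```python
def median_bin(bins, counts):
    """Return the label of the first bin at which the running count reaches half the total."""
    half = sum(counts) / 2
    running = 0
    for label, n in zip(bins, counts):
        running += n
        if running >= half:
            return label
    return None  # only reached when there are no bins


bins = ["0-1", "1-2", "2-3", "3-4", "4-5", "5-6", "6-7", "7-8", "8-9", "9-10"]
counts = [3, 18, 11, 8, 11, 14, 8, 7, 11, 9]

mode_label = bins[counts.index(max(counts))]
median_label = median_bin(bins, counts)

print("mode bin:", mode_label)
print("median bin:", median_label)
```

For these counts the median falls in the 4-5 bin, well away from the 1-2 bin that holds the peak.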
Expected value or ‘mean’
value you would get if you took the mean of lots of realizations of x
Let’s examine a discrete distribution, for simplicity ...
x N
1 20
2 80
3 40
Total 140
mean = $[\, 20 \cdot 1 + 80 \cdot 2 + 40 \cdot 3 \,] / 140$
= $(20/140) \cdot 1 + (80/140) \cdot 2 + (40/140) \cdot 3$
= $p(1) \cdot 1 + p(2) \cdot 2 + p(3) \cdot 3$
= $\sum_i p(x_i)\, x_i$
Hypothetical table of 140 realizations of x
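The two ways of writing the mean can be compared numerically. A minimal sketch using the hypothetical table of 140 realizations, with variable names chosen here for illustration:

```python
# Counts N for each value of x, from the hypothetical table.
counts = {1: 20, 2: 80, 3: 40}
total = sum(counts.values())

# Mean computed directly from the counts.
mean_from_counts = sum(n * x for x, n in counts.items()) / total

# Same mean computed as sum_i p(x_i) x_i, with p(x_i) = N_i / total.
p = {x: n / total for x, n in counts.items()}
mean_from_probs = sum(p[x] * x for x in p)

print(mean_from_counts, mean_from_probs)
```

Both expressions give 300/140, a little above 2, which reflects the extra weight on x = 3 compared with x = 1.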
Here’s a perfectly sensible way to define the width of a distribution…
[Figure: p(x) against x; the width W50 spans the central 50% of the probability, with 25% on each side]
… it’s not used much, though
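Variance = $\sigma^2 = \int_{-\infty}^{+\infty} [x - E(x)]^2\, p(x)\, dx$
[Figure: three curves against x: p(x) with E(x) marked, the parabola [x-E(x)]^2, and their product [x-E(x)]^2 p(x)]
Compute this total area …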
Idea is that if distribution is narrow, then most of the probability lines up with the low spot of the parabola
But if it is wide, then some of the probability lines up with the high parts of the parabola
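The narrow-versus-wide idea can be illustrated with a discrete analogue of the variance integral, where the integral becomes a sum over the possible values. The two small distributions below are made-up examples with the same expected value, not data from the slides.

```python
def mean(p):
    """Expected value of a discrete distribution given as {x: probability}."""
    return sum(prob * x for x, prob in p.items())


def variance(p):
    """Sum of [x - E(x)]^2 p(x): the discrete analogue of the variance integral."""
    mu = mean(p)
    return sum(prob * (x - mu) ** 2 for x, prob in p.items())


# Assumed example distributions, both centred on x = 2.
narrow = {1: 0.1, 2: 0.8, 3: 0.1}  # most probability at the low spot of the parabola
wide = {1: 0.4, 2: 0.2, 3: 0.4}    # more probability on the high parts of the parabola

print("narrow:", mean(narrow), variance(narrow))
print("wide:  ", mean(wide), variance(wide))
```

The wide distribution puts four times as much probability at x = 1 and x = 3, so its variance is four times larger even though the expected value is unchanged.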
$\sqrt{\text{variance}} = \sigma$
A measure of width …
we don’t immediately know its relationship to area, though …
[Figure: p(x) against x, with E(x) marked]
the Gaussian or normal distribution
$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\{ -(x-\bar{x})^2 / (2\sigma^2) \}$
$\bar{x}$ is expected value
$\sigma^2$ is variance
Memorize me !
Properties of the normal distribution
[Figure: p(x) against x, with 95% of the area lying between $\bar{x}-2\sigma$ and $\bar{x}+2\sigma$]
Expectation = Median = Mode = $\bar{x}$
95% of probability within $2\sigma$ of the expected value
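The 95% figure can be checked with the error function from Python's standard library. A short sketch, where `normal_pdf` and `prob_within` are helper names introduced here for illustration:

```python
import math


def normal_pdf(x, mean, sigma):
    """Normal density with expected value `mean` and variance sigma**2."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


def prob_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))


p_two_sigma = prob_within(2.0)
print("probability within 2 sigma:", p_two_sigma)

# The density is highest at the expected value, which is also the mode.
print(normal_pdf(3.0, 3.0, 2.0) > normal_pdf(3.5, 3.0, 2.0))
```

The exact figure is closer to 95.4%, so "95% within two standard deviations" is a convenient rounding.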
Again, Why all this is relevant …
Inference depends on data …
You use measurement, d, to deduce the values of some underlying parameter of interest, m.
e.g. use measurements of travel time, d, to deduce the seismic velocity, m, of the earth
If data, d, is a random variable
then so is model parameter, m
All inferences made from uncertain data are themselves uncertain
Model parameters are described by a distribution, p(m)
Special case of a linear relationship and a normal distribution
Normal p(d) with mean $\bar{d}$ and variance $\sigma_d^2$
Linear relationship m = a d + b
Normal p(m) with mean $a\bar{d}+b$ and variance $a^2\sigma_d^2$
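The mean and variance rule for the linear case can be checked by simulation. This is a sketch under assumed values: the numbers chosen for the mean and standard deviation of d, and for a and b, are arbitrary illustrations. It checks only the mean and variance of m, not that m is normally distributed.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Assumed illustrative values (not from the slides).
d_mean, d_sigma = 5.0, 2.0
a, b = 3.0, 1.0
n = 50000

# Draw realizations of the datum d and map each one through m = a d + b.
d = [random.gauss(d_mean, d_sigma) for _ in range(n)]
m = [a * di + b for di in d]


def sample_mean(values):
    return sum(values) / len(values)


def sample_variance(values):
    mu = sample_mean(values)
    return sum((v - mu) ** 2 for v in values) / len(values)


m_mean = sample_mean(m)
m_var = sample_variance(m)

predicted_mean = a * d_mean + b      # a * mean(d) + b
predicted_var = a ** 2 * d_sigma ** 2  # a^2 * variance(d)

print("mean of m:    ", m_mean, "predicted", predicted_mean)
print("variance of m:", m_var, "predicted", predicted_var)
```

The sample values scatter around the predictions and get closer as n grows.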
Example
Liberty island is inhabited by both pigeons and seagulls
40% of the birds are pigeons and 60% of the birds are gulls
50% of pigeons are white and 50% are grey
100% of gulls are white
Two variables
species s takes two values
pigeon p
and gull g
color c takes two values
white w
and grey (tan) t
Of 100 birds,
20 are white pigeons
20 are grey pigeons
60 are white gulls
0 are grey gulls
What is the probability that a random bird has species s and color c ?
P(s,c)   c=w   c=t
s=p      20%   20%
s=g      60%   0%
Note: sum of all boxes is 100%
Two continuous variables, say x1 and x2, have a joint probability distribution, written
p(x1, x2)
with $\iint p(x_1, x_2)\, dx_1\, dx_2 = 1$
What is the probability that a bird has color c ?
start with P(s,c) and sum columns to get P(c)
P(s,c)   c=w   c=t
s=p      20%   20%
s=g      60%   0%
P(c)     80%   20%
What is the probability that a bird has species s ?
start with P(s,c) and sum rows to get P(s)
P(s,c)   c=w   c=t   P(s)
s=p      20%   20%   40%
s=g      60%   0%    60%
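These operations make sense with distributions, too
[Figure: joint distribution p(x1, x2) on x1 and x2 axes, with the one-dimensional distributions p(x1) and p(x2) alongside]
$p(x_1) = \int p(x_1,x_2)\, dx_2$, the distribution of x1 (irrespective of x2)
$p(x_2) = \int p(x_1,x_2)\, dx_1$, the distribution of x2 (irrespective of x1)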
Given that a bird is species s, what is the probability that it has color c ?
P(c|s)   c=w    c=t
s=p      50%    50%
s=g      100%   0%
Note, all rows sum to 100%
Given that a bird is color c, what is the probability that it has species s ?
P(s|c)   c=w   c=t
s=p      25%   100%
s=g      75%   0%
Note, all columns sum to 100%
So 25% of white birds are pigeons
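The bird tables can be reproduced from the joint probabilities alone. A minimal sketch, using dictionaries keyed by species and color. The data layout and variable names are choices made here, while the numbers are the ones from the example.

```python
species = ["pigeon", "gull"]
colors = ["white", "grey"]

# Joint probability P(s, c) from the counts of 100 birds.
joint = {
    ("pigeon", "white"): 0.20,
    ("pigeon", "grey"): 0.20,
    ("gull", "white"): 0.60,
    ("gull", "grey"): 0.00,
}

# Sum over colors (rows of the table) to get P(s).
P_s = {s: sum(joint[(s, c)] for c in colors) for s in species}

# Sum over species (columns of the table) to get P(c).
P_c = {c: sum(joint[(s, c)] for s in species) for c in colors}

# Conditional probabilities: divide the joint by the relevant marginal.
P_c_given_s = {(c, s): joint[(s, c)] / P_s[s] for s in species for c in colors}
P_s_given_c = {(s, c): joint[(s, c)] / P_c[c] for s in species for c in colors}

print("P(s):", P_s)
print("P(c):", P_c)
print("P(pigeon | white):", P_s_given_c[("pigeon", "white")])
print("P(white | pigeon):", P_c_given_s[("white", "pigeon")])
```

Dividing by a marginal only works when that marginal is nonzero, which holds for every species and color in this example.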
Actor Patrick Swayze, pancreatic cancer victim
A lot of errors occur from confusing the two:
Probability that, if you have pancreatic cancer, you will die from it: 90%
Probability that, if you die, you will have died of pancreatic cancer: 1.4%
and if
P(s,c) = P(s|c) P(c) = P(c|s) P(s)
then
P(s|c) = P(c|s) P(s) / P(c)
and
P(c|s) = P(s|c) P(c) / P(s)
… which is called Bayes Theorem
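Bayes Theorem can be tried out on the bird example. In this sketch the function name `bayes` and the dictionary layout are illustrative choices. The inputs are the species probabilities P(s) and the color-given-species probabilities P(c|s) stated in the example.

```python
def bayes(likelihood, prior, evidence):
    """Return P(s|c) = P(c|s) P(s) / P(c)."""
    return likelihood * prior / evidence


# P(s): 40% pigeons, 60% gulls.
P_s = {"pigeon": 0.4, "gull": 0.6}

# P(c|s), keyed as (color, species).
P_c_given_s = {
    ("white", "pigeon"): 0.5,
    ("grey", "pigeon"): 0.5,
    ("white", "gull"): 1.0,
    ("grey", "gull"): 0.0,
}


def P_c(color):
    """P(c), obtained by summing P(c|s) P(s) over species."""
    return sum(P_c_given_s[(color, s)] * P_s[s] for s in P_s)


pigeon_given_white = bayes(P_c_given_s[("white", "pigeon")], P_s["pigeon"], P_c("white"))
pigeon_given_grey = bayes(P_c_given_s[("grey", "pigeon")], P_s["pigeon"], P_c("grey"))

print("P(pigeon | white):", pigeon_given_white)
print("P(pigeon | grey): ", pigeon_given_grey)
```

A quarter of white birds are pigeons, and every grey bird is a pigeon because no gull is grey.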
In this example
bird color is the observable, the “data”
bird species is the “model parameter”
P(c|s), “color given species” or P(d|m), is “making a prediction based on the model”
Given a pigeon, what is the probability that it’s grey?
P(s|c), “species given color” or P(m|d), is “making an inference from the data”
Given a grey bird, what is the probability that it’s a pigeon?
Why Bayes Theorem is important
It provides a framework for relating
making a prediction from the model, P(d|m)
to
making an inference from the data, P(m|d)
Bayes Theorem also implies that the joint distribution of data and model parameters
p(d, m)
is the fundamental quantity
If you know p(d, m), you know everything there is to know …
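The expectation is computed by first reducing the distribution to one dimension
[Figure: joint distribution p(x1, x2) on x1 and x2 axes, reduced to the one-dimensional distributions p(x1) and p(x2)]
take the expectation of p(x1) to get $\bar{x}_1$
take the expectation of p(x2) to get $\bar{x}_2$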
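The variance is also computed by first reducing the distribution to one dimension
[Figure: joint distribution p(x1, x2) on x1 and x2 axes, reduced to the one-dimensional distributions p(x1) and p(x2)]
take the variance of p(x1) to get $\sigma_1^2$
take the variance of p(x2) to get $\sigma_2^2$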
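Note that in this distribution, if x1 is bigger than $\bar{x}_1$, then x2 is bigger than $\bar{x}_2$, and if x1 is smaller than $\bar{x}_1$, then x2 is smaller than $\bar{x}_2$
[Figure: joint distribution on x1 and x2 axes, with the expected value ($\bar{x}_1$, $\bar{x}_2$) marked]
This is a positive correlation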
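Conversely, in this distribution, if x1 is bigger than $\bar{x}_1$, then x2 is smaller than $\bar{x}_2$, and if x1 is smaller than $\bar{x}_1$, then x2 is bigger than $\bar{x}_2$
[Figure: joint distribution on x1 and x2 axes, with the expected value ($\bar{x}_1$, $\bar{x}_2$) marked]
This is a negative correlation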
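This correlation can be quantified by multiplying the distribution by a four-quadrant function
[Figure: the x1, x2 plane split into four quadrants about ($\bar{x}_1$, $\bar{x}_2$); the function is positive in the upper-right and lower-left quadrants and negative in the other two]
And then integrating. The function $(x_1-\bar{x}_1)(x_2-\bar{x}_2)$ works fine
$C = \iint (x_1-\bar{x}_1)(x_2-\bar{x}_2)\, p(x_1,x_2)\, dx_1\, dx_2$
Called the “covariance”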
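The sign of the covariance can be seen with a discrete analogue, where the double integral becomes a sum over pairs (x1, x2). The two small joint distributions below are made-up examples, and `covariance` is a helper name introduced here.

```python
def covariance(joint):
    """Sum of (x1 - mean1)(x2 - mean2) p(x1, x2) over all pairs in `joint`."""
    mean1 = sum(p * x1 for (x1, x2), p in joint.items())
    mean2 = sum(p * x2 for (x1, x2), p in joint.items())
    return sum(p * (x1 - mean1) * (x2 - mean2) for (x1, x2), p in joint.items())


# Assumed example: x1 and x2 tend to be large together or small together.
positive = {(0, 0): 0.4, (1, 1): 0.4, (0, 1): 0.1, (1, 0): 0.1}

# Assumed example: when x1 is large, x2 tends to be small, and vice versa.
negative = {(0, 1): 0.4, (1, 0): 0.4, (0, 0): 0.1, (1, 1): 0.1}

cov_positive = covariance(positive)
cov_negative = covariance(negative)

print("positively correlated:", cov_positive)
print("negatively correlated:", cov_negative)
```

Most of the probability in the first example sits where the four-quadrant function is positive, so the sum comes out positive. The second example is its mirror image.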
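Note that the matrix C with elements
$C_{ij} = \iint (x_i-\bar{x}_i)(x_j-\bar{x}_j)\, p(x_1,x_2)\, dx_1\, dx_2$
has diagonal elements of $\sigma_i^2$, the variance of xi
and off-diagonal elements of cov(xi,xj), the covariance of xi and xj
$C = \begin{pmatrix} \sigma_1^2 & \mathrm{cov}(x_1,x_2) & \mathrm{cov}(x_1,x_3) \\ \mathrm{cov}(x_1,x_2) & \sigma_2^2 & \mathrm{cov}(x_2,x_3) \\ \mathrm{cov}(x_1,x_3) & \mathrm{cov}(x_2,x_3) & \sigma_3^2 \end{pmatrix}$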
The “vector of means” of a multivariate distribution
$\bar{x}$
and the “covariance matrix” of a multivariate distribution
$C_x$
summarize a lot, but not everything, about a multivariate distribution
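As a rough illustration, the vector of means and the covariance matrix can be estimated from a set of realizations by treating each realization as equally probable. The four three-component samples below are invented for the example, and the helper names are not from the slides. Dividing by n mirrors the integral definition, while estimates from real data often divide by n - 1 instead.

```python
def mean_vector(samples):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(samples)
    dim = len(samples[0])
    return [sum(s[i] for s in samples) / n for i in range(dim)]


def covariance_matrix(samples):
    """Matrix with elements C[i][j] = average of (x_i - mean_i)(x_j - mean_j)."""
    n = len(samples)
    xbar = mean_vector(samples)
    dim = len(xbar)
    return [
        [sum((s[i] - xbar[i]) * (s[j] - xbar[j]) for s in samples) / n
         for j in range(dim)]
        for i in range(dim)
    ]


# Assumed example data: x2 rises with x1, x3 falls as x1 rises.
samples = [(1.0, 2.0, 4.0), (2.0, 4.0, 3.0), (3.0, 6.0, 2.0), (4.0, 8.0, 1.0)]

xbar = mean_vector(samples)
C = covariance_matrix(samples)

print("vector of means:", xbar)
for row in C:
    print(row)
```

The diagonal holds the three variances, the matrix is symmetric, and the signs of the off-diagonal elements show which pairs are positively or negatively correlated.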