8/6/2019 Analogues Slides
1/44
Probability and Statistics:
A Sample Analogues Approach
Charlie Gibbons
Economics 140, University of California, Berkeley
Summer 2011
Outline
1 Populations and samples
2 Probability
    Simple probability
    Joint probabilities
    Conditional probability
    Independence
3 Expectations
4 Dispersion
    Variance
    Covariance
5 Appendix: Additional examples
Populations and samples
The population is the universe of units that you care about.
Example: American adults.
A sample is a subset of the population.
Example: The observations in the Current Population Survey.
Econometrics uses a sample to make inferences about the population. Sample statistics have population analogues.
Sample frequencies
We begin with some random quantity Y that takes on K possible values $y_1, \ldots, y_K$. The value of this random variable for observation i is $y_i$; $y_i$ is a realization of the random variable Y.
Example: The roll of a die can take on values 1, ..., 6.
We ask, what is the sample frequency of some y from the set $\{y_1, \ldots, y_K\}$? That is, what fraction of our observations have an observed value of y?
All we do is count the number of observations that have the value y and divide by the number of observations:
$$f(y) = \frac{\#\{y_i = y\}}{N}.$$
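The counting rule above can be sketched in a few lines of Python. The function name `sample_frequency` and the ten die rolls are made up for illustration; they are not from the slides.

```python
from collections import Counter
from fractions import Fraction

def sample_frequency(observations):
    """Return {value: share of observations equal to that value}."""
    n = len(observations)
    counts = Counter(observations)
    return {value: Fraction(count, n) for value, count in counts.items()}

# Hypothetical sample of ten die rolls (invented for this sketch).
rolls = [3, 1, 6, 3, 2, 5, 3, 6, 1, 4]
freq = sample_frequency(rolls)
print(freq[3])  # 3/10: three of the ten rolls equal 3
```

Using `Fraction` keeps the frequencies exact, so they sum to exactly 1.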
Probability mass function
We typically define the probability of y as the fraction of times that it arises if we had infinite observations, i.e., the sample frequency of y in an infinite sample.
We write this as Pr(Y = y). This is the probability that a random variable Y takes on the value y.
Example: The probability of getting some value $y \in \{1, \ldots, 6\}$ when you roll a die is $\Pr(Y = y) = \frac{1}{6}$ for all y.
Terminology: y is a realization of the random variable Y and Pr(Y = y) is a probability mass function.
Cumulative distribution function
We might care about the probability that Y takes on a value of y or less: $\Pr(Y \le y)$. This is called the cumulative distribution function (CDF) of Y.
To get this, we add up the probability of getting any value less than or equal to y:
$$F(y) \equiv \Pr(Y \le y) = \sum_{y_j \le y} \Pr(Y = y_j).$$
Example: When you roll a die, the probability of getting a 3 or less is
$$F(3) = \Pr(Y \le 3) = \Pr(Y = 1) + \Pr(Y = 2) + \Pr(Y = 3) = \frac{1}{2}.$$
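As a quick check of the die example, the CDF can be computed by summing the PMF. This is only a sketch for a fair six-sided die; the names `pmf` and `cdf` are illustrative.

```python
from fractions import Fraction

# PMF of a fair six-sided die: Pr(Y = y) = 1/6 for y in 1..6.
pmf = {y: Fraction(1, 6) for y in range(1, 7)}

def cdf(y, pmf):
    """F(y) = Pr(Y <= y): sum the PMF over all values no larger than y."""
    return sum((p for value, p in pmf.items() if value <= y), Fraction(0))

print(cdf(3, pmf))  # 1/2
```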
Continuous random variables
Life is pretty simple when we have a finite number of y values, but what if we have an infinite number?
The definition of the sample frequency doesn't change, but often the frequency of any particular value of y will be $\frac{1}{N}$, i.e., only one observation will have that value.
Probability density function
Instead of a probability mass function, we have a probability density function that is defined as the derivative of the CDF:
$$f(y) = \frac{d}{dy} F(y).$$
Intuition: The derivative of the CDF answers, how much does the total probability change if we consider a little bigger value of y? How much more probable is getting a value less than y if we make y a bit bigger? This additional contribution to probability from a small change in y is the probability density of y.
Note: For continuous random variables, the CDF is the integral of the PDF (cf. discrete random variables, for which the CDF is the sum of the PMF).
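One way to see the derivative relationship numerically is to compare a finite-difference slope of the CDF with the PDF. The sketch below assumes a standard normal and uses `statistics.NormalDist` from the Python standard library; the helper `cdf_slope` and the step size `h` are illustrative choices.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def cdf_slope(x, h=1e-5):
    """Central-difference approximation to dF/dx at x."""
    return (std_normal.cdf(x + h) - std_normal.cdf(x - h)) / (2 * h)

# The two columns should agree to several decimal places.
for x in (-1.0, 0.0, 1.5):
    print(x, cdf_slope(x), std_normal.pdf(x))
```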
CDF-PDF example
Figure: Normal CDF and PDF, each plotted against x from -3 to 3; slope of CDF line is height of PDF line
Joint probabilities
Suppose that we have 2 random variables, X and Y, and want to consider their joint frequency in the sample. Extending our previous definition, we have
$$f(x, y) = \frac{\#\{y_i = y \text{ and } x_i = x\}}{N}.$$
These are often called cross tabs (tabulations).
We have obvious extensions to a joint PMF, Pr(X = x, Y = y), joint PDF, f(x, y), and joint CDF, F(x, y).
Examples: Joint PMF; Joint CDF (discrete); Joint normal PDF; Joint normal CDF
Conditional frequencies
Suppose that we have two random variables, but we want to consider the distribution of one for some fixed value of the other. That is, what is the distribution of Y when X = x?
Note that we are limiting our sample: we only care about the observations such that $x_i = x$. Of this subgroup, what is the frequency of y?
Example: What is the distribution of student heights given that they are male?
$$f(y \mid X = x) = \frac{\#\{y_i = y \text{ and } x_i = x\}}{\#\{x_i = x\}}.$$
This is the sample frequency of y given or conditional upon X being x: the conditional sample frequency.
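The "limit the sample, then count" logic can be sketched as follows. The function name `conditional_frequency` and the six (sex, height bracket) observations are hypothetical, invented purely for illustration.

```python
from fractions import Fraction

def conditional_frequency(pairs, y, x):
    """f(y | X = x) = #{y_i = y and x_i = x} / #{x_i = x}."""
    subgroup = [yi for xi, yi in pairs if xi == x]  # keep only observations with x_i = x
    if not subgroup:
        raise ValueError("no observations with the requested x")
    return Fraction(sum(1 for yi in subgroup if yi == y), len(subgroup))

# Hypothetical (x_i, y_i) observations: sex and a coarse height bracket.
sample = [
    ("M", "tall"), ("M", "short"), ("F", "tall"),
    ("M", "tall"), ("F", "short"), ("F", "short"),
]
print(conditional_frequency(sample, y="tall", x="M"))  # 2/3
```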
Conditional probability
The population analogue of conditional frequency, the conditional probability of Y, forms the core of econometrics.
The probability that Y takes the value y given that X takes the value x is
$$\Pr(Y = y \mid X = x) = \frac{\Pr(Y = y \text{ and } X = x)}{\Pr(X = x)}.$$
We divide by the probability that X = x to account for the fact that we are only considering a subpopulation.
Example: Conditional probabilities and dice
Dictatorships and growth
Example from Bill Easterly's "Benevolent Autocrats" (2011).
Growth Commission Report, World Bank:
"Growth at such a quick pace, over such a long period, requires strong political leadership."
Thomas Friedman, NY Times:
"One-party autocracy certainly has its drawbacks. But when it is led by a reasonably enlightened group of people, as China is today, it can also have great advantages. That one party can just impose the politically difficult but critically important policies needed to move a society forward in the 21st century."
Wrong question, wrong interpretation
                  Autocracy   Democracy
Growth Success        9           1

$$f(\text{Autocracy} \mid \text{Success}) = \frac{9}{9 + 1} = 90\%$$
$$f(\text{Democracy} \mid \text{Success}) = \frac{1}{9 + 1} = 10\%$$
Typical question
Econometricians generally ask for
Pr(outcome | treatment and other predictors).
In the growth example, that is Pr(Success | Autocracy), not f(Autocracy | Success).
Independence
X and Y are independent if and only if
$$F_{X,Y}(x, y) = F_X(x) F_Y(y)$$
(note: these are the population CDFs) and
$$f_{X,Y}(x, y) = f_X(x) f_Y(y).$$
We also see that X and Y are independent if and only if
$$f_{Y \mid X}(y \mid X = x) = f_Y(y) \quad \forall x \in \mathcal{X}.$$
Example: What's the probability of getting heads on a second coin toss if you got heads on the first?
This implies that knowing X gives you no additional ability to predict Y, an intuitive notion underlying independence.
Example: Independence and dependence
Sample average
We are all familiar with the sample average of Y: add up all the observed values and divide by N:
$$\bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i.$$
Alternatively, we can consider every possible value of Y, $y_1, \ldots, y_K$, and multiply each by its sample frequency:
$$\bar{y} = \sum_{j=1}^{K} y_j \frac{\#\{y_i = y_j\}}{N}.$$
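The two formulas give the same number, which a short sketch can confirm. The ten die rolls are hypothetical and the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction

rolls = [3, 1, 6, 3, 2, 5, 3, 6, 1, 4]  # hypothetical sample, N = 10
n = len(rolls)

# Formula 1: add up all observed values and divide by N.
mean_direct = Fraction(sum(rolls), n)

# Formula 2: each distinct value times its sample frequency #{y_i = y_j} / N.
mean_weighted = sum(
    (value * Fraction(count, n) for value, count in Counter(rolls).items()),
    Fraction(0),
)

print(mean_direct, mean_weighted)  # 17/5 17/5
```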
Expectations
The population version is the expectation: take each value that Y can take on and multiply by its probability (as opposed to its sample frequency):
$$E(Y) = \sum_{j=1}^{K} y_j \Pr(Y = y_j).$$
For a continuous random variable, we turn sums into integrals:
$$E(Y) = \int_{-\infty}^{\infty} y f(y) \, dy.$$
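For the continuous case, the integral can be approximated numerically. The sketch below assumes a normal distribution with mean 1 and standard deviation 2 (an arbitrary choice for illustration) and a simple midpoint Riemann sum; `expectation_continuous` is a hypothetical helper, and the integration limits are truncated at ten standard deviations from the mean.

```python
from statistics import NormalDist

def expectation_continuous(pdf, lower, upper, steps=100_000):
    """Approximate the integral of y * f(y) dy with a midpoint Riemann sum."""
    width = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        y = lower + (k + 0.5) * width  # midpoint of the k-th slice
        total += y * pdf(y) * width
    return total

dist = NormalDist(mu=1.0, sigma=2.0)  # assumed distribution for the sketch
approx_mean = expectation_continuous(dist.pdf, lower=-19.0, upper=21.0)
print(approx_mean)  # close to the true mean of 1.0
```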
Expectations of functions
We can calculate expectations of functions of Y, g(Y).
We have the equations
$$E[g(Y)] = \sum_{y \in \mathcal{Y}} f(y) g(y)$$
$$E[g(Y)] = \int_{-\infty}^{\infty} g(y) f(y) \, dy$$
for discrete and continuous random variables respectively.
Expectations of functions example
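Note that, in general, $E[g(Y)] \ne g[E(Y)]$.
Using a die rolling example,
$$E(Y^2) = 1^2 \cdot \frac{1}{6} + 2^2 \cdot \frac{1}{6} + 3^2 \cdot \frac{1}{6} + 4^2 \cdot \frac{1}{6} + 5^2 \cdot \frac{1}{6} + 6^2 \cdot \frac{1}{6} = \frac{91}{6} \approx 15.17$$
$$[E(Y)]^2 = 3.5^2 = 12.25$$
$$E(Y^2) \ne [E(Y)]^2$$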
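The die arithmetic can be reproduced exactly with a small helper that takes the expectation of any function g under a discrete PMF. The name `expectation` is illustrative; the PMF is the fair die used throughout.

```python
from fractions import Fraction

pmf = {y: Fraction(1, 6) for y in range(1, 7)}  # fair six-sided die

def expectation(pmf, g=lambda y: y):
    """E[g(Y)] = sum over y of f(y) * g(y) for a discrete random variable."""
    return sum((p * g(y) for y, p in pmf.items()), Fraction(0))

mean = expectation(pmf)                             # E(Y)
mean_of_square = expectation(pmf, lambda y: y**2)   # E(Y^2)
square_of_mean = mean**2                            # [E(Y)]^2

print(mean, mean_of_square, square_of_mean)  # 7/2 91/6 49/4
```

Here 91/6 is about 15.17 and 49/4 is 12.25, matching the slide.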
Expectations are linear
Expectations are linear operators, i.e.,
E(a g(Y) + b h(Y) + c) = a E[g(Y)] + b E[h(Y)] + c.
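For the discrete case, linearity follows from the definition of E[g(Y)] as a probability-weighted sum, together with the fact that the probabilities f(y) sum to 1. A sketch of the steps:

$$
\begin{aligned}
E[a\,g(Y) + b\,h(Y) + c] &= \sum_{y \in \mathcal{Y}} f(y) \left[ a\,g(y) + b\,h(y) + c \right] \\
&= a \sum_{y \in \mathcal{Y}} f(y) g(y) + b \sum_{y \in \mathcal{Y}} f(y) h(y) + c \sum_{y \in \mathcal{Y}} f(y) \\
&= a\,E[g(Y)] + b\,E[h(Y)] + c.
\end{aligned}
$$

The continuous case is the same argument with integrals in place of sums.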
Expectations and independence
Recall that, for independent random variables X and Y,
$$f_{Y \mid X}(y \mid X = x) = f_Y(y) \quad \text{and} \quad f_{X \mid Y}(x \mid Y = y) = f_X(x).$$
Hence, E(Y | X) = E(Y) and E(X | Y) = E(X).
Conditional expectations
The conditional expectation E[Y | X = x] asks, what is the average value of Y given that X takes on the value x?
Conditional expectations hold X fixed at some x and the value E[Y | X = x] varies depending upon which x we pick.
Since X is fixed, it isn't random and can come out of the expectation:
$$E[g(X) Y + h(X) \mid X = x] = g(x) E[Y \mid X = x] + h(x).$$
Law of iterated expectations
The law of iterated expectations says that
$$E_Y[Y] = E_X\left[ E[Y \mid X = x] \right];$$
the expectation of Y is the conditional expectation of Y at X = x averaged over all possible values of X.
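The law can be checked by brute force on a small example. The sketch below assumes two fair dice, with X the outcome on the first die and Z the sum of both (the setup used in the appendix examples); the names `cond_exp_z`, `e_z`, and `e_of_cond` are illustrative.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely (x, y) pairs
p = Fraction(1, 36)

# Left-hand side: E[Z] computed directly, with Z = x + y.
e_z = sum((x + y) * p for x, y in outcomes)

def cond_exp_z(x):
    """E[Z | X = x]: average Z over the outcomes whose first die equals x."""
    matching = [x0 + y0 for x0, y0 in outcomes if x0 == x]
    return Fraction(sum(matching), len(matching))

# Right-hand side: average the conditional means over the distribution of X.
e_of_cond = sum(cond_exp_z(x) * Fraction(1, 6) for x in range(1, 7))

print(e_z, e_of_cond)  # 7 7
```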
Variance
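The variance of a random variable is a measure of its dispersion around its mean. It is defined as the second central moment of Y:
$$\sigma_Y^2 \equiv \operatorname{Var}(Y) = E\left[ (Y - \mu)^2 \right]$$
Multiplying this out yields:
$$
\begin{aligned}
&= E\left[ Y^2 - 2\mu Y + \mu^2 \right] \\
&= E\left[ Y^2 \right] - 2\mu E(Y) + \mu^2 \\
&= E\left[ Y^2 \right] - [E(Y)]^2
\end{aligned}
$$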
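Both forms of the variance can be evaluated for a fair die to confirm they agree. This is a sketch with illustrative variable names.

```python
from fractions import Fraction

pmf = {y: Fraction(1, 6) for y in range(1, 7)}  # fair six-sided die

mu = sum(y * p for y, p in pmf.items())  # E(Y) = 7/2

# Definition: second central moment, E[(Y - mu)^2].
var_central = sum((y - mu) ** 2 * p for y, p in pmf.items())

# Shortcut: E[Y^2] - [E(Y)]^2.
var_shortcut = sum(y**2 * p for y, p in pmf.items()) - mu**2

print(var_central, var_shortcut)  # 35/12 35/12
```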
Same mean, different variance
Figure: Densities with the same mean but different variances, plotted from -3 to 3; vertical axis: Density
Variance facts
The standard deviation, $\sigma$, of a random variable is the square root of its variance; i.e., $\sigma = \sqrt{\operatorname{Var}(Y)}$.
While the variance is in squared units, the standard deviation is in the same units as y.
See that $\operatorname{Var}(aY + b) = a^2 \operatorname{Var}(Y)$.
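The last fact follows from the definition of variance and the linearity of expectations. Writing $\mu = E(Y)$, so that $E(aY + b) = a\mu + b$, a sketch of the steps is:

$$
\begin{aligned}
\operatorname{Var}(aY + b) &= E\left[ \left( aY + b - (a\mu + b) \right)^2 \right] \\
&= E\left[ a^2 (Y - \mu)^2 \right] \\
&= a^2 E\left[ (Y - \mu)^2 \right] = a^2 \operatorname{Var}(Y).
\end{aligned}
$$

The constant b shifts the mean but drops out of every deviation from the mean, so it has no effect on dispersion.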
Sample analogue of variance
A candidate for the sample analogue of the variance of Y is
$$\tilde{\sigma}^2 = \frac{1}{N} \sum_{i=1}^{N} (y_i - \bar{y})^2.$$
It turns out that this is a biased estimator of $\sigma^2$, so we use
$$\hat{\sigma}^2 = \frac{1}{N - 1} \sum_{i=1}^{N} (y_i - \bar{y})^2$$
instead to get an unbiased estimator.
It turns out that the first estimator is still consistent; its bias goes to 0 as N goes to $\infty$.
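Python's standard library happens to provide both estimators: `statistics.pvariance` divides by N and `statistics.variance` divides by N - 1. The eight-observation sample below is hypothetical, chosen so the arithmetic is easy to follow.

```python
from statistics import pvariance, variance

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data, mean 5
n = len(sample)

biased = pvariance(sample)   # sum of squared deviations / N
unbiased = variance(sample)  # sum of squared deviations / (N - 1)

print(biased)    # 4.0
print(unbiased)  # about 4.571, i.e. 32/7

# The two differ only by the factor N / (N - 1), which goes to 1 as N grows.
print(unbiased / biased, n / (n - 1))
```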
Covariance
The covariance of random variables X and Y is defined as
$$\operatorname{Cov}(X, Y) \equiv \sigma_{XY} = E\left[ (X - E_X(X)) (Y - E_Y(Y)) \right] = E(XY) - \mu_X \mu_Y.$$
We have
$$\operatorname{Var}(aX + bY) = a^2 \operatorname{Var}(X) + b^2 \operatorname{Var}(Y) + 2ab \operatorname{Cov}(X, Y).$$
Note that covariance only measures the linear relationship between two random variables; we'll see just what this means later on.
The covariance between two independent random variables is 0.
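A small exact computation illustrates both the formula $E(XY) - \mu_X \mu_Y$ and the zero covariance of independent variables. The sketch assumes two fair dice, with X and Y the two outcomes and Z = X + Y their sum; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely (x, y) pairs
p = Fraction(1, len(outcomes))

def expect(g):
    """E[g(X, Y)] under equally likely outcomes."""
    return sum(g(x, y) * p for x, y in outcomes)

def cov(g, h):
    """Cov(G, H) = E(GH) - E(G) E(H)."""
    return expect(lambda x, y: g(x, y) * h(x, y)) - expect(g) * expect(h)

def first(x, y):
    return x

def second(x, y):
    return y

def total(x, y):
    return x + y

cov_xy = cov(first, second)  # independent dice
cov_xz = cov(first, total)   # first die and the sum

print(cov_xy, cov_xz)  # 0 35/12
```

The second value equals Var(X), since Cov(X, X + Y) = Var(X) + Cov(X, Y).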
Correlation
The correlation of random variables X and Y is defined as
$$\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}.$$
Correlation is a normalized version of covariance: how big is the covariance relative to the variation in X and Y? Both will have the same sign.
Sample analogues for covariance and correlation
Of course, we can get an unbiased estimator for covariance:
$$\hat{\sigma}_{XY} = \frac{1}{N - 1} \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}).$$
The sample analogue of correlation can be calculated using the preceding definitions.
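One way to assemble the sample correlation from these pieces is sketched below: the sample covariance divided by the product of the sample standard deviations, all using the N - 1 divisor. The function names and the five data points are hypothetical.

```python
from math import sqrt

def sample_cov(xs, ys):
    """Sample covariance with the N - 1 divisor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def sample_corr(xs, ys):
    """Sample correlation: covariance over the product of standard deviations."""
    # sample_cov(v, v) is the sample variance of v.
    return sample_cov(xs, ys) / sqrt(sample_cov(xs, xs) * sample_cov(ys, ys))

# Hypothetical paired observations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

print(sample_cov(xs, ys))   # 1.5
print(sample_corr(xs, ys))  # about 0.775
```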
Standardization
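Suppose that we take Y, subtract off its mean $\mu$ and divide by its standard deviation $\sigma$. We have
$$E\left[ \frac{Y - \mu}{\sigma} \right] = \frac{E[Y] - \mu}{\sigma} = 0$$
and
$$\operatorname{Var}\left( \frac{Y - \mu}{\sigma} \right) = \frac{1}{\sigma^2} \operatorname{Var}(Y - \mu) = \frac{1}{\sigma^2} \operatorname{Var}(Y) = 1.$$
This is called standardizing a random variable.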
Appendix: Additional examples
Example setup
Consider the roll of two dice and let X and Y be the outcomes on each die. Then the 36 (equally-likely) possibilities are:

x, y    1     2     3     4     5     6
1      1,1   1,2   1,3   1,4   1,5   1,6
2      2,1   2,2   2,3   2,4   2,5   2,6
3      3,1   3,2   3,3   3,4   3,5   3,6
4      4,1   4,2   4,3   4,4   4,5   4,6
5      5,1   5,2   5,3   5,4   5,5   5,6
6      6,1   6,2   6,3   6,4   6,5   6,6
Joint PMF example
The joint probability mass function (joint PMF), $f_{X,Y}$, is
$$f_{X,Y}(x, y) = \Pr(X = x \text{ and } Y = y)$$
What is $f_{X,Y}(6, 5)$?

x, y    1     2     3     4     5     6
1      1,1   1,2   1,3   1,4   1,5   1,6
2      2,1   2,2   2,3   2,4   2,5   2,6
3      3,1   3,2   3,3   3,4   3,5   3,6
4      4,1   4,2   4,3   4,4   4,5   4,6
5      5,1   5,2   5,3   5,4   5,5   5,6
6      6,1   6,2   6,3   6,4   6,5   6,6

Exactly one of the 36 equally likely cells, (6,5), matches, so
$$f_{X,Y}(6, 5) = \frac{1}{36}$$
Joint CDF definition
The joint cumulative distribution function (joint CDF), $F_{X,Y}(x, y)$, of the random variables X and Y is defined by
$$F_{X,Y}(x, y) = \Pr(X \le x \text{ and } Y \le y) = \sum_{s = -\infty}^{x} \sum_{t = -\infty}^{y} f_{X,Y}(s, t)$$
Joint CDF example
What is $F_{X,Y}(2, 3)$?

x, y    1     2     3     4     5     6
1      1,1   1,2   1,3   1,4   1,5   1,6
2      2,1   2,2   2,3   2,4   2,5   2,6
3      3,1   3,2   3,3   3,4   3,5   3,6
4      4,1   4,2   4,3   4,4   4,5   4,6
5      5,1   5,2   5,3   5,4   5,5   5,6
6      6,1   6,2   6,3   6,4   6,5   6,6

Six cells have x of at most 2 and y of at most 3, so
$$F_{X,Y}(2, 3) = \frac{6}{36} = \frac{1}{6}$$
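The joint PMF and joint CDF for the two dice can be tabulated directly. This is a sketch; `joint_pmf` and `joint_cdf` are illustrative names.

```python
from fractions import Fraction
from itertools import product

# Joint PMF of two fair dice: each of the 36 (x, y) pairs has probability 1/36.
joint_pmf = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

def joint_cdf(x, y, pmf):
    """F(x, y) = Pr(X <= x and Y <= y): sum the joint PMF over s <= x, t <= y."""
    return sum((p for (s, t), p in pmf.items() if s <= x and t <= y), Fraction(0))

print(joint_pmf[(6, 5)])           # 1/36
print(joint_cdf(2, 3, joint_pmf))  # 1/6
```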
Joint normal PDF
Figure: Joint PDF of independent normals, plotted over X and Y from -4 to 4; vertical axis: Density
Conditional probability example
What is $f(Y = 3 \mid X \le 2)$?

        1     2     3     4     5     6
1      1,1   1,2   1,3   1,4   1,5   1,6
2      2,1   2,2   2,3   2,4   2,5   2,6

$$f_{Y \mid X}(y = 3 \mid X \le 2) = \frac{2}{12} = \frac{1}{6}$$
Note how our table changed dimensions because conditioning is all about changing the range of values that we care about; here, we only care about what happens if $X \le 2$.
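The "shrink the table, then count" step can be sketched as a filter over the 36 outcomes. The helper `conditional_probability` is hypothetical and relies on all outcomes being equally likely.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # (x, y) pairs, each with probability 1/36

def conditional_probability(event, given):
    """Pr(event | given) under equally likely outcomes: count within the subgroup."""
    given_outcomes = [o for o in outcomes if given(o)]  # the reduced table
    both = [o for o in given_outcomes if event(o)]
    return Fraction(len(both), len(given_outcomes))

p = conditional_probability(
    event=lambda o: o[1] == 3,  # Y = 3
    given=lambda o: o[0] <= 2,  # X <= 2
)
print(p)  # 1/6
```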
Independence example
We showed in the two dice example that $F_{X,Y}(2, 3) = \frac{1}{6}$, which is equal to
$$F_X(2) F_Y(3) = \frac{2}{6} \cdot \frac{3}{6} = \frac{1}{6}.$$
This is because the rolls of the two dice are intuitively independent: the result on one die has no bearing on that of the other.
A new random variable
Imagine instead that X is the outcome on the first die and Z is the sum of the outcomes on two dice. Then we have (columns index the outcome on the second die):

x, z    1      2      3      4      5      6
1      1,2    1,3    1,4    1,5    1,6    1,7
2      2,3    2,4    2,5    2,6    2,7    2,8
3      3,4    3,5    3,6    3,7    3,8    3,9
4      4,5    4,6    4,7    4,8    4,9    4,10
5      5,6    5,7    5,8    5,9    5,10   5,11
6      6,7    6,8    6,9    6,10   6,11   6,12

As we would imagine, the result of X influences the value of Z, so they shouldn't be independent.
Dependence example
Let's prove it: What is $F_{X,Z}(2, 5)$?

x, z    1      2      3      4      5      6
1      1,2    1,3    1,4    1,5    1,6    1,7
2      2,3    2,4    2,5    2,6    2,7    2,8
3      3,4    3,5    3,6    3,7    3,8    3,9
4      4,5    4,6    4,7    4,8    4,9    4,10
5      5,6    5,7    5,8    5,9    5,10   5,11
6      6,7    6,8    6,9    6,10   6,11   6,12

$$F_{X,Z}(2, 5) = \frac{7}{36} \ne \frac{5}{54} = \frac{2}{6} \cdot \frac{10}{36} = F_X(2) F_Z(5)$$
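The comparison can be reproduced by enumerating the 36 outcomes. The sketch below checks whether the joint CDF factorizes at a single point, both for the dependent pair (X, Z) and for the independent pair (X, Y); all helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely (x, y) pairs

def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def first(o):
    return o[0]          # X: first die

def second(o):
    return o[1]          # Y: second die

def total(o):
    return o[0] + o[1]   # Z: sum of both dice

def factorizes(u, v, a, b):
    """True when Pr(U <= a and V <= b) == Pr(U <= a) * Pr(V <= b)."""
    joint = prob(lambda o: u(o) <= a and v(o) <= b)
    return joint == prob(lambda o: u(o) <= a) * prob(lambda o: v(o) <= b)

f_xz = prob(lambda o: first(o) <= 2 and total(o) <= 5)
marginal_product = prob(lambda o: first(o) <= 2) * prob(lambda o: total(o) <= 5)

print(f_xz, marginal_product)           # 7/36 5/54
print(factorizes(first, total, 2, 5))   # False: X and Z are dependent
print(factorizes(first, second, 2, 3))  # True at this point, as in the independence example
```

A single point where the factorization fails is enough to rule out independence; a single point where it holds does not by itself establish it.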