# Analogues Slides


• 8/6/2019 Analogues Slides


Probability and Statistics:

A Sample Analogues Approach

Charlie Gibbons

Economics 140, University of California, Berkeley

Summer 2011


Outline

1. Populations and samples
2. Probability
   - Simple probability
   - Joint probabilities
   - Conditional probability
   - Independence
3. Expectations
4. Dispersion
   - Variance
   - Covariance


Populations and samples

The population is the universe of units that you care about. Example: American adults.

A sample is a subset of the population. Example: the observations in the Current Population Survey.

Econometrics uses a sample to make inferences about the population. Sample statistics have population analogues.


Sample frequencies

We begin with some random quantity $Y$ that takes on $K$ possible values $y_1, \dots, y_K$. The value of this random variable for observation $i$ is $y_i$; $y_i$ is a realization of the random variable $Y$. Example: the roll of a die can take on values $1, \dots, 6$.

We ask, what is the sample frequency of some $y$ from the set $y_1, \dots, y_K$? That is, what fraction of our observations have an observed value of $y$?

All we do is count the number of observations that have the value $y$ and divide by the number of observations:

$$f(y) = \frac{\#\{y_i = y\}}{N}.$$
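The count-and-divide recipe above can be sketched in a few lines of Python (the function name and sample data are hypothetical):

```python
from collections import Counter

def sample_frequency(observations, y):
    """f(y) = #{y_i = y} / N: the share of observations equal to y."""
    counts = Counter(observations)
    return counts[y] / len(observations)

rolls = [1, 3, 3, 6, 2, 3, 5, 1]      # a hypothetical sample of die rolls
print(sample_frequency(rolls, 3))     # 3 of 8 observations equal 3 -> 0.375
```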


Probability mass function

We typically define the probability of $y$ as the fraction of times that it arises if we had infinite observations; that is, the sample frequency of $y$ in an infinite sample.

We write this as $\Pr(Y = y)$: the probability that a random variable $Y$ takes on the value $y$. Example: the probability of getting some value $y \in \{1, \dots, 6\}$ when you roll a die is $\Pr(Y = y) = \tfrac{1}{6}$ for all $y$.

Terminology: $y$ is a realization of the random variable $Y$, and $\Pr(Y = y)$ is a probability mass function.


Cumulative distribution function

We might care about the probability that $Y$ takes on a value of $y$ or less: $\Pr(Y \le y)$. This is called the cumulative distribution function (CDF) of $Y$.

To get this, we add up the probability of getting any value less than or equal to $y$:

$$F(y) \equiv \Pr(Y \le y) = \sum_{y_j \le y} \Pr(Y = y_j).$$

Example: when you roll a die, the probability of getting a 3 or less is

$$F(3) = \Pr(Y \le 3) = \Pr(Y = 1) + \Pr(Y = 2) + \Pr(Y = 3) = \tfrac{1}{2}.$$
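The die example can be checked by summing the PMF directly; a minimal sketch using exact fractions (the `pmf`/`cdf` names are my own):

```python
from fractions import Fraction

pmf = {y: Fraction(1, 6) for y in range(1, 7)}   # fair six-sided die

def cdf(y):
    """F(y) = sum of Pr(Y = y_j) over all values y_j <= y."""
    return sum(p for yj, p in pmf.items() if yj <= y)

print(cdf(3))   # 1/2
```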


Continuous random variables

Life is pretty simple when we have a finite number of $y$ values, but what if we have an infinite number?

The definition of the sample frequency doesn't change, but often the frequency of any particular value of $y$ will be $\tfrac{1}{N}$; i.e., only one observation will have that value.


Probability density function

Instead of a probability mass function, we have a probability density function, defined as the derivative of the CDF:

$$f(y) = \frac{d}{dy} F(y).$$

Intuition: the derivative of the CDF answers, how much does the total probability change if we consider a slightly bigger value of $y$? How much more probable is getting a value less than $y$ if we make $y$ a bit bigger? This additional contribution to probability from a small change in $y$ is the probability density at $y$. Note: for continuous random variables, the CDF is the integral of the PDF (cf. for discrete random variables, the CDF is the sum of the PMF).
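The derivative relationship can be verified numerically for the standard normal; a small sketch using the standard library's `statistics.NormalDist` (the step size `h` is an arbitrary choice):

```python
from statistics import NormalDist

Z = NormalDist()          # standard normal distribution
y, h = 0.5, 1e-6

# central-difference approximation to d/dy F(y)
slope = (Z.cdf(y + h) - Z.cdf(y - h)) / (2 * h)
print(abs(slope - Z.pdf(y)) < 1e-6)   # True: the slope of the CDF is the PDF
```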


CDF-PDF example


Figure: Normal CDF and PDF; slope of CDF line is height of PDF line


Joint probabilities

Suppose that we have two random variables, $X$ and $Y$, and want to consider their joint frequency in the sample. Extending our previous definition, we have

$$f(x, y) = \frac{\#\{y_i = y \text{ and } x_i = x\}}{N}.$$

These are often called cross tabs (tabulations).

We have obvious extensions to a joint PMF, $\Pr(X = x, Y = y)$, joint PDF, $f(x, y)$, and joint CDF, $F(x, y)$.

Examples: joint PMF; joint CDF (discrete); joint normal PDF; joint normal CDF.
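A cross tab is just pair counting; a minimal sketch with hypothetical paired data:

```python
from collections import Counter

# hypothetical paired observations (x_i, y_i)
pairs = [(0, 1), (0, 1), (1, 0), (1, 1), (0, 0), (1, 1)]
N = len(pairs)

crosstab = Counter(pairs)                        # #{x_i = x and y_i = y}
f = {xy: c / N for xy, c in crosstab.items()}    # joint sample frequency

print(f[(1, 1)])   # 2 of the 6 pairs have x = 1 and y = 1
```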


Conditional frequencies

Suppose that we have two random variables, but we want to consider the distribution of one for some fixed value of the other. That is, what is the distribution of $Y$ when $X = x$?

Note that we are limiting our sample: we only care about the observations such that $x_i = x$. Of this subgroup, what is the frequency of $y$? Example: what is the distribution of student heights given that they are male?

$$f(y \mid X = x) = \frac{\#\{y_i = y \text{ and } x_i = x\}}{\#\{x_i = x\}}.$$

This is the sample frequency of $y$ given, or conditional upon, $X$ being $x$: the conditional sample frequency.


Conditional probability

The population analogue of the conditional frequency, the conditional probability of $Y$, forms the core of econometrics. The probability that $Y$ takes the value $y$ given that $X$ takes the value $x$ is

$$\Pr(Y = y \mid X = x) = \frac{\Pr(Y = y \text{ and } X = x)}{\Pr(X = x)}.$$

We divide by the probability that $X = x$ to account for the fact that we are only considering a subpopulation.

Example: conditional probabilities and dice.


Dictatorships and growth

Example from Bill Easterly's Benevolent Autocrats (2011).

Growth Commission Report, World Bank

Growth at such a quick pace, over such a long period, requires strong political leadership.

Thomas Friedman, NY Times

One-party autocracy certainly has its drawbacks. But when it is led by a reasonably enlightened group of people, as China is today, it can also have great advantages. That one party can just impose the politically difficult but critically important policies needed to move a society forward in the 21st century.


Wrong question, wrong interpretation

|                | Autocracy | Democracy |
|----------------|-----------|-----------|
| Growth success | 9         | 1         |

$$f(\text{Autocracy} \mid \text{Success}) = \frac{9}{9 + 1} = 90\%$$

$$f(\text{Democracy} \mid \text{Success}) = \frac{1}{9 + 1} = 10\%$$


Typical question

Pr(outcome | treatment and other predictors).


Independence

$X$ and $Y$ are independent if and only if

$$F_{X,Y}(x, y) = F_X(x)\,F_Y(y)$$

(note: these are the population CDFs) and

$$f_{X,Y}(x, y) = f_X(x)\,f_Y(y).$$

We also see that $X$ and $Y$ are independent if and only if

$$f_{Y|X}(y \mid X = x) = f_Y(y) \quad \forall x \in \mathcal{X}.$$

Example: what's the probability of getting heads on a second coin toss if you got heads on the first?

This implies that knowing $X$ gives you no additional ability to predict $Y$, an intuitive notion underlying independence. Example: independence and dependence.


Sample average

We are all familiar with the sample average of $Y$: add up all the observed values and divide by $N$:

$$\bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i.$$

Alternatively, we can consider every possible value of $Y$, $y_1, \dots, y_K$, and multiply each by its sample frequency:

$$\bar{y} = \sum_{j=1}^{K} y_j \, \frac{\#\{y_i = y_j\}}{N}.$$
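The two formulas give the same number; a quick sketch with hypothetical data:

```python
from collections import Counter

ys = [2, 4, 4, 6, 6, 6]
N = len(ys)

direct = sum(ys) / N                                       # (1/N) * sum of y_i
by_freq = sum(y * c / N for y, c in Counter(ys).items())   # sum of y_j * f(y_j)

print(direct, by_freq)   # both are 28/6, about 4.667
```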


Expectations

The population version is the expectation: take each value that $Y$ can take on and multiply by its probability (as opposed to its sample frequency):

$$E(Y) = \sum_{j=1}^{K} y_j \Pr(Y = y_j).$$

For a continuous random variable, we turn sums into integrals:

$$E(Y) = \int y f(y) \, dy.$$


Expectations of functions

We can calculate expectations of functions of $Y$, $g(Y)$. We have the equations

$$E[g(Y)] = \sum_{y \in \mathcal{Y}} f(y)\,g(y)$$

$$E[g(Y)] = \int g(y)\,f(y) \, dy$$

for discrete and continuous random variables, respectively.


Expectations of functions example

Note that, in general, $E[g(Y)] \neq g[E(Y)]$. Using the die-rolling example,

$$E(Y^2) = 1^2 \cdot \tfrac{1}{6} + 2^2 \cdot \tfrac{1}{6} + 3^2 \cdot \tfrac{1}{6} + 4^2 \cdot \tfrac{1}{6} + 5^2 \cdot \tfrac{1}{6} + 6^2 \cdot \tfrac{1}{6} = \tfrac{91}{6} \approx 15.17$$

$$\neq 3.5^2 = 12.25 = [E(Y)]^2.$$

So $E(Y^2) \neq [E(Y)]^2$.
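The die computation can be done exactly with rational arithmetic; a short sketch:

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)          # each face of a fair die

EY = sum(y * p for y in faces)         # E(Y)   = 7/2
EY2 = sum(y**2 * p for y in faces)     # E(Y^2) = 91/6

print(EY2, EY**2)   # 91/6 49/4 -> E(Y^2) differs from [E(Y)]^2
```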


Expectations are linear

Expectations are linear operators, i.e.,

$$E(a\,g(Y) + b\,h(Y) + c) = a\,E[g(Y)] + b\,E[h(Y)] + c.$$


Expectations and independence

Recall that, for independent random variables X and Y,

$$f_{Y|X}(y \mid X = x) = f_Y(y) \quad \text{and} \quad f_{X|Y}(x \mid Y = y) = f_X(x).$$

Hence,

$$E(Y \mid X) = E(Y) \quad \text{and} \quad E(X \mid Y) = E(X).$$


Conditional expectations

The conditional expectation $E[Y \mid X = x]$ asks, what is the average value of $Y$ given that $X$ takes on the value $x$?

Conditional expectations hold $X$ fixed at some $x$, and the value $E[Y \mid X = x]$ varies depending upon which $x$ we pick.

Since $X$ is fixed, it isn't random and can come out of the expectation:

$$E[g(X)Y + h(X) \mid X = x] = g(x)\,E[Y \mid X = x] + h(x).$$


Law of iterated expectations

The law of iterated expectations says that

$$E_Y[Y] = E_X\big[E[Y \mid X = x]\big];$$

the expectation of $Y$ is the conditional expectation of $Y$ at $X = x$ averaged over all possible values of $X$.

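The law of iterated expectations has a direct sample analogue: the overall average equals the group averages weighted by group frequencies. A sketch with hypothetical data, using exact fractions so the equality is exact:

```python
from collections import defaultdict
from fractions import Fraction

# hypothetical sample of (x_i, y_i) pairs
data = [(0, 1), (0, 3), (1, 5), (1, 7), (1, 9)]

groups = defaultdict(list)
for x, y in data:
    groups[x].append(y)

# E_X[ E[Y | X = x] ]: conditional means weighted by the frequency of each x
iterated = sum(Fraction(len(ys), len(data)) * Fraction(sum(ys), len(ys))
               for ys in groups.values())
overall = Fraction(sum(y for _, y in data), len(data))

print(iterated, overall)   # 5 5
```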


Variance

The variance of a random variable is a measure of its dispersion around its mean. It is defined as the second central moment of $Y$:

$$\sigma_Y^2 \equiv \operatorname{Var}(Y) = E\left[(Y - \mu)^2\right].$$

Multiplying this out yields

$$\operatorname{Var}(Y) = E\left[Y^2 - 2\mu Y + \mu^2\right] = E\left[Y^2\right] - 2\mu\,E(Y) + \mu^2 = E\left[Y^2\right] - [E(Y)]^2.$$


Same mean, different variance



Variance facts

The standard deviation, $\sigma$, of a random variable is the square root of its variance; i.e., $\sigma = \sqrt{\operatorname{Var}(Y)}$.

While the variance is in squared units, the standard deviation is in the same units as $y$.

See that $\operatorname{Var}(aY + b) = a^2 \operatorname{Var}(Y)$.


Sample analogue of variance

A candidate for the sample analogue of the variance of $Y$ is

$$\hat{\sigma}^2 = \frac{1}{N} \sum_{i=1}^{N} (y_i - \bar{y})^2.$$

It turns out that this is a biased estimator of $\sigma^2$, so we use

$$s^2 = \frac{1}{N - 1} \sum_{i=1}^{N} (y_i - \bar{y})^2$$

instead to get an unbiased estimator.

It turns out that the first estimator is still consistent; its bias goes to 0 as $N$ goes to $\infty$.
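The standard library exposes both estimators; a quick sketch with hypothetical data:

```python
import statistics

ys = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

biased = statistics.pvariance(ys)    # divides by N     (population formula)
unbiased = statistics.variance(ys)   # divides by N - 1 (unbiased estimator)

print(biased, unbiased)   # 32/8 = 4.0 versus 32/7, about 4.571
```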


Covariance

The covariance of random variables $X$ and $Y$ is defined as

$$\operatorname{Cov}(X, Y) \equiv \sigma_{XY} = E\left[(X - E(X))(Y - E(Y))\right] = E(XY) - \mu_X \mu_Y.$$

We have

$$\operatorname{Var}(aX + bY) = a^2 \operatorname{Var}(X) + b^2 \operatorname{Var}(Y) + 2ab \operatorname{Cov}(X, Y).$$

Note that covariance only measures the linear relationship between two random variables; we'll see just what this means later on.

The covariance between two independent random variables is 0.


Correlation

The correlation of random variables X and Y is defined as

$$\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}.$$

Correlation is a normalized version of covariance: how big is the covariance relative to the variation in $X$ and $Y$? Both will have the same sign.


Sample analogues for covariance and correlation

Of course, we can get an unbiased estimator for covariance:

$$\hat{\sigma}_{XY} = \frac{1}{N - 1} \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}).$$

The sample analogue of correlation can be calculated using the preceding definitions.


Standardization

Suppose that we take $Y$, subtract off its mean $\mu$, and divide by its standard deviation $\sigma$. We have

$$E\left[\frac{Y - \mu}{\sigma}\right] = \frac{E[Y] - \mu}{\sigma} = 0$$

and

$$\operatorname{Var}\left(\frac{Y - \mu}{\sigma}\right) = \frac{1}{\sigma^2}\operatorname{Var}(Y - \mu) = \frac{1}{\sigma^2}\operatorname{Var}(Y) = 1.$$

This is called standardizing a random variable.
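The sample version of standardizing is a one-liner per observation; a sketch with hypothetical data (here the population formulas, dividing by $N$, are used):

```python
import statistics

ys = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mu = statistics.mean(ys)
sigma = statistics.pstdev(ys)          # population standard deviation

zs = [(y - mu) / sigma for y in ys]    # standardized values

print(statistics.mean(zs), statistics.pvariance(zs))   # mean 0, variance 1
```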



Example setup

Consider the roll of two dice and let $X$ and $Y$ be the outcomes on each die. Then the 36 (equally likely) possibilities are:

| $x \backslash y$ | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | 1,1 | 1,2 | 1,3 | 1,4 | 1,5 | 1,6 |
| 2 | 2,1 | 2,2 | 2,3 | 2,4 | 2,5 | 2,6 |
| 3 | 3,1 | 3,2 | 3,3 | 3,4 | 3,5 | 3,6 |
| 4 | 4,1 | 4,2 | 4,3 | 4,4 | 4,5 | 4,6 |
| 5 | 5,1 | 5,2 | 5,3 | 5,4 | 5,5 | 5,6 |
| 6 | 6,1 | 6,2 | 6,3 | 6,4 | 6,5 | 6,6 |


Joint PMF example

The joint probability mass function (joint PMF), $f_{X,Y}$, is

$$f_{X,Y}(x, y) = \Pr(X = x \text{ and } Y = y).$$

What is $f_{X,Y}(6, 5)$? Exactly one of the 36 equally likely outcomes in the table above is $(6, 5)$, so

$$f_{X,Y}(6, 5) = \frac{1}{36}.$$
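The full joint PMF of two fair dice can be enumerated directly; a minimal sketch (the `pmf` name is my own):

```python
from itertools import product
from fractions import Fraction

# joint PMF of two fair dice: every (x, y) pair is equally likely
pmf = {xy: Fraction(1, 36) for xy in product(range(1, 7), repeat=2)}

print(pmf[(6, 5)])         # 1/36
print(sum(pmf.values()))   # 1: the probabilities sum to one
```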


Joint CDF definition

The joint cumulative distribution function (joint CDF), $F_{X,Y}(x, y)$, of the random variables $X$ and $Y$ is defined by

$$F_{X,Y}(x, y) = \Pr(X \le x \text{ and } Y \le y) = \sum_{s \le x} \sum_{t \le y} f_{X,Y}(s, t).$$


Joint CDF example

What is $F_{X,Y}(2, 3)$? The outcomes in the table above with $X \le 2$ and $Y \le 3$ are the six pairs $(1,1), (1,2), (1,3), (2,1), (2,2), (2,3)$, so

$$F_{X,Y}(2, 3) = \frac{6}{36} = \frac{1}{6}.$$
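The counting argument translates directly into code; a sketch (the `F` name is my own):

```python
from itertools import product
from fractions import Fraction

outcomes = list(product(range(1, 7), repeat=2))   # all 36 (x, y) rolls

def F(x, y):
    """Joint CDF: fraction of outcomes with X <= x and Y <= y."""
    hits = sum(1 for a, b in outcomes if a <= x and b <= y)
    return Fraction(hits, len(outcomes))

print(F(2, 3))   # 1/6
```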


Joint PDF of independent normals



Conditional probability example



What is $f(Y = 3 \mid X \le 2)$? Conditioning on $X \le 2$ leaves only the first two rows of the table, 12 outcomes:

| $x \backslash y$ | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | 1,1 | 1,2 | 1,3 | 1,4 | 1,5 | 1,6 |
| 2 | 2,1 | 2,2 | 2,3 | 2,4 | 2,5 | 2,6 |

$$f_{Y|X}(y = 3 \mid X \le 2) = \frac{2}{12} = \frac{1}{6}.$$

Note how our table changed dimensions: conditioning is all about changing the range of values that we care about; here, we only care about what happens if $X \le 2$.
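The shrinking of the table corresponds to filtering the outcome list before counting; a short sketch:

```python
from itertools import product
from fractions import Fraction

outcomes = list(product(range(1, 7), repeat=2))   # all 36 (x, y) rolls

kept = [(x, y) for x, y in outcomes if x <= 2]    # condition on X <= 2
freq = Fraction(sum(1 for _, y in kept if y == 3), len(kept))

print(freq)   # 1/6: 2 of the 12 remaining outcomes have Y = 3
```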

Independence example



We showed in the two-dice example that $F_{X,Y}(2, 3) = \tfrac{1}{6}$, which is equal to

$$F_X(2)\,F_Y(3) = \frac{2}{6} \cdot \frac{3}{6} = \frac{1}{6}.$$

This is because the rolls of the two dice are intuitively independent: the result on one die has no bearing on that of the other.

A new random variable


Imagine instead that $X$ is the outcome on the first die and $Z$ is the sum of the outcomes on the two dice. Then we have, with each cell giving $(x, z)$ and columns indexed by the second die:

| $x \backslash \text{die 2}$ | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | 1,2 | 1,3 | 1,4 | 1,5 | 1,6 | 1,7 |
| 2 | 2,3 | 2,4 | 2,5 | 2,6 | 2,7 | 2,8 |
| 3 | 3,4 | 3,5 | 3,6 | 3,7 | 3,8 | 3,9 |
| 4 | 4,5 | 4,6 | 4,7 | 4,8 | 4,9 | 4,10 |
| 5 | 5,6 | 5,7 | 5,8 | 5,9 | 5,10 | 5,11 |
| 6 | 6,7 | 6,8 | 6,9 | 6,10 | 6,11 | 6,12 |

As we would imagine, the result of $X$ influences the value of $Z$, so they shouldn't be independent.

Dependence example


Let's prove it: what is $F_{X,Z}(2, 5)$? Counting cells in the table above with $x \le 2$ and $z \le 5$ gives 7 of the 36 outcomes, so

$$F_{X,Z}(2, 5) = \frac{7}{36} \neq \frac{5}{54} = \frac{2}{6} \cdot \frac{10}{36} = F_X(2)\,F_Z(5).$$

So $X$ and $Z$ are not independent.
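The dependence check can be carried out by brute-force enumeration over the 36 rolls; a sketch:

```python
from itertools import product
from fractions import Fraction

rolls = list(product(range(1, 7), repeat=2))   # (first die, second die)
N = len(rolls)

# X = first die, Z = sum of the two dice
F_XZ = Fraction(sum(1 for d1, d2 in rolls if d1 <= 2 and d1 + d2 <= 5), N)
F_X = Fraction(sum(1 for d1, _ in rolls if d1 <= 2), N)
F_Z = Fraction(sum(1 for d1, d2 in rolls if d1 + d2 <= 5), N)

print(F_XZ, F_X * F_Z)   # 7/36 5/54 -> X and Z are dependent
```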