37
A D P D S D E S D GOV Section : Probability Distributions & Sampling Distributions Konstantin Kashin Harvard University September , ese notes and accompanying code draw on the notes from Molly Roberts, Maya Sen, Iain Osgood, Brandon Stewart, and TF’s from previous years

GOV óþþþ Section ƒ: Probability Distributions & Sampling

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

GOV 2000 Section 4:Probability Distributions & Sampling

Distributions

Konstantin Kashin1

Harvard University

September 26, 2012

1�ese notes and accompanying code draw on the notes fromMolly Roberts,Maya Sen, Iain Osgood, Brandon Stewart, and TF’s from previous years

Page 2: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

Outline

Administrative Details

Probability Distributions

Sampling Distributions

Estimating Sampling Distributions

Page 3: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

Problem Set Expectations

▸ �ird problem set distributed yesterday.▸ Corrections for �rst problem set due next Tuesday.▸ Must be typeset using LATEX or Word and submitted as onedocument containing graphics and explanation electronically inpdf form

▸ Must be accompanied by source-able, commented code

Page 4: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

Midterm Exam

▸ Window of exam: Tuesday, October 9th (a�er class) - Sunday,October 14th at 11.59pm

▸ Needs to be completed in 5 hours▸ Open note / book, but no collaboration allowed

Page 5: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

Outline

Administrative Details

Probability Distributions

Sampling Distributions

Estimating Sampling Distributions

Page 6: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

Sampling from Common Probability Distributions

How do we sample 10 observations from the normal distribution withµ = 0 and σ = 2 in Stata?

set seed 02138

set obs 10

drawnorm x, mean (0) sd(2)

How do we sample 10 observations from the uniform distribution onthe interval [0, 10] in R?

set seed 02138

set obs 10

gen u = 10*uniform ()

Page 7: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

What is the probability that a random variable liesin a particular subdomain?

X is the height (in meters) of the US adult population (18+). LetX ∼ N (1.7, 0.0625). What is the probability that X ∈ [1.5, 1.7], thatone’s height is between 1.5 and 1.7 meters?

0.0

0.5

1.0

1.5

0.5 1.0 1.5 2.0 2.5 3.0X (Height in Meters)

Density

Page 8: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

What is the probability that a random variable liesin a particular subdomain?

X is the height (in meters) of the US adult population (18+). LetX ∼ N (1.7, 0.0625). What is the probability that X ∈ [1.5, 1.7], thatone’s height is between 1.5 and 1.7 meters?

0.0

0.5

1.0

1.5

0.5 1.0 1.5 2.0 2.5 3.0X (Height in Meters)

Density

Page 9: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

Analytic Approach: CDFs

P(1.5 ≤ X ≤ 1.7) = P(X ≤ 1.7) − P(X ≤ 1.5) = FX(1.7) − FX(1.5)

In Stata:

di normal ((1.7 -1.7)/0.25) - normal ((1.5 -1.7)/0.25)

∴P(1.5 ≤ X ≤ 1.7) ≈ 0.288

Page 10: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

Simulation Approach

1. Sample from distribution of interest to approximate it2. Calculate proportion of observations in sample that fall in

subdomain of interest

In Stata:

clear

set seed 02138

set obs 1000

drawnorm x, mean (1.7) sd (0.25)

gen area = x < 1.7 & x > 1.5

quietly sum area

di r(mean)

∴P(1.5 ≤ X ≤ 1.7) ≈ 0.2839

Page 11: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

Outline

Administrative Details

Probability Distributions

Sampling Distributions

Estimating Sampling Distributions

Page 12: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

Our Data

We are going to work with fulton.csv.

▸ Election data at precinct level for Fulton County, Georgia.▸ Population: 268 precincts▸ Variables: turnout rate, % black, % female, mean age, turnout inDem. primary, turnout in Rep. primary, dummy variable forwhether precinct is in Atlanta, and location dummies for pollingstations

Let’s de�ne X = % black

Page 13: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

Visualizing the Population

What does the population of X = % black look like?

0

20

40

60

80

0.0 0.2 0.4 0.6 0.8 1.0% Black in Precinct (X)

Frequency

Page 14: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

Calculate True (Population) Mean

sum black

di r(mean)

µ = 0.506

Page 15: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

Sampling of Precincts

Suppose we can only sample (SRS without replacement) n = 40precincts. How do we do this in Stata?

set seed 02138

sample 40, count

Page 16: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

Visualizing the Sample

What does the sample of X = % black look like?

0

5

10

15

0.0 0.2 0.4 0.6 0.8 1.0% Black in Precinct (X)

Frequency

Page 17: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

Calculate SampleMean (X̄)

sum black

di r(mean)

X̄ = 0.412

Page 18: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

Connecting this to Resampling

We can think of our sample as one of many possible samples we candraw from our population:

Samples from Pop.1 2 3 ⋯ 10000

X1 0.496 x1,2 x1,3 ⋯ x1,10000X2 0.320 x2,2 x2,3 ⋯ x2,10000⋮ ⋮ ⋮ ⋮ ⋱ ⋮X40 0.172 x40,2 x40,3 ⋯ x40,10000X̄40 0.412 x̄2 x̄3 ⋯ x̄10000S̄2 0.160 s22 s23 ⋯ s210000

Page 19: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

Connecting this to Resampling

We can think of our sample as one of many possible samples we candraw from our population:

Samples from Pop.1 2 3 ⋯ 10000

X1 0.496 x1,2 x1,3 ⋯ x1,10000X2 0.320 x2,2 x2,3 ⋯ x2,10000⋮ ⋮ ⋮ ⋮ ⋱ ⋮X40 0.172 x40,2 x40,3 ⋯ x40,10000

µ̂ = X̄40 0.412 x̄2 x̄3 ⋯ x̄10000σ̂2 = S̄2 0.160 s22 s23 ⋯ s210000

Page 20: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

Sampling Distribution of X̄

�e sampling distribution of X̄ is the distribution of the followingvector:

Samples from Pop.1 2 3 ⋯ 10000

X1 0.496 x1,2 x1,3 ⋯ x1,10000X2 0.320 x2,2 x2,3 ⋯ x2,10000⋮ ⋮ ⋮ ⋮ ⋱ ⋮X40 0.172 x40,2 x40,3 ⋯ x40,10000

µ̂ = X̄40 0.412 x̄2 x̄3 ⋯ x̄10000σ̂2 = S̄2 0.160 s22 s23 ⋯ s210000

Page 21: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

Calculating Sampling Distribution with KnownPopulation

1. Start with complete population.2. De�ne a quantity of interest (the parameter). For us, it’s µ.3. Choose a plausible estimator. For us, it’s µ̂ = X̄.4. Draw a sample from the population and calculate the estimate

using the estimator.5. Repeat step 4 many times.

Page 22: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

Calculating Sampling Distribution with KnownPopulation

We can approximate a sampling distribution for the mean andvariance by doing the sampling we did before repeatedly and manuallystoring the estimates.set seed 02138

insheet using fulton.csv , comma clear

sample 40, count

quietly sum receipts , detail

di r(mean)

di r(Var)

insheet using fulton.csv , comma clear

sample 40, count

quietly sum receipts , detail

di r(mean)

di r(Var)

*** and so on...

Page 23: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

Visualizing Sampling Distributions

We can now plot the manually stored estimates of the populationmean.

matrix m=(0.4117 \ 0.5123 \ 0.5398 \ 0.6038\ 0.5376\

0.5901\ 0.5087\ 0.4707\ 0.5044\ 0.5646)

svmat m, name(simu)

hist(simu)

Page 24: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

Can we automate this a bit more?

insheet using fulton.csv , comma clear

foreach i of numlist 1/20 {

preserve

quietly sample 40, count

quietly sum black

mat holder = (nullmat(holder) \ r(mean))

restore

}

svmat holder

hist holder1 , frequency

Page 25: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

Visualizing Sampling Distributions

Here is the sampling distribution (with 10,000 samples) for X̄:

0

2

4

6

0.3 0.4 0.5 0.6 0.7Sample Mean

Density

Page 26: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

Visualizing Sampling Distributions

Here is the sampling distribution (with 10,000 samples) for X̄:

0

2

4

6 Original Sample Mean Pop. Mean

0.3 0.4 0.5 0.6 0.7Sample Mean

Density

Page 27: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

Characterizing Sampling Distribution of SampleMean

We know that in large sample (large enough for Central Limit�eorem to kick in), the sample mean will be distributed as:

X̄ ∼ N (µ, σ2/n)

Page 28: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

Sampling Distributions for Other Statistics

We can calculate in�nitely many statistics from a sample. If we havethe population, we can simulate a sampling distribution for those!

For example, we can look at the sampling distribution of S - the samplestandard deviation.

Page 29: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

Visualizing Sampling Distributions

Here is the sampling distribution (with 10,000 draws) for S:

0

5

10

15

20

25

0.35 0.40 0.45Sample Standard Deviation

Density

Page 30: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

Outline

Administrative Details

Probability Distributions

Sampling Distributions

Estimating Sampling Distributions

Page 31: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

Sampling Distribution of X̄ with UnknownPopulation

In reality, we only draw one sample and cannot directly observe thesampling distribution of X̄:

Samples from Pop.1 2 3 ⋯ 10000

X1 0.496 ? ? ⋯ ?X2 0.320 ? ? ⋯ ?⋮ ⋮ ⋮ ⋮ ⋱ ⋮X40 0.172 ? ? ⋯ ?

µ̂ = X̄40 0.412 ? ? ⋯ ?σ̂2 = S̄2 0.160 ? ? ⋯ ?

Page 32: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

Estimating the Sampling Distribution of SampleMean

We can estimate the equation in the previous slide using estimators forthe population mean µ and the population variance σ2. We will usethe following estimators:

▸ µ̂ = X̄ = 1n ∑

ni=1 Xi

▸ σ̂2 = S2X = 1n−1 ∑

ni=1(Xi − X̄)2

�e estimated sampling distribution is thus:

X̄ ∼ N (X̄, S2X/n)

Page 33: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

Bootstrapping the Sampling Distribution of SampleMean

1. Start with sample.2. De�ne a quantity of interest (the parameter). For us, it’s µ.3. Choose a plausible estimator. For us, it’s µ̂ = X̄.4. Take a resample of size n (with replacement) from the sample and

calculate the estimate using the estimator.5. Repeat step 4 many times (we will do 10,000).

Page 34: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

Visualizing the Bootstrapped Sampling Distribution

Here is the bootstrapped sampling distribution:

0

2

4

6

Sample Mean

0.2 0.3 0.4 0.5 0.6 0.7Sample Mean

Density

Page 35: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

Visualizing the Bootstrapped Sampling Distribution

However, any given sample may be far away from the truth!

0

2

4

6

Sample Mean Population Mean

0.2 0.3 0.4 0.5 0.6 0.7Sample Mean

Density

Page 36: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

Visualizing the Bootstrapped Sampling Distribution

However, any given sample may be far away from the truth!

0

2

4

6

Sample Mean Population Mean

0.2 0.3 0.4 0.5 0.6 0.7Sample Mean

Density

Page 37: GOV óþþþ Section ƒ: Probability Distributions & Sampling

Administrative Details Probability Distributions Sampling Distributions Estimating Sampling Distributions

Conclusion

ANY QUESTIONS?