Chapter 9 Sampling Distributions

Page 1: Stats chapter 9

Chapter 9

Sampling Distributions

Page 2: Stats chapter 9

9.1 SAMPLING DISTRIBUTIONS

Page 3: Stats chapter 9

Definitions

Parameter

• the value of a characteristic for the entire population, obtained through a census

• in practice, usually an unknown value that must be estimated

Page 4: Stats chapter 9

Definitions

Statistic

• the value of a characteristic computed from a sample of the population

• In practice, the value of a statistic is used to estimate the parameter

Page 5: Stats chapter 9

Sampling Variability

• Random samples will produce different values for a statistic

• A statistic usually does not equal the parameter exactly

• Different samples produce different values (typically “close” to the parameter)

• This fact is known as sampling variability

• The value of a statistic for the same parameter varies in repeated sampling.

Page 6: Stats chapter 9

Parameters vs. Statistics

Parameter                          Statistic
Mean of a population: μ            Mean of a sample: x̄ (x-bar)
Proportion of a population: p      Proportion of a sample: p̂ (p-hat)

Page 7: Stats chapter 9

Sampling Distribution

• All possible samples of size n are taken from a population of size N, and the statistic is computed for each one

• A histogram of these sample statistics is created

• This distribution is called the “sampling distribution”

• In practice, the sampling distribution is theorized, but never “created”

Page 8: Stats chapter 9

Creating a Sampling Distribution

• Let’s look at a population of N = 5 people who answered ‘yes’ or ‘no’ to the question “Do you like toast?”

• We want to know the proportion who say ‘yes’

• Here are the responses:

ID    Response
01    Yes
02    No
03    Yes
04    No
05    Yes

Page 9: Stats chapter 9

Creating a Sampling Distribution

• Let’s look at each possible sample and its p-hat for sample size n = 3

Sample #    IDs in sample    p-hat
1           01, 02, 03       0.67
2           01, 02, 04       0.33
3           01, 02, 05       0.67
4           01, 03, 04       0.67
5           01, 03, 05       1.00
6           01, 04, 05       0.67
7           02, 03, 04       0.33
8           02, 03, 05       0.67
9           02, 04, 05       0.33
10          03, 04, 05       0.67

• You can imagine that this quickly gets labor intensive!

Page 10: Stats chapter 9

Creating a Sampling Distribution

• Create a Histogram

Class        Count
0.00-0.24    0
0.25-0.49    3
0.50-0.74    6
0.75-1.00    1

• Notice that p = 0.6, and the mean of this sampling distribution (using the exact fractions 1/3, 2/3, and 1) is also 0.6

[Figure: histogram of the p-hat values; horizontal axis from 0 to 1, vertical axis is the count]
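The toast example is small enough to enumerate by machine. The sketch below is one way to do it, not part of the original slides: it lists every sample of size 3 from the five responses, computes p-hat for each, and compares the mean of the p-hats with p. The function name `sampling_distribution` is made up for illustration, and exact fractions are used so that rounding does not blur the comparison.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Responses from the slide: IDs 01-05, True means "yes".
responses = {"01": True, "02": False, "03": True, "04": False, "05": True}

def sampling_distribution(population, n):
    """Return the p-hat of every possible sample of size n (hypothetical helper)."""
    p_hats = []
    for ids in combinations(sorted(population), n):
        yes_count = sum(population[i] for i in ids)   # number of "yes" answers
        p_hats.append(Fraction(yes_count, n))
    return p_hats

p_hats = sampling_distribution(responses, 3)
counts = Counter(p_hats)                                # how often each p-hat occurs
mean_p_hat = sum(p_hats) / len(p_hats)                  # center of the sampling distribution
p = Fraction(sum(responses.values()), len(responses))   # population proportion

for value in sorted(counts):
    print(f"p-hat = {float(value):.2f}: {counts[value]} samples")
print("mean of p-hat:", float(mean_p_hat), "| p:", float(p))
```

Changing the `3` to another sample size shows how quickly the number of samples grows, which is why the sampling distribution is usually theorized rather than built by hand.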

Page 11: Stats chapter 9

Describing Sampling Distributions

• Like most 1-var data, we describe:

– Center
– Shape
– Spread
– Unusual features/Outliers

• If you are using a sample to estimate a parameter, ask of the sampling distribution:

– Where should the center be?
– What about the “ideal shape?”
– What would you like the spread to be?
– Would outliers be helpful?

Page 12: Stats chapter 9

Sampling Distribution and Bias

• When a statistic is unbiased, the mean of the sampling distribution is the value of the parameter.

– This is actually a pretty powerful statement.

– In order to find the value of the parameter, you just need to take a lot of samples! (wait, that’s not good either)

– Revision: If a statistic is unbiased, its values are centered on the parameter, so a sample has no systematic tendency to overestimate or underestimate it

• Statistics that are unbiased are called “unbiased estimators” (these are good)

Page 13: Stats chapter 9

Variability of a Statistic

• The spread of a sampling distribution is known as the variability of the statistic

• Larger sample size = less variability (smaller spread)
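A quick simulation can make the link between sample size and spread visible. This is only a sketch under stated assumptions: the population proportion p = 0.6, the sample sizes, the seed, and the helper name `simulate_p_hats` are all illustrative choices, and the population is treated as effectively infinite (each response is drawn independently).

```python
import random
import statistics

def simulate_p_hats(p, n, reps, rng):
    """Draw `reps` samples of size n and return the p-hat from each (hypothetical helper)."""
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

rng = random.Random(9)   # fixed seed so the run is repeatable
p = 0.6                  # assumed population proportion, for illustration only
spread = {}
for n in (25, 100, 400):
    # The std dev of the simulated p-hats estimates the variability of the statistic.
    spread[n] = statistics.stdev(simulate_p_hats(p, n, 1000, rng))
    print(f"n = {n:3d}: std dev of p-hat is about {spread[n]:.3f}")
```

The printed spreads should shrink as n grows, while the p-hats stay centered near p.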

Page 14: Stats chapter 9

The Enemies of Sampling

• Enemy #1: Bias

• Enemy #2: Variability

• A visual of the difference:

Page 15: Stats chapter 9

The Enemies of Sampling

• Another look with Histograms:

Page 16: Stats chapter 9

9.2 SAMPLE PROPORTIONS

Page 17: Stats chapter 9

Sampling Distribution for Proportions

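• For each sample, calculate p-hat:

$\hat{p} = \dfrac{X}{n} = \dfrac{\text{\# of successes}}{\text{sample size}}$

• The sampling distribution of p-hat will have:

– Mean: $\mu_{\hat{p}} = p$ (the parameter)

– Standard deviation: $\sigma_{\hat{p}} = \sqrt{\dfrac{pq}{n}}$, where $q = 1 - p$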

Page 18: Stats chapter 9

Sampling Distribution for Proportions

• Notice that this is an unbiased estimator!

• The standard deviation decreases as the sample size increases

• Std. dev. and sample size have an “inverse square root” relation: the std dev is proportional to 1/√n

– Ex. If we want ½ the std dev, we need 4x the sample size

– Ex. If we want 1/3 the std dev, we need 9x the sample size
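The two examples above can be checked directly from the formula sqrt(pq/n). A minimal sketch, assuming an illustrative p = 0.35 and a base sample size of 100; the function name `sd_p_hat` is not from the slides.

```python
import math

def sd_p_hat(p, n):
    """Standard deviation of p-hat: sqrt(p * q / n), with q = 1 - p."""
    return math.sqrt(p * (1 - p) / n)

p = 0.35                      # illustrative value
base = sd_p_hat(p, 100)
half = sd_p_hat(p, 400)       # 4x the sample size
third = sd_p_hat(p, 900)      # 9x the sample size

print(f"n = 100: {base:.4f}")
print(f"n = 400: {half:.4f} (ratio to base {half / base:.3f})")
print(f"n = 900: {third:.4f} (ratio to base {third / base:.3f})")
```

The ratios do not depend on the value chosen for p, since p and q cancel when the two standard deviations are divided.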

Page 19: Stats chapter 9

Sampling Distribution for Proportions

• We will (almost) always use the Normal approximation for the sampling distribution for p-hat.

• This means we will need some conditions:

1. We want “N > 10n”. This ensures our std dev formula holds.

2. We want np > 10 and nq > 10. This ensures our samp. dist. is approx. Normal.
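The two conditions are easy to turn into a small check. This is a sketch only: the function name `normal_approx_ok` and the convention that `N=None` stands for "the population is described as large" are assumptions made for illustration.

```python
def normal_approx_ok(n, p, N=None):
    """Check the two conditions for using the Normal approximation for p-hat.

    N=None is an assumed convention meaning the population is simply
    described as large, so condition 1 is taken as satisfied.
    """
    q = 1 - p
    population_ok = True if N is None else N > 10 * n   # condition 1: N > 10n
    counts_ok = n * p > 10 and n * q > 10               # condition 2: np > 10 and nq > 10
    return population_ok and counts_ok

print(normal_approx_ok(1500, 0.35))            # large population, np = 525, nq = 975
print(normal_approx_ok(20, 0.35))              # np = 7 is too small
print(normal_approx_ok(1500, 0.35, N=10000))   # population is not more than 10n
```

A check like this does not replace the written justification; it only confirms the arithmetic behind it.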

Page 20: Stats chapter 9

Samp Dist for Prop. (Example)

We are sampling from a large population. Our sample size is 1500. We know that p = 0.35. What is the probability that our sample proportion is more than 2 percentage points from the parameter?

Page 21: Stats chapter 9

Samp Dist for Prop. (Example)

• To summarize the problem, we are trying to find out what proportion of samples have a p-hat greater than 0.37 or less than 0.33

• It will be easier to use the complement rule and find “1 – P(0.33 < p-hat < 0.37)”

Page 22: Stats chapter 9

Samp Dist for Prop. (Example)

• Can we use a Normal approximation for this problem? Let’s check the conditions:

1. Although we are not told the exact population size N, we are told the population is large.

“We are told the population is large, so N > 10(1500)”

Tip: when a problem says the population is large, you may take that to mean the population is greater than 10n

Page 23: Stats chapter 9

Samp Dist for Prop. (Example)

• Can we use a Normal approximation for this problem? Let’s check the conditions:

2. np = 1500(0.35) = 525 > 10 and nq = 1500(0.65) = 975 > 10

“Since np = 525 > 10 and nq = 975 > 10 and N > 10(1500), we can use the Normal distribution”

• Note: It is extremely important that you state and justify the use of the Normal distribution.

Page 24: Stats chapter 9

Samp Dist for Prop. (Example)

• Time for a graph (before normalization). Remember, you don’t have to be too fancy here!

Page 25: Stats chapter 9

Samp Dist for Prop. (Example)

• Let’s Normalize!

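$\mu_{\hat{p}} = p = 0.35$

$\sigma_{\hat{p}} = \sqrt{\dfrac{pq}{n}} = \sqrt{\dfrac{(0.35)(0.65)}{1500}} \approx 0.0123$

$P(\hat{p} < 0.33 \text{ or } \hat{p} > 0.37) = 1 - P(0.33 < \hat{p} < 0.37)$

$= 1 - P\left(\dfrac{0.33 - 0.35}{0.0123} < z < \dfrac{0.37 - 0.35}{0.0123}\right)$

$= 1 - P(-1.63 < z < 1.63)$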

Page 26: Stats chapter 9

Samp Dist for Prop. (Example)

• Now the normalized graph

Page 27: Stats chapter 9

Samp Dist for Prop. (Example)

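• Compute the area:

$P(\hat{p} < 0.33 \text{ or } \hat{p} > 0.37) = 1 - P(0.33 < \hat{p} < 0.37)$

$= 1 - P\left(\dfrac{0.33 - 0.35}{0.0123} < z < \dfrac{0.37 - 0.35}{0.0123}\right)$

$= 1 - P(-1.63 < z < 1.63)$

$= 1 - 0.8968$ (use "Normcdf(-1.63, 1.63)")

$= 0.1032$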
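The same area can be computed without a calculator or table. A minimal sketch using `NormalDist` from Python's standard `statistics` module; the variable names are illustrative. Because it does not round the standard deviation or the z-scores, the result should land near 0.10 but will differ slightly from the 0.1032 obtained with z rounded to 1.63.

```python
import math
from statistics import NormalDist

p, n, margin = 0.35, 1500, 0.02          # values from the example

sd = math.sqrt(p * (1 - p) / n)          # std dev of p-hat, sqrt(pq/n)
sampling_dist = NormalDist(mu=p, sigma=sd)

# Area between 0.33 and 0.37, then the complement for "more than 2 points away".
inside = sampling_dist.cdf(p + margin) - sampling_dist.cdf(p - margin)
outside = 1 - inside

print(f"std dev of p-hat: {sd:.4f}")
print(f"z for 0.37:       {margin / sd:.2f}")
print(f"P(more than 0.02 from p): {outside:.4f}")
```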

Page 28: Stats chapter 9

Samp Dist for Prop. (Example)

• Finish the normalized graph

Page 29: Stats chapter 9

Samp Dist for Prop. (Example)

• Summary:

– “The probability that a sample (n = 1500) is more than 2 percentage points from the parameter is 0.1032”

• Note: remember that in this context, probability is the same as proportion, and proportion is the same as area.

• Actually, you’ve done many of these kinds of problems already, right?

Page 30: Stats chapter 9

9.3 SAMPLE MEANS

Page 31: Stats chapter 9

Samples vs. Census

• Histogram for returns on common stocks in 1987:

• Histogram for 5 stock portfolios in 1987

Page 32: Stats chapter 9

Samples vs. Census

• We can see from the previous slide that the distribution of the samples (portfolios):

– Is less variable than the census

– Is more Normal than the census

Page 33: Stats chapter 9

Sampling Distribution for Means

• Suppose we have the sampling distribution of the sample mean for samples of size n from a large population

• The mean of the sampling distribution is the mean of the population: $\mu_{\bar{x}} = \mu$

• The std dev of the samp dist is given by: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$

Page 34: Stats chapter 9

Sampling Distribution for Means

• The sample mean is an unbiased estimator of the population mean

• Like for proportions, the std dev and the sample size have an inverse square root relation

• Like for proportions, we need N > 10n for our std dev formula to hold up

• These formulas for the mean and std dev hold true even if the population is not Normal!
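A simulation can illustrate that the mean and std dev formulas for the sample mean hold for a non-Normal population. The sketch below makes several assumptions for illustration: the population is exponential with mean 2 (which also has std dev 2), the sample size is 40, and draws are independent, which mimics a very large population.

```python
import math
import random
import statistics

rng = random.Random(93)      # fixed seed so the run is repeatable
mu, sigma = 2.0, 2.0         # exponential population: mean 2 and std dev 2 (assumed)
n, reps = 40, 4000

# One sample mean per repetition, each from a fresh sample of size n.
sample_means = [
    statistics.fmean([rng.expovariate(1 / mu) for _ in range(n)])
    for _ in range(reps)
]

predicted_sd = sigma / math.sqrt(n)              # sigma / sqrt(n)
observed_mean = statistics.fmean(sample_means)
observed_sd = statistics.stdev(sample_means)

print(f"mean of sample means: {observed_mean:.3f} (population mean {mu})")
print(f"std dev of sample means: {observed_sd:.3f} (formula gives {predicted_sd:.3f})")
```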

Page 35: Stats chapter 9

The Central Limit Theorem

• An SRS of size n from any population with mean μ and std dev σ will produce a sampling distribution of the sample mean that is approximately N(μ, σ/√n) whenever n is large enough.

• Caution: on these problems, cite the CLT only for means. For proportions, justify the Normal approximation with the np > 10 and nq > 10 conditions instead.
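To see "large enough" in action, one option is to track how lopsided the sampling distribution of the sample mean is as n grows. This is a sketch under assumptions not in the slides: the population is exponential (strongly right-skewed), skewness is used as a rough gauge of non-Normality, and the helper names are made up.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: near 0 for a symmetric, Normal-looking distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

def sample_means(n, reps, rng):
    """Means of `reps` samples of size n from a right-skewed exponential population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

rng = random.Random(2024)    # fixed seed so the run is repeatable
skew = {n: skewness(sample_means(n, 3000, rng)) for n in (2, 50)}
for n, value in skew.items():
    print(f"n = {n:2d}: skewness of the sample means is about {value:.2f}")
```

The skewness for n = 50 should be much closer to 0 than for n = 2, which is the CLT pulling the sampling distribution toward a Normal shape.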

Page 36: Stats chapter 9

The Central Limit Theorem

Why we use CLT:

• From the previous section, we saw that we use the Normal dist to gauge the probability of producing samples

• We invoke the CLT to justify usage of the Normal distribution

– Using the Normal dist w/o justification is a “no-no”

Page 37: Stats chapter 9

The Central Limit Theorem

When to use the CLT:

• We are working with the sampling distribution for a mean

• We need to Normalize the sample mean

• The sample is described as “large”

– Generally, n > 30

• The raw data is not given

Page 38: Stats chapter 9