
AP Statistics Chapter 7 -- Sampling Distributions

BEFORE WE EVEN BEGIN: (Flashback to 2012 and to 2013 SHS AP Statistics classes)

Biggest issues from Chapter 7
Mixed up parameter vs. statistic
Mixed up bias vs. variability
Mixed up greater than vs. less than
Mixed up singular vs. plural
Used formulas from the formula sheet that weren’t relevant to Chapter 7
Forgot how to calculate a z-score and its subsequent probabilities
Thought that a large sample size was ‘bad.’
Inappropriate ‘communication’ (mixing up: p-hat, x-bar and x)
Didn’t trust what the problems told you (ex: mean/proportion of a sampling distribution)
Either didn’t know the TWO conditions at all or mixed them up
o 10% condition – tells whether you can calculate standard deviation
o Normality condition – tells you whether the distribution is normal or not.
For proportions – the ‘tests’


For means – what did it say about the population; was ‘n’ greater than 30?
Didn’t know what the Central Limit Theorem said.
DO NOT WRITE ‘MAGIC NUMBER’ on an AP Stats test or assignment again…..
Lots and lots and lots and lots of order of operations errors

Ultimately the issue with most students on this test was a ‘lack of experience’… (i.e.: probably waited until the night before test to either study or BEGIN doing homework).


Day #1: Sampling Distributions

I can distinguish between a parameter and a statistic.

Parameters and Statistics
A number that describes the population is called a parameter.
o EXAMPLES of parameters:

A number that is computed from a sample is called a statistic.
o EXAMPLES of statistics:

In this chapter, we will be introduced to a new way to think of a parameter/statistic using proportions:

Example: Parameters or Statistics?
For each of the following statements:
a. Identify the underlined number as the value of either a population characteristic (parameter) or a statistic.
b. Use appropriate notation to describe each number.

a. A department store reports that 84% of all customers who use the store’s credit plan pay their bills on time.

b. A sample of 100 students at a large university had a mean age of 24.1 years.


c. The Department of Motor Vehicles reports that 22% of all vehicles registered in a particular state are imports.

d. A hospital reports that based on the 10 most recent cases, the mean length of stay for surgical patients is 6.4 days.

e. A consumer group, after testing 100 batteries of a certain brand, reported an average life of 63 hrs of use with a standard deviation of 3.6 hrs.

f. A random sample of female college students has a mean height of 64.5 inches, which is greater than the 63 inch mean height of all adult American women.

g. The Bureau of Labor Statistics announces that last month it interviewed all members of the labor force in a sample of 50,000 households; 4.5% of the people interviewed were unemployed.


Day #2: Sampling Distributions:

I can understand the definition of a sampling distribution.

Definition of a sampling distribution: The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same _________ from the same ______________.

Activity: Choosing Cards
MATERIALS: A deck of cards
Several of you will receive a deck of cards and you are to REMOVE the aces and face cards so that only the cards 2 through 10 remain.
1. After removing the aces and face cards, shuffle the deck, randomly select 5 cards, and note the median value of the cards. (For example, if the selected cards were 2, 2, 4, 5, and 9, the median would be 4.)
2. We will all record the value of the sample median on a class dotplot going from 2 to 10. Use a lowercase m instead of a dot.
3. Describe what you see: shape, center, spread, and any unusual values.
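If no deck is handy, the card activity can be imitated on a computer. The sketch below is only an illustration under stated assumptions: the deck is modeled as four copies of each value 2 through 10, the number of repetitions (200) and the seed are arbitrary choices, and the function name `simulate_medians` is made up for this example. It prints a text dotplot that uses a lowercase m for each sample median, as in step 2.

```python
import random
from collections import Counter
from statistics import median

# Assumed model of the trimmed deck: values 2-10 in four suits (36 cards).
deck = [value for value in range(2, 11) for _suit in range(4)]

def simulate_medians(reps=200, seed=3):
    """Draw `reps` samples of 5 cards (without replacement) and return the medians."""
    rng = random.Random(seed)
    return [median(rng.sample(deck, 5)) for _ in range(reps)]

medians = simulate_medians()
counts = Counter(medians)

# Text dotplot: one 'm' per simulated sample median.
for value in range(2, 11):
    print(f"{value:2d} " + "m" * counts[value])
```

Because only 200 samples are drawn, this is an approximation to the sampling distribution of the median, not the distribution over all possible samples.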


I can distinguish between population distribution, sampling distribution, and the distribution of sample data.

It is important that we are able to distinguish between the following three things:
Population distribution
Distribution of sample data
Sampling distribution

Consider a deck of cards. There are 52 cards in the deck. 26 are black and the remaining 26 are red.
Show the population distribution graphically:

For those with a deck of cards, take a random sample of 10 cards and create a graph that will accurately reflect your sample of n = 10.

Show the sampling distribution of all the decks of cards in the class by considering the proportion of black cards.


I can determine whether a statistic is an unbiased estimator of a population parameter.

Definition of unbiased estimator: A statistic used to estimate a parameter is an unbiased estimator if the mean of its sampling distribution is equal to the true value of the parameter being estimated.

EXAMPLE: Below are histograms of the values taken by three sample statistics in several hundred samples from the same population. The true value of the population parameter is marked on each histogram.

a. Which statistic has the largest bias among these three? Justify your answer.

b. Which statistic has the lowest variability among these three?

c. Based on the performance of the three statistics in many samples, which is preferred as an estimate of the parameter? Why?


Example: More Tanks
Here are 5 methods for estimating the total number of tanks:
METHOD #1: Partition = max × (5/4)
METHOD #2: Max = max
METHOD #3: MeanMedian = mean + median
METHOD #4: SumQuartiles = Q1 + Q3
METHOD #5: TwiceIQR = 2 × IQR

The graph below shows the approximate sampling distribution for each of these statistics when taking samples of size 4 from a population of 342 tanks.

Problem:
(a) Which of these statistics appear to be biased estimators? Explain.
(b) Of the unbiased estimators, which is best? Explain.
(c) Explain why a biased estimator might be preferred to an unbiased estimator.

[Graph: dotplots (Measures from Sample of Collection 1) of the Partition, Max, MeanMedian, SumQuartiles, and TwiceIQR estimates on a common axis from 0 to 700, with the true number of tanks, 342, marked.]
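The tank graph can be approximated by simulation. The sketch below is an illustration under assumptions, not the software used for the handout: tanks are assumed to be numbered 1 to 342, Q1 and Q3 are taken as the medians of the lower and upper halves of the sorted sample (other quartile rules exist), and the function names, repetition count, and seed are invented for this example.

```python
import random
from statistics import mean, median

N_TANKS = 342      # population size from the example
SAMPLE_SIZE = 4    # sample size from the example

def five_estimates(sample):
    """Return the five estimates of the number of tanks for one sample."""
    s = sorted(sample)
    half = len(s) // 2
    q1 = median(s[:half])     # assumed quartile rule: median of the lower half
    q3 = median(s[-half:])    # median of the upper half
    return {
        "Partition": max(s) * 5 / 4,
        "Max": max(s),
        "MeanMedian": mean(s) + median(s),
        "SumQuartiles": q1 + q3,
        "TwiceIQR": 2 * (q3 - q1),
    }

def simulate(reps=20000, seed=1):
    """Average each estimator over many SRSs of size 4 from tanks 1..342."""
    rng = random.Random(seed)
    tanks = range(1, N_TANKS + 1)
    totals = {}
    for _ in range(reps):
        estimates = five_estimates(rng.sample(tanks, SAMPLE_SIZE))
        for name, value in estimates.items():
            totals[name] = totals.get(name, 0.0) + value
    return {name: total / reps for name, total in totals.items()}

centers = simulate()
for name, center in centers.items():
    print(f"{name:13s} mean of estimates = {center:6.1f}  (true value {N_TANKS})")
```

Comparing each printed mean with 342 speaks to bias only. To compare variability, one would keep the individual estimates and look at their spread, which is what the dotplots show.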


I can understand the relationship between sample size and the variability of an estimator.

Sampling Variability
The variability of a statistic is described by the __________________ of its sampling distribution.
o This spread is determined primarily by the ___________ of the random sample.
o Larger samples will give ___________________ spread.
o The spread of the sampling distribution DOES NOT depend on the size of the population as long as the population is at least _______ times larger than the sample.
o If I compare many different samples and the statistic is very similar in each one, then the sampling variability is ____________.
o If I compare many different samples and the statistic is very different in each one, then the sampling variability is _____________.

Example: Bias and Variability: The figure shows histograms of four sampling distributions of statistics intended to estimate the same parameter. Label each distribution relative to the others as having large or small bias and as having large or small variability.


Example: Penny For Your Thoughts…on Proportions Here are the results of 500 SRSs of size 5, 500 SRSs of size 10, and 500 SRSs of size 20 when sampling from a population of 2341 pennies where the true proportion of pennies minted in the 2000s is p = 0.293.

Notice that all three distributions have a mean of about 0.293, the value of the true proportion of pennies minted in the 2000s in the population. The spread, however, gets smaller as the sample size increases, and the shape becomes more symmetric and less skewed to the right.

Now, suppose we wanted to estimate the proportion of pennies minted before 1976. In the same population, the true proportion of pennies minted before 1976 is p = 0.092. Here are the results of 500 SRSs of sizes 5, 10, and 20:

Notice that the means of the distributions are about the same and approximately equal to the true proportion, p = 0.092. Also, the spread of the distributions gets smaller as the sample size increases, and the shape becomes more symmetric and less skewed to the right, although the shape is still clearly right-skewed for n = 20 in this case.

For all three sample sizes, the distributions were more skewed when p = 0.092 than when p = 0.293. In general, the closer p is to 0 or 1, the more skewed the distribution of p̂ will be for samples of a given size.
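The penny results can be imitated with a short simulation. This is a sketch under assumptions: the number of "success" pennies is taken to be p × 2341 rounded to the nearest whole penny, skewness is measured with a simple moment-based formula, and the function names and seed are invented for this example.

```python
import random
from statistics import mean, stdev

POPULATION_SIZE = 2341   # pennies in the population (from the example)

def simulate_phat(p, n, reps=500, seed=0):
    """Simulate `reps` SRSs of size n and return the sample proportions."""
    rng = random.Random(seed)
    successes = round(p * POPULATION_SIZE)   # assumed count of "success" pennies
    population = [1] * successes + [0] * (POPULATION_SIZE - successes)
    return [sum(rng.sample(population, n)) / n for _ in range(reps)]

def skewness(values):
    """Moment-based skewness: positive values indicate a longer right tail."""
    m, s = mean(values), stdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)

results = {}
for p in (0.293, 0.092):
    for n in (5, 10, 20):
        phats = simulate_phat(p, n)
        results[(p, n)] = (mean(phats), stdev(phats), skewness(phats))
        print(f"p={p}, n={n:2d}: mean={results[(p, n)][0]:.3f} "
              f"sd={results[(p, n)][1]:.3f} skew={results[(p, n)][2]:.2f}")
```

The printed means can be compared with p, and the standard deviations and skewness values can be compared across sample sizes and across the two values of p.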

[First set of graphs: dotplots of SampleProportion5, SampleProportion10, and SampleProportion20 (Measures from Sample of Pennies), each on an axis from 0.0 to 1.2.]

[Second set of graphs: dotplots of SampleProportion5, SampleProportion10, and SampleProportion20 (Measures from Sample of Pennies), each on an axis from 0.0 to 1.0.]


MULTIPLE CHOICE QUESTION: Five estimators for a parameter are being evaluated. The true value of the parameter is 0. Simulations of 100 random samples, each of size ‘n’ are drawn from a population. For each simulated sample, the five estimates are computed. The histograms below display the simulated sampling distributions for the five estimators. Which simulated sampling distribution is associated with the best estimator for this parameter?


Day #3 – Sample Proportions

REVIEW:
The parameter ___________ is the population proportion. In practice, this value is always unknown. (If we know the population proportion, then there is no need for a sample.)
The statistic __________ is the sample proportion.
We use __________ to estimate the value of __________.
The value of the statistic _________ changes as the sample changes.

Recap of Parameters and Statistics:
Write examples of both parameters and statistics below:
PARAMETERS          STATISTICS

I can find the mean and standard deviation of the sampling distribution of a sample proportion p̂ for an SRS of size n from a population having proportion p of successes.

I can check whether the 10% and Normal conditions are met in a given setting.

Example: (from text: Peck, #8.27 p. 467) Genetics
A certain chromosome defect occurs in only 1 out of 200 adult Caucasian males. A random sample of n = 100 adult Caucasian males is to be obtained.

a. What is the mean value of the sample proportion p̂?


b. What is the standard deviation of the sample proportion?

NOTE: Before you can ever calculate the standard deviation of a sample proportion, you MUST check what is known as the 10% condition.

The 10% condition is: “The population must be at least 10 times as large as your sample.”

Consider that in this situation prior to calculating the answer.

c. Does p̂ have approximately a normal distribution in this case? Explain.

NOTE: In order to determine whether the distribution is normal, you must check the ‘normal condition’ by doing the following tests:

Normality test: n·p ≥ 10 and n·q ≥ 10 (remember: q = 1 - p)

d. What is the smallest value of n for which the sampling distribution of p̂ is approximately normal?
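The two conditions and the formulas for the mean and standard deviation of p̂ can be collected in a small helper. This is a minimal sketch: the function names `phat_summary` and `smallest_normal_n` are invented for this example, exact fractions are used only to avoid rounding trouble, and the 10% check is skipped when no population size is supplied.

```python
from fractions import Fraction
from math import ceil, sqrt

def phat_summary(p, n, population_size=None):
    """Mean, standard deviation, and condition checks for p-hat."""
    q = 1 - p
    return {
        "mean": float(p),
        "sd": sqrt(float(p * q) / n),
        # 10% condition: population at least 10 times the sample size.
        "ten_percent_ok": None if population_size is None
                          else population_size >= 10 * n,
        # Normal condition: n*p >= 10 and n*q >= 10.
        "normal_ok": n * p >= 10 and n * q >= 10,
    }

def smallest_normal_n(p):
    """Smallest n with n*p >= 10 and n*(1 - p) >= 10."""
    return max(ceil(10 / p), ceil(10 / (1 - p)))

p = Fraction(1, 200)               # defect rate from the Genetics example
summary = phat_summary(p, n=100)
n_min = smallest_normal_n(p)
print(summary)
print("smallest n meeting the Normal condition:", n_min)
```

Passing a value for `population_size` makes the helper report the 10% condition as well.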


Example: (from text: Bock, #7 p. 428) SPEEDING: State police believe that 70% of the drivers traveling on a major interstate highway exceed the speed limit. They plan to set up a radar trap and check the speeds of 80 cars. Do you think the appropriate conditions necessary for your analysis are met? Explain.

Example: Church Attendance: The Gallup Poll asked a probability sample of 1785 adults whether they attended church or synagogue during the past week. Suppose that 40% of the adult population did attend. We would like to know the probability that an SRS of 1785 would come within plus or minus 3 percentage points of this true value.

a. If p̂ is the proportion of the sample who did attend church or synagogue, what is the mean of the sampling distribution of p̂? What is its standard deviation?

b. Explain why you can use the formula for the standard deviation of p̂ in this setting.

c. Check that you can use the normal approximation for this distribution of p̂.


CHAPTER 2 – Revisited…
The total area under a density curve is always equal to ____________.
True or False. A density curve must always be ‘curvy.’
A normal distribution can be described as having a shape that is ______________________.
When a distribution is described as being ‘normal’ and having a shape that is ‘symmetric’, then its mean is equal to its ______________________.
Standard deviation is a measure of _________________.
The 68-95-99.7% Rule (AKA: Empirical rule) deals with how many _______________________________________________________ away from the ___________ lie certain individual observations.
A z-score is a measure of ___________________________________.

To standardize (find a z-score), we use the following formula to find a ‘test statistic’:

Standardized Test Statistic =


I can use Normal approximation to calculate probabilities involving p̂.

I can use the sampling distribution of p̂ to evaluate a claim about a population proportion.

Example: Planning for College
The superintendent of a large school district wants to know what proportion of middle school students in her district are planning to attend a four-year college or university. Suppose that 80% of all middle school students in her district are planning to attend a four-year college or university. What is the probability that an SRS of size 125 will give a result within 7 percentage points of the true value?

We want to find the probability that the sample proportion of middle school students who plan to attend a four-year college or university falls between 73% and 87%. That is, P(0.73 ≤ p̂ ≤ 0.87).
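One way to carry out this calculation is with the Normal approximation in Python's standard library. The sketch below assumes the 10% condition holds (the district is described as large) and uses variable names chosen only for this example.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.80, 125                   # values from the example
sd_phat = sqrt(p * (1 - p) / n)    # valid when the 10% condition holds
normal_ok = n * p >= 10 and n * (1 - p) >= 10

# Normal approximation to the sampling distribution of p-hat.
model = NormalDist(mu=p, sigma=sd_phat)
prob_within_7 = model.cdf(0.87) - model.cdf(0.73)

print(f"sd of p-hat = {sd_phat:.4f}, Normal condition met: {normal_ok}")
print(f"P(0.73 <= p-hat <= 0.87) is about {prob_within_7:.3f}")
```

The same probability can be found by hand by converting 0.73 and 0.87 to z-scores and using a Normal table.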


Example: Church Attendance (again):
a. Find the probability that p̂ takes a value between 0.37 and 0.43. Will an SRS of size 1785 usually give a result p̂ within plus or minus 3 percentage points of the true population proportion? Explain.

b. Suppose that 40% of the adult population attended church or synagogue last week. Previously, you were asked to find the probability that p̂ from an SRS estimates p = 0.4 within 3 percentage points. Find the probability for SRSs of sizes 300, 1200, and 4800. What general fact do your results illustrate?
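The effect of sample size in part b can be checked with a short loop. This is a sketch using the Normal approximation and assuming the 10% and Normal conditions hold for every sample size tried. The helper name `prob_within` is invented for this example.

```python
from math import sqrt
from statistics import NormalDist

def prob_within(p, n, margin):
    """Normal-approximation probability that p-hat lands within `margin` of p."""
    sd = sqrt(p * (1 - p) / n)
    z = margin / sd
    return NormalDist().cdf(z) - NormalDist().cdf(-z)

p, margin = 0.40, 0.03
probs = {n: prob_within(p, n, margin) for n in (300, 1200, 1785, 4800)}
for n, prob in probs.items():
    print(f"n = {n:4d}: P(|p-hat - p| <= {margin}) is about {prob:.4f}")
```

Reading the printed probabilities from the smallest n to the largest shows the pattern that the question asks about.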


Day #4 – Sample Means:

I can find the mean and standard deviation of the sampling distribution of a sample mean x̄ from an SRS of size n.

I can calculate probabilities involving a sample mean x̄ when the population distribution is Normal.

When we choose many SRSs of size ‘n’ from a population, the sampling distribution of the sample means is centered at the population mean (μ) and is less spread out than the population distribution. Here are the facts:
The mean of the sampling distribution of x̄ is μ_x̄ = μ.
The standard deviation of the sampling distribution of x̄ is now going to be: σ_x̄ = σ/√n

NOTE: the 10% condition MUST be satisfied for the standard deviation.

Example: Movie-going students
Suppose that the number of movies viewed in the last year by high school students has an average of 19.3 with a standard deviation of 15.8. Suppose we take an SRS of 100 high school students and calculate the mean number of movies viewed by the members of the sample.
Problem:
(a) What is the mean of the sampling distribution of x̄?
(b) What is the standard deviation of the sampling distribution of x̄? Check whether the 10% condition is satisfied.

Consider the confusion of n = 1:

Example: Weights of Newborn Children: The weights of newborn children in the United States vary according to the normal distribution with mean 7.5 pounds and standard deviation 1.25 pounds. The government classifies a newborn as having low birth weight if the weight is less than 5.5 pounds.
a. What is the probability that a baby chosen at random weighs less than 5.5 pounds at birth?

You choose three babies at random and compute their mean weight, x̄.
b. What are the mean and standard deviation of the mean weight x̄ of the three babies?

c. What is the probability that their average birth weight is less than 5.5 pounds?
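The difference between one observation (n = 1) and a sample mean can be seen by running both calculations side by side. The sketch below uses the numbers from the newborn example. The variable names are chosen only for this illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 7.5, 1.25      # population mean and sd in pounds (from the example)
cutoff, n = 5.5, 3

# (a) One baby: use the population distribution itself.
p_single = NormalDist(mu, sigma).cdf(cutoff)

# (b) Mean of n babies: same center, sd shrinks by a factor of sqrt(n).
sd_xbar = sigma / sqrt(n)

# (c) x-bar is Normal here because the population is Normal.
p_mean = NormalDist(mu, sd_xbar).cdf(cutoff)

print(f"P(one baby < {cutoff}) is about {p_single:.4f}")
print(f"sd of x-bar = {sd_xbar:.4f}")
print(f"P(mean of {n} babies < {cutoff}) is about {p_mean:.4f}")
```

The only change between parts (a) and (c) is the standard deviation that goes into the z-score.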


Example: Buy Me Some Peanuts and Sample Means
Problem: At the P. Nutty Peanut Company, dry roasted, shelled peanuts are placed in jars by a machine. The distribution of weights in the jars is approximately Normal, with a mean of 16.1 ounces and a standard deviation of 0.15 ounces.
(a) Without doing any calculations, explain which outcome is more likely: randomly selecting a single jar and finding the contents to weigh less than 16 ounces, or randomly selecting 10 jars and finding the average contents to weigh less than 16 ounces.
(b) Find the probability of each event described above.


Example – Auto Parts: An automatic grinding machine in an auto parts plant prepares axles with a target diameter μ = 40.125 millimeters (mm). The machine has some variability, so the standard deviation of the diameters is σ = 0.002 mm. The machine operator inspects a sample of 4 axles each hour for quality control purposes and records the sample mean diameter.

a. What will be the mean and standard deviation of the numbers recorded? Do your results depend on whether or not the axle diameters have a normal distribution?

b. Can you find the probability that an SRS of 4 axles has a mean diameter greater than 40.127 mm? If so, do it. If not, explain why not.


2004 AP Statistics Free-Response Question (Form B) #3
Trains carry bauxite ore from a mine in Canada to an aluminum processing plant in northern New York State in hopper cars. Filling equipment is used to load ore into the hopper cars. When functioning properly, the actual weights of ore loaded into each car by the filling equipment at the mine are approximately normally distributed with a mean of 70 tons and a standard deviation of 0.9 ton. If the mean is greater than 70 tons, the loading mechanism is overfilling.
a. If the filling equipment is functioning properly, what is the probability that the weight of the ore in a randomly selected car will be 70.7 tons or more? Show your work.

b. Suppose that the weight of ore in a randomly selected car is 70.7 tons. Would that fact make you suspect that the loading mechanism is overfilling the cars? Justify your answer.


c. If the filling equipment is functioning properly, what is the probability that a random sample of 10 cars will have a mean ore weight of 70.7 tons or more? Show your work.

d. Based on your answer in part (c), if a random sample of 10 cars had a mean ore weight of 70.7 tons, would you suspect that the loading mechanism was overfilling the cars? Justify your answer.


Day #5: Central Limit Theorem

I can explain how the shape of the sampling distribution of x̄ is related to the shape of the population distribution.

Example: A Strange Population
Here is a population distribution with a strange shape:

What do you think the sampling distribution of x̄ will look like for samples of size 2? What about samples of size 5? Size 25? Here are the results of 10,000 SRSs of each size. The first graph has three peaks, since there are only 4 basic outcomes for a sample: two small values, which gives a small mean; two large values, which gives a large mean; or one of each (in either order), which gives a mean in the middle. Since there are two ways to get one of each, the middle pile is roughly twice as big.


Example: 1998 AP Statistics Question #1
Consider the sampling distribution of a sample mean obtained by random sampling from an infinite population. This population has a distribution that is highly skewed toward the larger values.
a. How is the mean of the sampling distribution related to the mean of the population?

b. How is the standard deviation of the sampling distribution related to the standard deviation of the population?

c. How is the shape of the sampling distribution affected by the sample size?


I can use the central limit theorem to help find probabilities involving a sample mean x̄.

Central Limit Theorem
Draw an SRS of size n from any population whatsoever with mean μ and standard deviation σ.

When n is large, the sampling model of the sample mean x̄ is close to the Normal model N(μ, σ/√n), with mean μ and standard deviation σ/√n.

Central Limit Theorem: In more ‘simplistic’ language: ‘The larger “n” is… the closer to normal the sampling distribution will be…’
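The Central Limit Theorem can be watched in action with a simulation. The sketch below is an illustration under assumptions: the population is taken to be Exponential with mean 1 and standard deviation 1 (a strongly right-skewed choice made only for this example), observations are drawn independently as if the population were infinite, and the function names, repetition count, and seed are invented.

```python
import random
from math import sqrt
from statistics import mean, stdev

def sample_means(n, reps=5000, seed=7):
    """Means of `reps` random samples of size n from an Exponential(1) population."""
    rng = random.Random(seed)
    return [mean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

def skewness(values):
    """Moment-based skewness: values near 0 suggest a roughly symmetric shape."""
    m, s = mean(values), stdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)

summary = {}
for n in (2, 5, 25):
    xbars = sample_means(n)
    summary[n] = (mean(xbars), stdev(xbars), skewness(xbars))
    print(f"n={n:2d}: mean={summary[n][0]:.3f} sd={summary[n][1]:.3f} "
          f"(sigma/sqrt(n)={1 / sqrt(n):.3f}) skew={summary[n][2]:.2f}")
```

The center stays near μ, the spread tracks σ/√n, and the skewness shrinks toward 0 as n grows, which is the "closer to normal" behavior described above.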

Compare the Central Limit Theorem to the Law of Large Numbers:

Law of Large Numbers
Draw observations at random from any population with mean μ. As the number of observations increases, the sample mean x̄ gets closer and closer to μ.

Law of Large Numbers: In more ‘simplistic’ language: As ‘n’ gets bigger and bigger, the sample mean gets closer to the population mean.

EXAMPLE: Coin Toss Suppose that you and your lab partner flip a coin 20 times and you calculate the proportion of tails to be 0.8. Your partner seems surprised at these results and suspects that the coin is not fair. Write a brief statement that describes why you either agree or disagree with him.
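To put a number on how unusual the coin result is, the exact binomial probability can be computed. This is a sketch that assumes a fair coin (p = 0.5) and independent flips. The helper name `prob_at_least` is invented for this example.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Exact binomial probability of k or more successes in n trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n_flips = 20
tails = round(0.8 * n_flips)     # 16 tails gives a proportion of 0.8
p_extreme = prob_at_least(tails, n_flips)

print(f"P({tails} or more tails in {n_flips} fair flips) = {p_extreme:.4f}")
```

The calculation only says how often a fair coin would produce a result at least this extreme. How to weigh that against the small number of flips is the judgment the question asks you to explain.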


Example: Hot Dogs (from text: Peck, Ex #8.6 p. 458)
A hot dog manufacturer asserts that one of its brands of hot dogs has an average fat content of μ = 18 g per hot dog. Consumers of this brand would probably not be disturbed if the mean is less than 18 but would be unhappy if it exceeds 18. Let x denote the fat content of a randomly selected hot dog, and suppose that σ, the standard deviation of the x distribution, is 1.

An independent testing organization is asked to analyze a random sample of 36 hot dogs. Let x̄ be the average fat content for this sample. The sample size, n = 36, is large enough to rely on the Central Limit Theorem and to regard the x̄ distribution as approximately normal. The standard deviation of the x̄ distribution is: σ_x̄ = σ/√n = 1/√36 ≈ 0.167

If the manufacturer’s claim is correct, we know that μ_x̄ = μ = 18 g. Suppose that the sample resulted in a mean of x̄ = 18.4 g. Does this result suggest that the manufacturer’s claim is incorrect?
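The hot dog question comes down to a z-score for the sample mean and an upper-tail probability. The sketch below assumes the manufacturer's claim is true when computing the probability and relies on the Central Limit Theorem for the Normal shape. The variable names are chosen only for this example.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 18, 1, 36     # claimed mean, population sd, sample size
xbar = 18.4                  # observed sample mean

sd_xbar = sigma / sqrt(n)                # standard deviation of x-bar
z = (xbar - mu) / sd_xbar                # how many sd's x-bar sits above mu
p_upper = 1 - NormalDist().cdf(z)        # P(x-bar >= 18.4) if the claim is true

print(f"sd of x-bar = {sd_xbar:.4f}, z = {z:.2f}")
print(f"P(x-bar >= {xbar}) is about {p_upper:.4f}")
```

A small upper-tail probability means a sample mean this large would rarely occur if μ really were 18.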


Day #6: Chapter 7 Test Review

I can distinguish between a parameter and a statistic.

1. For each description below, identify each underlined number as a parameter or statistic. Use appropriate notation to describe each number, e.g., p̂ = 0.96.
(a) A 1993 survey conducted by the Richmond Times-Dispatch one week before election day asked voters which candidate for the state’s attorney general they would vote for. 37% of the respondents said they would vote for the Democratic candidate. On election day, 41% actually voted for the Democratic candidate.

(b) The National Center for Health Statistics reports that the mean systolic blood pressure for males 35 to 44 years of age is 128 and the standard deviation is 15. The medical director of a large company looks at the medical records of 72 executives in this age group and finds that the mean systolic blood pressure for these executives is 126.07.

I can determine whether a statistic is an unbiased estimator of a population parameter.

I can understand the relationship between sample size and the variability of an estimator.

2. Suppose two different statistics—call them Statistic A and Statistic B—can be used to estimate the same population parameter. Statistic A has lower bias than B, but A also has high variability compared to B. On the two axes below, draw two parallel dotplots showing 8 values of each statistic that are consistent with these characteristics. Assume that the parameter value is at the arrow on the axes.


I can understand the definition of a sampling distribution.

I can distinguish between population distribution, sampling distribution, and the distribution of sample data.

3. A large pet store that specializes in tropical fish has several thousand guppies. The store claims that the guppies have a mean length of 5 cm and a standard deviation of 0.5 cm. You come to the store and buy 10 randomly-selected guppies and find that the mean length of your 10 guppies is 4.8 cm. This makes you suspect that the mean fish length is not what the store says it is. To explore this further, you assume that the length of guppies is Normally distributed and use a computer to simulate 200 samples of 10 guppies from the store’s claimed population. Below is a dotplot of the means from these 200 samples.

(a) What is the population in this situation, and what population parameters have we been given?
(b) The distribution of one sample is described in the opening paragraph. What information have we been given about this sample?
(c) Is the dotplot above a sampling distribution? Explain.
(d) Do you think the store is being honest about the length of its guppies? Justify your answer.


I can find the mean and standard deviation of the sampling distribution of a sample proportion p̂ for an SRS of size n from a population having proportion p of successes.

I can check whether the 10% and Normal conditions are met in a given setting.

I can use the Normal approximation to calculate probabilities involving p̂.

I can use the sampling distribution of p̂ to evaluate a claim about a population proportion.

4. According to the 2000 U.S. Census, 80% of Americans over the age of 25 have earned a high school diploma. Suppose we take a random sample of 120 Americans and record the proportion, p̂, of individuals in our sample who have a high school diploma.

(a) What are the mean and standard deviation of the sampling distribution of p̂?

(b) What is the approximate shape of the sampling distribution? Justify your answer.

(c) Suppose our sample size was 30 instead of 120. Compare the shape, center, and spread of this sampling distribution to the one in parts (a) and (b).

(d) You live in a small town with only 500 residents over the age of 25. What is the largest possible sample you can take from your town and still be able to calculate the standard deviation of the sampling distribution of p̂ using the method presented in the textbook? Explain.

5. George is a big fan of music from the 1960s, and 22% of the songs on his mp3 player are Beatles songs. Suppose George sets his mp3 player to “shuffle,” so that it selects songs randomly (assume the shuffle function permits repetition of songs). During a long drive, George plays 50 randomly selected songs.

(a) What are the mean and standard deviation of the proportion of the 50 randomly selected songs that are Beatles songs?

(b) Calculate the probability that more than 30% of the 50 randomly selected songs are Beatles songs.
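
The formulas and the two condition checks for p̂ can be collected into small helper functions. The sketch below assumes the usual textbook rules of thumb (sample no more than 10% of the population; np and n(1 − p) both at least 10); the function names are made up for this illustration. It is applied to the numbers in Problem 5.

```python
from math import sqrt
from statistics import NormalDist


def phat_mean_sd(p, n):
    """Mean and standard deviation of the sampling distribution of p-hat."""
    return p, sqrt(p * (1 - p) / n)


def ten_percent_condition_met(n, population_size):
    """10% condition: the sample is at most 10% of the population."""
    return n <= 0.10 * population_size


def normal_condition_met(p, n):
    """Normal condition: expected successes and failures are both at least 10."""
    return n * p >= 10 and n * (1 - p) >= 10


# Problem 5: p = 0.22, n = 50. Shuffle allows repeats, so songs are chosen
# with replacement and the 10% condition is not a concern here.
p, n = 0.22, 50
mean_phat, sd_phat = phat_mean_sd(p, n)
print("Normal condition met:", normal_condition_met(p, n))

# P(p-hat > 0.30) using the Normal approximation.
prob_more_than_30_percent = 1 - NormalDist(mean_phat, sd_phat).cdf(0.30)
print(round(mean_phat, 2), round(sd_phat, 4), round(prob_more_than_30_percent, 4))
```

The same helpers can be reused for Problem 4 by calling, for example, `phat_mean_sd(0.80, 120)` and `normal_condition_met(0.80, 30)`.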

I can find the mean and standard deviation of the sampling distribution of a sample mean x̄ from an SRS of size n.

I can calculate probabilities involving a sample mean x̄ when the population distribution is Normal.

I can explain how the shape of the sampling distribution of x̄ is related to the shape of the population distribution.

I can use the central limit theorem to help find probabilities involving a sample mean x̄.

6. The customer care manager at a cell phone company keeps track of how long each help-line caller spends on hold before speaking to a customer service representative. He finds that the distribution of wait times for all callers has a mean of 12 minutes with a standard deviation of 5 minutes. The distribution is moderately skewed to the right. Suppose the manager takes a random sample of 10 callers and calculates their mean wait time, x̄.

(a) What is the mean of the sampling distribution of x̄?

(b) Is it possible to calculate the standard deviation of x̄? If it is, do the calculation. If it isn’t, explain why.

(c) Do you know the approximate shape of the sampling distribution of x̄? If so, describe the shape and justify your answer. If not, explain why not.

7. The weights of Granny Smith apples from a large orchard are Normally distributed with a mean of 380 g and a standard deviation of 28 g.

(a) A single apple is selected at random from this orchard. What is the probability that it weighs more than 400 g?

(b) Three apples are selected at random from this orchard. What is the probability that their mean weight is greater than 400 g?

(c) Explain why the probabilities in (a) and (b) are not equal.
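
The contrast between one apple and the mean of three apples comes down to which standard deviation goes into the z-score: σ for an individual, σ/√n for a sample mean. A minimal sketch of both calculations follows; the function name `prob_above` is just a label chosen for this example.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA = 380.0, 28.0   # population mean and SD of apple weights (g)
CUTOFF = 400.0


def prob_above(cutoff, mu, sigma, n=1):
    """P(mean of n independent Normal(mu, sigma) observations > cutoff)."""
    return 1 - NormalDist(mu, sigma / sqrt(n)).cdf(cutoff)


p_single = prob_above(CUTOFF, MU, SIGMA)               # one apple, SD = 28
p_mean_of_three = prob_above(CUTOFF, MU, SIGMA, n=3)   # x-bar, SD = 28 / sqrt(3)

print("single apple:", round(p_single, 4))
print("mean of three:", round(p_mean_of_three, 4))
```

Because averages vary less than individual observations, the second probability comes out smaller than the first.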

Day #7: Chapter 7 Test Review

FRAPPY #1: IPODS

a. David’s iPod has about 10,000 songs. The distribution of the play times for these songs is heavily skewed to the right with a mean of 225 seconds and a standard deviation of 60 seconds. Suppose we choose an SRS of 10 songs from this population and calculate the mean play time x̄ of these songs. What are the mean and standard deviation of the sampling distribution of x̄? Explain.

b. Explain why you cannot safely calculate the probability that the mean play time x̄ is more than 4 minutes (240 seconds) for an SRS of 10 songs.

c. Suppose that we take an SRS of 36 songs instead. Explain how the central limit theorem allows us to find the probability that the mean play time is more than 240 seconds. Then calculate this probability. Show your work.
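
To see the central limit theorem at work for n = 36, the Normal calculation can be compared with a simulation. The problem does not give the actual population of play times, so the sketch below invents one: a shifted exponential distribution chosen only because it is right-skewed with mean 225 and standard deviation 60. That population, the seed, and the number of repetitions are all assumptions, and songs are drawn independently rather than as an SRS from the 10,000 songs.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

MU, SIGMA = 225.0, 60.0   # population mean and SD of play times (seconds)
N = 36                    # sample size in part (c)
CUTOFF = 240.0

# Normal approximation justified by the CLT: x-bar ~ Normal(225, 60 / sqrt(36)).
normal_approx = 1 - NormalDist(MU, SIGMA / sqrt(N)).cdf(CUTOFF)


def draw_play_time(rng):
    """Hypothetical right-skewed play time: 165 + exponential with mean 60."""
    return (MU - SIGMA) + rng.expovariate(1 / SIGMA)


def simulated_prob(num_samples=20000, seed=0):
    """Fraction of simulated samples of size N whose mean exceeds CUTOFF."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        mean_time = fmean([draw_play_time(rng) for _ in range(N)])
        hits += mean_time > CUTOFF
    return hits / num_samples


simulated = simulated_prob()
print("Normal approximation:", round(normal_approx, 4))
print("simulation (assumed population):", simulated)
```

The two numbers should be close but not identical, since the Normal model is only an approximation when the population itself is skewed.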

FRAPPY #2: ‘Bottling Cola’: A bottling company uses a filling machine to fill plastic bottles with cola. The bottles are supposed to contain 300 milliliters (mL). In fact, the contents vary according to a Normal distribution with mean μ = 298 mL and standard deviation σ = 3 mL.

a. What is the probability that an individual bottle contains less than 295 mL?

b. What is the probability that the mean contents of the bottles in a six-pack is less than 295 mL?

c. What is the probability that the mean contents of the bottles in a six-pack is greater than 299 mL?

FRAPPY #3: ‘Polling Women’: Suppose that 47% of all adult women think they do not get enough time for themselves. An opinion poll interviews 1025 randomly chosen women and records the sample proportion p̂ who feel they don’t get enough time for themselves.

a. Describe the sampling distribution of p̂.

b. The truth about the population is p = 0.47. In what range will the middle 95% of all sample results fall?

c. What is the probability that the poll gets a sample in which fewer than 45% say they do not get enough time for themselves?
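
Finding the middle 95% of sample results is an inverse-Normal calculation: instead of going from a value to a probability, we go from a probability back to a value. The sketch below uses the exact 2.5th and 97.5th percentiles of the Normal approximation; the 68-95-99.7 rule (mean ± 2 standard deviations) gives nearly the same interval. Variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

P, N = 0.47, 1025

# Sampling distribution of p-hat under the Normal approximation.
sd_phat = sqrt(P * (1 - P) / N)
phat_dist = NormalDist(P, sd_phat)

# Middle 95%: cut off 2.5% in each tail.
low = phat_dist.inv_cdf(0.025)
high = phat_dist.inv_cdf(0.975)

# Probability the poll reports fewer than 45%.
prob_below_45 = phat_dist.cdf(0.45)

print("SD of p-hat:", round(sd_phat, 4))
print("middle 95%:", round(low, 4), "to", round(high, 4))
print("P(p-hat < 0.45):", round(prob_below_45, 4))
```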

FRAPPY #4: “Candy Bars” The distribution of actual weights of 8-ounce chocolate bars produced by a certain machine is Normal with mean 8.1 ounces and standard deviation 0.1 ounces. Company managers do not want the weight of a chocolate bar to fall below 7.85 ounces, for fear that consumers will complain.

a. Find the probability that the weight of a randomly selected candy bar is less than 7.85 ounces.

b. Four candy bars are selected at random and their mean weight, x̄, is computed. Describe the center, shape, and spread of the sampling distribution of x̄.

c. Find the probability that the mean weight of the four candy bars is less than 7.85 ounces.

Day #8: Chapter 7 Test Review

FRAPPY #5: ELECTRICAL PROBLEMS: It is generally believed that electrical problems affect about 14% of all new cars. An automobile mechanic conducts diagnostic tests on 128 new cars on the lot.

a. Describe the sampling distribution for the sample proportion by stating its mean and standard deviation. Justify your answer. (Be sure to check for Normality and to address any conditions/assumptions.)

b. What is the probability that in this group of new cars, over 18% will be found to have electrical problems?

c. What is the probability that in this group of new cars, between 11% and 17% will be found to have electrical problems?
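
A "between" probability is the difference of two cumulative probabilities, which is easy to get wrong by hand. A short sketch, assuming the Normal approximation is appropriate (np = 17.92 and n(1 − p) = 110.08 are both at least 10) and treating the 128 cars as a random sample; the helper name `prob_between` is made up for this example.

```python
from math import sqrt
from statistics import NormalDist

P, N = 0.14, 128


def prob_between(low, high, p, n):
    """P(low < p-hat < high) under the Normal approximation."""
    dist = NormalDist(p, sqrt(p * (1 - p) / n))
    return dist.cdf(high) - dist.cdf(low)


sd_phat = sqrt(P * (1 - P) / N)
prob_over_18 = 1 - NormalDist(P, sd_phat).cdf(0.18)
prob_11_to_17 = prob_between(0.11, 0.17, P, N)

print("SD of p-hat:", round(sd_phat, 4))
print("P(p-hat > 0.18):", round(prob_over_18, 4))
print("P(0.11 < p-hat < 0.17):", round(prob_11_to_17, 4))
```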

FRAPPY #6: SALES TAX: A survey asks a random sample of 1500 adults in Ohio if they support an increase in the state sales tax from 5% to 6%, with the additional revenue going to education. Let p̂ denote the proportion in the sample who say they support the increase. Suppose that 40% of all adults in Ohio support the increase.

a. If p̂ is the proportion of the sample who support the increase, what is the mean of the sampling distribution of p̂?

b. What is the standard deviation of p̂?

c. Explain why you can use the formula for the standard deviation of p̂ in this setting.

d. Check that you can use the Normal approximation for the distribution of p̂.

FRAPPY #7: SODA: A certain beverage company is suspected of underfilling its cans of soft drink. The company advertises that its cans contain, on average, 12 ounces of soda with a standard deviation of 0.4 ounce.

a. Compute the probability that a random sample of 50 cans produces a sample mean fill of 11.9 ounces or less. (A sketch of the distribution is required.)

b. Suppose that each of the 25 students in a statistics class collects a random sample of 50 cans and calculates the mean number of ounces of soda. Describe the approximate shape of the distribution for these 25 values of x̄.

c. What important principle that we studied is used to answer the previous question?

FRAPPY #8: Television (from Strive Book): A television producer must schedule a selection of paid advertisements during each hour of programming. The lengths of the advertisements are Normally distributed with a mean of 28 seconds and a standard deviation of 5 seconds. During each hour of programming, 45 minutes are devoted to the program and 15 minutes are set aside for advertisements. To fill in the 15 minutes, the producer randomly selects 30 advertisements.

a. Describe the sampling distribution of the sample mean length for the random samples of 30 advertisements.

b. If 30 advertisements are randomly selected, what is the probability that the total time needed to air them will exceed the 15 minutes available? Show your work.
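
A question about a total can be turned into a question about a sample mean: 30 advertisements exceed 15 minutes (900 seconds) exactly when their mean length exceeds 900/30 = 30 seconds. The sketch below does the calculation both ways as a check, assuming the 30 lengths are independent; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA = 28.0, 5.0        # mean and SD of one advertisement (seconds)
N = 30                       # advertisements selected
TIME_AVAILABLE = 15 * 60     # 900 seconds

# Approach 1: work with the sample mean, x-bar ~ Normal(28, 5 / sqrt(30)).
mean_cutoff = TIME_AVAILABLE / N
prob_via_mean = 1 - NormalDist(MU, SIGMA / sqrt(N)).cdf(mean_cutoff)

# Approach 2: work with the total, which is Normal(30 * 28, 5 * sqrt(30)).
prob_via_total = 1 - NormalDist(N * MU, SIGMA * sqrt(N)).cdf(TIME_AVAILABLE)

print("mean cutoff (s):", mean_cutoff)
print("P(total > 900 s) via mean:", round(prob_via_mean, 4))
print("P(total > 900 s) via total:", round(prob_via_total, 4))
```

Both approaches give the same z-score, so the two probabilities agree.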