9.1 Power Point

8/8/2019 9.1 Power Point

1/33

Sampling Distributions: Section 9.1


2/33

Why Am I Here Again?

Statistics is the science (and art) of learning from data.

Remember from the first week of class that there are two basic kinds of statistics:

exploratory data analysis: an informal and open-ended examination of data for patterns

statistical inference: follows strict rules and focuses on judging whether the patterns you found are the sort you would expect


3/33

Does It Matter Which I Use?

Yes! Exploratory data analysis can be done with any data, but formal inference should only be used in certain situations. Although experts disagree about how widely statistical inference should be used, they all agree that inference is most secure when we produce data through random sampling or randomized comparative experiments. This is because, when we use chance to choose respondents or assign subjects, the laws of probability can answer the question "What would happen if we did this many, many times?"


4/33

Where We've Been

In chapters 1-4 we focused on exploratory data analysis, where we developed tools and strategies for organizing, describing, and analyzing data.

In chapter 5 we learned how to correctly collect or produce data through surveys, experiments, and observational studies.

In chapters 6-8 we learned about probability.


5/33

Where We're Going

The purpose of chapter 9 is to prepare us for the study of statistical inference (chapters 10-15) by looking at the probability distributions of some very common statistics: sample proportions and sample means.


6/33

Some Basic Vocabulary

Parameter: a number that describes the entire population. In statistical practice, this value is usually not known.

Statistic: a number that can be computed from the sample data. In practice, we often use a statistic to estimate an unknown parameter.


7/33

The Essence of the Matter

As long as we were just doing some basic data analysis, the distinction between statistics and parameters was not all that important, but now, as we get into inference, it is essential.

Remember:

The sample mean is x̄ (x-bar). This is a statistic.

The population mean is μ (mu). This is a parameter.


8/33

More Notation

If instead of the mean we are interested in the percent of people or things that have a certain characteristic, we use a proportion.

Remember:

The sample proportion is p̂ (p-hat). (statistic)

The population proportion is p. (parameter)


9/33

It All Varies

Each time we take a sample of things we expect to get a slightly different mean or proportion, even when sampling from the same population.

This basic fact is called sampling variability: the value of a statistic varies in repeated sampling.


10/33

Always Practice Safe Statistics

Because of sampling variability, we would never just collect data from a single sample and say that our sample statistic is equal to the population parameter. It may be close, but it may be very far off.

So how can we ever be sure that our sample statistic is a good estimator of our population parameter?

Well, we can't, but we will learn how to be pretty confident during the next few chapters of our book.


11/33

Customs Conundrum

In practice it is too difficult and expensive to take many samples from a population, so we can imitate the sampling by using a simulation.

Customs officials at the Guadalajara airport want to make sure that travelers do not bring illegal items into the country. They cannot afford to search everyone though, so they have each traveler press a button; green they go through, red they get searched. The officers claim that the probability the light shows green on any press of the button is 0.70.


12/33

Simulate an SRS

We can imitate the population with a table of random digits, such as Table B in our book, with each entry standing for a traveler. How can we simulate the results of the button pushes of the next 100 people in line?

Let's do this, starting at line 101.


13/33

If we continued this process through the first 100 digits, we would find that 71 of the 100 entries are 0 through 6. So our sample proportion of people who make it through Customs in Guadalajara is 0.71, or p̂ = 0.71.

If we carried out this process again using the next 100 digits, we would get a different result, p̂ = 0.62.

These two sample results are different, and neither is equal to the true population value p = 0.7. That's sampling variability!


14/33

Speedy Simulations

Simulations are very powerful tools in statistics because they allow us to study chance without physically collecting the data. Technology makes this even faster than using a random digit table.
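As a rough illustration of letting technology stand in for the random digit table, the sketch below repeats the Customs simulation in Python. It assumes that digits 0 through 6 mean a green light and 7 through 9 mean red; the function name and the seed are made up for this example, and the results will not match the Table B values.

```python
import random


def simulate_green_proportion(n_travelers, rng):
    """Simulate button presses and return the sample proportion of green lights.

    Each traveler is one random digit 0-9. Digits 0-6 count as green,
    which gives the claimed probability of 0.70 (an assumed digit assignment).
    """
    digits = [rng.randrange(10) for _ in range(n_travelers)]
    greens = sum(1 for digit in digits if digit <= 6)
    return greens / n_travelers


rng = random.Random(2019)  # arbitrary seed so the run can be repeated
first_p_hat = simulate_green_proportion(100, rng)
second_p_hat = simulate_green_proportion(100, rng)

print("first sample of 100, p-hat =", first_p_hat)
print("second sample of 100, p-hat =", second_p_hat)
```

Running it with a different seed gives different sample proportions, which is sampling variability again.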


15/33

A Picture is Worth 1000 SRSs

The distribution of the sample proportion of 1000 SRSs of size 100 drawn from a population with p = 0.7.


16/33

A Sampling Distribution

The histogram approximates the sampling distribution of p̂.


17/33

Strictly Speaking

The true sampling distribution for our Customs situation is the ideal histogram that would form when using all the possible samples of size 100 from our population. The histogram that was created using 1000 trials is only an approximation of the true sampling distribution.
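One way to build such an approximate sampling distribution yourself is sketched below: 1000 simulated samples of size 100 with p = 0.7, summarized with a crude text histogram. The constant names, the seed, and the bin width are choices made for this sketch only.

```python
import random
import statistics
from collections import Counter

P_GREEN = 0.70       # claimed probability of a green light
SAMPLE_SIZE = 100    # travelers per sample
NUM_SAMPLES = 1000   # number of simulated samples

rng = random.Random(91)  # arbitrary seed so the run can be repeated


def count_greens(n, p, rng):
    """Return how many of n simulated travelers get a green light."""
    return sum(1 for _ in range(n) if rng.random() < p)


green_counts = [count_greens(SAMPLE_SIZE, P_GREEN, rng) for _ in range(NUM_SAMPLES)]
p_hats = [count / SAMPLE_SIZE for count in green_counts]

# Group the counts into bins two counts wide (for example 70 and 71 greens).
bins = Counter((count // 2) * 2 for count in green_counts)
for start in sorted(bins):
    low = start / SAMPLE_SIZE
    high = (start + 1) / SAMPLE_SIZE
    # Each '#' stands for 5 samples.
    print(f"{low:.2f}-{high:.2f} | {'#' * (bins[start] // 5)}")

print("mean of the p-hats:", round(statistics.mean(p_hats), 3))
print("standard deviation of the p-hats:", round(statistics.stdev(p_hats), 3))
```

Raising NUM_SAMPLES brings the picture closer to the true sampling distribution, but it remains an approximation.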


18/33

Don't Table the Issue!

The probability distribution used to construct a random number table:

* Note: this is a probability distribution, not a sampling distribution!


19/33

An Actual Sampling Distribution

Consider the process of taking an SRS of size 2 from this population and computing the mean of the sample. We could perform a simulation many, many times and get an approximate sampling distribution. Since the data set is fairly small and calculating the mean is easy, we can instead construct the actual sampling distribution.
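The enumeration can be sketched in a few lines of Python. This sketch assumes that "this population" is the digits 0 through 9, each equally likely (the random number table distribution), and that a sample of size 2 is an ordered pair of digits that may repeat, which gives 100 equally likely samples. Both are assumptions, since the original figure is not reproduced here.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# ASSUMPTION: the population is the ten digits 0-9, all equally likely.
population = range(10)

# ASSUMPTION: a sample of size 2 is an ordered pair of digits, repeats allowed,
# so there are 10 * 10 = 100 equally likely samples.
sample_means = [Fraction(a + b, 2) for a, b in product(population, repeat=2)]

distribution = Counter(sample_means)
total = len(sample_means)

# The actual sampling distribution: each possible mean and its probability.
for mean in sorted(distribution):
    print(f"mean {float(mean):>3.1f}: probability {distribution[mean]}/{total}")

mean_of_means = sum(sample_means) / total
print("mean of the sampling distribution:", float(mean_of_means))
```

Because every possible sample is listed, nothing here is random: the output is the exact distribution under the stated assumptions, not an approximation.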


20/33

All Possible Means for n = 2


21/33

The Sampling Distribution of the Means


22/33

Continue CUSSing!

Whether we use probability to create the true sampling distribution of a statistic, or use a simulation to create an approximate sampling distribution, we can still describe the distribution that is created.


23/33

Let's Describe the Distributions

SRS w/ n = 100:

SRS w/ n = 1000:


24/33

Sneaky Scales


25/33

Randomization Rules!

The shape of the approximate sampling distributions that we just looked at is a result of random sampling. Non-random sampling would not give such regular and predictable results.

When randomization is used in a design for producing data, the statistics computed from the data have a definite pattern of behavior over many repetitions even though the result of a single repetition is uncertain.


26/33

Can You Really Trust a Statistic?

The fact that statistics calculated from random samples have definite sampling distributions allows a more careful answer to the question of how trustworthy a statistic is as an estimator of a parameter.


27/33

Can a Statistic Be Biased?

Yes! We have already discussed how sampling methods can be biased, but what does it mean if a statistic is biased?

The bias of a statistic is the difference between the parameter being estimated and the average value of the statistic used to estimate that parameter.


28/33

An Easier Definition:

* Being unbiased does not mean that each time I calculate an unbiased statistic it will exactly equal the population parameter. It will sometimes be larger and sometimes smaller.
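A small simulation can make this concrete. The sketch below reuses the Customs setting (p = 0.7, samples of size 100) and counts how often the sample proportion lands above, below, or exactly on the parameter; the number of samples and the seed are arbitrary choices for this example.

```python
import random
import statistics

TRUE_P = 0.70
SAMPLE_SIZE = 100
EXPECTED_GREENS = 70   # TRUE_P * SAMPLE_SIZE, kept as a whole number
NUM_SAMPLES = 2000     # arbitrary number of repetitions for this sketch

rng = random.Random(28)  # arbitrary seed so the run can be repeated


def count_greens(n, p, rng):
    """Return how many of n simulated travelers get a green light."""
    return sum(1 for _ in range(n) if rng.random() < p)


green_counts = [count_greens(SAMPLE_SIZE, TRUE_P, rng) for _ in range(NUM_SAMPLES)]

# Compare whole-number counts so there is no floating-point rounding issue.
above = sum(1 for count in green_counts if count > EXPECTED_GREENS)
below = sum(1 for count in green_counts if count < EXPECTED_GREENS)
exact = NUM_SAMPLES - above - below

p_hats = [count / SAMPLE_SIZE for count in green_counts]
estimated_bias = statistics.mean(p_hats) - TRUE_P

print("samples with p-hat above p:", above)
print("samples with p-hat below p:", below)
print("samples with p-hat exactly p:", exact)
print("average p-hat minus p:", round(estimated_bias, 4))
```

Most individual samples miss the parameter in one direction or the other, yet the average of the sample proportions sits very close to p.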


29/33

Is Bigger Really Better?

Yes! Larger samples are better than smaller samples because they are much more likely to produce an estimate close to the true parameter. This is because statistics from large random samples vary much less than statistics from small ones.
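To see the effect of sample size, the following sketch compares the spread of the sample proportion for samples of size 100 and size 1000, again with p = 0.7. The helper names, the 500 repetitions, and the seed are assumptions made for this illustration.

```python
import random
import statistics

TRUE_P = 0.70
NUM_SAMPLES = 500  # arbitrary number of repetitions per sample size

rng = random.Random(29)  # arbitrary seed so the run can be repeated


def sample_proportion(n, p, rng):
    """Simulate one sample of size n and return its sample proportion."""
    return sum(1 for _ in range(n) if rng.random() < p) / n


def spread_of_p_hat(n, num_samples, rng):
    """Return the standard deviation of p-hat over many simulated samples."""
    p_hats = [sample_proportion(n, TRUE_P, rng) for _ in range(num_samples)]
    return statistics.stdev(p_hats)


spread_small = spread_of_p_hat(100, NUM_SAMPLES, rng)
spread_large = spread_of_p_hat(1000, NUM_SAMPLES, rng)

print("spread of p-hat with n = 100: ", round(spread_small, 4))
print("spread of p-hat with n = 1000:", round(spread_large, 4))
```

The larger samples should give a noticeably smaller standard deviation, so an estimate from a sample of 1000 is much more likely to land close to p.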


30/33

The Variability of a Statistic


31/33

Why?

Well, since a statistic is calculated only from the sample data collected, it is only the size of the sample that affects the variability of the statistic. The size of the population doesn't have anything to do with that calculation. (Of course, this holds only for populations that are at least 10 times as big as the sample.)


32/33

If You're Going to San Francisco

The fact that the variability of sample results is controlled by the size of the sample has very important consequences for sampling design.

This means that a statistic from a sample of size 2500 from the population of the US (more than 300 million) is just as precise as one from a sample of size 2500 from the population of San Francisco (about 750,000). To obtain equally precise results, you must use equal sample sizes.
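This claim can be checked with a simulation. The sketch below draws SRSs of size 2500, without replacement, from two made-up populations of 750,000 and 300,000,000 individuals. In both, 70% of individuals are assumed to have some characteristic; that share, the function name, the 300 repetitions, and the seed are all hypothetical choices for this example.

```python
import random
import statistics


def spread_for_population(pop_size, n, num_samples, rng, share_percent=70):
    """Return the standard deviation of p-hat for SRSs of size n.

    Individuals are numbered 0 .. pop_size - 1. Those numbered below the
    cutoff are treated as having the characteristic (hypothetical 70% share).
    """
    cutoff = pop_size * share_percent // 100
    p_hats = []
    for _ in range(num_samples):
        srs = rng.sample(range(pop_size), n)  # sampling without replacement
        p_hats.append(sum(1 for person in srs if person < cutoff) / n)
    return statistics.stdev(p_hats)


rng = random.Random(2500)  # arbitrary seed so the run can be repeated

spread_sf = spread_for_population(750_000, 2500, 300, rng)
spread_us = spread_for_population(300_000_000, 2500, 300, rng)

print("spread of p-hat, population 750,000:    ", round(spread_sf, 4))
print("spread of p-hat, population 300,000,000:", round(spread_us, 4))
```

The two spreads should come out nearly the same even though one population is 400 times larger, because both samples have the same size.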


33/33

Bias and Variability on the Dartboard

If the bullseye is the true population parameter and the darts we throw are our sample statistics, which pictures represent high and low bias? High and low variability?
