8/8/2019 9.1 Power Point
Sampling Distributions: Section 9.1
Why Am I Here Again?
Statistics is the science (and art) of learning from data.
Remember from the first week of class that there are two basic kinds of statistics:
exploratory data analysis: an informal and open-ended examination of data for patterns
statistical inference: follows strict rules and focuses on judging whether the patterns you found are the sort you would expect
Does It Matter Which I Use?
Yes! Exploratory data analysis can be done with any data, but formal inference should only be used in certain situations. Although experts disagree about how widely statistical inference should be used, they all agree that inference is most secure when we produce data through random sampling or randomized comparative experiments. That is because, when we use chance to choose respondents or assign subjects, the laws of probability can answer the question "What would happen if we did this many, many times?"
Where We've Been
In chapters 1-4 we focused on exploratory data analysis, where we developed tools and strategies for organizing, describing, and analyzing data.
In chapter 5 we learned how to correctly collect or produce data through surveys, experiments, and observational studies.
In chapters 6-8 we learned about probability.
Where We're Going
The purpose of chapter 9 is to prepare us for the study of statistical inference (chapters 10-15) by looking at the probability distributions of some very common statistics: sample proportions and sample means.
Some Basic Vocabulary
Parameter: a number that describes the entire population. In statistics, this value is usually not known.
Statistic: a number that can be computed from the sample data. In practice, we often use a statistic to estimate an unknown parameter.
The Essence of the Matter
As long as we were just doing some basic data analysis, the distinction between statistics and parameters was not all that important, but now, as we get into inference, it is essential.
Remember:
The sample mean is x̄. This is a statistic.
The population mean is μ. This is a parameter.
More Notation
If instead of the mean we are interested in the percent of people or things that have a certain characteristic, we use a proportion.
Remember:
The sample proportion is p̂. (statistic)
The population proportion is p. (parameter)
It All Varies
Each time we take a sample, we expect to get a slightly different mean or proportion, even when sampling from the same population.
This basic fact is called sampling variability: the value of a statistic varies in repeated sampling.
Always Practice Safe Statistics
Because of sampling variability, we would never just collect data from a single sample and say that our sample statistic is equal to the population parameter. It may be close, but it may be very far off.
So how can we ever be sure that our sample statistic is a good estimator of our population parameter?
Well, we can't, but we will learn how to be pretty confident during the next few chapters of our book.
Customs Conundrum
In practice it is too difficult and expensive to take many samples from a population, so we can imitate the sampling by using a simulation.
Customs officials at the Guadalajara airport want to make sure that travelers do not bring illegal items into the country. They cannot afford to search everyone, though, so they have each traveler press a button: on green they go through, on red they get searched. The officers claim that the probability the light shows green on any press of the button is 0.70.
Simulate an SRS
We can imitate the population with a table of random digits, such as Table B in our book, with each entry standing for a traveler. How can we simulate the results of the button pushes of the next 100 people in line?
Let's do this, starting at line 101.
If we continued this process through the first 100 digits, we would find that 71 of the 100 entries are 0 through 6. So our sample proportion of people who make it through Customs in Guadalajara is 0.71, or p̂ = 0.71.
If we carried out this process again using the next 100 digits, we would get a different result, p̂ = 0.62.
These two sample results are different, and neither is equal to the true population value p = 0.7. That's sampling variability!
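The random digit procedure can also be sketched in code. This is only an illustration, not the method used on the slides: the function name simulate_customs and the seeds are made up for the example, and Python's random number generator stands in for Table B, so the results will not match the 0.71 and 0.62 read from the table.

```python
import random

def simulate_customs(n_travelers=100, seed=None):
    """Simulate button presses using random digits.

    Digits 0-6 stand for a green light (probability 0.7);
    digits 7-9 stand for a red light.
    """
    rng = random.Random(seed)  # seed is only there to make a run repeatable
    digits = [rng.randint(0, 9) for _ in range(n_travelers)]
    greens = sum(1 for d in digits if d <= 6)
    return greens / n_travelers

# Two separate runs of 100 travelers each, like reading two
# different stretches of a random digit table.
p_hat_first = simulate_customs(100, seed=1)
p_hat_second = simulate_customs(100, seed=2)
print(p_hat_first, p_hat_second)
```

Each run gives its own sample proportion, and neither has to equal 0.7.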
Speedy Simulations
Simulations are very powerful tools in statistics because they allow us to study chance without physically collecting the data. Technology makes this even faster than using a random digit table.
A Picture is Worth 1000 SRSs
The distribution of the sample proportion for 1000 SRSs of size 100 drawn from a population with p = 0.7.
A Sampling Distribution
The histogram approximates the sampling distribution of p̂.
Strictly Speaking
The true sampling distribution for our Customs situation is the ideal histogram that would form when using all the possible samples of size 100 from our population. The histogram that was created using 1000 trials is only an approximation of the true sampling distribution.
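An approximate sampling distribution like this one can be built with a short simulation. The sketch below assumes each traveler independently gets green with probability 0.7; the helper name sample_proportion and the seed 2019 are arbitrary choices for illustration, and the text histogram is only a rough stand-in for a plotted one.

```python
import random
import statistics
from collections import Counter

def sample_proportion(n, p, rng):
    """Proportion of 'green' results in one simulated sample of size n."""
    return sum(rng.random() < p for _ in range(n)) / n

rng = random.Random(2019)  # arbitrary seed so the run is repeatable
p_hats = [sample_proportion(100, 0.7, rng) for _ in range(1000)]

# Crude text histogram: one row per observed value of p-hat.
counts = Counter(p_hats)
for value in sorted(counts):
    print(f"{value:.2f} {'#' * counts[value]}")

print("center:", round(statistics.mean(p_hats), 3))
print("spread:", round(statistics.stdev(p_hats), 3))
```

Running it again with a different seed gives a slightly different histogram, which is exactly why it is only an approximation.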
Don't Table the Issue!
The probability distribution used to construct a random number table:
* Note: this is a probability distribution, not a sampling distribution!
An Actual Sampling Distribution
Consider the process of taking an SRS of size 2 from this population and computing the mean of the sample. We could perform a simulation many, many times and get an approximate sampling distribution. Since the data set is fairly small and calculating the mean is easy, we can instead construct the actual sampling distribution.
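Constructing the actual sampling distribution amounts to listing every possible sample and its mean. The sketch below rests on two assumptions that are not stated in the text: that the population is the ten digits 0 through 9, each equally likely (as in a random number table), and that the two draws are independent, so the same digit can come up twice. Under a different population or sampling scheme the numbers would change.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# ASSUMPTION: the population is the ten digits 0-9, each equally likely,
# and the two draws are independent (the same digit may appear twice).
population = range(10)

pairs = list(product(population, repeat=2))  # all 100 ordered samples
mean_counts = Counter(Fraction(a + b, 2) for a, b in pairs)

# Exact probability of each possible sample mean.
sampling_dist = {m: Fraction(c, len(pairs))
                 for m, c in sorted(mean_counts.items())}

for m, prob in sampling_dist.items():
    print(f"mean {float(m):.1f}: probability {float(prob):.2f}")

# Center of the sampling distribution (the mean of all the sample means).
center = sum(m * prob for m, prob in sampling_dist.items())
print("center:", float(center))
```

No simulation is involved here: every one of the 100 possible samples is counted once, so the result is the true distribution under the stated assumptions rather than an approximation.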
All Possible Means for n = 2
The Sampling Distribution of the Means
Continue CUSSing!
Whether we use probability to create the true sampling distribution of a statistic, or use a simulation to create an approximate sampling distribution, we can still describe the distribution that is created.
Let's Describe the Distributions
SRS w/ n = 100:
SRS w/ n = 1000:
Sneaky Scales
Randomization Rules!
The shape of the approximate sampling distributions that we just looked at is a result of random sampling. Non-random sampling would not give such regular and predictable results. When randomization is used in a design for producing data, the statistics computed from the data have a definite pattern of behavior over many repetitions, even though the result of a single repetition is uncertain.
Can You Really Trust a Statistic?
The fact that statistics calculated from random samples have definite sampling distributions allows a more careful answer to the question of how trustworthy a statistic is as an estimator of a parameter.
Can a Statistic Be Biased?
Yes! We have already discussed how sampling methods can be biased, but what does it mean if a statistic is biased?
The bias of a statistic is the difference between the parameter being estimated and the average value of the statistic used to estimate that parameter (that is, the mean of its sampling distribution).
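For the Customs example, this difference can be worked out exactly instead of simulated. The sketch below assumes 100 independent button presses with p = 0.7, so the number of greens follows a binomial distribution; the function name expected_p_hat is made up for the illustration.

```python
from math import comb

def expected_p_hat(n, p):
    """Mean of the sampling distribution of p-hat for n independent trials."""
    return sum((k / n) * comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1))

p = 0.7
n = 100
average_p_hat = expected_p_hat(n, p)
bias = p - average_p_hat  # parameter minus average value of the statistic
print(f"average p-hat = {average_p_hat:.6f}, bias = {bias:.6f}")
```

Up to rounding error the bias comes out as zero, which is what it means to say that the sample proportion is an unbiased estimator of p in this setting.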
An Easier Definition:
* That does not mean that each time I calculate an unbiased statistic it will exactly equal the population parameter. It will sometimes be larger and sometimes smaller.
Is Bigger Really Better?
Yes! Larger samples are better than smaller samples because they are much more likely to produce an estimate close to the true parameter. This is because statistics from large samples have much less variability than statistics from small samples.
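A quick simulation makes the difference in variability visible. This is a sketch under assumed settings (p = 0.7 as in the Customs example, 500 repetitions, a fixed seed); the function name spread_of_p_hat is hypothetical.

```python
import random
import statistics

def spread_of_p_hat(n, p=0.7, repetitions=500, seed=0):
    """Standard deviation of p-hat across many simulated samples of size n."""
    rng = random.Random(seed)
    p_hats = [sum(rng.random() < p for _ in range(n)) / n
              for _ in range(repetitions)]
    return statistics.stdev(p_hats)

spread_small = spread_of_p_hat(100)
spread_large = spread_of_p_hat(1000)
print(f"n = 100:  spread about {spread_small:.3f}")
print(f"n = 1000: spread about {spread_large:.3f}")
```

The sample proportions from samples of size 1000 cluster much more tightly around 0.7 than those from samples of size 100.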
The Variability of a Statistic
Why?
Well, since a statistic is only calculated from the sample data collected, it is only the size of the sample that you collect that affects the statistic. The size of the population doesn't have anything to do with that calculation. (Of course, this is only true for populations that are at least 10 times as big as the sample.)
If You're Going to San Francisco
The fact that the variability of sample results is controlled by the size of the sample has very important consequences for sampling design. This means that a statistic from a sample of size 2500 from the population of the US (more than 300 million) is just as precise as one from a sample of size 2500 from the population of San Francisco (about 750,000). To obtain equally precise results, you must use equal sample sizes.
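The claim can be checked with the standard formula for the standard deviation of a sample proportion from an SRS drawn without replacement, which includes a finite population correction. The formula is not on the slides, and the value p = 0.5 and the function name sd_of_p_hat are assumptions chosen only for illustration.

```python
from math import sqrt

def sd_of_p_hat(n, population_size, p=0.5):
    """Standard deviation of p-hat for an SRS of size n drawn without
    replacement from a population of the given size."""
    # Finite population correction: close to 1 when the population
    # is much larger than the sample.
    correction = sqrt((population_size - n) / (population_size - 1))
    return sqrt(p * (1 - p) / n) * correction

sd_us = sd_of_p_hat(2500, 300_000_000)
sd_sf = sd_of_p_hat(2500, 750_000)
print(f"US: {sd_us:.5f}")
print(f"San Francisco: {sd_sf:.5f}")
```

The two values differ by well under 1 percent, even though one population is about 400 times the size of the other. The correction only starts to matter when the sample is a sizable fraction of the population, which is where the "at least 10 times as big" condition comes from.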
Bias and Variability on the Dartboard
If the bullseye is the true population parameter and the arrows we throw are our sample statistics, which pictures represent high and low bias? High and low variability?