9.1 Power Point

  • 8/8/2019 9.1 Power Point

    Sampling Distributions: Section 9.1

    Why Am I Here Again?

    Statistics is the science (and art) of learning from data.

    Remember from the first week of class that there are two basic kinds of statistics:

    exploratory data analysis: an informal and open-ended examination of data for patterns

    statistical inference: follows strict rules and focuses on judging whether the patterns you found are the sort you would expect

    Does It Matter Which I Use?

    Yes! Exploratory data analysis can be done with any data, but formal inference should only be used in certain situations. Although experts disagree about how widely statistical inference should be used, they all agree that inference is most secure when we produce data through random sampling or randomized comparative experiments. When we use chance to choose respondents or assign subjects, the laws of probability can answer the question "What would happen if we did this many, many times?"

    Where We've Been

    In chapters 1-4 we focused on exploratory data analysis, where we developed tools and strategies for organizing, describing, and analyzing data.

    In chapter 5 we learned how to correctly collect or produce data through surveys, experiments, and observational studies.

    In chapters 6-8 we learned about probability.

    Where We're Going

    The purpose of chapter 9 is to prepare us for the study of statistical inference (chapters 10-15) by looking at the probability distributions of some very common statistics: sample proportions and sample means.

    Some Basic Vocabulary

    Parameter: a number that describes the entire population. In statistical practice this value is almost never known.

    Statistic: a number that can be computed from the sample data. In practice, we often use a statistic to estimate an unknown parameter.

    The Essence of the Matter

    As long as we were just doing some basic data analysis, the distinction between statistics and parameters was not all that important, but now, as we get into inference, it is essential.

    Remember:

    The sample mean is x̄. This is a statistic.

    The population mean is μ. This is a parameter.

    More Notation

    If instead of the mean we are interested in the percent of people or things that have a certain characteristic, we use a proportion.

    Remember:

    The sample proportion is p̂. (statistic)

    The population proportion is p. (parameter)

    It All Varies

    Each time we take a sample of things we expect to get a slightly different mean or proportion, even when sampling from the same population.

    This basic fact is called sampling variability: the value of a statistic varies in repeated sampling.

    Always Practice Safe Statistics

    Because of sampling variability, we would never just collect data from a single sample and say that our sample statistic is equal to the population parameter. It may be close, but it may be very far off.

    So how can we ever be sure that our sample statistic is a good estimator of our population parameter?

    Well, we can't, but we will learn how to be pretty confident during the next few chapters of our book.

    Customs Conundrum

    In practice it is too difficult and expensive to take many samples from a population, so we can imitate the sampling by using a simulation.

    Customs officials at the Guadalajara airport want to make sure that travelers do not bring illegal items into the country. They cannot afford to search everyone, though, so they have each traveler press a button: if the light shows green, they go through; if it shows red, they get searched. The officers claim that the probability the light shows green on any press of the button is 0.70.

    Simulate an SRS

    We can imitate the population with a table of random digits, such as Table B in our book, with each entry standing for a traveler. How can we simulate the results of the button pushes of the next 100 people in line?

    Let's do this, starting at line 101.

    If we continued this process through the first 100 digits, we would find that 71 of the 100 entries are 0 through 6. So our sample proportion of people who make it through Customs in Guadalajara is 0.71, or p̂ = 0.71.

    If we carried out this process again using the next 100 digits, we would get a different result, p̂ = 0.62.

    These two sample results are different, and neither is equal to the true population value p = 0.7. That's sampling variability!
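    The random digit procedure can also be imitated in software. The following is a minimal Python sketch, not the method used in class. It assumes that digits 0 through 6 stand for a green light, and it uses a hypothetical helper name (simulate_sample_proportion) and an arbitrary seed. Its digits will not match Table B, so its results will generally differ from 0.71 and 0.62.

```python
import random

def simulate_sample_proportion(n, rng):
    """Draw n random digits (0-9) and return the fraction that are 0-6 (green)."""
    digits = [rng.randrange(10) for _ in range(n)]
    greens = sum(1 for d in digits if d <= 6)
    return greens / n

rng = random.Random(101)  # arbitrary seed (an assumption) so the run is repeatable
first = simulate_sample_proportion(100, rng)   # the first 100 travelers
second = simulate_sample_proportion(100, rng)  # the next 100 travelers
print(f"p-hat for sample 1: {first:.2f}")
print(f"p-hat for sample 2: {second:.2f}")
```

    Running it with different seeds gives different values of p̂, which is sampling variability in action.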

    Speedy Simulations

    Simulations are very powerful tools in statistics because they allow us to study chance without physically collecting the data. Technology makes this even faster than using a random digit table.

    A Picture Is Worth 1000 SRSs

    The distribution of the sample proportion of 1000 SRSs of size 100 drawn from a population with p = 0.7.
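    A picture like this one can be approximated with a short simulation. The sketch below is an illustration under stated assumptions, not the software used to make the slide: the seed, the constant names, and the text histogram (one star per 5 samples) are all choices made here.

```python
import random
import statistics
from collections import Counter

P_GREEN = 0.7       # population proportion from the Customs example
SAMPLE_SIZE = 100   # travelers per sample
NUM_SAMPLES = 1000  # number of simulated SRSs

rng = random.Random(2019)  # arbitrary seed (an assumption)

def sample_proportion(n, p, rng):
    """Fraction of n simulated travelers who get a green light."""
    return sum(rng.random() < p for _ in range(n)) / n

p_hats = [sample_proportion(SAMPLE_SIZE, P_GREEN, rng) for _ in range(NUM_SAMPLES)]

print(f"mean of p-hat values: {statistics.mean(p_hats):.3f}")
print(f"std. dev. of p-hat values: {statistics.stdev(p_hats):.3f}")

# Crude text histogram: one '*' per 5 samples at each observed p-hat value.
counts = Counter(round(p_hat, 2) for p_hat in p_hats)
for value in sorted(counts):
    print(f"{value:.2f} {'*' * (counts[value] // 5)}")
```

    The printed histogram should be roughly mound-shaped and centered near 0.7, though the exact bars depend on the seed.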

    A Sampling Distribution

    The histogram approximates the sampling distribution of p̂.

    Strictly Speaking

    The true sampling distribution for our Customs situation is the ideal histogram that would form when using all the possible samples of size 100 from our population. The histogram that was created using 1000 trials is only an approximation of the true sampling distribution.
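    For the Customs situation the ideal histogram can be computed rather than simulated, if we are willing to assume that each button press is independent and shows green with probability 0.7. Under that assumption (which the slides do not state explicitly) the number of greens among 100 travelers follows a binomial distribution. The sketch below uses Python's math.comb, available in Python 3.8 and later.

```python
from math import comb

P_GREEN = 0.7
N = 100

# Exact probability that exactly k of the N travelers see green,
# assuming independent button presses (binomial model).
exact = {k: comb(N, k) * P_GREEN**k * (1 - P_GREEN)**(N - k) for k in range(N + 1)}

# Mean of the sampling distribution of p-hat = k / N.
mean_p_hat = sum((k / N) * prob for k, prob in exact.items())

print(f"total probability: {sum(exact.values()):.6f}")
print(f"mean of the sampling distribution: {mean_p_hat:.4f}")
print(f"P(p-hat = 0.70): {exact[70]:.4f}")
```

    Comparing these exact probabilities with the bar heights from a 1000-trial simulation shows how close the approximation is.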

    Don't Table the Issue!

    * Note: this is a probability distribution, not a sampling distribution!

    The probability distribution used to construct a random number table:

    An Actual Sampling Distribution

    Consider the process of taking an SRS of size 2 from this population and computing the mean of the sample. We could perform a simulation many, many times and get an approximate sampling distribution. Since the data set is fairly small and calculating the mean is easy, we can instead construct the actual sampling distribution.
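    The enumeration can be sketched in a few lines of Python. The population figure did not survive in this copy of the slides, so the code assumes the population is the digits 0 through 9, each equally likely (the random number table distribution), and that the two digits are drawn independently, so repeats are allowed. Both are assumptions made here for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = range(10)  # assumed population: digits 0-9, each equally likely

# Every ordered pair of digits is one equally likely sample of size 2.
samples = list(product(population, repeat=2))
mean_counts = Counter(Fraction(a + b, 2) for a, b in samples)

# Probability of each possible sample mean.
sampling_dist = {m: Fraction(c, len(samples)) for m, c in sorted(mean_counts.items())}

for m, prob in sampling_dist.items():
    print(f"mean = {float(m):.1f}  probability = {float(prob):.2f}")

center = sum(m * prob for m, prob in sampling_dist.items())
print(f"mean of the sampling distribution: {float(center)}")
```

    Because every possible sample is listed, the result is the actual sampling distribution under these assumptions rather than an approximation.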

    All Possible Means for n = 2

    The Sampling Distribution of the Means

    Continue CUSSing!

    Whether we use probability to create the true sampling distribution of a statistic, or use a simulation to create an approximate sampling distribution, we can still describe the distribution that is created.

    Let's Describe the Distributions

    SRS w/ n = 100:

    SRS w/ n = 1000:

    Sneaky Scales

    Randomization Rules!

    The shapes of the approximate sampling distributions that we just looked at are a result of random sampling. Non-random sampling would not give such regular and predictable results.

    When randomization is used in a design for producing data, the statistics computed from the data have a definite pattern of behavior over many repetitions, even though the result of a single repetition is uncertain.

    Can You Really Trust a Statistic?

    The fact that statistics calculated from random samples have definite sampling distributions allows a more careful answer to the question of how trustworthy a statistic is as an estimator of a parameter.

    Can a Statistic Be Biased?

    Yes! We have already discussed how sampling methods can be biased, but what does it mean if a statistic is biased?

    The bias of a statistic is the difference between the average value of the statistic used to estimate a parameter (the mean of its sampling distribution) and the parameter being estimated.

    An Easier Definition:

    * That does not mean that each time I calculate an unbiased statistic it will exactly equal the population parameter. It will sometimes be larger and sometimes smaller.
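    One way to see both points at once is to estimate the bias of p̂ by simulation. The sketch below reuses the Customs setting (p = 0.7, samples of size 100). The seed and the number of repetitions are arbitrary choices made here, and the simulated bias is only an estimate of the true bias.

```python
import random
import statistics

P_TRUE = 0.7
SAMPLE_SIZE = 100
NUM_SAMPLES = 10_000

rng = random.Random(9)  # arbitrary seed (an assumption)

# Number of green lights in each simulated sample of 100 travelers.
green_counts = [
    sum(rng.random() < P_TRUE for _ in range(SAMPLE_SIZE))
    for _ in range(NUM_SAMPLES)
]
p_hats = [count / SAMPLE_SIZE for count in green_counts]

# Estimated bias: average of the statistic minus the parameter.
estimated_bias = statistics.mean(p_hats) - P_TRUE

# How often a single p-hat lands exactly on, above, or below p = 0.70.
share_exact = sum(count == 70 for count in green_counts) / NUM_SAMPLES
share_above = sum(count > 70 for count in green_counts) / NUM_SAMPLES
share_below = sum(count < 70 for count in green_counts) / NUM_SAMPLES

print(f"estimated bias of p-hat: {estimated_bias:+.4f}")
print(f"share exactly 0.70: {share_exact:.3f}")
print(f"share above 0.70:   {share_above:.3f}")
print(f"share below 0.70:   {share_below:.3f}")
```

    The estimated bias should be close to zero even though most individual samples miss 0.70, some high and some low.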

    Is Bigger Really Better?

    Yes! Larger samples are better than smaller samples because they are much more likely to produce an estimate close to the true parameter.

    This is because large samples have much less variability than small samples.
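    The effect of sample size on variability can be checked with a quick simulation. This sketch compares samples of size 100 and 1000 from a population with p = 0.7. The seed, the number of repetitions, and the helper name spread_of_p_hat are assumptions made for illustration.

```python
import random
import statistics

P_TRUE = 0.7
NUM_SAMPLES = 1000  # simulated samples per sample size

rng = random.Random(33)  # arbitrary seed (an assumption)

def spread_of_p_hat(n):
    """Standard deviation of p-hat across NUM_SAMPLES simulated samples of size n."""
    p_hats = [
        sum(rng.random() < P_TRUE for _ in range(n)) / n
        for _ in range(NUM_SAMPLES)
    ]
    return statistics.stdev(p_hats)

sd_100 = spread_of_p_hat(100)
sd_1000 = spread_of_p_hat(1000)

print(f"spread of p-hat, n = 100:  {sd_100:.4f}")
print(f"spread of p-hat, n = 1000: {sd_1000:.4f}")
print(f"ratio: {sd_100 / sd_1000:.2f}")
```

    The ratio should come out near the square root of 10 (about 3.2): multiplying the sample size by 10 shrinks the spread by roughly that factor, not by a factor of 10.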

    The Variability of a Statistic

    Why?

    Well, since a statistic is calculated only from the sample data collected, it is only the size of the sample that affects the statistic. The size of the population doesn't have anything to do with that calculation. (Of course, this holds only for populations that are at least 10 times as big as the sample.)

    If You're Going to San Francisco

    The fact that the variability of sample results is controlled by the size of the sample has very important consequences for sampling design.

    This means that a statistic from a sample of size 2500 from the population of the US (more than 300 million) is just as precise as a statistic from a sample of size 2500 from the population of San Francisco (about 750,000). To obtain equally precise results, you must use equal sample sizes.
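    The comparison can be made concrete with the standard formula for the standard deviation of a sample proportion from an SRS drawn without replacement, which multiplies the usual square root of p(1 - p)/n by the finite population correction, the square root of (N - n)/(N - 1). The slides do not give this formula, and the value p = 0.5 below is a hypothetical choice for illustration; the population sizes are the rounded figures quoted above.

```python
from math import sqrt

def sd_of_p_hat(p, n, population_size):
    """Standard deviation of p-hat for an SRS of size n drawn without replacement."""
    correction = sqrt((population_size - n) / (population_size - 1))
    return sqrt(p * (1 - p) / n) * correction

P_ASSUMED = 0.5   # hypothetical population proportion, for illustration only
N_SAMPLE = 2500

sd_us = sd_of_p_hat(P_ASSUMED, N_SAMPLE, 300_000_000)  # United States
sd_sf = sd_of_p_hat(P_ASSUMED, N_SAMPLE, 750_000)      # San Francisco

print(f"US:            {sd_us:.6f}")
print(f"San Francisco: {sd_sf:.6f}")
print(f"relative difference: {abs(sd_us - sd_sf) / sd_us:.4%}")
```

    The two standard deviations differ by a fraction of a percent, which is why the population size can be ignored once the population is much larger than the sample.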

    Bias and Variability on the Dartboard

    If the bullseye is the true population parameter and the darts we throw are our sample statistics, which pictures represent high and low bias? High and low variability?