10.1 Power Point

  • 8/7/2019 10.1 Power Point

    1/17

    Confidence Intervals: The Basics

    Section 10.1

  • 8/7/2019 10.1 Power Point

    2/17

    How long can you expect a AAA battery to last? What proportion of college undergraduates have engaged in binge drinking? Is caffeine dependence real?

    These are the types of questions that we would like to be able to answer, but it just isn't practical to ask or experiment on every battery, college undergrad, or caffeine-addicted person.

    Instead we select a sample and collect data from those individuals only. The goal is to infer from the sample data some conclusion about the population.

  • 8/7/2019 10.1 Power Point

    3/17

    We cannot ever be certain that our conclusions are correct, since a different sample would generally lead to a different conclusion.

    Statistical inference uses the language of probability to express the strength of our conclusions. Probability allows us to take chance variation into account and to correct our judgment using calculations.

  • 8/7/2019 10.1 Power Point

    4/17

    In the Next Two Chapters

    We will study the two most common methods

    of statistical inference, confidence intervals

    and significance tests.

    Both are based on the sampling distributions

    of statistics.

    That is, both report probabilities that state

    what would happen if we used the inference

    method many, many times.

  • 8/7/2019 10.1 Power Point

    5/17

    Your Data is Only as Good as Your

    Collection Methods

    The methods of formal inference require the long-run, regular behavior that probability describes.

    Inference is most reliable when the data are produced by a properly randomized design. If this isn't done, your conclusions will be open to challenge.

    Formal inference cannot remedy basic flaws in producing data, such as voluntary response samples and uncontrolled experiments. Use the common sense you acquired over the first nine chapters and proceed to formal inference only when you are satisfied that the data deserve such analysis.

  • 8/7/2019 10.1 Power Point

    6/17

    Let's Pretend

    To begin learning about the methods of inference, we will pretend we know the true population standard deviation $\sigma$, although we would never actually know that value without knowing $\mu$. Once we know the basic ideas, we will get rid of this unrealistic requirement.

    There are libraries full of more elaborate statistical techniques than we will use, but informed use of any of these methods requires an understanding of the underlying reasoning.

    A computer or calculator can do the arithmetic, but we must still exercise judgment based on understanding.

  • 8/7/2019 10.1 Power Point

    7/17

    Hey Baby, What's Your IQ?

    Big City University would like to find the average IQ of its 5000 freshmen. Having each one take an IQ test would be difficult and expensive, though, so the university decides to administer the test to an SRS of 50 freshmen. The university finds that the mean IQ score for this sample is $\bar{x} = 112$.

    Let's ponder a few questions.

  • 8/7/2019 10.1 Power Point

    8/17

    Find Your Exact Match! Really?

    Is the mean IQ score $\mu$ of all Big City University freshmen exactly 112?

    Probably not. But the law of large numbers tells us that the sample mean from a large SRS will be close to the unknown population parameter. Because $\bar{x} = 112$, we guess that $\mu$ is somewhere around 112.

    How close to 112 is $\mu$ likely to be?

    Well, to answer this we must ask another question.
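The law of large numbers claim above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population mean of 110 is a made-up stand-in (the slides never reveal $\mu$), the spread of 15 is the value the slides later treat as known, and scores are drawn from a Normal distribution rather than from a real roster of 5000 freshmen.

```python
import random
import statistics

ASSUMED_MU = 110   # HYPOTHETICAL population mean, chosen only for this demo
SIGMA = 15         # population standard deviation the slides later treat as known


def sample_mean(n, rng):
    """Draw n simulated IQ scores and return their mean (x-bar)."""
    return statistics.fmean([rng.gauss(ASSUMED_MU, SIGMA) for _ in range(n)])


rng = random.Random(0)  # fixed seed so the demo is repeatable
for n in (10, 100, 1_000, 10_000):
    x_bar = sample_mean(n, rng)
    error = abs(x_bar - ASSUMED_MU)
    print(f"n = {n:>6}: x-bar = {x_bar:7.2f}, distance from mu = {error:.2f}")
```

With larger samples the distance between the sample mean and the assumed population mean tends to shrink, which is why a sample mean of 112 suggests that $\mu$ is somewhere near 112.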

  • 8/7/2019 10.1 Power Point

    9/17

    We ask, "How would the sample mean vary if we took many, many samples of 50 freshmen from this same population?"

    The sampling distribution of $\bar{x}$ describes how the values of $\bar{x}$ vary in repeated samples. Remember from last chapter:

    The mean of the sampling distribution of $\bar{x}$ is the same as the unknown mean $\mu$ of the entire population.

    The standard deviation of $\bar{x}$ for an SRS of 50 freshmen is $\sigma/\sqrt{50}$, where $\sigma$ is the standard deviation of the IQ scores from all Big City University freshmen. Suppose we know that the IQ scores have standard deviation $\sigma = 15$. Then the standard deviation of $\bar{x}$ is

    $\sigma/\sqrt{50} = 15/\sqrt{50} \approx 2.1$.

    The central limit theorem tells us that the mean of 50 scores has a distribution that is close to Normal.
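The standard deviation of $\bar{x}$ can be checked numerically. The sketch below computes $\sigma/\sqrt{n}$ directly and then compares it with the spread of many simulated sample means. The population mean of 110 and the Normal shape of the scores are assumptions made only for this illustration, and the simulation samples from an idealized infinite population rather than from 5000 actual freshmen.

```python
import math
import random
import statistics

SIGMA = 15   # population standard deviation, as given in the slides
N = 50       # sample size (an SRS of 50 freshmen)

# Theoretical standard deviation of the sample mean: sigma / sqrt(n)
sd_xbar = SIGMA / math.sqrt(N)

# Simulation check. ASSUMPTION: scores are Normal with mean 110.
# The slides state neither mu nor the population shape.
ASSUMED_MU = 110
rng = random.Random(0)
sample_means = [
    statistics.fmean([rng.gauss(ASSUMED_MU, SIGMA) for _ in range(N)])
    for _ in range(10_000)
]
simulated_sd = statistics.stdev(sample_means)

print(f"theoretical sd of x-bar: {sd_xbar:.2f}")
print(f"simulated sd of x-bar:   {simulated_sd:.2f}")
```

The theoretical value is about 2.12, which the slides round to 2.1. The simulated value should land close to it, since it estimates the same quantity from repeated samples.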

  • 8/7/2019 10.1 Power Point

    10/17

    Put It All Together and What Do

    You Have?

    Putting these facts together gives us the

    reasoning of statistical estimation in a nutshell for

    this example:

    1. To estimate the unknown population mean $\mu$, use the mean $\bar{x}$ of the SRS.

    2. Although $\bar{x}$ is an unbiased estimator of $\mu$, it will rarely be exactly equal to $\mu$, so our estimate has some error.

    3. In repeated samples, the values of $\bar{x}$ follow an approximately Normal distribution with mean $\mu$ and standard deviation 2.1.

  • 8/7/2019 10.1 Power Point

    11/17

    4. The 68-95-99.7% Rule says that in about 95% of all samples, the mean IQ score $\bar{x}$ for the sample will be within 4.2 (two standard deviations) of the population mean $\mu$.

    5. Whenever $\bar{x}$ is within 4.2 points of $\mu$, $\mu$ is within 4.2 points of $\bar{x}$. This happens in about 95% of all samples. So the unknown population parameter $\mu$ lies between $\bar{x} - 4.2$ and $\bar{x} + 4.2$ in about 95% of all samples.

    6. So if we estimate that $\mu$ lies somewhere in the interval $112 - 4.2 = 107.8$ to $112 + 4.2 = 116.2$, we would be calculating this interval using a method that captures the true $\mu$ in about 95% of all possible samples.
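The arithmetic in steps 4 through 6 can be written as a short function. This is a minimal sketch: the function name `two_sd_interval` is invented for this illustration, and it uses the "two standard deviations" rule of thumb from the 68-95-99.7% Rule, as the slides do.

```python
import math


def two_sd_interval(x_bar, sigma, n):
    """Return (low, high) for x_bar +/- 2 * sigma / sqrt(n)."""
    margin = 2 * sigma / math.sqrt(n)   # two standard deviations of x-bar
    return x_bar - margin, x_bar + margin


# Values from the Big City University example
low, high = two_sd_interval(x_bar=112, sigma=15, n=50)
print(f"{low:.1f} to {high:.1f}")   # 107.8 to 116.2
```

The unrounded margin is about 4.24 rather than 4.2, because the slides round the standard deviation of $\bar{x}$ to 2.1 before doubling it. The endpoints agree with the slides to one decimal place.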

  • 8/7/2019 10.1 Power Point

    12/17

    The Big Picture - The big idea is that the sampling distribution of $\bar{x}$ tells us how big the error is likely to be when we use $\bar{x}$ as an estimate for $\mu$.

  • 8/7/2019 10.1 Power Point

    13/17

    What Was That Again?

    We have just learned that in about 95% of all samples of 50 Big City University freshmen, the interval $\bar{x} \pm 4.2$ will contain the true population mean $\mu$.

    The language of statistical inference uses this fact about what would happen in many samples to express our confidence in the results of any one sample.

  • 8/7/2019 10.1 Power Point

    14/17

    So To Finally Answer the

    Question

    Earlier we asked, "How close to 112 is $\mu$ likely to be?" The resulting interval is $112 \pm 4.2$, which can be written as (107.8, 116.2). We can now say, "We are 95% confident that the unknown mean IQ score for all Big City University freshmen is between 107.8 and 116.2."

    Remember this phrasing because we will use

    it every time we create a confidence interval.

  • 8/7/2019 10.1 Power Point

    15/17

    Be Careful!

    Be sure that you understand the basis for our

    confidence. There are only two possibilities:

    1. The interval (107.8, 116.2) contains the true $\mu$.

    2. Our SRS was one of the few samples for which $\bar{x}$ is not within 4.2 points of the true $\mu$. (Only 5% of all samples give such inaccurate results.)

    We cannot know whether our sample is one of the unlucky 5%. The phrase "we are 95% confident" is shorthand for saying, "We got these numbers by a method that gives correct results 95% of the time."

  • 8/7/2019 10.1 Power Point

    16/17

  • 8/7/2019 10.1 Power Point

    17/17

    Twenty-five samples from the same population give these 95% confidence intervals. In the long run, 95% of all samples give an interval that contains the population mean $\mu$.
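The long-run claim can be explored with a simulation that builds the interval $\bar{x} \pm 2\sigma/\sqrt{n}$ over and over and counts how often it captures the mean. This is a sketch under assumptions, not a reproduction of the figure: the true mean of 110 is hypothetical (in practice $\mu$ is unknown), the scores are drawn from a Normal distribution, and the helper names are invented for this example.

```python
import math
import random
import statistics

SIGMA = 15          # known population standard deviation (from the slides)
N = 50              # size of each sample
ASSUMED_MU = 110    # HYPOTHETICAL true mean; the slides never reveal mu

MARGIN = 2 * SIGMA / math.sqrt(N)   # two standard deviations of x-bar, about 4.2


def interval_from_sample(rng):
    """Draw one sample of N scores and return the interval x-bar +/- MARGIN."""
    x_bar = statistics.fmean([rng.gauss(ASSUMED_MU, SIGMA) for _ in range(N)])
    return x_bar - MARGIN, x_bar + MARGIN


def coverage(num_samples, rng):
    """Fraction of simulated intervals that contain ASSUMED_MU."""
    hits = 0
    for _ in range(num_samples):
        low, high = interval_from_sample(rng)
        if low <= ASSUMED_MU <= high:
            hits += 1
    return hits / num_samples


rng = random.Random(1)  # fixed seed so the demo is repeatable
print("25 samples:    ", coverage(25, rng))
print("10,000 samples:", coverage(10_000, rng))
```

With only 25 samples the fraction of intervals that capture the mean can wander noticeably from 95%. Over thousands of samples it should settle near 95%, which is the sense in which the method "gives correct results 95% of the time."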