Using Inference Wisely: A Cautionary Tale Mar 5, 2012

faculty.washington.edu/grover4/class18.pdf



Using Inference Wisely:

A Cautionary Tale

Mar 5, 2012


1.0 Inference So Far

Statistical methods/ideas provide us with the ability to make inferences about a large group (population) based on observations of a smaller subset of that group (sample).

We have met two types of statistical inference: confidence intervals and tests of significance.

Our focus has only been on procedures designed for inference about one population proportion p. These procedures can be generalized to many populations and to other parameters.

Any inference can only be trusted under specific conditions.


2.0 Inference in Practice

The most important requirement for any inference procedure is that the data comes from a process to which the laws of probability apply.

A city government did a survey of working women to see how they felt about juggling jobs and family responsibilities. Businesses, unions, and community service organizations helped distribute the survey questionnaire to locations where the women could pick up copies. 1,678 out of 2,800 respondents, or 59.8%, checked the item that “stress is a serious problem” on the questionnaire. An approximate 95% confidence interval for the true proportion of women who felt stress in the population is:

$$0.598 \pm 2\sqrt{\frac{0.598 \times (1 - 0.598)}{2800}}$$

Does this make sense?
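
The arithmetic of the interval above is easy to carry out. A minimal Python sketch follows; the function name `approx_95_ci` is ours, not something from the slides, and the inputs are the figures quoted on the slide. Being able to compute the interval does not settle whether it means anything, which is the question being asked here.

```python
import math

def approx_95_ci(p_hat, n):
    """Approximate 95% interval: p_hat +/- 2 * sqrt(p_hat * (1 - p_hat) / n)."""
    margin = 2 * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Figures as quoted on the slide: p-hat = 0.598, n = 2800.
low, high = approx_95_ci(0.598, 2800)
print(f"approximate 95% interval: {low:.3f} to {high:.3f}")
```

The margin of error comes out just under 2 percentage points. The formula accepts any numbers you give it, whether or not the respondents were chosen by a chance process.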


2.1 How We Derived Our Confidence Intervals

The key ingredient for calculating a confidence interval is the sampling distribution.

A sampling distribution shows how a statistic such as p-hat varies in repeated sampling.

This variation causes random sampling error because the statistic misses the parameter by a random amount.

From this perspective, the confidence interval on the previous slide makes no sense. Why?
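
The idea of a sampling distribution can be illustrated by simulation. The sketch below is only an illustration under assumed values: the true proportion `p=0.6`, the sample size `n=500`, and the number of repetitions are hypothetical choices, and `simulate_p_hats` is a name introduced here. It draws many simple random samples and records p-hat for each one.

```python
import random
import statistics

def simulate_p_hats(p, n, repetitions, seed=0):
    """Return the p-hat values from repeated random samples of size n."""
    rng = random.Random(seed)
    p_hats = []
    for _ in range(repetitions):
        # Each sampled individual is a "yes" with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(successes / n)
    return p_hats

# Hypothetical population proportion and sample size.
p_hats = simulate_p_hats(p=0.6, n=500, repetitions=1000)
simulated_mean = statistics.mean(p_hats)
simulated_sd = statistics.stdev(p_hats)
theory_sd = (0.6 * 0.4 / 500) ** 0.5

print("mean of p-hat:", round(simulated_mean, 3))
print("sd of p-hat:", round(simulated_sd, 3))
print("sqrt(p(1-p)/n):", round(theory_sd, 3))
```

The spread of the simulated p-hat values should be close to sqrt(p(1-p)/n), but only because the samples here really were drawn at random. That is exactly the ingredient missing from the survey on the previous slide.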


2.2 Where Does the Chance Come From?

When you use statistical inference, you are acting as if your outcomes are determined by a chance process.

For a random sample, the chance is in the selection process.

For a randomized comparative experiment, the chance is in the random assignment process.

For other situations, we assume chance plays a role in determining what outcome is observed for each individual in our sample.

  • Infants chose the helper/hinderer using a coin flip.
  • Deaths happen on a shift according to a coin flip.
  • Son’s height is assumed to be off from his father’s height by some random error.


3.0 The Design Matters

Poll Finds 1 Out of 3 Americans Open to Doubt There Was a Holocaust
April 20, 1993 | HOWARD LIBIT | TIMES STAFF WRITER

WASHINGTON — Just three days before the dedication of the U.S. Holocaust Memorial Museum, a survey released Monday revealed that one in three Americans is open to the possibility that the Holocaust never occurred at all.

The Roper poll surveyed 992 adults at random.

The question:

Does it seem possible, or does it seem impossible to you that the Nazi extermination of the Jews never happened?

22% said “it seemed possible”. Another 12% said “they did not know”. Only 66% believed “it seemed impossible it never happened”. The margin of error was plus or minus 4 percentage points.


3.1 Data is Fragile

Major Holocaust Polls Show Built-In Bias

Flawed Wording

As it turned out, however, the "startling" results of the AJC/Roper survey were not accurate. One of the poll's most serious flaws was the confusing wording of question 16, which produced the response that generated the most media comment: "Does it seem possible or does it seem impossible to you that the Nazi extermination of the Jews never happened?" The question's compound structure and double negative wording almost certainly confused many respondents.

Responding to criticism, the AJC recommissioned the poll, changing only question 16. In the second survey, this question was worded: "Does it seem possible to you that the Nazi extermination of the Jews never happened, or do you feel certain that it happened?" The results of the second, 1994 AJC poll were quite different: Only about one percent of Americans thought it was possible the Holocaust never happened, while eight percent were unsure. ("Poll on Doubt of Holocaust is Corrected," The New York Times, July 8, 1994)


3.2 Inference Does Not Check Your Design

There is no correct method for data collected haphazardly with bias of unknown size.

Confidence intervals and tests of significance use the data you give them.

They do not check if your experiment was properly designed.

They do not account for any biases in your data collection.

Margins of error are likely to be unrealistically small for most practical settings.
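
A small simulation can show how a margin of error ignores bias. Everything in this sketch is hypothetical: the true proportion (0.30), the assumption that people with the trait are three times as likely to respond, and the function name `biased_sample_interval` are ours, chosen only to illustrate the point. The sample size matches the 2,800 respondents in the earlier survey example.

```python
import math
import random

# All numbers below are hypothetical and chosen only for illustration.
TRUE_P = 0.30            # assumed true proportion with the trait
RELATIVE_RESPONSE = 3.0  # assumed: those with the trait respond 3x as often
N = 2800                 # number of respondents

def biased_sample_interval(true_p, relative_response, n, seed=0):
    """Simulate a voluntary-response sample and its approximate 95% interval."""
    rng = random.Random(seed)
    # Share of respondents with the trait once response rates differ.
    p_among_respondents = (true_p * relative_response) / (
        true_p * relative_response + (1 - true_p)
    )
    yes = sum(1 for _ in range(n) if rng.random() < p_among_respondents)
    p_hat = yes / n
    margin = 2 * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - margin, p_hat + margin

p_hat, low, high = biased_sample_interval(TRUE_P, RELATIVE_RESPONSE, N)
print(f"p-hat = {p_hat:.3f}, interval = ({low:.3f}, {high:.3f})")
print("interval contains the true proportion:", low <= TRUE_P <= high)
```

Under these assumptions the interval is narrow and nowhere near the true proportion. The margin of error reflects only random sampling error; it has no term for the bias in who responds.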


4.0 Know What Statistical Significance Says

A test answers only one question: “How strong is the evidence that the null hypothesis is not true?”

The P-value tells us how unlikely data as or more extreme than ours would be if the null hypothesis were true.

Data that would be very unlikely if the null hypothesis were true is good evidence that the null hypothesis is not true.


4.1 What the P-value Does Not Say

The P-value is the probability of observing data as or more extreme than ours if the null hypothesis were true.

It is not the probability that the null hypothesis is true. Confusing the two is the prosecutor’s fallacy.

If deaths on a shift occur in a chance-like manner (p=0.045), then it would be almost impossible (1 out of 100 million) to get 40+ deaths on 257 shifts.

This does not say that the probability of deaths occurring in a chance-like manner (with p=0.045) is therefore 1 out of 100 million.


4.2 The Strength of Evidence

The purpose of a test of significance is to give evidence for the presence of some effect in the population.

If the effect is large, it will show up in most samples.

Small effects will often be hidden behind the chance variation in a sample.

This is as it should be: the P-value will usually be small when the population truth is far from the null hypothesis.

The “woes” of testing start with the fact that large samples make tests more sensitive and pick up differences that are so small as to be of no practical interest.


4.2 The Strength of Evidence

[Figure: Sampling distribution of p-hat if p=0.5, shown for n=1000, n=4000, and n=100,000, with the outcome p-hat=0.507 marked. Is this outcome surprising?]

The outcome p-hat=0.507 is not convincing evidence if it comes up in 1000 tosses, but it is completely convincing if it comes up in 100,000 tosses.
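
The claim about p-hat=0.507 can be checked with a short calculation. This is a sketch that assumes the normal approximation to the sampling distribution and a one-sided alternative (p > 0.5); the slide does not state which alternative it has in mind, and `one_sided_p_value` is a name introduced here.

```python
import math
from statistics import NormalDist

def one_sided_p_value(p_hat, p0, n):
    """z statistic and upper-tail P-value for H0: p = p0 (normal approximation)."""
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    return z, 1 - NormalDist().cdf(z)

results = {}
for n in (1000, 4000, 100_000):
    results[n] = one_sided_p_value(0.507, 0.5, n)
    z, p_value = results[n]
    print(f"n={n:>7}: z={z:.2f}, one-sided P-value={p_value:.2g}")
```

The same observed proportion sits less than half a standard error from 0.5 when n=1000 and more than four standard errors away when n=100,000, which is why the evidence looks so different.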


4.2 The Strength of Evidence

A finding can be statistically significant on account of a large sample alone. Distinguish evidence of a strong association from strong evidence of an association.

Conversely, small samples often miss important effects that are really present in the population.

Practical significance is much more important to think about.

Watch out for P-values that are not accompanied by sample sizes or observed statistic values.


4.3 Significance Depends on the Alternative Hypothesis

[Figure: three panels, one each for H-a: not equal, H-a: greater than, and H-a: less than. Each panel shows the sampling distribution of the statistic if H-nought is true, plotted over the possible values of the statistic, with the observed value of the statistic and the P-value marked. In the "not equal" panel, both the observed value and its mirror image are marked.]


4.3 Significance Depends on the Alternative Hypothesis

A randomized, controlled, double-blind study was performed to study the efficacy of a drug to reduce cholesterol. There were 3806 middle-aged men at high risk of heart attacks. Half were assigned randomly to the drug and the other half were in the control. There were 155 heart attacks in the treatment group compared to 187 in the control group: 8.1% vs. 9.8%, P ≈ 3.5% (one-tailed).

At the 5% level, this result will be called statistically significant.

However, at the same 5% level but with a two-sided alternative hypothesis, this result would not be statistically significant.
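
The one-tailed and two-tailed P-values for this study can be reproduced approximately. The sketch below assumes a pooled two-proportion z-test and 1903 men per group (half of 3806); the slides do not say which test produced the quoted 3.5%, so treat the method and the helper name `two_proportion_z` as our assumptions.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for comparing two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Treatment: 155 heart attacks; control: 187; assumed 1903 men in each group.
z = two_proportion_z(155, 1903, 187, 1903)

# H-a: the drug group has a lower heart attack rate (one-sided).
one_sided = NormalDist().cdf(z)
# H-a: the rates differ in either direction (two-sided).
two_sided = 2 * NormalDist().cdf(-abs(z))

print(f"z = {z:.2f}")
print(f"one-sided P-value = {one_sided:.3f}")
print(f"two-sided P-value = {two_sided:.3f}")
```

Under these assumptions the one-sided P-value falls just below 0.05 and the two-sided P-value, which is twice as large, falls above it. The data are identical in both cases; only the alternative hypothesis changed.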


4.3 Significance at 5% Isn’t Magical

The purpose of a test of significance is to describe the degree of evidence provided by the sample against the null hypothesis.

The P-value does this. But how small a P-value is convincing evidence against H0?

This depends on two questions.

  • How plausible is H0? A widely-believed H0 requires strong evidence against it to persuade.
  • What are the consequences of rejecting H0? If rejecting H0 means making expensive changes, again strong evidence is needed.

This is subjective. In discrimination court cases, a P-value less than 0.05 is considered evidence.

There is no sharp border between significant and insignificant.


4.3 Significance at 5% Isn’t Magical

The Conclusion

The small P-value (n = , p-hat = ) means that this data provides strong evidence that the null hypothesis may not be true.

The large P-value (n = , p-hat = ) gives us no reason to think that the null hypothesis may not be true.


4.4 Avoid Searching for Significance

Statistical significance ought to mean that you have found an effect that you were looking for.

This method works well if you decide what effect you are seeking, design a study to search for it, and then use a test of significance to weigh the evidence.

When used in a different way, it has little meaning.


4.4 Avoid Searching for Significance

You want to learn what distinguishes managerial trainees who eventually become executives from those who, after expensive training, don’t succeed.

You have abundant data on past trainees: data on their personality, goals, college prep, college GPA, family background and even their hobbies.

You perform dozens of significance tests on these dozens of variables to see which ones differ significantly between the future executives and the washouts. (Mea culpa.)

Voila! You find that future executives are significantly more likely to have an urban/suburban upbringing and a degree in a technical field.


4.4 Avoid Searching for Significance

Caution: Even if the null hypothesis is true, when you make dozens of tests, you can expect a few of them to turn out significant at the 5% level, by chance alone.

Running one test and finding a P-value smaller than 0.05 is good evidence. Running a few dozen tests and getting one or two P-values smaller than 0.05 is not. This is the multiple comparisons problem.

Searching data for patterns is legitimate. But the reasoning of inference does not apply when your search for a striking effect in the data is successful.

Once you have a striking effect, come up with a hypothesis and design a study to test this hypothesis.
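
The caution about running dozens of tests can be made concrete. This sketch assumes the tests are independent and that every null hypothesis is true, so each P-value behaves like a uniform random number between 0 and 1; the count of 36 tests and both function names are hypothetical choices for illustration.

```python
import random

def chance_of_false_alarm(num_tests, alpha=0.05):
    """P(at least one of num_tests independent true-null tests has P < alpha)."""
    return 1 - (1 - alpha) ** num_tests

def simulate_false_alarms(num_tests, alpha=0.05, repetitions=10_000, seed=0):
    """Estimate the same probability by simulating uniform P-values."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        # Under a true null, each P-value is modeled as uniform on [0, 1).
        if any(rng.random() < alpha for _ in range(num_tests)):
            hits += 1
    return hits / repetitions

NUM_TESTS = 36  # hypothetical stand-in for "a few dozen" tests
exact = chance_of_false_alarm(NUM_TESTS)
simulated = simulate_false_alarms(NUM_TESTS)
expected_hits = NUM_TESTS * 0.05

print(f"expected number of 'significant' results: {expected_hits:.1f}")
print(f"chance of at least one: exact {exact:.3f}, simulated {simulated:.3f}")
```

With 36 tests and no real effects at all, one or two "significant" results are the expected outcome, not a discovery.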


Wising Up: Example 1

An anthropologist suspects that color blindness is less common in societies that live by hunting and gathering than in settled agricultural societies. He tests a number of adults in two populations in Africa, one of each type. The proportion of color-blind people is significantly lower (P < 0.05) in the hunter-gatherer population. What additional information would you want to help you decide whether you accept the claim about color blindness?


Wising Up: Example 2

A company compares two package designs for a laundry detergent by placing bottles with both designs on the shelves of several markets in a large city. Checkout scanner data on more than 10,000 bottles bought show that more shoppers bought Design A than Design B. The difference is statistically significant (P=0.02). Can we conclude that consumers strongly prefer Design A?