Preliminaries Introduction to Statistical Investigations


Statistics vs. Anecdotal Evidence

Smoking causes cancer. Seat belts save lives.

Do Vaccines Cause Autism?

Nelson says it wasn't long after her son Parker's shots at 15 months that she noticed something was wrong. "He had run a slight fever after the vaccinations, but I didn't think anything of it," said Nelson. "… about a week after that he just completely stopped talking." After months of worrying, wondering, and going back and forth with doctors, an official diagnosis was made: autism. Nelson believes it started with the vaccines. "Gradually, I started piecing it together. He got sick after his vaccinations and about a week later everything changed. He was a completely different little boy then," said Nelson.

http://www.wsaz.com/charleston/headlines/19376044.html

Statistics

Scientific conclusions cannot be based on anecdotal evidence. We need evidence from data.

Statistics is the science of producing useful data to address a research question, analyzing the resulting data, and drawing appropriate conclusions from the data.

Six-Step Statistical Investigation Method

1. Ask a research question (research hypothesis)

2. Design a study and collect data

3. Explore the data

4. Draw inferences (logic of inference: significance, estimation)

5. Formulate conclusions (scope of inference: generalization, causation)

6. Look back and ahead

Example P.1: Organ Donations

While a majority of people approve of organ donation in principle, far fewer actually sign up when getting a driver’s license.

Different states have different recruiting methods.

Do these different methods result in different sign-up rates?

Recruiting Organ Donors

Step 1: Ask a research question

In general: Is there a method that will increase the likelihood that a person agrees to become an organ donor?

More specifically: Does the default option presented to driver’s license applicants influence the likelihood of someone becoming an organ donor?

Step 2: Design a study and collect data

The researchers decided to recruit various participants and ask them to pretend to apply for a new driver’s license.

The participants did not know in advance that different options were given for the donor question, or even that this issue was the main focus of the study.

They offered an incentive of $4.00 for completing an online survey. After the results were collected, the researchers removed data arising from multiple responses from the same IP address, surveys completed in less than five seconds, and respondents whose residential address could not be verified.

Some of the participants were forced to make a choice of becoming a donor or not, without being given a default option (the “neutral” group).

Other participants were told that the default option was not to be a donor but that they could choose to become a donor if they wished (the “opt-in” group).

The remaining participants were told that the default option was to be a donor but that they could choose not to become a donor if they wished (the “opt-out” group).

Step 3: Explore the data

44 of the 56 (78.6%) participants in the neutral group agreed to become organ donors,

23 of 55 (41.8%) participants in the opt-in group agreed to become organ donors, and

41 of 50 (82.0%) participants in the opt-out group agreed to become organ donors.
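The percentages above are simple sample proportions. As a minimal sketch, the following Python snippet recomputes them from the reported counts; the names `counts` and `percent_agreeing` are illustrative and do not come from the study.

```python
# Counts reported in Step 3: (number who agreed to donate, group size).
counts = {
    "neutral": (44, 56),
    "opt-in": (23, 55),
    "opt-out": (41, 50),
}


def percent_agreeing(agreed, total):
    """Return the percentage of a group that agreed to become donors."""
    return 100 * agreed / total


# Round to one decimal place, matching the precision used on the slide.
percentages = {
    group: round(percent_agreeing(agreed, total), 1)
    for group, (agreed, total) in counts.items()
}

for group, pct in percentages.items():
    print(f"{group}: {pct}%")
```

Comparing the three values side by side shows that the opt-in group stands well apart from the other two.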

 

Step 4: Draw inferences beyond the data

Using methods that you will learn in this course, the researchers analyzed whether the observed differences between the groups were large enough to indicate that the default option had a genuine effect.

In particular, they reported strong evidence that the neutral and opt-out versions do lead to a higher chance of agreeing to become a donor, as compared to the opt-in version currently used in many states.

In fact, they could be quite confident that the neutral version increases the chances that a person agrees to become a donor by between 20 and 54 percentage points, a difference large enough to save thousands of lives per year in the United States.
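The slides do not say which method produced the 20-to-54-point range. As a hedged illustration only, the sketch below computes a standard large-sample (Wald) interval for the difference between the neutral and opt-in proportions. The 95% confidence level (multiplier 1.96), the choice of the Wald formula, and the function name `wald_interval_for_difference` are all assumptions, not details from the study.

```python
import math


def wald_interval_for_difference(x1, n1, x2, n2, z=1.96):
    """Large-sample interval for p1 - p2 (assumed method, assumed 95% level)."""
    p1 = x1 / n1
    p2 = x2 / n2
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Neutral group: 44 of 56 agreed. Opt-in group: 23 of 55 agreed.
lower, upper = wald_interval_for_difference(44, 56, 23, 55)
print(f"{100 * lower:.0f} to {100 * upper:.0f} percentage points")
```

Under these assumptions the interval rounds to roughly 20 to 54 percentage points, in line with the range quoted above, although the researchers may have used a different procedure.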

Step 5: Formulate conclusions

Based on the analysis of the data and the design of the study, it is reasonable for these researchers to conclude that the neutral version causes an increase in the proportion who agree to become donors.

But because the participants in the study were volunteers recruited from internet bulletin boards, generalizing conclusions beyond these participants is only legitimate if they are representative of a larger group of people.

Step 6: Look back and ahead

One limitation of the study is that participants were asked to imagine how they would respond, which might not mirror how people would actually respond in such a situation.

A new study might look at people’s actual responses to questions about organ donation or could monitor donor rates for states that adopt a new policy.

Researchers could also examine whether presenting educational material on organ donation might increase people’s willingness to donate.

Another improvement would be to include participants from wider demographic groups than these volunteers.

Terminology

The individual entities on which data are recorded are called observational units.

The recorded characteristics of the observational units are the variables of interest.

Variables can be:

◦ Quantitative: You can add, subtract, etc. with the values. Examples: height, weight, distance, time…

◦ Categorical: Labels for which arithmetic does not make sense. Examples: sex, ethnicity, eye color…

What are the observational units and variables in the Organ Donation Study?

More Terminology

The distribution of a variable describes the pattern of value/category outcomes.

For the organ donation study the bar chart shown displays the distribution of responses.

Example P.2: Old Faithful

How faithful is Old Faithful? Can the time of the next eruption be accurately predicted?

Researchers collected data on 222 eruptions taken over a number of days in the summers of 1978 and 1979.

The results are shown in a dotplot.

[Dotplot: time until next eruption (min)]

What are the observational units and variable in this study?

Is the variable quantitative or categorical?

We can see from the dotplot that Old Faithful is not perfectly predictable.

The time until the next eruption varies from eruption to eruption.

This variability is the most fundamental property in studying Statistics. Without variability, we wouldn’t need statistics.

Let’s take another look at the dotplot and describe the distribution.

What could be some explanations for the variability?

[Dotplot: time until next eruption (min)]

One explanation could be the duration of the previous eruption (short: < 3.5 min, or long: > 3.5 min).

[Dotplots: time until next eruption (min), shown separately by eruption type (short, long)]

Old Faithful

Summer 2005

One way to measure the center of a distribution is with the average, also called the mean.

One way to measure variability is with the standard deviation, which is roughly the average distance between a data value in the distribution and the mean of the distribution.
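As a small illustration of these two summaries, the sketch below uses Python's standard `statistics` module on a handful of hypothetical waiting times. The values are made up for the example and are not the Old Faithful data.

```python
import statistics

# Hypothetical waiting times in minutes (illustrative only, not the study's data).
times = [55, 60, 75, 80, 80]

# The mean is the sum of the values divided by how many there are.
mean_time = statistics.mean(times)

# Sample standard deviation (divides by n - 1 before taking the square root).
sd_time = statistics.stdev(times)

# For comparison: the plain average distance of each value from the mean.
avg_distance = sum(abs(t - mean_time) for t in times) / len(times)

print(f"mean = {mean_time}")
print(f"standard deviation = {sd_time:.1f}")
print(f"average distance from the mean = {avg_distance:.1f}")
```

For these values the standard deviation (about 11.7) is close to, but not the same as, the average distance from the mean (10.0), which is why the description above says "roughly".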

Time until next eruption (min)

                        Mean    Standard deviation
Overall                 71.0    12.8
After short duration    56.3    8.5
After long duration     78.7    6.3

[Dotplots: time until next eruption (min), by eruption type (short, long)]

Basic Terminology

Some aspects to look for in a distribution of a quantitative variable are:

◦ Shape: Is the distribution symmetric? Mound-shaped? Are there several peaks or clusters?

◦ Center: Where is the distribution centered? What is a typical value?

◦ Variability: How spread out are the data? Are most within a certain range of values?

◦ Unusual observations: Are there outliers that deviate markedly from the overall pattern of the other data values? Are there other unusual features in the distribution?

Exploration P.3: Cars or Goats

Pages P-13 to P-17
