Producing data: sampling and experiments BPS chapters 8 & 9 © 2006 W. H. Freeman and Company

Producing data: sampling and experiments

BPS chapters 8 & 9

© 2006 W. H. Freeman and Company

From Exploration to Inference (p 186)Exploratory Data

AnalysisStatistical Inference

Purpose is unrestricted exploration of the data, searching for interesting

patterns.

Purpose is to answer specific questions, posed before the data were

produced.

Conclusions apply only to the individuals and circumstances for

which we have data in hand.

Conclusions apply to a larger group of individuals or a broader class of

circumstances.

Conclusions are informal, based on what we see in the data.

Conclusions are formal, backed by a statement of our confidence in them.

Objectives (BPS chapter 8)

Producing data: sampling

Observation versus experiment

Population versus sample

Sampling methods

How to sample badly

Simple random samples

Other sampling designs

Caution about sample surveys

Learning about populations from samples (inference)

Observation versus experiment (p 190)Observational study: Record data on individuals without attempting to

influence the responses. We typically cannot prove causation this way.

Example 8.1: Observational study on hormone replacement.

Experimental study: Deliberately impose a treatment on individuals and record their responses. Influential factors can be controlled.

Example 8.1: Experimental study on hormone replacement.

Confounding (p 191)Two variables (explanatory variables or lurking variables)

are confounded when their effects on a response

variable cannot be distinguished from each other.

Well-designed experiments take steps to defeat confounding.

Lurking:Richer/Better Educated

Reduced Risk of Heart Attack

Hormone Replacement

CAUSE?

Confounding?Confounding?

Observational studies of the effect of one variable on another often fail because

the explanatory variable is confounded with lurking variables.

Confounding?Confounding?

Population versus sample (p 192) Sample: The part of the

population we actually examine

and for which we do have data

How well the sample represents the population depends on the sample design.

Population: The entire group

of individuals in which we are

interested but can’t usually

assess directly

Example: All humans, all working-age people in California, all crickets

A parameter is a number

describing a characteristic of

the population.

A statistic is a number

describing a characteristic of a

sample.

Population

Sample

We want samples that are representative of the population

Convenience sampling: Just ask whoever is around.

Example: “Man on the street” survey (cheap, convenient, often quite

opinionated or emotional → now very popular with TV “journalism”)

Which men, and on which street?

Ask about gun control or legalizing marijuana “on the street” in

Berkeley, CA and in some small town in Idaho and you would probably

get totally different answers.

Even within an area, answers would probably differ if you did the

survey outside a high school or a country-western bar.

Bias: Opinions limited to individuals present

Bad sampling methods (p 194-195)

Voluntary Response Sample:

A voluntary response sample consists of people who choose

themselves for inclusion in the sample in response to a broad

appeal. Voluntary response samples are prone to bias because people with

strong opinions are most likely to respond.

Bias: Sample design systematically favors a particular outcome.

Ann Landers summarizing responses of readers:

Seventy percent of (10,000) parents wrote in to say that having

kids was not worth it—if they had to do it over again, they

wouldn’t.

Bias: Most letters to newspapers are written by disgruntled

people. A random sample showed that 91% of parents

WOULD have kids again.

CNN on-line surveys:

Bias: People have to care enough about an issue to bother replying. This sample

is probably a combination of people who hate “wasting the taxpayers’ money”

and “animal lovers.”

Good sampling methods

Probability or random sampling:

Individuals are randomly selected. No one group should be over-

represented.

Random samples rely on the absolute objectivity of random numbers. There are books and tables of random digits available for random sampling.

Statistical software cangenerate random digits (e.g., Excel “=random()”).

Sampling randomly gets rid of bias.

Sampling randomly controls for lurking variables

Simple random samples (p 196)The simple random sample (SRS) is made of randomly selected

individuals. Each individual in the population has the same probability of

being in the sample. All possible samples of size n have the same

chance of being drawn.

How to choose an SRS of size n from a population of size N using the TI-8x randInt function:

Label the individuals in your population from 1 to N.

(optional) Store a “seed” value in the rand register of the TI (the seed will be given to you):

Generate 2n random integers from 1 to N and store them in a list.

Select the n corresponding individuals for your sample

19 STO rand MATH | PRB menuFor a seed of 19

randInt(1,N,n) STO L6MATH|PRB menu

01 Alison02 Amy03 Brigitte04 Darwin05 Emily06 Fernando07 George08 Harry09 Henry10 John11 Kate12 Max13 Moe14 Nancy15 Ned16 Paul17 Ramon18 Rupert19 Tom20 Victoria

Example: Choosing a simple random sample using the randInt function


Problem: select a simple random sample of size 5 from a

class of size 20. The class members are listed to the right…

1. Number the members of the population.

2. Let’s use a seed value of 71 for the randInt function.

71 STO rand

3. Generate the sample, store it in list 6.

randInt(1,20,10) STO L6

4. Pick the corresponding individuals. (Skip duplicates.)

Simple random samples (p 196)How to choose an SRS of size n from a population of size N using a random number table.

Label. Give each member of the population a numerical label of the same length.

Table. To choose an SRS, read from Table B (page 686) successive groups of digits of the length you used as labels. Your sample contains the individuals whose labels you find in the table.

Example: Choosing a simple random sample using a table of random numbers (Table B page 686)Problem: select a simple random sample of size 5 from a

class of size 20. The class members are listed to the right…

Step 1: Assign a numerical label to each member of the

population. Use leading zeros to ensure that each label

has the same number of digits.

In our case there are 20 individuals in the

population, so all labels must use 2 digits

Step 2: Go to the table of random digits, and select a

row at random. (Or, I may tell you which row to start

with.)



Example: Choosing a simple random sample using a table of random numbers (Table B page 686) 01 Alison

02 Amy03 Brigitte04 Darwin05 Emily06 Fernando07 George08 Harry09 Henry10 John11 Kate12 Max13 Moe14 Nancy15 Ned16 Paul17 Ramon18 Rupert19 Tom20 Victoria

For this example, let’s start with line 103…

Step 2: Go to the table of random digits, and select a

row at random. (Or, I may tell you which row to start

with.)



For this example, let’s start with line 103…

Our data labels have two digits, so we parse the

random number table into blocks of length 2…

Step 3: Parse the random numbers into blocks of the

same length as your data labels.



Our data labels have two digits, so we parse the

random number table into blocks of length 2…

Step 3: Parse the random numbers into blocks of the

same length as your data labels.

45 46 77 17 09 77 55 80 00 95 32 86 32 94 85 82 22 69 00 56



Step 4: Read through your parsed numbers in order,

and each time you come to one of your labels, select it

for the sample.

45 46 77 17 09 77 55 80 00 95 32 86 32 94 85 82 22 69 00 56



Step 5: Continue line-by-line through the table until

you’ve got the number of samples that you need.

45 46 77 17 09 77 55 80 00 95 32 86 32 94 85 82 22 69 00 56

52 71 13 88 89 93 07 46 02 27 40 01 18 58 48 48 76 75 25 73

If you pick a sample that you’d already picked

previously, just skip it.

Stratified samplesA stratified random sample is essentially a series of SRS performed

on subgroups of a given population. The subgroups are chosen to

contain all the individuals with a certain characteristic. For example: Divide the population of UCI students into males and females.

Divide the population of California by major ethnic group.

Divide the counties in America as either urban or rural based on a

criterion of population density.

The SRS taken within each group in a stratified random sample need

not be of the same size. For example: Stratified random sample of 100 male and 150 female UCI students

Stratified random sample of a total of 100 Californians, representing

proportionately the major ethnic groups

Multistage samples use multiple stages of stratification. They are often

used by the government to obtain information about the U.S. population.

Example: Sampling both urban and rural areas, people in different ethnic

and income groups within the urban and rural areas, and then individuals

of different political orientation within those strata.

Data are obtained by taking an SRS for each substrata.

Statistical analysis for

multistage samples is more

complex than for an SRS.

Caution about sampling surveys (p 201) Nonresponse: People may simply refuse to participate. Yet they are

part of the population, and should be represented. Nonresponse can

introduce bias into results, especially if those who refuse to participate

come predominately from one subgroup of the population.

Response bias: Fancy term for lying when you think you should not tell

the truth. Like if your family doctor asks: “How much do you drink?”

People also simply forget, give erroneous answers, or avoid giving

answers that show them in a potentially bad light. Example: a survey of

white voters asking: “Does a candidate’s race affect your decision of

whether a not to vote for him or her?” (Bradley effect)

Wording effects: Questions worded like “Don’t you agree that it is

awful that…” are prompting you to give a particular response.

Undercoverage

Undercoverage occurs when parts of the population

are left out in the process of choosing the sample.

Because the U.S. Census goes “house to house,” homeless people

are not represented. Illegal immigrants also avoid being counted.

Geographical districts with a lot of undercoverage tend to be poor

ones. Representatives from richer areas typically strongly oppose

statistical adjustment of the census.

Historically, clinical trials have avoided including women in

their studies because of their periods and the chance of

pregnancy. This means that medical treatments were not

appropriately tested for women. This problem is slowly

being recognized and addressed.

1) To assess the opinions of students at The Ohio State University regarding campus safety, a reporter interviews 15 students he meets walking on the campus late at night who are willing to give their opinions.

What is the sample here? What is the population? Has the reporter chosen a random sample of students?

2) An SRS of 1200 adult Americans is selected and asked: “In light of the huge national deficit, should the government at this time spend additional money to establish a national system of health insurance?” Thirty-nine percent of those responding answered yes.

What is the sample here? What is the population? What else can you say about this survey?

Should you trust the results of the first survey? Of the second? Why?

Learning about populations from samples

The techniques of inferential statistics allow us to draw inferences or

conclusions about a population from a sample.

Your estimate of the population is only as good as your sampling design

Work hard to eliminate biases.

Your sample is only an estimate—and if you randomly sampled again,

you would probably get a somewhat different result.

The bigger the sample the better. We’ll get back to it in later chapters.

Population

Sample

Objectives (BPS chapter 9)

Producing data: experiments

Experiments

How to experiment badly

Randomized comparative experiments

The logic of randomized comparative experiments

Cautions about experimentation

Matched pairs and other block designs

Terminology

The individuals in an experiment are the experimental units. If they are human, we call them subjects.

The explanatory variables in an experiment are often called factors.

A treatment is any specific experimental condition applied to the subjects. If an experiment has several factors, a treatment is a

combination of specific values of each factor.

The factor may be the administration of a drug.

One group of people may be placed on a diet/exercise program for 6 months (treatment), and their blood pressure (response variable) would be

compared with that of people who did not diet or exercise.

If the experiment involves giving two different doses of a drug, we

say that we are testing two levels of the factor.

A response to a treatment is statistically significant if it is larger

than you would expect by chance (due to random variation among

the subjects). We will learn how to determine this later.

In a study of sickle cell anemia, 150 patients were given the drug hydroxyurea, and 150

were given a placebo (dummy pill). The researchers counted the episodes of pain in

each subject. Identify:

•The subjects : the 300 patients

•The factors : hydroxyurea and placebo

•The treatments :

• Treatment 1 : hydroxyurea

• Treatment 2 : placebo

•And the response variable : number of episodes of pain

How to experiment badlySubjects Treatment Measure response

In a controlled environment of a laboratory (especially if human

subjects are not being used), a simple design like this one, where all

subjects receive the same treatment, can work well.

Field experiments and experiments with human subjects are

exposed to more variable conditions and deal with more variable

subjects.

A simple design often yields worthless results because of

confounding with lurking variables.

3 Principles of experimental design Control the effects of lurking variables on the response, most

simply by comparing two or more treatments.

Randomize—use impersonal chance to assign subjects to

treatments.

Replicate—use enough subjects in each group to reduce chance

variation in the results.

Principles of comparative experimentsMost experiments are comparative in nature: We compare the

response to a treatment versus to: another treatment no treatment (a control) a placebo or any combination of the above

A control is a situation in which no treatment is administered. It serves

as a reference mark for an actual treatment (e.g., a group of subjects

does not receive any drug or pill of any kind).

A placebo is a fake treatment, such as a sugar pill. It is used to test

the hypothesis that the response to the treatment is due to the actual

treatment and not to how the subject is being taken care of.

About the placebo effect

The “placebo effect” is an improvement in health due not to

any treatment but only to the patient’s belief that he or she

will improve.

The placebo effect is not well understood, but it is believed

to have therapeutic results on a significant proportion of

patients.

It can sometimes ease the symptoms of a variety of ills,

from asthma to pain to high blood pressure and even to

heart attacks.

An opposite, or “negative placebo effect,” has been

observed when patients believe their health will get worse.

The best way to exclude biases in an experiment is to randomize

the design. Both the individuals and treatments are assigned

randomly.

A double-blind experiment is one in which neither the subjects nor

the experimenter know which individuals got which treatment until

the experiment is completed.

Getting rid of sampling biases

Another way to make sure your conclusions are robust is to replicate

your experiment—do it over. Replication ensures that particular results

are not due to uncontrolled factors or errors of manipulation.

In a completely randomized experimental design, individuals are

randomly assigned to groups, then the groups are randomly assigned

to treatments.

Completely randomized designs

Matched pairs: Choose pairs of subjects that are closely matched—

e.g., same sex, height, weight, age, and race. Within each pair,

randomly assign who will receive which treatment.

It is also possible to just use a single person and give the two

treatments to this person over time in random order. In this case, the

“matched pair” is just the same person at different points in time.

Matched pairs designs

Matched pairs designs are not always possible. But when they

are, they usually do a better job of controlling for lurking variables

than completely randomized designs.

Example: To assess the effects of genetics and environment on IQ,

study identical twins who have been separated at birth.

Example: To assess the effects cell phone usage on braking time, study

the reaction times of the same individual both with and without a cell.

In a block design, subjects are divided into groups, or blocks, prior

to the experiment to test hypotheses about differences between the

groups.

The blocking, or stratification, here is by gender.

Block designs

Example: Advertising

What are the effects of repeated exposure to advertising messages? Do these effects vary between men and women?

The effectiveness of an advertising campaign may depend on both the length of the message, and the number of times a viewer is exposed to the message. In addition, there might be a difference between the way men and women percieve the ad. A researcher decides to study this question. He recruits 40 undergraduates as subjects for his experiment. He has two versions of the same ad for a digital camera; the first version is 30 seconds long, while the second version is 90 seconds long. He wants to assess the effects of showing the ad either 1, 3, or 5 times.

Discuss an appropriate design for the experiment.

What are the factors?

What are the treatments?

What are the blocks?

What is the appropriate design?

Factors:

1. Number of Repetitions

a) 3 values (or levels) of this variable: 1, 3, or 5 repetitions

2. Length of ad

a) 2 values (or levels) of this variable: 30 seconds or 90 seconds

Treatments:

All combinations of a value from each factor:

1 rep.30 sec.

2 rep.30 sec.

3 rep.30 sec.

1 rep.90 sec.

2 rep.90 sec.

3 rep.90 sec.

Factor:Ad Length

Factor:Num. Repetitions

30 sec.

90 sec.

1 rep. 2 rep. 3 rep.Treatment 1: 1 rep., 30 sec.

Treatment 2: 2 rep., 30 sec.





Blocks:

1. Men

2. Women

Design:

subjects

Men

Women

Group 1Group 2Group 3Group 4Group 5Group 6

Group 1Group 2Group 3Group 4Group 5Group 6

Treatment 1Treatment 2Treatment 3Treatment 4Treatment 5Treatment 6

Treatment 1Treatment 2Treatment 3Treatment 4Treatment 5Treatment 6

Compareresponse

Compareresponse

Compareresponse

Example: Alchohol and Reaction Times A study of the effects of alchohol on reaction time studied 50

undergraduate subjects. Reaction times for each subject were measured while playing a “shooter” video game. Each subject was tested twice. In one test, the subject was asked to drink a beer 10 minutes before playing the game. In the other test, the subject was asked to drink a non-alchoholic beer 10 minutes before playing the game. (The order of the two tests was varied from subject to subject, and the subject was not given any information about which test involved the real beer, and which involved the NA beer.) The difference in average reaction times was compared for the two tests. This is an example of a1. Observational study

2. Completely randomized experimental design.

3. Matched pairs experimental design.

4. Block experimental design.

Documents

Producing data: sampling and experiments BPS chapters 8 & 9 © 2006 W. H. Freeman and Company