Chapter 5 Data Production AP Statistics

Page 1: Chapter 5  Data Production

Chapter 5 Data Production

AP Statistics

Page 2: Chapter 5  Data Production

4.1 Designing Samples

Observational study: We observe individuals and measure variables of interest but do not attempt to influence responses.

Experiment: We deliberately impose some treatment on individuals in order to observe their responses.

Pros vs. cons of each? (An experiment lets us control the conditions, so it gives better evidence about cause and effect.)

Page 3: Chapter 5  Data Production

Population and Sample

Population: the entire group of individuals that we want information about

Sample: a part of the population that we actually examine in order to gather information

Sampling vs. census: sampling studies a part in order to gain information about the whole; a census attempts to contact every individual in the population

Page 4: Chapter 5  Data Production

Methods of (bad) Sampling

Voluntary response: People choose themselves by responding

Convenience sampling: Choosing individuals who are easiest to reach

Bias: The sampling method is biased if it systematically favors certain outcomes

Page 5: Chapter 5  Data Production

Simple Random Samples (SRS)

The simplest way to use chance to select a sample is to place names in a hat (the population) and draw out a handful (the sample).

SRS: a sample of size n chosen so that every set of n individuals has an equal chance of being the sample selected. As a result, every individual also has an equal chance of getting picked.

Page 6: Chapter 5  Data Production

Random digits

Table B: long string of digits 0-9; each entry in the table is equally likely to be any of the 10 digits

Choosing SRS with table:

1. Label: Assign a # label to every individual in the pop (example: 01-50 for each senior girl @ CSH)

2. Table: use table B to select random labels

3. Stop: indicate when you should stop sampling (toss out repeated numbers, or numbers out of your range)

4. Identify sample: use the random #’s to identify subjects to be selected from your pop. This is your sample!
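The four steps can be sketched in Python. This is only an illustration: the function name `srs_from_digit_table` is made up here, the "table" is a computer-generated string of digits standing in for a line of Table B, and labels are assumed to be two digits long (01-50).

```python
import random

def srs_from_digit_table(digits, population_size, sample_size):
    """Read two-digit labels from a string of random digits, left to right."""
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        label = int(digits[i:i + 2])
        # Toss out labels outside 01..population_size and repeated labels.
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
        # Stop once the sample is full.
        if len(chosen) == sample_size:
            break
    return chosen

# Stand-in for a line of Table B (assumption: not the real table).
rng = random.Random(5)
table_line = "".join(str(rng.randrange(10)) for _ in range(400))

sample = srs_from_digit_table(table_line, population_size=50, sample_size=5)
print(sample)  # the labels of the individuals in the sample
```

Reading the digits in fixed two-digit groups keeps every label from 01 to 50 equally likely, which is why out-of-range groups are skipped rather than adjusted.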

Page 7: Chapter 5  Data Production

Calculator Random #’s

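MATH > PRB > randInt(lowest #, highest #, # of people you want in your sample). randInt can repeat a number, so ignore repeats and generate more until you have enough different labels.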
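For comparison, a rough Python analogue of the calculator command, assuming labels 1-50 and a sample of 5. `random.randint` behaves like randInt (repeats are possible), while `random.sample` draws without replacement.

```python
import random

rng = random.Random(1)  # fixed seed only so the run is repeatable

# Analogue of randInt(1, 50, 5): five draws, repeats are possible.
draws = [rng.randint(1, 50) for _ in range(5)]

# Keep drawing and discard repeats until 5 different labels are collected.
labels = []
while len(labels) < 5:
    x = rng.randint(1, 50)
    if x not in labels:
        labels.append(x)

# Shortcut: sample() draws without replacement, so no repeats can occur.
srs = rng.sample(range(1, 51), 5)

print(draws, labels, srs)
```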

Page 8: Chapter 5  Data Production

(good) Types of sampling

SRS: samples chosen by chance

Stratified random sample: divide population into groups (aka strata) that are similar in some way, then choose a separate SRS in each stratum, then combine these SRS’s to form the full sample

Cluster sampling: divide population into groups (aka clusters). Some of these clusters are randomly selected. Then all individuals in chosen clusters are selected to be in the sample
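A small sketch contrasting the two designs. The population below is invented for illustration (60 students, with grade as the strata and homeroom as the clusters); the helper `group_by` is also made up.

```python
import random

rng = random.Random(2)

# Hypothetical population: 60 students tagged with a grade and a homeroom.
population = [
    {"id": i, "grade": 9 + i % 4, "homeroom": i % 6}
    for i in range(60)
]

def group_by(people, key):
    """Split the population into groups that share the same value of key."""
    groups = {}
    for p in people:
        groups.setdefault(p[key], []).append(p)
    return groups

# Stratified random sample: a separate SRS of 3 from each grade, combined.
strata = group_by(population, "grade")
stratified = [p for members in strata.values() for p in rng.sample(members, 3)]

# Cluster sample: choose 2 homerooms at random, then take everyone in them.
clusters = group_by(population, "homeroom")
chosen_rooms = rng.sample(sorted(clusters), 2)
cluster_sample = [p for room in chosen_rooms for p in clusters[room]]

print(len(stratified), len(cluster_sample))
```

In the stratified sample every grade is represented; in the cluster sample only the chosen homerooms appear, but every student in them is included.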

Page 9: Chapter 5  Data Production

Caution about Sample Surveys

Undercoverage: occurs when some groups in the population are left out in the process of choosing the sample. It is hard to get an accurate and complete list of the population, so most samples suffer from some degree of undercoverage.

Nonresponse: occurs when an individual chosen for the sample can’t be contacted or does not cooperate.

Page 10: Chapter 5  Data Production

More causes of bias

The behavior of the respondent or interviewer can cause response bias in sample results

Wording of questions can influence answers

Larger random samples give more precise results than smaller samples, because chance variation shrinks as the sample grows. A larger sample does not fix bias from a bad sampling method.
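A quick simulation can illustrate the effect of sample size. It assumes a made-up population in which 60% would answer "yes" and compares how much the sample proportion bounces around for samples of 25 versus 400.

```python
import random
import statistics

rng = random.Random(3)
TRUE_PROPORTION = 0.6  # assumed population proportion for the simulation

def sample_proportion(n):
    """Proportion of 'yes' answers in one random sample of size n."""
    return sum(rng.random() < TRUE_PROPORTION for _ in range(n)) / n

def spread(n, repetitions=1000):
    """Standard deviation of the sample proportion across many samples."""
    results = [sample_proportion(n) for _ in range(repetitions)]
    return statistics.stdev(results)

small, large = spread(25), spread(400)
print(f"n=25: {small:.3f}   n=400: {large:.3f}")
```

The spread for the larger samples comes out noticeably smaller, which is the sense in which larger random samples are more accurate.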

Page 11: Chapter 5  Data Production

4.2 Designing Experiments

The individuals on which the experiment is done are the experimental units.

If units are humans, they are called subjects.

The experimental condition applied to the units (aka the thing we ‘do’ to the people participating) is called a treatment.

The goal of an experiment is to establish a causal link between a particular treatment and a response.

Page 12: Chapter 5  Data Production

Factors & levels

Factors: the explanatory variables in the study (example: a study of differences in gender and alcohol preference has 2 factors: gender and alcohol preference)

Levels: the ‘categories’ (values) of each factor (gender has 2 levels: M/F; alcohol, let’s say, has 3 levels: hard liquor/beer/wine)

This is an example of a 2x3 study (2 x 3 = 6 combinations)
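The combinations in the 2x3 example can be listed directly. A minimal sketch, using the factor and level names from the slide:

```python
from itertools import product

factors = {
    "gender": ["M", "F"],
    "alcohol": ["hard liquor", "beer", "wine"],
}

# Every combination of one level from each factor.
combinations = list(product(*factors.values()))
for combo in combinations:
    print(combo)

print(len(combinations))  # 2 * 3 = 6
```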

Page 13: Chapter 5  Data Production

Control

Control for lurking variables as much as possible.

Page 14: Chapter 5  Data Production

Randomization

Comparison of the effects of several treatments is valid only when all treatments are applied to similar groups of experimental units. Assigning units to treatments by chance is how we create such groups.

Page 15: Chapter 5  Data Production

Replication

Even with control, natural variability occurs among experimental units.

We would like to see units within a treatment group responding similarly to one another, but differently from units in other treatment groups (then we can be more confident that the treatment is responsible for the differences).

If we assign many individuals to each treatment group, the effects of chance (and individual differences) will average out.

Page 16: Chapter 5  Data Production

Randomized Comparative Experiments

Page 17: Chapter 5  Data Production

Randomization produces 2 groups of subjects we expect to be similar in all respects before treatment is applied

Comparative design ensures that influences other than the treatment being studied operate equally on both groups

Therefore, measured differences must be due either to the treatment or to the play of chance in the random assignment of subjects to the 2 groups
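Random assignment to 2 groups can be sketched as a shuffle followed by a split. The subject names and the function name `randomize_two_groups` are placeholders, not part of the original slides.

```python
import random

def randomize_two_groups(subjects, rng):
    """Shuffle the subjects and split them into two groups of equal size."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical list of 20 subjects.
subjects = [f"subject_{i:02d}" for i in range(1, 21)]
treatment, control = randomize_two_groups(subjects, random.Random(4))

print(treatment)
print(control)
```

Because chance alone decides who lands in which group, neither the experimenter nor the subjects can tilt the groups before the treatment is applied.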

Page 18: Chapter 5  Data Production

Principles of Experimental Design

1. Control the effects of lurking variables on the response, most simply by comparing 2 or more treatments

2. Replicate each treatment on many units to reduce chance variation in results

3. Randomize – use impersonal chance to assign experimental units to treatments

Page 19: Chapter 5  Data Production

Statistical Significance

We hope to see big differences (differences so large they are not likely just due to chance or individual differences).

If the observed effect is so large that it would rarely occur by chance, we call the result statistically significant
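One way to see what "would rarely occur by chance" means is to reshuffle the group labels many times and count how often chance alone produces a difference as large as the one observed. The response values below are made up for illustration, and this reshuffling approach is only one possible sketch of the idea.

```python
import random
import statistics

# Made-up responses, for illustration only.
treatment = [12, 15, 14, 16, 13, 17, 15, 14]
control = [10, 11, 13, 9, 12, 10, 11, 12]
observed = statistics.mean(treatment) - statistics.mean(control)

rng = random.Random(6)
pooled = treatment + control
n_treat = len(treatment)
reshuffles = 5000
at_least_as_large = 0

for _ in range(reshuffles):
    # Pretend the treatment did nothing: group labels are assigned by chance.
    rng.shuffle(pooled)
    diff = statistics.mean(pooled[:n_treat]) - statistics.mean(pooled[n_treat:])
    if diff >= observed:
        at_least_as_large += 1

chance_fraction = at_least_as_large / reshuffles
print(observed, chance_fraction)
```

A very small `chance_fraction` says the observed difference would rarely arise from the random assignment alone.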

Page 20: Chapter 5  Data Production

Blocking/block design

A block is a group of experimental units that are known before the experiment to be similar in some way that is expected to systematically affect the response to treatments (ex: testing the effect of weight lifting on a group of people; men and women will have obvious differences).

Separate subjects into “blocks” of similar subjects, then randomly assign treatments within each block, to reduce the effect of variation between blocks
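A sketch of a block design for the weight-lifting example, with treatments assigned at random separately inside each block. The subject IDs and the comparison treatment ("no weight lifting") are assumptions added for illustration.

```python
import random

rng = random.Random(7)

# Hypothetical subjects, blocked by sex as in the weight-lifting example.
subjects = [(f"W{i:02d}", "woman") for i in range(1, 9)]
subjects += [(f"M{i:02d}", "man") for i in range(1, 9)]

assignment = {}
for block in ("woman", "man"):
    members = [name for name, sex in subjects if sex == block]
    rng.shuffle(members)  # randomize within the block only
    half = len(members) // 2
    for name in members[:half]:
        assignment[name] = "weight lifting"
    for name in members[half:]:
        assignment[name] = "no weight lifting"

print(assignment)
```

Each block ends up with the same number of subjects in each treatment, so differences between men and women cannot be mistaken for a treatment effect.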

Page 21: Chapter 5  Data Production

Matched Pairs Design

Matching the subjects in various ways can produce more precise results than simple randomization

Matched pairs design compares 2 treatments. Subjects are matched in pairs, and chance decides which member of each pair gets which treatment.

Fitness example: pair females with each other and males with each other; one person in each pair, chosen at random, goes to one treatment group (weights), and the other person goes to the other treatment group (pilates)
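The fitness example can be sketched as follows, assuming the pairs have already been formed. The names are placeholders.

```python
import random

rng = random.Random(8)

# Hypothetical pairs, already matched (females together, males together).
pairs = [("Ana", "Beth"), ("Cara", "Dee"), ("Eli", "Finn"), ("Gus", "Hal")]

assignment = {}
for pair in pairs:
    # Put the two members of the pair in a random order.
    first, second = rng.sample(pair, 2)
    assignment[first] = "weights"
    assignment[second] = "pilates"

print(assignment)
```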

Page 22: Chapter 5  Data Production

Cautions about experimentation

Double-blind: neither the subjects nor the people who work with them know which treatment each subject received

Lack of realism: subjects or treatments of an experiment may not realistically duplicate the conditions we really want to study.