Lecture 19: Program Evaluation and Quasi-Experiments

Page 1: Lecture 19: Program Evaluation and Quasi-Experiments

Lecture 19: Program Evaluation and Quasi-Experiments

Page 2: Lecture 19: Program Evaluation and Quasi-Experiments

Correlation and Causality

Fact 1: Correlation does not imply causality.

Fact 2: Causality implies correlation.

A correlational study can provide evidence AGAINST a causal hypothesis. The problem is that a correlational study (by itself) does not give us much confidence when making affirmative causal inferences.

Page 3: Lecture 19: Program Evaluation and Quasi-Experiments

Program Evaluation

Page 4: Lecture 19: Program Evaluation and Quasi-Experiments

What is Program Evaluation?

The use of social science methods to systematically investigate the effectiveness of social intervention programs.

The focus here is on summative evaluation, which judges whether a program is effective. Formative evaluation, by contrast, aims to help improve existing programs rather than to make effectiveness judgments.

Senator D. P. Moynihan (1927-2003): “If there is any empirical law that is emerging from the past decade of widespread evaluation activity, it is that the expected value for any measured effect of a social program is zero.”

Page 5: Lecture 19: Program Evaluation and Quasi-Experiments

Reforms as Experiments (Campbell, 1969)

Problem: Specific [programs] are [often] advocated as though they were certain to be successful.

Campbell’s Solution: Hard-Headed Evaluation of Effects. Focus on the Importance of the Problem. Demand Empirical Evidence that Programs Work.

Political Stance: “This is a serious problem. We propose to initiate Policy A on a [trial] basis. If after five years there has been no significant improvement, we will shift to Policy B” (p. 410)

Page 6: Lecture 19: Program Evaluation and Quasi-Experiments

Surefire Paths to Success (p. 428)

What should you do if you want to make sure that your program appears to work? Some ideas: rely on testimonials and capitalize on regression artifacts.

“Human courtesy and gratitude being what it is, the most dependable means of assuring a favorable evaluation is to use voluntary testimonials from those who have had the treatment” (p. 426)

Page 7: Lecture 19: Program Evaluation and Quasi-Experiments

Regression Toward the Mean

Scores that are extreme at one time are not likely to be as extreme on a second testing.

Why? Two sets of scores are never perfectly correlated.

Take those 25 people who scored 65 or worse on Exam 1. What was their average gain from Exam 1 to Exam 2? 4.10 points. What about those 13 people who scored 95 or better? What was their average difference? A loss of 2.34 points.
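A small simulation can make the exam example concrete. The sketch below assumes each exam score is a stable ability component plus independent occasion-specific noise, so the two exams are correlated but not perfectly. The sample size, means, standard deviations, and cutoffs are illustrative assumptions, not the class data.

```python
import random
import statistics


def simulate_exams(n=2000, seed=19):
    """Return (exam1, exam2) score lists that are correlated but not perfectly.

    Every number here (n, mean 80, SDs 8 and 6) is an illustrative assumption.
    """
    rng = random.Random(seed)
    exam1, exam2 = [], []
    for _ in range(n):
        ability = rng.gauss(80, 8)               # stable component shared by both exams
        exam1.append(ability + rng.gauss(0, 6))  # occasion-specific noise, exam 1
        exam2.append(ability + rng.gauss(0, 6))  # independent noise, exam 2
    return exam1, exam2


def mean_change(exam1, exam2, keep):
    """Average (Exam 2 - Exam 1) for students whose Exam 1 score passes keep()."""
    changes = [b - a for a, b in zip(exam1, exam2) if keep(a)]
    return statistics.mean(changes)


exam1, exam2 = simulate_exams()
low_change = mean_change(exam1, exam2, lambda s: s <= 70)   # extreme low scorers
high_change = mean_change(exam1, exam2, lambda s: s >= 90)  # extreme high scorers
print(f"Low scorers' average change:  {low_change:+.2f}")
print(f"High scorers' average change: {high_change:+.2f}")
```

Under these assumptions the low scorers gain on average and the high scorers lose on average, with no intervention at all. That is the same pattern as in the exam figures above.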

Page 8: Lecture 19: Program Evaluation and Quasi-Experiments

Regression to the Mean - 2

“This true story illustrates a saddening aspect of the human condition. We normally reinforce others when their behavior is good and punish them when their behavior is bad. By regression alone, therefore, they are most likely to improve after being punished and most likely to deteriorate after being rewarded. Consequently, we are exposed to a lifetime schedule in which we are most often rewarded for punishing others, and punished for rewarding.” (Kahneman & Tversky, 1973, p. 251).

Page 9: Lecture 19: Program Evaluation and Quasi-Experiments

How could you use this phenomenon to create a positive impression of a program?

Page 10: Lecture 19: Program Evaluation and Quasi-Experiments

Death, Taxes, and Regression to the Mean

Regression to the mean is as inevitable as death and taxes. Academic performance, emotional well-being, medical diagnosis, investment return, athletic feats, motion picture sales, and any other variable you can think of all exhibit regression toward the mean. (Reichardt, 1999, p. ix)

Page 11: Lecture 19: Program Evaluation and Quasi-Experiments

Applied Research

Page 12: Lecture 19: Program Evaluation and Quasi-Experiments

Example

Palmgreen, P., Donohew, L., Lorch, E. P., Hoyle, R. H., & Stephenson, M. T. (2001). Television campaigns and adolescent marijuana use: Tests of sensation seeking targeting. American Journal of Public Health, 91, 292-296.

Page 13: Lecture 19: Program Evaluation and Quasi-Experiments

Palmgreen et al. (2001)

Objective: To evaluate the effectiveness of one television media campaign designed to reduce marijuana use among an at-risk group, high sensation seekers.

Sensation seeking is a trait associated with the need for novel and intense stimuli and the willingness to take risks to obtain such stimuli.

Design: 32-month interrupted time series design.

Logic: Before-and-after comparisons. Cross-site comparisons.

Page 14: Lecture 19: Program Evaluation and Quasi-Experiments

Data Collection Basics

Duration: Beginning 8 months before the first campaign and lasting 8 months after the last campaign.

Sites: Two counties in southern states.

Each month, draw a random sample of 100 public school students in each county.

Measures: Sensation seeking and 30-day use of ATOD (alcohol, tobacco, and other drugs).

Page 15: Lecture 19: Program Evaluation and Quasi-Experiments
Page 16: Lecture 19: Program Evaluation and Quasi-Experiments

Interrupted Time Series Design

Extension of the pretest-posttest design.

A stronger argument can be made to eliminate maturation, testing, and history effects.

Can also be used with multiple groups with and without the treatment.

Group 1   O1 O2 O3 X1 O4 O5 O6
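The sketch below illustrates why a series of observations supports a stronger argument against maturation than a single pretest and posttest. It fits a straight line to the pre-treatment observations, projects that line past the treatment X, and compares the observed post-treatment values with the projection. The monthly figures are made up for illustration (they are not the Palmgreen et al. data), and the straight-line pre-treatment trend is an assumption of the sketch.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def its_effect(series, n_pre):
    """Mean gap between observed and projected values after treatment.

    series: observations in time order; the first n_pre precede treatment X.
    The projection extends the linear trend fitted to the pre-treatment points.
    """
    slope, intercept = fit_line(list(range(n_pre)), series[:n_pre])
    gaps = [
        series[t] - (intercept + slope * t)
        for t in range(n_pre, len(series))
    ]
    return sum(gaps) / len(gaps)


# Hypothetical monthly percentages reporting 30-day use (made-up numbers).
# Use climbs one point per month, then the campaign (X) begins after month 6.
monthly_use = [20, 21, 22, 23, 24, 25, 24, 23, 22, 21, 20, 19]

# Simple before/after comparison of means, ignoring the upward trend.
naive_change = sum(monthly_use[6:]) / 6 - sum(monthly_use[:6]) / 6

# Comparison against where the pre-treatment trend was heading.
trend_adjusted = its_effect(monthly_use, n_pre=6)

print(f"Before/after difference in means: {naive_change:+.1f}")
print(f"Observed minus projected trend:   {trend_adjusted:+.1f}")
```

In this made-up series the plain before/after difference is small because use was already rising before the campaign. Comparing against the projected trend shows a much larger departure, which a two-point pretest-posttest design could not reveal.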

Page 17: Lecture 19: Program Evaluation and Quasi-Experiments

Replicated Interrupted Time Series Design #2 (p. 323)

Group 1   O1 O2 O3 X1 O4 O5 O6

Group 2   O7 O8 O9 O10 O11 X2 O13

Both groups are exposed to treatment but at different times.

Quite strong from the point of view of internal validity

Why?

Page 18: Lecture 19: Program Evaluation and Quasi-Experiments

What makes this a quasi-experimental design?

One or more independent variables are being manipulated but participants are not randomly assigned to conditions.

What is the textbook threat in this case? How plausible is this threat?

“Quasi-experiments, however, rely heavily on researcher judgments about assumptions, especially on the fuzzy but indispensable concept of plausibility.” (Shadish et al., 2002, p. 484)

Page 19: Lecture 19: Program Evaluation and Quasi-Experiments

Quasi-Experimental Designs

Page 20: Lecture 19: Program Evaluation and Quasi-Experiments

Notation

X = IV, a treatment, or putative (supposed) cause

O = DV, an observation, or putative effect

Notation used in Campbell and Stanley (1966)

Recall Threats: Selection, Maturation, History, Instrumentation, Attrition

Page 21: Lecture 19: Program Evaluation and Quasi-Experiments

One-Shot Case Study

Single group studied once.

“As has been pointed out (e.g., Boring, 1954; Stouffer, 1949) such studies have such a total absence of control as to be of almost no scientific value” (Campbell & Stanley, 1966, p. 6).

Basic to scientific evidence is the process of comparison. There is no point of comparison here. Misplaced precision.

X O1

Page 22: Lecture 19: Program Evaluation and Quasi-Experiments

Static-Group Comparisons

The dashed line indicates a lack of random assignment

Selection is a serious threat to internal validity

Temporal precedence is often hard to establish

Group 1   X       O1
- - - - - - - - - - -
Group 2   Not X   O2

OR

Group 1   X1   O1
- - - - - - - - - - -
Group 2   X2   O2
- - - - - - - - - - -
Group 3   X3   O3
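A short simulation shows how selection can produce a group difference in a static-group comparison even when the treatment does nothing. The sketch assumes a hypothetical trait ("motivation") that makes people both more likely to enroll and more likely to score well. The trait, the effect sizes, and the sample size are all illustrative assumptions.

```python
import random
import statistics


def static_group_difference(n=4000, true_effect=0.0, seed=22):
    """Posttest-only difference between self-selected groups (O1 - O2).

    Nobody is randomly assigned: enrollment depends on motivation, and
    motivation also raises the outcome. All parameters are assumptions.
    """
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        motivation = rng.gauss(0, 1)
        # Self-selection: more motivated people are more likely to enroll.
        enrolls = motivation + rng.gauss(0, 1) > 0
        outcome = 50 + 5 * motivation + rng.gauss(0, 5)
        if enrolls:
            treated.append(outcome + true_effect)
        else:
            untreated.append(outcome)
    return statistics.mean(treated) - statistics.mean(untreated)


# The program has no effect at all, yet the groups differ at posttest.
difference = static_group_difference(true_effect=0.0)
print(f"Treated minus untreated, true effect 0: {difference:+.2f}")
```

With a true effect of zero, the treated group still scores several points higher under these assumptions, because the groups were different before the treatment began.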

Page 23: Lecture 19: Program Evaluation and Quasi-Experiments

Pretest-Posttest Nonequivalent Control Group Design

Can help evaluate the extent to which selection is a threat to validity

Temporal precedence is clear

Group 1   O1   X1   O2
- - - - - - - - - - - - -
Group 2   O3   X2   O4
- - - - - - - - - - - - -
Group 3   O5   X3   O6

OR

Group 1   O1   X1   O2
- - - - - - - - - - - - -
Group 2   O3        O4
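One common way to use the pretests in the two-group version of this design is to compare gains rather than posttest scores. Taking O1 and O3 as the pretests and O2 and O4 as the posttests of the treated and comparison groups, the sketch below computes the pretest gap (a check on selection) and the difference in gains. The group means are hypothetical, and reading the difference in gains as a treatment effect assumes both groups would have changed by the same amount without the treatment.

```python
def pretest_gap(o1, o3):
    """O1 - O3: how different the groups were before any treatment."""
    return o1 - o3


def difference_in_gains(o1, o2, o3, o4):
    """(O2 - O1) - (O4 - O3): treated group's gain minus comparison group's gain."""
    return (o2 - o1) - (o4 - o3)


# Hypothetical group means (made-up numbers for illustration only).
o1, o2 = 60, 70  # Group 1: pretest, posttest (received X)
o3, o4 = 52, 57  # Group 2: pretest, posttest (no treatment)

gap_before = pretest_gap(o1, o3)                   # selection check
gap_after = o2 - o4                                # what a static-group comparison would report
gain_diff = difference_in_gains(o1, o2, o3, o4)    # change beyond the comparison group's change

print(f"Pretest gap (O1 - O3):          {gap_before}")
print(f"Posttest gap (O2 - O4):         {gap_after}")
print(f"Difference in gains:            {gain_diff}")
```

Here the groups already differ by 8 points at pretest, so the 13-point posttest gap overstates the treatment's contribution. The difference in gains is 5 points. If the groups were changing at different rates before treatment, even this figure would be biased.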