
Is Random Assignment Passé?

Walter Nicholson Department of Economics

Amherst College

Version of November 14, 2001

Paper prepared for the International Methodology Conference “From Theory to Practice,” sponsored by Evaluation and Data Development, Human Resources Development Canada. Ottawa, November 16, 2001.

Estimates from designs featuring random assignment have long been considered by researchers

to be the “gold standard” in social policy evaluation. The statistical advantage of such designs

(that they control for unobserved differences between treated and untreated individuals) in

combination with the ease with which such results can be explained to policymakers has made

random assignment the benchmark against which other studies are measured (see, for example,

Lalonde, 1986). Despite this apparent success, a number of challenges have recently been made

to the random assignment hegemony. Specifically, some authors have pointed out that, for some

applications, the conceptual superiority of random assignment is not a foregone conclusion

(Heckman and Smith, 1995). Others have pointed to practical and ethical problems in the

implementation of random assignment designs1. Finally, a recent challenge to random

assignment has been the development of a number of statistical methodologies that, it is claimed,

yield estimates as good as those obtained through random assignment without the

implementation problems that random assignment poses (see, for example, Dehejia and Wahba,

1999).

The purpose of this paper is to provide a critical assessment of these attacks on random

assignment. It concludes that in many cases the objections to random assignment are overstated.

In other cases we show that, although the objections to random assignment have some validity,

these objections apply with even greater force to other statistical approaches to the evaluation

problem. A final section of the paper restates the case for random assignment and suggests some

ways in which research using such designs might be enhanced.

1 Program staff are more likely to raise ethical issues connected with random assignment than are researchers. For example, a survey of JTPA training centers found that more than half cited ethical and public relations concerns about participating in a random assignment evaluation (Doolittle and Traeger, 1990).


A. The Simple Case for Random Assignment

The case for random assignment is very simple: it is the only approach to social

policy evaluation that assures statistical independence between the treatment being offered and

other determinants of outcomes. This point can be illustrated with a simple linear2 structural

model which assumes that the outcome of interest for a particular individual (Yi – which might

be taken to be annual earnings) depends on both observed (X1i) and unobserved (X2i)

characteristics3 of a population, on a treatment variable (T – which may be either binary or

continuous), and on a purely random error term (Ui – which is assumed to be independent of the

other variables) according to the equation:

$$ Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 T_i + U_i \qquad [1] $$

Random assignment ensures that Ti is statistically independent of all of the other variables in the

model. The “treatment effect” (β3) can therefore be consistently estimated by least squares or

other procedures4. The key independence here is between T and the unobservable variables in

the model (X2). No other approach to evaluation can guarantee that independence. In the

absence of such independence, potential correlations between T and X2 may well result in

inconsistent estimates because, by hypothesis, it is impossible to control completely for these

unobservable variables. Hence, estimates will be confounded by unobservable influences that

affect both treatments received and outcomes. That is, as a rule, they will be subject to

“selectivity biases” of unknown magnitudes and directions.
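The independence argument can be illustrated with a small simulation of Equation [1]. This is a hypothetical sketch, not drawn from any study: the parameter values (β0 = 10, β1 = 2, β2 = 3, β3 = 1.5), the normal distributions, and the self-selection rule are all assumptions. When T is assigned by a coin flip, the simple treatment-control difference in mean outcomes comes out close to β3; when people with low values of the unobserved X2 are more likely to enter treatment, the same comparison is badly biased, because the treated group differs systematically in X2.

```python
import random
import statistics

# Hypothetical parameter values for Equation [1]:
# Y = B0 + B1*X1 + B2*X2 + B3*T + U
B0, B1, B2, B3 = 10.0, 2.0, 3.0, 1.5


def simulate(n, randomize, seed=0):
    """Return lists (T, Y) for n simulated individuals.

    If randomize is True, T is a coin flip, independent of X2.
    Otherwise (an assumed self-selection rule) people with low values of
    the unobserved X2 are more likely to enter treatment.
    """
    rng = random.Random(seed)
    ts, ys = [], []
    for _ in range(n):
        x1 = rng.gauss(0, 1)  # observed characteristic
        x2 = rng.gauss(0, 1)  # unobserved characteristic
        u = rng.gauss(0, 1)   # purely random error
        if randomize:
            t = 1 if rng.random() < 0.5 else 0
        else:
            t = 1 if x2 + rng.gauss(0, 1) < 0 else 0
        ts.append(t)
        ys.append(B0 + B1 * x1 + B2 * x2 + B3 * t + u)
    return ts, ys


def difference_in_means(ts, ys):
    """Treatment-minus-control difference in mean outcomes."""
    treated = [y for t, y in zip(ts, ys) if t == 1]
    control = [y for t, y in zip(ts, ys) if t == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


if __name__ == "__main__":
    for randomize in (True, False):
        estimate = difference_in_means(*simulate(100_000, randomize))
        label = "random assignment" if randomize else "self-selection"
        print(f"{label}: estimate = {estimate:.2f} (true b3 = {B3})")
```

Only the assignment rule differs between the two runs. Because X2 never enters the estimation step, nothing done with the observed data can reveal or remove the bias in the self-selected case.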

2 Of course, an equation that is linear in parameters need not be linear in variables, so this model includes many possible non-linearities. Determination of the precise functional form may not be straightforward, however (see the discussion of matching methods later in this paper).

3 Such characteristics may be “unobserved” either because the analyst has chosen not to measure them or because the variables themselves are innately difficult to measure. A brief discussion of the potential importance of more extensive data collection is provided in the concluding section of this paper.

In addition to this unique advantage, estimates from random assignment evaluations have a

number of secondary advantages. First, because the methodology is easily explained and is

familiar from other sciences, policymakers may have greater confidence in them than they would

in those based on more complex calculations. Second, because the underlying methodology is

considered valid, variations among experimental findings (say, across sites) can be viewed as

arising from “real” differences that might be informative (about differences in program

implementation and operation or about interactions with the economic environment, say) rather

than from statistical artifacts. Finally, adoption of random assignment offers at least the

possibility of using structurally-oriented treatment specifications so that complex response

surfaces can be estimated in ways that permit estimates to be made for newly developed program

options (see Conlisk and Watts, 1977 and the discussion in the conclusion to this paper).
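To illustrate what a structural treatment specification makes possible, the sketch below fits a linear response surface, mean hours = a + b·G + c·t, to cell means from a hypothetical experiment that randomized people across four combinations of an income guarantee G and an implicit tax rate t, and then predicts the response to a combination (G = 4, t = 0.6) that was never offered. The cell values, the linear form, and the function names are assumptions made for illustration only; a real design would need a theoretically justified specification and would weight cells by their sample sizes.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_response_surface(cells):
    """Least-squares fit of y = a + b*G + c*t to (G, t, y) cell means."""
    rows = [(1.0, g, t) for g, t, _ in cells]
    ys = [y for _, _, y in cells]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve(xtx, xty)  # normal equations (X'X) beta = X'y


def predict(coefs, g, t):
    """Predicted mean response at guarantee g and tax rate t."""
    a, b, c = coefs
    return a + b * g + c * t


# Hypothetical cells: (guarantee G in $1,000s, tax rate t, mean annual hours)
CELLS = [(3.0, 0.5, 1530.0), (3.0, 0.7, 1470.0), (5.0, 0.5, 1450.0), (5.0, 0.7, 1390.0)]

coefs = fit_response_surface(CELLS)
print(round(predict(coefs, 4.0, 0.6)))  # untested policy: 1460
```

The hypothetical cell means were constructed to lie exactly on a plane, so the fit recovers it exactly. With real data, the credibility of such an out-of-sample prediction rests on the assumed functional form, which is why the structural model must be specified before the experiment is designed.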

These advantages of random assignment have not gone unchallenged, however. The challenges

can be roughly categorized into three general groupings: (1) Methodological; (2) Ethical and

Cost; and (3) Availability of better alternatives. In the following sections we examine each of

these in turn.

4 Later we discuss why β3 cannot always be interpreted as the effect of “the treatment on the treated”.


B. Methodological Challenges to Random Assignment

Methodological challenges to random assignment focus primarily on claims that the estimates of

the treatment parameter (β3) in Equation [1] do not actually measure what they purport to

measure. Here we look at four such claims.

1. Faulty Randomization

Although attempts by clients or program staff to undermine randomization explicitly

(through purposeful creaming of participants, for example) are relatively rare, the correct

implementation of random assignment in social experiments is not so simple as it may at first

appear. Some of the most complex design questions involve the seemingly simple issue of

“when to flip the coin”. Of course, no matter when randomization occurs in the program entry

process, Equation [1] can be used to obtain a consistent estimate of β3, but the interpretation of

β3 as a program “impact” may be subject to considerable ambiguity. For example,

Heckman and Smith (1995) illustrate how cost considerations drove researchers to adopt a less

than optimal placement of random assignment in the JTPA evaluation. By placing random

assignment too long before the actual initiation of specific programs, the evaluation experienced

large-scale attrition among those individuals assigned to the training treatments. It then became

unclear whether estimates of β3 actually reflected the “impact of training”5. Similar problems

occurred in randomizing the prepaid “HMO” treatment in the Health Insurance Experiment

(Manning et al., 1987). There, it proved very difficult for the researchers to implement random

assignment at a stage where it was known that both experimental and control individuals would

have been willing to participate in a prepaid medical plan – a decision that may have depended

importantly on the individual’s health status. In the Canadian context, the design of the very

successful Self-Sufficiency Project (Michalopoulos, et al., 2000) required that all participants

volunteer for the study before randomization was implemented. Whether this procedure

replicates the operation of an earnings supplement in an on-going program is open to question.

The lesson of these experiences is, of course, that considerable care must be taken in the design

of random assignment evaluations. Practical problems in implementation may indeed undermine

the validity of the results. But this problem is not unique to random assignment evaluations. All

statistical estimates of program impacts must define which individuals are program

“participants”. There will always be some arbitrariness in this definition6. And each potential

definition may introduce its own selectivity biases.

5 The standard solution to this problem is to redefine the observed treatment effect as measuring the impact of “an offer of training”. An alternative approach is to use random assignment as an instrumental variable to predict actual participation in training. In this case, the estimate of β3 can be interpreted as the effect of the “treatment on the treated” (see Bloom et al., 1997).

6 For example, a focus only on those who “complete” a particular treatment would clearly yield a biased sample of all individuals who spent any time in it. But a participation definition that includes everyone with any time in a program will also yield a biased sample of those who actually received the intended intervention. Similar problems arise in defining members of the comparison sample, especially in deciding on an artificial starting date to be applied to that sample (for a discussion in the context of evaluating employment and training programs in Nova Scotia, see Nicholson, 2000).


2. Substitution Bias

A related set of problems occurs in random assignment evaluations in defining precisely what

“treatment” was received by members of the control group. If persons assigned to the control

group choose to enroll in programs similar to those offered to individuals in the treatment

category, or if treated individuals displace those in the comparison group in obtaining new jobs,

the interpretation of β3 requires some care. Although the parameter still measures the difference

between those treated and those not treated, it cannot be viewed as an unbiased estimate of the

“impact of the treatment on the treated”. That is, because the control group does not actually

receive a “null” treatment, simple experimental-control differences will not measure the pure

impact of the program. Heckman, Lalonde, and Smith (1999) explore the statistical issues that

surround this problem. In some cases, of course, policymakers may not be specifically interested

in a “pure” impact estimate, but may prefer information about an “incremental” estimate that

measures how program participants fare relative to individuals who experience a baseline set of

services. Hence, all evaluations must ultimately ask whether the impact estimate obtained is the

one of most direct policy relevance.
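The dilution described here can be made explicit under the constant-effect model of Equation [1], with two assumptions added for illustration: T is binary, and services that control-group members obtain elsewhere have the same effect β3 as the program's own. Let R_i = 1 for individuals randomly assigned to the experimental group and R_i = 0 for controls. Because R_i is randomly assigned, it is independent of X_{1i}, X_{2i}, and U_i, so taking expectations of Equation [1] within each group gives

$$
E[Y_i \mid R_i = r] = \beta_0 + \beta_1 E[X_{1i}] + \beta_2 E[X_{2i}] + \beta_3 P(T_i = 1 \mid R_i = r) + E[U_i], \qquad r \in \{0, 1\}
$$

$$
E[Y_i \mid R_i = 1] - E[Y_i \mid R_i = 0] = \beta_3 \big[ P(T_i = 1 \mid R_i = 1) - P(T_i = 1 \mid R_i = 0) \big]
$$

Substitution by controls raises P(T_i = 1 | R_i = 0), and no-shows in the experimental group lower P(T_i = 1 | R_i = 1); either way the experimental-control difference shrinks toward zero even though β3 itself is unchanged, and dividing by the difference in receipt rates recovers β3 under these assumptions. Displacement, in which treated individuals take jobs that controls would otherwise have obtained, lies outside this calculation, since Equation [1] makes each person's outcome depend only on his or her own treatment.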

Again, although the implications of these observations for understanding impact estimates based

on random assignment should be clearly recognized, it should be pointed out that other statistical

methods are not immune to these problems either. In most respects, there are no meaningful

differences between problems in defining the “treatment” to which the comparison group has

been exposed in experimental and non-experimental evaluations.


3. Experimental Artifacts

A primary purpose of social policy evaluation is to allow the analyst to extrapolate from the

results of an experiment or demonstration to derive estimates of the impact of a fully

implemented program. If the experimental treatment, T, does not accurately replicate how

individuals would view a fully implemented program, experimental estimates cannot be

used directly for this purpose. Experimental artifacts include: (1) Reactions by either

experimental or comparison group members to the data collection aspects of an evaluation

(Hawthorne effects); (2) Reactions related to the limited duration of experiments which may not

be replicated in a permanent program; (3) Focusing experimental evaluations on client groups

that differ from those who would be served in an on-going program; and (4) Devoting more

resources to experimental treatments than would be the case under a fully implemented program.

Any of these effects could in principle severely damage the validity of experimental estimates in

making projections of the effects of full implementation.

Because experimental artifacts do not exist in evaluations that use program-generated data, these

are indeed problems unique to random-assignment or other demonstration-type evaluations.

Such problems are, however, well known and analysts have developed a number of innovative

ways for coping with them. For example, a recent job search demonstration in Maryland used a

separate experimental cell in order to evaluate Hawthorne effects (Johnson, et al., 1998).

Similarly, an early and influential paper by Metcalf (1973) shows how responses observed in

limited-duration experiments can be adjusted for duration biases and extrapolated to long-term impacts by using precisely formulated

structural models. And many of the random assignment evaluations in unemployment

compensation have taken great pains to ensure that experimental treatments closely follow the

procedures that would be used in on-going programs (Robins and Spiegelman, 2001).

4. Heterogeneous Impacts

The evaluation model specified in equation [1] implicitly assumes that there is a single treatment

effect observed for all participants. Of course, most researchers recognize that estimates of β3

can only be regarded as measuring mean impacts and that other parameters of the distribution of

impacts (such as the standard deviation of impacts, the median impact, impacts at various

quantiles of the distribution, or impacts at the margin for program expansions 7) may also be of

interest to policymakers. Perhaps the most frequently used method for presenting such

information is to look at impacts on subgroups. In general, randomization assures that such

estimates will also be consistent. Because of sample size limitations, especially for narrowly

defined populations, most random-assignment experiments have paid relatively little attention to

subgroup results, however.
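The sample-size problem with subgroup estimates follows from the standard error of a difference in means, which shrinks only with the square root of the number of cases. The sketch below computes experimental-control differences and their conventional unequal-variance standard errors within each subgroup; the record layout (subgroup, assigned_to_treatment, outcome) and the function name are assumptions for illustration.

```python
import math
import statistics
from collections import defaultdict


def subgroup_impacts(records):
    """Experimental-control differences in mean outcome, by subgroup.

    records: iterable of (subgroup, assigned_to_treatment, outcome) tuples.
    Returns {subgroup: (impact, standard_error, n)}. Subgroups with fewer
    than two treatment or two control cases are skipped, since their
    variances cannot be estimated.
    """
    groups = defaultdict(lambda: ([], []))  # subgroup -> (treatment ys, control ys)
    for group, treated, y in records:
        groups[group][0 if treated else 1].append(y)
    results = {}
    for group, (t_ys, c_ys) in groups.items():
        if len(t_ys) < 2 or len(c_ys) < 2:
            continue
        impact = statistics.fmean(t_ys) - statistics.fmean(c_ys)
        se = math.sqrt(statistics.variance(t_ys) / len(t_ys)
                       + statistics.variance(c_ys) / len(c_ys))
        results[group] = (impact, se, len(t_ys) + len(c_ys))
    return results
```

With equal outcome variances, halving a subgroup's sample raises the standard error of its impact estimate by a factor of about √2, which is why narrowly defined subgroups rarely yield precise estimates.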

An alternative approach to the study of impact heterogeneity is to model the distribution of

impacts directly. One approach is through the use of variants of the random coefficients model

(Swamy, 1971; Greene, 1997). Heckman, Lalonde, and Smith (1999) review a few other

approaches. They also make the important point that if program participation or attrition is

dependent on individuals’ perceptions of the likely impact of the program on them, the

consistency of most statistical procedures (including random assignment) may be called into

question.

It is important again to note that many of these concerns with impact heterogeneity apply with

equal force to evaluation methods that do not employ random assignment. Examining all of the

ways in which impacts may differ across participants is simply impossible in most

circumstances. Seeking too fine a disaggregation of program impacts may also undermine the

policy rationale for learning about these impacts in the first place.

C. Ethical and Cost Challenges

Implementation of random assignment experiments in social policy often presents ethical

dilemmas. Perhaps the most frequent is that the experimental need to assign some individuals to

a “null” control treatment conflicts with basic societal norms of universal access to programs that

are assumed8 to be beneficial. Sometimes funding limitations can be used as an argument for

service denial – if there are only so many slots available in a program one might as well assign

them randomly. But this rationale often does not sit well with program staff who strongly

believe that resource limitations make focusing services on the most needy

even more pressing. Hence, alternative approaches to solving the universality dilemma seem

worth pursuing.

One approach that has been frequently employed in many recent experimental evaluations is to

use only treatments that represent service enhancements over “standard” levels (this is the

approach, for example, in the on-going Self-Sufficiency Project in Canada, which provides

earnings supplements to former welfare recipients who find full time employment – see

Michalopoulos, et al., 2000). Denial of such enhancements to control group members is not

viewed as conflicting with the universality mandate. The disadvantage of this approach is, of

course, that it permits an evaluation only of the enhancements, not of the standard level of

service. It would require strong structural assumptions indeed to infer the impact of standard

services only from observed reactions to the enhancements. Because many policy initiatives are

incremental, this loss of generality may not be a major disadvantage, however.

7 This latter notion of impact is sometimes referred to as the “local average treatment effect” (LATE) for such groups. For a discussion see Heckman, Lalonde, and Smith, 1999.

8 Notice that there is an important ambiguity here. Because program impacts are not known (they are to be determined in the evaluation), service denial may not in fact be harmful.

Some economists have suggested monetary compensation as a way of mitigating the low service

levels provided to control cases in random assignment designs. One of the few examples of

applying this principle in practice was in the Health Insurance Experiment (Manning et al. 1987)

where all participants received a lump sum payment that ensured that even those in high

coinsurance cells were not made financially worse-off by participation in the experiment.

Although no employment or training intervention has sought to buy individuals out of their

universal eligibility, there seems no obvious ethical reason (other than cost) why this could not

be done. Of course, such a treatment would have to be carefully evaluated to ensure that

payment of the lump sum participation bonus did not interact with the treatment effect to be

estimated.

Random assignment experiments in social policy are costly in financial terms and may also

impose costs on agencies delivering the services. The financial costs of experimentation in

on-going programs consist mainly of research-specific costs because the majority of services would

have been delivered in any case. Still, the administrative costs associated with the incremental

features of random assignment experiments can be significant, amounting to perhaps 30 percent

of the total research budget. For experiments that include separately budgeted treatments, costs

can be quite high. To replicate the three largest of the random assignment experiments

conducted in the United States during the 1970’s (the Seattle-Denver Income Maintenance

Experiment, the Health Insurance Experiment, and the National Supported Work Demonstration)

would easily cost more than $200 million each in today’s dollars (Greenberg and Shroder, 1997).

Is the information gathered from these experiments worth such costs? A comparison of

experimental costs to the costs of the on-going programs on which they focus is perhaps overly

optimistic in this regard. Surely, it might be argued, spending less than one percent of program

costs on experimental evaluations must provide information that can improve program

operations by at least that amount. Such a conclusion is not obvious, however, because, in order

for the information developed in social experiments to have value, that information must in some

way change policymakers’ decisions.9 Although it seems clear that some narrowly focused

experiments have generated information that changed policy10, it is much less clear whether

larger scale experiments have had that effect. For example, probably the two most important

empirical findings from the random assignment experiments of the 1970’s were: (1) That the

substitution effects induced by implicit tax rates on income support payments were relatively

low, at least for males (Burtless, 1986; Hum and Simpson, 1993); and (2) That individuals did

indeed respond to coinsurance rates for medical care (Manning et al. 1987). It seems probable

that both of these findings had some influence in future policy debate over welfare reform and

expansions of government-provided health insurance, respectively. But estimating the extent to

which this information produced “better” policy would be a monumental task.
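The requirement that information change decisions can be made precise with a stylized expected-value-of-perfect-information calculation. In the hypothetical sketch below, a policymaker chooses between adopting a program with an uncertain net benefit b and not adopting it (net benefit 0); the possible values of b and their probabilities are invented for illustration. The value of learning b exactly is the expected payoff from deciding after b is known, minus the payoff from deciding now on prior beliefs.

```python
def value_of_information(outcomes):
    """Expected value of perfect information for an adopt/reject decision.

    outcomes: list of (probability, net_benefit) pairs describing prior
    beliefs about the net benefit of adopting a program.
    Not adopting is assumed to yield a net benefit of zero.
    """
    if abs(sum(p for p, _ in outcomes) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Decide now, on prior beliefs: adopt only if expected net benefit > 0.
    value_now = max(0.0, sum(p * b for p, b in outcomes))
    # Decide after an experiment reveals b: adopt only when b > 0.
    value_informed = sum(p * max(0.0, b) for p, b in outcomes)
    return value_informed - value_now


# Hypothetical priors (net benefits in arbitrary units).
print(value_of_information([(0.5, 10.0), (0.5, 30.0)]))   # 0.0: decision cannot change
print(value_of_information([(0.5, -20.0), (0.5, 30.0)]))  # 10.0
```

In the first case the program is adopted whatever the experiment finds, so its findings are worth nothing for this particular decision, however interesting they may be scientifically.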

9 That is, the losses from adoption of non-optimal policies must be reduced as a result of the information gathered.

10 For example, some of the experiments that focused on the enforcement of continuing eligibility rules for unemployment insurance claimants had important effects on how job search provisions were enforced.

D. The Challenge of New Statistical Methodologies

Many of the disadvantages of random assignment experiments could be avoided if it were

possible to estimate impacts directly from existing data on program participants. That is, if

statistical methodologies could be developed to obtain consistent estimates of β3 directly from

program data on equation [1] by adopting procedures that substitute for random assignment,

much of the rationale for controlled experiments would disappear. Over the past twenty years

major advances have been made along two different lines of approach to this problem: (1)

Instrumental variable estimation; and (2) Matching procedures. Neither of these has yet been

shown to obviate the need for random assignment, however. So long as outcomes are

determined by unobservable factors, the validity of such procedures can never be assessed with

certainty.

1. Instrumental Variable Estimation

Instrumental variable (IV) estimation procedures depend crucially on the existence of a

measurable variable that is correlated with program participation and is statistically independent

of untreated outcomes. Inclusion of this variable in the analysis of equation [1] then permits a

separation of the program participation decision from outcome determination and, in principle,

provides a consistent estimate of β3. Perhaps the most famous method for accomplishing this

procedure was developed by Heckman (1979). In that procedure the instrumental variable (or

possibly several such variables) is first used to identify the program participation relationship 11

and then estimates from this relationship are used to obtain selectivity adjusted estimates of

equation [1].
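Heckman's (1979) two-step procedure requires a probit participation equation and is not reproduced here. The underlying logic can be sketched with a simpler member of the same family, the Wald (grouping) estimator for a single binary instrument Z. In this hypothetical simulation, participation depends on the unobserved X2 and on Z, while Z has no direct effect on outcomes; the parameter values and the participation rule are assumptions, and X1 is omitted for brevity. The naive participant/non-participant comparison is biased, whereas the ratio of the Z-induced difference in outcomes to the Z-induced difference in participation comes out close to β3.

```python
import random
import statistics

B3 = 1.5  # hypothetical true treatment effect


def simulate(n, seed=1):
    """Return lists (Z, T, Y) for n hypothetical individuals.

    Participation T depends on the unobserved X2 and on the instrument Z;
    Z has no direct effect on the outcome Y.
    """
    rng = random.Random(seed)
    zs, ts, ys = [], [], []
    for _ in range(n):
        x2 = rng.gauss(0, 1)                # unobserved characteristic
        z = 1 if rng.random() < 0.5 else 0  # binary instrument
        # Assumed participation rule: low-X2 people and those with Z = 1
        # are more likely to participate.
        t = 1 if -x2 + z + rng.gauss(0, 1) > 0.5 else 0
        y = 10.0 + 3.0 * x2 + B3 * t + rng.gauss(0, 1)
        zs.append(z)
        ts.append(t)
        ys.append(y)
    return zs, ts, ys


def mean_where(flags, values, level):
    """Mean of values for observations whose flag equals level."""
    return statistics.fmean(v for f, v in zip(flags, values) if f == level)


def naive_estimate(ts, ys):
    """Participant minus non-participant difference in mean outcomes."""
    return mean_where(ts, ys, 1) - mean_where(ts, ys, 0)


def wald_estimate(zs, ts, ys):
    """IV (Wald) estimate: reduced-form difference over first-stage difference."""
    reduced_form = mean_where(zs, ys, 1) - mean_where(zs, ys, 0)
    first_stage = mean_where(zs, ts, 1) - mean_where(zs, ts, 0)
    return reduced_form / first_stage


if __name__ == "__main__":
    zs, ts, ys = simulate(200_000)
    print(f"naive: {naive_estimate(ts, ys):.2f}")
    print(f"Wald IV: {wald_estimate(zs, ts, ys):.2f} (true b3 = {B3})")
```

Everything here depends on the assumption, built into the simulation, that Z affects outcomes only through participation; real data offer no such guarantee.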

11 In principle this relationship might be identified because of its non-linearity even in the absence of a suitable instrument. In practice identification by this alternative approach yields unreliable estimates.

The primary shortcoming of these procedures is the absence of believable instruments. Most

measurable variables that affect program participation also affect untreated outcomes. In this

case, IV estimates of β3 will be very sensitive to exactly how the procedure is employed. The

resulting instability introduces a large degree of subjectivity into which estimates are reported and

how statistical significance is assessed. Because impact estimates derived by IV procedures are

also difficult to explain to policymakers, their influence in social policy evaluation in the United

States has, to date, been rather minimal. The concluding section to this paper discusses how

some of these difficulties with IV estimation might be ameliorated through special uses of

random assignment methods.
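The sensitivity can be seen in a simple case, offered as an illustration under assumptions not made in the original: a single instrument Z_i that is uncorrelated with X_{1i}, X_{2i}, and U_i but that also affects untreated outcomes directly, so that Equation [1] becomes Y_i = β0 + β1X1i + β2X2i + β3Ti + γZi + Ui. The probability limit of the simple IV estimator is then

$$
\operatorname{plim}\, \hat{\beta}_3^{IV} = \frac{\operatorname{Cov}(Z_i, Y_i)}{\operatorname{Cov}(Z_i, T_i)} = \beta_3 + \gamma \, \frac{\operatorname{Var}(Z_i)}{\operatorname{Cov}(Z_i, T_i)}
$$

With a valid instrument, γ = 0 and the estimator is consistent. Otherwise the bias is proportional to γ and inversely proportional to the strength of the first-stage relationship Cov(Z_i, T_i), so an instrument that is only weakly related to participation can turn even a small direct effect on outcomes into a large error.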

2. Matching Procedures

Matching procedures pay little attention to the unmeasurable variables (X2) in equation [1] on the

implicit belief that a close enough matching on measured variables (X1) will ameliorate

selectivity problems 12. Early approaches to matching used multidimensional cells to draw

samples of participants and non-participants that closely matched along all the chosen

dimensions. Often these procedures foundered because of dimensionality problems. Exact

matching on a large number of variables proved intractable and the choices involved in reducing

the dimensionality of the problem often proved rather arbitrary. In any event, the matching

procedures did not control for unmeasured determinants of program participation and often this

resulted in estimates that may have been inconsistent.
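The cell-matching approach, and the way it breaks down as dimensions are added, can be sketched as follows. Each person is placed in a cell defined by the values of a few discrete observed characteristics (X1); within each cell the mean outcome of non-participants is subtracted from that of participants, and the cell differences are averaged with weights equal to the number of participants, giving an estimate of the impact on participants. Participants in cells containing no non-participants cannot be matched and are counted separately. The function name and record layout are hypothetical, and the estimate is only as good as the assumption that no unmeasured variable drives participation.

```python
import statistics
from collections import defaultdict


def cell_match(records):
    """Exact matching on discrete observed characteristics.

    records: iterable of (cell_key, participant, outcome), where cell_key is
    a tuple of discrete characteristics, e.g. (age_band, education, sex).
    Returns (estimate, n_matched_participants, n_unmatched_participants).
    """
    cells = defaultdict(lambda: ([], []))  # key -> (participant ys, non-participant ys)
    for key, participant, y in records:
        cells[key][0 if participant else 1].append(y)
    weighted_sum, n_matched, n_unmatched = 0.0, 0, 0
    for part_ys, nonpart_ys in cells.values():
        if not part_ys:
            continue                     # cell contains no participants
        if not nonpart_ys:
            n_unmatched += len(part_ys)  # no comparison cases: support problem
            continue
        diff = statistics.fmean(part_ys) - statistics.fmean(nonpart_ys)
        weighted_sum += len(part_ys) * diff
        n_matched += len(part_ys)
    estimate = weighted_sum / n_matched if n_matched else float("nan")
    return estimate, n_matched, n_unmatched
```

With d characteristics of k categories each there are k^d cells, so even ten variables with four categories apiece produce more than a million cells, most of them empty in any realistic sample.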

12 In principle matching on X1 may prove preferable to estimating equation [1] by least squares because the latter procedure imposes a specific structural form on the data whereas the former does not. Matching can also illustrate the “support” problem: that there are significant non-overlaps in the characteristics of participants and non-participants that are often obscured when OLS is applied uncritically.

A more recent approach to matching adopts the propensity score procedures developed by

Rosenbaum and Rubin (1983). These procedures match participants and non-participants

according to their estimated likelihood of participating in the program of interest. Because

such matching takes place over only one dimension13, it is easier to implement than more

complex matching on many characteristics and it may pose fewer support problems14. Some

initial research on these procedures suggested that they perform rather well in that they were able

to reproduce closely some estimates based on random assignment (Dehejia and Wahba, 1998).

A recent reanalysis of these results suggests that this correspondence may be an artifact of the

specific sample used, however (Smith and Todd, 2000). However these uncertainties about how

propensity score matching performed on this specific data set (taken from the National Supported

Work Demonstration analyzed by Lalonde, 1986) are resolved, the fact remains that this matching procedure also does nothing to ensure

that unmeasured variables will not continue to impart selectivity biases into estimates of β3. Nor can it

be proved that the procedure will eliminate such biases in all cases. Only with random assignment benchmarks

can the validity of the procedure be accurately assessed.
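Once a propensity score has been estimated (for example by a logit or probit of participation on X1, a step not shown here), the one-dimensional matching step itself is simple. The sketch below performs nearest-neighbor matching with replacement on the score, discards participants whose nearest non-participant lies outside a caliper (one common way of enforcing common support), and averages the matched outcome differences. The caliper value of 0.05 and the function name are assumptions; as the text stresses, nothing in this procedure addresses unmeasured variables.

```python
import bisect


def psm_att(scores, participant, outcomes, caliper=0.05):
    """Nearest-neighbor propensity-score matching with replacement.

    scores: estimated propensity scores; participant: booleans;
    outcomes: observed outcomes. Returns (estimate, n_matched, n_dropped),
    where dropped participants had no non-participant within the caliper.
    """
    controls = sorted((s, y) for s, p, y in zip(scores, participant, outcomes) if not p)
    if not controls:
        raise ValueError("no non-participants to match against")
    c_scores = [s for s, _ in controls]
    diffs, dropped = [], 0
    for s, p, y in zip(scores, participant, outcomes):
        if not p:
            continue
        i = bisect.bisect_left(c_scores, s)
        # The nearest control is one of the neighbors around the insertion point.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(controls)]
        j = min(candidates, key=lambda k: abs(c_scores[k] - s))
        if abs(c_scores[j] - s) > caliper:
            dropped += 1  # outside common support
            continue
        diffs.append(y - controls[j][1])
    estimate = sum(diffs) / len(diffs) if diffs else float("nan")
    return estimate, len(diffs), dropped
```

Reporting the number of dropped participants alongside the estimate makes the support problem visible rather than leaving it hidden in the average.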

E. Conclusion – Still the Gold Standard

Hence, the overall conclusion of these brief remarks is that random assignment remains the gold

standard for social policy evaluation. Most conceptual objections to the approach apply equally

well to any other approach to evaluation. Although ethical and cost considerations may be

important in specific applications, in many others these are not insurmountable given the policy

interest in knowing program impacts. And the statistical alternatives to random assignment have

so far proven to fall far short of general acceptability (although they may work well in some

applications). In the remainder of this paper I illustrate a few of the ways that random

assignment methodology might be improved and made more general.

13 In practice, however, a variety of specifications for the propensity score equation are often tried, with multidimensional matching used as a check.

14 In this context such problems arise when there is little overlap between the estimated propensity scores for participant and non-participant groups.

1. Reconsider the advantages of structural modeling

Most recent random assignment experiments have utilized a “black box” approach in which the

treatment is conceptualized as a single binary variable. This is in contrast to the earlier

generation of random assignment experiments that were designed based on rather strong

structural models. Although some authors (Burtless, 1995) have claimed that the black box

experiments have ultimately had a greater impact on policymaking, I believe that case remains

unproven. As stated previously, probably the most lasting contributions to general knowledge

about the values of economic parameters were provided by the income maintenance and health

insurance experiments; I imagine these results will continue to be used in a wide variety of

policy contexts long after the simpler experiments have been forgotten. Heckman, Lalonde, and

Smith (1999) make the case accurately:

Samples generated under the new model for social experiments [black box experiments] produce evidence that does not accumulate in the same way as evidence accumulated under the old model, because there is no common basis for comparing the “treatment effects” from one experiment to those from another....it is difficult to estimate policy-invariant structural parameters that can be used to evaluate a wide variety of programs never previously implemented. (p. 2084)

Designing random assignment experiments based on structural models also has advantages in

terms of sample allocation decisions and addressing experimental artifacts. Of course, it may be

the case that economic theory is not well-enough developed to specify clear structural models

that represent decisions that are of special interest to policymakers. A particular need in this

regard is the development of more carefully specified models of the process of human capital

accumulation. The goal of such models would be to identify the key parameters that influence

whether job-training programs pay off so that these could be the focus of experimental

estimation. Devising such models is no easy task. But it is unlikely that knowledge on “what

works” will advance much until this is done.

2. Explore innovative ways of defining randomly assigned program enhancements

Ethical considerations constrain most random assignment experiments that focus on existing

programs to use program enhancements as treatments. There are two ways in which this

constraint can be made less severe in terms of the information generated. First, to the extent that

the enhancements can be tied to the “base level” treatment through a structural model, it may be

possible to extrapolate “backwards” to learn something about the impact of that base level.

Second, and more likely, random assignment of enhancements may encourage participation in

the program. Hence, the randomly assigned enhancement can act as an instrumental variable in

estimating the impact of the treatment itself15. For example, random assignment of child-care

vouchers in a training program for young mothers might encourage them to get training. Using

voucher eligibility as a first stage predictor of program participation may circumvent some of the

identification problems typically encountered in instrumental variable estimation procedures.
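With a randomly assigned enhancement such as the child-care voucher in this example, the instrumental-variable estimate reduces to a simple ratio of group means, the same logic as using random assignment as an instrument in footnote 5. A sketch, in which the function name and all figures are hypothetical:

```python
def enhancement_iv_estimate(mean_y_offered, mean_y_not_offered,
                            rate_offered, rate_not_offered):
    """Impact of training on those induced to train by a randomly assigned offer.

    mean_y_*: mean outcomes of those randomly offered / not offered the
    enhancement; rate_*: the corresponding training participation rates.
    """
    first_stage = rate_offered - rate_not_offered
    if abs(first_stage) < 1e-9:
        raise ValueError("the enhancement did not change participation")
    return (mean_y_offered - mean_y_not_offered) / first_stage


# Hypothetical figures: vouchers raise training participation from 20 to
# 60 percent and mean annual earnings from 10,000 to 10,200.
print(enhancement_iv_estimate(10_200, 10_000, 0.60, 0.20))  # about 500
```

Because the voucher is randomly assigned, its independence from unmeasured characteristics is guaranteed by design rather than assumed. If impacts vary across people, the ratio measures the impact for those whose participation the voucher actually changed, and a voucher that barely shifts participation will yield an imprecise estimate.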

The use of monetary side payments as a program “enhancement” has been uncommon in social

experimentation. Reasons for this probably relate both to cost and to fears that public knowledge

of such payments may bring claims of giving money away. But there are good reasons why such

payments should be reconsidered. Most important, availability of cash as part of a treatment

may make it feasible to implement treatments that would otherwise be ruled out by ethical

concerns (for example, denials or restrictions of “universal” services). Surely economists

should feel comfortable with such treatments, and perhaps they can convince others of their

usefulness16.

15 The use of random assignment to generate instrumental variables provides a more robust procedure than would, for example, the collection of more information, because selectivity could remain a problem with such additional information. Still, additional data might mitigate inconsistencies in IV estimation. Two particularly promising areas in which additional data collection might be considered are: (1) Measurement of information and psychological attributes of clients that might predict program participation; and (2) Measurement of characteristics of the program entry process that affect participation (for an initial attempt at this process in Nova Scotia, see Nicholson, 2000).

3. Devote additional resources to formal aspects of random assignment design.

The designs of most recent random assignment experiments have stressed the operational aspects of randomization and how it can best be coordinated with ongoing program functions. Because the implementation of random assignment can strongly influence how experimental estimates should be interpreted, such a focus does provide valuable information on how to avoid randomization biases. However, the focus on randomizing simple “black box” treatments has led to a relative neglect of the aspects of formal experimental design that provided many of the early insights from social experiments. These include important topics such as treatment definition, response surface specification, optimal sample allocation, and the development of statistical methodology appropriate to such designs. Devoting additional resources to the design phases of random assignment evaluations might lead to advances in these areas similar to those that were made in the 1970s. Adapting advances in experimental design from other research areas, such as public health or engineering, might be especially promising in this regard. After a gap of twenty-five years, now may be a good time to revisit basic methodological issues in the design of random assignment social experiments.
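Optimal sample allocation is one of the formal design questions listed above. A textbook version, offered here as an illustration and not as the Conlisk and Watts (1977) response-surface model, chooses treatment and control sample sizes to minimize the variance of a difference-in-means impact estimate under a fixed budget. The function names, budget, per-case costs, and standard deviations are all hypothetical.

```python
import math


def optimal_allocation(budget, cost_t, cost_c, sd_t=1.0, sd_c=1.0):
    """Treatment/control sizes minimizing Var(impact) under a fixed budget.

    Minimizes sd_t**2 / n_t + sd_c**2 / n_c subject to
    cost_t * n_t + cost_c * n_c == budget. The first-order conditions give
    n_t / n_c = (sd_t / sd_c) * sqrt(cost_c / cost_t).
    Returns unrounded sizes; round them in practice.
    """
    denom = sd_t * math.sqrt(cost_t) + sd_c * math.sqrt(cost_c)
    n_t = budget * sd_t / (math.sqrt(cost_t) * denom)
    n_c = budget * sd_c / (math.sqrt(cost_c) * denom)
    return n_t, n_c


def impact_variance(n_t, n_c, sd_t=1.0, sd_c=1.0):
    """Sampling variance of a treatment-control difference in means."""
    return sd_t**2 / n_t + sd_c**2 / n_c


# Hypothetical figures: a treatment case costs 4,000 (services plus data
# collection), a control case 1,000 (data collection only).
BUDGET, COST_T, COST_C = 1_000_000, 4_000, 1_000

n_t, n_c = optimal_allocation(BUDGET, COST_T, COST_C)
equal_n = BUDGET / (COST_T + COST_C)  # naive design: same size for both groups

print(f"optimal: n_t={n_t:.1f}, n_c={n_c:.1f}, var={impact_variance(n_t, n_c):.4f}")
print(f"equal:   n={equal_n:.1f} each, var={impact_variance(equal_n, equal_n):.4f}")
```

With these made-up costs the rule places two control cases in the sample for every treatment case (about 167 versus 333) and cuts the variance of the impact estimate by 10 percent relative to equal group sizes of 200. Richer designs with several treatment levels generalize the same idea to choosing sample sizes across points on a response surface.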

4. Increase the availability of experimental data.

Although some social experiments have made serious efforts to make their data available to other researchers, this has not always been the case. Often, issues of confidentiality, the costs of preparing public-use data sets, or the simple desire of researchers to keep their data to themselves have resulted in very limited availability. From a scientific point of view, this is clearly an undesirable state of affairs. Reanalysis of the data from a specific experiment can often turn up unexpected results or suggest alternative ways to proceed. The ability to compare estimates across experiments using common data definitions can often yield important insights about the causes of impact differences. More generally, the public availability of data can help to ensure that the results from experiments contribute to the incremental accumulation of knowledge. For these reasons, most future random assignment evaluations should include explicit funding for the creation of public-use data sets. Resources for further examination of those data sets and for pooled analyses of several experimental data sets should also be made available.
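Pooled analyses ideally work from the microdata themselves, but a simpler stand-in helps show what comparable results make possible. The sketch below assumes that each experiment reports an impact estimate and its standard error for a commonly defined outcome, and applies standard inverse-variance meta-analysis: a fixed-effect pooled impact, Cochran's Q test of whether impacts differ across experiments, and DerSimonian-Laird random-effects weighting. These are generic techniques, not the author's proposal, and the experiment names and numbers are invented for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Experiment:
    name: str
    impact: float  # estimated treatment-control difference
    se: float      # standard error of that estimate


def pool(experiments, tau2=0.0):
    """Inverse-variance weighted mean impact and its standard error.

    tau2 = 0 gives the fixed-effect estimate; tau2 > 0 adds a common
    between-experiment variance (random-effects weighting).
    """
    weights = [1.0 / (e.se**2 + tau2) for e in experiments]
    total = sum(weights)
    mean = sum(w * e.impact for w, e in zip(weights, experiments)) / total
    return mean, math.sqrt(1.0 / total)


def heterogeneity(experiments):
    """Cochran's Q, its degrees of freedom, and the DerSimonian-Laird tau^2."""
    weights = [1.0 / e.se**2 for e in experiments]
    fixed, _ = pool(experiments)
    q = sum(w * (e.impact - fixed) ** 2 for w, e in zip(weights, experiments))
    df = len(experiments) - 1
    scale = sum(weights) - sum(w * w for w in weights) / sum(weights)
    return q, df, max(0.0, (q - df) / scale)


# Made-up impact estimates from three hypothetical experiments.
studies = [
    Experiment("A", impact=0.5, se=0.5),
    Experiment("B", impact=2.5, se=0.5),
    Experiment("C", impact=1.5, se=1.0),
]

fe_mean, fe_se = pool(studies)
q, df, tau2 = heterogeneity(studies)
re_mean, re_se = pool(studies, tau2)
print(f"fixed effect:   {fe_mean:.2f} (se {fe_se:.2f})")
print(f"Q = {q:.1f} on {df} df, tau^2 = {tau2:.3f}")
print(f"random effects: {re_mean:.2f} (se {re_se:.2f})")
```

Here Q (8.0) is well above its 2 degrees of freedom, signaling that the hypothetical impacts differ by more than sampling error alone would explain; the random-effects standard error widens from about 0.33 to about 0.72 accordingly. Explaining such differences is where common data definitions and access to the underlying microdata become essential.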

16 Of course, as pointed out earlier, when income supplements are used one must be careful to understand how they may interact with the treatment parameters of interest.

References

Bloom, H.S., L.L. Orr, S.H. Bell, G. Cave, F. Doolittle, W. Lin, and J.M. Bos. 1997. “The Benefits and Costs of JTPA Title II-A Programs.” Journal of Human Resources, Summer, 32(3), pp. 549-576.

Burtless, G. 1995. “The Case for Randomized Field Trials in Economic and Policy Research.” Journal of Economic Perspectives, Spring, 9, pp. 63-84.

Conlisk, J., and H.W. Watts. 1977. “A Model for Optimizing Designs for Estimating Response Surfaces.” In H.W. Watts and A. Rees (Editors), The New Jersey Income Maintenance Experiment, Volume III. New York: Academic Press, pp. 430-440.

Dehejia, R., and S. Wahba. 1998. “Propensity Score Matching Methods for Non-experimental Causal Studies.” National Bureau of Economic Research (Cambridge, MA) Working Paper No. 6829.

Doolittle, F., and L. Traeger. 1990. Implementing the National JTPA Study. New York: Manpower Demonstration Research Corporation.

Greenberg, D., and M. Shroder. 1997. The Digest of Social Experiments, 2nd edition. Washington, DC: The Urban Institute Press.

Greene, W.H. 1997. Econometric Analysis, 3rd edition. Upper Saddle River, NJ: Prentice-Hall.

Heckman, J. 1979. “Sample Selection Bias as a Specification Error.” Econometrica, 47, pp. 153-161.

Heckman, J., R. LaLonde, and J. Smith. 1999. “The Economics and Econometrics of Active Labor Market Programs.” In O. Ashenfelter and D. Card (Editors), Handbook of Labor Economics, Vol. 3A. Amsterdam: North-Holland, pp. 1865-2097.

Heckman, J.J., and J.A. Smith. 1995. “Assessing the Case for Social Experiments.” Journal of Economic Perspectives, Spring, 9, pp. 85-110.

Hum, D., and W. Simpson. 1993. “Economic Response to a Guaranteed Annual Income: Experience from Canada and the United States.” Journal of Labor Economics, 11(1, Part 2), pp. S263-S296.

Johnson, T.R., D.H. Klepinger, J.M. Joesch, and J.M. Benus. 1998. “Evaluation of the Maryland Unemployment Insurance Work Search Demonstration.” U.S. Department of Labor, Unemployment Insurance Occasional Paper 98-2.

Manning, W.G., J.P. Newhouse, N. Duan, E.B. Keeler, and A. Leibowitz. 1987. “Health Insurance and the Demand for Medical Care: Evidence from a Randomized Experiment.” American Economic Review, June, 77(3), pp. 251-277.

Metcalf, C. 1973. “Making Inferences from Controlled Income Maintenance Experiments.” American Economic Review, June, pp. 478-483.

Michalopoulos, C., D. Card, L.A. Gennetian, K. Harknett, and P.K. Robins. 2000. The Self-Sufficiency Project at 36 Months: Effects of a Financial Work Incentive on Employment and Income. Social Research and Demonstration Corporation.

Nicholson, W. 2000. “Assessing the Feasibility of Measuring Medium Term Net Impacts of the EBSM Program in Nova Scotia.” Working Paper prepared for HRDC, March.

Robins, P.K., and R.G. Spiegelman. 2001. Reemployment Bonuses in the Unemployment Insurance System: Evidence from Three Field Experiments. Kalamazoo, MI: W.E. Upjohn Institute.

Rosenbaum, P., and D. Rubin. 1983. “The Central Role of the Propensity Score in Observational Studies for Causal Effects.” Biometrika, April, 70(1), pp. 41-55.

Smith, J., and P. Todd. 2001. “Reconciling Conflicting Evidence on Performance of Propensity Score Matching Methods.” American Economic Review Papers and Proceedings, May, 91(2), pp. 112-118.

Swamy, P. 1971. Statistical Inference in Random Coefficient Regression Models. New York: Springer-Verlag.
