10
Assessing program impact using latent growth modeling: a primer for the evaluator Brian Hess * Test Scoring and Reporting Services, The University of Georgia, 211 Fairfax Hall, Athens, GA 30602, USA Received 1 January 1999; received in revised form 1 December 1999; accepted 1 April 2000 Abstract Latent growth modeling (LGM) has emerged as a flexible analytic technique for modeling change over time because it can describe developmental processes at both the inter- and intra-individual levels. The LGM method can also provide a means for testing the contribution of other variables in order to explain variability in growth trajectories. This paper didactically illustrates the use of LGM as an analytical tool in program evaluation. Specifically, a hypothetical evaluation of a high school drug prevention program was used to demonstrate: (a) how LGM can be used to assess the longitudinal impact of a prevention program by comparing treatment and control populations with respect to individual differences in initial status and in rate of change; and (b) how predictors of initial status (post-intervention) and growth selected on the basis of a particular program theory can be incorporated in the model to explain program impact. Some advantages and limitations of using LGM in program evaluation are highlighted. q 2000 Elsevier Science Ltd. All rights reserved. Keywords: Growth curves; Program evaluation; Longitudinal modeling; Multiple-group analysis; Program theory 1. Introduction In order to overcome some of the limitations of traditional statistical approaches to the assessment of change over time (e.g., repeated measures ANOVA), a branch of analytical methods has recently emerged from the family of structural equation modeling (SEM). Such methods are referred to as “Latent Growth Models” (LGM) and approach the analysis of growth from a unique perspective (Lawrence & Hancock, 1998). In general, LGM methods are used to assess devel- opmental processes across a variety of behavioral domains by modeling both inter- and intra-individual variability in terms of initial levels and in developmental trajectories from those levels (Duncan, Duncan, & Stoolmiller, 1994). In addition, LGM methods also provide a means for testing the contribution of other variables or constructs in order to explain variability in initial levels and in patterns of growth (Lawrence & Hancock, 1998; Rogosa & Willett, 1985). Because the LGM approach permits the researcher to model both inter- and intra-individual variability in growth, more information is available and used in the measured variables than in traditional methods. That is, traditional methods like repeated measures ANOVA focus only on group mean values for each measurement point and thus variability in rate of change at the individual level is not modeled (i.e., intra-individual variability is captured in the error term). On the other hand, LGM techniques capitalize on individual variability, and simultaneously focuses on correlations over time, changes in variance, and shifts in mean values (McArdle, 1988). Recent applications of the LGM method have been to analyze growth in drug and alcohol use (see, e.g., Andrew & Duncan, 1998; Curran, Stice, & Chassin, 1997; Duncan, Duncan, & Hops, 1998), and to analyze children’s academic and social development (e.g., Schmitt, Sacco, Ramey, Ramey, & Chan, 1999). The purpose of this paper is to provide program evalua- tors with a brief introduction to the LGM method and to some of the recent literature pertaining to its use. The discussion is primarily didactic in terms of how to analyze longitudinal data using LGM to determine program impact (little emphasis is placed on discussing the technical or statistical properties of LGM). To do so, a hypothetical high school drug prevention program is utilized to illustrate how LGM could be used to assess the longitudinal impact of a program over time. Specifically, a two-group analysis strategy proposed by Muthe ´n and Curran (1997) is used to compare a treatment to a control group in terms of inter- and intra-individual variability in initial levels and in rate of Evaluation and Program Planning 23 (2000) 419–428 0149-7189/00/$ - see front matter q 2000 Elsevier Science Ltd. All rights reserved. PII: S0149-7189(00)00032-X www.elsevier.com/locate/evalprogplan * Corresponding author. 12711 Arbor Isle Drive, Temple Terrace, FL 33637, USA. E-mail address: [email protected] (B. Hess).

Assessing program impact using latent growth modeling: a primer for the evaluator

Embed Size (px)

Citation preview

Assessing program impact using latent growth modeling: a primer forthe evaluator

Brian Hess*

Test Scoring and Reporting Services, The University of Georgia, 211 Fairfax Hall, Athens, GA 30602, USA

Received 1 January 1999; received in revised form 1 December 1999; accepted 1 April 2000

Abstract

Latent growth modeling (LGM) has emerged as a ¯exible analytic technique for modeling change over time because it can describe

developmental processes at both the inter- and intra-individual levels. The LGM method can also provide a means for testing the contribution

of other variables in order to explain variability in growth trajectories. This paper didactically illustrates the use of LGM as an analytical tool

in program evaluation. Speci®cally, a hypothetical evaluation of a high school drug prevention program was used to demonstrate: (a) how

LGM can be used to assess the longitudinal impact of a prevention program by comparing treatment and control populations with respect to

individual differences in initial status and in rate of change; and (b) how predictors of initial status (post-intervention) and growth selected on

the basis of a particular program theory can be incorporated in the model to explain program impact. Some advantages and limitations of

using LGM in program evaluation are highlighted. q 2000 Elsevier Science Ltd. All rights reserved.

Keywords: Growth curves; Program evaluation; Longitudinal modeling; Multiple-group analysis; Program theory

1. Introduction

In order to overcome some of the limitations of traditional

statistical approaches to the assessment of change over time

(e.g., repeated measures ANOVA), a branch of analytical

methods has recently emerged from the family of structural

equation modeling (SEM). Such methods are referred to as

ªLatent Growth Modelsº (LGM) and approach the analysis

of growth from a unique perspective (Lawrence & Hancock,

1998). In general, LGM methods are used to assess devel-

opmental processes across a variety of behavioral domains

by modeling both inter- and intra-individual variability in

terms of initial levels and in developmental trajectories from

those levels (Duncan, Duncan, & Stoolmiller, 1994). In

addition, LGM methods also provide a means for testing

the contribution of other variables or constructs in order to

explain variability in initial levels and in patterns of growth

(Lawrence & Hancock, 1998; Rogosa & Willett, 1985).

Because the LGM approach permits the researcher to

model both inter- and intra-individual variability in growth,

more information is available and used in the measured

variables than in traditional methods. That is, traditional

methods like repeated measures ANOVA focus only on

group mean values for each measurement point and thus

variability in rate of change at the individual level is not

modeled (i.e., intra-individual variability is captured in the

error term). On the other hand, LGM techniques capitalize

on individual variability, and simultaneously focuses on

correlations over time, changes in variance, and shifts in

mean values (McArdle, 1988). Recent applications of the

LGM method have been to analyze growth in drug and

alcohol use (see, e.g., Andrew & Duncan, 1998; Curran,

Stice, & Chassin, 1997; Duncan, Duncan, & Hops, 1998),

and to analyze children's academic and social development

(e.g., Schmitt, Sacco, Ramey, Ramey, & Chan, 1999).

The purpose of this paper is to provide program evalua-

tors with a brief introduction to the LGM method and to

some of the recent literature pertaining to its use. The

discussion is primarily didactic in terms of how to analyze

longitudinal data using LGM to determine program impact

(little emphasis is placed on discussing the technical or

statistical properties of LGM). To do so, a hypothetical

high school drug prevention program is utilized to illustrate

how LGM could be used to assess the longitudinal impact of

a program over time. Speci®cally, a two-group analysis

strategy proposed by MutheÂn and Curran (1997) is used to

compare a treatment to a control group in terms of inter- and

intra-individual variability in initial levels and in rate of

Evaluation and Program Planning 23 (2000) 419±428

0149-7189/00/$ - see front matter q 2000 Elsevier Science Ltd. All rights reserved.

PII: S0149-7189(00)00032-X

www.elsevier.com/locate/evalprogplan

* Corresponding author. 12711 Arbor Isle Drive, Temple Terrace, FL

33637, USA.

E-mail address: [email protected] (B. Hess).

change over time. Next, a demonstration of how predictor

variables selected on the basis of program theory can be

incorporated in the LGM in order to explain variation in

initial status and in patterns of growth is provided. Last,

some advantages and limitations of using the LGM method

in program evaluation are highlighted.

2. Latent growth models

In general, growth curves are viewed as a set of statistical

models for the study of individual differences in change.

This modeling strategy assumes that each individual has

his or her own (latent) trajectory of growth, which is

measured with some inaccuracy (Lawrence & Hancock,

1998). ªGrowthº can re¯ect either a positive or a negative

rate of change over time. Because each individual has his or

her own growth trajectory, time is nested within the indivi-

dual. This is referred to as the within individual level or the

Level 1 model (Willett & Sayer, 1994). The basic equation

for this within individual level is

yij � b0j 1 b1jtj 1 eij

where yij is the outcome of interest at time i for person j; b 0j

represents the initial status at time tj � 0; b 0j represents the

rate of change over time for person j; and eij is the distur-

bance (error) term.

Systematic variability in parameters of growth across

individuals can also be modeled. This is referred to as the

between individual level or the Level 2 model. In this case,

two models are speci®ed, one for the initial status para-

meter, b 0j, and one for the slope parameter, b 1j (Willett &

Sayer, 1994). The basic equation for the Level 2 model can

be expressed as:

b0j � a00 1 g01Gj 1 z0j

and

b1j � a10 1 g11Gj 1 z1j

where a 00 and a 10 are intercept parameters representing

initial status and rate of change when Gj is zero; g 01 and

g 11 are slopes relating Gj to initial status and rate of change,

respectively; z 0j and z 1j are the disturbance terms.

Finally, this growth model can be further extended to

allow individuals to be nested in groups such as in treatment

and control groups. In this case, the two groups become the

Level 3 model and can be used to study intervention

(program) effects on initial status and rate of change over

time. Moreover, as one can see in Fig. 1, individuals may

grow at different rates, thus LGMs capitalize on individual

variability. For a more detailed discussion of how covar-

iance structure analysis (general LISREL model) is used

to model individual change over time in accordance with

Levels 1 and 2 delineated above, see Willett and Sayer

(1994, 1996).

One major underlying assumption when using the LGM

method is that individual growth within a particular group

must follow the same functional form. This does not mean

that growth must necessarily follow a linear tend; growth

may actually be of a different functional nature (e.g., quad-

ratic or logarithmic). For example, in many evaluation

studies, the control group exhibits a growth form that is

linear, but growth due the speci®c program treatment is

quadratic in form. In addition, when using a SEM approach,

the LGM method assumes, ideally, that the number and

spacing of assessments are the same for all individuals

over time. If the number of time points or the spacing

between time points vary across individuals, other growth

curve techniques are available (e.g., under the framework of

hierarchical linear modeling described by Bryk & Rauden-

bush, 1992). Duncan et al. (1994) pointed out that LGM

methods can be applied to circumstances in which indivi-

duals are not measured at the same time intervals, however,

speci®c constraints are needed for parameter estimation.

Nevertheless, in LGM, each individual's growth over a

span of time is represented by a line and may be summarized

by a unique intercept and slope. An individual's intercept

describes the amount of the outcome variable possessed at

the initial measurement point. The slope captures informa-

tion about how much the individual changes for each time

interval after the initial measurement point (Lawrence &

Hancock, 1998). As a result of individual differences in

these intercepts and slopes, changes occur in the relation-

ships among individuals' data across the different time inter-

vals. Speci®cally, an individual's position shifts across time,

relative to other individuals in the population.

In LGM, both intercept and slope are treated as latent

variables (or factors) and are not measured directly. Fig. 2

represents an example structural model for linear growth

trajectory. Note that in this linear model all paths from the

intercept factor to each time point are ®xed to 1 and all paths

from the slope factor to each time point are ®xed to 0, 1,

B. Hess / Evaluation and Program Planning 23 (2000) 419±428420

Fig. 1. Individual growth curves (nested in groups) across four time points.

2, and 3. Fixing the paths from the slope factor to 0, 1, 2, and

3 allows the investigator to test whether this model repre-

senting linear growth ®ts the data satisfactory. Conversely,

if one was interested in determining quadratic growth, the

paths from the slope factor would be set 0, 1, 4, and 9,

respectively. Once these paths are ®xed accordingly, then

mean and variance parameters associated with each factor

as well as residuals can be estimated.

A unique and powerful advantage of using the LGM

method is its ability to incorporate predictors of intercept

and slope factors (Lawrence & Hancock, 1998). As will be

shown, this may prove useful when a particular theory is

used to explain how a program is designed to have an

impact. Moreover, the latent variable analysis approach to

the assessment of change is presented in more technical

detail in McArdle (1988), McArdle and Epstein (1987),

Meredith (1991), Meredith and Tisak (1990) and MutheÂn

(1991). Some applications that elucidate the development of

the LGM method can be found in Duncan and Duncan

(1995) and Duncan et al. (1994).

2.1. Estimating a LGM

To estimate a LGM as shown in Fig. 2, the correlation

matrix and the means and standard deviations of the

measured variables at each time point are used. Using the

SEM approach, the LGM must be speci®ed and assessed for

®t by using a x 2 test and other ®t indices (e.g., root mean

square error of approximation). Procedurally, a constant C is

introduced between the intercept and slope factors allowing

one to estimate the means for both factors. It should be

pointed out that the constant C is not a variable per se; it

assumes a constant value of 1 for all individuals and thus has

no variance and cannot covary or have an effect on any

measured variable or factor. With the constant C in the

model, parameter a1 or the average initial level, and para-

meter a2 or the average rate of change, can be estimated for

the sample. The variance of the intercept and slope factors

are also estimated and are denoted as v1 and v2, respectively

in Fig. 2. In addition, the parameter represented by the

curved double-headed arrow path between the slope and

intercept factors denoted as v3 can also be estimated and

represents the belief that how much one's behavior changes

may be related to where one starts out. In other words, a

positive value for the estimate v3 would indicate that, on

average, those who started out higher at the initial level

grow at a positive rate across time. Finally, errors in

measurement (residuals) are estimated for each time point

and are denoted as E1±E4 in Fig. 2.1

3. Implications for program evaluation

When using the LGM method to assess change over time,

any general form of longitudinal data can be studied. This

includes testing, for example: (1) if mediational variables

in¯uence the developmental process; (2) if distal outcome

variables are in¯uenced by the developmental process; and

(3) if multiple developmental processes exist for more than

one outcome variable (MutheÂn & Curran, 1997). Addition-

ally, the LGM framework can be applied to multiple popu-

lation (e.g., treatment-control) studies. Therefore, the LGM

method can be used in program evaluation.

Typical evaluation questions that LGM methodology

could address include, ªGiven individual differences in

initial status and in growth, what are the longitudinal effects

of a program?º Or, more speci®cally, ªIs there a slower rate

of marijuana use for those exposed to a drug prevention

program as compared to students who did not receive the

program?º Another question LGM methods might address

for the educational evaluator is, ªDo rates at which children

learn differ by attributes of the program in which they were

exposed?º Questions like these can be answered when

continuous data are available longitudinally on many

individuals (Willett & Sayer, 1994).

LGM methods have an advantage over the traditional

pretest±posttest designs in utilizing more than two waves

of data to maximize information on individual change.

When development follows an interesting trajectory over

time, ªsnapshotsº of status taken only before and after are

unlikely to reveal the intricacies of individual change

(Willett & Sayer, 1994). Hence, LGM methods can capture

long term impact of a program while capitalizing on

intra-individual differences in rate of change.

It stands to reason that assessing program impact via

B. Hess / Evaluation and Program Planning 23 (2000) 419±428 421

1 The analysis of LGMs can be easily carried out using LISREL due to its

¯exibility and convenience for estimating speci®c parameters in the model

(JoÈreskog and SoÈrbom, 1993). In LISREL terminology, the paths from each

intercept and slope factors to each observed timepoint are ®xed in the l y

(lambda) matrix according to Fig. 1, assuming a linear trajectory. Intercept

and slope means are free to be estimated in the a (alpha) matrix, and the

intercept and slope variances are free to be estimated in the c (psi) matrix.

Errors or residuals are free to be estimated in the u1 (theta-epsilon) matrix.

Fig. 2. A LGM used to determine program impact.

covariance modeling may become more recognized given

the ¯exibility that LGM techniques afford. Recently, Irvine,

Biglan, Smolkowski, Metzler, and Ary (1999) applied LGM

to determine the effectiveness of a parenting skills program

for parents of middle school students using a randomized

control trial design. However, one challenging area in

program evaluation in which the ef®cacy of LGM techni-

ques could be assessed is within quasi-experimental designs

where time invariant covariates might be recommended.

These types of non-equivalent groups designs are often

encountered when evaluating prevention programs in

mental health or education programs.

4. Assessing program impact via LGM: a hypotheticalexample

To provide an example, say an evaluator is interested in

assessing the impact of an innovative high school drug

prevention program across the 4 years of high school

(ninth through twelfth grades). One sample of ninth grade

students from an urban school is identi®ed to receive the

program at the beginning of the year while a sample from

another school very similar in characteristics is chosen as a

control group.2 The observed outcome variable will be

frequency and amount of drug use (based on a composite

of alcohol and other drugs) and will be measured at the end

of the ninth grade (initial time point), followed by measures

taken at the end of tenth grade (Time 2), eleventh grade

(Time 3), and twelfth grade (Time 4) in order to gauge

rate of change across the high school years. For the sake

of simplicity, growth is hypothesized to be linear for both

populations; the rate of change for the treatment population

is hypothesized, in this example, to be slower as a result of

the program. The basic model in Fig. 2 is used to test for this

program effect. Notice that the paths are ®xed to test for

linear growth.

To begin to model change in a way that allows the evalua-

tor to determine program impact, a step-by-step multiple-

group analysis strategy put forth by MutheÂn and Curran

(1997) is recommended. According to MutheÂn and Curran,

the ®rst step is to ®t the model to the control group. The

control population represents the normative set of individual

growth trajectories that would have been observed in the

treatment population had they not been chosen for treat-

ment. From this analysis, it is important to rule out that

the control population exhibits any of the post-intervention

changes in growth trajectories that are hypothesized to be

due to treatment (MutheÂn & Curran, 1997). For the control

group, the model should statistically ®t the data (e.g., based

on a x 2 test). Following this, the treatment-group model is

then analyzed separately and the basic growth trajectory

form is investigated.

Keep in mind that for both the treatment and control

models, four parameters (along with the residuals) will be

assessed in order to fully understand the treatment effect. To

reiterate, the ®rst two parameters re¯ect the mean of the

intercept factor (mean initial level) and mean of the slope

factor (average rate of change) denoted as a1 and a2, respec-

tively, in Fig. 2. The second two parameters re¯ect the

variance of the intercept and slope factors denoted as v1

and v2, respectively, in Fig. 2. In addition, v3 can be assessed

if it is hypothesized that a relationship exists between initial

status and rate of change.

Following separate tests of the treatment and control-

group models, program impact is assessed by conducting

a multiple-group analysis. This requires ®rst constraining

all four parameters to be invariant (equal) in the control

group and then obtain the model ®t or x 2 for the control

group model. Next, each of the four parameters are freed in

the treatment model one at a time. When testing each para-

meter individually, a signi®cant change in x 2 from the treat-

ment model compared to that of the control model of the

previous step would reveal if that parameter value for the

treatment group is signi®cantly different from that of the

control group. That is, lack of parameter equality suggests

a signi®cant program effect. Thus, the effect of the program

is assessed by comparing the set of initial levels and trajec-

tories in the treatment population with those in the control

population.

To explain how parameters are assessed for inequality

and what is interpreted from the multiple-group analysis

described above, consider ®rst testing a1 or the mean drug

use at the end of ninth grade (i.e., initial status). When this

parameter is freed in the treatment model and this model's

x 2 is found to be statistically different from the control

model x 2 where all parameters were constrained to be invar-

iant, this will indicate that, for the treatment group, the mean

drug use at the end of ninth grade is signi®cantly different

than that of the control group. This of course assumes that

the two groups are equated at the very beginning. In nonran-

domized designs, equality may not be assured and so it may

behoove the evaluator to obtain a pre-intervention measure

of the outcome variable and use this pre-measure as a

covariate for the intercept factor. In this example, if the

estimated mean value for the intercept factor is lower than

that estimated for control group, this will show that the

prevention program had an impact at the initial level.

Next, a2 or average rate of change is assessed. As with a1,

if a2 is signi®cantly smaller for the treatment group based on

the x 2 difference test, this will indicate a slower rate of drug

use for the treatment group compared to the control group.

Next, parameters v1 and v2 represent the variance of the

intercept and slope factors and can be assessed individually

as a1 and a2. For example, a smaller treatment-group variance

B. Hess / Evaluation and Program Planning 23 (2000) 419±428422

2 For sake of demonstration, it will be assumed that the two groups are

equated at the onset on all measures. However, it is recognized that because

random assignment is often unfeasible in this type of evaluation design,

other strategies should be used to equate the two groups. In such quasi-

experimental designs, time variant and/or time invariant covariates such as

pretest measures may be included in the model to adequately equate and

compare the two groups.

for the growth factor would represent a program effect that

makes growth more homogeneous among individuals (i.e.,

students exposed to the drug prevention program will be

more alike in their rate in change across time). Note that v3

in Fig. 2 may also be tested if one has reason to believe, for

those exposed to the program, the lower one starts out at the

initial level, the slower the rate of change across time. Finally,

errors or residuals at each time point can be assessed and

compared.

4.1. An alternative approach

As an alternative method to determine treatment effect in

a two-group design, particularly when the intervention takes

place after the initial time point, MutheÂn and Curran (1997)

suggested the same multiple-population analysis discussed

above. However the actual effect of the treatment is deter-

mined by including an additional slope or growth factor for

the treatment population. Fig. 3 shows the model for the

treatment group with the added growth factor (set to be

quadratic in form).

In this model, the ®rst two factors are interpreted the same

as the control population, however, the third factor repre-

sents the growth form (quadratic) that is speci®c only to the

treatment population. Notice that only the intercept factor

in¯uences the ®rst time point (i.e., path to Time 1 is set to 1).

This re¯ects the individual's initial status level before the

intervention. At following time points, the second factor

represents the linear growth rate that the individual would

progress had the individual not received the program. The

inclusion of the third or added growth factor captures the

increment or decline in growth beyond that of the control

population. Moreover, this analysis strategy is particularly

useful when growth due to a program (e.g., a quadratic

trend) might be different than the growth presented by a

control population. Therefore, the evaluator should consider

this analysis approach and include the added growth factor

of the form hypothesized to be due to the program.

In short, to test for program impact by way of including

an added growth factor, begin analyzing the control and

treatment-group models separately to determine the growth

form of each population. Following this, proceed with the

multiple-group analysis strategy. In doing so, the para-

meters for the ®rst and second growth factors for the

program group are constrained to be equal to those of the

control group.3 The added growth factor must be included in

order to capture the positive or negative rate of change

beyond that of the control group. For instance, if the control

group shows linear growth, one may add a quadratic growth

factor to the program group (assuming that the only differ-

ence is in the quadratic growth trend). Program impact is

re¯ected in the mean and variance of the added growth

factor compared to the control group's growth rate. Again,

the evaluator can test each parameter by way of the x 2

difference test.

However, MutheÂn and Curran (1997) pointed out that

before testing the equality of each parameter, one might

test for an initial status by treatment interaction Ð that is,

allow the initial status factor to in¯uence the treatment

growth factors in the treatment group. This is equivalent

to testing for homogeneity of slopes in the traditional

ANCOVA context. In sum, this analysis strategy whereby

testing the contribution of an added growth factor due to the

program is indeed a more useful approach, especially when

the intervention follows the initial time point and when

treatment growth follows a different form than the control

population.

5. Incorporating program theory in the evaluationdesign

In many instances, evaluators incorporate program theory

in their evaluation design to assist them in understanding

how the program produced (or failed to produce) its

intended and perhaps unintended effects (Chen, 1990, p.

171). ªCausalº mechanisms of a program should be exam-

ined within the framework of the program's theory because

traditional input±output assessments may lead an evaluator

to provide an impoverished version of causal inference

(Cordray, 1986). Allowing theory to drive the evaluation

also broadens the evidential basis by actively considering

plausible rival explanations, by examining implementation

procedures, and by investigation mediation and contextual

factors.

Speci®c bene®ts of integrating program theory in the

evaluation design have been recognized. Bickman (1987)

provided a list of bene®ts that can result in articulation of

program theory and its integration with program evaluation.

B. Hess / Evaluation and Program Planning 23 (2000) 419±428 423

Fig. 3. Determining program impact by including an added growth factor

for the treatment group.

3 Restricting the parameters of the ®rst growth factor to be equal across

treatment and control groups is suggested for randomized studies where

populations are similar at the initial time point. In nonrandomized designs,

MutheÂn and Curran (1997) suggested relaxing the equality constraint for the

initial status factor, and similarly, the model can be improved if time

invariant covariates are used.

Advantages include: (1) specifying the underlying theory of

a program within the evaluation allows that theory to be

tested in a way that reveals whether program failure results

from implementation failure or theory failure; (2) program

theory clari®es the connections between a program's opera-

tions and its effects, and thus helps the evaluator to ®nd

either positive or negative effects that otherwise might not

be anticipated; (3) program theory can also be used to

specify intermediate effects of a program that might become

evident and measurable before ®nal outcomes can be mani-

fested, which can provide opportunities from early program

assessment in time for corrective action by program imple-

menters; and (4) program theory may be the best method of

informing and educating stakeholders so that they can

understand the limits of the program. In conclusion, it

seems to be to the evaluator's advantage to integrate

program theory in their evaluation design (at least in part

if possible).

5.1. Testing the contribution of predictors based on program

theory

In addition to assessing program impact by way of the

two-group analysis approach as illustrated using the models

in Figs. 2 and 3, the same analysis approach can be used

when incorporating predictors of the intercept and slope

factors based on a particular program theory. Consider the

same hypothetical high school drug prevention program

discussed earlier. However, this time, assume the evaluator

would like to determine program impact as driven by a

program theory. The (simpli®ed) theory proposed underly-

ing the hypothetical program has, say, two components: a

psychological and a social component. The psychological

component centers on knowledge of the health conse-

quences of drug use. The social component centers on social

skills like assertiveness, competency, and resistance

towards in¯uential drug-using peers. The hypothetical

program is designed to reduce the rate of drug use over

time by providing students with adequate knowledge

about the consequences of drug use, and at the same

time, provide students with good social skills so they can

be interpersonally strong during their high school years.

In terms of design and analysis, to explain how this inno-

vative prevention program is designed to have an impact

over time, the two theoretical components must be opera-

tionally de®ned, adequately measured, and included as

predictor variables of both intercept and slope factors. For

the sake of illustration, knowledge about the consequences

of drug use is measured using a valid instrument and will be

assumed to be measured without error. Similarly, social skill

is measured by degree of assertiveness and competency,

also measured without error. Parenthetically, it should be

noted that, in practice, the evaluator might incorporate

several measures of these two constructs as a way to account

for the measurement error. Both of these variables will be

measured at the completion of the prevention program

(before Time 1). Moreover, it is hypothesized that, for

those exposed to the prevention program, there will be an

increase in the amount of knowledge about drug use, and

similarly, an increase in level of social skill due to the

program. In turn, this will lead to lower initial levels and

slower rate of drug use over the high school years (as

compared to a control group). Fig. 4 shows that the rate of

change is hypothesized to be linear in form (i.e., paths from

the slope factor to each observed time point are set 0, 1, 2,

and 3).

To provide an illustration of how this is incorporated

within a LGM design, consider Fig. 4, which shows the

complete model (used for both the control and treatment

groups). In this case, instead of just four parameters to be

assessed as in Fig. 2, there are now 12 parameters of interest

(as well as the residuals) labeled accordingly in Fig. 4. As

with the basic two-group intervention design, the same step-

by-step multiple-group analysis strategy is used. That is,

®rst ®t the model to the control and then the same model

for the treatment group. Next, conduct the multiple-group

analysis to determine lack of equality for each parameter

value.

To explain how parameters are assessed for inequality

and what is interpreted from the multiple-group analysis

using the model in Fig. 4, begin with parameter a1. Para-

meter a1 refers to the mean drug use at the end of ninth grade

(i.e., initial status). When a1 is free to be estimated in the

treatment model and this model is compared to the control

model where all parameters were constrained to be invar-

iant, a signi®cant change in x 2 value will indicate that, for

the treatment group, the mean drug use at the end of ninth

B. Hess / Evaluation and Program Planning 23 (2000) 419±428424

Fig. 4. A LGM incorporating predictors of intercept and slope factors based

on program theory.

grade is signi®cantly different than that of the control group

(again assuming that the two groups are equated at the very

beginning).

Parameter a2 in Fig. 4 re¯ects the average rate of change

over the high school years. As with a1, if this parameter is

signi®cantly smaller based on x 2 difference test, this will

indicate a slower rate of drug use for the treatment group

compared to the control group. Parameter a3 and a4 in Fig. 4

refer to the mean level of knowledge and mean level of

social skill, respectively. A signi®cant difference in means

(i.e., in this case, the means for the treatment group should

be larger than those of the control group) indicates that the

prevention program signi®cantly increased amount of

knowledge and social skill. Of course again this assumes

the groups were equated at the beginning and selection bias

was not a threat; if not, a pretest covariate for each predictor

might be used.

To determine if the two predictors account for a signi®-

cant amount of variability in initial levels and rate of change

as suggested by program theory, the four parameters or path

coef®cients labeled b1, b2, b3, and b4 in Fig. 4 are individu-

ally tested (in LISREL terminology these path coef®cients

are b parameters). Based on the present hypothetical

evaluation, it is hypothesized that for those in the prevention

program, the program will lead to higher amounts of knowl-

edge and greater levels of social skill. This in turn will lead

to lower initial levels and slower rate of change over time

for those exposed to the program. To test this hypothesis,

each of the four path estimates for the treatment group is

compared to those in the control group one at a time in the

multi-group analysis. If change in x 2 is signi®cant for

each test and the paths are in the right direction (i.e., in

this example, all four paths should be negative and signif-

icantly larger than those for control group), then one can

conclude that the predictors explain a signi®cant amount

of the variability in initial levels and in rate of change.

Parameters v1 and v2 in Fig. 4 represent the variance of the

intercept and slope factors and can be assessed to deter-

mine if the variability in initial status and in growth for

the treatment group is greater than that of the control

group. Similarly, parameters v3 and v4 represent the

variance of the two predictors and can also be assessed

to determine if the variability in knowledge and social

skill for the treatment group is greater than that of the

control group. Last, errors or residuals can be assessed

and compared.

In sum, program theory can be incorporated within a

LGM approach to assess how a program had (or failed to

have) an immediate as well as a longitudinal impact over

time (i.e., a preventive effect). In the simplest terms, this can

be done by testing the contribution of predictor variables

(based on program theory) in order to explain inter- and

intra-individual variability in initial levels and in rate of

change over time. This requires a well-formulated program

theory and a properly speci®ed model in accordance with

this theory.

6. Advantages of using LGM in program evaluation

As seen in the previous examples, the major advantage of

utilizing the SEM approach to modeling change is its ¯ex-

ibility to capitalize on variability in individual differences at

both the initial level and in growth over time. Subsequently,

LGM is easy to implement using existing software programs

like LISREL (Kaplan & George, 1998). Although only the

x 2 difference test was discussed, other various goodness-of-

®t indices provided by LISREL allow for a wide variety of

substantively interesting hypotheses to be tested, such as

testing the adequacy and stability of the hypothesized

growth form (e.g., as indexed by the root mean square

error of approximation). The SEM approach also allows

for the study of more than one outcome (see Willett &

Sayer, 1996). Furthermore, Kaplan and George (1998)

pointed out that utilizing a SEM approach allows the

researcher to incorporate multiple indicators of the outcome

of interest at each time point, thus accounting for the

problems of measurement error. Last, when using SEM,

one can easily handle missing data (see MutheÂn, 1993) as

well as incorporate categorical variables (see MutheÂn,

1996). As with repeated measures ANOVA, when including

categorical independent variables, interaction effects (e.g.,

between gender and treatment) can also be tested using

LGM techniques.

Compared to ANCOVA, LGM techniques are more ef®-

cient, requiring fewer cases. The trade-off is that for

increased power, more measurement periods are needed,

but on fewer cases. This bene®t is reduced by an increase

in attrition rate. In addition, interaction effects (say between

gender and treatment) can be detected without large sample

sizes if the interaction effect is sizeable (MutheÂn & Curran,

1997). MutheÂn and Curran put forth a framework for power

estimation to detect treatment effects. Using this framework,

the researcher can compute his or her own power curves for

a speci®c model given the hypothesized size of the para-

meter values.

Whether or not program theory is considered in the

evaluation design, LGM methodology allows the evaluator

to test the contribution of predictors of initial status and

growth. LGM techniques also provide the evaluator with

the ability to incorporate both ®xed and time varying covari-

ates. Schmitt et al. (1999), for example, used parental

employment as a predictor of children's academic develop-

ment, using school climate as a covariate. In the case of the

hypothetical drug prevention evaluation discussed in this

paper, the evaluator may want to adjust or control for

other psychological variables or demographics like SES

(depending on the program) that may be hypothesized to

explain drug use. As noted earlier, incorporating time invar-

iant covariates (e.g., pretest measures) are essential in

nonrandomized designs in order to equate groups. Addition-

ally, using LGM within the evaluation of a drug prevention

program over time can also allow growth on several

constructs to be tested simultaneously (e.g., concomitant

B. Hess / Evaluation and Program Planning 23 (2000) 419±428 425

drug and alcohol use). In sum, a further understanding of the

strengths of using LGM in intervention designs, particularly

the data demands that are associated with this type of

modeling, will evolve as more applications emerge. One

such application by Reynolds and Temple (1995) discussed

strengths associated with reducing selection bias in quasi-

experimental LGM designs compared to traditional

econometric modeling.

6.1. Distinguishing implementation failure from theory

failure

Other postulated bene®ts afforded when integrating

program theory in evaluation echo some of the bene®ts

proposed by Bickman (1987) discussed earlier. For exam-

ple, one of the most bene®cial aspects of integrating

program theory is that, by including predictor (and/or

mediator) variables in the model, the evaluator can pinpoint

which factors in the causal chain lead to program success or

failure. Speci®cally, keeping with the present hypothetical

high school drug prevention example, specifying the under-

lying theory of the development of drug use helps the

evaluator to determine if program failure was due to

implementation failure or theory failure.

Chen (1990, p. 201) suggested that implementation fail-

ure occurs when the treatment or program fails to affect the

ªcausalº or predictor variable(s), but the basic conceptual

theory underlying the program is sound. From a theory-

driven perspective, Chen refers to the link between the

program and the casual variable as ªaction theoryº and

subsequently he refers to this type of breakdown as ªaction

theory failure.º For example, if the drug prevention program

failed to ªactivate,º or more speci®cally, failed to increase

level of knowledge and social skill, then the problem was in

the program as implemented not in the basic conceptualiza-

tion of the program. On the other hand, theory failure occurs

when program treatment has successfully activated the

causal or predictor variables, but the underlying conceptual

theory is invalid. Consequently, the outcome variable is

unaffected. Chen refers to this link between the causal vari-

able and outcome variable as ªconceptual theoryº and thus

calls this breakdown ªconceptual theory failure.º In this

case, the prevention program may have increased level of

drug knowledge and social skill, but frequency of drug use

at the initial time point and/or over time was unaffected.

Fig. 5 illustrates how the two links tie program variables

to their intended effects based on Chen's paradigm. For both

the action and conceptual theory links, it is critical that the

model is properly speci®ed and includes all relevant vari-

ables based on program theory, otherwise the model will not

give a full impression of how the program succeeded or

failed. For example, if the action theory suggests that

improved social skill will lead to less drug use not directly,

but indirectly through a mediator variable (e.g., resistance to

peer pressure), then the mediator variable must be measured

and included in the model. Otherwise, if program failure due

to poor implementation occurred, it will not be known

whether it was due to the program's failure to increase

social skill or if increased social skill failed to increase

resistance to peer pressure. From Fig. 5, it is easy to see

that both the action and conceptual components of program

theory helps clarify the connection between the program's

operation and its effect on drug use across time. Moreover,

identifying both the action and conceptual program theory

links and properly specifying the LGM provide a means for

informing and educating stakeholders regarding how to

diagnose sources of program failure or success. This kind

of diagnostic function can also provide useful informa-

tion for future program improvements (Chen, 1990,

p. 191).

7. Some limitations of using LGM in program evaluation

Like all other evaluation designs, latent growth modeling

has limitations. One of the most obvious disadvantages in

using LGM methods in assessing program impact is the fact

that it requires measures of an outcome at multiple time

points. Attrition may be a problem leading to missing

data. Fortunately, MutheÂn (1993) and MutheÂn, Kaplan,

and Hollis (1987) discussed how to handle missing data in

the latent variable framework. For a discussion on how to

model incomplete data due to attrition in the LGM frame-

work, see Duncan and Duncan (1994) and Schafer (1997).

From a statistical perspective, because LGM is a latent

variable analysis method using SEM, it shares many of the

same weaknesses (Duncan et al., 1994). For instance, LGM

methods assume multinormally distributed variables and

assume that change is systematically related to the passage

of time, at least over the time interval under study. Further-

more, Duncan et al. (1994, p. 344) noted that, ªthe applica-

tion of LGM within the SEM framework depends, at least

ideally, on data that are collected when individuals are

observed at approximately the same time, and the number

and spacing of assessments are the same for all individuals.º

Limitations exist as well when integrating program

theory and including relevant predictor(s) of the intercept

B. Hess / Evaluation and Program Planning 23 (2000) 419±428426

Fig. 5. Specifying the action and conceptual theory links provides assis-

tance when determining the sources of program failure or success.

and slope factors. Weiss (1997) provided a discussion as to

the dif®culty associated with adopting a [strict] theory-

driven perspective in evaluation practice. For instance, she

noted that there are stark disagreements as to what repre-

sents an adequate, well-de®ned program theory. In social

science, theory is broadly construed by many scientists and

so theory at times offers little guidance for analysis. Hence,

it is not advocated in this paper that evaluators should

consider themselves strictly ªtheory-drivenº in order to

fully understand the operations of a program and its

outcome, even when using an LGM approach. What is

suggested is that evaluators should be more ªtheory

consciousº or try to consider a clear, properly speci®ed

theory as a guide when choosing explanatory variables

that help explain variability in their outcome variables (In

the LGM case, explaining variability in initial levels and in

growth). Moreover, even if a program theory is too concep-

tually broad for a good theory-driven evaluation, incorpor-

ating predictors of initial levels and growth will still add to

our knowledge of the mechanisms by which programs have

their effects.

Apart from the fact that using the LGM method requires

at least some working knowledge of SEM, one possible

resistance to using LGM in program evaluation, particularly

when integrating program theory, might be due to the issue

of evaluation design. Many evaluations are restricted to

using, for instance, a nonequivalent groups pretest±posttest

design Ð mainly due to cost and convenience (time) as well

as training. When integrating program theory in the evalua-

tion design, additional measures are needed to capture the

predictor and mediating variables that must be included in

the model, and so, depending on the speci®c program as

well as planner and stakeholder interest, this might be

over achieving. Many stakeholders might be satis®ed know-

ing simply whether or not the program worked, and not so

interested in why and how it worked (a typical argument put

forth by the ªblack boxº design advocates). In addition,

LGM requires at least three time intervals, and so LGM is

not recommended when only one posttest measure is taken.

This may place more emphasis on the use of LGM in

evaluation prevention type programs. Finally, when incor-

porating LGM as an analytic tool in non-randomized

designs, selection bias still pose as a threat as in the non-

equivalent group comparative change design (however,

using covariates such as pretest measures in the model to

equate the groups can be incorporated).

8. Conclusions

The focus of this paper was to serve as a primer for future

consideration of the LGM method within program evalua-

tion. Subsequently, this paper proffers one hypothetical

model for evaluators interested in determining the longitu-

dinal impact of a program (e.g., prevention programs). An

emphasis was placed on demonstrating how program theory

can be integrated in the LGM design so that evaluators can

explain variability in their outcomes and pinpoint how the

program succeeded or failed. Because the focus of this paper

was conceptually rudimentary and centered mainly on one

hypothetical example, future research should focus on some

of the methodological and statistical issues inherent when

using the LGM method in program evaluation (e.g., using

time invariance/variant covariates in nonrandomized

designs).

In closing, LGMs are obviously not the only way to

model change (repeated measures ANOVA and ANCOVA

models remain valid given statistical assumptions and the

evaluator's goal). However, LGM techniques are more

versatile because, unlike traditional analytical methods

like ANCOVA, they allow the evaluator to capitalize on

modeling intra-individual variability in initial status and in

growth. Additionally, LGM techniques are more ¯exible

when evaluators choose to integrate program theory in

their evaluation designs. Directly including speci®c predic-

tors of initial status levels and in rate of change based on

theory is easily done. The tough part of course is specifying

the proper model given a well-formulated theory. Because

program theory drives many evaluations (once argued by

Chen, 1990), evaluators may also ®nd LGM to be a ¯exible

methodology used within a theory-driven context. More-

over, even if an evaluator is not strictly theory-driven, he

or she may ®nd the luxury of easily modeling predictors of

change within a LGM design a fruitful procedure, and there-

fore may want to incorporate latent curve analyses into their

repertoire.

Acknowledgements

A version of this paper was presented at the annual meet-

ing of the American Evaluation Association, November 3±

6, 1999 in Orlando, Florida. Appreciation is extended to Dr

Martin F. Sherman for his constructive review of a previous

draft of this manuscript.

References

Andrew, J. A., & Duncan, S. C. (1998). The effect of attitude on the

development of adolescent cigarette use. Journal of Substance Abuse,

10 (1), 1±7.

Bickman, L. (1987). The functions of program theory. In L. Bickman (Ed.),

Using program theory in evaluation. San Francisco: Jossey-Bass.

Bryk, A. S., & Raudenbush, S. W. (1992). Hierarchical linear models:

Applications and data analysis methods. Newbury Park, CA: Sage.

Chen, H. T. (1990). Theory driven evaluations. Newbury Park: Sage.

Cordray, D. S. (1986). Quasi-experimental analysis: A mixture of methods

and judgment. In W. M. K. Trochim (Ed.), Advances in quasi-experi-

mental design and analysis. New Directions in Program Evaluation,

(31). San Francisco: Jossey-Bass.

Curran, P. J., Stice, E., & Chassin, L. (1997). The relation between adoles-

cent alcohol use and peer alcohol use: a longitudinal random coef®-

cients model. Journal of Consulting and Clinical Psychology, 65,

130±140.

B. Hess / Evaluation and Program Planning 23 (2000) 419±428 427

Duncan, T. E., & Duncan, S. C. (1994). Modeling incomplete longitudinal

substance abuse data using latent variable growth curve methodology.

Multivariate Behavioral Research, 29, 313±338.

Duncan, T. E., & Duncan, S. C. (1995). Modeling the processes of

development via latent variable growth curve methodology. Structural

Equation Modeling, 2, 187±213.

Duncan, T. E., Duncan, S. C., & Hops, H. (1998). Latent variable modeling

of longitudinal and multilevel alcohol use data. Journal of Studies on

Alcohol, 59, 399±408.

Duncan, T. E., Duncan, S. C., & Stoolmiller, M. (1994). Modeling

developmental processes using latent growth structural equation

methodology. Applied Psychological Measurement, 18, 343±354.

Irvine, A. B., Biglan, A., Smolkowski, K., Metzler, C. W., & Ary, D. V.

(1999). The effectiveness of a parenting skills program for parents of

middle school students in small communities. Journal of Consulting

and Clinical Psychology, 67, 811±825.

JoÈreskog, K. G., & SoÈrbom, D. (1993). LISREL 8 user's guide. Chicago, Il:

Scienti®c Software.

Kaplan, D., & George, R. (1998). Evaluating latent variable growth models

through ex post simulation. Journal of Educational and Behavioral

Statistics, 23, 216±235.

Lawrence, F. R., & Hancock, G. R. (1998). Assessing change over time

using latent growth modeling. Measurement and Evaluation in

Counseling and Development, 30, 211±223.

McArdle, J. J. (1988). Dynamic but structural equation modeling with

repeated measures data. In J. R. Nesselroade & R. B. Cattell (Eds.),

Handbook of multivariate experimental psychology (pp. 561±614).

New York: Plenum Press.

McArdle, J. J., & Epstein, D. (1987). Latent growth curves within devel-

opmental structural equation models. Child Development, 58, 110±133.

Meredith, W. (1991). Latent variable models for studying differences in

change. In L. M. Collins & J. L. Horn (Eds.), Best methods for analysis

of change: recent advances, unanswered questions, future directions

(pp. 149±163). Washington, DC: American Psychological Association.

Meredith, W., & Tisak, J. (1990). Latent curve analysis. Psychometrika, 55,

107±122.

MutheÂn, B. O. (1991). Analysis of longitudinal data using latent variable

models with varying parameters. In L. M. Collins & J. L. Horn (Eds.),

Best methods for analysis of change: Recent advances, unanswered

questions, future directions (pp. 1±17). Washington, DC: American

Psychological Association.

MutheÂn, B. O. (1993). Latent variable modeling of growth with missing

data and multilevel data. In C. M Cuadras & C. R. Rao (Eds.), Multi-

variate analysis: Future directions 2 (pp. 199±210). Amsterdam: North

Holland.

MutheÂn, B. O. (1996). Growth modeling via binary responses. In A. von

Eye & C. C. Clogg (Eds.), Categorical variables in development

research (pp. 37±54). San Diego, CA: Academic Press.

MutheÂn, B. O., & Curran, P. J. (1997). General longitudinal modeling of

individual differences in experimental designs: A latent variable

framework for analysis of power estimation. Psychological Methods,

4, 371±402.

MutheÂn, B. O., Kaplan, D., & Hollis, M. (1987). On structural equation

modeling with data that are not missing at random. Psychometrika, 42,

431±462.

Reynolds, A. J., & Temple, J. A. (1995). Quasi-experimental estimates of

the effects of a preschool intervention: Psychometric and econometric

comparisons. Evaluation Review, 19, 347±373.

Rogosa, D. R., & Willett, J. B. (1985). Understanding correlates of change

by modeling individual differences in growth. Psychometrika, 50,

203±228.

Schafer, J. (1997). Analysis of incomplete multivariate data. New York:

Chapman & Hall.

Schmitt, N., Sacco, J. M., Ramey, S., & Chan, D. (1999). Parental employ-

ment, school climate, and children's academic and social development.

Journal of Applied Psychology, 84, 737±753.

Weiss, C. H. (1997). How can theory-based evaluation make greater

headway? Evaluation Review, 21, 501±524.

Willett, J. B., & Sayer, A. G. (1994). Using covariance structure analysis to

detect correlates and predictors of individual change over time.

Psychological Bulletin, 116, 363±381.

Willett, J. B., & Sayer, A. G. (1996). Cross-domain analysis of change over

time: Combining growth modeling with covariance structure modeling.

In G. A. Marcoulides & R. E. Shumacker (Eds.), Advanced structural

equation modeling: Issues and techniques (pp. 125±157). Newbury

Park: Sage.

B. Hess / Evaluation and Program Planning 23 (2000) 419±428428