Upload
brian-hess
View
212
Download
0
Embed Size (px)
Citation preview
Assessing program impact using latent growth modeling: a primer forthe evaluator
Brian Hess*
Test Scoring and Reporting Services, The University of Georgia, 211 Fairfax Hall, Athens, GA 30602, USA
Received 1 January 1999; received in revised form 1 December 1999; accepted 1 April 2000
Abstract
Latent growth modeling (LGM) has emerged as a ¯exible analytic technique for modeling change over time because it can describe
developmental processes at both the inter- and intra-individual levels. The LGM method can also provide a means for testing the contribution
of other variables in order to explain variability in growth trajectories. This paper didactically illustrates the use of LGM as an analytical tool
in program evaluation. Speci®cally, a hypothetical evaluation of a high school drug prevention program was used to demonstrate: (a) how
LGM can be used to assess the longitudinal impact of a prevention program by comparing treatment and control populations with respect to
individual differences in initial status and in rate of change; and (b) how predictors of initial status (post-intervention) and growth selected on
the basis of a particular program theory can be incorporated in the model to explain program impact. Some advantages and limitations of
using LGM in program evaluation are highlighted. q 2000 Elsevier Science Ltd. All rights reserved.
Keywords: Growth curves; Program evaluation; Longitudinal modeling; Multiple-group analysis; Program theory
1. Introduction
In order to overcome some of the limitations of traditional
statistical approaches to the assessment of change over time
(e.g., repeated measures ANOVA), a branch of analytical
methods has recently emerged from the family of structural
equation modeling (SEM). Such methods are referred to as
ªLatent Growth Modelsº (LGM) and approach the analysis
of growth from a unique perspective (Lawrence & Hancock,
1998). In general, LGM methods are used to assess devel-
opmental processes across a variety of behavioral domains
by modeling both inter- and intra-individual variability in
terms of initial levels and in developmental trajectories from
those levels (Duncan, Duncan, & Stoolmiller, 1994). In
addition, LGM methods also provide a means for testing
the contribution of other variables or constructs in order to
explain variability in initial levels and in patterns of growth
(Lawrence & Hancock, 1998; Rogosa & Willett, 1985).
Because the LGM approach permits the researcher to
model both inter- and intra-individual variability in growth,
more information is available and used in the measured
variables than in traditional methods. That is, traditional
methods like repeated measures ANOVA focus only on
group mean values for each measurement point and thus
variability in rate of change at the individual level is not
modeled (i.e., intra-individual variability is captured in the
error term). On the other hand, LGM techniques capitalize
on individual variability, and simultaneously focuses on
correlations over time, changes in variance, and shifts in
mean values (McArdle, 1988). Recent applications of the
LGM method have been to analyze growth in drug and
alcohol use (see, e.g., Andrew & Duncan, 1998; Curran,
Stice, & Chassin, 1997; Duncan, Duncan, & Hops, 1998),
and to analyze children's academic and social development
(e.g., Schmitt, Sacco, Ramey, Ramey, & Chan, 1999).
The purpose of this paper is to provide program evalua-
tors with a brief introduction to the LGM method and to
some of the recent literature pertaining to its use. The
discussion is primarily didactic in terms of how to analyze
longitudinal data using LGM to determine program impact
(little emphasis is placed on discussing the technical or
statistical properties of LGM). To do so, a hypothetical
high school drug prevention program is utilized to illustrate
how LGM could be used to assess the longitudinal impact of
a program over time. Speci®cally, a two-group analysis
strategy proposed by MutheÂn and Curran (1997) is used to
compare a treatment to a control group in terms of inter- and
intra-individual variability in initial levels and in rate of
Evaluation and Program Planning 23 (2000) 419±428
0149-7189/00/$ - see front matter q 2000 Elsevier Science Ltd. All rights reserved.
PII: S0149-7189(00)00032-X
www.elsevier.com/locate/evalprogplan
* Corresponding author. 12711 Arbor Isle Drive, Temple Terrace, FL
33637, USA.
E-mail address: [email protected] (B. Hess).
change over time. Next, a demonstration of how predictor
variables selected on the basis of program theory can be
incorporated in the LGM in order to explain variation in
initial status and in patterns of growth is provided. Last,
some advantages and limitations of using the LGM method
in program evaluation are highlighted.
2. Latent growth models
In general, growth curves are viewed as a set of statistical
models for the study of individual differences in change.
This modeling strategy assumes that each individual has
his or her own (latent) trajectory of growth, which is
measured with some inaccuracy (Lawrence & Hancock,
1998). ªGrowthº can re¯ect either a positive or a negative
rate of change over time. Because each individual has his or
her own growth trajectory, time is nested within the indivi-
dual. This is referred to as the within individual level or the
Level 1 model (Willett & Sayer, 1994). The basic equation
for this within individual level is
yij � b0j 1 b1jtj 1 eij
where yij is the outcome of interest at time i for person j; b 0j
represents the initial status at time tj � 0; b 0j represents the
rate of change over time for person j; and eij is the distur-
bance (error) term.
Systematic variability in parameters of growth across
individuals can also be modeled. This is referred to as the
between individual level or the Level 2 model. In this case,
two models are speci®ed, one for the initial status para-
meter, b 0j, and one for the slope parameter, b 1j (Willett &
Sayer, 1994). The basic equation for the Level 2 model can
be expressed as:
b0j � a00 1 g01Gj 1 z0j
and
b1j � a10 1 g11Gj 1 z1j
where a 00 and a 10 are intercept parameters representing
initial status and rate of change when Gj is zero; g 01 and
g 11 are slopes relating Gj to initial status and rate of change,
respectively; z 0j and z 1j are the disturbance terms.
Finally, this growth model can be further extended to
allow individuals to be nested in groups such as in treatment
and control groups. In this case, the two groups become the
Level 3 model and can be used to study intervention
(program) effects on initial status and rate of change over
time. Moreover, as one can see in Fig. 1, individuals may
grow at different rates, thus LGMs capitalize on individual
variability. For a more detailed discussion of how covar-
iance structure analysis (general LISREL model) is used
to model individual change over time in accordance with
Levels 1 and 2 delineated above, see Willett and Sayer
(1994, 1996).
One major underlying assumption when using the LGM
method is that individual growth within a particular group
must follow the same functional form. This does not mean
that growth must necessarily follow a linear tend; growth
may actually be of a different functional nature (e.g., quad-
ratic or logarithmic). For example, in many evaluation
studies, the control group exhibits a growth form that is
linear, but growth due the speci®c program treatment is
quadratic in form. In addition, when using a SEM approach,
the LGM method assumes, ideally, that the number and
spacing of assessments are the same for all individuals
over time. If the number of time points or the spacing
between time points vary across individuals, other growth
curve techniques are available (e.g., under the framework of
hierarchical linear modeling described by Bryk & Rauden-
bush, 1992). Duncan et al. (1994) pointed out that LGM
methods can be applied to circumstances in which indivi-
duals are not measured at the same time intervals, however,
speci®c constraints are needed for parameter estimation.
Nevertheless, in LGM, each individual's growth over a
span of time is represented by a line and may be summarized
by a unique intercept and slope. An individual's intercept
describes the amount of the outcome variable possessed at
the initial measurement point. The slope captures informa-
tion about how much the individual changes for each time
interval after the initial measurement point (Lawrence &
Hancock, 1998). As a result of individual differences in
these intercepts and slopes, changes occur in the relation-
ships among individuals' data across the different time inter-
vals. Speci®cally, an individual's position shifts across time,
relative to other individuals in the population.
In LGM, both intercept and slope are treated as latent
variables (or factors) and are not measured directly. Fig. 2
represents an example structural model for linear growth
trajectory. Note that in this linear model all paths from the
intercept factor to each time point are ®xed to 1 and all paths
from the slope factor to each time point are ®xed to 0, 1,
B. Hess / Evaluation and Program Planning 23 (2000) 419±428420
Fig. 1. Individual growth curves (nested in groups) across four time points.
2, and 3. Fixing the paths from the slope factor to 0, 1, 2, and
3 allows the investigator to test whether this model repre-
senting linear growth ®ts the data satisfactory. Conversely,
if one was interested in determining quadratic growth, the
paths from the slope factor would be set 0, 1, 4, and 9,
respectively. Once these paths are ®xed accordingly, then
mean and variance parameters associated with each factor
as well as residuals can be estimated.
A unique and powerful advantage of using the LGM
method is its ability to incorporate predictors of intercept
and slope factors (Lawrence & Hancock, 1998). As will be
shown, this may prove useful when a particular theory is
used to explain how a program is designed to have an
impact. Moreover, the latent variable analysis approach to
the assessment of change is presented in more technical
detail in McArdle (1988), McArdle and Epstein (1987),
Meredith (1991), Meredith and Tisak (1990) and MutheÂn
(1991). Some applications that elucidate the development of
the LGM method can be found in Duncan and Duncan
(1995) and Duncan et al. (1994).
2.1. Estimating a LGM
To estimate a LGM as shown in Fig. 2, the correlation
matrix and the means and standard deviations of the
measured variables at each time point are used. Using the
SEM approach, the LGM must be speci®ed and assessed for
®t by using a x 2 test and other ®t indices (e.g., root mean
square error of approximation). Procedurally, a constant C is
introduced between the intercept and slope factors allowing
one to estimate the means for both factors. It should be
pointed out that the constant C is not a variable per se; it
assumes a constant value of 1 for all individuals and thus has
no variance and cannot covary or have an effect on any
measured variable or factor. With the constant C in the
model, parameter a1 or the average initial level, and para-
meter a2 or the average rate of change, can be estimated for
the sample. The variance of the intercept and slope factors
are also estimated and are denoted as v1 and v2, respectively
in Fig. 2. In addition, the parameter represented by the
curved double-headed arrow path between the slope and
intercept factors denoted as v3 can also be estimated and
represents the belief that how much one's behavior changes
may be related to where one starts out. In other words, a
positive value for the estimate v3 would indicate that, on
average, those who started out higher at the initial level
grow at a positive rate across time. Finally, errors in
measurement (residuals) are estimated for each time point
and are denoted as E1±E4 in Fig. 2.1
3. Implications for program evaluation
When using the LGM method to assess change over time,
any general form of longitudinal data can be studied. This
includes testing, for example: (1) if mediational variables
in¯uence the developmental process; (2) if distal outcome
variables are in¯uenced by the developmental process; and
(3) if multiple developmental processes exist for more than
one outcome variable (MutheÂn & Curran, 1997). Addition-
ally, the LGM framework can be applied to multiple popu-
lation (e.g., treatment-control) studies. Therefore, the LGM
method can be used in program evaluation.
Typical evaluation questions that LGM methodology
could address include, ªGiven individual differences in
initial status and in growth, what are the longitudinal effects
of a program?º Or, more speci®cally, ªIs there a slower rate
of marijuana use for those exposed to a drug prevention
program as compared to students who did not receive the
program?º Another question LGM methods might address
for the educational evaluator is, ªDo rates at which children
learn differ by attributes of the program in which they were
exposed?º Questions like these can be answered when
continuous data are available longitudinally on many
individuals (Willett & Sayer, 1994).
LGM methods have an advantage over the traditional
pretest±posttest designs in utilizing more than two waves
of data to maximize information on individual change.
When development follows an interesting trajectory over
time, ªsnapshotsº of status taken only before and after are
unlikely to reveal the intricacies of individual change
(Willett & Sayer, 1994). Hence, LGM methods can capture
long term impact of a program while capitalizing on
intra-individual differences in rate of change.
It stands to reason that assessing program impact via
B. Hess / Evaluation and Program Planning 23 (2000) 419±428 421
1 The analysis of LGMs can be easily carried out using LISREL due to its
¯exibility and convenience for estimating speci®c parameters in the model
(JoÈreskog and SoÈrbom, 1993). In LISREL terminology, the paths from each
intercept and slope factors to each observed timepoint are ®xed in the l y
(lambda) matrix according to Fig. 1, assuming a linear trajectory. Intercept
and slope means are free to be estimated in the a (alpha) matrix, and the
intercept and slope variances are free to be estimated in the c (psi) matrix.
Errors or residuals are free to be estimated in the u1 (theta-epsilon) matrix.
Fig. 2. A LGM used to determine program impact.
covariance modeling may become more recognized given
the ¯exibility that LGM techniques afford. Recently, Irvine,
Biglan, Smolkowski, Metzler, and Ary (1999) applied LGM
to determine the effectiveness of a parenting skills program
for parents of middle school students using a randomized
control trial design. However, one challenging area in
program evaluation in which the ef®cacy of LGM techni-
ques could be assessed is within quasi-experimental designs
where time invariant covariates might be recommended.
These types of non-equivalent groups designs are often
encountered when evaluating prevention programs in
mental health or education programs.
4. Assessing program impact via LGM: a hypotheticalexample
To provide an example, say an evaluator is interested in
assessing the impact of an innovative high school drug
prevention program across the 4 years of high school
(ninth through twelfth grades). One sample of ninth grade
students from an urban school is identi®ed to receive the
program at the beginning of the year while a sample from
another school very similar in characteristics is chosen as a
control group.2 The observed outcome variable will be
frequency and amount of drug use (based on a composite
of alcohol and other drugs) and will be measured at the end
of the ninth grade (initial time point), followed by measures
taken at the end of tenth grade (Time 2), eleventh grade
(Time 3), and twelfth grade (Time 4) in order to gauge
rate of change across the high school years. For the sake
of simplicity, growth is hypothesized to be linear for both
populations; the rate of change for the treatment population
is hypothesized, in this example, to be slower as a result of
the program. The basic model in Fig. 2 is used to test for this
program effect. Notice that the paths are ®xed to test for
linear growth.
To begin to model change in a way that allows the evalua-
tor to determine program impact, a step-by-step multiple-
group analysis strategy put forth by MutheÂn and Curran
(1997) is recommended. According to MutheÂn and Curran,
the ®rst step is to ®t the model to the control group. The
control population represents the normative set of individual
growth trajectories that would have been observed in the
treatment population had they not been chosen for treat-
ment. From this analysis, it is important to rule out that
the control population exhibits any of the post-intervention
changes in growth trajectories that are hypothesized to be
due to treatment (MutheÂn & Curran, 1997). For the control
group, the model should statistically ®t the data (e.g., based
on a x 2 test). Following this, the treatment-group model is
then analyzed separately and the basic growth trajectory
form is investigated.
Keep in mind that for both the treatment and control
models, four parameters (along with the residuals) will be
assessed in order to fully understand the treatment effect. To
reiterate, the ®rst two parameters re¯ect the mean of the
intercept factor (mean initial level) and mean of the slope
factor (average rate of change) denoted as a1 and a2, respec-
tively, in Fig. 2. The second two parameters re¯ect the
variance of the intercept and slope factors denoted as v1
and v2, respectively, in Fig. 2. In addition, v3 can be assessed
if it is hypothesized that a relationship exists between initial
status and rate of change.
Following separate tests of the treatment and control-
group models, program impact is assessed by conducting
a multiple-group analysis. This requires ®rst constraining
all four parameters to be invariant (equal) in the control
group and then obtain the model ®t or x 2 for the control
group model. Next, each of the four parameters are freed in
the treatment model one at a time. When testing each para-
meter individually, a signi®cant change in x 2 from the treat-
ment model compared to that of the control model of the
previous step would reveal if that parameter value for the
treatment group is signi®cantly different from that of the
control group. That is, lack of parameter equality suggests
a signi®cant program effect. Thus, the effect of the program
is assessed by comparing the set of initial levels and trajec-
tories in the treatment population with those in the control
population.
To explain how parameters are assessed for inequality
and what is interpreted from the multiple-group analysis
described above, consider ®rst testing a1 or the mean drug
use at the end of ninth grade (i.e., initial status). When this
parameter is freed in the treatment model and this model's
x 2 is found to be statistically different from the control
model x 2 where all parameters were constrained to be invar-
iant, this will indicate that, for the treatment group, the mean
drug use at the end of ninth grade is signi®cantly different
than that of the control group. This of course assumes that
the two groups are equated at the very beginning. In nonran-
domized designs, equality may not be assured and so it may
behoove the evaluator to obtain a pre-intervention measure
of the outcome variable and use this pre-measure as a
covariate for the intercept factor. In this example, if the
estimated mean value for the intercept factor is lower than
that estimated for control group, this will show that the
prevention program had an impact at the initial level.
Next, a2 or average rate of change is assessed. As with a1,
if a2 is signi®cantly smaller for the treatment group based on
the x 2 difference test, this will indicate a slower rate of drug
use for the treatment group compared to the control group.
Next, parameters v1 and v2 represent the variance of the
intercept and slope factors and can be assessed individually
as a1 and a2. For example, a smaller treatment-group variance
B. Hess / Evaluation and Program Planning 23 (2000) 419±428422
2 For sake of demonstration, it will be assumed that the two groups are
equated at the onset on all measures. However, it is recognized that because
random assignment is often unfeasible in this type of evaluation design,
other strategies should be used to equate the two groups. In such quasi-
experimental designs, time variant and/or time invariant covariates such as
pretest measures may be included in the model to adequately equate and
compare the two groups.
for the growth factor would represent a program effect that
makes growth more homogeneous among individuals (i.e.,
students exposed to the drug prevention program will be
more alike in their rate in change across time). Note that v3
in Fig. 2 may also be tested if one has reason to believe, for
those exposed to the program, the lower one starts out at the
initial level, the slower the rate of change across time. Finally,
errors or residuals at each time point can be assessed and
compared.
4.1. An alternative approach
As an alternative method to determine treatment effect in
a two-group design, particularly when the intervention takes
place after the initial time point, MutheÂn and Curran (1997)
suggested the same multiple-population analysis discussed
above. However the actual effect of the treatment is deter-
mined by including an additional slope or growth factor for
the treatment population. Fig. 3 shows the model for the
treatment group with the added growth factor (set to be
quadratic in form).
In this model, the ®rst two factors are interpreted the same
as the control population, however, the third factor repre-
sents the growth form (quadratic) that is speci®c only to the
treatment population. Notice that only the intercept factor
in¯uences the ®rst time point (i.e., path to Time 1 is set to 1).
This re¯ects the individual's initial status level before the
intervention. At following time points, the second factor
represents the linear growth rate that the individual would
progress had the individual not received the program. The
inclusion of the third or added growth factor captures the
increment or decline in growth beyond that of the control
population. Moreover, this analysis strategy is particularly
useful when growth due to a program (e.g., a quadratic
trend) might be different than the growth presented by a
control population. Therefore, the evaluator should consider
this analysis approach and include the added growth factor
of the form hypothesized to be due to the program.
In short, to test for program impact by way of including
an added growth factor, begin analyzing the control and
treatment-group models separately to determine the growth
form of each population. Following this, proceed with the
multiple-group analysis strategy. In doing so, the para-
meters for the ®rst and second growth factors for the
program group are constrained to be equal to those of the
control group.3 The added growth factor must be included in
order to capture the positive or negative rate of change
beyond that of the control group. For instance, if the control
group shows linear growth, one may add a quadratic growth
factor to the program group (assuming that the only differ-
ence is in the quadratic growth trend). Program impact is
re¯ected in the mean and variance of the added growth
factor compared to the control group's growth rate. Again,
the evaluator can test each parameter by way of the x 2
difference test.
However, MutheÂn and Curran (1997) pointed out that
before testing the equality of each parameter, one might
test for an initial status by treatment interaction Ð that is,
allow the initial status factor to in¯uence the treatment
growth factors in the treatment group. This is equivalent
to testing for homogeneity of slopes in the traditional
ANCOVA context. In sum, this analysis strategy whereby
testing the contribution of an added growth factor due to the
program is indeed a more useful approach, especially when
the intervention follows the initial time point and when
treatment growth follows a different form than the control
population.
5. Incorporating program theory in the evaluationdesign
In many instances, evaluators incorporate program theory
in their evaluation design to assist them in understanding
how the program produced (or failed to produce) its
intended and perhaps unintended effects (Chen, 1990, p.
171). ªCausalº mechanisms of a program should be exam-
ined within the framework of the program's theory because
traditional input±output assessments may lead an evaluator
to provide an impoverished version of causal inference
(Cordray, 1986). Allowing theory to drive the evaluation
also broadens the evidential basis by actively considering
plausible rival explanations, by examining implementation
procedures, and by investigation mediation and contextual
factors.
Speci®c bene®ts of integrating program theory in the
evaluation design have been recognized. Bickman (1987)
provided a list of bene®ts that can result in articulation of
program theory and its integration with program evaluation.
B. Hess / Evaluation and Program Planning 23 (2000) 419±428 423
Fig. 3. Determining program impact by including an added growth factor
for the treatment group.
3 Restricting the parameters of the ®rst growth factor to be equal across
treatment and control groups is suggested for randomized studies where
populations are similar at the initial time point. In nonrandomized designs,
MutheÂn and Curran (1997) suggested relaxing the equality constraint for the
initial status factor, and similarly, the model can be improved if time
invariant covariates are used.
Advantages include: (1) specifying the underlying theory of
a program within the evaluation allows that theory to be
tested in a way that reveals whether program failure results
from implementation failure or theory failure; (2) program
theory clari®es the connections between a program's opera-
tions and its effects, and thus helps the evaluator to ®nd
either positive or negative effects that otherwise might not
be anticipated; (3) program theory can also be used to
specify intermediate effects of a program that might become
evident and measurable before ®nal outcomes can be mani-
fested, which can provide opportunities from early program
assessment in time for corrective action by program imple-
menters; and (4) program theory may be the best method of
informing and educating stakeholders so that they can
understand the limits of the program. In conclusion, it
seems to be to the evaluator's advantage to integrate
program theory in their evaluation design (at least in part
if possible).
5.1. Testing the contribution of predictors based on program
theory
In addition to assessing program impact by way of the
two-group analysis approach as illustrated using the models
in Figs. 2 and 3, the same analysis approach can be used
when incorporating predictors of the intercept and slope
factors based on a particular program theory. Consider the
same hypothetical high school drug prevention program
discussed earlier. However, this time, assume the evaluator
would like to determine program impact as driven by a
program theory. The (simpli®ed) theory proposed underly-
ing the hypothetical program has, say, two components: a
psychological and a social component. The psychological
component centers on knowledge of the health conse-
quences of drug use. The social component centers on social
skills like assertiveness, competency, and resistance
towards in¯uential drug-using peers. The hypothetical
program is designed to reduce the rate of drug use over
time by providing students with adequate knowledge
about the consequences of drug use, and at the same
time, provide students with good social skills so they can
be interpersonally strong during their high school years.
In terms of design and analysis, to explain how this inno-
vative prevention program is designed to have an impact
over time, the two theoretical components must be opera-
tionally de®ned, adequately measured, and included as
predictor variables of both intercept and slope factors. For
the sake of illustration, knowledge about the consequences
of drug use is measured using a valid instrument and will be
assumed to be measured without error. Similarly, social skill
is measured by degree of assertiveness and competency,
also measured without error. Parenthetically, it should be
noted that, in practice, the evaluator might incorporate
several measures of these two constructs as a way to account
for the measurement error. Both of these variables will be
measured at the completion of the prevention program
(before Time 1). Moreover, it is hypothesized that, for
those exposed to the prevention program, there will be an
increase in the amount of knowledge about drug use, and
similarly, an increase in level of social skill due to the
program. In turn, this will lead to lower initial levels and
slower rate of drug use over the high school years (as
compared to a control group). Fig. 4 shows that the rate of
change is hypothesized to be linear in form (i.e., paths from
the slope factor to each observed time point are set 0, 1, 2,
and 3).
To provide an illustration of how this is incorporated
within a LGM design, consider Fig. 4, which shows the
complete model (used for both the control and treatment
groups). In this case, instead of just four parameters to be
assessed as in Fig. 2, there are now 12 parameters of interest
(as well as the residuals) labeled accordingly in Fig. 4. As
with the basic two-group intervention design, the same step-
by-step multiple-group analysis strategy is used. That is,
®rst ®t the model to the control and then the same model
for the treatment group. Next, conduct the multiple-group
analysis to determine lack of equality for each parameter
value.
To explain how parameters are assessed for inequality
and what is interpreted from the multiple-group analysis
using the model in Fig. 4, begin with parameter a1. Para-
meter a1 refers to the mean drug use at the end of ninth grade
(i.e., initial status). When a1 is free to be estimated in the
treatment model and this model is compared to the control
model where all parameters were constrained to be invar-
iant, a signi®cant change in x 2 value will indicate that, for
the treatment group, the mean drug use at the end of ninth
B. Hess / Evaluation and Program Planning 23 (2000) 419±428424
Fig. 4. A LGM incorporating predictors of intercept and slope factors based
on program theory.
grade is signi®cantly different than that of the control group
(again assuming that the two groups are equated at the very
beginning).
Parameter a2 in Fig. 4 re¯ects the average rate of change
over the high school years. As with a1, if this parameter is
signi®cantly smaller based on x 2 difference test, this will
indicate a slower rate of drug use for the treatment group
compared to the control group. Parameter a3 and a4 in Fig. 4
refer to the mean level of knowledge and mean level of
social skill, respectively. A signi®cant difference in means
(i.e., in this case, the means for the treatment group should
be larger than those of the control group) indicates that the
prevention program signi®cantly increased amount of
knowledge and social skill. Of course again this assumes
the groups were equated at the beginning and selection bias
was not a threat; if not, a pretest covariate for each predictor
might be used.
To determine if the two predictors account for a signi®-
cant amount of variability in initial levels and rate of change
as suggested by program theory, the four parameters or path
coef®cients labeled b1, b2, b3, and b4 in Fig. 4 are individu-
ally tested (in LISREL terminology these path coef®cients
are b parameters). Based on the present hypothetical
evaluation, it is hypothesized that for those in the prevention
program, the program will lead to higher amounts of knowl-
edge and greater levels of social skill. This in turn will lead
to lower initial levels and slower rate of change over time
for those exposed to the program. To test this hypothesis,
each of the four path estimates for the treatment group is
compared to those in the control group one at a time in the
multi-group analysis. If change in x 2 is signi®cant for
each test and the paths are in the right direction (i.e., in
this example, all four paths should be negative and signif-
icantly larger than those for control group), then one can
conclude that the predictors explain a signi®cant amount
of the variability in initial levels and in rate of change.
Parameters v1 and v2 in Fig. 4 represent the variance of the
intercept and slope factors and can be assessed to deter-
mine if the variability in initial status and in growth for
the treatment group is greater than that of the control
group. Similarly, parameters v3 and v4 represent the
variance of the two predictors and can also be assessed
to determine if the variability in knowledge and social
skill for the treatment group is greater than that of the
control group. Last, errors or residuals can be assessed
and compared.
In sum, program theory can be incorporated within a
LGM approach to assess how a program had (or failed to
have) an immediate as well as a longitudinal impact over
time (i.e., a preventive effect). In the simplest terms, this can
be done by testing the contribution of predictor variables
(based on program theory) in order to explain inter- and
intra-individual variability in initial levels and in rate of
change over time. This requires a well-formulated program
theory and a properly speci®ed model in accordance with
this theory.
6. Advantages of using LGM in program evaluation
As seen in the previous examples, the major advantage of
utilizing the SEM approach to modeling change is its ¯ex-
ibility to capitalize on variability in individual differences at
both the initial level and in growth over time. Subsequently,
LGM is easy to implement using existing software programs
like LISREL (Kaplan & George, 1998). Although only the
x 2 difference test was discussed, other various goodness-of-
®t indices provided by LISREL allow for a wide variety of
substantively interesting hypotheses to be tested, such as
testing the adequacy and stability of the hypothesized
growth form (e.g., as indexed by the root mean square
error of approximation). The SEM approach also allows
for the study of more than one outcome (see Willett &
Sayer, 1996). Furthermore, Kaplan and George (1998)
pointed out that utilizing a SEM approach allows the
researcher to incorporate multiple indicators of the outcome
of interest at each time point, thus accounting for the
problems of measurement error. Last, when using SEM,
one can easily handle missing data (see MutheÂn, 1993) as
well as incorporate categorical variables (see MutheÂn,
1996). As with repeated measures ANOVA, when including
categorical independent variables, interaction effects (e.g.,
between gender and treatment) can also be tested using
LGM techniques.
Compared to ANCOVA, LGM techniques are more ef®-
cient, requiring fewer cases. The trade-off is that for
increased power, more measurement periods are needed,
but on fewer cases. This bene®t is reduced by an increase
in attrition rate. In addition, interaction effects (say between
gender and treatment) can be detected without large sample
sizes if the interaction effect is sizeable (MutheÂn & Curran,
1997). MutheÂn and Curran put forth a framework for power
estimation to detect treatment effects. Using this framework,
the researcher can compute his or her own power curves for
a speci®c model given the hypothesized size of the para-
meter values.
Whether or not program theory is considered in the
evaluation design, LGM methodology allows the evaluator
to test the contribution of predictors of initial status and
growth. LGM techniques also provide the evaluator with
the ability to incorporate both ®xed and time varying covari-
ates. Schmitt et al. (1999), for example, used parental
employment as a predictor of children's academic develop-
ment, using school climate as a covariate. In the case of the
hypothetical drug prevention evaluation discussed in this
paper, the evaluator may want to adjust or control for
other psychological variables or demographics like SES
(depending on the program) that may be hypothesized to
explain drug use. As noted earlier, incorporating time invar-
iant covariates (e.g., pretest measures) are essential in
nonrandomized designs in order to equate groups. Addition-
ally, using LGM within the evaluation of a drug prevention
program over time can also allow growth on several
constructs to be tested simultaneously (e.g., concomitant
B. Hess / Evaluation and Program Planning 23 (2000) 419±428 425
drug and alcohol use). In sum, a further understanding of the
strengths of using LGM in intervention designs, particularly
the data demands that are associated with this type of
modeling, will evolve as more applications emerge. One
such application by Reynolds and Temple (1995) discussed
strengths associated with reducing selection bias in quasi-
experimental LGM designs compared to traditional
econometric modeling.
6.1. Distinguishing implementation failure from theory
failure
Other postulated bene®ts afforded when integrating
program theory in evaluation echo some of the bene®ts
proposed by Bickman (1987) discussed earlier. For exam-
ple, one of the most bene®cial aspects of integrating
program theory is that, by including predictor (and/or
mediator) variables in the model, the evaluator can pinpoint
which factors in the causal chain lead to program success or
failure. Speci®cally, keeping with the present hypothetical
high school drug prevention example, specifying the under-
lying theory of the development of drug use helps the
evaluator to determine if program failure was due to
implementation failure or theory failure.
Chen (1990, p. 201) suggested that implementation fail-
ure occurs when the treatment or program fails to affect the
ªcausalº or predictor variable(s), but the basic conceptual
theory underlying the program is sound. From a theory-
driven perspective, Chen refers to the link between the
program and the casual variable as ªaction theoryº and
subsequently he refers to this type of breakdown as ªaction
theory failure.º For example, if the drug prevention program
failed to ªactivate,º or more speci®cally, failed to increase
level of knowledge and social skill, then the problem was in
the program as implemented not in the basic conceptualiza-
tion of the program. On the other hand, theory failure occurs
when program treatment has successfully activated the
causal or predictor variables, but the underlying conceptual
theory is invalid. Consequently, the outcome variable is
unaffected. Chen refers to this link between the causal vari-
able and outcome variable as ªconceptual theoryº and thus
calls this breakdown ªconceptual theory failure.º In this
case, the prevention program may have increased level of
drug knowledge and social skill, but frequency of drug use
at the initial time point and/or over time was unaffected.
Fig. 5 illustrates how the two links tie program variables
to their intended effects based on Chen's paradigm. For both
the action and conceptual theory links, it is critical that the
model is properly speci®ed and includes all relevant vari-
ables based on program theory, otherwise the model will not
give a full impression of how the program succeeded or
failed. For example, if the action theory suggests that
improved social skill will lead to less drug use not directly,
but indirectly through a mediator variable (e.g., resistance to
peer pressure), then the mediator variable must be measured
and included in the model. Otherwise, if program failure due
to poor implementation occurred, it will not be known
whether it was due to the program's failure to increase
social skill or if increased social skill failed to increase
resistance to peer pressure. From Fig. 5, it is easy to see
that both the action and conceptual components of program
theory helps clarify the connection between the program's
operation and its effect on drug use across time. Moreover,
identifying both the action and conceptual program theory
links and properly specifying the LGM provide a means for
informing and educating stakeholders regarding how to
diagnose sources of program failure or success. This kind
of diagnostic function can also provide useful informa-
tion for future program improvements (Chen, 1990,
p. 191).
7. Some limitations of using LGM in program evaluation
Like all other evaluation designs, latent growth modeling
has limitations. One of the most obvious disadvantages in
using LGM methods in assessing program impact is the fact
that it requires measures of an outcome at multiple time
points. Attrition may be a problem leading to missing
data. Fortunately, MutheÂn (1993) and MutheÂn, Kaplan,
and Hollis (1987) discussed how to handle missing data in
the latent variable framework. For a discussion on how to
model incomplete data due to attrition in the LGM frame-
work, see Duncan and Duncan (1994) and Schafer (1997).
From a statistical perspective, because LGM is a latent
variable analysis method using SEM, it shares many of the
same weaknesses (Duncan et al., 1994). For instance, LGM
methods assume multinormally distributed variables and
assume that change is systematically related to the passage
of time, at least over the time interval under study. Further-
more, Duncan et al. (1994, p. 344) noted that, ªthe applica-
tion of LGM within the SEM framework depends, at least
ideally, on data that are collected when individuals are
observed at approximately the same time, and the number
and spacing of assessments are the same for all individuals.º
Limitations exist as well when integrating program
theory and including relevant predictor(s) of the intercept
B. Hess / Evaluation and Program Planning 23 (2000) 419±428426
Fig. 5. Specifying the action and conceptual theory links provides assis-
tance when determining the sources of program failure or success.
and slope factors. Weiss (1997) provided a discussion as to
the dif®culty associated with adopting a [strict] theory-
driven perspective in evaluation practice. For instance, she
noted that there are stark disagreements as to what repre-
sents an adequate, well-de®ned program theory. In social
science, theory is broadly construed by many scientists and
so theory at times offers little guidance for analysis. Hence,
it is not advocated in this paper that evaluators should
consider themselves strictly ªtheory-drivenº in order to
fully understand the operations of a program and its
outcome, even when using an LGM approach. What is
suggested is that evaluators should be more ªtheory
consciousº or try to consider a clear, properly speci®ed
theory as a guide when choosing explanatory variables
that help explain variability in their outcome variables (In
the LGM case, explaining variability in initial levels and in
growth). Moreover, even if a program theory is too concep-
tually broad for a good theory-driven evaluation, incorpor-
ating predictors of initial levels and growth will still add to
our knowledge of the mechanisms by which programs have
their effects.
Apart from the fact that using the LGM method requires
at least some working knowledge of SEM, one possible
resistance to using LGM in program evaluation, particularly
when integrating program theory, might be due to the issue
of evaluation design. Many evaluations are restricted to
using, for instance, a nonequivalent groups pretest±posttest
design Ð mainly due to cost and convenience (time) as well
as training. When integrating program theory in the evalua-
tion design, additional measures are needed to capture the
predictor and mediating variables that must be included in
the model, and so, depending on the speci®c program as
well as planner and stakeholder interest, this might be
over achieving. Many stakeholders might be satis®ed know-
ing simply whether or not the program worked, and not so
interested in why and how it worked (a typical argument put
forth by the ªblack boxº design advocates). In addition,
LGM requires at least three time intervals, and so LGM is
not recommended when only one posttest measure is taken.
This may place more emphasis on the use of LGM in
evaluation prevention type programs. Finally, when incor-
porating LGM as an analytic tool in non-randomized
designs, selection bias still pose as a threat as in the non-
equivalent group comparative change design (however,
using covariates such as pretest measures in the model to
equate the groups can be incorporated).
8. Conclusions
The focus of this paper was to serve as a primer for future
consideration of the LGM method within program evalua-
tion. Subsequently, this paper proffers one hypothetical
model for evaluators interested in determining the longitu-
dinal impact of a program (e.g., prevention programs). An
emphasis was placed on demonstrating how program theory
can be integrated in the LGM design so that evaluators can
explain variability in their outcomes and pinpoint how the
program succeeded or failed. Because the focus of this paper
was conceptually rudimentary and centered mainly on one
hypothetical example, future research should focus on some
of the methodological and statistical issues inherent when
using the LGM method in program evaluation (e.g., using
time invariance/variant covariates in nonrandomized
designs).
In closing, LGMs are obviously not the only way to
model change (repeated measures ANOVA and ANCOVA
models remain valid given statistical assumptions and the
evaluator's goal). However, LGM techniques are more
versatile because, unlike traditional analytical methods
like ANCOVA, they allow the evaluator to capitalize on
modeling intra-individual variability in initial status and in
growth. Additionally, LGM techniques are more ¯exible
when evaluators choose to integrate program theory in
their evaluation designs. Directly including speci®c predic-
tors of initial status levels and in rate of change based on
theory is easily done. The tough part of course is specifying
the proper model given a well-formulated theory. Because
program theory drives many evaluations (once argued by
Chen, 1990), evaluators may also ®nd LGM to be a ¯exible
methodology used within a theory-driven context. More-
over, even if an evaluator is not strictly theory-driven, he
or she may ®nd the luxury of easily modeling predictors of
change within a LGM design a fruitful procedure, and there-
fore may want to incorporate latent curve analyses into their
repertoire.
Acknowledgements
A version of this paper was presented at the annual meet-
ing of the American Evaluation Association, November 3±
6, 1999 in Orlando, Florida. Appreciation is extended to Dr
Martin F. Sherman for his constructive review of a previous
draft of this manuscript.
References
Andrew, J. A., & Duncan, S. C. (1998). The effect of attitude on the
development of adolescent cigarette use. Journal of Substance Abuse,
10 (1), 1±7.
Bickman, L. (1987). The functions of program theory. In L. Bickman (Ed.),
Using program theory in evaluation. San Francisco: Jossey-Bass.
Bryk, A. S., & Raudenbush, S. W. (1992). Hierarchical linear models:
Applications and data analysis methods. Newbury Park, CA: Sage.
Chen, H. T. (1990). Theory driven evaluations. Newbury Park: Sage.
Cordray, D. S. (1986). Quasi-experimental analysis: A mixture of methods
and judgment. In W. M. K. Trochim (Ed.), Advances in quasi-experi-
mental design and analysis. New Directions in Program Evaluation,
(31). San Francisco: Jossey-Bass.
Curran, P. J., Stice, E., & Chassin, L. (1997). The relation between adoles-
cent alcohol use and peer alcohol use: a longitudinal random coef®-
cients model. Journal of Consulting and Clinical Psychology, 65,
130±140.
B. Hess / Evaluation and Program Planning 23 (2000) 419±428 427
Duncan, T. E., & Duncan, S. C. (1994). Modeling incomplete longitudinal
substance abuse data using latent variable growth curve methodology.
Multivariate Behavioral Research, 29, 313±338.
Duncan, T. E., & Duncan, S. C. (1995). Modeling the processes of
development via latent variable growth curve methodology. Structural
Equation Modeling, 2, 187±213.
Duncan, T. E., Duncan, S. C., & Hops, H. (1998). Latent variable modeling
of longitudinal and multilevel alcohol use data. Journal of Studies on
Alcohol, 59, 399±408.
Duncan, T. E., Duncan, S. C., & Stoolmiller, M. (1994). Modeling
developmental processes using latent growth structural equation
methodology. Applied Psychological Measurement, 18, 343±354.
Irvine, A. B., Biglan, A., Smolkowski, K., Metzler, C. W., & Ary, D. V.
(1999). The effectiveness of a parenting skills program for parents of
middle school students in small communities. Journal of Consulting
and Clinical Psychology, 67, 811±825.
JoÈreskog, K. G., & SoÈrbom, D. (1993). LISREL 8 user's guide. Chicago, Il:
Scienti®c Software.
Kaplan, D., & George, R. (1998). Evaluating latent variable growth models
through ex post simulation. Journal of Educational and Behavioral
Statistics, 23, 216±235.
Lawrence, F. R., & Hancock, G. R. (1998). Assessing change over time
using latent growth modeling. Measurement and Evaluation in
Counseling and Development, 30, 211±223.
McArdle, J. J. (1988). Dynamic but structural equation modeling with
repeated measures data. In J. R. Nesselroade & R. B. Cattell (Eds.),
Handbook of multivariate experimental psychology (pp. 561±614).
New York: Plenum Press.
McArdle, J. J., & Epstein, D. (1987). Latent growth curves within devel-
opmental structural equation models. Child Development, 58, 110±133.
Meredith, W. (1991). Latent variable models for studying differences in
change. In L. M. Collins & J. L. Horn (Eds.), Best methods for analysis
of change: recent advances, unanswered questions, future directions
(pp. 149±163). Washington, DC: American Psychological Association.
Meredith, W., & Tisak, J. (1990). Latent curve analysis. Psychometrika, 55,
107±122.
MutheÂn, B. O. (1991). Analysis of longitudinal data using latent variable
models with varying parameters. In L. M. Collins & J. L. Horn (Eds.),
Best methods for analysis of change: Recent advances, unanswered
questions, future directions (pp. 1±17). Washington, DC: American
Psychological Association.
MutheÂn, B. O. (1993). Latent variable modeling of growth with missing
data and multilevel data. In C. M Cuadras & C. R. Rao (Eds.), Multi-
variate analysis: Future directions 2 (pp. 199±210). Amsterdam: North
Holland.
MutheÂn, B. O. (1996). Growth modeling via binary responses. In A. von
Eye & C. C. Clogg (Eds.), Categorical variables in development
research (pp. 37±54). San Diego, CA: Academic Press.
MutheÂn, B. O., & Curran, P. J. (1997). General longitudinal modeling of
individual differences in experimental designs: A latent variable
framework for analysis of power estimation. Psychological Methods,
4, 371±402.
MutheÂn, B. O., Kaplan, D., & Hollis, M. (1987). On structural equation
modeling with data that are not missing at random. Psychometrika, 42,
431±462.
Reynolds, A. J., & Temple, J. A. (1995). Quasi-experimental estimates of
the effects of a preschool intervention: Psychometric and econometric
comparisons. Evaluation Review, 19, 347±373.
Rogosa, D. R., & Willett, J. B. (1985). Understanding correlates of change
by modeling individual differences in growth. Psychometrika, 50,
203±228.
Schafer, J. (1997). Analysis of incomplete multivariate data. New York:
Chapman & Hall.
Schmitt, N., Sacco, J. M., Ramey, S., & Chan, D. (1999). Parental employ-
ment, school climate, and children's academic and social development.
Journal of Applied Psychology, 84, 737±753.
Weiss, C. H. (1997). How can theory-based evaluation make greater
headway? Evaluation Review, 21, 501±524.
Willett, J. B., & Sayer, A. G. (1994). Using covariance structure analysis to
detect correlates and predictors of individual change over time.
Psychological Bulletin, 116, 363±381.
Willett, J. B., & Sayer, A. G. (1996). Cross-domain analysis of change over
time: Combining growth modeling with covariance structure modeling.
In G. A. Marcoulides & R. E. Shumacker (Eds.), Advanced structural
equation modeling: Issues and techniques (pp. 125±157). Newbury
Park: Sage.
B. Hess / Evaluation and Program Planning 23 (2000) 419±428428