
GROWTH CURVES AND EXTENSIONS USING MPLUS

Alan C. Acock

Department of HDFS

322 Milam Hall

Oregon State University

Corvallis, OR 97331

7/2009

This document and selected references, data, and programs can be downloaded from

http://oregonstate.edu/~acock/growth

Growth Curve and Related Models, Alan C. Acock 1


Section 1: Brief Summary of Topics
Section 2: A Simple Growth Curve
    2.1 Precautionary guidelines
    2.2 Graphic representation of a growth curve
    2.3 Mplus program for simple growth curve
    2.4 Annotated selected growth curve output
    2.5 Here are some available plots
Section 3: Quadratic growth curve
    3.1 Graphic representation of quadratic growth curve
    3.2 Mplus program & output quadratic growth curve
    3.3 Plots for quadratic growth curve
Section 4: How Many Waves Should We Have?
    4.1 Linear Curve--3 minimum, 4 much better
    4.2 For a quadratic--4 minimum, 5 much better
Section 5: Alternative to Use of a Quadratic Slope
Section 6: Working with Missing Values
    6.1 Two approaches used by Mplus
    6.2 Multiple cohort extension
Section 7: Multiple group growth curves
    7.1 Program and output without constraints
    7.2 Comparing intercept and slope
Section 8: An Alternative to Multiple Group Analysis
    8.1 Model and figure
    8.2 Mplus program and output
    8.3 Graphic representation
Section 9: Growth Curves with Time Invariant Covariates
    9.1 A conditional latent trajectory model
    9.2 Mplus program and output
Section 10: Mediation & Moderation
Section 11: Time Varying Covariates
Section 12: Extensions and Suggested Reading

Goal of the Workshop

The goal of this workshop is to explore a variety of applications of latent growth curve models using the Mplus program. Because we will cover a wide variety of applications and extensions of growth curve modeling, we will not cover each of them in great detail. At the end of this workshop it is hoped that participants will be able to run Mplus programs to execute a variety of growth curve modeling applications and to interpret the results correctly.

Assumed Background

Participants should be familiar with the content in Introduction to Mplus that is located at www.oregonstate.edu/~acock/growth. It will be assumed that participants in the workshop have some background in Structural Equation Modeling. Background in multilevel analysis will also be useful, but is not assumed. It is possible to learn how to estimate the specific models we will cover without a comprehensive knowledge of Mplus, but some background using an SEM program is useful.



Section 1: Brief Summary of Topics

Introduction to Growth Curve Modeling

Growth Curves are ideal for longitudinal studies. Instead of predicting a person’s score on a variable (e.g., mean comparison among scores at different time points or relationships among variables at different time points), we predict their growth trajectory: what their level on the variable is AND how it is changing.

We will present a conceptual model, show how to apply the Mplus program, and interpret the results.

Once we can estimate growth trajectories, the more interesting issue becomes explaining individual differences in trajectories (why some people go up, down, or stay the same).

We will introduce growth curves for multiple groups, such as comparing women and men.

Time invariant and time variant covariates will be introduced.

Mediation will be introduced.

Section 2: A Simple Growth Curve

2.1 Precautionary guidelines

Estimating a basic growth curve using Mplus is quite easy, but when developing a complex model it is best to start easy and gradually build complexity.

Starting easy should include data screening to evaluate the distributions of the variables, patterns of missing values, and possible outliers.

Even if you have a theoretically specified model that is complex, always start with the simplest model and gradually add the complexity.

Here we will show how structural equation modeling conceptualizes a latent growth curve.

Before showing a figure to represent a growth curve, we examine a small sample of our observations:

The data are from the National Longitudinal Survey of Youth that started in 1997. We use the cohort that was 12 years old in 1997 and examine their trajectory for the BMI. Some may not like using the BMI on this age group, but this is only to illustrate an application of growth curve modeling. The following graph of 10 randomly selected kids was produced by Mplus.


A BMI value of 25 is considered overweight and a BMI of 30 is considered obese (I’m aware of problems with the BMI as a measure of obesity and with its limitations when used for adolescents)

With just 10 observations it is hard to see much of a trend, but it looks like adolescents are getting a higher BMI score as they get older.

The X-axis value of 0 is when the adolescent was 12 years old; the 1 is when the adolescent was 13 years old, etc. We are using seven waves of data (labeled 0 to 6) from the panel study.

2.2 Graphic representation of a growth curve

A growth curve requires us to have a model and we should draw this before writing the Mplus program. Figure 1 shows a model for our simple growth curve:


This figure is much simpler than it first appears. The key variables are the two latent variables labeled the Intercept growth factor and the Slope growth factor.

The Intercept growth factor

a. The intercept represents the initial level and is sometimes called the initial level for this reason. It is the estimated initial level and its value may differ from the actual mean for BMI97 because in this case we are imposing a linear growth model.

b. It may differ from the mean of BMI97
- When covariates are added, especially when a zero value on covariates is rare and covariates are not centered (household income)
- A straight line may over or underestimate any one mean including the initial mean

c. Unless the covariates are centered, it usually makes sense to just call it an intercept rather than the initial level.

d. The intercept is identified by the constant loadings of 1.0 going to each BMI score. Some programs call the intercept the constant, representing the constant effect to which other effects are added.

e. It is possible to shift the intercept by how the waves are coded, e.g., we might make it the last year or the middle year.


The Slope growth factor

a. Is identified by fixing the values of the paths to each BMI variable. In a publication you normally would not show the path to BMI97, since this is fixed at 0.0.

b. We fix the other paths at 1.0, 2.0, 3.0, 4.0, 5.0, and 6.0. Where did we get these values? The first year is the base year or year zero. The BMI was measured each subsequent year so these are scored 1.0 through 6.0.

c. Other values are possible. Suppose the survey was not done in 2000 or 2001 so that we had 5 time points rather than 7. We would use paths of 0.0, 1.0, 2.0, 5.0, and 6.0 for years 1997, 1998, 1999, 2002, and 2003, respectively.

d. It is also possible to fix the first couple years and then allow the subsequent waves to be free.

- This might make sense for a developmental process where the yearly intervals may not reflect the developmental rate. Developmental time may be quite different than chronological time.

- This has the effect of “stretching” or “shrinking” time to the pattern of the data (Curran & Hussong, 2003).

- An advantage of this approach is that it uses fewer degrees of freedom than adding a quadratic slope and can fit better.

- Compared to a quadratic for a curve, this approach doesn’t require a monotonic function.

e. Mplus has a feature that allows each participant to have a different interval, which is important when the time between waves varies.

- Annual surveys: one person has a 12-month difference, one a 10-month difference, and one a 14-month difference.

- This is done with the TSCORES option.
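The alternative time codings in (c) and (e) might be written along the following lines. This is only a sketch under stated assumptions: the first fragment simply drops bmi00 and bmi01 to mimic the skipped survey years, and the second assumes hypothetical variables t97 to t03 that hold each adolescent's own time score at each wave. No such variables are listed for the workshop dataset, so they would have to be created and added to the Names are list.

```mplus
! Fragment 1 (point c): no survey in 2000 or 2001, five unequally spaced waves
Variable: Usevariables are bmi97 bmi98 bmi99 bmi02 bmi03;
Model:    i s | bmi97@0 bmi98@1 bmi99@2 bmi02@5 bmi03@6;

! Fragment 2 (point e), a separate input file: individually varying time scores
! t97-t03 are HYPOTHETICAL variables (e.g., years since the first interview)
Variable: Usevariables are bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03
                           t97 t98 t99 t00 t01 t02 t03;
          Tscores = t97 t98 t99 t00 t01 t02 t03;
Analysis: Type = Random;
Model:    i s | bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03 AT
                t97 t98 t99 t00 t01 t02 t03;
```

In the second fragment the @ time scores disappear because each person's loadings come from the data rather than from fixed values.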

Residual Variance and Random Effects

a. The individual variation around the Intercept and Slope are represented in Figure 1 by the RI and Rs. These are the variance in the intercept and slope around their respective means.

b. We expect substantial variance in both of these as some individuals have a higher or lower starting BMI and some individuals will increase (or decrease) their BMI at a different rate than the average growth rate.

c. In addition to the mean intercept and slope, each individual will have their own intercept and slope. We say the intercept and the slope are random effects since they may vary across individuals.


- They are random in the sense that each individual may have a steeper or flatter slope than the mean slope and

- Each individual may have a higher or lower initial level than the mean intercept.

d. In our sample of 10 individuals shown above, notice one adolescent starts with a BMI around 12 and three adolescents start with a BMI around 30. Some children have a BMI that increases and others do not.

e. The variances, RI and RS, are critical if we are going to explore more complex models with covariates (e.g., gender, psychological problems, race, household income, physical activity) that might explain why some individuals have a steeper or less steep growth rate than the average.

f. A random intercept model would have a free RI and a fixed RS (the slope variance is fixed at zero).

The ei terms represent individual error terms for each year. Some years may move above or below the growth trajectory described by our Intercept and Slope.

Sometimes it might be important to allow error terms to be correlated, especially subsequent pairs such as e97-e98, e98-e99, etc.
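As a hedged illustration of the last two ideas, the fragments below sketch how a random intercept model and correlated adjacent error terms might be requested. They use the bmi variable names and the i and s growth factor names from the program in Section 2.3; they are two separate alternatives, not one model, and are not taken from the workshop files.

```mplus
! Alternative 1: random intercept model (free RI, RS fixed at zero)
Model: i s | bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;
       s@0;          ! slope variance fixed at zero
       i with s@0;   ! with no slope variance, the covariance is fixed at zero too

! Alternative 2: linear growth model with adjacent error terms correlated
Model: i s | bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;
       bmi97 with bmi98;   ! e97-e98
       bmi98 with bmi99;   ! e98-e99
       bmi99 with bmi00;
       bmi00 with bmi01;
       bmi01 with bmi02;
       bmi02 with bmi03;
```

Each added WITH statement uses one degree of freedom, so it is usually worth adding them only where theory or the modification indices suggest they are needed.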

2.3 Mplus program for simple growth curve

Here is the Mplus program for a simple growth model:

Title:    bmi_growth.inp
          Basic growth curve
Data:     File is "C:\Mplus examples\bmi_stata.dat" ;
Analysis: Processors = 2;
Variable: Names are id grlprb_y boyprb_y grlprb_p boyprb_p male race_eth
              bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03
              white black hispanic asian other;
          Missing are all (-9999) ;
          ! Notice usevariables is limited to bmi variables
          Usevariables are bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03 ;
Model:    i s | bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;
Output:   Sampstat Mod(3.84);
Plot:     Type is Plot3;
          Series = bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03(*);

What is new compared to an SEM program?

A Usevariables are: subcommand that includes only the BMI variables, since we are doing a growth curve for these variables.

We drop the Analysis: section if we have a single processor because we are doing a basic growth curve and can use the default options. With multiple processors, this is included to tell Mplus how many processors to utilize.

We have a Model: section because we need to describe the model. Mplus was designed after growth curves were well understood. There is a single line to describe our model:

i s | bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;

a. In this line the “i” and “s” stand for the intercept and slope growth factors, respectively. We could have called these anything, such as intercept and slope or initial and trend. The vertical line, | (sometimes called the “or bar”), tells Mplus that it is about to define an intercept and slope.

b. Defaults

- The intercept is defined by a constant of 1.0 for each bmi variable. The path from the Intercept to each bmij is 1.0. Therefore, we do not need to mention this.

- The slope is defined by fixing the path from the slope to bmi97 at 0, the path to bmi98 at 1, etc. The @ sign is used for “at.” Don’t forget the semi-colon to end the command.

- Mplus assumes that there is a residual variance for both the intercept and slope (RI and RS) and that these covary. Therefore, we do not need to mention this.

- Mplus assumes there is uncorrelated random error, ei, for each observed variable.

- The intercepts for the Y variables (bmi97-bmi03) are fixed at zero by default. We could specify this default by adding a line [bmi97-bmi03@0];. The square brackets refer to intercepts (or means) and the @0 fixes them at zero.

- Means of intercept and slope are free. We could specify this by adding a line [i s]; where the square brackets again refer to means. Naming the growth factors without brackets, i s;, refers to their variances, which are also free by default.


c. To allow e97 and e98 to be correlated, we would need to add a line saying bmi97 with bmi98; .

- This may seem strange because we are not really correlating bmi97 with bmi98, but e97 with e98. Mplus knows this and we do not need to generate a separate set of names for the error terms.
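To make the defaults in (b) concrete, here is a sketch of the same linear model with every default written out explicitly. It should give the same estimates as the one-line Model: command; nothing here is required, and the layout is our own rather than part of the workshop files.

```mplus
Model: i s | bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;
       [bmi97-bmi03@0];   ! intercepts of the observed variables fixed at zero
       [i s];             ! means of the intercept and slope factors are free
       i s;               ! variances of the intercept and slope factors are free
       i with s;          ! intercept and slope covary
       bmi97-bmi03;       ! one residual variance (ei) per wave, uncorrelated
```

Writing the defaults out this way can be useful when a later model needs to change one of them, because the line to edit is already in the program.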

The last additional section in our Mplus program is for selecting what output we want Mplus to provide. There are many optional outputs of the program and we will only illustrate a few of these. The Output: section has the following lines

Output: Sampstat Mod(3.84);
Plot:   Type is Plot3;
        Series = bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03(*);

The first line, Sampstat Mod(3.84) asks for sample statistics and modification indices for parameters we might free, as long as doing so would reduce chi-square by 3.84 (corresponding to the .05 level). We do not bother with parameter estimates that would have less effect than this. The default value is 10.0.

Next comes the Plot: subcommand and we say that we want Type is Plot3; for our output. This gives us the descriptive statistics and graphs for the growth curve.

The last line of the program specifies the series to plot. By entering the variables with an (*) at the end we are setting a path at 0.0 for bmi97, 1.0 for bmi98, etc.

2.4 Annotated Selected Growth Curve Output

The following is selected output with comments:

Mplus VERSION 5.1
MUTHEN & MUTHEN
07/01/2008 2:40 PM

*** WARNING
  Data set contains cases with missing on all variables.
  These cases were not included in the analysis.
  Number of cases with missing on all variables: 3
   1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS

Mplus uses all available data assuming MAR. There were three cases that were dropped because they had no BMI report for any wave.


bmi_growth.inp
Basic growth curve

SUMMARY OF ANALYSIS

Number of groups                                 1
Number of observations                        1768

With listwise deletion we would have an N = 1102

Number of dependent variables                    7
Number of independent variables                  0
Number of continuous latent variables            2

Observed dependent variables
  Continuous
   BMI97  BMI98  BMI99  BMI00  BMI01  BMI02  BMI03

Continuous latent variables
   I  S

The following is a very nice analysis of patterns of missing values.

Estimator                                       ML
Information matrix                        OBSERVED

SUMMARY OF DATA
     Number of missing data patterns            81

COVARIANCE COVERAGE OF DATA
Minimum covariance coverage value   0.100

     PROPORTION OF DATA PRESENT

           Covariance Coverage
              BMI97     BMI98     BMI99     BMI00     BMI01
              ________  ________  ________  ________  ________
 BMI97        0.925
 BMI98        0.847     0.902
 BMI99        0.850     0.856     0.910
 BMI00        0.842     0.846     0.864     0.906
 BMI01        0.839     0.837     0.854     0.859     0.904
 BMI02        0.796     0.794     0.805     0.811     0.817
 BMI03        0.777     0.775     0.788     0.788     0.801

           Covariance Coverage
              BMI02     BMI03
              ________  ________
 BMI02        0.861
 BMI03        0.774     0.840

Check means to see if there is a clear overall trajectory

SAMPLE STATISTICS

     ESTIMATED SAMPLE STATISTICS

           Means
              BMI97     BMI98     BMI99     BMI00     BMI01
              ________  ________  ________  ________  ________
      1       20.572    21.839    22.651    23.305    23.846

           Means
              BMI02     BMI03
              ________  ________
      1       24.390    24.935

           Correlations
              BMI97     BMI98     BMI99     BMI00     BMI01
              ________  ________  ________  ________  ________
 BMI97        1.000
 BMI98        0.764     1.000
 BMI99        0.765     0.850     1.000
 BMI00        0.721     0.812     0.853     1.000
 BMI01        0.709     0.799     0.853     0.856     1.000
 BMI02        0.652     0.720     0.745     0.752     0.813
 BMI03        0.651     0.707     0.737     0.751     0.815

           Correlations
              BMI02     BMI03
              ________  ________
 BMI02        1.000
 BMI03        0.766     1.000

TESTS OF MODEL FIT

Chi-Square Test of Model Fit
          Value                            268.041
          Degrees of Freedom                    23
          P-Value                           0.0000

Chi-Square Test of Model Fit for the Baseline Model
          Value                          11502.912
          Degrees of Freedom                    21
          P-Value                           0.0000

CFI/TLI
          CFI                                0.979
          TLI                                0.981

Loglikelihood
          H0 Value                      -27739.720
          H1 Value                      -27605.699

Information Criteria
          Number of Free Parameters             12
          Akaike (AIC)                   55503.439
          Bayesian (BIC)                 55569.171
          Sample-Size Adjusted BIC       55531.048
            (n* = (n + 2) / 24)

RMSEA (Root Mean Square Error Of Approximation)
          Estimate                           0.078
          90 Percent C.I.                    0.069  0.086
          Probability RMSEA <= .05           0.000

SRMR (Standardized Root Mean Square Residual)
          Value                              0.051

MODEL RESULTS
                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

The intercept and slope are fixed so there is no test for them.

 I        |
    BMI97              1.000      0.000    999.000    999.000
    BMI98              1.000      0.000    999.000    999.000
    BMI99              1.000      0.000    999.000    999.000
    BMI00              1.000      0.000    999.000    999.000
    BMI01              1.000      0.000    999.000    999.000
    BMI02              1.000      0.000    999.000    999.000
    BMI03              1.000      0.000    999.000    999.000

 S        |
    BMI97              0.000      0.000    999.000    999.000
    BMI98              1.000      0.000    999.000    999.000
    BMI99              2.000      0.000    999.000    999.000
    BMI00              3.000      0.000    999.000    999.000
    BMI01              4.000      0.000    999.000    999.000
    BMI02              5.000      0.000    999.000    999.000
    BMI03              6.000      0.000    999.000    999.000

Intercept and slope have significant covariance

 S        WITH
    I                  0.408      0.073      5.559      0.000

 Means
    I                 21.035      0.100    210.352      0.000
    S                  0.701      0.017     40.663      0.000

Growth curve is BMI’ = 21.035 + .701×Year

 Intercepts
    BMI97              0.000      0.000    999.000    999.000
    BMI98              0.000      0.000    999.000    999.000
    BMI99              0.000      0.000    999.000    999.000
    BMI00              0.000      0.000    999.000    999.000
    BMI01              0.000      0.000    999.000    999.000
    BMI02              0.000      0.000    999.000    999.000
    BMI03              0.000      0.000    999.000    999.000

 Variances
    I                 15.051      0.597     25.209      0.000
    S                  0.255      0.018     14.228      0.000

There is a big random intercept and random slope effect. The standard deviation is sqrt(.255) = .50. Putting plus or minus two standard deviations around the slope of .70 shows how big the variance is. The standard deviation for the intercept is sqrt(15.051) = 3.880. BMI is probably skewed positively.

 Residual Variances
    BMI97              5.730      0.268     21.413      0.000
    BMI98              3.276      0.164     19.942      0.000
    BMI99              3.223      0.146     22.009      0.000
    BMI00              4.361      0.185     23.538      0.000
    BMI01              2.845      0.150     19.005      0.000
    BMI02              9.380      0.397     23.622      0.000
    BMI03              8.589      0.422     20.345      0.000

QUALITY OF NUMERICAL RESULTS

Condition Number for the Information Matrix 0.656E-02 (ratio of smallest to largest eigenvalue)

MODEL MODIFICATION INDICES

Minimum M.I. value for printing the modification index     3.840

We don’t want to change the intercept loadings of 1.0. We might think about a nonlinear growth. We might think about correlating adjacent error terms. The suggested correlation between E1 and E7 indicates a straight line is missing both ends, hence a curve of some kind? Muthen suggests to try not to mess with the intercepts.

                            M.I.     E.P.C.  Std E.P.C.  StdYX E.P.C.

BY Statements

I     BY BMI97           112.472    -0.038     -0.147       -0.032
I     BY BMI98             6.440     0.007      0.027        0.006
I     BY BMI99            33.234     0.014      0.054        0.012
I     BY BMI00            13.026     0.010      0.037        0.008
I     BY BMI02             4.015    -0.008     -0.032       -0.005
I     BY BMI03            28.212    -0.023     -0.091       -0.015
S     BY BMI97            70.828    -0.825     -0.417       -0.091
S     BY BMI99            18.208     0.276      0.139        0.030
S     BY BMI00             8.062     0.204      0.103        0.021
S     BY BMI03            38.314    -0.755     -0.382       -0.062

WITH Statements

BMI99 WITH BMI98          12.747     0.449      0.449        0.138
BMI00 WITH BMI97           9.699    -0.511     -0.511       -0.102
BMI00 WITH BMI99          26.084     0.641      0.641        0.171
BMI01 WITH BMI97           6.914    -0.388     -0.388       -0.096
BMI01 WITH BMI98          11.566    -0.403     -0.403       -0.132
BMI01 WITH BMI00           5.456     0.310      0.310        0.088
BMI02 WITH BMI97           8.645     0.715      0.715        0.098
BMI02 WITH BMI99           9.066    -0.544     -0.544       -0.099
BMI02 WITH BMI00           9.560    -0.633     -0.633       -0.099
BMI03 WITH BMI97          37.342     1.564      1.564        0.223
BMI03 WITH BMI99          22.526    -0.874     -0.874       -0.166
BMI03 WITH BMI00          11.717    -0.724     -0.724       -0.118
BMI03 WITH BMI02          11.053     1.083      1.083        0.121

Means/Intercepts/Thresholds

[ BMI97 ]                 97.476    -0.754     -0.754       -0.165
[ BMI98 ]                  7.230     0.155      0.155        0.035
[ BMI99 ]                 25.098     0.257      0.257        0.056
[ BMI00 ]                 10.542     0.185      0.185        0.038
[ BMI02 ]                  4.646    -0.189     -0.189       -0.032
[ BMI03 ]                 22.536    -0.448     -0.448       -0.073

There are a number of plots available. These are not bad, but Stata or some other package, even Excel, could do nicer graphs.

PLOT INFORMATION

The following plots are available:
  Histograms (sample values, estimated factor scores, estimated values)
  Scatterplots (sample values, estimated factor scores, estimated values)
  Sample means
  Estimated means
  Sample and estimated means
  Observed individual values
  Estimated individual values
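Before turning to the plots, it can help to work the fitted line through by hand. The block below is our own arithmetic from the rounded estimates printed in MODEL RESULTS (mean intercept 21.035, mean slope 0.701, slope variance 0.255) set beside the estimated sample means; because the inputs are rounded, the values Mplus plots may differ slightly in the third decimal.

```latex
\widehat{\mathrm{BMI}}_t = 21.035 + 0.701\,t, \qquad t = 0, 1, \dots, 6

\begin{array}{c|ccccccc}
t \ (\text{year})      & 0\,(1997) & 1 & 2 & 3 & 4 & 5 & 6\,(2003) \\ \hline
\text{model-implied}   & 21.035 & 21.736 & 22.437 & 23.138 & 23.839 & 24.540 & 25.241 \\
\text{sample mean}     & 20.572 & 21.839 & 22.651 & 23.305 & 23.846 & 24.390 & 24.935
\end{array}

% Spread of individual slopes: plus or minus two standard deviations
0.701 \pm 2\sqrt{0.255} \approx 0.701 \pm 1.010 \;\Rightarrow\; (-0.309,\ 1.711)
```

The straight line sits above the sample mean at both ends (1997 and 2003) and below it in 1998 to 2000, and the range of individual slopes runs from slightly negative to about 1.7 BMI points per year.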

2.5 Here are Some Available Plots

It is often useful to show the actual means for a small random sample of participants. These are Sample Means.

Click on Graphs and then select Observed individual values.

This gives you a menu where you can make some selections. I used the clock to seed a random generation of observations.


Here I selected Random Order and for 20 cases. This results in the following graph:

This shows one person who started at an obese BMI = 30 and then dropped down. However, most people increased gradually.

Next, let’s look at a plot of the actual means and the estimated means using our linear growth model. Click on Graphs, then select View graphs, and then Sample and estimated means. Demonstrate how to edit the graph.

Notice that there is a clear growth trend in BMI. A BMI of 15-20 is considered healthy and a BMI of 25 is considered overweight. Notice what happens to American youth between the age of 12 and the age of 18.

Section 3: Quadratic Growth Curve

This graph is useful for seeing if there is a nonlinear trend. Changing the scale of the Y-axis can clarify this. It is simple to add a quadratic term if the curve is departing from linearity.

Looking at the graph it may seem that the linear trend works very well, but our RMSEA was a bit big.

The estimated initial BMI is higher than the observed mean.

The estimated BMI at 2003 is also higher than the observed mean.

A quadratic might pick this up by having a curve that drops slightly to pick up the BMI97 mean and the BMI2003 mean.

Estimation of three terms (Intercept, Linear trend, Quadratic trend) requires at least four waves of data, but more waves are highly desirable for a good test of the quadratic term.

3.1 Graphic representation of quadratic growth curve

The conceptual model in Figure 1 will be unchanged except a third latent variable is added.


We will have the Intercept, Slope, now called linear trend, and the new latent variable called the Quadratic trend.

Like the first two, the Quadratic trend will have a residual variance (R3) that will freely correlate with R1 and R2.

The paths from the quadratic trend to the individual BMI variables will be the square of the path from the Linear trend to the BMI variables. Hence

a. The values for the linear trend will remain 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, and 6.0.

b. For the quadratic these values will be 0.0, 1.0, 4.0, 9.0, 16.0, 25.0, and 36.0.

You really appreciate the defaults in Mplus when you see what we need to change in the Mplus program when we add a quadratic slope. Here is the only change we need to make:


3.2 Mplus program & output for quadratic growth curve

Model: i s q| bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;

Mplus will know that the quadratic, q (we could use any name) will have values that are the square of the values for the slope, s.

Title:    bmi_guadratic.inp
          Quadratic growth curve
Data:     File is "C:\Mplus examples\bmi_stata.dat" ;
Variable: Names are id grlprb_y boyprb_y grlprb_p boyprb_p male race_eth
              bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03
              white black hispanic asian other;
          Missing are all (-9999) ;
          ! usevariables is limited to bmi variables
          Usevariables are bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03 ;
Model:    i s q | bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;
Output:   Sampstat Mod(3.84);
Plot:     Type is Plot3;
          Series = bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03(*);

Here is selected output:

TESTS OF MODEL FIT

We had 23 degrees of freedom with the linear growth curve and a chi-square of 268.041. Now we have 19 degrees of freedom and a chi-square of 73.121. Where did we lose four degrees of freedom?

- Mean for the quadratic
- Variance of the quadratic
- Covariance of quadratic residual with intercept residual
- Covariance of quadratic residual with slope residual

Did we improve our fit? 268.041 - 73.121 = 194.92 with 4 degrees of freedom, p < .001

Does our model fit? Chi-square (19) = 73.121, p < .001, but CFI = .995 and RMSEA = .040
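The comparison of the linear and quadratic models can be laid out as a chi-square difference test. The block below is a sketch that uses only the chi-square and degrees-of-freedom values reported for the two models; the critical values are the standard tabled values for 4 degrees of freedom.

```latex
\Delta\chi^2 = \chi^2_{\text{linear}} - \chi^2_{\text{quadratic}}
             = 268.041 - 73.121 = 194.920

\Delta df = 23 - 19 = 4

% Tabled critical values for 4 degrees of freedom
\chi^2_{.05}(4) = 9.488, \qquad \chi^2_{.001}(4) = 18.467

194.920 > 18.467 \;\Rightarrow\; p < .001
```

The difference is far beyond the .001 critical value, so the four added parameters improve the fit by much more than chance alone would produce.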

Chi-Square Test of Model Fit
          Value                             73.121
          Degrees of Freedom                    19
          P-Value                           0.0000

Chi-Square Test of Model Fit for the Baseline Model
          Value                          11502.912
          Degrees of Freedom                    21
          P-Value                           0.0000

CFI/TLI
          CFI                                0.995
          TLI                                0.995

Loglikelihood
          H0 Value                      -27642.260
          H1 Value                      -27605.699

Information Criteria
          Number of Free Parameters             16
          Akaike (AIC)                   55316.520
          Bayesian (BIC)                 55404.161
          Sample-Size Adjusted BIC       55353.330
            (n* = (n + 2) / 24)

RMSEA (Root Mean Square Error Of Approximation)
          Estimate                           0.040
          90 Percent C.I.                    0.031  0.050
          Probability RMSEA <= .05           0.949

SRMR (Standardized Root Mean Square Residual)
          Value                              0.026

MODEL RESULTS

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

 S        WITH
    I                  0.550      0.226      2.441      0.015

 Q        WITH
    I                 -0.030      0.036     -0.854      0.393
    S                 -0.159      0.022     -7.236      0.000

 Means
    I                 20.713      0.101    204.728      0.000
    S                  1.060      0.044     23.834      0.000
    Q                 -0.063      0.007     -8.585      0.000

 Variances
    I                 14.273      0.638     22.382      0.000
    S                  1.141      0.139      8.184      0.000
    Q                  0.029      0.004      7.730      0.000

 Residual Variances
    BMI97              4.635      0.306     15.132      0.000
    BMI98              3.340      0.162     20.643      0.000
    BMI99              2.852      0.143     19.954      0.000
    BMI00              3.994      0.182     21.926      0.000
    BMI01              2.880      0.154     18.762      0.000
    BMI02              9.343      0.394     23.690      0.000
    BMI03              5.677      0.507     11.192      0.000

MODEL MODIFICATION INDICES

Minimum M.I. value for printing the modification index     3.840

                            M.I.     E.P.C.  Std E.P.C.  StdYX E.P.C.

BY Statements

I     BY BMI97            24.292    -0.024     -0.090       -0.021
I     BY BMI98             9.860     0.008      0.032        0.007
I     BY BMI99             5.419     0.006      0.022        0.005
I     BY BMI01            12.777    -0.009     -0.035       -0.007
I     BY BMI03            14.857     0.024      0.092        0.016
S     BY BMI97            18.253    -0.363     -0.388       -0.089
S     BY BMI98             7.381     0.126      0.135        0.031
S     BY BMI01             9.168    -0.137     -0.147       -0.029
S     BY BMI03            10.308     0.349      0.373        0.063
Q     BY BMI97            12.444     3.442      0.589        0.136
Q     BY BMI99             4.868    -1.114     -0.191       -0.041
Q     BY BMI01            11.767     1.725      0.295        0.058
Q     BY BMI03            13.934    -5.610     -0.961       -0.163

ON/BY Statements

Q     ON I  /
I     BY Q               999.000     0.000      0.000        0.000

WITH Statements

BMI98 WITH BMI97          11.493    -1.044     -1.044       -0.265
BMI99 WITH BMI98           8.019     0.361      0.361        0.117
BMI01 WITH BMI98           8.978    -0.354     -0.354       -0.114
BMI02 WITH BMI01          12.482     0.694      0.694        0.134
BMI03 WITH BMI02           5.261    -1.040     -1.040       -0.143

Means/Intercepts/Thresholds

[ BMI97 ]                 23.635    -0.492     -0.492       -0.113
[ BMI98 ]                 11.403     0.191      0.191        0.043
[ BMI01 ]                  9.093    -0.166     -0.166       -0.032
[ BMI03 ]                 13.777     0.495      0.495        0.084

PLOT INFORMATION

The following plots are available:
  Histograms (sample values, estimated factor scores, estimated values)
  Scatterplots (sample values, estimated factor scores, estimated values)
  Sample means
  Estimated means
  Sample and estimated means
  Observed individual values
  Estimated individual values
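The quadratic output can be checked the same way as the linear output, by computing the mean trajectory from the estimated means of I, S, and Q. The block below is our own arithmetic from the rounded estimates (20.713, 1.060, and -0.063), so the third decimal is approximate.

```latex
\widehat{\mathrm{BMI}}_t = 20.713 + 1.060\,t - 0.063\,t^2, \qquad t = 0, 1, \dots, 6

\begin{array}{c|ccccccc}
t \ (\text{year})      & 0\,(1997) & 1 & 2 & 3 & 4 & 5 & 6\,(2003) \\ \hline
\text{model-implied}   & 20.713 & 21.710 & 22.581 & 23.326 & 23.945 & 24.438 & 24.805 \\
\text{sample mean}     & 20.572 & 21.839 & 22.651 & 23.305 & 23.846 & 24.390 & 24.935
\end{array}
```

Every implied mean is now within about 0.15 BMI points of the sample mean, and the negative quadratic term shows the yearly gain shrinking as the adolescents get older.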


3.3 Plots for quadratic growth curve

The fit is so good because the estimated means and observed means are so close. However, there is still significant variance (random effects for both the intercept and the slope) among individual adolescents that still needs to be explained. Here are 20 estimated individual growth curves.

a. Notice that each of these is a curve, but they start at different initial levels and have different trajectories.

b. Next, we want to use covariates to explain these differences in the initial levels and growth trajectories.


Section 4: How Many Waves Should We Have?

4.1 Linear model--3 minimum, 4 much better

In this example we have 7 waves of data and this will give us lots of degrees of freedom. What is the minimum?

Consider degrees of freedom for 3 waves of data.

Degrees of freedom are the difference between the number of parameters estimated for an H1: model, which is essentially unrestricted, and the number of parameters estimated in your H0: model.

We are estimating a number of parameters—How many?

H1: model (unrestricted) for 3 waves has

3 means: My1, My2, My3
3 variances: Var(Y1), Var(Y2), Var(Y3)
3 covariances: Cov(Y1,Y2), Cov(Y1,Y3), Cov(Y2,Y3)

9 known statistics

H0: model (simple growth curve) has

The figure shows the following parameters:

a. 1 Variance for Intercept
b. 1 Variance for Slope
c. 1 Covariance of Slope and Intercept
d. 1 Mean of Intercept
e. 1 Mean of Slope
f, g, h. 3 Error Variances

We need to estimate 8 parameters.

Therefore, we have 9 – 8 = 1 degree of freedom.

We could not fit a quadratic. It would use parameter estimates for its mean, variance, and covariance with both the intercept and slope. That is four additional parameters, and we have only one degree of freedom available.

We can only free one parameter such as a covariance of the error terms or a loading for a wave on the slope.


What about 4 waves?

H1: model (unrestricted) for 4 waves has

4 means: My1, My2, My3, My4
4 variances: Var(Y1), Var(Y2), Var(Y3), Var(Y4)
6 covariances: Cov(Y1,Y2), Cov(Y1,Y3), Cov(Y1,Y4), Cov(Y2,Y3), Cov(Y2,Y4), Cov(Y3,Y4)

total is 14 known statistics

We are estimating the same 5 growth factor parameters plus 4 error variances (one per wave), which is 9 parameters, so we have 14 - 9 = 5 degrees of freedom. Adding a 4th wave provides a much better test of a linear model.
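The counting above generalizes to any number of waves. The formula below is a sketch that assumes the default linear model used in this workshop: two growth factor means, two variances, one covariance, and one free error variance per wave, with T standing for the number of waves (our notation).

```latex
% Known statistics: T means plus T(T+1)/2 variances and covariances
\text{knowns} = T + \frac{T(T+1)}{2} = \frac{T(T+3)}{2}

% Estimated parameters in the linear growth model
\text{parameters} = \underbrace{2}_{\text{means}} + \underbrace{2}_{\text{variances}}
                  + \underbrace{1}_{\text{covariance}} + \underbrace{T}_{\text{error variances}} = T + 5

df_{\text{linear}} = \frac{T(T+3)}{2} - (T + 5)

T = 3:\; 9 - 8 = 1 \qquad T = 4:\; 14 - 9 = 5 \qquad T = 7:\; 35 - 12 = 23
```

The T = 7 result matches the 12 free parameters and 23 degrees of freedom reported in the Section 2.4 output.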

Rule—Publish it with three waves, but always try to get 4 or more waves of data.

4.2 For quadratic—4 minimum, 5 much better

We can follow the same procedure and see that we need to have 4 waves for a quadratic.

We have a much better test (more degrees of freedom) if we have 5 waves for a quadratic.
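Following that procedure for the quadratic model gives the count below. It is a sketch under the same assumptions as the default Mplus quadratic model in Section 3: three growth factor means, three variances, three covariances, and one free error variance per wave, with T again denoting the number of waves.

```latex
\text{parameters} = \underbrace{3}_{\text{means}} + \underbrace{3}_{\text{variances}}
                  + \underbrace{3}_{\text{covariances}} + \underbrace{T}_{\text{error variances}} = T + 9

df_{\text{quadratic}} = \frac{T(T+3)}{2} - (T + 9)

T = 3:\; 9 - 12 = -3 \ (\text{not identified}) \qquad
T = 4:\; 14 - 13 = 1 \qquad
T = 5:\; 20 - 14 = 6 \qquad
T = 7:\; 35 - 16 = 19
```

The T = 7 result agrees with the 16 free parameters and 19 degrees of freedom in the Section 3.2 output.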

Section 5: Alternative to Use of a Quadratic Slope

An alternative to adding a quadratic slope is to allow some of the time loadings for the slope to be free.

We have used loadings of 0, 1, 2, 3, 4, 5, and 6 for the linear slope and 0, 1, 4, 9, 16, 25, and 36 for the quadratic slope. Alternatively:

We could allow all but two of the loadings to be free. With seven waves we might use loadings of 0, 1, *, *, *, *, *.

It is necessary to have the 0 and 1 fixed but the 1 does not have to be second; we could use 0, *, *, *, *, *, 1.

You may ask how you could justify allowing some of the time loadings to be free if there was a one month or one year difference between waves of data. The answer is that developmental time may be different than chronological time.

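Allowing these loadings to be free can have an advantage over the quadratic: when there are only a few waves it uses fewer degrees of freedom (one per free loading, versus four for the mean, variance, and two covariances of a quadratic slope), and it still allows for growth spurts.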

This model is not nested under a quadratic, but you could think of a linear growth model with fixed values for each year (0, 1, 2, 3, 4, 5, 6) being nested within the free model that uses 0, 1, *, *, *, *, *. If the free model fits much better than the fixed linear model, you might use this instead of the quadratic model.

This approach does not impose a specific form on the relationship. It is a free form that can connect the means in whatever complexity they have.
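To see what "connecting the means" implies, the sketch below computes the time scores that would reproduce a set of wave means exactly when the first two loadings are fixed at 0 and 1. It uses the estimated sample means for girls reported in Section 7.1. This is descriptive arithmetic only, not the maximum likelihood estimates Mplus would report, and the function name is hypothetical.

```python
def implied_time_scores(means):
    """Time scores that reproduce the wave means when the first two
    loadings are fixed at 0 and 1:
        score_t = (mean_t - mean_first) / (mean_second - mean_first)
    """
    baseline = means[0]
    unit_change = means[1] - means[0]
    return [(m - baseline) / unit_change for m in means]


# Estimated sample means for girls, bmi97 ... bmi03 (Section 7.1 output)
girls_means = [20.432, 21.840, 22.375, 22.916, 23.443, 24.295, 24.727]

scores = implied_time_scores(girls_means)
for year, score in zip(range(1997, 2004), scores):
    print(year, round(score, 2))
```

Scores well below 2, 3, ..., 6 in the later waves would suggest that growth after the first year is slower than a straight line implies, which is the kind of pattern free loadings can absorb.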



Section 6: Working with Missing Values

6.1 Two approaches used by Mplus

Mplus has two ways of working with missing values.

The simplest is to use maximum likelihood estimation with missing values (ML).
   o This uses all available data and is the default since version 5.0.
   o For example, some adolescents were interviewed all six years but others may have skipped one, two, or even more years.
   o We use all available information with this approach.

The second approach is to utilize multiple imputation.
   o Multiple imputation should not be confused with the single imputation available from earlier versions of SPSS, which gives incorrect standard errors.
   o Multiple imputation involves imputing multiple datasets (usually 5-20),
   o estimating the model for each of these datasets, and
   o then pooling the estimates and standard errors.

When the standard errors are pooled this way, they incorporate the variability across the 5-20 solutions and thereby produce unbiased estimates of standard errors. Multiple imputation can be done with:

   Norm, a freeware program that works for normally distributed, continuous variables and is often used even on dichotomized variables.

   ice, a program written by a Stata user that implements the S-Plus/R program called MICE and has advantages over Norm. It does the imputation by using different estimation models for outcome variables that are continuous, counts, or categorical. See Royston (2005).

   SAS, which has similar capabilities.

Mplus can read these multiple datasets, estimate the model for each dataset, and pool the estimates and their standard errors.

We will not illustrate the multiple imputation approach because that involves working with other programs to impute the datasets. However, the Mplus User’s Guide discusses how you specify the datasets in the Data: section.
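The pooling step can be sketched with the standard combining rules for multiple imputation (Rubin's rules): the pooled estimate is the average across datasets, and the pooled variance adds the average squared standard error to the between-dataset variance inflated by (1 + 1/m). The numbers below are hypothetical and the function name is made up for illustration; this is not the Mplus implementation.

```python
import math


def pool_rubin(estimates, standard_errors):
    """Pool one parameter across m imputed datasets with Rubin's rules."""
    m = len(estimates)
    pooled = sum(estimates) / m
    # Average sampling variance within each imputed dataset
    within = sum(se ** 2 for se in standard_errors) / m
    # Variance of the estimates across the imputed datasets
    between = sum((est - pooled) ** 2 for est in estimates) / (m - 1)
    total_variance = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total_variance)


# Hypothetical slope estimates and standard errors from m = 5 imputed datasets
slopes = [0.70, 0.68, 0.73, 0.69, 0.71]
slope_ses = [0.025, 0.024, 0.026, 0.025, 0.025]

pooled_slope, pooled_se = pool_rubin(slopes, slope_ses)
print(round(pooled_slope, 3), round(pooled_se, 3))
```

The pooled standard error is larger than the average of the five individual standard errors because it also reflects how much the estimates move from one imputed dataset to the next.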


6.2 Multiple cohort extension

Major datasets often have multiple cohorts. NLSY97 has youth who were 12-18 in 1997. Seven years later, they are 19-25.

It is quite likely that many growth processes that involve going from the age of 12 to the age of 19 are different than going from 19-25. For example, involvement in minor crimes (petty theft, etc.) may increase from 12 to 19, but then decrease from there to 25.

Here is what we might have for our NLSY97 data (entries in the tables are scores; for example, person 1, born in 1985, had a score of 3 on the outcome variable in 1997 at the age of 12).

Score by survey year for a single case from each cohort

                                        Survey Year
Individual   Birth Cohort   1997   1998   1999   2000   2001   2002   2003
    1            1985         3      4      5      6      7      7      8
    2            1985         2      4      3      5      6      7      7
    3            1984         4      5      6      7      6      6      5
    4            1982         6      7      5      4      3      2      2
    5            1982         5      5      6      4      2      2      1

We can rearrange this data.

Data for first 5 cases

Case   Cohort   12   13   14   15   16   17   18   19   20   21
  1     1985     3    4    5    6    7    7    8    *    *    *
  2     1985     2    4    3    5    6    7    7    *    *    *
  3     1984     *    4    5    6    7    6    6    5    *    *
  4     1982     *    *    *    6    7    5    4    3    2    2
  5     1982     *    *    *    5    5    6    4    2    2    1

In this table the top row is the age at which the data was collected. To capture everybody we would need to extend the table to age 25 because the youth who were 18 in 1997 are 25 seven years later.

This table would have massive amounts of missing data, but the missingness would not be related to other variables. It would be missing completely at random (MCAR).
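The rearrangement from survey year to age can be sketched as follows. The five cases are the ones in the tables above; the function name and the use of None for a missing score are choices made for this illustration.

```python
def scores_by_age(birth_year, first_survey_year, scores, ages):
    """Re-index one person's scores from survey year to age.

    Returns one value per entry in `ages`; None marks an age at which
    the person was not observed.
    """
    by_age = {}
    for offset, score in enumerate(scores):
        age = first_survey_year + offset - birth_year
        by_age[age] = score
    return [by_age.get(age) for age in ages]


# case: (birth cohort, scores for survey years 1997-2003)
cases = {
    1: (1985, [3, 4, 5, 6, 7, 7, 8]),
    2: (1985, [2, 4, 3, 5, 6, 7, 7]),
    3: (1984, [4, 5, 6, 7, 6, 6, 5]),
    4: (1982, [6, 7, 5, 4, 3, 2, 2]),
    5: (1982, [5, 5, 6, 4, 2, 2, 1]),
}

ages = list(range(12, 22))
for case, (cohort, scores) in cases.items():
    row = scores_by_age(cohort, 1997, scores, ages)
    print(case, cohort, ["*" if value is None else value for value in row])
```

Extending `ages` to cover the oldest cohort gives the full age-by-person layout, with every person missing the ages at which they were not surveyed.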

We could develop a growth curve that covered the full range from age 12 to age 25. We would have 14 waves of data even though each participant was only measured 7 times. Each participant would have data for a maximum of 7 of the years and have missing values for a minimum of 7 years.

We would want to estimate a growth model with a quadratic term and expect the linear slope to be positive (growth from 12-18) and the quadratic term to be negative (decline from 18-25).

Mplus has a special Analysis: type called MCOHORT. There is an example on the Mplus web page and we will not cover it here. This is an extraordinary way to deal with missing values.

Here is an example from data Muthén analyzed. He had 7 waves of data on people who were 18-24 at the first wave. No data was collected for 6 years. He has a growth curve from 18 to 37.

When I copied this image, a couple of waves that were grayed out do not show very well here.

Section 7: Multiple group growth curves

Multiple group analysis using SEM is extremely flexible—some would say it is too flexible because there are so many possibilities.

We use gender for our grouping variable because we are interested in the trend in BMI for girls compared to boys.

We think adolescent girls are more concerned about their weight and are therefore more likely to have a lower BMI than boys and to have a flatter trajectory.


There are several ways of comparing a model across multiple groups.

One approach is to see if the same model fits each group, allowing all of the estimated parameters to be different.

Here we are saying that a linear growth model fits the data for both boys and girls, but we are not constraining girls and boys to have the same values on any of the parameters. They may differ on the

   intercept mean
   slope mean
   intercept variance
   slope variance
   covariance of intercept and slope
   residual error variances
   any covariances of the residual errors that may be specified.

We can then put increasing invariance constraints on the model.
   a. At a minimum, we want to test whether the two groups have a different intercept (level) and slope.
   b. If this constraint is acceptable we can add additional constraints on the variances, covariances, and error terms.

7.1 Program and output without constraints

First, we will estimate the model simultaneously for girls and boys with no constraints on the parameters. Here is the program with new commands highlighted:

Title: bmi_growth_gender.inp
Data: File is bmi_stata.dat ;
Variable:
    Names are id grlprb_y boyprb_y grlprb_p boyprb_p male race_eth
        bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03
        white black hispanic asian other;
    Missing are all (-9999) ;
    ! usevariables keeps bmi variables and gender
    Usevariables are male bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03 ;
    Grouping is male (0=female 1=male);
Model:
    i s | bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;
Output:
    Sampstat Mod(3.84) ;
Plot:
    Type is Plot3;
    Series = bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03(*);

The only changes we need to make are the addition of male to the Usevariables list and the new Grouping subcommand (bold and underlined in the original handout).

   We have a binary variable, male, that is coded 0 for females and 1 for males.
   We add male to the list of variables we are using.
   We add a subcommand to the Variable: section that says we have a grouping variable, names it, and defines what the values are so the output will be labeled nicely.
   The command Grouping is male (0=female 1=male); is going to give us a separate set of estimates for the parameters for girls (labeled female) and boys (labeled male).
   The estimation does both groups simultaneously.

Here is selected, annotated output:

SUMMARY OF ANALYSIS

Number of groups                                    2
Number of observations
   Group FEMALE                                   859
   Group MALE                                     909

The following shows that we have the same variables in the model

Number of dependent variables                       7
Number of independent variables                     0
Number of continuous latent variables               2

Observed dependent variables
  Continuous
   BMI97   BMI98   BMI99   BMI00   BMI01   BMI02   BMI03

Continuous latent variables
   I   S

Variables with special functions
  Grouping variable     MALE

SAMPLE STATISTICS

ESTIMATED SAMPLE STATISTICS FOR FEMALE

     Means
        BMI97      BMI98      BMI99      BMI00      BMI01
        ________   ________   ________   ________   ________
  1     20.432     21.840     22.375     22.916     23.443

     Means
        BMI02      BMI03
        ________   ________
  1     24.295     24.727

ESTIMATED SAMPLE STATISTICS FOR MALE

     Means
        BMI97      BMI98      BMI99      BMI00      BMI01
        ________   ________   ________   ________   ________
  1     20.698     21.848     22.896     23.665     24.220

     Means
        BMI02      BMI03
        ________   ________
  1     24.467     25.111

TESTS OF MODEL FIT

Chi-Square Test of Model Fit
     Value                              411.966
     Degrees of Freedom                      46   (was 23)
     P-Value                             0.0000

Chi-Square Contributions From Each Group
     FEMALE                             150.775
     MALE                               261.191

Chi-Square Test of Model Fit for the Baseline Model
     Value                            11735.530
     Degrees of Freedom                      42
     P-Value                             0.0000

CFI/TLI
     CFI                                  0.969
     TLI                                  0.971

Loglikelihood
     H0 Value                        -27639.607
     H1 Value                        -27433.624

Information Criteria
     Number of Free Parameters               24
     Akaike (AIC)                     55327.213
     Bayesian (BIC)                   55458.676
     Sample-Size Adjusted BIC         55382.430
       (n* = (n + 2) / 24)

RMSEA (Root Mean Square Error Of Approximation)
     Estimate                             0.095
     90 Percent C.I.                      0.087  0.103

SRMR (Standardized Root Mean Square Residual)
     Value                                0.072

MODEL RESULTS
                                                     Two-Tailed
                    Estimate      S.E.   Est./S.E.     P-Value
Group FEMALE
 S        WITH
    I                  0.522     0.103       5.050       0.000
 Means
    I                 20.881     0.143     145.640       0.000
    S                  0.663     0.025      27.015       0.000
 Variances
    I                 15.141     0.859      17.626       0.000
    S                  0.264     0.026      10.221       0.000
 Residual Variances
    BMI97              4.662     0.334      13.980       0.000
    BMI98              3.368     0.242      13.940       0.000
    BMI99              2.753     0.190      14.503       0.000
    BMI00              5.154     0.308      16.718       0.000
    BMI01              3.084     0.226      13.649       0.000
    BMI02             13.344     0.769      17.360       0.000
    BMI03              6.105     0.517      11.812       0.000

Group MALE
 S        WITH
    I                  0.278     0.102       2.719       0.007
 Means
    I                 21.180     0.139     152.166       0.000
    S                  0.732     0.024      30.661       0.000
 Variances
    I                 14.911     0.824      18.094       0.000
    S                  0.254     0.025      10.292       0.000
 Residual Variances
    BMI97              6.693     0.417      16.066       0.000
    BMI98              3.237     0.227      14.279       0.000
    BMI99              3.671     0.223      16.487       0.000
    BMI00              3.730     0.224      16.656       0.000
    BMI01              2.489     0.185      13.434       0.000
    BMI02              5.416     0.357      15.190       0.000
    BMI03             10.857     0.676      16.063       0.000

Here is the graph of the two growth curves. It appears that the girls have a lower initial level and a flatter rate of growth of BMI.


7.2 Comparing intercept and slope

We should not rely on our visual inspection, but should explicitly test whether the girls and boys have a significant difference in their intercept and their slope. We can re-estimate the model with the intercept and slope invariant (or do it twice so we could have separate tests.) To do this we make the following modifications to the model:

Notice that we added two lines to the Model: section:

Title: bmi_growth_gender_equal.inp
Data: File is bmi_stata.dat ;
Variable:
    Names are id grlprb_y boyprb_y grlprb_p boyprb_p male race_eth
        bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03
        white black hispanic asian other;
    Missing are all (-9999) ;
    Usevariables are male bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03 ;
    Grouping is male (0=female 1=male);
Model:
    i s | bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;
Model:   ! models group with lowest score, 0, female
    [i] (1);
    [s] (2);
Model male:
    [i] (1);   ! this makes intercept be the same
    [s] (2);
Output:
    Sampstat Mod(3.84) ;
Plot:
    Type is Plot3;
    Series = bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03(*);

We kept the Grouping subcommand and the grouping variable, male.

We added two Model: subcommands under the Model: command. The first one refers to the first group. Since girls were coded 0 and boys were coded 1, the first group is girls.

We put the names of the parameters in square brackets, [i] and [s]. If we had called these initial and trend we would have typed [initial] and [trend].

We put an arbitrary number in parentheses after the parameter name. Thus, we put (1) after [i].

In the second subcommand, Model male:, we put the names of the parameters followed by the same numbers as they had in the first group. Thus, the intercept gets the number 1 for girls and also gets the number 1 for boys. This tells Mplus these must be held equal. They are still optimized, but with the constraint that they are equal.

If we had typed [i] (2) under Model male:, what would happen? We would have constrained the boys' intercept to be equal to the girls' slope, which is not something we would want to do.

If we had omitted the [s] (2) under Model male:, what would happen? We would have constrained both solutions to have the same intercept, [i] (1), but not constrained them to have the same slopes.

The first Model: command is understood to be the group coded as zero on the male variable.

These changes force the intercept to be equal in both groups because they are both assigned parameter (1) and the slopes to be equal because they are both assigned parameter (2).

Any parameters with a (1) after them are equal in both groups, as are any parameters with a (2) after them in both groups.

Notice that we have square brackets [ ] around the names of the intercept and slope.

When we run the revised program we obtain a chi-square that has two extra degrees of freedom because of the two constraints.


TESTS OF MODEL FIT

Chi-Square Test of Model Fit
     Value                              418.884
     Degrees of Freedom                      48
     P-Value                             0.0000

We had a chi-square(46) = 411.966 without these constraints. The difference has a chi-square of 6.918 with 2 degrees of freedom. Using Stata, the significance is chi-square(2) = 6.918, p < .05:

     display 1-chi2(2,6.918)
     .03146121

Chi-Square Contributions From Each Group

The model fits females much better than it fits males:

     FEMALE                             154.530
     MALE                               264.353

Chi-Square Test of Model Fit for the Baseline Model
     Value                            11735.530
     Degrees of Freedom                      42
     P-Value                             0.0000

CFI/TLI
     CFI                                  0.968
     TLI                                  0.972

Loglikelihood
     H0 Value                        -27643.065
     H1 Value                        -27433.624

Information Criteria
     Number of Free Parameters               22
     Akaike (AIC)                     55330.131
     Bayesian (BIC)                   55450.638
     Sample-Size Adjusted BIC         55380.746
       (n* = (n + 2) / 24)

RMSEA (Root Mean Square Error Of Approximation)
     Estimate                             0.093
     90 Percent C.I.                      0.085  0.102

SRMR (Standardized Root Mean Square Residual)
     Value                                0.079

MODEL RESULTS
                                                     Two-Tailed
                    Estimate      S.E.   Est./S.E.     P-Value
Group FEMALE
 S        WITH
    I                  0.528     0.104       5.097       0.000
 Means
    I                 21.046     0.100     210.324       0.000
    S                  0.700     0.017      40.814       0.000

Group MALE
 S        WITH
    I                  0.281     0.102       2.742       0.006
 Means
    I                 21.046     0.100     210.324       0.000
    S                  0.700     0.017      40.814       0.000

We should also look at the variances and covariances.

Although we can say there is a statistically significant difference (p = .03) between the level and trend for girls and boys, we need to be cautious because this difference of chi-square has the same problem with a large sample size that the original chi-squares have.
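The chi-square difference test can also be reproduced without Stata. The sketch below uses the closed-form upper-tail probability of a chi-square variate, which is valid only for an even number of degrees of freedom (for 2 degrees of freedom it reduces to exp(-x/2)). The function name is hypothetical; the chi-square values and degrees of freedom are the ones reported for the two models.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x), for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


constrained_chi2, constrained_df = 418.884, 48      # equal intercept and slope
unconstrained_chi2, unconstrained_df = 411.966, 46  # no equality constraints

diff_chi2 = constrained_chi2 - unconstrained_chi2
diff_df = constrained_df - unconstrained_df
p_value = chi2_upper_tail_even_df(diff_chi2, diff_df)

print(round(diff_chi2, 3), diff_df, round(p_value, 4))
```

The resulting p-value agrees with the .0315 from the Stata display command.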

In fact, the measures of fit are hardly changed whether we constrain the intercept and slope to be equal or not. Moreover, the visual difference in the graph is not dramatic.
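One way to see how little the fit changes is to recompute the information criteria for both models from the H0 loglikelihood, the number of free parameters, and the sample size (859 girls plus 909 boys). The sketch uses the usual definitions AIC = -2LL + 2k and BIC = -2LL + k ln(n); the function name is hypothetical.

```python
import math


def information_criteria(loglikelihood, free_parameters, sample_size):
    """Return (AIC, BIC) from a loglikelihood and parameter count."""
    deviance = -2.0 * loglikelihood
    aic = deviance + 2 * free_parameters
    bic = deviance + free_parameters * math.log(sample_size)
    return aic, bic


n = 859 + 909  # girls + boys

aic_free, bic_free = information_criteria(-27639.607, 24, n)    # unconstrained
aic_equal, bic_equal = information_criteria(-27643.065, 22, n)  # equal intercept, slope

print("unconstrained:", round(aic_free, 3), round(bic_free, 3))
print("constrained:  ", round(aic_equal, 3), round(bic_equal, 3))
```

The values match the output to rounding. AIC is about 3 points lower without the constraints and BIC is about 8 points lower with them, so the two criteria do not even agree on which model is preferable.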

We could also put other constraints on the two solutions such as equal variances and covariances, and even equal residual error variances, but we will not.

Section 8: An Alternative to Multiple Group Analysis

8.1 Model and figure

An alternative way of doing this, where there are two groups, is to enter the grouping variable as a predictor. This requires re-conceptualizing our model. We can think of the indicator variable Male having a direct path to both the intercept and the slope. Because the indicator variable is coded as 1 for male and 0 for female:

   If the path from Male to the Intercept is positive, this means that boys have a higher initial level on BMI.

   Similarly, if there is a positive path from Male to the Slope, this indicates that boys have a steeper slope than girls on BMI. This direct effect actually represents an interaction between the trajectory and gender.

   Such results would be consistent with our expectation that boys both start higher and gain more fat than girls during adolescence.

This approach does not let us test for other types of invariance such as the residual variances, covariances, and error terms.
   a. We are forcing these to be the same for both females and males; this may be unreasonable.
   b. The random effect for the slope for boys, Rs, may be greater or less than it is for girls. We will not be able to evaluate this possibility with this approach.

The following figure shows these two paths. We are explaining why some people have a higher or lower initial level and why some have a steeper or flatter slope by whether they are a girl or a boy. We are predicting that boys have a higher initial level and a steeper slope.

Here is the figure:


8.2 Mplus program and output

Here is the program:

Title: bmi_gender_alternatives.inp
    bmi growth curve using gender as a single covariate.
    This is an alternative to using gender as two groups.
Data: File is "c:\Mplus examples\bmi_stata.dat" ;
Variable:
    Names are id grlprb_y boyprb_y grlprb_p boyprb_p male race_eth
        bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03
        white black hispanic asian other;
    Missing are all (-9999) ;
    Usevariables are male bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03 ;
Model:
    i s | bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;
    i on male;
    s on male;
Output:
    Sampstat Mod(3.84) standardized;
Plot:
    Type is Plot3;
    Series = bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03(*);

Here is selected, annotated output:

SUMMARY OF ANALYSIS

Number of groups                                    1
Number of observations                           1771

TESTS OF MODEL FIT

Chi-Square Test of Model Fit

We cannot compare this chi-square to the two group chi-square because it is not a nested model.

     Value                              301.244
     Degrees of Freedom                      28
     P-Value                             0.0000

Chi-Square Test of Model Fit for the Baseline Model
     Value                            11544.530
     Degrees of Freedom                      28
     P-Value                             0.0000

CFI/TLI
     CFI                                  0.976
     TLI                                  0.976

Loglikelihood
     H0 Value                        -29020.154
     H1 Value                        -28869.532

Information Criteria
     Number of Free Parameters               14
     Akaike (AIC)                     58068.308
     Bayesian (BIC)                   58145.018
     Sample-Size Adjusted BIC         58100.541
       (n* = (n + 2) / 24)

RMSEA (Root Mean Square Error Of Approximation)
     Estimate                             0.074
     90 Percent C.I.                      0.067  0.082
     Probability RMSEA <= .05             0.000

SRMR (Standardized Root Mean Square Residual)
     Value                                0.046

MODEL RESULTS
                                                     Two-Tailed
                    Estimate      S.E.   Est./S.E.     P-Value
 I        ON
    MALE               0.242     0.199       1.216       0.224
 S        ON
    MALE               0.086     0.034       2.524       0.012

Boys and girls do not differ significantly at age 12 (intercept), although boys are .242 higher on BMI than girls in this linear model. However, gender and trajectory do interact, with boys' rate of growth being .086 higher than it is for girls.

 S        WITH
    I                  0.403     0.073       5.500       0.000

When there is a covariate, the mean intercept and slope appear under the Intercepts heading.

 Intercepts
    BMI97              0.000     0.000     999.000     999.000
    BMI98              0.000     0.000     999.000     999.000
    BMI99              0.000     0.000     999.000     999.000
    BMI00              0.000     0.000     999.000     999.000
    BMI01              0.000     0.000     999.000     999.000
    BMI02              0.000     0.000     999.000     999.000
    BMI03              0.000     0.000     999.000     999.000
    I                 20.911     0.144     145.653       0.000
    S                  0.656     0.025      26.553       0.000

We cannot estimate different random effects for boys and girls on the intercept or slope using this approach.

 Residual Variances
    BMI97              5.731     0.268      21.423       0.000
    BMI98              3.266     0.164      19.905       0.000
    BMI99              3.223     0.146      22.018       0.000
    BMI00              4.354     0.185      23.534       0.000
    BMI01              2.834     0.149      18.972       0.000
    BMI02              9.409     0.398      23.624       0.000
    BMI03              8.626     0.424      20.360       0.000
    I                 15.045     0.597      25.198       0.000
    S                  0.253     0.018      14.175       0.000

R-SQUARE

    Observed                                         Two-Tailed
    Variable        Estimate      S.E.   Est./S.E.     P-Value

    BMI97              0.724     0.012      59.339       0.000
    BMI98              0.832     0.009      93.431       0.000
    BMI99              0.846     0.008     110.551       0.000
    BMI00              0.820     0.008      99.975       0.000
    BMI01              0.888     0.006     138.194       0.000
    BMI02              0.731     0.011      65.511       0.000
    BMI03              0.772     0.011      70.133       0.000

You can see why we rarely report the R-square for the intercept and slope.

    Latent                                           Two-Tailed
    Variable        Estimate      S.E.   Est./S.E.     P-Value

    I                  0.001     0.002       0.608       0.543
    S                  0.007     0.006       1.266       0.206


8.3 Graphic representation

We see that the intercept is 20.911 and the slope is .656. How is gender related to this?

For girls the equation is:

Est. BMI = 20.911 + .656(Time) + .242(Male) + .086(Male)(Time)
         = 20.911 + .656(Time) + .242(0) + .086(0)(Time)
         = 20.911 + .656(Time)

For boys the equation is:

Est. BMI = 20.911 + .656(Time) + .242(1) + .086(1)(Time)
         = (20.911 + .242) + (.656 + .086)(Time)
         = 21.153 + .742(Time)

where Time is coded as 0, 1, 2, 3, 4, 5, 6.

Using these we estimate the BMI for girls is initially 20.911. By the seventh year, when she is 18 (Time = 6), her estimated BMI will be 20.911 + .656(6) or 24.847.

Using these results, we estimate the BMI for boys is initially 21.153. By the seventh year it will be 21.153 + .742(6) or 25.605. Since a BMI of 25 is considered overweight, by the age of 18 we estimate the average boy will be classified as overweight and the average girl is not far behind!

We could use the plots provided by Mplus, but if we wanted a nicer looking plot we could use another program. I used Stata to get this graph.

The Stata command is (this can be generated from a drop-down menu)

twoway (connected Girls Age, lcolor(black) lpattern(dash) ///
    lwidth(medthick)) (connected Boys Age, lcolor(black) ///
    lpattern(solid) lwidth(medthick)), ///
    ytitle(Body Mass Index) xtitle(Age of Adolescent) ///
    caption(NLSY97 Data)

and the data is

     +-----------------------+
     | Age    Girls     Boys |
     |-----------------------|
  1. |  12   20.911   21.153 |
  2. |  18   24.847   25.605 |
     +-----------------------+

[Figure: Body Mass Index for Adolescents, Comparison of Girls and Boys]
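The estimated trajectories can be checked numerically from the four coefficients in the Section 8.2 output: the intercept and slope for girls (20.911 and .656) and the effects of Male on each (.242 and .086), so that the boys' slope is .656 + .086 = .742. The function name and default arguments are choices made for this sketch.

```python
def predicted_bmi(time, male, intercept=20.911, slope=0.656,
                  male_on_intercept=0.242, male_on_slope=0.086):
    """Estimated BMI at a time score (0-6) for girls (male=0) or boys (male=1)."""
    group_intercept = intercept + male_on_intercept * male
    group_slope = slope + male_on_slope * male
    return group_intercept + group_slope * time


for label, male in (("Girls", 0), ("Boys", 1)):
    trajectory = [round(predicted_bmi(t, male), 3) for t in range(7)]
    print(label, trajectory)
```

The first and last values of each trajectory (ages 12 and 18) are the two points needed for the plot.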

Limitations of this approach

When we treat a categorical variable as a grouping variable and do multiple comparisons we can test the equality of all the parameters.

When we treat it as a predictor as in this example, we only test whether the intercept and slope are different for the two groups (interaction). In this example we do not allow the other parameters to be different for boys and girls and this might be a problem in some applications.

Section 9: Growth Curves with Time Invariant Covariates

An extension of having a single categorical predictor includes having a series of covariates that explain variance in the intercept and slope. In this example we use what are known as time invariant covariates. These are covariates that either remain constant (gender) or for which you have a measure only at the start of the study. These are sometimes considered fixed effects since their value cannot change from one wave to another. It is possible to add time varying covariates as well.

9.1 A conditional latent trajectory model

This has been called Conditional Latent Trajectory Modeling (Curran & Hussong, 2003) because your initial level and trajectory (slope) are conditional on other variables.

The covariates are moderators that moderate the initial level and trajectory.

In this figure we have two covariates.

   One is whether the adolescent is white (coded 1) versus African American or Hispanic (coded 0).

   The other is a latent variable reflecting the level of emotional problems a youth has. There are two indicators of emotional problems, one from a parent report, boyprb_p, and the other from a youth report, boyprb_y.

   The emotional problems are problems as reported at age 12.

A researcher may predict that Whites have a lower initial BMI (intercept) which persists during adolescence, but the White advantage does not increase (same slope as nonwhites).

Alternatively, a researcher may predict that being White predicts a lower initial BMI (intercept) and less increase of the BMI (smaller slope) during adolescence.

   a. This suggests that minorities start with a disadvantage (high BMI) and

   b. This disadvantage gets even greater across adolescence.

A researcher may argue that emotional problems are associated with both higher initial BMI (intercept) and a more rapid increase in BMI over time (slope).

By including a covariate that is a latent variable itself, emotional problems, we will show how these are handled by Mplus.

We estimated this model for boys only; girls were excluded.

9.2 Mplus program and output

The following is our Mplus program:

Title: bmi_timea.inp
    bmi growth curve using race/ethnicity and emotional
    problems as a second covariate. There are two
    indicators of emotional problems.
Data: File is "c:\Mplus examples\bmi_stata.dat" ;
Variable:
    Names are id grlprb_y boyprb_y grlprb_p boyprb_p male race_eth
        bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03
        white black hispanic asian other;
    Missing are all (-9999) ;
    Usevariables are boyprb_y boyprb_p white
        bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03 ;
    Useobservations = male eq 1 and asian ne 1 and other ne 1;
Model:
    i s q | bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;
    emot_prb by boyprb_p boyprb_y ;
    i on white emot_prb;
    s on white emot_prb;
    q on white emot_prb;
Output:
    Sampstat Mod(3.84) standardized;
Plot:
    Type is Plot3;
    Series = bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03(*);

The new lines in the Mplus program are the Useobservations subcommand, the q growth factor, the by statement, and the three on statements (highlighted in the original handout).

The format of the Useobservations subcommand is similar to if or select used with other programs. The subcommand

    Useobservations = male eq 1 and asian ne 1 and other ne 1;

restricts our sample to males (male eq 1). This is very handy when using the same dataset for a variety of models where you want some models to only include selected participants.

We have dropped Asians and members of the “other” category. There are relatively few of them in this sample dataset and they may have very different BMI trajectories. Also, the meaning of the category “other” is ambiguous.

I added a quadratic term in the Model: command. I first estimated this model using just a linear slope and the fit was not very good (results not shown here). Adding the quadratic improved the fit.

This example has a measurement model for a latent covariate, emot_prb. In other programs this can involve complicated programming. Here it is done with the single line. (You usually would like to have 3 and preferably 4 indicators of a latent variable.)

emot_prb by boyprb_p boyprb_y ;

The by is a key word in Mplus for creating latent variables used in Confirmatory Factor Analysis and SEM.

On the right of the by are two observed variables. The boyprb_p is the report of parents about the adolescent’s emotional problems. The boyprb_y is the youth's own report.

It is desirable to have three or more indicators of a latent variable, but we only have two here so that will have to do.

To the left of the by is the name we give to the latent variable, emot_prb. This new latent variable did not appear in the list of variables we are using, but it is defined here.

The “by” term
   o fixes the first variable to the right as a reference indicator, boyprb_p, and assigns a loading of 1 to it.
   o It lets the loading of the second variable, boyprb_y, be estimated. It also creates error/residual variances that are labeled e1 and e2 in the figure.
   o The default is that these errors are uncorrelated.
   o It is good practice to have the strongest indicator on the right of the “by” be the reference indicator with a loading fixed at 1.0. You can run the model and if this does not happen, you can re-run it, reversing the order of the items on the right of the “by.”

The next three new lines,

   o i on white emot_prb;
   o s on white emot_prb; and
   o q on white emot_prb;

define the relationship between the covariates and the intercept and slopes. These represent interactions of each covariate with the intercept and slopes.

   o These are the 1wi in the equation for HLM users.
   o Mplus uses the on command to signify that a variable depends on another variable in the structural part of the model.

The by command is the key to understanding how Mplus sets up the measurement model and the on is the key to how Mplus sets up the structural model.

There are many defaults. Mplus assumes there are residual variances and covariances for the intercept and slopes. It fixes the intercepts at zero. It assumes the intercept and slope variances are correlated.

Here are selected results:

Mplus VERSION 5.1
MUTHEN & MUTHEN
07/01/2008   8:01 PM

TESTS OF MODEL FIT

Chi-Square Test of Model Fit
     Value                               88.824
     Degrees of Freedom                      34
     P-Value                             0.0000

Chi-Square Test of Model Fit for the Baseline Model
     Value                             5975.020
     Degrees of Freedom                      45
     P-Value                             0.0000

CFI/TLI
     CFI                                  0.991
     TLI                                  0.988

Loglikelihood
     H0 Value                        -17221.918
     H1 Value                        -17177.507

Information Criteria
     Number of Free Parameters               29
     Akaike (AIC)                     34501.837
     Bayesian (BIC)                   34640.190
     Sample-Size Adjusted BIC         34548.093
       (n* = (n + 2) / 24)

RMSEA (Root Mean Square Error Of Approximation)
     Estimate                             0.043
     90 Percent C.I.                      0.032  0.054
     Probability RMSEA <= .05             0.845

SRMR (Standardized Root Mean Square Residual)
     Value                                0.031

MODEL RESULTS
                                                     Two-Tailed
                    Estimate      S.E.   Est./S.E.     P-Value

Measurement of latent variable.

 EMOT_PRB BY
    BOYPRB_P           1.000     0.000     999.000     999.000
    BOYPRB_Y           0.575     0.171       3.374       0.001

Emotional problems does not have a significant effect on the initial level at age 12, but significantly increases the slope. The significant negative effect on the quadratic is a bit confusing.

 I        ON
    EMOT_PRB           0.300     0.168       1.793       0.073
 S        ON
    EMOT_PRB           0.212     0.089       2.370       0.018
 Q        ON
    EMOT_PRB          -0.037     0.015      -2.462       0.014

Whites have a significant advantage initially (intercept), but there is not a significant compounding of this over time since White does not significantly influence the slope or quadratic.

 I        ON
    WHITE             -1.030     0.292      -3.529       0.000
 S        ON
    WHITE              0.130     0.138       0.941       0.346
 Q        ON
    WHITE             -0.030     0.023      -1.293       0.196

 S        WITH
    I                  0.701     0.329       2.128       0.033
 Q        WITH
    I                 -0.101     0.053      -1.922       0.055
    S                 -0.174     0.034      -5.081       0.000
 WHITE    WITH
    EMOT_PRB          -0.111     0.028      -4.013       0.000

 Intercepts
    BOYPRB_Y           2.108     0.052      40.712       0.000
    BOYPRB_P           1.893     0.058      32.668       0.000
    BMI97              0.000     0.000     999.000     999.000
    BMI98              0.000     0.000     999.000     999.000
    BMI99              0.000     0.000     999.000     999.000
    BMI00              0.000     0.000     999.000     999.000
    BMI01              0.000     0.000     999.000     999.000
    BMI02              0.000     0.000     999.000     999.000
    BMI03              0.000     0.000     999.000     999.000
    I                 21.279     0.210     101.495       0.000
    S                  1.171     0.100      11.719       0.000
    Q                 -0.077     0.016      -4.649       0.000

The linear slope of 1.171 is huge when you project this over the six years. The quadratic slope being negative indicates that there is some leveling off in the increase in BMI.

 Variances
    EMOT_PRB           1.485     0.455       3.262       0.001

 Residual Variances
    BOYPRB_Y           1.850     0.170      10.875       0.000
    BOYPRB_P           1.203     0.442       2.722       0.006
    BMI97              5.535     0.479      11.564       0.000
    BMI98              3.276     0.227      14.402       0.000
    BMI99              3.333     0.221      15.067       0.000
    BMI00              3.182     0.218      14.586       0.000
    BMI01              2.361     0.186      12.670       0.000
    BMI02              5.225     0.357      14.654       0.000
    BMI03              8.961     0.743      12.057       0.000
    I                 13.094     0.869      15.070       0.000
    S                  1.291     0.215       6.007       0.000
    Q                  0.030     0.006       5.092       0.000

STANDARDIZED MODEL RESULTS

STDYX Standardization
                                                     Two-Tailed
                    Estimate      S.E.   Est./S.E.     P-Value

 EMOT_PRB BY
    BOYPRB_P           0.743     0.111       6.708       0.000
    BOYPRB_Y           0.458     0.073       6.308       0.000

Notice the z-tests are slightly different. Most standard packages assume the unstandardized test is the same.

 I        ON
    EMOT_PRB           0.099     0.052       1.909       0.056
 S        ON
    EMOT_PRB           0.222     0.080       2.775       0.006
 Q        ON
    EMOT_PRB          -0.253     0.087      -2.918       0.004
 I        ON
    WHITE             -0.140     0.039      -3.558       0.000
 S        ON
    WHITE              0.056     0.059       0.942       0.346
 Q        ON
    WHITE             -0.083     0.064      -1.293       0.196
 S        WITH
    I                  0.170     0.090       1.891       0.059
 Q        WITH
    I                 -0.163     0.094      -1.731       0.083
    S                 -0.891     0.021     -42.268       0.000
 WHITE    WITH
    EMOT_PRB          -0.183     0.049      -3.748       0.000

 Residual Variances
    BOYPRB_Y           0.790     0.067      11.864       0.000
    BOYPRB_P           0.448     0.165       2.717       0.007
    BMI97              0.290     0.025      11.460       0.000
    BMI98              0.171     0.012      13.889       0.000
    BMI99              0.151     0.011      13.794       0.000
    BMI00              0.132     0.010      13.364       0.000
    BMI01              0.096     0.008      11.571       0.000
    BMI02              0.185     0.013      14.176       0.000
    BMI03              0.269     0.022      12.260       0.000
    I                  0.966     0.016      62.173       0.000
    S                  0.952     0.034      27.829       0.000
    Q                  0.937     0.043      22.016       0.000

Unfortunately, we cannot get graphs when we have covariates. You could create these yourself by substituting fixed values for race and emotional problems.
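A minimal sketch of that substitution, using the unstandardized estimates from the output above (the intercepts of I, S, and Q and their regressions on WHITE and EMOT_PRB). The function names are hypothetical, and treating the latent emotional problems score as a known value (0 for its mean, 1 for one unit above it) is a simplification made for plotting.

```python
def growth_factors(white, emot_prb):
    """Expected intercept, linear slope, and quadratic slope for given covariates."""
    intercept = 21.279 - 1.030 * white + 0.300 * emot_prb
    linear = 1.171 + 0.130 * white + 0.212 * emot_prb
    quadratic = -0.077 - 0.030 * white - 0.037 * emot_prb
    return intercept, linear, quadratic


def predicted_bmi(time, white, emot_prb):
    """Expected BMI at time score 0-6 for fixed covariate values."""
    intercept, linear, quadratic = growth_factors(white, emot_prb)
    return intercept + linear * time + quadratic * time ** 2


profiles = {
    "nonwhite, average emotional problems": (0, 0.0),
    "white, average emotional problems": (1, 0.0),
    "nonwhite, emotional problems +1 unit": (0, 1.0),
}
for label, (white, emot_prb) in profiles.items():
    curve = [round(predicted_bmi(t, white, emot_prb), 2) for t in range(7)]
    print(label, curve)
```

Each printed list is one line of the graph; the values can be pasted into any plotting program.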


Section 10: Mediation & Moderation

Sometimes all of the covariates are time invariant or at least measured at just the start of the study. Curran and Hussong (2003) discuss a study of a latent growth curve on drinking problems with a covariate of parental drinking. Parental drinking influences both the initial level and the rate of growth of drinking problem behavior among adolescents. The question is whether some other variables might mediate this relationship:

   Parental monitoring
   Peer influence

Mplus allows us to estimate the direct and indirect effects of parent drinking on the intercept and slope. It also provides a test of significance for these effects.
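A minimal sketch of how such a model might be requested is shown here. All variable names (dp1-dp4 for four waves of drinking problems, pardrink, monitor, peer) and the file name are hypothetical; they are not from Curran and Hussong's data, and a linear curve with four waves is assumed only for illustration.

```
TITLE:    Sketch of mediation in a growth model (hypothetical names);
DATA:     FILE IS drink.dat;          ! hypothetical file name
VARIABLE: NAMES ARE dp1-dp4 pardrink monitor peer;
MODEL:
  i s | dp1@0 dp2@1 dp3@2 dp4@3;      ! linear growth in drinking problems
  monitor peer ON pardrink;           ! parent drinking predicts the mediators
  i s ON monitor peer pardrink;       ! mediators plus the direct effect
  monitor WITH peer;                  ! optional: mediator residuals covary
MODEL INDIRECT:
  i IND pardrink;                     ! indirect effects of pardrink on intercept
  s IND pardrink;                     ! indirect effects of pardrink on slope
```

The MODEL INDIRECT command adds a section to the output listing the total, total indirect, specific indirect, and direct effects, each with a standard error and z-test.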

Section 11: Time Varying Covariates

We have illustrated time invariant covariates that are measured at time 1. It is possible to extend this to include time varying covariates. Time varying covariates either are measured after the process has started or have a value that changes over time (hours of nutrition education, level of program fidelity). Although we will not show our output, we will illustrate the use of time varying covariates in a figure. In this figure the time varying covariates, a21 to a24, might be:

Hours of nutrition education completed between waves. Independent of the overall growth trajectory, η1, students who have several hours of nutrition education programming may have a decrease in their BMI.

Physical education curriculum. A physical activity program might lead to reduced BMI. Students who spend more time in this physical activity program might have a lower BMI independent of the overall growth trend. Hours in physical education courses will vary from year to year.

This would be a good way to incorporate fidelity into a program evaluation.

This figure is borrowed from Muthén, where he is examining growth in math performance over 4 years. The w vector contains the x variables, which are covariates that directly influence the intercept, η0, or slope, η1. The aij are the number of math courses taken each year.

yit = repeated measures on the outcome (math achievement)

a1it = Time score (0, 1, 2, 3) as discussed previously

a2it = Time varying covariates (# of math courses taken that year)

w = Vector of x covariates that are time invariant and measured at or before the first yit
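One way to write the model these definitions describe is sketched here. The symbols κ, α, γ, ε, and ζ are labels assumed for this sketch; the figure itself may use different notation.

```latex
\begin{aligned}
y_{it}    &= \eta_{0i} + \eta_{1i}\, a_{1it} + \kappa_t\, a_{2it} + \varepsilon_{it} \\
\eta_{0i} &= \alpha_0 + \gamma_0' w_i + \zeta_{0i} \\
\eta_{1i} &= \alpha_1 + \gamma_1' w_i + \zeta_{1i}
\end{aligned}
```

The first line says that the outcome at each wave depends on the person's own intercept and slope plus a wave-specific effect, κt, of the time varying covariate. The last two lines say that the intercept and slope depend on the time invariant covariates in w.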

In this example we might think of the yi variables being measures of conflict behavior where y1 is at age 17 and y4 is at age 25. We know there is a general decline in conflict behavior during this time interval. Therefore, the slope η1 is expected to be negative.

Now suppose we also have a measure of alcohol abuse for each of the 4 waves (aij). We might hypothesize that during a year in which an adolescent has a high score on alcohol abuse (say, the number of days in the last 30 days on which the person had 5 or more drinks), there will be an elevated level of conflict behavior that cannot be explained by the general decline (negative slope).

The negative slope reflects the general decline in conflict behavior by young adults as they move from age 17 to age 25. The effect of aij on yi provides the additional explanation that in those years when there is a lot of drinking, there will be an elevated level of conflict that does not fit the general decline.
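A model of this kind might be specified along the following lines. This is a sketch only: y1-y4 and a21-a24 follow the labels in the figure, and w1 is a hypothetical name for a single time invariant covariate.

```
MODEL:
  i s | y1@0 y2@1 y3@2 y4@3;   ! time scores 0, 1, 2, 3
  i s ON w1;                   ! time invariant covariate
  y1 ON a21;                   ! time varying covariate at wave 1
  y2 ON a22;                   ! ... at wave 2
  y3 ON a23;                   ! ... at wave 3
  y4 ON a24;                   ! ... at wave 4
```

Each ON statement for a y variable gives the effect of that year's covariate on that year's outcome over and above the growth trajectory. Placing the same label, such as (1), after each of these four statements would constrain the effect to be equal across waves, if that restriction is reasonable for your data.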

Section 12: Extensions and Suggested Readings

If you want more, here are a few references

b. Basic growth curve modeling

a. Bollen, K. A., & Curran, P. J. (2006). Latent Curve Models: A Structural Equation Perspective. Hoboken, NJ: Wiley.

b. Curran, P. J., & Hussong, A. M. (2003). The use of latent trajectory models in psychopathology research. Journal of Abnormal Psychology, 112, 526-544. This is an accessible general introduction to growth curves.

c. Duncan, T. E., Duncan, S. C., & Strycker, L. A. (2006). An Introduction to Latent Variable Growth Curve Modeling: Concepts, Issues, and Applications (2nd ed.). Mahwah, NJ: Lawrence Erlbaum. The second edition of a classic text on growth curve modeling.

d. Kaplan, D. (2000). Chapter 8: Latent Growth Curve Modeling. In D. Kaplan, Structural Equation Modeling: Foundations and Extensions (pp 149-170). Thousand Oaks, CA: Sage. This is a short overview.

e. Wang, M. (2007). Profiling retirees in the retirement transition and adjustment process: Examining the longitudinal change patterns of retirees' psychological well-being. Journal of Applied Psychology, 92(2), 455-474. This is a nice example of presenting results showing some graphs and tables.

c. Limited Outcome Variables: Binary and count variables

a. Muthén, B. (1996). Growth modeling with binary responses. In A. von Eye & C. Clogg (Eds.), Categorical Variables in Developmental Research: Methods of Analysis (pp. 37-54). San Diego, CA: Academic Press.


b. Long, J. S., & Freese, J. (2006). Regression Models for Categorical Dependent Variables Using Stata (2nd ed.). Stata Press (www.stata-press.com). This provides the most accessible, yet still rigorous, treatment of how to use and interpret limited dependent variables.

c. Rabe-Hesketh, S., & Skrondal, A. (2005). Multilevel and Longitudinal Modeling Using Stata. Stata Press (www.stata-press.com). This discusses a free set of commands that can be added to Stata that will do most of what Mplus can do and some things Mplus cannot do. It is hard to use and very slow.

d. Growth mixture modeling

a. Muthén, B., & Muthén, L. K. (2000). Integrating person-centered and variable-centered analysis: Growth mixture modeling with latent trajectory classes. Alcoholism: Clinical and Experimental Research, 24, 882-891. This is an excellent and accessible conceptual introduction.

b. Muthén, B. (2001). Latent variable mixture modeling. In G. Marcoulides, & R. Schumacker (Eds.) New Developments and Techniques in Structural Equation Modeling (pp. 1-34). Mahwah, NJ: Lawrence Erlbaum.

c. Muthén, B., Brown, C. H., Masyn, K., Jo, B., Khoo, S., Yang, C., Wang, C., Kellam, S., Carlin, J., & Liao, J. (2002). General growth mixture modeling for randomized preventive interventions. Biostatistics, 3, 459-475.

d. Muthén, B. (2004). Latent variable analysis: Growth mixture modeling and related techniques for longitudinal data. In D. Kaplan (Ed.), Handbook of Quantitative Methodology for the Social Sciences (pp. 345-368). Newbury Park, CA: Sage Publications.


e. The web page for Mplus, www.statmodel.com, maintains a current set of references, many as PDF files. These are organized by topic and some include data and the Mplus program.

