1
GROWTH MIXTURE MODELING
Shaunna L. Clark & Ryne Estabrook
Advanced Genetic Epidemiology Statistical Workshop
October 24, 2012
2
OUTLINE
• Growth Mixture Model
• Regime Switching
• Other Longitudinal Mixture Models
• OpenMx
  • GMM
  • How to extend GMM to FMM
  • How to get individual class probabilities from OpenMx
• Exercise
3
HOMOGENEITY VS. HETEROGENEITY
• The previous session showed a growth model in which everyone follows the same mean trajectory of use
  • With some individual variation
• Is this an accurate representation of the development of substance abuse/dependence? Probably not
[Figure: Number of Drinks Per Week (y-axis, 0 to 25) by Age (x-axis, 12 to 24)]
4
GROWTH MIXTURE MODELING (GMM)
Muthén & Shedden, 1999; Muthén, 2001
• Setting
  • A single item measured repeatedly
    • Example: number of substances currently used
  • Hypothesized trajectory classes
    • Non-users; early initiators; late but consistent users
  • Individual trajectory variation within class
• Aims
  • Estimate trajectory shapes
    • Linear, quadratic, etc.
  • Estimate trajectory class probabilities
    • Proportion of the sample in each trajectory class
  • Estimate variation within class
5
LINEAR GROWTH MODEL DIAGRAM
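[Path diagram: linear growth model. Intercept (I) and slope (S) factors with means mInt and mSlope, variances σ²Int and σ²Slope, and covariance σ²Int,Slope. Each observed xT1 to xT5 loads 1 on I and 0, 1, 2, 3, 4 on S, with residual variances σ²ε1 to σ²ε5.]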
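The diagram implies a mean vector and covariance matrix for xT1 to xT5: μ = α Λ' and Σ = Λ Φ Λ' + Θ. The base R sketch below computes both. The parameter values are hypothetical, and the matrix names (Lambda, Phi, Theta, alpha) are chosen for illustration. They mirror the lambda/phi/theta/alpha algebra in the OpenMx script later in the deck.

```r
# Hypothetical parameter values for a 5-occasion linear growth model
nocc   <- 5
Lambda <- cbind(1, 0:(nocc - 1))          # loadings: 1 on intercept, 0..4 on slope
alpha  <- matrix(c(2, 0.5), nrow = 1)     # growth factor means: mInt, mSlope
Phi    <- matrix(c(1.0, 0.1,
                   0.1, 0.2), nrow = 2)   # sigma2Int, sigma2Int,Slope; sigma2Slope
Theta  <- diag(0.5, nocc)                 # residual variances sigma2eps1..sigma2eps5

mu    <- alpha %*% t(Lambda)                     # implied means (1 x nocc)
Sigma <- Lambda %*% Phi %*% t(Lambda) + Theta    # implied covariance (nocc x nocc)
round(mu, 2)
round(Sigma, 2)
```

With these values, the implied means rise by mSlope at each occasion. The implied variances also grow over time, because the slope variance is weighted by the squared time score.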
6
LINEAR GMM MODEL DIAGRAM
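[Path diagram: linear GMM. Same as the linear growth model (xT1 to xT5 with residual variances σ²ε1 to σ²ε5; intercept I and slope S with means mInt and mSlope, variances σ²Int and σ²Slope, and covariance σ²Int,Slope; loadings 1, 1, 1, 1, 1 on I and 0, 1, 2, 3, 4 on S), plus a categorical latent class variable C that influences I and S.]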
7
GMM EXAMPLE PROFILE PLOT
8
GMM EXAMPLE PROFILE PLOT
9
GROWTH MIXTURE MODEL EQUATIONS
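$$x_{itk} = \mathrm{Intercept}_{ik} + \lambda_{tk}\,\mathrm{Slope}_{ik} + \varepsilon_{itk}$$
$$\mathrm{Intercept}_{ik} = \alpha_{0k} + \zeta_{0ik}$$
$$\mathrm{Slope}_{ik} = \alpha_{1k} + \zeta_{1ik}$$
for individual $i$ at time $t$ in class $k$, with $\varepsilon_{itk} \sim N(0, \sigma^2_{\varepsilon})$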
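One way to see what these equations say is to generate data from them. The base R sketch below simulates a two-class linear GMM. It draws a class for each person, then class-specific intercepts and slopes (α plus ζ), then the observed scores (growth plus ε).

All parameter values are hypothetical. For simplicity, the time scores λ are the same (0 to 3) in both classes.

```r
set.seed(2012)
n      <- 500                      # individuals
times  <- 0:3                      # time scores lambda_t (shared by both classes here)
pClass <- c(0.7, 0.3)              # class probabilities (hypothetical)
alpha0 <- c(1.0, 3.0)              # class-specific intercept means alpha_0k
alpha1 <- c(0.2, 1.5)              # class-specific slope means alpha_1k
sdInt  <- c(0.5, 0.5)              # SD of zeta_0ik within each class
sdSlp  <- c(0.2, 0.2)              # SD of zeta_1ik within each class
sdErr  <- 0.5                      # residual SD (sigma_epsilon)

k         <- sample(1:2, n, replace = TRUE, prob = pClass)   # latent class
intercept <- alpha0[k] + rnorm(n, 0, sdInt[k])               # Intercept_ik
slope     <- alpha1[k] + rnorm(n, 0, sdSlp[k])               # Slope_ik
# x_itk = Intercept_ik + lambda_t * Slope_ik + epsilon_itk  (n x T matrix)
x <- intercept + outer(slope, times) +
  matrix(rnorm(n * length(times), 0, sdErr), nrow = n)
colnames(x) <- paste0("xT", seq_along(times))

# Observed class means at each occasion track alpha_0k + alpha_1k * t
classMeans <- apply(x, 2, function(col) tapply(col, k, mean))
round(classMeans, 2)
```

In a real analysis, k is unobserved. Estimating the class probabilities, the class-specific α, and the within-class variances from x alone is the job of the GMM.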
10
LATENT CLASS GROWTH MODEL (LCGA) VS. GMM
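Nagin, 1999; Nagin & Tremblay, 1999
• Same as GMM, except there is no residual variance on the growth factors
  • No individual variation within class
  • Everyone in a class has the same trajectory
• LCGA is a special case of GMM
[Path diagram: LCGA. As in the linear GMM, latent class C influences growth factors I and S (means mInt and mSlope), which load 1, 1, 1, 1, 1 and 0, 1, 2, 3, 4 on xT1 to xT5, but without within-class growth-factor variances.]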
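"Special case" can be made concrete with the implied covariance Λ Φ Λ' + Θ. The base R sketch below fixes the growth-factor (co)variances Φ to zero, as LCGA does. Within a class, only the residual variances Θ remain, so the repeated measures are uncorrelated given class. The values are hypothetical.

```r
nocc   <- 5
Lambda <- cbind(1, 0:(nocc - 1))     # intercept and linear slope loadings
Theta  <- diag(0.5, nocc)            # residual variances (hypothetical)
impliedCov <- function(Phi) Lambda %*% Phi %*% t(Lambda) + Theta

gmmCov  <- impliedCov(matrix(c(1.0, 0.1, 0.1, 0.2), 2, 2))  # GMM: (co)variances estimated
lcgaCov <- impliedCov(matrix(0, 2, 2))                       # LCGA: (co)variances fixed at 0

all(lcgaCov == Theta)   # TRUE: only residual variance remains within a class
gmmCov[1, 5]            # nonzero: occasions covary through the growth factors
```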
11
CLASS ENUMERATION
• Still cannot use the likelihood-ratio χ² test
• Information criteria: AIC (Akaike, 1974), BIC (Schwarz, 1978)
  • Penalize for the number of parameters (BIC also for sample size)
  • Choose the model with the lowest value
• Interpretation and usefulness
  • Profile plot
  • Substantive theory
  • Predictive validity
  • Size of classes
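The information criteria are simple functions of the model's -2 log-likelihood (-2LL), its number of free parameters (np), and the sample size (n):

• AIC = -2LL + 2np
• BIC = -2LL + np·ln(n)
• Sample-size adjusted BIC (the saBIC column in the GMM vs. RSM comparison) = -2LL + np·ln((n + 2)/24)

The base R sketch below compares 1- to 3-class solutions. The -2LL and np values are invented for illustration.

```r
infoCriteria <- function(m2ll, np, n) {
  c(AIC   = m2ll + 2 * np,
    BIC   = m2ll + np * log(n),
    saBIC = m2ll + np * log((n + 2) / 24))   # sample-size adjusted BIC
}
# Hypothetical fits of 1- to 3-class models to the same n = 500 cases
fits <- data.frame(classes = 1:3, m2ll = c(5200, 5050, 5010), np = c(8, 12, 16))
ic <- t(mapply(infoCriteria, fits$m2ll, fits$np, MoreArgs = list(n = 500)))
cbind(fits, round(ic, 1))
fits$classes[which.min(ic[, "BIC"])]   # number of classes with the lowest BIC
```

Lower is better for all three. The criteria are one input alongside interpretability, class sizes, and theory.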
12
ANALYSIS PLAN
• Determine the growth function
• Determine the number of classes
  • Examine mean plots, with and without individual trajectories
• Determine whether the growth factor variances need:
  1. To be different from zero (GMM vs. LCGA)
  2. To be held equal across classes
• Add covariates and distal outcomes
13
MODELING ZERO
14
HOW DO I MODEL ZEROS?
• Particularly relevant for modeling non-users of substances (or any outcome with floor effects)
• Some outcomes are right-skewed, so there are many low values of the dependent variable
• However, some outcomes may have more zeros than expected
  • Example: alcohol consumption. Individuals who never drink will always report having consumed zero drinks
15
WHEN YOU HAVE MORE ZEROS THAN EXPECTED
• In this case, zeros can be thought of as coming from two populations:
  1. Structural zeros: zeros always occur in this population
     • Example: never drinkers
  2. Others, who produce a zero with some probability at the time of measurement
     • Example: occasional drinkers
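The two sources of zeros can be illustrated with a zero-inflated Poisson, one common way to represent this situation. The technique is named here for illustration; the slides do not prescribe it.

In the base R sketch below, a hypothetical 30% are never drinkers who always report zero. The rest report Poisson-distributed drinks, and some of them report zero by chance. The sketch compares the observed share of zeros with what a plain Poisson of the same mean would predict.

```r
set.seed(1)
n      <- 1000
pNever <- 0.30                             # share of structural zeros (hypothetical)
lambda <- 1.5                              # mean drinks among potential drinkers (hypothetical)
never  <- rbinom(n, 1, pNever) == 1        # TRUE = never drinker
drinks <- ifelse(never, 0, rpois(n, lambda))

mean(drinks == 0)                          # observed share of zeros
pNever + (1 - pNever) * dpois(0, lambda)   # expected share: structural + sampling zeros
dpois(0, mean(drinks))                     # share a plain Poisson with the same mean predicts
```

The excess of observed zeros over the plain-Poisson prediction is the "more zeros than expected" pattern. The zeros among potential drinkers are the occasional drinkers who happened to report none.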
16
ONE OPTION
• Identify the individuals in the two populations
  • Structural zeros can then be eliminated
  • Those who could potentially produce zeros are retained
• But it can be very difficult to tell the two apart
• Or the population of interest is the entire population, i.e., both drinkers and non-drinkers
  • Stem issue
17
ZERO-CLASS
• Consider what you mean by a zero
  • Only non-users who have not initiated use, or also those who have initiated but tried only once?
• Fix the growth factor means to zero
  • Start not using, stay not using
  • If you fix only the means, it will not be a pure zero-class
    • Likely to pick up people who have tried once or twice but have not moved to regular use
• Fix the growth factor means and (co)variances to zero
  • No variance in the group
  • Can sometimes cause computational issues
18
REGIME SWITCHING
19
IS GMM A GOOD MODEL FOR SUBSTANCE USE DEVELOPMENT?
• Maybe not
• Assumes that individuals remain in the same trajectory over time
  • Once a heavy smoker, always a heavy smoker, even if you successfully quit for a period
• May not hold for many substance use outcomes
  • Examples: switching from moderate to heavy drinking, changing from daily smoker to non-smoker
20
INDIVIDUAL TRAJECTORY PLOTS
Dolan et al. (2005) presented the regime switching model (RSM) as a way to get traction on this issue.
21
DOLAN ET AL. REGIME SWITCHING MODEL (RSM)
• Regime = latent trajectory class
  • Ex: habitual moderate drinkers, heavy drinkers
• Regime switch = move from one regime to another
  • Ex: a switch from moderate to heavy drinking
• Used latent Markov modeling for normally distributed outcomes (Schmittmann et al., 2005)
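The contrast with GMM can be sketched by simulating class membership over four occasions:

• Under GMM, a person's class is drawn once and kept at every occasion.
• Under the RSM's latent Markov structure, the class at each occasion depends on the class at the previous occasion through a transition matrix.

The initial probabilities and transition matrix below are hypothetical. In this sketch, rows give the current class and columns the next class. The code only generates class sequences, not the observed drinking outcomes.

```r
set.seed(97)
regimes <- c("Low", "Moderate", "Heavy")
init <- c(0.6, 0.3, 0.1)                           # class probabilities at occasion 1
P <- matrix(c(0.90, 0.08, 0.02,                    # row = class now, column = class next
              0.10, 0.75, 0.15,
              0.05, 0.25, 0.70), nrow = 3, byrow = TRUE,
            dimnames = list(regimes, regimes))

simulateRegimes <- function(nocc) {
  s <- integer(nocc)
  s[1] <- sample(3, 1, prob = init)                # starting regime
  for (t in 2:nocc) s[t] <- sample(3, 1, prob = P[s[t - 1], ])  # Markov step
  regimes[s]
}

gmmPath <- rep(regimes[sample(3, 1, prob = init)], 4)  # GMM: same class at all occasions
rsmPath <- simulateRegimes(4)                          # RSM: class may switch
gmmPath
rsmPath
```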
22
RSM WITH ORDINAL DATA
• The Dolan RSM was designed to be used with normally distributed data
• Substance abuse measures are often:
  • If continuous, not normally distributed
  • Count
    • Ex: number of drinking days per month
  • Categorical
    • Ex: Do you use substance X?
• As we've seen in previous talks, we can use the Mehta, Neale and Flay (2004) method when we have ordinal data
23
APPLICATION: ADOLESCENT DRINKING
• From the Dolan et al. paper
• Data: National Longitudinal Survey of Youth (NLSY97)
  • Years 1998, 1999, 2000, 2001
  • 737 white males and females
  • Age 13 or 14 in 1998
  • Indicated that they regularly drank alcohol
• Outcome: "In the past 30 days, on days you drank, how much did you drink?"
  • Made ordinal: 0 = 0-2 drinks; 1 = 3 drinks; 2 = 4-6 drinks; 3 = 7+ drinks
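The recoding into four ordinal categories can be done with base R's cut(). The sketch below assumes responses are whole numbers of drinks, and the example responses are made up. For OpenMx threshold models, the resulting codes still need to be stored as an ordered factor (OpenMx provides mxFactor() for this).

```r
drinks <- c(0, 1, 2, 3, 4, 6, 7, 12)      # made-up responses (drinks per drinking day)
# Intervals (-Inf,2], (2,3], (3,6], (6,Inf] map to codes 0, 1, 2, 3
drinkCat <- cut(drinks, breaks = c(-Inf, 2, 3, 6, Inf), labels = 0:3, right = TRUE)
data.frame(drinks, drinkCat)
```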
24
MODEL SIMPLIFICATIONS FOR THE GMM & RSM APPLICATION
• Assumed a linear model
  • Really quadratic
• No correlation between intercept and slope
  • Where you start drinking at the beginning of the study does not influence how your drinking develops during the study
• Transition probabilities equivalent across time
  • The probability of switching classes between ages 12 and 13 is the same as between 20 and 21
25
COMPARING GMM AND RSM
Model          -2*LL   np   AIC   BIC     saBIC
3-Class GMM    -5077   18   995   -4199   -958
3-Class RSM    -4589   26   990   -4183   -955
26
3-CLASS GMM PROFILE PLOT
[Figure: 3-class GMM profile plot; classes: Growing (72%), Moderate (18%), Low (10%)]
27
3-CLASS RSM PROFILE PLOT
[Figure: 3-class RSM profile plot over occasions 0 to 3; classes: Moderate (12%), High (10%), Low (77%)]
28
GMM & RSM COMBINED PROFILE PLOT
[Figure: overlaid profiles of the RSM classes (Moderate, High, Low) and the GMM classes (Growing, Moderate, Low) over occasions 0 to 3]
29
RSM TRANSITION PROBABILITIES
• Likely to stay in the same class
• Low class unlikely to switch to other classes
• Most likely to switch between the moderate and high drinking classes

Class      Low    Moderate   Heavy
Low        0.74   0.01       0.04
Moderate   0.17   0.67       0.22
Heavy      0.09   0.32       0.74
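As printed, the values sum to 1 down each column rather than across each row. That suggests the columns index the class at the earlier occasion and the rows the class at the next one. The slide does not say so, so treat that orientation as an assumption.

Under that reading, and with transition probabilities held equal across time, the probabilities for the three transitions from 1998 to 2001 are the third power of the matrix. The base R sketch below checks the column sums and computes that power.

```r
classes <- c("Low", "Moderate", "Heavy")
tp <- matrix(c(0.74, 0.01, 0.04,
               0.17, 0.67, 0.22,
               0.09, 0.32, 0.74), nrow = 3, byrow = TRUE,
             dimnames = list(classes, classes))   # entered as printed on the slide

colSums(tp)   # 1, 1, 1: each column is a probability distribution
rowSums(tp)   # not 1, so rows are not the conditioning class
# Assuming columns = class at time t, rows = class at t + 1:
P3 <- tp %*% tp %*% tp   # three transitions (1998 -> 2001)
round(P3, 2)
```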
30
OTHER LONGITUDINAL MIXTURE MODELS
Longitudinal Latent Class Analysis
• Models patterns of change over time, rather than a functional growth form
• Lanza & Collins, 2006; Feldman et al., 2009

Number of parameters (11 variables, 3 classes, quadratic growth for LCGA):
Item type         LCA   LCGA
Binary item       35    11
3-category item   68    12
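The table's counts can be reproduced with simple parameter bookkeeping:

• LCA: each class has (categories - 1) parameters per item, plus K - 1 free class proportions.
• LCGA: the counts match three growth-factor means (intercept, linear, quadratic) per class, plus the class proportions, plus one extra threshold per category beyond two, held equal over time and across classes.

That LCGA breakdown is an assumption inferred from the numbers, not stated on the slide. The base R sketch below encodes both rules.

```r
lcaParams <- function(nitem, ncat, nclass) {
  # (ncat - 1) item parameters per item in every class, plus nclass - 1 proportions
  nclass * nitem * (ncat - 1) + (nclass - 1)
}
lcgaParams <- function(ngrowth, ncat, nclass) {
  # Assumption: growth-factor means per class, nclass - 1 proportions, and
  # (ncat - 2) extra thresholds held equal over time and across classes
  nclass * ngrowth + (nclass - 1) + (ncat - 2)
}
c(lcaParams(11, 2, 3), lcaParams(11, 3, 3))   # 35 68
c(lcgaParams(3, 2, 3), lcgaParams(3, 3, 3))   # 11 12
```

The contrast shows why the growth structure is parsimonious: the LCA count grows with the number of items and categories, while the LCGA count grows only with the number of growth factors.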
31
LATENT TRANSITION ANALYSIS
[Diagram: latent class variables C1 (Time 1) and C2 (Time 2), each measured by items x1 to x5]
• Models the transition from one state to another over time
• Unlike the RSM, does not impose a growth structure
• Ex: drinking alcohol or not over time
• Graham et al., 1991; Nylund et al., 2006
• Script on the OpenMx forum
32
OTHER LONGITUDINAL MIXTURE MODELS
Survival Mixture
• Multiple latent classes of individuals with different survival functions
  • Ex: different groups based on age of initiation
• Kaplan, 2004; Masyn, 2003; Muthén & Masyn, 2005
33
OPENMX: GMM EXAMPLE
GMM_example.R
• 2 classes
• Intercept and slope
34
MAKE OBJECTS FOR THINGS WE WILL REFERENCE THROUGHOUT THE SCRIPT
nocc <- 4     # Number of measurement occasions
nfac <- 2     # Number of growth factors (intercept, slope)
nclass <- 2   # Number of classes
nthresh <- 3  # Number of thresholds: number of categories of the variable minus 1
# Function that will help us label our thresholds: an nrow x ncol matrix such as "th_1_1"
labFun <- function(name = "matrix", nrow = 1, ncol = 1) {
  matlab <- matrix(paste(rep(name, each = nrow * ncol), rep(1:nrow, ncol),
                         rep(1:ncol, each = nrow), sep = "_"), nrow = nrow, ncol = ncol)
  return(matlab)
}
35
SETTING UP THE GROWTH PART OF THE MODEL
# Factor loadings: intercept loadings all 1, slope loadings 0, 1, ..., nocc - 1
lamda <- mxMatrix("Full", nrow = nocc, ncol = nfac,
                  values = c(rep(1, nocc), 0:(nocc - 1)), name = "lambda")
# Factor variances (diagonal, so no intercept-slope covariance)
phi <- mxMatrix("Diag", nrow = nfac, ncol = nfac, free = TRUE,
                labels = c("vi", "vs"), name = "phi")
# Error (residual) variances
theta <- mxMatrix("Diag", nrow = nocc, ncol = nocc, free = TRUE,
                  labels = paste("theta", 1:nocc, sep = ""), values = 1, name = "theta")
# Factor means
alpha <- mxMatrix("Full", nrow = 1, ncol = nfac, free = TRUE,
                  labels = c("mi", "ms"), name = "alpha")
36
GROWTH PART CONT'D
# Item thresholds: first two fixed at 0 and 1, third freely estimated
thresh <- mxMatrix(type = "Full", nrow = nthresh, ncol = nocc,
                   free = rep(c(FALSE, FALSE, TRUE), nocc), values = rep(c(0, 1, 1.1), nocc),
                   lbound = .0001, labels = labFun("th", nthresh, nocc), name = "thresh")
cov <- mxAlgebra(lambda %*% phi %*% t(lambda) + theta, name = "cov")
mean <- mxAlgebra(alpha %*% t(lambda), name = "mean")
# Normal expectation with thresholds for the ordinal items (ordgsmsData, the raw ordinal
# data, is assumed to be loaded already); vector = TRUE returns one likelihood per person
expect <- mxExpectationNormal(covariance = "cov", means = "mean",
                              dimnames = names(ordgsmsData), thresholds = "thresh")
fitFun <- mxFitFunctionML(vector = TRUE)
lgc <- mxModel("LGC", lamda, phi, theta, alpha, thresh, cov, mean, expect, fitFun)
37
CLASS-SPECIFIC MODEL
class1 <- mxModel(lgc, name = "Class1")
class1 <- omxSetParameters(class1, labels = c("vi", "vs", "mi", "ms"),
                           values = c(0.01, 0.05, 0.14, 0.32),
                           newlabels = c("vi1", "vs1", "mi1", "ms1"))
As in LCA, repeat this for each latent class (e.g., class2 with name = "Class2"), changing the class number and starting values accordingly.
38
CLASS PROPORTIONS
# Fix one proportion to 1 (for identification) and estimate the other
classP <- mxMatrix("Full", nrow = nclass, ncol = 1, free = c(TRUE, FALSE), values = 1,
                   lbound = 0.001, labels = c("p1", "p2"), name = "Props")
# Rescale the class proportions into class probabilities by dividing by their sum
# (done with a Kronecker product of the class proportions and 1/sum)
classS <- mxAlgebra(Props %x% (1 / sum(Props)), name = "classProbs")
39
CLASS-SPECIFIC OBJECTIVES
# Mixture -2LL: per-person class likelihoods weighted by the class probabilities
sumll <- mxAlgebra(-2 * sum(log(classProbs[1, 1] %x% Class1.fitfunction +
                                classProbs[2, 1] %x% Class2.fitfunction)), name = "sumll")
# Use the algebra as the model's fit function
obj <- mxFitFunctionAlgebra("sumll")
40
FINISH IT OFF
# Put it all in one model
gmm <- mxModel("GMM 2 Class", mxData(observed = ordgsmsData, type = "raw"),
               class1, class2, classP, classS, sumll, obj)
# Run it
gmmFit <- mxRun(gmm, unsafe = TRUE)
# Run it again, using the estimates from the previous run as starting values
summary(gmmFit2 <- mxRun(gmmFit))
41
DIFFERENCE BETWEEN GMM AND FMM?
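[Path diagrams. Intercept-only growth mixture model: latent class C influences an intercept factor I (variance σ²Int) that loads 1 on each of xT1 to xT5. Factor mixture model: latent class C influences a factor F (variance σ²F) measured by x1 to x5.]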
42
GMM AND FMM
• The difference between the two models shown on the previous slide is that the factor loadings are fixed to 1 in the GMM, whereas in the FMM they are freely estimated
• Adjust the script by letting the values of the lambda matrix be freely estimated
• To run the FMM on the previous slide, as in factor analysis, you need to fix a parameter so the model is identified
  • Restrict the factor means in the two classes to set the metric of the factor
43
FMM & MEASUREMENT INVARIANCE
• Clark et al. (in press)
• In the previous version, the item thresholds were measurement invariant across classes
  • Classes were differentiated by differences in the mean and variance of the factor
• Can also have models with measurement non-invariant thresholds
  • Classes arise because of differences in item thresholds
  • Add the thresholds to the class-specific statements
  • Need to restrict the factor mean to zero, because the factor mean and the item thresholds cannot both be identified
44
HOW DO WE EXTRACT CLASS PROBABILITIES AND CALCULATE ENTROPY IN OPENMX?
Ryne Estabrook
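The computation behind individual class probabilities is Bayes' rule. A person's posterior probability for class k is the estimated class probability times that person's likelihood under class k, divided by the same product summed over classes. Entropy then summarizes how cleanly people are classified. The normalized version used here is 1 minus the total classification uncertainty divided by n·ln(K), so values near 1 mean clear-cut assignment.

The base R sketch below takes the per-person class likelihoods as given. In OpenMx they would have to be pulled from the fitted model, for example by evaluating each class's vector fit function with mxEval(). That extraction step is an assumption here and is not shown. The likelihoods and priors in the example are made up.

```r
classPosteriors <- function(lik, prior) {
  joint <- sweep(lik, 2, prior, `*`)     # p_k * L_ik for each person and class
  joint / rowSums(joint)                 # normalize each row to sum to 1
}
relativeEntropy <- function(post) {
  n <- nrow(post); K <- ncol(post)
  plogp <- ifelse(post > 0, post * log(post), 0)   # treat 0 * log(0) as 0
  1 - sum(-plogp) / (n * log(K))
}

# Made-up likelihoods for 3 people (rows) under 2 classes (columns)
lik   <- matrix(c(0.200, 0.01,
                  0.050, 0.05,
                  0.001, 0.30), ncol = 2, byrow = TRUE)
prior <- c(0.6, 0.4)                     # estimated class probabilities (made up)
post  <- classPosteriors(lik, prior)
round(post, 3)
relativeEntropy(post)
modal <- max.col(post)                   # most likely class for each person
modal
```

The second person has equal likelihoods under both classes, so their posterior simply equals the prior. People like this pull entropy down.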
45
OPENMX EXERCISE/HOMEWORK
• Adjust the GMM_example.R script to include:
  • A quadratic growth function
  • A third class
• Run it
• Re-run it
• Interpret the output
  • What are the classes?