A measure to evaluate latent variable model fit by sensitivity analysis
Daniel Oberski
Department of Methodology and Statistics
Dept of Statistics, Leiden University
Latent variable model fit by sensitivity analysis Daniel Oberski
Latent variable models: what do they assume and what are they good for?
Figure: latent variable ξ with indicators y1, y2, …, yJ.
$$p(\mathbf{y}) = \sum_{\xi} p(\xi) \prod_{j=1}^{J} p(y_j \mid \xi)$$
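Figure: latent variable ξ with indicators y1, y2, …, yJ, where y1 and y2 remain dependent given ξ.
$$p(\mathbf{y}) = \sum_{\xi} p(\xi)\, p(y_1, y_2 \mid \xi) \prod_{j=3}^{J} p(y_j \mid \xi)$$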
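As a minimal sketch of the two formulas above, assuming binary indicators and a discrete latent variable ξ with two classes (all probabilities are hypothetical), the code computes p(y) for one response pattern. Passing `pair_tables` replaces the product p(y1|ξ)p(y2|ξ) by a joint table p(y1, y2|ξ), i.e. the locally dependent model. `pattern_prob` and its arguments are illustrative names, not part of any software mentioned in these slides.

```python
import itertools

import numpy as np


def pattern_prob(y, class_probs, item_probs, pair_tables=None):
    """p(y) for binary items under a latent class model (hypothetical helper).

    class_probs[c]   = p(xi = c)
    item_probs[c, j] = p(y_j = 1 | xi = c)
    pair_tables[c]   = optional 2x2 array p(y1, y2 | xi = c); when given it replaces
                       p(y1 | xi) p(y2 | xi), so y1 and y2 are locally dependent.
    """
    y = np.asarray(y)
    total = 0.0
    for c, p_c in enumerate(class_probs):
        # p(y_j | xi = c) for every item j
        per_item = np.where(y == 1, item_probs[c], 1.0 - item_probs[c])
        if pair_tables is None:
            cond = per_item.prod()  # prod over j = 1..J
        else:
            cond = pair_tables[c][y[0], y[1]] * per_item[2:].prod()  # p(y1, y2 | xi) * prod over j = 3..J
        total += p_c * cond
    return total


# Hypothetical two-class model for four binary tests.
class_probs = np.array([0.3, 0.7])
item_probs = np.array([[0.90, 0.80, 0.85, 0.70],
                       [0.05, 0.10, 0.02, 0.01]])
# Hypothetical joint tables p(y1, y2 | xi); rows index y1, columns index y2.
pair_tables = np.array([[[0.05, 0.03], [0.07, 0.85]],
                        [[0.88, 0.04], [0.02, 0.06]]])

patterns = list(itertools.product([0, 1], repeat=4))
print(sum(pattern_prob(y, class_probs, item_probs) for y in patterns))               # 1 up to floating point
print(sum(pattern_prob(y, class_probs, item_probs, pair_tables) for y in patterns))  # 1 up to floating point
```

Feeding in joint tables that are just the outer product of the two margins gives back the conditional independence model, which is one way to see that it is a special case of the second formula.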
Example
Goal: estimate false positives and false negatives in four diagnostic tests for C. trachomatis infection:
y1 Ligase chain reaction (LCR) test (Yes/No);
y2 Polymerase chain reaction (PCR) test (Yes/No);
y3 DNA probe test (DNAP) (Yes/No);
y4 Culture (CULT) (Yes/No).
Tool: two-class latent class model (diseased or non-diseased).
(Original data from Dendukuri et al. 2009)
Assume: ξ measured by y1, y2, …, yJ, which are independent given ξ.
But really: some of y1, y2, …, yJ remain dependent given ξ.
What difference does it make for the goal: false positives and false negatives? (simulation by Van Smeden et al., submitted)
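Figure: latent variable ξ, predicted by covariate x, with indicators y1, y2, …, yJ.
$$p(\mathbf{y} \mid x) = \sum_{\xi} p(\xi \mid x) \prod_{j=1}^{J} p(y_j \mid \xi)$$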
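Figure: latent variable ξ, predicted by covariate x, with indicators y1, y2, …, yJ that also depend directly on x.
$$p(\mathbf{y} \mid x) = \sum_{\xi} p(\xi \mid x) \prod_{j=1}^{J} p(y_j \mid \xi, x)$$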
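A minimal sketch of the covariate models, assuming binary items, two latent classes, a scalar covariate x, and logit links chosen purely for illustration (the slides do not specify the links). Setting `direct_effects` to zero gives the model with p(yj|ξ); a nonzero entry gives p(yj|ξ, x). All names and numbers are hypothetical.

```python
import itertools

import numpy as np


def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-t))


def pattern_prob_given_x(y, x, alpha, gamma, item_intercepts, direct_effects):
    """p(y | x) for binary items, two latent classes and a scalar covariate x.

    p(xi = 1 | x)          = expit(alpha + gamma * x)
    p(y_j = 1 | xi = c, x) = expit(item_intercepts[c, j] + direct_effects[j] * x)
    All-zero direct_effects gives p(y_j | xi); otherwise p(y_j | xi, x).
    """
    y = np.asarray(y)
    p1 = expit(alpha + gamma * x)
    total = 0.0
    for c, p_c in enumerate([1.0 - p1, p1]):
        p_item = expit(item_intercepts[c] + direct_effects * x)
        total += p_c * np.where(y == 1, p_item, 1.0 - p_item).prod()
    return total


# Hypothetical parameters: three items, class 1 endorses them more often.
item_intercepts = np.array([[-2.0, -1.5, -2.0],
                            [1.5, 2.0, 1.0]])
no_dif = np.zeros(3)
dif = np.array([0.8, 0.0, 0.0])  # hypothetical direct effect of x on item 1
patterns = list(itertools.product([0, 1], repeat=3))


def p_item1(x, effects, gamma=0.0):
    """Marginal p(y1 = 1 | x): sum of p(y | x) over all patterns with y1 = 1."""
    return sum(pattern_prob_given_x(y, x, 0.0, gamma, item_intercepts, effects)
               for y in patterns if y[0] == 1)


print(p_item1(0, no_dif), p_item1(1, no_dif))  # equal: gamma = 0 and no direct effect
print(p_item1(0, dif), p_item1(1, dif))        # differ, although p(xi | x) is the same
```

With γ = 0 the latent distribution is identical for both values of x, so any difference in p(y1 = 1 | x) comes from the direct effect alone; a model without that effect only has γ available to reproduce it.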
Example
Goal: estimate gender differences in "valuing Stimulation":
Response scale: (1) Very much like me; (2) Like me; (3) Somewhat like me; (4) A little like me; (5) Not like me; (6) Not like me at all.
impdiff (S)he likes surprises and is always looking for new things to do. (S)he thinks it is important to do lots of different things in life.
impadv (S)he looks for adventures and likes to take risks. (S)he wants to have an exciting life.
Tool: Structural Equation Model for European Social Survey data (n = 18519 men and 16740 women). (Original study by Schwartz et al. 2005)
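Assume: ξ, predicted by x, measured by y1, y2, …, yJ, with no direct effects of x on the indicators.
But really (?): x may also affect some of the indicators directly.
What difference does it make for the goal: true gender differences in values? (re-analysis of data by Oberski 2014)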
Figure: latent mean difference estimate ± 2 s.e. per "Human value" factor (AC, PO, ST, SD, HE, CO, TR, SE, UN, BE), on a scale from −0.2 to 0.2 marked "Men value more" and "Women value more", under two models: "Scalar invariance" and "Free intercept 'Adventure'".
PROBLEM
The original authors found that the conditional independence model fit the data "approximately" (p. 1013)...
"Chi-square deteriorated significantly, Δχ²(19) = 3313, p < .001, but CFI did not change. Change in chi-square is highly sensitive with large sample sizes and complex models. The other indices suggested that scalar invariance might be accepted (CFI = .88, RMSEA = .04, CI = .039–.040, PCLOSE = 1.0)."
... but unfortunately this "acceptable" misspecification could reverse their conclusions!
Numbers that indicate how well the model fits the data:
• Likelihood ratio vs. saturated model
• Information-based criteria: AIC, BIC, CAIC, ...
• Bivariate residuals (Maydeu-Olivares & Joe 2005; Oberski, Van Kollenburg & Vermunt 2013)
• Score/Lagrange multiplier tests, "modification index", "expected parameter change" (EPC) (Saris, Satorra & Sörbom 1989; Oberski & Vermunt 2013; Oberski & Vermunt accepted)
"Fit indices":
• RMSEA: $\sqrt{\dfrac{\chi^2/df - 1}{N - 1}}$
• CFI: $\dfrac{(\chi^2_{null} - df_{null}) - (\chi^2 - df)}{\chi^2_{null} - df_{null}}$
• Lots of others: TLI, NFI, NNFI, RFI, IFI, RNI, RMR, SRMR1-3, GFI, AGFI, MFI, ECVI, ...
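The two fit index formulas above, written out as a short Python sketch. One detail is an assumption not on the slide: RMSEA is truncated at zero when χ² < df, a common convention. Software implementations of CFI also usually bound it to [0, 1], which is omitted here. The inputs are made-up numbers.

```python
import math


def rmsea(chi2, df, n):
    """RMSEA = sqrt((chi2/df - 1) / (N - 1)), truncated at zero when chi2 < df."""
    return math.sqrt(max(chi2 / df - 1.0, 0.0) / (n - 1))


def cfi(chi2, df, chi2_null, df_null):
    """CFI = [(chi2_null - df_null) - (chi2 - df)] / (chi2_null - df_null), as on the slide."""
    return ((chi2_null - df_null) - (chi2 - df)) / (chi2_null - df_null)


# Made-up values for illustration.
print(rmsea(250.0, 100, 1001))       # sqrt(1.5 / 1000), about 0.039
print(cfi(250.0, 100, 2100.0, 120))  # 1830 / 1980, about 0.924
```

Both indices summarize the distance between model and data; neither says anything about whether a particular misspecification affects the parameters of interest.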
What is the problem?
• We do latent variable modeling with a goal in mind.
• But the latent variable model might be misspecified.
• The appropriate question: "Will that affect my goal?"
• The actual question: "Do the data fit the model in the population?" (LR) or "Are the model and the data far apart relative to model complexity?" (RMSEA etc.)
What is the solution?
Evaluate directly what effect possible misspecifications have on the goal of the analysis.
How to evaluate directly what effect possible misspecifications have on the goal of the analysis.
Two ideas to evaluate the effect of misspecifications
1. Try out all possible models with misspecifications, calculate the estimates of interest under these models, and evaluate whether these are substantively different.
Advantage: does the job.
Disadvantage: there may be too many alternative models. Also: are applied researchers really going to do this?
2. Use the EPC-interest: the expected change in the parameters of interest when a restriction is freed.
Advantage: does the job without the need to estimate any alternative models.
Disadvantage: it is an approximation (though a reasonable one).
EPC-interest applied to Stimulation example
• After fitting the full scalar invariance model, the effect size estimate of the sex difference in Stimulation is +0.214 (s.e. 0.0139).
• But the EPC-interest of the equal "Adventure" item intercept is -0.243.
• So the EPC-interest suggests the conclusion can be reversed by freeing a misspecified scalar invariance restriction.
• The actual change when freeing this intercept is very close to the EPC-interest: -0.235.
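The arithmetic behind these bullets, as a short sketch: adding each change to the original estimate gives the predicted and the observed sex difference after freeing the "Adventure" intercept. Both are negative, so the sign of the conclusion flips.

```python
estimate = 0.214        # sex difference in Stimulation, full scalar invariance model
epc_interest = -0.243   # EPC-interest for freeing the equal 'Adventure' intercept
actual_change = -0.235  # change observed after actually freeing that intercept

predicted = estimate + epc_interest
observed = estimate + actual_change
print(f"predicted: {predicted:+.3f}, observed: {observed:+.3f}")  # predicted: -0.029, observed: -0.021
```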
EPC-interest: how does it work?
• Say there is a restricted model whose purpose is to estimate its parameters, θ, or some linear function of them such as a subselection, Pθ.
• We can parameterize the restrictions as ψ = 0. For example, ψ could be the direct effect of gender on "Adventure", or the loglinear dependence between DNA tests.
• The maximum likelihood estimates are then $\hat{\theta} = \arg\max_{\theta} L(\theta, \psi = 0)$.
Question: how much would $\hat{\theta}$ change if we freed ψ?
How much would θ change if we freed ψ?
The trick is to consider the estimate of θ we would get with ψ freed; that is, $(\tilde{\theta}, \tilde{\psi}) = \arg\max_{\theta, \psi} L(\theta, \psi)$.
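As it turns out, we don't actually need to compute $\tilde{\theta}$, since
$$\tilde{\theta} - \hat{\theta} = H_{\theta\theta}^{-1} H_{\theta\psi} D^{-1} \left[ \frac{\partial L(\theta, \psi)}{\partial \psi} \bigg|_{\theta = \hat{\theta},\, \psi = 0} \right] + O(\delta'\delta),$$
where H is the Hessian of L, $D = H_{\psi\psi} - H_{\theta\psi}' H_{\theta\theta}^{-1} H_{\theta\psi}$, and δ is the "overall wrongness" of the model, $(\psi', \theta' - \hat{\theta}')'$.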
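One way to see where this expression comes from, sketched as a first-order expansion of the score g = ∂L/∂(θ, ψ) around the restricted estimates (θ̂, 0), with H evaluated there. Substituting the first line into the second gives the middle result, and substituting that back gives the formula above.

```latex
% g = \partial L / \partial(\theta, \psi) is the score, H the Hessian, both at (\hat\theta, 0).
% Restricted MLE: g_\theta(\hat\theta, 0) = 0.   Freed MLE: g(\tilde\theta, \tilde\psi) = 0.
\begin{aligned}
0 &= g_\theta(\hat\theta, 0) + H_{\theta\theta}(\tilde\theta - \hat\theta) + H_{\theta\psi}\tilde\psi + O(\delta'\delta)
  &&\Rightarrow\quad \tilde\theta - \hat\theta \approx -H_{\theta\theta}^{-1} H_{\theta\psi}\tilde\psi, \\
0 &= g_\psi(\hat\theta, 0) + H_{\theta\psi}'(\tilde\theta - \hat\theta) + H_{\psi\psi}\tilde\psi + O(\delta'\delta)
  &&\Rightarrow\quad \tilde\psi \approx -D^{-1} g_\psi(\hat\theta, 0), \\
\tilde\theta - \hat\theta &\approx H_{\theta\theta}^{-1} H_{\theta\psi} D^{-1} g_\psi(\hat\theta, 0),
  &&\text{with } D = H_{\psi\psi} - H_{\theta\psi}' H_{\theta\theta}^{-1} H_{\theta\psi}.
\end{aligned}
```

Every quantity on the right-hand side comes from the restricted model, which is why θ̃ never has to be computed.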
How much would θ change if we freed ψ?
Dropping the remainder term (assuming the model parameters are not "too far" from the truth), we get the approximation
$$\text{EPC-interest} = -P H_{\theta\theta}^{-1} H_{\theta\psi}\, \text{EPC-self} \approx -P H_{\theta\theta}^{-1} H_{\theta\psi} (\tilde{\psi} - \hat{\psi}).$$
For those of you familiar with Structural Equation Modeling (or who attended my 2013 MBC2 talk), "EPC-self" is the usual "expected parameter change" in the fixed parameter vector, i.e. the size of the misspecification.
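A toy numerical check of the EPC-interest formula, assuming a linear regression with known unit error variance in place of a latent variable model. There the log-likelihood is exactly quadratic, so the dropped remainder is zero and the EPC-interest should equal the change obtained by actually refitting with ψ free. Here θ holds the intercept and slope of X, ψ is the coefficient of an omitted variable z, and P picks out the slope. `epc_interest_linear` is a hypothetical helper, not the lavaan or Latent Gold implementation.

```python
import numpy as np


def epc_interest_linear(X, z, y, P):
    """EPC-self and EPC-interest for freeing psi in y = X theta + z psi + e.

    Assumes unit error variance, so log L = -0.5 * ||y - X theta - z psi||^2
    up to a constant. The restricted model fixes psi = 0.
    """
    theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # restricted ML estimate
    g_psi = z.T @ (y - X @ theta_hat)                   # score for psi at (theta_hat, 0)
    H_tt = -X.T @ X                                     # Hessian blocks of log L
    H_tp = -X.T @ z
    H_pp = -z.T @ z
    D = H_pp - H_tp.T @ np.linalg.solve(H_tt, H_tp)
    epc_self = -np.linalg.solve(D, g_psi)               # expected value of psi once freed
    epc_int = -P @ np.linalg.solve(H_tt, H_tp @ epc_self)
    return theta_hat, epc_self, epc_int


rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
z = (0.5 * X[:, 1] + rng.normal(size=n)).reshape(-1, 1)  # omitted variable, correlated with X
y = X @ np.array([1.0, 2.0]) + 0.7 * z[:, 0] + rng.normal(size=n)
P = np.array([[0.0, 1.0]])                                # interest: the slope on X[:, 1]

theta_hat, epc_self, epc_int = epc_interest_linear(X, z, y, P)
theta_full, *_ = np.linalg.lstsq(np.column_stack([X, z]), y, rcond=None)  # refit with psi free
actual_change = P @ (theta_full[:2] - theta_hat)
print(epc_int, actual_change)  # identical up to floating point in this quadratic case
```

In latent variable models the log-likelihood is not quadratic, so the O(δ′δ) term no longer vanishes and the EPC-interest is an approximation rather than an identity.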
Monte Carlo simulation: EPC-interest is a good approximation to the actual change in parameters of interest when freeing an equality restriction
Average over 200 replications
∆ν1   ng    EPC-self   ∆α      ∆α bias   EPC-interest   EPC-interest bias
0.1   50    0.064      0.240   -0.040    -0.034          0.005
0.3   50    0.213      0.313   -0.113    -0.113         -0.001
0.8   50    0.657      0.505   -0.305    -0.401         -0.096
0.1   100   0.058      0.231   -0.031    -0.031          0.000
0.3   100   0.203      0.323   -0.123    -0.109          0.014
0.8   100   0.619      0.492   -0.292    -0.370         -0.077
0.1   500   0.063      0.233   -0.033    -0.033          0.000
0.3   500   0.208      0.307   -0.107    -0.112         -0.005
0.8   500   0.598      0.501   -0.301    -0.349         -0.048
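Reading the table as reported, the last column equals the EPC-interest column minus the "∆α bias" column, up to rounding in the third decimal. The sketch below checks this and summarizes the largest gap per level of ∆ν1, using only the numbers in the table.

```python
from collections import defaultdict

# (dnu1, n_g, EPC-self, d_alpha, d_alpha_bias, EPC-interest, EPC-interest bias), copied from the table
rows = [
    (0.1, 50, 0.064, 0.240, -0.040, -0.034, 0.005),
    (0.3, 50, 0.213, 0.313, -0.113, -0.113, -0.001),
    (0.8, 50, 0.657, 0.505, -0.305, -0.401, -0.096),
    (0.1, 100, 0.058, 0.231, -0.031, -0.031, 0.000),
    (0.3, 100, 0.203, 0.323, -0.123, -0.109, 0.014),
    (0.8, 100, 0.619, 0.492, -0.292, -0.370, -0.077),
    (0.1, 500, 0.063, 0.233, -0.033, -0.033, 0.000),
    (0.3, 500, 0.208, 0.307, -0.107, -0.112, -0.005),
    (0.8, 500, 0.598, 0.501, -0.301, -0.349, -0.048),
]

worst = defaultdict(float)
for dnu1, _, _, _, d_alpha_bias, epc_int, epc_int_bias in rows:
    gap = epc_int - d_alpha_bias
    assert abs(gap - epc_int_bias) <= 0.0015  # last column reproduced up to rounding
    worst[dnu1] = max(worst[dnu1], abs(gap))

for dnu1 in sorted(worst):
    print(f"dnu1 = {dnu1}: largest |EPC-interest - d_alpha bias| = {worst[dnu1]:.3f}")
```

The gap stays at or below 0.014 for ∆ν1 ≤ 0.3 and reaches about 0.1 at ∆ν1 = 0.8, consistent with the approximation becoming less accurate as the misspecification grows.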
Another example showcasing EPC-interest
Ranking data in 48 World Values Survey (WVS) countries
Option #, M/P (M = materialist, P = postmaterialist), value wording:
Set A
1. M A high level of economic growth
2. M Making sure this country has strong defense forces
3. P Seeing that people have more say about how things are done at their jobs and in their communities
4. P Trying to make our cities and countryside more beautiful
Set B
1. M Maintaining order in the nation
2. P Giving people more say in important government decisions
3. M Fighting rising prices
4. P Protecting freedom of speech
Set C
1. M A stable economy
2. P Progress toward a less impersonal and more humane society
3. P Progress toward a society in which ideas count more than money
4. M The fight against crime
Figure: Graphical representation of the multilevel latent class regression model for (post)materialism measured by three partial ranking tasks. Observed variables are shown in rectangles, while unobserved ("latent") variables are shown in ellipses.
Latent class ranking model with 4 choices
Each ranking set, for example, set A:
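$$P(A1_{ic} = a_1, A2_{ic} = a_2 \mid X_{ic} = x) = \frac{\omega_{a_1 x}}{\sum_k \omega_{kx}} \cdot \frac{\omega_{a_2 x}}{\sum_{k \neq a_1} \omega_{kx}},$$
where $\omega_{kx}$ is the "utility" of object k for respondents in class x.
Multilevel structure to account for the countries, using the group class variable G:
$$P(X_{ic} = x \mid Z1_{ic} = z_1, Z2_{ic} = z_2, G_c = g) = \frac{\exp(\alpha_x + \gamma_{1x} z_1 + \gamma_{2x} z_2 + \beta_{gx})}{\sum_t \exp(\alpha_t + \gamma_{1t} z_1 + \gamma_{2t} z_2 + \beta_{gt})}.$$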
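A minimal sketch of the two formulas on this slide: the probability of the first two choices in one ranking set given a class, and the multinomial logit for class membership. The utilities and coefficients below are hypothetical, and the function names are illustrative rather than Latent Gold's.

```python
import itertools

import numpy as np


def first_two_choices_prob(a1, a2, omega):
    """P(first choice a1, second choice a2 | class x); omega[k] = utility of object k in class x."""
    total = omega.sum()
    return omega[a1] / total * omega[a2] / (total - omega[a1])


def class_membership_probs(z1, z2, g, alpha, gamma1, gamma2, beta):
    """P(X = x | Z1 = z1, Z2 = z2, G = g) for all classes x (multinomial logit).

    alpha, gamma1, gamma2: arrays over classes x; beta[g, x]: effect of group class g on class x.
    """
    eta = alpha + gamma1 * z1 + gamma2 * z2 + beta[g]
    w = np.exp(eta - eta.max())  # subtracting the max leaves the ratios unchanged
    return w / w.sum()


omega = np.array([4.0, 3.0, 1.0, 2.0])      # hypothetical utilities for one class, four options
print(first_two_choices_prob(0, 1, omega))   # (4 / 10) * (3 / 6), i.e. 0.2 up to floating point
pairs = itertools.permutations(range(len(omega)), 2)
print(sum(first_two_choices_prob(a1, a2, omega) for a1, a2 in pairs))  # 1 up to floating point

alpha = np.array([0.0, 0.2, -0.1])   # hypothetical, first class as reference
gamma1 = np.array([0.0, 0.3, -0.2])
gamma2 = np.array([0.0, -0.1, 0.4])
beta = np.zeros((2, 3))              # two hypothetical group classes, no group effect
print(class_membership_probs(0.5, -1.0, 1, alpha, gamma1, gamma2, beta))  # sums to 1
```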
Multilevel latent class model w/ covariates for rankings
$$L(\theta) = P(A1, A2, B1, B2, C1, C2 \mid Z1, Z2) = \prod_{c=1}^{C} \sum_{G} P(G_c) \prod_{i=1}^{n_c} \sum_{X} P(X_{ic} \mid Z1_{ic}, Z2_{ic}, G_c)\, P(A1_{ic}, A2_{ic} \mid X_{ic})\, P(B1_{ic}, B2_{ic} \mid X_{ic})\, P(C1_{ic}, C2_{ic} \mid X_{ic}).$$
Goal: estimate γ (especially its sign).
Possible problem: violations of scalar and metric measurement invariance (DIF), parameterized as τ* and λ*, respectively.
Solution: see whether these matter for the sign of γ.
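A sketch of one country's contribution to this likelihood, assuming the class-membership probabilities per respondent and the product of the three ranking probabilities per class have already been computed (for example from the two formulas on the previous slide). It works on the log scale, since a product over n_c respondents underflows quickly. `country_loglik` and its array layout are hypothetical; the full log-likelihood would be the sum of these contributions over countries c = 1, …, C.

```python
import numpy as np


def logsumexp(a, axis):
    """log(sum(exp(a))) along an axis, computed stably."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(a - m).sum(axis=axis))


def country_loglik(log_p_group, log_p_class, log_p_rank):
    """Log-likelihood contribution of one country c (hypothetical array layout).

    log_p_group[g]       = log P(G_c = g)
    log_p_class[g, i, x] = log P(X_ic = x | Z1_ic, Z2_ic, G_c = g)
    log_p_rank[i, x]     = log[P(A1, A2 | x) P(B1, B2 | x) P(C1, C2 | x)] for respondent i
    """
    # sum over X for each respondent i, given group class g -> shape (G, n_c)
    per_respondent = logsumexp(log_p_class + log_p_rank[np.newaxis, :, :], axis=2)
    # product over respondents, then sum over G
    return logsumexp(log_p_group + per_respondent.sum(axis=1), axis=0)


rng = np.random.default_rng(1)
n_groups, n_c, n_classes = 2, 5, 3
p_group = np.array([0.4, 0.6])
p_class = rng.dirichlet(np.ones(n_classes), size=(n_groups, n_c))  # rows sum to 1 over classes
p_rank = rng.uniform(0.01, 0.2, size=(n_c, n_classes))             # made-up ranking probabilities
print(country_loglik(np.log(p_group), np.log(p_class), np.log(p_rank)))
```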
Table: Full invariance multilevel latent class model: parameter estimates of interest with standard errors (columns 3 and 4), as well as the expected change in these parameters measured by the EPC-interest when freeing each of six sets of possible misspecifications (columns 5–10).
τ*(k) = EPC-interest for τ*jkg in ranking task k; λ*(k) = EPC-interest for λ*jkxg in ranking task k.
Class    Covariate  Est.    (s.e.)    τ*(1)   τ*(2)   τ*(3)   λ*(1)   λ*(2)   λ*(3)
Class 1  GDP        -0.035  (0.007)  -0.013   0.021  -0.002   0.073   0.252   0.005
Class 2  GDP        -0.198  (0.012)  -0.018  -0.035   0.015  -0.163  -0.058   0.002
Class 1  Women       0.013  (0.001)  -0.006   0.002   0.000  -0.003   0.029   0.002
Class 2  Women      -0.037  (0.001)   0.007  -0.003   0.002  -0.006  -0.013   0.002
Table: Partially invariant multilevel latent class model: parameter estimates of interest with standard errors (columns 3 and 4), as well as the expected change in these parameters measured by the EPC-interest when freeing each of four sets of remaining possible misspecifications (columns 5–7 and 10).
τ*(k) = EPC-interest for non-invariance τ*kg in ranking task k; λ*(k) = EPC-interest for non-invariance λ*kxg in ranking task k; "–" = not reported.
Class    Covariate  Est.    (s.e.)    τ*(1)   τ*(2)   τ*(3)   λ*(1)   λ*(2)   λ*(3)
Class 1  GDP        -0.127  (0.008)  -0.015  -0.003   0.002     –       –     0.097
Class 2  GDP         0.057  (0.011)  -0.043  -0.013   0.002     –       –     0.161
Class 1  Women       0.008  (0.001)  -0.002   0.000   0.002     –       –     0.001
Class 2  Women       0.020  (0.001)  -0.007  -0.001   0.002     –       –     0.007
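Since the goal is the sign of γ, one simple reading of these two tables is to ask, for each estimate and each set of restrictions, whether adding the EPC-interest would flip the sign. A sketch of that check, using only the table values: it treats each set separately (each EPC-interest refers to freeing one set at a time), the sign-flip criterion is an illustrative choice rather than a rule stated in the slides, and unreported columns are entered as None.

```python
# Columns: EPC-interest for tau* (ranking tasks 1-3) and lambda* (ranking tasks 1-3).
COLUMNS = ["tau* task 1", "tau* task 2", "tau* task 3",
           "lambda* task 1", "lambda* task 2", "lambda* task 3"]

# (estimate, EPC-interest per column), copied from the two tables; None = not reported.
full_invariance = {
    "Class 1 GDP": (-0.035, [-0.013, 0.021, -0.002, 0.073, 0.252, 0.005]),
    "Class 2 GDP": (-0.198, [-0.018, -0.035, 0.015, -0.163, -0.058, 0.002]),
    "Class 1 Women": (0.013, [-0.006, 0.002, 0.000, -0.003, 0.029, 0.002]),
    "Class 2 Women": (-0.037, [0.007, -0.003, 0.002, -0.006, -0.013, 0.002]),
}
partial_invariance = {
    "Class 1 GDP": (-0.127, [-0.015, -0.003, 0.002, None, None, 0.097]),
    "Class 2 GDP": (0.057, [-0.043, -0.013, 0.002, None, None, 0.161]),
    "Class 1 Women": (0.008, [-0.002, 0.000, 0.002, None, None, 0.001]),
    "Class 2 Women": (0.020, [-0.007, -0.001, 0.002, None, None, 0.007]),
}


def sign_flips(table):
    """(parameter, column) pairs where estimate + EPC-interest has the opposite sign of the estimate."""
    return [(name, col)
            for name, (est, epcs) in table.items()
            for col, epc in zip(COLUMNS, epcs)
            if epc is not None and est * (est + epc) < 0]


print(sign_flips(full_invariance))     # Class 1 GDP, lambda* tasks 1 and 2
print(sign_flips(partial_invariance))  # []
```

Under the full invariance model, two sets of loading restrictions could reverse the sign of the Class 1 GDP effect; under the partially invariant model, none of the remaining sets does.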
Figure: Estimated probability of choosing each class (Materialist, Postmaterialist, Mixed) as a function of the covariates of interest (% women in parliament; GDP per capita; each from minimum to maximum covariate level) under the final model.
Figure: class posterior (0.0 to 0.8) of each of the 48 countries, labeled by ISO country code, against % women in parliament (0 to 40), in three panels: Class 1 ("Materialist"), Class 2 ("Postmaterialist"), Class 3 ("Mixed").
Figure: class posterior (0.0 to 0.8) of each of the 48 countries, labeled by ISO country code, against Ln(GDP per capita) (7 to 11), in three panels: Class 1 ("Materialist"), Class 2 ("Postmaterialist"), Class 3 ("Mixed").
What has been gained by using EPC-interest:
I am fairly confident here that there truly is "approximate measurement invariance", in the sense that any violations of measurement invariance do not bias the primary conclusions.
I think attaining this goal is the main purpose of model fit evaluation.
Conclusion
• Latent variable modeling is often performed for a purpose;
• Model fit evaluation should then be done because violations of assumptions can disturb this purpose.
• Introduced the EPC-interest to look into this;
• It evaluates the change in the parameter(s) of interest that would result if a restriction that parameterizes a potential violation of assumptions were freed.
Implemented in SEM software lavaan for R:
Oberski (2014). Evaluating sensitivity of parameters of interest to measurement invariance in latent variable models. Political Analysis, 22(1).
Implemented in LCA software Latent Gold:
Oberski, Vermunt & Moors (submitted). Evaluating measurement invariance in categorical data latent variable models with the EPC-interest. Under review.
Oberski & Vermunt (2014). A model-based approach to goodness-of-fit evaluation in item response theory. Measurement, 11, 117–122.
Nagelkerke, Oberski & Vermunt (accepted). Goodness-of-fit of multilevel latent class models for categorical data. Sociological Methodology.
Oberski & Vermunt (conditionally accepted). The expected parameter change (EPC) for local dependence assessment in binary data latent class models. Psychometrika.
Thank you for your attention!
Daniel Oberski, http://daob.nl/publications for full texts & code
SEM regression coefficient example
European Sociological Review 2008, 24(5), 583–599
SEM regression coefficient example
Figure: estimated regression coefficients (−1.0 to 1.0) in two panels, Conservation and Self-transcendence, for Sweden, Denmark, Austria, Switzerland, Netherlands, Germany, Ireland, Spain, Norway, Hungary, Finland, Portugal, France, Belgium, Slovenia, United Kingdom, Greece, Czech Republic and Poland.
SEM regression coefficient example
EPC-interest statistics of at least 0.1 in absolute value with respect to the latent variable regression coefficients.
Metric invariance (loading) restriction "Conditions → Work skills" in...
                                     Slovenia   France   Hungary   Ireland
EPC-interest w.r.t. Conditions →
  Self-transcendence                  -0.073    -0.092    -0.067     0.073
  Conservation                         0.144     0.139     0.123    -0.113
SEPC-self                              0.610     0.692     0.759    -0.514
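The screening rule in the caption (|EPC-interest| ≥ 0.1) can be sketched as a filter over this table. The threshold is the one stated in the caption; the dictionary layout is only an illustration.

```python
threshold = 0.1
# EPC-interest for freeing the loading "Conditions -> Work skills", copied from the table
epc_interest = {
    "Slovenia": {"Self-transcendence": -0.073, "Conservation": 0.144},
    "France": {"Self-transcendence": -0.092, "Conservation": 0.139},
    "Hungary": {"Self-transcendence": -0.067, "Conservation": 0.123},
    "Ireland": {"Self-transcendence": 0.073, "Conservation": -0.113},
}

flagged = {country: [param for param, value in values.items() if abs(value) >= threshold]
           for country, values in epc_interest.items()}
print(flagged)  # every country is flagged through the Conservation coefficient only
```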
SEM regression coefficient example
What has been gained by using EPC-interest
• Full metric invariance model: "close fit";
• EPC-interest still detects threats to cross-country comparisons of regression coefficients;
• MI and EPC-self do not detect these particular misspecifications;
• MI and EPC-self do detect other misspecifications;
• Looking at the EPC-interest reveals that these do not affect the cross-country comparisons of regression coefficients.