CONFIRMATORY FACTOR ANALYSIS: ONE FACTOR MODELS
PSYC 5130 Week 4 September 22, 2009
[Figure: one-factor path diagram. Latent factor G with four indicators (c, e, m, p), each with its own error term (e1 to e4).]
Conceptual Nature of Latent Variables
Latent variables correspond to some type of hypothetical construct
◦ Not measured directly
◦ Require a specific operational definition
Indicators of the construct need to be selected
Data from the indicators must be consistent with certain predictions (e.g., moderately correlated with one another)
Why use latent variables? (Brown 1-2, Kline 70-74)
Measurement error is inevitable and we must deal with it.
◦ Latent variables are advantageous because they are less influenced by measurement error.
Latent variables are the “real” variables, not measured variables.
Cannot use factor scores because they still have measurement error.
Ability to use latent variables is the primary strength of SEM.
Multi-Indicator Approach
Reduces the overall effect of measurement error of any individual observed variable on the accuracy of the results
We distinguish between observed variables (indicators) and underlying latent variables or factors (constructs)
Measurement model: the relations between the observed variables and the latent variables
[Figure: generic measurement model. One latent variable with three observed variables (Observed Variable 1 to 3), each with its own error term (ErrorVar1 to ErrorVar3).]
[Figure: example measurement model. Construct of interest: Verbal Abilities, with indicators Vocabulary Test, Analogies Test, and Writing Sample and error terms ErrorVT, ErrorAT, and ErrorST.]
Exploratory Factor Analysis (EFA)
EFA also has to do with the “latent” structure of a set of variables
◦ A set of variables is entered into the analysis
◦ Based on the correlations among these variables, factors are extracted
◦ Each factor represents a linear combination meant to account for as much of the variance in the set of entered variables as possible
In EFA, there is no a priori specification of how many latent factors there are or of how the measures relate to these factors
Exploratory Factor Analysis (EFA): Problems
◦ Communality: must know the communality before estimation, but communality is a function of the loadings
◦ Number of factors
◦ Rotation: when there are two or more factors, the solution is not unique
Exploratory Factor Analysis (EFA): Principal Components
◦ Communality is set to 1
◦ Factor is defined as a weighted sum of the variables
◦ Loadings are chosen to maximize the explanation of the variances of the measures
◦ Loadings are usually too “high” in that the predicted correlations are larger than the observed correlations
Exploratory Factor Analysis (EFA): Principal Factors
◦ Communality is iteratively estimated
◦ Factor is a latent variable
◦ Loadings are chosen to maximize the explanation of the correlations between the measures, that is, to minimize the sum of squared residuals
◦ Residual = observed correlation minus predicted correlation
Exploratory Factor Analysis (EFA): Maximum Likelihood
◦ Solution is iteratively estimated
◦ Factor is a latent variable
◦ Loadings are chosen to maximize the explanation of the correlations between the measures
◦ Tries harder to explain the larger correlations
◦ Statistical tests available
Exploratory Factor Analysis (EFA): Example (Bollen.sps)

Variable   Principal Components   Principal Axis   Maximum Likelihood
Overall    .905                   .877             .816
Clear      .865                   .811             .862
Color      .921                   .918             .938
Odor       .805                   .710             .684

Note that Principal Components loadings are generally larger than those from the other methods. Principal Axis (PA) and Maximum Likelihood (ML) loadings are fairly similar.
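To make the residual definition concrete: in a one-factor model with standardized variables and uncorrelated errors, the correlation implied between two measures is the product of their loadings. The sketch below applies that to the Maximum Likelihood loadings from the example table. The function names and the observed correlation passed to `residual` are hypothetical, chosen only for illustration; the observed Bollen correlations are not reproduced in these slides.

```python
from itertools import combinations

# Maximum Likelihood loadings copied from the example table.
ml_loadings = {"Overall": 0.816, "Clear": 0.862, "Color": 0.938, "Odor": 0.684}

def implied_correlations(loadings):
    """Correlation implied for each pair of measures by a one-factor model
    with standardized variables and uncorrelated errors: r_ij = l_i * l_j."""
    return {(a, b): loadings[a] * loadings[b] for a, b in combinations(loadings, 2)}

def residual(observed_r, predicted_r):
    """Residual = observed correlation minus predicted correlation."""
    return observed_r - predicted_r

implied = implied_correlations(ml_loadings)
for pair, r in implied.items():
    print(pair, round(r, 3))

# Hypothetical observed correlation, for illustration only (not from the data set).
print(round(residual(0.70, implied[("Overall", "Clear")]), 3))
```

A residual near zero means the factor reproduces that correlation well; large residuals point to pairs of measures the single factor does not explain.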
EFA vs. CFA
EFA is useful when the researcher does not know how many factors there are or when it is uncertain what measures load on what factors
EFA typically used as a “data reduction” strategy
Both EFA and CFA reduce a larger number of observed variables into a smaller number of latent factors
However, EFA is done with few a priori hypotheses; CFA requires an a priori specification based on hypotheses
Assumptions of CFA
◦ Multivariate normality
◦ Sufficient sample size
◦ Correct model specification
◦ Sampling assumptions: simple random sample
Confirmatory Factor Analysis
Representation in SEM
◦ Latent variable represented by a circle
◦ Measured variables (indicators) represented by a square
◦ Each indicator variable has an error term
CFA Initial Specification
◦ Each variable loads on one and only one factor
◦ Factors can be (and typically are) correlated
◦ Errors across indicator variables are independent
Assumptions
◦ The factors are uncorrelated with the measurement errors
◦ Most (if not all) of the errors of different indicators are uncorrelated with each other
[Figure: generic measurement model, repeated. One latent variable with three observed variables, each with its own error term.]
Residuals In CFA
Item-level residuals are represented as latent variables. They are not called disturbances. They represent measurement error in the EFA/CTT sense.
This is a tremendous advantage in hybrid models, which combine CFA and path models, because it separates measurement error from error in the model.
Errors vs. Disturbances
Both are residuals. Both are necessary in their respective roles.
Errors always represent measurement error.
Disturbances
◦ Represent omitted variables (in a hybrid model).
◦ If there are no error terms, measurement error will be in the disturbance (in a path model).
“Standard” Specification of CFA
One indicator is treated as a marker or reference variable (Brown p. 61, Kline p. 170): its loading is fixed to one
Which variable should you choose?
◦ Closest in meaning to the factor
◦ Most interpretable units of measurement
◦ Empirical: strongest correlations with other indicators
No test of statistical significance is available for the marker loading, because it is fixed rather than estimated
Factor variance is freely estimated
Error paths are set to one
Error variances are freely estimated
Standard Specification
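[Figure: standard specification of the one-factor model. Factor G with indicators c, e, m, p and error terms e1 to e4; the marker loading and the four error paths are fixed to 1.]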
Identification
Identification in CFA is largely determined by the number of indicator variables used in the model (more later).
Number of indicators
◦ 2 is the minimum
◦ 3 is safer, especially if factor correlations are weak
◦ 4 provides safety
◦ 5 or more is more than enough (if there are too many indicators, combine indicators into sets or parcels)
Identification
Overidentified model: knowns > unknowns
Number of knowns = number of variances and covariances of observed variables, computed as k(k+1)/2, where k is the number of observed variables
Number of unknowns (free parameters) is based on the specified model. It is typically a sum of the number of:
◦ exogenous variables (one variance estimated for each)
◦ endogenous variables (one error variance each)
◦ correlations between variables (one covariance for each pairing)
◦ regression paths (arrows linking exogenous variables to endogenous variables)
Latent variables add:
◦ indicator variables (one error variance for each)
◦ paths from latent variables to indicator variables (excluding those fixed to 1)
Parameters in a CFA Model
Factor loadings: the effect of the latent variable on an observed measure
◦ Can be unstandardized or standardized
◦ If a measure loads on only one factor, the standardized factor loading is the correlation between the measure and the factor (and the square root of the measure’s reliability).
Factor covariances or correlations: association between each pair of latent variables
Error variance: variance in the observed measure that is not explained by the latent variable (but not necessarily random or meaningless variance)
Correlated errors: correlation between a pair of error terms
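As a small illustration of how these parameters relate in the standardized solution: if a measure loads on a single factor, the squared standardized loading is the share of its variance explained by the factor (its reliability, per the point above), and one minus that value is its standardized error variance. The function name and the loading values below are hypothetical.

```python
def indicator_summary(std_loading):
    """For a measure loading on a single factor (standardized solution):
    explained variance (reliability) = loading squared,
    standardized error variance = 1 - loading squared."""
    explained = std_loading ** 2
    return {
        "loading": std_loading,
        "explained": explained,
        "error_variance": 1.0 - explained,
    }

# Hypothetical standardized loadings, for illustration only.
for lam in (0.9, 0.7, 0.5):
    s = indicator_summary(lam)
    print(f"loading={s['loading']:.2f} "
          f"explained={s['explained']:.2f} "
          f"error={s['error_variance']:.2f}")
```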
Parameters in a CFA Model
[Figure: CONFIRMATORY FACTOR ANALYSIS: One Factor of BSI Depression with all parameters labeled. Depression factor (variance V1) with seven indicators BSI2, BSI5, BSI8, BSI10, BSI14, BSI16, BSI18; free loadings W1 to W6; error terms e2, e5, e8, e10, e14, e16, e18 with variances V2 to V8 and error paths fixed to 1.]
Number of knowns = 28: (7*8)/2 = 28
Number of unknowns = 14
◦ Variance of latent factor (1)
◦ Free factor loadings (6)
◦ Variances of error terms (7)
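The counting above can be sketched as a short function. It assumes a single factor, marker-variable scaling (one loading fixed to 1), and no correlated errors; the function name is hypothetical.

```python
def one_factor_cfa_counts(k):
    """Count knowns and free parameters for a one-factor CFA with k indicators,
    marker-variable scaling (one loading fixed to 1), and no correlated errors."""
    knowns = k * (k + 1) // 2      # variances and covariances of observed variables
    factor_variance = 1
    free_loadings = k - 1          # the marker loading is fixed, not estimated
    error_variances = k
    unknowns = factor_variance + free_loadings + error_variances
    return knowns, unknowns, knowns - unknowns

# BSI depression example: 7 indicators.
print(one_factor_cfa_counts(7))  # (28, 14, 14)
```

The third value is the model's degrees of freedom (knowns minus unknowns). With three indicators the same count gives 6 knowns and 6 unknowns, so a single-factor model with three indicators has zero degrees of freedom.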
If a researcher’s measurement model is reasonably correct, then… (Kline)
1. Items (indicators) specified to measure a common underlying factor should have relatively high loadings on that factor (convergent validity)
2. Estimated correlations between the factors should not be excessively high (>.85) (discriminant validity)
Discriminant validity refers to the distinctiveness of the factors measured by different sets of indicators.
What to examine
Parameter estimates (all should be statistically significantly different from zero)
◦ loadings
◦ factor variance (one-tailed test)
◦ error variances (one-tailed test)
◦ error correlations (two-tailed test)
Check for Heywood cases (negative error variances)!
Problems In Estimation: Heywood Cases
Heywood cases: negative error variance (or a standardized loading larger than 1)
Why?
◦ Misspecification
◦ Outliers
◦ Small sample plus 2 indicators per factor
◦ Empirical under-identification
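A minimal sketch of the check itself, assuming the estimates have already been exported from the SEM program into two dictionaries keyed by indicator name. The function name, indicator names, and values are all hypothetical.

```python
def find_heywood_cases(error_variances, std_loadings):
    """Return names of indicators with a negative error variance
    or a standardized loading larger than 1 in absolute value."""
    flagged = []
    for name in error_variances:
        if error_variances[name] < 0 or abs(std_loadings[name]) > 1:
            flagged.append(name)
    return flagged

# Hypothetical estimates, for illustration only.
error_variances = {"x1": 0.42, "x2": -0.08, "x3": 0.31}
std_loadings = {"x1": 0.76, "x2": 1.04, "x3": 0.83}
print(find_heywood_cases(error_variances, std_loadings))  # ['x2']
```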
Problems In Estimation: Heywood Cases
How to eliminate Heywood cases:
◦ Search for specification error
  Sometimes a measure has correlated error
  Create 2 factors
◦ Fix error variance to zero
  Creates an extra df as one parameter is not estimated
  Need to adjust chi-square and fit indices
◦ Non-linear constraints that error variances cannot be negative (always in EQS)
◦ Set loadings equal (must use covariance matrix)
◦ Use an alternative estimation method besides ML
◦ Empirical underidentification: make sure correlations are not weak
AMOS Bollen Example
Respecification
Simpler Model
◦ Set loadings equal: use covariance matrix, and variables must be in the same metric
More Complex Model
◦ Correlated errors: a priori justification
◦ Second factor
Use diagnostics: residuals
These are nested
If the two-headed arrow in model b is set to 1, that would be saying there is only one latent trait.
Thus model b has one more path than model a.
Proposed 1 Factor Structure
Testing: Comparison of Nested Models
Two models
◦ Base Model
◦ More Complex Model (e.g., base model with additional paths): fewer df
If the base model is good fitting, then the more complex model must also be good fitting.
Chi-square and degrees of freedom are subtracted to test constraints made in the base model.
The more complex model should be a “good fitting” model; otherwise the conclusion is that one model is less poor than another.
Testing: Comparison of Nested Models
Chi-square difference test
◦ Run both models
◦ Subtract chi-square and df: Simple minus Complex
  Complex: more parameterized, less parsimonious
  Simpler: less parameterized, more parsimonious
◦ χ2 diff = n.s.: favor parsimonious model
◦ χ2 diff = sig.: favor more parameterized model
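The procedure can be sketched with the standard library alone. The upper-tail chi-square probability is computed here for integer df with a standard recurrence for the incomplete gamma function, which is an implementation choice of this sketch, not something the slides prescribe; in practice the SEM program or a statistics library would supply the p value. The function names and the fit values in the example are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df, using
    Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / gamma(a + 1), with h = x / 2."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)               # Q(1, h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))    # Q(1/2, h)
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q

def chi2_difference_test(chi2_simple, df_simple, chi2_complex, df_complex, alpha=0.05):
    """Subtract chi-square and df (simple minus complex) and test the difference."""
    diff = chi2_simple - chi2_complex
    df_diff = df_simple - df_complex
    if df_diff <= 0:
        raise ValueError("the simpler model must have more degrees of freedom")
    p = chi2_sf(diff, df_diff)
    favored = "more parameterized" if p < alpha else "more parsimonious"
    return diff, df_diff, p, favored

# Hypothetical fit results for two nested models, for illustration only.
diff, df_diff, p, favored = chi2_difference_test(45.2, 14, 30.1, 13)
print(round(diff, 1), df_diff, favored)
```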
Notes about Nested Models
Models must have the same variables
◦ Models will not be nested if your respecification involves deleting a variable(s)
Can add parameters or delete parameters, but cannot do both
Nested Models?
“Path Analytic” Specification
All loadings are freely estimated
Factor variances are set to one
Error paths are freely estimated:
◦ (Standardized) error path equals the square root of one minus the standardized factor loading squared
Error variances are set to one
Path Analytic Specification
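[Figure: path analytic specification of the one-factor model. Factor G with indicators c, e, m, p and error terms e1 to e4; the factor variance and the four error variances are fixed to 1.]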
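The marker-variable specification and the path analytic specification are two ways of scaling the same factor, so they should imply the same covariance matrix. The sketch below checks this numerically for a one-factor model. All parameter values and the function name are hypothetical, and each error term's contribution is written as an error variance (which equals the squared error path when the error variance is fixed to one).

```python
def implied_covariance(loadings, factor_variance, error_variances):
    """Model-implied covariance matrix of a one-factor model:
    cov(i, j) = l_i * l_j * factor_variance, plus the error variance on the diagonal."""
    k = len(loadings)
    return [
        [
            loadings[i] * loadings[j] * factor_variance
            + (error_variances[i] if i == j else 0.0)
            for j in range(k)
        ]
        for i in range(k)
    ]

# Hypothetical parameter values, for illustration only.
errors = [0.5, 0.6, 0.4, 0.7]

# Path analytic scaling: all loadings free, factor variance fixed to 1.
free_loadings = [0.8, 0.7, 0.9, 0.6]
sigma_a = implied_covariance(free_loadings, 1.0, errors)

# Marker-variable scaling: first loading fixed to 1, factor variance free.
marker = free_loadings[0]
marker_loadings = [l / marker for l in free_loadings]
sigma_b = implied_covariance(marker_loadings, marker ** 2, errors)

same = all(
    abs(a - b) < 1e-12
    for row_a, row_b in zip(sigma_a, sigma_b)
    for a, b in zip(row_a, row_b)
)
print(same)  # True
```

Because the implied matrices match, the choice between the two specifications changes how the estimates are scaled, not how well the model fits.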
4 Basic Steps for CFA
1. Define the factor model.
◦ Select the number of factors and determine the nature of the paths between the factors and the measures.
◦ Paths can be fixed at zero, fixed at another constant value, allowed to vary freely, or be allowed to vary under specified constraints (such as being equal to each other).
2. Fit the model to the data.
4 Basic Steps for CFA
3. Evaluate model adequacy.
◦ When the factor model is fit to the data, the factor loadings are chosen to minimize the discrepancy between the correlation matrix implied by the model and the actual observed matrix.
◦ The amount of discrepancy after the best parameters are chosen can be used as a measure of how consistent the model is with the data.
◦ Fit statistics
4 Basic Steps for CFA
4. Compare with other models.
◦ To compare two nested models, examine the difference between their χ2 statistics.
◦ Most tests of individual factor loadings can be made as comparisons of full and reduced factor models.
◦ For non-nested models, you can compare the root mean square error of approximation (RMSEA), an estimate of discrepancy per degree of freedom in the model, other fit indices, and the AIC and BIC.
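As a rough illustration of the RMSEA mentioned above, the sketch below uses one common formula, sqrt(max(χ2 - df, 0) / (df(N - 1))). Some programs use N rather than N - 1, so treat the exact form as an assumption and check your software's documentation. The function name and the fit values are hypothetical.

```python
import math

def rmsea(chi2, df, n):
    """One common RMSEA formula: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    if df <= 0:
        raise ValueError("RMSEA is undefined for models with no degrees of freedom")
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

# Hypothetical fit results for two non-nested models fit to the same sample.
print(round(rmsea(chi2=45.2, df=14, n=300), 3))
print(round(rmsea(chi2=38.9, df=13, n=300), 3))
```

Lower values indicate less discrepancy per degree of freedom; a chi-square at or below its df gives an RMSEA of zero under this formula.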