Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
The European Commission’s science and knowledge service
Joint Research Centre
Step 7: Statistical coherence (II)
Principal Component Analysis and Reliability Analysis
Hedvig Norlén
COIN 2018 - 16th JRC Annual Training on Composite Indicators & Scoreboards 05-09/11/2018 Ispra (IT)
3 JRC-COIN © | Step 7: Statistical Coherence (II)
Step 7: Statistical coherence (II) introduction
“New property” will be given by a linear combination
w1x + w2y PCA will find this “best” line: 1) Maximum variance 2) Minimum error
x
y
First Principal Component!
4 JRC-COIN © | Step 7: Statistical Coherence (II)
Step 7: Statistical coherence (II) introduction
Input Output
Black Box
PCA = Principal Component Analysis RA= Reliability Analysis
5 JRC-COIN © | Step 7: Statistical Coherence (II)
Outline talk Statistical coherence (II)
How multivariate methods can help to understand the statistical coherence of our composite indicator
Part 1 Principal Component Analysis
Part 2 Reliability Analysis (Cronbach’s alfa)
Is the structure of our composite indicator statistically well-defined?
Is the set of available indicators sufficient to describe the pillars and the sub-pillars of the composite indicator?
6 JRC-COIN © | Step 7: Statistical Coherence (II)
Ex 1. European Skills Index
The European Skills Index (ESI) measures the performance of a country’s skills system taking into account its manifold facets from continually developing the skills of the population to activating and effectively matching these skills to the needs of employers in the labour market. ESI launched in September 2018
7 JRC-COIN © | Step 7: Statistical Coherence (II)
Type equation here.
European Skills Index
A Composite Indicator measures multifaceted phenomenon - combination of different aspects (Sub-pillars/Pillars).
Each aspect can be measured by a set of observable variables (Indicators).
Framework of the European Skills Index 2018
Composite Indicator Pillars Sub-pillars Indicators
8 JRC-COIN © | Step 7: Statistical Coherence (II)
Type equation here.
Statistical coherence (II) - unidimensionality
PCA is used to verify the internal consistency, verify “unidimensionality” within:
1) each Sub-pillar (across indicators)
Framework of the European Skills Index 2018
9 JRC-COIN © | Step 7: Statistical Coherence (II)
Type equation here.
Statistical coherence (II) - unidimensionality
PCA is used to verify the internal consistency, verify “unidimensionality” within:
1) each Sub-pillar (across Indicators)
2) each Pillar (across Sub-pillars)
Framework of the European Skills Index 2018
10 JRC-COIN © | Step 7: Statistical Coherence (II)
Type equation here.
Statistical coherence (II) - unidimensionality
PCA is used to verify the internal consistency, verify “unidimensionality” within:
1) each Sub-pillar (across Indicators)
2) each Pillar (across Sub-pillars)
3) the Composite Indicator (across Pillars)
Framework of the European Skills Index 2018
11 JRC-COIN © | Step 7: Statistical Coherence (II)
PCA
Identify a small number of “averages” (PCs) that explain most of the variance observed.
PCA summarizes information of all indicators and reduces it into a fewer number of components
Each principal component PCi is a new variable computed as a linear combination of the original (standardized) variables
𝑃𝐶1 = 𝑤1𝑥1 + 𝑤2𝑥2 + ⋯ 𝑤𝑝𝑥𝑝
Observed indicators are reduced into components
. . .
𝒘𝟏
𝒘𝟐
𝒘𝒑
Component
Indicator 1 X1
Indicator 2 X2
Indicator p Xp
PCA
12 JRC-COIN © | Step 7: Statistical Coherence (II)
-3 -1.5 0 1.5 3
-3 -1.5 0 1.5 3
RO BG IT BE FI SE
ITFI BG ROBESE
European Skills Index – 2 dimensions
0 0.25 0.50 0.75 1
0 0.25 0.50 0.75 1
RO BG IT BE SE FI
IT BGRO BE FI SE
x
y
x
y
PC1
PC2
PC1
PC2
Variance 81%
Variance 19%
13 JRC-COIN © | Step 7: Statistical Coherence (II)
European Skills Index – third piller dimension added
0 0.25 0.50 0.75 1
0 0.25 0.50 0.75 1
0 0.25 0.50 0.75 1
RO FI
IT SE
EL CZ
x
y
z
Variance 58%
Variance 29%
Variance 12%
PCA example to be completed in a little while!
14 JRC-COIN © | Step 7: Statistical Coherence (II)
First steps in PCA
Check the correlation structure of the data and perform 2 “pre-tests”
1) Bartlett’s sphericity test
The test checks if the observed correlation matrix R diverges significantly from the identity matrix. H0 : |R| = 1 , H1 : |R| ≠ 1
Want to reject H0 to be able to do perform a valid PCA (Bartlett (1937))
2) Kaiser-Meyer-Olkin (KMO) Measure for Sampling Adequacy
Measure of the strength of relationship among variables based on correlations and partial correlations. KMO between [0;1].
Want KMO close to 1 to be able to perform a valid PCA. KMO>0.6 OK!
(Kaiser (1970), Kaiser-Meyer (1974))
15 JRC-COIN © | Step 7: Statistical Coherence (II)
First steps in PCA
2) Kaiser-Meyer-Olkin (KMO) Measure for Sampling Adequacy
Kaiser’s own interpretations of the KMO values
KMO values: “in the .90s, marvelous in the .80s, meritorious in the .70s, middling in the .60s, mediocre in the .50s, miserable below .50, unacceptable”
1) Bartlett’s sphericity test and 2) KMO test provide info whether it is possible to do PCA, but do not give info of the “magic number” - how many components are needed
KMO>0.6 OK
16 JRC-COIN © | Step 7: Statistical Coherence (II)
How to get the PCs
1) Eigenvalue decomposition of a data correlation matrix (case of composite indicator),
2) Singular Value Decomposition (SVD) of a data matrix after mean centering (normalizing) the data matrix Assumptions: I) Linearity II)Normal distribution III) Principal components are orthogonal
17 JRC-COIN © | Step 7: Statistical Coherence (II)
Finding the “magic number”- determining how many components in PCA
Several methods exist. The 3 most common are:
1) Kaiser–Guttman ‘Eigenvalues greater than one’ criterion (Guttman (1954), Kaiser (1960)). Select all components with eigenvalues over 1 (or 0.9)
2) Cattell’s scree test
(Cattell (1966)) “Above the elbow” approach
3) Certain percentage of explained variance e.g., >2/3, 75%, 80%,…
Scree
18 JRC-COIN © | Step 7: Statistical Coherence (II)
Type equation here.
European Skills Index finalizing the PCA example
Use PCA to explore the dimensionality of ESI across the three pillars.
The 3 pillars should be correlated.
Supports the assumption that they are all measuring the a similar concept!
Framework of the European Skills Index 2018
19 JRC-COIN © | Step 7: Statistical Coherence (II)
Check the correlation structure
Pearson correlation coefficients (Correlations not significant at significance level α
= 0.01 are grey, n=28, critical value = 0.48)
European Skills Index - PCA example
20 JRC-COIN © | Step 7: Statistical Coherence (II)
Check the correlation structure 1) Bartlett’s sphericity test p-value (1.5e-17) < 0.01 Reject H0 2) KMO Measure for Sampling Adequacy Overall MSA = 0.54 KMO>0.6!
European Skills Index - PCA example
Pearson correlation coefficients
21 JRC-COIN © | Step 7: Statistical Coherence (II)
Component loadings
Component Eigenvalue VarianceCumulative
variancePC1 PC2 PC3
1 1.75 58.41 58.41 1. Skills Development 0.88 -0.16 -0.44
2 0.88 29.36 87.77 2. Skills Activation 0.84 -0.36 0.41
3 0.37 12.22 100,00 3. Skills Matching 0.51 0.85 0.09
Sum 3 100 Sum of Squares 1.75 0.88 0.37
Stopping criterion Eigenvalue > 1
Pearson correlation coefficients between
Total variance explained pillar and principal component
SMSASDXXXPC 51.084.088.0
1
One dimension verified!
European Skills Index - PCA results
1.75=0.882+0.842+0.512
58.41=1.75/3*100
22 JRC-COIN © | Step 7: Statistical Coherence (II)
European Skills Index - Audit results
ESI theoretical framework
23 JRC-COIN © | Step 7: Statistical Coherence (II)
Ex 2. Social Progress Index
The Social Progress Index (SPI) is an international monitoring framework for measuring social progress without resorting to the use of economic indicators.
SPI measures country performance on aspects of social and environmental performance, which are relevant for countries at all levels of economic development.
146 fully and 90 partially ranked countries.
24 JRC-COIN © | Step 7: Statistical Coherence (II)
Social Progress Index - PCA at Index level
Social progress is measured taking into account the following three broad aspects:
1. Meeting everyone’s basic needs for food, clean water, shelter and security.
2. Living long, healthy lives with basic knowledge and communication and a clean environment.
3. Practicing equal rights and freedoms and pursuing higher education.
5th version of SPI was launched September 2018.
PCA to explore the dimensionality of SPI across the three dimensions.
25 JRC-COIN © | Step 7: Statistical Coherence (II)
Social Progress Index - PCA example
Correlation
matrix
Basic Human
Needs
Foundations
of WellbeingOpportunity
Basic Human
Needs 1
Foundations
of Wellbeing 0.95 1
Opportunity 0.81 0.88 1
Pearson correlation coefficients
Check the correlation structure
(Significance level α = 0.01, n=146,
critical value = 0.21)
26 JRC-COIN © | Step 7: Statistical Coherence (II)
Social Progress Index - PCA example
Correlation
matrix
Basic Human
Needs
Foundations
of WellbeingOpportunity
Basic Human
Needs 1
Foundations
of Wellbeing 0.95 1
Opportunity 0.81 0.88 1
Pearson correlation coefficients
Check the correlation structure
1) Bartlett’s sphericity test p-value (2.0e-116) < 0.01 Reject H0 2) KMO Measure for Sampling Adequacy Overall MSA = 0.69 KMO>0.6
27 JRC-COIN © | Step 7: Statistical Coherence (II)
Component loadings
Component Eigenvalue VarianceCumulative
variancePC1 PC2 PC3
1 2.76 92.02 92.021. Basic Human
Needs0.96 -0.25 0.12
2 0.20 6.55 98.572. Foundations of
Wellbeing0.98 -0.09 -0.16
3 0.04 1.43 100,00 3. Opportunity 0.93 0.35 0.05
Sum 3 100 Sum of Squares 2.76 0.20 0.04
Stopping criterion Eigenvalue > 1
Pearson correlation coefficients between
Total variance explained pillar and principal component
Social Progress Index - PCA results
One dimension verified!
OppFoWBHNXXXPC 93.098.096.0
1
28 JRC-COIN © | Step 7: Statistical Coherence (II)
Social Progress Index - PCA results
PCA results often summarized in a “Factor map”
Second principal component is useful to evaluate the differences between the first two and third dimensions.
OppFoWBHNXXXPC 93.098.096.0
1
OppFoWBHNXXXPC 35.009.025.0
2
2PC
29 JRC-COIN © | Step 7: Statistical Coherence (II)
Social Progress Index - Audit results
30 JRC-COIN © | Step 7: Statistical Coherence (II)
Social Progress Index - Audit results
1 0.8 0.6 0.4 0.2 0 -0.2 -0.4 -0.6 -0.8 -1
3 Dim 12 Comp
Index Redundancies in SPI framework - very strong correlations between the SPI aggregates (dimensions and components).
PCA was also performed on the whole set of 51 indicators
Some components measure similar concepts:
C1:Water and Sanitation
C2:Shelter
C5: Access to Basic Knowledge
C12: Access to Advanced Education
31 JRC-COIN © | Step 7: Statistical Coherence (II)
Social Progress Index - Audit results
Very strong correlations between the SPI aggregates (dimensions and components).
PCA was also performed on the whole set of 51 indicators.
7 latent dimensions (principal components) are retrieved, which capture 78% of the total variance in the underlying indicators.
Difference less than 8%
Total variance explained
Component EigenvalueCumulative
variance
1 27,55 54,02
2 5,30 64,42
3 1,98 68,31
4 1,63 71,51
5 1,24 73,94
6 1,22 76,34
7 1,07 78,44
8 0,97 80,33
9 0,84 81,98
10 0,76 83,46
11 0,67 84,78
12 0,67 86,10
…
51 0,01 100,00
Sum 51
Stopping criterion Eigenvalue > 1
32 JRC-COIN © | Step 7: Statistical Coherence (II)
Social Progress Index - Audit results
From JRC Audit report:
“Less number of components is also in line with the results from the “beyond GDP” approach by the Commission on the Measurement of Economic Performance and Social Progress*. The SPI conceptual framework has been influenced by this work. So apart from the statistical reasoning, the result from the “beyond GDP” approach, may also give conceptual justification for reducing the number of components in the SPI framework.”
* Stiglitz, J., Sen,A., Fitoussi,J.-P., 2009. Report by the Commission on the Measurement of Economic Performance and Social Progress. In: Commission on the Measurement of Economic Performance and Social Progress, Paris, France.
33 JRC-COIN © | Step 7: Statistical Coherence (II)
Historical notes PCA
PCA is probably the oldest of the multivariate analysis methods.
Pearson (1901) said that his methods “can be easily applied to numerical problems” and although he says that the calculations become “cumbersome” for four or more variables he suggests that they are “still quite feasible”.
Hotelling (1933) chose his “components” so as to maximize their successive contributions to the total of the variances of the original variables, “principal components”.
Karl Pearson (1857-1936)
Harold Hotelling (1895-1973)
34 JRC-COIN © | Step 7: Statistical Coherence (II)
Cronbach’s α is measure of internal consistency and reliability
Based upon classical test theory. Most common estimate of reliability, Cronbach (1951).
Cronbach’s α is defined as Ratio of the Total Covariance in the “test” to the Total Variance in the “test”.
By “test” we mean the set of indicators constituting a Sub-pillar/Pillar/Composite Indicator.
Part 2: Reliability Analysis
35 JRC-COIN © | Step 7: Statistical Coherence (II)
Cronbach’s α is measure of internal consistency and reliability
α ranges from 0 to 1. If all indicators are entirely independent from one another (i.e. are not correlated or share no covariance) α = 0. If all of the items have high covariances, then α will approach 1.
Caution! α increases when the number of indicators increases.
Cronbach’s α
36 JRC-COIN © | Step 7: Statistical Coherence (II)
Cronbach’s α is measure of internal consistency and reliability
Cronbach’s α cut off values
Cronbach's alpha Internal consistency 0.9 ≤ α Excellent 0.8 ≤ α < 0.9 Good 0.7 ≤ α < 0.8 Acceptable 0.6 ≤ α < 0.7 Questionable 0.5 ≤ α < 0.6 Poor α < 0.5 Unacceptable
Lee Cronbach (1916-2001)
37 JRC-COIN © | Step 7: Statistical Coherence (II)
Cronbach’s α is measure of internal consistency and reliability
Cronbach’s α - European Skills Index
Cronbach's alpha Internal consistency 0.9 ≤ α Excellent 0.8 ≤ α < 0.9 Good 0.7 ≤ α < 0.8 Acceptable 0.6 ≤ α < 0.7 Questionable 0.5 ≤ α < 0.6 Poor α < 0.5 Unacceptable
European Skills Index
Index level (3 pillars)
Item levelCorrected
Item-Total
Correlation
Alpha if Item
is dropped
Pillar 1: Skills Development 0.59 0.28
Pillar 2: Skills Activation 0.44 0.42
Pillar 3: Skills Matching 0.23 0.74
0.59
Cronbach's Alpha
38 JRC-COIN © | Step 7: Statistical Coherence (II)
Cronbach’s α is measure of internal consistency and reliability
Cronbach’s α - Social Progress Index
Cronbach's alpha Internal consistency 0.9 ≤ α Excellent 0.8 ≤ α < 0.9 Good 0.7 ≤ α < 0.8 Acceptable 0.6 ≤ α < 0.7 Questionable 0.5 ≤ α < 0.6 Poor α < 0.5 Unacceptable
Social Progress Index
Index level (3 dimensions)
Item levelCorrected
Item-Total
Correlation
Alpha if Item
is dropped
1: Basic Human Needs 0.91 0.94
2: Foundations of Wellbeing 0.96 0.89
3: Opportunity 0.86 0.97
Cronbach's Alpha
0.96
If α is too high it may suggest that some items are redundant as that are measuring very similar concepts.
Maximum α 0.9 has been recommended.
39 JRC-COIN © | Step 7: Statistical Coherence (II)
PCA and RA
Concluding words
Input Output
Black Box Open transparent box
40 JRC-COIN © | Step 7: Statistical Coherence (II)
References
Books
• Johnson, “Applied Multivariate Statistical Analysis (6th Revised edition)”, Pearson Education Limited (2013)
• Jolliffe, “Principal Component Analysis, Second Edition”, Springer (2002)
• Leandre, Fabrigar, Duane, “Exploratory Factor Analysis”, Oxford (2011)
• Rencher, “Methods of multivariate analysis”, Wiley (1995)
• Tinsley and Brown, “Handbook of Applied Multivariate Statistics and Mathematical Modeling”, Academic Press (2000)
• Tucker, MacCallum, “Exploratory Factor Analysis”, https://www.unc.edu/~rcm/book/factornew.htm (1997)
https://www.unc.edu/~rcm/book/factornew.htmhttps://www.unc.edu/~rcm/book/factornew.htm
Any questions? You may contact us at @username & [email protected]
Welcome to email us at: [email protected]
THANK YOU
COIN in the EU Science Hub https://ec.europa.eu/jrc/en/coin
COIN tools are available at: https://composite-indicators.jrc.ec.europa.eu/
The European Commission’s
Competence Centre on Composite
Indicators and Scoreboards
mailto:[email protected]:[email protected]:[email protected]://ec.europa.eu/jrc/en/coinhttps://ec.europa.eu/jrc/en/coinhttps://ec.europa.eu/jrc/en/coinhttps://composite-indicators.jrc.ec.europa.eu/https://composite-indicators.jrc.ec.europa.eu/https://composite-indicators.jrc.ec.europa.eu/https://composite-indicators.jrc.ec.europa.eu/https://composite-indicators.jrc.ec.europa.eu/
42 JRC-COIN © | Step 7: Statistical Coherence (II)
Data matrix of indicators
11 12 1
21 22 2
1 2
.....
.....
.
.
.....
p
p
n n np
x x x
x x x
x x x
X
n = number of objects (countries, …)
p = number of variables (indicators)
The previous steps in building our CI - Step 3 Outlier detection and missing data estimation - Step 4 Normalisation/Standardisation
X correlated data set (without outliers and (without or a few) missing values)
PCA is based on correlations
43 JRC-COIN © | Step 7: Statistical Coherence (II)
First steps in PCA
• Check the correlation structure of the data and do 1) and 2)
1) Bartlett’s sphericity test
The Bartlett’s test checks if the observed correlation matrix R diverges significantly from the identity matrix.
H0 : |R| = 1 , H1 : |R| ≠ 1
Test statistic
Under H0, it follows a χ² distribution with a p(p-1)/2 degrees of freedom.
Want to reject H0 to be able to do PCA,FA! (Bartlett (1937))
Rp
n log6
5212
n = sample size
p = number of indicators
44 JRC-COIN © | Step 7: Statistical Coherence (II)
First steps in PCA
• Check the correlation structure of the data and do 1) and 2)
2) Kaiser-Meyer-Olkin (KMO) Measure for Sampling Adequacy
Indicators are more or less correlated, but the correlation between two indicators can be influenced by the others. Use partial correlation in order to measure the relation between two indicators by removing the effect of the remaining indicators (controlling for the confounding variable)
Want KMO close to 1 to be able to do PCA! KMO>0.6 ok
(Kaiser (1970), Kaiser-Meyer (1974))
22
2
ijij
ij
ar
rKMO
rij Pearson correlation coefficient aij partial correlation coefficient
45 JRC-COIN © | Step 7: Statistical Coherence (II)
PCA and sample size
Sufficient number of cases (countries, regions, cities…) necessary to perform PCA. NO scientific answer – opinions differ!
Two different approaches :
(A) Minimum total sample size
(B) Examining the ratio of subjects to variables
Rule of 10: These should be at least 10 cases for each variable
3:1 ratio: The cases-to variables ratio should be no lower than 3
5:1 ratio: The cases-to variables ratio should be no lower than 5
46 JRC-COIN © | Step 7: Statistical Coherence (II)
PCA and sample size (cont.1)
(A) Minimum total sample size
Rule of 100: Gorsuch (1983) and Kline (1979, p. 40) recommended at least 100 (MacCallum, Widaman, Zhang & Hong, 1999). No sample should be less than 100 even though the number of variables is less than 20 (Gorsuch, 1974, p. 333; in Arrindell & van der Ende, 1985, p. 166);
Hatcher (1994) recommended that the number of subjects should be the larger of 5 times the number of variables, or 100. Even more subjects are needed when communalities are low and/or few variables load on each factor (in David Garson, 2008).
Rule of 150: Hutcheson and Sofroniou (1999) recommends at least 150 - 300 cases, more toward the 150 end when there are a few highly correlated variables, as would be the case when collapsing highly multicollinear variables (in David Garson, 2008).
Rule of 200: Guilford (1954, p. 533) suggested that N should be at least 200 cases (in MacCallum, Widaman, Zhang & Hong, 1999, p84; in Arrindell & van der Ende, 1985; p. 166).
Rule of 250: Cattell (1978) claimed the minimum desirable N to be 250 (in MacCallum, Widaman, Zhang & Hong, 1999, p84).
Rule of 300: There should be at least 300 cases (Noruis, 2005: 400, in David Garson, 2008).
Significance rule. Lawley and Maxwell (1971) suggested 51 more cases than the number of variables, to support chi-square testing (in David Garson, 2008).
Rule of 500. Comrey and Lee (1992) thought that 100 = poor, 200 = fair, 300 = good, 500 = very good, 1,000 or more = excellent They urged researchers to obtain samples of 500 or more observations whenever possible (in MacCallum, Widaman, Zhang & Hong, 1999, p84).
47 JRC-COIN © | Step 7: Statistical Coherence (II)
PCA and sample size (cont.2)
(B) Examining the ratio of subjects to variables
A ratio of 20:1: Hair, Anderson, Tatham, and Black (1995, in Hogarty, Hines, Kromrey, Ferron, & Mumford, 2005)
Rule of 10: There should be at least 10 cases for each item in the instrument being used. (David Garson, 2008; Everitt, 1975; Everitt, 1975, Nunnally, 1978, p. 276, in Arrindell & van der Ende,1985, p. 166; Kunce, Cook, & Miller, 1975, Marascuilor & Levin, 1983, in Velicer & Fava, 1998, p. 232)
Rule of 5: The subjects-to-variables ratio should be no lower than 5 (Bryant and Yarnold, 1995, in David Garson, 2008; Gorsuch, 1983, in MacCallum, Widaman, Zhang & Hong, 1999; Everitt, 1975, in Arrindell & van der Ende, 1985; Gorsuch, 1974, in Arrindell & van der Ende, 1985, p. 166)
A ratio of 3(:1) to 6(:1) of subjects-to-variables is acceptable if the lower limit of variables-to-factors ratio is 3 to 6. But, the absolute minimum sample size should not be less than 250.(Cattell, 1978, p. 508, in Arrindell & van der Ende, 1985, p. 166)
Ratio of 2. “There should be at least twice as many subjects as variables in factor-analytic investigations. This means that in any large study on this account alone, one should have to use more than the minimum 100 subjects" (Kline, 1979, p. 40).
48 JRC-COIN © | Step 7: Statistical Coherence (II)
Multivariate data
Data Indicators
Quantitative Mixed/
structured
disPCA/FA,(M)CA
Qualitative
PCA/FA FAMD/MFA
PCA Principal Component Analysis FA Factor Analysis
PCA for discrete data (based on polychoric correlations) (M)CA (Multiple) Correspondence Analysis ….
FAMD Factor Analysis of Mixed Data MFA Multiple Factor Analysis ….
The type of analysis to be performed depends on the data
of the indicators
49 JRC-COIN © | Step 7: Statistical Coherence (II)
Exploratory FA
• Two main types of FA: Exploratory (EFA) and Confirmatory (CFA)
• EFA examining the relationships among the variables/indicators and do not have an “à priori” fixed number of factors
• CFA a clear idea about the number of factors you will encounter, and about which variables will most likely load onto each factor
50 JRC-COIN © | Step 7: Statistical Coherence (II)
Is PCA = Exploratory FA?
• Similar in many ways but an important difference that has effects on how to use them
1) Both are data reduction techniques - they allow you to capture the variance in variables in a smaller set
2) Same (or similar) procedures used to run in many software (same steps extraction, interpretation, rotation, choosing the number of components/factors)
PCA FA
51 JRC-COIN © | Step 7: Statistical Coherence (II)
Is PCA = Exploratory FA? NO!
• Similar in many ways but an important difference that has effects on how to use them
1) Both are data reduction techniques - they allow you to capture the variance in variables in a smaller set
2) Same (or similar) procedures used to run in many software (same steps extraction, interpretation, choosing the number of components/factors)
• Difference:
PCA is a linear combination of variables
FA is a measurement model of a latent variable (statistical model)
PCA FA
52 JRC-COIN © | Step 7: Statistical Coherence (II)
Is PCA = Exploratory FA? NO!
PCA FA
. . .
Observed indicators are reduced into components
𝒘𝟏
𝒘𝟐
𝒘𝒑
Component
Item 1 X1
Item 2 X2
Item p Xp
PCA summarizes information of all indicators and reduces it into a fewer number of components Each PCi is a new variable computed as a linear combination of the original variables
ppxwxwxwPC ....
22111
Factor
Item 1 X1
Item 2 X2
Item p Xp
. . .
𝞴𝟏
𝞴𝟐
𝞴𝒑
𝞮𝟏
Latent factors drive the observed variables
𝞮𝟐
𝞮𝒑
𝞮𝒊
Model of the measurement of a latent variable. This latent variable cannot be directly measured with a single variable. Seen through the relationships it causes in a set of X variables.
Error terms
ppp
iii
FX
FX
FX
1
1
1111
53 JRC-COIN © | Step 7: Statistical Coherence (II)
PCA for discrete data
A practically important violation of the normality assumption underlying the PCA occurs when the data are discrete!
Several kinds of discrete data (binary, ordinal, count, nominal…)
Two ways on how to proceed:
1) Simple approach, use the discrete x’s as if they were continuous in the PCA but instead of using the Pearson moment correlation coefficients use Spearman or Kendall rank correlation coefficients.
2) PCA using polychoric correlations. Pearson (1901) founder of this approach (as well!), developed further by Olsson (1979) and Jöreskog (2004)
54 JRC-COIN © | Step 7: Statistical Coherence (II)
PCA using polychoric correlations
The polychoric correlation (PCC) coefficient is a measure of association for ordinal variables which rests upon an assumption of an underlying joint continuous normal distribution. (Later generalizations of different continuous distributions).
PCC coefficient is MLE of the product-moment correlation between the underlying normal variables
55 JRC-COIN © | Step 7: Statistical Coherence (II)
Cronbach’s α is measure of internal consistency and reliability Based upon classical test theory. Most common estimate of reliability, Cronbach (1951)
Cronbach’s α is defined as Ratio of the Total Covariance in the “test” to the Total Variance in the “test”. By “test” we mean the set of indicators constituting a sub-pillar/ pillar/CI.
Let be the variance of indicator i and total variance in sub-pillar/pillar/CI
Average covariance of an indicator with any other indicator is
Part 2: Reliability Analysis
)( iXVar p
i iXVar
1
)1(
)(11
pp
XVarXVarp
i i
p
i i
p
i i
p
i i
p
i i
p
i i
p
i i
p
i i
XVar
XVarXVar
p
p
pXVar
ppXVarXVar
1
11
2
1
11)(
1/
)1(/)(
p = number of indicators
p
i i
p
i i
XVar
XVar
p
p
1
1)(
11