55
The European Commission’s science and knowledge service Joint Research Centre

Joint Research Centre - Europa 2018 Step 7...Step 7: Statistical coherence (II) Principal Component Analysis and Reliability Analysis Hedvig Norlén COIN 2018 - 16th JRC Annual Training

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

  • The European Commission’s science and knowledge service

    Joint Research Centre

  • Step 7: Statistical coherence (II)

    Principal Component Analysis and Reliability Analysis

    Hedvig Norlén

    COIN 2018 - 16th JRC Annual Training on Composite Indicators & Scoreboards 05-09/11/2018 Ispra (IT)

  • 3 JRC-COIN © | Step 7: Statistical Coherence (II)

    Step 7: Statistical coherence (II) introduction

    “New property” will be given by a linear combination

    w1x + w2y PCA will find this “best” line: 1) Maximum variance 2) Minimum error

    x

    y

    First Principal Component!

  • 4 JRC-COIN © | Step 7: Statistical Coherence (II)

    Step 7: Statistical coherence (II) introduction

    Input Output

    Black Box

    PCA = Principal Component Analysis RA= Reliability Analysis

  • 5 JRC-COIN © | Step 7: Statistical Coherence (II)

    Outline talk Statistical coherence (II)

    How multivariate methods can help to understand the statistical coherence of our composite indicator

    Part 1 Principal Component Analysis

    Part 2 Reliability Analysis (Cronbach’s alfa)

    Is the structure of our composite indicator statistically well-defined?

    Is the set of available indicators sufficient to describe the pillars and the sub-pillars of the composite indicator?

  • 6 JRC-COIN © | Step 7: Statistical Coherence (II)

    Ex 1. European Skills Index

    The European Skills Index (ESI) measures the performance of a country’s skills system taking into account its manifold facets from continually developing the skills of the population to activating and effectively matching these skills to the needs of employers in the labour market. ESI launched in September 2018

  • 7 JRC-COIN © | Step 7: Statistical Coherence (II)

    Type equation here.

    European Skills Index

    A Composite Indicator measures multifaceted phenomenon - combination of different aspects (Sub-pillars/Pillars).

    Each aspect can be measured by a set of observable variables (Indicators).

    Framework of the European Skills Index 2018

    Composite Indicator Pillars Sub-pillars Indicators

  • 8 JRC-COIN © | Step 7: Statistical Coherence (II)

    Type equation here.

    Statistical coherence (II) - unidimensionality

    PCA is used to verify the internal consistency, verify “unidimensionality” within:

    1) each Sub-pillar (across indicators)

    Framework of the European Skills Index 2018

  • 9 JRC-COIN © | Step 7: Statistical Coherence (II)

    Type equation here.

    Statistical coherence (II) - unidimensionality

    PCA is used to verify the internal consistency, verify “unidimensionality” within:

    1) each Sub-pillar (across Indicators)

    2) each Pillar (across Sub-pillars)

    Framework of the European Skills Index 2018

  • 10 JRC-COIN © | Step 7: Statistical Coherence (II)

    Type equation here.

    Statistical coherence (II) - unidimensionality

    PCA is used to verify the internal consistency, verify “unidimensionality” within:

    1) each Sub-pillar (across Indicators)

    2) each Pillar (across Sub-pillars)

    3) the Composite Indicator (across Pillars)

    Framework of the European Skills Index 2018

  • 11 JRC-COIN © | Step 7: Statistical Coherence (II)

    PCA

    Identify a small number of “averages” (PCs) that explain most of the variance observed.

    PCA summarizes information of all indicators and reduces it into a fewer number of components

    Each principal component PCi is a new variable computed as a linear combination of the original (standardized) variables

    𝑃𝐶1 = 𝑤1𝑥1 + 𝑤2𝑥2 + ⋯ 𝑤𝑝𝑥𝑝

    Observed indicators are reduced into components

    . . .

    𝒘𝟏

    𝒘𝟐

    𝒘𝒑

    Component

    Indicator 1 X1

    Indicator 2 X2

    Indicator p Xp

    PCA

  • 12 JRC-COIN © | Step 7: Statistical Coherence (II)

    -3 -1.5 0 1.5 3

    -3 -1.5 0 1.5 3

    RO BG IT BE FI SE

    ITFI BG ROBESE

    European Skills Index – 2 dimensions

    0 0.25 0.50 0.75 1

    0 0.25 0.50 0.75 1

    RO BG IT BE SE FI

    IT BGRO BE FI SE

    x

    y

    x

    y

    PC1

    PC2

    PC1

    PC2

    Variance 81%

    Variance 19%

  • 13 JRC-COIN © | Step 7: Statistical Coherence (II)

    European Skills Index – third piller dimension added

    0 0.25 0.50 0.75 1

    0 0.25 0.50 0.75 1

    0 0.25 0.50 0.75 1

    RO FI

    IT SE

    EL CZ

    x

    y

    z

    Variance 58%

    Variance 29%

    Variance 12%

    PCA example to be completed in a little while!

  • 14 JRC-COIN © | Step 7: Statistical Coherence (II)

    First steps in PCA

    Check the correlation structure of the data and perform 2 “pre-tests”

    1) Bartlett’s sphericity test

    The test checks if the observed correlation matrix R diverges significantly from the identity matrix. H0 : |R| = 1 , H1 : |R| ≠ 1

    Want to reject H0 to be able to do perform a valid PCA (Bartlett (1937))

    2) Kaiser-Meyer-Olkin (KMO) Measure for Sampling Adequacy

    Measure of the strength of relationship among variables based on correlations and partial correlations. KMO between [0;1].

    Want KMO close to 1 to be able to perform a valid PCA. KMO>0.6 OK!

    (Kaiser (1970), Kaiser-Meyer (1974))

  • 15 JRC-COIN © | Step 7: Statistical Coherence (II)

    First steps in PCA

    2) Kaiser-Meyer-Olkin (KMO) Measure for Sampling Adequacy

    Kaiser’s own interpretations of the KMO values

    KMO values: “in the .90s, marvelous in the .80s, meritorious in the .70s, middling in the .60s, mediocre in the .50s, miserable below .50, unacceptable”

    1) Bartlett’s sphericity test and 2) KMO test provide info whether it is possible to do PCA, but do not give info of the “magic number” - how many components are needed

    KMO>0.6 OK

  • 16 JRC-COIN © | Step 7: Statistical Coherence (II)

    How to get the PCs

    1) Eigenvalue decomposition of a data correlation matrix (case of composite indicator),

    2) Singular Value Decomposition (SVD) of a data matrix after mean centering (normalizing) the data matrix Assumptions: I) Linearity II)Normal distribution III) Principal components are orthogonal

  • 17 JRC-COIN © | Step 7: Statistical Coherence (II)

    Finding the “magic number”- determining how many components in PCA

    Several methods exist. The 3 most common are:

    1) Kaiser–Guttman ‘Eigenvalues greater than one’ criterion (Guttman (1954), Kaiser (1960)). Select all components with eigenvalues over 1 (or 0.9)

    2) Cattell’s scree test

    (Cattell (1966)) “Above the elbow” approach

    3) Certain percentage of explained variance e.g., >2/3, 75%, 80%,…

    Scree

  • 18 JRC-COIN © | Step 7: Statistical Coherence (II)

    Type equation here.

    European Skills Index finalizing the PCA example

    Use PCA to explore the dimensionality of ESI across the three pillars.

    The 3 pillars should be correlated.

    Supports the assumption that they are all measuring the a similar concept!

    Framework of the European Skills Index 2018

  • 19 JRC-COIN © | Step 7: Statistical Coherence (II)

    Check the correlation structure

    Pearson correlation coefficients (Correlations not significant at significance level α

    = 0.01 are grey, n=28, critical value = 0.48)

    European Skills Index - PCA example

  • 20 JRC-COIN © | Step 7: Statistical Coherence (II)

    Check the correlation structure 1) Bartlett’s sphericity test p-value (1.5e-17) < 0.01 Reject H0 2) KMO Measure for Sampling Adequacy Overall MSA = 0.54 KMO>0.6!

    European Skills Index - PCA example

    Pearson correlation coefficients

  • 21 JRC-COIN © | Step 7: Statistical Coherence (II)

    Component loadings

    Component Eigenvalue VarianceCumulative

    variancePC1 PC2 PC3

    1 1.75 58.41 58.41 1. Skills Development 0.88 -0.16 -0.44

    2 0.88 29.36 87.77 2. Skills Activation 0.84 -0.36 0.41

    3 0.37 12.22 100,00 3. Skills Matching 0.51 0.85 0.09

    Sum 3 100 Sum of Squares 1.75 0.88 0.37

    Stopping criterion Eigenvalue > 1

    Pearson correlation coefficients between

    Total variance explained pillar and principal component

    SMSASDXXXPC 51.084.088.0

    1

    One dimension verified!

    European Skills Index - PCA results

    1.75=0.882+0.842+0.512

    58.41=1.75/3*100

  • 22 JRC-COIN © | Step 7: Statistical Coherence (II)

    European Skills Index - Audit results

    ESI theoretical framework

  • 23 JRC-COIN © | Step 7: Statistical Coherence (II)

    Ex 2. Social Progress Index

    The Social Progress Index (SPI) is an international monitoring framework for measuring social progress without resorting to the use of economic indicators.

    SPI measures country performance on aspects of social and environmental performance, which are relevant for countries at all levels of economic development.

    146 fully and 90 partially ranked countries.

  • 24 JRC-COIN © | Step 7: Statistical Coherence (II)

    Social Progress Index - PCA at Index level

    Social progress is measured taking into account the following three broad aspects:

    1. Meeting everyone’s basic needs for food, clean water, shelter and security.

    2. Living long, healthy lives with basic knowledge and communication and a clean environment.

    3. Practicing equal rights and freedoms and pursuing higher education.

    5th version of SPI was launched September 2018.

    PCA to explore the dimensionality of SPI across the three dimensions.

  • 25 JRC-COIN © | Step 7: Statistical Coherence (II)

    Social Progress Index - PCA example

    Correlation

    matrix

    Basic Human

    Needs

    Foundations

    of WellbeingOpportunity

    Basic Human

    Needs 1

    Foundations

    of Wellbeing 0.95 1

    Opportunity 0.81 0.88 1

    Pearson correlation coefficients

    Check the correlation structure

    (Significance level α = 0.01, n=146,

    critical value = 0.21)

  • 26 JRC-COIN © | Step 7: Statistical Coherence (II)

    Social Progress Index - PCA example

    Correlation

    matrix

    Basic Human

    Needs

    Foundations

    of WellbeingOpportunity

    Basic Human

    Needs 1

    Foundations

    of Wellbeing 0.95 1

    Opportunity 0.81 0.88 1

    Pearson correlation coefficients

    Check the correlation structure

    1) Bartlett’s sphericity test p-value (2.0e-116) < 0.01 Reject H0 2) KMO Measure for Sampling Adequacy Overall MSA = 0.69 KMO>0.6

  • 27 JRC-COIN © | Step 7: Statistical Coherence (II)

    Component loadings

    Component Eigenvalue VarianceCumulative

    variancePC1 PC2 PC3

    1 2.76 92.02 92.021. Basic Human

    Needs0.96 -0.25 0.12

    2 0.20 6.55 98.572. Foundations of

    Wellbeing0.98 -0.09 -0.16

    3 0.04 1.43 100,00 3. Opportunity 0.93 0.35 0.05

    Sum 3 100 Sum of Squares 2.76 0.20 0.04

    Stopping criterion Eigenvalue > 1

    Pearson correlation coefficients between

    Total variance explained pillar and principal component

    Social Progress Index - PCA results

    One dimension verified!

    OppFoWBHNXXXPC 93.098.096.0

    1

  • 28 JRC-COIN © | Step 7: Statistical Coherence (II)

    Social Progress Index - PCA results

    PCA results often summarized in a “Factor map”

    Second principal component is useful to evaluate the differences between the first two and third dimensions.

    OppFoWBHNXXXPC 93.098.096.0

    1

    OppFoWBHNXXXPC 35.009.025.0

    2

    2PC

  • 29 JRC-COIN © | Step 7: Statistical Coherence (II)

    Social Progress Index - Audit results

  • 30 JRC-COIN © | Step 7: Statistical Coherence (II)

    Social Progress Index - Audit results

    1 0.8 0.6 0.4 0.2 0 -0.2 -0.4 -0.6 -0.8 -1

    3 Dim 12 Comp

    Index Redundancies in SPI framework - very strong correlations between the SPI aggregates (dimensions and components).

    PCA was also performed on the whole set of 51 indicators

    Some components measure similar concepts:

    C1:Water and Sanitation

    C2:Shelter

    C5: Access to Basic Knowledge

    C12: Access to Advanced Education

  • 31 JRC-COIN © | Step 7: Statistical Coherence (II)

    Social Progress Index - Audit results

    Very strong correlations between the SPI aggregates (dimensions and components).

    PCA was also performed on the whole set of 51 indicators.

    7 latent dimensions (principal components) are retrieved, which capture 78% of the total variance in the underlying indicators.

    Difference less than 8%

    Total variance explained

    Component EigenvalueCumulative

    variance

    1 27,55 54,02

    2 5,30 64,42

    3 1,98 68,31

    4 1,63 71,51

    5 1,24 73,94

    6 1,22 76,34

    7 1,07 78,44

    8 0,97 80,33

    9 0,84 81,98

    10 0,76 83,46

    11 0,67 84,78

    12 0,67 86,10

    51 0,01 100,00

    Sum 51

    Stopping criterion Eigenvalue > 1

  • 32 JRC-COIN © | Step 7: Statistical Coherence (II)

    Social Progress Index - Audit results

    From JRC Audit report:

    “Less number of components is also in line with the results from the “beyond GDP” approach by the Commission on the Measurement of Economic Performance and Social Progress*. The SPI conceptual framework has been influenced by this work. So apart from the statistical reasoning, the result from the “beyond GDP” approach, may also give conceptual justification for reducing the number of components in the SPI framework.”

    * Stiglitz, J., Sen,A., Fitoussi,J.-P., 2009. Report by the Commission on the Measurement of Economic Performance and Social Progress. In: Commission on the Measurement of Economic Performance and Social Progress, Paris, France.

  • 33 JRC-COIN © | Step 7: Statistical Coherence (II)

    Historical notes PCA

    PCA is probably the oldest of the multivariate analysis methods.

    Pearson (1901) said that his methods “can be easily applied to numerical problems” and although he says that the calculations become “cumbersome” for four or more variables he suggests that they are “still quite feasible”.

    Hotelling (1933) chose his “components” so as to maximize their successive contributions to the total of the variances of the original variables, “principal components”.

    Karl Pearson (1857-1936)

    Harold Hotelling (1895-1973)

  • 34 JRC-COIN © | Step 7: Statistical Coherence (II)

    Cronbach’s α is measure of internal consistency and reliability

    Based upon classical test theory. Most common estimate of reliability, Cronbach (1951).

    Cronbach’s α is defined as Ratio of the Total Covariance in the “test” to the Total Variance in the “test”.

    By “test” we mean the set of indicators constituting a Sub-pillar/Pillar/Composite Indicator.

    Part 2: Reliability Analysis

  • 35 JRC-COIN © | Step 7: Statistical Coherence (II)

    Cronbach’s α is measure of internal consistency and reliability

    α ranges from 0 to 1. If all indicators are entirely independent from one another (i.e. are not correlated or share no covariance) α = 0. If all of the items have high covariances, then α will approach 1.

    Caution! α increases when the number of indicators increases.

    Cronbach’s α

  • 36 JRC-COIN © | Step 7: Statistical Coherence (II)

    Cronbach’s α is measure of internal consistency and reliability

    Cronbach’s α cut off values

    Cronbach's alpha Internal consistency 0.9 ≤ α Excellent 0.8 ≤ α < 0.9 Good 0.7 ≤ α < 0.8 Acceptable 0.6 ≤ α < 0.7 Questionable 0.5 ≤ α < 0.6 Poor α < 0.5 Unacceptable

    Lee Cronbach (1916-2001)

  • 37 JRC-COIN © | Step 7: Statistical Coherence (II)

    Cronbach’s α is measure of internal consistency and reliability

    Cronbach’s α - European Skills Index

    Cronbach's alpha Internal consistency 0.9 ≤ α Excellent 0.8 ≤ α < 0.9 Good 0.7 ≤ α < 0.8 Acceptable 0.6 ≤ α < 0.7 Questionable 0.5 ≤ α < 0.6 Poor α < 0.5 Unacceptable

    European Skills Index

    Index level (3 pillars)

    Item levelCorrected

    Item-Total

    Correlation

    Alpha if Item

    is dropped

    Pillar 1: Skills Development 0.59 0.28

    Pillar 2: Skills Activation 0.44 0.42

    Pillar 3: Skills Matching 0.23 0.74

    0.59

    Cronbach's Alpha

  • 38 JRC-COIN © | Step 7: Statistical Coherence (II)

    Cronbach’s α is measure of internal consistency and reliability

    Cronbach’s α - Social Progress Index

    Cronbach's alpha Internal consistency 0.9 ≤ α Excellent 0.8 ≤ α < 0.9 Good 0.7 ≤ α < 0.8 Acceptable 0.6 ≤ α < 0.7 Questionable 0.5 ≤ α < 0.6 Poor α < 0.5 Unacceptable

    Social Progress Index

    Index level (3 dimensions)

    Item levelCorrected

    Item-Total

    Correlation

    Alpha if Item

    is dropped

    1: Basic Human Needs 0.91 0.94

    2: Foundations of Wellbeing 0.96 0.89

    3: Opportunity 0.86 0.97

    Cronbach's Alpha

    0.96

    If α is too high it may suggest that some items are redundant as that are measuring very similar concepts.

    Maximum α 0.9 has been recommended.

  • 39 JRC-COIN © | Step 7: Statistical Coherence (II)

    PCA and RA

    Concluding words

    Input Output

    Black Box Open transparent box

  • 40 JRC-COIN © | Step 7: Statistical Coherence (II)

    References

    Books

    • Johnson, “Applied Multivariate Statistical Analysis (6th Revised edition)”, Pearson Education Limited (2013)

    • Jolliffe, “Principal Component Analysis, Second Edition”, Springer (2002)

    • Leandre, Fabrigar, Duane, “Exploratory Factor Analysis”, Oxford (2011)

    • Rencher, “Methods of multivariate analysis”, Wiley (1995)

    • Tinsley and Brown, “Handbook of Applied Multivariate Statistics and Mathematical Modeling”, Academic Press (2000)

    • Tucker, MacCallum, “Exploratory Factor Analysis”, https://www.unc.edu/~rcm/book/factornew.htm (1997)

    https://www.unc.edu/~rcm/book/factornew.htmhttps://www.unc.edu/~rcm/book/factornew.htm

  • Any questions? You may contact us at @username & [email protected]

    Welcome to email us at: [email protected]

    THANK YOU

    COIN in the EU Science Hub https://ec.europa.eu/jrc/en/coin

    COIN tools are available at: https://composite-indicators.jrc.ec.europa.eu/

    The European Commission’s

    Competence Centre on Composite

    Indicators and Scoreboards

    mailto:[email protected]:[email protected]:[email protected]://ec.europa.eu/jrc/en/coinhttps://ec.europa.eu/jrc/en/coinhttps://ec.europa.eu/jrc/en/coinhttps://composite-indicators.jrc.ec.europa.eu/https://composite-indicators.jrc.ec.europa.eu/https://composite-indicators.jrc.ec.europa.eu/https://composite-indicators.jrc.ec.europa.eu/https://composite-indicators.jrc.ec.europa.eu/

  • 42 JRC-COIN © | Step 7: Statistical Coherence (II)

    Data matrix of indicators

    11 12 1

    21 22 2

    1 2

    .....

    .....

    .

    .

    .....

    p

    p

    n n np

    x x x

    x x x

    x x x

    X

    n = number of objects (countries, …)

    p = number of variables (indicators)

    The previous steps in building our CI - Step 3 Outlier detection and missing data estimation - Step 4 Normalisation/Standardisation

    X correlated data set (without outliers and (without or a few) missing values)

    PCA is based on correlations

  • 43 JRC-COIN © | Step 7: Statistical Coherence (II)

    First steps in PCA

    • Check the correlation structure of the data and do 1) and 2)

    1) Bartlett’s sphericity test

    The Bartlett’s test checks if the observed correlation matrix R diverges significantly from the identity matrix.

    H0 : |R| = 1 , H1 : |R| ≠ 1

    Test statistic

    Under H0, it follows a χ² distribution with a p(p-1)/2 degrees of freedom.

    Want to reject H0 to be able to do PCA,FA! (Bartlett (1937))

    Rp

    n log6

    5212

    n = sample size

    p = number of indicators

  • 44 JRC-COIN © | Step 7: Statistical Coherence (II)

    First steps in PCA

    • Check the correlation structure of the data and do 1) and 2)

    2) Kaiser-Meyer-Olkin (KMO) Measure for Sampling Adequacy

    Indicators are more or less correlated, but the correlation between two indicators can be influenced by the others. Use partial correlation in order to measure the relation between two indicators by removing the effect of the remaining indicators (controlling for the confounding variable)

    Want KMO close to 1 to be able to do PCA! KMO>0.6 ok

    (Kaiser (1970), Kaiser-Meyer (1974))

    22

    2

    ijij

    ij

    ar

    rKMO

    rij Pearson correlation coefficient aij partial correlation coefficient

  • 45 JRC-COIN © | Step 7: Statistical Coherence (II)

    PCA and sample size

    Sufficient number of cases (countries, regions, cities…) necessary to perform PCA. NO scientific answer – opinions differ!

    Two different approaches :

    (A) Minimum total sample size

    (B) Examining the ratio of subjects to variables

    Rule of 10: These should be at least 10 cases for each variable

    3:1 ratio: The cases-to variables ratio should be no lower than 3

    5:1 ratio: The cases-to variables ratio should be no lower than 5

  • 46 JRC-COIN © | Step 7: Statistical Coherence (II)

    PCA and sample size (cont.1)

    (A) Minimum total sample size

    Rule of 100: Gorsuch (1983) and Kline (1979, p. 40) recommended at least 100 (MacCallum, Widaman, Zhang & Hong, 1999). No sample should be less than 100 even though the number of variables is less than 20 (Gorsuch, 1974, p. 333; in Arrindell & van der Ende, 1985, p. 166);

    Hatcher (1994) recommended that the number of subjects should be the larger of 5 times the number of variables, or 100. Even more subjects are needed when communalities are low and/or few variables load on each factor (in David Garson, 2008).

    Rule of 150: Hutcheson and Sofroniou (1999) recommends at least 150 - 300 cases, more toward the 150 end when there are a few highly correlated variables, as would be the case when collapsing highly multicollinear variables (in David Garson, 2008).

    Rule of 200: Guilford (1954, p. 533) suggested that N should be at least 200 cases (in MacCallum, Widaman, Zhang & Hong, 1999, p84; in Arrindell & van der Ende, 1985; p. 166).

    Rule of 250: Cattell (1978) claimed the minimum desirable N to be 250 (in MacCallum, Widaman, Zhang & Hong, 1999, p84).

    Rule of 300: There should be at least 300 cases (Noruis, 2005: 400, in David Garson, 2008).

    Significance rule. Lawley and Maxwell (1971) suggested 51 more cases than the number of variables, to support chi-square testing (in David Garson, 2008).

    Rule of 500. Comrey and Lee (1992) thought that 100 = poor, 200 = fair, 300 = good, 500 = very good, 1,000 or more = excellent They urged researchers to obtain samples of 500 or more observations whenever possible (in MacCallum, Widaman, Zhang & Hong, 1999, p84).

  • 47 JRC-COIN © | Step 7: Statistical Coherence (II)

    PCA and sample size (cont.2)

    (B) Examining the ratio of subjects to variables

    A ratio of 20:1: Hair, Anderson, Tatham, and Black (1995, in Hogarty, Hines, Kromrey, Ferron, & Mumford, 2005)

    Rule of 10: There should be at least 10 cases for each item in the instrument being used. (David Garson, 2008; Everitt, 1975; Everitt, 1975, Nunnally, 1978, p. 276, in Arrindell & van der Ende,1985, p. 166; Kunce, Cook, & Miller, 1975, Marascuilor & Levin, 1983, in Velicer & Fava, 1998, p. 232)

    Rule of 5: The subjects-to-variables ratio should be no lower than 5 (Bryant and Yarnold, 1995, in David Garson, 2008; Gorsuch, 1983, in MacCallum, Widaman, Zhang & Hong, 1999; Everitt, 1975, in Arrindell & van der Ende, 1985; Gorsuch, 1974, in Arrindell & van der Ende, 1985, p. 166)

    A ratio of 3(:1) to 6(:1) of subjects-to-variables is acceptable if the lower limit of variables-to-factors ratio is 3 to 6. But, the absolute minimum sample size should not be less than 250.(Cattell, 1978, p. 508, in Arrindell & van der Ende, 1985, p. 166)

    Ratio of 2. “There should be at least twice as many subjects as variables in factor-analytic investigations. This means that in any large study on this account alone, one should have to use more than the minimum 100 subjects" (Kline, 1979, p. 40).

  • 48 JRC-COIN © | Step 7: Statistical Coherence (II)

    Multivariate data

    Data Indicators

    Quantitative Mixed/

    structured

    disPCA/FA,(M)CA

    Qualitative

    PCA/FA FAMD/MFA

    PCA Principal Component Analysis FA Factor Analysis

    PCA for discrete data (based on polychoric correlations) (M)CA (Multiple) Correspondence Analysis ….

    FAMD Factor Analysis of Mixed Data MFA Multiple Factor Analysis ….

    The type of analysis to be performed depends on the data

    of the indicators

  • 49 JRC-COIN © | Step 7: Statistical Coherence (II)

    Exploratory FA

    • Two main types of FA: Exploratory (EFA) and Confirmatory (CFA)

    • EFA examining the relationships among the variables/indicators and do not have an “à priori” fixed number of factors

    • CFA a clear idea about the number of factors you will encounter, and about which variables will most likely load onto each factor

  • 50 JRC-COIN © | Step 7: Statistical Coherence (II)

    Is PCA = Exploratory FA?

    • Similar in many ways but an important difference that has effects on how to use them

    1) Both are data reduction techniques - they allow you to capture the variance in variables in a smaller set

    2) Same (or similar) procedures used to run in many software (same steps extraction, interpretation, rotation, choosing the number of components/factors)

    PCA FA

  • 51 JRC-COIN © | Step 7: Statistical Coherence (II)

    Is PCA = Exploratory FA? NO!

    • Similar in many ways but an important difference that has effects on how to use them

    1) Both are data reduction techniques - they allow you to capture the variance in variables in a smaller set

    2) Same (or similar) procedures used to run in many software (same steps extraction, interpretation, choosing the number of components/factors)

    • Difference:

    PCA is a linear combination of variables

    FA is a measurement model of a latent variable (statistical model)

    PCA FA

  • 52 JRC-COIN © | Step 7: Statistical Coherence (II)

    Is PCA = Exploratory FA? NO!

    PCA FA

    . . .

    Observed indicators are reduced into components

    𝒘𝟏

    𝒘𝟐

    𝒘𝒑

    Component

    Item 1 X1

    Item 2 X2

    Item p Xp

    PCA summarizes information of all indicators and reduces it into a fewer number of components Each PCi is a new variable computed as a linear combination of the original variables

    ppxwxwxwPC ....

    22111

    Factor

    Item 1 X1

    Item 2 X2

    Item p Xp

    . . .

    𝞴𝟏

    𝞴𝟐

    𝞴𝒑

    𝞮𝟏

    Latent factors drive the observed variables

    𝞮𝟐

    𝞮𝒑

    𝞮𝒊

    Model of the measurement of a latent variable. This latent variable cannot be directly measured with a single variable. Seen through the relationships it causes in a set of X variables.

    Error terms

    ppp

    iii

    FX

    FX

    FX

    1

    1

    1111

  • 53 JRC-COIN © | Step 7: Statistical Coherence (II)

    PCA for discrete data

    A practically important violation of the normality assumption underlying the PCA occurs when the data are discrete!

    Several kinds of discrete data (binary, ordinal, count, nominal…)

    Two ways on how to proceed:

    1) Simple approach, use the discrete x’s as if they were continuous in the PCA but instead of using the Pearson moment correlation coefficients use Spearman or Kendall rank correlation coefficients.

    2) PCA using polychoric correlations. Pearson (1901) founder of this approach (as well!), developed further by Olsson (1979) and Jöreskog (2004)

  • 54 JRC-COIN © | Step 7: Statistical Coherence (II)

    PCA using polychoric correlations

    The polychoric correlation (PCC) coefficient is a measure of association for ordinal variables which rests upon an assumption of an underlying joint continuous normal distribution. (Later generalizations of different continuous distributions).

    PCC coefficient is MLE of the product-moment correlation between the underlying normal variables

  • 55 JRC-COIN © | Step 7: Statistical Coherence (II)

    Cronbach’s α is measure of internal consistency and reliability Based upon classical test theory. Most common estimate of reliability, Cronbach (1951)

    Cronbach’s α is defined as Ratio of the Total Covariance in the “test” to the Total Variance in the “test”. By “test” we mean the set of indicators constituting a sub-pillar/ pillar/CI.

    Let be the variance of indicator i and total variance in sub-pillar/pillar/CI

    Average covariance of an indicator with any other indicator is

    Part 2: Reliability Analysis

    )( iXVar p

    i iXVar

    1

    )1(

    )(11

    pp

    XVarXVarp

    i i

    p

    i i

    p

    i i

    p

    i i

    p

    i i

    p

    i i

    p

    i i

    p

    i i

    XVar

    XVarXVar

    p

    p

    pXVar

    ppXVarXVar

    1

    11

    2

    1

    11)(

    1/

    )1(/)(

    p = number of indicators

    p

    i i

    p

    i i

    XVar

    XVar

    p

    p

    1

    1)(

    11