

Ronald H. Heck and Lynn N. Tabata
EDEP 606: Multivariate Methods (S2013)
February 24, 2013

Introduction to Principal Components Analysis and Factor Analysis

Often when we are doing survey research we have a large number of items that we think may represent a smaller subset of underlying variables. Over the years, several different approaches have been developed for thinking about the variables that represent these reduced dimensions in the data set. In general, they fall into two basic approaches: principal components analysis (PCA) and exploratory factor analysis (EFA). These approaches have generated a considerable amount of debate over their proper uses, the nature of their indicators, and whether one approach or the other is better able to "capture the real world." Certain fields may prefer one approach over the other.

Psychologists are often interested in the nature of underlying constructs, which, because they are abstract concepts such as job satisfaction, anxiety, or leadership, must be measured by a set of observable variables. This orientation has typically favored factor analysis, where the underlying constructs (or latent factors) are proposed to "cause" the pattern of covariation observed among a set of observed indicators. The observed indicators in the FA approach are often termed "reflective." We might think of the proposed relationships as in Figure 1, where the factor "causes" the relationships among its observed indicators (with any measurement errors indicated by short arrows):

Figure 1. Proposed factor model.

The EFA approach focuses on the amount of "common variance" shared between the underlying factor and its observed indicators (with stronger solutions accounting for more common variance and having less corresponding measurement error). A factor score, which represents a weighted combination of the observed indicators adjusted for error, can be saved in the data set; it defines each individual's standing on the construct. Different methods of estimation (e.g., principal axis factoring, maximum likelihood) may yield slightly different estimates of the factor scores. We may define the equations in the figure as follows:

$X_1 = \lambda_1 F_1 + e_1$
$X_2 = \lambda_2 F_1 + e_2$
$X_3 = \lambda_3 F_1 + e_3$
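To make the reflective logic concrete, here is a minimal simulation sketch in Python/numpy (the loadings, sample size, and variable names are illustrative and not taken from the handout's data). Under a one-factor model with a unit-variance factor and independent errors, the implied correlation between any two indicators is the product of their loadings, λiλj.

Python (numpy) sketch:

import numpy as np

rng = np.random.default_rng(0)
n = 100_000                      # large n so sample correlations are stable
lam = np.array([0.8, 0.7, 0.6])  # illustrative loadings for one factor

F = rng.standard_normal(n)                             # latent factor, variance 1.0
e = rng.standard_normal((n, 3)) * np.sqrt(1 - lam**2)  # unique errors, so Var(X_i) = 1
X = F[:, None] * lam + e                               # X_i = lambda_i * F + e_i

# Off the diagonal, the sample correlations should approximate
# lambda_i * lambda_j (diagonals differ: 1.0 versus lambda_i**2).
print(np.corrcoef(X, rowvar=False).round(2))
print(np.outer(lam, lam).round(2))

With a large n the sample correlations closely match the implied ones, which is the sense in which the factor "causes" the observed covariation.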


In contrast, PCA is primarily a data reduction method, where there is not necessarily a "proposed" conceptual relationship between the specific items measuring a latent variable. Unlike factor analysis, the observed variables are proposed to cause the component, so the observed indicators are referred to as "formative" in this approach. In this case, therefore, there is no measurement error associated with the observed indicators. The goal in PCA is to summarize the data through a linear combination of the observed variables. In the figure below, we can see this is similar to a multiple regression model, where the outcome is a linear combination of the observed variables. The difference is that the principal component is an "artificial" variable rather than an actual dependent variable as in a multiple regression analysis. The goal is to obtain a solution that accounts for almost all of the variance among the observed variables in a correlation matrix through a minimum number of components, which must be orthogonal to (uncorrelated with) each other. The "total variance" in the correlation matrix of observed variables is the sum of the variances of the observed variables; because each is standardized to 1.0, this is the sum of the diagonal elements (or 3 in this case). Keep in mind that the full set of variables in the correlation matrix will always account for "all" of the observed variance, so the idea is to reduce the number of components needed to account for as much variance as possible. If one or two components can account for almost all of the observed variance, we would feel comfortable that we had "reduced," or simplified, the dimensionality of the data sufficiently. We might think of the implied relationships in a PCA as follows:

Figure 2. Proposed PCA model.

In the figure above, the arrows from the observed indicators suggest they cause the component, which actually represents a "weighted" combination of the observed variables. The component score (or scores) can be saved in the data set for further analysis. The component represents a weighted combination of the observed indicators:

$C_1 = b_{11}X_1 + b_{12}X_2 + b_{13}X_3 + \cdots + b_{1p}X_p$

It is possible that for a set of observed variables there might be more than one component, but each is constructed to be a weighted combination that is uncorrelated with the preceding components. Each added component will account for less variance than the preceding component. Typically, we only wish to interpret a limited number of components that account for most of the variance in the observed correlation matrix.
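As a sketch of how such a weighted combination is formed (the function name is ours, and this is the general eigenvector logic rather than SPSS's exact computation): the weights for the first component are the elements of the leading eigenvector of the correlation matrix.

Python (numpy) sketch:

import numpy as np

def first_component_scores(X):
    """Scores on the first principal component of the correlation
    matrix: a weighted combination of the standardized variables,
    C1 = b11*Z1 + b12*Z2 + ... + b1p*Zp."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each item
    R = np.corrcoef(Z, rowvar=False)
    vals, vecs = np.linalg.eigh(R)            # eigenvalues in ascending order
    w = vecs[:, -1]                           # eigenvector of the largest eigenvalue
    return Z @ w                              # one component score per respondent

The sign of an eigenvector is arbitrary, so the scores may come out reflected; this does not affect the variance accounted for.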


Applying Each Approach to Actual Data

Keep in mind that the different assumptions underlying each approach may make one more suitable to a given investigation than the other. In the following example,¹ let's suppose we have six items on a survey that we think might measure two underlying constructs. One we might call professional support and the other we might call discrimination. We obtain responses from 100 subjects in an organization, and we want to see if indeed the items can be used to define the two hypothesized constructs. Let's look at the correlations between the observed items. We can see two clusters of positively correlated items (the p items defining professional support and the disc items defining discrimination). The correlations between the three professional support items range from .548 to .659, while the correlations between the three discrimination items range from .424 to .715. It is likely, therefore, that we might have two separate underlying factors. Adding the variances in the diagonal representing each observed item gives the total variance in the matrix as 6.0.

Preliminary Model, Table 1. Correlations

          proact    promo     percrit   sexdisc   ethdisc   agedisc
proact    1         .548**    .653**    -.320**   -.365**   -.349**
promo     .548**    1         .659**    -.385**   -.369**   -.299**
percrit   .653**    .659**    1         -.246*    -.257**   -.310**
sexdisc   -.320**   -.385**   -.246*    1         .715**    .424**
ethdisc   -.365**   -.369**   -.257**   .715**    1         .496**
agedisc   -.349**   -.299**   -.310**   .424**    .496**    1

**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).

What happens if we first apply PCA, which is the default in SPSS? PCA proceeds by "repackaging" the total variance of the p variables in the correlation matrix R into p eigenvalues. This is accomplished by finding a matrix $\mathbf{V}$ of eigenvectors and a diagonal matrix $\mathbf{L}$ containing the eigenvalues in its main diagonal, such that $\mathbf{R} = \mathbf{V}\mathbf{L}\mathbf{V}'$. Marcoulides and Hershberger (1997) provide further discussion of how the principal components are then derived as the product of the eigenvectors and the square roots of the eigenvalues (see pp. 168-171). A common retention criterion in PCA is that a component should have an eigenvalue of at least 1.0, since each observed variable in the correlation matrix contributes a variance of 1.0. Component extraction will produce p components, which together account for all of the variance in the p variables. We might hope the first component extracted has an eigenvalue of at least 4.0, which would account for 4/6, or about 67%, of the observed variance. Let's see what happens.

¹ Instructions for replicating the models and tables are included at the end of this handout.
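As a concrete check on this "repackaging," here is a small numpy sketch using the correlations reported in Table 1. Eigenvalues computed this way should closely reproduce the values reported in Table 2 below (up to rounding of the published correlations), and they must sum to 6, the trace of the matrix.

Python (numpy) sketch:

import numpy as np

# Correlation matrix from Table 1
# (order: proact, promo, percrit, sexdisc, ethdisc, agedisc)
R = np.array([
    [1.000,  .548,  .653, -.320, -.365, -.349],
    [ .548, 1.000,  .659, -.385, -.369, -.299],
    [ .653,  .659, 1.000, -.246, -.257, -.310],
    [-.320, -.385, -.246, 1.000,  .715,  .424],
    [-.365, -.369, -.257,  .715, 1.000,  .496],
    [-.349, -.299, -.310,  .424,  .496, 1.000],
])

vals = np.linalg.eigvalsh(R)[::-1]   # eigenvalues, largest first
print(vals.round(3))                 # first two should be near 3.138 and 1.224
print(vals.sum())                    # total variance = p = 6.0
print((vals / 6 * 100).round(1))     # % of variance per component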


We obtain the following information about the total variance accounted for by the extracted components. We can see the solution obtained two components with eigenvalues greater than 1.0. The first component accounted for about 52.3% of the observed variance (a bit less than the initial guess of 67%). The second component accounted for about 20.4%, for a total of about 72.7% of the variance. As the default is to extract components with eigenvalues of at least 1.0, further components would not meet this criterion, so the output suggests stopping here. Note that we can also override this default criterion and tell the program to extract a specified number of components.

Model 1, Table 2. Total Variance Explained

Component   Initial Eigenvalues                      Extraction Sums of Squared Loadings
            Total   % of Variance   Cumulative %     Total   % of Variance   Cumulative %
1           3.138   52.298          52.298           3.138   52.298          52.298
2           1.224   20.400          72.698           1.224   20.400          72.698
3           .635    10.591          83.289
4           .443     7.379          90.668
5           .287     4.784          95.452
6           .273     4.548          100.000

Extraction Method: Principal Component Analysis.

We can obtain a scree plot, which shows that after two components there is little to be gained by adding successive components. The third through sixth components fall below an eigenvalue of 1.0, so they are assumed not to add much of value to the data reduction.
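A minimal matplotlib sketch of the same scree plot, using the eigenvalues from Table 2 (this simply re-plots the published values; it is not SPSS output):

Python (numpy) sketch:

import matplotlib.pyplot as plt

eigenvalues = [3.138, 1.224, 0.635, 0.443, 0.287, 0.273]  # from Table 2

plt.plot(range(1, 7), eigenvalues, marker='o')
plt.axhline(1.0, linestyle='--', color='gray')  # eigenvalue-1 retention criterion
plt.xlabel('Component Number')
plt.ylabel('Eigenvalue')
plt.title('Scree Plot')
plt.show()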


Figure 3. Model 1 scree plot.

Here is how the observed variables relate to the components. Below we can see that on the first component the professional support items load positively and the discrimination items load negatively. On the second component, there is a positive weighting of each. Remember that the components themselves must be uncorrelated, so if there is any actual correlation between the professional support and discrimination constructs, it will be artificially removed. I would say this is not a very "pleasing" solution, since we had in mind that there might be two distinct constructs (one defined as professional support and one defined as discrimination); they do not show up well here. It should be noted that people often "rotate" the PCA solution to obtain more "desirable" looking solutions, but there is some debate about this. A number of researchers (e.g., Rencher, 2002, p. 403) and some software programs (e.g., Stata) advise against this extra step, since the rotated solutions may no longer be orthogonal (which violates a primary assumption of PCA). If the primary reason for the PCA is to reduce the dimensionality of the data, there is no need for rotation, since the components are exact orthogonal linear combinations that account for the maximum amount of variance in the observed correlation matrix. If the goal is to see the variables in relation to the component functions in physical space, then varimax (orthogonal) rotation may facilitate the overall "look" of the PCA.


Model 1, Table 3. Component Matrix(a)

            Component
            1        2
proact      .754     .380
promo       .762     .359
percrit     .728     .549
sexdisc     -.708    .520
ethdisc     -.734    .526
agedisc     -.646    .319

Extraction Method: Principal Component Analysis.
a. 2 components extracted.

Let's see what happens if we use an EFA approach instead of PCA. In this case, we will first let the computer derive a proposed solution, again extracting "underlying factors" with eigenvalues above 1.0. We might expect two underlying factors, given the way we constructed our instrument to measure support and discrimination. We will use principal axis factoring (PAF), one commonly used approach. In EFA we typically use a rotation scheme, either forcing the factors to be uncorrelated (which is easier to interpret) or allowing them to be correlated. In this first example, we will use varimax rotation, which specifies that the final set of factors must be uncorrelated. We first obtain estimates of how much variance is explained by the factors extracted. The initial estimates suggest two factors with eigenvalues over 1.0. Once the factors are actually extracted, the second factor accounts for an eigenvalue a bit below 1.0 (0.936). However, we would typically keep the factor, since it is hypothesized to exist in our model. We can see that after rotating the factors, each accounts for about an equal amount of the "common variance" between the items and the underlying factors.

Model 2, Table 4. Total Variance Explained

Factor   Initial Eigenvalues                    Extraction Sums of Squared Loadings    Rotation Sums of Squared Loadings
         Total   % of Variance  Cumulative %    Total   % of Variance  Cumulative %    Total   % of Variance  Cumulative %
1        3.138   52.298         52.298          2.777   46.279         46.279          1.911   31.852         31.852
2        1.224   20.400         72.698          .936    15.600         61.879          1.802   30.027         61.879
3        .635    10.591         83.289
4        .443     7.379         90.668
5        .287     4.784         95.452
6        .273     4.548         100.000

Extraction Method: Principal Axis Factoring.
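For readers who want to see the mechanics, here is a minimal numpy sketch of iterated principal axis factoring, following the general description in the SPSS Help (squared multiple correlations as initial communality estimates, then iterate the reduced correlation matrix to convergence). It is a sketch of the technique under those assumptions, not SPSS's exact implementation, so results may differ slightly.

Python (numpy) sketch:

import numpy as np

def principal_axis_factoring(R, n_factors, tol=1e-6, max_iter=100):
    """Iterated principal axis factoring: place communality estimates
    in the diagonal of R, factor the reduced matrix, and repeat until
    the communalities stabilize."""
    # SMC for variable i is 1 - 1/r^ii, where r^ii is a diagonal
    # element of the inverse correlation matrix.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    p = R.shape[0]
    for _ in range(max_iter):
        Rr = R.copy()
        np.fill_diagonal(Rr, h2)                 # reduced correlation matrix
        vals, vecs = np.linalg.eigh(Rr)
        vals, vecs = vals[::-1], vecs[:, ::-1]   # largest eigenvalues first
        L = vecs[:, :n_factors] * np.sqrt(np.clip(vals[:n_factors], 0, None))
        h2_new = np.sum(L**2, axis=1)            # updated communalities
        if np.max(np.abs(h2_new - h2)) < tol:
            break
        h2 = h2_new
    return L   # unrotated factor loadings (p x n_factors)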


The matrix of item loadings on the underlying factors is summarized in the following table. Here we can see clearly the presence of two different underlying constructs. The first factor (which we might call professional support) is defined by relatively high loadings of percrit, proact, and promo. (Items load on their factors at less than 1.0.) The item most strongly associated with the underlying dimension is percrit. The discrimination items are only weakly associated with the professional support factor, since their loadings are small (under .30 in absolute value) and negative. The second factor is defined primarily by high loadings of the discrimination items, and the support items have small negative loadings on it. Keep in mind that a limitation of varimax rotation is that the underlying factors are forced to be uncorrelated. This might not reflect the "true life" nature of the two constructs, however.

Model 2, Table 5. Rotated Factor Matrix(a)

            Factor
            1        2
proact      .686     -.274
promo       .680     -.294
percrit     .916     -.107
sexdisc     -.187    .766
ethdisc     -.172    .893
agedisc     -.274    .494

Extraction Method: Principal Axis Factoring.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.
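For the curious, here is a minimal numpy sketch of the varimax criterion itself, using the standard SVD-based algorithm. Note that SPSS also applies Kaiser normalization (rows are scaled by their communalities before rotation), so applying this function to the unrotated loadings will only approximate Table 5.

Python (numpy) sketch:

import numpy as np

def varimax(L, tol=1e-8, max_iter=100):
    """Rotate a loading matrix L (p x k) to maximize the variance of
    the squared loadings within each factor (Kaiser's varimax)."""
    p, k = L.shape
    R = np.eye(k)                    # accumulated rotation matrix
    d_old = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        u, s, vt = np.linalg.svd(
            L.T @ (Lr**3 - Lr @ np.diag((Lr**2).sum(axis=0)) / p))
        R = u @ vt
        d = s.sum()
        if d - d_old < tol:          # criterion has stopped improving
            break
        d_old = d
    return L @ R                     # rotated loadings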

Let's try the factor analysis again with direct oblimin rotation, which allows the factors to be correlated. Here, we will concentrate on the factor loadings and the correlation between the factors. When we specify direct oblimin rotation, we obtain two relevant tables regarding the relationship of the items to the underlying factors. The first is the Pattern Matrix. It is similar to the standardized beta estimates in a multiple regression analysis: it provides the relative weights (or importance) of each item in defining the underlying factor, adjusting out the effects of the other items. In this case, we can see that the professional support items load quite well on the first factor (from about .67 to .99). In contrast, the discrimination items have very little relationship to the first factor. Instead, the discrimination items load quite well on the second factor (from about .47 to .94), and the support items are only weakly associated with it. This seems to represent a satisfactory solution.


Model 3, Table 6. Pattern Matrix(a)

            Factor
            1        2
proact      .682     -.105
promo       .669     -.129
percrit     .985     .147
sexdisc     .014     .795
ethdisc     .068     .941
agedisc     -.161    .468

Extraction Method: Principal Axis Factoring.
Rotation Method: Oblimin with Kaiser Normalization.
a. Rotation converged in 7 iterations.

The Structure Matrix provides the correlations between the items and the factors. It should resemble the Pattern Matrix, but the loadings will not be exactly the same, since in the Structure Matrix each loading represents the correlation between the item and the underlying dimension. Here we can see the support items define the first factor while the discrimination items define the second factor (with negative correlations for the items proposed to load primarily on the other factor).

Model 3, Table 7. Structure Matrix

            Factor
            1        2
proact      .733     -.439
promo       .732     -.456
percrit     .913     -.335
sexdisc     -.375    .788
ethdisc     -.392    .908
agedisc     -.390    .547

Extraction Method: Principal Axis Factoring.
Rotation Method: Oblimin with Kaiser Normalization.


We can also examine the correlation between the two factors. It is -.489, which suggests a moderate negative relationship between perceptions of professional support and discrimination. In this case, forcing the factors to be uncorrelated would obscure this relationship.

Model 3, Table 8. Factor Correlation Matrix

Factor      1        2
1           1.000    -.489
2           -.489    1.000

Extraction Method: Principal Axis Factoring.
Rotation Method: Oblimin with Kaiser Normalization.
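The two tables are connected algebraically: the structure matrix is the pattern matrix post-multiplied by the factor correlation matrix, $S = P\Phi$. A quick numpy check with the values from Tables 6 and 8 reproduces Table 7 to rounding:

Python (numpy) sketch:

import numpy as np

# Pattern loadings from Table 6 (rows: proact, promo, percrit,
# sexdisc, ethdisc, agedisc; columns: factor 1, factor 2)
P = np.array([
    [ .682, -.105],
    [ .669, -.129],
    [ .985,  .147],
    [ .014,  .795],
    [ .068,  .941],
    [-.161,  .468],
])

# Factor correlation matrix from Table 8
Phi = np.array([
    [1.000, -.489],
    [-.489, 1.000],
])

# Structure matrix: correlations between items and factors (Table 7)
print((P @ Phi).round(3))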

We can also look at the loadings of the items on the rotated factors. Items closer to the axes can be considered more strongly related to the underlying dimension. We can see, for example, that age discrimination is least related to the discrimination factor, and that the professional support items are best represented by percrit (specific performance criteria are related to how I am evaluated).

Figure 4. Loadings of Model 3's items on the rotated factors.

At this point, we might decide that we have represented the two factors generally well with the set of observed indicators. We might then decide to save the two factor scores in the data set


and use them for further types of analyses, for example, investigating how individuals’ backgrounds might differentiate people according to their standing on the two latent factors.
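How such scores are computed depends on the scoring method chosen. Below is a minimal sketch of one common option, the regression method; the function name is ours, and this is a sketch of the general technique rather than SPSS's exact SAVE computation.

Python (numpy) sketch:

import numpy as np

def regression_factor_scores(Z, R, S):
    """Estimate factor scores by the regression method: the weight
    matrix is R^{-1} S, where R is the item correlation matrix and S
    is the structure matrix (item-factor correlations). Z holds the
    standardized item responses (n x p)."""
    W = np.linalg.solve(R, S)   # p x k weight matrix, solves R W = S
    return Z @ W                # n x k factor scores, one row per respondent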

References

Marcoulides, G. A., & Hershberger, S. L. (1997). Multivariate statistical methods: A first course. Mahwah, NJ: Lawrence Erlbaum.

Rencher, A. C. (2002). Methods of multivariate analysis (2nd ed.). New York: Wiley.


Defining Correlations (Table 1) with IBM SPSS Menu Commands

IBM SPSS syntax:

CORRELATIONS
  /VARIABLES=proact promo percrit sexdisc ethdisc agedisc
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.

(Launch the IBM SPSS application program and select the FA-orgturnover.sav data file.)

1. Go to the toolbar, select ANALYZE, CORRELATE, BIVARIATE. This command opens the Bivariate Correlations main dialog box.


2. In the Bivariate Correlations main dialog box, select the 6 variables to be assessed, then click the right-arrow button to move them into the Variables box.

Note: An alternative method is to click and select the variables as a group, then "drag" them to the Variables box.

We will retain the default settings, so click the OK button to generate the results.


Defining Factor Analysis Model 1 (Tables 2, 3; Figure 3) with IBM SPSS Menu Commands

IBM SPSS syntax:

FACTOR
  /VARIABLES proact promo percrit sexdisc ethdisc agedisc
  /MISSING LISTWISE
  /ANALYSIS proact promo percrit sexdisc ethdisc agedisc
  /PRINT INITIAL EXTRACTION
  /PLOT EIGEN
  /CRITERIA MINEIGEN(1) ITERATE(25)
  /EXTRACTION PC
  /ROTATION NOROTATE
  /METHOD=CORRELATION.

(Continue using the FA-orgturnover.sav data file.)

1. Go to the toolbar, select ANALYZE, DIMENSION REDUCTION, FACTOR. This command opens the Factor Analysis main dialog box.


2. In the Factor Analysis main dialog box, click to select the 6 variables, then click the right-arrow button to move them into the Variables box.

Note: An alternative method is to click and select the variables as a group, then "drag" them to the Variables box.

Click the EXTRACTION button to access the Factor Analysis: Extraction dialog box for specifying the method of extraction to be used.

3. The Factor Analysis: Extraction dialog box offers a variety of factor extraction methods.

a. For this analysis, we will use the default principal components method. As noted in the IBM SPSS online Help reference, Principal Components is used to form uncorrelated linear combinations of the observed variables. The first component has maximum variance. Successive components explain progressively smaller portions of the variance and are all uncorrelated with each other. Principal components analysis is used to obtain the initial factor solution. It can be used when a correlation matrix is singular.

b. We want to have a scree plot generated, so click to select: Scree plot.

c. We will retain all factors whose eigenvalues exceed 1, which is the default setting.

Click the CONTINUE button to return to the Factor Analysis main dialog box.


4. From the Factor Analysis main dialog box click the OK button to generate the output results.


Defining Factor Analysis Model 2 (Tables 4, 5) with IBM SPSS Menu Commands

IBM SPSS syntax:

FACTOR
  /VARIABLES proact promo percrit sexdisc ethdisc agedisc
  /MISSING LISTWISE
  /ANALYSIS proact promo percrit sexdisc ethdisc agedisc
  /PRINT INITIAL EXTRACTION ROTATION
  /PLOT EIGEN
  /CRITERIA MINEIGEN(1) ITERATE(25)
  /EXTRACTION PAF
  /CRITERIA ITERATE(25)
  /ROTATION VARIMAX
  /METHOD=CORRELATION.

(Continue using the FA-orgturnover.sav data file. IBM SPSS settings default to those used in Model 1.)

1. Go to the toolbar, select ANALYZE, DIMENSION REDUCTION, FACTOR. This command opens the Factor Analysis main dialog box.


2. In the Factor Analysis main dialog box click the EXTRACTION button to access the Factor Analysis: Extraction dialog box for specifying the method of extraction to be used.

3. The Factor Analysis: Extraction dialog box offers a variety of factor extraction methods.

a. For this analysis, click the pull-down menu and select: Principal axis factoring. As noted in the IBM SPSS online Help reference, Principal Axis Factoring extracts factors from the original correlation matrix with squared multiple correlation coefficients placed in the diagonal as initial estimates of the communalities. These factor loadings are used to estimate new communalities that replace the old communality estimates in the diagonal. Iterations continue until the changes in the communalities from one iteration to the next satisfy the convergence criterion for extraction. We will retain the default Extract (based on Eigenvalues) and Display (Unrotated factor solution) settings.

Click the CONTINUE button to return to the Factor Analysis main dialog box.


4. a. We will select the method of rotation to be used, so from the Factor Analysis main dialog box click the ROTATION button to access the Factor Analysis: Rotation dialog box.

b. To specify the rotation method for this analysis, click to select: Varimax. As noted in the IBM SPSS online Help (2011), varimax is an orthogonal rotation method that minimizes the number of variables that have high loadings on each factor. This method simplifies the interpretation of the factors. We will retain the default Rotated solution display setting.

Click the CONTINUE button to return to the main Factor Analysis dialog box. From the Factor Analysis main dialog box click the OK button to generate the output results.


Defining Factor Analysis Model 3 (Tables 6, 7, 8; Figure 4) with IBM SPSS Menu Commands

IBM SPSS syntax:

FACTOR
  /VARIABLES proact promo percrit sexdisc ethdisc agedisc
  /MISSING LISTWISE
  /ANALYSIS proact promo percrit sexdisc ethdisc agedisc
  /PRINT INITIAL EXTRACTION ROTATION
  /PLOT EIGEN ROTATION
  /CRITERIA MINEIGEN(1) ITERATE(25)
  /EXTRACTION PAF
  /CRITERIA ITERATE(25) DELTA(0)
  /ROTATION OBLIMIN
  /METHOD=CORRELATION.

(Continue using the FA-orgturnover.sav data file. IBM SPSS settings default to those used in Model 2.)

1. Go to the toolbar, select ANALYZE, DIMENSION REDUCTION, FACTOR. This command opens the Factor Analysis main dialog box.


2. a. In the Factor Analysis main dialog box, click the ROTATION button to access the Factor Analysis: Rotation dialog box.

b. To specify the rotation method for this analysis, click to select: Direct Oblimin. We will retain the default delta of "0". Direct Oblimin is a method for oblique (nonorthogonal) rotation. When delta equals 0 (the default), solutions are most oblique. (IBM SPSS online Help, 2011.)

c. We will use the default Rotated solution setting. However, we would like to see the loadings of the items on the rotated factors, as items closer to the axes can be considered more strongly related to the underlying dimension. To generate the factor plot in the output, click to select: Loading plot(s).

Click the CONTINUE button to return to the main Factor Analysis dialog box. From the Factor Analysis main dialog box, click the OK button to generate the output results.