
R3: Graphics and Visualizationstevel/519/R/R07.EFA.doc  · Web viewExploratory Factor Analysis. 0. Introduction. In social sciences (e.g., psychology), it is often not possible to


Exploratory Factor Analysis

0. Introduction

In social sciences (e.g., psychology), it is often not possible to measure the variables of interest directly. Examples:

Intelligence
Social class

Such variables are called latent variables or common factors. Researchers examine such variables indirectly, by measuring variables that can be measured and that are believed to be indicators of the latent variables of interest. Examples:

Examination scores on various tests
Occupation, education, home ownership

Such variables are called observed variables.

Goal: study the relationship between the latent variables and the observed variables

1. Psychological Testing Data

data(Harman74.cor)
test.cor = Harman74.cor$cov[c(6, 7, 9, 10, 12), c(6, 7, 9, 10, 12)]
colnames(test.cor) = c("PARA","SENT","WORD","ADD","COUNT")
rownames(test.cor) = colnames(test.cor)
test.cor
       PARA  SENT  WORD   ADD COUNT
PARA  1.000 0.722 0.714 0.203 0.095
SENT  0.722 1.000 0.685 0.246 0.181
WORD  0.714 0.685 1.000 0.170 0.113
ADD   0.203 0.246 0.170 1.000 0.585
COUNT 0.095 0.181 0.113 0.585 1.000

image(1:5, 1:5, test.cor, zlim=c(-1,1), col=cm.colors(21))
# Plot of correlations - magenta = positive, cyan = negative
image(1:5, 1:5, test.cor, zlim=c(-1,1), col=grey(0:20/20))
# Similar, with white = +1, black = -1

1b. Principal components analysis

test.pc = eigen(test.cor) # Principal components analysis
test.pc$values
[1] 2.5875 1.4217 0.4152 0.3111 0.2645
plot(test.pc$values, type="o", pch=16)
abline(h=1, col="grey")

test.pc$vectors[,1:2]
        [,1]    [,2]
[1,] -0.5345 -0.2449
[2,] -0.5424 -0.1641
[3,] -0.5234 -0.2470
[4,] -0.2971  0.6268
[5,] -0.2406  0.6776

test.loadings = test.pc$vectors %*% diag(sqrt(test.pc$values))
test.loadings
           [,1]       [,2]        [,3]         [,4]        [,5]
[1,] -0.8597723 -0.2920337 -0.07368865 -0.055092837  0.40870847
[2,] -0.8725092 -0.1957119  0.03846385 -0.368390488 -0.25146294
[3,] -0.8419192 -0.2945670  0.09276139  0.411615628 -0.16238911
[4,] -0.4779188  0.7473420 -0.45579984  0.053423069 -0.04965974
[5,] -0.3869818  0.8079858  0.43809069 -0.008495525  0.07354173
test.loadings %*% t(test.loadings) # same as the correlation matrix test.cor
      [,1]  [,2]  [,3]  [,4]  [,5]
[1,] 1.000 0.722 0.714 0.203 0.095
[2,] 0.722 1.000 0.685 0.246 0.181
[3,] 0.714 0.685 1.000 0.170 0.113
[4,] 0.203 0.246 0.170 1.000 0.585
[5,] 0.095 0.181 0.113 0.585 1.000

1c. Exploratory factor analysis – two factors

test.fa2 = factanal(covmat = test.cor, factors=2, n.obs=145)

The R function factanal() assumes that X has a multivariate normal distribution and maximizes the log-likelihood over the factor loading matrix and the uniquenesses to estimate the parameters (i.e., the ML estimates are obtained iteratively).

test.fa2

Call:
factanal(factors = 2, covmat = test.cor, n.obs = 145)

Uniquenesses:
 PARA  SENT  WORD   ADD COUNT
0.242 0.300 0.327 0.574 0.155

Loadings:
      Factor1 Factor2
PARA   0.867
SENT   0.820   0.166
WORD   0.816
ADD    0.167   0.631
COUNT          0.918

               Factor1 Factor2
SS loadings      2.119   1.282
Proportion Var   0.424   0.256
Cumulative Var   0.424   0.680

Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 0.58 on 1 degree of freedom.
The p-value is 0.446
# df(chi sq) = df(test.cor) - df(factor 1) - df(factor 2) = 10 - 5 - 4 = 1

apply(test.fa2$loadings^2, 2, sum)
 Factor1  Factor2
2.119212 1.281613
apply(test.fa2$loadings^2, 1, sum)
     PARA      SENT      WORD       ADD     COUNT
0.7575545 0.7002648 0.6727687 0.4256436 0.8445924
apply(test.fa2$loadings^2, 1, sum) + test.fa2$uniquenesses
     PARA      SENT      WORD       ADD     COUNT
1.0000002 0.9999997 0.9999999 1.0000005 1.0000000

test.fa2$loadings %*% t(test.fa2$loadings) + diag(test.fa2$uniquenesses)
# fitted correlation matrix: close to test.cor, with diagonal equal to 1 up to rounding
            PARA      SENT      WORD       ADD      COUNT
PARA  1.00000018 0.7233842 0.7137061 0.1908447 0.09714164
SENT  0.72338424 0.9999997 0.6833973 0.2421572 0.18171652
WORD  0.71370606 0.6833973 0.9999999 0.1916676 0.10913971
ADD   0.19084473 0.2421572 0.1916676 1.0000005 0.58499501
COUNT 0.09714164 0.1817165 0.1091397 0.5849950 1.00000002

1d. One-factor model

test.fa1 = update(test.fa2, factors=1) # shorthand to make minor changes to a model
test.fa1

Call:
factanal(factors = 1, covmat = test.cor, n.obs = 145)

Uniquenesses:
 PARA  SENT  WORD   ADD COUNT
0.258 0.294 0.328 0.933 0.970

Loadings:
      Factor1
PARA  0.861
SENT  0.840
WORD  0.820
ADD   0.260
COUNT 0.173

               Factor1
SS loadings      2.217
Proportion Var   0.443

Test of the hypothesis that 1 factor is sufficient.
The chi square statistic is 58.17 on 5 degrees of freedom.
The p-value is 2.91e-11
# df(chi sq) = df(test.cor) - df(factor 1) = 10 - 5 = 5
# small p-value; reject the one-factor model

Can you show that the chi-square df actually equals ½[(p-c)^2 - p - c], where p is the number of X variables and c is the number of factors chosen?
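A quick numerical check of this formula is sketched below. The helper names fa.df and fa.df.count are made up for this illustration (they are not part of R). The second helper follows the counting used in the comments above: p(p-1)/2 off-diagonal correlations, minus pc loadings, plus the c(c-1)/2 constraints that remove the rotational freedom, which simplifies algebraically to ½[(p-c)^2 - p - c]. The comments list the degrees of freedom that the factanal() outputs in this handout report for the same (p, c) pairs.

```r
# Hypothetical helper (not part of R): degrees of freedom of the chi-square
# test for a k-factor model on p observed variables (k plays the role of c).
fa.df <- function(p, k) ((p - k)^2 - p - k) / 2

# Same quantity by direct counting: off-diagonal correlations, minus loadings,
# plus the k(k-1)/2 constraints that fix the rotation.
fa.df.count <- function(p, k) p * (p - 1) / 2 - (p * k - k * (k - 1) / 2)

fa.df(5, 1:2)    # 5-variable test data: 5 1
fa.df(6, 1:3)    # artificial data:      9 4 0
fa.df(8, 1:3)    # girls data:           20 13 7
fa.df(24, 1:5)   # full test battery:    252 229 207 186 166

# The closed form and the direct count agree.
all(fa.df(24, 1:5) == fa.df.count(24, 1:5))
```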

Often a sequential testing procedure is used: start with 1 factor and then increase the number of factors one at a time until the test no longer rejects the null hypothesis. It can happen that the test always rejects the null hypothesis. This is an indication that the model does not fit well, or that the sample size is too large. At other times, the sequential chi-square test tends to overestimate the number of factors needed for a successful interpretation.
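The sequential procedure can be sketched as a short loop. This is only an illustration on the five-variable test data from Section 1. The significance level alpha = 0.05 and the variable names idx, alpha and chosen are assumptions introduced here, not part of the handout. The loop stops early when the chi-square test would have no degrees of freedom left.

```r
# Five-variable psychological test correlations used in Section 1.
idx <- c(6, 7, 9, 10, 12)
test.cor <- Harman74.cor$cov[idx, idx]
p <- ncol(test.cor)
alpha <- 0.05   # assumed significance level (a choice made for this sketch)

chosen <- NA
for (k in 1:p) {
  # No test is available once the degrees of freedom reach zero.
  if (((p - k)^2 - p - k) / 2 <= 0) break
  fit <- factanal(covmat = test.cor, factors = k, n.obs = 145)
  cat(k, "factor(s): p-value =", signif(unname(fit$PVAL), 3), "\n")
  # Stop at the first k for which the test does not reject.
  if (fit$PVAL > alpha) { chosen <- k; break }
}
chosen   # smallest number of factors not rejected at level alpha
```

If chosen is still NA after the loop, every testable model was rejected, which is the situation described above where the model may simply not fit.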

An alternative utility approach, which is more computationally intensive, is: perform factor analyses with various values of c, complete with rotation, and choose the smallest c that gives the most appealing structure.
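A minimal sketch of this approach, using the girls' physical measurements correlation matrix that is analysed in Section 3: fit c = 1, 2, 3 factors with the default varimax rotation and print the loadings side by side for inspection. The display cutoff of 0.3 and the name fits are arbitrary choices made for this illustration; judging which structure is "most appealing" is left to the analyst.

```r
girls.cor <- Harman23.cor$cov

# One fit per candidate number of factors (varimax is factanal's default rotation).
fits <- lapply(1:3, function(k)
  factanal(covmat = girls.cor, factors = k, n.obs = 305))

for (fit in fits) {
  cat("\n---", fit$factors, "factor(s) ---\n")
  # Hide small loadings so that the structure is easier to read;
  # cutoff = 0.3 is an arbitrary display choice.
  print(fit$loadings, cutoff = 0.3)
}
```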

2. Artificial Data (From R factanal help page)

A little demonstration, v2 is just v1 with noise, and same for v4 vs. v3 and v6 vs. v5

v1 = c(1,1,1,1,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
v2 = c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
v3 = c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
v4 = c(3,3,4,3,3,1,1,2,1,1,1,1,2,1,1,5,6,4)
v5 = c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
v6 = c(1,1,1,2,1,3,3,3,4,3,1,1,1,2,1,6,5,4)
m1 = cbind(v1,v2,v3,v4,v5,v6)
pairs(m1)
pairs(m1+runif(6*18, -.3, .3)) # "jittering" to break ties

2a. Principal components

m1.pc = prcomp(m1)
plot(m1.pc$sdev^2, type="o", pch=16)
abline(h=1,col="grey")
pairs(m1.pc$x[,1:3])

m1.pc$rotation[,1:3]
      PC1      PC2     PC3
v1 0.4168 -0.52292  0.2354
v2 0.3886 -0.50888  0.2986
v3 0.4183  0.01522 -0.5555
v4 0.3944  0.02184 -0.5986
v5 0.4254  0.47017  0.2923
v6 0.4048  0.49581  0.3210

2b. Factor analysis

m1.fa1 = factanal(m1, factors=1)
m1.fa1

Call:
factanal(x = m1, factors = 1)

Uniquenesses:
   v1    v2    v3    v4    v5    v6
0.773 0.792 0.733 0.795 0.022 0.085

Loadings:
   Factor1
v1 0.476
v2 0.456
v3 0.517
v4 0.453
v5 0.989
v6 0.956

               Factor1
SS loadings      2.800
Proportion Var   0.467

Test of the hypothesis that 1 factor is sufficient.
The chi square statistic is 53.43 on 9 degrees of freedom.
The p-value is 2.43e-08

m1.fa2 = factanal(m1, factors=2, rotation="none")
m1.fa2

Call:
factanal(x = m1, factors = 2, rotation = "none")

Uniquenesses:
   v1    v2    v3    v4    v5    v6
0.005 0.114 0.642 0.742 0.005 0.097

Loadings:
   Factor1 Factor2
v1  0.853  -0.518
v2  0.804  -0.490
v3  0.598
v4  0.508
v5  0.857   0.510
v6  0.796   0.519

               Factor1 Factor2
SS loadings      3.358   1.038
Proportion Var   0.560   0.173
Cumulative Var   0.560   0.733

Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 23.14 on 4 degrees of freedom.
The p-value is 0.000119

m1.fa3 = factanal(m1, factors=3, rotation="none")
m1.fa3

Call:
factanal(x = m1, factors = 3, rotation = "none")

Uniquenesses:
   v1    v2    v3    v4    v5    v6
0.005 0.101 0.005 0.224 0.084 0.005

Loadings:
   Factor1 Factor2 Factor3
v1  0.808  -0.385   0.440
v2  0.752  -0.290   0.500
v3  0.813  -0.229  -0.530
v4  0.729  -0.139  -0.474
v5  0.802   0.521
v6  0.764   0.636

               Factor1 Factor2 Factor3
SS loadings      3.638   0.980   0.957
Proportion Var   0.606   0.163   0.159
Cumulative Var   0.606   0.770   0.929

The degrees of freedom for the model is 0 and the fit was 0.4755

m1.fa3a = factanal(m1, factors=3, rotation="varimax", scores="regression") # default rotation
m1.fa3a # Note improved interpretation of loadings

Call:
factanal(x = m1, factors = 3, scores = "regression", rotation = "varimax")

Uniquenesses:
   v1    v2    v3    v4    v5    v6
0.005 0.101 0.005 0.224 0.084 0.005

Loadings:
   Factor1 Factor2 Factor3
v1  0.944   0.182   0.267
v2  0.905   0.235   0.159
v3  0.236   0.210   0.946
v4  0.180   0.242   0.828
v5  0.242   0.881   0.286
v6  0.193   0.959   0.196

               Factor1 Factor2 Factor3
SS loadings      1.893   1.886   1.797
Proportion Var   0.316   0.314   0.300
Cumulative Var   0.316   0.630   0.929

The degrees of freedom for the model is 0 and the fit was 0.4755

pairs(m1.fa3a$scores)


3. Girls physical measurements data

Correlation matrix of 8 physical measurements on 305 girls between 7 and 17

data(Harman23.cor)
girls.cor = Harman23.cor$cov
girls.cor
               height arm.span forearm lower.leg weight bitro.diameter
height          1.000    0.846   0.805     0.859  0.473          0.398
arm.span        0.846    1.000   0.881     0.826  0.376          0.326
forearm         0.805    0.881   1.000     0.801  0.380          0.319
lower.leg       0.859    0.826   0.801     1.000  0.436          0.329
weight          0.473    0.376   0.380     0.436  1.000          0.762
bitro.diameter  0.398    0.326   0.319     0.329  0.762          1.000
chest.girth     0.301    0.277   0.237     0.327  0.730          0.583
chest.width     0.382    0.415   0.345     0.365  0.629          0.577
               chest.girth chest.width
height               0.301       0.382
arm.span             0.277       0.415
forearm              0.237       0.345
lower.leg            0.327       0.365
weight               0.730       0.629
bitro.diameter       0.583       0.577
chest.girth          1.000       0.539
chest.width          0.539       1.000

image(1:8, 1:8, girls.cor, zlim=c(-1,1), col=cm.colors(21))
girls.pc = eigen(girls.cor) # Principal components analysis
girls.pc$values
[1] 4.67288 1.77098 0.48104 0.42144 0.23322 0.18667 0.13730 0.09646
plot(girls.pc$values, type="o", pch=16)
abline(h=1, col="grey")

rownames(girls.pc$vectors) = colnames(girls.cor)
girls.pc$vectors[,1:2]
                  [,1]    [,2]
height         -0.3976 -0.2797
arm.span       -0.3893 -0.3314
forearm        -0.3762 -0.3446
lower.leg      -0.3884 -0.2971
weight         -0.3507  0.3942
bitro.diameter -0.3119  0.4007
chest.girth    -0.2855  0.4359
chest.width    -0.3102  0.3144

# Rotate PCs by varimax()

girls.pc.loadings = girls.pc$vectors %*% diag(sqrt(girls.pc$values))

summary(as.vector(girls.pc.loadings %*% t(girls.pc.loadings) - girls.cor))
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max.
-1.443e-15 -7.772e-16 -5.551e-16 -6.098e-16 -4.441e-16  0.000e+00

varimax(girls.pc.loadings[,c(1:2)])$loadings

Loadings:
                 [,1]  [,2]
height         -0.902 0.252
arm.span       -0.932 0.187
forearm        -0.920 0.156
lower.leg      -0.901 0.222
weight         -0.258 0.885
bitro.diameter -0.188 0.839
chest.girth    -0.114 0.839
chest.width    -0.257 0.747

                [,1]  [,2]
SS loadings    3.522 2.922
Proportion Var 0.440 0.365
Cumulative Var 0.440 0.805

$rotmat
          [,1]       [,2]
[1,] 0.7768362 -0.6297027
[2,] 0.6297027  0.7768362

Note that:

3.522+2.922
[1] 6.444
cumsum(girls.pc$values)
[1] 4.672880 6.443862 6.924898 7.346339 7.579560 7.766233 7.903537 8.000000
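The agreement between 6.444 and the second cumulative eigenvalue is no accident: an orthogonal rotation such as varimax only redistributes the explained variance between the retained columns and leaves the total unchanged. A small sketch that checks this directly follows. The names e, L, rot, ss.unrotated and ss.rotated are introduced here for illustration only.

```r
girls.cor <- Harman23.cor$cov
e <- eigen(girls.cor)

# Unrotated loadings of the first two principal components.
L <- e$vectors[, 1:2] %*% diag(sqrt(e$values[1:2]))
rot <- varimax(L)

ss.unrotated <- colSums(L^2)                       # equal to the first two eigenvalues
ss.rotated   <- colSums(unclass(rot$loadings)^2)   # redistributed between the columns

rbind(unrotated = ss.unrotated, rotated = ss.rotated)

# The column sums differ, but the totals agree with the sum of the
# first two eigenvalues.
c(sum(ss.unrotated), sum(ss.rotated), sum(e$values[1:2]))
```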

girls.fa1 = factanal(covmat=girls.cor, factors=1, n.obs=305)
girls.fa1

Call:
factanal(factors = 1, covmat = girls.cor, n.obs = 305)

Uniquenesses:
        height       arm.span        forearm      lower.leg         weight
         0.158          0.135          0.190          0.187          0.760
bitro.diameter    chest.girth    chest.width
         0.829          0.877          0.801

Loadings:
               Factor1
height         0.918
arm.span       0.930
forearm        0.900
lower.leg      0.902
weight         0.490
bitro.diameter 0.413
chest.girth    0.351
chest.width    0.446

               Factor1
SS loadings      4.064
Proportion Var   0.508

Test of the hypothesis that 1 factor is sufficient.
The chi square statistic is 611.4 on 20 degrees of freedom.
The p-value is 1.12e-116

girls.fa2 = factanal(covmat=girls.cor, factors=2, n.obs=305, rotation="none")
girls.fa2

Call:
factanal(factors = 2, covmat = girls.cor, n.obs = 305, rotation = "none")

Uniquenesses:
        height       arm.span        forearm      lower.leg         weight
         0.170          0.107          0.166          0.199          0.089
bitro.diameter    chest.girth    chest.width
         0.364          0.416          0.537

Loadings:
               Factor1 Factor2
height          0.880  -0.237
arm.span        0.874  -0.360
forearm         0.846  -0.344
lower.leg       0.855  -0.263
weight          0.705   0.644
bitro.diameter  0.589   0.538
chest.girth     0.526   0.554
chest.width     0.574   0.365

               Factor1 Factor2
SS loadings      4.434   1.518
Proportion Var   0.554   0.190
Cumulative Var   0.554   0.744

Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 75.74 on 13 degrees of freedom.
The p-value is 6.94e-11

girls.fa2a = factanal(covmat=girls.cor, factors=2, n.obs=305) # Varimax rotation
girls.fa2a

Call:
factanal(factors = 2, covmat = girls.cor, n.obs = 305)

Uniquenesses:
        height       arm.span        forearm      lower.leg         weight
         0.170          0.107          0.166          0.199          0.089
bitro.diameter    chest.girth    chest.width
         0.364          0.416          0.537

Loadings:
               Factor1 Factor2
height          0.865   0.287
arm.span        0.927   0.181
forearm         0.895   0.179
lower.leg       0.859   0.252
weight          0.233   0.925
bitro.diameter  0.194   0.774
chest.girth     0.134   0.752
chest.width     0.278   0.621

               Factor1 Factor2
SS loadings      3.335   2.617
Proportion Var   0.417   0.327
Cumulative Var   0.417   0.744

Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 75.74 on 13 degrees of freedom.
The p-value is 6.94e-11

arrows(0, 0, girls.fa2a$loadings[,1], girls.fa2a$loadings[,2], col="red")
identify(girls.fa2a$loadings[,1], girls.fa2a$loadings[,2], rownames(girls.fa2$loadings), col="red")

girls.fa2b = factanal(covmat=girls.cor, factors=2, n.obs=305, rotation="promax") # Promax rotation
girls.fa2b

Call:
factanal(factors = 2, covmat = girls.cor, n.obs = 305, rotation = "promax")

Uniquenesses:
        height       arm.span        forearm      lower.leg         weight
         0.170          0.107          0.166          0.199          0.089
bitro.diameter    chest.girth    chest.width
         0.364          0.416          0.537

Loadings:
               Factor1 Factor2
height          0.872
arm.span        0.973
forearm         0.938
lower.leg       0.876
weight                  0.961
bitro.diameter          0.803
chest.girth             0.796
chest.width     0.125   0.611

               Factor1 Factor2
SS loadings      3.375   2.589
Proportion Var   0.422   0.324
Cumulative Var   0.422   0.745

Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 75.74 on 13 degrees of freedom.
The p-value is 6.94e-11

arrows(0, 0, girls.fa2b$loadings[,1], girls.fa2b$loadings[,2], col="blue")
identify(girls.fa2b$loadings[,1], girls.fa2b$loadings[,2], rownames(girls.fa2$loadings), col="blue")

girls.fa3 = factanal(covmat=girls.cor, factors=3, n.obs=305)
girls.fa3

Call:
factanal(factors = 3, covmat = girls.cor, n.obs = 305)

Uniquenesses:
        height       arm.span        forearm      lower.leg         weight
         0.127          0.005          0.193          0.157          0.090
bitro.diameter    chest.girth    chest.width
         0.359          0.411          0.490

Loadings:
               Factor1 Factor2 Factor3
height          0.886   0.267  -0.130
arm.span        0.937   0.195   0.280
forearm         0.874   0.188
lower.leg       0.877   0.230  -0.145
weight          0.242   0.916  -0.106
bitro.diameter  0.193   0.777
chest.girth     0.137   0.755
chest.width     0.261   0.646   0.159

               Factor1 Factor2 Factor3
SS loadings      3.379   2.628   0.162
Proportion Var   0.422   0.329   0.020
Cumulative Var   0.422   0.751   0.771

Test of the hypothesis that 3 factors are sufficient.
The chi square statistic is 22.81 on 7 degrees of freedom.
The p-value is 0.00184

Note that although the p-value is far less extreme for three factors than for two (0.00184 vs. 6.94e-11), the third factor contributes far less to the total variance than the first two do (a proportion of 0.020, against 0.422 and 0.329).

3. Pain Reliever Perceptions Data (from book)

pain = read.table("http://www.uidaho.edu/~stevel/519/Data/PAIN_RELIEF.txt")
colnames(pain) = c("No Upset Stomach", "No Side Effects", "Stops Pain", "Works Quickly", "Keeps Me Awake", "Limited Relief")

pain.pc = prcomp(pain, scale=T)
plot(pain.pc$sdev^2, type="o", pch=16)
abline(h=1,col="grey")

pain.pc$rotation[,1:2]
                     PC1     PC2
No Upset Stomach  0.4316 -0.3595
No Side Effects   0.3808 -0.4442
Stops Pain        0.4536  0.3546
Works Quickly     0.3828  0.4407
Keeps Me Awake   -0.3516  0.4699
Limited Relief   -0.4392 -0.3642

pain.fa = factanal(pain, factors=2, scores="reg")
pain.fa

Call:
factanal(x = pain, factors = 2, scores = "reg")

Uniquenesses:
No Upset Stomach  No Side Effects       Stops Pain    Works Quickly
           0.434            0.344            0.346            0.365
  Keeps Me Awake   Limited Relief
           0.365            0.392

Loadings:
                 Factor1 Factor2
No Upset Stomach  0.136   0.740
No Side Effects           0.810
Stops Pain        0.802   0.105
Works Quickly     0.795
Keeps Me Awake           -0.796
Limited Relief   -0.776

               Factor1 Factor2
SS loadings      1.898   1.857
Proportion Var   0.316   0.309
Cumulative Var   0.316   0.626

Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 3.29 on 4 degrees of freedom.
The p-value is 0.511

plot(pain.fa$scores)

4. Luxury Car Perceptions (from book)

source("readTri.txt")
colnames(car.cor) = c("Luxury", "Style", "Reliability", "Fuel Econ", "Safety", "Maintenance", "Quality", "Durable", "Performance")
rownames(car.cor) = colnames(car.cor)
car.pc = eigen(car.cor)
car.pc$values
[1] 4.1640 1.5400 0.6857 0.5848 0.5152 0.4781 0.3736 0.3508 0.3077
plot(car.pc$values, type="o", pch=16)
abline(h=1, col="grey")

factanal(covmat=car.cor, factors=2, n.obs=162)

Call:
factanal(factors = 2, covmat = car.cor, n.obs = 162)

Uniquenesses:
     Luxury       Style Reliability   Fuel Econ      Safety
      0.164       0.546       0.440       0.625       0.573
Maintenance     Quality     Durable Performance
      0.560       0.359       0.451       0.469

Loadings:
            Factor1 Factor2
Luxury       0.914
Style        0.644   0.198
Reliability  0.387   0.640
Fuel Econ   -0.101   0.604
Safety       0.620   0.204
Maintenance  0.175   0.640
Quality      0.454   0.659
Durable      0.335   0.661
Performance  0.588   0.430

               Factor1 Factor2
SS loadings      2.491   2.322
Proportion Var   0.277   0.258
Cumulative Var   0.277   0.535

Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 19.6 on 19 degrees of freedom.
The p-value is 0.419

car.pc$vectors[,1:2]
               [,1]    [,2]
Luxury      -0.3125  0.4937
Style       -0.3198  0.3197
Reliability -0.3726 -0.1909
Fuel Econ   -0.1894 -0.5748
Safety      -0.3158  0.2930
Maintenance -0.3058 -0.3648
Quality     -0.3968 -0.1092
Durable     -0.3644 -0.1883
Performance -0.3768  0.1446

5. Full Psychological Test Battery

fulltest.cor = Harman74.cor$cov
image(1:24, 1:24, fulltest.cor, zlim=c(-1,1), col=cm.colors(21))

fulltest.pc = eigen(fulltest.cor) # Principal components analysis
fulltest.pc$values
 [1] 8.1354 2.0960 1.6926 1.5018 1.0252 0.9429 0.9012 0.8159 0.7902 0.7069
[11] 0.6394 0.5433 0.5330 0.5094 0.4775 0.3897 0.3820 0.3404 0.3338 0.3158
[21] 0.2972 0.2681 0.1897 0.1725
plot(fulltest.pc$values, type="o", pch=16)
abline(h=1, col="grey")
rownames(fulltest.pc$vectors) = rownames(fulltest.cor)

fulltest.pc$vectors[,1:4]
                          [,1]      [,2]     [,3]     [,4]
VisualPerception       -0.2159 -0.003764 -0.32875  0.16685
Cubes                  -0.1401 -0.054849 -0.30751  0.16446
PaperFormBoard         -0.1559 -0.131808 -0.36614  0.08630
Flags                  -0.1790 -0.122936 -0.25729  0.17621
GeneralInformation     -0.2436 -0.221950  0.25774  0.04309
PargraphComprehension  -0.2421 -0.288461  0.20378 -0.06579
SentenceCompletion     -0.2373 -0.293489  0.27319  0.05917
WordClassification     -0.2434 -0.167723  0.11036  0.09493
WordMeaning            -0.2434 -0.311224  0.22351 -0.06499
Addition               -0.1662  0.374211  0.34305  0.16446
Code                   -0.2020  0.299629  0.16148 -0.02757
CountingDots           -0.1691  0.379149  0.09735  0.27775
StraightCurvedCapitals -0.2167  0.192742 -0.02724  0.29855
WordRecognition        -0.1570  0.063949  0.04238 -0.45320
NumberRecognition      -0.1457  0.098312 -0.06024 -0.42914
FigureRecognition      -0.1871  0.062819 -0.30098 -0.26697
ObjectNumber           -0.1710  0.190302  0.04029 -0.38276
NumberFigure           -0.1906  0.266841 -0.15243 -0.12392
FigureWord             -0.1667  0.095217 -0.09373 -0.15777
Deduction              -0.2254 -0.128716 -0.10154 -0.05720
NumericalPuzzles       -0.2179  0.160439 -0.07670  0.16458
ProblemReasoning       -0.2242 -0.100721 -0.08481 -0.04537
SeriesCompletion       -0.2495 -0.072448 -0.11493  0.08402
ArithmeticProblems     -0.2358  0.135235  0.17931  0.05043

fulltest.fa1 = factanal(covmat = fulltest.cor, factors=1, n.obs=145)
fulltest.fa1

Call:
factanal(factors = 1, covmat = fulltest.cor, n.obs = 145)

Uniquenesses:
      VisualPerception                  Cubes         PaperFormBoard
                 0.677                  0.866                  0.830
                 Flags     GeneralInformation  PargraphComprehension
                 0.768                  0.487                  0.491
    SentenceCompletion     WordClassification            WordMeaning
                 0.500                  0.514                  0.474
              Addition                   Code           CountingDots
                 0.818                  0.731                  0.824
StraightCurvedCapitals        WordRecognition      NumberRecognition
                 0.681                  0.833                  0.863
     FigureRecognition           ObjectNumber           NumberFigure
                 0.775                  0.812                  0.778
            FigureWord              Deduction       NumericalPuzzles
                 0.816                  0.612                  0.676
      ProblemReasoning       SeriesCompletion     ArithmeticProblems
                 0.619                  0.524                  0.593

Loadings:
                       Factor1
VisualPerception       0.569
Cubes                  0.366
PaperFormBoard         0.412
Flags                  0.482
GeneralInformation     0.716
PargraphComprehension  0.713
SentenceCompletion     0.707
WordClassification     0.697
WordMeaning            0.725
Addition               0.426
Code                   0.519
CountingDots           0.419
StraightCurvedCapitals 0.565
WordRecognition        0.408
NumberRecognition      0.370
FigureRecognition      0.474
ObjectNumber           0.434
NumberFigure           0.471
FigureWord             0.429
Deduction              0.623
NumericalPuzzles       0.569
ProblemReasoning       0.617
SeriesCompletion       0.690
ArithmeticProblems     0.638

               Factor1
SS loadings      7.438
Proportion Var   0.310

Test of the hypothesis that 1 factor is sufficient.
The chi square statistic is 622.9 on 252 degrees of freedom.
The p-value is 2.28e-33

update(fulltest.fa1, factors=2)
Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 420.2 on 229 degrees of freedom.
The p-value is 2.01e-13

update(fulltest.fa1, factors=3)
Test of the hypothesis that 3 factors are sufficient.
The chi square statistic is 295.6 on 207 degrees of freedom.
The p-value is 0.0000512

update(fulltest.fa1, factors=4)

Call:
factanal(factors = 4, covmat = fulltest.cor, n.obs = 145)

Uniquenesses:
      VisualPerception                  Cubes         PaperFormBoard
                 0.438                  0.780                  0.644
                 Flags     GeneralInformation  PargraphComprehension
                 0.651                  0.352                  0.312
    SentenceCompletion     WordClassification            WordMeaning
                 0.283                  0.485                  0.257
              Addition                   Code           CountingDots
                 0.240                  0.551                  0.435
StraightCurvedCapitals        WordRecognition      NumberRecognition
                 0.491                  0.646                  0.696
     FigureRecognition           ObjectNumber           NumberFigure
                 0.549                  0.598                  0.593
            FigureWord              Deduction       NumericalPuzzles
                 0.762                  0.592                  0.583
      ProblemReasoning       SeriesCompletion     ArithmeticProblems
                 0.601                  0.497                  0.500

Loadings: Factor1 Factor2 Factor3 Factor4
VisualPerception 0.160 0.689 0.187 0.160
Cubes 0.117 0.436
PaperFormBoard 0.137 0.570 0.110
Flags 0.233 0.527
GeneralInformation 0.739 0.185 0.213 0.150
PargraphComprehension 0.767 0.205 0.233
SentenceCompletion 0.806 0.197 0.153
WordClassification 0.569 0.339 0.242 0.132
WordMeaning 0.806 0.201 0.227
Addition 0.167 -0.118 0.831 0.166
Code 0.180 0.120 0.512 0.374
CountingDots 0.210 0.716
StraightCurvedCapitals 0.188 0.438 0.525
WordRecognition 0.197 0.553
NumberRecognition 0.122 0.116 0.520
FigureRecognition 0.408 0.525
ObjectNumber 0.142 0.219 0.574
NumberFigure 0.293 0.336 0.456
FigureWord 0.148 0.239 0.161 0.365
Deduction 0.378 0.402 0.118 0.301
NumericalPuzzles 0.175 0.381 0.438 0.223
ProblemReasoning 0.366 0.399 0.123 0.301
SeriesCompletion 0.369 0.500 0.244 0.239
ArithmeticProblems 0.370 0.158 0.496 0.304

               Factor1 Factor2 Factor3 Factor4
SS loadings      3.647   2.872   2.657   2.290
Proportion Var   0.152   0.120   0.111   0.095
Cumulative Var   0.152   0.272   0.382   0.478

Test of the hypothesis that 4 factors are sufficient.
The chi square statistic is 226.7 on 186 degrees of freedom.
The p-value is 0.0224

update(fulltest.fa1, factors=5)

Call:
factanal(factors = 5, covmat = fulltest.cor, n.obs = 145)

Uniquenesses:
      VisualPerception                  Cubes         PaperFormBoard
                 0.450                  0.781                  0.639
                 Flags     GeneralInformation  PargraphComprehension
                 0.649                  0.357                  0.288
    SentenceCompletion     WordClassification            WordMeaning
                 0.277                  0.485                  0.262
              Addition                   Code           CountingDots
                 0.215                  0.386                  0.444
StraightCurvedCapitals        WordRecognition      NumberRecognition
                 0.256                  0.639                  0.706
     FigureRecognition           ObjectNumber           NumberFigure
                 0.550                  0.614                  0.596
            FigureWord              Deduction       NumericalPuzzles
                 0.764                  0.521                  0.564
      ProblemReasoning       SeriesCompletion     ArithmeticProblems
                 0.580                  0.442                  0.478

Loadings: Factor1 Factor2 Factor3 Factor4 Factor5
VisualPerception 0.161 0.658 0.136 0.182 0.199
Cubes 0.113 0.435 0.107
PaperFormBoard 0.135 0.562 0.107 0.116
Flags 0.231 0.533
GeneralInformation 0.736 0.188 0.192 0.162
PargraphComprehension 0.775 0.187 0.251 0.113
SentenceCompletion 0.809 0.208 0.136
WordClassification 0.568 0.348 0.223 0.131
WordMeaning 0.800 0.215 0.224
Addition 0.175 -0.100 0.844 0.176
Code 0.185 0.438 0.451 0.426
CountingDots 0.222 0.690 0.101 0.140
StraightCurvedCapitals 0.186 0.425 0.458 0.559
WordRecognition 0.197 0.557
NumberRecognition 0.121 0.130 0.508
FigureRecognition 0.400 0.529
ObjectNumber 0.145 0.208 0.562
NumberFigure 0.306 0.325 0.452
FigureWord 0.147 0.242 0.145 0.364
Deduction 0.370 0.452 0.139 0.287 -0.190
NumericalPuzzles 0.170 0.402 0.439 0.230
ProblemReasoning 0.358 0.423 0.126 0.302
SeriesCompletion 0.360 0.549 0.256 0.223 -0.107
ArithmeticProblems 0.371 0.185 0.502 0.307

               Factor1 Factor2 Factor3 Factor4 Factor5
SS loadings      3.632   2.964   2.456   2.345   0.663
Proportion Var   0.151   0.124   0.102   0.098   0.028
Cumulative Var   0.151   0.275   0.377   0.475   0.503

Test of the hypothesis that 5 factors are sufficient.
The chi square statistic is 186.8 on 166 degrees of freedom.
The p-value is 0.128

6. Factor analysis vs. PCA

Similarities

Both methods are mostly used in EDA (exploratory data analysis).
Both methods aim at dimension reduction: explaining a data set with a smaller number of new variables.
Neither method works well if the observed variables are almost uncorrelated:
  o Then PCA returns components that are similar to the original variables.
  o Then factor analysis has nothing to explain, i.e., the uniquenesses will all be close to 1.
Both methods give similar results if the specific variances are small.
If the specific variances are assumed to be zero in principal factor analysis, then PCA and factor analysis are the same.
Neither PCA nor FA requires a normality assumption in general; the exception for ML-based factor analysis is noted under Differences.

Differences

PCA requires virtually no assumptions.
Factor analysis assumes that the data come from a specific model structure. A normality assumption is needed in FA for the chi-square test and the ML estimates. The principal factor analysis estimation procedure does not require normality, though.

In PCA, the emphasis is on transforming the observed variables to principal components.
In factor analysis, the emphasis is on the transformation from factors to observed variables.

PCA is not scale invariant.
Factor analysis (with MLE) is scale invariant.

In PCA, considering c + 1 instead of c components does not change the first c components.
In factor analysis, considering c + 1 instead of c factors may change the first c factors (when using the MLE method).

Calculation of PCA scores is straightforward.
Calculation of factor scores is more involved.
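The last contrast can be made concrete with the artificial data from Section 2. The sketch below recomputes both kinds of scores by hand. The PCA part is exact by construction. The factor-score part assumes that the "regression" option of factanal() with an orthogonal rotation multiplies the standardized data by the inverse correlation matrix times the loadings (Thomson's regression scores); that reading of factanal() is an assumption of this sketch, so the final comparison is the check. The names pc.scores, Lambda and fa.scores are introduced here for illustration.

```r
# Artificial data from the factanal() help page (Section 2).
v1 <- c(1,1,1,1,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
v2 <- c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
v3 <- c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
v4 <- c(3,3,4,3,3,1,1,2,1,1,1,1,2,1,1,5,6,4)
v5 <- c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
v6 <- c(1,1,1,2,1,3,3,3,4,3,1,1,1,2,1,6,5,4)
m1 <- cbind(v1, v2, v3, v4, v5, v6)

# PCA scores: centred data times the eigenvector (rotation) matrix.
m1.pc <- prcomp(m1)
pc.scores <- scale(m1, center = TRUE, scale = FALSE) %*% m1.pc$rotation
all.equal(unname(pc.scores), unname(m1.pc$x))

# Factor scores (regression method, assumed form): standardized data times
# R^{-1} Lambda, so the fitted loadings and the inverse of the correlation
# matrix are both needed.
m1.fa <- factanal(m1, factors = 3, scores = "regression")
Lambda <- unclass(m1.fa$loadings)
fa.scores <- scale(m1) %*% solve(cor(m1), Lambda)
all.equal(unname(fa.scores), unname(m1.fa$scores), tolerance = 1e-6)
```

The PCA scores need only the data and the eigenvectors, whereas the factor scores are estimates that depend on the fitted model and on the scoring method chosen (factanal() also offers "Bartlett").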

# Exploratory Factor Analysis

# Psychological Test Results Data (from book)

data(Harman74.cor)test.cor <- Harman74.cor$cov[c(6, 7, 9, 10, 12),c(6, 7, 9, 10, 12)]colnames(test.cor) <- c("PARA","SENT","WORD","ADD","COUNT")rownames(test.cor) <- colnames(test.cor)

test.cor

image(1:5, 1:5, test.cor, zlim=c(-1,1), col=cm.colors(21))# Plot of correlations - magenta = positive, cyan =

negativeimage(1:5, 1:5, test.cor, zlim=c(-1,1), col=grey(0:20/20) )

# Similar, with white = +1, black = -1.

test.pc <- eigen(test.cor) # Principal components analysistest.pc$valuesplot(test.pc$values,type="o", pch=16)abline(h=1,col="grey")

test.pc$vectors[,1:2]

test.fa2 <- factanal(covmat = test.cor, factors=2, n.obs=145)test.fa2

test.fa1 <- update(test.fa2, factors=1) # shorthand to make minor changestest.fa1 # to a model


# Note p-value for one-factor model
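The test results can also be pulled out of the fitted objects for a side-by-side comparison. This is a sketch that relies on the STATISTIC, dof and PVAL components that factanal() returns when n.obs is supplied; the name fit.summary is chosen here for illustration. The models are refitted directly so that the block runs on its own.

```r
# Five-test correlation matrix, as above
data(Harman74.cor)
idx <- c(6, 7, 9, 10, 12)
test.cor <- Harman74.cor$cov[idx, idx]

fa1 <- factanal(covmat = test.cor, factors = 1, n.obs = 145)
fa2 <- factanal(covmat = test.cor, factors = 2, n.obs = 145)

# Chi-square test of the hypothesis that k factors are sufficient
fit.summary <- data.frame(
  factors   = c(1, 2),
  statistic = unname(c(fa1$STATISTIC, fa2$STATISTIC)),
  dof       = c(fa1$dof, fa2$dof),
  p.value   = unname(c(fa1$PVAL, fa2$PVAL))
)
fit.summary
```

A small p-value is evidence that the stated number of factors is not sufficient.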

# Artificial Data - From R factanal( ) help pages

# A little demonstration: v2 is just v1 with noise,
# and the same for v4 vs. v3 and v6 vs. v5

v1 <- c(1,1,1,1,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
v2 <- c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
v3 <- c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
v4 <- c(3,3,4,3,3,1,1,2,1,1,1,1,2,1,1,5,6,4)
v5 <- c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
v6 <- c(1,1,1,2,1,3,3,3,4,3,1,1,1,2,1,6,5,4)
m1 <- cbind(v1,v2,v3,v4,v5,v6)

pairs(m1)
pairs(m1 + runif(6*18, -.3, .3)) # "jittering" to break ties

m1.pc <- prcomp(m1)
plot(m1.pc$sdev^2, type="o", pch=16)
abline(h=1, col="grey") # the h=1 cutoff is meant for standardized variables; prcomp(m1, scale.=TRUE) gives that scale

pairs(m1.pc$x[,1:3])
m1.pc$rotation[,1:3]

m1.fa1 <- factanal(m1, factors=1)
m1.fa1

m1.fa2 <- factanal(m1, factors=2, rotation="none")
m1.fa2

m1.fa3 <- factanal(m1, factors=3, rotation="none")
m1.fa3

m1.fa3a <- factanal(m1, factors=3, rotation="varimax", scores="regression") # varimax is the default rotation
m1.fa3a # Note improved interpretation of loadings

pairs(m1.fa3a$scores)
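Rotation changes how the loadings are distributed across factors but not how much of each variable the factors explain. The sketch below checks this on the same artificial data by comparing communalities (row sums of squared loadings) before and after varimax; the names fa.none, fa.vmax, h2.none and h2.vmax are chosen here for illustration.

```r
# Artificial data from the factanal() help page
v1 <- c(1,1,1,1,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
v2 <- c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
v3 <- c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
v4 <- c(3,3,4,3,3,1,1,2,1,1,1,1,2,1,1,5,6,4)
v5 <- c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
v6 <- c(1,1,1,2,1,3,3,3,4,3,1,1,1,2,1,6,5,4)
m1 <- cbind(v1, v2, v3, v4, v5, v6)

fa.none <- factanal(m1, factors = 3, rotation = "none")
fa.vmax <- factanal(m1, factors = 3, rotation = "varimax")

# Communality of each variable: sum of its squared loadings over the factors
h2.none <- rowSums(unclass(fa.none$loadings)^2)
h2.vmax <- rowSums(unclass(fa.vmax$loadings)^2)

round(cbind(unrotated = h2.none, varimax = h2.vmax), 4)

# The uniquenesses are likewise unaffected by the rotation
round(cbind(unrotated = fa.none$uniquenesses,
            varimax   = fa.vmax$uniquenesses), 4)
```

Because varimax is an orthogonal rotation, the two communality columns agree up to numerical error, so the rotated solution fits the data exactly as well as the unrotated one.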

# Girls physical measurements data

data(Harman23.cor)
girls.cor <- Harman23.cor$cov # Correlation matrix of 8 physical measurements on 305 girls aged 7 to 17

girls.cor

image(1:8, 1:8, girls.cor, zlim=c(-1,1), col=cm.colors(21) )

girls.pc <- eigen(girls.cor) # Principal components analysis
girls.pc$values


plot(girls.pc$values, type="o", pch=16)
abline(h=1, col="grey")

rownames(girls.pc$vectors) <- colnames(girls.cor)

girls.pc$vectors[,1:2]

girls.fa1 <- factanal(covmat=girls.cor, factors=1, n.obs=305)
girls.fa1

girls.fa2 <- factanal(covmat=girls.cor, factors=2, n.obs=305, rotation="none")
girls.fa2

plot(0, 0, xlim=c(-1,1), ylim=c(-1,1), type='n', xlab="Factor 1 Loadings", ylab="Factor 2 Loadings")
abline(h=0, col='grey')
abline(v=0, col='grey')
arrows(0, 0, girls.fa2$loadings[,1], girls.fa2$loadings[,2])
text(girls.fa2$loadings[,1], girls.fa2$loadings[,2], rownames(girls.fa2$loadings))

girls.fa2a <- factanal(covmat=girls.cor, factors=2, n.obs=305) # Varimax rotation
girls.fa2a

arrows(0, 0, girls.fa2a$loadings[,1], girls.fa2a$loadings[,2], col="red")
text(girls.fa2a$loadings[,1], girls.fa2a$loadings[,2], rownames(girls.fa2$loadings), col="red")

girls.fa2b <- factanal(covmat=girls.cor, factors=2, n.obs=305, rotation="promax") # Promax rotation
girls.fa2b

arrows(0, 0, girls.fa2b$loadings[,1], girls.fa2b$loadings[,2], col="blue")
text(girls.fa2b$loadings[,1], girls.fa2b$loadings[,2], rownames(girls.fa2$loadings), col="blue")

girls.fa3 <- factanal(covmat=girls.cor, factors=3, n.obs=305)
girls.fa3 # The third factor contributes far less information than the first two

varimax(girls.pc$vectors[,1:2]) # Varimax rotation may be used for PCA loadings too
promax(girls.pc$vectors[,1:2]) # As may promax

girls2.cor <- girls.cor[1:5,1:5] # Drop weight-related variables except weight itself

girls2.fa2 <- factanal(covmat=girls2.cor, factors=2, n.obs=305)
girls2.fa2 # Weight has high uniqueness, but not its own factor


# Pain Reliever Perceptions Data (from book)

pain <- read.table("PAIN_RELIEF.txt")
colnames(pain) <- c("No Upset Stomach", "No Side Effects", "Stops Pain",
                    "Works Quickly", "Keeps Me Awake", "Limited Relief")

pain.pc <- prcomp(pain)
plot(pain.pc$sdev^2, type="o", pch=16)
abline(h=1, col="grey")

pain.pc$rotation[,1:2]

pain.fa <- factanal(pain, factors=2, scores="regression")
pain.fa
plot(pain.fa$scores)

varimax(pain.pc$rotation[,1:2])
promax(pain.pc$rotation[,1:2])

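# Scores on the varimax-rotated components: apply the rotation matrix to the first two PC scores
plot(pain.pc$x[,1:2] %*% varimax(pain.pc$rotation[,1:2])$rotmat)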

# Luxury Car Perceptions Data (from book)

# read.tri() is not a base R function; it is assumed here to be a course-supplied helper
# that reads a triangular correlation matrix from a file
car.cor <- read.tri("LUXURY_CAR.txt")
colnames(car.cor) <- c("Luxury", "Style", "Reliability", "Fuel Econ",
                       "Safety", "Maintenance", "Quality", "Durable", "Performance")
rownames(car.cor) <- colnames(car.cor)

car.pc <- eigen(car.cor)
car.pc$values
plot(car.pc$values, type="o", pch=16)
abline(h=1, col="grey")

factanal(covmat=car.cor, factors=2)

rownames(car.pc$vectors) <- colnames(car.cor)
car.pc$vectors[,1:2]
varimax(car.pc$vectors[,1:2])

# Back to full Psychological Test Results data

fulltest.cor <- Harman74.cor$cov
image(1:24, 1:24, fulltest.cor, zlim=c(-1,1), col=cm.colors(21))

fulltest.pc <- eigen(fulltest.cor) # Principal components analysis
fulltest.pc$values
plot(fulltest.pc$values, type="o", pch=16)
abline(h=1, col="grey")


fulltest.pc$vectors[,1:4]
varimax(fulltest.pc$vectors[,1:4])
promax(fulltest.pc$vectors[,1:4])

fulltest.fa1 <- factanal(covmat = fulltest.cor, factors=1, n.obs=145)
fulltest.fa1

update(fulltest.fa1, factors=2) # Note use of test to choose number of factors
update(fulltest.fa1, factors=3)
update(fulltest.fa1, factors=4)
update(fulltest.fa1, factors=5)
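The sequence of fits above can be collected into one table, which makes the choice of the number of factors easier to read. This is a sketch under stated assumptions: the 5% level and the names fits, fit.table and alpha are chosen here for illustration, and the models are refitted directly rather than through update() so that the block runs on its own.

```r
data(Harman74.cor)
fulltest.cor <- Harman74.cor$cov

# Fit ML factor models with 1 to 5 factors
n.factors <- 1:5
fits <- lapply(n.factors, function(k)
  factanal(covmat = fulltest.cor, factors = k, n.obs = 145))

# Chi-square statistic, degrees of freedom and p-value for each fit
fit.table <- data.frame(
  factors   = n.factors,
  statistic = sapply(fits, function(f) unname(f$STATISTIC)),
  dof       = sapply(fits, function(f) f$dof),
  p.value   = sapply(fits, function(f) unname(f$PVAL))
)
fit.table

# Smallest number of factors not rejected at level alpha (NA if all are rejected)
alpha <- 0.05
fit.table$factors[match(TRUE, fit.table$p.value > alpha)]
```

The test is only one guide; the scree plot and the interpretability of the rotated loadings remain relevant when settling on a final model.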