Exploratory Factor Analysis
0. Introduction
In the social sciences (e.g., psychology), it is often not possible to measure the variables of interest directly. Examples:
  Intelligence
  Social class
Such variables are called latent variables or common factors. Researchers study them indirectly, by measuring observable variables that are believed to be indicators of the latent variables of interest. Examples:
  Examination scores on various tests
  Occupation, education, home ownership
Such variables are called observed variables.
Goal: study the relationship between the latent variables and the observed variables.
1. Psychological Testing Data
data(Harman74.cor)
test.cor = Harman74.cor$cov[c(6, 7, 9, 10, 12), c(6, 7, 9, 10, 12)]
colnames(test.cor) = c("PARA","SENT","WORD","ADD","COUNT")
rownames(test.cor) = colnames(test.cor)
test.cor
       PARA  SENT  WORD   ADD COUNT
PARA  1.000 0.722 0.714 0.203 0.095
SENT  0.722 1.000 0.685 0.246 0.181
WORD  0.714 0.685 1.000 0.170 0.113
ADD   0.203 0.246 0.170 1.000 0.585
COUNT 0.095 0.181 0.113 0.585 1.000
image(1:5, 1:5, test.cor, zlim=c(-1,1), col=cm.colors(21))
# Plot of correlations - magenta = positive, cyan = negative
image(1:5, 1:5, test.cor, zlim=c(-1,1), col=grey(0:20/20))
# Similar, with white = +1, black = -1
1b. Principal components analysis
test.pc = eigen(test.cor)   # Principal components analysis
test.pc$values
[1] 2.5875 1.4217 0.4152 0.3111 0.2645
plot(test.pc$values, type="o", pch=16)
abline(h=1, col="grey")
test.pc$vectors[,1:2]
        [,1]    [,2]
[1,] -0.5345 -0.2449
[2,] -0.5424 -0.1641
[3,] -0.5234 -0.2470
[4,] -0.2971  0.6268
[5,] -0.2406  0.6776
# Loadings = eigenvectors scaled by the square roots of the eigenvalues
test.loadings = test.pc$vectors %*% diag(sqrt(test.pc$values))
test.loadings
           [,1]       [,2]        [,3]         [,4]        [,5]
[1,] -0.8597723 -0.2920337 -0.07368865 -0.055092837  0.40870847
[2,] -0.8725092 -0.1957119  0.03846385 -0.368390488 -0.25146294
[3,] -0.8419192 -0.2945670  0.09276139  0.411615628 -0.16238911
[4,] -0.4779188  0.7473420 -0.45579984  0.053423069 -0.04965974
[5,] -0.3869818  0.8079858  0.43809069 -0.008495525  0.07354173
test.loadings %*% t(test.loadings)   # same as the correlation matrix test.cor
      [,1]  [,2]  [,3]  [,4]  [,5]
[1,] 1.000 0.722 0.714 0.203 0.095
[2,] 0.722 1.000 0.685 0.246 0.181
[3,] 0.714 0.685 1.000 0.170 0.113
[4,] 0.203 0.246 0.170 1.000 0.585
[5,] 0.095 0.181 0.113 0.585 1.000
1c. Exploratory factor analysis – two factors
test.fa2 = factanal(covmat = test.cor, factors=2, n.obs=145)
The R function factanal() assumes that X has a multivariate normal distribution and maximizes the log-likelihood over the factor loading matrix and the uniquenesses to estimate the parameters (i.e., the maximum likelihood estimates are obtained iteratively).
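For reference, the model behind this fit can be written out as follows. This is a sketch in assumed notation (the symbols \Lambda, \Psi, S and f are not used elsewhere in these notes): p observed variables, c common factors, and S the sample correlation matrix.

```latex
% Orthogonal factor model for a p-vector x with c common factors
x = \mu + \Lambda f + \varepsilon, \qquad
f \sim N_c(0, I), \qquad
\varepsilon \sim N_p(0, \Psi), \qquad
\Psi = \operatorname{diag}(\psi_1, \dots, \psi_p)

% Implied covariance structure: loadings plus uniquenesses
\Sigma = \Lambda \Lambda^{\top} + \Psi

% Maximum likelihood discrepancy minimized over (\Lambda, \Psi)
F(\Lambda, \Psi) = \log\lvert\Sigma\rvert + \operatorname{tr}\!\left(S \Sigma^{-1}\right) - \log\lvert S\rvert - p
```

Here the rows of \Lambda hold the loadings and the diagonal of \Psi holds the uniquenesses that factanal() reports.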
test.fa2
Call:
factanal(factors = 2, covmat = test.cor, n.obs = 145)
Uniquenesses:
 PARA  SENT  WORD   ADD COUNT
0.242 0.300 0.327 0.574 0.155
Loadings:
      Factor1 Factor2
PARA  0.867
SENT  0.820   0.166
WORD  0.816
ADD   0.167   0.631
COUNT         0.918
               Factor1 Factor2
SS loadings      2.119   1.282
Proportion Var   0.424   0.256
Cumulative Var   0.424   0.680
Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 0.58 on 1 degree of freedom.
The p-value is 0.446
# df(chi sq) = df(test.cor) - df(factor 1) - df(factor 2) = 10 - 5 - 4 = 1
apply(test.fa2$loadings^2, 2, sum)   # sum of squared loadings per factor
 Factor1  Factor2
2.119212 1.281613
apply(test.fa2$loadings^2, 1, sum)   # communality of each variable
     PARA      SENT      WORD       ADD     COUNT
0.7575545 0.7002648 0.6727687 0.4256436 0.8445924
apply(test.fa2$loadings^2, 1, sum) + test.fa2$uniquenesses   # communality + uniqueness = 1
     PARA      SENT      WORD       ADD     COUNT
1.0000002 0.9999997 0.9999999 1.0000005 1.0000000
test.fa2$loadings %*% t(test.fa2$loadings) + diag(test.fa2$uniquenesses)
# approximately reproduces cor(X); the diagonal is exact up to rounding
            PARA      SENT      WORD       ADD      COUNT
PARA  1.00000018 0.7233842 0.7137061 0.1908447 0.09714164
SENT  0.72338424 0.9999997 0.6833973 0.2421572 0.18171652
WORD  0.71370606 0.6833973 0.9999999 0.1916676 0.10913971
ADD   0.19084473 0.2421572 0.1916676 1.0000005 0.58499501
COUNT 0.09714164 0.1817165 0.1091397 0.5849950 1.00000002
1d. One-factor model
test.fa1 = update(test.fa2, factors=1)   # shorthand to make minor changes to a model
test.fa1
Call:
factanal(factors = 1, covmat = test.cor, n.obs = 145)
Uniquenesses:
 PARA  SENT  WORD   ADD COUNT
0.258 0.294 0.328 0.933 0.970
Loadings:
      Factor1
PARA  0.861
SENT  0.840
WORD  0.820
ADD   0.260
COUNT 0.173
               Factor1
SS loadings      2.217
Proportion Var   0.443
Test of the hypothesis that 1 factor is sufficient.
The chi square statistic is 58.17 on 5 degrees of freedom.
The p-value is 2.91e-11
# df(chi sq) = df(test.cor) - df(factor 1) = 10 - 5 = 5
# small p-value; reject the one-factor model
Can you show that the chi-square df is actually equal to ½[(p-c)^2 - p - c], where p is the number of X variables and c is the number of factors chosen?
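As a numerical check of this formula (not a proof), the sketch below compares it with a parameter count and with the degrees of freedom printed in these notes. The helper names fa_df and fa_df_count are hypothetical and not part of R. The counting version assumes the usual bookkeeping: p(p+1)/2 distinct covariances, minus pc loadings and p uniquenesses, plus c(c-1)/2 rotation constraints.

```r
# Closed form from the question above: df = ((p - c)^2 - (p + c)) / 2
fa_df <- function(p, c) {
  ((p - c)^2 - (p + c)) / 2
}

# Counting argument: distinct covariances minus free parameters,
# where c(c-1)/2 loadings are fixed by the rotational indeterminacy
fa_df_count <- function(p, c) {
  p * (p + 1) / 2 - (p * c + p - c * (c - 1) / 2)
}

fa_df(5, 1:2)    # five psychological tests: 5 and 1
fa_df(6, 1:3)    # artificial data: 9, 4 and 0
fa_df(8, 1:3)    # girls data: 20, 13 and 7
fa_df(24, 1:5)   # full test battery: 252, 229, 207, 186 and 166
```

The two functions agree algebraically, and the values match the degrees of freedom that factanal() prints throughout these notes.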
Often a sequential testing procedure is used: start with one factor and increase the number of factors one at a time until the test no longer rejects the null hypothesis. It can happen that the test always rejects the null hypothesis. This indicates that the model does not fit well, or that the sample size is so large that even small departures from the model are detected. At other times the sequential chi-square test tends to overestimate the number of factors needed for a successful interpretation.
An alternative approach, which is more computationally intensive, is to perform factor analyses with various values of c, complete with rotation, and choose the smallest c that gives the most appealing structure.
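The sequential procedure can be sketched as a loop over the number of factors. This is a minimal illustration on the full 24-test battery (Harman74.cor ships with R), assuming a 5% significance level and an arbitrary upper limit of 8 factors. The names alpha, max_factors and chosen are illustrative only.

```r
data(Harman74.cor)
alpha <- 0.05          # assumed significance level
max_factors <- 8       # assumed upper limit for the search
chosen <- NA

for (k in 1:max_factors) {
  fit <- factanal(covmat = Harman74.cor$cov, factors = k, n.obs = 145)
  cat(sprintf("%d factor(s): chi-square = %.1f on %.0f df, p = %.3g\n",
              k, fit$STATISTIC, fit$dof, fit$PVAL))
  if (fit$PVAL > alpha) {   # first model that is not rejected
    chosen <- k
    break
  }
}
chosen
```

With the p-values reported for this battery in Section 5 of these notes, the loop should stop at five factors.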
2. Artificial Data (From R factanal help page)
A little demonstration: v2 is just v1 with noise, and the same holds for v4 vs. v3 and v6 vs. v5.
v1 = c(1,1,1,1,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
v2 = c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
v3 = c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
v4 = c(3,3,4,3,3,1,1,2,1,1,1,1,2,1,1,5,6,4)
v5 = c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
v6 = c(1,1,1,2,1,3,3,3,4,3,1,1,1,2,1,6,5,4)
m1 = cbind(v1,v2,v3,v4,v5,v6)
pairs(m1)
pairs(m1 + runif(6*18, -.3, .3))   # "jittering" to break ties
2a. Principal components
m1.pc = prcomp(m1)
plot(m1.pc$sdev^2, type="o", pch=16)
abline(h=1, col="grey")
pairs(m1.pc$x[,1:3])
m1.pc$rotation[,1:3]
      PC1      PC2     PC3
v1 0.4168 -0.52292  0.2354
v2 0.3886 -0.50888  0.2986
v3 0.4183  0.01522 -0.5555
v4 0.3944  0.02184 -0.5986
v5 0.4254  0.47017  0.2923
v6 0.4048  0.49581  0.3210
2b. Factor analysis
m1.fa1 = factanal(m1, factors=1)
m1.fa1
Call:
factanal(x = m1, factors = 1)
Uniquenesses:
   v1    v2    v3    v4    v5    v6
0.773 0.792 0.733 0.795 0.022 0.085
Loadings:
   Factor1
v1 0.476
v2 0.456
v3 0.517
v4 0.453
v5 0.989
v6 0.956
               Factor1
SS loadings      2.800
Proportion Var   0.467
Test of the hypothesis that 1 factor is sufficient.
The chi square statistic is 53.43 on 9 degrees of freedom.
The p-value is 2.43e-08
m1.fa2 = factanal(m1, factors=2, rotation="none")
m1.fa2
Call:
factanal(x = m1, factors = 2, rotation = "none")
Uniquenesses:
   v1    v2    v3    v4    v5    v6
0.005 0.114 0.642 0.742 0.005 0.097
Loadings:
   Factor1 Factor2
v1  0.853  -0.518
v2  0.804  -0.490
v3  0.598
v4  0.508
v5  0.857   0.510
v6  0.796   0.519
               Factor1 Factor2
SS loadings      3.358   1.038
Proportion Var   0.560   0.173
Cumulative Var   0.560   0.733
Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 23.14 on 4 degrees of freedom.
The p-value is 0.000119
m1.fa3 = factanal(m1, factors=3, rotation="none")
m1.fa3
Call:
factanal(x = m1, factors = 3, rotation = "none")
Uniquenesses:
   v1    v2    v3    v4    v5    v6
0.005 0.101 0.005 0.224 0.084 0.005
Loadings:
   Factor1 Factor2 Factor3
v1  0.808  -0.385   0.440
v2  0.752  -0.290   0.500
v3  0.813  -0.229  -0.530
v4  0.729  -0.139  -0.474
v5  0.802   0.521
v6  0.764   0.636
               Factor1 Factor2 Factor3
SS loadings      3.638   0.980   0.957
Proportion Var   0.606   0.163   0.159
Cumulative Var   0.606   0.770   0.929
The degrees of freedom for the model is 0 and the fit was 0.4755
m1.fa3a = factanal(m1, factors=3, rotation="varimax", scores="regression")   # varimax is the default rotation
m1.fa3a   # Note improved interpretation of loadings
Call:
factanal(x = m1, factors = 3, scores = "regression", rotation = "varimax")
Uniquenesses:
   v1    v2    v3    v4    v5    v6
0.005 0.101 0.005 0.224 0.084 0.005
Loadings:
   Factor1 Factor2 Factor3
v1 0.944   0.182   0.267
v2 0.905   0.235   0.159
v3 0.236   0.210   0.946
v4 0.180   0.242   0.828
v5 0.242   0.881   0.286
v6 0.193   0.959   0.196
               Factor1 Factor2 Factor3
SS loadings      1.893   1.886   1.797
Proportion Var   0.316   0.314   0.300
Cumulative Var   0.316   0.630   0.929
The degrees of freedom for the model is 0 and the fit was 0.4755
pairs(m1.fa3a$scores)
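The scores requested with scores="regression" can also be recomputed by hand. The sketch below assumes that, for an orthogonal rotation such as varimax, these scores follow Thomson's regression formula Z R^(-1) L, where Z holds the standardized data, R is the sample correlation matrix and L is the rotated loading matrix. The names fa, Z, R, L and manual are illustrative only.

```r
# Artificial data from the factanal() help page, as above
v1 <- c(1,1,1,1,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
v2 <- c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
v3 <- c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
v4 <- c(3,3,4,3,3,1,1,2,1,1,1,1,2,1,1,5,6,4)
v5 <- c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
v6 <- c(1,1,1,2,1,3,3,3,4,3,1,1,1,2,1,6,5,4)
m1 <- cbind(v1, v2, v3, v4, v5, v6)

fa <- factanal(m1, factors = 3, rotation = "varimax", scores = "regression")

Z <- scale(m1)              # standardized observations (18 x 6)
R <- cor(m1)                # sample correlation matrix (6 x 6)
L <- unclass(fa$loadings)   # rotated loadings as a plain matrix (6 x 3)

manual <- Z %*% solve(R, L) # regression scores: Z R^{-1} L

max(abs(manual - fa$scores))   # should be numerically negligible
```

If the assumption holds, the largest absolute difference is at the level of floating-point error, which shows that the scores are a linear function of the standardized data.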
3. Girls physical measurements data
Correlation matrix of 8 physical measurements on 305 girls between ages 7 and 17
data(Harman23.cor)
girls.cor = Harman23.cor$cov
girls.cor
               height arm.span forearm lower.leg weight bitro.diameter
height          1.000    0.846   0.805     0.859  0.473          0.398
arm.span        0.846    1.000   0.881     0.826  0.376          0.326
forearm         0.805    0.881   1.000     0.801  0.380          0.319
lower.leg       0.859    0.826   0.801     1.000  0.436          0.329
weight          0.473    0.376   0.380     0.436  1.000          0.762
bitro.diameter  0.398    0.326   0.319     0.329  0.762          1.000
chest.girth     0.301    0.277   0.237     0.327  0.730          0.583
chest.width     0.382    0.415   0.345     0.365  0.629          0.577
               chest.girth chest.width
height               0.301       0.382
arm.span             0.277       0.415
forearm              0.237       0.345
lower.leg            0.327       0.365
weight               0.730       0.629
bitro.diameter       0.583       0.577
chest.girth          1.000       0.539
chest.width          0.539       1.000
image(1:8, 1:8, girls.cor, zlim=c(-1,1), col=cm.colors(21))
girls.pc = eigen(girls.cor)   # Principal components analysis
girls.pc$values
[1] 4.67288 1.77098 0.48104 0.42144 0.23322 0.18667 0.13730 0.09646
plot(girls.pc$values, type="o", pch=16)
abline(h=1, col="grey")
rownames(girls.pc$vectors) = colnames(girls.cor)
girls.pc$vectors[,1:2]
                  [,1]    [,2]
height         -0.3976 -0.2797
arm.span       -0.3893 -0.3314
forearm        -0.3762 -0.3446
lower.leg      -0.3884 -0.2971
weight         -0.3507  0.3942
bitro.diameter -0.3119  0.4007
chest.girth    -0.2855  0.4359
chest.width    -0.3102  0.3144
# Rotate PCs by varimax()
girls.pc.loadings = girls.pc$vectors %*% diag(sqrt(girls.pc$values))
summary(as.vector(girls.pc.loadings %*% t(girls.pc.loadings) - girls.cor))
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max.
-1.443e-15 -7.772e-16 -5.551e-16 -6.098e-16 -4.441e-16  0.000e+00
varimax(girls.pc.loadings[,c(1:2)])
$loadings
Loadings:
               [,1]   [,2]
height         -0.902 0.252
arm.span       -0.932 0.187
forearm        -0.920 0.156
lower.leg      -0.901 0.222
weight         -0.258 0.885
bitro.diameter -0.188 0.839
chest.girth    -0.114 0.839
chest.width    -0.257 0.747
                [,1]  [,2]
SS loadings    3.522 2.922
Proportion Var 0.440 0.365
Cumulative Var 0.440 0.805
$rotmat
          [,1]       [,2]
[1,] 0.7768362 -0.6297027
[2,] 0.6297027  0.7768362
Note that the rotation redistributes, but does not change, the total variance explained by the first two components:
3.522+2.922
[1] 6.444
cumsum(girls.pc$values)
[1] 4.672880 6.443862 6.924898 7.346339 7.579560 7.766233 7.903537 8.000000
girls.fa1 = factanal(covmat=girls.cor, factors=1, n.obs=305)
girls.fa1
Call:
factanal(factors = 1, covmat = girls.cor, n.obs = 305)
Uniquenesses:
        height       arm.span        forearm      lower.leg         weight
         0.158          0.135          0.190          0.187          0.760
bitro.diameter    chest.girth    chest.width
         0.829          0.877          0.801
Loadings:
               Factor1
height         0.918
arm.span       0.930
forearm        0.900
lower.leg      0.902
weight         0.490
bitro.diameter 0.413
chest.girth    0.351
chest.width    0.446
               Factor1
SS loadings      4.064
Proportion Var   0.508
Test of the hypothesis that 1 factor is sufficient.
The chi square statistic is 611.4 on 20 degrees of freedom.
The p-value is 1.12e-116
girls.fa2 = factanal(covmat=girls.cor, factors=2, n.obs=305, rotation="none")
girls.fa2
Call:
factanal(factors = 2, covmat = girls.cor, n.obs = 305, rotation = "none")
Uniquenesses:
        height       arm.span        forearm      lower.leg         weight
         0.170          0.107          0.166          0.199          0.089
bitro.diameter    chest.girth    chest.width
         0.364          0.416          0.537
Loadings:
               Factor1 Factor2
height          0.880  -0.237
arm.span        0.874  -0.360
forearm         0.846  -0.344
lower.leg       0.855  -0.263
weight          0.705   0.644
bitro.diameter  0.589   0.538
chest.girth     0.526   0.554
chest.width     0.574   0.365
               Factor1 Factor2
SS loadings      4.434   1.518
Proportion Var   0.554   0.190
Cumulative Var   0.554   0.744
Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 75.74 on 13 degrees of freedom.
The p-value is 6.94e-11
girls.fa2a = factanal(covmat=girls.cor, factors=2, n.obs=305)   # Varimax rotation (the default)
girls.fa2a
Call:
factanal(factors = 2, covmat = girls.cor, n.obs = 305)
Uniquenesses:
        height       arm.span        forearm      lower.leg         weight
         0.170          0.107          0.166          0.199          0.089
bitro.diameter    chest.girth    chest.width
         0.364          0.416          0.537
Loadings:
               Factor1 Factor2
height         0.865   0.287
arm.span       0.927   0.181
forearm        0.895   0.179
lower.leg      0.859   0.252
weight         0.233   0.925
bitro.diameter 0.194   0.774
chest.girth    0.134   0.752
chest.width    0.278   0.621
               Factor1 Factor2
SS loadings      3.335   2.617
Proportion Var   0.417   0.327
Cumulative Var   0.417   0.744
Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 75.74 on 13 degrees of freedom.
The p-value is 6.94e-11
# arrows() adds to an existing plot, so set up empty axes for the loading space first
plot(0, 0, xlim=c(-1,1), ylim=c(-1,1), type='n', xlab="Factor 1 Loadings", ylab="Factor 2 Loadings")
abline(h=0, col='grey')
abline(v=0, col='grey')
arrows(0, 0, girls.fa2a$loadings[,1], girls.fa2a$loadings[,2], col="red")
# identify() is interactive: click near an arrow head to label it
identify(girls.fa2a$loadings[,1], girls.fa2a$loadings[,2], rownames(girls.fa2$loadings), col="red")
girls.fa2b = factanal(covmat=girls.cor, factors=2, n.obs=305, rotation="promax")   # Promax rotation
girls.fa2b
Call:
factanal(factors = 2, covmat = girls.cor, n.obs = 305, rotation = "promax")
Uniquenesses:
        height       arm.span        forearm      lower.leg         weight
         0.170          0.107          0.166          0.199          0.089
bitro.diameter    chest.girth    chest.width
         0.364          0.416          0.537
Loadings:
               Factor1 Factor2
height         0.872
arm.span       0.973
forearm        0.938
lower.leg      0.876
weight                 0.961
bitro.diameter         0.803
chest.girth            0.796
chest.width    0.125   0.611
               Factor1 Factor2
SS loadings      3.375   2.589
Proportion Var   0.422   0.324
Cumulative Var   0.422   0.745
Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 75.74 on 13 degrees of freedom.
The p-value is 6.94e-11
arrows(0, 0, girls.fa2b$loadings[,1], girls.fa2b$loadings[,2], col="blue")
identify(girls.fa2b$loadings[,1], girls.fa2b$loadings[,2], rownames(girls.fa2$loadings), col="blue")
girls.fa3 = factanal(covmat=girls.cor, factors=3, n.obs=305)
girls.fa3
Call:
factanal(factors = 3, covmat = girls.cor, n.obs = 305)
Uniquenesses:
        height       arm.span        forearm      lower.leg         weight
         0.127          0.005          0.193          0.157          0.090
bitro.diameter    chest.girth    chest.width
         0.359          0.411          0.490
Loadings:
               Factor1 Factor2 Factor3
height          0.886   0.267  -0.130
arm.span        0.937   0.195   0.280
forearm         0.874   0.188
lower.leg       0.877   0.230  -0.145
weight          0.242   0.916  -0.106
bitro.diameter  0.193   0.777
chest.girth     0.137   0.755
chest.width     0.261   0.646   0.159
               Factor1 Factor2 Factor3
SS loadings      3.379   2.628   0.162
Proportion Var   0.422   0.329   0.020
Cumulative Var   0.422   0.751   0.771
Test of the hypothesis that 3 factors are sufficient.
The chi square statistic is 22.81 on 7 degrees of freedom.
The p-value is 0.00184
Note that although the three-factor model fits much better than the two-factor model (p = 0.00184 versus 6.94e-11), it is still rejected at the 5% level, and the third factor contributes far less to the total variance (about 2%) than the first two do.
3. Pain Reliever Perceptions Data (from book)
pain = read.table("http://www.uidaho.edu/~stevel/519/Data/PAIN_RELIEF.txt")
colnames(pain) = c("No Upset Stomach", "No Side Effects", "Stops Pain",
                   "Works Quickly", "Keeps Me Awake", "Limited Relief")
pain.pc = prcomp(pain, scale=TRUE)
plot(pain.pc$sdev^2, type="o", pch=16)
abline(h=1, col="grey")
pain.pc$rotation[,1:2]
                     PC1     PC2
No Upset Stomach  0.4316 -0.3595
No Side Effects   0.3808 -0.4442
Stops Pain        0.4536  0.3546
Works Quickly     0.3828  0.4407
Keeps Me Awake   -0.3516  0.4699
Limited Relief   -0.4392 -0.3642
pain.fa = factanal(pain, factors=2, scores="reg")
pain.fa
Call:
factanal(x = pain, factors = 2, scores = "reg")
Uniquenesses:
No Upset Stomach  No Side Effects       Stops Pain    Works Quickly
           0.434            0.344            0.346            0.365
  Keeps Me Awake   Limited Relief
           0.365            0.392
Loadings:
                 Factor1 Factor2
No Upset Stomach  0.136   0.740
No Side Effects           0.810
Stops Pain        0.802   0.105
Works Quickly     0.795
Keeps Me Awake           -0.796
Limited Relief   -0.776
               Factor1 Factor2
SS loadings      1.898   1.857
Proportion Var   0.316   0.309
Cumulative Var   0.316   0.626
Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 3.29 on 4 degrees of freedom.
The p-value is 0.511
plot(pain.fa$scores)
4. Luxury Car Perceptions (from book)
source("readTri.txt")   # defines read.tri()
car.cor = read.tri("LUXURY_CAR.txt")
colnames(car.cor) = c("Luxury", "Style", "Reliability", "Fuel Econ",
                      "Safety", "Maintenance", "Quality", "Durable", "Performance")
rownames(car.cor) = colnames(car.cor)
car.pc = eigen(car.cor)
car.pc$values
[1] 4.1640 1.5400 0.6857 0.5848 0.5152 0.4781 0.3736 0.3508 0.3077
plot(car.pc$values, type="o", pch=16)
abline(h=1, col="grey")
factanal(covmat=car.cor, factors=2, n.obs=162)
Call:
factanal(factors = 2, covmat = car.cor, n.obs = 162)
Uniquenesses:
     Luxury       Style Reliability   Fuel Econ      Safety
      0.164       0.546       0.440       0.625       0.573
Maintenance     Quality     Durable Performance
      0.560       0.359       0.451       0.469
Loadings:
            Factor1 Factor2
Luxury       0.914
Style        0.644   0.198
Reliability  0.387   0.640
Fuel Econ   -0.101   0.604
Safety       0.620   0.204
Maintenance  0.175   0.640
Quality      0.454   0.659
Durable      0.335   0.661
Performance  0.588   0.430
               Factor1 Factor2
SS loadings      2.491   2.322
Proportion Var   0.277   0.258
Cumulative Var   0.277   0.535
Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 19.6 on 19 degrees of freedom.
The p-value is 0.419
rownames(car.pc$vectors) = colnames(car.cor)
car.pc$vectors[,1:2]
               [,1]    [,2]
Luxury      -0.3125  0.4937
Style       -0.3198  0.3197
Reliability -0.3726 -0.1909
Fuel Econ   -0.1894 -0.5748
Safety      -0.3158  0.2930
Maintenance -0.3058 -0.3648
Quality     -0.3968 -0.1092
Durable     -0.3644 -0.1883
Performance -0.3768  0.1446
5. Full Psychological Test Battery
fulltest.cor = Harman74.cor$cov
image(1:24, 1:24, fulltest.cor, zlim=c(-1,1), col=cm.colors(21))
fulltest.pc = eigen(fulltest.cor)   # Principal components analysis
fulltest.pc$values
 [1] 8.1354 2.0960 1.6926 1.5018 1.0252 0.9429 0.9012 0.8159 0.7902 0.7069
[11] 0.6394 0.5433 0.5330 0.5094 0.4775 0.3897 0.3820 0.3404 0.3338 0.3158
[21] 0.2972 0.2681 0.1897 0.1725
plot(fulltest.pc$values, type="o", pch=16)
abline(h=1, col="grey")
rownames(fulltest.pc$vectors) = rownames(fulltest.cor)
fulltest.pc$vectors[,1:4]
                          [,1]      [,2]     [,3]     [,4]
VisualPerception       -0.2159 -0.003764 -0.32875  0.16685
Cubes                  -0.1401 -0.054849 -0.30751  0.16446
PaperFormBoard         -0.1559 -0.131808 -0.36614  0.08630
Flags                  -0.1790 -0.122936 -0.25729  0.17621
GeneralInformation     -0.2436 -0.221950  0.25774  0.04309
PargraphComprehension  -0.2421 -0.288461  0.20378 -0.06579
SentenceCompletion     -0.2373 -0.293489  0.27319  0.05917
WordClassification     -0.2434 -0.167723  0.11036  0.09493
WordMeaning            -0.2434 -0.311224  0.22351 -0.06499
Addition               -0.1662  0.374211  0.34305  0.16446
Code                   -0.2020  0.299629  0.16148 -0.02757
CountingDots           -0.1691  0.379149  0.09735  0.27775
StraightCurvedCapitals -0.2167  0.192742 -0.02724  0.29855
WordRecognition        -0.1570  0.063949  0.04238 -0.45320
NumberRecognition      -0.1457  0.098312 -0.06024 -0.42914
FigureRecognition      -0.1871  0.062819 -0.30098 -0.26697
ObjectNumber           -0.1710  0.190302  0.04029 -0.38276
NumberFigure           -0.1906  0.266841 -0.15243 -0.12392
FigureWord             -0.1667  0.095217 -0.09373 -0.15777
Deduction              -0.2254 -0.128716 -0.10154 -0.05720
NumericalPuzzles       -0.2179  0.160439 -0.07670  0.16458
ProblemReasoning       -0.2242 -0.100721 -0.08481 -0.04537
SeriesCompletion       -0.2495 -0.072448 -0.11493  0.08402
ArithmeticProblems     -0.2358  0.135235  0.17931  0.05043
fulltest.fa1 = factanal(covmat = fulltest.cor, factors=1, n.obs=145)
fulltest.fa1
Call:
factanal(factors = 1, covmat = fulltest.cor, n.obs = 145)
Uniquenesses:
      VisualPerception                  Cubes         PaperFormBoard
                 0.677                  0.866                  0.830
                 Flags     GeneralInformation  PargraphComprehension
                 0.768                  0.487                  0.491
    SentenceCompletion     WordClassification            WordMeaning
                 0.500                  0.514                  0.474
              Addition                   Code           CountingDots
                 0.818                  0.731                  0.824
StraightCurvedCapitals        WordRecognition      NumberRecognition
                 0.681                  0.833                  0.863
     FigureRecognition           ObjectNumber           NumberFigure
                 0.775                  0.812                  0.778
            FigureWord              Deduction       NumericalPuzzles
                 0.816                  0.612                  0.676
      ProblemReasoning       SeriesCompletion     ArithmeticProblems
                 0.619                  0.524                  0.593
Loadings:
                       Factor1
VisualPerception       0.569
Cubes                  0.366
PaperFormBoard         0.412
Flags                  0.482
GeneralInformation     0.716
PargraphComprehension  0.713
SentenceCompletion     0.707
WordClassification     0.697
WordMeaning            0.725
Addition               0.426
Code                   0.519
CountingDots           0.419
StraightCurvedCapitals 0.565
WordRecognition        0.408
NumberRecognition      0.370
FigureRecognition      0.474
ObjectNumber           0.434
NumberFigure           0.471
FigureWord             0.429
Deduction              0.623
NumericalPuzzles       0.569
ProblemReasoning       0.617
SeriesCompletion       0.690
ArithmeticProblems     0.638
               Factor1
SS loadings      7.438
Proportion Var   0.310
Test of the hypothesis that 1 factor is sufficient.
The chi square statistic is 622.9 on 252 degrees of freedom.
The p-value is 2.28e-33
update(fulltest.fa1, factors=2)
Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 420.2 on 229 degrees of freedom.
The p-value is 2.01e-13
update(fulltest.fa1, factors=3)
Test of the hypothesis that 3 factors are sufficient.
The chi square statistic is 295.6 on 207 degrees of freedom.
The p-value is 0.0000512
update(fulltest.fa1, factors=4)
Call:
factanal(factors = 4, covmat = fulltest.cor, n.obs = 145)
Uniquenesses:
      VisualPerception                  Cubes         PaperFormBoard
                 0.438                  0.780                  0.644
                 Flags     GeneralInformation  PargraphComprehension
                 0.651                  0.352                  0.312
    SentenceCompletion     WordClassification            WordMeaning
                 0.283                  0.485                  0.257
              Addition                   Code           CountingDots
                 0.240                  0.551                  0.435
StraightCurvedCapitals        WordRecognition      NumberRecognition
                 0.491                  0.646                  0.696
     FigureRecognition           ObjectNumber           NumberFigure
                 0.549                  0.598                  0.593
            FigureWord              Deduction       NumericalPuzzles
                 0.762                  0.592                  0.583
      ProblemReasoning       SeriesCompletion     ArithmeticProblems
                 0.601                  0.497                  0.500
Loadings: Factor1 Factor2 Factor3 Factor4VisualPerception 0.160 0.689 0.187 0.160 Cubes 0.117 0.436 PaperFormBoard 0.137 0.570 0.110 Flags 0.233 0.527 GeneralInformation 0.739 0.185 0.213 0.150 PargraphComprehension 0.767 0.205 0.233 SentenceCompletion 0.806 0.197 0.153
WordClassification 0.569 0.339 0.242 0.132 WordMeaning 0.806 0.201 0.227 Addition 0.167 -0.118 0.831 0.166 Code 0.180 0.120 0.512 0.374 CountingDots 0.210 0.716 StraightCurvedCapitals 0.188 0.438 0.525 WordRecognition 0.197 0.553 NumberRecognition 0.122 0.116 0.520 FigureRecognition 0.408 0.525 ObjectNumber 0.142 0.219 0.574 NumberFigure 0.293 0.336 0.456 FigureWord 0.148 0.239 0.161 0.365 Deduction 0.378 0.402 0.118 0.301 NumericalPuzzles 0.175 0.381 0.438 0.223 ProblemReasoning 0.366 0.399 0.123 0.301 SeriesCompletion 0.369 0.500 0.244 0.239 ArithmeticProblems 0.370 0.158 0.496 0.304
               Factor1 Factor2 Factor3 Factor4
SS loadings      3.647   2.872   2.657   2.290
Proportion Var   0.152   0.120   0.111   0.095
Cumulative Var   0.152   0.272   0.382   0.478
Test of the hypothesis that 4 factors are sufficient.
The chi square statistic is 226.7 on 186 degrees of freedom.
The p-value is 0.0224
update(fulltest.fa1, factors=5)
Call:
factanal(factors = 5, covmat = fulltest.cor, n.obs = 145)
Uniquenesses:
      VisualPerception                  Cubes         PaperFormBoard
                 0.450                  0.781                  0.639
                 Flags     GeneralInformation  PargraphComprehension
                 0.649                  0.357                  0.288
    SentenceCompletion     WordClassification            WordMeaning
                 0.277                  0.485                  0.262
              Addition                   Code           CountingDots
                 0.215                  0.386                  0.444
StraightCurvedCapitals        WordRecognition      NumberRecognition
                 0.256                  0.639                  0.706
     FigureRecognition           ObjectNumber           NumberFigure
                 0.550                  0.614                  0.596
            FigureWord              Deduction       NumericalPuzzles
                 0.764                  0.521                  0.564
      ProblemReasoning       SeriesCompletion     ArithmeticProblems
                 0.580                  0.442                  0.478
Loadings: Factor1 Factor2 Factor3 Factor4 Factor5VisualPerception 0.161 0.658 0.136 0.182 0.199 Cubes 0.113 0.435 0.107 PaperFormBoard 0.135 0.562 0.107 0.116
Flags 0.231 0.533 GeneralInformation 0.736 0.188 0.192 0.162 PargraphComprehension 0.775 0.187 0.251 0.113 SentenceCompletion 0.809 0.208 0.136 WordClassification 0.568 0.348 0.223 0.131 WordMeaning 0.800 0.215 0.224 Addition 0.175 -0.100 0.844 0.176 Code 0.185 0.438 0.451 0.426 CountingDots 0.222 0.690 0.101 0.140 StraightCurvedCapitals 0.186 0.425 0.458 0.559 WordRecognition 0.197 0.557 NumberRecognition 0.121 0.130 0.508 FigureRecognition 0.400 0.529 ObjectNumber 0.145 0.208 0.562 NumberFigure 0.306 0.325 0.452 FigureWord 0.147 0.242 0.145 0.364 Deduction 0.370 0.452 0.139 0.287 -0.190 NumericalPuzzles 0.170 0.402 0.439 0.230 ProblemReasoning 0.358 0.423 0.126 0.302 SeriesCompletion 0.360 0.549 0.256 0.223 -0.107 ArithmeticProblems 0.371 0.185 0.502 0.307
               Factor1 Factor2 Factor3 Factor4 Factor5
SS loadings      3.632   2.964   2.456   2.345   0.663
Proportion Var   0.151   0.124   0.102   0.098   0.028
Cumulative Var   0.151   0.275   0.377   0.475   0.503
Test of the hypothesis that 5 factors are sufficient.
The chi square statistic is 186.8 on 166 degrees of freedom.
The p-value is 0.128
6. Factor analysis vs. PCA
Similarities
- Both methods are mostly used in EDA (exploratory data analysis).
- Both methods aim at dimension reduction: explaining a data set with a smaller number of new variables.
- Neither method works well if the observed variables are almost uncorrelated:
  o PCA then returns components that are similar to the original variables.
  o Factor analysis then has nothing to explain, i.e. the uniquenesses will all be close to 1.
- Both methods give similar results if the specific variances are small.
- If the specific variances are assumed to be zero in principal factor analysis, then PCA and factor analysis are the same.
- Neither PCA nor FA requires a normality assumption in general; the exception, ML-based factor analysis, is noted under Differences.
Differences
- PCA requires virtually no assumptions. Factor analysis assumes that the data come from a specific model structure. A normality assumption is needed in FA for the chi-square test and the ML estimates; the principal factor estimation procedure does not require normality.
- In PCA, the emphasis is on transforming the observed variables to principal components. In factor analysis, the emphasis is on the transformation from factors to observed variables.
- PCA is not scale invariant. Factor analysis (with MLE) is scale invariant.
- In PCA, considering c + 1 instead of c components does not change the first c components. In factor analysis, considering c + 1 instead of c factors may change the first c factors (when using the ML method).
- Calculation of PCA scores is straightforward. Calculation of factor scores is more involved.
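The scale-invariance difference can be illustrated on the artificial data from Section 2. The sketch below changes the units of v1 by a factor of 100 (an arbitrary choice) and compares both methods before and after. The names m1_scaled, pc_a, pc_b, fa_a and fa_b are illustrative only. One caveat: factanal() standardizes its input and works from the correlation matrix, so the invariance seen here holds by construction in this implementation.

```r
# Artificial data from the factanal() help page
v1 <- c(1,1,1,1,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
v2 <- c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
v3 <- c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
v4 <- c(3,3,4,3,3,1,1,2,1,1,1,1,2,1,1,5,6,4)
v5 <- c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
v6 <- c(1,1,1,2,1,3,3,3,4,3,1,1,1,2,1,6,5,4)
m1 <- cbind(v1, v2, v3, v4, v5, v6)

m1_scaled <- m1
m1_scaled[, "v1"] <- 100 * m1_scaled[, "v1"]   # change the units of v1

# PCA on the covariance matrix: the first component depends on the units
pc_a <- prcomp(m1)$rotation[, 1]
pc_b <- prcomp(m1_scaled)$rotation[, 1]
round(rbind(original = pc_a, rescaled = pc_b), 3)

# ML factor analysis: the uniquenesses are unaffected by the rescaling
fa_a <- factanal(m1, factors = 2)
fa_b <- factanal(m1_scaled, factors = 2)
round(rbind(original = fa_a$uniquenesses, rescaled = fa_b$uniquenesses), 3)
```

After rescaling, the first principal component is dominated by v1, while the factor analysis results stay the same up to numerical tolerance. Calling prcomp() with scale=TRUE would remove the dependence on units for PCA as well.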
# Exploratory Factor Analysis
# Psychological Test Results Data (from book)
data(Harman74.cor)
test.cor <- Harman74.cor$cov[c(6, 7, 9, 10, 12), c(6, 7, 9, 10, 12)]
colnames(test.cor) <- c("PARA","SENT","WORD","ADD","COUNT")
rownames(test.cor) <- colnames(test.cor)
test.cor
image(1:5, 1:5, test.cor, zlim=c(-1,1), col=cm.colors(21))
# Plot of correlations - magenta = positive, cyan = negative
image(1:5, 1:5, test.cor, zlim=c(-1,1), col=grey(0:20/20))
# Similar, with white = +1, black = -1.
test.pc <- eigen(test.cor)   # Principal components analysis
test.pc$values
plot(test.pc$values, type="o", pch=16)
abline(h=1, col="grey")
test.pc$vectors[,1:2]
test.fa2 <- factanal(covmat = test.cor, factors=2, n.obs=145)
test.fa2
test.fa1 <- update(test.fa2, factors=1)   # shorthand to make minor changes to a model
test.fa1
# Note p-value for one-factor model
# Artificial Data - From R factanal( ) help pages
# A little demonstration, v2 is just v1 with noise,
# and same for v4 vs. v3 and v6 vs. v5
v1 <- c(1,1,1,1,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
v2 <- c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
v3 <- c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
v4 <- c(3,3,4,3,3,1,1,2,1,1,1,1,2,1,1,5,6,4)
v5 <- c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
v6 <- c(1,1,1,2,1,3,3,3,4,3,1,1,1,2,1,6,5,4)
m1 <- cbind(v1,v2,v3,v4,v5,v6)
pairs(m1)
pairs(m1 + runif(6*18, -.3, .3))   # "jittering" to break ties
m1.pc <- prcomp(m1)
plot(m1.pc$sdev^2, type="o", pch=16)
abline(h=1, col="grey")
pairs(m1.pc$x[,1:3])
m1.pc$rotation[,1:3]
m1.fa1 <- factanal(m1, factors=1)
m1.fa1
m1.fa2 <- factanal(m1, factors=2, rotation="none")
m1.fa2
m1.fa3 <- factanal(m1, factors=3, rotation="none")
m1.fa3
m1.fa3a <- factanal(m1, factors=3, rotation="varimax", scores="regression")   # default rotation
m1.fa3a   # Note improved interpretation of loadings
pairs(m1.fa3a$scores)
# Girls physical measurements data
data(Harman23.cor)
girls.cor <- Harman23.cor$cov   # Correlation matrix of 8 physical measurements on 305 girls between 7 - 17
girls.cor
image(1:8, 1:8, girls.cor, zlim=c(-1,1), col=cm.colors(21))
girls.pc <- eigen(girls.cor)   # Principal components analysis
girls.pc$values
plot(girls.pc$values, type="o", pch=16)
abline(h=1, col="grey")
rownames(girls.pc$vectors) <- colnames(girls.cor)
girls.pc$vectors[,1:2]
girls.fa1 <- factanal(covmat=girls.cor, factors=1, n.obs=305)
girls.fa1
girls.fa2 <- factanal(covmat=girls.cor, factors=2, n.obs=305, rotation="none")
girls.fa2
plot(0, 0, xlim=c(-1,1), ylim=c(-1,1), type='n', xlab="Factor 1 Loadings", ylab="Factor 2 Loadings")
abline(h=0, col='grey')
abline(v=0, col='grey')
arrows(0, 0, girls.fa2$loadings[,1], girls.fa2$loadings[,2])
text(girls.fa2$loadings[,1], girls.fa2$loadings[,2], rownames(girls.fa2$loadings))
girls.fa2a <- factanal(covmat=girls.cor, factors=2, n.obs=305)   # Varimax rotation
girls.fa2a
arrows(0, 0, girls.fa2a$loadings[,1], girls.fa2a$loadings[,2], col="red")
text(girls.fa2a$loadings[,1], girls.fa2a$loadings[,2], rownames(girls.fa2$loadings), col="red")
girls.fa2b <- factanal(covmat=girls.cor, factors=2, n.obs=305, rotation="promax")   # Promax rotation
girls.fa2b
arrows(0, 0, girls.fa2b$loadings[,1], girls.fa2b$loadings[,2], col="blue")
text(girls.fa2b$loadings[,1], girls.fa2b$loadings[,2], rownames(girls.fa2$loadings), col="blue")
girls.fa3 <- factanal(covmat=girls.cor, factors=3, n.obs=305)
girls.fa3   # The third factor contributes far less information than the first two
varimax(girls.pc$vectors[,1:2])   # Varimax rotation may be used for PCA loadings too
promax(girls.pc$vectors[,1:2])    # As may promax
girls2.cor <- girls.cor[1:5,1:5]   # Drop weight-related variables except weight itself
girls2.fa2 <- factanal(covmat=girls2.cor, factors=2, n.obs=305)
girls2.fa2   # Weight has high uniqueness, but not its own factor
# Pain Reliever Perceptions Data (from book)
pain <- read.table("PAIN_RELIEF.txt")
colnames(pain) <- c("No Upset Stomach", "No Side Effects", "Stops Pain",
                    "Works Quickly", "Keeps Me Awake", "Limited Relief")
pain.pc <- prcomp(pain)
plot(pain.pc$sdev^2, type="o", pch=16)
abline(h=1, col="grey")
pain.pc$rotation[,1:2]
pain.fa <- factanal(pain, factors=2, scores="reg")
pain.fa
plot(pain.fa$scores)
varimax(pain.pc$rotation[,1:2])
promax(pain.pc$rotation[,1:2])
plot(pain.pc$x %*% varimax(pain.pc$rotation[,1:2])$loadings)
# Luxury Car Perceptions Data (from book)
source("readTri.txt")   # defines read.tri()
car.cor <- read.tri("LUXURY_CAR.txt")
colnames(car.cor) <- c("Luxury", "Style", "Reliability", "Fuel Econ",
                       "Safety", "Maintenance", "Quality", "Durable", "Performance")
rownames(car.cor) <- colnames(car.cor)
car.pc <- eigen(car.cor)
car.pc$values
plot(car.pc$values, type="o", pch=16)
abline(h=1, col="grey")
factanal(covmat=car.cor, factors=2, n.obs=162)   # n.obs is needed for the chi-square test
rownames(car.pc$vectors) <- colnames(car.cor)
car.pc$vectors[,1:2]
varimax(car.pc$vectors[,1:2])
# Back to full Psychological Test Results data
fulltest.cor <- Harman74.cor$cov
image(1:24, 1:24, fulltest.cor, zlim=c(-1,1), col=cm.colors(21))
fulltest.pc <- eigen(fulltest.cor)   # Principal components analysis
fulltest.pc$values
plot(fulltest.pc$values, type="o", pch=16)
abline(h=1, col="grey")
fulltest.pc$vectors[,1:4]
varimax(fulltest.pc$vectors[,1:4])
promax(fulltest.pc$vectors[,1:4])
fulltest.fa1 <- factanal(covmat = fulltest.cor, factors=1, n.obs=145)
fulltest.fa1
update(fulltest.fa1, factors=2)   # Note use of test to choose number of factors
update(fulltest.fa1, factors=3)
update(fulltest.fa1, factors=4)
update(fulltest.fa1, factors=5)