Unit 6b: Principal Components Analysis © Andrew Ho, Harvard Graduate School of Education Unit 6b – Slide 1




Page 1

Unit 6b: Principal Components Analysis


http://xkcd.com/419/

Page 2

• Weighted Composites
• Biplots: Visualizing Variables as Vectors
• Principal Components Analysis


Course Roadmap: Unit 6b

Multiple Regression Analysis (MRA): Y_i = β_0 + β_1·X_1i + β_2·X_2i + ε_i

Do your residuals meet the required assumptions?
• Test for residual normality.
• Use influence statistics to detect atypical data points.
• If your residuals are not independent, replace OLS by GLS regression analysis, use individual growth modeling, or specify a multi-level model; if time is a predictor, you need discrete-time survival analysis…

If your outcome is categorical, you need to use…
• Binomial logistic regression analysis (dichotomous outcome)
• Multinomial logistic regression analysis (polytomous outcome)

If your outcome vs. predictor relationship is non-linear:
• Transform the outcome or predictor.
• Use non-linear regression analysis.

If you have more predictors than you can deal with:
• Create taxonomies of fitted models and compare them.
• Form composites of the indicators of any common construct.
• Conduct a Principal Components Analysis ← Today’s Topic Area
• Use Factor Analysis: EFA or CFA?
• Use Cluster Analysis.

Page 3


Here’s a dataset containing teachers’ responses to what the investigators believed were multiple indicators/predictors of a single underlying construct of Teacher Job Satisfaction. The data are described in TSUCCESS_info.pdf.

Dataset: TSUCCESS.txt

Overview: Responses of a national sample of teachers to six questions about job satisfaction.

Source: Administrator and Teacher Survey of the High School and Beyond (HS&B) dataset, 1984 administration, National Center for Education Statistics (NCES). All NCES datasets are also available free from the EdPubs on-line supermarket.

Sample Size: 5269 teachers (4955 with complete data).

More Info: HS&B was established to study educational, vocational, and personal development of young people beginning in their elementary or high school years and following them over time as they began to take on adult responsibilities. The HS&B survey included two cohorts: (a) the 1980 senior class, and (b) the 1980 sophomore class. Both cohorts were surveyed every two years through 1986, and the 1980 sophomore class was also surveyed again in 1992.

Multiple Indicators of a Common Construct

Page 4

The OLS criterion minimizes the sum of vertical squared residuals.

Other definitions of “best fit” are possible:

• Vertical squared residuals (OLS)
• Horizontal squared residuals (X on Y)
• Orthogonal residuals (PCA!)


Review: From OLS to Orthogonal (Perpendicular) Residuals
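The three criteria can be compared numerically. Here is a minimal Python sketch (toy x–y values of my own, not the TSUCCESS data) contrasting the three slopes in the two-variable case, using the closed-form eigenvector solution for the orthogonal fit:

```python
import math

# Toy data (illustrative only, not from the TSUCCESS dataset)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 2.7, 2.9, 4.4, 4.8, 6.1]
n = len(x)

mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x) / n                     # var(x)
syy = sum((yi - my) ** 2 for yi in y) / n                     # var(y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n  # cov(x, y)

# OLS of Y on X: minimizes vertical squared residuals
b_ols = sxy / sxx

# OLS of X on Y, re-expressed as a slope in the (x, y) plane:
# minimizes horizontal squared residuals
b_xony = syy / sxy

# Orthogonal (principal-axis) regression: minimizes perpendicular squared
# residuals; this is the slope of the first eigenvector of the 2x2
# covariance matrix, which has a closed form in the two-variable case
b_orth = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
```

For positively correlated data the orthogonal slope always falls between the OLS slope of Y on X and the (re-expressed) slope of X on Y.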

Page 5

[Two scatterplots of X2 against X1 (axes from 0 to 6), each showing the OLS regression line and the orthogonal regression line]


Visualizing Correlations

. pwcorr X1-X6, casewise

           X1      X2      X3      X4      X5      X6
  X1   1.0000
  X2   0.5548  1.0000
  X3   0.1610  0.1663  1.0000
  X4   0.2127  0.2313  0.2990  1.0000
  X5   0.2531  0.2697  0.3557  0.4478  1.0000
  X6   0.1921  0.2225  0.4326  0.3993  0.5529  1.0000

The sample correlation between variables X1 and X2 is .5548.

We can visualize that here with a scatterplot, as usual, adding in the OLS regression line (minimizing vertical residuals) and the orthogonal (principal axis) regression line (minimizing orthogonal residuals).

Page 6


Indicator (St. Dev.)                             X1    X2    X3    X4    X5    X6
X1: Have high standards of teaching (1.09)        —  0.555 0.161 0.213 0.253 0.192
X2: Continually learning on the job (1.25)       56°    —  0.166 0.231 0.270 0.222
X3: Successful in educating students (0.67)      81°  80°    —  0.299 0.356 0.433
X4: Waste of time to do best as teacher (1.67)   78°  77°  73°    —  0.448 0.399
X5: Look forward to working at school (1.33)     75°  74°  69°  63°    —  0.553
X6: Time satisfied with job (0.57)               79°  77°  64°  67°  56°    —
(Above the diagonal: bivariate correlations. Below the diagonal: angles of bisection.)

Regard the correlation between two indicators as the cosine of the angle between them:

[Diagram: vectors X1 and X2 separated by an angle of 56°, … etc.]

Regard the standard deviation of the indicator as its “length”:

[Diagram: vectors X1, X2, X3 drawn with lengths 1.09, 1.25, 0.67, … etc.]

Inter-correlated indicators are like “forces” diverging from a point. In compositing the indicators, you may visualize their vector sum, or “resultant” … note that the angle between two vectors is arccos(r), the inverse cosine of their correlation.

[Diagram: vectors X1, X2, X3 with lengths 1.09, 1.25, 0.67 and pairwise angles 56°, 81°, 80°, together with their resultant]

Putting it all together …

Visualizing Variables as Vectors

The correlation is visualized not in the actual observations but in the angle between the vectors representing the variables. The smaller the angle (the more nearly parallel the vectors), the more correlated the variables.
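The cosine interpretation is easy to verify with the slide’s own numbers; the only computation is an inverse cosine:

```python
import math

# Correlations among X1-X3 as reported on the slide; the angle between two
# indicator vectors is the arccosine (inverse cosine) of their correlation.
r = {("X1", "X2"): 0.5548, ("X1", "X3"): 0.1610, ("X2", "X3"): 0.1663}

# Convert each correlation to an angle in degrees; these reproduce the
# tabled angles of 56, 81, and 80 degrees (rounded).
angles = {pair: math.degrees(math.acos(rho)) for pair, rho in r.items()}
```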

Page 7


John Willett’s “Potato Technology”

Indicator (St. Dev.)                             X1    X2    X3    X4    X5    X6
X1: Have high standards of teaching (1.09)        —  0.555 0.161 0.213 0.253 0.192
X2: Continually learning on the job (1.25)       56°    —  0.166 0.231 0.270 0.222
X3: Successful in educating students (0.67)      81°  80°    —  0.299 0.356 0.433
X4: Waste of time to do best as teacher (1.67)   78°  77°  73°    —  0.448 0.399
X5: Look forward to working at school (1.33)     75°  74°  69°  63°    —  0.553
X6: Time satisfied with job (0.57)               79°  77°  64°  67°  56°    —
(Above the diagonal: bivariate correlations. Below the diagonal: angles of bisection.)

Page 8

[Biplot of X1 and X2: Dimension 1 (horizontal) vs. Dimension 2 (vertical)]

[Biplot of X1, X2, and X3: Dimension 1 vs. Dimension 2]


The biplot Command in Stata

. biplot X1 X2 if NMISSING==0, alpha(0) norow xline(0) yline(0)

Biplot of 4955 observations and 2 variables
Explained variance by component 1    0.7828
Explained variance by component 2    0.2172
Total explained variance             1.0000

. biplot X1-X3 if NMISSING==0, alpha(0) norow xline(0) yline(0)

Biplot of 4955 observations and 3 variables
Explained variance by component 1    0.6785
Explained variance by component 2    0.1867
Total explained variance             0.8651


“Biplot” because it can plot both observations (rows) and variables (columns), though we use it here for the latter only. It is the shadow of the multidimensional representation of the vectors onto the 2D space defined by the first two “principal components.”

The lines here are directly proportional to the standard deviations of the variables, and the cosine of the angle between them is their correlation.

This is a shadow of a 3D representation onto a 2D plane. The X3 vector is shorter in part because of its smaller standard deviation but more so because it projects into the slide.

Page 9


Biplots for >3 variables, unstandardized and standardized

We can visualize two variables in 2D and three variables in 3D. More than three variables can project into 3D space (with potato technology), and the shadow of the potato technology onto the 2D screen makes a biplot. Also, we can standardize…

[Biplot of X1–X6: Dimension 1 vs. Dimension 2]

. biplot X1-X6 if NMISSING==0, alpha(0) norow xline(0) yline(0) xnegate

Biplot of 4955 observations and 6 variables
Explained variance by component 1    0.4927
Explained variance by component 2    0.2193
Total explained variance             0.7120

[Biplot of X1–X6: Dimension 1 vs. Dimension 2]

. biplot X1-X6 if NMISSING==0, alpha(0) norow xline(0) yline(0) xnegate std

Biplot of 4955 observations and 6 variables
Explained variance by component 1    0.4343
Explained variance by component 2    0.2019
Total explained variance             0.6363

Unstandardized variables

Standardized variables

Page 10


If a composite is a simple sum (or straight average) of indicators, then coefficient alpha is relevant. If a composite is a simple sum of standardized indicators, then standardized alpha is relevant.

Cronbach Coefficient Alpha (Standardized): 0.735530

For an additive composite of “standardized” indicators:
• First, each indicator is standardized to a mean of 0 and a standard deviation of 1:

  X*1i = (X1i − 4.33)/1.09      X*4i = (X4i − 4.23)/1.67
  X*2i = (X2i − 3.87)/1.24      X*5i = (X5i − 4.42)/1.33
  X*3i = (X3i − 3.15)/0.67      X*6i = (X6i − 2.84)/0.57

• Then, the standardized indicator scores are summed:

  X*i = X*1i + X*2i + X*3i + X*4i + X*5i + X*6i

  (in Stata variable names: XSTD = XSTD1 + XSTD2 + XSTD3 + XSTD4 + XSTD5 + XSTD6)

Unweighted Composites

But what biplots and potato technology show is that we might be able to do better in forming an optimal composite, by allowing particular indicators to be weighted.

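As a sketch of the recipe above, the following Python fragment standardizes a few hypothetical items, sums them into an unweighted composite, and computes coefficient alpha for that composite (item values and sample size are invented for illustration; the real data have six items and 4,955 complete cases):

```python
import math

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((x - m) ** 2 for x in v) / len(v)

# Hypothetical responses of five teachers on three items (illustration only)
items = [
    [4, 5, 3, 6, 2],
    [3, 5, 4, 6, 1],
    [4, 4, 3, 5, 2],
]
k = len(items)

# Standardize each item to mean 0, sd 1, then sum: an unweighted composite
z = [[(x - mean(it)) / math.sqrt(var(it)) for x in it] for it in items]
composite = [sum(zj[i] for zj in z) for i in range(len(z[0]))]

# Coefficient alpha: k/(k-1) * (1 - sum of item variances / total variance).
# Computed on standardized items, this is the "standardized alpha".
alpha = k / (k - 1) * (1 - sum(var(zj) for zj in z) / var(composite))
```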

Page 11


More generally, a weighted linear composite can be formed by weighting and adding standardized indicators together to form a composite measure of teacher job satisfaction…

[Diagram: composite X*i formed from the standardized indicators X*1i…X*6i with weights w1…w6]

By choosing weights that differ from unity and differ from each other, we can create an infinite number of potential composites, as follows:

  X*i = w1·X*1i + w2·X*2i + w3·X*3i + w4·X*4i + w5·X*5i + w6·X*6i

We often use “normed” weights, such that:

  w1² + w2² + w3² + w4² + w5² + w6² = 1

Among all such weighted linear composites, are there some that are “optimal”?

How would we define such “optimal” composites? Does it make sense, for instance, to seek a composite with maximum variance, given the original standardized indicators?

Perhaps I can also choose weights that take account of the differing inter-correlations among the indicators, and “pull” the composite “closer” to the more highly-correlated indicators?

Weighted Composites
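The variance of a normed-weight composite of standardized indicators is w'Rw, so candidate weightings can be compared directly. A Python sketch using the slide’s X1–X3 correlations (the tilted weights are hypothetical, chosen only for illustration):

```python
import math

# Correlation matrix for X1-X3, taken from the slide
R = [[1.0000, 0.5548, 0.1610],
     [0.5548, 1.0000, 0.1663],
     [0.1610, 0.1663, 1.0000]]

def composite_variance(w, R):
    # Variance of a weighted composite of standardized indicators: w'Rw
    return sum(w[j] * R[j][k] * w[k]
               for j in range(len(w)) for k in range(len(w)))

# Equal normed weights: each 1/sqrt(3), so the squared weights sum to 1
w_equal = [1 / math.sqrt(3)] * 3

# Hypothetical weights tilted toward the highly correlated pair X1, X2,
# re-normed so the squared weights again sum to 1
raw = [0.65, 0.65, 0.15]
norm = math.sqrt(sum(x * x for x in raw))
w_tilted = [x / norm for x in raw]

v_equal = composite_variance(w_equal, R)
v_tilted = composite_variance(w_tilted, R)
```

Tilting weight toward the highly correlated X1–X2 pair buys more composite variance than equal weighting, which is exactly the intuition that PCA formalizes.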

Page 12

*---------------------------------------------------------------------------
* Carry out a principal components analysis of teacher satisfaction
*---------------------------------------------------------------------------
* Conduct a principal components analysis of the six indicators.
* By default, pca performs a listwise deletion of cases with missing values,
* and standardizes the indicators before compositing:
pca X1-X6, means

* Scree plot showing the eigenvalues:
screeplot, ylabel(0(.5)2.5)

* Output the composite scores on the first two principal components:
predict PC_1 PC_2, score

*---------------------------------------------------------------------------
* Inspect properties of composite scores on the first two principal components
*---------------------------------------------------------------------------
* List the principal component scores for the first 35 teachers:
list PC_1 PC_2 in 1/35

* Estimate univariate descriptive statistics for the composite scores on the
* first two principal components:
tabstat PC_1 PC_2, stat(n mean sd) columns(statistics)

* Estimate the bivariate correlation between the composite scores on the first
* two principal components:
pwcorr PC_1 PC_2, sig obs

The STATA routine pca implements Principal Components Analysis (PCA). By choosing sets of weights, PCA seeks out optimal weighted linear composites of the original (often standardized) indicators. These composites are called the “Principal Components.” The First Principal Component is the weighted linear composite that has maximum variance among all possible normed-weight composites, given the inter-correlations among the indicators.

The pca Command

After completing a PCA, we can save individual “scores” on the first, second, etc., principal components. Provide new variable names for the composites that you want to keep (here, PC_1, PC_2, etc.) and use them in subsequent analysis.

[Diagram: composite X*i formed from the standardized indicators X*1i…X*6i with weights w1…w6]

A “scree plot” can tell us how much variance each component accounts for, and how many components might be necessary to account for sufficient variance.


Page 13

Summary statistics of the variables
-------------------------------------------------------------
    Variable |      Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------
          X1 |  4.329364   1.088205         1          6
          X2 |  3.873663   1.242735         1          6
          X3 |  3.154995   .6692635         1          4
          X4 |  4.227043   1.665968         1          6
          X5 |  4.42442    1.328885         1          6
          X6 |  2.836529   .5714207         1          4
-------------------------------------------------------------


The total variance is simply the sum of the individual indicator variances. When variables are standardized, the total variance is the number of variables. Next, we ask which linear combination of variables will account for the most variance.

STATA provides important pieces of output, including univariate descriptive statistics:

As a first step, by default, PCA estimates the sample mean and standard deviation of the indicators and standardizes each of them, as follows:

  X*1i = (X1i − 4.33)/1.09      X*4i = (X4i − 4.23)/1.67
  X*2i = (X2i − 3.87)/1.24      X*5i = (X5i − 4.42)/1.33
  X*3i = (X3i − 3.15)/0.67      X*6i = (X6i − 2.84)/0.57

Standardized Variables and Total Variance

As before, standardizing variables before a PCA implies that differences in variance across indicators are not interpretable. If they are interpretable, and if indicators share a common scale that transcends standard-deviation units, then use the covariance option.

Page 14

Principal components (eigenvectors)
------------------------------------------------------------------------
 Variable |   Comp1     Comp2     Comp3     Comp4     Comp5     Comp6
----------+-------------------------------------------------------------
       X1 |  0.3472    0.6182    0.0896    0.0264    0.6261    0.3108
       X2 |  0.3617    0.5950    0.0543   -0.0217   -0.6685   -0.2548
       X3 |  0.3778   -0.3021    0.7555    0.4028    0.0503   -0.1746
       X4 |  0.4144   -0.1807   -0.5972    0.6510   -0.0493    0.1129
       X5 |  0.4727   -0.2067   -0.2418   -0.4501    0.3022   -0.6176
       X6 |  0.4591   -0.3117    0.0558   -0.4584   -0.2548    0.6433
------------------------------------------------------------------------


This “ideal” composite is called the First Principal Component. It is:

  PC_1i = 0.35·X*1i + 0.36·X*2i + 0.38·X*3i + 0.41·X*4i + 0.47·X*5i + 0.46·X*6i

where:

  X*1i = (X1i − 4.33)/1.09      X*4i = (X4i − 4.23)/1.67
  X*2i = (X2i − 3.87)/1.24      X*5i = (X5i − 4.42)/1.33
  X*3i = (X3i − 3.15)/0.67      X*6i = (X6i − 2.84)/0.57

[Diagram: PC_1i formed as a weighted sum of X*1i…X*6i, with weights .35, .36, .38, .41, .47, .46]

First Principal Component

List of the original indicators

This column is referred to as the “First Eigenvector.” It contains the weights that PCA has determined will provide the particular linear composite of the six original standardized indicators that has maximum possible variance, given the inter-correlations among the original indicators.

Here is a list of the optimal weights we were seeking!!

The First Principal Component

Note that the sum of squared weights in any component will equal 1; these are “normalized” eigenvectors.
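Both properties, unit length and mutual orthogonality, can be checked directly against the printed eigenvector table; for example, in Python:

```python
# First two eigenvectors (Comp1, Comp2) from the pca output above
comp1 = [0.3472, 0.3617, 0.3778, 0.4144, 0.4727, 0.4591]
comp2 = [0.6182, 0.5950, -0.3021, -0.1807, -0.2067, -0.3117]

# Normalization: the squared weights in each eigenvector sum to 1
sum_sq_1 = sum(w * w for w in comp1)
sum_sq_2 = sum(w * w for w in comp2)

# Orthogonality: the dot product of any two eigenvectors is 0
dot_12 = sum(a * b for a, b in zip(comp1, comp2))
```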

Page 15

Principal components (eigenvectors)
------------------------------------------------------------------------
 Variable |   Comp1     Comp2     Comp3     Comp4     Comp5     Comp6
----------+-------------------------------------------------------------
       X1 |  0.3472    0.6182    0.0896    0.0264    0.6261    0.3108
       X2 |  0.3617    0.5950    0.0543   -0.0217   -0.6685   -0.2548
       X3 |  0.3778   -0.3021    0.7555    0.4028    0.0503   -0.1746
       X4 |  0.4144   -0.1807   -0.5972    0.6510   -0.0493    0.1129
       X5 |  0.4727   -0.2067   -0.2418   -0.4501    0.3022   -0.6176
       X6 |  0.4591   -0.3117    0.0558   -0.4584   -0.2548    0.6433
------------------------------------------------------------------------


[Diagram: PC_1i formed as a weighted sum of X*1i…X*6i, with weights .35, .36, .38, .41, .47, .46]

First Principal Component

Notice that each original (standardized) indicator is approximately equally weighted in the First Principal Component. This suggests that the first principal component we have obtained in this example is largely an equally weighted summary of the indicator variables that we have.

Notice that teachers who score highly on the First Principal Component:
• Have high standards of teaching performance.
• Feel that they are continually learning on the job.
• Believe that they are successful in educating students.
• Feel that it is not a waste of time to be a teacher.
• Look forward to working at school.
• Are always satisfied on the job.

Given this, let’s define the First Principal Component as an overall Index of Teacher Enthusiasm? Sure, why not.

Be cautious of the “naming fallacy” or the “reification fallacy.”

But, what is the First Principal Component actually measuring?

Interpreting the First Principal Component

Page 16

--------------------------------------------------------------
 Component | Eigenvalue  Difference  Proportion  Cumulative
-----------+--------------------------------------------------
     Comp1 |    2.606      1.394       0.4343      0.4343
     Comp2 |    1.212      0.499       0.2019      0.6363
     Comp3 |    0.713      0.118       0.1188      0.7551
     Comp4 |    0.595      0.147       0.0992      0.8543
     Comp5 |    0.448      0.021       0.0746      0.9289
     Comp6 |    0.427        .         0.0711      1.0000
--------------------------------------------------------------


This column contains the “Eigenvalues.”

But, how successful a composite of the original indicators is the First Principal Component?

The eigenvalue for the First Principal Component provides its variance:

In this example, where the original indicator-indicator correlations were low, the best that PCA has been able to do is to form an “optimal” composite that contains 2.61 units of the original 6 units of standardized variance. That’s 43.43% of the original standardized variance. Is this sufficient to call this unidimensional?

This implies that 3.39 units of the original standardized variance remain!!! Maybe there are other interesting composites still to be found that will sweep up the remaining variance. Perhaps we can form other substantively interesting composites from these same six indicators, by choosing different sets of weights:

Maybe there are other “dimensions” of information still hidden within the data?

We can inspect the other “principal components” that PCA has formed in these data …

Eigenvalues and the Proportion of Variance

[Scree plot of eigenvalues (0 to 2.5) against component number (1 to 6), after pca]

The “scree plot” helps us tell whether we should include an additional principal component to account for greater variance.

We sometimes use the “Rule of 1,” in this case keeping two principal components, but I prefer basing the decision on the “elbow” from visual inspection. The two criteria are consistent in this case.

Page 17


Scree Plots and Biplots

[Scree plot of eigenvalues after pca]

[Biplot of X1–X6: Dimension 1 vs. Dimension 2]

. biplot X1-X6 if NMISSING==0, alpha(0) norow xline(0) yline(0) xnegate std

Biplot of 4955 observations and 6 variables
Explained variance by component 1    0.4343
Explained variance by component 2    0.2019
Total explained variance             0.6363

Principal components (eigenvectors)
------------------------------------------------------------------------
 Variable |   Comp1     Comp2     Comp3     Comp4     Comp5     Comp6
----------+-------------------------------------------------------------
       X1 |  0.3472    0.6182    0.0896    0.0264    0.6261    0.3108
       X2 |  0.3617    0.5950    0.0543   -0.0217   -0.6685   -0.2548
       X3 |  0.3778   -0.3021    0.7555    0.4028    0.0503   -0.1746
       X4 |  0.4144   -0.1807   -0.5972    0.6510   -0.0493    0.1129
       X5 |  0.4727   -0.2067   -0.2418   -0.4501    0.3022   -0.6176
       X6 |  0.4591   -0.3117    0.0558   -0.4584   -0.2548    0.6433
------------------------------------------------------------------------

 Component | Eigenvalue  Difference  Proportion  Cumulative
-----------+--------------------------------------------------
     Comp1 |  2.60599     1.39439      0.4343      0.4343
     Comp2 |  1.2116       .498802     0.2019      0.6363
     Comp3 |   .712803     .117618     0.1188      0.7551
     Comp4 |   .595185     .147419     0.0992      0.8543
     Comp5 |   .447766     .0211189    0.0746      0.9289
     Comp6 |   .426647        .        0.0711      1.0000

Teachers who score high on the second component…
• Have high standards of teaching performance.
• Feel that they are continually learning on the job.
But also…
• Believe they are not successful in educating students.
• Feel that it is a waste of time to be a teacher.
• Don’t look forward to working at school.
• Are never satisfied on the job.

If the first principal component is teacher enthusiasm, the second might be teacher frustration.

Note that, by construction, all principal components are uncorrelated with all other principal components.

Does it make sense that enthusiasm and frustration should be uncorrelated?


Page 18


Obs     PC_1       PC_2
  1   -0.67402    1.64567
  2   -3.70420    1.25497
  3   -2.80870    1.46971
  4       .           .
  5   -0.72933    0.16173
  6       .           .
  7    0.68828   -0.66211
  8   -1.64624    1.96727
  9    1.84142    1.66606
 10   -0.11813   -0.20596
 11   -3.70653    0.85507
 12    2.11717    0.84820
 13   -0.66466   -0.47258
 14   -1.09068   -0.99362
 15   -0.89365    0.42894
 16    1.61503    0.55299
 17   -1.95180   -2.33192
 18   -1.40406   -0.25084
 19    1.18572   -0.87904
 20   -2.05647   -1.88495
 21   -0.36685   -0.09749
 22   -2.64324   -0.21207
 23    2.21446    1.10305
 24   -2.55062   -0.75701
 25   -0.03442   -2.97280
(cases deleted)

As promised, scores on the 1st and 2nd principal components are uncorrelated… Also, notice that any teacher missing on any indicator is also missing on every composite.

Scores of Teacher #1 on PC #1 & PC #2 are obtained by standardizing that teacher’s six responses:

  X*1i = (X1i − 4.33)/1.09      X*4i = (X4i − 4.23)/1.67
  X*2i = (X2i − 3.87)/1.24      X*5i = (X5i − 4.42)/1.33
  X*3i = (X3i − 3.15)/0.67      X*6i = (X6i − 2.84)/0.57

…and applying the component weights:

[Diagram: PC_1i formed from X*1i…X*6i with weights .35, .36, .38, .41, .47, .46; PC_2i formed with weights .62, .60, −.30, −.18, −.20, −.31]

Pearson Correlation Coefficients
          PC_1      PC_2
PC_1   1.00000   0.00000
PC_2   0.00000   1.00000

The estimated bivariate correlation between teachers’ scores on the 1st and 2nd principal components is exactly zero!! Principal components are uncorrelated by construction.
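For intuition, here is a minimal two-variable sketch in Python (toy data, not the TSUCCESS values): with two standardized indicators, the eigenvectors of the correlation matrix are always (1, 1)/√2 and (1, −1)/√2, so the component scores can be formed by hand and their covariance is exactly zero.

```python
import math

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

# Toy data for two positively related indicators (illustration only)
x1 = [4.0, 5.0, 3.0, 6.0, 2.0, 5.0, 4.0]
x2 = [3.0, 5.0, 4.0, 6.0, 1.0, 4.0, 5.0]
z1 = [(x - mean(x1)) / math.sqrt(cov(x1, x1)) for x in x1]
z2 = [(x - mean(x2)) / math.sqrt(cov(x2, x2)) for x in x2]

# Principal component scores from the two eigenvectors (1,1)/sqrt(2), (1,-1)/sqrt(2)
pc1 = [(a + b) / math.sqrt(2) for a, b in zip(z1, z2)]
pc2 = [(a - b) / math.sqrt(2) for a, b in zip(z1, z2)]
# cov(pc1, pc2) = (var(z1) - var(z2)) / 2 = 0: uncorrelated by construction
```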

We can save the first and second principal components…

[Biplot of X1–X6: Dimension 1 vs. Dimension 2]

Page 19


A Framework for Principal Components Analysis

• Know your variables. Read your items. Take the test. Think carefully about what the scale intends to measure, and how scores will be interpreted and used.
• Univariate and bivariate statistics and visualizations. Particularly correlation matrices and pairwise scatterplots. Transform variables to achieve linearity if necessary.
• To standardize or not to standardize? Do item scales share equal-interval meaning across items? Are differences in variances meaningful across items? Is it useful to set the variances of items equal?
• Reliability analyses (alpha) for unweighted composites. Provides a reliability statistic, Cronbach’s alpha, that estimates the correlation of scores across replications of a measurement procedure (draws new items; views items as random).
• Principal Components Analysis (pca) for weighted composites. Decides how best to weight the items you have in order to maximize composite variance as a proportion of total score variance (keeps the same items; views items as fixed).
• Use eigenvalues and scree plots to check dimensionality. Select the number of principal components to represent the data based on statistical and substantive criteria.
• Confirm that the weights on the variables are interpretable and consistent with the theories that motivated the design of the instrument. Beware the naming/reification fallacy.
• Save the principal components for subsequent analysis.

Page 20


Eigenvectors
                                           PC_1      PC_2
X1  Have high standards of teaching       0.3472    0.6182
X2  Continually learning on job           0.3617    0.5950
X3  Successful in educating students      0.3778   -0.3021
X4  Waste of time to do best as teacher   0.4144   -0.1807
X5  Look forward to working at school     0.4727   -0.2067
X6  Time satisfied with job               0.4591   -0.3117

The estimated correlation between any indicator and any component can be found by multiplying the corresponding component loading by the square root of the eigenvalue. This is sometimes useful in interpretation. (For unstandardized variables, you must then divide by the standard deviation of the variable to obtain the correlation.)

Correlation of X1 and PC_1 = 0.347 × √2.61 = 0.347 × 1.62 = 0.561
Correlation of X1 and PC_2 = 0.618 × √1.212 = 0.618 × 1.101 = 0.680
Correlation of X1 and PC_3 = … and so on.

Appendix: Some Useful Eigenvalue/Eigenvector Math

The proportion of total variation accounted for by any principal component is its eigenvalue over the sum of all eigenvalues.

Comp6 .426647 . 0.0711 1.0000 Comp5 .447766 .0211189 0.0746 0.9289 Comp4 .595185 .147419 0.0992 0.8543 Comp3 .712803 .117618 0.1188 0.7551 Comp2 1.2116 .498802 0.2019 0.6363 Comp1 2.60599 1.39439 0.4343 0.4343 Component Eigenvalue Difference Proportion Cumulative

Eigenvectors are “normalized” such that the sum of squared weights, or loadings, is 1. Note that the dot product of any two eigenvectors is 0, reaffirming their orthogonality.
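These appendix facts can be checked with plain arithmetic, using the loadings and eigenvalues reported in the output above:

```python
import math

# Loading-times-sqrt(eigenvalue) rule, using X1's loadings on Comp1 and Comp2
corr_x1_pc1 = 0.3472 * math.sqrt(2.60599)   # the slide computes this as 0.561
corr_x1_pc2 = 0.6182 * math.sqrt(1.2116)    # the slide computes this as 0.680

# Proportion of variance: an eigenvalue over the sum of all eigenvalues
# (which is 6 for six standardized variables)
eigenvalues = [2.60599, 1.2116, 0.712803, 0.595185, 0.447766, 0.426647]
prop_comp1 = eigenvalues[0] / sum(eigenvalues)
```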