Topic 9: Canonical Correlation
Ying Li
Stockholm University
October 22, 2012
Basic Concepts
Objectives
In canonical correlation analysis, we examine the linear relationship between a set of X variables and a set of more than one Y variable.
Definition
The canonical correlation technique finds several linear combinations of the X variables and the same number of linear combinations of the Y variables. These combinations are chosen so that they best express the correlation between the two sets.
• The linear combinations are called the canonical variables.
• The correlations between corresponding pairs of canonical variables are called canonical correlations.
Analytical Approach:
Suppose we want to examine the relationship between a set of variables $x_1, x_2, \dots, x_p$ and another set $y_1, y_2, \dots, y_q$, and that the sample means of all x and y variables are zero. The first step in canonical correlation analysis is to form two linear combinations:
$W_1 = a_{11}x_1 + a_{12}x_2 + \cdots + a_{1p}x_p$
$V_1 = b_{11}y_1 + b_{12}y_2 + \cdots + b_{1q}y_q$,
chosen so that $\mathrm{corr}(W_1, V_1) = C_1$ is as large as possible.
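The slide does not show how the maximum is found. One standard route is given below as a reference, not necessarily as the method used in the lecture. Assume the within-set covariance matrices $\Sigma_{XX}$ and $\Sigma_{YY}$ are invertible. Write $\Sigma_{XY} = \Sigma_{YX}^\top$ for the between-set covariance matrix. Write $a_1 = (a_{11}, \dots, a_{1p})^\top$ and $b_1 = (b_{11}, \dots, b_{1q})^\top$ for the coefficient vectors. This matrix notation is introduced here and does not appear on the slides. The first canonical pair then satisfies:

```latex
\Sigma_{XX}^{-1}\Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX}\, a_1 = C_1^2\, a_1,
\qquad
b_1 \propto \Sigma_{YY}^{-1}\Sigma_{YX}\, a_1
```

So $C_1^2$ is the largest eigenvalue of $\Sigma_{XX}^{-1}\Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX}$. The later canonical pairs come from the remaining eigenvalues in decreasing order.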
The second step is to identify another pair of canonical variables
$W_2 = a_{21}x_1 + a_{22}x_2 + \cdots + a_{2p}x_p$
$V_2 = b_{21}y_1 + b_{22}y_2 + \cdots + b_{2q}y_q$,
chosen so that $\mathrm{corr}(W_2, V_2) = C_2$ is as large as possible, subject to $\mathrm{corr}(W_1, W_2) = 0$ and $\mathrm{corr}(V_1, V_2) = 0$.
The procedure continues in this way. In total, at most $\min(p, q)$ pairs of canonical variables can be identified.
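Below is a minimal NumPy sketch of the whole procedure. It assumes full-rank data matrices with observations in rows. It uses a whitening-plus-SVD formulation, which is equivalent to the eigenvalue form, and the function names are hypothetical rather than from any particular package. The singular values of the whitened cross-covariance are the canonical correlations $C_1 \ge C_2 \ge \cdots$. The resulting canonical variables have unit variance and are uncorrelated within each set.

```python
import numpy as np

def inv_sqrt(s):
    """Inverse symmetric square root of a positive-definite matrix."""
    vals, vecs = np.linalg.eigh(s)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def canonical_correlations(x, y):
    """Canonical correlations and raw coefficients for x (n x p), y (n x q).

    Returns (c, a, b): c holds C_1 >= ... >= C_k with k = min(p, q);
    column j of a holds the x coefficients of W_j, column j of b the
    y coefficients of V_j.
    """
    x = x - x.mean(axis=0)  # centre, matching the zero-mean assumption
    y = y - y.mean(axis=0)
    n = x.shape[0]
    sxx, syy, sxy = x.T @ x / (n - 1), y.T @ y / (n - 1), x.T @ y / (n - 1)
    kx, ky = inv_sqrt(sxx), inv_sqrt(syy)
    # Singular values of the whitened cross-covariance are the C_i.
    u, c, vt = np.linalg.svd(kx @ sxy @ ky)
    k = min(x.shape[1], y.shape[1])
    return c[:k], kx @ u[:, :k], ky @ vt.T[:, :k]

# Synthetic example: three x variables and two y variables share one signal.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
x = np.column_stack([z + rng.normal(size=n) for _ in range(3)])
y = np.column_stack([z + rng.normal(size=n), rng.normal(size=n)])
c, a, b = canonical_correlations(x, y)
w, v = x @ a, y @ b  # canonical variable scores W_j and V_j
print(np.round(c, 3))
```

The checks one would expect are as follows. First, $\mathrm{corr}(W_1, V_1)$ computed from the scores equals $C_1$. Second, $W_1$ and $W_2$ (and likewise $V_1$ and $V_2$) are uncorrelated.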
Data
A depression study with n = 294 observations.
Dependent variables:
• y1 = CESD: an index of depression, 0-60; a high score indicates likely depression
• y2 = health: rating score, 1-4; a high score indicates poor health
Independent variables:
• x1 = sex: 0 = male, 1 = female
• x2 = age: age in years
• x3 = educat: 1-7, high score indicates high education
• x4 = income: thousands of dollars per year.
Figure: Summary statistics
Figure: Correlation matrix
Interpreting the results
Canonical correlation
Figure: Canonical correlation
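Hypothesis tests
• $H_0: C_1 = C_2 = \cdots = C_k = 0$, where $k = \min(p, q)$ is the number of canonical correlations.
• Two tests:
  • Bartlett's chi-square test
  • An approximate F test
• A large chi-square or a large F statistic indicates that not all of the canonical correlations are 0.
• In that case, at least the largest canonical correlation, $C_1$, is nonzero.
• It is quite possible that the remaining k − 1 correlations are not statistically significant. The test is therefore repeated without $C_1$:
• $H_0: C_2 = \cdots = C_k = 0$
• and so on, dropping one correlation at a time.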
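The sequential chi-square tests can be sketched in plain Python. The sketch assumes the usual form of Bartlett's statistic for $H_0: C_{m+1} = \cdots = C_k = 0$:

$\chi^2 = -\left(n - 1 - \frac{p + q + 1}{2}\right)\sum_{i=m+1}^{k}\ln(1 - C_i^2)$, with $(p - m)(q - m)$ degrees of freedom.

The function name is hypothetical. The correlations in the usage lines are made-up placeholders; only n = 294, p = 4 and q = 2 come from the depression example. If SciPy is available, a p-value can be obtained with `scipy.stats.chi2.sf(stat, df)`.

```python
import math

def bartlett_tests(corrs, n, p, q):
    """Sequential Bartlett chi-square tests for canonical correlations.

    corrs: canonical correlations sorted from largest to smallest.
    Step m tests H0: C_{m+1} = ... = C_k = 0 (m = 0 is the overall test).
    Returns a list of (m, chi_square, degrees_of_freedom) tuples.
    """
    factor = n - 1 - (p + q + 1) / 2  # Bartlett's correction factor
    results = []
    for m in range(len(corrs)):
        # Only the correlations not yet set aside enter the statistic.
        stat = -factor * sum(math.log(1 - c ** 2) for c in corrs[m:])
        df = (p - m) * (q - m)
        results.append((m, stat, df))
    return results

# Hypothetical correlations; n, p, q match the depression study.
for m, stat, df in bartlett_tests([0.40, 0.10], n=294, p=4, q=2):
    print(f"H0 from C{m + 1} onward: chi2 = {stat:.2f}, df = {df}")
```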
Example
Interpretation of the Canonical Variables
Standardized coefficients
Coefficient     Raw       Standardized
a11 (sex)        0.051     0.025
a12 (age)        0.048     0.871
a13 (educat)    −0.29     −0.383
a14 (income)     0.005     0.082
b11 (CESD)      −0.055    −0.490
b12 (health)     1.17      0.982
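The two columns are related by a simple rescaling. Let $s_j$ be the sample standard deviation of $x_j$. Then $W_1 = \sum_j a_{1j} x_j = \sum_j (a_{1j} s_j)(x_j / s_j)$, so each standardized coefficient is the raw coefficient multiplied by its variable's standard deviation. The same holds for the $b$ coefficients.

Below is a small NumPy sketch of that conversion. The function name is hypothetical and the data are synthetic, not the study data.

```python
import numpy as np

def standardize_coefficients(raw, data):
    """Rescale raw coefficients (one per column of data) so they apply to
    z-scored variables: standardized = raw * sample standard deviation."""
    return raw * data.std(axis=0, ddof=1)

# Synthetic data with very different scales per column (not the study data).
rng = np.random.default_rng(2)
x = rng.normal(loc=[0.0, 50.0, 3.0], scale=[0.5, 20.0, 1.5], size=(100, 3))
raw = np.array([0.05, 0.05, -0.3])
std = standardize_coefficients(raw, x)

# The canonical variable is the same whether it is built from centred raw
# data with raw coefficients or from z-scores with standardized coefficients.
centred = x - x.mean(axis=0)
z = centred / x.std(axis=0, ddof=1)
print(np.allclose(centred @ raw, z @ std))
```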
Loadings
Canonical loadings (canonical structure coefficients)
• Loadings are the correlations between each original variable and the canonical variables: $\mathrm{corr}(x_i, W_j)$ and $\mathrm{corr}(y_i, V_j)$.
• When the variables within a set are uncorrelated, the loadings equal the standardized coefficients.
• When the variables within a set are correlated, the loadings and the standardized coefficients can be quite different.
• It is usually simpler to interpret the loadings than the coefficients.
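Loadings are ordinary correlations, so they can be computed directly once the canonical variable scores are available. The NumPy sketch below uses a hypothetical function name and synthetic data. It also illustrates the second bullet: when the variables are exactly uncorrelated with unit variance, and the canonical variable has unit variance, the loadings reproduce the standardized coefficients.

```python
import numpy as np

def loadings(data, canon):
    """Correlation of each original variable (columns of data, n x p) with
    each canonical variable (columns of canon, n x k): a p x k matrix."""
    p = data.shape[1]
    full = np.corrcoef(np.column_stack([data, canon]), rowvar=False)
    return full[:p, p:]

# Build three exactly uncorrelated, unit-variance variables.
rng = np.random.default_rng(1)
n = 200
raw = rng.normal(size=(n, 3))
q_mat, _ = np.linalg.qr(raw - raw.mean(axis=0))
z = q_mat * np.sqrt(n - 1)

coef = np.array([0.6, 0.8, 0.0])  # hypothetical standardized coefficients
w = z @ coef  # unit-variance canonical variable, since 0.6**2 + 0.8**2 = 1
print(loadings(z, w[:, None]).ravel())  # equal to coef up to rounding
```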
Figure: Correlation matrix
Redundancy Analysis
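The redundancy measure (RM) determines how much of the variance in one set of variables is accounted for by the other set of variables.
• Average amount of variance in the Y variables that is accounted for by $V_i$, i.e. the mean of the squared loadings of the Y variables on $V_i$:
  $AV(Y \mid V_i) = \frac{1}{q}\sum_{j=1}^{q} \mathrm{corr}(y_j, V_i)^2$
• $RM_{V_i \mid W_i} = AV(Y \mid V_i) \times C_i^2$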
Example:
• $AV(Y \mid V_1) = \frac{(-0.282)^2 + 0.878^2}{2} = 0.425$
• $RM_{V_1 \mid W_1} = 0.425 \times 0.405^2 = 0.0697$
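The redundancy arithmetic is easy to check in a few lines of Python. This sketch uses hypothetical function names and reproduces the example from the loadings −0.282 (CESD) and 0.878 (health) on $V_1$ and from $C_1 = 0.405$.

```python
def average_variance(loadings):
    """AV(Y | V_i): mean of the squared loadings of the Y variables on V_i."""
    return sum(l ** 2 for l in loadings) / len(loadings)

def redundancy(loadings, canonical_corr):
    """RM_{V_i | W_i} = AV(Y | V_i) * C_i ** 2."""
    return average_variance(loadings) * canonical_corr ** 2

av = average_variance([-0.282, 0.878])  # CESD and health loadings on V1
rm = redundancy([-0.282, 0.878], 0.405)
print(round(av, 3), round(rm, 4))  # 0.425 0.0697
```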
Relations
Most dependence methods are special cases of canonical correlation analysis:
• only one response: multiple regression
• only one dummy variable as the response: two-group discriminant analysis
• several dummy variables as responses: multigroup discriminant analysis
• only one response and dummy variables as the independent variables: ANOVA
• several responses and dummy variables as the independent variables: MANOVA
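The first relation can be checked numerically. With a single response, the first canonical correlation equals the multiple correlation R from least-squares regression. Below is a NumPy sketch on synthetic data; the function names are hypothetical.

```python
import numpy as np

def first_canonical_correlation(x, y):
    """Largest canonical correlation between the column sets x and y."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)

    def inv_sqrt(s):
        vals, vecs = np.linalg.eigh(s)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    # The 1/(n - 1) factors cancel here, so unscaled cross-products suffice.
    m = inv_sqrt(x.T @ x) @ (x.T @ y) @ inv_sqrt(y.T @ y)
    return np.linalg.svd(m, compute_uv=False)[0]

def multiple_correlation(x, y):
    """Multiple correlation R: corr(y, fitted values) from OLS of y on x."""
    design = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.corrcoef(design @ beta, y)[0, 1]

rng = np.random.default_rng(4)
x = rng.normal(size=(300, 4))
y = x @ np.array([1.0, -0.5, 0.2, 0.0]) + rng.normal(size=300)
c1 = first_canonical_correlation(x, y[:, None])  # y as an n x 1 "set"
r = multiple_correlation(x, y)
print(round(c1, 6), round(r, 6))  # the two values agree
```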
Concluding remarks
• If the sample size is large enough, it is advisable to split the sample, run a canonical analysis on both halves, and compare the results to see whether they are similar.
• Tests of hypotheses about canonical correlations assume that the joint distribution is multivariate normal. This assumption should be checked if such tests are to be reported.
• Canonical correlation analysis is one of the less commonly used multivariate techniques. Its limited use may be due, in part, to the difficulty often encountered in trying to interpret the results.
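A split-half check, as suggested in the first remark, might look like the NumPy sketch below. It assumes a random split into two halves and compares only the canonical correlations; coefficients or loadings could be compared in the same way. The function names and the synthetic data are illustrative and do not come from the slides.

```python
import numpy as np

def canonical_correlations(x, y):
    """All canonical correlations, largest first (whitening + SVD)."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)

    def inv_sqrt(s):
        vals, vecs = np.linalg.eigh(s)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    m = inv_sqrt(x.T @ x) @ (x.T @ y) @ inv_sqrt(y.T @ y)
    return np.linalg.svd(m, compute_uv=False)

def split_half(x, y, seed=0):
    """Randomly split the rows in two and run the analysis on each half."""
    idx = np.random.default_rng(seed).permutation(len(x))
    first, second = idx[: len(x) // 2], idx[len(x) // 2 :]
    return (canonical_correlations(x[first], y[first]),
            canonical_correlations(x[second], y[second]))

# Synthetic example with one strong shared signal z.
rng = np.random.default_rng(3)
n = 2000
z = rng.normal(size=n)
e = rng.normal(size=(n, 5))
x = np.column_stack([z + 0.5 * e[:, 0], e[:, 1], e[:, 2]])
y = np.column_stack([z + 0.5 * e[:, 3], e[:, 4]])
half_a, half_b = split_half(x, y)
print(np.round(half_a, 3), np.round(half_b, 3))
```

In this synthetic setup the first canonical correlation should be close to 0.8 in both halves. A large gap between the halves would suggest that the solution is unstable.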