Euclidean representations of a set of hierarchies using Multiple Factor Analysis
Cadoret M.*, Lê S.* and Pagès J.*
* Applied mathematics department, Agrocampus Ouest, France
9 February 2011
Correspondence Analysis and Related Methods 2011
Laboratoire de Mathématiques Appliquées Agrocampus
Introduction Data coding Statistical analysis Application Conclusion References
Outline
1 Introduction
2 Data coding
3 Statistical analysis
4 Application
5 Conclusion
Introduction
Interested in:
- Set of non-indexed hierarchies
- Synthetic graphical representations

At least 2 possible graphical representations:
- As a consensus hierarchy (Adams, 1972)
  - Same shape as the data
  - Consensus difficult to obtain when the number of hierarchies increases
- As a Euclidean representation of the hierarchies: representation of the terminal nodes, etc.
Data coding (1)
[Figure: a hierarchy on 16 objects (A to P), cut at three levels L1, L2 and L3. Each cut gives a partition of the objects, coded as one column of the table below.]

Object  L1  L2  L3
A       G1  G1  G1
B       G1  G1  G1
C       G1  G2  G2
D       G1  G2  G2
E       G1  G2  G3
F       G1  G2  G3
G       G1  G2  G3
H       G2  G3  G4
I       G2  G3  G4
J       G2  G3  G4
K       G2  G3  G4
L       G2  G3  G4
M       G2  G4  G5
N       G2  G4  G5
O       G2  G4  G5
P       G2  G4  G5
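The coding of one hierarchy into one column per level can be sketched in a few lines of Python. This is only an illustration: the `levels` dictionary (groups written as strings of object letters) and the helper names `code_hierarchy` and `is_nested` are assumptions made for this sketch, not part of the presentation or of any package.

```python
# Hypothetical input: for each level, the groups obtained by cutting the hierarchy.
levels = {
    "L1": ["ABCDEFG", "HIJKLMNOP"],
    "L2": ["AB", "CDEFG", "HIJKL", "MNOP"],
    "L3": ["AB", "CD", "EFG", "HIJKL", "MNOP"],
}

def code_hierarchy(levels):
    """Return {object: {level: group label}}, groups being numbered within each level."""
    table = {}
    for level, groups in levels.items():
        for index, members in enumerate(groups, start=1):
            for obj in members:
                table.setdefault(obj, {})[level] = f"G{index}"
    return table

def is_nested(levels):
    """Check that every group of a level is contained in a group of the previous level."""
    names = list(levels)
    for coarse, fine in zip(names, names[1:]):
        for members in levels[fine]:
            if not any(set(members) <= set(parent) for parent in levels[coarse]):
                return False
    return True

table = code_hierarchy(levels)
for obj in sorted(table):
    # One row per object, one column per level.
    print(obj, *(table[obj][level] for level in levels))
```

The nesting check is there because the levels of a hierarchy only ever split groups; a set of unrelated partitions would fail it.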
Data coding (2)
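[Figure: global data table with I rows (objects 1 to I) and J groups of columns, one group per hierarchy. In the illustration, hierarchy 1 has levels L1 to L3, hierarchy j has levels L1 and L2, and hierarchy J has levels L1 to L4.]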
Check on data coding and analysis of 1 hierarchy
Data table with qualitative variables: Multiple Correspondence Analysis + Ascending Hierarchical Classification on the dimensions

[Figure: the initial hierarchy on objects A to P (levels L1 to L3), next to the dendrogram obtained from the MCA dimensions (height axis from 0.0 to 1.0; leaves in the order H I J K L M N O P A B C D E F G).]

⇒ We recover the initial hierarchy
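This check can be imitated without any statistical package. The sketch below rests on two assumptions that the slides do not state: the clustering uses Ward's criterion, and it works directly on the chi-square distance between rows of the indicator table (which, in standard MCA, is the distance between objects over the full set of dimensions). The names `ward_cost` and `cluster` are illustrative only.

```python
from itertools import combinations

levels = {
    "L1": ["ABCDEFG", "HIJKLMNOP"],
    "L2": ["AB", "CDEFG", "HIJKL", "MNOP"],
    "L3": ["AB", "CD", "EFG", "HIJKL", "MNOP"],
}
objects = sorted(set("".join(levels["L1"])))
n_obj, n_lev = len(objects), len(levels)

# One column per (level, group): its members and its chi-square weight I / (Q * I_k).
columns = [(set(g), n_obj / (n_lev * len(g))) for groups in levels.values() for g in groups]
weights = [w for _, w in columns]
points = {o: [1.0 if o in members else 0.0 for members, _ in columns] for o in objects}

def ward_cost(a, b):
    """Increase in within-cluster inertia when clusters a and b are merged."""
    (na, ca), (nb, cb) = a, b
    d2 = sum(w * (x - y) ** 2 for w, x, y in zip(weights, ca, cb))
    return na * nb / (na + nb) * d2

def cluster(points, n_clusters):
    """Naive agglomerative clustering: repeatedly merge the cheapest pair of clusters."""
    clusters = {frozenset([o]): (1, p) for o, p in points.items()}
    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: ward_cost(clusters[pair[0]], clusters[pair[1]]))
        (na, ca), (nb, cb) = clusters.pop(a), clusters.pop(b)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters[a | b] = (na + nb, centroid)
    return set(clusters)

# Cut the tree at the number of groups of each level and compare with that level.
for name, groups in levels.items():
    found = cluster(points, len(groups))
    print(name, found == {frozenset(g) for g in groups})
```

Only the cuts at 5, 4 and 2 clusters are compared, since these are the numbers of groups of L3, L2 and L1 in the example.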
Objectives
From a data table with a group structure on the variables, we want to perform a global factorial analysis such that:
- it provides graphical representations of objects, hierarchies and levels of hierarchy
- the influence of each hierarchy is balanced

⇒ Multiple Factor Analysis (MFA; Escofier and Pagès, 1982), in which 1 hierarchy corresponds to 1 group of variables
Multiple Correspondence Analysis (MCA)
MCA looks for dimensions $z_s$ that maximize:

\[ \frac{1}{Q} \sum_{q=1}^{Q} \eta^2(z_s, L_q), \]

with:
- $Q$ the number of qualitative variables
- $z_s$ the axis $s$
- $L_q$ the qualitative variable $q$
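For reference, the squared correlation ratio $\eta^2$ in this criterion has the usual definition, between-category variance over total variance. The symbols $z_{si}$ (value of $z_s$ for object $i$), $\bar{z}_k$ (mean of $z_s$ within category $k$ of $L_q$), $\bar{z}$ (overall mean) and $I_k$ (number of objects in category $k$) are notation introduced here for illustration:

```latex
\[
\eta^2(z_s, L_q) =
  \frac{\sum_{k} I_k \, (\bar{z}_k - \bar{z})^2}
       {\sum_{i=1}^{I} (z_{si} - \bar{z})^2}
\]
```

It equals 0 when all categories have the same mean on $z_s$, and 1 when $z_s$ is constant within each category.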
Multiple Factor Analysis (MFA)
MFA looks for dimensions $z_s$ that maximize the following criterion:

\[ \sum_{j=1}^{J} \frac{1}{Q_j} \sum_{q=1}^{Q_j} \eta^2(z_s, L_q), \]

with:
- $Q_j$ the number of levels of hierarchy $j$
- $z_s$ the axis $s$
- $L_q$ the level $q$ of hierarchy $j$

⇒ In this particular case: criterion maximized by MFA ⇔ sum of the criteria maximized by MCA
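To make the criterion concrete, the sketch below evaluates it for a given axis. It does not search for the maximizing dimension. The six-object data, the vector `z` and the function names `eta_squared` and `mfa_criterion` are hypothetical and chosen only for illustration.

```python
from statistics import fmean

def eta_squared(z, labels):
    """Correlation ratio: between-group sum of squares of z over its total sum of squares."""
    mean = fmean(z)
    total = sum((v - mean) ** 2 for v in z)
    groups = {}
    for value, group in zip(z, labels):
        groups.setdefault(group, []).append(value)
    between = sum(len(vs) * (fmean(vs) - mean) ** 2 for vs in groups.values())
    return between / total

def mfa_criterion(z, hierarchies):
    """Sum over hierarchies of the mean eta^2 between z and the levels of that hierarchy."""
    return sum(fmean([eta_squared(z, level) for level in levels])
               for levels in hierarchies)

# Hypothetical data: 6 objects, a two-level hierarchy and a one-level hierarchy.
z = [-2.0, -1.0, 0.0, 0.5, 1.0, 1.5]
hierarchy_1 = [["a", "a", "a", "b", "b", "b"],
               ["a1", "a1", "a2", "b", "b", "b"]]
hierarchy_2 = [["x", "x", "y", "y", "y", "y"]]

print(mfa_criterion(z, [hierarchy_1, hierarchy_2]))
```

Each inner mean is the MCA criterion of one hierarchy, so the value returned is the sum of the per-hierarchy MCA criteria, as stated above.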
Disjunctive data table associated with one hierarchy j
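[Figure: disjunctive (indicator) table for hierarchy j. Rows: objects 1, ..., i, ..., I. Columns: the groups 1, ..., k, ..., K_j, arranged by level L_1, ..., L_q, ..., L_{Q_j}. The entry y_ik is 0 or 1; the column totals are the group sizes I_k, ..., I_{K_j}.]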
Each level (associated with a hierarchy) is represented by a set ofdummy variables
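A minimal sketch of this dummy coding, using the 16-object example from the data coding slides. The compact string encoding of the levels and the function name `disjunctive_table` are assumptions made for this illustration.

```python
objects = "ABCDEFGHIJKLMNOP"
# Group number of each object at each level (same order as `objects`).
l1, l2, l3 = "1111111222222222", "1122222333334444", "1122333444445555"
coded = {o: {"L1": "G" + a, "L2": "G" + b, "L3": "G" + c}
         for o, a, b, c in zip(objects, l1, l2, l3)}

def disjunctive_table(coded, levels):
    """Return (columns, rows, sizes): one 0/1 column per (level, group) pair."""
    names = sorted(coded)
    columns = [(level, group)
               for level in levels
               for group in sorted({coded[o][level] for o in names})]
    rows = {o: [1 if coded[o][level] == group else 0 for level, group in columns]
            for o in names}
    # Column totals: number of objects I_k in each group k.
    sizes = [sum(rows[o][k] for o in names) for k in range(len(columns))]
    return columns, rows, sizes

columns, rows, sizes = disjunctive_table(coded, ["L1", "L2", "L3"])
print(columns)
print(rows["E"])
print(sizes)
```

Every row contains exactly one 1 per level, so each row sums to the number of levels of the hierarchy.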
Object representation
Distance between 2 objects:

\[ d^2(i, l) = \sum_{j} \frac{1}{Q_j} \sum_{k \in K_j} \frac{I}{I_k} (y_{ik} - y_{lk})^2 = \sum_{j} d^2_{MCA_j}(i, l), \]

with:
- $Q_j$ the number of levels of hierarchy $j$
- $I$ the number of objects
- $I_k$ the number of objects in group $k$
- $y_{ik}$ the element of the disjunctive data table, equal to 1 if object $i$ belongs to group $k$ and 0 otherwise

In this particular case: sum of the usual MCA distances

⇒ 2 objects are all the closer as they belong to the same group in many hierarchies
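The distance can be computed directly from the group labels. In the sketch below, the first hierarchy is the 16-object example from the data coding slides; the second, one-level hierarchy is invented for illustration, as is the function name `distance2`.

```python
from collections import Counter

# Each hierarchy is a list of levels; each level gives one group label per object.
hierarchy_1 = ["1111111222222222", "1122222333334444", "1122333444445555"]
hierarchy_2 = ["1111222233334444"]  # hypothetical second hierarchy with a single level

def distance2(i, l, hierarchies):
    """Squared distance between objects i and l: sum of the per-hierarchy MCA distances."""
    total = 0.0
    for levels in hierarchies:
        n_levels = len(levels)                 # Q_j
        for labels in levels:
            n_objects = len(labels)            # I
            for group, size in Counter(labels).items():   # size is I_k
                y_i = 1 if labels[i] == group else 0
                y_l = 1 if labels[l] == group else 0
                total += n_objects / size * (y_i - y_l) ** 2 / n_levels
    return total

A, B, C, E = 0, 1, 2, 4
print(distance2(A, B, [hierarchy_1, hierarchy_2]))  # same group everywhere
print(distance2(C, E, [hierarchy_1, hierarchy_2]))  # separated at L3 and in hierarchy 2
```

Two objects that share a group at a given level contribute nothing for that level; when they differ, rare groups (small $I_k$) weigh more than large ones.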
Global hierarchy representation
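[Figure: hierarchies H1, H2 and H3 plotted in the unit square [0, 1] × [0, 1]; the coordinates of H2 on axes 1 and 2 are Lg(z_1, H2) and Lg(z_2, H2).]

Coordinate of hierarchy $j$ on axis $s$:

\[ \frac{1}{Q_j} \sum_{q=1}^{Q_j} \eta^2(z_s, L_q), \]

with:
- $Q_j$ the number of levels of hierarchy $j$
- $z_s$ the axis $s$
- $L_q$ the level $q$ of hierarchy $j$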
Level representation
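[Figure: hierarchy H3 and its levels H3L1, H3L2 and H3L3 plotted in the unit square [0, 1] × [0, 1].]

Coordinate of level $q$ on axis $s$:

\[ \eta^2(z_s, L_q), \]

with:
- $z_s$ the axis $s$
- $L_q$ the level $q$

2 consequences:
- Levels ordered along each axis
- Hierarchy = barycenter of its levels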
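Both consequences can be checked numerically. Because the levels of a hierarchy are nested, refining a partition can only increase the between-group variance, so $\eta^2$ does not decrease from L1 to L3. The sketch below uses the 16-object example hierarchy; the axis scores `z1` and `z2` are arbitrary values invented for illustration, not MFA output.

```python
from statistics import fmean

def eta_squared(z, labels):
    """Correlation ratio: between-group sum of squares of z over its total sum of squares."""
    mean = fmean(z)
    total = sum((v - mean) ** 2 for v in z)
    groups = {}
    for value, group in zip(z, labels):
        groups.setdefault(group, []).append(value)
    return sum(len(vs) * (fmean(vs) - mean) ** 2 for vs in groups.values()) / total

# Group number of each object (A to P) at each level of the example hierarchy.
levels = {"L1": "1111111222222222", "L2": "1122222333334444", "L3": "1122333444445555"}
# Hypothetical axis scores, one value per object.
axes = {
    "z1": [-3, -3, -2, -2, -1, -1, -1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
    "z2": [2, 1, -1, 0, 1, 0, -2, 3, -3, 0, 1, -1, 2, -2, 1, -1],
}

# Coordinate of each level on each axis, then of the hierarchy (mean over its levels).
level_coords = {axis: {name: eta_squared(z, labels) for name, labels in levels.items()}
                for axis, z in axes.items()}
hierarchy_coord = {axis: fmean(list(coords.values()))
                   for axis, coords in level_coords.items()}

for axis in axes:
    print(axis, level_coords[axis], hierarchy_coord[axis])
```

On every axis the level coordinates appear in the order L1, L2, L3, and the hierarchy point lies between its first and last levels.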
Data
- 16 advertisements for an orange juice
- Advertisements built according to a $2^{5-1}$ fractional factorial design
- 22 subjects
- Hierarchical sorting
Example of hierarchical sorting: subject number 3
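[Figure: hierarchical sorting tree of subject 3 over the 16 advertisements; leaf labels read ABC, DE, FGH, IJK, LM, N, OP.]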
Example of hierarchical sorting: subject number 5
[Figure: hierarchical sorting tree of subject 5 over the 16 advertisements A to P.]
Advertisement representation
[Figure: map of the 16 advertisements A to P on Dim 1 (15.62 %) and Dim 2 (14.18 %); λ1 = 16.55. The successive overlays add the annotations "Background color" and, written vertically, "Figurative".]
Hierarchy representation
[Figure: the 22 hierarchies (subjects 1 to 22) plotted in the unit square; Dim 1 (15.62 %), Dim 2 (14.18 %).]
Hierarchy representation: subject number 3
[Figure: the map of the 22 hierarchies shown together with the map of the 16 advertisements (Dim 1 (15.62 %), Dim 2 (14.18 %)); the advertisement map is annotated with levels L1, L2 and L3 of subject 3.]
Level representation: subject number 3
[Figure: hierarchy 3 and its levels 3.L1, 3.L2 and 3.L3 plotted in the unit square; Dim 1 (15.62 %), Dim 2 (14.18 %).]
Level representation: trajectories
[Figure: level trajectories for the 22 subjects in the unit square, each point labelled j.Lq for level q of subject j; Dim 1 (15.62 %), Dim 2 (14.18 %). The final overlay marks three groups of trajectories with 18 %, 63 % and 18 %.]
Conclusion
Methodology providing:
- Representation of objects, hierarchies, levels
- Representations related to each other
- Representations interpretable according to simple rules

In the example, suggests groups of hierarchies
Allows hierarchies and partitions to be taken into account simultaneously in the same analysis
Program available in the SensoMineR package
References
Adams, E. N. (1972). Consensus techniques and the comparison of taxonomic trees. Systematic Zoology, 21:390–397.
Escofier, B. and Pagès, J. (1982). Comparaison de groupes de variables définies sur le même ensemble d’individus. Rapport de recherche INRIA, 149.