Dimension Reduction Methods


• statistical methods that provide information about point scatters in multivariate space

• “factor analytic methods”
  – simplify complex relationships between cases and/or variables
  – make it easier to recognize patterns

• identify and describe ‘dimensions’ that underlie the input data
  – may be more fundamental than those directly measured, and yet hidden from view

• reduce the dimensionality of the research problem
  – benefit = simplification; reduce number of variables you have to worry about

• identify sets of variables with similar “behaviour”

How?

Basic Ideas

• imagine a point scatter in multivariate space:
  – the specific values of the numbers used to describe the variables don’t matter
  – we can do anything we want to the numbers, provided we don’t distort the spatial relationships that exist among cases

• some kinds of manipulations help us think about the shape of the scatter in more productive ways
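The point that the numbers can be manipulated freely as long as the spatial relationships among cases survive can be checked directly. The following is a minimal sketch, assuming four hypothetical cases measured on two variables; the helper names (`pairwise_distances`, `centre`, `rotate`) are illustrative, not from the slides. It centres and rotates the scatter and confirms that every case-to-case distance is unchanged.

```python
import math

def pairwise_distances(points):
    """Euclidean distance between every pair of cases."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]

def centre(points):
    """Subtract the mean of each variable (moves the scatter, keeps its shape)."""
    n = len(points)
    means = [sum(col) / n for col in zip(*points)]
    return [tuple(v - m for v, m in zip(p, means)) for p in points]

def rotate(points, degrees):
    """Rotate a 2D scatter about the origin."""
    a = math.radians(degrees)
    return [(x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a))
            for x, y in points]

# Hypothetical cases measured on two variables (made-up numbers).
cases = [(2.0, 1.0), (4.0, 3.5), (6.0, 5.0), (8.0, 8.5)]
moved = rotate(centre(cases), 30)

before = pairwise_distances(cases)
after = pairwise_distances(moved)
same_shape = all(math.isclose(b, a, abs_tol=1e-9) for b, a in zip(before, after))
print("distances preserved:", same_shape)
```

Centring and rotation are exactly the kinds of manipulation that the later methods rely on: the coordinates change, the shape of the scatter does not.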

• imagine a two dimensional scatter of points that show a high degree of correlation …

[Figure: x against y scatter with the means bar-x and bar-y marked; orthogonal regression…]


Why bother?

• more “efficient” description
  – 1st var. captures max. variance
  – 2nd var. captures the max. amount of residual variance, at right angles (orthogonal) to the first

• the 1st var. may capture so much of the information content in the original data set that we can ignore the remaining axis
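For a two-variable scatter the new axes can be worked out in closed form from the covariance matrix, and the first axis coincides with the orthogonal regression line through the means. The sketch below assumes six hypothetical, strongly correlated measurements; the function names are illustrative. It reports how much of the total variance the first axis captures.

```python
import math

def covariance_matrix(xs, ys):
    """Return (var_x, cov_xy, var_y) using the n-1 denominator."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return sxx, sxy, syy

def principal_axes_2d(xs, ys):
    """Eigenvalues of the 2x2 covariance matrix and the angle of the 1st axis."""
    sxx, sxy, syy = covariance_matrix(xs, ys)
    mid = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)  # direction of the 1st axis
    return mid + spread, mid - spread, angle

# Hypothetical, strongly correlated measurements (made-up numbers).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1]

ev1, ev2, angle = principal_axes_2d(x, y)
share_1 = ev1 / (ev1 + ev2)
print(f"1st axis: {share_1:.1%} of total variance, at {math.degrees(angle):.1f} degrees")
```

When the share of the first axis is this close to 100%, dropping the second axis loses very little information, which is the simplification the slide describes.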


• other advantages…

• you can score original cases (and variables) in new space, and plot them…

• spatial arrangements may reveal relationships that were hidden in higher dimension space

• may reveal subsets of variables based on correlations with new axes…

[Figure: scatter of length against width; new axes labelled “size” and “shape”]

[Figure: plot with axes labelled PUBLIC / PRIVATE and DOMESTIC / RITUAL; regions labelled Storage / Cooking, Cooking, Ritual, candelero, Service?]

Principal Components Analysis (PCA)

why:
• clarify relationships among variables
• clarify relationships among cases

when:
• significant correlations exist among variables

how:
• define new axes (components)
• examine correlation between axes and variables
• find scores of cases on new axes

[Figure: variables x1 to x4 shown relative to components pc1 and pc2; annotations r = 0, r = -1, r = 1; label: component loading]

eigenvalue: sum of all squared loadings on one component
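The link between loadings and eigenvalues can be verified on a small case. This is a sketch under stated assumptions: two hypothetical, positively correlated variables, for which the first component of the correlation matrix is the normalized sum of the z-scores and its eigenvalue is 1 + r. A loading is computed as the correlation between a variable and the component scores, and the squared loadings are summed.

```python
import math

def standardize(values):
    """Convert to z-scores (mean 0, sample standard deviation 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def correlation(a, b):
    """Pearson correlation between two equally long lists."""
    za, zb = standardize(a), standardize(b)
    return sum(p * q for p, q in zip(za, zb)) / (len(a) - 1)

# Hypothetical measurements on two variables (made-up numbers).
x1 = [10.0, 12.0, 15.0, 17.0, 21.0, 24.0]
x2 = [3.0, 4.5, 4.0, 6.0, 6.5, 8.0]

z1, z2 = standardize(x1), standardize(x2)
r = correlation(x1, x2)

# For two standardized variables with r > 0, the first component is their
# normalized sum and its eigenvalue is 1 + r.
pc1_scores = [(a + b) / math.sqrt(2) for a, b in zip(z1, z2)]
eigenvalue_1 = 1 + r

# A loading is the correlation between a variable and a component.
loadings = [correlation(z1, pc1_scores), correlation(z2, pc1_scores)]
sum_sq = sum(l ** 2 for l in loadings)
print(f"r = {r:.3f}, eigenvalue = {eigenvalue_1:.3f}, sum of squared loadings = {sum_sq:.3f}")
```

With more than two variables the eigenvalues no longer have this simple closed form, but the same identity holds for each component of a correlation-matrix PCA.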


eigenvalues

• the sum of all eigenvalues = 100% of variance in original data

• proportion accounted for by each eigenvalue = ev/n (n = # of vars.)

• correlation matrix; variance in each variable = 1
  – if an eigenvalue < 1, it explains less variance than one of the original variables
  – but .7 may be a better threshold…

• ‘scree plots’ – show trade-off between loss of information, and simplification
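The arithmetic behind these rules is short enough to spell out. The sketch below uses illustrative eigenvalues for a hypothetical five-variable correlation matrix (made-up numbers that sum to 5, as eigenvalues of a correlation matrix must); it computes ev/n for each component, the cumulative share that a scree plot would show, and how many components each threshold would retain.

```python
# Illustrative eigenvalues from a hypothetical 5-variable correlation matrix
# (made-up numbers; they sum to the number of variables).
eigenvalues = [2.8, 1.1, 0.75, 0.25, 0.10]
n_vars = len(eigenvalues)

# Proportion of variance accounted for by each eigenvalue = ev / n.
proportions = [ev / n_vars for ev in eigenvalues]

# Running total: what is kept if we stop after each component.
cumulative = []
running = 0.0
for p in proportions:
    running += p
    cumulative.append(running)

def n_retained(evs, threshold):
    """Count components whose eigenvalue exceeds the threshold."""
    return sum(1 for ev in evs if ev > threshold)

keep_at_1 = n_retained(eigenvalues, 1.0)    # eigenvalue > 1 rule
keep_at_07 = n_retained(eigenvalues, 0.7)   # the looser .7 threshold

for i, (ev, p, c) in enumerate(zip(eigenvalues, proportions, cumulative), start=1):
    print(f"pc{i}: eigenvalue {ev:.2f}, {p:.0%} of variance, {c:.0%} cumulative")
print("retained at 1.0:", keep_at_1, "| retained at 0.7:", keep_at_07)
```

In this made-up case the two thresholds disagree about the third component, which is the kind of trade-off between information loss and simplification that a scree plot makes visible.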


Mandara Region knife morphology


J. Yellen – San ethnoarchaeology (1977)

• CAMP: the camp identification number (1-16).
• LENGTH: the total number of days the camp was occupied.
• INDIVID: the number of individuals in the principal period of occupation of the camp. Note that not all individuals were at the camp for the entire LENGTH of occupation.
• FAMILY: the number of families occupying the site.
• ALS: the absolute limit of scatter; the total area (m²) over which debris was scattered.
• BONE: the number of animal bone fragments recovered from the site.
• PERS_DAY: the actual number of person-days of occupation (not the product of INDIVID*LENGTH, since not all individuals were at the camp for the entire time).

Correspondence Analysis (CA)

• like a special case of PCA: transforms a table of numerical data into a graphic summary

• hopefully a simplified, more interpretable display, giving a deeper understanding of the fundamental relationships/structure inherent in the data

• a map of basic relationships, with much of the “noise” eliminated

• usually reduces the dimensionality of the data…

CA – basic ideas

• derived from methods of contingency table analysis

• most suited for analysis of categorical data: counts, presence-absence data

• possibly better to use PCA for continuous (i.e., ratio) data

• but, CA makes no assumptions about the distribution of the input variables…

• simultaneously R and Q mode analysis

• derives two sets of eigenvalues and eigenvectors (CA axes; analogous to PCA components)

• input data is scaled so that both sets of eigenvectors occupy very comparable spaces

• can reasonably compare both variables and cases in the same plots

CA output

• CA (factor) scores
  – for both cases and variables

• percentage of total inertia per axis
  – like variance in PCA; relates to dispersal of points around an average value
  – inertia accounted for by each axis relates to distortion in a graphic display

• loadings
  – correlations between rows/columns and axes
  – which of the original entities are best accounted for by what axis?

“mass”

• as in PCA, new axes maximize the spread of observations in rows / columns
  – spread is measured in inertia, not variance
  – based on a “chi-squared” distance, and is assessed separately for cases and variables (rows and columns)

• contributions to the definition of CA axes are weighted on the basis of row/column totals
  – ex: pottery counts from different assemblages; larger collections will have more influence than smaller ones


“Israeli political economic concerns”

residential codes:

As/Af (Asia or Africa)

Eu/Am (Europe or America)

Is/AA (Israel, dad lives in Asia or Africa)

Is/EA (Israel, dad lives in Europe or America)

Is/Is (Israel, dad lives in Israel)

“worry” codes:

ENR  Enlisted relative

SAB  Sabotage

MIL  Military situation

POL  Political situation

ECO  Economic situation

OTH  Other

MTO  More than one worry

PER  Personal economics

      As/Af  Eu/Am  Is/AA  Is/EA  Is/Is
ENR      61    104      8     22      5
SAB      70    117      9     24      7
MIL      97    218     12     28     14
POL      32    118      6     28      7
ECO       4     11      1      2      1
OTH      81    128     14     52     12
MTO      20     42      2      6      0
PER     104     48     14     16      9
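The quantities CA works with (mass, chi-squared distance, inertia) can be computed from this table by hand. The sketch below is not a full correspondence analysis, since it does not extract axes; under that stated limitation it takes the counts above, derives the row and column masses, measures each worry's chi-squared distance from the average profile, and sums the mass-weighted squared distances into the total inertia. The variable and function names are illustrative.

```python
import math

row_labels = ["ENR", "SAB", "MIL", "POL", "ECO", "OTH", "MTO", "PER"]
col_labels = ["As/Af", "Eu/Am", "Is/AA", "Is/EA", "Is/Is"]
counts = [
    [61, 104, 8, 22, 5],
    [70, 117, 9, 24, 7],
    [97, 218, 12, 28, 14],
    [32, 118, 6, 28, 7],
    [4, 11, 1, 2, 1],
    [81, 128, 14, 52, 12],
    [20, 42, 2, 6, 0],
    [104, 48, 14, 16, 9],
]

grand_total = sum(sum(row) for row in counts)
row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]

# "Mass": each row's or column's share of the grand total.
row_mass = [t / grand_total for t in row_totals]
col_mass = [t / grand_total for t in col_totals]

def chi2_distance_sq(profile, centroid):
    """Squared chi-squared distance between a profile and the average profile."""
    return sum((p - c) ** 2 / c for p, c in zip(profile, centroid))

# Row profiles: each row rescaled to sum to 1; their centroid is col_mass.
row_profiles = [[v / t for v in row] for row, t in zip(counts, row_totals)]
row_dist_sq = [chi2_distance_sq(p, col_mass) for p in row_profiles]

# Total inertia: mass-weighted sum of squared chi-squared distances.
total_inertia = sum(m * d for m, d in zip(row_mass, row_dist_sq))

# Cross-check: total inertia equals the Pearson chi-squared statistic / N.
chi2 = sum(
    (counts[i][j] - row_totals[i] * col_totals[j] / grand_total) ** 2
    / (row_totals[i] * col_totals[j] / grand_total)
    for i in range(len(counts)) for j in range(len(col_totals))
)

for label, m, d in zip(row_labels, row_mass, row_dist_sq):
    print(f"{label}: mass {m:.3f}, chi-squared distance {math.sqrt(d):.3f}")
print(f"total inertia {total_inertia:.4f}")
```

Rows with a large mass and a large distance from the average profile contribute most to the inertia, which is why larger collections have more influence on the axes than smaller ones.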


Ksar Akil – Up. Pal., Lebanon


Data> Frequency> COUNT

Statistics> Data Reduction> CA

[Figure: Correspondence Plot, Dim(1) against Dim(2), both axes from -1.0 to 1.0; cases 1 to 10 plotted together with variables PC, BT, NC, BL, FB]

[Figure: two CA plots, one of the column points (partCor, nonCor, flakeBd, blade, bladelet) and one of the row points (cases 1 to 10). Dimension 1: eigenvalue .07609 (59.41% of inertia); Dimension 2: eigenvalue .04095 (31.97% of inertia)]

[Figure: 2D Plot of Row and Column Coordinates; Dimension: 1 x 2. Input Table (Rows x Columns): 10 x 5. Standardization: Row and column profiles. Cases 1 to 10 and variables partCor, nonCor, flakeBd, blade, bladelet plotted together. Dimension 1: eigenvalue .07609 (59.41% of inertia); Dimension 2: eigenvalue .04095 (31.97% of inertia)]

Multidimensional Scaling (MDS)

• aim: define low-dimension space that preserves the distance between cases in original high-dimension space…

• closely related to CA/PCA, but with an iterative location-shifting procedure…
  – may produce a lower-dimension solution than CA/PCA
  – not simultaneously Q and R mode…
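The iterative location-shifting idea can be illustrated with a deliberately simplified metric MDS. This is a sketch under stated assumptions, not the algorithm of any particular statistics package: five hypothetical cases described by four variables, a random two-dimensional starting configuration, and plain gradient steps on raw stress (the squared mismatch between configuration distances and original distances). All function names and the step size are illustrative choices.

```python
import math
import random

def distances(points):
    """Matrix of Euclidean distances between all pairs of points."""
    return [[math.dist(p, q) for q in points] for p in points]

def stress(config, target):
    """Raw stress: squared mismatch between configuration and target distances."""
    d = distances(config)
    n = len(config)
    return sum((d[i][j] - target[i][j]) ** 2 for i in range(n) for j in range(i + 1, n))

def shift_locations(config, target, step=0.05):
    """One iteration: nudge every point to reduce the distance mismatches."""
    d = distances(config)
    n, dims = len(config), len(config[0])
    new_config = []
    for i in range(n):
        move = [0.0] * dims
        for j in range(n):
            if i == j or d[i][j] == 0.0:
                continue
            error = d[i][j] - target[i][j]  # > 0 means i and j are too far apart
            for k in range(dims):
                move[k] -= step * error * (config[i][k] - config[j][k]) / d[i][j]
        new_config.append([c + m for c, m in zip(config[i], move)])
    return new_config

# Hypothetical cases described by four variables (the "high-dimension" space).
cases = [
    [1.0, 2.0, 0.5, 3.0],
    [1.5, 2.5, 0.0, 3.5],
    [6.0, 1.0, 4.0, 0.5],
    [6.5, 0.5, 4.5, 1.0],
    [3.0, 7.0, 2.0, 6.0],
]
target = distances(cases)

rng = random.Random(0)  # fixed seed so the run is repeatable
config = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in cases]  # 2D start

initial_stress = stress(config, target)
for _ in range(500):
    config = shift_locations(config, target)
final_stress = stress(config, target)
print(f"stress: {initial_stress:.3f} -> {final_stress:.3f}")
```

Non-metric MDS follows the same loop but tries to preserve only the rank order of the distances rather than their values; that variant is not shown here.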

[Figure: points A, B, C, D plotted on EAST (0 to 70) and NORTH (10 to 60) axes, with two arrangements of A B C D labelled ‘non-metric’ MDS and ‘metric’ MDS]

[Figure: Tree Diagram for 21 Cases; Ward's method; Euclidean distances; axis: Linkage Distance (0 to 300). Cases: Atsinna, LPescadoSpr, Cienega, PuebloMuerto, PescadoW, RainbowSpr, Tinaja, NAtsinna, MirabalRuin, Gigantes, DayRanch, JacksLake, UpperSoldado, BoxS, ScribeS, MillerCanyon, Spier61, UPescadoSpr, Spier81, YellowHouse, HeshYalawa]

[Figure: 2D Plot of Row Coordinates; Dimensions: 1 x 2. Input Table (Rows x Columns): 21 x 10. Standardization: Row and column profiles. The 21 cases are plotted by name. Dimension 1: eigenvalue .43072 (45.49% of inertia); Dimension 2: eigenvalue .23744 (25.08% of inertia)]

[Figure: Scatterplot 2D; Final Configuration, dimension 1 vs. dimension 2; the 21 cases are plotted by name on Dimension 1 (-1.0 to 2.0) and Dimension 2 (-1.0 to 1.4)]

[Figure: “Shepard Diagram”; horizontal axis: Data (0 to 120); vertical axis: Distances/D-Hats (-0.5 to 3.5)]

Discriminant Analysis (DFA)

• aims:
  – calculate a function that maximizes the ability to discriminate among 2 or more groups, based on a set of descriptive variables
  – assess variables in terms of their relative importance and relevance to discrimination
  – classify new cases not included in the original analysis

[Figure: scatter plot with axes var A and var B]

DFA

• number of DFs (discriminant functions) = groups - 1 (or the number of variables, if that is smaller)
  – each subsequent function is orthogonal to the last
  – associated with eigenvalues that reflect how much ‘work’ each function does in discriminating between groups

• stepwise vs. complete DFA
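With two groups there is a single discriminant function (groups - 1 = 1), which makes a compact illustration possible. The following is a minimal sketch of Fisher's two-group rule, assuming hypothetical cases measured on var A and var B; the data, the midpoint cutoff, and the function names are illustrative assumptions rather than the procedure of any specific package. It derives the function's weights, scores the cases, and classifies a new case that was not in the original analysis.

```python
def mean_vector(rows):
    """Mean of each variable across the cases in a group."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def within_scatter(groups):
    """Pooled within-group scatter for two variables: (s11, s12, s22)."""
    s11 = s12 = s22 = 0.0
    for rows in groups:
        m = mean_vector(rows)
        for a, b in rows:
            s11 += (a - m[0]) ** 2
            s12 += (a - m[0]) * (b - m[1])
            s22 += (b - m[1]) ** 2
    return s11, s12, s22

def discriminant_weights(group_1, group_2):
    """Fisher's rule for two groups: w = inverse(Sw) * (mean_1 - mean_2)."""
    s11, s12, s22 = within_scatter([group_1, group_2])
    det = s11 * s22 - s12 * s12
    m1, m2 = mean_vector(group_1), mean_vector(group_2)
    d0, d1 = m1[0] - m2[0], m1[1] - m2[1]
    return [(s22 * d0 - s12 * d1) / det, (-s12 * d0 + s11 * d1) / det]

def score(case, weights):
    """Position of a case along the discriminant function."""
    return sum(v * w for v, w in zip(case, weights))

# Hypothetical cases measured on var A and var B, in two known groups.
group_1 = [(2.0, 3.1), (3.0, 4.2), (4.0, 4.8), (5.0, 6.1)]
group_2 = [(2.5, 1.0), (3.5, 2.1), (4.5, 2.9), (5.5, 4.2)]

weights = discriminant_weights(group_1, group_2)
# Cutoff halfway between the two group means on the discriminant function.
cutoff = (score(mean_vector(group_1), weights) + score(mean_vector(group_2), weights)) / 2

def classify(case):
    """Assign a case to the group on whose side of the cutoff it scores."""
    return 1 if score(case, weights) > cutoff else 2

new_case = (3.0, 3.9)  # not part of the original analysis
print("weights:", [round(w, 3) for w in weights], "| new case assigned to group", classify(new_case))
```

Neither variable separates these made-up groups on its own, but the weighted combination does; the relative size and sign of the weights indicate how each variable contributes to the discrimination.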


Figure 6.5: Factor structure coefficients: These values show the correlation between Miccaotli ceramic categories and the first two discriminant functions. Categories exhibiting high positive or negative values are the most important for discriminating among A-clusters.

[Figure: case scores plotted on Function 1 (-7 to 7) against Function 2 (-4 to 7); legend: MC:acls4/1, MC:acls4/2, MC:acls4/3, MC:acls4/4; annotations: “+ outcurving bowl, + cazuela/crater (comales, other fine-wares)”, “+ olla”, “+ outcurving bowl (ollas, other fine-wares)”, “+ cazuela/crater”]

Figure 6.4: Case scores calculated for the first two functions generated by discriminant analysis, using Miccaotli A-cluster membership as the grouping variable and posterior estimates of ceramic category proportions as discriminating variables.


Figure 6.6: Factor structure coefficients generated by four separate DFA analyses using binary grouping variables derived from Miccaotli A-cluster memberships. A single discriminant function is associated with each A-cluster.
