Statistical Bases for Map Reconstructions and Comparisons


Statistical Bases for Map Reconstructions and Comparisons. Jerry Platt, May 2005. Preliminaries. Motivation: Do Different Maps “Differ”? Methods: Singular-Value Decomposition; Multidimensional Scaling and PCA; Mantel Permutation Test; Procrustean Fit and Permutation Test.


1

Statistical Bases for Map Reconstructions and Comparisons

Jerry Platt

May 2005

2

Preliminaries

3

Outline

• Motivation

– Do Different Maps “Differ”?

• Methods

– Singular-Value Decomposition

– Multidimensional Scaling and PCA

– Mantel Permutation Test

– Procrustean Fit and Permutation Test

– Bidimensional Regression

• Working Example

– Locational Attributes of Eight URSB Campuses

4

Motivation

• Comparing Maps Over Time

– Accuracy of a 14th Century Map

– Leader Image Change in Great Britain

– Where IS Wall Street, post-9/11?

• Comparing Maps Among Sub-samples

– Things People Fear, M v. F

– Face-to-Face Comparisons

• Comparing Maps Across Attributes

– Competitive Positioning of Firms

– Chinese Provinces & Human Dev. Indices

5

Accuracy of a 14th Century Map

http://www.geog.ucsb.edu/~tobler/publications/pdf_docs/geog_analysis/Bi_Dim_Reg.pdf

6

http://www.mori.com/pubinfo/rmw/two-triangulation-models.pdf

7

http://igeographer.lib.indstate.edu/pohl.pdf

8

http://www.analytictech.com/borgatti/papers/borgatti%2002%20-%20A%20statistical%20method%20for%20comparing.pdf

Things People Fear, F v. M

9

http://www.multid.se/references/Chem%20Intell%20Lab%20Syst%2072,%20123%20(2004).pdf

Face-to-Face Comparisons

10

http://www.gsoresearch.com/page2/map.htm

11

12

Methods

Eigen-Analysis and Singular-Value Decomposition

Multidimensional Scaling & Principal Comps.

Mantel Permutation Test

Procrustean Fit and Permutation Test

Bidimensional Regression

13

Eigen-analysis

• C = an NxN variance-covariance matrix

• Find the N solutions to Cv = λv

λ = the N Eigenvalues, with λ1 ≥ λ2 ≥ …

v = the N associated Eigenvectors

• C = LDL’, where

L = matrix of Eigenvectors v

D = diagonal matrix of Eigenvalues λ
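
The decomposition on this slide can be checked numerically. The following is a minimal sketch, assuming NumPy and a small simulated data set (the data, seed, and variable names are illustrative, not from the slides); it builds a variance-covariance matrix, orders the eigenvalues from largest to smallest, and confirms that C = LDL’.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(50, 3))          # hypothetical data: 50 observations, N = 3 variables
C = np.cov(data, rowvar=False)           # N x N variance-covariance matrix

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]        # reorder so the largest eigenvalue comes first
eigvals, L = eigvals[order], eigvecs[:, order]
D = np.diag(eigvals)

reconstructed = L @ D @ L.T              # C = L D L'
print(np.allclose(reconstructed, C))
```

Because C is symmetric, the eigenvectors in L are orthonormal, which is why L’ (rather than an inverse) appears in the reconstruction.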

14

Singular Value Decomposition

• Every NxP matrix A has a SVD

• A = U D V’

• Columns of U = Eigenvectors of AA’

• Entries in Diagonal Matrix D = Singular Values

= SQRT of Eigenvalues of either AA’ or A’A

• Columns of V = Eigenvectors of A’A
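
The links between the SVD and the eigen-analysis of A’A can be verified directly. This is a minimal sketch, assuming NumPy and a randomly generated 6x3 matrix (an illustrative choice, not data from the slides).

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 3))                        # hypothetical N x P matrix, N = 6, P = 3

U, d, Vt = np.linalg.svd(A, full_matrices=False)   # A = U diag(d) V'
eig_AtA = np.linalg.eigvalsh(A.T @ A)[::-1]        # eigenvalues of A'A, largest first

svd_ok = np.allclose(U @ np.diag(d) @ Vt, A)       # the factors reproduce A
sqrt_ok = np.allclose(d, np.sqrt(eig_AtA))         # singular values = sqrt of eigenvalues
# Columns of V are eigenvectors of A'A: (A'A) V = V diag(d^2)
eigvec_ok = np.allclose(A.T @ A @ Vt.T, Vt.T * d**2)
print(svd_ok, sqrt_ok, eigvec_ok)
```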

15

SVD

16

Principal Component Analysis

• A is a column-centered data matrix

• A = U D V’

• V’ = Row-wise Principal Components

• D² ~ Proportional to variance explained (squared singular values)

• UD = Principal Component Scores

• DV’ = Principal Axes
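
A short sketch of PCA by way of the SVD, assuming NumPy and simulated data (the 20x4 matrix and all names are hypothetical). It computes the component scores UD and the share of variance attributable to each component, which is based on the squared singular values.

```python
import numpy as np

rng = np.random.default_rng(2)
raw = rng.normal(size=(20, 4)) @ rng.normal(size=(4, 4))   # hypothetical correlated data
A = raw - raw.mean(axis=0)                                  # column-center the data matrix

U, d, Vt = np.linalg.svd(A, full_matrices=False)            # A = U D V'
scores = U * d                                              # UD = principal component scores
explained = d**2 / np.sum(d**2)                             # share of variance per component

print(np.round(explained, 3))
```

The scores can equally be obtained by projecting the centered data onto the rows of V’, that is, `A @ Vt.T`.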

17

Multidimensional Scaling

• A is a column-centered dissimilarity matrix

• B = −(1/2) (I − (1/N) ii’) A (I − (1/N) ii’), where i = N×1 vector of ones

• B = U D V’

• B = XX’, where X = UD^(1/2)

• Limit X to 2 Columns → Coordinates of 2d MDS
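
The double-centering and decomposition steps of this slide can be sketched as follows. This assumes NumPy and the classical (Torgerson) variant, in which the matrix being double-centered holds squared Euclidean distances; the function name `classical_mds` and the five test points are illustrative only.

```python
import numpy as np

def classical_mds(dist, k=2):
    """Classical (Torgerson) MDS: k-dimensional coordinates from a distance matrix."""
    n = dist.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix I - (1/N) i i'
    B = -0.5 * J @ (dist ** 2) @ J             # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)       # B is symmetric, so eigh applies
    order = np.argsort(eigvals)[::-1][:k]      # keep the k largest eigenvalues
    # X = U D^(1/2); clip tiny negative eigenvalues caused by rounding
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))

# Hypothetical 2d points; MDS should reproduce their inter-point distances
points = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 4.0], [1.0, 2.0]])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
X = classical_mds(dist)
recovered = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
print(np.allclose(recovered, dist))
```

The recovered coordinates match the originals only up to translation, rotation, and reflection, which is why a Procrustean fit is the natural follow-up when comparing an MDS map with a reference map.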

18

A Random Permutation Test

Given Dissimilarity Matrices A and B:

N! Permutations

37! = 1.4E+43

8! = 40,320
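
Because N! grows so quickly, permutation tests on dissimilarity matrices are usually run on a random sample of permutations. Below is a minimal sketch of a Mantel-type test, assuming NumPy; the function name `mantel_test`, the 999 permutations, and the simulated eight-point configuration are illustrative choices, not taken from the slides.

```python
import numpy as np

def mantel_test(A, B, n_perm=999, seed=0):
    """Correlate off-diagonal entries of A and B; permute rows/columns of B jointly."""
    rng = np.random.default_rng(seed)
    iu = np.triu_indices_from(A, k=1)              # upper triangle, diagonal excluded
    r_obs = np.corrcoef(A[iu], B[iu])[0, 1]
    count = 0
    for _ in range(n_perm):
        p = rng.permutation(A.shape[0])
        r_perm = np.corrcoef(A[iu], B[p][:, p][iu])[0, 1]
        if r_perm >= r_obs:
            count += 1
    # One-sided p-value; the observed arrangement counts as one permutation
    return r_obs, (count + 1) / (n_perm + 1)

def pairwise(pts):
    return np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)

rng = np.random.default_rng(3)
pts = rng.uniform(0, 10, size=(8, 2))              # hypothetical locations
A = pairwise(pts)
B = pairwise(pts + rng.normal(scale=0.05, size=pts.shape))   # slightly perturbed copy
r_obs, p_value = mantel_test(A, B)
print(round(r_obs, 4), p_value)
```

Rows and columns are permuted together so that each permuted matrix is still a valid dissimilarity matrix for a relabeled set of objects.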

19

Permutation Tests

Permute List & rerun

Observed Test Statistic: TS = 25 = # Correct of 37 SB.

Is 25 Significantly > 18.5?

Ho: TS = 18.5

HA: TS > 18.5

P = .069; P > .05; Do Not Reject Ho

20

21

http://www.entrenet.com/~groedmed/greekm/mythproc.html

22

http://www.zoo.utoronto.ca/jackson/pro2.html

Centering & Scaling

Mirror Reflection

Rotation & Dilation to Min ∑(ε²)

23

Procrustean Analysis

• Two NxP data configurations, X and Y

• X’Y = U D V’

• H = UV’

• OLS Min SSE = tr[(XH−Y)’(XH−Y)]

= tr(XX’) + tr(YY’) − 2tr(D)

= tr(XX’) + tr(YY’) − 2tr(VDV’)
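
The rotation step can be sketched in a few lines. This assumes NumPy, takes the rotation as H = UV’ from the SVD of X’Y (the standard orthogonal Procrustes solution), and uses a simulated eight-point configuration rotated by a known 30 degrees; the function name and data are hypothetical.

```python
import numpy as np

def procrustes_rotation(X, Y):
    """Orthogonal H minimizing tr[(XH - Y)'(XH - Y)], with the minimized SSE."""
    U, d, Vt = np.linalg.svd(X.T @ Y)          # X'Y = U D V'
    H = U @ Vt                                 # H = U V'
    sse = np.trace(X.T @ X) + np.trace(Y.T @ Y) - 2.0 * d.sum()
    return H, sse

rng = np.random.default_rng(4)
X = rng.normal(size=(8, 2))                    # hypothetical configuration
X = X - X.mean(axis=0)
theta = np.radians(30.0)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Y = X @ R                                      # same configuration, rotated

H, sse = procrustes_rotation(X, Y)
direct_sse = np.sum((X @ H - Y) ** 2)          # SSE computed from the residuals
print(np.allclose(H, R), abs(sse) < 1e-9)
```

Note that this form allows H to be a reflection as well as a rotation; centering and scaling are assumed to have been done beforehand.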

24

OLS Regression

• Y = Xβ + ε

• Y = Xb + e

• X = UDV’

• b = VrD⁻¹Ur’Y, where r = first r columns (N>P)

• b = (X’X)⁻¹X’Y

• b = VrVr’β

• Estimated Y values = UrUr’Y
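
The equivalence between the SVD route and the normal equations can be checked on simulated data. This is a minimal sketch assuming NumPy and a full-rank 30x3 design matrix; the coefficients and noise level are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(30, 3))                              # hypothetical design matrix, N > P
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=30)

U, d, Vt = np.linalg.svd(X, full_matrices=False)          # X = U D V'
b_svd = Vt.T @ ((U.T @ y) / d)                            # b = V D^-1 U'y
b_normal = np.linalg.solve(X.T @ X, X.T @ y)              # b = (X'X)^-1 X'y
y_hat = U @ (U.T @ y)                                     # fitted values = U U'y

print(np.allclose(b_svd, b_normal), np.allclose(y_hat, X @ b_svd))
```

The SVD form avoids forming X’X explicitly, which tends to be numerically safer when the columns of X are nearly collinear.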

25

Bidimensional Regression

• (Y,X) = Coordinate pair in 2d Map 1

Y = α0 + β0X

• (A,B) = Coordinate pair in 2d Map 2

E[A] = α1 + β1X − β2Y

E[B] = α2 + β2X + β1Y

α1 = Horizontal Translation

α2 = Vertical Translation

φ = Scale Transformation = SQRT(β1² + β2²)

θ = Angle Transformation = TAN⁻¹(β2 / β1), + 180° iff β1 < 0
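
The Euclidean (four-parameter) form of bidimensional regression can be fitted by ordinary least squares on a stacked system. The sketch below assumes NumPy, assumes the model A = α1 + β1X − β2Y and B = α2 + β2X + β1Y, and uses five made-up points transformed by a known shift, scale, and rotation; the function name and the parameter names a1, a2, b1, b2 are illustrative.

```python
import math
import numpy as np

def bidimensional_regression(X, Y, A, B):
    """Least-squares Euclidean fit of map 2 (A, B) on map 1 (X, Y)."""
    X, Y, A, B = (np.asarray(v, dtype=float) for v in (X, Y, A, B))
    ones, zeros = np.ones_like(X), np.zeros_like(X)
    # Stack the A-equations over the B-equations; columns are a1, a2, b1, b2
    design = np.vstack([
        np.column_stack([ones, zeros, X, -Y]),    # A = a1 + b1*X - b2*Y
        np.column_stack([zeros, ones, Y, X]),     # B = a2 + b2*X + b1*Y
    ])
    target = np.concatenate([A, B])
    (a1, a2, b1, b2), *_ = np.linalg.lstsq(design, target, rcond=None)
    scale = math.hypot(b1, b2)                    # SQRT(b1^2 + b2^2)
    angle = math.degrees(math.atan2(b2, b1))      # atan2 handles the b1 < 0 case
    return a1, a2, scale, angle

# Hypothetical map 1 points, then map 2 = shift (2, -1), scale 1.5, rotate 30 degrees
X = np.array([0.0, 4.0, 4.0, 0.0, 2.0])
Y = np.array([0.0, 0.0, 3.0, 3.0, 1.0])
t, s = math.radians(30.0), 1.5
A = 2.0 + s * (math.cos(t) * X - math.sin(t) * Y)
B = -1.0 + s * (math.sin(t) * X + math.cos(t) * Y)

a1, a2, scale, angle = bidimensional_regression(X, Y, A, B)
print(round(a1, 6), round(a2, 6), round(scale, 6), round(angle, 6))
```

Using `atan2` in place of TAN⁻¹ with a conditional 180° correction gives the same angle, reported in the range −180° to 180°.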

26

Although r = 1, differ in location, scale, and angles of rotation around origin (0,0)

Horizontal & Vertical Translation

Angle of rotation around origin (0,0)

Scale transform, with scale factor < 1 if contraction, & > 1 if expansion

27

Working Example

• Eight URSB Campuses

– RD, BK, TO, RC, SA, RV, SD, TA

• Data Sources

– Locations

– Housing Attributes

– Tapestry Attributes

• Data Analyses

28

Eight URSB Campuses

29

87.5 miles

88.1 miles

30

31

32

33

EXAMPLE: Eight URSB Campuses

34

35

[Plot labels: SD, TA, RD, RV, RC, BK, TO, SA]

36

… and if DISTANCES available, but COORDINATES Unavailable?

• Treat Distance Matrix as Dissimilarity Matrix

• Apply Multidimensional Scaling

• Apply the two-dimension solution “as if” it represents latitude and longitude coordinates

37

Distance Estimates Vary

… But Not “Significantly”

38

MDS Representation

Input = D (8x8); Output = 2d

39

[Plot labels: SD, TA, RD, RV, RC, BK, TO, SA]

Errors “appear” to be quite small … BUT is there a way to test if errors are “STAT SIGNIF”?

40

Mantel Test

41

Procrustean Test: MDS Map Recreation

CONCLUDE: Near-perfect Map Recreation
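
A Procrustean permutation test of this kind can be sketched as follows. This is written in the spirit of PROTEST and is not the procedure or data behind the slide: it assumes NumPy, standardizes both configurations (centering, unit overall scale), uses the sum of singular values of X’Y as the fit statistic, and permutes the rows of one configuration. The function name, the 999 permutations, and the simulated eight points are all hypothetical.

```python
import numpy as np

def protest(X, Y, n_perm=999, seed=0):
    """Permutation test on the Procrustes correlation between two configurations."""
    rng = np.random.default_rng(seed)

    def standardize(M):
        M = M - M.mean(axis=0)                 # centering
        return M / np.linalg.norm(M)           # scaling to unit Frobenius norm

    Xs, Ys = standardize(X), standardize(Y)

    def fit(Yp):
        # Sum of singular values of X'Y; equals 1 for a perfect match
        return np.linalg.svd(Xs.T @ Yp, compute_uv=False).sum()

    r_obs = fit(Ys)
    hits = 0
    for _ in range(n_perm):
        if fit(Ys[rng.permutation(len(Ys))]) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)

rng = np.random.default_rng(6)
X = rng.normal(size=(8, 2))                    # hypothetical reference map
t = np.radians(40.0)
R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
# Hypothetical recreated map: rotated, dilated, shifted, with small errors
Y = 2.0 * X @ R + np.array([5.0, -3.0]) + rng.normal(scale=0.01, size=X.shape)

r_obs, p_value = protest(X, Y)
print(round(r_obs, 4), p_value)
```

The Procrustes sum of squares for standardized configurations is 1 − r_obs², so a fit statistic near 1 with a small p-value corresponds to a near-perfect recreation.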

42

Driving Distances

Do these differ “significantly” from linear distances?

STATISTICAL PRACTICAL

43

DriveD = Driving Distances, Eight URSB Locations

Multidimensional Scaling, with 2-dimension solution

44

[Plot labels: SD, TA, RD, RV, RC, BK, TO, SA]

45

46

Bidimensional Regression: AB on YX

47

PROTEST Comparison

Bidimensional Regression

Procrustean Rotation

48

Housing

49

Tapestry (ESRI)

50

Map Coordinates as Explanatory Variables in Linear Models

51

Incremental Tests

So Map Coordinates seem sufficient as predictors
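
The slides do not show how the incremental tests were computed. One standard option is the nested-model F statistic, sketched below under that assumption with NumPy and simulated values for eight locations (the variables `lat`, `lon`, and `extra` are made up): it asks whether an additional predictor reduces the error sum of squares beyond what the map coordinates already achieve.

```python
import numpy as np

def sse(X, y):
    """Error sum of squares from an OLS fit of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return float(resid @ resid)

def incremental_f(X_reduced, X_full, y):
    """F statistic for the extra columns in X_full relative to X_reduced."""
    n, k_full = X_full.shape
    q = k_full - X_reduced.shape[1]            # number of added predictors
    sse_r, sse_f = sse(X_reduced, y), sse(X_full, y)
    return ((sse_r - sse_f) / q) / (sse_f / (n - k_full))

rng = np.random.default_rng(7)
n = 8                                          # eight hypothetical locations
lat, lon, extra = rng.normal(size=(3, n))
y = 1.0 + 2.0 * lat - lon + 0.1 * rng.normal(size=n)

X_reduced = np.column_stack([np.ones(n), lat, lon])      # map coordinates only
X_full = np.column_stack([X_reduced, extra])             # plus one more attribute
f_stat = incremental_f(X_reduced, X_full, y)
print(round(f_stat, 3))
```

The statistic would be compared with an F distribution on q and n − k degrees of freedom; with only eight observations the denominator degrees of freedom are very small, so such tests have limited power.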

52

Proxy Measures of lat-long in Linear Model

Translations & Transforms

Reduce 8

And ↑ R²

53

Robust criterion would help here:

Min (Med(ε²))
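
The Min(Med(ε²)) criterion is the least median of squares idea. The following is a rough sketch for a simple straight-line fit, using only the Python standard library and an elemental-subset search (every line through two data points is tried as a candidate); it is an approximation for illustration with made-up data, not the bidimensional case on the slide.

```python
import itertools
import statistics

def lms_line(xs, ys):
    """Approximate least-median-of-squares line via all two-point candidate lines."""
    best = None
    for i, j in itertools.combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue                                   # vertical line, skip
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        med = statistics.median(
            (y - intercept - slope * x) ** 2 for x, y in zip(xs, ys)
        )
        if best is None or med < best[0]:
            best = (med, intercept, slope)
    return best

# Hypothetical data: y = 1 + 2x exactly, except for one gross outlier at the end
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 40.0]
med, intercept, slope = lms_line(xs, ys)
print(med, intercept, slope)
```

Because the criterion depends on the median squared residual rather than the sum, a single wild point does not pull the fitted line the way it would under least squares.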

54

Bidimensional Regression

Is There a Linear Relationship Between Housing and Tapestry Data?

r = 0.5449

Must Standardize Data

55

56

It’s Still a 3-d World

57
