
Detailed Look at PCA


Page 1: Detailed Look at PCA

Detailed Look at PCA

Three Important (& Interesting) Viewpoints:

1. Mathematics

2. Numerics

3. Statistics

1st: Review Linear Alg. and Multivar. Prob.

Page 2: Detailed Look at PCA

Review of Linear Algebra (Cont.)

Better View of Relationship:

Singular Value Decomposition ⟷ Eigenvalue Decomposition

• Start with data matrix: $X$

• With SVD: $X = U S V^t$

• Create square, symmetric matrix: $X X^t$

• Note that: $X X^t = U S V^t V S U^t = U S^2 U^t$

• Gives Eigenanalysis: $B = U$ and $D = S^2$ (i.e. $X X^t = B D B^t$)
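
A quick numerical check of this relationship, as a sketch in Python/NumPy (the matrix here is arbitrary illustrative data, not from the slides):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 20))       # d = 5, n = 20 toy data matrix

    # SVD of X: X = U S V^t
    U, s, Vt = np.linalg.svd(X, full_matrices=False)

    # Eigen-analysis of the square, symmetric matrix X X^t
    evals, B = np.linalg.eigh(X @ X.T)   # B matches U up to column signs
    evals = evals[::-1]                  # eigh returns ascending order

    # Eigenvalues of X X^t are the squared singular values of X
    print(np.allclose(evals, s**2))      # True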

Page 3: Detailed Look at PCA

Review of Multivariate Probability

Given a Random Vector, $X = (X_1, \dots, X_d)^t$,

A Center of the Distribution is the Mean Vector,

$\mu = E X = (E X_1, \dots, E X_d)^t$

Note: Component-Wise Calc'n (Euclidean)

Page 4: Detailed Look at PCA

Review of Multivariate Probability

Given a Random Vector, $X = (X_1, \dots, X_d)^t$,

A Measure of Spread is the Covariance Matrix:

$\Sigma = \mathrm{cov}(X) = \begin{pmatrix} \mathrm{var}(X_1) & \cdots & \mathrm{cov}(X_1, X_d) \\ \vdots & \ddots & \vdots \\ \mathrm{cov}(X_d, X_1) & \cdots & \mathrm{var}(X_d) \end{pmatrix}$

Page 5: Detailed Look at PCA

Review of Multivar. Prob. (Cont.)

Outer Product Representation:

$\hat{\Sigma} = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})^t = \frac{1}{n-1} \tilde{X} \tilde{X}^t$

Where the Mean-Centered Data Matrix is:

$\tilde{X}_{d \times n} = (X_1 - \bar{X}, \dots, X_n - \bar{X})$
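
A sketch of this identity in NumPy (toy data; ddof=1 matches the $1/(n-1)$ factor):

    import numpy as np

    rng = np.random.default_rng(1)
    d, n = 4, 50
    X = rng.normal(size=(d, n))            # columns are the data vectors X_1, ..., X_n

    Xbar = X.mean(axis=1, keepdims=True)   # mean vector, d x 1
    Xtilde = X - Xbar                      # mean-centered data matrix

    # Outer product form of the covariance estimate
    Sigma_hat = (Xtilde @ Xtilde.T) / (n - 1)

    # Agrees with the built-in estimator (np.cov treats rows as variables)
    print(np.allclose(Sigma_hat, np.cov(X, ddof=1)))   # True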

Page 6: Detailed Look at PCA

PCA as an Optimization Problem

Find “Direction of Greatest Variability”:

Page 7: Detailed Look at PCA

PCA as an Optimization Problem

Find “Direction of Greatest Variability”:

Page 8: Detailed Look at PCA

PCA as Optimization (Cont.)

Now since $B$ is an Orthonormal Basis Matrix, any unit vector $u$ expands as

$u = \sum_{j=1}^{d} \langle u, v_j \rangle v_j$, with $\|u\|^2 = \sum_{j=1}^{d} \langle u, v_j \rangle^2 = 1$

So the Rotation Gives a Distribution of the Energy of $u$ Over the Eigen-Directions:

$P_u = \frac{1}{n-1} \sum_{i=1}^{n} \langle X_i - \bar{X}, u \rangle^2 = u^t \hat{\Sigma} u = \sum_{j=1}^{d} \lambda_j \langle u, v_j \rangle^2$

And $P_u$ is Max'd (Over Unit Vectors $u$) by Putting All Energy in the "Largest Direction", i.e. $u = v_1$,

Where "Eigenvalues are Ordered": $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$

Page 9: Detailed Look at PCA

PCA as Optimization (Cont.)

Notes:
• Solution is Unique when $\lambda_1 > \lambda_2$
• Else have Sol'ns in Subsp. Gen'd by 1st $v_j$'s
• Projecting onto the Subspace Orthogonal to $v_1$ Gives $v_2$ as the Next Direction
• Continue Through $v_3, \dots, v_d$
• Replace $\hat{\Sigma}$ by $\Sigma$ to get Theoretical PCA
• Estimated by the Empirical Version
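
A small sketch of the optimization view in NumPy (toy data; the leading eigenvector attains the maximum projected sum of squares over random unit directions):

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.multivariate_normal([0, 0], [[3, 1], [1, 1]], size=200).T   # d = 2, n = 200

    Xtilde = X - X.mean(axis=1, keepdims=True)
    Sigma_hat = Xtilde @ Xtilde.T / (X.shape[1] - 1)

    # Direction of greatest variability = leading eigenvector v_1
    lam, V = np.linalg.eigh(Sigma_hat)
    v1 = V[:, -1]                          # eigh: last column has the largest eigenvalue

    # No random unit direction u gives a larger projected SS, u^t Sigma u
    u = rng.normal(size=(2, 1000))
    u /= np.linalg.norm(u, axis=0)
    proj_ss = np.einsum('ij,ik,kj->j', u, Sigma_hat, u)
    print(proj_ss.max() <= lam[-1] + 1e-12)    # True: v_1 attains the max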

Page 10: Detailed Look at PCA

Connect Math to Graphics

2-d Toy Example

Descriptor Space Object Space

Data Points (Curves) are columns of data matrix, X

Page 11: Detailed Look at PCA

Connect Math to Graphics (Cont.)

2-d Toy Example

Descriptor Space Object Space

PC1 Direction = η = Eigenvector (w/ biggest λ)

Page 12: Detailed Look at PCA

Connect Math to Graphics (Cont.)

2-d Toy Example

Descriptor Space Object Space

Centered Data / PC1 Projection / Residual / Best 1-d Approx's of Data

Page 13: Detailed Look at PCA

Connect Math to Graphics (Cont.)

2-d Toy Example

Descriptor Space Object Space

Centered Data / PC2 Projection / Residual / Worst 1-d Approx's of Data

Page 14: Detailed Look at PCA

PCA Redistribution of Energy

Convenient Summary of Amount of Structure:

Total Sum of Squares: $\sum_{i=1}^{n} \|X_i\|^2$

Physical Interpretation: Total Energy in Data

Insight comes from the decomposition

Statistical Terminology: ANalysis Of VAriance (ANOVA)

Page 15: Detailed Look at PCA

PCA Redist'n of Energy (Cont.)

Have already studied this decomposition (recall the curve example):

• Variation (SS) due to Mean (% of total)
• Variation (SS) of Mean Residuals (% of total)

Page 16: Detailed Look at PCA

PCA Redist’n of Energy (Cont.)

Eigenvalues Provide the Atoms of the SS Decomposition:

$\sum_{i=1}^{n} \|X_i - \bar{X}\|^2 = \mathrm{tr}(\tilde{X} \tilde{X}^t) = (n-1)\,\mathrm{tr}(B D B^t) = (n-1)\,\mathrm{tr}(D) = (n-1) \sum_{j=1}^{d} \lambda_j$

Useful Plots are:
• Power Spectrum: $\lambda_j$ vs. $j$
• log Power Spectrum: $\log \lambda_j$ vs. $j$
• Cumulative Power Spectrum: $\sum_{l=1}^{j} \lambda_l$ vs. $j$

Note PCA Gives SS's for Free (As Eigenvalues), But Watch Factors of $n-1$
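
A sketch of these spectra in NumPy (toy data; the cumulative spectrum gives the "% variation explained" fractions used below):

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.normal(size=(10, 40)) * np.arange(10, 0, -1)[:, None]   # decaying scales

    Xtilde = X - X.mean(axis=1, keepdims=True)
    lam = np.linalg.eigvalsh(Xtilde @ Xtilde.T / (X.shape[1] - 1))[::-1]

    power = lam                                 # power spectrum: lambda_j vs. j
    log_power = np.log(lam)                     # log power spectrum
    cum_power = np.cumsum(lam) / lam.sum()      # cumulative, as a fraction of total

    print(np.round(cum_power, 3))               # e.g. PC1 share, PC1+PC2 share, ...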

Page 17: Detailed Look at PCA

PCA Redist'n of Energy (Cont.)

Note, have already considered some of these Useful Plots:

• Power Spectrum
• Cumulative Power Spectrum

Page 18: Detailed Look at PCA

Connect Math to Graphics (Cont.)

2-d Toy Example

Descriptor Space Object Space

Revisit SS Decomposition for PC1

Page 19: Detailed Look at PCA

Connect Math to Graphics (Cont.)

2-d Toy Example

Descriptor Space Object Space

Revisit SS Decomposition for PC1: PC1 has "most of the var'n" = 93%, Reflected by good approximation in Descriptor Space

Page 20: Detailed Look at PCA

Connect Math to Graphics (Cont.)

2-d Toy Example

Descriptor Space Object Space

Revisit SS Decomposition for PC1: PC2 has "only a little var'n" = 7%, Reflected by poor approximation in Descriptor Space

Page 21: Detailed Look at PCA

Different Views of PCA

Solves several optimization problems:

1. Direction to maximize SS of 1-d proj’d data

Page 22: Detailed Look at PCA

Different Views of PCA

2-d Toy Example

Descriptor Space Object Space

1. Max SS of Projected Data
2. Min SS of Residuals
3. Best Fit Line

Page 23: Detailed Look at PCA

Different Views of PCA

Solves several optimization problems:

1. Direction to maximize SS of 1-d proj’d data

2. Direction to minimize SS of residuals

Page 24: Detailed Look at PCA

Different Views of PCA

2-d Toy Example

Feature Space Object Space

1. Max SS of Projected Data
2. Min SS of Residuals
3. Best Fit Line

Page 25: Detailed Look at PCA

Different Views of PCA

Solves several optimization problems:

1. Direction to maximize SS of 1-d proj’d data

2. Direction to minimize SS of residuals

(same, by Pythagorean Theorem)

Page 26: Detailed Look at PCA

Different Views of PCA

Solves several optimization problems:

1. Direction to maximize SS of 1-d proj’d data

2. Direction to minimize SS of residuals

(same, by Pythagorean Theorem)

3. “Best fit line” to data in “orthogonal sense”

(vs. regression of Y on X = vertical sense

& regression of X on Y = horizontal sense)

Page 27: Detailed Look at PCA

Different Views of PCA

Toy Example Comparison of Fit Lines:

PC1

Regression of Y on X

Regression of X on Y

Page 28: Detailed Look at PCA

Different Views of PCA

Normal Data, ρ = 0.3

Page 29: Detailed Look at PCA

Different Views of PCA

Projected Residuals

Page 30: Detailed Look at PCA

Different Views of PCA

Vertical Residuals (X predicts Y)

Page 31: Detailed Look at PCA

Different Views of PCA

Horizontal Residuals (Y predicts X)

Page 32: Detailed Look at PCA

Different Views of PCA

Projected Residuals (Balanced Treatment)

Page 33: Detailed Look at PCA

Different Views of PCA

Toy Example Comparison of Fit Lines:

PC1

Regression of Y on X

Regression of X on Y

Note: Big Difference. Prediction Matters

Page 34: Detailed Look at PCA

Different Views of PCA

Solves several optimization problems:

1. Direction to maximize SS of 1-d proj’d data

2. Direction to minimize SS of residuals

(same, by Pythagorean Theorem)

3. “Best fit line” to data in “orthogonal sense”

(vs. regression of Y on X = vertical sense

& regression of X on Y = horizontal sense)

Use one that makes sense…

Page 35: Detailed Look at PCA

PCA Data Representation

Idea: Expand Data Matrix

in Terms of Inner Prod’ts & Eigenvectors

Page 36: Detailed Look at PCA

PCA Data Representation

Idea: Expand Data Matrix in Terms of Inner Prod'ts & Eigenvectors

Recall Notation (Mean Centered Data):

$\tilde{X}_{d \times n} = (X_1 - \bar{X}, \dots, X_n - \bar{X})$

Page 37: Detailed Look at PCA

PCA Data Representation

Idea: Expand Data Matrix in Terms of Inner Prod'ts & Eigenvectors

Recall Notation:

$\tilde{X}_{d \times n} = (X_1 - \bar{X}, \dots, X_n - \bar{X})$

Spectral Representation (centered data):

$\tilde{X}_{d \times n} = \sum_{j=1}^{d} v_j c_j^t$, where $v_j$ is $d \times 1$ and $c_j$ is $n \times 1$

Page 38: Detailed Look at PCA

PCA Data Represent’n (Cont.)

Now Using: $X = \bar{X} \mathbf{1}_{1 \times n} + \tilde{X}$

Page 39: Detailed Look at PCA

PCA Data Represent’n (Cont.)

Now Using: $X = \bar{X} \mathbf{1}_{1 \times n} + \tilde{X}$

Spectral Representation (Raw Data):

$X_{d \times n} = \bar{X} \mathbf{1}_{1 \times n} + \sum_{j=1}^{d} v_j c_j^t$

Page 40: Detailed Look at PCA

PCA Data Represent’n (Cont.)

Now Using: $X = \bar{X} \mathbf{1}_{1 \times n} + \tilde{X}$

Spectral Representation (Raw Data):

$X_{d \times n} = \bar{X} \mathbf{1}_{1 \times n} + \sum_{j=1}^{d} v_j c_j^t$

Where:
• Entries of $v_j$ ($d \times 1$) are Loadings
• Entries of $c_j$ ($n \times 1$) are Scores
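
A sketch of this representation in NumPy (toy data; the scores are taken as the inner products $c_j = \tilde{X}^t v_j$, an assumption consistent with the "inner products & eigenvectors" idea above):

    import numpy as np

    rng = np.random.default_rng(4)
    d, n = 3, 30
    X = rng.normal(size=(d, n))

    Xbar = X.mean(axis=1, keepdims=True)
    Xtilde = X - Xbar

    # Eigenvectors v_j of the covariance matrix (columns of V, descending order)
    lam, V = np.linalg.eigh(Xtilde @ Xtilde.T / (n - 1))
    V = V[:, ::-1]

    C = Xtilde.T @ V        # scores: column j holds c_j = Xtilde^t v_j

    # Spectral representation: X = Xbar 1^t + sum_j v_j c_j^t
    X_rebuilt = Xbar + sum(np.outer(V[:, j], C[:, j]) for j in range(d))
    print(np.allclose(X, X_rebuilt))   # True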

Page 41: Detailed Look at PCA

PCA Data Represent’n (Cont.)

Can Focus on Individual Data Vectors:

(Part of Above Full Matrix Rep’n)

$X_i = \bar{X} + \sum_{j=1}^{d} c_{ij} v_j$

Page 42: Detailed Look at PCA

PCA Data Represent’n (Cont.)

Can Focus on Individual Data Vectors:

(Part of Above Full Matrix Rep’n)

Terminology: the $c_{ij}$ are Called "PCs" and are also Called Scores

$X_i = \bar{X} + \sum_{j=1}^{d} c_{ij} v_j$

Page 43: Detailed Look at PCA

PCA Data Represent’n (Cont.)

More Terminology:

• Scores, $c_{ij}$, are the Coefficients in the Spectral Representation:

$X_i = \bar{X} + \sum_{j=1}^{d} c_{ij} v_j$

Page 44: Detailed Look at PCA

PCA Data Represent’n (Cont.)

More Terminology:

• Scores, $c_{ij}$, are the Coefficients in the Spectral Representation:

$X_i = \bar{X} + \sum_{j=1}^{d} c_{ij} v_j$

• Loadings are the Entries of the Eigenvectors:

$v_j = (v_{1j}, \dots, v_{dj})^t$

Page 45: Detailed Look at PCA

PCA Data Represent’n (Cont.)

Reduced Rank Representation:

$X_i \approx \bar{X} + \sum_{j=1}^{k} c_{ij} v_j$

Page 46: Detailed Look at PCA

PCA Data Represent’n (Cont.)

Reduced Rank Representation:

$X_i \approx \bar{X} + \sum_{j=1}^{k} c_{ij} v_j$

• Reconstruct Using Only $k \le d$ Terms (Assuming Decreasing Eigenvalues)

Page 47: Detailed Look at PCA

PCA Data Represent’n (Cont.)

Reduced Rank Representation:

$X_i \approx \bar{X} + \sum_{j=1}^{k} c_{ij} v_j$

• Reconstruct Using Only $k \le d$ Terms (Assuming Decreasing Eigenvalues)
• Gives: Rank $k$ Approximation of Data

Page 48: Detailed Look at PCA

PCA Data Represent’n (Cont.)

Reduced Rank Representation:

$X_i \approx \bar{X} + \sum_{j=1}^{k} c_{ij} v_j$

• Reconstruct Using Only $k \le d$ Terms (Assuming Decreasing Eigenvalues)
• Gives: Rank $k$ Approximation of Data
• Key to PCA Dimension Reduction

Page 49: Detailed Look at PCA

PCA Data Represent’n (Cont.)

Reduced Rank Representation:

$X_i \approx \bar{X} + \sum_{j=1}^{k} c_{ij} v_j$

• Reconstruct Using Only $k \le d$ Terms (Assuming Decreasing Eigenvalues)
• Gives: Rank $k$ Approximation of Data
• Key to PCA Dimension Reduction
• And PCA for Data Compression (~ .jpeg)
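
A sketch of the rank-k reconstruction in NumPy (toy data with two dominant directions; truncating the spectral representation at k terms):

    import numpy as np

    rng = np.random.default_rng(5)
    d, n, k = 8, 60, 2
    X = rng.normal(size=(d, n)) * np.array([8, 5, 1, 1, 1, 1, 1, 1.])[:, None]

    Xbar = X.mean(axis=1, keepdims=True)
    Xtilde = X - Xbar
    lam, V = np.linalg.eigh(Xtilde @ Xtilde.T / (n - 1))
    V = V[:, ::-1]                           # descending eigenvalues

    # Keep only the first k eigen-directions
    Vk = V[:, :k]
    X_rank_k = Xbar + Vk @ (Vk.T @ Xtilde)   # rank-k approximation of the data

    err = np.linalg.norm(X - X_rank_k)**2 / np.linalg.norm(Xtilde)**2
    print(f"residual energy fraction: {err:.3f}")   # small when k captures most SS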

Page 50: Detailed Look at PCA

PCA Data Represent’n (Cont.)

Choice of $k$ in the Reduced Rank Represent'n:

Page 51: Detailed Look at PCA

PCA Data Represent’n (Cont.)

Choice of $k$ in the Reduced Rank Represent'n:
• Generally a Very Slippery Problem

Page 52: Detailed Look at PCA

PCA Data Represent’n (Cont.)

Choice of $k$ in the Reduced Rank Represent'n:
• Generally a Very Slippery Problem

Not Recommended:
  o Arbitrary Choice
  o E.g. % Variation Explained: 90%? 95%?

Page 53: Detailed Look at PCA

PCA Data Represent’n (Cont.)

Choice of $k$ in the Reduced Rank Represent'n:
• Generally a Very Slippery Problem
• SCREE Plot (Kruskal 1964): Find the Knee in the Power Spectrum

Page 54: Detailed Look at PCA

PCA Data Represent’n (Cont.)

SCREE Plot Drawbacks:
• What is a Knee?

Page 55: Detailed Look at PCA

PCA Data Represent’n (Cont.)

SCREE Plot Drawbacks:
• What is a Knee?
• What if There are Several?

Page 56: Detailed Look at PCA

PCA Data Represent’n (Cont.)

SCREE Plot Drawbacks:
• What is a Knee?
• What if There are Several?
• Knees Depend on Scaling (Power? log?)

Page 57: Detailed Look at PCA

PCA Data Represent’n (Cont.)

SCREE Plot Drawbacks:
• What is a Knee?
• What if There are Several?
• Knees Depend on Scaling (Power? log?)

Personal Suggestions:
• Find Auxiliary Cutoffs (Inter-Rater Variation)
• Use the Full Range (a la Scale Space)

Page 58: Detailed Look at PCA

PCA Simulation

Idea: given
• Mean Vector $\mu$
• Eigenvectors $v_1, \dots, v_k$
• Eigenvalues $\lambda_1, \dots, \lambda_k$

Simulate data from the Corresponding Normal Distribution

Page 59: Detailed Look at PCA

PCA Simulation

Idea: given
• Mean Vector $\mu$
• Eigenvectors $v_1, \dots, v_k$
• Eigenvalues $\lambda_1, \dots, \lambda_k$

Simulate data from the Corresponding Normal Distribution

Approach: Invert the PCA Data Represent'n:

$X_i = \mu + \sum_{j=1}^{k} \sqrt{\lambda_j}\, Z_{ij} v_j$, where $Z_{ij} \sim N(0, 1)$
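
A sketch of this simulation step in NumPy (assumed toy mean and eigen-structure):

    import numpy as np

    rng = np.random.default_rng(6)
    mu = np.array([1.0, 2.0, 3.0])                  # mean vector
    V = np.linalg.qr(rng.normal(size=(3, 2)))[0]    # orthonormal v_1, v_2 (toy)
    lam = np.array([4.0, 1.0])                      # eigenvalues

    n = 5000
    Z = rng.standard_normal(size=(n, 2))            # Z_ij ~ N(0, 1)
    X = mu + (Z * np.sqrt(lam)) @ V.T               # X_i = mu + sum_j sqrt(lam_j) Z_ij v_j

    # The sample covariance concentrates on V diag(lam) V^t
    print(np.round(np.cov(X.T), 2))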

Page 60: Detailed Look at PCA

PCA & Graphical Displays

Small caution on PC directions & plotting:

• PCA directions (may) have a sign flip
• Mathematically, no difference
• Numerically, an artifact of round-off
• Can have a large graphical impact

Page 61: Detailed Look at PCA

PCA & Graphical Displays

Toy Example (2 colored "clusters" in data)

Page 62: Detailed Look at PCA

PCA & Graphical Displays

Toy Example (1 point moved)

Page 63: Detailed Look at PCA

PCA & Graphical Displays

Toy Example (1 point moved)

Important Point: Constant Axes

Page 64: Detailed Look at PCA

PCA & Graphical Displays

Original Data (arbitrary PC flip)

Page 65: Detailed Look at PCA

PCA & Graphical Displays

Point Moved Data (arbitrary PC flip)

Much Harder To See the Moving Point

Page 66: Detailed Look at PCA

PCA & Graphical Displays

How to “fix directions”?

Personal approach:

Use ± 1 flip that gives:

Max(Proj(Xi)) > |Min(Proj(Xi))|

(assumes 0 centered)
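
A sketch of this sign convention in NumPy (flip each eigenvector so the largest-magnitude projection is positive; assumes centered data, as stated):

    import numpy as np

    def fix_signs(V, Xtilde):
        """Flip each eigenvector (column of V) so that
        max(proj) > |min(proj)| over the centered data."""
        proj = V.T @ Xtilde                       # projections onto each direction
        flip = proj.max(axis=1) < -proj.min(axis=1)
        V = V.copy()
        V[:, flip] *= -1
        return V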

Page 67: Detailed Look at PCA

Alternate PCA Computation

Issue: for HDLSS data (recall $d > n$):
• The $d \times d$ Covariance Matrix $\hat{\Sigma}$ May Be Quite Large

Page 68: Detailed Look at PCA

Alternate PCA Computation

Issue: for HDLSS data (recall $d > n$):
• The $d \times d$ Covariance Matrix $\hat{\Sigma}$ May Be Quite Large
• Thus Slow to Work with, and to Compute
• What About a Shortcut?

Page 69: Detailed Look at PCA

Alternate PCA Computation

Issue: for HDLSS data (recall $d > n$):
• The $d \times d$ Covariance Matrix $\hat{\Sigma}$ May Be Quite Large
• Thus Slow to Work with, and to Compute
• What About a Shortcut?

Approach: Singular Value Decomposition (of the (centered, scaled) Data Matrix $\tilde{X}$)

Page 70: Detailed Look at PCA

Alternate PCA Computation

Singular Value Decomposition: $\tilde{X} = U S V^t$

Page 71: Detailed Look at PCA

Alternate PCA Computation

Singular Value Decomposition: $\tilde{X} = U S V^t$

Where:
• $U$ is $d \times d$ Unitary
• $V$ is $n \times n$ Unitary
• $S$ is the $d \times n$ diagonal matrix of singular values

Page 72: Detailed Look at PCA

Alternate PCA Computation

Singular Value Decomposition: $\tilde{X} = U S V^t$

Where:
• $U$ is $d \times d$ Unitary
• $V$ is $n \times n$ Unitary
• $S$ is the $d \times n$ diagonal matrix of singular values

Assume: decreasing singular values, $s_1 \geq s_2 \geq \cdots \geq 0$

Page 73: Detailed Look at PCA

Alternate PCA Computation

Singular Value Decomposition: $\tilde{X} = U S V^t$

Recall the Relation to the Eigen-Analysis of $\hat{\Sigma}$:

$\hat{\Sigma} = \tfrac{1}{n-1} \tilde{X} \tilde{X}^t = \tfrac{1}{n-1} (U S V^t)(U S V^t)^t = U \left( \tfrac{1}{n-1} S^2 \right) U^t$

Page 74: Detailed Look at PCA

Alternate PCA Computation

Singular Value Decomposition: $\tilde{X} = U S V^t$

Recall the Relation to the Eigen-Analysis of $\hat{\Sigma}$:

$\hat{\Sigma} = \tfrac{1}{n-1} \tilde{X} \tilde{X}^t = \tfrac{1}{n-1} (U S V^t)(U S V^t)^t = U \left( \tfrac{1}{n-1} S^2 \right) U^t$

Thus have the Same Eigenvector Matrix $U$

Page 75: Detailed Look at PCA

Alternate PCA Computation

Singular Value Decomposition: $\tilde{X} = U S V^t$

Recall the Relation to the Eigen-Analysis of $\hat{\Sigma}$:

$\hat{\Sigma} = \tfrac{1}{n-1} \tilde{X} \tilde{X}^t = U \left( \tfrac{1}{n-1} S^2 \right) U^t$

Thus have the Same Eigenvector Matrix $U$, and the Eigenvalues are the Squares of the Singular Values (up to the $\tfrac{1}{n-1}$ factor):

$\lambda_i = \tfrac{s_i^2}{n-1}, \quad i = 1, \dots, n$

Page 76: Detailed Look at PCA

Alternate PCA Computation

Singular Value Decomposition $\tilde{X} = U S V^t$, Computational Advantage (for Rank $r$):

Use the Compact Form; only need to find:
• e-vec's: $U_{d \times r}$
• s-val's: $S_{r \times r}$
• scores: $V_{n \times r}$

Page 77: Detailed Look at PCA

Alternate PCA Computation

Singular Value Decomposition $\tilde{X} = U S V^t$, Computational Advantage (for Rank $r$):

Use the Compact Form; only need to find:
• e-vec's: $U_{d \times r}$
• s-val's: $S_{r \times r}$
• scores: $V_{n \times r}$

Other Components not Useful

Page 78: Detailed Look at PCA

Alternate PCA Computation

Singular Value Decomposition $\tilde{X} = U S V^t$, Computational Advantage (for Rank $r$):

Use the Compact Form; only need to find:
• e-vec's: $U_{d \times r}$
• s-val's: $S_{r \times r}$
• scores: $V_{n \times r}$

Other Components not Useful

So can be much faster for $d \gg n$
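
A sketch of the shortcut in NumPy (assumed toy HDLSS sizes; full_matrices=False is NumPy's compact form):

    import numpy as np

    rng = np.random.default_rng(7)
    d, n = 20000, 50                       # HDLSS: d much larger than n
    X = rng.normal(size=(d, n))
    Xtilde = X - X.mean(axis=1, keepdims=True)

    # Compact SVD: U is d x n, s has n entries, Vt is n x n.
    # Avoids ever forming the d x d covariance matrix.
    U, s, Vt = np.linalg.svd(Xtilde, full_matrices=False)

    lam = s**2 / (n - 1)                   # eigenvalues of the covariance estimate
    scores = (s[:, None] * Vt).T           # row i holds the PC scores of X_i
    print(U.shape, lam[:3])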

Page 79: Detailed Look at PCA

Alternate PCA Computation

Another Variation: Dual PCA

Idea: Useful to View the Data Matrix

$X_{d \times n} = \begin{pmatrix} X_{11} & \cdots & X_{1n} \\ \vdots & \ddots & \vdots \\ X_{d1} & \cdots & X_{dn} \end{pmatrix}$

Page 80: Detailed Look at PCA

Alternate PCA Computation

Another Variation: Dual PCA

Idea: Useful to View the Data Matrix

$X_{d \times n} = \begin{pmatrix} X_{11} & \cdots & X_{1n} \\ \vdots & \ddots & \vdots \\ X_{d1} & \cdots & X_{dn} \end{pmatrix}$

with Col'ns as Data Objects

Page 81: Detailed Look at PCA

Alternate PCA Computation

Another Variation: Dual PCA

Idea: Useful to View the Data Matrix

$X_{d \times n} = \begin{pmatrix} X_{11} & \cdots & X_{1n} \\ \vdots & \ddots & \vdots \\ X_{d1} & \cdots & X_{dn} \end{pmatrix}$

as Both:
• Col'ns as Data Objects &
• Rows as Data Objects

Page 82: Detailed Look at PCA

Alternate PCA Computation

Aside on Row-Column Data Object Choice:

• Col'ns as Data Objects &
• Rows as Data Objects

$X_{d \times n} = \begin{pmatrix} X_{11} & \cdots & X_{1n} \\ \vdots & \ddots & \vdots \\ X_{d1} & \cdots & X_{dn} \end{pmatrix}$

Page 83: Detailed Look at PCA

Alternate PCA Computation

Aside on Row-Column Data Object Choice:

Personal Choice (~Matlab, Linear Algebra): Col'ns as Data Objects

$X_{d \times n} = \begin{pmatrix} X_{11} & \cdots & X_{1n} \\ \vdots & \ddots & \vdots \\ X_{d1} & \cdots & X_{dn} \end{pmatrix}$

Page 84: Detailed Look at PCA

Alternate PCA Computation

Aside on Row-Column Data Object Choice:

Personal Choice (~Matlab, Linear Algebra): Col'ns as Data Objects

Another Choice (~SAS, R, Linear Models): Rows as Data Objects (Col'ns as Variables)

$X_{d \times n} = \begin{pmatrix} X_{11} & \cdots & X_{1n} \\ \vdots & \ddots & \vdots \\ X_{d1} & \cdots & X_{dn} \end{pmatrix}$

Page 85: Detailed Look at PCA

Alternate PCA Computation

Aside on Row-Column Data Object Choice:

Personal Choice (~Matlab, Linear Algebra): Col'ns as Data Objects

Another Choice (~SAS, R, Linear Models): Rows as Data Objects (Col'ns as Variables)

Careful When Discussing!

$X_{d \times n} = \begin{pmatrix} X_{11} & \cdots & X_{1n} \\ \vdots & \ddots & \vdots \\ X_{d1} & \cdots & X_{dn} \end{pmatrix}$

Page 86: Detailed Look at PCA

Alternate PCA Computation

Dual PCA Computation: Same as above, but replace $X$ with $X^t$

Page 87: Detailed Look at PCA

Alternate PCA Computation

Dual PCA Computation: Same as above, but replace $X$ with $X^t$

So can almost replace $\hat{\Sigma} \propto \tilde{X} \tilde{X}^t$ with the dual $\hat{\Sigma}_D \propto \tilde{X}^t \tilde{X}$

Page 88: Detailed Look at PCA

Alternate PCA Computation

Dual PCA Computation: Same as above, but replace $X$ with $X^t$

So can almost replace $\hat{\Sigma} \propto \tilde{X} \tilde{X}^t$ with the dual $\hat{\Sigma}_D \propto \tilde{X}^t \tilde{X}$

Then use the SVD, $\tilde{X} = U S V^t$, to get:

$\tilde{X}^t \tilde{X} = (U S V^t)^t (U S V^t) = V S U^t U S V^t = V S^2 V^t$

Page 89: Detailed Look at PCA

Alternate PCA Computation

Dual PCA Computation: Same as above, but replace $X$ with $X^t$

So can almost replace $\hat{\Sigma} \propto \tilde{X} \tilde{X}^t$ with the dual $\hat{\Sigma}_D \propto \tilde{X}^t \tilde{X}$

Then use the SVD, $\tilde{X} = U S V^t$, to get:

$\tilde{X}^t \tilde{X} = (U S V^t)^t (U S V^t) = V S U^t U S V^t = V S^2 V^t$

Note: Same Eigenvalues
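
A sketch of the primal/dual agreement in NumPy (toy data; the nonzero eigenvalues of the two square matrices coincide):

    import numpy as np

    rng = np.random.default_rng(8)
    d, n = 6, 4
    Xtilde = rng.normal(size=(d, n))
    Xtilde -= Xtilde.mean(axis=1, keepdims=True)   # center the columns

    primal = np.linalg.eigvalsh(Xtilde @ Xtilde.T)[::-1][:n]   # from the d x d matrix
    dual   = np.linalg.eigvalsh(Xtilde.T @ Xtilde)[::-1]       # from the n x n matrix

    print(np.allclose(primal, dual))   # True: same nonzero eigenvalues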

Page 90: Detailed Look at PCA

Alternate PCA Computation

Appears to be a cool symmetry:

Primal ↔ Dual
Loadings ↔ Scores
$U \leftrightarrow V$

Page 91: Detailed Look at PCA

Alternate PCA Computation

Appears to be a cool symmetry:

Primal ↔ Dual
Loadings ↔ Scores
$U \leftrightarrow V$

But, there is a problem with the means and normalization …

Page 92: Detailed Look at PCA

Functional Data Analysis

Recall from 1st Class Meeting:

Spanish Mortality Data

Page 93: Detailed Look at PCA

Functional Data Analysis

Interesting Data Set:
• Mortality Data
• For Spanish Males (thus can relate to history)
• Each curve is a single year
• x coordinate is age

Note: a Choice was made of Data Object (could also study ages as curves, with x coordinate = time)

Page 94: Detailed Look at PCA

Important Issue:

What are the Data Objects?
• Curves (Years): Mortality vs. Age
• Curves (Ages): Mortality vs. Year

Note: Rows vs. Columns of Data Matrix

Functional Data Analysis

Page 95: Detailed Look at PCA

Mortality Time Series

Improved Coloring: Rainbow Representing Year:

Magenta = 1908

Red = 2002

Page 96: Detailed Look at PCA

Mortality Time Series

Object Space View of Projections Onto the PC1 Direction

Main Mode Of Variation: Constant Across Ages

Page 97: Detailed Look at PCA

Mortality Time Series

Shows Major Improvement Over Time

(medical technology, etc.)

And Change In Age Rounding Blips

Page 98: Detailed Look at PCA

Mortality Time Series

Object Space View of Projections Onto the PC2 Direction

2nd Mode Of Variation: Difference Between Ages 20-45 & the Rest

Page 99: Detailed Look at PCA

Mortality Time Series

Scores Plot: Feature (Point Cloud) Space View

Connecting Lines Highlight Time Order

Good View of Historical Effects

Page 100: Detailed Look at PCA

Demography Data
• Dual PCA
• Idea: Rows and Columns trade places
• Terminology: from optimization
• Insights come from studying the "primal" & "dual" problems

Page 101: Detailed Look at PCA

Primal / Dual PCA

Consider the "Data Matrix":

$X_{d \times n} = \begin{pmatrix} x_{11} & \cdots & x_{1i} & \cdots & x_{1n} \\ \vdots & & \vdots & & \vdots \\ x_{j1} & \cdots & x_{ji} & \cdots & x_{jn} \\ \vdots & & \vdots & & \vdots \\ x_{d1} & \cdots & x_{di} & \cdots & x_{dn} \end{pmatrix}$

Page 102: Detailed Look at PCA

Primal / Dual PCA

Consider the "Data Matrix":

$X_{d \times n} = \begin{pmatrix} x_{11} & \cdots & x_{1i} & \cdots & x_{1n} \\ \vdots & & \vdots & & \vdots \\ x_{j1} & \cdots & x_{ji} & \cdots & x_{jn} \\ \vdots & & \vdots & & \vdots \\ x_{d1} & \cdots & x_{di} & \cdots & x_{dn} \end{pmatrix}$

Primal Analysis: Columns are the data vectors

Page 103: Detailed Look at PCA

Primal / Dual PCA

Consider the "Data Matrix":

$X_{d \times n} = \begin{pmatrix} x_{11} & \cdots & x_{1i} & \cdots & x_{1n} \\ \vdots & & \vdots & & \vdots \\ x_{j1} & \cdots & x_{ji} & \cdots & x_{jn} \\ \vdots & & \vdots & & \vdots \\ x_{d1} & \cdots & x_{di} & \cdots & x_{dn} \end{pmatrix}$

Dual Analysis: Rows are the data vectors

Page 104: Detailed Look at PCA

Demography Data

Dual PCA

Page 105: Detailed Look at PCA

Demography Data

Color Code (Ages)

Page 106: Detailed Look at PCA

Demography Data

Older People More At Risk Of Dying

Page 107: Detailed Look at PCA

Demography Data

Improvement Over Time On Average

Page 108: Detailed Look at PCA

Demography Data

Improvement Over Time On Average

Except the Flu Pandemic & Civil War

Page 109: Detailed Look at PCA

Demography Data

PC1: Reflects Mortality Higher for Older People

Page 110: Detailed Look at PCA

Demography Data

PC2: Younger People Benefitted Most From Improved Health

Page 111: Detailed Look at PCA

Demography Data

PC3: How To Interpret?

Page 112: Detailed Look at PCA

Demography Data

Use: Mean + Max, Mean - Min

Page 113: Detailed Look at PCA

Demography Data

Use: Mean + Max, Mean - Min

Flu Pandemic, Civil War, Auto Deaths

Page 114: Detailed Look at PCA

Demography Data

PC3 Interpretation: Mid-Age Cohort

Page 115: Detailed Look at PCA

Primal / Dual PCA

Which is "Better"?

Same Info, Displayed Differently

Here: Prefer Primal

(data matrix as above)

Page 116: Detailed Look at PCA

Primal / Dual PCA

Which is "Better"?

Same Info, Displayed Differently

Here: Prefer Primal, As Indicated by Graphics Quality

(data matrix as above)

Page 117: Detailed Look at PCA

Primal / Dual PCA

Which is "Better"? In General:
• Either Can Be Best
• Try Both and Choose
• Or Use the "Natural Choice" of Data Object

(data matrix as above)

Page 118: Detailed Look at PCA

Primal / Dual PCA

Important Early Version: BiPlot Display

Overlay Primal & Dual PCAs; Not Easy to Interpret

Gabriel, K. R. (1971)

Page 119: Detailed Look at PCA

Return to Big Picture

Main statistical goals of OODA:
• Understanding population structure
  – Low dim'al Projections, PCA …
• Classification (i.e. Discrimination)
  – Understanding 2+ populations
• Time Series of Data Objects
  – Chemical Spectra, Mortality Data
• "Vertical Integration" of Data Types


Page 121: Detailed Look at PCA

Classification - Discrimination

Background: the Two Class (Binary) version:

Using "training data" from Class +1 and Class -1, Develop a "rule" for assigning new data to a Class

Canonical Example: Disease Diagnosis
• New Patients are "Healthy" or "Ill"
• Determine based on measurements

Page 122: Detailed Look at PCA

Classification - Discrimination

Important Distinction: Classification vs. Clustering

Page 123: Detailed Look at PCA

Classification - Discrimination

Important Distinction: Classification vs. Clustering

Classification: Class labels are known; Goal: understand differences

Page 124: Detailed Look at PCA

Classification - Discrimination

Important Distinction: Classification vs. Clustering

Classification: Class labels are known; Goal: understand differences

Clustering: Goal: Find class labels (to be similar)

Page 125: Detailed Look at PCA

Classification - Discrimination

Important Distinction: Classification vs. Clustering

Classification: Class labels are known; Goal: understand differences

Clustering: Goal: Find class labels (to be similar)

Both are about clumps of similar data, but with much different goals

Page 126: Detailed Look at PCA

Classification - Discrimination

Important Distinction: Classification vs. Clustering

Useful terminology:

Classification: supervised learning
Clustering: unsupervised learning

Page 127: Detailed Look at PCA

Classification - Discrimination

Terminology: For statisticians, these are synonyms

Page 128: Detailed Look at PCA

Classification - Discrimination

Terminology: For statisticians, these are synonyms

For biologists, classification means:
• Constructing taxonomies
• And sorting organisms into them

Page 129: Detailed Look at PCA

Classification - Discrimination

Terminology: For statisticians, these are synonyms

For biologists, classification means:
• Constructing taxonomies
• And sorting organisms into them

(maybe this is why discrimination was used, until politically incorrect…)

Page 130: Detailed Look at PCA

Classification (i.e. discrimination)

There are a number of:
• Approaches
• Philosophies
• Schools of Thought

Too often cast as: Statistics vs. EE - CS

Page 131: Detailed Look at PCA

Classification (i.e. discrimination)

EE – CS variations:
• Pattern Recognition
• Artificial Intelligence
• Neural Networks
• Data Mining
• Machine Learning

Page 132: Detailed Look at PCA

Classification (i.e. discrimination)

Differing Viewpoints:

Statistics

• Model Classes with Probability Distribut’ns

• Use to study class diff’s & find rules

Page 133: Detailed Look at PCA

Classification (i.e. discrimination)

Differing Viewpoints:

Statistics
• Model Classes with Probability Distribut'ns
• Use to study class diff's & find rules

EE – CS
• Data are just Sets of Numbers
• Rules distinguish between these

Page 134: Detailed Look at PCA

Classification (i.e. discrimination)

Differing Viewpoints:

Statistics
• Model Classes with Probability Distribut'ns
• Use to study class diff's & find rules

EE – CS
• Data are just Sets of Numbers
• Rules distinguish between these

Current thought: combine these

Page 135: Detailed Look at PCA

Classification (i.e. discrimination)

Important Overview Reference:

Duda, Hart and Stork (2001)

• Too much about neural nets???
• Pizer disagrees…
• Update of Duda & Hart (1973)

Page 136: Detailed Look at PCA

Classification (i.e. discrimination)

For a more classical statistical view:

McLachlan (2004).

• Gaussian Likelihood theory, etc.
• Not well tuned to HDLSS data

Page 137: Detailed Look at PCA

Classification Basics

Personal Viewpoint: Point Clouds

Page 138: Detailed Look at PCA

Classification Basics

Simple and Natural Approach: Mean Difference,
a.k.a. the Centroid Method

Find the "skewer through two meatballs"

Page 139: Detailed Look at PCA

Classification Basics

For the Simple Toy Example:

Project On the MD direction & split at the center
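
A sketch of the mean difference (centroid) rule in NumPy (toy two-class data; project on the mean-difference direction and split at the midpoint of the centroids):

    import numpy as np

    rng = np.random.default_rng(9)
    Xplus  = rng.normal(loc=[2, 2], size=(50, 2))    # class +1
    Xminus = rng.normal(loc=[0, 0], size=(50, 2))    # class -1

    w = Xplus.mean(axis=0) - Xminus.mean(axis=0)         # mean difference direction
    c = (Xplus.mean(axis=0) + Xminus.mean(axis=0)) / 2   # midpoint of the centroids

    def classify(x):
        # project on the MD direction and split at the center
        return 1 if (x - c) @ w > 0 else -1

    print(classify(np.array([2.5, 1.5])), classify(np.array([-0.5, 0.2])))   # 1 -1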

Page 140: Detailed Look at PCA

Classification Basics

Why not use PCA?

Reasonable Result?

Doesn't use class labels…
• Good?
• Bad?

Page 141: Detailed Look at PCA

Classification Basics

Harder Example (slanted clouds):

Page 142: Detailed Look at PCA

Classification Basics

PCA for slanted clouds:

PC1: terrible

PC2: better? Still misses the right dir'n

Doesn't use Class Labels

Page 143: Detailed Look at PCA

Classification Basics

Mean Difference for slanted clouds:

A little better? Still misses the right dir'n

Want to account for covariance

Page 144: Detailed Look at PCA

Classification Basics

Mean Difference & Covariance, Simplest Approach:

Rescale (standardize) the coordinate axes, i.e. replace the (full) data matrix:

$X = \begin{pmatrix} x_{11} & \cdots & x_{1n} \\ \vdots & & \vdots \\ x_{d1} & \cdots & x_{dn} \end{pmatrix} \longmapsto \begin{pmatrix} 1/s_1 & & 0 \\ & \ddots & \\ 0 & & 1/s_d \end{pmatrix} X = \begin{pmatrix} x_{11}/s_1 & \cdots & x_{1n}/s_1 \\ \vdots & & \vdots \\ x_{d1}/s_d & \cdots & x_{dn}/s_d \end{pmatrix}$

Then do Mean Difference. Called the "Naïve Bayes Approach"
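
A sketch of this rescaling in NumPy (toy data; $s_j$ is taken here as the overall standard deviation of coordinate $j$, an assumption since the slide does not pin the scale estimate down):

    import numpy as np

    rng = np.random.default_rng(10)
    X = rng.normal(size=(3, 100)) * np.array([10.0, 1.0, 0.1])[:, None]   # mixed units

    s = X.std(axis=1, ddof=1)          # per-coordinate scale s_j (assumed overall SD)
    X_rescaled = X / s[:, None]        # diag(1/s_1, ..., 1/s_d) @ X

    # After rescaling, apply the mean difference rule as before
    print(np.round(X_rescaled.std(axis=1, ddof=1), 3))   # all coordinates now scale 1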

Page 145: Detailed Look at PCA

Classification Basics

Naïve Bayes Reference:

Domingos & Pazzani (1997)

Most sensible contexts:
• Non-comparable data
• E.g. different units

Page 146: Detailed Look at PCA

Classification Basics

Problem with Naïve Bayes:

Only adjusts Variances, Not Covariances

Doesn't solve this problem

Page 147: Detailed Look at PCA

Classification Basics

Better Solution: Fisher Linear Discrimination

Gets the right dir'n

How does it work?
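
The deck takes that up next; as a hedged preview, the standard FLD direction (not necessarily this deck's exact derivation) rotates the mean difference by the inverse of the within-class covariance:

    import numpy as np

    def fld_direction(Xplus, Xminus):
        """Standard Fisher Linear Discrimination direction:
        w = Sigma_w^{-1} (mean_+ - mean_-), pooled within-class covariance."""
        Sw = np.cov(Xplus.T, ddof=1) + np.cov(Xminus.T, ddof=1)   # pooled (up to weights)
        return np.linalg.solve(Sw, Xplus.mean(axis=0) - Xminus.mean(axis=0))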