
Page 1: Dimensionality Reduction

CS 685: Special Topics in Data Mining
Jinze Liu, University of Kentucky

Page 2: Overview

• What is Dimensionality Reduction?
– Simplifying complex data
– Using dimensionality reduction as a data-mining "tool"
– Useful for both "data modeling" and "data analysis"
– A tool for "clustering" and "regression"

• Linear Dimensionality Reduction Methods
– Principal Component Analysis (PCA)
– Multi-Dimensional Scaling (MDS)

• Non-Linear Dimensionality Reduction

Page 3: What is Dimensionality Reduction?

• Given N objects, each with M measurements, find the best D-dimensional parameterization

• Goal: Find a “compact parameterization” or “Latent Variable” representation

• Given N examples x_i ∈ R^M, find x̃_i ≈ f(p_i), where p_i ∈ R^D

• Underlying assumptions of DimRedux
– Measurements over-specify the data: M > D
– The number of measurements exceeds the number of "true" degrees of freedom in the system
– The measurements capture all of the significant variability

Page 4: Uses for DimRedux

• Build a "compact" model of the data
– Compression for storage, transmission, & retrieval
– Parameters for indexing, exploring, and organizing
– Generate "plausible" new data

• Answer fundamental questions about data
– What is its underlying dimensionality? How many degrees of freedom are exhibited? How many "latent variables"?
– How independent are my measurements?
– Is there a projection of my data set where important relationships stand out?

Page 5: DimRedux in Data Modeling

• Data Clustering: Continuous to Discrete
– The curse of dimensionality: the sampling density is proportional to N^(1/p)
– Need a mapping to a lower-dimensional space that preserves "important" relations

• Regression Modeling: Continuous to Continuous
– A functional model that generates the input data
– Useful for interpolation

• Embedding Space

Page 6: Today's Focus

• Linear DimRedux methods
– PCA – Pearson (1901); Hotelling (1933)
– MDS – Torgerson (1952); Shepard (1962)

• The "Linear" Assumption
– Data is a linear function of the parameters (latent variables)
– Data lies on a linear (affine) subspace:

"likeplane"x)1(xx),(x

"likeline"x)1(x)(x

321

21

px~ M

where the matrix M is m x d

Page 7: PCA: What problem does it solve?

• Minimizes the "least-squares" (Euclidean) error: the D-dimensional model provided by PCA has the smallest Euclidean error of any D-parameter linear model,

min Σ_{i=1..n} ‖x̃_i − x_i‖²

where x̃_i is the model prediction from the D-dimensional PCA.

• Projects the data such that the variance is maximized

• Finds an optimal "orthogonal" basis set for describing the given data

Page 8: Principal Component Analysis

• Also known to engineers as the Karhunen-Loève Transform (KLT)

• Rotate the data points to align successive axes with the directions of greatest variance
– Subtract the mean from the data
– Normalize the variance along each direction, and reorder the directions by variance magnitude from high to low
– Each normalized variance direction is a principal component

• The principal components are the eigenvectors of the system's covariance matrix:

C = (1/(n−1)) Σ_{i=1..n} (x_i − x̄)(x_i − x̄)ᵀ,   (C − λ_i I) e_i = 0

permuted so that the eigenvectors e_i are ordered by descending eigenvalue λ_i

Page 9: Simple PCA Example

• A simple 3-D example (3-D measurements generated from a 2-D latent space):

>> x = rand(2, 500);
>> z = [1,0; 0,1; -1,-1] * x + [0;0;1] * ones(1, 500);
>> m = (100 * rand(3,3)) * z + rand(3, 500);
>> scatter3(m(1,:), m(2,:), m(3,:), 'filled');

Page 10: Simple PCA Example (cont)

>> mm = m - mean(m')' * ones(1, 500);
>> [E, L] = eig(cov(mm'));
>> E

E =
    0.8029   -0.5958    0.0212
    0.1629    0.2535    0.9535
    0.5735    0.7621   -0.3006

>> L

L =
  172.2525         0         0
         0  116.2234         0
         0         0    0.0837

>> newm = E' * (m - mean(m')' * ones(1, 500));
>> scatter3(newm(1,:), newm(2,:), newm(3,:), 'filled'); axis([-50,50, -50,50, -50,50]);
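To recover the 2-D latent structure explicitly, keep only the two principal components with the largest eigenvalues. A minimal continuation of the example above (a sketch, assuming m, E, and L are still in the workspace; sorting the eigenvalues explicitly is safer than relying on the order eig returns them in):

>> [lam, order] = sort(diag(L), 'descend');  % eigenvalues, largest first
>> E2 = E(:, order(1:2));                    % top-2 principal directions
>> mu = mean(m')' * ones(1, 500);
>> p = E2' * (m - mu);                       % 2-D latent coordinates
>> mrec = E2 * p + mu;                       % reconstruction from 2 components
>> err = sum(sum((m - mrec).^2))             % = 499 * (the dropped eigenvalue)

The residual error equals (n − 1) times the sum of the discarded eigenvalues, here 499 × 0.0837, which is why the third component can be dropped almost for free.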

Page 11: Simple PCA Example (cont)

Page 12: PCA Applied to Reillumination

• Illumination can be modeled as an additive linear system: the image y under a combination of lights is the sum of the images R(x_i) rendered under each individual basis light x_i,

y = Σ_i R(x_i)

Page 13: Simulating New Lighting

• We can simulate the appearance of a model under new illumination by combining images taken under a set of basis lights

• We can then capture real-world lighting and use it to modulate our basis lighting functions
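In code, this additive model is just a weighted sum of images. A minimal sketch, assuming each column of B holds one basis-light image flattened to a vector and w holds the captured weight of each real-world light (all names here are illustrative, not from the slides):

% Relight the scene as a weighted combination of basis-light images.
% B: (numPixels x numLights) basis images; w: (numLights x 1) light weights.
y = B * w;                          % relit image, still flattened
img = reshape(y, height, width);    % back to 2-D (height and width assumed known)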

Page 14: Problems

• There are too many basis lighting functions
– They all have to be stored in order to be used
– The resulting lighting model can be huge, particularly when representing high-frequency lighting
– Lighting differences can be very subtle

• The cost of modulation is excessive
– Every basis image must be scaled and added together
– Each image requires a high dynamic range

• Is there a more compact representation?
– Yes: use PCA.

Page 15: PCA Applied to Illumination

• More than 90% of the variance is captured in the first five principal components

• Generate new illumination by combining only 5 basis images

[Figure: basis images V0, …, for n lights]
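A sketch of the compressed pipeline, under the same assumptions as the snippet above (B holds the flattened basis images in its columns; keeping 5 components reflects the slide's 90%-variance observation, and all names are illustrative):

% Offline: PCA on the basis images, keeping the top 5 components.
mu = mean(B, 2);
Bc = B - mu * ones(1, size(B, 2));   % center the basis images
[U, S, V] = svd(Bc, 'econ');         % SVD avoids forming a huge covariance matrix
U5 = U(:, 1:5);                      % the 5 basis images we actually store
C5 = U5' * Bc;                       % 5 x numLights coefficient table

% Runtime: combine only 5 scaled images instead of numLights of them.
y = U5 * (C5 * w) + mu * sum(w);     % approximates B * w up to the truncation error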

Page 16: Results Video

Page 17: Results Video

Page 18: Results Video

Page 19: MDS: What problem does it solve?

• Takes as input a dissimilarity matrix containing the pairwise dissimilarities between N data points

• Finds the best D-dimensional linear parameterization compatible with that matrix

• (In other words, it outputs a projection of the data into a D-dimensional space where the pairwise distances match the original dissimilarities as faithfully as possible)

Page 20: Multidimensional Scaling (MDS)

• Dissimilarities can be metric or non-metric
• Useful when absolute measurements are unavailable; uses relative measurements
• Computation is invariant to the dimensionality of the data

Page 21: An example: map of the US

• Given only the distances between a set of cities

Page 22: An example: map of the US

• MDS finds suitable coordinates for the points in a space of the specified dimension.

Page 23: MDS Properties

• The parameterization is not unique: the axes are meaningless
– Not surprising, since Euclidean transformations and reflections preserve distances between points

• Useful for visualizing relationships in high-dimensional data
– Define a dissimilarity measure
– Map to a lower-dimensional space using MDS

• A common preprocessing step before cluster analysis
– Aids in understanding patterns and relationships in data

• Widely used in marketing and psychometrics

Page 24: Dissimilarities

• Dissimilarities δ_ij are distance-like quantities that satisfy the following conditions:

1) δ_ij ≥ 0
2) δ_ii = 0   (self-similarity)
3) δ_ij = δ_ji   (symmetry)

• A dissimilarity is metric if, in addition, it satisfies "the triangle inequality":

δ_ij ≤ δ_ik + δ_kj   for all k
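These conditions are straightforward to verify numerically. A small illustrative sketch (not from the slides) that tests whether a dissimilarity matrix Delta is metric:

% Check the dissimilarity conditions on an n x n matrix Delta.
nonneg  = all(Delta(:) >= 0);          % 1) delta_ij >= 0
selfsim = all(diag(Delta) == 0);       % 2) delta_ii = 0 (self-similarity)
symm    = isequal(Delta, Delta');      % 3) delta_ij = delta_ji (symmetry)
% Metric, in addition: delta_ij <= delta_ik + delta_kj for all i, j, k.
triangle = true;
for k = 1:size(Delta, 1)
    triangle = triangle && all(all(Delta <= Delta(:, k) + Delta(k, :)));
end
isMetric = nonneg && selfsim && symm && triangle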

Page 25: Relating MDS to PCA

• Special case: when the distances are Euclidean
• PCA = eigendecomposition of the covariance matrix MᵀM
• Convert the pairwise distance matrix to the covariance matrix

Page 26: How to get MᵀM from Euclidean Pairwise Distances

• By the law of cosines, for points i and j with distances measured through a third point k:

d_ij² = d_ki² + d_kj² − 2 b_ij

• where b_ij = (x_i − x_k) · (x_j − x_k), by the definition of a dot product, so

b_ij = (d_ki² + d_kj² − d_ij²) / 2

• Eigendecomposition on B gives V S Vᵀ
• V S^(1/2) = matrix of new coordinates

Page 27: Algebraically…

B_ij = −½ ( D_ij² − (1/n) Σ_k D_ik² − (1/n) Σ_k D_kj² + (1/n²) Σ_k Σ_l D_kl² )

• D_ij² — the squared distance between points p_i and p_j
• (1/n) Σ_k D_ik² — the *row average*: the average squared distance of a given point from p_i
• (1/n) Σ_k D_kj² — the *column average*: the average squared distance of a given point from p_j
• (1/n²) Σ_k Σ_l D_kl² — the "matrix average"

So we have "centered" the matrix.

Page 28: MDS Mechanics

• Given a dissimilarity matrix D, the MDS model is computed as follows (D² squared elementwise):

B = −½ H D² H,   (B − λ_i I) e_i = 0

• where H, the so-called "centering" matrix, is the identity matrix minus a scaled all-ones matrix:

H = I − (1/n) 1 1ᵀ

• The MDS coordinates are given by (in order of decreasing λ_i):

p_i = ( √λ₁ e_{1,i}, √λ₂ e_{2,i}, … )
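Putting the pieces together, here is a compact classical-MDS sketch (the function name and variable names are illustrative; D is the dissimilarity matrix and d the target dimension). The residual eigenvalue sum it returns is the "stress" discussed on the next slide:

function [P, stress] = cmds_sketch(D, d)
    % Classical MDS: embed an n x n dissimilarity matrix D in d dimensions.
    n = size(D, 1);
    H = eye(n) - ones(n) / n;          % centering matrix H = I - (1/n) * 11'
    B = -0.5 * H * (D.^2) * H;         % double-centered squared dissimilarities
    [E, L] = eig((B + B') / 2);        % symmetrize for numerical safety
    [lam, order] = sort(diag(L), 'descend');
    P = E(:, order(1:d)) * diag(sqrt(max(lam(1:d), 0)));  % n x d coordinates
    stress = sum(max(lam(d+1:end), 0));                   % residual variance
end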

Page 29: MDS Stress

• The residual variance of B (i.e., the sum of the remaining eigenvalues) indicates the goodness of fit of the selected D-dimensional model

• This term is often called the MDS "stress"

• Examining the residual variance gives an indication of the data's inherent dimensionality

Page 30: Reflectance Modeling Example

• From Pellacini et al., "Toward a Psychophysically-Based Light Reflection Model for Image Synthesis," SIGGRAPH 2000

• Objective: find a perceptually meaningful parameterization for reflectance modeling

The top row of white, grey, and black balls have the same "physical" reflectance parameters; the bottom row, however, is "perceptually" more consistent.

Page 31: Reflectance Modeling Example

• User task: subjects were presented with 378 pairs of rendered spheres and asked to rate their difference in "glossiness" on a scale of 0 (no difference) to 100

• A 27 × 27 dissimilarity matrix was constructed and MDS applied

Page 32: Reflectance Modeling Example

• Parameters of a 2D embedding space were determined

• Two axes of “gloss” were established

Page 33: Limitations of Linear Methods

• What if the data does not lie within a linear subspace?

• Do all convex combinations of the measurements generate plausible data?

• The data may lie on a low-dimensional non-linear manifold embedded in a higher-dimensional space

• Next time: Nonlinear Dimensionality Reduction

Page 34: Nonlinear Dimensionality Reduction

• Many data sets contain essential nonlinear structures that are invisible to PCA and MDS

• In such cases we resort to nonlinear dimensionality reduction approaches
– Kernel methods
• Results depend on the choice of kernel
• Most kernels are not data dependent

Page 35: Nonlinear Approaches: Isomap

• Construct a neighbourhood graph G
• For each pair of points in G, compute the shortest-path distances: the geodesic distances
• Apply classical MDS to the geodesic distances (instead of Euclidean distances)

Josh Tenenbaum, Vin de Silva, John Langford (Science, 2000)

Page 36: Sample points with Swiss Roll

• Altogether there are 20,000 points in the "Swiss roll" data set; we sample 1,000 of the 20,000.

Page 37: Construct neighborhood graph G

K-nearest neighborhood (K = 7)

D_G is the 1000 × 1000 (Euclidean) distance matrix between neighbors (figure A)

Page 38: Compute all-pairs shortest paths in G

Now D_G is the 1000 × 1000 geodesic distance matrix between arbitrary pairs of points along the manifold (figure B)

Page 39: Use MDS to embed the graph in R^d

Find a d-dimensional Euclidean space Y (figure C) that preserves the pairwise distances.

Page 40: The Isomap algorithm
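Since the slides give the algorithm as a figure, here is a minimal Isomap sketch following the three steps above (names are illustrative; it reuses the cmds_sketch function from the MDS Mechanics slide, and assumes the neighborhood graph is connected so no geodesic distance is left infinite):

function Y = isomap_sketch(X, k, d)
    % Isomap: X is M x N data (points in columns), k = neighborhood size.
    n = size(X, 2);
    sq = sum(X.^2, 1);
    D = sqrt(max(sq' + sq - 2 * (X' * X), 0));  % pairwise Euclidean distances
    % Step 1: k-nearest-neighbor graph, weighted by Euclidean distance.
    G = inf(n);
    [~, idx] = sort(D, 2);
    for i = 1:n
        G(i, idx(i, 2:k+1)) = D(i, idx(i, 2:k+1));
    end
    G = min(G, G');                % symmetrize the graph
    G(1:n+1:end) = 0;              % zero self-distances
    % Step 2: all-pairs shortest paths (Floyd-Warshall) = geodesic estimates.
    for m = 1:n
        G = min(G, G(:, m) + G(m, :));
    end
    % Step 3: classical MDS on the geodesic distance matrix.
    Y = cmds_sketch(G, d);
end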

Page 41: PCA, MDS vs. Isomap

Page 42: Isomap: Advantages

• Nonlinear

• Globally optimal: still produces a globally optimal low-dimensional Euclidean representation even when the input space is highly folded, twisted, or curved

• Guaranteed asymptotically to recover the true dimensionality

Page 43: Isomap: Disadvantages

• May not be stable; depends on the topology of the data

• Guaranteed asymptotically to recover the geometric structure of nonlinear manifolds
– As N increases, pairwise distances provide better approximations to the geodesics, but cost more computation
– If N is small, the geodesic distances will be very inaccurate

Page 44: Applications

• Isomap and Nonparametric Models of Image Deformation

• LLE and Isomap Analysis of Spectra and Colour Images

• Image Spaces and Video Trajectories: Using Isomap to Explore Video Sequences

• Mining the structural knowledge of high-dimensional medical data using Isomap

Isomap webpage: http://isomap.stanford.edu/

Page 45: Summary

• Linear dimensionality reduction tools are widely used for
– Data analysis
– Data preprocessing
– Data compression

• PCA transforms the measurement data s.t. successive directions of greatest variance are mapped to orthogonal axis directions (bases)
– A D-dimensional embedding space (parameterization) can be established by modeling the data using only the first D of these basis vectors
– The residual modeling error is the sum of the remaining eigenvalues

Page 46: Summary (cont)

• MDS finds a D-dimensional parameterization that best preserves a given dissimilarity matrix
– The resulting model can be Euclidean-transformed to align the data with a more intuitive parameterization
– A D-dimensional embedding space (parameterization) is established by modeling the data using only the first D coordinates of the scaled eigenvectors
– The residual modeling error (MDS stress) is the sum of the remaining eigenvalues
– If a Euclidean metric dissimilarity matrix is used for MDS, the resulting D-dimensional model will match the PCA weights for the same dimensional model