
Dimension Reduction
CSE 6242 / CX 4242

Thanks: Prof. Jaegul Choo, Dr. Ramakrishnan Kannan, Prof. Le Song

What is Dimension Reduction?

[Diagram: a data matrix with dimension index (d) and data item index (n), columns as data items, mapped by dimension reduction to low-dim data. How big is this? Why?]

Attribute = Feature = Variable = Dimension

Image Data
- Raw images → pixel values → serialized/rasterized pixel values
- A 4K (4096 x 2160) image has about 8.8 million pixels in total

Video Data
- Each frame: raw image → pixel values → serialized/rasterized pixel values
- Huge dimensions: a 4096 x 2160 frame → 8,847,360 dimensions
- At 30 fps, a 2-minute video generates a matrix of size 8,847,360 x 3,600
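To make these sizes concrete, here is a minimal NumPy sketch (not from the lecture; the tiny 6x8 "frames" are a made-up stand-in for real 4K frames) that rasterizes each frame into a column vector and stacks the frames into a d x n matrix, following the columns-as-data-items convention used later in the lecture.

```python
import numpy as np

# Hypothetical toy "video": 4 grayscale frames of 6 x 8 pixels (a real 4K frame is 2160 x 4096).
frames = np.random.rand(4, 6, 8)

# Rasterize: each frame becomes one column vector of length d = 6 * 8 = 48.
n, h, w = frames.shape
X = frames.reshape(n, h * w).T          # shape (d, n) = (48, 4), columns as data items
print(X.shape)                          # (48, 4)

# The slide's arithmetic: a 4096 x 2160 frame has 4096 * 2160 = 8,847,360 dimensions,
# and 2 minutes at 30 fps gives 2 * 60 * 30 = 3,600 frames -> an 8,847,360 x 3,600 matrix.
print(4096 * 2160, 2 * 60 * 30)         # 8847360 3600
```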

Text Documents
- Bag-of-words vectors
  - Document 1 = "Life of Pi won Oscar"
  - Document 2 = "Life of Pi is also a book."

Vocabulary   Doc1   Doc2
Life          1      1
Pi            1      1
movies        0      0
also          0      1
oscar         1      0
book          0      1
won           1      0
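A minimal scikit-learn sketch (one possible tool, not necessarily what the lecture used) that reproduces the bag-of-words table above; the vocabulary is fixed to the seven terms on the slide.

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["Life of Pi won Oscar", "Life of Pi is also a book."]

# Fix the vocabulary to the slide's seven terms (CountVectorizer lowercases by default).
vocab = ["life", "pi", "movies", "also", "oscar", "book", "won"]
vectorizer = CountVectorizer(vocabulary=vocab)

X = vectorizer.fit_transform(docs)      # sparse (2 x 7) document-term matrix
print(X.toarray())
# [[1 1 0 0 1 0 1]   <- Doc1
#  [1 1 0 1 0 1 0]]  <- Doc2
```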

Two Axes of a Data Set
- Data items: how many data items?
- Dimensions: how many dimensions represent each item?
[Diagram: data matrix with dimension index (d) and data item index (n); columns as data items vs. rows as data items. We will use columns as data items during the lecture.]

Dimension Reduction

[Diagram: dimension reduction takes as input the high-dim data (d x n), the number of reduced dimensions k (user-specified), additional info about the data, and other parameters; it outputs the low-dim data (k x n) and a dim-reducing transformation for new data.]

Benefits of Dimension Reduction
Obviously,
- Compression
- Visualization
- Faster computation
  - e.g., computing distances between 100,000-dim vs. 10-dim vectors
More importantly,
- Noise removal (improving data quality)
  - Separates the data into general pattern + sparse + noise
  - Is the noise the important signal?
- Works as pre-processing for better performance
  - e.g., microarray data analysis, information retrieval, face recognition, protein disorder prediction, network intrusion detection, document categorization, speech recognition

Two Main Techniques
1. Feature selection
   - Selects a subset of the original variables as reduced dimensions relevant for a particular task
   - e.g., the number of genes responsible for a particular disease may be small
2. Feature extraction
   - Each reduced dimension combines multiple original dimensions
   - The original data set is transformed into other numbers

Feature = Variable = Dimension

Feature Selection
- What is the optimal subset of m features that maximizes a given criterion?
- Widely used criteria: information gain, correlation, ...
- Typically a combinatorial optimization problem, so greedy methods are popular (see the sketch below)
  - Forward selection: empty set → add one variable at a time
  - Backward elimination: entire set → remove one variable at a time
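A minimal sketch of greedy forward selection; the criterion here, cross-validated accuracy of a logistic regression, is an arbitrary illustrative choice rather than the lecture's.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def forward_selection(X, y, m):
    """Greedily add one feature at a time until m features are selected."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < m and remaining:
        # Score each candidate subset = current selection + one more feature.
        scores = [(cross_val_score(LogisticRegression(max_iter=1000),
                                   X[:, selected + [j]], y, cv=5).mean(), j)
                  for j in remaining]
        best_score, best_j = max(scores)          # pick the feature that helps most
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

X, y = load_iris(return_X_y=True)
print(forward_selection(X, y, m=2))               # indices of the two greedily chosen features
```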

Feature Extraction

Aspects of Dimension Reduction

- Linear vs. Nonlinear
- Unsupervised vs. Supervised
- Global vs. Local
- Feature vectors vs. Similarity (as an input)

Linear vs. Nonlinear
Linear
- Represents each reduced dimension as a linear combination of the original dimensions
- Of the form AX + b, where A, X, and b are vectors/matrices
  - e.g., Y1 = 3*X1 - 4*X2 + 0.3*X3 - 1.5*X4
          Y2 = 2*X1 + 3.2*X2 - X3 + 2*X4
- Naturally capable of mapping new data to the same space

Example dimension reduction (original dimensions X1..X4, reduced dimensions Y1, Y2, data items D1, D2):

      D1  D2               D1     D2
X1     1   1          Y1  1.75  -0.27
X2     1   0    →     Y2 -0.21   0.58
X3     0   2
X4     1   1
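A small NumPy sketch of such a linear mapping, using the illustrative coefficients for Y1 and Y2 from the slide and the D1, D2 data items from the table; note that the resulting numbers differ from the table above, which was produced with a different (unspecified) mapping.

```python
import numpy as np

# Rows of A hold the coefficients of Y1 and Y2 from the slide's example formulas.
A = np.array([[3.0, -4.0, 0.3, -1.5],
              [2.0,  3.2, -1.0,  2.0]])

# Data items D1, D2 as columns over the original dimensions X1..X4.
X = np.array([[1, 1],
              [1, 0],
              [0, 2],
              [1, 1]], dtype=float)

Y = A @ X                      # linear dimension reduction: (2 x 4) @ (4 x 2) -> (2 x 2)
print(Y)

# The same A maps new data into the same reduced space:
x_new = np.array([[0.5], [1.0], [2.0], [0.0]])
print(A @ x_new)
```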

Nonlinear
- More complicated, but generally more powerful
- Recently popular topics

Unsupervised vs. Supervised
Unsupervised
- Uses only the input data


Supervised
- Uses the input data + additional info
  - e.g., grouping labels


Global vs. Local

- Dimension reduction typically tries to preserve all the relationships/distances in the data
- Information loss is unavoidable! Then, what should we emphasize more?
Global
- Treats all pairwise distances as equally important
- Focuses on preserving large distances
Local
- Focuses on small distances and neighborhood relationships
- Active research area, e.g., manifold learning

Feature vectors vs. Similarity (as an input)

- Typical setup (feature vectors as an input): high-dim data (dimension index d, n items) → dimension reduction → low-dim data with reduced dimension k

- Alternatively, takes a similarity matrix instead
  - The (i, j)-th component indicates the similarity between the i-th and j-th data items
  - Assuming the distance is a metric, the similarity matrix is symmetric

- Internally, such a method converts the feature vectors (d x n) to a similarity matrix (n x n) before performing dimension reduction to the low-dim data (k x n)

Graph Embedding
- Why is this called graph embedding? The similarity matrix can be viewed as a graph in which similarity represents edge weight
- High-dim data (d x n) → similarity matrix (n x n) → dimension reduction → low-dim data (k x n)
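A minimal sketch of turning feature vectors into a similarity matrix; the Gaussian (RBF) kernel used here is one common choice of similarity, not necessarily the one a particular method uses internally.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Hypothetical data: n = 5 items in d = 3 dimensions (rows as items, scikit-learn's convention).
X = np.random.rand(5, 3)

S = rbf_kernel(X, gamma=1.0)        # (n x n) matrix, S[i, j] = exp(-gamma * ||x_i - x_j||^2)
print(S.shape)                      # (5, 5)
print(np.allclose(S, S.T))          # True: the similarity matrix is symmetric

# Graph-embedding view: treat S as a weighted graph whose edge weights are the similarities.
```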

Methods
Traditional
- Principal component analysis (PCA)
- Multidimensional scaling (MDS)
- Linear discriminant analysis (LDA)
Advanced (nonlinear, kernelized, manifold learning)
- Isometric feature mapping (Isomap)

* Matlab code is available at http://homepage.tudelft.nl/19j49/Matlab_Toolbox_for_Dimensionality_Reduction.html

Principal Component Analysis (PCA)
- Finds the axis showing the largest variation and projects all points onto that axis
- Reduced dimensions are orthogonal
- Algorithm: eigendecomposition
- Pros: fast
- Cons: limited performance
- Properties: Linear, Unsupervised, Global, Feature vectors
[Figure: data points with principal axes PC1 and PC2. Image source: http://en.wikipedia.org/wiki/Principal_component_analysis]

PCA – Some Questions

Algorithm (see the sketch below)
1. Subtract the mean from the dataset: (X - μ)
2. Form the covariance matrix: (X - μ)'(X - μ)
3. Perform eigendecomposition on this covariance matrix
Key Questions
- Why the covariance matrix?
- SVD on the original matrix vs. eigendecomposition on the covariance matrix
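A minimal NumPy sketch of the PCA steps above (rows as data items here, unlike the lecture's columns-as-items convention); it also touches the second key question by checking that SVD of the centered matrix gives the same axes as eigendecomposition of the covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # 100 data items, 5 dimensions (rows as items)

# 1. Subtract the mean.
Xc = X - X.mean(axis=0)

# 2. Covariance matrix (X - mu)' (X - mu), with the usual 1/(n-1) factor.
C = Xc.T @ Xc / (Xc.shape[0] - 1)

# 3. Eigendecomposition of the (symmetric) covariance matrix.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]        # sort axes by decreasing variance
W = eigvecs[:, order[:2]]                # top-2 principal axes
Y = Xc @ W                               # reduced data, shape (100, 2)

# SVD on the centered matrix yields the same axes (up to sign):
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
print(np.allclose(np.abs(Vt[:2].T), np.abs(W)))                 # True
print(np.allclose(s**2 / (Xc.shape[0] - 1), eigvals[order]))    # eigenvalues from singular values
```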

Multidimensional Scaling (MDS)
Main idea
- Tries to preserve the given pairwise distances in the low-dimensional space, i.e., make each low-dim distance match the corresponding ideal (given) distance
- Metric MDS: preserves the given distance values
- Nonmetric MDS: when you only know/care about the ordering of distances; preserves only the ordering of the distance values
- Algorithm: gradient-descent type
- cf. classical MDS is the same as PCA
- Properties: Nonlinear, Unsupervised, Global, Similarity input
- Pros: widely used (works well in general)
- Cons: slow (an n-body problem)
  - Nonmetric MDS is even slower than metric MDS
  - Fast algorithms are available: the Barnes-Hut algorithm, GPU-based implementations
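A minimal scikit-learn sketch of metric and nonmetric MDS on a precomputed distance matrix; the toy data and parameter values are illustrative only.

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances

X = np.random.rand(50, 10)                      # 50 items in 10 dimensions (rows as items)
D = pairwise_distances(X)                       # (n x n) "ideal" pairwise distances

# Metric MDS: tries to preserve the distance values themselves.
Y_metric = MDS(n_components=2, dissimilarity="precomputed",
               metric=True, random_state=0).fit_transform(D)

# Nonmetric MDS: tries to preserve only the ordering of the distances (slower).
Y_nonmetric = MDS(n_components=2, dissimilarity="precomputed",
                  metric=False, random_state=0).fit_transform(D)

print(Y_metric.shape, Y_nonmetric.shape)        # (50, 2) (50, 2)
```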

Linear Discriminant Analysis (LDA)
- What if clustering information is available?
- LDA tries to separate clusters by
  - Putting different clusters as far apart as possible
  - Making each cluster as compact as possible

(Recall: supervised dimension reduction uses the input data + additional info, e.g., a grouping label.)

Linear Discriminant Analysis (LDA) vs. Principal Component Analysis
[Figure: 2D visualizations of a mixture of 7 Gaussians in 1,000 dimensions, one produced by linear discriminant analysis (supervised) and one by principal component analysis (unsupervised).]

LDA algorithm (two classes)
1. Compute the mean of each class and the global mean (μ1, μ2, μ)
2. Compute the within-class scatter matrix Sw from the class-specific covariances
3. Compute the between-class scatter matrix Sb using the means
4. Find the eigenvectors corresponding to the (largest) eigenvalues of inv(Sw)*Sb
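A minimal NumPy sketch of these steps, written for the general C-class case (rows as data items; pinv is used in case Sw is singular).

```python
import numpy as np

def lda_directions(X, y, k):
    """Top-k LDA projection vectors for data X (rows = items) with class labels y."""
    mu = X.mean(axis=0)                               # global mean
    Sw = np.zeros((X.shape[1], X.shape[1]))           # within-class scatter
    Sb = np.zeros_like(Sw)                            # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        Sw += (Xc - mu_c).T @ (Xc - mu_c)
        diff = (mu_c - mu).reshape(-1, 1)
        Sb += Xc.shape[0] * (diff @ diff.T)
    # Eigenvectors of inv(Sw) * Sb, largest eigenvalues first.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[:k]].real

# Toy example: 3 classes of 30 items each in 5 dimensions, reduced to 2 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=m, size=(30, 5)) for m in (0.0, 2.0, 4.0)])
y = np.repeat([0, 1, 2], 30)
Y = X @ lda_directions(X, y, k=2)                     # (90, 2) reduced data
print(Y.shape)
```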

LDA, C classes (slide from CSCE 666 Pattern Analysis, Ricardo Gutierrez-Osuna, CSE@TAMU*)
- Fisher's LDA generalizes gracefully to C-class problems
  - Instead of one projection y, we now seek (C - 1) projections [y_1, y_2, ..., y_{C-1}] by means of (C - 1) projection vectors w_i arranged by columns into a projection matrix W = [w_1 | w_2 | ... | w_{C-1}]:
    y_i = w_i^T x  ⇒  y = W^T x
- Derivation
  - The within-class scatter generalizes as
    S_W = Σ_{i=1}^{C} S_i,  where S_i = Σ_{x∈ω_i} (x - μ_i)(x - μ_i)^T  and  μ_i = (1/N_i) Σ_{x∈ω_i} x
  - And the between-class scatter becomes
    S_B = Σ_{i=1}^{C} N_i (μ_i - μ)(μ_i - μ)^T,  where μ = (1/N) Σ_{∀x} x = (1/N) Σ_{i=1}^{C} N_i μ_i
  - The matrix S_T = S_B + S_W is called the total scatter
[Figure: three classes in the (x1, x2) plane with their within-class scatters S_W1, S_W2, S_W3 and between-class scatters S_B1, S_B2, S_B3 relative to the overall mean.]

* http://research.cs.tamu.edu/prism/lectures/pr/pr_l10.pdf

Linear Discriminant Analysis
- Maximally separates clusters by
  - Putting different clusters far apart
  - Shrinking each cluster compactly
- Algorithm: generalized eigendecomposition
- Pros: better at showing cluster structure
- Cons: may distort the original relationships in the data
- Properties: Linear, Supervised, Global, Feature vectors

Methods
Traditional
- Principal component analysis (PCA)
- Multidimensional scaling (MDS)
- Linear discriminant analysis (LDA)
Advanced (nonlinear, kernelized, manifold learning)
- Isometric feature mapping (Isomap)

* Matlab code is available at http://homepage.tudelft.nl/19j49/Matlab_Toolbox_for_Dimensionality_Reduction.html

Manifold Learning: Swiss Roll Data
- Swiss roll data: originally in 3D
- What is the intrinsic dimensionality (allowing flattening)? → 2D
- What if your data has low intrinsic dimensionality but resides in high-dimensional space?

Isomap (Isometric Feature Mapping)
- Let's preserve pairwise geodesic distances (along the manifold)
- Compute each geodesic distance as the shortest-path length in a k-nearest-neighbor (k-NN) graph
- Eigendecomposition* on the pairwise geodesic distance matrix obtains an embedding that best preserves the given distances
- Algorithm: all-pairs shortest-path computation + eigendecomposition
- Pros: performs well in general
- Cons: slow (shortest paths), sensitive to parameters
- Properties: Nonlinear, Unsupervised, Global (all pairwise distances are considered), Feature vectors

* Eigendecomposition is the main algorithm of PCA
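A minimal scikit-learn sketch of Isomap on the Swiss roll data discussed above; n_neighbors is the parameter the "sensitive to parameters" caveat refers to, and 10 is an arbitrary illustrative value.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, t = make_swiss_roll(n_samples=1000, random_state=0)    # 3D Swiss roll
iso = Isomap(n_neighbors=10, n_components=2)               # k-NN graph + shortest paths + eigendecomposition
Y = iso.fit_transform(X)                                    # "unrolled" 2D coordinates
print(Y.shape)                                              # (1000, 2)
```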

Practitioner’s Guide: Caveats
- Trustworthiness of dimension reduction results
  - Inevitable distortion/information loss in 2D/3D
  - The best result of a method may not align with what we want, e.g., PCA visualization of facial image data
[Figure: PCA of facial images plotted on the (1st, 2nd) dimensions vs. the (3rd, 4th) dimensions.]

Practitioner’s Guide: General Recommendations
- Want something simple and fast to visualize data?
  - PCA, force-directed layout
- Want to first try a manifold learning method?
  - Isomap: it is the method that will empirically give the best result
- Have a cluster label to use (pre-given or computed)?
  - LDA (supervised)
  - A supervised approach is sometimes the only viable option when your data do not have clearly separable clusters

Practitioner’s Guide: Results Still Not Good?
- Try various pre-processing (see the sketch below)
  - Data centering: subtract the global mean from each vector
  - Normalization: make each vector have unit Euclidean norm
    - Otherwise, a few outliers can affect dimension reduction significantly
- Application-specific pre-processing
  - Documents: TF-IDF weighting; remove terms that are too rare and/or too short
  - Images: histogram normalization
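A minimal NumPy sketch of the two generic pre-processing steps above, centering followed by unit-norm normalization (rows as data items; the small constant guards against all-zero vectors).

```python
import numpy as np

X = np.random.rand(100, 20)                          # 100 items, 20 dimensions (rows as items)

# Data centering: subtract the global mean from each vector.
Xc = X - X.mean(axis=0)

# Normalization: make each vector have unit Euclidean norm.
norms = np.linalg.norm(Xc, axis=1, keepdims=True)
Xn = Xc / np.maximum(norms, 1e-12)                   # avoid division by zero

print(np.allclose(np.linalg.norm(Xn, axis=1), 1.0))  # True
```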

Practitioner’s Guide: Too Slow?
- Apply PCA to reduce to an intermediate number of dimensions before the main dimension reduction step (see the sketch below)
  - The results may even be better, because PCA removes noise
- See if there is an approximate but faster version
  - Landmark versions (use only a subset of data items), e.g., landmark Isomap
  - Linearized versions (the same criterion, but only linear mappings allowed), e.g., Laplacian Eigenmaps → Locality Preserving Projection
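A minimal scikit-learn sketch of the PCA-first idea: reduce to an intermediate 50 dimensions with PCA, then run the slower main method; the data, dimensions, and choice of Isomap as the main method are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap
from sklearn.pipeline import make_pipeline

X = np.random.rand(500, 1000)                        # 500 items in 1,000 dimensions

# PCA down to an intermediate 50 dimensions, then the main (slower) dimension reduction.
pipeline = make_pipeline(PCA(n_components=50), Isomap(n_neighbors=10, n_components=2))
Y = pipeline.fit_transform(X)
print(Y.shape)                                       # (500, 2)
```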

Practitioner’s Guide: Still Need More?
- Tweak dimension reduction for your own purpose
  - Play with its algorithm, convergence criteria, etc.
  - See if you can impose label information
  - Restrict the number of iterations to save computation time
- The main purpose of DR is to serve us in exploring data and solving complicated real-world problems

Take Away

                         PCA   MDS   LDA   Isomap
Supervised                ✖     ✖     ✔     ✖
Linear                    ✔     ✖     ✔     ✖
Global                    ✔     ✔     ✔     ✔
Feature vectors (input)   ✔     ✖     ✔     ✔

Useful Resources
- Tutorial on PCA: http://arxiv.org/pdf/1404.1100.pdf
- Tutorial on LDA: http://research.cs.tamu.edu/prism/lectures/pr/pr_l10.pdf
- Review article: http://www.iai.uni-bonn.de/~jz/dimensionality_reduction_a_comparative_review.pdf
- Matlab toolbox for dimension reduction: http://homepage.tudelft.nl/19j49/Matlab_Toolbox_for_Dimensionality_Reduction.html
- Matlab manifold learning demo: http://www.math.ucla.edu/~wittman/mani/
