
ISyE 6416: Computational Statistics, Spring 2017

Lecture 11: Principal Component Analysis (PCA)

Prof. Yao Xie

H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology

Why PCA

I Real-world data sets usually exhibit structure among their variables

I Principal component analysis (PCA) rotates the original data to new coordinates

I Dimension reduction
I Classification
I Denoising

Data compression

Face recognition

Eigenfaces

Denoising

Data visualization

“The system collects opinions on statements as scalar values on a continuous scale and applies dimensionality reduction to project the data onto a two-dimensional plane for visualization and navigation.”

Using Canonical Correlation Analysis (CCA) for Opinion Visualization, Faridani et al., UC Berkeley, 2010.

What is PCA

I Given a data matrix $X \in \mathbb{R}^{n \times p}$: $n$ samples and $p$ variables
I Transform the data set to one with a small number of principal components

$$v_j^T x_i, \quad j = 1, 2, \ldots, K, \quad i = 1, 2, \ldots, n$$

I It is a form of linear dimension reduction
I Specifically, “interesting directions” means “high variance”
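The slides do not show an implementation; as a minimal numpy sketch of these definitions, the snippet below builds the first $K$ principal component scores $v_j^T x_i$ for an illustrative random data matrix (the choice $K = 2$ and the data are assumptions made for the example, not part of the lecture).

```python
import numpy as np

# Illustrative data matrix X: n = 100 samples, p = 5 variables.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
X = X - X.mean(axis=0)                       # center each variable

# Principal component directions v_1, ..., v_K: top eigenvectors of X^T X.
K = 2                                        # number of components kept (arbitrary here)
eigvals, eigvecs = np.linalg.eigh(X.T @ X)   # eigenvalues in ascending order
V = eigvecs[:, ::-1][:, :K]                  # columns are v_1, ..., v_K

# Scores v_j^T x_i for every sample i and component j: an n x K matrix.
scores = X @ V
print(scores.shape)                          # (100, 2)
```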

Projection onto unit vectors

I $(x^T v) \in \mathbb{R}$: score
I $(x^T v)\, v \in \mathbb{R}^p$: projection
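As a small illustration of the score/projection distinction, here is a numpy sketch for a single sample $x$ and a single unit vector $v$ (both chosen arbitrarily for the example).

```python
import numpy as np

x = np.array([2.0, 1.0, -1.0])      # one sample x in R^p
v = np.array([1.0, 1.0, 0.0])
v = v / np.linalg.norm(v)           # normalize so that v is a unit vector

score = x @ v                       # (x^T v) in R: coordinate of x along v
projection = score * v              # (x^T v) v in R^p: the projection of x onto v

print(score)                        # scalar score
print(projection)                   # length-p projection vector
```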

Example: projection onto unit vectors

Projections onto orthonormal vectors

Source: R. Tibshirani.

First principal component

I The first principal component direction of $X$ is the unit vector $v_1 \in \mathbb{R}^p$ that maximizes the sample variance of $Xv_1 \in \mathbb{R}^n$ among all unit-length vectors:

$$v_1 = \arg\max_{\|v\|_2 = 1} (Xv)^T (Xv)$$

Note that

$$Xv = \begin{bmatrix} x_1^T v \\ \vdots \\ x_n^T v \end{bmatrix}$$

contains the projection of each sample onto the unit-length vector $v$.

I $Xv_1$ is the first principal component score
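A minimal numpy sketch of this definition, assuming a small random centered data matrix: the maximizer is obtained from an eigendecomposition of $X^T X$ (this connection is stated on the next slide), and the achieved objective $(Xv_1)^T (Xv_1)$ is compared with the largest eigenvalue.

```python
import numpy as np

# Illustrative centered data matrix X in R^{50 x 3}.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
X = X - X.mean(axis=0)

# v_1 = argmax_{||v||_2 = 1} (Xv)^T (Xv): the top eigenvector of X^T X.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)   # ascending eigenvalues
v1 = eigvecs[:, -1]                          # unit vector for the largest eigenvalue

score1 = X @ v1                              # first principal component score (in R^n)
print(score1 @ score1, eigvals[-1])          # (Xv_1)^T (Xv_1) equals the top eigenvalue
```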

Eigenvector and eigenvalue

$$v_1 = \arg\max_{\|v\|_2 = 1} (Xv)^T (Xv)$$

I $v_1$ is the eigenvector of $X^T X$ (the sample covariance matrix) corresponding to its largest eigenvalue

I $v_1^T X^T X v_1$, the largest eigenvalue of $X^T X$, is the variance explained by $v_1$
I Rayleigh quotient: $R(v) = \dfrac{v^T A v}{v^T v}$ satisfies, for any $v \neq 0$,

$$\lambda_{\min} \le \frac{v^T A v}{v^T v} \le \lambda_{\max}$$
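A quick numerical check of these bounds, using an arbitrary symmetric positive semidefinite matrix $A$ of the form $X^T X$ and a random nonzero $v$ (all values here are illustrative).

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((20, 4))
A = B.T @ B                               # symmetric PSD matrix, same form as X^T X

eigvals = np.linalg.eigvalsh(A)           # lambda_min = eigvals[0], lambda_max = eigvals[-1]

v = rng.standard_normal(4)                # arbitrary nonzero vector
R = (v @ A @ v) / (v @ v)                 # Rayleigh quotient R(v)

# lambda_min <= R(v) <= lambda_max, with equality at the extreme eigenvectors.
print(eigvals[0], R, eigvals[-1])
```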

Example: $X \in \mathbb{R}^{50 \times 2}$

Beyond first principal component

I What’s next? The idea is to find orthogonal directions of the remaining highest variance
I Orthogonal: since we have already explained the variance along $v_1$, we need a new direction that has no “overlap” with $v_1$ to avoid redundancy.

$$v_2 = \arg\max_{\|v\|_2 = 1,\; v^T v_1 = 0} (Xv)^T (Xv)$$

I Can repeat this process to find the $k$-th principal component direction

$$v_k = \arg\max_{\|v\|_2 = 1,\; v^T v_j = 0,\; j = 1, \ldots, k-1} (Xv)^T (Xv)$$
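One way to mirror this “repeat the maximization under orthogonality constraints” recipe in code is deflation: after each direction is found, remove the variance it explains and maximize again. The numpy sketch below does this for an illustrative random $X$ and $k = 3$; it recovers (up to sign) the top-$k$ eigenvectors of $X^T X$.

```python
import numpy as np

def top_direction(S):
    """Unit eigenvector of a symmetric matrix S for its largest eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(S)
    return eigvecs[:, -1]

rng = np.random.default_rng(3)
X = rng.standard_normal((50, 4))
X = X - X.mean(axis=0)                      # center the illustrative data

S = X.T @ X
k = 3
directions = []
for _ in range(k):
    v = top_direction(S)
    directions.append(v)
    # Deflate: subtract the variance already explained along v, so the next
    # maximizer is orthogonal to every direction found so far.
    S = S - (v @ S @ v) * np.outer(v, v)

V = np.column_stack(directions)             # columns v_1, ..., v_k
print(np.round(V.T @ V, 6))                 # ~ identity: orthonormal directions
```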

Example: $X \in \mathbb{R}^{50 \times 2}$

Properties

I There are at most p principal components

I $[v_1, v_2, \ldots, v_p]$ can be found from the eigendecomposition. Let $\Sigma = X^T X$; then

$$\Sigma = U \Lambda U^T$$

where $\Lambda = \mathrm{diag}\{\lambda_1, \ldots, \lambda_p\}$ and $U$ is an orthogonal matrix

I Can be computed efficiently if you only need the first principal component: the power method (see the sketch after this list)

$$v^{(k)} := \Sigma v^{(k-1)} / \|\Sigma v^{(k-1)}\|$$

I In general can be computed using Jacobi’s method

I Sparse Σ
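Below is a minimal sketch of the power iteration referenced in the list above, checked against a direct eigendecomposition; the starting vector, iteration count, and tolerance are arbitrary choices made for the example.

```python
import numpy as np

def power_method(Sigma, num_iters=1000, tol=1e-10, seed=0):
    """Approximate the top eigenvector of a symmetric PSD matrix Sigma."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(Sigma.shape[0])
    v = v / np.linalg.norm(v)
    for _ in range(num_iters):
        w = Sigma @ v                        # v^{(k)} := Sigma v^{(k-1)} / ||Sigma v^{(k-1)}||
        w = w / np.linalg.norm(w)
        if np.linalg.norm(w - v) < tol:      # stop once the iterate has stabilized
            return w
        v = w
    return v

# Compare with the eigenvector returned by a full eigendecomposition.
rng = np.random.default_rng(4)
X = rng.standard_normal((50, 3))
X = X - X.mean(axis=0)
Sigma = X.T @ X

v_power = power_method(Sigma)
v_eig = np.linalg.eigh(Sigma)[1][:, -1]
# The two directions agree up to sign.
print(min(np.linalg.norm(v_power - v_eig), np.linalg.norm(v_power + v_eig)))
```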

PCA example: financial data analysis

I Daily swap rates of eight maturities from 7/3/2000 to 7/15/2005

Data from http://www.stanford.edu/~xing/statfinbook/data.html

We typically take the difference of these time series, $y_t = x_t - x_{t-1}$, to make them more “stationary”.

Use the MATLAB command to perform PCA
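The slides carry out this step with a MATLAB command; as a rough, assumed equivalent, the numpy sketch below differences a rate matrix, centers it, and extracts the first three components. The `rates` array here is a random stand-in with the same shape idea (days by eight maturities), not the linked data set.

```python
import numpy as np

# Random stand-in for the daily swap-rate matrix (T days x 8 maturities);
# the real data is available at the Stanford link above.
rng = np.random.default_rng(5)
T, p = 1250, 8
rates = np.cumsum(0.01 * rng.standard_normal((T, p)), axis=0)

# Difference the series to make them closer to stationary: y_t = x_t - x_{t-1}.
Y = np.diff(rates, axis=0)
Y = Y - Y.mean(axis=0)

# PCA via eigendecomposition of Y^T Y, eigenvalues sorted in descending order.
eigvals, eigvecs = np.linalg.eigh(Y.T @ Y)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Y @ eigvecs[:, :3]            # first 3 reduced dimensions
print(eigvals / eigvals.sum())         # fraction of variance explained by each component
```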

[Figure: eigenvalues and the first 3 eigenvectors]

Resulting first 3 reduced dimensions
