Visualization of High-Dimensional Datasets


Jahangheer Shaik

Why do we need Visualization?

Noise? Distribution? Classes? Structure?

Data visualization techniques are often required to obtain meaningful insights: by reducing the cognitive load, they help convert data into information and knowledge for subsequent applications.

Line Graphs

Line graphs are used to display single-valued or piecewise continuous functions of one variable

Problems

Different line types (colored, dashed) have to be used to distinguish between the labeled classes

Each of the dimensions may have a different scale
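When several dimensions share one set of axes, a dimension with large values can flatten the others. A common remedy (an assumption here; the slide does not name one) is to rescale every dimension to [0, 1] before plotting. A minimal pure-Python sketch, with `min_max_normalize` as a hypothetical helper name:

```python
def min_max_normalize(rows):
    """Rescale each column (dimension) of `rows` to the range [0, 1].

    `rows` is a list of equal-length lists of numbers, one inner list per sample.
    A constant column is mapped to 0.0 to avoid division by zero.
    """
    n_dims = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(n_dims)]
    highs = [max(r[j] for r in rows) for j in range(n_dims)]
    normalized = []
    for r in rows:
        normalized.append([
            (r[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
            for j in range(n_dims)
        ])
    return normalized


# Two dimensions on very different scales (e.g. centimetres vs. grams).
data = [[1.0, 200.0], [2.0, 400.0], [3.0, 300.0]]
print(min_max_normalize(data))  # → [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
```

After rescaling, every line spans the same vertical range, so line types only have to encode the class, not compensate for scale.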

Bar Charts, Histograms

Histograms visualize discretized (binned) estimates of probability density functions
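A histogram approximates a density by counting samples in equal-width bins and dividing each count by (number of samples × bin width), so the bar areas sum to 1. A minimal pure-Python sketch; `histogram_density` and its parameters are hypothetical names chosen for illustration:

```python
def histogram_density(values, n_bins, lo, hi):
    """Bin `values` into `n_bins` equal-width bins over [lo, hi] and return
    the bin edges and the density of each bin (count / (N * bin width)),
    so that the bar areas sum to 1, like a probability density estimate.
    Values outside [lo, hi] are ignored."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v <= hi:
            # The right edge hi is included in the last bin.
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
    edges = [lo + k * width for k in range(n_bins + 1)]
    total = sum(counts)
    if total == 0:
        return edges, [0.0] * n_bins
    densities = [c / (total * width) for c in counts]
    return edges, densities


edges, dens = histogram_density([0.1, 0.2, 0.6, 0.9, 1.0], n_bins=2, lo=0.0, hi=1.0)
print(edges, dens)  # → [0.0, 0.5, 1.0] [0.8, 1.2]
```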

Hierarchical Clustering

Scatter Plot

Most popular tool

Helps find clusters, outliers, trends, correlations, etc.

Glyphs, icons, colors, etc. may be used for better understanding

Not very intuitive when the number of dimensions increases

Scatter Plot Matrix

Eigenvalues and Eigenvectors

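$$\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 3 \end{pmatrix} = \begin{pmatrix} 11 \\ 5 \end{pmatrix}$$

$$\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 6 \\ 4 \end{pmatrix} = \begin{pmatrix} 24 \\ 16 \end{pmatrix} = 4 \times \begin{pmatrix} 6 \\ 4 \end{pmatrix}$$

$$\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 12 \\ 8 \end{pmatrix} = 4 \times \begin{pmatrix} 3 \\ 2 \end{pmatrix}$$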
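The slide's example uses the matrix A = [[2, 3], [2, 1]]. The short NumPy check below (NumPy is an assumption; any linear-algebra library would do) confirms that [3, 2] and its multiple [6, 4] are both scaled by 4, while [1, 3] is sent to [11, 5], a different direction, and so is not an eigenvector:

```python
import numpy as np

# Transformation matrix from the eigenvector example.
A = np.array([[2.0, 3.0],
              [2.0, 1.0]])

# [1, 3] is not an eigenvector: A @ [1, 3] = [11, 5], which is not a multiple of [1, 3].
print(A @ np.array([1.0, 3.0]))

# [3, 2] and its multiple [6, 4] are both scaled by the same factor, 4.
for v in (np.array([3.0, 2.0]), np.array([6.0, 4.0])):
    print(A @ v, np.allclose(A @ v, 4 * v))

# numpy.linalg.eig returns the eigenvalues and unit-length eigenvectors (as columns).
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)
```

The second eigenvalue of this matrix is -1, with eigenvectors proportional to [1, -1].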

Eigenvectors (contd.)

A transformation matrix maps a vector from its original position to another position

If the transformation maps a vector to a scalar multiple of itself (so its direction is unchanged), then that vector, and every nonzero multiple of it, is an eigenvector of the transformation matrix; the scaling factor is the corresponding eigenvalue

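Properties of eigenvectors

Eigenvectors are defined only for square matrices

An n x n matrix has at most n linearly independent eigenvectors (a symmetric n x n matrix, such as a covariance matrix, always has exactly n)

It is the direction that matters, not the scale: any nonzero multiple of an eigenvector is also an eigenvector

The eigenvectors of a symmetric matrix (such as a covariance matrix) can always be chosen orthogonal to each other; for a general matrix they need not be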
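The PCA projections in the results rely on these properties: the covariance matrix is symmetric, so its eigenvectors are orthogonal, and each eigenvalue measures the variance of the data along its eigenvector. A minimal NumPy sketch, assuming rows are samples; `pca` is a hypothetical helper and the toy data are synthetic, not from the slides:

```python
import numpy as np


def pca(X, k):
    """Project the rows of X (n_samples x n_dims) onto the top-k principal
    components, i.e. the eigenvectors of the covariance matrix with the
    largest eigenvalues. Returns (projected data, component matrix)."""
    Xc = X - X.mean(axis=0)                        # centre each dimension
    C = np.cov(Xc, rowvar=False)                   # symmetric n_dims x n_dims covariance
    eigenvalues, eigenvectors = np.linalg.eigh(C)  # eigh: symmetric input, ascending order
    order = np.argsort(eigenvalues)[::-1]          # sort by decreasing variance
    W = eigenvectors[:, order[:k]]                 # columns are orthonormal directions
    return Xc @ W, W


rng = np.random.default_rng(0)
# Correlated 3-D toy data (illustrative only).
X = rng.normal(size=(200, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.0, 0.5, 0.2]])
Y, W = pca(X, k=2)
print(Y.shape)                          # (200, 2)
print(np.allclose(W.T @ W, np.eye(2)))  # True: the components are orthogonal
```

`numpy.linalg.eigh` is chosen over `eig` because it exploits symmetry and always returns real, orthonormal eigenvectors.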

Linear Discriminant Analysis

Finds the projection directions that maximize the ratio of between-class variance to within-class variance
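Fisher's criterion is commonly optimized by taking the leading eigenvectors of pinv(S_W) @ S_B, where S_W and S_B are the within-class and between-class scatter matrices. A minimal NumPy sketch of that standard formulation (the helper `lda` and the synthetic data are illustrative, not the author's code); the pseudo-inverse guards against a singular S_W:

```python
import numpy as np


def lda(X, y, k):
    """Fisher linear discriminant analysis: find up to k directions that
    maximize between-class scatter relative to within-class scatter, as the
    leading eigenvectors of pinv(S_W) @ S_B. Returns (projected data, W)."""
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    S_W = np.zeros((d, d))  # within-class scatter
    S_B = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        S_W += (Xc - mean_c).T @ (Xc - mean_c)
        diff = (mean_c - mean_all).reshape(-1, 1)
        S_B += len(Xc) * (diff @ diff.T)
    # pinv(S_W) @ S_B is not symmetric in general, so use eig and keep real parts.
    eigenvalues, eigenvectors = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
    order = np.argsort(eigenvalues.real)[::-1]
    W = eigenvectors[:, order[:k]].real
    return X @ W, W


rng = np.random.default_rng(1)
# Two synthetic classes that differ only along the first dimension.
X0 = rng.normal(size=(100, 2))
X1 = rng.normal(size=(100, 2)) + np.array([4.0, 0.0])
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)
Z, W = lda(X, y, k=1)
print(W[:, 0])  # dominated by the first coordinate
```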

PCA-LDA

Dimensions: Orthogonality

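In Cartesian displays, the dimensions (axes) are arranged orthogonally to each other

Inselberg points out that orthogonality uses up the space rapidly: a plane has room for only two mutually orthogonal axes, and 3D space for only three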

Parallel Coordinates
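Parallel coordinates sidestep the orthogonality limit by drawing all axes side by side as parallel vertical lines, so a d-dimensional point becomes a polyline that crosses each axis at its value. A minimal sketch of that mapping, assuming min-max scaling per axis (`parallel_coordinates` is a hypothetical helper; rendering is left to any plotting library):

```python
def parallel_coordinates(rows):
    """Map each d-dimensional row to a polyline for a parallel-coordinates plot.

    Axis j is drawn as a vertical line at x = j; a row's value on dimension j is
    min-max scaled to [0, 1] and becomes the height of the polyline's j-th vertex.
    Returns one list of (x, y) vertices per row."""
    d = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(d)]
    highs = [max(r[j] for r in rows) for j in range(d)]
    lines = []
    for r in rows:
        vertices = []
        for j in range(d):
            span = highs[j] - lows[j]
            height = (r[j] - lows[j]) / span if span else 0.0
            vertices.append((j, height))
        lines.append(vertices)
    return lines


for polyline in parallel_coordinates([[1, 10, 5], [3, 20, 5]]):
    print(polyline)
# [(0, 0.0), (1, 0.0), (2, 0.0)]
# [(0, 1.0), (1, 1.0), (2, 0.0)]
```

Any number of dimensions fits this way, at the cost of the polylines overlapping as the number of samples grows.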

Circular Parallel Coordinates

Star coordinate projection

Star Coordinate Projection

J. Shaik and M. Yeasin, "Visualization of High Dimensional Data using an Automated 3D Star Co-ordinate System," Proceedings of IEEE IJCNN'06, Vancouver, Canada, pp. 1339-1346, 2006

Mathematical Representation

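$$u_{x} = u\cos\theta, \qquad u_{y} = u\sin\theta$$

$$P = \Big(\sum_{i=1}^{n} d_i\, u_{x,i},\; \sum_{i=1}^{n} d_i\, u_{y,i}\Big)$$

where $u$ is the length of an axis vector, $\theta$ its angle, and $d_i$ the (normalized) value of the $i$-th attribute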
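In a 2D star coordinate display, each attribute gets its own axis vector radiating from the origin, and a data point is placed at the sum of its attribute values times their axis vectors. A minimal pure-Python sketch; the function name, the unit-length default and the evenly spaced default angles are assumptions for illustration, and values are assumed to be pre-normalized (e.g. to [0, 1]):

```python
import math


def star_coordinates_2d(values, angles=None, lengths=None):
    """Project one n-dimensional point to 2-D star coordinates.

    Axis i is the vector (length_i * cos(theta_i), length_i * sin(theta_i));
    the point is the sum of its attribute values d_i times their axis vectors.
    By default the n axes have unit length and are spread evenly around the circle."""
    n = len(values)
    if angles is None:
        angles = [2 * math.pi * i / n for i in range(n)]
    if lengths is None:
        lengths = [1.0] * n
    x = sum(d * ln * math.cos(t) for d, ln, t in zip(values, lengths, angles))
    y = sum(d * ln * math.sin(t) for d, ln, t in zip(values, lengths, angles))
    return x, y


# Four attributes on axes at 0, 90, 180 and 270 degrees.
print(star_coordinates_2d([1.0, 1.0, 0.0, 0.0]))  # approximately (1.0, 1.0)
```

Changing an axis angle or length rotates or stretches that attribute's contribution, which is how interactive star coordinate tools let users explore cluster structure.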

2D vs 3D

3D star coordinate system

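$$u_{x} = u\cos\theta\sin\phi, \qquad u_{y} = u\sin\theta\sin\phi, \qquad u_{z} = u\cos\phi$$

where $u$ is the length of an axis vector, $\theta$ its azimuth and $\phi$ its polar angle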
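A 3D star coordinate display places each attribute axis in space using two angles. The sketch below (pure Python; function names hypothetical, unit-length axes assumed) builds each axis vector from its azimuth θ and polar angle φ and sums the attribute values along those axes. The automated axis placement of the cited paper is not reproduced, so the angles are supplied by the caller:

```python
import math


def axis_vector_3d(theta, phi, length=1.0):
    """Axis vector with azimuth theta and polar angle phi (radians)."""
    return (length * math.cos(theta) * math.sin(phi),
            length * math.sin(theta) * math.sin(phi),
            length * math.cos(phi))


def star_coordinates_3d(values, thetas, phis):
    """Project one n-dimensional point to 3-D: the sum of value_i * axis_i."""
    x = y = z = 0.0
    for d, t, p in zip(values, thetas, phis):
        ax, ay, az = axis_vector_3d(t, p)
        x += d * ax
        y += d * ay
        z += d * az
    return x, y, z


# Three attributes placed on the Cartesian x, y and z directions.
point = star_coordinates_3d([2.0, 3.0, 4.0],
                            thetas=[0.0, math.pi / 2, 0.0],
                            phis=[math.pi / 2, math.pi / 2, 0.0])
print(point)  # approximately (2.0, 3.0, 4.0)
```

Setting every φ to π/2 puts all axes in the xy-plane and recovers the 2D star coordinate projection.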

Results

[Figure: 3D scatter plot of the first, second and third 3DSCP components (class1, class2, class3)]

[Figure: 3D star coordinate projection of IRIS dataset (X, Y and Z axes)]

Results

[Figure: PCA projection of Swiss roll data (first, second and third principal components)]

Results

[Figure: 3D scatter plot using PCA (first, second and third PCA components; class1, class2, class3)]

[Figure: 3D scatter plot using LDA (first, second and third LDA components; class1, class2, class3)]

[Figure: 3D scatter plot (first, second and third 3DSCP components; class1, class2, class3)]

Results

[Figure: 3D scatter plot (first, second and third PCA components; class1, class2, class3)]

[Figure: 3D scatter plot (first, second and third LDA components; class1, class2, class3)]

[Figure: 3D star coordinate projection of IRIS dataset (X, Y and Z axes)]
