Visualization of High dimensional Datasets
Jahangheer Shaik
Why do we need Visualization?
Noise? Distribution? Classes? Structure?
Data visualization techniques reduce cognitive load and help convert data into information and knowledge for subsequent applications.
Line Graphs
Line graphs display single-valued or piecewise-continuous functions of one variable
Problems
Different types of lines (colored, dashed) have to be used to distinguish between the labeled classes
Each dimension may have a different scale
Bar Charts, Histograms
Histograms visualize the distribution of a variable as counts per bin, a discrete approximation of its probability density function
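As a rough illustration of how a histogram turns raw values into a discrete estimate of a distribution, the sketch below counts values into equal-width bins and normalizes the counts. The function name `histogram`, its parameters, and the sample data are assumptions made for this example, not part of the presentation.

```python
def histogram(values, n_bins, lo, hi):
    """Count values into n_bins equal-width bins on [lo, hi] and normalize.

    Returns relative frequencies that sum to 1 (a discrete estimate of the
    distribution). Values outside [lo, hi] are ignored.
    """
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v <= hi:
            # The right edge hi is placed in the last bin.
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins


# Made-up sample values for illustration.
data = [0.1, 0.2, 0.25, 0.4, 0.45, 0.5, 0.55, 0.9]
print(histogram(data, n_bins=4, lo=0.0, hi=1.0))  # [0.25, 0.375, 0.25, 0.125]
```

The bin count controls the trade-off: too few bins hide structure, too many make the estimate noisy.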
Hierarchical Clustering
Scatter Plot
Most popular tool
Helps find clusters, outliers, trends, correlations, etc.
Glyphs, icons, colors, etc. may be used for better understanding
Not very intuitive when dimensions increase
Scatter Plot Matrix
Eigen values and Eigen vectors
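\[ \begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \times \begin{pmatrix} 1 \\ 3 \end{pmatrix} = \begin{pmatrix} 11 \\ 5 \end{pmatrix} \]
\[ \begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \times \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 12 \\ 8 \end{pmatrix} = 4 \times \begin{pmatrix} 3 \\ 2 \end{pmatrix} \]
\[ \begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \times \begin{pmatrix} 6 \\ 4 \end{pmatrix} = \begin{pmatrix} 24 \\ 16 \end{pmatrix} = 4 \times \begin{pmatrix} 6 \\ 4 \end{pmatrix} \]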
Eigen vectors(contd..)
A transformation matrix transforms a vector from its original position to another position
If the transformation only scales the vector, leaving its direction unchanged, then the vector and all multiples of it are eigen vectors of the transformation matrix
Properties of eigen vectors
Eigen vectors can be found only for square matrices
An n x n matrix has at most n linearly independent eigen vectors
It's the direction that matters, not the scale
Eigen vectors of a symmetric matrix (such as a covariance matrix) are orthogonal to each other
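The defining property of an eigen vector can be checked numerically. The following is a minimal sketch, not taken from the presentation: the helper names `mat_vec` and `is_eigenvector` and the 2 x 2 matrix are assumptions chosen for illustration. It tests whether multiplying by the matrix only rescales a vector.

```python
def mat_vec(a, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def is_eigenvector(a, v, tol=1e-9):
    """Return (True, eigenvalue) if a*v is a scalar multiple of v, else (False, None)."""
    w = mat_vec(a, v)
    # Estimate the scale factor from the largest component of v.
    k = max(range(len(v)), key=lambda i: abs(v[i]))
    if abs(v[k]) < tol:
        return False, None  # the zero vector is not an eigenvector
    lam = w[k] / v[k]
    ok = all(abs(w_i - lam * v_i) <= tol for w_i, v_i in zip(w, v))
    return (True, lam) if ok else (False, None)


A = [[2, 3], [2, 1]]  # illustrative matrix (an assumption, chosen for this example)

print(is_eigenvector(A, [3, 2]))  # direction preserved, scaled by 4
print(is_eigenvector(A, [6, 4]))  # a multiple of an eigenvector is also one
print(is_eigenvector(A, [1, 3]))  # direction changes: not an eigenvector
```

Only the direction of the vector decides the outcome: [3, 2] and [6, 4] give the same eigenvalue.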
Linear Discriminant Analysis
Maximizes the ratio of between-class variance to within-class variance
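To make the criterion concrete, the sketch below evaluates the ratio of between-class to within-class scatter for data projected onto a candidate direction. It only scores a given direction and does not solve for the optimal one. The function name `fisher_ratio` and the toy data are assumptions made for illustration.

```python
def fisher_ratio(classes, w):
    """Between-class over within-class scatter of the data projected onto w.

    classes: list of classes, each a list of points (tuples of floats).
    w: projection direction with the same length as each point.
    """
    # Project every point onto w (dot product), class by class.
    projected = [[sum(p_i * w_i for p_i, w_i in zip(p, w)) for p in pts]
                 for pts in classes]
    n_total = sum(len(vals) for vals in projected)
    grand_mean = sum(sum(vals) for vals in projected) / n_total

    between = 0.0
    within = 0.0
    for vals in projected:
        mean = sum(vals) / len(vals)
        between += len(vals) * (mean - grand_mean) ** 2
        within += sum((v - mean) ** 2 for v in vals)
    return between / within


# Toy data (made up for illustration): two classes separated along x.
class_a = [(0.0, 0.0), (0.2, 1.0), (-0.2, 2.0)]
class_b = [(2.0, 0.0), (2.2, 1.0), (1.8, 2.0)]

print(fisher_ratio([class_a, class_b], (1.0, 0.0)))  # large: classes well separated
print(fisher_ratio([class_a, class_b], (0.0, 1.0)))  # zero: class means coincide
```

A direction with a high ratio keeps the classes apart after projection, which is what LDA searches for.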
PCA-LDA
Dimensions: Orthogonality
Dimensions are organized such that they are orthogonal to each other
Inselberg points out that orthogonality uses up the space rapidly
Parallel Coordinates
Circular Parallel Coordinates
Star coordinate projection
Star Coordinate Projection
J. Shaik and M. Yeasin, "Visualization of High Dimensional Data using an Automated 3D Star Co-ordinate System," Proceedings of IEEE IJCNN'06, Vancouver, Canada, pp. 1339-1346, 2006.
Mathematical Representation
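Each dimension i is assigned an axis at angle \theta_i. A data point x = (x_1, \dots, x_n) is placed by summing its values along these axes:
\[ P(x) = \left( \sum_{i=1}^{n} x_i \cos\theta_i, \; \sum_{i=1}^{n} x_i \sin\theta_i \right) \]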
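A 2D star coordinate projection can be sketched in a few lines. This is a generic illustration under stated assumptions, not the authors' automated method: the function name `star_coordinates_2d` is hypothetical, the axes are spaced evenly around the circle by default, and the input is assumed to be already scaled per dimension.

```python
import math


def star_coordinates_2d(point, angles=None):
    """Project an n-dimensional point onto the plane with star coordinates.

    Each dimension i gets an axis at angle angles[i]; the point is the sum of
    its values laid along those axes. Evenly spaced angles are an assumption.
    """
    n = len(point)
    if angles is None:
        angles = [2.0 * math.pi * i / n for i in range(n)]
    px = sum(x * math.cos(t) for x, t in zip(point, angles))
    py = sum(x * math.sin(t) for x, t in zip(point, angles))
    return px, py


# 4-dimensional samples, assumed already scaled to [0, 1] per dimension.
print(star_coordinates_2d([1.0, 0.0, 0.0, 0.0]))  # lies on the first axis
print(star_coordinates_2d([0.5, 0.5, 0.5, 0.5]))  # equal values cancel near the origin
```

The second sample shows a known limitation: different points can project to the same location, which is one motivation for adjusting the axis angles.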
2D vs 3D
3D star coordinate system
\[ u_x = |u| \cos\theta \sin\phi \]
\[ u_y = |u| \sin\theta \sin\phi \]
\[ u_z = |u| \cos\phi \]
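The 3D case can be sketched the same way: each dimension's axis is a direction in space given by two angles, and a data point is the sum of its values along those directions. The names `axis_direction` and `star_coordinates_3d`, the angle names `theta` (azimuth) and `phi` (polar angle), and the hand-picked axis layout are all assumptions for illustration; the cited paper chooses the axis configuration automatically, which this sketch does not attempt.

```python
import math


def axis_direction(theta, phi, length=1.0):
    """Cartesian components of an axis with azimuth theta and polar angle phi."""
    return (length * math.cos(theta) * math.sin(phi),
            length * math.sin(theta) * math.sin(phi),
            length * math.cos(phi))


def star_coordinates_3d(point, axes):
    """Sum the point's values along the given 3D axis directions."""
    return tuple(sum(x * u[k] for x, u in zip(point, axes)) for k in range(3))


# Hypothetical axis layout for a 4-dimensional dataset (angles chosen by hand).
axes = [axis_direction(0.0, math.pi / 2),          # along +x
        axis_direction(math.pi / 2, math.pi / 2),  # along +y
        axis_direction(0.0, 0.0),                  # along +z
        axis_direction(math.pi, math.pi / 2)]      # along -x

print(star_coordinates_3d([1.0, 2.0, 3.0, 0.5], axes))  # approximately (0.5, 2.0, 3.0)
```

The extra polar angle gives more room to spread the axes apart than the 2D layout does.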
Results
[Figure: 3D scatter plot. Axes: first, second and third 3DSCP components. Legend: class1, class2, class3.]
[Figure: 3D star coordinate projection of IRIS dataset. Axes: X, Y, Z.]
Results
[Figure: PCA projection of Swiss roll data. Axes: first, second and third principal components.]
Results
[Figure: 3D scatter plot using PCA. Axes: first, second and third PCA components. Legend: class1, class2, class3.]
[Figure: 3D scatter plot using LDA. Axes: first, second and third LDA components. Legend: class1, class2, class3.]
[Figure: 3D scatter plot. Axes: first, second and third 3DSCP components. Legend: class1, class2, class3.]
Results
[Figure: 3D scatter plot. Axes: first, second and third PCA components. Legend: class1, class2, class3.]
[Figure: 3D scatter plot. Axes: first, second and third LDA components. Legend: class1, class2, class3.]
[Figure: 3D star coordinate projection of IRIS dataset. Axes: X, Y, Z.]