Enhancing Set-Analysis through Scalable Visualizations

May 09, 2006 CMSC 838S    Information Visualization Spring 2006

Presented by:

Hamid Haidarian Shahri (hamid@cs.umd.edu)

Mudit Agrawal (mudit@cs.umd.edu)

Content

Problem Definition

Motivation

Dataset

Architecture

Visualization Methods

Interaction Tools

Demo

Future Work

Problem Definition

Analysis of sets by representing their clusters graphically and depicting their internal and external links

Scaling visualization

Motivation

Sets are encountered in various domains: websites, commodities, publications, anything that has attributes.

Visualization of sets to aid human perception is still an unsolved problem:

There are no direct relations between sets (or their elements) in the spatial domain.

They can be grouped based on various attributes.

Dataset

2700 law cases

Each case identified by a numerical id ranging from 1000 to 3718

Each tuple in the dataset denotes a reference from one case to another

The relation is unidirectional and not symmetric (a reference also implies a temporal constraint on the cases)

Snapshot of the data

First 50 links (approximately 0.1 percent of the whole dataset)

(1001,1105,'100 S.Ct. 318'),(1001,1612,'101 S.Ct. 2352'),(1001,1018,'107 S.Ct. 1232'),(1001,1016,'112 S.Ct. 2886'),(1001,2923,'113 S.Ct. 2264'),(1001,1016,'120 L.Ed.2d 798'),(1001,2923,'124 L.Ed.2d 539'),(1001,2286,'138 F.3d 1036'),(1001,2396,'238 F.3d 382'),(1001,3410,'438 U.S. 104'),(1001,1105,'444 U.S. 51'),(1001,1612,'452 U.S. 264'),(1001,1018,'480 U.S. 470'),(1001,1016,'505 U.S. 1003'),(1001,2923,'508 U.S. 602'),(1001,3410,'57 L.Ed.2d 631'),(1001,1105,'62 L.Ed.2d 210'),(1001,1612,'69 L.Ed.2d 1'),(1001,1789,'926 F.2d 1169'),(1001,1018,'94 L.Ed.2d 472'),(1001,3410,'98 S.Ct. 2646'),(1002,1276,'100 S.Ct. 2138'),(1002,1101,'105 S.Ct. 3108'),(1002,1018,'107 S.Ct. 1232'),(1002,1098,'107 S.Ct. 2378'),(1002,1016,'112 S.Ct. 2886'),(1002,1015,'114 S.Ct. 2309'),(1002,1016,'120 L.Ed.2d 798'),(1002,1013,'121 S.Ct. 2448'),(1002,1012,'122 S.Ct. 1465'),(1002,1015,'129 L.Ed.2d 304'),(1002,2316,'142 F.3d 1319'),(1002,1013,'150 L.Ed.2d 592'),(1002,1012,'152 L.Ed.2d 517'),(1002,1121,'266 F.3d 487'),(1002,3028,'306 F.3d 113'),(1002,3410,'438 U.S. 104'),(1002,1276,'447 U.S. 255'),(1002,1101,'473 U.S. 172'),(1002,1018,'480 U.S. 470'),(1002,1098,'482 U.S. 304'),(1002,1016,'505 U.S. 1003'),(1002,1015,'512 U.S. 374'),(1002,1013,'533 U.S. 606'),(1002,1012,'535 U.S. 302'),(1002,3410,'57 L.Ed.2d 631'),(1002,2091,'59 F.3d 852'),(1002,1276,'65 L.Ed.2d 106'),(1002,1889,'746 F.2d 135'),(1002,1101,'87 L.Ed.2d 126'),(1002,1018,'94 L.Ed.2d 472'),(1002,2319,'953 F.2d 1299'),(1002,1098,'96 L.Ed.2d 250'),(1002,3410,'98 S.Ct. 2646'),(1002,1022,'980 F.2d 84'),(1002,2670,'989 F.2d 362'),(1003,1104,'100 S.Ct. 383'),(1003,1611,'104 S.Ct. 2862'),(1003,1100,'106 S.Ct. 1018'),(1003,1099,'107 S.Ct. 2076'),(1003,1016,'112 S.Ct. 2886'),(1003,3110,'116 S.Ct. 2432'),(1003,1016,'120 L.Ed.2d 798'),(1003,1012,'122 S.Ct. 1465'),(1003,1881,'13 F.3d 1192'),(1003,3054,'133 F.3d 893'),(1003,3110,'135 L.Ed.2d 964'),(1003,1012,'152 L.Ed.2d 517'),(1003,1047,'18 F.3d 1560'),(1003,1886,'265 F.3d 1237'),(1003,2689,'271 F.3d 1090'),(1003,1358,'271 F.3d 1327'),(1003,1149,'28 F.3d 1171'),(1003,1040,'331 F.3d 891')

(1001,1105,'100 S.Ct. 318')
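
A minimal Python sketch of how such tuples might be loaded into per-case reference sets. It assumes each tuple reads (citing case id, cited case id, citation string), which the slides do not state explicitly; the name `build_reference_sets` is illustrative. Because one cited case can appear under several parallel citations (case 1105 appears three times for case 1001 in the snapshot), the cited ids are collected into a set.

```python
import re
from collections import defaultdict

# Assumed tuple layout: (citing_id, cited_id, 'citation string').
TUPLE_RE = re.compile(r"\((\d+),(\d+),'([^']*)'\)")

def build_reference_sets(raw):
    """Map each citing case id to the set of case ids it references."""
    refs = defaultdict(set)
    for citing, cited, _citation in TUPLE_RE.findall(raw):
        refs[int(citing)].add(int(cited))
    return dict(refs)

# Four tuples copied from the snapshot above.
sample = ("(1001,1105,'100 S.Ct. 318'),(1001,1612,'101 S.Ct. 2352'),"
          "(1001,1105,'444 U.S. 51'),(1002,1276,'100 S.Ct. 2138')")
reference_sets = build_reference_sets(sample)
print(reference_sets)
```

Collapsing parallel citations this way treats a case as the set of distinct cases it references, which is the representation the later similarity measures operate on.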

Architecture

Data -> Clustering Module (uses the Similarity Metric) -> Clustered Data -> Visualization Module

Routine K-Means Clustering

$$V = \sum_{i=1}^{k} \sum_{j \in S_i} \lVert x_j - \mu_i \rVert^2$$

Data points are in a vector space: $x_j$ and $\mu_i$ are vectors. This assumption does not hold for cases represented as sets.

Centroids are not simple geometric means. In fact, a mean does not make any sense for sets.
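
The standard K-means objective sums, over all clusters, the squared distance between each point and its cluster mean. The sketch below computes it for plain numeric vectors to show where the vector-space assumption enters. The function names are illustrative, and the toy points are made up.

```python
def centroid(points):
    """Component-wise mean; only defined when the points are numeric vectors."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def kmeans_objective(clusters):
    """Sum of squared Euclidean distances from each point to its cluster mean."""
    total = 0.0
    for points in clusters:
        mu = centroid(points)
        total += sum(sum((x - m) ** 2 for x, m in zip(p, mu)) for p in points)
    return total

# Two toy clusters in the plane.
clusters = [[(0.0, 0.0), (2.0, 0.0)], [(5.0, 5.0)]]
objective = kmeans_objective(clusters)
print(objective)
```

`centroid` has no counterpart when each data point is a set of case ids, since adding and dividing sets is undefined. That is the obstacle this slide points out.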

Routine Self Organizing Map

$$W_v(t+1) = W_v(t) + \Theta(t)\,\alpha(t)\,[D(t) - W_v(t)]$$

$W_v$ and $D$ are assumed to be vectors. This assumption does not hold for cases represented as sets.
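
A minimal sketch of one application of the update rule Wv(t + 1) = Wv(t) + Θ(t)α(t)[D(t) - Wv(t)] on numeric vectors. In the usual SOM formulation Θ is the neighborhood factor and α the learning rate. Both are passed here as plain numbers, which is a simplification, and `som_update` is an illustrative name.

```python
def som_update(weight, sample, theta, alpha):
    """One step of Wv(t+1) = Wv(t) + theta * alpha * (D(t) - Wv(t)).

    theta: neighborhood factor; alpha: learning rate (both plain numbers here).
    """
    return [w + theta * alpha * (d - w) for w, d in zip(weight, sample)]

# Move a weight vector halfway toward the input sample.
w_next = som_update(weight=[0.0, 0.0], sample=[1.0, 2.0], theta=1.0, alpha=0.5)
print(w_next)
```

The subtraction `d - w` and the scaled addition are vector operations, so the rule cannot be applied as written when the inputs are sets.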

Similarity Measures

Jaccard similarity

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

Reference-based similarity

$$S(A, B) = |A \cap B|$$

Weighted reference-based similarity

$$WS(A, B) = \frac{\sum_{x \in A \cap B} f(x)}{|A \cup B|}$$
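
A minimal sketch of the three measures on Python sets. It takes the Jaccard similarity as the size of the intersection over the size of the union, and the reference-based similarity as the number of shared references. For the weighted variant it assumes a weight f(x) summed over shared references and normalized by the union size. The weight function and the example weights are hypothetical, since the slides do not define f.

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def reference_similarity(a, b):
    """Number of shared references, |A ∩ B|."""
    return len(a & b)

def weighted_reference_similarity(a, b, f):
    """Sum of f(x) over shared references, divided by |A ∪ B| (assumed normalization)."""
    union = a | b
    return sum(f(x) for x in a & b) / len(union) if union else 0.0

# A subset of the references listed for cases 1001 and 1002 in the data snapshot.
case_1001 = {1105, 1612, 1018, 1016, 2923, 3410}
case_1002 = {1276, 1101, 1018, 1016, 1098, 3410}

# Hypothetical weights: case 1016 counts double, every other case counts once.
weights = {1016: 2.0}
f = lambda x: weights.get(x, 1.0)

print(jaccard(case_1001, case_1002))
print(reference_similarity(case_1001, case_1002))
print(weighted_reference_similarity(case_1001, case_1002, f))
```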

Contribution to clustering

Applying K-means and SOM for producing better visualizations

It is not apparent at first glance, but the above algorithms are not directly applicable to set visualization.

They assume a 2D or nD (vector) representation for each data point (i.e. law case). More specifically, the attributes must form a vector space.

This assumption does not hold: there is no clear geometric attribute corresponding to the dataset.
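
One common workaround when no vector space is available is to cluster around medoids (actual data items) instead of means, using a set distance such as 1 minus the Jaccard similarity. This is not necessarily the approach taken in this work. The sketch below is written under that assumption, and its function names, naive initialization and toy data are all hypothetical.

```python
def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def k_medoids(items, k, max_iter=20):
    """Cluster a dict {id: set} around k medoids chosen from the items themselves."""
    ids = sorted(items)
    medoids = ids[:k]  # naive deterministic initialization, for the sketch only
    clusters = {}
    for _ in range(max_iter):
        # Assignment step: attach every item to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in ids:
            nearest = min(medoids, key=lambda m: jaccard_distance(items[i], items[m]))
            clusters[nearest].append(i)
        # Update step: the new medoid minimizes total distance within its cluster.
        new_medoids = []
        for m, members in clusters.items():
            members = members or [m]  # keep the old medoid if its cluster is empty
            new_medoids.append(min(
                members,
                key=lambda c: sum(jaccard_distance(items[c], items[o]) for o in members),
            ))
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return clusters

# Toy reference sets: "a" and "b" share references, as do "c" and "d".
items = {"a": {1, 2, 3}, "b": {1, 2, 4}, "c": {7, 8, 9}, "d": {7, 8, 10}}
clusters = k_medoids(items, k=2)
print(clusters)
```

Only pairwise distances are needed here, so no mean of sets ever has to be formed.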

Similarity Metrics Geometric Metrics

1-D Partitioning

2-D Partitioning

Sequential arrangement

Distance based arrangement

1 2 5 9

3 4 7 12

6 8 11 14

10 13 15 16
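
The 4 x 4 grid above can be reproduced by ordering cells by their Manhattan distance from the top-left corner, breaking ties first by closeness to the diagonal and then by row. This rule is inferred from the numbers on the slide and is not stated there, so treat it as one possible reading. `distance_based_grid` is an illustrative name.

```python
def distance_based_grid(n):
    """Return an n x n grid whose cells hold their 1-based fill order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Sort key: Manhattan distance from (0, 0), then distance from the diagonal, then row.
    cells.sort(key=lambda rc: (rc[0] + rc[1], abs(rc[0] - rc[1]), rc[0]))
    grid = [[0] * n for _ in range(n)]
    for rank, (r, c) in enumerate(cells, start=1):
        grid[r][c] = rank
    return grid

for row in distance_based_grid(4):
    print(*row)
```

Under this reading, items ranked by similarity to a reference element would be placed in fill order, so the most similar items end up nearest the corner.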

K-Means

SOM after K-Means

Various Interactive Tools

Referencing pattern (activating all links)

Local referencing

Density map

Representative element

Tool tip

Link follow-up

Search

Referencing Pattern

Local Referencing

Density Map

Representative Element

Link Follow-up

DEMO

Future Work

Other clustering algorithms can be explored: spectral clustering, fuzzy C-means

More similarity functions

Better initial posting of data

Zooming and Panning

References

Abello, J., Korn, J., Visualizing Massive Multi-Digraphs. Proceedings of the IEEE Symposium on Information Visualization 2000.

Berry, M.W., Drmač, Z., Jessup, E.R., Matrices, Vector Spaces, and Information Retrieval. SIAM Review, 41:2, 1999, pp. 335-362.

Gansner, E.R., Koutsofios, E., North, S.C., Vo, K.P., A Technique for Drawing Directed Graphs. IEEE Trans. on Soft. Eng. 19(3), 1993, pp. 214-230.

Guimerà, R., Mossa, S., Turtschi, A., Amaral, L.A.N., The Worldwide Air Transportation Network: Anomalous Centrality, Community Structure, and Cities' Global Roles. Proceedings of the National Academy of Sciences 102, May 31, 2005, pp. 7794-7799.

Jain, A.K., Murty, M.N., Flynn, P.J., Data Clustering: A Review. ACM Computing Surveys, 1999.

Kohonen, T., The Self-Organizing Map. Proceedings of the IEEE, Volume 78, Issue 9, Sept. 1990, pp. 1464-1480.

Kohonen, T., Kaski, S., Lagus, K., Salojärvi, J., Honkela, J., Paatero, V., Saarela, A., Self organization of a massive document collection. IEEE Transactions on Neural Networks, Vol. 11, 2000, pp. 574-585.

Kunz, C., Botsch, V., Ziegler, J., Spath, D., Contextualizing Search Results in Networked Directories. Proceedings of HCII, 2003.

Leuski, A., Strategy-based Interactive Cluster Visualization for Information Retrieval. International Journal on Digital Libraries, Vol. 3, Issue 2, 2000, pp. 170.

Liu, X., Luo, M., Shneiderman, B., Visualization of Sets. Unpublished manuscript, 2005.

MacQueen, J.B., Some Methods for Classification and Analysis of Multivariate Observations. Proceedings of 5-th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, University of California Press, 1967, pp. 281-297.

Murata, T., Visualizing the Structure of Web Communities Based on Data Acquired From a Search Engine. IEEE Trans. on Industrial Electronics, Vol. 50, No. 5, 2003.

Palla, G., Derenyi, I., Farkas, I., Vicsek, T., Uncovering the Overlapping Community Structure of Complex Networks in Nature and Society. Nature Letters, Vol. 435, 9 June 2005, pp. 814.

Self-organizing map. Wikipedia, The Free Encyclopedia.

Seo, J., Shneiderman, B., Understanding Hierarchical Clustering Results by Interactive Exploration of Dendrograms: A Case Study with Genomic Microarray Data. IEEE Computer Special Issue on Bioinformatics, Volume 35, No. 7, July 2002, pp. 80-86.

Thanks!
