Interactive Visualization and Tuning of Multi-Dimensional Clusters for Indexing. Dasari Pavan Kumar (MS by Research Thesis), Centre for Visual Information Technology, IIIT Hyderabad.
Large scale visualizations. Dasari Pavan Kumar.

Overview. Provide a framework to generate better clusters for high-dimensional data points. Provide a fast cluster analysis/generation tool.

A difficult task, however! Data generated in the previous decade consisted mostly of textual information. Inverted index, suffix trees, N-grams, etc.

More data! Non-textual information (images). Underlying processes remain similar.

Can’t be fully automated.

Data visualization. Cluster analysis: descriptive modeling. Identify important features/patterns. XMDV tool (M. Ward). Past attempts!

Indexing images/videos. Apply clustering to compute bag of words. Generate feature histogram and perform some ML methods.

Other low-level image features exist. GLOH, steerable filter, spin images.

Clusters + visualization. The problem.

Cluster analysis. Identify better subspaces. Efficiently/quickly compute clusters. Compare clustering schemas. Hence PCA can't be trivially applied. Clusters could be lost in a cloud of dimensions (curse of dimensionality). Difficult to interpret the combination.

Feature selection. Wrapper model. Filter model. Difficult to compare since it's highly dependent on the density parameter. Rank dimensions. Uniformity (entropy).

Ranked dimensions. 1D histogram of distribution. Manual. Cluster such data on a commodity PC. Almost impossible.

Data clustering. Currently using k-means (GPU).

Extracted low-level image descriptors. Manageable size (high dimensional). Statistical sampling. Not feasible. Plug in any graph drawing. Current: 2D force based. Similar nodes must be close. Can be estimated using MST. Generate minimum spanning tree (MST) of cluster centers. Single linkage dendrogram. Prim’s method. Takes 0.2 sec for 1000 nodes. Drill down on a “visual word” to actually see the “sift” interest points to understand the similarity. MST without layout. MST with layout.

Cluster validation. Three basic strategies. External: build an independent partition according to our intuition. Comparison with schema C or proximity matrix. Relative: choose the one that best fits!! Computationally not feasible. GPU implementation. Obtain min/max of the graph: optimal clusters Nc. (Plot labels: Iteration, Index.)

15 categories.

Interesting observation. Same with corner cells. 78, 79, 83, 84, 110, 116}. 1D histograms corresponding to dimensions (a) 84, (b) 110, (c) 124.

More clusters do not necessarily mean better classification. Fei-Fei et al. report a mean accuracy of 52.5%. VW = number of visual words, EW = k-means using uniform weights, IW = k-means with weights adjusted interactively, IW-Ds = k-means with Ds dimensions given a weight of zero and the weights of other dimensions adjusted interactively.

Provide a framework for better cluster generation. Provide a fast cluster analysis/generation tool for a commodity PC enabled with GPU. Able to analyze distributions across dimensions. Identified redundant dimensions.

Publications. Interactive Visualization and Tuning of SIFT Indexing, Dasari Pavan Kumar and P. J. Narayanan, Vision, Modelling and Visualization, 2010, Siegen, Germany.

User needs to get familiarized with the tool. Visual decoding of data is sometimes difficult. Cluster generation still depends on parameters like K (no. of clusters).

Future work. Incorporate support for subspace clustering. Conduct experiments based on wrapper clustering methods.

Thank you.
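Sketch: ranking dimensions by entropy. The feature selection slide mentions ranking dimensions by uniformity (entropy) of their 1D histograms. The slides do not give the exact formula, so the following is only a minimal illustration using Shannon entropy over equal-width bins. The function names `histogram_entropy` and `rank_dimensions` and the default of 16 bins are assumptions made for this example, not part of the tool.

```python
import math

def histogram_entropy(values, bins=16):
    """Shannon entropy (in bits) of an equal-width 1D histogram of values."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values fall into a single bin, so the entropy is zero.
        return 0.0
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so that the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def rank_dimensions(points, bins=16):
    """Return (order, entropies); order lists dimensions from lowest to highest entropy."""
    n_dims = len(points[0])
    entropies = [histogram_entropy([p[d] for p in points], bins)
                 for d in range(n_dims)]
    order = sorted(range(n_dims), key=lambda d: entropies[d])
    return order, entropies

# Hypothetical data: dimension 0 is constant, dimension 1 is spread
# uniformly, dimension 2 takes only two values.
points = [(5.0, float(i), 0.0 if i < 80 else 1.0) for i in range(160)]
order, entropies = rank_dimensions(points)
print(order, entropies)
```

A near-uniform dimension has entropy close to log2(bins), while a dimension concentrated in a few bins scores low. Which end of the ranking is treated as informative or redundant is a choice the slides leave to the interactive user.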
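Sketch: MST of cluster centers with Prim's method. The slides state that an MST of the cluster centers is generated with Prim's method to estimate which nodes should be close in the layout. The following is a minimal CPU sketch of dense Prim's algorithm over Euclidean distances, not the thesis implementation. The function name `prim_mst` and the sample centers are hypothetical.

```python
import math

def prim_mst(centers):
    """Return MST edges (parent, child, distance) over the complete
    Euclidean graph on the given cluster centers (dense O(n^2) Prim)."""
    n = len(centers)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known connection to the tree
    best_parent = [-1] * n       # tree node providing that connection
    best_dist[0] = 0.0           # start growing the tree from node 0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_parent[u] >= 0:
            edges.append((best_parent[u], u, best_dist[u]))
        # Relax the connection cost of the remaining nodes through u.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(centers[u], centers[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_parent[v] = u
    return edges

# Hypothetical 2D centers; real visual-word centers would be
# high-dimensional descriptor vectors.
centers = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (5.0, 1.0)]
edges = prim_mst(centers)
print(edges)
```

Sorting the returned edges by distance gives the merge order of single-linkage clustering, which is the connection to the single linkage dendrogram mentioned on the same slide.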