Uncertainty-aware Multidimensional Ensemble Data Visualization and Exploration


-- Haidong Chen (Zhejiang University), Song Zhang (Mississippi State University), Wei Chen, Honghui Mei, Jiawei Zhang (Zhejiang University), Andrew Mercer (Mississippi State University), Ronghua Liang (Zhejiang University), and Huamin Qu (Hong Kong University of Science and Technology).

Presented by:

Subhashis Hazarika (The Ohio State University)

Goal

• Devise a projection scheme for multi-/high-dimensional data that are uncertain (in this case, ensembles of data).

• A naïve approach projects the ensemble means into a low-dimensional space, but this discards the distributional information of the ensemble objects.

• Applying MDS techniques with a dissimilarity matrix built from distributional distances is computationally intensive for large datasets.

• This work aims to strike a balance between projection accuracy and efficiency for large multidimensional ensemble data.

Motivation

Key Contributions

• A novel uncertainty-aware multidimensional projection approach, with two key components:
– A new dissimilarity measure for the ensemble data objects.

– An enhanced Laplacian-based projection scheme.

• A suite of visual exploration widgets that augments users’ ability to visually study the ensemble dataset.

Problem Formulation

• n ensemble data objects

• Each object has m d-dimensional ensemble members

• The goal is to build an l-dimensional representation that preserves the relationships among the data objects in terms of both the ensemble means and the ensemble distributions.
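The setup above can be sketched concretely; the array names and sizes here are illustrative, not from the paper:

```python
import numpy as np

# n ensemble data objects, each with m d-dimensional ensemble members
n, m, d = 100, 30, 5
rng = np.random.default_rng(0)

# U[i] holds the m x d member matrix of the i-th ensemble data object
U = rng.normal(size=(n, m, d))

# Ensemble mean of each object: an n x d matrix
means = U.mean(axis=1)

# Goal: an n x l embedding (l = 2 for a scatterplot) that preserves
# relationships in terms of both the means and the full distributions
l = 2
```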

Approach Overview

• Two-step multidimensional projection:
– A small set of control points is selected from U and projected to a 2D space using a conventional MDS method.
– Next, all the other objects in U are projected to the 2D space with an enhanced Laplacian system that combines the influences of both the control points and the other points.

• To measure the distance between two data objects, they use both the Euclidean distance between the ensemble means and the Jensen–Shannon divergence (JSD) between the ensemble distributions of the two objects.

• Overall steps: create probability distributions for the ensemble data objects, estimate the dissimilarities, then apply the enhanced projection scheme.

Ensemble Data Objects & Prob. Distribution

• To reconstruct the continuous ensemble distribution of each data object, a multidimensional kernel density estimation (KDE) method that accounts for correlations between dimensions is employed.

• A normal (Gaussian) kernel is used; moreover, the choice of kernel K(·) matters less than the bandwidth matrix H in terms of its influence on the estimate.

• Choices for H:
– Scaled identity matrix
– Diagonal matrix
– Generic symmetric positive-definite matrix
• Silverman’s rule of thumb sets the j-th diagonal bandwidth to h_j = σ_j (4 / ((d + 2) m))^(1/(d+4)), where σ_j is the sample standard deviation in dimension j and m is the number of ensemble members.


• To take advantage of the simplicity of the diagonal bandwidth matrix while preserving correlations among dimensions, KDE is performed not in the original data space but in the space defined by the principal component transformation.

• Space transformation:
– Mean-centre the ensemble members.
– Apply PCA to the centred members, which yields a transformation matrix.
– Transform each ensemble member into the new space with that matrix.
– Because the bases of the new space are eigenvectors that are orthogonal to each other (i.e., the transformed dimensions are decorrelated), the diagonal bandwidth matrix created using eq. (2) can be used for the KDE.
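The steps above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `pca_kde` is mine, and the PCA rotation is obtained via SVD of the centred members.

```python
import numpy as np

def pca_kde(members, points):
    """Evaluate a Gaussian KDE for one ensemble object at `points`,
    estimating in the PCA-transformed space so that a diagonal
    bandwidth matrix still respects inter-dimension correlations."""
    m, d = members.shape
    # Mean-centre, then rotate onto the principal axes: the
    # eigenvector basis decorrelates the dimensions
    mu = members.mean(axis=0)
    centred = members - mu
    _, _, Vt = np.linalg.svd(centred, full_matrices=False)
    Z = centred @ Vt.T            # transformed ensemble members
    X = (points - mu) @ Vt.T      # transformed evaluation points
    # Silverman's rule of thumb per (now decorrelated) dimension
    sigma = Z.std(axis=0, ddof=1)
    h = sigma * (4.0 / ((d + 2) * m)) ** (1.0 / (d + 4))
    h = np.where(h > 0, h, 1e-12)  # guard degenerate dimensions
    # Product of 1-D normal kernels == diagonal-bandwidth Gaussian KDE
    diff = X[:, None, :] - Z[None, :, :]               # (p, m, d)
    kern = np.exp(-0.5 * (diff / h) ** 2) / (h * np.sqrt(2 * np.pi))
    return kern.prod(axis=2).mean(axis=1)              # (p,) densities
```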

Dissimilarity Estimation

• Jensen–Shannon divergence: JSD(P, Q) = ½ KL(P ∥ M) + ½ KL(Q ∥ M), where M = ½(P + Q) and KL is the Kullback–Leibler divergence; unlike KL, it is symmetric and bounded.
• The dissimilarity between two ensemble data objects then combines the Euclidean distance between their ensemble means with the JSD between their ensemble distributions.
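A minimal sketch of the JSD on discrete distributions (the paper evaluates it on the KDE-reconstructed densities; the discretization here is my simplification):

```python
import numpy as np

def jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions.
    Symmetric, and bounded by ln(2) with the natural logarithm."""
    p = np.asarray(p, dtype=float); p = p / p.sum()
    q = np.asarray(q, dtype=float); q = q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

For identical distributions the divergence is 0; for distributions with disjoint support it reaches its maximum, ln 2.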

Enhanced Laplacian Based Projection

• Inspired by Least Square Projection (LSP), a two-step local technique. Basic idea:
– First, a subset of the data objects is projected to the visual space.
– Then, the rest of the data objects are interpolated according to a K-nearest-neighbor graph.
• Approach:
– Select the initial control points using the K-center algorithm; without prior information about the data, a fixed number of points is picked this way.
– Calculate a set of K nearest neighbors Ni for each ensemble object Ui. To avoid pairwise distributional-difference calculations, the control points and the nearest neighbors are selected based only on the ensemble means.
– Ni may not contain the true K nearest neighbors of Ui, since only the ensemble-mean information is used. During the second projection step, those objects are identified and assigned to a set Ri, which extends Ni.


• The control points are then projected using an iterative majorization algorithm, Scaling by Majorizing a Complicated Function (SMACOF), a classical MDS technique; the resulting low-dimensional control points are held fixed in the next step.

• Laplacian-based projection schemes rely on the theory of convex combination: the low-dimensional representation of each high-dimensional data object can be regarded as a linear combination of its neighbors in the visual space.
– Let Vi be the projection of ensemble data object Ui; by the convex-combination theory, Vi can be written as a weighted sum of the projections of its neighbors, with non-negative weights that sum to one.
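A simplified sketch of this second step, assuming the convex-combination weights W are already given (the paper's enhanced scheme also folds distributional dissimilarities into these weights, which is omitted here):

```python
import numpy as np

def laplacian_project(W, ctrl_idx, V_ctrl):
    """Place all points in 2D given fixed control-point positions by
    solving the linear system implied by convex-combination weights W
    (row-stochastic over each point's neighbors)."""
    n = W.shape[0]
    free = np.setdiff1d(np.arange(n), ctrl_idx)
    # V_i = sum_j W_ij V_j  =>  (I - W) V = 0, with controls fixed
    A = np.eye(n) - W
    # Move the known control coordinates to the right-hand side
    b = -A[np.ix_(free, ctrl_idx)] @ V_ctrl
    V = np.zeros((n, 2))
    V[ctrl_idx] = V_ctrl
    V[free] = np.linalg.solve(A[np.ix_(free, free)], b)
    return V
```

With equal weights, a point placed between two fixed controls lands at their midpoint, which matches the convex-combination intuition.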


Uncertainty Quantification

• The overall uncertainty Oi of ensemble data object Ui is the sum of the standard deviations over all dimensions.

• Deviation of the t-th ensemble member of Ui is defined as its Euclidean distance to the ensemble mean:
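Both quantities above are direct to compute; a minimal sketch (function names are mine):

```python
import numpy as np

def ensemble_uncertainty(members):
    """Overall uncertainty of one ensemble object: the sum of the
    per-dimension standard deviations of its members."""
    return members.std(axis=0, ddof=1).sum()

def member_deviations(members):
    """Deviation of each ensemble member: its Euclidean distance to
    the ensemble mean."""
    return np.linalg.norm(members - members.mean(axis=0), axis=1)
```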

From Quantification to Visualization

• Ensemble bar: a color-bar-based representation that depicts the uncertainty of each data object.

Visual Exploration and Interaction

Synthetic Data

NBA Players’ Statistics

Numerical Weather Simulation Dataset

Thank You
