NERCOMP Workshop, Dec. 2, 2008 Information Visualization: the Other Half of Data Analysis Dr. Matthew Ward Computer Science Department Worcester Polytechnic Institute

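A Data Analysis Pipeline

Raw Data -> Processed Data -> Hypotheses/Models -> Results

Raw Data -> Processed Data: Cleaning, Filtering, Transforming
Processed Data -> Hypotheses/Models: Statistical Analysis, Pattern Recognition, Knowledge Discovery
Hypotheses/Models -> Results: Validation
Labeled points in the diagram: A, B, C, D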

Where Does Visualization Come In?

All stages can benefit from visualization
  A: identify bad data, select subsets, help choose transforms (exploratory)
  B: help choose computational techniques, set parameters, use vision to recognize, isolate, classify patterns (exploratory)
  C: superimpose derived models on data (confirmatory)
  D: present results (presentation)

What do we need to know to do Information Visualization?

Characteristics of data
  Types, size, structure
  Semantics, completeness, accuracy
Characteristics of user
  Perceptual and cognitive abilities
  Knowledge of domain, data, tasks, tools
Characteristics of graphical mappings
  What are possibilities
  Which convey data effectively and efficiently
Characteristics of interactions
  Which support the tasks best
  Which are easy to learn, use, remember

Issues Regarding Data

Type may indicate which graphical mappings are appropriate
  Nominal vs. ordinal
  Discrete vs. continuous
  Ordered vs. unordered
  Univariate vs. multivariate
  Scalar vs. vector vs. tensor
  Static vs. dynamic
  Values vs. relations
Trade-offs between size and accuracy needs
Different orders/structures can reveal different features/patterns

Issues Regarding Users

What graphical attributes do we perceive accurately?
What graphical attributes do we perceive quickly?
Which combinations of attributes are separable?
Coping with change blindness
How can visuals support the development of accurate mental models of the data?
Relative vs. absolute judgements – impact on tasks

Issues Regarding Mappings

Variables include shape, size, orientation, color, texture, opacity, position, motion, ...
Some of these have an order, others don’t
Some use up significant screen space
Sensitivity to occlusion
Domain customs/expectations

www3.sympatico.ca/blevis/Image10.gif

Issues Regarding Interactions

Interaction critical component
Many categories of techniques
  Navigation, selection, filtering, reconfiguring, encoding, connecting, and combinations of above
Many “spaces” in which interactions can be applied
  Screen/pixels, data, data structures, graphical objects, graphical attributes, visualization structures

Importance of Evaluation

Easy to design bad visualizations
Many design rules exist – many conflict, many routinely violated
5 E’s of evaluation: effective, efficient, engaging, error tolerant, easy to learn
Many styles of evaluation (qualitative and quantitative):
  Use/case studies
  Usability testing
  User studies
  Longitudinal studies
  Expert evaluation
  Heuristic evaluation

Different Rules -> Different Views

Courtesy of Aisee.com

Categories of Mappings

Based on data characteristics
  Numbers, text, graphs, software, ...
Logical groupings of techniques (Keim)
  Standard: bars, lines, pie charts, scatterplots
  Geometrically transformed: landscapes, parallel coordinates
  Icon-based: stick figures, faces, profiles
  Dense pixels: recursive segments, pixel bar charts
  Stacked: treemaps, dimensional stacking
Based on dimension management (Ward)
  Dimension subsetting: scatterplots, pixel-oriented methods
  Dimension reconfiguring: glyphs, parallel coordinates
  Dimension reduction: PCA, MDS, Self Organizing Maps
  Dimension embedding: dimensional stacking, worlds within worlds

Scatterplot Matrix

Each pair of dimensions generates a single scatterplot
All combinations arranged in a grid or matrix, each dimension controls a row or column
Look for clusters, outliers, partial correlations, trends
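
The grid layout described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not how XmdvTool does it: the function name `scatterplot_matrix_panels` and the tuple-per-record data format are hypothetical, and the choice that the column dimension drives x while the row dimension drives y is a convention picked for the example.

```python
from itertools import product


def scatterplot_matrix_panels(records, dim_names):
    """Return {(row_dim, col_dim): [(x, y), ...]} for every cell of the grid.

    records   -- list of equal-length tuples, one value per dimension
    dim_names -- one label per dimension, in the same order as the tuples
    """
    panels = {}
    indexed = list(enumerate(dim_names))
    for (r, row_dim), (c, col_dim) in product(indexed, repeat=2):
        # Assumed convention: column dimension on x, row dimension on y.
        panels[(row_dim, col_dim)] = [(rec[c], rec[r]) for rec in records]
    return panels


# Three dimensions give a 3 x 3 grid of panels.
example_records = [(1, 10, 100), (2, 20, 200)]
example_panels = scatterplot_matrix_panels(example_records, ["a", "b", "c"])
```

With N dimensions the grid has N x N cells, so the screen space per plot shrinks quickly as dimensions are added.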

Parallel Coordinates

Each variable/dimension is a vertical line
Bottom of line is low value, top is high
Each record creates a polyline across all dimensions
Similar records cluster on the screen
Look for clusters, outliers, line angles, crossings
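
The mapping from records to polylines can be sketched as follows. This is a minimal sketch, not the XmdvTool implementation: the function name `parallel_coordinates_polylines` is hypothetical, axes are assumed to sit at x = 0, 1, 2, ..., each axis is assumed to be scaled linearly between the minimum and maximum seen in the data, and y is assumed to grow upward so that 0.0 is the bottom of an axis and 1.0 is the top.

```python
def parallel_coordinates_polylines(records):
    """Turn each record into a list of (x, y) vertices, one per axis."""
    n_dims = len(records[0])
    lows = [min(rec[d] for rec in records) for d in range(n_dims)]
    highs = [max(rec[d] for rec in records) for d in range(n_dims)]

    polylines = []
    for rec in records:
        vertices = []
        for d in range(n_dims):
            span = highs[d] - lows[d]
            # A constant dimension has no range; place it mid-axis (assumption).
            y = (rec[d] - lows[d]) / span if span else 0.5
            vertices.append((float(d), y))
        polylines.append(vertices)
    return polylines


example_lines = parallel_coordinates_polylines([(0, 10), (5, 30), (10, 20)])
```

Records with similar values produce polylines that run close together, which is why clusters show up as bundles of lines.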

Star Glyph

Glyphs are shapes whose attributes are controlled by data values
Star glyph is a set of N rays spaced at equal angles
Length of each ray proportional to value for that dimension
Line connects all endpoints of shape
Lay glyphs out in rows and columns
Look for shape similarities and differences, trends
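
The geometry of a single star glyph can be sketched as below. This is an illustrative sketch only: the function name `star_glyph`, the `radius` parameter, and the scaling of each ray by a per-dimension maximum are assumptions for the example, and values are assumed to be non-negative with positive maxima.

```python
import math


def star_glyph(values, max_values, radius=1.0):
    """Return (ray_endpoints, closed_outline) for one record.

    values     -- one non-negative value per dimension
    max_values -- the positive value that maps to a full-length ray, per dimension
    """
    n = len(values)
    endpoints = []
    for i, (value, vmax) in enumerate(zip(values, max_values)):
        angle = 2 * math.pi * i / n          # N rays at equal angles
        length = radius * value / vmax       # ray length proportional to value
        endpoints.append((length * math.cos(angle), length * math.sin(angle)))
    # Repeat the first endpoint so the connecting line closes the shape.
    outline = endpoints + endpoints[:1]
    return endpoints, outline


example_endpoints, example_outline = star_glyph([1, 2, 4, 2], [4, 4, 4, 4])
```

Each ray starts at the glyph center (0, 0); to lay glyphs out in rows and columns, the same points would be offset by each glyph's grid position.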

Other Types of Glyphs

Dimensional Stacking

Break each dimension range into bins
Break the screen into a grid using the number of bins for 2 dimensions
Repeat the process for 2 more dimensions within the subimages formed by first grid, recurse through all dimensions
Look for repeated patterns, outliers, trends, gaps
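
The recursive embedding can be sketched as index arithmetic. This is a hedged illustration rather than the tool's actual code: the helper names `bin_index` and `stacked_cell` are hypothetical, bins are assumed to be equal-width, and dimensions are assumed to be consumed in pairs with even-indexed dimensions nesting horizontally and odd-indexed dimensions nesting vertically, outermost pair first.

```python
def bin_index(value, low, high, n_bins):
    """Equal-width binning of value within [low, high], clamped to a valid bin."""
    if high == low:
        return 0
    idx = int((value - low) / (high - low) * n_bins)
    return min(max(idx, 0), n_bins - 1)


def stacked_cell(bins, bin_counts):
    """Map per-dimension bin indices to a (column, row) cell on the screen grid.

    bins       -- bin index of the record in each dimension
    bin_counts -- number of bins used for each dimension
    """
    col = row = 0
    for d, (b, n) in enumerate(zip(bins, bin_counts)):
        if d % 2 == 0:
            col = col * n + b   # subdivide the current cell horizontally
        else:
            row = row * n + b   # subdivide the current cell vertically
    return col, row


# Four dimensions: a 2 x 2 outer grid, each cell holding a 3 x 3 inner grid.
example_cell = stacked_cell([1, 0, 2, 1], [2, 2, 3, 3])
```

Under these assumptions the finished grid has as many columns as the product of the even-indexed bin counts and as many rows as the product of the odd-indexed ones, so bin counts need to stay small.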

Pixel-Oriented Techniques

Each dimension creates an image
Each value controls color of a pixel
Many organizations of pixels possible (raster, spiral, circle segment, space-filling curves)
Reordering data can reveal interesting features, relations between dimensions
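
A raster-order version of this idea can be sketched as follows. The sketch covers only the simplest of the layouts listed above and rests on assumptions: the function name `pixel_image` is hypothetical, color is reduced to a 0-255 gray level, and values are scaled linearly between the minimum and maximum of the dimension.

```python
def pixel_image(values, width):
    """Lay out one dimension's values as rows of gray levels in raster order."""
    low, high = min(values), max(values)
    span = high - low
    # A constant dimension has no range; use a mid-gray (assumption).
    gray = [round(255 * (v - low) / span) if span else 128 for v in values]
    return [gray[i:i + width] for i in range(0, len(gray), width)]


one_dimension = [0, 2, 10, 10, 2, 0]
as_recorded = pixel_image(one_dimension, width=3)
after_sorting = pixel_image(sorted(one_dimension), width=3)
```

Comparing `as_recorded` with `after_sorting` shows how the same values form a different picture once the records are reordered. To compare dimensions, the same record order would be applied to the image of every dimension.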

Methods to Cope with Scale

Many modern datasets contain large number of records (millions and billions) and/or dimensions (hundreds and thousands)
Several strategies to handle scale problems
  Sampling
  Filtering
  Clustering/aggregation
Techniques can be automated or user-controlled
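
The three strategies can each be sketched in a few lines. These are deliberately simple stand-ins, not the methods used in the talk: the function names are hypothetical, sampling is uniform random, filtering is a range query on one dimension, and aggregation replaces the records in each equal-width bin of one dimension with their mean (a much simpler grouping than a real clustering algorithm).

```python
import random
from collections import defaultdict


def sample_records(records, k, seed=0):
    """Uniform random sample of k records; the seed makes it repeatable."""
    return random.Random(seed).sample(records, k)


def filter_records(records, dim, low, high):
    """Keep only records whose value in dimension dim lies in [low, high]."""
    return [rec for rec in records if low <= rec[dim] <= high]


def aggregate_records(records, dim, bin_width):
    """Replace the records in each bin of dimension dim with their mean record."""
    groups = defaultdict(list)
    for rec in records:
        groups[int(rec[dim] // bin_width)].append(rec)
    return {
        b: tuple(sum(column) / len(column) for column in zip(*members))
        for b, members in sorted(groups.items())
    }


example_data = [(i, i * 2) for i in range(10)]
example_subset = filter_records(example_data, dim=0, low=2, high=4)
example_summary = aggregate_records(example_data, dim=0, bin_width=5)
```

In a user-controlled setting, parameters such as `k`, the filter range, and `bin_width` would be driven by interactive widgets rather than fixed in code.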

Examples of Data Clustering

Example of Dimension Clustering

Example of Data Sampling

The Visual Data Analysis (VDA) Process

Overview
Filter/cluster/sample
Scan
Select “interesting”
Details on demand
Link between different views
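
The select, details-on-demand, and linking steps can be sketched with a shared selection object that every view listens to. This is one common way to wire linked views, offered as an assumption rather than a description of XmdvTool's design; the names `LinkedSelection` and `details_on_demand` are hypothetical.

```python
class LinkedSelection:
    """Shared selection that notifies every registered view when it changes."""

    def __init__(self):
        self.selected = frozenset()
        self._views = []

    def register(self, view_callback):
        """Add a callable that redraws one view given the selected record ids."""
        self._views.append(view_callback)

    def select(self, record_ids):
        """Store the new selection and push it to all linked views."""
        self.selected = frozenset(record_ids)
        for notify in self._views:
            notify(self.selected)


def details_on_demand(records, selected_ids):
    """Return the full records for the selected ids only."""
    return {i: records[i] for i in sorted(selected_ids)}


# Two stand-in "views" that simply record what they were asked to highlight.
scatter_view, parallel_view = [], []
selection = LinkedSelection()
selection.register(scatter_view.append)
selection.register(parallel_view.append)
selection.select([2, 0])
```

Because both views receive the same set of record ids, a selection made in one display is highlighted in the others without the views knowing about each other.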

Demonstration

Summary

Visualization a powerful component of the data analysis process
Each stage of analysis can be enhanced
Visualization can help guide computational analysis, and vice versa
Multiple linked views and a rich assortment of interactions key to success

For Further Info on XmdvTool

http://davis.wpi.edu/~xmdv
Contains source code, Windows executable, data sets, documentation, copies of most Xmdv publications, case studies

We gratefully acknowledge support for the development of XmdvTool from the National Science Foundation (IIS-9732897, IRIS-9729878, IIS-0119276, IIS-0414380, CCF-0811510, and IIS-0812027) and the National Security Agency

Questions?