Big Data Visual Analytics: Challenges and Opportunities

Page 1: Big Data Visual Analytics:  Challenges  and Opportunities

Intro Definition Size Complexity Wrap-up1/54 Individual

Big Data Visual Analytics: Challenges and Opportunities

Remco Chang and Eli Brown, Tufts University

Talk Outline

• Visual Analytics + Big Data:

1. What is Big Data Visual Analytics? Definition and Problem Statement

2. How to Visualize Large Amounts of Data?

3. Tufts Research on Individual Differences

4. How to Visualize High Dimensional Data?

1. What is Big Data Visual Analytics? A Definition and Problem Statement

Defining Big Data for Visual Analytics

• Let’s say that I have a billion data items. Is that Big Data?

• What if:
  – These data items only have two attributes (e.g., latitude, longitude)?
  – I transpose this dataset so that I have two rows of data, but with a billion attributes?

Defining Big Data for Visual Analytics

• Big Data is NOT just about the size of your data

• For the purpose of this talk, let’s talk about Big Data in the following way:
  – Size: The number of rows (n)
    • Assume the amount of data cannot fit into a desktop computer’s memory
  – Complexity: The number of attributes (k)
    • Assume (k > 2)
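
As a rough illustration of the two axes, the sketch below estimates a table's in-memory footprint from n and k and labels it as large and/or complex. The 8 bytes per value and the 8 GiB memory budget are assumptions made for this example, not figures from the talk.

```python
def classify_dataset(n_rows, k_attrs, bytes_per_value=8, memory_bytes=8 * 2**30):
    """Label a dataset along two axes: size (n) and complexity (k).

    bytes_per_value and memory_bytes are illustrative assumptions.
    """
    footprint = n_rows * k_attrs * bytes_per_value
    return {
        "footprint_bytes": footprint,
        "large": footprint > memory_bytes,   # does not fit in the assumed memory
        "complex": k_attrs > 2,              # more than two attributes
    }


# A billion (latitude, longitude) pairs: large, but only two attributes.
points = classify_dataset(10**9, 2)
# The transposed table: two rows with a billion attributes each.
transposed = classify_dataset(2, 10**9)
```

Both tables occupy the same number of bytes under these assumptions, yet only the transposed one counts as complex, which is the distinction the definition draws.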

Problem Statements

• Considering the two together is too difficult, so we’ll tackle the two issues independently for now

• Our goal is to visualize (large | complex) data sets while:
  – Maintaining interactivity: rendering at 10 fps
  – Allowing for operations on the data (zoom, pivot, etc.)
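
A 10 fps target translates into a time budget of 100 ms per frame. The sketch below makes that arithmetic explicit; splitting the budget into a query part and a render part is an assumption of this example, not something the talk specifies.

```python
def frame_budget_ms(target_fps=10):
    """Time available per frame, in milliseconds, at the target frame rate."""
    return 1000.0 / target_fps


def is_interactive(query_ms, render_ms, target_fps=10):
    """True if fetching and drawing one frame fits inside the frame budget."""
    return query_ms + render_ms <= frame_budget_ms(target_fps)
```

For example, a 60 ms query plus a 30 ms draw fits the budget at 10 fps, while an 80 ms query plus the same draw does not.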

2. How to Visualize Large Amounts of Data?

Problem Statement

Visualization on Commodity Hardware

Large Data in a Data Warehouse

Problem Statement

• Constraint: Data is too big to fit into the memory or hard drive of the personal computer
  – Note: Ignoring various database technologies (OLAP, Column-Store, NoSQL, Array-Based, etc.)

• Classic Computer Science Problem…

• What are some previous techniques?
  – Truncate (sample, filter)
  – Resolution reduction (“blurring”, image zooming)
  – Stream (think Netflix, Hulu)
  – Pre-fetch (think open world 3D video games)

Pros and Cons: Truncate

• Truncate (sample, filter)
  – Pros: Easy to implement; efficient; scalable
  – Cons: Sampling is often data- or task-dependent

[Figure: Sampling Algorithm]
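
One simple way to truncate data that does not fit in memory is reservoir sampling, sketched below. It is shown only as a common, task-agnostic choice; the talk does not say which sampling algorithm the figure depicts, and a uniform sample may be a poor fit for some tasks (for example, finding rare outliers).

```python
import random


def reservoir_sample(rows, sample_size, seed=None):
    """Return a uniform random sample of sample_size rows from an iterable.

    Only sample_size rows are held in memory at once (Algorithm R).
    """
    rng = random.Random(seed)
    reservoir = []
    for index, row in enumerate(rows):
        if index < sample_size:
            reservoir.append(row)
        else:
            # Keep the new row with probability sample_size / (index + 1).
            slot = rng.randint(0, index)
            if slot < sample_size:
                reservoir[slot] = row
    return reservoir


# Draw 1,000 rows from a stream of a million without materializing the stream.
sample = reservoir_sample(range(1_000_000), sample_size=1000, seed=42)
```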

Pros and Cons: Resolution Reduction

• Resolution reduction (“blurring”)
  – Pros: Allows hierarchical navigation
  – Cons:
    • Fine details are often lost
    • Not all data types can be easily blurred (order-invariant data)
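
For spatial data, resolution reduction can be sketched as binning points into a grid of counts and then summing 2x2 blocks to get each coarser level. The unit-square domain and the function names below are assumptions for illustration.

```python
def bin_points(points, grid_size):
    """Count points per cell on a grid_size x grid_size grid over the unit square."""
    grid = [[0] * grid_size for _ in range(grid_size)]
    for x, y in points:
        col = min(int(x * grid_size), grid_size - 1)
        row = min(int(y * grid_size), grid_size - 1)
        grid[row][col] += 1
    return grid


def coarsen(grid):
    """Halve the resolution by summing each 2x2 block (grid side must be even)."""
    half = len(grid) // 2
    return [
        [
            grid[2 * r][2 * c] + grid[2 * r][2 * c + 1]
            + grid[2 * r + 1][2 * c] + grid[2 * r + 1][2 * c + 1]
            for c in range(half)
        ]
        for r in range(half)
    ]


points = [(0.1, 0.1), (0.2, 0.1), (0.9, 0.9), (0.6, 0.4)]
fine = bin_points(points, 4)   # detailed level
coarse = coarsen(fine)         # overview level with the same total count
```

The coarse level keeps the totals but can no longer tell the two nearby points apart, which is the loss of fine detail listed under the cons. The scheme also relies on the data having a meaningful spatial order to bin over.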

Pros and Cons: Streaming

• Stream [Fisher et al. CHI 2012]
  – Pros: Query can be terminated at any time
  – Cons: It is inefficient on the database end

[Figure: t = 1 second vs. t = 5 minutes]

Fisher et al., Trust Me, I'm Partially Right: Incremental Visualization Lets Analysts Explore Large Datasets Faster. CHI 2012
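
The idea of an answer that improves over time and can be stopped early can be sketched with a running mean and a confidence interval that is refreshed after every batch. This is a minimal illustration, not the system from the cited paper; it assumes batches arrive in random order, and the 1.96 multiplier is the usual normal approximation for a 95% interval.

```python
import math


def incremental_mean(batches, z=1.96):
    """Yield (rows_seen, mean, half_width) after each batch of numeric rows.

    half_width is a normal-approximation confidence interval on the mean.
    """
    count, mean, m2 = 0, 0.0, 0.0
    for batch in batches:
        for value in batch:
            # Welford's online update for the mean and squared deviations.
            count += 1
            delta = value - mean
            mean += delta / count
            m2 += delta * (value - mean)
        if count > 1:
            std_err = math.sqrt(m2 / (count - 1) / count)
            yield count, mean, z * std_err
        elif count == 1:
            yield count, mean, float("inf")


batches = [[4.0, 6.0], [5.0, 5.0], [3.0, 7.0]]
for rows_seen, mean, half_width in incremental_mean(batches):
    print(f"{rows_seen} rows: {mean:.2f} +/- {half_width:.2f}")
```

Because the function is a generator, the caller can stop iterating as soon as the interval is narrow enough, which mirrors the "terminate at any time" advantage.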

Pros and Cons: Pre-Fetch

• Pre-fetch
  – Pros: Seamless to the user
  – Cons: Predicting the future is kind of hard

• Possible in 3D games because of limited degrees of freedom
• http://www.youtube.com/watch?v=n27NLuc44Lk

Pros and Cons: Pre-Fetch

• Pre-fetch in Visual Analytics [Chan, Hanrahan, 2008 VAST]
  – Limit the types of operations a user can do
  – Allows interactive analysis of over a billion data points

Chan et al., Maintaining Interactivity While Exploring Massive Time Series. IEEE VAST 2008
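
Limiting the allowed operations makes the set of possible next views small enough to fetch ahead of time. The sketch below assumes a time series stored as a pyramid of tiles, where level L holds 2**L tiles and the user may only pan or zoom by one step. The tile layout, the `TileCache` class, and the `load_tile` callback are hypothetical and do not describe the cited system.

```python
def prefetch_candidates(level, index, max_level):
    """Tiles reachable in one step from tile (level, index) of a 1D time series.

    Assumes only pan left/right and zoom in/out are allowed, and that
    level L splits the series into 2**L tiles (a hypothetical layout).
    """
    candidates = []
    if index > 0:
        candidates.append((level, index - 1))           # pan left
    if index < 2**level - 1:
        candidates.append((level, index + 1))           # pan right
    if level > 0:
        candidates.append((level - 1, index // 2))      # zoom out
    if level < max_level:
        candidates.append((level + 1, 2 * index))       # zoom in, left half
        candidates.append((level + 1, 2 * index + 1))   # zoom in, right half
    return candidates


class TileCache:
    """Keeps the current tile plus every tile one interaction away."""

    def __init__(self, load_tile, max_level):
        self.load_tile = load_tile   # caller-supplied fetch function
        self.max_level = max_level
        self.tiles = {}

    def view(self, level, index):
        key = (level, index)
        if key not in self.tiles:
            self.tiles[key] = self.load_tile(level, index)   # cache miss
        for neighbor in prefetch_candidates(level, index, self.max_level):
            if neighbor not in self.tiles:
                self.tiles[neighbor] = self.load_tile(*neighbor)
        return self.tiles[key]
```

With at most five candidates per view, the next pan or zoom is always served from the cache. A real implementation would fetch in the background and evict old tiles, both of which this sketch omits.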

Research at Tufts: User-Centric Pre-Fetching

Joint work with Caroline Ziemkiewicz, Alvitta Ottley

Motivation

Individual Differences and Interaction Pattern

• Existing research shows that all the following factors affect how someone uses a visualization:
  – Spatial Ability
  – Cognitive Workload/Mental Demand
  – Personality
  – Experience (novice vs. expert)
  – Emotional State
  – Perceptual Speed
  – … and more

Preliminary Study – Novice v. Expert

• Novice vs. expert financial analysts’ use of the WireVis system when searching for fraud
  – Novices exhibited “breadth-first-search” behaviors
  – Experts exhibited “depth-first-search” behaviors

• Our next step is to use Machine Learning methods to distinguish a user by analyzing their interactions in real-time

Preliminary Study – Locus of Control

• Identified the personality factor, Locus of Control (LOC), as a predictor for how a user interacts with the following visualizations:

Results

• With the list view compared to the containment view, internal LOC users are:
  – faster (by 70%)
  – more accurate (by 34%)

• Only for complex (inferential) tasks
• The speed improvement is about 2 minutes (116 seconds)

R. Chang et al., How Locus of Control Influences Compatibility with Visualization Style, IEEE VAST 2011.
R. Chang et al., How Visualization Layout Relates to Locus of Control and Other Personality Factors. TVCG 2012. To Appear.

Cognitive / Affective Priming

LOC Priming

[Figure: Performance (Poor to Good) plotted against Visual Form (List-View, Containment), with series for Internal LOC, External LOC, Average LOC, and Average -> Internal.]

R. Chang et al., Poster: Priming locus of control to affect performance. VAST Poster 2012.

Affective Priming on Visual Judgment

R. Chang et al., Influencing Visual Judgment Through Affective Priming, CHI 2013. To Appear

Preliminary Study – Using Brain Sensing (fNIRS)

Functional Near-Infrared Spectroscopy
• a lightweight brain sensing technique
• measures mental demand (working memory)

R. Chang et al., Using fNIRS Brain Sensing to Evaluate Information Visualization Interfaces. CHI 2013. To Appear

This is Your Brain on Bar graphs and Pie Charts

3-back test

Quick Summary

• Pre-Fetching is a promising approach for supporting interactive visual analysis of large amounts of data

• Our “User-Centric” approach is three-pronged:
  – Understand the user’s cognitive “traits” (e.g., LOC, Numeracy, Spatial Ability, etc.)
  – Understand the user’s cognitive “states” (Cognitive Load, Affect, etc.)
  – Alter the user’s behavior by influencing cognitive traits and states through priming

3. How to Visualize Complex (High-Dimensional) Data?

Why is This Problem Hard?

You can only see 2D because your monitor is 2D

In other words: you can show at most 2-dimensional data.

Everything else is a hack.

Ways to Visualize k-Dimensional Data

• Two primary ways to do this “hack”
  – Divide up the 2D screen into multiple 2D regions
    • Showing no correlation between dimensions
    • Showing k-1 correlations
    • Showing all pair-wise correlations
  – Project k-Dimensional Data into 2D
    • 3D to 2D
    • k-D projection

Ways to Visualize k-Dimensional Data

• Divide up the 2D screen into multiple 2D regions
  – Showing k-1 correlations: Parallel Coordinates
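
Parallel coordinates draw one vertical axis per attribute and one polyline per row, so only the k-1 pairs of adjacent axes are compared directly. The sketch below computes the polyline vertices; the min-max scaling and the unit-sized drawing area are assumptions of this example.

```python
def parallel_coordinates(rows, width=1.0, height=1.0):
    """Map each k-attribute row to a polyline with k vertices on k vertical axes.

    Each attribute is min-max scaled onto its own axis; axes are evenly spaced.
    """
    k = len(rows[0])
    lows = [min(row[j] for row in rows) for j in range(k)]
    highs = [max(row[j] for row in rows) for j in range(k)]
    polylines = []
    for row in rows:
        vertices = []
        for j, value in enumerate(row):
            span = highs[j] - lows[j]
            y = 0.5 if span == 0 else (value - lows[j]) / span
            x = j / (k - 1) if k > 1 else 0.0
            vertices.append((x * width, y * height))
        polylines.append(vertices)
    return polylines


rows = [(1, 10, 100), (2, 30, 50), (3, 20, 0)]
lines = parallel_coordinates(rows)
# lines[0] == [(0.0, 0.0), (0.5, 0.0), (1.0, 1.0)]
```

Each polyline has k vertices and therefore k-1 segments, one per adjacent pair of axes. Reordering the axes changes which pairs are visible.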

Ways to Visualize k-Dimensional Data

• Divide up the 2D screen into multiple 2D regions
  – Showing all pair-wise correlations: Scatterplot Matrix
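
A scatterplot matrix needs one panel per pair of attributes, so the number of distinct panels grows as k(k-1)/2. The short sketch below enumerates the pairs; the attribute names are made up for illustration.

```python
from itertools import combinations


def scatterplot_matrix_panels(attribute_names):
    """List the attribute pairs shown in the upper triangle of a scatterplot matrix."""
    return list(combinations(attribute_names, 2))


panels = scatterplot_matrix_panels(["price", "volume", "age", "rating"])
# 4 attributes give 4 * 3 / 2 = 6 distinct panels.
panel_counts = {k: k * (k - 1) // 2 for k in (4, 10, 100)}
# panel_counts == {4: 6, 10: 45, 100: 4950}
```

At 100 attributes the screen would have to hold 4,950 panels, which is why dividing the screen scales poorly as k grows.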

Ways to Visualize k-Dimensional Data

• Project k-Dimensional Data into 2D
  – k-D projection

Example projection methods (dimension reduction):
• PCA
• MDS
• LDA
• LLE

There are many others. Usually, they try to preserve distances in 2D as they exist in k-D.
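
One way to check how well a projection preserves distances is a normalized stress score over all pairs of points. The formula below is one common choice, used here as an assumption; it is not tied to any particular method in the list, and the two axis-dropping "projections" are stand-ins for a real dimension-reduction method.

```python
import math
from itertools import combinations


def stress(high_dim_points, low_dim_points):
    """Normalized stress: 0 means every pairwise distance is preserved exactly.

    Assumes at least two distinct points, so the denominator is non-zero.
    """
    squared_error = 0.0
    squared_total = 0.0
    for i, j in combinations(range(len(high_dim_points)), 2):
        d_high = math.dist(high_dim_points[i], high_dim_points[j])
        d_low = math.dist(low_dim_points[i], low_dim_points[j])
        squared_error += (d_high - d_low) ** 2
        squared_total += d_high ** 2
    return math.sqrt(squared_error / squared_total)


# Four 3-D points that vary mostly in the first two attributes.
points_3d = [(0, 0, 0.1), (1, 0, 0.0), (0, 1, 0.1), (1, 1, 0.0)]
keep_xy = [(x, y) for x, y, _ in points_3d]   # drops the low-variance attribute
keep_xz = [(x, z) for x, _, z in points_3d]   # drops an informative attribute
```

Here `stress(points_3d, keep_xy)` comes out much lower than `stress(points_3d, keep_xz)`, because dropping the nearly constant third attribute changes the pairwise distances far less than dropping the second.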

What We Have Done (at Tufts)

• We like projection methods because they are more scalable than the “divide the screen” methods

• iPCA – does interaction help in understanding high dimensional data?
  – Demo

• Dis-Function – are interactions in 2D meaningful (recoverable) in k-D?
  – Switch to Eli

Summary

• Visual Analytics + Big Data:

1. Definition of Big Data Visual Analytics
  • (Large | Complex) Data Analysis

2. How to Visualize Large Amounts of Data?
  • Pre-Fetching using individual differences and priming

3. How to Visualize High Dimensional Data?
  • nD to 2D Projection
  • Translating interactions from 2D to nD
