39
Network-based Visual Analysis of Tabular Data Zhicheng Liu, Shamkant Navathe, John Stasko

Network-based Visual Analysis of Tabular Data

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Network-based Visual Analysis of Tabular Data

Network-based Visual Analysis of Tabular Data

Zhicheng Liu, Shamkant Navathe, John Stasko

Page 2: Network-based Visual Analysis of Tabular Data

Tabular Data

2

Page 3: Network-based Visual Analysis of Tabular Data

Tabular Data

• Rows and columns

• Rows are data cases; columns are attributes/dimensions

• Attribute types

o Quantitative (numbers)

o Ordinal (e.g. small, medium, large)

o Nominal (names, categories)

3

Page 4: Network-based Visual Analysis of Tabular Data

Insight Discovery on Tabular Data: Example

• NSF Grants

o Grant Title

o Amount

o Date

o Program Manager

o Awardee / Researcher Name

o Awardee Affiliation

o …

4

Page 5: Network-based Visual Analysis of Tabular Data

Visualizing Tabular Data

5

[Rao and Card, 1994]

[Tableau]

[Spotfire]

Page 6: Network-based Visual Analysis of Tabular Data

6

# Grants in each program

Page 7: Network-based Visual Analysis of Tabular Data

7

Amount by date

Page 8: Network-based Visual Analysis of Tabular Data

8

Amount by date & ProgMgr

Page 9: Network-based Visual Analysis of Tabular Data

9

Collaboration between

institutions?

Page 10: Network-based Visual Analysis of Tabular Data

10

Relationship between

ProgMgr and Researchers?

Page 11: Network-based Visual Analysis of Tabular Data

11

• Quantitative attributes

o Nominal data as independent

variables, or not handled

• Patterns of distributions,

correlations and outliers of

numerical values

• Nominal attributes

o Quantiative attributes are

useful too

• Entities with interesting

roles, emergent global

structures

Page 12: Network-based Visual Analysis of Tabular Data

12

Explicit Network Tabular Data

Current State of the Art

Spotfire , Tableau,

TableLens ….

GUESS, UCINet,

SocialAction ….

Page 13: Network-based Visual Analysis of Tabular Data

Problems with this Partition

• Analytical: Network semantics are dynamic

• Usability: Modeling networks is tedious and requires

programming skill

13

Counter-intuitive for Exploratory Analysis

Page 14: Network-based Visual Analysis of Tabular Data

NodeXL

14 [Hansen, et. al. 2010]

Page 15: Network-based Visual Analysis of Tabular Data

Questions & Approaches

15

Goal: Network-based Visual Analysis of Tabular Data

1. Which conceptually

meaningful operations are

necessary to extract and

transform tabular data into

networks for exploratory

analysis?

2. Given a set of operations,

how to provide analysts with

easy access to these

operations and to couple

network modeling with

exploratory analysis?

• Domain independent

• Generalized operations

• Expressive power

• Hide technical details

• Reduce articulatory distance

• Immediate visual feedback

Page 16: Network-based Visual Analysis of Tabular Data

• Networks: Weighted Simple

Graphs

o Undirected

o At most one edge between any two

nodes

o Edges are weighted

Formal Framework

• Tables: Relational model [Codd, 1969]

o Each row is uniquely identifiable

o Values in each cell is atomic: number, boolean, string, date

16

A B

Page 17: Network-based Visual Analysis of Tabular Data

An Example

17

ID LastNm FirstNm Type Date Size Visitee Loc

1 Dodd Chris VA 6/25/09 2018 POTUS WH

2 Smith John VA 6/26/09 237 Office

Visitors WH

3 Smith John AL 6/26/09 144 Amanda

Kepko OEOB

4 Hirani Amyn VA 6/30/09 184 Office

Visitors WH

5 Keehan Carol VA 6/30/09 8 Kristin

Sheehy WH

6 Keehan Carol VA 7/8/09 26 Daniella

Leger OEOB

Page 18: Network-based Visual Analysis of Tabular Data

First-order Graph: Single Table

18

ID LastNm FirstNm Type Date Size Visitee Loc

1 Dodd Chris VA 6/25/09 2018 POTUS WH

2 Smith John VA 6/26/09 237 Office Visitors WH

3 Smith John AL 6/26/09 144 Amanda

Kepko OEOB

4 Hirani Amyn VA 6/30/09 184 Office Visitors WH

5 Keehan Carol VA 6/30/09 8 Kristin Sheehy WH

6 Keehan Carol VA 7/8/09 26 Daniella Leger OEOB

LastNm FirstNm

Dodd Chris

Smith John

Smith John

Hirani Amyn

Keehan Carol

Keehan Carol

Loc

WH

WH

OEOB

WH

WH

OEOB

Page 19: Network-based Visual Analysis of Tabular Data

First-order Graph: Single Table

19

ID LastNm FirstNm Type Date Size Visitee Loc

1 Dodd Chris VA 6/25/09 2018 POTUS WH

2 Smith John VA 6/26/09 237 Office Visitors WH

3 Smith John AL 6/26/09 144 Amanda

Kepko OEOB

4 Hirani Amyn VA 6/30/09 184 Office Visitors WH

5 Keehan Carol VA 6/30/09 8 Kristin Sheehy WH

6 Keehan Carol VA 7/8/09 26 Daniella Leger OEOB

LastNm, FirstNm

Dodd Chris

Smith John

Smith John

Hirani Amyn

Keehan Carol

Keehan Carol

Loc

WH

WH

OEOB

WH

WH

OEOB

Page 20: Network-based Visual Analysis of Tabular Data

First-order Graph: Single Table

20

ID LastNm FirstNm Type Date Size Visitee Loc

1 Dodd Chris VA 6/25/09 2018 POTUS WH

2 Smith John VA 6/26/09 237 Office Visitors WH

3 Smith John AL 6/26/09 144 Amanda

Kepko OEOB

4 Hirani Amyn VA 6/30/09 184 Office Visitors WH

5 Keehan Carol VA 6/30/09 8 Kristin Sheehy WH

6 Keehan Carol VA 7/8/09 26 Daniella Leger OEOB

Type

VA

VA

AL

VA

VA

VA

LastNm, FirstNm

Dodd Chris

Smith John

Smith John

Hirani Amyn

Keehan Carol

Keehan Carol

Loc

WH

WH

OEOB

WH

WH

OEOB

Page 21: Network-based Visual Analysis of Tabular Data

First-order Graph: Single Table

21

ID LastNm FirstNm Type Date Size Visitee Loc

1 Dodd Chris VA 6/25/09 2018 POTUS WH

2 Smith John VA 6/26/09 237 Office Visitors WH

3 Smith John AL 6/26/09 144 Amanda

Kepko OEOB

4 Hirani Amyn VA 6/30/09 184 Office Visitors WH

5 Keehan Carol VA 6/30/09 8 Kristin Sheehy WH

6 Keehan Carol VA 7/8/09 26 Daniella Leger OEOB

[Type]

[Type = VA]

[Type = VA]

[Type = AL]

[Type = VA]

[Type = VA]

[Type = VA]

LastNm, FirstNm

Dodd Chris

Smith John

Smith John

Hirani Amyn

Keehan Carol

Keehan Carol

Loc

WH

WH

OEOB

WH

WH

OEOB

Page 22: Network-based Visual Analysis of Tabular Data

Higher-order Graph: Transformations

• Aggregation

• Projection

• Edge Weighting

• Slicing ‘n Dicing

22

Page 23: Network-based Visual Analysis of Tabular Data

Aggregation: Entity Resolution

23

Dodd, Chris WH

Smith, John WH

Keehan, Carol OEOB

Hirani, Amyn WH

Keehan, Carol WH

Smith, John OEOB

Dodd, Chris

Smith, John WH

Hirani, Amyn

Keehan, Carol

Smith, John

OEOB

original graph after aggregation

Page 24: Network-based Visual Analysis of Tabular Data

Aggregation: Pivoting

24

Dodd, Chris

Smith, John WH

Hirani, Amyn

Keehan, Carol

Smith, John

OEOB

VA WH

AL OEOB

Type Location

original graph after aggregation

Page 25: Network-based Visual Analysis of Tabular Data

Projection

25

Dodd, Chris

OEOB

WH

Hirani, Amyn

Keehan, Carol

Smith, John

Dodd, Chris Smith, John

Hirani, Amyn Keehan, Carol

Page 26: Network-based Visual Analysis of Tabular Data

Edge Weighting

26

ID LastNm FirstNm Type Date Size Visitee Loc

1 Dodd Chris VA 6/25/09 2018 POTUS WH

2 Smith John VA 6/26/09 237 Office Visitors WH

3 Smith John AL 6/26/09 144 Amanda

Kepko OEOB

4 Hirani Amyn VA 6/30/09 184 Office Visitors WH

5 Keehan Carol VA 6/30/09 8 Kristin Sheehy WH

6 Keehan Carol VA 7/8/09 26 Daniella Leger OEOB

Size

2018

237

144

184

8

26

LastNm, FirstNm

Dodd Chris

Smith John

Hirani Amyn

Keehan Carol

Visitee

POTUS

Office Visitors

Amanda

Kepko

Kristin Sheehy

Daniella Leger

Page 27: Network-based Visual Analysis of Tabular Data

Edge Weighting

27

ID LastNm FirstNm Type Date Size Visitee Loc

1 Dodd Chris VA 6/25/09 2018 POTUS WH

2 Smith John VA 6/26/09 237 Office Visitors WH

3 Smith John AL 6/26/09 144 Amanda

Kepko OEOB

4 Hirani Amyn VA 6/30/09 184 Office Visitors WH

5 Keehan Carol VA 6/30/09 8 Kristin Sheehy WH

6 Keehan Carol VA 7/8/09 26 Daniella Leger OEOB

LastNm, FirstNm

Dodd Chris

Smith John

Hirani Amyn

Keehan Carol

Visitee

POTUS

Office Visitors

Amanda

Kepko

Kristin Sheehy

Daniella Leger

2018

237

144

184

8

26

Page 28: Network-based Visual Analysis of Tabular Data

Loc

WH

WH

OEOB

WH

WH

OEOB

Slice ‘n Dice

28

ID LastNm FirstNm Type Date Size Visitee Loc

1 Dodd Chris VA 6/25/09 2018 POTUS WH

2 Smith John VA 6/26/09 237 Office Visitors WH

3 Smith John AL 6/26/09 144 Amanda

Kepko OEOB

4 Hirani Amyn VA 6/30/09 184 Office Visitors WH

5 Keehan Carol VA 6/30/09 8 Kristin Sheehy WH

6 Keehan Carol VA 7/8/09 26 Daniella Leger OEOB

LastNm, FirstNm

Dodd Chris

Smith John

Hirani Amyn

Keehan Carol

Visitee

POTUS

Office Visitors

Amanda

Kepko

Kristin Sheehy

Daniella Leger

Page 29: Network-based Visual Analysis of Tabular Data

Loc

WH

WH

OEOB

WH

WH

OEOB

Slice ‘n Dice

29

ID LastNm FirstNm Type Date Size Visitee Loc

1 Dodd Chris VA 6/25/09 2018 POTUS WH

2 Smith John VA 6/26/09 237 Office Visitors WH

3 Smith John AL 6/26/09 144 Amanda

Kepko OEOB

4 Hirani Amyn VA 6/30/09 184 Office Visitors WH

5 Keehan Carol VA 6/30/09 8 Kristin Sheehy WH

6 Keehan Carol VA 7/8/09 26 Daniella Leger OEOB

LastNm, FirstNm

Dodd Chris

Smith John

Hirani Amyn

Keehan Carol

Visitee

POTUS

Office Visitors

Kristin Sheehy

LastNm, FirstNm

Smith John

Keehan Carol

Visitee

Amanda

Kepko

Daniella Leger

WH OEOB

Page 30: Network-based Visual Analysis of Tabular Data

Expressive Power

• Proximity grouping

• Extending to directed one-mode network

• Limitations

30

Page 31: Network-based Visual Analysis of Tabular Data

Ploceus Interface Overview

31

Data Manage-

ment View

Network Schema

View

Network Visualization View

Page 32: Network-based Visual Analysis of Tabular Data

Direct Manipulation Interface

32

Page 33: Network-based Visual Analysis of Tabular Data

Ploceus

Demo

33

Page 34: Network-based Visual Analysis of Tabular Data

Related Work

34

Centrifuge Orion

Page 35: Network-based Visual Analysis of Tabular Data

Multiple Tables

35

GID Title Program Program

Manager Amount Year

1 Data Mining of

Digital Behavior Statistics

Sylvia

Spengler 2241750 2001

2

Real-time Capture,

Management and

Reconstruction of

Spatio-Temporal

Events

Information

Technology

Research

Maria

Zemankova 430000 2000

3

Statistical Data

Mining

of Time-Dependent

Data with

Applications in

Geoscience and

Biology

ITR for

National

Priorities

Sylvia

Spengler 566644 2003

PID Name Org

1 Padhraic

Smyth

University of

California

Irvine

2 Sharad

Mehrotra

University of

California

Irvine

Person Grant Role

1 1 PI

2 1 coPI

2 2 PI

1 3 PI

Page 36: Network-based Visual Analysis of Tabular Data

Grant Role Person

1 PI 1

1 coPI 2

2 PI 2

3 PI 1

First-order Graph: Multiple Tables

36

GID Title Program ProMgr Amount Year

1 Data Mining of Digital

Behavior Statistics

Sylvia

Spengler 2241750 2001

2 Real-time Capture,

Management …

Information Tech-

nology Research

Maria

Zemankova 430000 2000

3 Statistical Data Mining

of Time-Dependent …

ITR for National

Priorities

Sylvia

Spengler 566644 2003

PID Name Org

1 Padhraic Smyth University of

California Irvine

2 Sharad Mehrotra University of

California Irvine

1 Padhraic Smyth University of

California Irvine 1 Padhraic Smyth

University of

California Irvine

2 Sharad Mehrotra University of

California Irvine 2 Sharad Mehrotra

University of

California Irvine

PID Name Org

Data Mining of Digital

Behavior Statistics

Sylvia

Spengler 2241750 2001

Real-time Capture,

Management …

Information Tech-

nology Research

Maria

Zemankova 430000 2000

Title Program ProMgr Amount Year

Statistical Data Mining

of Time-Dependent …

ITR for National

Priorities

Sylvia

Spengler 566644 2003

Data Mining of Digital

Behavior Statistics

Sylvia

Spengler 2241750 2001

Grant Role Person

1

PI 1

1

coPI 2

2

PI 2

3

PI 1

Page 37: Network-based Visual Analysis of Tabular Data

Open Issues with Multiple Tables (1)

• Join Specification

37

Page 38: Network-based Visual Analysis of Tabular Data

Open Issues with Multiple Tables (2)

• Interpretation of Edge Weights

38

Page 39: Network-based Visual Analysis of Tabular Data

Acknowledgments

39

IIS-0915788 VACCINE Center