Key-node-separated graph clustering and visualization

Takayuki Itoh, Ochanomizu University, Japan

China-Japan Joint Visualization Workshop, 2017/7/24

Itoh Laboratory, Ochanomizu University

Career of the speaker

• 1992 Researcher at IBM Tokyo Research Lab.
  – 1992 M.S. at Waseda Univ.
  – 1997 Ph.D. at Waseda Univ.
  – 2000 Visiting researcher at Carnegie Mellon Univ. (6 months)
  – 2003 (Concurrent) researcher at Kyoto Univ. (2 years)
  Research areas: CAD / SciVis / InfoVis / Distributed Computing

• 2005 Professor at Ochanomizu Univ.
  – 2005 Associate professor
  – 2008 Visiting researcher at Univ. of California, Davis (2 months)
  – 2011 Full professor
  Research areas: InfoVis / Multimedia / HCI / CG applications

Organizing …

• International
  – 2014, 2018 IEEE PacificVis Organizing/General Chair
  – 2015 VINCI General Chair
  – 2012-2015 ACM SAC Multimedia & Visualization Track Chair
  – 2016 ACM Advanced Visual Interfaces (AVI) Associate Program Chair
  – 2018 ACM Intelligent User Interfaces (IUI) Student Volunteer Chair

• Japanese domestic
  – 2014-2016 Director of the Society for Art and Science
  – 2015-2017 Program Committee Chief of the Interaction Symposium
  – 2018-2019 Director of the SIG on Interactive System & Software

Women-viewpoint projects

Skin measurement and synthesis

Music user interface

Cartoon icon generation

Crowdsourcing of photo retouch

Apparel product recommendation

Crowdsourcing of women’s appearance evaluation

Still implementing by myself

• Hierarchical data visualization

Itoh et al., Hierarchical Data Visualization Using a Fast Rectangle-Packing Algorithm, TVCG 2004

Itoh et al., Hierarchical Visualization of Network Intrusion Detection Data in the IP Address Space, CG&A 2006

Still implementing by myself

• Network data visualization

Itoh et al., Key-node-Separated Graph Clustering and Layout for Human Relationship Graph Visualization, CG&A 2015

Itoh et al., A Hybrid Space-Filling and Force-Directed Layout Method for Visualizing Multiple-Category Graphs, PacificVis 2009

Still implementing by myself

• High-dimensional data visualization

Itoh et al., High-dimensional data visualization by interactive construction of low-dimensional parallel coordinate, JVLC 2017

Contents

• Graph Visualization Overview
• Multiple-Category Graph Visualization
• Key-Node-Separated Graph Visualization
  – Concept & Algorithm
  – Experiment
  – On-going work
• General Discussion

Graph Visualization

• Computer-powered graph drawing

Graph drawing … Long history

Euler’s drawing in 1736; Ball’s abstract drawing in 1892

Graph drawing … Hard work

Computers saved graph drawing

• Automatic
• Quick
• Interactive
• Publishable

Visual representation

• Node-link diagram
• Matrix representation

[Figure: the same five-node graph (nodes A, B, C, D, E) drawn as a node-link diagram and as a 5 × 5 adjacency matrix]
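To make the two representations concrete, here is a minimal sketch that turns a node-link edge list into an adjacency matrix. The function name `adjacency_matrix` and the edge list are hypothetical; the actual edges of the five-node example are not recoverable from the slide.

```python
def adjacency_matrix(nodes, edges, directed=False):
    """Build a 0/1 adjacency matrix (list of rows) from a node-link edge list."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        if not directed:
            # An undirected edge fills both cells, so the matrix is symmetric
            matrix[index[v]][index[u]] = 1
    return matrix


# Hypothetical edges between the five nodes A to E (not taken from the slide)
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("A", "E"), ("B", "C"), ("C", "D")]
for name, row in zip(nodes, adjacency_matrix(nodes, edges)):
    print(name, row)
```

The matrix view needs no layout at all, which is why it stays readable for dense graphs where node-link diagrams turn into hairballs; the price is that paths are harder to follow.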

Data types

• Direction: undirected / directed
• Node positions: unfixed / fixed

Applications: Social analysis

Applications: Bioinformatics

Applications: Traffic/Communication

Hairball problem

Node layout (force-directed)
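A force-directed layout treats edges as springs and lets every pair of nodes repel. The sketch below follows the classic Fruchterman-Reingold forces (attraction d^2 / k along edges, repulsion k^2 / d between all pairs, with a cooling limit on each step). The function name, iteration count, and cooling factor are illustrative assumptions, not the settings used in this work.

```python
import math
import random


def force_directed(nodes, edges, size=1.0, iterations=300, seed=0):
    """Fruchterman-Reingold style layout; returns a dict node -> (x, y)."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, size), rng.uniform(0, size)] for n in nodes}
    k = size / math.sqrt(len(nodes))  # ideal distance between nodes
    temperature = size / 5.0          # maximum displacement per iteration
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Every pair of nodes repels with force k^2 / d
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Connected nodes attract with force d^2 / k
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node along its net force, capped by the temperature
        for n in nodes:
            dx, dy = disp[n]
            d = math.hypot(dx, dy)
            if d > 0:
                step = min(d, temperature)
                pos[n][0] += dx / d * step
                pos[n][1] += dy / d * step
        temperature *= 0.97  # cooling schedule
    return {n: (p[0], p[1]) for n, p in pos.items()}
```

The all-pairs repulsion costs O(n^2) per iteration, which is one reason plain force-directed layouts become slow on large graphs.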

Node clustering as a preprocess

Edge bundling as a postprocess

Interaction for graph visualization

• Immersive environment

• Zooming interface
• Focus+Context

Complex data

• Time-varying or interactive addition / removal of nodes or edges
• Associated time-varying / multivariate values at nodes / edges

Coordinated views for complex data

Graph visualization: summary

• Computer-powered graph drawing
• Techniques
  – Node layout
  – Node clustering
  – Edge bundling
  – Interactions
• Data types
  – Direction, node positions
• Applications
  – Social analysis, bioinformatics, traffic, communication, …

Contents

T. Itoh, C. Muelder, K.-L. Ma, J. Sese, A Hybrid Space-Filling and Force-Directed Layout Method for Visualizing Multiple-Category Graphs, IEEE Pacific Visualization Symposium, pp. 121-128, 2009.

Ranked third by number of citations among the papers presented at IEEE PacificVis!

Definition: Multiple-Category Graph

• Graphs consisting of nodes belonging to one or more categories

• Graph: $G = \{N, L\}$
• Nodes: $N = \{n_1, \dots, n_{n_N}\}$
• Links: $L = \{l_1, \dots, l_{n_L}\}$
• A node: $n_i = \{b_1, \dots, b_m\}$, its category-belonging information (an array of $m$ boolean values)
• A link: $l_i = \{n_p, n_q\}$

Drawing example (color = category)
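The definition above maps directly onto a small data structure. In this sketch, assuming categories are numbered 0 to m-1, each node stores its boolean category-belonging array $b_1, \dots, b_m$ and each link stores the indices $p, q$ of its two endpoints. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MultiCategoryGraph:
    num_categories: int
    # nodes[i] is the boolean category-belonging array of node n_i
    nodes: list = field(default_factory=list)
    # links[i] == (p, q) means link l_i connects nodes n_p and n_q
    links: list = field(default_factory=list)

    def add_node(self, categories):
        """Add a node belonging to the given category ids; return its index."""
        belongs = [False] * self.num_categories
        for c in categories:
            belongs[c] = True
        self.nodes.append(belongs)
        return len(self.nodes) - 1

    def add_link(self, p, q):
        self.links.append((p, q))

    def categories_of(self, i):
        return [c for c, b in enumerate(self.nodes[i]) if b]

    def is_categorized(self, i):
        # Nodes with an all-False array are the "non-categorized" nodes
        return any(self.nodes[i])


g = MultiCategoryGraph(num_categories=3)
a = g.add_node([0])        # belongs to category 0 only
b = g.add_node([0, 2])     # multi-category node
c = g.add_node([])         # non-categorized node
g.add_link(a, b)
g.add_link(b, c)
```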

Easy example: Social Networking

• Node = Person
• Link = Friendship
• Category = Community / Keyword

[Figure labels: categories Cake, Violin, and Soccer; a tight sub-network; a multi-community person; a hub person]

Requirements for Visualization

• Place common-category nodes closer
• Reduce:
  – the sum of edge lengths
  – the number of edge intersections
• Avoid cluttering of nodes
• Maximize screen-space utilization
• Reduce computation time

[Figure annotation: the requirements are grouped into those specific to multiple-category graphs and general ones]

Requirements Satisfaction

• Place common-category nodes closer
• Reduce:
  – the sum of edge lengths
  – the number of edge intersections
• Avoid cluttering of nodes
• Maximize screen-space utilization
• Reduce computation time

[Figure annotation: the requirements satisfied by force-directed methods, by space-filling methods, and by the hybrid space-filling and force-directed method]

Hybrid Approach: Overview

Step 1: Hierarchical clustering
  – Categorized & non-categorized nodes
  – Category & connection based
Step 2: Layout
  – Rectangle packing for non-categorized nodes
  – Hybrid layout for categorized nodes
Step 3: Interaction
  – Focus+Context
  – Category selection

Step 1: Hierarchical Clustering

(1) Root
(2) Top of categorized nodes
(3) Clusters of categorized nodes
(4) Clusters of non-categorized nodes
(5) Categorized nodes
(6) Non-categorized nodes

• Categorized nodes: category & connection based clustering
• Non-categorized nodes: connection based clustering
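A simplified sketch of Step 1 that produces the six-level hierarchy is given below. It makes two loud assumptions: categorized nodes are grouped only by identical category sets (standing in for the category & connection based clustering), and non-categorized nodes are grouped by connected components (standing in for the connection based clustering). The function name `build_hierarchy` is hypothetical.

```python
from collections import defaultdict


def build_hierarchy(categories, edges):
    """categories: dict node -> set of category ids (empty set = non-categorized).

    Returns {"root": {"categorized": [...], "non_categorized": [...]}},
    where each list holds clusters (lists of nodes).
    """
    categorized = [n for n, c in categories.items() if c]
    non_categorized = [n for n, c in categories.items() if not c]

    # (3) Clusters of categorized nodes: same category set -> same cluster
    by_category = defaultdict(list)
    for n in categorized:
        by_category[frozenset(categories[n])].append(n)

    # (4) Clusters of non-categorized nodes: connected components (union-find)
    parent = {n: n for n in non_categorized}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for u, v in edges:
        if u in parent and v in parent:
            parent[find(u)] = find(v)
    components = defaultdict(list)
    for n in non_categorized:
        components[find(n)].append(n)

    return {
        "root": {                                             # (1)
            "categorized": list(by_category.values()),        # (2) -> (3) -> (5)
            "non_categorized": list(components.values()),     # (4) -> (6)
        }
    }
```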

Step 2: Layout

[Figure: the data structure (root, top of categorized nodes, clusters of categorized nodes, clusters of non-categorized nodes) and the corresponding layout with categorized and non-categorized regions]

Rectangle Packing Technique

• Originally for tree visualization [Itoh04][Itoh06]

• Treemap-like, but better in:
  – Aspect ratio
  – Flexible cluster positioning (by referring to templates)

[Figure: input (a tree and an optional template) and the resulting layout]
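The rectangle packing of [Itoh04] is not reproduced here. The sketch below is a simplified greedy stand-in in the same spirit, assuming that rectangles are placed one at a time (largest first) at candidate corner positions, and that each candidate is scored by the area and squareness of the resulting bounding box plus an optional distance-to-template term. The function names and `template_weight` are hypothetical.

```python
def overlaps(a, b):
    """True if axis-aligned rectangles a and b, given as (x, y, w, h), overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def pack_rectangles(sizes, template=None, template_weight=1.0):
    """Greedy rectangle packing (simplified sketch, not the published algorithm).

    sizes: list of (w, h); template: optional list of preferred (x, y) per item.
    Returns a list of (x, y, w, h) in the same order as sizes.
    """
    # Place large rectangles first
    order = sorted(range(len(sizes)), key=lambda i: -sizes[i][0] * sizes[i][1])
    placed = {}
    for i in order:
        w, h = sizes[i]
        if placed:
            # Candidates: adjacent to each placed rectangle along x or along y
            candidates = []
            for x, y, pw, ph in placed.values():
                candidates += [(x + pw, y), (x, y + ph)]
        else:
            candidates = [(0.0, 0.0)]
        best, best_cost = None, float("inf")
        for cx, cy in candidates:
            rect = (cx, cy, w, h)
            if any(overlaps(rect, other) for other in placed.values()):
                continue
            rects = list(placed.values()) + [rect]
            width = max(x + rw for x, _, rw, _ in rects)
            height = max(y + rh for _, y, _, rh in rects)
            # Prefer a small, nearly square bounding box
            cost = width * height * max(width / height, height / width)
            if template is not None:
                # Pull the rectangle toward its preferred template position
                tx, ty = template[i]
                cost += template_weight * ((cx - tx) ** 2 + (cy - ty) ** 2)
            if cost < best_cost:
                best, best_cost = rect, cost
        placed[i] = best
    return [placed[i] for i in range(len(sizes))]
```

A valid candidate always exists: the position just to the right of the rightmost rectangle overlaps nothing. The template term is what lets the hybrid method steer clusters toward force-directed positions.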

Switching Layout Algorithms

[Figure: the data structure (root, top of categorized nodes, clusters of categorized nodes, clusters of non-categorized nodes), split into categorized and non-categorized parts]

Switching Layout Algorithms

[Figure: the data structure (root, top of categorized nodes, clusters of categorized nodes, clusters of non-categorized nodes)]

• Clusters of non-categorized nodes: simply apply rectangle packing
• Clusters of categorized nodes: consider both connection and category; rectangle packing alone is not always good
  – New approach: hybrid of rectangle packing and force-directed layout

Hybrid Layout for Categorized Nodes

(a) Clustering of categorized nodes
(b) Force-directed layout for the cluster graph
  – Edge weight is proportional to the number of links
  – Clusters are connected if they share a category, even if there are no links between them
(c) Positions used as a template
(d) Rectangle packing
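Step (b) builds a graph whose vertices are the clusters of categorized nodes. The sketch below assumes the edge weight is the link count between two clusters plus a fixed bonus when they share a category; `cluster_graph` and `common_category_weight` are hypothetical names, and the bonus value is an assumption. A force-directed layout of this weighted graph would then supply the template positions for steps (c) and (d).

```python
from collections import Counter
from itertools import combinations


def cluster_graph(cluster_of, cluster_categories, links, common_category_weight=1.0):
    """Weighted edges between clusters of categorized nodes.

    cluster_of: dict node -> cluster id
    cluster_categories: dict cluster id -> set of category ids
    links: iterable of (u, v) node pairs
    Returns a dict frozenset({c1, c2}) -> weight.
    """
    weights = Counter()
    # Weight grows with the number of links between the two clusters
    for u, v in links:
        cu, cv = cluster_of.get(u), cluster_of.get(v)
        if cu is None or cv is None or cu == cv:
            continue
        weights[frozenset((cu, cv))] += 1
    # Clusters sharing a category are connected even without any link
    for a, b in combinations(sorted(cluster_categories), 2):
        if cluster_categories[a] & cluster_categories[b]:
            weights[frozenset((a, b))] += common_category_weight
    return dict(weights)
```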

Drawing Nodes

• Colors of nodes denote categories
• Categorized nodes are drawn as colored circles
• Links are drawn with 3 levels of thickness and transparency
• Non-categorized nodes are drawn as gray dots
• Radii of nodes denote the number of links
• A node drawn in three colors belongs to three categories

Result (1) Zooming categorized nodes

• No cluster cluttering
• A, B, C: good concentration of common-category nodes
• D: good for discovering isolated categories

Comparison

Computation time:
  – Space-Filling (1): 1.3 sec.
  – Space-Filling (2): 1.2 sec.
  – Proposed: 4.7 sec.
* Force-directed: 267.5 sec.

Comparison

• Three criteria
  – Node distance, edge length, and number of intersections

[Chart: node distance, edge length, and number of intersections for S.-F. (1), S.-F. (2), and Proposed; relative values, with 1 for the proposed technique]
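The slides do not give the formulas behind the three criteria. The sketch below assumes: node distance is the mean distance between node pairs sharing a category, edge length is the sum of straight-line edge lengths, intersections count pairs of edges that properly cross, and each value is divided by that of the reference technique. All function names are hypothetical.

```python
import math
from itertools import combinations


def total_edge_length(pos, edges):
    return sum(math.dist(pos[u], pos[v]) for u, v in edges)


def _ccw(a, b, c):
    # Cross product of (b - a) and (c - a): sign gives the turn direction
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_cross(p1, p2, q1, q2):
    """True if segments p1-p2 and q1-q2 properly intersect."""
    return (_ccw(q1, q2, p1) * _ccw(q1, q2, p2) < 0
            and _ccw(p1, p2, q1) * _ccw(p1, p2, q2) < 0)


def num_intersections(pos, edges):
    count = 0
    for (a, b), (c, d) in combinations(edges, 2):
        if len({a, b, c, d}) < 4:
            continue  # edges sharing an endpoint are not counted as crossings
        if segments_cross(pos[a], pos[b], pos[c], pos[d]):
            count += 1
    return count


def common_category_distance(pos, categories):
    """Mean distance between node pairs that share at least one category."""
    dists = [math.dist(pos[u], pos[v])
             for u, v in combinations(sorted(categories), 2)
             if categories[u] & categories[v]]
    return sum(dists) / len(dists) if dists else 0.0


def relative(values, reference):
    """Normalize scores so that the reference technique scores 1."""
    return {name: v / reference for name, v in values.items()}
```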

Focus+Context

• Non-categorized node concentration
• Space distortion for focus+context
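The slide does not specify the distortion function. As an illustration only, here is the well-known graphical fisheye transformation of Sarkar and Brown, G(x) = (D + 1)x / (Dx + 1), applied radially around a focus point; this is a swapped-in technique, and the function and parameter names are hypothetical.

```python
import math


def fisheye(point, focus, radius, distortion=3.0):
    """Radial graphical fisheye around `focus`.

    Points within `radius` of the focus are pushed outward, which magnifies
    the neighbourhood of the focus; points outside are left unchanged.
    """
    dx, dy = point[0] - focus[0], point[1] - focus[1]
    d = math.hypot(dx, dy)
    if d == 0 or d >= radius:
        return point
    x = d / radius                                   # normalized distance in (0, 1)
    g = (distortion + 1) * x / (distortion * x + 1)  # distorted normalized distance
    scale = g * radius / d
    return (focus[0] + dx * scale, focus[1] + dy * scale)
```

With distortion 0 the mapping is the identity; larger values enlarge the focus region more strongly while keeping the boundary at `radius` fixed.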

Application to Biological Data

• Data
  – 6,152 genes (as nodes)
  – 7,564 gene-gene interactions (as edges)
  – 10 conditions of gene expression (as categories)
• Goals
  – Hub / multi-functional gene discovery
  – Sub-network discovery by condition

Visualization Example

[A] Hub gene discovery

[B] Multi-functional gene discovery

[C] Well-divided sub-network discovery

Visualization Example

[D,E,F] Separation of common-condition gene clusters

Contents

IEEE Computer Graphics and Applications, 2015; organized by Xiaoru, Baoquan, Koji, & Issei

T. Itoh, K. Klein, Key-node-Separated Graph Clustering and Layout for Human Relationship Graph Visualization, IEEE Computer Graphics and Applications, Vol. 35, No. 6, pp. 30-40, 2015.

Human Relationship Graph

• Node = human, edge = relationship
  – Paper co-authorship
  – Friendship on SNS
• Characteristics
  – Authority persons
  – Topic-based clusters

Issues on node clustering

• Community-finding schemes are widely applied
  – They extract subgraphs with dense edges
• Issue: “key nodes” get absorbed into large clusters
• We often want to separate such nodes from clusters

(a) Clustering based on density of connections
(b) High-level drawing of the clustered graph (a)
→ The key node and many edges are hidden inside clusters

(c) Clustering based on commonality of neighbors
(d) High-level drawing of the clustered graph (c)
→ Key nodes are more visible, and many edges are bundled

Visualization with key-node-aware clustering

• Two metrics for node-to-node distances
  – (Dis-)commonality of connected nodes
  – (Dis-)similarity of feature vectors
• Used for:
  – Node clustering
  – Node layout

Similar feature vectors = persons with similar topics

Commonly connected nodes = authority persons

Data Structure & Node Distance

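Data structure:
• Graph = {Node, Edge}; Node = $\{n_1, n_2, \dots\}$; Edge = $\{e_1, e_2, \dots\}$
• A node has a feature vector: $n_i = \{\mathbf{a}_i, \dots\}$
• An edge connects two nodes: $e_i = \{n_{i1}, n_{i2}\}$

Node distance:
• $d_{ij} \propto 1.0 - \dfrac{\mathbf{a}_i \cdot \mathbf{a}_j}{|\mathbf{a}_i|\,|\mathbf{a}_j|}$, where $\mathbf{a}_i$ is the feature vector of a node → similarity of topics
• $d_{ij} \propto \dfrac{1.0}{c_{ij} + 1}$, where $c_{ij}$ is the number of common adjacent nodes → connection to the same persons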
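The two distance terms can be computed as follows. The terms follow the formulas above; how they are combined is not stated on the slide, so the linear blend with weight `alpha` in `node_distance` is a hypothetical choice, as are the function names. Note the effect of the second term: two key nodes adjacent to many of the same persons end up close to each other, rather than close to their many neighbors.

```python
import math


def topic_distance(a_i, a_j):
    """1.0 - cosine similarity of two feature vectors."""
    dot = sum(x * y for x, y in zip(a_i, a_j))
    norm = math.sqrt(sum(x * x for x in a_i)) * math.sqrt(sum(y * y for y in a_j))
    if norm == 0:
        return 1.0  # assumption: an all-zero vector counts as maximally dissimilar
    return 1.0 - dot / norm


def connection_distance(adjacency, i, j):
    """1.0 / (c_ij + 1), where c_ij is the number of common adjacent nodes."""
    common = len(adjacency[i] & adjacency[j])
    return 1.0 / (common + 1)


def node_distance(features, adjacency, i, j, alpha=0.5):
    # Hypothetical blend of the two terms; alpha is not from the source
    return (alpha * topic_distance(features[i], features[j])
            + (1 - alpha) * connection_distance(adjacency, i, j))
```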

Clustering & Layout

Hierarchical clustering → Cluster layout → Node layout
• Hierarchical clustering: centroid method
• Cluster layout: MDS & stress minimization
• Node layout: Laplacian smoothing, swapping in a circle
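The centroid method is agglomerative: it repeatedly merges the two clusters whose centroids are closest. A centroid needs coordinates, so this minimal sketch assumes each node is already a point (for example its feature vector or an MDS embedding of the node distances); working directly from a pairwise distance matrix would instead require the Lance-Williams centroid update. The function name is hypothetical and the naive loop is O(n^3).

```python
import math


def centroid_clustering(points, num_clusters):
    """Agglomerative clustering with the centroid method.

    points: list of equal-length coordinate tuples.
    Returns a list of clusters, each a list of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    centroids = [tuple(p) for p in points]
    while len(clusters) > num_clusters:
        # Find the pair of clusters with the closest centroids
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = math.dist(centroids[a], centroids[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        na, nb = len(clusters[a]), len(clusters[b])
        # The merged centroid is the size-weighted mean of the two centroids
        centroids[a] = tuple((na * x + nb * y) / (na + nb)
                             for x, y in zip(centroids[a], centroids[b]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b], centroids[b]  # b > a, so index a stays valid
    return clusters
```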

Edge Bundling

(a) Edge bundling between two clusters of nodes
(b) Placement of control points of Bezier curves relative to the nodes and the centers of their clusters

[Figure: the graph before and after edge bundling]
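A bundled edge can be drawn as a cubic Bezier curve whose two inner control points are pulled toward the centers of the clusters at each end, so edges between the same two clusters share the middle of their route. The exact control-point placement is not given in the text; the `pull` factor and the function names below are hypothetical.

```python
def cubic_bezier(p0, p1, p2, p3, samples=20):
    """Return samples + 1 points along the cubic Bezier curve p0-p1-p2-p3."""
    points = []
    for s in range(samples + 1):
        t = s / samples
        u = 1.0 - t
        # Bernstein weights of the four control points
        w0, w1, w2, w3 = u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3
        points.append((w0 * p0[0] + w1 * p1[0] + w2 * p2[0] + w3 * p3[0],
                       w0 * p0[1] + w1 * p1[1] + w2 * p2[1] + w3 * p3[1]))
    return points


def bundled_edge(src, dst, src_center, dst_center, pull=0.8):
    """Edge between nodes of two clusters, bent toward both cluster centers.

    Control points lie on the segment from each node to its cluster center;
    pull=1.0 puts them exactly at the centers (tightest bundling).
    """
    c1 = (src[0] + pull * (src_center[0] - src[0]),
          src[1] + pull * (src_center[1] - src[1]))
    c2 = (dst[0] + pull * (dst_center[0] - dst[0]),
          dst[1] + pull * (dst_center[1] - dst[1]))
    return cubic_bezier(src, c1, c2, dst)
```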

Example Dataset

• Paper co-authorship data
  – 564 papers by NBAF (NERC Biomolecular Analysis Facilities)
  – 1,821 nodes (= authors), 11,097 edges (= co-authorships)
• Feature vectors from paper titles
  – Frequency of 12 words for each author → 12-dim. vector
• Computation time
  – 2.5 sec. for clustering
  – 8.8 sec. for node layout
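A minimal sketch of building per-author feature vectors from paper titles. The vocabulary below is limited to the 11 words listed in the color legend of this example; the twelfth word is not recoverable from the slides. The tokenization (lowercase alphabetic words, exact match) and the function name are assumptions.

```python
import re
from collections import defaultdict

# Assumption: the 11 keywords named in the color legend (the deck says 12 words)
VOCABULARY = ["genetic", "molecular", "loci", "microsatellites", "isolation",
              "inbreeding", "transcriptomics", "expression", "bacterial",
              "breeding", "polymorphic"]


def author_feature_vectors(papers, vocabulary=VOCABULARY):
    """papers: list of (title, [authors]). Returns author -> word-frequency vector."""
    index = {w: i for i, w in enumerate(vocabulary)}
    vectors = defaultdict(lambda: [0] * len(vocabulary))
    for title, authors in papers:
        # Exact word match, so "inbreeding" does not count as "breeding"
        words = re.findall(r"[a-z]+", title.lower())
        for author in authors:
            vec = vectors[author]
            for w in words:
                if w in index:
                    vec[index[w]] += 1
    return dict(vectors)
```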

Genetic (Red), Molecular (Orange), Loci (Yellow), Microsatellites (Yellow green), Isolation (Green), Inbreeding (Blue green), Transcriptomics (Sky blue), Expression (Blue), Bacterial (Indigo), Breeding (Purple), Polymorphic (Pink)

Node Layout Example

• Clusters as circles → nodes inside the circles
• Colors according to feature vectors
• Edge display control

Clusters of nodes

Feature-based colors

Edges of a node

Key-node Separation from Large Clusters

170 clusters, by our algorithm

* Color = degree of a node

Two key nodes are separated from large clusters

Key-node Separation from Large Clusters

159 clusters, by a common algorithm

* Color = degree of a node

Two key nodes are involved in a large cluster

Case Study with Co-authorship Graph

[Figure labels: Expression, Genetic, Isolation, Polymorphic, Molecular]

* Colors are based on feature vectors

Case Study with Co-authorship Graph

* Colors are based on degree of nodes

Case Study with Co-authorship Graph

* Colors are based on degree of nodes

Cluster A

Case Study with Co-authorship Graph

One of the key persons in cluster A: many connections with people in particular fields

[Figure labels: Expression, Genetic, Molecular, Polymorphic, Breeding, Transcriptomics, Loci]

Case Study with Co-authorship Graph

Another key person in cluster A: connections with people in a wider variety of fields

[Figure labels: Expression, Genetic, Molecular, Polymorphic, Breeding, Transcriptomics, Loci]

Case Study with Co-authorship Graph

* Colors are based on degree of nodes

Case Study with Co-authorship Graph

* Colors are based on degree of nodes

Cluster B

Cluster C

Case Study with Co-authorship Graph

A key person in cluster B: many connections with people in other fields

Case Study with Co-authorship Graph

A key person in cluster C: many connections with people in uncolored fields

Numeric comparison

Method      Num. clusters   Num. nodes in the clusters of the two key nodes   Num. edges inside clusters
Ours (1)    813             4, 4                                              5964
Ours (2)    354             4, 4                                              5421
Ours (3)    264             4, 4                                              5868
Ours (4)    170             9, 9                                              6141
Common      159             33, 54                                            8214

Successfully separated key nodes from large clusters

Fewer edges are hidden inside clusters
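The three columns of the comparison can be counted from any clustering result. This sketch is a hypothetical helper (name and signature assumed): it takes a node-to-cluster assignment, the edge list, and the key nodes, and returns the number of clusters, the sizes of the clusters containing the key nodes, and the number of edges whose endpoints fall in the same cluster (the edges a high-level drawing would hide).

```python
from collections import Counter


def clustering_stats(cluster_of, edges, key_nodes):
    """cluster_of: dict node -> cluster id; edges: list of (u, v); key_nodes: list."""
    sizes = Counter(cluster_of.values())
    inside = sum(1 for u, v in edges if cluster_of[u] == cluster_of[v])
    return {
        "num_clusters": len(sizes),
        "key_node_cluster_sizes": [sizes[cluster_of[k]] for k in key_nodes],
        "edges_inside_clusters": inside,
    }
```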

Subjective Evaluation

Question                                               Data 1 (ours)   Data 1 (common)   Data 2 (ours)   Data 2 (common)
Q: Interested in key nodes?                            9               4                 10              3
Q: Find the clusters connected to key nodes?           9               4                 7               6
Q: Find the number of nodes connected to key nodes?    10              3                 8               5

Participants: 13 university students in computer science

On-going work: for directed graphs

[Toeda16]

Three types of bundling for directed graphs

A. Edges between two clusters of nodes
B. Edges in a pair of bidirectional bundles connecting the same pair of clusters
C. Edges of pairs of bundles that start or end at the same cluster
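The three bundle types can be identified from a clustering of a directed graph. This minimal sketch only finds the groups (how [Toeda16] draws them is not described here): type A bundles are keyed by (source cluster, target cluster), type B pairs are opposite bundles between the same two clusters, and type C pairs are bundles sharing a start or end cluster. The function name is hypothetical.

```python
from collections import defaultdict


def directed_bundles(cluster_of, edges):
    """Group directed edges into bundles and list type B and type C pairs."""
    # Type A: edges from one cluster to another form one bundle
    bundles = defaultdict(list)
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu != cv:
            bundles[(cu, cv)].append((u, v))
    keys = sorted(bundles)
    # Type B: a bundle and its reverse between the same pair of clusters
    type_b = [(s, t) for s, t in keys if s < t and (t, s) in bundles]
    # Type C: two bundles that start, or end, at the same cluster
    type_c = [(k1, k2) for i, k1 in enumerate(keys) for k2 in keys[i + 1:]
              if k1[0] == k2[0] or k1[1] == k2[1]]
    return dict(bundles), type_b, type_c
```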

[Toeda16]

Potential Applications

• Paper citation network (example topic labels: Graph Drawing, Time-Varying, Visual Analytics, GPU-based, Tree Evaluation)
• Music ordering (example genre labels: Pops, Rock, Slow Ballad, Blues, Dance, Jazz)

Current research situation

• Slowly growing research field
  – A constant number of papers every year
  – But many essential problems still remain
  – Recent meetings between theory and application researchers (e.g., NII Shonan Meetings in 2015/01 and 2016/08)
• Mostly used by professionals
  – Social analysis, bioinformatics, system monitoring, …
  – Applications for general users are less common
• General software vs. application-oriented tools

Directions I am interested in

• Numeric evaluation of comprehensibility
  – Are mathematically beautiful visualization results really good?
• Automatic algorithm selection
  – Especially node clustering and layout algorithms
• User interfaces
  – Newer devices (touch panels, VR/AR, …)
  – Voice processing, natural language processing
• New application development

Recommended