23
Graph Mining: Applications Karel Vaculík 1,2 1 KDLab, Faculty of Informatics Masaryk University, Brno 2 Gauss Algorithmic s.r.o., Brno WIKT & Data a Znalosti 2016

Graph Mining: Applications - stuba.sk · •Semantic information from Google Knowledge Graph •Structural (network) features from Related Places Graph •places connected according

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Graph Mining: Applications - stuba.sk · •Semantic information from Google Knowledge Graph •Structural (network) features from Related Places Graph •places connected according

Graph Mining: Applications

Karel Vaculík1,2

1KDLab, Faculty of Informatics Masaryk University, Brno2Gauss Algorithmic s.r.o., Brno

WIKT & Data a Znalosti 2016

Page 2: Graph Mining: Applications - stuba.sk · •Semantic information from Google Knowledge Graph •Structural (network) features from Related Places Graph •places connected according

Graph mining = data mining from graph (network) data

G = (V, E)

Introduction

2

Page 3: Graph Mining: Applications - stuba.sk · •Semantic information from Google Knowledge Graph •Structural (network) features from Related Places Graph •places connected according

1. Classification of Nodes

2. Anomaly Detection in Recommendation Networks

3. Anomaly Detection in Communication Networks

4. Community Detection in Voice-Call Networks

Outline

3

Page 4: Graph Mining: Applications - stuba.sk · •Semantic information from Google Knowledge Graph •Structural (network) features from Related Places Graph •places connected according

1. Classification of Nodes

S. Nandanwar et al.: Structural Neighborhood Based Classification of Nodes in a Network. SIGKDD 2016. 4

Page 5: Graph Mining: Applications - stuba.sk · •Semantic information from Google Knowledge Graph •Structural (network) features from Related Places Graph •places connected according

• Structural neighborhood-based classifier (SNBC)

• Assumes homophily and densely connected nodes to be correlated

• Structured random walk approachwith variable termination probability

• Low- and high-degree nodes havelesser discriminative power

1. Classification of Nodes

S. Nandanwar et al.: Structural Neighborhood Based Classification of Nodes in a Network. SIGKDD 2016. 5

Page 6: Graph Mining: Applications - stuba.sk · •Semantic information from Google Knowledge Graph •Structural (network) features from Related Places Graph •places connected according

Datasets (multilabeled):

• Youtube – users in friendship relation; groups of interests as classes

• PubMed – citation network; three classes of publications

• CoRA – citation network; research topics

• IMDb – movies connected according to actor similarity; genres

• Amazon Books – similar books connected; genres

• Wikipedia (comp. science) – intralinks; 16 top-level categories

1. Classification of Nodes

S. Nandanwar et al.: Structural Neighborhood Based Classification of Nodes in a Network. SIGKDD 2016. 6

Page 7: Graph Mining: Applications - stuba.sk · •Semantic information from Google Knowledge Graph •Structural (network) features from Related Places Graph •places connected according

SNBC results

10% of nodes

for training

1. Classification of Nodes

S. Nandanwar et al.: Structural Neighborhood Based Classification of Nodes in a Network. SIGKDD 2016. 7

Page 8: Graph Mining: Applications - stuba.sk · •Semantic information from Google Knowledge Graph •Structural (network) features from Related Places Graph •places connected according

2. Anomaly Detection in Recommendation Networks

B. Perozzi et al.: When Recommendation Goes Wrong - Anomalous Link Discovery in Recommendation Networks. SIGKDD 2016. 8

Page 9: Graph Mining: Applications - stuba.sk · •Semantic information from Google Knowledge Graph •Structural (network) features from Related Places Graph •places connected according

• Supervised learning

• Semantic information from Google Knowledge Graph

• Structural (network) features from Related Places Graph• places connected according to their similarity / closeness

2. Anomaly Detection in Recommendation Networks

B. Perozzi et al.: When Recommendation Goes Wrong - Anomalous Link Discovery in Recommendation Networks. SIGKDD 2016. 9

Page 10: Graph Mining: Applications - stuba.sk · •Semantic information from Google Knowledge Graph •Structural (network) features from Related Places Graph •places connected according

Examples of anomalies

detected by the system:

2. Anomaly Detection in Recommendation Networks

B. Perozzi et al.: When Recommendation Goes Wrong - Anomalous Link Discovery in Recommendation Networks. SIGKDD 2016. 10

Page 11: Graph Mining: Applications - stuba.sk · •Semantic information from Google Knowledge Graph •Structural (network) features from Related Places Graph •places connected according

• ENRON dataset

• employees sending emails• emails sent in one day => graph (snapshot)

• Series of snapshots (∼ 900) = a dynamic graph

3. Anomaly Detection in Communication Networks

K. Vaculík and L. Popelínský: DGRMiner: Anomaly Detection and Explanation in Dynamic Graphs. IDA 2016. 11

Page 12: Graph Mining: Applications - stuba.sk · •Semantic information from Google Knowledge Graph •Structural (network) features from Related Places Graph •places connected according

3. Anomaly Detection in Communication Networks

K. Vaculík and L. Popelínský: DGRMiner: Anomaly Detection and Explanation in Dynamic Graphs. IDA 2016.

DAY 678 DAY 679

ENRON emails

12

Page 13: Graph Mining: Applications - stuba.sk · •Semantic information from Google Knowledge Graph •Structural (network) features from Related Places Graph •places connected according

DGRMiner (Dynamic Graph Rule Miner):

1. Mine frequent graph patterns (in the form of rules)

2. Mine patterns deviating from the frequent ones

3. Anomaly Detection in Communication Networks

K. Vaculík and L. Popelínský: DGRMiner: Anomaly Detection and Explanation in Dynamic Graphs. IDA 2016. 13

Page 14: Graph Mining: Applications - stuba.sk · •Semantic information from Google Knowledge Graph •Structural (network) features from Related Places Graph •places connected according

Examples of anomalies:

3. Anomaly Detection in Communication Networks

K. Vaculík and L. Popelínský: DGRMiner: Anomaly Detection and Explanation in Dynamic Graphs. IDA 2016. 14

Page 15: Graph Mining: Applications - stuba.sk · •Semantic information from Google Knowledge Graph •Structural (network) features from Related Places Graph •places connected according

Examples of anomalies:

3. Anomaly Detection in Communication Networks

K. Vaculík and L. Popelínský: DGRMiner: Anomaly Detection and Explanation in Dynamic Graphs. IDA 2016. 15

Page 16: Graph Mining: Applications - stuba.sk · •Semantic information from Google Knowledge Graph •Structural (network) features from Related Places Graph •places connected according

Voice-call network

• Vertices = people (not only customers)

• Edges = aggregated calls

• Weights ≈ call traffic volume

• Edges with low traffic filtered out

• ∼ 1.8M vertices

• ∼ 2.8M edges

4. Community Detection in Voice-Call Networks

16

Page 17: Graph Mining: Applications - stuba.sk · •Semantic information from Google Knowledge Graph •Structural (network) features from Related Places Graph •places connected according

Degree distribution

4. Community Detection in Voice-Call Networks

17

Page 18: Graph Mining: Applications - stuba.sk · •Semantic information from Google Knowledge Graph •Structural (network) features from Related Places Graph •places connected according

• Community detection by using weighted LPA

• ∼ 400K communities found

• Distribution of community sizes:

4. Community Detection in Voice-Call Networks

18

One large community of size∼ 13 000 is omitted in the graph

Page 19: Graph Mining: Applications - stuba.sk · •Semantic information from Google Knowledge Graph •Structural (network) features from Related Places Graph •places connected according

19

Example communities of size 8

Page 20: Graph Mining: Applications - stuba.sk · •Semantic information from Google Knowledge Graph •Structural (network) features from Related Places Graph •places connected according

20

Sample subgraph withcommunities

Page 21: Graph Mining: Applications - stuba.sk · •Semantic information from Google Knowledge Graph •Structural (network) features from Related Places Graph •places connected according

Stability of the communities

• How fast communities change• after one week

• after one month

• ...

• Adjusted Rand index used for comparison

4. Community Detection in Voice-Call Networks

21

Page 22: Graph Mining: Applications - stuba.sk · •Semantic information from Google Knowledge Graph •Structural (network) features from Related Places Graph •places connected according

Utilization of detected communities:

• Churn prediction

• Detection of leaders in communities

• Identification of customers related to high-value customers

• Better offers for customers

• Households detection

4. Community Detection in Voice-Call Networks

22

Page 23: Graph Mining: Applications - stuba.sk · •Semantic information from Google Knowledge Graph •Structural (network) features from Related Places Graph •places connected according

Plenty of real-world graphs and associated tasks:

• classification (web pages; chemical compounds)

• clustering (community detection)

• link prediction (friendship suggestion)

• ranking (webpages)

• anomaly detection (fraud detection)

• pattern mining (graph DB indexing)

• ...

Summary

23