54
Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos Faloutsos (CMU) SDM'12 Tutorial

Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

  • Upload
    vanque

  • View
    212

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Discovering Roles and Anomalies in Graphs:

Theory and Applications

Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos Faloutsos (CMU)

SDM'12 Tutorial

Page 2: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

T. Eliassi-Rad & C. Faloutsos 2

Overview

SDM'12 Tutorial

Roles

Features

Anomalies

Patterns

= rare roles

Page 3: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

T. Eliassi-Rad & C. Faloutsos 3

Overview

SDM'12 Tutorial

Roles

Features

Anomalies

Patterns

= rare roles

Page 4: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Roadmap •  What are roles •  Roles and communities •  Roles and equivalences (from sociology) •  Roles (from data mining) •  Summary

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 4

Page 5: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

What are roles? •  “Functions” of nodes in the network

– Similar to functional roles of species in ecosystems

•  Measured by structural behaviors •  Examples

–  centers of stars

– members of cliques –  peripheral nodes

– … SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 5

Page 6: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Example of Roles

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 6

centers of stars members of cliques peripheral nodes

Page 7: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Why are roles important? Task Use Case Role query Identify individuals with similar

behavior to a known target

Role outliers Identify individuals with unusual behavior

Role dynamics Identify unusual changes in behavior

Identity resolution Identify known individuals in a new network

Role transfer Use knowledge of one network to make predictions in another

Network comparison

Determine network compatibility for knowledge transfer

Role Discovery

ü Automated discovery

ü Behavioral roles

ü Roles generalize

Page 8: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Roadmap •  What are roles •  Roles and communities •  Roles and equivalences (from sociology) •  Roles (from data mining) •  Summary

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 8

Page 9: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Roles and Communities •  Roles group nodes with similar structural

properties •  Communities group nodes that are well-

connected to each other •  Roles and communities are complementary

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 9

Page 10: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Roles and Communities

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 10

RolX * Fast Modularity†

* Henderson, et al. 2012; † Clauset, et al. 2004

Page 11: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Roles and Communities

•  Roles – Faculty

– Staff – Students

– …

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 11

•  Communities – AI lab

– Database lab – Architecture lab

– …

Consider the social network of a CS dept

Page 12: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Roadmap •  What are roles •  Roles and communities •  Roles and equivalences (from sociology) •  Roles (from data mining) •  Summary

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 12

Page 13: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Equivalences

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 13

Equivalences

Deterministic

Regular

Automorphic

Structural

Probabilistic Stochastic

Page 14: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Deterministic Equivalences

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 14

Regular

Automorphic

Structural

Page 15: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Structural Equivalence •  [Lorrain & White, 1971] •  Two nodes u and v are structurally

equivalent if they have the same relationships to all other nodes

•  Hypothesis: Structurally equivalent nodes are likely to be similar in other ways – i.e., you are your friend

•  Weights & timing issues are not considered •  Rarely appears in real-world networks

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 15

u v

d e

a b c

Page 16: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Structural Equivalence: Algorithms •  CONCOR (CONvergence of iterated CORrelations)

[Breiger et al. 1975] •  A hierarchical divisive approach

1.  Starting with the adjacency matrix, repeatedly calculate Pearson correlations between rows until the resultant correlation matrix consists of +1 and -1 entries

2.  Split the last correlation matrix into two structurally equivalent submatrices (a.k.a. blocks): one with +1 entries, another with -1 entries

•  Successive split can be applied to submatrices in order to produce a hierarchy (where every node has a unique position)

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 16

Page 17: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Structural Equivalence: Algorithms •  STRUCUTRE [Burt 1976] •  A hierarchical agglomerative approach

1.  For each node i, create its ID vector by concatenating its row and column vectors from the adjacency matrix

2.  For every pair of nodes 〈i, j〉, measure the square root of sum of squared differences between the corresponding entries in their ID vectors

3.  Merge entries in hierarchical fashion as long as their difference is less than some threshold α

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 17

Page 18: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Structural Equivalences: Algorithms •  Combinatorial optimization approaches

–  Numerical optimization with tabu search [UCINET]

–  Local optimization [Pajek]

•  Partition the sociomatrices into blocks based on a cost function that minimizes the sum of within block variances

–  Basically, minimize the sum of code cost within each block

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 18

Page 19: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Deterministic Equivalences

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 19

Regular

Automorphic

Structural

Page 20: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Automorphic Equivalence •  [Borgatti, et al. 1992; Sparrow 1993] •  Two nodes u and v are automorphically equivalent if all

the nodes can be relabeled to form an isomorphic graph with the labels of u and v interchanged –  Swapping u and v (possibly

along with their neighbors) does not change graph distances

•  Two nodes that are automorphically equivalent share exactly the same label-independent properties

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 20

u v

Page 21: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Automorphic Equivalence: Algorithms •  Sparrow (1993) proposed an algorithm that scales linearly to the

number of edges

•  Use numerical signatures on degree sequences of neighborhoods

•  Numerical signatures use a unique transcendental number like π, which is independent of any permutation of nodes

•  Suppose node i has the following degree sequence: 1, 1, 5, 6, and 9. Then its signature is Si,1 = (1 + π)(1 + π) (5 + π) (6 + π) (9 + π)

•  The signature for node i at k+1 hops is Si,(k+1) = Π(Si,k + π)

•  To find automorphic equivalence, simply compare numerical signatures of nodes

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 21

Page 22: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Deterministic Equivalences

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 22

Regular

Automorphic

Structural

Page 23: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Regular Equivalence •  [Everett & Borgatti, 1992] •  Two nodes u and v are regularly equivalent if they

are equally related to equivalent others

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 23

+Regular Equivalenceg q

� Introduced by Borgatti (1996)

� Definition:� Two actors are regularly equivalent if they are equally related to

i l t thequivalent others

President Motes

Faculty

Graduate Students

Hanneman, Robert A. and Mark Riddle. 2005. Introduction to social network methods. Riverside, CA: University of California, Riverside ( published in digital form at http://faculty.ucr.edu/~hanneman/ )

Page 24: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Regular Equivalence (continued)

•  Basic roles of nodes –  source

–  repeater –  sink

–  isolate

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 24

Page 25: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Regular Equivalence (continued)

•  Based solely on the social roles of neighbors

•  Interested in –  Which nodes fall in which social roles?

–  How do social roles relate to each other?

•  Hard partitioning of the graph into social roles

•  A given graph can have more than one valid regular equivalence set

•  Exact regular equivalences can be rare in large graphs

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 25

Page 26: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Regular Equivalence: Algorithms •  Many algorithms exist here •  Basic notion

– Profile each node’s neighborhood by the presence of nodes of other "types"

– Nodes are regularly equivalent to the extent that they have similar "types" of other nodes at similar distances in their neighborhoods

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 26

Page 27: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Equivalences

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 27

Equivalences

Deterministic

Regular

Automorphic

Structural

Probabilistic Stochastic

Page 28: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Stochastic Equivalence •  [Holland, et al. 1983;

Wasserman & Anderson, 1987]

•  Two nodes are stochastically equivalent if they are “exchangeable” w.r.t. a probability distribution

•  Similar to structural equivalence but probabilistic

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 28

u v

a

b

p(a,v)

p(u,b) p(v,b)

p(a,u)

p(a,u) = p(a,v) p(u,b)= p(v,b)

Page 29: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Stochastic Equivalence: Algorithms •  Many algorithms exist here •  Most recent approaches are generative

[Airoldi, et al 2008] •  Some choice points

–  Single [Kemp, et al 2006] vs. mixed-membership [Koutsourelakis & Eliassi-Rad, 2008] equivalences (a.k.a. “positions”)

–  Parametric vs. non-parametric models

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 29

Page 30: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Roadmap •  What are roles •  Roles and communities •  Roles and equivalences (from sociology) •  Roles (from data mining) •  Summary

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 30

Page 31: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

RolX: Role eXtraction •  Introduced by Henderson, et al. 2011b •  Automatically extracts the underlying roles

in a network – No prior knowledge required

•  Assigns a mixed-membership of roles to each node

•  Scales linearly on the number of edges

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 31

Page 32: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

RolX: Flowchart

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 32

Node × Node Matrix

Recursive Feature

Extraction

Node × Feature Matrix

Role Extraction

Node × Role Matrix

Role × Feature Matrix

Page 33: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

RolX: Flowchart

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 33

Node × Node Matrix

Recursive Feature

Extraction

Node × Feature Matrix

Role Extraction

Node × Role Matrix

Role × Feature Matrix

Example: degree, avg weight, # of edges in egonet, mean clustering coefficient of neighbors, etc

Page 34: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Recursive Feature Extraction •  ReFeX [Henderson, et al. 2011a] turns network connectivity into

recursive structural features

•  Neighborhood features: What is your connectivity pattern? •  Recursive Features: To what kinds of nodes are you connected? SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 34

ReFeX

Local Egonet Recursive

Neighborhood

Regional

1411# 0# 1# 2# 1# 0# 0# 0# 1# 1# 0# 1# 0# 0# 1# 1# 2# 2#1410# 0# 1# 1# 1# 0# 1# 0# 0# 1# 0# 1# 0# 1# 0# 1# 1# 1#338# 0# 0# 0# 0# 1# 0# 1# 0# 0# 1# 0# 0# 0# 1# 0# 0# 0#339# 1# 0# 0# 0# 2# 0# 1# 0# 0# 2# 0# 1# 0# 1# 0# 0# 0#1415# 0# 1# 1# 2# 0# 1# 0# 0# 0# 0# 0# 0# 1# 1# 1# 1# 1#941# 0# 0# 0# 0# 1# 0# 1# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0#1414# 0# 1# 1# 1# 0# 1# 0# 0# 0# 0# 0# 0# 1# 1# 0# 1# 1#942# 0# 0# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0#1413# 0# 1# 1# 1# 0# 1# 1# 0# 0# 0# 0# 0# 1# 1# 0# 1# 1#1412# 0# 0# 0# 0# 0# 0# 0# 1# 2# 0# 1# 1# 0# 0# 1# 2# 0#940# 0# 0# 1# 0# 0# 0# 0# 1# 0# 0# 0# 1# 1# 0# 1# 1# 1#1419# 0# 0# 1# 0# 0# 1# 0# 1# 1# 0# 1# 1# 1# 0# 1# 1# 1#945# 0# 1# 4# 3# 0# 0# 0# 0# 2# 0# 1# 0# 0# 2# 1# 3# 1#332# 0# 0# 0# 0# 1# 0# 1# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0#1418# 0# 0# 1# 0# 0# 0# 0# 1# 0# 0# 0# 1# 2# 0# 1# 0# 1#946# 0# 1# 1# 0# 0# 1# 0# 1# 0# 0# 0# 1# 4# 0# 1# 1# 2#333# 0# 0# 0# 0# 1# 0# 1# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0#1417# 0# 1# 1# 1# 0# 2# 0# 0# 1# 0# 1# 0# 1# 0# 1# 1# 1#943# 0# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 1# 0# 0#330# 1# 3# 2# 0# 1# 2# 2# 0# 2# 2# 2# 0# 3# 1# 0# 2# 5#1416# 0# 1# 1# 1# 1# 2# 0# 0# 1# 0# 1# 0# 1# 0# 0# 1# 1#944# 0# 1# 4# 2# 0# 0# 0# 0# 2# 0# 1# 0# 0# 2# 0# 3# 1#331# 0# 3# 2# 1# 0# 1# 0# 0# 2# 0# 2# 0# 2# 0# 1# 2# 5#949# 0# 0# 0# 0# 2# 0# 0# 1# 0# 1# 0# 1# 0# 0# 0# 0# 0#336# 0# 0# 0# 0# 2# 0# 0# 1# 1# 1# 1# 1# 0# 0# 0# 1# 0#337# 1# 1# 1# 0# 0# 1# 2# 0# 1# 1# 1# 0# 1# 1# 1# 1# 1#947# 1# 0# 0# 0# 2# 0# 1# 0# 0# 2# 0# 1# 0# 1# 0# 0# 0#334# 0# 0# 0# 1# 1# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0#948# 0# 0# 0# 0# 0# 1# 0# 1# 1# 0# 1# 1# 1# 0# 1# 1# 0#335# 0# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 1# 0# 0#531# 1# 0# 0# 0# 1# 0# 2# 0# 0# 2# 0# 0# 0# 2# 0# 0# 0#

Nod

es

Page 35: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Propositionalisation (PROP) •  [Knobbe, et al. 2001; Neville, et al. 2003; Krogel, et al.

2003]

•  From multi-relational data mining with roots in Inductive Logic Programming (ILP)

•  Summarizes a multi-relational dataset (stored in multiple tables) into a propositional dataset (stored in a single “target” table)

•  Derived attribute-value features describe properties of individuals

•  Related more to recursive structural features than structural roles

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 35

Page 36: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Role Extraction

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 36

Features 1411# 0# 1# 2# 1# 0# 0# 0# 1# 1# 0# 1# 0# 0# 1# 1# 2# 2#1410# 0# 1# 1# 1# 0# 1# 0# 0# 1# 0# 1# 0# 1# 0# 1# 1# 1#338# 0# 0# 0# 0# 1# 0# 1# 0# 0# 1# 0# 0# 0# 1# 0# 0# 0#339# 1# 0# 0# 0# 2# 0# 1# 0# 0# 2# 0# 1# 0# 1# 0# 0# 0#1415# 0# 1# 1# 2# 0# 1# 0# 0# 0# 0# 0# 0# 1# 1# 1# 1# 1#941# 0# 0# 0# 0# 1# 0# 1# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0#1414# 0# 1# 1# 1# 0# 1# 0# 0# 0# 0# 0# 0# 1# 1# 0# 1# 1#942# 0# 0# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0#1413# 0# 1# 1# 1# 0# 1# 1# 0# 0# 0# 0# 0# 1# 1# 0# 1# 1#1412# 0# 0# 0# 0# 0# 0# 0# 1# 2# 0# 1# 1# 0# 0# 1# 2# 0#940# 0# 0# 1# 0# 0# 0# 0# 1# 0# 0# 0# 1# 1# 0# 1# 1# 1#1419# 0# 0# 1# 0# 0# 1# 0# 1# 1# 0# 1# 1# 1# 0# 1# 1# 1#945# 0# 1# 4# 3# 0# 0# 0# 0# 2# 0# 1# 0# 0# 2# 1# 3# 1#332# 0# 0# 0# 0# 1# 0# 1# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0#1418# 0# 0# 1# 0# 0# 0# 0# 1# 0# 0# 0# 1# 2# 0# 1# 0# 1#946# 0# 1# 1# 0# 0# 1# 0# 1# 0# 0# 0# 1# 4# 0# 1# 1# 2#333# 0# 0# 0# 0# 1# 0# 1# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0#1417# 0# 1# 1# 1# 0# 2# 0# 0# 1# 0# 1# 0# 1# 0# 1# 1# 1#943# 0# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 1# 0# 0#330# 1# 3# 2# 0# 1# 2# 2# 0# 2# 2# 2# 0# 3# 1# 0# 2# 5#1416# 0# 1# 1# 1# 1# 2# 0# 0# 1# 0# 1# 0# 1# 0# 0# 1# 1#944# 0# 1# 4# 2# 0# 0# 0# 0# 2# 0# 1# 0# 0# 2# 0# 3# 1#331# 0# 3# 2# 1# 0# 1# 0# 0# 2# 0# 2# 0# 2# 0# 1# 2# 5#949# 0# 0# 0# 0# 2# 0# 0# 1# 0# 1# 0# 1# 0# 0# 0# 0# 0#336# 0# 0# 0# 0# 2# 0# 0# 1# 1# 1# 1# 1# 0# 0# 0# 1# 0#337# 1# 1# 1# 0# 0# 1# 2# 0# 1# 1# 1# 0# 1# 1# 1# 1# 1#947# 1# 0# 0# 0# 2# 0# 1# 0# 0# 2# 0# 1# 0# 1# 0# 0# 0#334# 0# 0# 0# 1# 1# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0#948# 0# 0# 0# 0# 0# 1# 0# 1# 1# 0# 1# 1# 1# 0# 1# 1# 0#335# 0# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 1# 0# 0#531# 1# 0# 0# 0# 1# 0# 2# 0# 0# 2# 0# 0# 0# 2# 0# 0# 0#

Nod

es Recursively

extract roles

Automatically factorize roles

Page 37: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Automatically Discovered Roles

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 37

Network Science Co-authorship Graph

[Newman 2006]

Page 38: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 38

Making Sense of Roles

cliquey bridge periphery isolated

0

2

4

6

8

10

12

14 D

egre

e

Avg.

Wei

ght

Clu

ster

ing

Coe

ff.

Ecc

entri

city

Pag

eRan

k

Gat

eKee

per

Gat

eKee

per-

Loca

l

Piv

ot

Stru

ctur

al H

ole

Rol

e 1

Rol

e 2

Rol

e 3

Rol

e 4

NodeSense NeighborSense

Role 1 Role 2 Role 3 Role 4 Default

Page 39: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Mixed-Membership over Roles

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 39

Bright blue nodes are peripheral nodes

Bright red nodes are locally central nodes

Amazon Political Books Co-purchasing Network [V. Krebs 2000]

conservative liberal

neutral

Page 40: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Role Query

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 40

Node Similarity for M.E.J. Newman (bridge)

Node Similarity for F. Robert (cliquey)

Node Similarity for J. Rinzel (isolate)

Page 41: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Roles Generalize across Disjoint Networks

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 41

0

0.2

0.4

0.6

0.8

1

A1-

B

A2-

B

A3-

B

A4-

B

A-B

Cla

ssifi

catio

n A

ccur

acy

TrainSet-TestSet

RolX Feat Default

Figure 4: RolX provides better generalization per-

formance between enterprise IP networks A and

B (mean accuracy of RolX =85%, Feat=71%, p-

value=0.01).

0.35

0.45

0.55

0.65

0.75

0.85

0.95

0 10 20 30 40 50 60 70

Mea

n A

ccur

acy

Number of Roles

Figure 5: RolX chooses a high accuracy model size

of 9 roles, in the middle of the peak accuracy range

of 7-11 roles. The Y-axis depicts the mean classifi-

cation accuracy using RolX (over all 4 test sets) by

model size.

is right in the middle of the peak accuracy range shown inFigure 5. Figure 6a shows that the model selection crite-rion used by RolX is highly correlated with classificationaccuracy (Pearson correlation is -0.91). Figure 6b shows thedefault RolX model selection criterion decomposed into itsconstituent parts. Model cost is the cost associated withrepresenting the model itself while error cost is the cost ofrepresenting the di↵erences between the original feature val-ues and the estimated feature values reconstructed using themodel. As expected, we see a consistent increase in modelcost and a consistent decrease in error cost as the numberof roles increases.

Figure 7 shows that IP tra�c classes are well-separatedin the RolX “role space”, with as few as 3 roles (extractedfrom the original 373 structural features). Note that weachieve even better separation with the automatically se-lected model size of 9 roles (see Figure 4), but we can onlyclearly visualize up to 3.

We omit for brevity the “sensemaking” table for the 3-roleIP experiment. It shows that Roles 1 and 2 are lower-volumeIPs while Role 3 is high-volume servers or P2P nodes. Role3 contains nodes of all three types (Web, DNS, P2P). ThisRole is overloaded since the model size of 3 is not as predic-tive as larger model sizes (see Figure 5).

Reality Mining Device data: Using the Reality MiningDevice dataset, we conducted two sets of transfer learningexperiments. The first set of experiments involves a binaryclassification task where we try to predict whether a givensubject is a business school student or not. The second

2

3

4 5

6

7 8

9 10

11

12 13 15

30

0.7

0.75

0.8

0.85

0.9

3800000 4300000 4800000 5300000 5800000 6300000 6800000

Mea

n C

lass

ific

atio

n A

ccur

acy

Description Length

(a)

0 1 2 3 4 5 6 7 8 9

0 5 10 15 20 25 30 35 Des

crip

tion

Len

gth

(mill

ions

of

bits

)

Number of Roles

Total Cost Model Cost Error Cost

(b)

Figure 6: RolX ’s model selection is e↵ective: (a)

Classification accuracy is highest when RolX selec-

tion criterion is minimized. Red markers indicate

the peak performing model sizes of 7-11 roles (b)

RolX ’s model selection criterion balances model size

and reconstruction accuracy.

set is similar, where we try to predict whether a subjectis a graduate student in the Media Lab or not. As trainand test sets, we used each pair of consecutive months inour dataset. In Figure 8, we show the accuracy of RolX .The Baseline is a classifier that learns to always predict themajority class of the training set on the test set. The timelabels denote the month for the train data, and the monthfollowing that is used as the test data. We also use all thedata in 2004 and 2005 as train and test data, respectively.Notice that RolX outperforms the baseline classifier most ofthe time with an average of 83% and 76% accuracy for thetwo experiment sets, respectively. We notice that RolX ’saccuracy drops when September and May data is used astraining, possibly because these months correspond to thestart and end of the school semesters; the behavior of thesubjects would be generally di↵erent than usual in thesemonths, thus providing not as much predictive informationas the other months would.

4. STRUCTURAL SIMILARITYHere we describe experiments in which RolX is used for

its most basic task: grouping nodes based on their structuralsimilarity.

4.1 Network Sciences Coauthorship data:Our first data set is the weighted Network Science Co-

authorship Graph with 1589 authors (from the network sci-ence community) and 2743 weighted edges [25]. Figure 9shows (a) the role-colored graph (where each node is col-

Page 42: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Roles Generalize across Networks

SDM'12 Tutorial

Thu Network

Wed Network

Tue Network

Feature Discovery

Feature Extraction

Feature Extraction

Inference Learning

Regression Regression

Inference

RolX

C3 C2

V1

G1 G2

V2 V3

G3

C1

L

F

M

E.g., degree, avg wgt, etc

V: (node × feature) matrix

G: (node × role) matrix

F: (role × feature) matrix

L: List of feature names

C: Class labels

M: model

Discovery Stage

42

Page 43: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Roles Generalize across Networks

43

Thu Network

Wed Network

Tue Network

Feature Discovery

Feature Extraction

Feature Extraction

Inference Learning

Regression Regression

Inference

RolX

C3 C2

V1

G1 G2

V2 V3

G3

C1

L

F

M

V: (node × feature) matrix

G: (node × role) matrix

F: (role × feature) matrix

L: List of feature names

C: Class labels

M: model

E.g., degree, avg wgt, etc

Discovery Stage Application Stage

Page 44: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Roles: Regular Equivalence vs. RolX RolX Regular Equivalence

Mixed-membership over roles ✓ Fully automatic ✓

Uses structural features ✓ Uses structure ✓ ✓

Generalizable across disjoint networks

?

Scalable (linear on # of edges) ✓

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 44

Page 45: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Roadmap •  What are roles •  Roles and communities •  Roles and equivalences (from sociology) •  Roles (from data mining) •  Summary

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 45

Page 46: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Summary •  Roles

–  Structural behavior (“function”) of nodes

–  Complementary to communities

–  Previous work mostly in sociology under equivalences

–  Recent graph mining work produces mixed-membership roles, is fully automatic and scalable

–  Can be used for many tasks: transfer learning, re-identification, node dynamics, etc

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 46

Page 47: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

Acknowledgement •  LLNL: Brian Gallagher, Keith Henderson •  IBM Watson: Hanghang Tong •  Google: Sugato Basu •  CMU: Leman Akoglu, Danai Koutra, Lei Li

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 47

Thanks to: LLNL, NSF, IARPA.

Page 48: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

References Deterministic Equivalences

•  S. Boorman, H.C. White: Social Structure from Multiple Networks: II. Role Structures. American Journal of Sociology, 81:1384-1446, 1976.

•  S.P. Borgatti, M.G. Everett: Notions of Positions in Social Network Analysis. In P. V. Marsden (Ed.): Sociological Methodology, 1992:1–35.

•  S.P. Borgatti, M.G. Everett, L. Freeman: UCINET IV, 1992.

•  S.P. Borgatti, M.G. Everett, Regular Blockmodels of Multiway, Multimode Matrices. Social Networks, 14:91-120, 1992.

•  R. Breiger, S. Boorman, P. Arabie: An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling. Journal of Mathematical Psychology, 12:328-383, 1975.

•  R.S. Burt: Positions in Networks. Social Forces, 55:93-122, 1976.

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 48

Page 49: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

References •  P. DiMaggio: Structural Analysis of Organizational Fields: A Blockmodel Approach.

Research in Organizational Behavior, 8:335-70, 1986.

•  P. Doreian, V. Batagelj, A. Ferligoj: Generalized Blockmodeling. Cambridge University Press, 2005.

•  M.G. Everett, S. P. Borgatti: Regular Equivalence: General Theory. Journal of Mathematical Sociology, 19(1):29-52, 1994.

•  K. Faust, A.K. Romney: Does Structure Find Structure? A critique of Burt's Use of Distance as a Measure of Structural Equivalence. Social Networks, 7:77-103, 1985.

•  K. Faust, S. Wasserman: Blockmodels: Interpretation and Evaluation. Social Networks, 14:5–61. 1992.

•  R.A. Hanneman, M. Riddle: Introduction to Social Network Methods. University of California, Riverside, 2005.

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 49

Page 50: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

References •  F. Lorrain, H.C. White: Structural Equivalence of Individuals in Social Networks.

Journal of Mathematical Sociology, 1:49-80, 1971.

•  L.D. Sailer: Structural Equivalence: Meaning and Definition, Computation, and Applications. Social Networks, 1:73-90, 1978.

•  M.K. Sparrow: A Linear Algorithm for Computing Automorphic Equivalence Classes: The Numerical Signatures Approach. Social Networks, 15:151-170, 1993.

•  S. Wasserman, K. Faust: Social Network Analysis: Methods and Applications. Cambridge University Press, 1994.

•  H.C. White, S. A. Boorman, R. L. Breiger: Social Structure from Multiple Networks I. Blockmodels of Roles and Positions. American Journal of Sociology, 81:730-780, 1976.

•  D.R. White, K. Reitz: Graph and Semi-Group Homomorphism on Networks and Relations. Social Networks, 5:143-234, 1983.

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 50

Page 51: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

References Stochastic blockmodels

•  E.M. Airoldi, D.M. Blei, S.E. Fienberg, E.P. Xing: Mixed Membership Stochastic Blockmodels. Journal of Machine Learning Research, 9:1981-2014, 2008.

•  P.W. Holland, K.B. Laskey, S. Leinhardt: Stochastic Blockmodels: Some First Steps. Social Networks, 5:109-137, 1983.

•  C. Kemp, J.B. Tenenbaum, T.L. Griffiths, T. Yamada, N. Ueda: Learning Systems of Concepts with an Infinite Relational Model. AAAI 2006.

•  P.S. Koutsourelakis, T. Eliassi-Rad: Finding Mixed-Memberships in Social Networks. AAAI Spring Symposium on Social Information Processing, Stanford, CA, 2008.

•  K. Nowicki ,T. Snijders: Estimation and Prediction for Stochastic Blockstructures, Journal of the American Statistical Association, 96:1077–1087, 2001.

•  Z. Xu, V. Tresp, K. Yu, H.-P. Kriegel: Infinite Hidden Relational Models. UAI 2006.

•  S. Wasserman, C. Anderson: Stochastic a Posteriori Blockmodels: Construction and Assessment, Social Networks, 9:1-36, 1987.

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 51

Page 52: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

References Role Discovery

•  K. Henderson, B. Gallagher, L. Li, L. Akoglu, T. Eliassi-Rad, H. Tong, C. Faloutsos: It's Who Your Know: Graph Mining Using Recursive Structural Features. KDD 2011: 663-671.

•  K. Henderson, B. Gallagher, T. Eliassi-Rad, H. Tong, S. Basu, L. Akoglu, D. Koutra, L. Li, C. Faloutsos: RolX: Structural Role Extraction & Mining in Large Graphs. Technical Report, Lawrence Livermore National Laboratory, Livermore, CA, 2011.

•  Ruoming Jin, Victor E. Lee, Hui Hong: Axiomatic ranking of network role similarity. KDD 2011: 922-930.

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 52

Page 53: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

References Community Discovery

•  A. Clauset, M.E.J. Newman, C. Moore: Finding Community Structure in Very Large Networks. Phys. Rev. E., 70:066111, 2004.

•  M.E.J. Newman: Finding Community Structure in Networks Using the Eigenvectors of Matrices. Phys. Rev. E., 74:036104, 2006.

Propositionalisation

•  A.J. Knobbe, M. de Haas, A. Siebes: Propositionalisation and Aggregates. PKDD 2001: 277-288.

•  M.-A. Krogel, S. Rawles, F. Zelezny, P.A. Flach, N. Lavrac, S. Wrobel: Comparative Evaluation of Approaches to Propositionalization. ILP 2003: 197-214.

•  J. Neville, D. Jensen, B. Gallagher: Simple Estimators for Relational Bayesian Classifiers. ICDM 2003: 609-612.

SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 53

Page 54: Discovering Roles and Anomalies in Graphs: Theory and ... · Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos

T. Eliassi-Rad & C. Faloutsos 54

Back to Overview

SDM'12 Tutorial

Roles

Features

Anomalies

Patterns

= rare roles