Construction Algorithms for Online Social Networks
Minas Gjoka, Balint Tillman, Athina Markopoulou
University of California, Irvine
Graphs
• Social Networks
• Protein interactions
• World Wide Web
• Autonomous Systems
• DNS
Motivation
§ Measurements/sampling OSNs
• http://odysseas.calit.uci.edu/osn/
• [INFOCOM 2010], [SIGMETRICS 2011], 3x[JSAC 2011], [WOSN 2012]…
• ~3500 researchers have requested our FB datasets
§ Generate synthetic graphs that resemble real social networks
• to use in simulations
• for anonymization
§ Q1: resemble in terms of what?
§ Q2: generate how?
dK-Series
§ dK-series framework [Mahadevan et al, Sigcomm ’06]
• “A set of graph properties that describe and constrain random graphs, using degree correlations, in successively finer detail”
dK-Series
§ dK-series framework [Mahadevan et al, Sigcomm ’06]
• 0K specifies the average node degree: $\bar{k} = 2|E| / |V|$
dK-Series
§ dK-series framework [Mahadevan et al, Sigcomm ’06]
• 0K specifies the average node degree
• 1K specifies the node degree sequence: $D(k) = \sum_{a \in V_k} 1$
o node degree “sequence” vs “distribution”
Example (1K): 7-node graph with nodes 1a, 1b, 2a, 3a, 3b, 4a, 4b (leading digit = degree)
k     1  2  3  4
D(k)  2  1  2  2
dK-Series
§ dK-series framework [Mahadevan et al, Sigcomm ’06]
• 0K specifies the average node degree
• 1K specifies the node degree sequence
• 2K specifies the joint node degree matrix (JDM): $JDM(k,l) = \sum_{a \in V_k} \sum_{b \in V_l} 1_{\{a,b\} \in E}$
Example (2K): 7-node graph with nodes 1a, 1b, 2a, 3a, 3b, 4a, 4b; JDM(k,l) with rows k and columns l
k\l  1  2  3  4
1    0  0  1  1
2    0  0  1  1
3    1  1  0  4
4    1  1  4  2
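The 0K, 1K and 2K statistics can be computed directly from an edge list. The following is a minimal sketch: the edge list `EDGES` is an assumption (one 7-node graph chosen to be consistent with the example JDM, since the edges of the original figure are not recoverable), and the function names `zero_k`, `one_k` and `two_k` are hypothetical.

```python
from collections import Counter

# Assumed example: one 7-node graph consistent with the example JDM.
# Node names follow the slides (leading digit = degree); the edges are
# an illustrative choice, not taken from the original figure.
EDGES = [
    ("4a", "4b"),
    ("3a", "4a"), ("3a", "4b"), ("3b", "4a"), ("3b", "4b"),
    ("2a", "3a"), ("2a", "4a"),
    ("1a", "3b"), ("1b", "4b"),
]

def degrees(edges):
    """Degree of every node that appears in the edge list."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

def zero_k(edges):
    """0K: average node degree, 2|E| / |V|."""
    return 2 * len(edges) / len(degrees(edges))

def one_k(edges):
    """1K: D(k) = number of nodes with degree k."""
    return Counter(degrees(edges).values())

def two_k(edges):
    """2K: JDM(k, l), summed over ordered node pairs as in the formula,
    so an edge between two nodes of equal degree counts twice on the diagonal."""
    deg = degrees(edges)
    jdm = Counter()
    for a, b in edges:
        jdm[deg[a], deg[b]] += 1
        jdm[deg[b], deg[a]] += 1
    return jdm

print(zero_k(EDGES))
print(dict(one_k(EDGES)))
print(dict(two_k(EDGES)))
```

With this convention every row k of the JDM sums to k·D(k), the number of stubs of degree-k nodes, which is why 2K includes 1K.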
dK-Series
§ dK-series framework [Mahadevan et al, Sigcomm ’06]
• 0K specifies the average node degree
• 1K specifies the node degree distribution
• 2K specifies the joint node degree matrix (JDM)
• 3K specifies the number of induced subgraphs of 3 nodes
o nodes are labeled by their degree k
o #Wedges(k1,k2,k3): open paths on three nodes with degrees k1, k2, k3
o #Triangles(k1,k2,k3): triangles on three nodes with degrees k1, k2, k3
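The 3K counts can be sketched as a pass over every node and every pair of its neighbors. The code below is an illustration only: the edge list is an assumed 7-node graph, the function name `three_k` is hypothetical, and the key layouts (end degree, center degree, end degree) for wedges and sorted degree triples for triangles are choices made here, not the authors' notation.

```python
from collections import Counter
from itertools import combinations

# Assumed 7-node example graph (leading digit of a node name = its degree).
EDGES = [
    ("4a", "4b"),
    ("3a", "4a"), ("3a", "4b"), ("3b", "4a"), ("3b", "4b"),
    ("2a", "3a"), ("2a", "4a"),
    ("1a", "3b"), ("1b", "4b"),
]

def three_k(edges):
    """Count wedges and triangles, keyed by the degrees of the nodes involved."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    wedges, corners = Counter(), Counter()
    for center, nbrs in adj.items():
        for u, w in combinations(sorted(nbrs), 2):
            if w in adj[u]:
                # closed triple: every triangle is seen once per corner
                corners[tuple(sorted((deg[u], deg[center], deg[w])))] += 1
            else:
                # open triple: key is (end degree, center degree, end degree)
                lo, hi = sorted((deg[u], deg[w]))
                wedges[lo, deg[center], hi] += 1
    triangles = Counter({key: c // 3 for key, c in corners.items()})
    return wedges, triangles

wedges, triangles = three_k(EDGES)
print(dict(wedges))
print(dict(triangles))
```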
dK-Series
§ dK-series framework [Mahadevan et al, Sigcomm ’06]
• 0K specifies the average node degree
• 1K specifies the node degree distribution
• 2K specifies the joint node degree matrix (JDM)
• 3K specifies the distribution of subgraphs of 3 nodes
• …
• nK specifies the entire graph
§ Nice properties
• Inclusion
• Convergence
• Tradeoff: accuracy vs. complexity
§ OSNs: “2K+”
Related Work
§ Graph Construction Approaches:
• Stochastic: reproduces dK-distribution in expectation.
• Configuration (“pseudograph”): reproduces dK-distribution exactly.
o Deterministic algorithms up to d=2. MCMC for d>=2.
§ 1K Construction
• Configuration: 1K multigraphs or simple graphs [Molloy’1995]
• 1K+ [Bansal’2009, Newman’2009, Serrano & Boguna’2005, …]
• What else is known: conditions for 1K to be graphical [Erdos-Gallai, Havel]; space of graphs with a given degree sequence is connected [Havel-Hakimi]; MCMC for sampling.
§ 2K Construction
• Configuration model [Mahadevan’2006]
• Balance Degree Invariant: [Amanatidis’2008], [Stanton’2012], [Czabarka’2014].
• What else is known: conditions for 2K to be graphical [Amanatidis’08], [Stanton’12]; space of graphs with a given JDM is connected [Stanton’12], [Czabarka’14]; MCMC convergence speed is an open problem.
§ 2K+ Construction
• 2K preserving, 3K targeting using edge rewiring: [Mahadevan’2006]
• 2.5K heuristic: JDM + degree-dependent clustering coefficient: [Gjoka’13]
Our Contributions
§ New 2K Construction Algorithm
• can produce any simple graph
• with the exact JDMtarget
• in O(|E|dmax)
• Main benefit: no constraints in constructed graphs
§ 2K+ Framework: JDMtarget + Additional Properties
• 2K + Node Attributes (exactly)
• 2K + Avg Clustering (approx)
• Main benefit: orders of magnitude faster than 2K+MCMC
2K Construction
§ Input: Joint Degree Matrix
• JDMtarget must be graphical
§ Goal:
• Construct a simple graph with exactly JDMtarget
JDMtarget (rows k, columns l):
k\l  1  2  3  4
1    0  0  1  1
2    0  0  1  1
3    1  1  0  4
4    1  1  4  2
2K Construction
Initialize:
• 1K: create nodes and stubs (nodes 1a, 1b, 2a, 3a, 3b, 4a, 4b)
• JDM(k,l)=0 for all k,l
JDM/JDMtarget after initialization (rows k, columns l; "-" marks cells with target 0):
k\l  1    2    3    4
1    -    -    0/1  0/1
2    -    -    0/1  0/1
3    0/1  0/1  -    0/4
4    0/1  0/1  0/4  0/2
Next frames:
• adding a (1,4) edge gives JDM(1,4) = JDM(4,1) = 1/1
• adding a (2,3) edge then gives JDM(2,3) = JDM(3,2) = 1/1
2K Construction
Initialize:
  1K: create nodes and stubs
  JDM(k,l)=0 for all k,l
Pick (k, l) degree pair, in any order
  While JDM(k, l) < JDMtarget(k, l)
    Pick (x, y) any pair of disconnected nodes with degrees k and l
    if x does not have free stubs: neighbor switch for x
    if y does not have free stubs: neighbor switch for y
    add edge between (x, y)
    JDM(k, l)++
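The loop above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the function name `build_2k_graph`, the node naming scheme and the random choice of node pairs are hypothetical; the target is assumed to be graphical and symmetric, with the diagonal counting each edge twice; and the linear scans used here favor readability over the O(|E|dmax) bound.

```python
import random

# Example target JDM from the slides, as a symmetric dict keyed by (k, l).
JDM_TARGET = {
    (1, 3): 1, (3, 1): 1, (1, 4): 1, (4, 1): 1,
    (2, 3): 1, (3, 2): 1, (2, 4): 1, (4, 2): 1,
    (3, 4): 4, (4, 3): 4, (4, 4): 2,
}

def build_2k_graph(jdm_target, seed=None):
    """Return an adjacency dict of a simple graph with exactly jdm_target.
    Graphicality of the input is assumed, not checked."""
    rng = random.Random(seed)
    nodes, free, adj = {}, {}, {}
    # 1K step: row k sums to the stubs of degree-k nodes, giving node counts.
    for k in sorted({k for k, _ in jdm_target}):
        stubs = sum(c for (a, _), c in jdm_target.items() if a == k)
        nodes[k] = [f"{k}_{i}" for i in range(stubs // k)]
        for v in nodes[k]:
            free[v], adj[v] = k, set()

    def neighbor_switch(w, same_degree, avoid=None):
        """Free one stub at saturated node w without changing the JDM."""
        # b: a node of the same degree that still has a free stub
        b = next(u for u in same_degree if free[u] > 0 and u != w and u != avoid)
        # t: a neighbor of w that is neither b nor adjacent to b
        t = next(u for u in adj[w] if u != b and u not in adj[b])
        adj[w].remove(t)
        adj[t].remove(w)
        adj[b].add(t)
        adj[t].add(b)
        free[w] += 1
        free[b] -= 1

    for (k, l), target in sorted(jdm_target.items()):
        if k > l:
            continue  # symmetric entry, handled as (l, k)
        remaining = target // 2 if k == l else target
        while remaining > 0:
            x, y = rng.choice(nodes[k]), rng.choice(nodes[l])
            if x == y or y in adj[x]:
                continue  # need two distinct, disconnected nodes
            if free[x] == 0:
                neighbor_switch(x, nodes[k])
            if free[y] == 0:
                # when k == l, do not take away x's only free stub
                avoid = x if (k == l and free[x] == 1) else None
                neighbor_switch(y, nodes[l], avoid)
            adj[x].add(y)
            adj[y].add(x)
            free[x] -= 1
            free[y] -= 1
            remaining -= 1
    return adj

graph = build_2k_graph(JDM_TARGET, seed=1)
print({v: sorted(nbrs) for v, nbrs in graph.items()})
```

Different seeds visit node pairs in different orders and so can return different graphs, all with the same JDM.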
Case 1: x, y both have free stubs
Setting: JDM(k, l) < JDMtarget(k, l); node x has degree k, node y has degree l (figure: k=3, l=4)
• Add edge between x and y
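Case 2: x has free stubs but y does not
Setting: JDM(k, l) < JDMtarget(k, l); node x has degree k, node y has degree l (figure: k=3, l=4)
• Neighbor switch between y and b using t: b is a degree-l node with a free stub and t is a neighbor of y that is not connected to b; replacing edge (y, t) with edge (b, t) frees a stub at y and leaves the JDM unchanged
• Add edge between x and y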
Case 3: neither x nor y has free stubs
Setting: JDM(k, l) < JDMtarget(k, l); node x has degree k, node y has degree l (figure: k=3, l=4)
• Neighbor switch between y and b1 using t1
• Neighbor switch between x and b2 using t2
• Add edge between x and y
Properties of 2K Algorithm
§ Terminates with exact JDMtarget in O(|E|dmax)
• It adds 1 edge at a time, while staying below JDMtarget
§ It can produce ALL graphs with the JDMtarget
§ Output graph depends on the order of adding edges
Space of constructed graphs
Example: all 7-node graphs
• Generate all non-isomorphic 7-node simple graphs G1, ..., G1044
• Input: all unique Joint Degree Matrices JDM1, ..., JDM768
• 2K Construction Algorithm
• Output: 768 synthetic graphs (not all distinct)
Flexibility of 2K Algorithm
§ Output graph depends on the order of considering edges to add
§ It can produce ALL graphs with JDMtarget
§ Family of algorithms: add one edge at a time, while staying below JDMtarget
• any order of degree pairs (k,l)
• any order of node pairs (x,y), even before completing a degree pair
• can start with an empty or partially built graph
§ 2K+: can target additional properties fast
• Previously known: space of graphs with JDMtarget is connected, but MCMC mixing is slow
• Property 1: clustering
• Property 2: attribute correlation
Extension 1: Target JDM + Clustering
JDM (rows k, columns l):
k\l  2  3
2    4  4
3    4  2
Intuition: by controlling the order in which we add edges, we can control clustering.
[Figure: three 6-node graphs with this JDM, containing 0, 1 and 2 triangles]
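A small check makes the intuition concrete: two graphs can share a JDM and still differ in clustering. In the sketch below the two edge lists are assumptions (the edges of the original figure are not recoverable), chosen so that both match the 2x2 JDM of this slide; the helper names `jdm` and `avg_clustering` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Two assumed 6-node graphs on nodes 2a..2d (degree 2) and 3a, 3b (degree 3).
G_NO_TRIANGLES = [("3a", "3b"), ("3a", "2a"), ("3a", "2b"), ("3b", "2c"),
                  ("3b", "2d"), ("2a", "2c"), ("2b", "2d")]
G_TWO_TRIANGLES = [("3a", "3b"), ("3a", "2a"), ("3a", "2b"), ("2a", "2b"),
                   ("3b", "2c"), ("3b", "2d"), ("2c", "2d")]

def jdm(edges):
    """Joint degree matrix; same-degree edges count twice on the diagonal."""
    deg = Counter(v for e in edges for v in e)
    out = Counter()
    for a, b in edges:
        out[deg[a], deg[b]] += 1
        out[deg[b], deg[a]] += 1
    return out

def avg_clustering(edges):
    """Mean local clustering coefficient (0 for nodes of degree < 2)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    total = 0.0
    for nbrs in adj.values():
        d = len(nbrs)
        if d >= 2:
            links = sum(1 for u, w in combinations(nbrs, 2) if w in adj[u])
            total += 2 * links / (d * (d - 1))
    return total / len(adj)

print(jdm(G_NO_TRIANGLES) == jdm(G_TWO_TRIANGLES))
print(avg_clustering(G_NO_TRIANGLES), avg_clustering(G_TWO_TRIANGLES))
```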
Extension 1: Target JDM + Clustering
[Figure: 6-node graphs on nodes 2a, 2b, 2c, 2d, 3a, 3b with the same JDM (JDM(2,2)=4, JDM(2,3)=4, JDM(3,3)=2), one with 0 triangles and one with 2 triangles; nodes placed on a circle]
§ [INFOCOM 2013]: add edges in increasing distance → high clustering
• place nodes randomly on a circle, consider node pairs’ distance
§ Random order of node pairs → low clustering
“Sortedness” of node pairs’ list controls clustering
• Example: JDMtarget of Facebook Caltech Network
• Consider many orders of node pairs → create graphs with JDMtarget → compute avg clustering c.
Extension 1: Target JDM + Clustering
[INFOCOM 2015]: control order of node pairs → control clustering
2K+ Avg Clustering
Input: target JDM, avg clustering coefficient c
Stage 1
  E’ = list of node pairs s.t. sortedness(E’) ≈ S(c)
  FOR each candidate node pair (v,w) in E’:
    IF both nodes v and w have free stubs and the corresponding JDM(k, l) < JDMtarget(k, l):
      add edge (v,w)
Stage 2
  IF not all |E| edges have been added:
    add remaining edges using 2K_Simple
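Stage 1 can be sketched as a greedy pass over an ordered list of node pairs. The version below is a simplification under stated assumptions: it uses a fully sorted list (pairs in increasing distance on a circle, as in the [INFOCOM 2013] variant) instead of a list with a tuned sortedness S(c), it omits Stage 2, and the names `stage1_greedy`, `NODE_DEGREE` and `JDM_TARGET` are hypothetical.

```python
import random
from itertools import combinations

# Assumed 6-node example: target degrees and a symmetric target JDM
# (diagonal entries count each edge twice).
NODE_DEGREE = {"2a": 2, "2b": 2, "2c": 2, "2d": 2, "3a": 3, "3b": 3}
JDM_TARGET = {(2, 2): 4, (2, 3): 4, (3, 2): 4, (3, 3): 2}

def stage1_greedy(node_degree, jdm_target, seed=None):
    """Add edges pair by pair in increasing circular distance, never
    exceeding a node's stubs or a JDM target entry. May stop short of |E|."""
    rng = random.Random(seed)
    pos = {v: rng.random() for v in sorted(node_degree)}  # point on a unit circle

    def circle_distance(pair):
        d = abs(pos[pair[0]] - pos[pair[1]])
        return min(d, 1.0 - d)

    pairs = sorted(combinations(sorted(node_degree), 2), key=circle_distance)
    free = dict(node_degree)
    jdm = {}
    edges = []
    for v, w in pairs:
        k, l = node_degree[v], node_degree[w]
        step = 2 if k == l else 1  # same-degree edges count twice
        below_target = jdm.get((k, l), 0) + step <= jdm_target.get((k, l), 0)
        if free[v] > 0 and free[w] > 0 and below_target:
            edges.append((v, w))
            free[v] -= 1
            free[w] -= 1
            jdm[k, l] = jdm.get((k, l), 0) + step
            if k != l:
                jdm[l, k] = jdm.get((l, k), 0) + 1
    return edges

print(stage1_greedy(NODE_DEGREE, JDM_TARGET, seed=0))
```

Any edges still missing after this pass would be added by a 2K construction that starts from the partially built graph, which the algorithm family allows.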
Real world examples: target JDM + avg clustering
[Figure panels: Average Clustering Coefficient; Average Node Shortest Path Length; Average Node Closeness]
§ 2K+MCMC did not finish after several days
Extension 2: Node Attributes
[Figure: two graphs with node attributes that share the same JDM]
JDM (rows k, columns l):
k\l  1  2
1    0  2
2    2  6
Extension 2: Node Attributes Mixing
Joint Attribute Matrix (JAM), or Attribute Mixing Matrix
[Figure: two graphs with the same JDM (JDM(1,2)=2, JDM(2,1)=2, JDM(2,2)=6) but different JAMs]
JAM of graph 1:
2  2
2  4
JAM of graph 2:
4  0
0  6
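Extension 2: Degree+Attribute Mixing
Joint Degree and Attribute Matrix (JDAM): rows and columns are indexed by (degree, attribute) classes
[Figure: the two graphs from the previous slide with their JDM, JAM and JDAM; the classes are degree 1, degree 2 with the first attribute value, degree 2 with the second attribute value]
JDAM of graph 1:
0  1  1
1  0  1
1  1  4
JDAM of graph 2:
0  2  0
2  0  0
0  0  6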
Extension 2: target JDAM
Joint Degree and Attribute Matrix (JDAM)
§ 2K Algorithm also works for target JDAM
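One way to see why the same algorithm carries over: JDM, JAM and JDAM are the same counting procedure applied to different node labels (degree, attribute, or the pair of both). The sketch below assumes a 6-node path with two attribute values "A" and "B"; the graph, the attribute assignment and the function name `joint_matrix` are illustrative assumptions chosen to be consistent with the small JDM and JAM example of the preceding slides.

```python
from collections import Counter

# Assumed example: a 6-node path with a two-valued node attribute.
EDGES = [("v1", "v2"), ("v2", "v3"), ("v3", "v4"), ("v4", "v5"), ("v5", "v6")]
ATTR = {"v1": "A", "v2": "A", "v3": "B", "v4": "B", "v5": "B", "v6": "A"}

def joint_matrix(edges, label):
    """Count edges between label classes over ordered node pairs, so an
    edge inside one class adds 2 to that diagonal entry."""
    out = Counter()
    for a, b in edges:
        out[label[a], label[b]] += 1
        out[label[b], label[a]] += 1
    return out

DEG = Counter(v for e in EDGES for v in e)

jdm = joint_matrix(EDGES, DEG)                                   # label = degree
jam = joint_matrix(EDGES, ATTR)                                  # label = attribute
jdam = joint_matrix(EDGES, {v: (DEG[v], ATTR[v]) for v in DEG})  # label = both

print(dict(jdm))
print(dict(jam))
print(dict(jdam))
```

Summing the JDAM over attributes recovers the JDM, and summing it over degrees recovers the JAM, so a graph that matches a target JDAM matches both.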
Real world examples: graphs with node attributes
[Figure panels: Average Clustering Coefficient; Average Node Shortest Path Length; Average Node Closeness]
Real world examples: small graphs with node attributes
§ Simulation takes ~1 day to target 2K and c = 0.24 with MCMC (using double edge swaps)
Construction of 2K+ Graphs
§ New 2K Construction Algorithm
• can produce any simple graph with exact JDMtarget in O(|E|dmax)
§ 2K+ Framework: JDMtarget + Additional Properties
• Extension 1: 2K (exactly) + Avg Clustering (approx)
• Extension 2: 2K (exactly) + Node Attributes (exactly)
§ Future directions
• Construction: target attributes + structure (towards 3K)
• Applications to privacy
http://odysseas.calit2.uci.edu/osn/
Questions?