Scalable Diffusion-Aware Optimization of Network Topology
Elias Boutros Khalil, Bistra Dilkina, Le Song
Georgia Institute of Technology
Problem
• Given
• G(V,E),
• a set of source nodes X (infected nodes)
• Linear Threshold Model
• Find a set of k edges to
• remove, s.t., the spread of a certain
substance is minimized
• add, s.t., the spread of a certain substance
is maximized
Review: Diffusion Models
• Linear Threshold Model
• Each edge has a weight Wuv
• Each node v chooses a threshold θv uniformly
at random in [0,1]
• Node v will be infected if the total weight of its infected in-neighbors u reaches its threshold: Σu Wuv ≥ θv
• Independent Cascade Model
• Each edge has a propagation probability
Puv
• Each infected node u has only one chance
to infect its neighbor v with prob. Puv
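The two models above can be simulated in a few lines. This is a minimal sketch, not the authors' code: the dictionary-based graph layout and the function names `simulate_ic` and `simulate_lt` are assumptions made for illustration, and the LT version assumes the incoming weights of each node sum to at most 1.

```python
import random

def simulate_ic(graph, seeds, rng=None):
    """One Independent Cascade run.

    graph: dict u -> {v: p_uv} of out-edges (assumed layout).
    Returns the set of nodes infected when the cascade stops.
    """
    rng = rng or random.Random()
    infected = set(seeds)
    frontier = list(infected)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p_uv in graph.get(u, {}).items():
                # u gets exactly one chance to infect each neighbor v.
                if v not in infected and rng.random() < p_uv:
                    infected.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return infected

def simulate_lt(graph, seeds, rng=None):
    """One Linear Threshold run.

    graph: dict u -> {v: w_uv} of out-edges (assumed layout).
    """
    rng = rng or random.Random()
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    theta = {v: rng.random() for v in nodes}  # thresholds drawn uniformly
    infected = set(seeds)
    changed = True
    while changed:
        changed = False
        for v in nodes - infected:
            # Total weight arriving from currently infected in-neighbors.
            incoming = sum(graph.get(u, {}).get(v, 0.0) for u in infected)
            if incoming >= theta[v]:
                infected.add(v)
                changed = True
    return infected
```

Averaging the size of the returned set over many runs gives a Monte Carlo estimate of the expected spread from the seed set.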
Review: Influence Maximization
• Given
• G(V,E)
• LT model or IC model
• Find k nodes to activate so as to maximize
the spread of a certain substance
• Greedy algorithm
• Objective function is submodular
• (1-1/e)-approximation
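The greedy scheme can be sketched as follows. This is an illustration under stated assumptions: `greedy_select` and the toy `coverage` objective are hypothetical names, and the coverage function only stands in for the expected spread, which in practice would be estimated (for example by Monte Carlo simulation of the diffusion model).

```python
def greedy_select(candidates, k, objective):
    """Pick k items, each time adding the one with the largest marginal gain.

    objective: function mapping a list of chosen items to a number.
    For a monotone submodular objective this is the classic greedy
    with a (1-1/e) guarantee.
    """
    chosen = []
    remaining = list(candidates)
    for _ in range(min(k, len(remaining))):
        base = objective(chosen)
        # Ties are broken by the order of `candidates`.
        best = max(remaining, key=lambda c: objective(chosen + [c]) - base)
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy stand-in for "spread": number of distinct nodes reached by the seeds.
reach = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}}

def coverage(seeds):
    return len(set().union(*(reach[s] for s in seeds)))

print(greedy_select(["a", "b", "c"], 2, coverage))  # ['c', 'a']
```

Here "c" is picked first (it covers four nodes), then "a" (three new nodes against one for "b").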
Edge Deletion Problem
• Given G, source set A,
• Find k edges to remove so that the spread from A is minimized
• Supermodular
• Greedy algorithm provides a (1-1/e)-approximation
• Scaling up tricks
Edge Addition Problem
• Given G, source set A,
• Find k edges to add so that the spread from A is maximized
• Still supermodular (equivalent to
constrained submodular minimization)
• Algorithm: maximize the lower bound
Edge Addition Problem
• Marginal Gain is bounded
• Apply an approach for constrained submodular
minimization with approximation guarantees
[R. Iyer, S. Jegelka, and J. Bilmes. Fast semidifferential-based
submodular function optimization. In ICML, 2013.]
Experiments
• Datasets
• Synthetic datasets: generated by the Kronecker
graph model
• (1) CorePeriphery, (2) ErdosRenyi and (3)
Hierarchical
• Real datasets:
Experiments
• Competing heuristics
• Random
• Weights: k edges with the highest weights
• Betweenness
• Eigen: k edges that maximize the drop in the
leading eigenvalue (eigendrop)
• Degree: k edges whose destination nodes
have the highest out-degrees [8]
Experiments
[Figures: results for edge deletion and edge addition]
Core Decomposition of Uncertain Graphs
Francesco Bonchi, Francesco Gullo, Andreas
Kaltenbrunner, Yana Volkovich
Yahoo Labs, Spain
Core decomposition
• k-core of a graph
• a maximal subgraph in which every vertex
is connected to at least k other vertices
within that subgraph
• Core decomposition
• The set of all k-cores of a graph G forms
the core decomposition of G
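For a deterministic graph, the decomposition can be computed by repeatedly peeling off a vertex of minimum degree. The sketch below uses a simple quadratic-time version for readability (the linear-time variant keeps vertices in degree buckets); the adjacency-dict layout and the name `core_numbers` are assumptions made for illustration.

```python
def core_numbers(adj):
    """Core index of every vertex of an undirected graph.

    adj: dict vertex -> set of neighbors (assumed layout).
    A vertex has core index k if it belongs to the k-core
    but not to the (k+1)-core.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel a vertex of minimum remaining degree.
        v = min(remaining, key=degree.get)
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core

# Triangle a-b-c with a pendant vertex d attached to a.
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(core_numbers(adj))
```

In this example the triangle forms the 2-core, while d only belongs to the 1-core.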
K-core in uncertain graphs: the (k,η)-core
• A maximal subgraph whose vertices have at
least k neighbours in that subgraph with
probability no less than η
Example
Motivation
• core decomposition can be computed
efficiently in deterministic graphs
• computed in linear time
• However, this efficiency is not guaranteed
in uncertain graphs
• even the simplest graph operations may
become computationally intensive
• Uncertain graph
• edges are assigned a probability of existence
• E.g., protein-interaction networks, the influence of one
person on another
Applications
• Influence maximization
• Idea: just reduce the input graph G by keeping only
the inner-most η-shells
• the higher the core index is, the more likely the
vertex is an influential spreader [17]
• Task-driven team formation
• Nodes: individuals; edges: a probabilistic topic model
• Given a pair <T,Q> where T is a set of terms and Q is
a set of nodes
• Goal: find a set of nodes A where Q⊆A, which is a
good team to perform the task in T
• Solution: find a connected component of the (k,η)-core
which contains A
Algorithm framework
• Follows the deterministic case, with the degree of v replaced by its η-degree
• η-degree of v: the maximum degree such that
the probability for v to have that
degree is no less than η
• Non-trivial to compute
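One way to compute such a degree for a single vertex is a small dynamic program over its incident edges. This is a sketch under two assumptions that the slide does not state explicitly: edges exist independently of each other, and "to have that degree" is read as "to have at least that degree" (in line with the core definition above). The name `eta_degree` is hypothetical.

```python
def eta_degree(edge_probs, eta):
    """Largest k such that Pr[deg(v) >= k] >= eta.

    edge_probs: existence probabilities of the edges incident to v,
    assumed independent.
    """
    # dist[j] = probability that exactly j incident edges exist.
    dist = [1.0]
    for p in edge_probs:
        new = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            new[j] += q * (1.0 - p)   # this edge is absent
            new[j + 1] += q * p       # this edge is present
        dist = new
    # Accumulate the tail Pr[deg >= k] from the largest k downwards.
    tail = 0.0
    for k in range(len(dist) - 1, -1, -1):
        tail += dist[k]
        if tail >= eta:
            return k
    return 0  # only reached through floating-point round-off

print(eta_degree([0.5, 0.5], 0.7))  # 1, since Pr[deg >= 1] = 0.75
```

The table costs O(d²) time for a vertex with d incident edges, which hints at why the uncertain case is more expensive than the deterministic one.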
Experiments
[Figures: results for influence maximization and task-driven team formation]
Fast Influence-based Coarsening for Large Networks
KDD, New York City
August 26, 2014
Manish Purohit^, B. Aditya Prakash*,
Chanhyun Kang^, Yao Zhang*, V S Subrahmanian^
*Virginia Tech ^University of Maryland
Networks are getting huge!
• Flickr (friendship network): 87 million
users and 8 billion photos until 2013
• Amazon (friendship network): 237 million
accounts until 2013
• Twitter (follower network): 271 million
monthly active users
• Facebook (friendship network): 829
million daily active users on average in
June 2014
Purohit, Prakash, Kang, Zhang, Subrahmanian 2014
Need for fast analysis
• Ever growing list of applications of
network effects
• Viral Marketing
• Immunization
• Information Diffusion
• …
However, scaling traditional algorithms
up to millions of nodes is hard
How to handle large-scale networks
• Approaches
• Use faster / simpler algorithms
• Perform analysis locally
• i.e., divide the large network into
smaller subgraphs
• Zoom out of the network to
obtain a smaller
representation of the network (this paper)
Bird’s eye view of a network
• “Zoom-out” of the graph to get a quick
picture (called “coarsening” in this paper)
[Figure: a big graph (nodes A to F) is zoomed out into a small representation of the network]
Outline
• Motivation
• Challenges
• Problem Definition
• Our Proposed Method
• Experiments
• Applications
• Conclusion
Challenges
• C1: How do we maintain diffusive
characteristics when coarsening
networks?
• C2: How do we merge nodes to get the
coarse network?
• C3: How do we quickly find the best nodes to
merge?
C1: Information Diffusion
• Cascading behavior in networks
• A diffusion (cascade) is a graph induced by a time-ordered propagation of information (edges)
[Figure: a blog network (blogs B1 to B4 with posts and links) and the resulting information cascade. Source: [McGlohon et al., SDM2007]]
C1: Model information diffusion
• Information spreads over networks
• e.g., a rumor or meme spreading over the Twitter follower
network
• Independent cascade model (IC) [Kempe+, KDD03]
• Weights pij: propagation prob. from i to j
• Each node has only one chance to infect its
neighbors
C1: Diffusive characteristics
• First eigenvalue λ1 (of adjacency matrix)
is enough for most diffusion models.
(Prakash et al. [ICDM’12])
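As a rough illustration of how λ1 can be obtained for a small graph, here is a power-iteration sketch in plain Python. The function name `leading_eigenvalue` is an assumption, the matrix is assumed to be non-negative, and iterating with A + I instead of A is a standard trick (not taken from the slides) that keeps the iteration from oscillating on bipartite graphs.

```python
def leading_eigenvalue(A, iters=500):
    """Estimate the first eigenvalue of a non-negative square matrix.

    A: list of rows (assumed dense layout, fine for small examples).
    """
    n = len(A)
    x = [1.0 / n] * n
    for _ in range(iters):
        # One step of power iteration on A + I.
        y = [x[i] + sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(y)
        x = [t / total for t in y]
    # For an eigenvector x, A x = lambda x, so the sums have ratio lambda.
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(Ax) / sum(x)

triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(leading_eigenvalue(triangle))  # close to 2.0
```

For graphs with millions of nodes a sparse eigensolver would be used instead; the dense loops here are only meant to show the idea.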
• λ1 determines the epidemic threshold
[Figure: three example networks labeled “Safe”, “Vulnerable” and “Deadly”; increasing λ1 means increasing vulnerability]
C1: maintain diffusive characteristics
• Goal: maintain the diffusive characteristics of
the original network in the coarsened network
• Idea: make the coarsened network have the least
change in the first eigenvalue
[Figure: original network (nodes A to F) coarsened into a smaller network]
C2: How to merge nodes
• Goal: Merge nodes of graph G to get the
coarsened graph that “approximates” G with
respect to diffusion
• Example (original network): merging a and b gives the least change
of λ1
• Influence from d to b: 0.5
• Influence from d to a: 0.25
• After merging, the influence from d to the merged node is set to 0.375. Is this correct?
• It is the average: (0.5 + 0.25) / 2 = 0.375
C2: How to merge nodes
• In general, for merging a, b: [equation shown as an image on the slide; see the paper for details]
C3: which nodes to merge
• Goal:
• Find the best nodes to merge
• Fast, scalable to large networks
• (Discussed later in the talk)
[Figure: original network (nodes A to F) coarsened into a smaller network]
Outline
• Motivation
• Challenges
• Problem Definition
• Our Proposed Method
• Experiments
• Applications
• Conclusion
Problem Definition
Graph Coarsening Problem (GCP)
Given: large graph G(V, E), and reduction
factor α
Find: the best set of edges to merge
Such that: |λG - λH| is minimized
• (i.e. H is the coarsened graph with the
least change in the first eigenvalue)
Naive Greedy Heuristic
Steps:
• Score every edge by the change in eigenvalue
• Greedily choose the edge (a,b) with the least score,
and merge (a,b)
• Re-evaluate the scores of every edge and repeat
Problems:
• Too slow! O(m²) time to score all edges
• Loses the time benefits of analyzing the smaller graph
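The naive heuristic can be sketched as below, mainly to make its cost visible: every candidate merge triggers a fresh eigenvalue computation. Everything here is an illustrative assumption rather than the paper's method: the names `lambda1`, `merge_nodes` and `naive_coarsen`, the dense matrix layout, and above all the merge rule, which simply averages the weights of a and b in both directions and drops the self-loop. Plain averaging matches the 0.375 example for incoming influence, but the paper's general rule is not recoverable from the slides.

```python
def lambda1(A, iters=300):
    """First eigenvalue of a non-negative matrix (power iteration on A + I)."""
    n = len(A)
    x = [1.0 / n] * n
    for _ in range(iters):
        y = [x[i] + sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(y)
        x = [t / total for t in y]
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(Ax) / sum(x)

def merge_nodes(A, a, b):
    """Merge node b into node a (requires a < b); ASSUMED averaging rule."""
    keep = [i for i in range(len(A)) if i != b]  # merged node keeps slot a
    M = []
    for i in keep:
        row = []
        for j in keep:
            if i == a and j == a:
                w = 0.0                              # no self-loop
            elif i == a:
                w = (A[a][j] + A[b][j]) / 2          # out-weights averaged
            elif j == a:
                w = (A[i][a] + A[i][b]) / 2          # in-weights averaged
            else:
                w = A[i][j]
            row.append(w)
        M.append(row)
    return M

def naive_coarsen(A, target_n):
    """Greedily merge the edge whose merge changes lambda1 the least."""
    while len(A) > target_n:
        n = len(A)
        lam = lambda1(A)
        pairs = [(a, b) for a in range(n) for b in range(a + 1, n)
                 if A[a][b] > 0 or A[b][a] > 0]
        if not pairs:
            break
        # One full eigenvalue computation per candidate edge: the bottleneck.
        a, b = min(pairs, key=lambda p: abs(lambda1(merge_nodes(A, *p)) - lam))
        A = merge_nodes(A, a, b)
    return A
```

With nodes ordered (a, b, d) and edges d→a of weight 0.25 and d→b of weight 0.5, `merge_nodes` gives the merged node an incoming weight of 0.375 from d.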
Outline
• Motivation
• Problem Definition
• Challenges
• Our Proposed Method
• CoarseNet
• Experiments
• Applications
• Conclusion
CoarseNet: idea
• Can we approximate the edge scores faster?
• Yes!
• Use matrix perturbation arguments to
estimate (up to first order terms) the score of
an edge in constant time!
• Score all edges in O(m) time
• Naive Heuristic: O(m²) time
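The first-order idea behind such estimates is the standard perturbation result for a simple eigenvalue: if A changes by a small ΔA, then λ changes by roughly vᵀ ΔA u / vᵀ u, where u and v are the right and left eigenvectors. The sketch below checks this on a tiny matrix. It illustrates the general result only, not the paper's Corollary 5.1; the names `leading_pair` and `first_order_change` are assumptions.

```python
def leading_pair(A, iters=2000):
    """First eigenvalue and right eigenvector of a non-negative matrix."""
    n = len(A)
    x = [1.0 / n] * n
    for _ in range(iters):
        # Power iteration on A + I to avoid oscillation.
        y = [x[i] + sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(y)
        x = [t / total for t in y]
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(Ax) / sum(x), x

def first_order_change(A, dA):
    """Estimate lambda(A + dA) - lambda(A) to first order in dA."""
    n = len(A)
    _, u = leading_pair(A)                              # A u = lambda u
    At = [[A[j][i] for j in range(n)] for i in range(n)]
    _, v = leading_pair(At)                             # v^T A = lambda v^T
    num = sum(v[i] * dA[i][j] * u[j] for i in range(n) for j in range(n))
    den = sum(v[i] * u[i] for i in range(n))
    return num / den

A = [[0, 0.5, 0.2], [0.3, 0, 0.4], [0.6, 0.1, 0]]
dA = [[0, 0.01, 0], [0, 0, 0], [0, 0, 0]]
B = [[A[i][j] + dA[i][j] for j in range(3)] for i in range(3)]
exact = leading_pair(B)[0] - leading_pair(A)[0]
print(exact, first_order_change(A, dA))  # the two values should be close
```

Once λ, u and v are known, the estimate for a perturbation touching only a few entries needs only those entries, which is what makes per-edge scoring cheap.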
CoarseNet: details
• Corollary 5.1: Given the first eigenvalue λ,
and corresponding eigenvectors u, v, the
score of a node pair score(a, b) can be
approximated in constant time.
• (a,b) is a node pair
CoarseNet
• We want to characterize the change of λ after coarsening
[Figure: nodes a and b, with neighbors e, f and g, are merged into node c]
• The approximation (see paper for details) uses the left and right eigenvectors (A u = λ u, with entries u(i)), the weights of (a,b) and (b,a), and the out-adjacency vector of the merged node c
CoarseNet: Complete algorithm
• Steps
1. Compute scores for all edges (node pairs)
2. Merge the nodes with the smallest score
3. Go to step 1 until αn nodes are left
[Figure: original network (weight=0.5), assigning scores, merging edges, coarsened network]
CoarseNet: running time
• Running time: O(m ln(m) + α·n·nθ)
• m: number of edges
• n: number of nodes
• nθ: the maximum degree of any vertex during the
merging process
Outline
• Motivation
• Challenges
• Problem Definition
• Our Proposed Method
• Experiments
• Applications
• Conclusion
How do we perform?
• The first eigenvalue is preserved well up to large
coarsening factors!
[Figures: results on Amazon and DBLP]
(See more results in the paper)
Scalability w.r.t Reduction Factor (α)
• Scales linearly with the desired reduction factor
[Figures: Amazon (334,863 vertices) and DBLP (511,163 vertices)]
(See more results in the paper)
Scalability w.r.t Graph Size (𝑛)
• Scales linearly with the number of nodes
• We extracted 6 connected components (with 500K to 1M vertices
in steps of 100K) of the Flickr network
[Figure: Flickr]
Outline
• Motivation
• Challenges
• Problem Definition
• Our Proposed Method
• Experiments
• Applications
• Conclusion
Application 1: Influence Maximization
• How to market well?
• Convince a subset of individuals to adopt a new
product
• Then, trigger a large cascade of further adoptions
• Influence maximization problem
• [Kempe et al., KDD03]
• Find the best set of seeds in a network to achieve
the highest diffusion
• Who is the most influential person?
Application 1: Influence Maximization
• Our fast algorithm CSPIN:
Step 1: Coarsen the large social network using CoarseNet
Step 2: Solve influence maximization on the coarsened network
Step 3: Randomly select one node from each selected “supernode”
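The three steps compose naturally as a small pipeline. In this sketch, `coarsen` and `solve_influence_max` are hypothetical callables supplied by the caller, and `members` is an assumed mapping from each supernode to the original nodes merged into it; none of these names come from the talk.

```python
import random

def cspin(graph, k, coarsen, solve_influence_max, rng=None):
    """Sketch of the CSPIN pipeline with pluggable (hypothetical) components.

    coarsen(graph) -> (coarse_graph, members), where members maps each
        supernode to the original nodes it contains.
    solve_influence_max(coarse_graph, k) -> list of k chosen supernodes.
    """
    rng = rng or random.Random()
    coarse_graph, members = coarsen(graph)              # Step 1
    supernodes = solve_influence_max(coarse_graph, k)   # Step 2
    # Step 3: one random original node per selected supernode.
    return [rng.choice(list(members[s])) for s in supernodes]
```

Any influence maximization solver can be plugged into step 2; since it only sees the coarsened graph, its running time depends on the reduced size rather than the original one.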
[Figure: Step 1 coarsens the network (nodes A to F); Step 2 solves influence maximization on the coarsened network; Step 3 randomly selects one node from the selected supernode C. We call it CSPIN]
Quality of CSPIN
• We use and compare against the fast and
popular PMIA algorithm (Chen et al.
[KDD’10])
• We obtain influence spread as good as that of PMIA
Quality of CSPIN w.r.t 𝛼
• We can merge up to 95% of the vertices
without significantly affecting the influence spread!
Scalability w.r.t number of seeds
• Finds good solutions in minutes instead of hours!
[Figure: running time (log scale) vs. number of seeds on Portland (1.5 million vertices)]
(See more results in the paper)
Application 2: Diffusion Characterization
• Goal: use Graph Coarsening to understand
information cascades
• Dataset: Flixster
• a friendship network with movie ratings
• Cascade: the same movie rating from friends
• Methodology
• coarsen the network using CoarseNet with the
reduction factor α=0.5
• study the formed groups (supernodes)
Diffusion observation
Observation 1: a very large fraction of movies
propagate in a small number of groups
Observation 2: a multi-modal distribution
Stats:
• 1891 groups
• mean group size: 16.6
• the largest group: 22061
nodes (roughly 40% of
nodes)
• Can get non-network surrogates for supernodes
(See more results in the paper)
Outline
• Motivation
• Challenges
• Problem Definition
• Our Proposed Method
• Experiments
• Applications
• Conclusion
Conclusion
Graph Coarsening Problem
• Given: a large graph and
the reduction factor
• Find: "best" nodes to
coarsen
CoarseNet
• Estimates edge scores in
constant time
• Sub-quadratic
Applications
• Influence Maximization
• Diffusion Characterization
[Figure: original network (nodes A to F) coarsened into a smaller network]
Any Questions?
• Code at: http://www.cs.vt.edu/~badityap/