Scalable Diffusion-Aware Optimization of Network Topology

Elias Boutros Khalil, Bistra Dilkina, Le Song

Georgia Institute of Technology

Problem

• Given

• G(V,E),

• a set of source nodes X (infected nodes)

• Linear Threshold Model

• Find a set of k edges to

• remove, s.t., the spread of a certain

substance is minimized

• add, s.t., the spread of a certain substance

is maximized

Review: Diffusion Models

• Linear Threshold Model

• Each edge has a weight Wuv

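• Each node v chooses a threshold θv uniformly at random in [0,1]

• Node v will be infected if the total weight of its infected in-neighbors reaches its threshold: Σ Wuv (over infected u) ≥ θv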

• Independent Cascade Model

• Each edge has a propagation probability

• Each infected node u has only one chance

to infect its neighbor v with prob. Puv
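
The two diffusion models reviewed above can be sketched as small simulations. This is a minimal illustration, not code from the paper: the function names, the dictionary-based graph layout, and the explicit `rng` argument are assumptions made for the example.

```python
import random

def simulate_ic(out_edges, seeds, rng):
    """One Independent Cascade run.

    out_edges: dict node -> list of (neighbor, propagation probability).
    """
    infected = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p_uv in out_edges.get(u, []):
                # u gets exactly one chance to infect each neighbor v
                if v not in infected and rng.random() < p_uv:
                    infected.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return infected

def simulate_lt(in_edges, nodes, seeds, rng):
    """One Linear Threshold run.

    in_edges: dict node -> list of (in_neighbor, weight).
    """
    # each node draws its threshold uniformly at random
    threshold = {v: rng.random() for v in nodes}
    infected = set(seeds)
    changed = True
    while changed:
        changed = False
        for v in nodes:
            if v in infected:
                continue
            total = sum(w for u, w in in_edges.get(v, []) if u in infected)
            if total >= threshold[v]:
                infected.add(v)
                changed = True
    return infected
```

Averaging `len(simulate_ic(...))` or `len(simulate_lt(...))` over many runs gives a Monte Carlo estimate of the expected spread from a given seed set.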

Review: Influence Maximization

• Given

• G(V,E)

• LT model or IC model

• Find k nodes to activate so as to maximize

the spread of a certain substance

• Greedy algorithm

• Objective function is submodular

• (1-1/e)-approximation
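
The greedy algorithm for influence maximization can be sketched as follows, assuming the IC model and a Monte Carlo spread estimate. The names `ic_spread` and `greedy_seeds`, the graph layout, and the `runs` parameter are illustrative assumptions, not taken from any of the presented papers.

```python
import random

def ic_spread(out_edges, seeds, rng, runs=200):
    """Monte Carlo estimate of the expected IC spread from `seeds`.

    out_edges: dict node -> list of (neighbor, propagation probability).
    """
    total = 0
    for _ in range(runs):
        infected = set(seeds)
        frontier = list(seeds)
        while frontier:
            u = frontier.pop()
            for v, p in out_edges.get(u, []):
                if v not in infected and rng.random() < p:
                    infected.add(v)
                    frontier.append(v)
        total += len(infected)
    return total / runs

def greedy_seeds(out_edges, nodes, k, rng, runs=200):
    """Pick k seeds, each time adding the node with the largest marginal gain.

    Assumes k <= len(nodes).
    """
    seeds = []
    for _ in range(k):
        base = ic_spread(out_edges, seeds, rng, runs) if seeds else 0.0
        best, best_gain = None, float("-inf")
        for v in nodes:
            if v in seeds:
                continue
            gain = ic_spread(out_edges, seeds + [v], rng, runs) - base
            if gain > best_gain:
                best, best_gain = v, gain
        seeds.append(best)
    return seeds
```

Because the spread function is monotone and submodular, picking the largest marginal gain at each step is what yields the (1-1/e) guarantee. With sampled estimates the guarantee holds only approximately.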

Edge Deletion Problem

• Given G, source set A,

• Find k edges

• Supermodular

• Greedy algorithm provides a (1-1/e)-approximation

• Scaling up tricks

Edge Addition Problem

• Given G, source set A,

• Find k edges

• Still supermodular (Equivalent to

constrained submodular minimization)

• Algorithm: maximize the lower bound

Edge Addition Problem

• Marginal Gain is bounded

• Apply an approach for constrained submodular minimization with approximation guarantees

[R. Iyer, S. Jegelka, and J. Bilmes. Fast semidifferential based submodular function optimization. In ICML, 2013.]

Experiments

• Datasets

• Synthetic datasets: generated by the Kronecker graph model

• (1) CorePeriphery, (2) ErdosRenyi and (3)

Hierarchical

• Real datasets:

Experiments

• Competing heuristics

• Random

• Weights: highest weights

• Betweenness

• Eigen: k edges that maximize the drop in the leading eigenvalue (eigendrop)

• Degree: k edges whose destination nodes

have the highest out-degrees [8]

Experiments

[Figures: results for edge deletion; results for edge addition]

Core Decomposition of Uncertain Graphs

Francesco Bonchi, Francesco Gullo, Andreas

Kaltenbrunner, Yana Volkovich

Yahoo Labs, Spain

Core decomposition

• k-core of a graph

• a maximal subgraph in which every vertex

is connected to at least k other vertices

within that subgraph

• Core decomposition

• The set of all k-cores of a graph G forms

the core decomposition of G
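
A core decomposition of a deterministic graph can be computed by repeatedly peeling off a minimum-degree vertex. The sketch below is only illustrative: `core_numbers` and the adjacency-set layout are assumed names, and this simple version selects the minimum by a linear scan, so it does not reach the linear running time that bucket-based implementations achieve.

```python
def core_numbers(adj):
    """Return node -> core index for an undirected graph.

    adj: dict node -> set of neighbors.
    The k-core is the subgraph induced by nodes with core index >= k.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # peel the vertex with the smallest remaining degree
        v = min(remaining, key=degree.get)
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core
```

For a triangle a, b, c with a pendant vertex d attached to a, the triangle vertices get core index 2 and d gets core index 1.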

K-core under uncertain graphs

• A maximal subgraph whose vertices have at least k neighbours in that subgraph with probability no less than η

Example

Motivation

• core decomposition can be computed

efficiently in deterministic graphs

• computed in linear time

• However, does not guarantee efficiency

in uncertain graphs

• even the simplest graph operations may

become computationally intensive.

• uncertain graph

• edges are assigned a probability of existence

• E.g., protein interactions, or the influence of one person on another

Applications

• Influence maximization

• Idea: just reduce the input graph G by keeping only

the inner-most η-shells

• the higher the core index is, the more likely the

vertex is an influential spreader [17]

• Task-driven team formation

• Node: individuals; edge: a probabilistic topic model

• Given a pair <T,Q> where T is the set of terms, Q is

a set of nodes

• Goal: Find a set of nodes A with Q⊆A that is a good team to perform the task in T

• Solution: find a connected component of the (k,η)-core that contains Q

Algorithm framework

• Follows the deterministic core decomposition algorithm

• Replaces the degree of v with the maximum degree such that the probability for v to have at least that degree is no less than η

• This quantity is non-trivial to compute
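
One way to compute the quantity described above is a dynamic program over the incident edges. This is a minimal sketch, assuming edge existences are independent and reading the condition as Pr[deg(v) ≥ k] ≥ η. The names `degree_distribution` and `eta_degree` are illustrative, and this is not claimed to be the paper's own (more efficient) procedure.

```python
def degree_distribution(probs):
    """dist[i] = Pr[exactly i of the incident edges exist].

    probs: existence probabilities of the edges incident to one vertex,
    assumed independent.
    """
    dist = [1.0]
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for i, q in enumerate(dist):
            new[i] += q * (1.0 - p)   # this edge is absent
            new[i + 1] += q * p       # this edge is present
        dist = new
    return dist

def eta_degree(probs, eta):
    """Largest k such that Pr[deg >= k] >= eta."""
    dist = degree_distribution(probs)
    tail = 0.0
    # scan from the largest possible degree down, accumulating Pr[deg >= k]
    for k in range(len(dist) - 1, -1, -1):
        tail += dist[k]
        if tail >= eta:
            return k
    return 0
```

For a vertex with two incident edges of probability 0.5 each, the degree distribution is [0.25, 0.5, 0.25], so the result is 2 for η = 0.25 and 1 for η = 0.7. The table costs time quadratic in the number of incident edges per vertex, which illustrates why even this basic step is more expensive than reading off a degree in a deterministic graph.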

Experiments

Influence Maximization

Task-driven Team-formation

Fast Influence-based Coarsening for Large Networks

KDD, New York City

August 26, 2014

Manish Purohit^, B. Aditya Prakash*,

Chanhyun Kang^, Yao Zhang*, V S Subrahmanian^

*Virginia Tech ^University of Maryland

Networks are getting huge!

Flickr (friendship network): 87 million users and 8 billion photos as of 2013

Amazon (friendship network): 237 million accounts as of 2013

Twitter (follower network): 271 million monthly active users

Facebook (friendship network): 829 million daily active users on average in June 2014

Purohit, Prakash, Kang, Zhang, Subrahmanian 2014

Need for fast analysis

• Ever growing list of applications of

network effects

• Viral Marketing

• Immunization

• Information Diffusion

• …

However, scaling traditional algorithms up to millions of nodes is hard

How to handle large-scale networks

• Approaches

• Use faster / simpler algorithms

• Perform analysis locally

• i.e., divide the large network into

smaller subgraphs

• Zoom out of the network to obtain a smaller representation of the network (this paper)

Bird’s eye view of a network

• “Zoom-out” of the graph to get a quick

picture

Called “coarsen” in this paper

[Figure: big graph → zoom-out → small representation of the network]

Outline

• Motivation

• Challenges

• Problem Definition

• Our Proposed Method

• Experiments

• Applications

• Conclusion

Challenges

• C1: How do we maintain diffusive

characteristics when coarsening

networks?

• C2: How do we merge nodes to get the coarse network?

• C3: How do we find the best nodes to merge, fast?

C1: Information Diffusion

• Cascading behavior in networks

A diffusion is a graph induced by a time-ordered propagation of information (edges)

[Figure: a blog network, its blog posts, and the resulting information cascade. Source: McGlohon et al., SDM 2007]

C1: Model information diffusion

• Information spreads over networks

• e.g., a rumor/meme spreads over the Twitter follower network

• Independent cascade model (IC) [Kempe+, KDD03]

• Weights pij: propagation prob. from i to j

• Each node has only one chance to infect its

neighbors

Meme spreading

C1: Diffusive characteristics

• The first eigenvalue λ1 (of the adjacency matrix) is enough for most diffusion models (Prakash et al. [ICDM’12])

λ1 determines the epidemic threshold

[Figure: three example networks labeled “Safe”, “Vulnerable”, and “Deadly”: increasing λ1, increasing vulnerability]

C1: maintain diffusive characteristics

• Goal: maintain the diffusive characteristics of the original network in the coarsened network

[Figure: original network → coarsen → coarsened network]

Make the coarsened network have the least change in the first eigenvalue

C2: How to merge nodes

• Goal: Merge nodes of graph G to get the

coarsened graph that “approximates” G with

respect to diffusion

Merging a and b gives the least change in λ1. Is this correct?

Original network:

Influence from d to b: 0.5

Influence from d to a: 0.25

Average: 0.375, so the edge from d to the merged node gets weight 0.375

• In general:

C2: How to merge nodes

Merging a,b

Details

C3: which nodes to merge

• Goal:

• Find the best nodes to merge

• Fast, scalable to large network

Talk about it

[Figure: original network → coarsen → coarsened network]

Outline

• Motivation

• Challenges

• Experiments

• Applications

• Conclusion

Problem Definition

Graph Coarsening Problem (GCP)

Given: large graph G(V, E), and reduction

factor α

Find: the best set of edges to merge

Such that: |λG - λH| is minimized

• (i.e. H is the coarsened graph with the

least change in the first eigenvalue)

Naive Greedy Heuristic

Steps:

• Score every edge by the change in eigenvalue

• Greedily choose the edge (a,b) with the least score, and merge (a,b)

• Re-evaluate the scores of every edge and repeat

• Too slow! O(m^2) time to score all edges

• Loses the time benefits of analyzing the smaller graph
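
The naive scoring step can be sketched as below to show where the cost comes from: every candidate edge needs its own eigenvalue computation on a merged copy of the graph. This is an illustration under loud assumptions. The merge rule here (average the weights of edges that become parallel, drop the edges between a and b) is extrapolated from the 0.375 averaging example and may differ from the paper's definition. The names `first_eigenvalue`, `merge`, and `naive_scores` are hypothetical.

```python
def first_eigenvalue(weights, nodes, iterations=500):
    """Estimate the largest eigenvalue by power iteration on A + I.

    weights: dict (src, dst) -> non-negative weight; nodes: list of node ids.
    The identity shift keeps the iteration from oscillating on bipartite graphs.
    """
    x = {v: 1.0 for v in nodes}
    norm = 1.0
    for _ in range(iterations):
        y = dict(x)  # contribution of the identity
        for (src, dst), w in weights.items():
            y[src] += w * x[dst]
        norm = max(y.values())
        x = {v: t / norm for v, t in y.items()}
    return norm - 1.0

def merge(weights, nodes, a, b):
    """ASSUMED merge rule: contract a and b into supernode (a, b),
    halving and summing the weights of edges that touch the pair."""
    c = (a, b)
    merged = {}
    for (src, dst), w in weights.items():
        s2 = c if src in (a, b) else src
        d2 = c if dst in (a, b) else dst
        if s2 == c and d2 == c:
            continue  # edges inside the merged pair disappear
        share = w / 2 if c in (s2, d2) else w
        merged[(s2, d2)] = merged.get((s2, d2), 0.0) + share
    new_nodes = [v for v in nodes if v not in (a, b)] + [c]
    return merged, new_nodes

def naive_scores(weights, nodes):
    """Score each connected pair by |change in first eigenvalue| after merging."""
    base = first_eigenvalue(weights, nodes)
    scores = {}
    for (a, b) in weights:
        if a == b or (b, a) in scores:
            continue  # skip self-loops; score each unordered pair once
        merged, new_nodes = merge(weights, nodes, a, b)
        scores[(a, b)] = abs(first_eigenvalue(merged, new_nodes) - base)
    return scores
```

Each call to `first_eigenvalue` touches every edge at least once, and it is called once per candidate edge, which is the source of the quadratic cost in the number of edges.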

Outline

• Motivation

• Challenges

• CoarseNet

• Experiments

• Applications

• Conclusion

CoarseNet: idea

• Can we approximate the edge scores faster?

• Yes!

• Use matrix perturbation arguments to

estimate (up to first order terms) the score of

an edge in constant time!

• Score all edges in O(m) time

• Naive heuristic: O(m^2) time

CoarseNet: details

• Corollary 5.1: Given the first eigenvalue λ,

and corresponding eigenvectors u, v, the

score of a node pair score(a, b) can be

approximated in constant time.

(a,b) is a node pair

We want to characterize the change of λ after coarsening, i.e., after merging (a,b)

[Equation for the approximate score not recovered; see paper for details. Its annotations: A u = λ u, with u the right eigenvector and v the left eigenvector; the weight of (a,b); the weight of (b,a); and the out-adjacency vector of the merged node c]

CoarseNet: Complete algorithm

• Steps

1. Compute scores for all edges

2. Merge the nodes with the smallest score

3. Go to step 1 until αn nodes are left

[Figure: original network (weight=0.5) → assigning scores → merging edges → coarsened network]

CoarseNet: running time

• Running time: O(m ln(m) + α n n_θ)

• m: number of edges

• n: number of nodes

• n_θ: the maximum degree of any vertex during the merging process

Outline

• Motivation

• Challenges

• Experiments

• Applications

• Conclusion

How do we perform?

The first eigenvalue gets preserved well up to large

coarsening factors!

Amazon

(See more results in the paper)

Scalability w.r.t Reduction Factor (α)

Scales linearly with the desired reduction factor

Amazon (334,863 vertices) DBLP (511,163 vertices)

Scalability w.r.t Graph Size (𝑛)

Flickr

Scales linearly with the number of nodes

We extracted 6 connected components (with 500K to 1M vertices in steps of 100K) of the Flickr network

Outline

• Motivation

• Challenges

• Experiments

• Applications

• Conclusion

• How to market well?

• Convince a subset of individuals to adopt a new

product

• Then, trigger a large cascade of further adoptions

• Influence maximization problem

• [Kempe et al., KDD03]

• Find the best set of seeds in a network to achieve

highest diffusion

Application 1: Influence Maximization

Who is the most

influential person?

Influence

Application 1: Influence Maximization

• Our fast algorithm CSPIN:

Step 1: Coarsen the large social network using CoarseNet

Step 2: Solve influence maximization on the coarsened network

Step 3: Randomly select one node from each selected “supernode”
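
The three steps above can be wired together as a small pipeline. This sketch only shows the control flow: the `coarsen` and `solve_influence_max` callables and the `members` mapping (supernode to the original nodes merged into it) are hypothetical interfaces assumed for the example, not the authors' code.

```python
import random

def cspin(graph, k, coarsen, solve_influence_max, rng):
    """Coarsen, solve on the coarse graph, then map seeds back.

    coarsen(graph) -> (coarse_graph, members), where members maps each
        supernode to the list of original nodes it contains (assumed interface).
    solve_influence_max(coarse_graph, k) -> list of k supernodes (assumed interface).
    """
    # Step 1: coarsen the large network
    coarse_graph, members = coarsen(graph)
    # Step 2: solve influence maximization on the coarsened network
    super_seeds = solve_influence_max(coarse_graph, k)
    # Step 3: randomly select one original node from each selected supernode
    return [rng.choice(members[s]) for s in super_seeds]
```

Any influence maximization solver can be plugged in for step 2, since only its input graph changes.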

Quality of CSPIN

• We use and compare against the fast and popular PMIA algorithm (Chen et al. [KDD’10])

We obtain influence spread as good as by PMIA

Quality of CSPIN w.r.t 𝛼

Up to 95% of the vertices can be merged without significantly affecting the influence spread!

Scalability w.r.t number of seeds

Log scale

Finds good solutions in minutes instead of hours!

Portland (1.5 million vertices)

Application 2: Diffusion Characterization

• Goal: use Graph Coarsening to understand

information cascades

• Dataset: Flixster

• a friendship network with movie ratings

• Cascade: the same movie rating from friends

• Methodology

• coarsen the network using CoarseNet with the

reduction factor α=0.5

• study the formed groups (supernodes)

Diffusion observation

Observation 1: a very large fraction of movies

propagate in a small number of groups

Observation 2: a multi-modal distribution

Stats:

• 1891 groups

• mean group size: 16.6

• the largest group: 22061

nodes (roughly 40% of

nodes)

Can get non-network

surrogates for

super-nodes

Outline

• Motivation

• Challenges

• Experiments

• Applications

• Conclusion

Conclusion

Graph Coarsening Problem

• Given: a large graph and

the reduction factor

• Find: "best" nodes to

coarsen

CoarseNet

• estimate edge score in

constant time

• Sub-quadratic

Applications

• Influence Maximization

• Diffusion Characterization

[Figure: original network → coarsen → coarsened network]

Any Questions?

• Code at:

http://www.cs.vt.edu/~badityap/

Funding:
