Upload
trista
View
38
Download
0
Tags:
Embed Size (px)
DESCRIPTION
Sparsification of Influence Networks. Michael Mathioudakis 1 , Francesco Bonchi 2 , Carlos Castillo 2 , Aris Gionis 2 , Antti Ukkonen 2 1 University of Toronto, Canada 2 Yahoo! Research Barcelona, Spain. Introduction. users perform actions post messages, pictures, videos - PowerPoint PPT Presentation
Citation preview
Sparsification ofInfluence Networks
Michael Mathioudakis1, Francesco Bonchi2, Carlos Castillo2, Aris Gionis2, Antti Ukkonen2
1University of Toronto, Canada2Yahoo! Research Barcelona, Spain
2
Introduction
users perform actionspost messages, pictures, videos
connected with other usersinteract, influence each other
actions propagate
online social networksfacebook 750m userstwitter 100m+ users
nice read indeed!
09:3009:00
3
Problem
sparsify networkeliminate large number of connections
keep important connections
sparsification: a data reduction operationnetwork visualization
efficient graph analysis
which connections are most important for the propagation of actions?
4
What We Do
technical frameworksparsify network according to observed activity
keep connections that best explain propagations
our approachsocial network & observed propagationslearn independent cascade model (ICM)
select k connectionsmost likely to have produced propagations
5
Outline
• introduction• setting– social network– propagation model
• sparsification– optimal algorithm– greedy algorithm: spine
• experiments
6
Social Network
users – nodesB follows A – arc A→B
A B
7
Propagation of Actions
independent cascade modelpropagation of an action unfolds in timesteps
BA
t t+1
p(A,B)
users perform actionsactions propagate
great movie
I liked this movie influence probability
8
Propagation of Actions
icm generates propagationssequence of activations
likelihood
active
not active
t-1 t t+1
action αp(A,B)
CA
B D
E
9
Estimating Influence Probabilities
social network+
set of propagations
p(A,B)
max likelihood
EM – [Saito et.al.]
active
not active
t-1 t t+1
action αp(A,B)
CA
B D
E
10
Outline
• introduction• setting– social network– propagation model
• sparsification– optimal algorithm– greedy algorithm: spine
• experiments
11
Sparsification
social network
p(A,B)
set of propagations
k arcsmost likely to
explain all propagations
B
A
p(A,B)
12
Sparsification
k arcs
A
B
p(A,B)
social network
p(A,B)
set of propagations
most likely to explain all
propagations
13
Sparsification
not the k arcs with largest probabilities
NP-hard and inapproximabledifficult to find solution with non-zero likelihood
14
How to Solve?
brute-force approachtry all subsets of k arcs?
no
break down into smaller problemscombine solutions
15
Optimal Algorithmsparsify separately incoming arcs of individual nodes
optimize corresponding likelihood
A B C
kA kB kC+ + = k
dynamic programmingoptimal solution
however…
16
Spinesparsification of influence networks
greedy algorithmefficient, good results
two phases
phase 1try to obtain a non-zero-likelihood solution
k0 < k arcs
phase 2build on top of phase 1
17
Spine – Phase 1phase 1
obtain a non-zero-likelihood solutionselect greedily arcs that participate in most propagations
until all propagations are explained
A
B
C
social network
D
A
B
CD
A
B
CD
t t+1
action α
action β
18
Spine – Phase 2add one arc at a time, the one that offers
largest increase in likelihoodlo
gL
# arcsk0 k
submodular
approximation guaranteefor phase 2
19
Outline
• introduction• setting– social network– propagation model
• sparsification– optimal algorithm– greedy algorithm: spine
• experiments
20
Experiments
datasets
meme.yahoo.comactions: postings (photos), nodes: users, arcs: who follows whom
data from 2010
memetracker.orgactions: mentions of a phrase, nodes: blogs & news sources,
arcs: who links to whomdata from 2009
21
sampled datasets of different sizes
Experiments
Dataset Actions Arcs Arcs, prob > 0
YMeme-L 26k 1.25M 430k
YMeme-M 13k 1.15M 380k
YMeme-S 5k 466k 73k
MTrack-L 9k 200k 7.8k
MTrack-M 120 110k 1.4k
MTrack-S 780 78k 768
YMeme meme.yahoo.comMTrack memetracker.org
22
Experiments
algorithms
optimal (very inefficient)
spine(a few seconds to 3.5hrs)
by arc probabilityrandom
23
Experiments
24
Model Selection using BICBIC(k) = -2logL + klogN
25
Application
spine as a preprocessing step
influence maximizationselect k nodes to maximize spread of action
[Kempe, Kleinberg, Tardos, 03]NP-hard, greedy approximation
perform on sparsified network insteadlarge benefit in efficiency, little loss in quality
26
Application
27
Public Code and Data
http://www.cs.toronto.edu/~mathiou/spine/
28
The End
Questions?
29