Sparsification of Influence Networks

Sparsification ofInfluence Networks

Michael Mathioudakis1, Francesco Bonchi2, Carlos Castillo2, Aris Gionis2, Antti Ukkonen2

1University of Toronto, Canada2Yahoo! Research Barcelona, Spain

2

Introduction

users perform actionspost messages, pictures, videos

connected with other usersinteract, influence each other

actions propagate

online social networksfacebook 750m userstwitter 100m+ users

nice read indeed!

09:3009:00

3

Problem

sparsify networkeliminate large number of connections

keep important connections

sparsification: a data reduction operationnetwork visualization

efficient graph analysis

which connections are most important for the propagation of actions?

4

What We Do

technical frameworksparsify network according to observed activity

keep connections that best explain propagations

our approachsocial network & observed propagationslearn independent cascade model (ICM)

select k connectionsmost likely to have produced propagations

5

Outline

• introduction• setting– social network– propagation model

• sparsification– optimal algorithm– greedy algorithm: spine

• experiments

6

Social Network

users – nodesB follows A – arc A→B

A B

7

Propagation of Actions

independent cascade modelpropagation of an action unfolds in timesteps

BA

t t+1

p(A,B)

users perform actionsactions propagate

great movie

I liked this movie influence probability

8

Propagation of Actions

icm generates propagationssequence of activations

likelihood

active

not active

t-1 t t+1

action αp(A,B)

CA

B D

E

9

Estimating Influence Probabilities

social network+

set of propagations

p(A,B)

max likelihood

EM – [Saito et.al.]

active

not active

t-1 t t+1

action αp(A,B)

CA

B D

E

10

Outline



• experiments

11

Sparsification

social network

p(A,B)

set of propagations

k arcsmost likely to

explain all propagations

B

A

p(A,B)

12

Sparsification

k arcs

A

B

p(A,B)

social network

p(A,B)

set of propagations

most likely to explain all

propagations

13

Sparsification

not the k arcs with largest probabilities

NP-hard and inapproximabledifficult to find solution with non-zero likelihood

14

How to Solve?

brute-force approachtry all subsets of k arcs?

no

break down into smaller problemscombine solutions

15

Optimal Algorithmsparsify separately incoming arcs of individual nodes

optimize corresponding likelihood

A B C

kA kB kC+ + = k

dynamic programmingoptimal solution

however…

16

Spinesparsification of influence networks

greedy algorithmefficient, good results

two phases

phase 1try to obtain a non-zero-likelihood solution

k0 < k arcs

phase 2build on top of phase 1

17

Spine – Phase 1phase 1

obtain a non-zero-likelihood solutionselect greedily arcs that participate in most propagations

until all propagations are explained

A

B

C

social network

D

A

B

CD

A

B

CD

t t+1

action α

action β

18

Spine – Phase 2add one arc at a time, the one that offers

largest increase in likelihoodlo

gL

# arcsk0 k

submodular

approximation guaranteefor phase 2

19

Outline



• experiments

20

Experiments

datasets

meme.yahoo.comactions: postings (photos), nodes: users, arcs: who follows whom

data from 2010

memetracker.orgactions: mentions of a phrase, nodes: blogs & news sources,

arcs: who links to whomdata from 2009

21

sampled datasets of different sizes

Experiments

Dataset Actions Arcs Arcs, prob > 0

YMeme-L 26k 1.25M 430k

YMeme-M 13k 1.15M 380k

YMeme-S 5k 466k 73k

MTrack-L 9k 200k 7.8k

MTrack-M 120 110k 1.4k

MTrack-S 780 78k 768

YMeme meme.yahoo.comMTrack memetracker.org

22

Experiments

algorithms

optimal (very inefficient)

spine(a few seconds to 3.5hrs)

by arc probabilityrandom

23

Experiments

24

Model Selection using BICBIC(k) = -2logL + klogN

25

Application

spine as a preprocessing step

influence maximizationselect k nodes to maximize spread of action

[Kempe, Kleinberg, Tardos, 03]NP-hard, greedy approximation

perform on sparsified network insteadlarge benefit in efficiency, little loss in quality

26

Application

27

Public Code and Data

http://www.cs.toronto.edu/~mathiou/spine/

28

The End

Questions?

29

Documents

Sparsification of Influence Networks