Debunking the Myths of Influence Maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf ·...

Debunking the Myths of InfluenceMaximization

Akhil Arora1, Sainyam Galhotra1, Sayan Ranu

sainyam@cs.umass.eduUniversity of Massachusetts, Amherst

January 27, 2017NEDB, 2017

1The first two authors have contributed equally to this work.Influence Maximization January 27, 2017 NEDB, 2017 1 / 19

Information Propagation2: Need for Modelling??

Many real-world processes can be interpreted using conceptsfrom information propagationFor example: Spread of Diseases

2Propagation/Flow/Spread/Diffusion, would be used interchangeablyInfluence Maximization January 27, 2017 NEDB, 2017 2 / 19

Need for Modelling??

Traffic Congestion and its propagation

Influence Maximization January 27, 2017 NEDB, 2017 3 / 19

Other Applications

Using the word-of-mouth effect for:

Viral Marketing: Product/Topic/Event promotionManaging Celebrity/Political campaigns

Detect and Prevent Outbreaks/Epidemics/RumoursMany more . . .

Other Applications

Using the word-of-mouth effect for:Viral Marketing: Product/Topic/Event promotionManaging Celebrity/Political campaigns

Other Applications

Using the word-of-mouth effect for:Viral Marketing: Product/Topic/Event promotionManaging Celebrity/Political campaigns

Existing Information Propagation models

Independent Cascade (IC) and Weighted Cascade (WC) ModelsLinear Threshold (LT) ModelOther models – Heat Diffusion etc.

0.5 0.2

Independent Cascade (IC) and Weighted Cascade (WC) ModelsLinear Threshold (LT) ModelOther models – Heat Diffusion etc.

0.5 0.2

The Influence Maximization (IM) Problem

Input: A graph G, an information-diffusion model IConstraints: The budget (k = |S|) defining the size of the seed-set

Task: Identify the set of most-influential nodes in a networkMaximize σ(S) = E[F(S)]: Expected number of nodes active at theend, if set S is targeted for initial activation

Tractability: The IM problem is NP-hard. Need for ApproximateSolutions!The spread function σ is Monotone and Submodular, thus, asimple GREEDY algorithm provides the best possible (1− 1/e)approximation

Input: A graph G, an information-diffusion model IConstraints: The budget (k = |S|) defining the size of the seed-setTask: Identify the set of most-influential nodes in a network

Maximize σ(S) = E[F(S)]: Expected number of nodes active at theend, if set S is targeted for initial activation

Input: A graph G, an information-diffusion model IConstraints: The budget (k = |S|) defining the size of the seed-setTask: Identify the set of most-influential nodes in a network

Maximize σ(S) = E[F(S)]: Expected number of nodes active at theend, if set S is targeted for initial activation

Need for benchmarking? Wide variety of techniques

MC Simulation1. Run MC Simulation from each node to estimate its spread.2. Exploit submodularity to prune out nodes with low spread

SamplingStore a DAG for a sample of nodes and use it to estimate influ-ence

Approximate ScoringEstimate the influence of the nodes using heuristics as exactcomputation is #P - hard

Need for benchmarking? : Ambiguities

Existing Literature: Use IC, WC interchangeablyActual scenario: Varied behaviour in terms of the spread of seednodes, efficiency and scalability aspects of different techniques.

0 100 200

Seeds (k)

Figure: IMM (ε = 0.5) for Orkut dataset

Need for benchmarking? : Ambiguities

State-of-the-art technique in one aspect behaves the worst inanother aspect of the problem.

0 100 200Seeds (k)

EaSyIMIMM

(a) Memory

0 100 200Seeds (k)

EaSyIMIMM

(b) Running Time

Important Questions

How to choose the most appropriate IM technique in a givenspecific scenario?

What does it really mean to claim to be the state-of-the-art?Are the claims made by the recent papers true?

Important Questions

How to choose the most appropriate IM technique in a givenspecific scenario?What does it really mean to claim to be the state-of-the-art?

Are the claims made by the recent papers true?

Important Questions

How to choose the most appropriate IM technique in a givenspecific scenario?What does it really mean to claim to be the state-of-the-art?Are the claims made by the recent papers true?

Our Framework

Generic framework applicable on all techniques.Unified approach to tune the external parameters.

Algorithms Propagation Models Datasets Configurations

orkSeed Selection Spread Computation

Monte-Carlo (MC)

Simulations

Quality

Efficiency Scalability

Influen

Convergence

Robustness

#Parameters

Convergence calculationSelect the parameters which provide the best quality withouthampering the efficiency and scalability of the technique.

Our Framework

Generic framework applicable on all techniques.Unified approach to tune the external parameters.

Algorithms Propagation Models Datasets Configurations

orkSeed Selection Spread Computation

Monte-Carlo (MC)

Simulations

Quality

Efficiency Scalability

Influen

Convergence

Robustness

#Parameters

Convergence calculationSelect the parameters which provide the best quality withouthampering the efficiency and scalability of the technique.

IMM is always faster than TIM+?

Model ε (TIM+) ε (IMM) Time (TIM+) Time (IMM) GainIC 0.05 0.05 8582.23 829.6 10.3xLT 0.35 0.1 0.79 1.2 0.65x

Table: Comparison of convergence parameter and running time (secs) for IMM and TIM+

over HepPH dataset for 200 seeds

IMM is always faster than TIM+?

Model ε (TIM+) ε (IMM) Time (TIM+) Time (IMM) GainIC 0.05 0.05 8582.23 829.6 10.3xLT 0.35 0.1 0.79 1.2 0.65x

Table: Comparison of convergence parameter and running time (secs) for IMM and TIM+

over HepPH dataset for 200 seeds

CELF++ is the fastest IM technique in the MC estimationparadigm?

1 2 3 4 5 6 7 8 9 101112

Independent Runs (WC)

CELFCELF++

(c) Nethept (WC)

1 2 3 4 5 6 7 8 9 101112

Independent Runs (LT)

CELFCELF++

(d) Nethept (LT)

CELF++ is the fastest IM technique in the MC estimationparadigm?

1 2 3 4 5 6 7 8 9 101112

Independent Runs (WC)

CELFCELF++

(e) Nethept (WC)

1 2 3 4 5 6 7 8 9 101112

Independent Runs (LT)

CELFCELF++

(f) Nethept (LT)

SIMPATH is faster the LDAG?

40 80 120 160 200

#Seeds (k)

LDAGSIMPATH

(g) Nethept

40 80 120 160 200

#Seeds (k)

LDAGSIMPATH

(h) DBLP

SIMPATH is faster the LDAG?

40 80 120 160 200

#Seeds (k)

LDAGSIMPATH

(i) Nethept

40 80 120 160 200

#Seeds (k)

LDAGSIMPATH

(j) DBLP

Conclusions

No technique is the best on all aspects of IM.

Quality

EfficiencyMemory Footprint

TIM/IMMPMC

CELF/CELF++EaSyIM

IRIEIMRank

StaticGreedy

LDAGSIMPATH

(k) Qualitative catego-rization of IM techniques

(l) Which technique to choose & when?

Thanks!

For more details, please refer :A. Arora, S. Galhotra, S. Ranu. Debunking the Myths of InfluenceMaximization : An In-Depth Benchmarking Study. SIGMOD 2017

Debunking the Myths of Influence Maximizationmitdbg.github.io/nedbday/2017/talks/galhotr.pdf ·...

Documents

Refutations to Refutations on Debunking the Myths of ...sayan/refutation.pdf · Akhil Arora, Sainyam Galhotra, Sayan Ranu Recently, groups in University of British Columbia (headed

Debunking Flu Season Myths

Debunking Pop Psychology's Greatest Myths

Debunking Some Social Media Myths

Debunking the Five Myths of Multibody Dynamics …...2019/11/18 · MSC Software: Debunking 5 Myths of MBD Whitepaper WHITEPAPER Debunking the Five Myths of Multibody Dynamics Simulation

Debunking Myths About Poverty

Debunking Dynamic Optimization Myths

Debunking Myths & Mysteries of Retained Search

HR hse Debunking the Myths

Debunking the Myths about Life Insurance

Debunking consumerism myths

Debunking social media myths

Debunking Myths - publichealth.lacounty.govpublichealth.lacounty.gov/tob/pdf/bh/Gary_Tedeschi_Kirsten_Hansen... · Debunking Myths Gary Tedeschi, PhD Kirsten Hansen, MPP . California

Windstream Webinar: Debunking Network Security Myths

Admissionado debunking myths

Debunking the Myths of Financial Aid

Debunking Diet Myths

Debunking adware myths

Debunking SEO Myths

Jala Pearson - Debunking ADHD Myths