Modeling the Spread of Influence on the Blogosphere
Akshay Java, Pranam Kolari, Tim Finin, and Tim Oates
UMBC Tech Report, 04/12/06

Outline

What is influence?
Basic Influence Model
Influence models for the blogosphere
Results
Conclusions

What is Influence?

Main Entry: in·flu·ence
Pronunciation: 'in-"flü-&n(t)s, esp Southern in-'
Function: noun
Etymology: Middle English, from Middle French, from Medieval Latin influentia, from Latin influent-, influens, present participle of influere to flow in, from in- + fluere to flow -- more at FLUID
1 a : an ethereal fluid held to flow from the stars and to affect the actions of humans b : an emanation of occult power held to derive from stars
2 : an emanation of spiritual or moral force
3 a : the act or power of producing an effect without apparent exertion of force or direct exercise of command b : corrupt interference with authority for personal gain
4 : the power or capacity of causing an effect in indirect or intangible ways : SWAY
5 : one that exerts influence
- under the influence : affected by alcohol : DRUNK <was arrested for driving under the influence>

NOT This Kind of Influence! ;-)

Motivation

Influence models studied for cocitation graphs:
David Kempe, Jon Kleinberg, Eva Tardos, "Maximizing the Spread of Influence through a Social Network", KDD 2003

Applies to blogs also. Recent examples: startups, Microsoft Origami, Walmart, DoD

GOAL: Predict influential blogs

Target nodes to help achieve a “Tipping Point”*

* The Tipping Point: Malcolm Gladwell

Influence on the Blogosphere

Post was influenced by NPR, eWeek

Influence Models for the Blogosphere

[Figure: a five-node Blog Graph (nodes 1-5) next to the corresponding Influence Graph, whose edges carry weights such as 2/5, 1/5, 1/3, 1/2, and 1]

$W_{u,v} = C_{u,v} / d_v$

U links to V => U is influenced by V
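A minimal sketch of how such edge weights could be computed from raw blog links. The slide does not define the symbols, so the reading used here is an assumption: the influence edge runs from the linked-to blog u to the linking blog v, C_{u,v} is the number of links that v makes to u, and d_v is the total number of links that v makes. Under that reading, the weights arriving at each influenced blog sum to 1. The function name `influence_weights` and the sample blog names are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def influence_weights(blog_links):
    """Turn blog links into weighted influence edges.

    blog_links: iterable of (source, target) pairs, meaning that the
    source blog links to the target blog, so the source is influenced
    by the target.
    Returns a dict keyed by (influencer, influenced) holding W = C / d.
    ASSUMPTION: C is the link count for the pair and d is the total
    number of links made by the influenced blog.
    """
    # Reverse every blog link so the edge points from influencer to influenced.
    pair_counts = Counter((target, source) for source, target in blog_links)

    # Total number of links counted for each influenced blog.
    totals = Counter()
    for (_influencer, influenced), count in pair_counts.items():
        totals[influenced] += count

    return {
        (influencer, influenced): Fraction(count, totals[influenced])
        for (influencer, influenced), count in pair_counts.items()
    }


links = [("a", "b"), ("a", "b"), ("a", "c"), ("a", "c"), ("a", "d")]
weights = influence_weights(links)
```

Here blog "a" makes five links in total, two of them to "b", so the influence edge from "b" to "a" gets weight 2/5.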

Basic Influence Models

Linear Threshold Model

$\sum_{w} b_{v,w} \geq \theta_v$, where w ranges over the active neighbors of v
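The activation rule can be sketched as a small simulation: a node v becomes active once the summed weights b_{v,w} of its active neighbors w reach its threshold θ_v, and the process repeats until nothing changes. The function name `linear_threshold`, the dict layout, and the sample values are assumptions made for illustration. The thresholds are passed in explicitly here; in Kempe et al. they are drawn uniformly at random from [0, 1].

```python
def linear_threshold(b, thresholds, seeds):
    """Run the linear threshold process until no more nodes activate.

    b: dict keyed by (v, w) giving the weight that neighbor w exerts on v.
    thresholds: dict mapping each non-seed node v to its threshold.
    seeds: nodes that are active at the start.
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        # Sum the weights that currently active neighbors put on inactive nodes.
        pressure = {}
        for (v, w), weight in b.items():
            if w in active and v not in active:
                pressure[v] = pressure.get(v, 0.0) + weight
        # Activate every node whose accumulated weight reaches its threshold.
        for v, total in pressure.items():
            if total >= thresholds[v]:
                active.add(v)
                changed = True
    return active


b = {("x", "s"): 0.75, ("y", "s"): 0.25, ("y", "x"): 0.25}
result = linear_threshold(b, {"x": 0.5, "y": 0.5}, seeds={"s"})
```

In this example "x" activates in the first round, and only then does "y" collect enough weight (0.25 from "s" plus 0.25 from "x") to reach its threshold.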

Cascade Model

$P_{v,w}$: probability with which a node can activate each of its neighbors, independent of history.
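A minimal sketch of the cascade process, assuming each newly activated node gets exactly one chance to activate each inactive neighbor, with every attempt decided by an independent random draw. The function name `independent_cascade`, the dict layout, and the sample probabilities are hypothetical.

```python
import random


def independent_cascade(p, seeds, rng):
    """Simulate one run of the cascade model.

    p: dict keyed by (v, w) giving the probability that v activates w.
    seeds: nodes that are active at the start.
    rng: a random.Random instance, passed in so runs can be reproduced.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for v in frontier:
            for (src, w), prob in p.items():
                # A single attempt per (v, w) pair, independent of earlier attempts.
                if src == v and w not in active and rng.random() < prob:
                    active.add(w)
                    next_frontier.append(w)
        # Only nodes activated in this round get to try in the next one.
        frontier = next_frontier
    return active


p = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "d"): 0.0}
reached = independent_cascade(p, seeds=["a"], rng=random.Random(0))
```

A single run gives one random outcome; estimating the expected spread of a seed set would mean averaging the size of the result over many runs.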

[Figure: the Influence Graph (nodes 1-5, same edge weights) illustrating the linear threshold model, with nodes marked Active or Inactive and a threshold θv]

Node Selection Heuristics

Inlinks: easily spammed

Centrality: expensive to compute for very large graphs

PageRank: requires link information; however, it is easy to compute

Greedy heuristic: computationally expensive; however, it performs better
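The greedy heuristic can be sketched as hill climbing: at each step, add the node that raises the spread of the selected set the most. This is an illustrative sketch, not the authors' implementation. The names `greedy_select` and `reach` are hypothetical, and the spread function used here is plain reachability (every edge always fires), standing in for a simulated influence model so the example stays deterministic.

```python
def greedy_select(nodes, spread, k):
    """Pick up to k seed nodes by greedy hill climbing.

    nodes: list of candidate nodes.
    spread: function mapping a list of seeds to a spread score.
    """
    chosen = []
    for _ in range(min(k, len(nodes))):
        # Re-evaluate the spread for every remaining candidate; this
        # repeated evaluation is what makes the heuristic expensive.
        best = max(
            (n for n in nodes if n not in chosen),
            key=lambda n: spread(chosen + [n]),
        )
        chosen.append(best)
    return chosen


def reach(edges, seeds):
    """Toy spread score: number of nodes reachable from the seeds."""
    active = set(seeds)
    stack = list(seeds)
    while stack:
        v = stack.pop()
        for src, dst in edges:
            if src == v and dst not in active:
                active.add(dst)
                stack.append(dst)
    return len(active)


edges = [("a", "b"), ("a", "c"), ("d", "e"), ("b", "c")]
nodes = ["a", "b", "c", "d", "e"]
picked = greedy_select(nodes, lambda s: reach(edges, s), k=2)
```

The first pick is "a" because it reaches three nodes. The second is "d", which adds two nodes not already covered, whereas "b" and "c" add nothing new.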

Effect of Splogs on Node Selection (indegree vs. PageRank)

Almost 54% of the links were from splogs/failed to splogs/failed!

Effect of Splogs on Inlinks

rank  URL  #inlinks
1  http://www.livejournal.com/users/pics  3072
2  http://www.boingboing.net  2191
3  http://www.dailykos.com  2017
4  http://www.engadget.com  1942
5  http://profiles.blogdrive.com  1526
6  http://michellemalkin.com  1242
7  http://www.opinionjournal.com  1232
8  http://instapundit.com  1187
9  http://slashdot.org  1124
10  http://www.powerlineblog.com  909
11  http://www.huffingtonpost.com/theblog  905
12  http://corner.nationalreview.com  853
13  http://www.talkingpointsmemo.com  733
14  http://www.captainsquartersblog.com/mt  728
15  http://espn-presents2003-world-seriesofpoker.blogspot.com  711
16  http://3-world-series-of-poker-online-3.blogspot.com  711
17  http://worldseries-of-poker-network-tv-show.blogspot.com  711
18  http://wsop2003.blogspot.com  711
19  http://wsop-bracelet1.blogspot.com  711
20  http://worldseries-poker.blogspot.com  711
21  http://worldseries-of-poker-official.blogspot.com  711
22  http://worldseries-of-poker-wsop.blogspot.com  711
23  http://world-series-of-poker-nocd-patch66.blogspot.com  711
24  http://4-world-series-of-poker-past-winners.blogspot.com  711
25  http://7-wsop-games-7.blogspot.com  711

Tightly knit community of splogs

Influence Models (without splog detection)

[Plot; axis label: number of nodes selected]

Influence Models (after splog removal)

Influence Models (w.r.t. Technorati ranks)

Conclusions

Influence models can be applied to blogs, not just cocitation graphs

Splogs are a problem

Greedy heuristics work well; PageRank is an inexpensive approximation

Ideas for CIKM 06

Good or bad influence? Associating sentiment with links.

Finding influential blogs for a topic. (SVM accuracy 75-85%)

Community structure of blogs.

Questions? Comments/Feedback?

Thanks!

Acknowledgement: Buzzmetrics/Blogpulse for the dataset.