CS 322: (Social and Information) Network Analysis Jure Leskovec Stanford University

Source: snap.stanford.edu/class/cs322-2009/11-viral-annot.pdf · 10/27/2009 · Jure Leskovec, Stanford CS322: Network Analysis


• Probabilistic models of network diffusion
• How cascades spread in real life:
  • Viral marketing
  • Blogs
  • Group membership
• Last 20 minutes: mid-term course evaluation

How do viruses/rumors propagate?

Will a flu-like virus linger, or will it become extinct?

• (Virus) birth rate β: probability that an infected neighbor attacks
• (Virus) death rate δ: probability that an infected node heals

[Figure: nodes N, N1, N2 and N3, some healthy and some infected; attacks along edges are labeled Prob. β and healing is labeled Prob. δ.]

General scheme for epidemic models:

• S: susceptible
• E: exposed
• I: infected
• R: recovered
• Z: immune

Assuming perfect mixing, i.e., the network is a complete graph.

The model dynamics:

[Figure: number of nodes vs. time for the Susceptible, Infected and Recovered populations.]
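The equations for this slide did not survive in the text, so the following is only a sketch under stated assumptions: it uses the standard textbook SIR form (dS/dt = -βSI, dI/dt = βSI - δI, dR/dt = δI), treats S, I and R as population fractions, and integrates with a plain Euler step. The function name, step size and parameter values are illustrative, not taken from the lecture.

```python
def simulate_sir(beta, delta, s0, i0, r0=0.0, dt=0.01, steps=10_000):
    """Euler integration of the textbook SIR model under perfect mixing.

    Assumed dynamics (not recovered from the slide):
        dS/dt = -beta * S * I
        dI/dt =  beta * S * I - delta * I
        dR/dt =  delta * I
    Returns a list of (S, I, R) tuples, one per time step.
    """
    s, i, r = s0, i0, r0
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i * dt   # S -> I flow in this step
        new_recoveries = delta * i * dt      # I -> R flow in this step
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical parameters: 1% of the population initially infected.
history = simulate_sir(beta=0.5, delta=0.1, s0=0.99, i0=0.01)
```

Plotting the three components of `history` against the step index should give curves of the same general shape as the slide's plot: S falls, R rises, and I peaks and then decays.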

Susceptible-Infective-Susceptible (SIS) model

• Cured nodes immediately become susceptible
• Virus “strength”: s = β / δ

[Figure: two states, Susceptible and Infective. Susceptible → Infective: infected by neighbor with prob. β. Infective → Susceptible: cured internally with prob. δ.]

Assuming perfect mixing (complete graph):

$$\frac{dS}{dt} = -\beta S I + \delta I$$

$$\frac{dI}{dt} = \beta S I - \delta I$$

[Figure: number of nodes vs. time for the Susceptible and Infected populations.]
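As a rough illustration of the SIS dynamics under perfect mixing, the sketch below integrates dI/dt = βSI - δI with S = 1 - I. It assumes S and I are population fractions, so β here is a population-level rate rather than a per-edge probability. Under that assumption the infected fraction settles at 1 - δ/β when β > δ and decays to zero otherwise. Function names and parameter values are hypothetical.

```python
def simulate_sis(beta, delta, i0, dt=0.01, steps=10_000):
    """Euler integration of dI/dt = beta*S*I - delta*I with S = 1 - I.

    Assumes S and I are fractions of a perfectly mixed population.
    Returns the trajectory of the infected fraction.
    """
    i = i0
    trajectory = [i]
    for _ in range(steps):
        s = 1.0 - i
        i += (beta * s * i - delta * i) * dt
        trajectory.append(i)
    return trajectory


def endemic_fraction(beta, delta):
    """Fixed point of the equation above: 1 - delta/beta, or 0 if the virus dies out."""
    return max(0.0, 1.0 - delta / beta)


# Strong virus (beta > delta): the infection persists.
persisting = simulate_sis(beta=0.5, delta=0.1, i0=0.01)[-1]
# Weak virus (beta < delta): the infection dies out.
dying = simulate_sis(beta=0.1, delta=0.5, i0=0.01)[-1]
```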

The epidemic threshold of a graph is a value t such that:

• If strength s = β / δ < t, an epidemic cannot happen (it eventually dies out)

Given a graph, compute its epidemic threshold.

What should t depend on?

• avg. degree?
• and/or highest degree?
• and/or variance of degree?
• and/or third moment of degree?
• and/or diameter?

[Wang et al. 2003]

We have no epidemic if:

$$\beta / \delta < \tau = 1 / \lambda_{1,A}$$

where β is the (virus) birth rate, δ is the (virus) death rate, τ is the epidemic threshold, and λ1,A is the largest eigenvalue of the adjacency matrix A.

► λ1,A alone captures the property of the graph!
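The condition β/δ < 1/λ1,A can be checked numerically for a small graph. The sketch below is a minimal pure-Python illustration, assuming an undirected graph given as a symmetric 0/1 adjacency matrix. It estimates λ1,A by power iteration on A + I; the shift is a design choice made here so that bipartite graphs also converge. All function names are hypothetical.

```python
import math


def largest_eigenvalue(adj, iters=1000, tol=1e-12):
    """Estimate the largest eigenvalue of a symmetric 0/1 adjacency matrix.

    Power iteration is run on A + I (all eigenvalues shifted by +1) and the
    shift is subtracted from the result.
    """
    n = len(adj)
    if n == 0:
        return 0.0
    x = [1.0 / math.sqrt(n)] * n  # unit-norm all-ones start vector
    estimate = 0.0
    for _ in range(iters):
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
        converged = abs(norm - estimate) < tol
        estimate = norm  # ||(A + I) x|| approaches lambda_1 + 1
        if converged:
            break
    return estimate - 1.0


def epidemic_threshold(adj):
    """tau = 1 / lambda_1; infinite for a graph with no edges."""
    lam = largest_eigenvalue(adj)
    return math.inf if lam <= 1e-12 else 1.0 / lam


def dies_out(beta, delta, adj):
    """True if the virus strength beta/delta is below the threshold."""
    return beta / delta < epidemic_threshold(adj)


# Complete graph on 4 nodes: lambda_1 = 3, so tau = 1/3.
K4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
tau = epidemic_threshold(K4)
```

For large sparse graphs one would normally use a sparse eigensolver instead of dense power iteration; the version above is only meant to make the formula concrete.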

[Wang et al. 2003]

[Figure: number of infected nodes (0 to 500) vs. time (0 to 1000) on the Oregon graph (10,900 nodes and 31,180 edges) with β = 0.001. Three curves for δ = 0.05, 0.06 and 0.07, which correspond respectively to β/δ > τ (above threshold), β/δ = τ (at the threshold) and β/δ < τ (below threshold).]

Does it matter how many people are initially infected?

[Leskovec et al., SDM ’07]

Bloggers write posts and refer (link) to other posts, and the information propagates.

[Figure: blogs and their posts laid out over time; time-ordered hyperlinks between posts form an information cascade.]

Data – Blogs:
• We crawled 45,000 blogs for 1 year
• 10 million posts and 350,000 cascades

[Leskovec et al., TWEB ’07]

Senders and followers of recommendations receive discounts on products.

[Figure: recommendation with labels “10% credit” and “10% off”.]

Data – Incentivized Viral Marketing program:
• 16 million recommendations
• 4 million people
• 500,000 products

[Backstrom et al., KDD ’06]

Use social networks where people belong to explicitly defined groups. Each group defines a behavior that diffuses.

Data – LiveJournal:
• On-line blogging community with friendship links and user-defined groups
• Over a million users update content each month
• Over 250,000 groups to join

Prob. of adoption depends on the number of friends who have adopted [Bass ’69, Granovetter ’78]

• What is the shape?
• Distinction has consequences for models and algorithms

[Figure: two sketches of prob. of adoption vs. k = number of friends adopting. Left: diminishing returns? Right: critical mass?]
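One simple way to look at the shape of this curve empirically is to bucket observations by k and compute the fraction that adopted in each bucket. The sketch below assumes the data is available as (k, adopted) pairs; that input format and the function name are hypothetical, and the toy observations are made up purely to show the call.

```python
from collections import defaultdict


def adoption_curve(observations):
    """Estimate P(adopt | k friends adopted) from (k, adopted) pairs.

    Returns a dict mapping each observed k to the fraction of
    observations with that k in which the user adopted.
    """
    totals = defaultdict(int)
    adopters = defaultdict(int)
    for k, adopted in observations:
        totals[k] += 1
        if adopted:
            adopters[k] += 1
    return {k: adopters[k] / totals[k] for k in sorted(totals)}


# Toy, made-up observations (not data from the lecture).
toy = [(1, False), (1, True), (2, True), (2, True), (3, False)]
curve = adoption_curve(toy)
```

A curve that rises quickly and then flattens would suggest diminishing returns; one that stays near zero until some k and then jumps would suggest critical mass.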

[Leskovec et al., TWEB ’07]

[Figure: DVD recommendations (8.2 million observations). Probability of purchasing (0 to 0.1) vs. # recommendations received (0 to 40).]

[Backstrom et al., KDD ’06]

LiveJournal community membership

[Figure: prob. of joining vs. k (number of friends in the community).]

[Kossinets-Watts ’06]

Sending email:
• Email network of a large university
• Prob. of a link as a function of # of common friends

[Figure: prob. of email vs. k (number of common friends).]

For viral marketing:
• We see node v receive the i-th recommendation and then purchase the product

For communities:
• At time t we see the behavior of node v’s friends

Questions:
• When did v become aware of recommendations or friends’ behavior?
• When did it translate into a decision by v to act?
• How long after this decision did v act?

Dependence on number of friends

Consider: connectedness of friends
• x and y have three friends in the group
• x’s friends are independent
• y’s friends are all connected

Who is more likely to join?

[Figure: nodes x and y with their friends in the group.]

Competing sociological theories
• Information argument [Granovetter ’73]
• Social capital argument [Coleman ’88]

Information argument:
• Unconnected friends give independent support

Social capital argument:
• Safety/trust advantage in having friends who know each other

[Figure: nodes x and y with their friends in the group.]

[Backstrom et al., KDD ’06]

LiveJournal: 1 million users, 250,000 groups

Social capital argument wins! Prob. of joining increases with the number of adjacent members.
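To make the x versus y comparison concrete, the sketch below counts how many pairs of a node's friends inside the group are themselves connected. Treating this pair count as the measure of "adjacent members" is an assumption made for illustration, as are the graph representation (a dict of neighbor sets) and the function name.

```python
from itertools import combinations


def adjacent_friend_pairs(graph, node, group):
    """Count pairs of `node`'s friends in `group` that are linked to each other.

    `graph` maps each node to the set of its neighbors (undirected).
    """
    friends_in_group = [f for f in graph[node] if f in group]
    return sum(1 for a, b in combinations(friends_in_group, 2) if b in graph[a])


# x has three independent friends in the group; y's three friends all know each other.
graph = {
    "x": {"a", "b", "c"},
    "a": {"x"}, "b": {"x"}, "c": {"x"},
    "y": {"d", "e", "f"},
    "d": {"y", "e", "f"}, "e": {"y", "d", "f"}, "f": {"y", "d", "e"},
}
group = {"a", "b", "c", "d", "e", "f"}
x_pairs = adjacent_friend_pairs(graph, "x", group)
y_pairs = adjacent_friend_pairs(graph, "y", group)
```

Both x and y have k = 3 friends in the group, so a model based only on k cannot tell them apart; the pair count can.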

Large anonymous online retailer (June 2001 to May 2003)

• 15,646,121 recommendations
• 3,943,084 distinct customers
• 548,523 products recommended
• Products belonging to 4 product groups: books, DVDs, music, VHS

[Figure: example recommendation network with time-stamped edges t1, t2, t3, t4, where t1 < t2 < … < tn. Legend: bought and received a discount; bought but didn’t receive a discount; received a recommendation but didn’t buy.]

• There are relatively few DVD titles, but DVDs account for ~50% of recommendations.
• Recommendations per person: DVD: 10; books and music: 2; VHS: 1
• Recommendations per purchase: books: 69; DVDs: 108; music: 136; VHS: 203
• Overall there are 3.69 recommendations per node on 3.85 different products.
• Music recommendations reached about the same number of people as DVDs but used only 1/5 as many recommendations.
• Book recommendations reached by far the most people: 2.8 million.
• All networks have a very small number of unique edges. For books, videos and music the number of unique edges is smaller than the number of nodes: the networks are highly disconnected.

What role does the product category play?

Category   Products   Customers   Recommendations   Edges       Buy + get discount   Buy + no discount
Book       103,161    2,863,977   5,741,611         2,097,809   65,344               17,769
DVD        19,829     805,285     8,180,393         962,341     17,232               58,189
Music      393,598    794,148     1,443,847         585,738     7,837                2,739
Video      26,131     239,583     280,270           160,683     909                  467
Full       542,719    3,943,084   15,646,121        3,153,676   91,322               79,164
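The recommendations-per-purchase figures quoted earlier (books: 69, DVDs: 108, music: 136, VHS: 203) appear to follow from this table if a purchase is counted as either "buy" column. That reading is an assumption, not something the slides state; the sketch below simply does the arithmetic with the table's numbers.

```python
# Rows copied from the table: (recommendations, buy + get discount, buy + no discount)
TABLE = {
    "Book": (5_741_611, 65_344, 17_769),
    "DVD": (8_180_393, 17_232, 58_189),
    "Music": (1_443_847, 7_837, 2_739),
    "Video": (280_270, 909, 467),
}


def recommendations_per_purchase(recommendations, buy_discount, buy_no_discount):
    """Recommendations divided by all purchases (with or without discount)."""
    return recommendations / (buy_discount + buy_no_discount)


ratios = {name: recommendations_per_purchase(*row) for name, row in TABLE.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```

The whole-number parts of the four ratios match the quoted figures, which suggests the slide truncated rather than rounded them.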

Some products are easier to recommend than others

Product category   Number of buy bits   Forward recommendations   Percent
Book               65,391               15,769                    24.2
DVD                16,459               7,336                     44.6
Music              7,843                1,824                     23.3
Video              909                  250                       27.6
Total              90,602               25,179                    27.8

Does sending more recommendations influence more purchases?

[Figure: number of purchases (0 to 7) vs. outgoing recommendations (20 to 140).]

What is the effectiveness of subsequent recommendations?

[Figure: probability of buying (0.02 to 0.07) vs. exchanged recommendations (5 to 40).]

Consider successful recommendations in terms of:
• av. # senders of recommendations per book category
• av. # of recommendations accepted

Books overall have a 3% success rate (2% with discount, 1% without).

Lower than average success rate (significant at p=0.01 level):
• fiction: romance (1.78), horror (1.81), teen (1.94), children’s books (2.06), comics (2.30), sci-fi (2.34), mystery and thrillers (2.40)
• nonfiction: sports (2.26), home & garden (2.26), travel (2.39)

Higher than average success rate (statistically significant):
• professional & technical: medicine (5.68), professional & technical (4.54), engineering (4.10), science (3.90), computers & internet (3.61), law (3.66), business & investing (3.62)

• 47,000 customers are responsible for 2.5 out of the 16 million recommendations in the system
• 29% success rate per recommender of an anime DVD
• Giant component covers 19% of the nodes
• Overall, recommendations for DVDs are more likely to result in a purchase (7%), but the anime community stands out

Variable            Transformation   Coefficient
const                                -0.940 ***
# recommendations   ln(r)             0.426 ***
# senders           ln(ns)           -0.782 ***
# recipients        ln(nr)           -1.307 ***
product price       ln(p)             0.128 ***
# reviews           ln(v)            -0.011 ***
avg. rating         ln(t)            -0.027 *
R²                                    0.74

Significance at the 0.01 (***), 0.05 (**) and 0.1 (*) levels.
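Because every predictor enters through a logarithm, the fitted model is linear in the logs: response = const + Σ coefficient × ln(variable). The sketch below evaluates that linear predictor. The slide does not name the response variable, so the function only returns "the predicted (log-scale) response", whatever it is. The coefficient for ln(nr) is read as -1.307, and the function and dictionary names are hypothetical.

```python
import math

# Coefficients as listed in the regression table; keys follow the table's symbols.
COEFFICIENTS = {
    "const": -0.940,
    "r": 0.426,    # number of recommendations
    "ns": -0.782,  # number of senders
    "nr": -1.307,  # number of recipients
    "p": 0.128,    # product price
    "v": -0.011,   # number of reviews
    "t": -0.027,   # average rating
}


def predicted_response(r, ns, nr, p, v, t):
    """Evaluate const + sum(coef * ln(x)) for the regression table's variables.

    The response variable is not named on the slide, so no unit is implied.
    All inputs must be positive because of the log transform.
    """
    c = COEFFICIENTS
    return (
        c["const"]
        + c["r"] * math.log(r)
        + c["ns"] * math.log(ns)
        + c["nr"] * math.log(nr)
        + c["p"] * math.log(p)
        + c["v"] * math.log(v)
        + c["t"] * math.log(t)
    )


# Doubling r with everything else fixed shifts the prediction by 0.426 * ln(2).
shift = predicted_response(2, 1, 1, 1, 1, 1) - predicted_response(1, 1, 1, 1, 1, 1)
```

In a log-log model of this kind each coefficient acts as an elasticity: a positive one (recommendations, price) raises the response as the variable grows, a negative one (senders, recipients, reviews, rating) lowers it.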

• 94% of users make their first recommendation without having received one previously
• Size of giant connected component increases from 1% to 2.5% of the network (100,420 users): small!
• Some sub-communities are better connected:
  • 24% out of 18,000 users for westerns on DVD
  • 26% of 25,000 for classics on DVD
  • 19% of 47,000 for anime (Japanese animated film) on DVD
• Others are just as disconnected:
  • 3% of 180,000 home and gardening
  • 2-7% for children’s and fitness DVDs

Products suited for viral marketing:
• small and tightly knit community
  • few reviews, senders, and recipients
  • but sending more recommendations helps
• pricey products
• rating doesn’t play as much of a role

Observations for future diffusion models:
• purchase decision more complex than threshold or simple infection
• influence saturates as the number of contacts expands
• links lose effectiveness if they are overused

Conditions for successful recommendations:
• professional and organizational contexts
• discounts on expensive items
• small, tightly knit communities

How big are cascades? What are the building blocks of cascades?

[Figure: example cascades for a medical guide book and a DVD; the labels 973 and 938 appear in the figure.]

Given a (social) network, a process spreading over the network creates a graph (a tree).

[Figure: a social network and the cascade (propagation graph) traced on it.]

Let’s count cascades.

• [subgraph] is the most common cascade subgraph
  • It accounts for ~75% of cascades in books, CD and VHS, but only 12% of DVD cascades
• [subgraph] is 6 (1.2 for DVD) times more frequent than [subgraph]
• For DVDs, [subgraph] is more frequent than [subgraph]
• Chains ([subgraph]) are more frequent than [subgraph]
• [subgraph] is more frequent than a collision ([subgraph]) (but a collision has fewer edges)
• Late split ([subgraph]) is more frequent than [subgraph]

• Stars (“no propagation”)
• Bipartite cores (“common friends”)
  • Nodes having the same friends

[Figure: books. Count vs. x = cascade size (number of nodes) on log-log axes, with fit count = 1.8e6 · x^(-4.98). Annotations: steep drop-off; very few large cascades.]
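A fit of the form count = C · x^a is a straight line on log-log axes, so a quick way to estimate the exponent is ordinary least squares on (ln x, ln count). The sketch below does that in pure Python. Least squares on binned log-log data is a simplification chosen for illustration (maximum-likelihood estimators are generally more reliable for power laws), and the slides do not say which method was used. The synthetic data is generated from the fitted curve itself, not taken from the lecture.

```python
import math


def fit_power_law(sizes, counts):
    """Least-squares fit of ln(count) = ln(C) + a * ln(x).

    Returns (C, a). All sizes and counts must be positive.
    """
    xs = [math.log(x) for x in sizes]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), slope


# Synthetic counts generated from the slide's fitted curve for books.
sizes = list(range(1, 11))
counts = [1.8e6 * x ** -4.98 for x in sizes]
C, exponent = fit_power_law(sizes, counts)
```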

DVD cascades can grow large
• Possibly as a result of websites where people sign up to exchange recommendations

[Figure: DVDs. Count vs. x = cascade size (number of nodes) on log-log axes, with fit ~ x^(-1.56). Annotations: shallow drop-off, fat tail; a number of large cascades.]

The probability of observing a cascade on x nodes follows:

$$p(x) \sim x^{-2}$$

[Figure: count vs. x = cascade size (number of nodes).]

Cascade sizes follow a heavy-tailed distribution
• Viral marketing:
  • Books: steep drop-off, power-law exponent -5
  • DVDs: larger cascades, exponent -1.5
• Blogs:
  • Power-law exponent -2

However, it is not a simple branching process
• A simple branching process (on a k-ary tree):
  • Every node infects each of k of its neighbors with prob. p
  • gives exponential cascade size distribution

Questions:
• What role does the underlying social network play?
• Can we make a step towards a more realistic cascade generation (propagation) model?
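The simple branching process described above can be simulated directly: start from one infected node, let each infected node infect each of its k neighbors independently with probability p, and record the total number of infected nodes. The sketch below is one way to do that. The function name, the `max_size` cap (added so that supercritical runs terminate) and the parameter values are assumptions for illustration.

```python
import random


def branching_cascade_size(k, p, rng, max_size=10_000):
    """Total size of one cascade of a simple branching process on a k-ary tree.

    Each infected node infects each of its k children with probability p.
    The loop stops once the frontier is empty or `max_size` is reached.
    """
    size = 1       # the root
    frontier = 1   # nodes infected in the previous generation
    while frontier and size < max_size:
        children = sum(1 for _ in range(frontier * k) if rng.random() < p)
        size += children
        frontier = children
    return size


# Hypothetical parameters: subcritical process (k * p < 1), 10,000 cascades.
rng = random.Random(0)
sizes = [branching_cascade_size(k=3, p=0.2, rng=rng) for _ in range(10_000)]
```

A histogram of `sizes` can then be compared with the power-law fits from the data; the slides' point is that the two shapes differ.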

1) Randomly pick a blog to infect, add it to the cascade.
2) Infect each in-linked neighbor with probability β.
3) Add infected neighbors to the cascade.
4) Set the node infected in (1) to uninfected.

[Figure: four panels showing steps 1 to 4 on a small graph of blogs B1, B2, B3 and B4.]
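The four steps above can be sketched as a short simulation. This is a minimal illustration rather than the authors' implementation: it assumes the blog graph is given as a dict mapping each blog to the blogs that link to it (its in-linked neighbors), repeats steps 2 to 4 for every newly infected blog, and lets each blog join a cascade at most once so that the loop always terminates. The function name and the optional `start` argument are hypothetical.

```python
import random


def generate_cascade(in_links, beta, rng, start=None):
    """Generate one cascade on a blog graph.

    in_links[b] lists the blogs that link to b. Returns (nodes, edges),
    where each edge (u, v) means u was infected through its link to v.
    """
    # Step 1: pick a random blog to infect and add it to the cascade.
    if start is None:
        start = rng.choice(sorted(in_links))
    nodes = {start}
    edges = []
    infected = [start]
    while infected:
        # Step 4: the blog leaves the infected state once it has had its turn.
        blog = infected.pop()
        for neighbor in in_links.get(blog, ()):
            # Step 2: infect each in-linked neighbor with probability beta.
            # Assumption: a blog already in the cascade is not infected again.
            if neighbor not in nodes and rng.random() < beta:
                # Step 3: add the infected neighbor to the cascade.
                nodes.add(neighbor)
                edges.append((neighbor, blog))
                infected.append(neighbor)
    return nodes, edges


# Toy graph: B2 and B3 link to B1, B4 links to B2.
in_links = {"B1": ["B2", "B3"], "B2": ["B4"], "B3": [], "B4": []}
nodes, edges = generate_cascade(in_links, beta=0.025, rng=random.Random(1))
```

Running this many times and tallying cascade sizes and shapes gives the kind of statistics that can be compared against observed cascades.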

Generative model produces realistic cascades (β = 0.025)

[Figure: panels comparing the model with data. Count vs. cascade size; count vs. cascade node in-degree; most frequent cascades; count vs. size of star cascade; count vs. size of chain cascade.]

Blogs – information epidemics
• Which are the influential/infectious blogs?

Viral marketing
• Who are the trendsetters?
• Influential people?

Disease spreading
• Where to place monitoring stations to detect epidemics?