University of California, Davis - I. Probability …mae.engr.ucdavis.edu/.../253-S16/Lectures/lecture15-2016.pdf– Lauren Ancel Meyers, M.E.J. Newmanb, Babak Pourbohlou, Journal of

ECS 253 / MAE 253, Lecture 15, May 17, 2016

I. Probability generating function recap

Part I. Ensemble approaches

• A. Master equations (Random graph evolution, cluster aggregation)

• B. Network configuration model

– Degree distribution, Pk

– Degree sequence (A realization, N specific values drawn from Pk)

• C. Generating functions. Converting a discrete math problem into a function.

$$G_0(x) = \sum_k P_k x^k$$

Moment generating functions

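• Base: $G_0(1) = \sum_k P_k = 1$ (it is the sum of probabilities).

• First moment: $\langle k \rangle = \sum_k k P_k = G_0'(1)$

(And note $x G_0'(x) = \sum_k k P_k x^k$)

• Second moment: $\langle k^2 \rangle = \sum_k k^2 P_k$

$$\frac{d}{dx}\left(x G_0'(x)\right) = \sum_k k^2 P_k x^{k-1}$$

So $\left.\frac{d}{dx}\left(x G_0'(x)\right)\right|_{x=1} = \sum_k k^2 P_k$

(And note $x \frac{d}{dx}\left(x G_0'(x)\right) = \sum_k k^2 P_k x^k$)

• The n-th moment:

$$\langle k^n \rangle = \sum_k k^n P_k = \left.\left(x \frac{d}{dx}\right)^n G_0(x)\right|_{x=1}$$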
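The moment formula can be checked numerically. The following is a minimal sketch, assuming a degree distribution with finite support stored as a coefficient list (the example values in `P` are made up for illustration). Applying $x\,d/dx$ to $\sum_k c_k x^k$ simply maps each coefficient $c_k$ to $k c_k$, and evaluating at $x = 1$ sums the coefficients.

```python
def x_ddx(coeffs):
    """Apply the operator x d/dx to sum_k c_k x^k, i.e. map c_k -> k * c_k."""
    return [k * c for k, c in enumerate(coeffs)]


def moment(P, n):
    """n-th moment <k^n> obtained as (x d/dx)^n G0(x) evaluated at x = 1."""
    coeffs = list(P)
    for _ in range(n):
        coeffs = x_ddx(coeffs)
    return sum(coeffs)  # evaluating a polynomial at x = 1 sums its coefficients


# Hypothetical degree distribution: P[k] is the probability of degree k.
P = [0.1, 0.4, 0.3, 0.2]

if __name__ == "__main__":
    for n in range(3):
        print(n, moment(P, n))
```

For this example the zeroth moment is the normalization, the first is the mean degree, and the second agrees with the direct sum $\sum_k k^2 P_k$.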

Generating functions for the giant component of a random graph

Newman, Watts, Strogatz PRE 64 (2001)

With the basic generating function in place, we can build on it to calculate properties of more interesting distributions.

1. G.F. for connectivity of a node at the end of a randomly chosen edge.

2. G.F. for size of the component to which that node belongs.

3. G.F. for size of the component to which an arbitrary node belongs.

Following a random edge

• k times more likely to follow an edge to a node of degree k than to a node of degree 1. Probability that a random edge is attached to a node of degree k:

$$m_k = \frac{k P_k}{\sum_k k P_k} = \frac{k P_k}{\langle k \rangle}$$

• There are k − 1 other edges outgoing from this node. (Called the “excess degree”)

• Each of those leads to a node of degree k′ with probability $m_{k'}$.
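The edge-following bias can be illustrated with a small sketch, assuming a made-up degree distribution `P` with finite support. It computes $m_k = k P_k / \langle k \rangle$ and shows that the mean degree of a node reached along an edge is $\langle k^2 \rangle / \langle k \rangle$, which is at least the mean degree $\langle k \rangle$ of a node picked uniformly.

```python
def edge_end_distribution(P):
    """m_k = k P_k / <k>: probability that a random edge ends at a degree-k node."""
    mean_k = sum(k * p for k, p in enumerate(P))
    return [k * p / mean_k for k, p in enumerate(P)]


# Hypothetical degree distribution: P[k] is the probability of degree k.
P = [0.1, 0.4, 0.3, 0.2]
m = edge_end_distribution(P)

mean_degree = sum(k * p for k, p in enumerate(P))
mean_neighbor_degree = sum(k * mk for k, mk in enumerate(m))

if __name__ == "__main__":
    print("m_k:", m)
    print("<k> =", mean_degree, " mean degree at end of edge =", mean_neighbor_degree)
```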

... reach it, is thus $q_k = (k+1)P_{k+1}/\langle k \rangle$, and the corresponding generating function therefore reads

$$G_1(x) = \sum_k \frac{(k+1)P_{k+1}}{\langle k \rangle} x^k = \frac{1}{\langle k \rangle} G_0'(x). \tag{A2.5}$$

The definition and properties of generating functions allow us now to deal with the problem of percolation in a random network: let us call $H_1(x)$ the generating function for the distribution of the sizes of the connected components reached by following a randomly chosen edge. Note that $H_1(x)$ considers only finite components and therefore excludes the possible giant cluster. We neglect the existence of loops, which is indeed legitimate for such finite components. The distribution of sizes of such components can be visualized by a diagrammatic expansion as shown in Figure A2.1: each (tree-like) component is composed by the node initially reached, plus $k$ other tree-like components, which have the same size distribution, where $k$ is the number of outgoing links of the node, whose distribution is $q_k$. The probability $Q_S$ that the global component has size $S$ is thus

$$Q_S = \sum_k q_k \,\mathrm{Prob}(\text{union of } k \text{ components has size } S-1) \tag{A2.6}$$

(counting the initially reached node in $S$). The generating function $H_1$ is by definition

$$H_1(x) = \sum_S Q_S x^S, \tag{A2.7}$$

and the distribution of the sum of the sizes of the $k$ components is generated by $H_1^k$ (as previously explained for the sum of degrees), i.e.

$$\sum_S \mathrm{Prob}(\text{union of } k \text{ components has size } S)\, x^S = (H_1(x))^k. \tag{A2.8}$$

Fig. A2.1. Diagrammatic visualization of (A) Equation (A2.9) and (B) Equation (A2.10). Each square corresponds to an arbitrary tree-like cluster, while the circle is a node of the network.

(Circles denote isolated nodes, squares components of unknown size.)

What is the PGF for the excess degree? (Build up more complex from simpler)

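• Let $q_k$ denote the probability of following an edge to a node with excess degree $k$: $q_k = (k+1)P_{k+1} / \langle k \rangle$

• The associated GF:

$$G_1(x) = \frac{1}{\langle k \rangle} \sum_{k=0}^{\infty} (k+1) P_{k+1} x^k = \frac{1}{\langle k \rangle} \sum_{k=1}^{\infty} k P_k x^{k-1} = \frac{1}{\langle k \rangle} G_0'(x)$$

• Recall the most basic GF: $G_0(x) = \sum_k P_k x^k$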
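The identity $G_1(x) = G_0'(x)/\langle k \rangle$ can be sanity-checked numerically. This is a minimal sketch, assuming a made-up finite-support degree distribution `P`; the helper names `pgf` and `excess_degree_distribution` are illustrative, and $G_0'$ is approximated by a central finite difference so that the check is independent of the algebra.

```python
def pgf(coeffs, x):
    """Evaluate the generating function sum_k coeffs[k] * x**k."""
    return sum(c * x**k for k, c in enumerate(coeffs))


def excess_degree_distribution(P):
    """q_k = (k + 1) P_{k+1} / <k> for k = 0 .. len(P) - 2."""
    mean_k = sum(k * p for k, p in enumerate(P))
    return [(k + 1) * P[k + 1] / mean_k for k in range(len(P) - 1)]


# Hypothetical degree distribution: P[k] is the probability of degree k.
P = [0.1, 0.4, 0.3, 0.2]
q = excess_degree_distribution(P)
mean_k = sum(k * p for k, p in enumerate(P))

# Compare G1(x) with G0'(x) / <k> at an arbitrary test point.
x, h = 0.5, 1e-6
G0_prime = (pgf(P, x + h) - pgf(P, x - h)) / (2 * h)  # central difference
lhs = pgf(q, x)
rhs = G0_prime / mean_k

if __name__ == "__main__":
    print("q_k:", q)
    print("G1(x) =", lhs, " G0'(x)/<k> =", rhs)
```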

H1(x), Generating function for probability of component size reached by following random edge

(subscript 0 on GF denotes node property, 1 denotes edge property)





$$H_1(x) = x q_0 + x q_1 H_1(x) + x q_2 [H_1(x)]^2 + x q_3 [H_1(x)]^3 + \cdots$$

(A self-consistency equation. We assume a tree network.)

Note also that $H_1(x) = x \sum_k q_k [H_1(x)]^k = x G_1(H_1(x))$
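The self-consistency equation $H_1(x) = x G_1(H_1(x))$ can be solved numerically for a given $x$ by fixed-point iteration. This is a sketch under stated assumptions: the excess-degree distribution `q` is a made-up, sub-critical example, the function name `solve_H1` is illustrative, and the iteration starts from 0 so that it converges to the smallest non-negative solution, the one describing finite components.

```python
def solve_H1(q, x, tol=1e-14, max_iter=100000):
    """Solve h = x * G1(h) by fixed-point iteration starting from h = 0."""
    h = 0.0
    for _ in range(max_iter):
        h_new = x * sum(qk * h**k for k, qk in enumerate(q))
        if abs(h_new - h) < tol:
            return h_new
        h = h_new
    raise RuntimeError("fixed-point iteration did not converge")


# Hypothetical excess-degree distribution (from P = [0.3, 0.5, 0.2], <k> = 0.9).
q = [5 / 9, 4 / 9]

if __name__ == "__main__":
    for x in (0.25, 0.5, 1.0):
        # For this linear G1 the exact solution is H1(x) = 5x / (9 - 4x).
        print(x, solve_H1(q, x), 5 * x / (9 - 4 * x))
```

For this example $G_1$ is linear, so the iteration can be compared with the closed form $H_1(x) = 5x/(9-4x)$, which gives $H_1(1) = 1$ as expected below the transition.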

H0(x), Generating function for distribution in component sizes starting from arbitrary node





$$H_0(x) = x P_0 + x P_1 H_1(x) + x P_2 [H_1(x)]^2 + x P_3 [H_1(x)]^3 + \cdots = x \sum_k P_k [H_1(x)]^k = x G_0(H_1(x))$$

• Can take derivatives of $H_0(x)$ to find moments of the component size distribution!

• Note we have assumed a tree-like topology.
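Besides moments, the coefficients of $H_0(x)$ themselves give the component size distribution. The sketch below expands $H_1$ and $H_0$ as truncated power series, assuming a made-up, sub-critical degree distribution `P` and the tree-like approximation; all function names are illustrative. Each pass of $H_1 \leftarrow x\,G_1(H_1)$ fixes at least one more coefficient, so `n` passes suffice for order `n`.

```python
def poly_mul(a, b, n):
    """Multiply two coefficient lists, keeping only terms up to x^n."""
    out = [0.0] * (n + 1)
    for i, ai in enumerate(a[: n + 1]):
        if ai == 0.0:
            continue
        for j, bj in enumerate(b[: n + 1 - i]):
            out[i + j] += ai * bj
    return out


def compose(outer, inner, n):
    """Return outer(inner(x)) truncated at x^n, using Horner's rule."""
    result = [0.0] * (n + 1)
    for c in reversed(outer):
        result = poly_mul(result, inner, n)
        result[0] += c
    return result


def times_x(a, n):
    """Multiply a series by x and truncate at x^n."""
    return ([0.0] + list(a))[: n + 1]


def component_size_distribution(P, n):
    """Coefficients of H0(x) up to x^n: entry s is Prob(component size = s)."""
    mean_k = sum(k * p for k, p in enumerate(P))
    q = [(k + 1) * P[k + 1] / mean_k for k in range(len(P) - 1)]
    H1 = [0.0] * (n + 1)
    for _ in range(n):  # H1 <- x * G1(H1), one more correct order per pass
        H1 = times_x(compose(q, H1, n), n)
    return times_x(compose(P, H1, n), n)  # H0 = x * G0(H1)


# Hypothetical sub-critical degree distribution.
P = [0.3, 0.5, 0.2]
dist = component_size_distribution(P, 60)
mean_size = sum(s * p for s, p in enumerate(dist))

if __name__ == "__main__":
    print("Prob(size = 1..5):", dist[1:6])
    print("mean component size:", mean_size)
```

The size-1 entry equals $P_0$ (an isolated node), and the size-2 entry equals $P_1 q_0$ (a degree-1 node whose neighbor has no other edges), which matches the diagrammatic expansion.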

Expected size of a component starting from arbitrary node

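• $\langle s \rangle = \left.\frac{d}{dx} H_0(x)\right|_{x=1} = \left.\frac{d}{dx}\left[x G_0(H_1(x))\right]\right|_{x=1} = G_0(H_1(1)) + G_0'(H_1(1)) \cdot H_1'(1)$

Since $H_1(1) = 1$ (i.e., it is the sum of the probabilities),

$\langle s \rangle = 1 + G_0'(1) \cdot H_1'(1)$ (Recall $\langle k \rangle = G_0'(1)$)

• Recall (three slides ago) $H_1(x) = x G_1(H_1(x))$,

so $H_1'(1) = 1 + G_1'(1) H_1'(1) \implies H_1'(1) = 1/(1 - G_1'(1))$

And thus, $\langle s \rangle = 1 + \dfrac{G_0'(1)}{1 - G_1'(1)}$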

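• Now evaluating the derivative:

$$G_1'(x) = \frac{d}{dx} \frac{1}{\langle k \rangle} G_0'(x) = \frac{1}{\langle k \rangle} \frac{d}{dx} \sum_k k P_k x^{k-1} = \frac{1}{\langle k \rangle} \sum_k k(k-1) P_k x^{k-2}$$

• Evaluate at $x = 1$:

$$G_1'(1) = \frac{1}{\langle k \rangle} \sum_k k(k-1) P_k = \frac{1}{\langle k \rangle}\left[\langle k^2 \rangle - \langle k \rangle\right]$$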
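As a worked example (not part of the original slides), consider a Poisson degree distribution with mean $c$, as in an Erdős–Rényi random graph. The symbol $c$ is introduced here for illustration only.

```latex
\begin{align*}
G_0(x) &= \sum_{k=0}^{\infty} \frac{c^k e^{-c}}{k!}\, x^k = e^{c(x-1)},
  & \langle k \rangle &= G_0'(1) = c, \\
G_1(x) &= \frac{1}{\langle k \rangle} G_0'(x) = e^{c(x-1)} = G_0(x),
  & G_1'(1) &= c, \\
\langle k^2 \rangle &= c^2 + c
  \;\Longrightarrow\;
  \frac{1}{\langle k \rangle}\left[\langle k^2 \rangle - \langle k \rangle\right] = c,
  & \langle s \rangle &= 1 + \frac{G_0'(1)}{1 - G_1'(1)} = 1 + \frac{c}{1-c} = \frac{1}{1-c}.
\end{align*}
```

Both routes to $G_1'(1)$ agree, and the mean component size $1/(1-c)$ is finite only for $c < 1$.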

Expected size of a component starting from arbitrary node

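• $\langle s \rangle = 1 + \dfrac{G_0'(1)}{1 - G_1'(1)}$

• $G_0'(1) = \langle k \rangle$

• $G_1'(1) = \dfrac{1}{\langle k \rangle}\left[\langle k^2 \rangle - \langle k \rangle\right]$

$$\langle s \rangle = 1 + \frac{G_0'(1)}{1 - G_1'(1)} = 1 + \frac{\langle k \rangle^2}{2\langle k \rangle - \langle k^2 \rangle}$$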
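The closed form translates directly into a few lines of code. This is a minimal sketch, assuming a finite-support degree distribution given as a list; the function name `mean_component_size` and the example distributions are illustrative. It refuses to return a value when $2\langle k \rangle - \langle k^2 \rangle \le 0$, since the formula was derived under $H_1(1) = 1$, which only holds when there is no giant component.

```python
def mean_component_size(P):
    """<s> = 1 + <k>^2 / (2<k> - <k^2>), valid only below the transition."""
    k1 = sum(k * p for k, p in enumerate(P))
    k2 = sum(k * k * p for k, p in enumerate(P))
    denom = 2 * k1 - k2
    if denom <= 0:
        raise ValueError("2<k> - <k^2> <= 0: at or above the giant-component threshold")
    return 1 + k1**2 / denom


# Hypothetical sub-critical example: <k> = 0.9, <k^2> = 1.3.
P_sub = [0.3, 0.5, 0.2]
# Hypothetical super-critical example: <k> = 1.6, <k^2> = 3.4.
P_super = [0.1, 0.4, 0.3, 0.2]

if __name__ == "__main__":
    print(mean_component_size(P_sub))
    try:
        mean_component_size(P_super)
    except ValueError as err:
        print("no finite mean:", err)
```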

Emergence of the giant component

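• $\langle s \rangle \to \infty$

• This happens when $2\langle k \rangle = \langle k^2 \rangle$, which can also be written as $\langle k \rangle = \langle k^2 \rangle - \langle k \rangle$.

• This means the expected number of nearest neighbors, $\langle k \rangle$, first equals the expected number of second nearest neighbors, $\langle k^2 \rangle - \langle k \rangle$.

• Can also be written as $\langle k^2 \rangle - 2\langle k \rangle = 0$, which is the famous Molloy and Reed criterion*; the giant component emerges when $\sum_k k(k-2) P_k = 0$.

*GF approach is easier than Molloy Reed!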
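The criterion is easy to evaluate for a concrete distribution. The sketch below assumes a Poisson degree distribution truncated at a hypothetical cutoff `k_max` (function names are illustrative). For a Poisson distribution with mean $c$ the sum equals $c(c-1)$, so it changes sign at $c = 1$; a positive value indicates a giant component.

```python
import math


def poisson_pmf(c, k_max=100):
    """Poisson probabilities P_0 .. P_k_max, built with P_k = P_{k-1} * c / k."""
    p = [math.exp(-c)]
    for k in range(1, k_max + 1):
        p.append(p[-1] * c / k)
    return p


def molloy_reed(P):
    """Sum_k k (k - 2) P_k: negative below, zero at, positive above the threshold."""
    return sum(k * (k - 2) * p for k, p in enumerate(P))


if __name__ == "__main__":
    for c in (0.5, 1.0, 1.5):
        value = molloy_reed(poisson_pmf(c))
        print(f"c = {c}: sum k(k-2)P_k = {value:.6f}, giant component: {value > 1e-12}")
```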

PGFs widely used in network theory

• Fragility of power-law random graphs to targeted node removal / robustness to random removal
– Callaway, PRL 2000
– Cohen, PRL 2000

• Onset of epidemic threshold:
– C Moore, MEJ Newman, Physical Review E, 2000

– MEJ Newman, Physical Review E, 2002

– Lauren Ancel Meyers, M.E.J. Newman, Babak Pourbohloul, Journal of Theoretical Biology, 2006

– JC Miller, Physical Review E, 2007

• Cascades on random networks: Watts, PNAS 2002. Susceptible agents drive social change.

What really drives a node to activate?

NATO Advanced Study Institute on Mining Massive Data Sets for Security

Diffusion and Cascading Behaviour in Networks

Author: Jure Leskovec, Computer Science Department, Stanford University. Published: Nov. 28, 2007; recorded: September 2007.


Description

Diffusion is a process by which information, viruses, ideas and new behavior spread over the network. For example, adoption of a new technology begins on a small scale with a few “early adopters”, then more and more people adopt it as they observe friends and neighbors using it. Eventually the adoption of the technology may spread through the social network as an epidemic “infecting” most of the network. As it spreads over the network it creates a cascade. Cascades have been studied for many years by sociologists concerned with the diffusion of innovation; more recently, researchers have investigated cascades for selecting trendsetters for viral marketing, finding inoculation targets in epidemiology, and explaining trends in blogspace.


(from Leskovec talk)

Finding the influential nodes: Motivation

• Viral marketing – use word-of-mouth effects to sell product with minimal advertising cost.

• Design of search tools to track news, blogs, and other forms of on-line discussion about current events

Finding the influential nodes: formally

• The minimum set S ⊆ V that will lead to the whole network being activated.

• The optimal set of a specified size k = |S| that will lead to the largest portion of the network being activated.

Influentials

• Critical mass model

– NP-hard problem.
– NP-hard even to find an approximately optimal set (optimal to within a factor $n^{1-\varepsilon}$, where $n$ is the network size and $\varepsilon > 0$). (“inapproximability”)

• Diminishing returns

– Greedy algorithms (e.g. “hill-climbing”) come within $(1 - 1/e) \approx 63\%$ of optimal.
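The greedy hill-climbing idea can be sketched on a simplified stand-in objective. The example below is not the cascade model from the talk: it assumes each candidate seed has a fixed, known set of nodes it would activate (the hypothetical `reach` dictionary), so the objective is plain set coverage. Coverage is monotone and has diminishing returns (submodular), which is the setting in which greedy selection carries the $(1 - 1/e)$ guarantee.

```python
def greedy_seed_selection(reach, k):
    """Pick up to k seeds, each time adding the one with the largest marginal gain.

    reach: dict mapping a candidate seed to the set of nodes it would activate
           (a deterministic, hypothetical stand-in for a cascade model).
    """
    chosen, covered = [], set()
    for _ in range(k):
        candidates = [v for v in reach if v not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda v: len(reach[v] - covered))
        if not reach[best] - covered:
            break  # no candidate adds anything new
        chosen.append(best)
        covered |= reach[best]
    return chosen, covered


# Hypothetical activation sets for four candidate seeds.
reach = {
    "a": {1, 2, 3, 4},
    "b": {3, 4, 5},
    "c": {5, 6},
    "d": {1, 6},
}
seeds, activated = greedy_seed_selection(reach, 2)

if __name__ == "__main__":
    print("seeds:", seeds, "activated:", sorted(activated))
```

After the first pick, the marginal gain of every remaining candidate shrinks because part of its set is already covered; this is the diminishing-returns property the greedy guarantee relies on.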

Joining Livejournal: an online bulletin board network

[Figure: Probability of joining a community when k friends are already members. Horizontal axis: k (0 to 50); vertical axis: probability (0 to 0.025).]

• Diminishing returns only set in once k > 3.

• Network effect not illustrated by the curve: if the k friends are highly clustered, the new user is more likely to join.

(from Leskovec talk)
