
MSc Mathematics
Master Thesis

Phase transitions in random graphs

Author: Hamza Ahmadan
Supervisor: dr. Bas Kleijn

Examination date: January 16, 2019

Korteweg-de Vries Institute for Mathematics


Abstract

The phase transition of a system is a phenomenon in which a particular property of the system changes abruptly at a critical point. The transitions from solid to liquid to gas, as the temperature passes critical values, are well-known examples of phase transitions. Random graphs also display phases and phase transitions. When the expected degree of a random graph is smaller than one, the random graph consists of small clusters. The moment the expected degree passes through the critical value of one, a giant connected component emerges, giving the random graph a structured appearance. Similarly, if the expected degree grows slower or faster than the logarithm of the size of the graph, the resulting random graph is, respectively, disconnected or connected with high probability. In this thesis we examine both regimes, the regime where the giant connected component emerges and the regime of connectedness, in detail.

Title: Phase transitions in random graphs
Author: Hamza Ahmadan, [email protected], 10483276
Supervisor: dr. Bas Kleijn
Second Examiner: dr. Sonja Cox
Examination date: January 16, 2019

Korteweg-de Vries Institute for Mathematics
University of Amsterdam
Science Park 105-107, 1098 XG Amsterdam
http://kdvi.uva.nl
Source image front page:
https://blogs.harvard.edu/michaellaw/2013/11/29/the-economics-profession-why-are-there-star-economists/


Acknowledgement

I would like to thank my supervisor Bas Kleijn for his well-considered suggestions and his infectious enthusiasm for this subject. I have learned much from him during this project and in previous courses. I would also like to thank Sonja Cox for being my second reader and a wonderful teacher during the reading course on functional calculus.
I would also like to thank my wife, whose unwavering support kept me going forward during the whole project. I thank her for her patience and kind words of encouragement. Finally, I would like to thank my family for their understanding and for supporting the choices I made during my studies, for otherwise I would not have made it this far.


Contents

1 Introduction 5

2 Random graph models 8
   2.1 The two basic models 9
   2.2 Equivalence of the models 10
   2.3 Threshold functions 15
   2.4 Other random graph models 17
       2.4.1 Random intersection graphs 17
       2.4.2 Random graphs with hidden variables 18
       2.4.3 Random geometric graphs 20

3 Branching process 21
   3.1 Galton-Watson branching process 22
   3.2 Random walk on branching processes 25
   3.3 Poisson branching processes 26

4 Phase transitions 30
   4.1 Relation between branching process and random graphs 30
   4.2 Connectivity threshold 33
   4.3 Supercritical regime 39
   4.4 Critical regime 46
   4.5 Subcritical regime 50

5 Stochastic block model 57
   5.1 The model 58
   5.2 Planted bi-section model 60

6 Popular summary 66

Bibliography 68


1 Introduction

Imagine that we are living in a random society, where the randomness lies in the connections between people: any two people know each other with some fixed probability. This would be a terrifying world to live in. Indeed, if the probability that two people know each other is small, then people would live isolated lives. Perhaps this society would have died out, since humans have made it this far by working together and learning from each other. On the other hand, if the probability were high, then you would expect everyone to know everyone. However fascinating this may sound, a simple blessing like anonymity would be unheard of.
Therefore, we can ask ourselves: for which probabilities do we see these extreme cases? For which probability do we see a phase transition from a connected world to an isolated world? These questions can be answered by using the Erdos-Renyi random graph as a model for this random society. The Erdos-Renyi random graph exhibits a phase transition in the connectedness of its vertices. If the probability is above a certain critical value, then the graph has, with high probability, a giant connected component, and if it is below the critical value then, with high probability, the graph consists of small connected components.

The phase transition of the Erdos-Renyi random graph resembles phase transitions seen in nature. A well-known example of such a phase transition is that of water. When the temperature is below zero degrees Celsius, the water molecules bind with each other through hydrogen bonds to form a crystalline structure. When the temperature is above zero degrees Celsius, the hydrogen bonds get weaker and the molecules form small clusters of connected molecules, resulting in a liquid state.
Another example is the magnetic phase transition in ferromagnetic materials. Above a certain temperature the spins of the atoms can point in two different directions. The moment the temperature drops below that critical value, the spins align in one direction, creating a magnetic field.

The random graph was first studied by Anatol Rapoport, who together with Ray Solomonoff showed that if the average degree of a network is increased, an abrupt transition from disconnected vertices to a giant component occurs [27]. But the true birth of random graph theory was due to eight papers published by Paul Erdos and Alfred Renyi in the early sixties ([5, 6, 7, 8, 9, 10, 11, 12]). Their motivation for studying random graphs was not to answer questions about real networks, but to answer questions related to the properties of graphs. From then on, random graphs were intensively studied and much is known about them. A random graph can be described as a graph with n vertices in which an edge between any two vertices occurs with some fixed probability, called an edge probability. The probability that a random graph has a certain property will be studied in the asymptotic sense, i.e. a random graph G_n with n vertices has a property 𝒫 for almost all graphs (or with high probability) if P(G_n has property 𝒫) → 1 as n → ∞.

In this thesis we will investigate the phase transition of the Erdos-Renyi random graph and answer the following questions:

1. For which edge probability is the random graph connected with high probability?

2. For which edge probability do we see the emergence of a giant connected component with high probability?

3. What is the order of the size of the largest connected component in the random graph?

This thesis is mainly based on the work of Remco van der Hofstad [29]. It also draws on Bollobas [4] and Janson et al. [20], in an attempt to gather the results in as self-contained and comprehensive a way as possible. What we missed in the work of van der Hofstad are the results about the equivalence of the two basic random graph models and the structure of the connected components, which are discussed by Bollobas [4] and Janson et al. [20]. We also added some results about threshold functions, an overview of which can be found in Bollobas [4] and Frieze & Karonski [13]. What we appreciate in the work of van der Hofstad is the probabilistic view on random graphs, which is partly shared by Janson et al., while Bollobas approaches random graphs through combinatorics. We also appreciate the well-organised and clear structuring of the results about the phase transition. Finally, the part about the stochastic block model is based on Abbe [1], Mossel et al. [24] and Kleijn & Waaij [22], hereby combining the points of view of information theory, probability theory and statistics.

In chapter 2 we describe the two basic Erdos-Renyi random graph models and their equivalence for certain graph properties. The phase transition happens when the probability passes through a critical value, at which these graph properties change abruptly. The critical value depends on the size of the random graph, and thus it is a function, called a threshold function. The threshold function and its properties are also discussed in this chapter.


In chapter 3 we give the necessary results from branching process theory that will be used to prove results about the phase transition. Branching processes also undergo a phase transition, which can be related to the phase transition of random graphs.
Chapter 4 is where we state the phase transition of the Erdos-Renyi random graph. We encounter four different phases. The first is when the random graph is connected with high probability. The second is when we encounter the emergence of a giant component. The third is when the probability is close to the critical value. The fourth, finally, is the phase in which all connected components are small.
In chapter 5 we consider a generalization of the Erdos-Renyi random graph, called the stochastic block model. This model is a random graph that contains communities and is a canonical model for community detection. The phase transition in this model determines when communities can be detected. We then discuss the planted bi-section model, which is a special case of the stochastic block model, and the phase transitions it undergoes.


2 Random graph models

Let us consider a very complex network: the protein interactions of yeast. The proteins are represented by vertices and the bindings between the proteins, resulting from biochemical events, are represented by edges. The number of vertices is roughly 2018 and the number of edges is 2930 [2]. When looking at the topology of this network, the first impression is that the edges seem to have been placed at random, see figure 2.1. There is no clear crystalline structure with some regularity in it, or any other structure with a predictable architecture.

Figure 2.1: Protein interaction network of yeast. The vertices correspond to yeast proteins and the edges represent the experimental detection of chemical binding interactions. This figure has been obtained from [2].

Therefore, when modelling these networks it makes sense to use random graphs as a starting model, as these models are truly random. Random graphs are, loosely speaking, graphs in which the edges occur with some fixed probability. The way the probability is defined gives rise to a variety of random graph models.

In section 2.1 we describe the two most basic models. Thereafter, we show that these two basic models are in fact equivalent for some graph properties. In section 2.3 we provide the definition of a threshold function and discuss the role threshold functions play in the phase transitions. At last, in section 2.4 we mention some other random graph models (that are not considered further).

2.1 The two basic models

There are two models that are frequently used to study random graphs. These are the uniform random graph model, ER_n(M), and the binomial random graph model, ER_n(p). Let 𝒢_n denote the set of all graphs with n vertices and denote the vertex set by [n] = {1, ..., n}. Then ER_n(p) and ER_n(M) are in fact probability spaces built on subsets of 𝒢_n.

The first model, ER_n(M), was introduced by Erdos and Renyi in 1959 [5]. This model consists of all graphs with n vertices and M = M(n) edges. Denoting by $N = \binom{n}{2}$ the total number of possible edges, the model ER_n(M) consists of $\binom{N}{M}$ elements, each chosen uniformly, i.e. with probability $\binom{N}{M}^{-1}$; hence the name uniform model. We will denote the corresponding random graph by ER_n(M).

The second model, ER_n(p), was introduced by Gilbert in 1959, independently of Erdos and Renyi [16]. This model consists of all graphs with n vertices in which the edges are chosen independently with probability p. This probability is also called an edge probability. This means that if G is a graph with n vertices and m edges, then the probability of obtaining G in this model is $p^m(1-p)^{N-m}$; hence the name binomial model. We will denote a random graph from this model by ER_n(p).

The edge probability in the binomial model can depend on n, i.e. p = p(n), but the model is also interesting when p is constant. For example, take p = 1/2. Then ER_n(1/2) is exactly the set 𝒢_n where each graph has the same probability, namely $2^{-N}$. Since p = 0 and p = 1 are the trivial cases, we will from now on assume that 0 < p < 1 unless otherwise stated. The degree of a vertex v in the second model follows a binomial distribution with parameters n − 1 and p. The main advantage of the binomial model is that the edges are chosen independently of each other, but the downside is that the number of edges is not fixed. The number of edges in ER_n(p), denoted |ER_n(p)|, follows a binomial distribution with parameters N and p. The following lemma shows that the binomial model conditioned on the number of edges equals the uniform model [13].
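The two sampling mechanisms just described can be sketched in a few lines of Python (a minimal illustration of the definitions, not part of the original text; the function names are ours):

```python
import random
from itertools import combinations

def sample_er_p(n, p, rng):
    """ER_n(p): include each of the N = C(n, 2) possible edges
    independently with probability p (the edge count is random)."""
    return [e for e in combinations(range(n), 2) if rng.random() < p]

def sample_er_m(n, M, rng):
    """ER_n(M): pick an edge set of size exactly M uniformly at random,
    so that each of the C(N, M) possible edge sets is equally likely."""
    return rng.sample(list(combinations(range(n), 2)), M)

rng = random.Random(42)
g_p = sample_er_p(100, 0.05, rng)  # edge count is Bin(4950, 0.05), not fixed
g_m = sample_er_m(100, 250, rng)   # edge count is exactly 250
print(len(g_m))  # 250
```

The sketch makes the contrast visible: the binomial model fixes only the expected number of edges, while the uniform model fixes the number itself.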

Lemma 2.1 (Frieze & Karonski [13]). Conditionally on |ER_n(p)| = M, the random graph ER_n(p) has the same distribution as the random graph ER_n(M) with M edges.

Proof. Let G_M be a graph with n vertices and M edges. Then, by noticing that {ER_n(p) = G_M} ⊆ {|ER_n(p)| = M}, we have

$$\begin{aligned}
P\big(\mathrm{ER}_n(p) = G_M \,\big|\, |\mathrm{ER}_n(p)| = M\big) &= \frac{P\big(\mathrm{ER}_n(p) = G_M,\, |\mathrm{ER}_n(p)| = M\big)}{P\big(|\mathrm{ER}_n(p)| = M\big)} \\
&= \frac{P\big(\mathrm{ER}_n(p) = G_M\big)}{P\big(|\mathrm{ER}_n(p)| = M\big)} \\
&= \frac{p^M (1-p)^{N-M}}{\binom{N}{M} p^M (1-p)^{N-M}} \\
&= \frac{1}{\binom{N}{M}} = P\big(\mathrm{ER}_n(M) = G_M\big).
\end{aligned} \tag{2.1}$$

Lemma 2.1 plays an important role in the proofs of the equivalence of the two models, as we will see in the next section. In section 2.4 we mention some other interesting random graph models.
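Lemma 2.1 can be checked exactly for small graphs by brute force: enumerate all $2^6 = 64$ graphs on four vertices and compare computing the probability of connectedness directly in the binomial model with computing it by conditioning on the edge count. A self-contained sketch (the helper names and the choice n = 4, p = 0.4 are ours):

```python
from itertools import combinations
from math import comb

def connected(n, edges):
    """Depth-first search connectivity check on vertex set {0, ..., n-1}."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n

n, p = 4, 0.4
all_edges = list(combinations(range(n), 2))
N = len(all_edges)  # N = C(4, 2) = 6

# Number of connected graphs with exactly M edges, for each M.
conn_by_m = [sum(connected(n, es) for es in combinations(all_edges, M))
             for M in range(N + 1)]

# Binomial model directly: sum of p^M (1-p)^(N-M) over connected graphs.
lhs = sum(conn_by_m[M] * p**M * (1 - p)**(N - M) for M in range(N + 1))

# Conditioning on the edge count, as in Lemma 2.1:
# P(ER_n(M) connected) * P(Bin(N, p) = M), summed over M.
rhs = sum((conn_by_m[M] / comb(N, M))
          * comb(N, M) * p**M * (1 - p)**(N - M) for M in range(N + 1))

assert abs(lhs - rhs) < 1e-12  # the two decompositions agree
```

As a byproduct, `conn_by_m[3]` equals 16, the number of spanning trees of the complete graph on four labelled vertices (Cayley's formula gives $4^{4-2} = 16$).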

2.2 Equivalence of the models

In the previous section we defined the two models ER_n(M) and ER_n(p) for random graphs. But can we compare these two models? If we know that a random graph in one of the models has a certain property, what can we say about a random graph in the other model?
It turns out that for some graph properties we can indeed compare these two models, namely they are asymptotically equivalent. For this comparison we first need to understand what a graph property is. Consider 𝒢_n, the set of graphs with n vertices as mentioned in the previous section. A graph property 𝒫 is a subset of 𝒢_n such that if G ∈ 𝒫, H ∈ 𝒢_n and G and H are isomorphic to each other, then H ∈ 𝒫. Thus a graph property is a subset of 𝒢_n which is closed under isomorphism. With isomorphism we mean graph isomorphism: two graphs G and H are isomorphic if there exists a bijection f between the vertex sets V(G) and V(H) such that the vertices u, v in G are connected by an edge if and only if the vertices f(u) and f(v) in H are connected by an edge. Connectedness is an example of a graph property; it is clearly closed under isomorphism. Thus saying that a graph G is connected means that G lies in the set {G : G is connected}.
A property 𝒫 is said to be monotone increasing if G ∈ 𝒫 and G ⊆ H imply H ∈ 𝒫. This means that by adding edges to the graph G, the property 𝒫 still holds for the new graph. For example, the property of being connected is a monotone increasing property. Furthermore, a property 𝒫 is said to be monotone decreasing if H ∈ 𝒫 and G ⊆ H imply G ∈ 𝒫. Thus a graph property is monotone increasing if and only if its complement is monotone decreasing.


The next theorem, due to Bollobas [4], states that a property becomes more likely to occur if the number of edges is increased or if the edges are more likely to be chosen.

Theorem 2.1 (Bollobas [4]). Let 𝒫 be a monotone increasing property and let 0 ≤ M_1 ≤ M_2 ≤ N and 0 ≤ p_1 ≤ p_2 ≤ 1. Then

(i) P(ER_n(M_1) ∈ 𝒫) ≤ P(ER_n(M_2) ∈ 𝒫), and

(ii) P(ER_n(p_1) ∈ 𝒫) ≤ P(ER_n(p_2) ∈ 𝒫).

Proof.

(i) Suppose that we form the graph by selecting the M_2 edges one by one. If the graph already has the property 𝒫 after the first M_1 edges, then it certainly still has it once all M_2 edges are in place. This follows from the fact that 𝒫 is a monotone increasing property.

(ii) Let p = (p_2 − p_1)/(1 − p_1). Choose ER_n(p) and ER_n(p_1) independently and let ER_n(q) = ER_n(p) ∪ ER_n(p_1). Then the edge probability q equals q = p_1 + p − p_1 p = p_2, so ER_n(q) is distributed as ER_n(p_2). Since ER_n(p_1) ⊆ ER_n(q), the monotonicity of 𝒫 gives that ER_n(p_1) ∈ 𝒫 implies ER_n(q) ∈ 𝒫, and the result follows.
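The coupling in part (ii) can be checked numerically: an edge that is present in either ER_n(p_1) or an independent ER_n(p) with p = (p_2 − p_1)/(1 − p_1) is present with probability exactly p_2. A small sketch (parameter values and variable names are ours):

```python
import random

p1, p2 = 0.3, 0.7
p = (p2 - p1) / (1 - p1)  # extra edge probability used in the coupling

# 1 - (1 - p1)(1 - p) = p1 + p - p1*p = p2, so the union of the two
# independent edge indicators is a Bernoulli(p2) variable.
rng = random.Random(1)
trials = 200_000
hits = sum((rng.random() < p1) or (rng.random() < p) for _ in range(trials))
print(hits / trials)  # empirically close to p2 = 0.7
```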

Theorem 2.1(ii) tells us that the Erdos-Renyi random graph ER_n(p) is monotone increasing in p for monotone increasing graph properties (this is in fact true for general monotone increasing set systems). Let X be a set consisting of N elements and let 𝒫 be a monotone increasing property of subsets of X, i.e. a collection of subsets of X such that A ∈ 𝒫 and A ⊆ B imply B ∈ 𝒫. Let P_p(𝒫) be the probability that a set, formed by selecting each element of X independently with probability p, lies in 𝒫. Thus,

$$P_p(\mathscr{P}) = \sum_{A \in \mathscr{P}} p^{|A|}(1-p)^{N-|A|}. \tag{2.2}$$

Let us assume that 𝒫 is non-trivial, i.e. 𝒫 is not empty and ∅ ∉ 𝒫. Define H_k as the collection of all sets in 𝒫 of size k, i.e.

$$H_k = \{A \in \mathscr{P} : |A| = k\}, \qquad h_k = |H_k|. \tag{2.3}$$

Then, since h_0 = 0, we obtain

$$P_p(\mathscr{P}) = \sum_{k=1}^{N} h_k\, p^k (1-p)^{N-k}. \tag{2.4}$$

Taking the derivative with respect to p results in

$$\frac{d P_p(\mathscr{P})}{dp} = \sum_{k=1}^{N} p^{k-1}(1-p)^{N-k}\,\big(k h_k - (N-k+1)\, h_{k-1}\big). \tag{2.5}$$


Each set A ∈ H_{k−1} is contained in N − (k − 1) sets B ∈ H_k, while each B ∈ H_k contains at most k sets A ∈ H_{k−1}. Counting the pairs A ⊂ B in both ways leads to

$$k h_k \ge (N - k + 1)\, h_{k-1}, \tag{2.6}$$

and thus (2.5) is nonnegative, which implies that P_p(𝒫) is monotone increasing in p; see [4].
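Formula (2.2) and the monotonicity of P_p(𝒫) in p can be verified directly for a small ground set by brute-force enumeration. A sketch with a toy monotone property of our own choosing (a subset qualifies if it touches {0, 1}):

```python
from itertools import combinations

X = range(6)
N = 6

def in_P(A):
    """A monotone increasing property of subsets: A contains 0 or 1."""
    return 0 in A or 1 in A

def P_p(p):
    """Formula (2.2): sum of p^|A| (1-p)^(N-|A|) over all A in the property."""
    total = 0.0
    for k in range(N + 1):
        for A in combinations(X, k):
            if in_P(A):
                total += p**k * (1 - p)**(N - k)
    return total

probs = [P_p(p / 10) for p in range(11)]
assert all(a <= b for a, b in zip(probs, probs[1:]))  # monotone in p
print(round(P_p(0.5), 4))  # exact value is 1 - (1 - 0.5)^2 = 0.75
```

For this property the closed form is $1 - (1-p)^2$, since only the elements 0 and 1 matter, which the enumeration reproduces exactly.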

Now that we have defined what a graph property is, we can give conditions under which, if P(ER_n(M) ∈ 𝒫) converges to a limit, then P(ER_n(p) ∈ 𝒫) converges to the same limit, and vice versa. We expect that for large n the two models behave the same if the number of edges M in ER_n(M) is close to the expected number of edges in ER_n(p), namely Np. For one direction we can relax the condition on the properties, while for the other direction we need to restrict to monotone properties. Thus the equivalence holds for monotone properties when M is close to Np. Propositions 2.1 and 2.2 describe this equivalence. These propositions are due to Janson et al. [20], who proved them for general random sets.

Proposition 2.1 (Janson et al. [20]). Let 𝒫 be an arbitrary property of graphs, p = p(n) ∈ [0, 1], q = 1 − p and a ∈ [0, 1]. Suppose that for every sequence

$$M = Np + O\big(\sqrt{Npq}\,\big)$$

it holds that P(ER_n(M) ∈ 𝒫) → a; then also P(ER_n(p) ∈ 𝒫) → a, as n → ∞.

The idea behind the proof of this proposition is that the probability of ER_n(p) having a certain property can be bounded in terms of the uniform model, using the law of total probability and Lemma 2.1.

Proof. For a large constant C, define the following set,

$$\mathcal{M}(C) = \big\{M : |M - Np| \le C\sqrt{Npq}\big\}. \tag{2.7}$$

Let $M^* \in \mathcal{M}(C)$ be such that $P(\mathrm{ER}_n(M^*) \in \mathscr{P}) = \min_{M \in \mathcal{M}(C)} P(\mathrm{ER}_n(M) \in \mathscr{P})$. Then, by the law of total probability and Lemma 2.1, we obtain

$$\begin{aligned}
P(\mathrm{ER}_n(p) \in \mathscr{P}) &= \sum_{M=0}^{N} P\big(\mathrm{ER}_n(p) \in \mathscr{P} \,\big|\, |\mathrm{ER}_n(p)| = M\big)\, P\big(|\mathrm{ER}_n(p)| = M\big) \\
&= \sum_{M=0}^{N} P(\mathrm{ER}_n(M) \in \mathscr{P})\, P\big(|\mathrm{ER}_n(p)| = M\big) \\
&\ge \sum_{M \in \mathcal{M}(C)} P(\mathrm{ER}_n(M^*) \in \mathscr{P})\, P\big(|\mathrm{ER}_n(p)| = M\big) \\
&= P(\mathrm{ER}_n(M^*) \in \mathscr{P})\, P\big(|\mathrm{ER}_n(p)| \in \mathcal{M}(C)\big).
\end{aligned} \tag{2.8}$$


Applying Chebyshev's inequality results in

$$P\big(|\mathrm{ER}_n(p)| \notin \mathcal{M}(C)\big) = P\Big(\big|\,|\mathrm{ER}_n(p)| - Np\,\big| > C\sqrt{Npq}\Big) \le \frac{Npq}{\big(C\sqrt{Npq}\big)^2} = \frac{1}{C^2}. \tag{2.9}$$

Therefore $P(|\mathrm{ER}_n(p)| \in \mathcal{M}(C)) \ge 1 - \frac{1}{C^2}$, which provides us with the means to lower bound (2.8) in terms of the uniform model, i.e. (2.8) becomes

$$P(\mathrm{ER}_n(p) \in \mathscr{P}) \ge P(\mathrm{ER}_n(M^*) \in \mathscr{P})\left(1 - \frac{1}{C^2}\right). \tag{2.10}$$

By assumption $P(\mathrm{ER}_n(M^*) \in \mathscr{P}) \to a$; therefore, taking the limit inferior in (2.10), we get

$$\liminf_{n\to\infty} P(\mathrm{ER}_n(p) \in \mathscr{P}) \ge a\left(1 - \frac{1}{C^2}\right). \tag{2.11}$$

If $\overline{M} \in \mathcal{M}(C)$ is such that $P(\mathrm{ER}_n(\overline{M}) \in \mathscr{P}) = \max_{M \in \mathcal{M}(C)} P(\mathrm{ER}_n(M) \in \mathscr{P})$, then

$$\begin{aligned}
P(\mathrm{ER}_n(p) \in \mathscr{P}) &= P\big(\mathrm{ER}_n(p) \in \mathscr{P},\, |\mathrm{ER}_n(p)| \in \mathcal{M}(C)\big) + P\big(\mathrm{ER}_n(p) \in \mathscr{P},\, |\mathrm{ER}_n(p)| \notin \mathcal{M}(C)\big) \\
&\le P\big(\mathrm{ER}_n(p) \in \mathscr{P} \,\big|\, |\mathrm{ER}_n(p)| \in \mathcal{M}(C)\big)\, P\big(|\mathrm{ER}_n(p)| \in \mathcal{M}(C)\big) + P\big(|\mathrm{ER}_n(p)| \notin \mathcal{M}(C)\big) \\
&\le P(\mathrm{ER}_n(\overline{M}) \in \mathscr{P}) + P\big(|\mathrm{ER}_n(p)| \notin \mathcal{M}(C)\big) \\
&\le P(\mathrm{ER}_n(\overline{M}) \in \mathscr{P}) + \frac{1}{C^2}.
\end{aligned} \tag{2.12}$$

Then, taking the limit superior in (2.12), we get

$$\limsup_{n\to\infty} P(\mathrm{ER}_n(p) \in \mathscr{P}) \le a + \frac{1}{C^2}. \tag{2.13}$$

By letting C → ∞ we get the desired result.

Proposition 2.2 (Janson et al. [20]). Let 𝒫 be a monotone increasing property and let M, N, a, p be as above. Suppose that for every sequence

$$p = \frac{M}{N} + O\left(\sqrt{\frac{M(N-M)}{N^3}}\right)$$

it holds that P(ER_n(p) ∈ 𝒫) converges to a as n → ∞; then also P(ER_n(M) ∈ 𝒫) → a as n → ∞.

The idea behind the proof of this proposition is similar to that of Proposition 2.1. We now want to bound the uniform model in terms of the binomial model.

Proof. Let C be a large constant and, to simplify notation, let $p_0 = \frac{M}{N}$ and $q_0 = 1 - p_0$. Define

$$p_+ = \min\left(p_0 + C\sqrt{\frac{p_0 q_0}{N}},\; 1\right), \qquad p_- = \max\left(p_0 - C\sqrt{\frac{p_0 q_0}{N}},\; 0\right).$$

Then, by the law of total probability and the monotonicity from Theorem 2.1(i), we have for the lower bound the following inequality:

$$\begin{aligned}
P(\mathrm{ER}_n(p_+) \in \mathscr{P}) &= \sum_{M'=0}^{N} P\big(\mathrm{ER}_n(p_+) \in \mathscr{P} \,\big|\, |\mathrm{ER}_n(p_+)| = M'\big)\, P\big(|\mathrm{ER}_n(p_+)| = M'\big) \\
&\ge \sum_{M' \ge M} P(\mathrm{ER}_n(M') \in \mathscr{P})\, P\big(|\mathrm{ER}_n(p_+)| = M'\big) \\
&\ge P(\mathrm{ER}_n(M) \in \mathscr{P})\, P\big(|\mathrm{ER}_n(p_+)| \ge M\big) \\
&= P(\mathrm{ER}_n(M) \in \mathscr{P})\,\big(1 - P(|\mathrm{ER}_n(p_+)| < M)\big) \\
&\ge P(\mathrm{ER}_n(M) \in \mathscr{P}) - P\big(|\mathrm{ER}_n(p_+)| < M\big).
\end{aligned} \tag{2.14}$$

As for the upper bound, we obtain the following inequality:

$$\begin{aligned}
P(\mathrm{ER}_n(p_-) \in \mathscr{P}) &= P\big(\mathrm{ER}_n(p_-) \in \mathscr{P},\, |\mathrm{ER}_n(p_-)| \le M\big) + P\big(\mathrm{ER}_n(p_-) \in \mathscr{P},\, |\mathrm{ER}_n(p_-)| > M\big) \\
&\le P(\mathrm{ER}_n(M) \in \mathscr{P}) + P\big(|\mathrm{ER}_n(p_-)| > M\big).
\end{aligned} \tag{2.15}$$

From (2.14) and (2.15) we see that

$$P(\mathrm{ER}_n(M) \in \mathscr{P}) \ge P(\mathrm{ER}_n(p_-) \in \mathscr{P}) - P\big(|\mathrm{ER}_n(p_-)| > M\big), \tag{2.16}$$

and

$$P(\mathrm{ER}_n(M) \in \mathscr{P}) \le P(\mathrm{ER}_n(p_+) \in \mathscr{P}) + P\big(|\mathrm{ER}_n(p_+)| < M\big). \tag{2.17}$$

Since $|\mathrm{ER}_n(p_-)| \sim \mathrm{Bin}(N, p_-)$, its expectation equals $Np_-$ and its variance can be bounded as follows:

$$Np_-(1-p_-) \le Np_0\left(1 - p_0 + C\sqrt{\frac{p_0 q_0}{N}}\right) = \frac{M}{N}\left(N - Np_0 + C\sqrt{p_0 q_0 N}\right) = Np_0 q_0 + p_0 C\sqrt{p_0 q_0 N} \le Np_0 q_0 + C\sqrt{p_0 q_0 N}. \tag{2.18}$$

From Chebyshev's inequality, and noticing that $Np_0q_0 \ge \frac{1}{2}$ (when we exclude the trivial cases M = 0 and M = N) and that $M - Np_- = Np_0 - Np_-$, we get

$$\begin{aligned}
P\big(|\mathrm{ER}_n(p_-)| > M\big) &\le P\Big(\big|\,|\mathrm{ER}_n(p_-)| - Np_-\,\big| > Np_0 - Np_-\Big) \\
&\le \frac{Np_-(1-p_-)}{(Np_0 - Np_-)^2} \le \frac{Np_0 q_0 + C\sqrt{p_0 q_0 N}}{C^2\, Np_0 q_0} \\
&= \frac{1}{C^2} + \frac{1}{C}\,(Np_0 q_0)^{-\frac{1}{2}} \le \frac{1}{C^2} + \frac{\sqrt{2}}{C} =: \delta(C).
\end{aligned} \tag{2.19}$$

Then, since by assumption $P(\mathrm{ER}_n(p_-) \in \mathscr{P})$ and $P(\mathrm{ER}_n(p_+) \in \mathscr{P})$ both converge to a, taking limits in (2.16) and (2.17) gives

$$a - \delta(C) \le \liminf_{n\to\infty} P(\mathrm{ER}_n(M) \in \mathscr{P}) \le \limsup_{n\to\infty} P(\mathrm{ER}_n(M) \in \mathscr{P}) \le a + \delta(C). \tag{2.20}$$

Since δ(C) → 0 as C → ∞, we get the desired result.


2.3 Threshold functions

A peculiar thing happens to the asymptotic properties of random graphs. When the edge probability varies, sudden discontinuous changes occur in certain graph properties. One can see it as the random graph version of William Shakespeare's "To be, or not to be". If the number of edges M or the edge probability p increases slower than a certain function, called a threshold function, the random graph will, with high probability, not have the property in question. Conversely, if they increase faster than the threshold function, the random graph will have the property of interest with high probability. We now give the definition of threshold functions for monotone increasing properties.

Definition 2.1 (Frieze & Karonski [13]). A function p* = p*(n) is called a threshold function for a monotone increasing property 𝒫 in the random graph ER_n(p) if

$$\lim_{n\to\infty} P(\mathrm{ER}_n(p) \in \mathscr{P}) = \begin{cases} 0 & \text{if } p/p^* \to 0, \\ 1 & \text{if } p/p^* \to \infty. \end{cases} \tag{2.21}$$

Definition 2.2 (Frieze & Karonski [13]). A function M* = M*(n) is called a threshold function for a monotone increasing property 𝒫 in the random graph ER_n(M) if

$$\lim_{n\to\infty} P(\mathrm{ER}_n(M) \in \mathscr{P}) = \begin{cases} 0 & \text{if } M/M^* \to 0, \\ 1 & \text{if } M/M^* \to \infty. \end{cases} \tag{2.22}$$

We will treat "threshold function" and "threshold" as synonyms. The next theorem is important, since a large part of the theory of random graphs is concerned with finding threshold functions for certain graph properties. (We write f(n) ≪ g(n) to indicate that lim_{n→∞} f(n)/g(n) = 0, and f(n) ≫ g(n) if lim_{n→∞} f(n)/g(n) = ∞.) This theorem is due to A. Frieze [13].

Theorem 2.2 (Frieze & Karonski [13]). Every monotone increasing property has a threshold.

Proof. Let ε ∈ (0, 1) and define p(ε) by

$$P\big(\mathrm{ER}_n(p(\varepsilon)) \in \mathscr{P}\big) = \varepsilon. \tag{2.23}$$

This is possible because $P(\mathrm{ER}_n(p) \in \mathscr{P})$ is a monotone increasing polynomial in p, which follows from Theorem 2.1. We will show that p(1/2) is a threshold for 𝒫. Let G_1, ..., G_k be independent copies of ER_n(p); then $\bigcup_{i=1}^k G_i$ is distributed like $\mathrm{ER}_n\big(1 - (1-p)^k\big)$. By Bernoulli's inequality we have $1 - (1-p)^k \le kp$. Then,

$$P(\mathrm{ER}_n(kp) \notin \mathscr{P}) \le P(\mathrm{ER}_n(p) \notin \mathscr{P})^k. \tag{2.24}$$


Let ω = ω(n) be a function such that ω → ∞, for instance ω = log(log(n)). Suppose that p* = p(1/2); then by the above, with k = ω,

$$P(\mathrm{ER}_n(\omega p^*) \notin \mathscr{P}) \le P(\mathrm{ER}_n(p^*) \notin \mathscr{P})^{\omega} = 2^{-\omega} = o(1). \tag{2.25}$$

This means that if p ≫ p* then $\lim_{n\to\infty} P(\mathrm{ER}_n(p) \in \mathscr{P}) = 1$. If p = p*/ω, then

$$\frac{1}{2} = P(\mathrm{ER}_n(p^*) \notin \mathscr{P}) = P\Big(\mathrm{ER}_n\Big(\omega\,\frac{p^*}{\omega}\Big) \notin \mathscr{P}\Big) \le P\Big(\mathrm{ER}_n\Big(\frac{p^*}{\omega}\Big) \notin \mathscr{P}\Big)^{\omega}. \tag{2.26}$$

Thus

$$P\Big(\mathrm{ER}_n\Big(\frac{p^*}{\omega}\Big) \notin \mathscr{P}\Big) \ge 2^{-\frac{1}{\omega}}. \tag{2.27}$$

Since $2^{-1/\omega} \to 1$, it follows that for p ≪ p* we have $\lim_{n\to\infty} P(\mathrm{ER}_n(p) \in \mathscr{P}) = 0$.

The following lemma, see [13], gives an example of a threshold, for the property of having at least one edge. Note that this property is monotone increasing.

Lemma 2.2 (Frieze & Karonski [13]). The function $p^*(n) = \frac{1}{n^2}$ is a threshold for the property of having at least one edge in the random graph ER_n(p).

Proof. Let X be the random variable counting the edges, so that $X \sim \mathrm{Bin}\big(\binom{n}{2}, p\big)$. By Markov's inequality we get

$$P(X > 0) \le E(X) = \binom{n}{2} p \le n^2 p. \tag{2.28}$$

Thus if $p \ll \frac{1}{n^2}$ then P(X > 0) → 0. By the second moment method we have

$$P(X > 0) \ge 1 - \frac{\mathrm{Var}(X)}{E(X)^2}. \tag{2.29}$$

If $p \gg \frac{1}{n^2}$, then E(X) → ∞ and thus $\frac{\mathrm{Var}(X)}{E(X)^2} = \frac{1-p}{E(X)} \to 0$. Thus P(X > 0) → 1.
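Since $P(X > 0) = 1 - (1-p)^N$ exactly, the threshold behaviour of Lemma 2.2 can be observed numerically without any simulation. A small sketch (the choice n = 1000 and the multipliers 0.01 and 100 are ours):

```python
from math import comb

def prob_at_least_one_edge(n, p):
    """P(ER_n(p) has at least one edge) = 1 - (1-p)^N with N = C(n, 2)."""
    return 1 - (1 - p) ** comb(n, 2)

n = 1000  # threshold p* = 1/n^2 = 1e-6
below = prob_at_least_one_edge(n, 0.01 / n**2)  # p << p*
above = prob_at_least_one_edge(n, 100 / n**2)   # p >> p*
print(below, above)  # close to 0 and close to 1, respectively
```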

It is worth mentioning that thresholds are not unique: if $M^*_1$ is a threshold, then $M^*_2$ is also a threshold if and only if $M^*_1 = O(M^*_2)$ and $M^*_2 = O(M^*_1)$. There is a stronger phenomenon that occurs in random graphs, showing a sharper transition around the threshold. This is called a sharp threshold and it is defined as follows.

Definition 2.3 (Frieze & Karonski [13]). A function p* = p*(n) is called a sharp threshold for a monotone increasing property 𝒫 in the random graph ER_n(p) if, for every ε > 0,

$$\lim_{n\to\infty} P(\mathrm{ER}_n(p) \in \mathscr{P}) = \begin{cases} 0 & \text{if } p \le (1-\varepsilon)p^*, \\ 1 & \text{if } p \ge (1+\varepsilon)p^*. \end{cases} \tag{2.30}$$


Definition 2.4 (Frieze & Karonski [13]). A function M* = M*(n) is called a sharp threshold for a monotone increasing property 𝒫 in the random graph ER_n(M) if, for every ε > 0,

$$\lim_{n\to\infty} P(\mathrm{ER}_n(M) \in \mathscr{P}) = \begin{cases} 0 & \text{if } M \le (1-\varepsilon)M^*, \\ 1 & \text{if } M \ge (1+\varepsilon)M^*. \end{cases} \tag{2.31}$$

An example of a sharp threshold is the threshold $p^*(n) = \frac{\log(n)}{n}$ for connectedness of the random graph ER_n(p) (see section 4.2).
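This sharp threshold is easy to observe in a Monte Carlo simulation (a sketch under our own parameter choices n = 200 and multipliers 2.0 and 0.3; connectivity is checked with a union-find structure):

```python
import math
import random
from itertools import combinations

def er_connected(n, p, rng):
    """Sample ER_n(p) and test connectivity with union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = n
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                components -= 1
    return components == 1

rng = random.Random(0)
n, trials = 200, 100
p_star = math.log(n) / n  # the sharp threshold for connectedness

frac_above = sum(er_connected(n, 2.0 * p_star, rng) for _ in range(trials)) / trials
frac_below = sum(er_connected(n, 0.3 * p_star, rng) for _ in range(trials)) / trials
print(frac_above, frac_below)  # close to 1 and close to 0, respectively
```

Already at n = 200 the two sides of the threshold behave very differently, which is the finite-size shadow of the limit statement in Definition 2.3.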

2.4 Other random graph models

In this section we will elaborate on some random graph models that will not be studied in this thesis, but are worth mentioning to widen the context.

2.4.1 Random intersection graphs

Random intersection graphs were introduced by Singer-Cohen [28]. The model describes a graph in which a set of items is assigned to each vertex and two vertices are connected by an edge if and only if they share at least a certain number of items. As with the Erdos-Renyi random graph models, there are uniform and binomial random s-intersection graphs.
In a uniform random s-intersection graph with n vertices, each vertex selects l = l(n) distinct items uniformly from the same item pool of m = m(n) distinct items. Any two vertices have an edge between them if and only if they share at least s items, where 1 ≤ s ≤ l ≤ m. This model is denoted by G_s(n, l, m).
In a binomial random s-intersection graph with n vertices, each item from a pool of m = m(n) distinct items is assigned to each vertex independently with probability p = p(n). Any two vertices have an edge between them if and only if they share at least s items. This model is denoted by H_s(n, p, m) and is called binomial since the number of items assigned to each vertex follows a binomial distribution with parameters m and p.
The random intersection graph model has a number of applications, for example in modelling secure wireless communications and social networks [31].
Michael Behrish [3] describes in his paper the phase transition for the binomial random 1-intersection graph. Let the number of elements in the pool be $m = n^{\alpha}$ for some $\alpha \in (0, \infty) \setminus \{1\}$ and let the expected degree be $c = np^2m$. For c > 1, let η be the unique solution of $\eta = e^{c(\eta - 1)}$ in (0, 1). Denote the size of the largest connected component in H_1(n, p, m) by H_max. Then

(i) H_max ≤ (9/(1−c)²) log(n) for α > 1 and c < 1;


(ii) H_max ≤ (1 + o(1))(1 − η)n for α > 1 and c > 1.

Behrish also describes other results for different criteria of α and c, but those will not be discussed here. The reason we only show these two results is that they resemble the results described in sections 4.3 and 4.5. In section 4.3 we show that the size of the largest connected component is of order n, which is called the giant component; this resembles the result of (ii). In section 4.5 we show that the size of the largest connected component is of order log(n), which resembles (i). Thus the random intersection graph model exhibits a phase transition when α > 1 and c passes through one.

The case α = 1 is treated by Lageras and Lindholm [23]. They present it for a more general case. Let m = βn and p = γn^{−(1+α)/2}. Letting µ = βγ² be the asymptotic expected degree results in the following:

(i) if µ < 1, then there is no connected component of size larger than O(log(n)), with high probability;

(ii) if µ > 1, then 0 < η < 1 and there exists a unique giant component of size (1 − η + o_P(1))n with high probability, and the size of the second largest connected component is with high probability no larger than O(log(n)).

So there is a phase transition when the expected degree passes through one.
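The two regimes above can be observed in a small simulation. The sketch below is our own illustration (the parameter choices and helper functions are not from [3]): it samples H_1(n, p, m) with m = n^α, picks p so that the expected degree c = np²m takes a sub- or supercritical value, and compares the largest connected component in the two cases.

```python
import math
import random

def sample_items(m, p, rng):
    """Indices of a Bernoulli(p) subset of {0, ..., m-1}, drawn via
    geometric skips so the cost is O(mp) instead of O(m)."""
    items, i, log_q = [], -1, math.log(1.0 - p)
    while True:
        i += int(math.log(rng.random()) / log_q) + 1
        if i >= m:
            return items
        items.append(i)

def largest_component_H1(n, m, p, rng):
    """Largest connected component of H_1(n, p, m): vertices sharing
    at least one item are adjacent; components found via union-find."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    holders = {}                      # item -> first vertex seen holding it
    for v in range(n):
        for item in sample_items(m, p, rng):
            if item in holders:
                parent[find(v)] = find(holders[item])
            else:
                holders[item] = v
    counts = {}
    for v in range(n):
        root = find(v)
        counts[root] = counts.get(root, 0) + 1
    return max(counts.values())

rng = random.Random(0)
n, alpha = 2000, 1.5
m = int(n ** alpha)

def p_for(c):
    # solve c = n * p**2 * m for p
    return math.sqrt(c / (n * m))

h_sub = largest_component_H1(n, m, p_for(0.5), rng)   # c < 1: small clusters
h_sup = largest_component_H1(n, m, p_for(2.0), rng)   # c > 1: giant component
```

With c = 2 the giant component should contain roughly a (1 − η)-fraction of the vertices, while for c = 0.5 the largest cluster stays of logarithmic order.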

2.4.2 Random graphs with hidden variables

In this model one starts by creating a large number n of vertices. Every vertex v is assigned a fitness x_v, which is a real number measuring the importance of the vertex. Fitnesses are random numbers drawn from a given probability distribution. For any two vertices v and w, an edge exists between them with probability f(x_v, x_w), depending on the importance of the vertices. The model is a generalization of the Erdos-Renyi random graphs, which arise when f(x_v, x_w) = p for all vertices.

Let us consider a special case of a random graph with hidden variables. Consider a graph with 2n vertices and assign to each vertex a fitness that takes values in {0, 1}, i.e. for every vertex v the fitness is x_v ∈ {0, 1}. Assign to n vertices the fitness 0 and to the other n vertices the fitness 1. Let the probability of having an edge between two vertices v and w be defined as

\[
f(x_v, x_w) =
\begin{cases}
p, & \text{if } x_v = x_w, \\
q, & \text{if } x_v \neq x_w.
\end{cases}
\]

Then the fitness represents the "community" in which a vertex resides, and the probability of two vertices being connected is p if both vertices are in the same "community" and q if one of the two resides in another "community". This example is called the planted bisection model and is much studied in problems concerning community detection, i.e.


given a graph, can we recover the fitness of the vertices and thus detect the communities? The planted bisection model and its generalization, the stochastic block model, which consists of several communities, are discussed in chapter 5.
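Sampling from this model is straightforward. The following sketch is our own illustration (the values p = 0.10 and q = 0.01 are arbitrary): it draws a planted bisection graph on 2n vertices and checks that a vertex has, on average, many more neighbours inside its own community than across.

```python
import random

def planted_bisection(n, p, q, rng):
    """Adjacency lists for the planted bisection model on 2n vertices:
    vertices 0..n-1 have fitness 0, vertices n..2n-1 have fitness 1;
    an edge appears with probability p within and q across communities."""
    adj = [[] for _ in range(2 * n)]
    for v in range(2 * n):
        for w in range(v + 1, 2 * n):
            same_community = (v < n) == (w < n)
            if rng.random() < (p if same_community else q):
                adj[v].append(w)
                adj[w].append(v)
    return adj

rng = random.Random(1)
n = 300
adj = planted_bisection(n, 0.10, 0.01, rng)

# mean number of within- and across-community neighbours per vertex:
# roughly (n-1)p within versus nq across
within = sum(sum((w < n) == (v < n) for w in adj[v])
             for v in range(2 * n)) / (2 * n)
across = sum(sum((w < n) != (v < n) for w in adj[v])
             for v in range(2 * n)) / (2 * n)
```

This gap between within- and across-community degrees is exactly what community detection algorithms try to exploit.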

The random graphs with hidden variables have an interesting property that can be related to real networks. Unlike the Erdos-Renyi random graphs, which have Poisson degree distributions, real networks have degree distributions that follow a power law: if we denote the degree distribution by p_k, then p_k ∼ k^{−γ}, where γ is called the degree exponent. Networks whose degree distributions follow a power law are called scale-free networks [2]. One way of generating a scale-free network is the Barabasi-Albert model. In this model two assumptions are made:

i) Growth: at each timestep a vertex is added with m edges.

ii) Preferential attachment: newly added vertices prefer to connect to vertices withhigh degree.

The preferential attachment model produces a graph sequence (PA_t^{m,δ})_{t≥0}, and at each time t yields a graph with t vertices and mt edges, see van der Hofstad [29]. We define the evolution of the model only for the case m = 1 and refer to van der Hofstad [29] for the case m > 1. Denote the vertices of PA_t^{1,δ} by v_1^{(1)}, . . . , v_t^{(1)} and the degree of v_i^{(1)} by D_i(t). At time t = 1, the graph PA_1^{1,δ} consists of a single vertex with a self-loop. Given the graph PA_t^{1,δ}, the growth rule is as follows. Add a single vertex v_{t+1}^{(1)} with a single edge. The edge is connected either to v_{t+1}^{(1)} through a self-loop or to a vertex v_i^{(1)} that resides in PA_t^{1,δ}, with probability

\[
\mathbb{P}\big(v_{t+1}^{(1)} \to v_i^{(1)}\big) =
\begin{cases}
\dfrac{1+\delta}{t(2+\delta)+(1+\delta)}, & \text{if } i = t+1, \\[8pt]
\dfrac{D_i(t)+\delta}{t(2+\delta)+(1+\delta)}, & \text{if } i \in [t],
\end{cases}
\]

where δ ≥ −1 is a parameter of the model, added to make it more general. Therefore, in the preferential attachment model, newly added edges are more likely to connect to vertices with a large degree. The case δ = 0 is the Barabasi-Albert model [2].

The random graph with hidden variables can be used to produce a scale-free network without growth or preferential attachment assumptions. This is possible by choosing a fitness distribution that follows a power law. It is also possible if the fitness is exponentially distributed and f(x_i, x_j) = θ(x_i + x_j − z(n)), with θ the Heaviside function and z(n) a given threshold; then p_k ∼ k^{−2}, see [15] for more details.
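The growth rule for m = 1 translates directly into code. The sketch below is our own simulation (attachment is sampled by a linear scan over the degrees, so the runtime is O(T²), which is fine for small T): it grows PA_t^{1,δ} and illustrates that after t steps the total degree is exactly 2t, since every step adds one edge.

```python
import random

def preferential_attachment(T, delta, rng):
    """Grow PA_t^{1,delta} up to t = T vertices; return the degree list.
    At t = 1 the graph is a single vertex with a self-loop (degree 2).
    Vertex t+1 attaches to itself w.p. (1+delta)/(t(2+delta)+(1+delta))
    and to vertex i w.p. (D_i(t)+delta)/(t(2+delta)+(1+delta))."""
    deg = [2]                         # the initial self-loop
    for t in range(1, T):
        r = rng.uniform(0.0, t * (2 + delta) + (1 + delta))
        if r < 1 + delta:
            deg.append(2)             # self-loop at the new vertex
        else:
            r -= 1 + delta
            for i in range(t):        # scan: picks i prop. to D_i(t) + delta
                r -= deg[i] + delta
                if r < 0:
                    deg[i] += 1
                    break
            else:                     # guard against floating-point round-off
                deg[t - 1] += 1
            deg.append(1)
    return deg

rng = random.Random(3)
T = 2000
deg = preferential_attachment(T, 0.0, rng)   # delta = 0: Barabasi-Albert
```

In the resulting degree sequence the maximum degree is far above the mean of 2, reflecting the heavy (power-law) tail of the degree distribution.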


2.4.3 Random geometric graphs

This model was introduced by Gilbert [17] and describes a graph whose vertices are randomly distributed on R^d and where the edge between two vertices depends on the distance between them.

Let ‖·‖ be a norm on R^d and let r ∈ R_+. Let X_1, X_2, . . . be a sequence of i.i.d. random variables on R^d with some probability density function f, and let X_n = {X_1, . . . , X_n}. Then the geometric random graph G(X_n, r) is defined as the graph with vertex set X_n in which an edge between two vertices X_i and X_j exists if ‖X_i − X_j‖ ≤ r, see [26].

The probabilities of properties of G(X_n, r) are difficult to compute, except for small n; therefore the properties are determined asymptotically [26]. In other words, take a sequence r_n and consider the graph G(X_n, r_n). The random geometric graph exhibits a phase transition that involves the limit n r_n^d → λ, where λ is a positive constant and n r_n^d is approximately the expected degree. In physics this limit is called the thermodynamic limit: it describes the behaviour of n points in a large region of volume proportional to n, letting n grow with a fixed range r. Let λ_c be the critical threshold. Then, if λ < λ_c, the size of the largest connected component is of order log(n). This regime is called the subcritical regime and λ is called the subcritical thermodynamic limit. If λ > λ_c, the size of the largest connected component is of order n. This regime is called the supercritical regime and λ is called the supercritical thermodynamic limit.
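The two regimes are easy to see in a simulation. The sketch below is our own illustration: in d = 2 with the Euclidean norm and uniform points in the unit square, the mean degree is roughly nπr², so we choose r to put that mean degree well below or well above the critical value.

```python
import math
import random

def rgg_largest_component(n, r, rng):
    """Largest component of a random geometric graph on n uniform
    points in the unit square, connecting pairs at Euclidean distance
    <= r (brute-force O(n^2) scan, components via union-find)."""
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    r2 = r * r
    for i in range(n):
        xi, yi = pts[i]
        for j in range(i + 1, n):
            dx, dy = xi - pts[j][0], yi - pts[j][1]
            if dx * dx + dy * dy <= r2:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
    counts = {}
    for v in range(n):
        root = find(v)
        counts[root] = counts.get(root, 0) + 1
    return max(counts.values())

rng = random.Random(2)
n = 1500
# radius chosen so that the mean degree n*pi*r^2 is roughly `deg`
sub = rgg_largest_component(n, math.sqrt(1.0 / (math.pi * n)), rng)
sup = rgg_largest_component(n, math.sqrt(10.0 / (math.pi * n)), rng)
```

With mean degree 1 the largest cluster stays small, while with mean degree 10 a component containing a large fraction of all points appears.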


3 Branching process

The idea of the branching process started nearly 145 years ago when Galton and Watson [14] addressed the problem of the extinction of families. Galton was not willing to accept the hypothesis that distinguished families are more likely to perish with time. Therefore, the first step he took to study this hypothesis was to determine the probability that an ordinary family dies out. He formulated the problem of extinction as follows:

”Let p0, p1, p2, . . . be the respective probabilities that a man has 0, 1, 2, . . . sons,let each son have the same probability for sons of his own, and so on. Whatis the probability that the male line is extinct after r generations, and moregenerally what is the probability for any given number of descendants in themale line in any given generation?”

By studying the statistics on the reproductive rates of English peers, Galton came to the conclusion that a factor which played an important role is that of heiresses. The peers tended to marry heiresses, who came from families without sons and by inheritance had a lower expected male offspring. This historical note, and more on the branching process, is treated by Harris [19].

In this chapter we will discuss the Galton-Watson branching process and provide some properties of the Poisson branching process. We will only mention the properties that we will need for the random graphs in chapter 4.

In section 3.1 we state the definition of the Galton-Watson branching process and the main result on the phase transition of a branching process. In section 3.2 we provide a random walk view on branching processes, which is an essential tool for many proofs in chapter 4. In section 3.3 we provide some results concerning the Poisson branching process and its relationship with the binomial branching process. The results in this chapter are mainly based on van der Hofstad [29], unless otherwise stated.


3.1 Galton-Watson branching process

Consider objects that can produce additional objects of the same kind, for instance humans or bacteria. Let the initial set of objects be the 0-th generation, their children the first generation, and so on. Let the total number of children in the n-th generation be denoted by Z_n for n ∈ N_0. We make the following two assumptions:

(i) the number of offspring Zn depends only on the previous generation Zn−1, and

(ii) the objects reproduce independently from one another.

Furthermore, we will always assume that Z_0 = 1. Let X_{n,i} denote the number of children in the n-th generation that are descendants of individual i in the (n − 1)-th generation. Then the number of offspring in the n-th generation is

\[
Z_n = \sum_{i=1}^{Z_{n-1}} X_{n,i}, \tag{3.1}
\]

with {X_{n,i}}_{n,i≥1} a doubly infinite sequence of i.i.d. random variables. We will write X for the offspring distribution of the branching process, which means X_{n,i} ∼ X for all n, i ∈ N.

Figure 3.1: An example of a branching process with two generations. Here Z_0 = 1, Z_1 = 3, Z_2 = 6, and letting vertex 2 be the first individual in the first generation gives X_{2,1} = 3.

Let us consider the question that was asked by Galton: what is the probability of extinction? If the family of objects dies out, then there is a generation with no members in it, which means that Z_n = 0 for some n ∈ N. Therefore, by extinction we mean that the sequence {Z_n}_{n∈N} consists of zeros for all but a finite number of values of n. This leads to the following definition.

Definition 3.1 (van der Hofstad [29]). The probability of extinction of a branchingprocess is

η = P (∃n ∈ N : Zn = 0) . (3.2)

The branching process goes through a phase transition when the expectation of the offspring distribution passes through one. If E(X) is smaller than or equal to one, then the


population dies out with probability one (unless X = 1 with probability 1). If it is greater than one, then there is a positive probability that the population survives for ever. The following theorem, see [29], is the main result on the phase transition for a branching process.

Theorem 3.1 (van der Hofstad [29]). For a branching process with offspring distributionX we have that,

(i) if E(X) < 1, then η = 1;

(ii) if E(X) > 1, then η < 1, and

(iii) if E(X) = 1 and P(X = 1) < 1 then η = 1.

Furthermore, the extinction probability η is the smallest solution in [0, 1] of

η = GX(η), (3.3)

where G_X is the probability generating function (pgf) of the offspring distribution: G_X(s) = E(s^X).

The idea behind the proof is that the probability generating function of Z_n is the n-th iteration of the probability generating function of the offspring distribution X. With this we can show that the extinction probability is a solution of s = G_X(s). The statements in (i)-(iii) follow from properties of probability generating functions, since E(X) = G′_X(1). Proposition 3.1 is based on Harris [19].

Proposition 3.1 (Harris [19]). The probability generating function of Zn is the n-thiteration of the probability generating function of the offspring distribution X.

Proof. Let G_{Z_n} denote the pgf of Z_n. Note that since Z_1 = X_{1,1} ∼ X we have G_{Z_1} = G_X. For a random variable Y, denote the n-th iteration of its pgf by G_Y^n, i.e. G_Y^n(s) = G_Y(G_Y^{n−1}(s)). Then

\[
G_{Z_{n+1}}(s) = \sum_{k=0}^{\infty} \mathbb{P}(Z_{n+1}=k)\,s^k
= \sum_{i=0}^{\infty}\sum_{k=0}^{\infty} \mathbb{P}(Z_{n+1}=k \mid Z_n=i)\,\mathbb{P}(Z_n=i)\,s^k
= \sum_{i=0}^{\infty} \mathbb{P}(Z_n=i)\,G_X(s)^i = G_{Z_n}(G_X(s)),
\tag{3.4}
\]

where we have used that the conditional distribution of Z_{n+1} given Z_n = i equals that of a sum of i independent random variables distributed like X. Since the pgf of a sum of independent random variables is the product of their pgfs, this explains the term G_X(s)^i in (3.4). For n = 1 we have G_{Z_1}(s) = G_X(s) = G_X^1(s). Suppose that G_{Z_n}(s) = G_X^n(s) for some n. Then by (3.4)

\[
G_{Z_{n+1}}(s) = G_{Z_n}(G_X(s)) = G_X^n(G_X(s)) = G_X^{n+1}(s). \tag{3.5}
\]

Thus we conclude that G_{Z_n}(s) = G_X^n(s) for all n ∈ N.


Proof of Theorem 3.1. Let η_n = P(Z_n = 0). Because {Z_n = 0} ⊆ {Z_{n+1} = 0}, we have

\[
\eta = \mathbb{P}(\exists n \in \mathbb{N} : Z_n = 0)
= \mathbb{P}\Big(\bigcup_{n=0}^{\infty}\{Z_n = 0\}\Big)
= \lim_{n\to\infty} \mathbb{P}(Z_n = 0) = \lim_{n\to\infty}\eta_n.
\tag{3.6}
\]

Also, since η_n = P(Z_n = 0) = G_{Z_n}(0) and the pgf is continuous, proposition 3.1 gives

\[
\eta = \lim_{n\to\infty}\eta_n = \lim_{n\to\infty} G_{Z_n}(0)
= \lim_{n\to\infty} G_X\big(G_X^{n-1}(0)\big)
= \lim_{n\to\infty} G_X\big(G_{Z_{n-1}}(0)\big) = \lim_{n\to\infty} G_X(\eta_{n-1}) = G_X(\eta).
\tag{3.7}
\]

We can now conclude that the extinction probability is a solution of s = G_X(s). It is sufficient to prove the theorem for the case 0 < P(X ≤ 1) < 1, because if P(X ≥ 1) = 1 we have Z_n ≥ 1 a.s. and thus the population lives for ever. Also, if P(X ≤ 1) = 1 and p = P(X = 0) > 0, then P(Z_n = 0) = 1 − (1 − p)^n → 1 and thus the population dies out.

We will now show that η is the smallest solution of s = G_X(s) in [0, 1]. This will be done by induction on η_n. First of all, suppose that ψ is a solution of s = G_X(s) in [0, 1]. It is obvious that η_0 = 0 ≤ ψ. Suppose, by induction, that η_n ≤ ψ. Since s ↦ G_X(s) is an increasing function, we have

\[
\eta_{n+1} = G_{Z_{n+1}}(0) = G_X(G_{Z_n}(0)) = G_X(\eta_n) \le G_X(\psi) = \psi, \tag{3.8}
\]

and since η_n is a monotone increasing sequence with limit η, taking the limit with respect to n gives η ≤ ψ. Therefore, we conclude that the extinction probability is the smallest solution in [0, 1] of s = G_X(s).

Now, since P(X ≤ 1) < 1, we have G″_X(s) > 0, and thus s ↦ G_X(s) is a strictly increasing and strictly convex function. This means that there are at most two solutions of s = G_X(s) in [0, 1]. The value s = 1 is always a solution. Because G_X(0) > 0, we have that if G′_X(1) = E(X) < 1, the only solution is s = 1 and thus η = 1. If E(X) > 1, then there are two solutions and thus η < 1. If E(X) = 1 there is only one solution, unless G_X(s) = s for all s, which is equivalent to P(X = 1) = 1. Thus, if E(X) = 1 and P(X = 1) < 1, the only solution is η = 1.

The branching process will be called subcritical if E(X) < 1, critical if E(X) = 1 and supercritical if E(X) > 1. Another important probability that involves the survival of the population is the survival probability, denoted by ζ = 1 − η. It is the probability that the population survives for ever,

\[
\zeta = \mathbb{P}(Z_n > 0 \ \forall n \in \mathbb{N}_0). \tag{3.9}
\]
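Theorem 3.1 can be checked numerically. The sketch below is our own illustration for a Poisson(λ) offspring distribution, whose pgf is G(s) = e^{λ(s−1)}: it estimates ζ by simulating the process until extinction or until the generation size reaches a cap (from a population that large, dying out has probability η^cap, which is negligible), and compares the estimate with 1 − η obtained by iterating η ↦ e^{λ(η−1)}, which converges to the smallest fixed point of (3.3).

```python
import math
import random

def poisson(lam, rng):
    """Knuth's Poisson sampler (fine for small lam)."""
    limit, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1

def survives(lam, rng, cap=200):
    """One Galton-Watson run with Poisson(lam) offspring: iterate the
    generation sizes until extinction (Z_n = 0) or until Z_n >= cap,
    at which point survival is declared."""
    z = 1
    while 0 < z < cap:
        z = sum(poisson(lam, rng) for _ in range(z))
    return z >= cap

lam = 1.5
# smallest fixed point of eta = exp(lam*(eta-1)), by iteration from 0
eta = 0.0
for _ in range(200):
    eta = math.exp(lam * (eta - 1.0))
zeta_exact = 1.0 - eta

rng = random.Random(5)
trials = 2000
zeta_sim = sum(survives(lam, rng) for _ in range(trials)) / trials
```

For λ = 1.5 the fixed point gives ζ ≈ 0.58, and the Monte Carlo estimate agrees up to sampling error.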


Figure 3.2: An example of tree exploration: at time 0 vertex 1 is set active and is given the vertices 2, 3 and 4, which are set to active (colour gray). Then vertex 1 is set to inactive (black) and X_1 = 3. At time 2 the active vertex 2 is chosen and the vertices 5, 6 and 7 are set to active. Vertex 2 is set to inactive and X_2 = 3. The vertices 3 and 4 are still active and the process continues. Therefore, the number of active vertices after 2 explorations is S_2 = 3 + 3 − (2 − 1) = 5.

From now on we will study the properties of the total progeny T of the branching process. This is defined as

\[
T = \sum_{n=0}^{\infty} Z_n. \tag{3.10}
\]

3.2 Random walk on branching processes

What we have seen of the branching process is that it captures the information about the population by investigating the number of children in a particular generation, i.e. Z_n. But when we investigate the phase transition in random graphs in chapter 4, it will be more convenient to explore the branching process in a different way. This will be done by investigating the number of children of each member of the population, which leads to a random walk formulation of the branching process. Let {X_i}_{i∈N} be a sequence of i.i.d. random variables with X_i ∼ X, the offspring distribution of the branching process. Define the sequence {S_i}_{i∈N_0} by

\[
S_0 = 1, \qquad S_i = S_{i-1} + X_i - 1 = X_1 + \cdots + X_i - (i-1). \tag{3.11}
\]

Also, let T be defined as

\[
T = \inf\{t : S_t = 0\} = \inf\{t : X_1 + \cdots + X_t = t-1\}. \tag{3.12}
\]

The population starts with one active individual. At time i, one of the active individuals is selected and is given X_i children. If there are children, they are set to active, and the individual (parent) is set to inactive. We continue like this until there are no more active individuals in the population. This exploration process can be performed either in a breadth-first or a depth-first order. In figure 3.2 we describe a breadth-first order, thus choosing the active individuals in the same generation. The process S_i then describes the number of active individuals after the first i explorations. For branching processes, the recursion (3.11) only makes sense when i ≤ T, because then S_i ≥ 0.
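The random-walk formulation gives an immediate way to sample the total progeny T. The sketch below is our own illustration: it runs the walk (3.11)-(3.12) with Poisson(λ) offspring in the subcritical case λ = 0.8 and checks the sample mean of T against the standard value E(T) = 1/(1 − E(X)) for a subcritical process.

```python
import math
import random

def poisson(lam, rng):
    """Knuth's Poisson sampler (fine for small lam)."""
    limit, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1

def total_progeny(lam, rng, cap=100_000):
    """Total progeny via the random walk (3.11)-(3.12):
    S_0 = 1, S_i = S_{i-1} + X_i - 1, and T is the first i with S_i = 0.
    Returns cap if the walk has not hit 0 after cap steps."""
    s = 1
    for i in range(1, cap + 1):
        s += poisson(lam, rng) - 1
        if s == 0:
            return i
    return cap

lam = 0.8                      # subcritical: E(T) = 1/(1 - lam) = 5
rng = random.Random(6)
trials = 3000
avg_T = sum(total_progeny(lam, rng) for _ in range(trials)) / trials
```

Exactly this walk reappears in chapter 4 as the exploration process of a connected component.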


The following theorem, see [29], bounds the probability that a supercritical branching process becomes extinct with a total progeny at least as large as a given size.

Theorem 3.2 (van der Hofstad [29]). For a branching process with offspring distribution X with µ = E(X) > 1,

\[
\mathbb{P}(k \le T < \infty) \le \frac{e^{-kI}}{1 - e^{-I}}, \tag{3.13}
\]

with I given by

\[
I = \sup_{t \le 0}\Big(t - \log\big(\mathbb{E}(e^{tX})\big)\Big) > 0. \tag{3.14}
\]

Proof. Large deviation theory provides bounds on the probability that a sum of independent random variables is larger or smaller than its expectation. For a sequence {Y_i}_{i∈N} of i.i.d. random variables, it holds for all a ≤ E(Y_1) that

\[
\mathbb{P}\Big(\sum_{i=1}^{n} Y_i \le na\Big) \le e^{-nI(a)},
\qquad\text{where}\qquad
I(a) = \sup_{t \le 0}\Big(ta - \log\big(\mathbb{E}(e^{tY_1})\big)\Big).
\]

The function a ↦ I(a) is called the rate function. Since X_i ∼ X and E(X) > 1, the expression in (3.14) is this rate function at a = 1: if T = t, then S_t = 0, which implies that X_1 + · · · + X_t = t − 1 ≤ t. Therefore, by letting a = 1 and using the large deviation bound, we obtain

\[
\mathbb{P}(k \le T < \infty) \le \sum_{t=k}^{\infty} \mathbb{P}(S_t = 0)
\le \sum_{t=k}^{\infty} \mathbb{P}(X_1 + \cdots + X_t \le t)
\le \sum_{t=k}^{\infty} e^{-tI} = \frac{e^{-kI}}{1 - e^{-I}}. \tag{3.15}
\]

3.3 Poisson branching processes

In this section we will study the branching process with a Poisson offspring distribution, which will be denoted by X*. Denote the distribution of a Poisson branching process with mean λ by P*_λ and its total progeny by T*. As we have seen in theorem 3.1, the extinction probability of a branching process is the smallest fixed point of the probability generating function. Therefore, by letting η_λ denote the extinction probability of a Poisson branching process with mean λ, we get that

\[
\eta_\lambda = G_\lambda(\eta_\lambda) = e^{\lambda(\eta_\lambda - 1)}. \tag{3.16}
\]

As discussed by van der Hofstad [29], one can show that, conditionally on extinction, a Poisson branching process is again Poisson but now with mean µ_λ = λη_λ. From this it follows that µ_λ e^{−µ_λ} = λe^{−λ}. This leads to the following definition: when µe^{−µ} = λe^{−λ} for µ < 1 < λ, we call µ and λ conjugate pairs. Since x ↦ xe^{−x} has a global maximum at x = 1, the equation xe^{−x} = λe^{−λ} with λ > 1 has exactly two solutions, see figure 3.3.


The trivial solution is x = λ and the other one is x = µ_λ < 1. This means that λ and µ_λ are conjugate pairs, and this relation will be seen in the differentiability of the survival probability, see theorem 3.5.
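The conjugate µ_λ is easy to compute numerically. The sketch below is our own illustration: it finds η_λ by fixed-point iteration of (3.16) and verifies the defining relation µ_λ e^{−µ_λ} = λe^{−λ} for λ = 3.

```python
import math

def extinction_prob(lam, iters=500):
    """Smallest solution of eta = exp(lam*(eta-1)) in [0, 1],
    by fixed-point iteration started from 0 (equation (3.16))."""
    eta = 0.0
    for _ in range(iters):
        eta = math.exp(lam * (eta - 1.0))
    return eta

lam = 3.0
eta = extinction_prob(lam)
mu = lam * eta                 # the conjugate mu_lambda = lam * eta_lambda
```

The identity holds exactly at the fixed point: µe^{−µ} = λη e^{−λη} = λ e^{λ(η−1)} e^{−λη} = λe^{−λ}.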

Figure 3.3: Plot of the function f(x) = xe^{−x} (blue line) and the level f(λ) for λ = 3 (red dotted line).

The law of the total progeny T* is described in the next theorem, see [29].

Theorem 3.3 (van der Hofstad [29]). For a Poisson branching process with mean λ,

\[
\mathbb{P}^*_\lambda(T^* = n) = \frac{(\lambda n)^{n-1}}{n!}\, e^{-\lambda n}. \tag{3.17}
\]

Proof. This proof is based on another theorem, which states that the law of the total progeny of a branching process with offspring distribution X is

\[
\mathbb{P}(T = n) = \frac{1}{n}\,\mathbb{P}(X_1 + \cdots + X_n = n-1). \tag{3.18}
\]

See theorem 3.13 of van der Hofstad [29] for the proof of this result. Since our offspring distribution is Poisson with mean λ and the sum of n independent Poisson random variables is again Poisson with mean λn, using (3.18) we get

\[
\mathbb{P}^*_\lambda(T^* = n) = \frac{1}{n}\cdot\frac{(\lambda n)^{n-1}}{(n-1)!}\,e^{-\lambda n} = \frac{(\lambda n)^{n-1}}{n!}\,e^{-\lambda n}. \tag{3.19}
\]
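The law (3.17) is known as the Borel distribution. As a sanity check (our own illustration; the pmf is evaluated in log-space to avoid overflow), its mass sums to 1 for λ ≤ 1, while for λ > 1 the total mass over finite n is the extinction probability η_λ, since then T* = ∞ with probability ζ_λ.

```python
import math

def borel_pmf(lam, n):
    """P*_lam(T* = n) = (lam*n)**(n-1) * exp(-lam*n) / n!  (Theorem 3.3),
    evaluated in log-space for numerical stability."""
    log_p = (n - 1) * math.log(lam * n) - lam * n - math.lgamma(n + 1)
    return math.exp(log_p)

lam = 0.5
total = sum(borel_pmf(lam, n) for n in range(1, 400))
p1 = borel_pmf(lam, 1)   # a tree of size 1: the root has no children
```

For n = 1 the formula reduces to e^{−λ}, the probability that the root has no children, and for subcritical λ the partial sums converge rapidly to 1 because of the exponential tail e^{−I_λ n}.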

Theorem 3.3 can be used to prove Cayley's formula, which states that the number of labelled trees on n vertices equals n^{n−2}, see [29]. The following corollary follows from theorem 3.2, see [29].


Corollary 3.1 (van der Hofstad [29]). For a Poisson branching process with mean λ,

\[
\mathbb{P}^*_\lambda(k \le T^* < \infty) \le \frac{e^{-kI_\lambda}}{1 - e^{-I_\lambda}}, \tag{3.20}
\]

where I_λ = λ − 1 − log(λ) is the large deviation rate function for independent Poisson random variables.

The following theorem, see [29], describes the asymptotics of the probability mass function of the total progeny of a Poisson branching process.

Theorem 3.4 (van der Hofstad [29]). For a Poisson branching process with mean λ,

\[
\mathbb{P}^*_\lambda(T^* = n) = \frac{1}{\lambda\sqrt{2\pi n^3}}\, e^{-I_\lambda n}\Big(1 + O\big(\tfrac{1}{n}\big)\Big), \tag{3.21}
\]

where

\[
I_\lambda = \lambda - 1 - \log(\lambda). \tag{3.22}
\]

Proof. For λ = 1 we have by theorem 3.3 that

\[
\mathbb{P}^*_1(T^* = n) = \frac{n^{n-1}}{n!}\,e^{-n}. \tag{3.23}
\]

By Stirling's approximation, n! = \sqrt{2\pi n}\, e^{-n} n^n (1 + O(n^{-1})), equation (3.23) becomes

\[
\mathbb{P}^*_1(T^* = n) = \frac{n^{n-1}}{n^n\sqrt{2\pi n}}\, e^{n} e^{-n}\,\big(1 + O(n^{-1})\big) = \frac{1}{\sqrt{2\pi n^3}}\big(1 + O(n^{-1})\big). \tag{3.24}
\]

By noting that

\[
\mathbb{P}^*_\lambda(T^* = n) = \frac{1}{\lambda}\,\lambda^n e^{-(\lambda-1)n}\,\mathbb{P}^*_1(T^* = n) = \frac{1}{\lambda}\,e^{-I_\lambda n}\,\mathbb{P}^*_1(T^* = n), \tag{3.25}
\]

inserting (3.24) into (3.25) yields the desired result.

Theorem 3.5 (van der Hofstad [29]). Let η_λ denote the extinction probability of a Poisson branching process with mean λ and ζ_λ = 1 − η_λ its survival probability. Then for all λ > 1,

\[
\frac{d}{d\lambda}\zeta_\lambda = \frac{\zeta_\lambda(1-\zeta_\lambda)}{1-\mu_\lambda}, \tag{3.26}
\]

where µ_λ is the conjugate of λ.

Proof. From equation (3.16) we know that η_λ = e^{λ(η_λ−1)}. Consider the following function:

\[
f : (0,1) \to (1,\infty), \qquad f(x) = \frac{\log(x)}{x-1}. \tag{3.27}
\]

This function is strictly decreasing with lim_{x↓0} f(x) = ∞ and lim_{x↑1} f(x) = 1. This implies that f is a bijection from (0, 1) to (1, ∞) and it is continuously differentiable in


its domain. Thus its inverse is also continuously differentiable. Since f(η_λ) = λ, we have by the inverse function theorem that

\[
\frac{d}{d\lambda}\eta_\lambda = \frac{d}{d\lambda}f^{-1}(\lambda) = \frac{1}{\frac{d}{dx}\big|_{x=\eta_\lambda} f(x)}. \tag{3.28}
\]

The derivative of f at the point η_λ is

\[
\frac{d}{dx}\Big|_{x=\eta_\lambda} f(x) = -\frac{1-\lambda\eta_\lambda}{\eta_\lambda(1-\eta_\lambda)}. \tag{3.29}
\]

Combining this with (3.28) gives us that

\[
\frac{d}{d\lambda}\eta_\lambda = \frac{d}{d\lambda}f^{-1}(\lambda) = -\frac{\eta_\lambda(1-\eta_\lambda)}{1-\lambda\eta_\lambda}. \tag{3.30}
\]

This results in

\[
\frac{d}{d\lambda}\zeta_\lambda = \frac{d}{d\lambda}(1-\eta_\lambda) = \frac{\eta_\lambda(1-\eta_\lambda)}{1-\lambda\eta_\lambda} = \frac{\zeta_\lambda(1-\zeta_\lambda)}{1-\mu_\lambda}, \tag{3.31}
\]

where µ_λ = λη_λ is the conjugate of λ.

Denote the distribution of a binomial branching process with parameters n, p by P_{n,p}. The binomial distribution Bin(n, p) converges weakly to the Poisson distribution with parameter λ = np. For branching processes this has a consequence, formulated in the following theorem, see [29].

Theorem 3.6 (van der Hofstad [29]). For a binomial branching process with parameters n, p and the Poisson branching process with mean λ = np, it holds that for each k ≥ 1,

\[
\mathbb{P}_{n,p}(T \ge k) = \mathbb{P}^*_\lambda(T^* \ge k) + e_n(k), \tag{3.32}
\]

where

\[
|e_n(k)| \le \frac{\lambda^2}{n}\sum_{s=3}^{k-1}\mathbb{P}^*_\lambda(T^* \ge s). \tag{3.33}
\]

In particular, |e_n(k)| ≤ kλ²/n.

This theorem will not be proven here; we refer to theorem 3.20 of van der Hofstad [29] for the proof.


4 Phase transitions

Growing Erdos-Renyi random graphs exhibit a remarkable change in the size of their connected components as a function of the edge probability. As revealed by Erdos and Renyi in one of the founding papers of random graph theory, "On the evolution of random graphs" [7], the largest connected component of ER_n(p) is of order log(n) when p < 1/n, while for p > 1/n its size is of order n. Technically, Erdos and Renyi proved this for the uniform model, but the statement holds by the equivalence of the two models.

The connected components can be described in terms of branching processes. As we have seen in theorem 3.1, branching processes have a phase transition: when the expected offspring is smaller than one the population dies out, and when it is bigger than one there is a positive probability that it survives for ever. Random graphs also undergo a phase transition. When the expected degree is smaller than one, the connected components are small, while if the expected degree is bigger than one, there is a giant component that contains a positive proportion of all vertices.

In section 4.1 a relationship between branching processes and random graphs is given, which is essential for the proofs. In section 4.2 the threshold for connectedness is given, where the expected degree is near the logarithm of the size of the graph. In section 4.3 we discuss the behaviour of the largest connected component near the critical value. In section 4.4 the emergence of the giant connected component is investigated, and in section 4.5 the regime of "small" connected components is discussed.

4.1 Relation between branching processes and random graphs

The connected component containing vertex v (or cluster of v) will be denoted by C(v) and consists of all vertices that can be reached from v. For two vertices v, w ∈ [n] we write v ←→ w if there is a path between v and w, and we will assume that v ←→ v. From this we can define the connected component C(v) by

\[
\mathcal{C}(v) = \{x \in [n] : v \longleftrightarrow x\}. \tag{4.1}
\]


The size of C(v) will be denoted by |C(v)|, which is the number of vertices connected to v. We are primarily interested in the size of the largest connected component C_max. This is equal to a cluster C(v) of maximal size, i.e.

\[
|\mathcal{C}_{\max}| = \max_{v \in [n]} |\mathcal{C}(v)|. \tag{4.2}
\]

For any graph G there is a procedure to find the connected component C(v) of a given vertex v. In this exploration, vertices can have three states: active, neutral or inactive. The states of the vertices change during the exploration: at time t = 0, let v be active and all other vertices neutral. Then at each time t ≥ 1 we choose an arbitrary active vertex w and explore the edges ww′ for all w′ that are neutral. If ww′ ∈ E(G), then w′ becomes active; otherwise it remains neutral. After completing the search for neutral neighbours, w becomes inactive, and we let S_t be the new number of active vertices at time t. When S_t = 0 for the first time, there are no more active vertices and C(v) is the set of all inactive vertices, which implies |C(v)| = t. Let X_t be the number of vertices that became active due to the exploration of the t-th active vertex. Then

\[
S_0 = 1, \qquad S_t = S_{t-1} + X_t - 1 = X_1 + \cdots + X_t - (t-1). \tag{4.3}
\]

This is true for any graph, so we now focus on the Erdos-Renyi random graph ER_n(p). After t − 1 explorations of active vertices there are t − 1 inactive vertices and S_{t−1} active vertices. Thus there are in total n − (t − 1) − S_{t−1} neutral vertices. Denote the number of neutral vertices at time t by N_t = n − t − S_t. Then, conditionally on S_{t−1},

\[
X_t \sim \mathrm{Bin}(N_{t-1}, p). \tag{4.4}
\]
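The exploration above can be implemented directly, revealing each potential edge only the first time it is examined: since a pair is only ever tested while one endpoint is neutral, each edge is examined exactly once, with probability p, exactly as in the model. The following sketch is our own illustration; it records all component sizes of ER_n(λ/n) and contrasts λ = 0.5 with λ = 1.5.

```python
import random

def component_sizes(n, lam, rng):
    """All component sizes of ER_n(lam/n), found by the
    active/neutral/inactive exploration of section 4.1.  When an
    active vertex is explored, every still-neutral vertex becomes
    active independently with probability p = lam/n."""
    p = lam / n
    neutral = set(range(n))
    sizes = []
    while neutral:
        active = [neutral.pop()]     # start exploring a new component
        size = 0
        while active:
            active.pop()             # explore one active vertex
            size += 1
            found = [u for u in neutral if rng.random() < p]
            neutral.difference_update(found)
            active.extend(found)
        sizes.append(size)
    return sizes

rng = random.Random(7)
n = 2000
sub = component_sizes(n, 0.5, rng)   # subcritical: all clusters small
sup = component_sizes(n, 1.5, rng)   # supercritical: one giant cluster
```

In the subcritical run every component is of logarithmic order, while the supercritical run contains a single component holding a positive fraction of the vertices, in line with sections 4.3-4.5.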

The law of St is described in the following proposition, see [29].

Proposition 4.1 (van der Hofstad [29]). For all t ∈ [n], with S_t defined as above, we have that

\[
S_t \sim \mathrm{Bin}\big(n-1, 1-(1-p)^t\big) - (t-1). \tag{4.5}
\]

Proof. By (4.4) we have that, conditionally on S_{t−1},

\[
N_t = n - t - S_t = n - t - S_{t-1} - X_t + 1 = N_{t-1} - X_t \sim \mathrm{Bin}(N_{t-1}, 1-p). \tag{4.6}
\]

By recursion on t we get N_t ∼ Bin(n − 1, (1 − p)^t). Since X ∼ Bin(m, p) precisely when Y = m − X ∼ Bin(m, 1 − p), we get

\[
S_t + (t-1) = (n-1) - N_t \sim \mathrm{Bin}\big(n-1, 1-(1-p)^t\big), \tag{4.7}
\]

and thus the result follows.


From now on we use the convention p = λ/n. Note that, for n large enough, λ is close to the expected degree, which is (n − 1)p. We are now going to describe the relation between connected components in ER_n(λ/n) and the binomial branching process. This relationship will be used to prove results concerning the largest connected component C_max. The following theorem shows that the size of a connected component is stochastically dominated by the total progeny of a binomial branching process, see [29]. We denote the law of ER_n(p) by P_{np} and that of the binomial branching process by P_{n,p}.

Theorem 4.1 (van der Hofstad [29]). For each k ≥ 1,

\[
\mathbb{P}_{np}(|\mathcal{C}(v)| \ge k) \le \mathbb{P}_{n,p}(\hat{T} \ge k), \tag{4.8}
\]

with T̂ the total progeny of a binomial branching process with parameters n, p.

Proof. As we have seen, conditionally on N_{i−1} we have X_i ∼ Bin(N_{i−1}, p). Let Y_i be a random variable such that, conditionally on N_{i−1}, it is independent of X_i with Y_i ∼ Bin(n − N_{i−1}, p). Define X̂_i by

\[
\hat{X}_i = X_i + Y_i. \tag{4.9}
\]

Since N_{i−1} = n − (i − 1) − S_{i−1} = n − X_1 − · · · − X_{i−1} − 1, we have that, conditionally on (X_j)_{j=1}^{i−1}, X̂_i ∼ Bin(n, p), and the sequence (X̂_j)_{j≥1} is i.i.d. Since Y_i ≥ 0 a.s., we get X̂_i ≥ X_i a.s. Let Ŝ_i = X̂_1 + · · · + X̂_i − (i − 1). Then

\[
\mathbb{P}_{np}(|\mathcal{C}(v)| \ge k) = \mathbb{P}(S_t > 0 \ \forall t \le k-1)
\le \mathbb{P}(\hat{S}_t > 0 \ \forall t \le k-1) = \mathbb{P}_{n,p}(\hat{T} \ge k), \tag{4.10}
\]

where T̂ = min{t : Ŝ_t = 0} is the total progeny of a binomial branching process with parameters n, p.

We can also provide a lower bound for the tail of the size of connected components in terms of the total progeny of a binomial branching process. Note that this is not a stochastic lower bound, because it depends on k.

Theorem 4.2 (van der Hofstad [29]). For each k ∈ [n],

\[
\mathbb{P}_{np}(|\mathcal{C}(v)| \ge k) \ge \mathbb{P}_{n-k,p}(T' \ge k), \tag{4.11}
\]

with T′ the total progeny of a binomial branching process with parameters n − k and p = λ/n, see [29].

Proof. Let T_k denote the stopping time T_k = min{t : N_t ≤ n − k}. If |C(v)| ≥ k, then N_{k−1} = n − (k − 1) − S_{k−1} ≤ n − (k − 1) − 1 = n − k, and thus T_k ≤ k − 1. This implies that

\[
\mathbb{P}_{np}(|\mathcal{C}(v)| \ge k) = \mathbb{P}(S_t > 0 \ \forall t \le T_k). \tag{4.12}
\]


Let (X′_i)_{i≥1} be an i.i.d. sequence of Bin(n − k, p) random variables. For i ≤ T_k and conditionally on N_{i−1}, let Y_i ∼ Bin(N_{i−1} − (n − k), p) be independent of all other random variables. Define

\[
X_i = X'_i + Y_i. \tag{4.13}
\]

Then we have that X_i ≥ X′_i a.s. and X_i ∼ Bin(N_{i−1}, p). Let S′_i = X′_1 + · · · + X′_i − (i − 1). Then we get

\[
\mathbb{P}(S_t > 0 \ \forall t \le T_k) \ge \mathbb{P}(S'_t > 0 \ \forall t \le T_k) \ge \mathbb{P}(S'_t > 0 \ \forall t \le k-1) = \mathbb{P}(T' \ge k), \tag{4.14}
\]

where T′ = min{t : S′_t = 0} is the total progeny of a binomial branching process with parameters n − k and p. Combining (4.12) with (4.14) proves the theorem.

Now that we can compare the tails of the sizes of connected components with branching processes, we can investigate the behaviour of the largest connected component in the subcritical regime, λ = np < 1, and the supercritical regime, λ > 1. The general idea is to use the bounds of theorems 4.1 and 4.2 in order to compare the cluster sizes to binomial branching processes with parameters n and λ/n. By the main theorem of branching processes, theorem 3.1, the behaviour is different when the expected offspring is larger than 1 or smaller than 1. Furthermore, we can compare the cluster sizes to Poisson branching processes with parameters close to λ. Then we can use the results on branching processes to complete the proofs.
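The exploration process underlying these comparisons can be sketched in a few lines of code. The following simulation is illustrative only (it does not appear in the thesis; the function name and defaults are ours): it explores the cluster of a vertex in ER_n(p) exactly as in the proofs above, keeping a set of unexplored (neutral) vertices and a stack of active vertices, so that the number of active vertices after t steps plays the role of S_t.

```python
import random

def cluster_size(n, p, v=0, rng=random):
    """Size of the connected component of v in ER_n(p), found by the
    exploration process: pop one active vertex, draw its edges to the
    unexplored vertices, and activate the newly found neighbours."""
    unexplored = set(range(n)) - {v}  # neutral vertices
    active = [v]                      # |active| plays the role of S_t
    size = 0
    while active:
        active.pop()
        size += 1
        found = {w for w in unexplored if rng.random() < p}
        unexplored -= found
        active.extend(found)
    return size
```

Dominating this exploration by a branching process with a fixed Bin(n, p) offspring distribution, as in theorem 4.1, amounts to never removing discovered vertices from the unexplored set.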

4.2 Connectivity threshold

We can imagine that if we increase the value of p to one, the random graph becomes connected, while if we decrease it to zero, it becomes a disconnected graph with isolated vertices. So it is interesting to ask for which value of p the graph makes the transition from disconnected to connected. This transition happens at the sharp threshold p = log(n)/n, as described in theorem 4.3, see [29]. Note that in van der Hofstad [29] the result is divided into two theorems, but here we combine them in an attempt to be as comprehensive as possible. We will now investigate the connectivity threshold for ER_n(λ/n) for an appropriate choice λ = λ_n → ∞.

Theorem 4.3 (van der Hofstad [29]). Let t be a constant. Then

    P_λ(ER_n(λ/n) is connected) →
        0,            if λ − log(n) → −∞,
        e^{−e^{−t}},  if λ − log(n) → t,
        1,            if λ − log(n) → ∞.    (4.15)


Let Y = Σ_{v∈[n]} 1_{|C(v)|=1} be the total number of isolated vertices. If Y ≥ 1 then there exists at least one isolated vertex and thus the graph is disconnected. Also, as we will see in proposition 4.3, see [29], when Y = 0 the random graph is connected with high probability. Thus the connectivity threshold is the threshold for the disappearance of isolated vertices.

To prove theorem 4.3 for the case λ − log(n) → −∞, we make use of Chebyshev's inequality to show that the probability that there are no isolated vertices goes to zero. Then, by using proposition 4.3, which states that the probability that the random graph is connected can be bounded by the probability that there are no isolated vertices, the result follows. For the case λ − log(n) → ∞, we use Markov's inequality to show that the probability that there are no isolated vertices tends to one, and using proposition 4.3 again, we conclude that the probability of connectedness goes to one. As for the case λ − log(n) → t, we will show that Y converges to a Poisson random variable with parameter e^{−t}. To be able to use Chebyshev's inequality we will first need bounds on the expectation and variance of Y, see [29].
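The role of isolated vertices near the threshold can be illustrated numerically. The following Monte Carlo sketch is ours, not part of the thesis (the parameters n = 200, t = 0.5 and the helper `isolated_vertices` are chosen purely for illustration): sampling ER_n(λ/n) at λ = log(n) + t, the fraction of samples with Y = 0 should be roughly e^{−e^{−t}} for large n.

```python
import math
import random

def isolated_vertices(n, lam, rng):
    """Sample ER_n(lam/n) and return the number of isolated vertices Y."""
    p = lam / n
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                degree[i] += 1
                degree[j] += 1
    return sum(d == 0 for d in degree)

rng = random.Random(1)
n, t = 200, 0.5
lam = math.log(n) + t
trials = 100
frac = sum(isolated_vertices(n, lam, rng) == 0 for _ in range(trials)) / trials
print(frac, math.exp(-math.exp(-t)))  # the two values should be comparable
```

For moderate n the finite-size correction of proposition 4.2 is still visible, so the agreement is only rough.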

Proposition 4.2 (van der Hofstad [29]). For every λ ≤ n/2 with λ ≥ 1/2,

    E_λ(Y) = n e^{−λ}(1 + O(λ²/n)),    (4.16)

and for every λ ≤ n,

    Var_λ(Y) ≤ E_λ(Y) + (λ/(n − λ)) E_λ(Y)².    (4.17)

Proof. By the definition of Y we get

    E_λ(Y) = n P_λ(|C(v)| = 1) = n(1 − λ/n)^{n−1} ≤ n e^{−λ} e^{λ/n},    (4.18)

and

    E_λ(Y) = n P_λ(|C(v)| = 1) ≥ n e^{−(n−1)(λ/n)(1+λ/n)} ≥ n e^{−λ(1+λ/n)} = n e^{−λ} e^{−λ²/n}.    (4.19)

Since λ ≥ 1/2 we have that O(λ/n) = O(λ²/n).

Now we will prove inequality (4.17). If λ = n then p = 1 and there is nothing to prove, so we assume that λ < n. By the exchangeability of the vertices we get

    E_λ(Y²) = Σ_{i,j∈[n]} P_λ(|C(i)| = 1, |C(j)| = 1)
            = Σ_{i∈[n]} P_λ(|C(i)| = 1) + Σ_{i∈[n]} Σ_{j≠i} P_λ(|C(i)| = 1, |C(j)| = 1)
            = n P_λ(|C(v)| = 1) + n(n − 1) P_λ(|C(v)| = 1, |C(w)| = 1).    (4.20)


Thus

    Var_λ(Y) = n(P_λ(|C(v)| = 1) − P_λ(|C(v)| = 1, |C(w)| = 1)) + n²(P_λ(|C(v)| = 1, |C(w)| = 1) − P_λ(|C(v)| = 1)²).    (4.21)

The first term of (4.21) can be bounded by the expectation of Y. As for the second term, note that

    P_λ(|C(v)| = 1, |C(w)| = 1) = (1 − λ/n)^{2n−3}.    (4.22)

This leads to

    P_λ(|C(v)| = 1, |C(w)| = 1) − P_λ(|C(v)| = 1)² = (1 − λ/n)^{2n−3} − (1 − λ/n)^{2n−2}
        = (1 − λ/n)^{2n−2} ((1 − λ/n)^{−1} − 1)
        = P_λ(|C(v)| = 1)² ((1 − λ/n)^{−1} − 1)
        = (λ/(n(1 − λ/n))) P_λ(|C(v)| = 1)².    (4.23)

Thus

    Var_λ(Y) ≤ E_λ(Y) + n² (λ/(n(1 − λ/n))) P_λ(|C(v)| = 1)² ≤ E_λ(Y) + (λ/(n − λ)) E_λ(Y)².    (4.24)

Proposition 4.3 (van der Hofstad [29]). For all 0 ≤ λ ≤ n and n ≥ 2,

    P_λ(ER_n(λ/n) is connected) ≤ P_λ(Y = 0).    (4.25)

If there exists an a > 1/2 such that λ ≥ a log(n), then for n → ∞,

    P_λ(ER_n(λ/n) is connected) = P_λ(Y = 0) + o(1).    (4.26)

Proof. As for the first inequality, if ER_n(λ/n) is connected then there are no isolated vertices, so Y = 0, and the first inequality is obvious. As for (4.26), we will need to perform some computations that involve trees: let X_k be the number of trees of size k that cannot be extended to a tree of larger size. Note that when ER_n(λ/n) is disconnected and Y = 0, there must be a k ∈ {2, ..., n/2} for which X_k ≥ 1. Note furthermore that

    P_λ(ER_n(λ/n) is connected) = P_λ(Y = 0) − P_λ(ER_n(λ/n) is disconnected, Y = 0).    (4.27)


Using Markov's inequality, the second term on the right-hand side of (4.27) becomes

    P_λ(ER_n(λ/n) is disconnected, Y = 0) ≤ P_λ(⋃_{k=2}^{n/2} {X_k ≥ 1}) ≤ Σ_{k=2}^{n/2} P_λ(X_k ≥ 1) ≤ Σ_{k=2}^{n/2} E_λ(X_k).    (4.28)

So we only have to bound E_λ(X_k). By Cayley's theorem we have that

    E_λ(X_k) = (n choose k) k^{k−2} (λ/n)^{k−1} (1 − λ/n)^{k(n−k)} ≤ (n λ^{k−1} k^{k−2}/k!) e^{−(λ/n)k(n−k)} ≤ n (eλ)^k (1/k²) e^{−(λ/n)k(n−k)}.    (4.29)

Let λ = a log(n) for some a > 1/2. Because k(n − k) ≥ kn/2, equation (4.29) becomes

    E_λ(X_k) ≤ n(eλ e^{−λ/2})^k ≤ n^{1−k/4} (ea log(n))^k,    (4.30)

and therefore, when k ≥ 5, we have that E_λ(X_k) = o(1). As for k ∈ {2, 3, 4}, we have that

    E_λ(X_k) ≤ n(eλ)^4 e^{−λk} e^{o(1)} = o(1).    (4.31)

Thus, combining (4.30) and (4.31) into (4.28) leads to

    P_λ(ER_n(λ/n) is disconnected, Y = 0) ≤ Σ_{k=2}^{n/2} E_λ(X_k) = o(1),    (4.32)

and therefore we can conclude that

    P_λ(ER_n(λ/n) is connected) = P_λ(Y = 0) + o(1).    (4.33)

Proof of theorem 4.3. Suppose that λ − log(n) → −∞. Then, for n sufficiently large, λ < log(n) ≤ n/2. By proposition 4.2 it follows that

    E_λ(Y) = n e^{−λ}(1 + o(1)) = e^{−λ+log(n)}(1 + o(1)) → ∞.    (4.34)

By Chebyshev's inequality,

    P_λ(Y = 0) ≤ Var_λ(Y)/E_λ(Y)² ≤ (E_λ(Y) + (λ/(n − λ)) E_λ(Y)²)/E_λ(Y)² = E_λ(Y)^{−1} + λ/(n − λ) → 0.    (4.35)


Then, by proposition 4.3, ER_n(λ/n) is disconnected with high probability.

When λ − log(n) → ∞, then (1/2)log(n) < λ for n sufficiently large. Let λ be such that λ ≤ 2log(n). Then, by Markov's inequality and proposition 4.2,

    P_λ(Y = 0) = 1 − P_λ(Y ≥ 1) ≥ 1 − E_λ(Y) ≥ 1 − n e^{−λ} O(1) → 1.    (4.36)

Thus, by proposition 4.3, ER_n(λ/n) is connected with high probability for (1/2)log(n) < λ ≤ 2log(n). Since connectedness is an increasing monotone property in λ, the graph ER_n(λ/n) is also connected for λ ≥ 2log(n) with high probability.

Now we will prove that if λ − log(n) → t, with t ∈ R, the probability that ER_n(λ/n) is connected is e^{−e^{−t}}(1 + o(1)). To prove this we will show that Y converges in distribution to a Poisson random variable Z with parameter

    lim_{n→∞} E_λ(Y) = lim_{n→∞} e^{−λ+log(n)}(1 + o(1)) = e^{−t}.    (4.37)

This implies that

    P_λ(Y = 0) = ((e^{−t})^0/0!) e^{−e^{−t}}(1 + o(1)) = e^{−e^{−t}}(1 + o(1)),    (4.38)

and the result follows from proposition 4.3.

To show that Y →d Z it suffices to prove that for all r ≥ 1,

    lim_{n→∞} E_λ((Y)_r) = lim_{n→∞} Σ*_{i_1,...,i_r} P_λ(I_{i_1} = ... = I_{i_r} = 1) = e^{−tr},    (4.39)

where I_i = 1_{|C(i)|=1} and Σ*_{i_1,...,i_r} denotes a sum over distinct indices, see van der Hofstad [29]. By exchangeability of the vertices, P_λ(I_{i_1} = ... = I_{i_r} = 1) = P_λ(I_1 = ... = I_r = 1). Since there are n!/(n − r)! distinct choices for i_1, ..., i_r ∈ [n], we have that

    E_λ((Y)_r) = (n!/(n − r)!) P_λ(I_1 = ... = I_r = 1)
        = (n!/(n − r)!) (1 − λ/n)^{(r choose 2) + r(n−r)}
        = (n!/(n − r)!) (1 − λ/n)^{nr} (1 − λ/n)^{−r(r+1)/2}
        = (n!/(n − r)!) n^{−r} E_λ(Y)^r (1 + o(1)),    (4.40)

where we have used that E_λ(Y) = n(1 − λ/n)^{n−1}. Thus

    lim_{n→∞} E_λ((Y)_r) = lim_{n→∞} (n!/(n − r)!) n^{−r} E_λ(Y)^r = e^{−tr}.    (4.41)

Thus we can conclude that Y →d Z, and by proposition 4.3,

    P_λ(ER_n(λ/n) is connected) = e^{−e^{−t}}(1 + o(1)).    (4.42)


At this point we would like to present the argument used by Erdős and Rényi for proving theorem 4.3. In their paper "On Random Graphs I", [5], they deal with some interesting questions, one of which is the following: "What is the probability of Γ_{n,N} being completely connected?". Here Γ_{n,N} is the uniform random graph model with n vertices and N edges, and thus it is the same as ER_n(N) as defined in chapter 2. In said paper it is noted that N_c = ⌊(1/2) n log(n) + nc⌋ for some c ∈ R, and the question is what the probability is that ER_n(N_c) is connected. The answer is that, as n → ∞, the probability of ER_n(N_c) being connected converges to e^{−e^{−2c}}. For the proof they use what they called a surprising lemma, which we will now describe. Let a graph ER_n(N_c) be of type A if it consists of a connected graph having n − k vertices together with k isolated vertices, for some k ∈ ℕ_0. Any graph which is not of type A is of type Ā.

Surprising Lemma 1 (Erdős–Rényi [5]). Let P(Ā, n, N_c) denote the probability of Γ_{n,N_c} being of type Ā. Then we have

    lim_{n→∞} P(Ā, n, N_c) = 0.

Thus for large n, Γ_{n,N_c} is of type A with high probability, see [5].

As mentioned by Karoński and Ruciński [21], there is a harmless error in the proof of this theorem as provided by Erdős and Rényi, which was pointed out by Godehardt and Steinbach [18]. Here we only give a sketch of the proof that the probability of Γ_{n,N_c} being connected is e^{−e^{−2c}}. Let X_{n,N_c} denote the number of connected graphs G_{n,N_c} with n vertices and N_c edges. Then X_{n,N_c} equals the number of graphs of type A that do not have isolated vertices, i.e. k = 0. Let us denote the probability of having a graph of type A with k = 0 by P(A_0). Let X′_{n,N_c} denote the number of all graphs G_{n,N_c} that do not have isolated vertices, including those of type Ā. Let us denote the probability of having a graph with no isolated vertices by P(G_0). Erdős and Rényi showed that the probability of having a graph with no isolated vertices tends to e^{−e^{−2c}}, i.e.,

    P(G_0) → e^{−e^{−2c}} as n → ∞.

Since P(G_0) − P(A_0) is the probability of the graphs without isolated vertices that are not of type A, we get

    0 ≤ P(G_0) − P(A_0) ≤ P(Ā),

where P(Ā) is the probability that a graph is not of type A. By the surprising lemma, P(Ā) goes to zero, which means that P(A_0) → e^{−e^{−2c}} as n → ∞, and this proves the claim. If we let c = c_n → ±∞, we recover the corresponding cases of theorem 4.3. For more details we refer the reader to [5].


4.3 Supercritical regime

This regime is the most interesting one, because it contains a giant connected component which resembles a network. As we said in the introduction, the phase transition of random graphs resembles the transition of water to ice, or the onset of ferromagnetism: both go from a chaotic state to a well-structured state, and this structured state is similar to the giant connected component. The supercritical regime is also interesting because most real networks are in this regime, as we will discuss at the end of this section.

The main theorem in this section is theorem 4.4, see [29], which shows that for λ = np > 1 the largest connected component contains a positive fraction of the vertices, which implies that most of these vertices are in the same connected component, called the giant component, see figure 4.1. Fix λ > 1 and let ζ_λ = 1 − η_λ be the survival probability of a Poisson branching process with parameter λ.

Theorem 4.4 (van der Hofstad [29]). For any ν ∈ (1/2, 1) there exists δ = δ(ν, λ) > 0 such that

    P_λ(||C_max| − ζ_λ n| ≥ n^ν) = O(n^{−δ}).    (4.43)

Figure 4.1: Example of ER_n(p) with n = 300 and p = 1.1/300. In this graph, the size of the largest connected component is 63 (red component).
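The statement of theorem 4.4 is easy to observe numerically. The following simulation sketch is ours, not from the thesis (n = 2000, λ = 2 and the union-find helper are chosen for illustration): it measures |C_max|/n for a sample of ER_n(2/n) and compares it with ζ_2 ≈ 0.7968, the solution of ζ = 1 − e^{−2ζ}.

```python
import random
from collections import Counter

def largest_component(n, lam, rng):
    """|C_max| of one sample of ER_n(lam/n), via a small union-find."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    p = lam / n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                parent[find(i)] = find(j)
    return max(Counter(find(v) for v in range(n)).values())

rng = random.Random(0)
n = 2000
frac = largest_component(n, 2.0, rng) / n
print(frac)  # close to zeta_2 ~ 0.7968
```

Already at n = 2000 the observed fraction fluctuates only mildly around ζ_λ, in line with the n^ν concentration window of the theorem.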

Consider the following random variable:

    Z_{≥k} = Σ_{v∈[n]} 1_{|C(v)|≥k}.    (4.44)

This random variable counts the number of vertices whose connected component has size larger than or equal to a given value k ∈ [n]. The random variable Z_{≥k} is a tractable variable for which we have the necessary mathematical tools, unlike the largest connected component, which is a complex object. The beauty of the random variable Z_{≥k} is that it can be used to describe the largest connected component in the following way:

    |C_max| = max{k : Z_{≥k} ≥ k}.    (4.45)

Also we get

    {|C_max| ≥ k} = {Z_{≥k} ≥ k}.    (4.46)

Let k_n = K log(n) for some K > 0 sufficiently large, and let α < ζ_λ. The idea behind the proof of this theorem is that for 2α > ζ_λ, when there are no clusters of size between k_n and αn, we have Z_{≥k_n} = |C_max|. So it suffices to prove the theorem in terms of Z_{≥k_n}. For this, we show that E_λ(Z_{≥k_n}) = nζ_λ(1 + o(1)), and by using a new variance estimate on Z_{≥k_n} we will see that for all ν ∈ (1/2, 1), |Z_{≥k_n} − E_λ(Z_{≥k_n})| ≤ n^ν with high probability. As for clusters of size between k_n and αn, we will show that for any α < ζ_λ there exists J = J(α) such that for all n sufficiently large, the cluster-size probability is bounded by P_λ(k_n ≤ |C(v)| ≤ αn) ≤ e^{−k_n J}.
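Identity (4.45) is easy to sanity-check on a toy list of component sizes (a purely illustrative sketch; the function name is ours):

```python
def largest_from_z(sizes):
    """Recover |C_max| from the counts Z_{>=k} via |C_max| = max{k : Z_{>=k} >= k}.

    `sizes` is the list of connected-component sizes of some graph."""
    n = sum(sizes)
    def z(k):  # Z_{>=k}: number of vertices in components of size >= k
        return sum(s for s in sizes if s >= k)
    return max(k for k in range(1, n + 1) if z(k) >= k)
```

Every vertex of the largest component lies in a component of size at least |C_max|, so k = |C_max| always satisfies Z_{≥k} ≥ k, while any larger k fails.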

First we will show that the tail probability of the size of the connected components from k_n ≥ a log(n) onwards, with a > 1/I_λ, is close to the survival probability of a Poisson branching process with mean λ. The following proposition is due to van der Hofstad [29].

Proposition 4.4 (van der Hofstad [29]). For any k_n ≥ a log(n) with a > 1/I_λ and for n sufficiently large,

    P_λ(|C(v)| ≥ k_n) = ζ_λ + O(k_n/n).    (4.47)

Proof. By theorem 4.1, which gives an upper bound for the cluster tails, and by theorem 3.6, which lets us compare the binomial branching process to the Poisson branching process, we have, with λ = np,

    P_λ(|C(v)| ≥ k_n) ≤ P_{n,p}(T ≥ k_n) = P*_λ(T* ≥ k_n) + O(k_n/n).    (4.48)

Also, by theorem 3.2 we get

    P*_λ(T* ≥ k_n) = P*_λ(T* = ∞) + P*_λ(k_n ≤ T* < ∞) = ζ_λ + O(e^{−k_n I_λ}) = ζ_λ + O(n^{−a I_λ}) = ζ_λ + o(1/n),    (4.49)

because k_n ≥ a log(n) and a > 1/I_λ. Thus (4.48) together with (4.49) proves the upper bound

    P_λ(|C(v)| ≥ k_n) ≤ ζ_λ + o(1/n) + O(k_n/n).    (4.50)


As for the lower bound we use the same reasoning, only now with theorem 4.2:

    P_λ(|C(v)| ≥ k_n) ≥ P_{n−k_n,p}(T ≥ k_n) = P*_{λ_n}(T* ≥ k_n) + O(k_n/n),    (4.51)

where λ_n = λ(n − k_n)/n. By corollary 3.1 we get, in the same way as in (4.49), that

    P*_{λ_n}(T* ≥ k_n) = ζ_{λ_n} + O(e^{−k_n I_{λ_n}}) = ζ_{λ_n} + o(1/n).    (4.52)

By the mean-value theorem we have that

    ζ_{λ_n} = ζ_λ + (λ_n − λ) (d/dλ)ζ_λ |_{λ=λ*} = ζ_λ + O(k_n/n),    (4.53)

for some λ* ∈ (λ_n, λ). Substituting (4.53) into (4.52), we get that (4.51) becomes

    P_λ(|C(v)| ≥ k_n) ≥ ζ_λ + o(1/n) + O(k_n/n),    (4.54)

and this gives us the lower bound; the proposition is proved.

This proposition thus implies that, when k_n = K log(n) for some large K > 0,

    E_λ(Z_{≥k_n}) = nζ_λ(1 + o(1)).    (4.55)

The next step is to show that the number of vertices in large components is concentrated, see corollary 4.1. To show this we need a variance estimate on Z_{≥k}, see [29].

Proposition 4.5 (van der Hofstad [29]). For every n and k ∈ [n],

    Var_λ(Z_{≥k}) ≤ (λk + 1) n E_λ(|C(v)| 1_{|C(v)|<k}).    (4.56)

Proof. Since

    Z_{<k} = n − Z_{≥k},    (4.57)

we have that Var_λ(Z_{≥k}) = Var_λ(Z_{<k}), and thus we only need to prove that

    Var_λ(Z_{<k}) ≤ (λk + 1) n E_λ(|C(v)| 1_{|C(v)|<k}).

Splitting according to whether there is a path between the vertices or not, we can write the variance of Z_{<k} as

    Var_λ(Z_{<k}) = Σ_{i,j∈[n]} (P_λ(|C(i)| < k, |C(j)| < k, i ↮ j) − P_λ(|C(i)| < k)²) + Σ_{i,j∈[n]} P_λ(|C(i)| < k, |C(j)| < k, i ↔ j).    (4.58)


Then, since |C(i)| = |C(j)| when i ↔ j, we can compute the second term in (4.58) as

    Σ_{i,j∈[n]} P_λ(|C(i)| < k, |C(j)| < k, i ↔ j) = Σ_{i,j∈[n]} E_λ(1_{|C(i)|<k} 1_{i↔j}) = n E_λ(|C(v)| 1_{|C(v)|<k}).    (4.59)

To compute the first term in (4.58) we note that

    Σ_{l=1}^{k−1} P_λ(|C(i)| = l, |C(j)| < k, i ↮ j)
        = Σ_{l=1}^{k−1} P_λ(|C(j)| < k | |C(i)| = l, i ↮ j) P_λ(|C(i)| = l) P_λ(i ↮ j | |C(i)| = l)
        ≤ Σ_{l=1}^{k−1} P_λ(|C(j)| < k | |C(i)| = l, i ↮ j) P_λ(|C(i)| = l).    (4.60)

When |C(i)| = l but i ↮ j, the law of |C(j)| is identical to that of |C(1)| in a random graph with n − l vertices and edge probability p = λ/n. Then, denoting by P_{m,λ} the distribution of the random graph ER_m(λ/n), we get

    P_λ(|C(j)| < k | |C(i)| = l, i ↮ j) = P_{n−l,λ}(|C(j)| < k) = P_{n,λ}(|C(j)| < k) + (P_{n−l,λ}(|C(j)| < k) − P_{n,λ}(|C(j)| < k)).    (4.61)

We can couple the random graphs ER_{n−l}(λ/n) and ER_n(λ/n) by adding the vertices in [n] \ [n − l] to ER_{n−l}(λ/n) and by letting the edges between the newly added vertices and the other vertices be independent of each other, each present with probability p. Under this coupling, the difference P_{n−l,λ}(|C(j)| < k) − P_{n,λ}(|C(j)| < k) is the probability that |C(j)| < k in ER_{n−l}(p) but |C(j)| ≥ k in ER_n(p). This requires that at least one of the l vertices in [n] \ [n − l] is connected to one of the at most k vertices of C(j), and this probability is bounded by lkλ/n. Thus

    P_{np}(|C(j)| < k | |C(i)| = l, i ↮ j) − P_{np}(|C(j)| < k) ≤ lkλ/n.    (4.62)

Thus

    Σ_{i,j∈[n]} (P_λ(|C(i)| < k, |C(j)| < k, i ↮ j) − P_λ(|C(i)| < k)²)
        = Σ_{l=1}^{k−1} Σ_{i,j∈[n]} (P_λ(|C(i)| = l, |C(j)| < k, i ↮ j) − P_λ(|C(i)| = l) P_λ(|C(j)| < k))
        ≤ Σ_{l=1}^{k−1} Σ_{i,j∈[n]} P_λ(|C(i)| = l) (P_{np}(|C(j)| < k | |C(i)| = l, i ↮ j) − P_{np}(|C(j)| < k))
        ≤ Σ_{l=1}^{k−1} Σ_{i,j∈[n]} (λkl/n) P_λ(|C(i)| = l)
        = (λk/n) Σ_{i,j∈[n]} E_λ(|C(i)| 1_{|C(i)|<k}) = nkλ E_λ(|C(v)| 1_{|C(v)|<k}).    (4.63)


Now we can show that the number of vertices in large components is concentrated. The following corollary is due to van der Hofstad [29].

Corollary 4.1 (van der Hofstad [29]). Fix k_n = K log(n) and ν ∈ (1/2, 1). Then for K sufficiently large and every δ < 2ν − 1, as n → ∞,

    P_λ(|Z_{≥k_n} − nζ_λ| > n^ν) = O(n^{−δ}).    (4.64)

Proof. By proposition 4.4 we have that E_λ(Z_{≥k_n}) = nζ_λ + O(k_n) = nζ_λ + o(n^ν), and therefore

    {|Z_{≥k_n} − nζ_λ| > n^ν} ⊆ {|Z_{≥k_n} − E_λ(Z_{≥k_n})| > (1/2)n^ν}.    (4.65)

Thus, by Chebyshev's inequality and the variance estimate of proposition 4.5, we get

    P_λ(|Z_{≥k_n} − nζ_λ| > n^ν) ≤ P_λ(|Z_{≥k_n} − E_λ(Z_{≥k_n})| > (1/2)n^ν) ≤ 4n^{1−2ν}(λk_n² + k_n) = O(n^{−δ}),    (4.66)

for any δ < 2ν − 1 and n sufficiently large. Note that we have used the fact that for any random variable X it holds that E(X 1_{X<k}) < k.

To prove that there are no clusters of size between k_n and αn, for k_n = K log(n) and α < ζ_λ, we will first show that the probability of having a cluster of this size is exponentially small. Recall that I_λ denotes the large deviation rate function for Poisson random variables with mean λ, i.e. I_λ = λ − 1 − log(λ). Let J(α, λ) = I_{g(α,λ)} = g(α, λ) − 1 − log(g(α, λ)) with g(α, λ) = (1 − e^{−αλ})/α. Then J(α, λ) > 0 if g(α, λ) ≠ 1. Since the extinction probability η_λ is the unique solution of the fixed-point equation of the pgf of the Poisson branching process with mean λ (see (3.16)), i.e. η_λ = e^{λ(η_λ−1)}, we have that the survival probability ζ_λ is the unique solution to g(α, λ) = 1. Also, α ↦ g(α, λ) is a decreasing function, since

    ∂_α g(α, λ) = e^{−αλ}(αλ − (e^{αλ} − 1))/α² < 0,    (4.67)

where we have used the fact that e^x − 1 > x for every x ≠ 0. Thus we can conclude that for every α < ζ_λ we have g(α, λ) > 1, and thus J(α, λ) > 0. The following proposition is due to van der Hofstad [29].
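These claims about g and J are easy to check numerically (an illustrative sketch; the function names are ours). For λ = 2 one has ζ_2 ≈ 0.7968, and indeed g(α, 2) > 1 and J(α, 2) > 0 for α below this value:

```python
import math

def g(alpha, lam):
    """g(alpha, lam) = (1 - exp(-alpha*lam)) / alpha."""
    return (1.0 - math.exp(-alpha * lam)) / alpha

def J(alpha, lam):
    """Rate J(alpha, lam) = I_{g(alpha,lam)} = g - 1 - log(g)."""
    x = g(alpha, lam)
    return x - 1.0 - math.log(x)

print(g(0.5, 2.0), J(0.5, 2.0))  # g > 1 and J > 0 below zeta_2
print(g(0.7968, 2.0))            # ~ 1 at alpha = zeta_2
```

The monotonicity of α ↦ g(α, λ) established in (4.67) is what makes J(α, λ) usable as a uniform rate over [t/n, ζ_λ) in the proof below.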

Proposition 4.6 (van der Hofstad [29]). Let k_n be such that k_n → ∞ as n → ∞. Then for every α < ζ_λ,

    P_λ(k_n ≤ |C(v)| ≤ αn) ≤ e^{−k_n J(α,λ)} / (1 − e^{−J(α,λ)}).    (4.68)


Proof. We start by giving a bound for the probability in (4.68):

    P_λ(k_n ≤ |C(v)| ≤ αn) = Σ_{t=k_n}^{αn} P_λ(|C(v)| = t) ≤ Σ_{t=k_n}^{αn} P_λ(S_t = 0).    (4.69)

In the following we will use the notation P_λ(Bin(m, q) = t) to denote the probability that a binomial random variable with parameters m and q takes the value t. Then, since S_t ∼ Bin(n − 1, 1 − (1 − p)^t) − (t − 1) (see proposition 4.1), and using that 1 − p ≤ e^{−p}, we have that for each s > 0,

    P_λ(S_t = 0) = P_λ(Bin(n − 1, 1 − (1 − p)^t) = t − 1)
        ≤ P_λ(Bin(n, 1 − (1 − p)^t) ≤ t)
        ≤ P_λ(Bin(n, 1 − e^{−pt}) ≤ t)
        = P_λ(e^{−s Bin(n, 1−e^{−pt})} ≥ e^{−st}).    (4.70)

Markov's inequality gives

    P_λ(S_t = 0) ≤ e^{st} E_λ(e^{−s Bin(n, 1−e^{−pt})}) = e^{st}(e^{−pt} + (1 − e^{−pt})e^{−s})^n = e^{st}(1 − (1 − e^{−pt})(1 − e^{−s}))^n ≤ e^{st − n(1−e^{−pt})(1−e^{−s})}.    (4.71)

The value s* that minimizes the exponent equals

    s* = log(n(1 − e^{−λt/n})/t) = log(g(t/n, λ)).

Let t = βn. Note that lim_{β↓0} g(β, λ) = λ > 1 and, as we have seen, β ↦ g(β, λ) is decreasing with g(ζ_λ, λ) = 1. Thus s* ≥ 0 if t = ⌊αn⌋ with α ≤ ζ_λ. Substituting s* in (4.71) we get

    P_λ(S_t = 0) ≤ e^{t(log(g(t/n, λ)) + 1 − g(t/n, λ))} = e^{−t I_{g(t/n,λ)}}.    (4.72)

Because g(α, λ) is decreasing and λ ↦ I_λ is increasing for λ > 1, the map α ↦ I_{g(α,λ)} = J(α, λ) is decreasing, and thus for α ∈ [t/n, ζ_λ) we get

    P_λ(S_t = 0) ≤ e^{−t J(α,λ)}.    (4.73)

Thus

    Σ_{t=k_n}^{αn} P_λ(S_t = 0) ≤ Σ_{t=k_n}^{αn} e^{−t J(α,λ)} ≤ e^{−k_n J(α,λ)} / (1 − e^{−J(α,λ)}).    (4.74)

Combining (4.74) and (4.69) gives us the desired result.

Now we can prove the consequence of this proposition: there is no middle ground for the sizes of connected components.


Corollary 4.2 (van der Hofstad [29]). Let δ = KJ(α, λ) − 1. Then the probability that there does not exist a connected component whose size is between k_n and αn is 1 − O(n^{−δ}).

Proof. Let C = 1/(1 − e^{−J(α,λ)}). Then by proposition 4.6 and the union bound we have that

    P_λ(∃ v ∈ [n] : k_n ≤ |C(v)| ≤ αn) ≤ n P_λ(k_n ≤ |C(v)| ≤ αn) ≤ nC e^{−k_n J(α,λ)}.    (4.75)

When k_n = K log(n) for some sufficiently large K, the right-hand side equals O(n^{−δ}) with δ = KJ(α, λ) − 1 > 0.

Now that we have proved that there is no middle ground for connected components and that the vertices are concentrated in large connected components, we can prove theorem 4.4.

Let ν ∈ (1/2, 1), α ∈ (ζ_λ/2, ζ_λ), and let k_n = K log(n) for some sufficiently large K. Consider the following event:

    E_n = {|Z_{≥k_n} − nζ_λ| ≤ n^ν} ∩ {∄ v ∈ [n] : k_n ≤ |C(v)| ≤ αn}.    (4.76)

As we have said, the idea behind theorem 4.4 is that Z_{≥k_n} = |C_max| on the event E_n. This will be proven in the next lemma.

Lemma 4.1 (van der Hofstad [29]). Let ν ∈ (1/2, 1) and α ∈ (ζ_λ/2, ζ_λ), and let δ be the minimum of the δ(K, ν) of corollary 4.1 and the δ(K, α) of corollary 4.2. Then the event E_n occurs with high probability, i.e. P_λ(E_n^c) = O(n^{−δ}), and Z_{≥k_n} = |C_max| on the event E_n.

Proof. Note that for E_n^c the following holds:

    P_λ(E_n^c) ≤ P_λ(|Z_{≥k_n} − nζ_λ| > n^ν) + P_λ(∃ v ∈ [n] : k_n ≤ |C(v)| ≤ αn).    (4.77)

Using corollaries 4.1 and 4.2 we can bound the right-hand side by O(n^{−δ}), and thus

    P_λ(E_n^c) = O(n^{−δ}),    (4.78)

which proves that the event E_n occurs with high probability.

Now we will prove that |C_max| = Z_{≥k_n} on E_n. Note that {|Z_{≥k_n} − nζ_λ| ≤ n^ν} ⊆ {Z_{≥k_n} ≥ 1}, so on the event E_n it holds that |C_max| ≤ Z_{≥k_n}. Now suppose that |C_max| < Z_{≥k_n}; then there are at least two connected components whose sizes are at least k_n. Since there is no middle ground, both sizes must be at least αn. This implies that Z_{≥k_n} ≥ 2αn > ζ_λ n, since α ∈ (ζ_λ/2, ζ_λ). For n sufficiently large, this contradicts Z_{≥k_n} ≤ nζ_λ + n^ν. Thus |C_max| = Z_{≥k_n} on E_n.

Proof of theorem 4.4. By lemma 4.1 we have that |C_max| = Z_{≥k_n} on E_n. Thus,

    P_λ(||C_max| − ζ_λ n| ≤ n^ν) ≥ P_λ({||C_max| − ζ_λ n| ≤ n^ν} ∩ E_n) = P_λ(E_n) = 1 − O(n^{−δ}).    (4.79)


At this point we would like to give some intuition behind theorem 4.4. The following reasoning is derived from Newman [25]. Let u be the fraction of vertices that do not belong to the giant component. If there is no giant component then u = 1, and if there is one then u < 1. A vertex v does not belong to the giant component if it is not connected to any vertex in the giant component. This means that for any other vertex w one of the following two assertions must hold:

i) v is not connected to w, or

ii) v is connected to w but w does not lie in the giant component.

The probability of situation i) is just 1 − p and that of ii) is pu. Therefore, the probability that v is not connected to the giant component through w is 1 − p + pu, and the total probability of not being connected to the giant component is

    u = (1 − p + pu)^{n−1} = (1 − (λ/n)(1 − u))^{n−1}.

Taking the logarithm on both sides leads to

    log(u) = (n − 1) log(1 − (λ/n)(1 − u)) ≈ −(n − 1)(λ/n)(1 − u).

Therefore, taking the limit n → ∞, we get log(u) = −λ(1 − u). This leads to

    u = e^{−λ(1−u)}.

By defining S as

    S = 1 − u = 1 − e^{−λS},

we have found the fraction of vertices that do belong to the giant component. By (3.16) we know that the extinction probability η_λ solves u = e^{−λ(1−u)}, and therefore S = ζ_λ.

As discussed at the beginning of this section, most real (undirected) networks are in the supercritical regime. This follows from the fact that their average degree is bigger than one. For instance, the internet has an average degree of 6.34 and the actor network has an average degree of 13.46, [2].
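The fixed-point equation above is also convenient numerically. A minimal sketch (the function name and parameters are ours, not Newman's): iterating u ← e^{−λ(1−u)} from u = 0 converges to the extinction probability η_λ, so 1 − u converges to the giant-component fraction ζ_λ, which vanishes for λ ≤ 1.

```python
import math

def giant_fraction(lam, iters=200):
    """Iterate u <- exp(-lam * (1 - u)); the limit of 1 - u is zeta_lam."""
    u = 0.0
    for _ in range(iters):
        u = math.exp(-lam * (1.0 - u))
    return 1.0 - u

print(giant_fraction(0.5))  # subcritical: ~ 0
print(giant_fraction(2.0))  # supercritical: ~ 0.7968
```

Starting from u = 0 and iterating the increasing map picks out the smallest fixed point, which is exactly η_λ.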

4.4 Critical regime

In this section we will investigate the behaviour of the largest connected component of the random graph ER_n(λ/n) near the critical value p = 1/n. It is this value that separates the transition from a giant component of order n to small connected components of order log(n), see section 4.5. Around the value 1/n the size of the largest connected component is of order n^{2/3}, see theorem 4.5. Viewing the transition from the point of view of increasing p, the size of the largest connected component jumps from O(log(n)) to O(n^{2/3}) and then to O(n). This sudden increase in size is what Erdős and Rényi referred to as the "double jump", [6]. The following result is the main theorem in this section; it describes the size of the largest connected component near the critical value. Throughout this section we let λ = 1 + θn^{−1/3}, where θ ∈ R. The following theorem is due to van der Hofstad, see [29].

Theorem 4.5 (van der Hofstad [29]). There exists a constant b = b(θ) > 0 such that, for all ω > 1,

    P_λ(ω^{−1} n^{2/3} ≤ |C_max| ≤ ω n^{2/3}) ≥ 1 − b/ω.    (4.80)

The idea behind the proof of this theorem is to provide new bounds on the cluster tail probabilities near the critical value, which can then be used to give the lower bound in theorem 4.5. The bounds on the cluster tails are provided in proposition 4.7. The proof is also based on the expectation of the cluster size near the critical value, see proposition 4.8, which is needed to apply Chebyshev's inequality. Both propositions 4.7 and 4.8 are due to van der Hofstad, see [29].

Proposition 4.7 (van der Hofstad [29]). Let r > 0. For k ≤ rn^{2/3} there exist constants 0 < c_1 < c_2 < ∞, with c_1 = c_1(r, θ) such that min_{r≤1} c_1(r, θ) > 0 and c_2 independent of r and θ, such that for n sufficiently large,

    c_1/√k ≤ P_λ(|C(1)| ≥ k) ≤ c_2((θ ∨ 0) n^{−1/3} + 1/√k).    (4.81)

Proof. By theorem 4.1 we have that

    P_λ(|C(1)| ≥ k) ≤ P_{n,p}(T ≥ k),    (4.82)

where T is the total progeny of a binomial branching process with parameters n and p = λ/n = (1 + θn^{−1/3})/n. By theorem 3.6 we have that

    P_{n,p}(T ≥ k) = P*_λ(T* ≥ k) + e_n(k),    (4.83)

where

    |e_n(k)| ≤ (1/n) Σ_{s=1}^{k−1} P*_λ(T* ≥ s).    (4.84)

By theorems 3.4 and 3.5, there exists C > 0 independent of θ such that for all s ≥ 1,

    P*_λ(T* ≥ s) ≤ P*_λ(T* = ∞) + Σ_{t=s}^∞ P*_λ(T* = t) = ζ_λ + Σ_{t=s}^∞ P*_λ(T* = t) ≤ C((θ ∨ 0) n^{−1/3} + 1/√s).    (4.85)


Therefore, for all k ≤ n,

    |e_n(k)| ≤ (1/n) Σ_{s=1}^{k} C((θ ∨ 0) n^{−1/3} + 1/√s) ≤ C((θ ∨ 0) k n^{−4/3} + 2√k/n) ≤ 2C((θ ∨ 0) n^{−1/3} + 1/√k).    (4.86)

Combining (4.85) and (4.86), we obtain that for all k ≤ n,

    P_λ(|C(1)| ≥ k) ≤ 4C((θ ∨ 0) n^{−1/3} + 1/√k).    (4.87)

By letting c_2 = 4C we get the desired upper bound.

Now we will prove the lower bound. By monotonicity we may assume that θ ≤ 0. Let k ≤ rn^{2/3}. By theorem 4.2 we have that

    P_λ(|C(1)| ≥ k) ≥ P_{n−k,p}(T ≥ k),    (4.88)

where T is the total progeny of a binomial branching process with parameters n − k ≥ n − rn^{2/3} and p = (1 + θn^{−1/3})/n. By monotonicity it is enough to show that the lower bound holds for λ_n = 1 + (θ − r)n^{−1/3}. Using the bound in (4.86) for θ ≤ 0,

    P_λ(|C(1)| ≥ k) ≥ P*_{λ_n}(T* ≥ k) − C√k/n ≥ P*_{λ_n}(T* ≥ k) − C√r/n^{2/3}.    (4.89)

Then, by theorem 3.3,

    P_λ(|C(1)| ≥ k) ≥ Σ_{t=k}^∞ P*_{λ_n}(T* = t) − C√r/n^{2/3}
        = Σ_{t=k}^∞ ((λ_n t)^{t−1}/t!) e^{−λ_n t} − C√r/n^{2/3}
        = (1/λ_n) Σ_{t=k}^∞ P*_1(T* = t) e^{−I_{λ_n} t} − C√r/n^{2/3},    (4.90)

and since λ_n ≤ 1 we have that

    I_{λ_n} = λ_n − 1 − log(λ_n) = (1/2)(λ_n − 1)² + O(|λ_n − 1|³).    (4.91)

By substituting (4.91) in (4.90) we obtain

    P_λ(|C(1)| ≥ k) ≥ Σ_{t=k}^{2k} P*_1(T* = t) e^{−(1/2)(λ_n−1)² t(1+o(1))} − C√r/n^{2/3}
        ≥ Σ_{t=k}^{2k} (C/√t³) e^{−(1/2)(λ_n−1)² t(1+o(1))} − C√r/n^{2/3}
        ≥ (2^{−3/2} C/√k) e^{−k(λ_n−1)²(1+o(1))} − C√r/n^{2/3}.    (4.92)


Furthermore, since k(λ_n − 1)² ≤ (θ − r)² r, we get that for n ≥ N,

    √r/n^{2/3} = √(rk)/(√k n^{2/3}) ≤ r n^{−1/3}/√k ≤ r N^{−1/3}/√k,    (4.93)

so that

    P_λ(|C(1)| ≥ k) ≥ c_1(r, θ)/√k,    (4.94)

with c_1(r, θ) = C 2^{−3/2} e^{−r(θ−r)²} − C r N^{−1/3} > 0 for r ≤ 1, whenever N is sufficiently large.

Proposition 4.8 (van der Hofstad [29]). Let λ = 1 + θn^{−1/3} with θ < 0. Then for all n ≥ 1,

    E_λ(|C(1)|) ≤ n^{1/3}/|θ|.    (4.95)

Proof. Since |C(1)| is stochastically dominated by the total progeny T of a binomial branching process with parameters n and p = λ/n, we have that

    E_λ(|C(1)|) ≤ E(T) = 1/(1 − λ) = n^{1/3}/|θ|.    (4.96)

Proof of theorem 4.5. Recall that {|C_max| ≥ k} = {Z_{≥k} ≥ k}, where Z_{≥k} = Σ_{v∈[n]} 1_{|C(v)|≥k}. By Markov's inequality,
\[
P_\lambda\big(|C_{\max}| \ge \omega n^{2/3}\big) = P_\lambda\big(Z_{\ge \omega n^{2/3}} \ge \omega n^{2/3}\big) \le \omega^{-1}n^{-2/3}\,E_\lambda\big(Z_{\ge \omega n^{2/3}}\big). \tag{4.97}
\]

By proposition 4.7 it follows that
\[
E_\lambda\big(Z_{\ge \omega n^{2/3}}\big) = nP_\lambda\big(|C(1)| \ge \omega n^{2/3}\big) \le c_2 n^{2/3}\Big((\theta\vee 0) + \frac{1}{\sqrt{\omega}}\Big), \tag{4.98}
\]

which leads to
\[
P_\lambda\big(|C_{\max}| > \omega n^{2/3}\big) \le c_2 n^{2/3}\Big((\theta\vee 0) + \frac{1}{\sqrt{\omega}}\Big)\Big/\big(\omega n^{2/3}\big) = \frac{c_2}{\omega}\Big((\theta\vee 0) + \frac{1}{\sqrt{\omega}}\Big). \tag{4.99}
\]

For the lower bound we make use of monotonicity in λ:
\[
P_{1+\theta n^{-1/3}}\big(|C_{\max}| < \omega^{-1}n^{2/3}\big) \le P_{1-\bar{\theta} n^{-1/3}}\big(|C_{\max}| < \omega^{-1}n^{2/3}\big), \tag{4.100}
\]
with \bar{\theta} = |θ| ∨ 1. Therefore it is enough to consider only θ < −1. Using Chebyshev's inequality together with {|C_max| < k} = {Z_{≥k} = 0} we obtain
\[
P_\lambda\big(|C_{\max}| < \omega^{-1}n^{2/3}\big) = P_\lambda\big(Z_{\ge \omega^{-1}n^{2/3}} = 0\big) \le \frac{\operatorname{Var}_\lambda\big(Z_{\ge \omega^{-1}n^{2/3}}\big)}{E_\lambda\big(Z_{\ge \omega^{-1}n^{2/3}}\big)^2}. \tag{4.101}
\]


By proposition 4.7, with k = n^{2/3}/ω and ω ≥ 1,
\[
E_\lambda\big(Z_{\ge \omega^{-1}n^{2/3}}\big) = nP_\lambda\big(|C(1)| \ge \omega^{-1}n^{2/3}\big) \ge c_1\sqrt{\omega}\,n^{2/3}, \tag{4.102}
\]

where c_1 = min_{r≤1} c_1(r, θ) > 0. Also, by proposition 4.9 in section 4.5 we have that
\[
\operatorname{Var}_\lambda\big(Z_{\ge \omega^{-1}n^{2/3}}\big) \le nE_\lambda\big(|C(1)|\,1_{|C(1)|\ge \omega^{-1}n^{2/3}}\big). \tag{4.103}
\]

Using proposition 4.8 together with θ ≤ −1, we can bound this further:
\[
\operatorname{Var}_\lambda\big(Z_{\ge \omega^{-1}n^{2/3}}\big) \le nE_\lambda\big(|C(1)|\big) \le n^{4/3}. \tag{4.104}
\]

Combining (4.104) with (4.102) in (4.101) gives, for n large enough,
\[
P_\lambda\big(|C_{\max}| < \omega^{-1}n^{2/3}\big) \le \frac{n^{4/3}}{c_1^2\,\omega\,n^{4/3}} = \frac{1}{c_1^2\,\omega}. \tag{4.105}
\]

Thus we can conclude that
\[
\begin{aligned}
P_\lambda\big(\omega^{-1}n^{2/3} \le |C_{\max}| \le \omega n^{2/3}\big) &= 1 - P_\lambda\big(|C_{\max}| < \omega^{-1}n^{2/3}\big) - P_\lambda\big(|C_{\max}| > \omega n^{2/3}\big) \\
&\ge 1 - \frac{1}{c_1^2\,\omega} - \frac{c_2}{\omega} \ge 1 - \frac{b}{\omega},
\end{aligned} \tag{4.106}
\]
where b = c_1^{-2} + c_2.
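Theorem 4.5 can also be checked empirically: at λ = 1 the largest component of ERn(1/n) should be of order n^{2/3}. The following is a minimal sketch in pure Python (the union-find helper and function name are ours, not from the thesis), sampling the graph edge by edge:

```python
import random

def largest_component_size(n, p, rng):
    """Sample ER(n, p) and return the size of its largest connected
    component, using a union-find over the randomly drawn edges."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Draw each of the C(n, 2) possible edges independently with probability p.
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj

    sizes = {}
    for v in range(n):
        r = find(v)
        sizes[r] = sizes.get(r, 0) + 1
    return max(sizes.values())

rng = random.Random(0)
for n in (200, 400, 800):
    # Critical case lambda = 1: |C_max| / n^{2/3} should stay of order one.
    c_max = largest_component_size(n, 1.0 / n, rng)
    print(n, c_max, round(c_max / n ** (2 / 3), 2))
```

The printed ratio fluctuates (the limit in theorem 4.5 is only a tightness statement, not a law of large numbers), but it stays bounded away from 0 and ∞ as n grows.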

4.5 Subcritical regime

When λ = 0 the Erdos-Renyi random graph ERn(λ/n) consists of n isolated vertices. Increasing λ leads in expectation to λ(n−1)/2 edges. Therefore, when λ < 1 we have in expectation a small number of edges and hence observe only small clusters, see figure 4.2. We will first investigate the structure of these clusters. Theorem 4.6 and corollary 4.3 show that these clusters are either trees or unicyclic. Then we will show that the size of the largest connected component in the subcritical regime is of order log(n).

Let T_k be the number of isolated trees of order k in ERn(p). Then the total number T of vertices that are contained in isolated trees equals T = Σ_{k=1}^n kT_k. The next theorem shows that if p = λ/n for some λ < 1, then almost all vertices belong to isolated trees.

Theorem 4.6 (Bollobas [4]). Suppose λ = np. If 0 < λ < 1, then E_λ(T) = n + O(1), and if λ > 1, then E_λ(T) = t(λ)n + O(1), where
\[
t(\lambda) = \frac{1}{\lambda}\sum_{k=1}^{\infty} \frac{k^{k-1}}{k!}\big(\lambda e^{-\lambda}\big)^k.
\]

Proof.
\[
E_\lambda(T_k) = \binom{n}{k}\,k^{k-2}\Big(\frac{\lambda}{n}\Big)^{k-1}\Big(1 - \frac{\lambda}{n}\Big)^{kn - k(k+3)/2 + 1}. \tag{4.107}
\]


Figure 4.2: Example of ERn(p) with n = 300 and p = 0.9/300. In this graph, the size of the largest connected component is 12 (red component).

Thus for 1 ≤ k ≤ n^{1/2},
\[
\begin{aligned}
E_\lambda(T_k) &\le n\frac{k^{k-2}}{k!}\lambda^{k-1}e^{-\lambda k}\,e^{\lambda k^2/2n + 3k\lambda/2n} \\
&\le n\frac{k^{k-2}}{k!}\lambda^{k-1}e^{-\lambda k}\Big(1 + \lambda_1\frac{k^2}{n}\Big),
\end{aligned} \tag{4.108}
\]

and
\[
\begin{aligned}
E_\lambda(T_k) &\ge n\frac{k^{k-2}}{k!}\lambda^{k-1}\Big(1 - \frac{k}{n}\Big)^k e^{-\lambda k}\,e^{-\lambda^2 k/n} \\
&\ge n\frac{k^{k-2}}{k!}\lambda^{k-1}e^{-\lambda k}\Big(1 - \lambda_2\frac{k^2}{n}\Big),
\end{aligned} \tag{4.109}
\]

where λ_1 and λ_2 do not depend on k and n. Let λ* = max{λ_1, λ_2}. Then for 1 ≤ m ≤ n^{1/2},

\[
\bigg|E_\lambda\Big(\sum_{k=1}^m kT_k\Big) - n\sum_{k=1}^m \frac{k^{k-1}}{k!}\lambda^{k-1}e^{-\lambda k}\bigg| \le \lambda^*\sum_{k=1}^m \frac{k^{k+1}}{k!}\lambda^{k-1}e^{-\lambda k} = O(1). \tag{4.110}
\]

The result follows once we show that the terms with k ≥ n^{1/2} contribute little to the sums. Using (4.107) we get that

\[
\begin{aligned}
\frac{E_\lambda((k+1)T_{k+1})}{E_\lambda(kT_k)} &= (n-k)\Big(1 + \frac{1}{k}\Big)^{k-2}\frac{\lambda}{n}\Big(1 - \frac{\lambda}{n}\Big)^{n-k-2} \\
&\le \lambda\Big(1 - \frac{k}{n}\Big)e^{1-\lambda(1-k/n)}\Big(1 - \frac{\lambda}{n}\Big)^{-2} \\
&\le \Big(1 - \frac{\lambda}{n}\Big)^{-2} = \alpha,
\end{aligned} \tag{4.111}
\]


where we have used that xe^{1−x} ≤ 1 for all x > 0. Note that, for j ≥ 1,
\[
\frac{E_\lambda((k+j)T_{k+j})}{E_\lambda(kT_k)} = \prod_{i=1}^{j}\frac{E_\lambda((k+i)T_{k+i})}{E_\lambda((k+i-1)T_{k+i-1})} \le \alpha^j. \tag{4.112}
\]

Set k_1 = ⌊n^{1/3}⌋. Then, by Stirling's formula we have that
\[
E_\lambda(k_1 T_{k_1}) = o(n^{-M}), \tag{4.113}
\]

for every M > 0. This leads to
\[
E_\lambda\Big(\sum_{k=k_1}^n kT_k\Big) \le E_\lambda(k_1T_{k_1})\sum_{k=k_1}^n \alpha^{k-k_1} = O(n^2)\,E_\lambda(k_1T_{k_1}) = o(n^{-3}). \tag{4.114}
\]

Also the tail of the series that defines λt(λ) is small:
\[
\sum_{k=k_1}^{\infty}\frac{k^{k-1}}{k!}\lambda^{k-1}e^{-\lambda k} = o(n^{-1}). \tag{4.115}
\]

Thus
\[
\begin{aligned}
\bigg|E_\lambda\Big(\sum_{k=1}^n kT_k\Big) - n\sum_{k=1}^{\infty}\frac{k^{k-1}}{k!}\lambda^{k-1}e^{-\lambda k}\bigg|
&\le \bigg|E_\lambda\Big(\sum_{k=1}^{k_1} kT_k\Big) - n\sum_{k=1}^{k_1}\frac{k^{k-1}}{k!}\lambda^{k-1}e^{-\lambda k}\bigg| \\
&\quad + E_\lambda\Big(\sum_{k=k_1}^n kT_k\Big) + n\sum_{k=k_1}^{\infty}\frac{k^{k-1}}{k!}\lambda^{k-1}e^{-\lambda k} \\
&= O(1).
\end{aligned} \tag{4.116}
\]

This leads to the next corollary, which states that every connected component in the subcritical regime is either a tree or unicyclic. Here we assume that ω(n) ≤ log(log(n)).

Corollary 4.3 (Bollobas [4]). Suppose p = λ/n, 0 < λ < 1 and ω(n) → ∞. Then, with high probability, ERn(λ/n) is such that every component is a tree or a unicyclic graph, and there are at most ω(n) vertices on the unicyclic components.

Proof. By Markov's inequality and the fact that T ≤ n we have that
\[
P\big(T \le n - \omega(n)\big) = P\big(n - T \ge \omega(n)\big) \le O\Big(\frac{1}{\omega(n)}\Big) \xrightarrow{\;n\to\infty\;} 0. \tag{4.117}
\]

As for the case of a unicyclic graph, it is enough to show that with high probability no component of order k ≤ ω(n) has at least k + 1 edges. Each such component contains a subgraph with k vertices and k + 1 edges, so the expected number of such components is at most
\[
\sum_{k=4}^{\omega(n)} \binom{n}{k}\binom{\binom{k}{2}}{k+1}p^{k+1} = o(1), \tag{4.118}
\]
using that ω(n) ≤ log(log(n)).


Thus we can conclude that the only candidate for the largest connected component in the subcritical regime is an isolated tree. The next two theorems, 4.7 and 4.8, are the main results of this section. They show that |C_max| ≤ a log(n) for all a > 1/I_λ with high probability, and |C_max| ≥ a log(n) for all a < 1/I_λ with high probability.

Theorem 4.7 (van der Hofstad [29]). Let λ < 1 be fixed. Then for every a > 1/I_λ there exists a δ = δ(a, λ) such that P(|C_max| > a log(n)) = O(n^{-δ}).

The idea behind the proof of this theorem is to use the fact that the connected components are stochastically bounded by the total progeny of a binomial branching process, and then to use large deviations for binomial distributions to bound the cluster tail. The last step is to use the fact that {|C_max| ≥ k} = ⋃_{v∈[n]}{|C(v)| ≥ k}, which results in the bound P_λ(|C_max| ≥ k) ≤ nP_λ(|C(v)| ≥ k).

Proof. Because the connected components are stochastically dominated by the total progeny T of a binomial branching process with parameters n, p, see theorem 4.1, we have that
\[
P_\lambda\big(|C(v)| > t\big) \le P_{n,p}(T > t) \le P_{n,p}(S_t > 0) = P_{n,p}(X_1 + \dots + X_t \ge t) \le e^{-tI_\lambda}, \tag{4.119}
\]
where for the last inequality we have used the large deviation theorem for binomial random variables, since X_1 + ... + X_t ∼ Bin(nt, λ/n). Thus
\[
P_\lambda\big(|C_{\max}| > a\log(n)\big) \le nP_\lambda\big(|C(v)| > a\log(n)\big) \le ne^{-a\log(n)I_\lambda} = n^{1-aI_\lambda} = n^{-\delta}, \tag{4.120}
\]
with δ = aI_λ − 1 > 0 for every a > 1/I_λ. From this it follows that P_λ(|C_max| > a log(n)) = O(n^{-δ}).

Theorem 4.8 (van der Hofstad [29]). Let λ < 1 be fixed. Then for every a < 1/I_λ there exists a δ = δ(a, λ) such that P(|C_max| < a log(n)) = O(n^{-δ}).

Recall the definition of Z_{≥k} in (4.44) and that {|C_max| < k} = {Z_{≥k} = 0}. This is essential in the proof of theorem 4.8, because now in order to prove that P_λ(|C_max| < a log(n)) = O(n^{-δ}) it is sufficient to prove that P_λ(Z_{≥a log(n)} = 0) = O(n^{-δ}). This can be done by using Chebyshev's inequality. But in order to apply Chebyshev's inequality we need a lower bound on the expectation of Z_{≥k} and an upper bound on its variance. For the upper bound on the variance we will use the following estimate.

Proposition 4.9 (van der Hofstad [29]). For every n and k ∈ [n],
\[
\operatorname{Var}\big(Z_{\ge k}\big) \le nE\big(|C(v)|\,1_{|C(v)|\ge k}\big).
\]


Note that the estimate in proposition 4.5 is better in the supercritical regime than the estimate in proposition 4.9. This is because when we are in the supercritical regime and use the estimate in proposition 4.9 we get that nE_λ(|C(v)|1_{|C(v)|≥k}) = O(n²), since in the supercritical regime |C(v)| = O(n), which follows from theorem 4.4. The bound in proposition 4.5 is at most O(k²n), which is smaller when k is not too large.

Proof.
\[
\begin{aligned}
\operatorname{Var}\big(Z_{\ge k}\big) &= E\big(Z_{\ge k}^2\big) - E\big(Z_{\ge k}\big)^2 \\
&= E\Big(\sum_{i,j\in[n]} 1_{|C(i)|\ge k,\,|C(j)|\ge k}\Big) - \sum_{i,j\in[n]} E\big(1_{|C(i)|\ge k}\big)E\big(1_{|C(j)|\ge k}\big) \\
&= \sum_{i,j\in[n]} \Big(P_\lambda\big(|C(i)|\ge k,\,|C(j)|\ge k\big) - P_\lambda\big(|C(i)|\ge k\big)^2\Big).
\end{aligned} \tag{4.121}
\]

For P_λ(|C(i)| ≥ k, |C(j)| ≥ k) we need to distinguish whether i and j are connected by a path or not. Recall that i ←→ j denotes that there is a path between i and j. Then,
\[
\begin{aligned}
P_\lambda\big(|C(i)|\ge k,\,|C(j)|\ge k\big) &= P_\lambda\big(|C(i)|\ge k,\; i\longleftrightarrow j\big) + P_\lambda\big(|C(i)|\ge k,\,|C(j)|\ge k,\; i\nleftrightarrow j\big) \\
&= P_\lambda\big(|C(i)|\ge k,\; i\longleftrightarrow j\big) + \sum_{l\ge k} P_\lambda\big(|C(i)| = l,\,|C(j)|\ge k,\; i\nleftrightarrow j\big).
\end{aligned} \tag{4.122}
\]

By conditioning, the second term of (4.122) becomes
\[
\sum_{l\ge k} P_\lambda\big(|C(i)|=l,\,|C(j)|\ge k,\; i\nleftrightarrow j\big) = \sum_{l\ge k} P_\lambda\big(|C(j)|\ge k \,\big|\, |C(i)|=l,\; i\nleftrightarrow j\big)\,P_\lambda\big(|C(i)|=l,\; i\nleftrightarrow j\big). \tag{4.123}
\]

Now if |C(i)| = l and i is not connected to j, then all the vertices outside of this component, including j, form a random graph ER_{n−l}(λ/n). Since P_{n,p}(|C(j)| ≥ k) is increasing in n, we get
\[
P_{n,p}\big(|C(j)|\ge k \,\big|\, |C(i)|=l,\; i\nleftrightarrow j\big) = P_{n-l,p}\big(|C(j)|\ge k\big) \le P_{n,p}\big(|C(j)|\ge k\big). \tag{4.124}
\]

Therefore, (4.123) can be bounded by (4.124), which provides us with
\[
\sum_{l\ge k} P_\lambda\big(|C(i)|=l,\,|C(j)|\ge k,\; i\nleftrightarrow j\big) \le \sum_{l\ge k} P_\lambda\big(|C(j)|\ge k\big)P_\lambda\big(|C(i)|=l\big) = P_\lambda\big(|C(j)|\ge k\big)P_\lambda\big(|C(i)|\ge k\big) = P_\lambda\big(|C(i)|\ge k\big)^2. \tag{4.125}
\]

Combining (4.125) with (4.122) we get
\[
P_\lambda\big(|C(i)|\ge k,\,|C(j)|\ge k\big) \le P_\lambda\big(|C(i)|\ge k,\; i\longleftrightarrow j\big) + P_\lambda\big(|C(i)|\ge k\big)^2. \tag{4.126}
\]


Thus, using (4.126), (4.121) becomes
\[
\begin{aligned}
\operatorname{Var}_\lambda\big(Z_{\ge k}\big) &\le \sum_{i,j\in[n]} P_\lambda\big(|C(i)|\ge k,\; i\longleftrightarrow j\big) = \sum_{i\in[n]}\sum_{j\in[n]} E_\lambda\big(1_{|C(i)|\ge k}\,1_{j\in C(i)}\big) \\
&= \sum_{i\in[n]} E_\lambda\big(1_{|C(i)|\ge k}\,|C(i)|\big) = nE_\lambda\big(|C(1)|\,1_{|C(1)|\ge k}\big).
\end{aligned} \tag{4.127}
\]

Proof of theorem 4.8. As noted above, to prove theorem 4.8 it is sufficient to prove that P_λ(Z_{≥a log(n)} = 0) = O(n^{-δ}), and for that we will use Chebyshev's inequality. Proposition 4.9 gives an upper bound for Var_λ(Z_{≥k}), so what remains is to derive a lower bound for E_λ(Z_{≥k}).

By the definition of Z_{≥k} we see that E_λ(Z_{≥k}) = nP_λ(|C(v)| ≥ k). Since the tails of connected components are bounded below by the total progeny of a binomial branching process with parameters n − k, p, see theorem 4.2, we have that
\[
P_\lambda\big(|C(v)| \ge k\big) \ge P_{n-k,p}(T \ge k). \tag{4.128}
\]

Also, because we can compare the tails of the total progeny of a binomial branching process with the tail of the total progeny of a Poisson branching process, see theorem 3.6, we get
\[
P_{n-k,p}(T \ge k) = P^*_{\lambda_n}(T^* \ge k) + O\Big(\frac{k\lambda^2}{n}\Big), \tag{4.129}
\]

where λ_n = (n − k)λ/n. By the law of the total progeny of a Poisson branching process, see theorem 3.3, we have
\[
P^*_{\lambda_n}(T^* \ge k) = \sum_{l=k}^{\infty} P^*_{\lambda_n}(T^* = l) = \sum_{l=k}^{\infty} \frac{(\lambda_n l)^{l-1}}{l!}\,e^{-\lambda_n l}. \tag{4.130}
\]

Using Stirling's formula,
\[
l! = \Big(\frac{l}{e}\Big)^l \sqrt{2\pi l}\,(1 + o(1)), \tag{4.131}
\]

and since I_{λ_n} = I_λ + o(1), (4.130) becomes
\[
\begin{aligned}
P^*_{\lambda_n}(T^* \ge k) &= \sum_{l=k}^{\infty} \Big(\frac{e}{l}\Big)^l \frac{(\lambda_n l)^{l-1}}{\sqrt{2\pi l}}\,e^{-\lambda_n l}(1 + o(1)) \\
&= \sum_{l=k}^{\infty} \frac{1}{\lambda_n\sqrt{2\pi l^3}}\,e^{-\lambda_n l + l\log(\lambda_n) + l}(1 + o(1)) \\
&= \sum_{l=k}^{\infty} \frac{1}{\lambda_n\sqrt{2\pi l^3}}\,e^{-lI_{\lambda_n}}(1 + o(1)) \\
&\ge e^{-kI_\lambda(1+o(1))}.
\end{aligned} \tag{4.132}
\]


So with k = a log(n) we get
\[
E_\lambda\big(Z_{\ge a\log(n)}\big) = nP_\lambda\big(|C(v)| \ge a\log(n)\big) \ge n\,e^{-a\log(n)I_\lambda(1+o(1))} = n^{(1-aI_\lambda)(1+o(1))} \ge n^{\alpha}, \tag{4.133}
\]

whenever 0 < α < 1 − aI_λ. To give an upper bound for the variance we note that
\[
\begin{aligned}
E\big(|C(v)|\,1_{|C(v)|\ge k}\big) &= \sum_{l=k}^n lP\big(|C(v)| = l\big) = \sum_{l=k}^n \sum_{t=1}^l P\big(|C(v)| = l\big) \\
&= \sum_{t=1}^n \sum_{l=t\vee k}^n P\big(|C(v)| = l\big) = kP\big(|C(v)|\ge k\big) + \sum_{t=k+1}^n P\big(|C(v)|\ge t\big).
\end{aligned} \tag{4.134}
\]

Since the tails of the sizes of the connected components can be bounded as in (4.119), we get
\[
E\big(|C(v)|\,1_{|C(v)|\ge k}\big) \le ke^{-(k-1)I_\lambda} + \sum_{t=k+1}^n e^{-(t-1)I_\lambda} \le ke^{-(k-1)I_\lambda} + \frac{e^{-kI_\lambda}}{1 - e^{-I_\lambda}} = O\big(kn^{-aI_\lambda}\big). \tag{4.135}
\]

Thus by proposition 4.9 and (4.135) we can conclude that
\[
\operatorname{Var}_\lambda\big(Z_{\ge k}\big) \le nE_\lambda\big(|C(v)|\,1_{|C(v)|\ge k}\big) \le O\big(kn^{1-aI_\lambda}\big). \tag{4.136}
\]

Now that we have the lower bound (4.133) for the expectation and the upper bound (4.136) for the variance, we can apply Chebyshev's inequality:
\[
P_\lambda\big(Z_{\ge a\log(n)} = 0\big) \le \frac{\operatorname{Var}_\lambda\big(Z_{\ge a\log(n)}\big)}{E_\lambda\big(Z_{\ge a\log(n)}\big)^2} \le O\big(a\log(n)\,n^{1-aI_\lambda-2\alpha}\big) = O(n^{-\delta}), \tag{4.137}
\]
with 0 < δ < 2α − (1 − aI_λ) when (1 − aI_λ)/2 < α < 1 − aI_λ. Thus when a < 1/I_λ the result follows.

We can now conclude that the largest connected component is a tree whose size is of order log(n). Also, by corollary 4.3, the smaller components that are unicyclic are of order log(log(n)). We can also conclude from the bounds given by theorems 4.1 and 4.2, and the relationship between the binomial and Poisson branching processes as given by theorem 3.6, that the sizes of the connected components converge in distribution to a Borel distribution. This is because in the subcritical regime, the law of the total progeny of a Poisson branching process as given in theorem 3.3 is precisely the Borel distribution.


5 Stochastic block model

In the previous chapter we have seen the phase transitions of the Erdos-Renyi random graph ERn(p) as p passes through two thresholds. In this chapter we consider a generalization of the Erdos-Renyi random graph called the stochastic block model. The stochastic block model is used as a canonical model for community detection. The goal of community detection is to assign the vertices of a graph to their unobserved community. Community detection has many applications in machine learning, data mining, the social sciences, biology and statistical physics.

As we will see, the phase transitions that are of interest to community detection are sharp transitions between phases where the communities can be "detected". The term "detection" refers to the estimation of the class assignment of the vertices. We will treat "class" and "community" as synonyms. The phase in which exact recovery is possible is called the Chernoff-Hellinger phase, and under weaker conditions detection is possible in the Kesten-Stigum phase.

In section 5.1 we describe the general stochastic block model and some recovery requirements. We also describe the phase transition for exact recovery and detection as described by Abbe [1]; the notation there is derived from Abbe.

In section 5.2 we treat a special case of the stochastic block model called the planted bi-section model. We provide the necessary and sufficient conditions for the recovery of the communities as described by Mossel, Neeman and Sly [24]. We also provide the results from Kleijn and Waaij [22], in which recovery and detection are viewed from a Bayesian perspective and some statistical inference questions are answered by the posterior.

The results in this chapter are provided without proofs, as the chapter is meant to offer an overview of a generalization of the Erdos-Renyi random graph and its phase transitions. The notation in section 5.2 is mainly derived from Kleijn and Waaij [22].


5.1 The model

The stochastic block model is defined as follows: let the number of vertices be n and the number of communities be k. Let p = (p_1, . . . , p_k) be a probability vector, where p_i represents the probability that a given vertex lies in community i, i.e. the prior on the k communities, and let W be a k × k symmetric matrix with entries in [0, 1] that represents the connection probabilities. The probabilities on the diagonal are the probabilities that there is an edge between two vertices inside a community; the probabilities off the diagonal are the probabilities that there is an edge between two vertices from different communities. Let X = (X_1, . . . , X_n) be an n-dimensional random vector, where X_i indicates the community of vertex i, i.e. P(X_i = j) = p_j with j ∈ [k]. The pair (X, G) is drawn under SBM(n, p, W) if X is an n-dimensional random vector with i.i.d. components distributed under p, and G is a simple graph with n vertices in which i and j are connected with probability W_{X_i,X_j}, independently of other pairs of vertices. Define the community sets as Ω_i = Ω_i(X) = {v ∈ [n] : X_v = i}, i ∈ [k].

If we take p to be uniform and if W takes the same value A on the diagonal and the same value B off the diagonal, we have a symmetric SBM, denoted by SSBM. Thus (X, G) is drawn under SSBM(n, k, A, B) if p = (1/k, . . . , 1/k) is the uniform k-dimensional probability vector and W is as described above. If A = B, then the stochastic block model becomes the Erdos-Renyi random graph ERn(A), and the reconstruction of the communities is impossible.
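The two-step sampling procedure just described can be sketched directly in Python (pure standard library; the function name is ours, not notation from [1]):

```python
import random

def sample_sbm(n, p, W, rng):
    """Draw (X, G) from SBM(n, p, W): X holds community labels drawn
    i.i.d. from the prior p, and G is a set of edges in which the pair
    {i, j} is present with probability W[X[i]][X[j]], independently
    over pairs."""
    k = len(p)
    X = [rng.choices(range(k), weights=p)[0] for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < W[X[i]][X[j]]:
                edges.add((i, j))
    return X, edges

rng = random.Random(42)
# Symmetric SBM with k = 2: within-probability A = 0.5, between B = 0.05.
X, G = sample_sbm(60, [0.5, 0.5], [[0.5, 0.05], [0.05, 0.5]], rng)
```

With A well separated from B, as here, the sampled graph visibly splits into two dense blocks; setting A = B recovers an ordinary Erdos-Renyi sample.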

The goal of community detection is to recover the labels of X by observing G, up tosome level of accuracy. To be able to do this we need to compare two community vectors.This can be done by investigating the agreement between the two vectors.

Definition 5.1 (Abbe [1]). The agreement between two community vectors x, y ∈ [k]^n is obtained by maximizing the number of common components between x and any relabelling of y, i.e.
\[
A(x, y) = \max_{\pi \in S_k} \frac{1}{n}\sum_{i=1}^n 1_{x_i = \pi(y_i)}, \tag{5.1}
\]
where S_k is the group of permutations on [k].

The relabelling permutation is necessary to handle symmetric communities such as in the SSBM, since in that case it is impossible to recover the actual labels. There are many recovery requirements, but we will only mention the two that are relevant to us. Let X̂ be a reconstruction of X, i.e. draw each component of X̂ i.i.d. under p.
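Definition 5.1 can be implemented by brute force over the k! relabellings; a small sketch (the function name is ours):

```python
from itertools import permutations

def agreement(x, y):
    """A(x, y): the maximal fraction of matching components over all
    relabellings (permutations) of the community labels of y."""
    k = max(max(x), max(y)) + 1
    n = len(x)
    best = 0.0
    for pi in permutations(range(k)):
        matches = sum(1 for xi, yi in zip(x, y) if xi == pi[yi])
        best = max(best, matches / n)
    return best

# Relabelling matters: y is x with the labels 0 and 1 swapped,
# yet the two vectors describe the same partition.
print(agreement([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

The factorial loop is only feasible for small k; for large k one would match labels via an assignment algorithm instead, but the brute-force version matches (5.1) literally.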


Definition 5.2 (Abbe [1]). Let (X, G) ∼ SBM(n, p, W). Exact recovery is solved if there exists an algorithm that takes G as input and outputs X̂ = X̂(G) such that
\[
P\big(A(X, \hat{X}) = 1\big) = 1 - o(1). \tag{5.2}
\]

Definition 5.3 (Abbe [1]). Weak recovery or detection is solved in SBM(n, p, W) if, for (X, G) ∼ SBM(n, p, W), there exist ε > 0, i, j ∈ [k] and an algorithm that takes G as input and outputs a partition of [n] into two sets (S, S^c) such that
\[
P\big(|\Omega_i \cap S|/|\Omega_i| - |\Omega_j \cap S|/|\Omega_j| \ge \varepsilon\big) = 1 - o(1). \tag{5.3}
\]

Just like for the Erdos-Renyi random graph, we can describe the topological properties of SBM graphs. For the symmetric SBM the results are similar to those for Erdos-Renyi random graphs when we take d to be the average degree.

i) For a, b > 0, SSBM(n, k, a log(n)/n, b log(n)/n) is connected with high probability if and only if d := (a + (k − 1)b)/k > 1.

ii) SSBM(n, k, a/n, b/n) has a giant component of order n with high probability if d > 1, and if d < 1 the largest connected component has order log(n) with high probability.

For the general SBM similar results hold in the case where the expected degree is constant. Let Q be a k × k matrix with positive entries. SBM(n, p, Q log(n)/n) is connected with high probability if min_{i∈[k]} ‖(diag(p)Q)_i‖ > 1, and is disconnected with high probability if min_{i∈[k]} ‖(diag(p)Q)_i‖ < 1.

The goal of exact recovery is to recover the partition correctly. The threshold for exact recovery was first derived for the symmetric SBM with two communities:

Theorem 5.1 (Abbe [1]). Exact recovery in SSBM(n, 2, a log(n)/n, b log(n)/n) is solvable, and efficiently so, if |√a − √b| > √2, and unsolvable if |√a − √b| < √2.

Note that |√a − √b| > √2 can be written as (a + b)/2 > 1 + √(ab), and that the connectivity regime for SSBM(n, 2, a log(n)/n, b log(n)/n) is (a + b)/2 > 1. Thus connectivity is required for exact recovery, but it is not sufficient, because of the extra term √(ab). Next we provide the threshold for exact recovery of the general SBM:

Next we provide the threshold for exact recovery of the general SBM:

Theorem 5.2 (Abbe [1]). Exact recovery in SBM(n, plog(n)Q/n) is solvable and effi-ciently so if

I+(p,Q) := min1≤i<j≤k

D+ ((diag(p)Q)i||(diag(p)Qj)) > 1 (5.4)


and is not solvable if I_+(p, Q) < 1, where D_+ is defined by
\[
D_+(\mu\|\nu) := \max_{t\in[0,1]} \sum_x \nu(x)\,f_t\Big(\frac{\mu(x)}{\nu(x)}\Big), \qquad f_t(y) := 1 - t + ty - y^t. \tag{5.5}
\]

The threshold in theorem 5.2 is called the Chernoff-Hellinger divergence. The reason for this name is that D_+ can be written as
\[
D_+(\mu\|\nu) = \max_{t\in[0,1]} D_t(\mu\|\nu), \tag{5.6}
\]
with D_t = Σ_x ν(x)f_t(µ(x)/ν(x)) and f_t(y) := 1 − t + ty − y^t. Since f_t(1) = 0 and f_t is convex on R_+, the functional D_t is an f-divergence. Since at t = 1/2 the functional D_t collapses to the Hellinger divergence, and since it matches the Chernoff divergence for probability measures, D_t is named the Chernoff-Hellinger (CH) divergence, and hence so is D_+.
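Since f_t is concave in t (its second derivative in t is −y^t(log y)² ≤ 0), the maximum in (5.5) can be found reliably by a simple grid search. A sketch (the function name and the example vectors µ, ν are ours; the row vectors of diag(p)Q need not be normalised):

```python
def ch_divergence(mu, nu, grid=1000):
    """Chernoff-Hellinger divergence D_+(mu || nu):
    the maximum over t in [0, 1] of sum_x nu(x) * f_t(mu(x)/nu(x)),
    with f_t(y) = 1 - t + t*y - y**t, via a grid search over t."""
    def D(t):
        return sum(n_x * (1 - t + t * (m_x / n_x) - (m_x / n_x) ** t)
                   for m_x, n_x in zip(mu, nu))
    return max(D(i / grid) for i in range(grid + 1))

mu = [1.0, 4.0]   # hypothetical rows of diag(p)Q; positive entries
nu = [4.0, 1.0]
# For this symmetric pair, the maximum sits at t = 1/2, where D_t is
# half the squared Hellinger distance: (1/2) * sum (sqrt(mu)-sqrt(nu))^2.
print(ch_divergence(mu, nu))
```

For these mirrored vectors the grid search returns 1.0, matching the closed-form Hellinger value ½[(1−2)² + (2−1)²] = 1.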

Let SNR = (a − b)²/(k(a + (k − 1)b)); the condition SNR = 1 is the Kesten-Stigum (KS) threshold. The phase in which weak recovery is solvable is called the Kesten-Stigum phase. In the case of the symmetric SBM with two communities we have the following result:

Theorem 5.3 (Abbe [1]). For k = 2, weak recovery is not solvable if SNR ≤ 1 (i.e., (a − b)² ≤ 2(a + b)), and it is efficiently solvable if SNR > 1 (i.e., (a − b)² > 2(a + b)).

For the general SBM(n, p, Q/n), let P be the diagonal matrix with P_{i,i} = p_i for each i ∈ [k], and let λ_1, . . . , λ_h be the distinct eigenvalues of PQ in order of non-increasing magnitude.

Theorem 5.4 (Abbe [1]). Let k ∈ Z_+, let p ∈ (0, 1)^k be a probability distribution, let Q be a k × k symmetric matrix with nonnegative entries, and let G be drawn under SBM(n, p, Q/n). If SNR > 1, then there exist r ∈ Z_+, c > 0 and m : Z_+ → Z_+ such that ABP(G, m(n), r, c, (λ_1, . . . , λ_h)) solves detection and runs in O(n log(n)) time.

Here ABP is an algorithm which stands for acyclic belief propagation.

5.2 Planted bi-section model

This model is a special case of the symmetric stochastic block model in which the number of vertices is 2n, there are two communities of equal size n, and the community assignment vector is θ′ ∈ Θ′_n ⊆ {0, 1}^{2n}. Each vertex belongs to a community, and two vertices i and j are in the same community if θ′_i = θ′_j. Denote the random graph by X^n and the space in which it takes its values by X_n, represented by the adjacency matrix with entries {X_{ij} ∈ {0, 1} : 1 ≤ i, j ≤ 2n}. The edges between the vertices occur independently, with probabilities depending on the communities the vertices lie in. Thus the edge probability between vertices i and j is p_n if they are in the same community and q_n if they lie in different communities:
\[
P_{\theta,n}\big(X_{ij} = 1\big) = \begin{cases} p_n, & \text{if } \theta'_{n,i} = \theta'_{n,j}, \\ q_n, & \text{if } \theta'_{n,i} \neq \theta'_{n,j}. \end{cases} \tag{5.7}
\]
When the two probabilities are equal, i.e. p′ = p_n = q_n, there is no distinction between the two communities and the planted bi-section model collapses into the Erdos-Renyi random graph ER_{2n}(p′). Therefore, for community detection we need at least p_n ≠ q_n in order to distinguish the two communities. Suppose now that the probability between the communities is zero and the probability inside the communities is one; then we end up with two disconnected subgraphs of n vertices each, both complete graphs, see figure 5.1.

Figure 5.1: Example of a graph with 20 vertices and two disconnected communities. The coloured paths represent a way of detecting the communities.

Suppose now that the probability inside the communities is zero and the probability between the two communities is one; then the communities can be trivially detected by criss-crossing along the edges, see figure 5.2. Therefore, from now on we will assume that 0 < p_n, q_n < 1.

Figure 5.2: Example of a graph with 20 vertices and two communities which are internally not connected but are connected with each other. The coloured paths represent a way of detecting the communities.


Mossel, Neeman and Sly [24] provide necessary and sufficient conditions for the asymptotic recoverability of the planted bi-section model with p = p_n and q = q_n. They use different terminology for exact recovery and almost exact recovery, namely strong consistency and weak consistency respectively. The main result is a characterisation of the sequences p_n and q_n for which strongly consistent or weakly consistent estimators exist. They define the following probability:

Definition 5.4 (Mossel et al. [24]). Given n, m, p_n and q_n, let
\[
X \sim \operatorname{Bin}\big(m, \max\{p_n, q_n\}\big), \qquad Y \sim \operatorname{Bin}\big(n, \min\{p_n, q_n\}\big). \tag{5.8}
\]
Define
\[
P(m, n, p_n, q_n) = P(Y \ge X). \tag{5.9}
\]
When m = n we abbreviate P(n, p_n, q_n) = P(n, n, p_n, q_n).

Theorem 5.5 (Mossel et al. [24]). Consider sequences p_n and q_n in [0, 1]. There exists a strongly consistent estimator for the planted bi-section model if and only if P(n, p_n, q_n) = o(n^{-1}). There exists a weakly consistent estimator for the planted bi-section model if and only if P(n, p_n, q_n) → 0.
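The probability P(n, p_n, q_n) of definition 5.4 can be computed exactly from the two binomial laws by conditioning on X; a minimal sketch (helper names are ours):

```python
from math import comb

def binom_pmf(m, p, j):
    """P(Bin(m, p) = j)."""
    return comb(m, j) * p**j * (1 - p) ** (m - j)

def p_y_ge_x(n, p_in, p_out):
    """P(n, p, q) = P(Y >= X) for X ~ Bin(n, max(p, q)) and
    Y ~ Bin(n, min(p, q)), evaluated exactly via
    P(Y >= X) = sum_j P(X = j) P(Y >= j)."""
    hi, lo = max(p_in, p_out), min(p_in, p_out)
    tail_y = [0.0] * (n + 2)          # tail_y[j] = P(Y >= j)
    for j in range(n, -1, -1):
        tail_y[j] = tail_y[j + 1] + binom_pmf(n, lo, j)
    return sum(binom_pmf(n, hi, j) * tail_y[j] for j in range(n + 1))

# The "confusion" probability shrinks as p and q separate:
print(p_y_ge_x(50, 0.5, 0.5))   # identical parameters: slightly above 1/2
print(p_y_ge_x(50, 0.6, 0.2))   # well-separated parameters: tiny
```

Theorem 5.5 then says that the planted bi-section model admits weak consistency precisely when this quantity vanishes, and strong consistency precisely when it vanishes faster than 1/n.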

Furthermore, the authors characterise explicitly which of these sequences allow for strong consistency.

Proposition 5.1 (Mossel et al. [24]). Let p_n = a_n log(n)/n and q_n = b_n log(n)/n. If there exists a constant C > 0 such that C^{-1} ≤ a_n, b_n ≤ C for all but finitely many n, then P(n, p_n, q_n) = o(n^{-1}) if and only if
\[
\big(a_n + b_n - 2\sqrt{a_nb_n} - 1\big)\log(n) + \tfrac{1}{2}\log(\log(n)) \to \infty. \tag{5.10}
\]

For weak consistency a much simpler criterion holds.

Proposition 5.2 (Mossel et al. [24]). P(n, p_n, q_n) → 0 if and only if n(p_n − q_n)²/(p_n + q_n) → ∞.

Suppose now that there exists an algorithm that can estimate the community assignment. What is there to say about the accuracy and reliability of this estimate? Questions regarding accuracy have been answered by Zhang and Zhao [30], but uncertainty quantification with confidence sets for community assignment has not been addressed in the literature. From a Bayesian perspective, however, the posterior provides estimates of accuracy and credibility without further investigation. Kleijn and Waaij [22] emphasize this point. They show that it is justifiable to use posteriors if one is interested in more than just estimation, and we now present some of their results.


We discussed at the beginning of this section that when p_n = q_n the graph becomes the Erdos-Renyi random graph ER_{2n}(p_n), which leads to an identifiability issue for the class assignment θ_n ∈ Θ′_n. Another identifiability issue is that the model is invariant under interchange of the class labels 0 and 1. This means that we have an equivalence relation on Θ′_n defined by θ′_1 ∼_n θ′_2 if θ′_{1,n} = ¬θ′_{2,n}. To resolve this issue, Kleijn and Waaij parametrize the model by the quotient space Θ_n = Θ′_n/∼_n. The probability measure for X^n corresponding to parameter θ is denoted by P_{θ,n}. The goal is to reconstruct the unobserved class assignment vectors θ_n consistently, which means correctly with probability growing to one as n → ∞. There are strong and weak versions of this statement, as defined below.

Definition 5.5 (Kleijn & Waaij [22]). Let θ_{0,n} ∈ Θ_n be given. An estimator sequence θ̂_n : X_n → Θ_n is said to recover the class assignment θ_{0,n} exactly if
\[
P_{\theta_0,n}\big(\hat{\theta}_n(X^n) = \theta_{0,n}\big) \to 1, \tag{5.11}
\]
that is, if θ̂_n indicates the correct partition assignment with high probability.

Definition 5.6 (Kleijn & Waaij [22]). Let θ_{0,n} ∈ Θ_n be given. An estimator sequence θ̂_n : X_n → Θ_n is said to detect the class assignment θ_{0,n} if
\[
\frac{1}{2n}\bigg|\sum_{i=1}^{2n} (-1)^{\hat{\theta}_{n,i}}(-1)^{\theta_{0,n,i}}\bigg| \xrightarrow{\;P_{\theta_0,n}\;} 1, \tag{5.12}
\]
that is, if the fraction of correct assignments in θ̂_n grows to one with high probability.

They proceed with the Bayesian approach. Only uniform priors Π_n for θ_n ∈ Θ_n are considered, with π_n the corresponding probability mass function, and the posterior for θ_n is defined for all A ⊆ Θ_n by
\[
\Pi(A \mid X^n) = \frac{\sum_{\theta_n\in A} p_{\theta_n,n}(X^n)\,\pi_n(\theta_n)}{\sum_{\theta_n\in\Theta_n} p_{\theta_n,n}(X^n)\,\pi_n(\theta_n)}, \tag{5.13}
\]
with p_{θ_n,n}(X^n) the likelihood.

The sets of class assignment vectors that differ only by a given number of exchanges of pairs are of particular interest for exact recovery and detection. These sets are defined as follows: let θ′_{0,n} be the representative of the true parameter θ_{0,n} ∈ Θ_n and define Z(θ′_{0,n}) ⊆ [2n] to be class zero, i.e. Z(θ′_{0,n}) = {i ∈ [2n] : θ′_{0,i} = 0}, and class one by Z^c(θ′_{0,n}). The sets of interest are denoted by V′_{n,k} ⊆ Θ′_n and are defined to hold all assignment vectors θ′_n that differ from θ′_{0,n} by k exchanges of pairs. More formally, if we denote by Z(θ′_n) class zero with respect to θ′_n, then θ′_n ∈ V′_{n,k} if #(Z(θ′_{0,n}) \ Z(θ′_n)) = k. Let k′(θ′_{1,n}, θ′_{2,n}) be the minimal number of pair exchanges needed to take θ′_{1,n} into θ′_{2,n}.


Note that k′(θ′_{1,n}, ¬θ′_{2,n}) = n − k′(θ′_{1,n}, θ′_{2,n}), which leads to a distance measure between two representation classes:
\[
k(\theta_{1,n}, \theta_{2,n}) = k'(\theta'_{1,n}, \theta'_{2,n}) \wedge k'(\theta'_{1,n}, \neg\theta'_{2,n}). \tag{5.14}
\]
Let V_{n,k} = {θ_n : k(θ_n, θ_{0,n}) = k} = {θ_n : θ′_n ∈ V′_{n,k}}. Then, for exact recovery we are interested in the expected posterior masses of subsets of Θ_n of the form
\[
V_n = \{\theta_n \in \Theta_n : \theta_n \neq \theta_{0,n}\} = \bigcup_{k=1}^{\lfloor n/2\rfloor} V_{n,k}. \tag{5.15}
\]

The following theorem states a sufficient condition on p_n and q_n for exact recovery.

Theorem 5.6 (Kleijn & Waaij [22]). For some θ_{0,n} ∈ Θ_n, assume that X^n ∼ P_{θ_0,n} for every n ≥ 1. If we equip every Θ_n with its uniform prior and (p_n) and (q_n) are such that
\[
\Big(1 + \big(1 - p_n - q_n + 2p_nq_n + 2\sqrt{p_n(1-p_n)q_n(1-q_n)}\big)^{n/2}\Big)^{2n} \to 1, \tag{5.16}
\]
then the posterior succeeds in exact recovery, i.e.
\[
\Pi\big(\theta_n = \theta_{0,n} \mid X^n\big) \xrightarrow{\;P_{\theta_0,n}\;} 1. \tag{5.17}
\]

Thus when np_n = a_n log(n) and nq_n = b_n log(n) with a_n, b_n = O(1), one can show that
\[
\begin{aligned}
&\Big(1 + \big(1 - p_n - q_n + 2p_nq_n + 2\sqrt{p_n(1-p_n)q_n(1-q_n)}\big)^{n/2}\Big)^{2n} \\
&\qquad = \Big(1 + \Big(1 - \big(a_n + b_n - 2\sqrt{a_nb_n} + o\big(n^{-1}\log(n)\big)\big)\frac{\log(n)}{n}\Big)^{n/2}\Big)^{2n} \\
&\qquad \approx \Big(1 + n^{-\frac{1}{2}(a_n+b_n-2\sqrt{a_nb_n})}\Big)^{2n} = \Big(1 + \frac{1}{n}\,n^{-\frac{1}{2}(a_n+b_n-2\sqrt{a_nb_n}-2)}\Big)^{2n} \\
&\qquad \approx \exp\Big(2e^{-\frac{1}{2}(a_n+b_n-2\sqrt{a_nb_n}-2)\log(n)}\Big).
\end{aligned} \tag{5.18}
\]

Therefore, in the Chernoff-Hellinger phase it is sufficient that

(an + bn − 2√anbn − 2)log(n)→∞. (5.19)

which resembles the equation by Mossel, Neeman and Sly. The condition provided byMossel, Neeman and Sly holds only if there exist a C > 0 such that C−1 ≤ an, bn ≤ C

for large enough n, and it is slightly weaker. In equation (5.19) it is possible for one of the sequences (an) or (bn) to converge to zero, or for one of them to be zero from the beginning. For example, if bn = 0 and lim infn an > 2, the edges between the classes are absent and what remains are two subgraphs, each connected with high probability. This follows from the connectivity threshold in section 4.2, since each class contains n/2 vertices and (n/2)pn − log(n/2) ≥ (an/2 − 1) log(n) > 0 for large enough n. Similarly, if an = 0 and lim infn bn > 2, the edges within the clusters are absent and we are left with a graph in which the two communities are connected with each other with high probability, by the same reasoning.
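This splitting can be illustrated by simulation: with bn = 0 the model reduces to two disjoint Erdős–Rényi graphs on n/2 vertices each, and for an > 2 both halves should come out connected with high probability. A rough sketch, with illustrative parameter values and a breadth-first-search connectivity check (not code from the thesis):

```python
import math
import random
from collections import deque

def er_graph(nodes, p, rng):
    """Sample an Erdos-Renyi graph with edge probability p on the given vertex list."""
    adj = {v: [] for v in nodes}
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            if rng.random() < p:
                adj[nodes[i]].append(nodes[j])
                adj[nodes[j]].append(nodes[i])
    return adj

def connected(adj):
    """Breadth-first search check that the graph is connected."""
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(adj)

rng = random.Random(0)
n, a = 2000, 3.0                 # a > 2, so each half should be connected w.h.p.
p = a * math.log(n) / n          # within-community edge probability; q_n = 0
halves = [list(range(n // 2)), list(range(n // 2, n))]
print([connected(er_graph(h, p, rng)) for h in halves])
```

Each half has expected degree (a/2) log(n), comfortably above the connectivity threshold log(n/2) for a = 3, so the printed list is [True, True] with high probability.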

As for the case of detection, we only require that the estimated θn differs from θ0,n in o(n) positions. This relaxation makes it possible to detect the communities in the Kesten-Stigum phase. We are interested in the expected posterior masses of subsets of Θn of the form

\[
W_n = \bigcup_{k=k_n}^{\lfloor n/2 \rfloor} V_{n,k}, \tag{5.20}
\]

for a possibly divergent sequence kn of order o(n).

Theorem 5.7 (Kleijn & van Waaij [22]). For some θ0,n ∈ Θn, assume that Xn ∼ Pθ0,n for every n ≥ 1. If we equip every Θn with its uniform prior and (pn) and (qn) are such that,

\[
\frac{n}{k_n}\left(1 - p_n - q_n + 2p_nq_n + 2\sqrt{p_n(1-p_n)q_n(1-q_n)}\right)^{2n} \to 0, \tag{5.21}
\]

then,
\[
\Pi(W_n \mid X^n) \xrightarrow{P_{\theta_{0,n}}} 0, \tag{5.22}
\]

i.e. the posterior detects θ0,n at rate kn.

Corollary 5.1 (Kleijn & van Waaij [22]). Under the conditions of theorem 5.7, with pn and qn such that

\[
n\left(p_n + q_n - 2p_nq_n - 2\sqrt{p_n(1-p_n)q_n(1-q_n)}\right) \to \infty, \tag{5.23}
\]

as n → ∞, then there exists a sequence kn = o(n) such that,

\[
\Pi\left(k(\theta_n, \theta_{0,n}) \geq k_n \mid X^n\right) \xrightarrow{P_{\theta_{0,n}}} 0. \tag{5.24}
\]

Mossel, Neeman and Sly have shown in 5.2 that n(pn − qn)²/(pn + qn) → ∞ is a necessary condition for detection. Thus corollary 5.1 shows that the uniform priors lead to posteriors that detect the true parameter under the weakest possible conditions on the sequences pn and qn.
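The closeness of the two conditions can be checked numerically. In the sparse regime pn = a/n, qn = b/n, the quantity in (5.23) tends to (√a − √b)² and the Mossel–Neeman–Sly quantity tends to (a − b)²/(a + b): constants of the same order, so for fixed a and b neither diverges and detection requires them to grow. A small numeric sketch (the constants a and b are illustrative, not from the thesis):

```python
import math

def rho_term(n, p, q):
    """The quantity n*(p + q - 2*p*q - 2*sqrt(p*(1-p)*q*(1-q))) from (5.23)."""
    return n * (p + q - 2 * p * q - 2 * math.sqrt(p * (1 - p) * q * (1 - q)))

def mns_term(n, p, q):
    """The Mossel-Neeman-Sly quantity n*(p - q)**2 / (p + q)."""
    return n * (p - q) ** 2 / (p + q)

a, b = 5.0, 1.0                  # illustrative constants for p_n = a/n, q_n = b/n
for n in (10**3, 10**5, 10**7):
    p, q = a / n, b / n
    print(n, round(rho_term(n, p, q), 4), round(mns_term(n, p, q), 4))
```

For a = 5, b = 1 the two columns settle near (√5 − 1)² ≈ 1.528 and 16/6 ≈ 2.667 respectively.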


6 Popular summary

The World Wide Web (WWW) is the largest network built by humans. It consists of approximately 10¹² documents, which are linked to each other by Uniform Resource Locators (URLs). One can think of the links as random, and this impression is strengthened upon visualising the network. It is therefore reasonable to approximate large networks by a model that truly captures the randomness in the links. The models that do so are random graph models. These models describe a network in terms of graphs in which the edges occur with some probability.

Yet real networks are not random. So why use random graph models? The reason is that random graph models provide us with properties that can be used as a reference for studying real networks. An example of such a property is whether a network is connected, or whether it contains a giant connected component. This can be answered using random graph models, as they exhibit phase transitions related to connectedness and to the giant connected component. It is worth mentioning that random graph theory was not originally intended to describe real networks: it was invented by Paul Erdős and Alfréd Rényi to answer questions about properties of graphs.

The simplest random graph model is the Erdős–Rényi random graph model, in which the edges occur independently with some fixed probability. In this thesis we study the phase transitions of the Erdős–Rényi random graph model related to connectedness and to the emergence of the giant connected component. The phase transitions of a random graph with n vertices, in which every edge occurs independently with probability p, depend on the expected degree λ = np:

• If λ < 1, the largest connected component has a size of order log(n) with high probability.

• If λ = 1, the largest connected component has a size of order n^{2/3} with high probability.

• If λ > 1, the largest connected component has a size of order n with high probability and is called the giant component.

Thus p = 1/n is the critical value at which a phase transition occurs from a sparse regime to a regime in which a giant component resides.
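These three regimes can be observed empirically by sampling G(n, p) and measuring the largest connected component with a breadth-first search. A rough sketch (n and the values of λ are illustrative, and the output is random):

```python
import random
from collections import deque

def largest_component(n, p, rng):
    """Sample G(n, p) and return the size of its largest connected component."""
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    seen = [False] * n
    best = 0
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        size = 1
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if not seen[w]:
                    seen[w] = True
                    size += 1
                    queue.append(w)
        best = max(best, size)
    return best

rng = random.Random(1)
n = 1000
for lam in (0.5, 1.0, 2.0):  # expected degree below, at, and above the critical value 1
    print(lam, largest_component(n, lam / n, rng))
```

For n = 1000 one typically sees a largest component of a few dozen vertices at λ = 0.5, around n^{2/3} = 100 at λ = 1, and the giant component of roughly 0.8n at λ = 2.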


As for the phase transition for connectedness, we have the following results for every ε > 0:

• If λ < (1 − ε)log(n), the graph is disconnected with high probability.

• If λ > (1 + ε)log(n), the graph is connected with high probability.

Thus λ = log(n) is the critical value at which a phase transition from a disconnected graph to a connected graph takes place.
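This threshold can likewise be probed by simulation: sample G(n, p) repeatedly with expected degree (1 + ε) log(n) for ε below and above zero, and record the fraction of connected samples. A rough sketch (n, ε, and the number of repetitions are illustrative):

```python
import math
import random
from collections import deque

def is_connected_gnp(n, p, rng):
    """Sample G(n, p) and check connectivity by breadth-first search from vertex 0."""
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    seen = [False] * n
    seen[0] = True
    queue = deque([0])
    count = 1
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if not seen[w]:
                seen[w] = True
                count += 1
                queue.append(w)
    return count == n

rng = random.Random(2)
n, reps = 300, 20
for eps in (-0.5, 0.5):  # expected degree (1 + eps) * log(n), below and above the threshold
    p = (1 + eps) * math.log(n) / n
    frac = sum(is_connected_gnp(n, p, rng) for _ in range(reps)) / reps
    print(eps, frac)
```

Already for n = 300 the empirical fraction of connected samples is close to 0 below the threshold (isolated vertices persist) and close to 1 above it.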


Bibliography

[1] Abbe, E., Community detection and stochastic block models: recent developments, Cambridge University Press, 2016.

[2] Barabási, A.-L., Network Science, Cambridge University Press, 2016.

[3] Behrisch, M., Component evolution in random intersection graphs, The Electronic Journal of Combinatorics 14 (2007).

[4] Bollobás, B., Random Graphs, Cambridge University Press, 2001.

[5] Erdős, P., Rényi, A., On Random Graphs I, Publ. Math. Debrecen 6 (1959), 290-297.

[6] Erdős, P., Rényi, A., On the evolution of random graphs, Publ. Math. Inst. Hung. Acad. Sci. 5 (1960), 17-61.

[7] Erdős, P., Rényi, A., On the evolution of random graphs, Bull. Inst. Internat. Statist. 38 (1961), 343-347.

[8] Erdős, P., Rényi, A., On the strength of connectedness of a random graph, Acta Math. Acad. Sci. Hungar. 12 (1961), 261-267.

[9] Erdős, P., Rényi, A., Asymmetric graphs, Acta Math. Acad. Sci. Hung. 14 (1963), 295-315.

[10] Erdős, P., Rényi, A., On random matrices, Publ. Math. Inst. Hung. Acad. Sci. 8 (1964), 455-461.

[11] Erdős, P., Rényi, A., On the existence of a factor of degree one of a connected random graph, Acta Math. Acad. Sci. Hung. 17 (1968), 359-368.

[12] Erdős, P., Rényi, A., On random matrices II, Studia Sci. Math. Hung. 3 (1968), 459-464.

[13] Frieze, A., Karoński, M., Introduction to Random Graphs, Cambridge University Press, 2016.

[14] Galton, F., Watson, H.W., On the probability of the extinction of families, Journal of the Royal Anthropological Institute 4 (1875), 138-144.


[15] Caldarelli, G., Capocci, A., De Los Rios, P., Muñoz, M.A., Scale-free networks from varying vertex intrinsic fitness, Physical Review Letters 89(25), 2002.

[16] Gilbert, E.N., Random graphs, Annals of Mathematical Statistics 30(4) (1959), 1141-1144.

[17] Gilbert, E.N., Random plane networks, Journal of the Society for Industrial and Applied Mathematics 9(4) (1961), 533-543.

[18] Godehardt, E., Steinbach, J., On a lemma of P. Erdős and A. Rényi about random graphs, Publ. Math. Debrecen 28 (1981), 271-273.

[19] Harris, T.E., The Theory of Branching Processes, Springer-Verlag, 1963.

[20] Janson, S., Łuczak, T., Ruciński, A., Random Graphs, John Wiley & Sons, Inc., 2000.

[21] Karoński, M., Ruciński, A., The origins of the theory of random graphs, in: R.L. Graham et al. (eds.), The Mathematics of Paul Erdős I, Springer Science+Business Media, New York, 2013. DOI 10.1007/978-1-4614-7258-2_23.

[22] Kleijn, B.J.K., van Waaij, J., Recovery, detection, and confidence sets of communities in a sparse stochastic block model, arXiv:1810.09533v1, 2018.

[23] Lagerås, A.N., Lindholm, M., A note on the component structure in random intersection graphs with tunable clustering, The Electronic Journal of Combinatorics 15(1), N10, 2008.

[24] Mossel, E., Neeman, J., Sly, A., Consistency thresholds for the planted bisection model, Electron. J. Probab. 21 (2016), no. 21, 1-24.

[25] Newman, M.E.J., Networks: An Introduction, Oxford University Press, 2010.

[26] Penrose, M., Random Geometric Graphs, Oxford Studies in Probability, Volume 5, 2003.

[27] Rapoport, A., Solomonoff, R., Connectivity of random nets, Bulletin of Mathematical Biophysics 13 (1951).

[28] Singer-Cohen, K.B., Random intersection graphs, PhD thesis, Department of Mathematical Sciences, The Johns Hopkins University, 1995.

[29] Van der Hofstad, R., Random Graphs and Complex Networks, Volume I, Lecture notes, 2016.


[30] Zhang, A.Y., Zhou, H.H., Minimax rates of community detection in stochastic block models, Ann. Statist. 44(5) (2016), 2252-2280. DOI 10.1214/15-AOS1428.

[31] Zhao, J., Yağan, O., Gligor, V., Random intersection graphs and their applications in security, wireless communications, and social networks, 2015.
