
Ramanujan J (2010) 23: 371–396
DOI 10.1007/s11139-010-9280-z

Bentley’s conjecture on popularity toplist turnover under random copying

Kimmo Eriksson · Fredrik Jansson · Jonas Sjöstrand

Dedicated to George Andrews for his 70th birthday

Received: 30 June 2009 / Accepted: 15 October 2010 / Published online: 28 October 2010
© Springer Science+Business Media, LLC 2010

Abstract Bentley et al. studied the turnover rate in popularity toplists in a ‘random copying’ model of cultural evolution. Based on simulations of a model with population size $N$, list length $\ell$ and invention rate $\mu$, they conjectured a remarkably simple formula for the turnover rate: $\ell\sqrt{\mu}$. Here we study an overlapping generations version of the random copying model, which can be interpreted as a random walk on the integer partitions of the population size. In this model we show that the conjectured formula, after a slight correction, holds asymptotically.

Keywords Toplists · Random walk · Integer partitions · Moran model · Popularity distribution

Mathematics Subject Classification (2000) Primary 60G50 · 05A17

This research was supported by the CULTAPTATION project (European Commission contract FP6-2004-NEST-PATH-043434) and the Swedish Research Council.

K. Eriksson (✉) · F. Jansson · J. Sjöstrand
School of Communication, Culture and Communication, Mälardalen University, 721 23 Västerås, Sweden
e-mail: [email protected]

F. Jansson
e-mail: [email protected]

J. Sjöstrand
e-mail: [email protected]

K. Eriksson · F. Jansson · J. Sjöstrand
Centre for the Study of Cultural Evolution at Stockholm University, Stockholm, Sweden


1 Introduction

A pervasive phenomenon in modern culture are toplists like Top 100 Baby Names or the Billboard Top 200 Pop Chart. Mathematics is no exception; indeed, the present paper was partly inspired by Andrews and Berndt’s paper on the Top Ten Most Fascinating Formulas in Ramanujan’s Lost Notebook reporting the outcome of a popularity vote on these formulas among experts in the field [1]. Andrews and Berndt admit that their list would very likely look different in another week, and point to the corresponding phenomenon in pop charts: “it is the fate of popular songs to lose their popularity and fade off the charts” (p. 19). Here we will be concerned precisely with this phenomenon of turnover of toplists. By the turnover rate we will mean the number of entries that have gone off the list after a given time.

Our aim is to prove (a modified version of) a remarkable conjecture on turnover rates found by Bentley et al. [3]. These authors analyzed empirical data on the turnover of toplists of various things: baby names, pop albums, and dog breeds. Their data allowed them to study varying list lengths (like Top 10, Top 100, etc.), which yielded intriguing results: The turnover rate seemed to be approximately proportional to the list length and largely independent of the underlying population size.

In order to find a theoretical explanation for this empirical finding, Bentley et al. then simulated cultural evolution of the popularity distribution of cultural variants (say, pop songs) under a simple random copying model. Each individual of a new generation is assumed to copy the favorite song of a randomly drawn individual of the previous generation, but with a small probability $\mu$ the individual instead invents a new song. Simulations of this model gave results consistent with the empirical data, and the authors observed the following pattern (without any attempt at analytical verification), which we will refer to as Bentley’s conjecture.

Conjecture 1 (Bentley’s conjecture [3]) Under the above random copying model with $N$ individuals per generation, list length $\ell$, and invention rate $\mu$, the expected turnover rate of the toplist is very close to

$$\ell\sqrt{\mu} \tag{1}$$

for small $\mu$ and sufficiently large $N$ compared to $\ell$.

Bentley et al. made no attempt to pin down more precisely the assumptions and the result. Our aim in this paper is to come up with, and prove, a precise formulation of the result. First, we urge the reader to take a moment to appreciate the problem, since it seems to us to be quite novel. As we will discuss at the end, there are a number of well-known stochastic processes that can be interpreted as generating popularity distributions, but we have never before seen considered the question of how often the most popular elements get replaced. If Bentley’s conjecture is to be believed, then such questions may have very nice answers.

1.1 Comparison of two random copying models

We will now briefly discuss two extensively studied random copying models from mathematical population genetics, called the infinite alleles Wright–Fisher model and the infinite alleles Moran model. For all details, we refer to the book by Ewens [6].


Fig. 1 Results of simulations of Bentley’s model (dotted curve) and the IAM model (solid curve), for varying list length $\ell$, with fixed population size $N = 4000$ and invention rate $\mu = 0.02$. The simulation first runs for a number of steps so that a stationary distribution is approached. The average turnover rate is then computed as the mean of the turnover rate in the following 200 generations

Bentley’s model is equivalent to the infinite alleles Wright–Fisher model. In the present paper we will instead work with the infinite alleles Moran model (henceforth IAM). The IAM model differs from Bentley’s model only in the respect that generations are overlapping: Each timestep sees the death of a randomly chosen individual and the birth of a new individual who either inherits the variant of a randomly chosen parent or, with probability $\mu$, invents a new variant.
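The IAM time step described above can be sketched in Python as follows. This is only a minimal illustration, not the simulation code behind the figures: the function name `iam_step`, the integer labels for variants, and the choice to draw the parent uniformly among the $N - 1$ survivors (which agrees with the transition probabilities stated in Sect. 5) are assumptions made here.

```python
import random


def iam_step(population, mu, next_variant, rng):
    """Perform one IAM time step in place; return the next unused variant label.

    population   -- list holding the variant label of each individual
    mu           -- invention probability
    next_variant -- smallest label not yet used (hypothetical bookkeeping)
    rng          -- a random.Random instance
    """
    n = len(population)
    dead = rng.randrange(n)                # the individual that dies
    if rng.random() < mu:
        population[dead] = next_variant    # the newborn invents a new variant
        return next_variant + 1
    parent = rng.randrange(n - 1)          # uniform over the n - 1 survivors
    if parent >= dead:
        parent += 1
    population[dead] = population[parent]  # the newborn copies its parent
    return next_variant


rng = random.Random(1)
population = [0] * 50                      # start with one variant held by everyone
next_variant = 1
for _ in range(50 * 20):                   # 20 generations of N = 50 time steps
    next_variant = iam_step(population, 0.05, next_variant, rng)
```

The population size stays fixed because each step replaces exactly one individual, which is what makes a generation of $N$ time steps comparable to one non-overlapping generation.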

As described in [6], the two models often give pretty similar results (see also simulations in [8]). The IAM model seems on the whole to be more amenable to exact analysis. For instance, an exact expression for the stationary distribution is known for the IAM, whereas the same expression holds asymptotically for the infinite alleles Wright–Fisher model.

Simulations show that the two models also seem to behave similarly with respect to toplist turnover per generation, if we in the IAM model define a generation as $N$ time steps. Figure 1 shows the turnover rate per generation for varying list length $\ell$. Clearly, there are three regimes for both models. In the first regime (for short toplists), the two models give similar turnover rates with what looks like a linear dependence on $\ell$. The second regime has a slightly convex shape. In the third regime, the turnover rate is constant.

Both Bentley’s conjecture and the asymptotic analysis in the present paper apply to the first regime, which we will look into more carefully shortly. The third regime is trivially explained: When the list length is greater than the number of existing songs, then every newly invented song enters the list. Hence, the expected turnover rate in this regime is $N\mu$ in Bentley’s model (in the figure, $N\mu = 80$), whereas it is somewhat lower in the IAM model as a song may cease to exist in the same generation it was invented, thus not contributing to the turnover. The second regime awaits closer investigation.

Figure 2 shows the first regime for the IAM model, with varying list length and four different population sizes. We see that, in line with Bentley’s conjecture, the turnover rate is roughly independent of $N$ and roughly linear in $\ell$. A least squares fit of the four curves yields the linear expression $0.126\,\ell$, with $r^2 = 0.997$.

Finally, Bentley’s conjecture says that the turnover rate shall be proportional to the square root of the invention rate $\mu$. Figure 3 shows that this indeed seems to be the case as long as $\mu$ is not too close to 1. A least squares fit for the part of the curve where $\sqrt{\mu} \le 0.8$ yields the linear expression $1.037\sqrt{\mu}$ for the turnover rate, with $r^2 = 0.996$.

Fig. 2 Results of simulations of the IAM model, for varying $\ell$, four different values of $N$, and constant $\mu = 0.02$. Average turnover rate is computed as the mean turnover rate in 70 generations

Fig. 3 Results of simulations of the IAM model, with $N = 4000$ and $\mu \in [0, 1]$. The curve shows the mean turnover rate divided by list length, taken over all list lengths $\ell \in \{1, 2, \ldots, 100\}$

1.2 Outline of result and approach

The above simulation results indicate that Bentley’s conjecture applies to the IAM model as well. For this model, Strimling et al. [8] recently obtained an expression for the expected number of variants of popularity $k$ under the stationary distribution. Denoting this number by $f_k$, they proved that

$$f_k = \frac{\mu N (1-\mu)^{k-1}}{k} \prod_{i=1}^{k-1} \frac{N-i}{N-i-1+i\mu}. \tag{2}$$
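As a quick numerical illustration of formula (2), the following Python sketch evaluates $f_1, \ldots, f_N$ through the ratio of consecutive terms and checks that $\sum_k k f_k = N$, which must hold because every individual votes for exactly one song. The function name `expected_counts` and the parameter values are illustrative assumptions, and $\mu > 0$ is assumed so that no denominator vanishes.

```python
def expected_counts(N, mu):
    """Return [f_0, f_1, ..., f_N] with f_0 = 0, evaluated from formula (2)."""
    f = [0.0] * (N + 1)
    f[1] = mu * N                      # k = 1: the product in (2) is empty
    for k in range(1, N):
        # Ratio f_{k+1} / f_k read off from (2): one extra factor (1 - mu),
        # the prefactor 1/k replaced by 1/(k + 1), and the product term i = k.
        ratio = (1 - mu) * k / (k + 1) * (N - k) / (N - k - 1 + k * mu)
        f[k + 1] = f[k] * ratio
    return f


N, mu = 50, 0.1
f = expected_counts(N, mu)
total_votes = sum(k * f[k] for k in range(1, N + 1))
print(total_votes)                     # should agree with N up to rounding
```

Working with ratios avoids recomputing the product for every $k$, so the whole sequence costs $O(N)$ operations.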

This formula will be the starting point for our analysis.

It can be helpful to observe that the popularity distribution of cultural variants can be viewed as an integer partition of the population size $N$. When we talk about a song that has $k$ votes, one can think about a particular row of length $k$ of the Young diagram. Thus, the IAM model of cultural evolution can be interpreted as a random walk on the set of Young diagrams of $N$ squares, where each step involves the death of one random square and the birth of a new square, either through doubling of a random square or creation of a new row of a single square (and then possibly some reordering of the rows so that they are kept in order of decreasing length). In another paper [5] we explicitly use this interpretation to study limit shapes of integer partitions that evolve according to the IAM model (and related stochastic processes). In this paper we will instead use the terminology of songs and votes. The following key concepts will also be important.

– The popularity of a song is the number of votes for that song.
– The toplist is the set of the $\ell$ most popular songs. (This is not well-defined if two or more songs have the same popularity, but then we can use any rule to decide which of them should be included on the toplist as it does not matter for our analysis.)
– A generation consists of $N$ time steps.
– Let $\alpha > 0$ be some constant. The turnover rate is the number of songs that are on the toplist at a given time $t_0$ but not $\alpha$ generations later. (We will only be interested in the expected turnover rate which is independent of $t_0$ since we assume a stationary distribution.)
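The definitions of toplist and turnover above translate directly into code. The sketch below is illustrative only: the function names `toplist` and `turnover` and the string labels for songs are assumptions, and ties are broken by whatever order `Counter.most_common` returns, which is one of the arbitrary rules the definition permits.

```python
from collections import Counter


def toplist(population, ell):
    """Return the set of the ell most popular songs in a list of votes."""
    return {song for song, _ in Counter(population).most_common(ell)}


def turnover(before, after, ell):
    """Count songs on the toplist of `before` that are not on that of `after`."""
    return len(toplist(before, ell) - toplist(after, ell))


# Two snapshots of the same population of 11 voters.
before = ["a"] * 5 + ["b"] * 3 + ["c"] * 2 + ["d"]
after = ["a"] * 4 + ["b"] * 1 + ["c"] * 2 + ["d"] * 4
print(turnover(before, after, 2))      # song "b" has dropped out of the top 2
```

Taking a set difference counts departures only, matching the definition of turnover as the number of songs that leave the list.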

As it turns out, Bentley’s conjecture needs to be modified slightly. We will prove the following result:

Theorem 1 Let $\alpha > 0$ be any constant, suppose $N$, $\ell$, and $\mu$ satisfy Assumptions 1, 2 and 3 in Sect. 2, and let $\psi$ be defined as in Assumption 2. Then, under the stationary distribution of the IAM model, the expected turnover rate in $\alpha$ generations (i.e., $\alpha N$ timesteps) is

$$\sim \sqrt{\psi\alpha/\pi} \cdot \ell \cdot \sqrt{\mu \ln(N/\ell)}.$$
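To get a feeling for the size of the correction, the expression in Theorem 1 can be evaluated next to Bentley’s original $\ell\sqrt{\mu}$. The sketch below reads the expression as $\sqrt{\psi\alpha/\pi}\cdot\ell\cdot\sqrt{\mu\ln(N/\ell)}$ and replaces the limit in the definition of $\psi$ by its value at the given finite parameters; both the helper names and that finite-size proxy are assumptions made for illustration, and the theorem itself is only an asymptotic statement.

```python
import math


def psi_proxy(N, ell, mu):
    """Finite-size stand-in for psi = 1 - lim ln(1/mu) / ln(N/ell)."""
    return 1 - math.log(1 / mu) / math.log(N / ell)


def theorem1_turnover(N, ell, mu, alpha=1.0):
    """Evaluate sqrt(psi * alpha / pi) * ell * sqrt(mu * ln(N / ell))."""
    psi = psi_proxy(N, ell, mu)
    if not 0 < psi < 1:
        raise ValueError("parameters outside the range 0 < psi < 1")
    return math.sqrt(psi * alpha / math.pi) * ell * math.sqrt(mu * math.log(N / ell))


N, ell, mu = 10**6, 100, 0.01              # illustrative values only
print(theorem1_turnover(N, ell, mu))       # corrected expression
print(ell * math.sqrt(mu))                 # Bentley's original expression
```

The two expressions share the factor $\ell\sqrt{\mu}$; the correction is the slowly varying factor $\sqrt{\psi\alpha\ln(N/\ell)/\pi}$.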

The paper is organized as follows. First we define the notation we will use, and present the assumptions needed for the theorem (Sect. 2). We then derive some basic results (like expectations and variance) about the number of songs of a given popularity (Sect. 3), after which we proceed to examine what popularity it takes to qualify on the toplist (Sect. 4). In the most technical part, we analyze the random process describing how the popularity of a song changes over time (Sect. 5). We then integrate the previous results into a proof of the main theorem (Sect. 6). We conclude by a brief discussion of future directions of research.

2 Notation and assumptions

The input to our problem consists of the three variables $N$, $\ell$ and $\mu$ and the constant $\alpha$. To make the notation simple and clear in the following sections, we will think of the variables $N$, $\ell$ and $\mu$, and functions of those, as depending on a single free variable $\omega$. Thus, when we write e.g. $N/\ell \to \infty$ we mean that $N_\omega/\ell_\omega \to \infty$ when $\omega \to \infty$. However, the dependence on $\omega$ will always be invisible as we will drop the index and write e.g. $N$ instead of $N_\omega$.


We will use conventional ordo notation and the symbols $\ll$, $\sim$ and $\lesssim$.

– $A \ll B$ means that $A/B \to 0$.
– $A \sim B$ means that $A/B \to 1$.
– $A \lesssim B$ means that, for any $\varepsilon > 0$, eventually $|A/B| < 1 + \varepsilon$.

Our result requires that the variables $N$, $\ell$ and $\mu$ satisfy three assumptions that restrain their asymptotical behavior.

Assumption 1 $N/\ell \to \infty$.

Assumption 2 $\psi := 1 - \lim \dfrac{\ln(\mu^{-1})}{\ln(N/\ell)}$ exists, and $0 < \psi < 1$.

Assumption 3 $\mu\ell/\sqrt{\ln(N/\ell)} \to \infty$.

These assumptions easily imply the following basic asymptotical properties of our variables:

Corollary 1 The following holds.

\begin{align}
N &\to \infty, \tag{3} \\
\ell &\to \infty, \tag{4} \\
\mu \ln(N/\ell) &\to 0. \tag{5}
\end{align}

Proof We have

$$\frac{\ln\bigl((\mu \ln(N/\ell))^{-1}\bigr)}{\ln(N/\ell)} = \frac{\ln(\mu^{-1})}{\ln(N/\ell)} - \frac{\ln\ln(N/\ell)}{\ln(N/\ell)},$$

which tends to $1 - \psi$ by Assumptions 1 and 2. Thus, $\mu \ln(N/\ell) \to 0$. Clearly, this implies that $\mu \to 0$ which together with Assumption 3 yields that $\ell \to \infty$. Finally, by Assumption 1 we get $N \to \infty$. □

Finally, we will always assume that the IAM model has reached a stationary distribution (cf. [6, 8]).

3 The number of songs of a given popularity

For $1 \le m \le N$, let $X_m$ be the number of songs with popularity $m$ (assuming stationary distribution), which is a random variable. From (2) we already have an exact expression for the expected value $f_m = E(X_m)$. In this section we will examine $\operatorname{Var}(X_m)$ and $x_{m,n} := E(X_{m,n})$, where $X_{m,n}$ denotes the number of ordered pairs $(A, B)$ of distinct songs $A \ne B$ such that $A$ has popularity $m$ and $B$ has popularity $n$, for $1 \le m, n \le N$.


Starting with $x_{m,n}$, first note that

$$X_{m,n} = \begin{cases} X_m X_n & \text{if } m \ne n, \\ X_m (X_m - 1) & \text{if } m = n. \end{cases}$$

Lemma 1

$$x_{m,n} \le f_m f_n.$$

Proof For any $1 \le i, m \le N$, let $P(i \to m)$ denote the probability that a song with popularity $i$ will have popularity $m$ at the next time step. For $1 \le i, j, m, n \le N$, define $B_{m,n;i,j}$ and $c_{m,n}$ as follows (when applicable).

\begin{align*}
B_{m,n;m,n} &:= P(m \to m) + P(n \to n) - 1 \\
&\phantom{:}= 1 - \frac{m+n}{N} - a\,\frac{m+n}{N-1} + a\,\frac{m(2m-1) + n(2n-1)}{N(N-1)}, \\
B_{m,n;m+1,n} &:= P(m+1 \to m) = \frac{m+1}{N} - a\,\frac{m(m+1)}{N(N-1)}, \\
B_{m,n;m,n+1} &:= P(n+1 \to n) = \frac{n+1}{N} - a\,\frac{n(n+1)}{N(N-1)}, \\
B_{m,n;m-1,n} &:= P(m-1 \to m) = a\,\frac{m-1}{N-1} - a\,\frac{(m-1)^2}{N(N-1)}, \\
B_{m,n;m,n-1} &:= P(n-1 \to n) = a\,\frac{n-1}{N-1} - a\,\frac{(n-1)^2}{N(N-1)}, \\
c_{m,n} &:= \mu\,(f_n \delta_{m,1} + f_m \delta_{n,1}).
\end{align*}

All other $B_{m,n;i,j}$ are set to zero. Here, $a := 1 - \mu$, and $\delta_{m,1}$ is 1 if $m = 1$ and zero otherwise.

Since $f_m$ is the expected value of $X_m$ at the stationary distribution, we must have

\begin{align}
f_m &= \mu\delta_{m,1} + P(m+1 \to m) f_{m+1} + P(m-1 \to m) f_{m-1} + P(m \to m) f_m \tag{6} \\
&= \mu\delta_{m,1} + B_{m,n;m+1,n} f_{m+1} + B_{m,n;m-1,n} f_{m-1} + P(m \to m) f_m, \tag{7}
\end{align}

and, similarly,

\begin{align}
f_n &= \mu\delta_{n,1} + P(n+1 \to n) f_{n+1} + P(n-1 \to n) f_{n-1} + P(n \to n) f_n \tag{8} \\
&= \mu\delta_{n,1} + B_{m,n;m,n+1} f_{n+1} + B_{m,n;m,n-1} f_{n-1} + P(n \to n) f_n. \tag{9}
\end{align}

Multiplying (7) by $f_n$ and (9) by $f_m$ and then adding the resulting equations yields

$$2 f_m f_n = c_{m,n} + \Bigl(\sum_{1 \le i,j \le N} B_{m,n;i,j} f_i f_j\Bigr) + f_m f_n,$$

and after subtracting $f_m f_n$ we obtain

$$f_m f_n = c_{m,n} + \sum_{1 \le i,j \le N} B_{m,n;i,j} f_i f_j. \tag{10}$$


Now, for $1 \le i, j, m, n \le N$, define $R_{m,n;i,j}$ and $d_{m,n}$ as follows (when applicable).

\begin{align*}
R_{m,n;m,n} &:= \frac{-2mna}{N(N-1)}, \\
R_{m,n;m+1,n} &:= \frac{n(m+1)a}{N(N-1)}, \\
R_{m,n;m,n+1} &:= \frac{m(n+1)a}{N(N-1)}, \\
R_{m,n;m-1,n} &:= \frac{n(m-1)a}{N(N-1)}, \\
R_{m,n;m,n-1} &:= \frac{m(n-1)a}{N(N-1)}, \\
R_{m,n;m-1,n+1} &:= \frac{-(m-1)(n+1)a}{N(N-1)}, \\
R_{m,n;m+1,n-1} &:= \frac{-(n-1)(m+1)a}{N(N-1)}, \\
d_{m,n} &:= \frac{\mu}{N}\Bigl(\bigl(n f_n - (n+1) f_{n+1}\bigr)\delta_{m,1} + \bigl(m f_m - (m+1) f_{m+1}\bigr)\delta_{n,1}\Bigr).
\end{align*}

All other $R_{m,n;i,j}$ are set to zero. (Note that $f_{N+1} = 0$ by definition.)

Also, for any $1 \le i, j, m, n \le N$ such that $i + j \le N$, let $P((i,j) \to (m,n))$ denote the probability that two distinct songs with popularity $i$ and $j$ will have popularity $m$ and $n$, respectively, at the next time step. It is not difficult to check that, for any $1 \le i, j, m, n \le N$ such that $i + j \le N$, in fact

$$B_{m,n;i,j} - R_{m,n;i,j} = P\bigl((i,j) \to (m,n)\bigr). \tag{11}$$

Now, let us drop the assumption of stationary distribution for a while and instead see what happens if we start with one single song with popularity $N$ at time 0. Let $x^{(t)}_{m,n}$ denote the expected number of ordered pairs $(A, B)$ of songs at time $t$ such that $A$ has popularity $m$ and $B$ has popularity $n$. It follows from (11) that

$$x^{(t+1)}_{m,n} = c_{m,n} - d_{m,n} + \sum_{1 \le i,j \le N} (B_{m,n;i,j} - R_{m,n;i,j})\, x^{(t)}_{i,j}. \tag{12}$$

Let $r^{(t)}_{m,n} := f_m f_n - x^{(t)}_{m,n}$. Subtracting (12) from (10) yields

$$r^{(t+1)}_{m,n} = d_{m,n} + \sum_{1 \le i,j \le N} R_{m,n;i,j} f_i f_j + \sum_{1 \le i,j \le N} (B_{m,n;i,j} - R_{m,n;i,j})\, r^{(t)}_{i,j}. \tag{13}$$

At time 0 there is only one song, so $x^{(0)}_{m,n} = 0$ and thus $r^{(0)}_{m,n} = f_m f_n > 0$ for all $m$ and $n$. Suppose $r^{(t)}_{i,j} \ge 0$ and let us show that $r^{(t+1)}_{i,j} \ge 0$.

First, note that $r^{(t+1)}_{m,n} = f_m f_n > 0$ automatically when $m + n > N$, so in the following we will assume that $m + n \le N$. If $m + n$ is strictly less than $N$, $B_{m,n;i,j} - R_{m,n;i,j}$ either vanishes or equals $P((i,j) \to (m,n)) \ge 0$. When $m + n = N$ it can easily be checked that $B_{m,n;i,j} - R_{m,n;i,j} \ge 0$ assuming that $\mu N \ge 1$ which follows from Assumptions 1 and 2. Thus, the last sum in (13) is nonnegative. Using again that $\mu N \ge 1$, we have $k f_k > (k+1) f_{k+1}$ for all $k$, which implies that $d_{m,n} \ge 0$. It remains to examine the sum $\sum_{1 \le i,j \le N} R_{m,n;i,j} f_i f_j$. But this sum can be written as

\begin{align}
\sum_{1 \le i,j \le N} R_{m,n;i,j} f_i f_j = \frac{a}{N(N-1)} \Bigl[ &\bigl((m-1) f_{m-1} - m f_m\bigr)\bigl(n f_n - (n+1) f_{n+1}\bigr) \notag \\
&+ \bigl((n-1) f_{n-1} - n f_n\bigr)\bigl(m f_m - (m+1) f_{m+1}\bigr) \Bigr], \tag{14}
\end{align}

which is nonnegative if $m, n > 1$. If $m = 1$ or $n = 1$, then we must check that $d_{m,n} + \sum_{1 \le i,j \le N} R_{m,n;i,j} f_i f_j \ge 0$, and this is true if $\mu N \ge 1$.

Thus, we have proved by induction that $r^{(t)}_{m,n} \ge 0$ for all $1 \le m, n \le N$ and all $t \ge 0$. Since the process approaches the unique stationary distribution, it follows that $f_m f_n - x_{m,n} \ge 0$ for all $m$ and $n$. □

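Proposition 1

$$\operatorname{Var}(X_m) \le f_m.$$

Proof By Lemma 1 we have

\begin{align*}
\operatorname{Var}(X_m) &= E(X_m^2) - E(X_m)^2 = E\bigl(X_m(X_m - 1)\bigr) + E(X_m) - E(X_m)^2 \\
&\le f_m^2 + E(X_m) - E(X_m)^2 = f_m. \qquad \square
\end{align*}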

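Proposition 2

$$\operatorname{Var}\Bigl(\sum_{k=m}^{N} X_k\Bigr) \le \sum_{k=m}^{N} f_k.$$

Proof

\begin{align}
\operatorname{Var}\Bigl(\sum_{k=m}^{N} X_k\Bigr) &= \sum_{m \le i,j \le N} \bigl(E(X_i X_j) - E(X_i) E(X_j)\bigr) \tag{15} \\
&= \sum_{m \le i,j \le N} (x_{i,j} - f_i f_j) + \sum_{k=m}^{N} f_k \tag{16} \\
&\le \sum_{k=m}^{N} f_k, \tag{17}
\end{align}

where the last inequality follows from Lemma 1. □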


4 What does it take to qualify on the toplist?

In this section we will analyze the asymptotics of the number of votes that are needed to qualify on the toplist. To this end we will shortly define and examine two related variables, $\hat K$ and $\hat k$. First, we need a couple of lemmas.

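Lemma 2 For any $m$,

$$f_m \lesssim \mu N (1-\mu)^m / m,$$

and for $m = O\bigl(\frac{\ln(N/\ell)}{\mu}\bigr)$,

$$f_m \sim \mu N (1-\mu)^m / m.$$

Proof By (2),

$$f_m = \frac{\mu N (1-\mu)^{m-1}}{m} \prod_{i=1}^{m-1} \frac{N-i}{N-1-(1-\mu)i},$$

so it suffices to show that the product

$$H := \prod_{i=1}^{m-1} \frac{N-i}{N-1-(1-\mu)i}$$

is asymptotically smaller than 1, and that it tends to 1 if $m = O\bigl(\frac{\ln(N/\ell)}{\mu}\bigr)$. It is easily verified that $\frac{N-i}{N-1-(1-\mu)i}$ is a decreasing function of $i$ and that it is smaller than 1 if $i > 1/\mu$. Thus we obtain upper and lower bounds for $H$ as follows.

$$H \le \Bigl(\frac{N-1}{N-2+\mu}\Bigr)^{\lfloor 1/\mu \rfloor} \le \Bigl(1 + \frac{2-\mu}{N-2+\mu}\Bigr)^{\lfloor 1/\mu \rfloor} \sim \exp\Bigl(\frac{2}{\mu N}\Bigr) \to 1$$

by Assumptions 1 and 3, and

\begin{align*}
H &\ge \Bigl(\frac{N-(m-1)}{N-1-(1-\mu)(m-1)}\Bigr)^{m-1} = \Bigl(1 + \frac{1-\mu(m-1)}{N-1-(1-\mu)(m-1)}\Bigr)^{m-1} \\
&\sim \exp\Bigl(\frac{(1-\mu(m-1))(m-1)}{N-1-(1-\mu)(m-1)}\Bigr),
\end{align*}

which tends to 1 if $m = O\bigl(\frac{\ln(N/\ell)}{\mu}\bigr)$ since

$$\frac{(\ln(N/\ell))^2}{\mu N} = \frac{1}{\mu\ell} \cdot \frac{(\ln(N/\ell))^2}{N/\ell} = o\bigl(\mu^{-1}\ell^{-1}\bigr) \to 0$$

by Assumption 3. □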

For $1 \le m \le N$, let $S_m = \sum_{k=m}^{N} X_k$ be the number of songs with popularity at least $m$.


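Lemma 3 If $\mu^{-1} \ll m = O\bigl(\frac{\ln(N/\ell)}{\mu}\bigr)$ then

$$E(S_m) = \sum_{k=m}^{N} f_k \sim f_m/\mu.$$

Proof Let $\varphi$ be an integer variable such that $\mu^{-1} \ll \varphi \ll m$. First we divide the sum into two terms:

$$\sum_{k=m}^{N} f_k = f_m \Bigl( \underbrace{\sum_{k=m}^{m+\varphi} f_k/f_m}_{A} + \underbrace{\sum_{k=m+\varphi+1}^{N} f_k/f_m}_{B} \Bigr).$$

By Lemma 2 and the assumption on $\varphi$, we have

$$A \sim \sum_{k=m}^{m+\varphi} \frac{m}{k}(1-\mu)^{k-m} \sim \sum_{k=m}^{m+\varphi} (1-\mu)^{k-m} = \frac{1-(1-\mu)^{\varphi+1}}{\mu} \sim \frac{1-e^{-\varphi\mu}}{\mu} \sim \frac{1}{\mu}.$$

Once again by Lemma 2 and the assumption on $\varphi$, we have

$$B \lesssim \sum_{k=m+\varphi+1}^{N} \frac{m}{k}(1-\mu)^{k-m} < \sum_{k=m+\varphi+1}^{\infty} (1-\mu)^{k-m} = \frac{(1-\mu)^{\varphi+1}}{\mu} \sim \frac{e^{-\varphi\mu}}{\mu} = o(1/\mu). \tag{18}$$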

Let $\hat K$ be the popularity of the $\ell$th most popular song, i.e. $\hat K$ is the largest integer such that $S_{\hat K} \ge \ell$. In other words, $\hat K$ is the number of votes needed to qualify on the toplist. In order to estimate $\hat K$ we will study the related measure $\hat k$, defined as the largest integer such that $E(S_{\hat k}) \ge \ell$. Below we first determine the asymptotics of $\hat k$ (Proposition 3). We will then compute the probability of a large deviation of $\hat K$ from $\hat k$ (Proposition 5).

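Proposition 3 We have

$$\hat k \sim \psi\,\frac{\ln(N/\ell)}{\mu}.$$

Proof By Lemma 3, it suffices to show that $f_{(\psi+\varepsilon)\mu^{-1}\ln(N/\ell)} \ll \mu\ell$ and $f_{(\psi-\varepsilon)\mu^{-1}\ln(N/\ell)} \gg \mu\ell$ for any (sufficiently small) fixed $\varepsilon > 0$.

By Lemma 2, we have

$$\frac{f_{(\psi\pm\varepsilon)\mu^{-1}\ln(N/\ell)}}{\mu\ell} \sim \frac{1}{\mu\ell} \cdot \frac{\mu N (1-\mu)^{(\psi\pm\varepsilon)\mu^{-1}\ln(N/\ell)}}{(\psi\pm\varepsilon)\mu^{-1}\ln(N/\ell)} =: A_{\pm\varepsilon}.$$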


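Taking the logarithm yields

\begin{align*}
\ln A_{\pm\varepsilon} &= -\ln(\mu^{-1}) + \ln(N/\ell) + (\psi\pm\varepsilon)\mu^{-1}\ln(N/\ell)\ln(1-\mu) \\
&\quad - \ln(\psi\pm\varepsilon) - \ln\ln(N/\ell) \\
&= -\ln(\mu^{-1}) + \ln(N/\ell) - \bigl(1+o(1)\bigr)(\psi\pm\varepsilon)\ln(N/\ell) \\
&\quad - \ln(\psi\pm\varepsilon) - \ln\ln(N/\ell) \\
&= \{\text{Assumption 2}\} \\
&= -\bigl(1-\psi+o(1)\bigr)\ln(N/\ell) + \ln(N/\ell) \\
&\quad - \bigl(1+o(1)\bigr)(\psi\pm\varepsilon)\ln(N/\ell) - \ln(\psi\pm\varepsilon) - \ln\ln(N/\ell) \\
&= \bigl(\mp\varepsilon+o(1)\bigr)\ln(N/\ell) - \ln(\psi\pm\varepsilon) - \ln\ln(N/\ell) \to \mp\infty.
\end{align*}

Thus, $A_{+\varepsilon}$ tends to zero and $A_{-\varepsilon}$ tends to infinity. □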
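The threshold studied in Proposition 3, defined as the largest integer $k$ with $E(S_k) \ge \ell$, can also be computed exactly from formula (2) for finite parameters. The sketch below does so and prints the asymptotic value $\psi\ln(N/\ell)/\mu$ next to it, with $\psi$ replaced by its finite-size value. The function names and parameter values are illustrative assumptions, and since the proposition is asymptotic, close agreement at moderate $N$ should not be expected.

```python
import math


def expected_counts(N, mu):
    """Return [f_0, ..., f_N] with f_0 = 0, evaluated from formula (2)."""
    f = [0.0] * (N + 1)
    f[1] = mu * N
    for k in range(1, N):
        f[k + 1] = f[k] * (1 - mu) * k / (k + 1) * (N - k) / (N - k - 1 + k * mu)
    return f


def threshold(N, ell, mu):
    """Largest k with E(S_k) >= ell, where E(S_k) = f_k + ... + f_N."""
    f = expected_counts(N, mu)
    tail = 0.0
    for k in range(N, 0, -1):          # accumulate the tail sum from the top
        tail += f[k]
        if tail >= ell:
            return k
    raise ValueError("ell exceeds the expected number of songs")


N, ell, mu = 4000, 10, 0.02            # the parameters of Fig. 1, with a Top 10 list
threshold_k = threshold(N, ell, mu)
psi = 1 - math.log(1 / mu) / math.log(N / ell)   # finite-size proxy for psi
print(threshold_k, psi * math.log(N / ell) / mu)
```

Scanning downward from $k = N$ works because the tail sum $E(S_k)$ can only grow as $k$ decreases, so the first $k$ at which it reaches $\ell$ is the largest such $k$.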

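Proposition 4

$$f_{\hat k + o(\mu^{-1})} \sim \mu\ell.$$

Proof From Lemma 3 and the definition of $\hat k$ it follows that $f_{\hat k} \sim \mu\ell$. Then Lemma 2 and Proposition 3 tell us that

$$f_{\hat k + o(\mu^{-1})} \sim f_{\hat k}\,\frac{\hat k}{\hat k + o(\mu^{-1})}\,(1-\mu)^{o(\mu^{-1})} \sim f_{\hat k}. \qquad \square$$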

Proposition 5 Suppose $0 < \rho = o(\mu^{-1})$. Then the following holds:

$$P\bigl(|\hat K - \hat k| > \rho\bigr) \lesssim 2\rho^{-2}\mu^{-2}\ell^{-1}.$$

Proof Without loss of generality, we can assume that $\rho$ is an integer. It follows from the definition of $\hat K$ that

$$P(\hat K - \hat k > \rho) \le P(S_{\hat k+\rho} > \ell)$$

and

$$P(\hat k - \hat K > \rho) \le P(S_{\hat k-\rho} < \ell).$$

Due to the assumption on $\rho$, Proposition 4 tells us that

$$E(S_{\hat k\pm\rho}) = E(S_{\hat k}) \mp \sum_{i=1}^{\rho} f_{\hat k\pm i} \sim \ell \mp \rho\mu\ell,$$

and we also have

\begin{align*}
\operatorname{Var}(S_{\hat k\pm\rho}) &\le \{\text{Proposition 2}\} \le E(S_{\hat k\pm\rho}) \\
&\sim \{\text{Lemma 3}\} \sim f_{\hat k\pm\rho}/\mu \sim \{\text{Proposition 4}\} \sim \ell.
\end{align*}

Finally, combining the observations above, we obtain

\begin{align*}
P(S_{\hat k\pm\rho} \gtrless \ell) &\le P\bigl(\bigl|S_{\hat k\pm\rho} - E(S_{\hat k\pm\rho})\bigr| > \bigl|\ell - E(S_{\hat k\pm\rho})\bigr|\bigr) \\
&\le \{\text{Chebyshev’s inequality}\} \\
&\le \frac{\operatorname{Var}(S_{\hat k\pm\rho})}{|\ell - E(S_{\hat k\pm\rho})|^2} \sim \frac{\ell}{(\rho\mu\ell)^2}. \qquad \square
\end{align*}

5 How the popularity of a song changes over time

Define the probabilities

$$p_{\mathrm{left}}(k) := \frac{k}{N}\Bigl(1 - (1-\mu)\frac{k-1}{N-1}\Bigr)$$

and

$$p_{\mathrm{right}}(k) := \Bigl(1 - \frac{k}{N}\Bigr)(1-\mu)\frac{k}{N-1}$$

and $p_{\mathrm{stay}}(k) := 1 - p_{\mathrm{left}}(k) - p_{\mathrm{right}}(k)$. It follows from the definition of the IAM model that $p_{\mathrm{left}}(k)$ and $p_{\mathrm{right}}(k)$ are the probabilities for a song with $k$ votes to lose and gain a vote, respectively, in the next time step; $p_{\mathrm{stay}}(k)$ is the probability that the number of votes for this song is not affected in this step.
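The three probabilities are easy to tabulate, and doing so makes the drift of a song’s popularity visible: the difference $p_{\mathrm{right}}(k) - p_{\mathrm{left}}(k)$ simplifies algebraically to $-\mu k/N$, so every song loses votes on average, at a rate proportional to its popularity. The Python sketch below checks this numerically; the function names and the parameter values are illustrative assumptions.

```python
def p_left(k, N, mu):
    """Probability that a song with k votes loses a vote in one time step."""
    return k / N * (1 - (1 - mu) * (k - 1) / (N - 1))


def p_right(k, N, mu):
    """Probability that a song with k votes gains a vote in one time step."""
    return (1 - k / N) * (1 - mu) * k / (N - 1)


def p_stay(k, N, mu):
    """Probability that the vote count of the song is unchanged."""
    return 1 - p_left(k, N, mu) - p_right(k, N, mu)


N, mu = 4000, 0.02
for k in (1, 80, 2000, N):
    drift = p_right(k, N, mu) - p_left(k, N, mu)
    print(k, drift, -mu * k / N)       # the last two columns should agree
```

At $k = N$ the song cannot gain a vote, and the code reflects this because the factor $1 - k/N$ vanishes.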

Given a positive integer $k \le N$, define a random integer sequence $(K^{(k)}_t)_{t=0}^{\infty}$ as follows. Put $K^{(k)}_0 := k$ and, assuming that $K^{(k)}_0, K^{(k)}_1, \ldots, K^{(k)}_{t-1}$ have already been defined, let $K^{(k)}_t := K^{(k)}_{t-1} - 1$ with probability $p_{\mathrm{left}}(K^{(k)}_{t-1})$ and $K^{(k)}_t := K^{(k)}_{t-1} + 1$ with probability $p_{\mathrm{right}}(K^{(k)}_{t-1})$; otherwise, $K^{(k)}_t := K^{(k)}_{t-1}$. Thus, the sequence $(K^{(k)}_t)_{t=0}^{\infty}$ describes the evolution of the popularity of a song that has $k$ votes from the beginning.

We will be interested in assessing the evolution of the random process after $\alpha$ generations, that is, $\alpha N$ time steps. In order to get a grip on this, we will define and examine three other random integer sequences derived from $(K^{(k)}_t)_{t=0}^{\infty}$. We begin by a brief overview, saving the details until later. For $t = 1, 2, \ldots$:

– $U^{(k)}_t$ is basically $K^{(k)}_t - K^{(k)}_{t-1}$, but adjusted in such a way that $P(U^{(k)}_t = -1) = p_{\mathrm{left}}(k)$ and $P(U^{(k)}_t = 1) = p_{\mathrm{right}}(k)$;
– $V^{(k)}_t$ is the said adjustment, i.e., $V^{(k)}_t := K^{(k)}_t - K^{(k)}_{t-1} - U^{(k)}_t$;
– $V^{(k,\delta)}_t$ is basically $V^{(k)}_t$, but adjusted to zero if $|K^{(k)}_{t-1} - k| > \delta$.

For notational convenience, we define symbols for the sums of the first $\alpha N$ elements of these sequences: $U^{(k)} := \sum_{t=1}^{\alpha N} U^{(k)}_t$, $V^{(k)} := \sum_{t=1}^{\alpha N} V^{(k)}_t$ and $V^{(k,\delta)} := \sum_{t=1}^{\alpha N} V^{(k,\delta)}_t$.

With this notation, we can express the total change over $\alpha$ generations to the popularity of a song that starts with $k$ votes as

$$K^{(k)}_{\alpha N} - k = U^{(k)} + V^{(k)}.$$

The purpose of this is that $U^{(k)}$ approximates the total change by assuming that the probabilities for going left and right are constant, and $V^{(k)}$ is the total adjustment one must make to that approximation.

Our aim in this section will be to prove that it is unlikely that the total change is large (Proposition 6), and unlikely that the total adjustment is large (Proposition 7). To achieve this, we must first examine the sequences and then their sums.

5.1 The $U^{(k)}_t$ sequence

Define the random integer sequence $(U^{(k)}_t)_{t=1}^{\infty}$ as follows.

– If $K^{(k)}_t < K^{(k)}_{t-1}$, then we put $U^{(k)}_t = -1$ with probability $\min\{1, p_{\mathrm{left}}(k)/p_{\mathrm{left}}(K^{(k)}_{t-1})\}$, otherwise $U^{(k)}_t = 0$.
– If $K^{(k)}_t > K^{(k)}_{t-1}$, then we put $U^{(k)}_t = 1$ with probability $\min\{1, p_{\mathrm{right}}(k)/p_{\mathrm{right}}(K^{(k)}_{t-1})\}$, otherwise $U^{(k)}_t = 0$.
– If $K^{(k)}_t = K^{(k)}_{t-1}$, then we put $U^{(k)}_t = -1$ with probability

$$\max\Bigl\{0, \frac{p_{\mathrm{left}}(k) - p_{\mathrm{left}}(K^{(k)}_{t-1})}{p_{\mathrm{stay}}(K^{(k)}_{t-1})}\Bigr\},$$

$U^{(k)}_t = 1$ with probability

$$\max\Bigl\{0, \frac{p_{\mathrm{right}}(k) - p_{\mathrm{right}}(K^{(k)}_{t-1})}{p_{\mathrm{stay}}(K^{(k)}_{t-1})}\Bigr\},$$

and $U^{(k)}_t = 0$ otherwise.

In this way, $U^{(k)}_1, U^{(k)}_2, \ldots$ become independent identically distributed random variables, with $P(U^{(k)}_t = -1) = p_{\mathrm{left}}(k)$ and $P(U^{(k)}_t = 1) = p_{\mathrm{right}}(k)$.


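5.2 The $V^{(k)}_t$ sequence

Define the random integer sequence $(V^{(k)}_t)_{t=1}^{\infty}$ by $V^{(k)}_t := K^{(k)}_t - K^{(k)}_{t-1} - U^{(k)}_t$. Observe that

\begin{align}
P\bigl(V^{(k)}_t = -1 \mid K^{(k)}_{t-1} = \tilde k\bigr)
&= \max\{0, p_{\mathrm{left}}(\tilde k) - p_{\mathrm{left}}(k)\} + \max\{0, p_{\mathrm{right}}(k) - p_{\mathrm{right}}(\tilde k)\} \notag \\
&= \max\Bigl\{0, \frac{\tilde k - k}{N}\Bigl(1 - (1-\mu)\frac{\tilde k + k - 1}{N-1}\Bigr)\Bigr\} \notag \\
&\quad + \max\Bigl\{0, \Bigl(1 - \frac{\tilde k + k}{N}\Bigr)(1-\mu)\frac{k - \tilde k}{N-1}\Bigr\} \notag \\
&\le 2|\tilde k - k|/N \tag{19}
\end{align}

and

\begin{align}
P\bigl(V^{(k)}_t = +1 \mid K^{(k)}_{t-1} = \tilde k\bigr)
&= \max\{0, p_{\mathrm{right}}(\tilde k) - p_{\mathrm{right}}(k)\} + \max\{0, p_{\mathrm{left}}(k) - p_{\mathrm{left}}(\tilde k)\} \notag \\
&= \max\Bigl\{0, \Bigl(1 - \frac{\tilde k + k}{N}\Bigr)(1-\mu)\frac{\tilde k - k}{N-1}\Bigr\} \notag \\
&\quad + \max\Bigl\{0, \frac{k - \tilde k}{N}\Bigl(1 - (1-\mu)\frac{\tilde k + k - 1}{N-1}\Bigr)\Bigr\} \notag \\
&\le 2|\tilde k - k|/N. \tag{20}
\end{align}

We will also need that

\begin{align}
&P\bigl(V^{(k)}_t = +1 \mid K^{(k)}_{t-1} = \tilde k\bigr) - P\bigl(V^{(k)}_t = -1 \mid K^{(k)}_{t-1} = \tilde k\bigr) \notag \\
&\qquad = \bigl(p_{\mathrm{right}}(\tilde k) - p_{\mathrm{right}}(k)\bigr) - \bigl(p_{\mathrm{left}}(\tilde k) - p_{\mathrm{left}}(k)\bigr) = \mu(k - \tilde k)/N. \tag{21}
\end{align}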

5.3 The $V^{(k,\delta)}_t$ sequence

For any positive integer $\delta$, define the random integer sequence $(V^{(k,\delta)}_t)_{t=1}^{\infty}$ by $V^{(k,\delta)}_t = V^{(k)}_t$ if $|K^{(k)}_{t-1} - k| \le \delta$ and $V^{(k,\delta)}_t = 0$ otherwise. From (19), (20) and (21) it follows that

\begin{align}
P\bigl(V^{(k,\delta)}_t = -1\bigr) &\le 2\delta/N, \tag{22} \\
P\bigl(V^{(k,\delta)}_t = +1\bigr) &\le 2\delta/N, \tag{23} \\
\bigl|E\bigl(V^{(k,\delta)}_t\bigr)\bigr| &\le \mu\delta/N, \tag{24} \\
\bigl|E\bigl((V^{(k,\delta)}_t)^2\bigr)\bigr| &\le 4\delta/N. \tag{25}
\end{align}

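5.4 Analysis of $V^{(k,\delta)}$

The purpose of defining $V^{(k,\delta)}$ is that it is easier to analyze than $V^{(k)}$. Before showing that it is unlikely that $V^{(k,\delta)}$ is large, we shall find bounds for its first and second moments.

From (24), we obtain

$$\bigl|E\bigl(V^{(k,\delta)}\bigr)\bigr| \le \sum_{t=1}^{\alpha N} \bigl|E\bigl(V^{(k,\delta)}_t\bigr)\bigr| \le \alpha\mu\delta. \tag{26}$$

It follows from (21) that $|E(V^{(k,\delta)}_t \mid \text{“event”})| \le \mu\delta/N$ for any “event” that takes place at an earlier point in time than $t$. Using this and (22), (23) and (24), we conclude that, if $1 \le s < t$,

\begin{align*}
\bigl|E\bigl(V^{(k,\delta)}_s V^{(k,\delta)}_t\bigr)\bigr| &\le P\bigl(V^{(k,\delta)}_s = 1\bigr) \cdot \bigl|E\bigl(V^{(k,\delta)}_t \mid V^{(k,\delta)}_s = 1\bigr)\bigr| \\
&\quad + P\bigl(V^{(k,\delta)}_s = -1\bigr) \cdot \bigl|E\bigl(V^{(k,\delta)}_t \mid V^{(k,\delta)}_s = -1\bigr)\bigr| \\
&\le \frac{2\delta}{N} \cdot \frac{\mu\delta}{N} + \frac{2\delta}{N} \cdot \frac{\mu\delta}{N} \\
&= 4\mu\delta^2/N^2.
\end{align*}

Combining this with (25), we obtain

\begin{align}
E\bigl((V^{(k,\delta)})^2\bigr) &= \sum_{t=1}^{\alpha N} E\bigl((V^{(k,\delta)}_t)^2\bigr) + 2\sum_{1 \le s < t \le \alpha N} E\bigl(V^{(k,\delta)}_s V^{(k,\delta)}_t\bigr) \tag{27} \\
&\le 4\alpha\delta + 4\alpha^2\delta^2\mu. \tag{28}
\end{align}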

Lemma 4 Suppose $k = O(\hat k)$ and $0 < \delta = O(\hat k)$. Then, the following holds for $d > 0$:

$$P\bigl(|V^{(k,\delta)}| > d\bigr) = O\bigl(\delta d^{-2}(1 + \delta\mu)\bigr).$$

Proof If $d = O(\mu\delta)$, then we would have $\delta d^{-2}(1 + \delta\mu) \to \infty$ by the assumption on $\delta$ together with Corollary 1. Thus, without loss of generality, we may assume that $d \gg \mu\delta$ and hence $d > |E(V^{(k,\delta)})|$ by (26). Then the following holds.

\begin{align}
P\bigl(|V^{(k,\delta)}| > d\bigr) &\le P\bigl(|V^{(k,\delta)} - E(V^{(k,\delta)})| > d - E(V^{(k,\delta)})\bigr) \tag{29}\\
&\le \{\text{Chebyshev's inequality}\} \le \frac{\mathrm{Var}(V^{(k,\delta)})}{(d - E(V^{(k,\delta)}))^2} \tag{30}\\
&\le \frac{E((V^{(k,\delta)})^2)}{(d - E(V^{(k,\delta)}))^2} = \{\text{(26) and (28)}\} \tag{31}\\
&= \frac{O(\delta + \delta^2\mu)}{(d - O(\mu\delta))^2} = \{\text{the assumption that } \delta = O(\hat k)\} \tag{32}\\
&= O\bigl(\delta d^{-2}(1 + \delta\mu)\bigr). \tag{33}
\end{align}

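5.5 Analysis of $U^{(k)}$

It is much easier to deal with the expected value and variance of $U^{(k)}$. Since

$$E\bigl(U_t^{(k)}\bigr) = p_{\mathrm{right}}(k) - p_{\mathrm{left}}(k) = -\mu k/N \tag{34}$$

and

\begin{align}
\mathrm{Var}\bigl(U_t^{(k)}\bigr) &= E\bigl((U_t^{(k)})^2\bigr) - E\bigl(U_t^{(k)}\bigr)^2 = p_{\mathrm{right}}(k) + p_{\mathrm{left}}(k) - \frac{\mu^2k^2}{N^2} \tag{35}\\
&= \frac{k}{N}\Bigl(2 - \mu - 2(1-\mu)\frac{k-1}{N-1}\Bigr) - \frac{\mu^2k^2}{N^2} < \frac{2k}{N}, \tag{36}
\end{align}

we get

$$E\bigl(U^{(k)}\bigr) = \alpha N E\bigl(U_t^{(k)}\bigr) = -\alpha\mu k \tag{37}$$

and, since $U_1^{(k)}, U_2^{(k)}, \dots$ are independent,

$$\mathrm{Var}\bigl(U^{(k)}\bigr) = \alpha N\, \mathrm{Var}\bigl(U_t^{(k)}\bigr) < 2\alpha k. \tag{38}$$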
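The moment formulas (34) to (38) can be reproduced exactly for concrete parameter values. This is a small sketch under the assumption that $p_{\mathrm{left}}$ and $p_{\mathrm{right}}$ have the forms implied by (34) and (36). The function names are hypothetical, and `alpha` is taken to be an integer for simplicity.

```python
from fractions import Fraction


def step_moments(k, N, mu):
    """Mean and variance of one step U_t of the walk with fixed step probabilities."""
    p_left = Fraction(k, N) * (1 - (1 - mu) * Fraction(k - 1, N - 1))
    p_right = (1 - Fraction(k, N)) * (1 - mu) * Fraction(k, N - 1)
    mean = p_right - p_left              # (34)
    var = p_right + p_left - mean ** 2   # (35); U_t takes values in {-1, 0, +1}
    return mean, var


def sum_moments(k, N, mu, alpha):
    """Mean and variance of U = U_1 + ... + U_{alpha N} for independent steps."""
    mean, var = step_moments(k, N, mu)
    steps = alpha * N
    return steps * mean, steps * var     # (37) and (38)


if __name__ == "__main__":
    mean, var = sum_moments(k=10, N=100, mu=Fraction(1, 50), alpha=2)
    print(mean, float(var))
```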

Lemma 5 If $k = O(\hat k)$, then, for $d > 0$,

$$P\bigl(|U^{(k)}| > d\bigr) = O\bigl(d^{-2}\hat k\bigr).$$

Proof If $d = O(\mu\hat k)$ we would have $d^{-2}\hat k \to \infty$ by Corollary 1. Thus, without loss of generality, we may assume that $d \gg \mu\hat k$ and hence $d > |E(U^{(k)})|$ by (37). Then the following holds.

\begin{align}
P\bigl(|U^{(k)}| > d\bigr) &\le P\bigl(|U^{(k)} - E(U^{(k)})| > d - E(U^{(k)})\bigr) \tag{39}\\
&\le \{\text{Chebyshev's inequality}\} \le \frac{\mathrm{Var}(U^{(k)})}{(d - E(U^{(k)}))^2} \tag{40}\\
&\le \frac{E((U^{(k)})^2)}{(d - E(U^{(k)}))^2} = \{\text{(37) and (38)}\} \tag{41}\\
&= \frac{O(\hat k)}{(d - \alpha\mu k)^2} = O\bigl(d^{-2}\hat k\bigr). \tag{42}
\end{align}

5.6 Analysis of $V^{(k)}$ and $U^{(k)} + V^{(k)}$

In order to prove our desired propositions about the improbability of large values of $V^{(k)}$ and $U^{(k)} + V^{(k)}$, we need two additional lemmas. First, we study $U^{(k)} + V^{(k,\delta)}$ instead.

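Lemma 6 If $k = O(\hat k)$ and $0 < \delta = O(\hat k)$, then the following holds:

$$P\bigl(|U^{(k)} + V^{(k,\delta)}| > \delta/2\bigr) = O\bigl(\delta^{-2}\hat k + \mu\bigr).$$

Proof Obviously,

$$P\bigl(|U^{(k)} + V^{(k,\delta)}| > \delta/2\bigr) \le P\bigl(|U^{(k)}| > \delta/4\bigr) + P\bigl(|V^{(k,\delta)}| > \delta/4\bigr).$$

The first term, $P(|U^{(k)}| > \delta/4)$, is $O(\delta^{-2}\hat k)$ by Lemma 5. The other term, $P(|V^{(k,\delta)}| > \delta/4)$, is $O(\delta^{-1} + \mu)$ by Lemma 4 with $d = \delta/4$, and $O(\delta^{-1} + \mu)$ is $O(\delta^{-2}\hat k + \mu)$ by the assumption that $\delta = O(\hat k)$. □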

Next we must show that it is unlikely that $V^{(k,\delta)}$ differs from $V^{(k)}$.

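Lemma 7 If $k = O(\hat k)$ and $0 < \delta = O(\hat k)$, then

$$P\bigl(\exists t \le \alpha N : V_t^{(k,\delta)} \ne V_t^{(k)}\bigr) = O\bigl(\delta^{-2}\hat k + \mu\bigr).$$

Proof If there is a $t \le \alpha N$ such that $V_t^{(k,\delta)} \ne V_t^{(k)}$, then clearly $|K_{t-1}^{(k)} - k| > \delta$ and there must exist a $T \le \alpha N$ such that $\bigl|\sum_{\tau=1}^{T}(U_\tau^{(k)} + V_\tau^{(k,\delta)})\bigr| = \lfloor \delta + 1 \rfloor$. Then, by Lemma 6, the probability that $\bigl|\sum_{\tau=T+1}^{\alpha N}(U_\tau^{(k)} + V_\tau^{(k,\delta)})\bigr| \le \delta/2$ is $1 - O(\delta^{-2}\hat k + \mu)$. But if this probable event happens, then we must have $\bigl|\sum_{\tau=1}^{\alpha N}(U_\tau^{(k)} + V_\tau^{(k,\delta)})\bigr| > \delta/2$ which, again by Lemma 6, happens with probability $O(\delta^{-2}\hat k + \mu)$. Thus, the event that there is a $t \le \alpha N$ such that $V_t^{(k,\delta)} \ne V_t^{(k)}$ happens with probability $O(\delta^{-2}\hat k + \mu)$. □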

We are now in a position to derive the desired propositions.

Proposition 6 If $k = O(\hat k)$ and $0 < d = O(\hat k)$ then

$$P\bigl(|U^{(k)} + V^{(k)}| > d\bigr) = O\bigl(d^{-2}\hat k + \mu\bigr).$$

Proof This follows directly from Lemmas 6 and 7 with $\delta = 2d$. □


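Proposition 7 If $k = O(\hat k)$, then, for $d > 0$,

$$P\bigl(|V^{(k)}| > d\bigr) = O\bigl(\mu^{-1}d^{-2} + \mu\ln(N/\ell)\bigr).$$

Proof With $\delta = \mu^{-1}$, Lemmas 4 and 7 yield

$$P\bigl(|V^{(k)}| > d\bigr) = O\bigl(\delta d^{-2}(1 + \delta\mu)\bigr) + O\bigl(\delta^{-2}\hat k + \mu\bigr) = O\bigl(\mu^{-1}d^{-2} + \mu\ln(N/\ell)\bigr).$$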

6 The proof of the main theorem

In this section we will use our previous achievements to finally prove Theorem 1. We will need the classical Berry–Esseen Theorem, which says how well the distribution of a sum of i.i.d. random variables is approximated by a normal distribution.

Theorem 2 (The Berry–Esseen Theorem) Let $W_1, W_2, \dots$ be independent and identically distributed random variables with $E(W_i) = 0$, $E(W_i^2) = \sigma^2 > 0$ and $E(|W_i|^3) = \tau < \infty$. Then, for any $x$ and $n$,

$$\bigl|P(W_1 + \cdots + W_n \le x) - \Phi\bigl(x/(\sigma\sqrt{n})\bigr)\bigr| \le \frac{\tau}{\sigma^3\sqrt{n}}.$$
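The Berry–Esseen bound can be illustrated numerically for a three-point step distribution on $\{-1, 0, +1\}$, which is the kind of step the walk $U_t^{(k)}$ takes. The sketch below computes the exact distribution of the sum by repeated convolution and compares the largest deviation from the normal approximation with $\tau/(\sigma^3\sqrt{n})$. The step probabilities used are arbitrary illustrative values, not taken from the paper, and all function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sum_pmf(p_minus, p_plus, n):
    """Exact pmf of a sum of n i.i.d. steps in {-1, 0, +1}; index j means value j - n."""
    p_zero = 1.0 - p_minus - p_plus
    pmf = [1.0]
    for _ in range(n):
        new = [0.0] * (len(pmf) + 2)
        for j, q in enumerate(pmf):
            new[j] += q * p_minus
            new[j + 1] += q * p_zero
            new[j + 2] += q * p_plus
        pmf = new
    return pmf


def berry_esseen_check(p_minus, p_plus, n):
    """Return (largest CDF deviation, Berry-Esseen bound) for the centred sum."""
    p_zero = 1.0 - p_minus - p_plus
    mean = p_plus - p_minus
    # Moments of the centred step W = U - E(U).
    atoms = [(-1.0 - mean, p_minus), (-mean, p_zero), (1.0 - mean, p_plus)]
    sigma = math.sqrt(sum(p * w * w for w, p in atoms))
    tau = sum(p * abs(w) ** 3 for w, p in atoms)
    worst, cdf = 0.0, 0.0
    for j, q in enumerate(sum_pmf(p_minus, p_plus, n)):
        x = (j - n) - n * mean                    # centred value of the atom
        phi = normal_cdf(x / (sigma * math.sqrt(n)))
        worst = max(worst, abs(cdf - phi))        # left limit at the jump
        cdf += q
        worst = max(worst, abs(cdf - phi))        # value at the jump
    return worst, tau / (sigma ** 3 * math.sqrt(n))


if __name__ == "__main__":
    print(berry_esseen_check(p_minus=0.2, p_plus=0.3, n=100))
```

For a lattice distribution the supremum over $x$ is attained at the atoms, which is why the comparison is made only at the jump points of the exact distribution function.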

At any point in time, let the pseudolist be the set of songs whose popularity is greater than or equal to $\hat k$.

Proposition 8 The expected number $L$ of songs that are on the pseudolist at a time $t_0$ but not at time $t_0 + \alpha N$ is $\sim \sqrt{\psi\alpha/\pi} \cdot \ell \cdot \sqrt{\mu\ln(N/\ell)}$.

Proof Consider a song that has popularity $k \ge \hat k$ at time $t_0$, and define the random sequence $(K_t^{(k)})_{t=0}^{\infty}$ by letting $K_t^{(k)}$ be the popularity of the song at time $t_0 + t$. (Note that this definition is statistically equivalent to the definition of $K_t^{(k)}$ given in Sect. 5.) Let $Q_k := P(K_{\alpha N}^{(k)} < \hat k)$ be the probability that the song has left the pseudolist after $\alpha N$ time steps.

Introduce a variable $v$ that tends to infinity but slowly enough so that $v\mu\ln(N/\ell) \to 0$ and $v = o(\sqrt{\ln(N/\ell)})$. (In fact, $v = o(\sqrt{\ln(N/\ell)})$ implies $v\mu\ln(N/\ell) \to 0$ via Assumption 2 but that does not matter.) We can divide $L$ into three terms as follows.

$$
L = \sum_{k=\hat k}^{N} f_kQ_k
= \overbrace{\sum_{k=\hat k}^{\lfloor \hat k + v\sqrt{\hat k} \rfloor} f_kQ_k}^{L_1}
+ \overbrace{\sum_{k=\lfloor \hat k + v\sqrt{\hat k} \rfloor + 1}^{\lfloor \hat k/\psi \rfloor} f_kQ_k}^{L_2}
+ \overbrace{\sum_{k=\lfloor \hat k/\psi \rfloor + 1}^{N} f_kQ_k}^{L_3}.
$$


First, we will deal with the term $L_1$, so assume that $k \in [\hat k, \hat k + v\sqrt{\hat k}]$. Let $d = \mu^{-1/2}(\ln(N/\ell))^{1/4}$ and define three events $A_1$, $A_2$, and $B$ as follows.

\begin{align}
A_1 &\iff k + U^{(k)} < \hat k - d, \tag{43}\\
A_2 &\iff k + U^{(k)} < \hat k + d, \tag{44}\\
B &\iff |V^{(k)}| > d. \tag{45}
\end{align}

We have the implications

$$A_1 \text{ and not } B \implies K_{\alpha N}^{(k)} < \hat k \implies A_2 \text{ or } B$$

and hence the inequalities

$$P(A_1) - P(B) \le Q_k \le P(A_2) + P(B). \tag{46}$$

By Proposition 7, we have $P(B) = O(\mu^{-1}d^{-2} + \mu\ln(N/\ell))$ which is $o(v^{-1})$ with our choice of $d$ and $v$.

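Let

$$W_t := U_t^{(k)} - E\bigl(U_t^{(k)}\bigr) = \{(34)\} = U_t^{(k)} + \frac{\mu k}{N}$$

and define

$$
\begin{aligned}
\sigma^2 := E\bigl(W_t^2\bigr) &= \mathrm{Var}\bigl(U_t^{(k)}\bigr) = \{(36)\} \\
&= \frac{k}{N}\Bigl(2 - \mu - 2(1-\mu)\frac{k-1}{N-1}\Bigr) - \frac{\mu^2k^2}{N^2} \sim \frac{2k}{N}
\end{aligned}
$$

and

$$
\begin{aligned}
\tau &:= E\bigl(|W_t|^3\bigr) \\
&\le E\bigl(|(U_t^{(k)})^3|\bigr) + 3E\bigl((U_t^{(k)})^2\bigr)\bigl|E\bigl(U_t^{(k)}\bigr)\bigr| + 3E\bigl(|U_t^{(k)}|\bigr)E\bigl(U_t^{(k)}\bigr)^2 + \bigl|E\bigl(U_t^{(k)}\bigr)^3\bigr| \\
&\sim \frac{2k}{N}.
\end{aligned}
$$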

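The Berry–Esseen Theorem now yields that, for any $s$,

$$\bigl|P\bigl(U^{(k)} \le s\bigr) - \Phi\bigl((s + \alpha\mu k)/(\sigma\sqrt{\alpha N})\bigr)\bigr| \lesssim \frac{1}{\sqrt{2\alpha k}},$$

which implies, for $i = 1, 2$, that

$$\bigl|P(A_i) - \Phi\bigl((\hat k - k + (-1)^i d + \alpha\mu k)/(\sigma\sqrt{\alpha N})\bigr)\bigr| = O\bigl(k^{-1/2}\bigr). \tag{47}$$

Since $d = \mu^{-1/2}(\ln(N/\ell))^{1/4}$ and $\sigma^2 \sim 2k/N = \Theta(\hat k/N)$, Corollary 1 yields that

$$\frac{d + \alpha\mu k}{\sigma\sqrt{\alpha N}} \to 0.$$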


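Combining this with (46) and (47), we obtain

$$\Bigl|Q_k - \Phi\Bigl((1 + o(1))\frac{\hat k - k}{\sqrt{2\alpha\hat k}} + o(1)\Bigr)\Bigr| = O\bigl(k^{-1/2}\bigr) + P(B) = o\bigl(v^{-1}\bigr),$$

where the ordos converge uniformly over the interval $k \in [\hat k, \hat k + v\sqrt{\hat k}]$. Summation over this interval yields

$$\Biggl|\sum_{k=\hat k}^{\lfloor \hat k + v\sqrt{\hat k} \rfloor} Q_k - \sum_{k=\hat k}^{\lfloor \hat k + v\sqrt{\hat k} \rfloor} \Phi\Bigl((1 + o(1))\frac{\hat k - k}{\sqrt{2\alpha\hat k}} + o(1)\Bigr)\Biggr| = v\sqrt{\hat k} \cdot o\bigl(v^{-1}\bigr) = o\bigl(\sqrt{\hat k}\bigr).$$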

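By Proposition 4, $f_k \sim \mu\ell$, so

\begin{align}
\frac{L_1}{\mu\ell} &\sim o\bigl(\sqrt{\hat k}\bigr) + \sum_{k=\hat k}^{\lfloor \hat k + v\sqrt{\hat k} \rfloor} \Phi\Bigl((1 + o(1))\frac{\hat k - k}{\sqrt{2\alpha\hat k}} + o(1)\Bigr) \tag{48}\\
&\sim o\bigl(\sqrt{\hat k}\bigr) + \int_{k=\hat k}^{\hat k + v\sqrt{\hat k}} \Phi\Bigl((1 + o(1))\frac{\hat k - k}{\sqrt{2\alpha\hat k}} + o(1)\Bigr)\,dk \tag{49}\\
&= \Bigl\{x := \frac{\hat k - k}{\sqrt{2\alpha\hat k}}\Bigr\} \tag{50}\\
&= o\bigl(\sqrt{\hat k}\bigr) + \sqrt{2\alpha\hat k}\int_{x=-v/\sqrt{2\alpha}}^{0} \Phi\bigl((1 + o(1))x + o(1)\bigr)\,dx \tag{51}\\
&\sim o\bigl(\sqrt{\hat k}\bigr) + \sqrt{2\alpha\hat k}\int_{x=-\infty}^{0} \Phi(x)\,dx \tag{52}\\
&\sim \frac{\sqrt{\alpha\hat k}}{\sqrt{\pi}}. \tag{53}
\end{align}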
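The step from (52) to (53) uses $\int_{-\infty}^{0}\Phi(x)\,dx = 1/\sqrt{2\pi}$, so that $\sqrt{2\alpha\hat k}\cdot 1/\sqrt{2\pi} = \sqrt{\alpha\hat k/\pi}$. A quick numerical check, as a sketch: the integral is truncated at $-10$ (the neglected tail is negligible) and evaluated with the trapezoidal rule. The function names and the example values of `alpha` and `k_hat` are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def integral_of_cdf(lower=-10.0, steps=100_000):
    """Trapezoidal approximation of the integral of Phi over [lower, 0]."""
    h = (0.0 - lower) / steps
    total = 0.5 * (normal_cdf(lower) + normal_cdf(0.0))
    for i in range(1, steps):
        total += normal_cdf(lower + i * h)
    return total * h


def leading_term(alpha, k_hat):
    """Right-hand side of (52) without the o(sqrt(k_hat)) term."""
    return math.sqrt(2.0 * alpha * k_hat) * integral_of_cdf()


if __name__ == "__main__":
    print(integral_of_cdf(), 1.0 / math.sqrt(2.0 * math.pi))
    print(leading_term(1.0, 400.0), math.sqrt(1.0 * 400.0 / math.pi))
```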

Next, we will deal with the term $L_2$.

\begin{align}
L_2 &< f_{\hat k} \sum_{k=\lfloor \hat k + v\sqrt{\hat k} \rfloor}^{\lfloor \hat k/\psi \rfloor} Q_k = \{\text{Propositions 4 and 6}\} \tag{54}\\
&= \mu\ell \sum_{k=\lfloor \hat k + v\sqrt{\hat k} \rfloor}^{\lfloor \hat k/\psi \rfloor} O\bigl((k - \hat k)^{-2}\hat k + \mu\bigr) \tag{55}\\
&= \mu\ell \cdot O\Bigl(\int_{x=v\sqrt{\hat k}}^{(\psi^{-1}-1)\hat k} \bigl(x^{-2}\hat k + \mu\bigr)\,dx\Bigr) \tag{56}\\
&= \mu\ell \cdot O\Bigl(\hat k\bigl(v^{-1}\hat k^{-1/2} - (\psi^{-1} - 1)^{-1}\hat k^{-1}\bigr) + \mu\bigl((\psi^{-1} - 1)\hat k - v\sqrt{\hat k}\bigr)\Bigr) \tag{57}\\
&= \mu\ell \cdot \bigl(o\bigl(\sqrt{\hat k}\bigr) + \ln(N/\ell)\bigr) = \{\text{Corollary 1}\} \tag{58}\\
&= \mu\ell \cdot o\bigl(\sqrt{\hat k}\bigr). \tag{59}
\end{align}

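The last term, $L_3$, is small simply because there are very few songs that are really popular:

$$
\begin{aligned}
\frac{L_3}{\ell\sqrt{\mu}} &\le \frac{1}{\ell\sqrt{\mu}} \sum_{k=\lfloor \hat k/\psi \rfloor + 1}^{N} f_k \sim \{\text{Lemma 3}\} \\
&\sim \ell^{-1}\mu^{-3/2} f_{\lfloor \hat k/\psi \rfloor + 1} \sim \{\text{Lemma 2}\} \\
&\sim \frac{\psi N(1-\mu)^{\hat k/\psi}}{\ell\sqrt{\mu} \cdot \hat k} \sim \{\text{Proposition 3}\} \\
&\sim \frac{\sqrt{\mu} \cdot N(1-\mu)^{(1+o(1))\mu^{-1}\ln(N/\ell)}}{\ell\ln(N/\ell)} =: A.
\end{aligned}
$$

Taking the logarithm yields

$$
\begin{aligned}
\ln A &= -\tfrac{1}{2}\ln\bigl(\mu^{-1}\bigr) + \ln(N/\ell) + (1 + o(1))\mu^{-1}\ln(N/\ell)\ln(1-\mu) - \ln\ln(N/\ell) \\
&= -\tfrac{1}{2}\ln\bigl(\mu^{-1}\bigr) + \ln(N/\ell) - (1 + o(1))\ln(N/\ell) - \ln\ln(N/\ell) \\
&= \{\text{Assumption 2}\} \\
&= -\bigl(\tfrac{1}{2} + o(1)\bigr)\ln(N/\ell) + \ln(N/\ell) - (1 + o(1))\ln(N/\ell) - \ln\ln(N/\ell) \\
&= -\bigl(\tfrac{1}{2} + o(1)\bigr)\ln(N/\ell) \to -\infty.
\end{aligned}
$$

Thus, $A \to 0$ and $L_3 = o(\ell\sqrt{\mu})$. □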

In the end we are interested in the actual toplist rather than the pseudolist. The following proposition gives an upper bound for the difference between these lists.

Proposition 9 Let $S$ be the number of songs that are on exactly one of the toplist and the pseudolist. Then, the expected value $E(S) = O(\ell\sqrt{\mu} + \mu^{-1/2}\ln\mu^{-1})$.

Proof Let $\langle \hat k, \hat K \rangle := [\hat k, \hat K] \cup [\hat K, \hat k]$ denote the set of real numbers (inclusively) between $\hat k$ and $\hat K$.

First, we overestimate $S$ like this:

$$
S \le \overbrace{\sum_{\substack{k \in \langle \hat k, \hat K \rangle \\ |k - \hat k| \le \mu^{-1/2}}} X_k}^{A}
+ \overbrace{\sum_{\substack{k \in \langle \hat k, \hat K \rangle \\ \mu^{-1/2} < |k - \hat k| \le \mu^{-3/4}}} X_k}^{B}
+ \overbrace{1_{\hat K > \hat k + \mu^{-3/4}} \sum_{\hat k < k \le N} X_k}^{C}
+ \overbrace{1_{\hat K < \hat k - \mu^{-3/4}}\,\ell}^{D}, \tag{60}
$$

where $1_{\text{“event”}}$ is an indicator variable for “event”. From Proposition 4 we recall that $E(X_k) = f_k \sim \mu\ell$ if $|k - \hat k| = o(\mu^{-1})$. Thus

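$$E(A) \le E\Bigl(\sum_{|k - \hat k| < \mu^{-1/2}} X_k\Bigr) = \sum_{|k - \hat k| < \mu^{-1/2}} f_k \sim \sum_{|k - \hat k| < \mu^{-1/2}} \mu\ell = O\bigl(\ell\sqrt{\mu}\bigr).$$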

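Now we will deal with the second sum, $B$, which can be written

$$B = \sum_{\mu^{-1/2} < |k - \hat k| \le \mu^{-3/4}} X_k 1_{k \in \langle \hat k, \hat K \rangle}.$$

By the Cauchy–Schwarz inequality,

$$
\begin{aligned}
E(B) &= \sum_{\mu^{-1/2} < |k - \hat k| \le \mu^{-3/4}} \Bigl(f_k P\bigl(k \in \langle \hat k, \hat K \rangle\bigr) + \mathrm{Cov}\bigl(X_k, 1_{k \in \langle \hat k, \hat K \rangle}\bigr)\Bigr) \\
&\le \sum_{\mu^{-1/2} < |k - \hat k| \le \mu^{-3/4}} \Bigl(f_k P\bigl(k \in \langle \hat k, \hat K \rangle\bigr) + \sqrt{\mathrm{Var}(X_k)\,\mathrm{Var}\bigl(1_{k \in \langle \hat k, \hat K \rangle}\bigr)}\Bigr).
\end{aligned}
$$

By Proposition 1, $\mathrm{Var}(X_k) \le f_k$, so we obtain

\begin{align}
E(B) &\le \sum_{\mu^{-1/2} < |k - \hat k| \le \mu^{-3/4}} \Bigl(f_k P\bigl(k \in \langle \hat k, \hat K \rangle\bigr) + \sqrt{f_k P\bigl(k \in \langle \hat k, \hat K \rangle\bigr)}\Bigr) \tag{61}\\
&\sim \sum_{\mu^{-1/2} < |k - \hat k| \le \mu^{-3/4}} \Bigl(\mu\ell P\bigl(k \in \langle \hat k, \hat K \rangle\bigr) + \sqrt{\mu\ell P\bigl(k \in \langle \hat k, \hat K \rangle\bigr)}\Bigr). \tag{62}
\end{align}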

Proposition 5 with $\rho = |k - \hat k|$ yields

$$P\bigl(k \in \langle \hat k, \hat K \rangle\bigr) \le P\bigl(|\hat K - \hat k| \ge |k - \hat k|\bigr) \lesssim 2|k - \hat k|^{-2}\mu^{-2}\ell^{-1},$$

and we obtain

$$
\begin{aligned}
E(B) &\lesssim 2\sum_{\mu^{-1/2} < |k - \hat k| \le \mu^{-3/4}} \bigl(\mu^{-1}|k - \hat k|^{-2} + \mu^{-1/2}|k - \hat k|^{-1}\bigr) \\
&\le 4\sum_{i=\lceil \mu^{-1/2} \rceil}^{\lfloor \mu^{-3/4} \rfloor} \bigl(\mu^{-1}i^{-2} + \mu^{-1/2}i^{-1}\bigr).
\end{aligned}
$$

In the last summand, the second term is larger than the first term over the summation interval, so

$$E(B) \lesssim 8\mu^{-1/2}\sum_{i=\lceil \mu^{-1/2} \rceil}^{\lfloor \mu^{-3/4} \rfloor} i^{-1} = O\bigl(\mu^{-1/2}\ln\mu^{-1}\bigr).$$
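The logarithmic factor in the bound on $E(B)$ comes from the harmonic sum over $\lceil \mu^{-1/2} \rceil \le i \le \lfloor \mu^{-3/4} \rfloor$, which is approximately $\ln(\mu^{-3/4}/\mu^{-1/2}) = \tfrac{1}{4}\ln\mu^{-1}$. A short numerical illustration follows, as a sketch with a hypothetical function name and an arbitrary example value of $\mu$.

```python
import math


def harmonic_window(mu):
    """Sum of 1/i for ceil(mu^(-1/2)) <= i <= floor(mu^(-3/4))."""
    lo = math.ceil(mu ** -0.5)
    hi = math.floor(mu ** -0.75)
    return sum(1.0 / i for i in range(lo, hi + 1))


if __name__ == "__main__":
    mu = 1e-8  # illustrative invention rate
    print(harmonic_window(mu), 0.25 * math.log(1.0 / mu))
```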

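By the Cauchy–Schwarz inequality,

$$
\begin{aligned}
E(C) &= E\bigl(1_{\hat K > \hat k + \mu^{-3/4}}\bigr) E\Bigl(\sum_{\hat k < k \le N} X_k\Bigr) + \mathrm{Cov}\Bigl(1_{\hat K > \hat k + \mu^{-3/4}}, \sum_{\hat k < k \le N} X_k\Bigr) \\
&\le P\bigl(\hat K > \hat k + \mu^{-3/4}\bigr) E\Bigl(\sum_{\hat k < k \le N} X_k\Bigr) + \sqrt{P\bigl(\hat K > \hat k + \mu^{-3/4}\bigr)\,\mathrm{Var}\Bigl(\sum_{\hat k < k \le N} X_k\Bigr)}.
\end{aligned}
$$

By definition of $\hat k$ we have $E(\sum_{\hat k < k \le N} X_k) < \ell$, so by Proposition 2, $\mathrm{Var}(\sum_{\hat k < k \le N} X_k) < \ell$. Proposition 5 with $\rho = \mu^{-3/4}$ yields $P(\hat K > \hat k + \mu^{-3/4}) = O(\mu^{-1/2}\ell^{-1})$. Thus,

$$E(C) = O\Bigl(\mu^{-1/2}\ell^{-1}\ell + \sqrt{\mu^{-1/2}\ell^{-1}\ell}\Bigr) = O\bigl(\mu^{-1/2}\bigr).$$

Finally, by Proposition 5, we get $E(D) = P(\hat K < \hat k - \mu^{-3/4})\,\ell = O(\mu^{-1/2})$. □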

Now, at last, we are ready to prove Theorem 1.

Proof of Theorem 1 By Proposition 8 it suffices to show that the expected number $E(S)$ of songs that leave either the toplist or the pseudolist without leaving the other one, is much smaller than $\ell\sqrt{\mu\ln(N/\ell)}$. Proposition 9 tells us that $E(S) = O(\ell\sqrt{\mu} + \mu^{-1/2}\ln\mu^{-1})$ and the first term $\ell\sqrt{\mu}$ is clearly small enough. The second term is

\begin{align}
\mu^{-1/2}\ln\mu^{-1} &= \frac{\ln\mu^{-1}}{\ln(N/\ell)}\,\mu^{-1/2}\ln(N/\ell) \sim \{\text{Assumption 2}\} \tag{63}\\
&\sim (1 - \psi)\mu^{-1/2}\ln(N/\ell) < \mu^{-1/2}\sqrt{\ln(N/\ell)}\sqrt{\ln(N/\ell)} \tag{64}\\
&\ll \{\text{Assumption 3}\} \ll \mu^{-1/2}\mu\ell\sqrt{\ln(N/\ell)}. \tag{65}
\end{align}

7 Discussion

Bentley et al. [3] conjectured a simple expression for the turnover rate of popularity toplists in a random copying model with nonoverlapping generations (the infinite alleles Wright–Fisher model). In this paper we instead studied the overlapping generations version, known as the infinite alleles Moran model. We first showed by simulations that the toplist turnover rate seems to behave in the same way for the two models (for the appropriate regime of short lists compared to the population size). We then proved an asymptotic formula for the turnover rate, which modifies the conjectured formula by a factor $\sqrt{\ln(N/\ell)}$. In other words, the turnover rate is not perfectly independent of the population size $N$, but the dependence will not be noticeable in data unless one considers truly huge variations of $N$.
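A simulation of this kind can be sketched in a few lines of Python. This is an illustration only, not the code behind the paper's figures. It assumes that in each time step one uniformly chosen individual dies and is replaced either by a new song (probability $\mu$) or by a copy of the song of another uniformly chosen individual, which is the step rule matching the transition probabilities used in Sect. 5. It also assumes that a generation is $N$ time steps, that the population starts with $N$ distinct songs, and that ties on the toplist are broken arbitrarily. All function names and parameter values are hypothetical.

```python
import random
from collections import Counter


def moran_step(songs, mu, next_song, rng):
    """One death-and-replacement step; returns the next unused song label."""
    n = len(songs)
    victim = rng.randrange(n)
    if rng.random() < mu:
        songs[victim] = next_song          # invention of a brand new song
        return next_song + 1
    donor = rng.randrange(n - 1)           # uniform over the other n - 1 individuals
    if donor >= victim:
        donor += 1
    songs[victim] = songs[donor]           # random copying
    return next_song


def toplist(songs, list_len):
    """Set of the list_len most popular songs (ties broken arbitrarily)."""
    return {song for song, _ in Counter(songs).most_common(list_len)}


def mean_turnover(N, mu, list_len, generations, burn_in, seed=0):
    """Average number of songs leaving the toplist per generation of N steps."""
    rng = random.Random(seed)
    songs = list(range(N))                 # assumed start: all songs distinct
    next_song = N
    for _ in range(burn_in * N):
        next_song = moran_step(songs, mu, next_song, rng)
    total = 0
    for _ in range(generations):
        before = toplist(songs, list_len)
        for _ in range(N):
            next_song = moran_step(songs, mu, next_song, rng)
        total += len(before - toplist(songs, list_len))
    return total / generations


if __name__ == "__main__":
    print(mean_turnover(N=500, mu=0.02, list_len=10, generations=50, burn_in=50))
```

Comparing the output across widely different values of `N` at fixed `mu` and `list_len` is one way to look for the weak $\sqrt{\ln(N/\ell)}$ dependence, although the small population sizes that run quickly here lie far from the asymptotic regime of Theorem 1.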

It is interesting that the two models behave so similarly with respect to toplist turnover. It is worth investigating how robust this behavior is to other reasonable changes of the model. For instance, there may be various forms of biases to the random copying, as discussed by Boyd and Richerson [4]. Some pop songs may actually be better than others in some sense that makes them more likely to be voted for. Boyd and Richerson also discuss frequency-dependent biases that would make already popular songs more (or less) likely to be voted for.

Another type of change is to let the population be increasing rather than fixed. For instance, if we remove the death step from the IAM model, it becomes equivalent to economist Herbert Simon’s famous model of urban growth [7], for which toplist turnover would certainly be a relevant aspect to study. The book of Andrews and Eriksson [2] discusses a couple of other important random growth processes on Young diagrams for which the same question could be asked.

The toplist turnover problem seems to be novel. We envisage that a broader mathematical investigation of it may be fruitful.

References

1. Andrews, G.E., Berndt, B.C.: Your hit parade: The top ten most fascinating formulas in Ramanujan’s lost notebook. Not. Am. Math. Soc. 55, 18–30 (2008)

2. Andrews, G.E., Eriksson, K.: Integer Partitions. Cambridge University Press, Cambridge (2004)

3. Bentley, R.A., Lipo, C.P., Herzog, H.A., Hahn, M.W.: Regular rates of popular culture change reflect random copying. Evol. Human Behav. 28, 151–158 (2007)

4. Boyd, R., Richerson, P.J.: Culture and the Evolutionary Process. University of Chicago Press, Chicago (1985)

5. Eriksson, K., Sjöstrand, J.: Limiting shapes of birth-and-death processes on Young diagrams. Adv. Appl. Math. (in press)

6. Ewens, W.J.: Mathematical Population Genetics, 2nd edn. Springer, New York (2004)

7. Simon, H.: On a class of skew distribution functions. Biometrika 42, 425–440 (1955)

8. Strimling, P., Sjöstrand, J., Eriksson, K., Enquist, M.: Accumulation of independent cultural traits. Theor. Popul. Biol. 76, 77–83 (2009)