The Persistent-Access-Caching Algorithm

Predrag R. Jelenković† and Ana Radovanović‡

Department of Electrical Engineering

Columbia University

New York, NY 10027, USA

{predrag,anar}@ee.columbia.edu

tel.: (1 212) 854 8174, 939 7157

fax: + (1 212) 932 9421

April 2004

Abstract

Caching is widely recognized as an effective mechanism for improving the performance of the World Wide Web. One of the key components in engineering Web caching systems is designing document placement/replacement algorithms for updating the collection of cached documents. The main objectives of such a policy are a high cache hit ratio, ease of implementation, low complexity and adaptability to fluctuations in access patterns. These objectives are satisfied by the widely used heuristic called the Least-Recently-Used (LRU) cache replacement rule. However, in the context of the independent reference model, the LRU policy can significantly underperform the optimal Least-Frequently-Used (LFU) algorithm. On the other hand, the benefit of the LFU rule is counterbalanced by its higher implementation complexity and lower adaptability to changes in access frequencies.

To alleviate this problem, we introduce a new LRU-based rule, termed Persistent-Access-Caching (PAC), which essentially preserves all of the desirable attributes of the LRU scheme. Furthermore, for the independent reference model and generalized Zipf's law request probabilities, we prove that, for large cache sizes, the PAC policy performs arbitrarily close to the optimal LFU algorithm. The PAC algorithm makes its replacement decisions based on the references collected during the preceding interval of fixed length, which, for large caches, adds negligible complexity when compared to the ordinary LRU policy.

Keywords: persistent-access-caching, least-recently-used caching, least-frequently-used caching, move-to-front searching, generalized Zipf's law distributions, heavy-tailed distributions, Web caching, cache fault probability, average-case analysis

†Supported by NSF Grants No. 0092113 and 0117738.

‡Supported by IBM PhD Fellowship.

Technical Report EE2004-03-05, Dept. of Electrical Engineering, Columbia University, April 2004. Submitted for publication.

1 Introduction

Since the recent invention of the World Wide Web, there has been an explosive growth in multimedia information content and services that include data, audio, video, software downloads, remote service hosting, etc. These distributed multimedia content and services are now an integral part of modern communication networks (e.g., the Internet) and, therefore, are redefining the role of networking to incorporate the storage and service of information, in addition to the traditional task of information transfer. Since network information and its access are massively distributed, and the same content is repeatedly used by groups of users, it is clear that bringing some of the more popular items closer to the end-users can improve the network performance, e.g., reduce the download latency and network congestion. This type of information replication and redistribution system is often termed Web caching.

One of the key components of engineering efficient Web caching systems is designing document placement/replacement algorithms that select and possibly dynamically update a collection of frequently accessed documents. The design of these algorithms has to be done with special care since the latency and network congestion may actually increase if documents with low access frequency are cached. Thus, the main objective is to achieve high cache hit ratios, while maintaining ease of implementation and scalability. Furthermore, these algorithms need to be self-organizing and robust since the document access patterns exhibit a high degree of spatial as well as time fluctuations. The well-known heuristic named the Least-Recently-Used (LRU) cache replacement rule satisfies all of the previously mentioned attributes and, therefore, represents a basis for designing many practical replacement algorithms. However, as shown in [9] in the context of the stationary independent reference model with generalized Zipf's law requests, this rule is a constant factor away from the optimal frequency algorithm that keeps the most frequently used documents in the cache, i.e., replaces Least-Frequently-Used (LFU) items. On the other hand, the drawbacks of the LFU algorithm are that it needs to know (measure) the document access frequencies and employ aging schemes based on reference counters in order to cope with evolving access patterns, which results in high complexity. In the context of database disk buffering, [15] proposes a modification of the LRU policy, called LRU-K, that uses the information of the last K reference times for each document in order to make replacement decisions. It is shown in [15] that the fault probability of the LRU-K policy approaches, as K increases, the performance of the optimal LFU scheme. However, a practical implementation of the LRU-K policy would still be of the same order of complexity as the LFU rule. Furthermore, for larger values of K, which might be required for nearly optimal performance, the adaptability of this algorithm to changes in traffic patterns will be significantly reduced.

In this paper we design a new LRU-based policy, termed the Persistent-Access-Caching (PAC) rule, that essentially preserves all the desirable features of LRU caching, while achieving performance arbitrarily close to that of the optimal LFU algorithm. Furthermore, the PAC algorithm has only negligible additional complexity in comparison to the widely used LRU policy. We analyze the PAC replacement scheme with i.i.d. requests that arrive at Poisson time points. If a requested document at time $t$, say $i$, is not found in the cache, it is placed inside only if it was requested at least $k - 1$ times in the open interval $(t - \beta, t)$. The parameters $k$ and $\beta$ are the fixed design values of the PAC algorithm, whose detailed description is provided in the following section. When the frequency of requesting a page $i$ is equal to the generalized Zipf's law $c/i^{\alpha}$, $\alpha > 0$, we prove that the cache fault probability asymptotically approaches, as $k$ increases, the performance of the optimal LFU algorithm. It is surprising that even for small values of $k$, the performance ratio between the PAC and optimal algorithms improves significantly when compared to the ordinary LRU; for example, in the case of $\alpha > 1$, this ratio drops from approximately 1.78 for $k = 1$ to 1.18 and 1.08 for $k = 2, 3$, respectively. Furthermore, we show that the derived asymptotic results and simulation experiments match each other very well, even for relatively small cache sizes.

Our analytical approach uses probabilistic (average-case) analysis that exploits the novel large deviation technique and asymptotic results that were recently developed in [9, 12]. The computation of the LRU fault probability is mathematically equivalent to the evaluation of the search cost distribution for the related Move-To-Front (MTF) searching scheme. For recent work on the average-case analysis of MTF and LRU algorithms see [7, 6, 9, 12] and the references therein. For an alternative combinatorial (competitive, worst-case) approach to analyzing LRU caches the reader can consult [3, 14] and the references therein.

This paper is organized as follows. First, in Section 2, we formally describe the PAC policy with a Poisson reference model. Then, using the Poisson decomposition and superposition properties, we develop a representation theorem for the stationary search cost of the related Persistent Move-To-Front algorithm. This representation formula, in conjunction with the results on Poisson processes derived in Subsection 2.1, provides a starting point for proving our main theorems in Section 3. Informally, our main results show that for large cache sizes, the independent reference model, and generalized Zipf's law request distributions, the fault probability of the PAC algorithm approaches that of the optimal LFU policy with negligible additional complexity. Furthermore, in Section 4, extensive numerical experiments show an excellent agreement between our analytical results and simulations. The paper is concluded in Section 5 with a brief discussion of our results and their possible extensions.

2 Model description and preliminary results

Consider a set $L = \{1, 2, \dots, N\}$ of $N \le \infty$ documents (items), out of which $x$ can be stored in an easily accessible location, called the cache. The remaining $N - x$ documents are placed outside of the cache in a slower access medium. Documents are requested at moments $\{\tau_n\}_{n\ge 1}$ that represent a positive increasing sequence of Poisson points of unit rate. Furthermore, define i.i.d. random variables $R^{(N)}, \{R^{(N)}_n\}_{n\ge 1}$, independent of $\{\tau_n\}_{n\ge 1}$, where $\{R^{(N)}_n = i\}$ represents a request for item $i$ at time $\tau_n$. We denote the request probabilities as $q^{(N)}_i = \mathbb{P}[R^{(N)} = i]$ and, unless explicitly required for clarity, we omit the superscript $N$ and simply write $R$, $R_n$ and $q_i$; also, without loss of generality, we assume $q_1 \ge q_2 \ge \dots$. Let $M^{(q_i)}(u)$ be the number of requests for item $i$ in the open interval $(u, u + \beta)$; note that $M^{(q_i)}(u)$ does not count the requests that possibly occur at exactly the end points $u$ or $u + \beta$.

Documents stored in the cache are ordered in a list, which is sequentially searched upon a request for a document and is updated as follows. If a requested document at the moment $\tau_n$, say $i$, is found in the cache, we have a cache hit. In this case, if $M^{(q_i)}(\tau_n - \beta) \ge k - 1$, item $i$ is moved to the front of the list while documents that were in front of item $i$ are shifted one position down; otherwise, the list stays unchanged. Furthermore, if document $i$ is not found in the cache, we call it a cache miss or fault. Then, similarly as before, if $M^{(q_i)}(\tau_n - \beta) \ge k - 1$, document $i$ is brought to the first position of the cache list and the least recently moved item, i.e., the one at the last position of the list, is evicted from the cache. We call the cache replacement policy just described the PAC(β, k) algorithm, with β and k being fixed design parameters. The performance measure of interest is the cache fault probability, i.e., the probability that a requested document is not found in the cache.
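The update rule of this paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the class name `PACCache`, the method `request`, and the per-item deque of recent request times are hypothetical choices made here, and request times are assumed to be strictly increasing.

```python
from collections import deque, defaultdict

class PACCache:
    """Illustrative PAC(beta, k) cache; all names are hypothetical, not from the paper."""

    def __init__(self, size, beta, k):
        self.size, self.beta, self.k = size, beta, k
        self.items = []                     # cache list, index 0 is the front
        self.recent = defaultdict(deque)    # item -> its request times within the last beta

    def request(self, item, t):
        """Process a request for `item` at time `t`; return True on a cache hit."""
        times = self.recent[item]
        # Keep only earlier requests inside the open interval (t - beta, t).
        while times and times[0] <= t - self.beta:
            times.popleft()
        persistent = len(times) >= self.k - 1    # the condition M(t - beta) >= k - 1
        times.append(t)
        hit = item in self.items
        if persistent:
            if hit:
                self.items.remove(item)          # hit: item will move to the front
            elif len(self.items) >= self.size:
                self.items.pop()                 # miss: evict the item in the last position
            self.items.insert(0, item)
        return hit
```

With `k = 1` every request satisfies the persistence condition, so the sketch reduces to ordinary LRU; for `k = 2` an item is admitted or promoted only when it was also requested during the preceding interval of length `beta`.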

Analyzing the PAC(β, k) algorithm is equivalent to investigating the corresponding Move-To-Front (MTF) scheme that is defined as follows. Consider a list $L = \{1, 2, \dots, N\}$ and a process of requests for documents determined by $\{R_n\}_{n\ge 1}$ and $\{\tau_n\}_{n\ge 1}$ as in the preceding paragraph. When a request for a document arrives, say $R_n = i$, the list is searched and the requested item is moved to the front of the list only if $M^{(q_i)}(\tau_n - \beta) \ge k - 1$; otherwise the list stays unchanged. We call the searching algorithm just described Persistent-MTF, PMTF(β, k). The performance measure of interest for this algorithm is the search cost $C^{(N)}_n$ that represents the position in the list of a document requested at time $\tau_n$.

Now, we claim that computing the cache fault probability of the PAC(β, k) algorithm is equivalent to evaluating the tail of the search cost $C^{(N)}_n$ of the PMTF(β, k) searching scheme. Note that the fault probability of the PAC(β, k) algorithm stays the same regardless of the ordering of documents in the slower access medium. In particular, these documents can also be ordered in increasing order of the last times they were moved to the front of the cache list. Therefore, for a cache of size $x$, it is clear that the fault probability of the PAC(β, k) policy at the time of the $n$th request is the same as the probability that the search cost of the corresponding PMTF(β, k) algorithm is greater than $x$, i.e., $\mathbb{P}[C^{(N)}_n > x]$. Hence, even though PAC(β, k) and PMTF(β, k) belong to different application areas, their performance analysis is essentially equivalent. Thus, in the rest of the paper we investigate the tail of the stationary search cost distribution.
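The equivalence between PAC faults and PMTF search costs can be checked on a single sample path. The following sketch is an illustration under assumptions made here: the function names are hypothetical, the list starts in the order 1, ..., N, the cache initially holds the first x items of that order, and item popularities follow a Zipf law with an arbitrarily chosen exponent.

```python
import random

def persistent_flags(requests, beta, k):
    """For each (time, item) request, True if the item was requested at least
    k - 1 times in the open interval (time - beta, time)."""
    history, flags = {}, []
    for t, i in requests:
        past = [s for s in history.get(i, []) if s > t - beta]
        flags.append(len(past) >= k - 1)
        history[i] = past + [t]
    return flags

def pmtf_costs(requests, n_items, beta, k):
    """Search cost (1-based list position) of every request under PMTF(beta, k)."""
    lst, costs = list(range(1, n_items + 1)), []
    for (t, i), move in zip(requests, persistent_flags(requests, beta, k)):
        pos = lst.index(i)
        costs.append(pos + 1)
        if move:
            lst.insert(0, lst.pop(pos))     # move the requested item to the front
    return costs

def pac_faults(requests, x, beta, k):
    """Fault indicator of every request under PAC(beta, k) with cache size x."""
    cache, faults = list(range(1, x + 1)), []
    for (t, i), move in zip(requests, persistent_flags(requests, beta, k)):
        fault = i not in cache
        faults.append(fault)
        if move:
            if fault:
                cache.pop()                 # evict the item in the last position
            else:
                cache.remove(i)
            cache.insert(0, i)
    return faults

def sample_requests(n_requests, n_items, alpha, rng):
    """Unit-rate Poisson request times with Zipf(alpha) item choices."""
    weights = [1.0 / j ** alpha for j in range(1, n_items + 1)]
    t, out = 0.0, []
    for _ in range(n_requests):
        t += rng.expovariate(1.0)
        out.append((t, rng.choices(range(1, n_items + 1), weights)[0]))
    return out

rng = random.Random(1)
reqs = sample_requests(2000, 30, 1.2, rng)
costs = pmtf_costs(reqs, 30, beta=5.0, k=2)
faults = pac_faults(reqs, x=10, beta=5.0, k=2)
same = faults == [c > 10 for c in costs]    # fault exactly when the search cost exceeds x
```

The comparison holds request by request because, with matching initial conditions, the cache contents coincide with the first x positions of the PMTF list after every update.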

First, we prove the convergence of the search cost $C^{(N)}_n$ to stationarity. Suppose that the system starts at $t = 0$ with initial conditions given by an arbitrary initial permutation $\Pi_0$ of the list and a sequence of requests $\mathcal{R}_0 = \{(\tau^0_i, R^0_i)\}_{i\ge 1}$ in the interval $(-\beta, 0)$; $\tau^0_i \in (-\beta, 0)$ is the time of the $i$th initial request $R^0_i$.

Denote the sequence of time points of requests for document $i$ as $\{\tau^{(q_i)}_n\}_{n\ge 1}$. Then, by the assumptions on the request process $\{R_n\}$, the arrival times $\{\tau_n\}$ and the Poisson decomposition theorem, we conclude that the processes $\{\tau^{(q_i)}_n\}_{n\ge 1}$, $i \ge 1$, are Poisson and independent.
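As a quick empirical illustration of the Poisson decomposition step (a sketch with assumed parameter values, not part of the original analysis), one can mark the points of a unit-rate Poisson process with i.i.d. labels and check that the points carrying label i arrive at rate q_i. The function name `split_poisson` and the probabilities below are hypothetical.

```python
import random

def split_poisson(horizon, probs, rng):
    """Mark unit-rate Poisson points on (0, horizon) with i.i.d. labels drawn
    from `probs`; return one increasing list of arrival times per label."""
    streams = [[] for _ in probs]
    labels = range(len(probs))
    t = rng.expovariate(1.0)
    while t < horizon:
        streams[rng.choices(labels, probs)[0]].append(t)
        t += rng.expovariate(1.0)
    return streams

rng = random.Random(0)
probs = [0.5, 0.3, 0.2]        # assumed request probabilities q_i
horizon = 20000.0
streams = split_poisson(horizon, probs, rng)
rates = [len(s) / horizon for s in streams]    # empirical rates, close to probs
```

The check only looks at arrival rates; independence of the per-item streams is a consequence of the decomposition theorem and is not tested here.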

In order to prove the convergence of $C^{(N)}_n$ to stationarity, we define another process of Poisson points of unit rate on the negative part of the real line, $\{\tau_{-n}\}_{n\ge 0}$, and set $\tau_0 = 0$. Also, we define a sequence of i.i.d. random variables $\{R_{-n}\}_{n\ge 0}$ that are equal in distribution to $R_1$ and independent of $\{\tau_{-n}\}_{n\ge 0}$. Now, for each $n$ we construct a PMTF(β, k) algorithm starting at $\tau_{-n}$, with a sequence of requests $\{R_{-m} : m = 0, 1, \dots, n-1\}$ at times $\{\tau_{-m} : m = 0, 1, \dots, n-1\}$ and having the same initial condition as in the previous paragraph, given by $\Pi_0$ and $\mathcal{R}_0$ in the interval $(\tau_{-n} - \beta, \tau_{-n})$; note that in this construction we assume that there is no request at time $\tau_{-n}$. Let $C^{(N)}_{-n}$ be the search cost at $\tau_0$ for the PMTF(β, k) algorithm starting at $\tau_{-n}$.

Now, if we consider the shift mapping $R_{n-k} \to R_{-k}$ and $\tau_{n-k} \to \tau_{-k}$ for $k = 0, 1, \dots, n-1$, we conclude that, since the corresponding sequences are equal in distribution, the search costs $C^{(N)}_{-n}$ and $C^{(N)}_n$ are also equal in distribution, i.e., $C^{(N)}_n \stackrel{d}{=} C^{(N)}_{-n}$. Thus, instead of computing the tail of the search cost $C^{(N)}_n$, we continue with evaluating the tail of $C^{(N)}_{-n}$. In this regard, we define a sequence of random times $\{T^{(-n)}_i\}_{i=1}^{N}$, $n \ge 1$, where $-T^{(-n)}_i$ represents the last time before $t = 0$ that item $i$ was moved to the front of the list in the PMTF(β, k) algorithm that started at $\tau_{-n}$; if item $i$ is not moved in $(\tau_{-n}, 0)$, we set $T^{(-n)}_i = -\tau_{-n}$. Next, we define random times $\{T_i\}_{i=1}^{N}$ as

$$T_i \triangleq -\sup\{\tau^{(q_i)}_{-n} < 0 : M^{(q_i)}(\tau^{(q_i)}_{-n} - \beta) \ge k - 1\}, \tag{1}$$

where the process $\{\tau^{(q_i)}_{-n}\}_{n\ge 0}$ represents the moments of requests for document $i$; again, from the independence of $\{R_{-n}\}_{n\ge 0}$ and $\{\tau_{-n}\}_{n\ge 0}$, by the Poisson decomposition theorem, the processes $\{\tau^{(q_i)}_{-n}\}_{n\ge 1}$, $i \ge 1$, are Poisson and mutually independent.

Next, from the definitions of $T_i$ and $T^{(-n)}_i$, we conclude that $T_i = T^{(-n)}_i$ a.s. on the event $\{T^{(-n)}_i < -\tau_{-n} - \beta\}$. Therefore, the complementary sets of events are the same, i.e., $\{T_i \ge -\tau_{-n} - \beta\} = \{T^{(-n)}_i \ge -\tau_{-n} - \beta\}$. Then, given the previous observations, we bound the tail of the search cost $C^{(N)}_{-n}$ as

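$$\begin{aligned}
\mathbb{P}[C^{(N)}_{-n} > x, R_0 = i, T^{(-n)}_i < -\tau_{-n} - \beta] &\le \mathbb{P}[C^{(N)}_{-n} > x, R_0 = i] \\
&\le \mathbb{P}[C^{(N)}_{-n} > x, R_0 = i, T^{(-n)}_i < -\tau_{-n} - \beta] + \mathbb{P}[C^{(N)}_{-n} > x, R_0 = i, T^{(-n)}_i \ge -\tau_{-n} - \beta].
\end{aligned} \tag{2}$$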

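Next, since on the event $\{T^{(-n)}_i < -\tau_{-n} - \beta\}$ the search cost $C^{(N)}_{-n}$ is equal to the number of different documents (including $i$) that are moved to the front of the list since the last time that item $i$ was brought to the first position of the list, we derive

$$\begin{aligned}
\mathbb{P}[C^{(N)}_{-n} > x, R_0 = i, T^{(-n)}_i < -\tau_{-n} - \beta] &= \mathbb{P}\Big[R_0 = i, \sum_{j \ne i} \mathbf{1}[T^{(-n)}_j < T^{(-n)}_i < -\tau_{-n} - \beta] \ge x\Big] \\
&= q_i\, \mathbb{P}\Big[\sum_{j \ne i} \mathbf{1}[T_j < T_i < -\tau_{-n} - \beta] \ge x\Big],
\end{aligned}$$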

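where the last equality follows from the independence of the processes $\{R_{-n}\}_{n\ge 1}$ and $\{\tau_{-n}\}_{n\ge 0}$ and the equality $T_i = T^{(-n)}_i$ on $\{T_i < -\tau_{-n} - \beta\}$, $i \ge 1$. Thus, since $-\tau_{-n} \to \infty$ a.s. as $n \to \infty$, we conclude, by the monotone convergence theorem,

$$\lim_{n\to\infty} \sum_{i=1}^{N} \mathbb{P}[C^{(N)}_{-n} > x, R_0 = i, T_i < -\tau_{-n} - \beta] = \sum_{i=1}^{N} q_i\, \mathbb{P}\Big[\sum_{j \ne i} \mathbf{1}[T_j < T_i] \ge x\Big]. \tag{3}$$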

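Next, note that

$$\mathbb{P}[T_i \ge -\tau_{-n} - \beta] \le \mathbb{P}[T_i > n(1 - \varepsilon)] + \mathbb{P}[-\tau_{-n} \le n(1 - \varepsilon) + \beta]. \tag{4}$$

Then, due to the strong law of large numbers, $-\tau_{-n}/n \to 1$ a.s. as $n \to \infty$ and, thus,

$$\lim_{n\to\infty} \mathbb{P}[-\tau_{-n} \le n(1 - \varepsilon) + \beta] = 0. \tag{5}$$

Furthermore, since $T_i < \infty$ a.s., we obtain

$$\lim_{n\to\infty} \mathbb{P}[T_i > n(1 - \varepsilon)] = 0. \tag{6}$$

Finally, the equality of the events $\{T^{(-n)}_i \ge -\tau_{-n} - \beta\} = \{T_i \ge -\tau_{-n} - \beta\}$, the independence of the processes $\{R_{-n}\}_{n\ge 0}$, $\{\tau_{-n}\}_{n\ge 0}$ and (4) imply

$$\lim_{n\to\infty} \sum_{i=1}^{N} \mathbb{P}[R_0 = i, T^{(-n)}_i \ge -\tau_{-n} - \beta] \le \lim_{n\to\infty} \mathbb{P}[-\tau_{-n} \le n(1 - \varepsilon) + \beta] + \lim_{n\to\infty} \sum_{i=1}^{N} q_i\, \mathbb{P}[T_i > n(1 - \varepsilon)] = 0,$$

where in the last equality we applied (5), (6) and the monotone convergence theorem. The previous expression, in conjunction with (3) and (2), implies the following result: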

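Lemma 1 For any $1 \le N \le \infty$, arbitrary initial conditions $(\Pi_0, \mathcal{R}_0)$ and any $x \ge 0$, the search cost $C^{(N)}_n$ converges in distribution to $C^{(N)}$ as $n \to \infty$, where

$$\mathbb{P}[C^{(N)} > x] \triangleq \sum_{i=1}^{N} q_i\, \mathbb{P}[S_i(T_i) \ge x] \tag{7}$$

and $S_i(t) \triangleq \sum_{j \ne i} \mathbf{1}[T_j < t]$, $i \ge 1$.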

Remark 1 Note that the assumption on the Poisson request times is crucial for the analysis of the PAC algorithm. In particular, this assumption, in conjunction with the i.i.d. request process, implies the independence of the random times $\{T_i\}_{i=1}^{N}$. Otherwise, if the request times are not Poisson, e.g., discrete time arrivals, these variables may not be independent, which would make the analysis possibly intractable. The Poisson embedding technique for the LRU policy with i.i.d. requests was first introduced in [6].

In order to evaluate the tail of the stationary search cost $C^{(N)}$, we need estimates for the random times $T_i$, $i \ge 1$. In the next subsection we prove both lower and upper bounds on the distribution of $T_i$, which we use to derive our main results in Section 3.

2.1 Preliminary results on Poisson processes

Let $\{\tau^{(q)}_n\}_{n\ge 1}$ be a positive increasing sequence of Poisson points with rate $q$ and recall that $M^{(q)}(u)$ is the number of Poisson points in the open interval $(u, u + \beta)$. Now, we investigate the distribution of the first time $T^{(q)}$ such that the interval $[T^{(q)}, T^{(q)} + \beta)$ contains at least $k \ge 1$ Poisson points, i.e.,

$$T^{(q)} \triangleq \inf\{\tau^{(q)}_n > 0 : M^{(q)}(\tau^{(q)}_n) \ge k - 1\}. \tag{8}$$

The following lemmas compute upper and lower bounds, respectively, on the distribution of $T^{(q)}$ for all $q$ sufficiently small.
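Before turning to the bounds, the definition in (8) can be explored by simulation. The sketch below is illustrative only: the names `sample_T` and `mean_T` and the parameter values are assumptions made here. Since both bounds have exponential rate close to $q^k\beta^{k-1}/(k-1)!$ for small $q$, the empirical mean of $T^{(q)}$ should be roughly the reciprocal of that rate.

```python
import math
import random
from collections import deque

def sample_T(q, beta, k, rng):
    """One sample of T from (8): the first Poisson point tau_n (rate q) that is
    followed by at least k - 1 further points inside (tau_n, tau_n + beta)."""
    window = deque(maxlen=k)     # the k most recent points
    t = 0.0
    while True:
        t += rng.expovariate(q)
        window.append(t)
        # k consecutive points spanning less than beta: the oldest one qualifies.
        if len(window) == k and window[-1] - window[0] < beta:
            return window[0]

def mean_T(q, beta, k, n_samples, seed=0):
    """Monte Carlo estimate of E[T]."""
    rng = random.Random(seed)
    return sum(sample_T(q, beta, k, rng) for _ in range(n_samples)) / n_samples

q, beta, k = 0.05, 1.0, 2                                  # assumed illustrative values
rate = q ** k * beta ** (k - 1) / math.factorial(k - 1)    # exponent rate in the lemmas
estimate = mean_T(q, beta, k, 20000)
ratio = estimate * rate          # close to 1 when q is small
```

For fixed `beta` and `k`, the ratio moves toward 1 as `q` decreases, which is consistent with the lemmas being stated for all sufficiently small `q`.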

Throughout the paper $H$ denotes a sufficiently large positive constant, while $h$ denotes a sufficiently small positive constant. The values of $H$ and $h$ are generally different in different places. For example, $H/2 = H$, $H^2 = H$, $H + 1 = H$, etc.

Lemma 2 For any $\varepsilon > 0$, there exists $q_0 > 0$ and a fixed constant $h > 0$, such that for all $0 < q \le q_0$,

$$\mathbb{P}[T^{(q)} > t] \le e^{-\frac{q^k \beta^{k-1}}{(k-1)!}(1-\varepsilon)t} + 2e^{-h\varepsilon q^{k-1} t}. \tag{9}$$

Proof: In order to ease the notation, in the proof of this and the following lemma we omit the superscript $q$ in $T^{(q)}$ and simply write $T$. For $k = 1$ the bound trivially holds since $T \equiv \tau^{(q)}_1$ and, thus, we assume that $k \ge 2$.

First, we define a sequence of random times $\{\Theta_j\}_{j\ge 1} \subset \{\tau^{(q)}_n\}_{n\ge 1}$ starting with $\Theta_1 = \tau^{(q)}_1$. Then, we observe the first point in $\{\tau^{(q)}_n\}_{n\ge 1}$ after time $\tau^{(q)}_1 + \beta$, say $\tau^{(q)}_i$, and set $\Theta_2 = \tau^{(q)}_i$. Similarly, $\Theta_3$ is defined to be the first point from $\{\tau^{(q)}_n\}_{n\ge 1}$ after time $\tau^{(q)}_i + \beta$. By continuing this procedure indefinitely, we construct the whole sequence $\{\Theta_j\}_{j\ge 1}$. Clearly, by the Poisson memoryless property, $\{\Theta_j\}_{j\ge 1}$ is a renewal sequence with renewal increments equal to

$$\Theta_{j+1} - \Theta_j \stackrel{d}{=} \tau^{(q)}_1 + \beta. \tag{10}$$

Thus, a sequence of points $\{\nu_j\}_{j\ge 1}$, defined as $\nu_1 = \Theta_1$, $\nu_j = \Theta_j - \beta(j-1)$, $j \ge 2$, is Poisson of rate $q$.

Now, let $U$ be defined by (8) with the sequence $\{\Theta_j\}_{j\ge 1}$ instead of $\{\tau^{(q)}_n\}_{n\ge 1}$. Since $\{\Theta_j\}_{j\ge 1} \subset \{\tau^{(q)}_n\}_{n\ge 1}$, it is clear that

$$T \le U. \tag{11}$$

Similarly, we define

$$X \triangleq \inf\{j \ge 1 : M^{(q)}(\Theta_j) \ge k - 1\}. \tag{12}$$

Because of the memoryless property of the Poisson process, $X$ is independent of $\{\nu_j\}_{j\ge 1}$ (and $\{\Theta_j\}_{j\ge 1}$) with geometric distribution $\mathbb{P}[X = i] = (1-p)^{i-1}p$, $i \ge 1$, where $p$ is equal to

$$p = \mathbb{P}[M^{(q)}(0) \ge k - 1].$$

Then, using the observations from the preceding two paragraphs, we can represent $U$ as

$$U = \Theta_X = \nu_X + \beta(X - 1) \tag{13}$$

with $X$ and $\{\nu_j\}$ being independent.

Next, since $\nu_X$ is the sum of $X$ exponential random variables with $X$ being geometric and independent of $\{\nu_j\}$, it is well known (e.g., see Theorem 5.3, p. 89 of [4]) that $\nu_X$ is also exponential with parameter $pq$. It is also straightforward to derive, for any $\varepsilon > 0$ and all $q \le q_0 \equiv -\log(1 - \varepsilon/2)/\beta$,

$$p = \mathbb{P}[M^{(q)}(0) \ge k - 1] \ge \mathbb{P}[M^{(q)}(0) = k - 1] = e^{-q\beta}\frac{(q\beta)^{k-1}}{(k-1)!} \ge (1 - \varepsilon/2)\frac{(q\beta)^{k-1}}{(k-1)!}. \tag{14}$$
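The fact that a geometric sum of i.i.d. exponential variables is again exponential, here with parameter $pq$, is easy to check by simulation. The sketch below uses assumed values of $p$ and $q$ and a hypothetical helper name; it compares the empirical tail at one point with $e^{-pqt}$.

```python
import math
import random

def geometric_sum_of_exponentials(p, q, rng):
    """Sum of X i.i.d. Exp(q) variables, where X is geometric on {1, 2, ...}
    with success probability p and independent of the summands."""
    total = rng.expovariate(q)
    while rng.random() >= p:        # continue with probability 1 - p
        total += rng.expovariate(q)
    return total

rng = random.Random(0)
p, q, n = 0.2, 0.5, 50000           # assumed illustrative values
samples = [geometric_sum_of_exponentials(p, q, rng) for _ in range(n)]
t = 10.0
empirical_tail = sum(s > t for s in samples) / n
exact_tail = math.exp(-p * q * t)   # tail of an Exp(p * q) variable
sample_mean = sum(samples) / n      # should be close to 1 / (p * q)
```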

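At this point, using the observations from the previous paragraph, (11) and (13), we obtain, for all $q$ small enough ($q \le q_0$),

$$\begin{aligned}
\mathbb{P}[T > t] &\le \mathbb{P}[U > t] \\
&\le \mathbb{P}\Big[\nu_X > \Big(1 - \frac{\varepsilon}{2}\Big)t\Big] + \mathbb{P}\Big[\beta X > \frac{\varepsilon}{2}t\Big] \\
&\le e^{-pq(1 - \frac{\varepsilon}{2})t} + (1 - p)^{\frac{\varepsilon t}{2\beta} - 1} \\
&\le e^{-\frac{q^k \beta^{k-1}}{(k-1)!}(1-\varepsilon)t} + 2e^{-h\varepsilon q^{k-1} t},
\end{aligned} \tag{15}$$

where in the last inequality we applied the bound $1 - x \le e^{-x}$, $x \ge 0$, and assumed that $q_0$ is small enough such that $1/(1 - p) \le 2$. This completes the proof. □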

Lemma 3 For any $\varepsilon > 0$, there exists $q_0 > 0$ such that for all $0 < q \le q_0$

$$\mathbb{P}[T^{(q)} > t] \ge e^{-\frac{q^k \beta^{k-1}}{(k-1)!}(1+\varepsilon)t}. \tag{16}$$

Proof: Since the bound is immediate for $k = 1$, we assume $k \ge 2$.

First, we group the points $\{\tau^{(q)}_n\}_{n\ge 1}$ into cycles using the following procedure. Let $\Theta_1 = \tau^{(q)}_1$ and define a random time

$$Z_1 = \inf\{i \ge 1 : M^{(q)}(\Theta_1 + \beta(i-1)) = 0\}.$$

Then, the first cycle $C_1$ is the closed interval of time $C_1 = [\Theta_1, \Theta_1 + \beta Z_1]$. Next, the first point from $\{\tau^{(q)}_n\}_{n\ge 1}$ after time $\Theta_1 + \beta Z_1$ is named $\Theta_2$ and, similarly as before, we define a random time

$$Z_2 = \inf\{i \ge 1 : M^{(q)}(\Theta_2 + \beta(i-1)) = 0\}$$

and the second cycle $C_2 = [\Theta_2, \Theta_2 + \beta Z_2]$. We continue this procedure indefinitely.

Now, due to the Poisson memoryless property, the sequence of random times $\{Z_i\}$ is i.i.d. with geometric distribution

$$\mathbb{P}[Z_i = j] = (\mathbb{P}[M^{(q)}(0) > 0])^{j-1}\, \mathbb{P}[M^{(q)}(0) = 0]. \tag{17}$$

Similarly as in Lemma 2, by the memoryless property of the Poisson process, $\{\Theta_j\}$ is a renewal process with renewal intervals equal to, for $j \ge 1$,

$$\Theta_{j+1} - \Theta_j \stackrel{d}{=} \tau^{(q)}_1 + \beta Z_1. \tag{18}$$

Hence, the sequence $\{\nu_j\}$, defined as $\nu_1 = \Theta_1$, $\nu_j = \Theta_j - \beta\sum_{l=1}^{j-1} Z_l$, $j \ge 2$, is Poisson of rate $q$. Note that the sequences $\{\Theta_j\}_{j\ge 1}$, $\{\nu_j\}_{j\ge 1}$ as well as the other auxiliary variables (e.g., $p$, $X$) are different from those in the proof of Lemma 2.

Next, we define for $j \ge 1$ the sets

$$A_j \triangleq \{\omega : \exists\, \tau^{(q)}_n \in C_j,\ M^{(q)}(\tau^{(q)}_n) \ge k - 1\};$$

note that the events $A_j$ are independent since the distributions of arrivals in different cycles are independent by the Poisson assumption and $M^{(q)}(\tau^{(q)}_n + \beta(Z_j - 1)) = 0$. Then, because the union of the arrival points in all cycles $\cup_j C_j$ is equal to $\{\tau^{(q)}_n\}$,

$$\begin{aligned}
T &= \inf\{\tau^{(q)}_n > 0 : M^{(q)}(\tau^{(q)}_n) \ge k - 1,\ \tau^{(q)}_n \in C_j,\ j \ge 1\} \\
&\ge L \triangleq \inf\{\Theta_j(\omega) : \omega \in A_j,\ j \ge 1\},
\end{aligned} \tag{19}$$

where the inequality is implied by $\tau^{(q)}_n \ge \Theta_j$ for any $\tau^{(q)}_n \in C_j$, $j \ge 1$.

Now, we define

$$X = \inf\{j \ge 1 : \exists\, \tau^{(q)}_n \in C_j,\ M^{(q)}(\tau^{(q)}_n) \ge k - 1\}.$$

The independence of the events $A_j$ implies that $X$ has a geometric distribution $\mathbb{P}[X = i] = (1-p)^{i-1}p$, $i \ge 1$, where $p$ is equal to

$$p = \mathbb{P}[A_1] \le \mathbb{P}[\{M^{(q)}(0) \ge k - 1\} \cup \{M^{(q)}(0, \beta Z_1) \ge k\}],$$

where $M^{(q)}(u, t)$, $u \le t$, denotes the number of points from $\{\tau^{(q)}_n\}$ in the open interval $(u, t)$. In addition, by the memoryless property of the Poisson process and the fact that $A_j$ depends only on the distribution of points inside the corresponding cycle $C_j$, the variable $X$ is independent of $\{\nu_j\}_{j\ge 1}$.

Therefore, using the preceding definitions, we arrive at

$$L = \Theta_X \ge \nu_X, \tag{20}$$

where $X$ is independent of $\{\nu_j\}$. Furthermore, similarly as in the proof of Lemma 2, $\nu_X$ is an exponential random variable with distribution

$$\mathbb{P}[\nu_X > t] = e^{-pqt}. \tag{21}$$

Thus, in order to complete the proof, we need an upper bound on $p$. In this respect, using the union bound, we upper bound the success probability $p$ as

$$\begin{aligned}
p &\le \mathbb{P}[\{M^{(q)}(0) \ge k - 1\} \cup \{M^{(q)}(0, \beta Z_1) \ge k\}] \\
&\le \mathbb{P}[M^{(q)}(0) \ge k - 1] + \mathbb{P}[M^{(q)}(0, \beta Z_1) \ge k] \\
&\le \mathbb{P}[M^{(q)}(0) \ge k - 1] + \mathbb{P}[Z_1 > k] + \mathbb{P}[M^{(q)}(0, \beta k) \ge k] \\
&= \mathbb{P}[M^{(q)}(0) \ge k - 1] + \mathbb{P}[M^{(q)}(0) > 0]^k + \mathbb{P}[M^{(q)}(0, \beta k) \ge k],
\end{aligned} \tag{22}$$

where in the last equality we used the geometric distribution of $Z_1$ from (17). Finally, (19), (20), (21) and (22), in conjunction with

$$\mathbb{P}[M^{(q)}(0) \ge m] \le \frac{(q\beta)^m}{m!}\sum_{i=0}^{\infty}(q\beta)^i \le (1 + \varepsilon)\frac{(q\beta)^m}{m!}$$

for any $\varepsilon > 0$ and all $q \le \varepsilon/(\beta(1 + \varepsilon))$, yield the stated bound in the lemma. □
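The Poisson tail bound used in the last step can be verified numerically. The following sketch, with assumed values of $\varepsilon$ and $\beta$ and hypothetical function names, evaluates both sides at the largest admissible $q = \varepsilon/(\beta(1+\varepsilon))$, where the geometric series $\sum_i (q\beta)^i = 1/(1 - q\beta)$ equals exactly $1 + \varepsilon$.

```python
import math

def poisson_tail(lam, m):
    """P[M >= m] for M Poisson with mean lam."""
    return 1.0 - sum(math.exp(-lam) * lam ** j / math.factorial(j) for j in range(m))

def tail_bound(lam, m, eps):
    """(1 + eps) * lam^m / m!, an upper bound when lam <= eps / (1 + eps)."""
    return (1.0 + eps) * lam ** m / math.factorial(m)

eps, beta = 0.1, 2.0                      # assumed illustrative values
q = eps / (beta * (1.0 + eps))            # largest q allowed by the condition
checks = [(m, poisson_tail(q * beta, m), tail_bound(q * beta, m, eps))
          for m in range(1, 6)]
all_hold = all(tail <= bound for _, tail, bound in checks)
```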

3 Main results

In this section we derive our main results in Theorems 1, 2 and 3, where we estimate the asymptotics of the tail of the stationary search cost $C^{(N)}$ for $\alpha$ being greater than, less than and equal to one, respectively. Our method of proof uses probabilistic and sample path arguments introduced in [12] for the case of the ordinary LRU (PAC(β, 1)) algorithm. The starting point of our analysis is given by the representation formula (7) from Section 2.

In this paper we use the following standard notation. For any two real functions $a(t)$ and $b(t)$ and fixed $t_0 \in \mathbb{R} \cup \{\infty\}$ we use $a(t) \sim b(t)$ as $t \to t_0$ to denote $\lim_{t\to t_0}[a(t)/b(t)] = 1$. Similarly, we say that $a(t) \gtrsim b(t)$ as $t \to t_0$ if $\liminf_{t\to t_0} a(t)/b(t) \ge 1$; $a(t) \lesssim b(t)$ has a complementary definition.

In the following theorem, we assume that $N = \infty$ and denote $C \equiv C^{(\infty)}$.

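Theorem 1 Assume that $q_i \sim c/i^{\alpha}$ as $i \to \infty$ and $\alpha > 1$. Then, as $x \to \infty$,

$$\mathbb{P}[C > x] \sim K_k(\alpha)\, \mathbb{P}[R > x], \tag{23}$$

where

$$K_k(\alpha) \triangleq \Big[\Gamma\Big(1 - \frac{1}{\alpha k}\Big)\Big]^{\alpha - 1}\, \Gamma\Big(1 + \frac{1}{k} - \frac{1}{\alpha k}\Big). \tag{24}$$

Furthermore, the function $K_k(\alpha)$ is monotonically increasing in $\alpha$, for fixed $k$, with

$$\lim_{\alpha \downarrow 1} K_k(\alpha) = 1, \qquad \lim_{\alpha \uparrow \infty} K_k(\alpha) = K_k(\infty) \triangleq \frac{1}{k}\Gamma\Big(\frac{1}{k}\Big)e^{\gamma/k}, \tag{25}$$

where $\gamma$ is the Euler constant, i.e., $\gamma \approx 0.57721\ldots$, and monotonically decreasing in $k$, for fixed $\alpha$, with

$$\lim_{k\to\infty} K_k(\alpha) = 1. \tag{26}$$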
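The constants in (24) and (25) are straightforward to evaluate. The sketch below uses only Python's standard `math.gamma`; the function names `K` and `K_limit` are hypothetical and introduced only for this illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329

def K(k, alpha):
    """The constant K_k(alpha) from (24), for alpha > 1 and integer k >= 1."""
    a = 1.0 / (alpha * k)
    return math.gamma(1.0 - a) ** (alpha - 1.0) * math.gamma(1.0 + 1.0 / k - a)

def K_limit(k):
    """The limit K_k(infinity) from (25)."""
    return math.gamma(1.0 / k) / k * math.exp(EULER_GAMMA / k)

for k in (1, 2, 3):
    print(k, round(K_limit(k), 2))
# prints "1 1.78", "2 1.18" and "3 1.08", one pair per line
```

These are the ratios quoted in the Introduction for $k = 1, 2, 3$; evaluating `K(k, alpha)` for increasing `alpha` shows the values approaching these limits from below.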

Remark 2 (i) It is well known that in the case of the independent reference model the static algorithm that stores the most popular documents in the cache is optimal. For direct arguments that justify this intuitively obvious statement see the first paragraph of Subsection 4.1 in [11]; this was also recently shown in [2] using the formalism of Markov decision theory. Therefore, $\mathbb{P}[R > x]$ is the fault probability of the optimal static policy and $\mathbb{P}[C > x]/\mathbb{P}[R > x]$ is an average-case competitive ratio between the performances of the PAC and optimal algorithms. (ii) Figure 1 shows the significant improvement in performance of the PAC(β, k) algorithm for $k = 1, 2, 3$ when compared to the optimal static policy. Note that already for $k = 3$ the PAC policy performs approximately within 8% of the optimal static algorithm, which implies near optimality of the PAC rule even for relatively small values of $k$. (iii) The asymptotic result in the present form, using an alternative approach that exploits the Tauberian technique for inverting the Laplace transform, was originally derived in [9].

Figure 1: Function $K_k(\alpha)$ for $k = 1, 2, 3$ (horizontal axis: $\alpha$; vertical axis: $K_k(\alpha)$), with limiting values $K_1(\infty) \approx 1.78$, $K_2(\infty) \approx 1.18$ and $K_3(\infty) \approx 1.08$.

In the proofs of the following theorems we use results proved in Lemma 2 of [9] and Lemma 4 of [12] that are, for reasons of convenience, stated in Lemmas 4 and 5 of the Appendix.

Proof: First, we prove the upper bound for the asymptotic relationship in (23). Define the sum of indicator functions $S(t) \triangleq \sum_{j=1}^{\infty} \mathbf{1}[T_j < t]$; note that $S(t)$ is a.s. non-decreasing in $t$, i.e., $S(t) \le S(\vartheta_x)$ a.s. for all $t \le \vartheta_x$, where $\vartheta_x$ is a positive function of $x$ that we select later. Then, after conditioning on $T_i$ being larger or smaller than $\vartheta_x$, the expression in (7) can be upper bounded as

$$\mathbb{P}[C > x] \le \mathbb{P}[S(\vartheta_x) > x] + \sum_{i=1}^{\infty} q_i\, \mathbb{P}[T_i \ge \vartheta_x], \tag{27}$$

where in the previous expression we applied $\sum_{i=1}^{\infty} q_i = 1$ and $\mathbb{P}[S(t) > x] \le 1$. Next, from the assumption of the theorem and Lemma 3 it follows that for any $\varepsilon > 0$ there exists $j_0$ such that for all $j \ge j_0$ the bound (16) holds and, therefore, we can upper bound the expectation of the sum $S(t)$ as

$$\mathbb{E}S(t) = \sum_{j=1}^{\infty} \mathbb{P}[T_j < t] \le j_0 + \sum_{j=j_0+1}^{\infty}\Big(1 - e^{-\frac{q_j^k \beta^{k-1}}{(k-1)!}(1+\varepsilon)t}\Big).$$

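Next, using the preceding bound and Lemma 4 of the Appendix, we conclude that, as $t \to \infty$,

$$\mathbb{E}S(t) \lesssim \Gamma\Big(1 - \frac{1}{\alpha k}\Big)\frac{c^{\frac{1}{\alpha}}\beta^{\frac{k-1}{\alpha k}}}{((k-1)!)^{\frac{1}{\alpha k}}}(1+\varepsilon)^{\frac{1}{\alpha k}}\, t^{\frac{1}{\alpha k}}. \tag{28}$$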
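The growth rate $t^{1/(\alpha k)}$ and the Gamma-function constant in (28) can be checked numerically for exact Zipf probabilities $q_j = c/j^{\alpha}$. The sketch below is an illustration with assumed parameter values and hypothetical function names; it compares a truncated version of the sum with the right-hand side of (28) evaluated at $\varepsilon = 0$.

```python
import math

def sum_side(t, c, alpha, beta, k, terms=200000):
    """Truncated sum over j of 1 - exp(-q_j^k beta^(k-1) t / (k-1)!), q_j = c / j^alpha."""
    a = c ** k * beta ** (k - 1) * t / math.factorial(k - 1)
    # -expm1(-y) equals 1 - exp(-y) and stays accurate for small y.
    return sum(-math.expm1(-a / j ** (alpha * k)) for j in range(1, terms + 1))

def gamma_side(t, c, alpha, beta, k):
    """Right-hand side of (28) with epsilon = 0."""
    g = alpha * k
    a = c ** k * beta ** (k - 1) * t / math.factorial(k - 1)
    return math.gamma(1.0 - 1.0 / g) * a ** (1.0 / g)

c, alpha, beta, k = 0.5, 1.5, 2.0, 2      # assumed illustrative values
ratios = [sum_side(t, c, alpha, beta, k) / gamma_side(t, c, alpha, beta, k)
          for t in (1e4, 1e6, 1e8)]       # approaches 1 as t grows
```

The truncation at 200000 terms is harmless for these values of `t` because the neglected tail of the sum is several orders of magnitude smaller than the leading term.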

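Now, if we select

$$\vartheta_x = \frac{x^{\alpha k}(1 - 2\varepsilon)^{\alpha k}(k-1)!}{(1+\varepsilon)\, c^k \beta^{k-1}\big[\Gamma\big(1 - \frac{1}{\alpha k}\big)\big]^{\alpha k}}$$

and use (28), it is easy to show that $\mathbb{E}S(\vartheta_x) \le (1 - \varepsilon)x$ for all $x$ large enough. Now, the large deviation bound for the sum of independent Bernoulli random variables from Lemma 5 of the Appendix implies

$$\mathbb{P}[S(\vartheta_x) > x] \le 2e^{-\theta\, \mathbb{E}S(\vartheta_x)},$$

for some $\theta > 0$. Thus, in conjunction with (27), we conclude that as $x \to \infty$,

$$\mathbb{P}[C > x] \le o\Big(\frac{1}{x^{\alpha-1}}\Big) + \sum_{i=1}^{\infty} q_i\, \mathbb{P}[T_i \ge \vartheta_x]. \tag{29}$$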

Next, from Lemma 2 there exists $i_0$ such that for all $i \ge i_0$, $T_i$ satisfies (9). We use $i_0$ to denote a sufficiently large integer constant that is possibly different at different places in the proof. Then, since for every $i \le i_0$ the inequality $q_i \ge q_{i_0}$ holds, the Poisson process of rate $q_i$ can be constructed as a superposition of two independent Poisson processes with rates $q_{i_0}$ and $q_i - q_{i_0}$. Therefore, in this construction, the process of rate $q_i$ will have on each sample path at least as many arrival points as the process of rate $q_{i_0}$. Thus, it is straightforward that, for all $i \le i_0$,

$$\mathbb{P}[T_i \ge t] \le \mathbb{P}[T_{i_0} \ge t]. \tag{30}$$

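Therefore, we obtain

$$\begin{aligned}
\sum_{i=1}^{\infty} q_i\, \mathbb{P}[T_i \ge \vartheta_x] &\le \sum_{i=1}^{i_0} q_i\, \mathbb{P}[T_{i_0} \ge \vartheta_x] + \sum_{i=i_0}^{\infty} q_i\, e^{-\frac{q_i^k \beta^{k-1}(1-\varepsilon)}{(k-1)!}\vartheta_x} + 2\sum_{i=i_0}^{\infty} q_i\, e^{-h\varepsilon q_i^{k-1}\vartheta_x} \\
&\triangleq I_1(x) + I_2(x) + I_3(x),
\end{aligned} \tag{31}$$

where in the last two sums we used the result of Lemma 2.

After using the bound (9) and replacing $\vartheta_x$, it immediately follows that

$$I_1(x) \le \sum_{i=1}^{i_0} q_i\Big[e^{-\frac{q_{i_0}^k \beta^{k-1}(1-\varepsilon)}{(k-1)!}\vartheta_x} + 2e^{-h\varepsilon q_{i_0}^{k-1}\vartheta_x}\Big] = o\Big(\frac{1}{x^{\alpha-1}}\Big) \quad \text{as } x \to \infty. \tag{32}$$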

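Now, by the assumption of the theorem, for all $i$ large enough ($i \ge i_0$, where $i_0$ is possibly larger than in (31))

$$(1 - \varepsilon)c/i^{\alpha} < q_i < (1 + \varepsilon)c/i^{\alpha}. \tag{33}$$

Furthermore, for $i$ large enough ($i \ge i_0$) the inequality $c/i^{\alpha} \le (1 + \varepsilon)c/u^{\alpha}$ holds for any $u \in [i, i+1]$ and, therefore, using this bound, (33), the monotonicity of the exponential function and replacing $\vartheta_x$ yields

$$\begin{aligned}
I_2(x) &\le (1 + \varepsilon)\sum_{i=i_0}^{\infty} \frac{c}{i^{\alpha}}\, e^{-\iota(\varepsilon)[\Gamma(1 - \frac{1}{\alpha k})]^{-\alpha k}\frac{x^{\alpha k}}{i^{\alpha k}}} \\
&\le (1 + \varepsilon)^2 \int_1^{\infty} \frac{c}{u^{\alpha}}\, e^{-\iota(\varepsilon)[\Gamma(1 - \frac{1}{\alpha k})]^{-\alpha k}\frac{x^{\alpha k}}{u^{\alpha k}}}\, du,
\end{aligned} \tag{34}$$

where $\iota(\varepsilon) \triangleq (1 + \varepsilon)^{-1}(1 - \varepsilon)^{k+1}(1 - 2\varepsilon)^{\alpha k}$. Next, applying the change of variable method for evaluating the integral with $z = \iota(\varepsilon)[\Gamma(1 - \frac{1}{\alpha k})]^{-\alpha k}x^{\alpha k}u^{-\alpha k}$, we obtain that the integral in (34) is equal to

$$\frac{c}{x^{\alpha-1}(\alpha - 1)}\Big[\Gamma\Big(1 - \frac{1}{\alpha k}\Big)\Big]^{\alpha-1}(\iota(\varepsilon))^{\frac{1}{\alpha k} - \frac{1}{k}}\, \frac{\alpha - 1}{\alpha k}\int_0^{\iota(\varepsilon)[\Gamma(1 - \frac{1}{\alpha k})]^{-\alpha k}x^{\alpha k}} e^{-z} z^{\frac{1}{k} - \frac{1}{\alpha k} - 1}\, dz,$$

which, in conjunction with (34), implies

$$\limsup_{x\to\infty} \frac{I_2(x)}{\mathbb{P}[R > x]} \le K_k(\alpha)(\iota(\varepsilon))^{\frac{1}{\alpha k} - \frac{1}{k}} \to K_k(\alpha) \quad \text{as } \varepsilon \to 0, \tag{35}$$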

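where $K_k(\alpha)$ is defined in (24).

In order to estimate the asymptotics of $I_3(x)$, we use analogous steps to those we applied in estimating $I_2(x)$. Thus, from the assumption $q_i \sim c/i^{\alpha}$ as $i \to \infty$, it follows that for $i$ large ($i \ge i_0$) the inequalities (33) and $c/i^{\alpha} \le (1 + \varepsilon)c/u^{\alpha}$ hold for all $u \in [i, i+1]$ and, therefore, after replacing $\vartheta_x$,

$$I_3(x) \le 2(1 + \varepsilon)\sum_{i=i_0}^{\infty} \frac{c}{i^{\alpha}}\, e^{-h\varepsilon\frac{x^{\alpha k}}{i^{\alpha(k-1)}}} \le 2(1 + \varepsilon)^2\int_1^{\infty} \frac{c}{u^{\alpha}}\, e^{-h\varepsilon\frac{x^{\alpha k}}{u^{\alpha(k-1)}}}\, du. \tag{36}$$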

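Now, if $k = 1$, it is straightforward to compute the integral in the preceding expression and obtain $I_3(x) \le (1 + \varepsilon)^2 (c/(\alpha - 1))\, e^{-h\varepsilon x^{\alpha}} = o(1/x^{\alpha-1})$ as $x \to \infty$. Otherwise, for $k \ge 2$, after using the change of variable method for solving the integral in (36) with $z = h\varepsilon x^{\alpha k}u^{-\alpha(k-1)}$, we obtain, as $x \to \infty$,

$$I_3(x) \le 2(1 + \varepsilon)^3\, \frac{c}{(h\varepsilon)^{\frac{1}{k-1}(1 - \frac{1}{\alpha})}}\, \frac{1}{\alpha(k-1)}\, \frac{1}{x^{\frac{k}{k-1}(\alpha-1)}}\, \Gamma\Big(\frac{1}{k-1} - \frac{1}{\alpha(k-1)}\Big) = o\Big(\frac{1}{x^{\alpha-1}}\Big).$$

The previous expression, in conjunction with (35), (32), (31) and (29), yields, as $x \to \infty$,

$$\mathbb{P}[C > x] \lesssim K_k(\alpha)\, \mathbb{P}[R > x]. \tag{37}$$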

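Next, we estimate an asymptotic lower bound for $\mathbb{P}[C > x]$ in (23). Similarly as before, Lemma 2 implies that for every $\varepsilon > 0$ and all $i$ large enough ($i \ge i_0$), inequality (9) holds and, therefore,

$$\begin{aligned}
\mathbb{E}S(t) &\ge \sum_{i=i_0}^{\lfloor Ht^{\frac{1}{\alpha k}}\rfloor}\Big(1 - e^{-\frac{q_i^k \beta^{k-1}(1-\varepsilon)}{(k-1)!}t} - 2e^{-h\varepsilon q_i^{k-1}t}\Big) \\
&\ge \sum_{i=i_0}^{\lfloor Ht^{\frac{1}{\alpha k}}\rfloor}\Big(1 - e^{-\frac{q_i^k \beta^{k-1}(1-\varepsilon)}{(k-1)!}t}\Big) - 2\sum_{i=i_0}^{\lfloor Ht^{\frac{1}{\alpha k}}\rfloor} e^{-h\varepsilon q_i^{k-1}t}.
\end{aligned} \tag{38}$$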

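Next, since for $i$ large ($i \ge i_0$) inequality (33) holds, after upper bounding the second sum in (38), we obtain, as $t \to \infty$,

$$\begin{aligned}
\mathbb{E}S(t) &\ge \sum_{i=i_0}^{\lfloor Ht^{\frac{1}{\alpha k}}\rfloor}\Big(1 - e^{-\frac{(1-\varepsilon)^{k+1}c^k\beta^{k-1}t}{i^{\alpha k}(k-1)!}}\Big) - Ht^{\frac{1}{\alpha k}}\, e^{-h\varepsilon\frac{(1-\varepsilon)^{k-1}c^{k-1}}{H^{\alpha(k-1)}}t^{1 - \frac{k-1}{k}}} \\
&= \sum_{i=i_0}^{\lfloor Ht^{\frac{1}{\alpha k}}\rfloor}\Big(1 - e^{-\frac{(1-\varepsilon)^{k+1}c^k\beta^{k-1}t}{i^{\alpha k}(k-1)!}}\Big) + o(t^{\frac{1}{\alpha k}}) \\
&= \sum_{i=i_0}^{\infty}\Big(1 - e^{-\frac{(1-\varepsilon)^{k+1}c^k\beta^{k-1}t}{i^{\alpha k}(k-1)!}}\Big) - \sum_{i=\lfloor Ht^{\frac{1}{\alpha k}}\rfloor+1}^{\infty}\Big(1 - e^{-\frac{(1-\varepsilon)^{k+1}c^k\beta^{k-1}t}{i^{\alpha k}(k-1)!}}\Big) + o(t^{\frac{1}{\alpha k}}).
\end{aligned} \tag{39}$$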

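Now, after defining $L \triangleq c^k\beta^{k-1}(1 - \varepsilon)^{k+1}/(k-1)!$ and using the inequality $1 - e^{-x} \le x$ for $x \ge 0$, we derive

$$\begin{aligned}
\lim_{t\to\infty}\frac{1}{t^{\frac{1}{\alpha k}}}\sum_{i=\lfloor Ht^{\frac{1}{\alpha k}}\rfloor+1}^{\infty}\Big(1 - e^{-\frac{c^k\beta^{k-1}(1-\varepsilon)^{k+1}t}{i^{\alpha k}(k-1)!}}\Big) &\le \lim_{t\to\infty}\frac{1}{t^{\frac{1}{\alpha k}}}\int_{Ht^{\frac{1}{\alpha k}}}^{\infty}\frac{Lt}{u^{\alpha k}}\, du \\
&= \frac{L}{\alpha k - 1}\, \frac{1}{H^{\alpha k - 1}} \to 0 \quad \text{as } H \to \infty,
\end{aligned} \tag{40}$$

and, therefore, in conjunction with (39) and Lemma 4 of the Appendix, we conclude

$$\lim_{t\to\infty}\frac{\mathbb{E}S(t)}{t^{\frac{1}{\alpha k}}} \ge \Gamma\Big(1 - \frac{1}{\alpha k}\Big)\frac{c^{\frac{1}{\alpha}}\beta^{\frac{k-1}{\alpha k}}}{((k-1)!)^{\frac{1}{\alpha k}}}(1 - \varepsilon)^{\frac{k+1}{\alpha k}},$$

for any $\varepsilon > 0$. Therefore, after letting $\varepsilon \to 0$, we derive

$$\mathbb{E}S(t) \gtrsim \Gamma\Big(1 - \frac{1}{\alpha k}\Big)\frac{c^{\frac{1}{\alpha}}\beta^{\frac{k-1}{\alpha k}}}{((k-1)!)^{\frac{1}{\alpha k}}}\, t^{\frac{1}{\alpha k}} \quad \text{as } t \to \infty. \tag{41}$$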

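Next, we redefine $\vartheta_x$ as

$$\vartheta_x = \frac{x^{\alpha k}(1 + 2\varepsilon)^{\alpha k}(k-1)!}{c^k\beta^{k-1}\big[\Gamma\big(1 - \frac{1}{\alpha k}\big)\big]^{\alpha k}}.$$

Then, the monotonicity of $S(t)$ and (41) yield $\mathbb{E}S(t) \ge (1 + \varepsilon)x$ for all $t \ge \vartheta_x$ and $x$ large enough, which in conjunction with Lemma 5 of the Appendix implies $\mathbb{P}[S(t) > x] \ge 1 - \varepsilon$ for all $t \ge \vartheta_x$. Thus, for $x$ large enough, after conditioning on $T_i \ge \vartheta_x$, expression (7) can be lower bounded as

$$\mathbb{P}[C > x] \ge (1 - \varepsilon)\sum_{i=1}^{\infty} q_i\, \mathbb{P}[T_i \ge \vartheta_x]. \tag{42}$$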

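Next, we proceed with estimating (42). By inequality (16) of Lemma 3 and the assumption of the theorem, for $i$ large enough ($i > i_0$), after replacing $\vartheta_x$, we obtain

\[
\mathbb{P}[C > x] \ge (1-\varepsilon)\sum_{i=i_0+1}^{\infty}
q_i\, \exp\!\left(-\frac{q_i^k\, x^{\alpha k}(1+\varepsilon)(1+2\varepsilon)^{\alpha k}}{\left[\Gamma\!\left(1 - \frac{1}{\alpha k}\right)\right]^{\alpha k} c^k}\right).
\]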

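Furthermore, similarly as in estimating $I_2(x)$, for $i$ large ($i > i_0$) inequality (33) and $c/i^\alpha \ge (1-\varepsilon)c/u^\alpha$ hold for all $u \in [i-1, i]$, which in conjunction with the monotonicity of the exponential function yields

\[
\mathbb{P}[C > x] \ge (1-\varepsilon)^3 \int_{u=i_0}^{\infty} \frac{c}{u^\alpha}\,
\exp\!\left(-\frac{x^{\alpha k}\,\iota(\varepsilon)}{u^{\alpha k}\left[\Gamma\!\left(1 - \frac{1}{\alpha k}\right)\right]^{\alpha k}}\right) du, \tag{43}
\]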

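where $\iota(\varepsilon) \triangleq (1+\varepsilon)^{k+1}(1+2\varepsilon)^{\alpha k}$. Then, using the change of variable method for solving the previous integral with $z = x^{\alpha k}\iota(\varepsilon)u^{-\alpha k}\left[\Gamma\!\left(1 - \frac{1}{\alpha k}\right)\right]^{-\alpha k}$, we obtain that the integral in (43) is equal to

\[
\frac{c}{(\alpha-1)x^{\alpha-1}}\;\frac{\alpha-1}{\alpha k}\;
(\iota(\varepsilon))^{\frac{1}{\alpha k}-\frac{1}{k}}
\left[\Gamma\!\left(1 - \frac{1}{\alpha k}\right)\right]^{\alpha-1}
\int_0^{\frac{x^{\alpha k}\iota(\varepsilon)}{i_0^{\alpha k}\left[\Gamma\left(1 - \frac{1}{\alpha k}\right)\right]^{\alpha k}}}
z^{\frac{1}{k}-\frac{1}{\alpha k}-1} e^{-z}\,dz,
\]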

which implies

\[
\liminf_{x\to\infty} \frac{\mathbb{P}[C > x]}{\mathbb{P}[R > x]}
\ge (\iota(\varepsilon))^{\frac{1}{\alpha k}-\frac{1}{k}}\, K_k(\alpha) \to K_k(\alpha) \quad \text{as } \varepsilon \to 0,
\]

where $K_k(\alpha)$ is defined in (24). The previous result in conjunction with (37) proves (23).

Finally, it is left to prove the monotonicity of the function $K_k(\alpha)$ and its limits when $\alpha \to \infty$, $k \to \infty$. Since this proof, although technical, uses standard techniques from calculus, we present it in the Appendix. □

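Theorem 2 Assume that $q_i = h_N/i^\alpha$, $1 \le i \le N$, where $h_N$ is the normalization constant and $0 < \alpha < 1$. Then, for any $0 < \delta < 1$, as $N \to \infty$,

\[
\mathbb{P}[C^{(N)} > \delta N] \sim F_k(\delta) \triangleq
\frac{1-\alpha}{\alpha k}\,(\eta_\delta)^{\frac{1}{\alpha k}-\frac{1}{k}}\,
\Gamma\!\left(\frac{1}{k} - \frac{1}{\alpha k},\, \eta_\delta\right), \tag{44}
\]

where $\eta_\delta$ is the unique solution of the equation

\[
1 - \frac{1}{\alpha k}\,\Gamma\!\left(-\frac{1}{\alpha k},\, \eta\right)\eta^{\frac{1}{\alpha k}} = \delta;
\]

note that $\Gamma(x, y)$, $y > 0$, is the incomplete Gamma function, i.e., $\Gamma(x, y) = \int_y^\infty e^{-t}t^{x-1}\,dt$. Furthermore, $1 - F_k(\delta)$ is a proper distribution on $(0, 1)$ with $\lim_{\delta \downarrow 0} F_k(\delta) = 1$, $\lim_{\delta \uparrow 1} F_k(\delta) = 0$ and

\[
\lim_{k\to\infty} F_k(\delta) = 1 - \delta^{1-\alpha}. \tag{45}
\]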

Remark 3 (i) Observe that $\mathbb{P}[R^{(N)} > \delta N] \to 1 - \delta^{1-\alpha}$ as $N \to \infty$. Thus, for large caches, the limit in (45) shows that the PAC policy approaches the optimal static algorithm as $k$ increases. (ii) In Figure 2 we present the relative performance of the PAC(20, $k$) replacement scheme, $k = 1, 2, 3$, when compared to the optimal static arrangement; in this case the cache can store $1/10$ of the total number of documents. Note that the performance of the PAC algorithm improves drastically even for small values of the parameter $k$, and, thus, it is nearly optimal. (iii) For the ordinary MTF searching ($k = 1$), the convergence of $C^{(N)}/N$ in distribution as $N \to \infty$ and the Laplace transform of the limiting function are obtained in Lemma 4.5 of [5]. The result in its presented form, also for the ordinary LRU, was derived in [10]. (iv) It is possible to relax the assumption $q_i = h_N/i^\alpha$, $1 \le i \le N$, e.g., by assuming that for any $\varepsilon > 0$ there exists $i_0$ such that for all $i_0 \le i \le N$ the inequality $(1-\varepsilon)c/(i^\alpha N^{1-\alpha}) \le q_i \le (1+\varepsilon)c/(i^\alpha N^{1-\alpha})$ holds; the expression in (48) would then be replaced by this slightly different form, and the rest of the proof would be identical; the final asymptotic formula in this case would have the factor $c$ instead of $1-\alpha$, and all of the other factors in (44) would be the same. (v) Note that a similar theorem could be proved for the case of $\alpha > 1$ and $N < \infty$, in which case the asymptotic result would be less explicit than the one stated in Theorem 1 and, therefore, we omit it.

[Figure 2: Ratio $F_k(1/10)/(1 - (1/10)^{1-\alpha})$ for $k = 1, 2, 3$. Horizontal axis: $\alpha$, from 0.1 to 0.9; vertical axis: the ratio, from 1 to 1.4; values annotated on the plot: 1.3471, 1.1084 and 1.0479.]
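The quantities in Theorem 2 can be evaluated numerically. The following is a minimal sketch, not the authors' code: it assumes the reading of (44) in which $F_k(\delta) = c\,\eta^{c}\,\Gamma(-c, \eta)$ with $c = (1-\alpha)/(\alpha k)$, and it rewrites $a\,\eta^{a}\Gamma(-a,\eta)$ as $a\int_0^\infty e^{-aw-\eta e^{w}}dw$ (substitution $t = \eta e^{w}$) so that plain Simpson quadrature suffices. The helper names `scaled_gamma`, `solve_eta`, `F` and `ratio` are hypothetical.

```python
import math

def scaled_gamma(a, eta, n=2000):
    """Return a * eta**a * Gamma(-a, eta) for a > 0 and eta > 0.

    With t = eta * e^w this equals a * int_0^inf exp(-a*w - eta*e^w) dw,
    which is evaluated by composite Simpson's rule (n must be even).
    """
    upper = max(math.log(60.0 / eta), 0.0) + 1.0  # integrand is negligible beyond this
    h = upper / n

    def g(w):
        return math.exp(-a * w - eta * math.exp(w))

    total = g(0.0) + g(upper)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * g(j * h)
    return a * total * h / 3.0

def solve_eta(alpha, k, delta):
    """Bisection in log(eta) for f(eta) = 1 - scaled_gamma(1/(alpha*k), eta) = delta.

    The bracket [1e-200, 100] is an assumption that is adequate for moderate k.
    """
    a = 1.0 / (alpha * k)
    lo, hi = math.log(1e-200), math.log(100.0)
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if 1.0 - scaled_gamma(a, math.exp(mid)) < delta:
            lo = mid          # f is increasing, so the root lies to the right
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))

def F(alpha, k, delta):
    """F_k(delta) as in (44), for 0 < alpha < 1."""
    c = (1.0 - alpha) / (alpha * k)
    return scaled_gamma(c, solve_eta(alpha, k, delta))

def ratio(alpha, k, delta=0.1):
    """F_k(delta) relative to the optimal static value 1 - delta^(1 - alpha)."""
    return F(alpha, k, delta) / (1.0 - delta ** (1.0 - alpha))

if __name__ == "__main__":
    for k in (1, 2, 3, 30):
        print(k, ratio(0.8, k))
```

Under these assumptions the ratio should decrease towards one as $k$ grows, in line with (45); the quadrature resolution and the bisection bracket are tuning choices rather than part of the theorem.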

Proof: First, we estimate the upper bound for the asymptotic relationship in (44). Similarly as in the proof of Theorem 1, we define the sum $S(t) \triangleq \sum_{j=1}^{N} 1[T_j < t]$. For any $\vartheta > 0$, since $S(t)$ is (a.s.) non-decreasing in $t$, we have $S(t) \le S(\vartheta)$ a.s. for all $t \le \vartheta$. Thus, after conditioning on $T_i$ being larger or smaller than $\vartheta$ in expression (7), we easily obtain the following upper bound

\[
\mathbb{P}[C^{(N)} > \delta N] \le \mathbb{P}[S(\vartheta) > \delta N] + \sum_{i=1}^{N} q_i\,\mathbb{P}[T_i \ge \vartheta]. \tag{46}
\]

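Then, by Lemma 3, for $N$ large and any $\varepsilon > 0$, there exists $i_0$ such that for all $i \ge i_0$ inequality (16) holds, and, in conjunction with the monotonicity of the exponential function, we obtain

\[
\begin{aligned}
\mathbb{E}S(t) &= \sum_{i=1}^{N} \mathbb{P}[T_i < t] \\
&\le i_0 + \sum_{i=i_0+1}^{N} \left(1 - e^{-\frac{q_i^k\beta^{k-1}(1+\varepsilon)t}{(k-1)!}}\right) \\
&\le N - \int_{i_0}^{N} e^{-\frac{h_N^k}{u^{\alpha k}}\frac{\beta^{k-1}}{(k-1)!}(1+\varepsilon)t}\,du.
\end{aligned} \tag{47}
\]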

At this point, note that from the assumption $\sum_{i=1}^{N} h_N/i^\alpha = 1$, it is straightforward to check that for $N$ large enough

\[
(1-\varepsilon)\frac{1-\alpha}{N^{1-\alpha}} \le h_N \le (1+\varepsilon)\frac{1-\alpha}{N^{1-\alpha}}. \tag{48}
\]
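How large $N$ must be for (48) to hold with a small $\varepsilon$ can be checked directly. The sketch below is illustrative only; the function name `normalization_ratio` is hypothetical. It computes $h_N N^{1-\alpha}/(1-\alpha)$, which (48) says tends to one.

```python
def normalization_ratio(alpha, n_docs):
    """h_N * N^(1-alpha) / (1-alpha), with h_N = 1 / sum_{i<=N} i^(-alpha), 0 < alpha < 1."""
    h = 1.0 / sum(i ** -alpha for i in range(1, n_docs + 1))
    return h * n_docs ** (1.0 - alpha) / (1.0 - alpha)

if __name__ == "__main__":
    for n_docs in (1300, 10**4, 10**6):
        print(n_docs, normalization_ratio(0.8, n_docs))
```

For $\alpha$ close to one the ratio approaches one slowly, so for moderate $N$ it can be preferable to work with the exact $h_N$ rather than its asymptotic value.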

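Thus, if we assume that $t$ increases in $N$ as $t = \eta N^k/\xi(\varepsilon)$, where $\eta > 0$ is a fixed constant, using inequalities (47) and (48), we bound $\mathbb{E}S(\eta N^k/\xi(\varepsilon))$ for $N$ large as

\[
\mathbb{E}S\!\left(\frac{\eta N^k}{\xi(\varepsilon)}\right) \le N - \int_{i_0}^{N} e^{-\frac{N^{\alpha k}\eta}{u^{\alpha k}}}\,du, \tag{49}
\]

where $\xi(\varepsilon)$ is defined as

\[
\xi(\varepsilon) \triangleq \frac{(1+\varepsilon)^{k+1}(1-\alpha)^k\beta^{k-1}}{(k-1)!}.
\]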

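Next, using the change of variable $z = N^{\alpha k}\eta u^{-\alpha k}$ in the integral in (49), we obtain

\[
\mathbb{E}S\!\left(\frac{\eta N^k}{\xi(\varepsilon)}\right) \le N - N\,\frac{1}{\alpha k}\,\eta^{\frac{1}{\alpha k}}
\int_{\eta}^{\eta N^{\alpha k}/i_0^{\alpha k}} e^{-z} z^{-1-\frac{1}{\alpha k}}\,dz. \tag{50}
\]

Furthermore, since

\[
\int_{\eta}^{\eta N^{\alpha k}/i_0^{\alpha k}} e^{-z} z^{-1-\frac{1}{\alpha k}}\,dz \uparrow \Gamma\!\left(-\frac{1}{\alpha k},\, \eta\right) \quad \text{as } N \to \infty,
\]

we obtain, for $N$ large enough,

\[
\mathbb{E}S\!\left(\frac{\eta N^k}{\xi(\varepsilon)}\right) \le N - N(1-\varepsilon)\,\frac{1}{\alpha k}\,\eta^{\frac{1}{\alpha k}}\,\Gamma\!\left(-\frac{1}{\alpha k},\, \eta\right). \tag{51}
\]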

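Now, define the function $f(\eta)$ as

\[
f(\eta) \triangleq 1 - \frac{1}{\alpha k}\,\eta^{\frac{1}{\alpha k}}\,\Gamma\!\left(-\frac{1}{\alpha k},\, \eta\right), \tag{52}
\]

and let $\eta_\delta(\varepsilon_0)$ be the unique solution to the equation

\[
f(\eta) = \delta(1 - 2\varepsilon_0), \tag{53}
\]

for some $\varepsilon_0 > 0$; the uniqueness of the solution will be justified at the end of the proof. Then, from (51), it follows

\[
\mathbb{E}S\!\left(\frac{\eta_\delta(\varepsilon_0)N^k}{\xi(\varepsilon)}\right) \le N\bigl(1 - (1-\varepsilon)(1 - \delta(1-2\varepsilon_0))\bigr).
\]

Thus, for all $\varepsilon < \varepsilon_0\delta(1-2\varepsilon_0)/(1 - \delta(1-2\varepsilon_0))$, the preceding expression is bounded by

\[
\mathbb{E}S\!\left(\frac{\eta_\delta(\varepsilon_0)N^k}{\xi(\varepsilon)}\right) \le (1+\varepsilon_0)\delta(1-2\varepsilon_0)N \le (1-\varepsilon_0)\delta N.
\]

Now, since $S(t)$ is a.s. non-decreasing in $t$, we conclude that for all $t \le \vartheta \triangleq \eta_\delta(\varepsilon_0)N^k(\xi(\varepsilon))^{-1}$

\[
\mathbb{E}S(t) \le (1-\varepsilon_0)\delta N.
\]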

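At this point, we use the large deviation (Chernoff) bound for the sum of $N$ independent Bernoulli random variables from Lemma 5 of the Appendix to conclude

\[
\mathbb{P}[S(\vartheta) > \delta N] \le e^{-\theta_{\varepsilon_0}\delta N},
\]

for some $\theta_{\varepsilon_0} > 0$. Thus, after upper bounding the first term in (46) and replacing $\vartheta$, we derive

\[
\mathbb{P}[C^{(N)} > \delta N] \le o(1) + \sum_{i=1}^{N} q_i\,\mathbb{P}[T_i \ge \vartheta] \quad \text{as } N \to \infty. \tag{54}
\]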

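Next, we estimate the second term on the right hand side of (54). The same arguments that resulted in inequality (30) in the proof of Theorem 1, in conjunction with (9), yield, for $N$ and $i_0$ large enough,

\[
\begin{aligned}
\sum_{i=1}^{N} q_i\,\mathbb{P}[T_i \ge \vartheta]
&\le \sum_{i=1}^{i_0} q_i\,\mathbb{P}[T_i \ge \vartheta] + \sum_{i=i_0}^{N} q_i\,\mathbb{P}[T_i \ge \vartheta] \\
&\le i_0\,\mathbb{P}[T_{i_0} \ge \vartheta]
+ \sum_{i=i_0}^{N} q_i\, e^{-\frac{q_i^k\beta^{k-1}(1-\varepsilon)}{(k-1)!}\vartheta}
+ 2\sum_{i=i_0}^{N} q_i\, e^{-h_\varepsilon q_i^{k-1}\vartheta} \\
&\triangleq I_1(\delta) + I_2(\delta) + I_3(\delta).
\end{aligned} \tag{55}
\]

After replacing $\vartheta$ in $I_1(\delta)$, it is straightforward to conclude

\[
I_1(\delta) \le i_0\left[e^{-\frac{q_{i_0}^k\beta^{k-1}(1-\varepsilon)}{(k-1)!}\vartheta} + 2e^{-h_\varepsilon q_{i_0}^{k-1}\vartheta}\right] = o(1) \quad \text{as } N \to \infty. \tag{56}
\]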

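Next, we estimate $I_2(\delta)$. After applying inequality (48), replacing $\vartheta$, using the monotonicity of the exponential function and the relation $h_N/i^\alpha \le (1+\varepsilon)h_N/u^\alpha$ for all $u \in [i, i+1]$ and $i$ large ($i \ge i_0$), we obtain

\[
I_2(\delta) \le (1+\varepsilon)^2 \int_{i_0}^{N} \frac{1-\alpha}{u^\alpha N^{1-\alpha}}\,
e^{-\iota(\varepsilon)\frac{\eta_\delta(\varepsilon_0)}{u^{\alpha k}}N^{\alpha k}}\,du, \tag{57}
\]

where we define $\iota(\varepsilon) \triangleq (1-\varepsilon)^{k+1}(1+\varepsilon)^{-(k+1)}$. Then, similarly as before, using the change of variable method for solving the integral with $z \triangleq \iota(\varepsilon)\eta_\delta(\varepsilon_0)u^{-\alpha k}N^{\alpha k}$, we derive

\[
\begin{aligned}
I_2(\delta) &\le (1+\varepsilon)^2 (\iota(\varepsilon))^{\frac{1}{\alpha k}-\frac{1}{k}}\,
\frac{1-\alpha}{\alpha k}\,(\eta_\delta(\varepsilon_0))^{\frac{1}{\alpha k}-\frac{1}{k}}
\int_{\iota(\varepsilon)\eta_\delta(\varepsilon_0)}^{\iota(\varepsilon)\eta_\delta(\varepsilon_0)N^{\alpha k}/i_0^{\alpha k}}
z^{\frac{1}{k}-\frac{1}{\alpha k}-1}e^{-z}\,dz \\
&\le (1+\varepsilon)^2 (\iota(\varepsilon))^{\frac{1}{\alpha k}-\frac{1}{k}}\,
\frac{1-\alpha}{\alpha k}\,(\eta_\delta(\varepsilon_0))^{\frac{1}{\alpha k}-\frac{1}{k}}\,
\Gamma\!\left(\frac{1}{k}-\frac{1}{\alpha k},\, \iota(\varepsilon)\eta_\delta(\varepsilon_0)\right).
\end{aligned}
\]

Thus, after letting $\varepsilon \downarrow 0$, $\varepsilon_0 \downarrow 0$, we conclude

\[
I_2(\delta) \le \frac{1-\alpha}{\alpha k}\,(\eta_\delta)^{\frac{1}{\alpha k}-\frac{1}{k}}\,
\Gamma\!\left(\frac{1}{k}-\frac{1}{\alpha k},\, \eta_\delta\right), \tag{58}
\]

since, by continuity of $f(\eta)$, $\eta_\delta(\varepsilon_0) \to \eta_\delta$ as $\varepsilon_0 \downarrow 0$, where $\eta_\delta$ is the unique solution to the equation $f(\eta_\delta) = \delta$.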

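Finally, we estimate $I_3(\delta)$. Here, we observe two possible cases: $k = 1$ and $k \ge 2$. For $k = 1$, after applying (48), replacing $\vartheta$, and using the relation $h_N/i^\alpha \le (1+\varepsilon)h_N/u^\alpha$ for all $u \in [i, i+1]$ and $i$ large ($i \ge i_0$), we obtain

\[
\begin{aligned}
I_3(\delta) &\le 2(1+\varepsilon)^2\,\frac{1-\alpha}{N^{1-\alpha}}\,e^{-h_\varepsilon N^k}\int_{i_0}^{N}\frac{1}{u^\alpha}\,du \\
&\le 2(1+\varepsilon)^2 e^{-h_\varepsilon N^k} = o(1) \quad \text{as } N \to \infty.
\end{aligned}
\]

Next, we estimate $I_3(\delta)$ in the case when $k \ge 2$. Similarly as before, after using (48) and replacing $\vartheta$, in conjunction with the monotonicity of the exponential function and $h_N/i^\alpha \le (1+\varepsilon)h_N/u^\alpha$ for all $u \in [i, i+1]$ and $i$ large ($i \ge i_0$), we obtain

\[
I_3(\delta) \le 2(1+\varepsilon)^2\,\frac{1-\alpha}{N^{1-\alpha}}\int_{i_0}^{N}\frac{1}{u^\alpha}\,
e^{-h_\varepsilon\frac{N^{\alpha k-\alpha+1}}{u^{\alpha(k-1)}}}\,du.
\]

Then, using the change of variable method for solving the integral, with $z = h_\varepsilon N^{\alpha k-\alpha+1}u^{-\alpha(k-1)}$, we derive, for $N$ large,

\[
\begin{aligned}
I_3(\delta) &\le 2h_\varepsilon^{\frac{1}{k-1}\left(\frac{1}{\alpha}-1\right)} N^{\frac{1}{k-1}\left(\frac{1}{\alpha}-1\right)}
\int_{h_\varepsilon N}^{h_\varepsilon N (N/i_0)^{\alpha(k-1)}} e^{-z} z^{-\frac{1}{k-1}\left(\frac{1}{\alpha}-1\right)-1}\,dz \\
&\le 2h_\varepsilon^{\frac{1}{k-1}\left(\frac{1}{\alpha}-1\right)} N^{\frac{1}{k-1}\left(\frac{1}{\alpha}-1\right)}
\int_{h_\varepsilon N}^{\infty} e^{-z}\,dz \\
&\le 2h_\varepsilon^{\frac{1}{k-1}\left(\frac{1}{\alpha}-1\right)} N^{\frac{1}{k-1}\left(\frac{1}{\alpha}-1\right)} e^{-h_\varepsilon N} = o(1) \quad \text{as } N \to \infty.
\end{aligned} \tag{59}
\]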

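Finally, (59), (58), (56), (55) and (54) imply

\[
\limsup_{N\to\infty} \mathbb{P}[C^{(N)} > \delta N] \le \frac{1-\alpha}{\alpha k}\,(\eta_\delta)^{\frac{1}{\alpha k}-\frac{1}{k}}\,
\Gamma\!\left(\frac{1}{k}-\frac{1}{\alpha k},\, \eta_\delta\right). \tag{60}
\]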

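Next, we estimate the asymptotic lower bound for $\mathbb{P}[C^{(N)} > \delta N]$. By Lemma 2, for any $\varepsilon > 0$ and $N$ large, there exists $i_0$ such that for all $i \ge i_0$ inequality (9) holds, and, in conjunction with the monotonicity of the exponential function, we obtain

\[
\begin{aligned}
\mathbb{E}S(t) &= \sum_{i=1}^{N}\mathbb{P}[T_i < t] \ge \sum_{i=i_0}^{N}\mathbb{P}[T_i < t] \\
&\ge \sum_{i=i_0}^{N}\left(1 - e^{-\frac{q_i^k\beta^{k-1}(1-\varepsilon)}{(k-1)!}t} - 2e^{-h_\varepsilon q_i^{k-1}t}\right) \\
&\ge N - i_0 - \int_{i_0}^{N} e^{-\frac{h_N^k}{u^{\alpha k}}\frac{\beta^{k-1}(1-\varepsilon)}{(k-1)!}t}\,du
- 2\int_{i_0}^{N} e^{-h_\varepsilon\frac{h_N^{k-1}}{u^{\alpha(k-1)}}t}\,du.
\end{aligned}
\]

Then, after replacing the bound in (48) and letting $t$ increase in $N$ as $t = \eta N^k/\xi(\varepsilon)$, where, similarly as before, $\eta > 0$ is a fixed constant, we define

\[
\begin{aligned}
\mathbb{E}S\!\left(\frac{\eta N^k}{\xi(\varepsilon)}\right)
&\ge N - i_0 - \int_{i_0}^{N} e^{-\frac{N^{\alpha k}\eta}{u^{\alpha k}}}\,du
- 2\int_{i_0}^{N} e^{-h_\varepsilon\frac{N^{\alpha k-\alpha+1}}{u^{\alpha k-\alpha}}}\,du \\
&\triangleq N - i_0 - I_1 - I_2,
\end{aligned} \tag{61}
\]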

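with $\xi(\varepsilon)$ redefined as $\xi(\varepsilon) \triangleq (1-\varepsilon)^{k+1}\beta^{k-1}(1-\alpha)^k((k-1)!)^{-1}$.

First, we estimate the upper bound of $I_1$ for $N$ large. Using completely analogous steps to those applied in estimating expressions (49), (50) and (51), after using the change of variable method to solve the integral with $z = N^{\alpha k}\eta u^{-\alpha k}$, we obtain

\[
\begin{aligned}
I_1 &\le N\eta^{\frac{1}{\alpha k}}\,\frac{1}{\alpha k}\int_{\eta}^{(N/i_0)^{\alpha k}\eta} e^{-z}z^{-1-\frac{1}{\alpha k}}\,dz \\
&\le N\eta^{\frac{1}{\alpha k}}\,\frac{1}{\alpha k}\,\Gamma\!\left(-\frac{1}{\alpha k},\, \eta\right).
\end{aligned} \tag{62}
\]

Now, we redefine $\eta_\delta(\varepsilon)$ to be the unique solution to the equation $f(\eta) = (1+2\varepsilon)\delta$, for some $\varepsilon > 0$, where $f(\eta)$ was defined in (52). Then, after replacing $\eta_\delta(\varepsilon)$ in (62), we obtain

\[
I_1 \le N(\eta_\delta(\varepsilon))^{\frac{1}{\alpha k}}\,\frac{1}{\alpha k}\,\Gamma\!\left(-\frac{1}{\alpha k},\, \eta_\delta(\varepsilon)\right) = N(1 - \delta(1+2\varepsilon)). \tag{63}
\]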

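Next, we estimate the upper bound of $I_2$. Similarly as before, in estimating $I_3(\delta)$, we observe two possible cases: $k = 1$ and $k \ge 2$. In the case where $k = 1$, we obtain

\[
I_2 \le 2\int_{i_0}^{N} e^{-h_\varepsilon N}\,du = 2(N - i_0)e^{-h_\varepsilon N} = o(1) \quad \text{as } N \to \infty.
\]

Furthermore, in the case of $k \ge 2$ and large $N$,

\[
\begin{aligned}
I_2 &\le 2\int_{i_0}^{N} e^{-h_\varepsilon\frac{N^{\alpha k-\alpha+1}}{u^{\alpha(k-1)}}}\,du \\
&\le 2h_\varepsilon^{\frac{1}{\alpha(k-1)}} N^{1+\frac{1}{\alpha(k-1)}}\int_{h_\varepsilon N}^{\infty} e^{-z}\,dz \\
&\le 2h_\varepsilon^{\frac{1}{\alpha(k-1)}} N^{1+\frac{1}{\alpha(k-1)}} e^{-h_\varepsilon N} = o(1) \quad \text{as } N \to \infty,
\end{aligned} \tag{64}
\]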

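where in the first integral we use the change of variable $z = h_\varepsilon N^{\alpha k-\alpha+1}u^{-\alpha k+\alpha}$.

Finally, (64), (63) and (61) imply that for any $\varepsilon > 0$ and $N$ large enough

\[
\mathbb{E}S\!\left(\frac{\eta_\delta(\varepsilon)N^k}{\xi(\varepsilon)}\right) \ge N - i_0 - N(1 - \delta(1+2\varepsilon)) - \varepsilon,
\]

which for all $N \ge (i_0 + \varepsilon)/(\varepsilon\delta)$ yields

\[
\mathbb{E}S\!\left(\frac{\eta_\delta(\varepsilon)N^k}{\xi(\varepsilon)}\right) \ge (1+\varepsilon)\delta N. \tag{65}
\]

Now, since $S(t)$ is a.s. non-decreasing in $t$, we obtain that for all $N$ large and $t \ge \vartheta \triangleq \eta_\delta(\varepsilon)N^k(\xi(\varepsilon))^{-1}$,

\[
\mathbb{E}S(t) \ge (1+\varepsilon)\delta N. \tag{66}
\]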

At this point, using the previous observations, the monotonicity of $S(t)$ and (7), after conditioning on $T_i$ being greater than $\vartheta$, we obtain the lower bound

\[
\mathbb{P}[C^{(N)} > \delta N] \ge \mathbb{P}[S(\vartheta) > \delta N]\sum_{i=1}^{N} q_i\,\mathbb{P}[T_i \ge \vartheta]. \tag{67}
\]

Now, given the inequality (66), the large deviation (Chernoff) bound from Lemma 5 of the Appendix implies, for any $\varepsilon > 0$ and $N$ large enough,

\[
\mathbb{P}[S(\vartheta) > \delta N] \ge 1 - \varepsilon.
\]

Thus, the previous inequality and (67) yield

\[
\mathbb{P}[C^{(N)} > \delta N] \ge (1-\varepsilon)\sum_{i=1}^{N} q_i\,\mathbb{P}[T_i \ge \vartheta].
\]

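Now, by Lemma 3, for any $\varepsilon > 0$ and $i$ large ($i \ge i_0$), inequality (16) holds, and, therefore,

\[
\mathbb{P}[C^{(N)} > \delta N] \ge (1-\varepsilon)\sum_{i=i_0+1}^{N}\frac{h_N}{i^\alpha}\,
e^{-\frac{h_N^k\beta^{k-1}(1+\varepsilon)\vartheta}{i^{\alpha k}(k-1)!}},
\]

which, using the inequality $h_N/i^\alpha \ge (1-\varepsilon)h_N/u^\alpha$ for all $i$ large ($i \ge i_0$) and $u \in [i-1, i]$, the monotonicity of the exponential function and inequality (48), after replacing $\vartheta$, implies

\[
\mathbb{P}[C^{(N)} > \delta N] \ge (1-\varepsilon)^3\,\frac{1-\alpha}{N^{1-\alpha}}\int_{i_0}^{N}\frac{1}{u^\alpha}\,
e^{-\frac{\iota(\varepsilon)\eta_\delta(\varepsilon)N^{k\alpha}}{u^{\alpha k}}}\,du, \tag{68}
\]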

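where we redefine $\iota(\varepsilon) \triangleq (1+\varepsilon)^{k+1}(1-\varepsilon)^{-(k+1)}$. Next, similarly to bounding $I_2(\delta)$ in (57)–(58), using the change of variable $z = \iota(\varepsilon)\eta_\delta(\varepsilon)N^{\alpha k}u^{-\alpha k}$ to solve the integral in (68), we derive for $N$ large

\[
\mathbb{P}[C^{(N)} > \delta N] \ge (1-\varepsilon)^3 (\eta_\delta(\varepsilon))^{\frac{1}{\alpha k}-\frac{1}{k}}\,
\frac{1-\alpha}{\alpha k}\,(\iota(\varepsilon))^{\frac{1}{\alpha k}-\frac{1}{k}}
\int_{\iota(\varepsilon)\eta_\delta(\varepsilon)}^{(N/i_0)^{k\alpha}\eta_\delta(\varepsilon)\iota(\varepsilon)}
e^{-z}z^{\frac{1}{k}-\frac{1}{\alpha k}-1}\,dz,
\]

which, after taking $\liminf_{N\to\infty}$ and letting $\varepsilon \to 0$, renders

\[
\liminf_{N\to\infty}\mathbb{P}[C^{(N)} > \delta N] \ge \frac{1-\alpha}{\alpha k}\,(\eta_\delta)^{\frac{1}{\alpha k}-\frac{1}{k}}\,
\Gamma\!\left(\frac{1}{k}-\frac{1}{\alpha k},\, \eta_\delta\right),
\]

where $\eta_\delta$ is the unique solution to the equation $f(\eta) = \delta$. The previous expression in conjunction with (60) concludes the proof of this theorem.

Finally, we prove the uniqueness of the solution $\eta_\delta$ for any $0 < \delta < 1$ and the limiting values for $F_k(\delta)$ when $k \to \infty$, $\delta \to 0$ and $\delta \to 1$. Again, this part of the proof uses standard techniques from calculus and, therefore, we move it to the Appendix. □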

The following theorem estimates the tail of the search cost distribution $\mathbb{P}[C^{(N)} > \delta N]$ as $N \to \infty$ in the case of $\alpha = 1$. Since the proof of this result uses completely analogous arguments to the ones used in the proof of Theorem 2, in order to avoid repetitions we just state the result and present an outline of the proof.

Theorem 3 Assume that $q_i = h_N/i$, $1 \le i \le N$, where $h_N$ is the normalization constant. Then, for any $0 < \delta < 1$, as $N \to \infty$,

\[
(\log N)\,\mathbb{P}[C^{(N)} > \delta N] \sim F_k(\delta) \triangleq \frac{1}{k}\,\Gamma(0, \eta_\delta), \tag{69}
\]

where $\eta_\delta$ uniquely solves the equation

\[
1 - \frac{1}{k}\,\eta^{\frac{1}{k}}\,\Gamma\!\left(-\frac{1}{k},\, \eta\right) = \delta;
\]

note that $\Gamma(x, y)$, $y > 0$, is the incomplete Gamma function, i.e., $\Gamma(x, y) = \int_y^\infty e^{-t}t^{x-1}\,dt$. Furthermore, for any $0 < \delta < 1$,

\[
\lim_{k\to\infty} F_k(\delta) = \log\!\left(\frac{1}{\delta}\right). \tag{70}
\]

Remark 4 (i) Again, as in the previous theorems, (70) and $(\log N)\,\mathbb{P}[R^{(N)} > \delta N] \to \log(1/\delta)$ as $N \to \infty$ demonstrate the asymptotic near optimality of the PAC policy. (ii) A related result from Lemma 4.7 of [5], in the context of the ordinary MTF searching on the list ($k = 1$), shows the convergence in distribution of the ratio $\log C^{(N)}/\log N$ to a uniform random variable on a unit interval; this result corresponds to a sub-linear scaling $\mathbb{P}[C^{(N)} > N^u] \to 1 - u$ as $N \to \infty$, $0 < u < 1$. (iii) Similarly as in Remark 3 (iv), it is possible to relax the assumption $q_i = h_N/i$, $1 \le i \le N$. Again, by assuming that for any $\varepsilon > 0$ there exists $i_0$ such that for all $i_0 \le i \le N$ the inequality $(1-\varepsilon)c/\log N < q_i < (1+\varepsilon)c/\log N$ holds, the expression (71) needs to be replaced by the last inequality, implying an almost identical formula to the one in (69), where the only difference is that $1/k$ is replaced by $c/k$ on the right-hand side of (69).
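The limit (70) can be probed numerically. The sketch below is an illustration under stated assumptions, not the authors' procedure: it uses the substitution $t = \eta e^{w}$ to write $\Gamma(0,\eta) = \int_0^\infty e^{-\eta e^{w}}dw$ and $\frac{1}{k}\eta^{1/k}\Gamma(-\frac{1}{k},\eta) = \frac{1}{k}\int_0^\infty e^{-w/k-\eta e^{w}}dw$, evaluates both by Simpson's rule, and finds $\eta_\delta$ by bisection on a bracket chosen for moderate $k$. All function names are hypothetical.

```python
import math

def simpson(g, upper, n=2000):
    """Composite Simpson's rule for g on [0, upper]; n must be even."""
    h = upper / n
    total = g(0.0) + g(upper)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * g(j * h)
    return total * h / 3.0

def cutoff(eta):
    """Upper limit in w beyond which exp(-eta * e^w) is negligible."""
    return max(math.log(60.0 / eta), 0.0) + 1.0

def f_alpha_one(eta, k):
    """f(eta) = 1 - (1/k) * eta^(1/k) * Gamma(-1/k, eta), as in Theorem 3."""
    a = 1.0 / k
    return 1.0 - a * simpson(lambda w: math.exp(-a * w - eta * math.exp(w)), cutoff(eta))

def gamma_zero(eta):
    """Gamma(0, eta), i.e. the exponential integral of eta > 0."""
    return simpson(lambda w: math.exp(-eta * math.exp(w)), cutoff(eta))

def F_alpha_one(k, delta):
    """F_k(delta) = Gamma(0, eta_delta) / k from (69)."""
    lo, hi = math.log(1e-200), math.log(100.0)
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if f_alpha_one(math.exp(mid), k) < delta:
            lo = mid          # f is increasing in eta
        else:
            hi = mid
    return gamma_zero(math.exp(0.5 * (lo + hi))) / k

if __name__ == "__main__":
    for k in (1, 2, 3, 20):
        print(k, F_alpha_one(k, 0.1), math.log(10.0))
```

With $\delta = 1/10$ the computed values should decrease towards $\log 10$ as $k$ increases, which is the behaviour that (70) describes.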

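Outline of the proof: In order to estimate the upper bound, we use the same arguments as in inequality (46). Note that for any $\varepsilon > 0$ and $N$ large enough, the normalization constant $h_N$ is bounded by

\[
(1-\varepsilon)\frac{1}{\log N} < h_N < (1+\varepsilon)\frac{1}{\log N}. \tag{71}
\]

Thus, after bounding $\mathbb{E}S(t)$, similarly as in (47) and (49), and setting $t = \eta(N\log N)^k/\xi(\varepsilon)$, where $\eta > 0$ is a fixed constant, we obtain

\[
\mathbb{E}S\!\left(\frac{\eta(N\log N)^k}{\xi(\varepsilon)}\right) \le N - \int_{i_0}^{N} e^{-\frac{N^k\eta}{u^k}}\,du; \tag{72}
\]

note that in this case $\xi(\varepsilon)$ is defined as

\[
\xi(\varepsilon) \triangleq \frac{(1+\varepsilon)^{k+1}\beta^{k-1}}{(k-1)!}.
\]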

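Then, using the change of variable $z = N^k\eta u^{-k}$ in the integral in (72) and applying the analogous steps as in (50)–(53), we obtain that for all $t \le \vartheta \triangleq \eta_\delta(\varepsilon_0)(N\log N)^k(\xi(\varepsilon))^{-1}$, any $\varepsilon_0 > 0$, $N$ large and $\varepsilon > 0$ small enough ($\varepsilon < \varepsilon_0\delta(1-2\varepsilon_0)/(1 - \delta(1-2\varepsilon_0))$)

\[
\mathbb{E}S(t) \le (1-\varepsilon_0)\delta N,
\]

where $\eta_\delta(\varepsilon_0)$ is the unique solution to the equation

\[
f(\eta) \triangleq 1 - \frac{1}{k}\,\eta^{\frac{1}{k}}\,\Gamma\!\left(-\frac{1}{k},\, \eta\right) = \delta(1-2\varepsilon_0).
\]

Next, using the large deviation bound from Lemma 5 of the Appendix, similarly as in (54), we derive, as $N \to \infty$,

\[
\mathbb{P}[C^{(N)} > \delta N] \le o\!\left(\frac{1}{\log N}\right) + \sum_{i=1}^{N} q_i\,\mathbb{P}[T_i \ge \vartheta]. \tag{73}
\]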

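Now, after splitting the sum in the previous expression, analogously as in (55), we obtain, for $N$ and $i_0$ large enough,

\[
\begin{aligned}
\sum_{i=1}^{N} q_i\,\mathbb{P}[T_i \ge \vartheta]
&\le i_0\,\mathbb{P}[T_{i_0} \ge \vartheta]
+ \sum_{i=i_0}^{N} q_i\, e^{-\frac{q_i^k\beta^{k-1}(1-\varepsilon)}{(k-1)!}\vartheta}
+ 2\sum_{i=i_0}^{N} q_i\, e^{-h_\varepsilon q_i^{k-1}\vartheta} \\
&\triangleq I_1(\delta) + I_2(\delta) + I_3(\delta).
\end{aligned} \tag{74}
\]

Then, after replacing $\vartheta$ and using similar arguments that led to (56) and (59), we obtain

\[
I_1(\delta) = o\!\left(\frac{1}{\log N}\right), \qquad I_3(\delta) = o\!\left(\frac{1}{\log N}\right) \quad \text{as } N \to \infty. \tag{75}
\]

Now, by upper bounding the sum in $I_2(\delta)$ with an integral, as in (57), applying the change of variable $z = (1-\varepsilon)^{k+1}(1+\varepsilon)^{-(k+1)}\eta_\delta(\varepsilon_0)N^k u^{-k}$ and using the same arguments that led to (58), we conclude

\[
I_2(\delta) \lesssim \frac{1}{\log N}\,\frac{1}{k}\,\Gamma(0, \eta_\delta) \quad \text{as } N \to \infty,
\]

where $\eta_\delta$ uniquely solves the equation $f(\eta) = \delta$, which, in conjunction with (75), (74) and (73), yields the asymptotic upper bound for $\mathbb{P}[C^{(N)} > \delta N]$.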

In order to prove the asymptotic lower bound for $\mathbb{P}[C^{(N)} > \delta N]$ in (69), we start by estimating $\mathbb{E}S(t)$ using the identical arguments as in the proof of the lower bound in Theorem 2. Thus, by letting $t$ increase in $N$ as $t = \eta(N\log N)^k/\xi(\varepsilon)$, where $\eta > 0$ is a fixed constant, we define, similarly as in (61), for any $\varepsilon > 0$ and $i_0$, $N$ large enough,

\[
\begin{aligned}
\mathbb{E}S\!\left(\frac{\eta(N\log N)^k}{\xi(\varepsilon)}\right)
&\ge N - i_0 - \int_{i_0}^{N} e^{-\frac{N^k\eta}{u^k}}\,du - 2\int_{i_0}^{N} e^{-h_\varepsilon\frac{N^k\log N}{u^{k-1}}}\,du \\
&\triangleq N - i_0 - I_1 - I_2,
\end{aligned} \tag{76}
\]

where $\xi(\varepsilon)$ is redefined as $\xi(\varepsilon) \triangleq (1-\varepsilon)^{k+1}\beta^{k-1}((k-1)!)^{-1}$. Then, using the change of variable $z = N^k\eta u^{-k}$ for solving the integral of $I_1$ and the analogous arguments as in (63)–(66), for all $t \ge \vartheta \triangleq \eta_\delta(\varepsilon)(N\log N)^k(\xi(\varepsilon))^{-1}$, any $\varepsilon > 0$ and $N$ large, we obtain the inequality

\[
\mathbb{E}S(t) \ge (1+\varepsilon)\delta N,
\]

where $\eta_\delta(\varepsilon)$ is the unique solution to the equation $f(\eta) = (1+2\varepsilon)\delta$. Then, applying the same reasoning as in (67), in conjunction with the large deviation bound for the sum of independent Bernoulli random variables proved in Lemma 5 of the Appendix, we derive that for $N$ large

\[
\mathbb{P}[C^{(N)} > \delta N] \ge (1-\varepsilon)\sum_{i=1}^{N} q_i\,\mathbb{P}[T_i \ge \vartheta].
\]

Next, by Lemma 3 and the analogous arguments as in (68), after using the change of variable $z = (1+\varepsilon)^{k+1}(1-\varepsilon)^{-(k+1)}\eta_\delta(\varepsilon)N^k u^{-k}$ to solve the integral and letting $\varepsilon \to 0$, we conclude

\[
\mathbb{P}[C^{(N)} > \delta N] \gtrsim \frac{1}{\log N}\,\frac{1}{k}\,\Gamma(0, \eta_\delta) \quad \text{as } N \to \infty, \tag{77}
\]

where $\eta_\delta$ is the unique solution to the equation $f(\eta) = \delta$. Thus, (77) in conjunction with the asymptotic upper bound from before proves (69).

Finally, it is left to prove the uniqueness of the solution $\eta_\delta$ of the equation $f(\eta) = \delta$ and the limit in (70). We omit the details of this proof since these properties follow directly from similar arguments as in the proof of Theorem 2. □

4 Numerical experiments

In this section we illustrate our main results stated in Theorems 1, 2 and 3, using simulation experiments. Since the asymptotic results are obtained for an infinite number of documents $N$ in Theorem 1, while in Theorems 2 and 3 the number $N$ is passed to infinity, it can be expected that the asymptotic expressions give a reasonable approximation of the fault probability $\mathbb{P}[C^{(N)} > x]$ only if both $N$ and $x$ are large (with $N$ much larger than $x$). However, our experiments show that the obtained approximations work well for relatively small values of $N$ and almost all cache sizes $x < N$. Furthermore, our simulations validate the significant improvement in performance of the introduced PAC($\beta$, $k$), $k \ge 2$, algorithm when compared to the ordinary LRU scheme ($k = 1$), as predicted by our asymptotic results.

4.1 Convergence to stationarity

In order to ensure that the simulated values of the fault probabilities do not deviate significantly from the stationary ones, we first estimate the difference between the distributions of $C^{(N)}$ and $C_n^{(N)}$, where $C_n^{(N)}$ is the search cost after $n$ requests with arbitrary initial conditions. Thus, using (2)–(4) with $\varepsilon = 1/2$, we upper bound the difference between the tails of these distributions as

\[
\sup_x \left|\mathbb{P}[C_n^{(N)} > x] - \mathbb{P}[C^{(N)} > x]\right| \le e_n \triangleq
\mathbb{P}\!\left[\tau_n < \frac{n}{2}\right] + \sum_{i=1}^{N} q_i\,\mathbb{P}\!\left[T_i > \frac{n}{2} - \beta\right],
\]

where $\tau_n$ is the $n$th arrival point in a Poisson process of unit rate. Thus, by applying the bound (15) and setting $\varepsilon = 1/2$, we obtain

\[
e_n \le \mathbb{P}\!\left[\tau_n < \frac{n}{2}\right] + \sum_{i=1}^{N} q_i\left[e^{-p_i q_i\frac{3}{4}\left(\frac{n}{2}-\beta\right)} + (1-p_i)^{\frac{\frac{n}{2}-\beta}{4\beta}-1}\right]; \tag{78}
\]

recall that $p_i = \mathbb{P}[M_\beta^{(q_i)} \ge k-1]$, $1 \le i \le N$, where $M_t^{(q)}$ is a counting Poisson process of rate $q$. The first term in expression (78) is easy to estimate since $\mathbb{P}[\tau_n < n/2] = \mathbb{P}[M_{n/2}^{(1)} > n]$; in addition, since the Poisson distribution is highly concentrated around the mean, this term converges very fast to zero. Therefore, it is easy to see that the error bound in (78) is dominated by the sum. Furthermore, the value of the sum decreases as $\beta$ increases since $p_i = O(\beta^{k-1}q_i^{k-1})$. Hence, the increase of the parameter $\beta$ speeds up the convergence of the search cost process $\{C_n^{(N)}\}$ to stationarity. This makes the algorithm more adaptable to possible fluctuations in document popularities. On the other hand, a larger $\beta$ implies a larger expected size of the additional storage needed to keep track of the past requests. Thus, although the stationary performance of the PAC algorithm is invariant to $\beta$, this parameter provides an important design component whose choice has to balance between the algorithm complexity and adaptability.
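The dependence of $p_i$ on $\beta$ and $k$ is easy to tabulate. The following sketch evaluates $p = \mathbb{P}[M_\beta^{(q)} \ge k-1]$ as a Poisson tail with mean $q\beta$; the function name `promotion_probability` and the truncation after 80 terms are illustrative assumptions, not part of the paper.

```python
import math

def promotion_probability(q, beta, k, terms=80):
    """P[Poisson(q * beta) >= k - 1], summed term by term from j = k - 1 upward.

    Summing the tail directly avoids the cancellation in 1 - P[Poisson <= k - 2]
    when q * beta is very small.
    """
    lam, m = q * beta, k - 1
    if m == 0:
        return 1.0                      # k = 1 (ordinary LRU): the condition always holds
    term = math.exp(-lam) * lam ** m / math.factorial(m)   # P[Poisson = m]
    total = 0.0
    for j in range(m, m + terms):
        total += term
        term *= lam / (j + 1)           # P[Poisson = j + 1] from P[Poisson = j]
    return min(total, 1.0)

if __name__ == "__main__":
    for beta in (10, 20, 40):
        print(beta, [promotion_probability(1e-4, beta, k) for k in (1, 2, 3)])
```

For small $q\beta$ the result behaves like $(q\beta)^{k-1}/(k-1)!$, which is the order $O(\beta^{k-1}q_i^{k-1})$ quoted above, and it grows with $\beta$.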

4.2 Experiments

In the presented experiments we take the number of documents to be $N = 1300$ with popularities satisfying $q_i = h_N/i^\alpha$, $1 \le i \le 1300$, where $h_N = (\sum_{i=1}^{N} 1/i^\alpha)^{-1}$. Also, we select $\beta = 20$ and three values of $\alpha$: 1) $\alpha = 1.4$, 2) $\alpha = 1$ and 3) $\alpha = 0.8$. In each experiment, before we conduct measurements, we allow the time of the first $n = 10^{10}$ requests to be a warm-up time for the system to reach stationarity. Then, the actual measurement time is also set to be $10^{10}$ requests long. The initial permutation of the list is chosen uniformly at random and the initial set of requests in $(-\beta, 0)$ is taken to be empty. The fault probabilities are measured for cache sizes $x = 50j$, $1 \le j \le 15$. Simulation results are presented with “*” symbols in Figures 3, 4 and 5, while our approximations are presented with the solid lines in the same figures.
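A small-scale version of such an experiment can be sketched as follows. This is a hedged illustration rather than the simulator used for the figures: the replacement rule is an assumed reading of PAC($\beta$, $k$), in which a requested document is moved to the front of the list only if it was requested at least $k-1$ times during the preceding $\beta$ time units, and a fault is counted whenever the document is found beyond position $x$. The function name `simulate_pac`, the absence of a warm-up period and the parameter values are all assumptions.

```python
import random
from collections import deque

def simulate_pac(n_docs, alpha, cache_size, beta, k, n_requests, seed=0):
    """Estimate the fault probability of a PAC(beta, k)-style rule (assumed semantics)."""
    rng = random.Random(seed)
    weights = [i ** -alpha for i in range(1, n_docs + 1)]  # Zipf popularities, unnormalised
    order = list(range(n_docs))          # order[0] is the front of the list
    rng.shuffle(order)                   # random initial permutation
    recent = [deque() for _ in range(n_docs)]  # request times kept per document
    docs = rng.choices(range(n_docs), weights=weights, k=n_requests)
    now, faults = 0.0, 0
    for doc in docs:
        now += rng.expovariate(1.0)      # unit-rate Poisson request times
        window = recent[doc]
        while window and window[0] < now - beta:
            window.popleft()             # forget requests older than beta
        position = order.index(doc)      # 0-based; the search cost is position + 1
        if position >= cache_size:
            faults += 1
        if len(window) >= k - 1:         # enough earlier requests inside the window
            order.pop(position)
            order.insert(0, doc)         # move to front
        window.append(now)
    return faults / n_requests

if __name__ == "__main__":
    for k in (1, 2):
        print(k, simulate_pac(200, 1.4, 20, 20.0, k, 50_000))
```

Because the list evolves independently of the cache size, a single run could record the search cost of every request and report the fault frequencies for all cache sizes at once; the sketch recomputes per cache size only to keep the code short.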

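[Figure 3: Illustration for Experiment 1. Horizontal axis: cache size $x$, from 0 to 800; vertical axis: $\log_{10}$ of the fault probability, from −2 to −0.6; legend: $\log_{10}P^{(e)}(x)$, simulated $\log_{10}\mathbb{P}[C > x]$, $\log_{10}\mathbb{P}[R > x]$; curves for $k = 1$ and $k = 2$.]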

Experiment 1 We set $\alpha = 1.4$ and measure the cache fault probabilities of the PAC(20, $k$) algorithm for $k = 1, 2$. We compare the obtained measurements with our approximation given by $P^{(e)}(x) = K_k(\alpha)\mathbb{P}[R > x]$, as implied by Theorem 1. The experimental results for the cases when $k \ge 3$ are almost indistinguishable from the performance of the optimal algorithm, $\mathbb{P}[R > x]$, and for that reason we did not present them in Figure 3. After estimating $e_n$ in (78) for a given warm-up time of $10^{10}$ requests, we obtain that $e_n < 10^{-6}$ for both $k = 1, 2$, which is negligible when compared to the smallest measured probabilities ($> 10^{-2}$) and, therefore, the measured fault probabilities are essentially the stationary ones. Figure 3 shows an excellent agreement between the approximation $P^{(e)}(x)$ and the experimental results, as well as a significant improvement in performance for $k = 2$.

Experiment 2 Here, we select $\alpha = 1$ and measure the cache fault probabilities for $k = 1, 2, 3$. Since the normalization constant satisfies $(h_N)^{-1} = \log N + \gamma + o(1)$ as $N \to \infty$, where $\gamma$ is Euler's constant, the product $h_N\log N$ converges slowly to one and, therefore, instead of using the approximation $\mathbb{P}[C^{(N)} > x] \approx F_k(x/N)/\log N$, as suggested by Theorem 3, we define $P^{(e)}(x) = h_N F_k(x/N)$. Furthermore, after computing $e_n$ for $n = 10^{10}$, we obtain that for $k = 1, 2$ the error satisfies $e_n < 10^{-6}$, while for $k = 3$ it satisfies $e_n < 0.004$, which is insignificant when compared to the smallest values of the measured probabilities and, thus, the measured process is basically in stationarity. Again, the accuracy of the approximation $P^{(e)}(x)$ and the improvement in performance are apparent from Figure 4.

[Figure 4: Illustration for Experiment 2. Horizontal axis: cache size $x$, from 0 to 800; vertical axis: $\log_{10}$ of the fault probability, from −1.2 to −0.2; legend: $\log_{10}P^{(e)}(x)$, simulated $\log_{10}\mathbb{P}[C > x]$, $\log_{10}\mathbb{P}[R > x]$; curves for $k = 1$, $k = 2$ and $k = 3$.]

Experiment 3 Finally, the third example assumes $\alpha = 0.8$ and considers the cases $k = 1, 2, 3$. Similarly as in the case of $\alpha = 1$, due to the slow convergence of $h_N N^{1-\alpha}/(1-\alpha)$ to one as $N \to \infty$, we use the estimate $P^{(e)}(x) = h_N(N^{1-\alpha}/(1-\alpha))F_k(x/N)$ instead of $F_k(x/N)$, which can be inferred from Theorem 2. After estimating the error $e_n$, we obtain that for all $k = 1, 2, 3$ it satisfies $e_n < 10^{-6}$, which again is much smaller than the smallest measured value of the fault probability and hence the process is effectively in stationarity. Once again, the validity of the approximation $P^{(e)}(x)$ and the benefit of the PAC algorithm are evident from Figure 5.

[Figure 5: Illustration for Experiment 3. Horizontal axis: $\delta$, from 0 to 0.7; vertical axis: $\log_{10}$ of the fault probability, from −0.9 to −0.1; legend: $\log_{10}P^{(e)}(\delta N)$, simulated $\log_{10}\mathbb{P}[C > \delta N]$, $\log_{10}\mathbb{P}[R > \delta N]$; curves for $k = 1$, $k = 2$ and $k = 3$.]

5 Concluding remarks

In this paper we propose a new LRU-based PAC($\beta$, $k$) replacement rule that possesses all of the desirable properties of the LRU policy, such as low complexity, ease of implementation, and adaptability to variability in access patterns. In the case of the independent reference model, we show that the performance (fault probability) of the PAC policy, for large cache sizes, is very close to that of the optimal frequency algorithm even for small values of $k = 2, 3$, which implies negligible additional complexity. The mathematical model considers a request process satisfying a generalized Zipf's law popularity distribution and arriving at Poisson time points. Our analytical approach uses probabilistic (average-case) analysis that exploits the novel large deviation technique introduced recently in [12]. Theoretical results also show that $\beta$ does not influence the asymptotic performance, but, given the observations in Subsection 4.1, it is an important design parameter representing the tradeoff between the faster adaptability (larger $\beta$) and lower complexity (smaller $\beta$). In addition, the theoretical results are further validated using simulations that show a significant improvement of the PAC algorithm in comparison to the ordinary LRU scheme, even for small values of cache sizes and the total number of documents. The excellent agreement between the analytical and experimental results and the simplicity of the PAC algorithm itself imply a very high potential of the proposed policy for practical purposes.

Given the analytic approach established in our recent work on the analysis of the LRU policy in the presence of dependent requests [12] and variable page sizes [11], it can be easily shown that analogous results hold for the PAC algorithm as well. In the context of the ordinary MTF with lighter tailed request distributions, the asymptotic fluid limits of the search cost derived in [9] could be analogously extended to the PAC policy. Finally, we would like to note that our algorithm relates to the earlier proposed “k-in-a-row” rule [13, 8]. This rule was stated in the context of the expected list search cost. However, its analytical performance in the caching framework remains unknown.

Acknowledgments

We thank Dr. Tracy Kimbrel and Dr. Petar Momcilovic for pointing out the work on LRU-K and “k-in-a-row” heuristics, respectively.

Appendix

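The following lemmas correspond to Lemma 2 and Lemma 4 from [9] and [12], respectively.

Lemma 4 Let $S(t) = \sum_{i=1}^{\infty} B_i(t)$ and assume $q_i \sim c/i^\alpha$ as $i \to \infty$, with $\alpha > 1$ and $c > 0$. Then, as $t \to \infty$,

\[
m(t) \triangleq \mathbb{E}S(t) \sim \Gamma\!\left(1 - \frac{1}{\alpha}\right)c^{\frac{1}{\alpha}}\,t^{\frac{1}{\alpha}}.
\]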
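The asymptotics in Lemma 4 can be checked numerically. The sketch below assumes, as in the standard move-to-front analysis, that $B_i(t)$ is a Bernoulli variable with success probability $1 - e^{-q_i t}$ (this definition is not restated in the Appendix, so it is an assumption here), takes $q_i = c/i^\alpha$ exactly, and truncates the infinite sum; the function names are hypothetical.

```python
import math

def expected_count(t, alpha, c=1.0, terms=200_000):
    """sum_i (1 - exp(-q_i * t)) with q_i = c / i^alpha, truncated after `terms` summands."""
    return sum(-math.expm1(-c * t / i ** alpha) for i in range(1, terms + 1))

def lemma4_limit(t, alpha, c=1.0):
    """Gamma(1 - 1/alpha) * c^(1/alpha) * t^(1/alpha), the right-hand side of Lemma 4."""
    return math.gamma(1.0 - 1.0 / alpha) * (c * t) ** (1.0 / alpha)

if __name__ == "__main__":
    for t in (1e2, 1e3, 1e4):
        print(t, expected_count(t, 2.0) / lemma4_limit(t, 2.0))
```

The printed ratio should approach one as $t$ grows; the remaining gap comes mainly from replacing the sum by an integral and from the truncation of the series.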

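Lemma 5 Let $\{B_i, 1 \le i \le N\}$, $N \le \infty$, be a sequence of independent Bernoulli random variables, $S = \sum_{i=1}^{N} B_i$ and $m = \mathbb{E}[S]$. Then for any $\varepsilon > 0$, there exists $\theta_\varepsilon > 0$, such that

\[
\mathbb{P}[|S - m| > m\varepsilon] \le 2e^{-\theta_\varepsilon m}.
\]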

Proof of the properties of $K_k(\alpha)$ from Theorem 1: Since the function on the right hand side of (24) is well-defined for all real $k \ge 1$, we will assume that $K_k(\alpha)$ is defined for all real values of $k \ge 1$ and $\alpha > 1$ as well. Thus, proving that $K_k(\alpha)$ is monotonic over real $k$ will in particular imply the monotonicity over integer values.

First, we prove the monotonicity in $\alpha$ for all fixed $k$. Since the PAC($\beta$, 1) algorithm is the same as the ordinary LRU, the monotonicity of $K_1(\alpha)$ follows from Theorem 3 of [9]. Thus, we continue with the proof of monotonicity for values of $k \ge 2$. Proving that $K_k(\alpha)$ is monotonically increasing in $\alpha$ is equivalent to showing that $\log K_k(\alpha)$ monotonically increases in $\alpha$. Thus, we observe the following function

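\[
\begin{aligned}
\log K_k(\alpha) &= \log\frac{\alpha-1}{\alpha k} + (\alpha-1)\log\Gamma\!\left(1 - \frac{1}{\alpha k}\right) + \log\Gamma\!\left(\frac{1}{k} - \frac{1}{\alpha k}\right) \\
&= \log\Gamma\!\left(1 + \frac{1}{k} - \frac{1}{\alpha k}\right) + (\alpha-1)\log\Gamma\!\left(2 - \frac{1}{\alpha k}\right) - (\alpha-1)\log\!\left(1 - \frac{1}{\alpha k}\right).
\end{aligned} \tag{79}
\]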

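The function $\log K_k(\alpha)$ monotonically increases in $\alpha$ if its derivative with respect to $\alpha$ is positive for all $\alpha > 1$. Thus,

\[
\begin{aligned}
\frac{d}{d\alpha}\bigl(\log K_k(\alpha)\bigr) &= \log\Gamma\!\left(2 - \frac{1}{\alpha k}\right) - \log\!\left(1 - \frac{1}{\alpha k}\right) - \frac{\alpha-1}{\alpha}\,\frac{1}{\alpha k - 1} \\
&\quad + \frac{\alpha-1}{\alpha^2 k}\,\Psi^{(0)}\!\left(2 - \frac{1}{\alpha k}\right) + \frac{1}{\alpha^2 k}\,\Psi^{(0)}\!\left(1 + \frac{1}{k} - \frac{1}{\alpha k}\right),
\end{aligned} \tag{80}
\]

where $\Psi^{(k)}$, $k = 0, 1, \ldots$, are Polygamma functions (see equation 6.4.1, p. 260 of [1]). Furthermore, since $\Psi^{(0)}(1) = -\gamma$ (Euler's constant) and $\Gamma(1) = 1$, by using continuity and passing $\alpha \to \infty$ in (80), we conclude

\[
\frac{d}{d\alpha}\log K_k(\alpha) \to 0 \quad \text{as } \alpha \to \infty. \tag{81}
\]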

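Therefore, to show that $d/d\alpha\,\log K_k(\alpha) \ge 0$ for all $\alpha > 1$, it is enough to prove that the second derivative

\[
\begin{aligned}
\frac{d^2}{d\alpha^2}\log K_k(\alpha) &= \frac{1}{\alpha^4 k^2}\left[\Psi^{(1)}\!\left(1 + \frac{1}{k} - \frac{1}{\alpha k}\right) + (\alpha-1)\Psi^{(1)}\!\left(2 - \frac{1}{\alpha k}\right)\right] \\
&\quad + \frac{2}{\alpha^3 k}\left[\Psi^{(0)}\!\left(2 - \frac{1}{\alpha k}\right) - \Psi^{(0)}\!\left(1 + \frac{1}{k} - \frac{1}{\alpha k}\right)\right]
\end{aligned} \tag{82}
\]

is non-positive for all $\alpha \in (1, \infty)$. Now, equation 6.4.1 on page 260 of [1] implies that $\Psi^{(0)}(z)$ is monotonically increasing ($\Psi^{(1)}(z) \ge 0$) and concave ($\Psi^{(2)}(z) \le 0$) and, similarly, $\Psi^{(1)}(z)$ is monotonically decreasing ($\Psi^{(2)}(z) \le 0$) and convex ($\Psi^{(3)}(z) \ge 0$) for all $z > 0$. Thus, since for any $\alpha > 1$, $k \ge 1$, the arguments of the functions $\Psi^{(0)}$ and $\Psi^{(1)}$, i.e., $2 - 1/(\alpha k)$ and $1 + 1/k - 1/(\alpha k)$, belong to the interval $[1, 2]$, and given the values $\Psi^{(0)}(1) = -\gamma$, $\Psi^{(0)}(2) = -\gamma + 1$, $\Psi^{(1)}(1) = \pi^2/6$, $\Psi^{(1)}(2) = \pi^2/6 - 1$, we derive the following linear bounds for all $1 \le z \le 2$:

\[
\Psi^{(0)}(z) \ge -\gamma - 1 + z \quad \text{and} \quad \Psi^{(1)}(z) \le \frac{\pi^2}{6} + 1 - z. \tag{83}
\]

Next, by first upper bounding $\Psi^{(0)}(2 - 1/(\alpha k))$ with $1 - \gamma$, and then using the inequalities from (83) in (82), one obtains after some easy algebra

\[
\frac{d^2}{d\alpha^2}\log K_k(\alpha) \le \frac{18 + (-18 - 24k + \pi^2)\alpha - 2k(-18 + \pi^2)\alpha^2 + k^2(-12 + \pi^2)\alpha^3}{6k^2\alpha^4(\alpha k - 1)^2}. \tag{84}
\]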

Thus, it is left to prove that for any $\alpha > 1$ and $k \ge 2$, the expression on the right hand side of (84) is less than or equal to zero. In that respect, we analyze the numerator of (84)

\[
f(\alpha, k) \triangleq 18 + (-18 - 24k + \pi^2)\alpha - 2k(-18 + \pi^2)\alpha^2 + k^2(-12 + \pi^2)\alpha^3.
\]

First, we show that the function $f(\alpha, k)$ is decreasing in $k$ for all $k \ge 2$ and any fixed $\alpha > 1$. Thus, by taking the derivative of $f$ with respect to $k$, we obtain

\[
\frac{df}{dk} = \alpha\left[2k\alpha^2(-12 + \pi^2) - 2\alpha(-18 + \pi^2) - 24\right] \le \alpha\left[2\alpha^2(-24 + 2\pi^2) - 2\alpha(-18 + \pi^2) - 24\right] < -8\alpha < -8,
\]

since the maximum of the quadratic term in the previous expression is less than $-8$. Therefore, the function $f(\alpha, k)$ decreases in $k$. Since $k \ge 2$ and $\alpha > 1$, we can upper bound its value by $f(\alpha, 2)$, i.e.,

\[
\begin{aligned}
f(\alpha, k) &\le 18(1-\alpha) + (-48 + \pi^2)\alpha - 4(-18 + \pi^2)\alpha^2 + 4(-12 + \pi^2)\alpha^3 \\
&\le \alpha\left[4\alpha^2(-12 + \pi^2) + \alpha(72 - 4\pi^2) - 48 + \pi^2\right] < -7\alpha < -7,
\end{aligned}
\]

since in this case the quadratic term in the second inequality is less than $-7$. Therefore, we obtained that $f(\alpha, k) \le 0$ for all $k \ge 2$, $\alpha > 1$, and, thus, from (84) it follows that $d^2/d\alpha^2\,\log K_k(\alpha) \le 0$. This concludes the proof of monotonicity of the function $K_k(\alpha)$ in $\alpha$. Finally, the second limit in (25) follows by straightforward application of equation 6.1.33, p. 256 of [1]. Next, the first limit in (25) in the case of $k = 1$ follows from Theorem 3 of [9]. Otherwise, for $k \ge 2$, the limit follows directly from expression (24) after replacing $\alpha = 1$.

In order to prove the monotonicity of the function $K_k(\alpha)$ in $k$, we observe the first derivative with respect to $k$ of the function $\log K_k(\alpha)$ and obtain

\[
\frac{d}{dk}\log K_k(\alpha) = \left(1 - \frac{1}{\alpha}\right)\frac{1}{k^2}\left[\Psi^{(0)}\!\left(1 - \frac{1}{\alpha k}\right) - \Psi^{(0)}\!\left(1 + \frac{1}{k} - \frac{1}{\alpha k}\right)\right] < 0, \tag{85}
\]

where the last inequality follows from the monotonicity of the function $\Psi^{(0)}(z)$ as discussed before. Thus, $K_k(\alpha)$ is monotonically decreasing in $k$ for all real $k \ge 1$. In particular, $K_k(\alpha)$ is decreasing for integer values of $k \ge 1$. Finally, the monotonicity of $K_k(\alpha)$ and (25) imply (26), i.e.,

\[
1 = K_k(1) \le K_k(\alpha) \le K_k(\infty) \to 1 \quad \text{as } k \to \infty,
\]

which concludes the proof of the theorem. □
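The monotonicity properties just proved can be observed numerically. The sketch below evaluates $K_k(\alpha)$ through the first line of (79), i.e. $K_k(\alpha) = \frac{\alpha-1}{\alpha k}\,\Gamma(1-\frac{1}{\alpha k})^{\alpha-1}\,\Gamma(\frac{1}{k}-\frac{1}{\alpha k})$; the function name `K` is a hypothetical helper, and the limiting value $e^{\gamma}$ for $k = 1$, $\alpha \to \infty$ is what this expression yields rather than a statement quoted from (25).

```python
import math

def K(alpha, k):
    """K_k(alpha) read off from the first line of (79); requires alpha > 1 and k >= 1."""
    x = 1.0 / (alpha * k)
    log_value = (math.log((alpha - 1.0) * x)
                 + (alpha - 1.0) * math.lgamma(1.0 - x)
                 + math.lgamma(1.0 / k - x))
    return math.exp(log_value)

if __name__ == "__main__":
    for alpha in (1.2, 1.4, 3.0):
        print(alpha, [K(alpha, k) for k in (1, 2, 3)])
```

Working with `math.lgamma` keeps the evaluation stable when $\alpha$ is very close to one or very large, where the individual Gamma factors become extreme while their product stays of order one.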

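The completion of the proof of Theorem 2: First, observe the function $f(\eta)$ as defined in (52). After expressing the function $\Gamma(-1/(\alpha k), \eta)$ in integral form and applying the integration by parts, we obtain

\[
\begin{aligned}
f(\eta) &= 1 - \frac{1}{\alpha k}\,\eta^{\frac{1}{\alpha k}}\int_{\eta}^{\infty} e^{-t}t^{-\frac{1}{\alpha k}-1}\,dt \\
&= 1 - e^{-\eta} + \eta^{\frac{1}{\alpha k}}\int_{\eta}^{\infty} e^{-t}t^{-\frac{1}{\alpha k}}\,dt.
\end{aligned} \tag{86}
\]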
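The integration by parts behind (86) can be spelled out as follows, writing $a = 1/(\alpha k)$ for brevity (the shorthand $a$ is introduced here only for this step):

```latex
\begin{aligned}
\int_{\eta}^{\infty} e^{-t}\,t^{-a-1}\,dt
  &= \left[-\frac{t^{-a}}{a}\,e^{-t}\right]_{\eta}^{\infty}
     - \frac{1}{a}\int_{\eta}^{\infty} e^{-t}\,t^{-a}\,dt
   = \frac{\eta^{-a}e^{-\eta}}{a} - \frac{1}{a}\int_{\eta}^{\infty} e^{-t}\,t^{-a}\,dt, \\
a\,\eta^{a}\int_{\eta}^{\infty} e^{-t}\,t^{-a-1}\,dt
  &= e^{-\eta} - \eta^{a}\int_{\eta}^{\infty} e^{-t}\,t^{-a}\,dt .
\end{aligned}
```

Subtracting the second line from one gives the second expression in (86).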

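Now, since for any $\eta > 0$

\[
\frac{d}{d\eta}f(\eta) = \frac{1}{\alpha k}\,\eta^{\frac{1}{\alpha k}-1}\int_{\eta}^{\infty} e^{-t}t^{-\frac{1}{\alpha k}}\,dt > 0,
\]

we conclude that $f(\eta)$ is monotonically increasing in $\eta > 0$. Now, note that from (86), it is straightforward to conclude $f(\eta) > 0$ for any $\eta > 0$. Furthermore, since $f(\eta)$ is a continuous function in $\eta \ge 0$ and

\[
f(\eta) \le 1 - e^{-\eta} + \eta^{\frac{1}{\alpha k}}\,\Gamma\!\left(1 - \frac{1}{\alpha k}\right) \to 0 \quad \text{as } \eta \to 0,
\]

we conclude $\lim_{\eta\to 0} f(\eta) = 0$. Next, note that for $\eta \ge 1$, we can upper bound $f(\eta)$ as

\[
f(\eta) = 1 - e^{-\eta} + \eta^{\frac{1}{\alpha k}}\int_{\eta}^{\infty} e^{-t}t^{-\frac{1}{\alpha k}}\,dt \le 1 - e^{-\eta} + \eta^{\frac{1}{\alpha k}}\,\eta^{-\frac{1}{\alpha k}}e^{-\eta} = 1,
\]

and, therefore,

\[
1 \ge f(\eta) \ge 1 - e^{-\eta} \to 1 \quad \text{as } \eta \to \infty.
\]

Thus, $f(\eta)$ is a strictly increasing continuous function for $\eta > 0$ with $\lim_{\eta\to 0} f(\eta) = 0$ and $\lim_{\eta\to\infty} f(\eta) = 1$. This implies that for any $0 < \delta < 1$, there is a unique solution $\eta(\delta) > 0$ of the equation $f(\eta) = \delta$.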

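Next, we prove the limiting value of the asymptotic expression in (44) when $k \to \infty$. Note that since

$$
\Gamma\!\left(1 - \frac{1}{\alpha k}\right) - \left(1 - \frac{1}{\alpha k}\right)^{-1} \eta^{1 - \frac{1}{\alpha k}} = \Gamma\!\left(1 - \frac{1}{\alpha k}\right) - \int_{0}^{\eta} t^{-\frac{1}{\alpha k}}\, dt \le \int_{\eta}^{\infty} e^{-t} t^{-\frac{1}{\alpha k}}\, dt \le \Gamma\!\left(1 - \frac{1}{\alpha k}\right), \tag{87}
$$

then, using the continuity of $\Gamma(x)$ at $x = 1$, for any $\varepsilon > 0$ there exist $\eta_0$ and $k_0$ such that for all $k \ge k_0$, $0 < \eta \le \eta_0$,

$$
1 - \varepsilon \le \int_{\eta}^{\infty} e^{-t} t^{-\frac{1}{\alpha k}}\, dt \le 1 + \varepsilon. \tag{88}
$$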

Thus, for any $k \ge k_0$ and $\eta \le \eta_0$

$$
f(\eta) \ge f_1(\eta) \triangleq \eta^{\frac{1}{\alpha k}} (1 - \varepsilon). \tag{89}
$$

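Now, using (89) and the fact that the functions $f(\eta)$, $f_1(\eta)$ are monotonically increasing, if the solution $\eta^*_{k,\delta}$ of the equation $f_1(\eta) = \delta$ satisfies $\eta^*_{k,\delta} \le \eta_0$, it follows that the unique solution $\eta_{k,\delta}$ of the equation $f(\eta) = \delta$ must satisfy

$$
\eta_{k,\delta} \le \eta^*_{k,\delta}. \tag{90}
$$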

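Then, from (89), it follows

$$
\eta_{k,\delta} \le \left(\frac{\delta}{1 - \varepsilon}\right)^{\alpha k},
$$

and, therefore, if $\varepsilon < 1 - \delta$, there exists $k_0$ (possibly greater than before) such that for all $k \ge k_0$ the inequality $\eta^*_{k,\delta} \le \eta_0$ holds, which in conjunction with (90) yields

$$
\lim_{k \to \infty} \eta_{k,\delta} = 0. \tag{91}
$$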

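Next, it is not hard to check that

$$
\begin{aligned}
\frac{d}{d\eta_{k,\delta}} F_k(\delta) &= \left(\frac{1 - \alpha}{\alpha k}\right)^{2} \eta_{k,\delta}^{\frac{1}{\alpha k} - \frac{1}{k} - 1} \int_{\eta_{k,\delta}}^{\infty} e^{-t} t^{\frac{1}{k} - \frac{1}{\alpha k} - 1}\, dt - \frac{1 - \alpha}{\alpha k}\, e^{-\eta_{k,\delta}}\, \eta_{k,\delta}^{-1} \\
&\le \left(\frac{1 - \alpha}{\alpha k}\right)^{2} \eta_{k,\delta}^{\frac{1}{\alpha k} - \frac{1}{k} - 1}\, e^{-\eta_{k,\delta}} \int_{\eta_{k,\delta}}^{\infty} t^{\frac{1}{k} - \frac{1}{\alpha k} - 1}\, dt - \frac{1 - \alpha}{\alpha k}\, e^{-\eta_{k,\delta}}\, \eta_{k,\delta}^{-1} = 0,
\end{aligned}
$$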

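and, therefore, for all $k$ large, after using (90) and replacing $\eta_{k,\delta}$ with $\eta^*_{k,\delta}$ in expression (44), applying integration by parts and similar arguments as in (87), we lower bound $F_k(\delta)$ as

$$
\begin{aligned}
F_k(\delta) &\ge \frac{1 - \alpha}{\alpha k}\, (\eta^*_{k,\delta})^{\frac{1}{\alpha k} - \frac{1}{k}} \int_{\eta^*_{k,\delta}}^{\infty} e^{-t} t^{\frac{1}{k} - \frac{1}{\alpha k} - 1}\, dt \\
&= e^{-\eta^*_{k,\delta}} - (\eta^*_{k,\delta})^{\frac{1}{\alpha k} - \frac{1}{k}}\, \Gamma\!\left(1 + \frac{1}{k} - \frac{1}{\alpha k},\, \eta^*_{k,\delta}\right) \to 1 - \left(\frac{\delta}{1 - \varepsilon}\right)^{1 - \alpha} \quad \text{as } k \to \infty,
\end{aligned} \tag{92}
$$

which, after letting $\varepsilon \to 0$, implies

$$
\lim_{k \to \infty} F_k(\delta) \ge 1 - \delta^{1 - \alpha}. \tag{93}
$$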

Next, similarly as before, using the arguments that led to (88) and the limit in (91), for any $\varepsilon > 0$ there exists $k_0 > 0$ such that for all $k \ge k_0$ the inequality $\eta_{k,\delta} \le \eta_0$ holds and, furthermore, we can upper bound $f(\eta)$ for all $\eta \le \eta_0$ as

$$
f(\eta) \le f_2(\eta) \triangleq \varepsilon + (1 + \varepsilon)\, \eta^{\frac{1}{\alpha k}}.
$$

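Then, using analogous reasoning as before, for all $k \ge k_0$, the solution $\eta^*_{k,\delta} = ((\delta - \varepsilon)/(1 + \varepsilon))^{\alpha k}$ to the equation $f_2(\eta) = \delta$ must be smaller than $\eta_{k,\delta}$. Thus, using the monotonicity of $F_k(\delta)$ in $\eta_\delta$ and similar arguments as in (92), after replacing $\eta_\delta$ with $\eta^*_{k,\delta}$ in (44), we obtain

$$
\lim_{k \to \infty} F_k(\delta) \le 1 - \left(\frac{\delta - \varepsilon}{1 + \varepsilon}\right)^{1 - \alpha},
$$

which, after letting $\varepsilon \to 0$, in conjunction with (93), yields (45). □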
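The limit (45) can be checked numerically. The sketch below rests on an assumption that should be stated plainly: it takes $F_k(\delta)$ in (44) to be the expression on the first line of (92) with $\eta^*_{k,\delta}$ replaced by the exact root $\eta_{k,\delta}$, rewritten by integration by parts as $e^{-\eta} - \eta^{c}\, \Gamma(1 - c, \eta)$ with $c = (1-\alpha)/(\alpha k)$. It further assumes $0 < \alpha < 1$ and $\alpha k > 1$. All function names, the bracketing interval and the parameter values are hypothetical and chosen only for illustration.

```python
import math

def upper_gamma(s, x, tol=1e-17, max_terms=10_000):
    """Upper incomplete gamma Gamma(s, x) for s > 0, via Gamma(s) minus
    the power series of the lower incomplete gamma."""
    if x == 0.0:
        return math.gamma(s)
    term = 1.0 / s
    total = term
    for n in range(1, max_terms):
        term *= x / (s + n)
        total += term
        if term < tol * total:
            break
    return math.gamma(s) - x**s * math.exp(-x) * total

def f(eta, alpha, k):
    """Second form of (86); this sketch assumes alpha * k > 1."""
    a = 1.0 / (alpha * k)
    return -math.expm1(-eta) + eta**a * upper_gamma(1.0 - a, eta)

def solve_eta(delta, alpha, k, lo=1e-300, hi=50.0, iters=200):
    """Root eta_{k,delta} of f(eta) = delta, by bisection on log(eta)."""
    if not f(lo, alpha, k) < delta < f(hi, alpha, k):
        raise ValueError("root is not bracketed by [lo, hi]")
    log_lo, log_hi = math.log(lo), math.log(hi)
    for _ in range(iters):
        mid = 0.5 * (log_lo + log_hi)
        if f(math.exp(mid), alpha, k) < delta:
            log_lo = mid
        else:
            log_hi = mid
    return math.exp(0.5 * (log_lo + log_hi))

def F_k(delta, alpha, k):
    """Assumed form of F_k(delta): e^{-eta} - eta^c * Gamma(1 - c, eta),
    with c = (1 - alpha) / (alpha * k) and eta = eta_{k,delta}."""
    eta = solve_eta(delta, alpha, k)
    c = (1.0 - alpha) / (alpha * k)
    return math.exp(-eta) - eta**c * upper_gamma(1.0 - c, eta)

alpha, delta = 0.8, 0.5  # illustrative values with 0 < alpha < 1
limit = 1.0 - delta ** (1.0 - alpha)
for k in (2, 5, 10, 20, 50, 100):
    value = F_k(delta, alpha, k)
    print(f"k={k:3d}  eta={solve_eta(delta, alpha, k):.3e}  "
          f"F_k={value:.6f}  |F_k - limit|={abs(value - limit):.2e}")
print(f"limit 1 - delta^(1 - alpha) = {limit:.6f}")
```

The printed roots $\eta_{k,\delta}$ shrink rapidly with $k$, in line with (91), while $F_k(\delta)$ approaches $1 - \delta^{1-\alpha}$.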

References

[1] M. Abramowitz and I. A. Stegun, editors. Handbook of Mathematical Functions. Dover Publications, Inc., New York, 1974.

[2] O. Bahat and A. M. Makowski. Optimal replacement policies for non-uniform cache objects with optional eviction. In Proceedings of Infocom 2003, San Francisco, California, USA, April 2003.

[3] W. A. Borodin, S. Irani, P. Raghavan, and B. Schieber. Competitive paging with locality of reference. Journal of Computer and System Science, 50(2):244–258, 1995.

[4] E. Cinlar. Introduction to Stochastic Processes. Prentice–Hall, 1975.

[5] J. A. Fill. Limits and rate of convergence for the distribution of search cost under the move-to-front rule. Theoretical Computer Science, 164:185–206, 1996.

[6] J. A. Fill and L. Holst. On the distribution of search cost for the move-to-front rule. Random Structures and Algorithms, 8(3):179, 1996.

[7] P. Flajolet, D. Gardy, and L. Thimonier. Birthday paradox, coupon collector, caching algorithms and self-organizing search. Discrete Applied Mathematics, 39:207–229, 1992.

[8] G. H. Gonnet, J. I. Munro, and H. Suwanda. Exegesis of self-organizing linear search. SIAM J. Comput., 10(3):613–637, 1981.

[9] P. R. Jelenković. Asymptotic approximation of the move-to-front search cost distribution and least-recently-used caching fault probabilities. Annals of Applied Probability, 9(2):430–464, 1999.

[10] P. R. Jelenković. Least-Recently-Used Caching with Zipf’s Law Requests. In The Sixth INFORMS Telecommunications Conference, Boca Raton, Florida, 2002.

[11] P. R. Jelenković and A. Radovanović. Optimizing LRU for variable document sizes. Combinatorics, Probability & Computing, 13:1–17, 2004.

[12] P. R. Jelenković and A. Radovanović. Least-recently-used caching with dependent requests. Theoretical Computer Science, 2004, in press.

[13] Y. C. Kan and S. M. Ross. Optimal list order under partial memory constraints. Journal of Applied Probability, 17:1004–1015, 1980.

[14] Tracy Kimbrel. Online paging and file caching with expiration times. Theoretical Computer Science, 268:119–131, 2001.

[15] Elizabeth J. O’Neil, Patrick E. O’Neil, and Gerhard Weikum. An optimality proof of the LRU-K page replacement algorithm. Journal of the ACM, 46:92–112, 1999.
