Practical Bounds on Optimal Caching with Variable Object Sizes
DANIEL S. BERGER, Carnegie Mellon University
NATHAN BECKMANN, Carnegie Mellon University
MOR HARCHOL-BALTER, Carnegie Mellon University
Many recent caching systems aim to improve miss ratios, but there is no good sense among practitioners
of how much further miss ratios can be improved. In other words, should the systems community continue
working on this problem?
Currently, there is no principled answer to this question. In practice, object sizes often vary by several orders of magnitude, in which case computing the optimal miss ratio (OPT) is known to be NP-hard. The few known
results on caching with variable object sizes provide very weak bounds and are impractical to compute on
traces of realistic length.
We propose a new method to compute upper and lower bounds on OPT. Our key insight is to represent
caching as a min-cost flow problem, hence we call our method the flow-based offline optimal (FOO). We prove
that, under simple independence assumptions, FOO’s bounds become tight as the number of objects goes to
infinity. Indeed, FOO’s error over 10M requests of production CDN and storage traces is negligible: at most
0.3%. FOO thus reveals, for the first time, the limits of caching with variable object sizes.
While FOO is very accurate, it is computationally impractical on traces with hundreds of millions of requests.
We therefore extend FOO to obtain more efficient bounds on OPT, which we call the practical flow-based offline optimal (PFOO). We evaluate PFOO on several full production traces and use it to compare OPT to prior online
policies. This analysis shows that current caching systems are in fact still far from optimal, suffering 11–43%
more cache misses than OPT, whereas the best prior offline bounds suggest that there is essentially no room
for improvement.
CCS Concepts: • Theory of computation → Caching and paging algorithms; Approximation algorithms analysis; • General and reference → Metrics; Performance;
Additional Key Words and Phrases: Caching, optimal caching, offline optimum, limits of caching
ACM Reference Format:
Daniel S. Berger, Nathan Beckmann, and Mor Harchol-Balter. 2018. Practical Bounds on Optimal Caching
with Variable Object Sizes. Proc. ACM Meas. Anal. Comput. Syst. 2, 2, Article 32 (June 2018), 37 pages. https://doi.org/10.1145/3224427
1 INTRODUCTION
Caches are pervasive in computer systems, and their miss ratio often determines end-to-end
application performance. For example, content distribution networks (CDNs) depend on large,
geo-distributed caching networks to serve user requests from nearby datacenters, and their response
time degrades dramatically when the requested content is not cached nearby. Consequently, there
has been a renewed focus on improving cache miss ratios, particularly for web services and content
Authors’ addresses: Daniel S. Berger, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA, 15213, dsberger@cs.
cmu.edu; Nathan Beckmann, Carnegie Mellon University, [email protected]; Mor Harchol-Balter, Carnegie Mellon
University, [email protected].
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the
full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored.
Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires
prior specific permission and/or a fee. Request permissions from [email protected].
© 2018 Copyright held by the owner/author(s). Publication rights licensed to ACM.
2476-1249/2018/6-ART32 $15.00
https://doi.org/10.1145/3224427
Proc. ACM Meas. Anal. Comput. Syst., Vol. 2, No. 2, Article 32. Publication date: June 2018.
delivery [16, 17, 24, 34, 42, 48, 60, 62, 68]. These systems have demonstrated significant miss ratio
improvements over least-recently used (LRU) caching, the de facto policy in most production
systems. But how much further can miss ratios improve? Should the systems community continue working on this problem, or have all achievable gains been exhausted?
1.1 The problem: Finding the offline optimal
To answer these questions, one would like to know the best achievable miss ratio, free of constraints—
i.e., the offline optimal (OPT). Unfortunately, very little is known about OPT with variable object
sizes. For objects with equal sizes, computing OPT is simple (i.e., Belady [13, 64]), and it is widely
used in the systems community to bound miss ratios. But object sizes often vary widely in practice,
from a few bytes (e.g., metadata [69]) to several gigabytes (e.g., videos [16, 48]). We need a way to
compute OPT for variable object sizes, but unfortunately this is known to be NP-hard [23].
There has been little work on bounding OPT with variable object sizes, and all of this work gives
very weak bounds. On the theory side, prior work gives only three approximation algorithms [4,
7, 50], and the best approximation is only provably within a factor of 4 of OPT. Hence, when this
algorithm estimates a miss ratio of 0.4, OPT may lie anywhere between 0.1 and 0.4. This is a big range—in practice, a difference of 0.05 in miss ratio is significant—so bounds from prior theory are
of limited practical value. From a practical perspective, there is an even more serious problem with
the theoretical bounds: they are simply too expensive to compute. The best approximation takes
24 hours to process 500 K requests and scales poorly (Section 2), while production traces typically
contain hundreds of millions of requests.
Since the theoretical bounds are incomputable, practitioners have been forced to use conservative
lower bounds or pessimistic upper bounds on OPT. The only prior lower bound is an infinitely
large cache [1, 24, 48], which is very conservative and gives no sense of how OPT changes at
different cache sizes. Belady variants (e.g., Belady-Size in Section 2.4) are widely used as an upper
bound [46, 48, 61, 77], despite offering no guarantees of optimality. While these offline bounds are
easy to compute, we will show that they are in fact far from OPT. They have thus given practitioners
a false sense of complacency, since existing online algorithms often achieve similar miss ratios to
these weak offline upper bounds.
1.2 Our approach: Flow-based offline optimal
We propose a new approach to compute bounds on OPT with variable object sizes, which we
call the flow-based offline optimal (FOO). The key insight behind FOO is to represent caching as a
min-cost flow problem. This formulation yields a lower bound on OPT by allowing non-integer
decisions, i.e., letting the cache retain fractions of objects for a proportionally smaller reward. It also
yields an upper bound on OPT by ignoring all non-integer decisions. Under simple independence
assumptions, we prove that the non-integer decisions become negligible as the number of objects
goes to infinity, and thus the bounds are asymptotically tight.
Our proof is based on the observation that an optimal policy will strictly prefer some requests
over others, forcing integer decisions. We show that such preferences apply to almost all requests by relating them to the well-known coupon collector problem.
Indeed, FOO’s error over 10M requests of five production CDN and web application traces is
negligible: at most 0.15%. Even on storage traces, which have highly correlated requests that violate
our proof’s independence assumptions, FOO’s error remains below 0.3%. FOO thus reveals, for the
first time, the limits of caching with variable object sizes.
While FOO is very accurate, it is too computationally expensive to apply directly to production
traces containing hundreds of millions of requests. To extend our analysis to such traces, we develop
more efficient upper and lower bounds on OPT, which we call practical flow-based offline optimal
(PFOO). PFOO enables the first analysis of optimal caching on traces with hundreds of millions of
requests, and reveals that there is still significant room to improve current caching systems.
1.3 Summary of results
PFOO yields nearly tight bounds, as shown in Figure 1. This figure plots the miss ratio obtained on
a production CDN trace with over 400 million requests for several techniques: online algorithms
(LRU, GDSF [22], and AdaptSize [16]), prior offline upper bounds (Belady and Belady-Size), the
only prior offline lower bound (Infinite-Cap), and our new upper and lower bounds on OPT (PFOO).
The theoretical bounds, including both prior work and FOO, are too slow to run on traces this long.
[Figure 1: bar chart of miss ratios for online policies (LRU, AdaptSize, GDSF), prior offline bounds (Belady, Belady-Size, Infinite-Cap), and PFOO's upper (U) and lower (L) bounds.]
Fig. 1. Miss ratios on a production CDN trace for a 4 GB cache. Prior to our work, the best prior upper bound on OPT is within 1% of online algorithms on this trace, leading to the false impression that there is no room for improvement. The only prior lower bound on OPT (Infinite-Cap) is 60% lower than the best upper bound on OPT (Belady-Size). By contrast, PFOO provides nearly tight upper and lower bounds, which are 22% below the online algorithms. PFOO thus shows that there is actually significant room for improving current caching algorithms.
Figure 1 illustrates the two problems with prior practical bounds and how PFOO improves upon
them. First, prior bounds on OPT are very weak. The miss ratio of Infinite-Cap is 60% lower than
Belady-Size, whereas the gap between PFOO’s upper and lower bounds is less than 1%. Second,
prior bounds on OPT are misleading. Comparing GDSF and Belady-Size, which are also within 1%,
one would conclude that online policies are nearly optimal. In contrast, PFOO gives a much better
upper bound and shows that there is in fact a 22% gap between state-of-the-art online policies and
OPT.
We evaluate FOO and PFOO extensively on eight production traces with hundreds of millions
of requests across cache capacities from 16MB to 64GB. On CDN traces from two large Internet
companies, PFOO bounds OPT within 1.9% relative error at most and 1.4% on average. On web
application traces from two other large Internet companies, PFOO bounds OPT within 8.6% rela-
tive error at most and 6.1% on average. On storage traces from Microsoft [79], where our proof
assumptions do not hold, PFOO still bounds OPT within 11% relative error at most and 5.7% on
average.
PFOO thus gives nearly tight bounds on the offline optimal miss ratio for a wide range of real workloads. We find that PFOO achieves on average 19% lower miss ratios than Belady, Belady-Size,
and other obvious offline upper bounds, demonstrating the value of our min-cost flow formulation.
1.4 Contributions
In summary, this paper contributes the following:
• We present flow-based offline optimal (FOO), a novel formulation of offline caching as a min-cost
flow problem (Sections 3 and 4). FOO exposes the underlying problem structure and enables our
remaining contributions.
• We prove that FOO is asymptotically exact under simple independence assumptions, giving the
first tight, polynomial-time bound on OPT with variable object sizes (Section 5).
• We present practical flow-based offline optimal (PFOO), which relaxes FOO to give fast and nearly
tight bounds on real traces with hundreds of millions of requests (Section 6). PFOO gives the
first reasonably tight lower bound on OPT on long traces.
• We perform the first extensive evaluation of OPT with variable object sizes on eight production
traces (Sections 7 and 8). In contrast to prior offline bounds, PFOO reveals that the gap between
online policies and OPT is much larger than previously thought: 27% on average, and up to 43%
on web application traces.
Our implementations of FOO, PFOO, and of the previously unimplemented approximation algorithms are publicly available.1
2 BACKGROUND AND MOTIVATION
Little is known about how to efficiently compute OPT with variable object sizes. On the theory
side, the best known approximation algorithms give very weak bounds and are too expensive to
compute. On the practical side, system builders use offline heuristics that are much cheaper to
compute, but give no guarantee that they are close to OPT. This section surveys known theoretical
results on OPT and the offline heuristics used in practice.
2.1 Offline bounds are more robust than online bounds
Figure 1 showed a 22% gap between the best online policies and OPT. One might wonder whether this gap arises from comparing an offline policy with an online policy, rather than from any weakness
in the online policies. In other words, offline policies have an advantage because they know the
future. Perhaps this advantage explains the gap.
This raises the question of whether we could instead compute the online optimal miss ratio.
Such a bound would be practically useful, since all systems necessarily use online policies. Our
answer is that we would like to, but unfortunately this is impossible. To bound the online optimal
miss ratio, one must make assumptions about what information an online policy can use to make
caching decisions. For example, traditional policies such as LRU and LFU make decisions using
the history of past requests. These policies are conceptually simple enough that prior work has
proven online bounds on their performance in a variety of settings [2, 6, 36, 49, 72]. However, real
systems have quirks and odd behaviors that are hard to capture in tractable assumptions. With
enough effort, system designers can exploit these quirks to build online policies that outperform the online bounds and approach OPT's miss ratio for specific workloads. For example, Hawkeye [51]
recently demonstrated that a seemingly irrelevant request feature in computer architecture (the
program counter) is in fact tightly correlated with OPT’s decisions, allowing Hawkeye to mimic
OPT on many programs. Online bounds are thus inherently fragile because real systems behave
in strange ways, and it is impossible to bound the ingenuity of system designers to discover and
exploit these behaviors.
We instead focus on the offline optimal to get robust bounds on cache performance. Offline
bounds are widely used by practitioners, especially when objects have the same size and OPT
is simple to compute [13, 64]. However, computing OPT with variable object sizes is strongly
NP-complete [23], meaning that no fully polynomial-time approximation scheme (FPTAS) can
exist.2 We are therefore unlikely to find tight bounds on OPT for arbitrary traces. Instead, our
approach is to develop a technique that can estimate OPT on any trace, which we call FOO, and
then show that FOO yields tight bounds on traces seen in practice. To do this, our approach is
two-fold. First, we prove that FOO gives tight bounds on traces that obey the independent reference
1 https://github.com/dasebe/optimalwebcaching
2 That no FPTAS can exist follows from Corollary 8.6 in [83], as OPT meets the assumptions of Theorem 8.5.
model (IRM), the most common assumptions used in prior analysis of online policies [2, 14, 15,
20, 25, 26, 29, 31, 37, 38, 40–43, 45, 52–54, 58, 63, 65, 72, 73, 75, 76, 80, 82, 86]. The IRM assumption
is only needed for the proof; FOO is designed to work on any trace and does not itself rely on
any properties of the IRM. Second, we show empirically that FOO’s error is less than 0.3% on
real-world traces, including several that grossly violate the IRM. Together, we believe these results
demonstrate that FOO is both theoretically well-founded and practically useful.
2.2 OPT with variable object sizes is hard
Since OPT is simple to compute for equal object sizes [13, 64], it is surprising that OPT becomes
NP-hard when object sizes vary. Caching may seem similar to Bin-Packing or Knapsack, which
can be approximated well when there are only a few object sizes and costs. But caching is quite
different because the order of requests constrains OPT’s choices in ways that are not captured by
these problems or their variants. In fact, the NP-hardness proof in [23] is quite involved and reduces
from Vertex Cover, not Knapsack. Furthermore, OPT remains strongly NP-complete even with just
three object sizes [39], and heuristics that work well on Knapsack perform badly in caching (see
“Freq/Size” in Section 2.4 and Section 8).
2.3 Prior bounds on OPT with variable object sizes are impractical
Prior work gives only three polynomial-time bounds on OPT [4, 7, 50], which vary in time complexity
and approximation guarantee. Table 1 summarizes these bounds by comparing their asymptotic run-
time, how many requests can be calculated in practice (e.g., within 24 hrs), and their approximation
guarantee.
Technique         Time          Requests / 24 hrs   Approximation
OPT               NP-hard [23]  <1 K                1
LP rounding [4]   Ω(N^5.6)      50 K                O(log (max_i s_i / min_i s_i))
LocalRatio [7]    O(N^3)        500 K               4
OFMA [50]         O(N^2)        28 M                O(log C)
FOO^a             O(N^(3/2))    28 M                1
PFOO^b            O(N log N)    250 M               ≈1.06

Notation: N is the trace length, C is the cache capacity, and s_i is the size of object i.
a FOO's approximation guarantee holds under independence assumptions.
b PFOO does not have an approximation guarantee, but its upper and lower bounds are within 6% on average on production traces.

Table 1. Comparison of FOO and PFOO to prior bounds on OPT with variable object sizes. Computing OPT is NP-hard. Prior bounds [4, 7, 50] provide only weak approximation guarantees, whereas FOO's bounds are tight. PFOO performs well empirically and can be calculated for hundreds of millions of requests.
Albers et al. [4] propose an LP relaxation of OPT and a rounding scheme. Unfortunately, the LP requires N^2 variables, which leads to a high Ω(N^5.6) time complexity [59]. Not only is this running time high, but the approximation factor is logarithmic in the ratio of largest to smallest object size (e.g., around 30 on production traces), making this approach impractical.
Bar-Noy et al. [7] propose a general approximation framework (which we call LocalRatio), which can
be applied to the offline caching problem. This algorithm gives the best known approximation
guarantee, a factor of 4. Unfortunately, this is still a weak guarantee, as we saw in Section 1.
Additionally, LocalRatio is a purely theoretical algorithm, with a high running time of O(N 3), and
which we believe had not been implemented prior to our work. Our implementation of LocalRatio
can calculate up to 500 K requests in 24 hrs, which is only a small fraction of the length of production
traces.
Irani proposes the OFMA approximation algorithm [50], which has O(N 2) running time. This
running time is small enough for our implementation of OFMA to run on small traces (Section 8).
Unfortunately, OFMA achieves a weak approximation guarantee, logarithmic in the cache capacity C, and in fact OFMA does badly on our traces, giving much weaker bounds than simple Belady-inspired heuristics.
Hence, prior work that considers adversarial assumptions yields only weak approximation guarantees. We therefore turn to stochastic assumptions to obtain tight bounds on the kinds of traces actually seen in practice. Under independence assumptions, FOO achieves a tight approximation guarantee on OPT, unlike prior approximation algorithms, and also has asymptotically better runtime, specifically O(N^(3/2)).
We are not aware of any prior stochastic analysis of offline optimal caching. In the context of
online caching policies, there is an extensive body of work using stochastic assumptions similar
to ours [12, 14, 15, 20, 25, 26, 29, 31, 37, 38, 40–43, 45, 52–54, 58, 63, 65, 72, 73, 75, 76, 80, 82, 86],
of which five prove optimality results [2, 6, 36, 49, 72]. Unfortunately, these results are only for
objects with equal sizes, and these policies perform poorly in our experiments because they do not
account for object size.
2.4 Heuristics used in practice to bound OPT give weak bounds
Since the running times of prior approximation algorithms are too high for production traces,
practitioners have been forced to rely on heuristics that can be calculated more quickly. However,
these heuristics only give upper bounds on OPT, and there is no guarantee on how close to OPT
they are.
The simplest offline upper bound is Belady’s algorithm, which evicts the object whose next use lies
furthest in the future.3 Belady is optimal in caching variants with equal object sizes [13, 44, 55, 64].
Even though it has no approximation guarantees for variable object sizes, it is still widely used
in the systems community [46, 48, 61, 77]. However, as we saw in Figure 1, Belady performs very
badly with variable object sizes and is easily outperformed by state-of-the-art online policies.
A straightforward size-aware extension of Belady is to evict the object with the highest cost =
object size × next-use distance. We call this variant Belady-Size. Among practitioners, Belady-Size
is widely believed to perform near-optimally, but it has no guarantees. It falls short on simple
examples: e.g., imagine that A is 4MB and is referenced 10 requests hence and never referenced
again, and B is 5MB and is referenced 9 and 12 requests hence. With 5MB of cache space, the best
choice between these objects is to keep B, getting two hits. But A has cost = 4 × 10 = 40, and B has
cost = 5 × 9 = 45, so Belady-Size keeps A and gets only one hit. In practice, Belady-Size often leads
to poor decisions when a large object is referenced twice in short succession: Belady-Size will evict
many other useful objects to make space for it, sacrificing many future hits to gain just one.
Alternatively, one could use Knapsack heuristics as size-aware offline upper bounds, such as the
density-ordered Knapsack heuristic, which is known to perform well on Knapsack in practice [33].
We call this heuristic Freq/Size, as Freq/Size evicts the object with the lowest utility = frequency /
size, where frequency is the number of requests to the object. Unfortunately, Freq/Size also falls
short on simple examples: e.g., imagine that A is 2MB and is referenced 10 requests hence, and B is (as before) 5MB and is referenced 9 and 12 requests hence. With 5MB of cache space, the best
3 Belady’s MIN algorithm actually operates somewhat differently. The algorithm commonly called “Belady” was invented (and proved to be optimal) by Mattson [64]. For a longer discussion, see [66].
choice between these objects is to keep B, getting two hits. But A has utility = 1 ÷ 2 = 0.5, and B has
utility = 2 ÷ 5 = 0.4, so Freq/Size keeps A and gets only one hit. In practice, Freq/Size often leads to
poor decisions when an object is referenced in bursts: Freq/Size will retain such objects long after
the burst has passed, wasting space that could have earned hits from other objects.
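Both counterexamples reduce to one-line arithmetic; this small sketch (object parameters copied from the text above) simply replays them:

```python
# Belady-Size: evict the highest cost = object size (MB) * next-use distance.
cost_A = 4 * 10   # A: 4MB, next referenced 10 requests hence
cost_B = 5 * 9    # B: 5MB, next referenced 9 requests hence
assert cost_B > cost_A    # Belady-Size evicts B, keeping A: one hit, not two

# Freq/Size: evict the lowest utility = frequency / size.
util_A = 1 / 2    # A: 2MB, referenced once more
util_B = 2 / 5    # B: 5MB, referenced twice more
assert util_A > util_B    # Freq/Size keeps A over B: again only one hit
```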
Though these heuristics are easy to compute and intuitive, they give no approximation guarantees.
We will show that they are in fact far from OPT on real traces, and PFOO is a much better bound.
3 FLOW-BASED OFFLINE OPTIMAL
This section gives a conceptual roadmap for our proof of FOO’s optimality, which we present
formally in Sections 4 and 5. Throughout this section we use a small request trace shown in Figure 2
as a running example. This trace contains four objects, a, b, c, and d, with sizes 3, 1, 1, and 2,
respectively.
Object a b c b d a c d a b b a
Size 3 1 1 1 2 3 1 2 3 1 1 3
Fig. 2. Example trace of requests to objects a, b, c, and d, of sizes 3, 1, 1, and 2, respectively.
First, we introduce a new integer linear program to represent OPT (Section 3.1). After relaxing
integrality constraints, we derive FOO’smin-cost flow representation, which can be solved efficiently
(Section 3.2). We then observe how FOO yields tight upper and lower bounds on OPT (Section 3.3).
To prove that FOO’s bounds are tight on real-world traces, we relate the gap between FOO’s upper
and lower bounds to the occurrence of a partial order on intervals, and then reduce the partial
order’s occurrence to an instance of the generalized coupon collector problem (Section 3.4).
3.1 Our new interval representation of OPT
We start by introducing a novel representation of OPT. Our integer linear program (ILP) minimizes
the number of cache misses, while having full knowledge of the request trace.
We exploit a unique property of offline optimal caching: OPT never changes its decision to
cache object k in between two requests to k (see Section 4). This naturally leads to an interval
representation of OPT as shown in Figure 3. While the classical representation of OPT uses decision
variables to track the state of every object at every time step [4], our ILP only keeps track of
interval-level decisions. Specifically, we use decision variables xi to indicate whether OPT caches
the object requested at time i , or not.
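In our notation (not the paper's formal statement, which appears in Section 4), write s_i for the size of the object requested at time i and I(t) for the set of intervals spanning time t. The interval ILP then minimizes misses subject to the cache capacity C holding at every time:

```latex
\min_{x} \;\; \sum_{i} (1 - x_i)
\quad \text{subject to} \quad
\sum_{i \in I(t)} s_i \, x_i \le C \;\; \forall t,
\qquad x_i \in \{0, 1\} \;\; \forall i .
```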
[Figure 3: the trace from Figure 2 annotated with interval decision variables x1–x8, one per interval between consecutive requests to the same object.]
Fig. 3. Interval ILP representation of OPT.
3.2 FOO’s min-cost flow representation
This interval representation leads naturally to FOO’s flow-based representation, shown in Figure 4. We use the min-cost flow notation shown in Table 2. The key idea is to use flow to represent the
interval decision variables. Each request is represented by a node. Each object’s first request is a
source of flow equal to the object’s size, and its last request is a sink of flow in the same amount.
This flow must be routed along intervening edges, and hence min-cost flow must decide whether
to cache the object throughout the trace.
cap   Capacity of edge (i, j), i.e., maximum flow along this edge.
cost  Cost per unit flow on edge (i, j).
βi    Flow surplus at node i if βi > 0; flow demand if βi < 0.

Table 2. Notation for FOO’s min-cost flow. Edges are labeled with their (capacity, cost), and nodes are labeled with flow −demand or +surplus.
[Figure 4: the min-cost flow graph for the Figure 2 trace. Request nodes are chained by central edges labeled (3, 0); outer edges carry labels such as (3, 1/3) for a, (1, 1) for b and c, and (2, 1/2) for d, with sources +3, +1, +1, +2 at first requests and matching sinks at last requests.]
Fig. 4. FOO’s min-cost flow problem for the short trace in Figure 2, using the notation from Table 2. Nodes represent requests, and cost measures cache misses. Requests are connected by central edges with capacity equal to the cache capacity and cost zero—flow routed along this path represents cached objects (hits). Outer edges connect requests to the same object, with capacity equal to the object’s size—flow routed along this path represents misses. The first request for each object is a source of flow equal to the object’s size, and the last request is a sink of flow of the same amount. Outer edges’ costs are inversely proportional to object size so that they cost 1 miss when an entire object is not cached. The min-cost flow achieves the fewest misses.
For cached objects, there is a central path of black edges connecting all requests. These edges
have capacity equal to the cache capacity and cost zero (since cached objects lead to zero misses).
Min-cost flow will thus route as much flow as possible through this central path to avoid costly
misses elsewhere [3].
To represent cache misses, FOO adds outer edges between subsequent requests to the same
object. For example, there are three edges along the top of Figure 4 connecting the requests to a.
These edges have capacity equal to the object’s size s and cost inversely proportional to the object’s size, 1/s. Hence, if an object of size s is not cached (i.e., its flow s is routed along this outer edge), it will incur a cost of s × (1/s) = 1 miss.
The routing of flow through this graph implies which objects are cached and when. When no
flow is routed along an outer edge, this implies that the object is cached and the subsequent request
is a hit. All other requests, i.e., those with any flow routed along an outer edge, are misses. The
min-cost flow gives the decisions that minimize total misses.
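To make the construction concrete, here is a self-contained sketch that builds FOO's graph for the Figure 2 trace and solves it exactly. This is not the paper's implementation: the successive-shortest-paths solver is a textbook routine we supply for illustration, and outer-edge costs are scaled by the LCM of the object sizes so they stay integral:

```python
import math
from collections import deque

class MinCostFlow:
    """Textbook successive-shortest-paths min-cost flow (SPFA search)."""
    def __init__(self, n):
        self.n = n
        self.adj = [[] for _ in range(n)]  # edge: [to, cap, cost, rev]

    def add_edge(self, u, v, cap, cost):
        self.adj[u].append([v, cap, cost, len(self.adj[v])])
        self.adj[v].append([u, 0, -cost, len(self.adj[u]) - 1])

    def solve(self, s, t):
        total = 0
        while True:
            dist = [math.inf] * self.n
            prev = [None] * self.n   # (node, edge index) on cheapest path
            dist[s] = 0
            queue, in_q = deque([s]), [False] * self.n
            while queue:
                u = queue.popleft()
                in_q[u] = False
                for idx, (v, cap, cost, _) in enumerate(self.adj[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, idx)
                        if not in_q[v]:
                            in_q[v] = True
                            queue.append(v)
            if dist[t] == math.inf:
                return total          # all supply routed
            bottleneck, v = math.inf, t
            while v != s:             # find augmenting amount
                u, idx = prev[v]
                bottleneck = min(bottleneck, self.adj[u][idx][1])
                v = u
            v = t
            while v != s:             # push flow along the path
                u, idx = prev[v]
                self.adj[u][idx][1] -= bottleneck
                self.adj[v][self.adj[u][idx][3]][1] += bottleneck
                v = u
            total += bottleneck * dist[t]

def foo_lower_bound(trace, sizes, cache_size):
    # FOO's graph: central edges carry cached bytes (cost 0); outer edges
    # between consecutive requests to an object carry misses (cost 1/size
    # per unit). Assumes every object is requested at least twice.
    n = len(trace)
    scale = math.lcm(*sizes.values())      # make 1/size costs integral
    source, sink = n, n + 1
    mcf = MinCostFlow(n + 2)
    for i in range(n - 1):
        mcf.add_edge(i, i + 1, cache_size, 0)
    last = {}
    for i, obj in enumerate(trace):
        if obj in last:
            mcf.add_edge(last[obj], i, sizes[obj], scale // sizes[obj])
        last[obj] = i
    first = {}
    for i, obj in enumerate(trace):
        first.setdefault(obj, i)
    for obj, i in first.items():
        mcf.add_edge(source, i, sizes[obj], 0)        # object enters
        mcf.add_edge(last[obj], sink, sizes[obj], 0)  # object leaves
    return mcf.solve(source, sink) / scale  # FOO-L fractional misses

trace = list("abcbdacdabba")               # the Figure 2 trace
sizes = {"a": 3, "b": 1, "c": 1, "d": 2}
print(foo_lower_bound(trace, sizes, cache_size=3))
```

As sanity checks, an unbounded cache yields 0 misses and a zero-capacity cache yields one miss per non-compulsory request; with C = 3 the bound lies strictly between these extremes.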
3.3 FOO yields upper and lower bounds on OPT
FOO can deviate from OPT as there is no guarantee that an object’s flow will be entirely routed
along its outer edge. Thus, FOO allows the cache to keep fractions of objects, accounting for only a
fractional miss on the next request to that object. In a real system, each fractional miss would be a
full miss. This error is the price FOO pays for making the offline optimal computable.
To deal with fractional (non-integer) solutions, we consider two variants of FOO. FOO-L keeps
all non-integer solutions and is therefore a lower bound on OPT. FOO-U considers all non-integer
decisions as uncached, “rounding up” flow along outer edges, and is therefore an upper bound on
OPT. We will prove this in Section 4.
Object          a  b  c  b  d  a  c  d  a  b  b  a
OPT decision    ✗  ✓  ✓  ✓  ✗  ✗  ✓  ✗  ✗  ✓  ✗  ✗
FOO-L decision  0  1  1  1  ½  0  1  ½  0  1  0  0
FOO-U decision  0  1  1  1  0  0  1  0  0  1  0  0

Fig. 5. Caching decisions made by OPT, FOO-L, and FOO-U with a cache capacity of C = 3.
Figure 5 shows the caching decisions made by OPT, FOO-L, and FOO-U assuming a cache of size 3. A “✓” indicates that OPT caches the object until its next request, and a “✗” indicates it is not cached. OPT suffers five misses on this trace by caching object b and either c or d. OPT caches b
because it is referenced thrice and is small. This leaves space to cache the two references to either
c or d, but not both. (OPT in Figure 5 chooses to cache c since it requires less space.) OPT does not
cache a because it takes the full cache, forcing misses on all other requests.
The solutions found by FOO-L are very similar to OPT. FOO-L decides to cache objects b and c,
matching OPT, and also caches half of d. FOO-L thus underestimates the misses by one, counting
d’s misses fractionally. FOO-U gives an upper bound for OPT by counting d’s misses fully. In this
example, FOO-U matches OPT exactly.
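The rounding step is mechanical once the fractional interval decisions are known. A small sketch (the decision vector below is hypothetical, loosely modeled on Figure 5, where only d's interval is fractional):

```python
def foo_bounds(decisions):
    # decisions[i] in [0, 1]: fraction of interval i that FOO-L keeps cached.
    foo_l = sum(1 - x for x in decisions)       # count fractional misses
    foo_u = sum(1 for x in decisions if x < 1)  # any fraction -> full miss
    return foo_l, foo_u

lower, upper = foo_bounds([0, 1, 1, 1, 0.5, 0, 0, 1])
print(lower, upper)   # FOO-L <= OPT <= FOO-U
```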
3.4 Overview of our proof of FOO’s optimality
We show both theoretically and empirically that FOO-U and FOO-L yield tight bounds. Specifically,
we prove that FOO-L’s solutions are almost always integer when there are many objects (as in
production traces). Thus, FOO-U and FOO-L coincide with OPT.
Our proof is based on a natural precedence relation between intervals such that an optimal policy
strictly prefers some intervals over others. For example, in Figure 2, FOO will always prefer x2 over x1 and x8 over x7. This can be seen in the figure, as interval x2 fits entirely within x1, and likewise x8 fits within x7. In contrast, no such precedence relation exists between x6 and x4 because a is larger than b, and so x6 does not fit within x4. Similarly, no precedence relation exists between x2 and x5 because, although x5 is longer and larger, their intervals do not overlap, and so x2 does not fit within x5.
This precedence relation means that if FOO caches any part of x1, then it must have cached all of x2. Likewise, if FOO caches any part of x7, then it must have cached all of x8. The precedence relation thus forces integer solutions in FOO. Although this relation is sparse in the small trace
from Figure 2, as one scales up the number of objects the precedence relation becomes dense. Our
challenge is to prove that this holds on traces seen in practice.
At the highest level, our proof distinguishes between “typical” and “atypical” objects. Atypical
objects are those that are exceptionally unpopular or exceptionally large; typical objects are
everything else. While the precedence relation may not hold for atypical objects, intervals from
atypical objects are rare enough that they can be safely ignored. We then show that for all the
typical objects, the precedence relation is dense. In fact, one only needs to consider precedence
relations among cached objects, as all other intervals have zero decision variables. The basic intuition
behind our proof is that a popular cached object almost always takes precedence over another
object. Specifically, it will take precedence over one of the exceptionally large objects, since the only
Proc. ACM Meas. Anal. Comput. Syst., Vol. 2, No. 2, Article 32. Publication date: June 2018.
32:10 Daniel S. Berger, Nathan Beckmann, and Mor Harchol-Balter
way it could not is if all of the exceptionally large objects were requested before it was requested
again. There are enough large objects to make this vanishingly unlikely.
This is an instance of the generalized coupon collector problem (CCP). In the CCP, one collects
coupons (with replacement) from an urn with k distinct types of coupons, stopping once all k types
have been collected. The classical CCP (where coupons are equally likely) is a well-studied problem [35]. The generalized CCP, where coupons have non-uniform probabilities, is very challenging
and the focus of recent work in probability theory [5, 32, 67, 85].
Applying these recent results, we show that it is extremely unlikely that a popular object does
not take precedence over any others. Therefore, there are very few non-integer solutions among
popular objects, which make up nearly all hits, and the gap between FOO-U and FOO-L vanishes
as the number of objects grows large.
4 FORMAL DEFINITION OF FOO
This section shows how to construct FOO and that FOO yields upper and lower bounds on OPT.
Section 4.1 introduces our notation. Section 4.2 defines our new interval representation of OPT.
Section 4.3 relaxes the integer constraints and proves that our min-cost flow representation yields
upper and lower bounds on OPT.
4.1 Notation and definitions
The trace σ consists of N requests to M distinct objects. The i-th request σi contains the corresponding object id, for all i ∈ 1 . . . N. We use si to reference the size of the object σi referenced in the i-th request. We denote the i-th interval (e.g., in Figure 3) by [i, ℓi), where ℓi is the time of the next request to object σi after time i, or ∞ if σi is not requested again.
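The interval endpoints ℓi can be computed in a single pass over the trace. The following is a minimal sketch; the list-of-ids trace encoding and the function name are ours, and `None` stands in for ∞:

```python
def next_uses(trace):
    """For each request i, return the time l_i of the next request to the
    same object, or None if the object is never requested again."""
    last_seen = {}                 # object id -> index of most recent request
    nxt = [None] * len(trace)
    for i, obj in enumerate(trace):
        if obj in last_seen:
            nxt[last_seen[obj]] = i    # close the previous interval for obj
        last_seen[obj] = i
    return nxt

# The running example trace from Figure 2.
trace = ["a", "b", "c", "b", "d", "a", "c", "d", "a", "b", "b", "a"]
print(next_uses(trace))
```

Requests with `None` are each object’s last request, so the number of finite intervals is N − M, matching the variable count of the ILP below.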
OPT minimizes the number of cache misses, while having full knowledge of the request trace.
OPT is constrained to only use cache capacity C (bytes), and is not allowed to prefetch objects as
this would lead to trivial solutions (no misses) [4]. Formally,
Assumption 1. An object k ∈ 1 . . .M can only enter the cache at times i ∈ 1 . . .N with σi = k .
4.2 New ILP representation of OPT
We start by formally stating our ILP formulation of OPT, based on intervals as illustrated in Figure 3. First, we define the set I of all requests i where σi is requested again, i.e., I = {i : ℓi < ∞}. I is the set of times when OPT must decide whether to cache an object. For all i ∈ I, we associate a decision variable xi. This decision variable denotes whether object σi is cached during the interval [i, ℓi). Our ILP formulation needs only N − M variables, vs. N × M for prior approaches [4], and leads
directly to our flow-based approximation.
Definition 1 (Definition of OPT). The interval representation of OPT for a trace of length N with M objects is as follows.

OPT = min ∑_{i ∈ I} (1 − xi)    (1)

subject to:

∑_{j : j < i < ℓj} sj xj ≤ C   ∀i ∈ I    (2)

xi ∈ {0, 1}   ∀i ∈ I    (3)
To represent the capacity constraint at every time step i , our representation needs to find all
intervals [j, ℓj ) that intersect with i , i.e., where j < i < ℓj . Eq. (2) enforces the capacity constraint by
bounding the size of cached intervals to be less than the cache size C. Eq. (3) ensures that decisions are integral, i.e., that each interval is cached either fully or not at all. Section A.1 proves that our
interval ILP is equivalent to classic ILP formulations of OPT from prior work [4].
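To make the definition concrete, the interval ILP can be brute-forced on a tiny trace by enumerating all 0/1 assignments and checking Eq. (2) exactly as written. This is a sketch for illustration only (exponential in N − M), and the object sizes below are hypothetical stand-ins for the example trace:

```python
from itertools import product

def interval_ilp_misses(trace, sizes, C):
    """Exhaustively solve the interval ILP (Definition 1) on a tiny trace.
    Returns the total miss count N - sum(x_i) over the best feasible x."""
    N = len(trace)
    last, nxt = {}, [None] * N          # nxt[i] = l_i, or None if infinite
    for i, o in enumerate(trace):
        if o in last:
            nxt[last[o]] = i
        last[o] = i
    I = [i for i in range(N) if nxt[i] is not None]
    best = N                            # caching nothing is always feasible
    for xs in product([0, 1], repeat=len(I)):
        x = dict(zip(I, xs))
        # Eq. (2): cached intervals [j, l_j) overlapping time i must fit in C
        if all(sum(sizes[trace[j]] * x[j] for j in I if j < i < nxt[j]) <= C
               for i in I):
            best = min(best, N - sum(xs))   # each cached interval is one hit
    return best

sizes = {"a": 3, "b": 1, "c": 1, "d": 2}   # hypothetical sizes
trace = ["a", "b", "c", "b", "d", "a", "c", "d", "a", "b", "b", "a"]
print(interval_ilp_misses(trace, sizes, C=3))
```

As a sanity check, with unbounded capacity every interval can be cached and only each object’s first request misses, so the result equals the number of distinct objects.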
Having formulated OPT with fewer decision variables, we could try to solve the LP relaxation
of this specific ILP. However, the capacity constraint, Eq. (2), still poses a practical problem since
finding the intersecting intervals is computationally expensive. Additionally, the LP formulation
does not exploit the underlying problem structure, which we need to bound the number of integer
solutions. We instead reformulate the problem as min-cost flow.
4.3 FOO’s min-cost flow representation of OPT
This section presents the relaxed version of OPT as an instance of min-cost flow (MCF) in a graph
G. We denote a surplus of flow at a node i with βi > 0, and a demand for flow with βi < 0. Each
edge (i, j) in G has a cost per unit flow γ(i, j) and a capacity for flow µ(i, j) (see right-hand side of
Figure 4).
As discussed in Section 3, the key idea in our construction of an MCF instance is that each
interval introduces an amount of flow equal to the object’s size. The graph G is constructed such
that this flow competes for a single sequence of edges (the “inner edges”) with zero cost. These
“inner edges” represent the cache’s capacity: if an object is stored in the cache, we incur zero cost
(no misses). As not all objects will fit into the cache, we introduce “outer edges”, which allow MCF
to satisfy the flow constraints. However, these outer edges come at a cost: when the full flow of an
object uses an outer edge we incur cost 1 (i.e., a miss). Non-integer decision variables arise if part
of an object is in the cache (flow along inner edges) and part is out of the cache (flow along outer
edges).
Formally, we construct our MCF instance of OPT as follows:
Definition 2 (FOO’s representation of OPT). Given a trace with N requests and M objects, the MCF graph G consists of N nodes. For each request i ∈ 1 . . . N there is a node with supply/demand

βi =  si   if i is the first request to σi,
      −si  if i is the last request to σi,
      0    otherwise.    (4)

An inner edge connects nodes i and i + 1. Inner edges have capacity µ(i,i+1) = C and cost γ(i,i+1) = 0, for i ∈ 1 . . . N − 1.
For all i ∈ I, an outer edge connects nodes i and ℓi. Outer edges have capacity µ(i,ℓi) = si and cost γ(i,ℓi) = 1/si. We denote the flow through outer edge (i, ℓi) as fi.
FOO-L denotes the cost of an optimal feasible solution to the MCF graph G. FOO-U denotes the cost if all non-zero flows through outer edges fi are rounded up to the edge’s capacity si.
This representation yields a min-cost flow instance with 2N − M − 1 edges, which is solvable in O(N^{3/2}) [8, 9, 28]. Note that while this paper focuses on optimizing miss ratio (i.e., the fault
model [4], where all misses have the same cost), Definition 2 easily supports non-uniform miss
costs by setting outer edge costs to γ(i, ℓi ) = costi/si . We next show how to derive upper and lower
bounds from this min-cost flow representation.
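The graph of Definition 2 is straightforward to construct. The sketch below builds the node supplies and edge list (the tuple encoding is our own choice) so that they could be fed to any off-the-shelf min-cost flow solver; the sizes are again hypothetical:

```python
def build_foo_mcf(trace, sizes, C):
    """Build FOO's min-cost flow instance (Definition 2).
    Returns (supply, edges); each edge is a (u, v, capacity, cost) tuple."""
    N = len(trace)
    first, last = {}, {}
    for i, o in enumerate(trace):
        first.setdefault(o, i)
        last[o] = i
    supply = [0] * N
    for o in first:                     # Eq. (4): the object's size enters at
        supply[first[o]] += sizes[o]    # its first request and leaves at its
        supply[last[o]] -= sizes[o]     # last request (net 0 in between)
    edges = [(i, i + 1, C, 0.0) for i in range(N - 1)]   # inner edges
    nxt = {}
    for i in reversed(range(N)):        # outer edges (i, l_i) for i in I
        o = trace[i]
        if o in nxt:
            edges.append((i, nxt[o], sizes[o], 1.0 / sizes[o]))
        nxt[o] = i
    return supply, edges

sizes = {"a": 3, "b": 1, "c": 1, "d": 2}   # hypothetical sizes
trace = ["a", "b", "c", "b", "d", "a", "c", "d", "a", "b", "b", "a"]
supply, edges = build_foo_mcf(trace, sizes, C=3)
N, M = len(trace), len(set(trace))
assert sum(supply) == 0                 # total surplus matches total demand
assert len(edges) == 2 * N - M - 1      # edge count stated in the text
```

Summing the outer-edge costs times their flows in a solver’s optimal solution gives FOO-L directly; rounding each non-zero outer flow up to its capacity gives FOO-U.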
Theorem 1 (FOO bounds OPT). For FOO-L and FOO-U from Definition 2,
FOO-L ≤ OPT ≤ FOO-U (5)
Proof. We observe that fi, as defined in Definition 2, gives the number of bytes “not stored” in the cache; fi corresponds to the i-th decision variable xi from Definition 1 via xi = (1 − fi/si).
(FOO-L ≤ OPT): FOO-L is a feasible solution for the LP relaxation of Definition 1, because a total amount of flow si needs to flow from node i to node ℓi (by definition of βi). At most µ(i,i+1) = C flow uses an inner edge, which enforces constraint Eq. (2). FOO-L is an optimal solution because it minimizes the total cost of flow along outer edges. Each outer edge’s cost is γ(i,ℓi) = 1/si, so γ(i,ℓi) fi = (1 − xi), and thus

FOO-L = min ∑_{i ∈ I} γ(i,ℓi) fi = min ∑_{i ∈ I} (1 − xi) ≤ OPT    (6)

(OPT ≤ FOO-U): After rounding, each outer edge (i, ℓi) has flow fi ∈ {0, si}, so the corresponding decision variable xi ∈ {0, 1}. FOO-U thus yields a feasible integer solution, and OPT yields no more misses than any feasible solution.
5 FOO IS ASYMPTOTICALLY OPTIMAL
This section proves that FOO is asymptotically optimal, namely that the gap between FOO-U and
FOO-L vanishes as the number of objects grows large. Section 5.1 formally states this result and
our assumptions, and Sections 5.2–5.5 present the proof.
5.1 Main result and assumptions
Our proof of FOO’s optimality relies on two assumptions: (i) that the trace is created by stochastically independent request processes and (ii) that the popularity distribution is not concentrated on a
finite set of objects as the number of objects grows.
Assumption 2 (Independence). The request sequence is generated by independently sampling from a popularity distribution PM. Object sizes are sampled from an arbitrary continuous size distribution S, which is independent of M and has a finite maximum maxi si.
We assume that object sizes are unique to break ties when making caching decisions. If the object
sizes are not unique, one can simply add small amounts of noise to make them so. We assume a
maximum object size to show the existence of a scaling regime, i.e., that the number of cached
objects grows large as the cache grows large. For the same reason, we exclude trivial cases where
a finite set of objects dominates the request sequence even as the total universe of objects grows
large:
Assumption 3 (Diverging popularity distribution). For any number M > 0 of objects, the popularity distribution PM is defined via an infinite sequence ψk. At any time 1 ≤ i ≤ N,

P [object k is requested | M objects overall] = ψk / ∑_{k=1}^{M} ψk    (7)

The sequence ψk must be positive and diverging such that cache size C → ∞ is required to achieve a constant miss ratio as M → ∞.
Our assumptions on PM allow for many common distributions, such as uniform popularities (ψk = 1) or heavy-tailed Zipfian probabilities (ψk = 1/k^α for α ≤ 1, as is common in practice [16, 19, 48,
62, 78]). Moreover, with some change to notation, our proofs can be extended to require only that
ψk remains constant over short timeframes. With these assumptions in place, we are now ready to
state our main result on FOO’s asymptotic optimality.
Theorem 2 (FOO is Asymptotically Optimal). Under Assumptions 2 and 3, for any error ε and violation probability κ, there exists an M∗ such that for any trace with M > M∗ objects

P [FOO-U − FOO-L ≥ ε N] ≤ κ    (8)
where the trace length N ≥ M log² M and the cache capacity C is scaled with M such that FOO-L’s miss ratio remains constant.
Theorem 2 states that, as M → ∞, FOO’s miss ratio error is almost surely less than ε for any ε > 0. Since FOO-L and FOO-U bound OPT (Theorem 1), FOO-L = OPT = FOO-U.
The rest of this section is dedicated to the proof of Theorem 2. The key idea in our proof is to
bound the number of non-integer solutions in FOO-L via a precedence relation that forces FOO-L
to strictly prefer some decision variables over others, which forces them to be integer. Section 5.2
introduces this precedence relation. Section 5.3 maps this relation to a representation that can be
stochastically analyzed (as a variant of the coupon collector problem). Section 5.4 then shows that
almost all decision variables are part of a precedence relation and thus integer, and Section 5.5
brings all these parts together in the proof of Theorem 2.
5.2 Bounding the number of non-integer solutions using a precedence graph
This section introduces the precedence relation ≺ between caching intervals. The intuition behind ≺ is that if an interval i is nested entirely within interval j, then min-cost flow must prefer i over j. We first formally define ≺, and then state the property about optimal policies in Theorem 3.
Definition 3 (Precedence relation). For two caching intervals [i, ℓi) and [j, ℓj), let the relation ≺ be such that i ≺ j (“i takes precedence over j”) if
(1) j < i,
(2) ℓj > ℓi, and
(3) si < sj.
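The three conditions translate directly into code. A minimal sketch, with an interval encoding of our own choosing:

```python
def takes_precedence(i, j):
    """Return True iff i ≺ j per Definition 3, where each interval is a
    (start, end, size) tuple describing [start, end) for an object of that
    size: i must start later, end earlier, and be strictly smaller than j."""
    (i_start, i_end, i_size), (j_start, j_end, j_size) = i, j
    return j_start < i_start and j_end > i_end and i_size < j_size

# Hypothetical intervals: `inner` is nested inside `outer` and smaller.
outer = (1, 8, 3)
inner = (2, 5, 1)
assert takes_precedence(inner, outer)          # nested and smaller
assert not takes_precedence(outer, inner)      # the reverse never holds
assert not takes_precedence((2, 5, 3), outer)  # equal size: no precedence
```

The size tie in the last check cannot arise under Assumption 2, since continuous size distributions make object sizes unique.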
The key property of ≺ is that it forces integer decision variables.
Theorem 3 (Precedence forces integer decisions). If i ≺ j, then xj > 0 in FOO-L’s min-cost flow solution implies xi = 1.
In other words, if interval i is strictly preferable to interval j, then FOO-L will take all of i before taking any of j. The proof of this result relies on the notion of a residual MCF graph [3, p. 304 ff.], where for any edge (i, j) ∈ G with positive flow, we add a backwards edge (j, i) with cost γj,i = −γi,j.
Proof. By contradiction. Let G′ be the residual MCF graph induced by a given MCF solution. Figure 6 sketches the MCF graph in the neighborhood of j, . . . , i, . . . , ℓi, . . . , ℓj.
Fig. 6. The precedence relation i ≺ j from Definition 3 forces integer decisions on interval i. In any min-cost flow solution, we can reroute flow such that if xj > 0 then xi = 1.
Assume that xj > 0 and that xi < 1, as otherwise the statement is trivially true. Because xj > 0 there exist backwards inner edges all the way between ℓj and j. Because xi < 1, the MCF solution must include some flow on the outer edge (i, ℓi), and there exists a backwards outer edge (ℓi, i) ∈ G′ with cost γℓi,i = −1/si.
We can use the backwards edges to create a cycle in G′, starting at j, then following edge (j, ℓj),
backwards inner edges to ℓi , the backwards outer edge (ℓi , i), and finally backwards inner edges to
return to j. Figure 6 highlights this clockwise path in darker colored edges.
This cycle has cost 1/sj − 1/si, which is negative because si < sj (by definition of ≺ and since i ≺ j). As negative-cost cycles cannot exist in an MCF solution [3, Theorem 9.1, p. 307], this leads to a contradiction.
To quantify how many integer solutions there are, we need to know the general structure of the
precedence graph. Intuitively, for large production traces with many overlapping intervals, the
graph will be very dense. We have empirically verified this for our production traces.
Unfortunately, the combinatorial nature of caching traces made it difficult for us to characterize
the general structure of the precedence graph under stochastic assumptions. For example, we
considered classical results on random graphs [56] and the concentration of measure in random
partial orders [18]. None of these paths yielded sufficiently tight bounds on FOO. Instead, we bound
the number of non-integer solutions via the generalized coupon collector problem.
5.3 Relating the precedence graph to the coupon collector problem
We now translate the problem of intervals without a child in the precedence graph (i.e., intervals i for which there exists no j with i ≺ j) into a tractable stochastic representation.
We first describe the intuition for equal object sizes, and then consider variable object sizes.
Definition 4 (Cached objects). Let Hi denote the set of cached intervals that overlap time i, excluding i.

Hi = { j ≠ i : xj > 0 and i ∈ [j, ℓj) }  and  hi = |Hi|    (9)
We observe that interval [i, ℓi) is without child if and only if all other objects xj ∈ Hi are requested at least once in [i, ℓi). Figure 7 shows an example where Hi consists of five objects (intervals xa, . . . , xe). As all five objects are requested before ℓi, all five intervals end before xi ends, and so [i, ℓi) cannot fit in any of them. To formalize this observation, we introduce the following random variables, also illustrated in Figure 7. Li is the length of the i-th interval, i.e., Li = ℓi − i. THi is the time after i when all intervals in Hi end. We observe that THi is the stopping time in a coupon-collector problem (CCP) where we associate a coupon type with every object in Hi. With equal object sizes, the event that i has a child is equivalent to the event THi > Li.
We now extend our intuition to the case of variable object sizes. We now need to consider that objects in Hi can be smaller than si and thus may not be i’s children for a new reason: the precedence relation (Definition 3) requires i’s children to have size larger than or equal to si. Figure 8 shows an example where Li is without child because (i) xb, which ends after ℓi, is smaller than si, and (ii) all larger objects (xa, xc, xd) are requested before ℓi. The important conclusion is that, by ignoring the smaller objects, we can reduce the problem back to the CCP.
To formalize our observation about the relation to the CCP, we introduce the following random
variables, also illustrated in Figure 8. We define B, which is a subset of the cached objects Hi with a
Fig. 7. Simplified notation for the coupon collector representation of offline caching with equal object sizes. We translate the precedence relation from Theorem 3 into the relation between two random variables. Li denotes the length of interval i. THi is the coupon collector time, where we wait until all objects that are cached at the beginning of Li (Hi denotes these objects) are requested at least once.
Fig. 8. Full notation for the coupon collector representation of offline caching. With variable object sizes, we need to ignore all objects with a smaller size than si (greyed-out intervals xb and xe). We then define the coupon collector time TB among a subset B ⊂ Hi of cached objects with a size larger than or equal to si. Using this notation, the event TB > Li implies that xi has a child, which forces xi to be integer by Theorem 3.
size equal to or larger than si, and the coupon collector time TB for B-objects. These definitions are useful as the event TB > Li implies that i has a child and thus xi is integer, as we now show.
Theorem 4 (Stochastic bound on non-integer variables). For decision variable xi, i ∈ 1 . . . N, assume that B ⊆ Hi is a subset of cached objects where sj ≥ si for all j ∈ B. Further, let the random variable TB denote the time until all intervals in B end, i.e., TB = maxj∈B ℓj − i. If B is non-empty, then the probability that xi is non-integer is upper bounded by the probability that interval i ends after all intervals in B, i.e.,

P [0 < xi < 1] ≤ P [Li > TB] .    (10)
The proof works backwards by assuming that TB > Li. We then show that this implies that there exists an interval j with i ≺ j, and then apply Theorem 3 to conclude that xi is integer. Finally, we use this implication to bound the probability.
Proof. Consider an arbitrary i ∈ 1 . . . N with TB > Li. Let [j, ℓj) denote the interval in B that is last requested, i.e., ℓj = maxk∈B ℓk (j exists because B is non-empty). To show that i ≺ j, we check the three conditions of Definition 3.
(1) j < i, because j ∈ Hi (i.e., j is cached at time i);
(2) ℓj > ℓi, because ℓj = maxk∈B ℓk = i + TB > i + Li = ℓi; and
(3) si < sj, because j ∈ B (i.e., sj is bigger than si by assumption).
Having shown that i ≺ j, we can apply Theorem 3, so that xj > 0 implies xi = 1. Because j ∈ Hi, j is cached with xj > 0 and thus xi = 1. Finally, we observe that Li ≠ TB and conclude the theorem’s statement by translating the above implications, xi = 1 ⇐ i ≺ j ⇐ TB > Li, into probability.

P [0 < xi < 1] = 1 − P [xi ∈ {0, 1}] ≤ 1 − P [i ≺ j] ≤ 1 − P [TB > Li] = P [Li > TB] .    (11)
Theorem 4 simplifies the analysis of non-integer xi to the relation of two random variables, Li and TB. While Li is geometrically distributed, TB’s distribution is more involved.
We map TB to the stopping time Tb,p of a generalized CCP with b = |B| different coupon types. The coupon probabilities p follow from the object popularity distribution PM by conditioning
on objects in B. As the object popularities p are not equal in general, characterizing the stopping
time Tb,p is much more challenging than in the classical CCP, where the coupon probabilities are
assumed to be equal. We solve this problem by observing that collecting b coupons under equal
probabilities stops faster than under p. This fact may appear obvious, but it was only recently
shown by Anceaume et al. [5, Theorem 4, p. 415] (the proof is non-trivial). Thus, we can use a
classical CCP to bound the generalized CCP’s stopping time and thus TB.
Lemma 1 (Connection to classical coupon collector problem). For any object popularity distribution PM, and for q = (1/b, . . . , 1/b), using the notation from Theorem 4,

P [TB < l] ≤ P [Tb,q < l]  for any l ≥ 0 .    (12)
The proof of this Lemma simply extends the result by Anceaume et al. to account for requests to
objects not in B. We state the full proof in Section A.2.
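The direction of Lemma 1 (uniform coupon probabilities make the collector finish stochastically faster) can be checked with a quick Monte Carlo sketch; the Zipf-like weights, trial count, and seed below are arbitrary choices of ours:

```python
import random

def ccp_stopping_time(weights, rng):
    """Draw coupons i.i.d. with the given weights until every one of the
    b types has appeared at least once; return the number of draws."""
    b = len(weights)
    remaining, draws = set(range(b)), 0
    while remaining:
        draws += 1
        remaining.discard(rng.choices(range(b), weights=weights)[0])
    return draws

rng = random.Random(42)
b, trials = 10, 2000
uniform = [1.0] * b                         # q = (1/b, ..., 1/b)
skewed = [1.0 / (k + 1) for k in range(b)]  # Zipf-like popularities
avg_q = sum(ccp_stopping_time(uniform, rng) for _ in range(trials)) / trials
avg_p = sum(ccp_stopping_time(skewed, rng) for _ in range(trials)) / trials
print(avg_q, avg_p)   # the uniform collector finishes sooner on average
```

For equal probabilities the expected stopping time is b·H_b (about 29.3 for b = 10), so the uniform average should land near that value, well below the skewed one.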
5.4 Typical objects almost always lead to integer decision variables
We now use the connection to the coupon collector problem to show that almost all of FOO’s
decision variables are integer. Specifically, we exclude a small number of very large and unpopular
objects, and show that the remaining objects are almost always part of a precedence relation, which
forces the corresponding decision variables to become integer. In Section 5.5, we show that the
excluded fraction is diminishingly small.
We start with a definition of the sets of large objects and the set of popular objects.
Definition 5 (Large objects and popular objects). Let N∗ be the time after which the cache needs to evict at least one object. For time i ∈ N∗ . . . N, we define the set of large objects, Bi, and the set of popular objects, Fi.
The set Bi ⊆ Hi consists of the requests to the fraction δ largest objects of Hi (0 < δ < 1). We also define bi = |Bi|, and we write “si < Bi” if si < sj for all j ∈ Bi and “si ≮ Bi” otherwise.
The set Fi consists of those objects k with a request probability ρk which lies above the following threshold.

Fi = { k : ρk ≥ 1 / (bi log log bi) }    (13)
Using the above definitions, we prove that “typical” objects (i.e., popular objects that are not too large) rarely lead to non-integer decision variables as the number of objects M grows large.

Theorem 5 (Typical objects are rarely non-integer). For i ∈ N∗ . . . N, and Bi and Fi from Definition 5,

P [0 < xi < 1 | si < Bi, σi ∈ Fi] → 0  as M → ∞ .    (14)
The intuition is that, as the number of cached objects grows large, it is vanishingly unlikely that all objects in Bi will be requested before a single object is requested again. That is, though there are not many large objects in Bi, there are enough that, following Theorem 4, xi is vanishingly unlikely to be non-integer. The proof of Theorem 5 uses elementary probability but relies on several very technical proofs. We sketch the proof here, and state the formal proofs in Appendices A.3 to A.5.
Proof sketch. Following Theorem 4, it suffices to consider the event Li > TBi.
• (P[Li > TBi] → 0 as hi → ∞): We first condition on Li = l, so that Li and TBi become stochastically independent and we can bound P[Li > TBi] by bounding either P[Li > l] or P[l > TBi]. Specifically, we split l carefully into “small l” and “large l”, and then show that Li is concentrated at small l, and TBi is concentrated at large l. Hence, P[Li > TBi] is negligible.
– (Small l): For small l, we show that it is unlikely for all objects in Bi to have been requested after l requests. We upper bound the distribution of TBi with Tbi,u (Lemma 1). We then show that the distribution of Tbi,u decays exponentially at values below its expectation (Lemma 4). Hence, for l far below the expectation of Tbi,u, the probability vanishes, P[Tbi,u < l] → 0, so long as bi = δ hi grows large, which it does because hi grows large.
– (Large l): For large l, P[Li > l] → 0 because we only consider popular objects σi ∈ Fi by assumption, and it is highly unlikely that a popular object is not requested after many requests.
• (hi → ∞): What remains to be shown is that the number of cached objects hi actually grows large. Since the cache size C → ∞ as M → ∞ by Assumption 3, this may seem obvious.
Nevertheless, it must be demonstrated (Lemma 3). The basic intuition is that hi is almost never much less than h∗ = C/maxk sk, the fewest number of objects that could fit in the cache, and h∗ → ∞ as C → ∞.
To see why hi is almost never much less than h∗, consider the probability that hi < x, where x is constant with respect to M. For any x, take large enough M such that x < h∗. Now, in order for hi < x, almost all requests must go to distinct objects. Any object that is requested twice between u (the last time where hu ≥ h∗) and v (the next time where hv ≥ h∗) produces an interval (see Figure 9). This interval is guaranteed to fit in the cache, since hi < x < h∗ means there is space for an object of any size. As h∗ and M grow further, the amount of cache resources that must lie unused for hi < x grows further and further, and the probability that no interval fits within these resources becomes negligible.
Fig. 9. The number of cached objects at time i, hi, is unlikely to be far below h∗ = C/maxk sk, the fewest number of objects that fit in the cache. In order for hi < h∗ to happen, no other interval must fit in the white triangular space centered at i (otherwise FOO-L would cache the interval).
5.5 Bringing it all together: Proof of Theorem 2
This section combines our results so far and shows how to obtain Theorem 2: There exists M∗ such that for any M > M∗ and for any error ε and violation probability κ,
P [FOO-U − FOO-L ≥ ε N ] ≤ κ (15)
Proof of Theorem 2. We start by bounding the cost of non-integer solutions by the number of non-integer solutions, Ω.

∑_{i : 0 < xi < 1} xi ≤ ∑_{i ∈ I} 1{0 < xi < 1} = Ω    (16)

It follows that (FOO-U − FOO-L) ≤ Ω.

P [FOO-U − FOO-L ≥ ε N] ≤ P [Ω ≥ ε N]    (17)

We apply the Markov inequality.

≤ E[Ω] / (N ε) = (1 / (N ε)) ∑_{i ∈ I} P [0 < xi < 1]    (18)

There are at most N terms in the sum.

≤ P [0 < xi < 1] / ε    (19)
To complete Eq. (15), we upper bound P [0 < xi < 1] to be less than ε κ. We first condition on si < Bi and σi ∈ Fi, double counting those i where si ≮ Bi and σi ∉ Fi.

P [0 < xi < 1] ≤ P [0 < xi < 1 | si < Bi, σi ∈ Fi] P [si < Bi, σi ∈ Fi]    (20)
 + P [0 < xi < 1 | si ≮ Bi] P [si ≮ Bi]    (21)
 + P [0 < xi < 1 | σi ∉ Fi] P [σi ∉ Fi]    (22)

Drop ≤ 1 terms.

≤ P [0 < xi < 1 | si < Bi, σi ∈ Fi] + P [si ≮ Bi] + P [σi ∉ Fi]    (23)

To bound P [0 < xi < 1] ≤ ε κ, we choose parameters such that each term in Eq. (23) is less than ε κ/3. The first term vanishes by Theorem 5. The second term is satisfied by choosing δ = ε κ/3 (Definition 5). For the third term, the probability that any cached object is unpopular vanishes as hi grows large.

P [σi ∉ Fi] ≤ hi / (bi log log bi) = 3 / (ε κ log log(ε κ hi/3)) → 0  as hi → ∞    (24)
Finally, we choose M∗ large enough that the first and third terms in Eq. (23) are each less than ε κ/3. This concludes our theoretical proof of FOO’s optimality.
6 PRACTICAL FLOW-BASED OFFLINE OPTIMAL FOR REAL-WORLD TRACES
While FOO is asymptotically optimal and very accurate in practice, as well as faster than prior
approximation algorithms, it is still not fast enough to process production traces with hundreds
of millions of requests in a reasonable timeframe. We now use the insights gained from FOO’s
graph-theoretic formulation to design new upper and lower bounds on OPT, which we call practical flow-based offline optimal (PFOO). We provide the first practically useful lower bound, PFOO-L, and
an upper bound that is much tighter than prior practical offline upper bounds, PFOO-U:
PFOO-L ≤ FOO-L ≤ OPT ≤ FOO-U ≤ PFOO-U (25)
6.1 Practical lower bound: PFOO-L
PFOO-L considers the total resources consumed by OPT. As Figure 10a illustrates, cache resources
are limited in both space and time [10]: measured in resources, the cost to cache an object is the
product of (i) its size and (ii) its reuse distance (i.e., the number of accesses until it is next requested).
On a trace of length N , a cache of size C has total resources N ×C . The objects cached by OPT, or
any other policy, cannot cost more total resources than this.
Definition of PFOO-L. PFOO-L sorts all intervals by their resource cost, and caches the smallest-cost intervals up to a total resource usage of N × C. Figure 10b shows PFOO-L on a short request
trace. By considering only the total resource usage, PFOO-L ignores other constraints that are faced
by caching policies. In particular, PFOO-L does not guarantee that cached intervals take less than
C space at all times, as shown by interval 6 for object a in Figure 10b, which exceeds the cache
capacity during part of its interval.
Why PFOO-L works. PFOO-L is a lower bound because no policy, including OPT or FOO-L, can
get fewer misses using N ×C total resources. It gives a reasonably tight bound because, on large
caches with many objects, the distribution of interval costs is similar throughout the request trace.
Hence, for a given cache capacity, the “marginal interval” (i.e., the one that barely does not fit in the
cache under OPT) is also of similar cost throughout the trace. Informally, PFOO-L caches intervals
(a) Intervals sorted by resource cost = reuse distance × object size. (b) PFOO-L greedily claims the smallest intervals while not exceeding the average cache size.
Fig. 10. PFOO’s lower bound, PFOO-L, constrains the total resources used over the full trace (i.e., size × time). PFOO-L claims the hits that require fewest resources, allowing cached objects to temporarily exceed the cache capacity.
up to this marginal cost, and so rarely exceeds cache capacity by very much. The intuition behind
PFOO-L is thus similar to the widely used Che approximation [21], which states that LRU caches
keep objects up until some “characteristic age”. But unlike the Che approximation, which has
unbounded error, PFOO-L uses this intuition to produce a robust lower bound. This intuition holds
particularly when requests are largely independent, as in our proof assumptions and in traces from
CDNs or other Internet services. However, as we will see, PFOO-L introduces modest error even
on other workloads where these assumptions do not hold.
PFOO-L uses a similar notion of “cost” as Belady-Size, but provides two key advantages. First,
PFOO-L is closer to OPT. Relaxing the capacity constraint lets PFOO-L avoid the pathologies
discussed in Section 2.4, since PFOO-L can temporarily exceed the cache capacity to retain valuable
objects that Belady-Size is forced to evict. Second, relaxing the capacity constraint makes PFOO-L
a lower bound, giving the first reasonably tight lower bound on long traces.
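PFOO-L’s greedy claim can be sketched in a few lines. The trace encoding and object sizes below are hypothetical, and reuse distance is measured simply in requests:

```python
def pfoo_l_misses(trace, sizes, C):
    """Sketch of PFOO-L: sort intervals by resource cost (size x reuse
    distance) and claim the cheapest ones within the N*C resource budget."""
    N = len(trace)
    last, intervals = {}, []
    for i, o in enumerate(trace):
        if o in last:
            # resource cost of caching o from its previous request until now
            intervals.append((sizes[o] * (i - last[o]), o))
        last[o] = i
    intervals.sort()                 # cheapest intervals first
    budget, hits = N * C, 0
    for cost, _ in intervals:
        if cost > budget:
            break                    # sorted, so nothing cheaper remains
        budget -= cost
        hits += 1
    return N - hits

sizes = {"a": 3, "b": 1, "c": 1, "d": 2}   # hypothetical sizes
trace = ["a", "b", "c", "b", "d", "a", "c", "d", "a", "b", "b", "a"]
print(pfoo_l_misses(trace, sizes, C=3))
```

With this trace the budget is N × C = 36 resources, which admits six of the eight intervals, so the sketch reports six misses; with a very large C every interval is claimed and only the four cold misses remain.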
6.2 Practical upper bound: PFOO-U
Definition of PFOO-U. PFOO-U breaks FOO's min-cost flow graph into smaller segments of
constant size, and then solves each segment using min-cost flow incrementally. By keeping track of
the resource usage of already solved segments, PFOO-U yields a globally-feasible solution, which is
an upper bound on OPT or FOO-U.
Example of PFOO-U. Figure 11 shows our approach on the trace from Figure 2 for a cache capacity
of 3. At the top is FOO’s full min-cost flow problem; for large traces, this MCF problem is too
expensive to solve directly. Instead, PFOO-U breaks the trace into segments and constructs a
min-cost flow problem for each segment.
PFOO-U begins by solving the min-cost flow for the first segment. In this case, the solution is to
cache both b's, c, and one-third of a, since these decisions incur the minimum cost of two-thirds,
i.e., less than one cache miss. As in FOO-U, PFOO-U rounds down the non-integer decision for
a and all following non-integer decisions. Furthermore, PFOO-U only fixes decisions for objects
in the first half of this segment. This is done to capture interactions between intervals that cross
segment boundaries. Hence, PFOO-U “forgets” the decision to cache c and the second b, and its
final decision for this segment is only to cache the first b interval.
PFOO-U then updates the second segment to account for its previous decisions. That is, since b
is cached until the second request to b, capacity must be removed from the min-cost flow to reflect
this allocation. Hence, the capacity along the inner edge c → b is reduced from 3 to 2 in the second
segment (b is size 1). Solving the second segment, PFOO-U decides to cache c and b (as well as half
Fig. 11. Starting from FOO's full formulation, PFOO-U breaks the min-cost flow problem into overlapping
segments. Going left-to-right through the trace, PFOO-U optimally solves MCF on each segment, and updates
link capacities in subsequent segments to maintain feasibility for all cached objects. The segments overlap to
capture interactions across segment boundaries. (a) PFOO-U breaks the trace (a b c b d a c d a b b a) into
small, overlapping Segments 1–5 . . . (b) . . . and solves min-cost flow for each segment. (Segment 1: cache
both b's and c; forget c and the second b. Segment 2: cache c and b. Segment 3: cache c; forget c.)
of d, which is ignored). Since these are in the first half of the segment, we fix both decisions, and
move onto the third segment, updating the capacity of edges to reflect these decisions as before.
PFOO-U continues to solve the following segments in this manner until the full trace is processed.
On the trace from Section 3, it decides to cache all requests to b and c, yielding 5 misses on the
requests to a and d. These are the same decisions as taken by FOO-U and OPT. We generally find
that PFOO-U yields nearly identical miss ratios as FOO-U, as we next demonstrate on real traces.
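The segmentation bookkeeping, as opposed to the per-segment min-cost flow solve itself, can be sketched as follows. This is our own simplified rendering: `solve_segment` is a hypothetical stand-in for the MCF solve, and the exact segment length and overlap are illustrative.

```python
def pfoo_u_schedule(n_requests, seg_len, solve_segment):
    """Control-flow sketch of PFOO-U's segmented solving (names are ours).

    The trace is walked left to right in segments of constant length that
    overlap by half, so intervals crossing a boundary are seen twice. Only
    decisions from the first half of each segment are fixed; the second
    half is re-solved as part of the next segment. `solve_segment(start,
    end, fixed)` stands in for the per-segment min-cost flow solve and
    returns a dict {request position: caching decision}.
    """
    fixed = {}
    start = 0
    half = seg_len // 2
    while start < n_requests:
        end = min(start + seg_len, n_requests)
        decisions = solve_segment(start, end, dict(fixed))
        for pos, dec in decisions.items():
            # Fix first-half decisions (or everything in the last segment).
            if pos < start + half or end == n_requests:
                fixed[pos] = dec
        start += half  # advance by half a segment => overlapping segments
    return fixed
```

In the real algorithm, the `fixed` decisions are reflected by reducing the capacities of the inner edges in subsequent segments, as in the c → b example above.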
6.3 Summary
Putting it all together, PFOO provides efficient lower and upper bounds on OPT with variable
object sizes. PFOO-L runs in O(N log N) time, as required to sort the intervals; and PFOO-U runs
in O(N) because it divides the min-cost flow into segments of constant size. In practice, PFOO-L is
faster than PFOO-U at realistic trace lengths, despite its worse asymptotic runtime, due to the large
constant factor in solving each segment in PFOO-U.
PFOO-L gives a lower bound (PFOO-L ≤ FOO-L, Eq. (25)) because PFOO-L caches all of the
smallest intervals, so no policy can get more hits (i.e., cache more intervals) without exceeding
N × C overall resources. Resources are a hard constraint obeyed by all policies, including FOO-L,
whose resources are constrained by the capacity C of all N − 1 inner edges.
PFOO-U gives an upper bound (FOO-U ≤ PFOO-U, Eq. (25)) because FOO-U and PFOO-U
both satisfy the capacity constraint and the integrality constraint of the ILP formulation of OPT
(Definition 1). By sequentially optimizing the MCF segment after segment, we induce additional
constraints on PFOO-U, which can lead to suboptimal solutions.
7 EXPERIMENTAL METHODOLOGY
We evaluate FOO and PFOO against prior offline bounds and online caching policies on eight
different production traces.
7.1 Trace characterization
We use production traces from three global content-distribution networks (CDNs), two web
applications from different anonymous large Internet companies, and storage workloads from
Microsoft [79]. We summarize the trace characteristics in Table 3. The table shows that our traces
typically span several tens to hundreds of millions of requests and tens of millions of objects.
Figure 12 shows four key distributions of these workloads.
Trace      Year  # Requests  # Objects  Object sizes
CDN 1      2016  500M        18M        10 B – 616 MB
CDN 2      2015  440M        19M        1 B – 1.5 GB
CDN 3      2015  420M        43M        1 B – 2.3 GB
WebApp 1   2017  104M        10M        3 B – 1.9 MB
WebApp 2   2016  100M        14M        5 B – 977 KB
Storage 1  2008  29M         16M        501 B – 780 KB
Storage 2  2008  37M         6M         501 B – 78 KB
Storage 3  2008  45M         14M        501 B – 489 KB

Table 3. Overview of key properties of the production traces used in our evaluation.
The object size distribution (Figure 12a) shows that object sizes are variable in all traces. However,
while they span almost ten orders of magnitude in CDNs, object sizes vary only by six orders of
magnitude in web applications, and only by three orders of magnitude in storage systems. WebApp
1 also has noticeably smaller object sizes throughout, as is representative of application-cache
workloads.
The popularity distribution (Figure 12b) shows that CDN workloads and WebApp workloads
all follow approximately a Zipf distribution with α between 0.85 and 1. In contrast, the popularity
distribution of storage traces is much more irregular with a set of disproportionally popular objects,
an approximately log-linear middle part, and an exponential cutoff for the least popular objects.
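For readers who want to reproduce the Zipf observation, the exponent α can be estimated by a least-squares fit of log request count against log popularity rank. This sketch is our own; the paper does not prescribe a particular fitting method.

```python
import math

def estimate_zipf_alpha(request_counts):
    """Estimate the Zipf exponent alpha by least-squares regression of
    log(request count) on log(popularity rank)."""
    counts = sorted(request_counts, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope  # Zipf: count ~ rank^(-alpha), so the slope is -alpha
```

On our CDN and WebApp traces, this kind of fit yields α between 0.85 and 1, while the storage traces deviate visibly from a straight line in log-log space.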
The reuse distance distribution (Figure 12c) — i.e., the distribution of the number of requests
between requests to the same object — further distinguishes CDN and WebApp traces from stor-
age workloads. CDNs and WebApps serve millions of different customers and so exhibit largely
independent requests with smoothly diminishing object popularities, which matches our proof
assumptions. Thus, the CDN and WebApp traces lead to a smooth reuse distance, as shown in
the figure. In contrast, storage workloads serve requests from one or a few applications, and so
often exhibit highly correlated requests (producing spikes in the reuse distance distribution). For
example, scans are common in storage (e.g., traces like: ABCDABCD ...), but never seen in CDNs.
This is evident from the figure, where the storage traces exhibit several steps in their cumulative
request probability, as correlated objects (e.g., due to scans) have the same reuse distance.
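The reuse distance used here is straightforward to compute in one pass over a trace. The sketch below is our own; it also shows why scans produce steps: in a scan such as ABCDABCD..., every reuse occurs at exactly the same distance.

```python
from collections import Counter

def reuse_distances(trace):
    """Number of requests between successive requests to the same object."""
    last = {}
    dists = []
    for i, obj in enumerate(trace):
        if obj in last:
            dists.append(i - last[obj])
        last[obj] = i
    return dists

# A storage-like scan: every reuse happens at exactly the same distance,
# which appears as a step in the cumulative reuse distance distribution.
scan = list("ABCDABCDABCD")
print(Counter(reuse_distances(scan)))  # every reuse distance equals 4
```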
Finally, we measure the correlation across different objects (Figure 12d). Ideally, we could directly
test our independence assumption (Assumption 2). Unfortunately, quantifying independence on
real traces is challenging. For example, classical methods such as Hoeffding’s independence test
[47] only apply to continuous distributions, whereas we consider cache requests in discrete time.
Fig. 12. The production traces used in our evaluation come from three different domains (CDNs, WebApps,
and storage) and thus exhibit starkly different request patterns. (a) Object sizes: the object size distribution
in CDN traces spans more than nine orders of magnitude, whereas object sizes in WebApp and Storage traces
are much smaller. (b) Popularities: object popularities in CDNs and WebApps follow approximately Zipf
popularities, whereas the popularity distribution for storage traces is more complex. (c) Reuse distances: the
reuse distance distribution of CDNs and WebApps is smooth, whereas in the Storage traces there are jumps,
indicating correlated request sequences such as scans. (d) Correlations: the distribution of Pearson correlation
coefficients for the top 10k-most popular objects shows that CDN and WebApp traces do not exhibit linear
correlations, whereas the three storage traces show distinct patterns with significant positive correlation.
One of the storage traces (Storage 2) in fact shows a correlation coefficient greater than 0.5 for all 10k-most
popular objects.
We therefore turn to correlation coefficients. Specifically, we use the Pearson correlation coeffi-
cient as it is computable in linear time (as opposed to Spearman’s rank and Kendall’s tau [30]). We
define the coefficient based on the number of requests each object receives in a time bucket that
spans 4000 requests (we verified that the results do not change significantly for time bucket sizes
in the range 400–40k requests). In order to capture pair-wise correlations, we chose the top 10k
objects in each trace, calculated the request counts for all time buckets, and then calculated the
Pearson coefficient for all possible combinations of object pairs.
Figure 12d shows the distribution of coefficients for all object pair combinations. We find that
both CDN and WebApps do not show a significant correlation; over 95% of the object pairs have a
correlation coefficient between −0.25 and 0.25. In contrast, we find strong positive correlations in
the storage traces. For the first storage trace, we measure a correlation coefficient greater than 0.5
for all 10k-most popular objects. For the second storage trace, we measure a correlation coefficient
greater than 0.5 for more than 20% of the object pairs. And, for the third storage trace, we measure
a correlation coefficient greater than 0.5 for more than 14% of the object pairs. We conclude that the
simple Pearson correlation coefficient is sufficiently powerful to quantify the correlation inherent
to storage traces (such as loops and scans).
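A minimal reimplementation of this measurement might look as follows. This is our own sketch of the procedure described above (bucketed request counts plus a hand-rolled Pearson coefficient), not the paper's released code; the bucket size of 4 and the toy trace are purely illustrative.

```python
def bucket_counts(trace, obj, bucket_size):
    """Requests to `obj` within each time bucket of `bucket_size` requests."""
    n_buckets = (len(trace) + bucket_size - 1) // bucket_size
    counts = [0] * n_buckets
    for i, o in enumerate(trace):
        if o == obj:
            counts[i // bucket_size] += 1
    return counts

def pearson(xs, ys):
    """Pearson correlation coefficient, computable in linear time."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Two objects that are always requested together (a storage-like loop)
# are perfectly positively correlated across time buckets.
trace = ["a", "b", "x", "x"] * 3 + ["x"] * 8
ca = bucket_counts(trace, "a", 4)
cb = bucket_counts(trace, "b", 4)
```

In the real measurement, this pairwise computation is repeated for all pairs among the top 10k objects of each trace.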
7.2 Caching policies
We evaluate three classes of policies: theoretical bounds on OPT, practical offline heuristics, and
online caching policies. Besides FOO, there exist three other theoretical bounds on OPT with
approximation guarantees: OFMA, LocalRatio, and LP (Section 2.3). Besides PFOO-U, we consider
three other practical upper bounds: Belady, Belady-Size, and Freq/Size (Section 2.4). Besides PFOO-L,
there is only one other practical lower bound: a cache with infinite capacity (Infinite-Cap). Finally,
for online policies, we evaluated GDSF [22], GD-Wheel [60], AdaptSize [16], Hyperbolic [17],
and several other older policies which perform much worse on our traces (including LRU-K [71],
TLFU [34], SLRU [48], and LRU).
Our implementations are in C++ and use the COIN-OR::LEMON library [70], GNU parallel [81],
OpenMP [27], and CPLEX 12.6.1.0. OFMA runs in O(N^2), LocalRatio runs in O(N^3), and Belady in
O(N log C). We rely on sampling [74] to run Belady-Size on large traces, which gives us an O(N)
implementation. We will publicly release our policy implementations and evaluation infrastructure
upon publication of this work. To the best of our knowledge, these include the first implementations
of the prior theoretical offline bounds.
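As a reference point for these offline baselines, classic Belady can be implemented with precomputed next-use times: on a miss with a full cache, evict the object requested farthest in the future. The sketch below is our own and assumes unit-size objects (Belady-Size additionally weighs object sizes, and we use sampling for it on large traces).

```python
def belady_misses(trace, capacity):
    """Belady's offline MIN for unit-size objects: on a miss with a full
    cache, evict the cached object whose next use is farthest away."""
    # Precompute, for each request, when the same object is requested next.
    next_use = [float("inf")] * len(trace)
    last = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = last.get(trace[i], float("inf"))
        last[trace[i]] = i

    cache = {}  # object -> position of its next use
    misses = 0
    for i, obj in enumerate(trace):
        if obj in cache:
            cache[obj] = next_use[i]  # hit: update next-use time
        else:
            misses += 1
            if len(cache) >= capacity:
                victim = max(cache, key=cache.get)  # farthest next use
                del cache[victim]
            cache[obj] = next_use[i]
    return misses
```

The `max` scan makes this sketch quadratic in the worst case; a priority queue keyed on next-use times gives the O(N log C) bound stated above.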
7.3 Evaluation metrics
We compare each policy's miss ratio, which ranges from 0 to 1. We present absolute error as unitless
scalars and relative error as percentages. For example, if FOO-U’s miss ratio is 0.20 and FOO-L’s is
0.15, then absolute error is 0.05 and relative error is 33%.
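For concreteness, the numbers in this example work out as follows (a quick check of the two definitions):

```python
# Worked example of the error metrics: FOO-U = 0.20, FOO-L = 0.15.
foo_u, foo_l = 0.20, 0.15
absolute_error = foo_u - foo_l           # 0.05, a unitless scalar
relative_error = absolute_error / foo_l  # ~0.33, reported as 33%
print(f"absolute: {absolute_error:.2f}, relative: {relative_error:.0%}")
```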
We present a miss ratio curve for each trace, which shows the miss ratio achieved by different
policies at different cache capacities. We present these curves in log-linear scale to study a wide
range of cache capacities. Miss ratio curves allow us to compare miss ratios achieved by different
policies at fixed cache capacities, as well as compute cache capacities that achieve equivalent miss
ratios.
8 EVALUATION
We evaluate FOO and PFOO to demonstrate the following: (i) PFOO is fast enough to process real
traces, whereas FOO and prior theoretical bounds are not; (ii) FOO yields nearly tight bounds on
OPT, even when our proof assumptions do not hold; (iii) PFOO is highly accurate on full production
traces; and (iv) PFOO reveals that there is significantly more room for improving current caching
systems than implied by prior offline bounds.
8.1 PFOO is necessary to process real traces
Figure 13 shows the execution time of FOO, PFOO, and prior theoretical offline bounds at different
trace lengths. Specifically, we run each policy on the first N requests of the CDN 1 trace, and vary
N from a few thousand to over 30 million. Each policy ran alone on a 2016 SuperMicro server with
44 Intel Xeon E5-2699 cores and 500GB of memory.
These results show that LP and LocalRatio are unusable: they can process only a few hundred
thousand requests in a 24-hour period, and their execution time increases rapidly as traces lengthen.
While FOO and OFMA are faster, they both take more than 24 hours to process more than 30
million requests, and their execution times increase super-linearly.
Finally, PFOO is much faster and scales well, allowing us to process traces with hundreds of
millions of requests. PFOO’s lower bound completes in a few minutes, and while PFOO’s upper
bound is slower, it scales linearly with trace length. PFOO is thus the only bound that completes in
reasonable time on real traces.
Fig. 13. Execution time of FOO, PFOO, and prior theoretical offline bounds at different trace lengths. Most
prior bounds are unusable above 500K requests. Only PFOO can process real traces with many millions of
requests.
8.2 FOO is nearly exact on short traces
We compare FOO, PFOO, and prior theoretical upper bounds on the first 10 million requests of
each trace. Of the prior theoretical upper bounds, only OFMA runs in a reasonable time at this
trace length,4 so we compare FOO, PFOO, OFMA, the Belady variants, Freq/Size, and Infinite-Cap.
Our first finding is that FOO-U and FOO-L are nearly identical, as predicted by our analysis.
The largest difference between FOO-U’s and FOO-L’s miss ratio on CDN and WebApp traces is
0.0005—a relative error of 0.15%. Even on the storage traces, where requests are highly correlated
and hence our proof assumptions do not hold, the largest difference is 0.0014—a relative error
of 0.27%. Compared to the other offline bounds, FOO is at least an order of magnitude and often
several orders of magnitude more accurate.
Given FOO’s high accuracy, we use FOO to estimate the error of the other offline bounds.
Specifically, we assume that OPT lies in the middle between FOO-U and FOO-L. Since the difference
between FOO-U and FOO-L is so small, this adds negligible error (less than 0.14%) to all other
results.
Figure 14 shows the maximum error from OPT across five cache sizes on our first CDN production
trace. All upper bounds are shown with a bar extending above OPT, and all lower bounds are shown
with a bar extending below OPT. Note that the practical offline upper bounds (e.g., Belady) do not
have corresponding lower bounds. Likewise, there is no upper bound corresponding to Infinite-Cap.
Also note that OFMA’s bars are so large that they extend above and below the figure. We have
therefore annotated each bar with its absolute error from OPT.
The figure shows that FOO-U and FOO-L nearly coincide, with error of 0.00003 (=3e−5) on this
trace. PFOO-U is 0.00005 (=5e−5) above OPT, nearly matching FOO-U, and PFOO-L is 0.007 below
OPT, which is very accurate though worse than FOO-L.
All prior techniques yield error several orders of magnitude larger. OFMA has very high error:
its bounds are 0.72 above and 0.39 below OPT. The practical upper bounds are more accurate than
OFMA: Belady is 0.18 above OPT, Belady-Size 0.05, and Freq/Size 0.05. Finally, Infinite-Cap is 0.19
below OPT. Prior to FOO and PFOO, the best bounds for OPT give a broad range of up to 0.24.
PFOO and FOO reduce error by 34× and 4000×, respectively.
4 While we tried downsampling the traces to run LP and LocalRatio (as suggested in [11, 57, 84] for objects with equal
sizes), we were unable to achieve meaningful results. Under variable object sizes, scaling down the system (including the
cache capacity) makes large objects disproportionately disruptive.
Fig. 14. Comparison of the maximum approximation error of FOO, PFOO, and prior offline upper and lower
bounds across five cache sizes on a CDN production trace. FOO's upper and lower bounds are nearly identical,
with a gap of less than 0.0005. We therefore assume that OPT is halfway between FOO-U and FOO-L, which
introduces negligible error due to FOO's high accuracy. PFOO likewise introduces small error, with PFOO-U
having a similar accuracy to FOO-U. PFOO-L leads to higher errors than FOO-L but is still highly accurate,
within 0.007. In contrast, all prior policies lead to errors several orders of magnitude larger.
Figures 15 and 16 show the approximation error on all eight production traces. The prior upper
bounds are much worse than PFOO-U, except on one trace (Storage 1), where Belady-Size and
Freq/Size are somewhat accurate. Averaging across all traces, PFOO-U is 0.0014 above OPT. PFOO-U
reduces mean error by 37× over Belady-Size, the best prior upper bound. Prior work gives even
weaker lower bounds. PFOO-L is on average 0.004 below OPT on the CDN traces, 0.02 below OPT
on the WebApp traces, and 0.04 below OPT on the storage traces. PFOO-L reduces mean error by
9.8× over Infinite-Cap and 27× over OFMA. Hence, across a wide range of workloads, PFOO is by
far the best practical bound on OPT.
8.3 PFOO is accurate on real traces
Now that we have seen that FOO is accurate on short traces, we next show that PFOO is accurate
on long traces. Figure 17 shows the miss ratio over the full traces achieved by PFOO, the best
prior practical upper bounds (the best of Belady, Belady-Size, and Freq/Size), the Infinite-Cap lower
bound, and the best online policy (see Section 7).
On average, PFOO-U and PFOO-L bound the optimal miss ratio within a narrow range of 4.2%.
PFOO’s bounds are tighter on the CDN and WebApp traces than the storage traces: PFOO gives an
average bound of just 1.4% on CDN 1-3 and 1.3% on WebApp 1-2 but 5.7% on Storage 1-3. This is
likely due to error in PFOO-L when requests are highly correlated, as they are in the storage traces.
Nevertheless, PFOO gives much tighter bounds than prior techniques on every trace. The prior
offline upper bounds are noticeably higher than PFOO-U. On average, compared to PFOO-U,
Belady-Size is 19% higher, Freq/Size is 22% higher, and Belady is fully 72% higher. These prior upper
bounds are not, therefore, good proxies for the offline optimal. Moreover, the best upper bound
varies across traces: Freq/Size is lower on CDN 1 and CDN 3, but Belady-Size is lower on the others.
Unmodified Belady gives a very poor upper bound, showing that caching policies must account for
object size. The only lower bound in prior work is an infinitely large cache, whose miss ratio is
much lower than PFOO-L. PFOO thus gives the first reasonably tight bounds on the offline miss ratio
for real traces.
Fig. 15. Approximation error of FOO, PFOO, and several prior offline bounds on four of our eight production
traces (CDN 1, CDN 2, CDN 3, and WebApp 1; Figure 16 shows the other four). FOO and PFOO's lower and
upper bounds are orders of magnitude better than any other offline bound. (See Figure 14.)
Fig. 16. Approximation error of FOO, PFOO, and several prior offline bounds on four of our eight production
traces (WebApp 2, Storage 1, Storage 2, and Storage 3; Figure 15 shows the other four). FOO and PFOO's lower
and upper bounds are orders of magnitude better than any other offline bound. (See Figure 14.)
8.4 PFOO shows that there is significant room for improvement in online policies
Finally, we compare with online caching policies. Figure 17 shows the best online policy (by average
miss ratio) and the offline bounds for each trace. We also show LRU for reference on all traces.
On all traces at most cache capacities, there is a large gap between the best online policy and
PFOO-U, showing that there remains significant room for improvement in online caching policies.
Fig. 17. Miss ratio curves for PFOO vs. LRU, Infinite-Cap, the best prior offline upper bound, and the best
online policy for our eight production traces: (a) CDN 1 (GDWheel, Freq/Size), (b) CDN 2 (GDSF, Belady-Size),
(c) CDN 3 (GDSF, Freq/Size), (d) WebApp 1 (GDWheel, Belady-Size), (e) WebApp 2 (GDWheel, Belady-Size),
(f) Storage 1 (Hyperbolic, Belady-Size), (g) Storage 2 (AdaptSize, Belady-Size), and (h) Storage 3 (GDWheel,
Belady-Size); the parenthesized policies are, respectively, the best online policy and the best prior offline
bound shown in each panel.
Moreover, this gap is much larger than prior offline bounds would suggest. On average, PFOO-U
achieves 27% fewer misses than the best online policy, whereas the best prior offline policy achieves
only 7.2% fewer misses; the miss ratio gap between online policies and offline optimal is thus 3.75×
as large as implied by prior bounds. The storage traces are the only ones where PFOO does not
consistently increase this gap vs. prior offline bounds, but even on these traces there is a large
difference at some sizes (e.g., at 64 GB in Figure 17g). On CDN and WebApp traces, the gap is much
larger.
For example, on CDN 2, GDSF (the best online policy) matches Belady-Size (the best prior offline
upper bound) at most cache capacities. One would therefore conclude that existing online policies
are nearly optimal, but PFOO-U reveals that there is in fact a large gap between GDSF and OPT on
this trace, as it is 21% lower on average (refer back to Figure 1).
These miss ratio reductions make a large difference in real systems. For example, on CDN 2,
CDN 3, and WebApp 1, OPT requires just 16GB to match the miss ratio of the best prior offline
bound at 64GB (the x-axis in these figures is shown in log-scale). Prior bounds thus suggest that
online policies require 4× as much cache space as is necessary.
9 CONCLUSION AND FUTURE WORK
We began this paper by asking: Should the systems community continue trying to improve miss ratios,
or have all achievable gains been exhausted? We have answered this question by developing new
techniques, FOO and PFOO, to accurately and quickly estimate OPT with variable object sizes. Our
techniques reveal that prior bounds for OPT lead to qualitatively wrong conclusions about the
potential for improving current caching systems. Prior bounds indicate that current systems are
nearly optimal, whereas PFOO reveals that misses can be reduced by up to 43%.
Moreover, since FOO and PFOO yield constructive solutions, they offer a new path to improve
caching systems. While it is outside the scope of this work, we plan to investigate adaptive caching
systems that tune their parameters online by learning from PFOO, e.g., by recording a short window
of past requests and running PFOO to estimate OPT’s behavior in real time. For example, AdaptSize,
a caching system from 2017, assumes that admission probability should decay exponentially with
object size. This is unjustified, and we can learn an optimized admission function from PFOO’s
decisions.
This paper gives the first principled way to evaluate caching policies with variable object sizes:
FOO gives the first asymptotically exact, polynomial-time bounds on OPT, and PFOO gives the
first practical and accurate bounds for long traces. Furthermore, our results are verified on eight
production traces from several large Internet companies, including CDN, web application, and
storage workloads, where FOO reduces approximation error by 4000×. We anticipate that FOO and
PFOO will prove important tools in the design of future caching systems.
10 ACKNOWLEDGMENTS
We thank Jens Schmitt, our shepherd Niklas Carlsson, and our anonymous reviewers for their
valuable feedback.
Daniel S. Berger was supported by the Mark Stehlik Postdoctoral Teaching Fellowship of the
School of Computer Science at Carnegie Mellon University. Nathan Beckmann was supported by a
Google Faculty Research Award. Mor Harchol-Balter was supported by NSF-CMMI-1538204 and
NSF-XPS-1629444.
REFERENCES[1] Marc Abrams, C. R. Standridge, Ghaleb Abdulla, S. Williams, and Edward A. Fox. 1995. Caching Proxies: Limitations
and Potentials. Technical Report. Virginia Polytechnic Institute & State University Blacksburgh, VA.
Proc. ACM Meas. Anal. Comput. Syst., Vol. 2, No. 2, Article 32. Publication date: June 2018.
Practical Bounds on Optimal Caching with Variable Object Sizes 32:29
[2] Alfred V Aho, Peter J Denning, and Jeffrey D Ullman. 1971. Principles of optimal page replacement. J. ACM 18, 1
(1971), 80–93.
[3] Ravindra K Ahuja, Thomas L Magnanti, and James B Orlin. 1993. Network flows: theory, algorithms, and applications.Prentice hall.
[4] Susanne Albers, Sanjeev Arora, and Sanjeev Khanna. 1999. Page replacement for general caching problems. In SODA.31–40.
[5] Emmanuelle Anceaume, Yann Busnel, and Bruno Sericola. 2015. New results on a generalized coupon collector problem
using Markov chains. Journal of Applied Probability 52, 2 (2015), 405–418.
[6] Omri Bahat and Armand M Makowski. 2003. Optimal replacement policies for nonuniform cache objects with optional
eviction. In IEEE INFOCOM. 427–437.
[7] Amotz Bar-Noy, Reuven Bar-Yehuda, Ari Freund, Joseph Naor, and Baruch Schieber. 2001. A unified approach to
approximating resource allocation and scheduling. J. ACM 48, 5 (2001), 1069–1090.
[8] Ruben Becker, Maximilian Fickert, and Andreas Karrenbauer. [n. d.]. A Novel Dual Ascent Algorithm for Solving theMin-Cost Flow Problem. Chapter 12, 151–159.
[9] Ruben Becker and Andreas Karrenbauer. 2013. A Combinatorial O(m 3/2)-time Algorithm for the Min-Cost Flow
Problem. arXiv preprint arXiv:1312.3905 (2013).
[10] Nathan Beckmann, Haoxian Chen, and Asaf Cidon. 2018. LHD: Improving Hit Rate by Maximizing Hit Density. In USENIX NSDI. 1–14.
[11] Nathan Beckmann and Daniel Sanchez. 2015. Talus: A Simple Way to Remove Cliffs in Cache Performance. In IEEE HPCA. 64–75.
[12] Nathan Beckmann and Daniel Sanchez. 2016. Modeling cache behavior beyond LRU. In IEEE HPCA. 225–236.
[13] L. A. Belady. 1966. A study of replacement algorithms for a virtual-storage computer. IBM Systems Journal 5 (1966), 78–101.
[14] Daniel S. Berger, Philipp Gland, Sahil Singla, and Florin Ciucu. 2014. Exact analysis of TTL cache networks. Perform. Eval. 79 (2014), 2–23. Special Issue: Performance 2014.
[15] Daniel S. Berger, Sebastian Henningsen, Florin Ciucu, and Jens B. Schmitt. 2015. Maximizing cache hit ratios by variance reduction. ACM SIGMETRICS Performance Evaluation Review 43, 2 (2015), 57–59.
[16] Daniel S. Berger, Ramesh Sitaraman, and Mor Harchol-Balter. 2017. AdaptSize: Orchestrating the Hot Object Memory Cache in a CDN. In USENIX NSDI. 483–498.
[17] Aaron Blankstein, Siddhartha Sen, and Michael J. Freedman. 2017. Hyperbolic Caching: Flexible Caching for Web Applications. In USENIX ATC. 499–511.
[18] Béla Bollobás and Graham Brightwell. 1992. The height of a random partial order: concentration of measure. The Annals of Applied Probability (1992), 1009–1018.
[19] Lee Breslau, Pei Cao, Li Fan, Graham Phillips, and Scott Shenker. 1999. Web caching and Zipf-like distributions: Evidence and implications. In IEEE INFOCOM. 126–134.
[20] P. J. Burville and J. F. C. Kingman. 1973. On a model for storage and search. Journal of Applied Probability (1973), 697–701.
[21] Hao Che, Ye Tung, and Zhijun Wang. 2002. Hierarchical web caching systems: Modeling, design and experimental results. IEEE JSAC 20 (2002), 1305–1314.
[22] Ludmila Cherkasova. 1998. Improving WWW proxies performance with greedy-dual-size-frequency caching policy. Technical Report. Hewlett-Packard Laboratories.
[23] Marek Chrobak, Gerhard J. Woeginger, Kazuhisa Makino, and Haifeng Xu. 2012. Caching is hard—even in the fault model. Algorithmica 63 (2012), 781–794.
[24] Asaf Cidon, Assaf Eisenman, Mohammad Alizadeh, and Sachin Katti. 2016. Cliffhanger: Scaling Performance Cliffs in Web Memory Caches. In USENIX NSDI. 379–392.
[25] Edward Grady Coffman and Peter J. Denning. 1973. Operating Systems Theory. Prentice-Hall.
[26] Edward Grady Coffman and Predrag Jelenković. 1999. Performance of the move-to-front algorithm with Markov-modulated request sequences. Operations Research Letters 25 (1999), 109–118.
[27] Leonardo Dagum and Ramesh Menon. 1998. OpenMP: an industry standard API for shared-memory programming. IEEE Computational Science and Engineering 5, 1 (1998), 46–55.
[28] Samuel I. Daitch and Daniel A. Spielman. 2008. Faster approximate lossy generalized flow via interior point algorithms. In ACM STOC. 451–460.
[29] Asit Dan and Don Towsley. 1990. An Approximate Analysis of the LRU and FIFO Buffer Replacement Schemes. In ACM SIGMETRICS. 143–152.
[30] Wayne W. Daniel. 1978. Applied Nonparametric Statistics. Houghton Mifflin.
[31] Robert P. Dobrow and James Allen Fill. 1995. The move-to-front rule for self-organizing lists with Markov dependent requests. In Discrete Probability and Algorithms. Springer, 57–80.
Proc. ACM Meas. Anal. Comput. Syst., Vol. 2, No. 2, Article 32. Publication date: June 2018.
32:30 Daniel S. Berger, Nathan Beckmann, and Mor Harchol-Balter
[32] Aristides V. Doumas and Vassilis G. Papanicolaou. 2012. The coupon collector's problem revisited: asymptotics of the variance. Advances in Applied Probability 44, 1 (2012), 166–195.
[33] Ding-Zhu Du and Panos M. Pardalos. 2013. Handbook of Combinatorial Optimization (2nd ed.). Springer.
[34] Gil Einziger and Roy Friedman. 2014. TinyLFU: A highly efficient cache admission policy. In IEEE Euromicro PDP. 146–153.
[35] William Feller. 2008. An Introduction to Probability Theory and Its Applications. Vol. 2. John Wiley & Sons.
[36] Andrés Ferragut, Ismael Rodríguez, and Fernando Paganini. 2016. Optimizing TTL caches under heavy-tailed demands. In ACM SIGMETRICS. 101–112.
[37] James Allen Fill and Lars Holst. 1996. On the distribution of search cost for the move-to-front rule. Random Structures & Algorithms 8 (1996), 179–186.
[38] Philippe Flajolet, Daniele Gardy, and Loÿs Thimonier. 1992. Birthday paradox, coupon collectors, caching algorithms and self-organizing search. Discrete Applied Mathematics 39 (1992), 207–229.
[39] Lukáš Folwarczny and Jiří Sgall. 2017. General caching is hard: Even with small pages. Algorithmica 79, 2 (2017), 319–339.
[40] Christine Fricker, Philippe Robert, and James Roberts. 2012. A versatile and accurate approximation for LRU cache performance. In ITC. 8–16.
[41] Massimo Gallo, Bruno Kauffmann, Luca Muscariello, Alain Simonian, and Christian Tanguy. 2012. Performance evaluation of the random replacement policy for networks of caches. In ACM SIGMETRICS/PERFORMANCE. 395–396.
[42] Nicolas Gast and Benny Van Houdt. 2015. Transient and steady-state regime of a family of list-based cache replacement algorithms. In ACM SIGMETRICS. 123–136.
[43] Erol Gelenbe. 1973. A unified approach to the evaluation of a class of replacement algorithms. IEEE Trans. Comput. 100 (1973), 611–618.
[44] Binny S. Gill. 2008. On multi-level exclusive caching: offline optimality and why promotions are better than demotions. In USENIX FAST. 4–18.
[45] W. J. Hendricks. 1972. The stationary distribution of an interesting Markov chain. Journal of Applied Probability (1972), 231–233.
[46] Peter Hillmann, Tobias Uhlig, Gabi Dreo Rodosek, and Oliver Rose. 2016. Simulation and optimization of content delivery networks considering user profiles and preferences of internet service providers. In IEEE Winter Simulation Conference. 3143–3154.
[47] Wassily Hoeffding. 1948. A non-parametric test of independence. The Annals of Mathematical Statistics (1948), 546–557.
[48] Qi Huang, Ken Birman, Robbert van Renesse, Wyatt Lloyd, Sanjeev Kumar, and Harry C. Li. 2013. An analysis of Facebook photo caching. In ACM SOSP. 167–181.
[49] Stratis Ioannidis and Edmund Yeh. 2016. Adaptive caching networks with optimality guarantees. In ACM SIGMETRICS. 113–124.
[50] Sandy Irani. 1997. Page replacement with multi-size pages and applications to web caching. In ACM STOC. 701–710.
[51] Akanksha Jain and Calvin Lin. 2016. Back to the future: leveraging Belady's algorithm for improved cache replacement. In ACM/IEEE ISCA. 78–89.
[52] Predrag R. Jelenković. 1999. Asymptotic approximation of the move-to-front search cost distribution and least-recently used caching fault probabilities. The Annals of Applied Probability 9 (1999), 430–464.
[53] Predrag R. Jelenković and Ana Radovanović. 2004. Least-recently-used caching with dependent requests. Theoretical Computer Science 326 (2004), 293–327.
[54] Predrag R. Jelenković and Ana Radovanović. 2004. Optimizing LRU caching for variable document sizes. Combinatorics, Probability and Computing 13 (2004), 627–643.
[55] Mahesh Kallahalla and Peter J. Varman. 2002. PC-OPT: optimal offline prefetching and caching for parallel I/O systems. IEEE Trans. Comput. 51, 11 (2002), 1333–1344.
[56] Brian Karrer and Mark E. J. Newman. 2009. Random graph models for directed acyclic networks. Physical Review E 80, 4 (2009), 046110.
[57] Richard E. Kessler, Mark D. Hill, and David A. Wood. 1994. A comparison of trace-sampling techniques for multi-megabyte caches. IEEE Trans. Comput. 43, 6 (1994), 664–675.
[58] W. Frank King. 1971. Analysis of Demand Paging Algorithms. In IFIP Congress (1). 485–490.
[59] Christos Koufogiannakis and Neal E. Young. 2014. A nearly linear-time PTAS for explicit fractional packing and covering linear programs. Algorithmica 70, 4 (2014), 648–674.
[60] Conglong Li and Alan L. Cox. 2015. GD-Wheel: a cost-aware replacement policy for key-value stores. In EUROSYS. 1–15.
[61] Suoheng Li, Jie Xu, Mihaela van der Schaar, and Weiping Li. 2016. Popularity-driven content caching. In IEEE INFOCOM. 1–9.
[62] Bruce M. Maggs and Ramesh K. Sitaraman. 2015. Algorithmic nuggets in content delivery. ACM SIGCOMM CCR 45 (2015), 52–66.
[63] Valentina Martina, Michele Garetto, and Emilio Leonardi. 2014. A unified approach to the performance analysis of caching systems. In IEEE INFOCOM. 12–24.
[64] Richard L. Mattson, Jan Gecsei, Donald R. Slutz, and Irving L. Traiger. 1970. Evaluation techniques for storage hierarchies. IBM Systems Journal 9 (1970), 78–117.
[65] John McCabe. 1965. On serial files with relocatable records. Operations Research 13 (1965), 609–618.
[66] Pierre Michaud. 2016. Some mathematical facts about optimal cache replacement. ACM Transactions on Architecture and Code Optimization (TACO) 13, 4 (2016), 50.
[67] John Moriarty and Peter Neal. 2008. The Generalized Coupon Collector Problem. Journal of Applied Probability 45 (2008), 621–629.
[68] Giovanni Neglia, Damiano Carra, and Pietro Michiardi. 2018. Cache policies for linear utility maximization. IEEE/ACM Transactions on Networking 26, 1 (2018), 302–313.
[69] Rajesh Nishtala, Hans Fugal, Steven Grimm, Marc Kwiatkowski, Herman Lee, Harry C. Li, Ryan McElroy, Mike Paleczny, Daniel Peek, Paul Saab, et al. 2013. Scaling memcache at Facebook. In USENIX NSDI. 385–398.
[70] Egerváry Research Group on Combinatorial Optimization. 2015. COIN-OR::LEMON Library. Available at http://lemon.cs.elte.hu/trac/lemon, accessed 10/21/17.
[71] Elizabeth J. O'Neil, Patrick E. O'Neil, and Gerhard Weikum. 1993. The LRU-K page replacement algorithm for database disk buffering. ACM SIGMOD 22, 2 (1993), 297–306.
[72] Elizabeth J. O'Neil, Patrick E. O'Neil, and Gerhard Weikum. 1999. An optimality proof of the LRU-K page replacement algorithm. JACM 46 (1999), 92–112.
[73] Antonis Panagakis, Athanasios Vaios, and Ioannis Stavrakakis. 2008. Approximate analysis of LRU in the case of short term correlations. Computer Networks 52 (2008), 1142–1152.
[74] Konstantinos Psounis and Balaji Prabhakar. 2001. A randomized web-cache replacement scheme. In IEEE INFOCOM, Vol. 3. 1407–1415.
[75] Konstantinos Psounis, An Zhu, Balaji Prabhakar, and Rajeev Motwani. 2004. Modeling correlations in web traces and implications for designing replacement policies. Computer Networks 45 (2004), 379–398.
[76] Eliane R. Rodrigues. 1995. The performance of the move-to-front scheme under some particular forms of Markov requests. Journal of Applied Probability 32, 4 (1995), 1089–1102.
[77] Samta Shukla and Alhussein A. Abouzeid. 2017. Optimal Device-Aware Caching. IEEE Transactions on Mobile Computing 16, 7 (2017), 1994–2007.
[78] Ramesh K. Sitaraman, Mangesh Kasbekar, Woody Lichtenstein, and Manish Jain. 2014. Overlay networks: An Akamai perspective. In Advanced Content Delivery, Streaming, and Cloud Services. John Wiley & Sons.
[79] Storage Networking Industry Association (SNIA). 2008. MSR Cambridge Traces. http://iotta.snia.org/traces/388.
[80] David Starobinski and David Tse. 2001. Probabilistic methods for web caching. Perform. Eval. 46 (2001), 125–137.
[81] O. Tange. 2011. GNU Parallel - The Command-Line Power Tool. ;login: The USENIX Magazine 36, 1 (Feb 2011), 42–47.
[82] Naoki Tsukada, Ryo Hirade, and Naoto Miyoshi. 2012. Fluid limit analysis of FIFO and RR caching for independent reference model. Perform. Eval. 69 (Sept. 2012), 403–412.
[83] Vijay V. Vazirani. 2013. Approximation Algorithms. Springer Science & Business Media.
[84] Carl Waldspurger, Trausti Saemundsson, Irfan Ahmad, and Nohhyun Park. 2017. Cache Modeling and Optimization using Miniature Simulations. In USENIX ATC. 487–498.
[85] Weiyu Xu and A. Kevin Tang. 2011. A Generalized Coupon Collector Problem. Journal of Applied Probability 48, 4 (2011), 1081–1094.
[86] Neal E. Young. 2000. Online paging against adversarially biased random inputs. Journal of Algorithms 37 (2000), 218–235.
A PROOFS

A.1 Proof of equivalence of interval and classic ILP representations of OPT
Classic representation of OPT. The literature uses an integer linear program (ILP) representation of OPT [4]. Figure 18 shows this classic ILP representation on the example trace from Section 3. The ILP uses decision variables x_{i,k} to track, at each time i, whether object k is cached or not. The constraint on the cache capacity is naturally represented: the sum of the sizes of all cached objects must not exceed the cache capacity at every time i. Additional constraints enforce that OPT is not allowed to prefetch objects (decision variables must not increase if the corresponding object is not requested) and that the cache starts empty.
Object requested:   a  b  c  b  d  a  c  d  a  b  a  ...
Decision variables: x_{i,a}, x_{i,b}, x_{i,c}, x_{i,d} for each time i = 1, ..., 11 (one row per object, one column per request).
Fig. 18. Classic ILP representation of OPT.
Lemma 2. Under Assumption 1, our ILP in Definition 1 is equivalent to the classic ILP from [4].

Proof sketch. Under Assumption 1, OPT changes the caching decision of object k only at times i when σ_i = k. To see why this is true, consider changing a decision variable x_{k,j} at a time j with i < j < ℓ_i. If x_{k,i} = 0, then OPT cannot set x_{k,j} = 1 because this would violate Assumption 1 (it would prefetch object k without a request). Conversely, if x_{k,j} = 0, then setting x_{k,i} = 1 does not yield any fewer misses, so we can safely assume that x_{k,i} = 0. Hence, decisions do not change within an interval in the classic ILP formulation.

To obtain the decision variables x′ of the classic ILP formulation of OPT from a given solution x_i of the interval ILP, set x′_{σ_i, j} = x_i for all i ≤ j < ℓ_i, and for all i. This leads to an equivalent solution because the capacity constraint is enforced at every time step.
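As an illustration of this translation (not the paper's FOO algorithm), the sketch below expands interval decisions x_i into the classic per-time variables x′_{k,j} and checks the capacity constraint on the example trace of Figure 18. The object sizes, the capacity, and the greedy repair loop are made-up for the example; they only demonstrate that any feasible interval solution expands into a feasible classic solution.

```python
# Toy sketch of Lemma 2's translation: interval decisions x[i] (one per
# request i that has a next occurrence l_i) expand into per-time
# variables xp[j][k]. Sizes and capacity are made-up example values.
trace = ['a', 'b', 'c', 'b', 'd', 'a', 'c', 'd', 'a', 'b', 'a']
size = {'a': 1, 'b': 2, 'c': 1, 'd': 3}
capacity = 4

# next occurrence l_i of each request i (absent if not requested again)
nxt, last = {}, {}
for i, k in enumerate(trace):
    if k in last:
        nxt[last[k]] = i
    last[k] = i

def expand(x):
    """Translate interval decisions into classic variables xp[j][k]."""
    xp = [{k: 0 for k in size} for _ in trace]
    for i, l in nxt.items():
        if x[i]:
            for j in range(i, l):
                xp[j][trace[i]] = 1
    return xp

# start with every interval cached, then greedily drop intervals until
# the capacity constraint holds at every time step
x = {i: 1 for i in nxt}
changed = True
while changed:
    changed = False
    xp = expand(x)
    for j, row in enumerate(xp):
        if sum(size[k] * v for k, v in row.items()) > capacity:
            # drop the largest cached interval covering time j
            i = max((i for i in nxt if x[i] and i <= j < nxt[i]),
                    key=lambda i: size[trace[i]])
            x[i] = 0
            changed = True
            break

xp = expand(x)
assert all(sum(size[k] * v for k, v in row.items()) <= capacity
           for row in xp)
hits = sum(x.values())
print("cached intervals (hits):", hits)
```

Running the sketch yields a feasible classic solution; the capacity check mirrors the "enforced at every time step" argument in the proof.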
A.2 Proof of Lemma 1

This section proves Lemma 1 from Section 5.3, which bounds T_B, i.e., the first time after i at which all intervals in B are completed.
Recall the definitions of the generalized CCP, where coupon probabilities follow a distribution p, and of the classical CCP, where all coupons have equal request probability.
We will make use of the following result, which immediately follows from [5, Theorem 4, p. 415].
Theorem 6. For b ≥ 0 coupons, any probability vector p, and the equal-probability vector q = (1/b, ..., 1/b), it holds that P[T_{b,p} < l] ≤ P[T_{b,q} < l] for any l ≥ 0.
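The intuition behind Theorem 6, that equal-probability coupons are collected stochastically fastest, can be illustrated with a small Monte Carlo experiment (an aside, not part of the proof). The skewed distribution p below is an arbitrary example:

```python
import random

# Monte Carlo illustration of Theorem 6: with the same number of coupons
# b, the uniform vector q finishes collecting no later (stochastically)
# than a skewed vector p. Distributions and trial count are made up.
random.seed(42)

def collect_time(probs):
    """Draw coupons i.i.d. from `probs` until all types are seen."""
    b = len(probs)
    seen, t = set(), 0
    while len(seen) < b:
        t += 1
        seen.add(random.choices(range(b), weights=probs)[0])
    return t

b = 5
p = [0.6, 0.1, 0.1, 0.1, 0.1]   # skewed popularities (example)
q = [1.0 / b] * b               # equal popularities
trials = 2000
mean_p = sum(collect_time(p) for _ in range(trials)) / trials
mean_q = sum(collect_time(q) for _ in range(trials)) / trials
print(f"mean T_p = {mean_p:.1f}, mean T_q = {mean_q:.1f}")
assert mean_q < mean_p
```

The gap is large here because the four rare coupons dominate the skewed collection time.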
We will use this result to prove a bound on T_B, which is defined as T_B = max_{j ∈ B} ℓ_j − i. The proof bounds T_B first using a generalized CCP and then a classical CCP.
Proof of Lemma 1. We first bound T_B via a generalized CCP with stopping time T_{b,p} and p = (p_1, ..., p_b), where

p_i = P[object k is requested | k ∈ B] under P_M.   (26)

As T_B also counts requests to objects j ∉ B, it always holds that T_B ≥ T_{b,p}. Figure 19 shows such a case, where T_B is extended because of requests to uncached objects and large cached objects. T_{b,p}, on the other hand, does not include these other requests, and is thus never longer than T_B. This inequality bounds the probabilities for all l ≥ 0:

P[T_B < l] ≤ P[T_{b,p} < l].   (27)

We then apply Theorem 6:

≤ P[T_{b,q} < l].   (28)
[Figure 19: request intervals x_i, x_j, x_ℓ, x_m, x_n over time; the first requests to the objects in B are marked as coupons 1–4, defining the CCP stopping time T_{b,p}.]

Fig. 19. Translation of T_B, the time until all B objects are requested once, into a coupon-collector problem (CCP) with stopping time T_{b,p}. As the CCP is based on fewer coupons (only objects in B), the CCP serves as a lower bound on T_B.
A.3 Proof of Theorem 5

This section proves Theorem 5 from Section 5.4. Recall the definition of large and unpopular objects (Definition 5). At a high level, the proof shows that the remaining, "typical" objects are almost always part of a precedence relation. As a result, the probability of non-integer decision variables for typical objects vanishes as the number of objects M grows large.
Before we state the proof of Theorem 5, we introduce two auxiliary results, which are proven in Appendices A.4 and A.5.
The first auxiliary result shows that the number of cached objects h_i goes to infinity as the cache capacity C and the number of objects M go to infinity.

Lemma 3 (Proof in Section A.4). For i > N∗ from Definition 5, P[h_i → ∞] = 1 as M → ∞.
The second auxiliary result derives an exponential bound on the lower tail of the distribution of the coupon collector stopping time as it gets further below its mean (roughly b log b).

Lemma 4 (Proof in Section A.5). The time T_{b,q} to collect b > 1 coupons, which have equal probabilities q = (1/b, ..., 1/b), satisfies

P[T_{b,q} ≤ b log b − c b] < e^{−c}   for all c > 0.   (29)
With these results in place, we are ready to prove Theorem 5. The proof proceeds using elementary probability theory and exploits our previous definitions of L_i, T_{B_i}, and T_{b_i,q}.
Proof of Theorem 5. We know from Theorem 4 that the probability of non-integer decision variables can be upper bounded using the random variables L_i and T_{B_i}:

P[0 < x_i < 1] ≤ P[L_i > T_{B_i}].   (30)

We expand this expression by conditioning on L_i = l:

= ∑_{l=1}^{∞} P[T_{B_i} < l | L_i = l] · P[L_i = l].   (31)

We observe that P[T_{B_i} < l] = 0 for l ≤ b_i because requesting b_i distinct objects takes at least b_i time steps:

= ∑_{l=b_i+1}^{∞} P[T_{B_i} < l | L_i = l] · P[L_i = l].   (32)
Since L_i and T_{B_i} are stochastically independent, we can drop the conditioning:

= ∑_{l=b_i+1}^{∞} P[T_{B_i} < l] · P[L_i = l].   (33)

We split this sum into two parts, l ≤ Λ and l > Λ, where Λ = (1/2) b_i log b_i is chosen such that Λ scales slower than the expectation of the underlying coupon collector problem with b_i = δ h_i coupons. (Recall that δ = |B_i|/|H_i| is the largest fraction of objects in H_i, defined in Definition 5.)

≤ ∑_{l=b_i}^{Λ} P[T_{B_i} < l] · P[L_i = l]   (34)

  + ∑_{l=Λ+1}^{∞} P[T_{B_i} < l] · P[L_i = l]   (35)

≤ ∑_{l=b_i}^{Λ} P[T_{B_i} < l] + ∑_{l=Λ+1}^{∞} P[L_i = l].   (36)
We now bound the two terms in Eq. (36) separately. For the first term, we start by applying Lemma 1:

∑_{l=b_i}^{Λ} P[T_{B_i} < l] ≤ ∑_{l=b_i}^{Λ} P[T_{b_i,q} ≤ l].   (37)

We rearrange the sum, substituting l = b_i log b_i − c b_i (i.e., replacing l by c):

= ∑_{c=(1/2) log b_i}^{log b_i − 1} P[T_{b_i,q} ≤ b_i log b_i − c b_i].   (38)

We apply Lemma 4:

< ∑_{c=(1/2) log b_i}^{log b_i − 1} e^{−c}.   (39)

We bound the finite exponential sum by the full geometric series and solve it:

≤ (e²/(e² − e)) · (1/√b_i).   (40)
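As an illustrative aside (under the reading that c steps over integers), the geometric series bounding Eq. (40) can be checked numerically; note that e²/(e² − e) = e/(e − 1):

```python
import math

# Numerical check of the geometric series behind Eq. (40): summing
# e^{-c} over integer steps starting at c = (1/2) log b_i equals
# e^{-(1/2) log b_i} * e/(e-1) = (e^2/(e^2-e)) / sqrt(b_i).
# b_i = 100 is an arbitrary example value.
b_i = 100
c0 = 0.5 * math.log(b_i)
tail = sum(math.exp(-(c0 + n)) for n in range(200))  # geometric tail
bound = (math.e ** 2 / (math.e ** 2 - math.e)) / math.sqrt(b_i)
assert abs(tail - bound) < 1e-12
```

The finite sum in Eq. (39) stops at c = log b_i − 1, so it is upper bounded by this full tail.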
For the second term in Eq. (36), we use the fact that L_i is Geometric(ρ_{σ_i})-distributed due to Assumption 2:

∑_{l=Λ+1}^{∞} P[L_i = l] = ∑_{l=Λ+1}^{∞} (1 − ρ_{σ_i})^{l−1} ρ_{σ_i}.   (41)

Solving the geometric series yields:

= (1 − ρ_{σ_i})^Λ.   (42)
We apply Definition 5, i.e., ρ_{σ_i} ≥ 1/(b_i log log b_i):

≤ (1 − 1/(b_i log log b_i))^{(1/2) b_i log b_i}.   (43)
Finally, combining Eqs. (40) and (43) yields the following:

P[L_i ≥ T_{b_i,q}] < (e²/(e² − e)) · (1/√b_i) + (1 − 1/(b_i log log b_i))^{(1/2) b_i log b_i}.   (44)

As b_i log b_i grows faster than b_i log log b_i, both terms vanish, which proves the statement P[L_i ≥ T_{b_i,q}] → 0 as h_i → ∞ (implying that b_i = δ h_i → ∞) due to Lemma 3.
A.4 Proof of Lemma 3

This section proves that, for any time i > N∗, the number of cached objects h_i grows infinitely large as the number of objects M goes to infinity. Recall that N∗ denotes the time after which the cache needs to evict at least one object. Throughout the proof, let Ē denote the complementary event of an event E.

Intuition of the proof. The proof exploits the fact that at least h∗ = C/max_k s_k distinct objects fit into the cache at any time, and that FOO finds an optimal solution (Theorem 1). Due to Assumption 3, M → ∞ implies that C → ∞, and thus h∗ → ∞. So, the case where FOO caches only a finite number of objects requires that h_i < h∗. Whenever h_i < h∗ occurs, there cannot exist any interval that FOO could put into the cache: if such an interval existed, FOO would cache it, due to its optimality.

Our proof is by induction. We first show that the event of an empty cache has zero probability, and then prove this successively for larger thresholds on the number of cached objects.
Proof of Lemma 3. We assume that 0 < h∗ < M because h∗ ∈ {0, M} leads to trivial hit ratios ∈ {0, 1}.

For arbitrary i > N∗ and any x that is constant in M, we consider the event X = {h_i ≤ x}. Furthermore, let Z = {h_i ≤ z} for any 0 ≤ z ≤ x. We prove that P[X] vanishes as M grows by induction over z and the corresponding P[Z].

Figure 20 sketches the number of cached objects over a time interval including i. Note that, for any z ≤ x, we can take a large enough M such that z < h∗, because h∗ → ∞. So the figure shows h∗ > z. The figure also defines the time interval [u,v], where u is the last time before i when FOO cached h∗ objects, and v is the next time after i when FOO caches h∗ objects. So, for times j ∈ (u,v), it holds that h_j < h∗.

Induction base: z = 0 and event Z = {h_i ≤ 0}. In other words, Z means the cache is empty. Let Γ denote the set of distinct objects requested in the interval (u, i]. Note that the event Z requires that, during (u, i], FOO stopped caching all h∗ objects. Because FOO only changes caching decisions at interval boundaries, Γ must contain at least h∗ objects. Using the same argument, we observe that there are at least h∗ requests to distinct objects in [i,v).

A request to any Γ-object in [i,v) makes Z impossible. Formally, let A denote the event that any object in Γ is requested again in [i,v). We observe that A ⇒ Z̄ because any Γ-object that is requested in [i,v) must be cached by FOO due to FOO-L's optimality (Theorem 1). By inverting the implication, we obtain Z ⇒ Ā and thus P[Z] ≤ P[Ā].
We next upper bound P[Ā]. We start by observing that P[A] is minimized (and thus P[Ā] is maximized) if all objects are requested with equal popularities. This follows because, if popularities are not equal, popular objects are more likely to be in Γ than unpopular objects due to the popular objects' higher sampling probability (similar to the inspection paradox). When Γ contains more popular objects, it is more likely that we repeat a request in [i,v), and thus P[A] increases (P[Ā] decreases).

[Figure 20: the number of cached objects over time, dropping from h∗ at time u to below z around time i, and returning to h∗ at time v; a repeated request to a Γ-object is marked in [i,v).]

Fig. 20. Sketch of the event Z = {h_i ≤ z}, which happens with vanishing probability if z is a constant with respect to M. The times u and v denote the beginning and the end of the current period during which the number of cached objects is less than h∗ = C/max_k s_k, the fewest number of objects that fit in the cache. We define the set Γ of objects that are requested in (u, i]. If any object in Γ is requested in [i,v), then FOO must cache this object; in that case, h_i > z and thus Z cannot happen.
We upper bound P[Ā] by assuming that objects are requested with equal probability ρ_k = 1/M for 1 ≤ k ≤ M. As the number of Γ-objects is at least h∗, the probability of requesting any Γ-object is at least h∗/M. Further, we know that v − i ≥ h∗, and so

P[Ā] ≤ (1 − h∗/M)^{h∗}.

We arrive at the following bound:

P[h_i ≤ z] = P[Z] ≤ P[Ā] ≤ (1 − h∗/M)^{h∗} → 0 as M → ∞.   (45)
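The limit in Eq. (45) can be illustrated numerically. The ratio h∗ = M/10 below is a made-up stand-in for C/max_k s_k growing linearly in M under Assumption 3:

```python
# Numerical illustration of Eq. (45): (1 - h*/M)^{h*} vanishes as M
# grows when h* grows linearly with M. The factor 10 is an arbitrary
# example for the ratio M / h*.
vals = []
for M in (100, 1000, 10000):
    h_star = M // 10
    vals.append((1 - h_star / M) ** h_star)
print(vals)
assert vals[0] > vals[1] > vals[2]
assert vals[2] < 1e-40
```

The bound decays like (0.9)^{M/10}, i.e., exponentially in M for this example ratio.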
Induction step: z − 1 → z for z ≤ x. We assume that the probability of caching only z − 1 objects goes to zero as M → ∞. We prove the same statement for z objects.

As in the induction base, let Γ denote the set of distinct objects requested in the interval (u, i], excluding objects in H_i. We observe that |Γ| ≥ h∗ − z, following a similar argument.

We define the event A as above and use the induction assumption. As the probability of caching fewer than z − 1 objects is vanishingly small, it must be that h_i ≥ z. Thus, a request to any Γ-object in [i,v) makes h_i = z impossible. Consequently, Z ⇒ Ā and thus P[Z] ≤ P[Ā].

To bound P[A], we focus on the requests in [i,v) that do not go to H_i-objects. There are at least h∗ − x such requests. P[A] is minimized if all objects, ignoring objects in H_i, are requested with equal popularities. We thus upper bound P[Ā] by assuming that the conditioned requests happen to objects with equal probability ρ_k = 1/(M − z) for 1 ≤ k ≤ M − z. As before, we conclude that the probability of requesting any Γ-object is at least (h∗ − z)/(M − z), and we use the fact that at least h∗ − x of these requests fall in [i,v):

P[h_i ≤ z] = P[Z] ≤ P[Ā]   (46)

≤ (1 − (h∗ − z)/(M − z))^{h∗ − z}.   (47)
We then use that z ≤ x and that x is constant in M:

≤ (1 − (h∗ − x)/M)^{h∗ − x} → 0 as M → ∞.   (48)
In summary, the number of objects cached by FOO at an arbitrary time i remains constant only
with vanishingly small probability. Consequently, this number grows to infinity with probability
one.
A.5 Proof of Lemma 4

Proof of Lemma 4. We consider the time T_{b,q} to collect b > 1 coupons, which have equal probabilities q = (1/b, ..., 1/b). To simplify notation, we set T = T_{b,q} throughout this proof.
We first transform the event using the exponential function; since t ↦ e^{−st} is strictly decreasing for s > 0, the inequality flips:

P[T ≤ b log b − c b] = P[e^{−sT} ≥ e^{−s(b log b − c b)}]   for all s > 0.   (49)

We next apply the Chernoff bound (Markov's inequality applied to e^{−sT}):

P[e^{−sT} ≥ e^{−s(b log b − c b)}] ≤ E[e^{−sT}] · e^{s(b log b − c b)}.   (50)
To derive E[e^{−sT}], we observe that T = ∑_{i=1}^{b} T_i, where T_i is the time between collecting the (i − 1)-th unique coupon and the i-th unique coupon. As all T_i are independent, we obtain a product of Laplace–Stieltjes transforms:

E[e^{−sT}] = ∏_{i=1}^{b} E[e^{−sT_i}].   (51)
We derive the individual transforms; T_i is Geometric(p_i)-distributed:

E[e^{−sT_i}] = ∑_{k=1}^{∞} e^{−s k} p_i (1 − p_i)^{k−1}   (52)

= p_i / (e^s + p_i − 1).   (53)
We plug the coupon probabilities p_i = 1 − (i − 1)/b = (b − i + 1)/b into Eq. (51), and simplify by reversing the product order (substituting j = b − i + 1):

∏_{i=1}^{b} E[e^{−sT_i}] = ∏_{i=1}^{b} [(b − i + 1)/b] / [e^s + (b − i + 1)/b − 1] = ∏_{j=1}^{b} (j/b) / (e^s + j/b − 1).   (54)
Finally, we choose s = 1/b, which yields e^s = e^{1/b} ≥ 1 + 1/b and simplifies the product:

∏_{j=1}^{b} (j/b) / (e^s + j/b − 1) ≤ ∏_{j=1}^{b} (j/b) / (1/b + j/b) = 1/(b + 1).   (55)
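The telescoping product in Eq. (55) can be checked numerically for an example b:

```python
# Check the telescoping product in Eq. (55):
# prod_{j=1}^{b} (j/b) / (1/b + j/b) = prod j/(j+1) = 1/(b+1).
# b = 20 is an arbitrary example value.
b = 20
prod = 1.0
for j in range(1, b + 1):
    prod *= (j / b) / (1 / b + j / b)
print(prod, 1 / (b + 1))
assert abs(prod - 1 / (b + 1)) < 1e-12
```

Each factor simplifies to j/(j + 1), so all intermediate terms cancel.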
This gives the statement of the lemma:

P[T ≤ b log b − c b] ≤ (1/(b + 1)) · e^{(b log b − c b)/b} = (b/(b + 1)) · e^{−c} < e^{−c}.   (56)
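The final simplification in Eq. (56) uses e^{(b log b − c b)/b} = b e^{−c}; it can be checked numerically for example values of b and c:

```python
import math

# Check the last step of Eq. (56): with s = 1/b,
# (1/(b+1)) * e^{(b log b - c b)/b} = (b/(b+1)) * e^{-c} < e^{-c}.
# The b and c values are arbitrary examples.
for b in (2, 10, 1000):
    for c in (0.5, 1.0, 3.0):
        bound = (1 / (b + 1)) * math.exp((b * math.log(b) - c * b) / b)
        assert abs(bound - (b / (b + 1)) * math.exp(-c)) < 1e-12
        assert bound < math.exp(-c)
print("Eq. (56) bound verified for sample b, c")
```

The inequality is strict because b/(b + 1) < 1 for every finite b.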
Received February 2018; revised April 2018; accepted June 2018