Practical Bounds on Optimal Caching with Variable Object Sizes
DANIEL S. BERGER, Carnegie Mellon University
NATHAN BECKMANN, Carnegie Mellon University
MOR HARCHOL-BALTER, Carnegie Mellon University
Many recent caching systems aim to improve miss ratios, but there is no good sense among practitioners
of how much further miss ratios can be improved. In other words, should the systems community continue
working on this problem?
Currently, there is no principled answer to this question. In practice, object sizes often vary by several orders of magnitude, in which case computing the optimal miss ratio (OPT) is known to be NP-hard. The few known
results on caching with variable object sizes provide very weak bounds and are impractical to compute on
traces of realistic length.
We propose a new method to compute upper and lower bounds on OPT. Our key insight is to represent
caching as a min-cost flow problem, hence we call our method the flow-based offline optimal (FOO). We prove
that, under simple independence assumptions, FOO’s bounds become tight as the number of objects goes to
infinity. Indeed, FOO’s error over 10M requests of production CDN and storage traces is negligible: at most
0.3%. FOO thus reveals, for the first time, the limits of caching with variable object sizes.
While FOO is very accurate, it is computationally impractical on traces with hundreds of millions of requests.
We therefore extend FOO to obtain more efficient bounds on OPT, which we call the practical flow-based offline optimal (PFOO). We evaluate PFOO on several full production traces and use it to compare OPT to prior online
policies. This analysis shows that current caching systems are in fact still far from optimal, suffering 11–43%
more cache misses than OPT, whereas the best prior offline bounds suggest that there is essentially no room
for improvement.
CCS Concepts: • Theory of computation → Caching and paging algorithms; Approximation algorithms analysis; • General and reference → Metrics; Performance;
Additional Key Words and Phrases: Caching, optimal caching, offline optimum, limits of caching
ACM Reference Format:
Daniel S. Berger, Nathan Beckmann, and Mor Harchol-Balter. 2018. Practical Bounds on Optimal Caching
with Variable Object Sizes. Proc. ACM Meas. Anal. Comput. Syst. 2, 2, Article 32 (June 2018), 37 pages. https://doi.org/10.1145/3224427
1 INTRODUCTION
Caches are pervasive in computer systems, and their miss ratio often determines end-to-end
application performance. For example, content distribution networks (CDNs) depend on large,
geo-distributed caching networks to serve user requests from nearby datacenters, and their response
time degrades dramatically when the requested content is not cached nearby. Consequently, there
has been a renewed focus on improving cache miss ratios, particularly for web services and content
Authors’ addresses: Daniel S. Berger, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA, 15213, dsberger@cs.
cmu.edu; Nathan Beckmann, Carnegie Mellon University, [email protected]; Mor Harchol-Balter, Carnegie Mellon
University, [email protected].
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the
full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored.
Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires
prior specific permission and/or a fee. Request permissions from [email protected].
© 2018 Copyright held by the owner/author(s). Publication rights licensed to ACM.
2476-1249/2018/6-ART32 $15.00
https://doi.org/10.1145/3224427
Proc. ACM Meas. Anal. Comput. Syst., Vol. 2, No. 2, Article 32. Publication date: June 2018.
delivery [16, 17, 24, 34, 42, 48, 60, 62, 68]. These systems have demonstrated significant miss ratio
improvements over least-recently used (LRU) caching, the de facto policy in most production
systems. But how much further can miss ratios improve? Should the systems community continue working on this problem, or have all achievable gains been exhausted?
1.1 The problem: Finding the offline optimal
To answer these questions, one would like to know the best achievable miss ratio, free of constraints—
i.e., the offline optimal (OPT). Unfortunately, very little is known about OPT with variable object
sizes. For objects with equal sizes, computing OPT is simple (i.e., Belady [13, 64]), and it is widely
used in the systems community to bound miss ratios. But object sizes often vary widely in practice,
from a few bytes (e.g., metadata [69]) to several gigabytes (e.g., videos [16, 48]). We need a way to
compute OPT for variable object sizes, but unfortunately this is known to be NP-hard [23].
There has been little work on bounding OPT with variable object sizes, and all of this work gives
very weak bounds. On the theory side, prior work gives only three approximation algorithms [4,
7, 50], and the best approximation is only provably within a factor of 4 of OPT. Hence, when this
algorithm estimates a miss ratio of 0.4, OPT may lie anywhere between 0.1 and 0.4. This is a big range—in practice, a difference of 0.05 in miss ratio is significant—so bounds from prior theory are
of limited practical value. From a practical perspective, there is an even more serious problem with
the theoretical bounds: they are simply too expensive to compute. The best approximation takes
24 hours to process 500 K requests and scales poorly (Section 2), while production traces typically
contain hundreds of millions of requests.
Since the theoretical bounds are incomputable, practitioners have been forced to use conservative
lower bounds or pessimistic upper bounds on OPT. The only prior lower bound is an infinitely
large cache [1, 24, 48], which is very conservative and gives no sense of how OPT changes at
different cache sizes. Belady variants (e.g., Belady-Size in Section 2.4) are widely used as an upper
bound [46, 48, 61, 77], despite offering no guarantees of optimality. While these offline bounds are
easy to compute, we will show that they are in fact far from OPT. They have thus given practitioners
a false sense of complacency, since existing online algorithms often achieve similar miss ratios to
these weak offline upper bounds.
1.2 Our approach: Flow-based offline optimal
We propose a new approach to compute bounds on OPT with variable object sizes, which we
call the flow-based offline optimal (FOO). The key insight behind FOO is to represent caching as a
min-cost flow problem. This formulation yields a lower bound on OPT by allowing non-integer
decisions, i.e., letting the cache retain fractions of objects for a proportionally smaller reward. It also
yields an upper bound on OPT by ignoring all non-integer decisions. Under simple independence
assumptions, we prove that the non-integer decisions become negligible as the number of objects
goes to infinity, and thus the bounds are asymptotically tight.
Our proof is based on the observation that an optimal policy will strictly prefer some requests
over others, forcing integer decisions. We show that such preferences apply to almost all requests by relating them to the well-known coupon collector problem.
Indeed, FOO’s error over 10M requests of five production CDN and web application traces is
negligible: at most 0.15%. Even on storage traces, which have highly correlated requests that violate
our proof’s independence assumptions, FOO’s error remains below 0.3%. FOO thus reveals, for the
first time, the limits of caching with variable object sizes.
While FOO is very accurate, it is too computationally expensive to apply directly to production
traces containing hundreds of millions of requests. To extend our analysis to such traces, we develop
more efficient upper and lower bounds on OPT, which we call practical flow-based offline optimal
(PFOO). PFOO enables the first analysis of optimal caching on traces with hundreds of millions of
requests, and reveals that there is still significant room to improve current caching systems.
1.3 Summary of results
PFOO yields nearly tight bounds, as shown in Figure 1. This figure plots the miss ratio obtained on
a production CDN trace with over 400 million requests for several techniques: online algorithms
(LRU, GDSF [22], and AdaptSize [16]), prior offline upper bounds (Belady and Belady-Size), the
only prior offline lower bound (Infinite-Cap), and our new upper and lower bounds on OPT (PFOO).
The theoretical bounds, including both prior work and FOO, are too slow to run on traces this long.
[Figure 1: bar chart of miss ratios for online policies (LRU, AdaptSize, GDSF), prior offline bounds (Belady, Belady-Size, Infinite-Cap), and PFOO's upper (U) and lower (L) bounds.]
Fig. 1. Miss ratios on a production CDN trace for a 4 GB cache. Prior to our work, the best prior upper bound on OPT is within 1% of online algorithms on this trace, leading to the false impression that there is no room for improvement. The only prior lower bound on OPT (Infinite-Cap) is 60% lower than the best upper bound on OPT (Belady-Size). By contrast, PFOO provides nearly tight upper and lower bounds, which are 22% below the online algorithms. PFOO thus shows that there is actually significant room for improving current caching algorithms.
Figure 1 illustrates the two problems with prior practical bounds and how PFOO improves upon
them. First, prior bounds on OPT are very weak. The miss ratio of Infinite-Cap is 60% lower than
Belady-Size, whereas the gap between PFOO’s upper and lower bounds is less than 1%. Second,
prior bounds on OPT are misleading. Comparing GDSF and Belady-Size, which are also within 1%,
one would conclude that online policies are nearly optimal. In contrast, PFOO gives a much better
upper bound and shows that there is in fact a 22% gap between state-of-the-art online policies and
OPT.
We evaluate FOO and PFOO extensively on eight production traces with hundreds of millions
of requests across cache capacities from 16MB to 64GB. On CDN traces from two large Internet
companies, PFOO bounds OPT within 1.9% relative error at most and 1.4% on average. On web
application traces from two other large Internet companies, PFOO bounds OPT within 8.6% rela-
tive error at most and 6.1% on average. On storage traces from Microsoft [79], where our proof
assumptions do not hold, PFOO still bounds OPT within 11% relative error at most and 5.7% on
average.
PFOO thus gives nearly tight bounds on the offline optimal miss ratio for a wide range of real workloads. We find that PFOO achieves on average 19% lower miss ratios than Belady, Belady-Size,
and other obvious offline upper bounds, demonstrating the value of our min-cost flow formulation.
1.4 Contributions
In summary, this paper contributes the following:
• We present flow-based offline optimal (FOO), a novel formulation of offline caching as a min-cost
flow problem (Sections 3 and 4). FOO exposes the underlying problem structure and enables our
remaining contributions.
• We prove that FOO is asymptotically exact under simple independence assumptions, giving the
first tight, polynomial-time bound on OPT with variable object sizes (Section 5).
• We present practical flow-based offline optimal (PFOO), which relaxes FOO to give fast and nearly
tight bounds on real traces with hundreds of millions of requests (Section 6). PFOO gives the
first reasonably tight lower bound on OPT on long traces.
• We perform the first extensive evaluation of OPT with variable object sizes on eight production
traces (Sections 7 and 8). In contrast to prior offline bounds, PFOO reveals that the gap between
online policies and OPT is much larger than previously thought: 27% on average, and up to 43%
on web application traces.
Our implementations of FOO, PFOO, and of the previously unimplemented approximation algorithms are publicly available.1
2 BACKGROUND AND MOTIVATION
Little is known about how to efficiently compute OPT with variable object sizes. On the theory
side, the best known approximation algorithms give very weak bounds and are too expensive to
compute. On the practical side, system builders use offline heuristics that are much cheaper to
compute, but give no guarantee that they are close to OPT. This section surveys known theoretical
results on OPT and the offline heuristics used in practice.
2.1 Offline bounds are more robust than online bounds
Figure 1 showed a 22% gap between the best online policies and OPT. One might wonder whether this gap arises from comparing an offline policy with an online policy, rather than from any weakness
in the online policies. In other words, offline policies have an advantage because they know the
future. Perhaps this advantage explains the gap.
This raises the question of whether we could instead compute the online optimal miss ratio.
Such a bound would be practically useful, since all systems necessarily use online policies. Our
answer is that we would like to, but unfortunately this is impossible. To bound the online optimal
miss ratio, one must make assumptions about what information an online policy can use to make
caching decisions. For example, traditional policies such as LRU and LFU make decisions using
the history of past requests. These policies are conceptually simple enough that prior work has
proven online bounds on their performance in a variety of settings [2, 6, 36, 49, 72]. However, real
systems have quirks and odd behaviors that are hard to capture in tractable assumptions. With
enough effort, system designers can exploit these quirks to build online policies that outperform the online bounds and approach OPT's miss ratio for specific workloads. For example, Hawkeye [51]
recently demonstrated that a seemingly irrelevant request feature in computer architecture (the
program counter) is in fact tightly correlated with OPT’s decisions, allowing Hawkeye to mimic
OPT on many programs. Online bounds are thus inherently fragile because real systems behave
in strange ways, and it is impossible to bound the ingenuity of system designers to discover and
exploit these behaviors.
We instead focus on the offline optimal to get robust bounds on cache performance. Offline
bounds are widely used by practitioners, especially when objects have the same size and OPT
is simple to compute [13, 64]. However, computing OPT with variable object sizes is strongly
NP-complete [23], meaning that no fully polynomial-time approximation scheme (FPTAS) can
exist.2 We are therefore unlikely to find tight bounds on OPT for arbitrary traces. Instead, our
approach is to develop a technique that can estimate OPT on any trace, which we call FOO, and
then show that FOO yields tight bounds on traces seen in practice. To do this, our approach is
two-fold. First, we prove that FOO gives tight bounds on traces that obey the independent reference
1 https://github.com/dasebe/optimalwebcaching
2 That no FPTAS can exist follows from Corollary 8.6 in [83], as OPT meets the assumptions of Theorem 8.5.
model (IRM), the most common assumptions used in prior analysis of online policies [2, 14, 15,
20, 25, 26, 29, 31, 37, 38, 40–43, 45, 52–54, 58, 63, 65, 72, 73, 75, 76, 80, 82, 86]. The IRM assumption
is only needed for the proof; FOO is designed to work on any trace and does not itself rely on
any properties of the IRM. Second, we show empirically that FOO’s error is less than 0.3% on
real-world traces, including several that grossly violate the IRM. Together, we believe these results
demonstrate that FOO is both theoretically well-founded and practically useful.
2.2 OPT with variable object sizes is hard
Since OPT is simple to compute for equal object sizes [13, 64], it is surprising that OPT becomes
NP-hard when object sizes vary. Caching may seem similar to Bin-Packing or Knapsack, which
can be approximated well when there are only a few object sizes and costs. But caching is quite
different because the order of requests constrains OPT’s choices in ways that are not captured by
these problems or their variants. In fact, the NP-hardness proof in [23] is quite involved and reduces
from Vertex Cover, not Knapsack. Furthermore, OPT remains strongly NP-complete even with just
three object sizes [39], and heuristics that work well on Knapsack perform badly in caching (see
“Freq/Size” in Section 2.4 and Section 8).
2.3 Prior bounds on OPT with variable object sizes are impractical
Prior work gives only three polynomial-time bounds on OPT [4, 7, 50], which vary in time complexity
and approximation guarantee. Table 1 summarizes these bounds by comparing their asymptotic run-
time, how many requests can be calculated in practice (e.g., within 24 hrs), and their approximation
guarantee.
Technique         Time          Requests / 24 hrs   Approximation
OPT               NP-hard [23]  <1 K                1
LP rounding [4]   Ω(N^5.6)      50 K                O(log (max_i s_i / min_i s_i))
LocalRatio [7]    O(N^3)        500 K               4
OFMA [50]         O(N^2)        28 M                O(log C)
FOO^a             O(N^(3/2))    28 M                1
PFOO^b            O(N log N)    250 M               ≈1.06

Notation: N is the trace length, C is the cache capacity, and s_i is the size of object i.
a FOO's approximation guarantee holds under independence assumptions.
b PFOO does not have an approximation guarantee, but its upper and lower bounds are within 6% on average on production traces.

Table 1. Comparison of FOO and PFOO to prior bounds on OPT with variable object sizes. Computing OPT is NP-hard. Prior bounds [4, 7, 50] provide only weak approximation guarantees, whereas FOO's bounds are tight. PFOO performs well empirically and can be calculated for hundreds of millions of requests.
Albers et al. [4] propose an LP relaxation of OPT and a rounding scheme. Unfortunately, the LP requires N^2 variables, which leads to a high Ω(N^5.6) time complexity [59]. Not only is this running time high, but the approximation factor is logarithmic in the ratio of largest to smallest object size (e.g., around 30 on production traces), making this approach impractical.
Bar-Noy et al. [7] propose a general approximation framework (which we call LocalRatio), which can
be applied to the offline caching problem. This algorithm gives the best known approximation
guarantee, a factor of 4. Unfortunately, this is still a weak guarantee, as we saw in Section 1.
Additionally, LocalRatio is a purely theoretical algorithm, with a high running time of O(N 3), and
which we believe had not been implemented prior to our work. Our implementation of LocalRatio
can calculate up to 500 K requests in 24 hrs, which is only a small fraction of the length of production
traces.
Irani proposes the OFMA approximation algorithm [50], which has O(N 2) running time. This
running time is small enough for our implementation of OFMA to run on small traces (Section 8).
Unfortunately, OFMA achieves a weak approximation guarantee, logarithmic in the cache capacity C, and in fact OFMA does badly on our traces, giving much weaker bounds than simple Belady-inspired heuristics.
Hence, prior work that considers adversarial assumptions yields only weak approximation guarantees. We therefore turn to stochastic assumptions to obtain tight bounds on the kinds of traces actually seen in practice. Under independence assumptions, FOO achieves a tight approximation guarantee on OPT, unlike prior approximation algorithms, and also has asymptotically better runtime, specifically O(N^(3/2)).
We are not aware of any prior stochastic analysis of offline optimal caching. In the context of
online caching policies, there is an extensive body of work using stochastic assumptions similar
to ours [12, 14, 15, 20, 25, 26, 29, 31, 37, 38, 40–43, 45, 52–54, 58, 63, 65, 72, 73, 75, 76, 80, 82, 86],
of which five prove optimality results [2, 6, 36, 49, 72]. Unfortunately, these results are only for
objects with equal sizes, and these policies perform poorly in our experiments because they do not
account for object size.
2.4 Heuristics used in practice to bound OPT give weak bounds
Since the running times of prior approximation algorithms are too high for production traces,
practitioners have been forced to rely on heuristics that can be calculated more quickly. However,
these heuristics only give upper bounds on OPT, and there is no guarantee on how close to OPT
they are.
The simplest offline upper bound is Belady’s algorithm, which evicts the object whose next use lies
furthest in the future.3 Belady is optimal in caching variants with equal object sizes [13, 44, 55, 64].
Even though it has no approximation guarantees for variable object sizes, it is still widely used
in the systems community [46, 48, 61, 77]. However, as we saw in Figure 1, Belady performs very
badly with variable object sizes and is easily outperformed by state-of-the-art online policies.
A straightforward size-aware extension of Belady is to evict the object with the highest cost =
object size × next-use distance. We call this variant Belady-Size. Among practitioners, Belady-Size
is widely believed to perform near-optimally, but it has no guarantees. It falls short on simple
examples: e.g., imagine that A is 4MB and is referenced 10 requests hence and never referenced
again, and B is 5MB and is referenced 9 and 12 requests hence. With 5MB of cache space, the best
choice between these objects is to keep B, getting two hits. But A has cost = 4 × 10 = 40, and B has
cost = 5 × 9 = 45, so Belady-Size keeps A and gets only one hit. In practice, Belady-Size often leads
to poor decisions when a large object is referenced twice in short succession: Belady-Size will evict
many other useful objects to make space for it, sacrificing many future hits to gain just one.
Alternatively, one could use Knapsack heuristics as size-aware offline upper bounds, such as the
density-ordered Knapsack heuristic, which is known to perform well on Knapsack in practice [33].
We call this heuristic Freq/Size, as Freq/Size evicts the object with the lowest utility = frequency /
size, where frequency is the number of requests to the object. Unfortunately, Freq/Size also falls
short on simple examples: e.g., imagine that A is 2MB and is referenced 10 requests hence, and B is (as before) 5MB and is referenced 9 and 12 requests hence. With 5MB of cache space, the best
3 Belady’s MIN algorithm actually operates somewhat differently. The algorithm commonly called “Belady” was invented (and proved to be optimal) by Mattson [64]. For a longer discussion, see [66].
choice between these objects is to keep B, getting two hits. But A has utility = 1 ÷ 2 = 0.5, and B has
utility = 2 ÷ 5 = 0.4, so Freq/Size keeps A and gets only one hit. In practice, Freq/Size often leads to
poor decisions when an object is referenced in bursts: Freq/Size will retain such objects long after
the burst has passed, wasting space that could have earned hits from other objects.
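Both counterexamples reduce to one-line arithmetic; this small sketch (object parameters copied from the text above) simply replays them:

```python
# Belady-Size: evict the highest cost = object size (MB) * next-use distance.
cost_A = 4 * 10   # A: 4MB, next referenced 10 requests hence
cost_B = 5 * 9    # B: 5MB, next referenced 9 requests hence
assert cost_B > cost_A    # Belady-Size evicts B, keeping A: one hit, not two

# Freq/Size: evict the lowest utility = frequency / size.
util_A = 1 / 2    # A: 2MB, referenced once more
util_B = 2 / 5    # B: 5MB, referenced twice more
assert util_A > util_B    # Freq/Size keeps A over B: again only one hit
```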
Though these heuristics are easy to compute and intuitive, they give no approximation guarantees.
We will show that they are in fact far from OPT on real traces, and PFOO is a much better bound.
3 FLOW-BASED OFFLINE OPTIMAL
This section gives a conceptual roadmap for our proof of FOO’s optimality, which we present
formally in Sections 4 and 5. Throughout this section we use a small request trace shown in Figure 2
as a running example. This trace contains four objects, a, b, c, and d, with sizes 3, 1, 1, and 2,
respectively.
Object a b c b d a c d a b b a
Size 3 1 1 1 2 3 1 2 3 1 1 3
Fig. 2. Example trace of requests to objects a, b, c, and d, of sizes 3, 1, 1, and 2, respectively.
First, we introduce a new integer linear program to represent OPT (Section 3.1). After relaxing
integrality constraints, we derive FOO’smin-cost flow representation, which can be solved efficiently
(Section 3.2). We then observe how FOO yields tight upper and lower bounds on OPT (Section 3.3).
To prove that FOO’s bounds are tight on real-world traces, we relate the gap between FOO’s upper
and lower bounds to the occurrence of a partial order on intervals, and then reduce the partial
order’s occurrence to an instance of the generalized coupon collector problem (Section 3.4).
3.1 Our new interval representation of OPT
We start by introducing a novel representation of OPT. Our integer linear program (ILP) minimizes
the number of cache misses, while having full knowledge of the request trace.
We exploit a unique property of offline optimal caching: OPT never changes its decision to
cache object k in between two requests to k (see Section 4). This naturally leads to an interval
representation of OPT as shown in Figure 3. While the classical representation of OPT uses decision
variables to track the state of every object at every time step [4], our ILP only keeps track of
interval-level decisions. Specifically, we use decision variables xi to indicate whether OPT caches
the object requested at time i , or not.
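In our notation (not the paper's formal statement, which appears in Section 4), write s_i for the size of the object requested at time i and I(t) for the set of intervals spanning time t. The interval ILP then minimizes misses subject to the cache capacity C holding at every time:

```latex
\min_{x} \;\; \sum_{i} (1 - x_i)
\quad \text{subject to} \quad
\sum_{i \in I(t)} s_i \, x_i \le C \;\; \forall t,
\qquad x_i \in \{0, 1\} \;\; \forall i .
```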
[Figure 3: the trace from Figure 2 annotated with interval decision variables x1–x8, one per interval between consecutive requests to the same object.]
Fig. 3. Interval ILP representation of OPT.
3.2 FOO’s min-cost flow representation
This interval representation leads naturally to FOO’s flow-based representation, shown in Figure 4. We use the min-cost flow notation shown in Table 2. The key idea is to use flow to represent the
interval decision variables. Each request is represented by a node. Each object’s first request is a
source of flow equal to the object’s size, and its last request is a sink of flow in the same amount.
This flow must be routed along intervening edges, and hence min-cost flow must decide whether
to cache the object throughout the trace.
cap   Capacity of edge (i, j), i.e., maximum flow along this edge.
cost  Cost per unit flow on edge (i, j).
βi    Flow surplus at node i if βi > 0; flow demand if βi < 0.

Table 2. Notation for FOO’s min-cost flow. Edges are labeled with their (capacity, cost), and nodes are labeled with flow −demand or +surplus.
[Figure 4: the min-cost flow graph for the Figure 2 trace. Request nodes are chained by central edges labeled (3, 0); outer edges carry labels such as (3, 1/3) for a, (1, 1) for b and c, and (2, 1/2) for d, with sources +3, +1, +1, +2 at first requests and matching sinks at last requests.]
Fig. 4. FOO’s min-cost flow problem for the short trace in Figure 2, using the notation from Table 2. Nodes represent requests, and cost measures cache misses. Requests are connected by central edges with capacity equal to the cache capacity and cost zero—flow routed along this path represents cached objects (hits). Outer edges connect requests to the same object, with capacity equal to the object’s size—flow routed along this path represents misses. The first request for each object is a source of flow equal to the object’s size, and the last request is a sink of flow of the same amount. Outer edges’ costs are inversely proportional to object size so that they cost 1 miss when an entire object is not cached. The min-cost flow achieves the fewest misses.
For cached objects, there is a central path of black edges connecting all requests. These edges
have capacity equal to the cache capacity and cost zero (since cached objects lead to zero misses).
Min-cost flow will thus route as much flow as possible through this central path to avoid costly
misses elsewhere [3].
To represent cache misses, FOO adds outer edges between subsequent requests to the same
object. For example, there are three edges along the top of Figure 4 connecting the requests to a.
These edges have capacity equal to the object’s size s and cost inversely proportional to the object’s size, 1/s. Hence, if an object of size s is not cached (i.e., its flow s is routed along this outer edge), it will incur a cost of s × (1/s) = 1 miss.
The routing of flow through this graph implies which objects are cached and when. When no
flow is routed along an outer edge, this implies that the object is cached and the subsequent request
is a hit. All other requests, i.e., those with any flow routed along an outer edge, are misses. The
min-cost flow gives the decisions that minimize total misses.
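To make the construction concrete, here is a self-contained sketch that builds FOO's graph for the Figure 2 trace and solves it exactly. This is not the paper's implementation: the successive-shortest-paths solver is a textbook routine we supply for illustration, and outer-edge costs are scaled by the LCM of the object sizes so they stay integral:

```python
import math
from collections import deque

class MinCostFlow:
    """Textbook successive-shortest-paths min-cost flow (SPFA search)."""
    def __init__(self, n):
        self.n = n
        self.adj = [[] for _ in range(n)]  # edge: [to, cap, cost, rev]

    def add_edge(self, u, v, cap, cost):
        self.adj[u].append([v, cap, cost, len(self.adj[v])])
        self.adj[v].append([u, 0, -cost, len(self.adj[u]) - 1])

    def solve(self, s, t):
        total = 0
        while True:
            dist = [math.inf] * self.n
            prev = [None] * self.n   # (node, edge index) on cheapest path
            dist[s] = 0
            queue, in_q = deque([s]), [False] * self.n
            while queue:
                u = queue.popleft()
                in_q[u] = False
                for idx, (v, cap, cost, _) in enumerate(self.adj[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, idx)
                        if not in_q[v]:
                            in_q[v] = True
                            queue.append(v)
            if dist[t] == math.inf:
                return total          # all supply routed
            bottleneck, v = math.inf, t
            while v != s:             # find augmenting amount
                u, idx = prev[v]
                bottleneck = min(bottleneck, self.adj[u][idx][1])
                v = u
            v = t
            while v != s:             # push flow along the path
                u, idx = prev[v]
                self.adj[u][idx][1] -= bottleneck
                self.adj[v][self.adj[u][idx][3]][1] += bottleneck
                v = u
            total += bottleneck * dist[t]

def foo_lower_bound(trace, sizes, cache_size):
    # FOO's graph: central edges carry cached bytes (cost 0); outer edges
    # between consecutive requests to an object carry misses (cost 1/size
    # per unit). Assumes every object is requested at least twice.
    n = len(trace)
    scale = math.lcm(*sizes.values())      # make 1/size costs integral
    source, sink = n, n + 1
    mcf = MinCostFlow(n + 2)
    for i in range(n - 1):
        mcf.add_edge(i, i + 1, cache_size, 0)
    last = {}
    for i, obj in enumerate(trace):
        if obj in last:
            mcf.add_edge(last[obj], i, sizes[obj], scale // sizes[obj])
        last[obj] = i
    first = {}
    for i, obj in enumerate(trace):
        first.setdefault(obj, i)
    for obj, i in first.items():
        mcf.add_edge(source, i, sizes[obj], 0)        # object enters
        mcf.add_edge(last[obj], sink, sizes[obj], 0)  # object leaves
    return mcf.solve(source, sink) / scale  # FOO-L fractional misses

trace = list("abcbdacdabba")               # the Figure 2 trace
sizes = {"a": 3, "b": 1, "c": 1, "d": 2}
print(foo_lower_bound(trace, sizes, cache_size=3))
```

As sanity checks, an unbounded cache yields 0 misses and a zero-capacity cache yields one miss per non-compulsory request; with C = 3 the bound lies strictly between these extremes.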
3.3 FOO yields upper and lower bounds on OPT
FOO can deviate from OPT as there is no guarantee that an object’s flow will be entirely routed
along its outer edge. Thus, FOO allows the cache to keep fractions of objects, accounting for only a
fractional miss on the next request to that object. In a real system, each fractional miss would be a
full miss. This error is the price FOO pays for making the offline optimal computable.
To deal with fractional (non-integer) solutions, we consider two variants of FOO. FOO-L keeps
all non-integer solutions and is therefore a lower bound on OPT. FOO-U considers all non-integer
decisions as uncached, “rounding up” flow along outer edges, and is therefore an upper bound on
OPT. We will prove this in Section 4.
Object          a  b  c  b  d  a  c  d  a  b  b  a
OPT decision    ✗  ✓  ✓  ✓  ✗  ✗  ✓  ✗  ✗  ✓  ✗  ✗
FOO-L decision  0  1  1  1  ½  0  1  ½  0  1  0  0
FOO-U decision  0  1  1  1  0  0  1  0  0  1  0  0

Fig. 5. Caching decisions made by OPT, FOO-L, and FOO-U with a cache capacity of C = 3.
Figure 5 shows the caching decisions made by OPT, FOO-L, and FOO-U assuming a cache of size 3. A “✓” indicates that OPT caches the object until its next request, and a “✗” indicates it is not cached. OPT suffers five misses on this trace by caching object b and either c or d. OPT caches b
because it is referenced thrice and is small. This leaves space to cache the two references to either
c or d, but not both. (OPT in Figure 5 chooses to cache c since it requires less space.) OPT does not
cache a because it takes the full cache, forcing misses on all other requests.
The solutions found by FOO-L are very similar to OPT. FOO-L decides to cache objects b and c,
matching OPT, and also caches half of d. FOO-L thus underestimates the misses by one, counting
d’s misses fractionally. FOO-U gives an upper bound for OPT by counting d’s misses fully. In this
example, FOO-U matches OPT exactly.
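The rounding step is mechanical once the fractional interval decisions are known. A small sketch (the decision vector below is hypothetical, loosely modeled on Figure 5, where only d's interval is fractional):

```python
def foo_bounds(decisions):
    # decisions[i] in [0, 1]: fraction of interval i that FOO-L keeps cached.
    foo_l = sum(1 - x for x in decisions)       # count fractional misses
    foo_u = sum(1 for x in decisions if x < 1)  # any fraction -> full miss
    return foo_l, foo_u

lower, upper = foo_bounds([0, 1, 1, 1, 0.5, 0, 0, 1])
print(lower, upper)   # FOO-L <= OPT <= FOO-U
```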
3.4 Overview of our proof of FOO’s optimality
We show both theoretically and empirically that FOO-U and FOO-L yield tight bounds. Specifically,
we prove that FOO-L’s solutions are almost always integer when there are many objects (as in
production traces). Thus, FOO-U and FOO-L coincide with OPT.
Our proof is based on a natural precedence relation between intervals such that an optimal policy
strictly prefers some intervals over others. For example, in Figure 2, FOO will always prefer x2 over x1 and x8 over x7. This can be seen in the figure, as interval x2 fits entirely within x1, and likewise x8 fits within x7. In contrast, no such precedence relation exists between x6 and x4 because a is larger than b, and so x6 does not fit within x4. Similarly, no precedence relation exists between x2 and x5 because, although x5 is longer and larger, their intervals do not overlap, and so x2 does not fit within x5.
This precedence relation means that if FOO caches any part of x1, then it must have cached all of x2. Likewise, if FOO caches any part of x7, then it must have cached all of x8. The precedence relation thus forces integer solutions in FOO. Although this relation is sparse in the small trace
from Figure 2, as one scales up the number of objects the precedence relation becomes dense. Our
challenge is to prove that this holds on traces seen in practice.
At the highest level, our proof distinguishes between “typical” and “atypical” objects. Atypical
objects are those that are exceptionally unpopular or exceptionally large; typical objects are
everything else. While the precedence relation may not hold for atypical objects, intervals from
atypical objects are rare enough that they can be safely ignored. We then show that for all the
typical objects, the precedence relation is dense. In fact, one only needs to consider precedence
relations among cached objects, as all other intervals have zero decision variables. The basic intuition
behind our proof is that a popular cached object almost always takes precedence over another
object. Specifically, it will take precedence over one of the exceptionally large objects, since the only
Proc. ACM Meas. Anal. Comput. Syst., Vol. 2, No. 2, Article 32. Publication date: June 2018.
32:10 Daniel S. Berger, Nathan Beckmann, and Mor Harchol-Balter
way it could not is if all of the exceptionally large objects were requested before it was requested
again. There are enough large objects to make this vanishingly unlikely.
This is an instance of the generalized coupon collector problem (CCP). In the CCP, one collects
coupons (with replacement) from an urn with k distinct types of coupons, stopping once all k types
have been collected. The classical CCP (where coupons are equally likely) is a well-studied problem [35]. The generalized CCP, where coupons have non-uniform probabilities, is very challenging
and the focus of recent work in probability theory [5, 32, 67, 85].
Applying these recent results, we show that it is extremely unlikely that a popular object does
not take precedence over any others. Therefore, there are very few non-integer solutions among
popular objects, which make up nearly all hits, and the gap between FOO-U and FOO-L vanishes
as the number of objects grows large.
4 FORMAL DEFINITION OF FOO
This section shows how to construct FOO and that FOO yields upper and lower bounds on OPT.
Section 4.1 introduces our notation. Section 4.2 defines our new interval representation of OPT.
Section 4.3 relaxes the integer constraints and proves that our min-cost flow representation yields
upper and lower bounds on OPT.
4.1 Notation and definitions
The trace σ consists of N requests to M distinct objects. The i-th request σi contains the corresponding object id, for all i ∈ 1 . . . N. We use si to reference the size of the object σi referenced in the i-th request. We denote the i-th interval (e.g., in Figure 3) by [i, ℓi), where ℓi is the time of the next request to object σi after time i, or ∞ if σi is not requested again.
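The interval endpoints ℓi can be computed in a single pass over the trace. The following is a minimal sketch; the list-of-ids trace encoding and the function name are ours, and `None` stands in for ∞:

```python
def next_uses(trace):
    """For each request i, return the time l_i of the next request to the
    same object, or None if the object is never requested again."""
    last_seen = {}                 # object id -> index of most recent request
    nxt = [None] * len(trace)
    for i, obj in enumerate(trace):
        if obj in last_seen:
            nxt[last_seen[obj]] = i    # close the previous interval for obj
        last_seen[obj] = i
    return nxt

# The running example trace from Figure 2.
trace = ["a", "b", "c", "b", "d", "a", "c", "d", "a", "b", "b", "a"]
print(next_uses(trace))
```

Requests with `None` are each object’s last request, so the number of finite intervals is N − M, matching the variable count of the ILP below.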
OPT minimizes the number of cache misses, while having full knowledge of the request trace.
OPT is constrained to only use cache capacity C (bytes), and is not allowed to prefetch objects as
this would lead to trivial solutions (no misses) [4]. Formally,
Assumption 1. An object k ∈ 1 . . .M can only enter the cache at times i ∈ 1 . . .N with σi = k .
4.2 New ILP representation of OPT
We start by formally stating our ILP formulation of OPT, based on intervals as illustrated in Figure 3. First, we define the set I of all requests i where σi is requested again, i.e., I = {i : ℓi < ∞}. I is the set of times when OPT must decide whether to cache an object. For all i ∈ I, we associate a decision variable xi. This decision variable denotes whether object σi is cached during the interval [i, ℓi). Our ILP formulation needs only N − M variables, vs. N × M for prior approaches [4], and leads
directly to our flow-based approximation.
Definition 1 (Definition of OPT). The interval representation of OPT for a trace of length N with M objects is as follows.

OPT = min ∑_{i ∈ I} (1 − xi)    (1)

subject to:

∑_{j : j < i < ℓj} sj xj ≤ C   ∀i ∈ I    (2)

xi ∈ {0, 1}   ∀i ∈ I    (3)
To represent the capacity constraint at every time step i , our representation needs to find all
intervals [j, ℓj ) that intersect with i , i.e., where j < i < ℓj . Eq. (2) enforces the capacity constraint by
bounding the size of cached intervals to be less than the cache size C. Eq. (3) ensures that decisions are integral, i.e., that each interval is cached either fully or not at all. Section A.1 proves that our
interval ILP is equivalent to classic ILP formulations of OPT from prior work [4].
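To make the definition concrete, the interval ILP can be brute-forced on a tiny trace by enumerating all 0/1 assignments and checking Eq. (2) exactly as written. This is a sketch for illustration only (exponential in N − M), and the object sizes below are hypothetical stand-ins for the example trace:

```python
from itertools import product

def interval_ilp_misses(trace, sizes, C):
    """Exhaustively solve the interval ILP (Definition 1) on a tiny trace.
    Returns the total miss count N - sum(x_i) over the best feasible x."""
    N = len(trace)
    last, nxt = {}, [None] * N          # nxt[i] = l_i, or None if infinite
    for i, o in enumerate(trace):
        if o in last:
            nxt[last[o]] = i
        last[o] = i
    I = [i for i in range(N) if nxt[i] is not None]
    best = N                            # caching nothing is always feasible
    for xs in product([0, 1], repeat=len(I)):
        x = dict(zip(I, xs))
        # Eq. (2): cached intervals [j, l_j) overlapping time i must fit in C
        if all(sum(sizes[trace[j]] * x[j] for j in I if j < i < nxt[j]) <= C
               for i in I):
            best = min(best, N - sum(xs))   # each cached interval is one hit
    return best

sizes = {"a": 3, "b": 1, "c": 1, "d": 2}   # hypothetical sizes
trace = ["a", "b", "c", "b", "d", "a", "c", "d", "a", "b", "b", "a"]
print(interval_ilp_misses(trace, sizes, C=3))
```

As a sanity check, with unbounded capacity every interval can be cached and only each object’s first request misses, so the result equals the number of distinct objects.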
Having formulated OPT with fewer decision variables, we could try to solve the LP relaxation
of this specific ILP. However, the capacity constraint, Eq. (2), still poses a practical problem since
finding the intersecting intervals is computationally expensive. Additionally, the LP formulation
does not exploit the underlying problem structure, which we need to bound the number of integer
solutions. We instead reformulate the problem as min-cost flow.
4.3 FOO’s min-cost flow representation of OPT
This section presents the relaxed version of OPT as an instance of min-cost flow (MCF) in a graph
G. We denote a surplus of flow at a node i with βi > 0, and a demand for flow with βi < 0. Each
edge (i, j) in G has a cost per unit flow γ(i, j) and a capacity for flow µ(i, j) (see right-hand side of
Figure 4).
As discussed in Section 3, the key idea in our construction of an MCF instance is that each
interval introduces an amount of flow equal to the object’s size. The graph G is constructed such
that this flow competes for a single sequence of edges (the “inner edges”) with zero cost. These
“inner edges” represent the cache’s capacity: if an object is stored in the cache, we incur zero cost
(no misses). As not all objects will fit into the cache, we introduce “outer edges”, which allow MCF
to satisfy the flow constraints. However, these outer edges come at a cost: when the full flow of an
object uses an outer edge we incur cost 1 (i.e., a miss). Non-integer decision variables arise if part
of an object is in the cache (flow along inner edges) and part is out of the cache (flow along outer
edges).
Formally, we construct our MCF instance of OPT as follows:
Definition 2 (FOO’s representation of OPT). Given a trace with N requests and M objects, the MCF graph G consists of N nodes. For each request i ∈ 1 . . . N there is a node with supply/demand

βi =  si   if i is the first request to σi,
      −si  if i is the last request to σi,
      0    otherwise.    (4)

An inner edge connects nodes i and i + 1. Inner edges have capacity µ(i,i+1) = C and cost γ(i,i+1) = 0, for i ∈ 1 . . . N − 1.
For all i ∈ I, an outer edge connects nodes i and ℓi. Outer edges have capacity µ(i,ℓi) = si and cost γ(i,ℓi) = 1/si. We denote the flow through outer edge (i, ℓi) as fi.
FOO-L denotes the cost of an optimal feasible solution to the MCF graph G. FOO-U denotes the cost if all non-zero flows through outer edges fi are rounded up to the edge’s capacity si.
This representation yields a min-cost flow instance with 2N − M − 1 edges, which is solvable in O(N^{3/2}) [8, 9, 28]. Note that while this paper focuses on optimizing miss ratio (i.e., the fault
model [4], where all misses have the same cost), Definition 2 easily supports non-uniform miss
costs by setting outer edge costs to γ(i, ℓi ) = costi/si . We next show how to derive upper and lower
bounds from this min-cost flow representation.
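The graph of Definition 2 is straightforward to construct. The sketch below builds the node supplies and edge list (the tuple encoding is our own choice) so that they could be fed to any off-the-shelf min-cost flow solver; the sizes are again hypothetical:

```python
def build_foo_mcf(trace, sizes, C):
    """Build FOO's min-cost flow instance (Definition 2).
    Returns (supply, edges); each edge is a (u, v, capacity, cost) tuple."""
    N = len(trace)
    first, last = {}, {}
    for i, o in enumerate(trace):
        first.setdefault(o, i)
        last[o] = i
    supply = [0] * N
    for o in first:                     # Eq. (4): the object's size enters at
        supply[first[o]] += sizes[o]    # its first request and leaves at its
        supply[last[o]] -= sizes[o]     # last request (net 0 in between)
    edges = [(i, i + 1, C, 0.0) for i in range(N - 1)]   # inner edges
    nxt = {}
    for i in reversed(range(N)):        # outer edges (i, l_i) for i in I
        o = trace[i]
        if o in nxt:
            edges.append((i, nxt[o], sizes[o], 1.0 / sizes[o]))
        nxt[o] = i
    return supply, edges

sizes = {"a": 3, "b": 1, "c": 1, "d": 2}   # hypothetical sizes
trace = ["a", "b", "c", "b", "d", "a", "c", "d", "a", "b", "b", "a"]
supply, edges = build_foo_mcf(trace, sizes, C=3)
N, M = len(trace), len(set(trace))
assert sum(supply) == 0                 # total surplus matches total demand
assert len(edges) == 2 * N - M - 1      # edge count stated in the text
```

Summing the outer-edge costs times their flows in a solver’s optimal solution gives FOO-L directly; rounding each non-zero outer flow up to its capacity gives FOO-U.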
Theorem 1 (FOO bounds OPT). For FOO-L and FOO-U from Definition 2,
FOO-L ≤ OPT ≤ FOO-U (5)
Proof. We observe that fi, as defined in Definition 2, gives the number of bytes “not stored” in the cache; fi corresponds to the i-th decision variable xi from Definition 1 via xi = (1 − fi/si).
(FOO-L ≤ OPT): FOO-L is a feasible solution for the LP relaxation of Definition 1, because a total amount of flow si needs to flow from node i to node ℓi (by definition of βi). At most µ(i,i+1) = C flow uses an inner edge, which enforces constraint Eq. (2). FOO-L is an optimal solution because it minimizes the total cost of flow along outer edges. Each outer edge’s cost is γ(i,ℓi) = 1/si, so γ(i,ℓi) fi = (1 − xi), and thus

FOO-L = min ∑_{i ∈ I} γ(i,ℓi) fi = min ∑_{i ∈ I} (1 − xi) ≤ OPT    (6)

(OPT ≤ FOO-U): After rounding, each outer edge (i, ℓi) has flow fi ∈ {0, si}, so the corresponding decision variable xi ∈ {0, 1}. FOO-U thus yields a feasible integer solution, and OPT yields no more misses than any feasible solution.
5 FOO IS ASYMPTOTICALLY OPTIMAL
This section proves that FOO is asymptotically optimal, namely that the gap between FOO-U and
FOO-L vanishes as the number of objects grows large. Section 5.1 formally states this result and
our assumptions, and Sections 5.2–5.5 present the proof.
5.1 Main result and assumptions
Our proof of FOO’s optimality relies on two assumptions: (i) that the trace is created by stochastically independent request processes and (ii) that the popularity distribution is not concentrated on a
finite set of objects as the number of objects grows.
Assumption 2 (Independence). The request sequence is generated by independently sampling from a popularity distribution PM. Object sizes are sampled from an arbitrary continuous size distribution S, which is independent of M and has a finite maximum maxi si.
We assume that object sizes are unique to break ties when making caching decisions. If the object
sizes are not unique, one can simply add small amounts of noise to make them so. We assume a
maximum object size to show the existence of a scaling regime, i.e., that the number of cached
objects grows large as the cache grows large. For the same reason, we exclude trivial cases where
a finite set of objects dominates the request sequence even as the total universe of objects grows
large:
Assumption 3 (Diverging popularity distribution). For any number M > 0 of objects, the popularity distribution PM is defined via an infinite sequence ψk. At any time 1 ≤ i ≤ N,

P [object k is requested | M objects overall] = ψk / ∑_{k=1}^{M} ψk    (7)

The sequence ψk must be positive and diverging such that cache size C → ∞ is required to achieve a constant miss ratio as M → ∞.
Our assumptions on PM allow for many common distributions, such as uniform popularities (ψk = 1) or heavy-tailed Zipfian probabilities (ψk = 1/k^α for α ≤ 1, as is common in practice [16, 19, 48,
62, 78]). Moreover, with some change to notation, our proofs can be extended to require only that
ψk remains constant over short timeframes. With these assumptions in place, we are now ready to
state our main result on FOO’s asymptotic optimality.
Theorem 2 (FOO is Asymptotically Optimal). Under Assumptions 2 and 3, for any error ε and violation probability κ, there exists an M∗ such that for any trace with M > M∗ objects

P [FOO-U − FOO-L ≥ ε N] ≤ κ    (8)
where the trace length N ≥ M log² M and the cache capacity C is scaled with M such that FOO-L’s miss ratio remains constant.
Theorem 2 states that, as M → ∞, FOO’s miss ratio error is almost surely less than ε for any ε > 0. Since FOO-L and FOO-U bound OPT (Theorem 1), FOO-L = OPT = FOO-U.
The rest of this section is dedicated to the proof of Theorem 2. The key idea in our proof is to
bound the number of non-integer solutions in FOO-L via a precedence relation that forces FOO-L
to strictly prefer some decision variables over others, which forces them to be integer. Section 5.2
introduces this precedence relation. Section 5.3 maps this relation to a representation that can be
stochastically analyzed (as a variant of the coupon collector problem). Section 5.4 then shows that
almost all decision variables are part of a precedence relation and thus integer, and Section 5.5
brings all these parts together in the proof of Theorem 2.
5.2 Bounding the number of non-integer solutions using a precedence graph
This section introduces the precedence relation ≺ between caching intervals. The intuition behind ≺ is that if an interval i is nested entirely within interval j, then min-cost flow must prefer i over j. We first formally define ≺, and then state the property about optimal policies in Theorem 3.
Definition 3 (Precedence relation). For two caching intervals [i, ℓi) and [j, ℓj), let the relation ≺ be such that i ≺ j (“i takes precedence over j”) if
(1) j < i,
(2) ℓj > ℓi, and
(3) si < sj.
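The three conditions translate directly into code. A minimal sketch, with an interval encoding of our own choosing:

```python
def takes_precedence(i, j):
    """Return True iff i ≺ j per Definition 3, where each interval is a
    (start, end, size) tuple describing [start, end) for an object of that
    size: i must start later, end earlier, and be strictly smaller than j."""
    (i_start, i_end, i_size), (j_start, j_end, j_size) = i, j
    return j_start < i_start and j_end > i_end and i_size < j_size

# Hypothetical intervals: `inner` is nested inside `outer` and smaller.
outer = (1, 8, 3)
inner = (2, 5, 1)
assert takes_precedence(inner, outer)          # nested and smaller
assert not takes_precedence(outer, inner)      # the reverse never holds
assert not takes_precedence((2, 5, 3), outer)  # equal size: no precedence
```

The size tie in the last check cannot arise under Assumption 2, since continuous size distributions make object sizes unique.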
The key property of ≺ is that it forces integer decision variables.
Theorem 3 (Precedence forces integer decisions). If i ≺ j, then xj > 0 in FOO-L’s min-cost flow solution implies xi = 1.
In other words, if interval i is strictly preferable to interval j, then FOO-L will take all of i before taking any of j. The proof of this result relies on the notion of a residual MCF graph [3, p. 304 ff.], where for any edge (i, j) ∈ G with positive flow, we add a backwards edge (j, i) with cost γj,i = −γi,j.
Proof. By contradiction. Let G′ be the residual MCF graph induced by a given MCF solution. Figure 6 sketches the MCF graph in the neighborhood of j, . . . , i, . . . , ℓi, . . . , ℓj.
Fig. 6. The precedence relation i ≺ j from Definition 3 forces integer decisions on interval i. In any min-cost flow solution, we can reroute flow such that if xj > 0 then xi = 1.
Assume that xj > 0 and that xi < 1, as otherwise the statement is trivially true. Because xj > 0 there exist backwards inner edges all the way between ℓj and j. Because xi < 1, the MCF solution must include some flow on the outer edge (i, ℓi), and there exists a backwards outer edge (ℓi, i) ∈ G′ with cost γℓi,i = −1/si.
We can use the backwards edges to create a cycle in G′, starting at j, then following edge (j, ℓj),
backwards inner edges to ℓi , the backwards outer edge (ℓi , i), and finally backwards inner edges to
return to j. Figure 6 highlights this clockwise path in darker colored edges.
This cycle has cost 1/sj − 1/si, which is negative because si < sj (by definition of ≺ and since i ≺ j). As negative-cost cycles cannot exist in an MCF solution [3, Theorem 9.1, p. 307], this leads to a contradiction.
To quantify how many integer solutions there are, we need to know the general structure of the
precedence graph. Intuitively, for large production traces with many overlapping intervals, the
graph will be very dense. We have empirically verified this for our production traces.
Unfortunately, the combinatorial nature of caching traces made it difficult for us to characterize
the general structure of the precedence graph under stochastic assumptions. For example, we
considered classical results on random graphs [56] and the concentration of measure in random
partial orders [18]. None of these paths yielded sufficiently tight bounds on FOO. Instead, we bound
the number of non-integer solutions via the generalized coupon collector problem.
5.3 Relating the precedence graph to the coupon collector problem
We now translate the problem of intervals without a child in the precedence graph (i.e., intervals i for which there exists no j with i ≺ j) into a tractable stochastic representation.
We first describe the intuition for equal object sizes, and then consider variable object sizes.
Definition 4 (Cached objects). Let Hi denote the set of cached intervals that overlap time i, excluding i.

Hi = { j ≠ i : xj > 0 and i ∈ [j, ℓj) }  and  hi = |Hi|    (9)
We observe that interval [i, ℓi) is without child if and only if all other objects xj ∈ Hi are requested at least once in [i, ℓi). Figure 7 shows an example where Hi consists of five objects (intervals xa, . . . , xe). As all five objects are requested before ℓi, all five intervals end before xi ends, and so [i, ℓi) cannot fit in any of them. To formalize this observation, we introduce the following random variables, also illustrated in Figure 7. Li is the length of the i-th interval, i.e., Li = ℓi − i. THi is the time after i when all intervals in Hi end. We observe that THi is the stopping time in a coupon-collector problem (CCP) where we associate a coupon type with every object in Hi. With equal object sizes, the event that i has a child is equivalent to the event THi > Li.
We now extend our intuition to the case of variable object sizes. We now need to consider that objects in Hi can be smaller than si and thus may not be i’s children for a new reason: the precedence relation (Definition 3) requires i’s children to have size larger than or equal to si. Figure 8 shows an example where Li is without child because (i) xb, which ends after ℓi, is smaller than si, and (ii) all larger objects (xa, xc, xd) are requested before ℓi. The important conclusion is that, by ignoring the smaller objects, we can reduce the problem back to the CCP.
To formalize our observation about the relation to the CCP, we introduce the following random
variables, also illustrated in Figure 8. We define B, which is a subset of the cached objects Hi with a
Fig. 7. Simplified notation for the coupon collector representation of offline caching with equal object sizes. We translate the precedence relation from Theorem 3 into the relation between two random variables. Li denotes the length of interval i. THi is the coupon collector time, where we wait until all objects that are cached at the beginning of Li (Hi denotes these objects) are requested at least once.
Fig. 8. Full notation for the coupon collector representation of offline caching. With variable object sizes, we need to ignore all objects with a smaller size than si (greyed-out intervals xb and xe). We then define the coupon collector time TB among a subset B ⊂ Hi of cached objects with a size larger than or equal to si. Using this notation, the event TB > Li implies that xi has a child, which forces xi to be integer by Theorem 3.
size equal to or larger than si, and the coupon collector time TB for B-objects. These definitions are useful as the event TB > Li implies that i has a child and thus xi is integer, as we now show.
Theorem 4 (Stochastic bound on non-integer variables). For decision variable xi, i ∈ 1 . . . N, assume that B ⊆ Hi is a subset of cached objects where sj ≥ si for all j ∈ B. Further, let the random variable TB denote the time until all intervals in B end, i.e., TB = maxj∈B ℓj − i. If B is non-empty, then the probability that xi is non-integer is upper bounded by the probability that interval i ends after all intervals in B, i.e.,

P [0 < xi < 1] ≤ P [Li > TB] .    (10)
The proof works backwards by assuming that TB > Li. We then show that this implies that there exists an interval j with i ≺ j, and then apply Theorem 3 to conclude that xi is integer. Finally, we use this implication to bound the probability.
Proof. Consider an arbitrary i ∈ 1 . . . N with TB > Li. Let [j, ℓj) denote the interval in B that is last requested, i.e., ℓj = maxk∈B ℓk (j exists because B is non-empty). To show that i ≺ j, we check the three conditions of Definition 3.
(1) j < i, because j ∈ Hi (i.e., j is cached at time i);
(2) ℓj > ℓi, because ℓj = maxk∈B ℓk = i + TB > i + Li = ℓi; and
(3) si < sj, because j ∈ B (i.e., sj is bigger than si by assumption).
Having shown that i ≺ j, we can apply Theorem 3, so that xj > 0 implies xi = 1. Because j ∈ Hi, j is cached with xj > 0 and thus xi = 1. Finally, we observe that Li ≠ TB and conclude the theorem’s statement by translating the above implications, xi = 1 ⇐ i ≺ j ⇐ TB > Li, into probability.

P [0 < xi < 1] = 1 − P [xi ∈ {0, 1}] ≤ 1 − P [i ≺ j] ≤ 1 − P [TB > Li] = P [Li > TB] .    (11)
Theorem 4 simplifies the analysis of non-integer xi to the relation of two random variables, Li and TB. While Li is geometrically distributed, TB’s distribution is more involved.
We map TB to the stopping time Tb,p of a generalized CCP with b = |B| different coupon types. The coupon probabilities p follow from the object popularity distribution PM by conditioning
on objects in B. As the object popularities p are not equal in general, characterizing the stopping
time Tb,p is much more challenging than in the classical CCP, where the coupon probabilities are
assumed to be equal. We solve this problem by observing that collecting b coupons under equal
probabilities stops faster than under p. This fact may appear obvious, but it was only recently
shown by Anceaume et al. [5, Theorem 4, p. 415] (the proof is non-trivial). Thus, we can use a
classical CCP to bound the generalized CCP’s stopping time and thus TB.
Lemma 1 (Connection to classical coupon collector problem). For any object popularity distribution PM, and for q = (1/b, . . . , 1/b), using the notation from Theorem 4,

P [TB < l] ≤ P [Tb,q < l]  for any l ≥ 0 .    (12)
The proof of this Lemma simply extends the result by Anceaume et al. to account for requests to
objects not in B. We state the full proof in Section A.2.
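The direction of Lemma 1 (uniform coupon probabilities make the collector finish stochastically faster) can be checked with a quick Monte Carlo sketch; the Zipf-like weights, trial count, and seed below are arbitrary choices of ours:

```python
import random

def ccp_stopping_time(weights, rng):
    """Draw coupons i.i.d. with the given weights until every one of the
    b types has appeared at least once; return the number of draws."""
    b = len(weights)
    remaining, draws = set(range(b)), 0
    while remaining:
        draws += 1
        remaining.discard(rng.choices(range(b), weights=weights)[0])
    return draws

rng = random.Random(42)
b, trials = 10, 2000
uniform = [1.0] * b                         # q = (1/b, ..., 1/b)
skewed = [1.0 / (k + 1) for k in range(b)]  # Zipf-like popularities
avg_q = sum(ccp_stopping_time(uniform, rng) for _ in range(trials)) / trials
avg_p = sum(ccp_stopping_time(skewed, rng) for _ in range(trials)) / trials
print(avg_q, avg_p)   # the uniform collector finishes sooner on average
```

For equal probabilities the expected stopping time is b·H_b (about 29.3 for b = 10), so the uniform average should land near that value, well below the skewed one.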
5.4 Typical objects almost always lead to integer decision variables
We now use the connection to the coupon collector problem to show that almost all of FOO’s
decision variables are integer. Specifically, we exclude a small number of very large and unpopular
objects, and show that the remaining objects are almost always part of a precedence relation, which
forces the corresponding decision variables to become integer. In Section 5.5, we show that the
excluded fraction is diminishingly small.
We start with a definition of the sets of large objects and the set of popular objects.
Definition 5 (Large objects and popular objects). Let N∗ be the time after which the cache needs to evict at least one object. For time i ∈ N∗ . . . N, we define the set of large objects, Bi, and the set of popular objects, Fi.
The set Bi ⊆ Hi consists of the requests to the fraction δ largest objects of Hi (0 < δ < 1). We also define bi = |Bi|, and we write “si < Bi” if si < sj for all j ∈ Bi and “si ≮ Bi” otherwise.
The set Fi consists of those objects k with a request probability ρk which lies above the following threshold.

Fi = { k : ρk ≥ 1 / (bi log log bi) }    (13)
Using the above definitions, we prove that “typical” objects (i.e., popular objects that are not too large) rarely lead to non-integer decision variables as the number of objects M grows large.

Theorem 5 (Typical objects are rarely non-integer). For i ∈ N∗ . . . N, and Bi and Fi from Definition 5,

P [0 < xi < 1 | si < Bi, σi ∈ Fi] → 0  as M → ∞ .    (14)
The intuition is that, as the number of cached objects grows large, it is vanishingly unlikely that all objects in Bi will be requested before a single object is requested again. That is, though there are not many large objects in Bi, there are enough that, following Theorem 4, xi is vanishingly unlikely to be non-integer. The proof of Theorem 5 uses elementary probability but relies on several very technical proofs. We sketch the proof here, and state the formal proofs in Appendices A.3 to A.5.
Proof sketch. Following Theorem 4, it suffices to consider the event Li > TBi.
• (P[Li > TBi] → 0 as hi → ∞): We first condition on Li = l, so that Li and TBi become stochastically independent and we can bound P[Li > TBi] by bounding either P[Li > l] or P[l > TBi]. Specifically, we split l carefully into “small l” and “large l”, and then show that Li is concentrated at small l, and TBi is concentrated at large l. Hence, P[Li > TBi] is negligible.
– (Small l): For small l, we show that it is unlikely for all objects in Bi to have been requested after l requests. We upper bound the distribution of TBi with Tbi,u (Lemma 1). We then show that the distribution of Tbi,u decays exponentially at values below its expectation (Lemma 4). Hence, for l far below the expectation of Tbi,u, the probability vanishes, P[Tbi,u < l] → 0, so long as bi = δ hi grows large, which it does because hi grows large.
– (Large l): For large l, P[Li > l] → 0 because we only consider popular objects σi ∈ Fi by assumption, and it is highly unlikely that a popular object is not requested after many requests.
• (hi → ∞): What remains to be shown is that the number of cached objects hi actually grows large. Since the cache size C → ∞ as M → ∞ by Assumption 3, this may seem obvious.
Nevertheless, it must be demonstrated (Lemma 3). The basic intuition is that hi is almost never much less than h∗ = C/maxk sk, the fewest number of objects that could fit in the cache, and h∗ → ∞ as C → ∞.
To see why hi is almost never much less than h∗, consider the probability that hi < x, where x is constant with respect to M. For any x, take large enough M such that x < h∗. Now, in order for hi < x, almost all requests must go to distinct objects. Any object that is requested twice between u (the last time where hu ≥ h∗) and v (the next time where hv ≥ h∗) produces an interval (see Figure 9). This interval is guaranteed to fit in the cache, since hi < x < h∗ means there is space for an object of any size. As h∗ and M grow further, the amount of cache resources that must lie unused for hi < x grows further and further, and the probability that no interval fits within these resources becomes negligible.
Fig. 9. The number of cached objects at time i, hi, is unlikely to be far below h∗ = C/maxk sk, the fewest number of objects that fit in the cache. In order for hi < h∗ to happen, no other interval must fit in the white triangular space centered at i (otherwise FOO-L would cache the interval).
5.5 Bringing it all together: Proof of Theorem 2
This section combines our results so far and shows how to obtain Theorem 2: There exists M∗ such that for any M > M∗ and for any error ε and violation probability κ,
P [FOO-U − FOO-L ≥ ε N ] ≤ κ (15)
Proof of Theorem 2. We start by bounding the cost of non-integer solutions by the number of non-integer solutions, Ω.

∑_{i : 0 < xi < 1} xi ≤ ∑_{i ∈ I} 1{0 < xi < 1} = Ω    (16)

It follows that (FOO-U − FOO-L) ≤ Ω.

P [FOO-U − FOO-L ≥ ε N] ≤ P [Ω ≥ ε N]    (17)

We apply the Markov inequality.

≤ E[Ω] / (N ε) = (1 / (N ε)) ∑_{i ∈ I} P [0 < xi < 1]    (18)

There are at most N terms in the sum.

≤ P [0 < xi < 1] / ε    (19)
To complete Eq. (15), we upper bound P [0 < xi < 1] to be less than ε κ. We first condition on si < Bi and σi ∈ Fi, double counting those i where si ≮ Bi and σi ∉ Fi.

P [0 < xi < 1] ≤ P [0 < xi < 1 | si < Bi, σi ∈ Fi] P [si < Bi, σi ∈ Fi]    (20)
 + P [0 < xi < 1 | si ≮ Bi] P [si ≮ Bi]    (21)
 + P [0 < xi < 1 | σi ∉ Fi] P [σi ∉ Fi]    (22)

Drop ≤ 1 terms.

≤ P [0 < xi < 1 | si < Bi, σi ∈ Fi] + P [si ≮ Bi] + P [σi ∉ Fi]    (23)

To bound P [0 < xi < 1] ≤ ε κ, we choose parameters such that each term in Eq. (23) is less than ε κ/3. The first term vanishes by Theorem 5. The second term is satisfied by choosing δ = ε κ/3 (Definition 5). For the third term, the probability that any cached object is unpopular vanishes as hi grows large.

P [σi ∉ Fi] ≤ hi / (bi log log bi) = 3 / (ε κ log log(ε κ hi/3)) → 0  as hi → ∞    (24)
Finally, we choose M∗ large enough that the first and third terms in Eq. (23) are each less than ε κ/3. This concludes our theoretical proof of FOO’s optimality.
6 PRACTICAL FLOW-BASED OFFLINE OPTIMAL FOR REAL-WORLD TRACES
While FOO is asymptotically optimal and very accurate in practice, as well as faster than prior
approximation algorithms, it is still not fast enough to process production traces with hundreds
of millions of requests in a reasonable timeframe. We now use the insights gained from FOO’s
graph-theoretic formulation to design new upper and lower bounds on OPT, which we call practical flow-based offline optimal (PFOO). We provide the first practically useful lower bound, PFOO-L, and
an upper bound that is much tighter than prior practical offline upper bounds, PFOO-U:
PFOO-L ≤ FOO-L ≤ OPT ≤ FOO-U ≤ PFOO-U (25)
6.1 Practical lower bound: PFOO-L
PFOO-L considers the total resources consumed by OPT. As Figure 10a illustrates, cache resources
are limited in both space and time [10]: measured in resources, the cost to cache an object is the
product of (i) its size and (ii) its reuse distance (i.e., the number of accesses until it is next requested).
On a trace of length N , a cache of size C has total resources N ×C . The objects cached by OPT, or
any other policy, cannot cost more total resources than this.
Definition of PFOO-L. PFOO-L sorts all intervals by their resource cost, and caches the smallest-cost intervals up to a total resource usage of N × C. Figure 10b shows PFOO-L on a short request
trace. By considering only the total resource usage, PFOO-L ignores other constraints that are faced
by caching policies. In particular, PFOO-L does not guarantee that cached intervals take less than
C space at all times, as shown by interval 6 for object a in Figure 10b, which exceeds the cache
capacity during part of its interval.
Why PFOO-L works. PFOO-L is a lower bound because no policy, including OPT or FOO-L, can
get fewer misses using N ×C total resources. It gives a reasonably tight bound because, on large
caches with many objects, the distribution of interval costs is similar throughout the request trace.
Hence, for a given cache capacity, the “marginal interval” (i.e., the one that barely does not fit in the
cache under OPT) is also of similar cost throughout the trace. Informally, PFOO-L caches intervals
(a) Intervals sorted by resource cost = reuse distance × object size. (b) PFOO-L greedily claims the smallest intervals while not exceeding the average cache size.
Fig. 10. PFOO’s lower bound, PFOO-L, constrains the total resources used over the full trace (i.e., size × time). PFOO-L claims the hits that require fewest resources, allowing cached objects to temporarily exceed the cache capacity.
up to this marginal cost, and so rarely exceeds cache capacity by very much. The intuition behind
PFOO-L is thus similar to the widely used Che approximation [21], which states that LRU caches
keep objects up until some “characteristic age”. But unlike the Che approximation, which has
unbounded error, PFOO-L uses this intuition to produce a robust lower bound. This intuition holds
particularly when requests are largely independent, as in our proof assumptions and in traces from
CDNs or other Internet services. However, as we will see, PFOO-L introduces modest error even
on other workloads where these assumptions do not hold.
PFOO-L uses a similar notion of “cost” as Belady-Size, but provides two key advantages. First,
PFOO-L is closer to OPT. Relaxing the capacity constraint lets PFOO-L avoid the pathologies
discussed in Section 2.4, since PFOO-L can temporarily exceed the cache capacity to retain valuable
objects that Belady-Size is forced to evict. Second, relaxing the capacity constraint makes PFOO-L
a lower bound, giving the first reasonably tight lower bound on long traces.
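PFOO-L’s greedy claim can be sketched in a few lines. The trace encoding and object sizes below are hypothetical, and reuse distance is measured simply in requests:

```python
def pfoo_l_misses(trace, sizes, C):
    """Sketch of PFOO-L: sort intervals by resource cost (size x reuse
    distance) and claim the cheapest ones within the N*C resource budget."""
    N = len(trace)
    last, intervals = {}, []
    for i, o in enumerate(trace):
        if o in last:
            # resource cost of caching o from its previous request until now
            intervals.append((sizes[o] * (i - last[o]), o))
        last[o] = i
    intervals.sort()                 # cheapest intervals first
    budget, hits = N * C, 0
    for cost, _ in intervals:
        if cost > budget:
            break                    # sorted, so nothing cheaper remains
        budget -= cost
        hits += 1
    return N - hits

sizes = {"a": 3, "b": 1, "c": 1, "d": 2}   # hypothetical sizes
trace = ["a", "b", "c", "b", "d", "a", "c", "d", "a", "b", "b", "a"]
print(pfoo_l_misses(trace, sizes, C=3))
```

With this trace the budget is N × C = 36 resources, which admits six of the eight intervals, so the sketch reports six misses; with a very large C every interval is claimed and only the four cold misses remain.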
6.2 Practical upper bound: PFOO-U
Definition of PFOO-U. PFOO-U breaks FOO's min-cost flow graph into smaller segments of
constant size, and then solves each segment using min-cost flow incrementally. By keeping track of
the resource usage of already solved segments, PFOO-U yields a globally-feasible solution, which is
an upper bound on OPT or FOO-U.
Example of PFOO-U. Figure 11 shows our approach on the trace from Figure 2 for a cache capacity
of 3. At the top is FOO’s full min-cost flow problem; for large traces, this MCF problem is too
expensive to solve directly. Instead, PFOO-U breaks the trace into segments and constructs a
min-cost flow problem for each segment.
PFOO-U begins by solving the min-cost flow for the first segment. In this case, the solution is to
cache both b's, c, and one-third of a, since these decisions incur the minimum cost of two-thirds,
i.e., less than one cache miss. As in FOO-U, PFOO-U rounds down the non-integer decision for
a and all following non-integer decisions. Furthermore, PFOO-U only fixes decisions for objects
in the first half of this segment. This is done to capture interactions between intervals that cross
segment boundaries. Hence, PFOO-U “forgets” the decision to cache c and the second b, and its
final decision for this segment is only to cache the first b interval.
PFOO-U then updates the second segment to account for its previous decisions. That is, since b
is cached until the second request to b, capacity must be removed from the min-cost flow to reflect
this allocation. Hence, the capacity along the inner edge c → b is reduced from 3 to 2 in the second
segment (b is size 1). Solving the second segment, PFOO-U decides to cache c and b (as well as half
Fig. 11. Starting from FOO's full formulation, PFOO-U breaks the min-cost flow problem into overlapping
segments. Going left-to-right through the trace, PFOO-U optimally solves MCF on each segment, and updates
link capacities in subsequent segments to maintain feasibility for all cached objects. The segments overlap to
capture interactions across segment boundaries. (a) PFOO-U breaks the trace (a b c b d a c d a b b a) into
small, overlapping Segments 1–5 . . . (b) . . . and solves min-cost flow for each segment. (Segment 1: cache
both b's and c; forget c and the second b. Segment 2: cache c and b. Segment 3: cache c; forget c.)
of d, which is ignored). Since these are in the first half of the segment, we fix both decisions, and
move onto the third segment, updating the capacity of edges to reflect these decisions as before.
PFOO-U continues to solve the following segments in this manner until the full trace is processed.
On the trace from Section 3, it decides to cache all requests to b and c, yielding 5 misses on the
requests to a and d. These are the same decisions as taken by FOO-U and OPT. We generally find
that PFOO-U yields nearly identical miss ratios as FOO-U, as we next demonstrate on real traces.
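The segmentation bookkeeping, as opposed to the per-segment min-cost flow solve itself, can be sketched as follows. This is our own simplified rendering: `solve_segment` is a hypothetical stand-in for the MCF solve, and the exact segment length and overlap are illustrative.

```python
def pfoo_u_schedule(n_requests, seg_len, solve_segment):
    """Control-flow sketch of PFOO-U's segmented solving (names are ours).

    The trace is walked left to right in segments of constant length that
    overlap by half, so intervals crossing a boundary are seen twice. Only
    decisions from the first half of each segment are fixed; the second
    half is re-solved as part of the next segment. `solve_segment(start,
    end, fixed)` stands in for the per-segment min-cost flow solve and
    returns a dict {request position: caching decision}.
    """
    fixed = {}
    start = 0
    half = seg_len // 2
    while start < n_requests:
        end = min(start + seg_len, n_requests)
        decisions = solve_segment(start, end, dict(fixed))
        for pos, dec in decisions.items():
            # Fix first-half decisions (or everything in the last segment).
            if pos < start + half or end == n_requests:
                fixed[pos] = dec
        start += half  # advance by half a segment => overlapping segments
    return fixed
```

In the real algorithm, the `fixed` decisions are reflected by reducing the capacities of the inner edges in subsequent segments, as in the c → b example above.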
6.3 Summary
Putting it all together, PFOO provides efficient lower and upper bounds on OPT with variable
object sizes. PFOO-L runs in O(N log N) time, as required to sort the intervals; and PFOO-U runs
in O(N) because it divides the min-cost flow into segments of constant size. In practice, PFOO-L is
faster than PFOO-U at realistic trace lengths, despite its worse asymptotic runtime, due to the large
constant factor in solving each segment in PFOO-U.
PFOO-L gives a lower bound (PFOO-L ≤ FOO-L, Eq. (25)) because PFOO-L caches all of the
smallest intervals, so no policy can get more hits (i.e., cache more intervals) without exceeding
N × C overall resources. Resources are a hard constraint obeyed by all policies, including FOO-L,
whose resources are constrained by the capacity C of all N − 1 inner edges.
PFOO-U gives an upper bound (FOO-U ≤ PFOO-U, Eq. (25)) because FOO-U and PFOO-U
both satisfy the capacity constraint and the integrality constraint of the ILP formulation of OPT
(Definition 1). By sequentially optimizing the MCF segment after segment, we induce additional
constraints on PFOO-U, which can lead to suboptimal solutions.
7 EXPERIMENTAL METHODOLOGY
We evaluate FOO and PFOO against prior offline bounds and online caching policies on eight
different production traces.
7.1 Trace characterization
We use production traces from three global content-distribution networks (CDNs), two web
applications from different anonymous large Internet companies, and storage workloads from
Microsoft [79]. We summarize the trace characteristics in Table 3. The table shows that our traces
typically span several tens to hundreds of millions of requests and tens of millions of objects.
Figure 12 shows four key distributions of these workloads.
Trace      Year  # Requests  # Objects  Object sizes
CDN 1      2016  500M        18M        10 B – 616 MB
CDN 2      2015  440M        19M        1 B – 1.5 GB
CDN 3      2015  420M        43M        1 B – 2.3 GB
WebApp 1   2017  104M        10M        3 B – 1.9 MB
WebApp 2   2016  100M        14M        5 B – 977 KB
Storage 1  2008  29M         16M        501 B – 780 KB
Storage 2  2008  37M         6M         501 B – 78 KB
Storage 3  2008  45M         14M        501 B – 489 KB

Table 3. Overview of key properties of the production traces used in our evaluation.
The object size distribution (Figure 12a) shows that object sizes are variable in all traces. However,
while they span almost ten orders of magnitude in CDNs, object sizes vary only by six orders of
magnitude in web applications, and only by three orders of magnitude in storage systems. WebApp
1 also has noticeably smaller object sizes throughout, as is representative of application-cache
workloads.
The popularity distribution (Figure 12b) shows that CDN workloads and WebApp workloads
all follow approximately a Zipf distribution with α between 0.85 and 1. In contrast, the popularity
distribution of storage traces is much more irregular with a set of disproportionally popular objects,
an approximately log-linear middle part, and an exponential cutoff for the least popular objects.
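For readers who want to reproduce the Zipf observation, the exponent α can be estimated by a least-squares fit of log request count against log popularity rank. This sketch is our own; the paper does not prescribe a particular fitting method.

```python
import math

def estimate_zipf_alpha(request_counts):
    """Estimate the Zipf exponent alpha by least-squares regression of
    log(request count) on log(popularity rank)."""
    counts = sorted(request_counts, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope  # Zipf: count ~ rank^(-alpha), so the slope is -alpha
```

On our CDN and WebApp traces, this kind of fit yields α between 0.85 and 1, while the storage traces deviate visibly from a straight line in log-log space.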
The reuse distance distribution (Figure 12c) — i.e., the distribution of the number of requests
between requests to the same object — further distinguishes CDN and WebApp traces from stor-
age workloads. CDNs and WebApps serve millions of different customers and so exhibit largely
independent requests with smoothly diminishing object popularities, which matches our proof
assumptions. Thus, the CDN and WebApp traces lead to a smooth reuse distance, as shown in
the figure. In contrast, storage workloads serve requests from one or a few applications, and so
often exhibit highly correlated requests (producing spikes in the reuse distance distribution). For
example, scans are common in storage (e.g., traces like: ABCDABCD ...), but never seen in CDNs.
This is evident from the figure, where the storage traces exhibit several steps in their cumulative
request probability, as correlated objects (e.g., due to scans) have the same reuse distance.
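The reuse distance used here is straightforward to compute in one pass over a trace. The sketch below is our own; it also shows why scans produce steps: in a scan such as ABCDABCD..., every reuse occurs at exactly the same distance.

```python
from collections import Counter

def reuse_distances(trace):
    """Number of requests between successive requests to the same object."""
    last = {}
    dists = []
    for i, obj in enumerate(trace):
        if obj in last:
            dists.append(i - last[obj])
        last[obj] = i
    return dists

# A storage-like scan: every reuse happens at exactly the same distance,
# which appears as a step in the cumulative reuse distance distribution.
scan = list("ABCDABCDABCD")
print(Counter(reuse_distances(scan)))  # every reuse distance equals 4
```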
Finally, we measure the correlation across different objects (Figure 12d). Ideally, we could directly
test our independence assumption (Assumption 2). Unfortunately, quantifying independence on
real traces is challenging. For example, classical methods such as Hoeffding’s independence test
[47] only apply to continuous distributions, whereas we consider cache requests in discrete time.
Fig. 12. The production traces used in our evaluation come from three different domains (CDNs, WebApps,
and storage) and thus exhibit starkly different request patterns. (a) Object sizes: the object size distribution
in CDN traces spans more than nine orders of magnitude, whereas object sizes in WebApp and Storage traces
are much smaller. (b) Popularities: object popularities in CDNs and WebApps follow approximately Zipf
popularities, whereas the popularity distribution for storage traces is more complex. (c) Reuse distances: the
reuse distance distribution of CDNs and WebApps is smooth, whereas in the Storage traces there are jumps,
indicating correlated request sequences such as scans. (d) Correlations: the distribution of Pearson correlation
coefficients for the top 10k-most popular objects shows that CDN and WebApp traces do not exhibit linear
correlations, whereas the three storage traces show distinct patterns with significant positive correlation.
One of the storage traces (Storage 2) in fact shows a correlation coefficient greater than 0.5 for all 10k-most
popular objects.
We therefore turn to correlation coefficients. Specifically, we use the Pearson correlation coeffi-
cient as it is computable in linear time (as opposed to Spearman’s rank and Kendall’s tau [30]). We
define the coefficient based on the number of requests each object receives in a time bucket that
spans 4000 requests (we verified that the results do not change significantly for time bucket sizes
in the range 400–40k requests). In order to capture pair-wise correlations, we chose the top 10k
objects in each trace, calculated the request counts for all time buckets, and then calculated the
Pearson coefficient for all possible combinations of object pairs.
Figure 12d shows the distribution of coefficients for all object pair combinations. We find that
both CDN and WebApps do not show a significant correlation; over 95% of the object pairs have a
correlation coefficient between −0.25 and 0.25. In contrast, we find strong positive correlations in
the storage traces. For the first storage trace, we measure a correlation coefficient greater than 0.5
for all 10k-most popular objects. For the second storage trace, we measure a correlation coefficient
greater than 0.5 for more than 20% of the object pairs. And, for the third storage trace, we measure
a correlation coefficient greater than 0.5 for more than 14% of the object pairs. We conclude that the
simple Pearson correlation coefficient is sufficiently powerful to quantify the correlation inherent
to storage traces (such as loops and scans).
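A minimal reimplementation of this measurement might look as follows. This is our own sketch of the procedure described above (bucketed request counts plus a hand-rolled Pearson coefficient), not the paper's released code; the bucket size of 4 and the toy trace are purely illustrative.

```python
def bucket_counts(trace, obj, bucket_size):
    """Requests to `obj` within each time bucket of `bucket_size` requests."""
    n_buckets = (len(trace) + bucket_size - 1) // bucket_size
    counts = [0] * n_buckets
    for i, o in enumerate(trace):
        if o == obj:
            counts[i // bucket_size] += 1
    return counts

def pearson(xs, ys):
    """Pearson correlation coefficient, computable in linear time."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Two objects that are always requested together (a storage-like loop)
# are perfectly positively correlated across time buckets.
trace = ["a", "b", "x", "x"] * 3 + ["x"] * 8
ca = bucket_counts(trace, "a", 4)
cb = bucket_counts(trace, "b", 4)
```

In the real measurement, this pairwise computation is repeated for all pairs among the top 10k objects of each trace.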
7.2 Caching policies
We evaluate three classes of policies: theoretical bounds on OPT, practical offline heuristics, and
online caching policies. Besides FOO, there exist three other theoretical bounds on OPT with
approximation guarantees: OFMA, LocalRatio, and LP (Section 2.3). Besides PFOO-U, we consider
three other practical upper bounds: Belady, Belady-Size, and Freq/Size (Section 2.4). Besides PFOO-L,
there is only one other practical lower bound: a cache with infinite capacity (Infinite-Cap). Finally,
for online policies, we evaluated GDSF [22], GD-Wheel [60], AdaptSize [16], Hyperbolic [17],
and several other older policies which perform much worse on our traces (including LRU-K [71],
TLFU [34], SLRU [48], and LRU).
Our implementations are in C++ and use the COIN-OR::LEMON library [70], GNU parallel [81],
OpenMP [27], and CPLEX 12.6.1.0. OFMA runs in O(N^2), LocalRatio runs in O(N^3), and Belady in
O(N log C). We rely on sampling [74] to run Belady-Size on large traces, which gives us an O(N)
implementation. We will publicly release our policy implementations and evaluation infrastructure
upon publication of this work. To the best of our knowledge, these include the first implementations
of the prior theoretical offline bounds.
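As a reference point for these offline baselines, classic Belady can be implemented with precomputed next-use times: on a miss with a full cache, evict the object requested farthest in the future. The sketch below is our own and assumes unit-size objects (Belady-Size additionally weighs object sizes, and we use sampling for it on large traces).

```python
def belady_misses(trace, capacity):
    """Belady's offline MIN for unit-size objects: on a miss with a full
    cache, evict the cached object whose next use is farthest away."""
    # Precompute, for each request, when the same object is requested next.
    next_use = [float("inf")] * len(trace)
    last = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = last.get(trace[i], float("inf"))
        last[trace[i]] = i

    cache = {}  # object -> position of its next use
    misses = 0
    for i, obj in enumerate(trace):
        if obj in cache:
            cache[obj] = next_use[i]  # hit: update next-use time
        else:
            misses += 1
            if len(cache) >= capacity:
                victim = max(cache, key=cache.get)  # farthest next use
                del cache[victim]
            cache[obj] = next_use[i]
    return misses
```

The `max` scan makes this sketch quadratic in the worst case; a priority queue keyed on next-use times gives the O(N log C) bound stated above.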
7.3 Evaluation metrics
We compare each policy's miss ratio, which ranges from 0 to 1. We present absolute error as unitless
scalars and relative error as percentages. For example, if FOO-U’s miss ratio is 0.20 and FOO-L’s is
0.15, then absolute error is 0.05 and relative error is 33%.
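For concreteness, the numbers in this example work out as follows (a quick check of the two definitions):

```python
# Worked example of the error metrics: FOO-U = 0.20, FOO-L = 0.15.
foo_u, foo_l = 0.20, 0.15
absolute_error = foo_u - foo_l           # 0.05, a unitless scalar
relative_error = absolute_error / foo_l  # ~0.33, reported as 33%
print(f"absolute: {absolute_error:.2f}, relative: {relative_error:.0%}")
```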
We present a miss ratio curve for each trace, which shows the miss ratio achieved by different
policies at different cache capacities. We present these curves in log-linear scale to study a wide
range of cache capacities. Miss ratio curves allow us to compare miss ratios achieved by different
policies at fixed cache capacities, as well as compute cache capacities that achieve equivalent miss
ratios.
8 EVALUATION
We evaluate FOO and PFOO to demonstrate the following: (i) PFOO is fast enough to process real
traces, whereas FOO and prior theoretical bounds are not; (ii) FOO yields nearly tight bounds on
OPT, even when our proof assumptions do not hold; (iii) PFOO is highly accurate on full production
traces; and (iv) PFOO reveals that there is significantly more room for improving current caching
systems than implied by prior offline bounds.
8.1 PFOO is necessary to process real traces
Figure 13 shows the execution time of FOO, PFOO, and prior theoretical offline bounds at different
trace lengths. Specifically, we run each policy on the first N requests of the CDN 1 trace, and vary
N from a few thousand to over 30 million. Each policy ran alone on a 2016 SuperMicro server with
44 Intel Xeon E5-2699 cores and 500GB of memory.
These results show that LP and LocalRatio are unusable: they can process only a few hundred
thousand requests in a 24-hour period, and their execution time increases rapidly as traces lengthen.
While FOO and OFMA are faster, they both take more than 24 hours to process more than 30
million requests, and their execution times increase super-linearly.
Finally, PFOO is much faster and scales well, allowing us to process traces with hundreds of
millions of requests. PFOO’s lower bound completes in a few minutes, and while PFOO’s upper
bound is slower, it scales linearly with trace length. PFOO is thus the only bound that completes in
reasonable time on real traces.
Fig. 13. Execution time of FOO, PFOO, and prior theoretical offline bounds at different trace lengths. Most
prior bounds are unusable above 500K requests. Only PFOO can process real traces with many millions of
requests.
8.2 FOO is nearly exact on short traces
We compare FOO, PFOO, and prior theoretical upper bounds on the first 10 million requests of
each trace. Of the prior theoretical upper bounds, only OFMA runs in a reasonable time at this
trace length,4 so we compare FOO, PFOO, OFMA, the Belady variants, Freq/Size, and Infinite-Cap.
Our first finding is that FOO-U and FOO-L are nearly identical, as predicted by our analysis.
The largest difference between FOO-U’s and FOO-L’s miss ratio on CDN and WebApp traces is
0.0005—a relative error of 0.15%. Even on the storage traces, where requests are highly correlated
and hence our proof assumptions do not hold, the largest difference is 0.0014—a relative error
of 0.27%. Compared to the other offline bounds, FOO is at least an order of magnitude and often
several orders of magnitude more accurate.
Given FOO’s high accuracy, we use FOO to estimate the error of the other offline bounds.
Specifically, we assume that OPT lies in the middle between FOO-U and FOO-L. Since the difference
between FOO-U and FOO-L is so small, this adds negligible error (less than 0.14%) to all other
results.
Figure 14 shows the maximum error from OPT across five cache sizes on our first CDN production
trace. All upper bounds are shown with a bar extending above OPT, and all lower bounds are shown
with a bar extending below OPT. Note that the practical offline upper bounds (e.g., Belady) do not
have corresponding lower bounds. Likewise, there is no upper bound corresponding to Infinite-Cap.
Also note that OFMA’s bars are so large that they extend above and below the figure. We have
therefore annotated each bar with its absolute error from OPT.
The figure shows that FOO-U and FOO-L nearly coincide, with error of 0.00003 (=3e−5) on this
trace. PFOO-U is 0.00005 (=5e−5) above OPT, nearly matching FOO-U, and PFOO-L is 0.007 below
OPT, which is very accurate though worse than FOO-L.
All prior techniques yield error several orders of magnitude larger. OFMA has very high error:
its bounds are 0.72 above and 0.39 below OPT. The practical upper bounds are more accurate than
OFMA: Belady is 0.18 above OPT, Belady-Size 0.05, and Freq/Size 0.05. Finally, Infinite-Cap is 0.19
below OPT. Prior to FOO and PFOO, the best bounds for OPT give a broad range of up to 0.24.
PFOO and FOO reduce error by 34× and 4000×, respectively.
4 While we tried downsampling the traces to run LP and LocalRatio (as suggested in [11, 57, 84] for objects with equal
sizes), we were unable to achieve meaningful results. Under variable object sizes, scaling down the system (including the
cache capacity) makes large objects disproportionately disruptive.
Fig. 14. Comparison of the maximum approximation error of FOO, PFOO, and prior offline upper and lower
bounds across five cache sizes on a CDN production trace. FOO's upper and lower bounds are nearly identical,
with a gap of less than 0.0005. We therefore assume that OPT is halfway between FOO-U and FOO-L, which
introduces negligible error due to FOO's high accuracy. PFOO likewise introduces small error, with PFOO-U
having a similar accuracy to FOO-U. PFOO-L leads to higher errors than FOO-L but is still highly accurate,
within 0.007. In contrast, all prior policies lead to errors several orders of magnitude larger.
Figures 15 and 16 show the approximation error on all eight production traces. The prior upper
bounds are much worse than PFOO-U, except on one trace (Storage 1), where Belady-Size and
Freq/Size are somewhat accurate. Averaging across all traces, PFOO-U is 0.0014 above OPT. PFOO-U
reduces mean error by 37× over Belady-Size, the best prior upper bound. Prior work gives even
weaker lower bounds. PFOO-L is on average 0.004 below OPT on the CDN traces, 0.02 below OPT
on the WebApp traces, and 0.04 below OPT on the storage traces. PFOO-L reduces mean error by
9.8× over Infinite-Cap and 27× over OFMA. Hence, across a wide range of workloads, PFOO is by
far the best practical bound on OPT.
8.3 PFOO is accurate on real traces
Now that we have seen that FOO is accurate on short traces, we next show that PFOO is accurate
on long traces. Figure 17 shows the miss ratio over the full traces achieved by PFOO, the best
prior practical upper bounds (the best of Belady, Belady-Size, and Freq/Size), the Infinite-Cap lower
bound, and the best online policy (see Section 7).
On average, PFOO-U and PFOO-L bound the optimal miss ratio within a narrow range of 4.2%.
PFOO’s bounds are tighter on the CDN and WebApp traces than the storage traces: PFOO gives an
average bound of just 1.4% on CDN 1-3 and 1.3% on WebApp 1-2 but 5.7% on Storage 1-3. This is
likely due to error in PFOO-L when requests are highly correlated, as they are in the storage traces.
Nevertheless, PFOO gives much tighter bounds than prior techniques on every trace. The prior
offline upper bounds are noticeably higher than PFOO-U. On average, compared to PFOO-U,
Belady-Size is 19% higher, Freq/Size is 22% higher, and Belady is fully 72% higher. These prior upper
bounds are not, therefore, good proxies for the offline optimal. Moreover, the best upper bound
varies across traces: Freq/Size is lower on CDN 1 and CDN 3, but Belady-Size is lower on the others.
Unmodified Belady gives a very poor upper bound, showing that caching policies must account for
object size. The only lower bound in prior work is an infinitely large cache, whose miss ratio is
much lower than PFOO-L. PFOO thus gives the first reasonably tight bounds on the offline miss ratio
for real traces.
Fig. 15. Approximation error of FOO, PFOO, and several prior offline bounds on four of our eight production
traces (CDN 1, CDN 2, CDN 3, and WebApp 1; Figure 16 shows the other four). FOO and PFOO's lower and
upper bounds are orders of magnitude better than any other offline bound. (See Figure 14.)
Fig. 16. Approximation error of FOO, PFOO, and several prior offline bounds on four of our eight production
traces (WebApp 2, Storage 1, Storage 2, and Storage 3; Figure 15 shows the other four). FOO and PFOO's lower
and upper bounds are orders of magnitude better than any other offline bound. (See Figure 14.)
8.4 PFOO shows that there is significant room for improvement in online policies
Finally, we compare with online caching policies. Figure 17 shows the best online policy (by average
miss ratio) and the offline bounds for each trace. We also show LRU for reference on all traces.
On all traces at most cache capacities, there is a large gap between the best online policy and
PFOO-U, showing that there remains significant room for improvement in online caching policies.
Fig. 17. Miss ratio curves for PFOO vs. LRU, Infinite-Cap, the best prior offline upper bound, and the best
online policy for our eight production traces: (a) CDN 1 (GDWheel, Freq/Size), (b) CDN 2 (GDSF, Belady-Size),
(c) CDN 3 (GDSF, Freq/Size), (d) WebApp 1 (GDWheel, Belady-Size), (e) WebApp 2 (GDWheel, Belady-Size),
(f) Storage 1 (Hyperbolic, Belady-Size), (g) Storage 2 (AdaptSize, Belady-Size), and (h) Storage 3 (GDWheel,
Belady-Size); the parenthesized policies are, respectively, the best online policy and the best prior offline
bound shown in each panel.
Moreover, this gap is much larger than prior offline bounds would suggest. On average, PFOO-U
achieves 27% fewer misses than the best online policy, whereas the best prior offline policy achieves
only 7.2% fewer misses; the miss ratio gap between online policies and offline optimal is thus 3.75×
as large as implied by prior bounds. The storage traces are the only ones where PFOO does not
consistently increase this gap vs. prior offline bounds, but even on these traces there is a large
difference at some sizes (e.g., at 64 GB in Figure 17g). On CDN and WebApp traces, the gap is much
larger.
For example, on CDN 2, GDSF (the best online policy) matches Belady-Size (the best prior offline
upper bound) at most cache capacities. One would therefore conclude that existing online policies
are nearly optimal, but PFOO-U reveals that there is in fact a large gap between GDSF and OPT on
this trace, as it is 21% lower on average (refer back to Figure 1).
These miss ratio reductions make a large difference in real systems. For example, on CDN 2,
CDN 3, and WebApp 1, OPT requires just 16GB to match the miss ratio of the best prior offline
bound at 64GB (the x-axis in these figures is shown in log-scale). Prior bounds thus suggest that
online policies require 4× as much cache space as is necessary.
9 CONCLUSION AND FUTURE WORK
We began this paper by asking: Should the systems community continue trying to improve miss ratios,
or have all achievable gains been exhausted? We have answered this question by developing new
techniques, FOO and PFOO, to accurately and quickly estimate OPT with variable object sizes. Our
techniques reveal that prior bounds for OPT lead to qualitatively wrong conclusions about the
potential for improving current caching systems. Prior bounds indicate that current systems are
nearly optimal, whereas PFOO reveals that misses can be reduced by up to 43%.
Moreover, since FOO and PFOO yield constructive solutions, they offer a new path to improve
caching systems. While it is outside the scope of this work, we plan to investigate adaptive caching
systems that tune their parameters online by learning from PFOO, e.g., by recording a short window
of past requests and running PFOO to estimate OPT’s behavior in real time. For example, AdaptSize,
a caching system from 2017, assumes that admission probability should decay exponentially with
object size. This is unjustified, and we can learn an optimized admission function from PFOO’s
decisions.
This paper gives the first principled way to evaluate caching policies with variable object sizes:
FOO gives the first asymptotically exact, polynomial-time bounds on OPT, and PFOO gives the
first practical and accurate bounds for long traces. Furthermore, our results are verified on eight
production traces from several large Internet companies, including CDN, web application, and
storage workloads, where FOO reduces approximation error by 4000×. We anticipate that FOO and
PFOO will prove important tools in the design of future caching systems.
10 ACKNOWLEDGMENTS
We thank Jens Schmitt, our shepherd Niklas Carlsson, and our anonymous reviewers for their
valuable feedback.
Daniel S. Berger was supported by the Mark Stehlik Postdoctoral Teaching Fellowship of the
School of Computer Science at Carnegie Mellon University. Nathan Beckmann was supported by a
Google Faculty Research Award. Mor Harchol-Balter was supported by NSF-CMMI-1538204 and
NSF-XPS-1629444.
REFERENCES[1] Marc Abrams, C. R. Standridge, Ghaleb Abdulla, S. Williams, and Edward A. Fox. 1995. Caching Proxies: Limitations
and Potentials. Technical Report. Virginia Polytechnic Institute & State University Blacksburgh, VA.
Proc. ACM Meas. Anal. Comput. Syst., Vol. 2, No. 2, Article 32. Publication date: June 2018.
Practical Bounds on Optimal Caching with Variable Object Sizes 32:29
[2] Alfred V Aho, Peter J Denning, and Jeffrey D Ullman. 1971. Principles of optimal page replacement. J. ACM 18, 1
(1971), 80–93.
[3] Ravindra K Ahuja, Thomas L Magnanti, and James B Orlin. 1993. Network flows: theory, algorithms, and applications.Prentice hall.
[4] Susanne Albers, Sanjeev Arora, and Sanjeev Khanna. 1999. Page replacement for general caching problems. In SODA.31–40.
[5] Emmanuelle Anceaume, Yann Busnel, and Bruno Sericola. 2015. New results on a generalized coupon collector problem
using Markov chains. Journal of Applied Probability 52, 2 (2015), 405–418.
[6] Omri Bahat and Armand M Makowski. 2003. Optimal replacement policies for nonuniform cache objects with optional
eviction. In IEEE INFOCOM. 427–437.
[7] Amotz Bar-Noy, Reuven Bar-Yehuda, Ari Freund, Joseph Naor, and Baruch Schieber. 2001. A unified approach to
approximating resource allocation and scheduling. J. ACM 48, 5 (2001), 1069–1090.
[8] Ruben Becker, Maximilian Fickert, and Andreas Karrenbauer. [n. d.]. A Novel Dual Ascent Algorithm for Solving theMin-Cost Flow Problem. Chapter 12, 151–159.
[9] Ruben Becker and Andreas Karrenbauer. 2013. A Combinatorial O(m 3/2)-time Algorithm for the Min-Cost Flow
Problem. arXiv preprint arXiv:1312.3905 (2013).
[10] Nathan Beckmann, Haoxian Chen, and Asaf Cidon. 2018. LHD: Improving Hit Rate by Maximizing Hit Density. In USENIX NSDI. 1–14.
[11] Nathan Beckmann and Daniel Sanchez. 2015. Talus: A Simple Way to Remove Cliffs in Cache Performance. In IEEE HPCA. 64–75.
[12] Nathan Beckmann and Daniel Sanchez. 2016. Modeling cache behavior beyond LRU. In IEEE HPCA. 225–236.
[13] L. A. Belady. 1966. A study of replacement algorithms for a virtual-storage computer. IBM Systems Journal 5 (1966), 78–101.
[14] Daniel S. Berger, Philipp Gland, Sahil Singla, and Florin Ciucu. 2014. Exact analysis of TTL cache networks. Perform. Eval. 79 (2014), 2–23. Special Issue: Performance 2014.
[15] Daniel S. Berger, Sebastian Henningsen, Florin Ciucu, and Jens B. Schmitt. 2015. Maximizing cache hit ratios by variance reduction. ACM SIGMETRICS Performance Evaluation Review 43, 2 (2015), 57–59.
[16] Daniel S. Berger, Ramesh Sitaraman, and Mor Harchol-Balter. 2017. AdaptSize: Orchestrating the Hot Object Memory Cache in a CDN. In USENIX NSDI. 483–498.
[17] Aaron Blankstein, Siddhartha Sen, and Michael J. Freedman. 2017. Hyperbolic Caching: Flexible Caching for Web Applications. In USENIX ATC. 499–511.
[18] Béla Bollobás and Graham Brightwell. 1992. The height of a random partial order: concentration of measure. The Annals of Applied Probability (1992), 1009–1018.
[19] Lee Breslau, Pei Cao, Li Fan, Graham Phillips, and Scott Shenker. 1999. Web caching and Zipf-like distributions: Evidence and implications. In IEEE INFOCOM. 126–134.
[20] P. J. Burville and J. F. C. Kingman. 1973. On a model for storage and search. Journal of Applied Probability (1973), 697–701.
[21] Hao Che, Ye Tung, and Zhijun Wang. 2002. Hierarchical web caching systems: Modeling, design and experimental results. IEEE JSAC 20 (2002), 1305–1314.
[22] Ludmila Cherkasova. 1998. Improving WWW proxies performance with greedy-dual-size-frequency caching policy. Technical Report. Hewlett-Packard Laboratories.
[23] Marek Chrobak, Gerhard J. Woeginger, Kazuhisa Makino, and Haifeng Xu. 2012. Caching is hard—even in the fault model. Algorithmica 63 (2012), 781–794.
[24] Asaf Cidon, Assaf Eisenman, Mohammad Alizadeh, and Sachin Katti. 2016. Cliffhanger: Scaling Performance Cliffs in Web Memory Caches. In USENIX NSDI. 379–392.
[25] Edward Grady Coffman and Peter J. Denning. 1973. Operating Systems Theory. Prentice-Hall.
[26] Edward Grady Coffman and Predrag Jelenković. 1999. Performance of the move-to-front algorithm with Markov-modulated request sequences. Operations Research Letters 25 (1999), 109–118.
[27] Leonardo Dagum and Ramesh Menon. 1998. OpenMP: an industry standard API for shared-memory programming. IEEE Computational Science and Engineering 5, 1 (1998), 46–55.
[28] Samuel I. Daitch and Daniel A. Spielman. 2008. Faster approximate lossy generalized flow via interior point algorithms. In ACM STOC. 451–460.
[29] Asit Dan and Don Towsley. 1990. An Approximate Analysis of the LRU and FIFO Buffer Replacement Schemes. In ACM SIGMETRICS. 143–152.
[30] Wayne W. Daniel. 1978. Applied Nonparametric Statistics. Houghton Mifflin.
[31] Robert P. Dobrow and James Allen Fill. 1995. The move-to-front rule for self-organizing lists with Markov dependent requests. In Discrete Probability and Algorithms. Springer, 57–80.
Proc. ACM Meas. Anal. Comput. Syst., Vol. 2, No. 2, Article 32. Publication date: June 2018.
32:30 Daniel S. Berger, Nathan Beckmann, and Mor Harchol-Balter
[32] Aristides V. Doumas and Vassilis G. Papanicolaou. 2012. The coupon collector's problem revisited: asymptotics of the variance. Advances in Applied Probability 44, 1 (2012), 166–195.
[33] Ding-Zhu Du and Panos M. Pardalos. 2013. Handbook of Combinatorial Optimization (2nd ed.). Springer.
[34] Gil Einziger and Roy Friedman. 2014. TinyLFU: A highly efficient cache admission policy. In IEEE Euromicro PDP. 146–153.
[35] William Feller. 2008. An Introduction to Probability Theory and Its Applications. Vol. 2. John Wiley & Sons.
[36] Andrés Ferragut, Ismael Rodríguez, and Fernando Paganini. 2016. Optimizing TTL caches under heavy-tailed demands. In ACM SIGMETRICS. 101–112.
[37] James Allen Fill and Lars Holst. 1996. On the distribution of search cost for the move-to-front rule. Random Structures & Algorithms 8 (1996), 179–186.
[38] Philippe Flajolet, Daniele Gardy, and Loÿs Thimonier. 1992. Birthday paradox, coupon collectors, caching algorithms and self-organizing search. Discrete Applied Mathematics 39 (1992), 207–229.
[39] Lukáš Folwarczny and Jiří Sgall. 2017. General caching is hard: Even with small pages. Algorithmica 79, 2 (2017), 319–339.
[40] Christine Fricker, Philippe Robert, and James Roberts. 2012. A versatile and accurate approximation for LRU cache performance. In ITC. 8–16.
[41] Massimo Gallo, Bruno Kauffmann, Luca Muscariello, Alain Simonian, and Christian Tanguy. 2012. Performance evaluation of the random replacement policy for networks of caches. In ACM SIGMETRICS/PERFORMANCE. 395–396.
[42] Nicolas Gast and Benny Van Houdt. 2015. Transient and steady-state regime of a family of list-based cache replacement algorithms. In ACM SIGMETRICS. 123–136.
[43] Erol Gelenbe. 1973. A unified approach to the evaluation of a class of replacement algorithms. IEEE Trans. Comput. 100 (1973), 611–618.
[44] Binny S. Gill. 2008. On multi-level exclusive caching: offline optimality and why promotions are better than demotions. In USENIX FAST. 4–18.
[45] W. J. Hendricks. 1972. The stationary distribution of an interesting Markov chain. Journal of Applied Probability (1972), 231–233.
[46] Peter Hillmann, Tobias Uhlig, Gabi Dreo Rodosek, and Oliver Rose. 2016. Simulation and optimization of content delivery networks considering user profiles and preferences of internet service providers. In IEEE Winter Simulation Conference. 3143–3154.
[47] Wassily Hoeffding. 1948. A non-parametric test of independence. The Annals of Mathematical Statistics (1948), 546–557.
[48] Qi Huang, Ken Birman, Robbert van Renesse, Wyatt Lloyd, Sanjeev Kumar, and Harry C. Li. 2013. An analysis of Facebook photo caching. In ACM SOSP. 167–181.
[49] Stratis Ioannidis and Edmund Yeh. 2016. Adaptive caching networks with optimality guarantees. In ACM SIGMETRICS. 113–124.
[50] Sandy Irani. 1997. Page replacement with multi-size pages and applications to web caching. In ACM STOC. 701–710.
[51] Akanksha Jain and Calvin Lin. 2016. Back to the future: leveraging Belady's algorithm for improved cache replacement. In ACM/IEEE ISCA. 78–89.
[52] Predrag R. Jelenković. 1999. Asymptotic approximation of the move-to-front search cost distribution and least-recently used caching fault probabilities. The Annals of Applied Probability 9 (1999), 430–464.
[53] Predrag R. Jelenković and Ana Radovanović. 2004. Least-recently-used caching with dependent requests. Theoretical Computer Science 326 (2004), 293–327.
[54] Predrag R. Jelenković and Ana Radovanović. 2004. Optimizing LRU caching for variable document sizes. Combinatorics, Probability and Computing 13 (2004), 627–643.
[55] Mahesh Kallahalla and Peter J. Varman. 2002. PC-OPT: optimal offline prefetching and caching for parallel I/O systems. IEEE Trans. Comput. 51, 11 (2002), 1333–1344.
[56] Brian Karrer and Mark E. J. Newman. 2009. Random graph models for directed acyclic networks. Physical Review E 80, 4 (2009), 046110.
[57] Richard E. Kessler, Mark D. Hill, and David A. Wood. 1994. A comparison of trace-sampling techniques for multi-megabyte caches. IEEE Trans. Comput. 43, 6 (1994), 664–675.
[58] W. Frank King. 1971. Analysis of Demand Paging Algorithms. In IFIP Congress (1). 485–490.
[59] Christos Koufogiannakis and Neal E. Young. 2014. A nearly linear-time PTAS for explicit fractional packing and covering linear programs. Algorithmica 70, 4 (2014), 648–674.
[60] Conglong Li and Alan L. Cox. 2015. GD-Wheel: a cost-aware replacement policy for key-value stores. In EUROSYS. 1–15.
[61] Suoheng Li, Jie Xu, Mihaela van der Schaar, and Weiping Li. 2016. Popularity-driven content caching. In IEEE INFOCOM. 1–9.
[62] Bruce M. Maggs and Ramesh K. Sitaraman. 2015. Algorithmic nuggets in content delivery. ACM SIGCOMM CCR 45 (2015), 52–66.
[63] Valentina Martina, Michele Garetto, and Emilio Leonardi. 2014. A unified approach to the performance analysis of caching systems. In IEEE INFOCOM. 12–24.
[64] Richard L. Mattson, Jan Gecsei, Donald R. Slutz, and Irving L. Traiger. 1970. Evaluation techniques for storage hierarchies. IBM Systems Journal 9 (1970), 78–117.
[65] John McCabe. 1965. On serial files with relocatable records. Operations Research 13 (1965), 609–618.
[66] Pierre Michaud. 2016. Some mathematical facts about optimal cache replacement. ACM Transactions on Architecture and Code Optimization (TACO) 13, 4 (2016), 50.
[67] John Moriarty and Peter Neal. 2008. The Generalized Coupon Collector Problem. Journal of Applied Probability 45 (2008), 621–629.
[68] Giovanni Neglia, Damiano Carra, and Pietro Michiardi. 2018. Cache policies for linear utility maximization. IEEE/ACM Transactions on Networking 26, 1 (2018), 302–313.
[69] Rajesh Nishtala, Hans Fugal, Steven Grimm, Marc Kwiatkowski, Herman Lee, Harry C. Li, Ryan McElroy, Mike Paleczny, Daniel Peek, Paul Saab, et al. 2013. Scaling memcache at Facebook. In USENIX NSDI. 385–398.
[70] Egerváry Research Group on Combinatorial Optimization. 2015. COIN-OR::LEMON Library. Available at http://lemon.cs.elte.hu/trac/lemon, accessed 10/21/17.
[71] Elizabeth J. O'Neil, Patrick E. O'Neil, and Gerhard Weikum. 1993. The LRU-K page replacement algorithm for database disk buffering. ACM SIGMOD 22, 2 (1993), 297–306.
[72] Elizabeth J. O'Neil, Patrick E. O'Neil, and Gerhard Weikum. 1999. An optimality proof of the LRU-K page replacement algorithm. JACM 46 (1999), 92–112.
[73] Antonis Panagakis, Athanasios Vaios, and Ioannis Stavrakakis. 2008. Approximate analysis of LRU in the case of short term correlations. Computer Networks 52 (2008), 1142–1152.
[74] Konstantinos Psounis and Balaji Prabhakar. 2001. A randomized web-cache replacement scheme. In IEEE INFOCOM, Vol. 3. 1407–1415.
[75] Konstantinos Psounis, An Zhu, Balaji Prabhakar, and Rajeev Motwani. 2004. Modeling correlations in web traces and implications for designing replacement policies. Computer Networks 45 (2004), 379–398.
[76] Eliane R. Rodrigues. 1995. The performance of the move-to-front scheme under some particular forms of Markov requests. Journal of Applied Probability 32, 4 (1995), 1089–1102.
[77] Samta Shukla and Alhussein A. Abouzeid. 2017. Optimal Device-Aware Caching. IEEE Transactions on Mobile Computing 16, 7 (2017), 1994–2007.
[78] Ramesh K. Sitaraman, Mangesh Kasbekar, Woody Lichtenstein, and Manish Jain. 2014. Overlay networks: An Akamai perspective. In Advanced Content Delivery, Streaming, and Cloud Services. John Wiley & Sons.
[79] Storage Networking Industry Association (SNIA). 2008. MSR Cambridge Traces. http://iotta.snia.org/traces/388.
[80] David Starobinski and David Tse. 2001. Probabilistic methods for web caching. Perform. Eval. 46 (2001), 125–137.
[81] O. Tange. 2011. GNU Parallel - The Command-Line Power Tool. ;login: The USENIX Magazine 36, 1 (Feb 2011), 42–47.
[82] Naoki Tsukada, Ryo Hirade, and Naoto Miyoshi. 2012. Fluid limit analysis of FIFO and RR caching for independent reference model. Perform. Eval. 69 (Sept. 2012), 403–412.
[83] Vijay V. Vazirani. 2013. Approximation Algorithms. Springer Science & Business Media.
[84] Carl Waldspurger, Trausti Saemundsson, Irfan Ahmad, and Nohhyun Park. 2017. Cache Modeling and Optimization using Miniature Simulations. In USENIX ATC. 487–498.
[85] Weiyu Xu and A. Kevin Tang. 2011. A Generalized Coupon Collector Problem. Journal of Applied Probability 48, 4 (2011), 1081–1094.
[86] Neal E. Young. 2000. Online paging against adversarially biased random inputs. Journal of Algorithms 37 (2000), 218–235.
A PROOFS

A.1 Proof of equivalence of interval and classic ILP representations of OPT
Classic representation of OPT. The literature uses an integer linear program (ILP) representation of OPT [4]. Figure 18 shows this classic ILP representation on the example trace from Section 3. The ILP uses decision variables x_{i,k} to track, at each time i, whether object k is cached or not. The constraint on the cache capacity is naturally represented: the sum of the sizes of all cached objects must not exceed the cache capacity at every time i. Additional constraints enforce that OPT is not allowed to prefetch objects (decision variables must not increase if the corresponding object is not requested) and that the cache starts empty.
Object requested:   a  b  c  b  d  a  c  d  a  b  a  ...
Decision variables: x_{i,a}, x_{i,b}, x_{i,c}, x_{i,d} for each time i = 1, ..., 11 (one row per object, one column per request).
Fig. 18. Classic ILP representation of OPT.
Lemma 2. Under Assumption 1, our ILP in Definition 1 is equivalent to the classic ILP from [4].

Proof sketch. Under Assumption 1, OPT changes the caching decision of object k only at times i when σ_i = k. To see why this is true, consider changing a decision variable x_{k,j} at a time j with i < j < ℓ_i. If x_{k,i} = 0, then OPT cannot set x_{k,j} = 1 because this would violate Assumption 1 (it would prefetch object k without a request). Conversely, if x_{k,j} = 0, then setting x_{k,i} = 1 does not yield any fewer misses, so we can safely assume that x_{k,i} = 0. Hence, decisions do not change within an interval in the classic ILP formulation.

To obtain the decision variables x′ of the classic ILP formulation of OPT from a given solution x_i of the interval ILP, set x′_{σ_i, j} = x_i for all i ≤ j < ℓ_i, and for all i. This leads to an equivalent solution because the capacity constraint is enforced at every time step.
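As an illustration of this translation (not the paper's FOO algorithm), the sketch below expands interval decisions x_i into the classic per-time variables x′_{k,j} and checks the capacity constraint on the example trace of Figure 18. The object sizes, the capacity, and the greedy repair loop are made-up for the example; they only demonstrate that any feasible interval solution expands into a feasible classic solution.

```python
# Toy sketch of Lemma 2's translation: interval decisions x[i] (one per
# request i that has a next occurrence l_i) expand into per-time
# variables xp[j][k]. Sizes and capacity are made-up example values.
trace = ['a', 'b', 'c', 'b', 'd', 'a', 'c', 'd', 'a', 'b', 'a']
size = {'a': 1, 'b': 2, 'c': 1, 'd': 3}
capacity = 4

# next occurrence l_i of each request i (absent if not requested again)
nxt, last = {}, {}
for i, k in enumerate(trace):
    if k in last:
        nxt[last[k]] = i
    last[k] = i

def expand(x):
    """Translate interval decisions into classic variables xp[j][k]."""
    xp = [{k: 0 for k in size} for _ in trace]
    for i, l in nxt.items():
        if x[i]:
            for j in range(i, l):
                xp[j][trace[i]] = 1
    return xp

# start with every interval cached, then greedily drop intervals until
# the capacity constraint holds at every time step
x = {i: 1 for i in nxt}
changed = True
while changed:
    changed = False
    xp = expand(x)
    for j, row in enumerate(xp):
        if sum(size[k] * v for k, v in row.items()) > capacity:
            # drop the largest cached interval covering time j
            i = max((i for i in nxt if x[i] and i <= j < nxt[i]),
                    key=lambda i: size[trace[i]])
            x[i] = 0
            changed = True
            break

xp = expand(x)
assert all(sum(size[k] * v for k, v in row.items()) <= capacity
           for row in xp)
hits = sum(x.values())
print("cached intervals (hits):", hits)
```

Running the sketch yields a feasible classic solution; the capacity check mirrors the "enforced at every time step" argument in the proof.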
A.2 Proof of Lemma 1

This section proves Lemma 1 from Section 5.3, which bounds T_B, i.e., the first time after i at which all intervals in B are completed.
Recall the definitions of the generalized CCP, where coupon probabilities follow a distribution p, and of the classical CCP, where all coupons have equal request probability.
We will make use of the following result, which immediately follows from [5, Theorem 4, p. 415].
Theorem 6. For b ≥ 0 coupons, any probability vector p, and the equal-probability vector q = (1/b, ..., 1/b), it holds that P[T_{b,p} < l] ≤ P[T_{b,q} < l] for any l ≥ 0.
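The intuition behind Theorem 6, that equal-probability coupons are collected stochastically fastest, can be illustrated with a small Monte Carlo experiment (an aside, not part of the proof). The skewed distribution p below is an arbitrary example:

```python
import random

# Monte Carlo illustration of Theorem 6: with the same number of coupons
# b, the uniform vector q finishes collecting no later (stochastically)
# than a skewed vector p. Distributions and trial count are made up.
random.seed(42)

def collect_time(probs):
    """Draw coupons i.i.d. from `probs` until all types are seen."""
    b = len(probs)
    seen, t = set(), 0
    while len(seen) < b:
        t += 1
        seen.add(random.choices(range(b), weights=probs)[0])
    return t

b = 5
p = [0.6, 0.1, 0.1, 0.1, 0.1]   # skewed popularities (example)
q = [1.0 / b] * b               # equal popularities
trials = 2000
mean_p = sum(collect_time(p) for _ in range(trials)) / trials
mean_q = sum(collect_time(q) for _ in range(trials)) / trials
print(f"mean T_p = {mean_p:.1f}, mean T_q = {mean_q:.1f}")
assert mean_q < mean_p
```

The gap is large here because the four rare coupons dominate the skewed collection time.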
We will use this result to prove a bound on T_B, which is defined as T_B = max_{j ∈ B} ℓ_j − i. The proof bounds T_B first using a generalized CCP and then a classical CCP.
Proof of Lemma 1. We first bound T_B via a generalized CCP with stopping time T_{b,p} and p = (p_1, ..., p_b), where

p_i = P[object k is requested | k ∈ B] under P_M.   (26)

As T_B also counts requests to objects j ∉ B, it always holds that T_B ≥ T_{b,p}. Figure 19 shows such a case, where T_B is extended because of requests to uncached objects and large cached objects. T_{b,p}, on the other hand, does not include these other requests, and is thus never longer than T_B. This inequality bounds the probabilities for all l ≥ 0:

P[T_B < l] ≤ P[T_{b,p} < l].   (27)

We then apply Theorem 6:

≤ P[T_{b,q} < l].   (28)
[Figure 19: request intervals x_i, x_j, x_ℓ, x_m, x_n over time; the first requests to the objects in B are marked as coupons 1–4, defining the CCP stopping time T_{b,p}.]

Fig. 19. Translation of T_B, the time until all B objects are requested once, into a coupon-collector problem (CCP) with stopping time T_{b,p}. As the CCP is based on fewer coupons (only objects in B), the CCP serves as a lower bound on T_B.
A.3 Proof of Theorem 5

This section proves Theorem 5 from Section 5.4. Recall the definition of large and unpopular objects (Definition 5). At a high level, the proof shows that the remaining, "typical" objects are almost always part of a precedence relation. As a result, the probability of non-integer decision variables for typical objects vanishes as the number of objects M grows large.
Before we state the proof of Theorem 5, we introduce two auxiliary results, which are proven in Appendices A.4 and A.5.
The first auxiliary result shows that the number of cached objects h_i goes to infinity as the cache capacity C and the number of objects M go to infinity.

Lemma 3 (Proof in Section A.4). For i > N∗ from Definition 5, P[h_i → ∞] = 1 as M → ∞.
The second auxiliary result derives an exponential bound on the lower tail of the distribution of the coupon collector stopping time as it gets further below its mean (roughly b log b).

Lemma 4 (Proof in Section A.5). The time T_{b,q} to collect b > 1 coupons, which have equal probabilities q = (1/b, ..., 1/b), satisfies

P[T_{b,q} ≤ b log b − c b] < e^{−c}   for all c > 0.   (29)
With these results in place, we are ready to prove Theorem 5. The proof proceeds using elementary probability theory and exploits our previous definitions of L_i, T_{B_i}, and T_{b_i,q}.
Proof of Theorem 5. We know from Theorem 4 that the probability of non-integer decision variables can be upper bounded using the random variables L_i and T_{B_i}:

P[0 < x_i < 1] ≤ P[L_i > T_{B_i}].   (30)

We expand this expression by conditioning on L_i = l:

= ∑_{l=1}^{∞} P[T_{B_i} < l | L_i = l] · P[L_i = l].   (31)

We observe that P[T_{B_i} < l] = 0 for l ≤ b_i because requesting b_i distinct objects takes at least b_i time steps:

= ∑_{l=b_i+1}^{∞} P[T_{B_i} < l | L_i = l] · P[L_i = l].   (32)
Since L_i and T_{B_i} are stochastically independent, we can drop the conditioning:

= ∑_{l=b_i+1}^{∞} P[T_{B_i} < l] · P[L_i = l].   (33)

We split this sum into two parts, l ≤ Λ and l > Λ, where Λ = (1/2) b_i log b_i is chosen such that Λ scales slower than the expectation of the underlying coupon collector problem with b_i = δ h_i coupons. (Recall that δ = |B_i|/|H_i| is the largest fraction of objects in H_i, defined in Definition 5.)

≤ ∑_{l=b_i}^{Λ} P[T_{B_i} < l] · P[L_i = l]   (34)

  + ∑_{l=Λ+1}^{∞} P[T_{B_i} < l] · P[L_i = l]   (35)

≤ ∑_{l=b_i}^{Λ} P[T_{B_i} < l] + ∑_{l=Λ+1}^{∞} P[L_i = l].   (36)
We now bound the two terms in Eq. (36) separately. For the first term, we start by applying Lemma 1:

∑_{l=b_i}^{Λ} P[T_{B_i} < l] ≤ ∑_{l=b_i}^{Λ} P[T_{b_i,q} ≤ l].   (37)

We rearrange the sum, substituting l = b_i log b_i − c b_i (i.e., replacing l by c):

= ∑_{c=(1/2) log b_i}^{log b_i − 1} P[T_{b_i,q} ≤ b_i log b_i − c b_i].   (38)

We apply Lemma 4:

< ∑_{c=(1/2) log b_i}^{log b_i − 1} e^{−c}.   (39)

We bound the finite exponential sum by the full geometric series and solve it:

≤ (e²/(e² − e)) · (1/√b_i).   (40)
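As an illustrative aside (under the reading that c steps over integers), the geometric series bounding Eq. (40) can be checked numerically; note that e²/(e² − e) = e/(e − 1):

```python
import math

# Numerical check of the geometric series behind Eq. (40): summing
# e^{-c} over integer steps starting at c = (1/2) log b_i equals
# e^{-(1/2) log b_i} * e/(e-1) = (e^2/(e^2-e)) / sqrt(b_i).
# b_i = 100 is an arbitrary example value.
b_i = 100
c0 = 0.5 * math.log(b_i)
tail = sum(math.exp(-(c0 + n)) for n in range(200))  # geometric tail
bound = (math.e ** 2 / (math.e ** 2 - math.e)) / math.sqrt(b_i)
assert abs(tail - bound) < 1e-12
```

The finite sum in Eq. (39) stops at c = log b_i − 1, so it is upper bounded by this full tail.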
For the second term in Eq. (36), we use the fact that L_i is Geometric(ρ_{σ_i})-distributed due to Assumption 2:

∑_{l=Λ+1}^{∞} P[L_i = l] = ∑_{l=Λ+1}^{∞} (1 − ρ_{σ_i})^{l−1} ρ_{σ_i}.   (41)

Solving the geometric series yields:

= (1 − ρ_{σ_i})^Λ.   (42)
We apply Definition 5, i.e., ρ_{σ_i} ≥ 1/(b_i log log b_i):

≤ (1 − 1/(b_i log log b_i))^{(1/2) b_i log b_i}.   (43)
Finally, combining Eqs. (40) and (43) yields the following:

P[L_i ≥ T_{b_i,q}] < (e²/(e² − e)) · (1/√b_i) + (1 − 1/(b_i log log b_i))^{(1/2) b_i log b_i}.   (44)

As b_i log b_i grows faster than b_i log log b_i, both terms vanish, which proves the statement P[L_i ≥ T_{b_i,q}] → 0 as h_i → ∞ (implying that b_i = δ h_i → ∞) due to Lemma 3.
A.4 Proof of Lemma 3

This section proves that, for any time i > N∗, the number of cached objects h_i grows infinitely large as the number of objects M goes to infinity. Recall that N∗ denotes the time after which the cache needs to evict at least one object. Throughout the proof, let Ē denote the complementary event of an event E.

Intuition of the proof. The proof exploits the fact that at least h∗ = C/max_k s_k distinct objects fit into the cache at any time, and that FOO finds an optimal solution (Theorem 1). Due to Assumption 3, M → ∞ implies that C → ∞, and thus h∗ → ∞. So, the case where FOO caches only a finite number of objects requires that h_i < h∗. Whenever h_i < h∗ occurs, there cannot exist any interval that FOO could put into the cache: if such an interval existed, FOO would cache it, due to its optimality.

Our proof is by induction. We first show that the event of an empty cache has zero probability, and then prove this successively for larger thresholds on the number of cached objects.
Proof of Lemma 3. We assume that 0 < h∗ < M because h∗ ∈ {0, M} leads to trivial hit ratios ∈ {0, 1}.

For arbitrary i > N∗ and any x that is constant in M, we consider the event X = {h_i ≤ x}. Furthermore, let Z = {h_i ≤ z} for any 0 ≤ z ≤ x. We prove that P[X] vanishes as M grows by induction over z and the corresponding P[Z].

Figure 20 sketches the number of cached objects over a time interval including i. Note that, for any z ≤ x, we can take a large enough M such that z < h∗, because h∗ → ∞. So the figure shows h∗ > z. The figure also defines the time interval [u,v], where u is the last time before i when FOO cached h∗ objects, and v is the next time after i when FOO caches h∗ objects. So, for times j ∈ (u,v), it holds that h_j < h∗.

Induction base: z = 0 and event Z = {h_i ≤ 0}. In other words, Z means the cache is empty. Let Γ denote the set of distinct objects requested in the interval (u, i]. Note that the event Z requires that, during (u, i], FOO stopped caching all h∗ objects. Because FOO only changes caching decisions at interval boundaries, Γ must contain at least h∗ objects. Using the same argument, we observe that there are at least h∗ requests to distinct objects in [i,v).

A request to any Γ-object in [i,v) makes Z impossible. Formally, let A denote the event that any object in Γ is requested again in [i,v). We observe that A ⇒ Z̄ because any Γ-object that is requested in [i,v) must be cached by FOO due to FOO-L's optimality (Theorem 1). By inverting the implication, we obtain Z ⇒ Ā and thus P[Z] ≤ P[Ā].
We next upper bound P[Ā]. We start by observing that P[A] is minimized (and thus P[Ā] is maximized) if all objects are requested with equal popularities. This follows because, if popularities are not equal, popular objects are more likely to be in Γ than unpopular objects due to the popular objects' higher sampling probability (similar to the inspection paradox). When Γ contains more popular objects, it is more likely that we repeat a request in [i,v), and thus P[A] increases (P[Ā] decreases).

[Figure 20: the number of cached objects over time, dropping from h∗ at time u to below z around time i, and returning to h∗ at time v; a repeated request to a Γ-object is marked in [i,v).]

Fig. 20. Sketch of the event Z = {h_i ≤ z}, which happens with vanishing probability if z is a constant with respect to M. The times u and v denote the beginning and the end of the current period during which the number of cached objects is less than h∗ = C/max_k s_k, the fewest number of objects that fit in the cache. We define the set Γ of objects that are requested in (u, i]. If any object in Γ is requested in [i,v), then FOO must cache this object; in that case, h_i > z and thus Z cannot happen.
We upper bound P[Ā] by assuming that objects are requested with equal probability ρ_k = 1/M for 1 ≤ k ≤ M. As the number of Γ-objects is at least h∗, the probability of requesting any Γ-object is at least h∗/M. Further, we know that v − i ≥ h∗, and so

P[Ā] ≤ (1 − h∗/M)^{h∗}.

We arrive at the following bound:

P[h_i ≤ z] = P[Z] ≤ P[Ā] ≤ (1 − h∗/M)^{h∗} → 0 as M → ∞.   (45)
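The limit in Eq. (45) can be illustrated numerically. The ratio h∗ = M/10 below is a made-up stand-in for C/max_k s_k growing linearly in M under Assumption 3:

```python
# Numerical illustration of Eq. (45): (1 - h*/M)^{h*} vanishes as M
# grows when h* grows linearly with M. The factor 10 is an arbitrary
# example for the ratio M / h*.
vals = []
for M in (100, 1000, 10000):
    h_star = M // 10
    vals.append((1 - h_star / M) ** h_star)
print(vals)
assert vals[0] > vals[1] > vals[2]
assert vals[2] < 1e-40
```

The bound decays like (0.9)^{M/10}, i.e., exponentially in M for this example ratio.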
Induction step: z − 1 → z for z ≤ x. We assume that the probability of caching only z − 1 objects goes to zero as M → ∞. We prove the same statement for z objects.

As in the induction base, let Γ denote the set of distinct objects requested in the interval (u, i], excluding objects in H_i. We observe that |Γ| ≥ h∗ − z, following a similar argument.

We define the event A as above and use the induction assumption. As the probability of caching fewer than z − 1 objects is vanishingly small, it must be that h_i ≥ z. Thus, a request to any Γ-object in [i,v) makes h_i = z impossible. Consequently, Z ⇒ Ā and thus P[Z] ≤ P[Ā].

To bound P[A], we focus on the requests in [i,v) that do not go to H_i-objects. There are at least h∗ − x such requests. P[A] is minimized if all objects, ignoring objects in H_i, are requested with equal popularities. We thus upper bound P[Ā] by assuming that the conditioned requests happen to objects with equal probability ρ_k = 1/(M − z) for 1 ≤ k ≤ M − z. As before, we conclude that the probability of requesting any Γ-object is at least (h∗ − z)/(M − z), and we use the fact that at least h∗ − x of these requests fall in [i,v):

P[h_i ≤ z] = P[Z] ≤ P[Ā]   (46)

≤ (1 − (h∗ − z)/(M − z))^{h∗ − z}.   (47)
We then use that z ≤ x and that x is constant in M:

≤ (1 − (h∗ − x)/M)^{h∗ − x} → 0 as M → ∞.   (48)
In summary, the number of objects cached by FOO at an arbitrary time i remains constant only
with vanishingly small probability. Consequently, this number grows to infinity with probability
one.
A.5 Proof of Lemma 4

Proof of Lemma 4. We consider the time T_{b,q} to collect b > 1 coupons, which have equal probabilities q = (1/b, ..., 1/b). To simplify notation, we set T = T_{b,q} throughout this proof.
We first transform the event using the exponential function; since t ↦ e^{−st} is strictly decreasing for s > 0, the inequality flips:

P[T ≤ b log b − c b] = P[e^{−sT} ≥ e^{−s(b log b − c b)}]   for all s > 0.   (49)

We next apply the Chernoff bound (Markov's inequality applied to e^{−sT}):

P[e^{−sT} ≥ e^{−s(b log b − c b)}] ≤ E[e^{−sT}] · e^{s(b log b − c b)}.   (50)
To derive E[e^{−sT}], we observe that T = ∑_{i=1}^{b} T_i, where T_i is the time between collecting the (i − 1)-th unique coupon and the i-th unique coupon. As all T_i are independent, we obtain a product of Laplace–Stieltjes transforms:

E[e^{−sT}] = ∏_{i=1}^{b} E[e^{−sT_i}].   (51)
We derive the individual transforms; T_i is Geometric(p_i)-distributed:

E[e^{−sT_i}] = ∑_{k=1}^{∞} e^{−s k} p_i (1 − p_i)^{k−1}   (52)

= p_i / (e^s + p_i − 1).   (53)
We plug the coupon probabilities p_i = 1 − (i − 1)/b = (b − i + 1)/b into Eq. (51), and simplify by reversing the product order (substituting j = b − i + 1):

∏_{i=1}^{b} E[e^{−sT_i}] = ∏_{i=1}^{b} [(b − i + 1)/b] / [e^s + (b − i + 1)/b − 1] = ∏_{j=1}^{b} (j/b) / (e^s + j/b − 1).   (54)
Finally, we choose s = 1/b, which yields e^s = e^{1/b} ≥ 1 + 1/b and simplifies the product:

∏_{j=1}^{b} (j/b) / (e^s + j/b − 1) ≤ ∏_{j=1}^{b} (j/b) / (1/b + j/b) = 1/(b + 1).   (55)
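The telescoping product in Eq. (55) can be checked numerically for an example b:

```python
# Check the telescoping product in Eq. (55):
# prod_{j=1}^{b} (j/b) / (1/b + j/b) = prod j/(j+1) = 1/(b+1).
# b = 20 is an arbitrary example value.
b = 20
prod = 1.0
for j in range(1, b + 1):
    prod *= (j / b) / (1 / b + j / b)
print(prod, 1 / (b + 1))
assert abs(prod - 1 / (b + 1)) < 1e-12
```

Each factor simplifies to j/(j + 1), so all intermediate terms cancel.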
This gives the statement of the lemma:

P[T ≤ b log b − c b] ≤ (1/(b + 1)) · e^{(b log b − c b)/b} = (b/(b + 1)) · e^{−c} < e^{−c}.   (56)
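The final simplification in Eq. (56) uses e^{(b log b − c b)/b} = b e^{−c}; it can be checked numerically for example values of b and c:

```python
import math

# Check the last step of Eq. (56): with s = 1/b,
# (1/(b+1)) * e^{(b log b - c b)/b} = (b/(b+1)) * e^{-c} < e^{-c}.
# The b and c values are arbitrary examples.
for b in (2, 10, 1000):
    for c in (0.5, 1.0, 3.0):
        bound = (1 / (b + 1)) * math.exp((b * math.log(b) - c * b) / b)
        assert abs(bound - (b / (b + 1)) * math.exp(-c)) < 1e-12
        assert bound < math.exp(-c)
print("Eq. (56) bound verified for sample b, c")
```

The inequality is strict because b/(b + 1) < 1 for every finite b.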
Received February 2018; revised April 2018; accepted June 2018