
Page 1:

ElCached: Elastic Multi-Level Key-Value Cache

Rahman Lavaee, Stephen Choi, Yang-Suk Kee, Chen Ding

November 1st, Savannah, GA

Page 2:

Backend database server

Web App

Page 3:

Backend database server

Web App

Page 4:

Web Caching

Memcached nodes

Backend database server

Web App

Page 5:

Resource Provisioning for Memcached

Page 6:

Resource Provisioning for Memcached

• The service provider must wisely allocate resources to guarantee each tenant's SLA, while minimizing TCO.

Page 7:

Elastic Resource Provisioning

• Optimal resource provisioning requires elasticity
  • the capability to adapt to workload changes by dynamic resource provisioning.

Page 8:

Multitenant Resource Provisioning

• The problem becomes more complex as
  • more tenants are added to the system.
  • more web caching layers are used.

Page 9:

MlCached [HotCloud'16]

• A multi-level key-value caching system.
  • L1: DRAM-based Memcached.
  • L2: exclusive NAND-flash-based key-value cache (SSD).

• MlCached implements direct key-to-PBA mappings on SSD.
  • Independent resource provisioning.

• In this work, we extend MlCached by adding the elasticity feature.

Page 10:

Independent Resource Provisioning in MlCached

• MlCached implements direct key-to-PBA mappings on SSD.
  • This removes the need for storing redundant key-to-LBA mapping tables in memory.

Page 11:

Page 12:

Latency Model

l_M : latency of Memcached server
l_S : latency of SSD
l_DB : latency of backend DB server
M_M : miss rate of Memcached
M_S : miss rate of SSD

Lat = l_M + l_S · M_M + l_DB · M_S

Page 13:

Cost Model

p_M : price per unit of DRAM
p_S : price per unit of SSD
c_M : size of DRAM
c_S : size of SSD

Cost = c_M · p_M + c_S · p_S

Page 14:

Latency Based on Miss Ratio Curve

• The key to optimal resource provisioning is to find the miss ratio curve (MRC).

[Figure: miss ratio as a function of cache capacity]

Lat = l_M + l_S · mr(c_M) + l_DB · mr(c_S)
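Below is a minimal Python sketch of the two models, assuming a miss-ratio function mr(c) obtained from profiling; the device latencies and unit prices are illustrative placeholders, not values from the talk.

```python
def latency(mr, c_m, c_s, l_m=0.1, l_s=0.5, l_db=10.0):
    """Lat = l_M + l_S * mr(c_M) + l_DB * mr(c_S) (latencies in ms)."""
    return l_m + l_s * mr(c_m) + l_db * mr(c_s)

def cost(c_m, c_s, p_m=8.0, p_s=0.5):
    """Cost = c_M * p_M + c_S * p_S (prices per unit of capacity)."""
    return c_m * p_m + c_s * p_s
```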

Page 15:

Page 16:

Reuse Distance

• We use reuse distance theory to compute the miss ratio curve.
• Reuse distance is the number of distinct memory locations accessed between two consecutive uses of the same memory location.

trace:          a b c d b a c
reuse distance: ∞ ∞ ∞ ∞ 3 4 4
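The example above follows the LRU stack-distance convention, counting the reused item itself. A minimal (quadratic-time) sketch that reproduces the slide's numbers; production profilers use the tree-based or approximate methods listed two slides later.

```python
from collections import OrderedDict

def reuse_distances(trace):
    """Naive reuse-distance (LRU stack distance) computation.

    Returns float('inf') for first accesses; otherwise the depth of the
    reused item in the LRU stack, matching the slide's example.
    """
    stack = OrderedDict()              # most recently used item is last
    out = []
    for x in trace:
        if x in stack:
            keys = list(stack)
            out.append(len(keys) - keys.index(x))   # depth from the top
            del stack[x]
        else:
            out.append(float('inf'))
        stack[x] = None                # x becomes the most recent item
    return out

print(reuse_distances("abcdbac"))      # [inf, inf, inf, inf, 3, 4, 4]
```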

Page 17:

Reuse Distance Histogram

• The reuse distance information is best represented by the reuse distance histogram, which shows the frequency of every reuse distance.
• The MRC can be computed from this histogram.

trace:          a b c d b a c
reuse distance: ∞ ∞ ∞ ∞ 3 4 4

[Figure: reuse distance histogram (frequency vs. reuse distance 1…4, ∞) and the resulting miss ratio curve (miss ratio 0.00–1.00 vs. cache size 0–5)]
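The conversion works because an access with reuse distance d hits in an LRU cache of size c exactly when d ≤ c. A short sketch under that assumption:

```python
def mrc_from_histogram(hist, total, max_size):
    """Miss ratio curve from a reuse-distance histogram.

    hist:  dict mapping finite reuse distance -> frequency
    total: total number of accesses (including infinite distances)
    miss_ratio(c) = 1 - (# accesses with distance <= c) / total
    """
    mrc = {}
    hits = 0
    for c in range(max_size + 1):
        hits += hist.get(c, 0)
        mrc[c] = 1.0 - hits / total
    return mrc

# For the example trace 'a b c d b a c' (7 accesses):
print(mrc_from_histogram({3: 1, 4: 2}, total=7, max_size=5))
# ~ {0: 1.0, 1: 1.0, 2: 1.0, 3: 0.86, 4: 0.57, 5: 0.57}
```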

Page 18:

Reuse Distance Computation

• Olken tree [Olken 1981]
• Approximate reuse distance [Ding+ 2001]
• Footprint estimation [Xiang+ 2011]
• Stack counters [Wires+ 2014]

Page 19:

Reuse Distance Computation

• Olken tree [Olken 1981]
• Approximate reuse distance [Ding+ 2001]
• Footprint estimation [Xiang+ 2011]
• Stack counters [Wires+ 2014]

Excerpt shown on the slide, from "Program Locality Analysis Using Reuse Distance" (ACM TOPLAS, Vol. 31, No. 6, August 2009, p. 20:5):

Fig. 2. An example illustrating the reuse-distance measurement. Part (a) shows a reuse distance. Parts (b) and (c) show its measurement by the Bennett-Kruskal algorithm and the Olken algorithm. Part (d) shows our approximate measurement with a guaranteed precision of 33%.

...case, the previous access may occur at the beginning of the trace, the difference in access time is up to T − 1, and the reuse distance is up to N − 1. In large applications, T can be over 100 billion, and N is often in the tens of millions.

We use the example in Figure 2 to introduce two previous solutions and then describe the basic idea for our solution. Part (a) shows an example trace. Suppose we want to find the reuse distance between the two accesses of b at time 4 and 12. A solution has to store enough information about the trace history before time 12. Bennett and Kruskal [1975] discovered that it is sufficient to store only the last access of each datum, as shown in Part (b) for the example trace. The reuse distance is measured by counting the number of last accesses, stored in a bit vector rather than using the original trace.

The efficiency was improved by Olken [1981], who organized the last accesses as nodes in a search tree keyed by their access time. The Olken-style tree for the example trace has 7 nodes, one for the last access of each datum, as shown in Figure 2(c). The reuse distance is measured by counting the number of nodes whose key values are between 4 and 12. The counting can be done in a single tree search, first finding the node with key value 4 and then backing up to the root accumulating the subtree weights [Olken 1981]. Since the algorithm needs one tree node for each data location, the search tree can grow to a significant size when analyzing programs with a large amount of data.

While it is costly to measure long reuse distances, we rarely need the exact length. Often the first few digits suffice. For example, if a reuse distance is about one million, it rarely matters whether the exact value is one million or one million and one. Next we describe two approximate algorithms that extend the Olken algorithm by adapting and trimming the search tree.

The new algorithms guarantee two types of precision for the approximate distance, d_approximate, compared to the actual distance, d_actual. In both types, the…

Page 20:

Reuse Distance for Memcached

• Memcached distributes items among different slab classes, according to their sizes.
• Slab allocation is done during the cold start.

Page 21:

LRU Replacement in Memcached

• Once the Memcached system reaches its memory limit, LRU replacement is done separately for every slab class, as sketched below.

[Figure: each slab class maintains its own LRU chain of items, from LRU head to LRU tail]
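A minimal sketch of this per-class policy, assuming item-granularity capacities for brevity (real Memcached stores items in slab pages and evicts within each class):

```python
from collections import OrderedDict

class SlabClassLRU:
    """One independent LRU chain per slab class."""

    def __init__(self, capacity_items):
        self.capacity = capacity_items
        self.items = OrderedDict()          # least recently used item first

    def access(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)     # mark as most recently used
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict this class's LRU tail
```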

Page 22:

Slab-Aware Reuse Distance Profiling

• Rather than analyzing the whole Memcached system in a single reuse distance model, we model each slab class separately.
• We compose the MRCs from the different slab classes (one possible composition is sketched below).

[Figure: each slab class maintains its own LRU chain of items, from LRU head to LRU tail]
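The slide does not spell out the composition rule, so the following sketch makes two explicit assumptions: each class i holds a fixed fraction share_i of the total capacity and receives a fraction weight_i of all accesses; the system miss ratio is then the access-weighted sum of the per-class miss ratios.

```python
def compose_mrc(class_mrcs, shares, weights, capacity):
    """Compose per-slab-class MRCs into one system-level miss ratio.

    class_mrcs: per-class miss-ratio callables mr_i(c)
    shares:     fraction of capacity held by each class (sums to 1)
    weights:    fraction of accesses hitting each class (sums to 1)
    """
    return sum(w * mr(capacity * s)
               for mr, s, w in zip(class_mrcs, shares, weights))
```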

Page 23:

Page 24:

Resource Provisioning as a Linear Program

• The resource provisioning problem can be described in one of two ways:
  • Minimize Cost such that Lat ≤ SLA.
  • Minimize Lat such that Cost ≤ TCO.


Page 26:

Resource Provisioning as a Linear Program

• The resource provisioning problem can be described in one of two ways:
  • Minimize Cost such that Lat ≤ SLA.
  • Minimize Lat such that Cost ≤ TCO.

• Cost is already linear in terms of the DRAM/SSD capacities.

Page 27:

Resource Provisioning as a Linear Program

• The resource provisioning problem can be described in one of two ways:
  • Minimize Cost such that Lat ≤ SLA.
  • Minimize Lat such that Cost ≤ TCO.

• Cost is already linear in terms of the DRAM/SSD capacities.
• Latency is linear only in terms of the miss ratio function.

Page 28:

Resource Provisioning as a Linear Program

• We observe that the miss ratio curves in our workloads are always convex.
• We formulate the miss ratio curve using linear constraints (see the sketch below).
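Because the MRC is convex and decreasing, it can be represented in an LP by the piecewise-linear constraints m ≥ a_k·c + b_k, one per segment of its lower envelope. A minimal sketch of the "minimize Cost such that Lat ≤ SLA" variant follows; the segment coefficients, latencies, and prices are illustrative placeholders, not values from the talk.

```python
from scipy.optimize import linprog

# Piecewise-linear lower envelope of a convex, decreasing MRC:
# m >= a_k * c + b_k for each segment k (illustrative coefficients).
segments = [(-0.050, 1.00), (-0.010, 0.60), (-0.001, 0.24)]
l_m, l_s, l_db = 0.1, 0.5, 10.0   # device latencies (ms), placeholders
p_m, p_s = 8.0, 0.5               # DRAM/SSD price per GB, placeholders
sla = 1.0                         # latency bound (ms)

# Variables: x = [c_M, c_S, m_M, m_S]; minimize Cost = p_M*c_M + p_S*c_S.
obj = [p_m, p_s, 0.0, 0.0]
A_ub, b_ub = [], []
for a, b in segments:             # envelope constraints for both levels
    A_ub.append([a, 0, -1, 0]); b_ub.append(-b)   # a*c_M - m_M <= -b
    A_ub.append([0, a, 0, -1]); b_ub.append(-b)   # a*c_S - m_S <= -b
A_ub.append([0, 0, l_s, l_db]); b_ub.append(sla - l_m)   # Lat <= SLA
res = linprog(obj, A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, None), (0, None), (0, 1), (0, 1)])
print(res.x)                      # optimal [c_M, c_S, m_M, m_S]
```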

Page 29:

Evaluation

• We compare ElCached against a proportional approach that fixes the ratio between DRAM and SSD capacities at 1:4 (Pareto principle).


Page 31:

Evaluation

• We compare ElCached against a proportional approach that fixes the ratio between DRAM and SSD capacities at 1:4 (Pareto principle).

• Workloads:
  • Zipfian key distribution with α = 1.15
  • Exponential key distribution with λ = 10^-6
  • Both workloads issue 800 million requests to a range of 4 billion keys.

Page 32:

Miss Ratio Prediction Accuracy

• Mean relative error on the Zipfian workload: 4%

Excerpt from the paper shown on the slide (Evaluation, p. 4):

On the other hand, the elastic approach, the mechanism ElCached uses, changes the ratio adaptively based on our reuse-distance-based miss-rate prediction to achieve better elasticity. That is, different data locality determines the optimal DRAM/SSD allocations to optimize either latency or cost. We first investigate the accuracy of our miss-rate prediction and then show the increased elasticity of the Web caching service with ElCached.

For these experiments, we use Mutilate [1] to generate our workloads. Mutilate emulates the ETC workloads at Facebook using the key size, value size, and inter-arrival time distributions given by Atikoglu et al. [2]. The mean key and value sizes are respectively 31 and 247 bytes. We use Mutilate to generate two workloads. For the first workload, we use a Zipfian key distribution with α = 1.15. For the second workload, we use an exponential distribution with λ = 10^-6. We set both workloads to issue 800 million requests to a range of 4 billion keys. Note that this is a limit study to evaluate the elasticity of ElCached compared to existing work. Therefore, it does not include the overheads and other tuning aspects of dynamic resource provisioning: data promotion, demotion, and eviction overheads; the profiling window; reconfiguration frequencies; and the associated efficiencies in the time domain.

Prediction Accuracy — We profile the workloads using our model, predict miss rates on a logarithmic scale, and compare the results to measurement. Figure 2 shows the result for the Zipfian workload. The mean relative error of prediction is 4%.

[Figure 2: Miss ratio prediction accuracy of our reuse distance model; predicted vs. measured miss rate (%) for cache sizes from 64 MB to 16384 MB]

Elastic Resource Provisioning — Although both approaches dynamically allocate and deallocate resources, the proportional approach does not change the ratio between DRAM and SSD. In particular, we use a 1:4 ratio following the Pareto principle. On the other hand, the elastic approach continuously changes the ratio based on the reuse-distance-based miss-rate prediction. Figure 3 shows the minimum cost per latency for both techniques, and for both workloads. For example, to guarantee an average latency of 700 µs for the Zipfian workload, elastic costs 0.7 and proportional costs 8.1 (cost is normalized with respect to the cost of proportional at 1 ms). In this case, elastic allocates 60 MB/11.7 GB while proportional allocates 1.8 GB/7.3 GB for DRAM and SSD, respectively. Assuming that CSPs bill tenants in proportion to actual cost, elasticity will save costs both for users and CSPs. In particular, for the Zipfian workload, elasticity saves around 60% of cost. Meanwhile, the Exponential workload exhibits around a 10% reduction in cost up to the 250 µs latency requirement, but this rapidly changes to more than a 70% cost improvement after 300 µs. This is because the Exponential workload has much stronger data locality than Zipfian, so DRAM allocation dominates the performance before 300 µs.

Footnote (truncated): "…Memory+Remote Storage) are common from Amazon EC2, Google Cloud Platform, and Microsoft Azure. Remote storage is out of the scope of this work."

[Figure 3: Elastic vs. proportional resource provisioning; normalized cost and relative cost vs. latency (ms) for the Zipfian and Exponential workloads]

We also conduct a comparison between elastic and proportional in a multi-tenant case. Here we assume a virtual cloud instance with 3 GB of memory. Figure 4a shows the latency for the two tenants when partitioning the fixed amount of memory. In particular, proportional partitions the memory and allocates each memory and SSD with the fixed ratio of 1:4. On the other hand, elastic predicts an optimal configuration per tenant and tests its feasibility. We compare the elastic approach to one particular memory partition, which allocates 1.63 GB of DRAM to tenant 1 (shown by a line in Figure 4a). As shown in Figure 4b, both approaches result in the same latency for tenant 2, whereas, for tenant 1, elastic improves latency by 5%. Moreover, elastic reduces the cost for both tenants by around 26%. Elastic also reduces the total memory consumption by 46%, allocating only 1.63 GB…
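For concreteness, one plausible reading of the 4% figure above (an assumption, since the excerpt does not give the exact formula): the mean of |predicted − measured| / measured over the sampled cache sizes.

```python
def mean_relative_error(predicted, measured):
    """Mean of |p - m| / m over the sampled cache sizes."""
    return sum(abs(p - m) / m
               for p, m in zip(predicted, measured)) / len(measured)
```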

Page 33:

Experiment 1: Elastic Resource Provisioning

• For each workload:
  • For each latency limit, we find the minimum-cost resource provisioning.

Page 34:

Experiment 1: Elastic Resource Provisioning

• For each workload:
  • For each latency limit, we find the minimum-cost resource provisioning.

• Elastic saves cost by around 60% for both workloads.

[Figure (paper Figure 3): normalized cost and relative cost vs. latency (ms), elastic vs. proportional (Elastic/Prop.), for the Zipfian and Exponential workloads]


Page 35:

Experiment 2: Multitenant Resource Provisioning

• First, we use the proportional scheme to partition a fixed 3 GB of memory between the two tenants.

Excerpt from the paper, continuing from Page 32 (p. 5):

…compared to the 3 GB of physical memory. That is the memory consumption of tenant 1 alone when taking the proportional approach. This means potential revenue for the CSP, because it could potentially host more tenants as long as other resources (CPU, network, etc.) are available.

[Figure 4a: per-tenant latency (ms, log scale 0.25–8.00) vs. DRAM partition for T1 (GB, 0–3); Tenant 1 and Tenant 2 curves, with the 1.63 GB partition marked]

Figure 4b:

             Proportional        Elastic
             T1      T2          T1      T2
DRAM (GB)    1.63    1.37        0.46    1.17
Lat. (ms)    0.71    0.20        0.67    0.20
Cost         7.20    5.41        5.30    4.01

Figure 4: Multi-tenant resource provisioning: (a) latency per DRAM partitioning with proportional SSD size (1:4 ratio); (b) elastic vs. proportional with the specified memory partition.

6 Online Resource Provisioning

Our study of online resource provisioning is preliminary and remains future work. Here, we explain the basics of our online implementation.

Memcached allocates memory in units of fixed size (called slabs) until it reaches its memory limit. Thus we can control resource provisioning by dynamically reconfiguring the memory limit parameter. Increasing this parameter allows Memcached to boost its memory provision to the new limit. To lower the provision, we must find and deallocate victim slabs. Prior to deallocation, all the items in the victim slabs must be removed from the LRU chains. We choose victim slabs from different size classes to ensure that the number of slabs in different size classes remains proportional to the initial setting. To find a victim slab in each size class, we monitor the activity of all the slabs during a short period of time. Slab activity is defined as the number of distinct items accessed during that period. For each slab, we estimate its activity using a HyperLogLog counter [8] with logarithmic space overhead. After the monitoring period, we choose the least active slabs as the victim slabs. We keep deallocating slabs until the new memory limit is satisfied. (A sketch of this selection follows.)
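A hedged sketch of the victim-slab selection just described. ElCached uses HyperLogLog counters for the distinct-item counts; plain Python sets stand in for them here to keep the sketch self-contained (exact counts, linear rather than logarithmic space).

```python
from collections import defaultdict

activity = defaultdict(set)        # slab_id -> distinct items accessed

def record_access(slab_id, item_key):
    """Called on every access during the monitoring period."""
    activity[slab_id].add(item_key)

def pick_victims(slabs_by_class, n_per_class):
    """Choose the n least-active slabs in each size class, keeping the
    per-class slab counts proportional to the initial setting."""
    victims = []
    for slabs in slabs_by_class.values():
        ranked = sorted(slabs, key=lambda s: len(activity[s]))
        victims.extend(ranked[:n_per_class])
    return victims
```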

Our online analysis consists of a profiling window followed by a release period. During the profiling period, we use the reuse distance analysis to find the miss rate for every cache size. At the end of the profiling window, we formulate and solve a linear program according to the constraints and the objective function. The solution gives the new Memcached and KVD sizes. Then, we reconfigure the ElCached system using the above-mentioned procedure.

The study of online resource provisioning remains future work. An important question is how the profiling window length affects the accuracy and optimality of the solution. Our future work also includes analyzing the memory and time overheads of the online reuse distance analysis.

7 Related Work

Our study is related to previous work on optimizing Web caching applications. For example, Dynacache [4] optimizes Memcached by profiling workloads and dynamically adjusting memory resources and eviction policies. Saemundsson et al. [14] introduced MIMIR, a lightweight online profiler for Memcached. Analogous to us, they used a tree-based scheme to estimate reuse distance. To lower the overhead, MIMIR divides the LRU stack into variable-sized buckets. However, contrary to us, MIMIR and Dynacache both assume a fixed size for all cached elements, which isn't the case in Memcached. Hu et al. [10] introduced LAMA to improve slab allocation in Memcached. Like us, they also estimate the miss ratio for every slab class.

Reuse distance computation has been well studied in the past. Mattson et al. [12] introduced a basic stack simulation algorithm. Olken [13] gave the first tree-based method using an AVL tree. Ding and Zhong [6] used a similar technique to approximate reuse distance while reducing the time overhead. Xiang et al. [18] defined the footprint of a given trace to be the number of distinct elements accessed in the window. They showed that under a certain condition (the reuse-window condition) reuse distance is equal to the finite differential of the average footprint. We implemented the footprint approach and found its accuracy to be slightly lower than our tree-based algorithm. These techniques all require a linear space overhead. Wires et al. [17] used counter stacks to approximate the miss ratio curve with sublinear space overhead. These studies can be integrated into ElCached to reduce the space and time overhead of the online analysis.

8 Conclusion

In this paper, we described ElCached, a multi-level key-value caching system that uses a reuse distance profiler to estimate the miss rate curve across all capacity limits with 4% relative error. ElCached solves the resource provisioning…


Page 36:

Experiment 2: Multitenant Resource Provisioning

• Then, we use ElCached to find the latency-optimal DRAM/SSD partitioning.

compared to the 3 GB of physical memory. That is thememory consumption of tenant 1 alone, when taking theproportional approach. This means potential revenue forCSP because it could potentially host more tenants aslong as other resources (CPU, network, etc.) are avail-able.

[Figure 4. Panel (a): latency (ms; ticks from 0.25 to 8.00) versus the DRAM partition for T1 (0, 1, 1.63, 2, and 3 GB), with one curve each for Tenant 1 and Tenant 2. Panel (b): the table below.]

                 Proportional         Elastic
                 T1       T2          T1       T2
  DRAM (GB)      1.63     1.37        0.46     1.17
  Latency (ms)   0.71     0.20        0.67     0.20
  Cost           7.20     5.41        5.30     4.01

Figure 4: Multi-tenant resource provisioning: (a) latency per DRAM partitioning with proportional SSD size (1:4 ratio); (b) elastic vs. proportional with the specified memory partition.

6 Online Resource Provisioning

Our study of online resource provisioning is preliminary and remains future work. Here, we explain the basics of our online implementation.

Memcached allocates memory in units of fixed size (called slabs) until it reaches its memory limit. We can therefore control resource provisioning by dynamically reconfiguring the memory limit parameter. Increasing this parameter allows Memcached to grow its memory provision to the new limit. To lower the provision, we must find and deallocate victim slabs. Prior to deallocation, all the items in the victim slabs must be removed from the LRU chains. We choose victim slabs from different size classes so that the number of slabs per size class remains proportional to the initial setting. To find a victim slab in each size class, we monitor the activity of all slabs during a short period of time. Slab activity is defined as the number of distinct items accessed during that period. For each slab, we estimate its activity using a HyperLogLog counter [8] with logarithmic space overhead. After the monitoring period, we choose the least active slabs as the victims, and we keep deallocating slabs until the new memory limit is satisfied.
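To make the victim-selection step concrete, the following is a minimal Python sketch, not ElCached's actual implementation (which modifies Memcached's C code); the names choose_victims, slab_accesses, and n_victims_per_class are hypothetical. A small HyperLogLog estimates the number of distinct keys touched per slab during the monitoring window, and the least active slabs in each size class become victims.

import hashlib

class HyperLogLog:
    # Distinct-count estimator using 2**p registers: space is logarithmic
    # in the number of distinct items counted.
    def __init__(self, p=8):
        self.p = p
        self.m = 1 << p
        self.reg = [0] * self.m

    def add(self, key):
        h = int.from_bytes(hashlib.sha1(str(key).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                       # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64-p bits
        rank = (64 - self.p) - rest.bit_length() + 1   # leading zeros + 1
        self.reg[idx] = max(self.reg[idx], rank)

    def estimate(self):
        # Standard raw HLL estimator; small/large-range corrections omitted.
        alpha = 0.7213 / (1 + 1.079 / self.m)
        return alpha * self.m * self.m / sum(2.0 ** -r for r in self.reg)

def choose_victims(slab_accesses, n_victims_per_class):
    # slab_accesses: {size_class: {slab_id: keys touched in the window}}
    # n_victims_per_class keeps the per-class slab counts proportional.
    victims = []
    for cls, slabs in slab_accesses.items():
        activity = {}
        for slab_id, keys in slabs.items():
            hll = HyperLogLog()
            for k in keys:
                hll.add(k)
            activity[slab_id] = hll.estimate()    # approx. distinct items
        ranked = sorted(slabs, key=activity.get)  # least active first
        victims += [(cls, s) for s in ranked[:n_victims_per_class.get(cls, 0)]]
    return victims

Deallocation would then proceed class by class, unlinking each victim slab's items from the LRU chains before freeing the slab, until the new memory limit is met.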

Our online analysis consists of a profiling window followed by a release period. During the profiling window, we use reuse distance analysis to find the miss rate for every cache size. At the end of the window, we formulate and solve a linear program according to the constraints and the objective function. The solution gives the new Memcached and KVD sizes. We then reconfigure the ElCached system using the slab-deallocation procedure described above.
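The paper does not spell out the program itself, so the following is only a sketch of how such a linear program might look, using SciPy's linprog. It assumes the profiled miss-rate curves have been linearized around the current operating point (mr(c) ~ A - B*c) and that the two levels are exclusive, so the second-level miss rate depends on the combined capacity m + s. Every constant below (A1, B1, A2, B2, the per-level latencies, the prices, and the SLA bound) is a hypothetical placeholder, not a value from ElCached.

from scipy.optimize import linprog

# Hypothetical local linearization of the profiled miss-rate curves (per GB):
A1, B1 = 0.60, 0.15                    # DRAM-level miss rate  ~ A1 - B1*m
A2, B2 = 0.45, 0.05                    # combined miss rate    ~ A2 - B2*(m + s)
T_DRAM, T_SSD, T_DB = 0.1, 0.5, 10.0   # assumed per-access latencies (ms)
P_DRAM, P_SSD = 2.0, 0.2               # assumed prices per GB
SLA = 0.8                              # assumed latency bound (ms)

# Expected latency, linear in (m, s) under the linearized curves:
#   lat(m, s) = T_DRAM + (mr1(m) - mr2(m+s)) * T_SSD + mr2(m+s) * T_DB
lat_m = -B1 * T_SSD + B2 * T_SSD - B2 * T_DB    # coefficient of m
lat_s = B2 * T_SSD - B2 * T_DB                  # coefficient of s
lat_0 = T_DRAM + (A1 - A2) * T_SSD + A2 * T_DB  # constant term

res = linprog(
    c=[P_DRAM, P_SSD],         # minimize DRAM + SSD provisioning cost
    A_ub=[[lat_m, lat_s]],     # lat_0 + lat_m*m + lat_s*s <= SLA
    b_ub=[SLA - lat_0],
    bounds=[(0, 4), (0, 16)],  # assumed capacity limits (GB)
)
print("new Memcached (DRAM) GB:", res.x[0], "new KVD (SSD) GB:", res.x[1])

In the real system, the solver's output would set the new Memcached memory limit and KVD size before the next release period begins.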

A full study of online resource provisioning remains future work. An important open question is how the length of the profiling window affects the accuracy and optimality of the solution. Our future work also includes analyzing the memory and time overheads of the online reuse distance analysis.

7 Related Work

Our study is related to previous work on optimizing Web caching applications. For example, Dynacache [4] optimizes Memcached by profiling workloads and dynamically adjusting memory resources and eviction policies. Saemundsson et al. [14] introduced MIMIR, a lightweight online profiler for Memcached. Like us, they used a tree-based scheme to estimate reuse distance; to lower the overhead, MIMIR divides the LRU stack into variable-sized buckets. Unlike our approach, however, MIMIR and Dynacache both assume a fixed size for all cached elements, which is not the case in Memcached. Hu et al. [10] introduced LAMA to improve slab allocation in Memcached. Like us, they also estimate the miss ratio for every slab class.

Reuse distance computation has been well studied in the past. Mattson et al. [12] introduced the basic stack simulation algorithm. Olken [13] gave the first tree-based method, using an AVL tree. Ding and Zhong [6] used a similar technique to approximate reuse distance while reducing the time overhead. Xiang et al. [18] defined the footprint of a given trace as the number of distinct elements accessed in a window; they showed that under a certain condition (the reuse-window condition), reuse distance equals the finite differential of the average footprint. We implemented the footprint approach and found its accuracy to be slightly lower than that of our tree-based algorithm. These techniques all require linear space overhead. Wires et al. [17] used counter stacks to approximate the miss ratio curve with sublinear space overhead. These studies can be integrated into ElCached to reduce the space and time overhead of the online analysis.
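For reference, the footprint relation used in that comparison can be written as follows, where fp(w) denotes the average number of distinct keys in a window of w accesses (our notation, sketching Xiang et al.'s published result):

\[ mr(c)\big|_{c = fp(w)} \;=\; fp(w+1) - fp(w) \;\approx\; \frac{d}{dw}\, fp(w) \]

That is, the miss ratio of an LRU cache of capacity fp(w) is the finite differential of the average footprint at window length w.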

8 Conclusion

In this paper, we described ElCached, a multi-level key-value caching system that uses a reuse distance profiler to estimate the miss rate curve across all capacity limits with 4% relative error. ElCached solves the resource provisioning problem by formulating and solving a linear program.



Experiment 2: Multitenant Resource Provisioning
• Elasticity improves
  • tenant 1's latency by 5%,
  • both tenants' cost by 26%, and
  • total memory consumption by 46%.
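These percentages follow directly from the Figure 4(b) table:

\[ 1 - \frac{0.67}{0.71} \approx 0.056 \;\text{(T1 latency)}, \qquad 1 - \frac{5.30 + 4.01}{7.20 + 5.41} \approx 0.262 \;\text{(total cost)}, \qquad 1 - \frac{0.46 + 1.17}{1.63 + 1.37} \approx 0.457 \;\text{(total DRAM)} \]

These round to the quoted 26% and 46%; the slightly lower 5% latency figure presumably comes from unrounded measurements behind the table.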


Summary

• ElCached is a multi-level key-value caching system.
• It uses a reuse distance profiler to estimate the miss rate curve across all capacity limits.
• It reduces the total cost by up to around 60% compared to a proportional scheme.
• The multi-tenant experiment indicates that we can improve latency, cost, and total DRAM usage, compared to the proportional scheme.


Thank You
