Workload Analysis of a Large-Scale Key-Value Store
Berk Atikoglu, Yuehai Xu, Eitan Frachtenberg, Song Jiang, Mike Paleczny
2
Analyze Memcached at Facebook
More than 284 billion requests
5 different use cases
Workload characteristics, locality, cache effectiveness
3
Why Is Caching Important?
Cache Servers
Web Servers
Database
4
Motivation
Understand workload characteristics
Identify factors affecting performance
Provide a benchmark for future studies
5
Memcached
Distributed memory caching system
Key-value store for small objects
[Diagram: Key, Hash Function, Memcached Servers]
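The key-to-server mapping in the diagram can be illustrated with a minimal sketch. It assumes simple modulo placement over an MD5 digest; the slides do not say which hash function or placement scheme is used in production, and the server names below are hypothetical.

```python
import hashlib

def pick_server(key: str, servers: list[str]) -> str:
    """Map a key to one server by hashing the key (assumed modulo placement)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

# Hypothetical server names, for illustration only.
servers = ["cache-a", "cache-b", "cache-c"]
chosen = pick_server("user:42", servers)
print(chosen)
```

The same key always lands on the same server, so any client can locate a cached object without a central directory. Modulo placement remaps most keys when the server list changes, which is why consistent hashing is a common alternative.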
6
Tracing Methodology
Capture traces through a Linux Kernel Module (LKM)
Process traces with Hive
[Diagram: protocol stack, top to bottom: Memcached, Transport (TCP/UDP), Network, Ethernet; the LKM is shown alongside the stack]
7
Facebook Deployment
Pool  Size      Description
USR   Few       User-account status information
APP   Dozens    Object metadata of a popular application
SYS   Few       System data on service location
VAR   Dozens    Server-side browser information
ETC   Hundreds  Nonspecific, general purpose
Contains server related information
Anything that doesn’t belong to a specific pool goes to ETC
8
Analysis
Workload Characteristics
Locality, Cache Behavior
9
Request Composition
> 99.8% GET
GET:UPDATE = 30:1
10
Key Size Distribution
90% of VAR keys are 31B
USR keys are 16B or 21B
ETC is heterogeneous
11
Value Size Distribution
USR values are only 2B
90% of values are smaller than 500B
12
Value Size Dist. By Overall Weight
In every pool except ETC, values of 500B or smaller account for 90% of the data
In ETC, values of 10KB or smaller account for 90% of the data
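The difference between counting values and weighting them by bytes can be shown with a small sketch. The function names and the sample sizes are made up for illustration; they are not taken from the traces.

```python
def size_at_count_percent(sizes, percent):
    """Smallest size s such that at least `percent`% of values are <= s."""
    ordered = sorted(sizes)
    # Ceiling division with integers avoids floating-point rounding.
    index = max(0, -(-percent * len(ordered) // 100) - 1)
    return ordered[index]

def size_at_weight_percent(sizes, percent):
    """Smallest size s such that values <= s hold at least `percent`% of all bytes."""
    total = sum(sizes)
    running = 0
    for s in sorted(sizes):
        running += s
        if running * 100 >= percent * total:
            return s
    return max(sizes)

# Hypothetical pool: many tiny values and a few large ones.
sizes = [2] * 90 + [1000] * 10
print(size_at_count_percent(sizes, 90))   # by number of values
print(size_at_weight_percent(sizes, 90))  # by bytes stored
```

In this made-up pool, 90% of the values are 2B, yet almost all of the bytes sit in the 1000B values, which is why the two distributions are plotted separately.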
13
Request Rate Over Time
All pools show diurnal pattern except SYS
14
Request Rate Over Time (ETC)
Night time in Western Hemisphere
North America starts its day
15
Analysis
Workload Characteristics
Locality, Cache Behavior
16
Repeating Keys
0.0003% of keys account for 10% of requests in ETC
1% of keys account for 55% of requests in ETC
The least frequent 50% of keys account for 1% of requests in ETC
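A skew measurement of this kind can be sketched as follows. This is an illustrative calculation over a toy request list, not the authors' analysis code; `request_share_of_top_keys` is a hypothetical helper name.

```python
from collections import Counter

def request_share_of_top_keys(requests, key_fraction):
    """Fraction of all requests that go to the most popular `key_fraction` of keys."""
    counts = Counter(requests)
    ranked = sorted(counts.values(), reverse=True)
    # Always include at least one key so tiny fractions stay meaningful.
    top_n = max(1, int(len(ranked) * key_fraction))
    return sum(ranked[:top_n]) / len(requests)

# Toy trace: one hot key and nine keys requested once each.
requests = ["hot"] * 91 + [f"k{i}" for i in range(9)]
print(request_share_of_top_keys(requests, 0.1))
```

Here the top 10% of keys is a single key, and it receives 91 of the 100 requests.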
17
Locality Over Time
[Bar chart: % of unique keys out of total in unit time (y-axis, 0 to 100) for each pool (USR, APP, ETC, VAR, SYS), with 5min and 60min time units]
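The metric on the chart's y-axis can be computed with a sketch like the one below. It assumes a trace of `(timestamp_seconds, key)` pairs, which is a simplification of the captured records, and the sample trace is invented.

```python
def unique_key_percent(trace, window_seconds):
    """Percent of requests in each time window that touch a distinct key.

    trace: iterable of (timestamp_seconds, key) pairs (assumed format).
    Returns {window_index: percent of unique keys out of total requests}.
    """
    buckets = {}
    for ts, key in trace:
        buckets.setdefault(int(ts // window_seconds), []).append(key)
    return {w: 100.0 * len(set(keys)) / len(keys) for w, keys in buckets.items()}

# Invented trace for illustration.
trace = [(0, "a"), (10, "a"), (20, "b"), (310, "c")]
print(unique_key_percent(trace, 300))   # 5-minute windows
print(unique_key_percent(trace, 3600))  # 60-minute windows
```

A lower percentage means more repeated keys in the window, that is, stronger locality.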
18
Reuse Period of Keys
99.9% of SYS keys are reused in 1hr
88.5% of ETC keys are reused in 1hr
96.4% of ETC keys are reused in 6hr
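One way to measure reuse is sketched below. It assumes the reuse period is the gap between consecutive requests to the same key; the slides do not define the term precisely, so this definition, the trace format, and the sample data are all assumptions.

```python
def fraction_reused_within(trace, limit_seconds):
    """Fraction of key reuses whose gap since the previous request is <= limit_seconds.

    trace: iterable of (timestamp_seconds, key) pairs (assumed format).
    """
    last_seen = {}
    gaps = []
    for ts, key in sorted(trace):
        if key in last_seen:
            gaps.append(ts - last_seen[key])
        last_seen[key] = ts
    if not gaps:
        return 0.0
    return sum(g <= limit_seconds for g in gaps) / len(gaps)

# Invented trace: "a" is reused after 100 s, "b" after 2 hours, "c" never.
trace = [(0, "a"), (100, "a"), (0, "b"), (7200, "b"), (50, "c")]
print(fraction_reused_within(trace, 3600))    # within 1 hour
print(fraction_reused_within(trace, 21600))   # within 6 hours
```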
19
Hit Rate
USR 98.2%, APP 92.9%, ETC 81.4%, VAR 93.7%, SYS 98.7%
Why?
20
Causes of ETC Cache Misses
Share of ETC misses by cause:
Compulsory 70%
Capacity 22%
Invalidation 8%
Share of all ETC requests:
Hit 81%
Miss: compulsory 13%
Miss: capacity 4%
Miss: invalidation 2%
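The three miss categories can be told apart with a simple rule, sketched below under stated assumptions: a miss on a key never seen before is compulsory, a miss on a key deleted since its last SET is an invalidation miss, and any other miss is attributed to capacity (eviction). The event format and the rule itself are illustrative assumptions, not the authors' method.

```python
from collections import Counter

def classify_misses(events):
    """Count GET misses by assumed cause.

    events: iterable of (op, key, hit) with op in {"GET", "SET", "DELETE"};
    `hit` is only meaningful for GET.
    """
    seen = set()       # keys that appeared earlier in the trace
    deleted = set()    # keys deleted and not yet set again
    causes = Counter()
    for op, key, hit in events:
        if op == "GET" and not hit:
            if key not in seen:
                causes["compulsory"] += 1
            elif key in deleted:
                causes["invalidation"] += 1
            else:
                causes["capacity"] += 1
        elif op == "SET":
            deleted.discard(key)
        elif op == "DELETE":
            deleted.add(key)
        seen.add(key)
    return causes

# Invented events for illustration.
events = [
    ("GET", "a", False),     # first access: compulsory
    ("SET", "a", False),
    ("GET", "a", True),
    ("DELETE", "a", False),
    ("GET", "a", False),     # deleted since last SET: invalidation
    ("SET", "b", False),
    ("GET", "b", False),     # was set, not deleted: capacity
]
print(classify_misses(events))
```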
21
Conclusion
Analyzed 5 different memcached use cases
Different applications of memcached have extreme variations in access patterns
Answered pertinent questions to improve Facebook’s memcached usage
22
Thank You
Questions?