1
Memento: Coordinated In-Memory Caching
for Data-Intensive Clusters
Ganesh Ananthanarayanan, Ali Ghodsi, Andrew Wang, Dhruba Borthakur, Srikanth Kandula, Scott Shenker, Ion Stoica
2
Data Intensive Computation
Data analytic clusters are pervasive
◦Jobs run multiple tasks in parallel
◦Jobs operate on petabytes of input
Distributed file systems (DFS) store data distributed and replicated
◦Data reads are either disk-local or remote across the network
3
Access to disk is slow
Memory is orders of magnitude faster
How do we leverage memory storage for datacenter jobs?
4
Can we store all data in memory?
Machines have tens of gigabytes of memory
But, huge discrepancy between storage and memory capacities
◦ Facebook cluster has ~200x more data on disk than memory
Use Memory as Cache
5
Will the data fit in cache?
Heavy-tailed: 10% of total input covers >80% of all jobs
96% of the smallest jobs can fit in the memory cache
6
Elephants and mice
Mix of a few “large” jobs and very many “small” jobs
Large jobs:
◦Batch operations
◦Production jobs
Small jobs:
◦Interactive queries (e.g., Hive, SCOPE)
◦Experimental analytics
7
Challenge: Small Parallel Jobs
Job finishes when its last task finishes
◦Need to cache all-or-nothing
8
In summary…
Only option for memory-locality is caching
96% of jobs can have their data in memory, if we cache it right
9
Outline
FATE: Cache Replacement
Memento: System Architecture
Evaluation
10
We care about jobs finishing faster…
Job j that completed in tn time normally takes tm time with memory caching
◦%Reduction_j = (tn − tm) / tn × 100
Metric: Average % Reduction in Completion Time
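The metric can be computed as in this short sketch; the two job times below are made-up example values (one job runs 10x faster with caching, one is unchanged):

```python
# Sketch of the evaluation metric: average % reduction in job
# completion time. Job times are hypothetical example values.

def pct_reduction(t_normal, t_cached):
    """% reduction in one job's completion time."""
    return (t_normal - t_cached) / t_normal * 100

def avg_pct_reduction(jobs):
    """Average % reduction over (t_normal, t_cached) pairs."""
    return sum(pct_reduction(tn, tm) for tn, tm in jobs) / len(jobs)

# One job 10x faster with caching (90%), one unchanged (0%).
print(avg_pct_reduction([(100.0, 10.0), (50.0, 50.0)]))  # 45.0
```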
11
Traditional Cache Replacement
Traditional cache replacement policies (e.g., LRU, LFU) optimize for hit-ratio
◦ Belady’s MIN: Evict blocks that are to be accessed “farthest in future”
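As an illustration, MIN's eviction rule can be sketched in a few lines; the cache capacity and access sequence here are made-up examples, not the slides' figures:

```python
# Illustration of Belady's MIN: on a miss with a full cache, evict the
# resident block whose next access lies farthest in the future (a
# block never accessed again counts as farthest of all).

def belady_min(capacity, accesses):
    """Simulate MIN over an access sequence; return the number of hits."""
    cache, hits = [], 0
    for i, block in enumerate(accesses):
        if block in cache:
            hits += 1
            continue
        if len(cache) < capacity:
            cache.append(block)
            continue
        future = accesses[i + 1:]
        # Victim: resident block reused farthest ahead (never = farthest).
        victim = max(cache,
                     key=lambda b: future.index(b) if b in future else len(future))
        cache[cache.index(victim)] = block
    return hits

print(belady_min(2, ["A", "B", "A", "C", "B", "A"]))  # 2
```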
12
Belady’s MIN Example
Cache holds 4 data blocks; accesses arrive over time: E, F, B, D, C, A, …
Cache contents under MIN: A B C D → E B C D → E B F D
50% cache hit

13
MIN: How much do jobs benefit?
Memory-local tasks are 10x (or 90%) faster; 4 computation slots
Job J1 reads blocks A, B; Job J2 reads blocks C, D
Final cache (E B F D) holds one block of each job: B for J1, D for J2
◦J1 Reduction: 0% (its task on uncached A still runs at disk speed)
◦J2 Reduction: 0% (its task on uncached C still runs at disk speed)
Average: (0 + 0)/2 = 0%
14
“Whole-job” inputs
Same cache of 4 blocks, same accesses over time: E, F, B, D, C, A, …
Cache contents: A B C D → A B E D → A B E F
◦J1’s whole input (A, B) is retained; J2’s input (C, D) is evicted
50% cache hit, same hit-ratio as MIN

15
“Whole-job” inputs: How much do jobs benefit?
Memory-local tasks are 10x (or 90%) faster; 4 computation slots
Job J1 reads blocks A, B (both cached); Job J2 reads C, D (neither cached)
◦J1 Reduction: 90%
◦J2 Reduction: 0%
Average: (90 + 0)/2 = 45%
With MIN: Average (0 + 0)/2 = 0%
Cache hit-ratio is not the best-suited metric
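The job-level comparison from the example can be reproduced with a small sketch. The blocks, jobs, and 10x speedup follow the slides; the helper names are my own:

```python
# Both caches hold 2 of the 4 input blocks (the same 50% hit-ratio),
# but a parallel job finishes with its last task, so it only benefits
# if ALL of its input blocks are cached (all-or-nothing).

SPEEDUP = 90.0  # % reduction when a job's entire input is memory-local

JOBS = {"J1": ["A", "B"], "J2": ["C", "D"]}

def job_reduction(blocks, cached):
    """% reduction in one job's completion time, all-or-nothing."""
    return SPEEDUP if set(blocks) <= cached else 0.0

def avg_reduction(cached):
    return sum(job_reduction(b, cached) for b in JOBS.values()) / len(JOBS)

print(avg_reduction({"B", "D"}))  # MIN's cache (one block per job): 0.0
print(avg_reduction({"A", "B"}))  # whole-job cache (all of J1): 45.0
```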
16
FATE Cache Replacement
Maximize “whole-job” inputs in cache
Need global coordination
◦Parallel tasks distributed over different machines
Property:
◦Small jobs get preference
◦Large jobs benefit with remaining cache space
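A hedged sketch of a whole-job-style eviction order, not the exact FATE algorithm: prefer evicting blocks of jobs whose cached input is already incomplete (their all-or-nothing benefit is lost anyway), and sacrifice larger jobs before smaller ones:

```python
# Sketch only: one plausible eviction ordering that preserves complete
# small-job inputs. Function and variable names are my own.

def eviction_order(jobs, cached):
    """jobs: {job_id: [block_ids]}; cached: set of cached block ids.
    Return cached blocks in the order they should be evicted."""
    incomplete, complete = [], []
    for blocks in jobs.values():
        in_cache = [b for b in blocks if b in cached]
        if not in_cache:
            continue  # nothing of this job is cached
        bucket = incomplete if len(in_cache) < len(blocks) else complete
        bucket.append((len(blocks), in_cache))
    order = []
    # Incomplete jobs go first; within a bucket, larger jobs go first.
    for _, in_cache in sorted(incomplete, reverse=True) + sorted(complete, reverse=True):
        order.extend(in_cache)
    return order

# J1 (small) is fully cached; J2 (larger) only partially: evict J2's
# blocks first, preserving J1's whole-job input.
print(eviction_order({"J1": ["A", "B"], "J2": ["C", "D", "E"]},
                     {"A", "B", "C", "D"}))  # ['C', 'D', 'A', 'B']
```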
17
Waves in the job
Single Wave (small jobs): All-or-nothing
Multiple Waves (large jobs): Linear benefits
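The wave behavior can be illustrated with a toy model; the durations, the assumption that the scheduler packs cached tasks into the same waves, and the function names are mine, not the paper's:

```python
from math import ceil

# Toy model: a job's tasks run in waves of `slots` parallel tasks, and
# a wave finishes when its slowest task does. Memory-local tasks take
# FAST time, disk-local tasks SLOW. Assumes cached tasks are packed
# into the same waves (a simplifying assumption of this sketch).
FAST, SLOW = 1, 10

def job_time(n_tasks, n_cached, slots):
    waves = ceil(n_tasks / slots)
    slow_waves = ceil((n_tasks - n_cached) / slots)  # waves with a disk task
    return slow_waves * SLOW + (waves - slow_waves) * FAST

# Single-wave job (4 tasks, 4 slots): all-or-nothing.
print(job_time(4, 2, 4), job_time(4, 4, 4))  # 10 1
# Multi-wave job (40 tasks): benefit grows linearly with caching.
print(job_time(40, 0, 4), job_time(40, 20, 4), job_time(40, 40, 4))  # 100 55 10
```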
18
Waves in the job
(figure: single-wave vs. multiple-wave task timelines)
19
Outline
FATE: Cache Replacement
Memento: System Architecture
Evaluation
20
Global coordination of local caches
Global cache view:
Block Id | Client Id | File Name
    …    |     …     |     …
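The global cache view might look like the following sketch. The fields mirror the slide's table (Block Id, Client Id, File Name); the method names and example path are hypothetical, not Memento's actual API:

```python
# Sketch of the coordinator's metadata: a map from block id to where
# it is cached, so tasks can be scheduled memory-locally. Clients
# report cache insertions and evictions (metadata-only communication).

class Coordinator:
    def __init__(self):
        self.view = {}  # block_id -> (client_id, file_name)

    def register(self, block_id, client_id, file_name):
        """A client reports that it cached a block."""
        self.view[block_id] = (client_id, file_name)

    def evicted(self, block_id):
        """A client reports that it evicted a block."""
        self.view.pop(block_id, None)

    def locate(self, block_id):
        """Return (client_id, file_name) if cached anywhere, else None."""
        return self.view.get(block_id)

coord = Coordinator()
coord.register("blk-1", "client-7", "/data/part-0")
print(coord.locate("blk-1"))  # ('client-7', '/data/part-0')
```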
21
Memento: Salient Features
External Service
Local cache reads
Metadata communication
22
Outline
FATE: Cache Replacement
Memento: System Architecture
Evaluation
23
Evaluation
HDFS in conjunction with Memento
Microsoft and Facebook traces replayed
◦Replay jobs with same inter-arrival time
Deployment on EC2 cluster of 100 machines
◦20GB memory for Memento
Jobs binned by their size
24
Job Distribution, by bins
25
Jobs are 77% faster on average
Small jobs see 85% reduction in completion time
26
Cache hit-ratio matters less
Average job faster by 77% with FATE vs. 49% with MIN
27
Memento scales sufficiently
Coordinator handles 10,000 simultaneous client communications
Client can handle eight simultaneous local map tasks
Sufficient for current datacenter loads
28
Ongoing / Future work >>
29
Simpler Implementation [1]
Ride the OS cache
◦Estimate where block is cached
Change job manager to track block accesses
◦No FATE, use default (LRU?)
Initial results show 2.3x improvement in cache hit-rate
30
Alternate Metrics [2]
We optimize for “average % reduction in completion time” of jobs
Average:
◦Weighted to include job priorities?
Other metrics:
◦Reduction of load on disk subsystem?
◦Utilization?
31
Solid State Devices [3]
SSDs, a new layer in the storage hierarchy
Hierarchical Caching
◦Include SSDs between disk and memory
What’s the best cache replacement policy?
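One possible shape of such a hierarchy, as a hedged sketch: memory misses fall through to SSD, SSD hits are promoted, and memory evictions are demoted to SSD instead of dropped. LRU per tier is purely an assumption here; choosing the right per-tier policy is exactly the open question the slide raises:

```python
from collections import OrderedDict

# Sketch of a two-tier (memory over SSD) cache. Class and method
# names are hypothetical; per-tier LRU is an assumption.

class TieredCache:
    def __init__(self, mem_cap, ssd_cap):
        self.mem = OrderedDict()   # block -> data, LRU order
        self.ssd = OrderedDict()
        self.mem_cap, self.ssd_cap = mem_cap, ssd_cap

    def _put(self, tier, cap, block, data, demote_to=None):
        tier[block] = data
        tier.move_to_end(block)
        if len(tier) > cap:
            victim, vdata = tier.popitem(last=False)  # evict LRU entry
            if demote_to is not None:
                self._put(demote_to, self.ssd_cap, victim, vdata)

    def get(self, block, load_from_disk):
        if block in self.mem:
            self.mem.move_to_end(block)
            return self.mem[block]
        # Memory miss: try SSD, else go to disk; promote into memory
        # and demote any memory victim to SSD.
        data = self.ssd.pop(block) if block in self.ssd else load_from_disk(block)
        self._put(self.mem, self.mem_cap, block, data, demote_to=self.ssd)
        return data

cache = TieredCache(mem_cap=1, ssd_cap=2)
cache.get("a", lambda b: b.upper())  # disk -> memory
cache.get("b", lambda b: b.upper())  # "a" demoted to SSD
print(cache.get("a", lambda b: b.upper()))  # served via SSD: A
```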
32
Summary
Memory-caching can be surprisingly effective
◦…despite disk and memory capacity discrepancy
Memento: Coordinated cache management
◦FATE Replacement Policy (“whole-jobs”)
Encouraging results for datacenter workloads