An In-Memory Object Caching Framework with Adaptive Load Balancing

Yue Cheng (Virginia Tech), Aayush Gupta (IBM Research – Almaden), Ali R. Butt (Virginia Tech)

An In-Memory Object Caching Framework with Adaptive Load ...yuecheng/docs/eurosys15_talk.pdf · In-memory caching in datacenters 4 Client library get(key) set(key) Cache miss set(key)


In-memory caching in datacenters

[Diagram, built up over three slides: web app servers use a client library to reach, over the network, an in-memory caching tier (e.g., Memcached) with get(key) and set(key). On a cache miss, a DB query goes to the persistent storage tier (e.g., MySQL), followed by set(key) on the cache. The setup is shown for both local deployment and cloud deployment.]
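The get / cache miss / set flow in the diagram can be sketched as follows. This is a minimal illustration, not MBal or Memcached code: the class name `CacheAsideClient`, the in-process dictionary standing in for the caching tier, and the `db_query` callback standing in for the storage tier are all assumptions made for the example.

```python
from typing import Callable, Dict, Optional


class CacheAsideClient:
    """Toy client library: look in the cache first, fall back to the database."""

    def __init__(self, db_query: Callable[[str], Optional[str]]) -> None:
        self._cache: Dict[str, str] = {}  # hypothetical stand-in for the caching tier
        self._db_query = db_query         # hypothetical stand-in for the storage tier
        self.misses = 0

    def get(self, key: str) -> Optional[str]:
        value = self._cache.get(key)
        if value is not None:
            return value                  # cache hit
        self.misses += 1
        value = self._db_query(key)       # cache miss: query the database
        if value is not None:
            self.set(key, value)          # populate the cache for later reads
        return value

    def set(self, key: str, value: str) -> None:
        self._cache[key] = value
```

With a dictionary as the "database", `CacheAsideClient({"user:1": "alice"}.get)` misses on the first `get("user:1")` and serves the second one from the cache.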

In-memory caching is desirable

•  Offers high performance
•  Enables quick deployment
•  Provides ease of use
•  Supports elastic scale-out

•  Problem: Load imbalance impacts performance

Access load imbalance

[Figure, built up over six slides: per-client throughput (QPS in thousands, axis 0 to 16) and 99th %ile latency (ms, axis 0 to 35) versus workload skewness (Zipfian constant: unif, 0.4, 0.8, 0.9, 0.99, 1.01, 1.1), ranging from ideal balance to high imbalance. Workload: 95% GET, 5% SET, Zipfian, 20 cache servers. Annotations: "> 60%!" on the throughput plot and "3.2×" on the latency plot.]

[Inset: key popularity distribution for different Zipfian constants (0.5, 0.9, 1.1).*]

* http://www.percona.com/blog/2012/05/09/new-distribution-of-random-generator-for-sysbench-zipf/

Great opportunity for performance improvement
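To see why a larger Zipfian constant concentrates load on a few servers, the following sketch computes the expected share of requests per server. It is an illustration under stated assumptions, not the talk's benchmark: keys are placed by a plain SHA-256 hash modulo the server count, popularity is proportional to 1 / rank^s, and the function names are invented for the example.

```python
import hashlib
from typing import List


def zipf_popularity(num_keys: int, s: float) -> List[float]:
    """Access probabilities proportional to 1 / rank**s for ranks 1..num_keys."""
    weights = [1.0 / (rank ** s) for rank in range(1, num_keys + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def server_loads(num_keys: int, num_servers: int, s: float) -> List[float]:
    """Expected fraction of requests per server under hash partitioning (assumed)."""
    loads = [0.0] * num_servers
    for rank, p in enumerate(zipf_popularity(num_keys, s), start=1):
        digest = hashlib.sha256(f"key-{rank}".encode("utf-8")).digest()
        server = int.from_bytes(digest[:8], "big") % num_servers
        loads[server] += p
    return loads


def imbalance(loads: List[float]) -> float:
    """Max-to-mean load ratio; 1.0 means perfectly balanced."""
    return max(loads) / (sum(loads) / len(loads))


if __name__ == "__main__":
    for s in (0.0, 0.4, 0.8, 0.99, 1.1):  # s = 0.0 is the uniform case
        print(s, round(imbalance(server_loads(10_000, 20, s)), 2))
```

With s = 0 the ratio stays close to 1, while at s = 1.1 the single most popular key already carries more than the average server's share, so whichever server holds it becomes the hotspot.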

Revisiting in-memory cache design

Our contribution: MBal

A holistic in-memory caching framework with adaptive Multi-phase load Balancing

•  Synthesizes different load balancing techniques
    –  Key replication
    –  Server-local cachelet migration
    –  Coordinated cachelet migration
•  Improves scale-up gains
•  Mitigates load imbalance

Outline

•  MBal cache design
•  MBal load balancer design
•  Evaluation
•  Related work

MBal Cache Design

A typical in-memory cache design

[Diagram: worker threads 1 … N (CPU) all access one shared in-memory data structure that holds the in-memory data (DRAM).]

MBal: Fine-grained resource partitioning

[Diagram: each of workers 1 … N (CPU) is paired with its own cachelet and its own memory partition (DRAM).]

MBal cachelet: Resource encapsulation

[Diagram: a cachelet containing indexing metadata (e.g., chained hash table) and slab classes.]

•  Cachelet
    –  Encapsulates resources
    –  Avoids lock contention
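A rough sketch of the encapsulation idea: every cachelet carries its own index and its own memory budget, and a worker only ever touches the cachelets it owns, so no cross-thread lock is needed. The names `Cachelet` and `Worker`, the byte accounting, and the refuse-when-full policy are assumptions for illustration; the slide only states that a cachelet bundles indexing metadata and slab classes.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Cachelet:
    """One self-contained partition: its own index and its own byte budget."""
    cachelet_id: int
    capacity_bytes: int
    index: Dict[str, bytes] = field(default_factory=dict)  # stands in for the hash table
    used_bytes: int = 0

    def set(self, key: str, value: bytes) -> bool:
        old = self.index.get(key)
        delta = len(value) - (len(old) if old is not None else 0)
        if self.used_bytes + delta > self.capacity_bytes:
            return False  # a real cache would evict; this sketch simply refuses
        self.index[key] = value
        self.used_bytes += delta
        return True

    def get(self, key: str) -> Optional[bytes]:
        return self.index.get(key)


@dataclass
class Worker:
    """Private state of one worker thread: only this worker uses these cachelets."""
    worker_id: int
    cachelets: Dict[int, Cachelet] = field(default_factory=dict)
```

Because all state a request needs lives inside one cachelet, moving a cachelet to another worker amounts to handing over one object, which is what the later migration phases rely on.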

Key-to-cachelet mapping

[Diagram, built up over four slides: on the client side, MBal clients send queries to the MBal cache on the server side. The client holds a key ring of virtual nodes (VN1, VN2, …, VNN-1, VNN); the server hosts cachelets (C1, C2, C3). Requests are drawn as a header followed by key, (value*), with a "cachelet id" label on the request.]

① Compute VN # with hash: hash(key)
② Map VN # to Cachelet ID
③ Map Cachelet ID to the worker thread
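The three mapping steps can be sketched as two table lookups after a hash. This is only an illustration: the choice of SHA-256, the modulo reduction, and the names `vn_of` and `MappingTable` are assumptions, and the slides do not say which hash function or table layout MBal uses.

```python
import hashlib
from typing import Dict, Tuple


def vn_of(key: str, num_vns: int) -> int:
    """Step 1: hash the key onto one of num_vns virtual nodes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_vns


class MappingTable:
    """Client-side view of VN -> cachelet and cachelet -> (server, worker)."""

    def __init__(self, vn_to_cachelet: Dict[int, int],
                 cachelet_to_worker: Dict[int, Tuple[str, int]]) -> None:
        # vn_to_cachelet is expected to have keys 0 .. num_vns - 1
        self.vn_to_cachelet = vn_to_cachelet
        self.cachelet_to_worker = cachelet_to_worker

    def route(self, key: str) -> Tuple[int, str, int]:
        vn = vn_of(key, len(self.vn_to_cachelet))               # step 1
        cachelet_id = self.vn_to_cachelet[vn]                   # step 2
        server, worker = self.cachelet_to_worker[cachelet_id]   # step 3
        return cachelet_id, server, worker
```

Keeping the second table separate means a cachelet can be reassigned to another worker or server by changing one entry, without rehashing any keys.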

MBal Multi-Phase Load Balancer

Phase 1: Key replication

•  TRIGGER?
    –  EWMA access > threshold
•  ACTION?
    –  Randomly pick a shadow server
    –  Replicate hot keys
    –  Proportional sampling
•  FEATURES?
    –  Fine-grained
    –  Temporary

[Diagram: MBal client 1 and MBal client 2 each issue get(foo); key foo is held on both MBal cache 1 and MBal cache 2, one of which acts as the shadow server. Label: key replication.]

* SPORE [SoCC’13]
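The "EWMA access > threshold" trigger can be illustrated with a small detector. The sketch assumes that accesses are counted per key in fixed epochs and smoothed with a weight `alpha`; the class name, the epoch model, and the parameter values are hypothetical and not taken from the talk.

```python
from typing import Dict, List


class EwmaHotKeyDetector:
    """Tracks an exponentially weighted moving average of per-key access counts."""

    def __init__(self, alpha: float, threshold: float) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.threshold = threshold
        self._ewma: Dict[str, float] = {}

    def end_epoch(self, counts: Dict[str, int]) -> List[str]:
        """Fold one epoch of access counts in and return the keys now considered hot."""
        for key in set(self._ewma) | set(counts):
            previous = self._ewma.get(key, 0.0)
            current = counts.get(key, 0)
            self._ewma[key] = self.alpha * current + (1.0 - self.alpha) * previous
        return sorted(k for k, v in self._ewma.items() if v > self.threshold)
```

The smoothing keeps a single burst from flapping the hot set on and off: a key must stay busy over several epochs to remain above the threshold, and it cools down gradually once traffic moves away.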

Phase 2: Server-local cachelet migration

•  TRIGGER?
    –  # hot keys > REPLHIGH
    –  Enough local headroom
•  ACTION?
    –  Migrate/swap cachelet(s) within a server
    –  ILP
•  FEATURES?
    –  Coarse-grained
    –  Temporary

[Diagram: MBal client 1 and MBal client 2 access one MBal cache. Label: server-local migration.]

Phase 3: Coordinated cachelet migration

•  TRIGGER?
    –  # hot keys > REPLHIGH
    –  Not enough local headroom
•  ACTION?
    –  Migrate/swap cachelet(s) across servers
    –  ILP
•  FEATURES?
    –  Coarse-grained
    –  Permanent

[Diagram, built up over three slides: MBal client 1 and MBal client 2, MBal cache 1 and MBal cache 2, and an MBal coordinator. Label: coordinated migration.]
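Read together, the three TRIGGER lists suggest a simple decision rule, sketched below. The function `choose_phase` and its parameters are hypothetical, and the rule that Phase 1 applies whenever hot keys exist but do not exceed REPLHIGH is an inference from the slides rather than something they state.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    KEY_REPLICATION = 1        # Phase 1
    LOCAL_MIGRATION = 2        # Phase 2
    COORDINATED_MIGRATION = 3  # Phase 3


def choose_phase(num_hot_keys: int, repl_high: int,
                 has_local_headroom: bool) -> Optional[Phase]:
    """Pick the cheapest phase whose trigger condition holds, or None if idle."""
    if num_hot_keys <= 0:
        return None                         # nothing is hot, nothing to do
    if num_hot_keys <= repl_high:
        return Phase.KEY_REPLICATION        # assumed: few hot keys, replicate them
    if has_local_headroom:
        return Phase.LOCAL_MIGRATION        # # hot keys > REPLHIGH, enough headroom
    return Phase.COORDINATED_MIGRATION      # # hot keys > REPLHIGH, no headroom
```

The ordering escalates from the fine-grained, temporary fix to the coarse-grained, permanent one only when the cheaper option is not applicable.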

Client-side mapping change

•  Phase 2: Server-local cachelet migration
    –  Clients are informed of a cachelet migration when the cache's home worker receives requests for the migrated cachelet
•  Phase 3: Coordinated cachelet migration
    –  Once migration is done, the source worker informs the coordinator about the mapping change
    –  Clients ping the coordinator periodically
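The Phase 3 update path (the source worker reports to the coordinator, clients poll it) might look like the sketch below. Everything here is hypothetical: the `Coordinator` and `Client` classes, the version counter used to detect changes, and the in-process method calls that stand in for network messages.

```python
from typing import Dict, Tuple


class Coordinator:
    """Holds the authoritative cachelet -> server mapping plus a version counter."""

    def __init__(self, mapping: Dict[int, str]) -> None:
        self._mapping = dict(mapping)
        self.version = 0

    def report_migration(self, cachelet_id: int, new_server: str) -> None:
        """Called by the source worker once a migration has finished."""
        self._mapping[cachelet_id] = new_server
        self.version += 1

    def snapshot(self) -> Tuple[int, Dict[int, str]]:
        return self.version, dict(self._mapping)


class Client:
    """Caches the mapping locally and refreshes it when it pings the coordinator."""

    def __init__(self, coordinator: Coordinator) -> None:
        self._coordinator = coordinator
        self.version, self.mapping = coordinator.snapshot()

    def ping(self) -> bool:
        """Periodic poll; returns True if the local mapping was refreshed."""
        version, mapping = self._coordinator.snapshot()
        if version == self.version:
            return False
        self.version, self.mapping = version, mapping
        return True
```

Between a migration and the next ping the client's view is stale, which is why some form of forwarding or redirect at the old location, as described for Phase 2, is still useful during that window.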

MBal: Cost/benefit trade-offs

[Chart, built up over three slides: the three phases placed on a benefit axis (low to high) against a cost axis (low to high).]

•  P1: Key replication
    –  Cost: metadata; space; n/w transfer
    –  Benefit: fast fix for hot keys
•  P2: Server-local cachelet migration
    –  Cost: metadata
    –  Benefit: fast fix for hot partitions
•  P3: Coordinated cachelet migration
    –  Cost: metadata; bulk n/w transfer
    –  Benefit: global load balancing

Evaluation

Methodology

•  Scale-up cache performance tests
    –  Local testbed (8-core server)
    –  Single instance
•  End-to-end load balancer evaluation
    –  20-VM cluster (Amazon EC2, c3.large)

MBal evaluation – micro-benchmark

•  8-core 2.5GHz, 2×10MB L3 LLC, 64GB DRAM
•  Uniform workload, 100% GET, 10B key, 20B value
•  Without network

[Figure, built up over four slides: throughput (QPS in millions, axis 0 to 45) versus number of threads (1, 2, 4, 6, 8) for MBal, MBal no NUMA, Mercury, and Memcached. Annotations: "1.7×" and "15%".]

✓ MBal uses fine-grained cachelet design
✓ MBal eliminates bucket-level lock contention

MBal evaluation – micro-benchmark

•  8-core 2.5GHz, 2×10MB L3 LLC, 64GB DRAM
•  Uniform workload, 100% SET, 10B key, 20B value
•  Without network

[Figure, built up over five slides: throughput (QPS in millions) versus number of threads (1, 2, 4, 6, 8). The first build shows Memcached alone (axis 0 to 1.4), the second adds Mercury (axis 0 to 3), and the final build shows MBal, MBal no NUMA, Mercury, and Memcached (axis 0 to 16). Annotation: "62×".]

✓ MBal eliminates global cache lock contention!

End-to-end load balancer evaluation

Workload     Characteristics                                      Application scenario
Workload A   100% read, Zipfian                                   User account status info
Workload B   95% read, 5% update, hotspot (95% ops on 5% data)    Photo tagging
Workload C   50% read, 50% update, Zipfian                        Session store recording actions

Amazon EC2, us-west-2b; clients on 36 instances (c3.2xlarge); MBal caches on 20-VM cluster (c3.large)

Load balancer evaluation

[Figure, built up over seven slides: 90th %ile latency (ms, axis 0 to 1600) versus runtime (seconds, axis 0 to 500), with Workload A, Workload B, and Workload C run one after another. Curves: Memcached; MBal, w/o load balancer; MBal, all phases; ideal balance. Zoomed builds show Workload A (runtime ticks 0, 100), Workload B (ticks 200, 300), and Workload C (ticks 400, 500).]

•  Memcached is unable to sustain write-intensive workload
•  Workload A (100% read, Zipfian): all 3 phases are triggered (annotation: 35%)
•  Workload B (95% read, 5% update, hotspot): only Phase 2 is needed
•  Workload C (50% read, 50% update, Zipfian): a combination of Phase 2 & 3 is triggered (annotation: 23%)

Summary of results

•  MBal fine-grained partitioning design
    –  2× more QPS for GETs
    –  62× more QPS for SETs
•  MBal multi-phase load balancer
    –  35% lower tail latency
    –  20% higher throughput

Improves “BANG for the buck”

Related work

•  High performance in-memory KV store
    –  Masstree [EuroSys’12], MemC3 [NSDI’13], MICA [NSDI’14]
•  Storage load balancing
    –  DHT (Pastry [Middleware’01], CFS [SOSP’01], Chord [SIGCOMM’01]), Proteus [ICDCS’13]
•  Access load balancing
    –  SmallCache [SoCC’11], Chronos [SoCC’12], SPORE [SoCC’13], Streaming Analytics [Feedback’14]

Conclusions

•  Fine-grained, horizontal partitioning of in-memory data structure
    –  Eliminates sync overhead
    –  Enables load balancing
•  MBal synthesizes three replication and migration techniques into a holistic system
    –  Reduces load imbalance
    –  Improves tail latency

Thank you!

http://research.cs.vt.edu/dssl/

Yue Cheng, Aayush Gupta, Ali R. Butt

Backup Slides

Memcached is desirable

•  Quick deployment
•  Ease of use

Memcached deployment in the Cloud

•  Quick deployment
•  Ease of use
•  Elastic scale-up
•  Elastic scale-out

Instance type   vCPU   ECU   N/w (Gbps)   Price/hr
m1.small           1     1          0.1     $0.044
m3.medium          1     3          0.5     $0.070
c3.large           2     7          0.6     $0.105
m3.xlarge          4    13          0.7     $0.280
c3.2xlarge         8    28          1       $0.420
c3.8xlarge        32   108         10       $1.680

“Decision paralysis ...”

Getting the most “BANG for the buck”

•  Desire 1: performance
•  Desire 2: $ efficiency

Page 71

Desire 1: Performance

95% GET, 5% SET, Uniform

[Figure: throughput (QPS in millions, 0 to 2.4) vs. Memcached cluster size (1, 5, 10, 20) for c3.large, m3.xlarge, c3.2xlarge, and c3.8xlarge]

Page 72

Desire 1: Performance

95% GET, 5% SET, Uniform

[Figure: throughput (QPS in millions, 0 to 2.4) vs. Memcached cluster size (1, 5, 10, 20) for c3.large, m3.xlarge, c3.2xlarge, and c3.8xlarge]

Network is the bottleneck!
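The "network is the bottleneck" observation can be sanity-checked with a back-of-the-envelope bound: a node cannot serve more requests per second than its link bandwidth divided by the bytes each request puts on the wire. The sketch below applies that bound to the N/w column of the instance table. The 200-byte per-request wire size and the helper name `network_bound_qps` are illustrative assumptions, not figures from the talk, and nominal bandwidth is only an upper limit on what a cloud instance actually delivers.

```python
# Nominal network bandwidth per instance type, in Gbps (from the instance table).
NETWORK_GBPS = {
    "m1.small": 0.1,
    "m3.medium": 0.5,
    "c3.large": 0.6,
    "m3.xlarge": 0.7,
    "c3.2xlarge": 1.0,
    "c3.8xlarge": 10.0,
}

# ASSUMPTION: average bytes on the wire per request (request + response +
# protocol headers). This number is made up for illustration.
ASSUMED_BYTES_PER_REQUEST = 200


def network_bound_qps(gbps, bytes_per_request=ASSUMED_BYTES_PER_REQUEST):
    """Upper bound on requests/second imposed by link bandwidth alone."""
    bytes_per_second = gbps * 1e9 / 8  # 1 Gbps = 10^9 bits per second
    return bytes_per_second / bytes_per_request


if __name__ == "__main__":
    for name, gbps in NETWORK_GBPS.items():
        bound = network_bound_qps(gbps) / 1e6
        print(f"{name:>10}: at most {bound:.2f} M QPS per node")
```

If measured per-node throughput sits close to this bound, adding CPU will not help; if it sits well below, some other resource is the limit.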

Page 73

Desire 1: Performance

95% GET, 5% SET, Uniform

[Figure: throughput (QPS in millions, 0 to 2.4) vs. Memcached cluster size (1, 5, 10, 20) for m1.small, m3.medium, c3.large, m3.xlarge, c3.2xlarge, and c3.8xlarge]

CPU is the bottleneck!

Page 74

Desire 2: $ efficiency

$ efficiency = QPS/$

95% GET, 5% SET, Uniform

[Figure: $ efficiency (million QPS/$, 0 to 1) vs. Memcached cluster size (1, 5, 10, 20) for m1.small, m3.medium, c3.large, m3.xlarge, c3.2xlarge, and c3.8xlarge]

Page 75

Desire 2: $ efficiency

$ efficiency = QPS/$

95% GET, 5% SET, Uniform

[Figure: $ efficiency (million QPS/$, 0 to 1) vs. Memcached cluster size (1, 5, 10, 20) for m1.small, m3.medium, c3.large, m3.xlarge, c3.2xlarge, and c3.8xlarge]

•  Adding more resources is NOT a good solution
•  Extra CPU capacity is wasted in the cloud
•  Instance with modest CPU offers best $ efficiency
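The "$ efficiency = QPS/$" metric can be computed as in the sketch below. It assumes that "$" means the hourly cost of the whole cluster (instance price per hour times cluster size), which the slides suggest but do not state explicitly. The prices come from the instance table; the throughput figures in the usage section are hypothetical placeholders, not measurements from the talk.

```python
# Hourly prices in USD (from the instance table).
PRICE_PER_HOUR = {
    "m1.small": 0.044,
    "m3.medium": 0.070,
    "c3.large": 0.105,
    "m3.xlarge": 0.280,
    "c3.2xlarge": 0.420,
    "c3.8xlarge": 1.680,
}


def dollar_efficiency(cluster_qps, instance_type, cluster_size):
    """QPS per dollar, taking "$" to be the cluster's hourly cost (ASSUMPTION)."""
    hourly_cost = PRICE_PER_HOUR[instance_type] * cluster_size
    return cluster_qps / hourly_cost


if __name__ == "__main__":
    # HYPOTHETICAL throughput numbers, chosen only to show the calculation.
    scenarios = [
        ("c3.large", 10, 1_000_000),
        ("c3.8xlarge", 10, 2_000_000),
    ]
    for instance_type, size, qps in scenarios:
        eff = dollar_efficiency(qps, instance_type, size)
        print(f"{size} x {instance_type}: {eff / 1e6:.3f} million QPS/$")
```

Under this definition, a larger instance only improves $ efficiency if its throughput grows faster than its price, which is the comparison the bullets above summarize.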

Page 76

MBal evaluation – complete system

•  8-core 2.5GHz, 2×10MB L3 LLC, 64GB DRAM
•  Zipfian workload, 75% GET, 10B key 20B value
•  10Gb Ethernet, MultiGET

Throughput (QPS in millions) by number of threads/instances (bar labels, read in legend order):

Threads/instances   1     2     4     6     8
MBal                0.2   0.5   0.9   1.3   1.8
Mercury             0.2   0.4   0.6   0.7   0.8
Memcached           0.1   0.2   0.3   0.2   0.3
Multi-inst Mc       0.2   0.4   0.8   1.1   1.5
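The workload parameters listed on this slide (Zipfian key popularity, 75% GET, 10-byte keys, 20-byte values) can be approximated with a small request generator like the one below. This is a minimal sketch, not the benchmark tool used in the talk: the key-space size, the Zipf exponent of 0.99, the seed, and all function names are assumptions chosen for illustration.

```python
import random
from itertools import accumulate


def make_zipf_sampler(num_keys, exponent, rng):
    """Return a function drawing key ranks with P(rank k) proportional to 1 / k**exponent."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_keys + 1)]
    cum_weights = list(accumulate(weights))
    ranks = range(num_keys)

    def sample():
        return rng.choices(ranks, cum_weights=cum_weights, k=1)[0]

    return sample


def generate_requests(count, num_keys=10_000, exponent=0.99, get_ratio=0.75,
                      key_size=10, value_size=20, seed=42):
    """Yield (op, key, value) tuples; value is None for GET requests."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    sample_rank = make_zipf_sampler(num_keys, exponent, rng)
    for _ in range(count):
        # Zero-padded decimal rank gives a fixed-size key (10 bytes by default).
        key = f"{sample_rank():0{key_size}d}".encode()
        if rng.random() < get_ratio:
            yield ("GET", key, None)
        else:
            value = bytes(rng.getrandbits(8) for _ in range(value_size))
            yield ("SET", key, value)


if __name__ == "__main__":
    for op, key, value in generate_requests(5):
        print(op, key, value)
```

Each generated tuple can be fed to any Memcached client; batching several GET keys into one call would approximate the MultiGET setting mentioned on the slide.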

Page 77

MBal evaluation – complete system

•  8-core 2.5GHz, 2×10MB L3 LLC, 64GB DRAM
•  Zipfian workload, 75% GET, 10B key 20B value
•  10Gb Ethernet, MultiGET

[Figure: throughput (QPS in millions, 0 to 2.00) vs. number of threads/instances (1, 2, 4, 6, 8) for MBal, Mercury, Memcached, and Multi-inst Mc; annotated with “20%”]

Page 78

MBal evaluation – complete system

•  8-core 2.5GHz, 2×10MB L3 LLC, 64GB DRAM
•  Zipfian workload, 75% GET, 10B key 20B value
•  10GbE network, MultiGET

[Figure: throughput (QPS in millions, 0 to 2.00) vs. number of threads/instances (1, 2, 4, 6, 8) for MBal, Mercury, Memcached, and Multi-inst Mc; annotated with “20%”]

✓  MBal uses lightweight CPU cache-aligned bucket locks!
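The slide states only that MBal uses lightweight, cache-aligned bucket locks; it does not show the implementation. The sketch below illustrates the general idea of per-bucket locking (one lock per hash bucket instead of one global lock), so that operations on different buckets do not contend. It is an assumption-laden illustration rather than MBal's code: the class name `BucketLockedTable` is hypothetical, and Python cannot control cache-line placement, so the alignment aspect (presumably keeping each lock on its own cache line to avoid false sharing) is not represented here.

```python
import threading


class BucketLockedTable:
    """Hash table with one lock per bucket (hypothetical illustration)."""

    def __init__(self, num_buckets=1024):
        self._buckets = [{} for _ in range(num_buckets)]
        self._locks = [threading.Lock() for _ in range(num_buckets)]

    def _bucket_index(self, key):
        return hash(key) % len(self._buckets)

    def get(self, key, default=None):
        i = self._bucket_index(key)
        with self._locks[i]:  # only this bucket is locked
            return self._buckets[i].get(key, default)

    def set(self, key, value):
        i = self._bucket_index(key)
        with self._locks[i]:
            self._buckets[i][key] = value

    def update(self, key, fn, default=None):
        """Atomically replace the value for key with fn(old_value)."""
        i = self._bucket_index(key)
        with self._locks[i]:
            new_value = fn(self._buckets[i].get(key, default))
            self._buckets[i][key] = new_value
            return new_value


if __name__ == "__main__":
    table = BucketLockedTable(num_buckets=8)
    table.set(b"user:1", b"alice")
    print(table.get(b"user:1"))
```

Two threads touching keys in different buckets take different locks and proceed independently; only same-bucket accesses serialize.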

Page 79

Event breakdown in E2E test

[Figure: event breakdown (%, 0% to 100%) by phase (P1, P2, P3) vs. runtime (10 to 600 seconds), across Workload A, Workload B, and Workload C]

Phase 3 is sparingly used

Page 80

Multi-core scalability

•  32-core 2GHz, 64GB DRAM
•  memaslap with MultiGET, 16B key 32B value
•  10GbE network

[Figure: per-core throughput (QPS in millions, 0 to 0.3) vs. number of threads (0 to 35) for MBal, Mercury, and Memcached at 90% GET and at 50% GET, with “Ideal scalability” marked]

Page 81

99th percentile latency vs. throughput

[Figure: 99th percentile latency (ms, 0.4 to 2.2) vs. throughput (QPS in thousands, 350 to 1150) for Memcached, Mercury, MBal (w/o load balancer), MBal (P1), MBal (P2), MBal (P3), and MBal (Unif); annotated with “Latency improvement” and “Throughput improvement”]