
Cooperative Caching Middleware for Cluster-Based Servers

Francisco Matias Cuenca-Acuna and Thu D. Nguyen
Panic Lab, Department of Computer Science, Rutgers University

Our work

• Goal
  – Provide a mechanism to co-manage the memory of cluster-based servers
  – Deliver a generic solution that can be reused by Internet servers and file systems

• Motivation
  – Emerging Internet computing model based on infrastructure services like Google, Yahoo!, and others
    » Being built on clusters for scalability and fault tolerance
  – It is hard to build efficient cluster-based servers
    » Dealing with distributed memory is difficult
    » If memories are used independently, servers only perform well when the working set fits on a single node

Previous solutions

[Diagram: a single front end distributes requests over a network to a cluster of web-server nodes, each with its own file system; requests are distributed based on load and data affinity.]

Previous solutions

[Diagram: requests are first spread round-robin across the nodes; a distributed front end on each node then redistributes them based on load and data affinity.]

Our approach

[Diagram: requests are spread round-robin across the nodes; underneath the web servers, a cooperative block caching layer with global block replacement co-manages the nodes' memories.]

Our approach

[Diagram: the same cluster with round-robin request distribution, illustrating other uses for our CC layer: because the layer is generic, it can sit beneath applications other than web servers, such as file systems.]

Why cooperative caching and what do we give up?

• Advantages of our approach
  – Generality
    » Presents a block-level abstraction (sketched below)
    » Can be used across very different applications, such as web servers and file systems
    » Does not need any application knowledge
  – Reusability
    » Packaged as a generic middleware layer
• Disadvantages of our approach
  – Generality plus no application knowledge means possible performance loss
  – How much?
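To make the block-level abstraction concrete, here is a minimal sketch of the kind of read interface such a middleware layer could expose. The names, types, and fixed block size are our own illustrative assumptions, not the actual CCM API; only the lookup order (local memory, then peers, then disk) comes from the slides.

    # Hypothetical block-level interface for a cooperative caching layer.
    from typing import Optional

    BLOCK_SIZE = 8192  # assumed block size (our choice, not from the slides)

    class CooperativeCache:
        def __init__(self):
            # (file_id, block_no) -> bytes; blocks cached in this node's memory
            self.local = {}

        def read_block(self, file_id: str, block_no: int) -> bytes:
            block = self.local.get((file_id, block_no))
            if block is not None:
                return block                                 # local hit
            block = self._fetch_from_peer(file_id, block_no)
            if block is not None:
                return block                                 # global (remote) hit
            return self._read_from_disk(file_id, block_no)   # miss: go to disk

        def _fetch_from_peer(self, file_id: str, block_no: int) -> Optional[bytes]:
            # Placeholder: would follow location hints to the peer holding
            # the master block (see the lookup sketch later in the deck).
            return None

        def _read_from_disk(self, file_id: str, block_no: int) -> bytes:
            with open(file_id, "rb") as f:
                f.seek(block_no * BLOCK_SIZE)
                return f.read(BLOCK_SIZE)

An application such as a web server would serve a file simply by calling read_block for each block, with no knowledge of where in the cluster the blocks actually reside.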

Our contributions

• We study carefully why cooperative caching, as originally designed for cooperative client caching to reduce server load, does not perform as well as content-aware request distribution
• Compared to a web server that uses content-aware request distribution:
  – We lose 70% of throughput when using a traditional cooperative caching algorithm
  – We lose only 8% when using our adapted version (CCM)
• We adapt cooperative caching to better suit cluster-based servers
  – Trade lower local hit rates for higher total hit rates (local + global)

Our cooperative caching algorithm (CCM)

• Files are distributed across all nodes
  – No replication
  – The node holding a file on disk is called the file's home
  – Homes are responsible for tracking which of their blocks are in memory
• Master blocks and non-master blocks
  – Each cached block has exactly one master copy in the cluster
  – CCM only tracks master blocks
• Hint-based block location (sketched below)
  – Algorithm based on Dahlin et al. (1994)
  – Nodes have only approximate knowledge of block locations and may have to follow a chain of nodes to reach a block
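The hint-chasing step can be made concrete with a small sketch. Under our reading of the slides, each node keeps possibly stale hints about which peer holds a block's master copy, and a request is forwarded along the hint chain until the block is found or the chain gives out, at which point the requester falls back to the home node's disk. The Node class and all names below are illustrative assumptions, not CCM's actual interfaces.

    # Hypothetical sketch of hint-based block location.
    class Node:
        def __init__(self, name: str):
            self.name = name
            self.memory = {}  # block_id -> bytes (master blocks cached here)
            self.hints = {}   # block_id -> Node believed to hold the master copy

    def locate_block(start: Node, block_id: str, max_hops: int = 8):
        """Follow the hint chain from `start`. Returns (block, hops), with
        block=None when the chain is exhausted; the caller would then fall
        back to the file's home node and read from disk."""
        node, hops = start, 0
        while hops <= max_hops:
            block = node.memory.get(block_id)
            if block is not None:
                return block, hops            # reached the master block
            nxt = node.hints.get(block_id)
            if nxt is None or nxt is node:
                return None, hops             # stale or missing hint
            node, hops = nxt, hops + 1        # forward along the chain
        return None, hops                     # gave up after max_hops

    # Example: n1's hint points at n2, which holds the master copy of "f:0".
    n1, n2 = Node("n1"), Node("n2")
    n2.memory["f:0"] = b"block data"
    n1.hints["f:0"] = n2
    assert locate_block(n1, "f:0") == (b"block data", 1)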

Replacement mechanisms

• Each node maintains local LRU lists
• Nodes exchange age hints when forwarding blocks
  – The age of the node's oldest block is piggybacked on forwarded messages
• Replacement (sketched below)
  – If the victim is a non-master (local) block, evict it
  – If the victim is a master block:
    » If it is the oldest block in the cluster according to the age hints, evict it
    » Otherwise, forward it to the peer holding the oldest block
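A minimal sketch of this replacement decision, assuming the age hints map each peer to the age of its oldest block (larger meaning older). The Block type and the evict/forward callbacks are illustrative placeholders, not CCM's actual interfaces.

    # Hypothetical sketch of CCM's global replacement decision.
    from dataclasses import dataclass

    @dataclass
    class Block:
        age: int        # LRU age; larger means older
        is_master: bool

    def replace(victim: Block, age_hints: dict, evict, forward):
        """age_hints maps each peer's name to the age of that peer's
        oldest block, as piggybacked on forwarded messages."""
        if not victim.is_master:
            evict(victim)                    # non-master copy: just drop it
            return
        oldest_peer_age = max(age_hints.values(), default=-1)
        if victim.age >= oldest_peer_age:
            evict(victim)                    # oldest block cluster-wide
        else:
            target = max(age_hints, key=age_hints.get)
            forward(victim, target)          # keep master block in cluster memory

    # Example: the victim (age 100) is younger than n3's oldest block (120),
    # so it is forwarded to n3 instead of being evicted.
    replace(Block(age=100, is_master=True), {"n2": 80, "n3": 120},
            evict=lambda b: print("evict", b),
            forward=lambda b, t: print("forward", b, "to", t))

This is the trade described in the conclusions: forwarding costs network bandwidth now, but keeps master blocks in cluster memory, raising the total (local + global) hit rate.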

Example of CCM at work

[Diagram: a request for block b is forwarded from node to node following location hints until the node holding b's master copy is reached; b is then returned to the requester.]

Assessing performance

• We compare a CCM-based web server against one that uses content-aware request distribution
  – L2S (HPDC 2000)
    » Efficient and scalable
    » Application-specific request distribution
    » Maintains global information
    » File-based caching
• Event-driven simulation
  – The same simulator used for L2S
• The platform we simulate is equivalent to:
  – A 1 Gbps VIA LAN
  – Clusters of 4 and 8 nodes, each with a single 800 MHz Pentium III
  – An IDE hard drive on each node

Workload

• Four WWW traces (see the table below)
• We drive the server as fast as possible

    Trace      Avg. req. size   Num. of requests   Working set size
    Calgary    13.67 KB           567,823          128 MB
    Clarknet    9.50 KB         2,978,121          250 MB
    NASA       20.33 KB         3,147,685          250 MB
    Rutgers    17.54 KB           745,815          500 MB

Results

Throughput for Clarknet on 8 nodes

[Charts: throughput (req/sec) versus memory per node (4–256 MB), shown in three steps: L2S versus CCM-Basic; adding CCM-DS; adding CCM.]

Hit rate

[Charts: hit rate distribution on CCM-Basic and on CCM; total hit rate (%) versus memory per node (4–512 MB), broken into local and remote hit rates.]

Normalized throughput

[Chart: throughput normalized to L2S versus memory per node (4–256 MB) for the Clarknet, Rutgers, NASA, and Calgary traces.]

Resource utilization

[Chart: CCM's normalized resource usage for disk, CPU, and NIC versus memory per node (4–512 MB).]

Scalability

[Chart: throughput (req/sec) versus number of nodes.]

Further results

• Performance differences between CCM and L2S may be affected by:
  – L2S's use of TCP hand-off
  – L2S's assumption that files are replicated everywhere
  – Refer to the paper for estimates of the potential performance difference due to these factors
• Current work
  – Limit the amount of metadata maintained by CCM
    » To reduce memory usage
    » To discard outdated information
  – Lazy eviction and forwarding notification
    » On average, a block is found in 1.1 hops (vs. 2.4)
    » 10% decrease in response time
    » 2% increase in throughput

Conclusions

• A generic block-based cooperative caching algorithm can efficiently co-manage cluster memory
  – CCM performs almost as well as a highly optimized content-aware request distribution web server
  – CCM scales linearly with cluster size
  – Presenting a block-based solution to a file-based application led to only a small performance loss, so it should work even better for block-based applications
• CCM achieves high performance by using a new replacement algorithm well-suited to a server environment
  – It trades lower local hit rates and extra network bandwidth for increased total hit rates
  – This is the right trade-off given current network and disk technology trends

Future & related work

• Future work
  – Investigate the importance of load balancing
  – Provide support for writes
  – Validate the simulation results with an implementation
• Some related work
  – PRESS (PPoPP 2001)
  – L2S (HPDC 2000)
  – LARD (ASPLOS 1998)
  – Cooperative Caching (OSDI 1994)
  – Cluster-Based Scalable Network Services (SOSP 1997)

Thanks to

• Liviu Iftode 

• Ricardo Bianchini

• Vinicio Carreras

• Xiaoyan Li

Want more information? www.panic-lab.rutgers.edu

Extra slides – Simulation parameters

Extra slides – Response time

Response time normalized versus L2S

[Chart: response time normalized to L2S versus memory per node (4–256 MB) for the Clarknet, Rutgers, NASA, and Calgary traces.]

Extra slides – Hops vs. hit rate

Number of hops versus hit rate

[Chart: average number of hops (with and without forwarding notification) and global hit rate versus memory per node (4–512 MB).]

Extra slides – Trace characteristics

Extra slides – Using location hints

[Diagram: same as the earlier "CCM at work" example; a request for block b follows location hints from node to node until the node holding b's master copy is reached, and b is returned to the requester.]