On the Scale and Performance of Cooperative Web Proxy Caching

On the Scale and Performance of Cooperative Web Proxy Caching

University of Washington

Alec Wolman, Geoff Voelker, Nitin Sharma, Neal Cardwell, Anna Karlin, Henry Levy

2

Caching for a Better WebCaching for a Better Web

Performance is a major concern in the Web Proxy caching is the most commonly used method to

improve Web performance Duplicate requests to the same document served from cache Hits reduce latency, network utilization, server load Misses increase latency (extra hops)

Clients Proxy Cache Servers

Hits

Misses Misses

Internet

3

Cache EffectivenessCache Effectiveness

Previous work has shown that hit rate increases with population size [ Duska et al. 97, Breslau et al. 98 ]

A single proxy cache has practical limits Load, network topology, organizational constraints

One technique to scale the client population is to have proxy caches cooperate

4

Cooperative Web Proxy CachingCooperative Web Proxy Caching

Sharing and/or coordination of cache state among multiple Web proxy cache nodes

Effectiveness of proxy cooperation depends on: Inter-proxy communication distance Size of client population served

Proxy utilization and load balance

Clients

ClientsProxy

Clients

Internet

5

Cooperative Web CachingCooperative Web Caching

How much benefit does cooperative caching provide in the Web environment?

6

OutlineOutline

Introduction & related work Trace methodology Cooperative caching for small & medium scale populations Scaling to larger client populations Latency Conclusions

7

Previous ResearchPrevious Research

Cooperative proxy caching is a popular research topic:[e.g. Chankhunthod et al. 96, Zhang et al. 97, Fan et al. 98, Krishnan et al. 98,

Menaud et al. 98, Tewari et al. 98, Touch 98, Karger et al. 99 ...]

Focus is on highly scalable algorithms Some seek to scale to the entire Web

8

ChallengesChallenges

No real understanding of document sharing across diverse organizations

Little analytic or empirical evaluation of these algorithms using realistic workloads for large-scale client populations

Problem: Evaluating cooperative proxy caching requires multiple

simultaneous traces of Web proxies, across a diverse set of organizations

9

Our ContributionOur Contribution

We use multi-organization traces to evaluate cooperative proxy caching at small and medium scales

We use analytic modelling to evaluate cooperative caching at scales beyond those available in our traces

10

A Multi-Organization TraceA Multi-Organization Trace

University of Washington (UW) is a large and diverse client population Approximately 50K people

UW client population contains 200 independent campus organizations

Museums of Art and Natural History Schools of Medicine, Dentistry, Nursing Departments of Computer Science, History, and Music

A trace of UW is effectively a simultaneous trace of 200 diverse client organizations

11

Cooperation Across OrganizationsCooperation Across Organizations

By considering each UW organization as an independent “company” with its own clients and its own proxy, we can empirically evaluate cooperative caching across diverse client populations

How much Web document reuse is there between these organizations? Place a proxy cache in front of each organization. What is the

benefit of cooperative caching among these 200 proxies?

12

UW Trace CharacteristicsUW Trace Characteristics

Trace collected at UW network border (May 1999) Filtered: requests from UW clients, responses from external Web servers Most of requests come directly from clients (0.5% come from proxies)

Trace UW

Duration 7 daysHTTP objects 18.4 millionHTTP requests 82.8 millionAvg. requests/sec 137Total Bytes 677 GBServers 244,211Clients 22,984

13

QuestionQuestion

What is the benefit of cooperative caching among the 200 UW organizational proxies?

14

Ideal Hit Rates for UW proxiesIdeal Hit Rates for UW proxies

Ideal hit rate - infinite storage, ignore cacheability, expirations

Average ideal localhit rate: 43%

15

Ideal Hit Rates for UW proxiesIdeal Hit Rates for UW proxies

Ideal hit rate - infinite storage, ignore cacheability, expirations

Average ideal localhit rate: 43%

Explore benefits of perfect cooperation rather than a particular algorithm

Average ideal hit rate increases from 43% to 69% with cooperative caching

16

Cacheable Hit Rates forUW proxies

Cacheable Hit Rates forUW proxies

Cacheable hit rate - same as ideal, but doesn’t ignore cacheability

Cacheable hit rates are much lower than ideal (average is 20%)

Average cacheable hit rate increases from 20% to 41% with (perfect) cooperative caching

17

Scaling Cooperative CachingScaling Cooperative Caching

Organizations of this size can benefit significantly from cooperative caching

We don’t need cooperative caching to handle the entire UW population size A single proxy (or small cluster) can handle this entire population! No technical reason to use cooperative caching for this

environment In the real world, decisions of proxy placement are often political or

geographical

How effective is cooperative caching at scales where a single cache will not work?

18

Hit Rate vs. Client PopulationHit Rate vs. Client Population

Curves similar to other studies [e.g., Duska97, Breslau98]

Small organizations Significant increase in hit rate

as client population increases The reason why cooperative

caching is effective for UW Large organizations

Marginal increase in hit rate as client population increases

19

Extrapolation to Larger Client Populations

Extrapolation to Larger Client Populations

Use least squares fit to create a linear extrapolation of hit rates

Hit rate increases logarithmically with client population, e.g., to increase hit rate by 10%:

Need 8 UWs (ideal) Need 11 UWs (cacheable)

“Low ceiling”, though: 100% at 11.3M clients (UW ideal) 61% at 2.1M clients (UW cacheable)

A city-wide cooperative cache would get all the benefit

20

QuestionQuestion

What is the benefit of cooperative caching among large organizations?

21

UW & Microsoft CooperationUW & Microsoft Cooperation

What if we ran a wire across Lake Washington, to connect UW & Microsoft?

We collected a Microsoft proxy trace during same time period as the UW trace Combined population is ~80K clients Increases the UW population by a factor of 3.6 Increases the MS population by a factor of 1.4

22

UW & Microsoft TracesUW & Microsoft Traces

Trace UW MS

Duration 7 days 6.25 days

HTTP objects 18.4 million 15.3 million

HTTP requests 82.8 million 107.7 million

Avg. requests/sec 137 199

Total Bytes 677 GB N/A

Servers 244,211 360,586

Clients 22,984 60,233

Population ~50,000 ~40,000

23

UW & MS Cooperative CachingUW & MS Cooperative Caching

Is this worth it?

24

What about Latency?What about Latency?

From the client’s perspective, latency matters far more than hitrate

How does latency change with population?

Median latencies improve only a few 100 ms with ideal caching compared to no caching.

On average, a web page consists of 4.5 HTTP objects

25

ConclusionsConclusions

A negative result: without significant workload changes, designing highly-scalable cooperative proxy-cache schemes is unnecessary Largest benefit is achieved with small populations (up to 2K-5K clients) Limited benefit of cooperation when we combined the UW & Microsoft

populations Document cacheability is a severe limitation with current workloads

Analytic model results: Confirm that most of benefit is obtained once you reach populations the

size of a large city Future workloads: large-scale cooperative caching could become more

relevant with different rate-of-change characteristics

26

UW Cooperative Caching ResultsUW Cooperative Caching Results

27

Extrapolating UW & MS Hit RatesExtrapolating UW & MS Hit Rates

28

UW LatencyUW Latency

29

Complete Hit Rate GraphComplete Hit Rate Graph

30

Blank SlideBlank Slide

blankness here...

Documents

On the Scale and Performance of Cooperative Web Proxy Caching