
Cost-Aware WWW Proxy Caching Algorithms, by Pei Cao (University of Wisconsin-Madison) and Sandy Irani (University of California-Irvine)


    The following paper was originally published in the Proceedings of the USENIX Symposium on Internet Technologies and Systems

    Monterey, California, December 1997

    For more information about USENIX Association contact:

    1. Phone: 510 528-8649
    2. FAX: 510 548-5738
    3. Email: office@usenix.org
    4. WWW URL: http://www.usenix.org/

    Cost-Aware WWW Proxy Caching Algorithms

    Pei Cao
    Department of Computer Science, University of Wisconsin-Madison
    cao@cs.wisc.edu

    Sandy Irani
    Information and Computer Science Department, University of California-Irvine
    irani@ics.uci.edu

    Abstract

    Web caches can not only reduce network traffic and downloading latency, but can also affect the distribution of web traffic over the network through cost-aware caching. This paper introduces GreedyDual-Size, which incorporates locality with cost and size concerns in a simple and non-parameterized fashion for high performance. Trace-driven simulations show that with the appropriate cost definition, GreedyDual-Size outperforms existing web cache replacement algorithms in many aspects, including hit ratios, latency reduction and network cost reduction. In addition, GreedyDual-Size can potentially improve the performance of main-memory caching of Web documents.

    1 Introduction

    As the World Wide Web has grown in popularity in recent years, the percentage of network traffic due to HTTP requests has steadily increased. Recent reports show that Web traffic constituted 40% of the network traffic in 1996, compared to only 19% in 1994. Since the majority of Web documents requested are static documents (i.e. home pages, audio and video files), caching at various network points provides a natural way to reduce web traffic. A common form of web caching is caching at HTTP proxies, which are intermediaries between browser processes and web servers on the Internet (for example, one can choose a proxy by setting the network preference in the Netscape Navigator¹).

    There are many benefits of proxy caching. It reduces network traffic, the average latency of fetching Web documents, and the load on busy Web servers. Since documents are stored at the proxy cache, many HTTP requests can be satisfied directly from the cache instead of generating traffic to and from the

    ¹Navigator is a trademark of Netscape Inc.

    Web server. Numerous studies [WASAF96] have shown that the hit ratio for Web proxy caches can exceed 50%. This means that if proxy caching is utilized extensively, network traffic can be reduced significantly.

    Key to the effectiveness of proxy caches is a document replacement algorithm that can yield a high hit ratio. Unfortunately, techniques developed for file caching and virtual memory page replacement do not necessarily transfer to Web caching.

    There are three primary differences between Web caching and conventional paging problems. First, web caching is variable-size caching: due to the restriction in HTTP protocols that support whole file transfers only, a cache hit only happens if the entire file is cached, and web documents vary dramatically in size depending on the information they carry (text, image, video, etc.). Second, web pages take different amounts of time to download, even if they are of the same size. A proxy that wishes to reduce the average latency of web accesses may want to adjust its replacement strategy based on the download latency. Third, access streams seen by the proxy cache are the union of web access streams from tens to thousands of users, instead of coming from a few programmed sources as in the case of virtual memory paging.

    Proxy caches are in a unique position to affect web traffic on the Internet. Since the replacement algorithm decides which documents are cached and which documents are replaced, it affects which future requests will be cache hits. Thus, if the institution employing the proxy must pay more on some network links than others, the replacement algorithm can favor expensive documents (i.e. those travelling through the expensive links) over cheap documents. If it is known that certain network paths are heavily congested, the caching algorithm can retain more documents which must travel on congested paths. The proxy cache can reduce its contribution to the network router load by preferentially caching documents that travel more hops. Web cache replacement algorithms can incorporate these considerations by associating an appropriate network cost with every document, and minimizing the total cost incurred over a particular access stream.

    Today, most proxy systems use some form of the Least-Recently-Used replacement algorithm. Though some proxy systems also consider the time-to-live fields of the documents and replace expired documents first, studies have found that time-to-live fields rarely correspond exactly to the actual lifetime of the document, and it is better to keep expired-but-recently-used documents in the cache and validate them by querying the server [LC97]. The advantage of LRU is its simplicity; the disadvantage is that it does not take into account file sizes or latency and might not give the best hit ratio.

    Many Web caching algorithms have been proposed to address the size and latency concerns. We are aware of at least nine algorithms, from the simple to the very elaborate, proposed and evaluated in separate papers, some of which give conflicting conclusions. This naturally leads to a state of confusion over which algorithm should be used. In addition, none of the existing algorithms address the network cost concerns.

    In this paper, we introduce a new algorithm, called GreedyDual-Size, which combines locality, size and latency/cost concerns effectively to achieve the best overall performance. GreedyDual-Size is a variation on a simple and elegant algorithm called GreedyDual [You91b], which handles uniform-size variable-cost cache replacement. Using trace-driven simulation, we show that GreedyDual-Size with appropriate cost definitions outperforms the various "champion" web caching algorithms in existing studies on a number of performance issues, including hit ratios, latency reduction, and network cost reduction.
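
    As a preview of the algorithm detailed later in the paper, the sketch below assumes the usual formulation of GreedyDual-Size: each cached document p carries a value H(p) = L + cost(p)/size(p), the document with the smallest H is evicted, and the "inflation" value L rises to the evicted H. The class and method names here are illustrative, not taken from the paper.

```python
class GreedyDualSizeCache:
    """Sketch of GreedyDual-Size: evict the document with the
    smallest H = L + cost/size; L inflates to the evicted H-value."""

    def __init__(self, capacity):
        self.capacity = capacity   # cache size in bytes
        self.used = 0
        self.L = 0.0               # inflation value
        self.docs = {}             # url -> (H, size, cost)

    def access(self, url, size, cost):
        """Return True on a hit, False on a miss."""
        if url in self.docs:       # hit: refresh H using the current L
            _, s, c = self.docs[url]
            self.docs[url] = (self.L + c / s, s, c)
            return True
        # miss: bring the document in, then evict until it fits
        self.docs[url] = (self.L + cost / size, size, cost)
        self.used += size
        while self.used > self.capacity:
            victim = min(self.docs, key=lambda u: self.docs[u][0])
            h, s, _ = self.docs.pop(victim)
            self.L = h             # inflate L to the evicted H-value
            self.used -= s
        return False
```

    Note that a recently touched document has a large H relative to the current L, so locality, cost, and size all influence the eviction order without any tunable parameter.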

    2 Existing Results

    The size and cost concerns make web caching a much more complicated problem than traditional caching. Below we first summarize the existing theoretical results, then take a look at a variety of web caching algorithms proposed so far.

    2.1 Existing Theoretical Results

    There are a number of results on optimal offline replacement algorithms and online competitive algorithms for simplified versions of the Web caching problem.

    The variable document sizes in web caching make it much more complicated to determine an optimal offline replacement algorithm. If one is given a sequence of requests to uniform-size blocks of memory, it is well known that the simple rule of evicting the block whose next request is farthest in the future yields optimal performance [Bel66]. In the variable-size case, no such offline algorithm is known. In fact, it is known that determining the optimal performance is NP-hard [Ho97], although there is an algorithm which can approximate the optimal to within a logarithmic factor [Ir97]. The approximation factor is logarithmic in the maximum number of bytes that can fit in the cache, which we will call k.
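
    The uniform-size offline rule cited above [Bel66] can be sketched as follows (an illustrative implementation, not code from the paper):

```python
def belady_victim(cache, future_requests):
    """Offline-optimal eviction for uniform-size blocks: pick the
    cached block whose next request lies farthest in the future."""
    def next_use(block):
        try:
            return future_requests.index(block)
        except ValueError:
            return float("inf")   # never requested again: ideal victim
    return max(cache, key=next_use)
```

    The rule requires knowledge of the entire future request stream, which is exactly why it serves only as an offline baseline; for variable-size documents no analogous simple rule is known.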

    For the cost consideration, several algorithms have been developed for the uniform-size variable-cost paging problem. GreedyDual [You91b] is actually a range of algorithms which includes a generalization of LRU and a generalization of FIFO. The name GreedyDual comes from the technique used to prove that this entire range of algorithms is optimal according to its competitive ratio. The competitive ratio is essentially the maximum, over all possible request sequences, of the ratio of the algorithm's cost to the optimal offline algorithm's cost. (For an introduction to competitive analysis, see [ST85].)
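
    Stated symbolically (this is the standard definition from competitive analysis, e.g. [ST85], not notation taken from this paper): an online algorithm A is c-competitive if there is a constant b such that, for every request sequence σ,

```latex
\mathrm{cost}_A(\sigma) \;\le\; c \cdot \mathrm{cost}_{\mathrm{OPT}}(\sigma) + b,
```

    and the competitive ratio of A is the smallest such c.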

    We have generalized the result in [You91b] to show that our algorithm GreedyDual-Size, which handles documents of differing sizes and differing costs (described in Section 4), also has an optimal competitive ratio. Interestingly, it is also known that LRU has an optimal competitive ratio when the page size can vary and the cost of fetching a document is the same for all documents or proportional to the size of a document [FKIP96].

    2.2 Existing Document Replacement Algorithms

    We describe nine cache replacement algorithms proposed in recent studies, which attempt to minimize various cost metrics, such as miss ratio, byte miss ratio, average latency, and total cost. Below we give a brief description of all of them. In describing the various algorithms, it is convenient to view each request for a document as being satisfied in the following way: the algorithm brings the newly requested document into the cache and then evicts documents until the capacity of the cache is no longer exceeded. Algorithms are then distinguished by how they choose which documents to evict. This view allows for the possibility that the requested document itself may be evicted upon its arrival into the cache, which means it replaces no other document in the cache.
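
    The request-handling model just described might be sketched as follows; the function and parameter names are illustrative, and the replacement policy is abstracted as a victim-selection callback:

```python
def handle_request(cache, used, capacity, doc, size, choose_victim):
    """Generic model from the text: admit the requested document,
    then evict documents (chosen by the policy) until the cache no
    longer exceeds its capacity.  The newly admitted document itself
    may be chosen as a victim, in which case it replaces nothing."""
    cache[doc] = size              # cache maps document -> size in bytes
    used += size
    while used > capacity:
        victim = choose_victim(cache)
        used -= cache.pop(victim)
    return cache, used
```

    Each policy below is then just a different `choose_victim`.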

    Least-Recently-Used (LRU) evicts the document which was requested least recently.


    Least-Frequently-Used (LFU) evicts the document which is accessed least frequently.

    Size [WASAF96] evicts the largest document.

    LRU-Threshold [ASAWF95] is the same as LRU, except documents larger than a certain threshold size are never cached.

    Log(Size)+LRU [ASAWF95] evicts the document that has the largest log(size) and is the least recently used among all documents with the same log(size).
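
    The policies listed so far differ only in the key used to pick a victim. A sketch follows; the per-document metadata fields (`last_access`, `count`, `size`) are illustrative, and base-2 logarithms with flooring are an assumption about how Log(Size)+LRU buckets sizes:

```python
import math

# Each policy selects a victim from a {url: metadata} map.

def lru_victim(docs):
    return min(docs, key=lambda u: docs[u]["last_access"])

def lfu_victim(docs):
    return min(docs, key=lambda u: docs[u]["count"])

def size_victim(docs):
    return max(docs, key=lambda u: docs[u]["size"])

def log_size_lru_victim(docs):
    # Largest floor(log2(size)) bucket first; ties within a bucket
    # broken in favor of the least recently used document.
    return max(docs, key=lambda u: (math.floor(math.log2(docs[u]["size"])),
                                    -docs[u]["last_access"]))
```

    Note how Log(Size)+LRU behaves like Size across buckets but like LRU within a bucket.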

    Hyper-G [WASAF96] is a refinement of LFU with last access time and size consider
