Web Caching Dr. Yingwu Zhu

What is Web Caching

• Introducing proxy servers at certain points in the network that cache Web documents for faster client access.

• Comparable to the cache memory in a computer system

Proxy Cache

[Figure: clients send requests (Req.) to a proxy, which forwards them to servers; replies travel back through the proxy to the clients.]

How?

• Clients send requests to the proxy.

• If the requested document is in its cache, the proxy serves the request from its cache.

• Otherwise, the proxy forwards the request to the server.

• The server replies to the request through the proxy, which keeps a copy of the requested document.
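The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a real proxy. It assumes an unbounded in-memory dictionary as the cache and a hypothetical `fetch_from_server` callable that stands in for the network request. It also ignores expiration, which later slides address.

```python
from typing import Callable, Dict


class ProxyCache:
    """Minimal forward-proxy cache: serve hits locally, forward misses."""

    def __init__(self, fetch_from_server: Callable[[str], bytes]) -> None:
        # Hypothetical stand-in for the real request to the origin server.
        self.fetch_from_server = fetch_from_server
        self.store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self.store:
            # Hit: serve the request from the proxy's own cache.
            self.hits += 1
            return self.store[url]
        # Miss: forward the request to the server...
        self.misses += 1
        body = self.fetch_from_server(url)
        # ...and keep a copy as the reply passes back through the proxy.
        self.store[url] = body
        return body
```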

Why Web Caching?

• HTTP traffic has grown rapidly to become the largest part of Internet traffic, causing more network congestion and server unavailability.

• The number of static Web pages almost doubles every year

• Some old data:
  – Number of unique pages: 800M < X < 2.2B
  – Number of unique web sites: 8,500,000
  – Static pages: 30-40%
  – Pages revisited: 80%
  – Expected hit rate: 24-32%

Why Web Caching?

• Bandwidth

• Latency

• Performance = Response Time

• Server Load

• Failure Redundancy

Expected Gains

• Bandwidth savings

• Improving content availability

• Improving web server availability

• Server load balancing

• Reducing user-perceived latency

What: Content and Protocols

• HTTP/1.0 basic protocol
  – Send a request using one of a fixed set of methods (verbs): GET, HEAD, POST
  – Receive the response: metadata and content

What: Content and Protocols

• HTTP request grammar (HTTP/1.0, RFC 1945):

    Request        = Simple-Request | Full-Request

    Simple-Request = "GET" SP Request-URI CRLF

    Full-Request   = Request-Line
                     *( General-Header
                      | Request-Header
                      | Entity-Header )
                     CRLF
                     [ Entity-Body ]
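To make the grammar concrete, the sketch below serializes a Full-Request. The Request-Line layout (`Method SP Request-URI SP HTTP-Version CRLF`) is taken from RFC 1945. The function name `build_full_request` is hypothetical, and header values are assumed to be plain ASCII.

```python
from typing import Dict, Optional

CRLF = "\r\n"


def build_full_request(method: str, uri: str, headers: Dict[str, str],
                       body: Optional[bytes] = None) -> bytes:
    """Serialize an HTTP/1.0 Full-Request: Request-Line, headers, CRLF, body."""
    lines = [f"{method} {uri} HTTP/1.0"]                               # Request-Line
    lines += [f"{name}: {value}" for name, value in headers.items()]   # *( ...-Header )
    head = CRLF.join(lines) + CRLF + CRLF                              # end last line, then the bare CRLF
    return head.encode("ascii") + (body or b"")                        # [ Entity-Body ]
```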

What: Content and Protocols

• Example: GET /pub/www/index.html HTTP/1.0

• Response:

    HTTP/1.1 200 OK
    Server: Microsoft-IIS/5.0
    Date: Sat, 19 Oct 2002 05:46:53 GMT
    Expires: Sun, 20 Oct 2002 16:00:00 GMT
    Content-Length: 2291
    Content-Type: text/html
    Cache-control: private

What: Content and Protocols

• Example “if-modified-since”:

    GET /pub/www/index.html HTTP/1.0
    If-Modified-Since: Sat, 19 Oct 2002 19:43:31 GMT

• Response:

    HTTP/1.1 200 OK
    Server: Microsoft-IIS/5.0
    Date: Thu, 13 Jul 2000 05:46:53 GMT
    Expires: Sun, 20 Oct 2002 16:00:00 GMT
    Content-Length: 2291
    Content-Type: text/html
    Cache-control: private

What: Content and Protocols

• Example “if-modified-since”:

    GET /pub/www/index.html HTTP/1.0
    If-Modified-Since: Sat, 19 Oct 2002 19:43:31 GMT

• Response:

    HTTP/1.1 304 Not Modified
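On the server side, the choice between `200 OK` and `304 Not Modified` comes down to one date comparison. The sketch below assumes the server knows each document's last-modification time as a timezone-aware `datetime`. `handle_conditional_get` is a hypothetical name, and Python's `email.utils.parsedate_to_datetime` does the date parsing.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, Tuple


def handle_conditional_get(last_modified: datetime,
                           request_headers: Dict[str, str],
                           body: bytes) -> Tuple[int, bytes]:
    """Return (status, body): 304 with no body if unchanged since IMS, else 200.

    `last_modified` must be timezone-aware and truncated to whole seconds,
    because HTTP dates carry only one-second resolution.
    """
    ims_value = next((v for k, v in request_headers.items()
                      if k.lower() == "if-modified-since"), None)
    if ims_value is not None:
        try:
            since = parsedate_to_datetime(ims_value)
        except (TypeError, ValueError):
            since = None                      # malformed date: ignore the condition
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return 304, b""               # the client's copy is still valid
    return 200, body                          # send the full document
```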

HTTP support for caching

• Conditional requests (If-Modified-Since, IMS)

• Servers can set Expires and max-age

• Request indirection: application-level routing

• Range requests, entity tags

• Cache-Control header
  – Requests: min-fresh, max-stale, no-transform
  – Responses: must-revalidate, public, private, no-cache
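A cache has to turn these headers into a freshness lifetime. The sketch below uses a simplified precedence: `no-cache` first, then `max-age`, then `Expires` minus `Date`. This mirrors how HTTP/1.1 gives `max-age` priority over `Expires`. The function names are hypothetical, and the Cache-Control parser does not handle quoted arguments that contain commas.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):            # simplification: ignores quoted commas
        name, sep, arg = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def _http_date(value: str) -> datetime:
    """Parse an HTTP date; treat zone-less results as UTC."""
    dt = parsedate_to_datetime(value)
    return dt if dt.tzinfo is not None else dt.replace(tzinfo=timezone.utc)


def freshness_lifetime(headers: Dict[str, str]) -> Optional[float]:
    """Seconds a cached response stays fresh, or None if no explicit lifetime."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    cc = parse_cache_control(h.get("cache-control", ""))
    if "no-cache" in cc:
        return 0.0                                  # must revalidate before every reuse
    max_age = cc.get("max-age")
    if max_age is not None:
        try:
            return float(int(max_age))
        except ValueError:
            pass                                    # malformed max-age: fall back to Expires
    if "expires" in h and "date" in h:
        try:
            delta = _http_date(h["expires"]) - _http_date(h["date"])
        except (TypeError, ValueError):
            return 0.0                              # invalid dates: treat as already expired
        return max(0.0, delta.total_seconds())
    return None
```

Applied to the slide's example response, the lifetime is the gap between its Date and Expires headers. That response also carries `Cache-control: private`, which tells a shared proxy not to store it at all. That storability check is separate from the lifetime computed here.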

Where

[Figure: where caches can sit between browsers and content servers. Labeled elements: browser caches, an intranet, a local ISP cache, ISP and CDN caches, an L4 switch, reverse proxies in a data center, and content servers.]

Cache Types

• Proxy Caching

• Reverse Proxy Caching

• Transparent Caching

• Adaptive Caching

• Push Caching

• Active Caching

Proxy Caching

• Examples: Harvest/Squid

• Provides web content for a fixed user base

• Deployed at the network edges (company or institutional gateway or firewall hosts)

• Standalone operation

• Manual configuration in web browsers

• Commodity product/technology

• Single point of failure

Reverse Proxy Caching

• Designed to offload work from one or more specific servers

• Data size is limited to the size of the static content on the server

• The challenge is fast, diskless operation

• Cache consistency is easy to maintain

Transparent Caching

• Intercepts HTTP requests and redirects them to web cache servers or cache clusters

• No client configuration

• Violates the end-to-end paradigm
  – The client thinks it is talking directly to the server
  – The server thinks it is talking to the cache

• Implemented with an L4 switch
  – A Layer 4 switch makes switching decisions based on the TCP or UDP port number, e.g., port 80 for HTTP
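The L4 decision rule can be modeled as a function of transport-layer fields only. This is a hypothetical, simplified model: `Packet`, `l4_next_hop` and the `INTERCEPT` set are illustrative names. A real switch rewrites and forwards frames rather than returning an address. The addresses in the test come from documentation ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    protocol: str  # "tcp" or "udp"
    dst_ip: str
    dst_port: int


# Assumption: intercept plain HTTP only (TCP port 80).
INTERCEPT = {("tcp", 80)}


def l4_next_hop(pkt: Packet, cache_ip: str) -> str:
    """Where a transparent L4 switch would send this packet.

    Only transport-layer fields are inspected; the URL is never looked at.
    """
    if (pkt.protocol, pkt.dst_port) in INTERCEPT:
        return cache_ip   # redirected: the client still believes it reached dst_ip
    return pkt.dst_ip     # all other traffic passes through unchanged
```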

Adaptive Caching

• ISP-level caching; global data placement optimization

• Multiple cooperating distributed caches

• Caches operate as a mesh formed according to content demand

• Cache Group Management Protocol
  – How meshes are formed
  – How individual caches join/leave the meshes

• Content Routing Protocol: sends each request to the appropriate cache within the meshes

• Uses distributed cache meshes to solve the hot spot problem

• Caches dynamically join and leave the groups based on content demand

• Administrative boundaries must be relaxed

Push Caching

• Keeps data close to the clients requesting it

• Sends the data out proactively

• Assumption: we are able to launch caches that may cross administrative boundaries

• Incurs cost (storage and transmission)

Active Caching

• Applies caching to dynamic documents

• 30% of client HTTP requests contain cookies

• The server provides the cache with the objects and any associated cache applets
  – An applet running inside the cache customizes dynamic pages on the fly

Cache Placement/Deployment

• Close to clients/content consumers
  – Proxy caching
  – Transparent proxy caching

• Close to servers/content providers
  – Improve access to logical sets of data
  – Delay-sensitive data: video, audio
  – Reverse proxy caching
  – Push caching

• Network choke points: strategic deployment
  – Adaptive caching
  – Problem with administrative control

Zipf’s Law vs. Web Access

• Zipf’s law

• Web access

• Caching?

Zipf’s Law

• Zipf’s law: the frequency $P_i$ of the event of rank $i$ is a power-law function of $i$:

$P_i = \Omega / i^{\alpha}$, where $\alpha \le 1$
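Now assume a finite collection of $N$ distinct documents ($N$ is introduced here and is not a figure from the slides). The constant $\Omega$ is then fixed by requiring the probabilities to sum to one. The share of requests captured by the $k$ most popular documents follows directly:

```latex
\sum_{i=1}^{N} P_i = 1
\;\Longrightarrow\;
\Omega = \left( \sum_{i=1}^{N} i^{-\alpha} \right)^{-1},
\qquad
F(k) = \sum_{i=1}^{k} P_i
     = \frac{\sum_{i=1}^{k} i^{-\alpha}}{\sum_{i=1}^{N} i^{-\alpha}} .
```

For $\alpha = 1$ the sums are the harmonic numbers $H_k$ and $H_N$. Then $F(k) = H_k / H_N \approx (\ln k + 0.577)/(\ln N + 0.577)$, so the captured share grows only logarithmically with the number of cached documents.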

Zipf’s Law

• Observed to hold for
  – Frequency of written words in English texts
  – Population of cities
  – Income of a company as a function of rank

Zipf’s Law vs. Web Access

• For a given server, page access by rank follows Zipf’s law

• Web requests from a fixed population of users follow Zipf’s law with 0.64 < α < 0.83

Observations

• The top 1% of all documents account for 20-35% of proxy requests

• The top 10% account for 45-55% of requests

• It takes 25-40% of all documents to account for 70% of requests

• It takes 70-80% of all documents to account for 90% of requests
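These observations can be compared with a pure Zipf model. The sketch computes the normalized probabilities $P_i$ for N documents and the share of requests that go to the most popular fraction of them. N and α are free parameters chosen here for illustration, not values from the traces behind the numbers above, so the model's shares depend on both choices. The function names are hypothetical.

```python
from typing import List


def zipf_probabilities(n_docs: int, alpha: float) -> List[float]:
    """P_i = Omega / i**alpha for ranks 1..n_docs, with Omega chosen so they sum to 1."""
    weights = [1.0 / i ** alpha for i in range(1, n_docs + 1)]
    omega = 1.0 / sum(weights)
    return [omega * w for w in weights]


def top_fraction_share(n_docs: int, alpha: float, top_fraction: float) -> float:
    """Fraction of all requests that go to the most popular `top_fraction` of documents."""
    probs = zipf_probabilities(n_docs, alpha)   # already ordered by rank
    k = max(1, int(n_docs * top_fraction))
    return sum(probs[:k])


if __name__ == "__main__":
    n, alpha = 100_000, 0.75   # illustrative parameters, not measured values
    for frac in (0.01, 0.10, 0.25, 0.40):
        print(f"top {frac:.0%} of documents -> "
              f"{top_fraction_share(n, alpha, frac):.1%} of requests")
```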

Zipf’s Law and Caching

Discussion

• How does this help in cache design?

Basic caching algorithm

Pages may be

• Fresh: up-to-date

• Expired: current date > expiration date

• Stale: “old”

Basic caching algorithm - #2

If (page is in the cache)
    If (page is expired or stale)
        Ask the server with an if-modified-since request
        If not modified: get from cache
        Else: get from server
    Else: get from cache
Else
    Get from server
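The same logic in Python, as a sketch with several assumptions. Freshness is a fixed `ttl` per object, and "expired or stale" is collapsed into "past its TTL". `origin` is a hypothetical callable that performs the if-modified-since request and returns `(status, body, last_modified)`.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Entry:
    body: bytes
    last_modified: float  # origin's modification time (seconds since epoch)
    expires_at: float     # when this cached copy stops being fresh


# Hypothetical origin interface: origin(url, if_modified_since) returns
# (status, body, last_modified); status 304 means "not modified".
Origin = Callable[[str, Optional[float]], Tuple[int, bytes, float]]


def get_page(url: str, cache: Dict[str, Entry], origin: Origin,
             ttl: float, now: Optional[float] = None) -> bytes:
    now = time.time() if now is None else now
    entry = cache.get(url)
    if entry is not None and now < entry.expires_at:
        return entry.body                               # in cache and fresh: get from cache
    # Expired or not cached: ask the server, conditionally if we hold a copy.
    ims = entry.last_modified if entry is not None else None
    status, body, last_modified = origin(url, ims)
    if status == 304 and entry is not None:
        entry.expires_at = now + ttl                    # not modified: get from cache, renew TTL
        return entry.body
    cache[url] = Entry(body, last_modified, now + ttl)  # modified or new: get from server
    return body
```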

Basic caching algorithm - #3

If (cache has space)
    Store the file
Else
    1. Delete expired pages from cache
    2. Delete stale pages from cache
    3. Delete LRU pages from cache
    4. Delete largest/smallest pages from cache?
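A sketch of this eviction order follows. It assumes each cached page records its size, expiration time and last access time, plus a hypothetical `stale` flag. It stops as soon as enough room has been freed. Step 4 is left out because the slide leaves the largest-versus-smallest choice open.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CachedPage:
    size: int
    expires_at: float
    last_access: float
    stale: bool = False  # hypothetical flag: origin is known to hold a newer version


def make_room(cache: Dict[str, CachedPage], capacity: int,
              needed: int, now: float) -> List[str]:
    """Evict pages until `needed` more bytes fit; return evicted URLs in order."""
    used = sum(page.size for page in cache.values())
    evicted: List[str] = []

    def evict(url: str) -> None:
        nonlocal used
        used -= cache.pop(url).size
        evicted.append(url)

    rules = (
        lambda page: now > page.expires_at,  # 1. expired: current date > expiration date
        lambda page: page.stale,             # 2. stale
    )
    for is_victim in rules:
        for url in [u for u, page in cache.items() if is_victim(page)]:
            if used + needed <= capacity:
                return evicted
            evict(url)
    # 3. least recently used first
    for url in sorted(cache, key=lambda u: cache[u].last_access):
        if used + needed <= capacity:
            break
        evict(url)
    return evicted
```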

Cache Replacement

• Cache size is limited, so a replacement policy is needed

• LRU

• LFU

• Greedy-Dual-Size

• Many others
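Greedy-Dual-Size is the least obvious of these policies, so here is a compact sketch following Cao and Irani's GreedyDual-Size. Each object gets a priority H = L + cost/size. The object with the smallest H is evicted, and the inflation value L rises to that H, so objects that are not re-referenced gradually lose out to newer ones. With the default cost of 1, small objects are favored. The class and parameter names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


class GreedyDualSizeCache:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.used = 0
        self.L = 0.0                      # inflation value, raised on each eviction
        self.H: Dict[str, float] = {}     # current priority of each cached object
        self.size: Dict[str, int] = {}
        self.heap: List[Tuple[float, str]] = []  # lazy heap; outdated entries are skipped

    def access(self, key: str, size: int, cost: float = 1.0) -> bool:
        """Reference `key`; return True on a hit, insert it on a miss."""
        if key in self.H:
            self.H[key] = self.L + cost / self.size[key]   # hit: restore its priority
            heapq.heappush(self.heap, (self.H[key], key))
            return True
        if size > self.capacity:
            return False                                   # too large to ever cache
        while self.used + size > self.capacity:
            h, victim = heapq.heappop(self.heap)
            if self.H.get(victim) != h:
                continue                                   # stale heap entry
            self.L = h                                     # inflate to the evicted priority
            del self.H[victim]
            self.used -= self.size.pop(victim)
        self.H[key] = self.L + cost / size
        self.size[key] = size
        self.used += size
        heapq.heappush(self.heap, (self.H[key], key))
        return False
```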

Cache Consistency

• Caching creates multiple copies of objects
  – How and when should the copies be renewed?

• Goals
  – Avoid stale copies
  – Keep useless traffic as low as possible

Cache Consistency: Polling

Solution 1: poll on every access

Implemented in HTTP using the optional "If-Modified-Since" request header field

Benefit: strong consistency

Drawback: very slow cache hits, since every hit requires a round trip to the server

Cache Consistency: Polling

Solution 2: poll only when the TTL expires (widely used)
  – Associate a TTL (e.g., 12 hours or 2 days) with each cached object

Implemented in HTTP using the optional "Expires" header field

Benefit: fast cache hits

Drawback: weak cache consistency (5% stale), because the TTL is only an a priori estimate of an object's lifetime

Cache Consistency

• Solution 3: invalidation protocols

• The server helps the proxy maintain consistency

• Invalidation protocols
  – When the proxy makes a request:
    • Piggyback cache validation (PCV): the proxy attaches a list of other potentially stale cached copies for the server to validate
    • Piggyback cache invalidation (PCI): the server attaches a list of objects that have been updated since the proxy's last access
  – Use of volumes

• Volume lease:
  – The client receives a lease from the server
  – While the lease is valid, the client can retrieve copies from the proxy
  – When the lease expires, the client has to renew it

• Problems: scalability; servers need to keep cache state
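Piggyback cache invalidation can be sketched as follows. The server keeps a modification log, which is exactly the server-side state that limits scalability. The proxy remembers when it last contacted the server. The invalidation list rides on whatever reply the proxy is already receiving. All class and method names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


class OriginServer:
    """Origin that logs modifications so it can piggyback invalidations."""

    def __init__(self) -> None:
        self.log: List[Tuple[float, str]] = []   # (modification time, URL); grows with every change

    def modify(self, url: str, when: float) -> None:
        self.log.append((when, url))

    def modified_since(self, since: float) -> Set[str]:
        # URLs changed strictly after the proxy's previous contact.
        return {url for when, url in self.log if when > since}


class PciProxy:
    """Proxy that drops cached copies the server reports as updated."""

    def __init__(self, server: OriginServer) -> None:
        self.server = server
        self.cache: Dict[str, bytes] = {}
        self.last_contact = float("-inf")

    def on_server_contact(self, now: float) -> Set[str]:
        """Call whenever the proxy sends any request to the server.

        The invalidation list is attached to that reply, so no extra
        messages are exchanged.
        """
        invalid = self.server.modified_since(self.last_contact)
        for url in invalid:
            self.cache.pop(url, None)
        self.last_contact = now
        return invalid
```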

Cache Cooperation

• Hierarchical caching
  – Cache servers form a hierarchy, a tree-like structure
  – Parent servers: higher in the hierarchy; they receive requests from child servers. If they do not have the requested object, they ask either their own parents or the original web server
  – Sibling servers: if the local cache does not have the requested object, it asks its sibling caches. If the sibling caches do not have the object, the local cache asks the parent cache
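The lookup order just described (local cache, then siblings, then parent, then origin) can be sketched recursively. The sketch assumes that siblings answer only from their own stores and never fetch on another cache's behalf, and that every cache on the return path keeps a copy. `CacheNode` and the `path` trace are hypothetical names used to show which caches handled the request.

```python
from typing import Callable, Dict, List, Optional


class CacheNode:
    def __init__(self, name: str, parent: Optional["CacheNode"] = None) -> None:
        self.name = name
        self.parent = parent
        self.siblings: List["CacheNode"] = []
        self.store: Dict[str, bytes] = {}

    def lookup(self, url: str, origin: Callable[[str], bytes],
               path: List[str]) -> bytes:
        """Resolve `url`, recording in `path` each cache (or "origin") that served or forwarded it."""
        path.append(self.name)
        if url in self.store:
            return self.store[url]                  # local hit
        for sibling in self.siblings:
            if url in sibling.store:                # siblings answer only from their own store
                path.append(sibling.name)
                body = sibling.store[url]
                break
        else:
            if self.parent is not None:
                body = self.parent.lookup(url, origin, path)  # parent repeats the procedure
            else:
                path.append("origin")
                body = origin(url)                  # top of the hierarchy: go to the web server
        self.store[url] = body                      # keep a copy on the way back
        return body
```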

Cache Hierarchies

• Use a hierarchy to scale a proxy
  – Why?
    • Larger population = higher hit rate (fewer compulsory misses)
    • Larger effective cache size
  – Why is the population served by a single proxy limited?
    • Performance, administration, policy, etc.

• NLANR cache hierarchy
  – Most popular
  – 9 top-level caches
  – Based on the Internet Cache Protocol (ICP)
  – Squid/Harvest proxies

• How to locate content?

ICP (Internet cache protocol)

• Simple protocol to query another cache for content

• Uses UDP (why?)

• ICP message contents
  – Type: query, hit, hit_obj, miss
  – Other: identifier, URL, version, sender address
  – Special message types used with the UDP echo port
    • Used to probe a server or a “dumb cache”

• Query, then wait until a time-out (2 sec)

• Transfers between caches are still done using HTTP
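The query-then-wait pattern can be sketched with threads standing in for parallel UDP queries. This does not model the ICP wire format. `icp_query` and the peer callables are hypothetical, and the 2-second default mirrors the time-out above. The first peer to answer HIT wins. If every peer misses, fails or stays silent past the time-out, the caller falls back to HTTP from its parent or the origin.

```python
import concurrent.futures as cf
from typing import Callable, Dict, Optional

# Hypothetical peer interface: given a URL, a peer cache answers "HIT" or "MISS".
IcpPeer = Callable[[str], str]


def icp_query(url: str, peers: Dict[str, IcpPeer],
              timeout: float = 2.0) -> Optional[str]:
    """Query every peer at once; return the name of the first peer answering HIT.

    Returns None when all peers miss, fail, or do not reply before `timeout`.
    """
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(peers)))
    futures = {pool.submit(ask, url): name for name, ask in peers.items()}
    try:
        for fut in cf.as_completed(futures, timeout=timeout):
            # A peer that raised is treated like a lost UDP reply: a miss.
            if fut.exception() is None and fut.result() == "HIT":
                return futures[fut]
    except cf.TimeoutError:
        pass  # slow peers are ignored once the time-out expires
    finally:
        pool.shutdown(wait=False)
    return None
```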

Squid

[Figure, animated over six slides, showing a client, a parent cache, and three child caches. Frame labels in order: (1) web page request, ICP query, ICP query; (2) ICP MISS, ICP MISS; (3) web page request; (4) web page request, three ICP queries; (5) ICP MISS, ICP HIT, ICP HIT; (6) web page request.]

Hierarchical caching

• Ideally, want the cache mesh to behave as a single cache with equivalent capacity and processing capability

• ICP: many copies of popular objects are created, wasting capacity

• High latency: searching for an object may take more than one hop

• How to improve? Discuss!

Problems with caching

• Over 50% of all HTTP objects are uncacheable

• Sources:
  – Dynamic data: stock prices, frequently updated content
  – CGI scripts: results based on passed parameters
  – SSL: encrypted data is not cacheable
    • Most web clients don’t handle mixed pages well, so many generic objects are transferred with SSL
  – Cookies: results may be based on passed data
  – Hit metering: the owner wants to measure the number of hits (for revenue, etc.), so the owner resorts to cache busting

Risks of Using Proxy

• Benefits: reduced latency, bandwidth savings, etc.

• Risks
  – Obsolete data
  – Violation of client privacy: the proxy can keep a log file recording which objects the client has requested
  – Data integrity

Real Proxy Servers

• Squid: the most widely used; it works well and is free.
  http://www.squid-cache.org/

• Microsoft ISA Server 2004: Microsoft developed ISA to replace Microsoft Proxy Server. It is fully integrated with Active Directory.
  http://www.microsoft.com/isaserver/

• Apache: the Apache web server has a module for reverse caching (experimental).
  http://httpd.apache.org/docs-2.0/mod/mod_cache.html

• Cisco Cache Engine: sits next to (mostly) Cisco routers and receives transparently redirected HTTP requests.
  http://www.cisco.com/warp/public/cc/pd/cxsr/500/index.shtml

• CERN/W3C HTTPd: the original proxy server.
  http://www.w3.org/hypertext/WWW/Daemon/Status.html