30
1 Caching in HTTP Representation and Management of Data on the Internet

Caching in HTTP

  • Upload
    michel

  • View
    39

  • Download
    0

Embed Size (px)

DESCRIPTION

Caching in HTTP. Representation and Management of Data on the Internet. Reasons for Using Web Caches. Reduce Latency Since the cache is closer to the client, it takes less time for the client to get the object and display it Save bandwidth - PowerPoint PPT Presentation

Citation preview

Page 1: Caching in HTTP

1

Caching in HTTP

Representation and Management of Data on the

Internet

Page 2: Caching in HTTP

2

Reasons for UsingWeb Caches

• Reduce Latency– Since the cache is closer to the client,

it takes less time for the client to get the object and display it

• Save bandwidth– Since each object is only gotten from

the server once, it reduces the amount of bandwidth used by a client

Page 3: Caching in HTTP

3

Type of Web Caches• Browser Caches

– A portion of the hard disk is used to store objects that have already been displayed

– If an objected is requested again (for example, by hitting the “back” button), the request is served from the browser cache

• Proxy Caches – These are shared caches – they serve

many users

Page 4: Caching in HTTP

4

For example, how much traffic is saved ifit is not required to send the Google icon

with each search result?

Page 5: Caching in HTTP

5

Caching Improves Performance in Two Ways

• In some cases, caching eliminates the need to send requests by using an expiration mechanism

• In other cases, caching eliminates the need to send full responses by using a validation mechanism

Page 6: Caching in HTTP

6

An Example of Using a Validation Mechanism

client

server

cache

•Client: GET /fruit/apple.gif•Server responds withLast-Modified-Date: ...

•Server returns either 304 Not Modified or object

•Client sendsGET /fruit/apple.gif …If-Modified-Since: …

•Client caches object and last-modified-date

Page 7: Caching in HTTP

7

Proxy Caches

client

client

client

server

server

proxyserver

GET /fruit/apple.gif

GET /fruit/apple.gif

GET /fruit/apple.gif

Page 8: Caching in HTTP

8

Benefit of Caching

client

client

client

10Mbps LAN

R R1.5Mbps

server

server15 req/sec100Kbits/req

proxyserver50% hit rate is possible, because

the cache is shared by many users and, therefore, there is alarge number of shared hits

Internet

Page 9: Caching in HTTP

9

Points to Consider When Designing a Web Site

• Caches can help the Web site to load faster

• Caches may “hide” the users of the Web site, making it difficult to see who is using the site

• Caches may serve content that is out of date, or stale

Page 10: Caching in HTTP

10

The Risk in Caching• Response might not be

“semantically transparent”– the response is different from what

would have been returned by the origin server

• The cache should verify that the copy is fresh

• The copy is stale if it is not fresh

Page 11: Caching in HTTP

11

Cases Where Objects Are Not Cached

• In the following cases, objects are not cached:– The object’s headers tell the cache

not to keep the object– The object has no validator (i.e., an

Expires value, a Last-Modified value or an Etag)

– The object is authenticated or secured

Page 12: Caching in HTTP

12

Fresh Objects are Served from the Cache

• An object is fresh in the following cases:– The object has an expiry time or other age-

controlling directive, and is still within the fresh period

– The browser cache has already seen the object, and has been set to check for newer versions once a session

– A proxy cache has received the object recently, and the object was modified relatively long ago (this is a heuristic – see later)

Page 13: Caching in HTTP

13

Validating an Object• If the object is stale (i.e., not fresh), the

cache will ask the origin server to validate the object

• In response, the origin server will either– tell the cache that the object has not

changed, or– send a new copy of the object to the cache

Page 14: Caching in HTTP

14

The Expires HTTP Header• A response may include an Expires header:Expires: Fri, 30 Oct 2002 14:19:41 GMT

• If an expiry time is not specified, the cache can heuristically estimate the expiry time

Page 15: Caching in HTTP

15

A Possible Heuristic• If the cache received the object 10

hours after it was last modified, then it can heuristically determine that the expiry time is 1 hour after it has received it

• In general, add 10% of the interval between the last-modification time (given by the Last-Modified header) and the time it was received

Page 16: Caching in HTTP

16

The Cache-Control Header(Introduced in HTTP 1.1)

• The following are possible values of the cache-control header in responses

• max-age=[seconds] – Specifies the maximum amount of time

that an object will be considered fresh (similar to the Expires header)

• s-maxage=[seconds] – Similar to max-age, except that it only

applies to proxy (shared) caches

Page 17: Caching in HTTP

17

More Possible Values of the Cache-Control Header

• public– Document is cacheable even if normal rules say

that it shouldn’t be (e.g., authenticated document)

• private– The document is for a single user and can only be

stored in private (non-shared) caches• no-store

– Document should never be cached and should not even be stored in a temporary location on disk (this value is intended to prevent inadvertent copies of sensitive information)

Page 18: Caching in HTTP

18

More Possible Values of the Cache-Control Header

• must-revalidate– Tell caches that they must obey any

freshness information provided with the object (HTTP allows caches to take liberties with the freshness of objects)

• proxy-revalidate – Similar to must-revalidate, except that

it only applies to proxy (shared) caches

Page 19: Caching in HTTP

19

No-Cache• Some values of the Cache-Control

header are meaningful in either responses or requests

• No-cache– In a response, it means not to cache the

object– In a request, it means to bring a copy

from the origin server (i.e., not to use a cache)

Page 20: Caching in HTTP

20

The Pragma Header• The Pragma: no-cache request header

is the same as no-cache in the Cash-Control request header

• Don’t use Pragma – its meaning is specified only for requests and it is used just for compatibility with HTTP 1.0

• A Safer approach is to set both the Pragma and the Cache-Control response headers with the value no-cache

Page 21: Caching in HTTP

21

Who AddsCache-Control Headers?

• The server– The configuration of the server

determines which cache-control headers are added to responses

– The author of the page can add headers by means of the .htaccess file (only in the Apache server)

• The Application that generates dynamic pages, e.g., servlets, ASP, PHP

Page 22: Caching in HTTP

22

Cache-Control in HTTP-EQUIV

• The author of the page can add a cache-control header by means of the Meta HTTP-EQUIV tag– <meta http-equiv=“cache-control”

content =“no cache”>• But usually only the browser

interprets this tag• Proxies along the way don’t read it

Page 23: Caching in HTTP

23

Validators• A validator is any mechanism that

may help in determining whether a copy is fresh or stale– A strong validator is, for example, a

counter that is incremented whenever the resource is changed

– A weak validator is, for example, a counter that is incremented only when a significant change is made

For example, if the only change in thesite is the number of visitors…

Page 24: Caching in HTTP

24

Last-Modified Header• The most common validator is the

time when the document was last changed, the last-modified time– It is given by the Last-Modified header– This header should be included in

every response– It is a weak validator if an object can

change more than once within a one-second interval

Page 25: Caching in HTTP

25

ETag (Entity Tag)• ETag is a validator generated by

the server (i.e., unique identifier)– It is part of the HTTP 1.1 specification

(not available in HTTP 1.0)• The preferred behavior for an HTTP

1.1 origin server is to send both a strong entity tag and a Last-Modified value

Page 26: Caching in HTTP

26

Conditional Requests• Some conditional headers are

– If-Modified-Since– If-Unmodified-Since– If-None-Match

• These headers are used to validate an object (i.e., check with the origin server whether the object has changed)

Page 27: Caching in HTTP

27

HTTP/1.1 304 Not Modified Date: Fri, 31 Dec 1999 23:59:59 GMT [blank line]

If-Modified-Since Header• The If-Modified-Since: header is used

with a GET request• If the requested resource has been

modified since the given date, the server returns the resource as it normally would (i.e., header is ignored)

• Otherwise, the server returns a 304 Not Modified response, including the Date: header, but with no message body

Page 28: Caching in HTTP

28

If-Unmodified-Since Header

• The If-Unmodified-Since: header can be used with any method

• If the requested resource has not been modified since the given date, the server returns the resource as it normally would

• Otherwise, the server returns a 412 Precondition Failed responseHTTP/1.1 412 Precondition Failed [blank line]

Page 29: Caching in HTTP

29

If-None-Match Header• If the ETag matches when an If-

None-Match header is specified, then the object is really the same and is not returned

Page 30: Caching in HTTP

30

Links• For specifications and additional

information:– http://www.w3.org/Protocols/– http://www.w3.org/Protocols/Specs.html– http://www.jmarshall.com/easy/http/– http://wdvl

.com/Internet/Protocols/HTTP/article.html– Caching Tutorial for Web Authors and

Webmasters (http://www.mnot.net/cache_docs/)