39
HTTP Caching & Cache- Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

Embed Size (px)

Citation preview

Page 1: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

HTTP Caching & Cache-Bustingfor Content Publishers

Michael J. Radwin

O’Reilly Open Source Convention

July 28, 2004

Page 2: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

2

Publishers must think about caching

• Publishers have a lot of web content– HTML, images, Flash, movies

• Speed is important part of user experience• Bandwidth is expensive

– Use what you need, but avoid unnecessary extra

• Personalization differentiates– Show timely data (stock quotes, news stories)– Get accurate advertising statistics– Protect sensitive info (e-mail, account balances)

Page 3: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

3

HTTP Review

Server

ClientInterne

t

Internet

(1) Client connects to www.example.com port 80

Server

Client

(2) Client sends HTTP GET request

Internet

Internet

Page 4: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

4

HTTP Review (cont’d)

Server

ClientInterne

t

Internet

(3) Client reads HTTP response from server

Server

ClientInterne

t

Internet

(4) Client and Server close connection

Page 5: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

5

HTTP Examplemradwin@machshav:~$ telnet www.example.com 80Trying 192.168.37.203...Connected to w6.example.com.Escape character is '^]'.GET /foo/index.html HTTP/1.1Host: www.example.com

HTTP/1.1 200 OKDate: Wed, 28 Jul 2004 23:36:12 GMTLast-Modified: Fri, 23 Jul 2004 01:52:37 GMTContent-Length: 3688Connection: closeContent-Type: text/html

<html><head><title>Hello World</title>...

Page 6: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

6

Browsers use private caches

Server

ClientInterne

t

Internet

GET /foo/index.html HTTP/1.1Host: www.example.com

HTTP/1.1 200 OKLast-Modified: Fri, 23 Jul 2004 01:52:37 GMTContent-Length: 3688Content-Type: text/html

Client stores copy of http://www.example.com/foo/index.html on its hard disk with timestamp.

Page 7: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

7

Revalidation (Conditional GET)

Server

ClientInterne

t

Internet

GET /foo/index.html HTTP/1.1Host: www.example.comIf-Modified-Since: Fri, 23 Jul 2004 01:52:37 GMT

HTTP/1.1 304 Not Modified

Page 8: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

8

Non-Caching Proxy

Server

ClientInternet

Internet

Proxy

GET /foo/index.html HTTP/1.1Host: www.example.com

HTTP/1.1 200 OKLast-Modified: Fri, 23 Jul ...Content-Length: 3688Content-Type: text/html

GET /foo/index.html HTTP/1.1Host: www.example.com

HTTP/1.1 200 OKLast-Modified: Fri, 23 Jul ...Content-Length: 3688Content-Type: text/html

Page 9: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

9

Proxy Cache Miss

Server

ClientInternet

Internet

Proxy

GET /foo/index.html HTTP/1.1Host: www.example.com

HTTP/1.1 200 OKLast-Modified: Fri, 23 Jul ...Content-Length: 3688Content-Type: text/html

GET /foo/index.html HTTP/1.1Host: www.example.com

HTTP/1.1 200 OKLast-Modified: Fri, 23 Jul ...Content-Length: 3688Content-Type: text/html

Page 10: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

10

Proxy Cache Hit

Server

ClientInternet

Internet

Proxy

GET /foo/index.html HTTP/1.1Host: www.example.com

HTTP/1.1 200 OKLast-Modified: Fri, 23 Jul ...Content-Length: 3688Content-Type: text/html

Page 11: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

11

Proxy Cache Revalidation Hit

Server

ClientInternet

Internet

Proxy

GET /foo/index.html HTTP/1.1Host: www.example.comIf-Modified-Since: Fri, 23 Jul ...

HTTP/1.1 304 Not Modified

GET /foo/index.html HTTP/1.1Host: www.example.com

HTTP/1.1 200 OKLast-Modified: Fri, 23 Jul ...Content-Length: 3688Content-Type: text/html

Page 12: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

12

Assumptions about content types

Rate of change once published Frequently Occasionally Rarely/Never

HTML CSS

JavaScript

Images

Flash

PDF

Dynamic Content Static Content

Personalized Same for everyone

Page 13: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

13

Top 5 techniques for publishers

1. Use “Cache-Control: private” for personalized content

2. Implement “Images Never Expire” policy

3. Use a cookie-free TLD for static content

4. Use Apache defaults for CSS & JavaScript

5. Use random strings in URL for accurate hit metering or very sensitive content

Page 14: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

14

1. Use “Cache-Control: private”for personalized content

Rate of change once published Frequently Occasionally Rarely/Never

HTML CSS

JavaScript

Images

Flash

PDF

Dynamic Content Static Content

Personalized Same for everyone

Page 15: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

15

Bad caching of personalized content

WebmailServer

Client 1Internet

Internet

Proxy

GET /msg3.html HTTP/1.1Host: webmail.example.comCookie: user=jane

Jane’s e-mail message

GET /msg3.html HTTP/1.1Host: webmail.example.comCookie: user=jane

Jane’s e-mail message

Page 16: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

16

Bad caching of personalized content

WebmailServer

Client 1

msg3.html

Internet

Internet

Proxy

GET /msg3.html HTTP/1.1Host: webmail.example.comCookie: user=jane

Jane’s e-mail message

Page 17: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

17

Bad caching of personalized content

WebmailServer

Client 2 msg3.html

Internet

Internet

Proxy

GET /msg3.html HTTP/1.1Host: webmail.example.comCookie: user=mary

Jane’s

e-mail

mess

age

Page 18: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

18

What’s cacheable?

• HTTP/1.1 allows caching anything by default– Unless explicit Cache-Control header

• In practice, most caches avoid anything with– Cache-Control/Pragma header

– Cookie/Set-Cookie headers

– WWW-Authenticate/Authorization header

– POST/PUT method

– 302/307 status code

Page 19: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

19

Cache-Control: private

• Shared caches bad for shared content– Mary shouldn’t be able to read Jane’s webmail

• Private caches perfectly OK– Speed up web browsing experience

• Avoid personalization leakage with single line in httpd.conf or .htaccessHeader set Cache-Control private

Page 20: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

20

2. “Images Never Expire” policy

Rate of change once published Frequently Occasionally Rarely/Never

HTML CSS

JavaScript

Images

Flash

PDF

Dynamic Content Static Content

Personalized Same for everyone

Page 21: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

21

The “Images Never Expire” Policy

• Encourage caching of icons & logos– Forever ≈ 10 years in Internet biz

• Must change URL when you change image– http://us.yimg.com/i/new.gif– http://us.yimg.com/i/new2.gif

• Tradeoff– More difficult for designers– Bandwidth savings, faster user experience

Page 22: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

22

Images Never Expire (mod_expires)

# Works with both HTTP/1.0 and HTTP/1.1ExpiresActive OnExpiresByType image/gif A315360000ExpiresByType image/jpeg A315360000ExpiresByType image/png A315360000

Page 23: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

23

Images Never Expire (mod_headers)

# Works with HTTP/1.1 only<FilesMatch "\.(gif|jpe?g|png)$"> Header set Cache-Control \ "max-age=315360000"

</FilesMatch># Works with both HTTP/1.0 and HTTP/1.1<FilesMatch "\.(gif|jpe?g|png)$"> Header set Expires \ "Mon, 28 Jul 2014 23:30:00 GMT"

</FilesMatch>

Page 24: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

24

mod_images_never_expire/* Enforce policy with module that runs at URI translation hook */static int translate_imgexpire(request_rec *r) { const char *ext; if ((ext = strrchr(r->uri, '.')) != NULL) { if (strcasecmp(ext, ".gif") == 0 || strcasecmp(ext, ".jpg") == 0 || strcasecmp(ext, ".png") == 0 || strcasecmp(ext, ".jpeg") == 0) { if (ap_table_get(r->headers_in, "If-Modified-Since") != NULL || ap_table_get(r->headers_in, "If-None-Match") != NULL) { /* Don't bother checking filesystem, just hand back a 304 */ return HTTP_NOT_MODIFIED; } } } return DECLINED;}

Page 25: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

25

3. Cookie-free TLD for static content

Rate of change once published Frequently Occasionally Rarely/Never

HTML CSS

JavaScript

Images

Flash

PDF

Dynamic Content Static Content

Personalized Same for everyone

Page 26: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

26

Cookie-free TLD for static content

• For maximum efficiency use two domains– www.example.com for HTML– static.example.net for images

• Many proxies won’t cache Cookie reqs– But: multimedia is never personalized– Cookies would ignored by server anyways

Page 27: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

27

Typical GET request w/CookiesGET /i/foo/bar/quux.gif HTTP/1.1Host: www.example.comUser-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040707

Firefox/0.8Accept: application/x-shockwave-flash,text/xml,application/xml,application/

xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1

Cookie: U=mt=vtC1tp2MhYv9RL5BlpxYRFN_P8DpMJoamllEcA--&ux=IIr.AB&un=42vnticvufc8v; brandflash=1; B=amfco1503sgp8&b=2; F=a=NC184LcsvfX96G.JR27qSjCHu7bII3s. tXa44psMLliFtVoJB_m5wecWY_.7&b=K1It; LYC=l_v=2&l_lv=7&l_l=h03m8d50c8bo &l_s=3yu2qxz5zvwquwwuzv22wrwr5t3w1zsr&l_lid=14rsb76&l_r=a8&l_um=1_0_1_0_0; GTSessionID835990899023=83599089902340645635; Y=v=1&n=6eecgejj7012f &l=h03m8d50c8bo/o&p=m012o33013000007&jb=16|47|&r=a8&lg=us&intl=us&np=1; PROMO=SOURCE=fp5; YGCV=d=; T=z=iTu.ABiZD/AB6dPWoqXibIcTzc0BjY3TzI3NTY0MzQ-&a=YAE&sk=DAAwRz5HlDUN2T&d=c2wBT0RBekFURXdPRFV3TWpFek5ETS0BYQFZQUUBb2sBWlcwLQF0aXABWUhaTVBBAXp6AWlUdS5BQmdXQQ--&af=QUFBQ0FDQURCOUFIQUJBQ0FEQUtBTE FNSDAmdHM9MTA5MDE4NDQxOCZwcz1lOG83MUVYcTYxOVouT2Ftc1ZFZUhBLS0-; LYS=l_fh=0&l_vo=myla; PA=p0=dg13DX4Ndgk-&p1=6L5qmg--&e=xMv.AB; YP.us=v=2&m=addr&d=1525+S+Robertson+Blvd%01Los+Angeles%01CA%0190035-4231%014480%0134.051590%01-118.384342%019%01a%0190035

Referer: http://www.example.com/foo/bar.php?abc=123&def=456Accept-Language: en-us,en;q=0.7,he;q=0.3Accept-Encoding: gzip,deflateAccept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7Keep-Alive: 300Connection: keep-alive

Page 28: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

28

Same request, no CookiesGET /i/foo/bar/quux.gif HTTP/1.1Host: static.example.netUser-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040707

Firefox/0.8Accept: application/x-shockwave-flash,text/xml,application/xml,application/

xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1

Referer: http://www.example.com/foo/bar.php?abc=123&def=456Accept-Language: en-us,en;q=0.7,he;q=0.3Accept-Encoding: gzip,deflateAccept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7Keep-Alive: 300Connection: keep-alive

• Added bonus: much smaller GET request– Dial-up MTU size 576 bytes, PPPoE 1492– 1450 bytes reduced to 550

Page 29: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

29

4. Apache defaults for static, occasionally-changing content

Rate of change once published Frequently Occasionally Rarely/Never

HTML CSS

JavaScript

Images

Flash

PDF

Dynamic Content Static Content

Personalized Same for everyone

Page 30: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

30

Revalidation works pretty well

• Revalidation default behavior for static content– Browser sends If-Modified-Since request– Server replies with short 304 Not Modified– No fancy Apache config needed

• Use if you can’t predict when content will change– Page designers can change immediately– No renaming necessary

• Cost: extra HTTP transaction for 304– Small with Keep-Alive, but large sites disable

Page 31: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

31

Techniques to encourage caching

• Send explicit Cache-Control or Expires

• Generate “static content” headers– Last-Modified, ETag– Content-Length

• Avoid “cgi-bin”, “.cgi” or “?” in URLs– Some proxies (e.g. Squid) won’t cache– Use PATH_INFO instead

Page 32: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

32

5. Random URL strings for accurate hit metering or very sensitive content

Rate of change once published Frequently Occasionally Rarely/Never

HTML CSS

JavaScript

Images

Flash

PDF

Dynamic Content Static Content

Personalized Same for everyone

Page 33: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

33

Accurate advertising statistics

• If you trust proxies– Send Cache-Control: must-revalidate– Count 304 Not Modified log entries as hits

• If you don’t– Ask client to fetch uncacheable image URL– Return 307 to highly cacheable image file– Count 307s as hits– Don’t bother to look at cacheable server log

Page 34: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

34

Hit-metering for advertisements (1)

<script type="text/javascript">var r = Math.random();var t = new Date();document.write("<img width='109' height='52'

src='http://ads.example.com/ad/foo/bar.gif?t=" + t.getTime() + ";r=" + r + "'>");

</script><noscript><img width="109" height="52" src=

"http://ads.example.com/ad/foo/bar.gif?js=0"></noscript>

Page 35: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

35

Hit-metering for advertisements (2)GET /ad/foo/bar.gif?t=1090538707;r=0.510772917234983 HTTP/1.1Host: ads.example.comUser-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US;

rv:1.7) Gecko/20040707 Firefox/0.8Referer: http://www.example.com/foo/bar.php?abc=123&def=456Cookie: uid=C50DF33E-E202-4206-B1F3-946AEDF9308B

HTTP/1.1 307 Temporary RedirectDate: Wed, 28 Jul 2004 23:45:06 GMTCache-Control: max-age=0,no-cache,no-storeExpires: Tue, 11 Oct 1977, 01:23:45 GMTPragma: no-cacheLocation: http://static.example.net/i/foo/bar.gifContent-Type: text/html

<a href="http://static.example.net/i/foo/bar.gif">Moved</a>

Page 36: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

36

Hit-metering for advertisements (3)GET /i/foo/bar.gif HTTP/1.1Host: static.example.netUser-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US;

rv:1.7) Gecko/20040707 Firefox/0.8Referer: http://www.example.com/foo/bar.php?abc=123&def=456

HTTP/1.1 200 OKDate: Wed, 28 Jul 2004 23:45:07 GMTLast-Modified: Mon, 05 Oct 1998 18:32:51 GMTETag: "69079e-ad91-40212cc8"Cache-Control: public,max-age=315360000Expires: Mon, 28 Jul 2014 23:45:07 GMTContent-Length: 6096Content-Type: image/gif

GIF89a...

Page 37: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

37

Turning proxies into private caches

• Use distinct tokens in URL– No two users use same token– Defeats shared proxy caches– Works well with private caches

• Doesn’t break the back button• May break visited-link highlighting

– e.g. JavaScript timestamps/random numbers– Every link is blue, no purple

Page 38: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

38

Breaking the Back button

• When users click browser Back button– Expect to go back one page instantly– Private cache enables this behavior

• Aggressive cache-busting breaks Back button– Server sends Pragma: no-cache or Expires in past– Browser must re-visit server to re-fetch page– Hitting network much slower than hitting disk

• Use very sparingly– Compromising user experience is A Bad Thing

Page 39: HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

39

Review: Top 5 techniques

1. Use “Cache-Control: private” for personalized content

2. Implement “Images Never Expire” policy

3. Use a cookie-free TLD for static content

4. Use Apache defaults for CSS & JavaScript

5. Use random strings in URL for accurate hit metering or very sensitive content