Caching Tutorial for Web Authors and Webmasters.pdf


  • 7/30/2019 Caching Tutorial for Web Authors and Webmasters.pdf


    Caching Tutorial for Web Authors and Webmasters

    This is an informational document. Although technical in nature, it attempts to make the concepts involved understandable and applicable in real-world situations. Because of this, some aspects of the material are simplified or omitted, for the sake of clarity. If you are interested in the minutiae of the subject, please explore the References and Further Information at the end.

    1. What's a Web Cache? Why do people use them?
    2. Kinds of Web Caches
        1. Browser Caches
        2. Proxy Caches
    3. Aren't Web Caches bad for me? Why should I help them?
    4. How Web Caches Work
    5. How (and how not) to Control Caches
        1. HTML Meta Tags vs. HTTP Headers
        2. Pragma HTTP Headers (and why they don't work)
        3. Controlling Freshness with the Expires HTTP Header
        4. Cache-Control HTTP Headers
        5. Validators and Validation
    6. Tips for Building a Cache-Aware Site
    7. Writing Cache-Aware Scripts
    8. Frequently Asked Questions
    9. Implementation Notes: Web Servers
    10. Implementation Notes: Server-Side Scripting
    11. References and Further Information
    12. About This Document

    A Web cache sits between one or more Web servers (also known as origin servers) and a client or many clients, and watches requests come by, saving copies of the responses, like HTML pages, images and files (collectively known as representations), for itself. Then, if there is another request for the same URL, it can use the response that it has, instead of asking the origin server for it again.

    There are two main reasons that Web caches are used:

    - To reduce latency: Because the request is satisfied from the cache (which is closer to the client) instead of the origin server, it takes less time for it to get the representation and display it. This makes the Web seem more responsive.
    - To reduce network traffic: Because representations are reused, it reduces the amount of bandwidth used by a client. This saves money if the client is paying for traffic, and keeps their bandwidth requirements lower and more manageable.

    Caching Tutorial for Web Authors and Webmasters, http://www.mnot.net/cache_docs, 12/4/2012


    BROWSER CACHES

    If you examine the preferences dialog of any modern Web browser (like Internet Explorer, Safari or Mozilla), you'll probably notice a "cache" setting. This lets you set aside a section of your computer's hard disk to store representations that you've seen, just for you. The browser cache works according to fairly simple rules. It will check to make sure that the representations are fresh, usually once a session (that is, once in the current invocation of the browser).

    This cache is especially useful when users hit the "back" button or click a link to see a page they've just looked at. Also, if you use the same navigation images throughout your site, they'll be served from browsers' caches almost instantaneously.

    PROXY CACHES

    Web proxy caches work on the same principle, but on a much larger scale. Proxies serve hundreds or thousands of users in the same way; large corporations and ISPs often set them up on their firewalls, or as standalone devices (also known as intermediaries).

    Because proxy caches aren't part of the client or the origin server, but instead are out on the network, requests have to be routed to them somehow. One way to do this is to use your browser's proxy setting to manually tell it what proxy to use; another is using interception. Interception proxies have Web requests redirected to them by the underlying network itself, so that clients don't need to be configured for them, or even know about them.

    Proxy caches are a type of shared cache; rather than just having one person using them, they usually have a large number of users, and because of this they are very good at reducing latency and network traffic. That's because popular representations are reused a number of times.

    GATEWAY CACHES

    Also known as "reverse proxy caches" or "surrogate caches," gateway caches are also intermediaries, but instead of being deployed by network administrators to save bandwidth, they're typically deployed by Webmasters themselves, to make their sites more scalable, reliable and better performing.

    Requests can be routed to gateway caches by a number of methods, but typically some form of load balancer is used to make one or more of them look like the origin server to clients.

    Content delivery networks (CDNs) distribute gateway caches throughout the Internet (or a part of it) and sell caching to interested Web sites. Speedera and Akamai are examples of CDNs.

    This tutorial focuses mostly on browser and proxy caches, although some of the information is suitable for those interested in gateway caches as well.

    Web caching is one of the most misunderstood technologies on the Internet. Webmasters in particular fear losing control of their site, because a proxy cache can "hide" their users from them, making it difficult to see who's using the site.

    Unfortunately for them, even if Web caches didn't exist, there are too many variables on the Internet to assure that they'll be able to get an accurate picture of how users see their site. If this is a big concern for you, this tutorial will teach you how to get the statistics you need without making your site cache-unfriendly.

    Another concern is that caches can serve content that is out of date, or stale. However, this tutorial can show you how to configure your server to control how your content is cached.

    CDNs are an interesting development, because unlike many proxy caches, their gateway caches are aligned with the interests of the Web site being cached, so that these problems aren't seen. However, even when you use a CDN, you still have to consider that there will be proxy and browser caches downstream.

    On the other hand, if you plan your site well, caches can help your Web site load faster, and save load on your server and Internet link. The difference can be dramatic; a site that is difficult to cache may take several seconds to load, while one that takes advantage of caching can seem instantaneous in comparison. Users will appreciate a fast-loading site, and will visit more often.

    Think of it this way; many large Internet companies are spending millions of dollars setting up farms of servers around the world to replicate their content, in order to make it as fast to access as possible for their users. Caches do the same for you, and they're even closer to the end user. Best of all, you don't have to pay for them.

    The fact is that proxy and browser caches will be used whether you like it or not. If you don't configure your site to be cached correctly, it will be cached using whatever defaults the cache's administrator decides upon.

    All caches have a set of rules that they use to determine when to serve a representation from the cache, if it's available. Some of these rules are set in the protocols (HTTP 1.0 and 1.1), and some are set by the administrator of the cache (either the user of the browser cache, or the proxy administrator).

    Generally speaking, these are the most common rules that are followed (don't worry if you don't understand the details, they will be explained below):

    1. If the response's headers tell the cache not to keep it, it won't.
    2. If the request is authenticated or secure (i.e., HTTPS), it won't be cached.
    3. A cached representation is considered fresh (that is, able to be sent to a client without checking with the origin server) if:
        - It has an expiry time or other age-controlling header set, and is still within the fresh period, or
        - The cache has seen the representation recently, and it was modified relatively long ago.
       Fresh representations are served directly from the cache, without checking with the origin server.
    4. If a representation is stale, the origin server will be asked to validate it, or tell the cache whether the copy that it has is still good.
    5. Under certain circumstances (for example, when it's disconnected from a network) a cache can serve stale responses without checking with the origin server.

    If no validator (an ETag or Last-Modified header) is present on a response, and it doesn't have any explicit freshness information, it will usually (but not always) be considered uncacheable.
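
    The storage and freshness rules above can be sketched roughly in code. This is only an illustrative model: the CachedResponse record, its field names and the 10% heuristic factor are assumptions made for the example (real caches choose their own heuristics), and rule 2 (authenticated or secure requests) is left out to keep it short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedResponse:
    """Hypothetical record a cache might keep; field names are illustrative."""
    stored_at: float                           # epoch seconds when the copy was stored
    no_store: bool = False                     # response headers said "do not keep this"
    explicit_lifetime: Optional[float] = None  # seconds, from max-age or Expires
    last_modified: Optional[float] = None      # epoch seconds, from Last-Modified
    etag: Optional[str] = None


# Assumption: treat a copy as fresh for 10% of the time it had gone
# unmodified when it was stored. This fraction is not mandated anywhere.
HEURISTIC_FRACTION = 0.1


def is_storable(resp: CachedResponse) -> bool:
    """Rule 1, plus the closing note: keep a response only if it is allowed
    and carries a validator or explicit freshness information."""
    if resp.no_store:
        return False
    has_validator = resp.etag is not None or resp.last_modified is not None
    return has_validator or resp.explicit_lifetime is not None


def is_fresh(resp: CachedResponse, now: float) -> bool:
    """Rule 3: explicit freshness wins; otherwise fall back to the heuristic."""
    age = now - resp.stored_at
    if resp.explicit_lifetime is not None:
        return age < resp.explicit_lifetime
    if resp.last_modified is not None:
        # "Seen recently, and modified relatively long ago."
        return age < HEURISTIC_FRACTION * (resp.stored_at - resp.last_modified)
    return False
```

    A copy for which is_fresh() returns False is not thrown away; as rule 4 says, it becomes a candidate for validation with the origin server.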

    Together, freshness and validation are the most important ways that a cache works with content. A fresh representation will be available instantly from the cache, while a validated representation will avoid sending the entire representation over again if it hasn't changed.

    There are several tools that Web designers and Webmasters can use to fine-tune how caches will treat their sites. It may require getting your hands a little dirty with your server's configuration, but the results are worth it. For details on how to use these tools with your server, see the Implementation sections below.

    If your site is hosted at an ISP or hosting farm and they don't give you the ability to set arbitrary HTTP headers, complain loudly; these are tools necessary for doing your job.

    HTML META TAGS AND HTTP HEADERS

    HTML authors can put tags in a document's <HEAD> section that describe its attributes. These meta tags are often used in the belief that they can mark a document as uncacheable, or expire it at a certain time.

    Meta tags are easy to use, but aren't very effective. That's because they're only honored by a few browser caches, not proxy caches (which almost never read the HTML in the document). While it may be tempting to put a Pragma: no-cache meta tag into a Web page, it won't necessarily cause it to be kept fresh.

    On the other hand, true HTTP headers give you a lot of control over how both browser caches and proxies handle your representations. They can't be seen in the HTML, and are usually automatically generated by the Web server. However, you can control them to some degree, depending on the server you use. In the following sections, you'll see what HTTP headers are interesting, and how to apply them to your site.

    HTTP headers are sent by the server before the HTML, and only seen by the browser and any intermediate caches. Typical HTTP 1.1 response headers might look like this:

        HTTP/1.1 200 OK
        Date: Fri, 30 Oct 1998 13:19:41 GMT
        Server: Apache/1.3.3 (Unix)
        Cache-Control: max-age=3600, must-revalidate
        Expires: Fri, 30 Oct 1998 14:19:41 GMT
        Last-Modified: Mon, 29 Jun 1998 02:28:12 GMT
        ETag: "3e86-410-3596fbbc"
        Content-Length: 1040
        Content-Type: text/html

    The HTML would follow these headers, separated by a blank line. See the Implementation sections for information about how to set HTTP headers.
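
    To make the "headers, blank line, body" layout concrete, here is a small sketch that splits a raw response into its parts. The split_response helper is a name invented for this example, and it deliberately ignores details such as folded or repeated headers.

```python
from typing import Dict, Tuple

# The sample response from the text, with a placeholder body.
RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Fri, 30 Oct 1998 13:19:41 GMT\r\n"
    "Server: Apache/1.3.3 (Unix)\r\n"
    "Cache-Control: max-age=3600, must-revalidate\r\n"
    "Expires: Fri, 30 Oct 1998 14:19:41 GMT\r\n"
    "Last-Modified: Mon, 29 Jun 1998 02:28:12 GMT\r\n"
    'ETag: "3e86-410-3596fbbc"\r\n'
    "Content-Length: 1040\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html>...</html>"
)


def split_response(raw: str) -> Tuple[str, Dict[str, str], str]:
    """Return (status line, headers keyed by lower-cased name, body)."""
    head, _, body = raw.partition("\r\n\r\n")  # the blank line ends the headers
    lines = head.split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body


status, headers, body = split_response(RAW_RESPONSE)
```

    Header names are lower-cased here because HTTP treats them as case-insensitive; headers["cache-control"] then holds the caching instructions discussed in the following sections.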

    PRAGMA HTTP HEADERS (AND WHY THEY DON'T WORK)

    Many people believe that assigning a Pragma: no-cache HTTP header to a representation will make it uncacheable. This is not necessarily true; the HTTP specification does not set any guidelines for Pragma response headers; instead, Pragma request headers (the headers that a browser sends to a server) are discussed. Although a few caches may honor this header, the majority won't, and it won't have any effect. Use the headers below instead.

    CONTROLLING FRESHNESS WITH THE EXPIRES HTTP HEADER

    The Expires HTTP header is a basic means of controlling caches; it tells all caches how long the associated representation is fresh for. After that time, caches will always check back with the origin server to see if a document is changed. Expires headers are supported by practically every cache.

    Most Web servers allow you to set Expires response headers in a number of ways. Commonly, they will allow setting an absolute time to expire, a time based on the last time that the client retrieved the representation (last access time), or a time based on the last time the document changed on your server (last modification time).

    Expires headers are especially good for making static images (like navigation bars and buttons) cacheable. Because they don't change much, you can set an extremely long expiry time on them, making your site appear much more responsive to your users. They're also useful for controlling caching of a page that is regularly changed. For instance, if you update a news page once a day at 6am, you can set the representation to expire at that time, so caches will know when to get a fresh copy, without users having to hit "reload".

    It's important to make sure that your Web server's clock is accurate if you use this header. One way to do this is using the Network Time Protocol (NTP); talk to your local system administrator to find out more.

    The only value valid in an Expires header is an HTTP date; anything else will most likely be interpreted as "in the past", so that the representation is uncacheable. Also, remember that the time in an HTTP date is Greenwich Mean Time (GMT), not local time.

    For example:

        Expires: Fri, 30 Oct 1998 14:19:41 GMT
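
    Because the date must be in exactly this format and in GMT, it is usually safer to generate it than to type it. A minimal sketch using Python's standard library follows; the expires_header name and its parameters are invented for this example.

```python
import time
from email.utils import formatdate
from typing import Optional


def expires_header(seconds_from_now: int, now: Optional[float] = None) -> str:
    """Return an HTTP date (always GMT) that lies seconds_from_now in the future.

    `now` is an epoch timestamp; it defaults to the current time and is only
    a parameter so that the function is easy to test.
    """
    if now is None:
        now = time.time()
    # usegmt=True makes formatdate emit "GMT" instead of "-0000".
    return formatdate(now + seconds_from_now, usegmt=True)


# One hour after Fri, 30 Oct 1998 13:19:41 GMT (epoch 909753581):
print("Expires: " + expires_header(3600, now=909753581))
```

    Working from an epoch timestamp sidesteps the local-time mistake mentioned above, since the conversion to GMT happens inside formatdate.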

    Although the Expires header is useful, it has some limitations. First, because there's a date involved, the clocks on the Web server and the cache must be synchronised; if they have a different idea of the time, the intended results won't be achieved, and caches might wrongly consider stale content as fresh.

    Another problem with Expires is that it's easy to forget that you've set some content to expire at a particular time. If you don't update an Expires time before it passes, each and every request will go back to your Web server, increasing load and latency.

    CACHE-CONTROL HTTP HEADERS

    HTTP 1.1 introduced a new class of headers, Cache-Control response headers, to give Web publishers more control over their content, and to address the limitations of Expires.

    Useful Cache-Control response headers include:

    - max-age=[seconds]: specifies the maximum amount of time that a representation will be considered fresh. Like Expires, it sets a freshness limit, but it is relative to the time of the request, rather than absolute. [seconds] is the number of seconds from the time of the request you wish the representation to be fresh for.
    - s-maxage=[seconds]: similar to max-age, except that it only applies to shared (e.g., proxy) caches.
    - public: marks authenticated responses as cacheable; normally, if HTTP authentication is required, responses are automatically private.
    - private: allows caches that are specific to one user (e.g., in a browser) to store the response; shared caches (e.g., in a proxy) may not.
    - no-cache: forces caches to submit the request to the origin server for validation before releasing a cached copy, every time. This is useful to assure that authentication is respected (in combination with public), or to maintain rigid freshness, without sacrificing all of the benefits of caching.
    - no-store: instructs caches not to keep a copy of the representation under any conditions.
    - must-revalidate: tells caches that they must obey any freshness information you give them about a representation. HTTP allows caches to serve stale representations under special conditions; by specifying this header, you're telling the cache that you want it to strictly follow your rules.
    - proxy-revalidate: similar to must-revalidate, except that it only applies to proxy caches.

    For example:

        Cache-Control: max-age=3600, must-revalidate

    When both Cache-Control and Expires are present, Cache-Control takes precedence. If you plan to use the Cache-Control headers, you should have a look at the excellent documentation in HTTP 1.1; see References and Further Information.
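
    The precedence rule can be illustrated with a short sketch that works out how long a response is explicitly fresh for. The function names are invented for this example, headers are assumed to be passed with lower-cased names, and the directive parser is deliberately naive (it does not handle commas inside quoted values).

```python
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def freshness_lifetime(headers: Dict[str, str], shared: bool = False) -> Optional[float]:
    """Explicit freshness lifetime in seconds, or None if the response gives none.

    Cache-Control is consulted first (s-maxage only for shared caches);
    Expires is used only when no max-age directive is present.
    """
    cc = parse_cache_control(headers.get("cache-control", ""))
    if shared and cc.get("s-maxage") is not None:
        return float(cc["s-maxage"])
    if cc.get("max-age") is not None:
        return float(cc["max-age"])
    if "expires" in headers and "date" in headers:
        try:
            expires = parsedate_to_datetime(headers["expires"])
            date = parsedate_to_datetime(headers["date"])
            return max(0.0, (expires - date).total_seconds())
        except (TypeError, ValueError):
            return 0.0  # an unparseable date is treated as already expired
    return None
```

    With the sample headers shown earlier, both routes agree on 3600 seconds; if max-age said 60, that value would win over the Expires date.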

    VALIDATORS AND VALIDATION

    In How Web Caches Work, we said that validation is used by servers and caches to communicate when a representation has changed. By using it, caches avoid having to download the entire representation when they already have a copy locally, but they're not sure if it's still fresh.

    Validators are very important; if one isn't present, and there isn't any freshness information (Expires or Cache-Control) available, caches will not store a representation at all.

    The most common validator is the time that the document last changed, as communicated in the Last-Modified header. When a cache has a representation stored that includes a Last-Modified header, it can use it to ask the server if the representation has changed since the last time it was seen, with an If-Modified-Since request.

    HTTP 1.1 introduced a new kind of validator called the ETag. ETags are unique identifiers that are generated by the server and changed every time the representation does. Because the server controls how the ETag is generated, caches can be sure that if the ETag matches when they make an If-None-Match request, the representation really is the same.

    Almost all caches use Last-Modified times as validators; ETag validation is also becoming prevalent.

    Most modern Web servers will generate both ETag and Last-Modified headers to use as validators for static content (i.e., files) automatically; you won't have to do anything. However, they don't know enough about dynamic content (like CGI, ASP or database sites) to generate them; see Writing Cache-Aware Scripts.
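
    Seen from the cache's side, validation amounts to echoing the stored validators back to the server and reusing the stored copy on a 304 response. The sketch below is a simplified illustration; the function names and the dictionary of lower-cased stored headers are assumptions made for the example.

```python
from typing import Dict, Optional


def conditional_headers(stored: Dict[str, str]) -> Dict[str, str]:
    """Build validation request headers from a stored response's validators.

    `stored` maps lower-cased response header names to their values.
    """
    request = {}
    if "etag" in stored:
        request["If-None-Match"] = stored["etag"]
    if "last-modified" in stored:
        request["If-Modified-Since"] = stored["last-modified"]
    return request


def body_to_serve(status: int, stored_body: bytes, response_body: Optional[bytes]) -> bytes:
    """A 304 means the stored copy is still good; anything else replaces it."""
    if status == 304:
        return stored_body
    return response_body if response_body is not None else b""
```

    A 304 Not Modified response carries no body, which is where the bandwidth saving comes from.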

    Besides using freshness information and validation, there are a number of other things you can do to make your site more cache-friendly.

    - Use URLs consistently: this is the golden rule of caching. If you serve the same content on different pages, to different users, or from different sites, it should use the same URL. This is the easiest and most effective way to make your site cache-friendly. For example, if you use "/index.html" in your HTML as a reference once, always use it that way.
    - Use a common library of images and other elements and refer back to them from different places.
    - Make caches store images and pages that don't change often by using a Cache-Control: max-age header with a large value.
    - Make caches recognise regularly updated pages by specifying an appropriate max-age or expiration time.
    - If a resource (especially a downloadable file) changes, change its name. That way, you can make it expire far in the future, and still guarantee that the correct version is served; the page that links to it is the only one that will need a short expiry time.
    - Don't change files unnecessarily. If you do, everything will have a falsely young Last-Modified date. For instance, when updating your site, don't copy over the entire site; just move the files that you've changed.
    - Use cookies only where necessary: cookies are difficult to cache, and aren't needed in most situations. If you must use a cookie, limit its use to dynamic pages.
    - Minimize use of SSL: because encrypted pages are not stored by shared caches, use them only when you have to, and use images on SSL pages sparingly.
    - Check your pages with REDbot: it can help you apply many of the concepts in this tutorial.
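
    One common way to follow the "if a resource changes, change its name" tip is to derive part of the file name from the file's content. This is a technique swapped in for illustration rather than something the tutorial prescribes; the fingerprinted_name helper and the choice of an eight-character SHA-256 prefix are assumptions.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(path: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short content hash before the extension.

    The name changes whenever the content does, so the file itself can be
    served with a very long max-age.
    """
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


old_name = fingerprinted_name("images/logo.gif", b"first version of the image")
new_name = fingerprinted_name("images/logo.gif", b"second version of the image")
```

    Only the page that links to the file needs a short expiry time, so that it picks up the new name promptly.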


    By default, most scripts won't return a validator (a Last-Modified or ETag response header) or freshness information (Expires or Cache-Control). While some scripts really are dynamic (meaning that they return a different response for every request), many (like search engines and database-driven sites) can benefit from being cache-friendly.

    Generally speaking, if a script produces output that is reproducible with the same request at a later time (whether it be minutes or days later), it should be cacheable. If the content of the script changes only depending on what's in the URL, it is cacheable; if the output depends on a cookie, authentication information or other external criteria, it probably isn't.

    - The best way to make a script cache-friendly (as well as perform better) is to dump its content to a plain file whenever it changes. The Web server can then treat it like any other Web page, generating and using validators, which makes your life easier. Remember to only write files that have changed, so the Last-Modified times are preserved.
    - Another way to make a script cacheable in a limited fashion is to set an age-related header for as far in the future as practical. Although this can be done with Expires, it's probably easiest to do so with Cache-Control: max-age, which will make the request fresh for an amount of time after the request.
    - If you can't do that, you'll need to make the script generate a validator, and then respond to If-Modified-Since and/or If-None-Match requests. This can be done by parsing the HTTP headers, and then responding with 304 Not Modified when appropriate. Unfortunately, this is not a trivial task.
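
    To give a feel for the third option, here is a simplified sketch of the decision a script has to make before answering 304 Not Modified. The is_not_modified name and its arguments are invented for this example; it assumes request headers arrive with lower-cased names, and it skips several details of the HTTP specification.

```python
from email.utils import parsedate_to_datetime
from typing import Dict


def _opaque(tag: str) -> str:
    """Strip whitespace and a weak-validator prefix (W/) before comparing."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def is_not_modified(request_headers: Dict[str, str], etag: str, last_modified: str) -> bool:
    """Return True if the script may answer "304 Not Modified".

    etag is the current (quoted) ETag of the output; last_modified is its
    current Last-Modified value as an HTTP date string.
    """
    if_none_match = request_headers.get("if-none-match")
    if if_none_match is not None:
        # When present, If-None-Match is checked instead of If-Modified-Since.
        candidates = [_opaque(t) for t in if_none_match.split(",")]
        return "*" in candidates or _opaque(etag) in candidates
    if_modified_since = request_headers.get("if-modified-since")
    if if_modified_since is not None:
        try:
            return parsedate_to_datetime(last_modified) <= parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return False  # unparseable date: send the full response
    return False
```

    When this returns True, the script sends the 304 status with its validator headers and no body; otherwise it sends the full response as usual.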

    Some other tips:

    - Don't use POST unless it's appropriate. Responses to the POST method aren't kept by most caches; if you send information in the path or query (via GET), caches can store that information for the future.
    - Don't embed user-specific information in the URL unless the content generated is completely unique to that user.
    - Don't count on all requests from a user coming from the same host, because caches often work together.
    - Generate Content-Length response headers. It's easy to do, and it will allow the response of your script to be used in a persistent connection. This allows clients to request multiple representations on one TCP/IP connection, instead of setting up a connection for every request. It makes your site seem much faster.

    See the Implementation Notes for more specific information.

    WHAT ARE THE MOST IMPORTANT THINGS TO MAKE CACHEABLE?

    A good strategy is to identify the most popular, largest representations (especially images) and work with them first.

    HOW CAN I MAKE MY PAGES AS FAST AS POSSIBLE WITH CACHES?

    The most cacheable representation is one with a long freshness time set. Validation does help reduce the time that it takes to see a representation, but the cache still has to contact the origin server to see if it's fresh. If the cache already knows it's fresh, it will be served directly.

    I UNDERSTAND THAT CACHING IS GOOD, BUT I NEED TO KEEP STATISTICS ON HOW MANY PEOPLE VISIT MY PAGE!

    If you must know every time a page is accessed, select ONE small item on a page (or the page itself), and make it uncacheable, by giving it suitable headers. For example, you could refer to a 1x1 transparent uncacheable image from each page. The Referer header will contain information about what page called it.


    Be aware that even this will not give truly accurate statistics about your users, and is unfriendly to the Internet and your users; it generates unnecessary traffic, and forces people to wait for that uncached item to be downloaded. For more information about this, see On Interpreting Access Statistics in the references.

    HOW CAN I SEE A REPRESENTATION'S HTTP HEADERS?

    Many Web browsers let you see the Expires and Last-Modified headers in a "page info" or similar interface. If available, this will give you a menu of the page and any representations (like images) associated with it, along with their details.

    To see the full headers of a representation, you can manually connect to the Web server using a Telnet client.

    To do so, you may need to type the port (by default, 80) into a separate field, or you may need to connect to www.example.com:80 or www.example.com 80 (note the space). Consult your Telnet client's documentation.

    Once you've opened a connection to the site, type a request for the representation. For instance, if you want to see the headers for http://www.example.com/foo.html, connect to www.example.com, port 80, and type:

        GET /foo.html HTTP/1.1 [return]
        Host: www.example.com [return][return]

    Press the Return key every time you see [return]; make sure to press it twice at the end. This will print the headers, and then the full representation. To see the headers only, substitute HEAD for GET.
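
    The same exchange can be scripted instead of typed. The sketch below is one possible way to do it with Python's standard library; build_request and fetch_headers are names invented for this example, and www.example.com is the placeholder host used in the text.

```python
import http.client
from typing import List, Tuple


def build_request(method: str, host: str, path: str) -> bytes:
    """The bytes you would type into Telnet; each [return] becomes CRLF."""
    return f"{method} {path} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii")


def fetch_headers(host: str, path: str, port: int = 80) -> List[Tuple[str, str]]:
    """Send a HEAD request and return the response headers as (name, value) pairs."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().getheaders()
    finally:
        conn.close()


# Shows the text that would be sent; calling fetch_headers("www.example.com",
# "/foo.html") performs the real request and needs network access.
print(build_request("HEAD", "www.example.com", "/foo.html").decode("ascii"))
```

    Using HEAD rather than GET mirrors the last tip above: the server returns only the headers.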

    MY PAGES ARE PASSWORD-PROTECTED; HOW DO PROXY CACHES DEAL WITH THEM?

    By default, pages protected with HTTP authentication are considered private; they will not be kept by shared caches. However, you can make authenticated pages public with a Cache-Control: public header; HTTP 1.1-compliant caches will then allow them to be cached.

    If you'd like such pages to be cacheable, but still authenticated for every user, combine the Cache-Control: public and no-cache headers. This tells the cache that it must submit the new client's authentication information to the origin server before releasing the representation from the cache. This would look like:

        Cache-Control: public, no-cache

    Whether or not this is done, it's best to minimize use of authentication; for example, if your images are not sensitive, put them in a separate directory and configure your server not to force authentication for it. That way, those images will be naturally cacheable.

    SHOULD I WORRY ABOUT SECURITY IF PEOPLE ACCESS MY SITE THROUGH A CACHE?

    SSL pages are not cached (or decrypted) by proxy caches, so you don't have to worry about that. However, because caches store non-SSL requests and URLs fetched through them, you should be conscious about unsecured sites; an unscrupulous administrator could conceivably gather information about their users, especially in the URL.

    In fact, any administrator on the network between your server and your clients could gather this type of information. One particular problem is when CGI scripts put usernames and passwords in the URL itself; this makes it trivial for others to find and use their login.

    If you're aware of the issues surrounding Web security in general, you shouldn't have any surprises from proxy caches.


    I'M LOOKING FOR AN INTEGRATED WEB PUBLISHING SOLUTION. WHICH ONES ARE CACHE-AWARE?

    It varies. Generally speaking, the more complex a solution is, the more difficult it is to cache. The worst are ones which dynamically generate all content and don't provide validators; they may not be cacheable at all. Speak with your vendor's technical staff for more information, and see the Implementation notes below.

    MY IMAGES EXPIRE A MONTH FROM NOW, BUT I NEED TO CHANGE THEM IN THE CACHES NOW!

    The Expires header can't be circumvented; unless the cache (either browser or proxy) runs out of room and has to delete the representations, the cached copy will be used until then.

    The most effective solution is to change any links to them; that way, completely new representations will be loaded fresh from the origin server. Remember that any page that refers to these representations will be cached as well. Because of this, it's best to make static images and similar representations very cacheable, while keeping the HTML pages that refer to them on a tight leash.

    If you want to reload a representation from a specific cache, you can force a reload while using the cache (in Firefox, holding down shift while pressing "reload" will do this by issuing a Pragma: no-cache request header). Alternatively, you can have the cache administrator delete the representation through their interface.

    I RUN A WEB HOSTING SERVICE. HOW CAN I LET MY USERS PUBLISH CACHE-FRIENDLY PAGES?

    If you're using Apache, consider allowing them to use .htaccess files and providing appropriate documentation.

    Otherwise, you can establish predetermined areas for various caching attributes in each virtual server. For instance, you could specify a directory /cache1m that will be cached for one month after access, and a /nocache area that will be served with headers instructing caches not to store representations from it.

    Whatever you are able to do, it is best to work with your largest customers first on caching. Most of the savings (in bandwidth and in load on your servers) will be realized from high-volume sites.

    I'VE MARKED MY PAGES AS CACHEABLE, BUT MY BROWSER KEEPS REQUESTING THEM ON EVERY REQUEST. HOW DO I FORCE THE CACHE TO KEEP REPRESENTATIONS OF THEM?

    Caches aren't required to keep a representation and reuse it; they're only required to not keep or use them under some conditions. All caches make decisions about which representations to keep based upon their size, type (e.g., image vs. html), or by how much space they have left to keep local copies. Yours may not be considered worth keeping around, compared to more popular or larger representations.

    Some caches do allow their administrators to prioritize what kinds of representations are kept, and some allow representations to be "pinned" in cache, so that they're always available.

    Generally speaking, it's best to use the latest version of whatever Web server you've chosen to deploy. Not only will they likely contain more cache-friendly features, new versions also usually have important security and performance improvements.

    APACHE HTTP SERVER

    Apache uses optional modules to include headers, including both Expires and Cache-Control. Both modules are available in the 1.2 or greater distribution.

    The modules need to be built into Apache; although they are included in the distribution, they are not turned on by default. To find out if the modules are enabled in your server, find the httpd binary and run httpd -l; this should print a list of the available modules (note that this only lists compiled-in modules; on later versions of Apache, use httpd -M to include dynamically loaded modules as well). The modules we're looking for are mod_expires and mod_headers.

    If they aren't available, and you have administrative access, you can recompile Apache to include them. This can be done either by uncommenting the appropriate lines in the Configuration file, or using the --enable-module=expires and --enable-module=headers arguments to configure (1.3 or greater). Consult the INSTALL file found with the Apache distribution.

    Once you have an Apache with the appropriate modules, you can use mod_expires to specify when representations should expire, either in .htaccess files or in the server's access.conf file. You can specify expiry from either access or modification time, and apply it to a file type or as a default. See the module documentation for more information, and speak with your local Apache guru if you have trouble.

    To apply Cache-Control headers, you'll need to use the mod_headers module, which allows you to specify arbitrary HTTP headers for a resource. See the mod_headers documentation.

    Here's an example .htaccess file that demonstrates the use of some headers.

    .htaccess files allow web publishers to use commands normally only found in configuration files. They affect the content of the directory they're in and their subdirectories. Talk to your server administrator to find out if they're enabled.

        ### activate mod_expires
        ExpiresActive On
        ### Expire .gif's 1 month from when they're accessed
        ExpiresByType image/gif A2592000
        ### Expire everything else 1 day from when it's last modified
        ### (this uses the Alternative syntax)
        ExpiresDefault "modification plus 1 day"
        ### Apply a Cache-Control header to index.html
        <Files index.html>
        Header append Cache-Control "public, must-revalidate"
        </Files>

    Note that mod_expires automatically calculates and inserts a Cache-Control: max-age header as appropriate.

    Apache 2's configuration is very similar to that of 1.3; see the 2.2 mod_expires and mod_headers documentation for more information.
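
    The value A2592000 in the example above is mod_expires's compact syntax: a base letter (A for access time, M for modification time) followed by a number of seconds. The small sketch below decodes such values and checks the arithmetic behind "1 month"; parse_expires_code is a helper invented for this example, not part of Apache.

```python
from typing import Tuple

SECONDS_PER_DAY = 24 * 60 * 60


def parse_expires_code(code: str) -> Tuple[str, int]:
    """Decode an 'A<seconds>' or 'M<seconds>' value into (base, seconds)."""
    base = {"A": "access", "M": "modification"}.get(code[:1])
    if base is None or not code[1:].isdigit():
        raise ValueError(f"not an A<seconds>/M<seconds> value: {code!r}")
    return base, int(code[1:])


base, seconds = parse_expires_code("A2592000")
print(base, seconds // SECONDS_PER_DAY, "days")
```

    Here 2592000 seconds is 30 days of 86400 seconds each, which is the "1 month" the comment in the .htaccess file refers to.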

    MICROSOFT IIS

    Microsoft's Internet Information Server makes it very easy to set headers in a somewhat flexible way. Note that this is only possible in version 4 of the server, which will run only on NT Server.

    To specify headers for an area of a site, select it in the Administration Tools interface, and bring up its properties. After selecting the HTTP Headers tab, you should see two interesting areas; Enable Content Expiration and Custom HTTP headers. The first should be self-explanatory, and the second can be used to apply Cache-Control headers.

    See the ASP section below for information about setting headers in Active Server Pages. It is also possible to set headers from ISAPI modules; refer to MSDN for details.

    NETSCAPE/IPLANET ENTERPRISE SERVER

    As of version 3.6, Enterprise Server does not provide any obvious way to set Expires headers. However, it has supported HTTP 1.1 features since version 3.0. This means that HTTP 1.1 caches (proxy and browser) will be able to take advantage of Cache-Control settings you make.

    To use Cache-Control headers, choose Content Management | Cache Control Directives in the administration server. Then, using the Resource Picker, choose the directory where you want to set the headers. After setting the headers, click "OK". For more information, see the NES manual.

    Because the emphasis in server-side scripting is on dynamic content, it doesn't make for very cacheable pages, even when the content could be cached. If your content changes often, but not on every page hit, consider setting a Cache-Control: max-age header; most users access pages again in a relatively short period of time. For instance, when users hit the "back" button, if there isn't any validator or freshness information available, they'll have to wait until the page is re-downloaded from the server to see it.

    One thing to keep in mind is that it may be easier to set HTTP headers with your Web server rather than in the scripting language. Try both.

    CGI

    CGI scripts are one of the most popular ways to generate content. You can easily append HTTP response headers by adding them before you send the body; most CGI implementations already require you to do this for the Content-Type header. For instance, in Perl:

        #!/usr/bin/perl
        print "Content-type: text/html\n";
        print "Expires: Thu, 29 Oct 1998 17:04:19 GMT\n";
        print "\n";
        ### the content body follows...

    Since it's all text, you can easily generate Expires and other date-related headers with in-built functions. It's even easier if you use Cache-Control: max-age:

        print "Cache-Control: max-age=600\n";

    This will make the script cacheable for 10 minutes after the request, so that if the user hits the 'back' button, they won't be resubmitting the request.
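
    The same approach can be sketched in Python, which this tutorial also mentions as a CGI language. This is a minimal sketch rather than code from the original text: the helper name cache_headers is hypothetical, and the 600-second lifetime simply mirrors the Perl example. It relies on the standard library's email.utils.formatdate to produce an HTTP-style date for Expires.

```python
#!/usr/bin/env python3
"""Minimal CGI-style sketch: emit caching headers, a blank line, then the body."""
import sys
import time
from email.utils import formatdate


def cache_headers(max_age, now=None):
    """Return header lines that make a response fresh for max_age seconds.

    cache_headers is a hypothetical helper used only for illustration.
    """
    if now is None:
        now = time.time()
    return [
        "Content-Type: text/html",
        "Cache-Control: max-age=%d" % max_age,
        # usegmt=True produces the "GMT" suffix that HTTP dates use
        "Expires: " + formatdate(now + max_age, usegmt=True),
    ]


def main():
    out = sys.stdout
    for line in cache_headers(600):
        out.write(line + "\n")
    out.write("\n")  # a blank line ends the header block
    out.write("<html><body>Hello</body></html>\n")


if __name__ == "__main__":
    main()
```

    Sending both headers is a design choice in this sketch: HTTP 1.1 caches use max-age, while Expires covers older caches.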

    The CGI specification also makes request headers that the client sends available in the environment of the script; each header has 'HTTP_' prepended to its name. So, if a client makes an If-Modified-Since request, it will show up as HTTP_IF_MODIFIED_SINCE.
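
    As an illustration of reading that environment variable, here is a hedged Python sketch of answering a conditional request. The function names not_modified and respond are hypothetical, and the script is assumed to know its content's modification time as seconds since the epoch. The 'Status' header is the usual way for a CGI script to ask the server to send a particular status line.

```python
import os
from email.utils import formatdate, parsedate_to_datetime


def not_modified(last_modified_ts, environ=os.environ):
    """Return True if the client's cached copy is still current.

    last_modified_ts: modification time of the content, seconds since the epoch.
    """
    header = environ.get("HTTP_IF_MODIFIED_SINCE")
    if not header:
        return False
    try:
        since = parsedate_to_datetime(header).timestamp()
    except (TypeError, ValueError):
        return False  # unparseable date: treat the request as unconditional
    # HTTP dates have one-second resolution, so compare whole seconds
    return int(last_modified_ts) <= int(since)


def respond(last_modified_ts, body, environ=os.environ):
    """Build the full CGI response text (headers, blank line, body)."""
    if not_modified(last_modified_ts, environ):
        # No body is sent; the cache reuses the copy it already holds
        return "Status: 304 Not Modified\n\n"
    return (
        "Content-Type: text/html\n"
        "Last-Modified: " + formatdate(last_modified_ts, usegmt=True) + "\n"
        "\n" + body
    )
```

    Passing the environment in as a parameter keeps the sketch easy to try without a Web server, for example respond(600, "hello", {"HTTP_IF_MODIFIED_SINCE": "Thu, 01 Jan 1970 00:10:00 GMT"}).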

    See also the cgi_buffer library, which automatically handles ETag generation and validation, Content-Length generation and gzip content-coding for Perl and Python CGI scripts with a one-line include. The Python version can also be used to wrap arbitrary CGI scripts.
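
    To make those three steps concrete, the following Python sketch shows one way they could fit together. It is not the cgi_buffer library's code or API; the function name finish_response, the hash-based ETag and the '-gzip' suffix are all assumptions made for illustration, and Accept-Encoding quality values (such as gzip;q=0) are ignored for brevity.

```python
import gzip
import hashlib


def finish_response(body, if_none_match=None, accept_encoding=""):
    """Return (status, headers, payload) for an already-generated body.

    body: the complete response body as bytes.
    if_none_match: value of the If-None-Match request header, if any.
    accept_encoding: value of the Accept-Encoding request header.
    """
    # Simplified negotiation: any mention of gzip counts as acceptance
    encodings = [e.split(";")[0].strip() for e in accept_encoding.split(",")]
    use_gzip = "gzip" in encodings

    # Derive the validator from the content; the suffix keeps the gzip
    # and identity variants distinguishable
    digest = hashlib.sha256(body).hexdigest()
    etag = '"%s%s"' % (digest, "-gzip" if use_gzip else "")
    headers = [("ETag", etag)]

    # Validation: the client already holds this exact representation
    if if_none_match is not None:
        candidates = [t.strip() for t in if_none_match.split(",")]
        if etag in candidates or "*" in candidates:
            return "304 Not Modified", headers, b""

    payload = body
    if use_gzip:
        payload = gzip.compress(body)
        headers.append(("Content-Encoding", "gzip"))
        headers.append(("Vary", "Accept-Encoding"))
    # Content-Length describes the bytes actually sent
    headers.append(("Content-Length", str(len(payload))))
    return "200 OK", headers, payload
```

    Buffering the whole body before sending anything is what makes this possible: the ETag and Content-Length can only be computed once the output is complete.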

    SERVER SIDE INCLUDES

    SSI (often used with the extension .shtml) is one of the first ways that Web publishers were able to get dynamic content into pages. By using special tags in the pages, a limited form of in-HTML scripting was available.

    Most implementations of SSI do not set validators, and as such are not cacheable. However, Apache's implementation does allow users to specify which SSI files can be cached, by setting the group execute permissions on the appropriate files, combined with the XBitHack full directive. For more information, see the mod_include documentation.


    PHP

    PHP is a server-side scripting language that, when built into the server, can be used to embed scripts inside a page's HTML, much like SSI, but with a far larger number of options. PHP can be used as a CGI script on any Web server (Unix or Windows), or as an Apache module.

    By default, representations processed by PHP are not assigned validators, and are therefore uncacheable. However, developers can set HTTP headers by using the Header() function.

    For example, this will create a Cache-Control header, as well as an Expires header three days in the future:

    Remember that the Header() function MUST come before any other output.

    As you can see, you'll have to create the HTTP date for an Expires header by hand; PHP doesn't provide a function to do it for you (although recent versions have made it easier; see PHP's date documentation). Of course, it's easy to set a Cache-Control: max-age header, which is just as good for most situations.

    For more information, see the manual entry for header.

    See also the cgi_buffer library, which automatically handles ETag generation and validation, Content-Length generation and gzip content-coding for PHP scripts with a one-line include.

    COLD FUSION

    Cold Fusion, by Macromedia, is a commercial server-side scripting engine, with support for several Web servers on Windows, Linux and several flavors of Unix.

    Cold Fusion makes setting arbitrary HTTP headers relatively easy, with the CFHEADER tag. Unfortunately, their example for setting an Expires header, as below, is a bit misleading.

    It doesn't work like you might think, because the time (in this case, when the request is made) doesn't get converted to an HTTP-valid date; instead, it just gets printed as a representation of Cold Fusion's Date/Time object. Most clients will either ignore such a value, or convert it to a default, like January 1, 1970.

    However, Cold Fusion does provide a date formatting function that will do the job: GetHttpTimeString. In combination with DateAdd, it's easy to set Expires dates; here, we set a header to declare that representations of the page expire in one month:

    You can also use the CFHEADER tag to set Cache-Control: max-age and other headers.

    Remember that Web server headers are passed through in some deployments of Cold Fusion (such as CGI); check yours to determine whether you can use this to your advantage, by setting headers on the server instead of in Cold Fusion.


    ASP AND ASP.NET

    Active Server Pages, built into IIS and also available for other Web servers, also allows you to set HTTP headers. For instance, to set an expiry time, you can use the Response.Expires property of the Response object, specifying the number of minutes from the request to expire the representation. Cache-Control headers can be added in the same way, through the Response object.

    In ASP.NET, Response.Expires is deprecated; the proper way to set cache-related headers is with Response.Cache:

        Response.Cache.SetExpires(DateTime.Now.AddMinutes(60));
        Response.Cache.SetCacheability(HttpCacheability.Public);

    When setting HTTP headers from ASPs, make sure you either place the Response method calls before any HTML generation, or use Response.Buffer to buffer the output. Also, note that some versions of IIS set a Cache-Control: private header on ASPs by default, and must be declared public to be cacheable by shared caches.

    HTTP 1.1 SPECIFICATION

    The HTTP 1.1 spec has many extensions for making pages cacheable, and is the authoritative guide to implementing the protocol. See sections 13, 14.9, 14.21, and 14.25.

    WEB-CACHING.COM

    An excellent introduction to caching concepts, with links to other online resources.

    ON INTERPRETING ACCESS STATISTICS

    Jeff Goldberg's informative rant on why you shouldn't rely on access statistics and hit counters.

    REDBOT

    Examines HTTP resources to determine how they will interact with Web caches, and generally how well they use the protocol.

    CGI_BUFFER LIBRARY

    One-line include in Perl CGI, Python CGI and PHP scripts automatically handles ETag generation and validation, Content-Length generation and gzip Content-Encoding, correctly. The Python version can also be used as a wrapper around arbitrary CGI scripts.

    This document is Copyright 1998-2012 Mark Nottingham. This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 Unported License.

    All trademarks within are property of their respective holders.

    Although the author believes the contents to be accurate at the time of publication, no liability is assumed for them, their application or any consequences thereof. If any misrepresentations, errors or other need for clarification is found, please contact the author immediately.


    The latest revision of this document can always be obtained from http://www.mnot.net/cache_docs/

    Translations are available in: Belarusian, Chinese, Czech, German, and French.

    February 9, 2012