A modern HTTP accelerator for content providers
Leszek Urbański, Trader Media East Competence Center
unabridged version, PLNOG 4 – Warsaw, 2010-03-05
Modern web apps
● you know this picture...
Modern web apps
● a nice setup...
Modern web apps
● ...but the application is still slow.
● you need efficient web caching, before the traffic hits your app
● CDNs? Hardware? Squid?
Web caching
● CDNs
● expensive
● you are completely dependent on a CDN's service
● hardware
● nice, but...
● $30,000 for one BIG-IP 3600 without redundancy, support, and only 50 Mbps compression with the standard licence
Web caching
● Squid
● a forward proxy/cache with optional reverse proxying (HTTP acceleration)
● huge config files full of forward-proxy options
● it's slow
● "1970s programming"
http://varnish-cache.org/
Varnish
● a state-of-the-art reverse proxy and cache
● open source, initially developed for a Norwegian tabloid “Verdens Gang” in 2006
● Poul-Henning Kamp – architect and lead developer
● Linpro AS
Varnish
● used by TOP100 sites
● Twitter
● Photobucket
● weather.com
● answers.com
● Hulu
● Wikia
● source: Ingvar Hagelund, http://users.linpro.no/ingvar/varnish/stats-2010-01-18.txt
Varnish
● used by only one Alexa TOP100 site in Poland
● Gadu-Gadu
Architecture
● Varnish does not fight the OS kernel!
● uses virtual memory, two main stevedores:
● mmap()
● malloc()
● scales well in SMP environments
● event-based acceptor
● multi-threaded worker model
Architecture
● avoids expensive memory operations
● workers used in MRU order, session lingering
● a worker has a private set of variables on the stack
● static buffers – reused
● uses the jemalloc library; no noticeable difference vs. Google's tcmalloc
Architecture
● workspaces
● operate on pointers, do not copy data
● malloc() only for the workspaces
● obj_workspace – per object, for request/response headers and metadata. Watch out for very large headers/cookies!
● sess_workspace – per thread, for request processing
● shm_workspace – for SHM logging
Architecture
● SHM logging
● an mmap()ed file shared by all threads and logging programs
● logging without syscalls!
memcpy(p + SHMLOG_DATA, t.b, l);
/* or */
vsnprintf((char *)(p + SHMLOG_DATA), mlen + 1, fmt, ap);
Architecture
● object eviction from an LRU list
● the list requires locking for writes
● an object is only moved in the LRU list if it hasn't been moved for the last lru_interval seconds
● hitpass objects
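As a sketch of how hit-for-pass objects typically arise in 2.0-era VCL (the Set-Cookie condition is an illustrative choice, not from the slides):

```vcl
sub vcl_fetch {
  # responses that set cookies are per-user; caching them would be wrong
  if (obj.http.Set-Cookie) {
    # "pass" from vcl_fetch records the decision in the cache:
    # later lookups hit this marker object (a "hitpass") and go
    # straight to pass instead of queueing behind a fetch
    pass;
  }
}
```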
Architecture
● efficient object purging – the "ban list"
● need to purge 200,000 objects from the cache without overloading the server?
● Varnish keeps a list of purges
● every object is tested against the list, but only if requested by a client
● if it matches, it is refreshed from a backend
● its "last tested against" pointer is updated
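Pushing a purge onto the ban list is a single management-CLI command; the syntax below is the Varnish 2.0 `purge.url` form and the URL pattern is illustrative:

```shell
$ varnishadm -T localhost:6082 purge.url "^/images/2010/"
```

None of the 200,000 matching objects is touched immediately; each is tested against the entry the next time a client asks for it.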
Architecture
● results?
● microsecond-level response for cached objects
● good even for static content
● performance limit currently unknown :-)
● 75,000 reqs/s achieved at TMECC
● 143,000 reqs/s achieved by Kristian from Redpill-Linpro
Architecture
● serving a request from cache:
<... futex resumed> ) = 0 <0.629910>
futex(0x7f2a577fe2e8, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000011>
ioctl(9, FIONBIO, [0]) = 0 <0.000011>
read(9, "GET /logo.png HTTP/1.0\r\n (...) 8191) = 177 <0.000016>
clock_gettime(CLOCK_REALTIME, {1265632945, 828835974}) = 0 <0.000011>
clock_gettime(CLOCK_REALTIME, {1265632945, 828986444}) = 0 <0.000011>
clock_gettime(CLOCK_REALTIME, {1265632945, 829032564}) = 0 <0.000010>
writev(9, [{"HTTP/1.1"..., 8}, (...) 12912}], 32) = 13227 <0.000039>
clock_gettime(CLOCK_REALTIME, {1265632945, 830411262}) = 0 <0.000011>
close(9) = 0 <0.000019>
futex(0x44884bf4, FUTEX_WAIT_PRIVATE, 239, NULL <unfinished ...>
● 10 system calls, 4 for the clock
Features
● run-time management and reconfiguration
$ telnet localhost 6082
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
vcl.list
200 23
active 7 boot
vcl.load new1 /etc/varnish/default.vcl
200 13
VCL compiled.
vcl.use new1
200 0
Features
● comprehensive logging and management
● varnishadm
● varnishlog
● varnishncsa
● varnishtop
● varnishstat
● varnishhist
● varnishreplay
● varnishtest
Features
● logging examples
● tags
varnishtop -i RxURL
varnishtop -i TxURL
varnishtop -i RxHeader -I '^User-Agent'
varnishlog -c -o ReqStart 10.0.0.1
varnishlog -b -o TxHeader '^X-Forwarded-For: .*10.0.0.1'
Features
● varnishstat – real-time statistics
client_conn      87737603       99.74 Client connections accepted
client_req      335496200      381.40 Client requests received
cache_hit       307936704      350.07 Cache hits
cache_hitpass      811746        0.92 Cache hits for pass
backend_conn     12311926       14.00 Backend conn. success
n_object           549675         .   N struct object
n_wrk                 100         .   N worker threads
n_expired        23826372         .   N expired objects
n_lru_nuked             0         .   N LRU nuked objects
n_wrk_failed            0        0.00 N worker threads not created
s_req           335510357      381.41 Total Requests
s_pass            2947900        3.35 Total pass
s_fetch          27317481       31.05 Total fetch
sma_nbytes     6661407561         .   SMA outstanding bytes
sma_balloc  2173616292374         .   SMA bytes allocated
sma_bfree   2166954884813         .   SMA bytes free
backend_req      27318738       31.06 Backend requests made
esi_parse               0        0.00 Objects ESI parsed (unlock)
esi_errors              0        0.00 ESI parse errors (unlock)
Features
● timing information
830 ReqEnd c 877345549 1233949945.075706005 1233949945.075754881 0.017112017 0.000022888 0.000025988
● fields: type, XID, start time, end time, accept()-to-processing, processing-to-delivery, delivery time
Features
● backend load balancing – directors
● round-robin
● random
● backend health polling – using new connections
● grace
● URL serialization
● IPv6 support
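A sketch of a round-robin director with health polling, in Varnish 2.0 syntax; the hosts, probe URL, and thresholds are illustrative:

```vcl
backend web1 {
  .host = "10.0.0.10";
  .port = "80";
  .probe = {
    .url = "/ping";    # polled over new connections
    .interval = 5s;
    .timeout = 1s;
    .window = 8;       # look at the last 8 polls...
    .threshold = 3;    # ...at least 3 must succeed
  }
}
backend web2 {
  .host = "10.0.0.11";
  .port = "80";
  .probe = {
    .url = "/ping";
    .interval = 5s;
    .timeout = 1s;
    .window = 8;
    .threshold = 3;
  }
}
director cluster1 round-robin {
  { .backend = web1; }
  { .backend = web2; }
}
sub vcl_recv {
  set req.backend = cluster1;
}
```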
Features
● no forward-proxy support – can be done, but with a huge amount of configuration magic
● flexible purging
purge req.http.host == foobar.com && req.url ~ ^/directory/.*$
purge obj.http.Cookie ~ example=true
● ESI support
Features
● Edge Side Includes
● a markup language for dynamic content assembly
● used by Akamai, IBM WebSphere, F5, Varnish
● without ESI: page-level caching decisions
● with ESI: a page can be split into separate blocks and assembled by the cache server
Features
● Edge Side Includes
● Varnish implements a small subset of ESI
● no compression support yet
● no If-Modified-Since support yet
<esi:include src="/esi/hot_news.html"/>
<esi:remove><a href="/something">something</a></esi:remove>
<!--esi<p>Hot news:<esi:include src="/hot_news.html"/></p>-->
Features
● VCL – Varnish Configuration Language
● a domain-specific language
● translated to C and compiled
● dynamically loaded
● similar to C, Perl
● operators: = == ! && || ~ !~
● character escaping like in URLs: %nn
● no user-defined variables; use HTTP headers:
set req.http.something = "";
unset req.http.something;
Features
● VCL – Varnish Configuration Language
● "normal" "concatenated" "strings" or long strings:
{"string
string"}
● synthetic {"string"}
● if () {} elsif {}
● no loops
● include "file.vcl";
● regsub(), regsuball()
Features
● VCL – Varnish Configuration Language
● user-defined subroutines
sub f {
  do_magic;
}
call f;
● no arguments / return values in subs
● return(); exclusive to internal VCL functions
● special variables: now (unix time), client.ip, server.ip, server.port, server.identity
Features
● VCL – Varnish Configuration Language
● ACLs
acl localnet {
  "localhost";
  "10.0.0.0/24";
  ! "10.0.0.1";
}
if (client.ip ~ localnet) {
  do_magic;
}
● security.vcl
● if everything else fails... embedded C!
Features
● embedded C in VCL
● example: syslog logging from VCL (don't :-)
C{
  #include <syslog.h>
}C
C{
  syslog(LOG_INFO, "Something happened at VCL line XX.");
  syslog(LOG_ERR, "Response from backend: XID %s request %s %s \"%s\" %d \"%s\" \"%s\"",
      VRT_r_req_xid(sp), VRT_r_req_request(sp),
      VRT_GetHdr(sp, HDR_REQ, "\005Host:"), VRT_r_req_url(sp),
      VRT_r_obj_status(sp), VRT_r_obj_response(sp),
      VRT_GetHdr(sp, HDR_OBJ, "\011Location:"));
}C
VCL
● request path through VCL
● vcl_recv
● vcl_pipe
● vcl_pass
● vcl_hash
● vcl_{hit,miss}
● vcl_fetch
● vcl_deliver
● http://varnish-cache.org/wiki/VCLExampleDefault
● this graph is oversimplified!
VCL
● vcl_recv
● called at the beginning, after the request has been received
● possible returns: error, pass, pipe, lookup
● example variables: req.request, req.url, req.proto, req.backend, req.backend.healthy, req.http.Header
if (req.http.host == "static.foo.com" && req.url ~ "^/static/.*") {
  set req.backend = cluster1;
} else {
  error 404 "Unknown virtual host";
}
VCL
● vcl_pipe
● called when entering pipe mode
● shifts bytes back and forth (client ↔ backend)
● possible returns: error, pipe
● example variables: bereq.request, bereq.url, bereq.proto, bereq.http.Header
● timeouts
VCL
● vcl_pass
● called when entering pass mode
● the request is passed to the backend without caching
● possible returns: error, pass
● example variables: bereq.*
VCL
● vcl_hash
● called on object lookup
● generates a user-configurable object hash
● possible returns: hash
● example variables: req.hash
sub vcl_hash {
  if (req.url ~ "^/content" && req.http.Cookie ~ "adult=true") {
    set req.hash += "adultContent";
    set req.http.X-Adult-Content = "1";
  }
}
VCL
● vcl_hit
● called after lookup when hit
● possible returns: error, pass, deliver
● example variables: obj.hits, obj.ttl
● caveat: do not modify the object here!
● example: adaptive TTLs:
if (req.http.host ~ "^images\.") {
  if (obj.hits > 5 && obj.hits < 10) {
    set obj.ttl = 8h;
  } elsif (obj.hits >= 10) {
    set obj.ttl = 2d;
  }
}
VCL
● vcl_miss
● called after lookup when missed
● possible returns: error, pass, fetch
● example variables: bereq.*
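A minimal vcl_miss sketch; stripping cookies here is an illustrative use of the writable bereq.* variables, not from the slides:

```vcl
sub vcl_miss {
  # bereq.* is writable here: don't forward client cookies
  # to the backend when fetching anonymous content
  unset bereq.http.Cookie;
  fetch;
}
```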
VCL
● vcl_fetch
● called after the object has been fetched
● possible returns: error, pass, deliver, esi
● example variables: obj.hits, obj.proto, obj.status, obj.response, obj.cacheable, obj.ttl, obj.lastuse
● obj.cacheable means: obj.status is 200, 203, 300, 301, 302, 410 or 404
● forced obj.ttl first set here
● obj. is called beresp. in trunk
VCL
● vcl_fetch
● ESI processing takes place here
sub vcl_fetch {
  if (req.url ~ "/esi/" || obj.http.X-ESI) {
    esi;
    set obj.ttl = 1d;
  } elsif (req.url == "/counter.cgi") {
    set obj.ttl = 1m;
  }
}
<!-- /esi/example.html -->
<esi:include src="/counter.cgi"/>
VCL
● vcl_deliver
● called before delivery to the client
● possible returns: error, deliver
● example variables: resp.proto, resp.status, resp.response, resp.http.HEADER
● modify headers for the client here
set resp.http.X-Served-By = server.identity;
if (obj.hits > 0) {
  set resp.http.X-Varnish-Hit = "HIT";
  set resp.http.X-Varnish-Hits = obj.hits;
} else {
  set resp.http.X-Varnish-Hit = "MISS";
}
VCL
● vcl_error
● called on errors
● possible returns: deliver
● example variables: req.*, obj.*
● customizing error pages:
sub vcl_error {
  if (req.url ~ "^/MONITOR.txt$") {
    synthetic {"MONITOR
"};
    deliver;
  }
}
VCL
● restarts
● the "restart" keyword sends the request all the way back to vcl_recv, available everywhere
sub vcl_fetch {
  if (obj.status >= 500) {
    restart;
  }
}
sub vcl_error {
  if (obj.status == 500 && req.restarts < 4) {
    restart;
  }
}
VCL
● restarts
● the "restart" keyword – you can even try another data center
sub vcl_recv {
  if (req.restarts == 0) {
    set req.backend = data_center_1;
  } elsif (req.restarts == 1) {
    set req.backend = data_center_2;
  }
}
VCL
● things to remember
● the req. data structure is available throughout the VCL (except in vcl_deliver)
● do not modify objects in vcl_hit (except for TTL)
● if unsure, translate the VCL to C
varnishd -C -f file.vcl
● look for VRT_count(sp, X) for ordering
● if you don't return in a vcl_* sub, the default VCL for that function is appended
VCL examples
● purging, "the squid way"
sub vcl_recv {
  if (req.request == "PURGE") {
    # requires an "acl purge { ... }" definition
    if (!client.ip ~ purge) {
      error 405 "Not allowed";
    }
    lookup;
  }
}
sub vcl_hit {
  if (req.request == "PURGE") {
    set obj.ttl = 0s;
    error 200 "Purged";
  }
}
sub vcl_miss {
  if (req.request == "PURGE") {
    error 404 "Not found";
  }
}
VCL examples
● saint mode (trunk only)
● do not send errors to clients
sub vcl_fetch {
  if (beresp.status >= 500) {
    set beresp.saintmode = 20s;
    restart;
  }
  set beresp.grace = 30m;
}
● saint mode will disable a backend for a specified period of time
● if all backends are unavailable – grace
VCL examples
● force grace on error
sub vcl_error {
  if (req.restarts == 0) {
    set req.http.X-Serve-Graced = "1";
    restart;
  }
}
sub vcl_recv {
  if (req.http.X-Serve-Graced && req.restarts == 1) {
    set req.backend = dead;
  }
}
● define a "dead" backend with health polling
● when restarted from vcl_error, graced content will be served
VCL examples
● URL rewriting
if (req.http.host ~ "^(www\.)?foo" && req.url ~ "^/images/") {
  set req.http.host = "images.foo";
  set req.url = regsub(req.url, "^/images/", "/");
}
● redirects (a bit of a hack)
sub vcl_recv {
  if (req.http.host ~ "^(www\.)?foo.com" && req.http.User-Agent ~ "iPhone|Nokia|Motorola") {
    error 701 "Moved temporarily";
  }
}
sub vcl_error {
  if (obj.status == 701) {
    set obj.http.Location = "http://m.foo.com/";
    set obj.status = 302;
    deliver;
  }
}
VCL examples
● caching publicly available authorized pages
sub vcl_fetch {
  if (req.http.Authorization && obj.http.Cache-Control !~ "public") {
    pass;
  }
}
● caching logged-in users (be careful!)
● http://varnish-cache.org/wiki/VCLExampleCachingLoggedInUsers
● possible with per-user caching and careful use of ESI
● separate "(not) logged in" objects from "logged in as..."
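One possible shape of the per-user trick: hash a session cookie into the cache key for the user-specific ESI fragment only, so the surrounding page stays shared. The URL and cookie name below are assumptions, not from the wiki example:

```vcl
sub vcl_hash {
  # /esi/user-box and the "session" cookie name are hypothetical
  if (req.url ~ "^/esi/user-box" && req.http.Cookie ~ "session=") {
    set req.hash += regsub(req.http.Cookie, "^.*?session=([^;]*).*$", "\1");
  }
}
```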
VCL examples
● cookie-based hashing
sub vcl_hash {
  if (req.http.Cookie ~ "language=esperanto") {
    set req.hash += "LangEsperanto";
  }
}
● result: a separate cached version of the object for requests with Cookie: language=esperanto;
● extracting the value of a cookie
● nothing more than a regexp
regsub(req.http.Cookie, "^.*?cookie=([^;]*);*.*$", "\1");
VCL examples
● serving synthetic responses
if (req.url ~ "^/MONITOR.txt") {
  error 200 "OK";
}
● allowing reloads from browsers without purging
if (req.http.Cache-Control ~ "(no-cache|no-store|private)") {
  pass;
}
● watch out for nasty bots!
● passing everything for secure URLs
if (req.url ~ "^/secure") {
  pass;
}
VCL examples
● normalizing Accept-Encoding headers for compression
if (req.http.Accept-Encoding) {
  if (req.http.Accept-Encoding ~ "gzip") {
    set req.http.Accept-Encoding = "gzip";
  } elsif (req.http.Accept-Encoding ~ "deflate") {
    set req.http.Accept-Encoding = "deflate";
  } else {
    remove req.http.Accept-Encoding;
  }
  if (req.url ~ "\.(css|js)$" && req.http.User-Agent ~ "MSIE 6") {
    remove req.http.Accept-Encoding;
  }
}
Best practices
● RFC 2616!
● not really for reverse proxies...
● Varnish is both a client and a server
● TTLs
Best practices
● object TTL control – headers from the backend
● considered in the following order:
● Cache-Control: s-maxage=<relative time>
● Cache-Control: max-age=<relative time>
● Varnish ignores all other Cache-Control headers (unless told otherwise in VCL)
● Expires: absolute time, requires synced clocks
● Expires is an HTTP/1.0 header
● Varnish will try to compensate for clock skew
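An illustrative backend response showing the precedence (the date is made up): with all three headers present, s-maxage wins, so the object's TTL is 300 s, not 60 s and not the Expires time:

```http
HTTP/1.1 200 OK
Content-Type: text/html
Cache-Control: s-maxage=300, max-age=60
Expires: Fri, 05 Mar 2010 12:00:00 GMT
```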
Best practices
● object TTL control – VCL
● set obj.ttl = x; – takes precedence over headers
● default_ttl configuration parameter
● Varnish sets the Age: header
● if in doubt, check varnishlog
● TTL tag
Best practices
● TTL tag in varnishlog
509 TTL - 1850178309 RFC 1798 1267393695 1267393694 1267395494 0 0
● fields: XID, source (RFC), TTL, time, Date, Expires, max-age, age
242 TTL c 1416303904 VCL 86400 1267393696
● fields: XID, source (VCL), TTL, time
Best practices
● caching policy
● Last-Modified / If-Modified-Since
● ETag / If-None-Match
● Vary
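The backend side of that policy in one illustrative response (all values made up): Last-Modified and ETag enable conditional revalidation, while Vary makes the cache keep one variant per listed request header:

```http
HTTP/1.1 200 OK
Last-Modified: Thu, 04 Mar 2010 10:00:00 GMT
ETag: "a1b2c3"
Vary: Accept-Encoding
Cache-Control: s-maxage=600
```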
Best practices
● compression
● Varnish leaves compression up to the backends
● gzip, deflate, none – the data set is stored up to 3 times
● Vary: Accept-Encoding
● normalize Accept-Encoding from browsers
Best practices
● sanitize request headers
● we've had requests coming in to "http://our.com/http://another.com/.*"
if (req.url ~ "^/\?http://") {
  set req.url = regsub(req.url, "\?http://.*", "");
}
● cache hit ratio went from 92% to 94%
● normalize vhosts
if (req.http.host ~ "^(www.)?example.com") {
  set req.http.host = "example.com";
}
● hit ratio and backend requests: 1% is half of 2%!
Best practices
● set Content-Length on the backends
● static files: multiple backends = multiple VFS caches
● serving large objects a.k.a. My Own YouTube
● objects are fully fetched before delivery
● use pipe
● ranges not supported
● not really suitable for serving video content
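A sketch of forcing pipe mode for large downloads (the extension list is an illustrative choice):

```vcl
sub vcl_recv {
  # hand big files straight to the backend instead of
  # fetching the whole object into the cache first
  if (req.url ~ "\.(iso|avi|flv|mp4)$") {
    pipe;
  }
}
```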
Best practices
● forced TTLs
● on heavily loaded sites – force TTLs to a few seconds on all pages (but pass secure content)
● purging
● entries on the ban list accumulate – consider forcing expiry by PURGE requests
● when forcing expiry, purge all Vary versions!
● include purging in your application design
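With the "squid way" PURGE handler from the earlier slide in place, the application can expire a page over plain HTTP; to cover Vary: Accept-Encoding variants, repeat the request per encoding. Hostname and URL are illustrative:

```shell
$ curl -X PURGE -H "Accept-Encoding: gzip" http://cache.example.com/article/123.html
$ curl -X PURGE -H "Accept-Encoding: deflate" http://cache.example.com/article/123.html
$ curl -X PURGE http://cache.example.com/article/123.html
```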
Best practices
● debugging
● add an X-Served-By header
● add other headers along the way
● beware of header traffic!
● X-Varnish header
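The X-Varnish header is an easy hit/miss probe: it carries the request's XID and, on a hit, also the XID of the request that inserted the object. The XIDs below are illustrative:

```shell
$ curl -si http://cache.example.com/logo.png | grep X-Varnish
X-Varnish: 963897708 963897681
```

Two XIDs mean a cache hit; a single XID means a miss or pass.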
Best practices
● HTTPS – use pound or perlbal
● in vcl_pipe:
set bereq.http.Connection = "close";
● drain connections quickly before restarting varnishd
sub vcl_recv {
  if (req.http.Connection != "close") {
    set req.http.Connection = "close";
    restart;
  }
}
Best practices
● graph everything, ask questions later
● YMMV: what is good for the big guys from TOP100 may not be as good for you (e.g. stevedore choice)
● test
● wget --save-headers
● curl -i
● LWP: GET -USsed
● caveat: lwp-request does: "GET http://foo/bar"
Best practices
● if everything else fails...
$ gdb /usr/sbin/varnishd core
GNU gdb 6.8-debian
This GDB was configured as "x86_64-linux-gnu"...
(gdb) bt
(…)
(gdb) frame 3
#3 0x000000000042ef64 in mgt_cli_vlu (priv=0x7fb112813c00, p=0x7fb1128d3000 "debug.health") at mgt_cli.c:270
270 xxxassert(i == strlen(p));
● don't strip Varnish binaries
● compile with --enable-debugging-symbols --enable-diagnostics
Configuration
● object hash table
● Varnish 2.0: -h classic,N
● N hash buckets – objects / 10
● a prime number
● Varnish trunk: -h critbit
● Patricia Tree
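On the command line this looks like the sketch below; the bucket count is illustrative (pick a prime near your object count divided by 10 and verify it yourself), and the listen/backend addresses are placeholders:

```shell
# Varnish 2.0 – classic hash with an explicit bucket count
varnishd -a :80 -b localhost:8080 -h classic,250007

# trunk – critbit needs no sizing
varnishd -a :80 -b localhost:8080 -h critbit
```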
Configuration
● run-time parameters (can be set from CLI)
● obj_workspace=Nbytes (dynamic in trunk) – headers, per-object overhead
● sess_workspace=Nbytes – entire header and all edits done in VCL, per thread
● shm_workspace=Nbytes – for the log, per thread
● shm_reclen=Nbytes – max SHM log record length
● session_linger=Nms – time before a worker thread is returned to its pool
● sess_timeout=Ns – persistent session timeout
Configuration
● thread_pools=N – set to the number of CPU cores
● thread_pool_add_delay=Nms – the default may be too high
● thread_pool_max and _min – a bit confusing
● max – the limit for all thread pools
● min – the limit for one thread pool
● do not set too high
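These can be given at startup with -p, or changed at run time with param.set in the management CLI. The values below are illustrative, not recommendations, and the listen/backend addresses are placeholders:

```shell
varnishd -a :80 -b localhost:8080 \
  -p thread_pools=4 \
  -p thread_pool_min=50 \
  -p thread_pool_max=1000 \
  -p thread_pool_add_delay=2
```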
OS environment
● forget about 32-bit
● malloc() better than mmap() for in-memory cache sets
● also better for larger-than-memory cache sets on Linux (YMMV)
● Varnish on virtualized guests?
● slight latency difference
● can be an issue for on-line auction sites
OS environment
● Virtualization● Varnish on a standalone system
OS environment
● Virtualization● Varnish on a Xen domU with pinned vcpus
OS environment
● I/O-related tuning on Linux
● set vm.swappiness to 0
● /var/lib/varnish/$HOSTNAME/_.vsl – the SHM log
● put the SHM log on tmpfs
● anticipatory elevator best on HDDs, noop on SSDs
● use ext2
● noatime
● swap striping
● iSCSI is great for logs
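The swappiness and tmpfs items as commands (the tmpfs size is illustrative; the mount point matches the SHM log path above):

```shell
# prefer dropping page cache over swapping out varnishd
sysctl -w vm.swappiness=0

# keep the SHM log off the disks
mount -t tmpfs -o size=256m tmpfs /var/lib/varnish
```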
OS environment
● network tuning
● run NTP
● check if your load balancer uses keep-alive
● /proc/sys/net – don't tune if you don't know what you're doing
● don't use net.ipv4.tcp_tw_reuse
● tcp_tw_recycle is even worse
● normally the socket waits 2 * MSL
● reusing causes problems with NAT routers
New features in trunk
● upcoming release: Varnish 2.1
● persistent storage (without LRU support)
● URL hashing director
● client hashing director
● critbit by default
● saint mode
● obj_workspace allocated dynamically
● req.* in vcl_deliver
● obj.* in vcl_fetch is now beresp.*
Shopping list
● http://varnish-cache.org/wiki/PostTwoShoppingList
● ESI enhancements (304s, gzip, etc.)
● compression support
● streaming in pass / fetch
● Content-Range support
● file upload buffering
● VCL cookie handling (req.cookie.foo)
● custom formats in varnishlog and varnishncsa
Shopping list
● expiry randomization
● the "lemming effect"
Support & development
● commercial support offered by Redpill-Linpro
● community support: #varnish on irc.linpro.no
● VML – Varnish Moral License
● http://phk.freebsd.dk/VML/
● not a support contract!
● help pay for Varnish development
Thank you
Questions?
Sources
● http://varnish-cache.org/
● TMECC VCL configs
● Wikia VCL configs
● http://kristian.blog.linpro.no/
● http://ingvar.blog.linpro.no/
● #varnish