NGINX High-performance Caching


Introduced by Andrew Alexeev

Presented by Owen Garrett

Nginx, Inc.

About this webinar

Content Caching is one of the most effective ways to dramatically improve

the performance of a web site. In this webinar, we’ll deep-dive into

NGINX’s caching abilities and investigate the architecture used, debugging

techniques and advanced configuration. By the end of the webinar, you’ll

be well equipped to configure NGINX to cache content exactly as you need.

BASIC PRINCIPLES OF CONTENT CACHING

Basic Principles

[Diagram: a client issues GET /index.html; the cache either responds from its store or forwards GET /index.html across the Internet to the origin server]

Used by: Browser Cache, Content Delivery Network and/or Reverse Proxy Cache

Mechanics of HTTP Caching

• Origin server declares cacheability of content

• Requesting client honors cacheability
  – May issue conditional GETs

Expires: Tue, 06 May 2014 02:28:12 GMT

Cache-Control: public, max-age=60

X-Accel-Expires: 30

Last-Modified: Tue, 29 Apr 2014 02:28:12 GMT

ETag: "3e86-410-3596fbbc"

What does NGINX cache?

• GET and HEAD requests whose responses carry no Set-Cookie header

• Uniqueness defined by the raw URL, or by an explicit key:

proxy_cache_key $scheme$proxy_host$uri$is_args$args;

• Cache time defined by
  – X-Accel-Expires
  – Cache-Control
  – Expires

See http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html

NGINX IN OPERATION…

NGINX Config

proxy_cache_path /tmp/cache keys_zone=one:10m levels=1:2 inactive=60m;

server {
    listen 80;
    server_name localhost;

    location / {
        proxy_pass http://localhost:8080;
        proxy_cache one;
    }
}

Caching Process

The request flow, roughly:

1. Read the request and compute the cache key
2. Check the cache: HIT – respond from cache
3. MISS – optionally wait (proxy_cache_lock_timeout), then fetch from upstream
4. Response cacheable? If so, stream it to disk while serving the client

NGINX can also serve stale content under the following circumstances:

proxy_cache_use_stale error | timeout | invalid_header |
                      updating | http_500 | http_502 | http_503 | http_504 |
                      http_403 | http_404 | off;
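In a real configuration the directive takes a space-separated list of those conditions. A minimal sketch in context (the upstream address is illustrative):

```nginx
location / {
    proxy_pass http://localhost:8080;
    proxy_cache one;

    # Serve a stale entry while a fresh copy is being fetched, and when
    # the upstream errors out or times out
    proxy_cache_use_stale error timeout updating
                          http_500 http_502 http_503 http_504;
}
```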

Caching is not just for HTTP

• FastCGI
  – Functions much like HTTP

• Memcache
  – Retrieve content from a memcached server (must be prepopulated)

• uwsgi and SCGI

NGINX is more than just a reverse proxy: it can cache or retrieve content over HTTP, FastCGI, memcached, uwsgi and SCGI.

HOW TO UNDERSTAND WHAT’S GOING ON

add_header X-Cache-Status $upstream_cache_status;

MISS        Response not found in cache; fetched from upstream. It may have been saved to the cache

BYPASS      proxy_cache_bypass forced a fetch from upstream. The response may have been saved to the cache

EXPIRED     The cache entry has expired; fresh content is returned from upstream

STALE       proxy_cache_use_stale takes control and serves stale content from the cache because the upstream is not responding correctly

UPDATING    Stale content is served from the cache because the cache lock has timed out and proxy_cache_use_stale updating takes control while the entry is refreshed

REVALIDATED proxy_cache_revalidate verified that the cached content was still valid (If-Modified-Since)

HIT         Valid, fresh content served directly from the cache

Cache Instrumentation

map $remote_addr $cache_status {
    127.0.0.1 $upstream_cache_status;
    default   "";
}

server {
    location / {
        proxy_pass http://localhost:8002;
        proxy_cache one;
        add_header X-Cache-Status $cache_status;
    }
}

Extended Status

Check out demo.nginx.com:
http://demo.nginx.com/status.html
http://demo.nginx.com/status
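The live demo uses NGINX Plus's extended status module. With open-source NGINX, a minimal counterpart is the stub_status module; a sketch (the port and location name are illustrative):

```nginx
server {
    listen 8080;

    location /basic_status {
        # Reports active connections, accepts, handled, requests,
        # reading/writing/waiting counts
        stub_status on;
    }
}
```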

HOW CONTENT CACHING FUNCTIONS IN NGINX

How it works...

• NGINX uses a persistent disk-based cache

– OS Page Cache keeps content in memory, with hints from NGINX processes

• We’ll look at:

– How is content stored in the cache?

– How is the cache loaded at startup?

– Pruning the cache over time

– Purging content manually from the cache

How is cached content stored?

• Define the cache key and cache location:

proxy_cache_path /tmp/cache keys_zone=one:10m levels=1:2 max_size=40m;
proxy_cache_key $scheme$proxy_host$uri$is_args$args;

• Get the content into the cache, then compute the md5 of the key:

$ echo -n "httplocalhost:8002/time.php" | md5sum
6d91b1ec887b7965d6a926cff19379b4 -

• Verify it's there (levels=1:2 gives the /4/9b/ subdirectories, taken from the end of the hash):

$ cat /tmp/cache/4/9b/6d91b1ec887b7965d6a926cff19379b4
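The key-to-path mapping can be reproduced outside NGINX. A minimal sketch in shell, assuming the key from the slide and the levels=1:2 layout (md5 hashes are always 32 hex characters):

```shell
# Recompute where NGINX stores the object for a given proxy_cache_key value
key="httplocalhost:8002/time.php"              # $scheme$proxy_host$uri, no args
hash=$(printf '%s' "$key" | md5sum | awk '{print $1}')
last1=$(printf '%s' "$hash" | cut -c 32)       # levels=1 -> last hex char
last2=$(printf '%s' "$hash" | cut -c 30-31)    # levels=2 -> next two chars
echo "/tmp/cache/$last1/$last2/$hash"          # the path shown on the slide
```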

Loading cache from disk

• Cache metadata is stored in a shared memory segment

• Populated at startup from the on-disk cache by the cache loader
  – Loads files in blocks of 100 (loader_files)
  – Works for no longer than 200ms per iteration (loader_threshold)
  – Pauses for 50ms, then repeats (loader_sleep)

proxy_cache_path path keys_zone=name:size
    [loader_files=number] [loader_threshold=time] [loader_sleep=time];
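Those defaults map onto the three loader parameters. A sketch of overriding them to load a large cache faster (the values are illustrative, not recommendations):

```nginx
# loader_files: files per block (default 100)
# loader_threshold: max work per iteration (default 200ms)
# loader_sleep: pause between iterations (default 50ms)
proxy_cache_path /tmp/cache keys_zone=one:10m levels=1:2
                 loader_files=300 loader_threshold=500ms loader_sleep=50ms;
```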

Managing the disk cache

• Cache Manager runs periodically, purging files that were inactive irrespective of cache time, deleteingfiles in LRU style if cache is too big

– Remove files that have not been used within 10m

– Remove files if cache size exceeds max_size

proxy_cache_path path keys_zone=name:size

[inactive=time] [max_size=size];

(10m)

Purging content from disk

• Find it and delete it
  – Relatively easy if you know the key

• NGINX Plus adds a cache-purge capability:

$ curl -X PURGE -D - "http://localhost:8001/*"
HTTP/1.1 204 No Content
Server: nginx/1.5.12
Date: Sat, 03 May 2014 16:33:04 GMT
Connection: keep-alive
X-Cache-Key: httplocalhost:8002/*
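A sketch of the server side of that exchange, NGINX Plus only (the map turns the PURGE method into a flag; ports match the curl example above but are otherwise illustrative):

```nginx
# Requests using the PURGE method remove matching cache entries
map $request_method $purge_method {
    PURGE   1;
    default 0;
}

server {
    listen 8001;

    location / {
        proxy_pass http://localhost:8002;
        proxy_cache one;
        proxy_cache_purge $purge_method;
    }
}
```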

CONTROLLING CACHING

Delayed caching

• Only cache an item once it has been requested a given number of times
• Saves on disk writes for very "cool" (rarely requested) content

proxy_cache_min_uses number;

Cache revalidation

• Refresh expired content with conditional GETs (If-Modified-Since)
• Saves on upstream bandwidth and disk writes

proxy_cache_revalidate on;

Control over cache time

• Priority is:

– X-Accel-Expires

– Cache-Control

– Expires

– proxy_cache_valid

proxy_cache_valid 200 302 10m;

proxy_cache_valid 404 1m;

Set-Cookie response header means no caching

Cache / don't cache

• proxy_cache_bypass – bypass the cache and go to origin; the result may still be cached

• proxy_no_cache – if we go to origin, don't cache the result

• Typically used with a complex cache key, and only if the origin does not send appropriate Cache-Control responses

proxy_cache_bypass string ...;

proxy_no_cache string ...;

proxy_no_cache $cookie_nocache $arg_nocache $http_authorization;
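A sketch of both directives in context, following the same pattern (the cookie and argument names are illustrative, as in the slide):

```nginx
location / {
    proxy_pass http://localhost:8080;
    proxy_cache one;

    # Skip the cache when a 'nocache' cookie or query argument is present;
    # the same variables also prevent the fetched response from being stored
    proxy_cache_bypass $cookie_nocache $arg_nocache;
    proxy_no_cache     $cookie_nocache $arg_nocache;
}
```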

Multiple Caches

• Different cache policies for different tenants

• Pin caches to specific disks

• Temp-file considerations – put the temp path on the same disk as the cache:

proxy_cache_path /tmp/cache1 keys_zone=one:10m levels=1:2 inactive=60s;
proxy_cache_path /tmp/cache2 keys_zone=two:2m levels=1:2 inactive=20s;
proxy_temp_path path [level1 [level2 [level3]]];
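A sketch of selecting a cache per tenant, so each virtual server gets its own zone, disk, and eviction policy (server names and ports are assumptions):

```nginx
# Keep temp files on the same disk as the first cache to avoid cross-disk copies
proxy_temp_path /tmp/cache1/tmp;

server {
    server_name fast.example.com;
    location / {
        proxy_pass http://localhost:8080;
        proxy_cache one;     # zone backed by /tmp/cache1
    }
}

server {
    server_name slow.example.com;
    location / {
        proxy_pass http://localhost:8080;
        proxy_cache two;     # zone backed by /tmp/cache2
    }
}
```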

QUICK REVIEW – WHY CACHE?

Why is page speed important?

• We used to talk about the 'N second rule':
  – 10-second rule (Jakob Nielsen, March 1997)
  – 8-second rule (Zona Research, June 2001)
  – 4-second rule (Jupiter Research, June 2006)
  – 3-second rule (PhocusWright, March 2010)

[Chart: the tolerated page-load threshold falling from 10 seconds to 3 seconds, plotted Jan-97 through Jan-14]

Google changed the rules

“We want you to be able to get from one page to another as quickly as you turn the page on a book”

Urs Hölzle, Google

The costs of poor performance

• Google: a search enhancement added 0.5s to page load
  – Ad click-through rate dropped 20%

• Amazon: artificially increased page load by 100ms
  – Customer revenue dropped 1%

• Walmart, Yahoo, Shopzilla, Edmunds, Mozilla…
  – All reported similar effects on revenue

• Google PageRank – page speed affects page rank
  – Time to First Byte is what appears to count

NGINX Caching lets you:

• Improve end-user performance

• Consolidate and simplify your web infrastructure

• Increase server capacity

• Insulate yourself from server failures

Closing thoughts

• 38% of the world’s busiest websites use NGINX

• Check out the blogs on nginx.com

• Future webinars: nginx.com/webinars

Try NGINX F/OSS (nginx.org) or NGINX Plus (nginx.com)