@TwitterAds | Confidential @lfcipriani 2013-08-30 APIs Caching Why your server needs some rest Rubyconf Brazil 2013

API Caching, why your server needs some rest

DESCRIPTION

The best HTTP request made to your server is the one that never reaches it. Do you know the life cycle of your resources? How can you be sure that the user never receives an expired response, without having to open a connection to the origin server? What kinds of caches exist, and when should you use each one? Why shouldn't you be afraid to read the RFCs? This talk presents good practices for using HTTP caching in APIs and web applications, so that your digital products make better use of machines and save money.

Citation preview

Page 1: API Caching, why your server needs some rest

@TwitterAds | Confidential

@lfcipriani
2013-08-30

APIs Caching
Why your server needs some rest

Rubyconf Brazil 2013

Page 2: API Caching, why your server needs some rest

Who?

@lfcipriani

Page 3: API Caching, why your server needs some rest

What?

Page 4: API Caching, why your server needs some rest

Scope of this presentation

• Caching in a Distributed System
• The flows of HTTP Cache and how to control them
• Good and Bad Practices

Page 5: API Caching, why your server needs some rest

Scope of this presentation

If you need a friendly way to understand the Caching part of RFC 2616

Source: http://www.slideshare.net/lfcipriani/fearless-http-requests-abuse

Page 6: API Caching, why your server needs some rest

Definitions and Motivations

Page 7: API Caching, why your server needs some rest

Analogy

Memorizing phone numbers, or checking the phone book every time

Page 8: API Caching, why your server needs some rest

Network Effect

Welcome to the first year of Software Engineering...

...where every request delivers a response without failure and the whole network is reliable and fast.

Source: First day on Internet Kid (know your meme)

Page 9: API Caching, why your server needs some rest

What problems does caching help to solve?

• redundant and unnecessary data traffic
• network bottlenecks
• heavy load (or spikes) on the origin server
• long network latency

Page 10: API Caching, why your server needs some rest

Motivations

HTTP Archive

[Figure: HTTP Archive trend charts, All sites vs. Top 1000]

Source: http://httparchive.org/trends.php?s=All&minlabel=Jan+20+2011&maxlabel=Aug+15+2013

Page 11: API Caching, why your server needs some rest

Motivations

HTTP Archive Cache lifetime: All Sites vs Top 100

http://httparchive.org/interesting.php?a=All&l=Aug%2015%202013&s=Top100

Page 12: API Caching, why your server needs some rest

HTTP Caching Protocol

Page 13: API Caching, why your server needs some rest

HTTP Caching flows

Page 14: API Caching, why your server needs some rest

https://vine.co/v/hOuAXTOetuz

bit.ly/vinecaching

Page 15: API Caching, why your server needs some rest

https://vine.co/v/hOuMHbTzp6h

bit.ly/vinecaching

Page 16: API Caching, why your server needs some rest

https://vine.co/v/hOu5g9FVDa5

bit.ly/vinecaching

Page 17: API Caching, why your server needs some rest

https://vine.co/v/hOuvzinwrt6

bit.ly/vinecaching

Page 18: API Caching, why your server needs some rest

The Cache headers zoo

Source: http://www.slideshare.net/lfcipriani/fearless-http-requests-abuse

Page 19: API Caching, why your server needs some rest

Cache Coherency

Page 20: API Caching, why your server needs some rest

What’s cache coherency?
In a distributed system

Since only the origin server knows the state of a resource with certainty, caches and other components must ensure that a cached response is still fresh before returning it to the client.

Because of this complexity, keeping caches coherent in a distributed system has a high cost.

Page 21: API Caching, why your server needs some rest

Strong consistency
Better safe than sorry

Maintain coherency by revalidating every request with the origin server.

Page 22: API Caching, why your server needs some rest

Weak consistency
Living dangerously

The cache has autonomy to use a heuristic to decide whether the cached response is still fresh, without consulting the origin server.

Basically, there are 2 types of weak consistency: invalidation and TTL.

Page 23: API Caching, why your server needs some rest

Weak consistency - Invalidation

Page 24: API Caching, why your server needs some rest

Weak consistency - Invalidation is bad!

• the approach does not scale

• the server needs to coordinate with an unknown network of caches

• choose 2: immediacy, scalability, reliability

• “There are only two hard things in Computer Science: cache invalidation and naming things” - Phil Karlton

• Two Generals Problem

http://www.subbu.org/blog/2010/01/cache-invalidation
http://en.wikipedia.org/wiki/Two_Generals'_Problem

Page 25: API Caching, why your server needs some rest

Weak consistency - When to do Invalidation

When your network is similar to the one below ;-)

Page 26: API Caching, why your server needs some rest

Weak consistency - TTL approach

Page 27: API Caching, why your server needs some rest

Taming Cache

Page 28: API Caching, why your server needs some rest

Topology considerations

Page 29: API Caching, why your server needs some rest

Protocol Specific Considerations
Controlling cacheability

cache-control directive   may I cache locally? (1)   may I cache anywhere?   should revalidate, even being fresh?
no-store                  no                         no                      n/a
private                   yes                        no                      no
no-cache                  yes                        yes                     yes
public                    yes                        yes                     no

1. locally means a cache that serves only one consumer
2. these directives override any configuration of the cache
3. by default, caches may store responses to safe, non-authenticated requests (GET and HEAD) with status code 200, 203, 206, 300, 301 or 410
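The directive table on this slide can be read as a small lookup. The following is only a sketch in Python; the names `CACHEABILITY`, `may_store` and `must_revalidate_when_fresh` are hypothetical and exist just to illustrate the table, not to describe how any real cache is implemented.

```python
# Hypothetical lookup mirroring the slide's table. For each Cache-Control
# directive: (may a local cache store it?, may any cache store it?,
# must it be revalidated even while fresh? None means "not applicable").
CACHEABILITY = {
    "no-store": (False, False, None),
    "private":  (True,  False, False),
    "no-cache": (True,  True,  True),
    "public":   (True,  True,  False),
}

def may_store(directive, shared_cache):
    """True if a cache of the given kind may store the response."""
    local_ok, anywhere_ok, _ = CACHEABILITY[directive]
    return anywhere_ok if shared_cache else local_ok

def must_revalidate_when_fresh(directive):
    """True/False per the table, or None when the response is never stored."""
    return CACHEABILITY[directive][2]
```

For example, `may_store("private", shared_cache=True)` is `False`, while the same call with `shared_cache=False` is `True`.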

Page 30: API Caching, why your server needs some rest

Protocol Specific Considerations
Controlling cacheability

Be aware of the Vary header: if it names a request header whose values are highly diverse, you can fill the cache storage very quickly.
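To see why a diverse Vary header fills the storage, here is a minimal sketch in Python of a cache key that includes the request headers named in Vary. The function `cache_key`, the dictionary `store` and the sample URL and header values are all assumptions made for illustration; real caches also treat cases such as `Vary: *` specially, which this sketch ignores.

```python
def cache_key(method, url, request_headers, vary):
    """Build a key from the request line plus the request headers named in Vary."""
    names = sorted(h.strip().lower() for h in vary.split(",") if h.strip())
    lowered = {k.lower(): v for k, v in request_headers.items()}
    selected = tuple((name, lowered.get(name, "")) for name in names)
    return (method, url, selected)

# Hypothetical traffic: the same URL requested by three different clients.
store = {}
for agent in ("Firefox/23.0", "Chrome/29.0", "curl/7.30"):
    headers = {"User-Agent": agent, "Accept-Encoding": "gzip"}
    # The response says "Vary: User-Agent", so each agent gets its own entry.
    store[cache_key("GET", "/users/1", headers, "User-Agent")] = "response body"
```

With `Vary: User-Agent`, the single URL ends up stored once per distinct agent string. Varying on `Accept-Encoding` instead would collapse these three requests into one entry, since they all send the same value.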

Page 31: API Caching, why your server needs some rest

Protocol Specific Considerations
Controlling revalidation

Revalidation is done with conditional requests.

If-Modified-Since != Last-Modified = 200
If-Modified-Since == Last-Modified = 304
If-None-Match != ETag = 200
If-None-Match == ETag = 304

You can even decide how revalidation is done.
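The four rules on this slide can be sketched as a server-side decision in Python. This is a simplification under stated assumptions: `revalidate` is a hypothetical function, the date check is the plain equality shown on the slide rather than a real date comparison, weak ETags are ignored, and checking If-None-Match first is a choice made for this sketch.

```python
def revalidate(request_headers, current_etag, last_modified):
    """Return 304 if the client's validator still matches, otherwise 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # Assumption of this sketch: the ETag validator is checked first.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if current_etag in candidates else 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        # Simplified to the equality shown on the slide.
        return 304 if if_modified_since == last_modified else 200

    # No validator sent: this is not a conditional request.
    return 200
```

A 304 tells the cache that its stored copy can be reused, so only headers travel over the network; a 200 carries the full new representation.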

Page 32: API Caching, why your server needs some rest

Content specific considerations

Careful with cookies

Be aware of how privacy policy influences what’s cacheable

Page 33: API Caching, why your server needs some rest

Content life cycle considerations

TL;DR;

Know the rates of change of your resources and establish a time to live for them.

Expires: [Date]
Cache-Control: max-age=[seconds]
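As a rough illustration of how a cache could turn these two headers into a time to live, here is a sketch in Python. The functions `freshness_lifetime` and `is_fresh` are hypothetical names; the sketch assumes well-formed headers and gives `max-age` priority over `Expires`.

```python
from email.utils import parsedate_to_datetime

def freshness_lifetime(headers):
    """Seconds the response may be reused, or None if no explicit TTL is given."""
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)  # max-age wins over Expires
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return max(0, int((expires - date).total_seconds()))
    return None

def is_fresh(headers, current_age_seconds):
    """True while the stored response is younger than its lifetime."""
    lifetime = freshness_lifetime(headers)
    return lifetime is not None and current_age_seconds < lifetime
```

When neither header is present the sketch returns `None`, which is the point where a cache falls back on its own heuristics.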

Page 34: API Caching, why your server needs some rest

Content life cycle considerations

• TTLs that are too short (seconds) or too long (days) smell bad

• TTL can vary; don’t treat it as a constant value.

• don’t be afraid to get sophisticated, if needed:
  • L-Factor heuristic: (date - last modified) * factor
  • Prediction Models http://www.slideshare.net/jseidman/real-world-machine-learning-at-orbitz-strata-2011

• Control your cache strategy!
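The L-Factor heuristic from this slide, (date - last modified) * factor, can be written out as a short Python sketch. The function name `heuristic_ttl` and the default factor of 0.1 are assumptions for illustration; the right factor depends on how your resources actually change.

```python
from datetime import datetime, timezone

def heuristic_ttl(date, last_modified, factor=0.1):
    """TTL in seconds: a fraction of how long the resource had been unchanged."""
    unchanged_for = (date - last_modified).total_seconds()
    return max(0.0, unchanged_for * factor)

# Hypothetical resource: served on 2013-08-30, last changed ten days earlier.
served = datetime(2013, 8, 30, 12, 0, tzinfo=timezone.utc)
changed = datetime(2013, 8, 20, 12, 0, tzinfo=timezone.utc)
ttl_seconds = heuristic_ttl(served, changed)
```

The idea is that a resource that has stayed unchanged for ten days is likely to stay unchanged for about one more, so with a factor of 0.1 the resulting TTL is roughly a day.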

Page 35: API Caching, why your server needs some rest

General considerations

Deciding to have NO cache is part of the strategy.

Your cache strategy might not be honored by an intermediary cache. No hard feelings about it; it is more common than you think.

Page 36: API Caching, why your server needs some rest

Measuring efficiency

Page 37: API Caching, why your server needs some rest

Measuring Cache efficiency

Hit Rate = Cache hits / Total number of requests

This will depend on:
• how big your cache is
• how similar the interests of the cache users are
• the data rate of change
• how caches are configured

Page 38: API Caching, why your server needs some rest

Measuring Cache efficiency

Byte Hit Rate = Bytes transferred from cache hits / Bytes transferred for all requests
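Both metrics can be computed from a simple request log. The Python sketch below assumes a hypothetical log format of `(was_cache_hit, bytes_transferred)` pairs; the function name `hit_rates` is also an assumption.

```python
def hit_rates(log):
    """Return (hit rate, byte hit rate) for (was_cache_hit, bytes) entries."""
    total = hits = total_bytes = hit_bytes = 0
    for was_hit, size in log:
        total += 1
        total_bytes += size
        if was_hit:
            hits += 1
            hit_bytes += size
    hit_rate = hits / total if total else 0.0
    byte_hit_rate = hit_bytes / total_bytes if total_bytes else 0.0
    return hit_rate, byte_hit_rate

# Hypothetical log: three small hits and one large miss.
sample_log = [(True, 100), (False, 300), (True, 100), (True, 0)]
rates = hit_rates(sample_log)
```

The two numbers can diverge a lot: in this sample most requests are hits, but most of the bytes still come from the origin, because the single miss is the largest response.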

Page 39: API Caching, why your server needs some rest

Measuring Cache efficiency

• the same metrics can be applied to revalidations

• measure per resource

• measure continuously and monitor to improve the strategy

Page 40: API Caching, why your server needs some rest

Measuring Cache efficiency

Validate your strategy at redbot.org

Page 41: API Caching, why your server needs some rest

Final considerations

Page 42: API Caching, why your server needs some rest

Final considerations

• It is important to have a good knowledge of the topology of the application and of distributed systems constraints.

• Think through and build a good strategy; don’t rely on default heuristics.

• Measure, monitor and improve. Strategies are dynamic, and changing them is part of the process.

• All this can be done incrementally; focus on the relevant resources.

• Be careful not to turn the cache into overhead.

Page 43: API Caching, why your server needs some rest

References

Web Protocols and Practice: HTTP/1.1, Networking Protocols, Caching, and Traffic Measurement (Balachander Krishnamurthy and Jennifer Rexford)
HTTP: The Definitive Guide (David Gourley, Brian Totty, Marjorie Sayer and Anshu Aggarwal)

http://www.w3.org/Protocols/rfc2616/rfc2616.html (HTTP RFC)
http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13 (Caching in HTTP)
http://stevesouders.com/
http://talleye.com/
https://dev.twitter.com/
bit.ly/vinecaching

Page 44: API Caching, why your server needs some rest

Thank you!