How to Build High-Volume, Scalable, and Resilient APIs (EXP18038)

Resilient APIs

@JPENNINKHOF

Challenge

“If you add up all the smartphones and the tablets and the digital televisions and the PCs... we see a large opportunity of perhaps 3 billion to 4 billion units per annum, but we see an embedded market that’s maybe 30 billion to 40 billion units per annum”

- ARM CEO Warren East

Problem definition

For example, an application that depends on 30 services, each with 99.99% uptime, gets:

$0.9999^{30} \approx 0.997$, i.e. 99.7% uptime

0.3% of 1 million requests = 3,000 failures

2+ hours of downtime per month, even if every dependency has excellent uptime.

Reality is generally worse.
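The arithmetic above can be checked with a few lines of Python. This sketch assumes that dependency failures are independent and that a month has 30 days; neither assumption is stated on the slide.

```python
# Composite availability when every request needs all dependencies to succeed.
# Assumptions: failures are independent; a month is 30 days.
per_dependency_uptime = 0.9999   # 99.99%
dependencies = 30
requests = 1_000_000
hours_per_month = 30 * 24

composite_uptime = per_dependency_uptime ** dependencies
failure_rate = 1 - composite_uptime
expected_failures = requests * failure_rate
downtime_hours = hours_per_month * failure_rate

print(f"composite uptime: {composite_uptime:.2%}")
print(f"failures per 1M requests: {expected_failures:,.0f}")
print(f"downtime per month: {downtime_hours:.1f} h")
```

The result is roughly 99.70% uptime, about 3,000 failed requests per million, and a little over two hours of downtime a month.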

API vulnerability

API Fallbacks

Design principles

• Restrict any single dependency from using up all user threads.

• Shed load and fail fast instead of queueing.

• Provide fallbacks wherever feasible to protect users from failure.

• Use isolation techniques (such as bulkhead, swimlane and circuit breaker patterns) to limit the impact of any one dependency.

• Optimize for time-to-discovery through near real-time metrics, monitoring and alerting.

• Optimize for time-to-recovery with low-latency propagation of configuration changes and support for dynamic property changes in virtually all aspects of Hystrix (Netflix's latency and fault-tolerance library), allowing real-time operational modifications with low-latency feedback loops.

• Protect against failures in the entire dependency client execution, not just in the network traffic.

Use timeouts

Time out calls that take longer than defined thresholds. A default exists, but for most dependencies the threshold is custom-set via properties to be just slightly higher than the measured 99.5th percentile latency of that dependency.
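A minimal Python sketch of this rule, assuming "just slightly higher" means 10% headroom over a nearest-rank 99.5th percentile. The helper names and the worker pool are hypothetical, and because Python cannot forcibly stop a thread, a timed-out call is abandoned rather than killed.

```python
import concurrent.futures
import math
import time

def percentile(samples, pct):
    """Nearest-rank percentile (pct between 0 and 100) of latency samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def timeout_from_latencies(samples_s, pct=99.5, headroom=1.1):
    # Assumption: "just slightly higher" is modelled as 10% above the p99.5 latency.
    return percentile(samples_s, pct) * headroom

# Hypothetical shared worker pool that executes dependency calls.
_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def call_with_timeout(fn, timeout_s, *args):
    """Run fn(*args) on a worker thread; raise TimeoutError if it exceeds timeout_s."""
    future = _executor.submit(fn, *args)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # The running thread cannot be killed; the caller simply stops waiting for it.
        future.cancel()
        raise TimeoutError(f"call exceeded {timeout_s:.3f}s") from None

latencies_s = [0.020] * 990 + [0.080] * 10       # hypothetical measurements
timeout_s = timeout_from_latencies(latencies_s)  # p99.5 is 0.080 s, so about 0.088 s
call_with_timeout(time.sleep, timeout_s, 0.001)  # a fast call completes normally
```

Deriving the threshold from measured latency keeps it tight for fast dependencies while still tolerating their normal tail.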

Bulkheads

Maintain a small thread pool (or semaphore) for each dependency; if it becomes full, new commands are rejected immediately instead of being queued. A dependency with a clogged thread pool then cannot block access to other dependencies.
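A semaphore-based bulkhead can be sketched in Python as follows. The class and exception names are hypothetical; the key point is the non-blocking acquire, which rejects excess calls instead of queueing them.

```python
import threading

class BulkheadRejected(Exception):
    """Raised when a dependency's bulkhead is full (fail fast, no queueing)."""

class Bulkhead:
    """Caps concurrent calls to one dependency; excess calls are rejected immediately."""

    def __init__(self, name, max_concurrent):
        self.name = name
        self._permits = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn, *args, **kwargs):
        # Non-blocking acquire: if every permit is in use, reject instead of waiting.
        if not self._permits.acquire(blocking=False):
            raise BulkheadRejected(f"bulkhead '{self.name}' is full")
        try:
            return fn(*args, **kwargs)
        finally:
            self._permits.release()
```

Creating one instance per dependency, for example `Bulkhead("ratings", 10)` and `Bulkhead("catalog", 10)`, means a stalled ratings service can exhaust only its own permits while catalog calls keep flowing.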

Circuit breakers

Trip a circuit breaker, automatically or manually, to stop all requests to a service for a period of time once its error percentage passes a threshold.
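A single-threaded Python sketch of an error-percentage circuit breaker. The thresholds, the minimum request volume and the half-open trial behaviour are illustrative assumptions, and counts are kept since the last reset rather than in a true rolling time window.

```python
import time

class CircuitOpenError(Exception):
    """Raised when the breaker short-circuits a call."""

class CircuitBreaker:
    """Opens when the error percentage passes a threshold (not thread-safe; a sketch).

    Simplification: counts accumulate since the last reset instead of a rolling window.
    """

    def __init__(self, error_threshold_pct=50.0, min_requests=20, sleep_window_s=5.0,
                 clock=time.monotonic):
        self.error_threshold_pct = error_threshold_pct
        self.min_requests = min_requests      # avoid tripping on a handful of calls
        self.sleep_window_s = sleep_window_s  # how long to stay open before a trial
        self._clock = clock
        self._successes = 0
        self._failures = 0
        self._opened_at = None                # None means the circuit is closed
        self.forced_open = False              # manual trip switch

    def allow_request(self):
        if self.forced_open:
            return False
        if self._opened_at is None:
            return True
        # After the sleep window, let trial requests through (half-open).
        return self._clock() - self._opened_at >= self.sleep_window_s

    def call(self, fn, *args, **kwargs):
        if not self.allow_request():
            raise CircuitOpenError("circuit open; request short-circuited")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._record_failure()
            raise
        self._record_success()
        return result

    def _record_success(self):
        if self._opened_at is not None:
            # Trial request succeeded: close the circuit and start counting afresh.
            self._successes = self._failures = 0
            self._opened_at = None
        else:
            self._successes += 1

    def _record_failure(self):
        if self._opened_at is not None:
            self._opened_at = self._clock()   # trial failed: stay open another window
            return
        self._failures += 1
        total = self._successes + self._failures
        if total >= self.min_requests and 100.0 * self._failures / total >= self.error_threshold_pct:
            self._opened_at = self._clock()
```

While the circuit is open, callers fail immediately with `CircuitOpenError`, which a fallback can catch, instead of waiting on a service that is already struggling.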

Fallback logic

Perform fallback logic when a request fails, is rejected, times out, or is short-circuited.
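A minimal Python sketch of a fallback wrapper. `fetch_personalized_rows` and `default_rows` are hypothetical stand-ins; any exception from the primary path (an error, a bulkhead rejection, a timeout or an open circuit) leads to the fallback.

```python
import logging

log = logging.getLogger("fallback")

def with_fallback(primary, fallback):
    """Return primary(); if it raises for any reason, log it and return fallback()."""
    try:
        return primary()
    except Exception as exc:
        log.warning("primary failed (%s); serving fallback", exc)
        return fallback()

def fetch_personalized_rows(user_id):
    # Hypothetical dependency call that is currently failing.
    raise ConnectionError("recommendations service unavailable")

def default_rows():
    # Hypothetical static, non-personalized content.
    return ["Popular", "New releases"]

rows = with_fallback(lambda: fetch_personalized_rows(42), default_rows)
```

A fallback should be cheap and local (static defaults, cached data, or a stub); if it makes another risky network call, it needs the same protections as the primary.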

Measure

Measure successes, failures (exceptions thrown by the client), timeouts, and thread rejections.
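A Python sketch of per-dependency outcome counters that classify each call into the four outcomes listed above. `RejectedError` is a hypothetical exception for a full thread pool or semaphore; a production system would export these counts to a metrics backend in rolling windows.

```python
import threading
from collections import Counter

class RejectedError(Exception):
    """Hypothetical: raised when a dependency's thread pool or semaphore is full."""

class CommandMetrics:
    """Thread-safe outcome counters for one dependency."""
    OUTCOMES = ("success", "failure", "timeout", "rejected")

    def __init__(self):
        self._counts = Counter()
        self._lock = threading.Lock()

    def record(self, outcome):
        if outcome not in self.OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome}")
        with self._lock:
            self._counts[outcome] += 1

    def snapshot(self):
        with self._lock:
            counts = dict(self._counts)
        total = sum(counts.values())
        errors = total - counts.get("success", 0)
        return {**{o: counts.get(o, 0) for o in self.OUTCOMES},
                "total": total,
                "error_pct": 100.0 * errors / total if total else 0.0}

def measured(metrics, fn, *args):
    """Call fn(*args), record its outcome, and re-raise any exception."""
    try:
        result = fn(*args)
    except TimeoutError:
        metrics.record("timeout")
        raise
    except RejectedError:
        metrics.record("rejected")
        raise
    except Exception:
        metrics.record("failure")
        raise
    metrics.record("success")
    return result
```

The `error_pct` figure is the kind of signal a circuit breaker trips on, so measuring and breaking usually share the same counters.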

Request collapsing

Collapse multiple concurrent user requests into a single backend dependency call (within a short time window of, e.g., 10 ms).
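An asyncio sketch of a request collapser with a 10 ms window. It assumes the backend offers a batch lookup (`fetch_titles` here is hypothetical) that takes a list of keys and returns a dict keyed by them.

```python
import asyncio

class RequestCollapser:
    """Batches single-key lookups arriving within `window_s` into one call to
    `batch_fn(keys)`, which must return {key: value}."""

    def __init__(self, batch_fn, window_s=0.010):
        self._batch_fn = batch_fn
        self._window_s = window_s
        self._pending = {}        # key -> futures of callers waiting for that key
        self._flush_task = None

    async def get(self, key):
        fut = asyncio.get_running_loop().create_future()
        self._pending.setdefault(key, []).append(fut)
        if self._flush_task is None:
            # First request of a new window: schedule a single flush for the batch.
            self._flush_task = asyncio.create_task(self._flush_after_window())
        return await fut

    async def _flush_after_window(self):
        await asyncio.sleep(self._window_s)
        batch, self._pending, self._flush_task = self._pending, {}, None
        try:
            results, error = await self._batch_fn(list(batch)), None
        except Exception as exc:
            results, error = {}, exc
        for key, futures in batch.items():
            for fut in futures:
                if fut.done():            # the caller was cancelled meanwhile
                    continue
                if error is not None:
                    fut.set_exception(error)
                elif key in results:
                    fut.set_result(results[key])
                else:
                    fut.set_exception(KeyError(key))

backend_calls = []

async def fetch_titles(ids):             # hypothetical batch endpoint
    backend_calls.append(ids)
    return {i: f"title-{i}" for i in ids}

async def main():
    collapser = RequestCollapser(fetch_titles, window_s=0.010)
    return await asyncio.gather(*(collapser.get(i) for i in (1, 2, 3, 2)))

results = asyncio.run(main())
```

Four concurrent lookups, two of them for the same key, become one backend call for keys 1, 2 and 3. The trade-off is up to one window of added latency per request in exchange for far fewer backend calls under load.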

Request caching

Reduce the number of requests sent to backend dependencies by caching and de-duplicating requests.
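An asyncio sketch of a request-scoped cache, assuming one cache instance per incoming request. Identical lookups share a single task, so they are de-duplicated both while the call is in flight and afterwards. `load_customer` is a hypothetical backend call.

```python
import asyncio

class RequestCache:
    """Per-request cache: identical lookups share one backend call."""

    def __init__(self):
        self._entries = {}   # key -> asyncio.Task producing the value

    async def get(self, key, loader):
        task = self._entries.get(key)
        if task is None:
            # First caller for this key starts the load; later callers await the same task.
            task = asyncio.create_task(loader(key))
            self._entries[key] = task
        return await task

loads = []

async def load_customer(customer_id):    # hypothetical backend call
    loads.append(customer_id)
    await asyncio.sleep(0.01)
    return {"id": customer_id}

async def handle_request():
    cache = RequestCache()                # one cache per incoming request
    a, b = await asyncio.gather(cache.get(7, load_customer), cache.get(7, load_customer))
    c = await cache.get(7, load_customer)
    return a, b, c

a, b, c = asyncio.run(handle_request())
```

In this sketch a failed load also stays cached for the rest of the request; scoping the cache to one request keeps that harmless and avoids serving stale data across users.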

Define a pipeline and context

Many services share base functionality such as authentication. Defining a clear request pipeline and context optimizes shared logic and prevents repeated calls (e.g. getCustomer).
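A Python sketch of a request pipeline with a shared context. The token format, the stage names and `get_customer` (standing in for the slide's getCustomer) are hypothetical; the point is that shared stages run once and the customer is loaded once per request.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class RequestContext:
    """Per-request state shared by every stage of the pipeline."""
    token: str
    user_id: Optional[str] = None
    cache: dict = field(default_factory=dict)

    def customer(self, loader: Callable[[str], dict]) -> dict:
        # Load the customer at most once per request, however many stages ask for it.
        if "customer" not in self.cache:
            self.cache["customer"] = loader(self.user_id)
        return self.cache["customer"]

def authenticate(ctx: RequestContext) -> None:
    # Hypothetical token format "valid:<user id>"; a real check would verify a signature.
    if not ctx.token.startswith("valid:"):
        raise PermissionError("invalid token")
    ctx.user_id = ctx.token.split(":", 1)[1]

def run_pipeline(ctx: RequestContext, stages: List[Callable], handler: Callable):
    """Run the shared stages (authentication etc.) in order, then the service handler."""
    for stage in stages:
        stage(ctx)
    return handler(ctx)

calls = []

def get_customer(user_id):               # hypothetical backend call
    calls.append(user_id)
    return {"id": user_id, "country": "NL"}

def homescreen(ctx):
    name = ctx.customer(get_customer)["id"]
    country = ctx.customer(get_customer)["country"]   # served from the context
    return f"Hi {name}", country

result = run_pipeline(RequestContext(token="valid:alice"), [authenticate], homescreen)
```

Each service then contributes only its handler, while authentication and customer lookup live in one place.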

Don’t lock the bonnet

Make it possible to switch on logging and to direct certain traffic to a specific node.
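One way to keep the bonnet open is a set of runtime switches consulted on every request. This Python sketch is an assumption about how that could look: how the switches get updated (an admin endpoint or a configuration push) is not shown, and the user ids and node names are hypothetical.

```python
import logging
import threading
import zlib

class OperationalSwitches:
    """Runtime-changeable settings; changes apply to the next request, no restart."""

    def __init__(self):
        self._lock = threading.Lock()
        self._debug_users = set()    # users whose requests get verbose logging
        self._pinned_routes = {}     # user id -> node name

    def enable_debug(self, user_id):
        with self._lock:
            self._debug_users.add(user_id)

    def pin(self, user_id, node):
        with self._lock:
            self._pinned_routes[user_id] = node

    def log_level_for(self, user_id):
        with self._lock:
            return logging.DEBUG if user_id in self._debug_users else logging.INFO

    def route(self, user_id, default_nodes):
        with self._lock:
            pinned = self._pinned_routes.get(user_id)
        if pinned is not None:
            return pinned
        # Default: spread users across nodes with a stable hash of the user id.
        return default_nodes[zlib.crc32(user_id.encode()) % len(default_nodes)]
```

Pinning a problem user to a dedicated node with debug logging enabled lets operators reproduce an issue in production without raising log volume everywhere.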

REST vs Experience API

/users/<id>/ratings/title

/users/<id>/queues

/users/<id>/queues/instant

/users/<id>/recommendations

/catalog/titles/movie

/catalog/titles/series

/catalog/people

VS

Example: /phone/homescreen

User Interface Rendering

Data gathering, formatting and delivery
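Read as a split of responsibilities, the slide leaves user interface rendering to the device, while the experience endpoint gathers, formats and delivers the data. The asyncio sketch below illustrates that reading; the three fetch functions are hypothetical stand-ins for fine-grained resources like those on the previous slide.

```python
import asyncio

# Hypothetical stand-ins for fine-grained resources such as /users/<id>/queues.
async def get_queue(user_id):
    return ["t1", "t2", "t3"]

async def get_recommendations(user_id):
    return ["t4", "t5", "t6", "t7"]

async def get_titles(title_ids):
    return {t: {"id": t, "name": f"Title {t[1:]}", "synopsis": "long text"} for t in title_ids}

async def phone_homescreen(user_id, max_items=3):
    """Experience endpoint: one device-specific call replaces several REST round trips."""
    # Data gathering: independent calls run concurrently on the server side.
    queue, recs = await asyncio.gather(get_queue(user_id), get_recommendations(user_id))
    queue, recs = queue[:max_items], recs[:max_items]
    titles = await get_titles(queue + recs)

    # Formatting: return only the fields a small phone screen renders.
    def names(ids):
        return [titles[t]["name"] for t in ids]

    return {"rows": [{"label": "My List", "items": names(queue)},
                     {"label": "Recommended", "items": names(recs)}]}

screen = asyncio.run(phone_homescreen("u1"))
```

The phone makes one request and receives exactly the rows it draws, while the chatty, resource-oriented calls stay inside the data center.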

We are hiring! Contact me:

[email protected]

Thanks for listening!