jan-penninkhof
Resilient APIs
@JPENNINKHOF
Challenge
“If you add up all the smartphones and the tablets
and the digital televisions and the PCs... we see a
large opportunity of perhaps 3 billion to 4 billion
units per annum, but we see an embedded market
that’s maybe 30 billion to 40 billion units per
annum”
- ARM CEO Warren East
Problem definition
For example, for an application that depends on 30 services, each with 99.99% uptime, we get:
0.9999^30 ≈ 0.997, i.e. 99.7% uptime
0.3% of 1 million requests = 3,000 failures
2+ hours downtime/month even if all dependencies have excellent uptime.
Reality is generally worse.
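The arithmetic above can be checked with a short sketch. It assumes that the 30 dependencies fail independently and that a month has 30 days; the function name is made up for illustration.

```python
def compound_availability(per_service: float, n_services: int) -> float:
    """Availability of a request that needs all n independent services to be up."""
    return per_service ** n_services


availability = compound_availability(0.9999, 30)

# Expected failures per 1 million requests (the slide rounds this to 3,000).
failures_per_million = (1 - availability) * 1_000_000

# Expected downtime in hours, assuming a 30-day month.
downtime_hours_per_month = (1 - availability) * 30 * 24

print(f"availability: {availability:.2%}")
print(f"failures per million requests: {failures_per_million:.0f}")
print(f"downtime per month: {downtime_hours_per_month:.2f} hours")
```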
API vulnerability
API Fallbacks
Design principles
• Restrict any single dependency from using up all user threads.
• Shed load and fail fast instead of queueing.
• Provide fallbacks wherever feasible to protect users from failure.
• Use isolation techniques (such as bulkhead, swimlane and circuit breaker patterns) to limit impact of any one dependency.
• Optimize for time-to-discovery through near real-time metrics, monitoring and alerting.
• Optimize for time-to-recovery: propagate configuration changes with low latency and support dynamic property changes in virtually all aspects of Hystrix, so that operators can make real-time modifications with low-latency feedback loops.
• Protect against the entire dependency client execution, not just network traffic.
Use timeouts
Time out calls that take longer than a defined threshold. A default exists, but for most dependencies the timeout is custom-set via properties to slightly above the measured 99.5th-percentile latency of that dependency.
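A minimal sketch of both ideas: deriving a timeout from measured latencies and enforcing it on a call. This is not how Hystrix implements it; the function names, the 10% headroom and the pool size are assumptions chosen for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

_pool = ThreadPoolExecutor(max_workers=4)  # hypothetical shared worker pool


def timeout_from_latencies(samples_s, percentile=99.5, headroom=1.1):
    """Nearest-rank percentile of the samples, scaled by a small headroom factor."""
    ordered = sorted(samples_s)
    rank = math.ceil(round(percentile / 100 * len(ordered), 9))
    index = min(len(ordered) - 1, max(0, rank - 1))
    return ordered[index] * headroom


def call_with_timeout(fn, timeout_s, *args):
    """Run fn in a worker thread; raise TimeoutError if it takes longer than timeout_s."""
    future = _pool.submit(fn, *args)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        future.cancel()  # best effort: a call that is already running is not interrupted
        raise TimeoutError(f"call exceeded {timeout_s}s")
```

Note that the caller is released on time, but the worker thread stays busy until the slow call returns, which is one reason to pair timeouts with bounded pools.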
Bulkheads
Maintain a small thread pool (or semaphore) for each dependency; if it becomes full, commands are rejected immediately instead of being queued. A dependency with a clogged thread pool shouldn’t hinder access to other dependencies.
Circuit breakers
Trip a circuit breaker, automatically or manually, to stop all requests to a service for a period of time if its error percentage passes a threshold.
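A simplified sketch of such a breaker follows. It is an illustration under assumptions, not Hystrix's implementation: it counts over the whole closed period rather than a rolling window, it simply closes again after the open period instead of testing with a single trial request, and it is not thread-safe. All names and default values are hypothetical.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is short-circuited because the breaker is open."""


class CircuitBreaker:
    def __init__(self, error_threshold=0.5, min_calls=10, open_seconds=5.0,
                 clock=time.monotonic):
        self._error_threshold = error_threshold  # fraction of failed calls that trips
        self._min_calls = min_calls              # do not trip on very few samples
        self._open_seconds = open_seconds
        self._clock = clock
        self._successes = 0
        self._failures = 0
        self._opened_at = None                   # None means the circuit is closed

    def force_open(self):
        """Manual trip, e.g. by an operator."""
        self._opened_at = self._clock()

    def call(self, fn, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._open_seconds:
                raise CircuitOpenError("circuit is open; call short-circuited")
            # Open period is over: close the circuit and start with fresh counters.
            self._opened_at = None
            self._successes = self._failures = 0
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            self._maybe_trip()
            raise
        self._successes += 1
        return result

    def _maybe_trip(self):
        total = self._successes + self._failures
        if total >= self._min_calls and self._failures / total >= self._error_threshold:
            self._opened_at = self._clock()
```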
Fallback logic
Perform fallback logic when a request fails, is rejected, times out or is short-circuited.
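As a minimal sketch, a fallback can be a wrapper that catches any failure of the primary call, whatever its cause, and returns a degraded answer. The recommendation functions below are hypothetical stand-ins for a real dependency.

```python
def with_fallback(primary, fallback):
    """Return a function that tries primary and uses fallback on any exception."""
    def wrapped(*args, **kwargs):
        try:
            return primary(*args, **kwargs)
        except Exception:
            # Failure, rejection, timeout and short-circuit all surface as
            # exceptions in this sketch, so one handler covers them all.
            return fallback(*args, **kwargs)
    return wrapped


def personalized_recommendations(user_id):
    # Hypothetical dependency call that is currently failing.
    raise ConnectionError("recommendation service unavailable")


def popular_titles(user_id):
    # Hypothetical static fallback: not personalized, but better than an error.
    return ["title-1", "title-2", "title-3"]


get_recommendations = with_fallback(personalized_recommendations, popular_titles)
```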
Measure
Measure successes, failures (exceptions thrown by the client), timeouts and thread rejections.
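One way to sketch this is a counter per outcome, from which an error percentage can be derived. The class name and outcome labels are assumptions; a production setup would typically use rolling time windows and export the numbers to a monitoring system.

```python
from collections import Counter

OUTCOMES = ("success", "failure", "timeout", "rejected")


class CommandMetrics:
    def __init__(self):
        self.counts = Counter()

    def record(self, outcome):
        if outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome}")
        self.counts[outcome] += 1

    def error_percentage(self):
        """Share of calls that did not succeed, in percent."""
        total = sum(self.counts.values())
        if total == 0:
            return 0.0
        return 100.0 * (total - self.counts["success"]) / total
```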
Request collapsing
Collapse multiple concurrent user requests into a single backend dependency call (within a short time window of e.g. 10 ms).
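A sketch of the idea, assuming the backend offers a batch call that takes a list of keys and returns a dict of results. The first request of a window starts a timer; every request that arrives before the timer fires joins the same batch. The class name and the batch_fn contract are hypothetical.

```python
import threading


class RequestCollapser:
    """Batch all keys requested within window_s into one call to batch_fn."""

    def __init__(self, batch_fn, window_s=0.01):
        self._batch_fn = batch_fn  # assumed contract: list of keys -> dict key -> value
        self._window_s = window_s
        self._lock = threading.Lock()
        self._pending = []         # entries of (key, done_event, result_holder)

    def get(self, key):
        done = threading.Event()
        holder = {}
        with self._lock:
            self._pending.append((key, done, holder))
            if len(self._pending) == 1:
                # First request of a new window schedules the flush.
                threading.Timer(self._window_s, self._flush).start()
        done.wait()
        if "error" in holder:
            raise holder["error"]
        return holder["value"]

    def _flush(self):
        with self._lock:
            batch, self._pending = self._pending, []
        try:
            unique_keys = list(dict.fromkeys(key for key, _, _ in batch))
            results = self._batch_fn(unique_keys)
            for key, _, holder in batch:
                holder["value"] = results[key]
        except Exception as exc:
            for _, _, holder in batch:
                holder["error"] = exc
        finally:
            for _, done, _ in batch:
                done.set()
```

The trade-off is added latency of up to one window per request in exchange for fewer backend calls.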
Request caching
Reduce the number of requests sent to the backend dependencies by caching and de-duplicating requests.
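A minimal sketch of a cache scoped to one user request, so that repeated lookups of the same key within that request reach the backend only once. The class name and loader argument are hypothetical, and the sketch leaves out expiry and thread safety.

```python
class RequestCache:
    """Cache that lives for the duration of a single user request."""

    def __init__(self, loader):
        self._loader = loader  # function that performs the real backend call
        self._values = {}

    def get(self, key):
        if key not in self._values:
            self._values[key] = self._loader(key)
        return self._values[key]
```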
Define a pipeline and context
Many services share base functionality such as authentication. Defining a clear request pipeline and context optimizes shared logic and prevents repeated calls (e.g. getCustomer).
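A sketch of what such a pipeline might look like: each stage receives the same context object, and the context loads the customer at most once no matter how many stages ask for it. The stage names, the shape of the customer record and the loader are assumptions for illustration.

```python
class RequestContext:
    """Shared state for one request; loads the customer lazily and only once."""

    def __init__(self, customer_id, load_customer):
        self.customer_id = customer_id
        self._load_customer = load_customer
        self._customer = None
        self.response = {}

    def customer(self):
        if self._customer is None:
            self._customer = self._load_customer(self.customer_id)
        return self._customer


def authenticate(ctx):
    if not ctx.customer()["active"]:
        raise PermissionError("customer is not active")


def personalize(ctx):
    ctx.response["greeting"] = f"Hello {ctx.customer()['name']}"


def run_pipeline(ctx, stages):
    for stage in stages:
        stage(ctx)
    return ctx.response
```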
Don’t lock the bonnet
Make it possible to switch on logging and direct certain traffic to a specific node.
REST vs Experience API
/users/<id>/ratings/title
/users/<id>/queues
/users/<id>/queues/instant
/users/<id>/recommendations
/catalog/titles/movie
/catalog/titles/series
/catalog/people
VS
Example: /phone/homescreen
User Interface Rendering
Data gathering, formatting and delivery
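The contrast can be sketched as a single server-side handler for something like /phone/homescreen that gathers the resources the screen needs and returns them in one response, instead of the device calling each REST resource itself. The section names and fetcher functions are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def build_homescreen(user_id, fetchers):
    """Call every fetcher concurrently and combine the results into one payload.

    fetchers maps a section name to a function that takes the user id.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(fetchers))) as pool:
        futures = {name: pool.submit(fn, user_id) for name, fn in fetchers.items()}
        return {name: future.result() for name, future in futures.items()}


# Hypothetical fetchers standing in for the fine-grained REST resources.
homescreen_fetchers = {
    "queue": lambda user_id: ["title-7", "title-9"],
    "recommendations": lambda user_id: ["title-1", "title-2"],
}
```

The device then makes one request and receives data already shaped for its user interface.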