Taming the rising complexity of event-driven APIs
Useful insights for API publishers
My journey with event-driven APIs
Technical co-founder of Ably
Built and scaled event-driven APIs for developers
Board member, OpenAPI Initiative
Page 3 Ably - Serious realtime infrastructure
What is an event-driven API?
Delivery person Logistics provider
Retailer
Parcel arrived! (on device)
Customer
Ping
Pull (request/response) is stateless & simpler
Data source Worker Server Consumers
Event-driven is more demanding on producers
Worker hot spots
Additional complexity:
Backoffice scheduling & management
Short value window
Delivery logistics
Acme Fresh Food
Complexity inversion with event-driven APIs
Pull (REST) API
  Consumer: Trigger requests and maintain state (complexity)
  Producer: Stateless response (API Gateway / CDN)
Event-driven API
  Consumer: Wait for updates
  Producer: Fan out push, maintain state, and fault tolerant (complexity)
If complexity is evil - can we avoid it? Probably not.
Event-driven data exceeds all data produced in 2019 (circa 40 ZB)
Event-driven data demand is here, now
By 2021, cumulative event-driven data exceeds all previous event-driven data produced and consumed, ever
Criticality of this data
Compound Annual Growth Rate, by data criticality:
All data              30%
Potentially Critical  37%
Critical              39%
Hyper-Critical        54%
Source of new event-driven data
Terminology: The Realtime API family
Realtime APIs:
● Pub/sub - Messaging pattern
● Streaming - Consumer pattern
● Event-driven - Architectural pattern
● Push - Producer pattern
Publishing a realtime API
Three distinct functions when publishing a realtime API
Deploy
● Infrastructure
● Publishing
● Documentation / Spec
Distribute
● Authentication
● Pull & push protocols
● Onboarding + dev tooling
● Scaling & performance
Manage
● Access control
● Instrumentation
● Billing
● Monetization
Distribution layer complexity
Five commonly unforeseen but important areas of complexity in event-driven API delivery:
1. Integrity vs Latency
2. Throughput
3. Push vs Pull subscriptions
4. Downstream reliability
5. Durability
Distribution layer complexity #1 Integrity vs Latency
Integrity vs Latency
Happy Path
Your servers
message bus
API Interface
15 updates a minute
Integrity vs Latency: Congestion & connectivity issues
Backpressure Your servers
message bus
Less Happy Path
Solution #1 - Backpressure control
(Chart: backpressure control on a bandwidth-limited connection, time axis 0s to 80s)
Managing backpressure
Server-side
● TCP buffer size with high watermark (network layer)
● ACK from subscribing clients (application layer)
Client-side
● Stream polling
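The server-side ideas above (a high watermark plus application-level ACKs) can be sketched roughly as follows. This is an illustrative sketch only, not taken from the slides: the class name `SubscriberBuffer`, its methods and the watermark value are all hypothetical, and a real system would apply this per connection alongside TCP-level buffering.

```python
from collections import deque


class SubscriberBuffer:
    """Hypothetical per-subscriber outbound buffer with a high watermark and ACKs."""

    def __init__(self, high_watermark: int = 3):
        self.high_watermark = high_watermark
        self.pending = deque()   # queued, not yet sent
        self.unacked = {}        # sent and awaiting an ACK, keyed by message id
        self.paused = False      # True while the subscriber is over the watermark

    def offer(self, msg_id: int, payload: str) -> bool:
        """Queue a message; return False once the high watermark is reached."""
        if len(self.pending) + len(self.unacked) >= self.high_watermark:
            self.paused = True   # caller should slow down, conflate or drop
            return False
        self.pending.append((msg_id, payload))
        return True

    def send_next(self):
        """Move the oldest queued message to the in-flight set and return it."""
        if not self.pending:
            return None
        msg_id, payload = self.pending.popleft()
        self.unacked[msg_id] = payload
        return msg_id, payload

    def ack(self, msg_id: int) -> None:
        """The subscriber confirmed receipt, which frees capacity."""
        self.unacked.pop(msg_id, None)
        if len(self.pending) + len(self.unacked) < self.high_watermark:
            self.paused = False
```

What happens when `offer` returns False is the design decision: a latency-first stream might drop or conflate, while an integrity-first stream would have to push back on the producer or spill to storage.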
Integrity vs Latency: Conflation
Great for: pricing, scores, GPS locations
(Diagram: message stream on a time axis, 1s to 4s)
Simplified example of conflation on a stream
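The slide's illustration is not recoverable from the text, so here is one possible sketch of conflation, assuming fixed one-second windows and "latest value per key wins". The function name `conflate`, the tuple layout and the sample ticker are assumptions made for illustration.

```python
def conflate(updates, window_seconds=1.0):
    """Keep only the latest update per key within each time window.

    `updates` is an iterable of (timestamp, key, value) tuples sorted by timestamp.
    """
    conflated = []
    latest = {}            # key -> most recent update in the current window
    current_window = None
    for ts, key, value in updates:
        window = int(ts // window_seconds)
        if current_window is not None and window != current_window:
            conflated.extend(latest.values())   # flush the finished window
            latest = {}
        current_window = window
        latest[key] = (ts, key, value)          # newer update replaces older one
    conflated.extend(latest.values())           # flush the final window
    return conflated


ticks = [
    (0.1, "EURUSD", 1.10),
    (0.4, "EURUSD", 1.11),
    (0.9, "EURUSD", 1.12),
    (1.2, "EURUSD", 1.13),
    (1.8, "EURUSD", 1.14),
]
print(conflate(ticks))  # one update per key per one-second window
```

Five updates become two here. That suits prices, scores or GPS positions, where only the most recent value matters, and it is unsuitable wherever every message must be delivered.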
Considerations when latency takes priority
● Ordering of streams is typically not required
  Less complexity
● Backpressure management or conflation preferred
  Additional complexity
● Capacity planning critical
  Additional operational complexity
● Persistent subscriber transport preferred
  Improved latencies as reduced round-trip overhead
Integrity vs Latency: Integrity prioritized
Ordered message stream
Auditors & Legal
message bus
Considerations when integrity takes priority
● Ordering of streams is important
  Significantly increased complexity in a stateful design (serial numbers)
● Backpressure management still needed
  Buffers cannot build up indefinitely. Some increased complexity
● Persistent subscriber transport preferred
  Removes complexity in the subscriber as TCP can be relied upon for integrity
● Reliable publishing
  ACKs, persistent connections & idempotency
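To make the "serial numbers" point concrete, here is a minimal sketch of a subscriber that restores order and detects gaps. It assumes the producer stamps each message with a contiguous serial starting at 1; the class `OrderedReceiver` and its methods are hypothetical names, not part of any product API.

```python
class OrderedReceiver:
    """Delivers messages in serial order, holding back any that arrive early."""

    def __init__(self, first_serial: int = 1):
        self.next_serial = first_serial
        self.held = {}        # serial -> payload, waiting for a gap to fill
        self.delivered = []   # payloads released in order

    def receive(self, serial: int, payload: str) -> None:
        if serial < self.next_serial:
            return            # already delivered: ignore the duplicate
        self.held[serial] = payload
        # Release every message that is now contiguous with what was delivered.
        while self.next_serial in self.held:
            self.delivered.append(self.held.pop(self.next_serial))
            self.next_serial += 1

    def missing(self):
        """Serials that block delivery and would need to be re-requested."""
        if not self.held:
            return []
        return [s for s in range(self.next_serial, max(self.held))
                if s not in self.held]
```

The state held here (next serial, buffered messages) is the stateful-design cost the slide refers to, and the `held` buffer is itself subject to the backpressure concern in the second bullet.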
Publisher integrity with idempotent publishing
Post A
Cloud Service
Connection Disconnected
Response
No Response, Retry Post A
Response A
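The retry flow in this diagram can be sketched as below, assuming the publisher attaches a unique id to each message and the service de-duplicates on that id. `CloudService`, `publish_with_retry` and the `drop_first_response` switch (which simulates the lost response) are all hypothetical and only stand in for a real network call.

```python
import uuid


class CloudService:
    """Stand-in for the receiving service: de-duplicates on message id."""

    def __init__(self):
        self.seen_ids = set()
        self.stream = []

    def publish(self, message_id: str, payload: str) -> str:
        if message_id not in self.seen_ids:   # first time this id is seen
            self.seen_ids.add(message_id)
            self.stream.append(payload)
        return "ok"                           # same response for a retry


def publish_with_retry(service, payload, drop_first_response=False, max_attempts=3):
    """Publish with one id reused across retries, so retries are harmless."""
    message_id = str(uuid.uuid4())            # generated once, not per attempt
    for attempt in range(1, max_attempts + 1):
        response = service.publish(message_id, payload)
        if drop_first_response and attempt == 1:
            continue                          # simulate a disconnect before the response
        return response, attempt
    raise ConnectionError("no response after retries")
```

The important detail is that the id is created before the first attempt. Generating a new id per retry would defeat the de-duplication and publish "Post A" twice.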
Integrity vs Latency in summary
● Consumers decide on integrity vs latency
● Producers of data can choose both latency and integrity
  ○ To gain integrity, idempotency and persistent connections preferred
● Latency over integrity is generally simpler
● Backpressure always needs to be handled
Distribution layer complexity #2 Throughput
Throughput - narrow pipe
(Diagram labels: Multiple producers; Fat message bus; Single stream; Consumer; Max 1,000 msgs p/s per stream)
Throughput: #1 solved using queues
Advantages:
● Relatively simple
● Scales easily
Disadvantages:
● No ordering
● At-least-once delivery
● Potentially two design patterns in play - pub/sub and queueing
(Diagram labels: Multiple producers; Fat message bus; Queue; Consumer workers; 500 msgs p/s)
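A rough in-process sketch of the queue approach: several workers compete for messages from one queue, which spreads the load but gives up ordering across workers. It uses only the Python standard library; `run_workers` and the worker names are hypothetical. An in-memory queue like this hands out each message once, so the at-least-once behaviour of a real broker (redelivery after a worker fails before acknowledging) is not modelled.

```python
import queue
import threading


def run_workers(messages, worker_count=2):
    """Drain a shared queue with competing workers; return (worker, message) pairs."""
    work = queue.Queue()
    for message in messages:
        work.put(message)

    processed = []
    lock = threading.Lock()

    def worker(name):
        while True:
            try:
                message = work.get_nowait()
            except queue.Empty:
                return                      # queue drained, worker exits
            with lock:
                processed.append((name, message))

    threads = [threading.Thread(target=worker, args=(f"worker-{i}",))
               for i in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return processed
```

Every message is handled, but which worker handles it and in what relative order is not deterministic, which is the "no ordering" disadvantage listed above.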
Throughput: #2 solved with sharding
Advantages:
● Data stream integrity (ordering)
● Single pattern pub/sub
● Exactly-once delivery
Disadvantages:
● Sharding is complex
● Expanding or contracting shards is hard
● Consumer state may be needed
(Diagram labels: Multiple producers; Fat message bus; Throughput sharding; Shard consumers 1 to 4; Consumer group)
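One common way to shard while keeping per-key ordering is to hash a message key to a shard. The sketch below assumes that approach (the slides do not say which scheme is used); `shard_for`, `partition` and the parcel keys are hypothetical names chosen for illustration.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard with a hash that is stable across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def partition(messages, shard_count):
    """Split (key, payload) messages into shards, preserving arrival order per shard."""
    shards = [[] for _ in range(shard_count)]
    for key, payload in messages:
        shards[shard_for(key, shard_count)].append((key, payload))
    return shards


messages = [
    ("parcel-1", "picked up"),
    ("parcel-2", "picked up"),
    ("parcel-1", "in transit"),
    ("parcel-1", "delivered"),
]
shards = partition(messages, shard_count=4)
```

All updates for `parcel-1` land on the same shard in their original order, so one consumer per shard sees an ordered stream. With this simple modulo scheme, changing `shard_count` remaps most keys, which is one reason expanding or contracting shards is hard.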
Throughput in summary
● Everything has limits - distribution is your friend
  There are practical limits to what a single server, connection or shard can sustain.
● Keep complexity away from your consumers
  It's your job to keep it simple for consumers.
● Queues are simple
  Yet they lack ordering and exactly-once delivery guarantees
● Shards are hard
  Yet they provide ordering and exactly-once delivery guarantees
Distribution layer complexity #3 Push vs Pull Subscriptions
Push vs Pull subscriptions
AMQP
API Gateway
Push service
Client initiated
Server initiated
HTTP
WebSub
Your message bus
Examples of Push subscription protocols
Stateless over HTTP Queue based Stream based
Webhooks AMQP Kafka
WebSub AWS SQS AWS Kinesis
Serverless function invocation
MQTT
Integrations (Zapier etc)
When to use push vs pull subscriptions?
Push subscriptions
● High throughput
  Target can be load balanced.
● Reduced consumer complexity
  Creates complexity for the producer, and you need to address durability and downstream failures.
● Always online
  Unsuitable for devices that are not always online.
● Unintentional DoS risk
  Control rate of downstream requests.
Pull subscriptions
● On demand
  Generally better suited for devices such as mobiles, desktops, browsers where data is needed on demand.
● Simple
  Simple for consumers and producers.
● Low throughput per subscriber
  Not suitable for high throughput, without sharding or queueing.
● Capacity planning harder
  Unpredictable load.
Distribution layer complexity #4 Downstream reliability (for push subscriptions)
Downstream reliability - push subscriptions
Your message bus
Push service
Ordered stream
HTTP
Kafka
AMQP
Downstream reliability - coping with failure
Poison message 50x
Faulty connection
Your message bus
Push service
Ordered stream
HTTP
Kafka
AMQP
Dead letter queues
Dead letter queue
Failed messages
Your message bus Ordered stream
HTTP
Kafka
AMQP
Downstream reliability considerations
● Manage throughput to avoid unintentional DoS attacks
  Avoid unintentional floods of requests to downstream endpoints. Use rate limiting, and incremental back-offs.
● Push subscriptions may prefer latency over integrity
  Latency prioritized traffic may allow messages to be discarded when there is a failure.
● Dead letter queue poison / bad messages
  Don't delete data.
● Tooling and alerts
  Ensure downstream providers have necessary tooling and alerts to manage failures.
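Two of these considerations, incremental back-offs and dead lettering, can be combined in a small sketch. It assumes a `send` callable that raises on failure (for example on a 50x response); `deliver_with_dlq`, its parameters and the injectable `sleep` are hypothetical names for illustration, not a real push service API.

```python
import time


def deliver_with_dlq(messages, send, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Push each message downstream; after repeated failures, dead-letter it.

    Returns the dead letter queue as a list of (message, error) pairs.
    """
    dead_letters = []
    for message in messages:
        last_error = None
        for attempt in range(max_attempts):
            try:
                send(message)
                break                                   # delivered, move on
            except Exception as error:                  # e.g. 50x or faulty connection
                last_error = error
                if attempt < max_attempts - 1:
                    sleep(base_delay * (2 ** attempt))  # back off: 0.5s, 1s, 2s, ...
        else:
            # Every attempt failed: keep the data instead of deleting it.
            dead_letters.append((message, repr(last_error)))
    return dead_letters
```

Moving a poison message aside lets the rest of the stream continue, at the cost of a gap in ordering for that subscriber. A stream where integrity takes priority might instead halt and alert until the failure is resolved.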
Distribution layer complexity #5 Durability
Durability
Resume from #1 (-2 hours)
Persistent ordered index log stream
Resume from #6 (-1 minute)
10 9 8 7 6 5 4 3 2 1
Durability considerations
● Use a dedicated log stream storage solution
  Trying to get traditional databases to store log streams is hard. There are many solutions that provide efficient append-only stream storage.
● Complexity of adding storage is probably worth it
  Freedom to later improve reliability, reduce backpressure issues, better continuity of streams.
● Latency and financial cost
  The financial impact can be managed to some degree by modifying TTLs for storage.
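As an in-memory illustration of a persistent ordered log with resume and TTL, consider the sketch below. It is an assumption-laden toy, not a storage recommendation: `LogStream` and its methods are hypothetical, timestamps are passed in explicitly to keep the example deterministic, and a real deployment would use a dedicated append-only store as the first bullet suggests.

```python
class LogStream:
    """Append-only, serial-indexed log with resume and TTL-based expiry."""

    def __init__(self, ttl_seconds: float):
        self.ttl_seconds = ttl_seconds
        self.entries = []        # (serial, timestamp, payload), in append order
        self.next_serial = 1

    def append(self, payload: str, now: float) -> int:
        """Store a message and return the serial assigned to it."""
        serial = self.next_serial
        self.entries.append((serial, now, payload))
        self.next_serial += 1
        return serial

    def expire(self, now: float) -> None:
        """Drop entries older than the TTL; a shorter TTL means lower storage cost."""
        self.entries = [e for e in self.entries if now - e[1] <= self.ttl_seconds]

    def resume_from(self, serial: int):
        """Return (serial, payload) for every retained entry at or after `serial`."""
        return [(s, payload) for s, _, payload in self.entries if s >= serial]


log = LogStream(ttl_seconds=3600)
for i in range(10):
    log.append(f"event-{i + 1}", now=i * 60)   # one event per minute
```

A subscriber that was offline for a minute can call `resume_from(6)`, while one that was away longer than the TTL finds its older serials gone. That is the trade-off between continuity and storage cost described in the last bullet.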
Event-driven APIs do introduce complexity
But it’s not rocket science :)
What next?
#1 Give developers what they want - better event-driven integrations
Event-driven integration frustration up 2% this year
Source: State of API Integration, Cloud Elements
What next?
#2 Avoid complexity
● Focus on necessary complexity
● Delay complexity that is not needed today, but plan for it
● Open source and cloud solutions exist - don't reinvent the wheel
What next?
#3 Focus on developer experience
● Don't push complexity downstream to consumers
● Let consumers choose push or pull protocols, integrity or latency
● Documentation and developer portals are important
● An API is a contract and commitment to your consumers
What next?
#4 Help build an open realtime connected world
● Provide event-driven APIs
● Set your data free
Thank you
Taming the rising complexity of event-driven APIs
www.ably.io
Me: @mattheworiordan @ablyrealtime
Shameless plugs: go.ably.io/open-data-streams go.ably.io/api-management
Illustrations by Leonie Wharton - leoniewharton.com