A Head Start on Cloud-native Event Driven Applications
S. Suhothayan (Suho)
Director, WSO2 | @suhothayan
Increasing demand is causing disaggregation
Types of System Integration
Nature of Event Driven Applications
Applications that process data arriving via events and streams. Respond to actions generated by both users and systems. Messages are passed via event handlers or message queues, within the application or across applications.
Application relationships using queues and topics: ● One to One ● One to Many ● Many to One ● Many to Many
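These relationships can be sketched with a toy in-memory broker (illustrative only; `Broker`, `send`, and `publish` are made-up names, not a real messaging API — production systems use Kafka, NATS, JMS, etc.). Queues give competing consumers (one-to-one delivery per message), topics give fan-out (one-to-many):

```python
from collections import defaultdict
from itertools import cycle

class Broker:
    """Minimal in-memory broker sketching queue vs. topic semantics."""

    def __init__(self):
        self.queue_consumers = defaultdict(list)    # queue -> competing consumers
        self.topic_subscribers = defaultdict(list)  # topic -> all subscribers
        self._rr = {}                               # round-robin state per queue

    def consume(self, queue, handler):
        self.queue_consumers[queue].append(handler)
        self._rr[queue] = cycle(self.queue_consumers[queue])

    def subscribe(self, topic, handler):
        self.topic_subscribers[topic].append(handler)

    def send(self, queue, msg):
        # queue semantics: exactly one consumer handles each message
        next(self._rr[queue])(msg)

    def publish(self, topic, msg):
        # topic semantics: every subscriber handles each message
        for handler in self.topic_subscribers[topic]:
            handler(msg)

broker = Broker()
q_log, t_log = [], []
broker.consume("orders", lambda m: q_log.append(("c1", m)))
broker.consume("orders", lambda m: q_log.append(("c2", m)))
broker.subscribe("alerts", lambda m: t_log.append(("s1", m)))
broker.subscribe("alerts", lambda m: t_log.append(("s2", m)))
broker.send("orders", "o1")     # delivered to c1 only
broker.send("orders", "o2")     # delivered to c2 only
broker.publish("alerts", "a1")  # delivered to both s1 and s2
```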
Characteristics of Cloud Nativeness
Developed as microservices. Packaged in containers. Deployed via continuous delivery workflows. Managed on elastic container-based infrastructure. Follow agile DevOps processes.
Other Key Characteristics ● Designed as loosely coupled microservices ● Developed with best-of-breed languages and frameworks ● Architected with a clean separation of stateless and stateful services
Source: https://medium.com/walmartlabs/cloud-native-application-architecture-a84ddf378f82
Stateless Event Driven Applications
Processing done based on incoming event content, and if needed, by calling databases and other services.
● Can have multiple instances at the same time. ● Apps can be started and stopped whenever needed. ● Highly scalable!
Can run on Kubernetes with multiple replicas, or even in serverless frameworks.
Deployment of Stateless Apps
When microservices use HTTP/TCP transports
Deployment of Stateless Apps
When microservices use messaging systems (JMS/Kafka/NATS)
Not all apps are stateless!
Some Stateful Use Cases!
● Number of server errors over the last 20 min.
● Same credit card being used in Russia and then in the USA within 1 hour (potential fraud).
● Purchasing 3 diamond rings consecutively (high-value transactions) within one hour (potential fraud).
● Item delivered but payment not received within 15 min.
● Continuous stock price increase followed by the first drop.
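The first use case above — counting server errors over the last 20 minutes — can be sketched as a sliding time window (a minimal illustration; `SlidingCounter` is a made-up name, and a real stream processor manages time and eviction for you):

```python
from collections import deque

class SlidingCounter:
    """Counts events within a sliding time window (timestamps in seconds)."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()  # timestamps of observed events, in order

    def record(self, ts):
        self.events.append(ts)

    def count(self, now):
        # evict events that have aged out of the window before counting
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()
        return len(self.events)

errors = SlidingCounter(window_seconds=20 * 60)
for t in (0, 200, 1300):          # errors at t=0s, 200s, and 1300s
    errors.record(t)
# at t=1320s the error at t=0 has aged out of the 20-minute window
print(errors.count(now=1320))     # -> 2
```

The key point is that this counter is *state* that must survive restarts and be shared when scaling — exactly the problem the following slides address.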
Building Stateful Applications
Apps need to maintain state (remember previous events).
Things to remember: ● State should be preserved during system failure. ● State should be shared with new nodes when scaling.
Options: ● Use databases to store the state. ● Use a distributed cache to store and replicate state. ● Keep the data in-memory.
Using Databases to Store the State
Advantages ● Data can be shared with all the applications. ● Supports transactions. ● Support for rich data definition and querying languages.
Disadvantages ● Introduces high latency into the decision-making process. ● Not the best fit for implementing a state machine.
Using Distributed Cache to Share State
Advantages ● Not limited to the memory of the computing node. ● Fast reads. ● Data is preserved during node failures.
Disadvantages ● Only eventual consistency. ● Not the best for transactions. ● Limited querying support compared to databases.
Keeping Data In-Memory
Advantages ● Very fast read and write access. ● Best for implementing state-machine-like data structures.
Disadvantages ● Memory is limited to the computing node. ● Cannot scale. ● Data is not preserved during system failures.
Best Approach for Stateful Apps
(Comparison table: database, cache, and in-memory approaches rated on high performance, scalability, and fault tolerance — no single option scores on all three.)
There is no one tool for all cases!
Solution to High Performance, Scalable, and Fault Tolerant Apps
Use a combination of database, cache, and in-memory: ● Keep data in memory (most states are short-lived). ● Periodically snapshot state to the databases. ● Use a cache to preload data (to improve read performance).
Solve by following the MapReduce philosophy: 1. Map the incoming data. 2. Partition the data by key. 3. Process/summarize the data in parallel. 4. If needed, combine the data at the end.
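The four MapReduce-style steps above can be sketched in a few lines (a single-process stand-in; the event values are made up, and a stream processor would run step 3 in parallel across partitions):

```python
from collections import defaultdict

orders = [
    {"custId": "15", "item": "shirt", "price": 40.0},
    {"custId": "16", "item": "shoe",  "price": 80.0},
    {"custId": "15", "item": "hat",   "price": 15.0},
]

# 1. Map: extract a (key, value) pair from each incoming event.
mapped = [(o["custId"], o["price"]) for o in orders]

# 2. Partition by key: events with the same key go to the same worker.
partitions = defaultdict(list)
for key, value in mapped:
    partitions[key].append(value)

# 3. Process/summarize each partition (these could run in parallel).
totals = {key: sum(values) for key, values in partitions.items()}

# 4. Combine: here the per-partition results already form the final answer.
print(totals)  # -> {'15': 55.0, '16': 80.0}
```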
Deployment of Stateful Apps
When microservices use messaging systems (Kafka/NATS)
How Checkpointing Works?
The system snapshots state periodically and replays data from the source upon failure.
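The checkpoint-and-replay idea can be sketched as follows (illustrative; `run`, `CHECKPOINT_EVERY`, and the crash simulation are made-up, and real systems snapshot to durable storage and replay from a log such as Kafka):

```python
# The app periodically snapshots its state together with the source offset;
# after a crash it restores the latest snapshot and replays only the events
# that arrived after that offset.

source = [3, 1, 4, 1, 5, 9, 2, 6]   # stand-in for a replayable event log
CHECKPOINT_EVERY = 3

def run(events, state, offset, snapshots, crash_at=None):
    for i in range(offset, len(events)):
        if crash_at is not None and i == crash_at:
            return None                             # simulate a failure mid-stream
        state["sum"] += events[i]
        if (i + 1) % CHECKPOINT_EVERY == 0:
            snapshots.append((i + 1, dict(state)))  # (next offset, state copy)
    return state

snapshots = []
assert run(source, {"sum": 0}, 0, snapshots, crash_at=5) is None  # crash at event 5
offset, state = snapshots[-1]            # restore the latest checkpoint
recovered = run(source, dict(state), offset, snapshots)
print(recovered["sum"])                  # -> 31, same result as a failure-free run
```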
How to implement all this?
Streaming and Event Processing Frameworks
Use frameworks to build high-performance, scalable, and fault-tolerant event-driven applications.
Supported frameworks:
Characteristics Of Typical Big Data Frameworks
They are massive (need 5-6 nodes for a minimum deployment). They need a specialized team to manage. Developers need to go through an approval process to access the framework (center of excellence).
Issues: Reduced speed of innovation and productivity. Less autonomy for the team. High maintenance cost.
Choosing The Best Tool For A Microservices Environment
● Microservices are written, deployed, and managed by two-pizza-sized teams. ● Teams have full autonomy in the way their projects run. ● Focus on agile and fast development. ● Use the best tool for the task.
Each service can be written in the best language for its task; there is no need to force all services to be in Java, Python, or Go.
Introducing A Cloud Native Stream Processor
● Lightweight (low memory footprint and quick startup). ● 100% open source (no commercial features). ● Native support for Docker and Kubernetes. ● Supports agile DevOps workflows and full CI/CD pipelines. ● Allows event processing logic to be written in an SQL-like query language and via a graphical tool. ● Used by:
Working with Siddhi
● Develop apps using the Siddhi Editor. ● CI/CD with build integration and the Siddhi Test Framework. ● Running modes: ○ Embedded in Java/Python apps. ○ Microservice on bare metal/VM. ○ Microservice in Docker. ○ Microservice in Kubernetes (distributed deployment with NATS).
We are happy to announce Siddhi 5.1
With new features such as:
● Export to Docker and Kubernetes. ● New connectors such as gRPC (with Protobuf), S3, and Google Cloud Storage. ● Database caching. ● Support to remove duplicate events. ● Support for complex data transformations with List and Map extensions. ● Unified configuration model. ● Sandbox support to run tests. ● Improved error handling (log/wait/fault-stream).
Streaming SQL
@app:name('Alert-Processor')

@source(type='kafka', ..., @map(type='json'))
define stream TemperatureStream (roomNo string, temp double);

@sink(type='email', ..., @map(type='text'))
define stream AlertStream (roomNo string, avgTemp double);

@info(name='AlertQuery')
from TemperatureStream#window.time(5 min)
select roomNo, avg(temp) as avgTemp
group by roomNo
output first every 15 min
insert into AlertStream;
Source/Sink & Streams
Window Query with Rate Limiting
Web Based Graphical Editor
CI/CD Pipeline of Siddhi
Patterns For Event Driven Data Processing
1. Consume and publish events with various data formats.
2. Data filtering and preprocessing.
3. Data transformation.
4. Database integration and caching.
5. Service integration and error handling.
6. Rule processing.
7. Serving online and predefined ML models.
8. Data summarization.
9. Scatter-gather and data pipelining.
10. Realtime decisions as a service (on-demand processing).
Scenario: Order Processing
Customers place orders. Shipments are made. Customers pay for the order.

Tasks: ● Process order fulfillment. ● Send alerts on abnormal conditions. ● Send recommendations. ● Throttle order requests when the limit is exceeded. ● Provide order analytics over time.
Consume and Publish Events With Various Data Formats
Supported transports
● NATS, Kafka, RabbitMQ, JMS, Amazon SQS, MQTT, IBM MQ
● HTTP, gRPC, TCP, Email, WebSocket
● Change Data Capture (CDC)
● File, S3, Google Cloud Storage
Supported data formats
● JSON, XML, Avro, Protobuf, Text, Binary, Key-value, CSV
Consume and Publish Events With Various Data Formats
Default JSON mapping:

@source(type='mqtt', ..., @map(type='json'))
define stream OrderStream (custId string, item string, amount int);

Incoming event: {"event":{"custId":"15","item":"shirt","amount":2}}

Custom JSON mapping:

@source(type='mqtt', ..., @map(type='json',
    @attributes("$.id", "$.itm", "$.count")))
define stream OrderStream (custId string, item string, amount int);

Incoming event: {"id":"15","itm":"shirt","count":2}
Data Filtering and Preprocessing
Filtering ● Value ranges ● String matching ● Regex

Setting defaults ● Null checks ● Default function ● If-then-else function
define stream OrderStream (custId string, item string, amount int);
from OrderStream[item != "unknown"]
select default(custId, "internal") as custId, item,
       ifThenElse(amount < 0, 0, amount) as amount
insert into CleansedOrderStream;
Data Transformation
Data extraction ● JSON, Text
Reconstruct messages ● JSON, Text
Inline operations ● Math, Logical operations
Inbuilt function calls ● 60+ extensions
Custom function calls ● Java, JS
json:getDouble(json,"$.amount") as amount
str:concat('Hello ', name) as greeting
amount * price as cost
time:extract('DAY', datetime) as day
myFunction(item, price) as discount
Database Integration and Caching
Supported Databases: ● RDBMS (MySQL, Oracle, DB2, PostgreSQL, H2), Redis, Hazelcast ● MongoDB, HBase, Cassandra, Solr, Elasticsearch
Database Integration and Caching
Joining table with cache (preloads data for high read performance).
define stream CleansedOrderStream (custId string, item string, amount int);
@store(type='rdbms', ..., @cache(cache.policy='LRU', ...))
@primaryKey('name')
@index('unitPrice')
define table ItemPriceTable (name string, unitPrice double);
from CleansedOrderStream as O
join ItemPriceTable as T
    on O.item == T.name
select O.custId, O.item, O.amount * T.unitPrice as price
insert into EnrichedOrderStream;
Table with Cache
Join Query
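The effect of preloading with an LRU cache can be illustrated with a small read-through sketch (illustrative only; the dict stands in for ItemPriceTable, and `unit_price` and `db_reads` are made-up names, not Siddhi APIs):

```python
from functools import lru_cache

ITEM_PRICE_TABLE = {"shirt": 20.0, "hat": 7.5}  # stand-in for the database table
db_reads = []  # track which lookups actually reached the "database"

@lru_cache(maxsize=128)          # LRU eviction, like cache.policy='LRU'
def unit_price(name):
    db_reads.append(name)        # only cache misses land here
    return ITEM_PRICE_TABLE[name]

orders = [("15", "shirt", 2), ("16", "shirt", 1), ("15", "hat", 4)]
enriched = [(cust, item, qty * unit_price(item)) for cust, item, qty in orders]

print(enriched)   # -> [('15', 'shirt', 40.0), ('16', 'shirt', 20.0), ('15', 'hat', 30.0)]
print(db_reads)   # -> ['shirt', 'hat']  (the second 'shirt' lookup hit the cache)
```

This is why the cached join avoids the database latency discussed earlier: repeated reads of hot keys are served from memory.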
Service Integration and Error Handling
Enriching data with HTTP and gRPC service calls
● Non-blocking. ● Handle responses based on status codes. ● Handle error conditions: ○ Send events to an error stream ○ Log failed events ○ Block events till the endpoint becomes available
Service Integration and Error Handling
Handle responses based on status code (e.g., 200 success vs. 4xx error paths).
SQL for HTTP Service Integration
Calling external HTTP service and consuming the response.
@sink(type='http-call', publisher.url="http://mystore.com/discount",
      sink.id="discount", @map(type='json'))
define stream EnrichedOrderStream (custId string, item string, price double);

@source(type='http-call-response', http.status.code="200", sink.id="discount",
        @map(type='json', @attributes(custId="trp:custId", ..., price="$.discountedPrice")))
define stream DiscountedOrderStream (custId string, item string, price double);
Call service
Consume Response
Rule Processing
Types of predefined rules
● Rules on a single event ○ Filter, If-then-else, Match, etc.
● Rules on a collection of events ○ Summarization ○ Join with window or table
● Rules based on event occurrence order ○ Pattern detection ○ Trend (sequence) detection ○ Non-occurrence of events
Alert Based On Single Event
Use filter to identify abnormal events.
define stream DiscountedOrderStream (custId string, item string, price double);
from DiscountedOrderStream[price > 2000]
select custId, item, price, 'High Price Purchase' as reason
insert into AlertStream;
Filter Query
Alert Based On Collection Of Events
Use a stateful window query to aggregate orders over time for each customer, and raise alerts at most once every 5 minutes.
define stream DiscountedOrderStream (custId string, item string, price double);
from DiscountedOrderStream#window.time(30 min)
select custId, sum(price) as totalPrice
group by custId
having totalPrice > 5000
output first every 5 min
insert into AlertStream;
Window query with aggregation and rate limiting
Alert Based On Event Occurrence Order
Use a stateful pattern query to detect event occurrence order and non-occurrence.
define stream OrderStream (custId string, orderId string, ...);
define stream PaymentStream (orderId string, ...);
from every (e1=OrderStream) ->
     not PaymentStream[e1.orderId == orderId] within 15 min
select e1.custId, e1.orderId, ...
insert into PaymentDelayedStream;
Non-occurrence of an event
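The non-occurrence check above can be sketched imperatively (illustrative; `delayed_payments` is a made-up helper, and a stream processor evaluates this incrementally per event rather than in batch):

```python
TIMEOUT = 15 * 60  # seconds: payment must arrive within 15 min of the order

def delayed_payments(order_events, payment_events, now):
    """order_events / payment_events: (orderId, timestamp) pairs.
    Returns the orderIds whose payment did not occur before the deadline."""
    paid_at = dict(payment_events)
    delayed = []
    for order_id, ordered_at in order_events:
        deadline = ordered_at + TIMEOUT
        paid = paid_at.get(order_id)
        # non-occurrence: the deadline has passed with no payment in time
        if deadline <= now and (paid is None or paid > deadline):
            delayed.append(order_id)
    return delayed

orders = [("o1", 0), ("o2", 100), ("o3", 2000)]
payments = [("o1", 500), ("o2", 1600)]  # o2 paid 600s after its 1000s deadline
# o3's deadline (2900s) has not passed yet at now=2500, so only o2 is flagged
print(delayed_payments(orders, payments, now=2500))  # -> ['o2']
```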
Serving Online and Predefined ML Models
Types of machine learning and artificial intelligence processing

● Anomaly detection ○ Markov model
● Serving pre-created ML models ○ PMML (built from Python, R, Spark, H2O.ai, etc.) ○ TensorFlow
● Online machine learning ○ Clustering ○ Classification ○ Regression

from OrderStream#pmml:predict("/home/user/ml.model", custId, itemId)
insert into RecommendationStream;
Find recommendations
Data Summarization
Types of data summarization
● Time based ○ Sliding time window ○ Tumbling time window ○ On time granularities (secs to years)
● Event count based ○ Sliding length window ○ Tumbling length window
● Session based
● Frequency based

Types of aggregations ● Sum ● Count ● Avg ● Min ● Max ● DistinctCount ● StdDev
Aggregation Over Multiple Time Granularities
Aggregation on every second, minute, hour, ..., year. Built using the lambda (λ) architecture: ● In-memory real-time data ● RDBMS-based historical data
define aggregation OrderAggregation
from OrderStream
select custId, itemId, sum(price) as total, avg(price) as avgPrice
group by custId, itemId
aggregate every sec ... year;
Query
Speed Layer & Serving Layer
Batch Layer
Data Retrieval from Aggregations
Query for relevant time interval and granularity.
Data is retrieved from both memory and the database with millisecond accuracy.
from OrderAggregation
within "2019-10-06 00:00:00", "2019-10-30 00:00:00"
per "days"
select total as orders;
Scatter-gather and Data Pipelining
Divide into sub-elements, process each, and combine the results.

Examples:
json:tokenize() -> process -> window.batch() -> json:group()
str:tokenize() -> process -> window.batch() -> str:groupConcat()

{x,x,x} -> {x},{x},{x} -> process -> {y},{y},{y} -> {y,y,y}
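The tokenize -> process -> group pipeline can be sketched as a toy stand-in for json:tokenize()/str:groupConcat() (illustrative; `scatter_gather` is a made-up name):

```python
def scatter_gather(message, process):
    """Scatter a composite message, process each element, gather the results."""
    parts = message.split(",")                 # scatter: {x,x,x} -> {x},{x},{x}
    processed = [process(p) for p in parts]    # process each sub-element
    return ",".join(processed)                 # gather:  {y},{y},{y} -> {y,y,y}

result = scatter_gather("shirt,hat,shoe", lambda item: item.upper())
print(result)  # -> SHIRT,HAT,SHOE
```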
● Create a Siddhi App per use case (a collection of queries). ● Connect multiple Siddhi Apps using in-memory sources and sinks. ● Allow rule addition and deletion at runtime.
Modularization

Siddhi Runtime: ● Siddhi App for data capture and preprocessing ● Siddhi Apps for each use case ● Siddhi App for common data publishing logic
Realtime Decisions As A Service
Query data stores using REST APIs ● Database-backed stores (RDBMS, NoSQL) ● Named aggregations ● In-memory windows & tables

Call HTTP and gRPC services using REST APIs ● Use Service and Service-Response loopbacks ● Process the Siddhi query chain and send the response synchronously
Kubernetes Deployment
Siddhi Custom Resource Definition

apiVersion: siddhi.io/v1alpha2
kind: SiddhiProcess
metadata:
  name: <name>            # Siddhi process name
spec:
  # Siddhi Apps specified as inline text or ConfigMaps
  apps:
    - script: |
        <siddhi app>
  container:
    # necessary environment variables
    env:
      - name: <key>
        value: <value>
    # Siddhi Runner image with extensions
    image: "siddhiio/siddhi-runner-ubuntu:5.1.0"
  # distributed processing configs with NATS
  messagingSystem:
    type: nats
  persistentVolumeClaim: <PVC config>
  # periodic incremental persistence, and other Siddhi Runner configs
  runner: |
    state.persistence:
      enabled: true
Kubernetes Deployment
$ kubectl get siddhi
NAME         STATUS    READY   AGE
sample-app   Running   2/2     5m
What We Looked At
● Characteristics of cloud native apps. ● Problems in implementing event-driven stateful applications. ● Siddhi: a cloud native stream processor. ● Patterns for implementing event-driven applications. ● Deploying on Kubernetes with Siddhi and NATS.
For more info visit https://siddhi.io
Thank You