A Head Start on Cloud-native Event Driven Applications - Big Data Days 2019


A Head Start on Cloud-native Event Driven Applications

S. Suhothayan (Suho)

Director, WSO2 (@suhothayan)

Increasing demand is causing disaggregation

Types of System Integration


Notion of Event Driven Applications

Applications that process data arriving via events and streams:
● Respond to actions generated by both users and systems.
● Messages are passed via event handlers or message queues, within the application or across applications.

Application relationships using queues and topics:
● One to One
● One to Many
● Many to One
● Many to Many

Characteristics of Cloud Nativeness

● Developed as microservices.
● Packaged in containers.
● Deployed via continuous delivery workflows.
● Managed on elastic container-based infrastructure.
● Follow agile DevOps processes.

Other Key Characteristics
● Designed as loosely coupled microservices
● Developed with best-of-breed languages and frameworks
● Architected with a clean separation of stateless and stateful services

Source: https://medium.com/walmartlabs/cloud-native-application-architecture-a84ddf378f82

Stateless Event Driven Applications

Processing done based on incoming event content, and if needed, by calling databases and other services.

● Can have multiple instances at the same time.
● Apps can be started and stopped whenever needed.
● Highly scalable!

Can run on Kubernetes with multiple replicas, or even in serverless frameworks.

Deployment of Stateless Apps

When microservices use HTTP/TCP transports

Deployment of Stateless Apps

When microservices use messaging systems (JMS/Kafka/NATS)


Not all apps are stateless!

Some Stateful Use Cases!

● Number of server errors over the last 20 min.
● Same credit card being used in Russia and then in the USA within 1 hour. (Potential fraud)
● Purchasing 3 diamond rings consecutively (high-value transactions) within one hour. (Potential fraud)
● Item delivered but payment not received within 15 min.
● Continuous stock price increase followed by the first drop.

Building Stateful Applications

Apps need to maintain state (remember previous events).

Things to remember:
● State should be preserved during system failure.
● State should be shared with new nodes when scaled.

Options:
● Use databases to store the state.
● Use a distributed cache to store and replicate state.
● Keep the data in-memory.

Using Databases to Store the State

Advantages
● Data can be shared with all the applications.
● Supports transactions.
● Support for rich data definition and querying languages.

Disadvantages
● Introduces high latency into the decision-making process.
● Not the best for implementing a state machine.


Using Distributed Cache to Share State

Advantages
● Not limited to the memory of the computing node.
● Fast reads.
● Data is preserved during node failures.

Disadvantages
● Only eventual consistency.
● Not the best for transactions.
● Limited querying support compared to databases.


Keeping Data In-Memory

Advantages
● Very fast read and write access.
● Best for implementing state-machine-like data structures.

Disadvantages
● Memory is limited to the computing node.
● Cannot scale.
● Data is not preserved during system failures.


Best Approach for Stateful Apps

Comparing database, cache, and in-memory storage on:
● High Performance
● Scalability
● Fault Tolerance


There is no one tool for all cases!

Solution to High Performance, Scalable, and Fault Tolerant Apps

Use a combination of database, cache, and in-memory:
● Keep data in memory (most states are short-lived).
● Periodically snapshot state to the databases.
● Use a cache to preload data (to improve read performance).

Solve by following the map-reduce philosophy:
1. Map the incoming data.
2. Partition data by key.
3. Process/summarize the data in parallel.
4. If needed, combine the data at the end.
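The four steps can be sketched in plain Python (a conceptual illustration only, not Siddhi's internals; the event shape and the `summarize` function are invented for the sketch):

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def summarize(events):
    # Summarize one partition: here, total amount per key.
    return sum(e["amount"] for e in events)

def map_reduce(events, key):
    # 1 & 2: map incoming events and partition them by key.
    partitions = defaultdict(list)
    for e in events:
        partitions[e[key]].append(e)
    # 3: process/summarize each partition in parallel.
    with ThreadPoolExecutor() as pool:
        results = dict(zip(partitions, pool.map(summarize, partitions.values())))
    # 4: the combined per-key results.
    return results

orders = [{"custId": "a", "amount": 10}, {"custId": "b", "amount": 5},
          {"custId": "a", "amount": 7}]
print(map_reduce(orders, "custId"))  # {'a': 17, 'b': 5}
```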

When microservices use messaging systems (Kafka/NATS)

Deployment of Stateful Apps

System snapshots periodically, replay data from source upon failure.

How Checkpointing Works?
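The snapshot-and-replay idea can be illustrated with a toy Python processor (a sketch of the general technique, not Siddhi's checkpointing implementation; the counter state and snapshot interval are invented):

```python
import copy

class CheckpointingProcessor:
    """Toy event counter that snapshots state every `interval` events and
    recovers by replaying from the source after the last snapshot."""
    def __init__(self, interval=3):
        self.interval = interval
        self.state = {"count": 0}
        self.snapshot = (0, {"count": 0})  # (source offset, state)

    def process(self, source, upto):
        # Resume from the last snapshot and replay events after it.
        offset, state = self.snapshot
        self.state = copy.deepcopy(state)
        for i in range(offset, upto):
            self.state["count"] += source[i]
            if (i + 1) % self.interval == 0:
                # Periodic snapshot: offset + deep copy of state.
                self.snapshot = (i + 1, copy.deepcopy(self.state))
        return self.state["count"]

src = [1, 1, 1, 1, 1]
p = CheckpointingProcessor()
p.process(src, 4)              # snapshot taken at offset 3
p.state = None                 # simulate a crash losing in-memory state
assert p.process(src, 5) == 5  # replays only events after offset 3
```

Because only events after the snapshot offset are replayed, recovery cost stays bounded by the snapshot interval.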



How to implement all this?

Streaming and Event Processing Frameworks

Use frameworks to build high performance, scalable, and fault tolerant event driven applications.

Supported frameworks:

Characteristics Of Typical Big Data Frameworks

● They are massive (need 5-6 nodes for a minimum deployment).
● They need a specialized team to manage.
● Developers need to go through an approval process to access the framework (center of excellence).

Issues:
● Reduced speed of innovation and productivity.
● Less autonomy for the team.
● High maintenance cost.

Choosing The Best Tool For A Microservices Environment?

● Microservices are written, deployed, and managed by two-pizza-size teams.
● Teams have full autonomy in the way their projects run.
● Focus on agile and fast development.
● Use the best tool for the task.

Each service can be written in the best language for its task; there is no need to force all services to be in Java, Python, or Go.

Introducing A Cloud Native Stream Processor

● Lightweight (low memory footprint and quick startup).
● 100% open source (no commercial features).
● Native support for Docker and Kubernetes.
● Supports agile DevOps workflows and full CI/CD pipelines.
● Allows event processing logic to be written in an SQL-like query language and via a graphical tool.
● Used by:


Working with Siddhi

● Develop apps using Siddhi Editor.
● CI/CD with build integration and the Siddhi Test Framework.
● Running modes:
  ○ Embedded in Java/Python apps.
  ○ Microservice on bare metal/VM.
  ○ Microservice in Docker.
  ○ Microservice in Kubernetes (distributed deployment with NATS).

We are happy to announce Siddhi 5.1

With new features such as:

● Export to Docker and Kubernetes.
● New connectors such as gRPC (with Protobuf), S3, and Google Cloud Storage.
● Database caching.
● Support to remove duplicate events.
● Support for complex data transformations with List and Map extensions.
● Unified configuration model.
● Sandbox support to run tests.
● Improved error handling (log/wait/fault-stream).

Streaming SQL

@app:name('Alert-Processor')

@source(type='kafka', ..., @map(type='json'))
define stream TemperatureStream(roomNo string, temp double);

@sink(type='email', ..., @map(type='text'))
define stream AlertStream(roomNo string, avgTemp double);

@info(name='AlertQuery')
from TemperatureStream#window.time(5 min)
select roomNo, avg(temp) as avgTemp
group by roomNo
output first every 15 min
insert into AlertStream;

Source/Sink & Streams

Window Query with Rate Limiting

Web Based Graphical Editor


CI/CD Pipeline of Siddhi


Patterns For Event Driven Data Processing


1. Consume and publish events with various data formats.
2. Data filtering and preprocessing.
3. Data transformation.
4. Database integration and caching.
5. Service integration and error handling.
6. Rule processing.
7. Serving online and predefined ML models.
8. Data summarization.
9. Scatter-gather and data pipelining.
10. Realtime decisions as a service (on-demand processing).


Scenario: Order Processing

Customers place orders. Shipments are made. Customers pay for the order.

Tasks:
● Process order fulfillment.
● Send alerts on abnormal conditions.
● Send recommendations.
● Throttle order requests when the limit is exceeded.
● Provide order analytics over time.


Consume and Publish Events With Various Data Formats


Supported transports

● NATS, Kafka, RabbitMQ, JMS, Amazon SQS, MQTT, IBM MQ
● HTTP, gRPC, TCP, Email, WebSocket
● Change Data Capture (CDC)
● File, S3, Google Cloud Storage

Supported data formats

● JSON, XML, Avro, Protobuf, Text, Binary, Key-value, CSV

Consume and Publish Events With Various Data Formats

Default JSON mapping

Custom JSON mapping

@source(type='mqtt', …, @map(type='json'))
define stream OrderStream(custId string, item string, amount int);

@source(type='mqtt', …, @map(type='json',
  @attributes("$.id", "$.itm", "$.count")))
define stream OrderStream(custId string, item string, amount int);

{"event":{"custId":"15","item":"shirt","amount":2}}

{"id":"15","itm":"shirt","count":2}


Data Filtering and Preprocessing


Filtering
● Value ranges
● String matching
● Regex

Setting Defaults
● Null checks
● Default function
● If-then-else function

define stream OrderStream (custId string, item string, amount int);

from OrderStream[item != 'unknown']
select default(custId, 'internal') as custId,
       item,
       ifThenElse(amount < 0, 0, amount) as amount
insert into CleansedOrderStream;
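The same cleansing logic can be mirrored in plain Python to make the semantics concrete (a conceptual sketch; the dict event shape is an assumption, not Siddhi's representation):

```python
def cleanse(order):
    """Drop 'unknown' items, default a missing custId to 'internal',
    and clamp negative amounts to 0."""
    if order["item"] == "unknown":
        return None  # filtered out
    return {
        "custId": order.get("custId") or "internal",  # default()
        "item": order["item"],
        "amount": max(order["amount"], 0),            # ifThenElse()
    }

orders = [{"custId": None, "item": "shirt", "amount": -2},
          {"custId": "15", "item": "unknown", "amount": 1}]
cleansed = [o for o in (cleanse(x) for x in orders) if o]
# → [{'custId': 'internal', 'item': 'shirt', 'amount': 0}]
```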


Data Transformation


Data extraction (JSON, Text):
  json:getDouble(json, "$.amount") as amount

Reconstruct messages (JSON, Text):
  str:concat('Hello ', name) as greeting

Inline operations (math, logical operations):
  amount * price as cost

Inbuilt function calls (60+ extensions):
  time:extract('DAY', datetime) as day

Custom function calls (Java, JS):
  myFunction(item, price) as discount


Database Integration and Caching


Supported databases:
● RDBMS (MySQL, Oracle, DB2, PostgreSQL, H2), Redis, Hazelcast
● MongoDB, HBase, Cassandra, Solr, Elasticsearch

Database Integration and Caching

Joining table with cache (preloads data for high read performance).

define stream CleansedOrderStream (custId string, item string, amount int);

Table with cache:
@store(type='rdbms', …, @cache(cache.policy='LRU', … ))
@primaryKey('name')
@index('unitPrice')
define table ItemPriceTable(name string, unitPrice double);

Join query:
from CleansedOrderStream as O join ItemPriceTable as T
  on O.item == T.name
select O.custId, O.item, O.amount * T.unitPrice as price
insert into EnrichedOrderStream;
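The cached-table join pattern reduces per-event database reads. A minimal Python analogue (an illustration only; `PRICE_DB` and `unit_price` are invented stand-ins for ItemPriceTable, and `lru_cache` stands in for Siddhi's @cache annotation):

```python
from functools import lru_cache

# Hypothetical price table standing in for a real RDBMS-backed store.
PRICE_DB = {"shirt": 20.0, "ring": 1500.0}

@lru_cache(maxsize=128)
def unit_price(item):
    # First call hits the "database"; repeats are served from cache.
    return PRICE_DB[item]

def enrich(order):
    # Join the order event with the cached price table.
    return {**order, "price": order["amount"] * unit_price(order["item"])}

print(enrich({"custId": "15", "item": "shirt", "amount": 2}))
# → {'custId': '15', 'item': 'shirt', 'amount': 2, 'price': 40.0}
```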


Service Integration and Error Handling

Enriching data with HTTP and gRPC service Calls

● Non blocking ● Handle response based on status

codes

● Handle error conditions○ Send events to Error Stream○ Log failed events○ Block events till endpoint

become available

Service Integration and Error Handling

Handle responses based on status code (e.g., 200 vs 4xx).

SQL for HTTP Service Integration

Calling external HTTP service and consuming the response.

@sink(type='http-call', publisher.url="http://mystore.com/discount",
      sink.id="discount", @map(type='json'))
define stream EnrichedOrderStream (custId string, item string, price double);

@source(type='http-call-response', http.status.code="200",
        sink.id="discount",
        @map(type='json', @attributes(custId="trp:custId", ..., price="$.discountedPrice")))
define stream DiscountedOrderStream (custId string, item string, price double);

Call service

Consume Response


Rule Processing


Types of predefined rules:
● Rules on a single event
  ○ Filter, if-then-else, match, etc.
● Rules on a collection of events
  ○ Summarization
  ○ Join with window or table
● Rules based on event occurrence order
  ○ Pattern detection
  ○ Trend (sequence) detection
  ○ Non-occurrence of events

Alert Based On Single Event

Use filter to identify abnormal events.

define stream DiscountedOrderStream (custId string, item string, price double);

from DiscountedOrderStream[price > 2000]
select custId, item, price, 'High Price Purchase' as reason
insert into AlertStream;

Filter Query

Alert Based On Collection Of Events

Use a stateful window query to aggregate orders over time for each customer, and alert on matching conditions at most once every 5 minutes.

define stream DiscountedOrderStream (custId string, item string, price double);

from DiscountedOrderStream#window.time(30 min)
select custId, sum(price) as totalPrice
group by custId
having totalPrice > 5000
output first every 5 min
insert into AlertStream;

Window query with aggregation and rate limiting
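A sliding time window with per-key aggregation can be sketched in plain Python to show what the engine maintains internally (a conceptual model only; rate limiting is omitted and the class name is invented):

```python
from collections import deque

class WindowAlert:
    """Sliding 30-min sum of prices per customer; alert when total > 5000."""
    def __init__(self, window_sec=1800, threshold=5000):
        self.window = window_sec
        self.threshold = threshold
        self.events = deque()  # (timestamp, custId, price)
        self.totals = {}       # running sum per customer

    def on_event(self, ts, cust, price):
        self.events.append((ts, cust, price))
        self.totals[cust] = self.totals.get(cust, 0) + price
        # Expire events that fell out of the sliding window.
        while self.events and self.events[0][0] <= ts - self.window:
            _, old_cust, old_price = self.events.popleft()
            self.totals[old_cust] -= old_price
        if self.totals[cust] > self.threshold:
            return (cust, self.totals[cust])  # alert
        return None

w = WindowAlert()
assert w.on_event(0, "c1", 3000) is None
assert w.on_event(60, "c1", 2500) == ("c1", 5500)
assert w.on_event(2000, "c1", 100) is None  # earlier orders expired
```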

Alert Based On Event Occurrence Order

Use a stateful pattern query to detect event occurrence order and non-occurrence.

define stream OrderStream (custId string, orderId string, ...);
define stream PaymentStream (orderId string, ...);

from every (e1=OrderStream) ->
     not PaymentStream[e1.orderId == orderId] within 15 min
select e1.custId, e1.orderId, ...
insert into PaymentDelayedStream;

Non-occurrence of event
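Detecting non-occurrence amounts to tracking pending orders and flagging those whose payment never arrived in time. A toy Python version (an illustration of the idea, not Siddhi's pattern engine; class and method names are invented):

```python
class PaymentWatcher:
    """Flag orders with no matching payment within the timeout."""
    def __init__(self, timeout_sec=900):  # 15 min
        self.timeout = timeout_sec
        self.pending = {}  # orderId -> order timestamp

    def on_order(self, ts, order_id):
        self.pending[order_id] = ts

    def on_payment(self, ts, order_id):
        # A matching payment cancels the pending pattern.
        self.pending.pop(order_id, None)

    def check(self, now):
        # Orders whose payment never arrived within the timeout.
        late = [oid for oid, ts in self.pending.items()
                if now - ts > self.timeout]
        for oid in late:
            del self.pending[oid]
        return late

w = PaymentWatcher()
w.on_order(0, "o1")
w.on_order(10, "o2")
w.on_payment(300, "o1")
assert w.check(1000) == ["o2"]  # o1 was paid in time; o2 was not
```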


Serving Online and Predefined ML Models


Types of Machine Learning and Artificial Intelligence processing:
● Anomaly detection
  ○ Markov models
● Serving pre-created ML models
  ○ PMML (built from Python, R, Spark, H2O.ai, etc.)
  ○ TensorFlow
● Online machine learning
  ○ Clustering
  ○ Classification
  ○ Regression

Find recommendations:
from OrderStream
#pmml:predict("/home/user/ml.model", custId, itemId)
insert into RecommendationStream;


Data Summarization


Types of data summarization:
● Time based
  ○ Sliding time window
  ○ Tumbling time window
  ○ On time granularities (secs to years)
● Event count based
  ○ Sliding length window
  ○ Tumbling length window
● Session based
● Frequency based

Types of aggregations:
● Sum ● Count ● Avg ● Min ● Max ● DistinctCount ● StdDev

Aggregation Over Multiple Time Granularities

Aggregations on every second, minute, hour, …, year.
Built using the lambda (λ) architecture:
● In-memory real-time data
● RDBMS-based historical data

define aggregation OrderAggregation
from OrderStream
select custId, itemId, sum(price) as total, avg(price) as avgPrice
group by custId, itemId
aggregate every sec ... year;

Query

Speed Layer & Serving Layer

Batch Layer
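The core of multi-granularity aggregation is updating one bucket per granularity on each event. A small Python sketch (conceptual only; persistence of older buckets to an RDBMS, as in the batch layer above, is omitted, and the granularity table is truncated for brevity):

```python
from collections import defaultdict

GRANULARITIES = {"sec": 1, "min": 60, "hour": 3600}  # truncated; real list goes to years

class MultiGranularityAgg:
    """Incrementally maintain sums per time granularity."""
    def __init__(self):
        self.buckets = {g: defaultdict(float) for g in GRANULARITIES}

    def on_event(self, ts, price):
        # Each event updates one bucket per granularity.
        for g, size in GRANULARITIES.items():
            self.buckets[g][ts // size * size] += price

    def query(self, granularity, start, end):
        # Sum the buckets of the requested granularity in [start, end).
        return sum(v for t, v in self.buckets[granularity].items()
                   if start <= t < end)

agg = MultiGranularityAgg()
agg.on_event(0, 10.0)
agg.on_event(61, 5.0)
assert agg.query("min", 0, 120) == 15.0
assert agg.query("min", 60, 120) == 5.0
```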

Data Retrieval from Aggregations

Query for relevant time interval and granularity.

Data is retrieved from both memory and the DB with millisecond accuracy.

from OrderAggregation
within "2019-10-06 00:00:00", "2019-10-30 00:00:00"
per "days"
select total as orders;


Scatter-gather and Data Pipelining


Divide a message into sub-elements, process each, and combine the results.

Example:
  json:tokenize() -> process -> window.batch() -> json:group()
  str:tokenize() -> process -> window.batch() -> str:groupConcat()

  {x,x,x} -> {x},{x},{x} -> {y},{y},{y} -> {y,y,y}
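The tokenize-process-group pipeline can be sketched in a few lines of Python (a conceptual analogue of json:tokenize()/json:group(), not the Siddhi extensions themselves; the function names are invented):

```python
import json

def scatter_gather(payload, process):
    """Tokenize a JSON array, process each element, regroup the results."""
    items = json.loads(payload)             # scatter: {x,x,x} -> {x},{x},{x}
    results = [process(i) for i in items]   # process each sub-element
    return json.dumps(results)              # gather: {y},{y},{y} -> {y,y,y}

out = scatter_gather("[1, 2, 3]", lambda x: x * 2)
print(out)  # [2, 4, 6]
```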

● Create a Siddhi app per use case (a collection of queries).
● Connect multiple Siddhi apps using in-memory sources and sinks.
● Allow rule addition and deletion at runtime.

Modularization

Siddhi runtime layout: a Siddhi app for data capture and preprocessing, Siddhi apps for each use case, and a Siddhi app for common data publishing logic.


Realtime Decisions As A Service


Query data stores using REST APIs:
● Database-backed stores (RDBMS, NoSQL)
● Named aggregations
● In-memory windows & tables

Call HTTP and gRPC services using REST APIs:
● Use Service and Service-Response loopbacks
● Process the Siddhi query chain and send the response synchronously


Kubernetes Deployment

Steps On Deploying Siddhi App In Kubernetes


Siddhi Custom Resource Definition

apiVersion: siddhi.io/v1alpha2
kind: SiddhiProcess
metadata:
  name: <name>                # Siddhi process name
spec:
  apps:                       # Siddhi apps as inline text or ConfigMaps
    - script: |
        <siddhi app>
  container:
    env:                      # necessary environment variables
      - name: <key>
        value: <value>
    image: "siddhiio/siddhi-runner-ubuntu:5.1.0"   # Siddhi Runner with extensions
  messagingSystem:            # distributed processing configs with NATS
    type: nats
  persistentVolumeClaim: <PVC config>
  runner: |                   # periodic incremental persistence, and other Siddhi Runner configs
    state.persistence:
      enabled: true

Kubernetes Deployment


$ kubectl get siddhi
NAME         STATUS    READY   AGE
Sample-app   Running   2/2     5m

What We Looked At

● Characteristics of cloud native apps.
● Problems in implementing event driven stateful applications.
● Siddhi: a cloud native stream processor.
● Patterns for implementing event driven applications.
● Deploying on Kubernetes with Siddhi and NATS.

For more info visit https://siddhi.io

Thank You
