A Head Start on Cloud-native Event Driven Applications - Big Data Days 2019


A Head Start on Cloud-native Event Driven Applications

S. Suhothayan (Suho)

Director, WSO2 (@suhothayan)

Increasing demand is causing disaggregation

Types of System Integration


Notion of Event Driven Applications

Applications that process data arriving via events and streams:
● Respond to actions generated by both users and systems.
● Messages are passed via event handlers or message queues, within the application or across applications.

Application relationships using queues and topics:
● One to One
● One to Many
● Many to One
● Many to Many

Characteristics of Cloud Nativeness

● Developed as microservices.
● Packaged in containers.
● Deployed via continuous delivery workflows.
● Managed on elastic container-based infrastructure.
● Follow agile DevOps processes.

Other Key Characteristics
● Designed as loosely coupled microservices
● Developed with best-of-breed languages and frameworks
● Architected with a clean separation of stateless and stateful services

Source: https://medium.com/walmartlabs/cloud-native-application-architecture-a84ddf378f82

Stateless Event Driven Applications

Processing done based on incoming event content, and if needed, by calling databases and other services.

● Can have multiple instances at the same time.
● Apps can be started and stopped whenever needed.
● Highly scalable!

Can run on Kubernetes with multiple replicas, or even in serverless frameworks.

Deployment of Stateless Apps

When microservices use HTTP/TCP transports

Deployment of Stateless Apps

When microservices use messaging systems (JMS/Kafka/NATS)


Not all apps are stateless!

Some Stateful Use Cases!

● Number of server errors over the last 20 min.
● Same credit card being used in Russia and then in the USA within 1 hour. (Potential fraud)
● Purchasing 3 diamond rings consecutively (high-value transactions) within one hour. (Potential fraud)
● Item delivered but payment not received within 15 min.
● Continuous stock price increase followed by the first drop.

Building Stateful Applications

Apps need to maintain state (remember previous events).

Things to remember:
● State should be preserved during system failure.
● State should be shared with new nodes when scaled.

Options:
● Use databases to store the state.
● Use a distributed cache to store and replicate state.
● Keep the data in-memory.

Using Databases to Store the State

Advantages
● Data can be shared with all the applications.
● Supports transactions.
● Support for rich data definition and querying languages.

Disadvantages
● Introduces high latency into the decision-making process.
● Not the best for implementing a state machine.


Using Distributed Cache to Share State

Advantages
● Not limited to the memory of the computing node.
● Fast reads.
● Data is preserved during node failures.

Disadvantages
● Only eventual consistency.
● Not the best for transactions.
● Limited querying support compared to databases.


Keeping Data In-Memory

Advantages
● Very fast read and write access.
● Best for implementing state-machine-like data structures.

Disadvantages
● Memory is limited to the computing node.
● Cannot scale.
● Data is not preserved during system failures.


Best Approach for Stateful Apps

Comparing database, cache, and in-memory storage on:
● High Performance
● Scalability
● Fault Tolerance


There is no one tool for all cases!

Solution to High Performance, Scalable, and Fault Tolerant Apps

Use a combination of database, cache, and in-memory:
● Keep data in memory (most states are short-lived).
● Periodically snapshot state to the databases.
● Use a cache to preload data (to improve read performance).

Solve by following the map-reduce philosophy:
1. Map the incoming data.
2. Partition data by key.
3. Process/summarize the data in parallel.
4. If needed, combine the data at the end.
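The four steps can be sketched in plain Python (a conceptual illustration only, not Siddhi's internals; the event shape and the `summarize` function are invented for the sketch):

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def summarize(events):
    # Summarize one partition: here, total amount per key.
    return sum(e["amount"] for e in events)

def map_reduce(events, key):
    # 1 & 2: map incoming events and partition them by key.
    partitions = defaultdict(list)
    for e in events:
        partitions[e[key]].append(e)
    # 3: process/summarize each partition in parallel.
    with ThreadPoolExecutor() as pool:
        results = dict(zip(partitions, pool.map(summarize, partitions.values())))
    # 4: the combined per-key results.
    return results

orders = [{"custId": "a", "amount": 10}, {"custId": "b", "amount": 5},
          {"custId": "a", "amount": 7}]
print(map_reduce(orders, "custId"))  # {'a': 17, 'b': 5}
```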

When microservices use messaging systems (Kafka/NATS)

Deployment of Stateful Apps

System snapshots periodically, replay data from source upon failure.

How Checkpointing Works?
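The snapshot-and-replay idea can be illustrated with a toy Python processor (a sketch of the general technique, not Siddhi's checkpointing implementation; the counter state and snapshot interval are invented):

```python
import copy

class CheckpointingProcessor:
    """Toy event counter that snapshots state every `interval` events and
    recovers by replaying from the source after the last snapshot."""
    def __init__(self, interval=3):
        self.interval = interval
        self.state = {"count": 0}
        self.snapshot = (0, {"count": 0})  # (source offset, state)

    def process(self, source, upto):
        # Resume from the last snapshot and replay events after it.
        offset, state = self.snapshot
        self.state = copy.deepcopy(state)
        for i in range(offset, upto):
            self.state["count"] += source[i]
            if (i + 1) % self.interval == 0:
                # Periodic snapshot: offset + deep copy of state.
                self.snapshot = (i + 1, copy.deepcopy(self.state))
        return self.state["count"]

src = [1, 1, 1, 1, 1]
p = CheckpointingProcessor()
p.process(src, 4)              # snapshot taken at offset 3
p.state = None                 # simulate a crash losing in-memory state
assert p.process(src, 5) == 5  # replays only events after offset 3
```

Because only events after the snapshot offset are replayed, recovery cost stays bounded by the snapshot interval.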



How to implement all this?

Streaming and Event Processing Frameworks

Use frameworks to build high performance, scalable, and fault tolerant event driven applications.

Supported frameworks:

Characteristics Of Typical Big Data Frameworks

● They are massive (need 5-6 nodes for a minimum deployment).
● They need a specialized team to manage.
● Developers need to go through an approval process to access the framework (center of excellence).

Issues:
● Reduced speed of innovation and productivity.
● Less autonomy for the team.
● High maintenance cost.

Choosing The Best Tool For A Microservices Environment?

● Microservices are written, deployed, and managed by two-pizza-size teams.
● Teams have full autonomy in the way their projects run.
● Focus on agile and fast development.
● Use the best tool for the task.

Each service can be written in the best language for its task; there is no need to force all services to be in Java, Python, or Go.

Introducing A Cloud Native Stream Processor

● Lightweight (low memory footprint and quick startup).
● 100% open source (no commercial features).
● Native support for Docker and Kubernetes.
● Supports agile DevOps workflows and full CI/CD pipelines.
● Allows event processing logic to be written in an SQL-like query language and via a graphical tool.
● Used by:


Working with Siddhi

● Develop apps using Siddhi Editor.
● CI/CD with build integration and the Siddhi Test Framework.
● Running modes:
  ○ Embedded in Java/Python apps.
  ○ Microservice on bare metal/VM.
  ○ Microservice in Docker.
  ○ Microservice in Kubernetes (distributed deployment with NATS).

We are happy to announce Siddhi 5.1

With new features such as:

● Export to Docker and Kubernetes.
● New connectors such as gRPC (with Protobuf), S3, and Google Cloud Storage.
● Database caching.
● Support to remove duplicate events.
● Support for complex data transformations with List and Map extensions.
● Unified configuration model.
● Sandbox support to run tests.
● Improved error handling (log/wait/fault-stream).

Streaming SQL

@app:name('Alert-Processor')

@source(type='kafka', ..., @map(type='json'))
define stream TemperatureStream(roomNo string, temp double);

@sink(type='email', ..., @map(type='text'))
define stream AlertStream(roomNo string, avgTemp double);

@info(name='AlertQuery')
from TemperatureStream#window.time(5 min)
select roomNo, avg(temp) as avgTemp
group by roomNo
output first every 15 min
insert into AlertStream;

Source/Sink & Streams

Window Query with Rate Limiting

Web Based Graphical Editor


CI/CD Pipeline of Siddhi


Patterns For Event Driven Data Processing


1. Consume and publish events with various data formats.
2. Data filtering and preprocessing.
3. Data transformation.
4. Database integration and caching.
5. Service integration and error handling.
6. Rule processing.
7. Serving online and predefined ML models.
8. Data summarization.
9. Scatter-gather and data pipelining.
10. Realtime decisions as a service (on-demand processing).


Scenario: Order Processing

Customers place orders. Shipments are made. Customers pay for the order.

Tasks:
● Process order fulfillment.
● Send alerts on abnormal conditions.
● Send recommendations.
● Throttle order requests when the limit is exceeded.
● Provide order analytics over time.


Consume and Publish Events With Various Data Formats


Supported transports

● NATS, Kafka, RabbitMQ, JMS, Amazon SQS, MQTT, IBM MQ
● HTTP, gRPC, TCP, Email, WebSocket
● Change Data Capture (CDC)
● File, S3, Google Cloud Storage

Supported data formats

● JSON, XML, Avro, Protobuf, Text, Binary, Key-value, CSV

Consume and Publish Events With Various Data Formats

Default JSON mapping

Custom JSON mapping

@source(type='mqtt', …, @map(type='json'))
define stream OrderStream(custId string, item string, amount int);

@source(type='mqtt', …, @map(type='json',
  @attributes("$.id", "$.itm", "$.count")))
define stream OrderStream(custId string, item string, amount int);

{"event":{"custId":"15","item":"shirt","amount":2}}

{"id":"15","itm":"shirt","count":2}


Data Filtering and Preprocessing


Filtering
● Value ranges
● String matching
● Regex

Setting Defaults
● Null checks
● Default function
● If-then-else function

define stream OrderStream (custId string, item string, amount int);

from OrderStream[item != 'unknown']
select default(custId, 'internal') as custId,
       item,
       ifThenElse(amount < 0, 0, amount) as amount
insert into CleansedOrderStream;
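The same cleansing logic can be mirrored in plain Python to make the semantics concrete (a conceptual sketch; the dict event shape is an assumption, not Siddhi's representation):

```python
def cleanse(order):
    """Drop 'unknown' items, default a missing custId to 'internal',
    and clamp negative amounts to 0."""
    if order["item"] == "unknown":
        return None  # filtered out
    return {
        "custId": order.get("custId") or "internal",  # default()
        "item": order["item"],
        "amount": max(order["amount"], 0),            # ifThenElse()
    }

orders = [{"custId": None, "item": "shirt", "amount": -2},
          {"custId": "15", "item": "unknown", "amount": 1}]
cleansed = [o for o in (cleanse(x) for x in orders) if o]
# → [{'custId': 'internal', 'item': 'shirt', 'amount': 0}]
```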


Data Transformation


Data extraction (JSON, Text):
  json:getDouble(json, "$.amount") as amount

Reconstruct messages (JSON, Text):
  str:concat('Hello ', name) as greeting

Inline operations (math, logical operations):
  amount * price as cost

Inbuilt function calls (60+ extensions):
  time:extract('DAY', datetime) as day

Custom function calls (Java, JS):
  myFunction(item, price) as discount


Database Integration and Caching


Supported databases:
● RDBMS (MySQL, Oracle, DB2, PostgreSQL, H2), Redis, Hazelcast
● MongoDB, HBase, Cassandra, Solr, Elasticsearch

Database Integration and Caching

Joining table with cache (preloads data for high read performance).

define stream CleansedOrderStream (custId string, item string, amount int);

Table with cache:
@store(type='rdbms', …, @cache(cache.policy='LRU', … ))
@primaryKey('name')
@index('unitPrice')
define table ItemPriceTable(name string, unitPrice double);

Join query:
from CleansedOrderStream as O join ItemPriceTable as T
  on O.item == T.name
select O.custId, O.item, O.amount * T.unitPrice as price
insert into EnrichedOrderStream;
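The cached-table join pattern reduces per-event database reads. A minimal Python analogue (an illustration only; `PRICE_DB` and `unit_price` are invented stand-ins for ItemPriceTable, and `lru_cache` stands in for Siddhi's @cache annotation):

```python
from functools import lru_cache

# Hypothetical price table standing in for a real RDBMS-backed store.
PRICE_DB = {"shirt": 20.0, "ring": 1500.0}

@lru_cache(maxsize=128)
def unit_price(item):
    # First call hits the "database"; repeats are served from cache.
    return PRICE_DB[item]

def enrich(order):
    # Join the order event with the cached price table.
    return {**order, "price": order["amount"] * unit_price(order["item"])}

print(enrich({"custId": "15", "item": "shirt", "amount": 2}))
# → {'custId': '15', 'item': 'shirt', 'amount': 2, 'price': 40.0}
```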


Service Integration and Error Handling

Enriching data with HTTP and gRPC service Calls

● Non blocking ● Handle response based on status

codes

● Handle error conditions○ Send events to Error Stream○ Log failed events○ Block events till endpoint

become available

Service Integration and Error Handling

Handle responses based on status code (e.g., 200 vs 4xx).

SQL for HTTP Service Integration

Calling external HTTP service and consuming the response.

@sink(type='http-call', publisher.url="http://mystore.com/discount",
      sink.id="discount", @map(type='json'))
define stream EnrichedOrderStream (custId string, item string, price double);

@source(type='http-call-response', http.status.code="200",
        sink.id="discount",
        @map(type='json', @attributes(custId="trp:custId", ..., price="$.discountedPrice")))
define stream DiscountedOrderStream (custId string, item string, price double);

Call service

Consume Response


Rule Processing


Types of predefined rules:
● Rules on a single event
  ○ Filter, if-then-else, match, etc.
● Rules on a collection of events
  ○ Summarization
  ○ Join with window or table
● Rules based on event occurrence order
  ○ Pattern detection
  ○ Trend (sequence) detection
  ○ Non-occurrence of events

Alert Based On Single Event

Use filter to identify abnormal events.

define stream DiscountedOrderStream (custId string, item string, price double);

from DiscountedOrderStream[price > 2000]
select custId, item, price, 'High Price Purchase' as reason
insert into AlertStream;

Filter Query

Alert Based On Collection Of Events

Use a stateful window query to aggregate orders over time for each customer, and alert on matching conditions at most once every 5 minutes.

define stream DiscountedOrderStream (custId string, item string, price double);

from DiscountedOrderStream#window.time(30 min)
select custId, sum(price) as totalPrice
group by custId
having totalPrice > 5000
output first every 5 min
insert into AlertStream;

Window query with aggregation and rate limiting
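A sliding time window with per-key aggregation can be sketched in plain Python to show what the engine maintains internally (a conceptual model only; rate limiting is omitted and the class name is invented):

```python
from collections import deque

class WindowAlert:
    """Sliding 30-min sum of prices per customer; alert when total > 5000."""
    def __init__(self, window_sec=1800, threshold=5000):
        self.window = window_sec
        self.threshold = threshold
        self.events = deque()  # (timestamp, custId, price)
        self.totals = {}       # running sum per customer

    def on_event(self, ts, cust, price):
        self.events.append((ts, cust, price))
        self.totals[cust] = self.totals.get(cust, 0) + price
        # Expire events that fell out of the sliding window.
        while self.events and self.events[0][0] <= ts - self.window:
            _, old_cust, old_price = self.events.popleft()
            self.totals[old_cust] -= old_price
        if self.totals[cust] > self.threshold:
            return (cust, self.totals[cust])  # alert
        return None

w = WindowAlert()
assert w.on_event(0, "c1", 3000) is None
assert w.on_event(60, "c1", 2500) == ("c1", 5500)
assert w.on_event(2000, "c1", 100) is None  # earlier orders expired
```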

Alert Based On Event Occurrence Order

Use a stateful pattern query to detect event occurrence order and non-occurrence.

define stream OrderStream (custId string, orderId string, ...);
define stream PaymentStream (orderId string, ...);

from every (e1=OrderStream) ->
     not PaymentStream[e1.orderId == orderId] within 15 min
select e1.custId, e1.orderId, ...
insert into PaymentDelayedStream;

Non-occurrence of event
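Detecting non-occurrence amounts to tracking pending orders and flagging those whose payment never arrived in time. A toy Python version (an illustration of the idea, not Siddhi's pattern engine; class and method names are invented):

```python
class PaymentWatcher:
    """Flag orders with no matching payment within the timeout."""
    def __init__(self, timeout_sec=900):  # 15 min
        self.timeout = timeout_sec
        self.pending = {}  # orderId -> order timestamp

    def on_order(self, ts, order_id):
        self.pending[order_id] = ts

    def on_payment(self, ts, order_id):
        # A matching payment cancels the pending pattern.
        self.pending.pop(order_id, None)

    def check(self, now):
        # Orders whose payment never arrived within the timeout.
        late = [oid for oid, ts in self.pending.items()
                if now - ts > self.timeout]
        for oid in late:
            del self.pending[oid]
        return late

w = PaymentWatcher()
w.on_order(0, "o1")
w.on_order(10, "o2")
w.on_payment(300, "o1")
assert w.check(1000) == ["o2"]  # o1 was paid in time; o2 was not
```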


Serving Online and Predefined ML Models


Types of Machine Learning and Artificial Intelligence processing:
● Anomaly detection
  ○ Markov models
● Serving pre-created ML models
  ○ PMML (built from Python, R, Spark, H2O.ai, etc.)
  ○ TensorFlow
● Online machine learning
  ○ Clustering
  ○ Classification
  ○ Regression

Find recommendations:
from OrderStream
#pmml:predict("/home/user/ml.model", custId, itemId)
insert into RecommendationStream;


Data Summarization


Types of data summarization:
● Time based
  ○ Sliding time window
  ○ Tumbling time window
  ○ On time granularities (secs to years)
● Event count based
  ○ Sliding length window
  ○ Tumbling length window
● Session based
● Frequency based

Types of aggregations:
● Sum ● Count ● Avg ● Min ● Max ● DistinctCount ● StdDev

Aggregation Over Multiple Time Granularities

Aggregations on every second, minute, hour, …, year.
Built using the lambda (λ) architecture:
● In-memory real-time data
● RDBMS-based historical data

define aggregation OrderAggregation
from OrderStream
select custId, itemId, sum(price) as total, avg(price) as avgPrice
group by custId, itemId
aggregate every sec ... year;

Query

Speed Layer & Serving Layer

Batch Layer
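The core of multi-granularity aggregation is updating one bucket per granularity on each event. A small Python sketch (conceptual only; persistence of older buckets to an RDBMS, as in the batch layer above, is omitted, and the granularity table is truncated for brevity):

```python
from collections import defaultdict

GRANULARITIES = {"sec": 1, "min": 60, "hour": 3600}  # truncated; real list goes to years

class MultiGranularityAgg:
    """Incrementally maintain sums per time granularity."""
    def __init__(self):
        self.buckets = {g: defaultdict(float) for g in GRANULARITIES}

    def on_event(self, ts, price):
        # Each event updates one bucket per granularity.
        for g, size in GRANULARITIES.items():
            self.buckets[g][ts // size * size] += price

    def query(self, granularity, start, end):
        # Sum the buckets of the requested granularity in [start, end).
        return sum(v for t, v in self.buckets[granularity].items()
                   if start <= t < end)

agg = MultiGranularityAgg()
agg.on_event(0, 10.0)
agg.on_event(61, 5.0)
assert agg.query("min", 0, 120) == 15.0
assert agg.query("min", 60, 120) == 5.0
```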

Data Retrieval from Aggregations

Query for relevant time interval and granularity.

Data is retrieved from both memory and the DB with millisecond accuracy.

from OrderAggregation
within "2019-10-06 00:00:00", "2019-10-30 00:00:00"
per "days"
select total as orders;


Scatter-gather and Data Pipelining


Divide a message into sub-elements, process each, and combine the results.

Example:
  json:tokenize() -> process -> window.batch() -> json:group()
  str:tokenize() -> process -> window.batch() -> str:groupConcat()

  {x,x,x} -> {x},{x},{x} -> {y},{y},{y} -> {y,y,y}
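The tokenize-process-group pipeline can be sketched in a few lines of Python (a conceptual analogue of json:tokenize()/json:group(), not the Siddhi extensions themselves; the function names are invented):

```python
import json

def scatter_gather(payload, process):
    """Tokenize a JSON array, process each element, regroup the results."""
    items = json.loads(payload)             # scatter: {x,x,x} -> {x},{x},{x}
    results = [process(i) for i in items]   # process each sub-element
    return json.dumps(results)              # gather: {y},{y},{y} -> {y,y,y}

out = scatter_gather("[1, 2, 3]", lambda x: x * 2)
print(out)  # [2, 4, 6]
```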

● Create a Siddhi app per use case (a collection of queries).
● Connect multiple Siddhi apps using in-memory sources and sinks.
● Allow rule addition and deletion at runtime.

Modularization

Siddhi runtime layout: a Siddhi app for data capture and preprocessing, Siddhi apps for each use case, and a Siddhi app for common data publishing logic.


Realtime Decisions As A Service


Query data stores using REST APIs:
● Database-backed stores (RDBMS, NoSQL)
● Named aggregations
● In-memory windows & tables

Call HTTP and gRPC services using REST APIs:
● Use Service and Service-Response loopbacks
● Process the Siddhi query chain and send the response synchronously


Kubernetes Deployment

Steps On Deploying Siddhi App In Kubernetes


Siddhi Custom Resource Definition

apiVersion: siddhi.io/v1alpha2
kind: SiddhiProcess
metadata:
  name: <name>                # Siddhi process name
spec:
  apps:                       # Siddhi apps as inline text or ConfigMaps
    - script: |
        <siddhi app>
  container:
    env:                      # necessary environment variables
      - name: <key>
        value: <value>
    image: "siddhiio/siddhi-runner-ubuntu:5.1.0"   # Siddhi Runner with extensions
  messagingSystem:            # distributed processing configs with NATS
    type: nats
  persistentVolumeClaim: <PVC config>
  runner: |                   # periodic incremental persistence, and other Siddhi Runner configs
    state.persistence:
      enabled: true

Kubernetes Deployment


$ kubectl get siddhi
NAME         STATUS    READY   AGE
Sample-app   Running   2/2     5m

What We Looked At

● Characteristics of cloud native apps.
● Problems in implementing event driven stateful applications.
● Siddhi: a cloud native stream processor.
● Patterns for implementing event driven applications.
● Deploying on Kubernetes with Siddhi and NATS.

For more info visit https://siddhi.io

Thank You
