cask-data-inc
Building an ECA Rules engine for IoT using CDAP
Big Data On Tap
03/29/2017
Bhooshan Mogal
Event-Condition-Action (ECA) Basics
ECA is made up of three major components:
• Event parsing and schema management
• Boolean expressions: conditions (or rules)
• Ability to take one or more actions based on the result of conditions
• A common paradigm in traditional Complex Event Processing (CEP) and event-driven architectures, but relatively new to the scale-out Apache Hadoop world.
ECA Use-cases and Characteristics
• Use Cases
• Smart Home - Security Systems, Appliance Monitoring, …
• Wearables - Monitoring Vital Stats, Fitness Goals, …
• Typical Characteristics
• Data arrives in continuous, real-time streams
• Data has varying schema
• Metadata (Schemas, Rules) can be registered and managed
Cask Data Application Platform (CDAP)
• CDAP is a unified integration platform that provides higher-level abstractions such as ingest, storage, compute, egress, and visual pipelines for Big Data applications
• Ties together data preparation, data integration, data discovery, data science as well as complex, custom data applications with metadata management, security, operations and governance.
ECA via CDAP
• CDAP ECA Application for Schema and Rules Management
• RESTful APIs + UI for Schema and Rules Management
• CDAP Streams/Apache Kafka for high-throughput event ingestion
• Data Preparation directives for transforming data, generating measurements, selecting rules
• Real-time streaming pipelines using Apache Spark Streaming for processing events
• Generic Event Parser plugin to parse events from a known set of schemas
• Rule Executor plugin to apply rules on parsed events, and generate actions
ECA Concepts
• Events are telemetry data sent from devices or aggregators that run at the edge.
• Schema defines the fields, types, rules and transformations used to parse ingested events.
• Measurements are one or more quantitative measures of any kind in an event. Measurements can also be generated by applying transformations to events.
• Conditions (a.k.a. Rules) are boolean expressions applied to fields or measurements in an event. Expressions can combine conditions with operators such as ‘and’ and ‘or’. A set of rules can be defined for a given schema.
• Actions define the concrete external notifications that are generated based on the result of evaluating a condition.
• Schema Hash is an MD5 digest of the field names of an event that uniquely identifies the event type.
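As a minimal sketch of the Schema Hash idea, the snippet below computes an MD5 digest over an event's field names. The slides do not say how the names are canonicalized, so sorting them and joining with a comma is an assumption made here, and `schema_hash` is a hypothetical helper name.

```python
import hashlib


def schema_hash(field_names):
    """Return an MD5 hex digest computed over an event's field names."""
    # Assumption: names are sorted and comma-joined so that field order does
    # not change the hash. The slides do not specify the canonical form.
    canonical = ",".join(sorted(field_names))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


# Same field names in a different order identify the same event type.
a = schema_hash(["Battery", "Steps", "WatchImei"])
b = schema_hash(["WatchImei", "Battery", "Steps"])
# A different set of field names identifies a different event type.
c = schema_hash(["Battery", "Steps"])
```

Because only field names feed the digest, two events with different values but the same shape resolve to the same schema, which is what lets the hash act as a lookup key.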
ECA Architecture
Schema and Rules Management User Flow
ECA Architecture Description
• (A) Event Parser parses incoming events. It combines generic parsing capabilities with the ability to generate a schema hash for an event. The schema hash is then used to retrieve the user-defined transformations that are applied to the event to enhance it or extract measurements.
• (B) Rules Executor is responsible for evaluating the conditions on the event to generate a boolean value to be associated with an action. The rules and actions for an event are uniquely identified by its schema hash.
• (C) Schema Registry is a repository of definitions of the schema types that the Event Parser can parse. It’s a CDAP Service backed by a dataset.
• (D) Rules Registry is a repository of rules (conditions) to be executed for an event type. Rules are indexed by schema hash. It’s a CDAP Service backed by a dataset.
• (E) Reliable Notification Dispatcher is a daemon process that is responsible for reading events from a priority queue dataset to trigger external notifications. It uses plugins to define the different external communication endpoints.
• (F) Event transport or ingestion is achieved using either Kafka or CDAP Streams. Other mechanisms, such as Amazon SQS or Azure Event Hubs, could also be used.
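To make component (E) more concrete, here is a rough in-memory sketch of a dispatcher that drains a priority queue and hands each event to a pluggable channel. It is not the CDAP implementation: the class and method names are hypothetical, a `heapq` list stands in for the priority queue dataset, lower numbers are assumed to mean higher priority, and reliability concerns such as retries are not modeled.

```python
import heapq


class NotificationDispatcher:
    """Toy dispatcher: priority queue in, pluggable channels out."""

    def __init__(self):
        self._queue = []     # stand-in for the priority queue dataset
        self._channels = {}  # action name -> callable(event), i.e. a plugin
        self._seq = 0        # tie-breaker: FIFO order within one priority

    def register_channel(self, action, send):
        self._channels[action] = send

    def enqueue(self, priority, action, event):
        # Assumption: a lower number means a higher priority.
        heapq.heappush(self._queue, (priority, self._seq, action, event))
        self._seq += 1

    def drain(self):
        """Deliver all queued events in priority order; return what was sent."""
        delivered = []
        while self._queue:
            _, _, action, event = heapq.heappop(self._queue)
            send = self._channels.get(action)
            if send is not None:  # events with no registered channel are skipped
                send(event)
                delivered.append((action, event["MessageId"]))
        return delivered


sent = []
dispatcher = NotificationDispatcher()
dispatcher.register_channel("sms", lambda e: sent.append(("sms", e["MessageId"])))
dispatcher.register_channel("email", lambda e: sent.append(("email", e["MessageId"])))
dispatcher.enqueue(2, "email", {"MessageId": "b"})
dispatcher.enqueue(1, "sms", {"MessageId": "a"})
delivered = dispatcher.drain()
```

Registering channels as plain callables keeps the dispatcher unaware of how a notification is actually sent, which mirrors the plugin idea described above.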
ECA Event Flow
{ "Alert": { "Id": "25", "Time": "2016-09-22T07:41:59.2486611+01:00", "Type": "SOS" }, "Battery": 85, "CallerId": "+44123456789", "Calories": 100, "LastContactTime": "2016-09-22T07:41:59.2486611+01:00", "MessageId": "a32d4883-1d0e-489c-bf74-706ffa4b9e62", "MessageTime": "2016-09-22T07:42:06.2486611+01:00", "Position": { "Accuracy": 10, "Latitude": "51.507351", "Longitude": "-0.127758", "Time": "2016-09-22T07:41:58.2486611+01:00" }, "Steps": 1000, "WatchImei": "123456789012345" }
Incoming Telemetry Event
{ "Alert_Id": "25", "Alert_Time": "2016-09-22T07:41:59.2486611+01:00", "Alert_Type": "SOS", "Battery": 85, "CallerId": "+44123456789", "Calories": 100, "LastContactTime": "2016-09-22T07:41:59.2486611+01:00", "MessageId": "a32d4883-1d0e-489c-bf74-706ffa4b9e62", "MessageTime": "2016-09-22T07:42:06.2486611+01:00", "Position_Accuracy": 10, "Position_Latitude": "51.507351", "Position_Longitude": "-0.127758", "Position_Time": "2016-09-22T07:41:58.2486611+01:00", "Steps": 1000, "WatchImei": "123456789012345" }
Parsing Directives Applied
{ "Alert_Id": "25", "Alert_Time": "2016-09-22T07:41:59.2486611+01:00", "Alert_Type": "SOS", "Battery": 85, "CallerId": "+44123456789", "Calories": 100, "LastContactTime": "2016-09-22T07:41:59.2486611+01:00", "MessageId": "a32d4883-1d0e-489c-bf74-706ffa4b9e62", "MessageTime": "2016-09-22T07:42:06.2486611+01:00", "Position_Accuracy": 10, "Position_Latitude": "51.507351", "Position_Longitude": "-0.127758", "Position_Time": "2016-09-22T07:41:58.2486611+01:00", "Steps": 1000, "WatchImei": "123456789012345", "CaloriesPerStep": 0.1, "hash": "ABABBASBAB342442ABABABAAB234ABABA67867" }
Hash Generation & User Transformation
• Generic directives are applied
• If the input is an array of events, one record is created per event
• Each record is flattened
• A hash is generated from the field names (all fields are considered for hash generation) and added to the record (e.g. hash)
• User directives are looked up based on the hash
• User directives are applied (e.g. CaloriesPerStep)
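The parsing steps above can be sketched in a few lines of Python. This is an illustration only, not the Event Parser plugin: `flatten`, `parse` and `calories_per_step` are hypothetical helpers, the underscore separator is inferred from field names such as `Alert_Type`, and the `CaloriesPerStep` formula (Calories divided by Steps) is an assumption that happens to match the 0.1 shown in the sample event.

```python
import json


def flatten(record, prefix=""):
    """Flatten nested objects, joining parent and child names with '_'."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


def parse(payload):
    """Parse a JSON payload; an array of events yields one record per event."""
    data = json.loads(payload)
    events = data if isinstance(data, list) else [data]
    return [flatten(event) for event in events]


def calories_per_step(record):
    """Hypothetical user directive that derives a measurement."""
    out = dict(record)
    out["CaloriesPerStep"] = out["Calories"] / out["Steps"]
    return out


payload = '{"Alert": {"Id": "25", "Type": "SOS"}, "Calories": 100, "Steps": 1000}'
records = [calories_per_step(r) for r in parse(payload)]
batch = parse('[{"Steps": 1}, {"Steps": 2}]')
```

Hash generation and the directive lookup are left out here; in the flow described above they would sit between `parse` and the user directive.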
ECA Event Flow
{ "Alert_Id": "25", "Alert_Time": "2016-09-22T07:41:59.2486611+01:00", "Alert_Type": "SOS", "Battery": 85, "CallerId": "+44123456789", "Calories": 100, "LastContactTime": "2016-09-22T07:41:59.2486611+01:00", "MessageId": "a32d4883-1d0e-489c-bf74-706ffa4b9e62", "MessageTime": "2016-09-22T07:42:06.2486611+01:00", "Position_Accuracy": 10, "Position_Latitude": "51.507351", "Position_Longitude": "-0.127758", "Position_Time": "2016-09-22T07:41:58.2486611+01:00", "Steps": 1000, "WatchImei": "123456789012345", "CaloriesPerStep": 0.1, "hash": "ABABBASBAB342442ABABABAAB234ABABA67867" }
Hash Generation
(Battery > 85 && Alert_Type == "SOS") => sms
(CaloriesPerStep > 10) => email
Apply Rules & Post Directives
{ "output" : <key-value-of-event>, "sms" : true, "email" : false }
Output Event Stored in Dataset
• Applies conditions (boolean expressions) to the incoming fields
• Action types (sms, email) are predefined
• Conditions can be complex
• Each event generates a result for every action
• Includes the event as key-value pairs for debugging purposes
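A small sketch of the rule-application step follows. It assumes rules are modeled as (action, predicate) pairs with plain Python callables standing in for the boolean expressions; the real Rules Executor presumably evaluates stored expressions instead, and `apply_rules` is a hypothetical name. Note that with a strict `>` comparison, a Battery reading of exactly 85 does not satisfy the first rule.

```python
# Assumption: each rule is an (action, predicate) pair. The predicates mirror
# the two example conditions on the slide.
RULES = [
    ("sms", lambda e: e["Battery"] > 85 and e["Alert_Type"] == "SOS"),
    ("email", lambda e: e["CaloriesPerStep"] > 10),
]


def apply_rules(event, rules):
    """Evaluate every rule and return the event plus one boolean per action."""
    result = {"output": dict(event)}  # keep the event for debugging
    for action, condition in rules:
        result[action] = bool(condition(event))
    return result


at_threshold = apply_rules(
    {"Battery": 85, "Alert_Type": "SOS", "CaloriesPerStep": 0.1}, RULES
)
above_threshold = apply_rules(
    {"Battery": 90, "Alert_Type": "SOS", "CaloriesPerStep": 0.1}, RULES
)
```

Every action key appears in the result whether or not its condition held, which matches the output shape shown above where both `sms` and `email` carry a boolean.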
Demo
Summary - Schema and Rules Management APIs
Schema Management APIs provide the ability to create, delete, view, list and update schemas in the Schema Registry
• Add New Schema: adds a new schema to the registry
• Delete Schema: deletes a schema from the registry
• View a Schema: provides details of a schema
• List Schemas: lists all the schemas in the schema registry
• Update Schema: updates a schema in the registry
Rules and Action Management APIs provide the ability to create, delete, list and update rules and their associated actions
• Add New Rule(s): adds one or more rules to the rules registry. Rules are associated with a schema hash or a key field, as specified by the user directives
• Delete a Rule: deletes a rule
• List All Rules for a Key: the key can be a schema hash or a user-generated key
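The rule operations listed above can be modeled with a small in-memory registry. This is only a sketch of the semantics, not the CDAP Service or its REST endpoints (which the slides do not spell out): `RulesRegistry`, its method names and the rule identifiers are all hypothetical.

```python
class RulesRegistry:
    """In-memory model of a rules registry keyed by schema hash or user key."""

    def __init__(self):
        self._rules = {}  # key -> {rule_id: rule definition}

    def add_rules(self, key, rules):
        """Add one or more rules (a dict of rule_id -> rule) under a key."""
        self._rules.setdefault(key, {}).update(rules)

    def delete_rule(self, key, rule_id):
        """Delete a rule; return True if it existed."""
        return self._rules.get(key, {}).pop(rule_id, None) is not None

    def list_rules(self, key):
        """Return a copy of all rules registered under a key."""
        return dict(self._rules.get(key, {}))


registry = RulesRegistry()
registry.add_rules(
    "some-schema-hash",  # hypothetical key
    {
        "r1": {"condition": 'Battery > 85 && Alert_Type == "SOS"', "action": "sms"},
        "r2": {"condition": "CaloriesPerStep > 10", "action": "email"},
    },
)
removed = registry.delete_rule("some-schema-hash", "r2")
remaining = registry.list_rules("some-schema-hash")
```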
Summary - Processing events in real-time
• CDAP real-time pipeline using Apache Spark Streaming
• Reads events from a CDAP Stream
• Generic Event Parser parses the events and applies transformations stored in the schema registry. It generates measurements and a key used to look up the rules to be applied to the event.
• Rules Executor looks up rules from the rules registry and applies them to generate actions
• Events with an sms action are written to the SMS Kafka topic, and those with an email action to the Email Kafka topic
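The final routing step can be pictured with the sketch below, which sorts rule results into per-action buckets. Plain Python lists stand in for the Kafka topics, the topic names are illustrative assumptions, and `route` is a hypothetical helper. No Kafka client is involved.

```python
from collections import defaultdict

# Assumption: illustrative topic names. The slides only mention an SMS topic
# and an Email topic without giving their exact names.
TOPIC_FOR_ACTION = {"sms": "sms", "email": "email"}


def route(results):
    """Group output events by the topic of every action that fired."""
    topics = defaultdict(list)
    for result in results:
        for action, topic in TOPIC_FOR_ACTION.items():
            if result.get(action):
                topics[topic].append(result["output"])
    return topics


results = [
    {"output": {"MessageId": "a"}, "sms": True, "email": False},
    {"output": {"MessageId": "b"}, "sms": False, "email": True},
    {"output": {"MessageId": "c"}, "sms": False, "email": False},
]
topics = route(results)
```

An event whose actions are all false lands in no topic, and an event that triggers both actions would be appended to both.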
Questions?
Thank You