Analytics Patterns for Your Digital Enterprise
Sriskandarajah Suhothayan (Suho) Associate Director/Architect
WSO2
How do you like your boss to be ?
You are Fired !!!
Without Analytics !
We will make your Analytics Great Again !!!
We will cover
• Introduction to WSO2 Data Analytics Server
• Analytics Patterns
• Smart Analytics Solutions
Smart Analytics
Creating realtime, intelligent, actionable business insights and data products
WSO2 Data Analytics Server
Realtime Incremental Intelligent
WSO2 DAS Architecture
Market Recognition
Named as a Strong Performer in The Forrester Wave™: Big Data Streaming Analytics, Q1 2016.
• Highest score possible in 'Acquisition and Pricing' criteria
• Second-highest scores in 'Ability to execute' criteria
The Forrester Report notes:
“WSO2 is an open source middleware provider that includes a full spectrum of architected-as-one components such as application servers, message brokers, enterprise service bus, and many others.
Its streaming analytics solution follows the complex event processor architectural approach, so it provides very low-latency analytics. Enterprises that already use WSO2 middleware can add CEP seamlessly. Enterprises looking for a full middleware stack that includes streaming analytics will find a place for WSO2 on their shortlist as well.”
Receivers
+ Pluggable Custom Receivers
Event Streams
Event Stream Schema
Name    : TemperatureStream
Version : 1.0

Attribute     Type
sensorID      String
temperature   double
pressure      double

Event
StreamID      TemperatureStream:1.0
Timestamp     1487270220419
sensorID      AP234
temperature   23.5
pressure      94.2
SourceIP      168.50.24.2
+ Support for arbitrary key-value pairs
Realtime Processing
• Process events in streaming fashion (one event at a time)
• Processing topology (Execution Plan)
– Written in Siddhi Query Language
– Runs in isolation
– Includes
  • Queries
  • Input event streams
  • Output event streams
Realtime Processing Patterns
• Transformation
– projection, transformation, enrich, split
• Temporal Aggregation
– basic stats, group by Aggregation, moving averages
• Alert and Threshold
• Event Correlation
• Trends
– detecting rise, fall, turn, triple bottom
• Partition
• Join Streams
• Datastore Querying
Siddhi Query Syntax
define stream <event stream> (<attribute> <type>, <attribute> <type>, ...);

from <event stream>
select <attribute>, <attribute>, ...
insert into <event stream>;
Basic Patterns
define stream SoftDrinkSales (region string, brand string, quantity int, price double);

from SoftDrinkSales[price >= 100]#window.time(1 hour)
select region, brand, avg(quantity) as avgQuantity
group by region, brand
having avgQuantity > 1000
insert into HighHourlySales;
Temporal Aggregation, Transformation, Threshold & Filtering
Other supported window types: timeBatch(), length(), lengthBatch(), etc.
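To make the semantics of the query above concrete, here is a minimal Python sketch of the same filter, sliding time window, group-by average and having clause. It is an illustration only, not how Siddhi is implemented: the class name `HourlySalesMonitor` and its parameters are hypothetical, timestamps are assumed to be in seconds, and output is produced only when an event arrives (a real sliding window also reacts when events expire).

```python
from collections import deque


class HourlySalesMonitor:
    """Illustrative stand-in for the HighHourlySales query (not Siddhi itself)."""

    def __init__(self, window_seconds=3600, min_price=100.0, avg_threshold=1000.0):
        self.window_seconds = window_seconds
        self.min_price = min_price
        self.avg_threshold = avg_threshold
        self.window = deque()  # (timestamp, region, brand, quantity), oldest first

    def on_event(self, ts, region, brand, quantity, price):
        # Filter: SoftDrinkSales[price >= 100]
        if price < self.min_price:
            return None
        self.window.append((ts, region, brand, quantity))
        # Sliding window: drop events older than one hour
        while self.window and self.window[0][0] <= ts - self.window_seconds:
            self.window.popleft()
        # group by region, brand / avg(quantity)
        quantities = [q for (_, r, b, q) in self.window if (r, b) == (region, brand)]
        avg_quantity = sum(quantities) / len(quantities)
        # having avgQuantity > 1000
        if avg_quantity > self.avg_threshold:
            return {"region": region, "brand": brand, "avgQuantity": avg_quantity}
        return None
```

Each call to `on_event` plays the role of one event arriving on SoftDrinkSales; a non-None return value corresponds to an event inserted into HighHourlySales.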
Event Correlation Pattern
define stream Purchase (price double, cardNo long, place string);

from every (a1 = Purchase[price < 10]) -> a2 = Purchase[price > 10000 and a1.cardNo == a2.cardNo]
within 1 day
select a1.cardNo as cardNo, a2.price as price, a2.place as place
insert into PotentialFraud;
Only Supported in CEP Systems!!!
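The pattern above (a small purchase followed, within a day, by a very large purchase on the same card) can be sketched in plain Python as follows. This is a simplified illustration, not Siddhi's pattern engine: `FraudPatternDetector` is a hypothetical name, timestamps are assumed to be in milliseconds, and stale pending entries are only discarded when a large purchase for that card arrives.

```python
from collections import defaultdict

DAY_MS = 24 * 60 * 60 * 1000


class FraudPatternDetector:
    """Illustrative sketch of 'every a1 -> a2 within 1 day' keyed by card number."""

    def __init__(self, within_ms=DAY_MS):
        self.within_ms = within_ms
        # cardNo -> timestamps of small purchases still waiting for a large one
        self.pending = defaultdict(list)

    def on_purchase(self, ts, price, card_no, place):
        alerts = []
        if price > 10000:
            # Each pending small purchase is a separate pattern instance;
            # the large purchase completes those that are still within the window.
            for start_ts in self.pending.pop(card_no, []):
                if ts - start_ts <= self.within_ms:
                    alerts.append({"cardNo": card_no, "price": price, "place": place})
        elif price < 10:
            # 'every' means each small purchase starts a new pattern instance
            self.pending[card_no].append(ts)
        return alerts
```

Every dictionary returned by `on_purchase` corresponds to one event on the PotentialFraud stream.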
Data Persistence
• Provides a datastore-agnostic way to store and retrieve data
• Provides standard REST API
• Pluggable data connectors
– RDBMS
– Cassandra
– HBase
– custom ...
Data Abstraction Layer
Custom
Siddhi Event Table and Join
@from(eventtable = 'rdbms', datasource.name = 'CardDataSource', table.name = 'UserTable', caching.algorithm = 'LRU')
define table CardUserTable (name string, cardNum long);

from Purchase as p join CardUserTable as c
on p.cardNo == c.cardNum
select p.cardNo as cardNo, c.name as name, p.price as price
insert into PurchaseUserStream;
Supported for RDBMS, In-Memory, Distributed In-Memory Grid (Hazelcast), WSO2 Analytics Table
Cache used to improve performance
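As a rough illustration of why the cache helps, the sketch below joins a purchase stream against a lookup table through a small LRU cache, so repeated card numbers do not hit the backing store again. It is a sketch under stated assumptions: `LRUCache`, `join_purchases` and the `lookup_name` callback are hypothetical stand-ins for the event table and its datasource, not WSO2 APIs.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny least-recently-used cache in front of a slow lookup function."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key, loader):
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            return self.items[key]
        value = loader(key)  # cache miss: go to the backing table
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry
        return value


def join_purchases(purchases, lookup_name, cache):
    """Inner join of purchase events with the card-holder table on card number."""
    for p in purchases:
        name = cache.get(p["cardNo"], lookup_name)
        if name is not None:
            yield {"cardNo": p["cardNo"], "name": name, "price": p["price"]}
```

In this sketch a purchase with an unknown card number produces no output, matching the inner-join behaviour of the query above.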
Incremental Processing Patterns
• Periodic Analysis
• Incremental Analysis
– on newly arrived data
• Lambda Architecture
• Realtime Incremental Analytics
– on newly arrived data with low latency
Periodic Analytics Pattern
• Runs through the full data set
• Summarize data periodically
• E.g: Identifying median
• Supported with WSO2 DAS
– Spark Script Scheduling
– Siddhi Batch windows
Incremental Analytics Pattern
• For incremental Big Data processing
• Periodically process the newly arrived data
• Via Extended Spark

create temporary table orders using CarbonAnalytics options (tableName "ORDERS",
schema "customerID STRING, phoneType STRING, OrderID STRING, cost DOUBLE, _timestamp LONG -i",
incrementalParams "orders, 60");
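The idea behind the incremental pattern can be sketched in Python: remember the timestamp of the last processed record and, on each scheduled run, fold only newer records into the existing summary. This is a conceptual sketch, not the DAS implementation; `IncrementalCostSummary` is a hypothetical name, and the meaning of the second `incrementalParams` value ("60") is not modelled here.

```python
from collections import defaultdict


class IncrementalCostSummary:
    """Keeps a per-customer cost total and only processes newly arrived orders."""

    def __init__(self):
        self.last_processed_ts = -1  # nothing processed yet
        self.total_cost = defaultdict(float)

    def run(self, orders):
        # Select only the rows that arrived after the previous run
        new_rows = [o for o in orders if o["_timestamp"] > self.last_processed_ts]
        for o in new_rows:
            self.total_cost[o["customerID"]] += o["cost"]
        if new_rows:
            # Commit the new high-water mark once the batch has been summarized
            self.last_processed_ts = max(o["_timestamp"] for o in new_rows)
        return len(new_rows)
```

Calling `run` repeatedly over a growing table touches each order once, which is what distinguishes this from the periodic pattern that rescans the full data set.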
Lambda Architecture Pattern
http://lambda-architecture.net/
Supported with Siddhi Event Tables
Realtime Incremental Analytics Pattern
• Low latency and low resource utilization
• Works for both short and long term streaming data
• Enhanced version of Lambda Architecture
Realtime incremental processing (Seconds & Minutes)
Batch incremental processing (Hour and above)
Communicate
Alerts, Dashboards, Interactive Queries, APIs
Publishers
+ Pluggable Custom Publishers
Dashboards
• Dashboard generation
• Gadget generation
• Gather data via
– Websockets
– Polling
• Custom & Personalized Gadget and Dashboard support
Interactive Queries
• Full text search
• Drilldown search
• Near real time data indexing and retrieval
• Powered by Apache Lucene
Intelligent Processing Patterns
• Build and Run ML Models
• Streaming ML
• Anomaly Detection
• Detect Rare Activity Sequences
• Scoring
• Realtime Risk Detection
Predictive Analytics
• Guided UI to build Machine Learning models with
– Apache Spark MLlib
– H2O.ai (for deep learning algorithms)
• Build with R and export them as PMML
• Run built models against realtime data in DAS
Real time Prediction
Using built machine learning models
from DataStream#ml:predict("/home/user/ml.model", "double")
select *
insert into PredictionStream;
Or use R scripts, Regression, Markov Chains or Anomaly Detection on realtime data
Analytics Extensions Store
• geo: Geographical processing
• nlp: Natural Language Processing (with Stanford NLP)
• ml: Running Machine Learning and PMML models
• timeseries: Regression and time series
• math: Mathematical operations
• str: String operations
• regex: Regular expression
• more ...
https://store.wso2.com/
Smart Analytics Solutions
WSO2 DAS Solutions
• Banking and Finance
– Fraud Detection & Anti Money Laundering
– Risk Management
– Stock Market Surveillance
• eCommerce and Digital Marketing
• Fleet Management
• Smart Energy Analytics
• QoS Enablement
• System and Network Monitoring
• Healthcare
Fraud Detection & Anti Money Laundering
Applies for
• Payment Fraud
• Money Laundering
• Identity Fraud
Ways to solve
• Static Rules
• Fraud Scoring
• Machine Learning
• Markov Models
Avoid False Positives with Scoring Pattern
• You just bought a diamond ring
• You bought 20 diamond rings within 15 minutes at 3am and shipped them to 4 global locations?
• Use a combination of rules
• Give weights to each rule
• Single number that reflects multiple fraud indicators
• Use a threshold to reject transactions
Score = 2*X + 4*Y + 13*Z
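A minimal Python sketch of the scoring formula above, assuming X, Y and Z are 0/1 indicators for whether a given rule fired on a transaction. The weights come from the formula; the rejection threshold of 15 and the function names are hypothetical choices made only for illustration.

```python
# Weights taken from the formula: Score = 2*X + 4*Y + 13*Z
RULE_WEIGHTS = {"X": 2, "Y": 4, "Z": 13}


def fraud_score(fired):
    """Sum the weights of all rules that fired (fired maps rule name -> bool)."""
    return sum(weight for rule, weight in RULE_WEIGHTS.items() if fired.get(rule))


def should_reject(fired, threshold=15):
    """Reject when the combined score reaches the (assumed) threshold."""
    return fraud_score(fired) >= threshold
```

With these assumed values a single weak indicator such as X alone scores 2 and passes, while Y and Z together score 17 and are rejected, which is how the pattern reduces false positives compared with rejecting on any single rule.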
Machine Learning Pattern
• Identify ‘unknown’ types of fraud
• Use classification techniques
• Model randomly changing systems
• Detect using Siddhi Markov Models
Detect Sequence of Rare Activities
https://en.wikipedia.org/wiki/Markov_chain
Investigation
Capture alert & further investigate with interactive analytics
Banking and Finance: Risk Management
• End of Day Risk processing is no longer adequate
• Support Realtime Intra-day Value at Risk computations
• Calculated using realtime
– Market prices
– Portfolio changes
Realtime Value at Risk
WSO2 DAS models Value at Risk using 3 standard methods
• Historical Simulation
• Variance-Covariance
• Monte Carlo Simulation
Query:
from InputStream#var:historical(251, 0.95, Symbol, Price)
select *
insert into VaRStream;
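For readers unfamiliar with the historical simulation method, the following Python sketch shows the general technique: derive returns from a price history, take the empirical (1 - confidence) quantile, and report the corresponding loss. It is not the code behind the `var:historical` extension; the function name, the `position_value` parameter and the quantile convention are assumptions, and the slide does not state what the arguments 251 and 0.95 mean (a plausible reading is a history of 251 prices and a 95% confidence level).

```python
def historical_var(prices, confidence=0.95, position_value=1.0):
    """Value at Risk by historical simulation over a single price series."""
    if len(prices) < 2:
        raise ValueError("need at least two prices to compute a return")
    # Simple period-over-period returns, sorted from worst to best
    returns = sorted(prices[i] / prices[i - 1] - 1.0 for i in range(1, len(prices)))
    # Index of the empirical (1 - confidence) quantile (floor convention, assumed)
    idx = int((1.0 - confidence) * len(returns))
    worst_expected_return = returns[idx]
    # VaR is reported as a non-negative loss amount
    return max(0.0, -worst_expected_return * position_value)
```

Running this over a sliding batch of recent prices each time a new price arrives would approximate the intra-day, realtime behaviour described above.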
Banking and Finance: Stock Market Surveillance
Manipulation Methods
• Front Running
• Pump (and Dump)
• Insider Dealing
• Wash Trading
• Churning
• more ...
Banking and Finance: Stock Market Surveillance ...
Manipulated via :
• Artificially inflating or deflating stock prices
• Exploiting prior knowledge of company proceedings
• Abusing advanced knowledge of pending orders
Solved via :
Joining market data feeds with external data streams such as company announcements, news feeds, Twitter streams, etc.
Client Front Running Detection with Event Correlation Pattern
eCommerce and Digital Marketing: Recommendations
Based on :
• Customer buying history
• Item buying history
• Current trends
• Machine Learning
Customers are likely to choose recommendations as they are personalized.
Proximity Marketing
Heat Map Analysis
eCommerce and Digital Marketing: Ad Optimisations
Achieve :
• More clicks
• More conversions
• Effective use of allocated budget
• Higher click-through rate (CTR)
• Greater ROI
Fleet Management
You can know
• Where is your fleet?
• Driving behaviour
• Are vehicles used optimally?
– Fuel expenses
– Travel time
– Round trip time
• Current situation on the road
• more ...
Fleet Management Dashboard
Smart Energy Analytics
• Optimize Smart Grids
– Analyse energy demand
– Predict required energy supply
• Understand steady state operations
• Act on events in energy network
• Monitor process and equipment on energy network
• more ...
Smart Building / Home Analytics
• Monitor and manage equipment
• Home Automation
• Surveillance
• Maintenance
• Edge Analytics with Siddhi
• more ...
QoS Enablement, Network & System Monitoring
• Real-Time Botnet Traffic Detection
• Auto scaling based on
– CPU utilisation
– Memory consumption
– Load average
– Request count
– etc ...
QoS Enablement, Network & System Monitoring ...
• Throttling (E.g. In WSO2 APIM)
• Multiple throttling levels
– API
– Application
– Resource URL
– Subscription level
• Supports hierarchical throttling limits
Healthcare
• HL7 Messaging support
• Monitoring Medical Records
– Delay in patient visits
– Alerts based on glucose levels
• Used with WSO2 Integration (ESB)
Key Differentiations
• Realtime analytics at its best
– Rich set of realtime functions
– Sequence and pattern detection
• No code compilations - SQL Like language
• Incremental processing for everyday analytics
• Intelligent decision making with ML and more
• Rich sets of input & output connectors
• High performance and low infrastructure cost
Thank You!