How to apply machine learning to real-time processing
Kai Waehner
MILAN 25-26 NOVEMBER 2016
[email protected] | @KaiWaehner | www.kai-waehner.de
© Copyright 2000-2016 TIBCO Software Inc.
Apply Big Data Analytics to Real Time Processing
Analyze and Act on Critical Business Moments
Agenda
1) Machine Learning and Big Data Analytics
2) Building an Analytic Model
3) Real Time Processing
4) Live Demo
1) Machine Learning and Big Data Analytics
Machine Learning
… allows computers to find hidden insights without being explicitly programmed where to look.
Real World Examples of Machine Learning
Spam Detection
Search Results + Product Recommendation
Picture Detection (Friends, Locations, Products)
Machine Learning is already present in daily life…
Now, every enterprise is beginning to leverage it!
The Next Disruption: Google Beats Go Champion
Analytics Maturity Model
[Figure: value to the organization (from immediate value to long-term competitive advantage) plotted against analytics maturity. Stages: Measure, Diagnose, Predict, Optimize, Alert, Automate. Capabilities: Self-service Dashboards, Visual Analytics, Advanced Analytics, Event Processing.]
A good Big Data Analytics platform can provide value to the organization across the full spectrum of use cases
2) Building an Analytic Model
Analytical Pipeline
Variety of Data in Enterprises
[Figure: data sources reachable from the analytics platform, grouped by access path]
• Local data sources (drag-and-drop): Access, Excel, STDF, flat files, spreadsheets, XML
• Databases via JDBC/ODBC, through Information Services (join, transform, reusable, parameterized, dynamic query for in-memory use): MySQL, SQL Server, Oracle, PostgreSQL, Teradata, Netezza, Hadoop, SFDC, etc.
• Databases via direct connection (ODBC, OLE DB, SqlClient), through Direct Query (dynamically query and retrieve data for visualization and analysis): Oracle, Teradata, Teradata Aster, MSS, SAS, MySQL, OBIEE, Netezza, Hadoop, etc.
• Custom GUI-driven data access via SDK: Siebel eBusiness, Oracle E-Business, SAP BW, SAP R/3, Salesforce, web services, RDBMS
• Data fabric
Data Acquisition
“Smart Recommendation Engine”
     cust_id  dept  sku    dollar  gift   date
1    104      C     12003   2.40   FALSE  2016-10-17
2    105      A     12005  62.85   FALSE  2016-10-17
3    102      C     12007  69.23   TRUE   2016-10-17
4    104      B     12004   9.33   FALSE  2016-10-18
5    105      C     12010  14.16   TRUE   2016-10-18
6    101      B     12003  90.43   FALSE  2016-10-19
7    103      C     12005  90.97   FALSE  2016-10-19
n    …        …     …      …       …      …

     cust_id  A      B      C      total   # orders  first_date  last_date
1    100      21.76  23.67   0.00   45.43  2         2016-10-19  2016-10-20
2    101       0.01  74.65   0.00   74.66  3         2016-10-19  2016-10-20
3    102       0.00  60.92  50.29  111.21  6         2016-10-17  2016-10-20
4    103       0.00   0.00  52.30   52.30  2         2016-10-19  2016-10-20
5    104      31.34   9.33   2.40   43.06  4         2016-10-17  2016-10-20
6    105      62.85   0.00  56.00  118.85  3         2016-10-17  2016-10-20
Data Munging - Transformations
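The transformation on this slide, from one row per order to one row per customer, can be sketched in plain Python. This is a minimal illustration, not the tooling used in the talk: the function name `customer_features` and the output keys are assumptions, and only the seven visible order rows are used, so the numbers differ from the slide's customer table, which was built from the full data set.

```python
from collections import defaultdict
from datetime import date

# The seven visible order rows from the slide:
# (cust_id, dept, sku, dollar, gift, date)
transactions = [
    (104, "C", 12003, 2.40, False, date(2016, 10, 17)),
    (105, "A", 12005, 62.85, False, date(2016, 10, 17)),
    (102, "C", 12007, 69.23, True, date(2016, 10, 17)),
    (104, "B", 12004, 9.33, False, date(2016, 10, 18)),
    (105, "C", 12010, 14.16, True, date(2016, 10, 18)),
    (101, "B", 12003, 90.43, False, date(2016, 10, 19)),
    (103, "C", 12005, 90.97, False, date(2016, 10, 19)),
]

DEPARTMENTS = ("A", "B", "C")


def customer_features(rows):
    """Pivot one-row-per-order data into one-row-per-customer features."""
    spend = defaultdict(lambda: dict.fromkeys(DEPARTMENTS, 0.0))
    orders = defaultdict(int)
    first, last = {}, {}
    for cust_id, dept, _sku, dollar, _gift, day in rows:
        spend[cust_id][dept] += dollar          # spend per department
        orders[cust_id] += 1                    # number of orders
        first[cust_id] = min(first.get(cust_id, day), day)
        last[cust_id] = max(last.get(cust_id, day), day)

    features = {}
    for cust_id in sorted(spend):
        row = {dept: round(amount, 2) for dept, amount in spend[cust_id].items()}
        row["total"] = round(sum(spend[cust_id].values()), 2)
        row["orders"] = orders[cust_id]
        row["first_date"] = first[cust_id]
        row["last_date"] = last[cust_id]
        features[cust_id] = row
    return features


features = customer_features(transactions)
```

Each entry of `features` corresponds to one line of the per-customer table: spend per department, total spend, order count, and first and last order date.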
“The greatest value of a picture is when it forces us to notice what we never expected to see”
John W. Tukey, 1977
Exploratory Data Analysis
Visual Analytics - Interactive Brush-Linked
… and “Inline Data Wrangling” → Ad-hoc data preparation instead of just ETL
Which picture represents a model?
A model is a simplification of the truth that helps you with decision making.
Model Building
Employees who write longer emails earn higher salaries!
Model Improvement
Managers
Staff
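One reading of the Managers / Staff split is that the email-length model improves once the data is segmented by role. The sketch below illustrates that idea with made-up numbers (email length in words, salary in thousands); the data and the helper `slope` are assumptions for illustration, not figures from the talk.

```python
def slope(points):
    """Ordinary least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    return cov / var


# Synthetic (email_length, salary) pairs, invented for this sketch.
staff = [(50, 40), (100, 41), (150, 39)]
managers = [(200, 80), (250, 79), (300, 81)]

pooled_slope = slope(staff + managers)   # clearly positive
staff_slope = slope(staff)               # close to zero
manager_slope = slope(managers)          # close to zero
```

In this synthetic data the pooled model suggests that longer emails mean higher salaries, while within each group email length has almost no relationship with salary: the role explains the difference, not the email length.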
Model Validation
How is the IQ of a kid related to the IQ of his / her mum?
Frameworks and Tooling
Advanced Analytics and Big Data Tools (for Data Scientists)
Many more ….
“…as a next-generation data discovery capability that automatically finds and explains insights from advanced analytics to business users or citizen data scientists”
Smart Data Discovery (for the Business User)
Leverage Machine Learning without the help of a Data Scientist
Smart Visual Analytics vs. Data Science Tools
Live Demo
3) Real Time Processing
Traditional Data Processing: “Request – Response”
Store
Analyze
Act
The New Era: Streaming Analytics
Act & Monitor
Analyze
Store
Streaming Analytics - Processing Pipeline
• Stream Ingest: APIs, Adapters / Channels, Integration, Messaging
• Stream Preprocessing: Transformation, Aggregation, Enrichment, Filtering
• Stream Analytics & Processing: Contextual Rules, Windowing, Patterns, Analytics, Deep ML, …
• Stream Outcomes: Process Management, Analytics (Real Time), Applications & APIs, Analytics / DW, Reporting
• Also shown: Index / Search, Normalization
Applying an Analytic Model is just a piece of the puzzle!
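The pipeline stages can be sketched with plain Python generators. This is a minimal illustration under stated assumptions, not any product's API: the event fields (`sensor`, `temperature`), the reference table, the window size, and the threshold are all invented, and a single window is shared across sensors for brevity.

```python
from collections import deque
from statistics import mean


def ingest(raw_events):
    """Stream ingest: hand over events one at a time, as they arrive."""
    for event in raw_events:
        yield event


def preprocess(events, reference):
    """Stream preprocessing: filter incomplete events and enrich the rest."""
    for event in events:
        if event.get("temperature") is None:      # filtering
            continue
        enriched = dict(event)
        enriched["line"] = reference.get(event["sensor"], "unknown")  # enrichment
        yield enriched


def analyze(events, window_size=3, threshold=80.0):
    """Stream analytics: sliding-window average plus a simple rule."""
    window = deque(maxlen=window_size)            # windowing
    for event in events:
        window.append(event["temperature"])
        average = mean(window)
        if len(window) == window_size and average > threshold:   # rule
            yield {"alert": "overheating", "line": event["line"],
                   "avg": round(average, 1)}


raw = [
    {"sensor": "s1", "temperature": 70.0},
    {"sensor": "s1", "temperature": None},
    {"sensor": "s1", "temperature": 85.0},
    {"sensor": "s1", "temperature": 90.0},
    {"sensor": "s1", "temperature": 95.0},
]
alerts = list(analyze(preprocess(ingest(raw), {"s1": "assembly-1"})))
```

Because each stage is a generator, an event flows through ingest, preprocessing, and analytics before the next one is read, which is the "act while the data is in motion" idea behind the pipeline.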
Frameworks and Products
(not a complete list!)
[Figure: stream processing frameworks and products arranged along two axes, open source vs. closed source and framework vs. product. Legible label: Microsoft Azure Stream Analytics.]
Comparison of Stream Processing Frameworks and Products
Slide Deck and Video Recording: http://www.kai-waehner.de/blog/2016/11/15/streaming-analytics-comparison-open-source-frameworks-products-cloud-services/
Apache Storm – Hello World
http://wpcertification.blogspot.ch/2014/02/helloworld-apache-storm-word-counter.html
Visual Coding for Streaming Analytics
• Streaming Operators
• Connectivity
• Visual Development
• Testing & Simulation
• Mature Tooling / Support
• Middleware Integration
Live Visual Analytics UI
Dynamic aggregation
Live visualization
Ad-hoc continuous query
Alerts
Action
How to apply analytic models to real time processing without redevelopment?
[Figure: Stream Processing at the center, connected to the tools that produce analytic models]
• H2O.ai
• Open Source R
• TERR
• Spark ML
• MATLAB
• SAS
• PMML
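One way to read "without redevelopment" is that the model is exported as an artifact and the streaming application only loads and evaluates it. The sketch below uses a hypothetical JSON export of a logistic regression as a stand-in for formats such as PMML; the field names, feature names, and coefficients are invented for illustration and do not reflect any specific tool's export format.

```python
import json
import math

# Hypothetical model export; all names and numbers are made up for this sketch.
MODEL_JSON = """
{
  "intercept": -4.0,
  "coefficients": {"rework_count": 1.5, "deviation_mm": 2.0}
}
"""


def load_model(text):
    """Deserialize the exported model once and return a scoring function."""
    spec = json.loads(text)

    def score(event):
        z = spec["intercept"] + sum(
            weight * event[name]
            for name, weight in spec["coefficients"].items()
        )
        return 1.0 / (1.0 + math.exp(-z))   # logistic function

    return score


def score_stream(events, score):
    """Attach a prediction to every event as it passes through."""
    for event in events:
        yield {**event, "p_scrap": score(event)}


score = load_model(MODEL_JSON)
events = [
    {"part": "p-1", "rework_count": 0, "deviation_mm": 0.1},
    {"part": "p-2", "rework_count": 3, "deviation_mm": 0.5},
]
scored = list(score_stream(events, score))
```

Retraining then means shipping a new export; `score_stream` itself does not change.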
TIBCO StreamBase Connector for H2O.ai
4) Live Demo
Scenario: Predictive Scrapping of Parts in an Assembly Line
Goal: Scrap parts as early as possible automatically to reduce costs in a manufacturing process.
Question: When to scrap a part in Station 1 instead of doing re-work or sending it to Station 2?
          Before   Station 1   Station 2   Total Cost
Cost      9€       7€          13€         29€ (or more)
Scrap?             Scrap?      Scrap?
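The scrap question can be framed as an expected-value comparison. The sketch below uses the station costs from the slide; the value of a finished good part (`part_value`) and the predicted scrap probability (`p_scrap`) are assumptions, and the re-work option is ignored for simplicity.

```python
COST_BEFORE = 9.0       # cost accumulated before Station 1 (from the slide)
COST_STATION_1 = 7.0    # cost of Station 1 (from the slide)
COST_STATION_2 = 13.0   # cost of Station 2 (from the slide)


def should_scrap_after_station_1(p_scrap, part_value):
    """Return True if scrapping now has the better expected outcome.

    p_scrap: predicted probability that the part ends up as scrap (assumed
             to come from an analytic model).
    part_value: hypothetical value of a finished good part.
    """
    # Scrapping now loses only what has been spent so far.
    scrap_now = -(COST_BEFORE + COST_STATION_1)
    # Continuing spends the Station 2 cost and pays off only for good parts.
    total_cost = COST_BEFORE + COST_STATION_1 + COST_STATION_2
    continue_on = (1.0 - p_scrap) * part_value - total_cost
    return scrap_now > continue_on


decision_high_risk = should_scrap_after_station_1(p_scrap=0.8, part_value=50.0)
decision_low_risk = should_scrap_after_station_1(p_scrap=0.5, part_value=50.0)
```

Under these assumptions, scrapping pays off once the expected value of a good part, (1 - p_scrap) * part_value, drops below the 13€ that Station 2 would add.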
Fast Data Architecture for Predictive Maintenance
[Figure: TIBCO Fast Data Platform architecture]
• Inputs: CSV (batch), JSON (real time), XML (real time), internal data
• Streaming Analytics (StreamBase): aggregate, rules, analytics, correlate; action
• Operational Analytics (Live Datamart): continuous query processing, alerts, manual action, escalation; Operations Live UI
• Historical Analysis (data scientists): Flume, HDFS, Hadoop (Cloudera), Oracle RDBMS, Spotfire, R/TERR, H2O
• Formats: Avro, Parquet, …, PMML
TIBCO Spotfire with H2O Integration
Data Discovery / Data Mining (“Are parts that repeat a station more likely scrap parts?”)
TIBCO Live Datamart
Operational Intelligence (“Monitor the manufacturing process and change rules in real time!”)
Live Datamart Desktop Client
TIBCO Live Datamart
Operational Intelligence (“Monitor the manufacturing process and change rules in real time!”)
Live Datamart Web API
TIBCO Spotfire + StreamBase + H2O.ai + Live Datamart
Live Demo
TIBCO Accelerator for Apache Spark
1. Fast Data Preparation for IoT
Dozens of enterprise and IoT data preparation adapters: MQTT, Databases; inbound creation of HDFS, Parquet, HBase, Avro…
2. Spotfire Model Discovery Template
Use Spotfire to explore the Spark data lake, create a predictive model, train it in H2O, and deploy it to Streaming Analytics.
3. Operationalize Predictive Models
Zookeeper deployment to StreamBase nodes living in the Spark cluster via H2O, PMML, TERR models
4. Streaming Analytics for Automation
Automate action based on predictive models: make offers to customers, stop fraudulent transactions, alert.
5. Monitor & Retrain Model
Monitor behavior of the model, retrain when necessary.
6. Drag & Drop for Business Solution Developers
Code-free development environment for work with H2O, HDFS, Avro, TERR
The TIBCO Accelerator for Spark is a TIBCO engineered, light-weight open-source fast-start for systems to stream data into Spark, discover patterns in Spark with Spotfire, and operationalize the insights on Big Data.
FUNCTIONAL COMPONENTS
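The "monitor and retrain" component can be sketched as a rolling error-rate check. This is a minimal illustration, not the accelerator's implementation: the class `ModelMonitor`, the window size, and the error threshold are assumptions.

```python
from collections import deque


class ModelMonitor:
    """Track recent prediction errors and flag when retraining looks due."""

    def __init__(self, window_size=4, max_error_rate=0.5):
        self.errors = deque(maxlen=window_size)   # True means a wrong prediction
        self.max_error_rate = max_error_rate

    def record(self, predicted, actual):
        """Store whether the latest prediction matched the observed outcome."""
        self.errors.append(predicted != actual)

    def error_rate(self):
        """Share of wrong predictions in the current window."""
        return sum(self.errors) / len(self.errors) if self.errors else 0.0

    def needs_retraining(self):
        """Only decide once the window is full, to avoid reacting to noise."""
        window_full = len(self.errors) == self.errors.maxlen
        return window_full and self.error_rate() > self.max_error_rate


monitor = ModelMonitor()
for predicted, actual in [("ok", "ok")] * 4:
    monitor.record(predicted, actual)
healthy = monitor.needs_retraining()

for predicted, actual in [("ok", "scrap")] * 3:
    monitor.record(predicted, actual)
degraded = monitor.needs_retraining()
```

In a streaming deployment the `record` call would run whenever the true outcome of a scored event becomes known, and a positive `needs_retraining` would trigger the model-building pipeline again.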
Key Take-Aways
• Insights are hidden in Historical Data on Big Data Platforms
• Machine Learning and Big Data Analytics find these Insights by building Analytics Models
• Event Processing uses these Models (without Redevelopment) to take Action in Real Time
Questions? Please contact me!
Kai Wähner
Technology Evangelist at TIBCO
[email protected] | @KaiWaehner | www.kai-waehner.de | LinkedIn