Watson Machine Learning for z/OS
Jamar Smith, Data Scientist, North America z Hybrid Cloud
Agenda
Enterprise Analytics Strategy
Machine Learning Overview
Value of Analytics in Place
IBM Cloud Pak for Data
Current trends in analytics
The need for Pervasive Analytics is increasing in almost every industry
Real time or near real time analytic results are necessary
Need to leverage all relevant data sources available for insight
Ease demands on highly sought-after analytic skill base
Embrace rapid pace of innovation
Data gravity key to enterprise analytics
Podcast: http://www.ibmbigdatahub.com/podcast/making-data-simple-what-data-gravity
Data Gravity
• Data volume is large; distilling data provides operational efficiencies
• Predominance of data originates on IBM Z, z/OS (transactions, member info, …)
• Performance matters for a variety of data on and off IBM Z
• Core transactional systems of record are on IBM Z
• Security / data privacy needs to be preserved
• Real-time / near real-time insights are valuable
IBM Z Analytics
Keep your data in place: a different approach to enterprise analytics
• Keep data in place for analytics
• Keep data in place, encrypted and secure
• Minimize latency, cost and complexity of data movement
• Transform data on platform
• Improve data quality and governance
• Apply the same resiliency to analytics as your operational applications
• Combine insight from structured & unstructured data from z and non-z data sources
• Leverage existing people, processes and infrastructure
What is Machine Learning?
Data → Perform Analysis → Provide Actionable Insight
Computers that learn without being explicitly programmed.
Hint: It’s just a bunch of math.
Traditional decision process
Inputs: Loan Application, House Data, Warranty Resolution, Customer Satisfaction
Decisions: Approve or Reject, Appraise Home Value, Predict Causality, Churn
Decision process with ML
Inputs: Loan Application, House Data, Warranty Resolution, Customer Satisfaction
Represents a pattern with a mathematical function: f(x)
Decisions: Approve or Reject, Appraise Home Value, Predict Causality, Churn
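To illustrate the contrast between the two decision slides, here is a minimal Python sketch that puts a hand-written rule next to a function f(x). The feature names, weights, bias and threshold are invented for illustration and are not taken from the deck. In a real project the weights would be learned from historical data.

```python
import math

def rule_based_decision(income, debt):
    """Traditional process: a person writes the rule explicitly."""
    return "approve" if income > 50_000 and debt < 10_000 else "reject"

# Hypothetical weights; in practice these are learned from historical data.
WEIGHTS = {"income": 0.00004, "debt": -0.0001}
BIAS = -1.0

def f(x):
    """Logistic function: maps applicant features to a probability in (0, 1)."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in x.items())
    return 1.0 / (1.0 + math.exp(-z))

def ml_decision(x, threshold=0.5):
    """Turn the probability from f(x) into an approve/reject decision."""
    return "approve" if f(x) >= threshold else "reject"

applicant = {"income": 80_000, "debt": 5_000}
probability = f(applicant)
decision = ml_decision(applicant)
print(decision, round(probability, 3))
```

The rule and the function can agree on a given applicant. The difference is that the function's parameters come from data rather than from a person encoding the policy by hand.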
What’s involved in Machine Learning
Machine learning process
• Build a model using a subset of the data
• Deploy the model to score against new data
Model management
• Monitor model performance over time
• Retrain model if performance has degraded
Machine learning prep
• Clearly define business problem
• Select data set to address business problem
• Transform data
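The build, deploy, monitor and retrain steps can be sketched in plain Python. This is a toy under stated assumptions, not how WML for z/OS implements them: the model is a one-feature threshold classifier, the data is synthetic, and the `RETRAIN_BELOW` cutoff is an arbitrary illustrative value.

```python
import random

def train(rows):
    """Fit a one-feature threshold model: pick the cut that best separates labels."""
    best_cut, best_correct = None, -1
    for cut in sorted({x for x, _ in rows}):
        correct = sum((x >= cut) == bool(y) for x, y in rows)
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut

def score(model, x):
    """Score one new record against the deployed model."""
    return 1 if x >= model else 0

def accuracy(model, rows):
    return sum(score(model, x) == y for x, y in rows) / len(rows)

def make_data(n, boundary, rng):
    """Synthetic records: label is 1 when the feature is at or above `boundary`."""
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 100)
        rows.append((x, 1 if x >= boundary else 0))
    return rows

rng = random.Random(0)
history = make_data(200, boundary=60, rng=rng)
train_rows, holdout = history[:150], history[150:]

model = train(train_rows)              # build on a subset of the data
baseline = accuracy(model, holdout)    # evaluate on records not used for training

# Later the underlying behaviour shifts, so the deployed model degrades.
new_rows = make_data(200, boundary=30, rng=rng)
drifted = accuracy(model, new_rows)    # monitor performance over time

RETRAIN_BELOW = 0.9                    # illustrative threshold, not a product default
if drifted < RETRAIN_BELOW:
    model = train(new_rows[:150])      # retrain when performance has degraded
recovered = accuracy(model, new_rows[150:])
print(baseline, drifted, recovered)
```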
Why Machine Learning?
Tap into the rich value of historical data
Discover insights and generate predictive models to make better decisions
Don’t just generate reports, use predictive analytics
Predictive analytics in the future means things like:
• Fraud detection
• Optimization of resources
• Infinite others, all meant to increase revenue or provide savings
The value of machine learning is rooted in its ability to create accurate models to guide future actions and to discover patterns that we’ve never seen before
QMF: Move Towards ML with BI
Start with the Data! Create stories with the data that influence questions about the business
QMF is the BI tool on IBM Z for the first step toward a data-driven enterprise
Analytics Stack on IBM Z Driven by Data Gravity
[Figure: analytics stack on IBM Z, by layer]
DATA: Merchants, Transactions, Client Data, User Behavior, News, Twitter
Data access components shown: Optimized Data Access Layer, Data Virtualization Manager (DVM), Db2 Analytics Accelerator (IDAA), HTAP, Data Warehouse Engine, Loader
ANALYTICS ENGINE: Open Data Analytics for z/OS (Spark, Anaconda); Spark clusters on z/OS and on distributed platforms
MACHINE LEARNING PLATFORM: Machine Learning for z/OS
MACHINE LEARNING SOLUTIONS: IBM Z Operations Analytics (IZOA, formerly IOAz), ML based Anti-Fraud Solutions, Db2 AI for z/OS (Db2ZAI)
Open Source at its Core
[Figure: open source at the core of the stack]
Business Applications
Data: Customer, Transaction, Merchant
Distributed Apache Spark, Distilled Insight, Query Acceleration, Analytic Result Sets
Federate analytics leveraging data in place for more current insights at scale, optimized security, privacy and reduced costs
Govern, Manage, Algorithm Assist…, Monitor, Feedback
Platform: Pauseless GC, New SIMD instructions, 32 TB Memory, Pervasive Encryption
Software: IBM Machine Learning for z/OS, Python, IBM Open Data Analytics for z/OS, Optimized Data Integration Layer
Flow: Data, Data Prep, ML Algo, Model, Deploy, Predict
Full Lifecycle Machine Learning Platform
Stages: Ingest, Data Preparation, Explore & Visualize, Train & Evaluate Models, Deploy, Go Live: Predict and Monitor
Roles: Data Engineers, Data Scientists, Application Developers, Production Engineers
• Platform agnostic model development
• Leverage open source software
• Real-time insight with transactions
• Insight incorporated from any platform
• Industry leading encryption, security, reliability & availability
Tools for Both Coders and Non-Coders
• Visual productivity tool around data science
• Open-source data science tools (Python, Spark, Jupyter Notebooks)
• Quicker time to value
• Inclusion of full-fledged data preparation and many machine learning algorithms
PROGRAMMATIC
• Trained using open source or self-taught
• Works within a start-up, technology firm, CIO office or dedicated
• Background in mathematics, computer science
• Uses programming languages, APIs and avoids packages
VISUAL
• Commercial tools (SPSS)
• Line of business/solution focused
• Trained in data mining/analytic methodology
• Background in social sciences, economics, mathematics
Better Together
Utilities to accelerate every stage of Machine Learning
Auto data preparation (ADP)
Automatically analyzes input data and prepares it for training
• Fills missing values
• Encodes/decodes categorical data
• Indexes string data
• Groups all numeric types into vectors
• Normalizes data
Auto modelling
Cognitive assistant for data scientists (CADS)
• Selects the algorithm with the best performance from a set of candidates
Hyperparameter optimization (HPO)
• Selects the hyperparameter with the best performance from a set of candidates given a specific algorithm
CADS and HPO use the performance of models on small data sets to predict performance on large data sets. They use ML to facilitate ML.
Auto feature engineering
Automatically recommends the feature set that can produce the model with the best accuracy
• Joins multiple tables and automatically selects relevant features
• Feature selection based on underlying correlation analysis
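As a rough picture of what the data preparation steps do, the sketch below fills missing values, indexes string data, normalizes a numeric column and groups the results into vectors. It is plain Python and does not use the ADP utility itself. The column names `amount` and `channel`, the mean-fill strategy and min-max scaling are all assumptions chosen for illustration.

```python
from statistics import mean

def fill_missing(values):
    """Replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def index_strings(values):
    """Map each distinct string to an integer index, in order of first appearance."""
    mapping = {}
    for v in values:
        mapping.setdefault(v, len(mapping))
    return [mapping[v] for v in values], mapping

def min_max_normalize(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def prepare(rows):
    """Apply the steps in order and group the numeric results into one vector per row."""
    amounts = min_max_normalize(fill_missing([r["amount"] for r in rows]))
    channels, channel_index = index_strings([r["channel"] for r in rows])
    vectors = [[a, float(c)] for a, c in zip(amounts, channels)]
    return vectors, channel_index

rows = [
    {"amount": 10.0, "channel": "web"},
    {"amount": None, "channel": "atm"},   # missing value to be filled
    {"amount": 30.0, "channel": "web"},
]
vectors, channel_index = prepare(rows)
print(vectors)        # [[0.0, 0.0], [0.5, 1.0], [1.0, 0.0]]
print(channel_index)  # {'web': 0, 'atm': 1}
```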
ML for z/OS Fraud detection solution templates
Tree based sampling for skewed data
• Data for fraud detection are generally skewed, e.g. 1/5000 fraud ratio
  – Leads to biased model
• Random sampling method may lead to information loss and unstable model performance
• Tree based sampling method to populate training data set
• Goal/Results
  – Amplify probability of discovering fraud from the data
  – Minimize false positives and maximize finding truly positive fraud
Sample the records in every leaf node
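The deck does not spell out the sampling algorithm, so the following is only one plausible reading of "sample the records in every leaf node". Records are grouped by the leaf they fall into, a fixed number of non-fraud records is drawn from each leaf, and every fraud record is kept. The `leaf_of` function is a hypothetical stand-in for a fitted decision tree. The field names, the `per_leaf` value and the keep-all-fraud rule are assumptions.

```python
import random
from collections import defaultdict

def leaf_of(record):
    """Hypothetical stand-in for a fitted decision tree: returns a leaf id."""
    if record["amount"] < 100:
        return "small"
    return "large_foreign" if record["foreign"] else "large_domestic"

def sample_per_leaf(records, per_leaf, rng):
    """Keep all fraud records and draw up to `per_leaf` legitimate ones from each leaf."""
    leaves = defaultdict(list)
    for r in records:
        leaves[leaf_of(r)].append(r)
    sample = []
    for leaf_id in sorted(leaves):
        members = leaves[leaf_id]
        fraud = [r for r in members if r["fraud"]]
        legit = [r for r in members if not r["fraud"]]
        sample.extend(fraud)
        sample.extend(rng.sample(legit, min(per_leaf, len(legit))))
    return sample

rng = random.Random(42)
records = [
    {"amount": rng.uniform(1, 500), "foreign": rng.random() < 0.2, "fraud": False}
    for _ in range(5000)
]
records.append({"amount": 450.0, "foreign": True, "fraud": True})  # roughly 1 in 5000

training = sample_per_leaf(records, per_leaf=50, rng=rng)
fraud_ratio_before = sum(r["fraud"] for r in records) / len(records)
fraud_ratio_after = sum(r["fraud"] for r in training) / len(training)
print(len(training), fraud_ratio_before, fraud_ratio_after)
```

Because every leaf contributes records, the training set still covers each region of the feature space that the tree distinguishes, while the fraud ratio rises well above the original 1 in 5000.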
Db2 Health Tree - using IBM WML for z/OS
• Leverages machine learning and data science
• Ingests SMF data for model training and scoring
• Analyzes, monitors, and visualizes large amounts of operational data
  – Builds a hierarchical health tree to represent the health status of the Db2 sub-systems, transactions and individual KPIs
  – Monitors the changes in health status over time
• Highlights abnormal KPIs in a timeline to assist root cause diagnosis
• Uses ML for z/OS functionalities to provide model life cycle management
• Provides real-time scoring capability by adopting SMF real-time interface
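One way to picture a hierarchical health tree is a structure in which each KPI is a leaf and each transaction or sub-system reports the worst status among its children. The sketch below assumes that roll-up rule, which the deck does not state. The three status levels and all node names are hypothetical, and the actual solution derives KPI status from ML models scored on SMF data rather than from hard-coded values.

```python
from dataclasses import dataclass, field

SEVERITY = {"healthy": 0, "warning": 1, "critical": 2}

@dataclass
class Node:
    name: str
    status: str = "healthy"                      # only meaningful for leaves (KPIs)
    children: list = field(default_factory=list)

    def health(self):
        """A leaf reports its own status; a parent reports its worst child (assumed rule)."""
        if not self.children:
            return self.status
        return max((c.health() for c in self.children), key=SEVERITY.__getitem__)

    def abnormal_leaves(self, path=()):
        """List the full path of every KPI that is not healthy."""
        here = path + (self.name,)
        if not self.children:
            return [" / ".join(here)] if self.status != "healthy" else []
        found = []
        for c in self.children:
            found.extend(c.abnormal_leaves(here))
        return found

# Hypothetical sub-system, transactions and KPIs.
tree = Node("DB2A", children=[
    Node("TXN_PAY", children=[Node("cpu_time"), Node("lock_waits", status="critical")]),
    Node("TXN_QUERY", children=[Node("getpages", status="warning")]),
])

print(tree.health())           # critical
print(tree.abnormal_leaves())  # ['DB2A / TXN_PAY / lock_waits', 'DB2A / TXN_QUERY / getpages']
```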
IBM Cloud Pak for Data (ICP4D)
1. Services Ecosystem
With a click, access and deploy an ecosystem of 45+ analytics services and templates from IBM and third parties.
2. Data Virtualization
Quickly and easily query across multiple data sources without moving your data
3. Platform Interface
Speed time-to-value with a single user experience that integrates data management, data governance and analysis for greater efficiency and improved use of resources
4. Red Hat OpenShift®
Leverage the leading hybrid cloud, enterprise container platform for an innovative and fast deployment strategy
5. Any Cloud
Avoid lock-in and leverage all cloud infrastructures with our multi-cloud approach
[Figure: layers shown are Services Layer, Platform Interface Layer, Kubernetes Layer and Infrastructure Layer (On-Premises); callouts 1 to 5 refer to the items above]
ICP4D Use Case with WMLz
- Get Access to Data on and off IBM Z
- Deploy ML models into production at the speed of your business
Summary
• Train anywhere, deploy anywhere: leveraging WMLz for in-transaction scoring
• Data gravity: limiting data movement via coexistence of WMLz with ICP4D
• Several coexistence scenarios: generating benefits of both WMLz and ICP4D
• IBM Db2 Analytics Accelerator: access IDAA directly from WMLz and ICP4D
• Data virtualization: provision Z data to ICP4D via IBM Data Virtualization Manager for z/OS
More resources
Machine Learning and z Systems https://www.youtube.com/watch?v=T2HtyNX7aHc
Machine Learning Launch Event interview https://www.youtube.com/watch?v=WHenFAa6iPw&feature=youtu.be&list=PLenh213llmca-QogcjfSW9RHPtNye9N_p
Gaining Agility with Spark Analytics on z Systems https://www.youtube.com/watch?v=Y7HQbKBR_l4
IBM z/OS Platform for Apache Spark https://www-03.ibm.com/systems/z/os/zos/apache-spark.html
IBM Knowledge Center: z/OS Platform for Apache Spark https://www.ibm.com/support/knowledgecenter/SSLTBW_2.2.0/com.ibm.zos.v2r2.azk/azk.htm
IBM Knowledge Center: IBM Machine Learning for z/OS https://www.ibm.com/support/knowledgecenter/SS9PF4_1.1.0/src/tpc/mlz_home.html
Redbook: Apache Spark Implementation on IBM z/OS http://www.redbooks.ibm.com/redbooks/pdfs/sg248325.pdf
Previous use cases for real-time analytics
Business Challenge Proof of Concept
Global Bank headquartered in Japan
• The bank's current rules-based process for monitoring fraudulent money-transfer transactions is very manual and resource intensive
• IBM worked with the bank’s IMS data to build a model that scored transactions while minimizing the rate of false positives
• Desired solution close to data on Z
Large Bank in the United States
• Need for a more flexible and scalable option to deploy current and future credit card fraud detection models built by their data scientists
• MLz tuned PMML scoring service to meet the bank’s strict SLAs (<5 ms end to end)
• MLz & IzODA together provide the bank with a flexible platform that embraces open standards
Bank serving members of the U.S. military and their families
• Desire to leverage more data sources on IBM Z, and reflect real-time changes in their business environment; Need for an enterprise approach to model deployment and life cycle management
• They were able to leverage more data sources by accessing IBM Z data in place and in real-time by leveraging Db2 Analytics Accelerator (“IDAA”) with MLz
• With IDAA, queries and data access that used to take several hours completed in under 5 seconds
US Credit Union
• Loan underwriting is time-consuming and errors are expensive
• Most loan applications required a skilled underwriter
• Joint development of a binary classification model with loan characteristics as input. The output was an approved / declined score with a greater than 90% accuracy rate
• Desired deployment close to data on Z
Large Healthcare Company
• Deliver greater value to their customers by leveraging technology and analytics
• Achieve the best possible outcomes by getting the most appropriate care delivered at the best price
• Reward diabetic patients, by adjusting their co-pay, for taking actions to manage disease
• Claims data on Db2 for z/OS