Watson Machine Learning for z/OS

Jamar Smith
Data Scientist, North America z Hybrid Cloud
[email protected]


Goal

Demonstrate the value of enterprise analytics on the IBM Z platform.

Agenda

• Enterprise Analytics Strategy
• Machine Learning Overview
• Value of Analytics in Place
• IBM Cloud Pak for Data


Enterprise Analytics Strategy

Current trends in analytics

• The need for pervasive analytics is increasing in almost every industry
• Real-time or near-real-time analytic results are necessary
• Need to leverage all relevant data sources available for insight
• Ease demands on highly sought-after analytic skill base
• Embrace rapid pace of innovation

Data gravity key to enterprise analytics

Podcast: http://www.ibmbigdatahub.com/podcast/making-data-simple-what-data-gravity

Data Gravity

• Data volume is large; distilling data provides operational efficiencies
• Predominance of data originates on IBM Z, z/OS (transactions, member info, …)
• Performance matters for variety of data on and off IBM Z
• Core transactional systems of record are on IBM Z
• Security / data privacy needs to be preserved
• Real-time / near real-time insights are valuable


IBM Z Analytics

Keep your data in place: a different approach to enterprise analytics

• Keep data in place for analytics
  – Keep data in place, encrypted and secure
  – Minimize latency, cost and complexity of data movement
  – Transform data on platform
  – Improve data quality and governance
• Apply the same resiliency to analytics as your operational applications
• Combine insight from structured & unstructured data from z and non-z data sources
• Leverage existing people, processes and infrastructure


Machine Learning Overview

What is Machine Learning?

Data → Perform Analysis → Provide Actionable Insight

Computers that learn without being explicitly programmed.

Hint: It’s just a bunch of math.

Traditional decision process

• Loan Application → Approve or Reject
• House Data → Appraise Home Value
• Warranty Resolution → Predict Causality
• Customer Satisfaction → Churn

Decision process with ML

A mathematical function f(x) represents the pattern that maps each input to its decision:

• Loan Application → f(x) → Approve or Reject
• House Data → f(x) → Appraise Home Value
• Warranty Resolution → f(x) → Predict Causality
• Customer Satisfaction → f(x) → Churn
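As a rough illustration of the contrast between the two decision processes, the sketch below compares a hand-written approval rule with a function f(x) fitted from past decisions. The history data, the single feature (a debt-to-income ratio), the 0.50 manual cutoff, and the one-threshold model are all hypothetical choices made for this example; none of them come from the deck.

```python
# Hypothetical history: (debt_to_income_ratio, approved?) pairs.
HISTORY = [
    (0.10, True), (0.18, True), (0.25, True), (0.31, True),
    (0.38, False), (0.42, False), (0.55, False), (0.61, False),
]

def rule_based(x):
    """Traditional process: a person hard-codes the cutoff."""
    return x < 0.50

def fit_threshold(history):
    """Learn f(x) = (x < t) by picking the t with the fewest training errors."""
    xs = sorted(x for x, _ in history)
    # Candidate cutoffs: midpoints between neighbouring observed values.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def errors(t):
        return sum((x < t) != label for x, label in history)

    return min(candidates, key=errors)

def accuracy(f, history):
    """Fraction of historical decisions that f reproduces."""
    return sum(f(x) == label for x, label in history) / len(history)

threshold = fit_threshold(HISTORY)

def learned(x):
    """Decision process with ML: the cutoff comes from the data."""
    return x < threshold

print("learned cutoff:", threshold)
print("rule accuracy:", accuracy(rule_based, HISTORY))
print("learned accuracy:", accuracy(learned, HISTORY))
```

The point of the toy is only that the cutoff is derived from the data instead of being typed in by hand; a real model would use many features and be validated on data it was not fitted to.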

What’s involved in Machine Learning

Machine learning prep
• Clearly define business problem
• Select data set to address business problem
• Transform data

Machine learning process
• Build a model using a subset of the data
• Deploy the model to score against new data

Model management
• Monitor model performance over time
• Retrain model if performance has degraded
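The build, deploy, monitor and retrain steps can be sketched in a few lines. This is a minimal, assumption-laden toy: the data is synthetic, the "model" is a single cutoff between two class means, and the `RETRAIN_BELOW` threshold is a made-up monitoring policy, not anything prescribed by Machine Learning for z/OS.

```python
import random

def train(rows):
    """Fit a one-feature classifier: cutoff halfway between the class means."""
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def score(cutoff, x):
    """Score one new record with the deployed model."""
    return 1 if x > cutoff else 0

def accuracy(cutoff, rows):
    """Share of labelled rows the model scores correctly."""
    return sum(score(cutoff, x) == y for x, y in rows) / len(rows)

def make_batch(rng, n, shift=0.0):
    """Synthetic labelled data; `shift` simulates drift in the inputs."""
    rows = []
    for _ in range(n):
        y = rng.randint(0, 1)
        rows.append((rng.gauss(2.0 * y + shift, 0.5), y))
    return rows

rng = random.Random(0)
data = make_batch(rng, 400)
rng.shuffle(data)
train_rows, holdout = data[:300], data[300:]   # build on a subset of the data

cutoff = train(train_rows)                     # build the model
baseline = accuracy(cutoff, holdout)           # evaluate before deploying

RETRAIN_BELOW = 0.85                           # hypothetical monitoring threshold
drifted = make_batch(rng, 400, shift=1.5)      # later data has drifted
before = accuracy(cutoff, drifted)             # monitor performance over time
if before < RETRAIN_BELOW:
    cutoff = train(drifted[:300])              # retrain on recent data
after = accuracy(cutoff, drifted[300:])

print(f"baseline={baseline:.2f} before_retrain={before:.2f} after_retrain={after:.2f}")
```

In practice the monitoring metric, the retraining trigger and the amount of recent data used for retraining are policy decisions that depend on the business problem defined during preparation.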


Why Machine Learning?

• Tap into the rich value of historical data
• Discover insights and generate predictive models to make better decisions
• Don’t just generate reports, use predictive analytics
• Predictive analytics in the future means things like:
  – Fraud detection
  – Optimization of resources
  – Infinite others, all meant to increase revenue or provide savings

The value of machine learning is rooted in its ability to create accurate models to guide future actions and to discover patterns that we’ve never seen before.


Value of Analytics in Place

QMF: Move Towards ML with BI

Start with the Data! Create stories with the data that influence questions about the business

QMF: the BI tool on IBM Z for the first step toward a data-driven enterprise

Analytics Stack on IBM Z Driven by Data Gravity

[Diagram: layered analytics stack, top to bottom]
• Machine learning solutions: IBM Z Operations Analytics (IZOA, formerly IOAz), ML-based anti-fraud solutions, Db2 AI for z/OS (Db2ZAI)
• Machine learning platform: Machine Learning for z/OS
• Analytics engine: Open Data Analytics for z/OS (Spark, Anaconda), Spark clusters on z/OS and on distributed platforms, Db2 Analytics Accelerator (IDAA), Data Warehouse Engine, Loader, HTAP
• Optimized data access layer: Data Virtualization Manager (DVM)
• Data: merchants, transactions, client data, user behavior, news, Twitter

Open Source at its Core

Federate analytics leveraging data in place for more current insights at scale, optimized security, privacy and reduced costs

[Diagram components]
• Business applications: Customer, Transaction, Merchant
• Distributed Apache Spark: distilled insight, query acceleration, analytic result sets
• IBM Machine Learning for z/OS: govern, manage, algorithm assist…; monitor, feedback
• Python
• IBM Open Data Analytics for z/OS with Optimized Data Integration Layer
• Pipeline: Data → Data Prep → ML Algo → Model → Deploy → Predict
• Platform: pauseless GC, new SIMD instructions, 32 TB memory, pervasive encryption

Full Lifecycle Machine Learning Platform

Stages: Ingest, Data Preparation, Train & Evaluate Models, Deploy, Go Live: Predict and Monitor, Explore & Visualize

Roles: Data Engineers, Data Scientists, Application Developers, Production Engineers

• Platform agnostic model development
• Leverage open source software
• Real-time insight with transactions
• Insight incorporated from any platform
• Industry leading encryption, security, reliability & availability

Tools for Both Coders and Non-Coders

PROGRAMMATIC
• Trained using open source or self-taught
• Works within a start-up, technology firm, CIO office or dedicated
• Background in mathematics, computer science
• Uses programming languages, APIs and avoids packages

VISUAL
• Commercial tools (SPSS)
• Line of business/solution focused
• Trained in data mining/analytic methodology
• Background in social sciences, economics, mathematics

Better Together
• Visual productivity tool around data science
• Open-source data science tools (Python, Spark, Jupyter Notebooks)
• Quicker time to value
• Inclusion of full-fledged data preparation and many machine learning algorithms

Utilities to accelerate every stage of Machine Learning

Auto data preparation (ADP)
Automatically analyzes input data and prepares it for training
• Fills missing values
• Encodes/decodes categorical data
• Indexes string data
• Groups all numeric types into vectors
• Normalizes data

Auto modelling
Cognitive assistant for data scientists (CADS)
• Selects the algorithm with the best performance from a set of candidates
Hyperparameter optimization (HPO)
• Selects the hyperparameter with the best performance from a set of candidates given a specific algorithm
CADS and HPO use the performance of models on small data sets to predict performance on large data sets. They use ML to facilitate ML.

Auto feature engineering
Automatically recommends the feature set that can produce the model with the best accuracy
• Joins multiple tables and automatically selects relevant features
• Feature selection based on underlying correlation analysis
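To make the auto data preparation steps concrete, here is a small hand-rolled sketch of what such a utility might do to a handful of records: fill missing values, index string data, normalize numerics, and group everything into one vector per row. The column names, the records, and the specific choices (mean and mode for filling, min-max scaling, alphabetical indexing) are assumptions for illustration and are not taken from ADP itself.

```python
from collections import Counter

ROWS = [  # hypothetical raw records; None marks a missing value
    {"amount": 120.0, "age": 34,   "channel": "web"},
    {"amount": None,  "age": 51,   "channel": "branch"},
    {"amount": 80.0,  "age": None, "channel": "web"},
    {"amount": 400.0, "age": 29,   "channel": None},
]
NUMERIC, CATEGORICAL = ["amount", "age"], ["channel"]

def prepare(rows):
    """Return one numeric feature vector per input row."""
    fill, index, lo, hi = {}, {}, {}, {}
    for col in NUMERIC:
        seen = [r[col] for r in rows if r[col] is not None]
        fill[col] = sum(seen) / len(seen)            # fill missing with the mean
        lo[col], hi[col] = min(seen), max(seen)
    for col in CATEGORICAL:
        seen = [r[col] for r in rows if r[col] is not None]
        fill[col] = Counter(seen).most_common(1)[0][0]   # fill missing with the mode
        # Index string data: each distinct string gets an integer id.
        index[col] = {v: i for i, v in enumerate(sorted(set(seen)))}
    vectors = []
    for r in rows:
        vec = []
        for col in NUMERIC:
            v = r[col] if r[col] is not None else fill[col]
            span = hi[col] - lo[col]
            vec.append((v - lo[col]) / span if span else 0.0)  # min-max normalize
        for col in CATEGORICAL:
            v = r[col] if r[col] is not None else fill[col]
            vec.append(float(index[col][v]))
        vectors.append(vec)                          # group features into a vector
    return vectors

vectors = prepare(ROWS)
for vec in vectors:
    print(vec)
```

A plain integer index implies an ordering between categories, so a production pipeline would often follow it with one-hot encoding; that step is left out here to keep the sketch short.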

Data visualization of SPSS Modeler in ML for z/OS


Chart themes

ML for z/OS Fraud detection solution templates

Tree based sampling for skewed data
• Data for fraud detection are generally skewed, e.g. 1/5000 fraud ratio
  – Leads to biased model
• Random sampling method may lead to information loss and unstable model performance
• Tree based sampling method to populate training data set
  – Sample the records in every leaf node
• Goal/Results
  – Amplify probability of discovering fraud from the data
  – Minimize false positives and maximize finding truly positive fraud
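The deck gives only the outline of tree based sampling, so the following is one plausible reading rather than the solution template's actual algorithm. It assumes a tree has already partitioned the transactions into leaves (here a hand-written stand-in, `leaf_of`, with made-up split features and cutoffs), then keeps every fraud record and a capped random sample of non-fraud records from each leaf. The data is synthetic, with the 1/5000 fraud ratio mentioned above.

```python
import random
from collections import defaultdict

def leaf_of(txn):
    """Stand-in for a fitted decision tree: map a transaction to a leaf id.
    The split features and cutoffs here are hypothetical."""
    if txn["amount"] > 1000:
        return "high_foreign" if txn["foreign"] else "high_domestic"
    return "low_foreign" if txn["foreign"] else "low_domestic"

def tree_based_sample(txns, per_leaf, seed=0):
    """Keep every fraud record and at most `per_leaf` non-fraud records per leaf."""
    rng = random.Random(seed)
    leaves = defaultdict(list)
    for t in txns:
        leaves[leaf_of(t)].append(t)
    sample = []
    for members in leaves.values():
        fraud = [t for t in members if t["fraud"]]
        legit = [t for t in members if not t["fraud"]]
        sample += fraud + rng.sample(legit, min(per_leaf, len(legit)))
    return sample

def fraud_ratio(txns):
    """Share of records labelled as fraud."""
    return sum(t["fraud"] for t in txns) / len(txns)

# Synthetic, heavily skewed data: 2 frauds in 10,000 transactions (1/5000).
rng = random.Random(1)
txns = [
    {"amount": rng.uniform(1, 2000), "foreign": rng.random() < 0.1, "fraud": False}
    for _ in range(10_000)
]
txns[0]["fraud"] = txns[1]["fraud"] = True

train_set = tree_based_sample(txns, per_leaf=50)
print("fraud ratio before:", fraud_ratio(txns))
print("fraud ratio after: ", fraud_ratio(train_set))
```

Sampling within each leaf keeps every region of the feature space represented, which is the property plain random undersampling can lose; how the real template chooses per-leaf sample sizes is not stated in the deck.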

Db2 Health Tree - using IBM WML for z/OS

• Leverages machine learning and data science
• Ingests SMF data for model training and scoring
• Analyzes, monitors, and visualizes large amounts of operational data
  – Builds a hierarchical health tree to represent the health status of the Db2 subsystems, transactions and individual KPIs
  – Monitors the changes in health status over time
• Highlights abnormal KPIs in a timeline to assist root cause diagnosis
• Uses ML for z/OS functionalities to provide model life cycle management
• Provides real-time scoring capability by adopting SMF real-time interface
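The idea of a hierarchical health tree can be sketched as a nested structure in which each leaf is a KPI and each parent takes the worst status of its children. That roll-up rule, the three-level status scale, the KPI names and the fixed thresholds are all assumptions made for this illustration; the actual solution scores KPIs with trained models, and the deck does not describe how statuses are aggregated.

```python
# Status severity order used for roll-up (hypothetical three-level scale).
SEVERITY = {"healthy": 0, "warning": 1, "critical": 2}

def kpi_status(value, warn, crit):
    """Classify one KPI reading against fixed, hypothetical thresholds.
    A real deployment would use a model score here instead."""
    if value >= crit:
        return "critical"
    return "warning" if value >= warn else "healthy"

def roll_up(node):
    """Leaf: its own KPI status. Parent: the worst status among its children."""
    if "children" not in node:
        return kpi_status(node["value"], node["warn"], node["crit"])
    return max((roll_up(c) for c in node["children"]), key=SEVERITY.get)

def abnormal_kpis(node, path=()):
    """List the paths of leaf KPIs that are not healthy, to aid diagnosis."""
    here = path + (node["name"],)
    if "children" not in node:
        return [] if roll_up(node) == "healthy" else ["/".join(here)]
    return [p for c in node["children"] for p in abnormal_kpis(c, here)]

# Hypothetical subsystem with two groups of KPIs.
tree = {"name": "DB2A", "children": [
    {"name": "transactions", "children": [
        {"name": "elapsed_ms", "value": 12, "warn": 50, "crit": 200},
        {"name": "lock_waits", "value": 75, "warn": 50, "crit": 200},
    ]},
    {"name": "buffer_pool", "children": [
        {"name": "miss_pct", "value": 3, "warn": 10, "crit": 25},
    ]},
]}

print(roll_up(tree))
print(abnormal_kpis(tree))
```

Re-evaluating the tree on each new batch of readings and storing the results would give the kind of status-over-time view the slide describes.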

IBM Cloud Pak for Data

The building blocks of data and analytics

IBM Cloud Pak for Data (ICP4D)

1. Services Ecosystem
With a click, access and deploy an ecosystem of 45+ analytics services and templates from IBM and third parties.

2. Data Virtualization
Quickly and easily query across multiple data sources without moving your data.

3. Platform Interface
Speed time-to-value with a single user experience that integrates data management, data governance and analysis for greater efficiency and improved use of resources.

4. Red Hat OPENSHIFT®
Leverage the leading hybrid cloud, enterprise container platform for an innovative and fast deployment strategy.

5. Any Cloud
Avoid lock-in and leverage all cloud infrastructures with our multi-cloud approach.

[Diagram layers: Services Layer, Platform Interface Layer, Kubernetes Layer, Infrastructure Layer; deployment: On-Premises]

ICP4D Use Case with WMLz

- Get access to data on and off IBM Z
- Deploy ML models into production at the speed of your business

Summary

• Train anywhere, deploy anywhere: leveraging WMLz for in-transaction scoring
• Data gravity: limiting data movement via coexistence of WMLz with ICP4D
• Several coexistence scenarios: generating benefits of both WMLz and ICP4D
• IBM Db2 Analytics Accelerator: access IDAA directly from WMLz and ICP4D
• Data virtualization: provision Z data to ICP4D via IBM Data Virtualization Manager for z/OS

Thank you

Jamar Smith
Data Scientist, North America z Hybrid Cloud
[email protected]

Appendix


More resources

• Machine Learning and z Systems: https://www.youtube.com/watch?v=T2HtyNX7aHc
• Machine Learning Launch Event interview: https://www.youtube.com/watch?v=WHenFAa6iPw&feature=youtu.be&list=PLenh213llmca-QogcjfSW9RHPtNye9N_p
• Gaining Agility with Spark Analytics on z Systems: https://www.youtube.com/watch?v=Y7HQbKBR_l4
• IBM z/OS Platform for Apache Spark: https://www-03.ibm.com/systems/z/os/zos/apache-spark.html
• IBM Knowledge Center, z/OS Platform for Apache Spark: https://www.ibm.com/support/knowledgecenter/SSLTBW_2.2.0/com.ibm.zos.v2r2.azk/azk.htm
• IBM Knowledge Center, IBM Machine Learning for z/OS: https://www.ibm.com/support/knowledgecenter/SS9PF4_1.1.0/src/tpc/mlz_home.html
• Redbook, Apache Spark Implementation on IBM z/OS: http://www.redbooks.ibm.com/redbooks/pdfs/sg248325.pdf

Previous use cases for real-time analytics

Business challenge and proof of concept, by customer

Global bank headquartered in Japan
• The bank's current rules-based process for monitoring fraudulent money transfer transactions is very manual and resource intensive
• IBM worked with the bank’s IMS data to build a model that scored transactions while minimizing the rate of false positives
• Desired solution close to data on Z

Large bank in the United States
• Need for a more flexible and scalable option to deploy current and future credit card fraud detection models built by their data scientists
• MLz tuned PMML scoring service to meet the bank’s strict SLAs (<5 ms end to end)
• MLz & IzODA together provide the bank with a flexible platform that embraces open standards

Bank serving members of the U.S. military and their families
• Desire to leverage more data sources on IBM Z and reflect real-time changes in their business environment; need for an enterprise approach to model deployment and life cycle management
• They were able to leverage more data sources by accessing IBM Z data in place and in real time, using Db2 Analytics Accelerator (“IDAA”) with MLz
• With IDAA, queries and data access that used to take several hours completed in under 5 seconds

US credit union
• Loan underwriting is time-consuming and errors are expensive
• Most loan applications required a skilled underwriter
• Joint development of a binary classification model with loan characteristics as input. The output was an approved / declined score with a greater than 90% accuracy rate
• Desired deployment close to data on Z

Large healthcare company
• Deliver greater value to their customers by leveraging technology and analytics
• Achieve the best possible outcomes by getting the most appropriate care delivered at the best price
• Reward diabetic patients, by adjusting their co-pay, for taking actions to manage disease
• Claims data on Db2 for z/OS