Predictive Analytics in a Heterogeneous World of Tools and Big Data Platforms

Dr. Michael Zeller, Zementis, Inc.
Big Data Science Meetup, August 25, 2012
www.zementis.com, @Zementis

Zementis © - Confidential


Software Technology
• ADAPA® Decision Engine
• UPPI Universal PMML Plug-in
• ADAPA Add-in for Excel
• ADAPA Control Center
• PMML Converter
• Transformations Generator

Consulting Services
• Data Mining
• Predictive Analytics
• Statistical Data Analysis
• Business Rules Development
• Predictive Solutions, e.g.: Credit Risk Assessment, Customer Preferences, Credit Card Fraud, Predictive Maintenance, Quality Control, Healthcare Fraud/Abuse

San Diego and Hong Kong

Operational Predictive Analytics

About Zementis: Highlights, Use Cases and Examples

Peer-reviewed Articles and Publications (available at http://www.zementis.com/)
• R-Journal & ACM SIGKDD Explorations Journal
• KDD Conference Panel & Report / PMML Workshop
• LinkedIn PMML Discussion Group (~3000 members)
• PMML Book: “PMML in Action: Unleashing the Power of Open Standards for Data Mining and Predictive Analytics”

Global Partner Network


Today’s Focus: Business Value of Predictive Analytics

Figure: Score Distribution, 1st Lien Stand-Alone Loans. Score (50 to 1000) vs. % Within Class (0% to 14%); series: Goods, Bads, Poly. (Goods), Poly. (Bads).

Figure: % of Delinquent Loans per Month. Months (Jan to Nov) vs. % of Delinquent Loans (0 to 90); legend: 700, 750, 800, 850, 900, 950.

Operational Predictive Analytics

PMML: Predictive Model Markup Language

Transformations

PMML is an XML-based language used to define statistical and data mining models and to share these between compliant applications.

Mature standard developed by the DMG (Data Mining Group) to avoid proprietary issues and incompatibilities and to deploy models.

Supported by all leading data mining tools, commercial and open-source.

Allows for the clear separation of tasks: Model development vs. model deployment.

Eliminates the need for custom code and proprietary model deployment solutions.

Uniform deployment platform ensures scalability and reliability of model execution.

Models

PMML defines a standard not only to represent models, but also data handling and data transformations (pre- and post-processing)
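To make the "models plus transformations" point concrete, here is a minimal sketch of what a small PMML file could look like and how a consumer might inspect it. The field names, coefficients, and the choice of a regression model are invented for illustration; only the element names and the 4.0 namespace follow the DMG schema. The Python code uses the standard library XML parser and is not part of any Zementis product.

```python
import xml.etree.ElementTree as ET

# Hypothetical PMML 4.0 document: one derived field (a transformation)
# feeding a linear regression model. All names and numbers are invented.
PMML_DOC = """<?xml version="1.0"?>
<PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0">
  <Header description="Illustrative model"/>
  <DataDictionary numberOfFields="2">
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <TransformationDictionary>
    <DerivedField name="income_scaled" optype="continuous" dataType="double">
      <NormContinuous field="income">
        <LinearNorm orig="0" norm="0"/>
        <LinearNorm orig="100000" norm="1"/>
      </NormContinuous>
    </DerivedField>
  </TransformationDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="income"/>
      <MiningField name="risk" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="0.8">
      <NumericPredictor name="income_scaled" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"p": "http://www.dmg.org/PMML-4_0"}
root = ET.fromstring(PMML_DOC)

# Raw input fields declared in the data dictionary.
data_fields = [f.get("name") for f in root.findall("p:DataDictionary/p:DataField", NS)]
# Pre-processing steps declared alongside the model.
derived_fields = [
    f.get("name") for f in root.findall("p:TransformationDictionary/p:DerivedField", NS)
]
# The model element itself; strip the namespace to get the bare tag name.
model_tag = root.find("p:RegressionModel", NS).tag.split("}")[1]

print(data_fields, derived_fields, model_tag)
```

The point of the sketch is that the data dictionary, the transformation, and the model all travel in one file, so the consuming application needs no custom pre-processing code.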

Transformations

Industry Support: Predictive Model Markup Language

Individual Model

Input Data → PMML File → Scored Data

Inside the PMML file:
• Input Validation: Missing Values, Invalid Values, Outliers
• Pre-Processing: Normalize, Discretize, Bin, Map, etc.
• Core Model: Neural Nets, Trees, Regression, SVM, Clustering, etc.
• Post-Processing: Scaling, Thresholds, etc.
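The four stages of the individual-model slide can be sketched as a chain of plain functions. This is an illustrative sketch only: the `age` field, the default value, the clamping range, the coefficients, and the 500-point threshold are all hypothetical, and a real PMML engine would read these from the PMML file rather than hard-code them.

```python
import math

def validate(record):
    """Input validation: handle missing or invalid values and clamp outliers."""
    age = record.get("age")
    if not isinstance(age, (int, float)):  # missing or invalid -> hypothetical default
        age = 40.0
    age = min(max(float(age), 18.0), 90.0)  # treat outliers by clamping to [18, 90]
    return {"age": age}

def pre_process(record):
    """Pre-processing: normalize age to the range [0, 1]."""
    return {"age_norm": (record["age"] - 18.0) / (90.0 - 18.0)}

def core_model(features):
    """Core model: a logistic regression with invented coefficients."""
    z = 1.5 - 3.0 * features["age_norm"]
    return 1.0 / (1.0 + math.exp(-z))

def post_process(probability):
    """Post-processing: scale to a 0-1000 score and apply a threshold."""
    score = round(probability * 1000)
    return {"score": score, "decision": "approve" if score >= 500 else "review"}

def score(record):
    """Run one record through all four stages in order."""
    return post_process(core_model(pre_process(validate(record))))

print(score({"age": 18}))
print(score({"age": 200}))  # outlier, clamped to 90 before scoring
```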

Model Composition

Input Data → PMML File → Scored Data

Inside the PMML file:
• Input Validation
• Pre-Processing
• Model 1, Model 2, Model 3: scores from all models are computed
• Voting: Majority Voting, Weighted Voting, Weighted Average, etc.
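The combination methods named on the model composition slide can be sketched in a few lines. The labels, scores, and weights below are made up, and the function names are hypothetical; the sketch only shows how the outputs of three already-scored models might be combined.

```python
from collections import Counter

def majority_vote(labels):
    """Return the label predicted by the largest number of models."""
    return Counter(labels).most_common(1)[0][0]

def weighted_vote(labels, weights):
    """Return the label with the largest total weight."""
    totals = {}
    for label, weight in zip(labels, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)

def weighted_average(scores, weights):
    """Combine numeric scores into one weighted mean."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

# Hypothetical outputs of Model 1, Model 2 and Model 3 for one record.
labels = ["fraud", "ok", "ok"]
weights = [0.6, 0.2, 0.2]

print(majority_vote(labels))           # two of three models say "ok"
print(weighted_vote(labels, weights))  # the heavily weighted model wins
print(weighted_average([0.9, 0.2, 0.4], weights))
```

Note that majority and weighted voting can disagree on the same inputs, which is why the combination method is part of the model definition rather than an implementation detail.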

Data Driven Model Segmentation

Input Data → PMML File → Scored Data

Inside the PMML file:
• Input Validation
• Pre-Processing
• Predicate-based Model Selection: one of Model 1, Model 2, Model 3
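Predicate-based model selection can be sketched as an ordered list of (predicate, model) pairs in which the first matching predicate decides which model scores the record. The `amount` field, the thresholds, and the three toy models are hypothetical; the first-match rule is one possible selection policy, assumed here for illustration.

```python
def model_small(record):
    """Hypothetical model for small transactions."""
    return 0.1

def model_large(record):
    """Hypothetical model for large transactions."""
    return min(1.0, record["amount"] / 10000.0)

def model_default(record):
    """Hypothetical fallback model."""
    return 0.5

# Ordered segments: (name, predicate on the input record, model to apply).
SEGMENTS = [
    ("small", lambda r: r["amount"] < 100, model_small),
    ("large", lambda r: r["amount"] >= 1000, model_large),
    ("default", lambda r: True, model_default),
]

def select_and_score(record):
    """Score the record with the first model whose predicate matches."""
    for name, predicate, model in SEGMENTS:
        if predicate(record):
            return name, model(record)
    raise ValueError("no segment predicate matched")

print(select_and_score({"amount": 5000}))
```

Unlike model composition, only the selected model is evaluated for each record.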

One Standard, One Process

PMML
• Applications
• External Vendors
• Service Providers
• Divisions

PMML: Predictive Model Management
Integrating across all systems and processes

PMML
• Applications: CRM, ERP, Excel, etc.
• Business Process
• Cloud Computing, Virtual Server
• In-database, Hadoop

ADAPA Predictive Analytics Decision Engine Overview

ADAPA Scoring Engine
• Business Rules
• Predictive Analytics
• Auditing & Reporting
• Web Services & Java API

Diagram labels: Predictive Models, Business Rules, Data, Enhanced Decisioning

What is ADAPA?
Adaptive Decision And Predictive Analytics

Scalable Execution Platform
• Execute your models in real-time and on demand.
• Score in single decision or batch mode.

Environment to Manage Predictive Models
• Deploy one or many models in the same engine.
• Manage and maintain models through web console.

Framework for SOA-based IT Integration
• Completely standards-based models and API.
• Easily integrated into your existing infrastructure.

ADAPA is not ...
• ADAPA is not a model development environment.
• Use best-of-breed commercial or open source tools.

From Model Building to Model Deployment

Stages: Model Building, Model Deployment

ADAPA Deployment Options
• Amazon EC2
• IBM SmartCloud
• Private Cloud
• In-house
• As Embedded Java Library
• OEM / White Label

Universal PMML Plug-in for “Big Data” Scoring

Stages: Model Building, Model Deployment, Integration / Execution

In-database & Hadoop
• Turn PMML into UDFs
• Deploy PMML files & UDF stubs
• Write SQL against UDFs
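The "write SQL against UDFs" idea can be illustrated with SQLite from the Python standard library, which lets a Python function be registered as a SQL function. This is only an analogy under stated assumptions: the `risk_score` function is a hand-written stand-in for a model that would be generated from a PMML file, the table and column names are invented, and the actual plug-in targets in-database and Hadoop platforms rather than SQLite.

```python
import sqlite3

def risk_score(income, debt):
    """Hypothetical stand-in for a scoring function generated from PMML."""
    ratio = debt / income if income else 1.0
    return round(1000 * (1.0 - min(ratio, 1.0)))

conn = sqlite3.connect(":memory:")

# Register the scoring function as a two-argument SQL function (the "UDF").
conn.create_function("risk_score", 2, risk_score)

conn.execute("CREATE TABLE applicants (id INTEGER, income REAL, debt REAL)")
conn.executemany(
    "INSERT INTO applicants VALUES (?, ?, ?)",
    [(1, 80000.0, 20000.0), (2, 30000.0, 45000.0)],
)

# Scoring happens inside the query, next to the data.
rows = conn.execute(
    "SELECT id, risk_score(income, debt) AS score FROM applicants ORDER BY id"
).fetchall()
print(rows)
```

The design point is that the data never leaves the database: the model is moved to the data, and scoring becomes an ordinary SQL expression.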

ADAPA & Universal Plug-In Overview: Features and Model Types

The Plug-in delivers a wide range of predictive analytics for high performance scoring, including:

• Decision Trees for classification and regression
• Neural Network Models: Back-Propagation, Radial-Basis Function, and Neural-Gas
• Support Vector Machines for regression, binary and multi-class classification
• Linear and Logistic Regression (binary and multinomial)
• Naïve Bayes Classifiers
• General and Generalized Linear Models
• Cox Regression Models
• Rule Set Models (flat decision trees)
• Clustering Models: Distribution-Based, Center-Based, and 2-Step Clustering
• Scorecards (including reason codes)
• Association Rules
• Multiple Models: Model ensemble, segmentation, chaining and composition

It also implements a data dictionary, missing/invalid value handling, and data pre-processing.

ADAPA & Universal Plug-In Overview: Broad Compatibility across PMML Versions & Vendors

Universal PMML Plug-in Includes PMML Conversion, Validation & Correction

• Consumes PMML Versions 2.0 … 4.0

• Validates and Corrects Known Issues

• Ensures Compatibility with Vendors

• Invisible & Seamless to User

Case Study – ADAPA in the Financial Industry: Fraud & Risk Scoring

Scoring Bureau

IT Service Provider

Financial Institution

ADAPA Scoring Engine

Online Transactions Decision Management

ADAPA Real-time Decision Management: Sensor & device data processing

Energy

Biometrics

IP Network Security

Rotating Equipment

ADAPA Scoring Engine

ADAPA Case Study – iPhone Mobile Scoring: “On-the-Cloud” with Zementis ADAPA

ERP SCM CRM Legacy Others

Batch / Real-time Business Intelligence Hub

PMML Model Upload

Scores & Recommendation

Inquiry

DynaMine Data Mining Automation

Inquiry

Customer Information


ADAPA Demo


PMML – Why Should You Care? Best Practices for Predictive Analytics and Data Mining

Vendor Neutral Standard
• Open Standards vs. Proprietary Code
• Select Best-of-Breed Tool Set
• Avoid Vendor Lock-in

Time-to-Market Agility
• Deploy in Minutes vs. Months
• Facilitate Clear Requirements & Communication
• Scale with Business Demand

Platform Independent Deployment
• Big Data & Real-Time
• In-Database & Hadoop
• Server, Cloud & SaaS


Thank You!

U.S.A. Headquarters
6125 Cornerstone Court East, Suite 250
San Diego, CA 92121
Tel: +1 619 330-0780
Fax: +1 858 535-0227

Asia Office
19/F., Unit A, Ho Lee Commercial Building
38-44 D’Aguilar Street
Central, Hong Kong (S.A.R.)
Tel: +852 2868-0878
Fax: +852 2845-6027

E-mail: [email protected]