Rapid Model Refresh (RMR)
in Online Fraud Detection Engine
Oct 2010
Presented by Michael Murff, WenSui Liu
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. © 2010 SAS Institute Inc. All rights reserved. S55547.0410
Agenda Overview
Traditional Tactics Fighting Fraud
Best Practice in PayPal Fraud Detection
Rapid Model Refresh (RMR)
Extensions and Future
Online Fraud in Financial Services
Evolution in Financial Services
• Paper-Based
• In-Branch
• Perceptible Footprint
… …
• Electronic
• Cyber Spaces
• Invisible Marketplace
… …
Emerging Fraud Trends
• Old-Fashioned
• Isolated Individual
• Limited-Scope Damage
• Traceable Patterns
… …
• Tech-Savvy
• Organized Gang
• Multi-Billion Loss
• Dynamic Trends
… …
Industry Fact
Online Revenue Loss Due to Fraud (Loss in Billion $)
Year  2000  2001  2002  2003  2004  2005  2006  2007  2008
Loss  $1.5  $1.7  $2.1  $1.9  $2.6  $2.8  $3.1  $3.7  $4.0
Source: Cybersource
Agenda Objectives
Traditional Tactics Fighting Fraud
Best Practice in PayPal Fraud Detection
Rapid Model Refresh (RMR)
Extensions and Future
Traditional Mitigation Tactics
Heuristic Approach
• Detect Anomalies
• Identify Patterns
• Set Review Criterion
Model-Based Score
• Rely on Statistical Models (Logit Models / Neural Nets)
• Generate Suspicion Score
• Rank Order Transactions
Rule-Based System
• Employ Machine Learning Algorithms
• Generate Rule Sets for Segmentation
• Target High-Risk Segments
Pros and Cons
Heuristic
Pros:
• Integrate Domain Knowledge
• Easy to Implement
Cons:
• Review-Based & Labor Intensive
• Local Solutions without Global View
Scoring
Pros:
• Successful Industrial Applications
• Ideal for Large-Scale Domains
Cons:
• Long Time-to-Market
• Static Perspective of Fraud Trends
Rule-Induction
Pros:
• Fits Dynamic Online Nature
• Rapid Development & Deployment
Cons:
• Require Frequent Refreshes
• Burden of High-Volume Rules
Next … …
Now What?
Agenda Objectives
Traditional Tactics Fighting Fraud
Best Practices in PayPal Fraud Detection
Rapid Model Refresh (RMR)
Extensions and Future
PayPal's Way to Fight Fraud
PayPal Loss Trend from 200X through 200Y
Multi-Level Detection Engine
Risk Scoring → Rule Induction → Agent Review
• Modelers developed scoring models with logistic regression / neural networks.
• A risk score is assigned to each transaction through the system.
• Low-risk transactions are passed through.
• Analysts built decision trees on high-risk transactions, rank-ordered by risk score.
• The riskiest segments are further identified by balancing the bad rate against the pass-through rate.
• The riskiest transactions identified by rule sets are sent to review queues.
• Queued transactions are prioritized and routed to agents in specific domains.
• Case review and investigation are conducted.
Implementation Challenges
Realities → Problems
• Fast-Growing International Footprint → Overwhelming Number of Segments & Models
• Extremely Rich Data from Diversified Sources → Information Overload instead of Data Mining
• Ever-Complicated IT Infrastructure → High Exposures to System Risks
• Dynamic Fraud Trends & Smarter Fraudsters → Escalating Model Decay & Deterioration
Data-Driven Model (DDM) Strategy
Conceptual DDM
• Modular Data Processing
• Automatic Model Development
• Dynamic Rule Induction
• Real-Time Deployment
• Daily Monitoring
Implemented by Rapid Model Refresh (RMR)
Agenda Objectives
Traditional Tactics Fighting Fraud
Best Practice in PayPal Fraud Detection
Rapid Model Refresh (RMR)
Extensions and Future
What’s RMR? Three Common Layers
Data Layer
• Packaged Processing
• Optimized Queries
• Repeatable Stream
Algorithm Layer
• Arbitrary Models
• Standard Evaluation
• Version Controlled
Deployment Layer
• Model Specs. to XML
• Deploy in Real-Time
• Batched Monitor
RMR – Data Layer
Sources: Enterprise Database / Web Logs / 3rd-Party Sources
Coarse Layer
Variables Creation / Imputation / Transformation
Fine Layer
SAS Data
Model Development
Modular SAS Macros & Parameterized Scripts
SAS as Wrapper around Shell / SED / BTEQ Scripts
Data Layer at a Glance
SAS Workflow
• 20+ SAS Macros: Data Manipulation, Variable Transformation
• Shell Scripts: Create Dynamic SQL, Parallel Execution
• SED Stream Editor: Update Parameters in Scripts
• BTEQ Interface with Teradata: Submit SQL
Code Snippet in Data Layer
1. Use SED to update parameters in the query
2. Submit the query to Teradata through BTEQ
3. Append the log to an output file
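The original snippet was an image and did not survive extraction. As a rough stand-in, the three steps can be sketched in Python. This is only an illustration: the `@NAME@` placeholder syntax, the function names, and the file names are assumptions, and the slides' actual implementation used SAS wrapping shell, SED and BTEQ rather than Python.

```python
import subprocess
from pathlib import Path


def fill_parameters(template, params):
    """Step 1: replace each @NAME@ placeholder, as a `sed 's/@NAME@/value/g'` pass would.

    The @NAME@ placeholder convention is an assumption made for this sketch.
    """
    for name, value in params.items():
        template = template.replace("@" + name + "@", value)
    return template


def submit_query(sql, log_path, command=("bteq",)):
    """Steps 2 and 3: pipe the script to the client on stdin, then append its output to the log.

    `command` defaults to the BTEQ client; a real script would also need its logon
    statements, which are omitted here.
    """
    result = subprocess.run(list(command), input=sql, capture_output=True, text=True)
    with Path(log_path).open("a") as log:  # "a" appends, so earlier runs are kept
        log.write(result.stdout)
        log.write(result.stderr)
    return result.returncode
```

Keeping the substitution separate from the submission lets the same query template be reused for every segment and date range, which is the point of the "Repeatable Stream" idea in the data layer.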
RMR – Algorithm Layer
Champion
• Generalized Linear Model
Arbitrary Challengers
• Neural Nets
• Bagging Trees
… …
Bumping
• Stochastic Search for Best Tree(s)
Stump
• Exhaustive Search for Best Cutoffs
Model Evaluation (KS / AUC / … )
Swap Analysis for Rule Sets
Best Models to Production
Supported by SAS/STAT & SAS Enterprise Miner
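The two evaluation measures named above, KS and AUC, can be computed from a list of scores and fraud labels. The sketch below is a plain-Python illustration, not the SAS evaluation macros used in RMR. It assumes label 1 marks a bad (fraud) transaction and that a higher score means higher risk; the function name is hypothetical.

```python
from itertools import groupby


def ks_and_auc(scores, labels):
    """Return (KS, AUC), assuming label 1 = bad and a higher score = riskier."""
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    if n_bad == 0 or n_good == 0:
        raise ValueError("need at least one good and one bad observation")

    cum_bad = cum_good = 0
    ks = 0.0
    concordant = 0.0
    ordered = sorted(zip(scores, labels))
    # Walk through the distinct score values in ascending order.
    for _, group in groupby(ordered, key=lambda pair: pair[0]):
        flags = [label for _, label in group]
        bad = sum(flags)
        good = len(flags) - bad
        # Each bad beats every good with a lower score; ties count as half.
        concordant += bad * (cum_good + 0.5 * good)
        cum_bad += bad
        cum_good += good
        # KS is the largest gap between the two cumulative distributions.
        ks = max(ks, abs(cum_bad / n_bad - cum_good / n_good))
    return ks, concordant / (n_bad * n_good)
```

Because both measures depend only on the rank order of the scores, the same function can compare a GLM champion with any challenger, whatever scale each model scores on.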
A Peek into Algorithm Layer
Data Split: 50% Training / 25% Testing / 25% Validation
SAS EDA Macros: WoE Vars, Binned Vars
Models: GLM, NNET, Bagging (Tree2 … … TreeX)
SAS Evaluation Macros → Best Model
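"WoE Vars" presumably refers to weight-of-evidence encoding of binned variables. The slide gives no formula, so the sketch below uses one common convention, WoE = ln(share of goods in the bin / share of bads in the bin). That sign convention, the function name and the error handling are all assumptions.

```python
import math
from collections import Counter


def weight_of_evidence(bins, labels):
    """Return {bin: WoE}, with WoE = ln(%good / %bad) and label 1 = bad.

    The sign convention is an assumption; some teams use ln(%bad / %good).
    """
    good = Counter()
    bad = Counter()
    for bin_id, label in zip(bins, labels):
        if label == 1:
            bad[bin_id] += 1
        else:
            good[bin_id] += 1

    total_good = sum(good.values())
    total_bad = sum(bad.values())
    woe = {}
    for bin_id in set(good) | set(bad):
        if good[bin_id] == 0 or bad[bin_id] == 0:
            # A bin with no goods or no bads has no finite WoE; merge bins first.
            raise ValueError("bin %r needs both goods and bads" % (bin_id,))
        woe[bin_id] = math.log((good[bin_id] / total_good) / (bad[bin_id] / total_bad))
    return woe
```

For example, a bin holding 3 of 4 goods and 1 of 4 bads gets WoE = ln(0.75 / 0.25) = ln 3.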
One Tree, Endless Possibilities
Use Cases of Decision Tree in RMR’s View
Bagging
• Simple Average of Massive Number of Trees
• Take Advantage of RMR Deployment Layer and Parallel Computing
• Use as a Challenger to Traditional Logistic Regression
Bumping
• Stochastic Search from Massive Number of Trees
• Improve Estimation while Retaining Simple Tree Structure
• Use to Enhance Vanilla-Version Tree Development
Stump
• Exhaustive Search on 1-Dimension Space, e.g. Score
• Induce 1-Level Binary Tree by Minimizing Gini Impurity
• Use to Find the Best Score Cutoff while Balancing Review Rate
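The stump idea can be sketched as an exhaustive scan over every candidate cutoff on a single score, keeping the split with the lowest weighted Gini impurity. This is a minimal illustration with hypothetical names. It leaves out the review-rate balancing mentioned on the slide, and placing the cutoff halfway between adjacent scores is an assumption.

```python
def gini(n_bad, n_total):
    """Gini impurity of a binary node: 2 * p * (1 - p)."""
    if n_total == 0:
        return 0.0
    p = n_bad / n_total
    return 2.0 * p * (1.0 - p)


def best_cutoff(scores, labels):
    """Return (cutoff, weighted_gini) for the best split `score <= cutoff` vs `score > cutoff`."""
    data = sorted(zip(scores, labels))
    n = len(data)
    total_bad = sum(labels)
    best = (None, float("inf"))
    left_n = left_bad = 0
    for i in range(n - 1):
        score, label = data[i]
        left_n += 1
        left_bad += label
        next_score = data[i + 1][0]
        if score == next_score:
            continue  # no cutoff can separate two equal scores
        right_n = n - left_n
        right_bad = total_bad - left_bad
        weighted = (left_n * gini(left_bad, left_n) + right_n * gini(right_bad, right_n)) / n
        if weighted < best[1]:
            best = ((score + next_score) / 2, weighted)
    return best
```

A review-rate constraint could be added by skipping any cutoff that sends more than a chosen share of transactions to the review side.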
Pick Winner from Multiple Candidates
Generically Support Arbitrary Number of Score Inputs for Massive Models Evaluation and Deployment
SCORECARD EVALUATION SUMMARY
Each row: BEST MODEL flag for Sample 1 to Sample 4 | PREDICTABILITY MEASURE for Sample 1 to Sample 4
SEGMENT 03
Champion Model: 0 0 1 0 | 55 52 54 54
Challenger Model 1: 0 0 0 0 | 58 55 60 58
Challenger Model 2: 1 1 0 1 | 61 59 64 62
Challenger Model 3: 0 0 0 0 | 57 53 59 56
SEGMENT 04
Champion Model: 1 0 1 1 | 52 46 43 40
Challenger Model 1: 0 0 0 0 | 48 42 41 36
Challenger Model 2: 0 1 0 0 | 52 45 45 43
Challenger Model 3: 0 0 0 0 | 44 38 37 35
SEGMENT 05
Champion Model: 1 1 1 1 | 72 74 74 73
Challenger Model 1: 0 0 0 0 | 65 66 67 65
Challenger Model 2: 0 0 0 0 | 69 71 72 72
Challenger Model 3: 0 0 0 0 | 64 65 67 66
SEGMENT 06
Champion Model: 0 1 0 | 81 76 72 70
Challenger Model 1: 0 0 0 0 | 70 64 63 60
Challenger Model 2: 1 0 1 1 | 81 75 72 71
Challenger Model 3: 0 0 0 0 | 71 63 62 59
RMR – Deployment Layer
• Model Specifications
• Convert to XML / PMML
• Inject into Web Engine
• Collect Web Logs in DB
• Monitor Daily Scoring Stability
• Email Reports to Stakeholders
Tools: Perl / Shell / SAS
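To make the "convert to XML" step concrete, here is a minimal sketch that serializes a logistic model specification with Python's standard library. The element and attribute names are invented for illustration and are not PMML; the slides do not show the schema RMR actually used, and the variable names in the usage note are hypothetical.

```python
import xml.etree.ElementTree as ET


def model_spec_to_xml(name, segment, intercept, coefficients):
    """Serialize a model specification to an XML string (hypothetical schema, not PMML)."""
    root = ET.Element("model", {"name": name, "segment": str(segment), "type": "logistic"})
    ET.SubElement(root, "intercept").text = repr(intercept)
    # Sort the terms so the same model always produces the same document.
    for variable, weight in sorted(coefficients.items()):
        ET.SubElement(root, "term", {"variable": variable, "coefficient": repr(weight)})
    return ET.tostring(root, encoding="unicode")
```

A call such as `model_spec_to_xml("GWM", 2, -3.2, {"woe_amount": 0.8})` returns a single `<model>` element that a scoring engine could parse without any hand-coded specification.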
A Use Case: Score Monitoring
Objectives: Score Shift / System Breakage
Lookup Tables: Model / Segment / Owner Lookups
Driver Table: Baseline Distribution
Log Table: Daily Web Log
SAS Daily Job Scheduled by Cron
Population Stability Reports in HTML
Sample Reports
Overall
OVERALL SUMMARY of POPULATION STABILITY INDEX on 05/12/2010
MODEL TYPE  MODEL NAME  VERSION  TIER  SEGMENT  DAILY VOLUME  % VALID  % MISSING  PSI
Back-End    GWM         1        1     1        7027          100.00%  0.00%      0.0084
Back-End    GWM         1        1     2        37388         95.00%   5.00%      0.0068
Back-End    GWM         1        1     3        33336         100.00%  0.00%      0.0174
Back-End    GWM         1        1     4        2410          100.00%  0.00%      0.2529
Back-End    GWM         1        1     5        27924         100.00%  0.00%      0.0121
Back-End    GWM         1        1     6        13093         100.00%  0.00%      0.0188
Detailed
POPULATION STABILITY INDEX Details for GWM Segment 2
MIN. SCORE  MAX. SCORE  FREQ.  EXPECTED DISTRIBUTION  ACTUAL DISTRIBUTION  PSI
Low         521         342    5.00%                  4.87%                0.0000
521         540         324    5.00%                  4.61%                0.0003
540         553         353    5.00%                  5.02%                0.0000
553         562         330    5.00%                  4.70%                0.0001
562         569         328    5.00%                  4.67%                0.0002
569         576         359    5.00%                  5.11%                0.0000
576         581         331    5.02%                  4.71%                0.0001
581         587         396    5.04%                  5.64%                0.0006
587         591         325    4.94%                  4.63%                0.0002
… …
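The PSI columns in these reports are consistent with the standard population stability index, PSI = Σ (actual − expected) × ln(actual / expected), summed over score bands. The slides do not state the formula, so treat the sketch below as an assumption; the function names are hypothetical and every band is assumed to have a non-zero share in both distributions.

```python
import math


def psi_terms(expected, actual):
    """Per-band PSI contributions; inputs are proportions, e.g. 0.05 for 5.00%."""
    return [(a - e) * math.log(a / e) for e, a in zip(expected, actual)]


def population_stability_index(expected, actual):
    """Total PSI between the baseline (expected) and the daily (actual) score distribution."""
    return sum(psi_terms(expected, actual))
```

Each contribution is non-negative, because the difference and the log ratio always share a sign, so a large total can only come from bands that actually shifted. For the first band in the detailed report, `psi_terms([0.05], [0.0487])` gives a value that rounds to 0.0000 at four decimals.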
Formula for RMR Success
RMR = 1% × INSPIRATION + 99% × PERSPIRATION
Risk Management Collaboration Award
Nominee for PayPallian
Agenda Objectives
Traditional Tactics Fighting Fraud
Best Practice in PayPal Fraud Detection
Rapid Model Refresh (RMR)
Extensions and Future
Evolution of RMR Paradigm
Past: Expert Process
• Programmers Pull Data
• Statisticians Build Predictive Model
• Engineers Hard-Code Specification into On-Line Environment
• Meets Minimum Benefit Schedule.
Now: Mechanized Process
• Population and Performance Criterion Identified
• A Suite of Challenger Models Built Automatically
• Model Specifications Published in Live Scoring Platform
• New Models Deployed in Periodic Batch
Future: Online Process
• Models Developed & Deployed with Most Recent Online Data Dynamically
• Re-deployment of New Models not Needed
2-Path Directions
• Alternate Big Data Analytics Framework
• SAS / Teradata in-DB Analytics
Special Thanks to:
SAS
Dr. Jerry Oglesby