Rapid Model Refresh (RMR) in Online Fraud Detection Engine
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies. © 2010 SAS Institute Inc. All rights reserved. S55547.0410
Agenda
Overview
Traditional Tactics Fighting Fraud
Best Practice in PayPal Fraud Detection
Rapid Model Refresh (RMR)
Extensions and Future
Online Fraud in Financial Services
Evolution in Financial Services
• Paper-Based
• In-Branch
• Perceptible Footprint
… …
• Electronic
• Cyber Spaces
• Invisible Marketplace
… …
Emerging Fraud Trends
• Old-Fashioned
• Isolated Individual
• Limited-Scope Damage
• Traceable Patterns
… …
• Tech-Savvy
• Organized Gang
• Multi-Billion Loss
• Dynamic Trends
… …
Industry Fact
Online Revenue Loss Due to Fraud (Loss in Billion $)
Year: 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008
Loss: $1.5 | $1.7 | $2.1 | $1.9 | $2.6 | $2.8 | $3.1 | $3.7 | $4.0
Source: Cybersource
Traditional Mitigation Tactics
Heuristic Approach
• Detect Anomalies
• Identify Patterns
• Set Review Criteria
Model-Based Score
• Rely on Statistical Models (Logit Models / Neural Nets)
• Generate Suspicion Score
• Rank-Order Transactions
Rule-Based System
• Employ Machine Learning Algorithms
• Generate Rule Sets for Segmentation
• Target High-Risk Segments
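The model-based approach can be sketched in a few lines. A logit model turns a transaction's features into a suspicion score, and transactions are then rank-ordered by that score. This is a minimal illustration only, not PayPal's production model. The feature names (`amount_k`, `new_device`), the coefficients, and the function names are all hypothetical.

```python
import math
from typing import Dict, List

def suspicion_score(features: Dict[str, float], intercept: float,
                    coefs: Dict[str, float]) -> float:
    """Logit model: P(fraud | x) = 1 / (1 + exp(-(b0 + sum_j b_j * x_j)))."""
    z = intercept + sum(b * features.get(name, 0.0) for name, b in coefs.items())
    return 1.0 / (1.0 + math.exp(-z))

def rank_order(transactions: Dict[str, Dict[str, float]], intercept: float,
               coefs: Dict[str, float]) -> List[str]:
    """Return transaction IDs sorted from most to least suspicious."""
    scored = {tid: suspicion_score(x, intercept, coefs) for tid, x in transactions.items()}
    return sorted(scored, key=scored.get, reverse=True)

# Hypothetical coefficients for two illustrative features.
coefs = {"amount_k": 0.8, "new_device": 1.5}
txns = {
    "t1": {"amount_k": 0.5, "new_device": 0},
    "t2": {"amount_k": 3.0, "new_device": 1},
    "t3": {"amount_k": 1.0, "new_device": 1},
}
print(rank_order(txns, intercept=-4.0, coefs=coefs))  # ['t2', 't3', 't1']
```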
Pros and Cons
Heuristic
• Pros: Integrates Domain Knowledge; Easy to Implement
• Cons: Review-Based & Labor-Intensive; Local Solutions without a Global View
Scoring
• Pros: Successful Industrial Applications; Ideal for Large-Scale Domains
• Cons: Long Time-to-Market; Static Perspective on Fraud Trends
Rule Induction
• Pros: Fits the Dynamic Online Nature; Rapid Development & Deployment
• Cons: Requires Frequent Refreshes; Burden of High-Volume Rules
Next … …
Now What?
PayPal's Way to Fight Fraud
PayPal Loss Trend from 200X through 200Y
Multi-Level Detection Engine
Risk Scoring
• Modelers develop scoring models with logistic regression / neural networks.
• The system assigns a risk score to each transaction.
• Low-risk transactions are passed through.
Rule Induction
• Analysts build decision trees on the high-risk transactions, rank-ordered by risk score.
• The riskiest segments are identified by balancing the bad rate against the pass-through rate.
• The riskiest transactions identified by the rule sets are sent to review queues.
Agent Review
• Queued transactions are prioritized and routed to agents in specific domains.
• Cases are reviewed and investigated.
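The three levels can be sketched as a single triage function. This is a hypothetical illustration. The `Txn` fields, the pass-through cutoff, the `high_amount` rule, and the use of the rule name as the review domain are all made up for the example. In the deck, the rules come from decision trees built by analysts rather than from hand-written predicates.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class Txn:
    txn_id: str
    score: float   # risk score from the scoring model (higher = riskier)
    amount: float

# A rule is a (name, predicate) pair; here the rule name doubles as the review domain.
Rule = Tuple[str, Callable[[Txn], bool]]

def triage(txns: List[Txn], pass_cutoff: float,
           rules: List[Rule]) -> Tuple[List[str], Dict[str, List[str]]]:
    passed: List[str] = []
    queued: List[Tuple[float, str, str]] = []
    for t in txns:
        # Level 1 (risk scoring): low-risk transactions pass straight through.
        if t.score < pass_cutoff:
            passed.append(t.txn_id)
            continue
        # Level 2 (rule induction): the first matching rule sends the case to review.
        hit = next((name for name, pred in rules if pred(t)), None)
        if hit is None:
            passed.append(t.txn_id)
        else:
            queued.append((t.score, t.txn_id, hit))
    # Level 3 (agent review): riskiest cases first, grouped into per-domain queues.
    queues: Dict[str, List[str]] = {}
    for _, txn_id, domain in sorted(queued, reverse=True):
        queues.setdefault(domain, []).append(txn_id)
    return passed, queues

txns = [Txn("a", 0.10, 50.0), Txn("b", 0.90, 5000.0),
        Txn("c", 0.70, 20.0), Txn("d", 0.95, 3000.0)]
rules = [("high_amount", lambda t: t.amount > 1000)]
passed, queues = triage(txns, pass_cutoff=0.5, rules=rules)
print(passed)   # ['a', 'c']
print(queues)   # {'high_amount': ['d', 'b']}
```

Transaction `c` scores above the cutoff but matches no rule, so it is passed through. This reflects the trade-off between the bad rate and the pass-through rate described above.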
Implementation Challenges
Realities → Problems
• Fast-Growing International Footprint → Overwhelming Number of Segments & Models
• Extremely Rich Data from Diversified Sources → Information Overload instead of Data Mining
• Increasingly Complicated IT Infrastructure → High Exposure to System Risks
• Dynamic Fraud Trends & Smarter Fraudsters → Escalating Model Decay & Deterioration
Data-Driven Model (DDM) Strategy
Conceptual DDM
• Modular Data Processing
• Automatic Model Development
• Dynamic Rule Induction
• Real-Time Deployment
• Daily Monitoring
Implemented by Rapid Model Refresh (RMR)
What’s RMR?
Three Common Layers
Data Layer
• Packaged Processing
• Optimized Queries
• Repeatable Stream
Algorithm Layer
• Arbitrary Models
• Standard Evaluation
• Version Controlled
Deployment Layer
• Model Specs to XML
• Deploy in Real Time
• Batch Monitoring
RMR – Data Layer
Enterprise Database
Web Logs
3rd-Party Sources
Coarse Layer
Variable Creation / Imputation / Transformation
Model Development
SAS Data
Fine Layer
Modular SAS Macros & Parameterized Scripts
SAS as Wrapper around Shell / SED / BTEQ Scripts
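The "Variable Creation / Imputation / Transformation" step can be illustrated for a single numeric variable. The deck implements this step with SAS macros. The Python sketch below assumes median imputation, capping at an empirical quantile, a `log1p` transform, and a missing-value flag as the created variable. None of these choices is stated in the source, and `prepare_variable` is a hypothetical name.

```python
import math
from statistics import median
from typing import List, Optional, Tuple

def prepare_variable(values: List[Optional[float]],
                     cap_quantile: float = 0.99) -> Tuple[List[float], List[int]]:
    """Impute, cap, and transform one raw numeric variable.

    Returns (transformed values, missing-value indicator). The choices are
    illustrative: median imputation, quantile capping, log1p transform.
    """
    observed = sorted(v for v in values if v is not None)
    if not observed:
        raise ValueError("variable has no observed values")
    fill = median(observed)
    cap = observed[min(len(observed) - 1, int(cap_quantile * len(observed)))]
    transformed, missing = [], []
    for v in values:
        missing.append(1 if v is None else 0)   # created variable: missing flag
        x = fill if v is None else v            # imputation
        x = min(max(x, 0.0), cap)               # floor at 0 for log1p, cap outliers
        transformed.append(math.log1p(x))       # transformation
    return transformed, missing

vals, flags = prepare_variable([1.0, None, 3.0, 1000.0], cap_quantile=0.5)
print(flags)  # [0, 1, 0, 0]
```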
Data Layer at a Glance
SAS Workflow
• 20+ SAS Macros: Data Manipulation; Variable Transformation
• Shell Scripts: Create Dynamic SQL; Parallel Execution
• SED Stream Editor: Update Parameters in Scripts
• BTEQ Interface with Teradata: Submit SQL
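The "Create Dynamic SQL" and "Parallel Execution" steps can be sketched in Python for illustration. In the deck, these steps run as shell scripts driven by SAS. The following names are hypothetical: the table `txn_daily`, the `txn_id` column, the binning cutoffs, and the `submit` callable, which in the deck's setup would hand the SQL to BTEQ.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence

def case_when_bins(column: str, cutoffs: Sequence[float]) -> str:
    """Build a SQL CASE expression mapping `column` to ordinal bins (0 = NULL)."""
    if any(b <= a for a, b in zip(cutoffs, cutoffs[1:])):
        raise ValueError("cutoffs must be strictly increasing")
    lines = ["CASE", f"  WHEN {column} IS NULL THEN 0"]
    for i, c in enumerate(cutoffs, start=1):
        lines.append(f"  WHEN {column} < {c} THEN {i}")
    lines.append(f"  ELSE {len(cutoffs) + 1}")
    lines.append(f"END AS {column}_bin")
    return "\n".join(lines)

def binning_query(table: str, specs: Dict[str, Sequence[float]]) -> str:
    """One SELECT with a generated CASE expression per variable."""
    exprs = ",\n".join(case_when_bins(col, cuts) for col, cuts in specs.items())
    return f"SELECT txn_id,\n{exprs}\nFROM {table};"

def run_parallel(queries: List[str], submit: Callable[[str], str],
                 max_workers: int = 4) -> List[str]:
    """Submit independent queries concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(submit, queries))

print(case_when_bins("amount", [10, 100]))
```

Generating the CASE branches from a list of cutoffs keeps the binning definition in one place. The same cutoffs can then drive many per-segment queries, which are independent of one another and can therefore be submitted in parallel.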
Code Snippet in Data Layer
1. Use SED to update parameters in the query
2. Submit the query to Teradata through BTEQ
3. Append the log to an output file
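The three numbered steps can be mirrored in a short Python sketch. The following are assumptions, not the deck's actual script: the `{{NAME}}` placeholder syntax, the function names, and the `submit` callable standing in for the BTEQ call.

```python
import re
from typing import Callable, Dict

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")

def render_query(template: str, params: Dict[str, object]) -> str:
    """Step 1: substitute {{NAME}} placeholders, much like `sed 's/{{NAME}}/value/g'`.
    Unlike a plain sed call, an unknown placeholder raises instead of passing through."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in params:
            raise KeyError(f"no value supplied for placeholder {name}")
        return str(params[name])
    return PLACEHOLDER.sub(substitute, template)

def run_step(template: str, params: Dict[str, object],
             submit: Callable[[str], str], log_path: str) -> str:
    sql = render_query(template, params)
    log_text = submit(sql)                                   # Step 2: hand SQL to the DB client
    with open(log_path, "a", encoding="utf-8") as log_file:
        log_file.write(log_text.rstrip("\n") + "\n")         # Step 3: append the log
    return sql

print(render_query("SELECT COUNT(*) FROM {{TABLE}} WHERE txn_date = DATE '{{RUN_DATE}}';",
                   {"TABLE": "txn_daily", "RUN_DATE": "2010-05-12"}))
```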
RMR – Algorithm Layer
Model Evaluation (KS / AUC / …)
Swap Analysis for Rule Sets
Supported by SAS/STAT & SAS Enterprise Miner
Champion
•Generalized Linear Model
Arbitrary Challengers
•Neural Nets
•Bagging Trees
… …
Bumping
•Stochastic Search for Best Tree(s)
Stump
•Exhaustive Search for Best Cutoffs
Best Models to Production
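For reference, the two evaluation statistics named above can be computed as follows. This is a plain-Python sketch of their standard definitions, not the SAS evaluation macros. It assumes labels of 1 for fraud and that a higher score means riskier. KS is often reported multiplied by 100.

```python
from typing import Sequence

def ks_statistic(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Max distance between the cumulative score distributions of frauds (1) and non-frauds (0)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both frauds and non-frauds")
    pairs = sorted(zip(scores, labels))
    cum_pos = cum_neg = 0
    ks = 0.0
    for i, (s, y) in enumerate(pairs):
        if y == 1:
            cum_pos += 1
        else:
            cum_neg += 1
        # Only evaluate at the end of a run of tied scores (a valid cutoff).
        if i == len(pairs) - 1 or pairs[i + 1][0] != s:
            ks = max(ks, abs(cum_neg / n_neg - cum_pos / n_pos))
    return ks

def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random fraud outscores a random non-fraud (ties count 1/2).
    O(n_pos * n_neg): fine for a sketch, too slow for production volumes."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

scores = [1, 2, 3, 4]
labels = [0, 1, 0, 1]
print(ks_statistic(scores, labels), auc(scores, labels))  # 0.5 0.75
```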
A Peek into the Algorithm Layer
50% Training → SAS EDA Macros → WoE Vars / Binned Vars → GLM / NNET / Bagging / Tree2 … … TreeX
25% Testing / 25% Validation → SAS Evaluation Macros → Best Model
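The data partition and the WoE transformation in this flow can be sketched as follows. The 50/25/25 split comes from the slide. The following are assumptions: the WoE sign convention (positive means lower fraud risk), the smoothing constant, and the function names. In the deck, these quantities are computed with SAS EDA macros rather than Python.

```python
import math
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple

def split_50_25_25(rows: Sequence, seed: int = 0) -> Tuple[List, List, List]:
    """Random 50% training / 25% testing / 25% validation partition."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    a, b = n // 2, n // 2 + n // 4
    return shuffled[:a], shuffled[a:b], shuffled[b:]

def woe_by_bin(bins: Sequence[Hashable], labels: Sequence[int],
               smoothing: float = 0.5) -> Dict[Hashable, float]:
    """Weight of Evidence per bin, WoE = ln(%non-fraud / %fraud), labels 1 = fraud.
    `smoothing` is added to each bin's counts so empty cells do not give log(0)."""
    good = Counter(b for b, y in zip(bins, labels) if y == 0)
    bad = Counter(b for b, y in zip(bins, labels) if y == 1)
    levels = list(dict.fromkeys(bins))   # distinct bins in first-seen order
    k = len(levels)
    n_good, n_bad = sum(good.values()), sum(bad.values())
    return {
        lv: math.log(((good[lv] + smoothing) / (n_good + smoothing * k))
                     / ((bad[lv] + smoothing) / (n_bad + smoothing * k)))
        for lv in levels
    }

train, test, valid = split_50_25_25(range(100))
print(len(train), len(test), len(valid))  # 50 25 25
print(woe_by_bin(["A", "A", "B", "B"], [0, 0, 1, 1]))
```

WoE is computed on the training partition only. The testing and validation partitions stay untouched for the evaluation step.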
One Tree, Endless Possibilities
Use Cases of Decision Trees in RMR
Bagging
• Simple Average of a Massive Number of Trees
• Takes Advantage of the RMR Deployment Layer and Parallel Computing
• Used as a Challenger to Traditional Logistic Regression
Bumping
• Stochastic Search over a Massive Number of Trees
• Improves Estimation while Retaining a Simple Tree Structure
• Used to Enhance Vanilla-Version Tree Development
Stump
• Exhaustive Search in a 1-Dimensional Space, e.g., Score
• Induces a 1-Level Binary Tree by Minimizing Gini Impurity
• Used to Find the Best Score Cutoff while Balancing Review Rate
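The three tree variants can be sketched with a one-level tree (a stump) on a single score as the base learner. This is an illustrative Python sketch of the textbook techniques, not RMR's SAS implementation. The following are assumptions: using stumps as the bagged and bumped trees, using squared error as the bumping criterion, and all function names.

```python
import random
from typing import List, Sequence, Tuple

Stump = Tuple[float, float, float]  # (cutoff, fraud rate at/below cutoff, fraud rate above)

def gini(pos: int, n: int) -> float:
    """Gini impurity 2p(1-p) of a node holding `pos` frauds out of `n` records."""
    p = pos / n
    return 2.0 * p * (1.0 - p)

def fit_stump(scores: Sequence[float], labels: Sequence[int]) -> Stump:
    """Exhaustive search over score cutoffs for the split with minimum weighted Gini."""
    pairs = sorted(zip(scores, labels))
    n, total_pos = len(pairs), sum(labels)
    best = None
    left_pos = 0
    for i in range(n - 1):
        left_pos += pairs[i][1]
        if pairs[i][0] == pairs[i + 1][0]:
            continue  # cannot cut between tied scores
        n_left, n_right = i + 1, n - i - 1
        right_pos = total_pos - left_pos
        impurity = (n_left * gini(left_pos, n_left) + n_right * gini(right_pos, n_right)) / n
        if best is None or impurity < best[0]:
            cutoff = (pairs[i][0] + pairs[i + 1][0]) / 2
            best = (impurity, cutoff, left_pos / n_left, right_pos / n_right)
    if best is None:  # all scores identical: no split possible
        rate = total_pos / n
        return (pairs[0][0], rate, rate)
    return best[1:]

def predict(stump: Stump, score: float) -> float:
    cutoff, left_rate, right_rate = stump
    return right_rate if score > cutoff else left_rate

def bootstrap(scores, labels, rng):
    idx = [rng.randrange(len(scores)) for _ in scores]
    return [scores[i] for i in idx], [labels[i] for i in idx]

def bagging(scores, labels, n_trees=50, seed=0):
    """Bagging: average the predictions of stumps fit on bootstrap samples."""
    rng = random.Random(seed)
    stumps = [fit_stump(*bootstrap(scores, labels, rng)) for _ in range(n_trees)]
    return lambda s: sum(predict(t, s) for t in stumps) / len(stumps)

def bumping(scores, labels, n_trees=50, seed=0) -> Stump:
    """Bumping: fit stumps on bootstrap samples (plus the original data) and keep
    the single stump with the lowest squared error on the ORIGINAL data."""
    rng = random.Random(seed)
    candidates: List[Stump] = [fit_stump(scores, labels)]
    candidates += [fit_stump(*bootstrap(scores, labels, rng)) for _ in range(n_trees)]
    def sse(t: Stump) -> float:
        return sum((predict(t, s) - y) ** 2 for s, y in zip(scores, labels))
    return min(candidates, key=sse)

scores = [100, 200, 300, 400, 500, 600, 700, 800]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(fit_stump(scores, labels))  # (450.0, 0.0, 1.0)
```

Bagging returns an averaged score function. Bumping, in contrast, returns one ordinary stump, so its output keeps the simple tree structure noted above.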
Pick Winner from Multiple Candidates
Generically Supports an Arbitrary Number of Score Inputs for Massive Model Evaluation and Deployment
SCORECARD EVALUATION SUMMARY
SEGMENT | MODEL | BEST MODEL (Samples 1-4) | PREDICTABILITY MEASURE (Samples 1-4)
SEGMENT 03 | Champion Model | 0 0 1 0 | 55 52 54 54
SEGMENT 03 | Challenger Model 1 | 0 0 0 0 | 58 55 60 58
SEGMENT 03 | Challenger Model 2 | 1 1 0 1 | 61 59 64 62
SEGMENT 03 | Challenger Model 3 | 0 0 0 0 | 57 53 59 56
SEGMENT 04 | Champion Model | 1 0 1 1 | 52 46 43 40
SEGMENT 04 | Challenger Model 1 | 0 0 0 0 | 48 42 41 36
SEGMENT 04 | Challenger Model 2 | 0 1 0 0 | 52 45 45 43
SEGMENT 04 | Challenger Model 3 | 0 0 0 0 | 44 38 37 35
SEGMENT 05 | Champion Model | 1 1 1 1 | 72 74 74 73
SEGMENT 05 | Challenger Model 1 | 0 0 0 0 | 65 66 67 65
SEGMENT 05 | Challenger Model 2 | 0 0 0 0 | 69 71 72 72
SEGMENT 05 | Challenger Model 3 | 0 0 0 0 | 64 65 67 66
SEGMENT 06 | Champion Model | 0 1 0 | 81 76 72 70
SEGMENT 06 | Challenger Model 1 | 0 0 0 0 | 70 64 63 60
SEGMENT 06 | Challenger Model 2 | 1 0 1 1 | 81 75 72 71
SEGMENT 06 | Challenger Model 3 | 0 0 0 0 | 71 63 62 59
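One way to pick a winner from an arbitrary number of scored candidates, segment by segment, is sketched below. This is an assumption for illustration and may differ from how the best-model flags above were derived. The selection rule is the highest mean measure across samples, with the champion kept on ties. The segment names and numbers in the example are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

def pick_winner(measures: Dict[str, List[float]],
                champion: str = "Champion Model") -> Tuple[str, float]:
    """Choose the model with the best mean measure across samples (higher = better).

    Works for any number of challengers. The champion is kept on ties, so a
    challenger must strictly beat it to be promoted.
    """
    best_name, best_value = champion, mean(measures[champion])
    for name, values in measures.items():
        if name != champion and mean(values) > best_value:
            best_name, best_value = name, mean(values)
    return best_name, best_value

def pick_winners(by_segment: Dict[str, Dict[str, List[float]]]) -> Dict[str, str]:
    """Apply pick_winner segment by segment."""
    return {seg: pick_winner(models)[0] for seg, models in by_segment.items()}

segments = {  # hypothetical evaluation results
    "SEG_A": {"Champion Model": [50, 50, 50, 50],
              "Challenger Model 1": [52, 49, 53, 51],
              "Challenger Model 2": [48, 47, 49, 50]},
    "SEG_B": {"Champion Model": [60, 60], "Challenger Model 1": [60, 60]},
}
print(pick_winners(segments))  # {'SEG_A': 'Challenger Model 1', 'SEG_B': 'Champion Model'}
```

Keeping the champion on ties avoids redeploying a new model that offers no measurable lift.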
RMR – Deployment Layer
Model Specifications
Convert to XML / PMML
Inject into Web Engine
Collect Web Logs in DB
Monitor Daily Scoring Stability
Email Reports to Stakeholders
Perl
Shell
SAS
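The "Convert to XML / PMML" step can be illustrated for a logistic-regression champion. The sketch below emits a minimal PMML `RegressionModel` using Python's standard `xml.etree.ElementTree`. The following are hypothetical: the field names, the coefficients, the target name `fraud_flag`, and the choice of PMML 4.1. The output has not been validated against the full PMML schema. The deck lists Perl, Shell, and SAS as the tools for this layer.

```python
import xml.etree.ElementTree as ET
from typing import Dict

PMML_NS = "http://www.dmg.org/PMML-4_1"

def logistic_to_pmml(intercept: float, coefs: Dict[str, float],
                     target: str = "fraud_flag") -> str:
    """Serialize b0 + sum_j b_j * x_j with a logit link as a PMML RegressionModel."""
    pmml = ET.Element("PMML", version="4.1", xmlns=PMML_NS)
    ET.SubElement(pmml, "Header", description="Illustrative RMR model specification")

    # Data dictionary: one continuous input per coefficient plus a binary target.
    dd = ET.SubElement(pmml, "DataDictionary", numberOfFields=str(len(coefs) + 1))
    for name in coefs:
        ET.SubElement(dd, "DataField", name=name, optype="continuous", dataType="double")
    tgt = ET.SubElement(dd, "DataField", name=target, optype="categorical", dataType="string")
    for value in ("1", "0"):
        ET.SubElement(tgt, "Value", value=value)

    model = ET.SubElement(pmml, "RegressionModel",
                          functionName="classification", normalizationMethod="logit")
    schema = ET.SubElement(model, "MiningSchema")
    ET.SubElement(schema, "MiningField", name=target, usageType="target")
    for name in coefs:
        ET.SubElement(schema, "MiningField", name=name)

    # Linear predictor for the fraud class; the reference class gets an empty table.
    table = ET.SubElement(model, "RegressionTable", intercept=repr(float(intercept)),
                          targetCategory="1")
    for name, b in coefs.items():
        ET.SubElement(table, "NumericPredictor", name=name, coefficient=repr(float(b)))
    ET.SubElement(model, "RegressionTable", intercept="0", targetCategory="0")
    return ET.tostring(pmml, encoding="unicode")

print(logistic_to_pmml(-4.0, {"amount_k": 0.8, "new_device": 1.5}))
```

Publishing the specification as data rather than hard-coding it lets the web engine load a refreshed model without an engineering rebuild.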
A Use Case: Score Monitoring
Objectives: Detect Score Shift and System Breakage
Inputs
• Lookup Tables: Model / Segment / Owner Lookups
• Driver Table: Baseline Distribution
• Log Table: Daily Web Log
Process: SAS Daily Job Scheduled by Cron
Output: Population Stability Reports in HTML
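The population stability index (PSI) behind these reports compares two shares for each score bin i. E_i is the baseline (expected) share of scores in the bin, and A_i is the daily (actual) share. The index is PSI = Σ_i (A_i - E_i) * ln(A_i / E_i). A minimal sketch follows. The function names and the floor for empty bins are assumptions. The 0.10 / 0.25 alert levels are also assumptions: they are a common industry rule of thumb, not a threshold stated in the deck.

```python
import math
from typing import List, Sequence

def psi_terms(expected: Sequence[float], actual: Sequence[float],
              floor: float = 1e-6) -> List[float]:
    """Per-bin PSI contributions (A_i - E_i) * ln(A_i / E_i) from bin proportions.
    Each term is >= 0 because both factors share the same sign."""
    if len(expected) != len(actual):
        raise ValueError("expected and actual need the same bins")
    terms = []
    for e, a in zip(expected, actual):
        e, a = max(e, floor), max(a, floor)  # avoid log(0) for empty bins
        terms.append((a - e) * math.log(a / e))
    return terms

def psi(expected: Sequence[float], actual: Sequence[float]) -> float:
    return sum(psi_terms(expected, actual))

def stability_status(value: float, warn: float = 0.10, alert: float = 0.25) -> str:
    """Map a PSI value to a status using assumed rule-of-thumb thresholds."""
    if value >= alert:
        return "significant shift"
    if value >= warn:
        return "moderate shift"
    return "stable"

# First three score bins of the detailed sample report: expected vs. actual shares.
print([round(t, 4) for t in psi_terms([0.05, 0.05, 0.05], [0.0487, 0.0461, 0.0502])])
# [0.0, 0.0003, 0.0]
```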
Sample Reports
Overall
OVERALL SUMMARY of POPULATION STABILITY INDEX on 05/12/2010
MODEL TYPE | MODEL NAME | VERSION | TIER | SEGMENT | DAILY VOLUME | % VALID | % MISSING | PSI
Back-End | GWM | 1 | 1 | 1 | 7027 | 100.00% | 0.00% | 0.0084
Back-End | GWM | 1 | 1 | 2 | 37388 | 95.00% | 5.00% | 0.0068
Back-End | GWM | 1 | 1 | 3 | 33336 | 100.00% | 0.00% | 0.0174
Back-End | GWM | 1 | 1 | 4 | 2410 | 100.00% | 0.00% | 0.2529
Back-End | GWM | 1 | 1 | 5 | 27924 | 100.00% | 0.00% | 0.0121
Back-End | GWM | 1 | 1 | 6 | 13093 | 100.00% | 0.00% | 0.0188

Detailed
POPULATION STABILITY INDEX Details for GWM Segment 2
MIN. SCORE | MAX. SCORE | FREQ. | EXPECTED DISTRIBUTION | ACTUAL DISTRIBUTION | PSI
Low | 521 | 342 | 5.00% | 4.87% | 0.0000
521 | 540 | 324 | 5.00% | 4.61% | 0.0003
540 | 553 | 353 | 5.00% | 5.02% | 0.0000
553 | 562 | 330 | 5.00% | 4.70% | 0.0001
562 | 569 | 328 | 5.00% | 4.67% | 0.0002
569 | 576 | 359 | 5.00% | 5.11% | 0.0000
576 | 581 | 331 | 5.02% | 4.71% | 0.0001
581 | 587 | 396 | 5.04% | 5.64% | 0.0006
587 | 591 | 325 | 4.94% | 4.63% | 0.0002
… …
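In the detailed report, each baseline score bin holds about 5% of the expected distribution, which amounts to equal-frequency bins. The sketch below derives such cutoffs from baseline scores and measures a day's score distribution against them. The following are assumptions: the function names, the toy data, and the bin-assignment convention (a score equal to a cutoff falls into the upper bin).

```python
import bisect
from typing import List, Sequence

def baseline_cutoffs(baseline_scores: Sequence[float], n_bins: int = 20) -> List[float]:
    """Interior cutoffs giving n_bins (approximately) equal-frequency bins.
    The first bin is open below ("Low") and the last is open above."""
    s = sorted(baseline_scores)
    return [s[k * len(s) // n_bins] for k in range(1, n_bins)]

def distribution(scores: Sequence[float], cutoffs: Sequence[float]) -> List[float]:
    """Share of scores per bin; a score equal to a cutoff goes to the upper bin."""
    counts = [0] * (len(cutoffs) + 1)
    for sc in scores:
        counts[bisect.bisect_right(cutoffs, sc)] += 1
    return [c / len(scores) for c in counts]

baseline = list(range(100))               # toy baseline scores
cuts = baseline_cutoffs(baseline, n_bins=4)
print(cuts)                               # [25, 50, 75]
print(distribution([s + 10 for s in baseline], cuts))  # [0.15, 0.25, 0.25, 0.35]
```

The per-bin expected and actual shares produced this way are the inputs to the PSI terms reported in the PSI column.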
Evolution of RMR Paradigm
Past: Expert Process
• Programmers Pull Data
• Statisticians Build Predictive Model
• Engineers Hard-Code Specification into Online Environment
• Meets Minimum Benefit Schedule
Now: Mechanized Process
• Population and Performance Criteria Identified
• A Suite of Challenger Models Built Automatically
• Model Specifications Published in Live Scoring Platform
• New Models Deployed in Periodic Batch
Future: Online Process
• Models Developed & Deployed Dynamically with Most Recent Online Data
• Re-deployment of New Models Not Needed
Two Future Directions
• Hadoop with R Integration
• SAS / Teradata In-Database Analytics