Fraud Detection
ANALYZING ONLINE TRANSACTION DATA WITH SQL / R / POWERBI
Fraud Detection
• Help data scientists easily build and deploy an online transaction fraud detection solution.
• Fraud detection is typically handled as a binary classification problem.
• The classes are severely imbalanced, and the volume of transactions is high.
• When fraudulent transactions are discovered, the business typically blocks the affected accounts from transacting to prevent further losses.
Transaction Data
Source: Cortana Analytics Online-Fraud-Detection-Template-1
Data field                         Description
transactionID                      Unique transaction ID
accountID                          Unique account ID
transactionAmountUSD               Transaction amount in USD
transactionAmount                  Transaction amount in the currency given by transactionCurrencyCode
transactionCurrencyCode            Currency code of the transaction: three letters, e.g., USD
transactionCurrencyConversionRate  Conversion rate to US dollars, e.g., 1.00000 for USD to USD
transactionDate                    Date when the transaction occurred, typically in the time zone of the processor
transactionTime                    Time when the transaction occurred, typically in the time zone of the processor
localHour                          The hour in local time, a value from 0 to 23
• Online Fraud- Untagged Transactions.csv
• Online Fraud- Fraud Transactions.csv
Steps in Demo
Prepare Data
Train/Predict/Evaluate Model
Deploy
SQL Server 2016
Prepare Data: Create SQL Tables
0: Create Tables
•Input:
•Online Fraud- Untagged Transactions.csv
•Online Fraud- Fraud Transactions.csv
•Output:
•untaggedData
•fraud
1: Tagging
•Input:
•untaggedData
•fraud
•Output:
•sql_taggedData
2: Preprocessing
•Input:
•sql_taggedData
•Output:
•sql_tagged_training
•sql_tagged_testing
3: Create Risk Tables
•Input:
•sql_tagged_training
•Output:
•sql_risk_var: stores the names of the variables to be converted and the names of their risk tables
•sql_risk_xxx: risk tables for variable xxx.
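A risk table maps each level of a categorical variable to a number that reflects how often that level appears with fraud in the training data. The sketch below is in Python rather than the demo's R and T-SQL. It assumes a smoothed log-odds definition of risk, which the slides do not confirm; the function name build_risk_table, the isFraud label, the smoothing constant, and the sample rows are all hypothetical.

```python
import math
from collections import defaultdict

def build_risk_table(rows, variable, label="isFraud", smooth=10.0):
    """Return {level: smoothed log-odds of fraud} for one categorical variable.

    Assumed definition (not taken from the template):
        risk = log((fraud + smooth) / (non_fraud + smooth))
    """
    counts = defaultdict(lambda: [0, 0])  # level -> [non_fraud, fraud]
    for row in rows:
        counts[row[variable]][1 if row[label] else 0] += 1
    return {
        level: math.log((fraud + smooth) / (non_fraud + smooth))
        for level, (non_fraud, fraud) in counts.items()
    }

# Made-up training rows, for illustration only.
training = [
    {"transactionCurrencyCode": "USD", "isFraud": 0},
    {"transactionCurrencyCode": "USD", "isFraud": 0},
    {"transactionCurrencyCode": "USD", "isFraud": 0},
    {"transactionCurrencyCode": "USD", "isFraud": 1},
    {"transactionCurrencyCode": "EUR", "isFraud": 1},
    {"transactionCurrencyCode": "EUR", "isFraud": 1},
]
risk_currency = build_risk_table(training, "transactionCurrencyCode", smooth=1.0)
```

The smoothing term keeps rare levels from producing extreme or undefined values when one of the counts is zero.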
4: Feature Engineering
•Input:
•sql_tagged_training
•sql_risk_var
•sql_risk_xxx
•Output:
•sql_tagged_training: newly created features are appended to the original sql_tagged_training table
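One plausible way the risk tables feed feature engineering is a lookup that replaces each categorical value with its risk and appends the result as a new column. This Python sketch is an assumption about that step, not the template's code; add_risk_feature, the "Risk" column suffix, the default for unseen levels, and the risk values are hypothetical.

```python
def add_risk_feature(rows, variable, risk_table, default=0.0):
    """Append a '<variable>Risk' column looked up from a risk table.

    Levels missing from the risk table get `default` (an assumed choice).
    The input rows are not modified; new dicts are returned.
    """
    feature = variable + "Risk"
    return [{**row, feature: risk_table.get(row[variable], default)} for row in rows]

# Made-up risk values and rows, for illustration only.
risk_currency = {"USD": -0.69, "EUR": 1.10}
rows = [
    {"transactionCurrencyCode": "USD"},
    {"transactionCurrencyCode": "GBP"},  # level not present in the risk table
]
with_risk = add_risk_feature(rows, "transactionCurrencyCode", risk_currency)
```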
Tags:
• Non fraud – accountID does not appear in the fraud data table
• Fraud – transaction was present in the fraud data table
• Pre fraud – accountID was in the fraud data table but the transaction was not
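The three tagging rules can be sketched as a lookup against the fraud table. This is a minimal Python illustration, not the demo's T-SQL; the function name tag_transactions, the output column name "tag", and the sample rows are hypothetical.

```python
def tag_transactions(untagged, fraud):
    """Tag each transaction as 'fraud', 'pre fraud', or 'non fraud'."""
    fraud_accounts = {f["accountID"] for f in fraud}
    fraud_transactions = {f["transactionID"] for f in fraud}
    tagged = []
    for t in untagged:
        if t["transactionID"] in fraud_transactions:
            tag = "fraud"        # the transaction itself is in the fraud table
        elif t["accountID"] in fraud_accounts:
            tag = "pre fraud"    # account is in the fraud table, transaction is not
        else:
            tag = "non fraud"    # account never appears in the fraud table
        tagged.append({**t, "tag": tag})
    return tagged

# Made-up rows, for illustration only.
untagged_data = [
    {"transactionID": "T1", "accountID": "A1"},
    {"transactionID": "T2", "accountID": "A1"},
    {"transactionID": "T3", "accountID": "A2"},
]
fraud_data = [{"transactionID": "T2", "accountID": "A1"}]
tagged_data = tag_transactions(untagged_data, fraud_data)
```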
Model in Database
5: Model Training
• Input:
• sql_tagged_training
• Output:
• sql_trained_model: stores a serialized model
6: Prediction
• Input:
• sql_trained_model
• sql_tagged_testing
• Output:
• sql_predict_score: stores the predicted score
7: Evaluation
• Input:
• sql_predict_score
• Output:
• sql_performance: metrics on account level.
• sql_performance_auc: stores metrics on transaction level: AUC of ROC curve.
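For the transaction-level metric, the AUC of the ROC curve equals the probability that a randomly chosen fraudulent transaction scores higher than a randomly chosen non-fraudulent one, with ties counted as half. The Python sketch below computes it that way by direct pairwise comparison; it is illustrative only, the name roc_auc and the sample data are hypothetical, and the demo itself computes the metric in R.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (fraud, non-fraud) pairs ranked correctly.

    Ties count as half. Quadratic in the number of rows, so suitable
    only for small illustrative inputs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one fraud and one non-fraud row")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels (1 = fraud) and predicted scores.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 3 of 4 pairs ranked correctly
```

Because AUC depends only on the ranking of scores, it is less sensitive to the severe class imbalance than plain accuracy.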
Deploy
Send transaction data to SQL Server Model
Receive predicted probability of fraud
Use predicted probability to interrupt a purchase
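The last step, using the predicted probability to interrupt a purchase, amounts to comparing the score with a cutoff. A minimal Python sketch follows; the function name should_interrupt and the threshold of 0.5 are hypothetical, and in practice the cutoff would be chosen from the evaluation metrics and the cost of blocking a legitimate purchase.

```python
def should_interrupt(probability, threshold=0.5):
    """Return True when the predicted fraud probability reaches the cutoff."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability >= threshold

decision_low = should_interrupt(0.12)    # below the assumed cutoff
decision_high = should_interrupt(0.93)   # above the assumed cutoff
```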
Demo
Develop in Database from an R client
Operationalize in Database with T-SQL
Deploy Model
Visualize Data in PowerBI
Develop in R Client IDE
Operationalize In Database
Visualize Data
Power BI
Visualize Predictions
Power BI
Deploy
Resources
Online Fraud Detection Template with SQL Server R Services
https://gallery.cortanaanalytics.com/Tutorial/Online-Fraud-Detection-Template-with-SQL-Server-R-Services-1
R and SQL code for this demo
https://github.com/Microsoft/SQL-Server-R-Services-Samples/tree/master/FraudDetection/SQLR