Fraud Detection: Analyzing Online Transaction Data with SQL / R / Power BI

Fraud Detection - Meetup (files.meetup.com/6477262/Microsoft SQL R Server - Fraud...)


Fraud Detection

• Help data scientists easily build and deploy an online transaction fraud detection solution.

• Fraud detection is typically handled as a binary classification problem.

• The classes are severely imbalanced (fraud is rare), and transaction volume is high.

• When fraudulent transactions are discovered, the business typically blocks the affected accounts from transacting to prevent further losses.
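Because fraud is rare, a classifier can look accurate while catching no fraud at all. The Python sketch below uses made-up labels (0 = non-fraud, 1 = fraud) to show the accuracy of an always-"not fraud" baseline, and one common mitigation: inverse-frequency ("balanced") class weights. It is illustrative only; the template may handle imbalance differently, and the helper name balanced_class_weights is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_total / (n_classes * n_class).

    With these weights every class carries the same total weight, so the rare
    fraud class is not swamped by the non-fraud majority during training.
    """
    counts = Counter(labels)
    n_total = len(labels)
    n_classes = len(counts)
    return {cls: n_total / (n_classes * n) for cls, n in counts.items()}


# Hypothetical data: 990 legitimate (0) and 10 fraudulent (1) transactions.
labels = [0] * 990 + [1] * 10

# A model that always answers "not fraud" already reaches 99% accuracy,
# which is why accuracy alone is a poor metric for this problem.
always_legit_accuracy = labels.count(0) / len(labels)
print(always_legit_accuracy)  # 0.99

weights = balanced_class_weights(labels)
print(weights[1])  # 50.0 (each non-fraud row gets about 0.505)
```

With these weights, the 10 fraud rows and the 990 non-fraud rows each contribute a total weight of 500.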


Transaction Data

Source: Cortana Analytics Online-Fraud-Detection-Template-1

Data fields:

• transactionID: unique transaction ID.

• accountID: unique account ID.

• transactionAmountUSD: transaction amount in US dollars.

• transactionAmount: transaction amount in the currency given by transactionCurrencyCode.

• transactionCurrencyCode: three-letter currency code of the transaction, e.g., USD.

• transactionCurrencyConversionRate: conversion rate to US dollars, e.g., 1.00000 for USD.

• transactionDate: date the transaction occurred, typically in the processor's time zone.

• transactionTime: time the transaction occurred, typically in the processor's time zone.

• localHour: hour of the transaction in local time (0-23).

Input files:

• Online Fraud- Untagged Transactions.csv

• Online Fraud- Fraud Transactions.csv
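To make the field list concrete, here is a minimal Python sketch that parses rows with these column names into a typed record and runs two sanity checks. The sample rows are invented, dates and times are kept as text because the files' exact formats are not stated here, and the check that transactionAmountUSD equals transactionAmount times transactionCurrencyConversionRate is an assumption about how the rate is applied. The names Transaction, parse_row and usd_amount_consistent are hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Transaction:
    transactionID: str
    accountID: str
    transactionAmountUSD: float
    transactionAmount: float
    transactionCurrencyCode: str
    transactionCurrencyConversionRate: float
    transactionDate: str  # kept as text; the files' date format is not assumed
    transactionTime: str  # kept as text for the same reason
    localHour: int


def parse_row(row):
    """Convert one csv.DictReader row into a Transaction (extra columns are ignored)."""
    t = Transaction(
        transactionID=row["transactionID"],
        accountID=row["accountID"],
        transactionAmountUSD=float(row["transactionAmountUSD"]),
        transactionAmount=float(row["transactionAmount"]),
        transactionCurrencyCode=row["transactionCurrencyCode"],
        transactionCurrencyConversionRate=float(row["transactionCurrencyConversionRate"]),
        transactionDate=row["transactionDate"],
        transactionTime=row["transactionTime"],
        localHour=int(row["localHour"]),
    )
    if not 0 <= t.localHour <= 23:
        raise ValueError(f"localHour out of range: {t.localHour}")
    if len(t.transactionCurrencyCode) != 3:
        raise ValueError(f"bad currency code: {t.transactionCurrencyCode!r}")
    return t


def usd_amount_consistent(t, tol=0.01):
    """ASSUMPTION: transactionAmountUSD == transactionAmount * conversion rate."""
    expected = t.transactionAmount * t.transactionCurrencyConversionRate
    return abs(expected - t.transactionAmountUSD) <= tol


# Made-up sample rows; the header is assumed to use the field names above.
sample = io.StringIO(
    "transactionID,accountID,transactionAmountUSD,transactionAmount,"
    "transactionCurrencyCode,transactionCurrencyConversionRate,"
    "transactionDate,transactionTime,localHour\n"
    "T1,A1,25.00,25.00,USD,1.00000,2013-09-01,10:15:00,10\n"
    "T2,A2,11.00,10.00,EUR,1.10000,2013-09-01,23:40:00,23\n"
)
transactions = [parse_row(r) for r in csv.DictReader(sample)]
print([usd_amount_consistent(t) for t in transactions])  # [True, True]
```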


Steps in Demo

• Prepare Data

• Train / Predict / Evaluate Model

• Deploy

All three steps run in SQL Server 2016.


Prepare Data: Create SQL Tables

0: Create Tables

• Input: Online Fraud- Untagged Transactions.csv, Online Fraud- Fraud Transactions.csv

• Output: untaggedData, fraud
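Step 0 amounts to bulk-loading each CSV into its own table. The demo does this in SQL Server 2016; the Python sketch below uses the standard-library sqlite3 module purely as a runnable stand-in, stores every column as TEXT, and replaces the two files with tiny invented CSV contents. The helper load_csv is hypothetical; only the table names untaggedData and fraud come from the slide.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, fileobj):
    """Create `table` from the CSV header and bulk-insert every data row.

    All columns are stored as TEXT; a real schema would assign proper types.
    """
    reader = csv.reader(fileobj)
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        (row for row in reader if row),  # skip blank lines
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
# In the demo these would be the two CSV files, opened with newline="".
untagged_csv = io.StringIO("transactionID,accountID\nT1,A1\nT2,A2\nT3,A1\n")
fraud_csv = io.StringIO("transactionID,accountID\nT3,A1\n")
load_csv(conn, "untaggedData", untagged_csv)
load_csv(conn, "fraud", fraud_csv)
print(conn.execute('SELECT COUNT(*) FROM "untaggedData"').fetchone()[0])  # 3
```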

1: Tagging

• Input: untaggedData, fraud

• Output: sql_taggedData

2: Preprocessing

• Input: sql_taggedData

• Output: sql_tagged_training, sql_tagged_testing
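The slide does not say how sql_taggedData is split into training and testing sets. One common choice for fraud data, assumed here rather than taken from the template, is to split by account so that all of an account's transactions land in the same set and the test set contains only unseen accounts. The Python sketch hashes accountID into 100 stable buckets and sends buckets below train_pct (70 here, an arbitrary example) to training; split_by_account is a hypothetical name.

```python
import hashlib


def split_by_account(rows, train_pct=70):
    """Deterministically assign whole accounts to training or testing.

    Hashing accountID (rather than sampling rows) keeps every transaction of an
    account in the same set, and the same input always yields the same split.
    """
    train, test = [], []
    for row in rows:
        digest = hashlib.sha256(row["accountID"].encode("utf-8")).hexdigest()
        bucket = int(digest, 16) % 100  # stable bucket in 0..99
        (train if bucket < train_pct else test).append(row)
    return train, test


# Made-up rows: 200 transactions spread over 20 accounts.
rows = [{"transactionID": f"T{i}", "accountID": f"A{i % 20}"} for i in range(200)]
train, test = split_by_account(rows)
overlap = {r["accountID"] for r in train} & {r["accountID"] for r in test}
print(len(train) + len(test), overlap)  # 200 set()
```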

3: Create Risk Tables

• Input: sql_tagged_training

• Output:

  • sql_risk_var: stores the names of the variables to be converted and the names of their risk tables

  • sql_risk_xxx: the risk table for variable xxx
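A risk table replaces each level of a categorical variable with a number reflecting how strongly that level is associated with fraud. The Python sketch below uses smoothed log-odds, one common formulation: for a level v with n_v rows, f_v of them fraud, overall fraud rate p0 and smoothing s, r_v = (f_v + s * p0) / (n_v + s) and risk_v = log(r_v / (1 - r_v)). The template's exact formula and the variables it converts are not given on the slide, so both are assumptions; transactionCurrencyCode is only an example, labels are 1/0 (pre-fraud rows would need to be mapped or dropped first), and build_risk_table is a hypothetical name.

```python
import math
from collections import defaultdict


def build_risk_table(rows, var, label="label", smoothing=10.0):
    """Map each level of categorical `var` to a smoothed log-odds of fraud.

    Each level's fraud rate is shrunk toward the overall rate p0 by `smoothing`
    pseudo-observations, so rare levels do not get extreme risks. Returns
    (table, default), where default is the log-odds of p0 for unseen levels.
    """
    n = defaultdict(int)
    fraud = defaultdict(int)
    for row in rows:
        n[row[var]] += 1
        fraud[row[var]] += row[label]
    p0 = sum(fraud.values()) / sum(n.values())
    if not 0.0 < p0 < 1.0:
        raise ValueError("training data must contain both fraud and non-fraud rows")

    def logit(p):
        return math.log(p / (1.0 - p))

    table = {
        level: logit((fraud[level] + smoothing * p0) / (n[level] + smoothing))
        for level in n
    }
    return table, logit(p0)


# Made-up training rows: USD has 2 fraud out of 4, EUR has 0 out of 4.
training = [{"transactionCurrencyCode": c, "label": y}
            for c, y in [("USD", 1), ("USD", 1), ("USD", 0), ("USD", 0),
                         ("EUR", 0), ("EUR", 0), ("EUR", 0), ("EUR", 0)]]
table, default = build_risk_table(training, "transactionCurrencyCode")
print(table["USD"] > default > table["EUR"])  # True
```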

4: Feature Engineering

• Input: sql_tagged_training, sql_risk_var, sql_risk_xxx

• Output: sql_tagged_training, with the newly created features appended to the original table
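Feature engineering then looks up each row's level in the risk tables and appends the result as a new numeric column. In this Python sketch the input format (variable name mapped to a level-to-risk dict plus a default for unseen levels), the `<var>_risk` column naming, and the risk values themselves are hypothetical choices, not the template's schema.

```python
def append_risk_features(rows, risk_tables):
    """Return copies of `rows` with one `<var>_risk` column per risk table.

    risk_tables maps variable name -> (level -> risk dict, default risk).
    Levels missing from the training risk table fall back to the default.
    """
    enriched = []
    for row in rows:
        new_row = dict(row)  # leave the caller's rows unchanged
        for var, (table, default) in risk_tables.items():
            new_row[f"{var}_risk"] = table.get(row.get(var), default)
        enriched.append(new_row)
    return enriched


# Hypothetical risk table for transactionCurrencyCode (values are made up).
risk_tables = {"transactionCurrencyCode": ({"USD": -0.75, "EUR": -1.5}, -1.1)}
rows = [
    {"transactionID": "T1", "transactionCurrencyCode": "USD"},
    {"transactionID": "T2", "transactionCurrencyCode": "GBP"},  # unseen level
]
enriched = append_risk_features(rows, risk_tables)
for row in enriched:
    print(row["transactionID"], row["transactionCurrencyCode_risk"])
# T1 -0.75
# T2 -1.1
```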

Tags assigned in step 1 (Tagging):

• Non-fraud: the accountID does not appear in the fraud data table.

• Fraud: the transaction is present in the fraud data table.

• Pre-fraud: the accountID is in the fraud data table, but the transaction is not.
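The three tags can be expressed as a short rule. This Python sketch assumes both tables carry transactionID and accountID columns and identifies fraud transactions by transactionID; the template may match on different keys (for example, account plus date and time), so the matching rule is an assumption. The function name and tag strings are illustrative.

```python
def tag_transactions(untagged, fraud):
    """Label each transaction 'fraud', 'prefraud' or 'nonfraud'.

    ASSUMPTION: fraud transactions are matched on transactionID, and fraud
    accounts on accountID.
    """
    fraud_txn_ids = {row["transactionID"] for row in fraud}
    fraud_accounts = {row["accountID"] for row in fraud}
    tagged = []
    for row in untagged:
        if row["transactionID"] in fraud_txn_ids:
            tag = "fraud"  # the transaction itself is in the fraud table
        elif row["accountID"] in fraud_accounts:
            tag = "prefraud"  # account is in the fraud table, transaction is not
        else:
            tag = "nonfraud"  # account never appears in the fraud table
        tagged.append({**row, "tag": tag})
    return tagged


untagged = [
    {"transactionID": "T1", "accountID": "A1"},
    {"transactionID": "T2", "accountID": "A2"},
    {"transactionID": "T3", "accountID": "A1"},
]
fraud = [{"transactionID": "T3", "accountID": "A1"}]
print([r["tag"] for r in tag_transactions(untagged, fraud)])
# ['prefraud', 'nonfraud', 'fraud']
```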


Model in Database

5: Model Training

• Input: sql_tagged_training

• Output: sql_trained_model, which stores the serialized model

6: Prediction

• Input: sql_trained_model, sql_tagged_testing

• Output: sql_predict_score, which stores the predicted scores
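Steps 5 and 6 store the trained model in a table and later load it back to score the testing set. The Python sketch below mimics that flow with clearly labeled stand-ins: a tiny logistic regression trained by gradient descent replaces the demo's R model, sqlite3 replaces SQL Server, and the model parameters are pickled as a plain dict into a BLOB column (in the R-based demo the equivalent is a serialized R object stored in the table). Only the table names come from the slide; their columns and all function names are hypothetical. Only unpickle data you trust.

```python
import math
import pickle
import sqlite3
from dataclasses import dataclass


@dataclass
class LogisticModel:
    weights: list
    bias: float = 0.0

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        if z >= 0:  # numerically stable sigmoid
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Step 5 stand-in: batch gradient descent on the mean log-loss."""
    model = LogisticModel(weights=[0.0] * len(X[0]))
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(model.weights)
        grad_b = 0.0
        for x, target in zip(X, y):
            err = model.predict_proba(x) - target
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        model.weights = [w - lr * g / n for w, g in zip(model.weights, grad_w)]
        model.bias -= lr * grad_b / n
    return model


def save_model(conn, model):
    """Serialize the model parameters into a BLOB row of sql_trained_model."""
    conn.execute("CREATE TABLE IF NOT EXISTS sql_trained_model (id INTEGER PRIMARY KEY, model BLOB)")
    blob = pickle.dumps({"weights": model.weights, "bias": model.bias})
    conn.execute("INSERT OR REPLACE INTO sql_trained_model VALUES (1, ?)", (blob,))
    conn.commit()


def load_model(conn):
    (blob,) = conn.execute("SELECT model FROM sql_trained_model WHERE id = 1").fetchone()
    params = pickle.loads(blob)
    return LogisticModel(weights=params["weights"], bias=params["bias"])


def score_testing(conn, model, testing):
    """Step 6 stand-in: write (transactionID, score) rows to sql_predict_score."""
    conn.execute("CREATE TABLE IF NOT EXISTS sql_predict_score (transactionID TEXT, score REAL)")
    conn.executemany("INSERT INTO sql_predict_score VALUES (?, ?)",
                     [(tid, model.predict_proba(x)) for tid, x in testing])
    conn.commit()


conn = sqlite3.connect(":memory:")
X = [[0.0], [0.2], [0.8], [1.0]]  # made-up single feature, e.g. a risk value
y = [0, 0, 1, 1]
trained = fit_logistic(X, y)
save_model(conn, trained)
model = load_model(conn)  # deserialized copy, as a scoring job would use
score_testing(conn, model, [("T10", [0.1]), ("T11", [0.9])])
print(conn.execute(
    "SELECT transactionID, score > 0.5 FROM sql_predict_score ORDER BY transactionID"
).fetchall())  # [('T10', 0), ('T11', 1)]
```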

7: Evaluation

• Input: sql_predict_score

• Output:

  • sql_performance: account-level metrics

  • sql_performance_auc: transaction-level metric, the area under the ROC curve (AUC)
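Transaction-level AUC can be computed directly from its probabilistic definition, and account-level metrics by rolling transactions up to accounts. In this Python sketch the account-level rules (an account counts as fraudulent if any of its transactions is labeled fraud, and is flagged if any of its scores reaches the threshold) are assumptions for illustration; the metrics stored in sql_performance may be defined differently. The function names and data are hypothetical.

```python
from collections import defaultdict


def auc(labels, scores):
    """ROC AUC: probability a random fraud transaction outscores a random
    non-fraud one, with ties counted as 1/2.

    O(n_pos * n_neg); fine for a sketch, use a rank-based method at scale.
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def account_level_rates(rows, threshold):
    """rows: (accountID, label, score) triples.

    ASSUMPTION: an account is fraudulent if any transaction has label 1, and is
    flagged if any of its scores is >= threshold. Returns (detection rate over
    fraud accounts, false positive rate over non-fraud accounts).
    """
    is_fraud = defaultdict(bool)
    flagged = defaultdict(bool)
    for account, label, score in rows:
        is_fraud[account] |= label == 1
        flagged[account] |= score >= threshold
    fraud_accts = [a for a in is_fraud if is_fraud[a]]
    legit_accts = [a for a in is_fraud if not is_fraud[a]]
    detection_rate = sum(flagged[a] for a in fraud_accts) / len(fraud_accts)
    false_positive_rate = sum(flagged[a] for a in legit_accts) / len(legit_accts)
    return detection_rate, false_positive_rate


print(auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))  # 0.75
rows = [("A", 1, 0.9), ("A", 0, 0.2), ("B", 0, 0.6), ("C", 0, 0.1), ("D", 1, 0.3)]
print(account_level_rates(rows, threshold=0.5))  # (0.5, 0.5)
```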


Deploy

• Send transaction data to the model in SQL Server.

• Receive the predicted probability of fraud.

• Use the predicted probability to decide whether to interrupt the purchase.
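The deploy flow reduces to a threshold decision on the returned probability. In this Python sketch score_fn is a hypothetical stand-in for the call to the model in SQL Server, toy_score is an invented scorer, and the 0.5 default threshold is a placeholder; a real cutoff would be chosen from the business's trade-off between missed fraud and interrupted legitimate purchases.

```python
def decide(transaction, score_fn, threshold=0.5):
    """Return 'interrupt' or 'proceed' for one incoming transaction.

    score_fn stands in for the request to the deployed model and must return a
    fraud probability in [0, 1].
    """
    p = score_fn(transaction)
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"score out of range: {p}")
    return "interrupt" if p >= threshold else "proceed"


def toy_score(txn):
    """Hypothetical scorer used only for illustration."""
    return 0.9 if txn["transactionAmountUSD"] > 1000 else 0.05


print(decide({"transactionAmountUSD": 25.0}, toy_score))    # proceed
print(decide({"transactionAmountUSD": 5000.0}, toy_score))  # interrupt
```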


Demo

• Develop in database from an R client

• Operationalize in database with T-SQL

• Deploy the model

• Visualize data in Power BI


Develop in R Client IDE


Operationalize in Database


Visualize Data (Power BI)


Visualize Predictions (Power BI)


Deploy


Resources

• Online Fraud Detection Template with SQL Server R Services: https://gallery.cortanaanalytics.com/Tutorial/Online-Fraud-Detection-Template-with-SQL-Server-R-Services-1

• R and SQL code for this demo: https://github.com/Microsoft/SQL-Server-R-Services-Samples/tree/master/FraudDetection/SQLR