Slash n: Technical Session 7 - Fraudsters are smart, Frank is smarter - Vivek Mehta, Fareed Jawad

28/01/13

Fraudsters are smart, “Frank” is Smarter

- Fareed and Vivek

Outline

Why detect fraud - is there a problem?

Why an intelligent system?

How we built one

Show me some numbers

What was the value of all electronic transactions globally for year 2012?

$17 trillion (with a T)

This includes all credit, debit and pre-paid cards used in both online and offline (card present) scenarios for purchases and cash withdrawals

More Numbers

How much of $17T was lost due to FRAUD?

$8 billion in 2012, > $10 billion by 2015

Fraud rate of 0.05% - not too bad, right?

Wrong!!
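
The 0.05% figure follows from the two numbers quoted on the slides. A quick check of the arithmetic (variable names are illustrative):

```python
# Figures quoted on the slides for 2012.
total_volume_usd = 17e12   # $17 trillion in electronic transactions
fraud_loss_usd = 8e9       # $8 billion lost to fraud

# Fraud rate as a percentage of total transaction value.
fraud_rate_pct = fraud_loss_usd / total_volume_usd * 100
print(f"{fraud_rate_pct:.3f}%")  # about 0.047%, which the slide rounds to 0.05%
```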

Getting specific

Reminder - the 0.05% ratio is for all transactions, including face-to-face transactions

The fraud rate is a much scarier 3.5% for online transactions, aka card-not-present (CNP)

Global e-Commerce is expected to exceed $1T in 2013 –> $3.5B will be lost due to fraud

Add to this, the erosion due to loss of future business from impacted customers

Big customer impact! Big deal for us!!

The Big Fight...

Fraud to transaction ratio has been constant over the past 10 years

This ratio should not lull us into a false sense of security – bigger numbers are at stake and increasing as volumes grow

The crooks LOVE e-Commerce (think 3.5%)

How do we then figure out if a transaction is genuine or a victim of fraud?

Intelligently, of course! - ENTER FRANK!!

Why Frank?

Fraud Detection System

Two parts:

Signals/Features

Algorithm

Rule based system

Rules on various signals:

Number of transactions from a card in the last day

Transaction amount

and many more

Thresholds are hand crafted

Fraud Score = sum of individual scores
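
A rule based scorer of this kind can be sketched as follows. This is a minimal illustration, not the system described in the talk: the `Transaction` fields, the thresholds and the per-rule scores are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    card_txn_count_last_day: int  # signal: transactions from this card in the last day
    amount: float                 # signal: transaction amount


# Each rule: (description, predicate on the signals, score if the rule fires).
# Thresholds and scores are hand crafted and purely hypothetical.
RULES = [
    ("many transactions from card in last day", lambda t: t.card_txn_count_last_day > 5, 40),
    ("large transaction amount", lambda t: t.amount > 10_000, 30),
]


def fraud_score(txn: Transaction) -> int:
    """Fraud score = sum of the individual scores of all rules that fire."""
    return sum(score for _, fires, score in RULES if fires(txn))


suspicious = Transaction(card_txn_count_last_day=6, amount=20_000)
ordinary = Transaction(card_txn_count_last_day=1, amount=100)
print(fraud_score(suspicious), fraud_score(ordinary))
```

Every new signal needs another hand-tuned threshold and score.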

Need for a smarter system

Too much data for manual analysis

Businesses are evolving

Fraudsters are evolving

Extending to really high dimensions - pushing beyond the limits of a rule based system

Designing Frank

Labeled data missing

Observation:

Very few fraud records

When you see one, you can identify one

Social behavior

Visualization

[Figure: plot with axes "Number of transactions in a day by a user" and "Total Amount"]

Visualization

Clustering – Centroid based

Clustering - Distance based

Clustering - Distribution based

Clustering – Density based

Density based clustering

[Figure: diagram with points labeled p, q and p1]

Epsilon

Min-pts

Statistical distance:

Scale invariant

Correlation taken into account
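
The slides do not name the statistical distance. One distance that is scale invariant and takes correlation into account is the Mahalanobis distance; the sketch below assumes that choice and two features (for example, transactions per day and total amount). The function names and sample data are illustrative.

```python
import math


def mean_and_cov(points):
    """Sample mean and 2x2 covariance (sxx, sxy, syy) of a list of (x, y) points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance, with the 2x2 covariance inverse written out."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


# Made-up sample: (transactions in a day, total amount).
points = [(1, 100), (2, 150), (3, 130), (2, 180), (4, 260)]
d_raw = mahalanobis_sq((6, 120), *mean_and_cov(points))

# Rescaling the amount axis by 1000 leaves the distance unchanged.
scaled = [(x, y * 1000) for x, y in points]
d_scaled = mahalanobis_sq((6, 120 * 1000), *mean_and_cov(scaled))
print(d_raw, d_scaled)
```

Because the covariance rescales along with the data, the two printed distances agree up to floating-point rounding, which is what "scale invariant" means here.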

Clustering for detecting fraud

Cluster the data using density based clustering

For a new point, find the distance to all the existing clusters

If a cluster has at least min-pts points within epsilon distance of the new point, the new point belongs to that cluster

If it doesn't belong to any cluster -> fraud
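
The steps above can be sketched with a small DBSCAN-style clustering. This is an assumption-laden illustration: the slides only say "density based clustering" with epsilon and min-pts, plain Euclidean distance stands in for the statistical distance, and the function names, parameter values and data are made up.

```python
import math


def region_query(points, idx, eps):
    """Indices of all points within eps of points[idx] (including itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: cluster id 0, 1, ... or -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
    return labels


def assign_cluster(new_point, points, labels, eps, min_pts):
    """Cluster id with at least min_pts members within eps, else None (possible fraud)."""
    counts = {}
    for p, label in zip(points, labels):
        if label != -1 and math.dist(new_point, p) <= eps:
            counts[label] = counts.get(label, 0) + 1
    candidates = [c for c, n in counts.items() if n >= min_pts]
    return max(candidates, key=lambda c: counts[c]) if candidates else None


points = [(1, 1), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),
          (5, 5), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1), (9, 0)]
labels = dbscan(points, eps=0.5, min_pts=3)
normal = assign_cluster((1.05, 1.0), points, labels, eps=0.5, min_pts=3)
outlier = assign_cluster((3, 3), points, labels, eps=0.5, min_pts=3)
print(labels, normal, outlier)
```

A new point that lands in no cluster (`None` here) is the case the slide flags as fraud.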

Computing fraud probability

We find the nearest cluster

Convert the distance to probability using chi-square distribution

Probability of fraud between 0 and 1
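
One way this conversion could work is sketched below. It rests on assumptions the slides do not state: the distance is a squared Mahalanobis distance to each cluster, each cluster is roughly Gaussian, and there are two features, so the squared distance follows a chi-square distribution with 2 degrees of freedom, whose CDF has the closed form 1 - exp(-x/2). Function names are illustrative.

```python
import math


def chi2_cdf_2dof(x):
    """CDF of the chi-square distribution with 2 degrees of freedom."""
    return 1.0 - math.exp(-x / 2.0)


def fraud_probability(sq_distance_by_cluster):
    """Pick the nearest cluster and map its squared distance to a value in [0, 1].

    sq_distance_by_cluster: dict of cluster id -> squared distance (hypothetical input).
    """
    nearest = min(sq_distance_by_cluster, key=sq_distance_by_cluster.get)
    return nearest, chi2_cdf_2dof(sq_distance_by_cluster[nearest])


cluster, prob = fraud_probability({0: 9.2, 1: 40.0})
print(cluster, round(prob, 3))
```

A point at the centre of its nearest cluster gets probability 0, and the value approaches 1 as the point moves further from every cluster.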

Execution

Distributed clustering

Real-time model updating

< 20ms to compute fraud probability

Suspend the payment authorization in real time

We Frank, You Shop.