Plastic Card Fraud Detection using Peer Group analysis
David Weston, Niall Adams, David Hand, Christopher Whitrow, Piotr Juszczak
19 September, 2007
EPSRC Think Crime Initiative
• EPSRC Think Crime Initiative
• ThinkCrime Team
• Overview
Plastic Card Fraud
Peer Group Analysis - Introduction
Peer Group Analysis
The Dataset
Applying Peer Group Analysis
Performance Evaluation
Experiments & Results
Conclusions & Current Work
• EPSRC Think Crime Initiative
• Crime Prevention & Detection
• Funding 12 projects
• Also feasibility studies and more
Think Crime Project
• Develop Fraud Detection Tools
• Real Data
ThinkCrime Team
• Members of the team are
◦ David Hand
◦ Niall Adams
◦ Christopher Whitrow
◦ Piotr Juszczak
◦ David Weston
◦ Gordon Blunt
• Collaborating banks
◦ Abbey National, Alliance and Leicester, Capital One, Lloyds TSB
Overview
• Plastic Card Fraud
• Peer Group Analysis
◦ Introduction
◦ Applied to Time-Aligned Multivariate Continuous Data
• The Dataset
• Peer Group Analysis
◦ Applied to Credit Card Transaction Data
• Performance Evaluation
• Experiments & Results
• Conclusions & Current Work
Plastic Card Fraud
Consequences of Fraud
• Financial Consequences
◦ UK: £428.0m
• Consumer Consequences
◦ Customer Inconvenience
◦ Fraud Detection: transactions falsely flagged as fraudulent
Patterns Of Fraud
• Fraud evolves to evade detection
• APACS 14/03/07
• UK card fraud £309.8m (−13%)
• Fraud abroad £118.2m (+43%)
The introduction of chip and PIN has made it more difficult for fraudsters to commit card fraud in the UK... create counterfeit magnetic stripe cards that can potentially be used in countries that haven’t upgraded to chip and PIN. This has caused the increase in fraud abroad losses over the last 12 months.
Determining when Fraud has occurred
• Issuing Bank determines if fraud has taken place
• Can take several months
• Not necessarily correct
• Bad Debt
◦ Bankruptcy
• ‘Friendly Fraud’
◦ 2001 US Banker magazine: over half of online fraudulent transactions
◦ Account Holder declares that a transaction they performed themselves is fraudulent
Challenges of Fraud Detection
• Fraud Evolution
• Data streams
• Timeliness
◦ Online System
◦ Back Office
• Imbalanced Classes
◦ Fraud as % of total value of number of transactions: 0.0148% (credit card, Australia)
Peer Group Analysis - Introduction
Approaches to Fraud Detection
• Broadly 2 approaches to statistical fraud detection
• Supervised or Anomaly Detection
• Supervised
◦ Historical Instances of Fraud
◦ Less likely to falsely flag a transaction as fraudulent
◦ Approach Chris is taking
Anomaly Detection
• Does not use historical Instances of Fraud
• Build a profile of ‘usual’ behaviour
• Significant deviations considered as potential frauds
• More likely to falsely flag a transaction as fraudulent
• Potential to adapt to changing fraud patterns
• Approach Piotr is taking
Peer Group Analysis
• Similar to anomaly detection methods
• Do not need to build a model of usual behaviour for account holder
• Determine a peer group
• Find other accounts that you expect will behave similarly to the account holder
• Find accounts that have behaved similarly in the past
• Monitor account holder’s behaviour with respect to peer group
• Anomalous behaviour, should account holder deviate strongly from peer group
Anomaly Detection to Peer Groups I
• The weekly amount spent on a credit card for a particular account
• Week 1 to Week n: $y_1, \ldots, y_{n-1}, y_n$
• Target Account
• Wish to determine if the amount spent in week n is anomalous
Anomaly Detection based on account profile
$$y_1 \quad y_2 \quad \cdots \quad y_{n-1} \quad y_n$$
Anomaly Detection to Peer Groups II
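Population Normalised Anomaly Detection
$$
\begin{array}{ccccc}
x_{m,1} & x_{m,2} & \cdots & x_{m,n-1} & x_{m,n} \\
\vdots & \vdots & & \vdots & \vdots \\
x_{2,1} & x_{2,2} & \cdots & x_{2,n-1} & x_{2,n} \\
x_{1,1} & x_{1,2} & \cdots & x_{1,n-1} & x_{1,n} \\
y_1 & y_2 & \cdots & y_{n-1} & y_n
\end{array}
$$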
Anomaly Detection to Peer Groups III
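Sort accounts in order of decreasing similarity, $\pi(i)$
$$
\begin{array}{ccccc}
x_{\pi(m),1} & x_{\pi(m),2} & \cdots & x_{\pi(m),n-1} & x_{\pi(m),n} \\
\vdots & \vdots & & \vdots & \vdots \\
x_{\pi(k),1} & x_{\pi(k),2} & \cdots & x_{\pi(k),n-1} & x_{\pi(k),n} \\
\vdots & \vdots & & \vdots & \vdots \\
x_{\pi(2),1} & x_{\pi(2),2} & \cdots & x_{\pi(2),n-1} & x_{\pi(2),n} \\
x_{\pi(1),1} & x_{\pi(1),2} & \cdots & x_{\pi(1),n-1} & x_{\pi(1),n} \\
y_1 & y_2 & \cdots & y_{n-1} & y_n
\end{array}
$$
• Peer Group size k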
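The ordering π and the choice of the first k accounts can be sketched as below. This is only an illustration: the slide does not say how similarity is measured at this point, so Euclidean distance over the earlier weeks is an assumption, and the name `peer_group` and the toy data are hypothetical.

```python
import math

def peer_group(target_history, population_histories, k):
    """Return indices of the k accounts whose histories are closest to the target.

    Assumption: similarity is Euclidean distance over the historical weeks.
    """
    distances = [
        (math.dist(target_history, history), index)
        for index, history in enumerate(population_histories)
    ]
    distances.sort()  # decreasing similarity = increasing distance
    return [index for _, index in distances[:k]]

# Toy data: weekly spend of the target (y) and of four other accounts (x).
y = [10.0, 12.0, 11.0]
x = [
    [50.0, 55.0, 52.0],
    [10.0, 13.0, 11.0],
    [9.0, 12.0, 12.0],
    [30.0, 28.0, 31.0],
]
peers = peer_group(y, x, k=2)
```

Here `peers` holds the positions of the two accounts that tracked the target most closely; only these accounts would be consulted when week n is assessed.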
Peer Groups Example
[Figure: a sequence of five example plots; only the axis tick values survived extraction.]
Peer Group Analysis
Detecting Anomalies
• Assuming we already have a peer group set of accounts for our target account.
• $y_n$ is multivariate (column vector) and continuous
• Mahalanobis distance of the target from the mean of its peer group
• $\mu$ is the mean of $x_{\pi(1),n}, \ldots, x_{\pi(k),n}$
• $C$ is the covariance matrix of $x_{\pi(1),n}, \ldots, x_{\pi(k),n}$
• Mahalanobis distance of a target from its peer group
◦ $\sqrt{(y_n - \mu)^T C^{-1} (y_n - \mu)}$
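A minimal NumPy sketch of this distance follows. The function name and the toy peer group are hypothetical, and the use of the sample covariance (NumPy's default normalisation) is an assumption, since the slides do not say which estimator is used.

```python
import numpy as np

def mahalanobis_from_peers(y_n, peers_n):
    """Mahalanobis distance of the target y_n from the mean of its peer group.

    peers_n has shape (k, d): one row per peer group member at time n.
    """
    mu = peers_n.mean(axis=0)
    C = np.cov(peers_n, rowvar=False)  # d x d sample covariance (assumption)
    diff = y_n - mu
    # Solve C z = diff instead of forming the inverse explicitly.
    return float(np.sqrt(diff @ np.linalg.solve(C, diff)))

# Toy peer group of four accounts in two dimensions.
peers_n = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
distance = mahalanobis_from_peers(np.array([2.0, 0.0]), peers_n)
```

Solving the linear system avoids an explicit matrix inverse, which tends to behave better numerically when the peer group covariance is close to singular.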
• If the distance is above an externally selected threshold, then we flag the target as fraudulent.
[Figure: scatter plot of the Peer Group and the Target; both axes run from −10 to 10.]
Robustifying Peer Groups
• Peer Group contaminated by fraudulent transactions
• Outlier Masking
• Outlier Swamping
[Figure: scatter plot of the Peer Group and the Target; both axes run from −10 to 10.]
• Robustify the covariance matrix for the Mahalanobis Distance evaluation
• Use Heuristic
• An account that has deviated strongly from its peer group at time t should not contribute to any peer group at time t
• For each peer group, select the p% of members closest to their own peer groups
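One way to read the heuristic is sketched below. It assumes that each account's distance from its own peer group at time t has already been computed; the function name, the rounding rule and the toy distances are all hypothetical.

```python
import math

def robust_members(peer_ids, own_distance, p):
    """Keep the p% of peer group members that are closest to their own peer groups.

    own_distance maps an account to its distance from its own peer group at time t.
    Assumption: the number kept is rounded up, with at least one member retained.
    """
    keep = max(1, math.ceil(len(peer_ids) * p / 100))
    ranked = sorted(peer_ids, key=lambda account: own_distance[account])
    return ranked[:keep]

own_distance = {"a": 0.5, "b": 4.0, "c": 1.2, "d": 0.9}
trimmed = robust_members(["a", "b", "c", "d"], own_distance, p=50)
```

In this toy case account "b", which has moved far from its own peers, is dropped before the mean and covariance of the peer group are estimated.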
Peer Group Quality
It is not necessarily the case that peer group analysis can be successfully deployed on all accounts.
$$q_t = \frac{1}{k} \sum_{i=1}^{k} (y_t - x_{\pi(i),t})^T (y_t - x_{\pi(i),t}) \qquad (1)$$
where T is the transpose. This is a simple measure of how close the members of the peer group are to the target.
• A good quality peer group is one that closely follows the target over time.
$$Q_{s,e} = \frac{1}{t_e - t_s} \sum_{t=t_s}^{t_e} q_t. \qquad (2)$$
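Equations (1) and (2) translate directly into code. The sketch below uses plain Python lists and dictionaries keyed by time step; the function names and the toy series are hypothetical, and the normalisation by t_e − t_s follows equation (2) as written.

```python
def q_t(y_t, peers_t):
    """Equation (1): mean squared Euclidean distance from the target to its peers at time t."""
    k = len(peers_t)
    return sum(
        sum((y - x) ** 2 for y, x in zip(y_t, x_t)) for x_t in peers_t
    ) / k

def quality(target_series, peer_series, t_s, t_e):
    """Equation (2): sum of q_t over t_s..t_e, divided by (t_e - t_s) as on the slide."""
    total = sum(q_t(target_series[t], peer_series[t]) for t in range(t_s, t_e + 1))
    return total / (t_e - t_s)

# Toy example: a target and a peer group of size 2 over three time steps.
target_series = {1: [0.0, 0.0], 2: [1.0, 1.0], 3: [2.0, 2.0]}
peer_series = {
    1: [[1.0, 0.0], [0.0, 1.0]],
    2: [[1.0, 1.0], [3.0, 1.0]],
    3: [[2.0, 2.0], [2.0, 2.0]],
}
Q = quality(target_series, peer_series, t_s=1, t_e=3)
```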
Whitening the Population
• Whitening the population to make the scatter of a peer group (of size 2) commensurate across time
• The smaller the value of $Q_{s,e}$ the better the peer group tracks the target over time.
[Figure: Population, Peer Group Members and Target shown at t=1, t=2 and t=3.]
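Whitening at a single time step can be sketched as follows. The slides do not say which whitening transform is used, so the symmetric (ZCA) form via an eigendecomposition is an assumption, as are the function name and the synthetic population.

```python
import numpy as np

def whiten(population):
    """Centre the population and rescale it so its sample covariance is the identity.

    population has shape (m, d): one row per account at a single time step.
    """
    centred = population - population.mean(axis=0)
    cov = np.cov(centred, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Symmetric whitening matrix C^(-1/2); assumes cov is positive definite.
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T
    return centred @ W

# Synthetic population whose three features have very different scales.
rng = np.random.default_rng(0)
population = rng.normal(size=(500, 3)) * np.array([1.0, 10.0, 100.0])
whitened = whiten(population)
```

After this step, squared distances such as q_t are measured in the same units at every time step, regardless of how spread out the population happened to be.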
Building Peer Groups
• Possible to know a priori the peer group membership
• Employee fraud detection: people with the same job description can be naturally grouped together.
• IBM FAMS. Health care fraud. Geography, speciality
• Infer peer group membership from the time series itself
• Measuring similarity of time series
The Dataset
Real Data
• Real credit card transaction history
• 4 month period
• Selected approximately 50,000 accounts
• No static data about the account holder
• Each account is a list of transactions
Transaction Details
Each Transaction is a record that includes
• Amount
• Time transaction took place
• Type of transaction, e.g. change PIN code
• ATM or POS
• Card present / not present
A Fraud flag was provided that gave the date (to the nearest day) when fraudulent behaviour began.
Merchant Category Codes
• Identify in which market segment the transaction was performed
• For example ‘Book stores’
• 4 digit number
• Fewer than 10,000 codes in use
• Merchant Category Groups
Applying Peer Group Analysis
Time Alignment & Feature Extraction
• Accounts’ transactions are asynchronous data streams
• Synchronise account time series by extracting features from the data streams at regular time intervals
• M(s, e, A) summarises transactions of account A occurring from day s to day e inclusive
◦ Mean amount spent
◦ Number of transactions
◦ Entropy of Merchant Category Groups (16 Groups + 1 for ATMs)
• Returns 1 point in 3 dimensional space
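The summary M(s, e, A) can be sketched as a small function over one account's transaction list. The record layout `(day, amount, group)`, the use of the natural logarithm for the entropy, and the handling of an empty window are assumptions made for illustration; none of them is specified on the slides.

```python
import math
from collections import Counter

def summarise(transactions, s, e):
    """M(s, e, A): (mean amount, number of transactions, entropy of category groups).

    transactions is an iterable of (day, amount, group) tuples for one account.
    """
    window = [(amount, group) for day, amount, group in transactions if s <= day <= e]
    count = len(window)
    if count == 0:
        return (0.0, 0, 0.0)  # assumption: an empty window maps to the origin
    mean_amount = sum(amount for amount, _ in window) / count
    group_counts = Counter(group for _, group in window)
    # Shannon entropy (natural log, an assumption) of the group frequencies.
    entropy = -sum((c / count) * math.log(c / count) for c in group_counts.values())
    return (mean_amount, count, entropy)

# Hypothetical account: the last transaction falls outside the window.
account_a = [(7, 20.0, "grocery"), (8, 60.0, "ATM"), (9, 40.0, "grocery"), (12, 99.0, "travel")]
features = summarise(account_a, 7, 10)
```

The entropy term is zero when every transaction in the window falls in one Merchant Category Group and grows as spending spreads across groups.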
[Figure: Amount Withdrawn against Day (days 0 to 10, amounts 0 to 100) for Account A and Account B, with the summary windows M(7,10,A) and M(7,10,B) marked.]
Outlier Detection from Peer Groups
• Once a day at midnight
• Summary statistic for day t, behaviour of the past d days: M(t − d + 1, t, A)
• Smaller d, the more sensitive to new transactions
• Mahalanobis distance in 3 dimensional space
Active and Inactive Accounts
• Account inactive on day t if it has not performed any transactions on that day
• Do not test for outlierness for inactive accounts
• Unusually long periods of inactivity will not be considered fraudulent
• Account not active over entire summary statistic window
• Active peer group members. Closest k accounts that are active on at least one day of the summary statistic window
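Selecting the active peer group members can be sketched as a walk down the similarity ordering that skips accounts with no transaction in the window. The function name, the data layout (a set of active days per account) and the toy values are hypothetical.

```python
def active_peers(ranked_peers, active_days, window_start, window_end, k):
    """First k accounts, in similarity order, with a transaction inside the window."""
    selected = []
    for account in ranked_peers:
        days = active_days.get(account, set())
        if any(window_start <= day <= window_end for day in days):
            selected.append(account)
            if len(selected) == k:
                break
    return selected

ranked = ["p1", "p2", "p3", "p4"]  # already sorted by decreasing similarity
active_days = {"p1": {3}, "p2": {20, 22}, "p3": {40}, "p4": {21}}
group = active_peers(ranked, active_days, window_start=18, window_end=24, k=2)
```

Because inactive accounts are skipped rather than counted, the effective peer group can reach further down the ordering on quiet days.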
Building Peer Groups
• Subdivide training data into n non-overlapping windows
◦ $M(1, \tfrac{L}{n}, A), \ldots, M((n-1)\tfrac{L}{n} + 1, L, A)$
• Point in 3n dimensional space
• Complication, potential for bias
• Standardise each window by whitening
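The windowing step can be sketched as below, taking L to be the number of days of training data (an assumption, since L is not defined on the slide) and assuming n divides L exactly. `toy_summary` is a hypothetical stand-in for M(s, e, A) that returns three numbers per window; the whitening of each window is left out of this sketch.

```python
def window_bounds(L, n):
    """Inclusive (start, end) day ranges of n equal, non-overlapping windows over days 1..L."""
    width = L // n  # assumption: n divides L exactly
    return [(i * width + 1, (i + 1) * width) for i in range(n)]

def training_vector(summarise, account, L, n):
    """Concatenate the per-window summaries into one point (3n numbers for 3 features)."""
    vector = []
    for start, end in window_bounds(L, n):
        vector.extend(summarise(account, start, end))
    return vector

def toy_summary(transactions, s, e):
    """Stand-in for M(s, e, A): (mean amount, count, largest amount)."""
    amounts = [amount for day, amount in transactions if s <= day <= e]
    if not amounts:
        return (0.0, 0, 0.0)
    return (sum(amounts) / len(amounts), len(amounts), max(amounts))

account = [(1, 10.0), (3, 30.0), (6, 5.0), (11, 8.0)]
vector = training_vector(toy_summary, account, L=12, n=3)
```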
[Figure: Amount Withdrawn (0 to 100) over days 0 to 10 for Account A and Account B, each split into three windows: M(1, 3⅓, ·), M(3⅓, 6⅔, ·) and M(6⅔, 10, ·).]
• Find k nearest neighbours
• Large number of accounts
• Accounts that have high volume of transactions unlikely to be tracked by accounts with low volume
• First sort by number of transactions in training data
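One plausible reading of the pre-sorting step is sketched below: accounts are ordered by transaction count, and the nearest-neighbour search for a target only considers a band of accounts around it in that ordering. The band width, the Euclidean distance, the function name and the toy data are all assumptions; the slides only state that accounts are first sorted by number of transactions.

```python
import math

def knn_with_presort(vectors, counts, target, k, band):
    """k nearest neighbours of `target`, searched only among the `band` accounts
    on either side of it when accounts are ordered by transaction count."""
    order = sorted(vectors, key=lambda account: counts[account])
    position = order.index(target)
    candidates = (
        order[max(0, position - band):position]
        + order[position + 1:position + 1 + band]
    )
    candidates.sort(key=lambda account: math.dist(vectors[target], vectors[account]))
    return candidates[:k]

vectors = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 2.0], "d": [5.0, 5.0], "e": [0.1, 0.1]}
counts = {"a": 100, "b": 110, "c": 90, "d": 105, "e": 900}
neighbours = knn_with_presort(vectors, counts, target="a", k=2, band=2)
```

Account "e" is the closest in feature space but is never considered, because its transaction volume is far from the target's. Restricting the search this way also cuts the number of distance computations when there are many accounts.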
Performance Evaluation
Performance Criteria
• Reduce total amount lost to fraud
• Reduce number of fraudulent transactions
• Reduce the time between fraud starting and fraud detection
• Reduce the number of account holders affected by flagging legitimate transactions as fraud
• Number of possible performance metrics
Performance Metric
• If an account has been flagged as containing fraudulent transactions, the card issuer would need to investigate this account.
• Minimise the amount of fraud given the number of investigations the card company can make
Performance Curve
• x-axis: number of fraudulent accounts missed as a proportion of the number of fraudulent accounts
• y-axis: number of fraud flags raised as a proportion of the number of accounts
• Different to ROC curve. The smaller the area under the curve the better the performance.
• Random classification is represented by a diagonal line from the top left to the bottom right.
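The curve for a single day can be sketched as follows, assuming each account has an outlier score (for example its Mahalanobis distance) and that flags are raised for the highest scores first. The function names, the trapezoidal integration and the toy data are assumptions made for illustration.

```python
def performance_curve(scores, is_fraud):
    """Points (proportion of frauds missed, proportion of accounts flagged),
    one per possible number of flags, flagging the highest scores first."""
    n_accounts = len(scores)
    n_frauds = sum(is_fraud)  # assumed to be at least 1
    ranked = sorted(range(n_accounts), key=lambda i: scores[i], reverse=True)
    points = [(1.0, 0.0)]  # no flags raised: every fraud is missed
    found = 0
    for flagged, i in enumerate(ranked, start=1):
        found += is_fraud[i]
        points.append(((n_frauds - found) / n_frauds, flagged / n_accounts))
    return points

def twice_area_under(points):
    """Twice the trapezoidal area under the curve, with frauds missed on the x-axis."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x0 - x1) * (y0 + y1) / 2  # x decreases as more flags are raised
    return 2 * area

# Toy day: four accounts, the two frauds have the two highest scores.
scores = [0.9, 0.1, 0.8, 0.2]
is_fraud = [1, 0, 1, 0]
curve = performance_curve(scores, is_fraud)
summary = twice_area_under(curve)
```

In this toy case the ranking is perfect, so the curve reaches zero frauds missed after flagging half of the accounts and the summary value equals the fraud rate.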
[Figure: performance curve. x-axis: Proportion of Frauds not found (0 to 1); y-axis: Number of Fraud Flags Raised per Day as a Proportion of the Population (0 to 1).]
• The lower the curve the better the performance.
• Twice Area under Curve lies in [0,1]; the smaller the area the better the performance
Average Performance Curve
• Produce one curve for each day
• Take the average of the curves.
• For a given proportion of fraud flags raised
[Figure: average performance curve. x-axis: Proportion of Frauds not found (0 to 1); y-axis: Number of Fraud Flags Raised per Day as a Proportion of the Population (0 to 1).]
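The averaging step can be sketched as below, reading the last bullet as "average the proportion of frauds missed at each fixed proportion of flags raised". That reading is an assumption, as is the requirement that every daily curve is sampled at the same flag proportions; the function name and the toy curves are hypothetical.

```python
def average_curve(daily_curves):
    """Average several daily curves point by point.

    Assumes every curve lists (frauds_missed, flags_raised) pairs at the same
    flags_raised values, in the same order.
    """
    n_days = len(daily_curves)
    averaged = []
    for points in zip(*daily_curves):
        flags = points[0][1]
        mean_missed = sum(missed for missed, _ in points) / n_days
        averaged.append((mean_missed, flags))
    return averaged

day1 = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
day2 = [(1.0, 0.0), (0.0, 0.5), (0.0, 1.0)]
avg = average_curve([day1, day2])
```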
Experiments & Results
Experiments
Data
• 4 months of data
• Accounts with > 80 transactions and fraud free for the first 3 months
• About 4000 accounts, 6% defrauded in the final month
• Performed Peer Group Analysis once a day for the remaining month
Parameters
• Peer Group building: 8 segments
• Summary Statistic window size: 7 days
• Active Peer Group Size: 100
• Robustifying Peer Groups: not used
Effect of Fraud Contamination using an Oracle
[Figure: Twice Area Under Curve (y-axis, 0 to 1) against Peer Group Size (x-axis, 0 to 400). Curves: With Fraud Contamination; Without Fraud Contamination.]
Building Peer Groups
The effect of changing the granularity of the description of the Peer Group building data.
[Figure: performance curves. x-axis: Proportion of Frauds Not Found (0 to 1); y-axis: Number of Fraud Flags Raised per Day as a Proportion of the Population (0 to 1). Curves labelled 1, 2, 4, 8 and 16.]
Varying Length of Summary Statistic Window
[Figure: performance curves. x-axis: Proportion of Frauds Not Found (0 to 1); y-axis: Number of Fraud Flags Raised per Day as a Proportion of the Population (0 to 1). Curves: 1 day; 3 days; 5 days; 7 days; 14 days.]
Global Outlier Detector
• Is peer group analysis doing nothing more than finding outliers to the population?
• Special case: use the largest possible peer group
• All accounts apart from the target account
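The special case can be sketched by scoring each account against every other account, so the peer group is as large as it can be. The outlier measure used here, an absolute standardised distance on a single summary value, is only a stand-in chosen for illustration and is not claimed to be the measure used in the talk; the function names are hypothetical.

```python
import statistics
from typing import Dict, Sequence


def peer_outlier_score(target_value: float, peer_values: Sequence[float]) -> float:
    """Absolute standardised distance of the target from its peers (illustrative measure)."""
    mean = statistics.fmean(peer_values)
    spread = statistics.pstdev(peer_values)
    if spread == 0:
        return 0.0 if target_value == mean else float("inf")
    return abs(target_value - mean) / spread


def global_outlier_scores(summary: Dict[str, float]) -> Dict[str, float]:
    """Score each account against all other accounts (largest possible peer group)."""
    if len(summary) < 2:
        raise ValueError("need at least two accounts")
    scores = {}
    for account, value in summary.items():
        # The peer group is every account apart from the target account.
        others = [v for other, v in summary.items() if other != account]
        scores[account] = peer_outlier_score(value, others)
    return scores
```

Replacing `others` with a smaller, account-specific peer group would turn the same scoring loop back into a peer group comparison, which is what makes the global detector a natural baseline.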
Peer Groups Performance
[Figure: performance curves. x-axis: Proportion of Frauds Not Found (0 to 1); y-axis: Number of Fraud Flags Raised per Day as a Proportion of the Population (0 to 1). Curves: Non Robust; Non Robust without Fraud Contamination; Robust; Global.]
Peer Groups Versus Global Outlier Detector
Performance of the peer group analysis compared with the global population outlier detector.
[Figure: x-axis: Number of Fraud Flags Raised per Day as a Proportion of the Population (0 to 1); y-axis: Performance Difference (−0.1 to 0.1). Curves: Robustified Peer Group; Peer Group.]
Peer Groups Versus Global Outlier Detector
Performance of the robustified peer group analysis compared with the global population outlier detector on screened data.
[Figure: x-axis: Number of Fraud Flags Raised per Day as a Proportion of the Population (0 to 1); y-axis: Performance Difference (−0.1 to 0.1).]
Conclusions & Current Work
Conclusions
• We have demonstrated that there exist credit card transaction accounts that evolve sufficiently closely to enable fraudulent behaviour to be detected.
• Peer group analysis finds frauds that are not global outliers to the population.
Current work
• Combining Methods