8/6/2019 Data Mining for Fraud Detection 4381
CS490D: Introduction to Data Mining
Prof. Chris Clifton
April 14, 2004
Fraud and Misuse Detection
What is Fraud Detection?
Identify wrongful actions
Is right and wrong universal?
If so, why not just prevent wrong actions?
Identify actions by the wrong people
Identify suspect actions
Legal, but probably not right
In Data Mining terms
Classification?
Classify into fraudulent and non-fraudulent behavior
What do we need to do this?
Outlier Detection
Assume non-fraudulent behavior is normal
Find the exceptions
Problems?
Solution: Differential Profiling
Determine individual behavior
What is normal for the individual
What separates one individual from another
Gives profile of individual behavior
How do we do this?
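One possible answer, as a minimal sketch: keep a per-individual summary of past behavior and score each new event against that individual's own history rather than against the whole population. The data, the user names, and the choice of mean and standard deviation as the profile are assumptions made for illustration; the slides do not prescribe a particular statistic.

```python
from statistics import mean, stdev

def build_profiles(history):
    """history maps a user id to a list of that user's past transaction amounts."""
    profiles = {}
    for user, amounts in history.items():
        if len(amounts) >= 2:  # stdev needs at least two observations
            profiles[user] = (mean(amounts), stdev(amounts))
    return profiles

def deviation(profiles, user, amount):
    """Number of standard deviations `amount` lies from the user's own mean."""
    mu, sigma = profiles[user]
    if sigma == 0:
        return 0.0 if amount == mu else float("inf")
    return abs(amount - mu) / sigma

# Hypothetical histories: the same amount is normal for one user, unusual for another.
history = {
    "alice": [20.0, 25.0, 22.0, 30.0, 18.0],
    "bob": [900.0, 1100.0, 1000.0, 950.0, 1050.0],
}
profiles = build_profiles(history)
alice_score = deviation(profiles, "alice", 1000.0)  # far from alice's usual spend
bob_score = deviation(profiles, "bob", 1000.0)      # exactly bob's usual spend
```

A 1000.00 charge scores very high against alice's profile and zero against bob's, which is the point of profiling the individual instead of assuming one global notion of normal.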
Has this been done?
Intrusion Detection (Lane & Brodley)
Profiled computer users based on command sequences
Command
Some (but not all) argument information
Sequence information
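To make the idea of a command-sequence profile concrete, here is a simplified sketch. It is not Lane & Brodley's actual similarity measure: it assumes a profile is just the set of command 3-grams seen in a user's past sessions, and it scores a new session by the fraction of its 3-grams already in that set. The function names and sample sessions are hypothetical.

```python
def ngrams(commands, n=3):
    """All length-n windows of a command list, as tuples."""
    return [tuple(commands[i:i + n]) for i in range(len(commands) - n + 1)]

def build_profile(sessions, n=3):
    """Set of command n-grams seen in a user's past sessions."""
    profile = set()
    for session in sessions:
        profile.update(ngrams(session, n))
    return profile

def familiarity(profile, session, n=3):
    """Fraction of the session's n-grams that appear in the profile."""
    grams = ngrams(session, n)
    if not grams:
        return 0.0
    return sum(g in profile for g in grams) / len(grams)

# Hypothetical shell histories for one user.
past = [
    ["cd", "ls", "vi", "make", "ls"],
    ["cd", "ls", "vi", "make", "gdb"],
]
profile = build_profile(past)
usual = familiarity(profile, ["cd", "ls", "vi", "make"])
odd = familiarity(profile, ["su", "cat", "chmod", "rm"])
```

A low familiarity score suggests the session was not produced by the profiled user; where to set the alarm threshold is a separate tuning question.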
Results
[Figure panels: Accuracy; Time to Alarm]
Scaling Issues
What happens with millions of users?
Credit card
Cell phone
What about new users?
Ideas?
Multi-user profiles
Cluster users
Develop profiles for clusters
E.g., differential profiling
Old customers: Do they match the profile for their cluster?
Allows wider range of acceptable behavior
New customer: Do they match any profile?
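The two checks above can be sketched as follows, assuming the clustering step has already produced a centroid and an acceptance radius per cluster. The cluster names, the two features, and the radii are hypothetical, and a real system would scale features before measuring distance.

```python
from math import dist

# Hypothetical cluster profiles: centroid of (avg. monthly spend, calls per day)
# plus an acceptance radius. In practice these would come from a clustering step.
cluster_profiles = {
    "light": ((50.0, 2.0), 30.0),
    "heavy": ((400.0, 20.0), 80.0),
}

def matches_cluster(behavior, cluster):
    """Old customer: does the behavior fall within their own cluster's profile?"""
    centroid, radius = cluster_profiles[cluster]
    return dist(behavior, centroid) <= radius

def matches_any(behavior):
    """New customer: list every cluster profile the behavior is consistent with."""
    return [name for name in cluster_profiles if matches_cluster(behavior, name)]

old_customer_ok = matches_cluster((60.0, 3.0), "light")
new_customer_fits = matches_any((60.0, 3.0))
new_customer_odd = matches_any((1000.0, 90.0))
```

A new customer whose behavior matches no cluster at all is the case worth a closer look.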
Data mining for detection and prevention
Data mining defined:
The process of discovering meaningful new relationships, patterns and trends by sifting through data using pattern recognition technologies as well as statistical and mathematical techniques.
- The Gartner Group
Matching known fraud/non-compliance
Which new cases are similar to known cases?
How can we define similarity?
How can we rate or score similarity?
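As one illustration of a similarity score, the sketch below describes each case as a set of categorical flags and rates a new case by its highest Jaccard similarity to any known fraud case. Jaccard is only one of many possible definitions, and the flag names and known cases are invented for the example.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of flags: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical known fraud cases, each described by a set of flags.
known_fraud = [
    {"new_payee", "round_amount", "weekend", "rush_payment"},
    {"duplicate_invoice", "new_payee", "po_box_address"},
]

def fraud_similarity(case):
    """Score a case by its best match among the known fraud cases (0.0 to 1.0)."""
    return max(jaccard(case, known) for known in known_fraud)

score_a = fraud_similarity({"new_payee", "round_amount", "weekend"})
score_b = fraud_similarity({"recurring_payee", "weekday"})
```

Here `score_a` is 0.75 (three of four flags shared with the first known case) and `score_b` is 0.0, so the first case would rank ahead of the second for review.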
Anomalies and irregularities
How can we detect anomalous or unusual behavior?
What do we mean by usual?
Can we rate or score cases on their degree of anomaly?
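One way to turn "degree of anomaly" into a number, offered as a sketch rather than a recommendation: take "usual" to mean the median of the observed values and score each case by its distance from the median in units of the median absolute deviation (MAD). The claim amounts are made up, and a single numeric feature is assumed.

```python
from statistics import median

def anomaly_scores(values):
    """Distance of each value from the median, in units of the MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return [0.0 if v == med else float("inf") for v in values]
    return [abs(v - med) / mad for v in values]

# Hypothetical claim amounts; one is far from the rest.
claims = [100.0, 110.0, 95.0, 105.0, 102.0, 900.0]
scores = anomaly_scores(claims)
most_anomalous = claims[scores.index(max(scores))]
```

Median and MAD are used here instead of mean and standard deviation because a single extreme value distorts them far less, so the extreme case does not hide itself.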
Data mining is not
Blind application of analysis/modeling algorithms
Brute-force crunching of bulk data
Black box technology
Magic
How do you mine data?
Use the Cross-Industry Standard Process for Data Mining (CRISP-DM)
Based on real-world lessons:
Focus on business issues
User-centric & interactive
Full process
Results are used
Techniques used to identify fraud
Predict and Classify
Regression algorithms (predict numeric outcome): neural networks, CART, Regression, GLM
Classification algorithms (predict symbolic outcome): CART, C5.0, logistic regression
Group and Find Associations
Clustering/Grouping algorithms: K-means, Kohonen, 2Step, Factor analysis
Association algorithms: apriori, GRI, Capri, Sequence
Techniques for finding fraud:
Predict the expected value for a claim, compare that with the actual value of the claim.
Those cases that fall far outside the expected range should be evaluated more closely
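The expected-versus-actual comparison can be sketched with a one-variable least-squares fit. The predictor (days in hospital), the historical amounts, and the three-standard-deviation cutoff are all assumptions chosen for the example; the slides list neural networks, CART, and GLM as the kinds of model that would be used in practice.

```python
from statistics import mean, stdev

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical history: days in hospital and the amount paid on the claim.
days = [1, 2, 3, 4, 5, 6]
paid = [1000.0, 2100.0, 2900.0, 4000.0, 5100.0, 5900.0]
a, b = fit_line(days, paid)

# Spread of past claims around the fitted line (sample SD of the residuals).
residual_sd = stdev([y - (a + b * x) for x, y in zip(days, paid)])

def needs_review(x, actual, k=3.0):
    """Flag a claim whose actual value is more than k residual SDs from expected."""
    expected = a + b * x
    return abs(actual - expected) > k * residual_sd

typical = needs_review(3, 3000.0)
suspicious = needs_review(3, 9000.0)
```

A three-day stay billed at 3000.00 sits close to the fitted line and passes, while the same stay billed at 9000.00 falls far outside the expected range and is flagged.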
Techniques for finding fraud:
Build a profile of the characteristics of fraudulent behavior.
Pull out the cases that meet the historical characteristics of fraud.
Decision Trees and Rules
Techniques for finding fraud:
Group behavior using a clustering algorithm
Find groups of events using the association algorithms
Identify outliers and investigate
Clustering and Associations
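For the association side, the sketch below shows only the pair-counting step that association algorithms such as apriori build on: it counts how often two events occur together and keeps the pairs whose support meets a threshold. It is not a full apriori implementation, and the billing codes and the 0.5 threshold are invented for illustration.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(baskets, min_support):
    """Return {pair: support} for item pairs appearing in >= min_support of baskets."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) gives each pair one canonical order and ignores repeats
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

# Hypothetical claims, each a list of billed procedures.
claims = [
    ["xray", "cast", "office_visit"],
    ["xray", "cast"],
    ["office_visit", "flu_shot"],
    ["xray", "cast", "mri"],
]
pairs = frequent_pairs(claims, min_support=0.5)
```

Once the common groupings are known (here, `cast` with `xray` in three of four claims), a claim that bills one of them without the other, or an unusual combination, becomes a candidate outlier to investigate.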
Fraud detection using CRISP-DM
Provides a systematic way to detect fraud and abuse
Ensures auditing and investigative efforts are maximized
Continually assesses and updates models to identify new emerging fraud patterns
Leads to higher recoupments
Data mining in action: Fraud, waste and abuse case studies
How can data mining help?
Payment error prevention
Billing and payment fraud
Audit selection
Payment Error Prevention
The US Health Care Financing Administration needed to isolate the likely causes of payment error by developing a profile of acceptable billing practices and...
used this information to focus their auditing effort
Payment error prevention solution
Clementine
Using audited discharge records, built profiles of appropriate decisions such as diagnosis coding and admission
Matched new cases
Cases not matching are audited
Payment error prevention results
Detected 50% of past incorrect payments resulting in significant recovery of funding lost to payment errors
PRO analysts able to use resultant Clementine models to prevent future error
Billing and payment fraud
The US Defense Finance and Accounting Service needed to find fraud in millions of Dept of Defense transactions and...
Identified suspicious cases to focus investigations
Billing and payment fraud solution
Clementine
Detection models based on known fraud patterns
Analyzed all transactions, scored based on similarity to these known patterns
High scoring transactions flagged for investigation
Billing and payment fraud results
Identified over 1,200 payments for further investigation
Integrated the detection process
Anomaly detection methods (e.g., clustering) will serve as sentinel systems for previously undetected fraud patterns
Audit selection
The Washington State Department of Revenue needed to detect erroneous tax returns and...
Focused audit investigations on cases with the highest likely adjustments
Audit selection solution
Clementine
Using previously audited returns
Model adjustment (recovery) per auditor hour based on return information
Models will then score future returns showing highest potential adjustment
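The scoring-and-selection step might look like the sketch below. It assumes a model has already produced a predicted adjustment and an estimated audit effort for each return (both fields and all values are hypothetical; the model itself is not shown), and it greedily picks returns by predicted adjustment per auditor hour until the available hours run out.

```python
def select_audits(returns, hours_available):
    """Greedy pick of return ids by predicted adjustment per auditor hour."""
    ranked = sorted(
        returns,
        key=lambda r: r["predicted_adjustment"] / r["est_hours"],
        reverse=True,
    )
    chosen, used = [], 0.0
    for r in ranked:
        # Skip any return that would exceed the remaining hour budget.
        if used + r["est_hours"] <= hours_available:
            chosen.append(r["id"])
            used += r["est_hours"]
    return chosen

# Hypothetical scored returns (model output assumed, not computed here).
returns = [
    {"id": "R1", "predicted_adjustment": 5000.0, "est_hours": 10.0},
    {"id": "R2", "predicted_adjustment": 1200.0, "est_hours": 2.0},
    {"id": "R3", "predicted_adjustment": 300.0, "est_hours": 3.0},
    {"id": "R4", "predicted_adjustment": 4000.0, "est_hours": 5.0},
]
selected = select_audits(returns, hours_available=8.0)
```

Ranking by adjustment per hour rather than by raw adjustment is what lets R4 and R2 be chosen ahead of R1, whose larger predicted recovery would consume more hours than are available.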
Audit selection results
Maximizes auditors' time by focusing on cases likely to yield the highest return
Closes the tax gap
Data mining - key to detecting and preventing fraud, waste and abuse
Learn from the past
High quality, evidence based decisions
Predict
Prevent future instances
React to changing circumstances
Models kept current, from latest data