Data Mining for Fraud Detection 4381

  • 8/6/2019 Data Mining for Fraud Detection 4381

    1/31

    CS490D: Introduction to Data Mining

    Prof. Chris Clifton

    April 14, 2004

    Fraud and Misuse Detection

    What is Fraud Detection?

    Identify wrongful actions

    Is right and wrong universal?

    If so, why not just prevent wrong actions?

    Identify actions by the wrong people

    Identify suspect actions

    Legal, but probably not right

    In Data Mining terms

    Classification?

    Classify into fraudulent and non-fraudulent behavior

    What do we need to do this?

    Outlier Detection

    Assume non-fraudulent behavior is normal

    Find the exceptions

    Problems?
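
    The outlier-detection idea above can be sketched with a plain z-score test. This is only an illustration under stated assumptions: the function name, the threshold, and the spending amounts are hypothetical and do not come from the lecture.

```python
from statistics import mean, stdev

def zscore_outliers(values, threshold=3.0):
    """Return (index, z-score) pairs for values whose distance from the
    mean exceeds `threshold` sample standard deviations."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [
        (i, (v - mu) / sigma)
        for i, v in enumerate(values)
        if abs(v - mu) / sigma > threshold
    ]

# Hypothetical daily spending amounts; the last one is far from the rest.
amounts = [20, 22, 19, 21, 23, 20, 18, 22, 21, 500]
flagged = zscore_outliers(amounts, threshold=2.5)
```

    One problem the sketch makes visible: the extreme value inflates the standard deviation it is measured against, so with ten samples it only reaches a z-score of about 2.85 and a threshold of 3.0 would miss it. A single global notion of "normal" also ignores that different individuals behave differently.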

    Solution: Differential Profiling

    Determine individual behavior

    What is normal for the individual

    What separates one individual from another

    Gives profile of individual behavior

    How do we do this?
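
    One possible answer, as a minimal sketch: keep a separate statistical profile per individual and measure each new event against that individual's own history. The profile used here (mean and standard deviation of a single amount) and all names and data are assumptions made for illustration, not the method presented in the lecture.

```python
from collections import defaultdict
from statistics import mean, stdev

def build_profiles(history):
    """Map each user to (mean, std dev) of that user's own past amounts.
    Users with fewer than two events get no profile."""
    per_user = defaultdict(list)
    for user, amount in history:
        per_user[user].append(amount)
    return {u: (mean(a), stdev(a)) for u, a in per_user.items() if len(a) >= 2}

def deviation(profiles, user, amount):
    """Distance of `amount` from the user's own mean, in that user's
    standard deviations. Returns None when the user has no profile."""
    if user not in profiles:
        return None
    mu, sigma = profiles[user]
    if sigma == 0:
        return 0.0 if amount == mu else float("inf")
    return abs(amount - mu) / sigma

# Hypothetical transaction history: (user, amount).
history = [
    ("alice", 20), ("alice", 25), ("alice", 22),
    ("bob", 900), ("bob", 1100), ("bob", 1000),
]
profiles = build_profiles(history)
```

    With this data a 1000 charge is ordinary for bob (`deviation(profiles, "bob", 1000)` is 0.0) but far outside alice's profile, while an unseen user returns None because there is nothing to compare against.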

    Has this been done?

    Intrusion Detection (Lane & Brodley)

    Profiled computer users based on command sequences

    Command

    Some (but not all) argument information

    Sequence information

    Results

    [Figures: Accuracy; Time to Alarm]

    Scaling Issues

    What happens with millions of users?

    Credit card

    Cell phone

    What about new users?

    Ideas?

    Multi-user profiles

    Cluster users

    Develop profiles for clusters

    E.g., differential profiling

    Old customers: Do they match the profile for their cluster?

    Allows wider range of acceptable behavior

    New customers: Do they match any profile?
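
    The two checks above can be sketched as follows, assuming users have already been assigned to clusters by some clustering step that is not shown. Representing a cluster profile as a centroid plus a radius, the `slack` factor, and the feature values are all hypothetical choices for illustration.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def build_cluster_profiles(members, slack=1.5):
    """members: {cluster_id: [feature vectors]} -> {cluster_id: (centroid, radius)}.
    The radius is the farthest member distance, widened by `slack`."""
    profiles = {}
    for cid, pts in members.items():
        c = centroid(pts)
        radius = slack * max(math.dist(c, p) for p in pts)
        profiles[cid] = (c, radius)
    return profiles

def matches(profile, x):
    c, radius = profile
    return math.dist(c, x) <= radius

def check_old_customer(profiles, cid, x):
    """Old customer: does the behavior match the profile of their own cluster?"""
    return matches(profiles[cid], x)

def check_new_customer(profiles, x):
    """New customer: list every cluster profile the behavior matches."""
    return [cid for cid, prof in profiles.items() if matches(prof, x)]

# Hypothetical features: (monthly spend, transactions per month).
members = {
    "low": [(20, 2), (30, 3), (25, 2)],
    "high": [(900, 40), (1100, 50), (1000, 45)],
}
profiles = build_cluster_profiles(members)
```

    A new customer whose behavior matches no profile, such as `(400, 20)` here, yields an empty list and would be a candidate for closer review. In practice the features would need scaling so that no single one dominates the distance.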

    Data mining for detection and prevention

    Data mining defined:

    The process of discovering meaningful new relationships, patterns and trends by sifting through data using pattern recognition technologies as well as statistical and mathematical techniques.

    - The Gartner Group

    Matching known fraud/non-compliance

    Which new cases are similar to known cases?

    How can we define similarity?

    How can we rate or score similarity?
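
    As one way to make these questions concrete, the sketch below defines similarity as Jaccard overlap between sets of case attributes and scores each new case by its best match among known cases. The choice of Jaccard similarity and every attribute name are assumptions for illustration; the slides do not prescribe a particular measure.

```python
def jaccard(a, b):
    """Jaccard similarity of two attribute sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def score_against_known(case, known_cases):
    """Highest similarity between `case` and any known case (0.0 if none)."""
    return max((jaccard(case, k) for k in known_cases), default=0.0)

# Hypothetical attribute sets describing known fraud cases.
known_fraud = [
    {"weekend_billing", "duplicate_invoice", "new_vendor"},
    {"round_amount", "po_box_address", "new_vendor"},
]
new_cases = {
    "A": {"duplicate_invoice", "new_vendor", "weekend_billing", "round_amount"},
    "B": {"established_vendor", "net30_terms"},
}
scores = {name: score_against_known(attrs, known_fraud)
          for name, attrs in new_cases.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

    Case A shares three of four attributes with the first known case and scores 0.75; case B shares none and scores 0.0, so ranking by score puts A first for review.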

    Anomalies and irregularities

    How can we detect anomalous or unusual behavior?

    What do we mean by usual?

    Can we rate or score cases on their degree of anomaly?

    Data mining is not

    Blind application of analysis/modeling algorithms

    Brute-force crunching of bulk data

    Black box technology

    Magic

    How do you mine data?

    Use the Cross-Industry Standard Process for Data Mining (CRISP-DM)

    Based on real-world lessons:

    Focus on business issues

    User-centric & interactive

    Full process

    Results are used

    Techniques used to identify fraud

    Predict and Classify

    Regression algorithms (predict numeric outcome): neural networks, CART, Regression, GLM

    Classification algorithms (predict symbolic outcome): CART, C5.0, logistic regression

    Group and Find Associations

    Clustering/Grouping algorithms: K-means, Kohonen, 2Step, Factor analysis

    Association algorithms: Apriori, GRI, Capri, Sequence

    Techniques for finding fraud:

    Predict the expected value for a claim, compare that with the actual value of the claim.

    Those cases that fall far outside the expected range should be evaluated more closely
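
    A minimal sketch of this expected-versus-actual comparison, assuming a one-variable least-squares line as the predictive model and a cutoff of k residual standard deviations as the "expected range". The model choice, the cutoff, and the claim data are hypothetical; the slides name regression, neural networks, and other predictors without fixing one.

```python
from statistics import mean, pstdev

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def flag_claims(train, new_claims, k=3.0):
    """Fit on `train` (assumed to be legitimate claims), then return
    (x, actual, expected) for new claims whose actual value differs from
    the expected value by more than k training-residual std devs."""
    xs, ys = zip(*train)
    intercept, slope = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in train]
    spread = pstdev(residuals)
    flagged = []
    for x, y in new_claims:
        expected = intercept + slope * x
        if abs(y - expected) > k * spread:
            flagged.append((x, y, expected))
    return flagged

# Hypothetical claims: (days of treatment, claimed amount).
train = [(1, 110), (2, 190), (3, 310), (4, 390), (5, 510)]
new_claims = [(3, 320), (2, 600)]
flagged = flag_claims(train, new_claims)
```

    Here the fitted line predicts about 202 for a two-day claim, so the claim of 600 is flagged while the three-day claim of 320 falls within the expected range.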

    Techniques for finding fraud:

    Build a profile of the characteristics of fraudulent behavior.

    Pull out the cases that meet the historical characteristics of fraud.

    Decision Trees and Rules
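
    As a deliberately tiny stand-in for the tree and rule learners mentioned in this deck (CART, C5.0), the sketch below learns a one-level decision tree, a "stump", from labeled historical cases and then pulls out new cases that satisfy the learned rule. The feature name and the data are hypothetical.

```python
def learn_stump(cases, feature):
    """Pick the threshold t so that the rule `case[feature] > t => fraud`
    misclassifies the fewest historical cases.
    cases: list of (attribute dict, is_fraud). Returns (threshold, errors);
    the threshold is None if the feature has fewer than two distinct values."""
    values = sorted({c[feature] for c, _ in cases})
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_t, best_err = None, len(cases) + 1
    for t in candidates:
        err = sum((c[feature] > t) != label for c, label in cases)
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err

# Hypothetical audited history: (case attributes, was it fraud?).
history = [
    ({"claims_per_month": 2}, False),
    ({"claims_per_month": 3}, False),
    ({"claims_per_month": 4}, False),
    ({"claims_per_month": 5}, False),
    ({"claims_per_month": 9}, True),
    ({"claims_per_month": 12}, True),
]
threshold, errors = learn_stump(history, "claims_per_month")

# Pull out the new cases that meet the learned characteristic of fraud.
new_cases = [{"claims_per_month": 3}, {"claims_per_month": 10}]
pulled = [c for c in new_cases if c["claims_per_month"] > threshold]
```

    A real decision tree repeats this kind of split recursively over many features; the stump only shows how a readable rule can be derived from history and then applied to incoming cases.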

    Techniques for finding fraud:

    Group behavior using a clustering algorithm

    Find groups of events using the association algorithms

    Identify outliers and investigate

    Clustering and Associations
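
    To illustrate the association side, the sketch below counts how often pairs of events occur together and keeps the pairs above a minimum support. This is only the pair-counting idea behind association mining, not a full Apriori implementation (it has no candidate pruning or rule generation), and the event names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support=0.5):
    """Return {pair: support} for item pairs that appear together in at
    least `min_support` (a fraction) of the transactions."""
    n = len(transactions)
    counts = Counter()
    for items in transactions:
        # Sort so each pair has one canonical ordering.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

# Hypothetical event groups, one list per account and week.
events = [
    ["address_change", "card_reissue", "large_purchase"],
    ["address_change", "card_reissue"],
    ["large_purchase"],
    ["address_change", "card_reissue", "large_purchase"],
]
pairs = frequent_pairs(events, min_support=0.6)
```

    With this data only the pair `("address_change", "card_reissue")` clears the 0.6 support threshold, at 0.75. Whether a frequent grouping is suspicious or merely common still has to be judged by an investigator.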

    Fraud detection using CRISP-DM

    Provides a systematic way to detect fraud and abuse

    Ensures auditing and investigative efforts are maximized

    Continually assesses and updates models to identify new emerging fraud patterns

    Leads to higher recoupments

    Data mining in action: Fraud, waste and abuse case studies

    How can data mining help?

    Payment error prevention

    Billing and payment fraud

    Audit selection

    Payment Error Prevention

    The US Health Care Finance Administration needed to isolate the likely causes of payment error by developing a profile of acceptable billing practices and...

    used this information to focus their auditing effort

    Payment error prevention solution

    Clementine

    Using audited discharge records, built profiles of appropriate decisions such as diagnosis coding and admission

    Matched new cases

    Cases not matching are audited

    Payment error prevention results

    Detected 50% of past incorrect payments resulting in significant recovery of funding lost to payment errors

    PRO analysts able to use resultant Clementine models to prevent future error

    Billing and payment fraud

    The US Defense Finance and Accounting Service needed to find fraud in millions of Dept of Defense transactions and...

    Identified suspicious cases to focus investigations

    Billing and payment fraud solution

    Clementine

    Detection models based on known fraud patterns

    Analyzed all transactions, scored based on similarity to these known patterns

    High scoring transactions flagged for investigation

    Billing and payment fraud results

    Identified over 1,200 payments for further investigation

    Integrated the detection process

    Anomaly detection methods (e.g., clustering) will serve as sentinel systems for previously undetected fraud patterns

    Audit selection

    The Washington State Department of Revenue needed to detect erroneous tax returns and...

    Focused audit investigations on cases with the highest likely adjustments

    Audit selection solution

    Clementine

    Using previously audited returns

    Model adjustment (recovery) per auditor hour based on return information

    Models will then score future returns showing highest potential adjustment

    Audit selection results

    Maximizes auditors' time by focusing on cases likely to yield the highest return

    Closes the tax gap

    Data mining - key to detecting and preventing fraud, waste and abuse

    Learn from the past

    High quality, evidence-based decisions

    Predict

    Prevent future instances

    React to changing circumstances

    Models kept current, from latest data