Data Mining Journal Entries for Fraud Detection: A Pilot Study by Roger S. Debreceny & Glen L. Gray Discussed by Severin Grabski


Data Mining Journal Entries for Fraud Detection: A Pilot Study

by Roger S. Debreceny & Glen L. Gray

Discussed by Severin Grabski


Objective

• Explore research issues related to the application of statistical data mining to fraud detection in journal entries
  – Is this important?
  – YES! Most significant frauds are not conducted by the users of the ERP systems; they are done “outside” of these well-controlled systems.

• Was this accomplished?
  – Maybe


Accomplished?

• Used Benford’s Law in examining Journal Entries

• Statistically significant differences in First Digit distributions were found (Chi Square test); should these be investigated?
  – A 0% difference (Omicron) gives a statistically significant p < 0.015. What does this tell me?
  – Is a 1% difference between observed and predicted indicative of a problem?
  – Could use Mean Absolute Deviation
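
The contrast between the Chi Square test and Mean Absolute Deviation can be sketched as follows. This is a minimal illustration, not the authors' procedure; the function names and the choice to read digits from amounts rounded to cents are assumptions made here.

```python
import math
from collections import Counter

def benford_first_digit():
    """Expected first-digit proportions under Benford's Law: log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(amount):
    """Leading non-zero digit of a non-zero amount (sign ignored, cents kept)."""
    digits = f"{abs(amount):.2f}".replace(".", "").lstrip("0")
    return int(digits[0])

def first_digit_tests(amounts):
    """Return (chi_square, mad) for observed vs. Benford first digits."""
    # Amounts that round to zero have no leading digit, so they are dropped.
    amounts = [a for a in amounts if round(abs(a), 2) > 0]
    n = len(amounts)
    counts = Counter(first_digit(a) for a in amounts)
    expected = benford_first_digit()
    # Chi-square statistic on counts (8 degrees of freedom for 9 digits).
    chi_square = sum((counts[d] - n * p) ** 2 / (n * p) for d, p in expected.items())
    # Mean absolute deviation on proportions, averaged over the 9 digits.
    mad = sum(abs(counts[d] / n - p) for d, p in expected.items()) / 9
    return chi_square, mad
```

For a fixed set of observed proportions, the chi-square statistic grows in proportion to the number of entries while the MAD stays the same, which is one reason a very small deviation can be statistically significant in a large journal-entry file.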


Entity   Total Dev   MAD
Beta     0.19        0.0211
Chi      0.03        0.0033
ChiEta   0.06        0.0067
ChiNu    0.11        0.0122
ChiPi    0.30        0.0333
Delta    0.06        0.0067
Eta      0.20        0.0222
EtaNu    0.10        0.0111
EtaPi    0.08        0.0089
Nu       0.34        0.0378
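
The MAD column appears to be the total deviation divided by the nine possible first digits. That relationship is inferred here from the reported numbers rather than stated on the slide; the short check below (with a hypothetical helper name) shows that it reproduces every row to the reported precision.

```python
# (entity, total deviation, MAD) as reported in the table above
ROWS = [
    ("Beta", 0.19, 0.0211), ("Chi", 0.03, 0.0033), ("ChiEta", 0.06, 0.0067),
    ("ChiNu", 0.11, 0.0122), ("ChiPi", 0.30, 0.0333), ("Delta", 0.06, 0.0067),
    ("Eta", 0.20, 0.0222), ("EtaNu", 0.10, 0.0111), ("EtaPi", 0.08, 0.0089),
    ("Nu", 0.34, 0.0378),
]

def mad_from_total(total_dev, n_digits=9):
    """Assumed relationship: MAD = total absolute deviation / number of digits."""
    return total_dev / n_digits

def mismatches(rows, tolerance=5e-5):
    """Entities whose reported MAD differs from total/9 by more than rounding."""
    return [name for name, total, mad in rows
            if abs(mad_from_total(total) - mad) > tolerance]
```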


Benford’s Law & First 5 Firms

[Chart: first-digit frequency (y-axis, 0% to 35%) by first digit 1 to 9 (x-axis); series: Benford, Beta, Chi, ChiEta, ChiNu, ChiPi]


Accomplished?

• Identification of “violations” of Benford’s First Digit Law only provides a preliminary indication
  – Nigrini and Mittermaier (1997) recommend using the first digit as an initial test of reasonableness


Other “Benford’s Law” Digit Tests

• Second Digit Test
  – This also only gives a preliminary indication

• First Two Digits Test
  – Provides more direction

• Number Duplication
  – Identify and rank order duplicate numbers
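
The expected frequencies behind these digit tests follow from the same logarithmic form as the first-digit law. The sketch below uses illustrative function names chosen here; the duplication ranking is simply a frequency count of identical amounts, which is one plausible reading of "identify and rank order duplicate numbers".

```python
import math
from collections import Counter

def expected_first_two(d12):
    """Benford probability that the first two digits equal d12 (10..99)."""
    return math.log10(1 + 1 / d12)

def expected_second_digit(d2):
    """Benford probability that the second digit equals d2 (0..9),
    obtained by summing the first-two-digit probabilities over d1 = 1..9."""
    return sum(math.log10(1 + 1 / (10 * d1 + d2)) for d1 in range(1, 10))

def rank_duplicates(amounts, top=5):
    """Most frequently repeated amounts, highest count first."""
    return Counter(amounts).most_common(top)
```

The second-digit distribution is much flatter than the first-digit one (roughly 12% for 0 down to 8.5% for 9), which is consistent with the slide's remark that it gives only a preliminary indication.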


Other Benford’s Law Research

• Carslaw (1988) found support for rounding up of income figures using the expected second digit frequencies (more 0s, fewer 9s than expected).

• Thomas (1989), again using second digits, found support for rounding up of income and down for losses.

• Nigrini
  – (1994) used first two digit frequencies to analyze payroll fraud, and
  – (1996) used first two digit frequencies to examine tax compliance


Fourth Digit Test

• Chi Square to test for distributional difference of fourth digit
  – “…distribution of the fourth digit for each organization for all dollar amounts over $999.”
  – Was this the fourth digit to the left or right?
  – What if the transaction was for $100,000?

• While statistically significant differences were found, should these be investigated?
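
The left-or-right question matters because the two readings pick different digits for most amounts. The sketch below is an illustration only: it assumes the digit is taken from the whole-dollar part of the amount (cents dropped), which the quoted passage does not specify, and the function names are hypothetical.

```python
def _whole_dollars(amount):
    """Digits of the whole-dollar part of an amount, sign and cents ignored."""
    return str(int(abs(amount)))

def fourth_digit_from_left(amount):
    """Fourth digit counting from the most significant digit, or None if < $1,000."""
    whole = _whole_dollars(amount)
    return int(whole[3]) if len(whole) >= 4 else None

def fourth_digit_from_right(amount):
    """Fourth digit counting from the units digit (the thousands place), or None."""
    whole = _whole_dollars(amount)
    return int(whole[-4]) if len(whole) >= 4 else None
```

For $123,456 the two readings give 4 and 3 respectively; for a round amount such as $100,000 both give 0.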


Three Digit Test

• Examined Last (Three) Digits in dollar amounts
  – Used the “top 5” of the last three digit pattern
  – Found that 4 of 29 entities had 30-60% of their transactions consisting of the top 5 last three digit patterns

• Would be interesting to note if these were the entities that “failed” Benford’s Law
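
The "top 5" share described above can be sketched as a simple frequency count. This is an assumption-laden illustration: it takes the last three digits of the amount expressed in cents (so $1,234.56 yields "456"), whereas the study may have defined the pattern differently, and the function names are made up here.

```python
from collections import Counter

def last_three_digits(amount):
    """Last three digits of the amount in cents, zero-padded (assumed definition)."""
    cents = round(abs(amount) * 100)
    return f"{cents:03d}"[-3:]

def top_pattern_share(amounts, top=5):
    """Fraction of transactions whose last-three-digit pattern is among the
    `top` most common patterns for this entity."""
    patterns = Counter(last_three_digits(a) for a in amounts)
    covered = sum(count for _, count in patterns.most_common(top))
    return covered / len(amounts)
```

An entity with many round-dollar entries would show a high share driven by the "000" pattern, so a high value is a prompt for follow-up rather than evidence of fraud by itself.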


Data Mining J/E Questions

• Would have liked a more reasoned/theoretical approach in specifying where and why data mining techniques should be applied

• Sources of J/E?
  – Influence Data Mining

• Unusual patterns between classes of J/Es?

• Does the class of J/E influence the nature of the J/E (i.e., does any type of J/E have a higher probability of fraud)?

• Evidence from Benford’s Law or Right Most Digits?

• Underlying issues that will guide effective and efficient data mining of JEs


Descriptive Statistics

• Any way to group the firms by industry?

• What can be found based upon grouping and analyzing by size?


Other Questions

• What other approaches (than Benford’s Law) can be applied to mining journal entries?

• What is currently done by audit teams for computerized analysis of journal entries?

• The analysis expects to see a “large enough” number of Journal Entries in order to highlight that fraud might be occurring. What if only a few JEs are made? What is the sensitivity of this approach?


Confusion

• Number of organizations?
  – 36 organizations
  – 8 data sets had less than 1 year
  – 1 data set was incomplete
  – That leaves 27, so why 29 observations?

• Did you count each year for the 2 organizations that provided 2 years of data as separate observations?
  – What is the justification?
  – Why not do a year-to-year comparison for those organizations?


What’s Missing?

• Interpretation and more detailed analysis of the data
  – Know that there are “violations” but never know if there is really fraudulent activity

• What are the other data mining techniques that are planned?

• Analytical reasoning as to what tests should be done or what is revealed by certain tests


Data Mining Extensions

• Compare the entities with “larger” average line items per journal entry (e.g. >10) in one pool?

• Alternatively look at those in which the maximum number of line items is large (e.g. >100)


Summary

• Objective – explore research issues related to the application of statistical data mining to fraud detection in journal entries

• Good first step – and this is a pilot study

• Would like more theoretical motivation for tests & research issues

• Would have liked more data analysis

• Could I apply this in an audit?

• I’m not sure: more research is needed


Thank You