QUT Digital Repository: · Chapter 1 DETECTING COLLUSION IN ERP SYSTEMS Asadul K. Islam, Malcolm Corney, George Mohay, Andrew Clark, Shane Bracher, Tobias Raub and Ulrich Flegel

This is the author version published as:

QUT Digital Repository: http://eprints.qut.edu.au/40635


Chapter 1

DETECTING COLLUSION IN ERP SYSTEMS

Asadul K. Islam, Malcolm Corney, George Mohay, Andrew Clark, Shane Bracher, Tobias Raub and Ulrich Flegel

Abstract In today’s technological age, fraud has become more complicated and increasingly difficult to detect, especially when it is collusive in nature. Fraud surveys have shown that the median loss from collusive fraud is much greater than that from fraud perpetrated by a single person. Despite its prevalence and potentially devastating effects, collusion is commonly overlooked as an organizational risk. Internal auditors often fail to proactively consider collusion in their fraud assessment and detection efforts. In this paper, we consider fraud scenarios with collusion. We present six potentially collusive fraudulent behaviors and show their detection process in an ERP system. We have enhanced our fraud detection framework to utilize aggregation of different sources of logs in order to detect communication, and have further enhanced it to render it system-agnostic, thus achieving portability and making it generally applicable to all ERP systems.

Keywords: Fraud Detection, Collusion, Enterprise Resource Planning, ERP Systems, Signature Matching

1. Introduction

The Association of Certified Fraud Examiners (ACFE) reports in its 2010 survey that in the Oceania region alone the annual median loss per company from fraud is more than $600,000 [1]. A typical organization loses 5% of its annual revenues to fraud and abuse. In today’s technological age, fraud has become more complicated and increasingly difficult to detect, especially when it is collusive in nature and committed by mid- and upper-management who are capable of concealing it [2]. The ACFE’s 2006 national fraud survey (covering the US) shows that the majority of 2006 cases (60.3%) involved only a single perpetrator, but when two or more persons conspired, the median loss more than quadrupled ($100,000 for one perpetrator and $485,000 for two or more perpetrators) [3]. When auditing user activities to detect possible fraudulent activities, it is relatively easy to identify individual transactions that indicate possible fraud; however, when the fraud involves a combination of multiple legitimate user steps, it is much harder to detect [4].

Enterprise resource planning (ERP) systems can help prevent fraud through appropriate policy and internal controls and these can help considerably to prevent crimes of opportunity. However, their effectiveness is limited for a number of reasons, including that the controls are typically designed only to red flag, not prevent, individual transactions. In addition, controls are sometimes simply not turned on either because of misconfiguration or, in the case of small to medium enterprises (SMEs), because the controls implement segregation of duties (SoD) and the SME simply has insufficient staff numbers to comply.

In our previous work [6] we developed the design and implementation of a framework for detecting patterns of fraudulent activity, which we call fraud scenarios. A fraud scenario can be defined as a set of user activities that indicates the possible occurrence of fraud. This immediately presents an interesting parallel with computer intrusion scenarios and the way in which intrusion detection systems (IDSs) attempt to recognize computer intrusions using attack signatures. Fraud activity scenarios, however, are concerned with high-level transactions by a user on financial data rather than with the events or states of the computer itself, as is the case for IDS. There is a correspondingly greater degree of system independence in the case of fraud detection, and this can in turn be exploited by more easily separating out abstract or semantic aspects of fraud signatures from configuration aspects [6]. For this reason, we designed a signature language specific to the purpose of defining fraud scenarios.

The semantics of the language have been developed and tested for common fraud scenarios in ERP systems. We identified and specified six fraud scenarios and described the process of specifying and detecting the occurrence of those scenarios in ERP user log data. The scenarios in each case reflected possible violation of segregation of duties or possible instances of masquerade. Detection of SoD violation scenarios involved identifying multiple transactions carried out by the one principal, whereas detection of possible masquerade scenarios involved identifying that multiple transactions (although carried out supposedly by different principals) were carried out from the same terminal.

Our fraud detection system performs post-hoc (non-realtime) analysis and investigation. The occurrences of the fraud scenarios do not always confirm fraud; rather they identify the possible occurrence of fraud and raise a red flag to the auditors for further investigation. The system gives the auditor a simple, user-friendly interface and tool for writing new fraud scenarios as well as for fine tuning existing ones. The system can be tuned to be more or less conservative. It can be tuned to identify a larger number of suspicious activities, effectively ‘red flagging’ many activities of which only a few, if any, would turn out after further investigation to be fraudulent. In IDS terms this is equivalent to producing many false positives (FPs). Alternatively, the system can be tuned to identify fewer suspicious activities, while possibly missing some potentially fraudulent scenarios, a situation equivalent in IDS terms to producing false negatives (FNs).

Collusive fraud is one of the most difficult types of fraud to expose. Collusion is acting with another person or other persons with the intention to deceive. More precisely, collusion is a secret agreement between two or more individuals for a deceitful or fraudulent purpose. It is difficult enough to uncover fraud when it is committed by an individual, but detecting two or more individuals working together to defraud an organization can be especially difficult. In this paper, we extend our previous work to include detection of fraudulent activities involving collusion.

Segregation of duties (SoD) is a cornerstone of effective internal controls and the deterrence of fraud. However, even with SoD in place, collusive acts can occur that effectively override segregation controls and thus allow co-conspirators to commit fraud by creating opportunities where none existed previously. Best practice to minimize fraud, especially collusion, includes mandatory disclosure of employee relationships, prohibiting vendors from sending gifts of any kind to employees, ensuring employees are aware of organizational policy, and a system for individuals to report suspicious behavior and irregularities from both inside and outside. However, fraud committed via collusion is difficult to uncover, particularly if the participants hold true to their pact. Respondents to KPMG’s fraud survey [5] cited collusion as one of the leading factors that allow fraud to occur in their organization.

To detect collusion, we have extended our framework so that, in addition to analyzing transactions, it enables the aggregation of different sources of logs in order to detect different forms of potentially collusive communication between principals, using logs such as phone logs, email logs, and door entry logs. We employ the same six scenarios as previously, now extended to express possible collusion between the principals. In addition, we have enhanced our framework to render it system-agnostic, thus achieving portability and making it generally applicable to all ERP systems.


An example. Purchase orders created and approved by the same user signify a potentially fraudulent activity where proper segregation of duties has not been implemented, which as mentioned is often the case for SMEs. However, fraud can still occur when collusion exists between two users who are allowed to carry out those steps individually. If we assume that the two steps are carried out by two legitimate users but that between these two steps they maintain some kind of communication in the form of email or phone, and if we can utilize the phone or email communication logs in addition to the ERP system’s transaction log, we can then define a fraud scenario which identifies the occurrence of potentially fraudulent activity. The scenario needs to express that the two users carry out the two steps and that they communicate before the complete set of transactions is finished. In this paper, we explain how we can utilize our scenario language to define this kind of collusive fraud scenario and how we can utilize different sources of logs to detect those scenarios. The design is applicable to transaction log files from any ERP system; however, throughout our research we have used transaction log files from SAP systems.

In Section 2 we discuss related work and the novelty of our work. Section 3 describes collusive fraud scenarios in general; it also includes a description of the collusive fraud scenarios we reference throughout the paper. Section 4 describes the detection framework and implementation. In Section 5 we describe our experimental evaluation of collusive fraud scenarios on synthetically generated data. Although the experiments were conducted on synthetic data, our fine-tuned approach for balancing the number of false positives and false negatives demonstrates successful detection of fraud scenarios. Section 6 contains the conclusion and future work.

2. Related Work

Fraud detection, being part of the overall approach to fraud control, automates and helps to reduce the laborious manual aspects of the screening and checking processes [7]. Every business is susceptible to internal fraud or corruption from its management and non-management employees. Internal fraud detection is mostly concerned with determining fraudulent financial reporting by management [8–11] and abnormal retail transactions by employees [12, 13].

In the case of detecting fraudulent behavior from user activity logs and transaction log data, data mining and statistical approaches are used to detect suspicious changes in user behavior or anomalies in behavior [7, 14]. In contrast to this anomaly detection technique, our previous paper [6] described a process of building fraud scenarios and the process of detecting such scenarios from the user transaction log data in ERP systems. Although this technique is not used widely in the detection of fraud, it is very commonly used in intrusion detection systems (IDSs) for detecting intrusions into computer systems (often referred to as signature matching) [15–17]. In [6], we described the difference between fraud scenario detection and intrusion signature detection, and explained the need for developing a scenario language specifically for specifying fraud scenarios. We also compared our semantics with the semantic model of Meier [18].

We have found no work in the literature on fraud detection which utilizes user activity logs to detect collusive fraud. In this paper we define six fraud scenarios involving user collusion and explain the process of detecting such scenarios from user activity logs. The prototype software which we have developed can utilize different sources of logs in the process of finding collusion.

3. Fraud Scenario Definition Language

The structure of a fraud scenario in our framework consists of: Scenario Name and Description, list of Components, Attributes and Scenario rules. A component is a transaction (extracted from a transaction log) or any already defined scenario. Scenario attributes hold the values required to specify other aspects of the scenario which define its behavior and characteristics, in particular the ‘inter-transaction’ conditions which together capture the essential nature of the fraud.

The minimum or maximum time intervals allowed between components are defined at three levels: default, scenario and component level. Component level attribute values override default and scenario level attribute values. Default values are used when there is no value specified in the definition. The scenario level interval value applies between all components and the component level value applies between two specific components. Duration applies as a condition for the maximum time for the whole scenario to complete. “Ordered” is used to specify whether the components should be present sequentially or not.
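The three-level precedence described above can be illustrated with a short sketch. It is not the framework's implementation: the function name, the dictionary layout, the default of one day and the 12-hour component-level value are all hypothetical and chosen only for illustration.

```python
from datetime import timedelta

# Hypothetical framework-wide default; the paper does not state the real value.
DEFAULT_MAX_INTERVAL = timedelta(days=1)


def resolve_max_interval(scenario, first, second):
    """Return the maximum interval allowed between two components.

    Precedence: component level, then scenario level, then default.
    """
    component_level = scenario.get("component_intervals", {})
    if (first, second) in component_level:
        return component_level[(first, second)]
    if scenario.get("max_interval") is not None:
        return scenario["max_interval"]
    return DEFAULT_MAX_INTERVAL


# Illustrative scenario definition (attribute names are assumptions).
s01 = {
    "name": "S01",
    "ordered": True,                     # components must appear in sequence
    "duration": timedelta(days=3),       # limit for the whole scenario
    "max_interval": timedelta(days=2),   # scenario level
    "component_intervals": {             # component level, overrides the above
        ("Change Vendor Bank", "Pay Vendor"): timedelta(hours=12),
    },
}

first_gap = resolve_max_interval(s01, "Change Vendor Bank", "Pay Vendor")
second_gap = resolve_max_interval(s01, "Pay Vendor", "Change Vendor Bank")
```

Here `first_gap` takes the component-level value, while `second_gap` falls back to the scenario-level value because no component-level entry exists for that pair.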

3.1 Defining Fraud Scenarios

A scenario definition file (SDF) is an XML file used to specify and store scenario definitions. For example, consider a scenario ‘Redirected Payment’ or ‘S01’, described as making a payment in such a way that the payment goes not to the vendor’s actual account but to a re-directed account. The scenario involves making payments to a vendor account after changing the bank account details of the vendor to a different account, and then, after payment, changing the bank details back again. For ‘changing of vendor’s bank details’ we found that three transaction codes (‘FK02’, ‘FI01’, ‘FI02’) are used in the SAP system. The scenario, which we call ‘Change Vendor Bank’, will match any of these transaction codes in the log. Similarly, another scenario, ‘make payments to vendor’, will match the existence of any of the four transaction codes (‘F-40’, ‘F-44’, ‘F-48’, ‘F-53’) found in the transaction log, and we call this ‘Pay Vendor’. The structure of the scenario ‘S01’ is shown in Figure 1. ‘Change Vendor Bank’ and ‘Pay Vendor’ are what we call basic scenarios; these are used only to match individual transactions or events, not groups or sequences of transactions or events, in the source logs. Our signature scenarios for detecting fraudulent activities consisting of multiple transactions or events we call scenarios or sometimes composite scenarios.

Figure 1. Structure of the fraud scenario ‘S01’.
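The distinction between basic and composite scenarios can be sketched as follows. The transaction codes are those listed above; the data structures and function names are hypothetical, and the sketch deliberately ignores the attributes (same vendor, intervals, duration) that a full definition of ‘S01’ carries.

```python
# Basic scenarios: each matches a single transaction by its code.
BASIC_SCENARIOS = {
    "Change Vendor Bank": {"FK02", "FI01", "FI02"},
    "Pay Vendor": {"F-40", "F-44", "F-48", "F-53"},
}

# Composite scenario S01 as an ordered list of basic scenarios.
S01 = ["Change Vendor Bank", "Pay Vendor", "Change Vendor Bank"]


def basic_matches(tcode):
    """Return the names of all basic scenarios that a transaction code matches."""
    return [name for name, codes in BASIC_SCENARIOS.items() if tcode in codes]


def has_s01_shape(tcodes):
    """Check whether a list of transaction codes has the component order of S01."""
    return len(tcodes) == len(S01) and all(
        name in basic_matches(code) for name, code in zip(S01, tcodes)
    )


shape_ok = has_s01_shape(["FK02", "F-53", "FI02"])
shape_bad = has_s01_shape(["F-53", "FK02", "FK02"])
```

A composite scenario may itself be used as a component of another scenario, which is why the components are referred to by name rather than by transaction code.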

In addition to the scenario of Figure 1, in our previous work [6] we defined five other fraud scenarios. Since building the scenarios is described in detail in our previous work, here we explain only how we add collusion to each scenario definition. We add collusion to all six scenarios and detect these modified ‘colluding’ scenarios with our system. Due to space constraints, we explain only four scenarios in this paper; the following section defines three further scenarios and their structure in addition to the one above (‘S01’). The modifications are in each case essentially to detect communication between the officers (the ‘users’) who have committed the various transactions forming the scenario.

3.2 Defining Fraud Scenarios with Collusion

We utilize three different types of source logs: ERP system logs, which include users’ day to day activity within the ERP system, phone logs and email logs. The phone and email logs are for the purpose of identifying communication between users. There are of course other forms of communication and interaction which could be relevant and which could be analyzed for the purpose of identifying possible collusion. For instance, door logs could be analyzed, as could office layouts (to identify people working in the same room) and personal relationships (by analyzing employee master records). We use logs of phone and email communication because these are readily available, and we use them to demonstrate the ability of our fraud detection framework to detect collusion in addition to detecting violation of SoD and masquerade. We note that we do not analyze or expect to analyze communication content, but only the fact that the communication has taken place, and this is for both legal and practical reasons. Content analysis is typically prohibited by privacy legislation and would in any case add considerable complexity, although it could significantly improve the precision of detecting inappropriate or collusive communication. ‘Phone or Email’ is a basic scenario, shown in Figure 2, which identifies phone call or sent email records in the corresponding logs and thereby identifies the two parties involved in the communication.

Figure 2. Scenario ‘S01 col’: Redirected Payment with Collusion.

Scenario ‘S01 col’: Redirected Payment with Collusion (Figure 2).

Scenario ‘S02 col’: False Invoice Payment with Collusion (invoice creation, approval and payment; any two of these activities performed on the same invoice with user collusion). The intention of this fraud is to make a false payment for one’s own benefit. We build the ‘Create Invoice’ scenario by defining the presence of either of two transaction codes, ‘FB60’ or ‘MIRO’, in the transaction log, which are indicative of the ‘create invoice’ activity. The second activity, ‘approve invoice’, is indicated by transaction code ‘MRBR’. For the third activity, ‘make payment’, we have already defined the scenario ‘Pay Vendor’ as part of scenario ‘S01’. The fraud scenario ‘S02 col’ is the sequence of scenarios described in Figure 3 on the same purchase order with a ‘Phone or Email’ scenario between them indicating the communication between those users.

Figure 3. Scenario ‘S02 col’: False Invoice Payment with Collusion.

Scenario ‘S03 col’: Misappropriation with Collusion (purchase order and purchase approval of the same purchase order with user collusion). This fraud is the result of misappropriation of company capital. When one user has permission to make purchase orders and the other user has the privileges to approve them, fraud may exist if they collude with each other. The fraud scenario ‘S03 col’ is the sequence of two scenarios, ‘Create PO’ and ‘PO Approval’, on the same purchase order with a ‘Phone or Email’ scenario between them indicating communication between those users. The necessary conditions of user collusion are similar to those shown in Figure 3.

Scenario ‘S04 col’: Non-purchase Payment with Collusion (purchase order or goods received; any one of these activities and creating an invoice on the same purchase order with user collusion). The intention of this fraud is to generate purchase records in the system and make a payment without purchasing. When a user makes either a purchase order or a goods received transaction and also creates an invoice on the same purchase entity, this may indicate fraud. In the case of collusion, when any two of those events are performed by different users and a communication exists between them, we consider this a possible fraudulent behavior to check. We build the scenario ‘Create PO or Goods Receipt’, which is the existence of either of the two activities ‘Create PO’ and ‘Goods Receipt’. The scenario ‘S04 col’ is the sequence of ‘Create PO or Goods Receipt’ and ‘Create Invoice’ on the same purchase order, with a ‘Phone or Email’ scenario between them and the necessary conditions for user collusion.
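As a rough illustration of how a collusive scenario of this kind constrains the log, the following sketch searches already classified events for the ‘S03 col’ pattern: a ‘Create PO’ by one user, a ‘Phone or Email’ record linking that user with a second user, and then a ‘PO Approval’ of the same purchase order by the second user. It is a minimal in-memory sketch and not the framework's matching process; the event layout, the identifiers, the date, and the choice to accept communication in either direction are assumptions.

```python
from datetime import datetime, timedelta


def find_s03_col(events, max_interval):
    """Find Create PO -> communication -> PO Approval chains between two users."""
    events = sorted(events, key=lambda e: e["time"])
    matches = []
    for i, create in enumerate(events):
        if create["event"] != "Create PO":
            continue
        for j in range(i + 1, len(events)):
            comm = events[j]
            if comm["event"] != "Phone or Email":
                continue
            parties = {comm["user"], comm["recipient"]}
            # The creator must be one of two distinct communicating parties.
            if create["user"] not in parties or len(parties) != 2:
                continue
            if comm["time"] - create["time"] > max_interval:
                continue
            (other,) = parties - {create["user"]}
            for approve in events[j + 1:]:
                if (approve["event"] == "PO Approval"
                        and approve["user"] == other
                        and approve.get("po") == create.get("po")
                        and approve["time"] - comm["time"] <= max_interval):
                    matches.append((create, comm, approve))
    return matches


day = datetime(2010, 3, 1)  # hypothetical date
events = [
    {"time": day.replace(hour=9), "event": "Create PO",
     "user": "U001", "po": "P00000007"},
    {"time": day.replace(hour=9, minute=30), "event": "Phone or Email",
     "user": "U001", "recipient": "U002"},
    {"time": day.replace(hour=10, minute=15), "event": "PO Approval",
     "user": "U002", "po": "P00000007"},
    # Approval by a user with no recorded communication: not flagged.
    {"time": day.replace(hour=10, minute=20), "event": "PO Approval",
     "user": "U003", "po": "P00000007"},
]

loose = find_s03_col(events, max_interval=timedelta(days=1))
tight = find_s03_col(events, max_interval=timedelta(minutes=20))
```

With the one-day interval the chain U001, U002 is flagged; with the 20-minute interval nothing is flagged, which shows how tightening an interval attribute trades false positives for false negatives.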


Figure 4. Workflow and parts of the detection process.

4. Detecting Fraud Scenarios involving Collusion

This section describes the details of the fraud scenario detection process. In summary, the process reads the definitions of fraud scenarios from the SDF and searches for matches in the aggregated log data. Since our previous work, and in order to provide portability, we have redesigned our framework so as to make the detection process independent of the log sources and of the number of sources. A data extraction and aggregation module has therefore been developed outside of the main detection process. This module extracts and aggregates log data from different sources of interest. In addition, to allow us to take advantage of new logs being added to this extraction and aggregation process (that is, to provide extensibility), we have redesigned the system to include the concept of a data profile, which allows us to add new semantic components to the vocabulary of the matching process. Figure 4 shows the workflow and the different parts of the detection process.

4.1 Detection Process

The detection process generates an SQL query from the scenario definition and runs the query against the aggregated data file. We use the MySQL database for this purpose, chiefly because it is freely available. However, the SQL generated by the process follows the ANSI standard, so that it can be supported by any relational database management system.
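The paper does not reproduce the generated SQL, so the following is only a sketch of what query generation for an ordered, non-collusive scenario such as ‘S01’ could look like. The table name `txn_log`, its columns, the function `build_sequence_query` and the sample rows are hypothetical, timestamps are stored as integer seconds for simplicity, and SQLite from the Python standard library stands in for MySQL.

```python
import sqlite3


def build_sequence_query(components, max_interval, duration, table="txn_log"):
    """Build a self-join query for an ordered sequence of basic scenarios.

    components: list of lists of transaction codes, one list per step.
    max_interval, duration: limits in seconds.
    """
    aliases = [f"t{i + 1}" for i in range(len(components))]
    select = ", ".join(f"{a}.id AS id{i + 1}" for i, a in enumerate(aliases))
    from_clause = ", ".join(f"{table} {a}" for a in aliases)
    where = []
    for alias, codes in zip(aliases, components):
        in_list = ", ".join("'" + c.replace("'", "''") + "'" for c in codes)
        where.append(f"{alias}.tcode IN ({in_list})")
    for prev, cur in zip(aliases, aliases[1:]):
        where.append(f"{cur}.usr = {prev}.usr")        # same user
        where.append(f"{cur}.vendor = {prev}.vendor")  # same vendor
        where.append(f"{cur}.ts > {prev}.ts")          # ordered
        where.append(f"{cur}.ts - {prev}.ts <= {int(max_interval)}")
    where.append(f"{aliases[-1]}.ts - {aliases[0]}.ts <= {int(duration)}")
    return f"SELECT {select} FROM {from_clause} WHERE " + " AND ".join(where)


CHANGE = ["FK02", "FI01", "FI02"]
PAY = ["F-40", "F-44", "F-48", "F-53"]
DAY = 86400

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE txn_log "
    "(id INTEGER PRIMARY KEY, ts INTEGER, tcode TEXT, usr TEXT, vendor TEXT)"
)
conn.executemany("INSERT INTO txn_log VALUES (?, ?, ?, ?, ?)", [
    (1, 0, "FK02", "U010", "V00020"),
    (2, 4464, "F-53", "U010", "V00020"),
    (3, 9895, "FK02", "U010", "V00020"),
    (4, 5000, "F-53", "U011", "V00020"),  # payment by a different user
])

sql = build_sequence_query([CHANGE, PAY, CHANGE], 2 * DAY, 3 * DAY)
result = conn.execute(sql).fetchall()
```

Each component becomes one alias of the same table, and the scenario attributes become join conditions, so changing an interval in the scenario definition only changes constants in the generated query.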

The Data Extraction and Aggregation Module is external to the main prototype software because it may need to change according to different systems and data. It implements the data extraction procedure of different ERP systems¹ and generates a text file (Aggregated Data File) with the user activity records needed to search for the presence of any fraud scenarios. The extraction procedure in the module will be different for different ERP systems. The number of fields and their arrangement could be different. The data extraction module may also include an “Anonymizer tool” to anonymize sensitive information with a consistent one-way hash function to maintain privacy requirements. Note that the data extraction and aggregation module comprises a set of appropriate parsers which need to know the exact data structure of the corresponding ERP and other system logs. This module is also responsible for combining different sources of log data. Since we are only considering log sources inside an organization, we can assume the existence of a simple mapping between user identities or accounts to identify actual individuals. We do not consider any external or third party data sources at this stage but will consider them as a future extension.
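The requirement that the anonymizer be both one-way and consistent can be sketched as follows. The paper does not name the hash function; a keyed HMAC-SHA256 is used here purely as an assumption, and the key, the field names and the truncation length are hypothetical.

```python
import hashlib
import hmac


def anonymize(value, key, length=12):
    """Map a sensitive value to a stable pseudonym using a keyed one-way hash."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def anonymize_record(record, sensitive_fields, key):
    """Return a copy of a log record with the sensitive fields pseudonymized."""
    return {
        name: anonymize(value, key) if name in sensitive_fields else value
        for name, value in record.items()
    }


key = b"hypothetical-audit-key"
erp_row = {"tcode": "FK02", "user": "U007", "vendor": "V00006"}
phone_row = {"event": "PhoneTo", "user": "U007", "recipient": "U002"}

erp_anon = anonymize_record(erp_row, {"user", "recipient"}, key)
phone_anon = anonymize_record(phone_row, {"user", "recipient"}, key)
```

Because the same input always yields the same pseudonym, a user who appears in both the ERP log and the phone log can still be joined across the two sources after anonymization, which is what the collusion scenarios rely on.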

Data Profile holds the description of the Data File, e.g., the number of fields, the types of the fields, a user defined name for any specific field, the column and line separators, and which fields to consider and which to ignore. The inclusion of the data profile is one of the major additions to the detection process since our previous work [6], and we have also developed a Data Profile Module which helps the user to generate the Data Profile. When describing the Aggregated Data File, the user needs to define the format and types of the individual fields. The system has a number of pre-defined data types such as date, time and event. Users can add to or modify this list. In the data profile, users can optionally define the information extraction process for each field from a specific position. For example, VendorID is defined as the 4-digit string from position 2 of the fourth field. Currently, we consider only the start position and length. In future, we may consider using regular expressions.
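The positional extraction rule in the VendorID example can be sketched as below. The profile layout, the separator, the sample line and the function name are hypothetical, and positions are assumed to be 1-based, which the paper does not state.

```python
def extract_fields(line, profile):
    """Extract named values from one line of the aggregated data file."""
    fields = line.rstrip("\n").split(profile["column_separator"])
    record = {}
    for spec in profile["fields"]:
        raw = fields[spec["index"] - 1]   # field numbers assumed 1-based
        if "start" in spec:
            begin = spec["start"] - 1     # positions assumed 1-based
            raw = raw[begin:begin + spec["length"]]
        record[spec["name"]] = raw
    return record


# Hypothetical profile: the first field is ignored by simply not listing it.
profile = {
    "column_separator": ";",
    "fields": [
        {"name": "Time", "index": 2},
        {"name": "Event", "index": 3},
        # 4-character string from position 2 of the fourth field
        {"name": "VendorID", "index": 4, "start": 2, "length": 4},
    ],
}

line = "2010-03-01;10:17:53;FK02;K0020-AU\n"  # made-up log line
record = extract_fields(line, profile)
```

Keeping such rules in the profile rather than in code is what lets a new log layout be supported without changing the detection process itself.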

Data upload module uploads the data from the aggregated data file to a database table. At the time of the upload the user needs to define the data profile. The data upload module creates a transaction table according to the data profile and uploads the data to that table according to the description in the data profile.

SDF holds the description of known fraud scenarios with the intention of searching for the occurrence of these in the aggregated data file. Creation of new scenarios or editing of existing ones can be done using the ‘Scenarios Module’. We build scenarios according to the data profile and the list of data types. More detail on scenario building and specification is provided in [6].

¹ While our prototype has currently been tested with SAP R/3 transaction logs, we believe it can be applicable to other ERP systems’ transaction logs as well.


4.2 Implementation

We use MySQL for storing log data and running scenario matching queries. The prototype and other helping modules are all developed in Java with the NetBeans IDE. The data extraction module extracts the required log data and stores it in a text file. With the data upload module in the prototype, the data is uploaded to MySQL as a table, with the description of the data format stored in the data profile.

5. Experimental Validation

Below we describe the process we used to generate datasets for the experiments and detail the results of the scenario testing.

5.1 Data Preparation

The fraud detection process detects fraudulent activities by matching the scenarios against transactions or events recorded in the transaction table of the database. This section describes the sources of data and gives examples of how the data can be extracted and used in fraud detection. The collusion scenarios we described earlier utilize phone and email logs. In this case, the data extraction module extracts those logs as well as the user transaction logs from the ERP system.

We are using an SAP R/3 ERP version 4.0 system as the source of the transaction log data and we extract the application log data from these system logs. For the detection process to work there should be at least three data columns in each transaction record: timestamp, user and event. Other fields of interest specific to the scenarios may also be needed, e.g., for the ‘change vendor bank details’ activity we extract the vendor identification number or VendorID. In our previous work [6], as an example, we describe the data extraction process for the transaction activities in the first scenario from different data stores in the SAP system.

A major problem in fraud detection research is the dearth of publicly available real data to perform experiments on. In our case, too, we could not find any organization willing to provide real data for validating our framework. We thus needed to generate synthetic data. Lundin et al. describe a method of generating synthetic data for fraud detection purposes [19, 20] based on authentic background data merged with data representing fraudulent activity. However, generating synthetic data in that way is complex and requires authentic data, which we do not have. We therefore chose a different approach to generating synthetic data. In this approach the log data follows the exact structure of the data extracted from the SAP system, described in more detail in [6]. For the phone and email logs we followed the structure of telephone exchange call logs and SMTP email server logs. As an example, the maillog from the OpenBSD SMTP server typically contains ‘date and time’, ‘to address’, ‘from address’, message size, message id and some other system data. For our purpose, all we need is the ‘to address’, ‘from address’ and ‘date and time’. This is the very basic information every email log has. We use only this basic information, which is produced by our synthetic data generator.

For implementation and testing purposes we generated 100,000 records using a list of 100 users and 100 terminals. An assumption is made that a user does not always use a specific terminal. This gives us the option to detect masquerade fraud scenarios in cases where users perform activities from multiple terminals or different users perform activities from the same terminal. Before running the synthetic data generator, the process was provided with a list of vendor identification numbers, invoice numbers, purchase order numbers and customer identification numbers needed for the scenarios described earlier. The data generator generates log data for ERP system transactions along with the email log and the phone call log.

5.2 Scenario Testing

We have used the above synthetic data and the previously described scenarios to evaluate our system. We have in each case examined the output result-set of the SQL queries and verified output correctness. We have in addition, as a secondary check in each case, added and deleted records, and then re-run the scenario detection. This enabled us to verify that adding relevant transactions has the expected effect of producing a match where none existed previously, and that deleting records has the effect of producing no match where one did exist previously. We also varied the interval or timeout attributes of scenarios and examined the differences in the number of matches returned. For example, in the case of scenario S01, when the maximum interval between components was four days and the overall duration of the scenario was one week, it returned four matches. However, when we changed the intervals to more restricted values, such as a maximum interval of two days and an overall duration of three days, it returned two matches. This demonstrates successful control of false positives and false negatives, and successful detection of scenarios, on the synthetic data.

We ran all user activities (the basic scenarios) used in the described fraud scenarios. Table 1 contains the number of activities or scenarios found in the randomly generated data. The summation of the activities is greater than 100,000 because some transaction codes are present in multiple activities. For example, the ‘Create Vendor’ activity includes all the transaction codes for creating a new vendor and changing any details of the vendor records, thus the transaction codes for ‘Change Vendor Bank’ are all included in the ‘Create Vendor’ activity.

Scenario Name              Matches    Scenario Name         Matches
Change Vendor Bank            2730    PO Approval              5140
Pay Vendor                   10820    Goods Receipt           15570
Create Invoice                5280    Create Vendor           21510
Invoice Approval (MRBR)       2430    Create Customer          4830
Create PO                    13190    Credit to customer       5260
Phone or Mail                15970

Table 1. Number of activities found in the randomly generated data.

Time       Trans.   User   Term.   Vendor   Invoice   P. Order
10:17:53   FK02     U010   T04     V00020
11:32:17   F-53     U010   T05     V00020   I00024    P00000010
13:02:48   FK02     U010   T10     V00020
01:03:34   FK02     U009   T02     V00001
02:28:50   F-53     U009   T06     V00001   I00017    P00000004
02:58:40   FK02     U009   T03     V00001

Table 2. The result of ‘S01’ on the generated data.

The scenario ‘Change Vendor Bank’ returned 2730 matching records and the scenario ‘Pay Vendor’ returned 10820 matching records. We ran the scenario ‘S01’ (without user collusion) to determine whether any sequence of ‘Change Vendor Bank’, ‘Pay Vendor’, ‘Change Vendor Bank’ happened for the same vendor when the maximum interval between any two activities was 2 days and the overall duration was within 3 days. An extra condition was that the users should match. This process returned two matches and took 1.2 seconds to run on a Pentium 4 machine with 2GB RAM running Microsoft Windows XP Professional. The two matches are shown in Table 2. The process returned one record for each match, but for clarity we show each match displayed across three rows. We ran the same scenario with user collusion, ‘S01 col’, with a maximum interval of 1 day and an overall duration of 3 days. We found four matches in 92 seconds, one of which is shown in Table 3, demonstrating that considering collusion uncovers more red flag situations.


Time       Trans.    User   Term.   Recip.   Vendor   Invoice   P. Order
09:57:55   FK02      U007   T04              V00006
10:59:44   PhoneTo   U007   T09     U002
11:39:39   F-48      U002   T06              V00006   I00039    P00000013
12:15:07   MailTo    U002   T10     U007
13:00:22   FK02      U007   T03              V00006

Table 3. The result of ‘S01 col’ on the generated data.

Scenario Name                      Matches (without collusion)   Time (without collusion)   Matches (with collusion)   Time (with collusion)
S01: Redirected Payment                                      2                       1.2s                          4                    92s
S02: False Invoice Payment                                  15                       1.1s                          5                    18s
S03: Misappropriation                                        5                       0.8s                          6                    17s
S04: Non-purchase Payment                                   27                       2.3s                          9                    5.6s
S05: Anonymous Vendor Payment                               42                       1.3s                         21                    3.6s
S06: Anonymous Customer Payment                              0                       0.3s                          2                    2.7s

Table 4. The results of all scenarios.

We tested the other collusive scenarios as well. The matching results found on the synthetic data for those scenarios are shown in Table 4. The time taken by the query for collusive scenarios is much greater than the time taken for the non-collusive version. There are two reasons for this: first, collusive scenarios involve more data sources than non-collusive scenarios; second, the queries for collusive scenarios use cross field joins in the database rather than the same field joins used in non-collusive scenarios. For example, for non-collusive scenarios the query matches the user name across all activities, whereas in collusive scenarios the match conditions span different fields: one activity’s user name is matched against the field of another activity that holds the name of the user to whom the mail was sent.
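The difference between same field and cross field joins can be made concrete with a small sketch. The rows are those of Table 3 (times converted to seconds since midnight); the table name `txn_log`, the column names and both queries are hypothetical, the collusive query is simplified to three steps rather than the full ‘S01 col’ structure of Figure 2, and SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE txn_log "
    "(ts INTEGER, event TEXT, usr TEXT, recip TEXT, vendor TEXT)"
)
conn.executemany("INSERT INTO txn_log VALUES (?, ?, ?, ?, ?)", [
    (35875, "FK02", "U007", None, "V00006"),
    (39584, "PhoneTo", "U007", "U002", None),
    (41979, "F-48", "U002", None, "V00006"),
    (44107, "MailTo", "U002", "U007", None),
    (46822, "FK02", "U007", None, "V00006"),
])

# Non-collusive: the same field (usr) is compared across activities.
SAME_FIELD = """
SELECT chg.usr, pay.usr
FROM txn_log chg, txn_log pay
WHERE chg.event = 'FK02' AND pay.event = 'F-48'
  AND pay.vendor = chg.vendor
  AND pay.ts > chg.ts
  AND pay.usr = chg.usr
"""

# Collusive: the recipient field of the communication record is compared
# with the user field of the payment record (a cross field join).
CROSS_FIELD = """
SELECT chg.usr, com.event, pay.usr
FROM txn_log chg, txn_log com, txn_log pay
WHERE chg.event = 'FK02' AND pay.event = 'F-48'
  AND com.event IN ('PhoneTo', 'MailTo')
  AND pay.vendor = chg.vendor
  AND com.ts > chg.ts AND pay.ts > com.ts
  AND com.usr = chg.usr
  AND com.recip = pay.usr
  AND pay.usr <> chg.usr
"""

same_field_rows = conn.execute(SAME_FIELD).fetchall()
cross_field_rows = conn.execute(CROSS_FIELD).fetchall()
```

The collusive query joins one more copy of the table and relates `recip` to `usr`, which is consistent with the longer run times reported in Table 4, although the sketch itself makes no claim about performance.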

We note that after each testing phase, in order to verify our results, we manually checked the results against the transaction log data and confirmed them to be correct. We also verified the correctness of the automatically generated SQL queries which led to those results.

6. Conclusions

Fraudulent activity involving collusion is a significant problem but is often overlooked in fraud detection research. In our previous work [6] we designed a fraud definition language and developed a fraud detection framework for detection of financial fraud in ERP systems. Considering the significance of fraud involving collusion, we have in this paper extended our architecture to define and detect fraud scenarios involving collusion in ERP systems. Our architecture and its implementation have been tested using common fraud scenarios involving collusion. The process aggregates different sources of data so that communication records can be used to detect collusion. The framework is independent of the structure of the log data, and is portable among various ERP systems. Using the developed prototype software we have successfully tested the detection of all scenarios described in this paper using synthetically generated log data. The framework allows us to configure scenarios on the fly, thus drilling down to more specific results and minimizing the number of false positives. We intend in future work to test our fraud scenario detection prototype on real SAP system data. The current framework is based on the idea of analyzing data in a post-hoc manner; in future work we are considering making it applicable to real-time analysis.

Acknowledgment: We gratefully acknowledge the support of SAP Research. The research is supported in part by the Australian Research Council.

References

[1] ACFE, Report to the Nation on Occupational Fraud and Abuse, http://www.acfe.com/rttn/2010-rttn.asp, 2010.

[2] J.T. Wells, Corporate Fraud Handbook: Prevention and Detection, Wiley, 2004.

[3] ACFE, Report to the Nation on Occupational Fraud and Abuse 2006, http://www.acfe.com/rttn/2006-rttn.asp, 2006.

[4] D. Coderre, Computer Aided Fraud Prevention and Detection: A Step by Step Guide, Wiley, 2009.

[5] KPMG, Fraud Survey 2003, KPMG Forensic, http://www.kpmg.com, 2003.

[6] A. K. Islam, M. Corney, G. Mohay, A. Clark, S. Bracher, T. Raub, and U. Flegel, Fraud Detection in ERP Systems Using Scenario Matching, 25th IFIP International Information Security Conference, SEC 2010, Security and Privacy - Silver Linings in the Cloud, Springer, pp. 112-123, 2010.

[7] C. Phua, V. Lee, K. Smith, R. Gayler, A Comprehensive Survey of Data Mining-based Fraud Detection Research, Artificial Intelligence Review, 2005.


[8] J. Lin, M. Hwang, J. Becker, A Fuzzy Neural Network for Assessing the Risk of Fraudulent Financial Reporting, Managerial Auditing Journal 18(8), pp. 657-665, 2003.

[9] T. Bell, J. Carcello, A Decision Aid for Assessing the Likelihood of Fraudulent Financial Reporting, Auditing: A Journal of Practice and Theory 10(1), pp. 271-309, 2000.

[10] K. Fanning, K. Cogger, R. Srivastava, Detection of Management Fraud: A Neural Network Approach, Journal of Intelligent Systems in Accounting, Finance and Management 4, pp. 113-126, 1995.

[11] S. Summers, J. Sweeney, Fraudulently Misstated Financial Statements and Insider Trading: An Empirical Analysis, The Accounting Review, January 1998, pp. 131-146, 1998.

[12] J. Kim, A. Ong, R. Overill, Design of an Artificial Immune System as a Novel Anomaly Detector for Combating Financial Fraud in Retail Sector, Congress on Evolutionary Computation, 2003.

[13] R. Khan, M. Corney, A. Clark, G. Mohay, A Role Mining Inspired Approach to Representing User Behaviour in ERP Systems, Asia Pacific Industrial Engineering and Management Society (APIEMS), 2010.

[14] G. Mohay, A. Anderson, B. Collie, O. De Vel, R. McKemmish, Computer and Intrusion Forensics, Artech House, 2003.

[15] P. Porras, R. Kemmerer, Penetration State Transition Analysis: A Rule-Based Intrusion Detection Approach, 8th Annual Computer Security Applications Conference, pp. 220-229, 1992.

[16] C. Michel, L. Me, ADeLe: an Attack Description Language for Knowledge-Based Intrusion Detection, 16th International Conference on Information Security, pp. 353-368, 2001.

[17] J. Pouzol, M. Ducase, From Declarative Signature to Misuse IDS, Recent Advances in Intrusion Detection (RAID), LNCS vol. 2212, pp. 1-21, 2001.

[18] M. Meier, A Model for the Semantics of Attack Signatures in Misuse Detection Systems, 7th Information Security Conference, LNCS vol. 3225, pp. 158-169, 2004.

[19] E. Lundin, H. Kvarnstrom, and E. Jonsson, Generation of high quality test data for evaluation of fraud detection systems, 6th Nordic Workshop on Secure IT systems (NordSec2001), 2001.

[20] E. Lundin, H. Kvarnstrom, and E. Jonsson, Synthesizing Test Data for Fraud Detection Systems, ICICS 2002, Singapore, 2002.
