University of Science Ho Chi Minh City-Hoang_Thai_Son - HMM

8/21/2019 University of Science Ho Chi Minh City-Hoang_Thai_Son - HMM

1/27

University Of Science Ho Chi Minh City.

APPLICATION OF HIDDEN MARKOV MODEL IN

CREDIT CARD FRAUD DETECTION

Technical Report

Hoang Thai Son

10/10/2014

The most accepted payment mode is credit card for both online and offline in todaysworld, it provides cashless shopping at every shop in all countries. It will be the most

convenient way to do online shopping, paying bills etc. Hence, risks of fraud transaction

using credit card has also been increasing. In the existing credit card fraud detectionbusiness processing system, fraudulent transaction will be detected after transaction is

done. It is difficult to find out fraudulent and regarding loses will be barred by issuing

authorities. Hidden Markov Model is the statistical tools for engineer and scientists tosolve various problems.


2/27

APPLICATION OF HIDDEN MARKOV MODEL IN CREDIT CARD FRAUD DETECTION.

10/10/2014

Page 1

Abstract

Credit card frauds are increasing day by day regardless of the various techniques

developed for its detection. Fraudsters are so expert that they engender new ways for

committing fraudulent transactions each day which demands constant innovation for its

detection techniques as well. Many techniques based on Artificial Intelligence, Data

mining, Fuzzy logic, Machine learning, Sequence Alignment, decision tree, neural

network, logistic regression, nave Bayesian, Bayesian network, metalearning, Genetic

Programming etc., has evolved in detecting various credit card fraudulent transactions. A

steady indulgent on all these approaches will positively lead to an efficient credit card

fraud detection system. This paper presents a survey of various techniques used in credit

card fraud detection mechanisms and Hidden Markov Model (HMM) in detail. HMMcategorizes card holders profile as low, medium and high spending based on their

spending behavior in terms of amount. A set of probabilities for amount of transaction is

being assigned to each cardholder. Amount of each incoming transaction is then matched

with card owners category, if it justifies a predefined threshold value then the transaction

is decided to be legitimate else declared as fraudulent.

1.Introduction

In todays electronic society, e-commerce has become an essential sales channel

for global business. Due to rapid expansion of e-commerce, making use of credit cards

for purchases has dramatically amplified. Unfortunately, fraudulent use of credit cards

has also become an attractive source of revenue for criminals.

Occurrence of credit card fraud is increasing dramatically due to the security

weaknesses in contemporary credit card processing systems resulting in loss of billions of

dollars every year Credit cards can be used for doing shopping either offline or online. In

offline transaction the card must be physically present and is inserted in the payment

machine in the merchants place for making the payment.

But in online transaction only some of the card details like secure code, expiration

date and card number etc. is needed to do the transaction as it is mostly done via phone or

internet.

Credit card Fraud can be performed in two ways. In first case the card is stolen

and misused and in the other case only card details are known which is then entered while


3/27


10/10/2014

Page 2

doing online shopping for buying the expensive commodities. In this case the owner is

not aware that his card details are being used unless notified by some means.

. Some of the challenges that are faced by most detection techniques include [2]:

Skewed distribution of legitimate and fraudulent data in the database that

challenges the detection approaches. Genuine transactions are much higher as

compared to fraudulent.

Count of transaction which is proliferating swiftly. Mining of such immense

amount of data calls efficient techniques.

Availability of labeled data for the purpose of training, as genuine or cheat is not

readily available.

Tracking users behavior is tough as it changes quite often for all type of users

(good users, business and fraudsters). Dealing with old as well as new intellectual

is a challenging task.

2. Application Of Hidden Markov Model As Credit Card Fraud

Detection Method

2.1 Hidden Markov Model

A Hidden Markov Model is a finite set of states; each state is linked with aprobability distribution. Transitions among these states are governed by a set of

probabilities called transition probabilities. In a particular state a possible outcome or

observation can be generated which is associated symbol of observation of probability

distribution. It is only the outcome, not the state that is visible to an external observer and

therefore states are hidden to the outside; hence the name Hidden Markov Model.

Hence, Hidden Markov Model is a perfect solution for addressing detection of

fraud transaction through credit card. One more important benefit of the HMM-based

approach is an extreme decrease in the number of False Positives transactions recognized

as malicious by a fraud detection system even though they are really genuine.

In this prediction process, HMM consider mainly three price value ranges such as :

1) Low (l),

2) Medium (m) and,


4/27


10/10/2014

Page 3

3) High (h).

First, it will be required to find out transaction amount belongs to a particular

category either it will be in low, medium, or high ranges.

2.2 Credit Card Fraud Detection Using HMM

The implementation techniques of Hidden Markov Model in order to detect fraud

transaction through credit cards, it create clusters of training set and identify the spending

profile of cardholder. The number of items purchased, types of items that are bought in a

particular transaction are not known to the Fraud Detection system, but it only

concentrates on the amount of item purchased and use for further processing. It stores

data of different amount of transactions in form of clusters depending on transaction

amount which will be either in low, medium or high value ranges.

It tries to find out any variance in the transaction based on the spending behavioral

profile of the cardholder, shipping address, and billing address and so on. The

probabilities of initial set have chosen based on the spending behavioral profile of card

holder and construct a sequence for further processing. If the fraud detection system

makes sure that the transaction to be of fraudulent, it raises an alarm, and the issuing bank

declines the transaction.

In this prediction process it considers mainly three price value ranges such as 1)

low (l) 2) Medium (m) and 3) High (h)[23]. So set of this model prediction symbols is V

{ l, m, h}, so V f as l (low), m (medium), h (high) which makes M 3. E.g.

If card holder perform a transaction as $ 250 and card holders profile groups as l

(low) = (0, $ 100], m (medium) = ($ 200, $ 500], and h (high) = ($ 500, up to credit card

limit], then transaction which card holder want to do will come in medium profile group.

So the corresponding profile group or symbol is M and V (2) will be used.

In initial stage, model does not have data of last 10 transactions, in that case,

model will ask to the cardholder to feed basic information during transaction about the

cardholder such as mother name, place of birth, mailing address, email id etc. Due tofeeding of information, HMM model acquired relative data of transaction for further

verification of transaction on spending profile of cardholder

The implementation technique used in HMM is creating clusters of training sets so

as to identify spending profile of card holder. The type of items purchased works as states

for the model. The transition from one state to another is determined by probability


5/27


10/10/2014

Page 4

distribution. It requires minimum 10 previous transactions, on the basis of which the fore

coming transaction is chosen as fraud or genuine.

The model goes through two stages. In the first stage training of the system is

done. Second stage works for the detection of the fraud, based on the expected range ofamount the transaction. The expected amount and the actual amount for the next

transaction are compared on the basis of probability distribution during training phase. If

the deviation is above a threshold value then it is treated as fraud else legal. In case of

fraud alarm is generated and transaction is terminated or else it is routinely accomplished.

The figure 1 below illustrates about the two phases of the detection system used by

HMM. In the training phase clusters are created and based on the initial set of

transactions the spending profile of cardholder is identified. This directs for expected

transaction amount for each cardholder and the system is trained accordingly.

Then in the detection phase the system looks for the deviation in expected and

actual outcome and fraud is recognized

Figure 1: process flow of credit card fraud detection system


6/27


10/10/2014

Page 5

5.Conclusions

In this paper, it has been discussed that how Hidden Markov Model will facilitate

to stop fraudulent online transaction through credit card. The Fraud Detection System is

also scalable for handling vast volumes of transactions processing. The HMM basedcredit card fraud detection system is not taking long time and having complex process to

perform fraud check like the existing system and it gives better and fast result than

existing system. The Hidden Markov Model makes the processing of detection very easy

and tries to remove the complexity.

At the initial state HMM checks the upcoming transaction is fraudulent or not and

it allow to accept the next transaction or not based on the probability result. The different

ranges of transaction amount like low group, medium group, and high group as the

observation symbols were considered. The types of item have been considered to bestates of the Hidden Markov Model. It is recommended that a technique for finding the

spending behavioral habit of cardholders, also the application of this knowledge in

deciding the value of observation symbols and initial estimation of the model parameters.

6.Reference

[1] Abhinav Srivastava, Amlan Kundu, Shamik Sural. Credit Card Fraud DetectionUsing Hidden Markov Model. IEEE transactions on dependable and secure computing,

vol. 5, no. 1, january-march 2008.

[2] V. Bhusari1, S. Patil. Application of Hidden Markov Model in Credit Card Fraud

Detection. International Journal of Distributed and Parallel Systems (IJDPS) Vol.2,

No.6, November 2011.


7/27

International Journal of Distributed and Parallel Systems (IJDPS) Vol.2, No.6, November 2011

DOI : 10.5121/ijdps.2011.2618 203

Application of Hidden Markov Model in CreditCard Fraud Detection

V. Bhusari1, S. Patil1

1Department of Computer Technology, College of Engineering,Bharati Vidyapeeth, Pune, India, 400011

Email: [email protected]

ABSTRACT

In modern retail market environment, electronic commerce has rapidly gained a lot of attention and also

provides instantaneous transactions. In electronic commerce, credit card has become the most important

means of payment due to fast development in information technology around the world. As the usage of

credit card increases in the last decade, rate of fraudulent practices is also increasing every year.

Existing fraud detection system may not be so much capable to reduce fraud transaction rate.

Improvement in fraud detection practices has become essential to maintain existence of payment system.

In this paper, we show how Hidden Markov Model (HMM) is used to detect credit card fraud transaction

with low false alarm. An HMM based system is initially studied spending profile of the card holder and

followed by checking an incoming transaction against spending behavior of the card holder, if it is not

accepted by our proposed HMM with sufficient probability, then it would be a fraudulent transaction.

Keywords

Credit card, Hidden Markov Model, fraud transaction

1. Introduction

In day to day life, online transactions are increased to purchase goods and services. According

to Nielsen study conducted in 2007-2008, 28% of the worlds total population has been usinginternet [1]. 85% of these people has used internet to make online shopping and the rate ofmaking online purchasing has increased by 40% from 2005 to 2008. The most common methodof payment for online purchase is credit card. Around 60% of total transaction was completedby using credit card [2]. In developed countries and also in developing countries to someextent, credit card is most acceptable payment mode for online and offline transaction. Asusage of credit card increases worldwide, chances of attacker to steal credit card details andthen, make fraud transaction are also increasing. There are several ways to steal credit carddetails such as phishing websites, steal/lost credit cards, counterfeit cards, theft of card details,intercepted cards etc [3]. The total amount of credit card online fraud transaction made in theUnited States itself was reported to be $1.6 billion in 2005 and estimated to be $1.7 billion in2006 [4].

Credit card can be used to purchases goods and services using online and offline transactionmode. It can be divided into two types: 1) physical card and 2) virtual card. In the physical cardbased purchase, card holder has to produce the card at the merchant counter and merchant willsweep the card in the EMV (Europay, MasterCard and Visa) machine. Fraud transaction can behappened in this mode, only after the card has been stolen. It will be difficult to detect fraud inthis type of transaction. If the card holder does not realize loss of the card and does not report topolice or card issuing company, it can give financial loses to issuing authorities. In the secondmethod of purchasing i.e. online, these transactions generally happen on telephone or internet


8/27


204

and to make this kind of transaction, the user will need some important information about acredit card (such as credit card number, validity, CVV number, name of card holder). To makefraud transaction to purchase goods and services, fraudster will need to know all these details ofcard only then he/she will make transactions. Most of the time, the cardholder may or may notknow that when or where any person will be seen or stolen card information. To detect thiskind of fraud transaction, we have proposed a Hidden Markov Model which is studyingspending profile of the card holder. An HMM is to analyze the spending profile of each cardholder and to find out any discrepancy in the spending patterns. Fraud detection can be detectedon analyzing of previous transactions data which helps to form spending profile of the cardholder. Every card holder having unique pattern contains information about amount oftransactions, details of purchased items, merchant information, date of transaction etc. It will bethe most effective method to counter fraud transaction through internet. If any deviation will benoticed from available patterns of the card holder, then it will generate an alarm to the systemto stop the transaction. Various techniques for the detection of credit card fraud transactionhave been proposed in last few years, are briefly explained some of them in section 2.

2. Other credit card fraud detection techniques

Credit card fraud detection has received an important attention from researchers in the world.Several techniques have been developed to detect fraud transaction using credit card which arebased on neural network, genetic algorithms, data mining, clustering techniques, decision tree,Bayesian networks etc.

Ghosh and Reilly [5] have proposed a neural network method to detect credit card fraudtransaction. They have built a detection system, which is trained on a large sample of labeledcredit card account transactions. These sample contain example fraud cases due to lost cards,stolen cards, application fraud, stolen card details, counterfeit fraud etc. They tested on a dataset of all transactions of credit card account over a subsequent period of time.

Bayesian networks are also one technique to detect fraud, and have been used to detect fraud inthe credit card industry [6]. This techniques yield better results but having large cycle time todetect fraud. However, the time constraint is one main disadvantage of this technique,especially compared with neural networks.

Another algorithm that has been suggested by Bentley [7] is based on genetic programming. AGenetic algorithm is used to establish logic rules capable of classifying credit card transactionsinto suspicious and non-suspicious classes. Basically, this method follows the scoring processin which overdue payment was checking against last three month payment. If it is greater thanthat of last three month, then it will be considered as suspicious or else it will be nonsuspicious.

The idea of a similarity tree using decision tree logic has been reported in 1997 by Kokkinaki[8]. A decision tree is defined recursively; it contains nodes and edges that are labeled withattribute names and with values of attributes, respectively. All of these satisfy some conditionand get an intensity factor which is defined as the ratio of the number of transactions thatsatisfy applied conditions over the total number of legitimate transaction. The advantages of themethod are easy to understand and implement. However, disadvantages of the methods are thata long time period and check each transaction one by one.

The next is clustering technique proposed by Bolton and Hand in 2002 [9]. In this technique,clustering of two algorithms have used for behavioral fraud detection. The proposed systemwas identified those accounts that are behaving differently from others at the particular momentwhereas they were behaving the same previously. Those accounts are treating as suspiciousones and fraud analysis is to be done only on these accounts. If break point analysis can


9/27


205

identify suspicious behavior such as sudden transaction of high amount and high frequency,then card will be identified as fraudulent.

The data mining technique has been using from 1990. This technique was very time consumingand difficult process to detect fraud transaction. Since there are millions of transactionsprocessed everyday and their data are highly skewed. The transactions are more legitimate than

fraudulent. It requires highly efficient technique to scale down all data and also try to identifyfraud transaction not legitimate transactions. Black Box technique has proposed by Chan in1999 [10]. In this data mining technique, they have divided the whole data into subgroups andapply mining technique to generate classifiers. These classifiers treat as black box and appliedvariety of algorithms to these black boxes to detect fraud transactions.

3. Hidden markov model (HMM)

Hidden Markov Model is probably the simplest and easiest models which can be used to modelsequential data, i.e. data samples which are dependent from each other. An HMM is a doubleembedded random process with two different levels, one is hidden and other is open to all.

The Hidden Markov Model is a finite set of states, each of which is associated with aprobability distribution. Transitions among the states are governed by a set of probabilities

called transition probabilities.In a particular state an outcome or observationcan be generated,according to the associated probability distribution. It is only the outcome, not the state visibleto an external observer and therefore states are hidden to the outside; hence the name HiddenMarkov Model [11, 13].

HMM has been successfully applied to many applications such as speech recognition, robotics,bio-informatics, data mining etc [10-12].

In order to define an HMM completely, following elements are needed.

The number of states of the model,N. We denote the set of states S = {S1; S2; S3; . . SN},where i =1; 2; . . .; N, is a number of state and Si, is an individual state. The state at timeinstant t is denoted by qt.

The number of observation symbols in the alphabet,M. If the observations are continuousthen M is infinite. We denote the set of symbols V = {V1; V2; . . . VM} where Vi, is anindividual symbol for a finite value of M.

= {aij}

A set of state transition probabilities.

aij= P{qt+1= Sj| qt= Si}, 1i, j N,

where qtdenotes the current state,

Transition probabilities should satisfy the normal stochastic constraints,

aij0, 1 i, j N

And Naij= 1, 1 i N,j=1

The observation symbol probability matrix B,

B = {bj(k)}

A probability distribution in each of the states,

bj(k) = P{at= Vk| qt= Sj}, 1 j N, 1 k M


10/27


206

where, Vk denotes the kth observation symbol in the alphabet, and at the current parameter

vector.Following stochastic constraints must be satisfied.

bj(k) 0, 1 j N, 1 k M

And Mbj(k) = 1, 1 j Nk=1

If the observations are continuous then we will have to use a continuous probability densityfunction, instead of a set of discrete probabilities. In this case we specify the parameters of theprobability density function. Usually the probability density is approximated by a weighted sumofMGaussian distributionsN,

bj(at) = cjmN (jm, jm, at)

where, cjm = weighting coefficients,

jm = mean vectors,

jm = Covariance matrices

cjmshould satisfy the stochastic constrains,cjm0, 1 j N, 1 m M

And

M

cjm = 1, 1 j Nm=1

The initial state distribution,= {i},

where,i= P{qi= Si}, 1 i N

N

i= 1` i=1

Therefore we can use the compact notation= (, B, )

to denote an HMM with discrete probability distributions, while

= (, cjm, jm, jm, )

to denote one with continuous densities.

Hidden Markov Model assumes that current output (observation) is statisticallyindependent of the previous outputs (observations). We can formulate this assumptionmathematically, by considering a sequence of observations,

O = O1, O2, O3,..... OR,

Q = q1, q2, q3......qR,

where R, is a number of observation in the sequence and Q, is a one particular sequence.

Then according to the assumption for an HMM, probability that O is generated from thisstatesequence is given by

R

P{O|q1,q2,q3,...qR, } = P(Ot|qt, )t=1


11/27


207

a1-2

a2-1

a3-2

a1-3 a2-3

a2-3a1-1

a3-3

a2-2

a3-2

P(O|Q,) = bq1(O1).bq2(O2)......bqR(OR).

The probability of the state sequence Q is given as

P(Q|) = q1.aq1q2.aq2q3aqR-1qR

Thus, the probability of generation of the observation sequence O by the HMM with respect to

will be written as follows:P(O|) = P(O|Q, ).P(Q|).

All Q

Calculation of probability P(O|) is an intensive computing process. Hence, a forward-backward algorithm [13] is used to calculate probability P(O|).

Fig. 1: Transition of different states

4. Application of HMM in credit card fraud detection

In this section, we present credit card fraud detection system based on Hidden Markov Model,which does not require fraud signatures and still is able to detect frauds just by bearing in minda cardholders spending habit. The important benefit of the HMM-based approach is anextreme decrease in the number of False Positives transactions recognized as malicious by afraud detection system even though they are really genuine.

As we have shown that How HMM is useful for interstate transition in section 3. In this frauddetection system, we consider three different spending profiles of the card holder which isdepending upon price range, named high (h), medium (m) and low (l). In this set of symbols,we define V = {l, m, h} and M =3. The price range of proposed symbols has taken as low (0,$100], medium ($100, $500] and high ($500, up to credit card limit]. After finalizing the stateand symbol representations, the next step is to determine different components of the HMM,i.e. the probability matrices A, B, and so that all parameters required for the HMM is known.

These three model parameters are determined in a training phase using the forward-backwardalgorithm [13]. The initial choice of parameters affects the performance of this algorithm and,hence, it is necessary to choose all these parameters carefully. We consider the special case offully connected HMM in which every state of the model can be reached to every other state justin a single step, as shown in Fig. 1. 1, 2, 3 etc., are names given to the states to denote differentpurchase types such as bill payment, restaurant, electronics items etc.

In the figure 1, it has been shown that probability of transition from one state to another (forexample from 1 to 2 and vice versa, represented as a1-2 and a2-1, respectively) and also

12

3

b1-h

b1-lb1-m

hm l

h m

lb2-l

b2-mb2-h

h m l

b3-h b3-lb3-m

1


12/27


208

probabilities of transition from a particular state (1, 2, or 3) to different spending habits h, m, orl (for example, b1-h, b1-m,etc.).

The most important thing is to estimate HMM parameters for each card holder. The forward-backward algorithm starts with initial HMM parameters and converges to the nearest likelihoodvalues.

After deciding HMM parameters, we will consider to form an initial sequence of the existingspending behavior of the card holder. Let O1, O2, OR be consisting of R symbols to form asequence. This sequence is recorded from cardholders transaction till time t. We put thissequence in HMM model to compute the probability of acceptance. Let us assume be thisprobability is 1, which can be calculated as

1= P (O1, O2, O3, ...OR| ),

Let OR+1be new generated sequence at time t+1, when a transaction is going to process. Thetotal number of sequences is R+1. To consider R sequences only, we will drop O1sequence andwe will have R sequences from O2to OR+1.

Let the probability of new R sequences be2

2= P (O2, O3, O4, ....OR+1| ),

Hence, we will find

= 1 2,

If > 0, it means that HMM consider new sequence i.e. OR+1 with low probability andtherefore, this transaction will be considered as fraud transaction if and only if percentagechange in probability is greater than a predefined threshold value.

/ 1threshold value,

The threshold value can be calculated empirically. This Fraud detection system if finds that thepresent transaction is a malicious, then credit card issuing bank will regret the transaction andFDS discard to add OR+1symbol to available sequence. If it will be a genuine transaction, FDSwill add this symbol in the sequence and will consider in future for fraud detection.

5. Results and discussion

It is very difficult to do simulation on real time data set which is not providing from any creditcard bank on security reasons. In Table 1, it is shown that a random data set of all transactionshappened is categorized according to their types of purchase. With the help of this, we calculateprobability of each spending profile (h, l and m) of every category (1, 2 and 3). Fraud detectionof incoming transaction will be checked on last 10 transactions.

Table 1, list of all transactions happened till date

No. of Transaction Category Amount No. of Transaction Category Amount1st 1 140 10t 1 552nd 3 125 11th 1 2103r 2 120 12t 3 5504th 2 40 13th 3 1605th 1 15 14th 2 6956th 3 10 15th 2 3427th 1 520 16th 1 288th 2 74 17th 2 5079th 2 190 18th 2 610


13/27


209

Fig. 2: Different transactions amount in a category

Fig. 3: Probabilities of different spending profiles of each category

In Fig. 3, it is shown that the amount of purchased items or services in different categories suchas 1st for restaurant, grocery etc., 2nd for bill payment, balance transfer etc. and 3rd for ticketreservation, electronic devices etc., with respect to their number of transactions.

We have simulated several large data sets; one is shown in Table 1, in our proposed frauddetection system and found out probability mean distribution of false and genuine transactions.In Fig. 4, it is noted that when probability of genuine transaction is going down,correspondingly probability of false transaction going up and vise versa. If the percentagechange in probability of false transaction will be more than threshold value, then alarm will begenerated for fraudulent transaction and credit card bank will decline the same transaction.

6. ConclusionIn this paper, we have discussed that How Hidden Markov Model will be useful to detectfraudulent online transaction through credit card. The proposed Fraud Detection System is alsoscalable for handling vast volumes of transactions data processing. The HMM based credit cardfraud detection system is not having complex process to perform fraud check like the existingsystem. Proposed Fraud detection system gives genuine and fast result than existing system.The Hidden Markov Model makes the processing of detection very easy and tries to remove thecomplexity.

0 1 2 3 4 5 6 7 8 9

0

100

200

300

400

500

600

700

Amount(inUS$)

No. of Transactions

1st

2nd

3rd

1 2 30.0

0.1

0.2

0.3

0.4

0.5

Probability

Categories

high

medium

low


14/27


210

Fig. 4: Fraud Transaction Mean Distribution

In this paper, we have shown that HMM initially checks the upcoming transaction is fraudulentor not. It also takes decision to add new upcoming transaction to existing sequence or not which

will be dependent on percentage change in probabilities of old and new sequence. It will decidewhether this transaction is genuine or fraudulent depending on threshold values. We havecategorized different types of items and services such as restaurant, bill payment etc. Thesedifferent categories have been considered as three different states of the Hidden Markov Model.In each category, we have further divided into three different groups, high, medium and lowbased on different ranges of transaction amount. These groups were considered as observationsymbols. This technique helps to find the spending behavioral habit of cardholders andpurchasing of different items. The most important application of this technique is to decideinitial value of observation symbols, probability of transition states and initial estimation of themodel parameters.

In our proposed model, we have found out more than 88% transactions are genuine and verylow false alarm which is about 8 % of total number of transactions.

The relative studies and our results sure that the correctness and effectiveness of the proposedsystem is secure to 82 percent over a broad deviation in the input data.

7. References

[1] Internet usage world statistics, (http://www.internetworldstats.com/stats.htm) (2011).

[2] Trends in online shopping, a Global Nelson Consumer Report, (2008).

[3] European payment cards fraud report, Payments, Cards and Mobiles LLP & Author, (2010).

[4] Statistics for General and On-Line Card Fraud, (2007).

[5] Ghosh, Sushmito & Reilly, Douglas L., (1994) Credit Card Fraud Detection with a Neural-Network, Proc. of 27

thHawaii Intl Conf. on System Science: Information systems: Decision

Support and Knowledge-Based Systems, Vol.3, pp. 621-630.

[6] Maes, Sam, Tuyls Karl, Vanschoenwinkel Bram & Manderick, Bernard, (2002) Credit Card FraudDetection Using Bayesian and Neural Networks, Proc. of 1

stNAISO Congress on Neuro Fuzzy

Technologies. Hawana.

[7] Bentley, Peter J., Kim, Jungwon, Jung, Gil-Ho and Choi, Jong-Uk, (2000) Fuzzy DarwinianDetection of Credit Card Fraud, Proc. of 14thAnnual Fall Symposium of the Korean InformationProcessing Society.

[8] Kokkinaki, A. I., (1997) "On Atypical Database Transactions: Identification of Probable FraudsUsing Machine Learning for User Profiling",IEEE Knowledge and Data Engineering Exchange

0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.00.0

0.2

0.4

0.6

0.8

1.0

1.2

Probability

Fraud Trasaction Mean Distribution

Genuine Transaction

False Transaction


15/27


211

Workshop, kdex, pp.107.

[9] Bolton, Richard J. & Hand, David J., (2002) Statistical Fraud Detection: A Review, StatisticalScience, Vol.10, No. 3, pp. 235-255.

[10] Chan, Philip K., Fan, Wei, Prodromidis, Andreas L. & Stolfo, Salvatore J., (1999) DistributedData Mining in Credit Card Fraud Detection,IEEE Intelligent Systems, Vol. 14, No. 6, pp. 67-74.

[11] Rabiner, Lawrence R., (1989) A Tutorial on Hidden Markov Models and SelectedApplications in Speech Recognition, Proc. of IEEE,Vol. 77, No. 2, pp. 257-286.

[12] Fonzo, Valeria De, Aluffi-Pentini, Filippo and Parisi, Valerio, (2007) Hidden Markov Models inBioinformatics, Current Bioinformatics, Vol. 2, pp. 49-61.

[13] Srivastava, Abhinav, Kundu, Amlan, Sural, Shamik and Majumdar, Arun K., (2008) Credit CardFraud Detection Using Hidden Markov Model,IEEE Transactions on Dependable and SecureComputing, Vol. 5, No. 1, pp. 37-48.

Author

Vrunda Bhusari receivedB. E. degree in computer technology from Kavikulguru Instituteof Technology and Science, Nagpur in 2007 and currently perusing M Tech in computerscience from Bharati VidyaPeeth, Pune. Her research interests include data mining,network security and database security.


16/27

Credit Card Fraud Detection UsingHidden Markov Model

Abhinav Srivastava, Amlan Kundu, Shamik Sural, Senior Member, IEEE, and

Arun K. Majumdar, Senior Member, IEEE

AbstractDue to a rapid advancement in the electronic commerce technology, the use of credit cards has dramatically increased. As

creditcard becomes themost popular mode of payment forboth online as well as regular purchase, casesof fraud associated with it are

also rising. In this paper, we model the sequence of operations in credit card transaction processing using a Hidden Markov Model

(HMM) andshow howit can be used forthe detectionof frauds. An HMMis initially trained with thenormal behaviorof a cardholder. If an

incoming creditcard transactionis notacceptedby the trained HMMwith sufficientlyhigh probability,it is considered to be fraudulent. At

the same time, we try to ensure that genuine transactions are not rejected. We present detailed experimental results to show the

effectiveness of our approach and compare it with other techniques available in the literature.

Index TermsInternet, online shopping, credit card, e-commerce security, fraud detection, Hidden Markov Model.

1 INTRODUCTION

THEpopularity of online shopping is growing day by day.According to an ACNielsen study conducted in 2005,one-tenth of the worlds population is shopping online [1].Germany and Great Britain have thelargest numberof onlineshoppers, and credit card is the most popular mode ofpayment (59percent).About350 milliontransactionsper yearwere reportedly carried out by Barclaycard, the largest creditcard company in the United Kingdom, toward the end of thelast century [2]. Retailers like Wal-Mart typically handlemuch larger number of credit card transactions includingonline and regular purchases. As the number of credit cardusers risesworld-wide, the opportunities for attackersto stealcredit card details and, subsequently, commit fraud are alsoincreasing. The total credit card fraud in the United Statesitself is reported to be $2.7 billion in 2005 and estimated to be$3.0 billion in 2006, out of which $1.6 billion and $1.7 billion,respectively, are the estimates of online fraud [3].

Credit-card-based purchases can be categorized into twotypes: 1) physical card and 2) virtual card. In a physical-card-based purchase, the cardholder presents his card physicallyto a merchant for making a payment. To carry out fraudulenttransactions in this kind of purchase, an attacker has to stealthe credit card. If the cardholder does not realize the loss ofcard, it canlead to a substantial financialloss to thecredit cardcompany. In the second kind of purchase, only some

important information about a card (card number, expirationdate, secure code) is required to make the payment. Suchpurchases are normally done on the Internet or over thetelephone. To commit fraud in these types of purchases, afraudster simply needs to know the card details. Most of the

time, the genuine cardholder is not aware that someone elsehasseenor stolenhis card information.The only wayto detectthis kind of fraud is to analyze thespendingpatternson everycard and to figure out any inconsistency with respect to theusual spending patterns. Fraud detection based on theanalysis of existing purchase data of cardholder is apromising way to reduce the rate of successful credit cardfrauds. Since humans tend to exhibit specific behavioristicprofiles, every cardholder can be represented by a set ofpatterns containing information about the typical purchasecategory, the time since the last purchase, the amount of

money spent, etc. Deviation from such patterns is a potentialthreat to the system.Several techniques for the detection of credit card fraud

have been proposed in the last few years. We briefly reviewsome of them in Section 2.

2 RELATED WORK ON CREDIT CARDFRAUDDETECTION

Credit card fraud detection has drawn a lot of researchinterest and a number of techniques, with special emphasison data mining and neural networks, have been suggested.Ghosh and Reilly [4] have proposed credit card frauddetection with a neural network. They have built a detection

system, which is trained on a large sample of labeled creditcard account transactions. These transactions contain exam-ple fraud cases due to lost cards, stolen cards, applicationfraud, counterfeit fraud, mail-order fraud, and nonreceivedissue (NRI) fraud. Recently, Syeda et al. [5] have used parallelgranular neural networks (PGNNs) for improving the speedof data mining and knowledge discovery process in creditcard fraud detection. A complete system has been imple-mented for this purpose. Stolfo et al. [6] suggest a credit cardfraud detection system (FDS) using metalearning techniquesto learn models of fraudulent credit card transactions.Metalearning is a general strategy that provides a means forcombining and integrating a number of separately built

classifiers or models. A metaclassifier is thus trained on the

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 5, NO. 1, JANUARY-MARCH 2008 37

. The authors are with the School of Information Technology, IndianInstitute of Technology, Kharagpur, 721302 India.E-mail: {abhinav.srivastava, amlan.kundu, shamik, akmj}@sit.iitkgp.ernet.in.

Manuscript received 2 Aug. 2006; revised 11 Mar. 2007; accepted 21 Aug.2007; published online 31 Aug. 2007.For information on obtaining reprints of this article, please send e-mail to:[email protected], and reference IEEECS Log Number TDSC-0112-0806.

Digital Object Identifier no. 10.1109/TDSC.2007.70228.1545-5971/08/$25.00 2008 IEEE Published by the IEEE Computer Society

Authorized licensed use limited to: Rajarambapu Institute of Technology. Downloaded on August 2, 2009 at 09:30 from IEEE Xplore. Restrictions apply.


17/27

correlation of the predictions of the base classifiers. The samegroup has also worked on a cost-based model for fraud andintrusion detection[7]. They useJava agentsfor Metalearning(JAM), which is a distributed data mining system for creditcard fraud detection. A number of important performancemetrics like True PositiveFalse Positive (TP-FP) spread andaccuracy have been defined by them.

Aleskerov et al. [8] present CARDWATCH, a databasemining system used for credit card fraud detection. Thesystem, based on a neural learning module, provides aninterface to a variety of commercial databases. Kim and Kimhave identified skewed distribution of data and mix oflegitimate and fraudulent transactions as the two mainreasons for the complexity of credit card fraud detection [9].Based on this observation, they use fraud density of realtransaction data as a confidence value and generate theweighted fraud score to reduce the number of misdetections.

Fan et al. [10] suggest the application of distributed datamining in credit card fraud detection. Brause et al. [11] havedeveloped an approach that involves advanced data miningtechniques and neural network algorithms to obtain high

fraud coverage. Chiu and Tsai [12] have proposed Webservices and data mining techniques to establish a colla-borative scheme for fraud detection in the banking industry.With this scheme, participating banks share knowledgeabout the fraud patterns in a heterogeneous and distributedenvironment. To establish a smooth channel of dataexchange, Web services techniques such as XML, SOAP,and WSDL are used. Phua et al. [13] have done an extensivesurvey of existing data-mining-based FDSs and published acomprehensive report. Prodromidis and Stolfo [14] use anagent-based approach with distributed learning for detect-ing frauds in credit card transactions. It is based on artificialintelligence and combines inductive learning algorithmsand metalearning methods for achieving higher accuracy.

Phua et al. [15] suggest the use of metaclassifier similar to[6]infrauddetectionproblems.TheyconsidernaiveBayesian,C4.5, and Back Propagation neural networks as the baseclassifiers. A metaclassifier is used to determine whichclassifier should be considered based on skewness of data.Although they do not directly use credit card fraud detectionas thetarget application,their approach is quitegeneric.Vatsaet al. [16] have recently proposed a game-theoretic approachto credit card fraud detection. They model the interactionbetweenan attackerand an FDS as a multistage game betweentwo players, each trying to maximize his payoff.

The problem with most of the abovementioned ap-proaches is that they require labeled data for both genuine,as well as fraudulent transactions, to train the classifiers.Getting real-world fraud data is one of the biggest problemsassociated with credit card fraud detection. Also, theseapproaches cannot detect new kinds of frauds for whichlabeled data is notavailable. In contrast, we present a HiddenMarkov Model (HMM)-based credit card FDS, which doesnotrequire fraud signatures andyet is able to detectfrauds byconsidering a cardholdersspendinghabit. We model a creditcard transaction processing sequence by the stochasticprocess of an HMM. The details of items purchased inindividual transactions are usually not known to an FDSrunning atthe bank that issuescredit cards tothe cardholders.This can be represented as the underlying finite Markovchain, which is not observable. The transactions can only be

observed through the other stochastic process that produces

the sequence of the amount of money spent in eachtransaction. Hence, we feel that HMM is an ideal choice foraddressingthis problem. Another important advantage of theHMM-basedapproach is a drastic reduction in thenumber ofFalse Positives (FPs)transactions identified as malicious byan FDS although they are actually genuine. Since the numberof genuine transactions is a few orders of magnitude higher

than the number of malicious transactions, an FDS should bedesigned in such a way that the number of FPs is as low aspossible. Otherwise, due to the base rate fallacy effect [17],bank administrators may tend to ignore the alarms. To thebest of our knowledge, there is no other published literatureon the application of HMM for credit card fraud detection.

The rest of the paper is organized as follows: In Section 3,we first briefly explain the working principle of an HMM.We then show how to model credit card transactionprocessing using HMM in Section 4. We also describe thecomplete process flow of the proposed FDS in this section.Detailed experimental result is presented in Section 5.Finally, we conclude the paper with some discussions.

3 HMM BACKGROUND

An HMM is a double embedded stochastic process with twohierarchy levels. It can be used to model much morecomplicatedstochasticprocesses as compared to a traditional

Markovmodel.AnHMMhasafinitesetofstatesgovernedbya set of transition probabilities. In a particular state, anoutcome or observation can be generated according to an

associated probability distribution. It is only theoutcome andnot the state that is visible to an external observer [18].

HMM-based applications are common in various areassuch as speech recognition, bioinformatics, and genomics. Inrecent years, Joshi and Phoba [19] have investigated the

capabilities of HMM in anomaly detection. They classifyTCPnetwork traffic as an attack or normal using HMM. Cho andPark [20] suggest an HMM-based intrusion detection system

that improves the modeling time and performance byconsidering only the privilege transition flows based on thedomain knowledge of attacks. Ourston et al. [21] have

proposed the application of HMM in detecting multistagenetwork attacks. Hoang et al. [22] present a new method toprocesssequencesofsystemcallsforanomalydetectionusing

HMM. The key ideais tobuild a multilayer model of programbehaviors based on both HMMs and enumerating methodsfor anomaly detection. Lane [23] has used HMM to modelhumanbehavior.Once humanbehavior is correctly modeled,

anydetecteddeviation is a cause forconcern since an attackeris not expected to have a behavior similar to thegenuine user.Hence, an alarm is raised in case of any deviation.

An HMM can be characterized by the following [18]:

1. N is the number of states in the model. We denotethe set of states S fS1; S2; . . . SNg, where Si, i1; 2; . . . ; N is an individual state. The state at timeinstantt is denoted by qt.

2. Mis the number of distinct observation symbols perstate. The observation symbols correspond to thephysical output of the system being modeled. Wedenote the set of symbols V fV1; V2; . . . VMg, where

Vi, i 1; 2; . . . ; Mis an individual symbol.

38 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 5, NO. 1, JANUARY-MARCH 2008



18/27

3. The state transitionprobability matrix A aij,where

aij Pqt1 Sjjqt Si; 1 i N; 1 j N; t 1; 2; . . . :

1

For the general case where any statejcan be reached

from any other state i in a single step, we have aij>

0 for all i, j. Also,PN

j1 aij 1, 1 i N.4. The observation symbol probability matrix B

bjk, where

bjk PVkjSj; 1 j N ; 1 k Mand

XM

k1

bjk 1; 1 j N :2

5. The initial state probability vector i, where

i Pq1 Si; 1 i N ; such thatXN

i1

i 1: 3

6. The observation sequence O O1; O2; O3; . . . OR,where each observation Otis one of the symbols fromV, and R is the number of observations in thesequence.

It is evident that a complete specification of an HMM requires

the estimation of two model parameters, Nand M, and three

probability distributions A, B, and . We use the notation

A;B; to indicate the complete set of parameters of the

model, whereA,Bimplicitly includeNand M.An observation sequence O, as mentioned above, can be

generated by many possible state sequences. Consider one

such particular sequence

Q q1; q2; . . . ; qR; 4whereq1 is the initial state.

The probability that O is generated from this state

sequence is given by

POjQ; YR

t1

POtjqt; ; 5

where statistical independence of observations is assumed.

Equation (5) can be expanded as

POjQ; bq1 O1:bq2 O2 . . . bqR OR: 6

The probability of the state sequence Q is given as

PQj q1 aq1 q2 aq2q3. . . aqR1qR : 7

Thus, the probability of generation of the observation

sequence O by the HMM specified by can be written as

follows:

POj X

all Q

POjQ; PQj: 8

DerivingthevalueofPOj usingthedirectdefinitionof(8)is

computationally intensive. Hence, a procedure named as

Forward-Backwardprocedure[18]isusedtocomputePOj.

4 USE OF HMM FOR CREDIT CARDFRAUDDETECTION

An FDS runs at a credit card issuing bank. Each incomingtransaction is submitted to the FDS for verification. FDSreceives the card details and the value of purchase to verifywhether the transaction is genuine or not. The types of goodsthatareboughtinthattransactionarenotknowntotheFDS.Ittries to find any anomaly in the transaction based on thespending profile of the cardholder, shipping address, andbilling address, etc. If the FDS confirms the transaction to bemalicious,it raisesan alarm,and theissuing bank declines the

transaction. The concernedcardholdermay thenbe contactedand alerted about the possibility that the card is compro-mised. In this section, we explain how HMM can be used forcredit card fraud detection.

A set of notations and acronyms used in the paper isgiven in Table 1.

4.1 HMM Model for Credit Card TransactionProcessing

To map the credit card transaction processing operation interms of an HMM, we start by first deciding the observationsymbolsinourmodel.Wequantizethepurchasevaluesx intoM price ranges V1; V2; . . . VM, forming the observationsymbols at the issuing bank. The actual price range for each

symbol is configurable based on the spending habit ofindividual cardholders. These price ranges can be deter-mined dynamically by applying a clustering algorithmon thevalues of each cardholders transactions, as shown inSection 5.2. We use Vk,k 1; 2; . . . M, to represent both theobservation symbol, as well as the corresponding pricerange.

In this work, we consider only three price ranges, namely,low l, medium m, and high h. Our set of observationsymbols is, therefore, V fl;m;hg making M 3. Forexample, let l 0; $100, m $100; $500, and h $500;credit card limit. If a cardholder performs a transaction of$190, then the corresponding observation symbol ism.

A credit cardholder makes different kinds of purchases of

different amounts over a period of time. One possibility is to

SRIVASTAVA ET AL.: CREDIT CARD FRAUD DETECTION USING HIDDEN MARKOV MODEL 39

TABLE 1Notations and Acronyms



19/27

consider the sequence of transaction amounts and look fordeviations in them. However, the sequence of types ofpurchase is more stable compared to the sequence oftransaction amounts. The reason is that, a cardholder makespurchases depending on his need for procuring differenttypes of items over a period of time. This, in turn, generates asequence of transaction amounts.Each individual transactionamount usually depends on the corresponding type ofpurchase. Hence, we consider the transition in the type ofpurchase as state transition in our model. The type of eachpurchase is linkedto theline of business of thecorrespondingmerchant. This information about the merchants line ofbusiness is not known to the issuing bank running the FDS.Thus, the type of purchase of the cardholder is hidden fromthe FDS. The set of all possible types of purchase and,equivalently, the set of all possible lines of business ofmerchants forms the set of hidden states of the HMM. Itshould be noted at this stage that the line of business of themerchant is known to the acquiring bank, since thisinformation is furnished at the time of registration of amerchant. Also, some merchants may be dealing in varioustypes of commodities (For example, Wal-Mart, K-Mart, orTarget sells tens of thousands of different items). Such typesoflineofbusinessareconsideredasMiscellaneous,andwedonot attempt to determine the actual types of items purchasedin these transactions. Any assumption about availability ofthis information with the issuing bank and, hence, with theFDS, is not practical and, therefore, would not have beenvalid. In Section 5, we show the effect of the choice of thenumber of states on the system performance.

After deciding the state and symbol representations, thenext step is to determine the probability matricesA,B, and so that representation of the HMM is complete. Thesethree model parameters are determined in a training phaseusing the Baum-Welch algorithm [18]. The initial choice ofparameters affects the performance of this algorithm and,hence, they should be chosen carefully.

We consider the special case of fully connected HMM inwhichevery state of the model can be reached in a single stepfrom every other state, as shown in Fig. 1. Gr, El, Mi, etc., are

names given to the states to denote purchase types like

Groceries, Electronic items, and Miscellaneous purchases.Spending profiles of the individual cardholders are used toobtain an initial estimate for probability matrix Bof (2). InSection 4.2, we explain how to determine observationsymbols dynamically from a cardholders transactions.

4.2 Dynamic Generation of Observation Symbols

For each cardholder, we train and maintain an HMM. Tofind the observation symbols corresponding to individualcardholders transactions dynamically, we run a clusteringalgorithm on his past transactions. Normally, the transac-tions that are stored in the issuing banks database containmany attributes. For our work, we consider only the amount

that the cardholder spent in his transactions. Althoughvarious clustering techniques could be used, we useK-meansclustering algorithm [24] to determine the clusters.K-means is an unsupervised learning algorithm for groupinga given set of data based on the similarity in their attribute(often called feature) values. Each group formed in theprocess is called a cluster. The number of clusters Kis fixed apriori. The grouping is performed by minimizing the sum ofsquares of distances between each data point and thecentroid of the cluster to which it belongs.

In our work,Kis the same as the number of observationsymbols M. Le t c1; c2; . . . cM be the centroids of thegenerated clusters. These centroids or mean values areused to decide the observation symbols when a newtransaction comes in. Let x be the amount spent by thecardholder u in transaction T. FDS generates the observa-tion symbol for x (denoted by Ox) as follows:

Ox Varg mini

jxci j: 9

As mentioned before, the number of symbols is 3 in oursystem. Considering M 3, if we execute K-means algo-rithm on the example transactions in Table 2, we get theclusters, as shown in Table 3, with cl, cm, and ch as therespective centroids. It may be noted that the dollar amounts5, 10, and 10 have been clustered together asclresulting in acentroid of 8.3. The percentage p of total number of

transactions in this cluster is thus 30 percent. Similarly,


Fig. 1. HMM for credit card fraud detection.



20/27

dollar amounts 15, 15, 20, 25, and 25 have beengrouped in thecluster cmwith centroid 20, whereas amounts 40 and 80 havebeen grouped together in clusterch.cmand ch, thus, contain50 percent and 20 percent of the total number of transactions.When the FDS receives a transaction Tfor this cardholder, itmeasures the distance of the purchase amount x with respectto the meanscl,cm, andchto decide (using (9)) the cluster towhichTbelongs and, hence, the corresponding observationsymbol. As an example, ifx $10, then in Table 3 using (9),the observation symbol isV1 l.

4.3 Spending Profile of Cardholders

The spending profile of a cardholder suggests his normal

spending behavior. Cardholders can be broadly categorizedinto three groups based on their spending habits, namely,high-spending (hs) group, medium-spending (ms) group,and low-spending (ls) group. Cardholders who belong to thehs group, normally use their credit cards for buying high-priced items. Similar definition applies to the other twocategories also.

Spending profiles of cardholders are determined at theend of the clustering step. Let pi be the percentage of totalnumber of transactions of the cardholder that belong tocluster with meanci. Then, the spending profile (SP) of thecardholderu is determined as follows:

SPu arg maxi

pi: 10

Thus, spending profile denotes the cluster number to whichmost of the transactions of the cardholder belong. In theexample in Table 3, thespending profileof thecardholderis 2,that is, m and, hence, thecardholder belongs to thems group.

4.4 Model Parameter Estimation and Training

We use Baum-Welch algorithm to estimate the HMMparameters for each cardholder. The algorithm starts withan initial estimate of HMM parameters A, B, and andconverges to the nearest local maximum of the likelihoodfunction. Initial state probability distribution is considered tobe uniform, that is, if there are N states, then the initial

probability of each state is 1=N. Initial guess of transition andobservation probability distributions can also be consideredto be uniform. However, to make the initial guess ofobservation symbol probabilities more accurate, spendingprofile of the cardholder, as determined in Section 4.3, istaken into account. We make three sets of initial probabilityfor observation symbol generation for three spendinggroupsls, ms, and hs. Based on the cardholders spendingprofile, we choosethe corresponding setof initial observationprobabilities. The initial estimate of symbol generationprobabilities using this method leads to accurate learning ofthemodel. Since there is no a prioriknowledge about thestatetransition probabilities, we consider the initial guesses to be

uniform. In case of a collaborative work between an acquiring

bank and an issuing bank, we can have better initial guessabout state transition probabilities as well.

We now start training the HMM. The training algorithmhas the following steps: 1) initialization of HMM para-meters, 2) forward procedure, and 3) backward procedure.Details of these steps can be found in [18]. For training theHMM, we convert the cardholders transaction amount intoobservation symbols and form sequences out of them. Atthe end of the training phase, we get an HMM correspond-ing to each cardholder. Since this step is done offline, it doesnot affect the credit card transaction processing perfor-mance, which needs online response.

4.5 Fraud Detection

After the HMM parameters are learned, we take thesymbols from a cardholders training data and form aninitial sequence of symbols. Let O1; O2; . . . OR be one suchsequence of length R. This recorded sequence is formedfrom the cardholders transactions up to time t. We inputthis sequence to the HMM and compute the probability ofacceptance by the HMM. Let the probability be 1, whichcan be written as follows:

1 PO1; O2; O3; . . . ORj: 11

Let OR1 be the symbol generated by a new transaction attimet 1. To form another sequence of length R, we dropO1 and append OR1 in that sequence, generatingO2; O3; . . . OR; OR1 as the new sequence. We input thisnew sequence to the HMM and calculate the probability ofacceptance by the HMM. Let the new probability be 2

2 PO2; O3; O4; . . . OR1j; 12

Let 1 2: 13

If > 0, it means that the new sequence is accepted by theHMM with low probability, and it could be a fraud. Thenewly added transaction is determined to be fraudulent ifthe percentage changein theprobability is above a threshold,

that is,=1 Threshold: 14

The threshold value can be learned empirically, as will bediscussed in Section 5. IfOR1is malicious, the issuing bankdoes not approve the transaction, and the FDS discards thesymbol. Otherwise, OR1 is added in the sequence perma-nently, and the new sequence is used as the base sequence fordetermining the validity of the next transaction. The reasonforincludingnew nonmalicious symbols in thesequenceis tocapture the changing spending behavior of a cardholder.Fig. 2 shows the complete process flow of the proposed FDS.As shown in the figure, the FDS is divided into two

partsone is the training module, and the other is detection.


TABLE 3Output of K-Means Clustering Algorithm

TABLE 2Example Transactions with the Dollar Amount

Spent in Each Transaction



21/27

Training phase is performed offline, whereas detection is anonline process.

5 RESULTS

Testing credit card FDSs using real data set is a difficult task.Banks do not, in general, agree to share their data withresearchers. There is also no benchmark data set available forexperimentation. We have, therefore, performed large-scalesimulation studies to test the efficacy of the system. Asimulator is used to generate a mix of genuine andfraudulenttransactions. The number of fraudulent transactions in agiven length of mixed transactions is normally distributedwith a user specified (mean) and (standard deviation),taking cardholders spending behavior into account. specifies the average number of fraudulent transactions in agiven transaction mix. In a typical scenario, an issuing bank,and hence, its FDS receives a large number of genuinetransactions sparingly intermixed with fraudulent transac-tions. The genuine transactions are generatedaccording to thecardholders profiles. The cardholders are classified intothree categories as mentioned beforethe low, medium, andhs groups. We have studied theeffects of spending group andthe percentage of transactions that belong to the low-,medium-, and high-price-range clusters. We use standardmetricsTrue Positive (TP) and FP, as well as TP-FP spreadand Accuracy metrics, as proposed in [7] to measure theeffectiveness of the system. TP represents the fraction offraudulent transactions correctly identified as fraudulent,whereas FP is the fraction of genuine transactions identifiedas fraudulent. Most of the design choices for a FDS that resultin higher values of TP, also cause FP to increase. Tomeaningfully capture the performance of such a system, thedifference between TP and FP, often called the TP-FP spread,is used as a metric. Accuracy represents the fraction of totalnumber of transactions (both genuine and fraudulent) thathave been detected correctly. It can be expressed as follows:

Accuracy No:of good trans:detected as goodNo:of bad trans:detected as badTotal No:of transactions : 15

We first carried out a set of experiments to determine thecorrect combination of HMM design parameters, namely,the number of states, the sequence length, and the thresholdvalue. Once these parameters were decided, we performedcomparative study with another FDS.

5.1 Choice of Design Parameters

Since there are three parameters in an HMM, we need tovary one at a time keeping the other two fixed, thusgenerating a large number of possible combinations. Forchoosing the design parameters, we generate transactionsequences using 95 percent low value, 3 percent mediumvalue, and 2 percent high value transactions. The reason for

using this mix is that it represents a profile that stronglyresembles a ls customer profile. We also consider the and values to be 1.0 and 0.5, respectively. This is chosen sothat, on the average, there will be 1 fraudulent transactionin any incoming sequence with some scope for variation.After the parameter values are fixed, we will see inSection 5.2, how the system performs as we vary the profileand the mix of fraudulent transactions.

For parameter selection, the sequence length is variedfrom 5 to 25 in steps of 5. The threshold values consideredare 30 percent, 50 percent, 70 percent, and 90 percent. Thenumber of states is varied from 5 to 10 in steps of 1. Weconsider both TP and FP for deciding the optimum

parameter values. Thus, there are a total of 120 546possible combinations of parameters. The number ofsimulation runs required for obtaining results with a givenconfidence interval (CI) was derived as follows [25], [26].

Aninitialsetoffivesimulationruns,eachwith100samples,was carried out to estimate the mean and standarddeviation of both TP and FP for a fixed sequence length,number of states, and threshold value. Mean TP was found tobe an order of magnitude higher than mean FP. Standarddeviation of TP was 0.1 and that for FP was 0.005. We set thetarget 95 percent CI for TP and FP, respectively, as = 2.5 percent and = 0.25 percent around their mean values.Using Students t-distribution, the minimum number of

simulation runs required for obtaining desired CI for TP


Fig. 2. Process flow of the proposed FDS.



22/27

was derived as 83 and that for FP as 23. Based on theseobservations, we set thenumber of simulation runs for all theexperiments to be 100. The results obtained were within thedesired CI, as mentioned above.

Since it is not convenient to present the detailed resultsfor each of the 120 combinations, we show summarizedresults. In Table 4, we show the results for each value ofsequence length averaged over all the six states. Similarly,we present results for each value of the number of statesaveraged over all the five sequence lengths in Table 5.

In Tables 4 and 5, the highest value of TP, as well as thelowest value of FP, has been highlighted for each row. It isseen from the two tables that FP shows a clear trend of

decreasing with higher threshold and smaller sequencelengths. However, the number of states does not have astronginfluence eitheron TPor onFP. InTable4, itis seenthatTP is high for sequence length 15 in 75 percent of the cases.Also, fraud detection time increases linearly with thesequence length, as shown in Fig. 3. The results have beenplotted for a Java implementation on a 1.8 GHz Pentium IVprocessor machine. Hence, we choose 15 as the length ofobservation symbol sequence for optimum performance.Once sequence length is decided, it is seen in Table 4 that thethreshold could be set to either 30 percent or 50 percent.Although TP is higher forthreshold 30%, FP is also higher.To minimize FP, we choose threshold 50%. After choosing

sequence lengthand threshold, wehave tochoose thenumber

of states. Since there is no clear indication from the abovesummary information,as presentedin Tables4 and5, we takea look at the detailed data for TP when threshold 50%, asshown in Table 6.

It is seen that for sequence length 15, the highest valueof TP occurs for no:of states 10 and, hence, it would be agood choice for our design.

We have also analyzed the time taken by the trainingphase, which is performed offline for each cardholdersHMM. Fig. 4 shows the plot of model learning time againstthe number of sequences in the training data. As the size oftraining data increases, learning time increases, especiallybeyond 100. We therefore, use 100 sequences for training theHMM. Although done offline, the model learning time has astrong impact on the scalability of the system. Since an HMM


TABLE 4Variation of TP and FP with Different Sequence Lengths

TABLE 5Variation of TP and FP with Different Number of States

Fig. 3. Detection time versus length of sequence.



23/27

is trained foreachcardholder,it is imperativethatthe trainingtime is kept as low as possible especially when an issuingbank is meant to handle millions of cardholders with manynewcards being issuedeveryday. Theonlineprocessing timeof about 200 ms on a 1.8GHz Pentium IV machine also showsthat the system will be able to handle a large number ofconcurrent operations and, hence, is scalable.

Thus, our design parameter setting is given as follows:

1. number of hidden statesN 10,2. length of observation sequenceR 15,3. Threshold value 50%, and4. number of sequences fortraining 100.

With this design parameter setting, we next proceed to

study the performance of the system under variouscombinations of input data.

5.2 Comparative Performance

In this section, we show performance of the proposed systemas we vary thenumber of fraudulent transactionsand also thespending profile of the cardholder. Our design parametersetting is as obtained in the previous section. We compareperformance of our approach (denoted by OA below) withthe credit card fraud detection technique proposed by Stolfoet al. [6] (denoted by ST below). For comparison, we considerthe metrics TP and FP, as well as TP-FP and Accuracy [7].

We carried out experiments by varying both the transac-tion amount mix, as well as the number of fraudulent

transactions intermixed with a sequence of genuine transac-tions. Transaction amount mix is captured by the cardhol-ders profile. We consider four profiles. One of them is themixed profile, which means that spending profile is notconsidered at all by our approach, as explained inSection 4.3. The other profiles considered are (55 35 10),(70 20 10), and (95 3 2). Here, a b c profile represents a lsprofile cardholder who has been found to carry outa percent of his transactions in the low, b percent inmedium, andcpercent in the high range. Thus, our attemptis to see how the system performs in the presence ofdifferent mixes of transaction amount ranges in thetransactions. It may be noted that for cardholders in the

other two groups, namely, hs and ms, will show similar

performance as only the relative ordering ofa,b, andc willchange. We also vary the mean value of malicioustransactions from 0.5 to 4.0 in steps of 0.5. The value iskept fixed at 0.5 for all the experiments. Thus, everysequence of transaction that we use for testing is a mixedsequence containing both genuine, as well as malicious,transactions. For each combination of spending profile andmalicious transaction distribution, we carried out 100 runs

and report the average result. The same set of data was usedto determine the performance of both OA and ST.Fig. 5a shows the variation of TP and FP for the two

approaches using the spending profile (95 3 2). Variation ofTP-FP and Accuracy is shown in Fig. 5b.

It is seen from thefigures that TP of theproposedapproachis very close to that of Stolfo et als approach. Also, both theapproaches have almost similar values of FP. As a result, thetwo systems have comparable accuracies and average TP-FPspread. Further, the two approaches exhibit similar trendwith variation in . Next, we show how the two systemsbehave as we mix the transaction amounts. Percentages oflow value, medium value, and high value transactions arechanged from(95 3 2), asshownaboveto (70 2010) inFigs.6a

and 6b and (55 35 10) in Figs. 7a and 7b.There are a number of interesting observations from the

above twosets of figures. Thefirst observation is that, TP fallsandFPrisesforboththeapproaches,astransactionsnolongerremain strictly ls in nature. However, the FP rate of ST risessharply, although the TP rate does not degrade to a greatextent. On the other hand, for our approach, FP rate remainslow while there is a graceful degradation of the TP rate. As aresult, the Accuracy of the proposed system remains close to80 percent for all the above settings. Accuracy by Stolfo et alsapproach falls drastically to about 60 percent for (55 35 10).Thus, our approach has 15-20 percent better Accuracy.Although TP-FP values are close for the profile (70 20 10),there is more than 15 percent difference in TP-FP for theprofile (55 35 10) with our approach performing better.

In Figs. 8a and 8b, we show the performance of the twoapproaches when the profile is mixed, which means that allthethreerangesoftransactionsareequiprobable.Itisseenthatthe FPof the STmethod has goneup sharply. Infact, FPis evenhigher than TP. As a result, the mean accuracy has droppedbelow 40 percent. Our approach shows a fall in TP, but the FPhasnot degraded a lot. As a result, theAccuracyof oursystemstill remains around 80 percent. The TP-FP value for the STapproach has become negative even for 0:5. For ourapproach,theTP-FPvaluebecamenegativeonlyafter 2:5.

From the above results, it can be concluded that theproposed system has an overall Accuracy of 80 percent even

under large input condition variations, which is much higher


Fig. 4. Model learning time versus number of sequences in training data.

TABLE 6Detailed Result of TP for threshold 50%



24/27


Fig. 5. Performance variation of the two systems (OA and ST) with the meanof malicious transaction distribution for the spending profile (95 3 2).

(a) TP and FP. (b) TP-FP spread (SP) and Accuracy (AC).

Fig. 6. Performance variation of the two systems (OA and ST) within the mean of malicious transaction distribution for the spending profile

(70 20 10). (a) TP and FP. (b) TP-FP spread (SP) and Accuracy (AC).



25/27


Fig. 7. Performance variation of the two systems (OA and ST) with the mean of malicious transaction distribution for the spending profile

(55 35 10). (a) TP and FP. (b) TP-FP spread (SP) and Accuracy (AC).

Fig. 8. Performance variation of the two systems (OA and ST) with mean of malicious transaction distribution for mixed profile. (a) TP and FP.

(b) TP-FP spread (SP) and Accuracy (AC).



26/27

than the overall Accuracy of the method proposed by Stolfoet al. [6]. Our system can, therefore, correctly detect most ofthe transactions. However, when there is no profile informa-tion at all, the system shows some performance degradationin terms of TP-FP. This observation highlights theimportanceof profileselection as explained in Section 4. Also, when thereis little difference between genuine transactions and mal-

icious transactions, most of the credit card FDSs sufferperformance degradation,either dueto a fallin thenumber ofTPs or a rise in the number of FPs.

6 CONCLUSIONS AND DISCUSSIONS

In this paper, we have proposed an application of HMM incredit card fraud detection. The different steps in creditcard transaction processing are represented as the under-lying stochastic process of an HMM. We have used theranges of transaction amount as the observation symbols,whereas the types of item have been considered to be statesof the HMM. We have suggested a method for finding thespending profile of cardholders, as well as application ofthis knowledge in deciding the value of observationsymbols and initial estimate of the model parameters. Ithas also been explained how the HMM can detect whetheran incoming transaction is fraudulent or not. Experimentalresults show the performance and effectiveness of oursystem and demonstrate the usefulness of learning thespending profile of the cardholders. Comparative studiesreveal that the Accuracy of the system is close to 80 percentover a wide variation in the input data. The system is alsoscalable for handling large volumes of transactions.

ACKNOWLEDGMENTS

The authors would like to the anonymous reviewers fortheir constructive and useful comments. This work ispartially supported by a research grant from the Depart-ment of Information Technology, Ministry of Communica-tion and Information Technology, Government of India,under Grant 12(34)/04-IRSD dated 07/12/2004.

REFERENCES[1] Global Consumer Attitude Towards On-Line Shopping,

http://www2.acnielsen.com/reports/documents/2005_cc_onlineshopping.pdf, Mar. 2007.

[2] D.J. Hand, G. Blunt, M.G. Kelly, and N.M. Adams, Data Miningfor Fun and Profit, Statistical Science, vol. 15, no. 2, pp. 111-131,

2000.[3] Statistics for General and On-Line Card Fraud, http://www.epaynews.com/statistics/fraud.html, Mar. 2007.

[4] S. Ghosh and D.L. Reilly, Credit Card Fraud Detection with aNeural-Network, Proc. 27th Hawaii Intl Conf. System Sciences:Information Systems: Decision Support and Knowledge-Based Systems,vol. 3, pp. 621-630, 1994.

[5] M. Syeda, Y.Q. Zhang, and Y. Pan, Parallel Granular Networksfor Fast Credit Card Fraud Detection,Proc. IEEE Intl Conf. FuzzySystems, pp. 572-577, 2002.

[6] S.J. Stolfo, D.W. Fan, W. Lee, A.L. Prodromidis, and P.K. Chan,Credit Card Fraud Detection Using Meta-Learning: Issues andInitial Results,Proc. AAAI Workshop AI Methods in Fraud and Risk

Management, pp. 83-90, 1997.[7] S.J. Stolfo, D.W. Fan, W. Lee, A. Prodromidis, and P.K. Chan,

Cost-Based Modeling for Fraud and Intrusion Detection: Resultsfrom the JAM Project,Proc. DARPA Information Survivability Conf.

and Exposition,vol. 2, pp. 130-144, 2000.

[8] E. Aleskerov, B. Freisleben, and B. Rao, CARDWATCH: ANeural Network Based Database Mining System for Credit CardFraud Detection, Proc. IEEE/IAFE: Computational Intelligence forFinancial Eng.,pp. 220-226, 1997.

[9] M.J. Kim and T.S. Kim, A Neural Classifier with Fraud DensityMap for Effective Credit Card Fraud Detection, Proc. Intl Conf.Intelligent Data Eng. and Automated Learning, pp. 378-383, 2002.

[10] W. Fan, A.L. Prodromidis, and S.J. Stolfo, Distributed DataMining in Credit Card Fraud Detection,IEEE Intelligent Systems,

vol. 14, no. 6, pp. 67-74, 1999.[11] R. Brause, T. Langsdorf, and M. Hepp, Neural Data Mining forCredit Card Fraud Detection, Proc. IEEE Intl Conf. Tools with

Artificial Intelligence,pp. 103-106, 1999.[12] C. Chiu and C. Tsai, A Web Services-Based Collaborative Scheme

for Credit Card Fraud Detection, Proc. IEEE Intl Conf.e-Technology, e-Commerce and e-Service, pp. 177-181, 2004.

[13] C. Phua, V. Lee, K. Smith, and R. Gayler, A ComprehensiveSurvey of Data Mining-Based Fraud Detection Research, http://www.bsys.monash.edu.au/people/cphua/, Mar. 2007.

[14] S. Stolfo and A.L. Prodromidis, Agent-Based Distributed Learn-ing Applied to Fraud Detection, Technical Report CUCS-014-99,Columbia Univ., 1999.

[15] C. Phua, D. Alahakoon, and V. Lee, Minority Report in FraudDetection: Classification of Skewed Data,ACM SIGKDD Explora-tions Newsletter,vol. 6, no. 1, pp. 50-59, 2004.

[16] V. Vatsa, S. Sural, and A.K. Majumdar, A Game-theoretic

Approach to Credit Card Fraud Detection, Proc. First Intl Conf.Information Systems Security,pp. 263-276, 2005.[17] S. Axelsson, The Base-Rate Fallacy and the Difficulty of Intrusion

Detection, ACM Trans. Information and System Security, vol. 3,no. 3, pp. 186-205, 2000.

[18] L.R. Rabiner, A Tutorial on Hidden Markov Models and SelectedApplications in Speech Recognition, Proc. IEEE, vol. 77, no. 2,pp. 257-286, 1989.

[19] S.S. Joshi and V.V. Phoha, Investigating Hidden Markov ModelsCapabilities in Anomaly Detection, Proc. 43rd ACM Ann. South-east Regional Conf., vol. 1, pp. 98-103, 2005.

[20] S.B. Cho and H.J. Park, Efficient Anomaly Detection by ModelingPrivilege Flows Using Hidden Markov Model, Computer andSecurity,vol. 22, no. 1, pp. 45-55, 2003.

[21] D. Ourston, S. Matzner, W. Stump, and B. Hopkins, Applicationsof Hidden Markov Models to Detecting Multi-Stage NetworkAttacks,Proc. 36th Ann. Hawaii Intl Conf. System Sciences, vol. 9,

pp. 334-344, 2003.[22] X.D. Hoang, J. Hu, and P. Bertok, A Multi-Layer Model forAnomaly Intrusion Detection Using Program Sequences of SystemCalls,Proc. 11th IEEE Intl Conf. Networks, pp. 531-536, 2003.

[23] T. Lane, Hidden Markov Models for Human/Computer Inter-face Modeling, Proc. Intl Joint Conf. Artificial Intelligence, Work-shop Learning about Users, pp. 35-44, 1999.

[24] L. Kaufman and P.J. Rousseeuw, Finding Groups in Data: AnIntroduction to Cluster Analysis, Wiley Series in Probability andMath. Statistics, 1990.

[25] K.S. Trivedi, Probability and Statistics with Reliability, Queuing, andComputer Science Applications,second ed. John Wiley & Sons, 2001.

[26] J. Banks, J.S. Carson II, B.L. Nelson, and D.M. Nicol,Discrete-EventSystem Simulation, fourth ed. Prentice Hall, 2004.

Abhinav Srivastava received the BTech degreein computer science and engineering from

Harcourt Butler Technological Institute, Kanpur,in 2001 and the MTech degree in informationtechnology from the Indian Institute of Technol-ogy, Kharagpur, in 2006. He is currently workingtoward the PhD degree in the School of Compu-ter Science, Georgia Institute of Technology. Hisresearch interests include virtualization, systemssecurity, database security, and data mining.




27/27

Amlan Kunduis working toward the MS degreein information technology from the School ofInformation Technology, Indian Institute ofTechnology, Kharagpur, India. His researchinterests include fraud detection, databasesecurity, and network security.

Shamik Sural received the BE degree inelectronics and telecommunication engineeringfrom Jadavpur University, Calcutta, India, in1990, the ME degree in electrical communica-tion engineering from the Indian Institute ofScience, Bangalore, India, in 1992 and thePhD degree from Jadavpur University in 2000.Before joining IIT, he held technical and man-agerial positions in a number of organizations inIndia and the United States. He has served on

the program committee and executive committee of a number ofinternational conferences including International conference on Informa-tion Systems Security, International Database Engineering and Applica-tions Symposium, IEEE Conference on Fuzzy Systems, Asian MobileComputing Conference, International Workshop on Distributed Comput-ing, and others. He is an associate professor at the School ofInformation Technology, IIT, Kharagpur, India. His research interestsinclude database security, data mining, and multimedia databasesystems. His research work has been funded by the Ministry ofCommunication and Information Technology, Department of Scienceand Technology, government of India, and the National SemiconductorsCorp. He has published more than 70 research papers in reputedinternational journals and conference proceedings. He is a seniormember of the IEEE and has served as the chairman of the IEEEKharagpur Section in 2006.

Arun K. Majumdar received the MTech andPhD degrees in applied physics from theUniversity of Calcutta in 1968 and 1973,respectively, and the PhD degree in electricalengineering from the University of Florida,Gainesville, in 1976. Before joining the IIT,Kharagpur, in 1980, he served as a facultymember at the Indian Statistical Institute, Cal-cutta, and Jawaharlal Nehru University, New

Delhi. He has held several administrative posi-tions at IIT Kharagpur that include dean (Faculty and Planning), head ofthe Computer Science and Engineering Department, head of theComputer and Informatics Centre, and head of the School of MedicalScience and Technology. He was a visiting p

Documents

University of Science Ho Chi Minh City-Hoang_Thai_Son - HMM