
Using CLV for modelling churn and customer retention




Copyright © 2012 Inderscience Enterprises Ltd.

Using CLV for modelling churn and customer retention

Najmeh Abedzadeh* and MohammadAli Nematbakhsh
Department of Computer Engineering, University of Isfahan, 81746-73441, Iran
E-mail: [email protected]
E-mail: [email protected]
*Corresponding author

Abstract: Preventing customer churn and trying to retain customers is the main objective of customer churn management. This paper proposes a model to measure churn probability and introduces a policy to retain customers. Using existing datasets of customers, we calculated CLV and used the C5.0 technique to predict churn probability for each customer. We also used process mining to find a policy to retain each customer separately. The model was simulated using a supermarket chain (Refah) and the results show the model performs much better than previously proposed models. Computational results are shown.

Keywords: customer lifetime value; customer churn; data mining; customer retention; process mining.

Reference to this paper should be made as follows: Abedzadeh, N. and Nematbakhsh, M. (2012) ‘Using CLV for modelling churn and customer retention’, Int. J. Electronic Marketing and Retailing, Vol. 5, No. 2, pp.128–146.

Biographical notes: Najmeh Abedzadeh is an Instructor in the Department of Computer Engineering, University of Isfahan.

MohammadAli Nematbakhsh is a Professor in the Department of Computer Engineering, University of Isfahan.

1 Introduction

Customer churn is the loss of existing customers to a competitor. Preventing customer churn and trying to retain customers is the main objective of customer churn management. Several present-day issues, such as competition among corporations and changes in customers' needs, can influence customers' behaviour in department stores. It is more profitable to keep and satisfy existing customers than to constantly attract new ones (Reinartz and Kumar, 2003). Thus, companies have started to focus on customers instead of corporate services. In fact, stores increase their profits by decreasing the rate of customer churn.


We seek better methods for retaining customers' loyalty. Predicting customers' behaviour is the main means of retaining them. Building lasting relationships with customers could be the best method of minimising churn probability.

To effectively manage customer churn within a company, it is crucial to build an effective and accurate customer-churn model. To accomplish this, there are numerous predictive modelling techniques. Some articles have predicted the probability of churn using traditional statistical models (Gerpott et al., 2001). Several researchers have examined customer churn and the factors that influence churn (Aurelie and Croux, 2006). Moreover, there are some data mining techniques that can effectively assist with the selection of customers most prone to churn (Hung et al., 2006). A decision tree could be the best method for modelling churn (Petersen et al., 2009; Bong-Horng et al., 2007). Some articles show that CLV is highly correlated with firm value, using a longitudinal analysis of a firm's data. Moreover, there are some process mining techniques for mining the main processes of a company. One article uses process mining to discover a concrete workflow schema. It models all possible execution patterns registered in a given log, which can be exploited subsequently to support future enactments (Greco et al., 2008).

This paper proposes a hybridised architecture to approach the customer retention problem. It uses data mining to predict the churn probability. Using existing datasets of customers, it calculates CLV and uses the C5.0 technique to predict churn probability for each customer. It also uses process mining to find a policy to retain each customer separately.

2 Modelling techniques for customer retention

2.1 System architecture

This system works in two modes: training and testing. The training mode uses data mining to predict the churn probability. Using existing datasets of customers, it calculates CLV. By applying an algorithm to the CLV values, customers are divided into distinct groups. Using customers' existing information and the C5.0 technique, it predicts the churn probability for each customer. It also uses process mining to find a policy to retain each customer separately.

The algorithm that we will show in the following section is used to divide customers into distinct groups. Then we use this information to construct a churn model. This model and process mining are used to find the best policy for retaining them. The testing mode tries to predict the churn probability of every new customer and tries to keep them.

The following sections explain each component of the system and show how it works, with a real-world sample dataset as an example.

2.2 Training mode

The churn model is constructed in this mode. Using existing datasets, it calculates CLV for one year. Historical information, particularly the CLV, is used to construct the churn model. Then this churn model is used for process mining to discover the best policy for retaining customers.


2.2.1 Calculate CLV

CLV is generally defined as the present value of all future profits obtained from a customer over his or her life of relationship with a firm (Reinartz and Kumar, 2003). CLV is similar to the discounted cash flow approach used in finance. However, there are two key differences (Gupta et al., 2006). First, CLV is typically defined and estimated at an individual customer or segment level. This allows us to differentiate between customers who are more profitable than others, rather than simply examining average profitability. Second, unlike finance, CLV explicitly incorporates the possibility that a customer may defect to competitors in the future. CLV for a customer (omitting customer subscript) is (Sunil et al., 2004; Reinartz and Kumar, 2003)

$$\mathrm{CLV} = \sum_{t=0}^{T} \frac{(P_t - C_t)\,R_t}{(1 + i)^t} - AC$$

where

Pt = price paid by a customer at time t

Ct = direct cost of servicing the customer at time t

i = discount rate or cost of capital for the firm

Rt = the probability of repeated buying or the probability of being ‘alive’ at time t

AC = acquisition cost and

T = time horizon for estimating CLV.

The CLV1 is calculated for the first two months:

The Pt parameter is the money paid by a customer at time t.

The Ct parameter is calculated as (cp + o + sc) / N

where

cp = total cost paid for carrying products

o = total cost paid for operators

sc = total cost paid for sidelong cases

N = total considered customers

The i parameter is equal to 5%. (There is no discount for the CLV1.)

The Rt parameter is equal to 90%.

The AC parameter for this time is calculated as AdC / N

where

AdC = advertisement cost

N = total considered customers

The T parameter is equal to 60.


The CLV2 is calculated for the second two months:

The AC parameter for this time is zero and other parameters are like CLV1.
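As an illustration, here is a minimal sketch of the CLV calculation in Python, following the formula and parameter settings above (i = 5%, Rt fixed at 90%, AC = AdC / N for CLV1 and zero for CLV2). The function name, the per-period data layout, and the example figures are hypothetical, not from the original study.

```python
def clv(revenues, costs, retention=0.90, discount=0.05, acquisition=0.0):
    """Sketch of CLV = sum_{t=0}^{T} (P_t - C_t) * R_t / (1 + i)^t - AC.

    revenues[t] and costs[t] stand for P_t and C_t; the time horizon T
    is implied by the length of the lists.
    """
    total = 0.0
    for t, (p, c) in enumerate(zip(revenues, costs)):
        # R_t is held constant at 90%, as in the setting above
        total += (p - c) * retention / (1 + discount) ** t
    return total - acquisition

# CLV1 for the first two months: acquisition cost is AdC / N.
# CLV2 for the second two months: acquisition cost is zero.
clv1 = clv(revenues=[120.0, 95.0], costs=[30.0, 25.0], acquisition=2.5)
clv2 = clv(revenues=[80.0, 70.0], costs=[20.0, 18.0])
```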

The dataset used in this paper is Refah department store data. We filtered the data and deleted the customers whose information was incomplete. Then the historical information of customers was used to calculate CLV for one year. The parameters Pt, Ct, i, Rt, AC and T are calculated for CLV. The algorithm used for separating the customers into four groups is given below.

The algorithm for finding the probability of churning

1 If (CLV1 × 1.1) ≤ CLV2 then churn = 1.
2 If (CLV1 × 0.9) ≤ CLV2 < (CLV1 × 1.1) then churn = 2.
3 If (CLV1 × 0.6) < CLV2 < (CLV1 × 0.9) then churn = 3.
4 If CLV2 ≤ (CLV1 × 0.6) then churn = 4.

Using the above algorithm, four groups of churning customers would be constructed, as described here:

1 loyal customers

2 almost loyal customers

3 almost churn customers

4 churn customers
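A direct transcription of the grouping rule above into Python (a minimal sketch; the function name is ours):

```python
def churn_group(clv1, clv2):
    """Assign a customer to one of the four churn groups by comparing
    CLV2 against thresholds derived from CLV1 (the algorithm above)."""
    if clv2 >= clv1 * 1.1:
        return 1  # loyal
    elif clv1 * 0.9 <= clv2 < clv1 * 1.1:
        return 2  # almost loyal
    elif clv1 * 0.6 < clv2 < clv1 * 0.9:
        return 3  # almost churn
    else:
        return 4  # churn (CLV2 <= CLV1 * 0.6)
```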

Customer loyalty describes the tendency of a customer to choose one business or product over another for a particular need.

The following section will describe the method that we have used to discover the common attributes of customer churning. We have used the historical information, the CLV of customers for one year and the group that each customer belongs to.

2.2.2 Constructing the churn model

The historical information of customers, CLV and churn number are used to construct the churn model decision tree. In the following sections, we will describe historical information and the decision tree to construct the churn model.

2.2.2.1 Historical information

The historical information on customers, CLV and churn numbers is used to construct the churn model decision tree. It has been shown repeatedly that past consumer behaviour is the best predictor of future behaviour; it is a much better predictor than demographics will ever be. A visitor or buyer who repeats his or her behaviour is likely to continue repeating it, meaning their future value to the business is high. Thus, when you look at a particular segment of customers, if repeating visitors or buyers are rising, then your future business with this segment will be stronger than it is today. If repeaters are falling, business from this customer segment will be weaker in the future.

When you look at the repeating behaviour of different groups of customers, you can make judgments about which customer groups will be most valuable in the future. You want to do everything you can to attract customers with high repeat behaviour and reduce or eliminate any spending or other efforts on attracting customers with low repeat behaviour.

There are about ten attributes in our dataset. We filtered the data and used just eight attributes. Our historical information is listed here:

ID: between 1–7000 customers

Sex: 1 for men, 2 for women

Zone: value 12 for Esfahan city, value 13 for other cities

Age: more than 17 years old.

CLV1: more than 24

CLV2: more than 20

ProdID: value 1 for grain, value 2 for clothing, value 3 for household services, value 4 for kitchen services, value 5 for bread, value 6 for meat types, value 7 for pastry types, value 8 for meal types.

Churn: value 1 for loyal customers, value 2 for almost loyal customers, value 3 for almost churn customers, value 4 for churn customers.

2.2.2.2 Decision tree

An important technique in machine learning, decision trees are used extensively in data mining. They are able to produce human-readable descriptions of trends in the underlying relationships of a dataset and can be used for classification and prediction tasks. The technique has been used successfully in many different areas, such as medical diagnosis, plant classification and customer marketing strategies.

2.2.2.3 Churn model

The churn model learned by C5.0 is represented by a decision tree, as illustrated in Figure 1. Each leaf is associated with a parameter of (1/2/3/4), which means that every customer with these attributes belongs to one group. Note that each path in the decision tree forms a rule. For example, the following path says that for customers with 49 ≤ CLV2 < 66, 69 < CLV1 ≤ 74, Zone = 12, ProdID ≤ 7 and Age ≤ 52, there is a likelihood that this customer belongs to group 3 and would be likely to churn.

49 ≤ CLV2 < 66 & 69 < CLV1 ≤ 74 & Zone = 12 & ProdID ≤ 7 & age ≤ 52:- (3)
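C5.0 itself is a commercial tool, so as a hedged stand-in, the same kind of rule-producing tree can be sketched with scikit-learn's CART implementation. The column names follow the attribute list in Section 2.2.2.1; the file name and split settings are illustrative assumptions, not the paper's actual pipeline.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Illustrative: load the eight attributes described in Section 2.2.2.1.
df = pd.read_csv("refah_customers.csv")  # hypothetical file name
features = ["Sex", "Zone", "Age", "CLV1", "CLV2", "ProdID"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["Churn"], test_size=0.3, random_state=0)

# The entropy criterion approximates C5.0's information-based splitting.
tree = DecisionTreeClassifier(criterion="entropy").fit(X_train, y_train)
print(export_text(tree, feature_names=features))  # human-readable rules
print("accuracy:", tree.score(X_test, y_test))
```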

2.2.3 Policy model constructor

This module constructs retention strategies by looking for proper retention incentives for potential churners. This involves two tasks. First, any implicit relationships among churners must be identified to create possible groups. Second, the attributes of those groups must be analysed to interpret their significance. Based on the characteristics of groups, specific policies are proposed for retaining customers.

To accomplish the first task, the learned churn model is consulted to single out those attributes which the model recognises as having strong relationships with churning. Those attributes are used to segment all churners into different groups and to label the most significant attributes in each group. Second, the attributes in each group, which conceptually represent a churner group, can be analysed to interpret their significance. Based on the interpretation, customers are divided into four groups and a set of policies based on process mining is proposed to retain the customers of each group.

2.2.4 Using process mining for customer retention

Process mining targets the automatic discovery of information from an event log. This discovered information can be used to deploy new systems that support the execution of business processes or as a feedback tool that helps in auditing, analysing and improving already enacted business processes. The main benefit of process mining techniques is that information is objectively compiled. In other words, process mining techniques are helpful because they gather information about what is actually happening according to an event log of an organisation and not according to what people think is happening in the organisation.

Figure 1 A snapshot of part of the churn model (see online version for colours)

Process mining is used to retain customers. There are many policies used to retain customers, but it has been hard to discover which policy is most efficient. Some policies have different influences on different people (Chip, 1996; Nigel et al., 2000). The results of some policies on customers can be assessed for efficiency, though not all of them. Process mining is a method used to assess the efficiency of different policies.

For using process mining, two states should be considered. First, the different policies would be applied to customers in different groups. There are many policies for retaining customers, but we do not know the most efficient process for customers. Second, the CLV of customers would be calculated again. Those customers whose CLV increases by more than 20% would be considered for process mining. These customers would be selected to mine their process of retention policies.
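A sketch of that selection step, assuming a mapping from customer ID to CLV before and after a retention policy was applied (the function and variable names are ours, not the paper's):

```python
def select_for_mining(clv_before, clv_after, threshold=0.20):
    """Return IDs of customers whose CLV grew by more than 20% after a
    retention policy; their event traces feed the process mining step."""
    return [cid for cid, before in clv_before.items()
            if before > 0 and (clv_after[cid] - before) / before > threshold]
```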

The genetic algorithm is used to discover this process, as shown in Figure 2. This process is performed by the operator to retain customers. It has an accuracy of 65% and proved to be the best policy for increasing CLV. This method is applied to each of the four groups and the best policy for each group is selected.

Figure 2 A mined process for customer retention


2.3 Testing mode

In the testing mode, a churner predictor prompts a dialogue to solicit customer data from the user. The attributes of the new customer are entered to ascertain which group the customer belongs to, as shown in Figure 3. The results show the probability of churn for this customer. Figure 4 illustrates that this customer has a 0.93 probability of almost churning and a 0.07 probability of being almost loyal.

Since the churn probability is above 90%, this customer is considered 'almost churn'. This result means that the appropriate policy must be implemented to retain the customer. The policy considered best for each group is used to retain customers in that group.

Figure 3 Enter customer data to the churner predictor (see online version for colours)

Figure 4 Churn probability as predicted by the churner predictor (see online version for colours)


3 Classification technique

C5.0 is a well-known decision tree algorithm developed by Quinlan (1993). It introduces a number of extensions to the classical ID3 algorithm (Quinlan, 1986). With respect to a set of training cases T, the C5.0 algorithm can be written recursively as follows:

• if T is a leaf, the algorithm is finished

• alternatively find a test X on some attribute such that the resulting partition T1, T2,…,Tm has the highest gain

• apply C5.0 to T1, T2,…,Tm.

This study uses C5.0 as a classification technique.

3.1 Gain ratio criterion vs. absolute gain

The classical ID3 algorithm uses a criterion called information gain (Kamber and Han, 2000). Suppose a test X is conducted on some attribute with n outcomes, which partitions the training set T into subsets of T1, T2, … and Tn. Let S be any set of cases and let freq (Cj, S) stand for the number of cases in S which belong to class Cj, j = 1, 2,…,k. Let | S | denote the number of cases in set S. To find the expected entropy of S, the surprise of each class is summed up and weighted by its occurrence frequency in S, giving equation (1).

$$\mathrm{entropy}(S) = -\sum_{j=1}^{k} \frac{freq(C_j, S)}{|S|} \times \log_2\!\left(\frac{freq(C_j, S)}{|S|}\right) \quad (1)$$

After T is partitioned to n outcomes according to the test X, the expected entropy can be calculated as the weighted sum over subsets, as defined by equation (2).

$$\mathrm{entropy}_X(T) = \sum_{i=1}^{n} \frac{|T_i|}{|T|} \times \mathrm{entropy}(T_i) \quad (2)$$

Now define the information gain for test X, denoted as gain(X), in equation (3) as a metric of gain by partitioning T according to test X. Finally select a test with the maximum information gain to be the root.

$$\mathrm{gain}(X) = \mathrm{entropy}(T) - \mathrm{entropy}_X(T) \quad (3)$$

This selection of a test is based on absolute information gain. Although it produces good results, the criterion has a serious deficiency: it is biased towards tests with more outcomes. In C5.0 this bias is rectified by a normalisation factor defined by equation (4). In the equation, split_entropy is calculated, which becomes larger if the outcome of a partition contains more subsets.

$$\mathrm{split\_entropy}(X) = -\sum_{i=1}^{n} \frac{|T_i|}{|T|} \times \log_2\!\left(\frac{|T_i|}{|T|}\right) \quad (4)$$

Now a gain_ratio can be defined as a new metric measuring the information gain with respect to the complexity of splits, equation (5).


$$\mathrm{gain\_ratio}(X) = \frac{\mathrm{gain}(X)}{\mathrm{split\_entropy}(X)} \quad (5)$$

Generally speaking, the use of this gain ratio criterion to select attributes for each test is robust and consistently gives a better choice of tests than the absolute gain criterion. Note that this gain ratio was originally proposed in C4.5/C5.0. For detailed discussions please refer to Kamber and Han (2000).
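The criterion is easy to state in code. Below is a minimal sketch of equations (1)–(5) in Python, where a test is represented simply as the partition of class labels it induces; the helper names are ours.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Equation (1): class entropy of a set of cases."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(labels, partition):
    """Equations (2)-(5): information gain of a test, normalised by the
    split entropy of the partition the test induces."""
    n = len(labels)
    weights = [len(subset) / n for subset in partition]
    entropy_x = sum(w * entropy(s) for w, s in zip(weights, partition))  # eq. (2)
    gain = entropy(labels) - entropy_x                                   # eq. (3)
    split = -sum(w * log2(w) for w in weights if w > 0)                  # eq. (4)
    return gain / split if split > 0 else 0.0                            # eq. (5)

# Example: six cases split into two subsets by a binary test.
labels = [1, 1, 2, 2, 3, 3]
print(gain_ratio(labels, [[1, 1, 2], [2, 3, 3]]))
```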

3.2 Continuous vs. categorical attributes

The ID3 algorithm can only deal with attributes which contain categorical values. In C5.0, continuous attributes are properly dealt with. Specifically, if all the training cases in T are sorted according to the values of a continuous attribute, represented here as A, then there will only be a finite number of values for A. These values are denoted in order as {v1, v2,…, vm}. Now equation (6) can be used to calculate m − 1 threshold values. Each threshold partitions the whole value range into two splits, which can then be evaluated for appropriateness.

$$\frac{v_i + v_{i+1}}{2} \quad (6)$$
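A sketch of the threshold enumeration in equation (6): the candidate cut points are the midpoints between consecutive distinct values of the sorted attribute (the function name is ours).

```python
def candidate_thresholds(values):
    """Midpoints (v_i + v_{i+1}) / 2 between consecutive distinct values,
    giving m - 1 candidate binary splits for a continuous attribute."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

print(candidate_thresholds([52, 49, 66, 69, 74]))  # -> [50.5, 59.0, 67.5, 71.5]
```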

4 Process mining techniques

Process mining techniques allow for extracting information from event logs. For example, the audit trails of a workflow management system or the transaction logs of an enterprise resource planning system can be used to discover models describing processes, organisations and products. The genetic algorithm is used to mine processes from log files (Jiafei et al., 2011). In the following section, we will explain this algorithm.

4.1 Genetic algorithm definition

The genetic algorithm plug-in assumes that the target model to be mined satisfies the following constraints:

• the model has a single start task.

• the model has a single end task.

• the start task has an XOR-join/split semantics. In Petri net terms, the start task has a single input place and a single output one.

• the end task has an XOR-join/split semantics. In Petri net terms, the end task has a single input place and a single output one.

To make sure your log always satisfies these constraints, just add an artificial start task and an artificial end task to every trace in your log. You can easily do that by respectively using the log filters 'Add Artificial Start Task Log Filter' and 'Add Artificial End Task Log Filter' at the 'Advanced' tab of the log window.


4.1.1 Genetic algorithm plug-in

The genetic algorithm plug-in uses genetic algorithms to mine process models from event logs. Its output is a set of process models ordered by decreasing fitness value. In other words, the best mined model is listed first, and so on. Genetic algorithms are adaptive search methods that try to mimic the process of evolution. These algorithms start with an initial population of individuals. Every individual is assigned a fitness measure to indicate its quality. In our case, an individual is a possible process model and the fitness is a function that evaluates how well the individual is able to reproduce the behaviour in the log. Populations evolve by selecting the fittest individuals and generating new individuals using genetic operators such as crossover (combining parts of two or more individuals) and mutation (random modification of an individual).

Figure 5 and Table 1 illustrate the genetic algorithm plug-in’s main steps:

Figure 5 The main steps of the genetic algorithm plug-in

Table 1 The steps description of Figure 5

Step Description

I Read event log
II Build the initial population
III Calculate fitness of the individuals in the population
IV Stop and return the fittest individuals?
V Create next population – use elitism and genetic operators
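The steps translate into a generic evolutionary loop. Below is a schematic sketch only: the fitness, crossover and mutation arguments stand in for the plug-in's causal-matrix operators described later, and all names, the stopping constant, and the parent-pool heuristic are our assumptions.

```python
import random

def genetic_miner(log, build_individual, fitness, crossover, mutate,
                  population_size=100, generations=1000, elitism_rate=0.02):
    """Steps I-V: build an initial population, score it against the log,
    and evolve it with elitism plus genetic operators until stopping."""
    population = [build_individual(log) for _ in range(population_size)]  # II
    for _ in range(generations):                                          # IV: iterate at most n times
        scored = sorted(population, key=lambda ind: fitness(ind, log),
                        reverse=True)                                     # III
        if fitness(scored[0], log) >= 1.0:                                # IV: maximal fitness reached
            break
        elite = scored[:max(1, int(elitism_rate * population_size))]      # V: elitism
        offspring = list(elite)
        while len(offspring) < population_size:                           # V: genetic operators
            p1, p2 = random.sample(scored[:population_size // 2], 2)
            c1, c2 = crossover(p1, p2)
            offspring += [mutate(c1), mutate(c2)]
        population = offspring[:population_size]
    return sorted(population, key=lambda ind: fitness(ind, log), reverse=True)
```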

Every individual in the population is represented as a causal matrix. Figure 7 illustrates the causal matrix for the process model in Figure 6. The causal matrix shows the dependencies between the tasks, as well as the semantics of these dependencies. In short, every task has an input (I) set and an output (O) set of dependencies. Tasks in the same subset of an input/output set have an XOR relation. Tasks in different subsets of a task's input/output set have an AND relation.
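A minimal way to hold a causal matrix in code, following the I/O-set semantics just described (the dictionary layout and task names are our own, illustrating an AND split/join like the one in Figures 6 and 7):

```python
# Each task maps to its input (I) and output (O) sets; every set is a
# list of subsets. Tasks inside one subset are XOR alternatives, while
# distinct subsets of the same task are in an AND relation.
causal_matrix = {
    "A": {"I": [],             "O": [["B"], ["C"]]},  # A AND-splits to B and C
    "B": {"I": [["A"]],        "O": [["D"]]},
    "C": {"I": [["A"]],        "O": [["D"]]},
    "D": {"I": [["B"], ["C"]], "O": []},              # D AND-joins B and C
}
```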

Figure 6 Model represented as a Petri net

The ‘genetic algorithm plug-in’ parameters are:

Population size: sets the number of individuals that are going to be used during the search.

Initial population type: sets how the initial population should be built.


Figure 7 Model represented as a causal matrix (CM)

The table below summarises the possible options:

• With heuristics: heuristics are used to set the dependencies between tasks.
• Without heuristics, possible duplicates: the dependencies between tasks are randomly set and duplicate individuals are allowed.
• Without heuristics, no duplicates: the dependencies between tasks are randomly set and duplicate individuals are forbidden.

Maximum number of generations: Sets the maximum number n of times that the genetic algorithm can iterate. This parameter relates to the stopping criteria. In short, the genetic algorithm plug-in stops when

1 it finds an individual whose fitness is maximal

2 it iterates n times

3 the fittest individual has not changed for n/2 iterations in a row.

Seed

Sets the seed that is used to generate random numbers for this plug-in.

Power value

Sets the power value that is used by the heuristics to build the initial population. In a nutshell, the heuristics works as follows: the more often a task t1 is directly followed by a task t2 (i.e., the sub trace ‘t1t2’ appears in traces in the log), the higher the probability that individuals are built with a dependency (or arc) from t1 to t2. The power value is used to control the ‘influence’ of the heuristics in the probability of setting a dependency between two tasks. Higher values for power value lead to the inferring of fewer dependencies between two tasks in the event log and vice-versa.

Elitism rate

Sets the percentage of the fittest individuals in the current generation that are going to be copied to the next generation. For instance, an elitism rate of 0.02 means that 2% of the best individuals in the population are copied to the next population.

Fitness type

Sets the type of fitness that the genetic algorithm plug-in uses to assess the quality of an individual. The quality of an individual is basically set by how well it replays the log traces. This replay can use stop-semantics parsing or continuous-semantics parsing. The stop-semantics parsing stops whenever the task to be parsed is not enabled. The continuous-semantics parsing works in a different way: whenever a task is not enabled, the problem is registered (for instance, the number of missing tokens) and the task is fired anyway. The fitness types Proper Completion and Stop Semantics use stop-semantics parsing. The other fitness types use continuous-semantics parsing. In short, the fitness types work as follows:

Let L be an event log, CM a causal matrix, CM[] a multiset of causal matrices (i.e., a population), and κ a real number greater than 0.

1 ProperCompletion(L, CM) = allProperlyCompletedLogTraces(CM, L) / numTracesLog(L)

2 StopSemantics(L, CM) = 0.20 × allParsedActivities(CM, L) / numActivitiesLog(L) + 0.30 × allCompletedLogTraces(CM, L) / numTracesLog(L) + 0.50 × allProperlyCompletedLogTraces(CM, L) / numTracesLog(L)

3 ContinuousSemantics(L, CM) = 0.40 × allParsedActivities(CM, L) / numActivitiesLog(L) + 0.60 × allProperlyCompletedLogTraces(CM, L) / numTracesLog(L)

4 ImprovedContinuousSemantics(L, CM) = allParsedActivities(CM, L) / numActivitiesLog(L) − Punishment, where

Punishment = allMissingTokens(L, CM) / (numTracesLog(L) − numTracesMissingTokens(L, CM) + 1) + allExtraTokensLeftBehind(L, CM) / (numTracesLog(L) − numTracesExtraTokensLeftBehind(L, CM) + 1)

5 ExtraBehaviourPunishment(L, CM, CM[]) = ImprovedContinuousSemantics(L, CM) − κ × allEnabledActivities(L, CM) / max(allEnabledActivities(L, CM[]), numTracesLog(L))

Show advanced fitness parameters

Allows for the setting of the parameters for the selected fitness type. For the GA, the fitness type 'Extra Behaviour Punishment' is the only one to have a parameter: the punishment weight (the κ in fitness type 5 above).


Use genetic operators

This function is used to set whether the genetic operators’ crossover and mutation are going to be used to build the populations that follow the initial population. If this option is unchecked, the next populations are built just like the initial population. If this option is checked, then the individuals of the next population that do not belong to the elite are created by applying crossover and mutation to individuals in the current population. The cycle works as follows: first, two parents are selected; second, they undergo crossover (with a certain probability) and produce two offspring; third, every offspring may be mutated with a certain probability; finally, the offspring are inserted into the new population.

Selection method type

Sets how the parents for genetic operators are going to be chosen. Both methods are based on a tournament. The method type Tournament works by randomly selecting two individuals in the population and returning the fittest individual 75% of the times and the less fit individual 25% of the times. The method type Tournament5 randomly selects five individuals in the population and always returns the fittest individual.

Crossover type

Sets how two parents (selected individuals) are going to be recombined. All types work at the task level. However, their granularity varies. A task has an input and output set. Thus, some crossover types work on the full input/output sets, others on subsets that are in the input/output sets of a task. In short, the crossover types work as follows:

• Local one point: Always swaps both the input and output sets of a task

• Variable local one point: Swaps both input/output sets 50% of the times. The other times only one of these sets is swapped.

• Fine granularity: Swaps subsets in the input/output set of a task. If the subsets being swapped have elements intersecting with subsets (in the other individual) that were not swapped, this crossover type randomly chooses whether

1 these subsets (the swapped and the others) are to be merged, or

2 the common elements are to be removed from the subsets that were not swapped during the crossover.

• Enhanced: Swaps subsets in the input/output set of a task. With equal probability, the swapped subsets can be:

1 included into the input/output set

2 merged with some subsets which are not swapped in the input/output sets

3 the swapped sets are added and intersecting elements that are in the swapped ones are removed from some subsets which are not swapped.


NOTE: We strongly recommend the use of the ‘Enhanced’ crossover type because it incorporates the concepts of the other crossover types.

Crossover rate

Sets the probability that two parents are going to be recombined to produce offspring for the next generation. If the probability is equal to 0, the offspring are just like the parents.

Mutation type

Sets how an individual is going to be randomly modified. The mutation point is a task.

• All elements: adds or removes one element of a subset in the input/output set of the mutated task

• Partition redefinition: re-arranges the subsets of the input/output set of the mutated task

• Enhanced: performs one of the following operations with equal probability:

1 add a task to a subset in the input/output set of the mutated task

2 remove a task from a subset in the input/output set of the mutated task, or

3 re-arrange the subsets of the input/output set of the mutated task.

NOTE: We strongly recommend the use of the ‘Enhanced’ mutation type because it incorporates the concepts of the other mutation types.

Mutation rate

Sets the probability that an individual is going to be mutated.

5 Results

To evaluate the performance of the proposed method, we apply it to a real-world database. The Refah department store provided the database for this study. The data set, as extracted from the store’s data warehouse, included records of more than 15,000 customers described by 8 variables. The customers whose attributes were not complete were removed.

This dataset is divided into two sections: training and testing. The training dataset was first fed to C5.0 to construct the churn model. The testing dataset was then used to assess the accuracy of the built model. In the training dataset, CLV is calculated for one year. The CLV of the first six months is calculated to discover which group each customer belongs to. The CLV1 is calculated for the first two months and the CLV2 for the second two months. Then C5.0 is used to generate rules to identify the common attributes, as shown in Figure 8. The tertiary CLV is calculated for process mining to discover the best method for retaining customers.

Process mining is used to mine the best process for retaining customers. The best policies identified for each group are described in the discussion below.


Figure 8 A snapshot for generating rules (see online version for colours)

6 Discussion

This paper has described a hybrid architecture to deal with modelling churn and retaining churning customers. Two modes were built into this architecture, namely training and testing. The training mode used data mining to predict the probability of churning. Historical information of customers was used to calculate CLV and divide customers into distinct groups to construct the churn model. This model was then used to find the best policy for retaining them. The testing mode tried to predict the churn probability of every new customer and tried to retain the customers.

As a result of our work, four groups are known. C5.0 was used to find out which group every new customer belonged to. Now we use the method shown in the results section to retain this customer. As shown, just sending a postal card is enough for the first group. They would stay with the store and remain loyal. The second group is almost loyal and may leave the store. A discount after sending a postal card is necessary for them.

The process for the third group shows that sending postal cards or discounts would not be efficient for these customers. The best way to retain this group is first to guarantee that the company's products are the best and, if any product turns out to be damaged, to give their money back to them. The store can give them a note confirming that it will at least replace products of bad quality. Then the store can give a gift to satisfy them. By giving them confidence and discounts, they would come back to the store. This is a kind of persuasion done to retain this group.

The fourth group is really important. The above methods would not work for this group. When customers leave the store, the store must discover the reason behind their churning and meet their needs by calling them. Giving guarantees and gifts are the first methods of retaining them, but this process must continue by calling them and discovering their needs. This is the time for meeting their requirements as far as possible.


As shown in this article, process mining is used for retaining customers for the first time. Previously, we could not find the main process that had been used in the stores. Many methods for retaining customers were tried, but the best method was unknown.

Comparing the results of this article with one of the methods that had been used for clustering customer churn (Bong-Horng et al., 2007), we find that our method has better results. As seen in Figure 9, our method achieves about a 76% improvement in retaining churning customers of the Refah store, while the clustering result achieves about a 68% improvement.

Figure 9 The chart of comparing process mining with clustering (see online version for colours)

7 Conclusions

This paper has described a hybridised architecture to deal with customer retention. Two modes were built into this architecture, namely training and testing. Customer retention is accomplished not only by predicting churn probability, but also by proposing retention policies. The training mode used data mining techniques to predict the probability of churning. Historical information of customers was used to calculate CLV and divide customers into distinct groups to construct a churn model. This model was then used to find the best policy for retaining them. The testing mode tried to predict the churn probability of every new customer. It then used process mining to keep them.

Directions for future research follow from the fact that many attributes are important for churning. Considering more attributes, such as the behaviour and thoughts of customers, could prove efficient in retaining customers. Moreover, dividing customers into more distinct groups could be beneficial.

Acknowledgements

N. Abedzadeh, M.A. Nematbakhsh and N. Nematbakhsh worked on customer retention. N. Abedzadeh used data mining for customer churn and process mining for retaining customers. The Refah department store helped them to test their method by providing the dataset.


References

Aurelie, L. and Croux, C. (2006) 'Bagging and boosting classification trees to predict churn', J. Market. Res., Vol. 43, pp.276–286.

Bong-Horng, C., Ming-Shian, T. and Cheng-Seen, H. (2007) 'Toward a hybrid data mining model for customer retention', Knowledge-Based Syst., Vol. 20, pp.703–718.

Chip, R.B. (1996) Customers as Partners, Berrett-Koehler, UK.

Gerpott, T.J., Rams, W. and Schindler, A. (2001) 'Customer retention, loyalty and satisfaction in the German mobile cellular telecommunications market', Telecommun. Policy, Vol. 25, pp.249–269.

Greco, G., Guzzo, A. and Pontieri, L. (2008) 'Mining taxonomies of process models', Data Knowledge Eng., Vol. 67, pp.74–102.

Gupta, S., Hanssens, D., Hardie, B., Kahn, W. and Kumar, V. (2006) 'Modeling customer lifetime value', J. Service Res., Vol. 9, pp.139–155.

Hung, S.Y., Yen, D.C. and Wang, H.Y. (2006) 'Applying data mining to telecom churn management', Expert Syst. Appl., Vol. 31, pp.515–524.

Jiafei, L., Jihong, O. and Mingyong, F. (2011) 'A heuristic genetic process mining algorithm', CIS 2011, pp.15–19.

Kamber, M. and Han, J. (2000) Data Mining: Concepts and Techniques, 1st ed., Morgan Kaufmann, San Mateo, CA, ISBN-10: 1558604898.

Nigel, H., Self, B. and Roche, G. (2000) Customer Satisfaction Measurement for ISO 9000:2000, Butterworth-Heinemann, UK, ISBN: 0750655135, p.176.

Petersen, J.A., McAlister, L., Reibstein, D.J., Winer, R.S., Kumar, V. and Atkinson, G. (2009) 'Choosing the right metrics to maximize profitability and shareholder value', J. Retailing, Vol. 85, pp.95–111.

Quinlan, J.R. (1986) 'Induction of decision trees', Mach. Learn., Vol. 1, pp.81–106.

Quinlan, J.R. (1993) C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.

Reinartz, W.J. and Kumar, V. (2003) 'The impact of customer relationship characteristics on profitable lifetime duration', J. Market., Vol. 67, pp.77–99.

Sunil, G., Lehmann, D.R. and Stuart, J.A. (2004) 'Valuing customers', Journal of Marketing Research, February, HBS Marketing Research Paper Nos. 03–08, pp.7–18.