
HYBRID MODELS USING UNSUPERVISED CLUSTERING FOR PREDICTION OF CUSTOMER CHURN

Indranil Bose¹ and Xi Chen²

¹The University of Hong Kong
²Zhejiang University

Churn management is one of the key issues handled by mobile telecommunication operators. Data mining techniques can help in the prediction of churn behavior of customers. Various supervised learning techniques have been used to study customer churn. However, research on the use of unsupervised learning techniques for prediction of churn is limited. In this article, we use two-stage hybrid models consisting of unsupervised clustering techniques and decision trees with boosting on two different data sets and evaluate the models in terms of top decile lift. We examine two different approaches for hybridization of the models for utilizing the results of clustering based on various attributes related to services usage and revenue contribution of customers. The results indicate that the use of clustering led to improved top decile lift for the hybrid models compared with the benchmark case when no clustering is used. It is also shown that using cluster labels as inputs to the decision trees is a preferred method of hybridization. Of the five unsupervised clustering techniques used, none is found to dominate the others. However, interesting attributes and rules that can help marketing experts identify churners from the data are obtained from the best hybrid models.

Keywords: churn; clustering; data mining; decision trees; lift; prediction; rules

I. INTRODUCTION

A recent report from the International Data Corporation (IDC) group estimated that the number of mobile subscribers in the Asia-Pacific will reach 1.05 billion by 2010 [1]. The increasing use of mobile services is observed not only in Asia-Pacific but also in other parts of the world. According to the research firm Analysys, the penetration rate of mobile telephony for Western Europe is going to reach 100% by the end of 2007 [2], and the United States will have 100% penetration by 2013 [3]. However, a hot market also implies fierce competition. The key to survival in this competitive market lies in knowing customers better and retaining every valuable customer. Due to the deregulation of the mobile telecommunication industry, today customers can switch easily from one mobile operator to another at will. This phenomenon is referred to as customer churn and is defined as the gross rate of customer loss during a given period [4]. It is a challenging problem faced by mobile service providers around the globe.

Journal of Organizational Computing and Electronic Commerce, 19: 133–151, 2009
Copyright © Taylor & Francis Group, LLC
ISSN: 1091-9392 print / 1532-7744 online
DOI: 10.1080/10919390902821291

Address correspondence to Xi Chen, School of Management, Zi Jin Gang Campus, Zhejiang University, Hangzhou, China. E-mail: [email protected]


Preventing customer churn is critical for the survival of mobile service providers because it is estimated that the cost of acquiring a new customer is about $300 or more if the advertising, marketing, and technical support are all taken into consideration. On the other hand, the cost of retaining a current customer is usually as low as the cost of a single customer retention call or a single mail solicitation [5]. The high acquisition cost makes it imperative for mobile services providers to devise ways to predict churn behavior and to execute appropriate proactive actions before customers leave the company. There are various reasons for customer churn. For example, Keaveney has stated that voluntary churn is due to the loss of customers' loyalty to the company [6]. Bolton has reported in her research that voluntary churn of customers can be explained by the level of customers' satisfaction [7].

Data mining techniques have been used in the area of customer analytics because of their ability to extract hidden behavioral patterns of customers from large databases [8]. Mobile telecommunication companies have used data mining techniques to identify customers that are likely to churn. In fact, according to a report from SPSS [9], data mining techniques such as decision trees and neural networks have been commonly used to predict customer churn in practice because the companies can benefit from them. Burez and Van den Poel reported that the rate of customer attrition for pay-TV services was significantly reduced when they used data mining with random forest (a type of decision tree model) and a hybrid model consisting of logistic regression and Markov chain [10]. Since the main purpose of applying data mining techniques is prediction, supervised learning techniques are popularly used. The use of unsupervised learning techniques for churn prediction can be advantageous, but such use is rather limited. By using unsupervised learning techniques, customers who have similar behavioral patterns (that can be described by multiple attributes) can be grouped together. Since no predefined classes of customers are required, the patterns obtained using clustering techniques represent natural structures that exist in the data. Instead of being a simple combination of several attributes with ad hoc values, a pattern represents the relationships between attributes that make the clusters intra-similar as well as inter-distinctive. By combining unsupervised learning techniques with supervised learning techniques, more information from the results of clustering techniques can be brought into the process of classification. In this article, we investigate the issue of how to combine unsupervised learning techniques with supervised learning techniques in the form of hybrid models for the prediction of customer churn. Our goal is to seek answers to the following research questions. First, can clustering algorithms detect patterns that help decision trees in identifying churners better? Second, which clustering algorithm(s) are more useful in prediction of churn? Third, what is the best way to combine the results obtained from clustering algorithms with those of decision trees? Fourth, what type of behavioral patterns of customers obtained from clustering of customer data is useful for detection of churners?

This article starts with a review of the related literature and moves on to a description of the data used in this research in the third section. The fourth section of the article describes the experimental procedure. The fifth section presents the results, which are then discussed along with possible directions for future research. This is followed by a section that includes the concluding remarks.

II. LITERATURE REVIEW

Han and Kamber described data mining as a process for discovering interesting knowledge from large amounts of data [11]. There are mainly two types of data mining techniques that are used in practice: supervised learning and unsupervised learning. Supervised learning requires that the data set contain target variables that represent the classes of data items or the behaviors that are going to be predicted. For example, the target variable could be good customers or bad customers, and churning customers or non-churning customers. In supervised learning, the models are trained to identify patterns that can be used to classify the customers into the right classes or predict the customers' actual behavior. The most important decision in customer churn management is the separation of churners from non-churners. This is a task that is quite capably handled by supervised learning techniques. As a result, supervised learning techniques that are popularly adopted by customer churn researchers include logit, neural networks, decision trees, and genetic algorithms.

A. Supervised Learning

Decision tree models are very popular for prediction of churn. Wei and Chiu used different subsets of the whole data set to generate different decision tree models, combined the results of those single decision tree models using a weighted voting approach, and generated a final classification decision for churn [12]. They included customer characteristics as well as their contract information in their churn model. Hung et al. clustered customers according to their tenure-related data and built decision trees for each cluster to predict customer churn [13]. Chu et al. used the C5.0 decision tree to separate churners from non-churners and to identify key attributes for the prediction of churners [14]. In the second phase of their research, they clustered the detected churners according to the identified key attributes so that retention policies could be designed for each cluster. Decision tree models have also been used to construct hybrid models in combination with other supervised learning techniques. Qi et al. combined decision trees and logistic regression models [15]. They determined different subsets of attributes from customer data based on correlation analysis and then built decision trees using each subset of attributes. Then a logistic regression model was used to predict churn based on the likelihood of churn predicted by the decision trees.

Other techniques have also been used for prediction of churn. These included the use of neural networks by Mozer et al. [16] and Song et al. [17], the use of support vector machines by Coussement and Van den Poel [18], and the use of evolutionary algorithms by Au et al. [19] and Bhattacharyya [20]. However, the implementation of these techniques is complicated. For neural networks, users need to decide the topology of the network and parameters such as the learning rate. For support vector machines, users need to determine values of penalty parameters for error terms and kernel parameters so that the performance is optimized. In the case of evolutionary algorithms, users need to determine the size of the population, crossover ratio, and mutation ratio, among others. Further, the research conducted by Coussement and Van den Poel showed that decision tree models performed better than support vector machines even when the parameter selection procedure was used [18]. Au et al. showed that the proposed evolutionary algorithm and neural networks based models performed better than the decision tree model (C4.5) in terms of lift [19]. Mozer et al. also showed that the neural networks models performed better than the decision tree models in terms of the ROC curve [16]. However, the execution time of the proposed evolutionary algorithms and neural networks was 20 and 50 times that of decision tree models, respectively [19]. Further, data preprocessing and attribute selection needed extensive effort for neural networks and evolutionary algorithms. For example, Au et al. even consulted experts for attribute selection [19]. In contrast, decision tree models always exhibited the inherent ability of feature selection and required the least effort in data processing. The performance of decision tree models could be improved significantly with the use of boosting algorithms. In fact, the model that performed best in the Duke/NCR Teradata churn-modeling tournament was a sophisticated example of a decision tree model with boosting [21]. This goes to show that decision tree models are preferred by researchers for prediction of customer churn.

In addition to classification of churners and non-churners, other related issues have been studied by researchers. Wei and Chiu studied the impact of three factors on predictive performance [12]. The first is the ratio of churners to non-churners in the training data. The second is the length of the period over which a customer's call details are agglomerated (e.g., every week, every two weeks, or every three weeks). The third is the gap in time (retention period) between the customers' data collection and the occurrence of churn. Their results showed that the performance was best when the ratio between churners and non-churners was 1:2. Also, the period of observation did not have a significant impact on the prediction performance, and a shorter retention period was always found to yield better results.

B. Unsupervised Learning

Unsupervised learning techniques do not require the data set to contain the target variable. Clustering is a type of unsupervised learning technique that can be used to explore data sets to discover the natural structure and unknown but valuable behavioral patterns of customers hidden in it [22]. Various approaches have been used for clustering. Jain et al. presented an overview of unsupervised clustering methods, including K-means (KM), K-medoid (KMD), self-organizing map (SOM), fuzzy c-means (FCM), and hierarchical clustering [23]. Clustering techniques group data items based on their similarities. Euclidean distance is a common choice for measuring the similarities. A data item is assigned to a cluster whose center is the most similar to the data item. Clustering techniques have been applied in various business applications. Aronson and Klein developed a mathematical programming based clustering model for processes [24]. Lenard et al. used fuzzy clustering techniques to identify groups of firms that might not be able to survive [25]. Helsen and Green used KM to identify market segments for a computer system based on the survey results obtained from customers [26]. Kuo et al. and Kim used SOM to find customer segments so that different types of services or products could be designed and provided to different customers [27], [28]. Premkumar et al. used a combination of hierarchical clustering and KM to identify different information processing needs and capabilities of organizations [29]. KM and KMD, SOM, FCM, and hierarchical clustering represent four different types of clustering techniques. KM and KMD are partitional clustering algorithms that are similar to each other. KM uses the mean of the data items in a cluster as the center of that cluster whereas KMD uses a data item that is at the center of a cluster as the cluster center. It is reported that KMD is less sensitive to the presence of outliers in data sets [30]. SOM is a neural networks-based clustering technique that clusters data into a two-dimensional map so that the distribution of clusters can be visualized [23], [31]. The visualization of clusters produced by SOM can help people estimate the number of clusters [27]. FCM is a type of fuzzy clustering algorithm that assigns data items to clusters using membership functions. The membership function indicates the likelihood that a data item should be assigned to a cluster, and under FCM a data item is assigned to the cluster for which it has the largest value of the membership function. The clustering processes of KM, KMD, FCM, and SOM begin with the initialization of cluster centers in a random manner, and then the centers of the clusters are updated repeatedly until the stopping criterion is satisfied. KM and KMD determine the seeds of clusters following MacQueen's method [32]. SOM finds seeds by applying Kohonen learning algorithms [33]. The procedure for updating cluster seeds in the case of FCM involves the expectation-maximization algorithm, which ensures convergence. This makes FCM more stable than other clustering techniques like KM [30]. Hierarchical clustering follows a bottom-up approach and forms clusters starting from a single data item. It is reported that SOM and KM are suitable for data containing hyperspherical clusters whereas hierarchical clustering is suitable for well-separated, chain-like, and concentric clusters [23]. The differences between the various clustering techniques are summarized in Table 1.

Table 1 Comparison between clustering techniques.

Technique   Type              Data assignment   Preferred data type   Cluster center
KM          Partitional       Hard              Hyperspherical        Mean
KMD         Partitional       Hard              Hyperspherical        Data item
FCM         Partitional       Fuzzy             Hyperspherical        Mean
BIRCH       Hierarchical      Hard              Chain-like            Mean
SOM         Neural networks   Hard              Hyperspherical        Mean
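
To make the membership-function update concrete, the following is a minimal sketch of standard fuzzy c-means (fuzzifier m = 2, random initial seeds). It illustrates the generic FCM iteration described above, not the exact implementation used in the experiments.

```python
# A minimal sketch of standard fuzzy c-means (not the authors' implementation).
import numpy as np

def fcm(X, c=3, m=2.0, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=c, replace=False)]  # random initial seeds
    for _ in range(n_iter):
        # Distance of every data item to every cluster center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # Membership of item j in cluster i is proportional to d_ij^(-2/(m-1)).
        u = d ** (-2.0 / (m - 1.0))
        u /= u.sum(axis=1, keepdims=True)                   # memberships sum to 1
        # Update centers as membership-weighted means of the data items.
        um = u ** m
        centers = (um.T @ X) / um.sum(axis=0)[:, None]
    # Hard assignment: the cluster with the largest membership value.
    return u.argmax(axis=1), u
```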

Despite the popularity of unsupervised learning techniques, there is little literature devoted to the utilization of the natural patterns detected by clustering algorithms in the building of churn classification models. Chu et al. applied the hierarchical SOM to cluster churners. However, clustering was performed after prediction was made by the decision tree model, and the results of SOM did not improve the performance of the decision tree models in any way [14]. In a different area of application, Thomassey and Fiordaliso used cluster labels obtained by KM as target variables for decision trees for sales forecasting [34]. In their research, the decision tree model is used to find rules that could explain the formation of the clusters. Mingoti and Lima compared the performance of SOM, FCM, KM, and hierarchical clustering algorithms in terms of their classification accuracy and inter-cluster dispersion using simulated data [35]. They showed that FCM was more robust than KM and SOM when dealing with data sets containing noise and also exhibited reasonably high classification accuracy. The research conducted by Hung et al. is most closely related to this article [13]. They clustered customers according to a single variable (i.e., tenure) and built decision trees for each cluster. They used the decision trees on the same testing data to find which cluster could generate decision trees with better prediction accuracy. In this article, we use multiple variables for clustering and examine different approaches of hybridization for utilizing the results of clustering in order to build better supervised learning models (using decision trees) for prediction of customer churn.

III. DATA DESCRIPTION

The three customer churn data sets used in this research were obtained from the Teradata Center at Duke University, USA [36]. The data sets were collected during July, September, October, November, and December of 2001 in the United States. The first data set contains 100,000 records of customers. The ratio of churning to non-churning customers is about 50%. The second data set contains 50,000 records of customers and the third data set contains 100,000 records of customers. The churn ratio of customers in the second and third data sets is about 1.8%. All three data sets are described using the same set of attributes and they contain the same binary target variable, churn. The number of predictor attributes in these data sets is 171. There are three types of attributes: behavioral attributes such as minutes of use, revenue, and type of handset; company interaction attributes such as customer calls to the customer service center; and customer household attributes. Customers' churn behavior was observed 31–60 days after the customers were sampled. The one-month treatment lag between sampling of customers and observed churn is maintained because a few weeks are needed to score the customers and to implement any proactive action. Of the three data sets, the customers' data in the first and second data sets were collected at the same point in time. The customers in the third data set were selected at a future point in time. In the numerical experiments reported in this article, we use the first data set as the training data and refer to it as the calibration data. The second and the third data sets are used as testing data and are subsequently referred to as current data and future data, respectively.

IV. EXPERIMENTS

A. Decision Trees

Clustering is used as the first stage in the hybrid method and the second stage is conducted using decision trees. Although several supervised learning techniques could be chosen for the second stage, the C5.0 decision tree model with boosting is adopted in this research. There are a number of reasons for that. In general, decision trees are found to be efficient and fast in prediction of churn, and compared with other supervised learning techniques, they can automatically decide the importance of attributes [37]. In addition, decision trees can tolerate the presence of outliers and missing data, and so minimum effort is required for data preprocessing. C5.0 is an upgraded version of C4.5 developed by Quinlan [38]. Compared with C4.5, C5.0 is faster, more accurate, and less memory intensive [39]. To enhance the performance of C5.0, it is extended with the boosting algorithm. The boosting approach combines different classifiers by assigning weights to them. The weights are then iteratively adjusted over several trials according to the performance of the classifiers. Although each single classifier may not have good performance, the combination of them using the boosting approach can improve the overall performance of classification models significantly. At the same time, the boosting approach can avoid the problem of overfitting so that classification models can have good performance not only for the training data but also for unknown testing data [30]. In our experiments, we conducted 10 trials of the boosting algorithm to obtain the final model; this meant that 10 different C5.0 trees were trained using 10 different subsets of the training data, and the boosting algorithm found the optimal weighting scheme to combine the results of those 10 models so that classification accuracy is maximized.
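
Since C5.0 is proprietary, the sketch below approximates the described setup with scikit-learn's AdaBoost over decision trees. The 10 boosting trials follow the text; the depth limit and the names X_train, y_train, X_test are illustrative assumptions.

```python
# Approximation of the C5.0-with-boosting setup using scikit-learn's AdaBoost
# (an assumption; the paper's proprietary C5.0 booster is not reproduced here).
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

def make_boosted_tree(n_trials=10):
    # Ten boosting trials, mirroring the 10 trials reported in the text.
    base = DecisionTreeClassifier(max_depth=8)  # depth is an illustrative choice
    return AdaBoostClassifier(base, n_estimators=n_trials)

# Usage, assuming X_train/y_train hold the calibration data:
# model = make_boosted_tree().fit(X_train, y_train)
# churn_scores = model.predict_proba(X_test)[:, 1]  # likelihood scores for churn
```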

B. Clustering Techniques

Five different clustering algorithms are examined in this research as the first stage of the hybrid method. They are KM, KMD, SOM, FCM, and BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) [40]. BIRCH belongs to the family of hierarchical clustering algorithms and has been found to be efficient for large data sets. It can automatically identify the optimal number of clusters during clustering. For KM, KMD, and FCM, Dunn's index is used for identification of the optimal number of clusters [41]. For SOM, data is clustered into a two-dimensional map. KM and KMD, SOM, FCM, and BIRCH are representatives of the four types of clustering: partitional, neural networks, fuzzy, and hierarchical, respectively, and the use of these five methods helps us compare the different types of clustering on the same data set.
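
Dunn's index is the smallest between-cluster distance divided by the largest within-cluster diameter; higher values indicate compact, well-separated clusters. A rough sketch of scanning candidate cluster counts, assuming Euclidean distance and scikit-learn's K-means:

```python
# A sketch of selecting the number of clusters with Dunn's index
# (Euclidean distance assumed; degenerate single-point partitions not handled).
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def dunn_index(X, labels):
    clusters = [X[labels == c] for c in np.unique(labels)]
    max_diameter = max(cdist(c, c).max() for c in clusters)  # largest intra-cluster spread
    min_separation = min(cdist(a, b).min()
                         for i, a in enumerate(clusters)
                         for b in clusters[i + 1:])          # closest pair of clusters
    return min_separation / max_diameter

def best_k_by_dunn(X, k_range=range(2, 8)):
    scores = {k: dunn_index(X, KMeans(n_clusters=k, n_init=10).fit_predict(X))
              for k in k_range}
    return max(scores, key=scores.get)  # higher Dunn's index is better
```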

C. Selection of Attributes for Clustering

For mobile telecommunication services, the most important information is minutes of use of mobile services and revenue contribution for those services. In the case under consideration, the telecommunication services operator provided different plans to customers based on flat rate pricing. Each plan required the customers to pay a distinct minimum usage charge per month. The customers are allowed to use mobile services for a fixed number of minutes specified by the rate plan. When the limit of usage is exceeded, the customers are charged on a pay-per-use basis over and on top of the minimum monthly charge. The minutes of use of mobile services is decided by the customers themselves whereas the revenue contribution is influenced by the pricing plan adopted by the mobile services providers. There are various types of services information related to data, voice, and other miscellaneous services. We found that attributes for data services and other miscellaneous services contained a large number of zeros, and this indicated that very few customers actually used them. In contrast, attributes for voice call services did not suffer from that problem. In this research we made use of voice call attributes only because voice call is the most important service provided by the services provider. Clustering is performed on two types of attributes related to voice calls: services usage and revenue contribution. Both services usage and revenue contribution are characterized by multiple attributes (seven attributes for each group) and clustering is used to segment customers using multivariate data. The attributes used for clustering are listed in Table 2.

Table 2 Attributes used for clustering.

Revenue contribution:
  Rev_Mean: Mean monthly revenue (charge amount)
  Change_Rev: Percentage change in monthly revenue versus previous three months average
  Totrev: Total revenue
  Adjrev: Billing adjusted total revenue over the life of the customer
  Avgrev: Average monthly revenue over the life of the customer
  Avg3rev: Average monthly revenue over the previous three months
  Avg6rev: Average monthly revenue over the previous six months

Services usage:
  Change_Mou: Percentage change in monthly minutes of use versus previous three months average
  Plcd_Vce_Mean: Mean number of attempted voice calls placed
  Recv_Vce_Mean: Mean number of received voice calls
  F_Vce_Ratio: Ratio of failed voice calls to placed voice calls
  Comp_Vce_Ratio: Ratio of completed voice calls to placed voice calls
  Peak_Ratio: Ratio of peak time voice calls to placed voice calls
  MPC: Minutes per call


D. Choice of Performance Metric

Due to the highly skewed distribution of the target variable (churn) in the current and future data sets, the traditional method of assessing classification accuracy of models cannot be used in this research. In fact, we could achieve accuracy as high as 98.2% by classifying all customers as non-churners. However, this result would not be meaningful. We needed models that could identify customers who were most likely to churn so that appropriate actions could be taken to retain them. We used top decile lift as the metric of choice to compare the performance of the different hybrid models because it is popularly used in the literature [18], [19], [21] to compare different models used for prediction of churn in terms of their ability to capture customers with a high risk of churn. The top decile lift is equal to the ratio of churners among the top 10% of customers in terms of the likelihood score divided by the ratio of churners in the whole population. For example, suppose there are 10,000 customers and the churn ratio is 1.8%. If the top decile lift is two, the model can capture 36 churners by selecting the 1,000 customers whose likelihood scores for churn are among the top 10% of the population. In contrast, by random targeting, the expected number of churners that can be captured among 10% of all customers is only 18. Therefore, the higher the top decile lift, the better the model.
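
The definition above translates directly into code. A minimal sketch, assuming y_true holds 0/1 churn outcomes and scores holds a model's churn likelihood scores:

```python
# Top decile lift: churn rate in the top-scoring 10% of customers divided by
# the churn rate in the whole population (higher is better).
import numpy as np

def top_decile_lift(y_true, scores):
    y_true = np.asarray(y_true, dtype=float)
    n_top = max(1, len(y_true) // 10)
    top = np.argsort(scores)[::-1][:n_top]      # indices of the top 10% scores
    return y_true[top].mean() / y_true.mean()

# With 10,000 customers and a 1.8% churn ratio, a lift of 2 corresponds to
# 36 churners captured in the top 1,000 versus 18 expected by random targeting.
```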

E. Alternative Methods of Hybridization

The five clustering algorithms are applied on the calibration data. The result included two cluster labels: one indicated the identity of the segment obtained using information on services usage and the other indicated the identity of the segment obtained using information on revenue contribution. We examined two methods of hybridization for utilizing the results of the clustering techniques, as shown in Figures 1 and 2. The first method (shown in Figure 1) used the labels that represented the identity of clusters as inputs to the decision tree model for prediction of churn. The second method (shown in Figure 2) separated the customers into different clusters and then built decision tree models for each cluster. In the first method, decision tree models with boosting are trained for each clustering technique using labels of services usage clusters (UseLbl), labels of revenue contribution clusters (RevLbl), and both types of labels (TwoLbl) as input. In the second method, two types of C5.0 decision tree models with boosting were trained: models for each services usage cluster (UseClst) and models for each revenue contribution cluster (RevClst). Finally, a benchmark model was used: a C5.0 decision tree model that did not utilize any results from clustering (NoLbl). As the last stage of the experimental procedure, the trained hybrid models were used on the two testing data sets and the top decile lifts were computed.

Figure 1 First method of hybridization (clustering assigns each customer a cluster label, which is appended to the customer's attributes as input to a single decision tree).

Figure 2 Second method of hybridization (clustering partitions the customers into clusters, and a separate decision tree is trained for each cluster).
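
A minimal sketch of the first method of hybridization, using K-means as a stand-in for any of the five stage-one algorithms; the column indices rev_cols (the seven revenue attributes) and the matrices X_train, X_test are illustrative assumptions.

```python
# First method of hybridization: append stage-one cluster labels as an extra
# input column for the stage-two classifier (KMeans is an illustrative choice).
import numpy as np
from sklearn.cluster import KMeans

def add_cluster_label(X_train, X_test, cols, k):
    km = KMeans(n_clusters=k, n_init=10).fit(X_train[:, cols])
    return (np.column_stack([X_train, km.labels_]),
            np.column_stack([X_test, km.predict(X_test[:, cols])]))

# e.g., "RevLbl": cluster on the revenue attributes, then train the boosted
# tree on the augmented matrices:
# X_tr2, X_te2 = add_cluster_label(X_train, X_test, rev_cols, k=4)
# model = make_boosted_tree().fit(X_tr2, y_train)
```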

V. RESULTS

Tables 3 and 4 represent the results obtained for the current and future data sets, respectively. In these tables, each row represents a clustering technique and each column represents a method of utilizing the results of clustering under the two methods of hybridization. There were 25 combinations of clustering techniques and methods of utilizing clustering results. The column titled "NoLbl" is not considered as a combination because it represented the benchmark case. Fifteen combinations belonged to the first method of hybridization and the remaining 10 combinations belonged to the second method of hybridization. The value in each cell of the two tables represented the top decile lift of the corresponding model for prediction of churn. It is observed that the best model for current data is the hybrid model comprising SOM and C5.0 tree with boosting, using "RevLbl." For future data, the best models are the BIRCH and C5.0 tree with boosting using "RevLbl" and the FCM and C5.0 tree with boosting using "TwoLbl." It can also be observed from Tables 3 and 4 that among the five clustering techniques used in these experiments, the hybrid models using KM performed the best, with three models beating the benchmark model for the current data and four models beating the benchmark model for the future data.

Table 3 Top decile lift for current data.

         First method of hybridization    Second method of hybridization
         UseLbl   RevLbl   TwoLbl         UseClst   RevClst              NoLbl
BIRCH     2.61     2.56     2.61           2.10      1.66                2.52
FCM       2.46     2.53     2.51           2.10      2.41                2.52
KM        2.55     2.57     2.46           2.25      2.53                2.52
KMD       2.47     2.54     2.60           2.23      2.22                2.52
SOM       2.62     2.65     2.48           2.23      2.41                2.52

Table 4 Top decile lift for future data.

         First method of hybridization    Second method of hybridization
         UseLbl   RevLbl   TwoLbl         UseClst   RevClst              NoLbl
BIRCH     2.49     2.61     2.40           1.99      1.78                2.44
FCM       2.47     2.50     2.61           2.44      2.42                2.44
KM        2.51     2.56     2.52           2.18      2.48                2.44
KMD       2.54     2.46     2.47           2.37      2.10                2.44
SOM       2.47     2.42     2.53           2.23      2.37                2.44

Among the 15 models that belonged to the first method of hybridization, 10 beat the performance of the benchmark model in terms of the top decile lift for current data and 13 beat the performance of the benchmark model for future data. On the other hand, among the 10 models that belonged to the second method of hybridization, only one beat the benchmark model for both current and future data. This indicated that the first method of hybridization performed better than the second in terms of top decile lift, and thus it is better to include the cluster label as an additional input rather than forming the clusters first and then using C5.0 decision trees with boosting on each cluster.

Bootstrapping experiments were carried out to obtain a firm statistical statement on the difference between the hybrid models and the benchmark model. We created 50 samples using the training data set and the two testing data sets. Each sample represented 60% of the entire data set. Because the performance of the models using the second method of hybridization was obviously worse than the first method, we only conducted the bootstrapping experiments for models using the first method of hybridization. The results of the bootstrapping experiments are shown in Tables 5 and 6. It can be observed from Table 5 that the hybrid SOM and C5.0 tree with boosting that used RevLbl and the hybrid BIRCH and C5.0 tree with boosting that used UseLbl, RevLbl, and TwoLbl are significantly better than the benchmark model (at the 5% level of significance) for current data. For future data, the hybrid FCM and C5.0 tree with boosting that used TwoLbl, the hybrid KMD and C5.0 tree with boosting that used RevLbl, and the hybrid BIRCH and C5.0 tree with boosting that used UseLbl, RevLbl, and TwoLbl are significantly better than the benchmark model at the 5% level of significance, as shown in Table 6.
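
The exact resampling protocol is not spelled out in the article; the sketch below shows one plausible reading (50 resamples of 60% of the data, drawn with replacement, and a one-sample t-test on the per-sample lift differences), reusing the top_decile_lift helper sketched earlier.

```python
# A plausible sketch of the bootstrap comparison of hybrid vs. benchmark lift.
import numpy as np
from scipy import stats

def bootstrap_lift_diff(y, hybrid_scores, bench_scores,
                        n_rep=50, frac=0.6, seed=0):
    rng = np.random.default_rng(seed)
    n = int(frac * len(y))
    diffs = []
    for _ in range(n_rep):
        idx = rng.choice(len(y), size=n, replace=True)   # 60% resample
        diffs.append(top_decile_lift(y[idx], hybrid_scores[idx])
                     - top_decile_lift(y[idx], bench_scores[idx]))
    t_stat, p_value = stats.ttest_1samp(diffs, 0.0)      # H0: no lift difference
    return float(np.mean(diffs)), t_stat, p_value
```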

Table 5 Differences of top decile lift for hybrid and benchmark models for current data obtained using bootstrapping.

Method   Cluster label   Mean difference   t-statistic   p-value
FCM      UseLbl           0.03              0.65          0.52
         RevLbl           0.06              1.51          0.14
         TwoLbl           0.04              0.85          0.40
KMD      UseLbl           0.00             -0.09          0.93
         RevLbl           0.03              0.72          0.47
         TwoLbl           0.00              0.09          0.93
SOM      UseLbl           0.06              1.35          0.18
         RevLbl           0.08              1.98          0.05
         TwoLbl           0.01              0.18          0.86
BIRCH    UseLbl           0.12              2.93          0.01
         RevLbl           0.14              3.57          0.00
         TwoLbl           0.10              2.91          0.01
KM       UseLbl          -0.05             -1.01          0.32
         RevLbl          -0.03             -0.83          0.41
         TwoLbl          -0.02             -0.39          0.70

Table 6 Differences of top decile lift for hybrid and benchmark models for future data obtained using bootstrapping.

Method   Cluster label   Mean difference   t-statistic   p-value
FCM      UseLbl           0.03              1.00          0.32
         RevLbl           0.05              1.76          0.08
         TwoLbl           0.08              2.91          0.01
KMD      UseLbl           0.02              0.83          0.41
         RevLbl           0.06              2.77          0.01
         TwoLbl           0.03              1.09          0.28
SOM      UseLbl           0.04              1.37          0.18
         RevLbl           0.05              1.88          0.07
         TwoLbl           0.05              1.78          0.08
BIRCH    UseLbl           0.16              5.92          0.00
         RevLbl           0.14              4.74          0.00
         TwoLbl           0.12              4.17          0.00
KM       UseLbl           0.04              1.67          0.10
         RevLbl           0.05              1.70          0.10
         TwoLbl           0.05              1.79          0.08

Figures 3 and 4 represent the lift curves for the best models for current and future data, respectively. In these figures, the y-axis represents the value of the lift and the x-axis represents the deciles. The lift curves for the benchmark model are also shown in the two figures for ease of comparison. From Figure 3 it can be observed that the hybrid SOM and C5.0 tree model with boosting that used RevLbl had higher lift values not only for the top decile but also for the second and the third deciles. From Figure 4 it can be observed that the hybrid FCM and C5.0 tree model with boosting that used TwoLbl had a similar top decile lift as the hybrid BIRCH and C5.0 tree model with boosting that used RevLbl. However, the performance of the hybrid FCM and C5.0 tree model with boosting that used TwoLbl deteriorated fast for the subsequent deciles and was even worse than the benchmark model (e.g., second and third deciles). In comparison, the hybrid BIRCH and C5.0 tree model with boosting that used RevLbl continued to perform better than the benchmark model up to the sixth decile. This leads us to the conclusion that the hybrid BIRCH and C5.0 tree model with boosting that used RevLbl was the best model for future data, although its top decile lift was exactly the same as that of the hybrid FCM and C5.0 tree model with boosting that used TwoLbl.

Figure 3 Lift curve of the best hybrid model for current data (lift value vs. deciles 1–10, comparing NoLbl and SOM-RevLbl).

Figure 4 Lift curve of the best hybrid models for future data (lift value vs. deciles 1–10, comparing NoLbl, BIRCH-RevLbl, and FCM-TwoLbl).


A. Important Churn Indicators

Decision trees can automatically determine the importance of attributes. The closer an attribute is to the root of a decision tree, the more important it is for the prediction of the target variable. We analyzed all the decision trees obtained from the numerical experiments and identified attributes that existed in the first eight levels of them (the root is at level 0). In order to quantify their influence on the prediction of customer churn, we built a logistic regression model using these attributes. The results are shown in Table 7.
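
A sketch of such a follow-up logistic regression with statsmodels, assuming the attributes sit in a pandas DataFrame df with a binary churn column; the attribute names follow Table 7.

```python
# Logistic regression to quantify the influence of tree-selected attributes
# (a sketch; df, column names, and the churn coding are assumptions).
import statsmodels.api as sm

def churn_logit(df, predictors, target="churn"):
    X = sm.add_constant(df[predictors])
    result = sm.Logit(df[target], X).fit(disp=0)
    # Coefficient signs give the direction of each attribute's effect on churn.
    return result.params, result.pvalues

# e.g., predictors = ["Mou_mean", "Months", "Eqpdays", "Avgrev", "Totmrc_mean"]
```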

Table 7 Predictive attributes for customer churn.

Attribute          Description                                              Logit coefficient   p-value
Mou_mean           Mean number of monthly minutes of use                     0.001              0.00
Months             Total number of months in service                         0.020              0.00
Eqpdays            Number of days of current equipment                      -0.001              0.00
Avgrev             Average monthly revenue over the life of the customer    -0.004              0.00
Totmrc_mean        Base cost of the calling plan                             0.008              0.00
Retdaysnew         Number of days since last retention call                  0.133              0.00
Asl_flag = No      Reached account spending limit or not                    -0.351              0.00
Avg3rev            Average monthly revenue over the previous three months    0.007              0.00
Mailresp = Yes     Mail responder or not                                     0.184              0.00
Hnd_webcap = No    Handset web capability                                   -0.179              0.00
Hnd_webcap = Yes   Handset web capability                                   -0.134              0.00
Phone, 3           Number of handsets issued                                -0.165              0.00
Refurb_new = New   Handset: refurbished or new                               0.265              0.00
Totrev             Total revenue                                            -2.8E-06            0.88
Change_mou         Percentage change in monthly minutes of use vs.           0.001              0.00
                   previous three months average
Rev_mean           Mean monthly revenue over the data collection period     -0.009              0.00

From Table 7 we can observe that most of the attributes regarded as important by the decision tree models are also found to have a significant influence on the prediction of customer churn, except for the attribute total revenue (i.e., Totrev). The interpretation of the results revealed some interesting facts about the customers.

Most of the attributes related to customers' usage of services had an effect on customer churn. The effect is considered to be positive if a higher value of the attribute is associated with a higher rate of churn. First, the attribute "mean number of monthly minutes of use" had a positive effect on customer churn. The reason may be that the more a customer used the services, the more he or she cared about the quality of the services, and therefore the more eager he or she was to switch upon finding better services offered by other service providers. Second, the attribute "total number of months in service" also had a positive effect on customer churn. It is possible that after customers tried the service for a long time they wanted to change the provider (because of competitive promotions or discount plans on offer) since the switching cost was quite low. Third, the attribute "percentage change in monthly minutes of use vs. previous three month average" had a positive effect on customer churn, implying that service providers need to be watchful about recent changes in customers' behavior.

For attributes related to customers' revenue contribution, mixed results were seen. The attributes "average monthly revenue over the life of the customer" and "mean monthly revenue over the data collection period" had a negative effect on customer churn. However, the attribute "average monthly revenue over the previous three months" had a positive effect on customer churn. These findings suggested that in the long run, the more money a customer spent on the services, the less likely he or she would churn. On the contrary, in the recent period, the more money a customer spent on the services, the more likely he or she would churn. It was also found that the attribute "base cost of the calling plan" had a positive effect on customer churn. It seemed logical that if customers perceived that they were paying a higher base price for the calling plan, they tended to churn.

The attributes related to customers' handsets also showed some effect on customer churn. The attribute "number of days of current equipment" had a negative effect on customer churn. If a customer was willing to use a handset for a long time, it indicated that his or her willingness to change was low. Therefore, the customer was also less likely to switch to other mobile services providers. The attribute "number of handsets issued" showed a similar effect. The fewer the number of phones issued, the less likely a customer was to churn. These findings suggest that customers' preferences for handsets had a close relationship with their churning behavior.

The attribute "number of days since last retention call" had a positive effect on customer churn. This is an important finding that demonstrates the importance of customer care. If service providers stayed in close contact with their customers, it tended to reduce their likelihood of switching providers. Mail responders were more likely to churn, and this may be because they were also open to marketing solicitations from other services providers. Having no account spending limit had a negative effect on customer churn. It seemed that having a limit on the spending amount made the customers unhappy with the services and prompted them to churn.

B. Lift Value of Important Rules

Next, we examined the rules generated by the hybrid models in order to ascertain the usefulness of clustering as the first stage. One way to show whether the inclusion of the cluster labels under the first hybrid method actually helped the C5.0 decision tree models with boosting to perform better was to examine whether the rules generated by the C5.0 decision tree models contained cluster labels and whether the same rules could help to separate churners from non-churners. We examined the rules generated by the best models for current and future data (i.e., the hybrid SOM and C5.0 model with boosting using RevLbl and the hybrid BIRCH and C5.0 model with boosting using RevLbl). We generated rules from the best trees for the current and future data sets obtained from 10 trials of the boosting algorithm and listed them in Table 8. The rule sets contained two types of rules: rules for identifying churners and rules for identifying non-churners. We identified 6 rules containing cluster labels from the best model for current data and 19 rules containing cluster labels from the best model for future data. The ability to separate churners from non-churners was ascertained on the basis of the lift values of the rules. The lift value of a rule is equal to the ratio of the proportion of real churners (or non-churners) among customers predicted as churners (or non-churners) by that rule to the proportion of churners (or non-churners) in the entire population of customers. From Table 8, it is observed that for rules that predicted non-churners, the lift values are all very close to 1. This is because non-churners represented 98.2% of the whole population for both current and future data. For rules that predicted churners, it was observed that 10 had lift values greater than 1 and 3 had lift values greater than 3 (BIRCH7, BIRCH11, and BIRCH18). This indicated that the rules containing cluster labels are effective in predicting customer churn.

Table 8 Lift for rules obtained from the best models for current and future data.

Rule ID   Lift
Rules for churn=no:
SOM1       0.99
SOM2       1.02
SOM3       1.00
BIRCH1     1.02
BIRCH2     1.02
BIRCH3     0.97
BIRCH4     0.93
BIRCH5     1.00
Rules for churn=yes:
SOM4       1.19
SOM5       1.69
SOM6       0.71
BIRCH7    13.89
BIRCH8     0.00
BIRCH9     0.00
BIRCH10    1.45
BIRCH11    3.97
BIRCH12    0.00
BIRCH13    0.00
BIRCH14    2.25
BIRCH15    0.00
BIRCH16    0.00
BIRCH17    1.98
BIRCH18    3.77
BIRCH19    1.32
BIRCH20    1.75
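
The rule lift used in Table 8 can be computed directly from a rule's coverage. A minimal sketch, where rule_mask marks the customers matched by a rule (the example rule is hypothetical):

```python
# Lift of a single rule: outcome rate among customers matched by the rule
# divided by the outcome rate in the whole population.
import numpy as np

def rule_lift(y_true, rule_mask, positive=1):
    y_true = np.asarray(y_true)
    hit_rate = (y_true[rule_mask] == positive).mean()
    base_rate = (y_true == positive).mean()
    return hit_rate / base_rate

# Hypothetical churn rule "RevLbl == 3 and Months > 24":
# mask = (rev_lbl == 3) & (months > 24)
# rule_lift(churn, mask, positive=1)
```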

VI. DISCUSSION

In this article, we answered four research questions on hybrid models that combined unsupervised clustering techniques with decision trees. For the first question, the results showed that including cluster labels as inputs to C5.0 decision tree models with boosting improved the performance of those models in terms of top decile lift. Hence, we can say that the patterns detected by the clustering techniques helped C5.0 decision trees to detect the phenomenon of churning better. For the second question, SOM helped generate the best hybrid model for current data, BIRCH helped generate the best hybrid model for future data, and KM helped generate the most models that beat the benchmark model for the two data sets. Because of this mixed result, it is difficult to recommend a single clustering algorithm for the building of hybrid models using unsupervised and supervised learning techniques. Therefore, we suggest that for cases related to prediction of customer churn it is a good strategy to try different clustering algorithms as the first component of a hybrid model. For the third question, the results illustrated that including cluster labels as input to the decision trees is always a better method of hybridization than clustering the customers and then using decision trees on each customer cluster. It is difficult to find a consistent answer to the fourth question. However, it is worth noting that the two best models for the current and future data used revenue cluster labels. In addition, for current data, models including revenue cluster labels were always better than the benchmark model, whereas for the future data, 4 out of 5 models including revenue cluster labels performed better than the benchmark model. Therefore, it is safe to recommend revenue cluster labels as input to the decision trees for these two data sets.

The decision tree models also helped to identify important indicators of customer churn. Being a frequent user of services or a long-time customer did not necessarily imply loyalty to the service provider. Churn is quite prevalent among such customers. The customers' revenue attributes are quite important for prediction of churn. However, long-term revenue contribution and short-term revenue contribution have opposite influences on the occurrence of churn. Service providers are urged to pay closer attention to both types of revenue attributes of their customers. In addition to the attributes related to usage and revenue contribution, customer care is very important in retaining customers. At the same time, customers' preferences or choices about type of phone are found to be closely related to churn. It is therefore recommended that service providers seek clues of churn from customers' handset purchasing records.

Traditionally, unsupervised learning and supervised learning techniques are considered to be ideal for different types of data analyses. Supervised learning techniques are used for prediction and classification, whereas unsupervised learning techniques are mostly used for exploration of unknown patterns in data. However, there is an ongoing effort among researchers to combine these two types of techniques in a mutually beneficial manner. Some researchers have used clustering techniques for supervised learning tasks [35] and others have used supervised learning techniques to support unsupervised learning by identifying the rules related to the assignment of data items to clusters [34]. The key contribution of this article is that it combined clustering with decision trees in such a way that the unsupervised learning technique aided the performance of a supervised learning technique for the classification task of churn prediction. Two different methods of hybridization were also studied, and it was shown that retaining cluster labels for subsequent classification by decision trees yielded better results in terms of top decile lift. The efficiency of the methods proposed in this article is exemplified by the fact that in the data mining tournament organized by Duke University, where the same data sets were used, the average top decile lift among all participating data mining algorithms was 2.14 for the current data and 2.09 for the future data, which was worse than our results. It must be remembered that the results obtained from this research are based only on the data sets under consideration. The exact same observations might not be seen if the same data mining experiments are repeated with a different data set. However, the performance of the hybrid models discussed in the article is reasonable compared with the performance of models used in industry. In the case of KPN Mobile of the Netherlands, the use of data mining techniques improved the efficiency of marketing activities related to prevention of customer churn [42]. The project reported a 15% improvement in campaign execution time and cost. The models discussed in this article improved the top decile lift to more than twice that of random targeting. It is reasonable to believe that if the same models were used in industry they would lead to a significant reduction in execution time and cost for marketing campaigns similar to that of KPN Mobile.


VII. CONCLUSION

In this article, we investigated several issues related to the combination of unsupervised clustering techniques and C5.0 decision trees with boosting. The number of clustering techniques used for hybridization and the two different approaches for hybridization made this article unique. We studied five clustering techniques for hybridization: BIRCH, KM, KMD, FCM, and SOM. They represented four types of clustering approaches. We compared two methods of hybridization that utilized the results from clustering algorithms: including cluster labels as input for building the decision trees, and forming clusters first and then predicting churn within each cluster using decision trees. Several hybrid models resulted in good top decile lift for prediction of churn. The results of the experiments also identified important customer attributes that acted as indicators of churn and business rules with reasonably high values of lift that can be used by marketing experts. In the bootstrapping experiments, the hybrid models showed better performance than the benchmark model in a statistically significant manner. Two factors brought this type of improvement. The first is the method of hybridization. The second is the choice of appropriate attributes for the generation of clusters: services usage and revenue contribution. The article also listed several short- and long-term strategies that can be adopted by mobile service providers to prevent customer churn.

We chose the C5.0 decision tree as the second stage of the hybrid models because it needs little effort in data preprocessing and has generated good results in past research. Other techniques, such as neural networks and support vector machines, are powerful supervised learning techniques that can replace decision trees in future experiments. In addition, other unsupervised clustering techniques besides BIRCH, KM, KMD, FCM, and SOM can be used in the future as the first stage of the hybrid models. Alternate methods of hybridization that utilize the power of clustering in a more efficient way should also be considered. We used top decile lift as the metric for evaluating the models. For the data sets under study this is a reasonable choice. However, other metrics such as accuracy, sensitivity, and specificity may be considered in the future for studying other data sets. Finally, the attributes for customers of mobile services were grouped under services usage and revenue contribution because this categorization seemed intuitive. Different types of grouping of attributes and the use of other attributes related to customers can be investigated in the future.

REFERENCES

[1] International Data Corporation (IDC), "Asia/Pacific (excluding Japan) mobile services 2006–2010 forecast update: 2005 year-end review," 2006. Available: http://www.idc.com/getdoc.jsp?containerId=AP201304N

[2] Analysys, "Mobile penetration in Western Europe is forecast to reach 100% in 2007," 2005. Available: http://www.3g.co.uk/PR/May2005/1412.htm

[3] SNL Kagan, "USA to pass 100% mobile penetration level by 2013—report," 2007. Available: http://www.cellular-news.com/story/25628.php

[4] C. Geppert, "Customer churn management: retaining high-margin customers with customer relationship management techniques," 2002. Available: http://www.kpmg.com.au/Portals/0/ChurnMgmt-whitepaper_0226.pdf

[5] A. Berson, S. Smith, and K. Thearling, Building Data Mining Applications for CRM. New York: McGraw-Hill, 2002.

[6] S. M. Keaveney, "Customer switching behavior in service industries: an exploratory study," Journal of Marketing, vol. 59, no. 2, pp. 71–82, 1995.

[7] R. N. Bolton, "A dynamic model of the duration of the customer's relationship with a continuous service provider: the role of satisfaction," Marketing Science, vol. 17, no. 1, pp. 45–65, 1998.

[8] C. Rygielski, J.-C. Wang, and D. C. Yen, "Data mining techniques for customer relationship management," Technology in Society, vol. 24, pp. 483–502, 2002.

[9] SPSS, "Working with telecommunications: churning in the telecommunications industry," SPSS White Paper, 1999. Available: http://www.spss.com/download/papers/

[10] J. Burez and D. Van den Poel, "CRM at a pay-TV company: using analytical models to reduce customer attrition by targeted marketing for subscription services," Expert Systems with Applications, vol. 32, no. 2, pp. 277–288, 2007.

[11] J. W. Han and M. Kamber, Data Mining: Concepts and Techniques. San Francisco, CA: Morgan Kaufmann, 2001.

[12] C.-P. Wei and I.-T. Chiu, "Turning telecommunications call details to churn prediction: a data mining approach," Expert Systems with Applications, vol. 23, no. 2, pp. 103–112, 2002.

[13] S.-Y. Hung, D. C. Yen, and H.-Y. Wang, "Applying data mining to telecom churn management," Expert Systems with Applications, vol. 31, no. 3, pp. 515–524, 2006.

[14] B.-H. Chu, M.-S. Tsai, and C.-S. Ho, "Toward a hybrid data mining model for customer retention," Knowledge-Based Systems, vol. 20, no. 8, pp. 703–718, 2007.

[15] J. Y. Qi, Y. M. Zhang, Y. Y. Zhang, and S. Shi, "TreeLogit model for customer churn prediction," in Proceedings of the 2006 IEEE Asia-Pacific Conference on Services Computing, Guangzhou, China, pp. 70–75, Dec. 12–15, 2006.

[16] M. C. Mozer, R. Wolniewicz, and D. B. Grimes, "Predicting subscriber dissatisfaction and improving retention in the wireless telecommunications industry," IEEE Transactions on Neural Networks, vol. 11, no. 3, pp. 690–696, 2000.

[17] G. J. Song, D. Q. Yang, L. Wu, T. J. Wang, and S. W. Tang, "A mixed process neural network and its application to churn prediction in mobile communications," in Proceedings of the ICDM Workshops, Hong Kong, pp. 798–802, Dec. 18–22, 2006.

[18] K. Coussement and D. Van den Poel, "Churn prediction in subscription services: an application of support vector machines while comparing two parameter-selection techniques," Expert Systems with Applications, vol. 34, no. 1, pp. 313–327, 2008.

[19] W.-H. Au, K. C. C. Chan, and X. Yao, "A novel evolutionary data mining algorithm with application to churn prediction," IEEE Transactions on Evolutionary Computation, vol. 7, no. 6, pp. 532–545, 2003.

[20] S. Bhattacharyya, "Evolutionary algorithms in data mining: multi-objective performance modeling for direct marketing," in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, pp. 248–257, Aug. 20–23, 2000.

[21] Salford Systems, "The Duke/NCR Teradata churn modeling tournament," 2004. Available: http://www.salford-systems.com/churn.php

[22] G. Punj and D. W. Stewart, "Cluster analysis in marketing research: review and suggestions for application," Journal of Marketing Research, vol. 20, no. 2, pp. 134–148, 1983.

[23] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering: a review," ACM Computing Surveys, vol. 31, no. 3, pp. 264–323, 1999.

[24] J. E. Aronson and G. Klein, "A clustering algorithm for computer-assisted process organization," Decision Sciences, vol. 20, no. 4, pp. 730–745, 1989.

[25] M. J. Lenard, P. Alam, and D. Booth, "An analysis of fuzzy clustering and a hybrid model for the auditor's going concern assessment," Decision Sciences, vol. 31, no. 4, pp. 861–884, 2000.

[26] K. Helsen and P. E. Green, "A computational study of replicated clustering with an application to market segmentation," Decision Sciences, vol. 22, no. 5, pp. 1124–1141, 1991.

[27] R. J. Kuo, K. Chang, and S. Y. Chien, "Integration of self-organizing feature maps and genetic-algorithm-based clustering method for market segmentation," Journal of Organizational Computing and Electronic Commerce, vol. 14, no. 1, pp. 43–60, 2004.

[28] S. H. Kim, "An architecture for advanced services in cyberspace through data mining: a framework with case studies in finance and engineering," Journal of Organizational Computing and Electronic Commerce, vol. 10, no. 4, pp. 257–270, 2000.

[29] G. Premkumar, K. Ramamurthy, and C. S. Saunders, "Information processing view of organizations: an exploratory examination of fit in the context of interorganizational relationships," Journal of Management Information Systems, vol. 22, no. 1, pp. 257–294, 2005.

[30] S. Theodoridis and K. Koutroumbas, Pattern Recognition. San Diego: Elsevier, 2006.

[31] L. Churilov, A. Bagirov, D. Schwartz, K. Smith, and M. Dally, "Data mining with combined use of optimization techniques and self-organizing maps for improving risk grouping rules: an application to prostate cancer patients," Journal of Management Information Systems, vol. 21, no. 4, pp. 85–100, 2005.

[32] J. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, CA, USA, pp. 281–297, 1967.

[33] T. Kohonen, Self-Organizing Maps. Berlin: Springer, 1995.

[34] S. Thomassey and A. Fiordaliso, "A hybrid sales forecasting system based on clustering and decision trees," Decision Support Systems, vol. 42, no. 1, pp. 408–421, 2006.

[35] S. A. Mingoti and J. O. Lima, "Comparing SOM neural network with fuzzy c-means, K-means and traditional hierarchical clustering algorithms," European Journal of Operational Research, vol. 174, no. 3, pp. 1742–1759, 2006.

[36] Duke University, "Case studies, presentations and video modules," 2005. Available: http://www.fuqua.duke.edu/centers/ccrm/datasets/download.html#data

[37] B. Padmanabhan, Z. Zheng, and S. O. Kimbrough, "An empirical analysis of the value of complete information for eCRM models," MIS Quarterly, vol. 30, no. 2, pp. 247–267, 2006.

[38] J. R. Quinlan, C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann, 1993.

[39] RuleQuest, "Is C5.0 better than C4.5?," 2007. Available: http://www.rulequest.com/see5-comparison.html

[40] T. Zhang, R. Ramakrishnan, and M. Livny, "BIRCH: an efficient data clustering method for very large databases," in Proceedings of the ACM SIGMOD Conference on Management of Data, Montreal, Canada, pp. 103–114, 1996.

[41] J. C. Bezdek and N. R. Pal, "Some new indexes of cluster validity," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 28, no. 3, pp. 301–315, 1998.

[42] SPSS, "Customer case study: KPN Mobile," 2006. Available: http://www.spss.com/success/template_view.cfm?Story_ID=180
