Data Mining Industrial Projects and Case Studies
Kwok-Leung Tsui
Industrial and Systems Engineering
Georgia Institute of Technology
1. AT&T business data mining
2. Inventory management in military maintenance
3. Sea cargo demand forecasting
4. SMARTRAQ project in transportation policies
5. Location problem of letterbox
6. Home improvement store shrinkage analysis
7. Hotels & resorts chain data mining
8. Used car auction sales data mining
9. Fast food restaurant call center
Industrial Projects
Data Mining in Telecom. (Funded AT&T project)
~$160 billion per year industry (~$70B long distance & ~$90B local)
100 million+ customers/accounts/lines
>1 billion phone calls per day
Book closing (estimating this month's price/usage/revenue)
Budgeting (forecasting next year's price/usage/revenue)
Segmentation (clustering of usage, growth, ...)
Cross selling (association rules)
Churn (disconnect prediction & tracking)
Fraud (detection of unusual usage time-series behavior)
Each of these problems is worth hundreds of millions of dollars.
A contractor manages parts inventory for aircraft maintenance
Characterization and forecasting of demand and lead time distributions
60,000 different parts and 500 bench locations
Data tracked by an automated system
Demand data not available & stockout penalty
Inventory Management in Air Force (Funded project)
Sea cargo network optimization
Contract planning & booking control
Characterize & forecast sea cargo demand distribution & cost structure
Improve ocean carrier and terminal operation efficiency
Data Mining in Sea Cargo Application (Funded TLIAP project)
Strategies for Metropolitan Atlanta’s Regional Transportation & Air Quality
Five-year project sponsored by Transportation Dept., Federal Highway Admin., EPA, CDC, etc.
Assess air quality, travel behavior, land use & transportation policies
Reduce auto-dependence and vehicle emissions
SMARTRAQ Project for Transportation Policies
Improve performance of express-mail dropoff letter boxes
50,000 letter boxes & 8 months of transaction data
Relate performance to important factors, e.g., region, demographics, adjacent competition, pick-up schedule
Comparison with direct competitors
Customer demand analysis and forecast
Mining of Letter Box Transaction Data
Inventory shrinkage costs US retailers $32 billion
Shrinkage = book inventory – inventory on hand
Working with a home improvement store’s Loss Prevention Group
Develop predictive model to relate shrinkage to important variables
Extract hidden knowledge to reduce loss and improve operation efficiency
Data Mining for Shrinkage Analysis in Retail Industry
Manage chain hotels and resorts at different scales
Evaluate impact of promotional programs
Forecasting of customer behavior in frequent stay program
Monitor performance in customer survey
Predict performance with important factors
Data Mining for Hotels and Resorts Chain Business
Maintain all used car auction data from the last 20 years
Provide service to customers and dealers on auction price projection
Price depreciation by year
Develop methods for mileage, seasonal, and regional adjustments
Data Mining of Used Car Auction Data
Centralized call center for drive-through customers of over 50 chain restaurants
Contractor manages call center with constraints on time to answer customers
Scheduling and management of human resources
Simulation and optimization algorithms
Data mining and forecasting on aggregate and individual demand
Fast Food Restaurant Call Center
1. A Medical Case Study
2. Profile Monitoring in Telecommunication
3. Letterbox Transaction Data Mining
4. A Market Analysis Case Study
5. Air Force Parts Inventory Data Mining
Data Mining Case Studies
1. Telecommunication Data Mining
2. Churn Modeling in Wireless Industry
3. Market Basket Analysis
4. Supermarket Mining I
5. Supermarket Mining II
6. Banking and Finance
More DM Case Studies (Berry & Linoff)
A Review & Analysis of MTS
(Technometrics, 2003)
W. H. Woodall and R. Koudelik, Virginia Tech
K.-L. Tsui and S. B. Kim, Georgia Tech
Z. G. Stoumbos, Rutgers University
Christos P. Carvounis, MD, State University of New York at Stony Brook
A Medical Case Study using MTS and DM Methods
Primary MTS References
Taguchi, G., and Rajesh, J. (2000), "New Trends in Multivariate Diagnosis," Sankhya: The Indian Journal of Statistics, 62, 233-248.
Taguchi, G., Chowdhury, S., and Wu, Y. (2001), The Mahalanobis-Taguchi System, New York: McGraw-Hill.
Taguchi, G., and Rajesh, J. (2002), a new book on the MTS.
P.C. Mahalanobis
Very influential in large-scale sample survey methods
Founder of the Indian Statistical Institute in 1931
Architect of India's industrial strategy
Advisor to Nehru and friend of R.A. Fisher
Deming Prize in Japan: 4 times
Rockwell Medal (1986) citation: combined engineering & statistical methods to achieve rapid improvements in cost and quality by optimizing product design and manufacturing processes.
1978-79: Ford / Bell Labs teams "discover" the method
1980: first US experiences (Xerox / Bell Labs)
1990-present: Taguchi Methods and DOE well recognized by all industries for improving product or manufacturing process design.
Genichi Taguchi Japanese Quality Engineer
MTS is said to be ...
A groundbreaking new philosophy for data mining from multivariate data.
A process of recognizing patterns and forecasting results.
Used by Fuji, Nissan, Sharp, Xerox, Delphi Automotive Systems, Ford, GE, and others.
Beyond theory: intended to create an atmosphere of excitement for management, engineering, and academia.
Applications include the following:
Patient monitoring
Medical diagnosis
Weather and earthquake forecasting
Fire detection
Manufacturing inspection
Clinical trials
Credit scoring
MTS Overview
Similar to a classification method using a discriminant-type function.
Based on multivariate observations from a "normal" and an "abnormal" group.
Used to develop a scale that measures how abnormal an item is while matching a pre-specified or estimated scale.
The MTS scale is used for variable selection, diagnosis, forecasting, and classification.
MTS Procedure: Stage 1
Identify p variables $V_i$, i = 1, 2, ..., p, that measure the "normality" of an item.
Collect multivariate data on the normal group, $X_j$, j = 1, 2, ..., m.
Standardize each variable to obtain $Z_i$ vectors.
Calculate the Mahalanobis distances (MD) for the m observations:

$MD_i = \frac{1}{p}\,\mathbf{Z}_i^{T}\,\mathbf{S}^{-1}\,\mathbf{Z}_i, \quad i = 1, \dots, m,$

where S is the sample correlation matrix of the Z's for the normal group.
Stage 2
Collect data on t abnormal items, $X_i$, i = m+1, m+2, ..., m+t.
Standardize each variable using the normal-group means and standard deviations.
Calculate MD values $MD_i$, i = m+1, m+2, ..., m+t.
According to the MTS, the scale is good if the MD values for the abnormal items are higher than those for the normal items (good separation).
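Stages 1-2 can be sketched in a few lines. The data below are synthetic stand-ins for the normal and abnormal groups (not the study's data), and the helper names are ours.

```python
# Sketch of MTS Stages 1-2: MDs for a normal group, then MDs for abnormal
# items standardized with the NORMAL group's statistics. Synthetic data;
# the abnormal group is mean-shifted, so its MDs should come out larger.
import numpy as np

rng = np.random.default_rng(42)
normal = rng.normal(0.0, 1.0, size=(200, 6))
abnormal = rng.normal(2.5, 1.0, size=(17, 6))    # shifted -> "unhealthy"

# Stage 1: standardize the normal group and invert its correlation matrix.
mean, std = normal.mean(axis=0), normal.std(axis=0, ddof=1)
Zn = (normal - mean) / std
S_inv = np.linalg.inv(np.corrcoef(Zn, rowvar=False))
p = normal.shape[1]

def md(X):
    """MD_i = Z_i' S^-1 Z_i / p, with Z from the normal-group statistics."""
    Z = (X - mean) / std
    return np.einsum("ij,jk,ik->i", Z, S_inv, Z) / p

# Stage 2: the scale is "good" if abnormal MDs exceed normal MDs.
md_normal, md_abnormal = md(normal), md(abnormal)
```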
Stage 3
Identify the useful variables using orthogonal arrays (OAs) and signal-to-noise (S/N) ratios.
The MTS uses a design-of-experiments approach as an optimization tool to choose the variables that maximize the average S/N ratio.

Use of DOE for Variable Selection
Design an OA experiment using all variables.
For each row of the OA (a given subset of variables):
- compute $MD_i$ for each observation in the abnormal groups;
- determine an $M_i$ value (the true severity level or working average) for each abnormal group;
- compute the S/N ratio based on the $MD_i$ and $M_i$.
Determine significant variables using main-effect analysis with the S/N ratio as the response.
An Example OA (+ = variable included; − = variable excluded)

Run   V1  V2  V3  ...  V17  S/N Ratio
1     +   +   +   ...  +    SN1
2     −   +   +   ...  +    SN2
3     +   −   +   ...  +    SN3
4     −   −   +   ...  +    SN4
5     +   +   −   ...  +    SN5
6     −   +   −   ...  +    SN6
...
32    −   −   −   ...  −    SN32
Dynamic S/N Ratio (multiple abnormal groups)
First regress $Y_i = \sqrt{MD_i}$ on $M_i$ to obtain the slope estimate $\hat{\beta}$; then define the S/N ratio:

$10\log_{10}\!\left[\frac{(SSR - MSE)/r}{MSE}\right] \approx 10\log_{10}\!\left[\frac{\hat{\beta}^2}{MSE}\right]$

where SSR is the regression sum of squares, MSE the mean squared error, and $r = \sum_i M_i^2$.
Larger-is-Better S/N Ratio (single abnormal group)
For t abnormal observations, the larger-is-better S/N ratio is

$\eta = -10\log_{10}\!\left[\frac{1}{t}\sum_{i=1}^{t}\frac{1}{MD_i}\right]$
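The larger-is-better ratio is a one-liner; the function name below is ours.

```python
# Taguchi larger-is-better S/N ratio over a set of MD values:
# eta = -10 log10( (1/t) * sum(1/MD_i) ). Larger MDs -> larger eta.
import numpy as np

def sn_larger_is_better(md):
    md = np.asarray(md, dtype=float)
    return -10.0 * np.log10(np.mean(1.0 / md))
```

For example, a run whose abnormal MDs are all 10 scores higher than one whose MDs are all 2, as the formula intends.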
Main Effect Analysis
Compute the level averages of the S/N ratios (+ and −) for each variable,

$\overline{S/N}_i^{+} - \overline{S/N}_i^{-},$

and keep only the variables with positive (significant) estimated main effects.
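The main-effect selection step can be sketched as follows; the toy OA and S/N values are made up for illustration, not taken from the study.

```python
# Main-effect analysis on an OA: for each variable, average the S/N ratios
# of runs that include it (+) minus runs that exclude it (-); keep the
# variables whose estimated effect is positive.
import numpy as np

def select_variables(oa, sn):
    """oa: runs x variables matrix of +1/-1; sn: S/N ratio per run."""
    eff = (sn[:, None] * (oa == 1)).sum(0) / (oa == 1).sum(0) \
        - (sn[:, None] * (oa == -1)).sum(0) / (oa == -1).sum(0)
    return np.where(eff > 0)[0], eff

# Toy 4-run, 3-variable design with hypothetical S/N ratios: runs that
# include V1 score much higher, so V1 should be kept.
oa = np.array([[ 1,  1,  1],
               [ 1, -1, -1],
               [-1,  1, -1],
               [-1, -1,  1]])
sn = np.array([10.0, 8.0, 3.0, 1.0])
keep, effects = select_variables(oa, sn)
```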
Stage 4
Based on the chosen variables, use the MD scale for diagnosis and forecasting.A threshold is given such that the losses due to the two types of classification errors are balanced in some sense.
A Medical Case Study
Medical diagnosis of liver disease.
200 healthy patients and 17 unhealthy patients (10 with a mild level of disease and 7 with a medium level).
Age, gender, and 15 blood-test variables.
(The data are available.)
Case Study Blood Test Variables with Normal Ranges

Variable | Symbol | Acronym | Normal Ranges | Taguchi et al. (2001) Normal Ranges
Total Protein in Blood | V3 | TP | 6.0 to 8.3 gm/dL | 6.5-7.5 gm/dL
Albumin in Blood | V4 | Alb | 3.4 to 5.4 gm/dL | 3.5-4.5 gm/dL
Cholinesterase (Pseudocholinesterase) | V5 | ChE | Depends on technique: 8 to 18 U/mL | 0.60-1.00 dpH
Glutamate O Transaminase (Aspartate Aminotransferase) | V6 | GOT | 10 to 34 IU/L | 2-25 units
Glutamate P Transaminase (Alanine Transaminase) | V7 | GPT | 6 to 59 U/L | 0-22 units
Lactic Dehydrogenase | V8 | LDH | 105 to 333 IU/L | 130-250 units
Alkaline Phosphatase | V9 | Alp | 0-250 U/L normal; 250-750 U/L moderate elevation | 2.0-10.0 units
gamma-Glutamyl Transpeptidase (gamma-Glutamate Transferase) | V10 | r-GPT | 0 to 51 IU/L | 0-68 units
Leucine Aminopeptidase | V11 | LAP | Serum: male 80 to 200 U/mL; female 75 to 185 U/mL | —
Total Cholesterol | V12 | TCh | <200 desirable; 200-239 borderline high; 240+ high | —
Triglyceride | V13 | TG | 10 to 190 mg/dL | —
Phospholipid | V14 | PL | Platelet: 150,000 to 400,000/mm3 | —
Creatinine | V15 | Cr | 0.8 to 1.4 mg/dL | —
Blood Urea Nitrogen | V16 | BUN | 7 to 20 mg/dL | —
Uric Acid | V17 | UA | 4.1 to 8.8 mg/dL | —
Some Results and Conclusions
Largest MD in the healthy group: 2.36. Lowest MD in the unhealthy group: 7.73.
Thus there is substantial separation between the healthy and unhealthy groups.
The $M_i$ values are estimated from averages of MD values.
OA32 (+ = variable included; − = variable excluded)

Run   V1  V2  V3  ...  V17  S/N Ratio
1     +   +   +   ...  +    SN1
2     −   +   +   ...  +    SN2
3     +   −   +   ...  +    SN3
4     −   −   +   ...  +    SN4
5     +   +   −   ...  +    SN5
6     −   +   −   ...  +    SN6
...
32    −   −   −   ...  −    SN32
Average S/N ratio:
All variables: −6.25
MTS combination: −4.27
OA optimal combination: −3.34
Overall optimal combination: −1.76

Thus the proposed method does not yield the optimum combination. The MTS average S/N ratio was at about the 95th percentile.
MDs for Unhealthy Group for Various Combinations of Variables

Subject | Disease Level | All | MTS | OA Optimal | Optimal
1  | Mild   | 7.727   | 13.937  | 8.058  | 13.329
2  | Mild   | 8.416   | 14.726  | 7.485  | 8.616
3  | Mild   | 10.291  | 17.342  | 9.498  | 8.002
4  | Mild   | 7.204   | 10.804  | 4.951  | 12.311
5  | Mild   | 10.590  | 18.379  | 9.367  | 12.042
6  | Mild   | 10.557  | 8.605   | 6.643  | 6.139
7  | Mild   | 13.317  | 13.896  | 7.794  | 6.139
8  | Mild   | 14.812  | 27.910  | 8.162  | 22.666
9  | Mild   | 15.693  | 28.110  | 10.278 | 26.000
10 | Mild   | 18.911  | 35.740  | 20.992 | 14.422
11 | Medium | 12.610  | 20.828  | 16.517 | 20.833
12 | Medium | 12.256  | 18.578  | 14.607 | 19.312
13 | Medium | 19.655  | 34.127  | 35.229 | 44.614
14 | Medium | 43.039  | 85.564  | 13.105 | 32.720
15 | Medium | 78.639  | 74.175  | 9.560  | 28.560
16 | Medium | 97.268  | 104.424 | 29.201 | 31.810
17 | Medium | 135.698 | 123.022 | 44.742 | 57.226
Plots of MDs for Unhealthy Group for Various Combinations of Variables
[Dotplots of the MD values by disease level (Mild, Medium) for each combination of variables: All, MTS, OA Optimal, Optimal.]
Variables for Unhealthy Patients Well Outside Normal Ranges
Subject Number Variable Number
1 12, 13
2 None
3 None
4 13
5 10
6 7
7 7
8 13
9 12, 13
10 4, 12
11 10, 12
12 10
13 10
14 10, 13
15 6, 7, 13
16 3, 6, 7, 10, 12
17 6, 7, 8, 10, 13
Medical Analysis
V4, V6, V7, V9, and V10 are crucial for liver disease diagnosis and classification.
Medical diagnosis shows that patients 15-17 exhibit some chronic liver disorder.
Cluster analysis on V4, V6, V7, V9, and V10 yields only two groups; only patients 15-17 are classified as "abnormal." This result is consistent with the medical diagnosis.
[Dotplots for V4 Alb, V6 GOT, V7 GPT, V9 Alp, and V10 r-GPT by group (Normal, Mild, Medium), with patients 15-17 marked.]
Tree Classification Methods
Classification Trees
• The CART (Classification And Regression Trees) methodology is known as binary recursive partitioning. For more detailed information on CART, see Breiman, Friedman, Olshen, & Stone (1984), Classification and Regression Trees.
• C4.5 is a decision-tree learning system introduced by Quinlan (Quinlan, J. Ross (1993), C4.5: Programs for Machine Learning). A tutorial is available at: http://www2.cs.uregina.ca/~hamilton/courses/831/notes/ml/dtrees/c4.5/tutorial.html
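The core of binary recursive partitioning is the impurity-minimizing split search. A minimal sketch of one such step, using Gini impurity as in CART, on illustrative data (not the liver-disease data set):

```python
# One step of CART-style binary recursive partitioning: choose the
# (feature, threshold) split minimizing the weighted Gini impurity.
import numpy as np

def gini(y):
    """Gini impurity 1 - sum(p_k^2) of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Return (feature index, threshold, weighted Gini) of the best split."""
    best = (None, None, np.inf)
    n = len(y)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j])[:-1]:     # candidate thresholds
            left = X[:, j] <= thr
            score = (left.sum() * gini(y[left]) +
                     (~left).sum() * gini(y[~left])) / n
            if score < best[2]:
                best = (j, thr, score)
    return best

# Toy data: feature 0 separates the two classes perfectly at <= 3.
X = np.array([[1.0, 5.0], [2.0, 6.0], [3.0, 7.0], [10.0, 5.5], [11.0, 6.5]])
y = np.array([1, 1, 1, 2, 2])
j, thr, score = best_split(X, y)
```

A full tree repeats this search recursively on each side of the split until the nodes are pure or a size limit (such as the four terminal nodes seen here) is reached.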
Tree from S-Plus

Splits: V5 < 381.5 at the root; V10 < 63 and V6 < 37.5 at the next level.
Terminal nodes (predicted class with counts): 1(196); 1(4), 3(1); 2(8); 2(2), 3(6).

Variables actually used in tree construction: V5, V10, and V6.
Number of terminal nodes: 4
Misclassification error rate: 3/217 = 0.0138

Classification matrix based on learning sample:

Actual \ Predicted    1    2    3
1                   200    0    0
2                     0    8    2
3                     1    0    6
Tree from C4.5

Splits: V5 <= 364 at the root; then V10 <= 63; then V6 <= 26.
Terminal nodes (predicted class with counts): 1(200), 3(1); 2(8); 2(2); 3(6).

Variables actually used in tree construction: V5, V10, and V6.
Number of terminal nodes: 4
Misclassification error rate: 1/217 = 0.0046

Classification matrix based on learning sample:

Actual \ Predicted    1    2    3
1                   200    0    0
2                     0   10    0
3                     1    0    6
Scatter Plot of V5 vs. V10 vs. V6
[3-D scatter plot of V5 ChE, V10 r-GPT, and V6 GOT by group (Normal, Mild, Medium), with patients 15-17 marked.]
Scatter Plot of V5 vs. V6
[Scatter plot of V5 ChE vs. V6 GOT by group (Normal, Mild, Medium), with patients 15-17 marked.]

Scatter Plot of V5 vs. V10
[Scatter plot of V5 ChE vs. V10 r-GPT by group, with patients 15-17 marked.]

Scatter Plot of V10 vs. V6
[Scatter plot of V10 r-GPT vs. V6 GOT by group, with patients 15-17 marked.]

[Dotplots for V5 ChE, V6 GOT, and V10 r-GPT by group, with patients 15-17 marked.]
Comparison with Taguchi Approaches
All variables: V1 – V17
MTS: V4, V5, V10, V12, V13, V14, V15, V17
OA Optimal: V1, V4, V5, V10, V11, V14, V15, V16, V17
Optimal: V3, V5, V10, V11, V12, V13, V17
Classification Trees : V5, V6, V10
Disease Level | All | MTS | OA Optimal | Optimal | Trees
Mild   | 7.727   | 13.937  | 8.058  | 13.329 | 7.366
Mild   | 8.416   | 14.726  | 7.485  | 8.616  | 18.789
Mild   | 10.291  | 17.342  | 9.498  | 8.002  | 9.068
Mild   | 7.204   | 10.804  | 4.951  | 12.311 | 6.517
Mild   | 10.590  | 18.379  | 9.367  | 12.042 | 29.864
Mild   | 10.557  | 8.605   | 6.643  | 6.139  | 10.869
Mild   | 13.317  | 13.896  | 7.794  | 6.139  | 10.869
Mild   | 14.812  | 27.910  | 8.162  | 22.666 | 8.222
Mild   | 15.693  | 28.110  | 10.278 | 26.000 | 9.155
Mild   | 18.911  | 35.740  | 20.992 | 14.422 | 16.420
Medium | 12.610  | 20.828  | 16.517 | 20.833 | 42.681
Medium | 12.256  | 18.578  | 14.607 | 19.312 | 38.523
Medium | 19.655  | 34.127  | 35.229 | 44.614 | 86.796
Medium | 43.039  | 85.564  | 13.105 | 32.720 | 28.252
Medium | 78.639  | 74.175  | 9.560  | 28.560 | 208.102
Medium | 97.268  | 104.424 | 29.201 | 31.810 | 228.428
Medium | 135.698 | 123.022 | 44.742 | 57.226 | 199.304
MDs for Unhealthy Group for Various Combinations of Variables
[Dotplots of the MD values by disease level (Mild, Medium) for each combination of variables: All, MTS, OA Optimal, Optimal, Trees; horizontal scale 0 to 250.]
Conclusion
The MD values and dotplots show that only the MD scale based on the variables used by the classification trees, i.e., V5, V6, and V10, does a good job of discriminating between patients with mild-level disease and patients with medium-level disease. (Maybe MD is a good measure for multivariate data.)
Comparison with Medical Analysis
V4, V6, V7, V9, and V10 are crucial for liver disease diagnosis and classification.
Medical diagnosis shows that patients 15-17 exhibit some chronic liver disorder.
Cluster analysis on V4, V6, V7, V9, and V10 yields only two groups; only patients 15-17 are classified as "abnormal." This result is consistent with the medical diagnosis.
Correlations

Variables crucial for medical diagnosis (rows) vs. variables in the classification trees (columns):

      V5      V6      V10
V4    0.501   -0.505  -0.184
V6    -0.370  1       0.507
V7    -0.365  0.905   0.485
V9    -0.305  0.197   0.269
V10   -0.189  0.507   1
[Dotplots for V4 Alb, V7 GPT, and V9 Alp by group (Normal, Mild, Medium), with patients 15-17 marked.]
OA & main-effect analysis do not give the overall optimum.
The MTS discriminant function (S/N ratios) does not separate the two unhealthy groups.
The variables selected by the MTS are not appropriate for detecting liver disease based on medical diagnosis.
Tree methods separate the two unhealthy groups.
MD may be a good distance measure for multivariate data.
Results are based on the current data and training error.
Case Study Summary
Discussions
The MTS ignores considerable previous work in application areas such as medical diagnosis and in classification methods.
The MTS ignores sampling variation and discounts variation between units.
The use of OAs cannot be justified.
The MTS is not a well-defined approach.
Traditional statistical approaches may work better in many cases.
Despite its flaws, we expect the MTS to be used in many companies.
[Scatter plot of V6 GOT vs. V7 GPT by group (Normal, Mild, Medium), with patients 15-17 marked.]
Correlation (V6, V7) = 0.905
[Scatter plot of V12 TCh vs. V14 PL by group, with patients 15-17 marked.]
Correlation (V12, V14) = 0.807

[Scatter plot of V10 r-GPT vs. V11 LAP by group, with patients 15-17 marked.]
Correlation (V10, V11) = 0.646

[Scatter plot of V13 TG vs. V14 PL by group, with patients 15-17 marked.]
Correlation (V13, V14) = 0.616

[Scatter plot of V3 TP vs. V4 Alb by group, with patients 15-17 marked.]
Correlation (V3, V4) = 0.604
A SPC Approach for Business Activity Monitoring
(IIE Transactions, 2006)
W. Jiang, Stevens Institute of Technology
T. Au, AT&T
K.-L. Tsui, Georgia Institute of Technology
A Telecommunication Case Study
A General Framework for Modeling & Monitoring of Dynamic Systems
Dynamic Monitoring (A General Framework)

Problem: Profile
– Time-domain profile
– Profile with controllable predictors
– Profile with uncontrollable predictors

Objective
– Detection/classification
– Interpretation
– Forecasting/prediction

Segmentation
– Known
– Unknown

Model Selection
– Global without segmentation
– Global with segmentation
– Local within segment

Monitoring
– Phase I: estimating unknown parameters
– Phase II: monitoring and detecting
– Anticipated drifts vs. unanticipated changes

Dynamic Update

Actions
Applications
Manufacturing Processes
– Stamping tonnage signal data (functional data)
– Nortel's antenna signal data (functional data)
– Mass Flow Controller (MFC) calibration (linear profile)
– Vertical Density Profile (VDP) data (nonlinear profile)
Service Operations
– Used car price mining and prediction
– Telecom customer usage
– Hotel performance monitoring
– Fast-food drive-through call center forecasting & scheduling
Manufacturing:Stamping Tonnage Signal Data
Figure 2: A Tonnage Signal and Some Possible Faults (Jin and Shi 1999)
Stamping Tonnage Signal Data
Problem: time-domain profile (a tonnage signal represents the stamping force over a process cycle).
Objective: fault detection and classification.
Segmentation & Model Selection: known segmentation; most process faults occur only in specific working stages, and the boundaries and sizes of segments are determined by process knowledge (Jin and Shi 1999). Global model: wavelet transforms.
Monitoring: for each segment, use T² charts based on selected wavelet coefficients (Jin and Shi 2001).
Dynamic Update: classify a new signal as normal, a known fault, or a new fault, and update the wavelet-coefficient selection and parameter estimates (e.g., μ, Σ) using all available data.
Actions: identify and remove assignable causes.
Service: Telecom Customer Usage
Problem: profile with uncontrollable predictors.
Objective: abnormal-behavior detection and classification; forecasting/prediction.
Segmentation & Model Selection: unknown segmentation; segment customers based on demographic, geographic, psychographic, and/or behavioral information, then fit a model for each customer segment, e.g., linear regression.
Monitoring: use the model built for each segment to monitor customer behavior, e.g., monitor the linear-regression parameter vector β using a T² chart.
Dynamic Update: update customer segmentation, segmental model fitting, and/or parameter monitoring, e.g., parameter updates based on known trends.
Actions: service improvement, customer approval, etc.
Telecom. Customer Usage
Profile: profile with uncontrollable predictors
Objective– Abnormal behavior detection and classification– Forecasting/prediction
Segmentation– Unknown (segments are defined by customer information.)Model Selection– segmental (e.g. linear regression on uncontrollable predictors for each segment)
Monitoring – Phase I: unknown control chart parameters estimated from data– Phase II: monitoring by control charts, like T2 chart, EWMA chart, etc.
Actions: service improvement, customer approval, etc.
Dynamic Update– Update segmentation, model selection and/or parameter monitoring
A SPC Approach for Business Activity Monitoring
Jiang, Au, and Tsui (2006), to appear in IIE Transactions
Churn Detection via Customer Profiling
Qian, Jiang, and Tsui (2006), International Journal of Production Research
Activity monitoring: monitoring for interesting events that require action (Fawcett and Provost, 1999). Examples:
Credit card or insurance fraud detection
Churn modeling and detection
Computer intrusion detection
Network performance monitoring

Objective: trigger alarms for action accurately and as quickly as possible once the activity occurs.
Activity Monitoring
Profiling Approach (SPC & hypothesis testing):
Characterize populations of key variables that describe normal activity
Trigger an alarm on activity that deviates from normal

Discriminating Approach (classification):
Establish models & patterns of abnormal activity w.r.t. normal
Apply pattern recognition to identify abnormal activity

Other Approaches:
Hypothesis testing vs. classification
Neural networks for SPC problems (Hwarng et al.)
Apply other classification methods to SPC
DOE for variable selection in discrimination
Detect complex patterns in SPC
Activity Monitoring
The objective of activity monitoring is similar to that of statistical process control (SPC).
Multivariate control-chart methods for continuous and attribute data may be needed.
More sophisticated tools are needed.
Activity Monitoring
STATISTICAL PROCESS CONTROL
Widely used in the manufacturing industry for variation reduction by discriminating between:
Common causes
Assignable causes
Evaluation: in-control vs. out-of-control
Performance: false alarm rate; average run length (ARL)
Techniques: Shewhart chart, EWMA chart, CUSUM chart
STATISTICAL PROCESS CONTROL
Two stages of implementation:
Phase 1 (retrospective): off-line modeling
– Identify and clear outliers
– Estimate in-control models
Phase 2 (prospective): on-line deployment
– Trigger out-of-control conditions
– Isolate and remove causes of signals
AN EXAMPLE
[Shewhart chart, EWMA chart, and CUSUM chart of the same series x over time 0-100.]
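The three charts can be sketched on a synthetic series with a level shift; the parameters below (3-sigma limits, λ = 0.2, k = 0.5, h = 5) are conventional textbook choices, not values from the example.

```python
# Shewhart, EWMA, and CUSUM statistics on a synthetic N(0,1) series that
# shifts to mean 1.5 at time 80; alarms should fire after the shift.
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 80), rng.normal(1.5, 1, 20)])

# Shewhart: flag points outside mu +/- 3 sigma (here mu=0, sigma=1).
shewhart_alarm = np.abs(x) > 3

# EWMA: z_t = (1-lam) z_{t-1} + lam x_t, steady-state 3-sigma limits.
lam = 0.2
z = np.zeros_like(x)
for t in range(1, len(x)):
    z[t] = (1 - lam) * z[t - 1] + lam * x[t]
ewma_limit = 3 * np.sqrt(lam / (2 - lam))
ewma_alarm = np.abs(z) > ewma_limit

# One-sided upper CUSUM: c_t = max(0, c_{t-1} + x_t - k); alarm when c_t > h.
k, h = 0.5, 5.0
c = np.zeros_like(x)
for t in range(1, len(x)):
    c[t] = max(0.0, c[t - 1] + x[t] - k)
cusum_alarm = c > h
```

The contrast these charts illustrate: the Shewhart chart reacts only to large individual deviations, while the EWMA and CUSUM accumulate small, sustained shifts and therefore signal the 1.5-sigma level change much sooner.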
KEY CHALLENGES TO SPC
Off-line modeling: robust models with outliers and change points; automatic model building
Scalability: a single algorithm tracking millions of data streams
Importance of early signals
Interpretation is mostly qualitative; sacrificing accuracy for speed is acceptable
Diagnosis and updating: business rules
Online fashion: incomplete data, censored and/or truncated
SPC Approach for CRM Monitoring
PHASE 1: AUTOMATIC MODELING & PROFILING
PHASE 2: PROFILE MONITORING & UPDATING
PHASE 3: EVENT DIAGNOSIS

CRM MONITORING PROCESS
[Process flow: business event definition → customer profiling → event monitoring and triggering (with profile updating) → small set of interesting customers → customer diagnosis.]
SPC FOR CRM - PHASE 1
Off-line modeling: building customer profiles robustly (time consuming)

Requirements
– A single, time-varying model capturing most customers' behavior
– Automatic modeling, less human intervention

Techniques
– Robust and efficient estimation methods
– Change-point modeling

Parameter Selection
– MSE/AIC/BIC
– Business requirements / domain knowledge
SPC FOR CRM - PHASE 2
On-line customer-profile updating and monitoring, in search of interesting events requiring action

Requirements
– Recursive vs. time window
– Signal accurately and as quickly as possible

Techniques
– Markovian-type updating (storage space & time)
– State-space control models
SPC FOR CRM - PHASE 3
Diagnosis and Re-profiling

Requirements
– Following signals
– Robustness to outliers, trends, ...
– Attribute identification

Techniques
– Bayesian models
– Nonlinear filtering methods
PHASE 1: CUSTOMER PROFILE
Dynamic Linear Model (West and Harrison, 1997)

The profile of customer i at time t comprises:
Size/level $M_t(i)$
Trend $T_t(i)$
Variability/variance $V_t(i)$
Seasonality $S_t(i)$ (optional)

$\{X_t(i)\} \sim P_t(i) = [M_t(i),\, T_t(i),\, V_t(i)]'$
Estimation Methods
Least Squares Estimation (LSE)
Least Absolute Deviation (LAD)
Dummy change-point model with LSE
Dummy change-point model with LAD

LSE and LAD
A DUMMY CHANGE-POINT MODEL

Solve global models assuming dummy change points at lag p:

$a(p) = \arg\min_{a_0,\,a_1} \sum_{k=0}^{p-1} \left[X_{t-k} - (a_0 + a_1 k)\right]^2$

$a(p)$ can be obtained recursively by reversing the DES method with $\lambda = 1$.

Combine the forecasts with exponential weights: $\hat{a}_t = \sum_{p} w_p\, a(p)$.

The local variance can be estimated via bootstrap resampling.
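A minimal sketch of the idea, under our own simplifying assumptions (ordinary least-squares line fits over each candidate window and an assumed geometric weight decay), not the paper's exact recursion:

```python
# Dummy change-point idea: fit a local linear model over the last p
# observations for several candidate windows p (each p is a hypothetical
# change point), then combine the fitted current levels with exponential
# weights. The windows and lambda here are illustrative choices.
import numpy as np

def combined_level(x, windows=(5, 10, 20), lam=0.9):
    x = np.asarray(x, dtype=float)
    fits = []
    for p in windows:
        k = np.arange(p)                     # k = 0 is the most recent point
        y = x[::-1][:p]                      # last p observations, newest first
        coef = np.polynomial.polynomial.polyfit(k, y, 1)
        fits.append(coef[0])                 # intercept = level at current time
    w = lam ** np.arange(len(windows))       # heavier weight on short windows
    return np.dot(w / w.sum(), fits)
```

On a constant series every window agrees, and on a noiseless linear trend every window recovers the latest value exactly, so the combination is only doing work when the windows disagree (i.e., when a change point is plausible).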
PHASE 2: CUSTOMER PROFILE UPDATING AND MONITORING
History data cleaning and profiling
Forecasting
Online monitoring
Markovian updating

One-step forecast: $\hat{M}_{t+1}(i) = M_t(i) + T_t(i)$

Markovian (EWMA-type) updates:

$M_{t+1}(i) = (1-\lambda_M)\,\hat{M}_{t+1}(i) + \lambda_M\,X_{t+1}(i)$
$T_{t+1}(i) = (1-\lambda_T)\,T_t(i) + \lambda_T\,(M_{t+1}(i) - M_t(i))$
$V_{t+1}(i) = (1-\lambda_V)\,V_t(i) + \lambda_V\,(X_{t+1}(i) - M_{t+1}(i))^2$

Signal when $|X_{t+1}(i) - \hat{M}_{t+1}(i)| > K\sqrt{V_t(i)}$.
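The Markovian update and alarm rule can be sketched as below; the smoothing constants and K are illustrative choices, not values from the paper.

```python
# EWMA-type update of a (level, trend, variance) customer profile with a
# K-sigma monitoring rule on the one-step forecast error.
import numpy as np

def update(profile, x_new, lam_m=0.2, lam_t=0.1, lam_v=0.1, K=4.0):
    """profile = (M, T, V); returns (updated profile, alarm flag)."""
    M, T, V = profile
    M_hat = M + T                                    # one-step forecast
    alarm = abs(x_new - M_hat) > K * np.sqrt(V)      # monitoring rule
    M_new = (1 - lam_m) * M_hat + lam_m * x_new      # level update
    T_new = (1 - lam_t) * T + lam_t * (M_new - M)    # trend update
    V_new = (1 - lam_v) * V + lam_v * (x_new - M_new) ** 2   # variability
    return (M_new, T_new, V_new), alarm

profile = (100.0, 0.0, 25.0)               # level 100, no trend, variance 25
profile, alarm = update(profile, 103.0)    # ordinary usage: no alarm
_, big_alarm = update(profile, 160.0)      # far outside the K-sigma band
```

Only the current triple per customer is stored, which is what makes the updating Markovian and cheap enough for millions of profiles.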
Comparisons
Objectives: robustness at Phase 1; sensitivity at Phase 2.

Four methods:
1. LSE
2. LAD
3. Dummy change-point model with LSE
4. Dummy change-point model with LAD
Case Study
Data Mining in Telecommunications Industry
(Source: AT&T, Mastering Data Mining by Berry & Linoff.)
Outline
Background
Dataflows
Business problems
Data
A voyage of discovery
Summary
Telecommunication Industry
~$160 billion per year industry (~$70B long distance & ~$90B local)
100 million+ customers/accounts/lines
>1 billion phone calls per day

Book closing (estimating this month's price/usage/revenue)
Budgeting (forecasting next year's price/usage/revenue)
Segmentation (clustering of usage, growth, ...)
Cross selling (association rules)
Churn (disconnect prediction & tracking)
Fraud (detection of unusual usage time-series behavior)
Each of these problems is worth hundreds of millions of dollars.
Information Sources
[Diagram: a Customer generates orders ("add a phone") into the Ordering System and calls ("make a call") into the Network, which feeds the Billing System. External sources include the FCC, Census, and Dun & Bradstreet. Data streams: competitive Win/Loss/New/No Further Use records; call details/web access; revenue, price; official competitive high-level reports. Latency ranges from real time (network) to daily, delayed monthly (billing), and delayed annually/quarterly (official and external reports).]
(Terabytes of interesting information)
Customer Focus
Telecommunication companies want to meet all the needs of their customers:
Local, long distance, and international voice telephone services
Wireless voice communications
Data communications
Gateways to the Internet
Data networks between corporations
Entertainment services, cable and satellite television

Instead of miles of cable and numbers of switches, customers are becoming the biggest asset of a telephone company.
Dataflows
Customer behavior is in the data.
Over a billion phone calls every day.
A dataflow is a way of visually representing transformations on data.
A dataflow graph consists of nodes and edges: data flows along edges and gets processed at nodes.

A basic dataflow to read a file, uncompress it, and write it out:
Compressed input file (in.z) → uncompress → Uncompressed output file (out.text)
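The same three-node dataflow can be sketched with Python's gzip module; the file names mirror the slide's example, and the input file is created here just for the demo.

```python
# read -> uncompress -> write, as a streaming pipeline rather than
# slurping the whole file into memory.
import gzip
import pathlib

# Create a small compressed input file for the demo.
pathlib.Path("in.z").write_bytes(gzip.compress(b"call detail records\n"))

# The dataflow: data flows from the compressed source, through the
# uncompress node, to the output sink, one chunk at a time.
with gzip.open("in.z", "rb") as src, open("out.text", "wb") as dst:
    for chunk in iter(lambda: src.read(8192), b""):
        dst.write(chunk)
```

Chunked streaming is the point: each node touches a bounded buffer, which is what lets such pipelines run in parallel over terabytes of call detail.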
Why are Dataflows Efficient?
Dataflows dispense with most of the overhead that traditional databases have, like transaction logging, indexes, pages, etc.
Dataflows can be run in parallel, taking advantage of multiple processors and disks.
Dataflows provide a much richer set of transformations than traditional SQL.
Basic Operations in Dataflows
Compress and uncompress
Reformat
Select
Sort
Aggregate and hash aggregate
Merge/Join

These are very important steps: the data is very, very large and needs supercomputer power.
Business Problems
The telecommunication business has shifted from an infrastructure business to a customer business.
Understanding customer behavior becomes critical (market segmentation).
Revenue forecasting, churn prediction, fraud detection, new-business customer identification.
The detailed transaction data contains a wealth of information, but it goes unexploited due to its huge volume.
Important Marketing Questions
Discussions with business users highlight the areas for analysis:
Understanding the behavior of individual customers
Regional differences in calling patterns
High-margin services
Supporting marketing and new sales initiatives
Data
Call detail data
Customer data
Auxiliary files
Call Detail Data
Definition: a call detail record is a single record for every call made over the telephone network.

Three sources of call detail data:
Direct network/switch recordings — switch records: the least clean, but the most informative.
Inputs into the billing system — billing records: cleaner, but not complete.
Data warehouse feeds — rather clean, but limited by the needs of the data warehouse.
Network Call Details
Hundreds of millions of calls a day
>100 bytes per call record (>10 gigabytes per day): originating number, terminating number, day/time of the call, length of the call, type of call, ...
2 years of data online? → statistical compression; >70 billion records (>7 terabytes); currently on tape, batch processing
Real time, low-level details (+++); raw data, massive data processing (−−−)
Key applications: book closing, fraud detection, early warning, ...
Billing Details
Millions of customers/accounts
Tons of other information about the customers/accounts: 100+ services (regular long distance, Digital 1 rate, easylink, Readyline, VTNS, ...), 5 jurisdictions (international, interstate, ...), 50 states, NPA-NXX
24-36 months of message, minute, and revenue data
Length of call, average revenue per minute
~? billion observations
$, detailed +++; dirty, delayed ---
Key applications: budgeting/forecasting, segmentation/clustering
Call Detail Data: Record Format
Important fields in a call detail record include:
from_number
to_number
duration_of_call
start_time
band
service_field
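A minimal sketch of such a record as a Python structure. The field names follow the slide, but the pipe-delimited layout and timestamp format are assumptions for illustration, not AT&T's actual format:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class CallDetail:
    from_number: str
    to_number: str
    duration_of_call: int   # seconds
    start_time: datetime
    band: int               # charging band (local, regional, ...)
    service_field: str

def parse_record(line: str) -> CallDetail:
    """Parse one pipe-delimited call detail record (assumed layout)."""
    frm, to, dur, start, band, svc = line.split("|")
    return CallDetail(frm, to, int(dur),
                      datetime.strptime(start, "%Y-%m-%d %H:%M:%S"),
                      int(band), svc)
```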
Customer Data
Customers can have multiple telephone lines. Customer data is needed to match telephone numbers to information about customers.
Telecommunication companies have made significant investments in building and populating data models for their customers.
Customer Ordering Data
Hundreds of thousands of add/disconnect orders weekly (add a line, disconnect a line, ...)
Tons of other information about the customers/accounts: 4+ order types (Add, Win, Loss, No Further Use), 100+ services, related carrier
Requires minute/revenue estimation/prediction: summarizing the historical usage of a loss/NFU into one number; predicting the future usage of a win/new (growth curve)
5 years online, a few hundred million records
Timely, small volume +++; missing information, massive data integration ---
Major applications: customer churn, early warning, predicting disconnects
Auxiliary Files
ISP access numbers: a list of access numbers of Internet service providers
Fax numbers: a list of known fax machines
Wireless exchanges: a list of exchanges that correspond to mobile carriers
Exchange geography: a list of geographic areas represented by the phone number exchange
International: a list of country codes and the names of the corresponding countries
Discovery
Call duration
Calls by time of day
Calls by market segment
International calling patterns
When are customers at home
Internet service providers
Private networks
Concurrent calls
Broadband customers
Call Duration
Calls by Time of Day
In call detail data, the field band is a number representing how the call should be charged. This provides a breakdown:
local
regional
national
international
fixed-to-mobile
other
unknown
Question: when are different types of calls being made?
Calls by Market Segment
The market segment is a broad categorization of customers:
Residential
Small business
Medium business
Large business
Global
Named accounts
Government
Questions: Are customers within market segments similar to each other? What are the calling patterns between market segments?
Calls by Market Segment: Solution Approach
[Diagram: call detail records (from_number, to_number) are joined with customer data to map each number to its market segment, yielding from_market_segment and to_market_segment]
Results
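The join in the diagram can be expressed as a plain dictionary lookup over the customer data; the segment labels below are illustrative:

```python
from collections import Counter

def segment_flows(calls, customer_segment):
    """Join call detail records (from_number, to_number) with customer
    data to count calls between market segments. Numbers not found in
    the customer data are labeled 'unknown'."""
    flows = Counter()
    for frm, to in calls:
        from_seg = customer_segment.get(frm, "unknown")
        to_seg = customer_segment.get(to, "unknown")
        flows[(from_seg, to_seg)] += 1
    return flows
```

The resulting counts answer the slide's question directly: each cell of the flow matrix is the calling volume from one market segment to another.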
International Calling Patterns
International calls are highly profitable, but highly competitive.
Questions:
Where are calls going to?
How do calling patterns change over time?
How do calling patterns change during the day?
What are the differences between business and consumer usage?
Which customers primarily call one country?
Which customers call a wider variety of international numbers?
When are Customers at Home?
Internet Providers
Questions:
Which customers own modems?
Which Internet service providers (ISPs) are customers using?
Do different segments of customers use different ISPs?
Private NetworksSpecial customers:
Businesses that operate from multiple sites likely make large volumes of phone calls and data transfers between the sites.
Some businesses must exchange large volumes of data with other businesses.
A virtual private network (VPN) is a telephone product designed for this situation. For large volumes of phone calls, it provides less expensive service than pay-by-call service.
Question: Which customers are good candidates for VPN?
Result: a list of businesses that have multiple offices and make phone calls between them.
Concurrent CallsFor businesses having a limited number of outbound lines connected to a large number of extensions, the following questions are of interest:
When does a customer need an additional outside line?
When is the right time to offer upgrades to their phone systems?
One measure of a customer’s need for new lines is the maximum number of lines that are used concurrently.
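The peak number of simultaneous calls can be computed with a standard sweep-line pass over the call start and end times, for example:

```python
def max_concurrent(calls):
    """calls: list of (start, end) times for one customer's calls.
    Returns the peak number of simultaneous calls."""
    events = []
    for start, end in calls:
        events.append((start, 1))    # call begins: one more line in use
        events.append((end, -1))     # call ends: one line freed
    # At equal times, ends (-1) sort before starts (+1), so a call ending
    # exactly when another begins does not count as overlapping it.
    events.sort(key=lambda e: (e[0], e[1]))
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak
```

Running this per customer over a billing period gives the measure described above: if the peak regularly approaches the number of installed lines, the customer is a candidate for an upgrade offer.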
Identify Broad Band Customers
Objective: identify customers who use their telephone lines for data/computer access (potential broadband customers)
Collect a sample of 4000 lines for which voice or data/computer access information is available
Divide into two halves for training and testing
Define hundreds of call behavior variables
Run neural network, logistic regression, and tree models
Key predictive drivers:
length of call (10+ min.)
number of repeat phone calls to the same number (5+)
calls by time of day (at night)
calls by day of the week (weekend)
The neural network performed the best; the tree is the most intuitive.
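The predictive drivers named above (long calls, repeat calls, night and weekend calling) might be derived per line roughly as follows; the tuple layout is a hypothetical simplification of a call record:

```python
from collections import Counter

def behavior_features(calls):
    """Derive call behavior variables for one line.
    calls: list of (to_number, start_hour, weekday, minutes),
    where weekday is 0=Monday ... 6=Sunday."""
    n = len(calls)
    repeats = Counter(to for to, _, _, _ in calls)
    return {
        # fraction of calls lasting 10 minutes or more
        "frac_long": sum(m >= 10 for _, _, _, m in calls) / n,
        # most calls placed to any single number
        "max_repeat": max(repeats.values()),
        # fraction of calls at night (8pm-6am)
        "frac_night": sum(h >= 20 or h < 6 for _, h, _, _ in calls) / n,
        # fraction of calls on the weekend
        "frac_weekend": sum(d >= 5 for _, _, d, _ in calls) / n,
    }
```

Feature dictionaries like this, one per line, would then feed the neural network, logistic regression, and tree models mentioned above.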
Summary
Call detail records contain rich information about customers:
Customer behavior varies from one region of a country to another.
Thousands of companies place calls to ISPs. They own modems and have the ability to respond to web-based marketing.
Residential customers indicate when they are home by using the phone. These patterns can be important, both for customer contact and for customer segmentation.
The market share of ISPs differs by market segment.
International calls show regional variations. The length of calls varies considerably depending on the destination.
International calls made during the evening and early morning are longer than international calls made during the day.
Companies making calls between their different sites are candidates for private networking.
Case Study: Churn Modeling in Wireless Communications
This case study took place at the largest mobile telephone company in a newly developed country. The primary data source is the prototype of an ongoing data warehousing effort. (Source: “Mastering Data Mining” by Berry & Linoff)
Outline
The Wireless Telephone Industry
Three Goals
Approach to Building the Churn Model
Churn Model Building
The Data
Lessons about Churn Model Building
Summary
The Wireless Telephone Industry
The rapid maturing of the wireless market makes the number of churners, and the effect of churn on the customer base, grow significantly. The business shifts away from signing on nonusers and focuses on existing customers. (See Figure 11.2 and Figure 11.3.)
The wireless telephone industry differs from other industries in several ways:
Sole service providers
Relatively high cost of acquisition
No direct customer contact
Little customer mindshare
The handset
Three Goals
Near-term goal: identify a list of probable churners for a marketing intervention. Discussions with the marketing group defined the near-term goal: by the 24th of the month, provide the marketing department with a list of the 10,000 club members most likely to churn.
Medium-term goal: build a churn management application (CMA). Besides running churn models, the CMA also needed to:
Manage models
Provide an environment for data analysis before and after modeling
Import data and transform it into the input for churn models
Export the churn scores developed by the models
Long-term goal: complete customer relationship management
Approach to Building the Churn Model
Define churn: involuntary churn refers to cancellation of a customer's service due to nonpayment; voluntary churn is everything that is not involuntary churn. The model is for the latter.
Inventory available data: a basic set of data includes data from the customer information file, data from the service account file, and data from the billing system.
Build models
Deploy scores: churn scores can be used for marketing intervention campaigns, for prioritizing customers for different campaigns, and for estimating customer longevity when computing estimated lifetime customer value.
Measure the scores against what really happens:
How close are the estimated churn probabilities to the actual churn probabilities?
Are the churn scores "relatively" true, i.e., do higher scores imply higher probabilities?
Churn Model Building
A churn modeling effort necessitates a number of decisions:
The choice of data mining tool: SAS Enterprise Miner Version 2 was used for this project.
Segmenting the model set: three models were built for three segments of customers: club members, non-club members, and recent customers who had joined in the previous eight or nine months.
The final four models on four different segments: to investigate whether customers joining at about the same time have similar reasons for churn, the club model set was split into two segments: customers who joined in the previous two years, and the rest.
Churn Model Building (continued)
Choice of modeling algorithm: decision tree models were used for churn modeling because of their ability to handle hundreds of fields in the data, their explanatory power, and their ease of automation.
This project built six trees for each model set (using Gini and entropy as split functions, and allowing 2-, 3-, and 4-way splits) in order to see which performed best and to have them verify each other.
Three parameters need to be set: the minimum size of a leaf node, the minimum size of a node to split, and the maximum depth of the tree. The resulting tree needs to be pruned.
The size and churner density of the model set: experiments with different model sets show that a model set with 30% churners and 50k records works best. (Table 11.3)
The effect of latency (Figure 11.12)
Translating models in time (Figure 11.13)
The Data
Historical churn rates: the historical churn rate was calculated along different dimensions: handset, demographic, dealer, and ZIP code.
Data at the customer and account level: SSN, ZIP code of residence, market ID, age and gender, pager indication flag, etc.
Data at the service level: activation date and reason, features ordered, billing plan, handset, and dealer, etc.
Billing history data: total amount billed, late charges and amount overdue, all calls, fee-paid services, etc.
Rejecting some variables: variables that cheat, identifiers, categoricals with too many values, absolute dates, and untrustworthy values, etc.
Derived variables
Lessons about Churn Model Building
Finding the most significant variables: handset churn rate, other churn rates, number of phones in use by a customer, low usage
Listening to the business users to define the goals
Listening to the data
Including historical churn rates: the past is the best predictor of the future. For churn, the past is historical churn rates: churn rate by handset, by demographics, by area, and by usage patterns. (Figure 11.17)
Composing the model set: important factors are historical data availability, size, and churner density. (Figure 11.18)
Building a model for the churn management application
Listening to the data to determine model parameters
Understanding the algorithm and the tool
Summary
Four critical success factors for building a churn model:
Defining churn, especially differentiating between interesting churn (such as customers who leave for a competitor) and uninteresting churn (customers whose service has been cut off due to nonpayment).
Understanding how the churn results will be used.
Identifying data requirements for the churn model, being sure to include historical predictors of churn, such as churn rate by handset and churn rate by demographics.
Designing the model set so the resulting models can slide through different time windows and are not obsolete as soon as they are built.
Case Study
Market Basket Analysis: Who Buys Meat at the Health Food Store?
(Source: Mastering Data Mining by Berry & Linoff.)
Purpose
Who buys meat at the health food store?
Understand customer behavior.
DM Tools
Association Rules of Market Basket Analysis.
Customer clustering.
Decision tree.
Customer Analysis: market basket analysis uses the information about what a customer purchases to give us insight into who they are and why they make certain purchases.
Product Analysis: market basket analysis gives us insight into the merchandise by telling us which products tend to be purchased together and which are most amenable to promotion.
Market Basket Analysis
Source: E. Wegman
Given
A database of transactions.
Each transaction contains a set of items.
Find all rules X →Y that correlate the presence of one set of items X with another set of items Y.
Example: When a customer buys bread and butter, they buy milk 85% of the time.
While association rules are easy to understand, they are not always useful:
Useful: on Fridays, convenience store customers often purchase diapers and beer together.
Trivial: customers who purchase maintenance agreements are very likely to purchase large appliances.
Inexplicable: when a new superstore opens, one of the most commonly sold items is light bulbs.
Measures for Market Basket Analysis
Confidence: Probability that right-hand product is present given that the left-hand product is in the basket.
Support: Percentage of baskets that contain both the left-hand side and the right-hand side of the association.
Lift (correlation): Compare the likelihood of finding the right-hand product in a basket known to contain the left-hand product to the likelihood of finding the right-hand product in any random basket.
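These three measures are straightforward to compute from a list of baskets; a minimal sketch:

```python
def rule_measures(baskets, lhs, rhs):
    """Support, confidence, and lift for the rule lhs -> rhs.
    baskets: list of sets of items; lhs, rhs: sets of items."""
    n = len(baskets)
    n_lhs = sum(lhs <= b for b in baskets)              # baskets containing lhs
    n_rhs = sum(rhs <= b for b in baskets)              # baskets containing rhs
    n_both = sum(lhs <= b and rhs <= b for b in baskets)
    support = n_both / n                 # P(lhs and rhs)
    confidence = n_both / n_lhs          # P(rhs | lhs)
    lift = confidence / (n_rhs / n)      # P(rhs | lhs) / P(rhs)
    return support, confidence, lift
```

A lift above 1 means the left-hand items make the right-hand items more likely than chance, which is the "correlation" reading described above.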
Example: "Caviar implies Vodka"
High confidence: given that we know someone bought caviar, the probability that the person buys vodka is very high.
Low support: the percentage of baskets that contain both vodka and caviar is very low, since those products are not widely purchased.
High lift:

lift = Pr(finding Vodka | Caviar is already in the basket) / Pr(finding Vodka in any random basket)
Association Results

Rank | Relation | Lift | Support (%) | Confidence (%) | Rule
1    | 4        | 2.47 | 3.23        | 33.72          | Red pepper -> Yellow pepper & Bananas & Bakery
2    | 3        | 2.24 | 4.75        | 49.21          | Red pepper -> Yellow pepper & Bananas
...  | ...      | ...  | ...         | ...            | ...
50   | 2        | 1.37 | 3.77        | 85.96          | Green peppers -> Bananas
...  | ...      | ...  | ...         | ...            | ...

For the rule "Green peppers -> Bananas": support is low, confidence is high, and lift is high:

lift = Pr(finding Banana | Green pepper is already in the basket) / Pr(finding Banana in any random basket)
Clustering
Variables: gender, meat buying, total spending
• The height of the pies: total spending
• Shaded pie slice: the percentage of people in the cluster who buy meat
• Top row: women; bottom row: men
Customer Clusters
Decision Tree
The most meat-buying branches:
Spend the most money
Buy the largest number of items
Although only about 5% of shoppers buy meat, they are among the most valuable shoppers!
Decision Tree for More about Meat
Conclusion
Data mining can be used to improve shelf placement decisions.
Data Mining can be used to identify a small, but very profitable group of customers.
Case Study
Supermarket Mining: Analyzing Ethnic Purchasing Patterns
(Source: Mastering Data Mining by Berry & Linoff.)
Overview
Describes how a manufacturer learned about ethnic purchasing patterns.
Aimed at Spanish-speaking shoppers in Texas.
Collected data from a supermarket chain in Texas.
Employed data mining tools from MineSet (SGI).
Purpose
Discover whether the data provided revealed any differences between the stores with a high percentage of Spanish-speaking customers and those with fewer.
Compute a Hispanic percentage for each specific item.
Identify which products sell well among Hispanic consumers.
Scatter plot showing variability of Hispanic appeal by category
Data
The data consist of weekly sales figures for products from five basic categories (ready-to-eat cereals, desserts, snacks, main meals, and pancake and variety baking mixes).
Within each category, subcategories were assigned (actual units sold, dollar volume, and equivalent case sales).
For each store: store size, % of Hispanic shoppers, and % of African-American shoppers.
Decode variables that carried more than one piece of information:
HISPLVL and AALEVEL: % of Hispanic and African-American shoppers.
HISPLVL ranges 1-15, where 1 = a store outside San Antonio with 90% or more Hispanic shoppers and 10 = a store with little or no Hispanic shoppers.
Normalize values by sales volume to compare stores of different sizes.
Hispanic score = average values for the most Hispanic stores minus average values for the least Hispanic stores; a large positive value indicates a product that sells much better in the heavily Hispanic stores.
Transformation of Data
The most valuable part of the project was preparing the data and getting familiar with it, rather than running fancy data mining algorithms.
DM Tools
Association rule visualization for Hispanic percentage.
Scatter plot showing which products sell well in Hispanic neighborhoods.
Scatter plot showing variability of Hispanic appeal by category.
Case Study
Supermarket Mining: Transactions & Customer Analysis
(Source: Mastering Data Mining by Berry & Linoff.)
Overview
A collaboration between a manufacturer and one of its retailer chains.
In the grocery market, tasks that usually belong to the retailer are actually performed by a supplier.
Purpose
Effectively use sales data to make the category as a whole more profitable for the retailer.
Identify customer behavior.
Find clusters of customers.
Transaction Detail Fields

FIELD         | DESCRIPTION
Date          | YYYY-MM-DD
Store         | CCCSSSS, where CCC = chain, SSSS = store
Lane          | Lane of transaction
Time          | The time-stamp of the order start time
Customer ID   | The loyalty card number presented by the customer; an ID of 0 means the customer did not present a card
Tender Type   | Payment type, i.e. 1 = cash, 2 = check, ...
UPC           | The universal product code for the item purchased
Quantity      | The total quantity of this item
Dollar Amount | The total $ amount for the quantity of a particular UPC purchased
Universal Product Code
The numbers, encoded as machine-readable bar codes, that identify nearly every product that might be sold in a grocery store.
Organizations:
Uniform Code Council (www.uc-council.org): US and Canada
European Article Numbering Association (www.ean.be): Europe and the rest of the world
In North America the code consists of 12 digits; the code itself fits in 11 digits, and the twelfth is a checksum.
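The UPC-A checksum works by tripling the digits in the odd positions (1st, 3rd, ..., 11th), adding the digits in the even positions, and choosing the check digit that brings the total to a multiple of 10. A small sketch:

```python
def upc_check_digit(first11: str) -> int:
    """Compute the 12th (check) digit of a UPC-A code from the first
    11 digits: 3 * (sum of odd-position digits) + (sum of even-position
    digits) + check digit must be a multiple of 10."""
    digits = [int(c) for c in first11]
    total = 3 * sum(digits[0::2]) + sum(digits[1::2])
    return (10 - total % 10) % 10
```

Verifying this digit on scan is how the register catches a misread bar code before it reaches the transaction file.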
From the transaction detail fields we can calculate:
The percentage of each shopper's total spending that went to that category.
The total number of trips.
The total dollar amount spent for the year, along with the total number of items purchased and the total number of distinct items purchased.
The percentage of the items purchased that carried high, medium, and low profit margins for the store.
Finding Clusters of Customers
Find groups of customers with similar behavior using k-means clustering:
Set a certain number k.
Select k records as candidate cluster centers.
Assign each record to the cluster whose center it is nearest.
Recalculate the centers of the clusters and reassign the records based on their proximity to the new cluster centers.
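The k-means steps above can be sketched in a few lines of plain Python (2-D points, squared Euclidean distance, a fixed number of passes):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: pick k candidate centers, assign each record to
    its nearest center, then recalculate centers and reassign."""
    rnd = random.Random(seed)
    centers = rnd.sample(points, k)          # candidate cluster centers
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # assign each record to the cluster whose center it is nearest
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda i: (p[0] - centers[i][0]) ** 2 +
                                            (p[1] - centers[i][1]) ** 2)
            clusters[i].append(p)
        # recalculate the centers (empty clusters keep their old center)
        for i, c in enumerate(clusters):
            if c:
                centers[i] = (sum(p[0] for p in c) / len(c),
                              sum(p[1] for p in c) / len(c))
    return centers, clusters
```

In the case study each "point" would be a vector of the per-customer summaries computed from the transaction detail fields, not a 2-D coordinate; the 2-D version just keeps the sketch short.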
Main Ways to Use Clusters
To gain insight into customer behavior by understanding what differentiates one cluster from another.
To build further models within each cluster.
To use cluster membership as an additional input variable to other models.
Case Study
Who Gets What? Building a Best Next Offer Model for an Online Bank
(Source: Mastering Data Mining by Berry & Linoff.)
The use of data mining by the online division of a major bank to improve its ability to perform cross selling.
Cross-selling: the activity of selling additional services to the customers you already have.
Outline
Background on the Banking Industry
The Business Problem
The Data
Approach to the Problem
Model Building
Lessons Learned
Background on the Banking Industry
The challenge for today's large banks is to shift their focus from market share to wallet share. That is, instead of merely increasing the number of customers, banks need to increase the profitability of the ones they already have.
Why use data mining?
A bank knows much more about its current customers than about external prospects.
The information gathered on customers in the course of normal business operations is much more reliable than data purchased on external prospects.
The Business Problem
The project had immediate, short-term, and long-term goals:
Long-term: increase the bank's share of each customer's financial business by cross-selling appropriate products.
Short-term: support a direct e-mail campaign for four selected products (brokerage accounts, money market accounts, home equity loans, and a particular type of savings account).
Immediate: take advantage of a data mining platform on loan from SGI to demonstrate the usefulness of data mining for the marketing of online banking services.
The Data
The initial data comprised 1,122,692 account records extracted from the Customer Information System (CIS). Before data mining started, a SAS data set was created containing an enriched version of the extracted data.
From accounts to customers
Defining the products to be offered.
From accounts to customersThe data extracted from the CIS had one row per account, which reflects the usual product-centric organization of a bank where managers are responsible for the profitability of particular products rather than the profitability of customers or households.
The best next offer project required pivoting the data to build customer-centric models. The account-level records from the CIS were transformed into around a quarter million household-level records.
Defining the products to be offered: 45 product types are used for the best next offer model. Of these, 25 products are ones that may be offered to a customer; information on the remaining products is used only as input variables when building the models.
Approach to the Problem
The approach to the problem:
A propensity-to-buy model is built for each product individually, giving each customer a score for the modeled product. The scores for the four products are combined to yield the best next offer model: each customer is offered the product for which they have the highest score.
Comparable scores
How to score?
Pitfalls of this approach
Comparable scores: three requirements are needed to make scores from various product propensity models comparable:
All scores must fall into the same range: zero to one.
Anyone who already has a product should score zero for it.
The relative popularity of products should be reflected in the scores.
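One simple way to satisfy the three requirements, shown only as an illustration (the case study does not prescribe this exact formula): scale the raw score into [0, 1], zero it out for existing holders, and weight it by the product's penetration:

```python
def comparable_score(raw_score, owns_product, product_penetration,
                     max_raw=1.0):
    """Sketch of a comparable propensity score.
    raw_score: model output in [0, max_raw];
    owns_product: True if the customer already has the product;
    product_penetration: fraction of customers holding the product,
    used as a proxy for its relative popularity."""
    if owns_product:
        return 0.0                     # existing holders score zero
    return (raw_score / max_raw) * product_penetration
```

The penetration factor keeps a rarely held product from dominating the best next offer simply because its model produces optimistic raw scores.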
How to score? With a product propensity model, prospects are given a score based on the extent to which they look like the existing account holders for that product. This project used a decision-tree-based approach, which uses the percentage of existing customers at a leaf to assign a score for the product.
This approach can be summed up by the words of Richard C. Cushing: “When I see a bird that walks like a duck and swims like a duck and quacks like a duck, I call that bird a duck.”
Pitfalls of this approach:
Becoming a customer may change people's behavior. The best approach is to build models based on the way current customers looked just before they became customers, but the data for this approach is not easy to get.
Current customers reflect past policy. This can perpetuate "past discrimination".
Model Building
Build an individual propensity model for each product:
Finding important variables
Building a decision tree model
Model performance in a controlled test
Get to a cross-sell model by combining the individual propensity models
Start with brokerage accounts
Finding important variables
Using the column importance tool: find a set of variables which, taken together, do a good job of differentiating the classes (people with brokerage accounts and people without):
Whether they are a private banking customer
The length of time they have been with the bank
The value of certain lifestyle codes assigned to them by Microvision (a marketing statistics company)
Using the evidence classifier: this tool uses the naïve Bayes algorithm to build a predictive model. Naïve Bayes models treat each variable independently and measure its contribution to a prediction; these independent contributions are then combined to make a classification.
Building a decision tree model for brokerage
MineSet's decision tree tool:
Leaves in the tree are either mostly nonbrokerage or mostly brokerage.
Each path through the tree to a leaf containing mostly brokerage customers can be thought of as a "rule" for classifying an unseen customer; customers meeting the conditions of the "rule" are likely to have, or be interested in, a brokerage account.
In our data, only 1.2 percent of customers had brokerage accounts. To improve the model, oversampling was used to increase the percentage of brokerage customers in the model set. The final tree was built on a model set containing about one quarter brokerage accounts.
Record weights in place of oversampling
Allowing one-off splits
Grouping categories
Influencing the pruning decisions
Backfitting the model for comparable scores
Record weights in place of oversampling:
Record weighting can achieve the effect of oversampling by increasing the relative importance of the rare records.
The splitting decision is based on the total weight of records in each class rather than the total number of records.
Instead of increasing the weight of records in the rare class, the proper approach is to lower the weight of records in the common class.
Bringing the weight of the rare records up to 20-25% of the total works well.
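The common-class weight that brings the rare class up to a given fraction of the total weight can be solved for directly; a sketch (keeping the rare-class weight at 1, per the advice above to lower the common class rather than raise the rare one):

```python
def class_weights(n_rare, n_common, target_rare_fraction=0.25):
    """Return (rare_weight, common_weight) such that the rare class
    carries target_rare_fraction of the total weight.
    Solve n_rare = t * (n_rare + w * n_common) for w."""
    t = target_rare_fraction
    common_weight = n_rare * (1 - t) / (t * n_common)
    return 1.0, common_weight
```

With 1.2% brokerage holders (1,200 rare vs 98,800 common records per 100,000), this gives each common record a weight of about 0.036, so the rare class carries exactly a quarter of the total weight.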
Allowing one-off splits:
By default, MineSet's tree-building algorithm splits a categorical variable on every single value, or does not split on it at all.
A one-off split is a split based on a single value of a categorical variable; users can control whether one-off splits are considered through a parameter.
Grouping categories:
By design, MineSet's tree-building algorithm is unlikely to make good splits on a categorical variable taking on hundreds of values.
Some variables rejected by MineSet seemed to be very predictive in some cases; although there were hundreds of values in the data, only a few values of those variables appeared frequently.
The approach is to lump all values below a certain threshold into a catch-all "other" category, and make splits on the more populous ones.
Influencing the pruning decisions: users can control the size, depth, and bushiness of the tree. Good settings: minimum number of records in a node of 50, a pruning factor of 0.1, and no explicit limit on the depth.
Backfitting the model for comparable scores: backfitting runs the original data through the tree. The score for each leaf is based on the percentage of brokerage customers at that leaf; the more brokerage customers at a leaf, the higher the scores the non-brokerage customers at that leaf will get, and the more likely they are to open a brokerage account.
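Backfitting can be sketched as a per-leaf tally over the original (un-oversampled) data: for each leaf, the score is the fraction of records at that leaf that hold the product:

```python
from collections import defaultdict

def backfit_scores(leaf_ids, has_product):
    """leaf_ids[i] is the leaf reached by record i when the original
    data is run through the tree; has_product[i] is 1 if that record
    holds the product, else 0. Returns {leaf: fraction of holders}."""
    counts = defaultdict(int)
    holders = defaultdict(int)
    for leaf, owns in zip(leaf_ids, has_product):
        counts[leaf] += 1
        holders[leaf] += owns
    return {leaf: holders[leaf] / counts[leaf] for leaf in counts}
```

Because the tally uses the original data rather than the oversampled model set, the leaf scores approximate real-population propensities, which is what makes them usable as comparable scores.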
Brokerage model performance in a controlled test
A "high" score is any score higher than the density of brokerage customers in the population; it need not be a large number.
Group    | Size   | Chosen by  | E-mailed | Response rate (%)
Model    | 10,000 | High score | Yes      | 0.7
Control  | 10,000 | Random     | Yes      | 0.3
Hold-out | 10,000 | Random     | No       | 0.05
Getting to a cross-sell model
The propensity models for the rest of the products are built following the same procedure, and the individual propensity models are combined into a cross-sell model to find the best next offer.
[Diagram: one customer's propensity scores for four products, A-D (0.72, 0.47, 0.31, 0.10 among them); the vote goes to the product with the highest score, here B.]
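Combining the comparable scores into a best next offer is then a simple argmax over products; the illustrative scores below mirror the example figure:

```python
def best_next_offer(scores):
    """scores: mapping of product -> comparable propensity score.
    Returns the product with the highest score, i.e. the best next
    offer for this customer."""
    return max(scores, key=scores.get)
```

In deployment this runs once per customer over the products they do not already hold (existing holdings score zero, so they can never win the vote).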
Summary of the Procedure
Determine whether cross-selling makes sense.
Determine whether sufficient data exists to build a good cross-sell model.
Build propensity models for each product individually.
Combine individual propensity models to construct a cross-sell model.
Lessons Learned
Before building customer-centric models, data needs to be transformed from product-centric to customer-centric.
Having a particular product may change a customer's behavior. The best way to handle this is to build models based on behavior before the product was bought.
The current composition of the customer population is largely a reflection of past marketing policy.
Oversampling and record weighting can be used to handle rare events.
References
Berry & Linoff, Mastering Data Mining, Wiley, 2000.
Han & Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2001.
Hastie, Tibshirani, & Friedman, The Elements of Statistical Learning, Springer-Verlag, 2001.
Taguchi & Jugulum, The Mahalanobis-Taguchi Strategy, Wiley, 2002.