Chapter 6: Segmentation

Page 1: Chapter 6: Segmentation

1

Chapter 6: Segmentation

6.1 Introduction

6.2 Cluster Segmentation

6.3 Market Basket Analysis

6.4 Recommended Reading

Page 2: Chapter 6: Segmentation

2

Chapter 6: Segmentation

6.1 Introduction

6.2 Cluster Segmentation

6.3 Market Basket Analysis

6.4 Recommended Reading

Page 3: Chapter 6: Segmentation

3

Objectives

Define pattern discovery.

Name some of the statistical and analytical techniques that are useful for pattern discovery.

Page 4: Chapter 6: Segmentation

4

Pattern Discovery

The Essence of Data Mining?

“…the discovery of interesting, unexpected, or valuable structures in large data sets.”

– David Hand

Page 5: Chapter 6: Segmentation

5

Pattern Discovery

“If you’ve got terabytes of data, and you’re relying on data mining to find interesting things in there for you, you’ve lost before you’ve even begun.”

– Herb Edelstein

Page 6: Chapter 6: Segmentation

6

Pattern Discovery

Are there demographic characteristics to identify people who are more likely to preorder books at a premium price point?

What types of people are most likely to be at the food court on a Saturday afternoon? Is that a good time to have a promotional activity for children (and their parents) or for teens?

What sorts of complaints are most common for different call centers?

If a customer bought product A this week, what is that customer most likely to buy next?

Page 7: Chapter 6: Segmentation

7

Pattern Discovery Caution

Poor data quality

Opportunity

Intervention

Separability

Obviousness

Nonstationarity

Page 9: Chapter 6: Segmentation

9

Pattern Discovery Applications

Data reduction

Novelty detection

Profiling

Market basket analysis

Sequence analysis

Page 10: Chapter 6: Segmentation

10

Pattern Discovery Tools

In this chapter, you learn two techniques for unsupervised pattern discovery:

Cluster Segmentation and Profiling

Market Basket Analysis, Sequence Analysis

Page 11: Chapter 6: Segmentation

11

Chapter 6: Segmentation

6.1 Introduction

6.2 Cluster Segmentation

6.3 Market Basket Analysis

6.4 Recommended Reading

Page 12: Chapter 6: Segmentation

12

Objectives

Describe several examples of segmentation.

Explain k-means clustering.

Explain the Ward method in SAS Enterprise Miner.

Perform cluster segmentation and generate profiles of the segments using SAS Enterprise Miner.

Page 13: Chapter 6: Segmentation

13

Unsupervised Classification

Unsupervised classification: grouping of cases based on similarities in input values

[Figure: cases plotted by their inputs and grouped into cluster 1, cluster 2, and cluster 3]

Page 14: Chapter 6: Segmentation

14

Segmentation for Customer Types

You want to identify segments. While you have thousands of customers, there are really only a handful of major types into which most of your customers can be grouped.

Bargain hunter

Man/woman on a mission

Impulse shopper

Weary parent

DINK (dual income, no kids)

Page 15: Chapter 6: Segmentation

15

Segmentation for Fraud Detection

Most fraudulent customer activity is difficult to identify by a single variable. Are there unusual combinations of behaviors that can help identify criminal activity or fraud?

Spending $250.00 on shoes is not unusual.

An online purchase by Dan Kelly is not unusual.

Purchases in New York by Dan Kelly are not unusual, although Dan lives in Raleigh.

Dan Kelly buying $250.00 in shoes online while he is in New York: that is unusual. Fraud alert!

Page 16: Chapter 6: Segmentation

16

Segmentation for Store Location

You want to open new grocery stores in the U.S. based on demographics. Where should you locate the following types of new stores?

low-end budget grocery stores

small boutique grocery stores

large full-service supermarkets

Page 17: Chapter 6: Segmentation

17

Classifying Fashion Trends

Based on the four styles of pants that your customers can purchase, can you identify stores as serving similar fashion types?

country-club dresser fashion trendsetter comfort kick-back dresser

Page 18: Chapter 6: Segmentation

18

k-Means Clustering Algorithm

[Figure: training data]

1. Select inputs.

2. Select k cluster centers.

3. Assign cases to closest center.

4. Update cluster centers.

5. Reassign cases.

6. Repeat steps 4 and 5 until convergence.
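
The six steps of the k-means algorithm can be sketched in a few lines of Python. This is a minimal illustration, not the SAS Enterprise Miner implementation. The function name `kmeans`, the toy `points`, the use of Euclidean distance, and the choice of the first k cases as starting centers are all assumptions made for the example.

```python
import math


def kmeans(points, k, max_iter=100):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    # Step 2: use the first k cases as starting centers (an assumption;
    # real tools choose seeds more carefully).
    centers = list(points[:k])
    labels = None
    for _ in range(max_iter):
        # Steps 3 and 5: assign every case to its closest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Step 6: stop once no case changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: move each center to the mean of its assigned cases.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels


# Step 1: the two inputs here are hypothetical, already on the same scale.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centers, labels = kmeans(points, k=2)
print(labels)
```

Both starting centers fall in the same group in this toy data, so the first pass produces a poor split. The update and reassign steps then pull the centers apart, which is the behavior the repeated steps are meant to show.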

Page 33: Chapter 6: Segmentation

33

Segmentation Analysis

When no clusters exist, use the k-means algorithm to partition cases into contiguous groups.

Training Data

Page 34: Chapter 6: Segmentation

34

6.01 Poll

If you ask SAS Enterprise Miner to recover five clusters but there are not five distinct groups in the data, you do not get a five-cluster solution. You only get as many clusters as there are true groupings to find in the data.

Yes

No

Page 35: Chapter 6: Segmentation

35

6.01 Poll – Correct Answer

No. As the Segmentation Analysis slide notes, the k-means algorithm partitions cases into groups even when no clusters exist.

Page 36: Chapter 6: Segmentation

36

What Value of k to Use

The number of seeds, k, typically translates to the final number of clusters that are obtained. The choice of k can be made using a variety of methods.

Subject-matter knowledge (There are most likely five groups.)

Convenience (It is convenient to market to three to four groups.)

Constraints (You have six products and need six segments.)

Arbitrarily (Always pick 20.)

Based on the data (Ward’s method)

Page 38: Chapter 6: Segmentation

38

Ward’s Method in SAS Enterprise Miner

Ward’s method is an algorithm for hierarchical cluster analysis.

In this method, each observation starts as its own cluster, and the clusters are joined hierarchically. At each step, the two clusters that are merged are those whose union adds the least to the total within-cluster variation.

Based on a statistical analysis, the number of clusters is selected.

This number of clusters is used for k-means cluster analysis.
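
The merge rule in Ward’s method can be illustrated with a short Python sketch. It uses the standard Ward merge cost: merging clusters of sizes n_a and n_b increases the within-cluster sum of squares by n_a·n_b/(n_a+n_b) times the squared distance between their means. The function names and the toy `points` are hypothetical, and the brute-force search is meant for readability rather than speed.

```python
def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    n_a, n_b = len(a), len(b)
    mean_a = [sum(col) / n_a for col in zip(*a)]
    mean_b = [sum(col) / n_b for col in zip(*b)]
    sq_dist = sum((x - y) ** 2 for x, y in zip(mean_a, mean_b))
    return n_a * n_b / (n_a + n_b) * sq_dist


def ward_merges(points):
    """Merge clusters one pair at a time; return (clusters left, merge cost)."""
    clusters = [[p] for p in points]  # each observation starts as its own cluster
    history = []
    while len(clusters) > 1:
        # Pick the pair whose merge adds the least within-cluster variation.
        pairs = [
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        i, j = min(pairs, key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        cost = ward_cost(clusters[i], clusters[j])
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
        history.append((len(clusters), cost))
    return history


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
history = ward_merges(points)
print(history)  # [(3, 0.5), (2, 0.5), (1, 100.0)]
```

The jump in merge cost when going from two clusters to one suggests that two clusters describe this toy data well. SAS Enterprise Miner uses a statistical criterion for the same purpose.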

Page 39: Chapter 6: Segmentation

39

Ward’s Method in SAS Enterprise Miner

SAS Enterprise Miner uses an empirical approach to select the number for k, based on a preliminary analysis using Ward’s clustering in three steps:

1. Preliminary k-means clustering on original data to save many cluster centroids

2. Ward’s hierarchical clustering on saved cluster centroids to determine the ideal value for k

3. k-means clustering on the original data set using k from step 2

Page 40: Chapter 6: Segmentation

40

Step 1

Many seeds (by default, 50) are chosen from the original training data, and an initial k-means clustering is performed. The means (centroids) of the 50 preliminary clusters are saved to a data set and input to step 2.

Page 41: Chapter 6: Segmentation

41

Step 2

Ward’s method performs hierarchical clustering on the preliminary clusters (the centroids saved in step 1). At each step (k clusters, k-1 clusters, k-2 clusters, and so on), the cubic clustering criterion statistic (CCC) is saved to a data set. The final number of clusters is selected based on the CCC with the following conditions:

The final number of clusters must be greater than or equal to the minimum number of clusters specified in the Selection Criteria properties.

The final number of clusters must have a CCC greater than the CCC threshold in the Selection Criteria properties.
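
The two eligibility conditions can be expressed as a simple filter. The sketch below is illustrative only. The CCC values are invented, the names `select_num_clusters`, `min_clusters`, and `ccc_threshold` are hypothetical, and choosing the eligible candidate with the largest CCC is an assumption made here, not the documented SAS Enterprise Miner rule.

```python
def select_num_clusters(ccc_by_k, min_clusters, ccc_threshold):
    """Return the eligible number of clusters with the largest CCC, or None."""
    eligible = {
        k: ccc
        for k, ccc in ccc_by_k.items()
        if k >= min_clusters and ccc > ccc_threshold  # the two stated conditions
    }
    if not eligible:
        return None
    # Assumption: prefer the candidate with the highest CCC.
    return max(eligible, key=eligible.get)


# Hypothetical CCC values saved for each candidate number of clusters.
ccc_by_k = {2: -1.2, 3: 0.8, 4: 3.5, 5: 2.9, 6: 1.1}
chosen = select_num_clusters(ccc_by_k, min_clusters=3, ccc_threshold=1.0)
print(chosen)  # 4
```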

Page 42: Chapter 6: Segmentation

42

Step 3

The number of clusters determined in step 2 provides the value for k in a k-means clustering of the original training data set.

Ideally, the number of clusters should correspond to a peak in the CCC statistic.

When there is no peak in the CCC, the resulting number of clusters might be suspect.

When the CCC for the selected k is negative, the resulting number of clusters might be suspect.

Page 43: Chapter 6: Segmentation

43

6.02 Multiple Choice Poll

You should use a clustering solution that corresponds to the _____________ of the CCC.

a. maximum

b. minimum

Page 44: Chapter 6: Segmentation

44

6.02 Multiple Choice Poll – Correct Answer

a. maximum. Ideally, the number of clusters corresponds to a peak in the CCC statistic.

Page 45: Chapter 6: Segmentation

45

Grocery Store Case Study

Analysis goal:

Where should you open new grocery store locations? Group geographic regions into segments based on income, household size, and population density.

Analysis plan:

Select and transform segmentation inputs.

Select the number of segments to create.

Create segments with the Cluster tool.

Interpret the segments.

Page 46: Chapter 6: Segmentation

46

Segmenting Census Data

Grocery Store Case Study

Task: Use tools and techniques in SAS Enterprise Miner for cluster and segmentation analysis.

Page 47: Chapter 6: Segmentation

47

Idea Exchange

Do any of the segments seem to map onto the types of stores that the grocery store company is considering (budget, small boutique, large full-service supermarket)?

Explore different numbers of clusters for the solution. Do your conclusions change?

Page 48: Chapter 6: Segmentation

48

Bank Marketing Segmentation Case Study

Analysis goal:

Who is the best target for a cross-sell/up-sell campaign? A consumer bank wants to segment its customers based on historic usage patterns to identify those who might benefit from new product offerings.

Analysis plan:

1. Perform cluster analysis.

2. Select the number of segments to create.

3. Interpret the segments.

4. Deploy the segmentation rules with scoring code.

Page 49: Chapter 6: Segmentation

49

Accessing and Assaying the Data

Bank Marketing Segmentation Case Study

Task: Use tools and techniques in SAS Enterprise Miner for cluster and segmentation analysis.

Page 50: Chapter 6: Segmentation

50

Idea Exchange

In the examples from this course, you have performed cluster analysis with a small number of variables. However, in real applications, it is common that there are many variables you could use in clustering. Cluster analysis does not perform well with a large number of variables, as it becomes increasingly difficult to detect differences among groups as the number of variables increases.

Consider an example in which you might use many variables, such as questionnaire items, demographics, and purchasing behavior.

What are some strategies you would take to reduce from a large number of variables to something more manageable?

Page 51: Chapter 6: Segmentation

51

Exercise

This exercise reinforces the concepts discussed previously.

Page 52: Chapter 6: Segmentation

52

Chapter 6: Segmentation

6.1 Introduction

6.2 Cluster Segmentation

6.3 Market Basket Analysis

6.4 Recommended Reading

Page 53: Chapter 6: Segmentation

53

Objectives

Describe several examples where association analysis is useful.

Distinguish between two types of association analysis: market basket analysis and sequence analysis.

Define support and confidence in the context of association analysis.

Perform market basket analysis and sequence analysis in SAS Enterprise Miner.

Page 54: Chapter 6: Segmentation

54

Market Baskets for Grocery Groupings

A classic application of market basket analysis addresses this question: Which items are likely to be purchased together?

If product A and product B often go together, then placing a more expensive alternative to B near the display for A can create an up-sell opportunity.

If products A and B are often purchased together, putting them on sale at different times can drive purchases continually.

Page 55: Chapter 6: Segmentation

55

Market Baskets for Hardware

A hardware store has 25 shopping aisles. Which products should be grouped near one another?

Key-cutting near paint or near door hardware?

Lawn ornaments near garden or near indoor decorative ornaments?

Page 56: Chapter 6: Segmentation

56

Sequence Analysis for Training

Related to market basket analysis is sequence analysis, which looks at which items follow one another over time. This can create opportunities for best-next-offer campaigns.

After a student takes the SAS Programming 2 course, which course is most likely to be next?

After a student takes the Statistics 1 course and the programming certification exam, which course is most likely to be next?
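
A best-next-offer question of this kind reduces to counting what follows a given item in ordered histories. The sketch below is a minimal illustration. The course histories are invented, and the function name `next_item_counts` is hypothetical.

```python
from collections import Counter


def next_item_counts(histories, item):
    """Count what immediately follows `item` across ordered histories."""
    counts = Counter()
    for history in histories:
        # Walk each history as consecutive (current, following) pairs.
        for current, following in zip(history, history[1:]):
            if current == item:
                counts[following] += 1
    return counts


# Hypothetical course histories, one list per student, in the order taken.
histories = [
    ["Programming 1", "Programming 2", "Macro"],
    ["Programming 1", "Programming 2", "SQL"],
    ["Programming 2", "Macro"],
    ["Statistics 1", "Programming 2", "Macro"],
]
counts = next_item_counts(histories, "Programming 2")
best_next, times = counts.most_common(1)[0]
print(best_next, times)  # Macro 3
```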

Page 57: Chapter 6: Segmentation

57

Market Basket Analysis

Rules:

X → Y = “X implies Y”

C → A = “Given C, how often does A occur?”

A → C = “Given A, how often does C occur?”

Strength of association is measured by support and confidence.

Transactions: {A, B, C}, {A, C, D}, {B, C, D}, {A, D, E}, {B, C, E}

Page 58: Chapter 6: Segmentation

58

Market Basket Analysis

Support(A → B) = (transactions containing every item in A and B) / (all transactions)

Page 59: Chapter 6: Segmentation

59

Market Basket Analysis

Confidence(A → B) = (transactions containing every item in A and B) / (transactions containing the items in A)

Page 60: Chapter 6: Segmentation

60

Market Basket Analysis

Rule         Support   Confidence
A → D        2/5       2/3
C → A        2/5       2/4
A → C        2/5       2/3
B & C → D    1/5       1/3

Transactions: {A, B, C}, {A, C, D}, {B, C, D}, {A, D, E}, {B, C, E}
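
The support and confidence values for the five transactions can be checked with a short Python sketch. The function names `support` and `confidence` are chosen for this example. `Fraction` reduces 2/4 to 1/2, so the value for C → A prints in lowest terms.

```python
from fractions import Fraction

# The five transactions from the slide.
transactions = [set("ABC"), set("ACD"), set("BCD"), set("ADE"), set("BCE")]


def support(lhs, rhs):
    """Share of all transactions that contain every item in lhs and rhs."""
    both = set(lhs) | set(rhs)
    hits = sum(1 for t in transactions if both <= t)
    return Fraction(hits, len(transactions))


def confidence(lhs, rhs):
    """Share of transactions containing lhs that also contain rhs."""
    lhs_items = set(lhs)
    both = lhs_items | set(rhs)
    lhs_hits = sum(1 for t in transactions if lhs_items <= t)
    both_hits = sum(1 for t in transactions if both <= t)
    return Fraction(both_hits, lhs_hits)


for lhs, rhs in [("A", "D"), ("C", "A"), ("A", "C"), ("BC", "D")]:
    print(lhs, "->", rhs, support(lhs, rhs), confidence(lhs, rhs))
```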

Page 61: Chapter 6: Segmentation

61

Implication?

[Table: Savings Account (No/Yes) by Checking Account (No/Yes). Row totals: savings No = 4,000; savings Yes = 6,000; grand total = 10,000.]

Support(SVG → CK) = 50%

Confidence(SVG → CK) = 83%

Expected Confidence(SVG → CK) = 85%

Lift(SVG → CK) = 0.83/0.85 < 1
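
The lift calculation can be reproduced from counts. In the sketch below, only the totals of 10,000 customers and 6,000 savings-account holders come from the slide. The counts `both` and `checking` are back-calculated from the stated 50% support and 85% expected confidence, so they are assumptions consistent with the slide, not figures read from the table.

```python
total = 10_000     # all customers (from the slide)
savings = 6_000    # customers with a savings account (from the slide)
both = 5_000       # assumed: 50% support x 10,000 customers
checking = 8_500   # assumed: 85% expected confidence x 10,000 customers

support = both / total                    # 0.50
confidence = both / savings               # about 0.83
expected_confidence = checking / total    # 0.85
lift = confidence / expected_confidence   # about 0.98

print(round(support, 2), round(confidence, 2), round(lift, 2))
```

A lift below 1 means that savings-account holders are slightly less likely to have a checking account than customers overall, even though the confidence of 83% looks high on its own.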

Page 62: Chapter 6: Segmentation

62

Barbie Doll → Candy

1. Put them closer together in the store.

2. Put them far apart in the store.

3. Package candy bars with the dolls.

4. Package Barbie + candy + poorly selling item.

5. Raise the price on one, and lower it on the other.

6. Offer Barbie accessories for proofs of purchase.

7. Do not advertise candy and Barbie together.

8. Offer candies in the shape of a Barbie doll.

Page 63: Chapter 6: Segmentation

63

Data Capacity

Page 64: Chapter 6: Segmentation

64

Banking Services Case Study

Analysis goal:

Explore associations between retail banking services used by customers.

Analysis plan:

Create an association data source.

Run an association analysis.

Interpret the association rules.

Run a sequence analysis.

Interpret the sequence rules.

Page 65: Chapter 6: Segmentation

65

Performing Association Analysis: Market Basket Analysis

Banking Services Case Study

Task: Perform market basket analysis on the banking data.

Page 66: Chapter 6: Segmentation

66

Idea Exchange

Based on the findings from the bank data market basket analysis, what are some business decisions you might recommend? List five possible actionable decisions from the analysis.

Page 67: Chapter 6: Segmentation

67

Performing Association Analysis: Sequence Analysis

Banking Services Case Study

Task: Perform sequence analysis on the banking data.

Page 68: Chapter 6: Segmentation

68

Idea Exchange

Consider the actionable decisions that you discussed for market basket analysis. Based on the findings from the bank data sequence analysis and your understanding of the order in which products tend to occur together, how would you update those decisions?

Page 69: Chapter 6: Segmentation

69

Pattern Discovery Tools: Review

Generate clusters and perform segmentation using automatic settings and with user-defined settings.

Compare within-segment distributions of selected inputs to overall distributions. This helps you understand segment definition.

Conduct market basket and sequence analysis on transactions data. The data source must have one target variable, one ID variable, and (if desired) one sequence variable.
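
The profiling step, comparing each segment with the overall population, can be sketched as follows. The sketch compares means only, which is a simplification of comparing full distributions. The segment labels and income values are invented for illustration.

```python
from statistics import mean

# Hypothetical cases: (segment label, household income in thousands).
cases = [(1, 30), (1, 34), (1, 32), (2, 80), (2, 90), (3, 55), (3, 53)]

overall = mean(income for _, income in cases)

# Difference between each segment's mean income and the overall mean.
profile = {}
for segment in sorted({seg for seg, _ in cases}):
    seg_mean = mean(income for seg, income in cases if seg == segment)
    profile[segment] = seg_mean - overall

for segment, diff in profile.items():
    print(segment, round(diff, 1))
```

Segments whose mean sits far from the overall mean on an input are the ones that input helps to define.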

Page 70: Chapter 6: Segmentation

70

Idea Exchange

Think about products that you purchase together. Name several pairs or groups of items that are often purchased together, or behaviors that tend to occur together. Now suppose that these combinations of products are common. What actionable business decisions could be made knowing these associations?

Name several pairs or groups of items that you purchase in sequence, or behaviors that you engage in sequentially. Now suppose that these sequences of behaviors are common. What actionable business decisions could be made knowing these sequences?

Page 71: Chapter 6: Segmentation

71

Exercise

This exercise reinforces the concepts discussed previously.

Page 72: Chapter 6: Segmentation

72

Chapter 6: Segmentation

6.1 Introduction

6.2 Cluster Segmentation

6.3 Market Basket Analysis

6.4 Recommended Reading

Page 73: Chapter 6: Segmentation

73

Recommended Reading

Gulati, Ranjay. “Inside Best Buy’s Customer-Centric Strategy.” Harvard Business Review blogs. April 12, 2010.

http://blogs.hbr.org/hbsfaculty/2010/04/inside-best-buys-customer-cent.html

Best Buy has implemented a customer segmentation approach that has set the company apart from its competition. This blog provides a summary of Best Buy’s customer-centric approach driven by analytics.

Page 74: Chapter 6: Segmentation

74

Recommended Reading

May, Thornton. 2010. The New Know: Innovation Powered by Analytics. New York: Wiley. Chapters 4 and 5.

Further discussion of analysts in the workplace, the importance of relationships, and the analysis of social network data.

Page 75: Chapter 6: Segmentation

75

Recommended Reading

Ketchen, David J. and Christopher L. Shook. 1996. “The Application of Cluster Analysis in Strategic Management Research: An Analysis and Critique.” Strategic Management Journal 17(6):441-458. Available on JSTOR: www.jstor.org/stable/2486927

Optional reading