Association Rule By Kenneth Leung


Association Rule

By Kenneth Leung


Data Mining

The process of extracting valid, previously unknown, comprehensible, and actionable information from large databases, and using it to make crucial business decisions.

Make decisions based on previous experience or observation.


Association Rule Mining

Formal: To find interesting associations and/or correlation relationships among large sets of data items. Association rules show attribute value conditions that occur frequently together in a given dataset.

Informal: “If – Then” relationship. If this happens, what is most likely to happen next?

Obesity => Diabetes


Market Basket Analysis

A typical and widely-used example of association rule mining.

Example: Data are collected using bar-code scanners in supermarkets. Each record consists of all items in a single purchase transaction. Managers would be interested to know if certain groups of items are consistently purchased together. They could use this data for adjusting store layouts (placing items optimally with respect to each other), for cross-selling, for promotions, for catalog design, and to identify customer segments based on buying patterns.
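A minimal sketch of the "purchased together" idea, assuming each transaction is stored as a set of item names. The baskets below are invented for illustration and the helper name `pair_counts` is hypothetical; the sketch only counts how often each pair of items shares a basket.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, invented for illustration only.
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "chips"},
    {"milk", "diapers", "bread"},
]

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives every pair one canonical order, e.g. ("beer", "diapers")
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

counts = pair_counts(transactions)
for pair, n in counts.most_common(3):
    print(pair, n)
```

Pairs with high counts are the candidates a manager might look at first. Real basket data would need an algorithm that avoids enumerating every pair, which this sketch does not attempt.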


Famous & Interesting Finding

Beer & Diapers

“A number of convenience store clerks noticed that men often bought beer at the same time they bought diapers. The store mined its receipts and proved the clerks' observations correct. So, the store began stocking diapers next to the beer coolers, and sales skyrocketed.”


Why Beer and Diapers?

Moms are stressed out by their naughty babies, and they need some beer for relief?

Diaper boxes are good for holding old beer bottles: very environmentally friendly and easy to handle?


Two Certainty Indices

Determine whether a rule is good

Support of AR: percentage of transactions that contain X and Y (X and Y are two items)

Confidence of AR: Ratio of number of transactions that contain X and Y to the number that contain X

The higher these values, the more reliable the rule.
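The two definitions above can be sketched directly in code, assuming each transaction is a set of items. The function names and the sample baskets are hypothetical, and returning 0.0 when X never occurs is an assumption made here to avoid dividing by zero.

```python
def support(transactions, x, y):
    """Fraction of all transactions that contain both x and y."""
    both = sum(1 for t in transactions if x in t and y in t)
    return both / len(transactions)

def confidence(transactions, x, y):
    """Of the transactions that contain x, the fraction that also contain y."""
    with_x = [t for t in transactions if x in t]
    if not with_x:
        # Assumption: a rule whose X never occurs gets confidence 0.0.
        return 0.0
    return sum(1 for t in with_x if y in t) / len(with_x)

# Hypothetical baskets, invented for illustration only.
baskets = [
    {"beer", "diapers"},
    {"beer"},
    {"diapers", "milk"},
    {"beer", "diapers", "milk"},
]

print(support(baskets, "beer", "diapers"))
print(confidence(baskets, "beer", "diapers"))
```

Support is symmetric in X and Y, while confidence is not: swapping the two items changes the denominator.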


Example: Support

A supermarket has 100,000 transactions.
2,000 of the 100,000 transactions include beer.
800 of those 2,000 transactions also contain diapers.
Support for the rule “beer -> diapers” is 800 transactions, or 800/100,000 = 0.008, or 0.8%.


Example: Confidence

A supermarket has 100,000 transactions.
2,000 of the 100,000 transactions include beer.
800 of those 2,000 transactions also contain diapers.
Confidence for the rule “beer -> diapers” is 800/2,000 = 0.4, or 40%.
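The support and confidence examples use the same three counts, so the arithmetic can be checked in a few lines. The variable names are chosen here for illustration; the numbers are the ones from the slides.

```python
total_transactions = 100_000  # all transactions in the supermarket
beer_transactions = 2_000     # transactions that include beer
beer_and_diapers = 800        # transactions with both beer and diapers

# Support divides by ALL transactions; confidence divides by beer transactions only.
support = beer_and_diapers / total_transactions
confidence = beer_and_diapers / beer_transactions

print(f"support    = {support:.1%}")     # 0.8%
print(f"confidence = {confidence:.1%}")  # 40.0%
```

The numerator is the same in both cases; only the denominator differs.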


Full example from Wiki

Rules:
1. {Cold, Raining} => No
2. {Calm, Dry} => Yes
3. {Dry} => No
4. {Windy} => No

1. {Cold, Raining} => No
Support: 2/5 = 40%
Confidence: 2/2 = 100%
=> Good

2. {Calm, Dry} => Yes
Support: 2/5 = 40%
Confidence: 2/2 = 100%
=> Good

3. {Dry} => No
Support: 1/5 = 20%
Confidence: 1/3 = 33.3%
=> Bad

4. {Windy} => No
Support: 0/5 = 0%
Confidence: 1/1 = 100%
=> Bad
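One common way to turn the two indices into a Good/Bad verdict is to require a minimum value for each. The slides do not state any cutoffs, so the thresholds below are assumptions picked only because they reproduce the four verdicts above; the support and confidence values are copied from the slides as given.

```python
# Assumed cutoffs; the slides do not state the thresholds they used.
MIN_SUPPORT = 0.3
MIN_CONFIDENCE = 0.8

# (rule, support, confidence) as listed on the slides.
rules = [
    ("{Cold, Raining} => No", 2 / 5, 2 / 2),
    ("{Calm, Dry} => Yes", 2 / 5, 2 / 2),
    ("{Dry} => No", 1 / 5, 1 / 3),
    ("{Windy} => No", 0 / 5, 1 / 1),
]

def label(support, confidence):
    """A rule is 'Good' only if it clears both thresholds."""
    ok = support >= MIN_SUPPORT and confidence >= MIN_CONFIDENCE
    return "Good" if ok else "Bad"

labels = [label(s, c) for _, s, c in rules]
for (name, s, c), verdict in zip(rules, labels):
    print(f"{name}: support={s:.0%}, confidence={c:.1%} -> {verdict}")
```

Rule 4 shows why both indices are needed: high confidence alone does not make a rule useful when its support is too low.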


References

http://www.resample.com/xlminer/help/Assocrules/associationrules_intro.htm

http://en.wikipedia.org/wiki/Association_rule_learning

Dr Sin-Min Lee’s lecture 30