
Association Rules (market basket analysis)

Retail shops are often interested in associations between different items that people buy.

• Someone who buys bread is quite likely also to buy milk.

• A person who bought the book Database System Concepts is quite likely also to buy the book Operating System Concepts.

Association information can be used in several ways.

• E.g. when a customer buys a particular book, an online shop may suggest associated books.

Association rules:

bread $\Rightarrow$ milk

DB-Concepts, OS-Concepts $\Rightarrow$ Networks

• Left hand side: antecedent, right hand side: consequent

• An association rule must have an associated population; the population consists of a set of instances

• E.g. each transaction (sale) at a shop is an instance, and the set of all transactions is the population

Association Rule Definitions

Set of items: $I = \{I_1, I_2, \dots, I_m\}$

Transactions: $D = \{t_1, t_2, \dots, t_n\}$, $t_j \subseteq I$

Itemset: $\{I_{i_1}, I_{i_2}, \dots, I_{i_k}\} \subseteq I$

Support of an itemset: Percentage of transactions which contain that itemset.

Large (Frequent) itemset: Itemset whose number of occurrences is above a threshold.
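A minimal Python sketch of the two definitions above. The transaction list is hypothetical, invented for illustration (chosen to agree with the counts quoted in the example later on: 5 transactions, 4 containing Bread, 3 containing both Bread and PeanutButter); the names `support` and `is_large` and the 40% threshold are likewise assumptions. Support is returned as a percentage, and an itemset that exactly reaches the threshold is counted as large here, a common convention; use `>` instead of `>=` if "above" is meant strictly.

```python
# Hypothetical transactions over I = {Beer, Bread, Jelly, Milk, PeanutButter};
# invented for illustration, not taken from the original slides.
TRANSACTIONS = [
    {"Bread", "Jelly", "PeanutButter"},
    {"Bread", "PeanutButter"},
    {"Bread", "Milk", "PeanutButter"},
    {"Beer", "Bread"},
    {"Beer", "Milk"},
]


def support(itemset, transactions):
    """Support of an itemset: percentage of transactions that contain it."""
    items = set(itemset)
    hits = sum(1 for t in transactions if items <= t)  # items is a subset of t
    return 100.0 * hits / len(transactions)


def is_large(itemset, transactions, min_support):
    """Large (frequent) itemset test; reaching the threshold counts as large."""
    return support(itemset, transactions) >= min_support


print(support({"Bread", "PeanutButter"}, TRANSACTIONS))  # 60.0
print(is_large({"Beer", "Milk"}, TRANSACTIONS, 40.0))    # False (support 20%)
```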

Association Rules Example

I = { Beer, Bread, Jelly, Milk, PeanutButter}

Association Rule Definitions

Association Rule (AR): implication $X \Rightarrow Y$ where $X, Y \subseteq I$ and $X \cap Y = \emptyset$ (the null set);

Support of AR ($s$) $X \Rightarrow Y$: Percentage of transactions that contain $X \cup Y$

Confidence of AR ($\alpha$) $X \Rightarrow Y$: Ratio of the number of transactions that contain $X \cup Y$ to the number that contain $X$

Association Rules Ex (cont’d)

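Support of Bread $\Rightarrow$ PeanutButter: of the 5 transactions, 3 involve both Bread and PeanutButter, so support = 3/5 = 60%.

Confidence of Bread $\Rightarrow$ PeanutButter: of the 4 transactions that involve Bread, 3 also involve PeanutButter, so confidence = 3/4 = 75%.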
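The two computations above can be sketched as follows, assuming hypothetical transactions (the slides' own transaction table did not survive, so this data is invented to match the stated counts). `count`, `rule_support` and `rule_confidence` are illustrative names, not part of any library.

```python
# Hypothetical transactions, invented to match the counts on the slide.
TRANSACTIONS = [
    {"Bread", "Jelly", "PeanutButter"},
    {"Bread", "PeanutButter"},
    {"Bread", "Milk", "PeanutButter"},
    {"Beer", "Bread"},
    {"Beer", "Milk"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def rule_support(x, y, transactions):
    """Support of X => Y: fraction of transactions containing X ∪ Y."""
    return count(set(x) | set(y), transactions) / len(transactions)


def rule_confidence(x, y, transactions):
    """Confidence of X => Y: count(X ∪ Y) / count(X)."""
    n_x = count(x, transactions)
    if n_x == 0:
        return 0.0  # X never occurs; treated as 0 by convention in this sketch
    return count(set(x) | set(y), transactions) / n_x


s = rule_support({"Bread"}, {"PeanutButter"}, TRANSACTIONS)
c = rule_confidence({"Bread"}, {"PeanutButter"}, TRANSACTIONS)
print(f"support = {s:.0%}, confidence = {c:.0%}")
```

It prints `support = 60%, confidence = 75%`, the figures worked out by hand above.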

Association Rule Problem

Given a set of items $I = \{I_1, I_2, \dots, I_m\}$ and a database of transactions $D = \{t_1, t_2, \dots, t_n\}$ where $t_i = \{I_{i_1}, I_{i_2}, \dots, I_{i_k}\}$ and $I_{i_j} \in I$, the Association Rule Problem is to identify all association rules $X \Rightarrow Y$ with a minimum support and confidence (supplied by the user).

NOTE: Support of $X \Rightarrow Y$ is the same as support of $X \cup Y$.

Association Rule Algorithm (Basic Idea)

1. Find Large Itemsets.

2. Generate rules from frequent itemsets.

This is the simple naïve algorithm; better algorithms exist.

Association Rule Algorithm

We are generally only interested in association rules with reasonably high support (e.g. support of 2% or greater)

Naïve algorithm

1. Consider all possible sets of relevant items.

2. For each set find its support (i.e. count how many transactions purchase all items in the set).

• Large itemsets: sets with sufficiently high support

• Use large itemsets to generate association rules.

• From itemset $A$ generate the rule $A - \{b\} \Rightarrow b$ for each $b \in A$.

• Support of rule = support($A$).

• Confidence of rule = support($A$) / support($A - \{b\}$).
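A sketch of the naive algorithm's first phase, assuming hypothetical transactions and a support threshold given as a fraction (0.4 here, chosen only so the toy data yields a few large itemsets). The function name `naive_large_itemsets` is an assumption. It literally considers every non-empty set of items, so with m items it counts 2^m - 1 candidates (31 for the 5 items below), which is why better algorithms exist.

```python
from itertools import combinations

# Hypothetical transactions, invented for illustration.
TRANSACTIONS = [
    {"Bread", "Jelly", "PeanutButter"},
    {"Bread", "PeanutButter"},
    {"Bread", "Milk", "PeanutButter"},
    {"Beer", "Bread"},
    {"Beer", "Milk"},
]


def naive_large_itemsets(transactions, min_support):
    """Naive algorithm: consider every non-empty set of items, count its
    support, and keep the sets whose support fraction is >= min_support."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    large = {}
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            cset = frozenset(candidate)
            # Count the transactions that purchase all items in the set.
            sup = sum(1 for t in transactions if cset <= t) / n
            if sup >= min_support:
                large[cset] = sup
    return large


for itemset, sup in naive_large_itemsets(TRANSACTIONS, 0.4).items():
    print(sorted(itemset), f"{sup:.0%}")
```

On this data, with a 40% threshold, the large itemsets are {Beer}, {Bread}, {Milk}, {PeanutButter} and {Bread, PeanutButter}.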


Let's say itemset $A$ = {Bread, Butter, Milk}.

Then $A - \{b\} \Rightarrow b$ for each $b \in A$ gives 3 possible rules:

{Bread, Butter} $\Rightarrow$ Milk

{Bread, Milk} $\Rightarrow$ Butter

{Butter, Milk} $\Rightarrow$ Bread
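The rule-generation step for a single itemset might look like this. The five transactions are hypothetical, made up so that the three rules get different confidences; `rules_from_itemset` is an illustrative name. Since confidence is a ratio of two supports over the same transaction set, the code divides raw counts, which gives the same value.

```python
# Hypothetical transactions, invented so the three confidences differ.
TRANSACTIONS = [
    {"Bread", "Butter", "Milk"},
    {"Bread", "Butter", "Milk"},
    {"Bread", "Butter"},
    {"Bread", "Butter"},
    {"Milk"},
]


def count(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def rules_from_itemset(itemset, transactions):
    """From itemset A, build A - {b} => b for each b in A.

    Returns (antecedent, consequent, support, confidence) tuples, where
    support = support(A) and confidence = support(A) / support(A - {b}).
    """
    a = frozenset(itemset)
    count_a = count(a, transactions)
    if count_a == 0:
        raise ValueError("itemset never occurs; expected a large itemset")
    rules = []
    for b in sorted(a):
        lhs = a - {b}
        if not lhs:
            continue  # a 1-itemset has no non-empty antecedent
        # Every transaction containing A also contains lhs, so count(lhs) > 0.
        confidence = count_a / count(lhs, transactions)
        rules.append((lhs, b, count_a / len(transactions), confidence))
    return rules


for lhs, b, s, c in rules_from_itemset({"Bread", "Butter", "Milk"}, TRANSACTIONS):
    print(f"{sorted(lhs)} => {b}: support = {s:.0%}, confidence = {c:.0%}")
```

For this data it reports {Butter, Milk} => Bread and {Bread, Milk} => Butter with 100% confidence and {Bread, Butter} => Milk with 50%, all with support 40%.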

Apriori

Large Itemset Property:

Any subset of a large itemset is large.

Contrapositive:

If an itemset is not large,

none of its supersets are large.

Large Itemset Property

If B is not frequent, then none of the supersets of B can be frequent.

If {ACD} is frequent, then all subsets of {ACD} ({AC}, {AD}, {CD}) must be frequent.

If {ACD} is frequent, then all of its single-item subsets ({A}, {C}, {D}) must be frequent.
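One common way to exploit the large itemset property is level-wise candidate generation, as in the Apriori algorithm. The sketch below is a simplified version under stated assumptions: the transactions are hypothetical, the support threshold is a fraction, and the join step simply unions any two large (k-1)-itemsets whose union has k items, rather than the prefix-based join usually described for Apriori. The prune step is where the contrapositive is used: a candidate with a non-large (k-1)-subset is discarded without counting its support.

```python
from itertools import combinations

# Hypothetical transactions, invented for illustration.
TRANSACTIONS = [
    {"Bread", "Jelly", "PeanutButter"},
    {"Bread", "PeanutButter"},
    {"Bread", "Milk", "PeanutButter"},
    {"Beer", "Bread"},
    {"Beer", "Milk"},
]


def apriori(transactions, min_support):
    """Level-wise search for large itemsets using the large itemset property."""
    n = len(transactions)

    def support(cset):
        return sum(1 for t in transactions if cset <= t) / n

    large = {}
    # Level 1: large single items.
    level = set()
    for item in sorted(set().union(*transactions)):
        cset = frozenset([item])
        s = support(cset)
        if s >= min_support:
            level.add(cset)
            large[cset] = s
    k = 2
    while level:
        # Join: combine large (k-1)-itemsets whose union has exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: if any (k-1)-subset is not large, the candidate cannot be large.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        # Count support only for the surviving candidates.
        level = set()
        for c in candidates:
            s = support(c)
            if s >= min_support:
                level.add(c)
                large[c] = s
        k += 1
    return large


for itemset, sup in apriori(TRANSACTIONS, 0.4).items():
    print(sorted(itemset), f"{sup:.0%}")
```

With a 40% threshold it returns the same five large itemsets that checking all 31 possible itemsets would give on this data, but it never counts a 3-itemset, because no 2-itemset other than {Bread, PeanutButter} is large.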

My Personal View of Association Rules

Vastly overstudied problem, of dubious utility

Student Presentations

Starting next week, students will be giving presentations.

Presentations can be on:

The student project

A paper chosen by the student (subject to my approval)

The presentation should last 8 to 15 minutes. You need to tell me in advance how long the talk will be.

You must email me the slides by midnight before the talk.

There will be a signup sheet (topic and date) on my door tomorrow.

Tips for Giving a Good Talk

Winter 2003

Dr Eamonn Keogh

Computer Science & Engineering Department

University of California - Riverside

Riverside, CA

Modified from the notes of Edward R. Tufte, Craig S. Kaplan, Eamonn Keogh and others

Outline

Advice on giving talks

• General advice
• Organization
• Making clear overheads
• Avoiding common pitfalls

Conclusion

• Show up early. You may have a chance to head off some technical or ergonomic problem.

• Have a backup plan. If your lecture is based on a PowerPoint presentation, have overhead backups of each page.

• Check out the room ahead of time. Before your talk, check out the room, and make sure it has everything you need.

General Advice I

• Never apologize. Most people wouldn’t have noticed the issues for which you’re apologizing, and it just sounds lame.

• Invest in a laser pointer. They are inexpensive, and are extremely useful.

• Rehearse timing. This is the most common sin!!!

General Advice II

Overheads I

• Use large fonts. Use the biggest fonts realistically possible. Small fonts are hard to read

• Use highly contrasting colors.

• Avoid busy backgrounds. Too much in the background makes the text hard to read

Overheads II

• Avoid using red text. Red text is often hard to read.

• AVOID ALL CAPS! All caps look like you're shouting.

• Include a good combination of words, pictures, and graphics. A variety keeps the presentation interesting.

Overheads III

• Be Terse

• Write “Sales are up.” rather than “The sales forecasts show an increase on the horizon.”

• Use bullets or numbered items appropriately

Goals
• Ease of use
• Reusability
• Reliability

Outline of our method
1. Design
2. Implementation
3. Testing

Overheads IV

• Begin with an introduction slide (Who you are, why you are giving a talk, the title of the talk)

• Next, give an outline (“roadmap”). For a short talk, you might want to combine this with the above

• State your point (one simple slide)
• Demonstrate your point (a few slides)
• Review your point (one simple slide)

Overheads V

• End with a slide that reviews the entire talk…

• We introduced the TSP problem
• We explained why it is an important problem
• We explained why it is a hard problem
• We introduced a new heuristic to solve TSP
• We empirically demonstrated the utility of our approach

• End “cleanly”, don’t fade away.

Overheads VI

• Avoid using “standard” clip art, backgrounds, etc.

I have seen this at least 20 times in conference presentations.

Overheads VII

• Be careful with Acronyms…

(Slide figure: an example labeled with undefined symbols and acronyms such as C_max, C_min, Range_i, Diameter_i, R1, D1, R2, D2 and “Neighboring Unlabeled Token”.)

Annoying Personal Habits I (This means you)

• Playing with jewelry
• Licking and/or biting your lips
• Constantly adjusting your glasses
• Popping the top of a pen
• Playing with facial hair (men)
• Playing with/twirling your hair (women)

Annoying Personal Habits II (This means you)

• Jingling change in your pocket
• Leaning against anything for support
• Fillers: “ah”, “um”, and “and”
• Starting every sentence with the same word
• Sticky floor syndrome
• Avoiding eye contact
• Lack of enthusiasm

“Basically” and “essentially” seem to be the current favorites.

Conclusion

• We have motivated the need for a high quality talk

• We have seen various tips on creating high quality overheads

• We have seen various hints on avoiding common pitfalls