Association Mining
Data Mining
Spring 2012
Transactional Database
• Transactional database – a collection of transactions
• Transaction – a row in the database, e.g. {Eggs, Cheese, Milk}

Transactional dataset:
TID 1: Eggs, Cheese, Milk
TID 2: Milk, Jam
TID 3: Cheese, Bacon, Eggs, Cat food
TID 4: Butter, Bread
TID 5: Bread, Butter, Eggs, Milk, Cheese
Items and Itemsets
• Item – a single element, e.g. {Milk}, {Cheese}, {Bread}
• Itemset – a set of items, e.g. {Milk}, {Milk, Cheese}, {Bacon, Bread, Milk}
• An itemset doesn't have to appear in the dataset
• Can be of size 1 to n
The Support Measure
Support(X) = (# of transactions containing X) / (total # of transactions in the database)

Support Examples
Support({Eggs}) = 3/5 = 60%
Support({Eggs, Milk}) = 2/5 = 40%
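The support computation can be sketched in Python (a minimal illustration; the `support` helper is my own rendering of the formula, and the dataset literal mirrors the slide's table):

```python
# Transactional dataset from the slides, one set per transaction.
dataset = [
    {"Eggs", "Cheese", "Milk"},
    {"Milk", "Jam"},
    {"Cheese", "Bacon", "Eggs", "Cat food"},
    {"Butter", "Bread"},
    {"Bread", "Butter", "Eggs", "Milk", "Cheese"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

print(support({"Eggs"}, dataset))          # 3/5 = 0.6
print(support({"Eggs", "Milk"}, dataset))  # 2/5 = 0.4
```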
Minimum Support
Minsup – The minimum support threshold for an itemset to be considered frequent (User defined)
Frequent itemset – an itemset in a database whose support is greater than or equal to minsup.
Support(X) > minsup = frequent
Support(X) < minsup = infrequent
Minimum Support Examples Minimum support = 50% Support({Eggs}) = 3/5 = 60% Pass
Support({Eggs, Milk}) = 2/5 = 40% Fail
Transactional dataset
Eggs Cheese Milk
Milk Jam
Cheese Bacon Eggs Cat food
Butter Bread
Bread Butter Eggs Milk Cheese
Association Rules
An association rule X => Y is an implication between two disjoint itemsets X and Y.
Confidence(X => Y) = sup(X ∪ Y) / sup(X)

Confidence Example 1
{Eggs} => {Bread}
Confidence = sup({Eggs, Bread}) / sup({Eggs})
Confidence = (1/5) / (3/5) ≈ 33%
Confidence Example 2
{Milk} => {Eggs, Cheese}
Confidence = sup({Milk, Eggs, Cheese}) / sup({Milk})
Confidence = (2/5) / (3/5) ≈ 67%
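Confidence reduces to two support calls (a small sketch, assuming Python; `confidence` is a hypothetical helper name):

```python
# Confidence of X => Y over the slides' transactional dataset.
dataset = [
    {"Eggs", "Cheese", "Milk"},
    {"Milk", "Jam"},
    {"Cheese", "Bacon", "Eggs", "Cat food"},
    {"Butter", "Bread"},
    {"Bread", "Butter", "Eggs", "Milk", "Cheese"},
]

def support(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(x, y, transactions):
    """conf(X => Y) = sup(X ∪ Y) / sup(X)."""
    return support(x | y, transactions) / support(x, transactions)

print(confidence({"Eggs"}, {"Bread"}, dataset))           # (1/5)/(3/5), about 0.33
print(confidence({"Milk"}, {"Eggs", "Cheese"}, dataset))  # (2/5)/(3/5), about 0.67
```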
Strong Association Rules
Minimum confidence (Minconf) – a user-defined lower bound on confidence.
Strong association rule – a rule X => Y whose confidence is greater than or equal to minconf; this is a potentially interesting rule for the user.
Conf(X => Y) ≥ minconf → strong
Conf(X => Y) < minconf → uninteresting
Minimum Confidence Example
Minconf = 50%
{Eggs} => {Bread}: Confidence = (1/5) / (3/5) ≈ 33% → Fail
{Milk} => {Eggs, Cheese}: Confidence = (2/5) / (3/5) ≈ 67% → Pass
Association Mining
Association mining finds strong rules contained in a dataset, built from frequent itemsets.
It can be divided into two major subtasks:
1. Finding frequent itemsets
2. Rule generation

• Some algorithms map items to letters or numbers
• Numbers are more compact
• Easier to make comparisons
Transactional Database Revisited
Transactional dataset (items mapped to numbers):
TID 1: 1, 2, 3
TID 2: 3, 5
TID 3: 2, 7, 1, 4
TID 4: 6, 8
TID 5: 8, 6, 1, 3, 2
Basic Set Logic
Subset – an itemset X is a subset of an itemset Y if every item of X is contained in Y (X ⊆ Y).
Superset – an itemset Y is a superset of X if Y contains every item of X (Y ⊇ X).
Example: X = {1,2}, Y = {1,2,3,5} → X ⊆ Y.
Apriori
Arranges the database into a temporary lattice structure to find associations.
Apriori principle:
1. Itemsets in the lattice with support < minsup can only produce supersets with support < minsup.
2. The subsets of a frequent itemset are always frequent.
Prunes infrequent itemsets from the lattice structure using minsup, which:
• Reduces the number of comparisons
• Reduces the number of candidate itemsets
Monotonicity
Monotone (upward closed) – a measure is monotone if, for X a subset of Y, measure(X) cannot exceed measure(Y).
Anti-monotone (downward closed) – a measure is anti-monotone if, for X a subset of Y, measure(Y) cannot exceed measure(X).
Support is anti-monotone; Apriori uses this property to prune the lattice structure.
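Anti-monotonicity of support can be checked numerically (a small sketch, assuming Python; the sample dataset and the chosen chain of supersets are arbitrary):

```python
dataset = [{1, 2, 3, 4, 5}, {2, 4}, {1, 2, 4}, {1, 4}]

def support(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

# A chain of supersets: {1} ⊆ {1,2} ⊆ {1,2,4}.
chain = [{1}, {1, 2}, {1, 2, 4}]
sups = [support(s, dataset) for s in chain]
print(sups)  # [0.75, 0.5, 0.5]: support never increases along the chain
```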
Itemset Lattice
Lattice Pruning
Lattice Example
Transactional dataset:
TID 1: 1, 2, 3, 4, 5
TID 2: 2, 4
TID 3: 1, 2, 4
TID 4: 1, 4

Count occurrences of each 1-itemset in the database and compute their support: Support = #occurrences / #rows in db. Prune anything with support less than minsup = 30%.
Count occurrences of each 2-itemset in the database and compute their support. Prune anything with support less than minsup = 30%.
Count occurrences of the last 3-itemset ({1,2,4}) in the database and compute its support. Prune anything with support less than minsup = 30%.
Example - Results
Frequent itemsets: {1}, {2}, {4}, {1,2}, {1,4}, {2,4}, {1,2,4}
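The lattice example can be verified by brute force, enumerating every candidate itemset instead of using Apriori's pruning (a quick check, assuming Python):

```python
from itertools import combinations

dataset = [{1, 2, 3, 4, 5}, {2, 4}, {1, 2, 4}, {1, 4}]
minsup = 0.30

items = sorted(set().union(*dataset))
frequent = []
for k in range(1, len(items) + 1):
    for cand in combinations(items, k):
        # Support = #occurrences / #rows in db, as in the slides.
        sup = sum(1 for t in dataset if set(cand) <= t) / len(dataset)
        if sup >= minsup:
            frequent.append(cand)

print(frequent)  # [(1,), (2,), (4,), (1, 2), (1, 4), (2, 4), (1, 2, 4)]
```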
Apriori Algorithm

Frequent Itemset Generation
1. Minsup = 70%
2. Generate all 1-itemsets
3. Calculate the support for each itemset
4. Determine whether or not each itemset is frequent

Transactional Database:
TID 1: 1, 2, 3, 4, 5
TID 2: 2, 3, 5
TID 3: 1, 3, 5
TID 4: 1, 5

Itemset  Support  Frequent
{1}      75%      Yes
{2}      50%      No
{3}      75%      Yes
{4}      25%      No
{5}      100%     Yes
Frequent Itemset Generation
Generate all 2-itemsets from the frequent 1-itemsets, minsup = 70%:
{1} ∪ {3} = {1,3}, {1} ∪ {5} = {1,5}, {3} ∪ {5} = {3,5}

Itemset  Support  Frequent
{1,3}    50%      No
{1,5}    75%      Yes
{3,5}    75%      Yes
Frequent Itemset Generation
Generate all 3-itemsets from the frequent 2-itemsets, minsup = 70%:
{1,5} ∪ {3,5} = {1,3,5}

Itemset  Support  Frequent
{1,3,5}  50%      No
Frequent Itemset Results
All frequent itemsets generated are output:
{1} , {3} , {5}
{1,5} , {3,5}
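The level-wise search above can be sketched compactly (assuming Python; this is a simplified candidate join, not an optimized Apriori implementation):

```python
from itertools import combinations

def apriori(dataset, minsup):
    """Level-wise frequent-itemset search with Apriori pruning."""
    n = len(dataset)
    def sup(c):
        return sum(1 for t in dataset if c <= t) / n
    items = sorted(set().union(*dataset))
    level = [frozenset([i]) for i in items]
    frequent = []
    while level:
        level = [c for c in level if sup(c) >= minsup]  # keep only frequent
        frequent.extend(level)
        # Join step: combine frequent k-itemsets into (k+1)-candidates,
        # keeping only those whose k-subsets are all frequent (Apriori principle).
        freq = set(level)
        cands = set()
        for a in level:
            for b in level:
                u = a | b
                if len(u) == len(a) + 1 and all(
                    frozenset(s) in freq for s in combinations(u, len(a))
                ):
                    cands.add(u)
        level = sorted(cands, key=sorted)
    return frequent

dataset = [{1, 2, 3, 4, 5}, {2, 3, 5}, {1, 3, 5}, {1, 5}]
result = apriori(dataset, 0.70)
print([sorted(s) for s in result])  # [[1], [3], [5], [1, 5], [3, 5]]
```

Note that the candidate {1,3,5} is never counted: its subset {1,3} is infrequent, so the join step rejects it.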
Apriori Rule Mining
Rule combinations:
1. From the 2-itemset {1,2}:
{1} => {2}, {2} => {1}
2. From the 3-itemset {1,2,3}:
{1} => {2,3}, {2,3} => {1}, {1,2} => {3}, {3} => {1,2}, {1,3} => {2}, {2} => {1,3}
Strong Rule Generation
1. I = {{1}, {3}, {5}}
2. Rules = X => Y
3. Minconf = 80%
Strong Rules Results
All strong rules generated are output:
{1} => {5} (confidence 75%/75% = 100%)
{3} => {5} (confidence 75%/75% = 100%)
The reversed rules {5} => {1} and {5} => {3} fail, each with confidence 75%/100% = 75% < minconf.
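Rule generation from the frequent itemsets can be sketched as follows (assuming Python; `strong_rules` is a hypothetical helper that enumerates every split X => Y of each frequent itemset):

```python
from itertools import combinations

def strong_rules(frequent, dataset, minconf):
    """Generate rules X => Y from each frequent itemset of size >= 2."""
    n = len(dataset)
    def sup(c):
        return sum(1 for t in dataset if c <= t) / n
    rules = []
    for fs in frequent:
        if len(fs) < 2:
            continue
        for k in range(1, len(fs)):
            for x in combinations(sorted(fs), k):
                x = frozenset(x)
                y = fs - x
                conf = sup(fs) / sup(x)  # conf(X => Y) = sup(X ∪ Y) / sup(X)
                if conf >= minconf:
                    rules.append((x, y, conf))
    return rules

dataset = [{1, 2, 3, 4, 5}, {2, 3, 5}, {1, 3, 5}, {1, 5}]
frequent = [frozenset(s) for s in ({1}, {3}, {5}, {1, 5}, {3, 5})]
rules = strong_rules(frequent, dataset, 0.80)
for x, y, conf in rules:
    print(sorted(x), "=>", sorted(y), f"{conf:.0%}")
# [1] => [5] 100%
# [3] => [5] 100%
```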
Other Frequent Itemsets
Closed frequent itemset – a frequent itemset X that has no immediate superset with the same support count as X.
Maximal frequent itemset – a frequent itemset none of whose immediate supersets are frequent.
Itemset Relationships
Maximal Frequent Itemsets ⊆ Closed Frequent Itemsets ⊆ Frequent Itemsets (shown as nested sets on the slide).
Targeted Association Mining
• Users may only be interested in specific results
• Potential for smaller, faster, and more focused results
• Examples:
1. A user wants to know how often bread and garlic cloves occur together.
2. A user wants to know which items occur with toilet paper.
Itemset Trees
• Itemset tree – a data structure that helps users query for a specific itemset and its support.
• Items within a transaction are mapped to integer values and ordered so that each transaction is in lexical order, e.g. {Bread, Onion, Garlic} = {1, 2, 3}.
• Why use numbers?
– They make the tree more compact
– Numbers follow ordering easily
Itemset Trees
An itemset tree T contains:
• A root pair (I, f(I)), where I is an itemset and f(I) is its count.
• A (possibly empty) set {T1, T2, . . . , Tk}, each element of which is an itemset tree.
• If an itemset Ij is contained in the root's itemset I, it is also contained in each of the root's children.
• If Ij is not contained in the root, it may still appear in the root's children if:
first_item(I) ≤ first_item(Ij) and last_item(I) < last_item(Ij)
Building an Itemset Tree
Let ci be a node in the itemset tree, and let I be a transaction from the dataset.
Loop:
Case 1: ci = I
– increment ci's count.
Case 2: ci is a child of I
– make I the parent node of ci.
Case 3: ci and I share a lexical overlap (a common prefix), e.g. {1,2,4} vs. {1,2,6}
– make a node for the overlap and make I and ci its children.
Case 4: ci is a parent of I
– loop to check ci's children; otherwise make I a child of ci.
Note: {2,6} and {1,2,6} do not have a lexical overlap (the overlap must be a common prefix).
Itemset Trees - Creation
Dataset:
{2, 4}
{1, 2, 3, 5}
{3, 9}
{1, 2, 6}
{2}
{2, 9}
(The slides animate the construction one transaction at a time: {2,4}, {1,2,3,5}, and {3,9} are inserted as child nodes; {1,2,6} triggers a lexical-overlap node; {2} is inserted as a parent node; and {2,9} is added as a child node.)
Itemset Trees – Querying
Let I be an itemset, let ci be a node in the tree, and let totalSup be the total count for I in the tree.
For all ci such that first_item(ci) ≤ first_item(I):
Case 1: if I is contained in ci, add ci's count to totalSup.
Case 2: if I is not contained in ci and last_item(ci) < last_item(I), proceed down the tree.
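The creation cases and the query walk can be combined into a runnable sketch (assuming Python; the `Node` layout and helper names are my own simplifications, not the original itemset-tree implementation):

```python
class Node:
    def __init__(self, itemset, count=0):
        self.itemset = tuple(itemset)  # kept in lexical order
        self.count = count             # transactions in this node's subtree
        self.children = []

def common_prefix(a, b):
    k = 0
    while k < len(a) and k < len(b) and a[k] == b[k]:
        k += 1
    return a[:k]

def insert(node, t):
    """Insert transaction t (a sorted tuple) below node, whose itemset is a prefix of t."""
    node.count += 1
    if node.itemset == t:                        # Case 1: equal, count incremented above
        return
    for child in list(node.children):
        c = child.itemset
        if c == t[:len(c)]:                      # Case 4: child is a prefix of t, descend
            insert(child, t)
            return
        if t == c[:len(t)]:                      # Case 2: t is a prefix of child, t becomes parent
            parent = Node(t, child.count + 1)
            parent.children.append(child)
            node.children.remove(child)
            node.children.append(parent)
            return
        ov = common_prefix(c, t)
        if len(ov) > len(node.itemset):          # Case 3: proper lexical overlap, make overlap node
            over = Node(ov, child.count + 1)
            over.children = [child, Node(t, 1)]
            node.children.remove(child)
            node.children.append(over)
            return
    node.children.append(Node(t, 1))             # otherwise: new child of node

def query(node, q):
    """Total support of itemset q (a sorted tuple) in the subtree under node."""
    total = 0
    for child in node.children:
        c = child.itemset
        if c[0] <= q[0]:
            if set(q) <= set(c):                 # Case 1: q contained, count whole subtree
                total += child.count
            elif c[-1] < q[-1]:                  # Case 2: q might appear deeper
                total += query(child, q)
    return total

root = Node(())
for t in [(2, 4), (1, 2, 3, 5), (3, 9), (1, 2, 6), (2,), (2, 9)]:
    insert(root, t)

print(query(root, (2,)))    # 5, as in querying example 1
print(query(root, (2, 9)))  # 1, as in querying example 2
```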
Example 1

Itemset Trees - Querying
Querying Example 1:
Query: {2}
totalSup = 0
1. At node {2}: 2 = 2, so {2} is contained. Add its count: totalSup = 3.
2. At node {1,2}: {1,2} contains 2. Add its count: totalSup = 3 + 2 = 5.
3. At node {3,9}: 3 > 2, end of subtree.
Return support: totalSup = 5
Example 2

Itemset Trees - Querying
Querying Example 2:
Query: {2,9}
totalSup = 0
1. At node {2}: 2 ≤ 2 and 2 < 9, continue down.
2. At node {2,4}: 2 ≤ 2 and 4 < 9, but {2,4} doesn't contain {2,9}; go to the next sibling.
3. At node {2,9}: {2,9} = {2,9}. Add its count: totalSup = 1.
4. At node {1,2}: 1 ≤ 2 and 2 < 9, continue down.
5. At node {1,2,3,5}: 1 ≤ 2 and 5 < 9, but it doesn't contain {2,9}; go to the next sibling.
6. At node {1,2,6}: 1 ≤ 2 and 6 < 9, but it doesn't contain {2,9}; go to the next node.
7. At node {3,9}: 3 ≤ 2 fails; end of tree.
totalSup = 1
Nodes = 8