ASSOCIATION RULE MINING
Generating Association Rules from Frequent Itemsets
Strong association rules satisfy both the minimum support and minimum confidence thresholds.
Confidence(A => B) = P(B | A) = support_count(A U B) / support_count(A)
Association rules
For each frequent itemset l, generate all non-empty proper subsets of l.
For every non-empty subset s of l, output the rule s => (l - s) if sup_count(l) / sup_count(s) >= min_conf.
Example
I = {I1, I2, I5}, confidence threshold: 70%
Non-empty subsets: {I1, I2}, {I1, I5}, {I2, I5}, {I1}, {I2}, {I5}
I1 ^ I2 => I5, confidence = 2/4 = 50%
I1 ^ I5 => I2, confidence = 2/2 = 100%
I2 ^ I5 => I1, confidence = 2/2 = 100%
I1 => I2 ^ I5, confidence = 2/6 = 33%
I2 => I1 ^ I5, confidence = 2/7 = 29%
I5 => I1 ^ I2, confidence = 2/2 = 100%
Only the three rules with 100% confidence meet the 70% threshold.
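The rule-generation step above can be sketched in Python. The support counts below are assumed for illustration, chosen so that they reproduce the confidence values in the example; they are not taken from a real dataset.

```python
from itertools import combinations

# Assumed support counts reproducing the example's confidence values
sup = {
    frozenset({"I1"}): 6,
    frozenset({"I2"}): 7,
    frozenset({"I5"}): 2,
    frozenset({"I1", "I2"}): 4,
    frozenset({"I1", "I5"}): 2,
    frozenset({"I2", "I5"}): 2,
    frozenset({"I1", "I2", "I5"}): 2,
}

def rules_from_itemset(l, sup, min_conf):
    """Emit every rule s => (l - s) whose confidence meets min_conf."""
    l = frozenset(l)
    rules = []
    for r in range(1, len(l)):            # all non-empty proper subsets s of l
        for s in combinations(sorted(l), r):
            s = frozenset(s)
            conf = sup[l] / sup[s]        # sup_count(l) / sup_count(s)
            if conf >= min_conf:
                rules.append((set(s), set(l - s), conf))
    return rules

strong = rules_from_itemset({"I1", "I2", "I5"}, sup, 0.70)
# Three rules survive the 70% threshold, each with confidence 100%
```

This only generates rules from a single frequent itemset; in a full miner the same function runs over every frequent itemset found by the support-counting phase.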
Improving the Efficiency of Apriori
Hash-based technique
Transaction reduction: a transaction that does not contain any frequent k-itemset cannot contain any frequent (k+1)-itemset, so it can be dropped from subsequent scans
Partitioning
Sampling
Dynamic itemset counting: start points for new candidates during a scan
Hash-Based Technique

Partitioning: Scan Database Only Twice
Any itemset that is potentially frequent in DB must be frequent in at least one of the partitions of DB.
Scan 1: partition the database and find the local frequent patterns.
Scan 2: consolidate the global frequent patterns.
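The two-scan partitioning idea can be sketched as follows. The names are illustrative, and a deliberately naive brute-force miner stands in for the per-partition Apriori pass (it is only practical on tiny partitions):

```python
from itertools import combinations

def naive_mine(transactions, min_count):
    """Brute-force local miner (stands in for Apriori on a small partition)."""
    items = sorted({i for t in transactions for i in t})
    return {frozenset(c)
            for r in range(1, len(items) + 1)
            for c in combinations(items, r)
            if sum(1 for t in transactions if set(c) <= t) >= min_count}

def partitioned_mine(db, n_parts, min_sup_ratio):
    # Scan 1: mine each partition for its locally frequent itemsets
    candidates = set()
    for k in range(n_parts):
        part = db[k::n_parts]
        local_min = max(1, int(min_sup_ratio * len(part)))
        candidates |= naive_mine(part, local_min)
    # Scan 2: count every surviving candidate once over the full database
    global_min = min_sup_ratio * len(db)
    return {c for c in candidates
            if sum(1 for t in db if c <= t) >= global_min}
```

Correctness rests on the slide's observation: any globally frequent itemset must be locally frequent in at least one partition, so scan 1 cannot miss it.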
Sampling for Frequent Patterns
Select a sample of the original database and mine frequent patterns within the sample using Apriori.
A lower support threshold can be used on the sample.
Scan the database once to verify the frequent itemsets found in the sample.
Scan the database again to find missed frequent patterns.
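A minimal sketch of the sample-then-verify approach, under assumptions: a brute-force miner stands in for Apriori on the sample, lower_factor is an assumed lowering of the threshold, and the second corrective scan for patterns the sample missed is left out.

```python
import random
from itertools import combinations

def naive_mine(transactions, min_count):
    """Brute-force miner standing in for Apriori on the (small) sample."""
    items = sorted({i for t in transactions for i in t})
    return {frozenset(c)
            for r in range(1, len(items) + 1)
            for c in combinations(items, r)
            if sum(1 for t in transactions if set(c) <= t) >= min_count}

def sample_and_verify(db, sample_size, min_count, lower_factor=0.8):
    # Mine a random sample with a proportionally lowered support threshold
    sample = random.sample(db, sample_size)
    scaled = max(1, int(lower_factor * min_count * sample_size / len(db)))
    candidates = naive_mine(sample, scaled)
    # One full scan: keep only the candidates that are globally frequent
    return {c for c in candidates
            if sum(1 for t in db if c <= t) >= min_count}
```

The lowered threshold makes the sample less likely to drop a globally frequent itemset; the full scan then removes any sample-only false positives.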
Bottleneck of Frequent-Pattern Mining
Multiple database scans are costly.
Mining long patterns needs many passes of scanning and generates lots of candidates.
To find the frequent itemset i1 i2 ... i100:
# of scans: 100
# of candidates: 2^100 - 1 ~ 1.27 * 10^30
Bottleneck: candidate generation and test.
Remedy: avoid candidate generation.
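The candidate count quoted above is just the number of non-empty subsets of 100 items, which is easy to check:

```python
# Every non-empty subset of 100 items is a potential candidate itemset
n_candidates = 2**100 - 1
print(f"{n_candidates:.2e}")  # prints 1.27e+30
```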
Mining Frequent Patterns Without Candidate Generation
FP-Growth: a divide-and-conquer technique based on the FP-tree
Grow long patterns from short ones using local frequent items

FP-tree from a Transaction Database - Example

FP-Growth
For each frequent length-1 pattern (suffix pattern):
Construct its conditional pattern base (the sub-database consisting of the set of prefix paths co-occurring with the suffix)
Construct its conditional FP-tree and mine it recursively
Generate all combinations of frequent patterns by combining with the suffix
FP-Growth Algorithm
Input: a transaction database D; min_sup
Output: frequent patterns
Construction of the FP-tree:
Scan the database, collect the frequent items F, and sort them in descending order of support.
Create the root of the FP-tree, labeled null.
For each transaction, sort its frequent items in descending order of support as [p|P] and call insert_tree([p|P], T):
If T has a child N with N = p, increment N's count; else create a new node with count 1 and set its parent and node links.
If P is non-empty, call insert_tree(P, N) recursively.
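The construction steps above can be sketched in Python. This is a simplified illustration: the header table and node links used later for mining are omitted for brevity.

```python
class Node:
    """FP-tree node; header-table node links are omitted for brevity."""
    def __init__(self, item, parent):
        self.item, self.count, self.parent = item, 1, parent
        self.children = {}

def insert_tree(items, node):
    """Insert a support-ordered item list [p | P] below `node`."""
    if not items:
        return
    p, P = items[0], items[1:]
    if p in node.children:
        node.children[p].count += 1        # shared prefix: increment count
    else:
        node.children[p] = Node(p, node)   # new branch with count 1
    insert_tree(P, node.children[p])

def build_fp_tree(transactions, min_sup):
    # First scan: count items and keep only the frequent ones
    counts = {}
    for t in transactions:
        for i in t:
            counts[i] = counts.get(i, 0) + 1
    rank = {i: r for r, i in enumerate(
        sorted((i for i in counts if counts[i] >= min_sup),
               key=lambda i: (-counts[i], i)))}
    root = Node(None, None)
    root.count = 0
    # Second scan: insert each transaction, items in descending support order
    for t in transactions:
        insert_tree(sorted((i for i in t if i in rank),
                           key=lambda i: rank[i]), root)
    return root
```

Because every transaction is sorted by the same global support order before insertion, transactions that share a frequent prefix share a path in the tree, which is what keeps the tree compact.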
Algorithm: Procedure FP_growth(Tree, a)
If Tree contains a single path P, then for each combination b of the nodes in P, generate pattern b U a with support = the minimum support of the nodes in b.
Else, for each xi in the header table of Tree:
generate pattern b = xi U a with support = xi.support
construct b's conditional pattern base and b's conditional FP-tree Tree_b
if Tree_b is not empty, call FP_growth(Tree_b, b)
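The recursion above can be sketched compactly by representing each conditional pattern base as a list of weighted, support-ordered prefix paths instead of an explicit FP-tree; the names below are illustrative, and the single-path shortcut is folded into the general case.

```python
def fp_growth(patterns, min_sup, suffix=()):
    """Mine frequent itemsets from weighted, support-ordered paths.

    patterns: list of (tuple_of_items, count); the items of every tuple
    must follow one shared global descending-support order.
    """
    counts = {}
    for items, c in patterns:
        for i in items:
            counts[i] = counts.get(i, 0) + c
    found = {}
    for item, sup in counts.items():
        if sup < min_sup:
            continue
        new_suffix = (item,) + suffix
        found[frozenset(new_suffix)] = sup
        # Conditional pattern base: the prefix of every path containing item
        cond = [(items[:items.index(item)], c)
                for items, c in patterns
                if item in items and items.index(item) > 0]
        found.update(fp_growth(cond, min_sup, new_suffix))
    return found

def mine(transactions, min_sup):
    counts = {}
    for t in transactions:
        for i in t:
            counts[i] = counts.get(i, 0) + 1
    rank = {i: r for r, i in enumerate(
        sorted((i for i in counts if counts[i] >= min_sup),
               key=lambda i: (-counts[i], i)))}
    paths = [(tuple(sorted((i for i in t if i in rank),
                           key=lambda i: rank[i])), 1)
             for t in transactions]
    return fp_growth([p for p in paths if p[0]], min_sup)
```

Restricting each conditional base to the items that precede the suffix item in the global order is what guarantees every frequent itemset is generated exactly once.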
Features
Finds long frequent patterns by recursively looking for shorter ones
Items are kept in descending order of frequency: the more frequently an item occurs, the more likely it is to be shared among paths
Main-memory-based FP-tree
Efficient and scalable; faster than Apriori