A SEMINAR ON THE COMPARATIVE STUDY OF APRIORI AND FP-GROWTH ALGORITHM FOR ASSOCIATION RULE MINING
Under the Guidance of Mrs. Sankirti Shiravale
By Deepti Pawar
Contents
Introduction
Literature Survey
Apriori Algorithm
FP-Growth Algorithm
Comparative Result
Conclusion
References
Introduction
Data Mining: the process of discovering interesting patterns (or knowledge) from large amounts of data.
• Which items are frequently purchased with milk?
• Fraud detection: Which types of transactions are likely to be fraudulent, given the demographics and transactional history of a particular customer?
• Customer relationship management: Which of my customers are likely to be the most loyal, and which are most likely to leave for a competitor?
Data mining helps extract such information.
Introduction (contd.): Why Data Mining?
Broadly, data mining can help answer queries on:
• Forecasting
• Classification
• Association
• Clustering
• Sequence discovery
Introduction (contd.): Data Mining Applications
• Aid to marketing or retailing
• Market basket analysis (MBA)
• Medicare and health care
• Criminal investigation and homeland security
• Intrusion detection
• The "beer and baby diapers" phenomenon, and many more…
Literature Survey: Association Rule Mining
• Proposed by R. Agrawal in 1993.
• An important data mining model, studied extensively by the database and data mining community.
• Initially used for market basket analysis, to find how items purchased by customers are related.
• Given a set of transactions, find rules that predict the occurrence of an item based on the occurrences of other items in the transaction.
Literature Survey (contd.): Frequent Itemsets
• Itemset: a collection of one or more items, e.g. {Milk, Bread, Diaper}. A k-itemset is an itemset that contains k items.
• Support count (σ): the frequency of occurrence of an itemset, e.g. σ({Milk, Bread, Diaper}) = 2.
• Support (s): the fraction of transactions that contain an itemset, e.g. s({Milk, Bread, Diaper}) = 2/5.
• Frequent itemset: an itemset whose support is greater than or equal to a minsup threshold.

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke
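These definitions can be checked directly against the sample transactions. A minimal Python sketch (the function names are illustrative, not from the slides):

```python
# Sample transactions from the table above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, transactions):
    """sigma(itemset): number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """s(itemset): fraction of transactions that contain `itemset`."""
    return support_count(itemset, transactions) / len(transactions)

print(support_count({"Milk", "Bread", "Diaper"}, transactions))  # 2
print(support({"Milk", "Bread", "Diaper"}, transactions))        # 0.4
```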
Literature Survey (contd.): Association Rules
• Association rule: an implication expression of the form X → Y, where X and Y are itemsets. Example: {Milk, Diaper} → {Beer}.
• Rule evaluation metrics:
  Support (s): the fraction of transactions that contain both X and Y.
  Confidence (c): measures how often items in Y appear in transactions that contain X.
Example: {Milk, Diaper} → {Beer}

s = σ({Milk, Diaper, Beer}) / |T| = 2/5 = 0.4
c = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper}) = 2/3 ≈ 0.67
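Both metrics reduce to ratios of support counts, which can be computed in a few lines of Python (a sketch on the sample transactions; `sigma` is an illustrative name):

```python
# Sample transactions from the table above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def sigma(itemset):
    """Support count: transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

# Rule {Milk, Diaper} -> {Beer}
X, Y = {"Milk", "Diaper"}, {"Beer"}
s = sigma(X | Y) / len(transactions)  # support: 2/5 = 0.4
c = sigma(X | Y) / sigma(X)           # confidence: 2/3, roughly 0.67
print(s, c)
```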
Apriori Algorithm
• Apriori principle: if an itemset is frequent, then all of its subsets must also be frequent.
• The Apriori principle holds due to the following property of the support measure: the support of an itemset never exceeds the support of its subsets. This is known as the anti-monotone property of support.
Apriori Algorithm (contd.)
The basic steps to mine the frequent elements are as follows:
• Generate and test: first find the frequent 1-itemsets L1 by scanning the database and removing from the candidate set C1 all elements that do not satisfy the minimum support criterion.
• Join step: to obtain the next level's candidates Ck, self-join the previous frequent itemsets, i.e. Lk-1 ⋈ Lk-1, the Cartesian product of Lk-1 with itself. This step generates new candidate k-itemsets from the Lk-1 found in the previous iteration. Here Ck denotes the candidate k-itemsets and Lk the frequent k-itemsets.
• Prune step: eliminate some of the candidate k-itemsets using the Apriori property, then scan the database to determine the count of each remaining candidate in Ck; all candidates with a count no less than the minimum support count are frequent by definition and therefore belong to Lk. Steps 2 and 3 are repeated until no new candidate set is generated.
Worked example (minimum support count = 2):

Database:
TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5

C^1 (candidate sets per transaction):
TID   Set of itemsets
100   {1} {3} {4}
200   {2} {3} {5}
300   {1} {2} {3} {5}
400   {2} {5}

L1 (frequent 1-itemsets):
Itemset   Support
{1}       2
{2}       3
{3}       3
{5}       3

C2 (candidate 2-itemsets):
{1 2} {1 3} {1 5} {2 3} {2 5} {3 5}

C^2 (candidates per transaction):
TID   Set of itemsets
100   {1 3}
200   {2 3} {2 5} {3 5}
300   {1 2} {1 3} {1 5} {2 3} {2 5} {3 5}
400   {2 5}

L2 (frequent 2-itemsets):
Itemset   Support
{1 3}     2
{2 3}     3
{2 5}     3
{3 5}     2

C3 (candidate 3-itemsets):
{2 3 5}

C^3 (candidates per transaction):
TID   Set of itemsets
200   {2 3 5}
300   {2 3 5}

L3 (frequent 3-itemsets):
Itemset   Support
{2 3 5}   2
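The generate/join/prune loop can be sketched compactly in Python. This is a plain Apriori sketch with illustrative names (it does not reproduce the per-transaction C^k bookkeeping of the tables above), and it recovers L1 through L3 on the sample database:

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Plain Apriori: returns {frozenset: support count} for every frequent itemset."""
    items = {i for t in transactions for i in t}
    # Generate and test: frequent 1-itemsets L1
    counts = {frozenset([i]): sum(i in t for t in transactions) for i in items}
    Lk = {s: n for s, n in counts.items() if n >= min_count}
    frequent = dict(Lk)
    k = 2
    while Lk:
        # Join step: Ck from the self-join of L(k-1)
        prev = list(Lk)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(sub) in Lk for sub in combinations(c, k - 1))}
        # One database scan to count the surviving candidates
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        Lk = {s: n for s, n in counts.items() if n >= min_count}
        frequent.update(Lk)
        k += 1
    return frequent

db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
freq = apriori(db, min_count=2)
print(freq[frozenset({2, 3, 5})])  # 2, matching L3
```

Note that {1 2 3} and {1 3 5} are generated by the join but removed by the prune step, since {1 2} and {1 5} are not in L2.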
Apriori Algorithm (contd.): Bottlenecks of Apriori
The Apriori algorithm no doubt finds the frequent elements from the database successfully, but as the dimensionality of the database increases with the number of items:
• more search space is needed and the I/O cost increases;
• the number of database scans increases, so candidate generation grows, which increases the computational cost.
FP-Growth Algorithm
FP-Growth allows frequent itemset discovery without candidate itemset generation. It is a two-step approach:
Step 1: Build a compact data structure called the FP-tree, using 2 passes over the data set.
Step 2: Extract frequent itemsets directly from the FP-tree.
FP-Growth Algorithm (contd.): Step 1, FP-Tree Construction
The FP-tree is constructed using 2 passes over the data set.
Pass 1: Scan the data and find the support for each item. Discard infrequent items and sort the frequent items in decreasing order of their support.
• Minimum support count = 2
• Scan the database to find frequent 1-itemsets: s(A) = 8, s(B) = 7, s(C) = 5, s(D) = 5, s(E) = 3
• Item order (decreasing support): A, B, C, D, E
Use this order when building the FP-tree, so common prefixes can be shared.
FP-Growth Algorithm (contd.): Step 1, FP-Tree Construction, Pass 2
Nodes correspond to items and have a counter.
1. FP-Growth reads one transaction at a time and maps it to a path.
2. A fixed item order is used, so paths can overlap when transactions share items (i.e. when they have the same prefix); in this case, counters are incremented.
3. Pointers are maintained between nodes containing the same item, creating singly linked lists (dotted lines). The more paths overlap, the higher the compression; the FP-tree may fit in memory.
4. Frequent itemsets are then extracted from the FP-tree.
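The two passes can be sketched as follows. The slides' own transaction list appears only in a figure that is not reproduced here, so the data below is hypothetical; the tree-insertion and header-table (node-link) logic follows the numbered steps above:

```python
from collections import defaultdict

class FPNode:
    """One node of an FP-tree: an item, a counter, and child links."""
    def __init__(self, item, parent=None):
        self.item, self.count, self.parent = item, 1, parent
        self.children = {}

def build_fptree(transactions, min_count):
    """Pass 1: count item supports. Pass 2: insert transactions in decreasing-support order."""
    counts = defaultdict(int)
    for t in transactions:
        for i in t:
            counts[i] += 1
    frequent = {i for i, c in counts.items() if c >= min_count}
    root = FPNode(None)
    header = defaultdict(list)  # item -> node links (the dotted-line lists of the slides)
    for t in transactions:
        # Keep frequent items only, sorted by global support (ties broken by name).
        path = sorted((i for i in t if i in frequent),
                      key=lambda i: (-counts[i], i))
        node = root
        for item in path:
            if item in node.children:        # shared prefix: bump the counter
                node.children[item].count += 1
            else:                            # new branch
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            node = node.children[item]
    return root, header

# Hypothetical transactions for illustration.
db = [{"A", "B"}, {"B", "C", "D"}, {"A", "C", "D", "E"},
      {"A", "D", "E"}, {"A", "B", "C"}]
root, header = build_fptree(db, min_count=2)
print(sum(n.count for n in header["D"]))  # 3: total support of D along its node links
```

Summing the counters along an item's node links recovers that item's global support count, which is exactly how Step 2 tests whether a suffix item is frequent.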
FP-Growth Algorithm (contd) Step 1 FP-Tree Construction (contd)
FP-Growth Algorithm (contd)Complete FP-Tree for Sample Transactions
FP-Growth Algorithm (contd.): Step 2, Frequent Itemset Generation
FP-Growth extracts frequent itemsets from the FP-tree:
• Bottom-up algorithm: it works from the leaves towards the root.
• Divide and conquer: first look for frequent itemsets ending in E, then DE, etc., then D, then CD, etc.
• First, extract the prefix-path sub-trees ending in an item or itemset (using the linked lists).
FP-Growth Algorithm (contd) Prefix path sub-trees (Example)
FP-Growth Algorithm (contd) Example
Let minsup = 2 and extract all frequent itemsets containing E.
• Obtain the prefix-path sub-tree for E.
• Check whether E is a frequent item by adding the counts along its linked list (dotted line); if so, extract it.
• Yes: count = 3, so E is extracted as a frequent itemset.
• As E is frequent, find frequent itemsets ending in E, i.e. DE, CE, BE and AE.
• The E nodes can now be removed.
FP-Growth Algorithm (contd.): Conditional FP-Tree
The conditional FP-tree on an itemset is the FP-tree that would be built if we considered only the transactions containing that itemset (and then removed the itemset from all transactions).
Example: the FP-tree conditional on E.
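A simplified sketch of building the conditional base for an item: take the prefix paths ending in that item (with the counts at its nodes), then keep only the items that are frequent within those paths. The paths below are hypothetical, not the slides' figure:

```python
from collections import defaultdict

# Hypothetical prefix paths ending in E: (path of items above the E node, count at E).
prefix_paths = [(("A", "C", "D"), 1), (("A", "D"), 1), (("B", "C"), 1)]
min_count = 2

# Count each item's support inside the conditional pattern base.
counts = defaultdict(int)
for path, n in prefix_paths:
    for item in path:
        counts[item] += n

# The conditional FP-tree keeps only items frequent *within these paths*.
frequent = {i for i, c in counts.items() if c >= min_count}
conditional_base = [([i for i in path if i in frequent], n)
                    for path, n in prefix_paths]
print(sorted(frequent))  # ['A', 'C', 'D']
```

Here B is dropped from the conditional tree because it occurs in only one prefix path, which mirrors why BE is not considered in the next step of the slides' walkthrough.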
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd.): Obtain T(DE) from T(E)
Use the conditional FP-tree for E to find frequent itemsets ending in DE, CE and AE. Note that BE is not considered, as B is not in the conditional FP-tree for E.
• Support count of DE = 2 (the sum of the counts of all D nodes).
• DE is frequent; we now need to solve CDE, BDE and ADE, if they exist.
FP-Growth Algorithm (contd.) Current Position in Processing
FP-Growth Algorithm (contd.): Solving CDE, BDE, ADE
• The sub-trees for both CDE and BDE are empty: there are no prefix paths ending with C or B.
• Working on ADE: ADE (support count = 2) is frequent. Next sub-problem: suffix CE.
FP-Growth Algorithm (contd)Current Position in Processing
FP-Growth Algorithm (contd.): Solving for Suffix CE
• CE is frequent (support count = 2).
• Work on the next sub-problems: BE (no support), then AE.
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd.): Solving for Suffix AE
• AE is frequent (support count = 2). Done with AE; work on the next sub-problem, suffix D.
FP-Growth Algorithm (contd.): Found Frequent Itemsets with Suffix E
• E, DE, ADE, CE, AE, discovered in this order.
FP-Growth Algorithm (contd.): Example (contd.)
Frequent itemsets found (ordered by suffix and by the order in which they are found).
Comparative Result
Conclusion
It is found that:
• The FP-tree is a novel data structure storing compressed, crucial information about frequent patterns; it is compact yet complete for frequent pattern mining.
• FP-growth is an efficient method for mining frequent patterns in large databases, using a highly compact FP-tree and a divide-and-conquer approach.
• Both Apriori and FP-Growth aim to find the complete set of patterns, but FP-Growth is more efficient than Apriori with respect to long patterns.
References
1. Liwu Zou, Guangwei Ren, "The data mining algorithm analysis for personalized service", Fourth International Conference on Multimedia Information Networking and Security, 2012.
2. Jun Tan, Yingyong Bu and Bo Yang, "An Efficient Frequent Pattern Mining Algorithm", Sixth International Conference on Fuzzy Systems and Knowledge Discovery, 2009.
3. Wei Zhang, Hongzhi Liao, Na Zhao, "Research on the FP Growth Algorithm about Association Rule Mining", International Seminar on Business and Information Management, 2008.
4. S.P. Latha, Dr. N. Ramaraj, "Algorithm for Efficient Data Mining", Proc. IEEE International Conference on Computational Intelligence and Multimedia Applications, 2007.
5. Dongme Sun, Shaohua Teng, Wei Zhang, Haibin Zhu, "An Algorithm to Improve the Effectiveness of Apriori", Proc. 6th IEEE International Conference on Cognitive Informatics (ICCI'07), 2007.
6. Daniel Hunyadi, "Performance comparison of Apriori and FP-Growth algorithms in generating association rules", Proceedings of the European Computing Conference, 2006.
7. Jiawei Han, Micheline Kamber, "Data Mining: Concepts and Techniques", Morgan Kaufmann Publishers, 2006.
8. Tan, P.-N., Steinbach, M. and Kumar, V., "Introduction to Data Mining", Addison Wesley Publishers, 2006.
9. Han, J., Pei, J. and Yin, Y., "Mining frequent patterns without candidate generation", Proc. ACM-SIGMOD International Conference on Management of Data (SIGMOD), 2000.
10. R. Agrawal, T. Imielinski, A. Swami, "Mining Association Rules between Sets of Items in Large Databases", Proc. ACM SIGMOD Conference, Washington DC, USA, 1993.
ContentsIntroduction
Literature Survey
Apriori Algorithm
FP-Growth Algorithm
Comparative Result
Conclusion
Reference
Introduction
Data Mining It is the process of discovering interesting patterns (or knowledge) from large amount of data
bull Which items are frequently purchased with milk
bull Fraud detection Which types of transactions are likely to be fraudulent given the demographics and transactional history of a particular customer
bull Customer relationship management Which of my customers are likely to be the most loyal and which are most likely to leave for a competitor
Data Mining helps extract such information
Introduction (contd) Why Data MiningBroadly the data mining could be useful to answer the queries on
bull Forecasting
bull Classification
bull Association
bull Clustering
bull Making the sequence
Introduction (contd) Data Mining Applicationsbull Aid to marketing or retailing
bull Market basket analysis (MBA)
bull Medicare and health care
bull Criminal investigation and homeland security
bull Intrusion detection
bull Phenomena of ldquobeer and baby diapersrdquo And many morehellip
Literature Survey Association Rule Miningbull Proposed by R Agrawal in 1993
bull It is an important data mining model studied extensively by the database and data mining community
bull Initially used for Market Basket Analysis to find how items purchased by customers are related
bull Given a set of transactions find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction
Literature Survey (contd)Frequent Itemset
bull Itemset A collection of one or more items
Example Milk Bread Diaper k-itemset
An itemset that contains k itemsbull Support count () Frequency of occurrence of an itemset Eg (Milk Bread Diaper) = 2
bull Support Fraction of transactions that contain an itemset Eg s( Milk Bread Diaper ) = 25
bull Frequent Itemset An itemset whose support is greater than or equal
to a minsup threshold
TID Items
1 Bread Milk
2 Bread Diaper Beer Eggs
3 Milk Diaper Beer Coke
4 Bread Milk Diaper Beer
5 Bread Milk Diaper Coke
Literature Survey (contd)Association Rulebull Association Rule An implication expression of
the form X Y where X and Y are itemsets
Example Milk Diaper Beer
bull Rule Evaluation Metrics Support (s)
Fraction of transactions that contain both X and Y
Confidence (c) Measures how often items in
Y appear in transactions thatcontain X
TID Items
1 Bread Milk
2 Bread Diaper Beer Eggs
3 Milk Diaper Beer Coke
4 Bread Milk Diaper Beer
5 Bread Milk Diaper Coke
ExampleBeerDiaperMilk
4052
|T|)BeerDiaperMilk(
s
67032
)DiaperMilk()BeerDiaperMilk(
c
Apriori Algorithm
bull Apriori principle If an itemset is frequent then all of its subsets must also be frequent
bull Apriori principle holds due to the following property of the support measure Support of an itemset never exceeds the support of its subsets This is known as the anti-monotone property of support
Apriori Algorithm (contd)
The basic steps to mine the frequent elements are as follows
bull Generate and test In this first find the 1-itemset frequent elements L1 by scanning the database and removing all those elements from C which cannot satisfy the minimum support criteria
bull Join step To attain the next level elements Ck join the previous frequent elements by self join ie Lk-1Lk-1 known as Cartesian product of Lk-1 ie This step generates new candidate k-itemsets based on joining Lk-1 with itself which is found in the previous iteration Let Ck denote candidate k-itemset and Lk be the frequent k-itemset
bull Prune step This step eliminates some of the candidate k-itemsets using the Apriori property A scan of the database to determine the count of each candidate in Ck would result in the determination of Lk (ie all candidates having a count no less than the minimum support count are frequent by definition and therefore belong to Lk) Step 2 and 3 is repeated until no new candidate set is generated
TID Items
100 1 3 4
200 2 3 5
300 1 2 3 5
400 2 5
TID Set-of- itemsets
100 134
200 235
300 1235
400 25
Itemset Support
1 2
2 3
3 3
5 3
itemset
1 2
1 3
1 5
2 3
2 5
3 5
TID Set-of- itemsets
100 1 3
200 2 32 5 3 5
300 1 21 31 52 3 2 5 3 5
400 2 5
Itemset Support
1 3 2
2 3 3
2 5 3
3 5 2
itemset
2 3 5
TID Set-of- itemsets
200 2 3 5
300 2 3 5
Itemset Support
2 3 5 2
Database C^1
L2
C2 C^2
C^3
L1
L3C3
Apriori Algorithm (contd) Bottlenecks of Aprioribull It is no doubt that Apriori algorithm successfully finds the frequent
elements from the database But as the dimensionality of the database increase with the number of items then
bull More search space is needed and IO cost will increase
bull Number of database scan is increased thus candidate generation will increase results in increase in computational cost
FP-Growth Algorithm
FP-Growth allows frequent itemset discovery without candidate itemset generation Two step approach
Step 1 Build a compact data structure called the FP-tree Built using 2 passes over the data-set
Step 2 Extracts frequent itemsets directly from the FP-tree
FP-Growth Algorithm (contd)Step 1 FP-Tree Construction FP-Tree is constructed using 2 passes
over the data-setPass 1 Scan data and find support for each
item Discard infrequent items Sort frequent items in decreasing
order based on their supportbull Minimum support count = 2bull Scan database to find frequent 1-itemsetsbull s(A) = 8 s(B) = 7 s(C) = 5 s(D) = 5 s(E) = 3bull 1048698 Item order (decreasing support) A B C D E
Use this order when building the FP-Tree so common prefixes can be shared
FP-Growth Algorithm (contd) Step 1 FP-Tree ConstructionPass 2Nodes correspond to items and have a counter1 FP-Growth reads 1 transaction at a time and maps it to a path
2 Fixed order is used so paths can overlap when transactions share items (when they have the same prefix ) In this case counters are incremented
3 Pointers are maintained between nodes containing the same item creating singly linked lists (dotted lines) The more paths that overlap the higher the compression FP-tree
may fit in memory
4 Frequent itemsets extracted from the FP-Tree
FP-Growth Algorithm (contd) Step 1 FP-Tree Construction (contd)
FP-Growth Algorithm (contd)Complete FP-Tree for Sample Transactions
FP-Growth Algorithm (contd) Step 2 Frequent Itemset Generation FP-Growth extracts frequent itemsets from the FP-tree
Bottom-up algorithm - from the leaves towards the root
Divide and conquer first look for frequent itemsets ending in e then de etc then d then cd etc
First extract prefix path sub-trees ending in an item(set) (using the linked lists)
FP-Growth Algorithm (contd) Prefix path sub-trees (Example)
FP-Growth Algorithm (contd) Example
Let minSup = 2 and extract all frequent itemsets containing E Obtain the prefix path sub-tree for E
Check if E is a frequent item by adding the counts along the linked list (dotted line) If so extract it
Yes count =3 so E is extracted as a frequent itemset
As E is frequent find frequent itemsets ending in e ie DE CE BE and AE
E nodes can now be removed
FP-Growth Algorithm (contd) Conditional FP-Tree
The FP-Tree that would be built if we only consider transactions containing a particular itemset (and then removing that itemset from all transactions)
I Example FP-Tree conditional on e
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Obtain T(DE) from T(E) 4 Use the conditional FP-tree for e to find frequent itemsets ending in DE CE
and AE Note that BE is not considered as B is not in the conditional FP-tree for E
bull Support count of DE = 2 (sum of counts of all Drsquos)bull DE is frequent need to solve CDE BDE ADE if they exist
FP-Growth Algorithm (contd) Current Position of Processing
FP-Growth Algorithm (contd)Solving CDE BDE ADEbull Sub-trees for both CDE and BDE are emptybull no prefix paths ending with C or Bbull Working on ADE
ADE (support count = 2) is frequentsolving next sub problem CE
FP-Growth Algorithm (contd)Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix CE
CE is frequent (support count = 2)bull Work on next sub problems BE (no support) AE
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix AE
AE is frequent (support count = 2)Done with AEWork on next sub problem suffix D
FP-Growth Algorithm (contd) Found Frequent Itemsets with Suffix Ebull E DE ADE CE AE discovered in this order
FP-Growth Algorithm (contd) Example (contd)
Frequent itemsets found (ordered by suffix and order in which the are found)
Comparative Result
Conclusion
It is found that
bull FP-tree a novel data structure storing compressed crucial information about frequent patterns compact yet complete for frequent pattern mining
bull FP-growth an efficient mining method of frequent patterns in large Database using a highly compact FP-tree divide-and-conquer method in nature
bull Both Apriori and FP-Growth are aiming to find out complete set of patterns but FP-Growth is more efficient than Apriori in respect to long patterns
References
1 Liwu ZOU Guangwei REN ldquoThe data mining algorithm analysis for personalized servicerdquo Fourth International Conference on Multimedia Information Networking and Security 2012
2 Jun TAN Yingyong BU and Bo YANG ldquoAn Efficient Frequent Pattern Mining Algorithmrdquo Sixth International Conference on Fuzzy Systems and Knowledge Discovery 2009
3 Wei Zhang Hongzhi Liao Na Zhao ldquoResearch on the FP Growth Algorithm about Association Rule Miningrdquo International Seminar on Business and Information Management 2008
4 SP Latha DR NRamaraj ldquoAlgorithm for Efficient Data Miningrdquo In Proc
Intrsquo Conf on IEEE International Computational Intelligence and Multimedia Applications 2007
References (contd)
5 Dongme Sun Shaohua Teng Wei Zhang Haibin Zhu ldquoAn Algorithm to Improve the Effectiveness of Apriorirdquo In Proc Intrsquol Conf on 6th IEEE International Conf on Cognitive Informatics (ICCI07) 2007
6 Daniel Hunyadi ldquoPerformance comparison of Apriori and FP-Growth algorithms in generating association rulesrdquo Proceedings of the European Computing Conference 2006
7 By Jiawei Han Micheline Kamber ldquoData mining Concepts and Techniquesrdquo Morgan Kaufmann Publishers 2006
8 Tan P-N Steinbach M and Kumar V ldquoIntroduction to data miningrdquo Addison Wesley Publishers 2006
References (contd)
9 HanJ PeiJ and Yin Y ldquoMining frequent patterns without candidate generationrdquo In Proc ACM-SIGMOD International Conf Management of Data (SIGMOD) 2000
10 R Agrawal Imielinskit SwamiA ldquoMining Association Rules between Sets of Items in Large Databasesrdquo In Proc International Conf of the ACM SIGMOD Conference Washington DC USA 1993
Introduction
Data Mining It is the process of discovering interesting patterns (or knowledge) from large amount of data
bull Which items are frequently purchased with milk
bull Fraud detection Which types of transactions are likely to be fraudulent given the demographics and transactional history of a particular customer
bull Customer relationship management Which of my customers are likely to be the most loyal and which are most likely to leave for a competitor
Data Mining helps extract such information
Introduction (contd) Why Data MiningBroadly the data mining could be useful to answer the queries on
bull Forecasting
bull Classification
bull Association
bull Clustering
bull Making the sequence
Introduction (contd) Data Mining Applicationsbull Aid to marketing or retailing
bull Market basket analysis (MBA)
bull Medicare and health care
bull Criminal investigation and homeland security
bull Intrusion detection
bull Phenomena of ldquobeer and baby diapersrdquo And many morehellip
Literature Survey Association Rule Miningbull Proposed by R Agrawal in 1993
bull It is an important data mining model studied extensively by the database and data mining community
bull Initially used for Market Basket Analysis to find how items purchased by customers are related
bull Given a set of transactions find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction
Literature Survey (contd)Frequent Itemset
bull Itemset A collection of one or more items
Example Milk Bread Diaper k-itemset
An itemset that contains k itemsbull Support count () Frequency of occurrence of an itemset Eg (Milk Bread Diaper) = 2
bull Support Fraction of transactions that contain an itemset Eg s( Milk Bread Diaper ) = 25
bull Frequent Itemset An itemset whose support is greater than or equal
to a minsup threshold
TID Items
1 Bread Milk
2 Bread Diaper Beer Eggs
3 Milk Diaper Beer Coke
4 Bread Milk Diaper Beer
5 Bread Milk Diaper Coke
Literature Survey (contd)Association Rulebull Association Rule An implication expression of
the form X Y where X and Y are itemsets
Example Milk Diaper Beer
bull Rule Evaluation Metrics Support (s)
Fraction of transactions that contain both X and Y
Confidence (c) Measures how often items in
Y appear in transactions thatcontain X
TID Items
1 Bread Milk
2 Bread Diaper Beer Eggs
3 Milk Diaper Beer Coke
4 Bread Milk Diaper Beer
5 Bread Milk Diaper Coke
ExampleBeerDiaperMilk
4052
|T|)BeerDiaperMilk(
s
67032
)DiaperMilk()BeerDiaperMilk(
c
Apriori Algorithm
bull Apriori principle If an itemset is frequent then all of its subsets must also be frequent
bull Apriori principle holds due to the following property of the support measure Support of an itemset never exceeds the support of its subsets This is known as the anti-monotone property of support
Apriori Algorithm (contd)
The basic steps to mine the frequent elements are as follows
bull Generate and test In this first find the 1-itemset frequent elements L1 by scanning the database and removing all those elements from C which cannot satisfy the minimum support criteria
bull Join step To attain the next level elements Ck join the previous frequent elements by self join ie Lk-1Lk-1 known as Cartesian product of Lk-1 ie This step generates new candidate k-itemsets based on joining Lk-1 with itself which is found in the previous iteration Let Ck denote candidate k-itemset and Lk be the frequent k-itemset
bull Prune step This step eliminates some of the candidate k-itemsets using the Apriori property A scan of the database to determine the count of each candidate in Ck would result in the determination of Lk (ie all candidates having a count no less than the minimum support count are frequent by definition and therefore belong to Lk) Step 2 and 3 is repeated until no new candidate set is generated
TID Items
100 1 3 4
200 2 3 5
300 1 2 3 5
400 2 5
TID Set-of- itemsets
100 134
200 235
300 1235
400 25
Itemset Support
1 2
2 3
3 3
5 3
itemset
1 2
1 3
1 5
2 3
2 5
3 5
TID Set-of- itemsets
100 1 3
200 2 32 5 3 5
300 1 21 31 52 3 2 5 3 5
400 2 5
Itemset Support
1 3 2
2 3 3
2 5 3
3 5 2
itemset
2 3 5
TID Set-of- itemsets
200 2 3 5
300 2 3 5
Itemset Support
2 3 5 2
Database C^1
L2
C2 C^2
C^3
L1
L3C3
Apriori Algorithm (contd) Bottlenecks of Aprioribull It is no doubt that Apriori algorithm successfully finds the frequent
elements from the database But as the dimensionality of the database increase with the number of items then
bull More search space is needed and IO cost will increase
bull Number of database scan is increased thus candidate generation will increase results in increase in computational cost
FP-Growth Algorithm
FP-Growth allows frequent itemset discovery without candidate itemset generation Two step approach
Step 1 Build a compact data structure called the FP-tree Built using 2 passes over the data-set
Step 2 Extracts frequent itemsets directly from the FP-tree
FP-Growth Algorithm (contd)Step 1 FP-Tree Construction FP-Tree is constructed using 2 passes
over the data-setPass 1 Scan data and find support for each
item Discard infrequent items Sort frequent items in decreasing
order based on their supportbull Minimum support count = 2bull Scan database to find frequent 1-itemsetsbull s(A) = 8 s(B) = 7 s(C) = 5 s(D) = 5 s(E) = 3bull 1048698 Item order (decreasing support) A B C D E
Use this order when building the FP-Tree so common prefixes can be shared
FP-Growth Algorithm (contd) Step 1 FP-Tree ConstructionPass 2Nodes correspond to items and have a counter1 FP-Growth reads 1 transaction at a time and maps it to a path
2 Fixed order is used so paths can overlap when transactions share items (when they have the same prefix ) In this case counters are incremented
3 Pointers are maintained between nodes containing the same item creating singly linked lists (dotted lines) The more paths that overlap the higher the compression FP-tree
may fit in memory
4 Frequent itemsets extracted from the FP-Tree
FP-Growth Algorithm (contd) Step 1 FP-Tree Construction (contd)
FP-Growth Algorithm (contd)Complete FP-Tree for Sample Transactions
FP-Growth Algorithm (contd) Step 2 Frequent Itemset Generation FP-Growth extracts frequent itemsets from the FP-tree
Bottom-up algorithm - from the leaves towards the root
Divide and conquer first look for frequent itemsets ending in e then de etc then d then cd etc
First extract prefix path sub-trees ending in an item(set) (using the linked lists)
FP-Growth Algorithm (contd) Prefix path sub-trees (Example)
FP-Growth Algorithm (contd) Example
Let minSup = 2 and extract all frequent itemsets containing E Obtain the prefix path sub-tree for E
Check if E is a frequent item by adding the counts along the linked list (dotted line) If so extract it
Yes count =3 so E is extracted as a frequent itemset
As E is frequent find frequent itemsets ending in e ie DE CE BE and AE
E nodes can now be removed
FP-Growth Algorithm (contd) Conditional FP-Tree
The FP-Tree that would be built if we only consider transactions containing a particular itemset (and then removing that itemset from all transactions)
I Example FP-Tree conditional on e
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Obtain T(DE) from T(E) 4 Use the conditional FP-tree for e to find frequent itemsets ending in DE CE
and AE Note that BE is not considered as B is not in the conditional FP-tree for E
bull Support count of DE = 2 (sum of counts of all Drsquos)bull DE is frequent need to solve CDE BDE ADE if they exist
FP-Growth Algorithm (contd) Current Position of Processing
FP-Growth Algorithm (contd)Solving CDE BDE ADEbull Sub-trees for both CDE and BDE are emptybull no prefix paths ending with C or Bbull Working on ADE
ADE (support count = 2) is frequentsolving next sub problem CE
FP-Growth Algorithm (contd)Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix CE
CE is frequent (support count = 2)bull Work on next sub problems BE (no support) AE
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix AE
AE is frequent (support count = 2)Done with AEWork on next sub problem suffix D
• Support: the fraction of transactions that contain an itemset. E.g. s({Milk, Bread, Diaper}) = 2/5
• Frequent Itemset: an itemset whose support is greater than or equal to a minsup threshold
TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke
Literature Survey (contd): Association Rule
• Association Rule: an implication expression of the form X → Y, where X and Y are itemsets
  Example: {Milk, Diaper} → Beer
• Rule Evaluation Metrics:
  Support (s): the fraction of transactions that contain both X and Y
  Confidence (c): how often items in Y appear in transactions that contain X
Example: {Milk, Diaper} → Beer
  s = σ(Milk, Diaper, Beer) / |T| = 2/5 = 0.4
  c = σ(Milk, Diaper, Beer) / σ(Milk, Diaper) = 2/3 ≈ 0.67
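Both metrics can be computed directly on the five sample transactions above; a minimal sketch (the helper names are illustrative, not from any library):

```python
# Sample transactions from the table above
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, transactions):
    """sigma(X): number of transactions containing every item of X."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """s(X): fraction of transactions containing X."""
    return support_count(itemset, transactions) / len(transactions)

def confidence(X, Y, transactions):
    """c(X -> Y) = sigma(X union Y) / sigma(X)."""
    return support_count(X | Y, transactions) / support_count(X, transactions)

s = support({"Milk", "Diaper", "Beer"}, transactions)       # 2/5 = 0.4
c = confidence({"Milk", "Diaper"}, {"Beer"}, transactions)  # 2/3 ≈ 0.67
```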
Apriori Algorithm
• Apriori principle: if an itemset is frequent, then all of its subsets must also be frequent.
• The Apriori principle holds due to the following property of the support measure: the support of an itemset never exceeds the support of any of its subsets. This is known as the anti-monotone property of support.
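The anti-monotone property can be checked mechanically on the sample transactions above; a small sketch (`sigma` is an illustrative helper name):

```python
from itertools import combinations

# Sample transactions, as in the table above
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def sigma(itemset):
    """Support count: transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

# Anti-monotone property: no subset is rarer than its superset
itemset = frozenset({"Milk", "Diaper", "Beer"})
for k in range(1, len(itemset)):
    for subset in map(frozenset, combinations(itemset, k)):
        assert sigma(subset) >= sigma(itemset)
```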
Apriori Algorithm (contd)
The basic steps to mine the frequent elements are as follows:
• Generate and test: first find the frequent 1-itemsets L1 by scanning the database and removing from the candidate set C1 all elements that do not satisfy the minimum support criterion.
• Join step: to obtain the next level of candidates Ck, self-join the previous frequent itemsets, Lk-1 ⋈ Lk-1. This step generates new candidate k-itemsets from the Lk-1 found in the previous iteration. (Ck denotes the candidate k-itemsets; Lk denotes the frequent k-itemsets.)
• Prune step: eliminate candidate k-itemsets that have an infrequent (k-1)-subset, using the Apriori property. A scan of the database then determines the count of each remaining candidate in Ck; all candidates whose count is no less than the minimum support count are frequent by definition and therefore belong to Lk. Steps 2 and 3 are repeated until no new candidate set is generated.
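The three steps can be sketched compactly; this is a simplified illustration on the integer-item trace that follows (minimum support count 2), not an optimized implementation from the cited papers:

```python
from itertools import combinations

def apriori(transactions, min_count=2):
    """Return all frequent itemsets (frozensets) with their support counts."""
    # Generate and test: frequent 1-itemsets L1
    items = {i for t in transactions for i in t}
    L = {frozenset([i]) for i in items
         if sum(i in t for t in transactions) >= min_count}
    frequent = {s: sum(s <= t for t in transactions) for s in L}
    k = 2
    while L:
        # Join step: Lk-1 joined with Lk-1 (unions that form a k-itemset)
        candidates = {a | b for a in L for b in L if len(a | b) == k}
        # Prune step: drop candidates with an infrequent (k-1)-subset
        candidates = {c for c in candidates
                      if all(frozenset(sub) in L
                             for sub in combinations(c, k - 1))}
        # One database scan counts the surviving candidates
        L = set()
        for cand in candidates:
            count = sum(cand <= t for t in transactions)
            if count >= min_count:
                L.add(cand)
                frequent[cand] = count
        k += 1
    return frequent

db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
result = apriori(db)  # {2, 3, 5} survives with support count 2
```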
Worked example (minimum support count = 2):

Database:
  TID 100: 1 3 4
  TID 200: 2 3 5
  TID 300: 1 2 3 5
  TID 400: 2 5

C^1 (candidate 1-itemsets contained in each transaction):
  100: {1} {3} {4}
  200: {2} {3} {5}
  300: {1} {2} {3} {5}
  400: {2} {5}

L1 (frequent 1-itemsets with support counts):
  {1}: 2   {2}: 3   {3}: 3   {5}: 3

C2 (candidate 2-itemsets):
  {1 2} {1 3} {1 5} {2 3} {2 5} {3 5}

C^2 (candidates contained in each transaction):
  100: {1 3}
  200: {2 3} {2 5} {3 5}
  300: {1 2} {1 3} {1 5} {2 3} {2 5} {3 5}
  400: {2 5}

L2 (frequent 2-itemsets):
  {1 3}: 2   {2 3}: 3   {2 5}: 3   {3 5}: 2

C3 (candidate 3-itemsets):
  {2 3 5}

C^3 (candidates contained in each transaction):
  200: {2 3 5}
  300: {2 3 5}

L3 (frequent 3-itemsets):
  {2 3 5}: 2
Apriori Algorithm (contd): Bottlenecks of Apriori
• The Apriori algorithm undoubtedly finds the frequent elements in the database, but as the dimensionality of the database increases with the number of items:
• more search space is needed and the I/O cost increases;
• the number of database scans grows, so candidate generation, and with it the computational cost, increases.
FP-Growth Algorithm
FP-Growth allows frequent itemset discovery without candidate itemset generation. It is a two-step approach:
Step 1: Build a compact data structure called the FP-tree, using 2 passes over the data set.
Step 2: Extract frequent itemsets directly from the FP-tree.
FP-Growth Algorithm (contd): Step 1, FP-Tree Construction
The FP-tree is constructed using 2 passes over the data set.
Pass 1: Scan the data and find the support of each item. Discard infrequent items and sort the frequent items in decreasing order of support.
• Minimum support count = 2
• Scan the database to find frequent 1-itemsets: s(A) = 8, s(B) = 7, s(C) = 5, s(D) = 5, s(E) = 3
• Item order (decreasing support): A, B, C, D, E
Use this order when building the FP-tree, so common prefixes can be shared.
FP-Growth Algorithm (contd): Step 1, FP-Tree Construction, Pass 2
Nodes correspond to items and carry a counter.
1. FP-Growth reads one transaction at a time and maps it to a path.
2. A fixed item order is used, so paths can overlap when transactions share items (i.e. have the same prefix); in that case the shared counters are incremented.
3. Pointers are maintained between nodes containing the same item, creating singly linked lists (dotted lines). The more paths overlap, the higher the compression; the FP-tree may then fit in memory.
4. Frequent itemsets are extracted from the FP-tree.
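The two passes above can be sketched as a small program; the class name, the `rank` map, and the header-table representation are illustrative assumptions, not the exact structures of Han et al.:

```python
from collections import defaultdict

class FPNode:
    """A node of the FP-tree: an item, a counter, and child links."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}  # item -> FPNode

def build_fptree(transactions, min_count=2):
    # Pass 1: support counts; keep frequent items in decreasing-support order
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[item] += 1
    rank = {item: r for r, item in enumerate(
        sorted((i for i in counts if counts[i] >= min_count),
               key=lambda i: -counts[i]))}
    # Pass 2: map each transaction to a path; shared prefixes share nodes
    root = FPNode(None, None)
    header = defaultdict(list)  # item -> all nodes for it (the linked list)
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in rank), key=rank.get):
            if item not in node.children:
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += 1
    return root, header

# Tiny usage example on three transactions; all share the prefix A
root, header = build_fptree([{"A", "B"}, {"A", "C"}, {"A", "B", "C"}])
```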
FP-Growth Algorithm (contd) Step 1 FP-Tree Construction (contd)
FP-Growth Algorithm (contd)Complete FP-Tree for Sample Transactions
FP-Growth Algorithm (contd): Step 2, Frequent Itemset Generation
FP-Growth extracts frequent itemsets from the FP-tree:
• a bottom-up algorithm, working from the leaves towards the root;
• divide and conquer: first look for frequent itemsets ending in E, then DE, etc., then D, then CD, etc.;
• first extract the prefix-path sub-trees ending in an item or itemset (using the linked lists).
FP-Growth Algorithm (contd) Prefix path sub-trees (Example)
FP-Growth Algorithm (contd) Example
Let minSup = 2 and extract all frequent itemsets containing E.
• Obtain the prefix-path sub-tree for E.
• Check whether E is a frequent item by adding the counts along the linked list (dotted line); if so, extract it.
• Yes, count = 3, so E is extracted as a frequent itemset.
• As E is frequent, find the frequent itemsets ending in E, i.e. DE, CE, BE and AE.
• The E nodes can now be removed.
FP-Growth Algorithm (contd): Conditional FP-Tree
The conditional FP-tree is the FP-tree that would be built if we only considered transactions containing a particular itemset (and then removed that itemset from all transactions).
Example: the FP-tree conditional on E.
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd): Obtain T(DE) from T(E)
Use the conditional FP-tree for E to find frequent itemsets ending in DE, CE and AE. Note that BE is not considered, as B is not in the conditional FP-tree for E.
• Support count of DE = 2 (the sum of the counts of all D nodes).
• DE is frequent; we now need to solve CDE, BDE and ADE, if they exist.
FP-Growth Algorithm (contd) Current Position of Processing
FP-Growth Algorithm (contd): Solving CDE, BDE, ADE
• The sub-trees for both CDE and BDE are empty: there are no prefix paths ending with C or B.
• Working on ADE: ADE (support count = 2) is frequent.
Next sub-problem: CE.
FP-Growth Algorithm (contd)Current Position in Processing
FP-Growth Algorithm (contd): Solving for Suffix CE
• CE is frequent (support count = 2).
• Next sub-problems: BE (no support) and AE.
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd): Solving for Suffix AE
• AE is frequent (support count = 2). Done with AE; the next sub-problem is suffix D.
FP-Growth Algorithm (contd): Found Frequent Itemsets with Suffix E
• E, DE, ADE, CE and AE, discovered in this order.
The frequent itemsets are found ordered by suffix and by the order in which they are discovered.
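The suffix-by-suffix recursion above (E, DE, ADE, CE, AE, then suffix D, ...) can be written compactly over conditional pattern bases instead of explicit tree nodes. This is a simplified pattern-growth sketch on an assumed dataset similar to the slides' example, not the pointer-based FP-tree implementation of Han et al.:

```python
from collections import defaultdict

def fp_growth(pattern_base, suffix, order, min_count, results):
    """pattern_base: (itemset, count) pairs for transactions holding `suffix`."""
    counts = defaultdict(int)
    for items, c in pattern_base:
        for i in items:
            counts[i] += c
    # Process least-frequent items first, mirroring suffixes E, D, C, ...
    for item in sorted(counts, key=order.get, reverse=True):
        if counts[item] < min_count:
            continue
        new_suffix = frozenset({item}) | suffix
        results[new_suffix] = counts[item]
        # Conditional pattern base for the grown suffix: transactions that
        # contain `item`, restricted to items ranked above it (its ancestors)
        conditional = [({i for i in items if order[i] < order[item]}, c)
                       for items, c in pattern_base if item in items]
        fp_growth(conditional, new_suffix, order, min_count, results)
    return results

# Illustrative dataset (assumed; close to, but not necessarily identical
# to, the transactions behind the slides' figures)
db = [{"A", "B"}, {"B", "C", "D"}, {"A", "C", "D", "E"}, {"A", "D", "E"},
      {"A", "B", "C"}, {"A", "B", "C", "D"}, {"A"}, {"A", "B", "C"},
      {"A", "B", "D"}, {"B", "C", "E"}]
global_counts = defaultdict(int)
for t in db:
    for i in t:
        global_counts[i] += 1
order = {i: r for r, i in enumerate(
    sorted(global_counts, key=lambda i: -global_counts[i]))}
found = fp_growth([(frozenset(t), 1) for t in db], frozenset(), order, 2, {})
```

On this data the recursion reproduces the slides' result for suffix E: it finds E, DE, ADE, CE and AE, while BE is never generated because B falls out of the conditional pattern base for E.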
Comparative Result
Conclusion
It is found that:
• The FP-tree is a novel data structure storing compressed, crucial information about frequent patterns; it is compact yet complete for frequent pattern mining.
• FP-growth is an efficient method for mining frequent patterns in large databases; it uses a highly compact FP-tree and is divide-and-conquer in nature.
• Both Apriori and FP-Growth aim to find the complete set of frequent patterns, but FP-Growth is more efficient than Apriori with respect to long patterns.
References
1. Liwu Zou, Guangwei Ren, "The data mining algorithm analysis for personalized service", Fourth International Conference on Multimedia Information Networking and Security, 2012.
2. Jun Tan, Yingyong Bu, Bo Yang, "An Efficient Frequent Pattern Mining Algorithm", Sixth International Conference on Fuzzy Systems and Knowledge Discovery, 2009.
3. Wei Zhang, Hongzhi Liao, Na Zhao, "Research on the FP Growth Algorithm about Association Rule Mining", International Seminar on Business and Information Management, 2008.
4. S.P. Latha, Dr. N. Ramaraj, "Algorithm for Efficient Data Mining", IEEE International Conference on Computational Intelligence and Multimedia Applications, 2007.
5. Dongme Sun, Shaohua Teng, Wei Zhang, Haibin Zhu, "An Algorithm to Improve the Effectiveness of Apriori", 6th IEEE International Conference on Cognitive Informatics (ICCI'07), 2007.
6. Daniel Hunyadi, "Performance comparison of Apriori and FP-Growth algorithms in generating association rules", Proceedings of the European Computing Conference, 2006.
7. Jiawei Han, Micheline Kamber, "Data Mining: Concepts and Techniques", Morgan Kaufmann Publishers, 2006.
8. P.-N. Tan, M. Steinbach, V. Kumar, "Introduction to Data Mining", Addison Wesley Publishers, 2006.
9. J. Han, J. Pei, Y. Yin, "Mining frequent patterns without candidate generation", Proc. ACM-SIGMOD International Conference on Management of Data (SIGMOD), 2000.
10. R. Agrawal, T. Imielinski, A. Swami, "Mining Association Rules between Sets of Items in Large Databases", Proc. ACM SIGMOD Conference, Washington DC, USA, 1993.
Introduction (contd) Data Mining Applicationsbull Aid to marketing or retailing
bull Market basket analysis (MBA)
bull Medicare and health care
bull Criminal investigation and homeland security
bull Intrusion detection
bull Phenomena of ldquobeer and baby diapersrdquo And many morehellip
Literature Survey Association Rule Miningbull Proposed by R Agrawal in 1993
bull It is an important data mining model studied extensively by the database and data mining community
bull Initially used for Market Basket Analysis to find how items purchased by customers are related
bull Given a set of transactions find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction
Literature Survey (contd)Frequent Itemset
bull Itemset A collection of one or more items
Example Milk Bread Diaper k-itemset
An itemset that contains k itemsbull Support count () Frequency of occurrence of an itemset Eg (Milk Bread Diaper) = 2
bull Support Fraction of transactions that contain an itemset Eg s( Milk Bread Diaper ) = 25
bull Frequent Itemset An itemset whose support is greater than or equal
to a minsup threshold
TID Items
1 Bread Milk
2 Bread Diaper Beer Eggs
3 Milk Diaper Beer Coke
4 Bread Milk Diaper Beer
5 Bread Milk Diaper Coke
Literature Survey (contd)Association Rulebull Association Rule An implication expression of
the form X Y where X and Y are itemsets
Example Milk Diaper Beer
bull Rule Evaluation Metrics Support (s)
Fraction of transactions that contain both X and Y
Confidence (c) Measures how often items in
Y appear in transactions thatcontain X
TID Items
1 Bread Milk
2 Bread Diaper Beer Eggs
3 Milk Diaper Beer Coke
4 Bread Milk Diaper Beer
5 Bread Milk Diaper Coke
ExampleBeerDiaperMilk
4052
|T|)BeerDiaperMilk(
s
67032
)DiaperMilk()BeerDiaperMilk(
c
Apriori Algorithm
bull Apriori principle If an itemset is frequent then all of its subsets must also be frequent
bull Apriori principle holds due to the following property of the support measure Support of an itemset never exceeds the support of its subsets This is known as the anti-monotone property of support
Apriori Algorithm (contd)
The basic steps to mine the frequent elements are as follows
bull Generate and test In this first find the 1-itemset frequent elements L1 by scanning the database and removing all those elements from C which cannot satisfy the minimum support criteria
bull Join step To attain the next level elements Ck join the previous frequent elements by self join ie Lk-1Lk-1 known as Cartesian product of Lk-1 ie This step generates new candidate k-itemsets based on joining Lk-1 with itself which is found in the previous iteration Let Ck denote candidate k-itemset and Lk be the frequent k-itemset
bull Prune step This step eliminates some of the candidate k-itemsets using the Apriori property A scan of the database to determine the count of each candidate in Ck would result in the determination of Lk (ie all candidates having a count no less than the minimum support count are frequent by definition and therefore belong to Lk) Step 2 and 3 is repeated until no new candidate set is generated
TID Items
100 1 3 4
200 2 3 5
300 1 2 3 5
400 2 5
TID Set-of- itemsets
100 134
200 235
300 1235
400 25
Itemset Support
1 2
2 3
3 3
5 3
itemset
1 2
1 3
1 5
2 3
2 5
3 5
TID Set-of- itemsets
100 1 3
200 2 32 5 3 5
300 1 21 31 52 3 2 5 3 5
400 2 5
Itemset Support
1 3 2
2 3 3
2 5 3
3 5 2
itemset
2 3 5
TID Set-of- itemsets
200 2 3 5
300 2 3 5
Itemset Support
2 3 5 2
Database C^1
L2
C2 C^2
C^3
L1
L3C3
Apriori Algorithm (contd) Bottlenecks of Aprioribull It is no doubt that Apriori algorithm successfully finds the frequent
elements from the database But as the dimensionality of the database increase with the number of items then
bull More search space is needed and IO cost will increase
bull Number of database scan is increased thus candidate generation will increase results in increase in computational cost
FP-Growth Algorithm
FP-Growth allows frequent itemset discovery without candidate itemset generation Two step approach
Step 1 Build a compact data structure called the FP-tree Built using 2 passes over the data-set
Step 2 Extracts frequent itemsets directly from the FP-tree
FP-Growth Algorithm (contd)Step 1 FP-Tree Construction FP-Tree is constructed using 2 passes
over the data-setPass 1 Scan data and find support for each
item Discard infrequent items Sort frequent items in decreasing
order based on their supportbull Minimum support count = 2bull Scan database to find frequent 1-itemsetsbull s(A) = 8 s(B) = 7 s(C) = 5 s(D) = 5 s(E) = 3bull 1048698 Item order (decreasing support) A B C D E
Use this order when building the FP-Tree so common prefixes can be shared
FP-Growth Algorithm (contd) Step 1 FP-Tree ConstructionPass 2Nodes correspond to items and have a counter1 FP-Growth reads 1 transaction at a time and maps it to a path
2 Fixed order is used so paths can overlap when transactions share items (when they have the same prefix ) In this case counters are incremented
3 Pointers are maintained between nodes containing the same item creating singly linked lists (dotted lines) The more paths that overlap the higher the compression FP-tree
may fit in memory
4 Frequent itemsets extracted from the FP-Tree
FP-Growth Algorithm (contd) Step 1 FP-Tree Construction (contd)
FP-Growth Algorithm (contd)Complete FP-Tree for Sample Transactions
FP-Growth Algorithm (contd) Step 2 Frequent Itemset Generation FP-Growth extracts frequent itemsets from the FP-tree
Bottom-up algorithm - from the leaves towards the root
Divide and conquer first look for frequent itemsets ending in e then de etc then d then cd etc
First extract prefix path sub-trees ending in an item(set) (using the linked lists)
FP-Growth Algorithm (contd) Prefix path sub-trees (Example)
FP-Growth Algorithm (contd) Example
Let minSup = 2 and extract all frequent itemsets containing E Obtain the prefix path sub-tree for E
Check if E is a frequent item by adding the counts along the linked list (dotted line) If so extract it
Yes count =3 so E is extracted as a frequent itemset
As E is frequent find frequent itemsets ending in e ie DE CE BE and AE
E nodes can now be removed
FP-Growth Algorithm (contd) Conditional FP-Tree
The FP-Tree that would be built if we only consider transactions containing a particular itemset (and then removing that itemset from all transactions)
I Example FP-Tree conditional on e
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Obtain T(DE) from T(E) 4 Use the conditional FP-tree for e to find frequent itemsets ending in DE CE
and AE Note that BE is not considered as B is not in the conditional FP-tree for E
bull Support count of DE = 2 (sum of counts of all Drsquos)bull DE is frequent need to solve CDE BDE ADE if they exist
FP-Growth Algorithm (contd) Current Position of Processing
FP-Growth Algorithm (contd)Solving CDE BDE ADEbull Sub-trees for both CDE and BDE are emptybull no prefix paths ending with C or Bbull Working on ADE
ADE (support count = 2) is frequentsolving next sub problem CE
FP-Growth Algorithm (contd)Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix CE
CE is frequent (support count = 2)bull Work on next sub problems BE (no support) AE
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix AE
AE is frequent (support count = 2)Done with AEWork on next sub problem suffix D
FP-Growth Algorithm (contd) Found Frequent Itemsets with Suffix Ebull E DE ADE CE AE discovered in this order
FP-Growth Algorithm (contd) Example (contd)
Frequent itemsets found (ordered by suffix and order in which the are found)
Comparative Result
Conclusion
It is found that
bull FP-tree a novel data structure storing compressed crucial information about frequent patterns compact yet complete for frequent pattern mining
bull FP-growth an efficient mining method of frequent patterns in large Database using a highly compact FP-tree divide-and-conquer method in nature
bull Both Apriori and FP-Growth are aiming to find out complete set of patterns but FP-Growth is more efficient than Apriori in respect to long patterns
References
1 Liwu ZOU Guangwei REN ldquoThe data mining algorithm analysis for personalized servicerdquo Fourth International Conference on Multimedia Information Networking and Security 2012
2 Jun TAN Yingyong BU and Bo YANG ldquoAn Efficient Frequent Pattern Mining Algorithmrdquo Sixth International Conference on Fuzzy Systems and Knowledge Discovery 2009
3 Wei Zhang Hongzhi Liao Na Zhao ldquoResearch on the FP Growth Algorithm about Association Rule Miningrdquo International Seminar on Business and Information Management 2008
4 SP Latha DR NRamaraj ldquoAlgorithm for Efficient Data Miningrdquo In Proc
Intrsquo Conf on IEEE International Computational Intelligence and Multimedia Applications 2007
References (contd)
5 Dongme Sun Shaohua Teng Wei Zhang Haibin Zhu ldquoAn Algorithm to Improve the Effectiveness of Apriorirdquo In Proc Intrsquol Conf on 6th IEEE International Conf on Cognitive Informatics (ICCI07) 2007
6 Daniel Hunyadi ldquoPerformance comparison of Apriori and FP-Growth algorithms in generating association rulesrdquo Proceedings of the European Computing Conference 2006
7 By Jiawei Han Micheline Kamber ldquoData mining Concepts and Techniquesrdquo Morgan Kaufmann Publishers 2006
8 Tan P-N Steinbach M and Kumar V ldquoIntroduction to data miningrdquo Addison Wesley Publishers 2006
References (contd)
9 HanJ PeiJ and Yin Y ldquoMining frequent patterns without candidate generationrdquo In Proc ACM-SIGMOD International Conf Management of Data (SIGMOD) 2000
10 R Agrawal Imielinskit SwamiA ldquoMining Association Rules between Sets of Items in Large Databasesrdquo In Proc International Conf of the ACM SIGMOD Conference Washington DC USA 1993
Literature Survey Association Rule Miningbull Proposed by R Agrawal in 1993
bull It is an important data mining model studied extensively by the database and data mining community
bull Initially used for Market Basket Analysis to find how items purchased by customers are related
bull Given a set of transactions find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction
Literature Survey (contd)Frequent Itemset
bull Itemset A collection of one or more items
Example Milk Bread Diaper k-itemset
An itemset that contains k itemsbull Support count () Frequency of occurrence of an itemset Eg (Milk Bread Diaper) = 2
bull Support Fraction of transactions that contain an itemset Eg s( Milk Bread Diaper ) = 25
bull Frequent Itemset An itemset whose support is greater than or equal
to a minsup threshold
TID Items
1 Bread Milk
2 Bread Diaper Beer Eggs
3 Milk Diaper Beer Coke
4 Bread Milk Diaper Beer
5 Bread Milk Diaper Coke
Literature Survey (contd)Association Rulebull Association Rule An implication expression of
the form X Y where X and Y are itemsets
Example Milk Diaper Beer
bull Rule Evaluation Metrics Support (s)
Fraction of transactions that contain both X and Y
Confidence (c) Measures how often items in
Y appear in transactions thatcontain X
TID Items
1 Bread Milk
2 Bread Diaper Beer Eggs
3 Milk Diaper Beer Coke
4 Bread Milk Diaper Beer
5 Bread Milk Diaper Coke
ExampleBeerDiaperMilk
4052
|T|)BeerDiaperMilk(
s
67032
)DiaperMilk()BeerDiaperMilk(
c
Apriori Algorithm
bull Apriori principle If an itemset is frequent then all of its subsets must also be frequent
bull Apriori principle holds due to the following property of the support measure Support of an itemset never exceeds the support of its subsets This is known as the anti-monotone property of support
Apriori Algorithm (contd)
The basic steps to mine the frequent elements are as follows
bull Generate and test In this first find the 1-itemset frequent elements L1 by scanning the database and removing all those elements from C which cannot satisfy the minimum support criteria
bull Join step To attain the next level elements Ck join the previous frequent elements by self join ie Lk-1Lk-1 known as Cartesian product of Lk-1 ie This step generates new candidate k-itemsets based on joining Lk-1 with itself which is found in the previous iteration Let Ck denote candidate k-itemset and Lk be the frequent k-itemset
bull Prune step This step eliminates some of the candidate k-itemsets using the Apriori property A scan of the database to determine the count of each candidate in Ck would result in the determination of Lk (ie all candidates having a count no less than the minimum support count are frequent by definition and therefore belong to Lk) Step 2 and 3 is repeated until no new candidate set is generated
TID Items
100 1 3 4
200 2 3 5
300 1 2 3 5
400 2 5
TID Set-of- itemsets
100 134
200 235
300 1235
400 25
Itemset Support
1 2
2 3
3 3
5 3
itemset
1 2
1 3
1 5
2 3
2 5
3 5
TID Set-of- itemsets
100 1 3
200 2 32 5 3 5
300 1 21 31 52 3 2 5 3 5
400 2 5
Itemset Support
1 3 2
2 3 3
2 5 3
3 5 2
itemset
2 3 5
TID Set-of- itemsets
200 2 3 5
300 2 3 5
Itemset Support
2 3 5 2
Database C^1
L2
C2 C^2
C^3
L1
L3C3
Apriori Algorithm (contd) Bottlenecks of Aprioribull It is no doubt that Apriori algorithm successfully finds the frequent
elements from the database But as the dimensionality of the database increase with the number of items then
bull More search space is needed and IO cost will increase
bull Number of database scan is increased thus candidate generation will increase results in increase in computational cost
FP-Growth Algorithm
FP-Growth allows frequent itemset discovery without candidate itemset generation Two step approach
Step 1 Build a compact data structure called the FP-tree Built using 2 passes over the data-set
Step 2 Extracts frequent itemsets directly from the FP-tree
FP-Growth Algorithm (contd)Step 1 FP-Tree Construction FP-Tree is constructed using 2 passes
over the data-setPass 1 Scan data and find support for each
item Discard infrequent items Sort frequent items in decreasing
order based on their supportbull Minimum support count = 2bull Scan database to find frequent 1-itemsetsbull s(A) = 8 s(B) = 7 s(C) = 5 s(D) = 5 s(E) = 3bull 1048698 Item order (decreasing support) A B C D E
Use this order when building the FP-Tree so common prefixes can be shared
FP-Growth Algorithm (contd) Step 1 FP-Tree ConstructionPass 2Nodes correspond to items and have a counter1 FP-Growth reads 1 transaction at a time and maps it to a path
2 Fixed order is used so paths can overlap when transactions share items (when they have the same prefix ) In this case counters are incremented
3 Pointers are maintained between nodes containing the same item creating singly linked lists (dotted lines) The more paths that overlap the higher the compression FP-tree
may fit in memory
4 Frequent itemsets extracted from the FP-Tree
FP-Growth Algorithm (contd) Step 1 FP-Tree Construction (contd)
FP-Growth Algorithm (contd)Complete FP-Tree for Sample Transactions
FP-Growth Algorithm (contd) Step 2 Frequent Itemset Generation FP-Growth extracts frequent itemsets from the FP-tree
Bottom-up algorithm - from the leaves towards the root
Divide and conquer first look for frequent itemsets ending in e then de etc then d then cd etc
First extract prefix path sub-trees ending in an item(set) (using the linked lists)
FP-Growth Algorithm (contd) Prefix path sub-trees (Example)
FP-Growth Algorithm (contd) Example
Let minSup = 2 and extract all frequent itemsets containing E Obtain the prefix path sub-tree for E
Check if E is a frequent item by adding the counts along the linked list (dotted line) If so extract it
Yes count =3 so E is extracted as a frequent itemset
As E is frequent find frequent itemsets ending in e ie DE CE BE and AE
E nodes can now be removed
FP-Growth Algorithm (contd) Conditional FP-Tree
The FP-Tree that would be built if we only consider transactions containing a particular itemset (and then removing that itemset from all transactions)
I Example FP-Tree conditional on e
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Obtain T(DE) from T(E) 4 Use the conditional FP-tree for e to find frequent itemsets ending in DE CE
and AE Note that BE is not considered as B is not in the conditional FP-tree for E
bull Support count of DE = 2 (sum of counts of all Drsquos)bull DE is frequent need to solve CDE BDE ADE if they exist
FP-Growth Algorithm (contd) Current Position of Processing
FP-Growth Algorithm (contd)Solving CDE BDE ADEbull Sub-trees for both CDE and BDE are emptybull no prefix paths ending with C or Bbull Working on ADE
ADE (support count = 2) is frequentsolving next sub problem CE
FP-Growth Algorithm (contd)Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix CE
CE is frequent (support count = 2)bull Work on next sub problems BE (no support) AE
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix AE
AE is frequent (support count = 2)Done with AEWork on next sub problem suffix D
FP-Growth Algorithm (contd) Found Frequent Itemsets with Suffix Ebull E DE ADE CE AE discovered in this order
FP-Growth Algorithm (contd) Example (contd)
Frequent itemsets found (ordered by suffix and order in which the are found)
Comparative Result
Conclusion
It is found that
bull FP-tree a novel data structure storing compressed crucial information about frequent patterns compact yet complete for frequent pattern mining
bull FP-growth an efficient mining method of frequent patterns in large Database using a highly compact FP-tree divide-and-conquer method in nature
bull Both Apriori and FP-Growth are aiming to find out complete set of patterns but FP-Growth is more efficient than Apriori in respect to long patterns
References
1 Liwu ZOU Guangwei REN ldquoThe data mining algorithm analysis for personalized servicerdquo Fourth International Conference on Multimedia Information Networking and Security 2012
2 Jun TAN Yingyong BU and Bo YANG ldquoAn Efficient Frequent Pattern Mining Algorithmrdquo Sixth International Conference on Fuzzy Systems and Knowledge Discovery 2009
3 Wei Zhang Hongzhi Liao Na Zhao ldquoResearch on the FP Growth Algorithm about Association Rule Miningrdquo International Seminar on Business and Information Management 2008
4 SP Latha DR NRamaraj ldquoAlgorithm for Efficient Data Miningrdquo In Proc
Intrsquo Conf on IEEE International Computational Intelligence and Multimedia Applications 2007
References (contd)
5 Dongme Sun Shaohua Teng Wei Zhang Haibin Zhu ldquoAn Algorithm to Improve the Effectiveness of Apriorirdquo In Proc Intrsquol Conf on 6th IEEE International Conf on Cognitive Informatics (ICCI07) 2007
6 Daniel Hunyadi ldquoPerformance comparison of Apriori and FP-Growth algorithms in generating association rulesrdquo Proceedings of the European Computing Conference 2006
7 By Jiawei Han Micheline Kamber ldquoData mining Concepts and Techniquesrdquo Morgan Kaufmann Publishers 2006
8 Tan P-N Steinbach M and Kumar V ldquoIntroduction to data miningrdquo Addison Wesley Publishers 2006
References (contd)
9 HanJ PeiJ and Yin Y ldquoMining frequent patterns without candidate generationrdquo In Proc ACM-SIGMOD International Conf Management of Data (SIGMOD) 2000
10 R Agrawal Imielinskit SwamiA ldquoMining Association Rules between Sets of Items in Large Databasesrdquo In Proc International Conf of the ACM SIGMOD Conference Washington DC USA 1993
Literature Survey (contd)Frequent Itemset
bull Itemset A collection of one or more items
Example Milk Bread Diaper k-itemset
An itemset that contains k itemsbull Support count () Frequency of occurrence of an itemset Eg (Milk Bread Diaper) = 2
bull Support Fraction of transactions that contain an itemset Eg s( Milk Bread Diaper ) = 25
bull Frequent Itemset An itemset whose support is greater than or equal
to a minsup threshold
TID Items
1 Bread Milk
2 Bread Diaper Beer Eggs
3 Milk Diaper Beer Coke
4 Bread Milk Diaper Beer
5 Bread Milk Diaper Coke
Literature Survey (contd)Association Rulebull Association Rule An implication expression of
the form X Y where X and Y are itemsets
Example Milk Diaper Beer
bull Rule Evaluation Metrics Support (s)
Fraction of transactions that contain both X and Y
Confidence (c) Measures how often items in
Y appear in transactions thatcontain X
TID Items
1 Bread Milk
2 Bread Diaper Beer Eggs
3 Milk Diaper Beer Coke
4 Bread Milk Diaper Beer
5 Bread Milk Diaper Coke
ExampleBeerDiaperMilk
4052
|T|)BeerDiaperMilk(
s
67032
)DiaperMilk()BeerDiaperMilk(
c
Apriori Algorithm
bull Apriori principle If an itemset is frequent then all of its subsets must also be frequent
bull Apriori principle holds due to the following property of the support measure Support of an itemset never exceeds the support of its subsets This is known as the anti-monotone property of support
Apriori Algorithm (contd)
The basic steps to mine the frequent elements are as follows
bull Generate and test In this first find the 1-itemset frequent elements L1 by scanning the database and removing all those elements from C which cannot satisfy the minimum support criteria
bull Join step To attain the next level elements Ck join the previous frequent elements by self join ie Lk-1Lk-1 known as Cartesian product of Lk-1 ie This step generates new candidate k-itemsets based on joining Lk-1 with itself which is found in the previous iteration Let Ck denote candidate k-itemset and Lk be the frequent k-itemset
bull Prune step This step eliminates some of the candidate k-itemsets using the Apriori property A scan of the database to determine the count of each candidate in Ck would result in the determination of Lk (ie all candidates having a count no less than the minimum support count are frequent by definition and therefore belong to Lk) Step 2 and 3 is repeated until no new candidate set is generated
Database (minimum support count = 2):
TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5

C^1 (transactions as sets of candidate 1-itemsets):
TID   Set-of-itemsets
100   {1}, {3}, {4}
200   {2}, {3}, {5}
300   {1}, {2}, {3}, {5}
400   {2}, {5}

L1 (frequent 1-itemsets):
Itemset   Support
{1}       2
{2}       3
{3}       3
{5}       3

C2 (candidate 2-itemsets):
{1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5}

C^2 (candidates contained in each transaction):
TID   Set-of-itemsets
100   {1 3}
200   {2 3}, {2 5}, {3 5}
300   {1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5}
400   {2 5}

L2 (frequent 2-itemsets):
Itemset   Support
{1 3}     2
{2 3}     2
{2 5}     3
{3 5}     2

C3 (candidate 3-itemsets):
{2 3 5}

C^3 (candidates contained in each transaction):
TID   Set-of-itemsets
200   {2 3 5}
300   {2 3 5}

L3 (frequent 3-itemsets):
Itemset   Support
{2 3 5}   2
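The generate-and-test / join / prune loop above can be condensed into a short Python sketch. This is a simplified illustration of the idea, not the optimized algorithm from the literature (hash trees and other candidate-counting structures are omitted), run on the four-transaction example database.

```python
# Simplified Apriori: level-wise join + prune + support counting.
from itertools import combinations

def apriori(db, minsup):
    # L1: frequent 1-itemsets found by a first database scan
    items = {i for t in db for i in t}
    Lk = {frozenset([i]) for i in items
          if sum(i in t for t in db) >= minsup}
    frequent = set(Lk)
    k = 2
    while Lk:
        # Join step: Lk-1 self-joined gives candidate k-itemsets Ck
        Ck = {a | b for a in Lk for b in Lk if len(a | b) == k}
        # Prune step: drop candidates with an infrequent (k-1)-subset
        Ck = {c for c in Ck
              if all(frozenset(s) in Lk for s in combinations(c, k - 1))}
        # Database scan to count each surviving candidate
        Lk = {c for c in Ck if sum(c <= t for t in db) >= minsup}
        frequent |= Lk
        k += 1
    return frequent

db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
result = apriori(db, 2)
print(len(result))  # 9 frequent itemsets, including {2, 3, 5}
```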
Apriori Algorithm (contd.) Bottlenecks of Apriori
• The Apriori algorithm undoubtedly succeeds in finding the frequent itemsets in a database. But as the dimensionality of the database increases with the number of items:
• more search space is needed and the I/O cost increases;
• the number of database scans grows, so candidate generation increases, which raises the computational cost.
FP-Growth Algorithm
FP-Growth allows frequent itemset discovery without candidate itemset generation. It is a two-step approach:
Step 1: Build a compact data structure called the FP-tree, using 2 passes over the data set.
Step 2: Extract frequent itemsets directly from the FP-tree.
FP-Growth Algorithm (contd.) Step 1: FP-Tree Construction
The FP-tree is constructed using 2 passes over the data set.
Pass 1: Scan the data and find the support for each item. Discard infrequent items and sort the frequent items in decreasing order of support.
• Minimum support count = 2
• Scan the database to find the frequent 1-itemsets:
s(A) = 8, s(B) = 7, s(C) = 5, s(D) = 5, s(E) = 3
• Item order (decreasing support): A, B, C, D, E
Use this order when building the FP-tree, so common prefixes can be shared.
FP-Growth Algorithm (contd.) Step 1: FP-Tree Construction, Pass 2
Nodes correspond to items and carry a counter.
1. FP-Growth reads one transaction at a time and maps it to a path.
2. A fixed item order is used, so paths can overlap when transactions share items (i.e. when they have the same prefix); in this case the counters are incremented.
3. Pointers are maintained between nodes containing the same item, creating singly linked lists (dotted lines). The more paths overlap, the higher the compression; the FP-tree may then fit in memory.
4. Frequent itemsets are then extracted from the FP-tree.
FP-Growth Algorithm (contd) Step 1 FP-Tree Construction (contd)
FP-Growth Algorithm (contd)Complete FP-Tree for Sample Transactions
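The two passes above can be sketched in Python. This is a minimal illustration run on the earlier four-transaction database (the A–E transactions behind the slide's figure are not listed in the text); the `Node` class and header-table layout are design choices of this sketch, not the slides' exact structure.

```python
# Two-pass FP-tree construction: count supports, then insert each
# transaction as a path in frequency order, sharing common prefixes.
from collections import defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count = item, parent, 1
        self.children = {}

def build_fptree(db, minsup):
    # Pass 1: support counts; discard infrequent items; fix decreasing order
    counts = defaultdict(int)
    for t in db:
        for i in t:
            counts[i] += 1
    order = sorted((i for i in counts if counts[i] >= minsup),
                   key=lambda i: -counts[i])
    rank = {i: r for r, i in enumerate(order)}
    root, header = Node(None, None), defaultdict(list)
    # Pass 2: map each transaction to a path; overlapping prefixes share
    # nodes and only increment counters
    for t in db:
        node = root
        for i in sorted((i for i in t if i in rank), key=rank.get):
            if i in node.children:
                node.children[i].count += 1
            else:
                child = Node(i, node)
                node.children[i] = child
                header[i].append(child)  # node-link list (the dotted lines)
            node = node.children[i]
    return root, header

db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
root, header = build_fptree(db, 2)
```

Summing the counts along an item's node-link list gives that item's support, which is exactly how the linked lists are used in Step 2.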
FP-Growth Algorithm (contd.) Step 2: Frequent Itemset Generation
FP-Growth extracts frequent itemsets from the FP-tree.
It is a bottom-up algorithm, working from the leaves towards the root.
Divide and conquer: first look for frequent itemsets ending in E, then DE, etc., then D, then CD, etc.
First, extract the prefix-path sub-trees ending in an item or itemset (using the linked lists).
FP-Growth Algorithm (contd) Prefix path sub-trees (Example)
FP-Growth Algorithm (contd) Example
Let minSup = 2 and extract all frequent itemsets containing E. Obtain the prefix-path sub-tree for E.
Check whether E is a frequent item by adding the counts along its linked list (dotted line); if so, extract it.
Yes, count = 3, so E is extracted as a frequent itemset.
As E is frequent, find the frequent itemsets ending in E, i.e. DE, CE, BE and AE.
The E nodes can now be removed.
FP-Growth Algorithm (contd.) Conditional FP-Tree
The conditional FP-tree is the FP-tree that would be built if we considered only transactions containing a particular itemset (and then removed that itemset from all transactions).
Example: the FP-tree conditional on E.
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd.) Obtain T(DE) from T(E)
Use the conditional FP-tree for E to find frequent itemsets ending in DE, CE and AE. Note that BE is not considered, as B is not in the conditional FP-tree for E.
• Support count of DE = 2 (the sum of the counts of all D nodes)
• DE is frequent; next solve CDE, BDE and ADE, if they exist.
FP-Growth Algorithm (contd) Current Position of Processing
FP-Growth Algorithm (contd.) Solving CDE, BDE, ADE
• The sub-trees for both CDE and BDE are empty: there are no prefix paths ending with C or B.
• Working on ADE: ADE (support count = 2) is frequent.
Next sub-problem: suffix CE.
FP-Growth Algorithm (contd)Current Position in Processing
FP-Growth Algorithm (contd.) Solving for Suffix CE
CE is frequent (support count = 2).
• Work on the next sub-problems: BE (no support), then AE.
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd.) Solving for Suffix AE
AE is frequent (support count = 2). Done with AE; work on the next sub-problem, suffix D.
FP-Growth Algorithm (contd.) Found Frequent Itemsets with Suffix E
• E, DE, ADE, CE and AE, discovered in this order.
FP-Growth Algorithm (contd.) Example (contd.)
Frequent itemsets found (ordered by suffix and by the order in which they are found).
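The whole divide-and-conquer recursion walked through above (prefix-path extraction via the node links, conditional pattern base, conditional FP-tree) can be condensed into one self-contained sketch. It is an illustrative simplification, not the authors' implementation, run on the earlier four-transaction database so its output can be compared with Apriori's.

```python
# FP-Growth sketch: for each item, collect its prefix paths (conditional
# pattern base), build the conditional FP-tree, and recurse on it.
from collections import defaultdict

class Node:
    def __init__(self, item, parent, count):
        self.item, self.parent, self.count = item, parent, count
        self.children = {}

def build(db, minsup):
    """db is a list of (itemset, count) pairs; returns the header table."""
    counts = defaultdict(int)
    for t, c in db:
        for i in t:
            counts[i] += c
    order = {i: -counts[i] for i in counts if counts[i] >= minsup}
    root, header = Node(None, None, 0), defaultdict(list)
    for t, c in db:
        node = root
        for i in sorted((i for i in t if i in order),
                        key=lambda i: (order[i], i)):
            if i not in node.children:
                node.children[i] = Node(i, node, 0)
                header[i].append(node.children[i])  # node-link list
            node = node.children[i]
            node.count += c
    return header

def fpgrowth(db, minsup, suffix=()):
    header = build(db, minsup)
    result = []
    for item, nodes in header.items():
        support = sum(n.count for n in nodes)  # sum along the linked list
        itemset = (item,) + suffix
        result.append((frozenset(itemset), support))
        # Conditional pattern base: prefix paths ending at `item`
        cond_db = []
        for n in nodes:
            path, p = [], n.parent
            while p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                cond_db.append((path, n.count))
        result += fpgrowth(cond_db, minsup, itemset)  # divide and conquer
    return result

db = [({1, 3, 4}, 1), ({2, 3, 5}, 1), ({1, 2, 3, 5}, 1), ({2, 5}, 1)]
found = fpgrowth(db, 2)
print(len(found))  # 9, the same frequent itemsets Apriori finds
```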
Comparative Result
Conclusion
It is found that:
• The FP-tree is a novel data structure storing compressed, crucial information about frequent patterns; it is compact yet complete for frequent pattern mining.
• FP-growth is an efficient method for mining frequent patterns in large databases; it uses a highly compact FP-tree and is divide-and-conquer in nature.
• Both Apriori and FP-Growth aim to find the complete set of patterns, but FP-Growth is more efficient than Apriori with respect to long patterns.
References
1. Liwu Zou, Guangwei Ren, "The data mining algorithm analysis for personalized service", Fourth International Conference on Multimedia Information Networking and Security, 2012.
2. Jun Tan, Yingyong Bu and Bo Yang, "An Efficient Frequent Pattern Mining Algorithm", Sixth International Conference on Fuzzy Systems and Knowledge Discovery, 2009.
3. Wei Zhang, Hongzhi Liao, Na Zhao, "Research on the FP Growth Algorithm about Association Rule Mining", International Seminar on Business and Information Management, 2008.
4. S. P. Latha, Dr. N. Ramaraj, "Algorithm for Efficient Data Mining", In Proc. IEEE International Conference on Computational Intelligence and Multimedia Applications, 2007.
References (contd.)
5. Dongme Sun, Shaohua Teng, Wei Zhang, Haibin Zhu, "An Algorithm to Improve the Effectiveness of Apriori", In Proc. 6th IEEE International Conference on Cognitive Informatics (ICCI'07), 2007.
6. Daniel Hunyadi, "Performance comparison of Apriori and FP-Growth algorithms in generating association rules", Proceedings of the European Computing Conference, 2006.
7. Jiawei Han, Micheline Kamber, "Data Mining: Concepts and Techniques", Morgan Kaufmann Publishers, 2006.
8. Tan, P.-N., Steinbach, M. and Kumar, V., "Introduction to Data Mining", Addison Wesley Publishers, 2006.
References (contd.)
9. Han, J., Pei, J. and Yin, Y., "Mining frequent patterns without candidate generation", In Proc. ACM-SIGMOD International Conference on Management of Data (SIGMOD), 2000.
10. R. Agrawal, T. Imielinski, A. Swami, "Mining Association Rules between Sets of Items in Large Databases", In Proc. ACM SIGMOD Conference, Washington DC, USA, 1993.
Literature Survey (contd)Association Rulebull Association Rule An implication expression of
the form X Y where X and Y are itemsets
Example Milk Diaper Beer
bull Rule Evaluation Metrics Support (s)
Fraction of transactions that contain both X and Y
Confidence (c) Measures how often items in
Y appear in transactions thatcontain X
TID Items
1 Bread Milk
2 Bread Diaper Beer Eggs
3 Milk Diaper Beer Coke
4 Bread Milk Diaper Beer
5 Bread Milk Diaper Coke
ExampleBeerDiaperMilk
4052
|T|)BeerDiaperMilk(
s
67032
)DiaperMilk()BeerDiaperMilk(
c
Apriori Algorithm
bull Apriori principle If an itemset is frequent then all of its subsets must also be frequent
bull Apriori principle holds due to the following property of the support measure Support of an itemset never exceeds the support of its subsets This is known as the anti-monotone property of support
Apriori Algorithm (contd)
The basic steps to mine the frequent elements are as follows
bull Generate and test In this first find the 1-itemset frequent elements L1 by scanning the database and removing all those elements from C which cannot satisfy the minimum support criteria
bull Join step To attain the next level elements Ck join the previous frequent elements by self join ie Lk-1Lk-1 known as Cartesian product of Lk-1 ie This step generates new candidate k-itemsets based on joining Lk-1 with itself which is found in the previous iteration Let Ck denote candidate k-itemset and Lk be the frequent k-itemset
bull Prune step This step eliminates some of the candidate k-itemsets using the Apriori property A scan of the database to determine the count of each candidate in Ck would result in the determination of Lk (ie all candidates having a count no less than the minimum support count are frequent by definition and therefore belong to Lk) Step 2 and 3 is repeated until no new candidate set is generated
TID Items
100 1 3 4
200 2 3 5
300 1 2 3 5
400 2 5
TID Set-of- itemsets
100 134
200 235
300 1235
400 25
Itemset Support
1 2
2 3
3 3
5 3
itemset
1 2
1 3
1 5
2 3
2 5
3 5
TID Set-of- itemsets
100 1 3
200 2 32 5 3 5
300 1 21 31 52 3 2 5 3 5
400 2 5
Itemset Support
1 3 2
2 3 3
2 5 3
3 5 2
itemset
2 3 5
TID Set-of- itemsets
200 2 3 5
300 2 3 5
Itemset Support
2 3 5 2
Database C^1
L2
C2 C^2
C^3
L1
L3C3
Apriori Algorithm (contd) Bottlenecks of Aprioribull It is no doubt that Apriori algorithm successfully finds the frequent
elements from the database But as the dimensionality of the database increase with the number of items then
bull More search space is needed and IO cost will increase
bull Number of database scan is increased thus candidate generation will increase results in increase in computational cost
FP-Growth Algorithm
FP-Growth allows frequent itemset discovery without candidate itemset generation Two step approach
Step 1 Build a compact data structure called the FP-tree Built using 2 passes over the data-set
Step 2 Extracts frequent itemsets directly from the FP-tree
FP-Growth Algorithm (contd)Step 1 FP-Tree Construction FP-Tree is constructed using 2 passes
over the data-setPass 1 Scan data and find support for each
item Discard infrequent items Sort frequent items in decreasing
order based on their supportbull Minimum support count = 2bull Scan database to find frequent 1-itemsetsbull s(A) = 8 s(B) = 7 s(C) = 5 s(D) = 5 s(E) = 3bull 1048698 Item order (decreasing support) A B C D E
Use this order when building the FP-Tree so common prefixes can be shared
FP-Growth Algorithm (contd) Step 1 FP-Tree ConstructionPass 2Nodes correspond to items and have a counter1 FP-Growth reads 1 transaction at a time and maps it to a path
2 Fixed order is used so paths can overlap when transactions share items (when they have the same prefix ) In this case counters are incremented
3 Pointers are maintained between nodes containing the same item creating singly linked lists (dotted lines) The more paths that overlap the higher the compression FP-tree
may fit in memory
4 Frequent itemsets extracted from the FP-Tree
FP-Growth Algorithm (contd) Step 1 FP-Tree Construction (contd)
FP-Growth Algorithm (contd)Complete FP-Tree for Sample Transactions
FP-Growth Algorithm (contd) Step 2 Frequent Itemset Generation FP-Growth extracts frequent itemsets from the FP-tree
Bottom-up algorithm - from the leaves towards the root
Divide and conquer first look for frequent itemsets ending in e then de etc then d then cd etc
First extract prefix path sub-trees ending in an item(set) (using the linked lists)
FP-Growth Algorithm (contd) Prefix path sub-trees (Example)
FP-Growth Algorithm (contd) Example
Let minSup = 2 and extract all frequent itemsets containing E Obtain the prefix path sub-tree for E
Check if E is a frequent item by adding the counts along the linked list (dotted line) If so extract it
Yes count =3 so E is extracted as a frequent itemset
As E is frequent find frequent itemsets ending in e ie DE CE BE and AE
E nodes can now be removed
FP-Growth Algorithm (contd) Conditional FP-Tree
The FP-Tree that would be built if we only consider transactions containing a particular itemset (and then removing that itemset from all transactions)
I Example FP-Tree conditional on e
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Obtain T(DE) from T(E) 4 Use the conditional FP-tree for e to find frequent itemsets ending in DE CE
and AE Note that BE is not considered as B is not in the conditional FP-tree for E
bull Support count of DE = 2 (sum of counts of all Drsquos)bull DE is frequent need to solve CDE BDE ADE if they exist
FP-Growth Algorithm (contd) Current Position of Processing
FP-Growth Algorithm (contd)Solving CDE BDE ADEbull Sub-trees for both CDE and BDE are emptybull no prefix paths ending with C or Bbull Working on ADE
ADE (support count = 2) is frequentsolving next sub problem CE
FP-Growth Algorithm (contd)Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix CE
CE is frequent (support count = 2)bull Work on next sub problems BE (no support) AE
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix AE
AE is frequent (support count = 2)Done with AEWork on next sub problem suffix D
FP-Growth Algorithm (contd) Found Frequent Itemsets with Suffix Ebull E DE ADE CE AE discovered in this order
FP-Growth Algorithm (contd) Example (contd)
Frequent itemsets found (ordered by suffix and order in which the are found)
Comparative Result
Conclusion
It is found that
bull FP-tree a novel data structure storing compressed crucial information about frequent patterns compact yet complete for frequent pattern mining
bull FP-growth an efficient mining method of frequent patterns in large Database using a highly compact FP-tree divide-and-conquer method in nature
bull Both Apriori and FP-Growth are aiming to find out complete set of patterns but FP-Growth is more efficient than Apriori in respect to long patterns
References
1 Liwu ZOU Guangwei REN ldquoThe data mining algorithm analysis for personalized servicerdquo Fourth International Conference on Multimedia Information Networking and Security 2012
2 Jun TAN Yingyong BU and Bo YANG ldquoAn Efficient Frequent Pattern Mining Algorithmrdquo Sixth International Conference on Fuzzy Systems and Knowledge Discovery 2009
3 Wei Zhang Hongzhi Liao Na Zhao ldquoResearch on the FP Growth Algorithm about Association Rule Miningrdquo International Seminar on Business and Information Management 2008
4 SP Latha DR NRamaraj ldquoAlgorithm for Efficient Data Miningrdquo In Proc
Intrsquo Conf on IEEE International Computational Intelligence and Multimedia Applications 2007
References (contd)
5 Dongme Sun Shaohua Teng Wei Zhang Haibin Zhu ldquoAn Algorithm to Improve the Effectiveness of Apriorirdquo In Proc Intrsquol Conf on 6th IEEE International Conf on Cognitive Informatics (ICCI07) 2007
6 Daniel Hunyadi ldquoPerformance comparison of Apriori and FP-Growth algorithms in generating association rulesrdquo Proceedings of the European Computing Conference 2006
7 By Jiawei Han Micheline Kamber ldquoData mining Concepts and Techniquesrdquo Morgan Kaufmann Publishers 2006
8 Tan P-N Steinbach M and Kumar V ldquoIntroduction to data miningrdquo Addison Wesley Publishers 2006
References (contd)
9 HanJ PeiJ and Yin Y ldquoMining frequent patterns without candidate generationrdquo In Proc ACM-SIGMOD International Conf Management of Data (SIGMOD) 2000
10 R Agrawal Imielinskit SwamiA ldquoMining Association Rules between Sets of Items in Large Databasesrdquo In Proc International Conf of the ACM SIGMOD Conference Washington DC USA 1993
Apriori Algorithm
bull Apriori principle If an itemset is frequent then all of its subsets must also be frequent
bull Apriori principle holds due to the following property of the support measure Support of an itemset never exceeds the support of its subsets This is known as the anti-monotone property of support
Apriori Algorithm (contd)
The basic steps to mine the frequent elements are as follows
bull Generate and test In this first find the 1-itemset frequent elements L1 by scanning the database and removing all those elements from C which cannot satisfy the minimum support criteria
bull Join step To attain the next level elements Ck join the previous frequent elements by self join ie Lk-1Lk-1 known as Cartesian product of Lk-1 ie This step generates new candidate k-itemsets based on joining Lk-1 with itself which is found in the previous iteration Let Ck denote candidate k-itemset and Lk be the frequent k-itemset
bull Prune step This step eliminates some of the candidate k-itemsets using the Apriori property A scan of the database to determine the count of each candidate in Ck would result in the determination of Lk (ie all candidates having a count no less than the minimum support count are frequent by definition and therefore belong to Lk) Step 2 and 3 is repeated until no new candidate set is generated
TID Items
100 1 3 4
200 2 3 5
300 1 2 3 5
400 2 5
TID Set-of- itemsets
100 134
200 235
300 1235
400 25
Itemset Support
1 2
2 3
3 3
5 3
itemset
1 2
1 3
1 5
2 3
2 5
3 5
TID Set-of- itemsets
100 1 3
200 2 32 5 3 5
300 1 21 31 52 3 2 5 3 5
400 2 5
Itemset Support
1 3 2
2 3 3
2 5 3
3 5 2
itemset
2 3 5
TID Set-of- itemsets
200 2 3 5
300 2 3 5
Itemset Support
2 3 5 2
Database C^1
L2
C2 C^2
C^3
L1
L3C3
Apriori Algorithm (contd) Bottlenecks of Aprioribull It is no doubt that Apriori algorithm successfully finds the frequent
elements from the database But as the dimensionality of the database increase with the number of items then
bull More search space is needed and IO cost will increase
bull Number of database scan is increased thus candidate generation will increase results in increase in computational cost
FP-Growth Algorithm
FP-Growth allows frequent itemset discovery without candidate itemset generation Two step approach
Step 1 Build a compact data structure called the FP-tree Built using 2 passes over the data-set
Step 2 Extracts frequent itemsets directly from the FP-tree
FP-Growth Algorithm (contd)Step 1 FP-Tree Construction FP-Tree is constructed using 2 passes
over the data-setPass 1 Scan data and find support for each
item Discard infrequent items Sort frequent items in decreasing
order based on their supportbull Minimum support count = 2bull Scan database to find frequent 1-itemsetsbull s(A) = 8 s(B) = 7 s(C) = 5 s(D) = 5 s(E) = 3bull 1048698 Item order (decreasing support) A B C D E
Use this order when building the FP-Tree so common prefixes can be shared
FP-Growth Algorithm (contd) Step 1 FP-Tree ConstructionPass 2Nodes correspond to items and have a counter1 FP-Growth reads 1 transaction at a time and maps it to a path
2 Fixed order is used so paths can overlap when transactions share items (when they have the same prefix ) In this case counters are incremented
3 Pointers are maintained between nodes containing the same item creating singly linked lists (dotted lines) The more paths that overlap the higher the compression FP-tree
may fit in memory
4 Frequent itemsets extracted from the FP-Tree
FP-Growth Algorithm (contd) Step 1 FP-Tree Construction (contd)
FP-Growth Algorithm (contd)Complete FP-Tree for Sample Transactions
FP-Growth Algorithm (contd) Step 2 Frequent Itemset Generation FP-Growth extracts frequent itemsets from the FP-tree
Bottom-up algorithm - from the leaves towards the root
Divide and conquer first look for frequent itemsets ending in e then de etc then d then cd etc
First extract prefix path sub-trees ending in an item(set) (using the linked lists)
FP-Growth Algorithm (contd) Prefix path sub-trees (Example)
FP-Growth Algorithm (contd) Example
Let minSup = 2 and extract all frequent itemsets containing E Obtain the prefix path sub-tree for E
Check if E is a frequent item by adding the counts along the linked list (dotted line) If so extract it
Yes count =3 so E is extracted as a frequent itemset
As E is frequent find frequent itemsets ending in e ie DE CE BE and AE
E nodes can now be removed
FP-Growth Algorithm (contd) Conditional FP-Tree
The FP-Tree that would be built if we only consider transactions containing a particular itemset (and then removing that itemset from all transactions)
I Example FP-Tree conditional on e
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Obtain T(DE) from T(E) 4 Use the conditional FP-tree for e to find frequent itemsets ending in DE CE
and AE Note that BE is not considered as B is not in the conditional FP-tree for E
bull Support count of DE = 2 (sum of counts of all Drsquos)bull DE is frequent need to solve CDE BDE ADE if they exist
FP-Growth Algorithm (contd) Current Position of Processing
FP-Growth Algorithm (contd)Solving CDE BDE ADEbull Sub-trees for both CDE and BDE are emptybull no prefix paths ending with C or Bbull Working on ADE
ADE (support count = 2) is frequentsolving next sub problem CE
FP-Growth Algorithm (contd)Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix CE
CE is frequent (support count = 2)bull Work on next sub problems BE (no support) AE
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix AE
AE is frequent (support count = 2)Done with AEWork on next sub problem suffix D
FP-Growth Algorithm (contd) Found Frequent Itemsets with Suffix Ebull E DE ADE CE AE discovered in this order
FP-Growth Algorithm (contd) Example (contd)
Frequent itemsets found (ordered by suffix and order in which the are found)
Comparative Result
Conclusion
It is found that
bull FP-tree a novel data structure storing compressed crucial information about frequent patterns compact yet complete for frequent pattern mining
bull FP-growth an efficient mining method of frequent patterns in large Database using a highly compact FP-tree divide-and-conquer method in nature
bull Both Apriori and FP-Growth are aiming to find out complete set of patterns but FP-Growth is more efficient than Apriori in respect to long patterns
References
1 Liwu ZOU Guangwei REN ldquoThe data mining algorithm analysis for personalized servicerdquo Fourth International Conference on Multimedia Information Networking and Security 2012
2 Jun TAN Yingyong BU and Bo YANG ldquoAn Efficient Frequent Pattern Mining Algorithmrdquo Sixth International Conference on Fuzzy Systems and Knowledge Discovery 2009
3 Wei Zhang Hongzhi Liao Na Zhao ldquoResearch on the FP Growth Algorithm about Association Rule Miningrdquo International Seminar on Business and Information Management 2008
4 SP Latha DR NRamaraj ldquoAlgorithm for Efficient Data Miningrdquo In Proc
Intrsquo Conf on IEEE International Computational Intelligence and Multimedia Applications 2007
References (contd)
5 Dongme Sun Shaohua Teng Wei Zhang Haibin Zhu ldquoAn Algorithm to Improve the Effectiveness of Apriorirdquo In Proc Intrsquol Conf on 6th IEEE International Conf on Cognitive Informatics (ICCI07) 2007
6 Daniel Hunyadi ldquoPerformance comparison of Apriori and FP-Growth algorithms in generating association rulesrdquo Proceedings of the European Computing Conference 2006
7 By Jiawei Han Micheline Kamber ldquoData mining Concepts and Techniquesrdquo Morgan Kaufmann Publishers 2006
8 Tan P-N Steinbach M and Kumar V ldquoIntroduction to data miningrdquo Addison Wesley Publishers 2006
References (contd)
9 HanJ PeiJ and Yin Y ldquoMining frequent patterns without candidate generationrdquo In Proc ACM-SIGMOD International Conf Management of Data (SIGMOD) 2000
10 R Agrawal Imielinskit SwamiA ldquoMining Association Rules between Sets of Items in Large Databasesrdquo In Proc International Conf of the ACM SIGMOD Conference Washington DC USA 1993
Apriori Algorithm (contd)
The basic steps to mine the frequent elements are as follows
bull Generate and test In this first find the 1-itemset frequent elements L1 by scanning the database and removing all those elements from C which cannot satisfy the minimum support criteria
bull Join step To attain the next level elements Ck join the previous frequent elements by self join ie Lk-1Lk-1 known as Cartesian product of Lk-1 ie This step generates new candidate k-itemsets based on joining Lk-1 with itself which is found in the previous iteration Let Ck denote candidate k-itemset and Lk be the frequent k-itemset
bull Prune step This step eliminates some of the candidate k-itemsets using the Apriori property A scan of the database to determine the count of each candidate in Ck would result in the determination of Lk (ie all candidates having a count no less than the minimum support count are frequent by definition and therefore belong to Lk) Step 2 and 3 is repeated until no new candidate set is generated
TID Items
100 1 3 4
200 2 3 5
300 1 2 3 5
400 2 5
TID Set-of- itemsets
100 134
200 235
300 1235
400 25
Itemset Support
1 2
2 3
3 3
5 3
itemset
1 2
1 3
1 5
2 3
2 5
3 5
TID Set-of- itemsets
100 1 3
200 2 32 5 3 5
300 1 21 31 52 3 2 5 3 5
400 2 5
Itemset Support
1 3 2
2 3 3
2 5 3
3 5 2
itemset
2 3 5
TID Set-of- itemsets
200 2 3 5
300 2 3 5
Itemset Support
2 3 5 2
Database C^1
L2
C2 C^2
C^3
L1
L3C3
Apriori Algorithm (contd) Bottlenecks of Aprioribull It is no doubt that Apriori algorithm successfully finds the frequent
elements from the database But as the dimensionality of the database increase with the number of items then
bull More search space is needed and IO cost will increase
bull Number of database scan is increased thus candidate generation will increase results in increase in computational cost
FP-Growth Algorithm
FP-Growth allows frequent itemset discovery without candidate itemset generation Two step approach
Step 1 Build a compact data structure called the FP-tree Built using 2 passes over the data-set
Step 2 Extracts frequent itemsets directly from the FP-tree
FP-Growth Algorithm (contd)Step 1 FP-Tree Construction FP-Tree is constructed using 2 passes
over the data-setPass 1 Scan data and find support for each
item Discard infrequent items Sort frequent items in decreasing
order based on their supportbull Minimum support count = 2bull Scan database to find frequent 1-itemsetsbull s(A) = 8 s(B) = 7 s(C) = 5 s(D) = 5 s(E) = 3bull 1048698 Item order (decreasing support) A B C D E
Use this order when building the FP-Tree so common prefixes can be shared
FP-Growth Algorithm (contd) Step 1 FP-Tree ConstructionPass 2Nodes correspond to items and have a counter1 FP-Growth reads 1 transaction at a time and maps it to a path
2 Fixed order is used so paths can overlap when transactions share items (when they have the same prefix ) In this case counters are incremented
3 Pointers are maintained between nodes containing the same item creating singly linked lists (dotted lines) The more paths that overlap the higher the compression FP-tree
may fit in memory
4 Frequent itemsets extracted from the FP-Tree
FP-Growth Algorithm (contd) Step 1 FP-Tree Construction (contd)
FP-Growth Algorithm (contd)Complete FP-Tree for Sample Transactions
FP-Growth Algorithm (contd) Step 2 Frequent Itemset Generation FP-Growth extracts frequent itemsets from the FP-tree
Bottom-up algorithm - from the leaves towards the root
Divide and conquer first look for frequent itemsets ending in e then de etc then d then cd etc
First extract prefix path sub-trees ending in an item(set) (using the linked lists)
FP-Growth Algorithm (contd) Prefix path sub-trees (Example)
FP-Growth Algorithm (contd) Example
Let minSup = 2 and extract all frequent itemsets containing E Obtain the prefix path sub-tree for E
Check if E is a frequent item by adding the counts along the linked list (dotted line) If so extract it
Yes count =3 so E is extracted as a frequent itemset
As E is frequent find frequent itemsets ending in e ie DE CE BE and AE
E nodes can now be removed
FP-Growth Algorithm (contd) Conditional FP-Tree
The FP-Tree that would be built if we only consider transactions containing a particular itemset (and then removing that itemset from all transactions)
I Example FP-Tree conditional on e
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Obtain T(DE) from T(E) 4 Use the conditional FP-tree for e to find frequent itemsets ending in DE CE
and AE Note that BE is not considered as B is not in the conditional FP-tree for E
bull Support count of DE = 2 (sum of counts of all Drsquos)bull DE is frequent need to solve CDE BDE ADE if they exist
FP-Growth Algorithm (contd) Current Position of Processing
FP-Growth Algorithm (contd)Solving CDE BDE ADEbull Sub-trees for both CDE and BDE are emptybull no prefix paths ending with C or Bbull Working on ADE
ADE (support count = 2) is frequentsolving next sub problem CE
FP-Growth Algorithm (contd)Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix CE
CE is frequent (support count = 2)bull Work on next sub problems BE (no support) AE
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix AE
AE is frequent (support count = 2)Done with AEWork on next sub problem suffix D
FP-Growth Algorithm (contd) Found Frequent Itemsets with Suffix Ebull E DE ADE CE AE discovered in this order
FP-Growth Algorithm (contd) Example (contd)
Frequent itemsets found (ordered by suffix and order in which the are found)
Comparative Result
Conclusion
It is found that
bull FP-tree a novel data structure storing compressed crucial information about frequent patterns compact yet complete for frequent pattern mining
bull FP-growth an efficient mining method of frequent patterns in large Database using a highly compact FP-tree divide-and-conquer method in nature
bull Both Apriori and FP-Growth are aiming to find out complete set of patterns but FP-Growth is more efficient than Apriori in respect to long patterns
References
1 Liwu ZOU Guangwei REN ldquoThe data mining algorithm analysis for personalized servicerdquo Fourth International Conference on Multimedia Information Networking and Security 2012
2 Jun TAN Yingyong BU and Bo YANG ldquoAn Efficient Frequent Pattern Mining Algorithmrdquo Sixth International Conference on Fuzzy Systems and Knowledge Discovery 2009
3 Wei Zhang Hongzhi Liao Na Zhao ldquoResearch on the FP Growth Algorithm about Association Rule Miningrdquo International Seminar on Business and Information Management 2008
4 SP Latha DR NRamaraj ldquoAlgorithm for Efficient Data Miningrdquo In Proc
Intrsquo Conf on IEEE International Computational Intelligence and Multimedia Applications 2007
References (contd)
5 Dongme Sun Shaohua Teng Wei Zhang Haibin Zhu ldquoAn Algorithm to Improve the Effectiveness of Apriorirdquo In Proc Intrsquol Conf on 6th IEEE International Conf on Cognitive Informatics (ICCI07) 2007
6 Daniel Hunyadi ldquoPerformance comparison of Apriori and FP-Growth algorithms in generating association rulesrdquo Proceedings of the European Computing Conference 2006
7 By Jiawei Han Micheline Kamber ldquoData mining Concepts and Techniquesrdquo Morgan Kaufmann Publishers 2006
8 Tan P-N Steinbach M and Kumar V ldquoIntroduction to data miningrdquo Addison Wesley Publishers 2006
References (contd)
9 HanJ PeiJ and Yin Y ldquoMining frequent patterns without candidate generationrdquo In Proc ACM-SIGMOD International Conf Management of Data (SIGMOD) 2000
10 R Agrawal Imielinskit SwamiA ldquoMining Association Rules between Sets of Items in Large Databasesrdquo In Proc International Conf of the ACM SIGMOD Conference Washington DC USA 1993
TID Items
100 1 3 4
200 2 3 5
300 1 2 3 5
400 2 5
TID Set-of- itemsets
100 134
200 235
300 1235
400 25
Itemset Support
1 2
2 3
3 3
5 3
itemset
1 2
1 3
1 5
2 3
2 5
3 5
TID Set-of- itemsets
100 1 3
200 2 32 5 3 5
300 1 21 31 52 3 2 5 3 5
400 2 5
Itemset Support
1 3 2
2 3 3
2 5 3
3 5 2
itemset
2 3 5
TID Set-of- itemsets
200 2 3 5
300 2 3 5
Itemset Support
2 3 5 2
Database C^1
L2
C2 C^2
C^3
L1
L3C3
Apriori Algorithm (contd) Bottlenecks of Aprioribull It is no doubt that Apriori algorithm successfully finds the frequent
elements from the database But as the dimensionality of the database increase with the number of items then
bull More search space is needed and IO cost will increase
bull Number of database scan is increased thus candidate generation will increase results in increase in computational cost
FP-Growth Algorithm
FP-Growth allows frequent itemset discovery without candidate itemset generation. It is a two-step approach:
Step 1: Build a compact data structure called the FP-tree, using 2 passes over the data set.
Step 2: Extract frequent itemsets directly from the FP-tree.
FP-Growth Algorithm (contd) Step 1: FP-Tree Construction. The FP-tree is constructed using 2 passes over the data set.
Pass 1: Scan the data and find the support of each item. Discard infrequent items and sort the frequent items in decreasing order of support.
• Minimum support count = 2
• Scan the database to find the frequent 1-itemsets
• s(A) = 8, s(B) = 7, s(C) = 5, s(D) = 5, s(E) = 3
• Item order (decreasing support): A, B, C, D, E
Use this order when building the FP-tree, so common prefixes can be shared.
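Pass 1 can be sketched in a few lines. The slides give only the aggregated supports s(A)…s(E), not the underlying transactions, so this sketch reuses the small numeric database from the Apriori example; the supports differ, but the counting-and-ordering logic is the same:

```python
from collections import Counter

# Small transaction database reused from the Apriori example.
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
minsup = 2

# Pass 1: one scan to count the support of every item.
counts = Counter(item for t in transactions for item in t)

# Discard infrequent items and sort by decreasing support
# (ties broken by item id so the order is deterministic).
order = sorted((i for i, c in counts.items() if c >= minsup),
               key=lambda i: (-counts[i], i))

# Rewrite each transaction in this fixed order, so transactions that
# share a prefix will later share a path in the FP-tree.
ordered_db = [[i for i in order if i in t] for t in transactions]
# order == [2, 3, 5, 1]; item 4 (support 1) is discarded.
```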
FP-Growth Algorithm (contd) Step 1: FP-Tree Construction, Pass 2. Nodes correspond to items and carry a counter.
1. FP-Growth reads one transaction at a time and maps it to a path.
2. A fixed item order is used, so paths can overlap when transactions share items (i.e. have the same prefix); in that case the counters are incremented.
3. Pointers are maintained between nodes containing the same item, creating singly linked lists (the dotted lines). The more paths overlap, the higher the compression; the FP-tree may then fit in main memory.
4. Frequent itemsets are then extracted from the FP-tree.
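A minimal Pass-2 tree builder, written as a sketch: the `Node` class and `header` dictionary below are illustrative assumptions, not the original implementation, but they realize steps 1–3 (path insertion, counter increments, and the per-item linked lists):

```python
class Node:
    """One FP-tree node: an item, its counter, its children, and the link
    to the next node holding the same item (the dotted-line list)."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}   # item -> child Node
        self.next = None     # next node with the same item

def build_fp_tree(ordered_transactions):
    root = Node(None, None)
    header = {}              # item -> head of that item's linked list
    for t in ordered_transactions:
        node = root
        for item in t:       # items already in global frequency order
            child = node.children.get(item)
            if child is None:                  # path branches off here
                child = Node(item, node)
                node.children[item] = child
                child.next = header.get(item)  # prepend to item's list
                header[item] = child
            child.count += 1                   # shared prefix: bump counter
            node = child
    return root, header

# Transactions from the small example, pre-sorted in the order 2, 3, 5, 1:
root, header = build_fp_tree([[3, 1], [2, 3, 5], [2, 3, 5, 1], [2, 5]])
```

Summing the counters along `header[item]` gives that item's support, which is exactly the check FP-Growth performs before mining a suffix.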
FP-Growth Algorithm (contd) Step 1 FP-Tree Construction (contd)
FP-Growth Algorithm (contd)Complete FP-Tree for Sample Transactions
FP-Growth Algorithm (contd) Step 2: Frequent Itemset Generation. FP-Growth extracts frequent itemsets from the FP-tree.
• Bottom-up algorithm: it works from the leaves towards the root.
• Divide and conquer: first look for frequent itemsets ending in E, then DE, etc., then D, then CD, etc.
• First, extract the prefix-path sub-trees ending in an item(set), using the linked lists.
FP-Growth Algorithm (contd) Prefix path sub-trees (Example)
FP-Growth Algorithm (contd) Example
Let minSup = 2 and extract all frequent itemsets containing E. First obtain the prefix-path sub-tree for E.
Check whether E is a frequent item by adding the counts along its linked list (the dotted line); if so, extract it.
Yes: count = 3, so E is extracted as a frequent itemset.
As E is frequent, find the frequent itemsets ending in E, i.e. DE, CE, BE and AE.
The E nodes can now be removed.
FP-Growth Algorithm (contd) Conditional FP-Tree
The FP-tree that would be built if we considered only the transactions containing a particular itemset (and then removed that itemset from all transactions).
Example: the FP-tree conditional on E.
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Obtain T(DE) from T(E). Use the conditional FP-tree for E to find frequent itemsets ending in DE, CE and AE. Note that BE is not considered, as B is not in the conditional FP-tree for E.
• Support count of DE = 2 (the sum of the counts of all D nodes)
• DE is frequent; next solve CDE, BDE and ADE, if they exist.
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Solving CDE, BDE, ADE
• The sub-trees for both CDE and BDE are empty: there are no prefix paths ending with C or B.
• Working on ADE: ADE (support count = 2) is frequent. Next sub-problem: CE.
FP-Growth Algorithm (contd)Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix CE
CE is frequent (support count = 2).
• Work on the next sub-problems: BE (no support), then AE.
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix AE
AE is frequent (support count = 2). Done with AE; the next sub-problem is suffix D.
FP-Growth Algorithm (contd) Found Frequent Itemsets with Suffix E
• E, DE, ADE, CE and AE, discovered in this order.
FP-Growth Algorithm (contd) Example (contd)
Frequent itemsets found (ordered by suffix and by the order in which they are found).
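The whole divide-and-conquer recursion can be condensed into a compact sketch. For brevity it recurses over conditional pattern bases (prefix paths with counts) rather than explicit tree nodes; this is a simplification, not the paper's implementation. Run on the four-transaction database from the Apriori example, it finds the same nine frequent itemsets as the level-wise tables:

```python
from collections import Counter

def fp_growth(transactions, minsup):
    """Simplified FP-Growth: returns {frozenset(itemset): support}."""
    results = {}

    def mine(patterns, suffix):
        # patterns: the conditional pattern base, as (item-list, count) pairs.
        counts = Counter()
        for items, c in patterns:
            for i in items:
                counts[i] += c
        frequent = {i for i, c in counts.items() if c >= minsup}
        for item in frequent:
            new_suffix = suffix | {item}
            results[frozenset(new_suffix)] = counts[item]
            # Conditional pattern base for `item`: the prefix of every path
            # containing it, restricted to items still frequent here.
            cond = []
            for items, c in patterns:
                if item in items:
                    prefix = [i for i in items[:items.index(item)]
                              if i in frequent]
                    if prefix:
                        cond.append((prefix, c))
            mine(cond, new_suffix)

    # Initial pattern base: every transaction, in decreasing-support order.
    freq = Counter(i for t in transactions for i in t)
    base = [(sorted(t, key=lambda x: (-freq[x], x)), 1) for t in transactions]
    mine(base, frozenset())
    return results

found = fp_growth([{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}], minsup=2)
# found contains 9 itemsets, e.g. found[frozenset({2, 3, 5})] == 2.
```

Each recursive call touches only the (shrinking) pattern base for its suffix, which is why no candidate generation or repeated database scan is needed.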
Comparative Result
Conclusion
It is found that:
• FP-tree: a novel data structure that stores compressed, crucial information about frequent patterns; it is compact yet complete for frequent pattern mining.
• FP-Growth: an efficient method for mining frequent patterns in large databases, using a highly compact FP-tree and a divide-and-conquer approach.
• Both Apriori and FP-Growth aim to find the complete set of patterns, but FP-Growth is more efficient than Apriori with respect to long patterns.
References
1. Liwu Zou, Guangwei Ren, "The data mining algorithm analysis for personalized service", Fourth International Conference on Multimedia Information Networking and Security, 2012.
2. Jun Tan, Yingyong Bu, Bo Yang, "An Efficient Frequent Pattern Mining Algorithm", Sixth International Conference on Fuzzy Systems and Knowledge Discovery, 2009.
3. Wei Zhang, Hongzhi Liao, Na Zhao, "Research on the FP Growth Algorithm about Association Rule Mining", International Seminar on Business and Information Management, 2008.
4. S.P. Latha, N. Ramaraj, "Algorithm for Efficient Data Mining", IEEE International Conference on Computational Intelligence and Multimedia Applications, 2007.
5. Dongme Sun, Shaohua Teng, Wei Zhang, Haibin Zhu, "An Algorithm to Improve the Effectiveness of Apriori", 6th IEEE International Conference on Cognitive Informatics (ICCI'07), 2007.
6. Daniel Hunyadi, "Performance comparison of Apriori and FP-Growth algorithms in generating association rules", Proceedings of the European Computing Conference, 2006.
7. Jiawei Han, Micheline Kamber, "Data Mining: Concepts and Techniques", Morgan Kaufmann Publishers, 2006.
8. P.-N. Tan, M. Steinbach, V. Kumar, "Introduction to Data Mining", Addison Wesley, 2006.
9. J. Han, J. Pei, Y. Yin, "Mining frequent patterns without candidate generation", Proc. ACM SIGMOD International Conference on Management of Data, 2000.
10. R. Agrawal, T. Imielinski, A. Swami, "Mining Association Rules between Sets of Items in Large Databases", Proc. ACM SIGMOD Conference, Washington, DC, USA, 1993.
Apriori Algorithm (contd) Bottlenecks of Aprioribull It is no doubt that Apriori algorithm successfully finds the frequent
elements from the database But as the dimensionality of the database increase with the number of items then
bull More search space is needed and IO cost will increase
bull Number of database scan is increased thus candidate generation will increase results in increase in computational cost
FP-Growth Algorithm
FP-Growth allows frequent itemset discovery without candidate itemset generation Two step approach
Step 1 Build a compact data structure called the FP-tree Built using 2 passes over the data-set
Step 2 Extracts frequent itemsets directly from the FP-tree
FP-Growth Algorithm (contd)Step 1 FP-Tree Construction FP-Tree is constructed using 2 passes
over the data-setPass 1 Scan data and find support for each
item Discard infrequent items Sort frequent items in decreasing
order based on their supportbull Minimum support count = 2bull Scan database to find frequent 1-itemsetsbull s(A) = 8 s(B) = 7 s(C) = 5 s(D) = 5 s(E) = 3bull 1048698 Item order (decreasing support) A B C D E
Use this order when building the FP-Tree so common prefixes can be shared
FP-Growth Algorithm (contd) Step 1 FP-Tree ConstructionPass 2Nodes correspond to items and have a counter1 FP-Growth reads 1 transaction at a time and maps it to a path
2 Fixed order is used so paths can overlap when transactions share items (when they have the same prefix ) In this case counters are incremented
3 Pointers are maintained between nodes containing the same item creating singly linked lists (dotted lines) The more paths that overlap the higher the compression FP-tree
may fit in memory
4 Frequent itemsets extracted from the FP-Tree
FP-Growth Algorithm (contd) Step 1 FP-Tree Construction (contd)
FP-Growth Algorithm (contd)Complete FP-Tree for Sample Transactions
FP-Growth Algorithm (contd) Step 2 Frequent Itemset Generation FP-Growth extracts frequent itemsets from the FP-tree
Bottom-up algorithm - from the leaves towards the root
Divide and conquer first look for frequent itemsets ending in e then de etc then d then cd etc
First extract prefix path sub-trees ending in an item(set) (using the linked lists)
FP-Growth Algorithm (contd) Prefix path sub-trees (Example)
FP-Growth Algorithm (contd) Example
Let minSup = 2 and extract all frequent itemsets containing E Obtain the prefix path sub-tree for E
Check if E is a frequent item by adding the counts along the linked list (dotted line) If so extract it
Yes count =3 so E is extracted as a frequent itemset
As E is frequent find frequent itemsets ending in e ie DE CE BE and AE
E nodes can now be removed
FP-Growth Algorithm (contd) Conditional FP-Tree
The FP-Tree that would be built if we only consider transactions containing a particular itemset (and then removing that itemset from all transactions)
I Example FP-Tree conditional on e
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Obtain T(DE) from T(E) 4 Use the conditional FP-tree for e to find frequent itemsets ending in DE CE
and AE Note that BE is not considered as B is not in the conditional FP-tree for E
bull Support count of DE = 2 (sum of counts of all Drsquos)bull DE is frequent need to solve CDE BDE ADE if they exist
FP-Growth Algorithm (contd) Current Position of Processing
FP-Growth Algorithm (contd)Solving CDE BDE ADEbull Sub-trees for both CDE and BDE are emptybull no prefix paths ending with C or Bbull Working on ADE
ADE (support count = 2) is frequentsolving next sub problem CE
FP-Growth Algorithm (contd)Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix CE
CE is frequent (support count = 2)bull Work on next sub problems BE (no support) AE
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix AE
AE is frequent (support count = 2)Done with AEWork on next sub problem suffix D
FP-Growth Algorithm (contd) Found Frequent Itemsets with Suffix Ebull E DE ADE CE AE discovered in this order
FP-Growth Algorithm (contd) Example (contd)
Frequent itemsets found (ordered by suffix and order in which the are found)
Comparative Result
Conclusion
It is found that
bull FP-tree a novel data structure storing compressed crucial information about frequent patterns compact yet complete for frequent pattern mining
bull FP-growth an efficient mining method of frequent patterns in large Database using a highly compact FP-tree divide-and-conquer method in nature
bull Both Apriori and FP-Growth are aiming to find out complete set of patterns but FP-Growth is more efficient than Apriori in respect to long patterns
References
1 Liwu ZOU Guangwei REN ldquoThe data mining algorithm analysis for personalized servicerdquo Fourth International Conference on Multimedia Information Networking and Security 2012
2 Jun TAN Yingyong BU and Bo YANG ldquoAn Efficient Frequent Pattern Mining Algorithmrdquo Sixth International Conference on Fuzzy Systems and Knowledge Discovery 2009
3 Wei Zhang Hongzhi Liao Na Zhao ldquoResearch on the FP Growth Algorithm about Association Rule Miningrdquo International Seminar on Business and Information Management 2008
4 SP Latha DR NRamaraj ldquoAlgorithm for Efficient Data Miningrdquo In Proc
Intrsquo Conf on IEEE International Computational Intelligence and Multimedia Applications 2007
References (contd)
5 Dongme Sun Shaohua Teng Wei Zhang Haibin Zhu ldquoAn Algorithm to Improve the Effectiveness of Apriorirdquo In Proc Intrsquol Conf on 6th IEEE International Conf on Cognitive Informatics (ICCI07) 2007
6 Daniel Hunyadi ldquoPerformance comparison of Apriori and FP-Growth algorithms in generating association rulesrdquo Proceedings of the European Computing Conference 2006
7 By Jiawei Han Micheline Kamber ldquoData mining Concepts and Techniquesrdquo Morgan Kaufmann Publishers 2006
8 Tan P-N Steinbach M and Kumar V ldquoIntroduction to data miningrdquo Addison Wesley Publishers 2006
References (contd)
9 HanJ PeiJ and Yin Y ldquoMining frequent patterns without candidate generationrdquo In Proc ACM-SIGMOD International Conf Management of Data (SIGMOD) 2000
10 R Agrawal Imielinskit SwamiA ldquoMining Association Rules between Sets of Items in Large Databasesrdquo In Proc International Conf of the ACM SIGMOD Conference Washington DC USA 1993
FP-Growth Algorithm
FP-Growth allows frequent itemset discovery without candidate itemset generation Two step approach
Step 1 Build a compact data structure called the FP-tree Built using 2 passes over the data-set
Step 2 Extracts frequent itemsets directly from the FP-tree
FP-Growth Algorithm (contd)Step 1 FP-Tree Construction FP-Tree is constructed using 2 passes
over the data-setPass 1 Scan data and find support for each
item Discard infrequent items Sort frequent items in decreasing
order based on their supportbull Minimum support count = 2bull Scan database to find frequent 1-itemsetsbull s(A) = 8 s(B) = 7 s(C) = 5 s(D) = 5 s(E) = 3bull 1048698 Item order (decreasing support) A B C D E
Use this order when building the FP-Tree so common prefixes can be shared
FP-Growth Algorithm (contd) Step 1 FP-Tree ConstructionPass 2Nodes correspond to items and have a counter1 FP-Growth reads 1 transaction at a time and maps it to a path
2 Fixed order is used so paths can overlap when transactions share items (when they have the same prefix ) In this case counters are incremented
3 Pointers are maintained between nodes containing the same item creating singly linked lists (dotted lines) The more paths that overlap the higher the compression FP-tree
may fit in memory
4 Frequent itemsets extracted from the FP-Tree
FP-Growth Algorithm (contd) Step 1 FP-Tree Construction (contd)
FP-Growth Algorithm (contd)Complete FP-Tree for Sample Transactions
FP-Growth Algorithm (contd) Step 2 Frequent Itemset Generation FP-Growth extracts frequent itemsets from the FP-tree
Bottom-up algorithm - from the leaves towards the root
Divide and conquer first look for frequent itemsets ending in e then de etc then d then cd etc
First extract prefix path sub-trees ending in an item(set) (using the linked lists)
FP-Growth Algorithm (contd) Prefix path sub-trees (Example)
FP-Growth Algorithm (contd) Example
Let minSup = 2 and extract all frequent itemsets containing E Obtain the prefix path sub-tree for E
Check if E is a frequent item by adding the counts along the linked list (dotted line) If so extract it
Yes count =3 so E is extracted as a frequent itemset
As E is frequent find frequent itemsets ending in e ie DE CE BE and AE
E nodes can now be removed
FP-Growth Algorithm (contd) Conditional FP-Tree
The FP-Tree that would be built if we only consider transactions containing a particular itemset (and then removing that itemset from all transactions)
I Example FP-Tree conditional on e
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Obtain T(DE) from T(E) 4 Use the conditional FP-tree for e to find frequent itemsets ending in DE CE
and AE Note that BE is not considered as B is not in the conditional FP-tree for E
bull Support count of DE = 2 (sum of counts of all Drsquos)bull DE is frequent need to solve CDE BDE ADE if they exist
FP-Growth Algorithm (contd) Current Position of Processing
FP-Growth Algorithm (contd)Solving CDE BDE ADEbull Sub-trees for both CDE and BDE are emptybull no prefix paths ending with C or Bbull Working on ADE
ADE (support count = 2) is frequentsolving next sub problem CE
FP-Growth Algorithm (contd)Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix CE
CE is frequent (support count = 2)bull Work on next sub problems BE (no support) AE
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix AE
AE is frequent (support count = 2)Done with AEWork on next sub problem suffix D
FP-Growth Algorithm (contd) Found Frequent Itemsets with Suffix Ebull E DE ADE CE AE discovered in this order
FP-Growth Algorithm (contd) Example (contd)
Frequent itemsets found (ordered by suffix and order in which the are found)
Comparative Result
Conclusion
It is found that
bull FP-tree a novel data structure storing compressed crucial information about frequent patterns compact yet complete for frequent pattern mining
bull FP-growth an efficient mining method of frequent patterns in large Database using a highly compact FP-tree divide-and-conquer method in nature
bull Both Apriori and FP-Growth are aiming to find out complete set of patterns but FP-Growth is more efficient than Apriori in respect to long patterns
References
1 Liwu ZOU Guangwei REN ldquoThe data mining algorithm analysis for personalized servicerdquo Fourth International Conference on Multimedia Information Networking and Security 2012
2 Jun TAN Yingyong BU and Bo YANG ldquoAn Efficient Frequent Pattern Mining Algorithmrdquo Sixth International Conference on Fuzzy Systems and Knowledge Discovery 2009
3 Wei Zhang Hongzhi Liao Na Zhao ldquoResearch on the FP Growth Algorithm about Association Rule Miningrdquo International Seminar on Business and Information Management 2008
4 SP Latha DR NRamaraj ldquoAlgorithm for Efficient Data Miningrdquo In Proc
Intrsquo Conf on IEEE International Computational Intelligence and Multimedia Applications 2007
References (contd)
5 Dongme Sun Shaohua Teng Wei Zhang Haibin Zhu ldquoAn Algorithm to Improve the Effectiveness of Apriorirdquo In Proc Intrsquol Conf on 6th IEEE International Conf on Cognitive Informatics (ICCI07) 2007
6 Daniel Hunyadi ldquoPerformance comparison of Apriori and FP-Growth algorithms in generating association rulesrdquo Proceedings of the European Computing Conference 2006
7 By Jiawei Han Micheline Kamber ldquoData mining Concepts and Techniquesrdquo Morgan Kaufmann Publishers 2006
8 Tan P-N Steinbach M and Kumar V ldquoIntroduction to data miningrdquo Addison Wesley Publishers 2006
References (contd)
9 HanJ PeiJ and Yin Y ldquoMining frequent patterns without candidate generationrdquo In Proc ACM-SIGMOD International Conf Management of Data (SIGMOD) 2000
10 R Agrawal Imielinskit SwamiA ldquoMining Association Rules between Sets of Items in Large Databasesrdquo In Proc International Conf of the ACM SIGMOD Conference Washington DC USA 1993
FP-Growth Algorithm (contd)Step 1 FP-Tree Construction FP-Tree is constructed using 2 passes
over the data-setPass 1 Scan data and find support for each
item Discard infrequent items Sort frequent items in decreasing
order based on their supportbull Minimum support count = 2bull Scan database to find frequent 1-itemsetsbull s(A) = 8 s(B) = 7 s(C) = 5 s(D) = 5 s(E) = 3bull 1048698 Item order (decreasing support) A B C D E
Use this order when building the FP-Tree so common prefixes can be shared
FP-Growth Algorithm (contd) Step 1 FP-Tree ConstructionPass 2Nodes correspond to items and have a counter1 FP-Growth reads 1 transaction at a time and maps it to a path
2 Fixed order is used so paths can overlap when transactions share items (when they have the same prefix ) In this case counters are incremented
3 Pointers are maintained between nodes containing the same item creating singly linked lists (dotted lines) The more paths that overlap the higher the compression FP-tree
may fit in memory
4 Frequent itemsets extracted from the FP-Tree
FP-Growth Algorithm (contd) Step 1 FP-Tree Construction (contd)
FP-Growth Algorithm (contd)Complete FP-Tree for Sample Transactions
FP-Growth Algorithm (contd) Step 2 Frequent Itemset Generation FP-Growth extracts frequent itemsets from the FP-tree
Bottom-up algorithm - from the leaves towards the root
Divide and conquer first look for frequent itemsets ending in e then de etc then d then cd etc
First extract prefix path sub-trees ending in an item(set) (using the linked lists)
FP-Growth Algorithm (contd) Prefix path sub-trees (Example)
FP-Growth Algorithm (contd) Example
Let minSup = 2 and extract all frequent itemsets containing E Obtain the prefix path sub-tree for E
Check if E is a frequent item by adding the counts along the linked list (dotted line) If so extract it
Yes count =3 so E is extracted as a frequent itemset
As E is frequent find frequent itemsets ending in e ie DE CE BE and AE
E nodes can now be removed
FP-Growth Algorithm (contd) Conditional FP-Tree
The FP-Tree that would be built if we only consider transactions containing a particular itemset (and then removing that itemset from all transactions)
I Example FP-Tree conditional on e
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Obtain T(DE) from T(E) 4 Use the conditional FP-tree for e to find frequent itemsets ending in DE CE
and AE Note that BE is not considered as B is not in the conditional FP-tree for E
bull Support count of DE = 2 (sum of counts of all Drsquos)bull DE is frequent need to solve CDE BDE ADE if they exist
FP-Growth Algorithm (contd) Current Position of Processing
FP-Growth Algorithm (contd)Solving CDE BDE ADEbull Sub-trees for both CDE and BDE are emptybull no prefix paths ending with C or Bbull Working on ADE
ADE (support count = 2) is frequentsolving next sub problem CE
FP-Growth Algorithm (contd)Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix CE
CE is frequent (support count = 2)bull Work on next sub problems BE (no support) AE
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix AE
AE is frequent (support count = 2)Done with AEWork on next sub problem suffix D
FP-Growth Algorithm (contd) Found Frequent Itemsets with Suffix Ebull E DE ADE CE AE discovered in this order
FP-Growth Algorithm (contd) Example (contd)
Frequent itemsets found (ordered by suffix and order in which the are found)
Comparative Result
Conclusion
It is found that
bull FP-tree a novel data structure storing compressed crucial information about frequent patterns compact yet complete for frequent pattern mining
bull FP-growth an efficient mining method of frequent patterns in large Database using a highly compact FP-tree divide-and-conquer method in nature
bull Both Apriori and FP-Growth are aiming to find out complete set of patterns but FP-Growth is more efficient than Apriori in respect to long patterns
References
1 Liwu ZOU Guangwei REN ldquoThe data mining algorithm analysis for personalized servicerdquo Fourth International Conference on Multimedia Information Networking and Security 2012
2 Jun TAN Yingyong BU and Bo YANG ldquoAn Efficient Frequent Pattern Mining Algorithmrdquo Sixth International Conference on Fuzzy Systems and Knowledge Discovery 2009
3 Wei Zhang Hongzhi Liao Na Zhao ldquoResearch on the FP Growth Algorithm about Association Rule Miningrdquo International Seminar on Business and Information Management 2008
4 SP Latha DR NRamaraj ldquoAlgorithm for Efficient Data Miningrdquo In Proc
Intrsquo Conf on IEEE International Computational Intelligence and Multimedia Applications 2007
References (contd)
5 Dongme Sun Shaohua Teng Wei Zhang Haibin Zhu ldquoAn Algorithm to Improve the Effectiveness of Apriorirdquo In Proc Intrsquol Conf on 6th IEEE International Conf on Cognitive Informatics (ICCI07) 2007
6 Daniel Hunyadi ldquoPerformance comparison of Apriori and FP-Growth algorithms in generating association rulesrdquo Proceedings of the European Computing Conference 2006
7 By Jiawei Han Micheline Kamber ldquoData mining Concepts and Techniquesrdquo Morgan Kaufmann Publishers 2006
8 Tan P-N Steinbach M and Kumar V ldquoIntroduction to data miningrdquo Addison Wesley Publishers 2006
References (contd)
9 HanJ PeiJ and Yin Y ldquoMining frequent patterns without candidate generationrdquo In Proc ACM-SIGMOD International Conf Management of Data (SIGMOD) 2000
10 R Agrawal Imielinskit SwamiA ldquoMining Association Rules between Sets of Items in Large Databasesrdquo In Proc International Conf of the ACM SIGMOD Conference Washington DC USA 1993
FP-Growth Algorithm (contd) Step 1 FP-Tree ConstructionPass 2Nodes correspond to items and have a counter1 FP-Growth reads 1 transaction at a time and maps it to a path
2 Fixed order is used so paths can overlap when transactions share items (when they have the same prefix ) In this case counters are incremented
3 Pointers are maintained between nodes containing the same item creating singly linked lists (dotted lines) The more paths that overlap the higher the compression FP-tree
may fit in memory
4 Frequent itemsets extracted from the FP-Tree
FP-Growth Algorithm (contd) Step 1 FP-Tree Construction (contd)
FP-Growth Algorithm (contd)Complete FP-Tree for Sample Transactions
FP-Growth Algorithm (contd) Step 2 Frequent Itemset Generation FP-Growth extracts frequent itemsets from the FP-tree
Bottom-up algorithm - from the leaves towards the root
Divide and conquer first look for frequent itemsets ending in e then de etc then d then cd etc
First extract prefix path sub-trees ending in an item(set) (using the linked lists)
FP-Growth Algorithm (contd) Prefix path sub-trees (Example)
FP-Growth Algorithm (contd) Example
Let minSup = 2 and extract all frequent itemsets containing E Obtain the prefix path sub-tree for E
Check if E is a frequent item by adding the counts along the linked list (dotted line) If so extract it
Yes count =3 so E is extracted as a frequent itemset
As E is frequent find frequent itemsets ending in e ie DE CE BE and AE
E nodes can now be removed
FP-Growth Algorithm (contd) Conditional FP-Tree
The FP-Tree that would be built if we only consider transactions containing a particular itemset (and then removing that itemset from all transactions)
I Example FP-Tree conditional on e
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Obtain T(DE) from T(E) 4 Use the conditional FP-tree for e to find frequent itemsets ending in DE CE
and AE Note that BE is not considered as B is not in the conditional FP-tree for E
bull Support count of DE = 2 (sum of counts of all Drsquos)bull DE is frequent need to solve CDE BDE ADE if they exist
FP-Growth Algorithm (contd) Current Position of Processing
FP-Growth Algorithm (contd)Solving CDE BDE ADEbull Sub-trees for both CDE and BDE are emptybull no prefix paths ending with C or Bbull Working on ADE
ADE (support count = 2) is frequentsolving next sub problem CE
FP-Growth Algorithm (contd)Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix CE
CE is frequent (support count = 2)bull Work on next sub problems BE (no support) AE
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix AE
AE is frequent (support count = 2)Done with AEWork on next sub problem suffix D
FP-Growth Algorithm (contd) Found Frequent Itemsets with Suffix Ebull E DE ADE CE AE discovered in this order
FP-Growth Algorithm (contd) Example (contd)
Frequent itemsets found (ordered by suffix and order in which the are found)
Comparative Result
Conclusion
It is found that
bull FP-tree a novel data structure storing compressed crucial information about frequent patterns compact yet complete for frequent pattern mining
bull FP-growth an efficient mining method of frequent patterns in large Database using a highly compact FP-tree divide-and-conquer method in nature
bull Both Apriori and FP-Growth are aiming to find out complete set of patterns but FP-Growth is more efficient than Apriori in respect to long patterns
References
1 Liwu ZOU Guangwei REN ldquoThe data mining algorithm analysis for personalized servicerdquo Fourth International Conference on Multimedia Information Networking and Security 2012
2 Jun TAN Yingyong BU and Bo YANG ldquoAn Efficient Frequent Pattern Mining Algorithmrdquo Sixth International Conference on Fuzzy Systems and Knowledge Discovery 2009
3 Wei Zhang Hongzhi Liao Na Zhao ldquoResearch on the FP Growth Algorithm about Association Rule Miningrdquo International Seminar on Business and Information Management 2008
4 SP Latha DR NRamaraj ldquoAlgorithm for Efficient Data Miningrdquo In Proc
Intrsquo Conf on IEEE International Computational Intelligence and Multimedia Applications 2007
References (contd)
5 Dongme Sun Shaohua Teng Wei Zhang Haibin Zhu ldquoAn Algorithm to Improve the Effectiveness of Apriorirdquo In Proc Intrsquol Conf on 6th IEEE International Conf on Cognitive Informatics (ICCI07) 2007
6 Daniel Hunyadi ldquoPerformance comparison of Apriori and FP-Growth algorithms in generating association rulesrdquo Proceedings of the European Computing Conference 2006
7 By Jiawei Han Micheline Kamber ldquoData mining Concepts and Techniquesrdquo Morgan Kaufmann Publishers 2006
8 Tan P-N Steinbach M and Kumar V ldquoIntroduction to data miningrdquo Addison Wesley Publishers 2006
References (contd)
9 HanJ PeiJ and Yin Y ldquoMining frequent patterns without candidate generationrdquo In Proc ACM-SIGMOD International Conf Management of Data (SIGMOD) 2000
10 R Agrawal Imielinskit SwamiA ldquoMining Association Rules between Sets of Items in Large Databasesrdquo In Proc International Conf of the ACM SIGMOD Conference Washington DC USA 1993
FP-Growth Algorithm (contd) Step 1 FP-Tree Construction (contd)
FP-Growth Algorithm (contd)Complete FP-Tree for Sample Transactions
FP-Growth Algorithm (contd) Step 2 Frequent Itemset Generation FP-Growth extracts frequent itemsets from the FP-tree
Bottom-up algorithm - from the leaves towards the root
Divide and conquer first look for frequent itemsets ending in e then de etc then d then cd etc
First extract prefix path sub-trees ending in an item(set) (using the linked lists)
FP-Growth Algorithm (contd) Prefix path sub-trees (Example)
FP-Growth Algorithm (contd) Example
Let minSup = 2 and extract all frequent itemsets containing E Obtain the prefix path sub-tree for E
Check if E is a frequent item by adding the counts along the linked list (dotted line) If so extract it
Yes count =3 so E is extracted as a frequent itemset
As E is frequent find frequent itemsets ending in e ie DE CE BE and AE
E nodes can now be removed
FP-Growth Algorithm (cont'd): Complete FP-Tree for Sample Transactions
FP-Growth Algorithm (cont'd): Step 2 – Frequent Itemset Generation
FP-Growth extracts frequent itemsets directly from the FP-tree.
• Bottom-up algorithm: works from the leaves towards the root.
• Divide and conquer: first look for frequent itemsets ending in E, then DE, etc.; then D, then CD, etc.
• First, extract the prefix-path sub-trees ending in an item(set), using the linked lists.
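As a concrete illustration of the tree and the header-table "linked lists" described above, here is a minimal Python sketch of two-pass FP-tree construction. The names `FPNode` and `build_fp_tree` are illustrative, and the ten-transaction dataset is an assumption, chosen only to be consistent with the support counts quoted on these slides:

```python
from collections import defaultdict

class FPNode:
    """One FP-tree node: item label, count, parent link, children by item."""
    def __init__(self, item=None, parent=None):
        self.item, self.parent = item, parent
        self.count, self.children = 0, {}

def build_fp_tree(transactions, min_sup):
    """Two-pass construction. Returns (root, header), where header maps
    each frequent item to the list of tree nodes labelled with it --
    the same role as the linked lists drawn on the slides."""
    freq = defaultdict(int)
    for t in transactions:                 # pass 1: global item counts
        for item in t:
            freq[item] += 1
    freq = {i: c for i, c in freq.items() if c >= min_sup}
    root, header = FPNode(), defaultdict(list)
    for t in transactions:                 # pass 2: insert sorted transactions
        node = root
        for item in sorted((i for i in t if i in freq),
                           key=lambda i: (-freq[i], i)):
            if item not in node.children:
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += 1
    return root, header

# Hypothetical ten-transaction sample, consistent with the slides' counts:
transactions = [list(t) for t in
                ("ab", "bcd", "acde", "ade", "abc", "abcd", "a", "abc", "abd", "bce")]
root, header = build_fp_tree(transactions, min_sup=2)
```

Summing the counts along `header[item]` is exactly the linked-list traversal the slides use to check that E is frequent (count = 3).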
FP-Growth Algorithm (cont'd): Prefix-Path Sub-Trees (Example)
FP-Growth Algorithm (cont'd): Example
Let minSup = 2 and extract all frequent itemsets containing E.
• Obtain the prefix-path sub-tree for E.
• Check whether E is a frequent item by adding the counts along the linked list (dotted line); if so, extract it.
• Yes, count = 3, so E is extracted as a frequent itemset.
• As E is frequent, find frequent itemsets ending in E, i.e. DE, CE, BE and AE.
• The E nodes can now be removed.
FP-Growth Algorithm (cont'd): Conditional FP-Tree
The FP-tree that would be built if we considered only the transactions containing a particular itemset (and then removed that itemset from every transaction).
Example: the FP-tree conditional on E.
FP-Growth Algorithm (cont'd): Current Position in Processing
FP-Growth Algorithm (cont'd): Obtain T(DE) from T(E)
Use the conditional FP-tree for E to find frequent itemsets ending in DE, CE and AE. Note that BE is not considered, as B is not in the conditional FP-tree for E.
• Support count of DE = 2 (the sum of the counts of all D nodes).
• DE is frequent; next, solve CDE, BDE and ADE, if they exist.
FP-Growth Algorithm (cont'd): Current Position in Processing
FP-Growth Algorithm (cont'd): Solving CDE, BDE, ADE
• The sub-trees for both CDE and BDE are empty: there are no prefix paths ending with C or B.
• Working on ADE: ADE (support count = 2) is frequent.
• Next sub-problem: CE.
FP-Growth Algorithm (cont'd): Current Position in Processing
FP-Growth Algorithm (cont'd): Solving for Suffix CE
• CE is frequent (support count = 2).
• Work on the next sub-problems: BE (no support), then AE.
FP-Growth Algorithm (cont'd): Current Position in Processing
FP-Growth Algorithm (cont'd): Solving for Suffix AE
• AE is frequent (support count = 2); done with AE.
• Work on the next sub-problem: suffix D.
FP-Growth Algorithm (cont'd): Found Frequent Itemsets with Suffix E
• E, DE, ADE, CE, AE, discovered in this order.
FP-Growth Algorithm (cont'd): Example (cont'd)
Frequent itemsets found (ordered by suffix and by the order in which they are found).
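The divide-and-conquer recursion walked through above can be sketched end to end. This is an illustrative sketch, not the slides' code: all names are assumptions, the sample transactions are a hypothetical dataset consistent with the counts above, and the conditional FP-tree for each suffix is obtained by rebuilding a tree from that suffix's conditional pattern base (prefix paths weighted by node counts):

```python
from collections import defaultdict

class FPNode:
    """One FP-tree node: item label, count, parent link, children by item."""
    def __init__(self, item=None, parent=None):
        self.item, self.parent = item, parent
        self.count, self.children = 0, {}

def build_tree(weighted_itemsets, min_sup):
    """Build an FP-tree from (itemset, count) pairs; returns (root, header),
    where header maps each frequent item to its list of tree nodes."""
    freq = defaultdict(int)
    for items, count in weighted_itemsets:
        for item in items:
            freq[item] += count
    freq = {i: c for i, c in freq.items() if c >= min_sup}
    root, header = FPNode(), defaultdict(list)
    for items, count in weighted_itemsets:
        node = root
        for item in sorted((i for i in items if i in freq),
                           key=lambda i: (-freq[i], i)):
            if item not in node.children:
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += count
    return root, header

def fp_growth(weighted_itemsets, min_sup, suffix=()):
    """Divide and conquer: for every frequent item in the current tree,
    emit suffix + item, then recurse on its conditional pattern base."""
    _, header = build_tree(weighted_itemsets, min_sup)
    results = {}
    for item, nodes in header.items():
        itemset = tuple(sorted(suffix + (item,)))
        results[itemset] = sum(n.count for n in nodes)   # support count
        base = []                                        # conditional pattern base
        for n in nodes:
            path, p = [], n.parent
            while p.item is not None:                    # walk up to the root
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path, n.count))
        results.update(fp_growth(base, min_sup, itemset))
    return results

# Hypothetical ten-transaction sample, consistent with the slides' counts:
transactions = [(list(t), 1) for t in
                ("ab", "bcd", "acde", "ade", "abc", "abcd", "a", "abc", "abd", "bce")]
patterns = fp_growth(transactions, min_sup=2)
```

With minSup = 2 this reproduces the walk-through: `patterns[('e',)]` is 3, and DE, ADE, CE and AE each get support 2, while BE and CDE are pruned because B and C do not survive into the corresponding conditional trees.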
Comparative Result
Conclusion
It is found that:
• FP-tree: a novel data structure storing compressed, crucial information about frequent patterns; compact yet complete for frequent-pattern mining.
• FP-growth: an efficient method for mining frequent patterns in large databases; it operates on a highly compact FP-tree and is divide-and-conquer in nature.
• Both Apriori and FP-Growth aim to find the complete set of patterns, but FP-Growth is more efficient than Apriori with respect to long patterns.
References
1. Liwu Zou, Guangwei Ren, "The data mining algorithm analysis for personalized service", Fourth International Conference on Multimedia Information Networking and Security, 2012.
2. Jun Tan, Yingyong Bu and Bo Yang, "An Efficient Frequent Pattern Mining Algorithm", Sixth International Conference on Fuzzy Systems and Knowledge Discovery, 2009.
3. Wei Zhang, Hongzhi Liao, Na Zhao, "Research on the FP Growth Algorithm about Association Rule Mining", International Seminar on Business and Information Management, 2008.
4. S.P. Latha, Dr. N. Ramaraj, "Algorithm for Efficient Data Mining", in Proc. IEEE International Conference on Computational Intelligence and Multimedia Applications, 2007.
References (cont'd)
5. Dongme Sun, Shaohua Teng, Wei Zhang, Haibin Zhu, "An Algorithm to Improve the Effectiveness of Apriori", in Proc. 6th IEEE International Conference on Cognitive Informatics (ICCI'07), 2007.
6. Daniel Hunyadi, "Performance comparison of Apriori and FP-Growth algorithms in generating association rules", Proceedings of the European Computing Conference, 2006.
7. Jiawei Han, Micheline Kamber, "Data Mining: Concepts and Techniques", Morgan Kaufmann Publishers, 2006.
8. Tan, P.-N., Steinbach, M. and Kumar, V., "Introduction to Data Mining", Addison Wesley Publishers, 2006.
References (cont'd)
9. Han, J., Pei, J. and Yin, Y., "Mining frequent patterns without candidate generation", in Proc. ACM-SIGMOD International Conference on Management of Data (SIGMOD), 2000.
10. R. Agrawal, T. Imielinski, A. Swami, "Mining Association Rules between Sets of Items in Large Databases", in Proc. ACM SIGMOD Conference, Washington, DC, USA, 1993.
FP-Growth Algorithm (contd) Step 2 Frequent Itemset Generation FP-Growth extracts frequent itemsets from the FP-tree
Bottom-up algorithm - from the leaves towards the root
Divide and conquer first look for frequent itemsets ending in e then de etc then d then cd etc
First extract prefix path sub-trees ending in an item(set) (using the linked lists)
FP-Growth Algorithm (contd) Prefix path sub-trees (Example)
FP-Growth Algorithm (contd) Example
Let minSup = 2 and extract all frequent itemsets containing E Obtain the prefix path sub-tree for E
Check if E is a frequent item by adding the counts along the linked list (dotted line) If so extract it
Yes count =3 so E is extracted as a frequent itemset
As E is frequent find frequent itemsets ending in e ie DE CE BE and AE
E nodes can now be removed
FP-Growth Algorithm (contd) Conditional FP-Tree
The FP-Tree that would be built if we only consider transactions containing a particular itemset (and then removing that itemset from all transactions)
I Example FP-Tree conditional on e
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Obtain T(DE) from T(E) 4 Use the conditional FP-tree for e to find frequent itemsets ending in DE CE
and AE Note that BE is not considered as B is not in the conditional FP-tree for E
bull Support count of DE = 2 (sum of counts of all Drsquos)bull DE is frequent need to solve CDE BDE ADE if they exist
FP-Growth Algorithm (contd) Current Position of Processing
FP-Growth Algorithm (contd)Solving CDE BDE ADEbull Sub-trees for both CDE and BDE are emptybull no prefix paths ending with C or Bbull Working on ADE
ADE (support count = 2) is frequentsolving next sub problem CE
FP-Growth Algorithm (contd)Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix CE
CE is frequent (support count = 2)bull Work on next sub problems BE (no support) AE
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix AE
AE is frequent (support count = 2)Done with AEWork on next sub problem suffix D
FP-Growth Algorithm (contd) Found Frequent Itemsets with Suffix Ebull E DE ADE CE AE discovered in this order
FP-Growth Algorithm (contd) Example (contd)
Frequent itemsets found (ordered by suffix and order in which the are found)
Comparative Result
Conclusion
It is found that
bull FP-tree a novel data structure storing compressed crucial information about frequent patterns compact yet complete for frequent pattern mining
bull FP-growth an efficient mining method of frequent patterns in large Database using a highly compact FP-tree divide-and-conquer method in nature
bull Both Apriori and FP-Growth are aiming to find out complete set of patterns but FP-Growth is more efficient than Apriori in respect to long patterns
References
1 Liwu ZOU Guangwei REN ldquoThe data mining algorithm analysis for personalized servicerdquo Fourth International Conference on Multimedia Information Networking and Security 2012
2 Jun TAN Yingyong BU and Bo YANG ldquoAn Efficient Frequent Pattern Mining Algorithmrdquo Sixth International Conference on Fuzzy Systems and Knowledge Discovery 2009
3 Wei Zhang Hongzhi Liao Na Zhao ldquoResearch on the FP Growth Algorithm about Association Rule Miningrdquo International Seminar on Business and Information Management 2008
4 SP Latha DR NRamaraj ldquoAlgorithm for Efficient Data Miningrdquo In Proc
Intrsquo Conf on IEEE International Computational Intelligence and Multimedia Applications 2007
References (contd)
5 Dongme Sun Shaohua Teng Wei Zhang Haibin Zhu ldquoAn Algorithm to Improve the Effectiveness of Apriorirdquo In Proc Intrsquol Conf on 6th IEEE International Conf on Cognitive Informatics (ICCI07) 2007
6 Daniel Hunyadi ldquoPerformance comparison of Apriori and FP-Growth algorithms in generating association rulesrdquo Proceedings of the European Computing Conference 2006
7 By Jiawei Han Micheline Kamber ldquoData mining Concepts and Techniquesrdquo Morgan Kaufmann Publishers 2006
8 Tan P-N Steinbach M and Kumar V ldquoIntroduction to data miningrdquo Addison Wesley Publishers 2006
References (contd)
9 HanJ PeiJ and Yin Y ldquoMining frequent patterns without candidate generationrdquo In Proc ACM-SIGMOD International Conf Management of Data (SIGMOD) 2000
10 R Agrawal Imielinskit SwamiA ldquoMining Association Rules between Sets of Items in Large Databasesrdquo In Proc International Conf of the ACM SIGMOD Conference Washington DC USA 1993
FP-Growth Algorithm (contd) Prefix path sub-trees (Example)
FP-Growth Algorithm (contd) Example
Let minSup = 2 and extract all frequent itemsets containing E Obtain the prefix path sub-tree for E
Check if E is a frequent item by adding the counts along the linked list (dotted line) If so extract it
Yes count =3 so E is extracted as a frequent itemset
As E is frequent find frequent itemsets ending in e ie DE CE BE and AE
E nodes can now be removed
FP-Growth Algorithm (contd) Conditional FP-Tree
The FP-Tree that would be built if we only consider transactions containing a particular itemset (and then removing that itemset from all transactions)
I Example FP-Tree conditional on e
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Obtain T(DE) from T(E) 4 Use the conditional FP-tree for e to find frequent itemsets ending in DE CE
and AE Note that BE is not considered as B is not in the conditional FP-tree for E
bull Support count of DE = 2 (sum of counts of all Drsquos)bull DE is frequent need to solve CDE BDE ADE if they exist
FP-Growth Algorithm (contd) Current Position of Processing
FP-Growth Algorithm (contd)Solving CDE BDE ADEbull Sub-trees for both CDE and BDE are emptybull no prefix paths ending with C or Bbull Working on ADE
ADE (support count = 2) is frequentsolving next sub problem CE
FP-Growth Algorithm (contd)Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix CE
CE is frequent (support count = 2)bull Work on next sub problems BE (no support) AE
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix AE
AE is frequent (support count = 2)Done with AEWork on next sub problem suffix D
FP-Growth Algorithm (contd) Found Frequent Itemsets with Suffix Ebull E DE ADE CE AE discovered in this order
FP-Growth Algorithm (contd) Example (contd)
Frequent itemsets found (ordered by suffix and order in which the are found)
Comparative Result
Conclusion
It is found that
bull FP-tree a novel data structure storing compressed crucial information about frequent patterns compact yet complete for frequent pattern mining
bull FP-growth an efficient mining method of frequent patterns in large Database using a highly compact FP-tree divide-and-conquer method in nature
bull Both Apriori and FP-Growth are aiming to find out complete set of patterns but FP-Growth is more efficient than Apriori in respect to long patterns
References
1 Liwu ZOU Guangwei REN ldquoThe data mining algorithm analysis for personalized servicerdquo Fourth International Conference on Multimedia Information Networking and Security 2012
2 Jun TAN Yingyong BU and Bo YANG ldquoAn Efficient Frequent Pattern Mining Algorithmrdquo Sixth International Conference on Fuzzy Systems and Knowledge Discovery 2009
3 Wei Zhang Hongzhi Liao Na Zhao ldquoResearch on the FP Growth Algorithm about Association Rule Miningrdquo International Seminar on Business and Information Management 2008
4 SP Latha DR NRamaraj ldquoAlgorithm for Efficient Data Miningrdquo In Proc
Intrsquo Conf on IEEE International Computational Intelligence and Multimedia Applications 2007
References (contd)
5 Dongme Sun Shaohua Teng Wei Zhang Haibin Zhu ldquoAn Algorithm to Improve the Effectiveness of Apriorirdquo In Proc Intrsquol Conf on 6th IEEE International Conf on Cognitive Informatics (ICCI07) 2007
6 Daniel Hunyadi ldquoPerformance comparison of Apriori and FP-Growth algorithms in generating association rulesrdquo Proceedings of the European Computing Conference 2006
7 By Jiawei Han Micheline Kamber ldquoData mining Concepts and Techniquesrdquo Morgan Kaufmann Publishers 2006
8 Tan P-N Steinbach M and Kumar V ldquoIntroduction to data miningrdquo Addison Wesley Publishers 2006
References (contd)
9 HanJ PeiJ and Yin Y ldquoMining frequent patterns without candidate generationrdquo In Proc ACM-SIGMOD International Conf Management of Data (SIGMOD) 2000
10 R Agrawal Imielinskit SwamiA ldquoMining Association Rules between Sets of Items in Large Databasesrdquo In Proc International Conf of the ACM SIGMOD Conference Washington DC USA 1993
FP-Growth Algorithm (contd) Example
Let minSup = 2 and extract all frequent itemsets containing E Obtain the prefix path sub-tree for E
Check if E is a frequent item by adding the counts along the linked list (dotted line) If so extract it
Yes count =3 so E is extracted as a frequent itemset
As E is frequent find frequent itemsets ending in e ie DE CE BE and AE
E nodes can now be removed
FP-Growth Algorithm (contd) Conditional FP-Tree
The FP-Tree that would be built if we only consider transactions containing a particular itemset (and then removing that itemset from all transactions)
I Example FP-Tree conditional on e
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Obtain T(DE) from T(E) 4 Use the conditional FP-tree for e to find frequent itemsets ending in DE CE
and AE Note that BE is not considered as B is not in the conditional FP-tree for E
bull Support count of DE = 2 (sum of counts of all Drsquos)bull DE is frequent need to solve CDE BDE ADE if they exist
FP-Growth Algorithm (contd) Current Position of Processing
FP-Growth Algorithm (contd)Solving CDE BDE ADEbull Sub-trees for both CDE and BDE are emptybull no prefix paths ending with C or Bbull Working on ADE
ADE (support count = 2) is frequentsolving next sub problem CE
FP-Growth Algorithm (contd)Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix CE
CE is frequent (support count = 2)bull Work on next sub problems BE (no support) AE
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix AE
AE is frequent (support count = 2)Done with AEWork on next sub problem suffix D
FP-Growth Algorithm (contd) Found Frequent Itemsets with Suffix Ebull E DE ADE CE AE discovered in this order
FP-Growth Algorithm (contd) Example (contd)
Frequent itemsets found (ordered by suffix and order in which the are found)
Comparative Result
Conclusion
It is found that
bull FP-tree a novel data structure storing compressed crucial information about frequent patterns compact yet complete for frequent pattern mining
bull FP-growth an efficient mining method of frequent patterns in large Database using a highly compact FP-tree divide-and-conquer method in nature
bull Both Apriori and FP-Growth are aiming to find out complete set of patterns but FP-Growth is more efficient than Apriori in respect to long patterns
References
1 Liwu ZOU Guangwei REN ldquoThe data mining algorithm analysis for personalized servicerdquo Fourth International Conference on Multimedia Information Networking and Security 2012
2 Jun TAN Yingyong BU and Bo YANG ldquoAn Efficient Frequent Pattern Mining Algorithmrdquo Sixth International Conference on Fuzzy Systems and Knowledge Discovery 2009
3 Wei Zhang Hongzhi Liao Na Zhao ldquoResearch on the FP Growth Algorithm about Association Rule Miningrdquo International Seminar on Business and Information Management 2008
4 SP Latha DR NRamaraj ldquoAlgorithm for Efficient Data Miningrdquo In Proc
Intrsquo Conf on IEEE International Computational Intelligence and Multimedia Applications 2007
References (contd)
5 Dongme Sun Shaohua Teng Wei Zhang Haibin Zhu ldquoAn Algorithm to Improve the Effectiveness of Apriorirdquo In Proc Intrsquol Conf on 6th IEEE International Conf on Cognitive Informatics (ICCI07) 2007
6 Daniel Hunyadi ldquoPerformance comparison of Apriori and FP-Growth algorithms in generating association rulesrdquo Proceedings of the European Computing Conference 2006
7 By Jiawei Han Micheline Kamber ldquoData mining Concepts and Techniquesrdquo Morgan Kaufmann Publishers 2006
8 Tan P-N Steinbach M and Kumar V ldquoIntroduction to data miningrdquo Addison Wesley Publishers 2006
References (contd)
9 HanJ PeiJ and Yin Y ldquoMining frequent patterns without candidate generationrdquo In Proc ACM-SIGMOD International Conf Management of Data (SIGMOD) 2000
10 R Agrawal Imielinskit SwamiA ldquoMining Association Rules between Sets of Items in Large Databasesrdquo In Proc International Conf of the ACM SIGMOD Conference Washington DC USA 1993
FP-Growth Algorithm (contd) Conditional FP-Tree
The FP-Tree that would be built if we only consider transactions containing a particular itemset (and then removing that itemset from all transactions)
I Example FP-Tree conditional on e
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Obtain T(DE) from T(E) 4 Use the conditional FP-tree for e to find frequent itemsets ending in DE CE
and AE Note that BE is not considered as B is not in the conditional FP-tree for E
bull Support count of DE = 2 (sum of counts of all Drsquos)bull DE is frequent need to solve CDE BDE ADE if they exist
FP-Growth Algorithm (contd) Current Position of Processing
FP-Growth Algorithm (contd)Solving CDE BDE ADEbull Sub-trees for both CDE and BDE are emptybull no prefix paths ending with C or Bbull Working on ADE
ADE (support count = 2) is frequentsolving next sub problem CE
FP-Growth Algorithm (contd)Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix CE
CE is frequent (support count = 2)bull Work on next sub problems BE (no support) AE
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix AE
AE is frequent (support count = 2)Done with AEWork on next sub problem suffix D
FP-Growth Algorithm (contd) Found Frequent Itemsets with Suffix Ebull E DE ADE CE AE discovered in this order
FP-Growth Algorithm (contd) Example (contd)
Frequent itemsets found (ordered by suffix and order in which the are found)
Comparative Result
Conclusion
It is found that
bull FP-tree a novel data structure storing compressed crucial information about frequent patterns compact yet complete for frequent pattern mining
bull FP-growth an efficient mining method of frequent patterns in large Database using a highly compact FP-tree divide-and-conquer method in nature
bull Both Apriori and FP-Growth are aiming to find out complete set of patterns but FP-Growth is more efficient than Apriori in respect to long patterns
References
1 Liwu ZOU Guangwei REN ldquoThe data mining algorithm analysis for personalized servicerdquo Fourth International Conference on Multimedia Information Networking and Security 2012
2 Jun TAN Yingyong BU and Bo YANG ldquoAn Efficient Frequent Pattern Mining Algorithmrdquo Sixth International Conference on Fuzzy Systems and Knowledge Discovery 2009
3 Wei Zhang Hongzhi Liao Na Zhao ldquoResearch on the FP Growth Algorithm about Association Rule Miningrdquo International Seminar on Business and Information Management 2008
4 SP Latha DR NRamaraj ldquoAlgorithm for Efficient Data Miningrdquo In Proc
Intrsquo Conf on IEEE International Computational Intelligence and Multimedia Applications 2007
References (contd)
5 Dongme Sun Shaohua Teng Wei Zhang Haibin Zhu ldquoAn Algorithm to Improve the Effectiveness of Apriorirdquo In Proc Intrsquol Conf on 6th IEEE International Conf on Cognitive Informatics (ICCI07) 2007
6 Daniel Hunyadi ldquoPerformance comparison of Apriori and FP-Growth algorithms in generating association rulesrdquo Proceedings of the European Computing Conference 2006
7 By Jiawei Han Micheline Kamber ldquoData mining Concepts and Techniquesrdquo Morgan Kaufmann Publishers 2006
8 Tan P-N Steinbach M and Kumar V ldquoIntroduction to data miningrdquo Addison Wesley Publishers 2006
References (contd)
9 HanJ PeiJ and Yin Y ldquoMining frequent patterns without candidate generationrdquo In Proc ACM-SIGMOD International Conf Management of Data (SIGMOD) 2000
10 R Agrawal Imielinskit SwamiA ldquoMining Association Rules between Sets of Items in Large Databasesrdquo In Proc International Conf of the ACM SIGMOD Conference Washington DC USA 1993
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Obtain T(DE) from T(E) 4 Use the conditional FP-tree for e to find frequent itemsets ending in DE CE
and AE Note that BE is not considered as B is not in the conditional FP-tree for E
bull Support count of DE = 2 (sum of counts of all Drsquos)bull DE is frequent need to solve CDE BDE ADE if they exist
FP-Growth Algorithm (contd) Current Position of Processing
FP-Growth Algorithm (contd)Solving CDE BDE ADEbull Sub-trees for both CDE and BDE are emptybull no prefix paths ending with C or Bbull Working on ADE
ADE (support count = 2) is frequentsolving next sub problem CE
FP-Growth Algorithm (contd)Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix CE
CE is frequent (support count = 2)bull Work on next sub problems BE (no support) AE
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix AE
AE is frequent (support count = 2)Done with AEWork on next sub problem suffix D
FP-Growth Algorithm (contd) Found Frequent Itemsets with Suffix Ebull E DE ADE CE AE discovered in this order
FP-Growth Algorithm (contd) Example (contd)
Frequent itemsets found (ordered by suffix and order in which the are found)
Comparative Result
Conclusion
It is found that
bull FP-tree a novel data structure storing compressed crucial information about frequent patterns compact yet complete for frequent pattern mining
bull FP-growth an efficient mining method of frequent patterns in large Database using a highly compact FP-tree divide-and-conquer method in nature
bull Both Apriori and FP-Growth are aiming to find out complete set of patterns but FP-Growth is more efficient than Apriori in respect to long patterns
References
1 Liwu ZOU Guangwei REN ldquoThe data mining algorithm analysis for personalized servicerdquo Fourth International Conference on Multimedia Information Networking and Security 2012
2 Jun TAN Yingyong BU and Bo YANG ldquoAn Efficient Frequent Pattern Mining Algorithmrdquo Sixth International Conference on Fuzzy Systems and Knowledge Discovery 2009
3 Wei Zhang Hongzhi Liao Na Zhao ldquoResearch on the FP Growth Algorithm about Association Rule Miningrdquo International Seminar on Business and Information Management 2008
4 SP Latha DR NRamaraj ldquoAlgorithm for Efficient Data Miningrdquo In Proc
Intrsquo Conf on IEEE International Computational Intelligence and Multimedia Applications 2007
References (contd)
5 Dongme Sun Shaohua Teng Wei Zhang Haibin Zhu ldquoAn Algorithm to Improve the Effectiveness of Apriorirdquo In Proc Intrsquol Conf on 6th IEEE International Conf on Cognitive Informatics (ICCI07) 2007
6 Daniel Hunyadi ldquoPerformance comparison of Apriori and FP-Growth algorithms in generating association rulesrdquo Proceedings of the European Computing Conference 2006
7 By Jiawei Han Micheline Kamber ldquoData mining Concepts and Techniquesrdquo Morgan Kaufmann Publishers 2006
8 Tan P-N Steinbach M and Kumar V ldquoIntroduction to data miningrdquo Addison Wesley Publishers 2006
References (contd)
9 HanJ PeiJ and Yin Y ldquoMining frequent patterns without candidate generationrdquo In Proc ACM-SIGMOD International Conf Management of Data (SIGMOD) 2000
10 R Agrawal Imielinskit SwamiA ldquoMining Association Rules between Sets of Items in Large Databasesrdquo In Proc International Conf of the ACM SIGMOD Conference Washington DC USA 1993
FP-Growth Algorithm (contd) Obtain T(DE) from T(E) 4 Use the conditional FP-tree for e to find frequent itemsets ending in DE CE
and AE Note that BE is not considered as B is not in the conditional FP-tree for E
bull Support count of DE = 2 (sum of counts of all Drsquos)bull DE is frequent need to solve CDE BDE ADE if they exist
FP-Growth Algorithm (contd) Current Position of Processing
FP-Growth Algorithm (contd)Solving CDE BDE ADEbull Sub-trees for both CDE and BDE are emptybull no prefix paths ending with C or Bbull Working on ADE
ADE (support count = 2) is frequentsolving next sub problem CE
FP-Growth Algorithm (contd)Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix CE
CE is frequent (support count = 2)bull Work on next sub problems BE (no support) AE
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix AE
AE is frequent (support count = 2)Done with AEWork on next sub problem suffix D
FP-Growth Algorithm (contd) Found Frequent Itemsets with Suffix Ebull E DE ADE CE AE discovered in this order
FP-Growth Algorithm (contd) Example (contd)
Frequent itemsets found (ordered by suffix and order in which the are found)
Comparative Result
Conclusion
It is found that
bull FP-tree a novel data structure storing compressed crucial information about frequent patterns compact yet complete for frequent pattern mining
bull FP-growth an efficient mining method of frequent patterns in large Database using a highly compact FP-tree divide-and-conquer method in nature
bull Both Apriori and FP-Growth are aiming to find out complete set of patterns but FP-Growth is more efficient than Apriori in respect to long patterns
References
1 Liwu ZOU Guangwei REN ldquoThe data mining algorithm analysis for personalized servicerdquo Fourth International Conference on Multimedia Information Networking and Security 2012
2 Jun TAN Yingyong BU and Bo YANG ldquoAn Efficient Frequent Pattern Mining Algorithmrdquo Sixth International Conference on Fuzzy Systems and Knowledge Discovery 2009
3 Wei Zhang Hongzhi Liao Na Zhao ldquoResearch on the FP Growth Algorithm about Association Rule Miningrdquo International Seminar on Business and Information Management 2008
4 SP Latha DR NRamaraj ldquoAlgorithm for Efficient Data Miningrdquo In Proc
Intrsquo Conf on IEEE International Computational Intelligence and Multimedia Applications 2007
References (contd)
5 Dongme Sun Shaohua Teng Wei Zhang Haibin Zhu ldquoAn Algorithm to Improve the Effectiveness of Apriorirdquo In Proc Intrsquol Conf on 6th IEEE International Conf on Cognitive Informatics (ICCI07) 2007
6 Daniel Hunyadi ldquoPerformance comparison of Apriori and FP-Growth algorithms in generating association rulesrdquo Proceedings of the European Computing Conference 2006
7 By Jiawei Han Micheline Kamber ldquoData mining Concepts and Techniquesrdquo Morgan Kaufmann Publishers 2006
8 Tan P-N Steinbach M and Kumar V ldquoIntroduction to data miningrdquo Addison Wesley Publishers 2006
References (contd)
9 HanJ PeiJ and Yin Y ldquoMining frequent patterns without candidate generationrdquo In Proc ACM-SIGMOD International Conf Management of Data (SIGMOD) 2000
10 R Agrawal Imielinskit SwamiA ldquoMining Association Rules between Sets of Items in Large Databasesrdquo In Proc International Conf of the ACM SIGMOD Conference Washington DC USA 1993
FP-Growth Algorithm (contd) Current Position of Processing
FP-Growth Algorithm (contd)Solving CDE BDE ADEbull Sub-trees for both CDE and BDE are emptybull no prefix paths ending with C or Bbull Working on ADE
ADE (support count = 2) is frequentsolving next sub problem CE
FP-Growth Algorithm (contd)Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix CE
CE is frequent (support count = 2)bull Work on next sub problems BE (no support) AE
FP-Growth Algorithm (contd) Current Position in Processing
FP-Growth Algorithm (contd) Solving for Suffix AE
AE is frequent (support count = 2)Done with AEWork on next sub problem suffix D
FP-Growth Algorithm (contd) Found Frequent Itemsets with Suffix Ebull E DE ADE CE AE discovered in this order
FP-Growth Algorithm (contd) Example (contd)
Frequent itemsets found (ordered by suffix and order in which the are found)
Comparative Result
Conclusion
It is found that
bull FP-tree a novel data structure storing compressed crucial information about frequent patterns compact yet complete for frequent pattern mining
bull FP-growth an efficient mining method of frequent patterns in large Database using a highly compact FP-tree divide-and-conquer method in nature
bull Both Apriori and FP-Growth are aiming to find out complete set of patterns but FP-Growth is more efficient than Apriori in respect to long patterns
References
1 Liwu ZOU Guangwei REN ldquoThe data mining algorithm analysis for personalized servicerdquo Fourth International Conference on Multimedia Information Networking and Security 2012
2 Jun TAN Yingyong BU and Bo YANG ldquoAn Efficient Frequent Pattern Mining Algorithmrdquo Sixth International Conference on Fuzzy Systems and Knowledge Discovery 2009
3 Wei Zhang Hongzhi Liao Na Zhao ldquoResearch on the FP Growth Algorithm about Association Rule Miningrdquo International Seminar on Business and Information Management 2008
4 SP Latha DR NRamaraj ldquoAlgorithm for Efficient Data Miningrdquo In Proc
Intrsquo Conf on IEEE International Computational Intelligence and Multimedia Applications 2007
References (contd)
5. Dongme Sun, Shaohua Teng, Wei Zhang, Haibin Zhu, "An Algorithm to Improve the Effectiveness of Apriori," In Proc. 6th IEEE International Conference on Cognitive Informatics (ICCI'07), 2007.
6. Daniel Hunyadi, "Performance comparison of Apriori and FP-Growth algorithms in generating association rules," Proceedings of the European Computing Conference, 2006.
7. Jiawei Han, Micheline Kamber, "Data Mining: Concepts and Techniques," Morgan Kaufmann Publishers, 2006.
8. P.-N. Tan, M. Steinbach, and V. Kumar, "Introduction to Data Mining," Addison Wesley Publishers, 2006.
References (contd)
9. J. Han, J. Pei, and Y. Yin, "Mining frequent patterns without candidate generation," In Proc. ACM-SIGMOD International Conference on Management of Data (SIGMOD), 2000.
10. R. Agrawal, T. Imielinski, A. Swami, "Mining Association Rules between Sets of Items in Large Databases," In Proc. ACM SIGMOD Conference, Washington, DC, USA, 1993.