A Parameterised Algorithm for Mining Association Rules
Department of Information & Computer Education, NTNU
Nuansri Denwattana and Janusz R. Getta, Proceedings of the 12th Australasian Database Conference (ADC 2001), 29 Jan.-2 Feb. 2001, pp. 45-51.
Advisor: Jia-Ling Koh
Speaker: Chen-Yi Lin
Outline
– Introduction
– Problem Definition
– Finding Frequent Itemsets
– Experimental Results
– Conclusion
Introduction (1/2)
The majority of algorithms for finding frequent itemsets count only one category of itemsets at a time, e.g. the Apriori algorithm.
The quality of an association rule mining algorithm is determined by:
– the number of passes through an input dataset
– the number of candidate itemsets
Introduction (2/2)
One of the objectives is to construct an algorithm that makes a good guess.
– the parameterised (n, p) algorithm finds all frequent itemsets from a range of n levels in the itemset lattice in p passes (n >= p) through an input data set.
Problem Definition
Positive candidate itemset (C+) – assumed (guessed) to be frequent.
Negative candidate itemset (C−) – assumed (guessed) to be not frequent.
Remaining candidate itemset (CR) – a candidate verified in another scan.
Finding Frequent Itemsets (Guessing Candidate Itemsets)
TID Items
1 ABC
2 ABE
3 BCF
4 BDE
5 ACE
6 ABCD
7 ABCE
8 ABCEF
9 ABCEF
10 BCDEF
Statistics table T (item frequency broken down by transaction length):

Item | 3-element | 4-element | 5-element | Total freq.
A    | 3 | 2 | 2 | 7
B    | 4 | 2 | 3 | 9
C    | 3 | 2 | 3 | 8
D    | 1 | 1 | 1 | 3
E    | 3 | 1 | 3 | 7
F    | 1 | 0 | 3 | 4
No. of m-element trs. | 5 | 2 | 3 | 10
Initial DB scan: L1 = {A, B, C, D, E, F}
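The statistics-gathering step above can be done in a single initial pass over the database. A minimal Python sketch, assuming the transaction table from this slide (the function name `statistics_table` is mine, not the paper's):

```python
from collections import defaultdict

# Example database from the slides (TIDs 1-10).
DB = ["ABC", "ABE", "BCF", "BDE", "ACE", "ABCD", "ABCE", "ABCEF", "ABCEF", "BCDEF"]

def statistics_table(db):
    """One scan: count each item's frequency per transaction length."""
    freq = defaultdict(lambda: defaultdict(int))  # item -> {transaction length: count}
    n_trans = defaultdict(int)                    # transaction length -> no. of transactions
    for t in db:
        n_trans[len(t)] += 1
        for item in t:
            freq[item][len(t)] += 1
    return freq, n_trans

freq, n_trans = statistics_table(DB)
# e.g. B occurs in 4 of the 3-element, 2 of the 4-element and 3 of the
# 5-element transactions, 9 in total, matching the B row of statistics table T.
```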
L1 = {A, B, C, D, E, F}
Item frequency threshold = 80%
m-element transaction threshold = 5
Number of levels to traverse (n) = 3
Number of passes through an input data set (p) = 2
apriori_gen → C2 = {AB, AC, AD, AE, AF, BC, BD, BE, BF, CD, CE, CF, DE, DF, EF}
Guessing from statistics table T:
– 3-element transactions: 5 × 80% = 4 → {B}
– 4-element transactions: ⌈2 × 80%⌉ = 2 → {ABC}
– 5-element transactions: ⌈3 × 80%⌉ = 3 → {BCEF}
Positive candidates C2+ = {AB, AC, AE, AF, BC, BE, BF, CE, CF, EF}
Negative candidates C2− = {AD, BD, CD, DE, DF}
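The guessing step can be sketched as below. The ceiling rounding of the 80% thresholds and the helper name `guess_positive_items` are my assumptions, chosen so that the result reproduces the slide's guess ({B}, {ABC}, {BCEF}, hence positive items {A, B, C, E, F}):

```python
from itertools import combinations
from math import ceil

# Statistics from the initial scan (item -> count per transaction length),
# transcribed from statistics table T on the slides.
freq = {"A": {3: 3, 4: 2, 5: 2}, "B": {3: 4, 4: 2, 5: 3},
        "C": {3: 3, 4: 2, 5: 3}, "D": {3: 1, 4: 1, 5: 1},
        "E": {3: 3, 4: 1, 5: 3}, "F": {3: 1, 4: 0, 5: 3}}
n_trans = {3: 5, 4: 2, 5: 3}

def guess_positive_items(freq, n_trans, q=0.8):
    """Items that reach the q-threshold within some transaction length."""
    positive = set()
    for m, count in n_trans.items():
        threshold = ceil(count * q)
        positive |= {i for i in freq if freq[i].get(m, 0) >= threshold}
    return positive

positive = guess_positive_items(freq, n_trans)
items = sorted(freq)  # L1
# A 2-candidate is positive iff both items were guessed frequent.
C2_pos = [a + b for a, b in combinations(items, 2)
          if a in positive and b in positive]
C2_neg = [a + b for a, b in combinations(items, 2)
          if not (a in positive and b in positive)]
```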
C2+ = {AB, AC, AE, AF, BC, BE, BF, CE, CF, EF}
→ apriori_gen → C3 = {ABC, ABE, ABF, ACE, ACF, AEF, BCE, BCF, BEF, CEF}
→ apriori_gen → C4 = {ABCE, ABCF, ABEF, ACEF, BCEF}
Pruning all subsets of a positive superset from C2+ ∪ C3 ∪ C4 leaves the positive candidates C4 = {ABCE, ABCF, ABEF, ACEF, BCEF} and the negative candidates C2− = {AD, BD, CD, DE, DF}.
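The chain C2+ → C3 → C4 uses Apriori-style candidate generation. A compact sketch, representing itemsets as sorted strings (a simplification; the paper's apriori_gen operates on general itemsets):

```python
from itertools import combinations

def apriori_gen(frequent_k):
    """Join k-itemsets sharing a (k-1)-prefix, then prune candidates
    that have a k-subset outside the input collection."""
    frequent_k = sorted(frequent_k)
    k = len(frequent_k[0])
    candidates = []
    for a, b in combinations(frequent_k, 2):
        if a[:-1] == b[:-1]:                    # join step: same first k-1 items
            c = a + b[-1]
            if all("".join(s) in frequent_k     # prune step: all k-subsets present
                   for s in combinations(c, k)):
                candidates.append(c)
    return candidates

C2_pos = ["AB", "AC", "AE", "AF", "BC", "BE", "BF", "CE", "CF", "EF"]
C3 = apriori_gen(C2_pos)
C4 = apriori_gen(C3)
# C3 has 10 candidates (ABC ... CEF) and C4 has 5 (ABCE ... BCEF), as on the slide.
```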
Finding Frequent Itemsets (Verification of Candidate Itemsets)
Minimum support = 20%
scan DB (1): count the negative candidates C2− = {AD, BD, CD, DE, DF} and the positive candidates C4 = {ABCE, ABCF, ABEF, ACEF, BCEF}.
– From C2−, the itemsets {BD, CD, DE} turn out to be frequent and are added to L2; AD and DF are discarded.
– All of C4 is confirmed frequent: L4 = {ABCE, ABCF, ABEF, ACEF, BCEF}.
Because BD, CD and DE were guessed wrongly, remaining candidate itemsets are generated for the next scan: C3R = {BCD, BDE, CDE} and C4R = {BCDE, BCDF}.
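The verification scan is a plain support count over the database. A sketch under the slide's numbers (minimum support 20% of 10 transactions = 2; the `support` helper is mine):

```python
DB = ["ABC", "ABE", "BCF", "BDE", "ACE", "ABCD", "ABCE", "ABCEF", "ABCEF", "BCDEF"]
MINSUP = 0.2 * len(DB)   # 20% of 10 transactions = 2

def support(itemset, db):
    """Number of transactions containing every item of the itemset."""
    return sum(all(i in t for i in itemset) for t in db)

C4_pos = ["ABCE", "ABCF", "ABEF", "ACEF", "BCEF"]   # positive candidates
C2_neg = ["AD", "BD", "CD", "DE", "DF"]             # negative candidates

L4 = [c for c in C4_pos if support(c, DB) >= MINSUP]        # all five survive
L2_extra = [c for c in C2_neg if support(c, DB) >= MINSUP]  # BD, CD, DE
# BD, CD and DE were guessed wrongly, so remaining candidates such as
# BCD, BDE and CDE must still be verified in the next scan.
```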
scan DB (2)
L2 = {AB, AC, AE, AF, BC, BD, BE, BF, CD, CE, CF, DE, EF}
L3 = {ABC, ABE, ABF, ACE, ACF, AEF, BCD, BCE, BCF, BDE, BEF, CEF}
L4 = {ABCE, ABCF, ABEF, ACEF, BCEF}
apriori_gen → C5 = {ABCEF}
scan DB → L5 = {ABCEF}
Finding Frequent Itemsets
Experimental Results (1/6)
Parameters:
– ntrans: number of transactions in a database
– tl: average transaction length
– np: number of patterns
– sup: minimum support
Experimental Results (2/6)
A comparison of the number of database scans between the Apriori and (n, p) algorithms
Experimental Results (3/6)
Performance of Apriori and the (n, p) algorithm with tl=10, np=10, sup=20%
Experimental Results (4/6)
Performance of Apriori and the (n, p) algorithm with tl=14, np=10, sup=20%
Performance of Apriori and the (n, p) algorithm with tl=20, np=100, sup=10%
Experimental Results (5/6)
Performance of the (n, 3) algorithm with an increasing ratio n/p
Experimental Results (6/6)
Performance of the (8, p) algorithm with increasing parameter p
Conclusion
The important contribution of the (n, p) algorithm is the reduction of the number of scans through a data set.