A Parameterised Algorithm for Mining Association Rules. Nuansri Denwattana and Janusz R. Getta, Proceedings of the 12th Australasian Database Conference (ADC 2001), 29 Jan.-2 Feb. 2001, pp. 45-51. Department of Information & Computer Education, NTNU. Advisor: Jia-Ling Koh. Speaker: Chen-Yi Lin.


Page 1: A Parameterised Algorithm for Mining Association Rules


A Parameterised Algorithm for Mining Association Rules

Department of Information & Computer Education, NTNU

Nuansri Denwattana and Janusz R. Getta, Proceedings of the 12th Australasian Database Conference (ADC 2001), 29 Jan.-2 Feb. 2001, pp. 45-51.

Advisor: Jia-Ling Koh

Speaker: Chen-Yi Lin

Page 2: A Parameterised Algorithm for Mining Association Rules

Outline

– Introduction
– Problem Definition
– Finding Frequent Itemsets
– Experimental Results
– Conclusion

Page 3: A Parameterised Algorithm for Mining Association Rules


Introduction (1/2)

The majority of algorithms for finding frequent itemsets count one category of itemsets in each pass through the data, e.g. the Apriori algorithm.

The quality of an association rule mining algorithm is determined by:
– the number of passes through the input data set
– the number of candidate itemsets

Page 4: A Parameterised Algorithm for Mining Association Rules


Introduction (2/2)

One of the objectives is to construct an algorithm that makes a good guess.
– The parameterised (n, p) algorithm finds all frequent itemsets from a range of n levels in the itemset lattice in p passes (n >= p) through an input data set.

Page 5: A Parameterised Algorithm for Mining Association Rules


Problem Definition

Positive candidate itemset
– It is assumed (guessed) to be frequent.

Negative candidate itemset
– It is assumed (guessed) to be not frequent.

Remaining candidate itemset (C^R)
– A candidate that is verified in another scan.

Page 6: A Parameterised Algorithm for Mining Association Rules


Finding Frequent Itemsets (Guessing Candidate Itemsets)

TID   Items
1     ABC
2     ABE
3     BCF
4     BDE
5     ACE
6     ABCD
7     ABCE
8     ABCEF
9     ABCEF
10    BCDEF

The initial DB scan (1 scan) produces L_1 = {A, B, C, D, E, F} and statistics table T.

Statistics table T (item frequency according to transaction length)

Item                 3 elements   4 elements   5 elements   Total freq.
A                    3            2            2            7
B                    4            2            3            9
C                    3            2            3            8
D                    1            1            1            3
E                    3            1            3            7
F                    1            0            3            4
No. of m-el. trs.    5            2            3            10
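The initial scan can be sketched in a few lines of Python. This is only an illustration of how a table like T could be built from the ten example transactions; the function name `build_statistics` and the data layout (one string of item letters per transaction) are assumptions, not part of the original slides.

```python
from collections import Counter, defaultdict

# Example database from the slide: one string of item letters per transaction.
transactions = ["ABC", "ABE", "BCF", "BDE", "ACE",
                "ABCD", "ABCE", "ABCEF", "ABCEF", "BCDEF"]


def build_statistics(transactions):
    """One pass over the data: count items per transaction length."""
    item_counts = defaultdict(Counter)  # length m -> {item: frequency}
    length_counts = Counter()           # length m -> number of m-element transactions
    for items in transactions:
        m = len(items)
        length_counts[m] += 1
        item_counts[m].update(set(items))
    return item_counts, length_counts


item_counts, length_counts = build_statistics(transactions)
lengths = sorted(length_counts)  # transaction lengths present in the data
for item in "ABCDEF":
    row = [item_counts[m][item] for m in lengths]
    print(item, *row, sum(row))  # per-length frequencies, then the row total
print("No. of m-element trs.",
      *(length_counts[m] for m in lengths), len(transactions))
```

Each printed row corresponds to one row of the statistics table, with the last number being the sum of the per-length frequencies.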

Page 7: A Parameterised Algorithm for Mining Association Rules

L_1 = {A, B, C, D, E, F}

Item frequency threshold = 80%
m-element transaction threshold = 5
Number of levels to traverse (n) = 3
Number of passes through an input data set (p) = 2

apriori_gen(L_1):
C_2 = {AB, AC, AD, AE, AF, BC, BD, BE, BF, CD, CE, CF, DE, DF, EF}

Items reaching the frequency threshold in statistics table T:
– 3-element transactions: 5 * 80% = 4, giving {B}
– 4-element transactions: 2 * 80% = 1.6, rounded up to 2, giving {A, B, C}
– 5-element transactions: 3 * 80% = 2.4, rounded up to 3, giving {B, C, E, F}

Positive C_2 = {AB, AC, AE, AF, BC, BE, BF, CE, CF, EF}
Negative C_2 = {AD, BD, CD, DE, DF}
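The guessing step can be illustrated with a short Python sketch. It assumes one particular reading of the slide: an item is guessed positive if it reaches the 80% cutoff (rounded up) in at least one transaction-length class, and a 2-element candidate is positive only if both of its items are positive. The function name `guess_positive_items` is hypothetical, and the original algorithm may define the rule differently.

```python
from itertools import combinations

# Statistics table T from the example:
# item -> frequency in 3-, 4- and 5-element transactions.
stats = {"A": (3, 2, 2), "B": (4, 2, 3), "C": (3, 2, 3),
         "D": (1, 1, 1), "E": (3, 1, 3), "F": (1, 0, 3)}
n_trans = (5, 2, 3)   # number of 3-, 4- and 5-element transactions
threshold_pct = 80    # item frequency threshold


def guess_positive_items(stats, n_trans, threshold_pct):
    """Assumed rule: positive if the cutoff is reached in any length class."""
    positive = set()
    for col, total in enumerate(n_trans):
        # Integer ceiling of total * threshold_pct / 100 (4, 2 and 3 here).
        cutoff = (total * threshold_pct + 99) // 100
        positive |= {item for item, freq in stats.items() if freq[col] >= cutoff}
    return positive


positive_items = guess_positive_items(stats, n_trans, threshold_pct)
c2 = ["".join(pair) for pair in combinations(sorted(stats), 2)]
positive_c2 = [c for c in c2 if set(c) <= positive_items]
negative_c2 = [c for c in c2 if not set(c) <= positive_items]
print(sorted(positive_items))
print(positive_c2)
print(negative_c2)
```

Under this reading, item D is the only item that never reaches a cutoff, so every 2-element candidate containing D lands in the negative set.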

Page 8: A Parameterised Algorithm for Mining Association Rules

Positive C_2 = {AB, AC, AE, AF, BC, BE, BF, CE, CF, EF}

apriori_gen(positive C_2):
Positive C_3 = {ABC, ABE, ABF, ACE, ACF, AEF, BCE, BCF, BEF, CEF}

apriori_gen(positive C_3):
Positive C_4 = {ABCE, ABCF, ABEF, ACEF, BCEF}

After pruning all subsets of a positive superset:
Positive C_2, C_3
Positive C_4 = {ABCE, ABCF, ABEF, ACEF, BCEF}
Negative C_2 = {AD, BD, CD, DE, DF}
Negative C_3, C_4
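The candidate generation and pruning on this slide can be sketched as follows. `apriori_gen` here is the standard Apriori join-and-prune step written for itemsets stored as sorted letter strings; `prune_subsets` is a hypothetical helper that drops every candidate contained in a candidate on a higher level, which is one way to read "pruning all subsets of positive superset". Neither is taken from the original paper's code.

```python
from itertools import combinations


def apriori_gen(prev):
    """Join k-itemsets sharing a (k-1)-prefix; keep a candidate only if
    all of its k-element subsets are in prev."""
    prev = sorted(prev)
    known = set(prev)
    out = []
    for a, b in combinations(prev, 2):
        if a[:-1] == b[:-1]:
            cand = a + b[-1]
            if all("".join(s) in known for s in combinations(cand, len(a))):
                out.append(cand)
    return out


def prune_subsets(levels):
    """Drop every candidate that is a subset of a candidate on a higher level."""
    pruned = {}
    for k, cands in levels.items():
        higher = [set(c) for j, cs in levels.items() if j > k for c in cs]
        pruned[k] = [c for c in cands if not any(set(c) <= h for h in higher)]
    return pruned


# Positive 2-element candidates from the example.
positive = {2: ["AB", "AC", "AE", "AF", "BC", "BE", "BF", "CE", "CF", "EF"]}
positive[3] = apriori_gen(positive[2])
positive[4] = apriori_gen(positive[3])
pruned = prune_subsets(positive)
print(positive[3])
print(positive[4])
print(pruned)
```

In this example every positive 2- and 3-element candidate is contained in some positive 4-element candidate, so only the 4-element level survives pruning; counting the supersets first is what allows a single scan to cover several levels.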


Page 9: A Parameterised Algorithm for Mining Association Rules

Finding Frequent Itemsets (Verification of Candidate Itemsets)

Minimum support = 20%

Candidates to verify:
Positive C_2, C_3
Positive C_4 = {ABCE, ABCF, ABEF, ACEF, BCEF}
Negative C_2 = {AD, BD, CD, DE, DF}
Negative C_3, C_4

scan DB (1):
L_2 = {BD, CD, DE}
L_3
L_4 = {ABCE, ABCF, ABEF, ACEF, BCEF}

Generate remaining candidate itemsets:
C_2^R
C_3^R = {BCD, BDE, CDE}
C_4^R = {BCDE, BCDF}
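The first verification scan can be checked with a small Python sketch. It counts each candidate in one pass over the ten example transactions and keeps those that reach the 20% minimum support. The function name `frequent` and the choice to treat support as "at least 20% of transactions" are assumptions made for illustration.

```python
# Example database from the slides: one string of item letters per transaction.
transactions = ["ABC", "ABE", "BCF", "BDE", "ACE",
                "ABCD", "ABCE", "ABCEF", "ABCEF", "BCDEF"]
min_support_pct = 20


def frequent(transactions, candidates, min_support_pct):
    """One scan: count each candidate and keep those meeting minimum support."""
    # Integer ceiling of the required count (2 of 10 transactions here).
    min_count = (len(transactions) * min_support_pct + 99) // 100
    counts = {c: 0 for c in candidates}
    for items in transactions:
        present = set(items)
        for c in candidates:
            if set(c) <= present:
                counts[c] += 1
    return [c for c in candidates if counts[c] >= min_count]


positive_c4 = ["ABCE", "ABCF", "ABEF", "ACEF", "BCEF"]
negative_c2 = ["AD", "BD", "CD", "DE", "DF"]
print(frequent(transactions, positive_c4, min_support_pct))
print(frequent(transactions, negative_c2, min_support_pct))
```

The negative candidates that turn out to be frequent are the wrong guesses; they are what makes a further round of remaining candidates necessary.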


Page 10: A Parameterised Algorithm for Mining Association Rules

scan DB (2):
L_2 = {AB, AC, AE, AF, BC, BD, BE, BF, CD, CE, CF, DE, EF}
L_3 = {ABC, ABE, ABF, ACE, ACF, AEF, BCD, BCE, BCF, BDE, BEF, CEF}
L_4 = {ABCE, ABCF, ABEF, ACEF, BCEF}

apriori_gen(L_4):
C_5 = {ABCEF}

scan DB:
L_5 = {ABCEF}
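As a cross-check of the final result, the frequent itemsets of the example can be enumerated by brute force. The sketch below is deliberately not the (n, p) algorithm: it tests every k-element combination of items against every transaction, which is only feasible for a toy database. The function name `frequent_itemsets` is hypothetical.

```python
from itertools import combinations

# Example database from the slides: one string of item letters per transaction.
transactions = ["ABC", "ABE", "BCF", "BDE", "ACE",
                "ABCD", "ABCE", "ABCEF", "ABCEF", "BCDEF"]
min_count = 2  # 20% minimum support on 10 transactions


def frequent_itemsets(transactions, k, min_count):
    """Brute force: test every k-element combination of the items."""
    items = sorted(set("".join(transactions)))
    result = []
    for combo in combinations(items, k):
        support = sum(1 for t in transactions if set(combo) <= set(t))
        if support >= min_count:
            result.append("".join(combo))
    return result


levels = {k: frequent_itemsets(transactions, k, min_count) for k in range(2, 6)}
for k, itemsets in levels.items():
    print(k, itemsets)
```

The number of combinations grows exponentially with the number of items, which is why level-wise algorithms restrict counting to generated candidates.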


Page 11: A Parameterised Algorithm for Mining Association Rules


Finding Frequent Itemsets


Page 12: A Parameterised Algorithm for Mining Association Rules

Experimental Results (1/6)

Parameters:
– ntrans: number of transactions in a database
– tl: average transaction length
– np: number of patterns
– sup: minimum support

Page 13: A Parameterised Algorithm for Mining Association Rules

Experimental Results (2/6)

Comparison of the number of database scans for Apriori and the (n, p) algorithm

Page 14: A Parameterised Algorithm for Mining Association Rules

Experimental Results (3/6)

Performance of Apriori and the (n, p) algorithm with tl=10, np=10, sup=20%

Page 15: A Parameterised Algorithm for Mining Association Rules

Experimental Results (4/6)

Performance of Apriori and the (n, p) algorithm with tl=14, np=10, sup=20%

Performance of Apriori and the (n, p) algorithm with tl=20, np=100, sup=10%

Page 16: A Parameterised Algorithm for Mining Association Rules

Experimental Results (5/6)

Performance of (n, 3) with an increasing ratio n/p

Page 17: A Parameterised Algorithm for Mining Association Rules

Experimental Results (6/6)

Performance of (8, p) with increasing parameter p

Page 18: A Parameterised Algorithm for Mining Association Rules

Conclusion

The main contribution is a reduction in the number of scans through the data set.