Proceedings of 14th International Conference on Computer and Information Technology (ICCIT 2011) 22-24 December, 2011, Dhaka, Bangladesh

Implication of Association Rules Employing FP-Growth Algorithm for Knowledge Discovery

A.H.M. Sajedul Hoque 1, Sujit Kumar Mondal 2, Tassnim Manami Zaman 1, Dr. Paresh Chandra Barman 3, Dr. Md. Al-Amin Bhuiyan 4

1 Dept. of Computer Science & Engineering, Northern University Bangladesh, Dhaka, Bangladesh

2 Dept. of Computer Science & Engineering, Islamic University, Kushtia, Bangladesh

3 Dept. of Information and Communication Engineering, Islamic University, Kushtia, Bangladesh

4 Dept. of Computer Science & Engineering, Jahangirnagar University, Savar, Dhaka, Bangladesh

E-mail: [email protected], [email protected], [email protected], [email protected], [email protected]

Abstract - Nowadays the database of an organization grows day by day. It is often necessary to understand the behaviour of the organization by retrieving the relationships among the different attributes of its database. The implication of association rules provides an efficient data mining technique for finding relationships among the items or attributes of a database. This paper addresses the implication of association rules among the quantitative and categorical attributes of a database employing classical logic and the Frequent Pattern Growth (FP-Growth) algorithm. The system is based on generating association rules over binary or categorical attributes, and splits the quantitative attributes into two or more intervals to generate association rules when the domain of a quantitative attribute grows. The effectiveness of the method has been justified over a sample database.

Index Terms - Data Mining, KDD, Association Rule,

FP-Growth, FP-Tree.

I. INTRODUCTION

Knowledge discovery is the nontrivial extraction of implicit, previously unknown and potentially useful information from data [1]. Knowledge Discovery in Databases (KDD) refers to the process that retrieves knowledge from large databases. Data mining is the part of the KDD process that generates knowledge from the preprocessed database. There are many data mining tasks, such as association rule mining, regression, clustering and prediction [2]. Among these tasks association rule mining is the most prominent. Association rules are used to retrieve relationships among a set of attributes in a database. There are many algorithms to generate association rules from a database, such as Apriori, Frequent Pattern Growth (FP-Growth), Eclat and Recursive Elimination. The produced association rules can be used to learn the customer behaviour of a supermarket, to classify the employees of an organization in order to offer opportunities, and to generate predictions about an organization. This paper focuses on the FP-Growth algorithm to generate association rules from an employee database. The source database may contain Boolean, categorical or quantitative attributes. In order to generate association rules over quantitative attributes, the domain of the quantitative attributes must be split into two or more intervals. This paper explores the generation of association rules on quantitative attributes by employing classical logic.

The rest of the paper is organized as follows. Section II describes the basics of association rules. The FP-Growth

algorithm is described in section III. The process of mining association rules from a sample database is presented in section IV. Section V illustrates the experimental results. Finally, section VI concludes the paper.

II. ASSOCIATION RULES

Association rules provide a convenient and effective way to identify and represent certain dependencies between attributes in a database [3]. An association rule is an implication of the form X => Y, where X, Y ⊂ I are sets of items called itemsets, and X ∩ Y = ∅. Here, X is called the antecedent and Y the consequent. For example, bread => butter represents that if a customer buys bread, he also buys butter. Two important constraints are used to measure the interestingness of an association rule: support and confidence. The support of a rule X => Y is the ratio (in percent) of the records that contain X ∪ Y to the total number of records in the database. The confidence of an association rule X => Y is the ratio (in percent) of the number of records that contain X ∪ Y to the number of records that contain X [4]. The attributes of a database may be Boolean, quantitative or categorical. An attribute whose domain is {0,1}, {yes, no} or {true, false} is called a Boolean attribute. An attribute which takes numerical values is called a quantitative attribute. Based on the types of values, association rules can be classified as Boolean association rules and quantitative association rules. An association rule over Boolean items is called a Boolean association rule. For example, Keyboard => Mouse [support = 20% and confidence = 60%], where both Keyboard and Mouse are Boolean items. An association rule over quantitative items is called a quantitative association rule. For example, (Age = 26 to 30) => (Car = 1, 2) [support = 20% and confidence = 60%], where Age and Car are both quantitative attributes. If the number of values in the domain of a quantitative attribute is very limited, then association rules can be generated easily. But when the number of values increases, the domain must be split into intervals to generate association rules. Usually quantitative association rules are generated using classical logic; such rules are also called classical association rules. In a classical quantitative association rule, the intervals are constructed using classical or crisp set theory. In classical set theory, very precise bounds separate the elements that belong to a certain set

978-1-61284-908-9/11/$26.00 © 2011 IEEE


from the elements outside the set. Consequently, in a classical interval any value of a quantitative attribute belongs to one interval with membership degree 1, and in the other intervals its membership degree is 0. Another special feature of classical intervals is that there is no overlap among intervals. For example, consider that the domain of the Age attribute of a database is age = {22, 27, 33, 43, 54, 67, 70, 80} and it is split into three intervals named low, medium and high, with ranges 0 to 39, 40 to 59 and 60 to 80 respectively. So low = {(22,1), (27,1), (33,1), (43,0), (54,0), (67,0), (70,0), (80,0)}, medium = {(22,0), (27,0), (33,0), (43,1), (54,1), (67,0), (70,0), (80,0)} and high = {(22,0), (27,0), (33,0), (43,0), (54,0), (67,1), (70,1), (80,1)}, where each set consists of ordered pairs: the first element of each pair is an element of the universal set age and the second is the membership degree of that element. A classical association rule is represented as X => Y, where X and Y are classical linguistic terms of quantitative attributes.
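The support and confidence definitions of this section can be computed directly from a transaction table. The sketch below uses a small made-up five-record database (the items and counts are illustrative, not taken from the paper's data):

```python
# Toy transaction database; each record is a set of items.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread"},
    {"butter", "milk"},
    {"bread", "butter"},
]

def support(itemset, db):
    """Percentage of records containing every item of `itemset`."""
    hits = sum(1 for t in db if itemset <= t)
    return 100.0 * hits / len(db)

def confidence(antecedent, consequent, db):
    """support(X ∪ Y) / support(X), in percent."""
    return 100.0 * support(antecedent | consequent, db) / support(antecedent, db)

print(support({"bread", "butter"}, transactions))        # 60.0
print(confidence({"bread"}, {"butter"}, transactions))   # 75.0
```

Here bread and butter co-occur in 3 of 5 records (support 60%), and 3 of the 4 bread records also contain butter (confidence 75%).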

III. MINING OF ASSOCIATION RULES USING FP-GROWTH ALGORITHM

FP-Growth is currently one of the fastest algorithms to mine association rules. It generates frequent itemsets without creating the huge number of candidate itemsets that the Apriori algorithm does. Before applying the FP-Growth algorithm, the source database must be pre-processed: the frequency of every item is determined by scanning the whole database, all infrequent items are discarded from each transaction, and finally the items of each transaction are sorted according to their frequency. Then an FP-tree is constructed from the pre-processed database. The FP-tree is a highly compact representation of the original database, which is assumed to fit into main memory. Algorithm 1 shows the construction of the FP-tree [5, 6].

Algorithm 1 (FP-tree construction)
Input: A transaction database DB and a minimum support threshold ξ.
Output: Its frequent pattern tree, FP-tree.
Method: The FP-tree is constructed in the following steps.

1. Scan the transaction database DB once. Collect the set of frequent items F and their supports. Discard all infrequent items from each transaction of DB. Sort F in support-descending order as L, the list of frequent items.
2. Create the root of an FP-tree, T, and label it as "null".
3. For each transaction in DB, select and sort the frequent items in the transaction according to the order of L.
4. Let the sorted frequent item list in the transaction be [p|P], where p is the first element and P is the remaining list. Call insert_tree([p|P], T).

Procedure insert_tree([p|P], T)
1. If T has a child N such that N.item-name = p.item-name, then increment N's count by 1;
2. Else create a new node N with count = 1, link its parent to T, and link it to the nodes with the same item-name via the node-link structure;
3. If P is nonempty, call insert_tree(P, N).

After having the FP-tree, all frequent itemsets are generated using Algorithm 2 [5,6].

Algorithm 2 (FP-Growth)
Input: An FP-tree constructed by Algorithm 1, DB and a minimum support threshold ξ.
Output: The complete set of frequent patterns.
Method: Call FP-growth(FP-tree, null).

Procedure FP-growth(Tree, α)
1. If Tree contains a single path P, then for each combination (denoted as β) of the nodes in the path P, generate pattern β ∪ α with support = minimum support of the nodes in β;
2. Else for each ai in the header of Tree (in reverse order) do:
   a. generate pattern β = ai ∪ α with support = ai.support;
   b. construct β's conditional pattern base and then β's conditional FP-tree Tree_β;
   c. if Tree_β ≠ ∅, then call FP-growth(Tree_β, β).

This algorithm works by traversing the FP-tree from the bottom nodes to the root. At each level of the tree the FP-Growth algorithm checks whether the node lies on a single path. If it does not, first the conditional pattern bases of that node are determined, secondly all infrequent items are discarded from those pattern bases, and finally a conditional FP-tree is constructed. This process is repeated until a single path is found. If a single path is found, all combinations of the path including the candidate node are determined. These combinations are the frequent itemsets of that node. After getting all frequent itemsets, the association rules whose confidence exceeds the threshold value are generated. In a produced association rule, the antecedent consists of any number of items of a frequent itemset and the consequent consists of only one item. This paper explores the FP-Growth algorithm on a sample employee database, which is illustrated in Table I.
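As a concrete sketch of Algorithms 1 and 2, the short Python program below mines the pre-processed employee transactions of section IV. It is a simplified recursive formulation with illustrative names, not the paper's own code; ties in the frequency ordering are broken alphabetically, so the tree shape may differ slightly from the paper's figure while the mined supports are identical:

```python
from collections import defaultdict

class Node:
    """FP-tree node: item, occurrence count, parent link and children."""
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}

def build_tree(transactions, min_support):
    """Algorithm 1: build the FP-tree; return the header table of node-links."""
    freq = defaultdict(int)
    for t in transactions:
        for item in t:
            freq[item] += 1
    frequent = {i for i, c in freq.items() if c >= min_support}
    root, links = Node(None, None), defaultdict(list)
    for t in transactions:
        # discard infrequent items, sort by descending frequency
        ordered = sorted((i for i in t if i in frequent),
                         key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = Node(item, node)
                links[item].append(node.children[item])
            node = node.children[item]
            node.count += 1
    return links

def fp_growth(transactions, min_support, suffix=()):
    """Algorithm 2: recursively mine every frequent itemset with its support."""
    links = build_tree(transactions, min_support)
    patterns = {}
    for item, nodes in links.items():
        support = sum(n.count for n in nodes)
        itemset = (item,) + suffix
        patterns[frozenset(itemset)] = support
        # conditional pattern base: prefix path above each node,
        # replicated according to that node's count
        base = []
        for n in nodes:
            path, p = [], n.parent
            while p.item is not None:
                path.append(p.item)
                p = p.parent
            base.extend([path] * n.count)
        patterns.update(fp_growth(base, min_support, itemset))
    return patterns

# The ten transactions of Table IV; minimum support 2 (20% of 10 records).
db = [["Y","LSa","HSc","MEx"], ["Y","LEx","LSa","Ba"], ["Y","LEx","MSa","Ms"],
      ["Y","LEx","MSa","Ms"], ["Y","LEx","LSa","HSc"], ["LEx","HSa","PHD"],
      ["O","MSa","Ba","MEx"], ["O","HSa","HEx","PHD"], ["O","HSa","Ms","HEx"],
      ["O","LSa","HSc","HEx"]]
result = fp_growth(db, 2)
print(result[frozenset({"HSa", "PHD"})])       # 2
print(result[frozenset({"Y", "LEx"})])         # 4
print(result[frozenset({"O", "HSa", "HEx"})])  # 2
```

The printed supports match the frequent itemsets derived by hand in section IV.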


TABLE I
EMPLOYEE DATABASE

TID  Age  Salary (Tk.)  Education    Service Years
1    23   12000         High School  5
2    28   15800         Bachelor     3
3    28   17000         Master       1
4    30   21300         Master       2
5    30   9500          High School  1
6    37   28000         PHD          1
7    39   20000         Bachelor     8
8    41   36500         PHD          11
9    44   32000         Master       15
10   46   15000         High School  23

Here Age, Salary and Service Years are quantitative attributes and Education is a categorical attribute. As mentioned, it is necessary to split the quantitative attributes into two or more intervals to generate association rules. In this paper, the Age attribute is split into Young (Y), Middle Aged (MA) and Old (O); the Salary attribute is split into Low Salary (LSa), Medium Salary (MSa) and High Salary (HSa); and the Service Years attribute is split into Low Experienced (LEx), Medium Experienced (MEx) and High Experienced (HEx). The domain of the categorical attribute Education is {High School (HSc), Bachelor (Ba), Master (Ms), PHD}. In this paper, classical association rules are generated over these quantitative and categorical attributes.

IV. GENERATION OF CLASSICAL ASSOCIATION RULE

When the domain of a quantitative attribute is limited, it is easy to generate association rules. But when the domain is large, it must be split into two or more intervals. In classical association rules, the intervals are constructed employing classical logic. In classical intervals an element belongs to exactly one interval: its membership degree in that interval is 1, and in the others it is 0. The process of generating classical association rules from Table I is described below step by step:

A. Constructing Classical Intervals
Intervals of quantitative attributes can be constructed either by consulting an expert or by applying a statistical approach. In this paper, each quantitative attribute is split into three classical intervals, as illustrated in fig. 1 [7].

[Figure: a number line over the values of the quantitative attribute, with marks at min, mean - sd/2, mean, mean + sd/2 and max separating the 1st, 2nd and 3rd intervals.]

Fig. 1 Classical Intervals Generation

1st interval: The lower border (lb) of the 1st interval is the minimum value over the domain of the quantitative attribute. The higher border (hb) is computed using the mean and standard deviation (sd) of the values of the quantitative attribute. The mathematical expressions for lb and hb are shown in Eq. 1.

lb = min(quantitative attribute)
hb = mean - sd/2                      (1)

2nd interval: The lower border (lb) and higher border (hb) of the 2nd interval are computed using Eq. 2.

lb = mean - sd/2
hb = mean + sd/2                      (2)

3rd interval: Eq. 3 is used to compute the lower border (lb) and higher border (hb) of the 3rd interval.

lb = mean + sd/2
hb = max(quantitative attribute)      (3)

After applying Eq. 1, Eq. 2 and Eq. 3, the classical intervals of the Age attribute are [23, 30.7], [30.7, 38.5] and [38.5, 46] for Y, MA and O respectively; the classical intervals of the Salary attribute are [9500, 16291.64], [16291.64, 25128.36] and [25128.36, 36500] for LSa, MSa and HSa respectively; and the classical intervals for Service Years are [1, 3.3] for LEx, [3.3, 10.7] for MEx and [10.7, 23] for HEx.
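These borders are easy to reproduce. The sketch below recomputes the three Age intervals with Python's statistics module; the sample standard deviation (statistics.stdev) is assumed here, since it reproduces the reported borders 30.7 and 38.5:

```python
from statistics import mean, stdev

def classical_intervals(values):
    """Split a quantitative domain into three crisp intervals per Eq. 1-3."""
    m, sd = mean(values), stdev(values)  # sample standard deviation
    return [(min(values), m - sd / 2),   # 1st interval
            (m - sd / 2, m + sd / 2),    # 2nd interval
            (m + sd / 2, max(values))]   # 3rd interval

ages = [23, 28, 28, 30, 30, 37, 39, 41, 44, 46]  # Age column of Table I
for name, (lb, hb) in zip(("Y", "MA", "O"), classical_intervals(ages)):
    print(name, round(lb, 1), round(hb, 1))
```

For the Age column this prints borders 23, 30.7, 38.5 and 46, matching the intervals given above.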

B. Preprocessing the Database
Before pre-processing the database, the quantitative and categorical attributes are mapped into Boolean attributes: if a value x belongs to a Boolean attribute X, i.e. x ∈ X, it is mapped to 1, otherwise to 0. The mapping of the attributes of Table I into Boolean attributes is shown in Table II. Then the support of each item is determined by summing the membership degrees of that item over all transactions. For example, the support of the Boolean attribute (item) Young (Y), i.e. support(Y), is (1+1+1+1+1+0+0+0+0+0) or 5.


TABLE II
MAPPING TABLE OF EMPLOYEE DATABASE

     Age                               Salary
TID  Y          MA          O          LSa             MSa
     [23,30.7]  [30.7,38.5] [38.5,46]  [9500,16291.64] [16291.64,25128.36]
1    1          0           0          1               0
2    1          0           0          1               0
3    1          0           0          0               1
4    1          0           0          0               1
5    1          0           0          1               0
6    0          1           0          0               0
7    0          0           1          0               1
8    0          0           1          0               0
9    0          0           1          0               0
10   0          0           1          1               0

The frequency of each item in Table II is shown in Table III.

TABLE III
FREQUENCY OF ITEMS

Sl. No  Item                      Support
1       Y (Young)                 5
2       MA (Middle Aged)          1
3       O (Old)                   4
4       LSa (Low Salary)          4
5       MSa (Medium Salary)       3
6       HSa (High Salary)         3
7       HSc (High School)         3
8       Ba (Bachelor)             2
9       Ms (Master)               3
10      PHD                       2
11      LEx (Low Experienced)     5
12      MEx (Medium Experienced)  2
13      HEx (High Experienced)    3

In this paper, the assumed minimum support and minimum confidence are 20% and 70% respectively. So the only infrequent item in Table III is MA (Middle Aged). After discarding the infrequent item MA from every transaction and sorting the items of each transaction in descending order of their frequency in the database, the pre-processed database shown in Table IV is obtained.

TABLE IV
PRE-PROCESSED DATABASE

Sl. No  Items            Ordered Frequent Items
1       Y LSa HSc MEx    Y LSa HSc MEx
2       Y LSa Ba LEx     Y LEx LSa Ba
3       Y MSa Ms LEx     Y LEx MSa Ms
4       Y MSa Ms LEx     Y LEx MSa Ms
5       Y LSa HSc LEx    Y LEx LSa HSc
6       MA HSa PHD LEx   LEx HSa PHD
7       O MSa Ba MEx     O MSa Ba MEx
8       O HSa PHD HEx    O HSa HEx PHD
9       O HSa Ms HEx     O HSa Ms HEx
10      O LSa HSc HEx    O LSa HSc HEx
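The pre-processing step (count supports, drop infrequent items, re-sort each transaction) can be sketched as follows. The Boolean items per transaction come from the mapping of Table II; ties in frequency are broken alphabetically here, which may order equal-support items slightly differently than in the paper's listing:

```python
from collections import Counter

# Boolean items per transaction, from the mapping of Table II.
db = [["Y","LSa","HSc","MEx"], ["Y","LSa","Ba","LEx"], ["Y","MSa","Ms","LEx"],
      ["Y","MSa","Ms","LEx"], ["Y","LSa","HSc","LEx"], ["MA","HSa","PHD","LEx"],
      ["O","MSa","Ba","MEx"], ["O","HSa","PHD","HEx"], ["O","HSa","Ms","HEx"],
      ["O","LSa","HSc","HEx"]]
min_support = 2  # 20% of 10 records

freq = Counter(item for t in db for item in t)
processed = [sorted((i for i in t if freq[i] >= min_support),
                    key=lambda i: (-freq[i], i))
             for t in db]

print(processed[5])  # MA (support 1) is discarded: ['LEx', 'HSa', 'PHD']
```

Transaction 6 loses the infrequent item MA and is reordered as LEx HSa PHD, exactly as in the pre-processed database.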

C. Building the Classical FP-tree
An FP-tree is basically a prefix tree in which each path represents a set of transactions that share the same prefix. The procedure insert_tree([p|P], T) is applied to the ordered frequent items of Table IV to build up the FP-tree shown in fig. 2.

TABLE II (continued)

     HSa                Education           Service Years
TID  [25128.36,36500]   HSc  Ba  Ms  PHD    LEx [1,3.3]  MEx [3.3,10.7]  HEx [10.7,23]
1    0                  1    0   0   0      0            1               0
2    0                  0    1   0   0      1            0               0
3    0                  0    0   1   0      1            0               0
4    0                  0    0   1   0      1            0               0
5    0                  1    0   0   0      1            0               0
6    1                  0    0   0   1      1            0               0
7    0                  0    1   0   0      0            1               0
8    1                  0    0   0   1      0            0               1
9    1                  0    0   1   0      0            0               1
10   0                  1    0   0   0      0            0               1

[Figure: the classical FP-tree built from Table IV, with a header table of items and supports — Y:5, LEx:5, O:4, LSa:4, MSa:3, HSa:3, HSc:3, Ms:3, HEx:3, Ba:2, PHD:2, MEx:2 — each entry linked into the tree via node-links.]

Fig. 2 Classical FP-tree

According to the procedure, at first a root node is created, which is labelled "null". Then the first branch is constructed for the transaction <Y:1, LSa:1, HSc:1, MEx:1>, where four new nodes are created for the items Y, LSa, HSc and MEx: node Y is linked as the child of null, LSa as the child of Y, HSc as the child of LSa, and finally MEx as the child of HSc. The next transaction is <Y:1, LEx:1, LSa:1, Ba:1>, which shares the common prefix <Y> with the existing path. So no new node is created for Y; only the count of that node is incremented by 1. But three new nodes for LEx, LSa and Ba are created and linked as children. In this way all transactions are embedded into the FP-tree. During the construction of the FP-tree a header table is built, in which each item points to its first occurrence in the tree via a node-link. Nodes with the same item name are linked in sequence via such node-links.


D. Generation of Frequent Itemsets
After having the FP-tree, Algorithm 2 is applied to it to generate the frequent itemsets. The FP-Growth algorithm processes the node MEx first, then PHD, then Ba and so on up to the root node "null". Here the node MEx lies on two paths: <Y:5, LSa:1, HSc:1, MEx:1> and <O:4, MSa:1, Ba:1, MEx:1>. The two prefix paths of the MEx item are <Y:1, LSa:1, HSc:1> and <O:1, MSa:1, Ba:1>, which are also called the conditional pattern bases of MEx. Since all items in the conditional pattern bases of MEx are infrequent, no FP-tree can be constructed. Thus there is no frequent itemset including the item MEx. Again consider the item PHD, which also lies on two paths: <LEx:1, HSa:1, PHD:1> and <O:4, HSa:2, HEx:1, PHD:1>. So the conditional pattern bases of the item PHD are <LEx:1, HSa:1> and <O:1, HSa:1, HEx:1>. Here the only frequent item is HSa, whose support is 2. After discarding the infrequent items from the conditional pattern bases, the FP-tree shown in fig. 3 is constructed.

[Figure: FP-tree consisting of the single node HSa:2.]

Fig. 3 FP-tree of the PHD item

Since the FP-tree for the PHD item has a single path, the only frequent itemset including PHD is <HSa PHD:2>. In this way all frequent itemsets are determined from the FP-tree of fig. 2; they are listed in Table V.

TABLE V
FREQUENT ITEMSETS

Item  Conditional Pattern Bases                                Frequent Itemsets
MEx   {<Y:1, LSa:1, HSc:1>, <O:1, MSa:1, Ba:1>}                ∅
PHD   {<LEx:1, HSa:1>, <O:1, HSa:1, HEx:1>}                    <HSa PHD:2>
Ba    {<Y:1, LEx:1, LSa:1>, <O:1, MSa:1>}                      ∅
HEx   {<O:1, HSa:1>, <O:1, HSa:1, Ms:1>, <O:1, LSa:1, HSc:1>}  <O HSa HEx:2>, <O HEx:3>, <HSa HEx:2>
Ms    {<Y:2, LEx:2, MSa:2>, <O:1, HSa:1>}                      <Y LEx MSa Ms:2>, <Y Ms:2>, <LEx Ms:2>, <MSa Ms:2>, <LEx MSa Ms:2>, <Y MSa Ms:2>, <Y LEx Ms:2>
HSc   {<Y:1, LSa:1>, <Y:1, LEx:1, LSa:1>, <O:1, LSa:1>}        <Y LSa HSc:2>, <Y HSc:2>, <LSa HSc:3>
HSa   {<LEx:1>, <O:2>}                                         <O HSa:2>
MSa   {<Y:2, LEx:2>, <O:1>}                                    <Y MSa:2>, <LEx MSa:2>, <Y LEx MSa:2>
LSa   {<Y:1>, <Y:2, LEx:2>, <O:1>}                             <Y LEx LSa:2>, <Y LSa:3>, <LEx LSa:2>
O     ∅                                                        ∅
LEx   {<Y:4>}                                                  <Y LEx:4>
Y     ∅                                                        ∅

E. Generation of Classical Association Rules
From the determined frequent itemsets, the interesting association rules are generated. The confidence of an association rule X => Y is determined using Eq. 4.

confidence(X => Y) = support(X ∪ Y) / support(X) × 100%    (4)

During the generation of association rules from a frequent itemset, one item of the itemset is placed as the consequent and the remaining items as the antecedent. Then the confidence value is determined using Eq. 4. If the confidence of the rule is greater than the minimum confidence, the association rule is interesting; otherwise it is not. For example, consider the frequent itemset <O HSa HEx:2>, whose support is 2. The candidate association rules are O HSa => HEx, O HEx => HSa and HSa HEx => O. The confidences are:

confidence(O HSa => HEx) = support(O ∪ HSa ∪ HEx) / support(O ∪ HSa) × 100% = 2/2 × 100% = 100%

confidence(O HEx => HSa) = support(O ∪ HSa ∪ HEx) / support(O ∪ HEx) × 100% = 2/3 × 100% = 66.7%

confidence(HSa HEx => O) = support(O ∪ HSa ∪ HEx) / support(HSa ∪ HEx) × 100% = 2/2 × 100% = 100%

Since the minimum confidence is 70%, the interesting association rules are O HSa => HEx and HSa HEx => O, and the confidence of both is 100%. The first rule indicates that if an employee is old and draws a high salary, he or she is highly experienced, and the possibility is 100%. The second rule represents that if an employee draws a high salary and is highly experienced, he or she will be old, again with possibility 100%.

V. EXPERIMENTAL RESULTS

The association rules generated from the employee database of Table I employing classical logic and the FP-Growth algorithm are listed in Table VI, which represents the overall behaviour of the organization towards its employees.


TABLE VI
EXPERIMENTAL RESULTS

Sl. No  Generated Classical Association Rule  Confidence
1       HSa => PHD                            66.7%
2       Y LSa => HSc                          66.7%
3       PHD => HSa                            100%
4       HSa HEx => O                          100%
5       O HEx => HSa                          66.7%
6       HEx => O                              100%
7       O HSa => HEx                          100%
8       O => HEx                              75%
9       LSa => LEx                            50%
10      HSa => HEx                            66.7%
11      HEx => HSa                            66.7%
12      Y LEx MSa => Ms                       100%
13      Y LEx Ms => MSa                       100%
14      Y MSa Ms => LEx                       100%
15      LEx MSa Ms => Y                       100%
16      Ms => Y                               66.7%
17      Ms => LEx                             66.7%
18      Y => Ms                               40%
19      LEx => Ms                             40%
20      MSa => Ms                             66.7%
21      Ms => MSa                             66.7%
22      MSa Ms => LEx                         100%
23      LEx Ms => MSa                         100%
24      LEx MSa => Ms                         100%
25      Y MSa => Ms                           100%
26      MSa Ms => Y                           100%
27      Y Ms => MSa                           100%
28      Y LEx => Ms                           50%
29      Y LEx => MSa                          50%
30      Y Ms => LEx                           100%
31      LEx Ms => Y                           100%
32      LSa HSc => Y                          66.7%
33      Y HSc => LSa                          100%
34      Y => HSc                              40%
35      LSa => HSc                            75%
36      HSc => Y                              66.7%
37      HSc => LSa                            100%
38      HSa => O                              66.7%
39      Y LSa => LEx                          66.7%
40      O => HSa                              50%
41      Y => MSa                              40%
42      MSa => Y                              66.7%
43      MSa => LEx                            66.7%
44      LEx => MSa                            40%
45      Y => LSa                              60%
46      Y MSa => LEx                          100%
47      LEx MSa => Y                          100%
48      LEx LSa => Y                          100%
49      LEx => Y                              80%
50      Y LEx => LSa                          50%
51      LSa => Y                              75%
52      LEx => LSa                            40%
53      Y => LEx                              80%

VI. CONCLUSION

This paper has presented the implication of association rules for knowledge discovery. These association rules represent the behaviour of the employees of an organization. Through them one can easily relate the salary structure of the organization's employees to age, job experience and educational qualification. These association rules can be used to classify employees in order to grant opportunities such as salary increments and promotions. In this paper, all quantitative attributes are split using classical logic, which leads to over-estimation and under-estimation of values close to the interval boundaries. This can be addressed using fuzzy logic, which provides soft boundaries.

ACKNOWLEDGMENT

The authors would like to express their heartiest felicitation and sense of gratitude to Professor Dr. Md. Nurul Islam, Registrar, Northern University Bangladesh for his encouragement regarding their research on Data Mining. They are also grateful to Professor Dr. Mir. Md. Akramuzzaman for his nice cooperation and constant support.

REFERENCES

[1] William J. Frawley, Gregory Piatetsky-Shapiro and Christopher J. Matheus, "Knowledge Discovery in Databases: An Overview", AAAI/MIT Press, 1992, pp. 1-27.

[2] U. Fayyad, G. Piatetsky-Shapiro and P. Smyth, "From Data Mining to Knowledge Discovery in Databases", AI Magazine, 1996.

[3] M. De Cock, C. Cornelis and E. E. Kerre, "Elicitation of fuzzy association rules from positive and negative examples", Fuzziness and Uncertainty Modelling Research Unit, Department of Applied Mathematics and Computer Science, Ghent University, Belgium, 19 August 2004.

[4] Rakesh Agrawal, Tomasz Imielinski and Arun N. Swami, "Mining Association Rules Between Sets of Items in Large Databases", Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, pp. 207-216, Washington, D.C., May 1993.

[5] Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", China Machine Press, 2001, pp. 158-161.

[6] Jiawei Han, Jian Pei and Yiwen Yin, "Mining Frequent Patterns without Candidate Generation", ACM SIGMOD Int. Conf. on Management of Data (SIGMOD'00), pp. 1-12, Dallas, TX, May 2000.

[7] Lukas Helm, "Fuzzy Association Rules: An Implementation in R", Master Thesis, Vienna University of Economics and Business Administration, 2007.