Efficient Mining of Both Positive and Negative Association Rules
Xindong Wu (*), Chengqi Zhang (+), and Shichao Zhang (+)
(*) University of Vermont, USA
(+) University of Technology Sydney, Australia
xwu@cs.uvm.edu
Presenter: Mike Tripp


Page 1: Efficient Mining of Both Positive and Negative Association Rules

Efficient Mining of Both Positive and Negative Association Rules
Xindong Wu (*), Chengqi Zhang (+), and Shichao Zhang (+)
(*) University of Vermont, USA
(+) University of Technology Sydney, Australia
xwu@cs.uvm.edu
Presenter: Mike Tripp

Page 2: Efficient Mining of Both Positive and Negative Association Rules

Outline
• Association Analysis
• Exceptions
• Problems
• Rules
• Examples
• Pruning Strategy
• Frequent Items of Potential Interest
• Infrequent Items of Potential Interest
• Procedure AllItemsOfInterest
• Extracting Positive and Negative Rules
• CPIR
• PositiveAndNegativeAssociations
• Effectiveness and Efficiency
• Experimental Results
• Related Work
• Exam Questions

Page 3: Efficient Mining of Both Positive and Negative Association Rules

Exceptions of Rules
• An exception is a pattern that deviates from a well-known fact and exhibits unexpectedness.
• Exceptions are also known as surprising patterns.
• Example: while bird(x) => flies(x) holds in general, there is the exception bird(x), penguin(x) => −flies(x).
• Interesting fact: A => B being a valid rule does not imply that −B => −A is a valid rule.

Page 4: Efficient Mining of Both Positive and Negative Association Rules

Key Problems in Negative Association Rule Mining
• How to effectively search for interesting itemsets.
• How to effectively identify negative association rules of interest.

Page 5: Efficient Mining of Both Positive and Negative Association Rules

Association Analysis
• Generate all large itemsets: every itemset whose support is greater than or equal to the user-specified minimum support (ms) is generated.
• Generate all the rules that meet the minimum confidence (mc) in the following naive way: for every large itemset X and every non-empty B ⊂ X, let A = X − B. If the rule A => B has the minimum confidence, that is, supp(X)/supp(A) ≥ mc, then it is a valid rule.
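
The naive rule-generation step can be sketched in a few lines of Python. This is only an illustration: the helper names `supp` and `naive_rules` are made up here, the five-transaction database TD is borrowed from the negative-rules slide, and mc = 0.5 is an assumed value.

```python
from fractions import Fraction
from itertools import combinations

# Toy database TD (five transactions), used here purely for illustration.
TD = [frozenset(t) for t in ("ABD", "BCD", "BD", "BCDE", "ABDF")]

def supp(itemset, transactions=TD):
    """Fraction of transactions that contain every item of `itemset`."""
    s = frozenset(itemset)
    return Fraction(sum(1 for t in transactions if s <= t), len(transactions))

def naive_rules(X, mc, transactions=TD):
    """Return (A, B, confidence) for every split of X with conf(A => B) >= mc."""
    X = frozenset(X)
    rules = []
    for size in range(1, len(X)):  # non-empty proper subsets B of X
        for B in map(frozenset, combinations(sorted(X), size)):
            A = X - B
            conf = supp(X, transactions) / supp(A, transactions)
            if conf >= mc:
                rules.append((A, B, conf))
    return rules

# mc = 0.5 is an assumed threshold, not taken from the slides.
rules = naive_rules("AB", mc=Fraction(1, 2))
```

With these inputs only A => B survives (confidence 1), while B => A has confidence 0.4 and is dropped. Exact fractions are used so that threshold comparisons are not affected by floating-point rounding.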

Page 6: Efficient Mining of Both Positive and Negative Association Rules

Negation/Types of Rules
• The negation of an itemset A is indicated by −A. The support of −A is supp(−A) = 1 − supp(A).
• In particular, for an itemset i1 −i2 i3, the support is supp(i1 −i2 i3) = supp(i1 i3) − supp(i1 i2 i3).
• Positive rule: A => B
• Negative rules: A => −B, −A => B, −A => −B
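
Both support identities can be checked by direct counting. The sketch below uses the ten-transaction example database from the AllItemsOfInterest slides; the function `supp` and its `negated` parameter are hypothetical names introduced only for this illustration.

```python
from fractions import Fraction

# Ten-transaction example database (T1..T10) from the AllItemsOfInterest slides.
DB = [frozenset(t) for t in
      ("ABD", "ABCD", "BD", "BCDE", "AE", "BDF", "AEF", "CF", "BCF", "ABCDF")]

def supp(present, negated=None):
    """Support of an itemset that contains `present` and does NOT contain
    the (whole) itemset `negated`, counted directly over DB."""
    p = frozenset(present)
    n = frozenset(negated) if negated else None
    hits = sum(1 for t in DB if p <= t and (n is None or not (n <= t)))
    return Fraction(hits, len(DB))

# supp(-A) = 1 - supp(A)
lhs_negation = supp("", negated="A")
rhs_negation = 1 - supp("A")

# supp(B -C D) = supp(BD) - supp(BCD)
lhs_mixed = supp("BD", negated="C")
rhs_mixed = supp("BD") - supp("BCD")
```

Here both sides of the first identity come out as 1/2 and both sides of the second as 3/10, so the subtraction formula lets negated supports be derived from positive supports without another database scan.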

Page 7: Efficient Mining of Both Positive and Negative Association Rules

Negative Association Rules
• Still difficult: exponential growth of infrequent itemsets
• TD = {(A,B,D); (B,C,D); (B,D); (B,C,D,E); (A,B,D,F)}
• Even such a simple database contains 49 infrequent itemsets.

Page 8: Efficient Mining of Both Positive and Negative Association Rules

Define Negative Association Rules
Two cases:
• If both A and B are frequent but A U B is infrequent, is A => −B a valid rule?
• If A is frequent and B is infrequent, is A => −B a valid rule? Maybe, but it is not of interest to us.
Heuristic: A => −B is considered only if both A and B are frequent.

Page 9: Efficient Mining of Both Positive and Negative Association Rules

Negative Association Example
Consider supp(c) = 0.6, supp(t) = 0.4, supp(t U c) = 0.05, and mc = 0.52.
• The confidence of t => c is supp(t U c)/supp(t) = 0.05/0.4 = 0.125, which is < mc = 0.52, and supp(t U c) = 0.05 is low.
• This indicates that t U c is an infrequent itemset and that t => c cannot be a valid rule.
• However, supp(t U −c) = supp(t) − supp(t U c) = 0.4 − 0.05 = 0.35, which is high, and the confidence of t => −c is the ratio supp(t U −c)/supp(t) = 0.35/0.4 = 0.875, which is > mc.
• Therefore, t => −c is a valid rule.
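
The arithmetic of this example can be reproduced with exact fractions. The function names below are illustrative only; the numbers are the ones given on the slide.

```python
from fractions import Fraction

def confidence(supp_ab, supp_a):
    """conf(A => B) = supp(A U B) / supp(A)."""
    return supp_ab / supp_a

def negative_confidence(supp_ab, supp_a):
    """conf(A => -B), using supp(A U -B) = supp(A) - supp(A U B)."""
    return (supp_a - supp_ab) / supp_a

supp_t = Fraction(40, 100)   # supp(t)     = 0.4
supp_tc = Fraction(5, 100)   # supp(t U c) = 0.05
mc = Fraction(52, 100)       # minimum confidence = 0.52

conf_pos = confidence(supp_tc, supp_t)           # 1/8 = 0.125
conf_neg = negative_confidence(supp_tc, supp_t)  # 7/8 = 0.875
```

Since conf_pos < mc < conf_neg, the positive rule t => c fails the confidence test while the negative rule t => −c passes it.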

Page 10: Efficient Mining of Both Positive and Negative Association Rules

Identifying Interesting Itemsets
• Because of the exponential number of infrequent itemsets in a database, pruning is critical to an efficient search for interesting itemsets.

Page 11: Efficient Mining of Both Positive and Negative Association Rules

Pruning Strategy
• We want to keep the interesting itemsets when pruning.
• Define an interestingness function interest(X, Y) = |supp(X U Y) − supp(X)supp(Y)| and a threshold mi.
• If interest(X, Y) ≥ mi, then the rule X => Y is of potential interest, and X U Y is referred to as a potentially interesting itemset.
• Using this approach, we can establish an effective pruning strategy for efficiently identifying all frequent itemsets of potential interest in a database.
• The pruning strategy ensures we can use an Apriori-like algorithm, generating infrequent k-itemsets from frequent (k−1)-itemsets.
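
As a minimal sketch, the interestingness test can be written directly from its definition. The function names are hypothetical, and the threshold mi = 0.05 is borrowed from the example run of AllItemsOfInterest rather than prescribed here.

```python
from fractions import Fraction

def interest(supp_xy, supp_x, supp_y):
    """interest(X, Y) = |supp(X U Y) - supp(X) * supp(Y)|."""
    return abs(supp_xy - supp_x * supp_y)

def is_potentially_interesting(supp_xy, supp_x, supp_y, mi):
    """True when X U Y passes the interest threshold mi."""
    return interest(supp_xy, supp_x, supp_y) >= mi

mi = Fraction(5, 100)  # assumed threshold, mi = 0.05

# supp(t U c) = 0.05, supp(t) = 0.4, supp(c) = 0.6
far_from_independent = interest(Fraction(5, 100), Fraction(2, 5), Fraction(3, 5))

# supp(X U Y) = 0.3 with supp(X) = 0.5, supp(Y) = 0.6: exactly independent
independent = interest(Fraction(3, 10), Fraction(1, 2), Fraction(3, 5))
```

The first pair deviates from independence by 0.19 and is kept; the second has interest 0 and would be pruned. Exact fractions matter here: with binary floats, a borderline value such as |0.3 − 0.35| can land slightly below 0.05.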

Page 12: Efficient Mining of Both Positive and Negative Association Rules

Frequent Itemset of Potential Interest

Where f() is a constraint function concerning the support, confidence, and interestingness of X => Y


Page 13: Efficient Mining of Both Positive and Negative Association Rules

Infrequent Itemset of Potential Interest

Where g() is a constraint function concerning f() and the support, confidence, and interestingness of X => Y


Page 14: Efficient Mining of Both Positive and Negative Association Rules

Bringing them together
• Using the fipi (frequent itemset of potential interest) and iipi (infrequent itemset of potential interest) mechanisms for both positive and negative rule discovery, the search is constrained to rules that are interesting on certain measures. Pruning removes all uninteresting branches that cannot lead to an interesting rule satisfying those constraints.

Page 15: Efficient Mining of Both Positive and Negative Association Rules

Procedure AllItemsOfInterest
Input: D (a database); minsupp; mininterest
Output: PL (frequent itemsets); NL (infrequent itemsets)

Page 16: Efficient Mining of Both Positive and Negative Association Rules

Procedure AllItemsOfInterest


Page 17: Efficient Mining of Both Positive and Negative Association Rules

Procedure AllItemsOfInterest
• Example run of the algorithm (ms = 0.3, mi = 0.05)

TID   Items bought
T1    {A,B,D}
T2    {A,B,C,D}
T3    {B,D}
T4    {B,C,D,E}
T5    {A,E}
T6    {B,D,F}
T7    {A,E,F}
T8    {C,F}
T9    {B,C,F}
T10   {A,B,C,D,F}

Page 18: Efficient Mining of Both Positive and Negative Association Rules

Procedure AllItemsOfInterest
• Generate frequent and infrequent 2-itemsets of interest.
• When ms = 0.3, L2 = {AB, AD, BC, BD, BF, CD, CF} and N2 = {AC, AE, AF, BE, CE, DE, DF, EF}.
• Use the interest measure to prune.

Page 19: Efficient Mining of Both Positive and Negative Association Rules

Procedure AllItemsOfInterest

• So AD and CD are not of interest; they are removed from L2.
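
The 2-itemset step of this run can be reproduced with a short script. This is a sketch of the interest-based pruning only, not of the full AllItemsOfInterest procedure; the helper names are made up for the illustration, while the database and the thresholds ms = 0.3 and mi = 0.05 are the ones from the example.

```python
from fractions import Fraction
from itertools import combinations

# Example database T1..T10.
DB = [frozenset(t) for t in
      ("ABD", "ABCD", "BD", "BCDE", "AE", "BDF", "AEF", "CF", "BCF", "ABCDF")]
MS = Fraction(3, 10)   # ms = 0.3
MI = Fraction(5, 100)  # mi = 0.05

def supp(items):
    """Fraction of transactions containing every item in `items`."""
    s = frozenset(items)
    return Fraction(sum(1 for t in DB if s <= t), len(DB))

def interest(x, y):
    """interest(X, Y) = |supp(X U Y) - supp(X) * supp(Y)|."""
    return abs(supp(set(x) | set(y)) - supp(x) * supp(y))

items = sorted(set().union(*DB))
pairs = ["".join(p) for p in combinations(items, 2)]

L2 = [p for p in pairs if supp(p) >= MS]   # frequent 2-itemsets
N2 = [p for p in pairs if supp(p) < MS]    # infrequent 2-itemsets
L2_of_interest = [p for p in L2 if interest(p[0], p[1]) >= MI]
pruned = [p for p in L2 if p not in L2_of_interest]
```

For AD, supp(AD) = 0.3 equals supp(A)·supp(D) = 0.5 × 0.6, and for CD, supp(CD) = 0.3 equals supp(C)·supp(D) = 0.5 × 0.6, so both have interest 0 and fall below mi. Several surviving pairs (AB, BC, BF, CF) sit exactly at 0.05, which is why the sketch uses exact fractions rather than floats.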

Page 20: Efficient Mining of Both Positive and Negative Association Rules

Procedure AllItemsOfInterest
• So the resulting frequent 2-itemsets of interest are as follows: AB, BC, BD, BF, CF.

Page 21: Efficient Mining of Both Positive and Negative Association Rules

Procedure AllItemsOfInterest
• Generate infrequent 2-itemsets of interest using the iipi measure.
• This is very similar to the frequent 2-itemsets case.

Page 22: Efficient Mining of Both Positive and Negative Association Rules

Extracting Positive and Negative Rules
• Continue like this to get all the itemsets.

Algorithm iteration
Frequent 1-itemsets     A, B, C, D, E, F
Frequent 2-itemsets     AB, BC, BD, BF, CF
Infrequent 2-itemsets   AC, AE, AF, BE, CE, CF, DE, EF
Frequent 3-itemsets     BCD
Infrequent 3-itemsets   BCF, BDF

Page 23: Efficient Mining of Both Positive and Negative Association Rules

Extracting Positive and Negative Rules
• Pruning strategy for rule generation: Piatetsky-Shapiro's argument.
• If Dependence(X,Y) = 1, X and Y are independent.
• If Dependence(X,Y) > 1, Y is positively dependent on X.
• The bigger the ratio (p(Y | X) − p(Y))/(1 − p(Y)), the higher the positive dependence.
• If Dependence(X,Y) < 1, Y is negatively dependent on X (−Y is positively dependent on X).
• The bigger the ratio (p(Y | X) − p(Y))/(−p(Y)), the higher the negative dependence.
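
A minimal sketch of the three cases, assuming Dependence(X, Y) = p(X U Y)/(p(X)p(Y)). That ratio is consistent with the cases listed above but is an assumption here, since the text does not spell the formula out. The probabilities in the examples are supports taken from the ten-transaction example database.

```python
from fractions import Fraction

def dependence(p_xy, p_x, p_y):
    """Assumed definition: p(X U Y) / (p(X) * p(Y))."""
    return p_xy / (p_x * p_y)

def classify(p_xy, p_x, p_y):
    """Map the dependence ratio to one of the three cases."""
    d = dependence(p_xy, p_x, p_y)
    if d == 1:
        return "independent"
    return "positively dependent" if d > 1 else "negatively dependent"

# supp(BD) = 0.6, supp(B) = 0.7, supp(D) = 0.6
b_d = classify(Fraction(6, 10), Fraction(7, 10), Fraction(6, 10))
# supp(BE) = 0.1, supp(B) = 0.7, supp(E) = 0.3
b_e = classify(Fraction(1, 10), Fraction(7, 10), Fraction(3, 10))
# supp(AD) = 0.3, supp(A) = 0.5, supp(D) = 0.6
a_d = classify(Fraction(3, 10), Fraction(5, 10), Fraction(6, 10))
```

Under this assumed definition, D is positively dependent on B (ratio 10/7), E is negatively dependent on B (ratio 10/21), and A and D are independent (ratio 1).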

Page 24: Efficient Mining of Both Positive and Negative Association Rules

Extracting Both Types of Rules
• Conditional probability increment ratio (CPIR).
• Used to measure the correlation between X and Y.
• When CPIR(X|Y) = 0, X and Y are independent.
• When it is 1, they are perfectly correlated.
• When it is −1, they are perfectly negatively correlated.

Page 25: Efficient Mining of Both Positive and Negative Association Rules

Extracting Both Types of Rules
• Because p(−A) = 1 − p(A), we only need the first half of the previous equation.

or

• This value is used as the confidence value.
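
A hedged sketch of CPIR as a function. It assumes the piecewise form suggested by the ratios on the dependence slide: (p(Y|X) − p(Y))/(1 − p(Y)) when p(Y|X) ≥ p(Y), and (p(Y|X) − p(Y))/p(Y) otherwise, which gives the range from −1 to 1 described above. The function name and argument order are illustrative.

```python
from fractions import Fraction

def cpir(p_xy, p_x, p_y):
    """CPIR(Y|X) under the assumed piecewise definition.

    p_xy = p(X U Y), p_x = p(X), p_y = p(Y); p_x must be non-zero.
    """
    p_y_given_x = p_xy / p_x
    if p_y_given_x >= p_y:
        if p_y == 1:
            raise ValueError("CPIR is undefined for p(Y) = 1 in this branch")
        return (p_y_given_x - p_y) / (1 - p_y)
    # Here p(Y) > p(Y|X) >= 0, so p_y is non-zero.
    return (p_y_given_x - p_y) / p_y

half = Fraction(1, 2)
independent = cpir(Fraction(3, 10), Fraction(5, 10), Fraction(6, 10))  # A and D
perfect_positive = cpir(half, half, half)          # Y occurs whenever X does
perfect_negative = cpir(Fraction(0), half, half)   # Y never occurs with X
b_implies_d = cpir(Fraction(6, 10), Fraction(7, 10), Fraction(6, 10))
```

The three boundary cases give 0, 1 and −1 respectively, and for B => D on the example database the value is 9/14, roughly 0.64.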

Page 26: Efficient Mining of Both Positive and Negative Association Rules

Association Rules of Interest
• Let I be the set of items in a database D, i = A U B ⊆ I be an itemset, A ∩ B = Ø, supp(A) ≠ 0, supp(B) ≠ 0, and ms, mc and mi > 0 be given by the user. Then:
• If supp(A U B) ≥ ms, interest(A, B) ≥ mi, and CPIR(B|A) ≥ mc, then A => B is a positive rule of interest.
• If supp(A U −B) ≥ ms, supp(A) ≥ ms, supp(B) ≥ ms, interest(A, −B) ≥ mi, and CPIR(−B|A) ≥ mc, then A => −B is a negative rule of interest.
• If supp(−A U B) ≥ ms, supp(A) ≥ ms, supp(B) ≥ ms, interest(−A, B) ≥ mi, and CPIR(B|−A) ≥ mc, then −A => B is a negative rule of interest.
• If supp(−A U −B) ≥ ms, supp(A) ≥ ms, supp(B) ≥ ms, interest(−A, −B) ≥ mi, and CPIR(−B|−A) ≥ mc, then −A => −B is a negative rule of interest.
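
The conditions for a rule of the form A => −B can be sketched as a single predicate. This is an illustration under assumptions: the CPIR formula is the assumed piecewise form, the function names are hypothetical, and mc = 0.5 is chosen only for the example (ms = 0.3 and mi = 0.05 follow the example run). The data is the ten-transaction example database.

```python
from fractions import Fraction

DB = [frozenset(t) for t in
      ("ABD", "ABCD", "BD", "BCDE", "AE", "BDF", "AEF", "CF", "BCF", "ABCDF")]

def supp(itemset):
    """Fraction of transactions containing every item of `itemset`."""
    s = frozenset(itemset)
    return Fraction(sum(1 for t in DB if s <= t), len(DB))

def cpir(p_xy, p_x, p_y):
    """Assumed piecewise CPIR(Y|X); not spelled out in the slide text."""
    p_y_given_x = p_xy / p_x
    if p_y_given_x >= p_y:
        return (p_y_given_x - p_y) / (1 - p_y)
    return (p_y_given_x - p_y) / p_y

def is_rule_a_not_b(a, b, ms, mi, mc):
    """Check the listed conditions for the negative rule A => -B."""
    supp_a, supp_b = supp(a), supp(b)
    supp_not_b = 1 - supp_b                         # supp(-B)
    supp_a_not_b = supp_a - supp(set(a) | set(b))   # supp(A U -B)
    if min(supp_a, supp_b, supp_a_not_b) < ms:
        return False
    if abs(supp_a_not_b - supp_a * supp_not_b) < mi:  # interest(A, -B)
        return False
    return cpir(supp_a_not_b, supp_a, supp_not_b) >= mc

MS, MI = Fraction(3, 10), Fraction(5, 100)
MC = Fraction(1, 2)  # assumed value for illustration only

b_not_e = is_rule_a_not_b("B", "E", MS, MI, MC)
a_not_d = is_rule_a_not_b("A", "D", MS, MI, MC)
```

For B => −E the supports are supp(B) = 0.7, supp(E) = 0.3 and supp(B U −E) = 0.6, the interest is 0.11, and the assumed CPIR is 11/21 (about 0.52), so the rule passes with these thresholds. A => −D fails already on support, since supp(A U −D) = 0.2 < ms.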

Page 27: Efficient Mining of Both Positive and Negative Association Rules

Example with CPIR
• For itemset B U D in PL

• B => D can be a valid positive rule of interest

Page 28: Efficient Mining of Both Positive and Negative Association Rules

Extracting rules
• One snapshot of an iteration in the algorithm

• The result: B => −E is a valid rule.

Page 29: Efficient Mining of Both Positive and Negative Association Rules

Algorithm Design
• Generate the set PL of frequent itemsets and the set NL of infrequent itemsets.
• Extract positive rules of the form A => B in PL, and negative rules of the forms A => −B, −A => B, −A => −B in NL.

Page 30: Efficient Mining of Both Positive and Negative Association Rules

PositiveAndNegativeAssociations


Page 31: Efficient Mining of Both Positive and Negative Association Rules

PositiveAndNegativeAssociations


Page 32: Efficient Mining of Both Positive and Negative Association Rules

Effectiveness and Efficiency
• Aggregated test data: used for the KDD Cup 2000 Data and Questions
• Can be found at: http://www.ecn.purdue.edu/KDDCUP/
• Implemented on: Dell Workstation PWS650 with 2G of CPU and 2G of memory
• Language: C++

Page 33: Efficient Mining of Both Positive and Negative Association Rules

Experimental Results (1)
• A comparison with an Apriori-like algorithm without pruning
MBP = Mining By Pruning
MNP = Mining with No-Pruning

Page 34: Efficient Mining of Both Positive and Negative Association Rules

Experimental Results (2)
• A comparison with no-pruning

Page 35: Efficient Mining of Both Positive and Negative Association Rules

Experimental Results
• Effectiveness of pruning
PII = Positive Items of Interest
NII = Negative Items of Interest

Page 36: Efficient Mining of Both Positive and Negative Association Rules

Related Work
• Negative relationships between frequent itemsets, but not how to find negative rules (Brin, Motwani and Silverstein 1997)
• Strong negative association mining using domain knowledge (Savasere, Omiecinski and Navathe 1998)

Page 37: Efficient Mining of Both Positive and Negative Association Rules

Conclusions
• Negative rules are useful.
• Pruning is essential to find frequent and infrequent itemsets.
• Pruning is important to find negative association rules.
• There could be more negative association rules if you have different conditions.

Page 38: Efficient Mining of Both Positive and Negative Association Rules

Exam Questions
• What are the three types of dependence when Dependence(X, Y) is equal to, greater than, and less than 1?
• If Dependence(X,Y) = 1, X and Y are independent.
• If Dependence(X,Y) > 1, Y is positively dependent on X.
• If Dependence(X,Y) < 1, Y is negatively dependent on X (−Y is positively dependent on X).

Page 39: Efficient Mining of Both Positive and Negative Association Rules

Exam Questions
• Give an example of a rule exception, or surprising pattern.
• While bird(x) => flies(x), exception:
• bird(x), penguin(x) => −flies(x)

Page 40: Efficient Mining of Both Positive and Negative Association Rules

Exam Questions
• What does CPIR(X|Y) tell us?
• When CPIR(X|Y) = 0, X and Y are independent.
• When it is 1, they are perfectly correlated.
• When it is −1, they are perfectly negatively correlated.