Mining Quantitative Association Rules in Large Relational Databases Ramakrishnan Srikant Rakesh Agrawal ACM SIGMOD Conference on Management of Data, 1996 March 21, 2013 (Slides modified from Sasi Sekhar Kunta’s version.) Presented by: Sepehr Amir-Mohammadian

Page 1: Mining Quantitative Association Rules in Large Relational Databases

Mining Quantitative Association Rules in Large Relational Databases

Ramakrishnan Srikant
Rakesh Agrawal

ACM SIGMOD Conference on Management of Data, 1996

March 21, 2013
(Slides modified from Sasi Sekhar Kunta’s version.)

Presented by: Sepehr Amir-Mohammadian

Page 2: Mining Quantitative Association Rules in Large Relational Databases

Outline
• Association Rules and Quantitative Association Rules
• Formal Study of Quantitative Association Analysis
• Partitioning Quantitative Attributes
• Identifying the Interesting Rules
• Candidate Generation
• Concluding Remarks
• Q&A

Page 4: Mining Quantitative Association Rules in Large Relational Databases

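Association Rules
• Itemsets $X$ and $Y$, with $X \cap Y = \emptyset$
• Rule: $X \Rightarrow Y$
• Support: the fraction of transactions that contain $X \cup Y$
• Confidence: the fraction of transactions containing $X$ that also contain $Y$, i.e. $support(X \cup Y) / support(X)$
• Find all rules whose support is at least MinSup and whose confidence is at least MinConf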

Page 5: Mining Quantitative Association Rules in Large Relational Databases

Boolean Association Rules

TID  A  B  C  D
100  1  1  0  1
200  0  1  1  1
300  1  1  1  0
400  0  0  1  0

TID  Items
100  A B D
200  B C D
300  A B C
400  C
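
As a minimal sketch of the support and confidence definitions, the snippet below evaluates them on the four transactions in the table above. The function names `support` and `confidence` and the choice of example rules are illustrative assumptions, not part of the original slides.

```python
# Transactions from the Boolean table above (TID -> set of items).
transactions = {
    100: {"A", "B", "D"},
    200: {"B", "C", "D"},
    300: {"A", "B", "C"},
    400: {"C"},
}

def support(itemset, db):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= items for items in db.values()) / len(db)

def confidence(x, y, db):
    """Confidence of the rule X => Y: support(X u Y) / support(X)."""
    return support(set(x) | set(y), db) / support(x, db)

# Illustrative rule A => B: both items appear together in TIDs 100 and 300.
sup_ab = support({"A", "B"}, transactions)
conf_a_b = confidence({"A"}, {"B"}, transactions)
print(sup_ab, conf_a_b)
```

A rule would be reported only if both values reach the user-chosen MinSup and MinConf thresholds.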

Page 6: Mining Quantitative Association Rules in Large Relational Databases

Quantitative Association Rules

RecordID  Age  Married  NumCars
100       23   No       1
200       25   Yes      1
300       29   No       0
400       34   Yes      2
500       38   Yes      2

Page 7: Mining Quantitative Association Rules in Large Relational Databases

Mapping to Boolean Association Rules

• Use each <attribute, value> pair as a new Boolean attribute instead of a categorical attribute
• Use each <attribute, value> pair as a new Boolean attribute instead of a quantitative attribute with a small domain
• Use each <attribute, interval> pair as a new Boolean attribute instead of a quantitative attribute with a large domain

RecordID  Age: 20..29  Age: 30..39  Married: Yes  Married: No  NumCars: 0  NumCars: 1
100       1            0            0             1            0           1
200       1            0            1             0            0           1
300       1            0            0             1            1           0
400       0            1            1             0            0           0
500       0            1            1             0            0           0
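
The mapping can be sketched in a few lines. This is an illustrative sketch, assuming the same two Age intervals and the same six Boolean columns as the table above; `to_boolean` and `AGE_INTERVALS` are hypothetical names.

```python
# Records from the quantitative example table.
records = [
    {"RecordID": 100, "Age": 23, "Married": "No", "NumCars": 1},
    {"RecordID": 200, "Age": 25, "Married": "Yes", "NumCars": 1},
    {"RecordID": 300, "Age": 29, "Married": "No", "NumCars": 0},
    {"RecordID": 400, "Age": 34, "Married": "Yes", "NumCars": 2},
    {"RecordID": 500, "Age": 38, "Married": "Yes", "NumCars": 2},
]

AGE_INTERVALS = [(20, 29), (30, 39)]  # large domain: one column per interval

def to_boolean(rec):
    """Map one record to 0/1 columns, one per <attribute, value-or-interval> pair."""
    row = {}
    for lo, hi in AGE_INTERVALS:
        row[f"Age: {lo}..{hi}"] = int(lo <= rec["Age"] <= hi)
    for value in ("Yes", "No"):          # categorical attribute
        row[f"Married: {value}"] = int(rec["Married"] == value)
    for value in (0, 1):                 # small domain (only the columns shown above)
        row[f"NumCars: {value}"] = int(rec["NumCars"] == value)
    return row

boolean_rows = {r["RecordID"]: to_boolean(r) for r in records}
for rid, row in boolean_rows.items():
    print(rid, list(row.values()))
```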

Page 8: Mining Quantitative Association Rules in Large Relational Databases

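Problems
• “MinSup”: If the number of intervals for a quantitative attribute is large, the support of any single interval can be low, so rules involving it may fall below minimum support
• “MinConf”: Information is lost when values are partitioned into intervals. The fewer (and therefore wider) the intervals, the lower the confidence of some rules can become, so they may fall below minimum confidence

RecordID  Age  Married  NumCars
100       23   No       1
200       25   Yes      1
300       29   No       0
400       34   Yes      2
500       38   Yes      2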

Page 9: Mining Quantitative Association Rules in Large Relational Databases

Solution
• Consider all combinations of adjacent values/intervals of each quantitative attribute
  – Solves the “MinSup” problem
• Increase the number of values/intervals, without encountering the “MinSup” problem
  – Reduces information loss
• New problems:
  – Execution time: limited by a maximum support threshold, MaxSup
  – Many rules: filtered by the interestingness of rules
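
To make the idea of combining adjacent intervals concrete, here is a minimal sketch. It assumes four base intervals for Age with record counts taken from the five-record example, and it interprets the MaxSup cap as "stop extending a range once its count reaches the cap". The function name and the MaxSup value are hypothetical.

```python
# Base intervals for Age with record counts from the five-record example.
base = [((20, 24), 1), ((25, 29), 2), ((30, 34), 1), ((35, 39), 1)]

def combine_adjacent(base, min_count, max_count):
    """List ranges of adjacent base intervals whose record count reaches min_count.

    A range stops growing once its count reaches max_count (the MaxSup cap).
    """
    ranges = []
    for i in range(len(base)):
        total = 0
        for j in range(i, len(base)):
            total += base[j][1]
            if total >= min_count:
                ranges.append((base[i][0][0], base[j][0][1], total))
            if total >= max_count:
                break
    return ranges

# MinSup = 0.4 -> 2 of 5 records; MaxSup = 0.6 -> 3 of 5 records (assumed value).
age_ranges = combine_adjacent(base, min_count=2, max_count=3)
print(age_ranges)
```

No single base interval except 25..29 reaches two records, yet several combined ranges do, which is how combining addresses the “MinSup” problem.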

Page 10: Mining Quantitative Association Rules in Large Relational Databases

Steps of Proposed Approach
1. Determine the number of partitions for each quantitative attribute
2. Map values/ranges to consecutive integer values such that the order is preserved
3. Find the support of each value of the attributes, and combine adjacent values as long as their combined support is less than MaxSup. Then find the frequent itemsets, i.e. those whose support is at least MinSup
4. Use the frequent itemsets to generate association rules
5. Prune uninteresting rules

Page 11: Mining Quantitative Association Rules in Large Relational Databases

Example
• Step 0: Initial set of records

RecordID  Age  Married  NumCars
100       23   No       1
200       25   Yes      1
300       29   No       0
400       34   Yes      2
500       38   Yes      2

Page 12: Mining Quantitative Association Rules in Large Relational Databases

Example – Cont.
• Step 1: Determine the partitions for each quantitative attribute

Intervals for Age
20..24
25..29
30..34
35..39

RecordID  Age     Married  NumCars
100       20..24  No       1
200       25..29  Yes      1
300       25..29  No       0
400       30..34  Yes      2
500       35..39  Yes      2

Page 13: Mining Quantitative Association Rules in Large Relational Databases

Example – Cont.
• Step 2: Mapping intervals/values to consecutive integers

Intervals for Age  Integers
20..24             1
25..29             2
30..34             3
35..39             4

Values for Married  Integers
Yes                 1
No                  2

Page 14: Mining Quantitative Association Rules in Large Relational Databases

Example – Cont.
• Step 2: Mapping intervals/values to consecutive integers

RecordID  Age  Married  NumCars
100       1    2        1
200       2    1        1
300       2    2        0
400       3    1        2
500       4    1        2
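
Steps 1 and 2 can be sketched as follows. This is only an illustration: it assumes every age falls inside 20..39, and the names `AGE_LOWER_BOUNDS`, `MARRIED_CODES` and `age_to_int` are hypothetical.

```python
from bisect import bisect_right

# Lower bounds of the intervals 20..24, 25..29, 30..34, 35..39.
AGE_LOWER_BOUNDS = [20, 25, 30, 35]
MARRIED_CODES = {"Yes": 1, "No": 2}

def age_to_int(age):
    """Return the 1-based index of the interval containing `age` (order-preserving)."""
    return bisect_right(AGE_LOWER_BOUNDS, age)

records = [
    (100, 23, "No", 1),
    (200, 25, "Yes", 1),
    (300, 29, "No", 0),
    (400, 34, "Yes", 2),
    (500, 38, "Yes", 2),
]

# NumCars already has a small integer domain, so it is kept as is.
mapped = [(rid, age_to_int(age), MARRIED_CODES[m], cars)
          for rid, age, m, cars in records]
for row in mapped:
    print(row)
```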

Page 15: Mining Quantitative Association Rules in Large Relational Databases

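Example – Cont.
• Step 3: Extracting large (frequent) itemsets
  – A sample of the frequent itemsets, with MinSup = 0.4 (support given as a record count out of 5)

Itemset                           Support
{<Age: 20..29>}                   3
{<Age: 30..39>}                   2
{<Married: Yes>}                  3
{<Married: No>}                   2
{<NumCars: 0..1>}                 3
{<Age: 30..39>, <Married: Yes>}   2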

Page 16: Mining Quantitative Association Rules in Large Relational Databases

Example – Cont.
• Step 4: Rule generation
  – A sample of the rules, generated with MinConf = 0.5
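
As a hedged illustration of rule generation, the sketch below computes support and confidence for two rules over the five example records. The two rules are chosen here for illustration, and the representation of an item as an `(attribute, low, high)` tuple is an assumption of this sketch.

```python
records = [
    {"Age": 23, "Married": "No", "NumCars": 1},
    {"Age": 25, "Married": "Yes", "NumCars": 1},
    {"Age": 29, "Married": "No", "NumCars": 0},
    {"Age": 34, "Married": "Yes", "NumCars": 2},
    {"Age": 38, "Married": "Yes", "NumCars": 2},
]

def supports(record, itemset):
    """True if the record satisfies every (attribute, low, high) item.

    A categorical item uses low == high, e.g. ("Married", "Yes", "Yes").
    """
    return all(lo <= record[attr] <= hi for attr, lo, hi in itemset)

def count(itemset):
    return sum(supports(r, itemset) for r in records)

def rule_stats(x, y):
    """Return (support, confidence) of the rule X => Y as fractions."""
    both = count(x + y)
    return both / len(records), both / count(x)

# <Age: 30..39> and <Married: Yes> => <NumCars: 2>
sup1, conf1 = rule_stats([("Age", 30, 39), ("Married", "Yes", "Yes")],
                         [("NumCars", 2, 2)])
# <NumCars: 0..1> => <Married: No>
sup2, conf2 = rule_stats([("NumCars", 0, 1)], [("Married", "No", "No")])
print(sup1, conf1, sup2, conf2)
```

Both rules reach MinSup = 0.4 and MinConf = 0.5 on this data.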

Page 18: Mining Quantitative Association Rules in Large Relational Databases

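Formal Study of Quantitative Association Analysis

• $I = \{i_1, i_2, \ldots, i_m\}$: set of attributes
• $P$: set of positive integers
• $\langle x, v \rangle \in I_V = I \times P$ denotes that attribute $x$ has value $v$
• $I_R = \{\langle x, l, u \rangle \in I \times P \times P \mid l \le u \text{ if } x \text{ is quantitative},\ l = u \text{ if } x \text{ is categorical}\}$: set of items
• For any $X \subseteq I_R$, $attributes(X) = \{x \mid \langle x, l, u \rangle \in X\}$
• $D$: set of records
• $R \subseteq I_V$: a record, such that its attributes are distinct
• A record $R$ supports itemset $X \subseteq I_R$ if $\forall \langle x, l, u \rangle \in X\ \exists \langle x, q \rangle \in R$ such that $l \le q \le u$
• $X \Rightarrow Y$: a quantitative association rule, where
  – $X \subset I_R$, $Y \subset I_R$, and $attributes(X) \cap attributes(Y) = \emptyset$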

Page 19: Mining Quantitative Association Rules in Large Relational Databases

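Formal Definition of Quantitative Association Analysis – Cont.

• $X \Rightarrow Y$ holds in $D$ with support $s$, if $s\%$ of the records in $D$ support $X \cup Y$.
• $X \Rightarrow Y$ holds in $D$ with confidence $c$, if $c\%$ of the records in $D$ that support $X$ also support $Y$.
• $\Pr(X)$: probability that all items in $X$ are supported by a given record
• $\hat{X}$ is a generalization of $X$ (and $X$ is a specialization of $\hat{X}$) if $attributes(X) = attributes(\hat{X})$ and, for every $x \in attributes(X)$, $\langle x, l, u \rangle \in X \wedge \langle x, l', u' \rangle \in \hat{X} \Rightarrow l' \le l \le u \le u'$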
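
The two core predicates of the formal model, "a record supports an itemset" and "one itemset generalizes another", can be sketched as below. The dictionary representation (attribute mapped to an integer value, or to a `(low, high)` range) is an assumption made for this sketch.

```python
def supports(record, itemset):
    """record: attribute -> integer value; itemset: attribute -> (low, high)."""
    return all(attr in record and lo <= record[attr] <= hi
               for attr, (lo, hi) in itemset.items())

def is_generalization(x_hat, x):
    """True if x_hat has the same attributes as x and each of its ranges contains x's."""
    return x_hat.keys() == x.keys() and all(
        x_hat[a][0] <= x[a][0] and x[a][1] <= x_hat[a][1] for a in x
    )

# Record 200 after the integer mapping of the example: Age=2, Married=1, NumCars=1.
record = {"Age": 2, "Married": 1, "NumCars": 1}
narrow = {"Age": (2, 2)}
wide = {"Age": (1, 2)}

print(supports(record, narrow))          # the record's Age lies in [2, 2]
print(is_generalization(wide, narrow))   # [1, 2] contains [2, 2]
print(is_generalization(narrow, wide))   # the reverse does not hold
```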

Page 21: Mining Quantitative Association Rules in Large Relational Databases

Partitioning Quantitative Attributes
• A measure of partial completeness: information lost in partitioning
  – $R_C$: set of rules obtained before partitioning
  – $R_P$: set of rules obtained after partitioning
  – Partial completeness measures the distance between a rule in $R_C$ and its closest generalization in $R_P$
  – The distance is defined by the ratio of support
• Goal: find the partitioning that needs the minimal number of partitions for a given level of partial completeness

Page 22: Mining Quantitative Association Rules in Large Relational Databases

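Partial Completeness
• $C$: the set of frequent itemsets
• For any $K \ge 1$, $P$ is $K$-complete w.r.t. $C$ if
  – $P \subseteq C$, and $X \in P \wedge X' \subseteq X \Rightarrow X' \in P$
  – $\forall X \in C$, $\exists \hat{X} \in P$ such that $\hat{X}$ is a generalization of $X$ with $support(\hat{X}) \le K \times support(X)$, and $\forall Y \subseteq X$, $\exists \hat{Y} \subseteq \hat{X}$ such that $\hat{Y}$ is a generalization of $Y$ with $support(\hat{Y}) \le K \times support(Y)$
• The smaller $K$ is, the less information is lost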

Page 23: Mining Quantitative Association Rules in Large Relational Databases

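Example – K-Completeness
• Consider the following set of frequent itemsets:

Number  Itemset                         Support
1       {<Age: 20..30>}                 5%
2       {<Age: 20..40>}                 6%
3       {<Age: 20..50>}                 8%
4       {<Cars: 1..2>}                  5%
5       {<Cars: 1..3>}                  6%
6       {<Age: 20..30>, <Cars: 1..2>}   4%
7       {<Age: 20..40>, <Cars: 1..3>}   5%

• Then, itemsets 2, 3, 5, 7 form a 1.5-complete set: every itemset has a generalization among them whose support is at most 1.5 times its own (e.g. itemset 2 generalizes itemset 1, and 6% / 5% = 1.2).
• But, itemsets 3, 5, 7 do not form a 1.5-complete set: the only generalization of itemset 1 among them is itemset 3, and 8% / 5% = 1.6 > 1.5.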

Page 24: Mining Quantitative Association Rules in Large Relational Databases

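Confidence of Rules Generated from K-Complete Set

• If $P$ is a $K$-complete set w.r.t. $C$, then any rule $A \Rightarrow B$ obtained from $C$ has a generalization $\hat{A} \Rightarrow \hat{B}$ from $P$, such that $confidence(\hat{A} \Rightarrow \hat{B})$ is bounded by
  $\frac{1}{K} \times confidence(A \Rightarrow B) \le confidence(\hat{A} \Rightarrow \hat{B}) \le K \times confidence(A \Rightarrow B)$
• In the previous example: the rule formed from itemsets 1 and 6 has confidence 4% / 5% = 80%, and the rule formed from their generalizations, itemsets 2 and 7, has confidence 5% / 6% ≈ 83.3%, which lies between 80% / 1.5 and 80% × 1.5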

Page 25: Mining Quantitative Association Rules in Large Relational Databases

K-Completeness for a Single Attribute

• Consider $x$ as a quantitative attribute, partitioned into base intervals.
• Suppose that the support for each base interval is less than $\frac{MinSup \times (K - 1)}{2}$
• Let $P$ be the set of all combinations of base intervals that have minimum support.
• Then, $P$ is $K$-complete w.r.t. the set of all ranges over $x$.

Page 26: Mining Quantitative Association Rules in Large Relational Databases

K-Completeness for a Group of Attributes

• Consider a set of $n$ quantitative attributes, partitioned into base intervals.
• Suppose that the support for each base interval is less than $\frac{MinSup \times (K - 1)}{2n}$
• Let $P$ be the set of all frequent itemsets over the partitioned attributes.
• Then, $P$ is $K$-complete w.r.t. the set of all frequent itemsets without partitioning.

Page 27: Mining Quantitative Association Rules in Large Relational Databases

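Equi-Depth Partitioning
• Equi-depth partitioning: splitting the support identically, so that each base interval holds the same number of records

• Suppose that the number of intervals is given.
• Then, equi-depth partitioning minimizes the maximum support of a base interval, and so minimizes $K$.

• Suppose that $K > 1$ is given for $n$ quantitative attributes.
• Then, equi-depth partitioning with support $\frac{MinSup \times (K - 1)}{2n}$ in each base interval results in the minimum number of intervals:
  $\text{Number of intervals} = \frac{2n}{MinSup \times (K - 1)}$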
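
A small worked sketch of the interval-count formula and of an equi-depth split follows. Rounding the count up with `ceil` and the simple slicing-based split are assumptions of this sketch (the split may separate equal values across intervals); the function names and the parameter values are hypothetical.

```python
import math

def max_base_support(minsup, k, n):
    """Upper bound on the support of each base interval: MinSup * (K - 1) / (2n)."""
    return minsup * (k - 1) / (2 * n)

def num_intervals(minsup, k, n):
    """Number of intervals 2n / (MinSup * (K - 1)), rounded up to an integer."""
    if k <= 1:
        raise ValueError("K must be greater than 1")
    return math.ceil(2 * n / (minsup * (k - 1)))

def equi_depth(values, n_intervals):
    """Split the sorted values into n_intervals groups of (nearly) equal size."""
    ordered = sorted(values)
    size = len(ordered) / n_intervals
    return [ordered[round(i * size):round((i + 1) * size)]
            for i in range(n_intervals)]

# Assumed parameters: MinSup = 25%, K = 1.5, n = 2 quantitative attributes.
print(max_base_support(0.25, 1.5, 2))
print(num_intervals(0.25, 1.5, 2))
print(equi_depth([38, 23, 34, 25, 29, 41], 3))
```

Lowering $K$ (less information loss) or MinSup raises the number of intervals, which is the trade-off the formula expresses.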

Page 29: Mining Quantitative Association Rules in Large Relational Databases

Identify Interesting Rules
• Combining intervals results in many rules
• For example, suppose a quarter of people in age group 20..30 are in the age group 20..25
  – <Age: 20..30> ⇒ <Cars: 1..2>, with 8% sup, 70% conf
  – <Age: 20..25> ⇒ <Cars: 1..2>, with 2% sup, 70% conf
  – The second rule doesn’t give any additional information, and is less general than the first rule

Page 30: Mining Quantitative Association Rules in Large Relational Databases

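Expected Value of Support and Confidence

• Interest: a rule is interesting if its support and/or confidence is greater than expected from its generalization
• Let $E_{\Pr(\hat{Z})}[\Pr(Z)]$ denote the expected value of $\Pr(Z)$ based on $\Pr(\hat{Z})$
• Let $Z = \{z_1, \ldots, z_n\}$, $\hat{Z} = \{\hat{z}_1, \ldots, \hat{z}_n\}$, where each $\hat{z}_i$ is a generalization of $z_i$
• The expected value of $\Pr(Z)$ based on $\hat{Z}$ would be
  $E_{\Pr(\hat{Z})}[\Pr(Z)] = \frac{\Pr(z_1)}{\Pr(\hat{z}_1)} \times \cdots \times \frac{\Pr(z_n)}{\Pr(\hat{z}_n)} \times \Pr(\hat{Z})$
• Similarly, the expected value of the confidence for the rule $X \Rightarrow Y$ according to its generalization $\hat{X} \Rightarrow \hat{Y}$ would be
  $E_{\Pr(\hat{Y} \mid \hat{X})}[\Pr(Y \mid X)] = \frac{\Pr(y_1)}{\Pr(\hat{y}_1)} \times \cdots \times \frac{\Pr(y_m)}{\Pr(\hat{y}_m)} \times \Pr(\hat{Y} \mid \hat{X})$
  where $Y = \{y_1, \ldots, y_m\}$, $\hat{Y} = \{\hat{y}_1, \ldots, \hat{y}_m\}$.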

Page 31: Mining Quantitative Association Rules in Large Relational Databases

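Interest Measure
• Itemset $X$ is $R$-interesting w.r.t. its generalization $\hat{X}$, if
  – $\Pr(X) \ge R \times E_{\Pr(\hat{X})}[\Pr(X)]$, and
  – For any specialization $X'$ of $X$ with $support(X') \ge MinSup$, $X - X'$ is $R$-interesting w.r.t. $\hat{X}$
• Rule $X \Rightarrow Y$ is $R$-interesting w.r.t. its generalization $\hat{X} \Rightarrow \hat{Y}$ if
  – $\Pr(X \cup Y) \ge R \times E_{\Pr(\hat{X} \cup \hat{Y})}[\Pr(X \cup Y)]$, or
  – $\Pr(Y \mid X) \ge R \times E_{\Pr(\hat{Y} \mid \hat{X})}[\Pr(Y \mid X)]$
  – Moreover, the itemset $X \cup Y$ is $R$-interesting w.r.t. $\hat{X} \cup \hat{Y}$.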
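
The expected-support calculation and the first condition of the interest measure can be illustrated with the age-group example from the earlier slide. In this sketch the individual item probabilities and the interest level R = 1.1 are hypothetical; only the one-quarter ratio and the 8% and 2% supports come from the slides.

```python
from math import prod

def expected_support(item_probs, gen_item_probs, gen_support):
    """Expected Pr(Z) based on Z-hat: product of Pr(z_i) / Pr(z_hat_i), times Pr(Z-hat)."""
    return prod(p / q for p, q in zip(item_probs, gen_item_probs)) * gen_support

def support_is_interesting(actual, expected, r):
    """First condition only: actual support is at least R times the expected support."""
    return actual >= r * expected

# Z = {<Age: 20..25>, <Cars: 1..2>}, Z-hat = {<Age: 20..30>, <Cars: 1..2>}.
# Assumed item probabilities with Pr(Age 20..25) / Pr(Age 20..30) = 1/4.
expected = expected_support([0.1, 0.5], [0.4, 0.5], gen_support=0.08)
interesting = support_is_interesting(0.02, expected, r=1.1)
print(expected, interesting)
```

The actual support of 2% matches the expected value, so the specialized rule adds nothing beyond its generalization under this check.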

Page 32: Mining Quantitative Association Rules in Large Relational Databases

Example of Interest

Page 34: Mining Quantitative Association Rules in Large Relational Databases

Candidate Generation
• Given $L_{k-1}$, the set of all frequent $(k-1)$-itemsets, generate the set of candidate $k$-itemsets, $C_k$
• The process has three parts:
  – Join Phase
  – Subset Prune Phase
  – Interest Prune Phase

Page 35: Mining Quantitative Association Rules in Large Relational Databases

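Join Phase
• $L_{k-1}$ joined with itself: two itemsets are joined when their first $k-2$ items are the same and the attributes of their last items differ
• Example, $L_2$:
  {<Married: Yes>, <Age: 20..24>}
  {<Married: Yes>, <Age: 20..29>}
  {<Married: Yes>, <NumCars: 0..1>}
  {<Age: 20..29>, <NumCars: 0..1>}
• Result of self-join, $C_3$:
  {<Married: Yes>, <Age: 20..24>, <NumCars: 0..1>}
  {<Married: Yes>, <Age: 20..29>, <NumCars: 0..1>}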

Page 36: Mining Quantitative Association Rules in Large Relational Databases

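Subset Prune Phase
• Make sure any $(k-1)$-subset of each candidate in $C_k$ is in $L_{k-1}$.
• Example, $L_2$:
  {<Married: Yes>, <Age: 20..24>}
  {<Married: Yes>, <Age: 20..29>}
  {<Married: Yes>, <NumCars: 0..1>}
  {<Age: 20..29>, <NumCars: 0..1>}
• Result of self-join, $C_3$:
  {<Married: Yes>, <Age: 20..24>, <NumCars: 0..1>}
  {<Married: Yes>, <Age: 20..29>, <NumCars: 0..1>}
• Delete the first itemset in $C_3$ since {<Age: 20..24>, <NumCars: 0..1>} is not in $L_2$.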
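
The join and subset-prune phases can be sketched together. This is an illustrative sketch, not the authors' implementation: items are `(attribute, low, high)` tuples, itemsets are tuples kept in an assumed fixed attribute order (`ATTR_ORDER`), and the sample set `l2` is an assumed example in which one joined candidate lacks a frequent 2-subset.

```python
from itertools import combinations

ATTR_ORDER = {"Married": 0, "Age": 1, "NumCars": 2}  # assumed fixed attribute order

def join(frequent):
    """Join (k-1)-itemsets that share their first k-2 items and whose
    last items belong to different attributes."""
    candidates = set()
    for p in frequent:
        for q in frequent:
            same_prefix = p[:-1] == q[:-1]
            if same_prefix and ATTR_ORDER[p[-1][0]] < ATTR_ORDER[q[-1][0]]:
                candidates.add(p + (q[-1],))
    return candidates

def subset_prune(candidates, frequent):
    """Keep only candidates whose every (k-1)-subset is itself frequent."""
    size = len(next(iter(frequent)))
    return {c for c in candidates
            if all(s in frequent for s in combinations(c, size))}

married = ("Married", "Yes", "Yes")
l2 = {
    (married, ("Age", 20, 24)),
    (married, ("Age", 20, 29)),
    (married, ("NumCars", 0, 1)),
    (("Age", 20, 29), ("NumCars", 0, 1)),
}
c3 = join(l2)
pruned = subset_prune(c3, l2)
print(len(c3), pruned)
```

The candidate containing <Age: 20..24> is dropped because the pair {<Age: 20..24>, <NumCars: 0..1>} is not in `l2`.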

Page 37: Mining Quantitative Association Rules in Large Relational Databases

Interest Prune Phase
• Given user-specified interest level $R$
• Delete any itemset that contains an item with support greater than $1/R$
• It is guaranteed that such itemsets cannot be $R$-interesting w.r.t. their generalizations
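
A minimal sketch of the interest prune follows. The item supports are fractions computed from the five-record example, while the interest level R = 1.5, the string item labels, and the function name are assumptions of this sketch.

```python
def interest_prune(candidates, item_support, r):
    """Drop candidates containing an item whose fractional support exceeds 1/R."""
    limit = 1 / r
    return [c for c in candidates
            if all(item_support[item] <= limit for item in c)]

# Fractional supports of single items in the five-record example.
item_support = {
    "Age: 20..24": 0.2,
    "Age: 20..39": 1.0,
    "NumCars: 0..1": 0.6,
    "Married: Yes": 0.6,
}
candidates = [
    ("Age: 20..24", "NumCars: 0..1"),
    ("Age: 20..39", "Married: Yes"),
]
kept = interest_prune(candidates, item_support, r=1.5)
print(kept)
```

The second candidate is removed because <Age: 20..39> covers every record, so no itemset containing it could exceed its expected support by a factor of R.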

Page 39: Mining Quantitative Association Rules in Large Relational Databases

Concluding Remarks
• Introduced the problem of mining quantitative association rules
• Dealt with quantitative attributes by fine-partitioning the values and combining adjacent partitions as necessary
• Introduced partial completeness to quantify the information lost, and to help decide the partitions
• Gave an interest measure to identify interesting rules
• Described candidate generation with join, subset prune and interest prune phases

Page 41: Mining Quantitative Association Rules in Large Relational Databases

Exam Questions
1. What are the two problems with mapping quantitative associations to boolean associations?
   A. Slide No. 8
2. Give the general steps to be followed in order to mine quantitative association rules.
   B. Slide No. 10
3. If P is a K-complete set w.r.t. the set of all frequent itemsets, the minimum confidence when generating rules from P should follow what constraint, in order to guarantee that a close rule will be generated?
   C. It should be 1/K times the desired level of confidence. Slide No. 24.

Page 42: Mining Quantitative Association Rules in Large Relational Databases

Thank you.
Questions?