Mining Quantitative Correlated Patterns Using an Information- Theoretic Approach Yiping Ke, James Cheng, Wilfred Ng Presented By: Chibuike Muoh


Page 1: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

Yiping Ke, James Cheng, Wilfred Ng

Presented By:

Chibuike Muoh

Page 2: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

Presentation Outline:

• Contributions of the paper

• Introduction

• What are QCPs?
– Definitions
– Background: Information Theory (entropy, MI, NMI)

• Mining QCPs
– All-confidence
– Discretization problem (interval combining)
– Attribute-level pruning
– Interval-level pruning
– QCoMine algorithm

Page 3: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

Contributions of the paper

• Presents a new algorithm for mining patterns in databases, based on concepts borrowed from information theory: entropy and mutual information

• Achieves discretization of attribute domain using supervised interval combining to preserve dependency between attributes

Page 4: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

Introduction

• Similar in principle to association rule mining, but evaluating association rules can be too expensive on very large databases (VLDBs)

• Trivial result set: {pregnant} → {edema} & {pregnant, female} → {edema}

• Unproductive rules as a result of co-occurrence effects: {pregnant, dataminer} → {edema}

– So occupation and edema condition are related?

• Unlike association mining, mining for QCPs considers the dependency of the attribute sets of the database to generate highly correlated patterns

– Similar to generating “maximal informative k-itemsets”, but here we consider dependency in the attribute sets

Page 5: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

Introduction…contd.

• The idea behind mining QCPs:

– Evaluate the attribute set and look for ‘strong’ dependencies between attributes

– Next, find correlated interval sets in the dependent attributes and generate patterns from them

• Thus, QCPs are not restricted by frequently co-occurring attributes

Page 6: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

Definitions: Quantitative Database

• A pattern X is a set of attributes (random variables) {x1, x2, x3, …, xm}, whose outcomes can be categorical or quantitative, with probabilities p(vx) = {p1, p2, p3, …, pm}

– An attribute xi is categorical if its domain dom(xi) is an interval [lx, ux] with lx = ux

– It is quantitative if xi[lx, ux] is an interval of xi with lx <= ux

– A pattern X is called a k-pattern if |attr(X)| = k

• Consider a quantitative database D as a set of transactions T. Each transaction in D is a vector of items <v1, v2, v3, …, vm>, where vi ∈ dom(xi) for 1 <= i <= m

Page 7: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

Definition…contd.

• We say a transaction T supports a pattern X if every attribute in X is represented in T

– The frequency of a pattern X in D, freq(X), is the number of transactions in D that support X

– The support of X, supp(X) = freq(X)/|D|, is the probability that a transaction T in D supports X
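These definitions can be sketched directly in Python. The toy database and attribute names below are illustrative, not the paper's Table 1:

```python
# Sketch: freq(X) and supp(X) over a toy quantitative database.
# A pattern maps each of its attributes to an interval [lx, ux].

def supports(transaction, pattern):
    """A transaction supports X if, for every attribute x in X,
    the transaction's value for x falls inside X's interval [lx, ux]."""
    return all(lo <= transaction[attr] <= hi
               for attr, (lo, hi) in pattern.items())

def freq(db, pattern):
    """Number of transactions in D that support X."""
    return sum(1 for t in db if supports(t, pattern))

def supp(db, pattern):
    """supp(X) = freq(X) / |D|."""
    return freq(db, pattern) / len(db)

db = [
    {"age": 4, "gender": 1},
    {"age": 5, "gender": 1},
    {"age": 2, "gender": 2},
    {"age": 4, "gender": 2},
]
X = {"age": (4, 5), "gender": (1, 1)}
print(supp(db, X))  # 2 of 4 transactions support X -> 0.5
```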

Page 8: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

Example

• The database table above consists of six attributes, of which three are quantitative {age, salary, service years} and three are categorical {gender, married, education}

• The last column records the support of each transaction

• E.g. for the pattern X = age[4,5]gender[1,1], supp(X) = 0.25 + 0.19 = 0.44

Page 9: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

Background: Information Theory

• Mining QCPs makes use of fundamental concepts in information theory

• Entropy: measures the information content/uncertainty of a random variable, x
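Using the standard definition with log base 2, H(x) = -Σ p(v) log2 p(v), entropy can be computed as:

```python
import math

def entropy(probs):
    """H(x) = -sum p(v) * log2 p(v) over outcomes with p(v) > 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # 1.0 bit: a fair coin is maximally uncertain
print(entropy([1.0]))       # 0.0: a certain outcome carries no information
```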

Page 10: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

Background: Information Theory…contd.

• Mutual information (MI): measures the average reduction in uncertainty about a random variable X, given the knowledge of Y (or vice versa)

– MI is a symmetric measure; the greater the value of I(x; y), the more information x and y tell about each other
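A minimal sketch of computing I(x; y) from a joint distribution (the distributions below are illustrative, not drawn from the paper's tables):

```python
import math

def mutual_information(joint):
    """I(x;y) = sum over (u,v) of p(u,v) * log2( p(u,v) / (p(u) p(v)) ).
    `joint` maps outcome pairs (u, v) to probabilities p(u, v)."""
    px, py = {}, {}
    for (u, v), p in joint.items():       # marginalize to get p(u), p(v)
        px[u] = px.get(u, 0.0) + p
        py[v] = py.get(v, 0.0) + p
    return sum(p * math.log2(p / (px[u] * py[v]))
               for (u, v), p in joint.items() if p > 0)

# Independent variables: knowing one tells nothing about the other.
indep = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
print(mutual_information(indep))  # 0.0

# Perfectly dependent binary variables: I(x;y) = H(x) = 1 bit.
dep = {(0, 0): 0.5, (1, 1): 0.5}
print(mutual_information(dep))    # 1.0
```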

Page 11: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

Example

• Consider the pattern X = (age, married) from Table 1. We can compute

I(age; married) = Σ_{v_age ∈ {1,2,3,4,5}} Σ_{v_married ∈ {1,2}} p(v_age, v_married) log [ p(v_age, v_married) / (p(v_age) p(v_married)) ] = 0.47

• The example shows that knowing age causes a reduction of 0.47 in the uncertainty of married

• Similarly, as an exercise, we can compute I(gender; education) = 0.40

Page 12: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

Normalized Mutual Information

• But by how much does X actually tell us about Y?

• Entropies of different attributes vary greatly, so MI returns only an absolute value, which is not very helpful in our case

• We can try normalizing the MI among our set of attributes to get a global relative measure

Page 13: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

NMI…contd.

• Normalizing the MI measure among the attribute sets gives the minimum percentage of reduction in the uncertainty of one attribute given the knowledge of another:

Ĩ(x; y) = I(x; y) / max{H(x), H(y)}
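Assuming the normalization I(x; y) / max{H(x), H(y)} (the "minimum percentage of reduction" reading above), NMI can be sketched as:

```python
import math

def entropy(probs):
    """H = -sum p log2 p."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(x;y) from a joint distribution {(u, v): p(u, v)}."""
    px, py = {}, {}
    for (u, v), p in joint.items():
        px[u] = px.get(u, 0.0) + p
        py[v] = py.get(v, 0.0) + p
    return sum(p * math.log2(p / (px[u] * py[v]))
               for (u, v), p in joint.items() if p > 0)

def nmi(joint):
    """Normalized MI: I(x;y) / max{H(x), H(y)} -- the minimum relative
    reduction in uncertainty of either attribute given the other."""
    px, py = {}, {}
    for (u, v), p in joint.items():
        px[u] = px.get(u, 0.0) + p
        py[v] = py.get(v, 0.0) + p
    hx, hy = entropy(px.values()), entropy(py.values())
    return mutual_information(joint) / max(hx, hy)

# Perfectly dependent binary attributes: NMI = 1.
dep = {(0, 0): 0.5, (1, 1): 0.5}
print(nmi(dep))  # 1.0
```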

Page 14: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

Example 2

• From the previous example we can compute Ĩ(age; married) = 0.47 / 2.19 ≈ 0.21

• Also we can determine Ĩ(gender; education) = 0.40 / 1.34 ≈ 0.30

• Note that although I(age; married) > I(gender; education), its NMI is smaller. This can be attributed to the high entropy of age: H(age) = 2.19 > H(education) = 1.34

• This implies that knowing age reduces a larger absolute amount of uncertainty, but a smaller relative amount

Page 15: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

Definition: Quantitative Pattern

• A more formal definition of quantitative pattern X follows below:

• Thus, given a minimum information threshold (μ) and a minimum all-confidence threshold (ς), a quantitative pattern has strong co-dependency between attributes and a high confidence level in the dataset

Page 16: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

allconf(X)

• All confidence is a correlation measure for determining the minimum confidence of association rules that can be derived from a given pattern.

• For a quantitative pattern, allconf(X) is defined as:

• This is different from association rule mining, where conf(X → Y) only indicates an implication from the sets on the left to the sets on the right

Page 17: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

allconf(X)…contd.

• All-confidence has the downward closure property: if a pattern has all-confidence no less than ς, so do all its sub-patterns

• Here supp(x) denotes supp(x[lx, ux]), where [lx, ux] = dom(x)

Page 18: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

Example

• For X = gender[1,1]education[1,1]:

allconf(X) = supp(gender[1,1]education[1,1]) / MAX{supp(gender[1,1]), supp(education[1,1])}

= (0.19 + 0.11 + 0.09 + 0.09) / MAX{0.25 + 0.19 + 0.11 + 0.09 + 0.09 + 0.09 + 0.08, 0.19 + 0.11 + 0.09 + 0.09}

= 0.48 / 0.90 ≈ 0.53

• Similarly, allconf(gender[1,1]married[1,1]) = 0.9
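Plugging the numbers from this example into the allconf definition (the function below is a generic sketch, not the paper's code):

```python
def allconf(supp_pattern, item_supports):
    """allconf(X) = supp(X) / MAX{ supp(x[lx, ux]) : x in attr(X) },
    i.e. the pattern's support divided by its largest single-item support."""
    return supp_pattern / max(item_supports)

# supp(gender[1,1]education[1,1]) = 0.48,
# supp(gender[1,1]) = 0.90, supp(education[1,1]) = 0.48
a = allconf(0.48, [0.90, 0.48])
print(round(a, 2))  # 0.53
```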

Page 19: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

allconf(X)

• A caveat about allconf: since it is applied at fine granularity to intervals of attributes, it cannot solely be used as a measure for correlated patterns

– Quantitative attributes can span huge intervals, creating a co-occurrence problem

• The above points explain the need to first perform pruning at the attribute level

Example

For the employee database in the previous example, we set μ = 0.2 and ς = 0.5. The pattern Y = gender[1,1]married[1,1] is not a QCP because

Ĩ(gender; married) = 0 < μ, although allconf(Y) = 0.9

This is because gender and married are independent of each other, while p(gender[1,1]) and p(married[1,1]) are both very high

Page 20: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

QCP Mining

• Problem description:

– Given a quantitative database D, a minimum information threshold μ, and a minimum all-confidence threshold ς, the mining problem is to find all QCPs from D

Page 21: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

QCP Mining: Process Outline

Quantitative Database → Interval Combining/Discretization → Attribute pruning → Interval pruning → QCoMine Algorithm

- Attribute pruning finds dependent attribute sets

- Interval pruning generates correlated patterns

Page 22: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

Interval Combining

• When dealing with quantitative (continuous) attributes, we need to discretize the intervals of the attribute

• Challenges

– Preventing the intervals from becoming too trivial

• E.g. age[0,2] vs. age[0,0], age[1,1], age[2,2]

– Considering the dependency of the attributes when combining their intervals

• Example: the pattern (age, gender) can produce different intervals than (age, married)

Page 23: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

Interval combining…contd.

• Interval combining for quantitative patterns can be considered an optimization problem for an objective function Φ

• The goal of this stage: given two attributes x and y, where x is quantitative and y can be either quantitative or categorical, obtain the optimal combined intervals of x with respect to y

• Note that since this optimization is performed locally (between pairs of attributes), we use MI instead of NMI

Page 24: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

Interval combining: Algorithm.

• Let Φ[ix1, ix2](x, y) denote the value of Φ(x, y) when ix1 and ix2 are combined with respect to y

• At each step, two consecutive intervals ix1 and ix2 are considered for combination

• The idea is to pick, at each step, the maximum Φ[ix[j], ix[j+1]](x, y) among all pairs of consecutive intervals ix[j] and ix[j+1], and combine the corresponding ix[j] and ix[j+1] into a new interval

• To prevent the intervals from becoming too trivial, a termination condition is set as a minimum value for the specified intervals
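A minimal sketch of this greedy loop. The objective `phi` (the paper's Φ, an MI-based score of combining two consecutive intervals with respect to attribute y) is passed in as a function, and the termination condition used here (a minimum interval count) is one plausible reading of the condition above:

```python
def combine_intervals(intervals, phi, min_intervals):
    """intervals: sorted list of (lo, hi) base intervals of attribute x.
    Repeatedly merge the consecutive pair with the highest phi score."""
    intervals = list(intervals)
    while len(intervals) > min_intervals:
        # Score every pair of consecutive intervals under the objective.
        scores = [phi(intervals[j], intervals[j + 1])
                  for j in range(len(intervals) - 1)]
        j = max(range(len(scores)), key=scores.__getitem__)
        lo, _ = intervals[j]
        _, hi = intervals[j + 1]
        # Merge the best-scoring pair into one combined interval.
        intervals[j:j + 2] = [(lo, hi)]
    return intervals

# Toy objective: prefer merging whichever pair spans the narrowest range.
phi = lambda a, b: -(b[1] - a[0])
base = [(0, 0), (1, 1), (2, 2), (3, 3)]
print(combine_intervals(base, phi, 2))  # [(0, 1), (2, 3)]
```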

Page 25: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

Attribute level pruning

• At this stage pruning at the attribute level is performed such that the attributes in a pattern have NMI of at least μ

The above definition considers attributes as vertices in a graph (the NMI-graph); cliques in this graph represent candidate attribute sets for QCPs

Page 26: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

Attribute Level pruning…contd.

• From the previous definition, QCPs correspond to cliques in the NMI-graph with NMI >= μ

– Without pruning at the attribute level (i.e. μ = 0), the search space for cliques in the graph becomes more complex

– And enumerating cliques in a graph can be an exhaustive process

• The authors introduce a prefix tree structure for prefixing correlated attributes: the attribute prefix tree, Tattr

• Clique enumeration in the NMI-graph is done using the prefix tree

– The only extra action required when enumerating cliques using the prefix tree is to check whether (u, v) is an edge in G

Page 27: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

Prefix tree construction

• To create the prefix tree:

1. First, a root node is created at level 0 of Tattr

2. Then, at level 1, we create a node for each attribute as a child of the root

3. For each node u at level k (k >= 1) and for each right sibling v of u, if (u, v) is an edge in G, we create a child node for u with the same attribute label as that of v

4. Repeat step 3 for u’s children at level k+1

Step 3 of the prefix tree construction creates the prefix tree in a depth-first manner
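The four steps above can be sketched as follows. Node and field names are illustrative; the cliques of the toy NMI-graph then appear as root-to-node paths in the tree:

```python
class Node:
    def __init__(self, attr):
        self.attr = attr
        self.children = []

def build_prefix_tree(attrs, edges):
    """attrs: attribute labels in a fixed order; edges: set of frozenset
    pairs whose NMI is at least mu (the NMI-graph G)."""
    root = Node(None)                          # step 1: root at level 0

    def expand(node, candidates):
        # step 3: keep only right siblings connected to `node` in G ...
        kids = [a for a in candidates if frozenset((node.attr, a)) in edges]
        for i, label in enumerate(kids):
            child = Node(label)
            node.children.append(child)
            expand(child, kids[i + 1:])        # step 4: recurse depth-first

    for i, a in enumerate(attrs):              # step 2: one node per attribute
        node = Node(a)
        root.children.append(node)
        expand(node, attrs[i + 1:])
    return root

edges = {frozenset(p) for p in [("a", "b"), ("a", "c"), ("b", "c")]}
tree = build_prefix_tree(["a", "b", "c"], edges)
# Paths from the root enumerate cliques: a-b, a-b-c, a-c, b-c.
print([c.attr for c in tree.children[0].children])  # ['b', 'c'] under 'a'
```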

Page 28: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

Interval-level pruning

• Even though the cliques found using the NMI-graph have high NMI, they differ on the intervals of their continuous attributes

– Since intervals are combined in a supervised way, the same attribute may have different sets of combined intervals with respect to different attributes

– Thus patterns with low all-confidence may still be generated from correlated attributes

• The interval-level pruning process uses all-confidence to ensure that only high-confidence patterns are generated from a pattern X and all its super-patterns

– This follows from its downward closure property

Page 29: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

Interval-level pruning…contd.

• Note that an easy way to perform pruning at the interval level for a (k+1)-pattern is to compute the intersection of the prefixing (k-1) intervals of the two k-patterns

– Example: given age[30,40]married[1,1] and age[25,35]salary[2000,3000], intersect the intervals of age to obtain the new pattern age[30,35]married[1,1]salary[2000,3000]

• However, producing a new (k+1)-pattern using intersection violates the downward closure property of all-confidence

– Shrinking the intervals in the (k+1)-pattern may cause a great decrease in the support of a single item, so its all-confidence may be higher than that of its composite k-patterns
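The interval intersection used in the example amounts to:

```python
def intersect(a, b):
    """Intersection of two attribute intervals (lo, hi); None if disjoint."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None

print(intersect((30, 40), (25, 35)))  # (30, 35), as in the age example
```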

Page 30: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

Interval-level pruning…contd.

• We can avoid intersection in interval pruning by enumerating all sub-intervals of the combined interval sets Sx and Sy of the attribute set {x, y} at level 2 of Tattr, and pruning at that level before generating a pattern

• We need to consider all pairs of sub-intervals of x and y, as each of them represents a pattern

– Thus, for each interval set {i’x, i’y}, where i’x ⊆ ix, i’y ⊆ iy, ix ∈ Sx and iy ∈ Sy,

– we create a QCP X = x[i’x]y[i’y] if allconf(X) >= ς

• This process of evaluating all possible sub-interval combinations for 2-patterns ensures downward closure on all k-patterns generated from them

Page 31: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

QCoMine Algorithm

• First, combine the base intervals of each quantitative attribute with respect to every other attribute

• Steps 2-4 construct the NMI-graph G and use it to guide the construction of the attribute prefix tree Tattr, performing attribute pruning

• Steps 5-13 construct level 2 of Tattr and also perform interval pruning (steps 10-13), which produces all 2-pattern QCPs

• Tinterval is an interval-prefix tree that keeps the interval sets of all patterns generated by a node u in Tattr; it is used as a memoization structure for speedup and space saving

• Steps 14-15 invoke RecurMine on the child nodes of u in G to generate all k-QCPs for k > 2

Page 32: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

QCoMine Algorithm…contd.

• The steps in the RecurMine algorithm continue to build the prefix tree Tattr for k > 2

• Interval pruning is aided by the interval-prefix tree, which speeds up joins of two k-patterns

• At step 6 of the algorithm, when two k-patterns are combined, it is ensured that their prefixing (k-1) intervals are the same in both patterns, to avoid performing interval combining again

Page 33: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

Performance of QCoMine

• Performance tests of the QCoMine algorithm were conducted to evaluate the efficiency of its three major components:

1. Supervised interval combining

2. Attribute-level pruning by NMI

3. Interval-level pruning by all-confidence

• Three variants of the algorithm were created:

a. QCoMine, which performs all operations as described in the paper

b. QCoMine-0, a control variant that performs the interval combining process but sets μ = 0

c. QCoMine-1, another control variant that does not perform interval combining but utilizes μ as described in the paper

• The tests were performed with all-confidence from ς = 60% to 100%

Page 34: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

Performance of QCoMine…contd.

When interval combining is not applied, results on the dataset can only be obtained when ς = 100%; in all other cases, the algorithm runs out of memory.

This is because QCoMine-1 is inefficient: it allows the interval of an item to become too trivial, so patterns easily gain all-confidence >= ς simply by co-occurrence.

Page 35: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

Performance of QCoMine…contd.

The running time of both QCoMine and QCoMine-0 increases only slightly for smaller ς, because the majority of the time is spent on computing the 2-patterns. No matter the value of ς, every 2-pattern must be tested to determine whether it is a QCP before the downward closure property of all-confidence can be employed to prune.

Page 36: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

References

1. Y. Ke, J. Cheng, W. Ng. Mining quantitative correlated patterns using an information-theoretic approach. Proceedings of the 12th ACM SIGKDD International Conference, 2006.

2. G. I. Webb. Discovering significant rules. Proceedings of the 12th ACM SIGKDD International Conference, 2006.

3. A. J. Knobbe, E. K. Y. Ho. Maximally informative k-itemsets and their efficient discovery. Proceedings of the 12th ACM SIGKDD International Conference, 2006.