
Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

Yiping Ke, James Cheng, Wilfred Ng

Presented By: Chibuike Muoh

Presentation Outline:

• Contributions of the paper

• Introduction

• What are QCPs?
  – Definitions
  – Background: Information Theory (entropy, MI, NMI)

• Mining QCPs
  – All-confidence
  – Discretization problem (interval combining)
  – Attribute-level pruning
  – Interval-level pruning
  – QCoMine algorithm

Contributions of the paper

• Presents a new algorithm for mining patterns in databases, based on concepts borrowed from information theory: entropy and mutual information

• Achieves discretization of attribute domains using supervised interval combining, which preserves the dependency between attributes

Introduction

• Similar in principle to association rule mining, but evaluating association rules can be too expensive on very large databases (VLDBs)

• Trivial result sets: {pregnant} → {edema} and {pregnant, female} → {edema}

• Unproductive rules as a result of co-occurrence effects: {pregnant, dataminer} → {edema}
  – So occupation and edema condition are related?

• Unlike association mining, mining for QCPs considers the dependency of the attribute sets of the database to generate highly correlated patterns
  – Similar to generating “maximal informative k-itemsets”, but here we consider dependency in the attribute sets

Introduction…contd.

• The idea behind mining QCPs:
  – Evaluate the attribute set and look for ‘strong’ dependencies between attributes
  – Next, find correlated interval sets in the dependent attributes and generate patterns from them

• Thus, QCPs are not restricted to frequently co-occurring attributes

Definitions: Quantitative Database

• A pattern X is a set of attributes (random variables) {x1, x2, x3, …, xm} whose outcomes can be categorical or quantitative, with probabilities p(vx) = {p1, p2, p3, …, pm}
  – An attribute x is categorical if its items x[lx, ux] have lx = ux (each interval is a single value of dom(x))
  – It is quantitative if x[lx, ux] is an interval of x with lx ≤ ux
  – A pattern X is called a k-pattern if |attr(X)| = k

• Consider a quantitative database D as a set of transactions T; each transaction in D is a vector of items <v1, v2, v3, …, vm> where vi ∈ dom(xi) for 1 ≤ i ≤ m

Definition…contd.

• We say a transaction T supports a pattern X if every attribute in X is represented in T
  – The frequency of a pattern X in D, freq(X), is the number of transactions in D that support X
  – The support of X, supp(X) = freq(X)/|D|, is the probability that a transaction T in D supports X
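These definitions are simple enough to state directly in code. A minimal Python sketch follows, assuming a toy data layout (a list of attribute-to-value dicts) rather than the paper's actual representation:

```python
def supp(db, pattern):
    """Support of a pattern: the fraction of transactions that support it.
    A transaction is a dict {attribute: value}; a pattern is a dict
    {attribute: (lo, hi)}, with lo == hi for a categorical item."""
    freq = sum(all(lo <= t[a] <= hi for a, (lo, hi) in pattern.items())
               for t in db)
    return freq / len(db)

# supp(age[4,5] gender[1,1]) over a toy three-transaction database: 2/3.
db = [{"age": 4, "gender": 1}, {"age": 5, "gender": 1}, {"age": 2, "gender": 2}]
print(supp(db, {"age": (4, 5), "gender": (1, 1)}))
```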

Example

• The example database table (Table 1) consists of six attributes, of which three are quantitative {age, salary, service years} and three are categorical {gender, married, education}

• The last column records the support of each transaction

• E.g., for the pattern X = age[4,5]gender[1,1], supp(X) = 0.25 + 0.19 = 0.44

Background: Information Theory

• Mining QCPs makes use of fundamental concepts in information theory

• Entropy: measures the information content/uncertainty of a random variable, x
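In standard form, for an attribute x:

H(x) = − Σ_{v ∈ dom(x)} p(v) log p(v)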

Background: Information Theory…contd.

• Mutual information (MI): measures the average reduction in uncertainty about a random variable X, given the knowledge of Y (or vice versa)

– MI is a symmetric measure, so the greater the value of I(x; y), the more information x and y tell about each other.
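In its general form (instantiated in the example on the next slide):

I(x; y) = Σ_{v_x ∈ dom(x)} Σ_{v_y ∈ dom(y)} p(v_x, v_y) log [ p(v_x, v_y) / (p(v_x) p(v_y)) ]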

Example

• Consider the pattern X = (age, married) from Table 1. We can compute:

I(age; married) = Σ_{v_age ∈ {1,2,3,4,5}} Σ_{v_married ∈ {1,2}} p(v_age, v_married) log [ p(v_age, v_married) / (p(v_age) p(v_married)) ] = 0.47

• This shows that knowing age causes a reduction of 0.47 in the uncertainty of married

• Similarly, as an exercise, we can compute I(gender; education) = 0.40

Normalized Mutual Information

• But by how much does X actually tell us about Y?

• The entropy of different attributes varies greatly, so MI only returns an absolute value, which is not directly comparable across attribute pairs

• We can try normalizing the MI among our set of attributes to get a global relative measure

NMI…contd.

• Normalizing the MI measure among the attribute sets gives us the minimum percentage of reduction in the uncertainty of one attribute given the knowledge of another:

Ĩ(x; y) = I(x; y) / max{ H(x), H(y) }

(dividing by the larger entropy yields the smaller of the two percentage reductions, I(x; y)/H(x) and I(x; y)/H(y))

Example 2

• From the previous example we can compute Ĩ(age; married) = 0.47 / max{ H(age), H(married) } = 0.47 / 2.19 ≈ 0.21

• Also we can determine Ĩ(gender; education) = 0.40 / max{ H(gender), H(education) } = 0.40 / 1.34 ≈ 0.30

Note that although I(age; married) > I(gender; education), its NMI is smaller. This can be attributed to the high entropy of age: H(age) = 2.19 > H(education) = 1.34.

This implies that knowing age reduces a much larger absolute amount of uncertainty, but a smaller relative amount.
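To make the three measures concrete, here is a minimal Python sketch. The joint-distribution input format is an illustrative assumption, and the normalization by the larger entropy follows the reading of NMI used above:

```python
import math
from collections import defaultdict

def entropy(marginal):
    """H(x) = -sum p(v) log p(v) over a marginal {value: probability}."""
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)

def marginals(joint):
    """Marginals p(vx), p(vy) of a joint distribution {(vx, vy): p}."""
    px, py = defaultdict(float), defaultdict(float)
    for (vx, vy), p in joint.items():
        px[vx] += p
        py[vy] += p
    return px, py

def mutual_information(joint):
    """I(x; y) = sum p(vx, vy) log [ p(vx, vy) / (p(vx) p(vy)) ]."""
    px, py = marginals(joint)
    return sum(p * math.log2(p / (px[vx] * py[vy]))
               for (vx, vy), p in joint.items() if p > 0)

def normalized_mi(joint):
    """NMI: I(x; y) divided by the larger of the two attribute entropies."""
    px, py = marginals(joint)
    return mutual_information(joint) / max(entropy(px), entropy(py))
```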

Definition: Quantitative Pattern

• A more formal definition of a quantitative correlated pattern (QCP) follows below

• Given a minimum information threshold (μ) and a minimum all-confidence threshold (ς), a quantitative pattern is a QCP if it has strong pairwise dependency between its attributes and a high all-confidence in the dataset
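In symbols (a paraphrase built from the thresholds just introduced, not the paper's exact wording): a pattern X is a QCP if Ĩ(x; y) ≥ μ for every pair of attributes x, y in attr(X), and allconf(X) ≥ ς.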

allconf(X)

• All-confidence is a correlation measure for determining the minimum confidence of the association rules that can be derived from a given pattern

• For a quantitative pattern, allconf(X) is defined as:

allconf(X) = supp(X) / max{ supp(x[lx, ux]) : x[lx, ux] ∈ X }

• This is different from association rule mining, where conf(X → Y) only indicates an implication from the itemset on the left to the itemset on the right

allconf(X)…contd.

• All-confidence has the downward closure property: if a pattern has all-confidence no less than ς, so do all its sub-patterns

Example

• Consider X = gender[1,1]education[1,1]:

allconf(X) = supp(gender[1,1] education[1,1]) / max{ supp(gender[1,1]), supp(education[1,1]) }
           = (0.19 + 0.11 + 0.09 + 0.09) / max{ 0.25 + 0.19 + 0.11 + 0.09 + 0.09 + 0.09 + 0.08, 0.19 + 0.11 + 0.09 + 0.09 }
           = 0.48 / 0.90 = 0.53

• Similarly, allconf(gender[1,1]married[1,1]) = 0.9
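A minimal sketch of this computation in Python; the `supports` lookup, mapping item sets to their supports, is an assumed input for illustration:

```python
def allconf(supports, pattern):
    """All-confidence: supp(X) divided by the maximum support of any
    single item x[lx, ux] in X. `supports` maps frozensets of items to
    support values; `pattern` is an iterable of item labels."""
    items = frozenset(pattern)
    return supports[items] / max(supports[frozenset([i])] for i in items)

# The example above: X = gender[1,1] education[1,1] -> 0.48 / 0.90 = 0.53.
supports = {
    frozenset({"gender[1,1]"}): 0.90,
    frozenset({"education[1,1]"}): 0.48,
    frozenset({"gender[1,1]", "education[1,1]"}): 0.48,
}
print(round(allconf(supports, ["gender[1,1]", "education[1,1]"]), 2))
```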

allconf(X)

• A caveat about allconf: since it is applied at a fine granularity, to intervals of attributes, it cannot solely be used as a measure for correlated patterns
  – Quantitative attributes can span huge intervals, creating a co-occurrence problem

• These points explain the need to first perform pruning at the attribute level

Example

For the employee database in the previous example, we set μ = 0.2 and ς = 0.5. The pattern Y = gender[1,1]married[1,1] is not a QCP because

Ĩ(gender; married) = 0 < μ, although allconf(Y) = 0.9

This is because gender and married are independent of each other, while p(gender[1,1]) and p(married[1,1]) are both very high

QCP Mining

• Problem description:
  – Given a quantitative database D, a minimum information threshold μ, and a minimum all-confidence threshold ς, the mining problem is to find all QCPs in D

QCP Mining: Process Outline

The QCoMine algorithm takes the quantitative database through three stages: interval combining/discretization, attribute pruning, and interval pruning

- Attribute pruning finds dependent attribute sets

- Interval pruning generates correlated patterns

Interval Combining

• When dealing with quantitative (continuous) attributes, we need to discretize the intervals of the attribute

• Challenges
  – Preventing the intervals from becoming too trivial
    • E.g.: age[0,2] vs. age[0,0], age[1,1], age[2,2]
  – Considering the dependency of the attributes when combining their intervals
    • Example: the pattern (age, gender) can produce a different interval than (age, married)

Interval combining…contd.

• Interval combining for quantitative patterns can be considered an optimization problem for an objective function Φ

• The goal of this stage: given two attributes x and y, where x is quantitative and y can be either quantitative or categorical, obtain the optimal combined intervals of x with respect to y

• Note that since this optimization is performed locally (between pairs of attributes), we use MI instead of NMI

Interval combining: Algorithm.

• Let Φ[ix1, ix2](x, y) denote the value of Φ(x, y) when ix1 and ix2 are combined with respect to y

• At each step, two consecutive intervals ix1 and ix2 are considered for combination

• The idea is to pick, at each step, the maximum Φ[ix[j], ix[j+1]](x, y) among all pairs of consecutive intervals ix[j] and ix[j+1], and combine the corresponding ix[j] and ix[j+1] into a new interval

• To prevent the intervals from becoming too trivial, a termination condition is set as a specified minimum for the intervals
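A sketch of this greedy loop in Python; the objective callback `phi` (e.g., the MI between the discretized x and y) and the count-based termination are illustrative assumptions standing in for the paper's exact criteria:

```python
def combine_intervals(intervals, phi, min_count):
    """Greedy supervised interval combining for an attribute x w.r.t. y.
    At each step, try merging every pair of consecutive intervals, keep
    the merge that maximizes the objective phi, and stop once only
    `min_count` intervals remain (the termination condition).

    intervals: ordered list of (lo, hi) base intervals of x.
    phi(candidate): scores a candidate interval list, e.g. by the mutual
    information I(x; y) under that discretization of x."""
    intervals = list(intervals)
    while len(intervals) > min_count:
        intervals = max(
            (intervals[:j]
             + [(intervals[j][0], intervals[j + 1][1])]
             + intervals[j + 2:]
             for j in range(len(intervals) - 1)),
            key=phi,
        )
    return intervals
```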

Attribute level pruning

• At this stage, pruning at the attribute level is performed so that the attributes in a pattern pairwise have NMI of at least μ

• This definition views attributes as vertices in a graph (the NMI-graph, with an edge between two attributes whose NMI is at least μ); cliques in this graph correspond to the attribute sets of candidate QCPs

Attribute Level pruning…contd.

• From the previous definition, QCP attribute sets are cliques in the NMI-graph whose edges have NMI ≥ μ
  – Without pruning at the attribute level (i.e., μ = 0), the search space for cliques in the graph becomes more complex
  – And enumerating cliques in a graph can be an exhaustive process

• The authors introduce a prefix tree structure for prefixing correlated attributes: the attribute prefix tree, Tattr

• Clique enumeration in the NMI-graph is done using the prefix tree
  – The only extra action required when enumerating cliques using the prefix tree is to check whether (u, v) is an edge in G

Prefix tree construction

• To create the prefix tree:
  1. First, a root node is created at level 0 of Tattr
  2. Then, at level 1, we create a node for each attribute as a child of the root
  3. For each node u at level k (k ≥ 1) and for each right sibling v of u, if (u, v) is an edge in G, we create a child node for u with the same attribute label as that of v
  4. Repeat step 3 for u's children at level k + 1

• Applying step 3 recursively creates the prefix tree in a depth-first manner
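A compact recursive sketch of this construction in Python. The paper builds Tattr node by node in depth-first order; this version should produce the same tree, with `edges` an assumed set of NMI-graph edges:

```python
def build_tattr(attributes, edges):
    """Attribute prefix tree: level-1 nodes are the attributes; a node u
    gains a child labeled v for each right sibling v with (u, v) an edge
    in the NMI-graph G. Every root-to-node path is then a clique in G,
    because siblings already share adjacency to all of their ancestors.
    `edges` is a set of frozensets {u, v}."""
    def expand(prefix, siblings):
        nodes = []
        for i, u in enumerate(siblings):
            # Child candidates: right siblings of u adjacent to u in G.
            cands = [v for v in siblings[i + 1:] if frozenset((u, v)) in edges]
            nodes.append({"clique": prefix + [u],
                          "children": expand(prefix + [u], cands)})
        return nodes
    return expand([], sorted(attributes))

# Toy NMI-graph: a-b, a-c, b-c, c-d. Cliques appear as root-to-node paths.
edges = {frozenset(p) for p in [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]}
tree = build_tattr(["a", "b", "c", "d"], edges)
```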

Interval-level pruning

• Even though the cliques found using the NMI-graph have high NMI, they differ on the intervals of their continuous attributes
  – Since intervals are combined in a supervised way, the same attribute may have different sets of combined intervals with respect to different attributes
  – Thus patterns with low all-confidence may still be generated from correlated attributes

• The interval-level pruning process uses all-confidence to ensure that only high-confidence patterns are generated from a pattern X and all its super-patterns
  – This follows from its downward closure property

Interval-level pruning…contd.

• Note that an easy way to perform pruning at the interval level for a (k+1)-pattern is to compute the intersection of the prefixing (k-1) intervals of the two k-patterns
  – Example: given age[30,40]married[1,1] and age[25,35]salary[2000,3000], intersect the intervals of age to obtain the new pattern age[30,35]married[1,1]salary[2000,3000]

• However, producing a new (k+1)-pattern using intersection violates the downward closure property of all-confidence
  – Shrinking the intervals in the (k+1)-pattern may cause a great decrease in the support of a single item, so its all-confidence may be higher than that of its constituent k-patterns

Interval-level pruning…contd.

• We can avoid intersection in interval pruning by enumerating all sub-intervals of the combined intervals Sx and Sy of the attribute set {x, y} at level 2 of Tattr, and pruning at that level before generating a pattern

• We need to consider all pairs of sub-intervals of x and y, as each of them represents a pattern
  – Thus for each interval set {i'x, i'y}, where i'x ⊆ ix ∈ Sx and i'y ⊆ iy ∈ Sy, we create a QCP X = x[i'x]y[i'y] if allconf(X) ≥ ς

• This process of evaluating all possible sub-interval combinations at 2-patterns ensures downward closure on all k-patterns generated from them
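A sketch of this enumeration in Python; a combined interval is represented as the list of base intervals it spans, and the `allconf2` callback (all-confidence of the 2-pattern) is an assumed input:

```python
def sub_intervals(combined):
    """All sub-intervals of a combined interval, i.e., every run of
    consecutive base intervals within it. `combined` is an ordered list
    of (lo, hi) base intervals; each sub-interval is returned as (lo, hi)."""
    n = len(combined)
    return [(combined[a][0], combined[b][1])
            for a in range(n) for b in range(a, n)]

def two_pattern_qcps(Sx, Sy, allconf2, varsigma):
    """Generate candidate 2-patterns x[i'x] y[i'y] from all pairs of
    sub-intervals of the combined intervals in Sx and Sy, keeping those
    whose all-confidence reaches the threshold varsigma."""
    return [(ix_sub, iy_sub)
            for ix in Sx for iy in Sy
            for ix_sub in sub_intervals(ix)
            for iy_sub in sub_intervals(iy)
            if allconf2(ix_sub, iy_sub) >= varsigma]
```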

QCoMine Algorithm

• First, combine the base intervals of each quantitative attribute with respect to each other attribute

• Steps 2-4 construct the NMI-graph G and use it to guide the construction of the attribute prefix tree Tattr, performing attribute pruning

• Steps 5-13 construct level 2 of Tattr and also perform interval pruning (steps 10-13), which produces all 2-pattern QCPs
  – An interval-prefix tree keeps the interval sets of all patterns generated by a node u in Tattr; it is used as a memoization structure for speedup and space saving

• Steps 14-15 invoke RecurMine on the child nodes of u to generate all k-QCPs for k > 2

QCoMine Algorithm…contd.

• The steps in the RecurMine procedure continue to build the prefix tree Tattr for k > 2

• Interval pruning is aided by the interval-prefix tree, which speeds up joins of two k-patterns

• At step 6 of the algorithm, when two k-patterns are combined, it is ensured that their prefixing (k-1) intervals are the same in both patterns, so that no further interval combining is needed

Performance of QCoMine

• Performance tests of the QCoMine algorithm were performed to test the efficiency of its three major components:
  1. Supervised interval combining
  2. Attribute-level pruning by NMI
  3. Interval-level pruning by all-confidence

• Three variants of the algorithm were created:
  a. QCoMine, which performs all operations as described in the paper
  b. QCoMine-0, a control variant of the original algorithm which performs the interval combining process but sets μ = 0
  c. QCoMine-1, another control variant that does not perform the interval combining process but uses μ as described in the paper

• The tests were performed with all-confidence from ς = 60% to 100%

Performance of QCoMine…contd.

When interval combining is not applied, results on the dataset can only be obtained when ς = 100%; in all other cases the algorithm runs out of memory.

This is because QCoMine-1 is inefficient: it allows the interval of an item to become too trivial, so patterns can easily attain all-confidence > ς simply by co-occurrence.

Performance of QCoMine…contd.

The running time of both QCoMine and QCoMine-0 increases only slightly for smaller ς, because the majority of the time is spent on computing the 2-patterns. No matter the value of ς, every 2-pattern must be tested to determine whether it is a QCP before the downward closure property of all-confidence can be employed to prune.

References

1. Y. Ke, J. Cheng, W. Ng. Mining quantitative correlated patterns using an information-theoretic approach. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006.

2. G. I. Webb. Discovering significant rules. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006.

3. A. J. Knobbe, E. K. Y. Ho. Maximally informative k-itemsets and their efficient discovery. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006.