Extracting Decisional Correlation Rules
Alain Casali
Christian Ernst
Dexa'09 - Extracting Decision Correlation Rules
Industrial Problem
Given a supply chain (in microelectronics), we want to find links between the values of some parameters and the values of a specific attribute of the supply chain (the yield).
The use of positive (and/or negative) association rules is not suitable in our context.
We use correlation tests because:
  the measure is statistically more significant;
  the measure takes into account not only the presence but also the absence of the items;
  the measure is non-directional, and can thus highlight more complex links than a "simple" implication.
Outline
  Preliminaries
  Decision Correlation Rules
  Contingency Vectors
  LHS-χ2 algorithm
  Experimental Analysis
  Conclusion
Literal Set
A literal set $X\overline{Y}$ is composed of:
  a positive part (X);
  a negative part (Y).
The variation of a literal set XY encompasses all the combinations of positive and negative literals that we can obtain from XY.
Ex: $Var(AB) = \{AB, A\overline{B}, \overline{A}B, \overline{A}\overline{B}\}$
The support of a literal set is the number of transactions that contain its positive part and none of the items of its negative part.
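The two definitions above can be sketched in a few lines of Python. This is only an illustration: the names `ROWS`, `variation` and `support` are hypothetical, and the data are the item columns of the transaction table used as the running example in these slides.

```python
from itertools import product

# Items of the running example, keyed by Tid (target column left out).
ROWS = {
    1: {"B", "C", "F"}, 2: {"B", "C", "F"}, 3: {"B", "C", "E"}, 4: {"F"},
    5: {"B", "D", "F"}, 6: {"B", "F"}, 7: {"B", "C", "F"}, 8: {"A", "E"},
    9: {"B", "C", "F"}, 10: {"B", "F"},
}

def variation(pattern):
    """Return the 2^|pattern| literal sets of a pattern as (positive, negative) pairs."""
    items = sorted(pattern)
    result = []
    for signs in product([True, False], repeat=len(items)):
        pos = frozenset(a for a, s in zip(items, signs) if s)
        neg = frozenset(a for a, s in zip(items, signs) if not s)
        result.append((pos, neg))
    return result

def support(pos, neg, rows=ROWS):
    """Count the transactions that contain all of pos and no item of neg."""
    pos, neg = set(pos), set(neg)
    return sum(1 for t in rows.values() if pos <= t and not (neg & t))

# One support per literal set of the variation of BF.
for pos, neg in variation("BF"):
    print(sorted(pos), sorted(neg), support(pos, neg))
```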
Correlation rule and χ2 (1)
Contingency table and expected value
Each cell of the contingency table (CT) of a pattern X contains the support of one of the literal sets $Y\overline{Z}$ of its variation.

Tid   Item     Target
1     B C F    T1
2     B C F    T1
3     B C E    T1
4     F        T1
5     B D F    T2
6     B F
7     B C F
8     A E
9     B C F
10    B F

CT(BF)            $B$    $\overline{B}$    ∑ line
$F$               7      1                 8
$\overline{F}$    1      1                 2
∑ column          8      2                 10
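As a minimal sketch, the table CT(BF) can be rebuilt by counting, for every transaction, which of the four literal sets it falls into. The function name `contingency_table` and the `(b_present, f_present)` keying are illustrative choices, not notation from the slides.

```python
# Items of the ten transactions of the running example, in Tid order.
ROWS = [
    {"B", "C", "F"}, {"B", "C", "F"}, {"B", "C", "E"}, {"F"}, {"B", "D", "F"},
    {"B", "F"}, {"B", "C", "F"}, {"A", "E"}, {"B", "C", "F"}, {"B", "F"},
]

def contingency_table(a, b, rows=ROWS):
    """Return {(a_present, b_present): support} for the 2-item pattern ab."""
    cells = {(x, y): 0 for x in (True, False) for y in (True, False)}
    for t in rows:
        cells[(a in t, b in t)] += 1
    return cells

ct = contingency_table("B", "F")
# Column sums (B present / absent) and line sums (F present / absent).
col_sums = {x: ct[(x, True)] + ct[(x, False)] for x in (True, False)}
line_sums = {y: ct[(True, y)] + ct[(False, y)] for y in (True, False)}
print(ct, col_sums, line_sums)
```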
Correlation rule and χ2 (2)
Computation of χ2 (Brin'97)
  It links the real support to the theoretical support (expected value):
  $\chi^2(X) = \sum_{Y\overline{Z} \in CT(X)} \frac{\left(Supp(Y\overline{Z}) - E(Y\overline{Z})\right)^2}{E(Y\overline{Z})}$
  ⇒ χ2(BF) ≈ 1.67
Correlation rate
  Read from a table giving the centile values of the χ2 distribution with a single degree of freedom (existence of a bijection)
  Correlation(BF) ≈ 85%
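The χ2 computation can be sketched as follows for a 2-item pattern. Two assumptions are made here: the expected value of a cell is taken as line sum × column sum / number of transactions (the usual independence hypothesis), and the correlation rate is taken as the cumulative χ2 distribution with one degree of freedom, which equals erf(√(χ2/2)). Function names are illustrative. Run on the CT(BF) cells as transcribed in these slides, the sketch gives χ2 ≈ 1.41 and a rate of roughly 76%, not the 1.67 and 85% quoted above, so the slide figures presumably rest on data or a convention that this transcript does not preserve.

```python
from math import erf, sqrt

# CT(BF) as transcribed: keys are (B present, F present).
CT_BF = {(True, True): 7, (True, False): 1, (False, True): 1, (False, False): 1}

def chi2_2x2(cells):
    """Chi-squared of a 2-item pattern from its four contingency-table cells."""
    n = sum(cells.values())
    supp_a = cells[(True, True)] + cells[(True, False)]   # column sum
    supp_b = cells[(True, True)] + cells[(False, True)]   # line sum
    total = 0.0
    for (a_present, b_present), observed in cells.items():
        pa = supp_a if a_present else n - supp_a
        pb = supp_b if b_present else n - supp_b
        expected = pa * pb / n            # assumed independence hypothesis
        total += (observed - expected) ** 2 / expected
    return total

def correlation_rate(chi2):
    """Centile of chi2 for one degree of freedom (assumed reading of the 'rate')."""
    return erf(sqrt(chi2 / 2))

value = chi2_2x2(CT_BF)
print(round(value, 3), round(correlation_rate(value), 3))
```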
Related Constraints
Anti-monotone constraint (Cochran criteria):
  no cell of the CT may have a null value;
  at least p% of the CT's cells must have a support greater than or equal to MinSup.
Monotone constraint:
  X is a valid correlation rule: χ2(X) ≥ MinCor
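Both constraints reduce to simple predicates on the cells of a contingency table. The sketch below is illustrative only: the function names are hypothetical and `p` is assumed to be given as a fraction (0.8 for 80%).

```python
def cochran_ok(cells, min_sup, p):
    """Anti-monotone test: no empty cell, and at least a fraction p of the
    cells with a support >= min_sup."""
    if any(c == 0 for c in cells):
        return False
    return sum(c >= min_sup for c in cells) >= p * len(cells)

def is_valid_correlation_rule(chi2, min_cor):
    """Monotone test: the pattern is a valid correlation rule."""
    return chi2 >= min_cor

# CT(BF) cells from the running example, with hypothetical thresholds.
print(cochran_ok([7, 1, 1, 1], min_sup=1, p=0.8))
print(cochran_ok([7, 1, 1, 1], min_sup=2, p=0.8))
```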
Browsing the search space
Utilization of levelwise algorithms to browse the search space.
Levelwise algorithms are suitable when:
  the relation is on disk;
  we have anti-monotone constraints.
Problem: memory requirement for the contingency tables: $O\left(\sum_{i=1}^{n} 2^i \cdot C_{|I|}^{i}\right)$ (a pattern of length i has a contingency table of $2^i$ cells, and there are $C_{|I|}^{i}$ patterns of length i)
Example with |I| = 1000:
Level   Memory requirement
2       4 MB
3       2.5 GB
4       1.3 TB
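The orders of magnitude in this example can be checked with a short computation. The sketch assumes 2 bytes per contingency-table cell, a value chosen here because it reproduces the 4 MB figure at level 2; the slides do not state the cell size.

```python
from math import comb

def levelwise_bytes(n_items, level, bytes_per_cell=2):
    """Memory for all contingency tables of one level:
    C(n_items, level) patterns, 2^level cells each (cell size is an assumption)."""
    return comb(n_items, level) * 2 ** level * bytes_per_cell

for level in (2, 3, 4):
    print(level, levelwise_bytes(1000, level))
```

With these assumptions the three levels come to about 4 × 10^6, 2.7 × 10^9 and 1.3 × 10^12 bytes, the same orders of magnitude as the table.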
Lectic Order & Lectic Search (LS)
Goal: enumerate the combinations (powerset lattice) with a balanced tree.
Start point: 2 vectors; the 1st one is empty, the 2nd one contains the list of the items.
Create 2 branches:
  left: prune the last element of the 2nd vector (recursive call);
  right: add the last element of the 2nd vector to the first (recursive call).
Stop: when the 2nd vector is empty, then output the 1st vector.

                     (∅, ABC)
           (∅, AB)              (C, AB)
     (∅, A)       (B, A)
(∅, ∅)  (A, ∅)  (B, ∅)  (AB, ∅)
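The recursion described above can be written as a short Python sketch. Items are single characters held in strings for brevity, and the function name `lectic_search` is an illustrative choice.

```python
def lectic_search(items):
    """Enumerate all subsets of `items` in the order produced by the
    two-vector recursion: left branch first, then right branch."""
    out = []

    def visit(first, second):
        if not second:                  # stop: 2nd vector is empty
            out.append(first)
            return
        last, rest = second[-1], second[:-1]
        visit(first, rest)              # left: prune the last element
        visit(last + first, rest)       # right: move it to the 1st vector

    visit("", items)
    return out

print(lectic_search("ABC"))
```

For the items ABC the subsets come out as ∅, A, B, AB, C, AC, BC, ABC, and each subset is produced exactly once.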
Decision Correlation Rules
We are interested in rules satisfying both constraints:
  χ2(X) ≥ MinCor;
  X contains 1 value of the target attribute.
Problem: there is no function f such that
  χ2(X ∪ A) = f(χ2(X), supp(A))
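The two conditions can be expressed as one predicate. In this sketch "1 value of the target attribute" is read as exactly one, which is an assumption, and the function name, the threshold and the χ2 values in the usage lines are hypothetical.

```python
def is_decision_correlation_rule(pattern, chi2, target_values, min_cor):
    """True when the pattern holds exactly one target value (assumed reading)
    and its chi-squared reaches the MinCor threshold."""
    n_targets = len(set(pattern) & set(target_values))
    return n_targets == 1 and chi2 >= min_cor

targets = {"T1", "T2"}
print(is_decision_correlation_rule({"B", "F", "T1"}, 4.2, targets, 3.0))
print(is_decision_correlation_rule({"B", "F"}, 4.2, targets, 3.0))
```

Because χ2(X ∪ A) cannot be derived from χ2(X) and supp(A), the χ2 value passed to such a predicate has to be recomputed from the contingency table of each candidate pattern.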
Contingency Vector (1)
Equivalence class associated with a literal set:
  $[Y\overline{Z}] = \{ i \in Tid(r) \mid Y \subseteq Tid(i) \text{ and } Z \cap Tid(i) = \emptyset \}$
  Ex: $[B\overline{F}] = \{3\}$
Contingency vector of a pattern X: set of the equivalence classes of the variation of X.
  Ex: $CV(BF) = \{[\overline{B}\overline{F}], [\overline{B}F], [B\overline{F}], [BF]\} = \{\{8\}, \{4\}, \{3\}, \{1,2,5,6,7,9,10\}\}$

Tid   Item     Target
1     B C F    T1
2     B C F    T1
3     B C E    T1
4     F        T1
5     B D F    T2
6     B F
7     B C F
8     A E
9     B C F
10    B F
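A minimal sketch of both definitions on the transaction table, under the assumption that items are single characters. The names `equivalence_class` and `contingency_vector` are illustrative, and the classes are listed from "all items absent" to "all items present", the order used in the example.

```python
from itertools import product

# Items of the running example, keyed by Tid.
ROWS = {
    1: {"B", "C", "F"}, 2: {"B", "C", "F"}, 3: {"B", "C", "E"}, 4: {"F"},
    5: {"B", "D", "F"}, 6: {"B", "F"}, 7: {"B", "C", "F"}, 8: {"A", "E"},
    9: {"B", "C", "F"}, 10: {"B", "F"},
}

def equivalence_class(pos, neg, rows=ROWS):
    """Tids whose transaction contains all of pos and no item of neg."""
    pos, neg = set(pos), set(neg)
    return {tid for tid, t in rows.items() if pos <= t and not (neg & t)}

def contingency_vector(pattern, rows=ROWS):
    """Equivalence classes of every literal set of the variation of pattern."""
    classes = []
    for signs in product([False, True], repeat=len(pattern)):
        pos = [a for a, s in zip(pattern, signs) if s]
        neg = [a for a, s in zip(pattern, signs) if not s]
        classes.append(equivalence_class(pos, neg, rows))
    return classes

print(equivalence_class("B", "F"))
print(contingency_vector("BF"))
```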
Contingency Vector (2)
The contingency vector is a partition of the Tids.
Recurrence relation:
  $CV(X \cup A) = (CV(X) \cap [A]) \cup (CV(X) \cap [\overline{A}])$
In practice (additions in binary logic):

Tid                      1   2   3   4   5   6   7   8   9   10
CV(B)                    1   1   1   0   1   1   1   0   1   1
CV(F)                    1   1   0   1   1   1   1   0   1   1
CV(B) + CV(F) = CV(BF)   11  11  10  01  11  11  11  00  11  11
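In code, this "addition in binary logic" amounts to appending one bit per Tid to the code already stored for the pattern. The sketch below uses a shift and a bitwise OR; the name `extend` is an illustrative choice.

```python
# One bit per Tid (1 to 10): is the item present in the transaction?
CV_B = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1]
CV_F = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]

def extend(cv, item_bits):
    """Append the presence bit of a new item to the code of every Tid."""
    return [(code << 1) | bit for code, bit in zip(cv, item_bits)]

cv_bf = extend(CV_B, CV_F)
print([format(code, "02b") for code in cv_bf])
```

Each Tid keeps a single small integer, so extending a pattern by one item costs one pass over the transactions.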
Contingency Vector (3)
Computation of the contingency table

Tid                      1   2   3   4   5   6   7   8   9   10
CV(B) + CV(F) = CV(BF)   11  11  10  01  11  11  11  00  11  11

Distribution   $\overline{B}\overline{F}$   $\overline{B}F$   $B\overline{F}$   $BF$
CT[BF]         1                            1                 1                 7
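Counting how many Tids carry each code yields the contingency table directly. A minimal sketch, with an illustrative function name:

```python
from collections import Counter

# CV(BF): one 2-bit code per Tid (1 to 10), B in the high bit and F in the low bit.
CV_BF = [0b11, 0b11, 0b10, 0b01, 0b11, 0b11, 0b11, 0b00, 0b11, 0b11]

def contingency_table_from_cv(cv, k):
    """Cells of the contingency table of a k-pattern, ordered by code 0 .. 2^k - 1."""
    counts = Counter(cv)
    return [counts.get(code, 0) for code in range(2 ** k)]

print(contingency_table_from_cv(CV_BF, 2))
```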
LHS-χ2 Algorithm
Modification of LS in order to include the contingency vectors.
If we are on a node:
  call to the left branch: we do nothing;
  before calling the right branch:
    computation of the new contingency vector;
    test of the anti-monotone constraints;
    [add the current pattern to the positive border;]
    test of the monotone constraints;
    computation of the χ2.
If all tests are OK, then output the pattern and its χ2.
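The steps above can be sketched as a recursive function. This is a simplified illustration, not the authors' implementation: expected values assume independence of the items, `p` is a fraction, target values are treated as ordinary items, the optional positive border is left out, and a failed anti-monotone test prunes the whole right subtree. All names and the tiny dataset are hypothetical.

```python
from collections import Counter

def chi2_of(cells, cv, k, n):
    """Chi-squared of a k-pattern from its cells and per-Tid codes."""
    # Support of each item = number of Tids whose bit j is set.
    supp = [sum((c >> j) & 1 for c in cv) for j in range(k)]
    total = 0.0
    for code, observed in enumerate(cells):
        expected = float(n)
        for j in range(k):
            present = (code >> j) & 1
            expected *= (supp[j] if present else n - supp[j]) / n
        total += (observed - expected) ** 2 / expected
    return total

def lhs_chi2(rows, items, targets, min_sup, p, min_cor):
    n = len(rows)
    bits = {a: [int(a in t) for t in rows] for a in items}  # CVs of the 1-items
    found = []

    def visit(pattern, remaining, cv):
        if not remaining:
            return
        last, rest = remaining[-1], remaining[:-1]
        visit(pattern, rest, cv)                      # left branch: nothing to do
        new_pattern = (last,) + pattern
        new_cv = [(c << 1) | b for c, b in zip(cv, bits[last])]
        k = len(new_pattern)
        counts = Counter(new_cv)
        cells = [counts.get(code, 0) for code in range(2 ** k)]
        # Anti-monotone constraints (Cochran): failure prunes the subtree.
        if min(cells) == 0 or sum(c >= min_sup for c in cells) < p * len(cells):
            return
        chi2 = chi2_of(cells, new_cv, k, n)
        # Monotone constraints: one target value and chi2 >= MinCor.
        if len(targets & set(new_pattern)) == 1 and chi2 >= min_cor:
            found.append((new_pattern, chi2))
        visit(new_pattern, rest, new_cv)              # right branch

    visit((), list(items), [0] * n)
    return found

# Hypothetical data: item A and target value T mostly occur together.
rows = [{"A", "T"}] * 3 + [{"A"}, {"T"}] + [set()] * 3
print(lhs_chi2(rows, ["A", "T"], {"T"}, min_sup=1, p=1.0, min_cor=1.5))
```

Only one contingency vector per level of the current recursion path is alive at any time, which is where the memory saving over a levelwise traversal comes from.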
Memory Requirements
What is the needed storage?
Contingency vectors of the 1-items: |I|*|r| bits.
Current contingency vectors (including the previous ones, kept because of the recursive calls):
  |I|*|I|*|r| bits in theory;
  |I|*|r| bytes in practice, since patterns never exceed a length of 8.
Finally we need: |r|*(|I| + |I|/8) bytes.
This result has to be compared with $O\left(\sum_{i=1}^{n} 2^i \cdot C_{|I|}^{i}\right)$ for the contingency tables of a levelwise algorithm.
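As a quick check of the final formula, the sketch below evaluates it for the two datasets reported in the experimental section (492 transactions and 3384 items, 426 transactions and 1136 items). The function name is illustrative.

```python
def lhs_bytes(n_items, n_rows):
    """|r| * (|I| + |I| / 8) bytes: packed 1-item vectors plus current vectors."""
    one_item_vectors = (n_items * n_rows + 7) // 8   # |I|*|r| bits, 8 per byte
    current_vectors = n_items * n_rows               # one byte per item and Tid
    return one_item_vectors + current_vectors

print(lhs_bytes(3384, 492))   # STMicroelectronics file
print(lhs_bytes(1136, 426))   # ATMEL file
```

Both results stay below 2 MB, far under the gigabytes a levelwise traversal needs from level 3 onwards for a comparable number of items.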
Experimental Analysis (1)
Experiments were run on a PC with a 1.8 GHz processor and 2 GB of RAM.
Files are provided by 2 manufacturers (STMicroelectronics and ATMEL).

                  STMicroelectronics   ATMEL
# transactions    492                  426
# items           3384                 1136
Experimental Analysis (2)
Conclusion
We have discovered new parameters having an influence on the yield (more than 25% of them were not known before).
Response times are 30% to 70% better with LHS-χ2 than with a levelwise algorithm.
Perspectives:
  use of a "divide and conquer" strategy for better performance;
  "cleaning" / transformation of the original data;
  generalization of the rules by integrating literal sets.