Classification by Association Rules: Use Minimum Set of Rules
Jianyu Yang
December 10, 2003
Classification System
• Problem: (A, B, C) => y | n ?
  – Decision tree learning, etc.
• Association rules: X => c
  – X: antecedent, c: consequent
  – Support & Confidence
  – Algorithms: Apriori
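A minimal Python sketch of the two measures for a class association rule X => c, assuming each training instance is a set of attribute-value items plus a class label. Support is taken as the fraction of all instances that contain X and have class c; confidence is the fraction of instances containing X that have class c. The helper names and the toy data are illustrative, not from the slides.

```python
from typing import FrozenSet, List, Tuple

# A training instance: (set of attribute-value items, class label).
Instance = Tuple[FrozenSet[str], str]


def support(antecedent: FrozenSet[str], label: str, data: List[Instance]) -> float:
    """Fraction of all instances that contain every item of X and have class c."""
    hits = sum(1 for items, y in data if antecedent <= items and y == label)
    return hits / len(data)


def confidence(antecedent: FrozenSet[str], label: str, data: List[Instance]) -> float:
    """Among instances that contain X, the fraction whose class is c."""
    covered = [y for items, y in data if antecedent <= items]
    if not covered:
        return 0.0
    return sum(1 for y in covered if y == label) / len(covered)


# Hypothetical toy data: items A, B, C and classes y / n.
data = [
    (frozenset({"A", "B", "C"}), "y"),
    (frozenset({"A", "B"}), "y"),
    (frozenset({"A", "C"}), "n"),
    (frozenset({"B", "C"}), "n"),
]

print(support(frozenset({"A", "B"}), "y", data))     # 0.5
print(confidence(frozenset({"A", "B"}), "y", data))  # 1.0
print(confidence(frozenset({"A"}), "y", data))       # 2 of 3 instances with A are y
```

Support measures how much of the data a rule speaks for, while confidence measures how reliable it is when it applies; the minsup and minconf thresholds later in the talk filter rules on exactly these two numbers.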
Association Rules: Issues
• Too many rules
  – Inefficient
  – Overfitting
• Applying order matters
  – Example: (A, B) => y, (C) => n; an instance containing A, B and C matches both rules
• Minimum Support (minsup)
• Minimum Confidence (minconf)
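A small Python sketch of why order matters, assuming rules are applied first-match: the first rule whose antecedent is contained in the instance decides the class. The instance {A, B, C} matches both example rules, so swapping their order flips the prediction. The `classify` and `passes_thresholds` helpers are hypothetical names used only for illustration.

```python
from typing import FrozenSet, List, Optional, Tuple

Rule = Tuple[FrozenSet[str], str]  # (antecedent X, class c)


def passes_thresholds(sup: float, conf: float, minsup: float, minconf: float) -> bool:
    # A candidate rule is kept only if it clears both user-set thresholds.
    return sup >= minsup and conf >= minconf


def classify(rules: List[Rule], items: FrozenSet[str],
             default: Optional[str] = None) -> Optional[str]:
    # First-match: the first rule whose antecedent is a subset of the instance decides.
    for antecedent, label in rules:
        if antecedent <= items:
            return label
    return default  # no rule fires


r1 = (frozenset({"A", "B"}), "y")
r2 = (frozenset({"C"}), "n")
instance = frozenset({"A", "B", "C"})

print(classify([r1, r2], instance))  # y
print(classify([r2, r1], instance))  # n
print(passes_thresholds(sup=0.05, conf=0.9, minsup=0.01, minconf=0.5))  # True
```

Because the prediction depends on rule order, a classifier built from association rules needs a well-defined ranking of its rules, not just a set of them.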
MSR Algorithm
Ideas:
• No redundant rules
  – (A, B) => y
  – (A, B, C) => y adds nothing once (A, B) => y is kept
• Total order of rules
  – "Occam's razor": favor general rules
• Pre-pruning
  – (A, B) => y
  – (A, B, D) => ? is not generated once (A, B) => y is found
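One way to realize the first two ideas, sketched in Python. The slides do not give the exact ordering criteria, so the precedence below (higher confidence, then higher support, then fewer antecedent items so general rules win ties) is an assumption. A rule is treated as redundant when a higher-ranked kept rule with the same class has an antecedent that is a proper subset of its own.

```python
from typing import FrozenSet, List, Tuple

ScoredRule = Tuple[FrozenSet[str], str, float, float]  # (X, c, support, confidence)


def rule_order_key(rule: ScoredRule):
    antecedent, label, sup, conf = rule
    # Assumed precedence: higher confidence, then higher support, then fewer
    # antecedent items (Occam's razor: the more general rule ranks first).
    # The last two fields only make the order total and deterministic.
    return (-conf, -sup, len(antecedent), sorted(antecedent), label)


def order_and_remove_redundant(rules: List[ScoredRule]) -> List[ScoredRule]:
    kept: List[ScoredRule] = []
    for rule in sorted(rules, key=rule_order_key):
        antecedent, label, _, _ = rule
        # Redundant: an already kept (higher-ranked) rule predicts the same class
        # from a proper subset of this antecedent, e.g. (A, B) => y vs (A, B, C) => y.
        if any(k_label == label and k_ante < antecedent
               for k_ante, k_label, _, _ in kept):
            continue
        kept.append(rule)
    return kept


rules = [
    (frozenset({"A", "B", "C"}), "y", 0.25, 1.0),
    (frozenset({"A", "B"}), "y", 0.50, 1.0),
    (frozenset({"C"}), "n", 0.50, 0.67),
]
for antecedent, label, sup, conf in order_and_remove_redundant(rules):
    print(sorted(antecedent), "=>", label, sup, conf)
# prints ['A', 'B'] => y 0.5 1.0, then ['C'] => n 0.5 0.67
```

Under first-match classification a lower-ranked rule whose antecedent contains that of a higher-ranked rule can never fire anyway, so dropping it shrinks the rule set without changing predictions.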
L1 = {large 1-ruleitems};
CAR1 = genRules(L1);
pruneSet(L1);
for (k = 2; L(k-1) ≠ ∅; k++) do begin
    Ck = apriori-gen(L(k-1));
    forall training instances t ∈ D do begin
        Ct = subset(Ck, t);
        forall candidates c ∈ Ct do
            c.count[i]++ for the class label i of t;
    end
    Lk = {c ∈ Ck | c.count[i] ≥ minsup for some class i};
    CARk = genRules(Lk);
    pruneSet(Lk);
end
CARs = union over all k of CARk
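A runnable Python reading of the pseudocode, under stated assumptions: minsup and minconf are fractions; genRules is assumed to emit, for each large itemset, a rule for its majority class when that rule's confidence reaches minconf; pruneSet is assumed to implement pre-pruning by removing every itemset that already produced a rule, so apriori-gen never builds its supersets. Function names mirror the pseudocode, but their bodies are illustrative and not the author's implementation.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set, Tuple

Instance = Tuple[FrozenSet[str], str]            # (items, class label)
Rule = Tuple[FrozenSet[str], str, float, float]  # (X, c, support, confidence)


def apriori_gen(prev: Set[FrozenSet[str]], k: int) -> Set[FrozenSet[str]]:
    """Join (k-1)-itemsets into k-itemsets; keep one only if all its (k-1)-subsets are in prev."""
    candidates: Set[FrozenSet[str]] = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) == k and all(union - {x} in prev for x in union):
                candidates.add(union)
    return candidates


def count_by_class(candidates: Set[FrozenSet[str]],
                   data: List[Instance]) -> Dict[FrozenSet[str], Counter]:
    """For every candidate c, count the instances containing it, per class label."""
    counts = {c: Counter() for c in candidates}
    for items, label in data:
        for c in candidates:
            if c <= items:  # c belongs to subset(Ck, t)
                counts[c][label] += 1
    return counts


def gen_rules(large: Dict[FrozenSet[str], Counter], n: int, minconf: float) -> List[Rule]:
    """Assumed genRules: one rule per itemset, for its majority class, if confident enough."""
    rules: List[Rule] = []
    for itemset, cnt in large.items():
        label, hits = cnt.most_common(1)[0]
        conf = hits / sum(cnt.values())
        if conf >= minconf:
            rules.append((itemset, label, hits / n, conf))
    return rules


def msr_rules(data: List[Instance], minsup: float, minconf: float) -> List[Rule]:
    n = len(data)
    min_count = minsup * n
    candidates = {frozenset({x}) for items, _ in data for x in items}  # 1-ruleitems
    all_rules: List[Rule] = []
    k = 1  # the first pass builds L1 and CAR1
    while candidates:
        counts = count_by_class(candidates, data)
        # Lk: candidates whose count reaches minsup for some class
        large = {c: cnt for c, cnt in counts.items()
                 if cnt and max(cnt.values()) >= min_count}
        rules = gen_rules(large, n, minconf)
        all_rules.extend(rules)
        # Assumed pruneSet: drop itemsets that already yielded a rule (pre-pruning)
        ruled = {r[0] for r in rules}
        survivors = {c for c in large if c not in ruled}
        k += 1
        candidates = apriori_gen(survivors, k)
    return all_rules


data = [
    (frozenset({"A", "B", "C"}), "y"),
    (frozenset({"A", "B"}), "y"),
    (frozenset({"A", "C"}), "n"),
    (frozenset({"B", "C"}), "n"),
]
# One rule: X = {A, B}, c = y, support 0.5, confidence 1.0
print(msr_rules(data, minsup=0.25, minconf=0.8))
```

On this toy data, (A, B) => y is produced at k = 2; because {A, B} is then pruned, apriori-gen rejects {A, B, C}, so the redundant rule (A, B, C) => y is never generated.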
minsup
[Figure: error rate (%) vs. minimum support (%) for the crx, austra and auto datasets]
minconf
[Figure: error rate (%) vs. minimum confidence (%) for the crx, austra and auto datasets]
Results: Error Rate Comparison
[Figure: error rate (%) of C4.5 (R8), CBA (V2) and MSR]
Conclusions
• A new algorithm was designed to build a classification system using a minimum set of association rules.
• In general, a low minsup and a high minconf produce low error rates.
• Experiments on 26 benchmark datasets showed lower error rates than C4.5 (R8) on 17 datasets and lower error rates than CBA (v2.0) on 16.
• The new algorithm does not always produce lower error rates than the other algorithms.