Rule Generation
Given a frequent itemset L, find all non-empty subsets f ⊂ L such that f → L − f satisfies the minimum confidence requirement.
– If {A,B,C,D} is a frequent itemset, the candidate rules are:
ABC → D, ABD → C, ACD → B, BCD → A, A → BCD, B → ACD, C → ABD, D → ABC, AB → CD, AC → BD, AD → BC, BC → AD, BD → AC, CD → AB
If |L| = k, then there are $2^k - 2$ candidate association rules (ignoring L → ∅ and ∅ → L).
[Chapter 6.6.2]
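A minimal Python sketch (not from the slides) that enumerates the candidate rules f → L − f directly; for k = 4 it produces exactly the 14 rules listed above.

from itertools import combinations

def candidate_rules(itemset):
    # all rules f -> L - f with non-empty antecedent and consequent
    L = set(itemset)
    rules = []
    for r in range(1, len(L)):               # antecedent sizes 1 .. k-1
        for f in combinations(sorted(L), r):
            rules.append((set(f), L - set(f)))
    return rules

rules = candidate_rules({'A', 'B', 'C', 'D'})
print(len(rules))   # 2**4 - 2 = 14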
Rule Generation
How to efficiently generate rules from frequent itemsets?
– In general, confidence does not have an anti-monotone property: c(ABC → D) can be larger or smaller than c(AB → D).
– But the confidence of rules generated from the same itemset does have an anti-monotone property.
– E.g., for L = {A,B,C,D}: c(ABC → D) ≥ c(AB → CD) ≥ c(A → BCD). Confidence is anti-monotone w.r.t. the number of items on the RHS of the rule.
Example: $c(\text{Milk, Diaper} \to \text{Beer}) = \frac{\sigma(\text{Milk, Diaper, Beer})}{\sigma(\text{Milk, Diaper})} = \frac{2}{3} = 0.67$
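A minimal sketch of this confidence computation in Python. The five transactions are the standard market-basket example that usually accompanies these slides; treat them as an assumption, since the table itself is not reproduced here.

transactions = [
    {'Bread', 'Milk'},
    {'Bread', 'Diaper', 'Beer', 'Eggs'},
    {'Milk', 'Diaper', 'Beer', 'Coke'},
    {'Bread', 'Milk', 'Diaper', 'Beer'},
    {'Bread', 'Milk', 'Diaper', 'Coke'},
]

def sigma(itemset):
    # support count: number of transactions containing the itemset
    return sum(1 for t in transactions if itemset <= t)

# c(Milk, Diaper -> Beer) = sigma(Milk, Diaper, Beer) / sigma(Milk, Diaper)
conf = sigma({'Milk', 'Diaper', 'Beer'}) / sigma({'Milk', 'Diaper'})
print(round(conf, 2))   # 0.67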
Rule Generation for Apriori Algorithm
[Figure: lattice of rules for {A,B,C,D}, from ABCD → { } at the top, through BCD → A, ACD → B, ABD → C, ABC → D, then BC → AD, BD → AC, CD → AB, AD → BC, AC → BD, AB → CD, down to D → ABC, C → ABD, B → ACD, A → BCD at the bottom. If CD → AB is found to be a low-confidence rule, the rules below it whose consequents contain AB (D → ABC and C → ABD) are pruned without being evaluated.]
[optional]
Rule Generation for Apriori Algorithm
A candidate rule is generated by merging two rules that share the same prefix in the rule consequent:
– join(CD → AB, BD → AC) produces the candidate rule D → ABC
– Prune rule D → ABC if its subset AD → BC does not have high confidence
[Figure: BD → AC and CD → AB merged into D → ABC]
[optional]
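A hedged Python sketch of this level-wise rule generation for one frequent itemset. `sigma` is an assumed support-counting function supplied by the caller; consequents are grown by unioning pairs that differ in one item, a simpler variant of the prefix-based join named above.

from itertools import combinations

def gen_rules(L, sigma, minconf):
    # Grow consequents one item per level; drop a rule, and everything
    # that would be grown from it, once confidence falls below minconf.
    L = frozenset(L)
    level = [frozenset([i]) for i in L
             if sigma(L) / sigma(L - {i}) >= minconf]
    rules = [(L - H, H) for H in level]
    while level and len(level[0]) < len(L) - 1:
        merged = {h1 | h2 for h1, h2 in combinations(level, 2)
                  if len(h1 | h2) == len(h1) + 1}
        level = [H for H in merged if sigma(L) / sigma(L - H) >= minconf]
        rules += [(L - H, H) for H in level]
    return rules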
Midterm Exam 1
(d) List all association rules with minimum confidence minconf = 50% and minimum support minsup = 30%.
Transaction ID Items Bought
1 {a,b,d,e}
2 {b,c,d}
3 {a,b,d,e}
4 {a,c,e}
5 {b,c,d,e}
6 {b,d}
7 {c,d}
8 {a,b,c}
9 {a,d,e}
10 {b,e}
[Figure: itemset lattice over {a,b,c,d,e} with the frequent itemsets marked F.]
Frequent itemsets of size ≥ 2 (minsup = 30%): ab, ad, ae, bc, bd, be, cd, de, ade, bde
[Figure: lattice of rules for ade: ade → { }; de → a, ae → d, ad → e; e → ad, d → ae, a → de.]
$c(A \to B) = \frac{\sigma(A \cup B)}{\sigma(A)}$
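The listed itemsets and rules can be checked by brute force; a small Python sketch over the ten transactions above:

from itertools import combinations

T = [set('abde'), set('bcd'), set('abde'), set('ace'), set('bcde'),
     set('bd'), set('cd'), set('abc'), set('ade'), set('bde')]
items = sorted(set().union(*T))

def supp(s):
    # fraction of transactions containing itemset s
    return sum(1 for t in T if set(s) <= t) / len(T)

# frequent itemsets of size >= 2 at minsup = 30%
freq = [c for r in range(2, len(items) + 1)
        for c in combinations(items, r) if supp(c) >= 0.3]
print([''.join(c) for c in freq])   # ab, ad, ae, bc, bd, be, cd, de, ade, bde

# association rules at minconf = 50%
for L in freq:
    for r in range(1, len(L)):
        for f in combinations(L, r):
            if supp(L) / supp(f) >= 0.5:
                print(''.join(f), '->', ''.join(sorted(set(L) - set(f))))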
Midterm Exam 4
(a) Build a decision tree on the data set by using misclassification error rate as the criterion for splitting.
(b) Build the cost matrix and a new decision tree accordingly.
(c) What are the accuracy, precision, recall, and F1-measure of the new decision tree?
X  Y  # of + instances  # of − instances
0  0   0  100
1  0   0    0
2  0   0  150
0  1  10   50
1  1  20    0
2  1  10  100
0  2   0  100
1  2   0    0
2  2   0  100
Cost of predicting class i when the actual class is j:
$c(i \mid j) = \begin{cases} 0 & \text{if } i = j \\ 3 & \text{if } i = +,\ j = - \\ 2 \cdot \frac{\text{number of } - \text{ instances}}{\text{number of } + \text{ instances}} & \text{if } i = -,\ j = + \end{cases}$
Answer (a). Candidate split on X:
X  +   −
0  10  250
1  20    0
2  10  350
Error rate = (260/640)·(10/260) + (360/640)·(10/360) = 20/640
Candidate split on Y:
Y  +   −
0   0  250
1  40  150
2   0  200
Error rate = (190/640)·(40/190) = 40/640
Split on X first, since 20/640 < 40/640.
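Both error rates reduce to counting the minority-class records in each branch; a minimal Python check:

counts_x = {0: (10, 250), 1: (20, 0), 2: (10, 350)}   # value -> (#+, #-)
counts_y = {0: (0, 250), 1: (40, 150), 2: (0, 200)}

def split_error(counts):
    # each branch is labeled with its majority class, so the minority
    # count in each branch is the number of misclassified records
    n = sum(p + m for p, m in counts.values())
    return sum(min(p, m) for p, m in counts.values()) / n

print(split_error(counts_x))   # 20/640 = 0.03125
print(split_error(counts_y))   # 40/640 = 0.0625  -> split on X first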
[Figure: decision tree for (a), built with misclassification error. Root split on X; the X = 1 branch is a leaf labeled +. The X = 0 and X = 2 branches split on Y:

(X=0)  Y  +   −
       0   0  100
       1  10   50
       2   0  100

(X=2)  Y  +   −
       0   0  150
       1  10  100
       2   0  100

Every Y branch under X = 0 and X = 2 has a − majority, so all of those leaves are labeled −.]
Answer (b). Applying the cost formula with 40 + instances and 600 − instances gives the cost matrix (rows: actual class, columns: predicted class):

            predicted +     predicted −
actual +        0          2·600/40 = 30
actual −        3               0
[Figure: the tree from (a), redrawn for the cost-sensitive setting: X = 1 → +; under X = 0 and X = 2 the Y leaves are marked ?, to be labeled by the expected-cost comparison below.]
Leaf (X=0, Y=1), with 10 + and 50 −: cost(+) = 10·0 + 50·3 = 150, cost(−) = 10·30 + 50·0 = 300, so label it +.
Leaf (X=2, Y=1), with 10 + and 100 −: cost(+) = 10·0 + 100·3 = 300, cost(−) = 10·30 + 100·0 = 300, a tie: + or −.
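A small Python sketch of this cost-based leaf labeling, assuming C[(i, j)] is the cost of predicting i when the actual class is j, as in the matrix above:

C = {('+', '+'): 0, ('-', '+'): 30,   # missing a + costs 30
     ('+', '-'): 3, ('-', '-'): 0}    # a false alarm costs 3

def leaf_label(n_pos, n_neg):
    # pick the label with the lower expected cost over the leaf's records
    cost_plus  = n_pos * C[('+', '+')] + n_neg * C[('+', '-')]
    cost_minus = n_pos * C[('-', '+')] + n_neg * C[('-', '-')]
    return min(('+', cost_plus), ('-', cost_minus), key=lambda t: t[1])

print(leaf_label(10, 50))    # ('+', 150): cheaper than labeling it '-'
print(leaf_label(10, 100))   # ('+', 300): a tie, either label is optimal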
[Figure: final cost-sensitive tree: X = 1 → +; X = 0, Y = 1 → +; all remaining leaves → −, with the X = 2, Y = 1 tie resolved as −.]
Answer (c). Confusion matrix of the new tree (rows: actual class, columns: predicted class):

            predicted +   predicted −
actual +        30             10
actual −        50            550

Accuracy = (30 + 550)/640 ≈ 0.91
Precision = 30/(30 + 50) = 0.375
Recall = 30/(30 + 10) = 0.75
F-measure = 2·30/(2·30 + 10 + 50) = 0.5
With confusion-matrix entries a = TP, b = FN, c = FP, d = TN:

Precision: $p = \frac{a}{a + c}$
Recall: $r = \frac{a}{a + b}$
F-measure: $F = \frac{2rp}{r + p} = \frac{2a}{2a + b + c}$
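Plugging the counts into these formulas, as a quick Python check (TN = 550 assumes the X = 2, Y = 1 tie is resolved as −):

a, b, c, d = 30, 10, 50, 550   # TP, FN, FP, TN
accuracy  = (a + d) / (a + b + c + d)
precision = a / (a + c)
recall    = a / (a + b)
f1        = 2 * a / (2 * a + b + c)
print(accuracy, precision, recall, f1)   # 0.90625 0.375 0.75 0.5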
Bayes' Theorem
Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes' theorem:
$P(H \mid X) = \frac{P(X \mid H)\,P(H)}{P(X)}$
Informally: posterior = likelihood × prior / evidence.
Predict that X belongs to Ci iff P(Ci|X) is the highest among all P(Ck|X) for the k classes.
Practical difficulty: requires initial knowledge of many probabilities, at significant computational cost.
Naïve Bayes Classifier
A simplifying assumption: attributes are conditionally independent given the class, and each data sample has n attributes.
No dependence relation between attributes. By Bayes' theorem:
$P(C_i \mid X) = \frac{P(X \mid C_i)\,P(C_i)}{P(X)}$
As P(X) is constant for all classes, assign X to the class with maximum P(X|Ci)·P(Ci).
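A minimal naïve Bayes sketch for categorical attributes. Illustrative only; the toy weather data is an assumption, not from the slides.

from collections import Counter, defaultdict

def nb_fit(X, y):
    # relative-frequency estimates of P(Ci) and P(attribute j = v | Ci)
    prior = Counter(y)
    cond = defaultdict(Counter)
    for row, c in zip(X, y):
        for j, v in enumerate(row):
            cond[c][(j, v)] += 1
    return prior, cond

def nb_predict(row, prior, cond):
    n = sum(prior.values())
    best, best_score = None, -1.0
    for c, nc in prior.items():
        score = nc / n                     # P(Ci)
        for j, v in enumerate(row):        # times prod_j P(x_j | Ci)
            score *= cond[c][(j, v)] / nc
        if score > best_score:
            best, best_score = c, score
    return best

X = [('sunny', 'hot'), ('sunny', 'mild'), ('rainy', 'mild'), ('rainy', 'hot')]
y = ['no', 'no', 'yes', 'no']
print(nb_predict(('rainy', 'mild'), *nb_fit(X, y)))   # 'yes'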
Bayesian Networks
A Bayesian belief network allows a subset of the variables to be conditionally independent.
A graphical model of causal relationships:
– Represents dependencies among the variables
– Gives a specification of the joint probability distribution
[Figure: a small network with nodes X, Y, Z, P and arcs X → Z, Y → Z, Y → P.]
– Nodes: random variables
– Links: dependency
– X and Y are the parents of Z, and Y is the parent of P
– No dependency between Z and P
– Has no loops or cycles
Bayesian Belief Network: An Example
[Figure: network over the variables FamilyHistory, Smoker, LungCancer, Emphysema, PositiveXRay, Dyspnea; FamilyHistory and Smoker are the parents of LungCancer.]
The CPT for LungCancer:

       (FH, S)  (FH, ~S)  (~FH, S)  (~FH, ~S)
LC       0.8      0.5       0.7       0.1
~LC      0.2      0.5       0.3       0.9
The CPT for the variable LungCancer shows the conditional probability of each value of LungCancer for each possible combination of values of its parents.
• One conditional probability table (CPT) for each variable
The probability of a particular combination of values x1, …, xn is derived from the CPTs:
$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid Parents(Y_i))$
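As a concrete instance, assuming the arcs usually drawn for this example (FamilyHistory → LungCancer, Smoker → LungCancer, Smoker → Emphysema, LungCancer → PositiveXRay, and LungCancer, Emphysema → Dyspnea):

$P(FH, S, LC, E, PX, D) = P(FH)\,P(S)\,P(LC \mid FH, S)\,P(E \mid S)\,P(PX \mid LC)\,P(D \mid LC, E)$

where, e.g., the CPT above supplies $P(LC \mid FH, S) = 0.8$.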
Midterm Exam 5
(a) Draw the probability table for each node in the network.
(b) Use the Bayesian network to compute P(Engine = Bad, Air Conditioner = Broken).
Mileage  Engine  Air Conditioner  # Records with Car Value = Hi  # Records with Car Value = Lo
Hi       Good    Working           3                              4
Hi       Good    Broken            1                              2
Hi       Bad     Working           1                              5
Hi       Bad     Broken            0                              4
Lo       Good    Working          10                              0
Lo       Good    Broken            3                              1
Lo       Bad     Working           2                              2
Lo       Bad     Broken            0                              2
Answer (a).
P(M = Hi) = 0.5, P(M = Lo) = 0.5
P(A = W) = 27/40, P(A = B) = 13/40
P(E = G | M = Hi) = 0.5, P(E = B | M = Hi) = 0.5
P(E = G | M = Lo) = 0.7, P(E = B | M = Lo) = 0.3
P(C = Hi | E = G, A = W) = 13/17, P(C = Lo | E = G, A = W) = 4/17
P(C = Hi | E = G, A = B) = 4/7, P(C = Lo | E = G, A = B) = 3/7
P(C = Hi | E = B, A = W) = 0.3, P(C = Lo | E = B, A = W) = 0.7
P(C = Hi | E = B, A = B) = 0, P(C = Lo | E = B, A = B) = 1
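A worked computation for part (b), assuming the structure the tables imply (Mileage → Engine → Car Value ← Air Conditioner, with Mileage and Air Conditioner independent roots):

$P(E = \text{Bad}, A = \text{Broken}) = P(A = \text{Broken}) \sum_{m \in \{\text{Hi}, \text{Lo}\}} P(E = \text{Bad} \mid M = m)\,P(M = m) = \frac{13}{40}(0.5 \cdot 0.5 + 0.3 \cdot 0.5) = 0.325 \times 0.4 = 0.13$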