Rule Generation
Given a frequent itemset L, find all non-empty subsets f ⊂ L such that f → L − f satisfies the minimum confidence requirement.
– If {A,B,C,D} is a frequent itemset, the candidate rules are:
ABC → D, ABD → C, ACD → B, BCD → A, A → BCD, B → ACD, C → ABD, D → ABC, AB → CD, AC → BD, AD → BC, BC → AD, BD → AC, CD → AB
If |L| = k, then there are $2^k - 2$ candidate association rules (ignoring L → ∅ and ∅ → L).
[Chapter 6.6.2]
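A minimal Python sketch (not from the slides) that enumerates the candidate rules f → L − f directly; for k = 4 it produces exactly the 14 rules listed above.

from itertools import combinations

def candidate_rules(itemset):
    # all rules f -> L - f with non-empty antecedent and consequent
    L = set(itemset)
    rules = []
    for r in range(1, len(L)):               # antecedent sizes 1 .. k-1
        for f in combinations(sorted(L), r):
            rules.append((set(f), L - set(f)))
    return rules

rules = candidate_rules({'A', 'B', 'C', 'D'})
print(len(rules))   # 2**4 - 2 = 14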
Rule Generation
How to efficiently generate rules from frequent itemsets?
– In general, confidence does not have an anti-monotone property: c(ABC → D) can be larger or smaller than c(AB → D).
– But the confidence of rules generated from the same itemset does have an anti-monotone property.
– E.g., for L = {A,B,C,D}: c(ABC → D) ≥ c(AB → CD) ≥ c(A → BCD). Confidence is anti-monotone w.r.t. the number of items on the RHS of the rule.
Example: $c(\text{Milk, Diaper} \to \text{Beer}) = \frac{\sigma(\text{Milk, Diaper, Beer})}{\sigma(\text{Milk, Diaper})} = \frac{2}{3} = 0.67$
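A minimal sketch of this confidence computation in Python. The five transactions are the standard market-basket example that usually accompanies these slides; treat them as an assumption, since the table itself is not reproduced here.

transactions = [
    {'Bread', 'Milk'},
    {'Bread', 'Diaper', 'Beer', 'Eggs'},
    {'Milk', 'Diaper', 'Beer', 'Coke'},
    {'Bread', 'Milk', 'Diaper', 'Beer'},
    {'Bread', 'Milk', 'Diaper', 'Coke'},
]

def sigma(itemset):
    # support count: number of transactions containing the itemset
    return sum(1 for t in transactions if itemset <= t)

# c(Milk, Diaper -> Beer) = sigma(Milk, Diaper, Beer) / sigma(Milk, Diaper)
conf = sigma({'Milk', 'Diaper', 'Beer'}) / sigma({'Milk', 'Diaper'})
print(round(conf, 2))   # 0.67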
Rule Generation for Apriori Algorithm
[Figure: lattice of rules for {A,B,C,D}, from ABCD → { } at the top, through BCD → A, ACD → B, ABD → C, ABC → D, then BC → AD, BD → AC, CD → AB, AD → BC, AC → BD, AB → CD, down to D → ABC, C → ABD, B → ACD, A → BCD at the bottom. If CD → AB is found to be a low-confidence rule, the rules below it whose consequents contain AB (D → ABC and C → ABD) are pruned without being evaluated.]
[optional]
Rule Generation for Apriori Algorithm
A candidate rule is generated by merging two rules that share the same prefix in the rule consequent:
– join(CD → AB, BD → AC) produces the candidate rule D → ABC
– Prune rule D → ABC if its subset AD → BC does not have high confidence
[Figure: BD → AC and CD → AB merged into D → ABC]
[optional]
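A hedged Python sketch of this level-wise rule generation for one frequent itemset. `sigma` is an assumed support-counting function supplied by the caller; consequents are grown by unioning pairs that differ in one item, a simpler variant of the prefix-based join named above.

from itertools import combinations

def gen_rules(L, sigma, minconf):
    # Grow consequents one item per level; drop a rule, and everything
    # that would be grown from it, once confidence falls below minconf.
    L = frozenset(L)
    level = [frozenset([i]) for i in L
             if sigma(L) / sigma(L - {i}) >= minconf]
    rules = [(L - H, H) for H in level]
    while level and len(level[0]) < len(L) - 1:
        merged = {h1 | h2 for h1, h2 in combinations(level, 2)
                  if len(h1 | h2) == len(h1) + 1}
        level = [H for H in merged if sigma(L) / sigma(L - H) >= minconf]
        rules += [(L - H, H) for H in level]
    return rules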
Midterm Exam 1
(d) List all association rules with minimum confidence minconf = 50% and minimum support minsup = 30%.
Transaction ID Items Bought
1 {a,b,d,e}
2 {b,c,d}
3 {a,b,d,e}
4 {a,c,e}
5 {b,c,d,e}
6 {b,d}
7 {c,d}
8 {a,b,c}
9 {a,d,e}
10 {b,e}
[Figure: itemset lattice over {a,b,c,d,e} with the frequent itemsets marked F.]
Frequent itemsets of size ≥ 2 (minsup = 30%): ab, ad, ae, bc, bd, be, cd, de, ade, bde
[Figure: lattice of rules for ade: ade → { }; de → a, ae → d, ad → e; e → ad, d → ae, a → de.]
$c(A \to B) = \frac{\sigma(A \cup B)}{\sigma(A)}$
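The listed itemsets and rules can be checked by brute force; a small Python sketch over the ten transactions above:

from itertools import combinations

T = [set('abde'), set('bcd'), set('abde'), set('ace'), set('bcde'),
     set('bd'), set('cd'), set('abc'), set('ade'), set('bde')]
items = sorted(set().union(*T))

def supp(s):
    # fraction of transactions containing itemset s
    return sum(1 for t in T if set(s) <= t) / len(T)

# frequent itemsets of size >= 2 at minsup = 30%
freq = [c for r in range(2, len(items) + 1)
        for c in combinations(items, r) if supp(c) >= 0.3]
print([''.join(c) for c in freq])   # ab, ad, ae, bc, bd, be, cd, de, ade, bde

# association rules at minconf = 50%
for L in freq:
    for r in range(1, len(L)):
        for f in combinations(L, r):
            if supp(L) / supp(f) >= 0.5:
                print(''.join(f), '->', ''.join(sorted(set(L) - set(f))))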
Midterm Exam 4
(a) Build a decision tree on the data set by using misclassification error rate as the criterion for splitting.
(b) Build the cost matrix and a new decision tree accordingly.
(c) What are the accuracy, precision, recall, and F1-measure of the new decision tree?
X  Y  # of + instances  # of − instances
0  0   0  100
1  0   0    0
2  0   0  150
0  1  10   50
1  1  20    0
2  1  10  100
0  2   0  100
1  2   0    0
2  2   0  100
Cost of predicting class i when the actual class is j:
$c(i \mid j) = \begin{cases} 0 & \text{if } i = j \\ 3 & \text{if } i = +,\ j = - \\ 2 \cdot \frac{\text{number of } - \text{ instances}}{\text{number of } + \text{ instances}} & \text{if } i = -,\ j = + \end{cases}$
Answer (a). Candidate split on X:
X  +   −
0  10  250
1  20    0
2  10  350
Error rate = (260/640)·(10/260) + (360/640)·(10/360) = 20/640
Candidate split on Y:
Y  +   −
0   0  250
1  40  150
2   0  200
Error rate = (190/640)·(40/190) = 40/640
Split on X first, since 20/640 < 40/640.
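Both error rates reduce to counting the minority-class records in each branch; a minimal Python check:

counts_x = {0: (10, 250), 1: (20, 0), 2: (10, 350)}   # value -> (#+, #-)
counts_y = {0: (0, 250), 1: (40, 150), 2: (0, 200)}

def split_error(counts):
    # each branch is labeled with its majority class, so the minority
    # count in each branch is the number of misclassified records
    n = sum(p + m for p, m in counts.values())
    return sum(min(p, m) for p, m in counts.values()) / n

print(split_error(counts_x))   # 20/640 = 0.03125
print(split_error(counts_y))   # 40/640 = 0.0625  -> split on X first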
[Figure: decision tree for (a), built with misclassification error. Root split on X; the X = 1 branch is a leaf labeled +. The X = 0 and X = 2 branches split on Y:

(X=0)  Y  +   −
       0   0  100
       1  10   50
       2   0  100

(X=2)  Y  +   −
       0   0  150
       1  10  100
       2   0  100

Every Y branch under X = 0 and X = 2 has a − majority, so all of those leaves are labeled −.]
Answer (b). Applying the cost formula with 40 + instances and 600 − instances gives the cost matrix (rows: actual class, columns: predicted class):

            predicted +     predicted −
actual +        0          2·600/40 = 30
actual −        3               0
[Figure: the tree from (a), redrawn for the cost-sensitive setting: X = 1 → +; under X = 0 and X = 2 the Y leaves are marked ?, to be labeled by the expected-cost comparison below.]
Leaf (X=0, Y=1), with 10 + and 50 −: cost(+) = 10·0 + 50·3 = 150, cost(−) = 10·30 + 50·0 = 300, so label it +.
Leaf (X=2, Y=1), with 10 + and 100 −: cost(+) = 10·0 + 100·3 = 300, cost(−) = 10·30 + 100·0 = 300, a tie: + or −.
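A small Python sketch of this cost-based leaf labeling, assuming C[(i, j)] is the cost of predicting i when the actual class is j, as in the matrix above:

C = {('+', '+'): 0, ('-', '+'): 30,   # missing a + costs 30
     ('+', '-'): 3, ('-', '-'): 0}    # a false alarm costs 3

def leaf_label(n_pos, n_neg):
    # pick the label with the lower expected cost over the leaf's records
    cost_plus  = n_pos * C[('+', '+')] + n_neg * C[('+', '-')]
    cost_minus = n_pos * C[('-', '+')] + n_neg * C[('-', '-')]
    return min(('+', cost_plus), ('-', cost_minus), key=lambda t: t[1])

print(leaf_label(10, 50))    # ('+', 150): cheaper than labeling it '-'
print(leaf_label(10, 100))   # ('+', 300): a tie, either label is optimal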
[Figure: final cost-sensitive tree: X = 1 → +; X = 0, Y = 1 → +; all remaining leaves → −, with the X = 2, Y = 1 tie resolved as −.]
Answer (c). Confusion matrix of the new tree (rows: actual class, columns: predicted class):

            predicted +   predicted −
actual +        30             10
actual −        50            550

Accuracy = (30 + 550)/640 ≈ 0.91
Precision = 30/(30 + 50) = 0.375
Recall = 30/(30 + 10) = 0.75
F-measure = 2·30/(2·30 + 10 + 50) = 0.5
With confusion-matrix entries a = TP, b = FN, c = FP, d = TN:

Precision: $p = \frac{a}{a + c}$
Recall: $r = \frac{a}{a + b}$
F-measure: $F = \frac{2rp}{r + p} = \frac{2a}{2a + b + c}$
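Plugging the counts into these formulas, as a quick Python check (TN = 550 assumes the X = 2, Y = 1 tie is resolved as −):

a, b, c, d = 30, 10, 50, 550   # TP, FN, FP, TN
accuracy  = (a + d) / (a + b + c + d)
precision = a / (a + c)
recall    = a / (a + b)
f1        = 2 * a / (2 * a + b + c)
print(accuracy, precision, recall, f1)   # 0.90625 0.375 0.75 0.5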
Bayes' Theorem
Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes' theorem:
$P(H \mid X) = \frac{P(X \mid H)\,P(H)}{P(X)}$
Informally: posterior = likelihood × prior / evidence.
Predict that X belongs to Ci iff P(Ci|X) is the highest among all P(Ck|X) for the k classes.
Practical difficulty: requires initial knowledge of many probabilities, at significant computational cost.
Naïve Bayes Classifier
A simplifying assumption: attributes are conditionally independent given the class, and each data sample has n attributes.
No dependence relation between attributes. By Bayes' theorem:
$P(C_i \mid X) = \frac{P(X \mid C_i)\,P(C_i)}{P(X)}$
As P(X) is constant for all classes, assign X to the class with maximum P(X|Ci)·P(Ci).
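A minimal naïve Bayes sketch for categorical attributes. Illustrative only; the toy weather data is an assumption, not from the slides.

from collections import Counter, defaultdict

def nb_fit(X, y):
    # relative-frequency estimates of P(Ci) and P(attribute j = v | Ci)
    prior = Counter(y)
    cond = defaultdict(Counter)
    for row, c in zip(X, y):
        for j, v in enumerate(row):
            cond[c][(j, v)] += 1
    return prior, cond

def nb_predict(row, prior, cond):
    n = sum(prior.values())
    best, best_score = None, -1.0
    for c, nc in prior.items():
        score = nc / n                     # P(Ci)
        for j, v in enumerate(row):        # times prod_j P(x_j | Ci)
            score *= cond[c][(j, v)] / nc
        if score > best_score:
            best, best_score = c, score
    return best

X = [('sunny', 'hot'), ('sunny', 'mild'), ('rainy', 'mild'), ('rainy', 'hot')]
y = ['no', 'no', 'yes', 'no']
print(nb_predict(('rainy', 'mild'), *nb_fit(X, y)))   # 'yes'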
Bayesian Networks
A Bayesian belief network allows a subset of the variables to be conditionally independent.
A graphical model of causal relationships:
– Represents dependencies among the variables
– Gives a specification of the joint probability distribution
[Figure: a small network with nodes X, Y, Z, P and arcs X → Z, Y → Z, Y → P.]
– Nodes: random variables
– Links: dependency
– X and Y are the parents of Z, and Y is the parent of P
– No dependency between Z and P
– Has no loops or cycles
Bayesian Belief Network: An Example
[Figure: network over the variables FamilyHistory, Smoker, LungCancer, Emphysema, PositiveXRay, Dyspnea; FamilyHistory and Smoker are the parents of LungCancer.]
The CPT for LungCancer:

       (FH, S)  (FH, ~S)  (~FH, S)  (~FH, ~S)
LC       0.8      0.5       0.7       0.1
~LC      0.2      0.5       0.3       0.9
The CPT for the variable LungCancer shows the conditional probability of each value of LungCancer for each possible combination of values of its parents.
• One conditional probability table (CPT) for each variable
The probability of a particular combination of values x1, …, xn is derived from the CPTs:
$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid Parents(Y_i))$
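As a concrete instance, assuming the arcs usually drawn for this example (FamilyHistory → LungCancer, Smoker → LungCancer, Smoker → Emphysema, LungCancer → PositiveXRay, and LungCancer, Emphysema → Dyspnea):

$P(FH, S, LC, E, PX, D) = P(FH)\,P(S)\,P(LC \mid FH, S)\,P(E \mid S)\,P(PX \mid LC)\,P(D \mid LC, E)$

where, e.g., the CPT above supplies $P(LC \mid FH, S) = 0.8$.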
Midterm Exam 5
(a) Draw the probability table for each node in the network.
(b) Use the Bayesian network to compute P(Engine = Bad, Air Conditioner = Broken).
Mileage  Engine  Air Conditioner  # Records with Car Value = Hi  # Records with Car Value = Lo
Hi       Good    Working           3                              4
Hi       Good    Broken            1                              2
Hi       Bad     Working           1                              5
Hi       Bad     Broken            0                              4
Lo       Good    Working          10                              0
Lo       Good    Broken            3                              1
Lo       Bad     Working           2                              2
Lo       Bad     Broken            0                              2
Answer (a).
P(M = Hi) = 0.5, P(M = Lo) = 0.5
P(A = W) = 27/40, P(A = B) = 13/40
P(E = G | M = Hi) = 0.5, P(E = B | M = Hi) = 0.5
P(E = G | M = Lo) = 0.7, P(E = B | M = Lo) = 0.3
P(C = Hi | E = G, A = W) = 13/17, P(C = Lo | E = G, A = W) = 4/17
P(C = Hi | E = G, A = B) = 4/7, P(C = Lo | E = G, A = B) = 3/7
P(C = Hi | E = B, A = W) = 0.3, P(C = Lo | E = B, A = W) = 0.7
P(C = Hi | E = B, A = B) = 0, P(C = Lo | E = B, A = B) = 1
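A worked computation for part (b), assuming the structure the tables imply (Mileage → Engine → Car Value ← Air Conditioner, with Mileage and Air Conditioner independent roots):

$P(E = \text{Bad}, A = \text{Broken}) = P(A = \text{Broken}) \sum_{m \in \{\text{Hi}, \text{Lo}\}} P(E = \text{Bad} \mid M = m)\,P(M = m) = \frac{13}{40}(0.5 \cdot 0.5 + 0.3 \cdot 0.5) = 0.325 \times 0.4 = 0.13$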