Naive Bayes Classifier

Lecturer: Ji Liu

This is not me :-). He is Bayes

Most slides are from Eamonn Keogh.

Key of Bayes Classifiers

Single attribute = “name”

More Attributes (Features)

● In the “policewoman” case, we only consider a single attribute “name” to predict the gender;

● What if the number of attributes is more than one (a more general case)?

● The way to estimate the class prior p(cj) is the same

● Then how to estimate p(d|cj)?

This is the key assumption for NAIVE Bayes!!!
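Written out, the assumption is that the attributes are conditionally independent given the class. A sketch in the notation used elsewhere in these slides (d = <d1, ..., dn>, class cj) follows; the second relation is standard Bayes' rule and is added here as context, not quoted from the slide:

```latex
p(d \mid c_j) = \prod_{i=1}^{n} p(d_i \mid c_j),
\qquad
p(c_j \mid d) \propto p(c_j) \prod_{i=1}^{n} p(d_i \mid c_j)
```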

How to estimate?

From Jiawei Han's slides

Play-tennis example: estimating P(di|c)

Outlook   Temperature  Humidity  Windy  Class
sunny     hot          high      false  N
sunny     hot          high      true   N
overcast  hot          high      false  P
rain      mild         high      false  P
rain      cool         normal    false  P
rain      cool         normal    true   N
overcast  cool         normal    true   P
sunny     mild         high      false  N
sunny     cool         normal    false  P
rain      mild         normal    false  P
sunny     mild         normal    true   P
overcast  mild         high      true   P
overcast  hot          normal    false  P
rain      mild         high      true   N

outlook
P(sunny|p) = 2/9       P(sunny|n) = 3/5
P(overcast|p) = 4/9    P(overcast|n) = 0
P(rain|p) = 3/9        P(rain|n) = 2/5

temperature
P(hot|p) = 2/9         P(hot|n) = 2/5
P(mild|p) = 4/9        P(mild|n) = 2/5
P(cool|p) = 3/9        P(cool|n) = 1/5

humidity
P(high|p) = 3/9        P(high|n) = 4/5
P(normal|p) = 6/9      P(normal|n) = 1/5

windy
P(true|p) = 3/9        P(true|n) = 3/5
P(false|p) = 6/9       P(false|n) = 2/5

P(p) = 9/14

P(n) = 5/14
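The estimates above are relative frequencies: count how often each attribute value occurs within a class and divide by the class size. A minimal Python sketch of that counting, assuming the 14 rows of the play-tennis table; the names `ROWS`, `ATTRIBUTES`, and `estimate` are illustrative and not from the slides:

```python
from collections import Counter, defaultdict
from fractions import Fraction

ATTRIBUTES = ("outlook", "temperature", "humidity", "windy")

# The 14 training rows of the play-tennis table; the last field is the class.
ROWS = [
    ("sunny", "hot", "high", "false", "n"),
    ("sunny", "hot", "high", "true", "n"),
    ("overcast", "hot", "high", "false", "p"),
    ("rain", "mild", "high", "false", "p"),
    ("rain", "cool", "normal", "false", "p"),
    ("rain", "cool", "normal", "true", "n"),
    ("overcast", "cool", "normal", "true", "p"),
    ("sunny", "mild", "high", "false", "n"),
    ("sunny", "cool", "normal", "false", "p"),
    ("rain", "mild", "normal", "false", "p"),
    ("sunny", "mild", "normal", "true", "p"),
    ("overcast", "mild", "high", "true", "p"),
    ("overcast", "hot", "normal", "false", "p"),
    ("rain", "mild", "high", "true", "n"),
]


def estimate(rows):
    """Return (priors, likelihoods) as exact relative frequencies."""
    class_counts = Counter(row[-1] for row in rows)
    value_counts = defaultdict(Counter)  # (attribute, class) -> value counts
    for *values, label in rows:
        for attribute, value in zip(ATTRIBUTES, values):
            value_counts[(attribute, label)][value] += 1
    priors = {c: Fraction(k, len(rows)) for c, k in class_counts.items()}
    likelihoods = {
        (attribute, value, c): Fraction(count, class_counts[c])
        for (attribute, c), counter in value_counts.items()
        for value, count in counter.items()
    }
    return priors, likelihoods


priors, likelihoods = estimate(ROWS)
print(priors["p"], priors["n"])                # 9/14 5/14
print(likelihoods[("outlook", "sunny", "n")])  # 3/5
# Value/class pairs that never occur together are absent, so default to 0.
print(likelihoods.get(("outlook", "overcast", "n"), Fraction(0)))  # 0
```

`Fraction` reduces automatically, so 3/9 is printed as 1/3. Pairs that never co-occur, such as overcast with class n, get probability 0 here; no smoothing is applied in this sketch.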

Issues for Naive Bayes

● p(d|cj) = p(d1|cj) * p(d2|cj) * ... * p(dn|cj) can be a tiny number, small enough to underflow floating-point arithmetic. How to deal with this numerical issue in practice?

● Compute log p(d|cj) instead of p(d|cj)

– log p(d|cj) = log p(d1|cj) + log p(d2|cj) + … + log p(dn|cj)
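A small Python sketch of why the log form helps. The 1000 factors of 0.01 are a made-up example, not data from the slides, and `log_likelihood` is an illustrative helper name. Because log is monotonically increasing, comparing log scores selects the same class as comparing the raw products.

```python
import math


def log_likelihood(probabilities):
    """Sum of log-probabilities; -inf if any factor is zero."""
    return sum(math.log(p) if p > 0 else -math.inf for p in probabilities)


# Hypothetical document with 1000 attributes, each with p(di|cj) = 0.01.
factors = [0.01] * 1000

print(math.prod(factors))       # 0.0: the direct product underflows
print(log_likelihood(factors))  # about -4605.17, still usable for comparison
```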

Play-tennis example: classifying d

An unseen sample d = <rain, hot, high, false>

P(<rain, hot, high, false> | p) * P(p) = P(rain|p) * P(hot|p) * P(high|p) * P(false|p) * P(p)

P(d|p)·P(p) = P(rain|p)·P(hot|p)·P(high|p)·P(false|p)·P(p) = 3/9·2/9·3/9·6/9·9/14 = 0.010582

P(d|n)·P(n) = P(rain|n)·P(hot|n)·P(high|n)·P(false|n)·P(n) = 2/5·2/5·4/5·2/5·5/14 = 0.018286

Since P(d|n)·P(n) > P(d|p)·P(p), sample d is classified as class n (don’t play)
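The same comparison can be reproduced with a few lines of Python. This is a sketch that hard-codes only the probabilities used for this sample; `PRIOR`, `LIKELIHOOD`, and `score` are illustrative names, and exact fractions are used so the result matches the hand calculation.

```python
from fractions import Fraction as F

PRIOR = {"p": F(9, 14), "n": F(5, 14)}

# Only the conditional probabilities needed for d = <rain, hot, high, false>.
LIKELIHOOD = {
    "p": {"rain": F(3, 9), "hot": F(2, 9), "high": F(3, 9), "false": F(6, 9)},
    "n": {"rain": F(2, 5), "hot": F(2, 5), "high": F(4, 5), "false": F(2, 5)},
}


def score(sample, label):
    """Unnormalized posterior P(d|c) * P(c) under the naive Bayes assumption."""
    result = PRIOR[label]
    for value in sample:
        result *= LIKELIHOOD[label][value]
    return result


d = ("rain", "hot", "high", "false")
scores = {label: score(d, label) for label in PRIOR}
best = max(scores, key=scores.get)

print({label: round(float(s), 6) for label, s in scores.items()})
# {'p': 0.010582, 'n': 0.018286}
print(best)  # n
```

The scores are not normalized; dividing each by their sum would give the posterior probabilities, but the larger score already determines the predicted class.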
