On Discriminative vs. Generative Classifiers: Naïve Bayes
Presenter: Seung Hwan Bae


Andrew Y. Ng and Michael I. Jordan, Neural Information Processing Systems (NIPS), 2001
(Slides adapted from Ke Chen, University of Manchester, and Yangqiu Song, MSRA)
Total citations: 831


Machine Learning


Generative vs. Discriminative Classifiers

Training classifiers involves estimating f: X -> Y, or P(Y|X)
– X: training data, Y: labels

Discriminative classifiers
– Assume some functional form for P(Y|X)
– Estimate parameters of P(Y|X) directly from training data

Generative classifiers (also called 'informative' by Rubinstein & Hastie)
– Assume some functional form for P(X|Y), P(Y)
– Estimate parameters of P(X|Y), P(Y) directly from training data
– Use Bayes rule to calculate P(Y|X)


Bayes Formula


Generative Model

[Figure: a generative model of the input features (Color, Size, Texture, Weight, …) for each class]


Discriminative Model

Logistic Regression

[Figure: a discriminative model mapping the input features (Color, Size, Texture, Weight, …) directly to the class label]


Comparison

Generative models
– Assume some functional form for P(X|Y), P(Y)
– Estimate parameters of P(X|Y), P(Y) directly from training data
– Use Bayes rule to calculate P(Y|X = x)

Discriminative models
– Directly assume some functional form for P(Y|X)
– Estimate parameters of P(Y|X) directly from training data

[Figure: graphical models relating the class Y to the features X1, X2, for Naïve Bayes (generative) and Logistic Regression (discriminative)]
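A minimal sketch (added for illustration, not part of the original slides) contrasting the two approaches on hypothetical synthetic data, assuming NumPy and scikit-learn are available: GaussianNB estimates P(X|Y) and P(Y) and applies Bayes rule, while LogisticRegression fits P(Y|X) directly.

import numpy as np
from sklearn.naive_bayes import GaussianNB            # generative: models P(X|Y), P(Y)
from sklearn.linear_model import LogisticRegression   # discriminative: models P(Y|X)

rng = np.random.default_rng(0)
# Two Gaussian blobs, one per class (hypothetical data, for illustration only).
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(2.0, 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

gen = GaussianNB().fit(X, y)            # estimates class priors and per-class means/variances
disc = LogisticRegression().fit(X, y)   # estimates the weights of P(Y|X) directly

x_new = np.array([[1.0, 1.0]])
print("Generative posterior P(Y|x):   ", gen.predict_proba(x_new))
print("Discriminative posterior P(Y|x):", disc.predict_proba(x_new))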


Probability Basics

• Prior, conditional and joint probability for random variables
– Prior probability: P(X)
– Conditional probability: P(X_1|X_2), P(X_2|X_1)
– Joint probability: X = (X_1, X_2), P(X) = P(X_1, X_2)
– Relationship: P(X_1, X_2) = P(X_2|X_1) P(X_1) = P(X_1|X_2) P(X_2)
– Independence: P(X_2|X_1) = P(X_2), P(X_1|X_2) = P(X_1), P(X_1, X_2) = P(X_1) P(X_2)

• Bayesian Rule

    P(C|X) = P(X|C) P(C) / P(X)    (Posterior = Likelihood × Prior / Evidence)


Probabilistic Classification

Establishing a probabilistic model for classification
– Discriminative model: P(C|X), where C = c_1, ..., c_L and X = (X_1, ..., X_n)

[Figure: a discriminative probabilistic classifier takes the input x = (x_1, x_2, ..., x_n) and directly outputs the posteriors P(c_1|x), P(c_2|x), ..., P(c_L|x)]


Probabilistic Classification

Establishing a probabilistic model for classification (cont.)
– Generative model: P(X|C), where C = c_1, ..., c_L and X = (X_1, ..., X_n)

[Figure: one generative probabilistic model per class; for each class c_i (i = 1, 2, ..., L), the model takes x = (x_1, x_2, ..., x_n) and outputs the class-conditional likelihood P(x|c_i)]


Probabilistic Classification

MAP classification rule
– MAP: Maximum A Posteriori
– Assign x to c* if P(C = c*|X = x) > P(C = c|X = x) for c ≠ c*, c = c_1, ..., c_L

Generative classification with the MAP rule
– Apply the Bayesian rule to convert class-conditional likelihoods into posterior probabilities:

    P(C = c_i|X = x) = P(X = x|C = c_i) P(C = c_i) / P(X = x)
                     ∝ P(X = x|C = c_i) P(C = c_i),   for i = 1, 2, ..., L

  (the evidence P(X = x) is the same for every class, so it can be dropped when comparing)
– Then apply the MAP rule


Naïve Bayes

Bayes classification

    P(C|X) ∝ P(X|C) P(C) = P(X_1, ..., X_n|C) P(C)

– Difficulty: learning the joint probability P(X_1, ..., X_n|C)
– If the number of features n is large, or a feature can take on many values, then basing such a model on probability tables is infeasible.
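As a concrete illustration (added here, not on the original slide): with n binary attributes, a full table for P(X_1, ..., X_n|C = c) has 2^n − 1 free parameters per class, whereas the conditionally independent factorization introduced on the next slide needs only n per class. For n = 30:

    2^30 − 1 ≈ 1.07 × 10^9 entries for the full table   vs.   30 entries for Π_j P(X_j|c)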


Naïve Bayes

Naïve Bayes classification
– Assume that all input attributes are conditionally independent!

    P(X_1, X_2, ..., X_n|C) = P(X_1|X_2, ..., X_n; C) P(X_2, ..., X_n|C)
                            = P(X_1|C) P(X_2, ..., X_n|C)
                            = P(X_1|C) P(X_2|C) ··· P(X_n|C)

– MAP classification rule: for x = (x_1, x_2, ..., x_n), assign x to c* if

    [P(x_1|c*) ··· P(x_n|c*)] P(c*) > [P(x_1|c) ··· P(x_n|c)] P(c),   c ≠ c*, c = c_1, ..., c_L
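A minimal Python sketch of this MAP rule (added for illustration; the data structures are hypothetical), assuming the prior and the conditional probability tables have already been estimated. Logs of probabilities are summed instead of multiplying, which leaves the argmax unchanged but avoids numerical underflow for large n.

import math

def map_classify(x, priors, cond_tables):
    """Return the MAP class for instance x under the naive Bayes factorization.

    priors:      dict mapping class c -> P(c)
    cond_tables: dict mapping class c -> list of dicts, one per attribute j,
                 each mapping an attribute value v -> P(X_j = v | c)
    (names and formats are illustrative, not from the slides)
    """
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        # log P(c) + sum_j log P(x_j|c) is monotone in P(c) * prod_j P(x_j|c)
        score = math.log(prior) + sum(math.log(cond_tables[c][j][x_j])
                                      for j, x_j in enumerate(x))
        if score > best_score:
            best_class, best_score = c, score
    return best_class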


Naïve Bayes

Naïve Bayes Algorithm (for discrete input attributes)
– Learning phase: given a training set S,

    For each target value c_i (c_i = c_1, ..., c_L)
        P̂(C = c_i) ← estimate P(C = c_i) with examples in S;
    For every attribute value x_jk of each attribute X_j (j = 1, ..., n; k = 1, ..., N_j)
        P̂(X_j = x_jk|C = c_i) ← estimate P(X_j = x_jk|C = c_i) with examples in S;

  Output: conditional probability tables; for each X_j, a table of N_j × L elements
– Test phase: given an unknown instance X' = (a'_1, ..., a'_n),
  look up the tables to assign the label c* to X' if

    [P̂(a'_1|c*) ··· P̂(a'_n|c*)] P̂(c*) > [P̂(a'_1|c) ··· P̂(a'_n|c)] P̂(c),   c ≠ c*, c = c_1, ..., c_L
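A matching sketch of the learning phase for discrete attributes (again illustrative, not the slide's own code): relative-frequency estimates of the prior and the conditional tables, in the format consumed by map_classify above.

from collections import Counter

def learn_naive_bayes(examples, labels):
    """Estimate P(c) and P(X_j = v | c) by relative frequencies (no smoothing).

    examples: list of tuples of discrete attribute values; labels: list of class labels.
    Returns (priors, cond_tables) as used by map_classify above.
    """
    n_examples = len(labels)
    class_counts = Counter(labels)
    priors = {c: class_counts[c] / n_examples for c in class_counts}

    # value_counts[c][j][v] = number of class-c examples whose attribute j equals v
    value_counts = {c: [Counter() for _ in examples[0]] for c in class_counts}
    for x, c in zip(examples, labels):
        for j, v in enumerate(x):
            value_counts[c][j][v] += 1

    cond_tables = {c: [{v: cnt / class_counts[c] for v, cnt in counter.items()}
                       for counter in value_counts[c]]
                   for c in class_counts}
    return priors, cond_tables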


Example

• Example: Play Tennis


Example

Learning phase

    Outlook       Play=Yes   Play=No
    Sunny            2/9       3/5
    Overcast         4/9       0/5
    Rain             3/9       2/5

    Temperature   Play=Yes   Play=No
    Hot              2/9       2/5
    Mild             4/9       2/5
    Cool             3/9       1/5

    Humidity      Play=Yes   Play=No
    High             3/9       4/5
    Normal           6/9       1/5

    Wind          Play=Yes   Play=No
    Strong           3/9       3/5
    Weak             6/9       2/5

    P(Play=Yes) = 9/14
    P(Play=No)  = 5/14


Example

Test Phase
– Given a new instance x' = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
– Look up the tables:

    P(Outlook=Sunny|Play=Yes) = 2/9        P(Outlook=Sunny|Play=No) = 3/5
    P(Temperature=Cool|Play=Yes) = 3/9     P(Temperature=Cool|Play=No) = 1/5
    P(Humidity=High|Play=Yes) = 3/9        P(Humidity=High|Play=No) = 4/5
    P(Wind=Strong|Play=Yes) = 3/9          P(Wind=Strong|Play=No) = 3/5
    P(Play=Yes) = 9/14                     P(Play=No) = 5/14

– MAP rule

    P(Yes|x') ∝ [P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes)] P(Play=Yes) = 0.0053
    P(No|x')  ∝ [P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No)] P(Play=No) = 0.0206

    Since P(Yes|x') < P(No|x'), we label x' as "No".
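A quick numeric check of the two scores (added for illustration), using exactly the table values above:

from fractions import Fraction as F

# Unnormalized posteriors for x' = (Sunny, Cool, High, Strong), from the tables above.
score_yes = F(2, 9) * F(3, 9) * F(3, 9) * F(3, 9) * F(9, 14)
score_no  = F(3, 5) * F(1, 5) * F(4, 5) * F(3, 5) * F(5, 14)

print(float(score_yes))   # ~0.00529  (rounded to 0.0053 on the slide)
print(float(score_no))    # ~0.02057  (rounded to 0.0206 on the slide)
print("Predict:", "Yes" if score_yes > score_no else "No")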



Relevant Issues

Violation of the independence assumption
– For many real-world tasks, P(X_1, ..., X_n|C) ≠ P(X_1|C) ··· P(X_n|C)
– Nevertheless, naïve Bayes works surprisingly well anyway!

Zero conditional probability problem
– If no training example contains the attribute value X_j = a_jk, then P̂(X_j = a_jk|C = c_i) = 0
– In this circumstance, during testing the whole product vanishes:

    P̂(x_1|c_i) ··· P̂(a_jk|c_i) ··· P̂(x_n|c_i) = 0

– As a remedy, conditional probabilities are estimated with the m-estimate:

    P̂(X_j = a_jk|C = c_i) = (n_c + m·p) / (n + m)

    n:   number of training examples for which C = c_i
    n_c: number of training examples for which X_j = a_jk and C = c_i
    p:   prior estimate (usually p = 1/t for t possible values of X_j)
    m:   weight given to the prior (number of "virtual" examples, m ≥ 1)
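A small sketch of the m-estimate (added for illustration; the function name is made up):

def m_estimate(n_c, n, t, m=1.0):
    """m-estimate of P(X_j = a_jk | C = c_i).

    n_c: count of class-c_i examples with X_j = a_jk
    n:   count of class-c_i examples
    t:   number of possible values of attribute X_j (so the prior p = 1/t)
    m:   weight of the prior, i.e. the number of "virtual" examples
    """
    p = 1.0 / t
    return (n_c + m * p) / (n + m)

# E.g. Outlook=Overcast with Play=No never occurs (0/5 in the table above);
# with t = 3 outlook values and m = 1 the estimate becomes (0 + 1/3) / (5 + 1) ≈ 0.056
print(m_estimate(n_c=0, n=5, t=3))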


Relevant Issues

Continuous-valued input attributes
– An attribute can take infinitely many values
– The conditional probability is then modeled with the normal distribution:

    P̂(X_j|C = c_i) = 1/(√(2π) σ_ji) · exp( −(X_j − μ_ji)² / (2σ_ji²) )

    μ_ji: mean (average) of the attribute values X_j of the examples for which C = c_i
    σ_ji: standard deviation of the attribute values X_j of the examples for which C = c_i

– Learning phase: for X = (X_1, ..., X_n), C = c_1, ..., c_L
  Output: n × L normal distributions and P(C = c_i), i = 1, ..., L
– Test phase: for X' = (X'_1, ..., X'_n)
  • Calculate the conditional probabilities with all the normal distributions
  • Apply the MAP rule to make a decision
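A small sketch of this class-conditional Gaussian (illustrative, not from the slides):

import math

def gaussian_likelihood(x_j, mu_ji, sigma_ji):
    """P(X_j = x_j | C = c_i) under a normal distribution with the class-specific
    mean mu_ji and standard deviation sigma_ji estimated from the training data."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma_ji)
    return coeff * math.exp(-((x_j - mu_ji) ** 2) / (2.0 * sigma_ji ** 2))

# e.g. an attribute value one standard deviation away from the class mean
print(gaussian_likelihood(x_j=1.0, mu_ji=0.0, sigma_ji=1.0))  # ≈ 0.2420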


Advantages of Naïve Bayes

Naïve Bayes is based on the independence assumption
– A small amount of training data suffices to estimate the parameters (means and variances of the variables)
– Only the variances of the variables for each class need to be determined, not the entire covariance matrix
– Testing is straightforward: just look up tables or calculate conditional probabilities with normal distributions


Conclusion

Performance is competitive with most state-of-the-art classifiers, even when the independence assumption is violated

Many successful applications, e.g., spam mail filtering

A good candidate for a base learner in ensemble learning

Apart from classification, naïve Bayes can do more…
