
Naïve Bayes
CS 760 @ UW-Madison

Goals for the lecture
• understand the concepts
  • generative/discriminative models
  • MLE (Maximum Likelihood Estimate) and MAP (Maximum A Posteriori)
  • Naïve Bayes
    • Naïve Bayes assumption
    • generic Naïve Bayes
    • model 1: Bernoulli Naïve Bayes
  • other Naïve Bayes
    • model 2: Multinomial Naïve Bayes
    • model 3: Gaussian Naïve Bayes
    • model 4: Multiclass Naïve Bayes

Review: supervised learning

problem setting
• set of possible instances (feature space): X
• unknown target function (concept): f : X → Y
• set of hypotheses (hypothesis class): H

given
• training set of instances of the unknown target function f:
  (x^(1), y^(1)), (x^(2), y^(2)), …, (x^(m), y^(m))

output
• hypothesis h ∈ H that best approximates the target function

Parametric hypothesis class
• hypothesis h_θ ∈ H is indexed by a parameter θ ∈ Θ
• learning: find the θ ∈ Θ such that h_θ best approximates the target

• different from "nonparametric" approaches like decision trees and nearest neighbor

• advantages: various hypothesis classes encoding inductive bias/prior knowledge; easy to use math/optimization

Discriminative approaches
• hypothesis directly predicts the label given the features

  y = h(x), or more generally p(y | x) = h(x)

• then define a loss function L(h) and find the hypothesis h ∈ H with minimum loss

• example: linear regression

  h_θ(x) = θᵀx,   L(h_θ) = (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²
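As a concrete illustration of this recipe, here is a small numpy sketch (function and variable names are my own) that fits h_θ(x) = θᵀx by minimizing the average squared loss via the closed-form least-squares solution:

import numpy as np

# A sketch of the discriminative recipe for the linear-regression example:
# choose h_theta(x) = theta^T x and minimize the average squared loss.
def fit_linear_regression(X, y):
    """X: (m, d) feature matrix; y: (m,) targets. Returns theta minimizing the loss."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

def squared_loss(theta, X, y):
    """L(h_theta) = (1/m) * sum_i (h_theta(x^(i)) - y^(i))^2"""
    return np.mean((X @ theta - y) ** 2)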

Generative approaches
• hypothesis specifies a generative story for how the data was created

  h(x, y) = p(x, y), or simply h(x) = p(x)

• then pick a hypothesis h ∈ H by maximum likelihood estimation (MLE) or maximum a posteriori (MAP)

• example: roll a weighted die
  • weights θ for each side define how the data are generated
  • use MLE on the training data to learn θ
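As a concrete illustration (example data and function names are mine), the MLE for the die weights is just the vector of empirical side frequencies:

import numpy as np

# Generative example: a weighted die is a categorical distribution over its sides.
# The MLE of each side's weight is the fraction of training rolls showing that side.
def mle_die_weights(rolls, num_sides=6):
    """rolls: array-like of observed sides in {0, ..., num_sides - 1}."""
    counts = np.bincount(np.asarray(rolls), minlength=num_sides)
    return counts / counts.sum()      # theta_hat[s] = count(s) / N

rolls = [0, 2, 2, 5, 1, 2, 0, 3]      # hypothetical training rolls
print(mle_die_weights(rolls))         # [0.25 0.125 0.375 0.125 0. 0.125]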

Comments on discriminative/generative

• usually used for supervised learning with a parametric hypothesis class
• can also be used for unsupervised learning
  • k-means clustering (discriminative flavor) vs. mixture of Gaussians (generative)

• can also be nonparametric
  • nonparametric Bayesian methods: a large subfield of ML

• when is a discriminative or a generative approach likely to be better? discussed in a later lecture

• typical discriminative: linear regression, logistic regression, SVM, many neural networks (not all!), …

• typical generative: Naïve Bayes, Bayesian networks, …

MLE and MAP

MLE vs. MAP


Maximum Likelihood Estimate (MLE)
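In symbols, with D = {x^(1), …, x^(m)} the training data assumed i.i.d. (notation mine):

\[
\hat{\theta}_{\mathrm{MLE}} \;=\; \arg\max_{\theta}\; p(\mathcal{D} \mid \theta) \;=\; \arg\max_{\theta}\; \prod_{i=1}^{m} p\!\left(x^{(i)} \mid \theta\right)
\]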

Background: MLE

Example: MLE of Exponential Distribution
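A worked sketch of this example, assuming i.i.d. samples x^(1), …, x^(m) from the exponential density p(x | λ) = λe^{−λx} (notation mine):

\[
\ell(\lambda) = \log \prod_{i=1}^{m} \lambda e^{-\lambda x^{(i)}} = m\log\lambda - \lambda \sum_{i=1}^{m} x^{(i)},
\qquad
\frac{d\ell}{d\lambda} = \frac{m}{\lambda} - \sum_{i=1}^{m} x^{(i)} = 0
\;\;\Rightarrow\;\;
\hat{\lambda}_{\mathrm{MLE}} = \frac{m}{\sum_{i=1}^{m} x^{(i)}} = \frac{1}{\bar{x}}
\]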

MLE vs. MAP


Prior

Maximum Likelihood Estimate (MLE)

Maximum a posteriori (MAP) estimate
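The two estimates side by side, where p(θ) is the prior over parameters (notation mine); MAP reduces to MLE when the prior is flat:

\[
\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta}\; p(\mathcal{D}\mid\theta),
\qquad
\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta}\; p(\theta\mid\mathcal{D}) = \arg\max_{\theta}\; p(\mathcal{D}\mid\theta)\,p(\theta)
\]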

Naïve Bayes

Motivating examples: classifying email as spam vs. not spam, and classifying news articles as The Economist vs. The Onion.

Model 0: Not-so-naïve Model?

Generative Story:
1. Flip a weighted coin (Y)
2. If heads, sample a document ID (X) from the Spam distribution
3. If tails, sample a document ID (X) from the Not-Spam distribution

Model 0: Not-so-naïve Model?

Generative Story:
1. Flip a weighted coin (Y)
2. If heads, roll the yellow many-sided die to sample a document vector (X) from the Spam distribution
3. If tails, roll the blue many-sided die to sample a document vector (X) from the Not-Spam distribution

This model is computationally naïve: each many-sided die needs one side (one parameter) for every possible document vector, i.e. 2^K of them.

Model 0: Not-so-naïve Model?

[Illustration: flip a weighted coin; if HEADS, roll the yellow die; if TAILS, roll the blue die. Each side of the die is labeled with a document vector (e.g. [1,0,1,…,1]).]

y  x1  x2  x3  …  xK
0  1   0   1   …  1
1  0   1   0   …  1
1  1   1   1   …  1
0  0   0   1   …  1
0  1   0   1   …  0
1  1   0   1   …  0

Naïve Bayes Assumption

Conditional independence of features:

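In symbols (K denotes the number of features): conditioned on the class, the features are assumed independent, so the class-conditional joint factorizes as

\[
P(X_1,\dots,X_K \mid Y) \;=\; \prod_{k=1}^{K} P(X_k \mid Y)
\]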

Estimating a joint from conditional probabilities

Given:

A  C  P(A|C)
0  0  0.2
0  1  0.5
1  0  0.8
1  1  0.5

B  C  P(B|C)
0  0  0.1
0  1  0.9
1  0  0.9
1  1  0.1

C  P(C)
0  0.33
1  0.67

Estimate the joint:

A  B  C  P(A,B,C)
0  0  0  …
0  0  1  …
0  1  0  …
0  1  1  …
1  0  0  …
1  0  1  …
1  1  0  …
1  1  1  …

Estimating a joint from conditional probabilities

Given:

A  C  P(A|C)
0  0  0.2
0  1  0.5
1  0  0.8
1  1  0.5

B  C  P(B|C)
0  0  0.1
0  1  0.9
1  0  0.9
1  1  0.1

D  C  P(D|C)
0  0  0.1
0  1  0.1
1  0  0.9
1  1  0.1

C  P(C)
0  0.33
1  0.67

Estimate the joint:

A  B  D  C  P(A,B,D,C)
0  0  0  0  …
0  0  1  0  …
0  1  0  0  …
0  1  1  0  …
1  0  0  0  …
1  0  1  0  …
1  1  0  0  …
1  1  1  0  …
0  0  0  1  …
…  …  …  …  …

Estimating a joint from conditional probabilities

Assuming conditional independence, the conditional probabilities encode the same information as the joint table.

They are very convenient for estimating

  P(X1,…,Xn | Y) = P(X1|Y) · … · P(Xn|Y)

They are almost as good for computing

  P(Y = y | X1,…,Xn) = P(X1,…,Xn | Y = y) P(Y = y) / P(X1,…,Xn)   for all y
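As a quick numeric check (my own example, using the table values above and the assumption that A and B are conditionally independent given C):

\[
P(A{=}1, B{=}0, C{=}1) = P(A{=}1 \mid C{=}1)\,P(B{=}0 \mid C{=}1)\,P(C{=}1) = 0.5 \times 0.9 \times 0.67 \approx 0.30
\]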

Generic Naïve Bayes Model

Model: product of the prior and the event model

Support: Depends on the choice of event model, P(Xk|Y)

Training: Find the class-conditional MLE parameters

For P(Y), we find the MLE using the data. For each P(Xk|Y), we condition on the data with the corresponding class.

Classification: Find the class that maximizes the posterior
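A sketch of the model and the classification rule in symbols (notation mine):

\[
P(X_1,\dots,X_K, Y) \;=\; P(Y)\prod_{k=1}^{K} P(X_k \mid Y)
\qquad\text{(model)}
\]
\[
\hat{y} \;=\; \arg\max_{y}\; P(Y{=}y \mid x) \;=\; \arg\max_{y}\; P(Y{=}y)\prod_{k=1}^{K} P(x_k \mid Y{=}y)
\qquad\text{(classification)}
\]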


Various Naïve Bayes Models

Model 1: Bernoulli Naïve Bayes


Support: Binary vectors of length K

Generative Story:

Model:
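A sketch of the standard Bernoulli Naïve Bayes formulation, filling in the Generative Story and Model above (the parameter names φ and θ_{k,y} are mine):

\[
Y \sim \mathrm{Bernoulli}(\phi), \qquad X_k \mid Y{=}y \;\sim\; \mathrm{Bernoulli}(\theta_{k,y}) \quad \text{for } k = 1,\dots,K
\]
\[
p_{\phi,\theta}(x, y) \;=\; \phi^{y}(1-\phi)^{1-y} \prod_{k=1}^{K} \theta_{k,y}^{\,x_k}\,(1-\theta_{k,y})^{1-x_k}
\]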

Model 1: Bernoulli Naïve Bayes

[Illustration: flip a weighted coin; if HEADS, flip each yellow coin; if TAILS, flip each blue coin. Each coin corresponds to an xk.]

y  x1  x2  x3  …  xK
0  1   0   1   …  1
1  0   1   0   …  1
1  1   1   1   …  1
0  0   0   1   …  1
0  1   0   1   …  0
1  1   0   1   …  0

We can generate data in this fashion, though in practice we never would, since our data is given. Instead, this provides an explanation of how the data was generated (albeit a terrible one).

Model 1: Bernoulli Naïve Bayes


Support: Binary vectors of length K

Generative Story:

Model:

Classification: Find the class that maximizes the posterior

Same as Generic Naïve Bayes


Model 1: Bernoulli Naïve Bayes


Training: Find the class-conditional MLE parameters

For P(Y), we find the MLE using all the data. For each P(Xk|Y) we condition on the data with the corresponding class.
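A minimal Python sketch of these estimates and the resulting classifier (function and variable names are mine; setting alpha > 0 adds smoothing, while alpha = 0 gives the plain MLE described above):

import numpy as np

# Bernoulli Naive Bayes: estimate P(Y=1) from all the data, and P(X_k=1|Y=c)
# from the rows whose label is c; classify by maximizing the log posterior.
def train_bernoulli_nb(X, y, alpha=1.0):
    """X: (N, K) binary feature matrix; y: (N,) binary labels in {0, 1}."""
    phi = y.mean()                       # estimate of P(Y = 1)
    theta = np.zeros((2, X.shape[1]))
    for c in (0, 1):
        Xc = X[y == c]
        # estimate of P(X_k = 1 | Y = c), conditioning on the rows with label c
        theta[c] = (Xc.sum(axis=0) + alpha) / (Xc.shape[0] + 2 * alpha)
    return phi, theta

def predict_bernoulli_nb(x, phi, theta):
    """Return the class maximizing the log posterior for one binary vector x."""
    log_post = np.empty(2)
    for c, prior in ((0, 1.0 - phi), (1, phi)):
        log_post[c] = np.log(prior) + np.sum(
            x * np.log(theta[c]) + (1 - x) * np.log(1.0 - theta[c]))
    return int(np.argmax(log_post))

Working in log space avoids numerical underflow when the number of features K is large.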

Model 2: Multinomial Naïve Bayes

Support: integer vector of word IDs

Generative Story:

Model:

(Assume Mi = M for all i)
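A sketch of the usual multinomial event model, filling in the Generative Story and Model above and assuming each document is a length-M sequence of word IDs x = (x1,…,xM) over a vocabulary of size V (notation mine):

\[
Y \sim \mathrm{Bernoulli}(\phi), \qquad X_j \mid Y{=}y \;\sim\; \mathrm{Categorical}(\theta_{y,1},\dots,\theta_{y,V}) \quad \text{for } j = 1,\dots,M
\]
\[
p_{\phi,\theta}(x, y) \;=\; \phi^{y}(1-\phi)^{1-y} \prod_{j=1}^{M} \theta_{y, x_j}
\]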

Model 3: Gaussian Naïve Bayes

Support: real-valued feature vectors, x ∈ R^K

Model: product of the prior and the event model
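A sketch of the Gaussian event model, with class-conditional means and variances μ_{k,y}, σ²_{k,y} (notation mine):

\[
p(x, y) \;=\; p(y) \prod_{k=1}^{K} \mathcal{N}\!\left(x_k \mid \mu_{k,y}, \sigma^2_{k,y}\right)
\qquad\text{where}\qquad
\mathcal{N}(x_k \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x_k-\mu)^2}{2\sigma^2}\right)
\]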

Model 4: Multiclass Naïve Bayes


Model:
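A sketch of the multiclass variant: the label is drawn from a categorical prior over C classes rather than a coin flip, and any of the event models above can be used for P(Xk|Y) (notation mine):

\[
Y \sim \mathrm{Categorical}(\pi_1,\dots,\pi_C), \qquad
p(x, y) \;=\; \pi_y \prod_{k=1}^{K} p(x_k \mid Y{=}y)
\]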

Summary: Generative Approach

• Step 1: specify the joint data distribution (generative story)
• Step 2: use MLE or MAP for training
• Step 3: use Bayes' rule for inference on test instances

THANK YOU

Some of the slides in these lectures have been adapted/borrowed from materials developed by Mark Craven, David Page, Jude Shavlik, Tom Mitchell, Nina Balcan, Matt Gormley, Elad Hazan, Tom Dietterich, and Pedro Domingos.
