Last lecture summary – Naïve Bayes Classifier


Page 1:

Last lecture summary
Naïve Bayes Classifier

Page 2:

Bayes Rule

$$\underbrace{P(Y\mid X)}_{\text{posterior}} \;=\; \frac{\overbrace{P(X\mid Y)}^{\text{likelihood}}\;\overbrace{P(Y)}^{\text{prior}}}{\underbrace{P(X)}_{\text{normalization constant}}}$$

Prior and likelihood must be learnt (i.e. estimated from the data).

Page 3:

• Learning the prior
  – A hundred independently drawn training examples will usually suffice to obtain a reasonable estimate of P(Y).

• Learning the likelihood
  – The Naïve Bayes Assumption: assume that all features are independent given the class label Y.

$$P(X_1,\dots,X_n \mid Y) = \prod_{i=1}^{n} P(X_i \mid Y)$$

Page 4:

Example – Play Tennis

Page 5:

Example – Learning Phase

Outlook      Play=Yes   Play=No
Sunny        2/9        3/5
Overcast     4/9        0/5
Rain         3/9        2/5

Temperature  Play=Yes   Play=No
Hot          2/9        2/5
Mild         4/9        2/5
Cool         3/9        1/5

Humidity     Play=Yes   Play=No
High         3/9        4/5
Normal       6/9        1/5

Wind         Play=Yes   Play=No
Strong       3/9        3/5
Weak         6/9        2/5

P(Play=Yes) = 9/14 P(Play=No) = 5/14

P(Outlook=Sunny|Play=Yes) = 2/9
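These tables are simply relative frequencies counted per class. As an illustration (not part of the original slides), a short Python sketch that counts them, assuming the standard 14-example Play Tennis data set from Mitchell's textbook, which reproduces the fractions above:

```python
from collections import Counter

# Assumed: the standard 14-example Play Tennis data set (Mitchell, 1997);
# each row is (Outlook, Temperature, Humidity, Wind, Play).
data = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

class_counts = Counter(row[-1] for row in data)            # 9 Yes, 5 No -> P(Play)

def cond_count(feature_index, value, label):
    """Counts behind the relative frequency P(feature = value | Play = label)."""
    in_class = [row for row in data if row[-1] == label]
    matching = sum(1 for row in in_class if row[feature_index] == value)
    return matching, len(in_class)

print(class_counts)                          # Counter({'Yes': 9, 'No': 5})
print(cond_count(0, "Sunny", "Yes"))         # (2, 9)  -> P(Outlook=Sunny|Play=Yes) = 2/9
print(cond_count(3, "Strong", "No"))         # (3, 5)  -> P(Wind=Strong|Play=No)    = 3/5
```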

Page 6:

Example – Prediction
x’ = (Outl=Sunny, Temp=Cool, Hum=High, Wind=Strong)

Look up tables

P(Outl=Sunny|Play=No) = 3/5

P(Temp=Cool|Play=No) = 1/5

P(Hum=High|Play=No) = 4/5

P(Wind=Strong|Play=No) = 3/5

P(Play=No) = 5/14

P(Outl=Sunny|Play=Yes) = 2/9

P(Temp=Cool|Play=Yes) = 3/9

P(Hum=High|Play=Yes) = 3/9

P(Wind=Strong|Play=Yes) = 3/9

P(Play=Yes) = 9/14

P(Yes|x’) ∝ [P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes)] · P(Play=Yes) = 0.0053
P(No|x’) ∝ [P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No)] · P(Play=No) = 0.0206

Since P(Yes|x’) < P(No|x’), we label x’ as “No”.
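As a quick check of this arithmetic, here is a minimal Python sketch (my own illustration, not part of the slides) that multiplies the looked-up probabilities for both classes:

```python
# Unnormalized Naive Bayes scores for x' = (Sunny, Cool, High, Strong),
# using the conditional probabilities read off the learning-phase tables.
p_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)   # P(x'|Yes) * P(Yes)
p_no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)   # P(x'|No)  * P(No)

print(f"P(Yes|x') ~ {p_yes:.4f}")                # ~ 0.0053
print(f"P(No|x')  ~ {p_no:.4f}")                 # ~ 0.0206
print("prediction:", "Yes" if p_yes > p_no else "No")   # -> No
```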

Page 7:

Last lecture summary
Binary classifier performance

Page 8:

TP, TN, FP, FN

Precision, Positive Predictive Value (PPV) = TP / (TP + FP)

Recall, Sensitivity, True Positive Rate (TPR), Hit rate = TP / P = TP / (TP + FN)

False Positive Rate (FPR), Fall-out = FP / N = FP / (FP + TN)

Specificity, True Negative Rate (TNR) = TN / (TN + FP) = 1 − FPR

Accuracy = (TP + TN) / (TP + TN + FP + FN)
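These definitions translate directly into code. Below is a small illustrative Python helper (my own, not from the lecture) that computes them from the four confusion-matrix counts:

```python
def binary_metrics(tp, tn, fp, fn):
    """Standard binary-classifier metrics from confusion-matrix counts."""
    precision   = tp / (tp + fp)                    # PPV
    recall      = tp / (tp + fn)                    # sensitivity, TPR, hit rate
    fpr         = fp / (fp + tn)                    # fall-out
    specificity = tn / (tn + fp)                    # TNR = 1 - FPR
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, fpr, specificity, accuracy

# example counts (made up for illustration): 40 TP, 45 TN, 5 FP, 10 FN
print(binary_metrics(40, 45, 5, 10))
```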

Page 9:

Page 10:

Neural networks (new stuff)

Page 11:

Biological motivation

• The human brain has been estimated to contain ~10¹¹ brain cells (neurons).

• A neuron is an electrically excitable cell that processes and transmits information by electrochemical signaling.

• Each neuron is connected with other neurons through the connections called synapses.

• A typical neuron possesses a cell body (often called the soma), dendrites (many, on the order of millimetres long), and an axon (one, 10 cm – 1 m long).

Page 12:

Page 13:

• A synapse permits a neuron to pass an electrical or chemical signal to another cell.
• A synapse can be either excitatory or inhibitory.
• Synapses are of different strengths (the stronger the synapse is, the more important it is).
• The effects of synapses accumulate inside the neuron.
• When the cumulative effect of synapses reaches a certain threshold, the neuron gets activated and the signal is sent to the axon, through which the neuron is connected to other neuron(s).

Page 14:

• Simplistic view of the function of a neuron
  – The neuron accumulates positive/negative stimuli from other neurons.
  – These are then processed further to produce an output, i.e. the neuron sends an output signal to the neurons connected to it.

Page 15:

Neural networks for applied science and engineering, Samarasinghe

Page 16:

Warren McCulloch (1898 – 1969)    Walter Pitts (1923 – 1969)

Threshold neuron

Page 17:

• 1st mathematical model of a neuron – the McCulloch & Pitts binary (threshold) neuron
  – only binary inputs and output
  – the weights are pre-set, no learning

x1    x2    t
0.2   0.3   0
0.2   0.8   0
0.8   0.2   0
1.0   0.8   1

– inputs – weights – activation (transfer) function – output

Page 18:

• In this exercise, both weights will be fixed

• When is the target classified as 0 and when as 1?

• Set the threshold. – If the weighted sum ≥ threshold, then the input is classified as 1. – If the weighted sum < threshold, then it is classified as 0.

• Which threshold would you use? – e.g. 1.3

$$\mathbf{w}\cdot\mathbf{x} = \sum_{j=1}^{2} w_j x_j = w_1 x_1 + w_2 x_2 = x_1 + x_2$$

x1    x2    t
0.2   0.3   0
0.2   0.8   0
0.8   0.2   0
1.0   0.8   1

Page 19:

Heaviside (threshold) activation function

Page 20:

• The threshold is incorporated as the weight w0 of one additional input with input value 1.0.

• Such an input is called the bias.

$$\mathbf{w}\cdot\mathbf{x} = \sum_{j=0}^{2} w_j x_j = w_0 \cdot 1.0 + w_1 x_1 + w_2 x_2$$

Page 21:

• Because the location of the threshold function defines the two categories, its value of 1.3 defines a classification boundary that can be formulated as $x_1 + x_2 = 1.3$.
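To make the boundary concrete, here is a minimal Python sketch (my own illustration, assuming both fixed weights equal 1, so the weighted sum is x1 + x2, and using the threshold 1.3 from the slide) that classifies the four training points:

```python
# McCulloch-Pitts style threshold neuron with fixed weights and threshold.
w1, w2, threshold = 1.0, 1.0, 1.3           # weights assumed; threshold 1.3 from the slide

def classify(x1, x2):
    u = w1 * x1 + w2 * x2                   # weighted sum w.x
    return 1 if u >= threshold else 0       # Heaviside (threshold) activation

data = [(0.2, 0.3, 0), (0.2, 0.8, 0), (0.8, 0.2, 0), (1.0, 0.8, 1)]
for x1, x2, t in data:
    print(x1, x2, "->", classify(x1, x2), "target", t)   # all four match their targets
```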

Page 22:

Perceptron (1957)

Frank Rosenblatt

Developed the learning algorithm.

Used his neuron (pattern recognizer = perceptron) for classification of letters.

Page 23:

• a binary classifier: maps its input x (a real-valued vector) to a binary value (0 or 1)
  – the output is 1 if w·x > 0 (with the weight vector w including the bias)
  – the output is 0 otherwise

• the perceptron can adjust its weights (i.e. it can learn) – the perceptron learning algorithm

Page 24:

Multiple output perceptron
• for multicategory (i.e. more than two classes) classification
• one output neuron for each class

(Diagram: input layer, output layer; single layer (one-layered) vs. double layer (two-layered).)

Page 25:

Learning

• Set the weights (including the threshold).

• Supervised learning: we know the target values t.

• We want the outputs y to be as close as possible to the desired target values t.

• We define an error E (the Sum of Squares Error, which we already know).

Page 26:

• “y to be as close as possible to t” means that E should be minimal.

• So we want to minimize E, which is a function of the weights w.
  – E is also called the objective function or sometimes the energy.

Page 27:

Requirements for the minimum:

$$\frac{\partial E}{\partial w_i} = 0, \qquad \frac{\partial^2 E}{\partial w_i\,\partial w_j} > 0$$

The gradient

$$\operatorname{grad} E = \left( \frac{\partial E}{\partial w_1}, \frac{\partial E}{\partial w_2} \right)$$

is a vector pointing in the direction of the greatest rate of increase of the function. We want E to decrease, so we move in the direction of −grad E.

Page 28:

Delta rule

• gradient descent
• How do we train a linear neuron using the delta rule?
• The demonstration will be given for one neuron with one input x, one weight w1, no bias, and one output y.

(Diagram: input x, weight w1, summation Σ, output y.)

Page 29:

• The neuron is presented with an input pattern.
• It calculates the weighted sum and its output as y = w1·x (no threshold is used).
• The error E:

$$E = \tfrac{1}{2}\,(t - y)^2 = \tfrac{1}{2}\,(t - w_1 x)^2$$

• If you plot E against w1, which curve do you get?

(Figure: the error E plotted against w1, with the error gradient indicated.)

Page 30:

• To find the gradient dE/dw1, differentiate the error E with respect to w1:

$$E = \frac{1}{2}(t - y)^2 = \frac{1}{2}(t - w_1 x)^2, \qquad \frac{dE}{dw_1} = ?$$

$$\frac{dE}{dw_1} = \frac{2}{2}\,(t - y)(-x) = -(t - y)\,x$$

• According to the delta rule, the weight change is proportional to the negative of the error gradient:

$$\Delta w_1 = \beta\,(t - y)\,x$$

• New weight:

$$w_1^{\text{new}} = w_1^{\text{old}} + \Delta w_1 = w_1^{\text{old}} + \beta\,(t - y)\,x$$

Page 31:

β is called the learning rate. It determines how far along the gradient it is necessary to move.

Page 32:

$$w_1^{\,i+1} = w_1^{\,i} + \Delta w_1^{\,i} = w_1^{\,i} + \beta\,(t - y)\,x$$

the new weight after the i-th iteration

Page 33:

• This is an iterative algorithm; one pass through the training set is not enough.

• One pass through the whole training data set is called an epoch.

• Adjusting the weights after each input pattern presentation (iteration) is called example-by-example (online) learning (a code sketch follows below).
  – For some problems this can cause the weights to oscillate – the adjustment required by one pattern may be canceled by the next pattern.
  – More popular is the next method.
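As an illustration of example-by-example (online) learning, here is a minimal Python sketch (my own, with made-up training data and learning rate) that applies the delta rule w1 ← w1 + β(t − y)x after every pattern:

```python
# Online (example-by-example) delta-rule training of a single linear neuron y = w1 * x.
# The data and the learning rate beta are illustrative, not from the lecture.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]   # (x, t) pairs, roughly t = 2x
beta = 0.05                                    # learning rate
w1 = 0.0                                       # initial weight

for epoch in range(50):                        # several passes (epochs) over the data
    for x, t in data:
        y = w1 * x                             # neuron output
        w1 += beta * (t - y) * x               # delta rule: weight updated after every pattern

print(w1)                                      # settles close to 2.0
```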

Page 34:

• Batch learning – wait until all input patterns (i.e. the whole epoch) have been processed, and then adjust the weights in the average sense.
  – More stable solution.
  – Obtain the error gradient for each input pattern.
  – Average them at the end of the epoch.
  – Use this average value to adjust the weights using the delta rule:

$$\Delta w_1 = \frac{1}{n}\sum_{i=1}^{n} \beta\,(t_i - y_i)\,x_i$$
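For comparison with the online version above, a minimal batch-learning sketch (again with illustrative data and learning rate of my own choosing) that averages the per-pattern weight changes over the epoch before updating:

```python
# Batch delta-rule training of the same single linear neuron y = w1 * x.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]   # (x, t) pairs, illustrative
beta = 0.05                                    # learning rate
w1 = 0.0                                       # initial weight

for epoch in range(200):
    # accumulate the per-pattern weight changes over the whole epoch
    delta_sum = sum(beta * (t - w1 * x) * x for x, t in data)
    w1 += delta_sum / len(data)                # average change applied once per epoch

print(w1)                                      # again settles close to 2.0
```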