Bayes Rule


Page 1: Bayes Rule

Bayes Rule

• How is this rule derived?
• Using Bayes rule for probabilistic inference:
  – P(Cause | Evidence): diagnostic probability
  – P(Evidence | Cause): causal probability

P(A | B) = P(B | A) P(A) / P(B)

Rev. Thomas Bayes (1702-1761)

P(Cause | Evidence) = P(Evidence | Cause) P(Cause) / P(Evidence)
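
To make the rule concrete, here is a small numeric sketch (all numbers are made up) that recovers the diagnostic probability P(Cause | Evidence) from the causal probability P(Evidence | Cause) and the prior:

```python
# Hypothetical disease-test numbers, purely for illustration.
p_cause = 0.01                 # P(Cause): prior probability of the cause
p_evidence_given_cause = 0.95  # P(Evidence | Cause): causal probability
p_evidence_given_other = 0.05  # P(Evidence | not Cause)

# P(Evidence) via the law of total probability
p_evidence = (p_evidence_given_cause * p_cause
              + p_evidence_given_other * (1 - p_cause))

# Bayes rule: P(Cause | Evidence) = P(Evidence | Cause) P(Cause) / P(Evidence)
p_cause_given_evidence = p_evidence_given_cause * p_cause / p_evidence
print(p_cause_given_evidence)  # ~0.161: the diagnostic probability
```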

Page 2: Bayes Rule

Bayesian decision theory

• Suppose the agent has to make a decision about the value of an unobserved query variable X given some observed evidence E = e
  – Partially observable, stochastic, episodic environment
  – Examples: X = {spam, not spam}, e = email message
               X = {zebra, giraffe, hippo}, e = image features
  – The agent has a loss function, which is 0 if the value of X is guessed correctly and 1 otherwise
  – What is the agent's optimal estimate of the value of X?

• Maximum a posteriori (MAP) decision: the value of X that minimizes the expected loss is the one with the greatest posterior probability P(X = x | e)
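
A tiny sketch (with a hypothetical posterior) of why the 0-1 loss leads to the MAP decision: the expected loss of guessing x is 1 − P(x | e), so the guess with the highest posterior has the lowest expected loss.

```python
# Hypothetical posterior over X = {zebra, giraffe, hippo} given image features e.
posterior = {"zebra": 0.2, "giraffe": 0.7, "hippo": 0.1}

# Under 0-1 loss, the expected loss of guessing x is the probability of every
# other value, i.e. 1 - P(x | e); minimizing it = maximizing the posterior.
expected_loss = {x: 1.0 - p for x, p in posterior.items()}
print(min(expected_loss, key=expected_loss.get))  # giraffe
```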

Page 3: Bayes Rule

MAP decision

• X = x: value of query variable
• E = e: evidence

  x* = argmax_x P(x | e) = argmax_x P(e | x) P(x) / P(e) = argmax_x P(e | x) P(x)

  P(x | e) ∝ P(e | x) P(x)
  (posterior ∝ likelihood × prior)

• Maximum likelihood (ML) decision:

  x* = argmax_x P(e | x)
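
A minimal sketch contrasting the two decisions, assuming made-up prior and likelihood values for one fixed observation e:

```python
# Hypothetical model for a single observed e: prior P(x) and likelihood P(e | x).
prior = {"spam": 0.1, "not spam": 0.9}
likelihood = {"spam": 0.03, "not spam": 0.02}

# MAP decision: argmax_x P(e | x) P(x); the constant P(e) can be dropped.
x_map = max(prior, key=lambda x: likelihood[x] * prior[x])

# ML decision: argmax_x P(e | x); the prior is ignored.
x_ml = max(likelihood, key=likelihood.get)

print(x_map, x_ml)  # "not spam" vs. "spam": the prior changes the decision here
```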

Page 4: Bayes Rule

Example: Spam Filter

• We have X = {spam, ¬spam}, E = email message.
• What should be our decision criterion?

– Compute P(spam | message) and P(¬spam | message), and assign the message to the class with the higher posterior probability

Page 5: Bayes Rule

Example: Spam Filter

• We have X = {spam, ¬spam}, E = email message.
• What should be our decision criterion?

– Compute P(spam | message) and P(¬spam | message), and assign the message to the class with the higher posterior probability

P(spam | message) ∝ P(message | spam) P(spam)
P(¬spam | message) ∝ P(message | ¬spam) P(¬spam)

Page 6: Bayes Rule

Example: Spam Filter

• We need to find P(message | spam) P(spam) and P(message | ¬spam) P(¬spam)
• How do we represent the message?
  – Bag of words model:
    • The order of the words is not important
    • Each word is conditionally independent of the others given message class
• If the message consists of words (w1, …, wn), how do we compute P(w1, …, wn | spam)?
  – Naïve Bayes assumption: each word is conditionally independent of the others given message class

  P(message | spam) = P(w1, …, wn | spam) = ∏_{i=1}^{n} P(wi | spam)
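
A small sketch of this product, with hypothetical per-word likelihoods:

```python
# Hypothetical word likelihoods P(w | spam) for a tiny vocabulary.
p_word_given_spam = {"free": 0.05, "viagra": 0.02, "meeting": 0.001, "the": 0.06}

def message_likelihood(words, p_word_given_class):
    """Naive Bayes / bag of words: P(w1, ..., wn | class) = product of P(wi | class)."""
    p = 1.0
    for w in words:
        p *= p_word_given_class[w]
    return p

print(message_likelihood(["free", "viagra", "the"], p_word_given_spam))  # ~6e-05
```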

Page 7: Bayes Rule

Example: Spam Filter

• Our filter will classify the message as spam if

  P(spam) ∏_{i=1}^{n} P(wi | spam) > P(¬spam) ∏_{i=1}^{n} P(wi | ¬spam)

• In practice, likelihoods are pretty small numbers, so we need to take logs to avoid underflow:

  log P(spam) + ∑_{i=1}^{n} log P(wi | spam) > log P(¬spam) + ∑_{i=1}^{n} log P(wi | ¬spam)

• Model parameters:
  – Priors P(spam), P(¬spam)
  – Likelihoods P(wi | spam), P(wi | ¬spam)
• These parameters need to be learned from a training set (a representative sample of email messages marked with their classes)
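
A sketch of the log-space decision rule, assuming hypothetical learned parameters and assuming every word of the message is in the model's vocabulary:

```python
import math

def classify(words, prior, likelihoods):
    """Return the class maximizing log P(class) + sum_i log P(wi | class)."""
    scores = {c: math.log(prior[c]) + sum(math.log(likelihoods[c][w]) for w in words)
              for c in prior}
    return max(scores, key=scores.get)

# Hypothetical parameters learned from a training set.
prior = {"spam": 0.3, "not spam": 0.7}
likelihoods = {
    "spam":     {"free": 0.05,  "meeting": 0.001},
    "not spam": {"free": 0.005, "meeting": 0.02},
}
print(classify(["free", "free"], prior, likelihoods))  # spam
```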

Page 8: Bayes Rule

Parameter estimation

• Model parameters:
  – Priors P(spam), P(¬spam)
  – Likelihoods P(wi | spam), P(wi | ¬spam)
• Estimation by empirical word frequencies in the training set:

  P(wi | spam) = (# of occurrences of wi in spam messages) / (total # of words in spam messages)

  – This happens to be the parameter estimate that maximizes the likelihood of the training data:

    ∏_{d=1}^{D} ∏_{i=1}^{n_d} P(w_{d,i} | class_d)

    (d: index of training document, i: index of a word)
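
A sketch of the frequency estimate on a tiny hypothetical training set of tokenized spam messages:

```python
from collections import Counter

def estimate_word_likelihoods(messages):
    """P(wi | class) = (# occurrences of wi in the class's messages) / (total # of words)."""
    counts = Counter(w for msg in messages for w in msg)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# Hypothetical tokenized spam messages.
spam_messages = [["free", "money", "free"], ["free", "offer"]]
print(estimate_word_likelihoods(spam_messages))  # {'free': 0.6, 'money': 0.2, 'offer': 0.2}
```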

Page 9: Bayes Rule

Parameter estimation

• Model parameters:
  – Priors P(spam), P(¬spam)
  – Likelihoods P(wi | spam), P(wi | ¬spam)
• Estimation by empirical word frequencies in the training set:

  P(wi | spam) = (# of occurrences of wi in spam messages) / (total # of words in spam messages)

• Parameter smoothing: dealing with words that were never seen or seen too few times
  – Laplacian smoothing: pretend you have seen every vocabulary word one more time than you actually did
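
A sketch of the same estimate with Laplacian (add-one) smoothing, assuming a small hypothetical vocabulary:

```python
from collections import Counter

def estimate_smoothed_likelihoods(messages, vocabulary):
    """Laplacian smoothing: pretend every vocabulary word was seen one extra time."""
    counts = Counter(w for msg in messages for w in msg)
    total = sum(counts.values()) + len(vocabulary)
    return {w: (counts[w] + 1) / total for w in vocabulary}

vocabulary = ["free", "money", "offer", "meeting"]              # hypothetical vocabulary
spam_messages = [["free", "money", "free"], ["free", "offer"]]  # hypothetical training data
probs = estimate_smoothed_likelihoods(spam_messages, vocabulary)
print(probs["meeting"])  # ~0.111 instead of 0, so unseen words no longer zero out the product
```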

Page 10: Bayes Rule

Bayesian decision making: Summary

• Suppose the agent has to make decisions about the value of an unobserved query variable X based on the values of an observed evidence variable E

• Inference problem: given some evidence E = e, what is P(X | e)?

• Learning problem: estimate the parameters of the probabilistic model P(X | E) given a training sample {(x1,e1), …, (xn,en)}

Page 11: Bayes Rule

Bag-of-word models for images

Csurka et al. (2004), Willamowski et al. (2005), Grauman & Darrell (2005), Sivic et al. (2003, 2005)

Page 12: Bayes Rule

Bag-of-word models for images

1. Extract image features

Page 13: Bayes Rule

Bag-of-word models for images

1. Extract image features

Page 14: Bayes Rule

Bag-of-word models for images

1. Extract image features
2. Learn “visual vocabulary”

Page 15: Bayes Rule

Bag-of-word models for images

1. Extract image features
2. Learn “visual vocabulary”
3. Map image features to visual words
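
A minimal sketch of the three steps, assuming scikit-learn is available; real systems extract local descriptors such as SIFT, while here random vectors stand in for them, and k-means (a common choice, though the slides do not specify the method) builds the vocabulary:

```python
import numpy as np
from sklearn.cluster import KMeans

# Step 1 (stand-in): hypothetical 128-D local descriptors, one array per image.
rng = np.random.default_rng(0)
image_features = [rng.normal(size=(200, 128)) for _ in range(10)]

# Step 2: learn a "visual vocabulary" by clustering all descriptors.
vocab = KMeans(n_clusters=50, n_init=10, random_state=0).fit(np.vstack(image_features))

# Step 3: map each image's features to visual words and build a bag-of-words histogram.
def bag_of_visual_words(features, vocab, n_words=50):
    words = vocab.predict(features)
    hist, _ = np.histogram(words, bins=np.arange(n_words + 1))
    return hist

print(bag_of_visual_words(image_features[0], vocab))  # 50-bin visual-word histogram
```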