
Page 1: Discrete Math CS 280

1

Discrete Math CS 280

Prof. Bart Selman
[email protected]

Module Probability --- Part b)

Bayes’ Rule
Random Variables

Page 2: Discrete Math CS 280

2

Bayes’ Theorem

How to assess the probability that a particular event will occur on the basis of partial evidence?

Examples:

What is the likelihood that people who test positive to a particular disease (e.g., HIV), actually have the disease?

What is the probability that an e-mail message is spam?

Key idea: one should factor in additional information regarding occurrence of events.

Page 3: Discrete Math CS 280

Assume that with respect to events F and E (“E” for “Evidence”):

We know P(F) – probability that event F occurs

(e.g. probability that email message is spam;

this is given by what fraction of email is spam)

We also know event E has occurred.

(e.g., email message contains words “sale” and “bargain”)

Therefore the conditional probability that F occurs given that E occurs, P(F|E), is a more realistic estimate of the chance that F occurs than P(F) alone.

How do we compute P(F|E)?

E.g., based on P(F), P(E|F), and P(E| ¬F)

Note: ¬F is also referred to as the complement of F (written F^C or F̄).

Page 4: Discrete Math CS 280

Bayesian Inference (diagram):

Original belief (prior probability) about a hypothesis/theory, P(F), combined with evidence E, yields a modified belief P(F|E).

Page 5: Discrete Math CS 280

Experiment: Pick one box at random (p = 0.5) and then a ball at random from that box.

Assume you got a red ball.

What’s the probability that it came from the left box?

Define:
E – you choose a red ball (therefore ¬E – you choose a green ball).
F – you choose the left box (therefore ¬F – you choose the right box).

We want to know P(F|E)

[Figure: Box A (left) and Box B (right), each containing a mix of red and green balls.]

Page 6: Discrete Math CS 280

What we know:

P(E|F) = 7/9
P(E|¬F) = 3/7

Given that the boxes are selected at random: P(F) = P(¬F) = 1/2.

P(F|E)?

P(F|E) = P(E∩F)/P(E), so we need to compute P(E∩F) and P(E).

We know P(E|F) = P(E∩F)/P(F). So P(E∩F) = P(E|F) P(F) = 7/9 · 1/2 = 7/18.
What about P(E)? Note that P(E) = P(E∩F) + P(E∩¬F). Why?
Note also that P(E∩¬F) = P(¬F) P(E|¬F) = 1/2 · 3/7 = 3/14.
So, P(E) = P(E∩F) + P(E∩¬F) = 7/18 + 3/14 = 38/63.

And therefore P(F|E) = P(E∩F)/P(E) = (7/18) / (38/63) = 49/76 ≈ 0.645.

(E – red ball; F – left box.)
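A quick way to check the arithmetic above is to redo it with exact fractions. The short Python sketch below is not part of the original slides; it simply mirrors the slide’s computation step by step.

```python
from fractions import Fraction

p_F = Fraction(1, 2)             # prior: left box chosen
p_E_given_F = Fraction(7, 9)     # red ball, given left box
p_E_given_notF = Fraction(3, 7)  # red ball, given right box

p_E_and_F = p_E_given_F * p_F                 # P(E ∩ F)  = 7/18
p_E = p_E_and_F + p_E_given_notF * (1 - p_F)  # P(E)      = 7/18 + 3/14 = 38/63
p_F_given_E = p_E_and_F / p_E                 # P(F|E)    = 49/76

print(p_E_and_F, p_E, p_F_given_E, float(p_F_given_E))
# 7/18 38/63 49/76 0.6447368421052632
```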

Page 7: Discrete Math CS 280

Bayesian Inference

Original belief: there is a 0.5 probability that you will pick the left box (P(F)).
Concrete (new) evidence: a red ball is picked (E).
Modified belief: the probability increases to about 0.65 (P(F|E)).

Page 8: Discrete Math CS 280

Theorem: Bayes’ Theorem

Suppose that E and F are events from a sample space S such that P(E) ≠ 0 and P(F) ≠ 0. Then

P(F|E) = P(E|F) P(F) / [ P(E|F) P(F) + P(E|F^C) P(F^C) ]

Proof:

P(F|E) = P(E∩F) / P(E), and P(E|F) = P(E∩F) / P(F), so P(E∩F) = P(E|F) P(F); therefore we only need an expression for P(E).

Note that E = E∩S = E∩(F ∪ F^C) = (E∩F) ∪ (E∩F^C), and the two events E∩F and E∩F^C are disjoint. Why?

So P(E) = P(E∩F) + P(E∩F^C) = P(E|F) P(F) + P(E|F^C) P(F^C).

Therefore

P(F|E) = P(E∩F) / P(E) = P(E|F) P(F) / [ P(E|F) P(F) + P(E|F^C) P(F^C) ].
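The two-hypothesis form of the theorem packages neatly as a small helper function. The following is only a sketch (the name bayes_posterior is ours, not the slides’):

```python
def bayes_posterior(p_f: float, p_e_given_f: float, p_e_given_not_f: float) -> float:
    """Return P(F|E) from the prior P(F) and the likelihoods P(E|F), P(E|F^C),
    using Bayes' theorem for the two-hypothesis case."""
    p_not_f = 1.0 - p_f
    p_e = p_e_given_f * p_f + p_e_given_not_f * p_not_f  # total probability of E
    return p_e_given_f * p_f / p_e

# The box example again: prior 1/2, likelihoods 7/9 and 3/7.
print(bayes_posterior(0.5, 7 / 9, 3 / 7))  # ≈ 0.645
```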

Page 9: Discrete Math CS 280

Example:

Suppose that 1 person in 100,000 has a particular rare disease. There is an accurate test for the disease: it is correct 99% of the time when given to someone with the disease, and correct 99.5% of the time when given to someone without the disease.

Find:

a) Probability that someone who tests positive has the disease.

b) Probability that someone who tests negative does not have the disease.

Page 10: Discrete Math CS 280

Solution:

a)

Always start by defining the events!

F – the person has the disease
E – the person tests positive for the disease
P(F|E) – probability of having the disease given a positive test

P(F) = 1/100,000 = 0.00001; P(F^C) = 0.99999
P(E|F) = 0.99; P(E^C|F) = 0.01
P(E|F^C) = 0.005

P(F|E) = P(E|F) P(F) / [ P(E|F) P(F) + P(E|F^C) P(F^C) ]
       = (0.99)(0.00001) / [ (0.99)(0.00001) + (0.005)(0.99999) ]
       ≈ 0.002

Only 0.2% of people who test positive actually have the disease!!!

Note: These are the probabilities most easily measured!

Page 11: Discrete Math CS 280

b) F – the person has the disease

E – the person tests positive for the disease

P(F^C|E^C) – probability of not having the disease given a negative test

P(F) = 1/100,000 = 0.00001; P(F^C) = 0.99999

P(E|F) = 0.99; P(E^C|F) = 0.01

P(E|F^C) = 0.005, so P(E^C|F^C) = 0.995

P(F^C|E^C) = P(E^C|F^C) P(F^C) / [ P(E^C|F^C) P(F^C) + P(E^C|F) P(F) ]
           = (0.995)(0.99999) / [ (0.995)(0.99999) + (0.01)(0.00001) ]
           ≈ 0.9999999

That’s… pretty good!
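For concreteness, here is the arithmetic for parts a) and b) carried out in code. It only restates the slides’ numbers; the variable names are our own.

```python
# Disease-testing example: P(F) = 1/100,000, P(E|F) = 0.99, P(E|F^C) = 0.005
# (hence P(E^C|F^C) = 0.995 and P(E^C|F) = 0.01).
p_f = 1e-5
p_e_given_f, p_e_given_fc = 0.99, 0.005

# a) P(F|E): probability of having the disease given a positive test.
p_pos = p_e_given_f * p_f + p_e_given_fc * (1 - p_f)
print(p_e_given_f * p_f / p_pos)                 # ≈ 0.00198, i.e. about 0.2%

# b) P(F^C|E^C): probability of not having the disease given a negative test.
p_neg = (1 - p_e_given_fc) * (1 - p_f) + (1 - p_e_given_f) * p_f
print((1 - p_e_given_fc) * (1 - p_f) / p_neg)    # ≈ 0.9999999
```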

Page 12: Discrete Math CS 280

Marbles

TOYS R US sells two kinds of bags of marbles:
(1) Bags of all black marbles, and
(2) Bags of mixed marbles in which 20% of the marbles are black.

The bags are opaque and wrapped in plastic, and I have no idea which bag is more common. I buy a bag and figure there is a 50:50 chance that the bag I purchased contains all black marbles. A guess!

I pull a marble out of the bag and see that it is black. How should this new evidence affect the 50:50 assessment I assigned to the probability of my having purchased an all black bag of marbles? (as previous example)

F – bag of all black marbles; F^C – bag with 20% black marbles
E – a black marble is drawn

Page 13: Discrete Math CS 280

Prior belief: There is a 1/2 chance that I have an all-black bag of marbles … a guess (P(F)).

Marbles

P(F|E) = (1 · 0.5) / [ (1 · 0.5) + (0.2 · 0.5) ] ≈ 0.83

0.5 chance of all-black (100%) marble bag.

0.5 chance of 0.2 black marble bag.

Posterior belief: The probability that my bag of marbles is all black ≈ 0.833 (P(F|E)).

P(F|E) = P(E|F) P(F) / [ P(E|F) P(F) + P(E|F^C) P(F^C) ]

Page 14: Discrete Math CS 280

Marbles

P(F|E) = (1)(0.83) / [ (1)(0.83) + (0.17)(0.2) ] ≈ 0.96

Prior belief: 0.83

0.83 chance of all-black (1) marble bag.

0.17 chance of 0.2 black marble bag.

New belief: 0.96

I put the marble back, shake the bag, and draw another marble. It is black. What happens now that my new prior probability is 0.83?

Remember, I don’t know which type of marble bag is most popular … Wal-Mart may have 100 bags of mixed marbles on the shelf for every bag of all black marbles.

Bayes’ Theorem doesn’t tell me the probability of my marble bag being all black – it only tells me how I should revise my initial best guess based on the newly obtained information.

Warning: Correct but slightly informal! Instead of changing the prior, we could consider a single new experiment whose evidence is drawing two black marbles.

P(F|E) = P(E|F) P(F) / [ P(E|F) P(F) + P(E|F^C) P(F^C) ]
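The two updates (0.5 → 0.83 → 0.96) amount to applying the same rule twice, feeding each posterior back in as the next prior. A minimal sketch, with the slide’s caveat about informality in mind (the function name and defaults are ours):

```python
def update(prior: float, lik_if_all_black: float = 1.0, lik_if_mixed: float = 0.2) -> float:
    """One Bayes update of P(all-black bag) after drawing a black marble
    (with replacement)."""
    evidence = lik_if_all_black * prior + lik_if_mixed * (1 - prior)
    return lik_if_all_black * prior / evidence

belief = 0.5             # initial 50:50 guess
belief = update(belief)  # first black marble  -> ≈ 0.833
belief = update(belief)  # second black marble -> ≈ 0.961
print(belief)
```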

Page 15: Discrete Math CS 280

Bayesian Inference

Original belief: I shrug my shoulders and guess that there is a 0.5 chance that my bag contains all black marbles.
Concrete evidence: 1st black marble.
Modified belief: increased to 0.83 probability.

Bayesian Inference

Concrete evidence: 2nd black marble.
Modified belief: increased to 0.96.

Page 16: Discrete Math CS 280

Generalized Bayes’ Theorem

Suppose that E is an event from a sample space S and F1, F2, …, Fn are mutually exclusive events such that F1 ∪ F2 ∪ … ∪ Fn = S.

Assume that P(E) ≠ 0 and P(Fi) ≠ 0 for i = 1, 2, …, n. Then

P(Fj|E) = P(E|Fj) P(Fj) / Σ_{i=1..n} P(E|Fi) P(Fi)

where the denominator is just P(E).

Compare:

P(F|E) = P(E|F) P(F) / [ P(E|F) P(F) + P(E|F^C) P(F^C) ]
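The generalized form translates directly into code. The sketch below (function name ours) takes parallel lists of priors P(Fi) and likelihoods P(E|Fi) and returns the whole posterior distribution.

```python
def posterior(priors, likelihoods):
    """Generalized Bayes: P(Fj|E) = P(E|Fj)P(Fj) / sum_i P(E|Fi)P(Fi).
    `priors` and `likelihoods` are parallel lists over F1, ..., Fn."""
    joint = [l * p for l, p in zip(likelihoods, priors)]  # P(E ∩ Fi)
    p_e = sum(joint)                                      # total probability P(E)
    return [j / p_e for j in joint]

# Two-hypothesis special case (the box example): recovers ≈ [0.645, 0.355].
print(posterior([0.5, 0.5], [7 / 9, 3 / 7]))
```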

Page 17: Discrete Math CS 280

17

Bayesian Spam Filters

Page 18: Discrete Math CS 280

18

Applying Bayes’ Theorem: SPAM or HAM?

Let our sample space or universe be the set of emails. (So, we’re sampling from the space of possible emails.)

Let S be the event that a message is spam; hence S^C is the event that a message is not spam.

Let E be the event that a message contains a word w.

Since we have no idea of the likelihood of spam, we assume P(S) = P(S^C) = 1/2. Can we do better?

How do we get P(E|S) and P(E|S^C)?

Page 19: Discrete Math CS 280

19

Estimations

p(E|S) is estimated by p(w), the fraction of spam messages that contain the word w; similarly, p(E|S^C) is estimated by q(w), the fraction of non-spam messages that contain w. Note these are estimates based on frequencies in samples.

Page 20: Discrete Math CS 280

20

Estimation Continued

Note P(S) = P(S^C) = ½ divides out.

So,

P(S|E) = P(E|S) P(S) / [ P(E|S) P(S) + P(E|S^C) P(S^C) ]

becomes

r(w) = p(w) / ( p(w) + q(w) ),   using the estimates p(E|S) ≈ p(w) and p(E|S^C) ≈ q(w).

So, a quite straightforward formula for our first Bayesian spam filter!

So, what do we want for p(w) and q(w)?
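With equal priors, as the slides assume, the single-word filter reduces to one line of arithmetic once p(w) and q(w) are estimated from message counts. A sketch (function name ours):

```python
def r_single(spam_with_w: int, n_spam: int, ham_with_w: int, n_ham: int) -> float:
    """Estimate P(spam | message contains w), assuming P(S) = P(S^C) = 1/2."""
    p_w = spam_with_w / n_spam  # fraction of spam messages containing w
    q_w = ham_with_w / n_ham    # fraction of non-spam messages containing w
    return p_w / (p_w + q_w)

# The "Nigeria" counts from the worked example later in these slides:
print(r_single(250, 2000, 5, 1000))  # ≈ 0.962
```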

Page 21: Discrete Math CS 280

21

Spam based on single words?

Probabilities based on single words: Bad Idea
– False positives AND false negatives aplenty.

Calculate based on n words, assuming each event Ei|S (and Ei|S^C) is independent, and P(S) = P(S^C).

Derivation: see Sect. 6.3.

Page 22: Discrete Math CS 280

22

Final Approximation

r(w1, …, wn) = Π_{i=1..n} p(wi) / [ Π_{i=1..n} p(wi) + Π_{i=1..n} q(wi) ]

Compare to single word: r(w) = p(w) / ( p(w) + q(w) )
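Under the same independence and equal-prior assumptions, the multi-word approximation is also a short sketch (function name ours):

```python
from math import prod

def r_multi(p_ws, q_ws):
    """Approximate P(spam | message contains w1..wn), assuming the word events
    are independent and P(S) = P(S^C) = 1/2."""
    num = prod(p_ws)                  # product of the p(wi)
    return num / (num + prod(q_ws))   # versus the product of the q(wi)

# Numbers from the multi-word example later in these slides:
# p(Nigeria) = 0.2, q(Nigeria) = 0.06, p(bank) = 0.1, q(bank) = 0.025.
print(r_multi([0.2, 0.1], [0.06, 0.025]))  # ≈ 0.930
```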

Page 23: Discrete Math CS 280

23

How do we use this?

User must train the filter based on messages in his/her inbox to estimate probabilities.

The program or user must define a threshold probability r:

If the estimated spam probability is greater than r, the message is considered spam.

Gmail: Train on all users! (note: report spam button)

Page 24: Discrete Math CS 280

24

Example

Suppose the filter has the following data:
Threshold probability: 0.9
“Nigeria” occurs in 250 of 2000 spam messages.
“Nigeria” occurs in only 5 of 1000 non-spam messages.
Let’s try to estimate the probability, using the process we just defined.

Page 25: Discrete Math CS 280

25

Example Cont.

Step 1: Estimate the probability that a spam message contains the word “Nigeria”:
– p(Nigeria) = 250 / 2000 = 0.125

Step 2: Estimate the probability that a non-spam message contains the word “Nigeria”:
– q(Nigeria) = 5 / 1000 = 0.005

Page 26: Discrete Math CS 280

26

Example Cont.

Since we are assuming that it is equally likely that an incoming message is or is not spam, we can estimate the probability with this equation:

r(Nigeria) = p(Nigeria) / ( p(Nigeria) + q(Nigeria) )

Page 27: Discrete Math CS 280

27

Example Cont.

r(Nigeria) = 0.125 / (0.125 + 0.005) = 0.125 / 0.130 ≈ 0.962

Since r(Nigeria) is greater than the threshold of 0.9, we can reject this message as spam.

Page 28: Discrete Math CS 280

28

Multiple Words

2000 spam messages; 1000 real messages.
“Nigeria” appears in 400 spam messages.
“Nigeria” appears in 60 real messages.
“bank” appears in 200 spam and 25 real messages.
Threshold probability: 0.9

Let’s calculate the probability that a message with “Nigeria” and “bank” is spam.

Page 29: Discrete Math CS 280

29

Example Cont.

Step 1: Estimate the probability that a spam message contains the word “Nigeria”:
– p(Nigeria) = 400 / 2000 = 0.2

Step 2: Estimate the probability that a non-spam message contains the word “Nigeria”:
– q(Nigeria) = 60 / 1000 = 0.06

Step 3: Estimate the probability that a spam message contains the word “bank”:
– p(bank) = 200 / 2000 = 0.1

Step 4: Estimate the probability that a non-spam message contains the word “bank”:
– q(bank) = 25 / 1000 = 0.025

Page 30: Discrete Math CS 280

30

Example Cont

Using our approximation, we have:

r(Nigeria, bank) = p(Nigeria)·p(bank) / [ p(Nigeria)·p(bank) + q(Nigeria)·q(bank) ]

Page 31: Discrete Math CS 280

31

Example Cont.

Using our approximation, we have:

r(Nigeria, bank) = p(Nigeria)·p(bank) / [ p(Nigeria)·p(bank) + q(Nigeria)·q(bank) ]

r(Nigeria, bank) = (0.2)(0.1) / [ (0.2)(0.1) + (0.06)(0.025) ] ≈ 0.930

Since 0.930 exceeds the threshold probability of 0.9, this message will be rejected as spam.

Concludes Bayes Reasoning

Page 32: Discrete Math CS 280

32

Probability Paradox I

Page 33: Discrete Math CS 280

Magic Dice: Or How to Win Every Time!

a) You select any one of the four dice (A, B, C, or D). b) I’ll select another.
Both dice are thrown; the highest number wins the throw. Do a series of 10 throws. The person with the most winning throws wins the series. (I.e., the die “more likely to get a higher number” wins.)

Claim: In a game of 'The Best of Ten Throws’, I will almost certainly win --- no matter which die you pick!!

Why is this strange? Say you pick die A. Let’s assume die B is better, so I pick B. But then, in the next game, the next person picks B. Let’s assume C is better; I’ll select C. The next person will pick C; I’ll pick D. The next person will pick D… Hmm… I’ll pick A and will win!!
A < B < C < D … < A !! Failure of transitivity!

But, could such a set of dice exist?

Surprisingly, yes!

Page 34: Discrete Math CS 280

Magic Dice

Non-transitive dice: http://www.sciencenews.org/20020420/mathtrek.asp

[Figure: face values of the four non-transitive dice A, B, C, and D.]

Prob(D wins over C) = 2/3, since 2/6 + (4/6)·(1/2) = 4/6.
Prob(C wins over B) = 2/3, since 3/6 + (3/6)·(2/6) = 4/6.
Prob(B wins over A) = 4/6 = 2/3 (i.e., Prob(A wins over B) = 1/3).
Prob(A wins over D) = 4/6 = 2/3.

A < B < C < D … < A !!

However: transitivity does hold in the expected value of a dice throw:
E[B] < E[A] = E[C] < E[D], i.e., 16/6 < 18/6 = 18/6 < 20/6.

Page 35: Discrete Math CS 280

35

Random Variables and Distributions

Page 36: Discrete Math CS 280

Random Variables

For a given sample space S, a random variable (r.v.) is any real valued

function on S, i.e., a random variable is a function that assigns a real

number to each possible outcome

Suppose our experiment is a roll of 2 dice. S is set of pairs.

Example random variables:


X = sum of two dice. X((2,3)) = 5

Y = difference between two dice. Y((2,3)) = 1

Z = max of two dice. Z((2,3)) = 3

[Figure: a random variable maps the sample space to numbers.]

Page 37: Discrete Math CS 280

Random variable

Suppose a coin is flipped three times. Let X(t) be the random variable that equals the number of heads that appear when t is the outcome.

X(HHH) = 3

X(HHT) = X(HTH)=X(THH)=2

X(TTH)=X(THT)=X(HTT)=1

X(TTT)=0

Note: we generally drop the argument! We’ll just say the “random variable X”.

And write e.g. P(X = 2) for “the probability that the random variable X(t) takes on the value 2”.

Or P(X=x) for “the probability that the random variable X(t) takes on the value x.”

Page 38: Discrete Math CS 280

38

Distribution of Random Variable

Definition:

The distribution of a random variable X on a sample space S is the set of pairs (r, p(X = r)) for all r ∈ X(S), where p(X = r) is the probability that X takes the value r.

A distribution is usually described by specifying p(X = r) for each r ∈ X(S).

A probability distribution on a r.v. X is just an allocation of

the total probability mass, 1, over the possible values of X.
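For instance, the distribution of X = sum of two dice (from the earlier example) can be tabulated directly. A small sketch:

```python
from collections import Counter
from fractions import Fraction

# Distribution of X = sum of two fair dice: the pairs (r, p(X = r)).
counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))
dist = {r: Fraction(c, 36) for r, c in sorted(counts.items())}
print(dist)  # e.g. p(X = 2) = 1/36, p(X = 7) = 1/6, p(X = 12) = 1/36
```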

Page 39: Discrete Math CS 280

41

The Birthday Paradox

Page 40: Discrete Math CS 280

Birthdays

How many people have to be in a room to assure that the probability that at least two of them have the same birthday is greater than 1/2?

a) 23   b) 183   c) 365   d) 730

Let pn be the probability that no people share a birthday among n people in a room.

We want the smallest n so that 1 - pn > 1/2.

Then 1 - pn is the probability that 2 or more share a birthday.

Hmm. Why does such an n exist? Upper-bound?

For L options the answer is on the order of sqrt(L)?

Informally, why??

A: 23

Page 41: Discrete Math CS 280

Birthdays

Assumption:

Birthdays of the people are independent.

Each birthday is equally likely, and there are 366 days/year.

Let pn be the probability that no-one shares a birthday among n people in a room.

Assume that the people come in a certain order; the probability that the second person has a birthday different from the first is 365/366; the probability that the third person has a birthday different from the two previous ones is 364/366. For the jth person we have (366-(j-1))/366.

What is pn? (“brute force” is fine)

Page 42: Discrete Math CS 280

So,

pn = (365/366) · (364/366) · (363/366) ··· ((367-n)/366)

1 - pn = 1 - (365/366) · (364/366) · (363/366) ··· ((367-n)/366)

After several tries: when n = 22, 1 - pn ≈ 0.475; when n = 23, 1 - pn ≈ 0.506.

Relevant to “hashing”. Why?
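A short script (ours, using the slides’ 366-day convention) reproduces these numbers and finds the threshold n; replacing 366 by m storage locations gives exactly the hashing analysis on the next slide.

```python
def p_no_shared(n: int, days: int = 366) -> float:
    """Probability that n people all have distinct birthdays."""
    p = 1.0
    for j in range(1, n):
        p *= (days - j) / days
    return p

print(1 - p_no_shared(22))  # ≈ 0.475
print(1 - p_no_shared(23))  # ≈ 0.506

# Smallest n with collision probability greater than 1/2:
n = 1
while 1 - p_no_shared(n) <= 0.5:
    n += 1
print(n)                    # 23
```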

Page 43: Discrete Math CS 280

45

From Birthday Problem to Hashing Functions

Probability of a Collision in Hashing Functions

A hashing function h(k) is a mapping of the keys (or records, e.g., SSNs, of which there are around 300 × 10^6 in the US) to a much smaller set of storage locations. A good hashing function yields few collisions. What is the probability that no two keys are mapped to the same location by a hashing function?

Assume m is the number of available storage locations, so the probability of mapping a key to a particular location is 1/m. Assuming the keys are k1, k2, …, kn, the probability that the jth record is mapped to a free location, after the first (j-1) records have been placed, is (m-(j-1))/m.

pn = ((m-1)/m) · ((m-2)/m) ··· ((m-(n-1))/m)

1 - pn = 1 - ((m-1)/m) · ((m-2)/m) ··· ((m-(n-1))/m)

Given a certain m, find the smallest n such that the probability of a collision is greater than a particular threshold p.

It can be shown that for p>1/2,

n ≈ 1.177 √m

m = 10,000, gives n = 117. Not that many!
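As a quick numerical check of the rule of thumb quoted above (the helper name is ours):

```python
from math import sqrt

def collision_threshold_estimate(m: int) -> float:
    """Approximate number of keys at which a collision becomes more likely
    than not, using the n ≈ 1.177 * sqrt(m) rule of thumb."""
    return 1.177 * sqrt(m)

print(collision_threshold_estimate(10_000))  # ≈ 117.7, in line with the slide's n = 117
```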