Bayesian Statistics: Asking the “Right” Questions

Michael L. Raymer, Ph.D.



8/29/03 M. Raymer – WSU, FBS 2

Statistical Games

“The defendant’s DNA is consistent with the evidentiary sample, and the defendant’s DNA type occurs with a frequency of one in 10,000,000,000.”


“Only about 0.1% of wife batterers actually murder their wives. Therefore, evidence of abuse and battering should not be admissible in a murder trial.”



The Question

• “Given the evidentiary DNA type and the defendant’s DNA type, what is the probability that the evidence sample contains the defendant’s DNA?”

• Information available:
  - How common is each allele in a particular population?
  - CPI (combined probability of inclusion), RMP (random match probability), etc.


An Example Problem

• Suppose the rate of breast cancer is 1%
• Mammograms detect breast cancer in 80% of cases where it is present
• 10% of the time, mammograms will indicate breast cancer in a healthy patient
• If a woman has a positive mammogram result, what is the probability that she has breast cancer?


Results

• 75% -- 3
• 50% -- 1
• 25% -- 2
• <10% -- a lot


Determining Probabilities

• Counting all possible outcomes
• If you flip a coin 4 times, what is the probability that you will get heads exactly twice?

TTTT THTT HTTT HHTT
TTTH THTH HTTH HHTH
TTHT THHT HTHT HHHT
TTHH THHH HTHH HHHH

• P(2 heads) = 6/16 = 0.375
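The counting argument can be checked by brute force. The sketch below is only an illustration (the function name `prob_exact_heads` is made up for this example): it enumerates every sequence of fair coin flips and counts those with the requested number of heads.

```python
from itertools import product


def prob_exact_heads(n_flips: int, n_heads: int) -> float:
    """Probability of exactly n_heads heads in n_flips fair coin flips,
    found by listing every equally likely outcome."""
    outcomes = list(product("HT", repeat=n_flips))  # all 2**n_flips sequences
    favorable = [o for o in outcomes if o.count("H") == n_heads]
    return len(favorable) / len(outcomes)


print(prob_exact_heads(4, 2))  # 6 favorable outcomes out of 16
```

Enumeration only works for small problems, since the number of outcomes doubles with every extra flip.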


Statistical Preliminaries

• Frequency and Probability
  - We can guess at probabilities by counting frequencies: P(heads) = 0.5
  - The law of large numbers: the more samples we take, the closer the observed frequency will tend to get to 0.5.
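A small simulation makes the law of large numbers concrete. This is a sketch under simple assumptions: a fair coin simulated with Python's pseudo-random generator, a fixed seed for repeatability, and a helper name (`heads_frequency`) invented for this example.

```python
import random


def heads_frequency(n_tosses: int, seed: int = 0) -> float:
    """Fraction of heads observed in n_tosses simulated fair coin tosses."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


# The observed frequency should drift toward 0.5 as the sample grows.
for n in (10, 100, 10_000, 100_000):
    print(n, heads_frequency(n))
```

Small samples can land far from 0.5; the larger runs should sit much closer to it.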


Distributions

• Counting frequencies gives us distributions

[Figure: two panels. Binomial Distribution (Discrete): P(N) versus N = # heads in 20 tosses, for N from 0 to 20. Gaussian Distribution (Continuous): P(N) versus N, for N from 0 to 2.]


Density Estimation

• Parametric
  - Assume a distribution (e.g., Gaussian).
  - Estimate the parameters (μ, σ).
• Non-parametric
  - Histogram sampling
  - Bin size is critical
  - Gaussian smoothing can help
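The two approaches can be compared on the same data. The following is a minimal sketch, not a recommended estimator: the sample is synthetic (drawn from a Gaussian with made-up mean 3.0 and standard deviation 0.5), and `histogram_density` is a hypothetical helper written for this example.

```python
import random
import statistics
from collections import Counter

# Synthetic sample standing in for measured data (assumed values).
rng = random.Random(1)
samples = [rng.gauss(3.0, 0.5) for _ in range(5000)]

# Parametric: assume a Gaussian and estimate its two parameters.
mu = statistics.mean(samples)
sigma = statistics.stdev(samples)


def histogram_density(data, bin_width):
    """Non-parametric estimate: map each bin's left edge to its density
    (count / (n * bin_width)), so the bins integrate to 1."""
    counts = Counter(int(x // bin_width) for x in data)
    n = len(data)
    return {b * bin_width: c / (n * bin_width) for b, c in sorted(counts.items())}


coarse = histogram_density(samples, 0.5)   # few wide bins: smooth but blurry
fine = histogram_density(samples, 0.05)    # many narrow bins: detailed but noisy
print(mu, sigma, len(coarse), len(fine))
```

Changing `bin_width` shows why bin size matters: wide bins hide the shape, narrow bins mostly show sampling noise.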


Combining Probabilities

• Non-overlapping outcomes: $P(E_1 \text{ or } E_2) = P(E_1) + P(E_2)$

• Possible overlap: $P(E_1 \text{ or } E_2) = P(E_1) + P(E_2) - P(E_1 \text{ and } E_2)$

• Independent events (the Product Rule): $P(E_1 \text{ and } E_2) = P(E_1)\,P(E_2)$


Product Rule Example

• P(Engine > 200 H.P.) = 0.2
• P(Color = red) = 0.3
• Assuming independence: P(Red & Fast) = 0.2 × 0.3 = 0.06
• 1/4 * 1/10 * 1/6 * 1/8 * 1/5 ≈ 1/10,000
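The product rule is easy to verify with exact fractions. This sketch assumes the listed events are independent, as the slide does; `joint_probability` is a name chosen for this example.

```python
from fractions import Fraction
from math import prod


def joint_probability(probabilities):
    """Product rule: probability that all independent events occur together."""
    return prod(probabilities, start=Fraction(1))


# Red and fast car, from the two probabilities above.
red_and_fast = joint_probability([Fraction(2, 10), Fraction(3, 10)])

# The five-factor product from the last bullet.
five_factors = joint_probability(
    [Fraction(1, 4), Fraction(1, 10), Fraction(1, 6), Fraction(1, 8), Fraction(1, 5)]
)

print(float(red_and_fast))  # 0.06
print(five_factors)         # 1/9600, roughly 1 in 10,000
```

The exact value of the five-factor product is 1/9600, which is why the slide's figure of 1/10,000 is an approximation.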


Statistical Decision Making

• One variable:

A ring was found at the scene of the crime. The ring is size 11. The defendant’s ring size is also 11. If a random ring were left at the crime scene, what is the probability that it would have been size 11?

[Figure: histogram of frequency (axis 0 to 120) by ring size (5 to 13).]


Multiple Variables

The ring is size 11, and also made of platinum.

• Assume independence:

$P(\text{size 11}) = 34/382 \approx 0.09$

$P(\text{platinum}) = 2/382 \approx 0.005$

$P(\text{size 11 and platinum}) \approx 0.09 \times 0.005 = 0.00045$

Note what happens to significant digits!
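The remark about significant digits can be made concrete by comparing the product of the rounded probabilities with the product of the exact fractions. The counts 34/382 and 2/382 come from the slide; the variable names are illustrative only.

```python
from fractions import Fraction

p_size_11 = Fraction(34, 382)   # rings of size 11 in the sample
p_platinum = Fraction(2, 382)   # platinum rings in the sample

# Product of the exact fractions (still assuming independence).
exact = float(p_size_11 * p_platinum)

# Product after rounding each factor the way the slide does.
rounded = round(float(p_size_11), 2) * round(float(p_platinum), 3)

relative_error = abs(exact - rounded) / exact
print(f"exact   = {exact:.6f}")
print(f"rounded = {rounded:.6f}")
print(f"relative error = {relative_error:.1%}")
```

Each factor was rounded to one significant digit, so the product cannot be trusted beyond about one significant digit either, and the error grows as more factors are multiplied.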


Which Question?

• If a fruit has a diameter of 4”, how likely is it to be an apple?

[Figure: diagram with regions labeled “Apples” and “4” Fruit”.]


“Inverting” the question

Given an apple, what is the probability that it will have a diameter of 4”?


Given a 4” diameter fruit, what is the probability that it is an apple?



Forensic DNA Evidence

• Given alleles (17, 17), (19, 21), (14, 15.1), what is the probability that a DNA sample belongs to Bob?
  - Find all (17, 17), (19, 21), (14, 15.1) individuals. How many of them are Bob?
  - How common are 17, 19, 21, 14, and 15.1 in “the population”?


Conditional Probabilities

• For related events, we can express probability conditionally:

$P(\text{rain} \mid \text{cloudy}) = 0.5$

$P(\text{rain} \mid \text{sunny}) = 0.01$

• Statistical Independence:

$P(E_1 \mid E_2) = P(E_1)$


Bayesian Decision Making

• Terminology
  - We have an object, and we want to decide if it belongs to a class
    - Is this fruit a type of apple?
    - Does this DNA come from a Caucasian American?
    - Is this car a sports car?
  - We measure features of the object (evidence):
    - Size, weight, color
    - Alleles at various loci


Bayesian Notation

• Feature/Evidence Vector:

$x_1 = (\text{red},\ 1.6\ \text{oz},\ 2.9'')$

$x_2 = (\text{yellow},\ .39\ \text{oz},\ 3.3'')$

• Classes & Posterior Probability:

$P(\text{apple} \mid x_2) = 0.40$

$P(\text{pear} \mid x_2) = 0.15$


A Simple Example

• You are given a fruit with a diameter of 4”. Is it a pear or an apple?

• To begin, we need to know the distributions of diameters for pears and apples.


Maximum Likelihood

[Figure: Class-Conditional Distributions. Curves for $P(x \mid \text{apple})$ and $P(x \mid \text{pear})$, plotted as P(x) against diameter x, with the axis marked from 1” to 6”.]


A Key Problem

• We based this decision on $P(x \mid \text{pear})$ (class conditional)

• What we really want to use is $P(\text{pear} \mid x)$ (posterior probability)

• What if we found the fruit in a pear orchard?

• We need to know the prior probability of finding an apple or a pear!


Prior Probabilities

• Prior probability + Evidence → Posterior Probability

• Without evidence, what is the “prior probability” that a fruit is an apple?

• What is the prior probability that a DNA sample comes from the defendant?


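The heart of it all

• Bayes Rule

$$P(\text{class} \mid \text{evidence}) = \frac{P(\text{evidence} \mid \text{class})\,P(\text{class})}{\sum_{\text{all classes}} P(\text{evidence} \mid \text{class})\,P(\text{class})}$$

$$P(\text{apple} \mid d = 4'') = \frac{p(d = 4'' \mid \text{apple})\,P(\text{apple})}{p(d = 4'' \mid \text{apple})\,P(\text{apple}) + p(d = 4'' \mid \text{pear})\,P(\text{pear})}$$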
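Bayes rule translates almost directly into code. The function below is a minimal sketch for a finite set of classes; the name `posterior`, the class labels "a" and "b", and the numbers in the usage lines are all invented for illustration.

```python
def posterior(priors, likelihoods):
    """Apply Bayes rule over a finite set of classes.

    priors:      {class: P(class)}
    likelihoods: {class: P(evidence | class)}
    returns:     {class: P(class | evidence)}
    """
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    evidence = sum(joint.values())  # the denominator: sum over all classes
    return {c: j / evidence for c, j in joint.items()}


# Made-up numbers: equal priors, evidence three times as likely under "b".
result = posterior({"a": 0.5, "b": 0.5}, {"a": 0.2, "b": 0.6})
print(result)
```

Because every class shares the same denominator, the posteriors always sum to 1 over the classes considered.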


Bayes Rule

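$$P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\,P(\omega_j)}{\sum_{k=1}^{c} p(x \mid \omega_k)\,P(\omega_k)}$$

or

$$P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\,P(\omega_j)}{p(x)}$$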


Example Revisited

• Is it an ordinary apple or an uncommon pear?

$p(d = 4'' \mid \text{apple}) = 0.4$

$p(d = 4'' \mid \text{pear}) = 0.05$

$P(\text{apple}) = 0.1$

$P(\text{pear}) = 0.9$


Bayes Rule Example

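$$P(\text{apple} \mid d = 4'') = \frac{p(d = 4'' \mid \text{apple})\,P(\text{apple})}{p(d = 4'' \mid \text{apple})\,P(\text{apple}) + p(d = 4'' \mid \text{pear})\,P(\text{pear})} = \frac{0.4 \times 0.1}{0.4 \times 0.1 + 0.05 \times 0.9} = \frac{0.04}{0.085} \approx 0.47$$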


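Bayes Rule Example

$$P(\text{pear} \mid d = 4'') = \frac{p(d = 4'' \mid \text{pear})\,P(\text{pear})}{p(d = 4'' \mid \text{apple})\,P(\text{apple}) + p(d = 4'' \mid \text{pear})\,P(\text{pear})} = \frac{0.05 \times 0.9}{0.4 \times 0.1 + 0.05 \times 0.9} = \frac{0.045}{0.085} \approx 0.53$$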
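The apple and pear numbers can be checked in a few lines. The class-conditional values and priors are the ones given in the example; the dictionary names are illustrative only.

```python
# Values from the example: likelihood of a 4-inch diameter and prior per fruit.
p_d4_given = {"apple": 0.4, "pear": 0.05}
prior = {"apple": 0.1, "pear": 0.9}

# Shared denominator of Bayes rule: total probability of a 4-inch fruit.
p_d4 = sum(p_d4_given[f] * prior[f] for f in prior)

posterior_d4 = {f: p_d4_given[f] * prior[f] / p_d4 for f in prior}

for fruit, p in posterior_d4.items():
    print(f"P({fruit} | d = 4 in) = {p:.2f}")
```

Although a 4” diameter is eight times more likely for an apple than for a pear, the 9-to-1 prior in favor of pears is enough to make "pear" the slightly more probable answer.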


Posing the question

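1. What are the classes?
2. What is the evidence?
3. What is the prior probability?
4. What is the class-conditional probability?

$$P(\text{class} \mid \text{evidence}) = \frac{P(\text{evidence} \mid \text{class})\,P(\text{class})}{\sum_{\text{all classes}} P(\text{evidence} \mid \text{class})\,P(\text{class})}$$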


An Example Problem

• Suppose the rate of breast cancer is 1%
• Mammograms detect breast cancer in 80% of cases where it is present
• 10% of the time, mammograms will indicate breast cancer in a healthy patient
• If a woman has a positive mammogram result, what is the probability that she has breast cancer?


Practice Problem Revisited

• Classes: healthy, cancer
• Evidence: positive mammogram (pos), negative mammogram (neg)

$P(\text{cancer}) = 0.01$

$P(\text{pos} \mid \text{cancer}) = 0.8$

$P(\text{pos} \mid \text{healthy}) = 0.1$

• If a woman has a positive mammogram result, what is the probability that she has breast cancer? $P(\text{cancer} \mid \text{pos}) = ?$


A Counting Argument

• Suppose we have 1000 women
  - 10 will have breast cancer
    - 8 of these will have a positive mammogram
  - 990 will not have breast cancer
    - 99 of these will have a positive mammogram
  - Of the 107 women with a positive mammogram, 8 have breast cancer
  - 8/107 ≈ 0.075 = 7.5%
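The same counting argument can be written out as code. This sketch uses the rates from the practice problem; the function name and its parameter names are chosen for this example.

```python
def count_positive_cases(population, prevalence, sensitivity, false_positive_rate):
    """Split a population into true-positive and false-positive test results."""
    sick = round(population * prevalence)
    healthy = population - sick
    true_positives = round(sick * sensitivity)
    false_positives = round(healthy * false_positive_rate)
    return true_positives, false_positives


true_pos, false_pos = count_positive_cases(1000, 0.01, 0.8, 0.1)
all_pos = true_pos + false_pos
p_cancer_given_pos = true_pos / all_pos

print(f"{true_pos} of {all_pos} positive results are real: {p_cancer_given_pos:.1%}")
```

The false positives dominate simply because there are so many more healthy women than sick ones.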


Solution

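$$P(\text{cancer} \mid \text{pos}) = \frac{p(\text{pos} \mid \text{cancer})\,P(\text{cancer})}{p(\text{pos} \mid \text{cancer})\,P(\text{cancer}) + p(\text{pos} \mid \text{healthy})\,P(\text{healthy})} = \frac{0.8 \times 0.01}{0.8 \times 0.01 + 0.1 \times 0.99} = \frac{0.008}{0.107} \approx 0.075 = 7.5\%$$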


An Example Problem

• Suppose the chance of a randomly chosen person being guilty is .001

• When a person is guilty, a DNA sample will match that individual 99% of the time.

• .0001 of the time, a DNA test will exhibit a false match for an innocent individual

• If a DNA test demonstrates a match, what is the probability of guilt?


Solution

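$$P(\text{guilt} \mid \text{pos}) = \frac{p(\text{pos} \mid \text{guilt})\,P(\text{guilt})}{p(\text{pos} \mid \text{guilt})\,P(\text{guilt}) + p(\text{pos} \mid \text{innocent})\,P(\text{innocent})} = \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.0001 \times 0.999} = \frac{0.00099}{0.00099 + 0.0000999} \approx 0.908$$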
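The DNA example can be recomputed with a short helper, which also makes it easy to try other priors. This is a sketch: `posterior_guilt` and its parameter names are invented here, and the inputs are the three numbers stated in the problem.

```python
def posterior_guilt(prior_guilt, p_match_given_guilt, p_match_given_innocent):
    """P(guilt | match) for the two classes guilty / innocent."""
    joint_guilty = p_match_given_guilt * prior_guilt
    joint_innocent = p_match_given_innocent * (1.0 - prior_guilt)
    return joint_guilty / (joint_guilty + joint_innocent)


p_guilt = posterior_guilt(
    prior_guilt=0.001,
    p_match_given_guilt=0.99,
    p_match_given_innocent=0.0001,
)
print(f"P(guilt | match) = {p_guilt:.3f}")
```

With these inputs the posterior comes out at about 0.91, noticeably lower than the 99.99% that the false-match rate alone might suggest.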


Marginal Distributions

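[Figure: marginal class-conditional distributions for each feature: $P(x_1 \mid \text{apple})$ and $P(x_1 \mid \text{pear})$; $P(x_2 \mid \text{apple})$ and $P(x_2 \mid \text{pear})$.]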


Combining Marginals

• Assuming independent features:

$$P(x \mid \omega_j) = P(x_1 \mid \omega_j)\,P(x_2 \mid \omega_j) \cdots P(x_d \mid \omega_j)$$

• If we assume independence and use Bayes rule, we have a Naïve Bayes decision maker (classifier).
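A Naïve Bayes classifier for discrete features can be sketched in a few lines. Everything numeric below is hypothetical: the priors, the per-feature marginal tables, and the feature values ("red", "large") are made up purely to show the mechanics.

```python
from math import prod


def naive_bayes_posterior(priors, marginals, features):
    """Posterior over classes, assuming features are independent given the class.

    priors:    {class: P(class)}
    marginals: {class: [table for feature 1, table for feature 2, ...]},
               where each table maps a feature value to P(value | class)
    features:  observed feature values, in the same order as the tables
    """
    scores = {
        c: priors[c] * prod(marginals[c][i][v] for i, v in enumerate(features))
        for c in priors
    }
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


# Hypothetical marginal distributions for color and size.
priors = {"apple": 0.5, "pear": 0.5}
marginals = {
    "apple": [{"red": 0.7, "yellow": 0.3}, {"small": 0.4, "large": 0.6}],
    "pear": [{"red": 0.1, "yellow": 0.9}, {"small": 0.5, "large": 0.5}],
}

post = naive_bayes_posterior(priors, marginals, ("red", "large"))
print(post)
```

The appeal of the independence assumption is visible in the data structure: only one small table per feature and class is needed, rather than a joint table over every combination of feature values.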


Bayes Decision Rule

• Predict class $\omega_i$ such that $P(\omega_i \mid x) \ge P(\omega_j \mid x)$ for all $j$.

• Provably optimum when the features (evidence) follow Gaussian distributions, and are independent.
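In code, the decision rule is just an arg-max over the posterior probabilities. The sketch below reuses the two posterior values quoted earlier for the feature vector x_2 (0.40 for apple, 0.15 for pear); the function name is an assumption of this example.

```python
def predict_class(posteriors):
    """Bayes decision rule: pick the class with the largest posterior."""
    return max(posteriors, key=posteriors.get)


# Posteriors for x_2 from the notation slide; the remaining probability
# mass belongs to other classes that are not listed.
posteriors_x2 = {"apple": 0.40, "pear": 0.15}
print(predict_class(posteriors_x2))
```

Only the ordering of the posteriors matters for the decision, not their exact values.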


Forensic DNA

• Classes: DNA from defendant, DNA not from defendant
• Evidence: Allele matches at various loci
  - Assumption of independence
• Prior Probabilities?
  - Assumed equal (0.5)
  - What is the true prior probability that an evidence sample came from a particular individual?


The Importance of Priors

P(x|ω)    P(ω)     P(ω|x) ∝ P(x|ω) P(ω)
0.1       0.5      0.05
0.1       0.25     0.025
0.1       0.1      0.01
0.1       0.01     0.001
0.1       0.001    0.0001
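The table can be regenerated to see how strongly the prior drives the result. This sketch holds the class-conditional probability fixed at 0.1 and multiplies it by each prior listed; the products are unnormalized (the shared denominator of Bayes rule is left out).

```python
likelihood = 0.1                          # P(x | class), fixed in every row
priors = [0.5, 0.25, 0.1, 0.01, 0.001]    # P(class), shrinking row by row

# Each row: (likelihood, prior, unnormalized posterior).
rows = [(likelihood, prior, likelihood * prior) for prior in priors]

for lik, prior, joint in rows:
    print(f"{lik:<8}{prior:<8}{joint:.4g}")
```

The same evidence yields a score 500 times smaller in the last row than in the first, solely because of the prior.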


Likelihood Ratios

• When deciding between two possibilities, we don’t need the exact probabilities. We only need to know which one is greater.

• The denominator for all the classes is always equal.
  - Can be eliminated
  - Useful when there are many possible classes


Likelihood Ratio Example

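$$P(\text{pear} \mid d = 4'') = \frac{p(d = 4'' \mid \text{pear})\,P(\text{pear})}{p(d = 4'' \mid \text{apple})\,P(\text{apple}) + p(d = 4'' \mid \text{pear})\,P(\text{pear})}$$

$$P(\text{apple} \mid d = 4'') = \frac{p(d = 4'' \mid \text{apple})\,P(\text{apple})}{p(d = 4'' \mid \text{apple})\,P(\text{apple}) + p(d = 4'' \mid \text{pear})\,P(\text{pear})}$$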


Likelihood Ratio Example

$$\frac{p(d = 4'' \mid \text{pear})\,P(\text{pear})}{p(d = 4'' \mid \text{apple})\,P(\text{apple})}$$
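With the shared denominator dropped, the comparison reduces to a single ratio. The sketch below uses the apple and pear numbers from the earlier example and defines the ratio as pear over apple; a value above 1 favors pear, below 1 favors apple. The variable names are illustrative.

```python
# Numbers from the earlier 4-inch fruit example.
p_d4_given = {"apple": 0.4, "pear": 0.05}
prior = {"apple": 0.1, "pear": 0.9}

# Ratio of the two Bayes-rule numerators; the denominators cancel.
ratio = (p_d4_given["pear"] * prior["pear"]) / (p_d4_given["apple"] * prior["apple"])

decision = "pear" if ratio > 1 else "apple"
print(f"ratio = {ratio:.3f} -> {decision}")
```

The ratio is 0.045 / 0.04, slightly above 1, so the decision matches the one obtained from the full posteriors without ever computing the denominator.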


From alleles to identity:

• It is relatively easy to find the allele frequencies in the population → Marginal probability distributions

• Independence assumption → Class conditional probabilities

• Equal prior probabilities → Bayesian posterior probability estimate


Thank you.


A Key Advantage

• The oldest citation:

T. Bayes. “An essay towards solving a problem in the doctrine of chances.” Phil. Trans. Roy. Soc., 53, 1763.
