Bayesian inference: calculate the model parameters that produce a distribution that gives the observed data the greatest probability


Page 1:

Bayesian inference: calculate the model parameters that produce a distribution that gives the observed data the greatest probability

Page 2:

Thomas Bayes

Bayesian methods were invented in the 18th century, but their application in phylogenetics dates from 1996.

Thomas Bayes? (1701?-1761?)

Page 3:

Bayes’ theorem

Bayes’ theorem links a conditional probability to its inverse:

Prob(H|D) = Prob(H) Prob(D|H) / ∑_H Prob(H) Prob(D|H)

Page 4:

Bayes’ theorem

In the case of two alternative hypotheses, the theorem can be written as

Prob(H|D) = Prob(H) Prob(D|H) / ∑_H Prob(H) Prob(D|H)

Prob(H1|D) = Prob(H1) Prob(D|H1) / [Prob(H1) Prob(D|H1) + Prob(H2) Prob(D|H2)]

Page 5:

Bayes’ theorem

Bayes for smarties

(figure: the data D is a handful of five smarties, four orange and one blue)

H1 = D came from the mainly orange bag

H2 = D came from the mainly blue bag

Prob(D|H1) = ¾ · ¾ · ¾ · ¾ · ¼ · 5 = 405/1024

Prob(D|H2) = ¼ · ¼ · ¼ · ¼ · ¾ · 5 = 15/1024

Prob(H1) = ½, Prob(H2) = ½

Prob(H1|D) = Prob(H1) Prob(D|H1) / [Prob(H1) Prob(D|H1) + Prob(H2) Prob(D|H2)]
           = (½ · 405/1024) / (½ · 405/1024 + ½ · 15/1024) ≈ 0.964
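A quick numerical check of this slide’s arithmetic; a minimal Python sketch, assuming the ¾/¼ bag compositions shown above and the factor 5 for the position of the single blue smartie (variable names are illustrative):

```python
from fractions import Fraction

# Likelihood of the data (4 orange, 1 blue smarties) under each bag;
# the factor 5 counts the possible positions of the single blue smartie.
p_D_given_H1 = 5 * Fraction(3, 4) ** 4 * Fraction(1, 4)   # mainly orange bag -> 405/1024
p_D_given_H2 = 5 * Fraction(1, 4) ** 4 * Fraction(3, 4)   # mainly blue bag   -> 15/1024

# equal prior probability for each bag
p_H1 = p_H2 = Fraction(1, 2)

# Bayes' theorem: posterior = prior * likelihood / normalizing constant
posterior_H1 = (p_H1 * p_D_given_H1) / (p_H1 * p_D_given_H1 + p_H2 * p_D_given_H2)

print(posterior_H1, float(posterior_H1))   # 27/28, about 0.964
```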


Page 6:

Bayes’ theorem

A priori knowledge can affect one’s conclusions

             positive test result    negative test result
ill          true positive           false negative
healthy      false positive          true negative

             positive test result    negative test result
ill          99%                     1%
healthy      0.1%                    99.9%

using the data only, P(ill | positive test result) ≈ 0.99


Page 8:

Bayes’ theorem

A priori knowledge can affect one’s conclusions

                  positive test result    negative test result
ill               99%                     1%
healthy           0.1%                    99.9%

a priori knowledge: 0.1% of the population (n = 100 000) is ill

                  positive test result    negative test result
Ill (100)         99                      1
Healthy (99 900)  100                     99 800

with a priori knowledge, 99/199 of the persons with a positive test result are ill: P(ill | positive result) ≈ 50%
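The same calculation as a minimal Python sketch, using the slide’s numbers (sensitivity 99%, false-positive rate 0.1%, prevalence 0.1%):

```python
# Bayes' theorem with the slide's numbers
sensitivity = 0.99          # P(positive | ill)
false_positive_rate = 0.001 # P(positive | healthy)
prevalence = 0.001          # P(ill), the a priori knowledge

# normalizing constant: overall probability of a positive test result
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
p_ill_given_positive = sensitivity * prevalence / p_positive

print(round(p_ill_given_positive, 3))   # about 0.5, not 0.99
```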

Page 9:

Bayes’ theorem

A priori knowledge can affect one’s conclusions

Page 10:

Bayes’ theorem

A priori knowledge can affect one’s conclusions

Page 11:

Bayes’ theorem

A priori knowledge can affect one’s conclusions

Behind door 1   Behind door 2   Behind door 3   Result if staying at door 1   Result if switching to the door offered
Car             Goat            Goat            Car                           Goat
Goat            Car             Goat            Goat                          Car
Goat            Goat            Car             Goat                          Car
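The table can also be checked by simulation; a minimal sketch, where the play helper, the random host choice and the trial count are illustrative assumptions rather than anything from the slides:

```python
import random

# Simulate the Monty Hall game: staying wins about 1/3 of the time,
# switching wins about 2/3, as in the table above.
def play(switch, trials=100_000):
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)    # door hiding the car
        pick = random.randrange(3)   # player's initial selection
        # the host opens a goat door that is neither the pick nor the car
        host = random.choice([d for d in range(3) if d != pick and d != car])
        if switch:
            pick = next(d for d in range(3) if d != pick and d != host)
        wins += (pick == car)
    return wins / trials

print("stay:  ", play(switch=False))   # about 0.33
print("switch:", play(switch=True))    # about 0.67
```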

Page 12:

Bayes’ theorem

A priori knowledge can affect one’s conclusions

P(C=c | H=h, S=s) = P(H=h | C=c, S=s) · P(C=c | S=s) / P(H=h | S=s)

C = number of the door hiding the car
S = number of the door selected by the player
H = number of the door opened by the host

the probability of finding the car, after the original selection and the host’s opening of one door

Page 13:

Bayes’ theorem

A priori knowledge can affect one’s conclusions

P(C=c | H=h, S=s) = P(H=h | C=c, S=s) · P(C=c | S=s) / ∑_c′ P(H=h | C=c′, S=s) · P(C=c′ | S=s)

C = number of the door hiding the car
S = number of the door selected by the player
H = number of the door opened by the host

the host’s behaviour depends on the candidate’s selection and on where the car is

Page 14:

Bayes’ theorem

A priori knowledge can affect one’s conclusions

P(C=2 | H=3, S=1) = (1 · 1/3) / (1/2 · 1/3 + 1 · 1/3 + 0 · 1/3) = 2/3

C = number of the door hiding the car
S = number of the door selected by the player
H = number of the door opened by the host
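The same posterior follows directly from the formula on the previous slide; a minimal sketch using the slide’s numbers (the player selects door 1, the host opens door 3):

```python
from fractions import Fraction

# doors are numbered 1..3; S = 1 (player's pick), H = 3 (door opened by the host)
prior = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}   # P(C=c | S=1)

# P(H=3 | C=c, S=1): the host never opens the selected door or the car door
likelihood = {1: Fraction(1, 2),   # car behind door 1: host opens door 2 or 3 at random
              2: Fraction(1, 1),   # car behind door 2: host must open door 3
              3: Fraction(0, 1)}   # car behind door 3: host cannot open it

normalizer = sum(prior[c] * likelihood[c] for c in prior)           # P(H=3 | S=1)
posterior = {c: prior[c] * likelihood[c] / normalizer for c in prior}

print(posterior)   # door 1: 1/3, door 2: 2/3, door 3: 0
```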

Page 15:

Bayes’ theorem

Bayes’ theorem is used to combine a prior probability with the likelihood to produce a posterior probability:

Prob(H|D) = Prob(H) Prob(D|H) / ∑_H Prob(H) Prob(D|H)

posterior probability = prior probability × likelihood / normalizing constant
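The labelled formula translates into a few lines of code; a minimal sketch in which the posterior helper is a hypothetical name introduced here for illustration:

```python
def posterior(priors, likelihoods):
    """Combine prior probabilities with likelihoods via Bayes' theorem.

    priors[h]      = Prob(H)       (prior probability)
    likelihoods[h] = Prob(D | H)   (likelihood)
    Returns Prob(H | D) for every hypothesis h (posterior probability).
    """
    # normalizing constant: sum over all hypotheses of prior * likelihood
    z = sum(priors[h] * likelihoods[h] for h in priors)
    return {h: priors[h] * likelihoods[h] / z for h in priors}

# the smarties example again: equal priors, likelihoods 405/1024 and 15/1024
print(posterior({"H1": 0.5, "H2": 0.5}, {"H1": 405 / 1024, "H2": 15 / 1024}))
```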

Page 16:

Bayesian inference of trees

In BI, the players are the tree topology and branch lengths, the evolutionary model, and the (sequence) data.

(figure: tree topology and branch lengths; evolutionary model, a substitution scheme among A, C, G and T; (sequence) data)

Page 17:

Bayesian inference of trees

The posterior probability of a tree is calculated from the prior and the likelihood:

Prob(tree, model | data) = Prob(tree, model) · Prob(data | tree, model) / Prob(data)

posterior probability of a tree = prior probability of a tree × likelihood / normalizing constant, where the denominator Prob(data) is a summation over all possible branch lengths and model parameter values

Page 18:

Bayesian inference of trees

the prior probability of a tree is often not known and therefore all trees are considered equally probable

(figure: the 15 possible unrooted trees for the five taxa A, B, C, D and E, each with prior probability 1/15)

Page 19:

Bayesian inference of trees

the prior probability of a tree is often not known and therefore all trees are considered equally probable

(figure: Prob(Tree i), the prior probability; Prob(Data | Tree i), the likelihood; and Prob(Tree i | Data), the posterior probability, shown for each tree)

Page 20:

Bayesian inference of trees

but prior knowledge of taxonomy could suggest other prior probabilities

(CDE) constrained:

(figure: the same 15 trees, but only the three trees in which C, D and E form a clade receive prior probability 1/3 each; the other twelve trees receive prior probability 0)

Page 21:

Bayesian inference of trees

BI requires summation over all possible trees … which is impossible to do analytically

Prob(tree, model | data) = Prob(tree, model) · Prob(data | tree, model) / Prob(data)

the denominator Prob(data) involves a summation over all possible trees, branch lengths and model parameter values

Page 22:

Bayesian inference of trees

but Markov chain Monte Carlo allows approximating posterior probability

1. Start at a random point

(figure: posterior probability density over parameter space, with peaks at tree 1, tree 2 and tree 3)

Page 23:

Bayesian inference of trees

but Markov chain Monte Carlo allows approximating posterior probability

1. Start at a random point
2. Make a small random move
3. Calculate posterior density ratio r = new/old state

(figure: posterior probability density over parameter space, with peaks at tree 1, tree 2 and tree 3; points 1 and 2 mark the current and the proposed state)

Page 24:

Bayesian inference of trees

but Markov chain Monte Carlo allows approximating posterior probability

1. Start at a random point
2. Make a small random move
3. Calculate posterior density ratio r = new/old state
4. If r > 1 always accept move

(figure: the uphill move from point 1 to point 2 is always accepted)

Page 25:

Bayesian inference of trees

but Markov chain Monte Carlo allows approximating posterior probability

1. Start at a random point
2. Make a small random move
3. Calculate posterior density ratio r = new/old state
4. If r > 1 always accept move
   If r < 1 accept move with a probability ~ 1/distance

(figure: a small downhill move from point 1 to point 2 is perhaps accepted)

Page 26:

Bayesian inference of trees

but Markov chain Monte Carlo allows approximating posterior probability

1. Start at a random point
2. Make a small random move
3. Calculate posterior density ratio r = new/old state
4. If r > 1 always accept move
   If r < 1 accept move with a probability ~ 1/distance

(figure: a large downhill move from point 1 to point 2 is rarely accepted)

Page 27:

Bayesian inference of trees

the proportion of time that MCMC spends in a particular parameter region is an estimate of that region’s posterior probability

1. Start at a random point
2. Make a small random move
3. Calculate posterior density ratio r = new/old state
4. If r > 1 always accept move
   If r < 1 accept move with a probability ~ 1/distance
5. Go to step 2

(figure: posterior probability density over parameter space; the chain spends roughly 20% of its time around tree 1, 48% around tree 2 and 32% around tree 3)
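Steps 1 to 5 describe the Metropolis algorithm. Below is a minimal sketch on a toy one-dimensional "posterior density" with three peaks standing in for tree 1, tree 2 and tree 3; a real phylogenetic run would evaluate the tree-and-model posterior instead, and the standard Metropolis rule (accept a downhill move with probability r) is used here as one reading of the slide’s "probability ~ 1/distance":

```python
import math
import random

# Toy "posterior density": three peaks standing in for tree 1, tree 2, tree 3
# (the weights echo the 20% / 48% / 32% proportions in the figure).
def density(x):
    peaks = [(0.0, 0.20), (3.0, 0.48), (6.0, 0.32)]   # (location, weight)
    return sum(w * math.exp(-0.5 * (x - m) ** 2) for m, w in peaks)

def metropolis(steps=100_000, step_size=1.0):
    x = random.uniform(-2.0, 8.0)              # 1. start at a random point
    samples = []
    for _ in range(steps):
        y = x + random.gauss(0.0, step_size)   # 2. make a small random move
        r = density(y) / density(x)            # 3. posterior density ratio new/old
        if r >= 1 or random.random() < r:      # 4. if r > 1 always accept,
            x = y                              #    otherwise accept with probability r
        samples.append(x)                      # 5. go to step 2
    return samples

samples = metropolis()
# the proportion of time spent in each region estimates its posterior probability
for name, lo, hi in [("tree 1", -2.0, 1.5), ("tree 2", 1.5, 4.5), ("tree 3", 4.5, 8.0)]:
    print(name, sum(lo <= s < hi for s in samples) / len(samples))
```

The printed proportions of time spent near each peak approximate the 20%/48%/32% split in the figure.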

Page 28:

Bayesian inference of trees

Metropolis-coupled Markov chain Monte Carlo speeds up the search

cold chain: P(tree | data)
hot chain: P(tree | data)^β
hotter chain: P(tree | data)^β
hottest chain: P(tree | data)^β

with 0 < β < 1: the cold chain samples the actual, peaked posterior, while the heated chains see a flatter surface
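A rough sketch of the coupling idea; the temperature schedule β = 1, 0.5, 0.25, 0.1, the toy two-peak density and the swap interval are assumptions for illustration, since the slide only specifies 0 < β < 1 for the heated chains:

```python
import math
import random

# Two well-separated peaks: a chain sampling only the peaked (cold) surface
# can get stuck on one of them, which is what the heated chains help with.
def density(x):
    return math.exp(-0.5 * x ** 2) + math.exp(-0.5 * (x - 6.0) ** 2)

betas = [1.0, 0.5, 0.25, 0.1]                  # cold chain first, then flatter, hotter chains
states = [random.uniform(-2.0, 8.0) for _ in betas]
cold_samples = []

for step in range(50_000):
    # ordinary Metropolis update within each chain, on its own tempered surface density**beta
    for i, beta in enumerate(betas):
        y = states[i] + random.gauss(0.0, 1.0)
        r = (density(y) / density(states[i])) ** beta
        if r >= 1 or random.random() < r:
            states[i] = y
    # occasionally propose swapping the states of two neighbouring chains
    if step % 10 == 0:
        i = random.randrange(len(betas) - 1)
        a, b = states[i], states[i + 1]
        swap_r = (density(b) / density(a)) ** betas[i] * (density(a) / density(b)) ** betas[i + 1]
        if swap_r >= 1 or random.random() < swap_r:
            states[i], states[i + 1] = b, a
    cold_samples.append(states[0])             # only the cold chain is used for inference

# the cold chain should now visit both peaks in roughly equal proportion
left = sum(s < 3.0 for s in cold_samples) / len(cold_samples)
print("peak near 0:", round(left, 2), " peak near 6:", round(1 - left, 2))
```

Only the cold chain’s samples are used for inference; the heated chains just help it escape local optima, like the hot scout on the next slide.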

Page 29:

Bayesian inference of trees

Metropolis-coupled Markov chain Monte Carlo speeds up the search

(figure: the cold scout is stuck on a local optimum; the hot scout, signalling a better spot, calls "Hey! Over here!")