


Description: discussion of the article "Bayes Estimators for Phylogenetic Reconstruction" (Syst. Biol. 60(4), 528–540, 2011, doi: 10.1093/sysbio/syr021), presented by Leo Martins to the Phylogenomics Lab of the University of Vigo.

Page 1: Journal Club @ UVigo 2011.07.22

Journal Club – Bayes Estimators for Phylogenetic Reconstruction

Syst. Biol. 60(4), 528–540, 2011, doi: 10.1093/sysbio/syr021

Leonardo de O. Martins

University of Vigo

July 22, 2011

Leo Martins (Univ. Vigo) Journal Club 22/7 1 / 12

Page 2

Outline

1 Distance as a penalty

2 Distances, everywhere

3 No phylogenetics, yet...

4 Trees as points in space

5 To the paper, then

Page 3

Statistical Risk

The risk ρ associated with a decision θ̂ is the expected loss of this decision (θ̂ can be, for instance, an estimate of θ).

ρ(θ̂) = ∫ L(θ, θ̂) P(θ | data) dθ

(promptly called the posterior expected loss)

The loss function L(θ, θ̂) is a penalty we pay for "deciding" away from the parameter. Examples are the squared loss and the absolute loss.

For some loss functions, we can calculate the best decision (i.e., the one that minimizes the risk, for any data).
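As an illustrative sketch (not from the paper), the posterior expected loss can be approximated from posterior samples; under the squared loss, the risk minimizer turns out to be the posterior mean. The Normal posterior below is a made-up stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(loc=2.0, scale=1.0, size=100_000)  # stand-in posterior sample

def risk(est, loss):
    # Monte Carlo estimate of the posterior expected loss rho(est)
    return loss(theta, est).mean()

sq = lambda t, e: (t - e) ** 2
grid = np.linspace(0.0, 4.0, 401)
best = grid[np.argmin([risk(e, sq) for e in grid])]
# the minimizer of the squared-loss risk is (close to) the posterior mean
print(best, theta.mean())
```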

Page 8

How to summarise a collection of objects?

scattered points

library(MASS)

x <- mvrnorm(n = 1000, mu = c(0, 0), Sigma = matrix(c(1, 0.9, 0.9, 1), 2, 2, byrow = TRUE))  # Sigma must be symmetric

plot(x[, 1], x[, 2], pch = ".", cex = 2, xlab = "x", ylab = "y")

Page 9

How to summarise a collection of objects?

centroid: minimizes the total squared distance to all points

Page 10

How to summarise a collection of objects?

regression line: minimizes the total squared vertical distance to all points
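A quick numerical check of the centroid claim (an illustration in Python, separate from the slides' R plotting code): the centroid is the point that minimizes the sum of squared Euclidean distances to a scatter, so nudging away from it can only increase the objective.

```python
import numpy as np

rng = np.random.default_rng(1)
pts = rng.normal(size=(1000, 2))  # a made-up 2-D scatter

def total_sq_dist(c):
    # sum of squared Euclidean distances from point c to every sample
    return ((pts - c) ** 2).sum()

centroid = pts.mean(axis=0)
# any displacement away from the centroid increases the objective
for delta in [(0.1, 0.0), (0.0, -0.1), (0.05, 0.05)]:
    assert total_sq_dist(centroid + np.array(delta)) > total_sq_dist(centroid)
```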

Page 12

How to summarise the posterior distribution P(X)?

Page 13

How to summarise the posterior distribution P(X)?

Posterior mean

Minimize the expected loss under a squared loss function

L(θ, θ̂) = (θ − θ̂)²

(squared Euclidean distance)

Page 14

How to summarise the posterior distribution P(X)?

Posterior median

Minimize the expected loss under a linear loss function

L(θ, θ̂) = |θ − θ̂|

(Manhattan distance)

Page 15

How to summarise the posterior distribution P(X)?

Posterior mode

a.k.a. the Maximum A Posteriori (MAP) estimate. Minimize the expected loss under a 0–1 (delta) loss function

L(θ, θ̂) = 0 if θ = θ̂, and 1 if θ ≠ θ̂
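These three summaries can be checked empirically (a sketch with a made-up skewed posterior, not data from the paper): minimizing the Monte Carlo risk over a grid recovers the posterior mean under squared loss and the posterior median under absolute loss.

```python
import numpy as np

rng = np.random.default_rng(2)
post = rng.gamma(shape=2.0, scale=1.0, size=50_000)  # skewed stand-in posterior

grid = np.linspace(0.0, 8.0, 801)
sq_risk = [((post - e) ** 2).mean() for e in grid]    # squared-loss risk
abs_risk = [np.abs(post - e).mean() for e in grid]    # absolute-loss risk

best_sq = grid[np.argmin(sq_risk)]
best_abs = grid[np.argmin(abs_risk)]
print(best_sq, post.mean())        # squared loss -> posterior mean
print(best_abs, np.median(post))   # absolute loss -> posterior median
```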

Page 17

Distances between trees

[Figure: two unrooted five-taxon trees, ((A,B),(C,D),E) and ((A,B),(D,E),C)]

Trees from the article

Page 18

Distances between trees


RF distance: the splits DE|ABC and CD|ABE each appear in only one of the trees, for a total of 2 branches.
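A minimal sketch of this computation: represent each unrooted tree by its set of nontrivial splits (here written as the smaller side of each bipartition), and the RF distance is the size of the symmetric difference.

```python
# RF distance as the symmetric difference of the trees' nontrivial splits.
# Splits are frozensets of the smaller side; taxa are A..E as in the figure.
tree1 = {frozenset("AB"), frozenset("CD")}   # ((A,B),(C,D),E)
tree2 = {frozenset("AB"), frozenset("DE")}   # ((A,B),(D,E),C)

rf = len(tree1 ^ tree2)   # splits present in exactly one tree
print(rf)  # CD|ABE and DE|ABC differ -> 2
```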

Page 19

Distances between trees


Quartet distance: the quartet {A,C,D,E} is AE|CD in one tree but AC|DE in the other, and {B,C,D,E} is BE|CD vs. BC|DE, so 2 of the 5 quartets are resolved differently.

Page 21

Distances between trees


Path difference (compares the number of speciation nodes along the path between each pair of leaves)

the path from A to E is one edge longer in one tree than in the other

(...)

summing over all leaf pairs, the overall difference is 6
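A sketch of the path-difference computation for the two example trees, counting edges between every pair of leaves (the internal node names u, v, w are illustrative, not from the paper):

```python
from collections import deque

# Unrooted 5-taxon trees as adjacency lists; u, v, w are internal nodes.
tree1 = {"u": ["A", "B", "w"], "v": ["C", "D", "w"], "w": ["u", "v", "E"],
         "A": ["u"], "B": ["u"], "C": ["v"], "D": ["v"], "E": ["w"]}
tree2 = {"u": ["A", "B", "w"], "v": ["D", "E", "w"], "w": ["u", "v", "C"],
         "A": ["u"], "B": ["u"], "C": ["w"], "D": ["v"], "E": ["v"]}

def path_lengths(adj, leaves):
    # BFS from each leaf gives the number of edges between every leaf pair
    d = {}
    for s in leaves:
        dist = {s: 0}
        q = deque([s])
        while q:
            n = q.popleft()
            for m in adj[n]:
                if m not in dist:
                    dist[m] = dist[n] + 1
                    q.append(m)
        for t in leaves:
            if s < t:
                d[(s, t)] = dist[t]
    return d

leaves = list("ABCDE")
d1, d2 = path_lengths(tree1, leaves), path_lengths(tree2, leaves)
total = sum(abs(d1[p] - d2[p]) for p in d1)
print(total)  # overall difference between the two example trees -> 6
```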

Page 23

If there is a distance, there is a Bayes estimator

For points in Rⁿ, we know that the mean minimizes the expected squared Euclidean distance, etc.

For phylogenies:

there are several Euclidean distances

the mean does not work, since a tree has restrictions

But some distances between trees also lead to "analytical" solutions:

the consensus tree minimizes the Robinson–Foulds distance to the samples

quartet puzzling minimizes the quartet distance

the Buneman tree minimizes (I think) the dissimilarity map distance

some of these are hard to solve as well
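A minimal sketch of the consensus-tree case (a toy five-taxon sample, not data from the paper): keeping every split that occurs in more than half of the sampled trees gives the majority-rule consensus, which minimizes the total RF distance to the sample.

```python
from collections import Counter

# Posterior sample of topologies, each given by its set of nontrivial splits
# (frozensets of the smaller side of each bipartition; toy example).
sample = [
    {frozenset("AB"), frozenset("CD")},
    {frozenset("AB"), frozenset("CD")},
    {frozenset("AB"), frozenset("DE")},
]

counts = Counter(s for tree in sample for s in tree)
# majority-rule consensus: keep splits occurring in > 50% of the samples;
# this choice minimizes the total RF distance to the sample
consensus = {s for s, c in counts.items() if c > len(sample) / 2}
print(sorted("".join(sorted(s)) for s in consensus))
```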

Page 29

How do they find, then, the Bayes estimates?

like much other software: hill-climbing in the space of possible topologies

the input data is the posterior sample of trees from MrBayes

the starting tree can be NJ, the MAP tree, ML...

apply branch swaps (NNI) to the current optimal tree, then verify the distance to all samples

the distance used is the path difference (matrix subtraction)

no need to recalculate the distance to all samples, just to a matrix with the average values
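The bookkeeping trick in the last bullet can be checked numerically (a sketch with made-up matrices, assuming a squared path-difference loss): the total squared distance from a candidate tree's path-length matrix to all sampled matrices equals, up to a constant, the distance to the average matrix, so the search loop only ever compares against one matrix.

```python
import numpy as np

rng = np.random.default_rng(3)
# path-length matrices of n sampled trees, flattened to vectors (toy numbers)
samples = rng.integers(2, 5, size=(100, 10)).astype(float)
d_bar = samples.mean(axis=0)

def total_sq(d):
    # objective the search minimizes: sum of squared path differences
    return ((samples - d) ** 2).sum()

def via_mean(d):
    # same objective via the average matrix:
    # sum_i ||d - d_i||^2 = n * ||d - d_bar||^2 + sum_i ||d_i - d_bar||^2
    n = len(samples)
    const = ((samples - d_bar) ** 2).sum()
    return n * ((d - d_bar) ** 2).sum() + const

cand = rng.integers(2, 5, size=10).astype(float)  # a candidate tree's matrix
print(np.isclose(total_sq(cand), via_mean(cand)))  # True
```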
