Accommodating clustered divergences in phylogenetic inference
Jamie R. Oaks (1,2)
(1) Department of Biology, University of Washington
(2) Department of Biological Sciences, Auburn University
October 21, 2015
© 2007 Boris Kulikov, boris-kulikov.blogspot.com
Clustered diversification Jamie Oaks – phyletica.org 1/27
• Phylogenetics is rapidly progressing as an endeavor of statistical inference
• “Big data” present exciting possibilities and computational challenges
• Exciting opportunities to develop new ways to study biology in the light of phylogeny
Current state of phylogenetics
• Assumption: Divergences are independent across the tree
• We know this assumption is frequently violated
• Why account for this non-independence?
  1. Improve inference
  2. Provide a framework for studying processes of co-diversification
• This is a model-choice problem
Divergence model choice
[Figure: three comparisons labeled T1, T2, and T3 whose divergences fall at one (τ1), two (τ1, τ2), or three (τ1, τ2, τ3) distinct times, shown as a sequence of alternative divergence models.]
Inferring co-diversification
[Figure: the five possible divergence models for T1, T2, and T3, labeled m1 to m5, ranging from one shared divergence time (τ1) to three separate divergence times (τ1, τ2, τ3).]
J. R. Oaks et al. (2013). Evolution 67: 991–1010, J. R. Oaks (2014). BMC Evolutionary Biology 14: 150
We want to infer m and T given DNA sequence alignments X:

$p(m_i \mid X) \propto p(X \mid m_i)\, p(m_i)$

$p(X \mid m_i) = \int_{\theta} p(X \mid \theta, m_i)\, p(\theta \mid m_i)\, d\theta$

where θ includes:
• Divergence times
• Gene trees
• Substitution parameters
• Demographic parameters
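A minimal sketch of the model-choice step, assuming the marginal likelihoods p(X | m_i) were already available on the log scale. The numbers below are hypothetical placeholders, not results from the talk; the function only normalizes p(X | m_i) p(m_i) across models.

```python
import math

def posterior_model_probs(log_marginal_likelihoods, prior_probs):
    """Normalize p(X | m_i) * p(m_i) across models, working in log space."""
    log_joint = [
        log_ml + math.log(prior)
        for log_ml, prior in zip(log_marginal_likelihoods, prior_probs)
    ]
    # Subtract the maximum before exponentiating to avoid underflow.
    shift = max(log_joint)
    weights = [math.exp(value - shift) for value in log_joint]
    total = sum(weights)
    return [weight / total for weight in weights]

# Hypothetical log marginal likelihoods for five models m1..m5.
example_log_ml = [-1002.0, -1000.0, -1001.0, -1003.0, -1005.0]
example_prior = [0.2, 0.2, 0.2, 0.2, 0.2]
example_posterior = posterior_model_probs(example_log_ml, example_prior)
print([round(p, 3) for p in example_posterior])
```

In practice the integral over θ is the hard part; this sketch starts after that step.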
Challenges:
1. Cannot solve all the integrals analytically
  • Numerical approximation via approximate-likelihood Bayesian computation (ABC)
2. Sampling over all possible models
  • 5 taxa = 52 models
  • 10 taxa = 115,975 models
  • 20 taxa = 51,724,158,235,372 models!!
  • A “diffuse” Dirichlet process prior (DPP)
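The model counts above are the Bell numbers: the number of ways to partition n taxa into divergence events. A short sketch using the Bell triangle (the function name is illustrative):

```python
def bell_number(n):
    """Number of ways to partition n items into non-empty groups (Bell triangle)."""
    row = [1]
    for _ in range(n - 1):
        # Each new row starts with the last entry of the previous row;
        # every later entry adds the entry to its left and the one above-left.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[-1]

for n_taxa in (3, 5, 10, 20):
    print(n_taxa, "taxa:", f"{bell_number(n_taxa):,}", "models")
```

The count grows faster than exponentially, which is why a prior that can be sampled without enumerating the models is attractive.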
“Easy” as ABC
[Figure: aligned DNA sequences reduced to three summary statistics, S1, S2, and S3.]
[Figure: three-dimensional plot with axes S1, S2, and S3 (summary statistics), each scaled from 0.0 to 1.0.]
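A rough sketch of the general ABC rejection idea, not the dpp-msbayes implementation: draw parameters from the prior, simulate data, reduce them to summary statistics, and keep the draws whose statistics land closest to the observed ones. The toy model (a single per-site difference probability and one summary statistic) and all names are assumptions made for illustration.

```python
import random

def abc_rejection(observed_stats, draw_prior, simulate_stats, n_samples, n_keep):
    """Keep the n_keep prior draws whose simulated statistics are nearest the observed ones."""
    scored = []
    for _ in range(n_samples):
        theta = draw_prior()
        stats = simulate_stats(theta)
        # Euclidean distance between simulated and observed summary statistics.
        distance = sum((s - o) ** 2 for s, o in zip(stats, observed_stats)) ** 0.5
        scored.append((distance, theta))
    scored.sort(key=lambda pair: pair[0])
    return [theta for _, theta in scored[:n_keep]]

# Toy model (hypothetical): theta is the probability that a site differs
# between two sequences; the only summary statistic is the observed proportion.
rng = random.Random(1)
N_SITES = 200

def draw_prior():
    return rng.uniform(0.0, 1.0)

def simulate_stats(theta):
    differences = sum(rng.random() < theta for _ in range(N_SITES))
    return (differences / N_SITES,)

observed = (0.3,)
posterior_sample = abc_rejection(observed, draw_prior, simulate_stats,
                                 n_samples=5000, n_keep=100)
posterior_mean = sum(posterior_sample) / len(posterior_sample)
print(round(posterior_mean, 2))
```

The retained draws approximate the posterior; keeping fewer draws tightens the approximation at the cost of more simulations.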
Sampling divergence models—a novel approach
• The divergence models are ways of assigning our taxa to events
• A Dirichlet process prior (DPP) model is a convenient and flexible solution
• Common Bayesian approach to assigning variables to an unknown number of categories
• Controlled by “concentration” parameter: α
[Photo: Peter Dirichlet]
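One way to see what α does is to draw assignments of taxa to events from the Chinese restaurant process, the standard sequential construction of a Dirichlet process. This is a sketch with illustrative names, not code from dpp-msbayes.

```python
import random

def sample_crp_assignment(n_taxa, alpha, rng):
    """Draw one assignment of taxa to divergence events from a Chinese restaurant process."""
    assignment = []
    counts = []  # number of taxa already assigned to each event
    for _ in range(n_taxa):
        # Join an existing event in proportion to its size, or open a
        # new event in proportion to alpha.
        weights = counts + [alpha]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(counts):
            counts.append(1)
        else:
            counts[choice] += 1
        assignment.append(choice)
    return assignment

def mean_number_of_events(n_taxa, alpha, rng, n_draws):
    """Average number of distinct events across n_draws sampled assignments."""
    total = 0
    for _ in range(n_draws):
        total += len(set(sample_crp_assignment(n_taxa, alpha, rng)))
    return total / n_draws

for alpha in (0.5, 10.0):
    print(alpha, round(mean_number_of_events(3, alpha, random.Random(42), 2000), 2))
```

Small α favors few shared events; large α favors many separate events.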
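Prior probability of each divergence model for three taxa under the DPP, for α = 0.5 and α = 10.0:

• Three divergence events: $\frac{\alpha}{\alpha+1} \cdot \frac{\alpha}{\alpha+2}$ = 0.067 (α = 0.5), 0.758 (α = 10.0)
• Two divergence events, third taxon shares the first taxon's event: $\frac{\alpha}{\alpha+1} \cdot \frac{1}{\alpha+2}$ = 0.133 (α = 0.5), 0.076 (α = 10.0)
• Two divergence events, third taxon shares the second taxon's event: $\frac{\alpha}{\alpha+1} \cdot \frac{1}{\alpha+2}$ = 0.133 (α = 0.5), 0.076 (α = 10.0)
• Two divergence events, first two taxa share an event: $\frac{1}{\alpha+1} \cdot \frac{\alpha}{\alpha+2}$ = 0.133 (α = 0.5), 0.076 (α = 10.0)
• One divergence event: $\frac{1}{\alpha+1} \cdot \frac{2}{\alpha+2}$ = 0.533 (α = 0.5), 0.015 (α = 10.0)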
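The probabilities for three taxa can be reproduced by multiplying the branch probabilities along each path of the process. A small sketch; the function name and the label encoding are illustrative choices.

```python
def crp_assignment_probability(assignment, alpha):
    """Prior probability of a sequential assignment of taxa to events.

    assignment lists the event label of each taxon in order, e.g. [0, 0, 1]
    means the first two taxa share an event and the third has its own.
    """
    counts = {}
    probability = 1.0
    for index, label in enumerate(assignment):
        if label in counts:
            # Join an existing event: (taxa already in it) / (alpha + taxa placed so far).
            probability *= counts[label] / (alpha + index)
            counts[label] += 1
        else:
            # Open a new event: alpha / (alpha + taxa placed so far).
            probability *= alpha / (alpha + index)
            counts[label] = 1
    return probability

three_taxon_models = [[0, 1, 2], [0, 1, 0], [0, 1, 1], [0, 0, 1], [0, 0, 0]]
for alpha in (0.5, 10.0):
    print(alpha, [round(crp_assignment_probability(m, alpha), 3)
                  for m in three_taxon_models])
```

For each α the five probabilities sum to one, since the five models are all the partitions of three taxa.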
New method: dpp-msbayes
• Flexible Dirichlet-process prior (DPP) over all possible divergence models
• Flexible priors on parameters to avoid strongly weighted posteriors
• Multi-processing to accommodate genomic datasets
J. R. Oaks (2014). BMC Evolutionary Biology 14: 150
dpp-msbayes: Simulation-based assessment
Validation:
• Simulate 50,000 datasets and analyze each under the same model
Robustness:
• Simulate datasets that violate model assumptions and analyze each of them
dpp-msbayes: Validation results
[Figure: true probability of one divergence (y-axis, 0.0 to 1.0) against posterior probability of one divergence (x-axis, 0.0 to 1.0).]
J. R. Oaks (2014). BMC Evolutionary Biology 14: 150
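A plot like this can be built by binning the simulation replicates on their estimated posterior probability and, within each bin, computing how often the one-divergence model was actually true. The sketch below assumes that binning approach and uses a toy stand-in for the simulations (hypothetical data, not dpp-msbayes output).

```python
import random

def calibration_bins(posterior_probs, truths, n_bins=10):
    """Summarize each non-empty bin as (mean estimated probability, true frequency, count)."""
    bins = [[] for _ in range(n_bins)]
    for prob, truth in zip(posterior_probs, truths):
        # A probability of exactly 1.0 goes into the last bin.
        index = min(int(prob * n_bins), n_bins - 1)
        bins[index].append((prob, truth))
    summary = []
    for members in bins:
        if members:
            mean_prob = sum(p for p, _ in members) / len(members)
            true_freq = sum(1 for _, t in members if t) / len(members)
            summary.append((mean_prob, true_freq, len(members)))
    return summary

# Toy stand-in: perfectly calibrated "posteriors", where the model is true
# with exactly the stated probability.
rng = random.Random(7)
toy_probs = [rng.random() for _ in range(50000)]
toy_truths = [rng.random() < p for p in toy_probs]
toy_summary = calibration_bins(toy_probs, toy_truths)
for mean_prob, true_freq, count in toy_summary:
    print(f"{mean_prob:.2f}  {true_freq:.2f}  {count}")
```

Points near the diagonal indicate that the estimated probabilities behave like true frequencies.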
dpp-msbayes: Robustness results
[Figure, two panels: true probability of one divergence (y-axis, 0.0 to 1.0) against posterior probability of one divergence (x-axis, 0.0 to 1.0).]
J. R. Oaks (2014). BMC Evolutionary Biology 14: 150
dpp-msbayes: Performance
• New method for estimating shared evolutionary history shows:
  1. Model-choice accuracy
  2. Robustness to model violations
  3. Power to detect variation in divergence times
  4. It’s fast!
• A new tool for biologists to leverage comparative genomic data to explore processes of co-diversification
J. R. Oaks (2014). BMC Evolutionary Biology 14: 150
Empirical applications
Did repeated fragmentation of islands during inter-glacial rises in sea level promote diversification?
Climate-driven diversification
Results
[Figure: posterior probability (y-axis, 0.00 to 0.10) of the number of divergence events (x-axis, 1 to 21); second panel: sea level in m (0 to −100) over time (500 to 0 kya).]
J. R. Oaks (2014). BMC Evolutionary Biology 14: 150
More data!
• Collecting genomic data from taxa co-distributed across Southeast Asian Islands and Mainland
• Preliminary results for 1000 loci from 5 pairs of Gekko mindorensis populations
[Figure: 2ln(Bayes factor) (y-axis, −5.0 to 3.0) for each number of divergence events, |τ| (x-axis, 1 to 5).]
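For readers unfamiliar with the 2ln(Bayes factor) scale: a Bayes factor is the ratio of posterior odds to prior odds. The sketch below assumes each number of divergence events is compared against all other numbers combined; the slides do not state which comparison was plotted, and the input values are hypothetical.

```python
import math

def two_ln_bayes_factor(posterior_prob, prior_prob):
    """2 ln(BF) for a model versus all alternatives: posterior odds over prior odds."""
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    prior_odds = prior_prob / (1.0 - prior_prob)
    return 2.0 * math.log(posterior_odds / prior_odds)

# Hypothetical values: the data raise the probability of a model from 0.2 to 0.5.
print(round(two_ln_bayes_factor(0.5, 0.2), 2))
```

Positive values mean the data increased support for that number of events relative to the prior; negative values mean support decreased.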
Diversification across African rainforests
• Did climate cycles drive diversification and community assembly across rainforest taxa?
• Preliminary results with 300 loci from 3 taxa
[Figure: 2ln(Bayes factor) (y-axis, −1.5 to 1.5) for each number of divergence events, |τ| (x-axis, 1 to 3).]
Conclusions
• New method for estimating shared evolutionary history
  • Shows good “frequentist” behavior
  • Relatively robust to model violations
• Finding support for temporally clustered divergences in multiple systems
• However, there is a lot of uncertainty!
Current work: More power
• Full-likelihood Bayesian implementation
  • Uses all the information in the data
  • Applicable to deeper timescales
• Analytically integrate over gene trees [1]
  • Very efficient numerical approximation of posterior
  • Applicable to NGS datasets
[1] D. Bryant et al. (2012). Molecular Biology and Evolution 29: 1917–1932
Next step: A general framework
• Develop a framework for inferring shared divergences across phylogenies
• Generalize Bayesian phylogenetics to incorporate shared divergences
• Sample models numerically via reversible-jump Markov chain Monte Carlo
Benefits:
• Improve phylogenetic inference
• Framework for studying processes of co-diversification
[Figure: T1, T2, and T3 with divergences at two times, τ1 and τ2.]
Everything is on GitHub. . .
Software:
• dpp-msbayes: https://github.com/joaks1/dpp-msbayes
• PyMsBayes: https://joaks1.github.io/PyMsBayes
• ABACUS: Approximate BAyesian C UtilitieS. https://github.com/joaks1/abacus
Open-Science Notebook:
• msbayes-experiments: https://github.com/joaks1/msbayes-experiments
Acknowledgments
Ideas and feedback:
• Leache Lab
• Minin Lab
• Holder Lab
• Brown Lab/KU Herpetology
Computation:
Funding:
Photo credits:
• Rafe Brown, Cam Siler, Jesse Grismer, & Jake Esselstyn
• FMNH Philippine Mammal Website:
  • D.S. Balete, M.R.M. Duya, & J. Holden
• PhyloPic!
Questions?