Center for Evolutionary Medicine and Informatics MPAW Estimation of divergence times Fabia U. Battistuzzi [email protected]

Estimation of divergence times

DESCRIPTION

Dr. Battistuzzi's presentation during the 2011 CEMI MPA Workshop.

Page 1: Estimation of divergence times

Center for Evolutionary Medicine and Informatics

MPAW Estimation of divergence times

Fabia U. Battistuzzi

[email protected]

Page 2: Estimation of divergence times

Two dimensions of evolution

[Figure: timetree of bacterial lineages. Tip labels: Staphylococcaceae, Lactobacillaceae, Mycoplasmataceae1, Symbiobacterium, Thermoanaerobacteriaceae1, Dehalococcoides, Synechococcaceae2, Merismopediaceae, Frankiaceae, Nocardiaceae, Bifidobacteriaceae, Francisellaceae, Enterobacteriaceae, Colwelliaceae, Pseudomonadaceae, Legionellaceae, Piscirickettsiaceae, Rhodocyclaceae, Alcaligenaceae, Erythrobacteraceae, Bradyrhizobiaceae, Bartonellaceae, Acetobacteraceae, Rickettsiaceae, Myxococcaceae, Geobacteraceae, Chlamydiaceae, Bacteroidaceae, Spirochaetaceae. Time axis ticks: 0, 500, 1000, 1500, 2000, 2500, 3000. Annotations: Lineage Relations, Time frame, Evolutionary Rate.]

Page 3: Estimation of divergence times

Molecular clocks – brief overview

[Figure: plot of Sequence Change against Time, with a timeline marked 1962, 1968, 1972, 1976, 1984, 1989, 1997, 2006. Milestone shown: 1st protein clock.]

Kumar, Nature Reviews Genetics (2005)

Page 4: Estimation of divergence times

Molecular clocks – brief overview

[Figure: timeline marked 1962, 1968, 1972, 1976, 1984, 1989, 1997, 2006. Milestones shown: 1st protein clock, Neutral theory, Rate tests.]

Kumar, Nature Reviews Genetics (2005)

Page 5: Estimation of divergence times

Molecular clocks – brief overview

[Figure: timeline marked 1962, 1968, 1972, 1976, 1984, 1989, 1997, 2006. Milestones shown: 1st protein clock, Neutral theory, Deut.-Prot. divergence, Rate tests, Rate Autocorrelation. Inset: Ancestor and Descendant branches on a slower-to-faster rate scale.]

Kumar, Nature Reviews Genetics (2005)

Page 6: Estimation of divergence times

Molecular clocks – brief overview

[Figure: timeline marked 1962, 1968, 1972, 1976, 1984, 1989, 1997, 2006. Milestones shown: 1st protein clock, Neutral theory, Deut.-Prot. divergence, Rate tests, Rate Autocorrelation, Local rates. Inset: slower-to-faster rate scale.]

Kumar, Nature Reviews Genetics (2005)

Page 7: Estimation of divergence times

Molecular clocks – brief overview

[Figure: timeline marked 1962, 1968, 1972, 1976, 1984, 1989, 1997, 2006. Milestones shown: 1st protein clock, Neutral theory, Deut.-Prot. divergence, Rate tests, Rate Autocorrelation, Local rates, Autocorrelated clocks, Uncorrelated clocks.]

Kumar, Nature Reviews Genetics (2005)

Page 8: Estimation of divergence times

What can we do with molecular clocks?

• Species divergence
• Phylogeography
• Epidemiology
• Rate estimations

Eastern fox squirrel (Sciurus niger) lacks phylogeographic structure: recent range expansion and phenotypic differentiation

Page 9: Estimation of divergence times

Molecular clock packages available

• BEAST – Drummond & Rambaut
  – Uncorrelated rates
• MCMCTree – Yang
  – Uncorrelated and autocorrelated rates
• MultiDivTime – Thorne & Kishino
  – Autocorrelated rates between ancestor and descendant
• Pathd8 – Britton et al.
  – Autocorrelation between sister groups
• Phylobayes – Lartillot et al.
  – Uncorrelated and autocorrelated rates
• R8s – Sanderson
  – Strict, local, relaxed clock

Page 10: Estimation of divergence times

Molecular clock packages available

• BEAST – uncorrelated rates
• MCMCTree – uncorrelated & autocorrelated rates
• MultiDivTime – autocorrelated rates

Basic functionality

1. Bayesian methods: from priors and data, estimate posteriors (divergence times and credibility intervals)
2. Analyze partitioned data (codon positions, genes)
3. Calibration points
4. Estimate phylogeny and/or branch lengths
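The first item, going from a prior and data to a posterior with a credibility interval, can be illustrated with a toy grid calculation. This is only a sketch under strong assumptions that are not from the slides: a single pair of sequences, a known substitution rate, the JC69 model, and a uniform prior on the divergence time. The function names and all numbers are hypothetical; real packages sample far richer models with MCMC.

```python
import math

def jc69_p(distance):
    """Expected proportion of differing sites under JC69 for a given distance."""
    return 0.75 * (1.0 - math.exp(-4.0 * distance / 3.0))

def grid_posterior(k, n, rate, t_min, t_max, steps=2000):
    """Posterior over divergence time on a grid, uniform prior on [t_min, t_max].

    k of n aligned sites differ; rate is substitutions per site per time unit.
    """
    times = [t_min + (t_max - t_min) * (i + 0.5) / steps for i in range(steps)]
    log_post = []
    for t in times:
        # Both lineages accumulate change since the split, hence 2 * rate * t.
        p = jc69_p(2.0 * rate * t)
        log_post.append(k * math.log(p) + (n - k) * math.log(1.0 - p))
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return times, [w / total for w in weights]

def summarize(times, probs, mass=0.95):
    """Posterior mean and equal-tailed credibility interval."""
    mean = sum(t * p for t, p in zip(times, probs))
    tail = (1.0 - mass) / 2.0
    cumulative, lower, upper = 0.0, None, None
    for t, p in zip(times, probs):
        cumulative += p
        if lower is None and cumulative >= tail:
            lower = t
        if upper is None and cumulative >= 1.0 - tail:
            upper = t
    return mean, lower, upper

# Hypothetical data: 100 differences in 1000 sites, rate 0.001 per site per Myr.
times, probs = grid_posterior(k=100, n=1000, rate=0.001, t_min=0.0, t_max=200.0)
mean, lower, upper = summarize(times, probs)
print(f"posterior mean {mean:.1f}, 95% interval {lower:.1f} to {upper:.1f}")
```

Changing t_min and t_max shows how strongly a calibration prior can constrain the posterior when the data are weak.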

Page 11: Estimation of divergence times

Calibration priors

[Figure: calibration prior densities plotted against time for three constraint types: Minimum only, Maximum only, Minimum-Maximum. Density shapes labeled: lognormal, uniform, exponential, normal. Marked region: 95% probability.]

Page 12: Estimation of divergence times

Calibration priors

[Figure: calibration prior densities plotted against time for three constraint types: Minimum only, Maximum only, Minimum-Maximum. Density shapes labeled: exponential, normal. Marked region: 95% probability.]

Hedges and Kumar, Trends in Genetics (2004)
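To make the "95% probability" regions concrete, the following sketch computes where 95% of the prior mass falls for the four density shapes named on the calibration slides. The parameterizations (an offset exponential and an offset lognormal anchored at a minimum age) are assumptions chosen for illustration, and the helper names are hypothetical; each package defines its own parameters.

```python
import math
from statistics import NormalDist

def uniform_interval(t_min, t_max, mass=0.95):
    """Central interval of a uniform prior between a minimum and a maximum age."""
    tail = (1.0 - mass) / 2.0 * (t_max - t_min)
    return t_min + tail, t_max - tail

def offset_exponential_upper(t_min, mean_excess, mass=0.95):
    """Age below which `mass` of an exponential prior lies, starting at t_min."""
    return t_min - mean_excess * math.log(1.0 - mass)

def normal_interval(mean, sd, mass=0.95):
    """Central interval of a normal prior."""
    z = NormalDist().inv_cdf(0.5 + mass / 2.0)
    return mean - z * sd, mean + z * sd

def offset_lognormal_interval(t_min, mu, sigma, mass=0.95):
    """Central interval of a lognormal prior on the age in excess of t_min."""
    z = NormalDist().inv_cdf(0.5 + mass / 2.0)
    return t_min + math.exp(mu - z * sigma), t_min + math.exp(mu + z * sigma)

# Hypothetical calibration: fossil minimum of 10 time units.
print(uniform_interval(10.0, 50.0))
print(offset_exponential_upper(10.0, mean_excess=5.0))
print(normal_interval(30.0, 5.0))
print(offset_lognormal_interval(10.0, mu=1.0, sigma=0.5))
```

Printing these bounds before a run is a quick check that a "minimum only" calibration does not silently allow implausibly old ages.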

Page 13: Estimation of divergence times

BEAUti & BEAST

• NEXUS file
• XML file

Page 14: Estimation of divergence times

Phylogeny specification

<newick id="startingTree">(((((Ssc:0.65,Bta:0.65):0.16,((Cfa:0.46,Fca:0.46):0.28,Eca:0.74):0.07):0.11,

(((Rno:0.20,Mmu:0.20):0.65,Ocu:0.85):0.05,(((Hsa:0.05,Ptr:0.05):0.05,Ppy:0.10):0.13,Mml:0.23):0.67):0.02):0.81,Tvu1:1.73):1.37,Gga:3.10);</newick>
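A starting tree for a clock analysis should be ultrametric, meaning every tip is the same distance from the root. The sketch below checks that for the Newick string shown above. The small parser is a hypothetical helper written for this example (it handles unquoted labels and branch lengths only) and is not part of BEAST.

```python
def tip_depths(newick):
    """Return {tip label: root-to-tip distance} for a Newick string."""
    text = "".join(newick.split()).rstrip(";")
    pos = 0

    def read_subtree():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # consume "("
            tips = {}
            while True:
                tips.update(read_subtree())
                if text[pos] == ",":
                    pos += 1
                    continue
                pos += 1  # consume ")"
                break
        else:
            start = pos
            while pos < len(text) and text[pos] not in ",():":
                pos += 1
            tips = {text[start:pos]: 0.0}
        # Skip an optional internal node label, then read an optional ":length".
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return {name: depth + length for name, depth in tips.items()}

    return read_subtree()

def is_ultrametric(newick, tolerance=1e-6):
    """True if all root-to-tip distances agree within the tolerance."""
    depths = tip_depths(newick).values()
    return max(depths) - min(depths) <= tolerance

STARTING_TREE = (
    "(((((Ssc:0.65,Bta:0.65):0.16,((Cfa:0.46,Fca:0.46):0.28,Eca:0.74):0.07):0.11,"
    "(((Rno:0.20,Mmu:0.20):0.65,Ocu:0.85):0.05,(((Hsa:0.05,Ptr:0.05):0.05,"
    "Ppy:0.10):0.13,Mml:0.23):0.67):0.02):0.81,Tvu1:1.73):1.37,Gga:3.10);"
)

print(is_ultrametric(STARTING_TREE))
print(sorted(tip_depths(STARTING_TREE).items()))
```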

Page 15: Estimation of divergence times

Strict clock & Relaxed clock

Priors

Page 16: Estimation of divergence times

Operators

• remove “Tree” operator for fixed phylogeny

Page 17: Estimation of divergence times

Generations

• Convergence & ESS values
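The effective sample size (ESS) discounts the chain length by how autocorrelated successive samples are. A minimal sketch of one common estimator follows: it sums positive-lag autocorrelations until the first non-positive value. This is an illustrative simplification, not the exact calculation used by any particular trace-analysis tool, and the chains below are simulated rather than taken from a real run.

```python
import random

def effective_sample_size(samples):
    """ESS = N / (1 + 2 * sum of autocorrelations), truncated at the first
    non-positive autocorrelation."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    variance = sum(c * c for c in centered) / n
    if variance == 0.0:
        return float(n)
    rho_sum = 0.0
    for lag in range(1, n):
        cov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = cov / variance
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)

def ar1_chain(n, phi, seed):
    """Simulated chain in which each value depends on the previous one."""
    rng = random.Random(seed)
    value, chain = 0.0, []
    for _ in range(n):
        value = phi * value + rng.gauss(0.0, 1.0)
        chain.append(value)
    return chain

well_mixed = ar1_chain(2000, phi=0.0, seed=1)   # independent draws
sticky = ar1_chain(2000, phi=0.9, seed=1)       # strongly autocorrelated
print(effective_sample_size(well_mixed), effective_sample_size(sticky))
```

A chain with a low ESS for some parameter usually needs more generations or better-tuned operators before its posterior summaries can be trusted.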

Page 18: Estimation of divergence times

BEAST running…

Page 19: Estimation of divergence times

A fuzzy caterpillar

Page 20: Estimation of divergence times

MCMCTree

seqfile = exampleseqs.phy
treefile = example.tre
outfile = exampleseqs_3.out

(((((Ssc,Bta),((Cfa,Fca)'B(0.45,0.47)',Eca)),(((Rno,Mmu),Ocu),(((Hsa,Ptr),Ppy)'B(0.09,0.11)',Mml))),Tvu1),Gga);
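In the tree above, quoted labels such as 'B(0.45,0.47)' attach a minimum and maximum age to a node. Before launching a run it can help to list the calibrations and confirm each pair is ordered. The sketch below does that with a regular expression; the helper names are hypothetical and this is not part of MCMCTree.

```python
import re

# Matches quoted labels such as 'B(0.45,0.47)' and captures the type and arguments.
CALIBRATION = re.compile(r"'([A-Za-z]+)\(([^)]*)\)'")

def extract_calibrations(tree):
    """Return a list of (type, values) for every quoted calibration label."""
    found = []
    for kind, args in CALIBRATION.findall(tree):
        values = tuple(float(x) for x in args.split(","))
        found.append((kind, values))
    return found

def check_bounds(calibrations):
    """Raise ValueError if a 'B' calibration has minimum >= maximum."""
    for kind, values in calibrations:
        if kind == "B" and not values[0] < values[1]:
            raise ValueError(f"bad bounds in {kind}{values}")

TREE = (
    "(((((Ssc,Bta),((Cfa,Fca)'B(0.45,0.47)',Eca)),(((Rno,Mmu),Ocu),"
    "(((Hsa,Ptr),Ppy)'B(0.09,0.11)',Mml))),Tvu1),Gga);"
)

calibrations = extract_calibrations(TREE)
check_bounds(calibrations)
print(calibrations)
```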

Page 21: Estimation of divergence times

MCMCTree

seqfile = exampleseqs.phy
treefile = example.tre
outfile = exampleseqs_3.out

(((((Ssc,Bta),((Cfa,Fca)'L(0.35,0.1,0.5,0.025)',Eca)),(((Rno,Mmu),Ocu),(((Hsa,Ptr),Ppy),Mml))),Tvu1),Gga);

[Figure: density of the lower-bound calibration. Labels shown: p_L, t_L, p, c.]

Page 22: Estimation of divergence times

MCMCTree

seqfile = exampleseqs.phy
treefile = example.tre
outfile = exampleseqs_3.out

ndata = 1
usedata = 3 * 0: no data; 1:seq like; 2:use in.BV; 3: out.BV
clock = 3 * 1: global clock; 2: independent rates; 3: correlated rates
RootAge = < 3.0 * safe constraint on root age, used if no fossil for root.

[Figure: two ancestor-descendant panels on a slower-to-faster rate scale, labeled uncorrelated and autocorrelated.]
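The difference between independent (uncorrelated) and correlated (autocorrelated) rates can be seen by simulating rates down a single chain of ancestor-descendant branches. This is a simplified sketch: rates are lognormal, and in the autocorrelated case the log rate of each branch is drawn around its ancestor's log rate with variance sigma2 * dt. The function and parameter names are assumptions for illustration and the exact models in MCMCTree differ in detail.

```python
import math
import random

def simulate_rates(n_branches, mean_rate, sigma2, dt, autocorrelated, seed=0):
    """Simulate branch rates along one lineage of n_branches successive branches."""
    rng = random.Random(seed)
    log_rate = math.log(mean_rate)
    rates = []
    for _ in range(n_branches):
        if autocorrelated:
            # Descendant rate drifts away from the ancestral rate.
            log_rate = rng.gauss(log_rate, math.sqrt(sigma2 * dt))
        else:
            # Each branch draws its rate independently of its ancestor.
            log_rate = rng.gauss(math.log(mean_rate), math.sqrt(sigma2))
        rates.append(math.exp(log_rate))
    return rates

def mean_abs_log_step(rates):
    """Average absolute change in log rate between ancestor and descendant."""
    steps = [abs(math.log(b) - math.log(a)) for a, b in zip(rates, rates[1:])]
    return sum(steps) / len(steps)

auto = simulate_rates(2000, mean_rate=1.0, sigma2=0.25, dt=0.1, autocorrelated=True)
indep = simulate_rates(2000, mean_rate=1.0, sigma2=0.25, dt=0.1, autocorrelated=False)
print(mean_abs_log_step(auto), mean_abs_log_step(indep))
```

Under the autocorrelated model neighbouring branches have similar rates, so the average step is small; under the uncorrelated model a slow ancestor can have a fast descendant.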

Page 23: Estimation of divergence times

MCMCTree

model = 4 * 0:JC69, 1:K80, 2:F81, 3:F84, 4:HKY85
alpha = 0 * alpha for gamma rates at sites
ncatG = 5 * No. categories in discrete gamma

cleandata = 0 * remove sites with ambiguity data (1:yes, 0:no)?

BDparas = 2 2 0.1 * birth, death, sampling
kappa_gamma = 6 2 * gamma prior for kappa
alpha_gamma = 1 1 * gamma prior for alpha

rgene_gamma = 1 7.13 * gamma prior for overall rates for genes
sigma2_gamma = 1 1.15 * gamma prior for sigma^2 (for clock=2 or 3)

rgene: prior on rate parameter; F(c, TL)

sigma2: prior on rate heterogeneity; F(BL, TL)

Page 24: Estimation of divergence times

TimeTrees

[Figure: timetree. Axis: Time (millions of years), ticks 300, 250, 200, 150, 100, 50, 0.]

Page 25: Estimation of divergence times

TimeTrees

[Figure: timetree with tips Ssc, Bta, Cfa, Fca, Eca, Rno, Mmu, Ocu, Hsa, Ptr, Ppy, Mml, Tvu1, Gga. Axis: Time (millions of years), ticks 0, 50, 100, 150, 200.]

[Figure: estimated time against true time, both axes 0 to 100. Series: Model Match, Model Violation.]

Page 26: Estimation of divergence times

TimeTrees

[Figure: timetree with tips Ssc, Bta, Cfa, Fca, Eca, Rno, Mmu, Ocu, Hsa, Ptr, Ppy, Mml, Tvu1, Gga. Axis: Time (millions of years), ticks 0, 50, 100, 150, 200.]

[Figure: estimated time against true time, both axes 0 to 100. Series: Model Match, Model Violation. Panel label: BEAST.]

Page 27: Estimation of divergence times

[Figure: histogram for BEAST. x-axis: % difference from true time, -45 to 65; y-axis: frequency, 0 to 70. Series: Model Match, Model Violation.]

Page 28: Estimation of divergence times

MCMCTree

[Figure: two histograms. x-axis: % difference from true time, -45 to 65; y-axis: frequency, 0 to 60. Panels: autocorrelation and uncorrelation, each with series Model Match and Model Violation.]

Page 29: Estimation of divergence times

MultiDivTime

[Figure: histogram. x-axis: % difference from true time, -45 to 65; y-axis: frequency, 0 to 70. Series: Model Match, Model Violation.]

Page 30: Estimation of divergence times

95% Credibility intervals

[Figure: pie charts of Success and Failure under Model Match and Model Violation. Values shown: 16% / 84%, 6% / 94%, 4% / 96%, 12% / 88%. Chart labels: TT, TT.]
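The success and failure percentages summarize how often a 95% credibility interval contains the true time in simulated data. A minimal sketch of that tally follows; the function name and the example values are invented for illustration and are not the data behind the charts.

```python
def interval_coverage(true_times, intervals):
    """Fraction of nodes whose (lower, upper) interval contains the true time."""
    hits = sum(
        1 for t, (lower, upper) in zip(true_times, intervals) if lower <= t <= upper
    )
    return hits / len(true_times)

# Hypothetical node ages and estimated 95% intervals.
true_times = [10.0, 20.0, 30.0, 40.0]
intervals = [(8.0, 12.0), (15.0, 25.0), (31.0, 35.0), (35.0, 45.0)]

success = interval_coverage(true_times, intervals)
print(f"success {success:.0%}, failure {1 - success:.0%}")
```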

Page 31: Estimation of divergence times

Things to remember

• Check priors for calibrations, substitution rate, rate model, etc.
• Repeat every analysis at least twice to check for convergence
• Look for the “fuzzy caterpillar” for all parameters
• Test the effects of assumptions using multiple methods and priors (Bayes factors)
• Credibility intervals are a conservative estimate of divergences

Questions?