15
Loss of information at deeper divergences David Penny Phylomania Nov 2013 The mathematicos caused the problem!!! Now they should solve it! Okay, maybe we could help them, Here are some ideas Need relative – not absolute - information

Loss of information at deeper divergences · 2014-11-14 · Loss of information at deeper divergences David Penny Phylomania Nov 2013 The mathematicos caused the problem!!! Now they

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Loss of information at deeper divergences · 2014-11-14 · Loss of information at deeper divergences David Penny Phylomania Nov 2013 The mathematicos caused the problem!!! Now they

Loss of information at deeper divergences

David Penny Phylomania Nov 2013

The mathematicos caused the problem!!! Now they should solve it!

Okay, maybe we could help them, Here are some ideas

Need relative – not absolute - information

Page 2: Loss of information at deeper divergences · 2014-11-14 · Loss of information at deeper divergences David Penny Phylomania Nov 2013 The mathematicos caused the problem!!! Now they

the comfort zone

ML Int

ML Rel

Mlav ML

MLan MP ML

MLep MP MP

popn classic phylogeny deep

phylogeny

Page 3: Loss of information at deeper divergences · 2014-11-14 · Loss of information at deeper divergences David Penny Phylomania Nov 2013 The mathematicos caused the problem!!! Now they

can we go further back

in time?

Markov models - Loss of information

Page 4: Loss of information at deeper divergences · 2014-11-14 · Loss of information at deeper divergences David Penny Phylomania Nov 2013 The mathematicos caused the problem!!! Now they

damned eukaryotes!

Page 5: Loss of information at deeper divergences · 2014-11-14 · Loss of information at deeper divergences David Penny Phylomania Nov 2013 The mathematicos caused the problem!!! Now they

Calculated results, Δ ≤ ¼ + ne-qt

-0.2

0

0.2

0.4

0.6

0.8

1

1 10 100 1000 10000

0.01 0.005 0.002 0.001

Page 6: Loss of information at deeper divergences · 2014-11-14 · Loss of information at deeper divergences David Penny Phylomania Nov 2013 The mathematicos caused the problem!!! Now they

0%

20%

40%

60%

80%

100%

120%

0.1 1 10

pe

rce

nta

ge

of

tre

es

co

rre

ct

d=0.001

d=0.100

d=0.500

d=1.000

d=2.000

d=5.000

infinite

1. simulation results with covarion model

Page 7: Loss of information at deeper divergences · 2014-11-14 · Loss of information at deeper divergences David Penny Phylomania Nov 2013 The mathematicos caused the problem!!! Now they

number of internal edges correct, out of 6neighbor joining, 9 taxa, 1000 columns, i.i.d.

00.5

1

5 8 13 20 32 50 80 125

200

320

500

790

1250

2000

millions of years (log scale)

6

5

4

3

2

1

0

simulation results with standard model

Page 8: Loss of information at deeper divergences · 2014-11-14 · Loss of information at deeper divergences David Penny Phylomania Nov 2013 The mathematicos caused the problem!!! Now they

1 idea, delete fast sites

If there were a mixture of a) faster evolving sites, and b) and we could identify them c) and remove them would that help go further back in time?

Page 9: Loss of information at deeper divergences · 2014-11-14 · Loss of information at deeper divergences David Penny Phylomania Nov 2013 The mathematicos caused the problem!!! Now they

deleting faster sites

Page 10: Loss of information at deeper divergences · 2014-11-14 · Loss of information at deeper divergences David Penny Phylomania Nov 2013 The mathematicos caused the problem!!! Now they

Ancestral Sequence Reconstruction

Giardia animals plants

Page 11: Loss of information at deeper divergences · 2014-11-14 · Loss of information at deeper divergences David Penny Phylomania Nov 2013 The mathematicos caused the problem!!! Now they

2, 3, testing

Ancestral Sequence Reconstr-

uction

vaults 3-D info

Page 12: Loss of information at deeper divergences · 2014-11-14 · Loss of information at deeper divergences David Penny Phylomania Nov 2013 The mathematicos caused the problem!!! Now they

subgroups X and Y

a b c d e k l m n o

ax a

y

subgroup X subgroup Y

Page 13: Loss of information at deeper divergences · 2014-11-14 · Loss of information at deeper divergences David Penny Phylomania Nov 2013 The mathematicos caused the problem!!! Now they

4. gene length vs similarity

Page 14: Loss of information at deeper divergences · 2014-11-14 · Loss of information at deeper divergences David Penny Phylomania Nov 2013 The mathematicos caused the problem!!! Now they

f1 a . . . . a . . . . . .

f2 . . . . . a . . . . a .

f3 . . . . . a . . . . . a

f4 . . . . . a . . . . . .

g1 . a a a a . a a . . . .

g2 a a a a a . . a . . a .

h1 . . a a a . . . a a . .

h2 . . a . a . . . a a . .

h3 a . . a a . . . a a . .

i j 3 4 5 6 7 8 9 0 1 2

i j i j

. . . a

a . a a

upper bound = 17

lower bound = 12

?

13

Would weighting by incompatibilities

help?

5, Weighting

Page 15: Loss of information at deeper divergences · 2014-11-14 · Loss of information at deeper divergences David Penny Phylomania Nov 2013 The mathematicos caused the problem!!! Now they

information from sequence order not used

Alignment Reordered Alignment original sequence order shuffled/reordered AIIFLNSALGPSPELFPIILATKVL ASAGPSPPATPLLIIIILLFFNEKV

AIMFLNSALGPPTELFPVILATKVL ASAGPPTPATPLLIMVILLFFNEKV

SIMFLNHTLNPTPELFPIILATETL SHTNPTPPATPLLIMIILLFFNEET

TILFLNSSLGLQPEVTPTVLATKTL TSSGLQPPATPLLILTVLVTFNEKT

TLLFLNSMLKPPSELFPIILATKTL TSMKPPSPATPLLLLIILLFFNEKT

ALLFLNSTLNPPTELFPLILATKTL ASTNPPTPATPLLLLLILLFFNEKT

AILFLNSFLNPPKEFFPIILATKIL ASFNPPKPATPLLILIILFFFNEKI

c! ways to reorder alignment

shuffle by columns & by taxa

6. So, could we use ‘words’ of 2, 3, 4, 5, … letters

7. Alphabet reduction