In silico reconstruction of an ancestral mammalian genome. UQAM Seminaire de bioinformatique. Mathieu Blanchette.


Page 1: In silico  reconstruction of an ancestral mammalian genome

In silico reconstruction of an ancestral mammalian genome

UQAM

Seminaire de bioinformatique

Mathieu Blanchette


Page 2: In silico  reconstruction of an ancestral mammalian genome

CGACTGCATCAGACGACGATCAGACTACTATATCAGCAGATTACGGTGCATCGTATTTACGTTACGCATGACGATCAGACTACGCATAGATAGA TGCATCAGACGACGATCAGACTACTATATCAGCAGATTACGGTCGATTATTTACGTTACGCATGACGATCAGACTACGCATAGATAGAGCAATA CGACTGCATCAGACGACGATCAGACTACTATATCAGCAGATTACGGTGCGTATTTACGTTACGCATGACGATCAGACTACGCATAGATAGAGCA CGCATCAGACGACGATCAGACTACTATATCAGCAGATTACGGTCGTAACGTTACGCATGACGATCAGACTACGCATAGATAGAGCCGATCATCT CAGACGACGATCAGACTACTATATCAGCAGATTACGGTGGCATACTAATCGTATTTACGTTACGCATGACGATCAGACTACGCATAGATAGAAA CGACGATCAGACTACTATATCAGCAGATTACGGTGCGCGAATTCATATATTTACGTTACGCATGACGATCAGACTACGCATAGATAGATTGATA CATCAGACGACGATCAGACTACTATATCAGCAGATTACGGTGCATATTTTACGTTACGCATGACGATCAGACTACGCATAGATAGAGATCATCATCAGACGACGATCAGACTACTATATCAGCAGATTACGGTAGCATTCTCGTATTTACGTTACGCATGACGATCAGACTACGCATAGATAGAATGC ACGACGATCAGACTACTATATCAGCAGATTACGGTGATAGATACGATCGTATTTACGTTACGCATGACGATCAGACTACGCATAGATAGAGATAGCATCAGACGACGATCAGACTACTATATCAGCAGATTACGGTGATACGCATGACGATCAGACTACGCATAGATAGATTATTACTGGATACTGCA

The Human genome
• Sequence of ~3 × 10^9 nucleotides
• Complete sequence is known (2001)

HOW DOES IT WORK??

Page 3: In silico  reconstruction of an ancestral mammalian genome

Comparative Genomics

• Goal: Functional annotation of the genome
  – What is the role of each region of the genome?
  – Very hard to answer...
• Idea: Look not only at what our genome is now, but also at how it evolved
  – Different types of functional regions have different evolutionary signatures
• Complete genomes are sequenced for:
  – Human, chimp, mouse, rat, house, chicken, zebrafish, pufferfish
• Partial genomes are available for:
  – Dog, cow, rabbit, elephant, armadillo

Page 4: In silico  reconstruction of an ancestral mammalian genome
Page 5: In silico  reconstruction of an ancestral mammalian genome

Mutations

G(t)   = ACGTAGGCGATCAG---ATCGAT
G(t+1) = ACGAAGG--ATCAGGGGATCGAT

Substitutions, deletions, insertions

• Other less frequent mutations:
  - Duplications
  - Genome rearrangements (e.g. large inversions)
• Mutations happen randomly
• Natural selection favors mutations that improve fitness
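The three small-scale mutation types can be read directly off a pairwise alignment of G(t) and G(t+1). The following is a minimal sketch, not part of the talk: the function name `classify_columns` is hypothetical, and it counts affected bases per column rather than mutation events (a 3-base insertion counts as 3).

```python
def classify_columns(before: str, after: str) -> dict:
    """Count per-column differences between two aligned sequences.

    '-' marks a gap. A base present only in `before` is a deletion,
    a base present only in `after` is an insertion.
    """
    if len(before) != len(after):
        raise ValueError("aligned sequences must have equal length")
    counts = {"match": 0, "substitution": 0, "deletion": 0, "insertion": 0}
    for b, a in zip(before, after):
        if b == "-" and a == "-":
            continue  # column empty in both rows: nothing to classify
        if b == "-":
            counts["insertion"] += 1
        elif a == "-":
            counts["deletion"] += 1
        elif a != b:
            counts["substitution"] += 1
        else:
            counts["match"] += 1
    return counts


# The example pair from the slide.
g_t = "ACGTAGGCGATCAG---ATCGAT"
g_t1 = "ACGAAGG--ATCAGGGGATCGAT"
print(classify_columns(g_t, g_t1))
# {'match': 17, 'substitution': 1, 'deletion': 2, 'insertion': 3}
```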

Page 6: In silico  reconstruction of an ancestral mammalian genome

A random walk in genome space

Page 7: In silico  reconstruction of an ancestral mammalian genome


http://www.broad.mit.edu/personal/jpvinson/phylogenetics/bigtree_1_0.jpg

Mammalian evolution

-Rapid radiation ~75 Myrs ago

-Many nearly independent phyla

-Many “noisy” copies of ancestor

- Accurate reconstruction of ancestors may be feasible

Page 8: In silico  reconstruction of an ancestral mammalian genome

Ancestral Genome Reconstruction

Given:
- Genomic sequences of several mammals
- Phylogenetic tree

Find: The genomic sequence of all their ancestors

ARMADILLO TGCTACTAATATTTAGTACATAGAGCCCAGGGGTGCTGCTGAAAGTCTTAAAATGCACAGTGTAGCCCCTCCTCC

COW GCCTCTCTTTCTGCCCTGCAGGCTAGAATGTATCACTTAGATGTTCCAAATCAGAAAGTGTTCAGCCATTTCCATACC

HORSE GTCACAATTTAGGAAGTGCCACTGGCCTCTAGAGGGTAGAAGACAGGGATGCTAATAATCATCCCACGTCATCCTACAGTGCTCAGAACAGCACCCCTACCCTCACCCC

CAT GTCACAGTTTAGGGGGTACTACTGGCATCTATCGGGTGGAGGATAGGGATACTGATAATCATTCTACAGTGCACAGGACAGTACCCCTACTTTCACCCC

DOG GTCACAATTTGGGGGATACTACTGGCATCTAATGGGTAGAGGACAGGGATACTGATAATTGCTTTACAGTGCACAGGACAGCACCCTTATCTTCACCCC

HEDGEHOG GTCATAGTTTGATTATATGGGCTTCTTAGTAGACAAAGAAAAAGATGTTCTGGTAGTCATTCTGCTTTCCATATGATAGCACTCCCATCTTCACTTC

MOUSE GTCACAGTTTGGAGGATGTTACTGACATCTAGAGAGTAGACTTTAAAGATACTGATAGTCACCCCATTGTGCACCTCC

RAT GTCACAATTTGGAGGATGTTACTGGCATCTAGAGAGTAGACTTTAAGGACACTGATAATCATACTATGCTGCACTTCC

RABBIT ATCACAATTTGGGGAACACCACTGGCATCTCGGGTAGCAGGCCAGGCATGCTGGTAATTATACTACAGTGCACAGTACAGTTCCCCACATCCCGCACC

LEMUR ATCACAATTGGGGGTGCCACGGTCCTCCAGTGGGTAGAGAACAGGGAGGCTGATAACCACCCTGCAGTGCACAGGGCAGTGCCCCACTCCCACCAC

MOUSE-LEMUR ATCACAGTTGGGGGATGCCACTGGCCTCAAGTGGGTAGAGAACAGGGAGGCTGAAAACCACCCTGCAGAGCACGGGGCAGTGCCTTCACCACCACTCC

VERVET GTCAGAATTTGGGGGATGCTTCTGGCTCTACTTGGGTAGAGAAACAGGGATGCTTATAATCATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCC

MACAQUE GTCAGAATTTGGGGGATGCTTCTGGCTCTACTTGGGTAGAGAAACAGGAATGCTTATAATCATCCTACAGTGCACAGGTCAGTACCCCCACCCACACTCC

BABOON GTCAGAATTTGGGGGATGCTTCTGGCTCTACTTGGGTAGAAAAACAGGGATGCTTATAATCATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCC

ORANGUTAN GTCACGATTTGGGAGATGCTTCTGGCTCGACTTGGGTAGAGAAGCGGGGATGCTTATAATCATCCAACAGTGCACAGGACAGTACCCCCACCCACACTCC

GORILLA GTCACGATTTGGGGGATGCTTCTGGCTCAACTTGGGTAGAGAAGTGGGGATGCTTATACTCATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCC

CHIMP GTCACGATTTGGGGGATGCTTCTGGCTCAACTTGGGTAGAGAAGCGGGGATGCTTATAATCATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCC

HUMAN GTCACGATTTGGGGGATGCTTCTGGCTCAACTTGGGTAGAGAAGCGGGGATGCTTATAATCATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCC

Mutational operations
• Small scale: Substitutions, deletions, insertions (incl. transposons)
• Large scale: Genome rearrangements, segmental/tandem duplications

All of it: Functional, non-functional, introns, intergenic, repeats, everything*!

(*) Heterochromatin not included

Page 9: In silico  reconstruction of an ancestral mammalian genome

Reconstruction algorithm

1) Identify syntenic regions in each species
• Blastz (Schwartz et al.) and chaining/netting program (Kent et al.)
• In ENCODE case: targeted BAC sequencing

Page 10: In silico  reconstruction of an ancestral mammalian genome

Reconstruction algorithm

2) Compute multiple genome alignment
• TBA program (Blanchette, Miller, et al.)

ARMADILLO ----------------TGCTACTAATAT-----T-TAGTA-CATAGAG-CC-CAGGGGTGCTGCTGAAA----------GTCTTAAAATGCACAGTGTAGCCCCTCCTCC------------ACAAAGAATTAACTAGCCCAGAATGTCAGGA--------GT--A-CCAAG

COW GCCTCTCTTT-----------CTGCCCTGCAGGC-TAGAA-TGTATCA-CT-TAGATGTTCCAA---------------ATCAGAAAGTGTTCAG----------CCATTTCCATACCACC----AGGAGCTA-CAATGTTGGGCTGCAGCTA--------TTTGGATCAAA

HORSE GTCACAATTTAGGAAGTGCCACTGGCCT-----C-TAGAG-GGTAGAA-GA-CAGGGATGCTAATAATCATCCCACGTCATCCTACAGTGCTCAGAACAGCACCCCTACCCTCACCCCATCAACAAAGAATTATCCAGCCCAAAATGCCAATA--------GT--GCCCAGA

CAT GTCACAGTTTAGGGGGTACTACTGGCAT-----C-TATCG-GGTGGAG-GA-TAGGGATACTGATAATC----------ATTCTACAGTGCACAGGACAGTACCCCTACTTTCACCCCACAA-CAAAGAATTATCCAGCCCAAAATGCCAACA--------GT--GCTCAGA

DOG GTCACAATTTGGGGGATACTACTGGCAT-----C-TAATG-GGTAGAG-GA-CAGGGATACTGATAATT----------GCTTTACAGTGCACAGGACAGCACCCTTATCTTCACCCCAAAAGCAAAGTATTATCCAGCCCCAAATGCCAATG--------GT--GCTCAGA

HEDGEHOG GTCATAGTTT----GATTATATGGGCTT-----CTTAGTA-GACAAAGAAA-AAGATGTTCTGGTAGTC----------ATTCTGCTTTCCATATGATAGCACTCCCATCTTCACTTCCAAAATTAAGAGTCATCATACTCAGTGTGCCAATA--------TG--GCCCAGA

MOUSE GTCACAGTTTGGAGGATGTTACTGACAT-----C-TAGAG-AGTAGAC-TT-TAAAGATACTGATAGTC----------ACCCCATTGTGCAC---------------------CTCCAACAATAATGGCTCATCGAAACCTAAATGCCAATCTGCCAATTAT--GTCCATG

RAT GTCACAATTTGGAGGATGTTACTGGCAT-----C-TAGAG-AGTAGAC-TT-TAAGGACACTGATAATC----------ATACTATGCTGCAC---------------------TTCCAACAATAATGGCTCATCTAGACCTAAATACCAATCTGCCAATTAT--ATCCATG

RABBIT ATCACAATTTGGGGAACACCACTGGCAT-----C-TCGGGTAGCAGGC----CAGGCATGCTGGTAATT----------ATACTACAGTGCACAGTACAGTTCCCCACATCCCGCACCAACAACA--GGTTTATGCTGCCCAAAGTGCCAGTGTGC-----------CCACG

LEMUR ATCACAA-TTGGGGG-TGCCACGGTCCT-----C-CAGTG-GGTAGAG-AA-CAGGGAGGCTGATAACC----------ACCCTGCAGTGCACAGGGCAGTGCC-CCACTCCCACCACAACAATGGAGAATTATTGGGCCCCAAATGCCAATA--------GT--GCCCAAG

MOUSELEMUR ATCACAG-TTGGGGGATGCCACTGGCCT-----C-AAGTG-GGTAGAG-AA-CAGGGAGGCTGAAAACC----------ACCCTGCAGAGCACGGGGCAGTGCCTTCACCACCACTCCAACAACGGAGAATTATTGGGTCCCAAATGCCAATA--------GT--GCCCAGG

VERVET GTCAGAATTTGGGGGATGCTTCTGGCTC-----T-ACTTG-GGTAGAG-AAACAGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTATCGAAGAATCATTGAACCCAAAATGTTAATA--------GT--GTCCAGG

MACAQUE GTCAGAATTTGGGGGATGCTTCTGGCTC-----T-ACTTG-GGTAGAG-AAACAGGAATGCTTATAATC----------ATCCTACAGTGCACAGGTCAGTACCCCCACCCACACTCCAGTATCGAAGAATCATTGGACCCAAAATGCTAATG--------GT--GTCCAGG

BABOON GTCAGAATTTGGGGGATGCTTCTGGCTC-----T-ACTTG-GGTAGAA-AAACAGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTATCGAAGAATCATTGGACCCAAAATGTTAATG--------GT--GTCCAGG

ORANGUTAN GTCACGATTTGGGAGATGCTTCTGGCTC-----G-ACTTG-GGTAGAG-AAGCGGGGATGCTTATAATC----------ATCCAACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCACTGGACCCAAAATGTTAATG--------GT--GTCCAGG

GORILLA GTCACGATTTGGGGGATGCTTCTGGCTC-----A-ACTTG-GGTAGAG-AAGTGGGGATGCTTATACTC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCATTAGACCGAAAATGTTAATG--------GT--GTCCAGG

CHIMP GTCACGATTTGGGGGATGCTTCTGGCTC-----A-ACTTG-GGTAGAG-AAGCGGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCATTAGACCGAAAATGTTAATG--------GT--GTCCAGA

HUMAN GTCACGATTTGGGGGATGCTTCTGGCTC-----A-ACTTG-GGTAGAG-AAGCGGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCATTAGACCTAAAATGTTAATG--------GT--GTCCAGG

• Goal: Phylogenetic correctness
  – Two nucleotides are aligned if and only if they have a common ancestor.

Page 11: In silico  reconstruction of an ancestral mammalian genome

Reconstruction algorithm

3) Reconstruct insertion/deletion history
• Find most likely explanation for gaps observed



Page 13: In silico  reconstruction of an ancestral mammalian genome

Reconstruction algorithm

3) Reconstruct insertion/deletion history
• Find most likely explanation for gaps observed
• This defines the presence/absence of a base at each position of each ancestor


NNNNNNNNNNNNNNNNNNNNNNNNNNNN-----N-NNNNN-NNNNNNN-NN-NNNNNNNNNNNNNNNNN----------NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

Page 14: In silico  reconstruction of an ancestral mammalian genome

Reconstruction algorithm

ARMADILLO ----------------TGCTACTAATAT-----T-TAGTA-CATAGAG-CC-CAGGGGTGCTGCTGAAA----------GTCTTAAAATGCACAGTGTAGCCCCTCCTCC------------ACAAAGAATTAACTAGCCCAGAATGTCAGGA--------GT--A-CCAAG

COW GCCTCTCTTT-----------CTGCCCTGCAGGC-TAGAA-TGTATCA-CT-TAGATGTTCCAA---------------ATCAGAAAGTGTTCAG----------CCATTTCCATACCACC----AGGAGCTA-CAATGTTGGGCTGCAGCTA--------TTTGGATCAAA

HORSE GTCACAATTTAGGAAGTGCCACTGGCCT-----C-TAGAG-GGTAGAA-GA-CAGGGATGCTAATAATCATCCCACGTCATCCTACAGTGCTCAGAACAGCACCCCTACCCTCACCCCATCAACAAAGAATTATCCAGCCCAAAATGCCAATA--------GT--GCCCAGA

CAT GTCACAGTTTAGGGGGTACTACTGGCAT-----C-TATCG-GGTGGAG-GA-TAGGGATACTGATAATC----------ATTCTACAGTGCACAGGACAGTACCCCTACTTTCACCCCACAA-CAAAGAATTATCCAGCCCAAAATGCCAACA--------GT--GCTCAGA

DOG GTCACAATTTGGGGGATACTACTGGCAT-----C-TAATG-GGTAGAG-GA-CAGGGATACTGATAATT----------GCTTTACAGTGCACAGGACAGCACCCTTATCTTCACCCCAAAAGCAAAGTATTATCCAGCCCCAAATGCCAATG--------GT--GCTCAGA

HEDGEHOG GTCATAGTTT----GATTATATGGGCTT-----CTTAGTA-GACAAAGAAA-AAGATGTTCTGGTAGTC----------ATTCTGCTTTCCATATGATAGCACTCCCATCTTCACTTCCAAAATTAAGAGTCATCATACTCAGTGTGCCAATA--------TG--GCCCAGA

MOUSE GTCACAGTTTGGAGGATGTTACTGACAT-----C-TAGAG-AGTAGAC-TT-TAAAGATACTGATAGTC----------ACCCCATTGTGCAC---------------------CTCCAACAATAATGGCTCATCGAAACCTAAATGCCAATCTGCCAATTAT--GTCCATG

RAT GTCACAATTTGGAGGATGTTACTGGCAT-----C-TAGAG-AGTAGAC-TT-TAAGGACACTGATAATC----------ATACTATGCTGCAC---------------------TTCCAACAATAATGGCTCATCTAGACCTAAATACCAATCTGCCAATTAT--ATCCATG

RABBIT ATCACAATTTGGGGAACACCACTGGCAT-----C-TCGGGTAGCAGGC----CAGGCATGCTGGTAATT----------ATACTACAGTGCACAGTACAGTTCCCCACATCCCGCACCAACAACA--GGTTTATGCTGCCCAAAGTGCCAGTGTGC-----------CCACG

LEMUR ATCACAA-TTGGGGG-TGCCACGGTCCT-----C-CAGTG-GGTAGAG-AA-CAGGGAGGCTGATAACC----------ACCCTGCAGTGCACAGGGCAGTGCC-CCACTCCCACCACAACAATGGAGAATTATTGGGCCCCAAATGCCAATA--------GT--GCCCAAG

MOUSELEMUR ATCACAG-TTGGGGGATGCCACTGGCCT-----C-AAGTG-GGTAGAG-AA-CAGGGAGGCTGAAAACC----------ACCCTGCAGAGCACGGGGCAGTGCCTTCACCACCACTCCAACAACGGAGAATTATTGGGTCCCAAATGCCAATA--------GT--GCCCAGG

VERVET GTCAGAATTTGGGGGATGCTTCTGGCTC-----T-ACTTG-GGTAGAG-AAACAGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTATCGAAGAATCATTGAACCCAAAATGTTAATA--------GT--GTCCAGG

MACAQUE GTCAGAATTTGGGGGATGCTTCTGGCTC-----T-ACTTG-GGTAGAG-AAACAGGAATGCTTATAATC----------ATCCTACAGTGCACAGGTCAGTACCCCCACCCACACTCCAGTATCGAAGAATCATTGGACCCAAAATGCTAATG--------GT--GTCCAGG

BABOON GTCAGAATTTGGGGGATGCTTCTGGCTC-----T-ACTTG-GGTAGAA-AAACAGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTATCGAAGAATCATTGGACCCAAAATGTTAATG--------GT--GTCCAGG

ORANGUTAN GTCACGATTTGGGAGATGCTTCTGGCTC-----G-ACTTG-GGTAGAG-AAGCGGGGATGCTTATAATC----------ATCCAACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCACTGGACCCAAAATGTTAATG--------GT--GTCCAGG

GORILLA GTCACGATTTGGGGGATGCTTCTGGCTC-----A-ACTTG-GGTAGAG-AAGTGGGGATGCTTATACTC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCATTAGACCGAAAATGTTAATG--------GT--GTCCAGG

CHIMP GTCACGATTTGGGGGATGCTTCTGGCTC-----A-ACTTG-GGTAGAG-AAGCGGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCATTAGACCGAAAATGTTAATG--------GT--GTCCAGA

HUMAN GTCACGATTTGGGGGATGCTTCTGGCTC-----A-ACTTG-GGTAGAG-AAGCGGGGATGCTTATAATC----------ATCCTACAGTGCACAGGACAGTACCCCCACCCACACTCCAGTAATGAAGAATCATTAGACCTAAAATGTTAATG--------GT--GTCCAGG

GTCACAATTTGGGGGATGCTACTGGCAT-----C-TAGTG-GGTAGAG-AA-CAGGGATGCTGATAATC----------ATCCTACAGTGCACAGGACAGTGCCCCCACCCCCACTCCAACAACAAAGAATTATCCGGCCCAAAATGCCAATA--------GT--GCCCAGG

4) Infer the maximum-likelihood nucleotide at each position
– Felsenstein's algorithm with a context-sensitive model

• Ancestral sequences are inferred!
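To make step 4 concrete, here is a minimal sketch of Felsenstein's pruning recursion for a single alignment column. It is an illustration under stated assumptions, not the talk's implementation: it uses the context-independent Jukes-Cantor (JC69) model rather than a context-sensitive one, the tree encoding (nested lists of `(child, branch_length)` pairs) and all names are hypothetical, and it only reports the marginal estimate at the root.

```python
import math

BASES = "ACGT"


def jc69(t):
    """Jukes-Cantor substitution probabilities for branch length t (subst/site)."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return {x: {y: (same if x == y else diff) for y in BASES} for x in BASES}


def partial(node):
    """Conditional likelihoods: P(observed leaves below node | node has base x)."""
    if isinstance(node, str):  # leaf: the observed base
        return {x: 1.0 if x == node else 0.0 for x in BASES}
    like = {x: 1.0 for x in BASES}
    for child, t in node:  # internal node: list of (child, branch_length)
        p, sub = jc69(t), partial(child)
        for x in BASES:
            like[x] *= sum(p[x][y] * sub[y] for y in BASES)
    return like


def ml_root_base(tree):
    """Most probable root base and its posterior, under a uniform root prior."""
    like = partial(tree)
    post = {x: 0.25 * like[x] for x in BASES}
    total = sum(post.values())
    post = {x: v / total for x, v in post.items()}
    return max(post, key=post.get), post


# Hypothetical 4-leaf tree for one column: three leaves show A, one shows G.
tree = [
    ([("A", 0.1), ("A", 0.1)], 0.05),
    ([("G", 0.3), ("A", 0.3)], 0.05),
]
base, posterior = ml_root_base(tree)
print(base, round(posterior[base], 3))
```

Estimates for internal nodes other than the root would need an additional top-down pass, which is omitted here.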

Page 15: In silico  reconstruction of an ancestral mammalian genome

Optimal indel reconstruction: not so easy!

NNNNNNNNNNNNNNN

NN------NNNNNNN

NNNN-------NNNN

NNNNNN-----NNNN


Page 18: In silico  reconstruction of an ancestral mammalian genome

Reconstructing indel history: not so easy!

NNNNNNNNNNNNNNN

NN------NNNNNNN

NNNN-------NNNN

NNNNNN-----NNNN

NNNNNNNNNNNNNNN

NN------NNNNNNN

NNNN-------NNNN

NNNNNN-----NNNN

NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

NN----------------------NNNNNNN

NNNN-----------------------NNNN

NNNNNN---------------------NNNN


Page 20: In silico  reconstruction of an ancestral mammalian genome

Inferring indel history
• Given:
  – A multiple sequence alignment
  – A phylogenetic tree
  – Probability model for deletions
    • Probability depends on deletion length and branch length
  – Probability model for insertions
    • Probability depends on insertion length, branch length, and content
• Find: The most likely set of insertions and deletions that led to the given alignment
• NP-hard (Chindelevitch et al. 2006)
• Fredslund et al. (2003): Restricted enumeration
• Blanchette et al. (2004): Greedy algorithm
• Chindelevitch et al. (2006): Integer Linear Programming
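As a point of comparison for the problem statement above, the sketch below applies Fitch parsimony to the presence/absence pattern of a single alignment column. This is a deliberately simplified stand-in and none of the cited methods: it uses unit costs instead of a probability model, and it treats each column independently, so it cannot recognise one multi-base deletion as a single event. The tree encoding (nested pairs of leaf names) and the function name are assumptions.

```python
def fitch_presence(tree, present):
    """Fitch parsimony for one column of a presence/absence matrix.

    tree:    a leaf name (str) or a pair (left_subtree, right_subtree)
    present: dict mapping leaf name -> True if the leaf has a base in this column
    Returns (set of equally parsimonious root states, minimum number of changes).
    """
    if isinstance(tree, str):
        return {present[tree]}, 0
    left, right = tree
    left_states, left_cost = fitch_presence(left, present)
    right_states, right_cost = fitch_presence(right, present)
    common = left_states & right_states
    if common:
        return common, left_cost + right_cost
    # Children disagree: one insertion or deletion is needed on a child branch.
    return left_states | right_states, left_cost + right_cost + 1


# Hypothetical 4-species tree and a column where only mouse has a gap.
tree = ((("human", "chimp"), "mouse"), "cow")
column = {"human": True, "chimp": True, "mouse": False, "cow": True}
print(fitch_presence(tree, column))  # ({True}, 1): base present at the root, one deletion
```

Because real insertions and deletions span runs of adjacent columns, column-by-column scoring like this over-counts events.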

Page 21: In silico  reconstruction of an ancestral mammalian genome

Partial Results - Deletions only
• If only deletions are allowed and all deletions have the same probability (cost), then:
  – Rectangle-covering problem, where the tree determines which sets of rows are admissible

NNNNNNN---NN-----NNNNNNNNN--NN-----NN---NNNNNNNNNN---NNN--NNNNNNNNNNNNNN

  – Exact polynomial-time greedy algorithm
  – Idea: There always exists a “forced move”, i.e. a gap that can only be covered by a single maximal deletion

Page 22: In silico  reconstruction of an ancestral mammalian genome

Measuring accuracy
• We use simulations of mammalian sequence evolution to evaluate the accuracy of the reconstruction on neutrally evolving DNA.
- Start with a random (realistic) ancestral sequence

AGCATAGA

Page 23: In silico  reconstruction of an ancestral mammalian genome

Measuring accuracy
- Simulate evolution along the mammalian tree

AGCATAGAACGACGATAAGCATAAGCATCAGAGCAAATCAGACTACAAGCATCAGCAGGAGGCTAGGACATCAAGGACACCAAGGACACCAAGGACCCCAAGGACCCCAAGGATTCAGGATTCAGGATTCAGGGTTCAGGGTTC

AGCATAGA

AGGATAGA

AGCATTAGA

AGCATTGAGA

Page 24: In silico  reconstruction of an ancestral mammalian genome

Measuring accuracy
- Use TBA to align the sequences generated

AG-C-AT---
ACGA-CG---
A----GC---
AGC--AT---
AGCA-A----
AGAC-TA---
AGCAATC---
AGGC------
AGGC------
AGGA-CA---
AGGA-CACCA
AGGA-CACCA
AGGA-CCCCA
AGGA-CCCCA
AGGA--TTC-
AGGA--TTC-
AGGA--TTC-
AGGG--TTC-
AGGG--TTC-

AGCATAGA

AGGATAGA

AGCATTAGA

AGCATTGAGA

Page 25: In silico  reconstruction of an ancestral mammalian genome

Measuring accuracy
- Reconstruct indel history

AGCATAGA

AGGATAGA

AGCATTAGA

AGCATTGAGA

Page 26: In silico  reconstruction of an ancestral mammalian genome

Measuring accuracy
- Infer ancestral sequences at each node

AGCATAGA

AGGATAGA

AGCATTAGA

AGCATTGAGA

AGATCGA

AGCTTGAGA

AGTATTTAGA

AGTATAGGA

Page 27: In silico  reconstruction of an ancestral mammalian genome

Measuring accuracy
- For each node, align the true and predicted ancestors
- Count: missing bases + added bases + substituted bases

AGCATAGA

AGGATAGA

ACGCATTAGA

AGCATTGAGA

AGATCGA

AGCTTGAGA

AGTATTTAGA

AGTATAGGA

ACGCATT-AGA
A-GTATTTAGA

3 errors / 10 bp: error rate = 0.3
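The error count on this slide can be reproduced with a few lines of code. This is a sketch under two assumptions that the slide does not state explicitly: the first aligned row is taken to be the true ancestor and the second the predicted one, and the denominator is the ungapped length of the true ancestor. The function name is hypothetical.

```python
def reconstruction_errors(true_aln: str, pred_aln: str) -> dict:
    """Count missing, added and substituted bases in a pairwise alignment."""
    if len(true_aln) != len(pred_aln):
        raise ValueError("aligned sequences must have equal length")
    missing = added = substituted = 0
    for t, p in zip(true_aln, pred_aln):
        if t == "-" and p == "-":
            continue
        if p == "-":
            missing += 1        # base in the true ancestor, absent from the prediction
        elif t == "-":
            added += 1          # base in the prediction, absent from the true ancestor
        elif t != p:
            substituted += 1    # both present, but different nucleotides
    true_len = sum(1 for t in true_aln if t != "-")
    errors = missing + added + substituted
    return {
        "missing": missing,
        "added": added,
        "substituted": substituted,
        "error_rate": errors / true_len,
    }


print(reconstruction_errors("ACGCATT-AGA", "A-GTATTTAGA"))
# {'missing': 1, 'added': 1, 'substituted': 1, 'error_rate': 0.3}
```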

Page 28: In silico  reconstruction of an ancestral mammalian genome

Simulation details
• We simulate neutrally evolving regions of 50 kb
• We model:
  - Lineage-specific neutral mutation rates
  - Insertions and deletions based on empirical frequency and length distributions
  - Insertion of transposable elements
  - CpG effect
• We don’t model:
  - DNA polymerase slippage
  - Positive selection
  - Genome rearrangement, duplications
• Sanity checks: Simulated sequences are similar to actual mammalian sequences:
  – Same pair-wise percent identity
  – Same frequency and length distribution of insertions and deletions
  – Same repetitive content and age distribution of repeats
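For readers who want to experiment, the following toy simulator evolves a sequence along one branch with substitutions, insertions, and deletions. It is only a sketch: the rates, the geometric indel-length distribution, and every name and default value are hypothetical, and it omits the lineage-specific rates, transposable elements, and CpG effect that the talk's simulator models.

```python
import random

BASES = "ACGT"


def geometric_length(rng, mean):
    """Draw an indel length >= 1 from a geometric distribution with the given mean."""
    length = 1
    while rng.random() > 1.0 / mean:
        length += 1
    return length


def evolve(seq, branch_length, rng, sub_rate=1.0, indel_rate=0.05, mean_indel_len=3.0):
    """Return a mutated copy of seq after one branch (toy neutral model)."""
    p_sub = min(1.0, sub_rate * branch_length)
    p_indel = min(1.0, indel_rate * branch_length)
    out = []
    i = 0
    while i < len(seq):
        r = rng.random()
        if r < p_indel / 2:
            # Deletion: skip a run of bases starting here.
            i += geometric_length(rng, mean_indel_len)
            continue
        if r < p_indel:
            # Insertion: add random bases before the current base.
            n = geometric_length(rng, mean_indel_len)
            out.extend(rng.choice(BASES) for _ in range(n))
        base = seq[i]
        if rng.random() < p_sub:
            # Substitution: replace with one of the three other bases.
            base = rng.choice([b for b in BASES if b != base])
        out.append(base)
        i += 1
    return "".join(out)


rng = random.Random(42)          # fixed seed so that runs are repeatable
ancestor = "".join(rng.choice(BASES) for _ in range(60))
descendant = evolve(ancestor, 0.2, rng)
print(ancestor)
print(descendant)
```

Applying `evolve` recursively down a tree, with one call per branch, gives a set of leaf sequences whose true ancestors are known and can be compared with the reconstruction.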

Page 29: In silico  reconstruction of an ancestral mammalian genome

Guess which ancestor can be best reconstructed?

Eizirik et al. 2001

Page 30: In silico  reconstruction of an ancestral mammalian genome

Reconstructability and tree topology

Star phylogeny (root R with n independent descendants)
• Leaves are independent
• Accuracy approaches 100% exponentially fast as n increases

Bifurcating root (root R with children A and B, and n dependent descendants)
• Information lost between R and A or B can’t be recovered
• Can’t do better than if A and B were reconstructed perfectly
• Accuracy < 100% for all n
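The star-phylogeny claim can be illustrated with a toy calculation. Assume, purely for illustration, a two-state site where each of n independent descendants retains the ancestral state with probability p, and the ancestor is called by strict majority vote. The accuracy is then a binomial tail that converges to 1 as n grows. The model and the function name are assumptions, not the talk's simulation.

```python
from math import comb


def majority_accuracy(n: int, p: float) -> float:
    """P(a strict majority of n independent copies shows the ancestral state).

    Toy two-state model; use odd n to avoid ties.
    """
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))


for n in (1, 3, 5, 11, 21):
    print(n, round(majority_accuracy(n, 0.8), 4))
```

With a bifurcating root the same vote can at best recover A and B exactly, so whatever was lost on the two branches leaving R stays lost regardless of n.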

Page 31: In silico  reconstruction of an ancestral mammalian genome

Eizirik et al. 2001

Page 32: In silico  reconstruction of an ancestral mammalian genome

How many species do we need?

Best choice of species:
- Sample many taxa
- Choose slowly evolving species

[Figure: percentage of error (missing bases, added bases, mismatches; y-axis 0 to 14) versus number of species used (4, 5, 7, 10, 15, 20)]

Page 33: In silico  reconstruction of an ancestral mammalian genome

What if the fast-radiation model is wrong?

[Figure: error percentage (added bases, missing bases, mismatches; y-axis 0 to 7) versus multiplicative factor for early branches (0, 1, 2, 4)]

Page 34: In silico  reconstruction of an ancestral mammalian genome

Reconstructing real ancestors

Page 35: In silico  reconstruction of an ancestral mammalian genome

MOUSE-LEMUR

COW

RAT

CHIMP, GORILLA, ORANGUTAN, MACAQUE, VERVET, BABOON

For this set of species, simulations predict:

- Expected accuracy ~95%


Page 38: In silico  reconstruction of an ancestral mammalian genome

External validation using ancestral transposons

[Figure: tree relating the transposon consensus, the actual mammalian ancestor, the reconstructed mammalian ancestor, and the human relic; branch labels 0.391, 0.117, and 0.314 subst/site; error = 0.026 subst/site]

Page 39: In silico  reconstruction of an ancestral mammalian genome
Page 40: In silico  reconstruction of an ancestral mammalian genome

What’s next? Whole genome!
• Data available
  – Whole genomes: Human, chimp, mouse, rat, dog
  – Unassembled/low-coverage genomes: Cow, rabbit, armadillo, elephant
• Challenges:
  – Fewer species
  – Unassembled contigs
  – Genome rearrangements
  – Recombination hotspots

We expect that 90% of the Boreoeutherian genome can be reconstructed with ~90% accuracy

Page 41: In silico  reconstruction of an ancestral mammalian genome
Page 42: In silico  reconstruction of an ancestral mammalian genome

Why should we care?

• The ancestral genome lets us see what changes happened in our genome, and when
  – Allows detection and “dating” of lineage-specific innovations (e.g. FOXP2).
• Allows a better understanding of the forces driving genome evolution
• New model organism?
  – Human genome is 4 times closer to the ancestral genome than to the mouse genome: better model for human phenotypes?

Page 43: In silico  reconstruction of an ancestral mammalian genome
Page 44: In silico  reconstruction of an ancestral mammalian genome

Even if we had the full genomes of all living mammalian species:
• Technological problem:
  – We can’t synthesize large regions of DNA
• Many regions can’t be reconstructed at all:
  – Heterochromatin
  – Regions with high recombination rates
• 99% base-by-base accuracy is not enough
  – One mistake may be enough to make life impossible

Page 45: In silico  reconstruction of an ancestral mammalian genome

Acknowledgements

• David Haussler, Brian Raney (UC Santa Cruz)
• Webb Miller (Penn State Univ.)
• Eric Green (NHGRI)
• UC Santa Cruz group:
  – Adam Siepel, Robert Baertsch, Gill Bejerano, Jim Kent
• McGill group:
  – Leonid Chindelevitch, Zhentao Li, Eric Blais