
Remarks About Homework


Page 1

• Write detailed answers
• Pay attention to details in the questions
• “… nor can the shy man learn…”

Page 2

Multiple Sequence Alignment (MSA) and Phylogeny

Page 3

One of the options to get a multiple-sequence FASTA file

Page 4

One of the options to get a multiple-sequence FASTA file

Page 5

MSA input: a multiple-sequence FASTA file

>gi|10835167|ref|NP_000607.1| CD4 antigen precursor [Homo sapiens]
MNRGVPFRHLLLVLQLALLPAATQGKKVVLGKKGDTVELTCTASQKKSIQFHWKNSNQIKILGNQGSFLT
KGPSKLNDRADSRRSLWDQGNFPLIIKNLKIEDSDTYICEVEDQKEEVQLLVFGLTANSDTHLLQGQSLT
LTLESPPGSSPSVQCRSPRGKNIQGGKTLSVSQLELQDSGTWTCTVLQNQKKVEFKIDIVVLAFQKASSI
VYKKEGEQVEFSFPLAFTVEKLTGSGELWWQAERASSSKSWITFDLKNKEVSVKRVTQDPKLQMGKKLPL
HLTLPQALPQYAGSGNLTLALEAKTGKLHQEVNLVVMRATQLQKNLTCEVWGPTSPKLMLSLKLENKEAK
VSKREKAVWVLNPEAGMWQCLLSDSGQVLLESNIKVLPTWSTPVQPMALIVLGGVAGLLLFIGLGIFFCV
RCRHRRRQAERMSQIKRLLSEKKTCQCPHRFQKTCSPI
>gi|57113961|ref|NP_001009043.1| CD4 antigen [Pan troglodytes]
MNRGVPFRHLLLVLQLALLPAATQGKKVVLGKKGDTVELTCTASQKKSIQFHWKNSNQTKILGNQGSFLT
KGPSKLNDRVDSRRSLWDQGNFTLIIKNLKIEDSDTYICEVGDQKEEVQLLVFGLTANSDTHLLQGQSLT
LTLESPPGSSPSVQCRSPRGKNIQGGKTLSVSQLELQDSGTWTCTVLQNQKKVEFKIDIVVLAFQKASSI
VYKKEGEQVEFSFPLAFTVEKLTGSGELWWQAERASSSKSWITFDLKNKEVSVKRVTQDPKLQMGKKLPL
HLTLPQALPQYAGSGNLTLALEAKTGKLHQEVNLVVMRATQLQKNLTCEVWGPTSPKLMLSLKLENKEAK
VSKREKAVWVLNPEAGMWQCLLSDSGQVLLESNIKVLPTWSTPVQPMALIVLGGVAGLLLFIGLGIFFCV
RCRHRRRQAQRMSQIKRLLSEKKTCQCPHRFQKTCSPI
>gi|50054438|ref|NP_001001908.1| CD4 antigen [Sus scrofa]
MDPGTSLRHLFLVLQLAMLPAASGTQEKYLVLGKAGDLAELPCHSSQKKNLPFNWKNSNQTKILGGHGSF
WHTASVTELTSRLDSKKNMWDHGSFPLIIKNLEVTDSGIYICEVEDKRIEVQLLVFRLTASVTRVLLGQS
LTLTLEGPSGSHPTVQWKGPGNKSKNDVKSLLLPQVGLEDSGLWTCTVSQDQKTLVFRSNIFVLAFQKVP
STVYVKEGDQVALSFPLTFEAESLSGELMWRQTKGASSPQSWITFSLKDRKVTVQKSLQNLKLRMAEKLP
LQITLLQALPQYAGSGNLTLVLPEGRLHREVNLVVMRATQSKNEVTCEVLGPTPPKVVLSLKLGNQSMKV
SDQQKLVTVLDPEAGMWRCLLRDKDKVLLESQVEVLPTAFTRAWPELLASVIGGIIGLLFLAGFCIACVK
CWHRRRRAERMSQIKRLLSEKKTCQCAHRQQKNYSLT
>gi|6978631|ref|NP_036837.1| Cd4 molecule [Rattus norvegicus]
MCRGFSFRHLLPLLLLQLSKLLVVTQGKTVVLGKEGGSAELPCESTSRRSASFAWKSSDQKTILGYKNKL
LIKGSLELYSRFDSRKNAWERGSFPLIINKLRMEDSQTYVCELENKKEEVELWVFRVTFNPGTRLLQGQS
LTLILDSNPKVSDPPIECKHKSSNIVKDSKAFSTHSLRIQDSGIWNCTVTLNQKKHSFDMKLSVLGFAST
SITAYKSEGESAEFSFPLNLGEESLQGELRWKAEKAPSSQSWITFSLKNQKVSVQKSTSNPKFQLSETLP
LTLQIPQVSLQFAGSGNLTLTLDRGILYQEVNLVVMKVTQPDSNTLTCEVMGPTSPKMRLILKQENQEAR
VSRQEKVIQVQAPEAGVWQCLLSEGEEVKMDSKIQVLSKGLNQTMFLAVVLGSAFSFLVFTGLCILFCVR
CRHQQRQAARMSQIKRLLSEKKTCQCSHRMQKSHNLI

Page 6

Clustal X

Page 7

Step 1: Load the sequences

Page 8

Uploaded sequences

A little unclear… the sequence names are the raw NCBI gi/ref identifiers, which do not show the species.

Page 9

Edit FASTA headers…

>gi|10835167|ref|NP_000607.1| CD4 antigen precursor [Homo sapiens] → >Homo_sapiens_CD4
>gi|57113961|ref|NP_001009043.1| CD4 antigen [Pan troglodytes] → >Pan_troglodytes_CD4
>gi|50054438|ref|NP_001001908.1| CD4 antigen [Sus scrofa] → >Sus_scrofa_CD4
>gi|6978631|ref|NP_036837.1| Cd4 molecule [Rattus norvegicus] → >Rattus_norvegicus_CD4

>Homo_sapiens_CD4
MNRGVPFRHLLLVLQLALLPAATQGKKVVLGKKGDTVELTCTASQKKSIQFHWKNSNQIKILGNQGSFLT
KGPSKLNDRADSRRSLWDQGNFPLIIKNLKIEDSDTYICEVEDQKEEVQLLVFGLTANSDTHLLQGQSLT
LTLESPPGSSPSVQCRSPRGKNIQGGKTLSVSQLELQDSGTWTCTVLQNQKKVEFKIDIVVLAFQKASSI
VYKKEGEQVEFSFPLAFTVEKLTGSGELWWQAERASSSKSWITFDLKNKEVSVKRVTQDPKLQMGKKLPL
HLTLPQALPQYAGSGNLTLALEAKTGKLHQEVNLVVMRATQLQKNLTCEVWGPTSPKLMLSLKLENKEAK
VSKREKAVWVLNPEAGMWQCLLSDSGQVLLESNIKVLPTWSTPVQPMALIVLGGVAGLLLFIGLGIFFCV
RCRHRRRQAERMSQIKRLLSEKKTCQCPHRFQKTCSPI
>Pan_troglodytes_CD4
MNRGVPFRHLLLVLQLALLPAATQGKKVVLGKKGDTVELTCTASQKKSIQFHWKNSNQTKILGNQGSFLT
KGPSKLNDRVDSRRSLWDQGNFTLIIKNLKIEDSDTYICEVGDQKEEVQLLVFGLTANSDTHLLQGQSLT
LTLESPPGSSPSVQCRSPRGKNIQGGKTLSVSQLELQDSGTWTCTVLQNQKKVEFKIDIVVLAFQKASSI
VYKKEGEQVEFSFPLAFTVEKLTGSGELWWQAERASSSKSWITFDLKNKEVSVKRVTQDPKLQMGKKLPL
HLTLPQALPQYAGSGNLTLALEAKTGKLHQEVNLVVMRATQLQKNLTCEVWGPTSPKLMLSLKLENKEAK
VSKREKAVWVLNPEAGMWQCLLSDSGQVLLESNIKVLPTWSTPVQPMALIVLGGVAGLLLFIGLGIFFCV
RCRHRRRQAQRMSQIKRLLSEKKTCQCPHRFQKTCSPI
>Sus_scrofa_CD4
MDPGTSLRHLFLVLQLAMLPAASGTQEKYLVLGKAGDLAELPCHSSQKKNLPFNWKNSNQTKILGGHGSF
WHTASVTELTSRLDSKKNMWDHGSFPLIIKNLEVTDSGIYICEVEDKRIEVQLLVFRLTASVTRVLLGQS
LTLTLEGPSGSHPTVQWKGPGNKSKNDVKSLLLPQVGLEDSGLWTCTVSQDQKTLVFRSNIFVLAFQKVP
STVYVKEGDQVALSFPLTFEAESLSGELMWRQTKGASSPQSWITFSLKDRKVTVQKSLQNLKLRMAEKLP
LQITLLQALPQYAGSGNLTLVLPEGRLHREVNLVVMRATQSKNEVTCEVLGPTPPKVVLSLKLGNQSMKV
SDQQKLVTVLDPEAGMWRCLLRDKDKVLLESQVEVLPTAFTRAWPELLASVIGGIIGLLFLAGFCIACVK
CWHRRRRAERMSQIKRLLSEKKTCQCAHRQQKNYSLT
>Rattus_norvegicus_CD4
MCRGFSFRHLLPLLLLQLSKLLVVTQGKTVVLGKEGGSAELPCESTSRRSASFAWKSSDQKTILGYKNKL
LIKGSLELYSRFDSRKNAWERGSFPLIINKLRMEDSQTYVCELENKKEEVELWVFRVTFNPGTRLLQGQS
LTLILDSNPKVSDPPIECKHKSSNIVKDSKAFSTHSLRIQDSGIWNCTVTLNQKKHSFDMKLSVLGFAST
SITAYKSEGESAEFSFPLNLGEESLQGELRWKAEKAPSSQSWITFSLKNQKVSVQKSTSNPKFQLSETLP
LTLQIPQVSLQFAGSGNLTLTLDRGILYQEVNLVVMKVTQPDSNTLTCEVMGPTSPKMRLILKQENQEAR
VSRQEKVIQVQAPEAGVWQCLLSEGEEVKMDSKIQVLSKGLNQTMFLAVVLGSAFSFLVFTGLCILFCVR
CRHQQRQAARMSQIKRLLSEKKTCQCSHRMQKSHNLI
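The header edit can also be scripted instead of done by hand. Below is a minimal Python sketch, assuming every header ends with the organism name in square brackets, as in the NCBI headers above. The function name `rename_headers` and its `suffix` parameter are illustrative only; they are not part of Clustal X or any NCBI tool.

```python
import re


def rename_headers(fasta_text: str, suffix: str = "CD4") -> str:
    """Replace NCBI-style FASTA headers with short Genus_species_<suffix> names.

    Assumes every header ends with the organism in square brackets,
    e.g. '>gi|10835167|ref|NP_000607.1| CD4 antigen precursor [Homo sapiens]'.
    Headers without a bracketed organism are left unchanged.
    """
    renamed = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Capture the text inside the last [...] at the end of the header.
            match = re.search(r"\[([^\]]+)\]\s*$", line)
            if match:
                organism = match.group(1).strip().replace(" ", "_")
                line = f">{organism}_{suffix}"
        renamed.append(line)  # sequence lines pass through untouched
    return "\n".join(renamed) + "\n"


if __name__ == "__main__":
    example = (
        ">gi|10835167|ref|NP_000607.1| CD4 antigen precursor [Homo sapiens]\n"
        "MNRGVPFRHLLLVLQLALLPAATQGKKVVLGKKGDTVELTCTASQKKSIQFHWKNSNQIKILGNQGSFLT\n"
    )
    # Prints '>Homo_sapiens_CD4' followed by the unchanged sequence line.
    print(rename_headers(example), end="")
```

Names without spaces matter because many alignment and tree programs keep only the first word of a header as the sequence name.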

Page 10

Uploaded sequences

Much better

Page 11

Step 2: Perform the alignment

Page 12

Multiple sequence alignment and conservation view

Page 13

Step 3: Create a tree

Page 14

The Newick tree format is used to represent trees as strings.

[Figure: a tree with leaves A, B, C and D, in which A groups with C and B groups with D]

In Newick format: ((A,C),(B,D));

• Each pair of parentheses () encloses a clade in the tree
• A comma “,” separates the members of the corresponding clade
• A semicolon “;” is always the last character
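The three rules above are enough to read simple Newick strings. The following is a minimal Python sketch of a recursive parser, assuming plain leaf labels with no quotes, branch lengths (`:0.1`) or internal node labels; `parse_newick` and `clades` are hypothetical helper names, not functions from Clustal X or NJPlot. Each `(...)` becomes a tuple and each clade is reported as the set of leaves it contains.

```python
def parse_newick(text: str):
    """Parse a simple Newick string (no branch lengths) into nested tuples.

    Leaves become strings and every '(...)' clade becomes a tuple.
    Assumes plain labels: no quotes, spaces, ':' lengths or internal labels.
    """
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick string must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1                        # consume '('
            children = [parse_node()]
            while text[pos] == ",":         # a comma separates clade members
                pos += 1
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1                        # consume ')'
            return tuple(children)
        start = pos                         # a leaf label runs up to , ( ) or ;
        while text[pos] not in ",();":
            pos += 1
        return text[start:pos]

    tree = parse_node()
    if pos != len(text) - 1:                # only the final ';' may remain
        raise ValueError(f"unexpected text at position {pos}")
    return tree


def clades(tree):
    """List the leaf set under every internal node, children before parents."""
    found = []

    def leaves_below(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves_below(child) for child in node))
        found.append(below)
        return below

    leaves_below(tree)
    return found


if __name__ == "__main__":
    tree = parse_newick("((A,C),(B,D));")
    print(tree)                              # (('A', 'C'), ('B', 'D'))
    print([sorted(c) for c in clades(tree)])  # [['A', 'C'], ['B', 'D'], ['A', 'B', 'C', 'D']]
```

Real tree files (for example the `.ph` files Clustal X writes) also carry branch lengths, so a full parser would need to handle `:length` after each node.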

Page 15

Step 4: View the tree with NJPlot

Note: the tree is unrooted

Page 16

[Figure: several drawings of the unrooted tree on leaves A, B and C, joined by “=” signs to show that they are all the same tree]

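Page 17

Rooted vs. unrooted trees

[Figure: an unrooted tree on A, B and C with three possible root positions (1, 2, 3) marked on its branches, and the three rooted trees (1, 2, 3) obtained by placing the root at each position]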
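The slide's three rooted trees come from placing the root on each of the three branches of the single unrooted tree on A, B and C. In general, a fully bifurcating unrooted tree with n leaves has 2n - 3 branches, so the number of topologies grows as a double factorial: (2n - 5)!! unrooted and (2n - 3)!! rooted. A short Python sketch of these counts (the function names are our own):

```python
def double_factorial(k: int) -> int:
    """k!! = k * (k - 2) * ... down to 1; defined as 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def n_rooted(n: int) -> int:
    """Rooted, fully bifurcating topologies on n >= 2 labelled leaves."""
    return double_factorial(2 * n - 3)


def n_unrooted(n: int) -> int:
    """Unrooted, fully bifurcating topologies on n >= 3 labelled leaves."""
    return double_factorial(2 * n - 5)


if __name__ == "__main__":
    # Each unrooted tree has 2n - 3 branches, i.e. 2n - 3 places for a root.
    for n in range(3, 7):
        print(n, n_unrooted(n), n_rooted(n))
    # prints: 3 1 3 / 4 3 15 / 5 15 105 / 6 105 945 (one line each)
```

With three leaves this gives exactly the slide's picture: one unrooted tree and three rooted ones.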

Page 18

How would each tree look in Newick format?

[Figure: the unrooted tree on A, B and C and the three rooted trees 1, 2 and 3 from the previous slide]

Rooted trees: ((C,B),A); ((A,B),C); ((A,C),B);
Unrooted tree: (A,B,C);

Page 19

Step 4.5: Define an outgroup
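Defining an outgroup roots the tree: the root is placed on the branch that leads to the outgroup, and everything else becomes the ingroup. A minimal Python sketch, assuming the unrooted tree is given as a list of edges with hypothetical internal node names (`x`, `y`) and no degree-2 nodes; `root_with_outgroup` is an illustrative name, not an NJPlot function, and branch lengths are ignored.

```python
from collections import defaultdict


def root_with_outgroup(edges, outgroup):
    """Root an unrooted tree on the branch leading to `outgroup`.

    `edges` is a list of (u, v) pairs; leaves are the degree-1 nodes.
    Internal node names are arbitrary placeholders.
    Returns a Newick string without branch lengths.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    if len(adj[outgroup]) != 1:
        raise ValueError("the outgroup must be a leaf of the tree")
    neighbour = adj[outgroup][0]

    def to_newick(node, parent):
        # Walk away from `parent`; a node with no further neighbours is a leaf.
        children = [n for n in adj[node] if n != parent]
        if not children:
            return node
        return "(" + ",".join(to_newick(c, node) for c in children) + ")"

    # The new root splits the outgroup's branch: one side is the outgroup,
    # the other side is everything reachable through its neighbour.
    ingroup = to_newick(neighbour, outgroup)
    return f"({ingroup},{outgroup});"


if __name__ == "__main__":
    # Hypothetical unrooted tree with the split {Sp1, Sp2} | {Sp3, Sp4}.
    edges = [("Sp1", "x"), ("Sp2", "x"), ("x", "y"), ("Sp3", "y"), ("Sp4", "y")]
    print(root_with_outgroup(edges, "Sp4"))  # (((Sp1,Sp2),Sp3),Sp4);
```

Choosing a different outgroup on the same edge list gives a different rooted tree, which is why the outgroup should be a species known to lie outside the group of interest.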

Page 20

Step 4: View the tree with NJPlot

Note: the order inside a split doesn’t matter

Page 21

[Figure: the same rooted tree on Human, Chimp and Gorilla drawn four ways, with Human and Chimp swapped inside their clade and the Human/Chimp clade swapped with Gorilla, joined by “=” signs]

(Gorilla,(Human,Chimp)); = (Gorilla,(Chimp,Human)); = ((Human,Chimp),Gorilla); = ((Chimp,Human),Gorilla);
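Because the order inside a clade does not matter, two Newick strings can look different yet describe the same tree. One simple way to compare them is to sort the members of every clade before writing the string. A Python sketch, assuming the trees are already given as nested tuples (the function name `canonical` is our own):

```python
def canonical(tree):
    """Canonical string for a rooted tree given as nested tuples.

    The members of every clade are sorted by their own canonical strings,
    so swapping members inside a clade does not change the result.
    """
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(sorted(canonical(child) for child in tree)) + ")"


trees = [
    ("Gorilla", ("Human", "Chimp")),
    ("Gorilla", ("Chimp", "Human")),
    (("Human", "Chimp"), "Gorilla"),
    (("Chimp", "Human"), "Gorilla"),
]

if __name__ == "__main__":
    # All four orderings collapse to a single canonical form.
    print({canonical(t) for t in trees})  # {'((Chimp,Human),Gorilla)'}
```

Sorting puts clades (starting with "(") before plain leaf names; any fixed ordering would work, as long as it is applied to both trees being compared.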

Page 22

Page 23

How robust is our tree?

Page 24

We need some statistical way to estimate the confidence in the tree topology (just as we need the E-value to estimate the confidence in a BLAST hit).

But we know nothing about the distribution of tree topologies.

The only data source we have is our own data (the MSA). So, we must rely on our own resources: “pull yourself up by your own bootstraps”.

How robust is our tree?

Page 25

Bootstrap

Page 26

Bootstrap

1. Create n (100–1000) new MSAs (pseudo-datasets) by randomly sampling K positions (columns), with replacement, from our original MSA of K positions.

Original MSA (columns 1 2 3 4 5 … K):
1: ATCTG…A
2: ATCTG…C
3: ACTTA…C
4: ACCTA…T

Pseudo-dataset 1 (sampled columns 1 1 2 4 4 … 3):
1: AATTT…C
2: AATTT…C
3: AACTT…T
4: AACTT…C

Pseudo-dataset 2 (sampled columns 9 7 4 7 8 … 10):
1: TTTTA…T
2: CATAC…A
3: CATAC…T
4: AGTGG…A

Pseudo-dataset 3 (sampled columns 5 1 5 7 8 … 12):
1: GAGTA…T
2: GAGAC…G
3: AAAAC…A
4: AAAGG…C
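Step 1 amounts to drawing K column indices with replacement and copying those columns into a new alignment. A minimal Python sketch, assuming the MSA is held as a dict from sequence name to aligned string; `bootstrap_replicates` and the `seed` parameter are our own illustrative choices, not Clustal X options.

```python
import random


def bootstrap_replicates(msa, n_replicates=100, seed=None):
    """Yield pseudo-MSAs built by sampling K columns with replacement.

    `msa` maps sequence names to aligned strings that all have length K.
    Each replicate keeps the same names and has K columns, some repeated
    and some missing, exactly as in the pseudo-datasets above.
    """
    lengths = {len(seq) for seq in msa.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    k = lengths.pop()
    rng = random.Random(seed)  # a seed makes the replicates reproducible
    for _ in range(n_replicates):
        cols = [rng.randrange(k) for _ in range(k)]  # K draws, repeats allowed
        yield {name: "".join(seq[c] for c in cols) for name, seq in msa.items()}


if __name__ == "__main__":
    # Hypothetical 6-column toy alignment (not the full MSA from the slide).
    toy = {"1": "ATCTGA", "2": "ATCTGC", "3": "ACTTAC", "4": "ACCTAT"}
    for rep in bootstrap_replicates(toy, n_replicates=3, seed=0):
        print(rep)
```

The same column indices are used for every row, so each sampled column stays a genuine column of the original alignment.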

Page 27

Bootstrap

2. Reconstruct a pseudo-tree from each pseudo-dataset using the same method that was used to reconstruct the original tree.

[Figure: each of the three pseudo-datasets yields its own pseudo-tree on Sp1, Sp2, Sp3 and Sp4]
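With a distance-based method such as neighbor joining, each pseudo-tree starts from a matrix of pairwise distances computed on the pseudo-dataset. A minimal Python sketch of one simple distance, the p-distance (fraction of differing sites), skipping columns where either sequence has a gap. This is an illustrative choice: it is not necessarily the exact distance Clustal X uses, and `p_distances` is a hypothetical name.

```python
from itertools import combinations


def p_distances(msa):
    """Proportion of differing sites for every pair of aligned sequences.

    `msa` maps names to aligned strings of equal length.
    Columns where either sequence has a gap ('-') are skipped.
    Returns {(name_a, name_b): distance} with name_a < name_b.
    """
    dist = {}
    for a, b in combinations(sorted(msa), 2):
        pairs = [(x, y) for x, y in zip(msa[a], msa[b]) if x != "-" and y != "-"]
        if not pairs:
            raise ValueError(f"no comparable sites for {a} and {b}")
        differences = sum(x != y for x, y in pairs)
        dist[(a, b)] = differences / len(pairs)
    return dist


if __name__ == "__main__":
    # Hypothetical 6-column pseudo-dataset (not the full one from the slide).
    pseudo = {"1": "AATTTC", "2": "AATTTC", "3": "AACTTT", "4": "AACTTC"}
    for pair, d in p_distances(pseudo).items():
        print(pair, round(d, 3))
```

The resulting matrix would then be passed to the same tree-building method as the original data; running that method once per pseudo-dataset gives the set of pseudo-trees.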

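Page 28

Bootstrap

3. For each node in our original tree, count the number of pseudo-trees in which it appears. For example, a node found in 2 of 3 pseudo-trees gets a support of 67%, and a node found in all 3 gets 100%.

[Figure: the original tree on Sp1, Sp2, Sp3 and Sp4 next to the three pseudo-trees, with its nodes labelled 67% and 100%]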
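Step 3 can be sketched as set counting: represent every node by the set of leaves below it and check how many pseudo-trees contain the same set. The Python below assumes rooted trees given as nested tuples and uses hypothetical pseudo-trees; for unrooted trees one would compare splits (bipartitions of the leaves) instead of clades. The names `clades` and `bootstrap_support` are our own.

```python
def clades(tree):
    """Leaf sets of all internal nodes of a rooted tree given as nested tuples."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.add(below)
        return below

    walk(tree)
    return found


def bootstrap_support(original, pseudo_trees):
    """Percentage of pseudo-trees containing each non-root clade of `original`."""
    all_leaves = frozenset().union(*clades(original))
    pseudo_clades = [clades(t) for t in pseudo_trees]
    return {
        clade: 100 * sum(clade in pc for pc in pseudo_clades) / len(pseudo_trees)
        for clade in clades(original)
        if clade != all_leaves  # the root clade is trivially always present
    }


if __name__ == "__main__":
    # Hypothetical trees, for illustration only.
    original = ((("Sp1", "Sp2"), "Sp3"), "Sp4")
    pseudo_trees = [
        ((("Sp1", "Sp2"), "Sp3"), "Sp4"),
        ((("Sp2", "Sp1"), "Sp3"), "Sp4"),
        ((("Sp1", "Sp3"), "Sp2"), "Sp4"),
    ]
    support = bootstrap_support(original, pseudo_trees)
    for clade, pct in sorted(support.items(), key=lambda kv: len(kv[0])):
        print(sorted(clade), round(pct))
    # ['Sp1', 'Sp2'] 67
    # ['Sp1', 'Sp2', 'Sp3'] 100
```

Using sets of leaves makes the comparison independent of the order inside each clade, in line with the earlier note that this order does not matter.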

Page 29

Step 3.5: Bootstrap

Page 30

Bootstrap values on NJPlot

Note: Clustal X saves trees with the .ph extension; trees with bootstrap values are saved with the .phb extension.

Page 31

Reconstructing the tree of life

Page 32

Darwin’s vision of the tree of life, from On the Origin of Species

Page 33

Based on molecular data (SSU rRNA), the branching of several kingdoms remains in dispute

Page 34

Lateral Gene Transfer (LGT) Challenges the Conceptual Basis of Phylogenetic Classification

Page 35

Science, 3 March 2006: Vol. 311, no. 5765, pp. 1283–1287

Toward Automatic Reconstruction of a Highly Resolved Tree of Life

Page 36

Methodology

• Started with 36 genes universally present in 191 species (spanning all 3 domains of life), for which orthologs could be unambiguously identified
• Eliminated 5 genes that are LGT suspects (mostly tRNA synthetases)
• Constructed an MSA for each of the remaining 31 orthogroups
• Concatenated all 31 MSAs into a super-MSA of 8090 columns
• Reconstructed the phylogeny from the super-MSA using the maximum likelihood approach
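The concatenation step joins, for each species, its rows from every gene alignment, so the column counts add up (in the study, 31 alignments gave 8090 columns). A minimal Python sketch, assuming every species appears in every gene alignment, as with genes chosen for being universally present; `concatenate_msas` is a hypothetical name and the toy data are made up.

```python
def concatenate_msas(msas):
    """Concatenate per-gene alignments into one super-alignment.

    `msas` is a non-empty list of dicts mapping species name -> aligned
    sequence. Every alignment must contain the same species, and all rows
    within one alignment must have equal length. The result has as many
    columns as all gene alignments together.
    """
    if not msas:
        raise ValueError("need at least one alignment")
    species = set(msas[0])
    for i, msa in enumerate(msas):
        if set(msa) != species:
            raise ValueError(f"alignment {i} has a different species set")
        if len({len(seq) for seq in msa.values()}) != 1:
            raise ValueError(f"alignment {i} has rows of unequal length")
    # Join gene blocks in the given order, species by species.
    return {sp: "".join(msa[sp] for msa in msas) for sp in sorted(species)}


if __name__ == "__main__":
    gene1 = {"A": "MKV-", "B": "MKIL"}  # 4 columns
    gene2 = {"A": "GG", "B": "GA"}      # 2 columns
    print(concatenate_msas([gene1, gene2]))  # {'A': 'MKV-GG', 'B': 'MKILGA'}
```

Keeping the genes in a fixed order matters only for bookkeeping; maximum likelihood treats the columns of the super-MSA as sites regardless of which gene they came from (unless a partitioned model is used).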

Page 37

[Figure: the resulting tree of life, showing the three domains Archaea, Eukaryota and Bacteria]

Page 38

Tree support

• 81.7% of the branches show bootstrap support of over 80%
• 65% of the branches show bootstrap support of 100%
• However, several deep branchings show low support