Elements of Bioinformatics 14F01 TP5 - Phylogenetic analysis 13 December 2010 - Corrections -

Exercise 1/2

One example tree …

PhyML

. Tree search : NNIs
. Initial tree : BIONJ
. Model of amino acids substitution : JTT
. Number of taxa : 41
. Log-likelihood : -7592.66297
. Discrete gamma model : Yes
  - Number of categories : 8
  - Gamma shape parameter : 1.764
. Proportion of invariant : 0.061
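
The summary above uses a "label : value" layout, so its values can be collected into a dictionary, for example to compare the log-likelihood of several runs. The following is a minimal sketch under that assumption; the function name `parse_phyml_summary` is hypothetical and not part of PhyML, and the text is copied from the slide rather than read from a real output file.

```python
import re

# Summary text in the "label : value" layout (values copied from the slide).
SUMMARY = """\
. Tree search : NNIs
. Initial tree : BIONJ
. Model of amino acids substitution : JTT
. Number of taxa : 41
. Log-likelihood : -7592.66297
. Discrete gamma model : Yes
  - Number of categories : 8
  - Gamma shape parameter : 1.764
. Proportion of invariant : 0.061
"""


def parse_phyml_summary(text):
    """Return {label: value}, converting numeric values to int or float."""
    stats = {}
    for line in text.splitlines():
        # Skip the leading "." or "-" marker, then split at the first colon.
        match = re.match(r"\s*[.\-]\s*(.+?)\s*:\s*(.+?)\s*$", line)
        if not match:
            continue
        label, value = match.groups()
        for cast in (int, float):
            try:
                value = cast(value)
                break
            except ValueError:
                pass  # not a number of this kind; keep trying, else keep the string
        stats[label] = value
    return stats


stats = parse_phyml_summary(SUMMARY)
print(stats["Log-likelihood"], stats["Number of taxa"])
```

Values that are not numeric, such as `Yes` or `JTT`, are kept as strings.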

Exercise 3

Exercise 4

BLAST result: the vertebrate sequences closest to gem-1 appear to be MIRO2.

Further down in the same BLAST result as on the previous slide, there is a second vertebrate protein group, annotated as MIRO1, again with similarity over the full length of the protein. We copy the sequences from all species of interest to infer the phylogeny. Note that there is no MIRO1 gene for X. tropicalis.

Phylogenetic tree reconstructed using the 1-click mode with default parameter settings. The topology of the resulting tree matches the species tree with one exception: chicken and frog appear as a monophyletic group, which is probably incorrect, since birds are expected to group with the other amniotes rather than with amphibians.

The resulting tree further suggests that the vertebrate genes MIRO1 and MIRO2 are paralogs, and that both are co-orthologs of the fungal gene gem-1 and the invertebrate gene MIRO.
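
The paralog / co-ortholog reading can be made explicit by asking at which kind of node two genes last shared an ancestor. The sketch below uses the species-overlap rule (a node whose two subtrees share a species is treated as a duplication, otherwise as a speciation). The tree is a hypothetical, simplified toy tree and not the tree from the exercise; the species names and the `gene|species` leaf naming are assumptions made for illustration.

```python
# Hypothetical toy gene tree as nested 2-tuples; leaves are "gene|species".
TREE = (
    "gem-1|fungus",
    (
        "MIRO|fly",
        (
            ("MIRO1|human", "MIRO1|mouse"),
            ("MIRO2|human", "MIRO2|mouse"),
        ),
    ),
)


def leaves(node):
    """List of leaf names below `node`."""
    if isinstance(node, str):
        return [node]
    return leaves(node[0]) + leaves(node[1])


def species_of(node):
    """Set of species found in the leaves below `node`."""
    return {name.split("|")[1] for name in leaves(node)}


def relationship(node, a, b):
    """'paralogs' if a and b last shared a duplication node, else 'orthologs'."""
    left, right = node
    for child in (left, right):
        names = leaves(child)
        if a in names and b in names:
            return relationship(child, a, b)  # both below one child: descend
    # a and b sit on different sides, so `node` is their last common ancestor.
    shared = species_of(left) & species_of(right)
    return "paralogs" if shared else "orthologs"


print(relationship(TREE, "MIRO1|human", "MIRO2|human"))
print(relationship(TREE, "MIRO|fly", "MIRO1|human"))
print(relationship(TREE, "MIRO|fly", "MIRO2|human"))
```

In this toy tree the node joining the MIRO1 and MIRO2 groups has human and mouse on both sides, so it is labelled a duplication, while the invertebrate gene is separated from both vertebrate copies by a speciation node.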

Tree robustness

Bootstrap replicates are generated by random resampling, so the support values can differ slightly between runs of the same analysis. It is therefore more meaningful to run the analysis with a large number of replicates, e.g. 1000.
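
To illustrate where the randomness comes from, the sketch below builds bootstrap replicates by sampling alignment columns with replacement, and estimates how much a support value is expected to fluctuate for a given number of replicates. The toy alignment is hypothetical, and the standard-error formula assumes the replicates behave like independent binomial draws.

```python
import math
import random

# Hypothetical toy alignment (equal-length sequences); not data from the exercise.
ALIGNMENT = {
    "seqA": "MKTLLVAG",
    "seqB": "MKSLLVAG",
    "seqC": "MRTLIVSG",
}


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement, keeping the alignment length."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def support_standard_error(p, n_replicates):
    """Binomial standard error of a support value p estimated from n replicates."""
    return math.sqrt(p * (1.0 - p) / n_replicates)


rep1 = bootstrap_replicate(ALIGNMENT, random.Random(1))
rep2 = bootstrap_replicate(ALIGNMENT, random.Random(2))
print(rep1["seqA"], rep2["seqA"])  # different seeds usually give different replicates

for n in (100, 1000):
    print(n, round(support_standard_error(0.7, n), 3))
```

Under this assumption, a true support of 0.7 has a standard error of roughly 0.046 with 100 replicates and roughly 0.014 with 1000, which is why the larger number gives more reproducible values.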

Exercise 5

Phylogenetic analysis parameters: Alignment: MUSCLE (default); Data model: Gblocks – stringent; Tree-building: PhyML (JTT, gamma(4), shape parameter estimated, invariable sites fixed (0.0)); Test for topology robustness: aLRT (SH).

The 1st tree includes species from Archaea and Bacteria, which form the 2 major branches of the tree. The 2nd dataset consists of sequences from Eukaryotes. A phylogenetic tree reconstructed with both datasets suggests that the eukaryotic sequences are more closely related to Bacteria than to Archaea. This result does not agree with the species tree. One explanation could be that Eukaryotes acquired this gene from Bacteria by horizontal gene transfer (HGT).
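
The statement that eukaryotic sequences group with Bacteria rather than with Archaea is a statement about the branches (splits) of the tree, and it can be checked without choosing a root. The sketch below enumerates the splits of a hypothetical toy tree and asks whether some branch has exactly two given domains on one side. The tree, the leaf names and the `A:`/`B:`/`E:` domain prefixes are assumptions made for illustration.

```python
# Hypothetical toy tree, written with an arbitrary root; leaves are "domain:label".
TREE = (("A:arc1", "A:arc2"), (("B:bac1", ("E:euk1", "E:euk2")), "B:bac2"))


def clades(node, out):
    """Return the leaf set below `node`, recording every subtree's leaf set in `out`."""
    if isinstance(node, str):
        leafset = frozenset([node])
    else:
        leafset = frozenset().union(*(clades(child, out) for child in node))
    out.append(leafset)
    return leafset


def splits(tree):
    """All bipartitions (one side, other side) of the leaves induced by branches."""
    found = []
    all_leaves = clades(tree, found)
    return [(side, all_leaves - side) for side in found if side != all_leaves]


def domains(leafset):
    """Set of domain prefixes present in a leaf set."""
    return {name.split(":")[0] for name in leafset}


def groups_with(tree, domain_x, domain_y):
    """True if some branch has exactly the X and Y leaves on one side."""
    target = {domain_x, domain_y}
    for a, b in splits(tree):
        for inside, outside in ((a, b), (b, a)):
            if domains(inside) == target and not (domains(outside) & target):
                return True
    return False


print(groups_with(TREE, "B", "E"))  # Bacteria + Eukaryota versus the rest
print(groups_with(TREE, "A", "E"))  # Archaea + Eukaryota versus the rest
```

Because the test works on splits, the arbitrary root used to write the toy tree does not affect the answer.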

Dataset 2

Phylogenetic tree of datasets 1, 2 and 3. The 3rd dataset consists of eukaryotic homologs. The result suggests that the eukaryotic sequences possibly derive from a single gene transfer event. Sequence annotation indicates that one group of the eukaryotic proteins (‘SYAM’) is mitochondrial. A possible scenario: Eukaryotes acquired a gene copy by endosymbiotic gene transfer (EGT) from the mitochondrion to the nucleus, followed by a gene duplication event. One of the copies subsequently evolved into the cytoplasmic form.

Analysis protocol: Alignment: MUSCLE (default); Data model: Gblocks – stringent; Tree-building: PhyML (JTT, gamma(4), shape parameter estimated, invariable sites fixed (0.0)); Test for topology robustness: aLRT (SH).
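
The slides do not say how PhyML was invoked. As a rough sketch, the tree-building settings of this protocol could be written as a PhyML command line along the following lines. The option letters reflect the PhyML 3.x command-line interface as commonly documented and should be checked against `phyml --help` for the installed version; the input file name is hypothetical. The code only assembles the argument list and does not run anything.

```python
import shlex


def phyml_command(alignment_path):
    """Build (but do not run) a PhyML call mirroring the tree-building settings."""
    return [
        "phyml",
        "-i", alignment_path,  # curated alignment in PHYLIP format (hypothetical file)
        "-d", "aa",            # amino-acid data
        "-m", "JTT",           # JTT substitution model
        "-c", "4",             # 4 discrete gamma categories
        "-a", "e",             # estimate the gamma shape parameter
        "-v", "0",             # proportion of invariable sites fixed at 0.0
        "-b", "-4",            # SH-like aLRT branch support (assumed option value)
    ]


cmd = phyml_command("datasets_1_2_3.gblocks.phy")
print(shlex.join(cmd))
```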

Exercise 6

Can you root this tree?

[Figure: phylogenetic tree with clade labels Eukaryota (three groups), Archaea and Bacteria]

[Figure: phylogenetic tree with clade labels: Eukaryota, nuclear; Eukaryota, mitochondrial, nuclear encoded; Eukaryota, chloroplast (ARATH: nuclear encoded, CYCME: chloroplast encoded); Archaea; Bacteria]

• To be improved:
  – Add more sequence data to the dataset
  – Use alignment methods such as ProbCons or T-Coffee
  – Analyze individual branches, e.g. cyanobacteria and chloroplastic data, proteobacteria and mitochondrial-derived sequences (& meaningful outgroups)
  – ML methods or MrBayes
  – Calculate support values for internal nodes
  – Run other tests, which we have not discussed here …

• Can this tree be rooted by an outgroup? No, as no species is known that diverged prior to the divergence of the 2 major branches of life!