Computational Molecular Biology Lecture Eleven: Introduction to phylogenetic trees Semester I, 2009-10 Graham Ellis NUI Galway, Ireland

Computational Molecular Biology - NUI Galway (hamilton.nuigalway.ie/teachingWeb/CompMolBio/ELEVEN/eleven.pdf)

Page from Darwin’s notebooks (c. 1837)

On the origin of species (excerpt)

The affinities of all the beings of the same class have sometimes been represented by a great tree. I believe this simile largely speaks the truth. The green and budding twigs may represent existing species; and those produced during former years may represent the long succession of extinct species. At each period of growth all the growing twigs have tried to branch out on all sides, and to overtop and kill the surrounding twigs and branches, in the same manner as species and groups of species have at all times overmastered other species in the great battle for life. The limbs divided into great branches, and these into lesser and lesser branches, were themselves once, when the tree was young, budding twigs; and this connection of the former and present buds by ramifying branches may well represent the classification of all extinct and living species in groups subordinate to groups. Of the many twigs which flourished when the tree was a mere bush, only two or three, now grown into great branches, yet survive and bear the other branches; so with the species which lived during long-past geological periods, very few have left living and modified descendants.

Jargon

A eukaryote is an organism whose cells contain complex structures enclosed within membranes.

Almost all species of large organisms are eukaryotes, including animals, plants and fungi, although most species of eukaryotic protists are microorganisms.

The defining membrane-bound structure that sets eukaryotic cells apart from prokaryotic cells is the nucleus.

Tree of life today

Darwin’s tree model is still considered valid for eukaryotic life forms. The earliest branch of the eukaryote tree yields four supergroups:

  • Plants (green and red algae, and plants),
  • Unikonts (amoebas, fungi, and all animals, including humans),
  • Excavates (free-living organisms and parasites),
  • and SAR (a recently identified main group, abbreviated from Stramenopiles, Alveolates, and Rhizaria, the names of some of its members).

Biologists now recognize that the prokaryotes, the bacteria and archaea, have the ability to transfer genetic information between unrelated organisms through horizontal gene transfer (HGT).

Recombination, gene loss, duplication, and gene creation are a few of the processes by which genes can be transferred within and between bacterial and archaeal species, causing variation that’s not due to vertical transfer.

Darwin’s tree is a useful tool in understanding the basic processes of evolution but cannot explain the full complexity of the situation.

More jargon

A graph consists of a set of vertices and a set of edges joining certain pairs of vertices.

A graph is connected if, for any pair of vertices A, B, there exists a path of edges starting at A and ending at B.
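The definition of connectedness can be tested mechanically. The following is a minimal sketch, assuming a graph is given as a collection of vertex names plus a list of vertex pairs; the function name `is_connected` and the sample graphs are illustrative choices, not part of the lecture.

```python
from collections import deque

def is_connected(vertices, edges):
    """Return True if every vertex can be reached from every other vertex."""
    vertices = list(vertices)
    if not vertices:
        return True  # convention chosen here: the empty graph counts as connected
    # Build an adjacency list for the undirected graph.
    neighbours = {v: set() for v in vertices}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Breadth-first search from an arbitrary start vertex.
    seen = {vertices[0]}
    queue = deque([vertices[0]])
    while queue:
        v = queue.popleft()
        for w in neighbours[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    # Connected exactly when the search reached every vertex.
    return len(seen) == len(vertices)

print(is_connected("ABCD", [("A", "B"), ("B", "C"), ("C", "D")]))  # True
print(is_connected("ABCD", [("A", "B"), ("C", "D")]))              # False
```

In the second graph there is no path from A to C, so it is not connected.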

And more

A tree is a connected graph with no loops (cycles).
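One way to check this definition on a small graph is sketched below. It assumes the same vertex-and-edge-list representation as before and uses a union-find structure, which is an implementation choice made here rather than anything prescribed by the lecture: an edge whose two ends already lie in the same component would close a loop.

```python
def is_tree(vertices, edges):
    """Return True if the undirected graph is connected and has no closed paths."""
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent pointers to the representative of v's component.
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps chains short
            v = parent[v]
        return v

    components = len(parent)
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return False  # this edge would close a loop
        parent[root_a] = root_b  # merge the two components
        components -= 1
    # A single remaining component means the graph is connected.
    return components == 1

print(is_tree("ABCD", [("A", "B"), ("A", "C"), ("A", "D")]))  # True
print(is_tree("ABC", [("A", "B"), ("B", "C"), ("C", "A")]))   # False (triangle)
print(is_tree("ABCD", [("A", "B"), ("C", "D")]))              # False (two pieces)
```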

Phylogenetic trees

A phylogenetic tree is a tree showing the evolutionary relationships among various biological species or other entities that are believed to have a common ancestor.

In a phylogenetic tree, each node with descendants represents the most recent common ancestor of the descendants.

The edge lengths in some trees correspond to time estimates or "distances" between species.

Each node is called a taxonomic unit. Internal nodes are generally called hypothetical taxonomic units as they cannot be directly observed.
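To make the role of internal nodes concrete, here is a small sketch that finds the most recent common ancestor of two taxonomic units in a rooted tree. The tree, the leaf names A to D, the internal node names h1 to h3 and the child-to-parent dictionary are all hypothetical, chosen only for illustration.

```python
def ancestors(parent, node):
    """List node followed by its ancestors, walking up to the root."""
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain

def mrca(parent, a, b):
    """Most recent common ancestor of a and b in a rooted tree."""
    seen = set(ancestors(parent, a))
    # The first ancestor of b that is also an ancestor of a is the most recent one.
    for node in ancestors(parent, b):
        if node in seen:
            return node
    return None

# Hypothetical rooted tree stored as child -> parent.
# A, B, C, D are observed taxonomic units (leaves);
# h1, h2, h3 are hypothetical taxonomic units (internal nodes), h3 is the root.
parent = {
    "A": "h1", "B": "h1",
    "h1": "h2", "C": "h2",
    "h2": "h3", "D": "h3",
}

print(mrca(parent, "A", "B"))  # h1
print(mrca(parent, "A", "C"))  # h2
print(mrca(parent, "C", "D"))  # h3
```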


Edit distance between DNA strands

Given two strings X, Y of letters (or nucleotides) we can think of the score of an optimal alignment (for some choice of µ, ρ) as being a measure of the distance between X and Y.

Alternatively, we can count the minimum number of insertions/deletions and substitutions/mismatches needed to convert X into Y, and use this count as a measure of the distance between X and Y. This is known as the edit distance between X and Y.
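The edit distance can be computed with the standard dynamic-programming recurrence over prefixes of the two strings. The sketch below is one common way to do it, assuming every insertion, deletion and substitution costs 1; the function name `edit_distance` and the sample strings are illustrative choices.

```python
def edit_distance(x, y):
    """Minimum number of insertions, deletions and substitutions turning x into y."""
    # prev[j] is the distance between the current prefix of x and the first j letters of y.
    prev = list(range(len(y) + 1))  # empty prefix of x: j insertions
    for i, a in enumerate(x, start=1):
        curr = [i]  # first i letters of x versus empty y: i deletions
        for j, b in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a from x
                curr[j - 1] + 1,         # insert b into x
                prev[j - 1] + (a != b),  # substitute a by b (no cost if equal)
            ))
        prev = curr
    return prev[-1]

print(edit_distance("AACCGGTT", "AACCGGTA"))  # 1 (one substitution)
print(edit_distance("ACGT", "AGT"))           # 1 (one deletion)
print(edit_distance("", "ACG"))               # 3 (three insertions)
```

Only two rows of the table are kept at a time, so the memory used grows with the length of Y rather than with the product of both lengths.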

Example

Consider V=AACCGGTT.

One substitution/mismatch produces W=AACCGGTA.

A different substitution/mismatch of V produces X=TACCGGTT.

A substitution/mismatch of X produces Y=TACTGGTT.

A different substitution/mismatch of X produces Z=TACCGATT.

The words V, W, X, Y, Z can be represented in a phylogenetic tree where each edge is of length 1, and the edit distance between two words is the number of edges in the path joining the words.

[Figure: tree on the vertices V, W, X, Y, Z with edges V-W, V-X, X-Y and X-Z]
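The claim about path lengths can be checked by hand or with a short script. The sketch below assumes the tree has the edges implied by the construction (W and X derived from V, Y and Z derived from X). Because the five words have equal length and are related by substitutions only, it counts mismatched positions, which for these particular words coincides with the edit distance. The helper names are illustrative.

```python
from collections import deque
from itertools import combinations

words = {"V": "AACCGGTT", "W": "AACCGGTA", "X": "TACCGGTT",
         "Y": "TACTGGTT", "Z": "TACCGATT"}
# Assumed edges, read off from the construction in the example.
edges = [("V", "W"), ("V", "X"), ("X", "Y"), ("X", "Z")]

def mismatches(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(p != q for p, q in zip(a, b))

def path_length(start, goal):
    """Number of edges on the path from start to goal, via breadth-first search."""
    neighbours = {name: [] for name in words}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in neighbours[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist[goal]

# Print both quantities for every pair of words; they agree in each row.
for a, b in combinations(words, 2):
    print(a, b, mismatches(words[a], words[b]), path_length(a, b))
```

For instance, W and Y differ in three positions, and the path W, V, X, Y has three edges.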

ClustalW2

Have a go at using the ClustalW2 software to reproduce the above tree, starting from the data

V=AACCGGTT
W=AACCGGTA
X=TACCGGTT
Y=TACTGGTT
Z=TACCGATT
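ClustalW2 needs the sequences in a recognised sequence format, and FASTA is one of the formats it is documented to accept. A minimal sketch for producing FASTA text from the five words is given below; the helper name `to_fasta` is an illustrative choice, and the resulting text can be pasted into the tool or saved to a file of your choosing.

```python
sequences = {
    "V": "AACCGGTT",
    "W": "AACCGGTA",
    "X": "TACCGGTT",
    "Y": "TACTGGTT",
    "Z": "TACCGATT",
}

def to_fasta(seqs):
    """Format a name -> sequence mapping as FASTA text (header line, then sequence)."""
    return "".join(f">{name}\n{seq}\n" for name, seq in seqs.items())

fasta_text = to_fasta(sequences)
print(fasta_text, end="")
```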