29
1 Phylogenetics WHO-TDR Bioinformatics Workshop Jessica Kissinger New Delhi, India October, 2005 Why do Phylogenetics? We make evolutionary assumptions in our everyday research life. For example, we need a drug that will kill the parasite and not us. Thus, we need a target that is present in the parasite and not us. We need a good model system, Which parasite (or host) is most closely related to P. falciparum or Humans?

Phylogenetics WHO-TDR Bioinformatics Workshop · Goals for this lecture •Become familiar with concepts •Become familiar with vocabulary •Become familiar with the data analysis

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Phylogenetics WHO-TDR Bioinformatics Workshop · Goals for this lecture •Become familiar with concepts •Become familiar with vocabulary •Become familiar with the data analysis

1

PhylogeneticsWHO-TDR

Bioinformatics WorkshopJessica KissingerNew Delhi, India

October, 2005

Why do Phylogenetics?

• We make evolutionary assumptions in oureveryday research life. For example, we need adrug that will kill the parasite and not us. Thus,we need a target that is present in the parasite andnot us.

• We need a good model system, Which parasite(or host) is most closely related to P. falciparumor Humans?

Page 2: Phylogenetics WHO-TDR Bioinformatics Workshop · Goals for this lecture •Become familiar with concepts •Become familiar with vocabulary •Become familiar with the data analysis

2

Why Phylogenetics?

• This strain is resistant to drug and this oneis sensitive, what has changed?

• Where did this parasite come from? Has it“co-evolved” with humans? Did it enterthe human lineage from another source?

• Which other mosquitoes are likely to serveas a host for my parasite in nature?

Phylogenetics

• What is Phylogenetics?– Molecular Systematics

• The use of molecular data to infer the relationshipsof the host species e.g. using rRNA to build trees tolook at the relationship of the bacteria to theeukaryotes

– Molecular Evolution• Use trees to infer how a molecule, protein, or gene

has evolved (insertions, deletions, substitutions).

Page 3: Phylogenetics WHO-TDR Bioinformatics Workshop · Goals for this lecture •Become familiar with concepts •Become familiar with vocabulary •Become familiar with the data analysis

3

Gene Trees vs Species Trees

You Can Make Phylogenies ofMany Things:

• Amino acid sequences• Nucleotide sequences• RFLP data• Morphological data• “Paper fastening devices”

Page 4: Phylogenetics WHO-TDR Bioinformatics Workshop · Goals for this lecture •Become familiar with concepts •Become familiar with vocabulary •Become familiar with the data analysis

4

12

34

5 6 7 8 9 10 11

12 13 14 15

16 17 1819

20

21

Issues you had to deal with

1) Conflict - Size, color, material, shape2) Direction of change, e.g. red to green?3) Homology - these items have a similar function

but do they have a similar origin?4) Mixed materials - plastic coated metal5) How do you assign weight, are some traits more

important?6) Lots of possibilities

>8,2000,794,532,637,891,559,375 rooted trees!

Page 5: Phylogenetics WHO-TDR Bioinformatics Workshop · Goals for this lecture •Become familiar with concepts •Become familiar with vocabulary •Become familiar with the data analysis

5

Goals for this lecture

• Become familiar with concepts• Become familiar with vocabulary• Become familiar with the data analysis flow• Reach the point where you can read the

available literature on how to use thesemethods in greater detail

Assumptions made byPhylogenetic algorithms

• The sequences are correct• The sequence are homologous• Each position is homologous• The sampling of taxa or genes is sufficient to resolve the

problem of interest• Sequence variation is representative of the broader group

of interest• Sequence variation contains sufficient phylogenetic signal

(as opposed to noise) to resolve the problem of interest• Each position in the sequence evolved independently

Page 6: Phylogenetics WHO-TDR Bioinformatics Workshop · Goals for this lecture •Become familiar with concepts •Become familiar with vocabulary •Become familiar with the data analysis

6

Availability of Sequenced Genomes

AquifexThermodesulfobacterium

Thermotoga

Flavobacteria

Cyanobacteria

Proteobacteria Green nonsulfurbacteria

Gram+bacteria

SpirochetesEuryarcheota

Crenarcheota

AnimalsFungi

PlantsSlime molds

Flagellates

MicrosporidiaGiardia

Bacteria 74 Archaea 16 Eucarya 14

Courtesy of Igor Zhulin

Apicomplexans

Giardia lamblia Varimorpha necatrix

Trichomonas vaginallisTrichomonas foetus

Physarum polycephalum

Euglenoids

KinetoplastidsBodonids

Amoebamastigote

Dictyostelium discoideumEntamoebae histolytica

Entamoebae invadens

Naegleria gruberi

STRAMENOPILES

Cnidaria

EUBACTERIA

ALVEOLATESGREENPLANTS

ANIMALS

FUNGI

EUKARYOTES

PROTISTS

ARCHAEBACTERIA

Brown algae Chr

ysop

hyte

s

Diatom

s

Oom

ycete

sLa

barin

thul

idsCili

ates

Dinoflagelates

Red Algae

adapted fromSogin et al (1991)

Page 7: Phylogenetics WHO-TDR Bioinformatics Workshop · Goals for this lecture •Become familiar with concepts •Become familiar with vocabulary •Become familiar with the data analysis

7

Sandra Baldauf, Science June 2003

Circumsporozoite Phylogeny(molecular systematics, host relationships)

Page 8: Phylogenetics WHO-TDR Bioinformatics Workshop · Goals for this lecture •Become familiar with concepts •Become familiar with vocabulary •Become familiar with the data analysis

8

How to do an analysis

• Define a question• Select sequences appropriate to answer your

question (not all sequences are equally good!)• Make a multiple sequence alignment• Edit your alignment to make it better• Perform lots and lots of analyses• Perform Bootstrap analyses to test confidence

Multiple Sequence Alignment

Page 9: Phylogenetics WHO-TDR Bioinformatics Workshop · Goals for this lecture •Become familiar with concepts •Become familiar with vocabulary •Become familiar with the data analysis

9

Multiple Sequence Alignment

Study your Alignments!

Page 10: Phylogenetics WHO-TDR Bioinformatics Workshop · Goals for this lecture •Become familiar with concepts •Become familiar with vocabulary •Become familiar with the data analysis

10

A Word About Methods• There are two overall categories of methods

– Transformed distance methods (data are transformedinto a distance matrix). The matrix is used to build asingle tree. UPGMA and Neighbor-Joining areexamples of this method. They are computationallysimple and very fast.

– Optimality methods (tree generation is separate fromtree evaluation). Parsimony and Maximum-likelihoodmethods divorce the issue of tree generation fromevaluating how good a tree is. For parsimony, theremany be more than 1 “most parsimonious” or“shortest” tree found.

Distance methods• UPGMA

– Assume all lineagesevolve at the same rate

– Produces a root– Produces only one tree– Computationally very

fast– Trees are additive

• Neighbor-joining– Permits variation in

rates of evolution– Does not produce a

root– Produces only one tree– Computationally very

fast– Trees are additive

Page 11: Phylogenetics WHO-TDR Bioinformatics Workshop · Goals for this lecture •Become familiar with concepts •Become familiar with vocabulary •Become familiar with the data analysis

11

1 ATTGCTCAGA2 AATGCTCTGA3 ATAGGACTGA

1 vs 2 = 80% similar = 0.2 distance1 vs 3 = 60% similar = 0.4 distance2 vs 3 = 60% similar = 0.4 distance

0.1

0.1

0.1

0.2

1 2 3 123

0.10.2

1 2 31 - 0.2 0.42 - 0.43 -

Create a distance matrixCan use scoring schemes to transform data into distances(e.g. do transitions occur moreoften than transversions)

The implementation of theUPGMA algorithm toproduce the tree below. Anew matrix is calculatedat each iteration.

Page 12: Phylogenetics WHO-TDR Bioinformatics Workshop · Goals for this lecture •Become familiar with concepts •Become familiar with vocabulary •Become familiar with the data analysis

12

An unrooted Neighbor-joining tree of thesame dataset

Factors that Affect Phylogenetic Inference

1. Relative base frequencies (A,G,T,C)2. Transition/transversion ratio3. Number of substitutions per site4. Number of nucleotides (or amino acids) in sequence5. Different rates in different parts of the molecule6. Synonymous/non-synonymous substitution ratio7. Substitutions that are uninformative or obfuscatory

1. Parallel substitutions2. Convergent substitutions3. Back substitutions4. Coincidental substitutions

In general, the more factors that are accounted for by themodel (i.e., more parameters), the larger the error ofestimation. It is often best to use fewer parameters bychoosing the simpler model.

Models of evolution: choosing parameters

Page 13: Phylogenetics WHO-TDR Bioinformatics Workshop · Goals for this lecture •Become familiar with concepts •Become familiar with vocabulary •Become familiar with the data analysis

13

Some distance models: p-distance

• p = nd/n, where n is the number of sites(nucleotides or amino acids), and nd is thenumber of differences between the two sequencesexamined.• Very robust when divergence times are recentand the affect of complicating phenomena is minor

Some distance models: Jukes-Cantor

• Used to estimate the number ofsubstitutions per site

• The expected number of substitutionsper site is:

• d = 3αt = -(3/4)ln[1-(4/3)p], where pis the proportion of differencebetween 2 sequences

• Variance can be calculated• No assumptions are made about

nucleotide frequencies, or differentialsubstitution rates

A T C G

ATCG

-ααα

α-αα

αα-α

ααα-

Page 14: Phylogenetics WHO-TDR Bioinformatics Workshop · Goals for this lecture •Become familiar with concepts •Become familiar with vocabulary •Become familiar with the data analysis

14

Some distance models: Kimura two-parameter

• Used to estimate the number ofsubstitutions per site

• d = 2rt, where r is thesubstitution rate (per site, peryear) and t is the generation time;r = α + 2β, so:

• d = 2αt + 4βt• Accounts for different transition

and transversion rates• No assumptions are made about

nucleotide frequencies, varianceis greater than Jukes-Cantor

C T

A G

Pyrimidines

Purines

α

α

ββ ββ

α = transition rateβ = transversion rateThese are treated thesame for longdivergence times.

Other models

• Hasegawa, Kishino, Yano (HKY): corrects forunequal nucleotide frequencies and transition/transversion bias into account

• Unrestricted model: allows different rates betweenall pairs of nucleotides

• General Time Reversible model: allows differentrates between all pairs of nucleotides and correctsfor unequal nucleotide frequencies

• Many other models have been invented to correctfor specific problems

• The more parameters are introduced, the larger thevariance becomes

Page 15: Phylogenetics WHO-TDR Bioinformatics Workshop · Goals for this lecture •Become familiar with concepts •Become familiar with vocabulary •Become familiar with the data analysis

15

Optimality Methods

• All possible trees (or a heuristic samplingof trees) are generated and evaluatedaccording to Parsimony or Maximumlikelihood.

• Note: Tree generation is divorced from treeevaluation. More than one tree topologymay be optimal according to your criteria

General differences between optimality criteria

Works well with strongor weak sequencesimilarity

Works only when sequencesimilarity is high

Works well with strong orweak sequence similarity

Can estimate branchlengths with somedegree of accuracy

Cannot estimate branchlengths accurately

Can accurately estimatebranch lengths (importantfor molecular clocks)

Well understoodstatistical properties(easy to test)

Poorly understood statisticalproperties (hard to test)

Well understood statisticalproperties (easy to test)

Computationally slowComputationally fastComputationally fast

Can account for manytypes of sequencesubstitutions

Assumes that all substitutionsare equal

Can account for many typesof sequence substitutions

Model based“Model free”Model based

MaximumLikelihood

MaximumParsimony

Minimumevolution

Page 16: Phylogenetics WHO-TDR Bioinformatics Workshop · Goals for this lecture •Become familiar with concepts •Become familiar with vocabulary •Become familiar with the data analysis

16

Rooted Tree Unrooted Tree

A definiteBeginning andPolarity, a root

Rooted Tree Unrooted Tree

Terminal branches

Nodes

InternalBranches

Root

Page 17: Phylogenetics WHO-TDR Bioinformatics Workshop · Goals for this lecture •Become familiar with concepts •Become familiar with vocabulary •Become familiar with the data analysis

17

1 2 31 23 12 3

1

2

3

In the world of trees, there are more rooted topologies for a given Number of taxa than unrooted

Rooted

Unrooted

Possible trees as function ofnumber of Taxa

Taxa Rooted Trees Unrooted Trees3 3 14 15 35 105 15

10 34,459,425 2,027,025100 2 x 10 182

More trees than the number of atoms in the universe!

Page 18: Phylogenetics WHO-TDR Bioinformatics Workshop · Goals for this lecture •Become familiar with concepts •Become familiar with vocabulary •Become familiar with the data analysis

18

Tree search considerations

• Exhaustive searches are searches of allpossible trees for the number of Taxa inyour data set (15 Taxa or less)

• If you have more than 15 Taxa, thenheuristic methods must be employed inwhich you search a sample of all possibletrees. There are many algorithms for thegeneration of different populations of trees.

Tree search considerations

Strategy Type• Stepwise addition Algorithmic• Star decomposition Algorithmic• Exhaustive Exact• Branch & bound Exact• Branch swapping Heuristic• Genetic algorithm Heuristic• Markov Chain Monte Carlo Heuristic

Page 19: Phylogenetics WHO-TDR Bioinformatics Workshop · Goals for this lecture •Become familiar with concepts •Become familiar with vocabulary •Become familiar with the data analysis

19

Parsimony basics & scores• Based on shared derived characters

(synapomorphies)• Identical characters which evolve more than once

are “homoplasies”• Unique characters are “autapomorphies”• The score of the tree is the total of all the changes

needed to map the data. The scale bar is #ofchanges.

• Smaller, i.e. more parsimonious scores are better• More than one tree topology may have the same

score

An informative position is one that can favor one treeover another when some type of criteria are applied.

1 2 3 4 5 6 7 8 91 A A G A G T G C A2 A G C C G T G C G3 A G A T A T C C A4 A G A G A T C C G

* **

1 2 3423 4 11 2 3 4

1A

2G

3A

4G

1A

3A

2G

4G

1A

4G

2G

3A

2 1 2

Position #5 isInformative, itpermits us tochoose a shortertree from amongthe options. It prefers the treeof length 1 overthose of length 2

Page 20: Phylogenetics WHO-TDR Bioinformatics Workshop · Goals for this lecture •Become familiar with concepts •Become familiar with vocabulary •Become familiar with the data analysis

20

1 2 3 4 5 6 7 8 91 A A G A G T G C A2 A G C C G T G C G3 A G A T A T C C A4 A G A G A T C C G

* **

1G

2C

3A

4A

1G

3A

2C

4A

1G

4A

2C

3A

1G

2A

3G

4G

1G

3G

2A

4G

1G

4G

2A

3G

1A

2A

3A

4A

1A

3A

2A

4A

1A

4A

2A

3A

Pos 1

Pos 2

Pos 3

0 0 0

22 2

1 1 1

Not all alignment positions can help pick a better tree

None ofthesecharacterscan distinguishbetween thethreepossible unrootedtopologies.They areuninformative

Maximum Likelihood

• Is an optimality method, it is an algorithm whichevaluates trees according to some criterion

• The algorithm searches for trees which maximizethe probability of observing the data

• Trees are scored with Log likelihoods• This is the most computationally intensive

method available• More tractable versions include (puzzle)• Alternate approaches include Bayesian inference

(Mr. Bayes)

Page 21: Phylogenetics WHO-TDR Bioinformatics Workshop · Goals for this lecture •Become familiar with concepts •Become familiar with vocabulary •Become familiar with the data analysis

21

Not all methods can be usedwith all types of data

• Parsimony can be used with all types of data,nucleotide, protein, binary, morphological, mixeddata sets. States can be ordered.

• Distance can be used with nucleotide and proteindata but you need a model to generate distances

• Maximum likelihood, normally only nucleotidedata, but PAML can do protein maximumlikelihood (still a tricky and debatable approach).

• Bayesian - All types of sequence data

There are Many Types of Trees

• Cladogram vs. Phylogram– Cladograms have uniform branch lengths and only

represent relationships– Phylograms have lengths proportional to change or

distance• Rooted vs. Unrooted

– A defined origin as opposed to a network orrelationships (most tress are unrooted because they areeasier to calculate)

• Artistic license (slanted, rectangular, circle, “network”)

Page 22: Phylogenetics WHO-TDR Bioinformatics Workshop · Goals for this lecture •Become familiar with concepts •Become familiar with vocabulary •Become familiar with the data analysis

22

A Word about treesABCDEFG

A

B

CD

E

F

G

AB

CD

E

FG

A

B

CD

E

F

G

A word about trees(there are many types)

rayfinned fish

frogs

salamanders

turtles

crocodiles

birds

lizards

snakes

mammals

lungfishbirds

crocodiles

snakeslizards

mammals

frogssalamanders

rayfinned fish

lungfish

turtles

1 change

rayfinned fish

frogs

salamanders

turtles

crocodiles

birds

lizards

snakes

mammals

lungfish

1 change

Slanted Cladogram Rectangular Phylogram

UnrootedPhylogram

Page 23: Phylogenetics WHO-TDR Bioinformatics Workshop · Goals for this lecture •Become familiar with concepts •Become familiar with vocabulary •Become familiar with the data analysis

23

The Bootstrap

• The bootstrap is a method for assigning ameasure of confidence to a particular nodein tree.

• It is NOT a measure of the overall“goodness” of the tree.

• Rules of thumb: 70-100% = Good, 0-30%= bad, 30-70% = “gray zone” difficult tointerpret.

1 2 3 4 5 6 7 8 91 A A G A G T G C A2 A G C C G T G C G3 A G A T A T C C A4 A G A G A T C C G

8 8 3 2 1 4 6 5 9 1 C C G A A A T G A2 C C C G A C T G G3 C C A G A T T A A4 C C A G A G T A G

1STsample

Original DataEach column Represented once

2ND etc.

3RD etc.

2 9 6 2 1 3 4 8 7

6 3 3 1 6 5 7 4 9

100 or 1,000

The bootstrap process

Then build consensus of all trees produced by sample datasets. This provides support for nodes

Page 24: Phylogenetics WHO-TDR Bioinformatics Workshop · Goals for this lecture •Become familiar with concepts •Become familiar with vocabulary •Become familiar with the data analysis

24

A caution about alignmentscharacters in columns are homologous

stickman

Daffy

Donald

RoadRunner

TweetyBugs

Goofy

Mickey

Wile E

Pluto

1 change

stickman

Daffy

Donald

RoadRunner

TweetyBugs

Goofy

Mickey

Wile E

Pluto

1 change

stickman

Daffy

Donald

RoadRunner

TweetyBugs

Goofy

Mickey

Pluto

Wile E

1 change

stickman

Daffy

RoadRunner

Tweety

DonaldBugs

Goofy

Mickey

Wile E

Pluto

1 change

stickman

Daffy

RoadRunner

Tweety

DonaldBugs

Goofy

Mickey

Wile E

Pluto

1 change

stickman

Daffy

RoadRunner

Tweety

DonaldBugs

Goofy

Mickey

Pluto

Wile E

1 change

stickman

Daffy

RoadRunner

Tweety

DonaldBugs

Goofy

Mickey

Wile E

Pluto

1 change

stickman

Daffy

RoadRunner

Tweety

DonaldBugs

Goofy

Mickey

Wile E

Pluto

1 change

stickman

Daffy

RoadRunner

Tweety

DonaldBugs

Goofy

Mickey

Pluto

Wile E

1 change

stickman

Daffy

RoadRunner

Donald

TweetyBugs

Goofy

Mickey

Wile E

Pluto

1 change

stickman

Daffy

RoadRunner

Donald

TweetyBugs

Goofy

Mickey

Wile E

Pluto

1 change

stickman

Daffy

RoadRunner

Donald

TweetyBugs

Goofy

Mickey

Pluto

Wile E

1 change

stickman

Daffy

Donald

RoadRunner

TweetyBugs

Goofy

Mickey

Wile E

Pluto

1 change

stickman

Daffy

Donald

RoadRunner

TweetyBugs

Goofy

Mickey

Wile E

Pluto

1 change

stickman

Daffy

Donald

RoadRunner

TweetyBugs

Goofy

Mickey

Pluto

Wile E

1 change

15 equally Parsimonious treesOf Disney characters.All trees have the same,smallest score.

Page 25: Phylogenetics WHO-TDR Bioinformatics Workshop · Goals for this lecture •Become familiar with concepts •Become familiar with vocabulary •Become familiar with the data analysis

25

stickman

Daffy

RoadRunner

Tweety

Donald

Bugs

Goofy

Mickey

Wile E

Pluto

Strict

stickman

Daffy

RoadRunner

Tweety

Donald

Bugs

Goofy

Mickey

Wile E

Pluto

100

100

60

100

100

Majority rule

Comparison of real trees Assesment ofsupport

Bootstrap Example

Donald DuckDaffy DuckTweety bird

71 Donald Duck

Daffy DuckTweety bird

?

Donald Duck

Daffy DuckTweety bird

?

?

If 79% of the time this relationship holds, 29% it is something else

Page 26: Phylogenetics WHO-TDR Bioinformatics Workshop · Goals for this lecture •Become familiar with concepts •Become familiar with vocabulary •Become familiar with the data analysis

26

Some points to consider for the paper fasteners:

We decided, in our evolutionary model that material was so important that weneeded to give it extra weight, so we did (weight = 2).

Based on external information, such as the archeological record, we have learnedthat metal predates plastic, so, we ordered our characters: metal must precedeplastic.

We decided to use as an “outgroup”, an unbent piece of metal, (taxon 21) topolarize the direction of evolution within our tree, i.e. we have evolved from astraight piece of metal into a “paper fastening device”. We will not allow reversionto this “unbent” state.

We will enforce the assumptions/decisions made above by using a constraint tree.By using this constraint tree, we reduce the number of possible rooted trees from2.216431 x 1020 to 273,922,023,375 and we reduce the number of unrooted treesfrom 6.332660 x 1018 to 54,784,404,674 - a considerable savings!

We removed taxa 4 and 11 from the data set because they are non-homologous, i.e.the have a similar function but they do not share a common evolutionary descent orpath. What we have here is a case of convergent evolution, i.e. independent originsof a paper fastening solution!

Page 27: Phylogenetics WHO-TDR Bioinformatics Workshop · Goals for this lecture •Become familiar with concepts •Become familiar with vocabulary •Become familiar with the data analysis

27

Neighbor-joining analysis and bootstrap of clip dataset

1.2.3.18.19.7.9.10.20.14.15.8.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.12.17.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.8.9.10.20.14.15.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.9.10.20.15.14.8.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.17.4.11.5.6.12.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.17.4.11.5.6.12.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.17.12.13.4.11.5.6.21.

1 change

1.2.17.3.18.19.7.9.10.20.14.15.8.16.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.17.5.6.4.11.12.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.17.4.11.5.6.12.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.12.17.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.4.11.5.6.12.17.13.21.

1 change

1.2.3.18.19.7.8.9.10.20.14.15.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.8.14.15.9.10.20.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.8.9.10.20.15.14.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.8.9.10.20.14.15.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.8.14.15.9.10.20.17.16.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.8.14.15.9.10.20.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.8.9.10.20.15.14.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.8.9.10.20.14.15.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.8.9.10.20.14.15.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.8.14.15.9.10.20.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.8.9.10.20.14.15.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.8.9.10.20.14.15.12.16.17.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.8.14.15.9.10.20.17.16.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.8.9.10.20.14.15.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.14.15.7.9.10.20.8.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.9.10.20.7.15.14.8.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.9.10.20.7.15.14.8.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.15.7.9.10.20.14.8.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.9.10.20.15.8.14.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.9.10.20.15.8.14.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.9.10.20.8.14.15.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.9.10.20.15.14.8.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.14.15.7.9.10.20.8.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.8.15.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.8.15.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.14.15.7.9.10.20.8.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.15.14.7.9.10.20.8.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.9.10.20.7.14.15.8.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.8.14.9.10.20.15.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.8.15.9.10.20.14.16.17.12.4.11.5.6.13.21.

1 change

1.2.17.3.18.19.7.9.10.20.14.15.8.16.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.4.11.5.6.17.12.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.4.11.5.6.17.12.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.12.17.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.17.12.13.4.11.5.6.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.12.17.13.4.11.5.6.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.17.4.11.5.6.12.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.4.11.5.6.17.12.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.4.11.5.6.17.12.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.17.4.11.6.5.12.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.17.4.11.5.6.12.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.17.12.13.4.11.5.6.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.17.4.11.5.6.12.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.17.4.11.5.6.12.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.4.11.6.5.17.12.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.4.11.6.5.17.12.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.4.11.5.6.17.12.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.4.11.5.6.17.12.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.5.6.17.12.4.11.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.5.6.17.12.4.11.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.17.4.11.5.6.12.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.17.12.4.11.5.6.13.21.

1 change

1.2.12.3.18.19.7.9.10.20.14.15.8.16.17.4.11.13.5.6.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.17.4.11.5.6.12.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.17.4.11.5.6.12.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.17.4.11.5.6.13.12.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.17.4.11.5.6.12.13.21.

1 change

1.2.3.18.19.7.8.9.10.20.14.15.4.11.5.6.12.16.17.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.12.17.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.8.15.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.9.10.20.7.14.15.8.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.8.15.16.17.12.4.11.5.6.13.21.

1 change

1.2.3.18.19.7.9.10.20.14.15.8.16.17.12.4.11.5.6.13.21.

1 change

Some of the >37,500Trees generated by a Parsimony analysisof the clip dataset

Page 28: Phylogenetics WHO-TDR Bioinformatics Workshop · Goals for this lecture •Become familiar with concepts •Become familiar with vocabulary •Become familiar with the data analysis

28

Consensus of 5,000 parsimony Trees Bootstrap of clips

Software and Books

• “How to make a phylogenetic Tree” by BarryHall, comes with PAUP* CD, ~$30, Sinauer Press

• Phylip - Joe Felsenstein, Free via internet• PAML - Free via internet• Mr. Bayes - Free via internet• ClustalW or ClustalX - Free via internet• Fundamentals of molecular evolution, Second

edition, Wen-Hsiung Li, Sinauer Press

* Best on a MAC, but also command line

Page 29: Phylogenetics WHO-TDR Bioinformatics Workshop · Goals for this lecture •Become familiar with concepts •Become familiar with vocabulary •Become familiar with the data analysis

29

Giving Credit

• Several slides in this presentation wereprovided by Mike Thomas, via apresentation he posted on the internet in2002.