
Introduction to characters and parsimony analysis


Page 1: Introduction to characters and parsimony analysis

Introduction to characters and parsimony analysis

Page 2: Introduction to characters and parsimony analysis

Genetic Relationships

• Genetic relationships exist between individuals within populations

• These include ancestor-descendant relationships and more indirect relationships based on common ancestry

• Within sexually reproducing populations there is a network of relationships

• Genetic relations within populations can be measured with a coefficient of genetic relatedness

Page 3: Introduction to characters and parsimony analysis

Phylogenetic Relationships

• Phylogenetic relationships exist between lineages (e.g. species, genes)

• These include ancestor-descendant relationships and more indirect relationships based on common ancestry

• Phylogenetic relationships between species or lineages are (expected to be) tree-like

• Phylogenetic relationships are not measured with a simple coefficient

Page 4: Introduction to characters and parsimony analysis

Phylogenetic Relationships

• Traditionally phylogeny reconstruction was dominated by the search for ancestors, and ancestor-descendant relationships

• In modern phylogenetics there is an emphasis on indirect relationships

• Given that all lineages are related, closeness of phylogenetic relationships is a relative concept.

Page 5: Introduction to characters and parsimony analysis

Phylogenetic relationships

• Two lineages are more closely related to each other than to some other lineage if they share a more recent common ancestor - this is the cladistic concept of relationships

• Phylogenetic hypotheses are hypotheses of common ancestry

[Figure: the tree (Frog,Toad)Oak, in which Frog and Toad share a hypothetical ancestral lineage that is not shared with Oak]
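
The cladistic concept of relationship can be checked mechanically on a tree. The following is a minimal Python sketch, assuming the tree is written as nested tuples; the helper names `leaves` and `closer_related` are hypothetical and chosen only for illustration.

```python
def leaves(tree):
    """Return the set of taxon names below a node (a name or a tuple of subtrees)."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def closer_related(tree, a, b, outsider):
    """True if a and b share a common ancestor that is not an ancestor of outsider.

    This is the same as asking whether some clade contains a and b
    but does not contain outsider.
    """
    if isinstance(tree, str):
        return False
    members = leaves(tree)
    if a in members and b in members and outsider not in members:
        return True
    return any(closer_related(child, a, b, outsider) for child in tree)


# The example (Frog,Toad)Oak written as nested tuples.
example = (("Frog", "Toad"), "Oak")
print(closer_related(example, "Frog", "Toad", "Oak"))  # True
print(closer_related(example, "Frog", "Oak", "Toad"))  # False
```

On this tree Frog and Toad are each other's closest relatives, while Frog and Oak are related only through the deeper ancestor that also gave rise to Toad.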

Page 6: Introduction to characters and parsimony analysis

Phylogenetic Trees

[Figure: a cladogram of taxa A-J, with labels for the root, the leaves, terminal branches, interior branches, node 1, node 2 and a polytomy]

Page 7: Introduction to characters and parsimony analysis

CLADOGRAMS AND PHYLOGRAMS

[Figure: two trees of taxa A-J, one drawn as a phylogram and one as a cladogram, with the axis labels "ABSOLUTE TIME or DIVERGENCE" and "RELATIVE TIME"]

Page 8: Introduction to characters and parsimony analysis

Trees - Rooted and Unrooted

[Figure: trees of taxa A-J shown in unrooted and rooted forms, with the root positions labelled ROOT]

Page 9: Introduction to characters and parsimony analysis

Characters and Character States

• Organisms comprise sets of features

• When organisms/taxa differ with respect to a feature (e.g. its presence or absence or different nucleotide bases at specific sites in a sequence) the different conditions are called character states

• The collection of character states with respect to a feature constitutes a character

Page 10: Introduction to characters and parsimony analysis

Character evolution

• Heritable changes (in morphology, gene sequences, etc.) produce different character states

• Similarities and differences in character states provide the basis for inferring phylogeny (i.e. provide evidence of relationships)

• The utility of this evidence depends on how often the evolutionary changes that produce the different character states occur independently

Page 11: Introduction to characters and parsimony analysis

Unique and unreversed characters

• Given a heritable evolutionary change that is unique and unreversed (e.g. the origin of hair) in an ancestral species, the presence of the novel character state in any taxa must be due to inheritance from the ancestor

• Similarly, absence in any taxa must be because the taxa are not descendants of that ancestor

• The novelty is a homology acting as a badge or marker for the descendants of the ancestor

• The taxa with the novelty are a clade (e.g. Mammalia)

Page 12: Introduction to characters and parsimony analysis

Unique and unreversed characters

• Because hair evolved only once and is unreversed (not subsequently lost) it is homologous and provides unambiguous evidence of relationships

[Figure: tree of Lizard, Frog, Human and Dog with the character HAIR (absent/present) mapped onto it, showing a single change or step]

Page 13: Introduction to characters and parsimony analysis

Homoplasy - Independent evolution

• Homoplasy is similarity that is not homologous (not due to common ancestry)

• It is the result of independent evolution (convergence, parallelism, reversal)

• Homoplasy can provide misleading evidence of phylogenetic relationships (if mistakenly interpreted as homology)

Page 14: Introduction to characters and parsimony analysis

Homoplasy - independent evolution

[Figure: the true tree of Human, Lizard, Frog and Dog with the character TAIL (adult) (absent/present) mapped onto it]

• Loss of tails evolved independently in humans and frogs - there are two steps on the true tree

Page 15: Introduction to characters and parsimony analysis

Homoplasy - misleading evidence of phylogeny

• If misinterpreted as homology, the absence of tails would be evidence for a wrong tree: grouping humans with frogs and lizards with dogs

[Figure: the wrong tree, grouping Human with Frog and Lizard with Dog, with the character TAIL (absent/present) mapped onto it]

Page 16: Introduction to characters and parsimony analysis

Homoplasy - reversal

• Reversals are evolutionary changes back to an ancestral condition

• As with any homoplasy, reversals can provide misleading evidence of relationships

[Figure: a true tree and a wrong tree of taxa 1-10, illustrating a reversal]

Page 17: Introduction to characters and parsimony analysis

Homoplasy - a fundamental problem of phylogenetic inference

• If there were no homoplastic similarities inferring phylogeny would be easy - all the pieces of the jig-saw would fit together neatly

• Distinguishing the misleading evidence of homoplasy from the reliable evidence of homology is a fundamental problem of phylogenetic inference

Page 18: Introduction to characters and parsimony analysis

Homoplasy and Incongruence

• If we assume that there is a single correct phylogenetic tree then:

• When characters support conflicting phylogenetic trees we know that there must be some misleading evidence of relationships among the incongruent or incompatible characters

• Incongruence between two characters implies that at least one of the characters is homoplastic and that at least one of the trees the characters support is wrong

Page 19: Introduction to characters and parsimony analysis

Incongruence or Incompatibility

• These trees and characters are incongruent - both trees cannot be correct, at least one is wrong and at least one character must be homoplastic

[Figure: two conflicting trees of Lizard, Frog, Human and Dog. One has the character HAIR (absent/present) mapped onto it; the other has the character TAIL (absent/present) mapped onto it]
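
For two binary characters, incongruence can be detected without drawing any trees. The sketch below uses the standard pairwise compatibility test: two binary characters can both fit a single tree without homoplasy if and only if at most three of the four possible state combinations are observed. The function name `compatible` and the 0/1 codings (0 = absent, 1 = present, with adult tails absent in humans and frogs) are assumptions made for illustration.

```python
def compatible(char_a, char_b):
    """Pairwise compatibility of two binary characters scored for the same taxa.

    The characters are compatible when fewer than four of the state
    combinations (0,0), (0,1), (1,0), (1,1) occur among the taxa.
    """
    taxa = char_a.keys() & char_b.keys()
    combos = {(char_a[t], char_b[t]) for t in taxa}
    return len(combos) < 4


# Assumed codings: 0 = absent, 1 = present.
hair = {"Lizard": 0, "Frog": 0, "Human": 1, "Dog": 1}
tail = {"Lizard": 1, "Frog": 0, "Human": 0, "Dog": 1}
lactation = {"Lizard": 0, "Frog": 0, "Human": 1, "Dog": 1}

print(compatible(hair, tail))       # False: all four combinations occur
print(compatible(hair, lactation))  # True: the two characters are congruent
```

The test says only that at least one of hair and tail must be homoplastic; it does not say which one.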

Page 20: Introduction to characters and parsimony analysis

Distinguishing homology and homoplasy

• Morphologists use a variety of techniques to distinguish homoplasy and homology

• Homologous features are expected to display detailed similarity (in position, structure, development) whereas homoplastic similarities are more likely to be superficial

• As recognised by Charles Darwin, congruence with other characters provides the most compelling evidence for homology

Page 21: Introduction to characters and parsimony analysis

The importance of congruence

• “The importance, for classification, of trifling characters, mainly depends on their being correlated with several other characters of more or less importance. The value indeed of an aggregate of characters is very evident ... a classification founded on any single character, however important that may be, has always failed.”

• Charles Darwin: Origin of Species, Ch. 13

Page 22: Introduction to characters and parsimony analysis

Congruence

• We prefer the ‘true’ tree because it is supported by multiple congruent characters

[Figure: tree of Lizard, Frog, Human and Dog, with the clade MAMMALIA supported by hair, single bone in lower jaw, lactation, etc.]

Page 23: Introduction to characters and parsimony analysis

Homoplasy in molecular data

• Incongruence and therefore homoplasy can be common in molecular sequence data

– There are a limited number of alternative character states (e.g. only A, G, C and T in DNA)

– Rates of evolution are sometimes high

• Character states are chemically identical

– homology and homoplasy are equally similar

– cannot be distinguished by detailed study of similarity and differences

Page 24: Introduction to characters and parsimony analysis

Parsimony analysis

• Parsimony methods provide one way of choosing among alternative phylogenetic hypotheses

• The parsimony criterion favours hypotheses that maximise congruence and minimise homoplasy

• It depends on the idea of the fit of a character to a tree

Page 25: Introduction to characters and parsimony analysis

Character Fit

• Initially, we can define the fit of a character to a tree as the minimum number of steps required to explain the observed distribution of character states among taxa

• This is determined by parsimonious character optimization

• Characters differ in their fit to different trees

Page 26: Introduction to characters and parsimony analysis

Character Fit

[Figure: the character Hair (absent/present) mapped onto two trees of Frog, Crocodile, Bird, Kangaroo, Bat and Human. Tree A: 1 step. Tree B: 2 steps]
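
The step counts for a single character can be computed rather than read off a drawing. The slides do not name an optimization algorithm; the sketch below swaps in Fitch's small-parsimony algorithm for unordered states, which is one standard choice. The two topologies are assumptions chosen to reproduce the counts for Tree A and Tree B (the drawn trees are not recoverable here), and `fitch_steps` is a hypothetical helper name.

```python
def fitch_steps(tree, states):
    """Return (state_set, steps) for one character on a rooted binary tree.

    tree   : a taxon name or a 2-tuple of subtrees
    states : dict mapping taxon name -> observed state
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_steps(left, states)
    right_set, right_steps = fitch_steps(right, states)
    steps = left_steps + right_steps
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra step needed.
        return common, steps
    # The children disagree: take the union and count one step.
    return left_set | right_set, steps + 1


# Hair: 0 = absent, 1 = present.
hair = {"Frog": 0, "Crocodile": 0, "Bird": 0, "Kangaroo": 1, "Bat": 1, "Human": 1}

# Assumed topologies, chosen to be consistent with the counts on the slide.
tree_a = ("Frog", (("Crocodile", "Bird"), ("Kangaroo", ("Bat", "Human"))))
tree_b = ("Frog", ("Crocodile", ("Kangaroo", ("Human", ("Bat", "Bird")))))

print(fitch_steps(tree_a, hair)[1])  # 1
print(fitch_steps(tree_b, hair)[1])  # 2
```

On the assumed Tree B, placing Bird next to Bat forces hair to change twice, which is the extra step attributed to homoplasy.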

Page 27: Introduction to characters and parsimony analysis

Parsimony Analysis

• Given a set of characters, such as aligned sequences, parsimony analysis works by determining the fit (number of steps) of each character on a given tree

• The sum over all characters is called Tree Length

• Most parsimonious trees (MPTs) have the minimum tree length needed to explain the observed distributions of all the characters

Page 28: Introduction to characters and parsimony analysis

Parsimony in practice

CHARACTERS (+ = present, - = absent)

TAXA         amnion  hair  lactation  placenta  antorbital fenestra  wings
Frog           -      -        -         -               -             -
Bird           +      -        -         -               +             +
Crocodile      +      -        -         -               +             -
Kangaroo       +      +        +         -               -             -
Bat            +      +        +         +               -             +
Human          +      +        +         +               -             -

FIT (steps)  amnion  hair  lactation  placenta  antorbital fenestra  wings  TREE LENGTH
Tree 1         1      1        1         1               1             2         7
Tree 2         1      2        2         2               2             1        10

[Figure: Tree 1 and Tree 2 for Frog, Crocodile, Bird, Kangaroo, Bat and Human, with the changes in characters 1-6 marked on the branches]

Of these two trees, Tree 1 has the shorter length and is the more parsimonious.

Both trees require some homoplasy (extra steps).
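
The tree lengths of 7 and 10 can be reproduced by summing the per-character fits. This is a hedged sketch, again using Fitch's algorithm as a stand-in for whatever optimization was used on the slide. The 0/1 matrix is a reconstruction of the slide's +/- matrix, the character order is an assumption, and the topology given for Tree 2 is an assumption chosen because it yields the fits reported for Tree 2. The names `fitch` and `tree_length` are hypothetical.

```python
# Assumed character order: amnion, hair, lactation, placenta,
# antorbital fenestra, wings (1 = present, 0 = absent).
MATRIX = {
    "Frog":      "000000",
    "Bird":      "100011",
    "Crocodile": "100010",
    "Kangaroo":  "111000",
    "Bat":       "111101",
    "Human":     "111100",
}


def fitch(tree, matrix):
    """Return (state sets, step counts), one entry per character, for a subtree."""
    if isinstance(tree, str):
        row = matrix[tree]
        return [{s} for s in row], [0] * len(row)
    (l_sets, l_steps), (r_sets, r_steps) = (fitch(child, matrix) for child in tree)
    sets, steps = [], []
    for ls, rs, a, b in zip(l_sets, r_sets, l_steps, r_steps):
        common = ls & rs
        sets.append(common or ls | rs)                # intersection if non-empty, else union
        steps.append(a + b + (0 if common else 1))    # a union costs one step
    return sets, steps


def tree_length(tree, matrix):
    """Tree length = sum of the per-character fits."""
    return sum(fitch(tree, matrix)[1])


# Assumed topologies consistent with the fits on the slide.
tree_1 = ("Frog", (("Crocodile", "Bird"), ("Kangaroo", ("Bat", "Human"))))
tree_2 = ("Frog", ("Crocodile", ("Kangaroo", ("Human", ("Bat", "Bird")))))

print(fitch(tree_1, MATRIX)[1], tree_length(tree_1, MATRIX))  # [1, 1, 1, 1, 1, 2] 7
print(fitch(tree_2, MATRIX)[1], tree_length(tree_2, MATRIX))  # [1, 2, 2, 2, 2, 1] 10
```

Only wings fits the assumed Tree 2 better than Tree 1; the other characters that vary among the amniotes each need an extra step on it, so Tree 1 is preferred.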

Page 29: Introduction to characters and parsimony analysis

Results of parsimony analysis

• One or more most parsimonious trees

• Hypotheses of character evolution associated with each tree (where and how changes have occurred)

• Branch lengths (amounts of change associated with branches)

• Various tree and character statistics describing the fit between tree and data

• Suboptimal trees - optional

Page 30: Introduction to characters and parsimony analysis

Parsimony - advantages

• is a simple method - easily understood operation

• does not seem to depend on an explicit model of evolution

• gives both trees and associated hypotheses of character evolution

• should give reliable results if the data is well structured and homoplasy is either rare or widely (randomly) distributed on the tree

Page 31: Introduction to characters and parsimony analysis

Parsimony - disadvantages

• May give misleading results if homoplasy is common or concentrated in particular parts of the tree, e.g.:

– thermophilic convergence

– base composition biases

– long branch attraction

• Underestimates branch lengths

• Model of evolution is implicit - behaviour of method not well understood

• Parsimony often justified on purely philosophical grounds - we must prefer simplest hypotheses - particularly by morphologists

• For most molecular systematists this is uncompelling

Page 32: Introduction to characters and parsimony analysis

Parsimony can be inconsistent

• Felsenstein (1978) developed a simple model phylogeny including four taxa and a mixture of short and long branches

• Under this model parsimony will give the wrong tree

[Figure: the model tree of taxa A, B, C and D, with two long branches (rate or branch length p) and three short branches (q), where p >> q, shown beside the parsimony tree, which is wrong. Long branches are attracted but the similarity is homoplastic]

• With more data the certainty that parsimony will give the wrong tree increases - so that parsimony is statistically inconsistent

• Advocates of parsimony initially responded by claiming that Felsenstein’s result showed only that his model was unrealistic

• It is now recognised that long-branch attraction (in the Felsenstein Zone) is one of the most serious problems in phylogenetic inference
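
The inconsistency can be illustrated by computing the expected frequencies of the parsimony-informative site patterns on a four-taxon tree. This is a sketch under stated assumptions, not a reproduction of Felsenstein's derivation: it assumes a symmetric two-state model in which p and q are per-branch probabilities of change, and it assumes the true tree is ((A,C),(B,D)) with A and B on the long branches. All function names are hypothetical.

```python
from itertools import product


def change(prob, parent, child):
    """Probability of the child state given the parent state on one branch."""
    return prob if parent != child else 1.0 - prob


def pattern_probability(a, b, c, d, p, q):
    """Probability of tip states (a, b, c, d) on the assumed tree ((A,C),(B,D)).

    A and B sit on long branches (p); C, D and the interior branch are short (q).
    x is the ancestor of A and C; y is the ancestor of B and D.
    """
    total = 0.0
    for x, y in product((0, 1), repeat=2):
        total += (0.5 * change(p, x, a) * change(q, x, c) * change(q, x, y)
                  * change(p, y, b) * change(q, y, d))
    return total


def informative_patterns(p, q):
    """Expected frequency of each informative pattern (both state labellings)."""
    return {
        "AC|BD (true tree)": 2 * pattern_probability(0, 1, 0, 1, p, q),
        "AB|CD (long branches together)": 2 * pattern_probability(0, 0, 1, 1, p, q),
        "AD|BC": 2 * pattern_probability(0, 1, 1, 0, p, q),
    }


def parsimony_choice(p, q):
    """The grouping parsimony converges on as the number of sites grows."""
    freqs = informative_patterns(p, q)
    return max(freqs, key=freqs.get)


print(parsimony_choice(0.40, 0.05))  # AB|CD (long branches together)
print(parsimony_choice(0.05, 0.05))  # AC|BD (true tree)
```

With p much larger than q, the pattern that wrongly unites A and B is expected more often than the pattern supporting the true tree, so adding sites only strengthens the wrong answer.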

Page 33: Introduction to characters and parsimony analysis