CSCE555 Bioinformatics

Lecture 12 Phylogenetics I

Meeting: MW 4:00PM-5:15PM SWGN2A21
Instructor: Dr. Jianjun Hu
Course page: http://www.scigen.org/csce555

University of South Carolina
Department of Computer Science and Engineering
2008 www.cse.sc.edu

HAPPY CHINESE NEW YEAR

Outline

• Introduction to Evolution
• What is phylogeny and phylogenetics
• Application of phylogenetics
• Algorithms for phylogenetic inference


How did life evolve on earth?

Courtesy of the Tree of Life project

An international effort to understand how life evolved on earth

Biomedical applications: drug design, protein structure and function prediction, biodiversity.

Evolution

Evolution of new organisms is driven by:

• Mutations
◦ The DNA sequence can be changed by single base changes, deletion/insertion of DNA segments, etc.

• Selection bias

Theory of Evolution

Basic idea

◦ Speciation events lead to the creation of different species.

◦ Speciation is caused by physical separation into groups where different genetic variants become dominant.

Any two species share a (possibly distant) common ancestor.

Primate evolution

A phylogeny is a tree that describes the sequence of speciation events that led to the formation of a set of current-day species; it is also called a phylogenetic tree.

DNA Sequence Evolution

[Figure: a DNA sequence evolving along the branches of a tree]

-3 mil yrs: AAGACTT
-2 mil yrs: AAGGCCT, TGGACTT
-1 mil yrs: AGGGCAT, TAGCCCT, AGCACTT
today: AGGGCAT, TAGCCCA, TAGACTT, AGCACAA, AGCGCTT

Morphological vs. Molecular

Classical phylogenetic analysis uses morphological features: number of legs, lengths of legs, etc.

Modern biological methods allow the use of molecular features:
◦ Gene sequences
◦ Protein sequences
◦ Whole genome sequences, e.g. rearrangements

Morphological topology (based on McKenna and Bell, 1997)

[Figure: mammalian tree built from morphological features]

Taxa, in figure order: Bonobo, Chimpanzee, Man, Gorilla, Sumatran orangutan, Bornean orangutan, Common gibbon, Barbary ape, Baboon, White-fronted capuchin, Slow loris, Tree shrew, Japanese pipistrelle, Long-tailed bat, Jamaican fruit-eating bat, Horseshoe bat, Little red flying fox, Ryukyu flying fox, Mouse, Rat, Vole, Cane-rat, Guinea pig, Squirrel, Dormouse, Rabbit, Pika, Pig, Hippopotamus, Sheep, Cow, Alpaca, Blue whale, Fin whale, Sperm whale, Donkey, Horse, Indian rhino, White rhino, Elephant, Aardvark, Grey seal, Harbor seal, Dog, Cat, Asiatic shrew, Long-clawed shrew, Small Madagascar hedgehog, Hedgehog, Gymnure, Mole, Armadillo, Bandicoot, Wallaroo, Opossum, Platypus

Clade labels: Archonta, Glires, Ungulata, Carnivora, Insectivora, Xenarthra

From sequences to a phylogenetic tree

Rat     QEPGGLVVPPTDA
Rabbit  QEPGGMVVPPTDA
Gorilla QEPGGLVVPPTDA
Cat     REPGGLVVPPTEG

There are many possible types of sequences to use (e.g. mitochondrial vs. nuclear proteins).
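As a first step from sequences toward a tree, it can help to see which alignment columns actually vary between species. The sketch below is a minimal illustration using the four fragments from the slide; the function name `variable_columns` is hypothetical and not part of the lecture.

```python
# The four aligned protein fragments from the slide (13 residues each).
alignment = {
    "Rat":     "QEPGGLVVPPTDA",
    "Rabbit":  "QEPGGMVVPPTDA",
    "Gorilla": "QEPGGLVVPPTDA",
    "Cat":     "REPGGLVVPPTEG",
}


def variable_columns(aln):
    """Map each non-constant column index to its residues, in species order."""
    seqs = list(aln.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    result = {}
    for col in range(length):
        residues = "".join(s[col] for s in seqs)
        # A column is variable if more than one distinct residue appears in it.
        if len(set(residues)) > 1:
            result[col] = residues
    return result


variable = variable_columns(alignment)
for col, residues in variable.items():
    print(col, residues)
```

Only four of the thirteen columns vary (0-based indices 0, 5, 11, and 12), and three of those four separate Cat from the other species.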

Mitochondrial topology (based on Pupko et al.)

[Figure: mammalian tree built from mitochondrial sequences]

Taxa, in figure order: Donkey, Horse, Indian rhino, White rhino, Grey seal, Harbor seal, Dog, Cat, Blue whale, Fin whale, Sperm whale, Hippopotamus, Sheep, Cow, Alpaca, Pig, Little red flying fox, Ryukyu flying fox, Horseshoe bat, Japanese pipistrelle, Long-tailed bat, Jamaican fruit-eating bat, Asiatic shrew, Long-clawed shrew, Mole, Small Madagascar hedgehog, Aardvark, Elephant, Armadillo, Rabbit, Pika, Tree shrew, Bonobo, Chimpanzee, Man, Gorilla, Sumatran orangutan, Bornean orangutan, Common gibbon, Barbary ape, Baboon, White-fronted capuchin, Slow loris, Squirrel, Dormouse, Cane-rat, Guinea pig, Mouse, Rat, Vole, Hedgehog, Gymnure, Bandicoot, Wallaroo, Opossum, Platypus

Clade labels: Perissodactyla, Carnivora, Cetartiodactyla, Rodentia 1, Hedgehogs, Rodentia 2, Primates, Chiroptera, Moles+Shrews, Afrotheria, Xenarthra, Lagomorpha + Scandentia

Phylogenetic trees

Leaves - current-day species (or taxa, the plural of taxon)

Internal vertices - hypothetical common ancestors

Edge lengths - “time” from one speciation to the next

[Figure: example tree with leaves Aardvark, Bison, Chimp, Dog, Elephant]

Types of Trees

A natural model to consider is that of rooted trees.

[Figure: rooted tree with the root labeled "Common Ancestor"]

Types of trees

An unrooted tree represents the same phylogeny without the root node.

Depending on the model, data from current-day species does not distinguish between different placements of the root.

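Rooted versus unrooted trees

[Figure: an unrooted tree with branches a, b, and c, alongside the rooted trees labeled Tree a, Tree b, and Tree c]

The unrooted tree represents the three rooted trees.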

What is phylogenetics?

Phylogenetics is the study of evolutionary relationships among and within species.
◦ Inference of trees from data
◦ Interpreting the evolutionary tree
◦ Application of evolutionary trees

[Figure: tree with leaves crocodiles, birds, lizards, snakes, rodents, primates, marsupials]

This is an example of a phylogenetic tree.

Applications of phylogenetics

• Forensics: Did a patient’s HIV infection result from an invasive dental procedure performed by an HIV+ dentist?

• Conservation: How much gene flow is there among local populations of island foxes off the coast of California?

• Medicine: What are the evolutionary relationships among the various prion-related diseases?

HIV case

Applications of phylogenetics

1. Forensics

Did a patient’s HIV infection result from an invasive dental procedure performed by an HIV+ dentist?

Phylogenetic analysis

So what do the results mean?

• 2 of 3 patients closer to dentist than to local controls. Statistical significance? More powerful analyses?

• Do we have enough data to be confident in our conclusions? What additional data would help?

• If we determine that the dentist’s virus is linked to those of patients E and G, what are possible interpretations of this pattern? How could we test between them?

Applications of phylogenetics

2. Conservation

How much gene flow is there among local populations of island foxes off the coast of California?

http://bioquest.org/bedrock/

Wayne, R.K. and Morin, P.A. 2004. Conservation genetics in the new molecular age. Frontiers in Ecology and the Environment 2: 89-97. (ESA publication)

Applications of phylogenetics

3. Medicine

What are the evolutionary relationships among the various prion-related diseases?

Inferring Phylogenies

Trees can be inferred from:

◦ Morphology of the organisms

◦ Sequence comparison

Example:

Orc:    ACAGTGACGCCCCAAACGT
Elf:    ACAGTGACGCTACAAACGT
Dwarf:  CCTGTGACGTAACAAACGA
Hobbit: CCTGTGACGTAGCAAACGA
Human:  CCTGTGACGTAGCAAACGA
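One simple form of sequence comparison is to count mismatching positions between each pair of aligned sequences. The sketch below does this for the five example sequences; counting mismatches is just one possible distance, chosen here for illustration, and the helper names are hypothetical.

```python
from itertools import combinations

# The five example sequences from the slide (all 19 bases long).
sequences = {
    "Orc":    "ACAGTGACGCCCCAAACGT",
    "Elf":    "ACAGTGACGCTACAAACGT",
    "Dwarf":  "CCTGTGACGTAACAAACGA",
    "Hobbit": "CCTGTGACGTAGCAAACGA",
    "Human":  "CCTGTGACGTAGCAAACGA",
}


def mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(seqs):
    """Symmetric nested dict of pairwise mismatch counts."""
    names = list(seqs)
    dist = {n: {m: 0 for m in names} for n in names}
    for a, b in combinations(names, 2):
        dist[a][b] = dist[b][a] = mismatches(seqs[a], seqs[b])
    return dist


dist = distance_matrix(sequences)
for name in sequences:
    print(f"{name:7s}", [dist[name][other] for other in sequences])
```

Orc and Elf differ at 2 positions, Dwarf and Hobbit at 1, and Hobbit and Human at none, while Orc and Dwarf differ at 6. This already suggests two groups: {Orc, Elf} and {Dwarf, Hobbit, Human}.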

How Many Trees?

(assuming bifurcation only)

                                      Unrooted trees                   Rooted trees
# sequences   # pairwise distances    # trees        # branches/tree   # trees        # branches/tree
3             3                       1              3                 3              4
4             6                       3              5                 15             6
5             10                      15             7                 105            8
6             15                      105            9                 945            10
10            45                      2,027,025      17                34,459,425     18
30            435                     8.69 × 10^36   57                4.95 × 10^38   58

For N sequences:

# pairwise distances: $\frac{N(N-1)}{2}$
Unrooted trees: $\frac{(2N-5)!}{2^{N-3}\,(N-3)!}$ trees, each with $2N-3$ branches
Rooted trees: $\frac{(2N-3)!}{2^{N-2}\,(N-2)!}$ trees, each with $2N-2$ branches
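The entries in the table above can be reproduced from the closed-form counts for bifurcating trees. The following is a minimal sketch; the function names are illustrative and not taken from the lecture.

```python
from math import factorial


def pairwise_distances(n):
    """Number of distinct pairs among n sequences: n(n-1)/2."""
    return n * (n - 1) // 2


def unrooted_trees(n):
    """Number of unrooted bifurcating trees on n >= 3 leaves:
    (2n-5)! / (2^(n-3) * (n-3)!)."""
    if n < 3:
        raise ValueError("need at least 3 sequences")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


def rooted_trees(n):
    """Number of rooted bifurcating trees on n >= 2 leaves:
    (2n-3)! / (2^(n-2) * (n-2)!)."""
    if n < 2:
        raise ValueError("need at least 2 sequences")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


for n in (3, 4, 5, 6, 10, 30):
    print(n, pairwise_distances(n),
          unrooted_trees(n), 2 * n - 3,   # unrooted: trees, branches per tree
          rooted_trees(n), 2 * n - 2)     # rooted: trees, branches per tree
```

The number of rooted trees on N leaves equals the number of unrooted trees times the 2N - 3 branches on which a root can be placed. The rapid growth of these counts is why exhaustive search over all trees is impractical beyond a handful of sequences.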

Phylogenetic Methods

Many different procedures exist. Three of the most popular:

Neighbor-joining
• Minimizes distance between nearest neighbors

Maximum parsimony
• Minimizes total evolutionary change

Maximum likelihood
• Maximizes likelihood of observed data

Comparison of Methods

Neighbor-joining
• Very fast
• Easily trapped in local optima
• Good for generating tentative tree, or choosing among multiple trees

Maximum parsimony
• Slow
• Assumptions fail when evolution is rapid
• Best option when tractable (<30 taxa, strong conservation)

Maximum likelihood
• Very slow
• Highly dependent on assumed evolution model
• Good for very small data sets and for testing trees built using other methods

Distance-based tree construction

Distance-based methods look for a weighted tree that realizes the distances between the objects.

Given a set of species (leaves in a supposed tree) and the distances between them, construct a phylogeny that best “fits” the distances.


Distance Matrix

Given $n$ species, we can compute the $n \times n$ distance matrix $D_{ij}$.

$D_{ij}$ may be defined as the edit distance between a gene in species $i$ and species $j$, where the gene of interest is sequenced for all $n$ species.
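To make the definition concrete, the sketch below computes edit distances with the standard dynamic-programming recurrence and fills a small distance matrix. It assumes unit cost for each substitution, insertion, and deletion, which the slide does not specify. The three sequences are borrowed from the earlier "Inferring Phylogenies" example, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit costs (an assumption of this sketch)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # transforming a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def edit_distance_matrix(genes):
    """Build D as a nested dict: D[i][j] is the edit distance between genes i and j."""
    names = list(genes)
    return {i: {j: edit_distance(genes[i], genes[j]) for j in names} for i in names}


genes = {
    "Orc":   "ACAGTGACGCCCCAAACGT",
    "Elf":   "ACAGTGACGCTACAAACGT",
    "Dwarf": "CCTGTGACGTAACAAACGA",
}
D = edit_distance_matrix(genes)
print(D["Orc"]["Elf"])
```

The matrix is symmetric with zeros on the diagonal, so in practice only the $n(n-1)/2$ entries above the diagonal need to be computed.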

Distances in Trees

Edges may have weights reflecting:
◦ Number of mutations on the evolutionary path from one species to another
◦ Time estimate for the evolution of one species into another

In a tree $T$, we often compute $d_{ij}(T)$, the length of the path between leaves $i$ and $j$.

Distance in Trees: an Example

$d_{1,4} = 12 + 13 + 14 + 17 + 12 = 68$

[Figure: weighted tree with the leaves i and j marked]
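The path-length computation can be sketched in a few lines. The figure's tree is not recoverable from the slide text, so the tree below is hypothetical: only the five weights on the path between leaves 1 and 4 (12, 13, 14, 17, 12) come from the example, while the internal node names and the side branches to leaves 2 and 3 are made up for illustration.

```python
def build_tree(edges):
    """Undirected weighted tree as an adjacency dict: tree[u][v] = weight."""
    tree = {}
    for u, v, w in edges:
        tree.setdefault(u, {})[v] = w
        tree.setdefault(v, {})[u] = w
    return tree


def path_length(tree, start, goal):
    """Sum of edge weights on the unique path between two nodes of a tree."""
    stack = [(start, None, 0)]  # (node, node we came from, distance so far)
    while stack:
        node, parent, dist = stack.pop()
        if node == goal:
            return dist
        for neighbor, weight in tree[node].items():
            if neighbor != parent:  # never walk back along the same edge
                stack.append((neighbor, node, dist + weight))
    raise ValueError("nodes are not connected")


# Hypothetical topology; a, b, c, d are internal nodes.
edges = [
    ("1", "a", 12), ("a", "b", 13), ("b", "c", 14), ("c", "d", 17), ("d", "4", 12),
    ("a", "2", 5),   # made-up side branch
    ("c", "3", 9),   # made-up side branch
]
tree = build_tree(edges)
print(path_length(tree, "1", "4"))
```

Because a tree has exactly one path between any two nodes, a plain depth-first search that avoids stepping back to the parent is sufficient; no shortest-path algorithm is needed.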

Fitting Distance Matrix

Given $n$ species, we can compute the $n \times n$ distance matrix $D_{ij}$.

The evolution of these genes is described by a tree that we don’t know.

We need an algorithm to construct a tree that best fits the distance matrix $D_{ij}$.
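The slide leaves "best fits" undefined. One common way to make it concrete, assumed here purely for illustration, is the sum of squared differences between the observed distances $D_{ij}$ and the tree path lengths $d_{ij}(T)$. The sketch below scores a candidate tree this way; the three-leaf star tree and the observed distances are hypothetical data.

```python
from itertools import combinations


def tree_distance(tree, start, goal):
    """Length of the unique path between two nodes of a weighted tree."""
    stack = [(start, None, 0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == goal:
            return dist
        for neighbor, weight in tree[node].items():
            if neighbor != parent:
                stack.append((neighbor, node, dist + weight))
    raise ValueError("nodes are not connected")


def fit_error(D, tree, leaves):
    """Sum over leaf pairs of (D_ij - d_ij(T))^2; 0 means the tree realizes D exactly."""
    return sum((D[(i, j)] - tree_distance(tree, i, j)) ** 2
               for i, j in combinations(leaves, 2))


# Hypothetical candidate tree: a star with internal node "c".
tree = {
    "A": {"c": 2},
    "B": {"c": 3},
    "C": {"c": 4},
    "c": {"A": 2, "B": 3, "C": 4},
}
leaves = ["A", "B", "C"]

# Hypothetical observed distances, keyed by leaf pair.
D_observed = {("A", "B"): 5, ("A", "C"): 6, ("B", "C"): 8}

error = fit_error(D_observed, tree, leaves)
print(error)
```

Here the tree gives distances 5, 6, and 7, so only the B-C pair is off by one and the error is 1. A tree-construction algorithm would search over topologies and edge weights to drive such a score as low as possible.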

Summary

• Evolution and phylogeny
• Concepts of phylogenetics
• Application of phylogenetics
• Categories of phylogenetic inference algorithms

Next lecture: detailed algorithms for phylogenetic inference

Acknowledgement

Anonymous authors
