CSCE555 Bioinformatics
Lecture 12: Phylogenetics I

Meeting: MW 4:00PM-5:15PM, SWGN2A21
Instructor: Dr. Jianjun Hu
Course page: http://www.scigen.org/csce555

University of South Carolina, Department of Computer Science and Engineering, 2008. www.cse.sc.edu

HAPPY CHINESE NEW YEAR



Page 2: Outline

• Introduction to evolution
• What is phylogeny and phylogenetics
• Applications of phylogenetics
• Algorithms for phylogenetic inference

Page 3: How did life evolve on earth?

• An international effort to understand how life evolved on earth
• Biomedical applications: drug design, protein structure and function prediction, biodiversity

[Figure courtesy of the Tree of Life project]

Page 4: Evolution

Evolution of new organisms is driven by:

• Mutations
◦ The DNA sequence can be changed by single-base changes, deletion/insertion of DNA segments, etc.
• Selection bias

Page 5: Theory of Evolution

• Basic idea
◦ Speciation events lead to the creation of different species.
◦ Speciation is caused by physical separation into groups where different genetic variants become dominant.
• Any two species share a (possibly distant) common ancestor.

Page 6: Primate evolution

A phylogeny, also called a phylogenetic tree, is a tree that describes the sequence of speciation events that led to the formation of a set of current-day species.

Page 7: DNA Sequence Evolution

[Figure: one ancestral sequence diverging over time into five current-day sequences]

-3 mil yrs: AAGACTT
-2 mil yrs: AAGGCCT, TGGACTT
-1 mil yrs: AGGGCAT, TAGCCCT, AGCACTT
today: AGGGCAT, TAGCCCA, TAGACTT, AGCACAA, AGCGCTT

Page 8: Morphological vs. Molecular

• Classical phylogenetic analysis uses morphological features: number of legs, lengths of legs, etc.
• Modern biological methods allow the use of molecular features:
◦ Gene sequences
◦ Protein sequences
◦ Whole-genome sequences, e.g. rearrangements

Page 9: Morphological topology

[Figure: mammalian tree based on morphology (based on McKenna and Bell, 1997)]

Leaves, in figure order: Bonobo, Chimpanzee, Man, Gorilla, Sumatran orangutan, Bornean orangutan, Common gibbon, Barbary ape, Baboon, White-fronted capuchin, Slow loris, Tree shrew, Japanese pipistrelle, Long-tailed bat, Jamaican fruit-eating bat, Horseshoe bat, Little red flying fox, Ryukyu flying fox, Mouse, Rat, Vole, Cane-rat, Guinea pig, Squirrel, Dormouse, Rabbit, Pika, Pig, Hippopotamus, Sheep, Cow, Alpaca, Blue whale, Fin whale, Sperm whale, Donkey, Horse, Indian rhino, White rhino, Elephant, Aardvark, Grey seal, Harbor seal, Dog, Cat, Asiatic shrew, Long-clawed shrew, Small Madagascar hedgehog, Hedgehog, Gymnure, Mole, Armadillo, Bandicoot, Wallaroo, Opossum, Platypus

Group labels: Archonta, Glires, Ungulata, Carnivora, Insectivora, Xenarthra

Page 10: From sequences to a phylogenetic tree

Rat      QEPGGLVVPPTDA
Rabbit   QEPGGMVVPPTDA
Gorilla  QEPGGLVVPPTDA
Cat      REPGGLVVPPTEG

There are many possible types of sequences to use (e.g. mitochondrial vs. nuclear proteins).

Page 11: Mitochondrial topology

[Figure: mammalian tree based on mitochondrial sequences (based on Pupko et al.)]

Leaves, in figure order: Donkey, Horse, Indian rhino, White rhino, Grey seal, Harbor seal, Dog, Cat, Blue whale, Fin whale, Sperm whale, Hippopotamus, Sheep, Cow, Alpaca, Pig, Little red flying fox, Ryukyu flying fox, Horseshoe bat, Japanese pipistrelle, Long-tailed bat, Jamaican fruit-eating bat, Asiatic shrew, Long-clawed shrew, Mole, Small Madagascar hedgehog, Aardvark, Elephant, Armadillo, Rabbit, Pika, Tree shrew, Bonobo, Chimpanzee, Man, Gorilla, Sumatran orangutan, Bornean orangutan, Common gibbon, Barbary ape, Baboon, White-fronted capuchin, Slow loris, Squirrel, Dormouse, Cane-rat, Guinea pig, Mouse, Rat, Vole, Hedgehog, Gymnure, Bandicoot, Wallaroo, Opossum, Platypus

Group labels: Perissodactyla, Carnivora, Cetartiodactyla, Rodentia 1, Hedgehogs, Rodentia 2, Primates, Chiroptera, Moles+Shrews, Afrotheria, Xenarthra, Lagomorpha + Scandentia

Page 12: Phylogenetic trees

• Leaves: current-day species (or taxa, plural of taxon)
• Internal vertices: hypothetical common ancestors
• Edge lengths: “time” from one speciation to the next

[Figure: tree with leaves Aardvark, Bison, Chimp, Dog, Elephant]
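
The three ingredients of a phylogenetic tree (leaves, internal vertices, edge lengths) map naturally onto a small recursive data structure. The Python sketch below is only an illustration: the `Node` class, the helper names, the topology and the edge lengths are all assumptions, since the slide's figure survives only as its five leaf names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One vertex of a rooted tree (hypothetical helper, not from the lecture)."""
    name: Optional[str] = None   # species name for leaves, None for ancestors
    length: float = 0.0          # length of the edge leading to the parent
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def leaves(node: Node) -> List[str]:
    """Names of the current-day species below `node`, left to right."""
    if node.is_leaf():
        return [node.name]
    names = []
    for child in node.children:
        names.extend(leaves(child))
    return names


def count_internal(node: Node) -> int:
    """Number of internal vertices (hypothetical ancestors) in the subtree."""
    if node.is_leaf():
        return 0
    return 1 + sum(count_internal(child) for child in node.children)


# Assumed topology and edge lengths, for illustration only.
tree = Node(children=[
    Node(length=1.0, children=[
        Node(length=1.0, children=[Node("Aardvark", 2.0), Node("Bison", 2.0)]),
        Node("Chimp", 3.0),
    ]),
    Node(length=2.0, children=[Node("Dog", 2.0), Node("Elephant", 2.0)]),
])

print(leaves(tree))          # leaf names in left-to-right order
print(count_internal(tree))  # internal vertices, including the root
```

In a rooted bifurcating tree with n leaves there are n - 1 internal vertices, so the five-leaf example has four.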

Page 13: Types of Trees

A natural model to consider is that of rooted trees.

[Figure: rooted tree with the root labeled "Common Ancestor"]

Page 14: Types of trees

• An unrooted tree represents the same phylogeny without the root node.
• Depending on the model, data from current-day species does not distinguish between different placements of the root.

Page 15: Rooted versus unrooted trees

[Figure: rooted trees "Tree a", "Tree b" and "Tree c", and an unrooted tree with labels a, b and c]

The unrooted tree represents the three rooted trees.

Page 16: What is phylogenetics?

Phylogenetics is the study of evolutionary relationships among and within species.
◦ Inference of trees from data
◦ Interpreting the evolutionary tree
◦ Application of evolutionary trees

[Figure: tree with leaves crocodiles, birds, lizards, snakes, rodents, primates, marsupials]

Page 17: What is phylogenetics?

[Figure: tree with leaves crocodiles, birds, lizards, snakes, rodents, primates, marsupials]

This is an example of a phylogenetic tree.

Page 18: Applications of phylogenetics

• Forensics: Did a patient’s HIV infection result from an invasive dental procedure performed by an HIV+ dentist?
• Conservation: How much gene flow is there among local populations of island foxes off the coast of California?
• Medicine: What are the evolutionary relationships among the various prion-related diseases?

HIV case

Page 19: Applications of phylogenetics

1. Forensics

Did a patient’s HIV infection result from an invasive dental procedure performed by an HIV+ dentist?

Page 20: Phylogenetic analysis

Page 21: So what do the results mean?

• 2 of 3 patients closer to dentist than to local controls. Statistical significance? More powerful analyses?

• Do we have enough data to be confident in our conclusions? What additional data would help?

• If we determine that the dentist’s virus is linked to those of patients E and G, what are possible interpretations of this pattern? How could we test between them?

Page 22: Applications of phylogenetics

2. Conservation

How much gene flow is there among local populations of island foxes off the coast of California?

Page 23:

http://bioquest.org/bedrock/

Wayne, R. K., Morin, P. A. 2004. Conservation Genetics in the New Molecular Age. Frontiers in Ecology and the Environment 2: 89-97. (ESA publication)

Page 24: Applications of phylogenetics

3. Medicine

What are the evolutionary relationships among the various prion-related diseases?

Page 25: Inferring Phylogenies

Trees can be inferred from:
◦ Morphology of the organisms
◦ Sequence comparison

Example:

Orc:    ACAGTGACGCCCCAAACGT
Elf:    ACAGTGACGCTACAAACGT
Dwarf:  CCTGTGACGTAACAAACGA
Hobbit: CCTGTGACGTAGCAAACGA
Human:  CCTGTGACGTAGCAAACGA
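
One simple way to compare the five example sequences is to count the positions at which two equal-length sequences differ. The sketch below does that for every pair; the function names are illustrative, and counting mismatches is only one possible (assumed) notion of distance, not necessarily the one used in the lecture.

```python
from itertools import combinations

# Sequences copied from the slide.
sequences = {
    "Orc":    "ACAGTGACGCCCCAAACGT",
    "Elf":    "ACAGTGACGCTACAAACGT",
    "Dwarf":  "CCTGTGACGTAACAAACGA",
    "Hobbit": "CCTGTGACGTAGCAAACGA",
    "Human":  "CCTGTGACGTAGCAAACGA",
}


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_table(seqs: dict) -> dict:
    """Mismatch count for every unordered pair of named sequences."""
    return {
        (p, q): mismatches(seqs[p], seqs[q])
        for p, q in combinations(seqs, 2)
    }


table = distance_table(sequences)
for (p, q), d in table.items():
    print(f"{p:>6} vs {q:<6} {d}")
```

Under this count Hobbit and Human are identical, Dwarf differs from Hobbit at one position, and Orc and Elf differ from each other at two positions, which suggests the grouping a tree should reflect.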

Page 26: How Many Trees?

[Empty table, assuming bifurcation only. Columns: # sequences; # pairwise distances; unrooted trees (# trees, # branches/tree); rooted trees (# trees, # branches/tree). Rows: 3, 4, 5, 6, 10, 30, N]

Page 27: How Many Trees?

                             Unrooted trees              Rooted trees
# sequences  # pairwise      # trees        # branches   # trees        # branches
             distances                      /tree                       /tree
3            3               1              3            3              4
4            6               3              5            15             6
5            10              15             7            105            8
6            15              105            9            945            10
10           45              2,027,025      17           34,459,425     18
30           435             8.69 × 10^36   57           4.95 × 10^38   58

For N sequences:
# pairwise distances: $\frac{N(N-1)}{2}$
Unrooted trees: $\frac{(2N-5)!}{2^{N-3}\,(N-3)!}$ trees, $2N-3$ branches per tree
Rooted trees: $\frac{(2N-3)!}{2^{N-2}\,(N-2)!}$ trees, $2N-2$ branches per tree
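
The counts for N sequences can be checked numerically. The sketch below evaluates the closed-form expressions for bifurcating trees, $(2N-5)!/(2^{N-3}(N-3)!)$ unrooted and $(2N-3)!/(2^{N-2}(N-2)!)$ rooted; the function names are illustrative and not part of the lecture.

```python
from math import factorial


def num_pairwise_distances(n: int) -> int:
    """N(N-1)/2 unordered pairs of sequences."""
    return n * (n - 1) // 2


def num_unrooted_trees(n: int) -> int:
    """(2N-5)! / (2^(N-3) (N-3)!) bifurcating unrooted trees, for N >= 3."""
    if n < 3:
        raise ValueError("need at least 3 sequences")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


def num_rooted_trees(n: int) -> int:
    """(2N-3)! / (2^(N-2) (N-2)!) bifurcating rooted trees, for N >= 2."""
    if n < 2:
        raise ValueError("need at least 2 sequences")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


for n in (3, 4, 5, 6, 10, 30):
    print(n,
          num_pairwise_distances(n),
          num_unrooted_trees(n), 2 * n - 3,   # unrooted: trees, branches/tree
          num_rooted_trees(n), 2 * n - 2)     # rooted: trees, branches/tree
```

Each unrooted tree has 2N - 3 branches, and placing the root on any one of them gives a distinct rooted tree, which is why the rooted count equals the unrooted count times 2N - 3.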

Page 28: Phylogenetic Methods

Many different procedures exist. Three of the most popular:

Neighbor-joining
• Minimizes distance between nearest neighbors

Maximum parsimony
• Minimizes total evolutionary change

Maximum likelihood
• Maximizes likelihood of observed data

Page 29: Comparison of Methods

Neighbor-joining
• Very fast
• Easily trapped in local optima
• Good for generating tentative tree, or choosing among multiple trees

Maximum parsimony
• Slow
• Assumptions fail when evolution is rapid
• Best option when tractable (<30 taxa, strong conservation)

Maximum likelihood
• Very slow
• Highly dependent on assumed evolution model
• Good for very small data sets and for testing trees built using other methods

Page 30: Distance-based tree construction

• Distance: a weighted tree that realizes the distances between the objects.
• Given a set of species (leaves in a supposed tree) and distances between them, construct a phylogeny which best “fits” the distances.

[Slide comment, translated from Hebrew: Before the construction, the four-point theorem should be inserted (from a separate file), replacing its previous proof in lecture 12. It may also be worth dropping UPGMA. This comment of course also affects lecture 12. Shlomo, 12.3.03]

Page 31: Distance Matrix

• Given n species, we can compute the n × n distance matrix $D_{ij}$.
• $D_{ij}$ may be defined as the edit distance between a gene in species i and species j, where the gene of interest is sequenced for all n species.
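
To make the definition concrete, here is a minimal sketch that fills an n × n matrix with pairwise edit distances. It assumes unit-cost insertions, deletions and substitutions (Levenshtein distance); the lecture does not say which cost scheme it has in mind, and the function names and toy gene strings are illustrative.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit (Levenshtein) distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def distance_matrix(genes: List[str]) -> List[List[int]]:
    """Symmetric n x n matrix D with D[i][j] = edit_distance(genes[i], genes[j])."""
    n = len(genes)
    D = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = edit_distance(genes[i], genes[j])
    return D


# Toy input: one (made-up) gene string per species.
genes = ["ACGT", "CGTA", "ACGA"]
for row in distance_matrix(genes):
    print(row)
```

Unlike a position-by-position mismatch count, edit distance allows insertions and deletions: "ACGT" and "CGTA" differ at every position but are only two edits apart.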

Page 32: Distances in Trees

• Edges may have weights reflecting:
◦ Number of mutations on the evolutionary path from one species to another
◦ Time estimate for evolution of one species into another
• In a tree T, we often compute $d_{ij}(T)$, the length of the path between leaves i and j.

Page 33: Distance in Trees: an Example

[Figure: weighted tree with leaves i and j marked]

$d_{1,4} = 12 + 13 + 14 + 17 + 12 = 68$
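
The path length between two leaves can be computed by walking the unique path between them and summing the edge weights. The sketch below is an illustration under stated assumptions: only the five weights 12, 13, 14, 17, 12 on the path from leaf 1 to leaf 4 come from the slide, while the vertex names, the other leaves and their edge weights are made up because the figure itself is not available.

```python
from collections import defaultdict


def build_tree(edges):
    """Adjacency map {vertex: {neighbor: weight}} for an undirected weighted tree."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w
    return adj


def path_length(adj, start, goal):
    """Sum of edge weights on the unique path between two vertices of a tree."""
    stack = [(start, None, 0)]              # (vertex, where we came from, distance so far)
    while stack:
        node, parent, dist = stack.pop()
        if node == goal:
            return dist
        for nbr, w in adj[node].items():
            if nbr != parent:               # a tree has no cycles, so this suffices
                stack.append((nbr, node, dist + w))
    raise ValueError("vertices are not connected")


edges = [
    ("1", "a", 12), ("a", "b", 13), ("b", "c", 14),   # path 1 -> 4, weights from the slide
    ("c", "d", 17), ("d", "4", 12),
    ("a", "2", 5), ("b", "5", 9),                     # hypothetical extra leaves
    ("c", "3", 7), ("d", "6", 4),
]
tree = build_tree(edges)
print(path_length(tree, "1", "4"))
```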

Page 34: Fitting Distance Matrix

• Given n species, we can compute the n × n distance matrix $D_{ij}$.
• Evolution of these genes is described by a tree that we don’t know.
• We need an algorithm to construct a tree that best fits the distance matrix $D_{ij}$.
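
What "best fits" means is left open on the slide. One common choice, assumed here purely for illustration, is least squares: score a candidate tree T by the sum over leaf pairs of $(D_{ij} - d_{ij}(T))^2$ and prefer the tree with the smaller score. The matrices below are made-up numbers for four leaves, with the tree path lengths precomputed by hand.

```python
def squared_error(D, d_tree):
    """Sum of (D[i][j] - d_tree[i][j])^2 over all pairs i < j."""
    n = len(D)
    return sum(
        (D[i][j] - d_tree[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


# Hypothetical observed distances between four species.
D = [
    [0, 5, 7, 8],
    [5, 0, 8, 9],
    [7, 8, 0, 9],
    [8, 9, 9, 0],
]

# Path lengths d_ij(T) in two hypothetical candidate trees that share
# leaf edges of length 2, 3, 4, 5 and an internal edge of length 1.
tree_12_34 = [          # leaves 1,2 on one side, 3,4 on the other
    [0, 5, 7, 8],
    [5, 0, 8, 9],
    [7, 8, 0, 9],
    [8, 9, 9, 0],
]
tree_13_24 = [          # leaves 1,3 on one side, 2,4 on the other
    [0, 6, 6, 8],
    [6, 0, 8, 8],
    [6, 8, 0, 10],
    [8, 8, 10, 0],
]

print(squared_error(D, tree_12_34))   # perfect fit
print(squared_error(D, tree_13_24))   # worse fit
```

A score of zero means the tree realizes the matrix exactly; in practice distances estimated from data rarely fit any tree perfectly, so the score is used to rank candidates.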

Page 35: Summary

• Evolution and phylogeny
• Concepts of phylogenetics
• Applications of phylogenetics
• Categories of phylogenetic inference algorithms

Next lecture: detailed algorithms for phylogenetic inference

Page 36: Acknowledgement

Anonymous authors