Taking the Bite (Byte?) Out of Phylogeny Jennifer Galovich Lucy Kluckhohn Jones Holly Pinkart

Taking the Bite (Byte?) Out of Phylogeny

Jennifer Galovich

Lucy Kluckhohn Jones

Holly Pinkart

Introduction

• Goal is to produce an exercise that will engage allied health students and
  – Strengthen math skills and decrease math phobia
  – Decrease molecular data phobia
  – Increase bioinformatics literacy

Prerequisites

• The following will be presented to students prior to this project
  – Basic evolutionary concepts and use of 16S rRNA in determining relationships between prokaryotes
  – Introduction to Biology Workbench, BLAST and tree construction

Approach

• Use the theme of food poisoning to engage both nursing and nutrition student populations

• Utilize mathematics and bioinformatics tools

Approach

• Students will pick a week in which food poisoning is likely: Christmas, 4th of July, Thanksgiving, etc.

• Students will
  – identify a source of food poisoning (e.g., Salmonella), and check the Morbidity and Mortality Weekly Report tables for the number of cases in a specific state or region
  – calculate the proportion of cases represented by that region
  – answer “Is this number of cases unusual based on the data presented for this time period? How can you tell?”
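The proportion step can be sketched in a few lines of Python. The counts below are hypothetical placeholders, not values read from any MMWR table; students would substitute the numbers from the week and region they chose. The function name is also only illustrative.

```python
def proportion_of_cases(region_cases, total_cases):
    """Fraction of all reported cases that come from one region."""
    if total_cases <= 0:
        raise ValueError("total_cases must be positive")
    return region_cases / total_cases


# Hypothetical counts for one reporting week (placeholders only).
state_cases = 12
national_cases = 480

share = proportion_of_cases(state_cases, national_cases)
print(f"{share:.1%} of the reported cases came from the chosen state")
```

Repeating the calculation for the same week in other years, or for neighboring weeks, gives students something to compare against when they ask whether the number is unusual.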

Approach

• Students will then address the questions
  – “Without culturing the organism, how might you track it in humans or in a food supply?”
  – “What relationships (if any) exist between various strains of this organism?”
  – “Can this type of data be used to find the original strain?”

Approach

• Students will
  – obtain sequence data from NCBI’s GenBank for the organism (or virus) of interest
  – BLAST the sequence to find organisms with related sequences
  – collect 8-13 of the closest BLAST results to perform a global alignment, and construct a tree

Questions

Students choose a time period (week), search MMWR (Morbidity and Mortality Weekly Report) for the number of cases of a particular disease for a given week.

1. Given the chosen disease, how many cases of the disease occurred in a particular state (or other locale) during the week?

More Questions about the Scene

2a. How many persons are involved? Is there an index case?

2b. What percent of the population has the disease?

3. What other question might you ask from these data?

4. What microbe causes the disease? What strain, if appropriate?

Now What? (Questions about the microbe)

5. If you want to determine the specific strain of the microbe, can you find the genetic sequence?

6. How has the strain evolved?

7. What is its phylogeny, and what are the closest neighbors?

And Then. . . (Questions to Investigate)

8a. Why is the answer to the previous question of interest to you if you are a nurse, a dietician, a parent, the mayor, the hospital director, the first responder, a restaurant owner, a cruise ship director, a public health inspector, or other interested person (you choose)?

8b. What other questions are of interest to you in this role?

Finding the Microbe

• Search MMWR Morbidity Tables

http://www.cdc.gov/mmwr/distrnds.html

Choose a Week

http://wonder.cdc.gov/mmwr/mmwrmorb.asp

What Percent of the Residents are Sick?

http://wonder.cdc.gov/mmwr/mmwr_reps.asp?mmwr_year=2006&mmwr_week=01&mmwr_table=2F
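A minimal Python sketch of the percent-of-residents calculation follows. The case count and population are hypothetical placeholders, and the cases-per-100,000 helper is an added convenience (not part of the original exercise) for when the percentage is too small to read comfortably.

```python
def percent_sick(cases, population):
    """Percent of residents with a reported case."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * cases / population


def cases_per_100k(cases, population):
    """Same ratio expressed per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100_000 * cases / population


# Hypothetical numbers (placeholders, not MMWR or census data).
reported_cases = 50
state_population = 1_000_000

print(f"{percent_sick(reported_cases, state_population):.4f}% of residents")
print(f"{cases_per_100k(reported_cases, state_population):.1f} cases per 100,000")
```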

Find a Microbe

• Use your text, class notes, or other resources to determine the causative agent of the disease you have chosen.

• Choose a microbe, then find its family tree.

• For the Salmonellosis example, we have chosen Salmonella enterica, a microbe with many variants, called serovars.

Basics of Tree Construction

• Preliminary Exercises

• Goal
  – Students will practice with small examples before trying to construct a tree
  – Students will learn phylogenetics notation and terminology (also see Glossary at end)

Sequences → Alignment → Distance Matrix → Tree

From Sequences to Pairwise Alignment

The Needleman-Wunsch Method 

The Needleman-Wunsch Method

• We make a table of residue scores, S(i,j). The number S(i,j) is computed by comparing residue i in sequence (1) with residue j in sequence (2), using previously chosen values for matches and mismatches.  

• Each alignment matrix entry, H(i,j),  gives the score of the best alignment of the first i residues in sequence (1) with the first j residues of sequence (2)  

• We have one row for each residue in sequence (2) and one column for each residue in sequence (1).  To get started, we add a 0th row and a 0th column. 

• The upper left corner is position (0,0).

• We set H(0,0) = 0.

• The rest of the values in the top row are (reading across) -g, -2g, -3g, etc., where g is the gap penalty.

• Similarly, the rest of the values in the leftmost column are (reading down) -g, -2g, -3g, etc.

• To compute the value of H(i+1,j+1) we first consider the values north, west and northwest. We then find
  – S(i+1,j+1) + the value immediately northwest
  – (The value just north) – g
  – (The value just west) – g

Distance Matrix

• Then we choose the largest of these three numbers to be H(i+1,j+1) and draw an arrow from position (i+1,j+1) to the position that gave us the value of H(i+1,j+1).

• Example: Let match = 1, mismatch = -1 and g = 2. Consider the sequences (1) G A A T T C and (2) G G A T.

           G    A    A    T    T    C
      0   -2   -4   -6   -8  -10  -12
G    -2    1   -1
G    -4   -1
A    -6
T    -8

Try This Exercise (at home ok)

a. Complete the table and then follow the arrows to determine the alignment:
  – A diagonal arrow corresponds to aligning the two letters.
  – A horizontal arrow corresponds to aligning a letter from (1) with a gap, since the columns follow sequence (1).
  – A vertical arrow corresponds to aligning a letter from (2) with a gap, since the rows follow sequence (2).
  – (Note that if you have ties, you may have more than one arrow, and so more than one “best” alignment.)

b. Redo this exercise with your own choice of match, mismatch and gap values. Experiment with these values to obtain alignments different from the ones you got in part (a).
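For checking hand-filled tables, the Needleman-Wunsch procedure described above can be sketched in Python. This is a minimal illustration, not the code used by any particular tool. The function name and the tie-breaking order in the traceback (diagonal, then west, then north) are assumptions made for this sketch, so when ties occur it reports only one of the possible best alignments.

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=-1, gap=2):
    """Fill the table H and trace back one best global alignment.

    Rows follow seq2 and columns follow seq1, as in the slides.
    `gap` is the (positive) gap penalty g.
    """
    rows, cols = len(seq2) + 1, len(seq1) + 1
    H = [[0] * cols for _ in range(rows)]
    for c in range(1, cols):          # top row: -g, -2g, -3g, ...
        H[0][c] = -gap * c
    for r in range(1, rows):          # left column: -g, -2g, -3g, ...
        H[r][0] = -gap * r

    def score(r, c):
        return match if seq1[c - 1] == seq2[r - 1] else mismatch

    for r in range(1, rows):
        for c in range(1, cols):
            H[r][c] = max(H[r - 1][c - 1] + score(r, c),  # northwest
                          H[r - 1][c] - gap,              # north
                          H[r][c - 1] - gap)              # west

    # Follow the arrows from the lower right corner back to (0,0).
    top, bottom = [], []
    r, c = rows - 1, cols - 1
    while r > 0 or c > 0:
        if r > 0 and c > 0 and H[r][c] == H[r - 1][c - 1] + score(r, c):
            top.append(seq1[c - 1])       # diagonal: align two letters
            bottom.append(seq2[r - 1])
            r, c = r - 1, c - 1
        elif c > 0 and H[r][c] == H[r][c - 1] - gap:
            top.append(seq1[c - 1])       # horizontal: letter of seq1 vs gap
            bottom.append("-")
            c -= 1
        else:
            top.append("-")               # vertical: letter of seq2 vs gap
            bottom.append(seq2[r - 1])
            r -= 1
    return H, "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    table, aligned1, aligned2 = needleman_wunsch("GAATTC", "GGAT")
    for row in table:
        print(" ".join(f"{value:4d}" for value in row))
    print(aligned1)
    print(aligned2)
```

Changing the `match`, `mismatch` and `gap` arguments is a quick way to explore part (b) after working an example by hand.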

From Pairwise Alignment to Multiple Alignment 

• Idea of global progressive alignment: The most similar sequences are aligned first, in order of their similarity. A consensus is determined and then aligned to the next most similar sequence. The determination of “next most similar” is made using phylogenetic information (a guide tree).

From Alignment to Distance Matrix 

There are many different ways of computing the distance between pairs of sequences in a multiple alignment. Each uses different assumptions, which may or may not be reasonable for a given situation. For example, the simplest model, Jukes-Cantor, assumes that mutation occurs at a constant rate, and that each nucleotide is equally likely to mutate into any other nucleotide (at that rate). For protein sequences, the calculation is (even) more complicated.
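As an illustration of the Jukes-Cantor model, the standard corrected distance is d = -(3/4) ln(1 - (4/3) p), where p is the proportion of aligned sites at which the two sequences differ. The sketch below assumes the two sequences come from the same alignment (equal length, "-" for gaps) and simply skips any column containing a gap. That gap handling and the function name are choices made for this example, not something specified in the slides.

```python
import math


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned nucleotide sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption for this sketch: ignore columns where either row has a gap.
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    p = sum(a != b for a, b in columns) / len(columns)
    if p == 0:
        return 0.0
    if p >= 0.75:
        # The formula is undefined here; the sequences look saturated.
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


if __name__ == "__main__":
    # Toy sequences for illustration only.
    print(jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAA"))
```

The corrected distance is always at least as large as the raw proportion p, because it allows for sites that may have changed more than once.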

From distance matrix to tree

Again, there are many different methods available. Biology Workbench uses ClustalW to construct multiple alignments. Clustal uses the neighbor-joining method to find the guide tree. The final tree produced by Workbench is a compilation of these guide trees.

Clustering Methods 

• The UPGMA (Unweighted Pair Group Method with Arithmetic Mean) method

• + easy to describe; produces an ultrametric (and hence additive) tree

• - assumptions (molecular clock; all species evolve at the same rate)

• General idea:
  Step 1. Find the two closest taxa.
  Step 2. Treat the two closest as a new combined taxon, and make a new matrix, calculating distances from the combined taxon to the others using the average of all the pairwise distances involved.
  Iterate these two steps until the tree is completed.

Construct the UPGMA tree for the following distance matrix:

     A    B    C    D
A    0    9    7    5
B    9    0    8   10
C    7    8    0    8
D    5   10    8    0

Observe: A and D are closest.

Next, update the matrix:

       A/D    B      C
A/D    0      19/2   15/2
B             0      8
C                    0

Now the A/D cluster and C are closest.
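The two UPGMA steps can be sketched in Python to check hand calculations. This is a minimal illustration with a hypothetical function name: it records only the order of the merges and the average distance at each merge, and it does not compute branch lengths or draw the tree.

```python
from itertools import combinations


def upgma(names, dist):
    """Return the UPGMA merges as (cluster_1, cluster_2, distance) tuples."""
    # Each cluster is a tuple of the original taxon names it contains.
    clusters = [(name,) for name in names]
    # Distances between the original taxa, keyed by unordered pair.
    d = {frozenset((a, b)): dist[i][j]
         for (i, a), (j, b) in combinations(enumerate(names), 2)}

    def cluster_distance(c1, c2):
        # Average of all the pairwise distances between members.
        total = sum(d[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        # Step 1: find the two closest clusters.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        merges.append((c1, c2, cluster_distance(c1, c2)))
        # Step 2: treat them as one combined taxon.
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


if __name__ == "__main__":
    matrix = [[0, 9, 7, 5],
              [9, 0, 8, 10],
              [7, 8, 0, 8],
              [5, 10, 8, 0]]
    for c1, c2, distance in upgma(["A", "B", "C", "D"], matrix):
        print(c1, c2, distance)
```

Because the averages are taken over the original pairwise distances, larger clusters count in proportion to their size, which matches "the average of all the pairwise distances involved."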

Exercises

1. Finish constructing this tree.

2. The tree is ultrametric, but the data are not. (Why not?) How would the data have to be changed in order that they be ultrametric?

3. The tree is additive.  Are the data?

Now, redo questions 1 – 3 in case the BD distance is 12 instead of 10. 

Neighbor Joining  (NJ) 

• +  additive (but not ultrametric); computationally efficient

• - unrooted. Prior knowledge is needed to decide how to root the tree.

• Note: the species which are closest according to the distance matrix need NOT be neighbors. That’s why we need a modified distance formula.

• Exercise:  Draw a picture of a tree on four taxa that illustrates the problem described in the note above. 

Constructing a Neighbor Joining Tree

Step 1:  Find the two taxa which are closest using the modified distance formula below.  Join them.

• To find the modified distance from node i to node j:
  Let N be the number of taxa.
  Let R_i = sum of all the distances from node i to all others except node j, divided by N – 2.
  Let R_j = sum of all the distances from node j to all others except node i, divided by N – 2.
  Let D(i,j) = matrix distance.

Calculate modified distance, D*, from i to j as D*(i,j) = D(i,j) – R_i – R_j. For example, using the distance matrix we used earlier, N = 4, R_A = (7 + 5)/2 = 6 and R_B = (8 + 10)/2 = 9, so D*(A,B) = 9 – 6 – 9 = -6.
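The modified distance can be tabulated for every pair with a short Python sketch. It follows the definition exactly as stated in these slides, where the sums for R_i and R_j leave out the partner node. Note, as a caution, that the more commonly published neighbor-joining criterion includes D(i,j) in both sums, so the two versions may rank pairs differently on larger matrices. The function name is hypothetical.

```python
def modified_distances(names, D):
    """Return {(name_i, name_j): D*(i,j)} for every pair of taxa.

    Uses the slide's definition: R_i sums distances from i to all
    nodes except j, divided by N - 2 (and likewise for R_j).
    """
    n = len(names)
    if n < 3:
        raise ValueError("need at least three taxa")
    result = {}
    for i in range(n):
        for j in range(i + 1, n):
            others = [k for k in range(n) if k not in (i, j)]
            r_i = sum(D[i][k] for k in others) / (n - 2)
            r_j = sum(D[j][k] for k in others) / (n - 2)
            result[(names[i], names[j])] = D[i][j] - r_i - r_j
    return result


if __name__ == "__main__":
    matrix = [[0, 9, 7, 5],
              [9, 0, 8, 10],
              [7, 8, 0, 8],
              [5, 10, 8, 0]]
    table = modified_distances(["A", "B", "C", "D"], matrix)
    for pair, value in sorted(table.items(), key=lambda item: item[1]):
        print(pair, value)
```

The pair with the smallest value in the printed list is the one to join in Step 1.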

NJ (continued)

Step 2: Suppose that nodes i and j give the smallest value of D*. Start the tree by joining those nodes to a new node. Call the new node (ij). We now have two fewer taxa and one more internal node, for a net of one less node than we started with.

Step 3: Now, as in the UPGMA method, we make a new matrix showing the distances to all the nodes except i and j. Problem: the new internal node (ij) is not in the original matrix. 

[Figure: nodes i and j joined to the new node (ij)]

This problem can be solved

• Step 4: To update the matrix, you will need to compute the distance from the new internal node (ij) to the remaining nodes. For each remaining node k, compute the new distance as
  ½ [D(i,k) + D(j,k) – D(i,j)]

• Step 5: Apply steps 1 – 4 to the revised matrix.
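The matrix update in Step 4 can be sketched as a small Python helper. The function name and the convention of placing the new node (ij) in the first row and column are assumptions made for this example.

```python
def join_nodes(names, D, i, j):
    """Replace nodes i and j with a new node (ij); return (new_names, new_D).

    The distance from (ij) to each remaining node k is
    1/2 * [D(i,k) + D(j,k) - D(i,j)].
    """
    keep = [k for k in range(len(names)) if k not in (i, j)]
    new_names = [f"({names[i]}{names[j]})"] + [names[k] for k in keep]
    size = len(new_names)
    new_D = [[0.0] * size for _ in range(size)]
    for a, k in enumerate(keep, start=1):
        to_new = 0.5 * (D[i][k] + D[j][k] - D[i][j])
        new_D[0][a] = new_D[a][0] = to_new
        # Distances among the remaining nodes are copied unchanged.
        for b, m in enumerate(keep, start=1):
            new_D[a][b] = D[k][m]
    return new_names, new_D


if __name__ == "__main__":
    matrix = [[0, 9, 7, 5],
              [9, 0, 8, 10],
              [7, 8, 0, 8],
              [5, 10, 8, 0]]
    labels, reduced = join_nodes(["A", "B", "C", "D"], matrix, 0, 3)
    print(labels)
    for row in reduced:
        print(row)
```

Step 5 then amounts to recomputing the modified distances on the reduced matrix and calling the helper again until only two nodes remain.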

Exercises

• Practice the NJ method on the matrix we had earlier.

• Now try both methods using the matrix below. Why do you get different trees?

     A    B    C    D
A    0   17   21   27
B   17    0   12   18
C   21   12    0   14
D   27   18   14    0

Final Approach

• Use the theme of food poisoning to engage both nursing and nutrition student populations

• Utilize mathematics and bioinformatics tools

Find the Microbial Gene

• NCBI Search

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Nucleotide

Choose a Strain

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=Salmonella+enterica+16s+ribosomal+RNA+gene

BLAST

• Basic Local Alignment Search Tool

http://www.ncbi.nlm.nih.gov/BLAST/

Paste Sequence, BLAST off!

http://www.ncbi.nlm.nih.gov/BLAST/Blast.cgi?CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&ALIGNMENTS=50&ALIGNMENT_VIEW=Pairwise&CLIENT=web&DATABASE=nr&DESCRIPTIONS=100&ENTREZ_QUERY=%28none%29&EXPECT=10&FILTER=L&FORMAT_OBJECT=Alignment&FORMAT_TYPE=HTML&NCBI_GI=on&PAGE=Nucleotides&PROGRAM=blastn&SERVICE=plain&SET_DEFAULTS.x=34&SET_DEFAULTS.y=8&SHOW_OVERVIEW=on&END_OF_HTTPGET=Yes&SHOW_LINKOUT=yes&GET_SEQUENCE=yes

BLAST Results

BLAST Sequences http://www.ncbi.nlm.nih.gov/BLAST/Blast.cgi

GenBank

http://www.ncbi.nlm.nih.gov/entrez/viewr.fcgi?db=nucleotide&val=88604678

FASTA

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&qty=1&c_start=1&list_uids=88604678&dopt=fasta&dispmax=5&sendto=&from=begin&to=end&extrafeatpresent=1&ef_CDD=8&ef_MGC=16&ef_HPRD=32&ef_STS=64&ef_tRNA=128

Constructing a Tree

• Add sequences

• http://seqtool.sdsc.edu/CGI/BW.cgi#!

Clustal W

• Choose the Multiple Sequence Alignment

http://seqtool.sdsc.edu/CGI/BW.cgi#!

Choose a Tree Type

• Choose Rooted and/or Unrooted

• Submit

http://seqtool.sdsc.edu/CGI/BW.cgi#!

Voila!

• Unrooted Tree

http://seqtool.sdsc.edu/CGI/BW.cgi#!

Rooted Tree

• Which species are the most closely related?

http://seqtool.sdsc.edu/CGI/BW.cgi#!

Final Questions

• How are the data helpful if you are a
  – Parent?
  – Restaurant owner?
  – Hospital director?
  – Public health inspector?

Assessment

• Student Learning Outcomes
  – More comfortable with computation
  – Using the tools to answer questions
  – Empowerment (we hope!)

References -- Texts

• Emphasis on algorithms:
  – Neil C. Jones and Pavel A. Pevzner, An Introduction to Bioinformatics Algorithms
  – Michael S. Waterman, Introduction to Computational Biology

• Bio/Math Balanced:
  – Paul G. Higgs and Teresa K. Attwood, Bioinformatics and Molecular Evolution

• The Bible of Phylogenetics:
  – Joseph Felsenstein, Inferring Phylogenies

References -- Websites

• http://mbi.ohio-state.edu/2005/tutorials2005.html (tutorial on tree construction)

• http://bioalgorithms.info/courses.php (list of links to bioinformatics course websites)

• http://tree-thinking.org/ (resources for learning and teaching)

Glossary (for the faint of heart)

• Taxon (plural taxa) or operational taxonomic unit (OTU) – an entity (such as a species, protein sequence, language, etc.) whose distance from or similarity to other entities can be measured.

• Phylogeny – the evolutionary history of some collection of taxa, i.e., tracking lineages as the taxa change through time.

• Phylogenetic tree – a graphic representation of a phylogeny.

More Glossary

• Matrix – a rectangular array of data

• Graph – a collection of nodes (aka vertices) (usually represented by dots) and edges (connected pairs of vertices, usually represented by line segments)

Example:

Even More Glossary

• Connected graph -- In a connected graph, it is always possible to get from any node to any other node by following the edges. Here is an example of a graph that is not connected, since we can’t get from to

Glossary- are we there yet?

• Cycle -- a graph has a cycle if you can start at some node and, following the edges, get back to that node without backtracking. Here is a graph with a cycle marked in red.

Glossary – almost done

• Tree – a connected graph with no cycles

• Weighted tree – a tree whose edges are labelled to represent distances

• Additive tree – a weighted tree in which the distance between any two nodes is the sum of the edge lengths along the path joining them. So for three nodes A, B and C with B on the path from A to C, the distance from A to B plus the distance from B to C is the same as the distance from A to C.

• Degree of a node (or valence) – the number of edges attached to a node

• Rooted tree – a tree where some node has been specially designated. (Usually we interpret the root to be the ancestral taxon.)
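The graph definitions in this glossary can be made concrete with a short Python sketch. It relies on the standard fact that a connected graph on n nodes has no cycles exactly when it has n – 1 edges. The function names and the small example graph are illustrative only.

```python
def is_tree(nodes, edges):
    """Return True if the graph is connected and has no cycles."""
    nodes = list(nodes)
    if not nodes:
        return False
    # A connected graph on n nodes is a tree exactly when it has n - 1 edges.
    if len(edges) != len(nodes) - 1:
        return False
    neighbors = {node: set() for node in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Walk outward from one node and check that every node is reached.
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        current = stack.pop()
        for nxt in neighbors[current] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return len(seen) == len(nodes)


def degree(node, edges):
    """Number of edges attached to a node."""
    return sum(node in edge for edge in edges)


if __name__ == "__main__":
    # Leaves i, j and k attached to one internal node (ij).
    example_edges = [("i", "ij"), ("j", "ij"), ("ij", "k")]
    print(is_tree(["i", "j", "k", "ij"], example_edges))
    print(degree("ij", example_edges))
```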

The end of the Glossary

• Binary tree – if rooted the root has degree 2 and all others have degree 1 or 3.

• Internal nodes – nodes in a rooted tree of degree 3

• Leaves – nodes in any tree of degree 1.

• Ultrametric tree – a tree is ultrametric if it meets the three point condition. Any three leaves A, B and C determine three distances, AB, BC and AC. The three point condition says that the two largest of these three distances must be the same.
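The three point condition can be checked directly on a distance matrix between taxa, which is one way to approach the question of whether a data set is ultrametric. The sketch below is a minimal illustration. The function name, the tolerance argument and the second test matrix are assumptions made for this example.

```python
from itertools import combinations


def is_ultrametric(D, tol=1e-9):
    """Check the three point condition for every triple of taxa in D."""
    for a, b, c in combinations(range(len(D)), 3):
        smallest, middle, largest = sorted((D[a][b], D[b][c], D[a][c]))
        # The two largest of the three distances must be the same.
        if abs(largest - middle) > tol:
            return False
    return True


if __name__ == "__main__":
    upgma_example = [[0, 9, 7, 5],
                     [9, 0, 8, 10],
                     [7, 8, 0, 8],
                     [5, 10, 8, 0]]
    # Hypothetical matrix built to satisfy the condition.
    clock_like = [[0, 2, 6],
                  [2, 0, 6],
                  [6, 6, 0]]
    print(is_ultrametric(upgma_example))
    print(is_ultrametric(clock_like))
```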