Applied Bioinformatics
Week 8
Jens Allmer
Practice I
Topic
• Multiple Sequence Alignment Review
– Building an MSA
– Editing an MSA
• Dendrograms
• Phylogenetic Trees
Choosing Sequences
• How many?
– 10 – 15 (fewer than 50 would be good)
• Seqs should be >30% and <90% identical
• Prefer seqs of similar length
• Prefer seqs without internal repeats, or extract the repeats
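The identity window above (>30% and <90%) can be checked on pairs of already aligned sequences. The following is a minimal sketch, not a standard tool: the function names are hypothetical, and the gap handling (columns where both sequences have a gap are skipped, single gaps count as mismatches) is only one of several conventions in use.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned sequences of equal length.

    Assumption: columns where both sequences carry a gap are ignored;
    a gap facing a residue counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue  # shared gap column, not informative
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def in_useful_range(a: str, b: str, low: float = 30.0, high: float = 90.0) -> bool:
    """True if the pair falls inside the >30% and <90% identity window."""
    return low < percent_identity(a, b) < high
```

For example, `percent_identity("ACGT", "ACGA")` compares four columns, three of which match.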
Choosing Sequences
• While choosing your sequences give them good names
• Some sequences should be well annotated
Create an MSA
• This time use 20 – 50 sequences
– From different species
• Use ClustalW for alignment
• Most ClustalW servers display a dendrogram
• Confirm this by using a few of them
Gathering Sequences
• Download the sequences as a FASTA file as well
• Most programs will support this format
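FASTA is simple enough to read without any library: a header line starting with ">" is followed by one or more sequence lines. A minimal reader might look like the sketch below; the function name is hypothetical, and taking only the text up to the first whitespace as the sequence name is an assumption, not part of the format.

```python
def parse_fasta(text: str) -> dict:
    """Parse FASTA-formatted text into {name: sequence}.

    Assumption: the sequence name is the header text up to the first
    whitespace; the rest of the header (the description) is dropped.
    """
    parts = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            parts[name] = []
        elif name is not None:
            parts[name].append(line)  # sequence may span several lines
    return {n: "".join(chunks) for n, chunks in parts.items()}
```

Usage: `parse_fasta(open("my_sequences.fasta").read())`, where the file name is a placeholder for your own download.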
Output Formats
• Many different formats
– FASTA: widely supported
– PDF: only for printing / storing / sharing
– PIR: similar to FASTA
– MSF: common MSA format
– ALN: subset of MSF
Converting Formats
• http://bioweb.pasteur.fr/seqanal/interfaces/fmtseq.html
• Names (>…) no longer than 15 characters
• Different formats maintain different data
• Converting between formats can therefore lose data
• Make sure to have a master copy
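Cutting names to 15 characters by hand easily produces duplicates. One possible approach is sketched below; the helper name and the "_1", "_2" suffix scheme are assumptions for illustration. Keeping the returned mapping next to the master copy lets you restore the full names later.

```python
def shorten_names(names, max_len=15):
    """Map each original name to a unique name of at most max_len characters.

    Assumption: the original names are themselves unique. Collisions after
    truncation are resolved with a numeric suffix (_1, _2, ...).
    """
    mapping = {}
    used = set()
    for name in names:
        short = name[:max_len]
        counter = 1
        while short in used:
            suffix = f"_{counter}"
            short = name[:max_len - len(suffix)] + suffix
            counter += 1
        used.add(short)
        mapping[name] = short
    return mapping
```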
Editing Alignments
• http://www.jalview.org
• Start the program
• Choose File – Input Alignment – from Textbox
• Copy and paste the ClustalW alignment
Dendrogram
• Jalview also allows you to view different types of Dendrograms based on different similarity measures
• Use Jalview and compare the trees that are constructed based on the different measures
End Practice I
• 15 min break
Theory I
Phylogeny
• Sources
– Sequences
– Clades
– Organisms
• Why
– Understand evolution
– Strain diversity
– Epidemiology
– Gene prediction
Dendrogram
http://en.wikipedia.org/wiki/Dendrogram
Phylogenetic Tree
Tree Terminology
• All circled elements (e.g.: a) are called node(s)
• The connections between them are called edge(s) or branch(es)
• The first node that forms the tree is called root (here abcdef)
• Terminal nodes that have only one connection are called leaf(ves) (e.g.: a)
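The terminology maps directly onto a small data structure. The sketch below represents a rooted tree as nested tuples, with strings as leaves; the topology is hypothetical (only the leaf names a to f are taken from the slide), and the helper names are made up for illustration. The Newick text form it writes is the parenthesized format most tree programs read and write.

```python
# Hypothetical rooted tree over the leaves a..f; the outermost tuple is the root.
tree = ((("a", "b"), "c"), (("d", "e"), "f"))


def leaves(node):
    """Terminal nodes (leaves) below this node, left to right."""
    if isinstance(node, str):
        return [node]
    result = []
    for child in node:
        result.extend(leaves(child))
    return result


def count_nodes(node):
    """All nodes in the subtree: leaves plus internal nodes."""
    if isinstance(node, str):
        return 1
    return 1 + sum(count_nodes(child) for child in node)


def count_edges(node):
    """Every node except the root hangs on exactly one edge (branch)."""
    return count_nodes(node) - 1


def to_newick(node):
    """Topology only, without branch lengths."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(to_newick(child) for child in node) + ")"
```

Here `to_newick(tree) + ";"` gives `(((a,b),c),((d,e),f));`.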
Unrooted Trees (remove red root)
Branch Length
• Arbitrary
• Similarity
• Evolutionary Time
Tree types
• A dendrogram is a broad term for the diagrammatic representation of a phylogenetic tree.
• A cladogram is a tree formed using cladistic methods. This type of tree only represents a branching pattern, i.e., its branch lengths do not represent time.
• A phylogram is a phylogenetic tree that explicitly represents number of character changes through its branch lengths.
• A chronogram is a phylogenetic tree that explicitly represents evolutionary time through its branch lengths.
Sequences
• DNA
– Sensitive but quite divergent at longer distances
– Use for very closely related organisms
• cDNA
– Still sensitive but less divergent (e.g. no introns)
– Use for closely related families
• Protein
– Least sensitive but most useful for more distant relationships
– Use for distantly related species
• 16S rRNA
– Small-subunit rRNA exists in all organisms (16S in prokaryotes, 18S in eukaryotes)
– Highly conserved
Overall Process
• Get Sequences
• Construct MSA
• Compute pairwise distances (for some methods)
• Build Tree
– Topology
– Branch Lengths
• Estimate accuracy, reliability
– Build several different trees for that
• Visualize the tree
Computational Tree Formation
• Distance Methods
– Neighbor-Joining
– Least-Squares
– UPGMA
• Parsimony
– Least number of evolutionary steps
• Maximum Likelihood
– The tree under which the observed sequences are most probable, given a model of evolution, is constructed
Neighbor Joining
• Bottom-up clustering method
1. Create distance matrix
2. Join the closest pair of nodes
3. Repeat (1-2) until fully joined
http://en.wikipedia.org/wiki/Neighbor_joining
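The three steps can be sketched in plain Python. Note one refinement over the slide wording: standard neighbor joining does not join the pair with the smallest raw distance, but the pair that minimizes a rate-corrected criterion Q(i,j) = (n-2)·d(i,j) - r(i) - r(j), where r is the row sum. The function name, the input layout (a list of names plus a square matrix) and the Newick output are choices made for this sketch.

```python
def neighbor_joining(names, matrix):
    """Build an unrooted tree (Newick string) from a symmetric distance matrix.

    names  : list of unique leaf labels
    matrix : square list of lists, matrix[i][j] = distance between names[i], names[j]
    """
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least three sequences")
    # Working distance table; joined nodes are keyed by their Newick text.
    d = {a: {b: matrix[i][j] for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        total = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Step 2: pick the pair minimizing the Q criterion.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - total[a] - total[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the joined pair to their new parent node.
        len_a = 0.5 * d[a][b] + (total[a] - total[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        new = f"({a}:{len_a:g},{b}:{len_b:g})"
        # Step 1 again: distances from the new node to all remaining nodes.
        d[new] = {new: 0.0}
        for c in nodes:
            if c not in (a, b):
                dist = 0.5 * (d[a][c] + d[b][c] - d[a][b])
                d[new][c] = dist
                d[c][new] = dist
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    # Three nodes left: connect them to one central node.
    a, b, c = nodes
    len_a = 0.5 * (d[a][b] + d[a][c] - d[b][c])
    len_b = 0.5 * (d[a][b] + d[b][c] - d[a][c])
    len_c = 0.5 * (d[a][c] + d[b][c] - d[a][b])
    return f"({a}:{len_a:g},{b}:{len_b:g},{c}:{len_c:g});"


# Illustrative five-sequence distance matrix (made-up values).
example_names = ["a", "b", "c", "d", "e"]
example_matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(example_names, example_matrix))
```

The result is an unrooted tree; the trifurcation at the outermost level of the Newick string is how unrooted trees are usually written.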
Least Squares
• Standard approximation approach
– Minimizes the sum of the squared errors
• Example PGLS
– Phylogenetic Generalized Least Squares
– Needs additional data (traits)
http://www.dynamicgeometry.com/General_Resources/Advanced_Sketch_Gallery/Other_Explorations/Statistics_Collection/Least_Squares.html
UPGMA
• Unweighted Pair Group Method with Arithmetic Mean
– Agglomerative hierarchical clustering method
– Assumes constant rate of evolution
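As a rough illustration of UPGMA: the closest pair of clusters is merged repeatedly, and the distance from the merged cluster to every other cluster is the size-weighted arithmetic mean of the two old distances. Because a constant rate is assumed, each merge is placed at half the distance between the merged clusters. The function name and the Newick output are choices made for this sketch.

```python
def upgma(names, matrix):
    """Cluster with UPGMA and return a rooted Newick string with branch lengths.

    names  : list of unique leaf labels
    matrix : square list of lists of pairwise distances
    """
    # For each cluster: (number of leaves, height of its node above the leaves).
    info = {name: (1, 0.0) for name in names}
    d = {a: {b: float(matrix[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    clusters = list(names)
    while len(clusters) > 1:
        # Find the closest pair of clusters.
        pairs = [(x, y) for i, x in enumerate(clusters) for y in clusters[i + 1:]]
        a, b = min(pairs, key=lambda p: d[p[0]][p[1]])
        size_a, height_a = info[a]
        size_b, height_b = info[b]
        height = d[a][b] / 2  # constant-rate assumption
        new = f"({a}:{height - height_a:g},{b}:{height - height_b:g})"
        info[new] = (size_a + size_b, height)
        d[new] = {new: 0.0}
        for c in clusters:
            if c not in (a, b):
                # Arithmetic mean over all leaf pairs, via cluster sizes.
                avg = (size_a * d[a][c] + size_b * d[b][c]) / (size_a + size_b)
                d[new][c] = avg
                d[c][new] = avg
        clusters = [c for c in clusters if c not in (a, b)] + [new]
    return clusters[0] + ";"
```

With the made-up matrix `[[0, 2, 6, 10], [2, 0, 6, 10], [6, 6, 0, 10], [10, 10, 10, 0]]` for the names a, b, c, d, this returns `(d:5,(c:3,(a:1,b:1):2):2);`, a tree in which every leaf is the same distance from the root.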
Similarity Measures
• Sequence
– Number of different positions
– Weighted differences
• Substitution Matrices
– Pairwise alignments
• NW, SW, ..
• Additional measurements or knowledge
– Traits
• Parsimony
– Number of changes for tree paths
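For the parsimony measure, the minimum number of changes a given tree needs can be counted per alignment column with Fitch's algorithm. The sketch below assumes a strictly binary rooted tree written as nested tuples with leaf names as strings; the function names and the toy data are hypothetical.

```python
def fitch(node, states):
    """Fitch's algorithm for one alignment column.

    node   : leaf name (str) or a 2-tuple (left, right) of subtrees
    states : {leaf name: character observed in this column}
    Returns (possible states at this node, minimum changes in the subtree).
    """
    if isinstance(node, str):
        return {states[node]}, 0
    left, right = node
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed.
        return common, left_changes + right_changes
    # Children disagree: one change is required on this path.
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    """Sum of minimum changes over all columns of {name: aligned sequence}."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )
```

Comparing `parsimony_score` for two candidate topologies over the same alignment shows which one needs fewer changes; searching over all topologies is the expensive part and is not shown here.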
Tree Accuracy
• Bootstrapping
– Resample
– Recompute
– Do many times
– Compare results
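The resampling step works on alignment columns: each replicate draws as many columns as the original alignment has, with replacement. A minimal sketch follows; the function names and the fixed seed are assumptions made so the example is reproducible.

```python
import random


def bootstrap_alignment(alignment, rng):
    """One bootstrap replicate: columns drawn with replacement.

    alignment : {name: aligned sequence}, all sequences of equal length
    rng       : a random.Random instance
    """
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    # The same column draw is applied to every sequence, so columns stay intact.
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


def bootstrap_replicates(alignment, n, seed=0):
    """n replicate alignments, reproducible for a given seed."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n)]
```

Each replicate is then run through the same tree-building method, and the fraction of replicate trees that contain a given grouping is reported as its bootstrap support.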
http://www.sciencedirect.com/science/article/pii/S0191814107000156
http://goergen.deviantart.com/art/Magic-Forrest-Wallpaper-139108299
End Theory I
• Mindmap
• Break
Practice II
Where to get Trees
• Most servers that allow for MSA will also provide at least the guide tree which was used to construct the alignment
• If that’s all you are interested in, you don’t need to go any further
Edit your MSA
• Remove blocks consisting of mostly gaps (using Jalview)
• Remove N- and C-termini if not conserved well
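The gap-removal step can also be done programmatically. The sketch below drops every column whose gap fraction exceeds a threshold; the function name and the 50% default are assumptions, so pick a threshold that suits your alignment.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.5):
    """Remove alignment columns that consist mostly of gaps.

    alignment        : {name: aligned sequence}, all of equal length
    max_gap_fraction : columns with a larger share of '-' are removed
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    keep = []
    for i in range(length):
        gaps = sum(1 for name in names if alignment[name][i] == "-")
        if gaps / len(names) <= max_gap_fraction:
            keep.append(i)
    return {name: "".join(alignment[name][i] for i in keep) for name in names}
```

Trimming poorly conserved N- and C-termini is a judgment call best made by eye in Jalview; this helper only handles the gap criterion.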
Easy Tree
• www.ebi.ac.uk/clustalw/
• Paste your alignment
• Select a tree type
• Other options need to be set (see right)
• Press run
• Make a screen shot
• You can paste it where needed
Phylip (More elaborate tree)
• http://bioweb.pasteur.fr/seqanal/phylogeny/phylip-uk.html
• Choose protdist from the page
• Paste the MSA
• Bootstrapping e.g.:
Phylip
• Run the query
• Click further analysis
Click Run
Select full screen view
There is your tree
Ugly Tree
• Let’s face it, the tree is quite ugly
• http://iubio.bio.indiana.edu/treeapp/treeprint-form.html
• Select the consense.outtree from the previous website and paste it into the box
• Select submit to create the tree
• Play around with the formats and settings
Tree Topologies
Other Resources
• http://en.wikipedia.org/wiki/List_of_phylogenetics_software
• http://itol.embl.de/