Applied Bioinformatics Week 8 Jens Allmer

Applied Bioinformatics

Week 8

Jens Allmer

Practice I

Topic

• Multiple Sequence Alignment Review
  – Building an MSA
  – Editing an MSA

• Dendrograms

• Phylogenetic Trees

Choosing Sequences

• How many?
  – 10 to 15 (fewer than 50 would be good)

• Seqs should be >30% and <90% identical

• Prefer seqs of similar length

• Prefer seqs without internal repeats or extract them
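
The 30% to 90% identity window can be checked with a few lines of Python. This is a minimal sketch, assuming the two sequences are already aligned to the same length and that gap columns are ignored; identity is defined in several ways in practice, and the function names here are illustrative only.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Assumption: columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def in_useful_range(a: str, b: str, low: float = 30.0, high: float = 90.0) -> bool:
    """True if the pair is more than `low` and less than `high` percent identical."""
    return low < percent_identity(a, b) < high
```

Pairs that fall outside the window are candidates for removal before building the MSA.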

Choosing Sequences

• While choosing your sequences, give them short, meaningful names

• Some sequences should be well annotated

Create an MSA

• This time use 20 to 50 sequences
  – From different species

• Use ClustalW for alignment

• Most ClustalW servers display a dendrogram

• Confirm this by using a few of them

Gathering Sequences

• Download the sequences as a FASTA file as well

• Most programs will support this format
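
For reference, a FASTA file is a series of records, each with a header line starting with ">" followed by one or more sequence lines. The reader below is a minimal sketch, not a replacement for an established parser; the function name and the choice to reject duplicate names are assumptions.

```python
def parse_fasta(lines):
    """Read FASTA records from an iterable of lines into {name: sequence}."""
    records = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            name = line[1:].strip()  # header text after '>'
            if name in records:
                raise ValueError(f"duplicate record name: {name!r}")
            records[name] = []
        elif name is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            records[name].append(line)  # sequence may span several lines
    return {n: "".join(parts) for n, parts in records.items()}
```

It can be used on an open file, for example `parse_fasta(open("sequences.fasta"))`, where the file name is a placeholder.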

Output Formats

• Many different formats
  – FASTA: widely supported
  – PDF: only for printing/storing/sharing
  – PIR: similar to FASTA
  – MSF: common MSA format
  – ALN: subset of MSF

Converting Formats

• http://bioweb.pasteur.fr/seqanal/interfaces/fmtseq.html

• Names (>…) no longer than 15 characters

• Different formats maintain different data

• Converting between formats can lose data

• Make sure to have a master copy
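
One way to respect the 15-character limit without losing the original names is to shorten them and keep the mapping alongside the master copy. The sketch below assumes the input names are unique; the function name and the numeric-suffix scheme are illustrative choices, not part of any converter.

```python
def shorten_names(names, max_len=15):
    """Map each original name to a unique name of at most max_len characters."""
    mapping = {}
    used = set()
    for original in names:
        candidate = original[:max_len]
        counter = 1
        # on a clash, replace the tail with a numeric suffix such as "_1"
        while candidate in used:
            suffix = f"_{counter}"
            candidate = original[: max_len - len(suffix)] + suffix
            counter += 1
        used.add(candidate)
        mapping[original] = candidate
    return mapping
```

Saving the returned mapping makes it possible to restore the full names after conversion.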

Editing Alignments

• http://www.jalview.org

• Start the program

• Choose File – Input Alignment – from Textbox

• Copy and paste the ClustalW alignment

Dendrogram

• Jalview also allows you to view different types of Dendrograms based on different similarity measures

• Use Jalview and compare the trees that are constructed based on the different measures

End Practice I

• 15 min break

Theory I

Phylogeny

• Sources
  – Sequences
  – Clades
  – Organisms

• Why
  – Understand evolution
  – Strain diversity
  – Epidemiology
  – Gene prediction

Dendrogram

http://en.wikipedia.org/wiki/Dendrogram

Phylogenetic Tree

Tree Terminology

• All circled elements (e.g. a) are called nodes

• The connections between them are called edges or branches

• The first node that forms the tree is called the root (here abcdef)

• Terminal nodes that have only one connection are called leaves (e.g. a)

Unrooted Trees (remove red root)
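
The terminology maps directly onto a small data structure. In this sketch a leaf is a string and an internal node is a tuple of its children; the topology is an assumption chosen so that the root is abcdef, since the figure is not reproduced here.

```python
# Assumed example topology with leaves a..f; the outermost tuple is the root.
tree = ("a", (("b", "c"), (("d", "e"), "f")))


def leaves(node):
    """Terminal nodes below (and including) this node, left to right."""
    if isinstance(node, str):
        return [node]
    result = []
    for child in node:
        result.extend(leaves(child))
    return result


def count_nodes(node):
    """Number of nodes: leaves plus internal nodes."""
    if isinstance(node, str):
        return 1
    return 1 + sum(count_nodes(child) for child in node)


def count_edges(node):
    """Every node except the root hangs on exactly one edge (branch)."""
    return count_nodes(node) - 1


def label(node):
    """Name a node after the leaves it contains, e.g. 'abcdef' for the root."""
    return "".join(sorted(leaves(node)))
```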

Branch Length

• Arbitrary

• Similarity

• Evolutionary Time

Tree types

• A dendrogram is a broad term for the diagrammatic representation of a phylogenetic tree.

• A cladogram is a tree formed using cladistic methods. This type of tree only represents a branching pattern, i.e., its branch lengths do not represent time.

• A phylogram is a phylogenetic tree that explicitly represents number of character changes through its branch lengths.

• A chronogram is a phylogenetic tree that explicitly represents evolutionary time through its branch lengths.

Sequences

• DNA
  – Sensitive but quite divergent at longer distances
  – Use for very closely related organisms

• cDNA
  – Still sensitive but less divergent (e.g. introns are absent)
  – Use for closely related families

• Protein
  – Least sensitive but most useful for more distant relationships
  – Use for distantly related species

• 16S rRNA
  – Exists in all prokaryotes (eukaryotes carry the homologous 18S rRNA)
  – Highly conserved

Overall Process

• Get sequences

• Construct MSA

• Compute pairwise distances (for some methods)

• Build tree
  – Topology
  – Branch lengths

• Estimate accuracy, reliability
  – Build several different trees for that

• Visualize the tree

Computational Tree Formation

• Distance Methods
  – Neighbor-Joining
  – Least-Squares
  – UPGMA

• Parsimony
  – Least number of evolutionary steps

• Maximum Likelihood
  – The tree under which the observed sequences are most probable, given a model of evolution, is constructed

Neighbor Joining

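• Bottom-up clustering method
  1. Create distance matrix
  2. Join the pair of nodes that is closest after correcting each distance for how far both nodes are from all others (not simply the closest pair)
  3. Update the distances to the new node and repeat (1-2) until fully joined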

http://en.wikipedia.org/wiki/Neighbor_joining
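
The neighbor-joining steps can be sketched in Python as follows. The pair to join is the one minimizing Q(a, b) = (n - 2) d(a, b) - sum of d(a, k) - sum of d(b, k). The function name, the "node0", "node1", ... labels for internal nodes, and the edge-list output are assumptions made for illustration; ties are broken by first occurrence.

```python
def neighbor_joining(names, matrix):
    """Return the tree as a list of (node, neighbor, branch_length) edges.

    names: leaf labels (at least two); matrix: symmetric distances, matrix[i][j].
    """
    d = {a: {b: matrix[i][j] for j, b in enumerate(names) if i != j}
         for i, a in enumerate(names)}
    edges = []
    next_id = 0
    while len(d) > 2:
        n = len(d)
        total = {a: sum(d[a].values()) for a in d}
        nodes = list(d)
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - total[a] - total[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = f"node{next_id}"  # hypothetical label for the new internal node
        next_id += 1
        dab = d[a][b]
        # branch lengths from the joined pair to the new node
        len_a = 0.5 * dab + (total[a] - total[b]) / (2 * (n - 2))
        len_b = dab - len_a
        edges.append((u, a, len_a))
        edges.append((u, b, len_b))
        # distances from the new node to every remaining node
        new_row = {k: 0.5 * (d[a][k] + d[b][k] - dab) for k in d if k not in (a, b)}
        del d[a]
        del d[b]
        for k in d:
            del d[k][a]
            del d[k][b]
            d[k][u] = new_row[k]
        d[u] = new_row
    a, b = list(d)
    edges.append((a, b, d[a][b]))  # connect the last two nodes
    return edges
```

The result is an unrooted tree; a root has to be chosen separately, for example with an outgroup.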

Least Squares

• Standard approximation approach
  – Minimizes the sum of squared errors

• Example: PGLS (Phylogenetic Generalized Least Squares)
  – Needs additional data (traits)

http://www.dynamicgeometry.com/General_Resources/Advanced_Sketch_Gallery/Other_Explorations/Statistics_Collection/Least_Squares.html

UPGMA

• Unweighted Pair Group Method with Arithmetic Mean
  – Agglomerative hierarchical clustering method
  – Assumes constant rate of evolution
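
A minimal UPGMA sketch, assuming a symmetric distance matrix as input: the two closest clusters are merged repeatedly, and the distance from the merged cluster to any other cluster is the size-weighted average of the two old distances. The function name and the nested-tuple output are illustrative assumptions.

```python
def upgma(names, matrix):
    """Return (tree, merges): a nested-tuple tree and (left, right, height) per merge."""
    trees = {i: name for i, name in enumerate(names)}  # cluster id -> subtree
    sizes = {i: 1 for i in trees}                      # cluster id -> number of leaves
    d = {i: {j: matrix[i][j] for j in trees if j != i} for i in trees}
    merges = []
    next_id = len(names)
    while len(d) > 1:
        # closest pair of clusters (ties broken by first occurrence)
        a, b = min(((i, j) for i in d for j in d[i] if i < j),
                   key=lambda p: d[p[0]][p[1]])
        height = d[a][b] / 2.0  # constant rate: both subtrees end at the same height
        merges.append((trees[a], trees[b], height))
        new_row = {k: (sizes[a] * d[a][k] + sizes[b] * d[b][k]) / (sizes[a] + sizes[b])
                   for k in d if k not in (a, b)}
        del d[a]
        del d[b]
        for k in d:
            del d[k][a]
            del d[k][b]
            d[k][next_id] = new_row[k]
        d[next_id] = new_row
        trees[next_id] = (trees[a], trees[b])
        sizes[next_id] = sizes[a] + sizes[b]
        next_id += 1
    (root_id,) = d
    return trees[root_id], merges
```

Because every leaf ends up at the same distance from the root, the result is only reliable when the constant-rate assumption roughly holds.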

Similarity Measures

• Sequence
  – Number of different positions
  – Weighted differences
    • Substitution matrices
  – Pairwise alignments
    • Needleman-Wunsch (NW), Smith-Waterman (SW), ...

• Additional measurements or knowledge
  – Traits

• Parsimony
  – Number of changes for tree paths
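
The difference between counting positions and weighting them can be shown on aligned DNA. In this sketch transitions (A/G or C/T) cost less than transversions; the weights 1.0 and 2.0 are made-up illustration values, not taken from any published substitution matrix, and columns containing gaps or other symbols are skipped.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def mismatch_cost(x, y, transition=1.0, transversion=2.0):
    """Cost of aligning base x with base y under hypothetical weights."""
    if x == y:
        return 0.0
    same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return transition if same_class else transversion


def weighted_distance(a, b, transition=1.0, transversion=2.0):
    """Average mismatch cost per compared column of two aligned DNA sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    cost = 0.0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguous symbols
        compared += 1
        cost += mismatch_cost(x, y, transition, transversion)
    return cost / compared if compared else 0.0
```

Setting both weights to 1.0 gives the plain fraction of differing positions.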

Tree Accuracy

• Bootstrapping
  – Resample
  – Recompute
  – Do many times
  – Compare results

http://www.sciencedirect.com/science/article/pii/S0191814107000156
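
The four bootstrapping steps can be sketched as follows. Alignment columns are resampled with replacement, the tree is rebuilt on each replicate, and the support of a clade is the fraction of replicates in which it reappears. `build_clades` is a hypothetical user-supplied function that turns an alignment into a set of clades (each a frozenset of sequence names); it is not a library call.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample the columns of {name: aligned_sequence} with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}


def clade_support(alignment, build_clades, replicates=100, seed=0):
    """Fraction of bootstrap replicates that recover each clade of the original tree."""
    rng = random.Random(seed)
    reference = build_clades(alignment)       # clades of the tree on the real data
    counts = {clade: 0 for clade in reference}
    for _ in range(replicates):
        sample = bootstrap_alignment(alignment, rng)  # resample
        found = build_clades(sample)                  # recompute
        for clade in reference:                       # compare
            if clade in found:
                counts[clade] += 1
    return {clade: n / replicates for clade, n in counts.items()}
```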

http://goergen.deviantart.com/art/Magic-Forrest-Wallpaper-139108299

End Theory I

• Mindmap

• Break

Practice II

Where to get Trees

• Most servers that allow for MSA will also provide at least the guide tree which was used to construct the alignment

• If that’s all you are interested in you don’t need to go any further

Edit your MSA

• Remove blocks consisting of mostly gaps (using Jalview)

• Remove N- and C-termini if not well conserved
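
Removing gap-rich columns can also be done programmatically. This is a minimal sketch, assuming "mostly gaps" means more than half of the sequences have a gap in that column; the 0.5 threshold and the function name are assumptions, and it works column by column rather than on whole blocks.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.5):
    """Remove columns of {name: aligned_sequence} whose gap fraction exceeds the threshold."""
    names = list(alignment)
    seqs = [alignment[n] for n in names]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same aligned length")
    keep = [c for c in range(length)
            if sum(s[c] == "-" for s in seqs) / len(seqs) <= max_gap_fraction]
    return {n: "".join(s[c] for c in keep) for n, s in zip(names, seqs)}
```

Keep the unedited alignment as a master copy, since the removed columns cannot be recovered from the result.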

Easy Tree

• www.ebi.ac.uk/clustalw/

• Paste your alignment

• Select a tree type

• Other options need to be set (see right)

• Press run

• Make a screenshot

• You can paste it where needed

Phylip (More elaborate tree)

• http://bioweb.pasteur.fr/seqanal/phylogeny/phylip-uk.html

• Choose protdist from the page

• Paste the MSA

• Bootstrapping e.g.:

Phylip

• Run the query

• Click further analysis

Click Run

Select full screen view

There is your tree

Ugly Tree

• Let’s face it, the tree is quite ugly

• http://iubio.bio.indiana.edu/treeapp/treeprint-form.html

• Select the consense.outtree from the previous website and paste it into the box

• Select submit to create the tree

• Play around with the formats and settings

Tree Topologies

Other Resources

• http://en.wikipedia.org/wiki/List_of_phylogenetics_software

• http://itol.embl.de/