Bioinformatics

1. Seyed Mohammad Motevalli, December 2013

2. Outline
Introduction to bioinformatics
Biological databases
Sequence alignment and their algorithms
Structural prediction
Web-based tools
Stand-alone software

3. Introduction to bioinformatics
What is bioinformatics?
Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science.

4. Introduction to bioinformatics
What are the differences between bioinformatics and informatics?
What are the differences between bioinformatics and computational biology?
What is an algorithm?

5. What is proteomics?

6. Biological databases
Database: A computerized archive used to store and organize data in such a way that information can be retrieved easily via a variety of search criteria.
Entry: Each record in the database; it contains a number of fields that hold the actual data items.
Value: A particular piece of information held in a field.
Making a query: To retrieve a particular record from the database, a user specifies a value to be found in a particular field, and the computer retrieves the whole data record.

7. Biological databases
Primary databases
GenBank (NCBI): www.ncbi.nlm.nih.gov
EMBL: www.ebi.ac.uk/embl/index.html
DDBJ: www.ddbj.nig.ac.jp
Secondary databases
ExPASy: http://web.expasy.org
PIR: http://pir.georgetown.edu/pirwww/pirhome3.shtml
SWISS-Prot: www.ebi.ac.uk/swissprot/access.html

8. Biological databases
Interconnection between biological databases

9. Biological databases
Pitfalls of biological databases
The causes of redundancy include repeated submission of identical or overlapping sequences by the same or different authors, revision of annotations, and dumping of expressed sequence tag (EST) data.
Redundant sequences
Non-redundant sequences (RefSeq)

10. Biological databases
Further databases
NCBI: www.ncbi.nlm.nih.gov
UniProt: http://www.uniprot.org
ExPASy: http://web.expasy.org
PIR: http://pir.georgetown.edu/
SWISS-Prot: http://swissmodel.expasy.org/
PDB: http://www.rcsb.org/pdb/home/home.do
Enzyme structure: http://www.ebi.ac.uk/thornton-srv/databases/enzymes

11. Biological databases
NCBI: www.ncbi.nlm.nih.gov

12. Biological databases
UniProt: http://www.uniprot.org

13. Biological databases
ExPASy: http://web.expasy.org

14. Biological databases
PIR: http://pir.georgetown.edu/

15. Biological databases
SWISS-Prot: http://swissmodel.expasy.org/

16. Biological databases
PDB: http://www.rcsb.org/pdb/home/home.do

17. Biological databases
Enzyme structure: http://www.ebi.ac.uk/thornton-srv/databases/enzymes

18. Sequence alignment and their algorithms
Pairwise sequence alignment
Pairwise sequence alignment is the process of aligning two sequences and is the basis of database similarity searching and multiple sequence alignment.
Sequence similarity versus sequence homology
When two sequences are descended from a common evolutionary origin, they are said to have a homologous relationship or to share homology. A related but different term is sequence similarity, which is the percentage of aligned residues that are similar in physicochemical properties such as size, charge, and hydrophobicity.
Sequence similarity versus sequence identity
In a protein sequence alignment, sequence identity refers to the percentage of matches of the same amino acid residues between two aligned sequences. Similarity refers to the percentage of aligned residues that have similar physicochemical characteristics and can be more readily substituted for each other.

19. Sequence alignment and their algorithms
Sequence alignment strategies
Global alignment: In global alignment, the two sequences to be aligned are assumed to be generally similar over their entire length.
Alignment is carried out from the beginning to the end of both sequences to find the best possible alignment across their entire length.
Local alignment: Local alignment does not assume that the two sequences in question are similar over their entire length. It only finds the local regions with the highest level of similarity between the two sequences and aligns these regions without regard for the alignment of the rest of the sequences.

20. Sequence alignment and their algorithms

21. Sequence alignment and their algorithms
Linear gap penalty: The cost of creating a gap and the cost of extending it are the same: $W(I) = gI$, where $g$ is the cost of each gap position and $I$ is the length of the gap.
Affine gap penalty: Creation and extension have different costs: $W(I) = g_{\text{open}} + g_{\text{ext}}(I - 1)$, where opening a gap costs more than extending it ($g_{\text{open}} > g_{\text{ext}}$ when both are expressed as positive costs).

22. Sequence alignment and their algorithms
Alignment algorithms and methods
The dot matrix method
The word method
The dynamic programming method

23. Sequence alignment and their algorithms
Alignment algorithms
The dot matrix method: The most basic sequence alignment method is the dot matrix method, also known as the dot plot method.

24. Sequence alignment and their algorithms
Alignment algorithms
The word method: It works by finding short stretches of identical or nearly identical letters in two sequences. These short strings of characters are called words, which are similar to the windows used in the dot matrix method.

25. Sequence alignment and their algorithms
Alignment algorithms
The word method

26. Sequence alignment and their algorithms
Alignment algorithms
The dynamic programming method: Dynamic programming is a method that determines the optimal alignment by matching the two sequences for all possible pairs of characters between them.

27. Sequence alignment and their algorithms
Alignment algorithms
The dynamic programming method
Global alignment: The classical global pairwise alignment algorithm using dynamic programming is the Needleman-Wunsch algorithm.
In this algorithm, an optimal alignment is obtained over the entire lengths of the two sequences.
Local alignment: The first application of dynamic programming in local alignment is the Smith-Waterman algorithm. In this algorithm, positive scores are assigned for matching residues, and any cell of the scoring matrix that would become negative is set to zero, so no negative scores appear in the matrix.

28. Sequence alignment and their algorithms
Substitution matrix
PAM matrices (point accepted mutation): The PAM matrices were derived from the evolutionary divergence between sequences of the same cluster. One PAM unit corresponds to 1% of the amino acid positions having changed. Because very closely related homologs were used, the observed mutations were not expected to significantly change the common function of the proteins.

29. Sequence alignment and their algorithms
Substitution matrix
PAM matrices (point accepted mutation)

30. Sequence alignment and their algorithms
Substitution matrix
BLOSUM matrices: This is the series of blocks amino acid substitution matrices (BLOSUM), all of which are derived from direct observation of every possible amino acid substitution in multiple sequence alignments.

31. Sequence alignment and their algorithms
Substitution matrix
BLOSUM matrices

32. Sequence alignment and their algorithms
What matrices should be used and when?
Matrix      Best use                                                Similarity (%)
PAM40       Short alignments that are highly similar                70-90
PAM160      Detecting members of a protein family                   50-60
PAM250      Longer alignments of more divergent sequences           approx. 30
BLOSUM90    Short alignments that are highly similar                70-90
BLOSUM80    Detecting members of a protein family                   50-60
BLOSUM62    Most effective in finding all potential similarities    30-40
BLOSUM30    Longer alignments of more
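
The dynamic programming methods on slides 26 and 27 can be illustrated with a minimal Python sketch. It computes only the optimal score, not the traceback, for a global (Needleman-Wunsch style) and a local (Smith-Waterman style) alignment. The function names and the scoring values (match = 1, mismatch = -1, linear gap = -2) are assumptions chosen for illustration and do not come from the slides; real tools would use a substitution matrix such as PAM or BLOSUM instead.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score with a linear gap penalty (illustrative values)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    # The global score is read from the bottom-right cell.
    return score[-1][-1]


def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-2):
    """Local alignment score: cells are floored at zero (illustrative values)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,                           # start a new local alignment here
                score[i - 1][j - 1] + pair,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            best = max(best, score[i][j])
    # The local score is the highest cell anywhere in the matrix.
    return best


if __name__ == "__main__":
    print(needleman_wunsch_score("GATTACA", "GATTACA"))
    print(smith_waterman_score("TTTGATTACATTT", "CCGATTACACC"))
```

The only differences between the two functions are the initialization of the first row and column, the zero floor, and where the final score is read, which mirrors the global versus local distinction on slide 19.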

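The two gap penalty schemes on slide 21 can be written as small Python functions. This is a sketch under stated assumptions: the function names are hypothetical, penalties are treated as positive costs, and a gap of length zero is assumed to cost nothing.

```python
def linear_gap_penalty(length, g):
    """W(I) = g * I: every gap position costs the same."""
    if length < 0:
        raise ValueError("gap length must be non-negative")
    return g * length


def affine_gap_penalty(length, g_open, g_ext):
    """W(I) = g_open + g_ext * (I - 1): opening costs more than extending."""
    if length < 0:
        raise ValueError("gap length must be non-negative")
    if length == 0:
        return 0  # assumption: no gap, no cost
    return g_open + g_ext * (length - 1)


if __name__ == "__main__":
    # Example costs (assumed values): g = 2, g_open = 10, g_ext = 1.
    for gap_length in (1, 3, 10):
        print(gap_length,
              linear_gap_penalty(gap_length, 2),
              affine_gap_penalty(gap_length, 10, 1))
```

With these assumed costs, the affine scheme penalizes one long gap less than several short gaps of the same total length, which is the usual motivation for separating opening and extension costs.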