Computational Biology BLS 303


Principles of Computational Biology: Sequence Databases

- Nucleic acids (DNA/RNA)

- Proteins

• Biological information, and the programs used to work with it, are kept on websites.

• On many sites you will also find user guides, tutorials, and help files.

• However, be aware that websites evolve at much faster rates than DNA, proteins and organisms...

• Web addresses, or URLs (Uniform Resource Locators), may change without notification, and useful new sites emerge daily.

Why then computers?

- Nucleotides (DNA/RNA)

- Proteins

Before using a sequence file in a sequence analysis program, it is important to ensure that the file contains only sequence characters and none of the special characters inserted by text editors. Editors usually provide a way to save files with only standard ASCII characters, and these are the files that will be suitable for most sequence analysis programs. Most sequence analysis programs require not only that a DNA or protein sequence file be a standard ASCII file but also that the file be in a particular format, such as the FASTA format.
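The check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `find_unexpected_characters` and the allowed alphabet (ASCII letters plus "*" and "-") are choices made here for the example, not requirements of any particular analysis program, and header lines beginning with ">" are only required to be printable ASCII.

```python
import string

# Assumption for this sketch: sequence lines may contain ASCII letters,
# "*" (end-of-sequence marker) and "-" (gap symbol), and nothing else.
ALLOWED_SEQUENCE_CHARS = set(string.ascii_letters) | {"*", "-"}


def find_unexpected_characters(text):
    """Return (line_number, column, character) for every suspicious character.

    Lines starting with ">" are treated as comment lines and only need to be
    printable ASCII; all other lines must use the allowed sequence alphabet.
    """
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        is_header = line.startswith(">")
        for col, ch in enumerate(line, start=1):
            if is_header:
                # Printable ASCII runs from code 32 (space) to 126 (~).
                ok = 32 <= ord(ch) < 127
            else:
                ok = ch in ALLOWED_SEQUENCE_CHARS
            if not ok:
                problems.append((line_no, col, ch))
    return problems


if __name__ == "__main__":
    # A non-breaking space (U+00A0) pasted from a word processor.
    sample = ">seq1 hypothetical example\nAC\u00a0GT\n"
    for line_no, col, ch in find_unexpected_characters(sample):
        print(f"line {line_no}, column {col}: {ch!r}")
```

An empty result suggests the file is plain ASCII sequence text; any reported position points to a character that an editor may have inserted.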

FASTA format

• Includes three parts:

- a comment line identified by “>” followed by the name and origin of the sequence

- the sequence in the standard one-letter code

- an optional “*” to mark the end of the sequence
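The three parts of a FASTA record can be illustrated with a small parser. This is a minimal sketch, not a full implementation: the names `parse_fasta` and `_join_sequence` and the example records are hypothetical, and the code assumes that every record starts with a ">" line and that any "*" appears only at the end of a sequence.

```python
def _join_sequence(chunks):
    """Join sequence lines and drop the optional trailing "*" marker."""
    sequence = "".join(chunks)
    return sequence[:-1] if sequence.endswith("*") else sequence


def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (comment, sequence) tuples."""
    records = []
    comment = None  # comment line of the record being read
    chunks = []     # sequence lines collected for that record
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new comment line closes the previous record, if any.
            if comment is not None:
                records.append((comment, _join_sequence(chunks)))
            comment = line[1:].strip()
            chunks = []
        elif comment is not None:
            chunks.append(line)
    if comment is not None:
        records.append((comment, _join_sequence(chunks)))
    return records


if __name__ == "__main__":
    example = ">demo1 hypothetical example record\nATGGCC\nTTAA*\n>demo2\nMKV\n"
    for comment, sequence in parse_fasta(example):
        print(comment, sequence)
```

Sequence lines are joined before the "*" is removed, so a sequence wrapped over several lines is returned as one string.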

Terms used to Search for current internet addresses

• ACEDB – database management system for genetic information of an organism

• FASTA and BLAST – tools for fast searches of databases for similar sequences.

• CLUSTALW and T-COFFEE – examples of multiple sequence alignment programs.

• DDBJ – DNA Data Bank of Japan.

• EBI – European Bioinformatics Institute.

• ENSEMBL – the genome server at the European Bioinformatics Institute.

• ENTREZ – search engine for GenBank and PubMed.

• MIPS – Munich Information Center for Protein Sequences.

• NCBI – National Center for Biotechnology Information, home of GenBank.

• RDP – Ribosomal Database Project, a ribosomal RNA database.

• TIGR – The Institute for Genomic Research.

• PROSITE – database of conserved patterns in proteins related to activity.
