Bioinformatics and Computational Biology


• Bioinformatics: the collection and storage of biological information; it derives knowledge from computer analysis of biological data

• Computational biology: the development of algorithms and statistical models to analyze biological data

Why is bioinformatics critical?

Few people are adequately trained in both biology and computer science

Genome sequencing, microarrays, etc. produce large amounts of data to be analyzed

Leads to important discoveries

Saves time and money

Why is the relationship between computer science and biology essential?

Three main reasons:

First, massive amounts of data have to be stored, analyzed and made accessible.

Second, the nature of the data is often such that a computational or statistical method is necessary. This applies in particular to the information, encoded in the DNA, on the building plans of proteins and on the spatial organization of their expression in the cell.

Third, there is a strong analogy between the DNA sequence and a computer program.

Key Areas/Scope of Bioinformatics

1. Organizing biological knowledge in databases

2. Analysing sequence data

3. Structural Bioinformatics

4. Pharmacological relevance (Population genetics)

1. Organizing biological knowledge in databases

Organized DNA sequences: GenBank (NCBI), EMBL

Protein sequence databanks, with structural and functional characteristics. For example, SWISS-PROT contains verified protein sequences and additional annotations describing the function of each protein

Literature databases: PubMed, MEDLINE

2. Analysing sequence data

• Establish the correct order of sequence contigs

• Find the translation and transcription initiation sites and promoter sites, and define open reading frames (ORFs)

• Find splice sites, introns and exons

• Translate the DNA sequence into a protein sequence

• Compare the DNA sequence to known protein sequences in order to verify exons etc. with homologous sequences

• Build multiple sequence alignments

• Study evolutionary aspects by constructing phylogenetic trees

• Determine active site residues and residues specific to subfamilies

• Predict protein-protein interactions

• Analyse single nucleotide polymorphisms (SNPs) to hunt for genetic sources of diseases
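Two of the sequence-analysis tasks above, defining open reading frames and translating DNA into protein, can be sketched in a few lines of Python. This is a minimal illustration, not a gene finder: it assumes the standard genetic code, scans only the forward strand, and treats every ATG followed by an in-frame stop codon as an ORF. The names `translate` and `find_orfs` and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def translate(dna):
    """Translate a DNA string codon by codon; '*' marks a stop, 'X' an unknown codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[k:k + 3], "X") for k in range(0, len(dna) - 2, 3)
    )


def find_orfs(dna):
    """Return (start, end, protein) for each forward-strand ORF.

    start is the 0-based index of the ATG; end is the index just past
    the stop codon. Simplification: an ATG with no in-frame stop is ignored.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                for j in range(i, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOP_CODONS:
                        orfs.append((i, j + 3, translate(dna[i:j])))
                        i = j  # resume scanning after this stop codon
                        break
            i += 3
    return orfs


if __name__ == "__main__":
    # Hypothetical sequence: ATG AAA TTT TAG starting at index 2.
    print(find_orfs("GGATGAAATTTTAGCC"))
```

A real pipeline would also scan the reverse complement, apply a minimum ORF length, and account for introns in eukaryotic genes.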

3. Structural Bioinformatics

This branch of bioinformatics is concerned with computational approaches to predict and analyse the spatial structure of proteins and nucleic acids.

Starting from a multiple sequence alignment, secondary structure and 3D structure can be predicted, with an accuracy above 70 %.

4. Pharmacological relevance

Drug targets in infectious organisms can be revealed by whole-genome comparisons of infectious and non-infectious organisms.

The analysis of single nucleotide polymorphisms reveals genes potentially responsible for genetic diseases.

Prediction and analysis of protein 3D structure is used to develop drugs and understand drug resistance.

Patient databases with genetic profiles, e.g. for cardiovascular diseases, diabetes, cancer, etc., may play an important role in the future for individual health care, by integrating a personal genetic profile (population genetics) into diagnosis.
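The SNP analysis mentioned above starts from a simple comparison: which positions differ between a reference sequence and a sample? The sketch below assumes the two sequences are already aligned and of equal length, with no insertions or deletions; the function name `find_snps` and the example sequences are hypothetical.

```python
def find_snps(reference, sample):
    """List single-base differences between two pre-aligned DNA sequences.

    Returns (position, reference_base, sample_base) tuples with 1-based
    positions. Assumes equal length and no gaps; ambiguous bases such as
    'N' are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    pairs = zip(reference.upper(), sample.upper())
    for position, (ref_base, alt_base) in enumerate(pairs, start=1):
        if ref_base != alt_base and ref_base in "ACGT" and alt_base in "ACGT":
            snps.append((position, ref_base, alt_base))
    return snps


if __name__ == "__main__":
    # Hypothetical sequences differing at position 5 (A -> T).
    print(find_snps("ACGTACGT", "ACGTTCGT"))
```

Linking such differences to disease requires comparing many individuals and applying statistical tests, which this sketch does not attempt.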

Genomic Browsers

National Center for Biotechnology Information (NCBI) (http://ncbi.nlm.nih.gov)

Ensembl Genome Browser (http://www.ensembl.org)

UCSC Genome Browser (http://genome.ucsc.edu/)

WormBase (http://www.wormbase.org/)

AceDB (http://www.acedb.org/)

FlyBase (http://flybase.bio.indiana.edu/)



Protein databases

• SWISS-PROT/TrEMBL: curated protein sequences http://www.expasy.ch/sprot

• InterPro: protein families and domains http://www.ebi.ac.uk/interpro

• EXProt: proteins with experimentally verified functions http://www.cmbi.nl/exprot

• Protein Information Resource (PIR) http://pir.georgetown.edu/

NCBI


NCBI text search of a protein

Abstract finding by NCBI

Nucleotide search of a typical gene


FASTA format

FASTA is a text-based format for representing either nucleic acid sequences or protein sequences, in which nucleotides or amino acids are represented using single-letter codes. Each record starts with a header line beginning with ">", followed by one or more lines of sequence.
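To make the format concrete, here is a minimal FASTA reader in plain Python. It is a sketch under simple assumptions: each record has one ">" header line, sequence lines are concatenated, and blank lines are ignored. The function name `parse_fasta` and the two example records are hypothetical, not real database entries.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples.

    The header is returned without the leading '>'. Sequence lines that
    belong to one record are joined into a single string.
    """
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


EXAMPLE = """>seq1 hypothetical DNA record
ATGGCT
TGA
>seq2 hypothetical protein record
MKF
"""

if __name__ == "__main__":
    for name, sequence in parse_fasta(EXAMPLE):
        print(name, len(sequence), sequence)
```

For real work, an established parser such as the one in Biopython is usually a better choice than a hand-written reader.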
