
TEMPLATE DESIGN © 2008

www.PosterPresentations.com

BIOINFORMATICS REFERENCES
Your name and the names of the people who have contributed to this presentation go here.

The names and addresses of the associated institutions go here.

BIOLOGICAL SEQUENCE
Bioinformatics is about searching biological databases, comparing sequences, looking at protein structures, and (more generally) asking biological and biomedical questions with a computer. It is the computational branch of molecular biology.

ANALYZING PROTEIN SEQUENCES
Eating steak familiarizes you with protein, but proteins are also found in fish and vegetables. All proteins are made up of the same basic building blocks, known as amino acids. Amino acids are organic molecules made of carbon, hydrogen, oxygen, and nitrogen atoms; some also contain sulfur.

PROTEINS
Proteins are like small machines in the cell: they carry out most of the work in a cell. Proteins are synthesized from RNA sequences.

AMINO ACIDS
Proteins are made of 20 amino acids. Each amino acid is a small molecule made up of fewer than 100 atoms. The 20 amino acids have similar ends (an amino group and a carboxyl group), so they can be chained to one another like Lego bricks.
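The 20-letter amino acid alphabet can be checked programmatically. A minimal Python sketch, assuming sequences are written with the standard one-letter codes (ambiguity codes such as X or B are rejected here by choice); the function names and the example peptide are hypothetical:

```python
from collections import Counter

# The 20 standard amino acids, one letter each.
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def is_protein_sequence(seq: str) -> bool:
    """True if seq is non-empty and uses only the 20 standard one-letter codes."""
    return bool(seq) and set(seq.upper()) <= AMINO_ACIDS


def residue_counts(seq: str) -> Counter:
    """Count each amino acid after checking the alphabet."""
    if not is_protein_sequence(seq):
        raise ValueError("sequence contains non-standard amino acid codes")
    return Counter(seq.upper())


peptide = "MKTAYIAKQR"  # hypothetical peptide, for illustration only
print(is_protein_sequence(peptide))  # True
print(residue_counts(peptide)["A"])  # 2
```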

PROTEIN SEQUENCES
Proteins are made of amino acids chained by peptide bonds. Protein sequences are written from the N-terminus to the C-terminus. An average protein is about 400 amino acids long; the longest known proteins are over 30,000 amino acids long.

PROTEIN STRUCTURES
Proteins have well-defined 3-dimensional structures. Hydrophobic amino acids are mostly in the protein’s core, and hydrophilic amino acids are mostly on the protein’s surface.
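The poster does not say how hydrophobicity is measured. One widely used choice, swapped in here as an assumption, is the Kyte-Doolittle hydropathy scale. The sketch below averages it over a sequence (the "GRAVY" score) and lists hydrophobic positions; the function names are hypothetical. A sequence-level number cannot tell you which residues are actually buried: that requires the 3-dimensional structure.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic, negative = hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: the mean Kyte-Doolittle value over all residues."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def hydrophobic_positions(seq: str) -> list[int]:
    """0-based positions of residues with a positive hydropathy value."""
    return [i for i, aa in enumerate(seq.upper()) if KYTE_DOOLITTLE[aa] > 0]


print(round(gravy("IVL"), 3))          # 4.167
print(hydrophobic_positions("DIKVE"))  # [1, 3]
```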

DNA: DeoxyriboNucleic Acid
Genomes and genes are made of DNA. DNA is the main carrier of heredity.

DNA SEQUENCES
DNA sequences are made of 4 nucleotides:

• Adenine (A)
• Guanine (G)
• Cytosine (C)
• Thymine (T)

DNA sequences can be very long: human chromosomes contain tens to hundreds of millions of nucleotides.
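Because a DNA sequence uses only four letters, its composition is easy to tally in a single pass, which stays practical for long sequences as long as they fit in memory. A minimal Python sketch; the function name is hypothetical and any symbol other than A, C, G, T is treated as an error:

```python
from collections import Counter

DNA_BASES = "ACGT"  # adenine, cytosine, guanine, thymine


def base_counts(seq: str) -> dict[str, int]:
    """Count A, C, G and T, rejecting any other symbol."""
    seq = seq.upper()
    unexpected = set(seq) - set(DNA_BASES)
    if unexpected:
        raise ValueError(f"unexpected symbols: {sorted(unexpected)}")
    counts = Counter(seq)
    return {base: counts[base] for base in DNA_BASES}


print(base_counts("GATTACA"))  # {'A': 3, 'C': 1, 'G': 1, 'T': 2}
```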

NUCLEOTIDES
Nucleotides have similar ends, so they can be chained like Lego bricks. Nucleotides can also pair with each other:

• Adenine with thymine (A with T)
• Guanine with cytosine (G with C)

A tiny bacterium can contain a genome of several million nucleotides

DOUBLE-STRAND DNA
DNA sequences usually come in two strands. The strands are complementary and opposite in orientation. By convention, biologists write only one strand, from its 5’ end to its 3’ end. Database-search programs search both strands automatically.
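Because the two strands are complementary and antiparallel, the strand that is not written down can always be recomputed: complement every base, then reverse the result so it also reads 5’ to 3’. A minimal Python sketch (function names are hypothetical) that also mimics, in a very simplified way, how a search can cover both strands:

```python
# Pairing rules: A pairs with T, G pairs with C (lower case handled too).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, also written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def search_both_strands(seq: str, motif: str) -> tuple[bool, bool]:
    """Report whether motif occurs on the given strand and on the opposite strand."""
    return motif in seq, motif in reverse_complement(seq)


print(reverse_complement("ATGC"))               # GCAT
print(search_both_strands("AAATTTGGG", "CCC"))  # (False, True)
```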

RNA: Ribonucleic Acid
RNA is a close relative of DNA. RNA has many functions:

• Provides the code for proteins
• Helps synthesize proteins
• Helps many basic processes in the cell

RNA is not very stable: it is constantly synthesized and degraded. DNA, by contrast, is very stable.

THE RNA SEQUENCE
RNA contains 4 nucleotides: A, G, C, and U (U is uracil).

RNA does not contain thymine (T); uracil replaces thymine in RNA. RNA is usually single-stranded.
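Since uracil takes the place of thymine, the RNA copy of a gene can be written directly from DNA. A minimal Python sketch, assuming both DNA strands are written 5’ to 3’: from the coding strand, only T changes to U; from the template strand, each base is paired first and the result reversed. The function names are hypothetical.

```python
# Complement rules for DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def rna_from_coding_strand(coding: str) -> str:
    """The RNA has the coding-strand sequence, with U in place of T."""
    return coding.upper().replace("T", "U")


def rna_from_template_strand(template: str) -> str:
    """Pair each template base (written 5'->3'), reverse, then swap T for U."""
    coding = template.upper().translate(COMPLEMENT)[::-1]
    return coding.replace("T", "U")


print(rna_from_coding_strand("TCTTATGCGTAA"))    # UCUUAUGCGUAA
print(rna_from_template_strand("TTACGCATAAGA"))  # UCUUAUGCGUAA
```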

RNA SECONDARY STRUCTURES
A single RNA strand can fold back and pair with itself, forming secondary structures. Secondary structures are made of stems (paired regions) and loops (unpaired regions).
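A stem forms when one part of the strand is complementary to another part read in the opposite direction. A toy check under strong assumptions: the sequence is a single candidate hairpin, only A-U and G-C pairs count (real RNA also forms G-U pairs), and a minimum loop of 3 bases is required (a commonly used lower bound, assumed here). Real secondary-structure prediction relies on energy models, not on this test; the function name is hypothetical.

```python
# Watson-Crick pairs in RNA; G-U "wobble" pairs are deliberately ignored here.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def can_form_hairpin(rna: str, stem_length: int, min_loop: int = 3) -> bool:
    """Check whether the first and last stem_length bases can pair as a stem,
    leaving at least min_loop unpaired bases in the loop between them."""
    rna = rna.upper()
    if stem_length < 1 or len(rna) < 2 * stem_length + min_loop:
        return False
    return all((rna[i], rna[-1 - i]) in PAIRS for i in range(stem_length))


print(can_form_hairpin("GGGAAAUCCC", 3))  # True: stem GGG/CCC, loop AAAU
print(can_form_hairpin("GGGAAAUGGG", 3))  # False: G cannot pair with G here
```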

PUBMED/MEDLINE

MULTIPLE-SEQUENCE ALIGNMENTS (MSAS)

RETRIEVING PROTEIN SEQUENCES IN SWISS-PROT

TYPICAL PROKARYOTIC GENOME

GENBANK

EXPLORING THE HUMAN GENOME WITH ENSEMBL


TURNING DNA INTO PROTEINS: THE GENETIC CODE
DNA gets transcribed into RNA using nucleotide complementarity.

RNA gets translated into proteins using the genetic code:

UCU UAU GCG UAA  →  SER-TYR-ALA-STOP
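The genetic code is a lookup table from 64 codons to 20 amino acids plus stop. The sketch below encodes the standard code compactly (a 64-letter string in U, C, A, G codon order, a common way to store the standard table) and reproduces the example above. It assumes the reading frame starts at the first base and ignores start codons and non-standard genetic codes; the function names are hypothetical.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, ... GGG; "*" marks a stop.
CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: CODE[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
    "*": "STOP",
}


def translate(rna: str) -> str:
    """Read codons from the first base and stop after the first stop codon."""
    rna = rna.upper().replace(" ", "")
    protein = []
    for pos in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[pos:pos + 3]]
        protein.append(amino_acid)
        if amino_acid == "*":
            break
    return "".join(protein)


def to_three_letter(protein: str) -> str:
    """Write one-letter codes as the dash-separated three-letter names used above."""
    return "-".join(THREE_LETTER[aa] for aa in protein)


print(to_three_letter(translate("UCU UAU GCG UAA")))  # SER-TYR-ALA-STOP
```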

PubMed is a database of references to recent scientific publications in biology and medicine. PubMed is free, and you can search it using any keyword you are interested in:

• Open www.ncbi.nlm.nih.gov/pubmed
• Type your favorite keywords
• Press Return or Enter
• Click the Limits tab
• Check the boxes you are interested in, such as Review, English, or AIDS

Restrict the search with field tags:
• [AU] Author
• [SO] Source (journal)
• [TI] Title
• [AD] Address (author affiliation)
• [MH] Keywords (MeSH terms)
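Field-tagged searches like these can also be scripted through NCBI's E-utilities, whose `esearch.fcgi` service accepts PubMed query syntax in its `term` parameter. A minimal sketch that only builds the request URL; the helper names and example terms are hypothetical, and actually sending requests should follow NCBI's usage policy (for example, limiting the request rate).

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(terms: list[tuple[str, str]]) -> str:
    """Join (word, field tag) pairs into a PubMed query such as 'kinase[TI] AND AIDS[MH]'."""
    return " AND ".join(f"{word}[{tag}]" for word, tag in terms)


def esearch_url(term: str, retmax: int = 20) -> str:
    """URL asking E-utilities for up to retmax PubMed IDs matching term."""
    return f"{ESEARCH}?{urlencode({'db': 'pubmed', 'term': term, 'retmax': retmax})}"


term = build_term([("kinase", "TI"), ("AIDS", "MH")])
print(term)  # kinase[TI] AND AIDS[MH]
print(esearch_url(term))
```

`urlencode` percent-encodes the brackets and turns spaces into `+`, so the tagged query can be placed in a URL as-is.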

The words will be searched only in the corresponding fields. Medline contains only papers published after 1965. Use no more than 10 names for papers before 1995.

Swiss-Prot is a curated database of proteins with known functions.

Swiss-Prot is available from the ExPASy server at www.expasy.ch/sprot/
ExPASy: Expert Protein Analysis System
ExPASy contains many useful online tools.

Each Swiss-Prot entry is dedicated to one protein. A Swiss-Prot entry summarizes everything that is known about a given protein: it contains functional information and links to other databases mentioning this protein.
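Swiss-Prot sequences can also be downloaded in FASTA format, where the header line of a Swiss-Prot entry starts with `>sp|ACCESSION|ENTRY_NAME`. A minimal parser sketch; the function names are hypothetical and the record below is made up, not a real entry:

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Return (header, sequence) pairs; sequence lines are joined without newlines."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def accession(header: str) -> str:
    """Accession field of a UniProt-style header 'sp|ACCESSION|ENTRY_NAME ...'."""
    return header.split("|")[1]


# Hypothetical record, for illustration only.
EXAMPLE = """>sp|EXAMPLE1|EXAMPLE_PROTEIN Hypothetical example protein
MKTAYIAKQR
QISFVKSHFS
"""

for header, sequence in parse_fasta(EXAMPLE):
    print(accession(header), len(sequence))  # EXAMPLE1 20
```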

LOOKING FOR DNA SEQUENCES
There are many types of DNA sequences. The most common are:

• Regulatory regions, often before genes
• Untranslated regions, often around the genes
• Protein-coding regions
• Intergenic regions (between the genes)

All these sequences can be found in GenBank

FETCHING A DNA SEQUENCE AT THE NCBI

• Navigate to www.ncbi.nlm.nih.gov/Genbank/

• Type in a keyword.
• Press Return or Enter. You get a list of entries matching your keyword.
• Point, click, and explore…
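As a scripted alternative to pointing and clicking, a single GenBank record can be requested from NCBI's E-utilities `efetch.fcgi` service in flat-file text form. A minimal sketch that only builds the URL; the helper name is hypothetical and `NC_000913` is used purely as an example accession.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_fetch_url(accession: str) -> str:
    """URL returning one nucleotide record as a GenBank flat file (plain text)."""
    params = {"db": "nuccore", "id": accession, "rettype": "gb", "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


print(genbank_fetch_url("NC_000913"))
```

To download the record, the URL can be opened with `urllib.request.urlopen` and the bytes decoded; this needs network access and should respect NCBI's usage policy.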

Multiple-sequence alignments (MSAs) reveal common features shared by several sequences. MSAs are useful for:
• Comparing very different sequences
• Making phylogenetic trees
• Making structure predictions

MAKING AN MSA WITH M-COFFEE
• Open www.tcoffee.org
• Click MCoffee::Regular
• Cut and paste your sequences
• Submit them to compute your MSA

MAKING SENSE OF YOUR MSA

Positions are marked:
• Completely conserved = asterisk (*)
• Highly conserved = colon (:)
• Conserved = period (.)

Look for highly conserved blocks: the red box on this slide shows a highly conserved block. These blocks often correspond to functionally important positions.
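The three marks can be computed column by column. A Clustal-style sketch under stated assumptions: the strong and weak residue groups below are taken from the Clustal documentation as we understand it, any column containing a gap is simply left blank, and M-Coffee's own output may differ in detail. The function names are hypothetical.

```python
# Residue groups as listed in the Clustal documentation (assumed here):
# ":" = all residues fall in one strong group, "." = all fall in one weak group.
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column: str) -> str:
    """Conservation mark for one alignment column."""
    residues = set(column.upper())
    if "-" in residues:  # simplification: any gap leaves the column unmarked
        return " "
    if len(residues) == 1:
        return "*"
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."
    return " "


def conservation_line(alignment: list[str]) -> str:
    """One symbol per column of an alignment given as equal-length gapped strings."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return "".join(column_symbol("".join(column)) for column in zip(*alignment))


print(repr(conservation_line(["MKSCW", "MRTSA"])))  # '*::. '
```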

PROKARYOTIC ORGANISMS are organisms lacking a true nucleus.
EUKARYOTIC ORGANISMS are organisms having a true nucleus.
A GENE is defined as the contiguous genome segment encompassing all the nucleotide-sequence information necessary to bring about its successful expression, that is, the production of a protein or RNA.
The 3 most basic classes of living organisms are:
• PROKARYOTES, such as bacteria
• ARCHAEA, bacteria-like organisms often living in extreme conditions
• EUKARYOTES, ranging from microscopic yeasts to plants and animals, including humans
FOR BIOINFORMATICS: prokaryotes and archaea are very much the same, with few exceptions.

TYPICAL PROKARYOTIC PROTEIN-CODING GENE
• The gene has an uninterrupted sequence.
• Prokaryotic mRNA contains the Ribosome Binding Site (RBS) and the Open Reading Frame (ORF) in one piece.
• In operons, the RNA can contain several ORFs.
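Because a prokaryotic ORF is a single uninterrupted stretch, candidate ORFs can be found by scanning for a start codon followed by an in-frame stop codon. A minimal sketch under simplifying assumptions: only ATG is used as a start, only the given strand is scanned (the opposite strand would need its reverse complement), the RBS is ignored, and the default `min_codons` threshold is an arbitrary choice. The function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 30) -> list[tuple[int, int]]:
    """Return 0-based, end-exclusive (start, end) spans of ORFs on the given strand.

    An ORF here runs from an ATG to the first in-frame stop codon (included),
    scanning all three reading frames; min_codons counts codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                start = None
    return sorted(orfs)


seq = "CCATGAAATTTTAAGG"
print(find_orfs(seq, min_codons=3))  # [(2, 14)]
```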

• Eukaryotes can be small (yeast) or big (whales).
• Genomes are made of linear pieces of DNA called chromosomes.
• One chromosome: 10 to 700 Mb.
• The human genome contains 22+1 chromosomes and is 3 Gb long.
• One gene every 100 kb (human).
• 5% of the genome codes for proteins.

• Prokaryotes: genome = one large circular chromosome + a few small circular DNA molecules (plasmids); 0.5 to 8 Mb / chromosome; genes in one piece; 70% of the genome is coding; 1 gene / kb.

• Eukaryotes: genome = many large linear chromosomes; 10 to 700 Mb / chromosome; genes split into pieces (exons and introns); 5% of the genome is coding; 1 gene / 100 kb (human).
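The rough figures for prokaryotes and eukaryotes translate directly into gene counts: genome size divided by gene spacing. A worked estimate using the poster's own numbers (the 5 Mb prokaryotic genome is a hypothetical example within the 0.5 to 8 Mb range):

```latex
\begin{aligned}
N_{\text{genes}}^{\text{human}} &\approx \frac{3\ \text{Gb}}{100\ \text{kb/gene}}
  = \frac{3\times 10^{9}}{10^{5}} = 3\times 10^{4}\ \text{genes} \\
L_{\text{coding}}^{\text{human}} &\approx 0.05 \times 3\ \text{Gb} = 150\ \text{Mb} \\
N_{\text{genes}}^{\text{prokaryote}} &\approx \frac{5\ \text{Mb}}{1\ \text{kb/gene}}
  = \frac{5\times 10^{6}}{10^{3}} = 5\times 10^{3}\ \text{genes}
\end{aligned}
```

These are order-of-magnitude estimates only; current annotations count closer to 20,000 protein-coding human genes.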

PROKARYOTES VS. EUKARYOTES

GenBank is housed by the National Center for Biotechnology Information (NCBI). GenBank is the memory of biological science: it contains virtually every DNA sequence ever published. GenBank is the original information source for most biological databases. GenBank is more complicated to use than gene-centric databases.

• ACCESSION is the accession number, which is unique to each entry and permanent.
• LOCUS contains information on the sequence size.
• ORGANISM defines the organism containing the gene.
• REFERENCE indicates who produced the sequence.
• FEATURES lists some functional features of the gene.
• GenBank entries can contain more than one gene.
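The header fields of a GenBank flat file sit in fixed columns: the keyword occupies the first 12 characters of a line and its value starts after them, with ORGANISM indented under SOURCE. A minimal sketch that relies on this layout; the function names are hypothetical, the entry is made up, and for real work a dedicated parser such as Biopython's `SeqIO` with the `"genbank"` format is the safer choice.

```python
def parse_genbank_header(text: str) -> dict[str, str]:
    """Pick a few header fields; keywords occupy the first 12 columns of a line."""
    wanted = {"LOCUS", "ACCESSION", "ORGANISM"}
    info = {}
    for line in text.splitlines():
        keyword = line[:12].strip()
        if keyword in wanted and keyword not in info:
            info[keyword] = line[12:].strip()
    return info


def sequence_length(locus_value: str) -> int:
    """In the LOCUS lines assumed here, the second token is the length in bp."""
    return int(locus_value.split()[1])


# Made-up entry, trimmed to the lines used here.
SAMPLE = "\n".join([
    "LOCUS       EXAMPLE01                 60 bp    DNA     linear   BCT 01-JAN-2000",
    "DEFINITION  Hypothetical example entry.",
    "ACCESSION   EXAMPLE01",
    "SOURCE      Escherichia coli",
    "  ORGANISM  Escherichia coli",
    "FEATURES             Location/Qualifiers",
    "     gene            1..60",
])

info = parse_genbank_header(SAMPLE)
print(info["ACCESSION"], info["ORGANISM"], sequence_length(info["LOCUS"]))
# EXAMPLE01 Escherichia coli 60
```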

READING A PROKARYOTIC GENBANK ENTRY

Accessible at www.ensembl.org, ENSEMBL is a database of eukaryotic genomes:
• Annotated entries
• Wide range of examples: human, mouse, dog, and so on
• ENSEMBL annotation is mostly automated

ENSEMBL contains tools to:
• Browse the complete genome
• Search the complete genome with BLAST
• Visualize the position of a gene
• Visualize all experimental information on this gene (transcripts)

By selecting a chromosome region, you can zoom into the chromosome. All genes are cross-indexed with other databases, so you can find all related experimental information.
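Ensembl can also be queried programmatically through its REST service, for example to look up a gene by symbol and get its position. A minimal sketch that only builds the request URL for the `lookup/symbol` endpoint; the helper name is hypothetical and BRCA2 is used only as an example gene.

```python
from urllib.parse import quote

ENSEMBL_REST = "https://rest.ensembl.org"


def lookup_symbol_url(species: str, symbol: str) -> str:
    """URL of the Ensembl REST 'lookup/symbol' endpoint, asking for JSON."""
    return (f"{ENSEMBL_REST}/lookup/symbol/{quote(species)}/{quote(symbol)}"
            "?content-type=application/json")


print(lookup_symbol_url("homo_sapiens", "BRCA2"))
```

The JSON response describes the gene, including its chromosome and start/end coordinates; check the Ensembl REST documentation for the exact keys before relying on them.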
