Sequence Databases: What are they and why do we need them?

What is sequence data?

DNA, RNA and Protein (Amino Acids)

Why do I need it?

• Evolution

• Mutation

• Natural Selection

• Intra- and inter-species relationships

• Niche exploitation

• Ecosystems

REALLY?

• Phenotypes come from the proteins.

• Proteins come from the DNA via RNA.

• Changes in DNA cause changes in proteins.

• Changes in proteins cause changes in phenotypes.

YES!
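The chain from DNA change to protein change can be sketched in a few lines of Python. This is a toy illustration only: the three short sequences are made up, and the codon table holds just the five codons the example needs (taken from the standard genetic code).

```python
# Partial codon table: only the codons used in this toy example.
CODON_TABLE = {
    "ATG": "M",  # methionine (start)
    "GAA": "E",  # glutamate
    "GAG": "E",  # glutamate
    "GTA": "V",  # valine
    "TAA": "*",  # stop
}


def translate(dna):
    """Translate a coding sequence codon by codon, stopping at a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


original = "ATGGAATAA"  # hypothetical gene: Met-Glu-stop
missense = "ATGGTATAA"  # A to T at position 5: GAA becomes GTA
silent = "ATGGAGTAA"    # A to G at position 6: GAA becomes GAG

for label, dna in [("original", original), ("missense", missense), ("silent", silent)]:
    print(label, dna, translate(dna))
```

The missense change swaps one amino acid, while the silent change leaves the protein untouched, so not every DNA change reaches the phenotype.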

• Evolution

• Mutation

• Natural Selection

• Intra- and inter-species relationships

• Niche exploitation

• Ecosystems

Phenotypes

How do we find those changes?

Sequencing

What do Databases let you do?

• Explore and investigate sequence data

• Classify organisms

• Assign a possible function to a gene

• Verify a sequence's identity

• Annotate a genome

• Design primers for PCR and probe experiments

Is the Sequence everything?

The sequence itself is not informative; it must be analyzed by comparative methods against existing databases to develop hypotheses concerning relatives and function.
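One of the simplest comparative measures is percent identity between two sequences that have already been aligned. The sketch below assumes the alignment is given, with "-" marking gaps; both sequences are invented for illustration, and real database searches use far more elaborate scoring.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # skip gap columns
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


query = "ATGGACTTACCG"    # hypothetical query sequence
subject = "ATGGAGTT-CCG"  # hypothetical database hit, already aligned

print(round(percent_identity(query, subject), 1))
```

A high identity to a sequence of known function or origin is what turns a raw sequence into a hypothesis about relatives and function.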

What is a Database?

Databases allow us to more easily find what we need

What Databases are there?

Ten Important Bioinformatics Databases

Name               Address               Description
GenBank/DDBJ/EMBL  www.ncbi.nlm.nih.gov  Nucleotide sequences
Ensembl            www.ensembl.org       Human/Mouse genome
PubMed             www.ncbi.nlm.nih.gov  Literature references
NR                 www.ncbi.nlm.nih.gov  Protein sequences
SWISS-PROT         www.expasy.ch         Protein sequences
InterPro           www.ebi.ac.uk         Protein domains
OMIM               www.ncbi.nlm.nih.gov  Genetic diseases
Enzymes            www.chem.qmul.ac.uk   Enzymes
PDB                www.rcsb.org/pdb/     Protein structures
KEGG               www.genome.ad.jp      Metabolic pathways

Many other specialized Databases are available.

Bioinformatics for Dummies, 2003

What Database should I use?

A.K.A. GenBank

How big is GenBank?

1977: DNA sequencing

1985: PCR

1987: Automated sequencing

1997: Capillary sequencing

Who can put data into GenBank?

Sequence data are submitted to GenBank by scientists from around the world.

Warning: GenBank does not check the validity or accuracy of sequences submitted. This is left up to the scientific community to verify, like all published scientific data.

How do I use GenBank?

www.ncbi.nlm.nih.gov

Problem 1. You are constructing a phylogeny of Euglenoids and you have determined from the literature that the Beta-tubulin gene is a good gene for this purpose.

How do I start???

How do I use GenBank?

www.ncbi.nlm.nih.gov

Euglenozoa AND tubulin NOT kinetoplastida

AF182759
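The same Boolean query can also be sent to NCBI from a script instead of the search box. The sketch below only builds the request URL for the ESearch endpoint of the NCBI E-utilities and does not contact the server; the choice of the "nuccore" nucleotide database, the retmax value, and the helper name build_search_url are assumptions made for this illustration.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(term, db="nuccore", retmax=20):
    """Build an ESearch URL for a Boolean query (hypothetical helper)."""
    params = {"db": db, "term": term, "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)


url = build_search_url("Euglenozoa AND tubulin NOT kinetoplastida")
print(url)
```

Opening the printed URL in a browser should return a list of matching record identifiers, which can then be looked up individually.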

How do I use GenBank?

Problem 2. You are studying the domestication of Sorghum vulgare. From reading about sorghum you find out that it is closely related to Zea mays.

You also find out that maize has a wild relative, teosinte, that forms multiple stalks. Domesticated maize forms a single stalk. Domesticated sorghum has a single stalk, while wild sorghum (Johnsongrass) has multiple stalks.

Sorghum vulgare: Broomcorn (Sorghum), domesticated

Sorghum halepense: Johnsongrass, wild

How do I use GenBank?

Problem 2. Continued

Moreover, the paper states that this trait is controlled by a single gene, teosinte branched 1 (tb1).

You wonder, "Does sorghum have this gene?"

The paper does provide a set of PCR primers (forward and reverse) that were used to isolate and sequence the tb1 gene.

Will they work for Sorghum?
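Once a candidate template sequence is available, a first in-silico check is whether both primers have a binding site on it. The sketch below is a simplification that requires exact matches (real primers can tolerate some mismatches), and the template and both primers are hypothetical, not the ones from the paper.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_amplicon(template, forward, reverse):
    """Return the predicted PCR product, or None if either primer has no exact match."""
    start = template.find(forward)
    if start == -1:
        return None
    # The reverse primer binds the opposite strand, so on the given strand
    # we search for its reverse complement downstream of the forward primer.
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(site)]


template = "GGATGGACTTACCGCTTTACCAACAACTGCAGCTCAGCCCGG"  # hypothetical template
forward = "ATGGACTTAC"  # hypothetical forward primer
reverse = "GGGCTGAGCT"  # hypothetical reverse primer

print(find_amplicon(template, forward, reverse))
```

If the function returns None for a sorghum sequence, the maize primers would probably need to be redesigned before use.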

Sequencing Sorghum

www.ncbi.nlm.nih.gov/BLAST/

>Sorghum_vulgare_sequence
ATGGACTTACCGCTTTACCAACAACTGCAGCTCAGCCCGCCTTCCCCAAAGCCGGACCAATCAAGCAGCTTCTACTGCTGCTACCCATGCTCCCCTCCCTTCGCCGCCGCCGCCGCCGACGCCAGCTTTCACCTGAGCTACCAGATCGGTAGTGCCGCCGCCGCCATCCCTCCACAAGCCGTGATCAACTCGCCGGAGGACCTGCCGGTGCAGCCGCTGATGGAGCAGGCGCCGGCGCCGCCTACAGAGCTTGTCGCCTGCGCCAGTGGTGGTGCACAAGGCGCCGGCGTCAGCGTCAGCCTGGACAGGGCGGCGGCCGCGGCCGCCGCGAGGAAAGACCGGCACAGCAAGATATGCACCGCCGGCGGGATGAGGGACCGCCGGATGCGGCTGTCCCTTGACGTCGCCCGCAAGTTCTTCGCGCTCCAGGACATGCTTGGCTTCGACAAGGCCAGCAAGACGGTACAATGGCTCCTCAACACGTCCAAGGCCGCCATCCAGGAGATCATGGCCGACGACGTCGACGCGTCGTCGGAGTGCGTGGAGGATGGCTCCAGCAGCCTCTCCGTCGACGGCAAGCACAACCCGGCGGAGCAGCTGGGAGATCAGAAGCCCAAGGGTAATGGCCGCAGCGAGGGGAAGAAGCCGGCCAAGTCAAGGAAGGCGGCGACCACCCCAAAGCCGCCAAGAAAATCGGGGAATAATGCGCACCCGGTCCCCGACAAGGAGACGAGGGCGAAGGCGAGGGAGAGGGCGAGGGAGCGAACCAAGGAGAAGCACCGGATGCGTTGGGTAAAGCTTGCATCAGCAATTGACGTGGAGGCGGCGGCTGCCTCGGTGGCTAGCGACAGGCCGAGCTCGAACCATTTGAACCACCACCACCACTCATCGTCGTCCATGAACATGCCGCGTGCTGCGGAGGCTGAATTGGAGGAGAGGGAGAGGTGCTCATCAACTCTCAACAATAGAGGAAGGATGCAAGAAATCACAGGGGCGAGCGAGGTGGTCCTAGGCTTTGGCAACGGAGGAGGATACGGCGGCGGCAACTACTACTGCCAAGAACAATGGGAACTCGGTGGAGTCGTCTTTCAGCAGAACTCACGCTTCTACTGA
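BLAST accepts sequences in FASTA format: a header line that starts with ">" followed by one or more lines of sequence. Before submitting, it can help to read the record into a script and run a few sanity checks. The parser below is a minimal sketch with a made-up toy record; parse_fasta is a hypothetical helper, not part of any NCBI tool.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping each header to its full sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]      # header line without the ">"
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {header: "".join(parts) for header, parts in records.items()}


# Toy record for illustration; a real record would hold the full gene sequence.
example = """>toy_sequence
ATGGACTTACCG
CTTTACTGA
"""

records = parse_fasta(example)
seq = records["toy_sequence"]
print(len(seq), seq.startswith("ATG"), seq.endswith("TGA"))
```

Checking the length and the start and stop codons is a quick way to catch a truncated or mis-pasted sequence before running a search.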

Does sorghum have the tb1 gene?

Resources at NCBI

GenBank – Molecular Databases: Nucleotides, Proteins, Structures, Expression (ESTs) and Taxonomy.

Literature Databases: PubMed, Journals, OMIM, Books, and Citation Matcher.

Genomes and Maps – Entrez: Map Viewer, UniGene, COGs, Organism-specific, Organelle, Virus, and Plasmids.

Tools – Software Engineering: BLAST, Sequence Analysis, 3-D Structures, Gene Expression, Literature and Genome Analysis.

Education: Books, Courses, Public Information.

Research: Biology, Computers.

Objectives

1. Explain what you can do with sequence data.

2. Explain what a database is.

3. Describe what kinds of data and resources are available.

4. Describe some of the uses of databases.

Other Specialty Databases
