Sequence Databases What are they and why do we need them

Page 1: Sequence Databases What are they and why do we need them

Sequence Databases

What are they and why do we need them

Page 2: Sequence Databases What are they and why do we need them

What is sequence data?

DNA, RNA and Protein (Amino Acids)

Why do I need it?

• Evolution
• Mutation
• Natural Selection
• Intra and Inter-species relationships
• Niche exploitation
• Ecosystems

REALLY?

Page 3: Sequence Databases What are they and why do we need them

• Phenotypes come from the proteins.

• Proteins come from the DNA via RNA.

• Changes in DNA cause changes in proteins.

• Changes in proteins cause changes in phenotypes.

YES!
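
The chain from DNA change to protein change can be illustrated with a small sketch. The codon table below is deliberately partial (only the codons used here), and both sequences are made-up toy examples, not data from these slides.

```python
# Partial codon table: only the codons needed for this toy example.
CODON_TABLE = {
    "ATG": "M",  # methionine (start)
    "GAA": "E",  # glutamate
    "GTA": "V",  # valine
    "TTA": "L",  # leucine
    "TAA": "*",  # stop
}


def translate(dna):
    """Translate a DNA coding sequence codon by codon, stopping at a stop codon."""
    protein = []
    usable_length = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    for i in range(0, usable_length, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


original = "ATGGAATTATAA"  # codons: ATG GAA TTA TAA
mutant = "ATGGTATTATAA"    # one base changed (A -> T), so GAA becomes GTA

print(translate(original))
print(translate(mutant))
```

A single base substitution changes one amino acid in the protein, which is the kind of change that can alter a phenotype.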

• Evolution
• Mutation
• Natural Selection
• Intra and Inter-species relationships
• Niche exploitation
• Ecosystems

Phenotypes

How do we find those changes? Sequencing

Page 4: Sequence Databases What are they and why do we need them

What do Databases let you do?

• Explore and investigate sequence data
• Classify organisms
• Assign a possible function to a gene
• Verify a sequence's identity
• Annotate a genome
• Design primers for PCR and probe experiments

Is the Sequence everything?

The sequence itself is not informative; it must be analyzed by comparative methods against existing databases to develop hypotheses concerning relatives and function.
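
As a rough illustration of what "comparative methods" means, the sketch below scores a query against a tiny in-memory "database" by percent identity. This is a simplification: real tools such as BLAST handle gaps and unequal lengths, while this sketch assumes equal-length, already-aligned sequences. The entry names and sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percentage of positions at which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length (no gaps handled)")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Hypothetical database entries, invented for illustration only.
database = {
    "known_gene_A": "ATGGACTTAC",
    "known_gene_B": "ATGCCCTTAG",
}
query = "ATGGACTTAG"

# Score the query against every entry and pick the closest one.
scores = {name: percent_identity(query, seq) for name, seq in database.items()}
best_match = max(scores, key=scores.get)

print(scores)
print(best_match)
```

The closest database entry suggests a hypothesis about relatives and function; it does not prove one.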

Page 5: Sequence Databases What are they and why do we need them

What is a Database?

Databases allow us to more easily find what we need

Page 6: Sequence Databases What are they and why do we need them

What Databases are there?

Ten Important Bioinformatics Databases

Name                Address                Description
GenBank/DDBJ/EMBL   www.ncbi.nlm.nih.gov   Nucleotide sequences
Ensembl             www.ensembl.org        Human/Mouse genome
PubMed              www.ncbi.nlm.nih.gov   Literature references
NR                  www.ncbi.nlm.nih.gov   Protein sequences
SWISS-PROT          www.expasy.ch          Protein sequences
InterPro            www.ebi.ac.uk          Protein domains
OMIM                www.ncbi.nlm.nih.gov   Genetic diseases
Enzymes             www.chem.qmul.ac.uk    Enzymes
PDB                 www.rcsb.org/pdb/      Protein structures
KEGG                www.genome.ad.jp       Metabolic pathways

Many other specialized Databases are available.

Bioinformatics for Dummies, 2003

Page 7: Sequence Databases What are they and why do we need them

What Database should I use?

A.K.A. GenBank

Page 8: Sequence Databases What are they and why do we need them

How big is GenBank?

1977: DNA sequencing
1985: PCR
1987: Automated sequencing
1997: Capillary sequencing

Page 9: Sequence Databases What are they and why do we need them

Who can put data into GenBank?

Sequence data are submitted to GenBank by scientists from around the world.

Warning: GenBank does not check the validity or accuracy of sequences submitted. This is left up to the scientific community to verify, like all published scientific data.

Page 10: Sequence Databases What are they and why do we need them

How do I use GenBank?

www.ncbi.nlm.nih.gov

Problem 1. You are constructing a phylogeny of Euglenoids and you have determined from the literature that the beta-tubulin gene is a good gene for this purpose.

How do I start???

Page 11: Sequence Databases What are they and why do we need them

How do I use GenBank?

www.ncbi.nlm.nih.gov

Euglenozoa AND tubulin NOT kinetoplastida

AF182759
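
The same Boolean search can also be expressed as a URL for NCBI's E-utilities ESearch service. The sketch below only builds the URL and makes no network request. The helper name `build_search_url`, the choice of the `nuccore` database, and the `retmax` value are assumptions made for illustration, not part of the original exercise.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(include, exclude=(), db="nuccore", retmax=20):
    """Build an ESearch URL that ANDs the included terms and NOTs the excluded ones."""
    term = " AND ".join(include)
    for word in exclude:
        term += " NOT " + word
    params = {"db": db, "term": term, "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


# The query from the slide: Euglenozoa AND tubulin NOT kinetoplastida
url = build_search_url(["Euglenozoa", "tubulin"], exclude=["kinetoplastida"])
print(url)
```

Opening the printed URL in a browser should return a list of matching record identifiers, assuming the service is reachable.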

Page 12: Sequence Databases What are they and why do we need them

How do I use GenBank?

Problem 2. You are studying domestication of Sorghum vulgare. From reading about sorghum you find out that it is closely related to Zea mays.

You also find out that maize has a wild relative, teosinte, that forms multiple stalks. Domesticated maize forms a single stalk. Domesticated sorghum has a single stalk, while wild sorghum (Johnsongrass) has multiple stalks.

Page 13: Sequence Databases What are they and why do we need them

Sorghum vulgare: Broomcorn (Sorghum), domesticated

Sorghum halepense: Johnsongrass, wild

Page 14: Sequence Databases What are they and why do we need them

How do I use GenBank?

Problem 2. Continued

Moreover, the paper states that this trait is controlled by a single gene, teosinte branched 1 (tb1).

You wonder, “Does sorghum have this gene?”

The paper provides a set of PCR primers (forward and reverse) that were used to isolate and sequence the tb1 gene.

Will they work for Sorghum?
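
One way to reason about this question in silico is to check whether both primers have binding sites on a candidate template. The sketch below requires exact matches, which is a simplification because real PCR tolerates some mismatches. The template and both primers are short toy sequences chosen for illustration. They are not the primers from the paper, which are not given here.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def predicted_amplicon(template, forward, reverse):
    """Return the region both primers would amplify, or None if either fails to match.

    The forward primer must match the template strand as written. The reverse
    primer binds the opposite strand, so its reverse complement is searched
    for downstream of the forward primer site.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Toy template and hypothetical primers (illustration only).
template = "CCATGGACTTACCGCTTTACCAACAACTGCAGCTCAGCCCGCC"
forward_primer = "ATGGACTTACC"
reverse_primer = "GGCTGAGCTGC"

print(predicted_amplicon(template, forward_primer, reverse_primer))
print(predicted_amplicon(template, forward_primer, "AAAAAAAAAAA"))
```

A result of `None` for the real primers against a sorghum sequence would suggest they need to be redesigned, although only a wet-lab PCR can confirm that.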

Page 15: Sequence Databases What are they and why do we need them

Sequencing Sorghum

Page 16: Sequence Databases What are they and why do we need them

Sequencing Sorghum

Page 17: Sequence Databases What are they and why do we need them

Sequencing Sorghum

www.ncbi.nlm.nih.gov/BLAST/

>Sorghum_vulgare_sequence
ATGGACTTACCGCTTTACCAACAACTGCAGCTCAGCCCGCCTTCCCCAAAGCCGGACCAATCAAGCAGCTTCTACTGCTGCTACCCATGCTCCCCTCCCTTCGCCGCCGCCGCCGCCGACGCCAGCTTTCACCTGAGCTACCAGATCGGTAGTGCCGCCGCCGCCATCCCTCCACAAGCCGTGATCAACTCGCCGGAGGACCTGCCGGTGCAGCCGCTGATGGAGCAGGCGCCGGCGCCGCCTACAGAGCTTGTCGCCTGCGCCAGTGGTGGTGCACAAGGCGCCGGCGTCAGCGTCAGCCTGGACAGGGCGGCGGCCGCGGCCGCCGCGAGGAAAGACCGGCACAGCAAGATATGCACCGCCGGCGGGATGAGGGACCGCCGGATGCGGCTGTCCCTTGACGTCGCCCGCAAGTTCTTCGCGCTCCAGGACATGCTTGGCTTCGACAAGGCCAGCAAGACGGTACAATGGCTCCTCAACACGTCCAAGGCCGCCATCCAGGAGATCATGGCCGACGACGTCGACGCGTCGTCGGAGTGCGTGGAGGATGGCTCCAGCAGCCTCTCCGTCGACGGCAAGCACAACCCGGCGGAGCAGCTGGGAGATCAGAAGCCCAAGGGTAATGGCCGCAGCGAGGGGAAGAAGCCGGCCAAGTCAAGGAAGGCGGCGACCACCCCAAAGCCGCCAAGAAAATCGGGGAATAATGCGCACCCGGTCCCCGACAAGGAGACGAGGGCGAAGGCGAGGGAGAGGGCGAGGGAGCGAACCAAGGAGAAGCACCGGATGCGTTGGGTAAAGCTTGCATCAGCAATTGACGTGGAGGCGGCGGCTGCCTCGGTGGCTAGCGACAGGCCGAGCTCGAACCATTTGAACCACCACCACCACTCATCGTCGTCCATGAACATGCCGCGTGCTGCGGAGGCTGAATTGGAGGAGAGGGAGAGGTGCTCATCAACTCTCAACAATAGAGGAAGGATGCAAGAAATCACAGGGGCGAGCGAGGTGGTCCTAGGCTTTGGCAACGGAGGAGGATACGGCGGCGGCAACTACTACTGCCAAGAACAATGGGAACTCGGTGGAGTCGTCTTTCAGCAGAACTCACGCTTCTACTGA

Does sorghum have the tb1 gene?
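
Before pasting a sequence into BLAST, it can help to confirm that the FASTA record parses cleanly and looks like a complete coding sequence. The sketch below is a minimal FASTA reader plus a simple open-reading-frame sanity check. It runs on a short toy record, not on the sorghum sequence above, and the function names are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def parse_fasta(text):
    """Parse FASTA text into a dict mapping each header (without '>') to its sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {header: "".join(parts) for header, parts in records.items()}


def looks_like_complete_orf(seq):
    """True if the sequence starts with ATG, ends with a stop codon, and is a multiple of 3 long."""
    return len(seq) % 3 == 0 and seq.startswith("ATG") and seq[-3:] in STOP_CODONS


# Toy FASTA record (illustration only), split across two sequence lines.
toy_fasta = """>toy_sequence
ATGGACTTACCG
CTTTACTGA
"""

records = parse_fasta(toy_fasta)
for header, seq in records.items():
    print(header, len(seq), looks_like_complete_orf(seq))
```

A sequence that passes this check is only a plausible coding sequence; deciding whether it is tb1 still requires comparing it against database entries.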

Page 18: Sequence Databases What are they and why do we need them

Resources at NCBI

GenBank – Molecular Databases: Nucleotides, Proteins, Structures, Expression (ESTs) and Taxonomy.

Literature Databases: PubMed, Journals, OMIM, Books, and Citation Matcher.

Genomes and Maps – Entrez: Map Viewer, UniGene, COGs, Organism-specific, Organelle, Virus, and Plasmids.

Tools – Software Engineering: BLAST, Sequence Analysis, 3-D Structures, Gene Expression, Literature and Genome Analysis.

Education: Books, Courses, Public Information.

Research: Biology, Computers.

Page 19: Sequence Databases What are they and why do we need them

Objectives

1. Explain what you can do with sequence data.

2. Explain what a database is.

3. Describe what kinds of data and resources are available.

4. Describe some of the uses of databases.

Page 20: Sequence Databases What are they and why do we need them

Other Specialty Databases
