Sequence Databases
What are they and why do we need them
DNA, RNA and Protein (Amino Acids)
What is sequence data?
Why do I need it?
• Evolution
• Mutation
• Natural Selection
• Intra and Inter-species relationships
• Niche exploitation
• Ecosystems
REALLY?
• Phenotypes come from the proteins.
• Proteins come from the DNA via RNA.
• Changes in DNA cause changes in proteins.
• Changes in proteins cause changes in phenotypes.
YES!
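The chain above (DNA to RNA to protein) can be illustrated with a small sketch. The sequences and the four-entry codon table below are toy assumptions chosen for this example, not data from the slides; a real translation would need the full 64-codon genetic code.

```python
# Minimal codon table: only the codons used in this toy example.
CODON_TABLE = {
    "AUG": "M",  # methionine (start)
    "GAA": "E",  # glutamate
    "GUA": "V",  # valine
    "UAA": "*",  # stop
}


def transcribe(dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


original = "ATGGAATAA"  # hypothetical gene fragment: Met-Glu-stop
mutated = "ATGGTATAA"   # same fragment with one base changed (A to T at position 5)

print(translate(transcribe(original)))
print(translate(transcribe(mutated)))
```

A single base change in the DNA swaps glutamate for valine in the protein, which is the kind of change that can show up as a different phenotype.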
Phenotypes
How do we find those changes?
Sequencing
What do Databases let you do?
• Explore and investigate sequence data
• Classify organisms
• Assign a possible function to a gene
• Verify a sequence's identity
• Annotate a genome
• Design primers for PCR and probe experiments
Is the Sequence everything?
The sequence itself is not informative; it must be analyzed by comparative methods against existing databases to develop hypotheses concerning relatives and function.
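As a rough illustration of what "comparative methods" means, the sketch below ranks entries in a tiny in-memory database by how many short subsequences (k-mers) they share with a query. This is a deliberately simple stand-in, not how BLAST or any NCBI tool works, and the database entries and names (gene_A, gene_B, gene_C) are made up for the example.

```python
def kmers(seq, k=4):
    """Return the set of all overlapping substrings of length k."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: shared items / all items."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_matches(query, database, k=4):
    """Score every database entry against the query, best match first."""
    query_kmers = kmers(query, k)
    scores = [
        (name, jaccard(query_kmers, kmers(seq, k)))
        for name, seq in database.items()
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical database: one identical entry, one with a single
# substitution, and one unrelated sequence.
toy_database = {
    "gene_A": "ATGGACTTACCGCTTTACCAACAACTG",
    "gene_B": "ATGGACTTACCGCATTACCAACAACTG",
    "gene_C": "TTTTGGGGCCCCAAAATTTTGGGGCCC",
}
query = "ATGGACTTACCGCTTTACCAACAACTG"

for name, score in rank_matches(query, toy_database):
    print(f"{name}\t{score:.2f}")
```

The query alone says little; its ranking against known sequences is what suggests which entries might be relatives.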
What is a Database?
Databases allow us to more easily find what we need
What Databases are there?
Ten Important Bioinformatics Databases

Name               Address               Description
GenBank/DDBJ/EMBL  www.ncbi.nlm.nih.gov  Nucleotide sequences
Ensembl            www.ensembl.org       Human/Mouse genome
PubMed             www.ncbi.nlm.nih.gov  Literature references
NR                 www.ncbi.nlm.nih.gov  Protein sequences
SWISS-PROT         www.expasy.ch         Protein sequences
InterPro           www.ebi.ac.uk         Protein domains
OMIM               www.ncbi.nlm.nih.gov  Genetic diseases
Enzymes            www.chem.qmul.ac.uk   Enzymes
PDB                www.rcsb.org/pdb/     Protein structures
KEGG               www.genome.ad.jp      Metabolic pathways
Many other specialized Databases are available.
Bioinformatics for Dummies, 2003
What Database should I use?
A.K.A. GenBank
How big is GenBank?
1977: DNA Sequencing
1985: PCR
1987: Automated Sequencing
1997: Capillary Sequencing
Who can put data into GenBank?
Sequence data are submitted to GenBank by scientists from around the world.
Warning: GenBank does not check the validity or accuracy of sequences submitted. This is left up to the scientific community to verify, like all published scientific data.
How do I use GenBank?
www.ncbi.nlm.nih.gov
Problem 1. You are constructing a phylogeny of Euglenoids and you have determined from the literature that the Beta-tubulin gene is a good gene for this purpose.
How do I start???
How do I use GenBank?
www.ncbi.nlm.nih.gov
Euglenozoa AND tubulin NOT kinetoplastida
AF182759
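The Boolean query above is typed into the search box on the NCBI site. The same search can also be written as a URL for NCBI's E-utilities `esearch` service, which the slides do not cover. The sketch below only builds the URL and does not contact NCBI; the helper name `build_esearch_url` and the `retmax` value are assumptions made for this example.

```python
from urllib.parse import urlencode

# Base address of the E-utilities esearch service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, db="nucleotide", retmax=20):
    """Return an esearch URL for a Boolean query (hypothetical helper).

    AND, OR and NOT must be upper case, as in the query on the slide.
    """
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = build_esearch_url("Euglenozoa AND tubulin NOT kinetoplastida")
print(url)
```

Opening the printed URL in a browser should return a list of record identifiers matching the query; individual records such as AF182759 can then be looked up by accession number.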
How do I use GenBank?
Problem 2. You are studying the domestication of Sorghum vulgare. From reading about sorghum, you find out that it is closely related to Zea mays.
You also find out that maize has a wild relative, teosinte, that forms multiple stalks, whereas domesticated maize forms a single stalk. Likewise, domesticated sorghum has a single stalk, while wild sorghum (Johnsongrass) has multiple stalks.
Sorghum vulgare: Broomcorn (Sorghum), Domesticated
Sorghum halepense: Johnsongrass, Wild
How do I use GenBank?
Problem 2. Continued
Moreover, the paper states that this trait is controlled by a single gene, teosinte branched 1 (tb1).
You wonder, “Does sorghum have this gene?”
The paper provides a set of PCR primers (forward and reverse) that were used to isolate and sequence the tb1 gene.
Will they work for Sorghum?
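One way to think about this question: a primer pair can only amplify a region if the forward primer matches one strand and the reverse primer matches the opposite strand further along. The sketch below checks this for exact matches only. The template and both primers are invented for the example (the slides do not give the published tb1 primers), and real primers can tolerate some mismatches, so this is a simplification.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_sites(template, forward, reverse):
    """Return (forward site start, reverse site start) on the template.

    Returns None if either primer has no exact match. The reverse
    primer binds the opposite strand, so we search the template for
    its reverse complement, starting from the forward site.
    """
    template = template.upper()
    fwd_pos = template.find(forward.upper())
    if fwd_pos == -1:
        return None
    rev_pos = template.find(reverse_complement(reverse), fwd_pos)
    if rev_pos == -1:
        return None
    return fwd_pos, rev_pos


def amplicon_length(template, forward, reverse):
    """Return the expected PCR product length, or None if no product."""
    sites = primer_sites(template, forward, reverse)
    if sites is None:
        return None
    fwd_pos, rev_pos = sites
    return rev_pos + len(reverse) - fwd_pos


# Hypothetical template and primers, for illustration only.
template = "GGATCCATGGACTTACCGCTTTACCAACAACTGCAGCTCAGCCCGAATTC"
forward_primer = "ATGGACTTACCG"
reverse_primer = "CGGGCTGAGCTG"

print(amplicon_length(template, forward_primer, reverse_primer))
print(amplicon_length(template, "ATGGACTTAGGG", reverse_primer))
```

If the primers designed for maize do not match the sorghum sequence closely enough, no product is expected, which is why checking them against sequence data first saves bench time.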
Sequencing Sorghum
www.ncbi.nlm.nih.gov/BLAST/
>Sorghum_vulgare_sequence
ATGGACTTACCGCTTTACCAACAACTGCAGCTCAGCCCGCCTTCCCCAAAGCCGGACCAATCAAGCAGCTTCTACTGCTGCTACCCATGCTCCCCTCCCTTCGCCGCCGCCGCCGCCGACGCCAGCTTTCACCTGAGCTACCAGATCGGTAGTGCCGCCGCCGCCATCCCTCCACAAGCCGTGATCAACTCGCCGGAGGACCTGCCGGTGCAGCCGCTGATGGAGCAGGCGCCGGCGCCGCCTACAGAGCTTGTCGCCTGCGCCAGTGGTGGTGCACAAGGCGCCGGCGTCAGCGTCAGCCTGGACAGGGCGGCGGCCGCGGCCGCCGCGAGGAAAGACCGGCACAGCAAGATATGCACCGCCGGCGGGATGAGGGACCGCCGGATGCGGCTGTCCCTTGACGTCGCCCGCAAGTTCTTCGCGCTCCAGGACATGCTTGGCTTCGACAAGGCCAGCAAGACGGTACAATGGCTCCTCAACACGTCCAAGGCCGCCATCCAGGAGATCATGGCCGACGACGTCGACGCGTCGTCGGAGTGCGTGGAGGATGGCTCCAGCAGCCTCTCCGTCGACGGCAAGCACAACCCGGCGGAGCAGCTGGGAGATCAGAAGCCCAAGGGTAATGGCCGCAGCGAGGGGAAGAAGCCGGCCAAGTCAAGGAAGGCGGCGACCACCCCAAAGCCGCCAAGAAAATCGGGGAATAATGCGCACCCGGTCCCCGACAAGGAGACGAGGGCGAAGGCGAGGGAGAGGGCGAGGGAGCGAACCAAGGAGAAGCACCGGATGCGTTGGGTAAAGCTTGCATCAGCAATTGACGTGGAGGCGGCGGCTGCCTCGGTGGCTAGCGACAGGCCGAGCTCGAACCATTTGAACCACCACCACCACTCATCGTCGTCCATGAACATGCCGCGTGCTGCGGAGGCTGAATTGGAGGAGAGGGAGAGGTGCTCATCAACTCTCAACAATAGAGGAAGGATGCAAGAAATCACAGGGGCGAGCGAGGTGGTCCTAGGCTTTGGCAACGGAGGAGGATACGGCGGCGGCAACTACTACTGCCAAGAACAATGGGAACTCGGTGGAGTCGTCTTTCAGCAGAACTCACGCTTCTACTGA
Does sorghum have the tb1 gene?
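Before pasting a sequence into BLAST it helps to confirm that it is well-formed FASTA: a header line starting with `>` followed by the sequence itself. The sketch below is a minimal FASTA reader with a GC-content helper, written for this example rather than taken from any library. The record uses only the first 60 bases of the sorghum sequence, under the made-up name `Sorghum_vulgare_fragment`.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict of {record name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: everything after '>' is the record name.
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            # Sequence line: may be wrapped over several lines.
            records[name].append(line.upper())
    return {rec: "".join(parts) for rec, parts in records.items()}


def gc_content(seq):
    """Return the fraction of bases that are G or C."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# First 60 bases of the sorghum sequence, wrapped over two lines.
fasta_text = """>Sorghum_vulgare_fragment
ATGGACTTACCGCTTTACCAACAACTGCAG
CTCAGCCCGCCTTCCCCAAAGCCGGACCAA
"""

for name, seq in parse_fasta(fasta_text).items():
    print(name, len(seq), f"GC={gc_content(seq):.2f}")
```

A header fused onto the sequence line, or a stray character in the sequence, is a common reason for a search tool to reject or misread the input.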
Resources at NCBI
GenBank – Molecular Databases: Nucleotides, Proteins, Structures, Expression (ESTs) and Taxonomy.
Literature Databases: PubMed, Journals, OMIM, Books, and Citation Matcher.
Genomes and Maps – Entrez: Map Viewer, UniGene, COGs, Organism-specific, Organelle, Virus, and Plasmids.
Tools – Software Engineering: BLAST, Sequence Analysis, 3-D Structures, Gene Expression, Literature and Genome Analysis.
Education: Books, Courses, Public Information.
Research: Biology, Computers.
Objectives
1. Explain what you can do with sequence data.
2. Explain what a database is.
3. Describe what kinds of data and resources are available.
4. Describe some of the uses of databases.
Other Specialty Databases