ben-busby
Navigating NCBI Molecular Databases
What does NCBI do?
• Develop and maintain molecular and bibliographic databases.
• Develop software for searching and analyzing these data.
• Provide a Web access point for data and software.
1/24/2017 2
Aspects of Molecular Data
• Sequences
• Expression
• Genome Maps
• 3D Structures
• Protein Domains
• Homologous Genes, Proteins, Structures
• Pathways
• Genetic Variation
NCBI Databases
• Biomedical Literature: PubMed, PubMed Central, Bookshelf
• Molecular Databases and Metadatabases: Sequences, Structures, Variation, Chemicals, etc.
• Clinical / Medical Genetics: GTR, ClinVar, MedGen, OMIM, PubMed Health, dbGaP
Types of Molecular Data at NCBI
Primary Data / Database
• Results of a particular technique
• Submitted to NCBI
• Submitter has editorial control
Curated Data / Database
• Based on primary database records
• Third party (NCBI) maintains and updates
• Often includes additional analyses
Important Primary Databases
Sequences (DNA)
• GenBank (International Nucleotide Sequence Database Collaboration): now 2.1 × 10^12 bases
• Sequence Read Archive (SRA), next-generation sequence reads: now 9.7 × 10^15 bases!
Other databases with a primary component
Expression
• Gene Expression Omnibus (GEO): RNA-Seq, microarray, other high-throughput data
Variation
• dbSNP: small-scale variants
• dbVar: genomic structural studies
• Database of Genotype and Phenotype (dbGaP)
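To put the two sequence totals quoted above side by side, the short sketch below computes how many times more bases SRA held than GenBank. The constant and function names are chosen here for illustration; the two figures are the ones on the slide, and the result works out to roughly 4,600-fold.

```python
# Totals quoted on the slide (as of January 2017), in bases.
GENBANK_BASES = 2.1e12
SRA_BASES = 9.7e15


def fold_difference(larger: float, smaller: float) -> float:
    """Return how many times larger the first quantity is than the second."""
    if smaller <= 0:
        raise ValueError("smaller must be positive")
    return larger / smaller


if __name__ == "__main__":
    ratio = fold_difference(SRA_BASES, GENBANK_BASES)
    print(f"SRA holds about {ratio:,.0f} times as many bases as GenBank")
```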
Example Curated Databases
Sequences
• GenPept: translations of CDS regions on INSDC records
• NCBI Reference Sequences (DNA and Protein)
Variation
• NCBI Reference SNPs (non-redundant set of variants)
Structures
• NCBI’s MMDB, based on PDB
Conserved Domains
• NCBI Conserved Domain Database
NCBI Search Services and Tools
• Entrez: integrated literature and molecular databases
• Graphical Sequence Viewer: annotation viewer and analysis tool
• BLAST: sequence similarity search service
• VAST: structure similarity searches
• Cn3D: 3D structure viewer
• Genome Workbench: standalone sequence analysis and annotation platform
• SRA Utilities
  • SRA Run Browser: web access for viewing, searching, and downloading next-generation reads
  • SRA Toolkit: standalone SRA manipulator and client
Web Access
www.ncbi.nlm.nih.gov
Quick Guide to Entrez Databases
• Literature: PubMed, PMC, Books
• Sequences: Protein, Nuccore, GSS, SRA, Assembly
• Expression: GEO Profiles
• Variation: dbSNP, dbVar
• Protein and nucleic acid structures: Structure
• Small Molecules: PubChem
• Medical Genetics: ClinVar, MedGen, GTR
Entrez: Integrated Molecular and Sequence Databases
Central Resources / Databases
• Taxonomy
• BioProject
• Assembly
• Gene
Follow links to others when needed: Nucleotide, Protein, SRA
The Entrez system: 39 (and counting) integrated databases
Higher Level Resources
Start in High Level Resources
If your question is about data for ...
• an organism -> Taxonomy
• a gene name -> Gene (common organisms)
• a large-scale project -> BioProject
• a bacterial genome -> Genome
• a genome sequence -> Assembly
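The routing advice above amounts to a small lookup table. The sketch below writes it out as one, purely as an illustration; `STARTING_POINTS` and `suggest_database` are hypothetical names, not part of any NCBI tool, and the entries are copied from the slide.

```python
# Kind of question -> suggested Entrez starting point, as listed on the slide.
STARTING_POINTS = {
    "organism": "Taxonomy",
    "gene name": "Gene",  # the slide notes this works best for common organisms
    "large-scale project": "BioProject",
    "bacterial genome": "Genome",
    "genome sequence": "Assembly",
}


def suggest_database(question_type: str) -> str:
    """Return the database the slide suggests starting in for a question type."""
    key = question_type.strip().lower()
    if key not in STARTING_POINTS:
        known = ", ".join(sorted(STARTING_POINTS))
        raise ValueError(f"Unknown question type {question_type!r}; expected one of: {known}")
    return STARTING_POINTS[key]


if __name__ == "__main__":
    for kind in STARTING_POINTS:
        print(f"{kind} -> {suggest_database(kind)}")
```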
The Gene Database
• Organizes gene-centered data
  • Biological role; genomic context; phenotypes; interactions; literature
  • Sequences: Genomic, Transcript, Proteins
• Best entry point for many biomolecular searches
• Eukaryotic and Microbial Genomes
• 17.3 million records for 13,566 taxa
NCBI Reference Sequences
• Provide a reference standard
• Represent all molecules in the central dogma
  • Selected Eukaryotes: Genomic, Transcripts, Proteins
  • All Prokaryotes and Viruses: Genomic and Protein only
• Maintained by NCBI staff and outside experts
• Distinct accession series (NC_, AC_, NG_, NM_, NP_, NR_, XM_, XR_)
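Because each RefSeq accession series has its own prefix, the molecule type can be read straight off an accession. The sketch below shows one way to do that. The NP_ and XP_ protein prefixes are added here as standard RefSeq prefixes beyond those listed on the slide, the function name is hypothetical, and the accession used in the demo is a placeholder rather than a real record.

```python
# RefSeq accession prefix -> (molecule type, short note).
REFSEQ_PREFIXES = {
    "NC_": ("genomic", "complete genomic molecule"),
    "AC_": ("genomic", "alternate genomic assembly"),
    "NG_": ("genomic", "genomic region"),
    "NM_": ("transcript", "mRNA"),
    "NR_": ("transcript", "non-coding RNA"),
    "XM_": ("transcript", "predicted mRNA"),
    "XR_": ("transcript", "predicted non-coding RNA"),
    # Not in the slide's list; included as the standard protein series.
    "NP_": ("protein", "protein"),
    "XP_": ("protein", "predicted protein"),
}


def refseq_molecule_type(accession: str) -> str:
    """Return 'genomic', 'transcript', or 'protein' for a RefSeq-style accession."""
    prefix = accession.strip().upper()[:3]
    if prefix not in REFSEQ_PREFIXES:
        raise ValueError(f"{accession!r} does not start with a known RefSeq prefix")
    return REFSEQ_PREFIXES[prefix][0]


if __name__ == "__main__":
    # Placeholder accession, used only to show the format.
    print(refseq_molecule_type("NM_123456.1"))
```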
The easiest Entrez search: Gene
Specific gene: XXX[Symbol] AND YYY[Organism]
  APRT[Symbol] AND human[Organism]
  apt[Symbol] AND Escherichia coli[Organism]
All genes: YYY[Organism] AND current only[Filter]
  zebrafish[Organism] AND current only[Filter]
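The same query patterns can be assembled programmatically. The sketch below builds the query strings shown above and, as one possible use, wraps them in a URL for NCBI's E-utilities ESearch endpoint. The slides themselves only show searches typed into the web interface, so the E-utilities step and all function names here are assumptions made for illustration. The code only constructs the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# E-utilities ESearch endpoint; using it here is an assumption, not from the slides.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def gene_query(symbol: str, organism: str) -> str:
    """Build the 'specific gene' pattern: XXX[Symbol] AND YYY[Organism]."""
    return f"{symbol}[Symbol] AND {organism}[Organism]"


def all_genes_query(organism: str) -> str:
    """Build the 'all genes' pattern: YYY[Organism] AND current only[Filter]."""
    return f"{organism}[Organism] AND current only[Filter]"


def esearch_url(term: str, db: str = "gene", retmax: int = 20) -> str:
    """Return an ESearch URL for the given query; no request is sent."""
    return f"{EUTILS_ESEARCH}?{urlencode({'db': db, 'term': term, 'retmax': retmax})}"


if __name__ == "__main__":
    print(gene_query("APRT", "human"))
    print(all_genes_query("zebrafish"))
    print(esearch_url(gene_query("apt", "Escherichia coli")))
```

Keeping query construction separate from URL construction means the same strings can be pasted into the Gene search box unchanged.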
Related Information: Links and Neighbors
Protein-Structure Shortcut
Related Structures: Summary
[Figure: links from a Gene record to related Entrez databases]
• Gene: Genomic Structure; Orthologs via Gpipe
• Expression: UniGene, GEO Profiles
• Homologs: HomoloGene
• Literature: PubMed, PMC
• Structures: Structure
• Variation: SNP, ClinVar
• OMIM, dbGaP
• Sequences: Nuccore, Protein (Homologs via BLink; Proteins with Structure via Related Structures), SRA
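Links like those in the diagram above can also be followed programmatically. As a hedged sketch, the code below builds a URL for NCBI's E-utilities ELink endpoint, which returns records in one database that are linked to a record in another. The slides only show the web interface, so this route is an assumption added for illustration; `elink_url` is a hypothetical helper and the Gene ID in the demo is a placeholder. No request is sent.

```python
from urllib.parse import urlencode

# E-utilities ELink endpoint; using it here is an assumption, not from the slides.
EUTILS_ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def elink_url(gene_id: str, target_db: str) -> str:
    """Return an ELink URL asking for target_db records linked to a Gene record."""
    params = {"dbfrom": "gene", "db": target_db, "id": gene_id}
    return f"{EUTILS_ELINK}?{urlencode(params)}"


if __name__ == "__main__":
    # "12345" is a placeholder Gene ID, not a specific record.
    for db in ("protein", "nuccore", "pubmed"):
        print(elink_url("12345", db))
```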
Getting Help
• Learn: <ncbi>/learn.shtml
• Factsheets: <ftp>/pub/factsheets/
• NCBI YouTube Channel: www.youtube.com/ncbinlm
• NCBI Helpdesk: [email protected]