Navigating NCBI Molecular Databases

Ncbi basic intro_v_pitt_kent_osu


Page 1: Ncbi basic intro_v_pitt_kent_osu

Navigating NCBI Molecular Databases

Page 2: Ncbi basic intro_v_pitt_kent_osu

What does NCBI do?

• Develop and maintain molecular and bibliographic databases.
• Develop software for searching and analyzing these data.
• Provide a Web access point for data and software.

1/24/2017 2

Page 3: Ncbi basic intro_v_pitt_kent_osu

Aspects of Molecular Data

• Sequences
• Expression
• Genome Maps
• 3D Structures
• Protein Domains
• Homologous Genes, Proteins, Structures
• Pathways
• Genetic Variation


Page 4: Ncbi basic intro_v_pitt_kent_osu

NCBI Databases

• Biomedical Literature: PubMed, PubMed Central, Bookshelf
• Molecular Databases and Metadatabases: Sequences, Structures, Variation, Chemicals, etc.
• Clinical / Medical Genetics: GTR, ClinVar, MedGen, OMIM, PubMed Health, dbGaP


Page 5: Ncbi basic intro_v_pitt_kent_osu

Types of Molecular Data at NCBI

Primary Data / Database
• Results of a particular technique
• Submitted to NCBI
• Submitter has editorial control

Curated Data / Database
• Based on primary database records
• Third party (NCBI) maintains and updates
• Often includes additional analyses


Page 6: Ncbi basic intro_v_pitt_kent_osu

Important Primary Databases

Sequences (DNA)
• GenBank (International Nucleotide Sequence Database Collaboration): now 2.1 × 10^12 bases
• Sequence Read Archive (SRA), Next-Gen sequence reads: now 9.7 × 10^15 bases!

Other databases with a primary component

Expression
• Gene Expression Omnibus (GEO): RNA-Seq, microarray, other high-throughput data

Variation
• dbSNP: small-scale variants
• dbVar: genomic structural studies
• Database of Genotype and Phenotype (dbGaP)
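To put the two sequence totals on this slide side by side, here is a small arithmetic sketch. It reads the slide's figures as 2.1 × 10^12 bases for GenBank and 9.7 × 10^15 bases for SRA (both as of the slide date); the variable names are illustrative only.

```python
# Totals quoted on the slide (bases), read as powers of ten.
genbank_bases = 2.1e12   # GenBank
sra_bases = 9.7e15       # Sequence Read Archive

# How many times larger SRA is than GenBank, by base count.
ratio = sra_bases / genbank_bases

print(f"SRA holds roughly {ratio:,.0f} times as many bases as GenBank")
```

By these figures the raw-read archive is more than three orders of magnitude larger than the assembled and annotated sequence collection.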


Page 7: Ncbi basic intro_v_pitt_kent_osu

Example Curated Databases

Sequences
• GenPept: translations of CDS regions on INSDC records
• NCBI Reference Sequences (DNA and Protein)

Variation
• NCBI Reference SNPs (non-redundant set of variants)

Structures
• NCBI’s MMDB, based on PDB

Conserved Domains
• NCBI Conserved Domain Database


Page 8: Ncbi basic intro_v_pitt_kent_osu

NCBI Search Services and Tools

• Entrez: integrated literature and molecular databases
• Graphical Sequence Viewer: annotation viewer and analysis tool
• BLAST: sequence similarity search service
• VAST: structure similarity searches
• Cn3D: 3D structure viewer
• Genome Workbench: standalone sequence analysis and annotation platform
• SRA Utilities
  • SRA Run Browser: web access for viewing, searching, and downloading next-generation reads
  • SRA Toolkit: standalone SRA manipulator and client


Page 9: Ncbi basic intro_v_pitt_kent_osu

Web Access

www.ncbi.nlm.nih.gov

Page 10: Ncbi basic intro_v_pitt_kent_osu


Page 11: Ncbi basic intro_v_pitt_kent_osu

Quick Guide to Entrez Databases

• Literature: PubMed, PMC, Books
• Sequences: Protein, Nuccore, GSS, SRA, Assembly
• Expression: GEO Profiles
• Variation: dbSNP, dbVar
• Protein and Nucleic Acid Structures: Structure
• Small Molecules: PubChem
• Medical Genetics: ClinVar, MedGen, GTR


Page 12: Ncbi basic intro_v_pitt_kent_osu


Entrez: Integrated Molecular and Sequence Databases

Central Resources / Databases

• Taxonomy
• BioProject
• Assembly
• Gene

Follow links to others when needed: Nucleotide, Protein, SRA


The Entrez system: 39 (and counting) integrated databases

Page 13: Ncbi basic intro_v_pitt_kent_osu

Higher Level Resources


Page 14: Ncbi basic intro_v_pitt_kent_osu

Start in High Level Resources

If your question is about data for ...

• an organism -> Taxonomy
• a gene name -> Gene (common organisms)
• a large-scale project -> BioProject
• a bacterial genome -> Genome
• a genome sequence -> Assembly
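The "where to start" rule on this slide is a simple lookup, sketched below in Python. The dictionary keys paraphrase the slide's question types. The lowercase values are the database identifiers as they would be passed to NCBI's E-utilities (`taxonomy`, `gene`, `bioproject`, `genome`, `assembly`). The function name `starting_database` is hypothetical.

```python
# Question topic -> Entrez database to start in, following the slide.
# Values are lowercase database identifiers (an assumption beyond the slide,
# which shows only the display names).
START_DATABASE = {
    "organism": "taxonomy",
    "gene name": "gene",            # works best for common organisms
    "large-scale project": "bioproject",
    "bacterial genome": "genome",
    "genome sequence": "assembly",
}


def starting_database(topic: str) -> str:
    """Return the suggested starting database for a question topic."""
    key = topic.strip().lower()
    if key not in START_DATABASE:
        known = ", ".join(sorted(START_DATABASE))
        raise KeyError(f"unknown topic {topic!r}; expected one of: {known}")
    return START_DATABASE[key]


print(starting_database("Gene name"))
```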


Page 15: Ncbi basic intro_v_pitt_kent_osu

The Gene Database

• Organizes gene-centered data
  • Biological role; genomic context; phenotypes; interactions; literature
  • Sequences: Genomic, Transcript, Proteins
• Best entry point for many biomolecular searches
• Eukaryotic and Microbial Genomes: 17.3 million records for 13,566 taxa


Page 16: Ncbi basic intro_v_pitt_kent_osu

NCBI Reference Sequences

• Provide a reference standard
• Represent all molecules in the central dogma
  • Selected Eukaryotes: Genomic, Transcripts, Proteins
  • All Prokaryotes and Viruses: Genomic and Protein only
• Maintained by NCBI staff and outside experts
• Distinct accession series (NC_, AC_, NG_, NM_, NP_, NR_, XM_, XR_)
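Because RefSeq accessions carry their molecule type in the prefix, a record can be sorted into the slide's three categories (genomic, transcript, protein) without looking it up. The sketch below shows one way to do that. The protein prefixes `NP_` and `XP_` come from the RefSeq documentation and are included as an assumption beyond the slide's list. The function name and the accession `NM_000000.1` are made up to illustrate the format.

```python
# RefSeq accession prefix -> molecule category used on the slide.
REFSEQ_MOLECULE_TYPE = {
    "NC_": "genomic",
    "AC_": "genomic",
    "NG_": "genomic",
    "NM_": "transcript",   # curated mRNA
    "NR_": "transcript",   # curated non-coding RNA
    "XM_": "transcript",   # predicted mRNA
    "XR_": "transcript",   # predicted non-coding RNA
    "NP_": "protein",      # assumption: not in the slide's list
    "XP_": "protein",      # assumption: not in the slide's list
}


def classify_refseq(accession: str) -> str:
    """Return 'genomic', 'transcript', or 'protein' for a RefSeq accession."""
    prefix = accession.strip().upper()[:3]
    if prefix not in REFSEQ_MOLECULE_TYPE:
        raise ValueError(f"{accession!r} does not start with a known RefSeq prefix")
    return REFSEQ_MOLECULE_TYPE[prefix]


# Hypothetical accession, used only to show the format.
print(classify_refseq("NM_000000.1"))
```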


Page 17: Ncbi basic intro_v_pitt_kent_osu

The easiest Entrez search: Gene

Specific gene: XXX[Symbol] AND YYY[Organism]

APRT[Symbol] AND human[Organism]

apt[Symbol] AND Escherichia coli[Organism]

All genes: YYY[Organism] AND current only[Filter]

zebrafish[Organism] AND current only[Filter]
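The query patterns above can also be assembled programmatically. The following is a minimal sketch, assuming the search is sent to the Gene database through NCBI's E-utilities ESearch endpoint; the slides themselves only show the web search box. The helper names are hypothetical, and the code only builds the URL without contacting NCBI.

```python
from urllib.parse import urlencode

# E-utilities ESearch endpoint (an assumption: the slide uses the web form).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def gene_query(symbol: str, organism: str) -> str:
    """Specific gene: XXX[Symbol] AND YYY[Organism]."""
    return f"{symbol}[Symbol] AND {organism}[Organism]"


def all_genes_query(organism: str) -> str:
    """All genes: YYY[Organism] AND current only[Filter]."""
    return f"{organism}[Organism] AND current only[Filter]"


def esearch_url(term: str, db: str = "gene", retmax: int = 20) -> str:
    """Build (but do not send) an ESearch request URL for the given term."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


term = gene_query("APRT", "human")
print(term)
print(esearch_url(term))
print(all_genes_query("zebrafish"))
```

Keeping the field tags (`[Symbol]`, `[Organism]`, `[Filter]`) in small helper functions avoids spacing mistakes such as a missing blank before `AND`.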


Page 18: Ncbi basic intro_v_pitt_kent_osu

Related Information: Links and Neighbors


Protein-Structure Shortcut

Page 19: Ncbi basic intro_v_pitt_kent_osu

Related Structures: Summary


Page 20: Ncbi basic intro_v_pitt_kent_osu

[Diagram: Gene at the center, linked to related Entrez databases]

• Gene: Genomic Structure; Orthologs via Gpipe
• Expression: UniGene, GEO Profiles
• Homologs: HomoloGene
• Literature: PubMed, PMC
• Structures: Structure
• Variation: SNP, ClinVar
• OMIM, dbGaP
• Sequences: Nuccore, Protein, SRA
• Homologs via BLink
• Proteins with Structure via Related Structures
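The links between Gene and the other databases can also be followed programmatically. The sketch below assumes NCBI's E-utilities ELink endpoint, which is not mentioned on the slides. It builds a request URL for the records in one database that are linked to a record in another. The Gene ID `12345` is a placeholder, not a real lookup result, and `elink_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# E-utilities ELink endpoint (an assumption: the slides show only web links).
ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def elink_url(dbfrom: str, db: str, uid: str) -> str:
    """Build (but do not send) a request for records in `db` linked to `uid` in `dbfrom`."""
    params = {"dbfrom": dbfrom, "db": db, "id": uid}
    return f"{ELINK}?{urlencode(params)}"


# Placeholder Gene ID, used only to show the shape of the request.
print(elink_url("gene", "protein", "12345"))
```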
