NCBI FieldGuide NCBI Molecular Biology Resources March 2007 Using Entrez

WWW Access

Entrez & BLAST

Genomes

Taxonomy

Entrez: Database Integration

PubMed abstracts

Nucleotide sequences

Protein sequences

3-D Structure

Word weight

VAST

BLAST

Phylogeny

Hard Link

Neighbors: Related Sequences

Neighbors: Related Sequences, BLink, Domains

Neighbors: Related Structures

Database Searching with Entrez

• Using limits and field restriction to find human MutL homolog
• Linking and neighboring with MutL
• Mapping SNPs onto structure and the genome

Global NCBI (Entrez) Search

Human hereditary nonpolyposis colon cancer

Global Entrez Search Results

Nucleotide Sequences

Nucleotide database now three parts
• EST: expressed sequence tags
• GSS: genome survey sequences
• CoreNucleotide: everything else

Advanced Search Options

Tabs

More Precise Nucleotides Search

nonpolyposis[All Fields] AND colon cancer[Title] AND human[Organism] AND biomol_mrna[Properties] AND srcdb_refseq[Properties]
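The query above can also be assembled programmatically. What follows is a minimal sketch, assuming NCBI's E-utilities ESearch service rather than the web form shown on the slide, and assuming `nuccore` as the database name for CoreNucleotide. The helper names `build_query` and `esearch_url` are hypothetical. The code only builds the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base URL of the E-utilities ESearch service (assumption: the slide uses the web form).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(restrictions):
    """Join (term, field) pairs into 'term[Field] AND term[Field] ...'."""
    return " AND ".join(f"{term}[{field}]" for term, field in restrictions)


def esearch_url(query, db="nuccore", retmax=20):
    """Return an ESearch URL for the query; db and retmax are illustrative defaults."""
    return ESEARCH + "?" + urlencode({"db": db, "term": query, "retmax": retmax})


# The five field restrictions from the slide's query.
restrictions = [
    ("nonpolyposis", "All Fields"),
    ("colon cancer", "Title"),
    ("human", "Organism"),
    ("biomol_mrna", "Properties"),
    ("srcdb_refseq", "Properties"),
]
query = build_query(restrictions)
url = esearch_url(query)
print(query)
print(url)
```

Keeping the restrictions as a list of pairs makes it easy to drop or swap one restriction (for example, the organism) without retyping the whole query.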

Useful Field Restrictions

[Title]: Definition line in GenBank / GenPept format, shown in Summary format

glyceraldehyde 3 phosphate dehydrogenase[Title]

[Organism]: NCBI’s taxonomy. Organizing system for molecular databases

mouse[organism]; green plants[organism]; Streptomyces coelicolor[organism]

[Properties]: molecule type, location, database source

biomol_mrna[properties]; biomol_genomic[properties]; gene_in_mitochondrion[properties]; srcdb_pdb[properties]

[Filter]: subsets of data, Entrez links

all[filter]; nucleotide_mapview[filter]; nucleotide_omim[filter]
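Each restriction follows the same `term[field]` pattern, so a query can be split back into its parts. This is a rough sketch only: the function name `parse_terms` is hypothetical, the Boolean operators are discarded rather than interpreted, and nested parentheses are not handled.

```python
import re

# Terms are separated by Boolean operators (in a query) or semicolons (in the slide's lists).
SEPARATOR = re.compile(r"\s+(?:AND|OR|NOT)\s+|\s*;\s*")
# One restriction: free text followed by a bracketed field name.
TERM = re.compile(r"^(.+?)\s*\[([^\]]+)\]$")


def parse_terms(text):
    """Return (term, field) pairs; field names are lower-cased for easy comparison."""
    pairs = []
    for chunk in SEPARATOR.split(text.strip()):
        match = TERM.match(chunk.strip())
        if match:
            pairs.append((match.group(1), match.group(2).lower()))
    return pairs


print(parse_terms("mouse[organism]; green plants[organism]; Streptomyces coelicolor[organism]"))
print(parse_terms("colon cancer[Title] AND human[Organism]"))
```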

Organism Field: NCBI’s Taxonomy

Useful Properties Field Terms

Molecule type: biomol_mrna, biomol_genomic

GenBank division: gbdiv_est, gbdiv_htg, gbdiv_xxx

Gene location: gene_in_mitochondrion, gene_in_chloroplast, gene_in_genomic

Source database: srcdb_refseq, srcdb_pdb, srcdb_swiss_prot
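The Properties terms on this slide share a prefix within each group, which makes them easy to recognize. The sketch below maps each prefix to the slide's group name. The function name `property_category` is hypothetical, and the table covers only the four groups listed here, not every Properties term Entrez accepts.

```python
# Prefix-to-category table taken from the four groups on the slide.
PROPERTY_PREFIXES = {
    "biomol_": "Molecule type",
    "gbdiv_": "GenBank division",
    "gene_in_": "Gene location",
    "srcdb_": "Source database",
}


def property_category(term):
    """Return the slide's category for a Properties term, or None if the prefix is unknown."""
    lowered = term.lower()
    for prefix, category in PROPERTY_PREFIXES.items():
        if lowered.startswith(prefix):
            return category
    return None


for term in ["biomol_mrna", "gbdiv_est", "gene_in_chloroplast", "srcdb_refseq"]:
    print(term, "->", property_category(term))
```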

Human MutL RefSeq

GenBank Records

NM_000249: Links

Literature Links

OMIM

OMIM: Human Disease Genes

Conserved Domain

Sequence Links

Finding Homologs and Structures

Protein Link

BLAST Link

Conserved Domains

BLink: BLAST Link

top 200 only

Redundant GIs

BLink: non-redundant relatives

zebrafish homolog

BLAST

Short Cut: Related Structures

E. coli MutL Structure

Cn3D viewer

Conserved Domains

3D Domain Neighbors

Structure Neighbors

PubChem compound

MLH1 Domain Structure: CDD

ATPase Domain

Mismatch Repair Domain

MLH1: ATPase Domain

Mapping Polymorphisms onto Structure

GeneView: Variations Human MLH1

ATPase domain

Related Structures

Mapping Variation Onto Structure

Conserved Asn

Asn – Ile

Ile – Val

Genome Resources

NM_000249: Genome Links

The Map Viewer

Genome BLAST

Map Viewer: Human MLH1

Customizable

NCBI Assembly

EST Hits

Gene Annotations

Models

Transcripts

Download data and sequences

Synteny: Mammalian Genomes

Albumin Gene Family

HomoloGene

early globin gene

A-chain gene, B-chain gene

frog A, chick A, mouse A, mouse B, chick B, frog B

paralogs, orthologs

gene duplication

• Completely Annotated Eukaryotic Genomes
• Homologous UniGene determined for other organisms
• Protein similarities first
• Guided by taxonomic tree
• Includes orthologs and paralogs
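The globin tree on this slide gives a simple rule of thumb for the two terms. Genes on the same chain in different species diverged by speciation, so they are orthologs. Genes on different chains trace back to the gene duplication, so they are paralogs. The sketch below hard-codes that rule for this one tree, where the duplication predates every speciation. It is an illustration only, not how HomoloGene builds its clusters, and the name `relationship` is hypothetical.

```python
from itertools import combinations

# Genes from the slide's globin tree, as (species, chain) pairs.
GENES = [("frog", "A"), ("chick", "A"), ("mouse", "A"),
         ("mouse", "B"), ("chick", "B"), ("frog", "B")]


def relationship(gene1, gene2):
    """Classify two distinct globin genes from the tree.

    Same chain: the lineages split at a speciation event, so orthologs.
    Different chains: they trace back to the gene duplication, so paralogs.
    """
    return "orthologs" if gene1[1] == gene2[1] else "paralogs"


# Classify every unordered pair of genes in the tree.
pairs = {(a, b): relationship(a, b) for a, b in combinations(GENES, 2)}
for (a, b), kind in pairs.items():
    print(f"{a[0]} {a[1]} / {b[0]} {b[1]}: {kind}")
```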

HomoloGene Cluster

Rice Homolog

The Gene Database

• Gene Centered Information
• Unifies LocusLink and microbial Genomes
• 2.4 million records for 3,822 taxa

Human        38,603    Sea Urchin       30,603
Chimpanzee   31,502    Mosquito         13,763
Mouse        60,746    Fruit Fly        21,116
Rat          38,117    C. elegans       20,935
Dog          20,154    Fungi           168,802
Cow          23,677    Green Plants     76,847
Chicken      18,469    Archaea          74,627
Zebrafish    38,594    Bacteria      1,361,390

Genes MLH1: One Stop Shopping

Genes MLH1: One Stop Shopping (cont.)

Genes: Display Options and Links
