Entrez Digital Tools and Utilities

NCBI Entrez Digital Tools and Utilities

Jonathan A. Kans, Ph.D.
Staff Scientist, NCBI
jkans@stanford.edu

1

Topics

• Advanced Features of Entrez (to help separate the wheat from the chaff)

• Programmatic Access with EUtils (automate repeatable multi-step queries)

• EBot Generated Scripts (if you really don't want to write a program)

2

Comparative Analysis

• Anatomy

• Physiology

• Biochemistry

• Gene Sequences

3

Central Dogma of Molecular Biology

DNA (information) -> RNA (expression) -> Protein (function)

• transcription (polymerase): DNA -> RNA
• translation (ribosome): RNA -> Protein
• Diagram labels: mRNA, CDS

4

Genetic Diseases

• Specific molecular defects explain disease

• β-globin gene and protein sequences
  ...ATGGTGCATCTGACTCCTGAGGAGAAG...AAGTATCACTAA...
  (M) V H L T P E E K ... K Y H (*)

• Sickle-cell anemia variant
  ...ATGGTGCATCTGACTCCTGTGGAGAAG...AAGTATCACTAA...
  (M) V H L T P V E K ... K Y H (*)

5

Evolutionary Conservation

[Tree diagram: Human, Mouse, Fly, Worm, Yeast, Bacteria; time scale marked at 3000 M yr, 1000 M yr, and 500 M yr]

Human  638 RHACVEVQDEIAFIPNDVYFEKDKQMFHIITGPNMGGKSTYIRQTGVIVLMAQIGCFVPC 697
Yeast  657 RHPVLEMQDDISFISNDVTLESGKGDFLIITGPNMGGKSTYIRQVGVISLMAQIGCFVPC 716
E.coli 584 RHPVVEQVLNEPFIANPLNLSPQRR-MLIITGPNMGGKSTYMRQTALIALMAYIGSYVPA 642

Colon cancer gene sequence (DNA mismatch repair protein)

6

Design of Entrez

[Diagram: three linked databases]

• Databases: MEDLINE, Nucleotide, Protein
• Link labels: Term frequency statistics; Nucleotide sequence similarity; Amino acid sequence similarity; Coding region features; Literature citations in sequence (shown on two links)

7

Entrez Databases

8

PubMed Search

9

PubMed Fields

10

Advanced Search

11

Field Abbreviations

Affiliation [AFFL]            Issue [ISS]
All Fields [ALL]              Journal [JOUR]
Author [AUTH]                 Language [LANG]
Author - Corporate [COLN]     Location ID [LID]
Author - First [FAUT]         MeSH Major Topic [MAJR]
Author - Full [FULL]          MeSH Subheading [SUBH]
Author - Last [LAUT]          MeSH Terms [MESH]
Book [BOOK]                   Pagination [PAGE]
Date - Completion [CDAT]      Pharmacological Action [PAPX]
Date - Create [CRDT]          Publication Type [PTYP]
Date - Entrez [EDAT]          Publisher [PUBN]
Date - MeSH [MHDA]            Publisher ID [PID]
Date - Modification [MDAT]    Secondary Source ID [SI]
Date - Publication [PDAT]     Supplementary Concept [SUBS]
EC/RN Number [ECNO]           Text Word [WORD]
Editor [ED]                   Title [TITL]
Filter [FILT]                 Title/Abstract [TIAB]
Grant Number [GRNT]           Transliterated Title [TT]
ISBN [ISBN]                   UID [UID]
Investigator [INVR]           Volume [VOL]
Investigator - Full [FINV]

12

MeSH Categories

• Anatomy
• Organisms
• Diseases
• Chemicals and Drugs
• Analytical, Diagnostic and Therapeutic Techniques and Equipment
• Psychiatry and Psychology
• Phenomena and Processes
• Disciplines and Occupations
• Anthropology, Education, Sociology and Social Phenomena
• Technology, Industry, Agriculture
• Humanities
• Information Science
• Named Groups
• Health Care
• Publication Characteristics
• Geographicals

13

Organism Hierarchy

Eukaryota
  Alveolata
  Amoebozoa
  Animals
    Animal Population Groups
    Chordata
    Invertebrates
  Choanoflagellata
  Cryptophyta
  Diplomonadida
  Euglenozoa
  Fungi
  Haptophyta
  Mesomycetozoea
  Oxymonadida
  Parabasalidea
  Plants
  Retortamonadidae
  Rhizaria
  Stramenopiles
Archaea
Bacteria
Viruses
Other Forms

14

Useful Queries

humans [MESH]
pharmacokinetics [MESH]
chemically induced [SUBH]
all child [FILT]
loprovflybase [FILT]
randomized controlled trial [FILT]
clinical trial, phase ii [PTYP]

mammalia [ORGN]
mammalia [ORGN:noexp]
cds [FKEY]
lacz [GENE]
beta galactosidase [PROT]
biomol genomic [PROP]
dbxref flybase [PROP]
gbdiv phg [PROP]
src cultivar [PROP]
srcdb refseq validated [PROP]
150:200 [SLEN]

15

Structured Query

transposition [TITL] AND (protease OR peptidase) NOT humans [MESH]

16

Using History

17

History Results

18

PubMed Record

19

Neighbor Hyperlink

20

Related Citations

21

Relevant Publication

22

Selecting Target

23

GenBank Record

24

Graphical View

25

LOCUS       HUMADH1CB    1400 bp    mRNA    PRI    15-JUN-1989
DEFINITION  Homo sapiens class I alcohol dehydrogenase (ADH1) alpha subunit
            mRNA, complete cds.
ACCESSION   M12271
KEYWORDS    alcohol dehydrogenase; dehydrogenase.
SOURCE      Human liver, cDNA to mRNA, clone pUCADH-alpha-15L.
  ORGANISM  Homo sapiens
            Eukaryota; Animalia; Metazoa; Chordata; Vertebrata; Mammalia;
            Theria; Eutheria; Primates; Haplorhini; Catarrhini; Hominidae;
            Homo; sapiens.
REFERENCE   1 (bases 1 to 1400)
  AUTHORS   Ikuta,T., Szeto,S. and Yoshida,A.
  TITLE     Three human alcohol dehydrogenase subunits: cDNA structure and
            molecular and evolutionary divergence
  JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 83 (3), 634-638 (1986)
  STANDARD  full staff_review
COMMENT     A draft entry and printed copy of the sequence in [1] were kindly
            provided by A.Yoshida, 30-MAY-1986. The other human class I ADH1
            alpha subunit sequence is found under accession M11307.
FEATURES             Location/Qualifiers
     mRNA            <1..1400
                     /note="ADH1 mRNA"
     CDS             16..1143
                     /note="alcohol dehydrogenase alpha subunit (EC 1.1.1.1)"
                     /map="'4q21' /hgml_locus_uid='LJ0082S'"
                     /gene="ADH1"
BASE COUNT      400 a    294 c    340 g    366 t
ORIGIN      52 bp upstream of PvuII site; chromosome 4q21.
        1 gaagacagaa tcaacatgag cacagcagga aaagtaatca aatgcaaagc agctgtgcta
       61 tgggagttaa agaaaccctt ttccattgag gaggtggagg ttgcacctcc taaggcccat
      121 gaagttcgta ttaagatggt ggctgtagga atctgtggca cagatgacca cgtggttagt
      181 ggtaccatgg tgaccccact tcctgtgatt ttaggccatg aggcagccgg catcgtggag
      241 agtgttggag aaggggtgac tacagtcaaa ccaggtgata aagtcatccc actcgctatt
      301 cctcagtgtg gaaaatgcag aatttgtaaa aacccggaga gcaactactg cttgaaaaac
      361 gatgtaagca atcctcaggg gaccctgcag gatggcacca gcaggttcac ctgcaggagg
      421 aagcccatcc accacttcct tggcatcagc accttctcac agtacacagt ggtggatgaa
      481 aatgcagtag ccaaaattga tgcagcctcg cctctagaga aagtctgtct cattggctgt
      541 ggattttcaa ctggttatgg gtctgcagtc aatgttgcca aggtcacccc aggctctacc
      601 tgtgctgtgt ttggcctggg aggggtcggc ctatctgcta ttatgggctg taaagcagct
      661 ggggcagcca gaatcattgc ggtggacatc aacaaggaca aatttgcaaa ggccaaagag
      721 ttgggggcca ctgaatgcat caaccctcaa gactacaaga aacccatcca ggaggtgcta

26

ENTRY        DEHUAA #Type Protein
TITLE        Alcohol dehydrogenase alpha chain - Human #EC - number 1.1.1.1
DATE         28-Dec-1987 #Sequence 28-Dec-1987 #Text 30-Sep-1989
PLACEMENT    27.0 1.0 1.0 1.0 1.0
SOURCE       Homo sapiens # Common-name man
ACCESSION    A25428
REFERENCE    (Sequence translated from the mRNA sequence)
  #Authors   Ikuta T., Szeto S., Yoshida A.
  #Journal   Proc. Nat. Acad. Sci. USA (1986) 83:634-638
  #Title     Three human alcohol dehydrogenase subunits: cDNA structure
             and molecular and evolutionary divergence.
GENETIC      #Map-position 4q21-q25 #Name ADH1
SUPERFAMILY  #Name alcohol dehydrogenase
KEYWORDS     oxidoreductase
SUMMARY      #Molecular-weight 39858 #Length 375 #Checksum 7545
SEQUENCE
            5        10        15        20        25        30
  1 M S T A G K V I K C K A A V L W E L K K P F S I E E V E V A
 31 P P K A H E V R I K M V A V G I C G T D D H V V S G T M V T
 61 P L P V I L G H E A A G I V E S V G E G V T T V K P G D K V
 91 I P L A I P Q C G K C R I C K N P E S N Y C L K N D V S N P
121 Q G T L Q D G T S R F T C R R K P I H H F L G I S T F S Q Y
151 T V V D E N A V A K I D A A S P L E K V C L I G C G F S T G
181 Y G S A V N V A K V T P G S T C A V F G L G G V G L S A I M
211 G C K A A G A A R I I A V D I N K D K F A K A K E L G A T E
241 C I N P Q D Y K K P I Q E V L K E M T D G G V D F S F E V I
271 G R L D T M M A S L L C C H E A C G T S V I V G V P P D S Q
301 N L S M N P M L L L T G R T W K G A I L G G F K S K E C V P
331 K L V A D F M A K K F S L D A L I T H V L P F E K I N E G F
361 D L L H S G K S I R T I L M F
///

27

Same Publication?

JOURNAL Proc. Natl. Acad. Sci. U.S.A. 83 (3), 634-638 (1986)

#Journal Proc. Nat. Acad. Sci. USA (1986) 83:634-638
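The two journal lines above name the same article but differ in abbreviation, punctuation, and field order, so a plain string comparison fails. As a minimal sketch of one way to compare them, the snippet below reduces each line to a "year volume pages" key. The helper name `citation_key` and both `sed` patterns are assumptions written for these two example lines only, not a general citation matcher.

```shell
#!/bin/sh
# citation_key (hypothetical helper): reduce a journal line in either of the
# two layouts shown on the slide to "year volume pages".
citation_key() {
  printf '%s\n' "$1" | sed \
    -e 's/.* \([0-9][0-9]*\) ([0-9]*), \([0-9][0-9]*-[0-9][0-9]*\) (\([0-9][0-9]*\))$/\3 \1 \2/' \
    -e 's/.* (\([0-9][0-9]*\)) \([0-9][0-9]*\):\([0-9][0-9]*-[0-9][0-9]*\)$/\1 \2 \3/'
}

genbank='JOURNAL Proc. Natl. Acad. Sci. U.S.A. 83 (3), 634-638 (1986)'
pir='#Journal Proc. Nat. Acad. Sci. USA (1986) 83:634-638'

# Compare the normalized keys instead of the raw strings.
if [ "$(citation_key "$genbank")" = "$(citation_key "$pir")" ]; then
  echo "same year, volume, and pages"
fi
```

Matching on year, volume, and pages sidesteps the journal-name spelling entirely, which is the part that differs most between the two records.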

28

Exponential Growth

29

Sequence Identifiers

Accession: AH006997
GI Number: 6849043
Accn.Ver: AH006997.2
FASTA: >gi|6849043|gb|AH006997.2
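The FASTA definition line packs the other three identifiers into one pipe-delimited string. A small sketch of pulling them apart in the shell follows; `parse_defline` is a hypothetical helper, and it assumes the four-field `gi|number|db|accession.version` layout shown on the slide.

```shell
#!/bin/sh
# parse_defline (hypothetical helper): split a ">gi|N|db|ACC.VER" line
# into its GI number, accession, and version.
parse_defline() {
  line=${1#">"}            # drop the leading ">"
  old_ifs=$IFS
  IFS='|'
  set -- $line             # fields: gi, GI number, database tag, accession.version
  IFS=$old_ifs
  gi=$2
  accver=$4
  accession=${accver%%.*}  # text before the first "."
  version=${accver#*.}     # text after the first "."
  printf 'gi=%s accession=%s version=%s\n' "$gi" "$accession" "$version"
}

parse_defline '>gi|6849043|gb|AH006997.2'
# prints: gi=6849043 accession=AH006997 version=2
```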

30

Sequence Assembly

[Diagram: layered assembly; the grouping of labels below is reconstructed from the slide layout]

NC_000022.9
  join(gap(14430000),gi|89058412:1..647850,gap(150000),gi|29806588:1..3661581 …)

NT_028395.3  NT_011519.10
  join(gi|5931500:1..37693,gi|5931501:2273..41306 …)

AP000522.1  AP000523.1
  GATCTGATAAGTCCCAGGAC …
  … TGGTATCCACCTGGGGCCTG …

31

Features and Qualifiers

gene            1..417
                /gene="INS"
                /db_xref="GeneID:449570"
CDS             60..392
                /gene="INS"
                /codon_start=1
                /product="proinsulin precursor"
                /protein_id="NP_001008996.1"
                /translation="MALWMRLLPLL ... YQLENYCN"
sig_peptide     60..131
                /gene="INS"
mat_peptide     132..389
                /gene="INS"
                /product="Insulin"

32

Graphical Views

33

Translation Validation

DNA           ...cgaaaagGTGGTAGTGTAGGAGACGGTGAAGctaaga...
/translation  - V V * E T V K
Protein       M V V L E T E K

• SEQ_FEAT_StartCodon
• SEQ_FEAT_MismatchAA
• SEQ_FEAT_InternalStop
• SEQ_FEAT_NotSpliceConsensusDonor

34

Alignments

• Describe relationships between sequences

• Can reflect evolutionary conservation, structural similarity, functional similarity

• Can be generated algorithmically (e.g., BLAST) or manually

MRLTLLC-------EGEEGSELPLCASCGQRIELKYKPECYPDVKNSLHV
MRLTLLCCTWREERMGEEGSELPVCASCGQRLELKYKPECFPDVKNSIHA
MRLTCLCRTWREERMGEEGSEIPVCASCGQRIELKYKPE-----------

35

Original Databases

[Diagram: three linked databases]

• Databases: MEDLINE, Nucleotide, Protein
• Link labels: Term frequency statistics; Nucleotide sequence similarity; Amino acid sequence similarity; Coding region features; Literature citations in sequence (shown on two links)

36

Discovery Space

[Diagram. Labels: Nucleotide sequences; Protein sequences; Taxon; Phylogeny; MMDB; 3-D Structure; PubMed; PubMed abstracts; Publishers; Entrez Genomes; Complete Genomes; Genome Centers]

37

Data Integration

38

Leveraging Resources

GenBank

RefSeq

Human Genome

Bacterial Genome

Virus Genome

MMDB

PubMed

UniGene(s)

LocusLink

OMIM

Taxonomy

GEO

PopSet

BLAST

Entrez

ePCR

Sequin

39

Entrez Utilities

• EInfo

• ESearch

• ESummary

• EFetch

• ELink

• EPost

40

EUtils Base URL

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/program.fcgi?arguments

41

EUtils Arguments

db          pubmed | nucleotide | protein

term        transposition+AND+(protease+OR+peptidase)
id          172344,U54439.1

rettype     abstract | acc | seqid | gb | fasta | count
retmode     text | xml | asn.1
retstart
retmax

datetype    mdat | pdat | edat
reldate     60

dbfrom      pubmed | nucleotide | protein
cmd         neighbor
linkname    gene_snp_genegenotype

usehistory  y
WebEnv      NCID_1_216999436_130...086_61936294
query_key   1

version     2.0
tool
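Every EUtils request follows the same pattern: the base URL, the program name with `.fcgi`, a `?`, and the arguments joined with `&`. A minimal sketch of assembling such a URL in the shell is shown here; `eutils_url` is a hypothetical helper, and it assumes the argument values are already URL-encoded.

```shell
#!/bin/sh
# Base URL as given on the "EUtils Base URL" slide.
base='http://eutils.ncbi.nlm.nih.gov/entrez/eutils'

# eutils_url PROGRAM KEY=VALUE... (hypothetical helper)
# Joins the key=value pairs with "&" and prints the complete request URL.
eutils_url() {
  prog=$1
  shift
  args=''
  for kv in "$@"; do
    args="${args:+$args&}$kv"   # insert "&" only between pairs
  done
  printf '%s/%s.fcgi?%s\n' "$base" "$prog" "$args"
}

eutils_url esearch db=pubmed 'term=transposition+AND+(protease+OR+peptidase)'
```

The printed URL can be passed directly to `curl -s`.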

42

rettype=abstract

1. Mol Microbiol. 2012 Feb;83(4):805-20.

Separate structural and functional domains of Tn4430 transposase contribute to target immunity.

Lambin M, Nicolas E, Oger CA, Nguyen N, Prozzi D, Hallet B.

GSK Biologicals, Rue Flemming, 20, 1300 Wavre, Belgium. bernard.hallet@uclouvain.be

Like other transposons of the Tn3 family, Tn4430 exhibits target immunity, a process that prevents multiple insertions of the transposon into the same DNA molecule. Immunity is conferred by the terminal inverted repeats of the transposon and is specific to each element of the family, indicating that the transposase ... transposition. One class of mutations was found to stimulate transposition, whereas other mutations appeared to reduce TnpA activity. The data are discussed with respect to alternative models in which TnpA acts as a specific determinant to both establish and respond to immunity.

PMID: 22624153 [PubMed - indexed for MEDLINE]

43

rettype=medline

PMID- 22624153
OWN - NLM
STAT- MEDLINE
DA  - 20120523
DCOM- 20120529
IS  - 1365-2958 (Electronic)
IS  - 0950-382X (Linking)
VI  - 83
IP  - 4
DP  - 2012 Feb
TI  - Separate structural and functional domains of Tn4430 transposase
      contribute to target immunity.
PG  - 805-20
AB  - Like other transposons of the Tn3 family, Tn4430 exhibits target
      immunity, a process that prevents multiple insertions of the ...
AD  - GSK Biologicals, Rue Flemming, 20, 1300 Wavre, Belgium.
      bernard.hallet@uclouvain.be
...
AID - 10.1111/j.1365-2958.2012.07967.x [doi]
PST - ppublish
SO  - Mol Microbiol. 2012 Feb;83(4):805-20.

44

EInfo URLs

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=pubmed

45

curl Command in Terminal

https://itservices.stanford.edu/service/sharedcomputing/loggingin

curl -s "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi"

46

Entrez Databases

<?xml version="1.0"?>
<!DOCTYPE eInfoResult PUBLIC "-//NLM//DTD...>
<eInfoResult>
  <DbList>
    <DbName>pubmed</DbName>
    <DbName>protein</DbName>
    <DbName>nuccore</DbName>
    <DbName>nucleotide</DbName>
    <DbName>nucgss</DbName>
    <DbName>nucest</DbName>
    <DbName>structure</DbName>
    <DbName>genome</DbName>
    ...

47

PubMed Fields

<?xml version="1.0"?>
<!DOCTYPE eInfoResult PUBLIC "-//NLM//DTD...>
<eInfoResult>
  <DbInfo>
    <DbName>pubmed</DbName>
    <MenuName>PubMed</MenuName>
    <Description>PubMed bibliographic record</Description>
    <Count>22006701</Count>
    <LastUpdate>2012/08/04 03:30</LastUpdate>
    <FieldList>
      ...
      <Field>
        <Name>TIAB</Name>
        <FullName>Title/Abstract</FullName>
        <Description>Free text associated with Abstract/Title</Description>
        <TermCount>38990504</TermCount>
        <IsDate>N</IsDate>
        <IsNumerical>N</IsNumerical>
        <SingleToken>N</SingleToken>
        <Hierarchy>N</Hierarchy>
        <IsHidden>N</IsHidden>
      </Field>
      ...

48

PubMed Links

<?xml version="1.0"?>
<!DOCTYPE eInfoResult PUBLIC "-//NLM//DTD...>
<eInfoResult>
  <DbInfo>
    <DbName>pubmed</DbName>
    <MenuName>PubMed</MenuName>
    ...
    <LinkList>
      ...
      <Link>
        <Name>pubmed_pubmed</Name>
        <Menu>Related Citations</Menu>
        <Description>Calculated set of PubMed ...</Description>
        <DbTo>pubmed</DbTo>
      </Link>
      ...
      <Link>
        <Name>pubmed_structure</Name>
        <Menu>Structure Links</Menu>
        <Description>Three-dimensional structure ...</Description>
        <DbTo>structure</DbTo>
      </Link>
      ...

49

ESearch URL

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=transposition+immunity

50

ESummary URL

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&version=2.0&id=2539356

51

EFetch URL

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&rettype=abstract&id=2539356

52

ELink URL

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&db=pubmed&cmd=neighbor&linkname=pubmed_pubmed&id=2539356

53

curl GET and POST

curl -s "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=transposition+immunity"

curl -s "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi" \
  -d "db=pubmed&id=22624153,22555593,22253773,21729108,..."
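The POST form matters once the ID list is too long to fit comfortably in a URL. As a rough sketch, the body for `curl -d` can be built from a file or pipe of IDs, one per line; `post_body` is a hypothetical helper, and the file name `ids.txt` in the usage comment is also an assumption.

```shell
#!/bin/sh
# post_body DB (hypothetical helper): read IDs, one per line, on stdin and
# print a "db=...&id=a,b,c" string suitable for curl -d.
post_body() {
  ids=$(paste -s -d, -)     # join all input lines with commas
  printf 'db=%s&id=%s\n' "$1" "$ids"
}

# Usage (not run here, requires network access):
#   curl -s "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi" \
#     -d "$(post_body pubmed < ids.txt)"
```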

54

Cluttered Result

<?xml version="1.0" ?><!DOCTYPE eSearchResult PUBLIC "-//NLM//DTD eSearchResult, 11 May 2002//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSearch_020511.dtd"><eSearchResult><Count>94</Count><RetMax>20</RetMax><RetStart>0</RetStart><IdList> <Id>22624153</Id> <Id>22555593</Id> <Id>22253773</Id> <Id>21729108</Id> <Id>21695252</Id> <Id>21347312</Id> <Id>20603074</Id> <Id>20481492</Id> <Id>20004590</Id> <Id>19464182</Id> <Id>19431236</Id> <Id>19237527</Id> <Id>19188259</Id> <Id>19144000</Id> <Id>19120617</Id> <Id>18931389</Id> <Id>18838147</Id> <Id>18396069</Id> <Id>17966893</Id> <Id>17709741</Id> </IdList><TranslationSet><Translation> <From>immunity</From> <To>"immunity"[MeSH Terms] OR "immunity"[All Fields]</To> </Translation></TranslationSet><TranslationStack> <TermSet> <Term>transposition[All Fields]</Term> <Field>All Fields</Field> <Count>19362</Count> <Explode>Y</Explode> </TermSet> <TermSet> <Term>"immunity"[MeSH Terms]</Term> <Field>MeSH Terms</Field> <Count>252127</Count> <Explode>Y</Explode> </TermSet> <TermSet> <Term>"immunity"[All Fields]</Term> <Field>All Fields</Field> <Count>189033</Count> <Explode>Y</Explode> </TermSet> <OP>OR</OP> <OP>GROUP</OP> <OP>AND</OP> <OP>GROUP</OP> </TranslationStack><QueryTranslation>transposition[All Fields] AND ("immunity"[MeSH Terms] OR "immunity"[All Fields])</QueryTranslation></eSearchResult>

55

Cleaned for Parsing

<?xml version="1.0"?>
<!DOCTYPE eSearchResult PUBLIC "-//NLM//DTD...>
<eSearchResult>
  <Count>94</Count>
  <RetMax>20</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>22624153</Id>
    <Id>22555593</Id>
    <Id>22253773</Id>
    <Id>21729108</Id>
    <Id>21695252</Id>
    <Id>21347312</Id>
    <Id>20603074</Id>
    <Id>20481492</Id>
    <Id>20004590</Id>
    <Id>19464182</Id>
    ...

56

Reformat XML

xmllint --format -

Input:

...<IdList> <Id>22624153</Id> <Id>22555593</Id> <Id>22253773</Id><Id>21729108</Id> <Id>21695252</Id> <Id>21347312</Id> <Id>20603074</Id>...

Output:

...
  <IdList>
    <Id>22624153</Id>
    <Id>22555593</Id>
    <Id>22253773</Id>
    <Id>21729108</Id>
    <Id>21695252</Id>
    <Id>21347312</Id>
    <Id>20603074</Id>
...

57

Extract ID Numbers

perl -nle 'print /(?<=<Id>).*?(?=<\/Id>)/g'

Input:

...
  <IdList>
    <Id>22624153</Id>
    <Id>22555593</Id>
    <Id>22253773</Id>
...

Output:

...
22624153
22555593
22253773
...

58

Remove Blank Lines

grep '[0-9]'

Input (lines without an ID come through as blank lines):

22624153
22555593
22253773

...

Output:

22624153
22555593
22253773
...
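The extract and filter steps above depend on `xmllint` having put each `<Id>` element on its own line first. As an alternative sketch that does not care about line layout, the snippet below breaks the text at every `<` and keeps only the `Id` elements, which also leaves no blank lines to remove. `extract_ids` is a hypothetical helper, and like the Perl one-liner it is pattern matching rather than real XML parsing.

```shell
#!/bin/sh
# extract_ids (hypothetical helper): read ESearch XML on stdin and print one
# numeric ID per line, whether or not the XML has been reformatted.
extract_ids() {
  # After tr, each opening tag starts its own line as "Id>12345".
  tr '<' '\n' | sed -n 's/^Id>\([0-9][0-9]*\)$/\1/p'
}

printf '%s\n' '<IdList> <Id>22624153</Id> <Id>22555593</Id><Id>22253773</Id> </IdList>' | extract_ids
```

Only lines that match the pattern are printed, so the separate `grep` step is unnecessary in this variant.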

59

UNIX Pipes

curl -s "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi" \
  -d "db=pubmed&term=transposition+immunity" | \
xmllint --format - | \
perl -nle 'print /(?<=<Id>).*?(?=<\/Id>)/g' | \
grep '[0-9]'

60

Resulting List of IDs

22624153
22555593
22253773
21729108
21695252
21347312
20603074
20481492
20004590
19464182
...

61

UNIX Shell Script

#!/bin/sh
# Usage: esrch.sh database "query" [days]

# Percent-encode the characters that commonly appear in Entrez queries
encoded=$(echo "$2" | sed -e 's/ /%20/g' -e 's/\&/%26/g' -e 's/'\''/%27/g' \
  -e 's/(/%28/g' -e 's/)/%29/g' -e 's/,/%2c/g' -e 's/\[/%5b/g' -e 's/\]/%5d/g')

base='http://eutils.ncbi.nlm.nih.gov/entrez/eutils'
suffix="&rettype=xml&retmax=200"
if [ -n "$3" ]; then
  # Optional third argument limits results to the last N days
  suffix="&rettype=xml&retmax=200&reldate=$3"
fi

res=`curl -s "$base/esearch.fcgi?db=$1&term=$encoded$suffix"`

flt=`echo $res | xmllint --format - | \
  perl -nle 'print /(?<=<Id>).*?(?=<\/Id>)/g' | grep '[0-9]'`

for uid in $flt
do
  echo "$uid"
done

./esrch.sh pubmed "transposition immunity Tn3" 365
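The `sed` chain in the script is a hand-rolled percent-encoder. A sketch of the same idea as a reusable function follows; `urlencode` is a hypothetical name, and the extra rules for `%`, `"`, and `+` are additions that are not in the original script. The `%` rule has to run first so that the escapes produced by later rules are not encoded a second time.

```shell
#!/bin/sh
# urlencode STRING (hypothetical helper): percent-encode the characters that
# commonly appear in Entrez queries. Not a complete RFC 3986 encoder.
urlencode() {
  printf '%s' "$1" | sed -e 's/%/%25/g' -e 's/ /%20/g' -e 's/"/%22/g' \
    -e 's/&/%26/g' -e "s/'/%27/g" -e 's/(/%28/g' -e 's/)/%29/g' \
    -e 's/+/%2b/g' -e 's/,/%2c/g' -e 's/\[/%5b/g' -e 's/]/%5d/g'
}

urlencode 'transposition [TITL] AND (protease OR peptidase)'
# prints: transposition%20%5bTITL%5d%20AND%20%28protease%20OR%20peptidase%29
```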

62

ESearch -> ESummary

#!/bin/sh
encoded=$(echo "$2" | sed -e 's/ /%20/g' -e 's/\&/%26/g' -e 's/'\''/%27/g' \
  -e 's/(/%28/g' -e 's/)/%29/g' -e 's/,/%2c/g' -e 's/\[/%5b/g' -e 's/\]/%5d/g')
base='http://eutils.ncbi.nlm.nih.gov/entrez/eutils'
res=`curl -s "$base/esearch.fcgi?db=$1&term=$encoded&rettype=xml&retmax=200"`
flt=`echo $res | xmllint --format - | \
  perl -nle 'print /(?<=<Id>).*?(?=<\/Id>)/g' | grep '[0-9]'`

# Request one document summary per ID found by the search
for uid in $flt
do
  res=`curl -s "$base/esummary.fcgi?db=$1&version=2.0&id=$uid"`
  sum=`echo $res | xmllint --format -`
  echo "$sum"
done

63

ESearch -> IDs

#!/bin/sh
encoded=$(echo "$2" | sed -e 's/ /%20/g' -e 's/\&/%26/g' -e 's/'\''/%27/g' \
  -e 's/(/%28/g' -e 's/)/%29/g' -e 's/,/%2c/g' -e 's/\[/%5b/g' -e 's/\]/%5d/g')
base='http://eutils.ncbi.nlm.nih.gov/entrez/eutils'
res=`curl -s "$base/esearch.fcgi?db=$1&term=$encoded&rettype=xml&retmax=200"`
flt=`echo $res | xmllint --format - | \
  perl -nle 'print /(?<=<Id>).*?(?=<\/Id>)/g' | grep '[0-9]'`
# Print one ID per line so another script can read them from a pipe
for uid in $flt
do
  echo "$uid"
done

64

IDs -> ESummary

#!/bin/sh
base='http://eutils.ncbi.nlm.nih.gov/entrez/eutils'
# Read IDs from standard input, one per line
while read uid; do
  res=`curl -s "$base/esummary.fcgi?db=$1&version=2.0&id=$uid"`
  sum=`echo $res | xmllint --format -`
  echo "$sum"
done

./esrch.sh pubmed "transposition immunity" | ./esmry.sh pubmed

65

IDs -> E-Mail Notification

#!/bin/sh
# $1 is the message subject, $2 is the recipient address
while read uid; do
  echo $uid | mail -s "$1" "$2"
done

./esrch.sh pubmed "Competitor JQ [AUTH]" 30 | \
  ./eping.sh "Read this new publication" "myemail@myschool.edu"

66

Document Summaries

<eSummaryResult>
  <DocumentSummarySet status="OK">
    <DocumentSummary uid="22624153">
      <PubDate>2012 Feb</PubDate>
      <EPubDate/>
      <Source>Mol Microbiol</Source>
      <Authors>
        <Author>
          <Name>Lambin M</Name>
          <AuthType> Author </AuthType>
          <ClusterID>0</ClusterID>
        </Author>
        <Author>
          <Name>Nicolas E</Name>
          <AuthType> Author </AuthType>

67

Use History

curl -s "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=transposition+immunity&usehistory=y"

<eSearchResult>
  <Count>94</Count>
  <RetMax>20</RetMax>
  <RetStart>0</RetStart>
  <QueryKey>1</QueryKey>
  <WebEnv>NCID_1_216310091_130.14.18.97_5555_1343867165_1026563511</WebEnv>
  <IdList>
    <Id>22624153</Id>
    <Id>22555593</Id>
    ...

68

WebEnv and query_key

curl -s "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&version=2.0&query_key=1&WebEnv=NCID_1_216310091_130.14.18.97_5555_1343867165_1026563511"
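To chain the two calls in a script, the `QueryKey` and `WebEnv` values have to be read out of the ESearch result and placed into the follow-up URL. The sketch below shows one way to do that with `tr` and `sed`; `xml_value` is a hypothetical helper that returns the text of the first matching element, and it is simple pattern matching, not an XML parser.

```shell
#!/bin/sh
base='http://eutils.ncbi.nlm.nih.gov/entrez/eutils'

# xml_value TAG (hypothetical helper): read XML on stdin and print the text
# of the first <TAG>...</TAG> element.
xml_value() {
  tr '<' '\n' | sed -n "s/^$1>\\(.*\\)\$/\\1/p" | head -n 1
}

# Abbreviated ESearch result, using the values shown on the slide.
result='<eSearchResult> <Count>94</Count> <QueryKey>1</QueryKey> <WebEnv>NCID_1_216310091_130.14.18.97_5555_1343867165_1026563511</WebEnv> </eSearchResult>'

key=$(printf '%s\n' "$result" | xml_value QueryKey)
web=$(printf '%s\n' "$result" | xml_value WebEnv)

# URL for the follow-up ESummary call against the stored result set.
url="$base/esummary.fcgi?db=pubmed&version=2.0&query_key=$key&WebEnv=$web"
echo "$url"
```

In a real script, `result` would come from `curl -s` on an ESearch URL that includes `usehistory=y`.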

69

PERL Script

#!/usr/bin/perl
use LWP::Simple;

$dbase = shift or die "Must supply database on command line\n";
$query = shift or die "Must supply query on command line\n";
$base = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/';

# Search with usehistory=y so the result set is kept on the History server
$url = $base . "esearch.fcgi?db=$dbase&term=$query&retmax=0&usehistory=y";
$output = get($url);

# Read the WebEnv and QueryKey values from the ESearch XML
$web = $1 if ($output =~ /<WebEnv>(\S+)<\/WebEnv>/);
$key = $1 if ($output =~ /<QueryKey>(\S+)<\/QueryKey>/);

# Fetch the stored result set as FASTA text
$url = $base . "efetch.fcgi?db=$dbase&query_key=$key&WebEnv=$web";
$url .= "&rettype=fasta&retmode=text";
$data = get($url);

print "$data";

close (STDOUT);

./efaftch.pl nucleotide M65061+OR+U54469

70

ESearch -> XML

#!/usr/bin/perl
use LWP::Simple;

$dbase = shift or die "Must supply database on command line\n";
$query = shift or die "Must supply query on command line\n";
$days = shift or "";

$base = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/';

$url = $base . "esearch.fcgi?db=$dbase&term=$query&retmax=0&usehistory=y";
if ( $days ne "" ) {
  $url .= "&reldate=$days";
}

$output = get($url);

print "$output";

close (STDOUT);

71

XML -> EFetch [1]

#!/usr/bin/perl
use LWP::Simple;

$dbase = shift or die "Must supply database on command line\n";
$type = shift or die "Must supply rettype on command line\n";
$base = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/';

while ($thisline = <STDIN>) {
  $thisline =~ s/\r//;
  $thisline =~ s/\n//;
  $web = $1 if ($thisline =~ /<WebEnv>(\S+)<\/WebEnv>/);
  $key = $1 if ($thisline =~ /<QueryKey>(\S+)<\/QueryKey>/);
  $num = $1 if ($thisline =~ /<Count>(\S+)<\/Count>/);
}

...

72

XML -> EFetch [2]

...

$start = 0;
$chunk = 500;

# Fetch the result set in chunks of 500 records, pausing between requests
while ( $num > 0 ) {
  $url = $base . "efetch.fcgi?db=$dbase&query_key=$key&WebEnv=$web";
  $url .= "&retstart=$start&retmax=$chunk&rettype=$type&retmode=text";

  $data = get($url);

  print "$data";

  $start += $chunk;
  $num -= $chunk;

  sleep 1;
}

close (STDIN);
close (STDOUT);

./esrch.pl nucleotide 1322283 | ./eftch.pl nucleotide fasta
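The Perl loop pages through a large result set by advancing `retstart` in steps of `retmax`. The same arithmetic, sketched in the shell without any network calls, makes the sequence of requests easy to see. `chunk_offsets` is a hypothetical helper; the count of 1200 in the example is made up for illustration.

```shell
#!/bin/sh
# chunk_offsets COUNT CHUNK (hypothetical helper): print the retstart/retmax
# argument pair for each request needed to cover COUNT records.
chunk_offsets() {
  start=0
  while [ "$start" -lt "$1" ]; do
    printf 'retstart=%s&retmax=%s\n' "$start" "$2"
    start=$((start + $2))
  done
}

chunk_offsets 1200 500
# prints:
# retstart=0&retmax=500
# retstart=500&retmax=500
# retstart=1000&retmax=500
```

Each printed pair would be appended to an EFetch URL that also carries `query_key` and `WebEnv`, with a pause between requests as in the Perl version.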

73

EBot

74

Text Query

75

Second Step

76

Output Format

77

Generate Script

78

EBot Result

DEFINITION  alcohol dehydrogenase [Cyberlindnera jadinii].
ACCESSION   BAM34535
VERSION     BAM34535.1  GI:398298384
DBSOURCE    accession AB649224.1
KEYWORDS    .
SOURCE      Cyberlindnera jadinii
  ORGANISM  Cyberlindnera jadinii
            Eukaryota; Fungi; Dikarya; Ascomycota; Saccharomycotina;
            Saccharomycetes; Saccharomycetales; Phaffomycetaceae;
            Cyberlindnera.
REFERENCE   1
  AUTHORS   Tamakawa,H., Tomita,Y., Yokoyama,A., Konoeda,Y. and Yoshida,S.
...
FEATURES             Location/Qualifiers
     source          1..348
                     /organism="Cyberlindnera jadinii"
                     /strain="NBRC0988"
                     /db_xref="taxon:4903"
                     /note="anamorph: Candida utilis"
     Protein         1..348
                     /product="alcohol dehydrogenase"
     CDS             1..348
                     /gene="ADH1"
                     /coded_by="AB649224.1:1..1047"
ORIGIN
        1 msipktqkgv ifyenggple ykdipvptpk pneilvnvky sgvchtdlha wkgdwplpvk
       61 lplvgghega gvvvakgsev knfeigdyag ikwlngscms cefceksfea ncpkadlsgy
      121 thdgsfqqya tadavqaaki skgtdlaeia pilcagvtvy kalktadlep gewvaisgag
      181 gglgslaiqf akamglrvla idggddkkql cqelgaevfi dftktkdivk siqdatnggp
      241 hgvinvsvse kaieqsteyv rncgtvvlvg lpagavaraq vfaavvksis vkgsyvgnra
      301 dtreaidffe rglvkapiki vglselpevy klmeegkilg ryvvdtsk
//

LOCUS       EJF61282    496 aa    linear    PLN    12-JUL-2012
DEFINITION  alcohol dehydrogenase [Dichomitus squalens LYAD-421 SS1].
ACCESSION   EJF61282
VERSION     EJF61282.1  GI:395328892
DBSOURCE    accession JH719411.1
...

79

References

• Entrez Programming Utilities Help
  http://www.ncbi.nlm.nih.gov/books/NBK25501/

• EBot
  http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/ebot/ebot.cgi

• MeSH Browser
  http://www.nlm.nih.gov/mesh/MBrowser.html

80
