Bioinformatics Resources - Swissprot
Lecture & Exercises
Prof. B. Rost, Dr. L. Richter, J. Reeb
Institut für Informatik I12
BioinfRes SoSe 16



Putative Schedule

Apr. 22nd   Intro, General Overview (1. sh.)
Apr. 29th   Sequence Databases (2. sh.)
May 6th     No lecture
May 13th    Sequence Databases (3. sh.)
May 20th    Structure Databases (4. sh.)*
May 27th    SQL (5. sh.)
Jun 3rd     SQL (6. sh.)
Jun 10th    No-SQL (7. sh.)
Jun 17th    No-SQL (8. sh.)*
Jun 24th    JavaScript / UI (9. sh.)
Jul 1st     Web Services (10. sh.)
Jul 8th     Bioinformatics Suites / Forums
Jul 15th    Wrap Up, Q&A
Jul 28th    Exam, 10:30-12:00 MW1050

* These exercises can earn you a bonus

XML Infusion (in 10 sec)

●  compilation from http://www.w3schools.com/xml/default.asp
●  XML is a software- and hardware-independent tool to store and to transport data
●  XML stands for eXtensible Markup Language
●  designed to store and transport data
●  designed to be self-descriptive
●  W3C recommendation
●  it does NOT DO anything

About Tags

●  XML tags are not predefined like HTML tags
●  everybody can/has to invent his own tags
●  new tags can be added any time
●  the author has to define content and structure of the document
●  everything is plain text

Document Structure

    <?xml version="1.0" encoding="UTF-8"?>
    <bookstore>

      <book category="cooking">
        <title lang="en">Everyday Italian</title>
        <author>Giada De Laurentiis</author>
        <year>2005</year>
        <price>30.00</price>
      </book>

      <book category="children">
        <title lang="en">Harry Potter</title>
        <author>J K. Rowling</author>
        <year>2005</year>
        <price>29.99</price>
      </book>

      ....
    </bookstore>

taken from http://www.w3schools.com/xml/xml_usedfor.asp
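
To see how such a document is read by a program, the bookstore example can be parsed with Python's standard library. This is only a sketch: the function name `list_books` and the choice of returned fields are assumptions made for illustration, and the elided part of the slide ("....") is simply left out.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>
"""


def list_books(xml_text):
    """Return (category, title, price) for every <book> child of the root."""
    root = ET.fromstring(xml_text.encode("utf-8"))  # root element: <bookstore>
    books = []
    for book in root.findall("book"):               # child elements of the root
        category = book.get("category")             # attribute value
        title = book.findtext("title")              # text content of a child
        price = float(book.findtext("price"))
        books.append((category, title, price))
    return books


for entry in list_books(BOOKSTORE_XML):
    print(entry)
```

The root element, the nested child elements and the attributes map directly onto `root`, `findall`/`findtext` and `get`.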

Syntax Rules

●  elements are defined using tags: <tagName> ... </tagName> or <tagName/>
●  elements can be nested (contain other elements - parent and child nodes, sibling nodes)
●  elements can have text content
●  each document must contain ONE root element that is the parent of all other elements

Syntax Refined

●  prolog line <?xml ...> is optional
●  tags must be (self-)closed
●  tags are case sensitive
●  tags must be properly nested:
   <a><b>....</a></b>   Wrong!
   <a><b>....</b></a>   Right!

Syntax Refined

●  tags may have attributes
●  attribute values must always be quoted
●  some special characters cannot be used directly
●  -> coded by entity references:
   &lt;     <   less than
   &gt;     >   greater than
   &amp;    &   ampersand
   &apos;   '   apostrophe
   &quot;   "   quotation mark
●  comments: <!-- .... -->
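
The nesting rule and the entity references can be tried out with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `is_well_formed` is an assumption for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<a><b>text</b></a>"))  # properly nested
print(is_well_formed("<a><b>text</a></b>"))  # wrongly nested
print(is_well_formed("<a>1 < 2</a>"))        # bare "<" in text content

# escape() replaces &, < and > by their entity references.
safe_text = escape("1 < 2 & 3 > 2")
print(safe_text)
print(is_well_formed("<a>" + safe_text + "</a>"))
```

Note that `escape` only handles `&`, `<` and `>` by default; quotes inside attribute values need extra care (for example `xml.sax.saxutils.quoteattr`).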

Tag Names

●  case sensitive
●  must start with a letter or underscore
●  must not start with the letters xml (in any capitalization)
●  can contain: letters, digits, hyphens, underscores and periods
●  cannot contain spaces
●  apply common sense and a consistent style
●  avoid: minus (-), period (.), colon (:), non-English characters for compatibility reasons

XML Element

●  everything between the start and the end tag
●  tags are included
●  can contain:
   -  text
   -  attributes
   -  other elements
   -  a mix of all
●  are extensible

XML Attributes

●  values must be quoted: single or double quotes
●  the quote character not used as delimiter can be used inside the value
●  there is no fixed rule for choosing between attribute and element, but:
   -  attributes cannot contain multiple values
   -  attributes cannot contain tree structures
   -  attributes are not easily expandable
●  useful to store metadata, like element id, etc.

A Glimpse of Namespaces

●  prevent tag name collisions between different authors/applications/domains
●  implemented by the introduction of prefixes
●  defined as an attribute: xmlns:prefix="URI"
●  usage: <prefix:tagName>
●  the URI only needs to be unique
●  used to integrate other specifications, e.g. XSLT
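
A short sketch of how prefixes keep two same-named tags apart. The document, the prefixes `h` and `f`, and the `example.org` URIs are made up for illustration; the URIs are never fetched, they only have to be unique.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two different <table> elements, told apart by prefix.
DOC = """<root xmlns:h="http://example.org/html"
      xmlns:f="http://example.org/furniture">
  <h:table>rows and cells</h:table>
  <f:table>wooden table</f:table>
</root>"""

root = ET.fromstring(DOC)

# ElementTree stores each tag as "{URI}localname"; the prefix itself is dropped.
tags = [child.tag for child in root]
print(tags)

# For lookups, a prefix-to-URI mapping is passed explicitly.
ns = {"h": "http://example.org/html", "f": "http://example.org/furniture"}
html_table = root.find("h:table", ns)
furniture_table = root.find("f:table", ns)
print(html_table.text, "|", furniture_table.text)
```

The parser resolves each prefix to its URI, which is why two documents may use different prefixes for the same namespace.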

Levels of Correctness

●  well formed: a document obeys the syntax rules:
   -  root element
   -  closing tag
   -  case sensitive
   -  properly nested
   -  attribute values quoted
●  valid documents: in addition to being well formed they also conform to a document type definition (format specification)

Document Type Definitions

●  two ways to specify a document structure:
●  DTD: Document Type Definition
●  XML Schema: XML based alternative to DTD

Example

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE note SYSTEM "Note.dtd">
    <note>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend! &copyright; </body>
    </note>

Example

    <!DOCTYPE note [
      <!ELEMENT note (to,from,heading,body)>
      <!ELEMENT to (#PCDATA)>
      <!ELEMENT from (#PCDATA)>
      <!ELEMENT heading (#PCDATA)>
      <!ELEMENT body (#PCDATA)>
      <!ENTITY copyright "Copyright by ..">
    ]>

XML DTD

●  referenced from a document with: <!DOCTYPE note SYSTEM "Note.dtd">
●  !DOCTYPE defines the root element
●  !ELEMENT defines the structure of the elements
●  #PCDATA means parsed character data (parse-able text)
●  !ENTITY defines special characters or strings
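
The effect of an !ENTITY declaration can be observed with Python's standard parser. The sketch below embeds the DTD as an internal subset (instead of the external Note.dtd) so that it is self-contained. Note the assumption about tooling: `xml.etree.ElementTree` expands internal entities but does not validate the document against the !ELEMENT declarations; validation would need a validating parser.

```python
import xml.etree.ElementTree as ET

NOTE_XML = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ENTITY copyright "Copyright by ..">
]>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend! &copyright;</body>
</note>"""

root = ET.fromstring(NOTE_XML)

# The entity reference &copyright; is replaced by its declared string.
body_text = root.find("body").text
print(body_text)
```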

XML Schema

●  alternative to DTD

    <xs:element name="note">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="to" type="xs:string"/>
          <xs:element name="from" type="xs:string"/>
          <xs:element name="heading" type="xs:string"/>
          <xs:element name="body" type="xs:string"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>

●  support of data types and namespaces
●  written in XML and extensible

Names and Other Complications

Amos Bairoch
taken from http://web.expasy.org/images/people/Amos_Bairoch.jpg

Ioannis Xenarios
taken from http://www.isb-sib.ch/people/Ioannis.Xenarios

History

1986   A. Bairoch created Swiss-Prot at the University of Geneva, since 1988 in collaboration with EMBL/EBI
1993   together with Ron Appel launch of ExPASy
1998   Foundation of SIB (Swiss Institute of Bioinformatics)
2002   Foundation of the UniProt consortium by EBI, SIB and PIR

UniProt Components:

●  UniProtKB:
   -  UniProtKB/Swiss-Prot
   -  UniProtKB/TrEMBL
●  UniParc: pure sequence archive, no annotations
●  UniRef: consists of three databases of clustered sets of protein sequences (UniRef100, UniRef90, UniRef50) using the CD-HIT algorithm
●  UniMes: data from metagenomic and environmental samples, not in UniProtKB

ExPASy

●  http://www.expasy.org
●  Expert Protein Analysis System (1993)
●  now: SIB ExPASy Bioinformatics Resources Portal
●  Artimo P, Jonnalagedda M, Arnold K, Baratin D, Csardi G, de Castro E, Duvaud S, Flegel V, Fortier A, Gasteiger E, Grosdidier A, Hernandez C, Ioannidis V, Kuznetsov D, Liechti R, Moretti S, Mostaguir K, Redaschi N, Rossier G, Xenarios I, and Stockinger H. ExPASy: SIB bioinformatics resource portal, Nucleic Acids Res, 40(W1):W597-W603, 2012.

Expasy Categories

●  Proteomics
●  Genomics
●  Structural Bioinformatics
●  Systems biology
●  Phylogeny/evolution

Expasy Categories

●  Population genetics
●  Transcriptomics
●  Biophysics
●  Imaging
●  Drug Design

Resource Description

1.  Resource name and description
2.  Maintaining SIB group
3.  Scientific category
4.  Keywords: a controlled vocabulary is used to tag the resource

Resource Description

5.  URL for the web interface and for the download if available
6.  Software type: website, command line interface, GUI, etc.
7.  Status: green checkbox if currently available

UniProt/SwissProt Statistics

●  Release 2016_05, May 11th
●  taken from http://web.expasy.org/docs/relnotes/relstat.html
●  551,193 sequence entries (548,454 in 2015_05) / 196,822,649 amino acids (195,409,447 in 2015_05)

UniProt/SwissProt Statistics

●  Growth over one year: 2016_5 vs 2015_5 (2015_5 values in parentheses)

Protein existence (PE)             Entries              %
1. Evidence at protein level       92,536 (85,419)      16.8 (15.6)
2. Evidence at transcript level    57,757 (61,814)      10.5 (11.3)
3. Inferred from homology          387,589 (387,733)    70.3 (70.7)
4. Predicted                       11,358 (11,526)      2.1 (2.1)
5. Uncertain                       1,953 (1,962)        0.4 (0.4)
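
The percentages in the protein existence table follow directly from the entry counts. A small check, using the 2016_05 counts from the table (the variable names are illustrative):

```python
# Entry counts per protein existence (PE) level, release 2016_05, as listed above.
pe_entries = {
    "Evidence at protein level": 92536,
    "Evidence at transcript level": 57757,
    "Inferred from homology": 387589,
    "Predicted": 11358,
    "Uncertain": 1953,
}

total = sum(pe_entries.values())

# Share of each PE level in percent, rounded to one decimal as in the table.
shares = {level: round(100 * count / total, 1) for level, count in pe_entries.items()}

print(total)
for level, share in shares.items():
    print(f"{level}: {share}%")
```

The five counts add up to the total number of sequence entries reported for the release, so every entry carries exactly one PE level.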

Development

taken from http://web.expasy.org/docs/relnotes/relstat1.png for release 2015_5

More Numbers (rel. 2015_5)

●  Represented species: 13,209
●  Top 20 species: 116,206 sequences, i.e. 21.3% of the total number of sequences

Entries    No of Species
1          5,495
2          1,899
3          1,023
4          657
5          487
6          399
7          289
8          228
9          214
10         122
11-20      711
21-50      426
51-100     213
>100       1,046

Species Representation (rel. 2015_5)

Top   Frequency   Species
1     20,198      Homo sapiens (Human)
2     16,711      Mus musculus (Mouse)
3     13,888      Arabidopsis thaliana (Mouse-ear cress)
4     7,921       Rattus norvegicus (Rat)
5     6,718       Saccharomyces cerevisiae (Baker's yeast)
6     5,993       Bos taurus (Bovine)
7     5,103       Schizosaccharomyces pombe (Fission yeast)
8     4,433       Escherichia coli K12
9     4,185       Bacillus subtilis
10    4,131       Dictyostelium discoideum (Slime mold)
...   ...         ...

Representation of the Divisions (rel. 2015_5)

Archaea      (4%)    19,340
Bacteria     (61%)   332,110
Eukaryota    (33%)   180,411
Viruses      (3%)    16,593

Distribution of Eukaryota (rel. 2015_5)

Human               (11%)   20,199
Other Mammalia      (26%)   46,146
Other Vertebrata    (10%)   17,823
Viridiplantae       (20%)   36,480
Fungi               (17%)   31,527
Insecta             (5%)    8,781
Nematoda            (2%)    4,417
Other               (8%)    15,038

Length Distribution (rel. 2015_5)

(Figure: length distribution chart; axis ticks run from 0 to 70000.)

Amino Acid Composition (rel. 2015_5)

figure taken from http://web.expasy.org/docs/relnotes/relstat.html gray=aliphatic, red=acidic, green=small hydroxy, blue=basic, black=aromatic, white=amide, yellow=sulfur

SwissProt Annotation Process

●  defined in http://www.uniprot.org/docs/sop_manual_curation.pdf
●  explained in http://www.uniprot.org/help/manual_curation

Annotation Phases

1.  Sequence curation
2.  Sequence analysis
3.  Literature curation
4.  Family-based curation
5.  Evidence attribution
6.  Quality assurance, integration and update

Sequence Curation

●  more than 95% are translated CDS from INSDC
●  other sources: PDB, direct protein sequencing, projects not submitting to INSDC
●  sequences are selected according to curation priorities (http://www.uniprot.org/program/)
●  results in the "canonical sequence" for a gene/species pair

Steps toward the canonical sequence

●  Entry selection
●  Run BLAST similarity searches to identify additional sequences for the same gene
●  Identify homologs by reciprocal BLAST and phylogeny based resources
●  Lock selected entries for other curators to prevent duplication

Steps toward the canonical sequence

●  Prepare sequence alignments with T-Coffee, Muscle, ClustalW
●  Merge into the canonical sequence, chosen as the sequence that is:
   -  most prevalent
   -  most similar to orthologous sequences found in other species
   -  based on length and aa composition, the one that allows the clearest description
   -  default: longest
●  record conflicts and variations

Sequence Analysis

●  Several analysis programs are applied to the sequences for:
   -  topological features
   -  post-translational modifications
   -  domains
●  all results are manually checked and in- or excluded for annotation

Topological Analysis

Tools       Prediction
SignalP     Presence and location of signal peptides
TargetP     Presence and location of transit peptides
Predotar    Mitochondrial, plastid or ER targeting sequences
ESKW        Transmembrane domains
MEMSAT      Transmembrane domains
TMHMM       Transmembrane domains
Phobius     Discriminates transmembrane and signal regions

Post-translational modification Analysis

Tools            Prediction
GPI-predictor    GPI lipid anchor sites
NetNGlyc         N-glycosylation sites
NetOGlyc         O-glycosylation sites
NMT Predictor    N-terminal myristoylation sites
Sulfinator       Tyrosine sulfatation sites

Domain Analysis

Tools       Prediction
ps_scan     internal PROSITE profile, pattern and rule scanning
InterPro    retrieves non-PROSITE motif matches using InterPro database or InterProScan
Coils       Coiled-coils regions
polyAA      internal program which identifies homopolymeric stretches of amino acids
REPEAT      identifies the following repeats: Ankyrin, Armadillo, HAT, HEAT, Kelch, Leucine-rich, PFTA, PFTB, RCC1, TPR, WD40


Automatically selected results are returned in a graphical interface which allows visualisation of the predictions (Figure 1). Selected features are shown in green and unselected features are shown in red. The selected/unselected state of a feature can be toggled by clicking on it.

Figure 1. UniProtKB sequence analysis results displayed in graphical interface

All predictions are manually reviewed and relevant results are selected for inclusion in the entry. The sequence analysis platform then transforms the selected features into UniProtKB annotation by applying a set of automatic annotation rules (Figure 2).

taken from http://www.uniprot.org/docs/sop_manual_curation.pdf

Literature Curation

●  Identification of relevant scientific literature from
   -  literature and text mining resources (PubMed, EuropePMC, iHOP, TextPresso)
   -  additions from other sources made by the curator
●  Information is extracted from the full text:
   -  general annotations (not position specific)
   -  position specific annotations

General Annotations

●  http://www.uniprot.org/help/general_annotation
●  position-independent
●  contains mostly general biological information like: functions, catalytic activity, cofactor, enzyme regulation, subunit structure, pathway, ...

Sequence Annotations

●  position dependent
●  http://www.uniprot.org/help/sequence_annotation
●  regions or sites of interest like post-translational modifications, binding sites, active sites, etc.
●  contains several subsections: molecule processing, regions, sites, amino acid modifications, natural variants, experimental info, secondary structure

Family-based Curation

●  Evaluation and curation of homologs as described above
●  Standardization of annotation of homologs
●  Propagation of annotation across the homologs to ensure consistency

Evidence Attribution

●  Every annotation is attributed to its original source
●  Every annotation can be traced back and evaluated
●  For evidence distinction there are 7 codes from the Evidence Code Ontology (ECO) used for manually curated entries
●  http://www.uniprot.org/help/evidences
●  Additional GO term annotation

done through the use of a subset of evidence codes from the Evidence Code Ontology (ECO) (24). There are seven ECO evidence codes used in manually curated entries as shown in Table 2.

Table 2. Evidence Code Ontology (ECO) codes used during the UniProt manual curation process

ECO code       Term name
               Usage

ECO:0000269    experimental evidence used in manual assertion
               Information for which there is published experimental evidence
ECO:0000303    non-traceable author statement used in manual assertion
               Information based on author statements in scientific articles for which there is no experimental support
ECO:0000250    sequence similarity evidence used in manual assertion
               Information which has been propagated from a related experimentally characterised protein
ECO:0000312    imported information used in manual assertion
               Information which has been imported from another database and manually verified
ECO:0000305    curator inference used in manual assertion
               Information which has been inferred by a curator based on his/her scientific knowledge or on the scientific content of an article
ECO:0000255    match to sequence model evidence used in manual assertion
               Information originating from the UniProt automatic annotation systems or any of the sequence analysis programs used during the manual curation process and which has been manually verified
ECO:0000244    combinatorial evidence used in manual assertion
               Information which is manually curated based on a combination of experimental and computational evidence

Full details of the evidences used in UniProtKB are available at http://www.uniprot.org/manual/evidences.

4.11 GO annotation
Gene Ontology (GO) terms are assigned based on experimental data from the literature. Relevant terms are identified using the QuickGO (25) browser and are assigned to entries using the Protein2GO curation tool. This tool has been developed within the UniProt group and is used both by UniProt and by other members of the GO Consortium. GO terms are also propagated to homologous proteins where appropriate. The procedure is described in more detail at http://www.ebi.ac.uk/GOA/ManualAnnotationEfforts.

4.12 Quality control and integration
All finished entries are run through a series of automated checks which verify a large number of biological rules such as the positions and relevance of amino acids cited in the entry. Any reported errors are corrected. Once an entry has passed the automated checks, it undergoes manual review by a senior curator to ensure that all relevant sequences have been merged, that all relevant literature has been added, that the annotation has been added correctly, and that all relevant sequence analysis results have been included. Once an entry has passed the automated and manual quality control checks, it is integrated into the database.

4.13 Unlock finished entries
Integrated entries are unlocked so that they are available for further curation.

taken from http://www.uniprot.org/docs/sop_manual_curation.pdf

Quality Control and Integration

●  Finished entries run through a series of rule-based checks concerning especially positions and regions
●  All errors are corrected
●  Manually reviewed by a senior curator
●  Finally the entry is integrated into the database
●  Unlock the finished entries for further curation

Demonstration

●  http://www.uniprot.org/uniprot/P62756#section_features

The Swiss-Prot Flat File

●  http://web.expasy.org/docs/userman.html
●  An entry is composed of different line types
●  Line types have their own format
●  Follows the EMBL Nucleotide Sequence Database format as closely as possible
●  2 sections:
   -  core data (sequence data, citation info, taxonomy)
   -  annotations (function, modification, domains, secondary and quaternary structure, disease associations, conflicts, etc.)

The following table lists the available two-letter line codes. Each code is followed by three blanks.

Line Code   Content                     Occurrence in an entry
ID          Identification              Once; starts the entry
AC          Accession number(s)         Once or more
DT          Date                        Three times
DE          Description                 Once or more
GN          Gene name(s)                Optional
OS          Organism species            Once or more
OG          Organelle                   Optional
OC          Organism classification     Once or more
OX          Taxonomy cross-reference    Once
OH          Organism host               Optional

--continued--

Line Code   Content                        Occurrence in an entry
RN          Reference number               Once or more
RP          Reference position             Once or more
RC          Reference comment(s)           Optional
RX          Reference cross-reference(s)   Optional
RG          Reference group                Once or more (Optional if RA line)
RA          Reference authors              Once or more (Optional if RG line)
RT          Reference title                Optional
RL          Reference location             Once or more
CC          Comments or notes              Optional
DR          Database cross-references      Optional
PE          Protein existence              Once
KW          Keywords                       Optional
FT          Feature table data             Once or more in Swiss-Prot, optional in TrEMBL
SQ          Sequence header                Once
(blanks)    Sequence data                  Once or more
//          Termination line               Once; ends the entry
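
Because every line starts with a two-letter code followed by three blanks, an entry can be split into its line types with very little code. The following is a minimal sketch, not a full parser: `SAMPLE_ENTRY` is a hypothetical toy entry invented for illustration (not a real UniProtKB record, and its SQ line is simplified), and the key `"sequence"` for the blank-coded sequence lines is an arbitrary choice.

```python
from collections import defaultdict

# Hypothetical toy entry for illustration only; not a real UniProtKB record.
SAMPLE_ENTRY = """\
ID   TEST_HUMAN              Reviewed;          10 AA.
AC   P12345;
OS   Homo sapiens (Human).
SQ   SEQUENCE   10 AA;
     MKTAYIAKQR
//
"""


def group_by_line_code(entry_text):
    """Collect the content of each line under its two-letter line code."""
    fields = defaultdict(list)
    for line in entry_text.splitlines():
        if line.startswith("//"):                # termination line ends the entry
            break
        # Sequence data lines start with blanks instead of a code.
        code = line[:2].strip() or "sequence"
        # Content starts after the code and the three blanks (column 6).
        fields[code].append(line[5:].rstrip())
    return dict(fields)


fields = group_by_line_code(SAMPLE_ENTRY)
print(fields["AC"])
print(fields["sequence"])
```

For real work an established parser (for example the Swiss-Prot reader in Biopython) is the safer choice; the sketch only shows why the fixed line-code layout is easy to process.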

Fields in More Detail

●  ID line: ID   EntryName Status; SequenceLength.
●  EntryName: up to 11 uppercase alphanumeric characters X_Y
   -  X is a mnemonic code of at most 5 alphanumeric characters
   -  Y is a mnemonic species identification code of at most 5 alphanumeric characters
●  ID   CYC_BOVIN               Reviewed;         104 AA.
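
The ID line format above translates into a regular expression. This is a sketch under two assumptions: the status is either "Reviewed" (as in the example) or "Unreviewed" (the value used for TrEMBL entries), and the entry name follows the X_Y rule exactly as stated on the slide. The names `ID_LINE_RE` and `parse_id_line` are illustrative.

```python
import re

# ID, three blanks, X_Y entry name, status, sequence length in amino acids.
ID_LINE_RE = re.compile(
    r"^ID {3}(?P<x>[A-Z0-9]{1,5})_(?P<y>[A-Z0-9]{1,5})\s+"
    r"(?P<status>Reviewed|Unreviewed);\s+(?P<length>\d+) AA\.$"
)


def parse_id_line(line):
    """Split an ID line into entry name, mnemonic codes, status and length."""
    match = ID_LINE_RE.match(line)
    if match is None:
        raise ValueError(f"not a recognised ID line: {line!r}")
    return {
        "entry_name": match["x"] + "_" + match["y"],
        "protein_code": match["x"],   # mnemonic code X
        "species_code": match["y"],   # mnemonic species code Y
        "status": match["status"],
        "length": int(match["length"]),
    }


print(parse_id_line("ID   CYC_BOVIN               Reviewed;         104 AA."))
```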

●  AC line: AC   AC_number_1;[ AC_number_2;]...[ AC_number_N;]
●  Accession number: 6 or 10 characters

   1          2      3          4          5          6      7      8          9          10
   [A-N,R-Z]  [0-9]  [A-Z]      [A-Z,0-9]  [A-Z,0-9]  [0-9]
   [O,P,Q]    [0-9]  [A-Z,0-9]  [A-Z,0-9]  [A-Z,0-9]  [0-9]
   [A-N,R-Z]  [0-9]  [A-Z]      [A-Z,0-9]  [A-Z,0-9]  [0-9]  [A-Z]  [A-Z,0-9]  [A-Z,0-9]  [0-9]

●  RegEx: [OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}
●  Examples: P12345, Q1AAA9, A0A022YWF9
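
The regular expression from the slide can be used as is to check accession numbers. A short sketch; the function names are illustrative, and `fullmatch` is used so that the whole string has to fit the pattern.

```python
import re

# Regular expression for accession numbers, copied from the slide.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_valid_accession(accession):
    """Return True if the whole string matches the accession number pattern."""
    return ACCESSION_RE.fullmatch(accession) is not None


def accessions_from_ac_line(line):
    """Extract the accession numbers from one 'AC   X; Y;' line."""
    return [part.strip() for part in line[5:].split(";") if part.strip()]


for candidate in ["P12345", "Q1AAA9", "A0A022YWF9", "P1234", "p12345"]:
    print(candidate, is_valid_accession(candidate))

print(accessions_from_ac_line("AC   P12345; Q1AAA9;"))
```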

●  DT line: date, DD-MMM-YYYY
●  always one of the biweekly release dates
●  always three lines:
   -  date of integration
   -  date of sequence version, sequence version X
   -  date of entry version, entry version X
●  Example:
   DT   01-FEB-1999, integrated into UniProtKB/TrEMBL.
   DT   15-OCT-2000, sequence version 2.
   DT   15-DEC-2004, entry version 5.
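
The three DT lines can be turned into dates with a small helper. This is a sketch: the month table is spelled out explicitly so that the result does not depend on the system locale, and the function name `parse_dt_line` is an assumption.

```python
from datetime import date

# Month abbreviations as they appear in the DD-MMM-YYYY dates.
MONTHS = {
    name: number
    for number, name in enumerate(
        "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1
    )
}


def parse_dt_line(line):
    """Return (date, remark) for one DT line."""
    content = line[5:]                               # skip 'DT' and three blanks
    date_part, _, remark = content.partition(", ")
    day, month, year = date_part.split("-")
    return date(int(year), MONTHS[month], int(day)), remark.rstrip(".")


DT_LINES = [
    "DT   01-FEB-1999, integrated into UniProtKB/TrEMBL.",
    "DT   15-OCT-2000, sequence version 2.",
    "DT   15-DEC-2004, entry version 5.",
]

for dt_line in DT_LINES:
    print(parse_dt_line(dt_line))
```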

●  DE lines:
   -  three categories and additional subcategories
   -  contains a recommended name
   -  besides: full name, short name, EC number
   -  alternative names: e.g. as an allergen or in biotechnology, ...

DE   RecName: Full=Annexin A5;
DE            Short=Annexin-5;
DE   AltName: Full=Annexin V;
DE   AltName: Full=Lipocortin V;
DE   AltName: Full=Endonexin II;
DE   AltName: Full=Calphobindin I;
DE   AltName: Full=CBP-I;
DE   AltName: Full=Placental anticoagulant protein I;
DE            Short=PAP-I;
DE   AltName: Full=PP4;
DE   AltName: Full=Thromboplastin inhibitor;
DE   AltName: Full=Vascular anticoagulant-alpha;
DE            Short=VAC-alpha;
DE   AltName: Full=Anchorin CII;

DE   RecName: Full=Granulocyte colony-stimulating factor;
DE            Short=G-CSF;
DE   AltName: Full=Pluripoietin;
DE   AltName: Full=Filgrastim;
DE   AltName: Full=Lenograstim;
DE   Flags: Precursor;

●  OS line: originating organism
     OS   Homo sapiens (Human).
     OS   Rous sarcoma virus (strain Schmidt-Ruppin A) (RSV-SRA) (Avian leukosis
     OS   virus-RSA).

●  OC lines: contain the taxonomic classification of the source organism according to the NCBI taxonomy (http://www.ncbi.nlm.nih.gov/Taxonomy/)

●  OC   Node[; Node...].

●  Example:
     OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
     OC   Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini; Hominidae;
     OC   Homo.
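Because the lineage is spread over several OC lines, a program has to join them before splitting at the semicolons. A minimal Python sketch, with the hypothetical helper name `parse_lineage`:

```python
def parse_lineage(oc_lines):
    """Join consecutive OC lines and return the taxonomic nodes, root first."""
    # Drop the two-letter line code, then glue the lines back together.
    joined = " ".join(line[2:].strip() for line in oc_lines)
    # The last node ends with a period; all others are separated by semicolons.
    return [node.strip() for node in joined.rstrip(".").split(";") if node.strip()]
```

For the human example this gives 13 nodes, starting with "Eukaryota" and ending with "Homo".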

RN, RP, RC, RX, RG, RA, RT, RL

●  can occur multiple times
●  order within a block is fixed

●  e.g.:
     RN   [1]
     RP   NUCLEOTIDE SEQUENCE [MRNA] (ISOFORMS A AND C), FUNCTION, INTERACTION
     RP   WITH PKC-3, SUBCELLULAR LOCATION, TISSUE SPECIFICITY, DEVELOPMENTAL
     RP   STAGE, AND MUTAGENESIS OF PHE-175 AND PHE-221.
     RC   STRAIN=Bristol N2;
     RX   PubMed=11134024; DOI=10.1074/jbc.M008990200;
     RA   Zhang L., Wu S.-L., Rubin C.S.;
     RT   "A novel adapter protein employs a phosphotyrosine binding domain and
     RT   exceptionally basic N-terminal domains to capture and localize an
     RT   atypical protein kinase C: characterization of Caenorhabditis elegans
     RT   C kinase adapter 1, a protein that avidly binds protein kinase C3.";
     RL   J. Biol. Chem. 276:10463-10475(2001).
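The RX line of the reference block is a list of `Database=Identifier;` pairs, which makes it easy to pick out the PubMed ID or DOI. A small Python sketch follows; `parse_rx_line` is an illustrative name, and the code assumes that identifiers themselves contain no semicolon.

```python
def parse_rx_line(line: str) -> dict:
    """Return the bibliographic cross-references of one RX line as a dict."""
    if not line.startswith("RX"):
        raise ValueError("not an RX line: " + line)
    refs = {}
    for item in line[2:].strip().split(";"):
        item = item.strip()
        if item:
            # Split only at the first "=" so identifiers may contain "=" themselves.
            database, _, identifier = item.partition("=")
            refs[database] = identifier
    return refs
```

For the example above, the result maps "PubMed" to "11134024" and "DOI" to "10.1074/jbc.M008990200".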

CC lines

●  free text
●  contains most of the annotated information
●  layout:
     CC   -!- TOPIC: First line of a comment block;
     CC       second and subsequent lines of a comment block.

●  structured by predefined topics like: Allergen, Alternative Products, ..., Cofactor, ..., Disease, ..., Domain, ..., Function, Interaction, ...

CC   -!- ALLERGEN: Causes an allergic reaction in human. Minor allergen of
CC       bovine dander.
CC   -!- ALTERNATIVE PRODUCTS:
CC       Event=Alternative initiation; Named isoforms=2;
CC       Name=Alpha;
CC         IsoId=P51636-1; Sequence=Displayed;
CC       Name=Beta;
CC         IsoId=P51636-2; Sequence=VSP_018696;
CC   -!- SUBCELLULAR LOCATION: Cell membrane {ECO:0000250}; Peripheral
CC       membrane protein {ECO:0000250}. Secreted {ECO:0000250}. Note=The
CC       last 22 C-terminal amino acids may participate in cell membrane
CC       attachment.
CC   -!- SUBCELLULAR LOCATION: Isoform 2: Cytoplasm {ECO:0000305}.
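Since every comment block starts with `-!- TOPIC:` and continues on the following CC lines, the blocks can be recovered with a simple loop. The Python sketch below is one possible way to do it; `parse_comments` is a made-up name, and structured topics such as ALTERNATIVE PRODUCTS are simply kept as joined text rather than parsed further.

```python
def parse_comments(lines):
    """Group CC lines into (topic, text) pairs, one pair per comment block."""
    blocks = []
    for line in lines:
        if not line.startswith("CC"):
            continue
        body = line[2:].strip()
        if body.startswith("-!-"):
            # New block: everything up to the first colon is the topic.
            topic, _, text = body[3:].partition(":")
            blocks.append([topic.strip(), text.strip()])
        elif blocks:
            # Continuation line: append to the text of the current block.
            blocks[-1][1] = (blocks[-1][1] + " " + body).strip()
    return [(topic, text) for topic, text in blocks]
```

Note that a topic may occur more than once (SUBCELLULAR LOCATION in the example), which is why the result is a list of pairs and not a dictionary.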

Cross References

●  too many to enumerate
●  extensive references with nucleotide databases, e.g.:

   in EMBL
     FT   CDS             302..2674
     FT                   /protein_id="CAA03857.1"
     FT                   /db_xref="SWISS-PROT:P26345"
     FT                   /gene="recA"
     FT                   /product="RecA protein"

   in Swiss-Prot
     DR   EMBL; AJ297977; CAC17465.1; -; Genomic_DNA.
     DR   EMBL; X56491; CAA39846.1; ALT_FRAME; mRNA.
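On the Swiss-Prot side, a DR line is a semicolon-separated record that ends with a period. A short Python sketch for splitting it, with the hypothetical helper name `parse_dr_line`; the meaning of the individual fields differs per target database and is not interpreted here.

```python
def parse_dr_line(line: str):
    """Split a DR line into (database name, list of remaining fields)."""
    if not line.startswith("DR"):
        raise ValueError("not a DR line: " + line)
    # Remove only the final period; identifiers such as CAC17465.1 keep theirs.
    body = line[2:].strip().rstrip(".")
    fields = [field.strip() for field in body.split(";")]
    return fields[0], fields[1:]
```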

Keywords / Feature Table

●  KW   Keyword[; Keyword...].
●  helps to search and index the database

●  no limit on the number of keywords:
     KW   3D-structure; Alternative splicing; Alzheimer disease; Amyloid;
     KW   Apoptosis; Cell adhesion; Coated pits; Copper;
     KW   Direct protein sequencing; Disease mutation; Endocytosis;
     KW   Glycoprotein; Heparin-binding; Iron; Membrane; Metal-binding;
     KW   Notch signaling pathway; Phosphorylation; Polymorphism;
     KW   Protease inhibitor; Proteoglycan; Serine protease inhibitor; Signal;
     KW   Transmembrane; Zinc.

●  Feature table like GenBank/EMBL/DDBJ
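To illustrate how keywords help to index a database, the following Python sketch builds an inverted index from keyword to entry. It is a toy under stated assumptions: the entry identifiers `entry_A` and `entry_B` are placeholders, not real accessions, and the function names are invented for this example.

```python
from collections import defaultdict


def parse_keywords(kw_lines):
    """Join the KW lines of one entry and return its keywords as a list."""
    joined = " ".join(line[2:].strip() for line in kw_lines)
    return [kw.strip() for kw in joined.rstrip(".").split(";") if kw.strip()]


def build_keyword_index(entries):
    """Map each keyword to the set of entry identifiers that carry it."""
    index = defaultdict(set)
    for entry_id, kw_lines in entries.items():
        for keyword in parse_keywords(kw_lines):
            index[keyword].add(entry_id)
    return index


# Toy data: placeholder entry names with keywords taken from the slide.
toy_entries = {
    "entry_A": ["KW   Apoptosis; Cell adhesion; Copper;", "KW   Zinc."],
    "entry_B": ["KW   Copper; Membrane."],
}
toy_index = build_keyword_index(toy_entries)
```

Looking up `toy_index["Copper"]` then returns both placeholder entries, while `toy_index["Zinc"]` returns only `entry_A`.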

Programmatic Access

●  http://www.uniprot.org/help/programmatic_access (remember this link!)

●  several use cases documented, but not as an API
●  best way: use the web interface to construct/refine your query first before you try to automate the process

Retrieving an Individual Entry

●  uses a simple URL which can be bookmarked
●  for individual entries: http://www.uniprot.org/uniprot/P12345

●  default result is a web page

●  alternative formats: txt, xml, rdf, fasta, gff

●  specified via a suffix appended to the accession (e.g. .txt)

●  structured formats like xml or rdf can include referenced entries
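The URL scheme described above is simple enough to generate in code. The sketch below only builds the URL and does not contact the server; the base URL and the list of formats are taken from the slides, the name `entry_url` is made up, and the live service may have changed its addresses since these slides were written.

```python
BASE_URL = "http://www.uniprot.org/uniprot/"  # as given on the slide
FORMATS = {"txt", "xml", "rdf", "fasta", "gff"}  # formats listed on the slide


def entry_url(accession: str, fmt: str = None) -> str:
    """Build the URL of a single entry; without fmt this is the web page."""
    if fmt is None:
        return BASE_URL + accession
    if fmt not in FORMATS:
        raise ValueError("unknown format: " + fmt)
    # The format is selected by a suffix on the accession.
    return BASE_URL + accession + "." + fmt
```

The resulting string, e.g. `entry_url("P12345", "fasta")`, could then be passed to `urllib.request.urlopen` or any other HTTP client.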

Using the ID Mapping Service

●  http://www.uniprot.org/help/programmatic_access#batch_retrieval_perl_example

●  uses the HTTP POST method

●  converts between different database IDs

●  you have to know the specific abbreviation for the respective databases
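How such a POST request might be prepared in Python is sketched below. Everything service-specific here is an assumption and not taken from the slides: the endpoint URL, the parameter names (`from`, `to`, `format`, `query`) and the database abbreviations in the usage note are placeholders that must be checked against the programmatic access help page. The request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# ASSUMPTION: hypothetical endpoint, not given on the slides.
MAPPING_URL = "http://www.uniprot.org/mapping/"


def build_mapping_request(ids, source_db: str, target_db: str) -> Request:
    """Prepare (but do not send) a POST request for an ID mapping job."""
    # ASSUMPTION: parameter names are illustrative; verify them in the docs.
    params = {
        "from": source_db,
        "to": target_db,
        "format": "tab",
        "query": " ".join(ids),
    }
    # Supplying a request body makes urllib use POST instead of GET.
    return Request(MAPPING_URL, data=urlencode(params).encode("utf-8"))
```

A call such as `build_mapping_request(["P12345"], "ACC", "P_REFSEQ_AC")` uses placeholder abbreviations; the real ones have to be looked up, as the last bullet above points out.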

Retrieving Entries via Queries

●  uses the HTTP GET method, i.e. the query string is part of the URL

●  structure might be quite complex

●  use the browser to configure the query string
●  more settings are available via the query builder http://www.uniprot.org/help/advanced_search

●  the URL length might be limited to 1000 characters
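A query URL of this kind can be assembled and checked against the length limit before it is sent. The Python sketch below makes several assumptions that are not on the slides: the parameter names `query` and `format`, the `tab` format as default, and the example query syntax in the usage note. Only the base URL and the 1000-character limit come from the lecture.

```python
from urllib.parse import urlencode

QUERY_URL = "http://www.uniprot.org/uniprot/"
MAX_URL_LENGTH = 1000  # possible limit mentioned on the slide


def build_query_url(query: str, fmt: str = "tab") -> str:
    """Build a GET URL for a query and refuse URLs above the length limit."""
    # ASSUMPTION: "query" and "format" are illustrative parameter names.
    url = QUERY_URL + "?" + urlencode({"query": query, "format": fmt})
    if len(url) > MAX_URL_LENGTH:
        raise ValueError("URL has %d characters, limit is %d" % (len(url), MAX_URL_LENGTH))
    return url
```

For instance, `build_query_url("organism:9606 AND reviewed:yes")` percent-encodes the colons and turns the spaces into `+`; the query text itself is only an illustration of what the web interface might produce.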

Examples

●  http://www.uniprot.org/uniprot/P12345.txt
●  http://www.uniprot.org/uniprot/P12345.xml

●  http://www.uniprot.org/uniref/UniRef90_P04259.xml

●  http://www.uniprot.org/uniref/UniRef90_P04259.rdf

●  http://www.uniprot.org/uniref/UniRef90_P04259.fasta

●  http://www.uniprot.org/uniref/UniRef90_P04259.tab