Interoperable Semantic Web services
Matúš Kalaš, Computational Biology Unit, Uni Bergen, Norway
Jon Ison, EMBL-EBI, Hinxton, U.K.
SWAT4LS tutorial, Berlin, 9th Dec 2010
Practical information
Coffee break 11:00 – 11:30
Hands-on & discussion after the coffee break
Tutorial end & lunch 13:00
This tutorial: www.ii.uib.no/~matus/tutorial_webservices.pdf
This tutorial at SWAT4LS
Our approach is
Practically oriented
Semantic Web enthusiastic & friendly
But not Semantic Web fanatic
Tools can be available as:
• for human interaction: applications or programs (download & install), or Web applications (access through the Web)
• through a programmatic interface: APIs (libraries) (download & install), or Web services (access through the Web)
Requirements for Web services:
Easy to find
Interoperable with programmatic libraries
Easy to construct workflows
Produce semantically rich data
Automated workflow construction possible
The EMBRACE approach
The EMBRACE technology recommendation
Standard SOAP Web services
WS-I compliant + document/literal wrapped SOAP binding
WSDL-first (interface-centric design)
Use standard exchange formats & detailed XML Schema when applicable
Test, test, test using various client frameworks & programming languages
BioXSD
Document the WSDL, and annotate it with ontology terms using the SAWSDL standard
EDAM
BioXSD
EDAM
Annotation with EDAM
Hands-on exercise
Discussion & feedback
EDAM: EMBRACE Data and Methods
Ontology for Bioinformatics Tools and Datatypes
Jon Ison ([email protected])
Matus Kalas ([email protected])
What is EDAM?
EMBRACE Data and Methods
Ontology for bioinformatics tools and data types
A set of defined terms, relationships between terms and rules that govern the terms and relations
Glorified glossary – with terms organised by is_a (class/subclass) relations into a hierarchy
Controlled vocabulary for describing:
• Web services, e.g. WSDL files
• XSD data schema, e.g. associated with a WSDL file
• Standalone tools
• Web servers
• Databases
• Ontologies
• Data objects
• Data syntax and file formats
• etc.
Aims to describe (at a coarse level) all major bioinformatics databases, data types and tools in use
Scope
EDAM consists of 6 sub-ontologies (branches of terms, each in its own namespace) in the domain of "bioinformatics tool and data description":
• topic – “A general field of bioinformatics study, data, processing and analysis or technology.”
• operation – “A specific, singular function or process performed by a tool, for example a WS operation. What is done, but not (typically) how or in what context.”
• data resource – “A category of content of a data source including databases and ontologies.”
• data – “A semantic description of a data entity (datum) commonly used in bioinformatics.”
• format – “A reference (typically a URL) of a data format specification.”
• biological entity – “Any biological thing (or part of a thing) with a physical existence, a physical part, region or feature that can be mapped to such a thing, a collection of such things or an observable phenomenon or occurrence”
• identifier (sub-branch of data) – “Something that identifies (typically uniquely) something such as an entity, database, ontology, datatype”.
The biological entity branch provides biological context to the other branches. It is not specific to the domain and might eventually be removed.
Term examples
"Topic"
o Alignment
o Biostatistics
o Chemoinformatics
o Database and file management
"Operation" *
o Annotation
o Comparison and alignment
o Mapping and assembly
o Modelling and simulation
o Plotting and rendering
"Data resource"
o Biological resource
o Cell biology and culture
o Classification and nomenclature
o Genetics
"Data" **
o Alignment data
o Biological model
o Sequence data
o Identifier
"Data format"
o "Binary format"
o "HTML format"
o "Text format"
o "XML format"
"Biological entity"
o Phenomenon
o Metabolic pathway
o Mutation
o Physical entity
o Atom
o Protein
* Top-level operations are coarse-grained (abstract), providing a navigable top level
Term relations
8 basic types:
• is_a: term is_a term (class/subclass)
• concerns: topic concerns data resource / data / operation / entity
• has_input: operation has_input data
• has_output: operation has_output data
• is_source_of: resource is_source_of data
• is_identifier_of: identifier is_identifier_of data
• is_format_of: format is_format_of data
• has_attribute: entity has_attribute data
Relations are:
• Defined between pairs of terms
• Directional
• Transitive (propagated from child to parent terms), e.g. if A is_a B and B is_a C, we can infer A is_a C.
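The transitivity rule can be illustrated with a minimal Python sketch. It assumes the is_a links are held in a plain dictionary that maps each term to its direct parents. The terms A, B and C are the placeholders from the example above, not real EDAM IDs. Walking upwards from a term collects every ancestor that can be inferred.

```python
from typing import Dict, Set


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return all terms reachable from `term` via is_a links (transitive closure)."""
    found: Set[str] = set()
    stack = list(parents.get(term, set()))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            # Continue upwards: a parent's own parents are ancestors too.
            stack.extend(parents.get(parent, set()))
    return found


# Placeholder hierarchy from the text: A is_a B, B is_a C.
IS_A: Dict[str, Set[str]] = {"A": {"B"}, "B": {"C"}, "C": set()}

print(sorted(ancestors("A", IS_A)))  # ['B', 'C']
```

An explicit stack with a `found` set avoids revisiting shared ancestors when a term has several parents.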
Rules:
• Define which relations must (or may) be specified for which terms
• Reflect well-established or self-evident principles
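A rule of the "may" kind can be checked mechanically. In the minimal sketch below, each relation lists the (subject namespace, object namespace) pairs it may connect, read off the relation list above. The namespace labels are assumptions. The sketch leaves out is_a and the "must" rules.

```python
from typing import Dict, Set, Tuple

# Namespace pairs each relation may connect (subject, object); labels are assumed.
ALLOWED: Dict[str, Set[Tuple[str, str]]] = {
    "concerns": {("topic", ns) for ns in ("data resource", "data", "operation", "biological entity")},
    "has_input": {("operation", "data")},
    "has_output": {("operation", "data")},
    "is_source_of": {("data resource", "data")},
    "is_identifier_of": {("identifier", "data")},
    "is_format_of": {("format", "data")},
    "has_attribute": {("biological entity", "data")},
}


def relation_allowed(relation: str, subject_ns: str, object_ns: str) -> bool:
    """Check that a relation links terms from namespaces the rules permit."""
    return (subject_ns, object_ns) in ALLOWED.get(relation, set())


print(relation_allowed("has_input", "operation", "data"))  # True
print(relation_allowed("has_input", "data", "operation"))  # False: relations are directional
```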
Term hierarchy
• Every term (excluding top-level terms) is related to one or more other terms by an is_a (subclass) relationship. is_a relations define the basic term hierarchy.
• All "child" terms must share the intrinsic properties of their "parent", in addition to having their own intrinsic properties.
Conceptual model
Boxes indicate a namespace (top-level term)
Text indicates a term relation
Topic aggregates related concepts that would otherwise be unrelated.
Data includes everything from primitive types (e.g. simple parameters) to complex derived types (e.g. biological data)
Terms
Each term corresponds to a well-established concept (class) with at least one intrinsic property
• The class (the term and its child terms) must have these properties!
• Child terms can only add new properties (restrictions are not allowed)
EDAM is in OBO format – convenient for editing, does everything we need.
An OBO term consists of:
• Unique identifier - persistent IDs (see below)
• Name - intuitive & consistent naming conventions
• Namespace - "topic", "operation", "data resource" etc.
• Definition - with consistent semantics
• Comment (optional) - e.g. on term usage, boundaries with other ontologies etc.
• Synonym(s) (optional) - where in common use
• Cross-reference(s) (optional) - to various resources
• Relationships to other terms
Handling change / persistence
Term IDs will persist between versions:
• A term ID will never be deleted once created
• A given ID will always identify the same concept
• Term names, definitions and comments might change, but will remain true to the concept
• Obsolete terms will persist (remain in EDAM with the same ID)
OBO term statement
[Term]
id: EDAM:0000970
name: Citation
namespace: data
def: "A bibliographic citation providing references to scientific article, book or other published material." [EDAM:EBI "EMBRACE definition"]
comment: A citation might include the authors, title and journal name, date and (possibly) an abstract of the publication or link to the full-text if it's freely available.
synonym: "Reference" EXACT []
xref: Moby:GCP_SimpleCitation
xref: Moby:Publication
is_a: EDAM:0002526 ! Textual data

[Term]
id: EDAM:0000292
name: Sequence alignment
namespace: operation
def: "Align molecular sequences." [EDAM:EBI "EMBRACE definition"]
synonym: "Sequence alignment generation" EXACT []
is_a: EDAM:0002463 ! Sequence alignment processing
is_a: EDAM:0002451 ! Sequence comparison
relationship: has_input EDAM:0000841 ! Undefined
relationship: has_output EDAM:0000863 ! Sequence alignment
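OBO stanzas are line-oriented `tag: value` records, so a naive reader takes only a few lines. The minimal Python sketch below is not a full OBO parser and ignores escaping and qualifiers. It is fed a subset of the Sequence alignment stanza above. Repeatable tags such as is_a are collected into lists, and trailing `! name` comments are dropped.

```python
from collections import defaultdict
from typing import Dict, List


def parse_stanza(text: str) -> Dict[str, List[str]]:
    """Parse one OBO [Term] stanza into a mapping of tag -> list of values.

    Tags such as is_a, xref or relationship may repeat, so every value is a list.
    Trailing '! name' comments are dropped; this naive split assumes ' ! '
    never occurs inside a quoted definition or comment.
    """
    tags: Dict[str, List[str]] = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("["):  # skip blank lines and the [Term] header
            continue
        tag, _, value = line.partition(":")
        value = value.split(" ! ", 1)[0].strip()
        tags[tag.strip()].append(value)
    return dict(tags)


STANZA = """[Term]
id: EDAM:0000292
name: Sequence alignment
namespace: operation
is_a: EDAM:0002463 ! Sequence alignment processing
is_a: EDAM:0002451 ! Sequence comparison
relationship: has_input EDAM:0000841 ! Undefined
relationship: has_output EDAM:0000863 ! Sequence alignment
"""

term = parse_stanza(STANZA)
print(term["is_a"])  # ['EDAM:0002463', 'EDAM:0002451']
```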
Status
EDAM is in “beta”:
• EDAM_beta10 – many cycles of design/changes/inspection!
• Provides coarse coverage of tools and data types in the EMBRACE Registry / BioCatalogue:
• Suitable for pilot usage (please provide feedback!)
• Starting point for service nomenclature
Coverage - Quite broad in general and quite deep for sequence analysis:
• ~2500 terms with definitions (Nov. 2010)
• 8 basic types of relation (plus inverse relations)
• Relations are defined but not used in many term definitions (in progress!)
Terms: Topic 148; Operation 500; Resource 159; Data 1028; Data (Identifier) 410; Format 274; Entity 81
Relations: is_a 2822; concerns 198; has_input 20; has_output 329; is_source_of 205; is_identifier_of 257; is_format_of 0; has_attribute 27
Design Principles
It wasn’t just thrown together (honestly) …
• Clearly defined scope
• Purpose-independent (design not tied to a use case)
• Relevant to annotation of current:
  o WSDL files
  o XSD schema
  o Standalone databases, servers and tools
• General (common concepts only, no fine-grained specialised concepts)
• Comprehensive (enough terms to be useful)
• Uncluttered (minimal namespaces and relation types)
• Comprehensible (terms and relations are simple and intuitive)
• Navigable (simple class (is_a) hierarchy)
• Integrity (genuine class/subclass relationships with concepts having unique properties)
• Complementary (not duplicating established ontologies)
• Cross-referenced (to existing resources)
• Validatable (via file parsing / checks in viewers etc.)
• Compatible (as far as possible with "upper level" ontologies)
• Extensible (clear guidelines for developers)
• Convenient (clear guidelines for annotators)
There is a compromise in achieving the above – a pragmatic approach is essential!
Limitations
EDAM is/does not:
• Describe syntax or file formats in detail (format namespace will provide URL references)
• Define data structures. has_part / is_part_of relations are not used - it’s difficult (impossible?) to add such relations in a generic way without EDAM becoming a specific format of data types
• Include terms for every conceptual part of things. Typically a datatype is only listed if it is known to be in common use
• A catalogue of individual data structures, databases etc. Terms correspond to classes; specific instances (individuals) are not included.
• Complete (and arguably never can be).
• A very complex ontology. Many relations and other domain features that could be expressed, e.g. in OWL format, are not modelled.
Other tools are needed to unify all services and data - EDAM should help!
Viewing
EDAM may be viewed in:
• Any text editor
• Ontology editor: OBO Ontology Editor (OBOEdit) version 2, http://oboedit.org
• Web-based browsers:
  o NCBO Ontology Browser, http://bioportal.bioontology.org/visualize/42800
  o EBI Ontology Look-up Service, http://www.ebi.ac.uk/ontology-lookup/
• SRS: EBI SRS server, http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-page+LibInfo+-lib+EDAM
Thanks
• Peter Rice (boss)
• Hamish McWilliam (SRS)
• Alan Bleasby (PURLs)
• Mahmut Uludag (EMBOSS WS)
• James Malone (SWO)
• Steve Pettifer
• The Forgotten … (sorry)
All enquiries to Jon Ison ([email protected]) cc’ing Matus Kalas ([email protected])
Download & Documentation
"Beta" version in OBO (Open Biomedical Ontologies) format: http://sourceforge.net/projects/edamontology/files/
Documentation at:
http://edamontology.sourceforge.net/
Including clear statement of:
• Branches of terms (namespaces / sub-ontologies)
• Relations
• Rules (governing terms and relations)
• Guidelines for Developers
• Guidelines for Annotators (basic)
• And more …
Annotation Using EDAM
Guidelines for Annotators
Which EDAM branch to use?
• "Topic" - any resources (very coarse-grained annotation)
• "Operation" - tool functions (fine-grained annotation)
• "Data resource" - databases, servers etc. (broad categories based on content type)
• "Data" - datatypes (annotation of semantics)
• "Data format" - datatypes / formats (annotation of syntax)
Picking terms
Many annotations? First familiarise yourself with EDAM (OBOEdit, NCBO etc.)
• Identify the correct branch/namespace ("Operation", "Data" etc.) considering what is being annotated
• Search EDAM using keywords to find candidate terms
• Multiple searches using synonyms, alternative spellings etc. are preferable
• Pick the most specific term(s) available
• Bear in mind some concepts are necessarily overlapping or general
• Only pick correct terms
• If a term doesn't exist, request that it be added to EDAM
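The search steps above can be sketched in a few lines of Python: restrict the search to one namespace, then match keywords against both names and synonyms. The term list below is a tiny hand-built excerpt of terms quoted in this tutorial, not a loaded copy of EDAM. The matching is a plain case-insensitive substring test, chosen for illustration.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Term(NamedTuple):
    id: str
    name: str
    namespace: str
    synonyms: Tuple[str, ...] = ()


# Tiny excerpt built from terms quoted earlier in this tutorial.
TERMS: List[Term] = [
    Term("EDAM:0000970", "Citation", "data", ("Reference",)),
    Term("EDAM:0000182", "Sequence alignment", "topic"),
    Term("EDAM:0000292", "Sequence alignment", "operation", ("Sequence alignment generation",)),
    Term("EDAM:0000863", "Sequence alignment", "data"),
]


def search(keyword: str, namespace: str, terms: Sequence[Term] = TERMS) -> List[str]:
    """Case-insensitive keyword search over names and synonyms within one namespace."""
    kw = keyword.lower()
    return [
        t.id
        for t in terms
        if t.namespace == namespace
        and (kw in t.name.lower() or any(kw in s.lower() for s in t.synonyms))
    ]


print(search("reference", "data"))       # ['EDAM:0000970'] (found via the synonym)
print(search("alignment", "operation"))  # ['EDAM:0000292']
```

Picking the namespace first matters here. "Sequence alignment" exists as a topic, an operation and a data term, each with its own ID.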
Use of other ontologies
Use other ontologies too, where possible and desirable.
e.g. an operation that predicts molecular sequence features could be annotated with terms from SO (Sequence Ontology) for the features.
Model of a Web Service
A WS is considered as an arbitrary (but usually related) set of one or more operations, reducing the problem of WS interoperation to one of compatibility between operations.
Operation
• Discrete unit of functionality performing (typically) one or more definite functions
• Reads an input
• Writes an output
• Uses zero or more data resources
Input / Output
• Payload (e.g. of a SOAP message) passed / returned in an operation call
• Name and (ideally) description is given (e.g. in the WSDL file)
• Input/output has one or more XML elements which must be set (input values) or which are written (output values)
XML elements
• Correspond to the input and output service parameters – values which are set or generated
• Name and (ideally) description of the element is given in the schema
• Simple or complex XSD types in an XML schema (e.g. within / referenced from the WSDL file)
• Element values are instances of a particular datatype with a semantic type and a specific syntax
• Most element values have a syntax fully specified by the schema
• Some element values correspond to formatted text not specified by the schema. Such reports may be a composite of different semantic types.
Data resources
• Databases or ontologies used in the background
• Not passed in a WS call
• Might be specified indirectly via a parameter, e.g. an operation reads a database, the name of which is specified
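Reducing interoperation to compatibility between operations can be made concrete with a small Python sketch. Here each operation declares the data terms it reads and writes. Operation A can feed operation B if every input of B is matched by an output of A with the same data term or a subclass of it. The data hierarchy and both operations are hypothetical, assumed for illustration only, and data formats are not considered.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple

# Toy is_a hierarchy over data terms; assumed for illustration, not taken from EDAM.
DATA_IS_A: Dict[str, Set[str]] = {
    "Sequence alignment": {"Alignment data"},
    "Alignment data": set(),
    "Sequence data": set(),
}


def is_a(term: str, ancestor: str) -> bool:
    """True if `term` equals `ancestor` or reaches it via is_a links."""
    if term == ancestor:
        return True
    return any(is_a(parent, ancestor) for parent in DATA_IS_A.get(term, set()))


@dataclass(frozen=True)
class Operation:
    name: str
    inputs: Tuple[str, ...]   # data terms the operation reads
    outputs: Tuple[str, ...]  # data terms the operation writes


def can_feed(producer: Operation, consumer: Operation) -> bool:
    """True if every input of `consumer` is satisfied by some output of `producer`."""
    return all(
        any(is_a(out, needed) for out in producer.outputs) for needed in consumer.inputs
    )


# Hypothetical operations.
align = Operation("align", inputs=("Sequence data",), outputs=("Sequence alignment",))
render = Operation("render", inputs=("Alignment data",), outputs=())

print(can_feed(align, render))  # True: a Sequence alignment is_a Alignment data
print(can_feed(render, align))  # False: render produces nothing
```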
Levels of Annotation
Annotation of a WSDL file or associated XSD schema is possible at several levels.
Assuming SAWSDL annotation: http://www.w3.org/TR/sawsdl/
Annotatable XML elements are:
1. Web service (the whole thing as a collection of possibly multiple operations) ( <wsdl:portType> )
* One (or more) "Topic" terms for the general area(s) the service concerns
* One (or more) "Data resource" terms for data resources used by the service (if applicable)
2. Operation ( <wsdl:operation> inside <wsdl:portType> )
* One (or more) "Operation" terms for each WSDL operation (typically just 1 annotation)
3. Input parameters and their sub-parts
4. Output parameters and their sub-parts
( <xs:element>, <xs:complexType>, <xs:simpleType>, <xs:attribute> )
* One (or more) "Data" terms
* One (or more) "Data format" terms
SAWSDL annotation
Each EDAM term must have a URI
PURLs (Persistent Uniform Resource Locators) are used
EDAM PURLs include the ontology name (edam), term namespace, and the term unique identifier (ID)
A PURL should be given inside the sawsdl:modelReference annotation
<element name="elementName" sawsdl:modelReference="http://purl.org/edam/namespace/id">
Where ...
* element is the XML element being annotated
* elementName is the name of the XML element
* namespace is the namespace of the EDAM term, e.g. "operation"
* id is the unique identifier of the term, e.g. "0000295"
The PURL (the value of the sawsdl:modelReference attribute) is a URI pointing to the term definition.
PURLs for all EDAM terms have been created.
These 3 EDAM terms:
[Term]
id: EDAM:0000182
name: Sequence alignment
namespace: topic
...

[Term]
id: EDAM:0000292
name: Sequence alignment
namespace: operation
...

[Term]
id: EDAM:0000863
name: Sequence alignment
namespace: data
...
Would give these 3 PURLs:
http://purl.org/edam/topic/0000182
http://purl.org/edam/operation/0000292
http://purl.org/edam/data/0000863
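The mapping from an OBO term to its PURL is mechanical, as a minimal Python sketch shows. It assumes the caller supplies the namespace path segment directly ("topic", "operation", "data"). This tutorial does not state how namespaces containing spaces, such as "data resource", appear in a PURL, so the sketch does not translate them.

```python
def edam_purl(obo_id: str, namespace: str) -> str:
    """Build an EDAM PURL from an OBO ID such as 'EDAM:0000182' and a namespace path segment."""
    prefix, _, local_id = obo_id.partition(":")
    if prefix != "EDAM" or not local_id.isdigit():
        raise ValueError(f"not an EDAM term ID: {obo_id!r}")
    return f"http://purl.org/edam/{namespace}/{local_id}"


for obo_id, ns in [("EDAM:0000182", "topic"), ("EDAM:0000292", "operation"), ("EDAM:0000863", "data")]:
    print(edam_purl(obo_id, ns))
```

Run as-is, this prints the three PURLs listed above.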
These PURLs can be used in SAWSDL annotation, e.g.:
<wsdl:portType name="myService" sawsdl:modelReference="http://purl.org/edam/topic/0000182">
<sawsdl:attrExtensions sawsdl:modelReference="http://purl.org/edam/operation/0000292"/>
<xs:element name="outfile" sawsdl:modelReference="http://purl.org/edam/data/0000863">
If more than one annotation is required, separate the URIs with space characters:
<wsdl:portType name="myService" sawsdl:modelReference="http://purl.org/edam/topic/0000182 http://purl.org/edam/topic/0000181">
Note such multiple annotations:
• Can be from different namespaces
• Can be from different ontologies entirely
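A consumer of an annotated WSDL therefore has to split the attribute value on whitespace. The sketch below does this with Python's standard `xml.etree.ElementTree`, using the SAWSDL namespace `http://www.w3.org/ns/sawsdl` and the two-topic portType example above. The surrounding `wsdl:definitions` wrapper is added only so that the fragment parses.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

SAWSDL_NS = "http://www.w3.org/ns/sawsdl"
MODEL_REF = f"{{{SAWSDL_NS}}}modelReference"

WSDL = """<wsdl:definitions xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
    xmlns:sawsdl="http://www.w3.org/ns/sawsdl">
  <wsdl:portType name="myService"
      sawsdl:modelReference="http://purl.org/edam/topic/0000182 http://purl.org/edam/topic/0000181"/>
</wsdl:definitions>"""


def model_references(xml_text: str) -> Dict[Optional[str], List[str]]:
    """Map each annotated element's name attribute (None if unnamed) to its URI list."""
    root = ET.fromstring(xml_text)
    refs: Dict[Optional[str], List[str]] = {}
    for elem in root.iter():
        value = elem.get(MODEL_REF)
        if value:
            # sawsdl:modelReference holds a whitespace-separated list of URIs.
            refs[elem.get("name")] = value.split()
    return refs


print(model_references(WSDL))
```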
SAWSDL spec. peculiarity when annotating operations
Annotations on <wsdl:operation> element inside <wsdl:portType> should be handled using a <sawsdl:attrExtensions> element. This is not a requirement for other elements.
The <sawsdl:attrExtensions> element inside the <wsdl:operation> must come before <wsdl:input>, <wsdl:output> and <wsdl:fault> (typically right after the <wsdl:documentation> element).
For example:
<wsdl:portType name="Clustalw2PortType" sawsdl:modelReference="http://purl.org/edam/topic/0000186">
  <wsdl:operation name="submitClustalw2">
    <wsdl:documentation>Submit a sequence and get a jobID</wsdl:documentation>
    <sawsdl:attrExtensions sawsdl:modelReference="http://purl.org/edam/operation/0000496"/>
    <wsdl:input message="submitClustalw2Msg"/>
    <wsdl:output message="submitClustalw2ResponseMsg"/>
  </wsdl:operation>
  ...
</wsdl:portType>
Caution: some WSDL/XSD validators and SOAP libraries do not check this, but others do require this strict order of elements!
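Because tools differ in how strictly they enforce this order, checking it yourself can be worthwhile. The minimal Python sketch below uses `xml.etree.ElementTree`. It reports every `wsdl:operation` in which a `sawsdl:attrExtensions` child appears after an input, output or fault. The operation "badOrder" and its message names are hypothetical and were added to show a failing case.

```python
import xml.etree.ElementTree as ET
from typing import List

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"
SAWSDL_NS = "http://www.w3.org/ns/sawsdl"
ATTR_EXT = f"{{{SAWSDL_NS}}}attrExtensions"
MESSAGE_TAGS = {f"{{{WSDL_NS}}}{tag}" for tag in ("input", "output", "fault")}

SAMPLE = """<wsdl:definitions xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
    xmlns:sawsdl="http://www.w3.org/ns/sawsdl">
  <wsdl:portType name="Clustalw2PortType">
    <wsdl:operation name="submitClustalw2">
      <wsdl:documentation>Submit a sequence and get a jobID</wsdl:documentation>
      <sawsdl:attrExtensions sawsdl:modelReference="http://purl.org/edam/operation/0000496"/>
      <wsdl:input message="submitClustalw2Msg"/>
      <wsdl:output message="submitClustalw2ResponseMsg"/>
    </wsdl:operation>
    <wsdl:operation name="badOrder">
      <wsdl:input message="badOrderMsg"/>
      <sawsdl:attrExtensions sawsdl:modelReference="http://purl.org/edam/operation/0000496"/>
      <wsdl:output message="badOrderResponseMsg"/>
    </wsdl:operation>
  </wsdl:portType>
</wsdl:definitions>"""


def misplaced_attr_extensions(xml_text: str) -> List[str]:
    """Names of wsdl:operation elements whose attrExtensions follows input/output/fault."""
    root = ET.fromstring(xml_text)
    bad: List[str] = []
    for op in root.iter(f"{{{WSDL_NS}}}operation"):
        seen_message = False
        for child in op:
            if child.tag in MESSAGE_TAGS:
                seen_message = True
            elif child.tag == ATTR_EXT and seen_message:
                bad.append(op.get("name"))
    return bad


print(misplaced_attr_extensions(SAMPLE))  # ['badOrder']
```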
EDAM term end-points
When pasted into a browser, the PURLs:
http://purl.org/edam/topic/0000182
http://purl.org/edam/operation/0000292
http://purl.org/edam/data/0000863
... resolve to:
http://www.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182
http://www.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000292
http://www.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000863
These are complete OBO term statements in plain text (OBO format). PURLs support text extensions allowing a format specifier to be added. For example these PURLs:
http://purl.org/edam/topic/0000182?style=html
http://purl.org/edam/operation/0000292?style=html
http://purl.org/edam/data/0000863?style=html
... will resolve to OBO term statements in HTML such that terms referred to in the statements (via relations) will be clickable to allow navigation:
http://www.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182?style=html
http://www.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000292?style=html
http://www.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000863?style=html
The eventual final list of end-points will provide other formats/views:
• Plain text in OBO format (default)
• HTML
• XML
• JSON
• The term in a web browser, e.g. NCBO Ontology Browser
http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182?style=html
http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182%format=xml
http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182%format=txt
http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182%format=json
http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182%format=browser (default)
Not all these formats are yet implemented!
For now, you can see this in action for this term:
http://purl.org/edam/entity/0000002
http://purl.org/edam/entity/0000002?style=html
Matúš Kalaš, Pål Puntervoll, Armin Töpfer, Prabakar Venkataraman, Jan Christian Bryne, Inge Jonassen
CBU, BCCS, Bergen
Edita Bartaševičiūtė, Kristoffer Rapacki
CBS, DTU, Greater Copenhagen
Jon Ison
EBI, EMBL, Hinxton
Alexandre Joseph, Christophe Blanchet
IBCP, CNRS, Lyon
Steve Pettifer
University of Manchester
The exchange format for basic bioinformatics data
Suitable for Web services
BioXSD.org
Incompatible interfaces hamper usability
Compatible interfaces provide smooth interoperability
Goals of BioXSD:
• Filling the gap between specialised exchange formats (such as SBML, MAGE-ML, PDBML, phyloXML, PSI-MI MIF, GCDML, GLYDE-II, … )
• Compatible with SOAP & XML libraries for all main programming languages
• As lightweight as possible, but fitting everyone
• Developed and maintained in open but organised collaboration, welcoming requests from the community
• Detailed structure, "semantically rich", allowing in-depth validation, semantic annotation and efficient compression
• Annotated with the EDAM ontology: Data branch (+ Data format and Data resource)
BioXSD 1.0 defines exchange formats for:
references to data, accessions, …
annotated sequence
sequence alignment
biological sequence
BioXSD: BiosequenceRecord
BioXSD: BiosequenceAlignment
>sp|P43353|AL3B1_HUMAN Aldehyde dehydrogenase family 3 member B1 OS=Homo sapiens GN=ALDH3B1 PE=1 SV=1
MDPLGDTLRRLREAFHAGRTRPAEFRAAQLQGLGRFLQENKQLLHDALAQDLHKSAFESEVSEVAISQGEVTLALRNLRAWMKDERVPKNLATQLDSAFIRKEPFGLVLIIAPWNYPLNLTLVPLVGALAAGNCVVLKPSEISKNVEKILAEVLPQYVDQSCFAVVLGGPQETGQLLEHRFDYIFFTGSPRVGKIVMTAAAKHLTPVTLELGGKNPCYVDDNCDPQTVANRVAWFRYFNAGQTCVAPDYVLCSPEMQERLLPALQSTITRFYGDDPQSSPNLGRIINQKQFQRLRALLGCGRVAIGGQSDESDRYIAPTVLVDVQEMEPVMQEEIFGPILPIVNVQSLDEAIEFINRREKPLALYAFSNSSQVVKRVLTQTSSGGFCGNDGFMHMTLASLPFGGVGASGMGRYHGKFSFDTFSHHRACLLRSPGMEKLNALRYPPQSPRRLRMLLVAMEAQGCSCTLL
>AL3B1_HUMAN P43353 ALDEHYDE DEHYDROGENASE 3B1 (EC 1.2.1.5). - Homo sapiens (Human).
MDPLGDTLRRLREAFHAGRTRPAEFRAAQLQGLGRFLQENKQLLHDALAQDLHKSAFESEVSEVAISQGEVTLALRNLRAWMKDERVPKNLATQLDSAFIRKEPFGLVLIIAPWNYPLNLTLVPLVGALAAGNCVVLKPSEISKNVEKILAEVLPQYVDQSCFAVVLGGPQETGQLLEHRFDYIFFTGSPRVGKIVMTAAAKHLTPVTLELGGKNPCYVDDNCDPQTVANRVAWFRYFNAGQTCVAPDYVLCSPEMQERLLPALQSTITRFYGDDPQSSPNLGRIINQKQFQRLRALLGCGRVAIGGQSDESDRYIAPTVLVDVQEMEPVMQEEIFGPILPIVNVQSLDEAIEFINRREKPLALYAFSNSSQVVKRVLTQTSSGGFCGNDGFMHMTLASLPFGGVGASGMGRYHGKFSFDTFSHHRACLLRSPGMEKLNALRYPPQSPRRLRMLLVAMEAQGCSCTLL
>gi|4502043|ref|NP_000685.1| aldehyde dehydrogenase family 3 member B1 isoform a [Homo sapiens]
MDPLGDTLRRLREAFHAGRTRPAEFRAAQLQGLGRFLQENKQLLHDALAQDLHKSAFESEVSEVAISQGEVTLALRNLRAWMKDERVPKNLATQLDSAFIRKEPFGLVLIIAPWNYPLNLTLVPLVGALAAGNCVVLKPSEISKNVEKILAEVLPQYVDQSCFAVVLGGPQETGQLLEHRFDYIFFTGSPRVGKIVMTAAAKHLTPVTLELGGKNPCYVDDNCDPQTVANRVAWFRYFNAGQTCVAPDYVLCSPEMQERLLPALQSTITRFYGDDPQSSPNLGRIINQKQFQRLRALLGCGRVAIGGQSDESDRYIAPTVLVDVQEMEPVMQEEIFGPILPIVNVQSLDEAIEFINRREKPLALYAFSNSSQVVKRVLTQTSSGGFCGNDGFMHMTLASLPFGGVGASGMGRYHGKFSFDTFSHHRACLLRSPGMEKLNALRYPPQSPRRLRMLLVAMEAQGCSCTLL
>sp_ac|P43353 \ID= AL3B1_HUMAN \DE="Aldehyde dehydrogenase family 3 member B1 (Aldehyde dehydrogenase 7)" \NCBITAXID=9606
MDPLGDTLRRLREAFHAGRTRPAEFRAAQLQGLGRFLQENKQLLHDALAQDLHKSAFESEVSEVAISQGEVTLALRNLRAWMKDERVPKNLATQLDSAFIRKEPFGLVLIIAPWNYPLNLTLVPLVGALAAGNCVVLKPSEI
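The four FASTA flavours above show why incompatible interfaces hamper usability. Even extracting the UniProt accession needs a separate rule for each header convention. The minimal Python sketch below uses shortened copies of the headers. Its regular expressions are assumptions written only for these examples and are not a general FASTA header grammar. The gi/ref header carries a RefSeq ID instead, so no UniProt accession is found for it.

```python
import re
from typing import Optional

# One hand-written rule per header convention shown above (assumed, not general).
RULES = [
    re.compile(r"^>sp\|([^|]+)\|"),                # >sp|P43353|AL3B1_HUMAN ...
    re.compile(r"^>sp_ac\|(\S+)"),                 # >sp_ac|P43353 \ID= ...
    re.compile(r"^>[A-Z0-9]+_[A-Z0-9]+\s+(\S+)"),  # >AL3B1_HUMAN P43353 ...
]


def uniprot_accession(header: str) -> Optional[str]:
    """Return the UniProt accession if one of the known header styles matches."""
    for rule in RULES:
        match = rule.match(header)
        if match:
            return match.group(1)
    return None  # e.g. the gi|...|ref| style carries a RefSeq ID instead


HEADERS = [
    ">sp|P43353|AL3B1_HUMAN Aldehyde dehydrogenase family 3 member B1",
    ">AL3B1_HUMAN P43353 ALDEHYDE DEHYDROGENASE 3B1 (EC 1.2.1.5).",
    ">gi|4502043|ref|NP_000685.1| aldehyde dehydrogenase family 3 member B1 isoform a",
    r">sp_ac|P43353 \ID= AL3B1_HUMAN",
]
for h in HEADERS:
    print(uniprot_accession(h))
```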
Sequence record in BioXSD:
<mySequence xsi:type="AminoacidSequenceRecord">
  <sequence>MDPLGDTLRRLREAFHAGRTRPAEFRAAQLQGLGRFLQENKQLLHDALAQDLHKSAFESEVSEVAISQGEVTLALRNLRAWMKDERVPKNLATQLDSAFIRKEPFGLVLIIAPWNYPLNLTLVPLVGALAAGNCVVLKPSEISKNVEKILAEVLPQYVDQSCFAVVLGGPQETGQLLEHRFDYIFFTGSPRVGKIVMTAAAKHLTPVTLELGGKNPCYVDDNCDPQTVANRVAWFRYFNAGQTCVAPDYVLCSPEMQERLLPALQSTITRFYGDDPQSSPNLGRIINQKQFQRLRALLGCGRVAIGGQSDESDRYIAPTVLVDVQEMEPVMQEEIFGPILPIVNVQSLDEAIEFINRREKPLALYAFSNSSQVVKRVLTQTSSGGFCGNDGFMHMTLASLPFGGVGASGMGRYHGKFSFDTFSHHRACLLRSPGMEKLNALRYPPQSPRRLRMLLVAMEAQGCSCTLL</sequence>
  <species>
    <databaseName>NCBI Taxonomy</databaseName>
    <accession>9606</accession>
    <entryUri>http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9606</entryUri>
    <name>Human</name>
  </species>
  <customName>Aldehyde dehydrogenase family 3 member B1 (ALDH3B1)</customName>
  <formalReference>
    <databaseName>UniProt</databaseName>
    <accession xsi:type="UniprotAccession">P43353</accession>
    <entryUri>http://www.uniprot.org/uniprot/P43353</entryUri>
    <sequenceVersion>1</sequenceVersion>
    <isoformAccession xsi:type="ExtendedUniprotAccession">P43353-1</isoformAccession>
  </formalReference>
</mySequence>
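By contrast, the BioXSD record carries each field in its own typed element, so a generic XML library is enough to read it. The minimal Python sketch below uses `xml.etree.ElementTree`. It works on a shortened copy of the record above. An `xmlns:xsi` declaration has been added so that the fragment parses, and the sequence has been truncated. A real BioXSD document would also carry the BioXSD target namespace, which is omitted here as it is on the slide.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Shortened copy of the record above (sequence truncated for brevity).
RECORD = """<mySequence xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:type="AminoacidSequenceRecord">
  <sequence>MDPLGDTLRRLREAFHAGRTRPAEFRAAQLQGLGRFLQENKQLL</sequence>
  <species>
    <databaseName>NCBI Taxonomy</databaseName>
    <accession>9606</accession>
    <name>Human</name>
  </species>
  <formalReference>
    <databaseName>UniProt</databaseName>
    <accession xsi:type="UniprotAccession">P43353</accession>
  </formalReference>
</mySequence>"""


def summarise(xml_text: str) -> Dict[str, Optional[str]]:
    """Pull typed fields out of a sequence record with a generic XML parser."""
    root = ET.fromstring(xml_text)
    accession = root.find("formalReference/accession")
    return {
        "record_type": root.get(XSI_TYPE),
        "taxon": root.findtext("species/accession"),
        "database": root.findtext("formalReference/databaseName"),
        "accession": accession.text if accession is not None else None,
        "accession_type": accession.get(XSI_TYPE) if accession is not None else None,
    }


print(summarise(RECORD))
```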
Sequence record in BioXSD 1.0:
Sequence-string restriction in BioXSD:
<xs:simpleType name="NucleotideSequence" sawsdl:modelReference="http://purl.org/edam/data/0001211">
  <xs:annotation>
    <xs:documentation>
      Nucleotide sequence in any letter case, without ambiguous ("degenerate") bases
    </xs:documentation>
  </xs:annotation>
  <xs:restriction base="GenericNucleotideSequence">
    <xs:pattern value="[acgt]+"/>
    <xs:pattern value="[acgu]+"/>
  </xs:restriction>
</xs:simpleType>
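The meaning of the two pattern facets can be approximated in Python. XSD patterns are implicitly anchored to the whole value, which corresponds to `re.fullmatch`. Several `xs:pattern` facets in the same restriction step act as alternatives, so a sequence must be entirely DNA-like or entirely RNA-like. Mixing T and U is rejected. The sketch does not model constraints inherited from the base type. Taken literally, both patterns list only lowercase letters.

```python
import re

# The two xs:pattern facets from the NucleotideSequence restriction above.
PATTERNS = ("[acgt]+", "[acgu]+")


def matches_nucleotide_sequence(value: str) -> bool:
    """Approximate the restriction's pattern facets: whole-value match, facets OR-ed."""
    return any(re.fullmatch(p, value) is not None for p in PATTERNS)


for candidate in ("acgt", "acgu", "acgtu", "acgn"):
    print(candidate, matches_nucleotide_sequence(candidate))
```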
BioXSD 1.0 types:
SimpleTypes:
NucleotideSequence, AminoacidSequence, GeneralNucleotideSequence, GeneralAminoacidSequence, Biosequence
Accession(s)
helper types: Name, FreeText, Uri, Integer(s), Decimal(s) … and a few more
ComplexTypes:
NucleotideSequenceRecord, AminoacidSequenceRecord, GeneralNucleotideSequenceRecord, GeneralAminoacidSequenceRecord, BiosequenceRecord
…SequenceAlignment(s)
AnnotatedSequence
DatabaseReference, EntryReference, OntologyReference, OntologyTerm, Species, SequenceReference, Method
helper types: Score, SequencePosition(s) … a few more
BioXSD can be used:
• Directly as an input/output format of tools
• BioXSD can be extended, restricted, or included within other formats
• BioXSD can serve as the intermediate canonical format
With BioXSD, users of Web services get smooth interoperability
With BioXSD, providers of Web services get ready-made building blocks for interfaces
<wsdl:types>
  <xsd:schema ...>
    ...
    <xsd:element name="myOperation">
      ...
    </xsd:element>
    <xsd:element name="myOperationResponse">
      ...
    </xsd:element>
    ...
  </xsd:schema>
</wsdl:types>
BioXSD data types
Get ready for the exercise
• Open this tutorial from www.ii.uib.no/~matus/tutorial_webservices.pdf
• Open the BioXSD documentation: bioxsd.org/technicalDocumentation/BioXSD-1.0
• Download EDAM from sourceforge.net/projects/edamontology/files & install OBO-Edit: oboedit.org
  Or open EDAM at bioportal.bioontology.org/visualize/44871
• Install SoapUI: soapui.org
• If you wish, install an XML editor: oxygenxml.com (or XMLSpy from altova.com; Win only)
Download example WSDL (template): www.ii.uib.no/~matus/example.wsdl.xml
Download filled-in example WSDL: www.ii.uib.no/~matus/example2.wsdl.xml
Exercise a)
The WSDL you have downloaded is an example of a document/literal-wrapped, WS-I compliant WSDL.
Following the recommended WSDL-first approach, edit the WSDL to match the tool you have in mind.
Change the names of constructs in the WSDL, and fill in wsdl:documentation in wsdl:portType and xs:documentation of the I/O parameters.
Exercise b)
Use a standard XML format for input parameters where applicable.
Define the structure/format of the output data. Use standard formats where applicable.
Exercise c)
Annotate wsdl:portType and its wsdl:operation-s with terms from EDAM ('Topic', 'Operation', 'Data resource') and possibly another ontology.
Annotate xs:element-s and other parts of the I/O data that you have defined with terms from EDAM ('Data', 'Data format'). If you are using BioXSD types, these are already annotated.
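Before submitting the exercise, a quick self-check can list element declarations that still lack an annotation. The minimal Python sketch below inspects only the `sawsdl:modelReference` attribute on named `xs:element` declarations. An element whose meaning is annotated on its type instead, such as one using an already-annotated BioXSD type, would still be listed. The schema fragment and the element name "jobId" are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

XS_NS = "http://www.w3.org/2001/XMLSchema"
MODEL_REF = "{http://www.w3.org/ns/sawsdl}modelReference"

# Hypothetical schema fragment: one annotated element, one still missing its annotation.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:sawsdl="http://www.w3.org/ns/sawsdl">
  <xs:element name="outfile" type="xs:string"
      sawsdl:modelReference="http://purl.org/edam/data/0000863"/>
  <xs:element name="jobId" type="xs:string"/>
</xs:schema>"""


def unannotated_elements(xml_text: str) -> List[str]:
    """Names of xs:element declarations carrying no sawsdl:modelReference URIs."""
    root = ET.fromstring(xml_text)
    return [
        el.get("name")
        for el in root.iter(f"{{{XS_NS}}}element")
        if el.get("name") and not el.get(MODEL_REF, "").split()
    ]


print(unannotated_elements(SCHEMA))  # ['jobId']
```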