Interoperable Semantic Web services

Matúš Kalaš, Computational Biology Unit, Uni Bergen, Norway
Jon Ison, EMBL-EBI, Hinxton, U.K.

SWAT4LS tutorial, Berlin, 9th Dec 2010

Practical information

Coffee break 11:00 – 11:30

Hands-on & discussion after the coffee break

Tutorial end & lunch 13:00

This tutorial www.ii.uib.no/~matus/tutorial_webservices.pdf

This tutorial at SWAT4LS

Our approach is

Practically oriented

Semantic Web enthusiastic & friendly

But not Semantic Web fanatic

Tools can be available as:

                          download & install          access through Web
human interaction         applications or programs    Web applications
programmatic interface    APIs (libraries)            Web services

Requirements for Web services:

Easy to find

Interoperable with programmatic libraries

Easy to construct workflows

Produce semantically rich data

Automated workflow construction possible

The EMBRACE approach

The EMBRACE technology recommendation

Standard SOAP Web services

WS-I compliant + document/literal wrapped SOAP binding

WSDL-first (interface-centric design)

Use standard exchange formats & detailed XML Schema when applicable

Test, test, test using various client frameworks & programming languages

The EMBRACE technology recommendation

BioXSD

Document, and annotate WSDL with ontology terms using the SAWSDL standard

EDAM

BioXSD

EDAM

Annotation with EDAM

Hands-on exercise

Discussion & feedback

EDAM: EMBRACE Data and Methods

Ontology for Bioinformatics Tools and Datatypes

Jon Ison ([email protected])

Matus Kalas ([email protected])

What is EDAM?

EMBRACE Data and Methods

Ontology for bioinformatics tools and data types

A set of defined terms, relationships between terms and rules that govern the terms and relations

Glorified glossary – with terms organised by is_a relations (class/subclass) into hierarchy

Controlled vocabulary for describing:
• Web services, e.g. WSDL files
• XSD data schema, e.g. associated with a WSDL file
• Standalone tools
• Web servers
• Databases
• Ontologies
• Data objects
• Data syntax and file formats
• etc.

Aims to describe (coarse level) all major bioinformatics databases, data types and tools in use

Scope

EDAM is 6 sub-ontologies (branches of terms in their own namespace) in the domain of "bioinformatics tool and data description":

• topic – “A general field of bioinformatics study, data, processing and analysis or technology.”

• operation – “A specific, singular function or process performed by a tool, for example a WS operation. What is done, but not (typically) how or in what context.”

• data resource – “A category of content of a data source including databases and ontologies.”

• data – “A semantic description of a data entity (datum) commonly used in bioinformatics.”

• format – “A reference (typically a URL) of a data format specification.”

• biological entity – “Any biological thing (or part of a thing) with a physical existence, a physical part, region or feature that can be mapped to such a thing, a collection of such things or an observable phenomenon or occurrence”

• identifier (sub-branch of data) – “Something that identifies (typically uniquely) something such as an entity, database, ontology, datatype”.

biological entity provides biological context to other branches. It is not specific to the domain and might (eventually) be removed

Term Examples

"Topic"
o Alignment
o Biostatistics
o Chemoinformatics
o Database and file management

"Operation" *
o Annotation
o Comparison and alignment
o Mapping and assembly
o Modelling and simulation
o Plotting and rendering

"Data resource"
o Biological resource
o Cell biology and culture
o Classification and nomenclature
o Genetics

"Data" **
o Alignment data
o Biological model
o Sequence data
o Identifier

"Data format"
o "Binary format"
o "HTML format"
o "Text format"
o "XML format"

"Biological entity"
o Phenomenon
o Metabolic pathway
o Mutation
o Physical entity
o Atom
o Protein

* Top-level operations are coarse-grained (abstract) providing a navigable top-level

Term Relations

8 basic types:
• is_a
• concerns: topic concerns data resource / data / operation / entity
• has_input: operation has_input data
• has_output: operation has_output data
• is_source_of: resource is_source_of data
• is_identifier_of: identifier is_identifier_of data
• is_format_of: format is_format_of data
• has_attribute: entity has_attribute data

Relations are:
• Defined between pairs of terms
• Directional
• Transitive (propagated from child to parent terms), e.g. if A is_a B is_a C we can infer A is_a C.

Rules:
• Define which relations must (or may) be specified for which terms
• Reflect well-established or self-evident principles

Term hierarchy
• Every term (excluding top-level) is related to one or more other terms by an is_a (subclass) relationship. is_a relations define the basic term hierarchy.
• All "child" terms must share the intrinsic property of their "parent", in addition to having their own intrinsic property.
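The transitivity of is_a described above can be sketched in a few lines of Python. This is only an illustration: the terms A, B and C are the placeholders from the slide, and the dictionary layout and the function name `ancestors` are assumptions made for this example, not part of EDAM or of any OBO library.

```python
# Toy is_a graph using the placeholder terms from the slide (A is_a B is_a C).
# The dict layout is an assumption for illustration only.
IS_A = {
    "A": ["B"],
    "B": ["C"],
    "C": [],
}

def ancestors(term, is_a=IS_A):
    """Return every term reachable from `term` by following is_a links."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            # Follow the parent's own is_a links (this is the transitive step).
            stack.extend(is_a.get(parent, []))
    return seen

print(sorted(ancestors("A")))
```

Because a term may have more than one parent, the sketch keeps a set of visited terms so that shared ancestors are reported once.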

Conceptual model

Boxes indicate a namespace (top-level term)

Text indicates a term relation

Topic aggregates related concepts that would otherwise be unrelated.

Data includes everything from primitive types (e.g. simple parameters) to complex derived types (e.g. biological data)

Terms

Each term corresponds to a well-established concept (class) with >=1 intrinsic property

• The class (term + child terms) must have these properties!
• Child terms can only add new properties (restrictions not allowed)

EDAM is in OBO format – convenient for editing, does everything we need.

An OBO term consists of:
• Unique identifier - persistent IDs (see below)
• Name - intuitive & consistent naming conventions
• Namespace - "topic", "operation", "data resource" etc.
• Definition - with consistent semantics
• Comment (optional) - e.g. on term usage, boundaries with other ontologies etc.
• Synonym(s) (optional) - where in common use
• Cross-reference(s) (optional) - to various resources
• Relationships to other terms

Handling change / persistence

Term IDs will persist between versions:
• A term ID will never be deleted once created
• A given ID will always identify the same concept
• Term names, definitions and comments might change, but will remain true to concept
• Obsolete terms will persist (remain in EDAM with same ID)

OBO Term Statement

[Term]
id: EDAM:0000970
name: Citation
namespace: data
def: "A bibliographic citation providing references to scientific article, book or other published material." [EDAM:EBI "EMBRACE definition"]
comment: A citation might include the authors, title and journal name, date and (possibly) an abstract of the publication or link to the full-text if it's freely available.
synonym: "Reference" EXACT []
xref: Moby:GCP_SimpleCitation
xref: Moby:Publication
is_a: EDAM:0002526 ! Textual data

[Term]
id: EDAM:0000292
name: Sequence alignment
namespace: operation
def: "Align molecular sequences." [EDAM:EBI "EMBRACE definition"]
synonym: "Sequence alignment generation" EXACT []
is_a: EDAM:0002463 ! Sequence alignment processing
is_a: EDAM:0002451 ! Sequence comparison
relationship: has_input EDAM:0000841 ! Undefined
relationship: has_output EDAM:0000863 ! Sequence alignment
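To make the structure of such a statement concrete, here is a minimal Python sketch that reads one [Term] stanza into a dictionary. It is a simplification and not a full OBO parser: the function name `parse_obo_term` is hypothetical, and the sketch assumes one "tag: value" pair per line and treats " ! " as the start of a trailing comment.

```python
def parse_obo_term(stanza):
    """Parse one OBO [Term] stanza into a dict of tag -> list of values."""
    term = {}
    for line in stanza.strip().splitlines():
        line = line.strip()
        if not line or line == "[Term]":
            continue
        tag, _, value = line.partition(": ")
        # Drop the trailing "! human-readable name" comment, if present.
        # (Simplification: a real parser must respect quoted strings.)
        value = value.split(" ! ")[0].strip()
        term.setdefault(tag, []).append(value)
    return term

STANZA = """
[Term]
id: EDAM:0000292
name: Sequence alignment
namespace: operation
def: "Align molecular sequences." [EDAM:EBI "EMBRACE definition"]
synonym: "Sequence alignment generation" EXACT []
is_a: EDAM:0002463 ! Sequence alignment processing
is_a: EDAM:0002451 ! Sequence comparison
relationship: has_input EDAM:0000841 ! Undefined
relationship: has_output EDAM:0000863 ! Sequence alignment
"""

term = parse_obo_term(STANZA)
print(term["id"], term["namespace"], term["is_a"])
```

Tags that may repeat (is_a, relationship, xref) are the reason every value is stored in a list.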

Status

EDAM is in “beta”:

• EDAM_beta10 – many cycles of design/changes/inspection!

• Provides coarse coverage of tools and data types in the EMBRACE Registry / BioCatalogue:

• Suitable for pilot usage (please provide feedback!)

• Starting point for service nomenclature

Coverage - Quite broad in general and quite deep for sequence analysis:

• ~2500 terms with definitions (Nov’ 2010)

• 8 basic types of relation (plus inverse relations)

• Relations are defined but not used in many term definitions (in progress!)

Terms                     Relations
Topic: 148                is_a: 2822
Operation: 500            concerns: 198
Resource: 159             has_input: 20
Data: 1028                has_output: 329
Data (Identifier): 410    is_source_of: 205
Format: 274               is_identifier_of: 257
Entity: 81                is_format_of: 0
                          has_attribute: 27

Design Principles

It wasn’t just thrown together (honestly) …

• Clearly defined scope
• Purpose-independent (design not tied to a use case)
• Relevant to annotation of current:
  • WSDL files
  • XSD schema
  • Standalone databases, servers and tools
• General (common concepts only, no fine-grained specialised concepts)
• Comprehensive (enough terms to be useful)
• Uncluttered (minimal namespaces and relation types)
• Comprehensible (terms and relations are simple and intuitive)
• Navigable (simple class (is_a) hierarchy)
• Integrity (genuine class/subclass relationships with concepts having unique properties)
• Complementary (not duplicating established ontologies)
• Cross-referenced (to existing resources)
• Validatable (via file parsing / checks in viewers etc.)
• Compatible (so far as possible with "upper level" ontologies)
• Extensible (clear guidelines for developers)
• Convenient (clear guidelines for annotators)

There is a compromise in achieving the above – a pragmatic approach is essential!

Limitations

EDAM is/does not:

• Describe syntax or file formats in detail (format namespace will provide URL references)

• Define data structures. has_part / is_part_of relations are not used - it’s difficult (impossible?) to add such relations in a generic way without EDAM becoming a specific format of data types

• Include terms for every conceptual part of things. Typically a datatype is only listed if it is known to be in common use

• A catalogue of individual data structures, databases etc. Terms correspond to classes; specific instances (individuals) are not included.

• Complete (and arguably never can be).

• A very complex ontology. Many relations and other domain features that could be expressed, e.g. in OWL format, are not modelled.

Other tools are needed to unify all services and data - EDAM should help!

Viewing

EDAM may be viewed in:

• Any text editor

• Ontology editor: OBO Ontology Editor (OBOEdit) Version 2, http://oboedit.org

• Web-based browsers:
  NCBO Ontology Browser http://bioportal.bioontology.org/visualize/42800
  EBI Ontology Look-up Service http://www.ebi.ac.uk/ontology-lookup/

• SRS: EBI SRS server http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-page+LibInfo+-lib+EDAM

Thanks

• Peter Rice (boss)
• Hamish McWilliam (SRS)
• Alan Bleasby (PURLs)
• Mahmut Uludag (EMBOSS WS)
• James Malone (SWO)
• Steve Pettifer
• The Forgotten … (sorry)

All enquiries to Jon Ison ([email protected]) cc’ing Matus Kalas ([email protected])

Download & Documentation

"Beta" version in OBO (Open Biomedical Ontologies) format: http://sourceforge.net/projects/edamontology/files/

Documentation at:

http://edamontology.sourceforge.net/

Including clear statement of:

• Branches of terms (namespaces / sub-ontologies)

• Relations

• Rules (governing terms and relations)

• Guidelines for Developers

• Guidelines for Annotators (basic)

• And more …


Annotation Using EDAM

Ontology for Bioinformatics Tools and Datatypes

Jon Ison ([email protected])

Matus Kalas ([email protected])


Guidelines for Annotators

Which EDAM branch to use?
• "Topic" - any resources (very coarse-grained annotation)
• "Operation" - tool functions (fine-grained annotation)
• "Data resource" - databases, servers etc. (broad categories based on content type)
• "Data" - datatypes (annotation of semantics)
• "Data format" - datatypes / formats (annotation of syntax)

Picking Terms

Many annotations? First familiarise yourself with EDAM (OBOEdit, NCBO etc.)

• Identify the correct branch/namespace ("Operation", "Data" etc.) considering what is being annotated
• Search EDAM using keywords to find candidate terms.
• Multiple searches using synonyms, alternative spellings etc. are preferable.
• Pick the most specific term(s) available.
• Bear in mind some concepts are necessarily overlapping or general.
• Only pick correct terms
• If a term doesn't exist, request that it be added to EDAM
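The search steps above can be sketched as a small Python helper that restricts candidates to one branch and matches a keyword against both names and synonyms. This is only an illustration: the list `TERMS` holds just the fields quoted for three terms in this tutorial (the empty synonym list for EDAM:0000863 simply means none is quoted here), and `find_terms` is a hypothetical name, not an EDAM or OBO-Edit function.

```python
# Only fields quoted in this tutorial; not the real EDAM term list.
TERMS = [
    {"id": "EDAM:0000970", "name": "Citation", "namespace": "data",
     "synonyms": ["Reference"]},
    {"id": "EDAM:0000292", "name": "Sequence alignment", "namespace": "operation",
     "synonyms": ["Sequence alignment generation"]},
    {"id": "EDAM:0000863", "name": "Sequence alignment", "namespace": "data",
     "synonyms": []},
]

def find_terms(keyword, namespace, terms=TERMS):
    """Return IDs of terms in `namespace` whose name or synonyms contain `keyword`."""
    needle = keyword.lower()
    return [
        t["id"]
        for t in terms
        if t["namespace"] == namespace
        and any(needle in label.lower() for label in [t["name"], *t["synonyms"]])
    ]

print(find_terms("alignment", "operation"))
print(find_terms("reference", "data"))
```

Filtering by namespace first reflects the guideline that the same name (here "Sequence alignment") can exist in several branches with different meanings.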

Use of other ontologies

Use other ontologies too, where possible and desirable.

e.g. an operation that predicts molecular sequence features could be annotated with terms from SO (Sequence Ontology) for the features.

Model of a Web Service

A WS is considered as an arbitrary (but usually related) set of one or more operations, reducing the problem of WS interoperation to one of compatibility between operations.

Operation
• Discrete unit of functionality performing (typically) one or more definite functions
• Reads an input
• Writes an output
• Uses zero or more data resources

Input / Output
• Payload (e.g. of a SOAP message) passed / returned in an operation call
• Name and (ideally) description is given (e.g. in WSDL file)
• Input / output has one or more XML elements which must be set (input values) or which are written (output values)

XML elements
• Correspond to the input and output service parameters: values which are set or generated
• Name and (ideally) description of element is given in schema
• Simple or complex XSD types in an XML schema (e.g. within / referenced from WSDL file)
• Element values are instances of a particular datatype with a semantic type and a specific syntax.
• Most element values have a syntax fully specified by the schema
• Some element values correspond to formatted text not specified by the schema. Such reports may be a composite of different semantic types.

Data resources
• Databases or ontologies used in the background
• Not passed in a WS call
• Might be specified indirectly via a parameter, e.g. an operation reads a database, the name of which is specified

Levels of Annotation

Annotation of a WSDL file or associated XSD schema is possible at several levels.

Assuming SAWSDL annotation: http://www.w3.org/TR/sawsdl/

Annotatable XML elements are:

1. Web service (the whole thing as a collection of possibly multiple operations) ( <wsdl:portType> )
   * One (or more) "Topic" terms for the general area(s) the service concerns
   * One (or more) "Data resource" terms for data resources used by the service (if applicable)

2. Operation ( <wsdl:operation> inside <wsdl:portType> )
   * One (or more) "Operation" terms for each WSDL operation (typically just 1 annotation)

3. Input parameters and their sub-parts
4. Output parameters and their sub-parts
   ( <xs:element>, <xs:complexType>, <xs:simpleType>, <xs:attribute> )
   * One (or more) "Data" terms
   * One (or more) "Data format" terms
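The levels above amount to a lookup from the annotated XML element to the EDAM branches that fit it. A minimal Python sketch of such a lookup follows; the table restates the list, while the names `ALLOWED_BRANCHES` and `is_allowed` are hypothetical and the check is a convenience for annotators, not something required by SAWSDL.

```python
# Which EDAM branches fit which annotatable element, restating the list above.
ALLOWED_BRANCHES = {
    "wsdl:portType": {"Topic", "Data resource"},
    "wsdl:operation": {"Operation"},
    "xs:element": {"Data", "Data format"},
    "xs:complexType": {"Data", "Data format"},
    "xs:simpleType": {"Data", "Data format"},
    "xs:attribute": {"Data", "Data format"},
}

def is_allowed(element, branch):
    """True if a term from `branch` is a suggested annotation for `element`."""
    return branch in ALLOWED_BRANCHES.get(element, set())

print(is_allowed("wsdl:operation", "Operation"))
print(is_allowed("wsdl:portType", "Data format"))
```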

SAWSDL annotation

Each EDAM term must have a URI

PURLs (Persistent Uniform Resource Locators) are used

EDAM PURLs include the ontology name (edam), term namespace, and the term unique identifier (ID)

A PURL should be given inside the sawsdl:modelReference annotation

<element name="elementName" sawsdl:modelReference="http://purl.org/edam/namespace/id">

Where ...

* element is the XML element being annotated
* elementName is the name of the XML element
* namespace is the namespace of the EDAM term, e.g. "operation"
* id is the unique identifier of the term, e.g. "0000295"

The PURL (the value of the sawsdl:modelReference attribute) is a URI pointing to the term definition.

PURLs for all EDAM terms have been created.


SAWSDL annotation

These 3 EDAM terms:

[Term]
id: EDAM:0000182
name: Sequence alignment
namespace: topic
...

[Term]
id: EDAM:0000292
name: Sequence alignment
namespace: operation
...

[Term]
id: EDAM:0000863
name: Sequence alignment
namespace: data
...

Would give these 3 PURLs:

http://purl.org/edam/topic/0000182

http://purl.org/edam/operation/0000292

http://purl.org/edam/data/0000863
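The mapping from a term to its PURL can be sketched in Python as follows. The function name `edam_purl` is hypothetical, and the sketch only covers single-word namespaces as in the three examples; how multi-word namespaces such as "data resource" appear in a PURL is not shown here, so the sketch does not guess it.

```python
def edam_purl(obo_id, namespace):
    """Build http://purl.org/edam/<namespace>/<id> from an OBO id such as 'EDAM:0000292'."""
    prefix, _, number = obo_id.partition(":")
    if prefix != "EDAM" or not number.isdigit():
        raise ValueError(f"not an EDAM id: {obo_id!r}")
    if " " in namespace:
        # The PURL form of multi-word namespaces is not covered by this sketch.
        raise ValueError(f"unsupported namespace: {namespace!r}")
    return f"http://purl.org/edam/{namespace}/{number}"

for obo_id, namespace in [
    ("EDAM:0000182", "topic"),
    ("EDAM:0000292", "operation"),
    ("EDAM:0000863", "data"),
]:
    print(edam_purl(obo_id, namespace))
```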




SAWSDL annotation

Which can be used in SAWSDL annotation, e.g.

<wsdl:portType name="myService" sawsdl:modelReference="http://purl.org/edam/topic/0000182">

<sawsdl:attrExtensions sawsdl:modelReference="http://purl.org/edam/operation/0000292">

<xs:element name="outfile" sawsdl:modelReference="http://purl.org/edam/data/0000863">

If >1 annotation is required, delimit them with space characters:

<wsdl:portType name="myService" sawsdl:modelReference="http://purl.org/edam/topic/0000182 http://purl.org/edam/topic/0000181">

Note such multiple annotations:
• Can be from different namespaces
• Can be from different ontologies entirely
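On the consuming side, a space-delimited sawsdl:modelReference value has to be split back into individual URIs. The Python sketch below does that and, for URIs that follow the EDAM PURL pattern, also extracts the namespace and ID. The function name `parse_model_reference` and the returned dictionary layout are assumptions for this example; URIs from other ontologies are kept as they are.

```python
from urllib.parse import urlparse

def parse_model_reference(value):
    """Split a sawsdl:modelReference value into one record per URI."""
    refs = []
    for uri in value.split():  # whitespace-separated list of URIs
        parsed = urlparse(uri)
        parts = parsed.path.strip("/").split("/")
        if parsed.netloc == "purl.org" and len(parts) == 3 and parts[0] == "edam":
            refs.append({"uri": uri, "namespace": parts[1], "id": parts[2]})
        else:
            # Not an EDAM PURL: keep the URI, leave the EDAM fields empty.
            refs.append({"uri": uri, "namespace": None, "id": None})
    return refs

value = "http://purl.org/edam/topic/0000182 http://purl.org/edam/topic/0000181"
for ref in parse_model_reference(value):
    print(ref["namespace"], ref["id"])
```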






SAWSDL annotation

SAWSDL spec. peculiarity when annotating operations

Annotations on the <wsdl:operation> element inside <wsdl:portType> should be handled using a <sawsdl:attrExtensions> element. This is not a requirement for other elements.

The <sawsdl:attrExtensions> element inside the <wsdl:operation> must come before <wsdl:input>, <wsdl:output> and <wsdl:fault> (typically after the <wsdl:documentation> element).

For example:

<wsdl:portType name="Clustalw2PortType" sawsdl:modelReference="http://purl.org/edam/topic/0000186">
  <wsdl:operation name="submitClustalw2">
    <wsdl:documentation>Submit a sequence and get a jobID</wsdl:documentation>
    <sawsdl:attrExtensions sawsdl:modelReference="http://purl.org/edam/operation/0000496"/>
    <wsdl:input message="submitClustalw2Msg"/>
    <wsdl:output message="submitClustalw2ResponseMsg"/>
  </wsdl:operation>
</wsdl:portType>

Caution: Some WSDL/XSD validators or SOAP libraries do not check for it, but some do require the strict order of these elements!
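Since not every validator reports a misplaced annotation element, a small check can be run before publishing a WSDL. The Python sketch below uses only the standard library and lists operations in which sawsdl:attrExtensions appears after an input, output or fault element. The function name `misplaced_attr_extensions` and the cut-down example document are assumptions for illustration; the namespace URIs are those of WSDL 1.1 and SAWSDL.

```python
import xml.etree.ElementTree as ET

WSDL = "http://schemas.xmlsoap.org/wsdl/"
SAWSDL = "http://www.w3.org/ns/sawsdl"
MESSAGE_TAGS = {f"{{{WSDL}}}{name}" for name in ("input", "output", "fault")}

def misplaced_attr_extensions(wsdl_text):
    """Return names of portType operations whose attrExtensions follows input/output/fault."""
    root = ET.fromstring(wsdl_text.strip())
    bad = []
    for port_type in root.iter(f"{{{WSDL}}}portType"):
        for op in port_type.findall(f"{{{WSDL}}}operation"):
            seen_message = False
            for child in op:
                if child.tag in MESSAGE_TAGS:
                    seen_message = True
                elif child.tag == f"{{{SAWSDL}}}attrExtensions" and seen_message:
                    bad.append(op.get("name"))
    return bad

# Hypothetical, cut-down WSDL used only to exercise the check.
EXAMPLE = """
<wsdl:definitions xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
                  xmlns:sawsdl="http://www.w3.org/ns/sawsdl">
  <wsdl:portType name="Clustalw2PortType">
    <wsdl:operation name="wellOrdered">
      <wsdl:documentation>Submit a sequence and get a jobID</wsdl:documentation>
      <sawsdl:attrExtensions sawsdl:modelReference="http://purl.org/edam/operation/0000496"/>
      <wsdl:input message="m"/>
    </wsdl:operation>
    <wsdl:operation name="wronglyOrdered">
      <wsdl:input message="m"/>
      <sawsdl:attrExtensions sawsdl:modelReference="http://purl.org/edam/operation/0000496"/>
    </wsdl:operation>
  </wsdl:portType>
</wsdl:definitions>
"""

print(misplaced_attr_extensions(EXAMPLE))
```

The check looks only inside wsdl:portType, because that is where the operation annotations described above are placed.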

EDAM term end-points

When pasted into a browser, the PURLs:

http://purl.org/edam/topic/0000182
http://purl.org/edam/operation/0000292
http://purl.org/edam/data/0000863

... resolve to:

http://www.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182
http://www.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000292
http://www.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000863

These are complete OBO term statements in plain text (OBO format). PURLs support text extensions allowing a format specifier to be added. For example these PURLs:

http://purl.org/edam/topic/0000182?style=html
http://purl.org/edam/operation/0000292?style=html
http://purl.org/edam/data/0000863?style=html

... will resolve to OBO term statements in HTML such that terms referred to in the statements (via relations) will be clickable to allow navigation:

http://www.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182?style=html
http://www.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000292?style=html
http://www.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000863?style=html
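Adding the format specifier to a PURL can be done with plain string handling, but a URL library avoids mistakes when a query string is already present. A minimal Python sketch, where the function name `with_style` is hypothetical and nothing is fetched over the network:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def with_style(purl, style):
    """Return `purl` with its `style` query parameter set to `style`."""
    parts = urlsplit(purl)
    query = dict(parse_qsl(parts.query))
    query["style"] = style  # add the parameter, or overwrite an existing one
    return urlunsplit(parts._replace(query=urlencode(query)))

print(with_style("http://purl.org/edam/topic/0000182", "html"))
```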

EDAM term end-points

The eventual final list of end-points will provide other formats/views:

• Plain text in OBO format (default)
• HTML
• XML
• JSON
• The term in a web browser, e.g. NCBO Ontology Browser.

http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182?style=html
http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182%format=xml
http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182%format=txt
http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182%format=json
http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182%format=browser (default)

Not all these formats are yet implemented!

For now, you can see this in action for this term:

http://purl.org/edam/entity/0000002 http://purl.org/edam/entity/0000002?style=html

Matúš Kalaš, Pål Puntervoll, Armin Töpfer, Prabakar Venkataraman, Jan Christian Bryne, Inge Jonassen
CBU, BCCS, Bergen

Edita Bartaševičiūtė, Kristoffer Rapacki
CBS, DTU, Greater Copenhagen

Jon Ison
EBI, EMBL, Hinxton

Alexandre Joseph, Christophe Blanchet
IBCP, CNRS, Lyon

Steve Pettifer
University of Manchester

The exchange format for basic bioinformatics data

Suitable for Web services

BioXSD.org

Incompatible interfaces hamper usability

Compatible interfaces provide smooth interoperability

Goals of BioXSD:

• Filling the gap between specialised exchange formats (such as SBML, MAGE-ML, PDBML, phyloXML, PSI-MI MIF, GCDML, GLYDE-II, … )

• Compatible with SOAP & XML libraries for all main programming languages

• As lightweight as possible, but fitting everyone

• Developed and maintained in open but organised collaboration, welcoming requests from the community

• Detailed structure, "semantically rich", allowing in-depth validation, semantic annotation, efficient compression

• Annotated by the EDAM ontology: Data branch (+ Data format and Data resource)

BioXSD 1.0 defines exchange formats for:

references to data, accessions, …

annotated sequence

sequence alignment

biological sequence

BioXSD : BiosequenceRecord

BioXSD : BiosequenceAlignment

>sp|P43353|AL3B1_HUMAN Aldehyde dehydrogenase family 3 member B1 OS=Homo sapiens GN=ALDH3B1 PE=1 SV=1
MDPLGDTLRRLREAFHAGRTRPAEFRAAQLQGLGRFLQENKQLLHDALAQDLHKSAFESEVSEVAISQGEVTLALRNLRAWMKDERVPKNLATQLDSAFIRKEPFGLVLIIAPWNYPLNLTLVPLVGALAAGNCVVLKPSEISKNVEKILAEVLPQYVDQSCFAVVLGGPQETGQLLEHRFDYIFFTGSPRVGKIVMTAAAKHLTPVTLELGGKNPCYVDDNCDPQTVANRVAWFRYFNAGQTCVAPDYVLCSPEMQERLLPALQSTITRFYGDDPQSSPNLGRIINQKQFQRLRALLGCGRVAIGGQSDESDRYIAPTVLVDVQEMEPVMQEEIFGPILPIVNVQSLDEAIEFINRREKPLALYAFSNSSQVVKRVLTQTSSGGFCGNDGFMHMTLASLPFGGVGASGMGRYHGKFSFDTFSHHRACLLRSPGMEKLNALRYPPQSPRRLRMLLVAMEAQGCSCTLL

>AL3B1_HUMAN P43353 ALDEHYDE DEHYDROGENASE 3B1 (EC 1.2.1.5). - Homo sapiens (Human).
MDPLGDTLRRLREAFHAGRTRPAEFRAAQLQGLGRFLQENKQLLHDALAQDLHKSAFESEVSEVAISQGEVTLALRNLRAWMKDERVPKNLATQLDSAFIRKEPFGLVLIIAPWNYPLNLTLVPLVGALAAGNCVVLKPSEISKNVEKILAEVLPQYVDQSCFAVVLGGPQETGQLLEHRFDYIFFTGSPRVGKIVMTAAAKHLTPVTLELGGKNPCYVDDNCDPQTVANRVAWFRYFNAGQTCVAPDYVLCSPEMQERLLPALQSTITRFYGDDPQSSPNLGRIINQKQFQRLRALLGCGRVAIGGQSDESDRYIAPTVLVDVQEMEPVMQEEIFGPILPIVNVQSLDEAIEFINRREKPLALYAFSNSSQVVKRVLTQTSSGGFCGNDGFMHMTLASLPFGGVGASGMGRYHGKFSFDTFSHHRACLLRSPGMEKLNALRYPPQSPRRLRMLLVAMEAQGCSCTLL

>gi|4502043|ref|NP_000685.1| aldehyde dehydrogenase family 3 member B1 isoform a [Homo sapiens]
MDPLGDTLRRLREAFHAGRTRPAEFRAAQLQGLGRFLQENKQLLHDALAQDLHKSAFESEVSEVAISQGEVTLALRNLRAWMKDERVPKNLATQLDSAFIRKEPFGLVLIIAPWNYPLNLTLVPLVGALAAGNCVVLKPSEISKNVEKILAEVLPQYVDQSCFAVVLGGPQETGQLLEHRFDYIFFTGSPRVGKIVMTAAAKHLTPVTLELGGKNPCYVDDNCDPQTVANRVAWFRYFNAGQTCVAPDYVLCSPEMQERLLPALQSTITRFYGDDPQSSPNLGRIINQKQFQRLRALLGCGRVAIGGQSDESDRYIAPTVLVDVQEMEPVMQEEIFGPILPIVNVQSLDEAIEFINRREKPLALYAFSNSSQVVKRVLTQTSSGGFCGNDGFMHMTLASLPFGGVGASGMGRYHGKFSFDTFSHHRACLLRSPGMEKLNALRYPPQSPRRLRMLLVAMEAQGCSCTLL

>sp_ac|P43353 \ID= AL3B1_HUMAN \DE="Aldehyde dehydrogenase family 3 member B1 (Aldehyde dehydrogenase 7)" \NCBITAXID=9606
MDPLGDTLRRLREAFHAGRTRPAEFRAAQLQGLGRFLQENKQLLHDALAQDLHKSAFESEVSEVAISQGEVTLALRNLRAWMKDERVPKNLATQLDSAFIRKEPFGLVLIIAPWNYPLNLTLVPLVGALAAGNCVVLKPSEI

Sequence record in BioXSD:

<mySequence xsi:type="AminoacidSequenceRecord">
  <sequence>MDPLGDTLRRLREAFHAGRTRPAEFRAAQLQGLGRFLQENKQLLHDALAQDLHKSAFESEVSEVAISQGEVTLALRNLRAWMKDERVPKNLATQLDSAFIRKEPFGLVLIIAPWNYPLNLTLVPLVGALAAGNCVVLKPSEISKNVEKILAEVLPQYVDQSCFAVVLGGPQETGQLLEHRFDYIFFTGSPRVGKIVMTAAAKHLTPVTLELGGKNPCYVDDNCDPQTVANRVAWFRYFNAGQTCVAPDYVLCSPEMQERLLPALQSTITRFYGDDPQSSPNLGRIINQKQFQRLRALLGCGRVAIGGQSDESDRYIAPTVLVDVQEMEPVMQEEIFGPILPIVNVQSLDEAIEFINRREKPLALYAFSNSSQVVKRVLTQTSSGGFCGNDGFMHMTLASLPFGGVGASGMGRYHGKFSFDTFSHHRACLLRSPGMEKLNALRYPPQSPRRLRMLLVAMEAQGCSCTLL</sequence>
  <species>
    <databaseName>NCBI Taxonomy</databaseName>
    <accession>9606</accession>
    <entryUri>http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9606</entryUri>
    <name>Human</name>
  </species>
  <customName>Aldehyde dehydrogenase family 3 member B1 (ALDH3B1)</customName>
  <formalReference>
    <databaseName>UniProt</databaseName>
    <accession xsi:type="UniprotAccession">P43353</accession>
    <entryUri>http://www.uniprot.org/uniprot/P43353</entryUri>
    <sequenceVersion>1</sequenceVersion>
    <isoformAccession xsi:type="ExtendedUniprotAccession">P43353-1</isoformAccession>
  </formalReference>
</mySequence>
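Because the record is structured XML rather than a free-text header, a client can pull out individual fields without format-specific parsing rules. The Python sketch below reads a shortened copy of the record with the standard library. It is only an illustration: the sequence is truncated, the xsi namespace declaration is added so that the fragment parses on its own, the element names are used without a namespace as on the slide (a real BioXSD document may declare one), and `summarize_record` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the record above; sequence truncated for brevity.
RECORD = """
<mySequence xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:type="AminoacidSequenceRecord">
  <sequence>MDPLGDTLRRLREAFHAGRTRPAEF</sequence>
  <species>
    <databaseName>NCBI Taxonomy</databaseName>
    <accession>9606</accession>
    <name>Human</name>
  </species>
  <formalReference>
    <databaseName>UniProt</databaseName>
    <accession xsi:type="UniprotAccession">P43353</accession>
    <sequenceVersion>1</sequenceVersion>
  </formalReference>
</mySequence>
"""

def summarize_record(xml_text):
    """Extract a few fields from a sequence record laid out as on the slide."""
    root = ET.fromstring(xml_text.strip())
    return {
        "sequence_length": len(root.findtext("sequence", default="")),
        "species": root.findtext("species/name"),
        "taxon": root.findtext("species/accession"),
        "database": root.findtext("formalReference/databaseName"),
        "accession": root.findtext("formalReference/accession"),
    }

print(summarize_record(RECORD))
```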

Sequence record in BioXSD 1.0:

Sequence-string restriction in BioXSD:

<xs:simpleType name="NucleotideSequence" sawsdl:modelReference="http://purl.org/edam/data/0001211">
  <xs:annotation>
    <xs:documentation>
      Nucleotide sequence in any letter case, without ambiguous ("degenerate") bases
    </xs:documentation>
  </xs:annotation>
  <xs:restriction base="GenericNucleotideSequence">
    <xs:pattern value="[acgt]+"/>
    <xs:pattern value="[acgu]+"/>
  </xs:restriction>
</xs:simpleType>
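In XML Schema, several xs:pattern facets within the same restriction step act as alternatives, so a value is valid if it matches either pattern in full. The Python sketch below mirrors the two patterns exactly as printed on the slide (lower-case letters only; any case handling in the base type is not reproduced here). The function name `is_nucleotide_sequence` is hypothetical.

```python
import re

# The two xs:pattern facets of one restriction step, combined as alternatives.
NUCLEOTIDE = re.compile(r"(?:[acgt]+|[acgu]+)")

def is_nucleotide_sequence(text):
    """True if `text` matches one of the two patterns in full (DNA or RNA letters)."""
    return NUCLEOTIDE.fullmatch(text) is not None

for candidate in ["acgt", "acgu", "acgtu", "acgn", ""]:
    print(repr(candidate), is_nucleotide_sequence(candidate))
```

Note that a string mixing t and u is rejected, because it matches neither pattern on its own.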

BioXSD 1.0 types:

SimpleTypes:

NucleotideSequence, AminoacidSequence, GeneralNucleotideSequence, GeneralAminoacidSequence, Biosequence

Accession(s)

helper types: Name, FreeText, Uri, Integer(s), Decimal(s) … and a few more

ComplexTypes:

NucleotideSequenceRecord, AminoacidSequenceRecord, GeneralNucleotideSequenceRecord, GeneralAminoacidSequenceRecord, BiosequenceRecord

..SequenceAlignment(s)

AnnotatedSequence

DatabaseReference, EntryReference, OntologyReference, OntologyTerm, Species, SequenceReference, Method

helper types: Score, SequencePosition(s) … a few more

BioXSD can be used:

• Directly as an input/output format of tools

• BioXSD can be extended, restricted, or included within other formats

• BioXSD can serve as the intermediate canonical format

With BioXSD, users of Web services get smooth interoperability

With BioXSD, providers of Web services get ready-made building blocks for interfaces

<wsdl:types>
  <xsd:schema ...>
    :
    <xsd:element name="myOperation">
      ?
    </xsd:element>
    <xsd:element name="myOperationResponse">
      ?
    </xsd:element>
    :
  </xsd:schema>
</wsdl:types>

BioXSD data types

Get ready for the exercise

Open this tutorial from www.ii.uib.no/~matus/tutorial_webservices.pdf

Open BioXSD documentation bioxsd.org/technicalDocumentation/BioXSD-1.0

Download EDAM from sourceforge.net/projects/edamontology/files & install OBO-Edit oboedit.org

Or open EDAM at bioportal.bioontology.org/visualize/44871

Install SoapUI soapui.org

If you wish, install an XML editor oxygenxml.com (or XMLSpy from altova.com; Win only)

Download example WSDL (template) www.ii.uib.no/~matus/example.wsdl.xml

Download filled-in example WSDL www.ii.uib.no/~matus/example2.wsdl.xml

Exercise a)

The WSDL you have downloaded is an example of a document/literal-wrapped, WS-I compliant WSDL.

Following the recommended WSDL-first approach, edit the WSDL to match the tool you have in mind.

Change the names of constructs in the WSDL and fill in wsdl:documentation in wsdl:portType and xs:documentation of the I/O parameters.

Exercise b)

Use a standard XML format for input parameters where applicable.

Define the structure/format of the output data. Use standard formats where applicable.

Exercise c)

Annotate wsdl:portType and its wsdl:operation-s by terms from EDAM ('Topic', 'Operation', 'Data resource') and maybe another ontology.

Annotate xs:element-s and other parts of the I/O data that you have defined by terms from EDAM ('Data', 'Data format'). If you are using some BioXSD types, these have been already annotated.
