RESTful RDF Web Services for
Predictive Toxicology
Dr Nina Jeliazkova
ACS RDF Symposium, Boston, August 2010
Ideaconsult Ltd., 4 A. Kanchev Str., Sofia 1000, Bulgaria
Ideaconsult Ltd. develops and maintains open source software applications:
• Toxtree 2.1.0 – toxic hazard estimation, 12 modules
• Toxmatch 1.06 – a chemical similarity evaluation tool
• Online QMRF repository: http://qsardb.jrc.it
• Ambit, Ambit XT: http://ambit.sourceforge.net
• Partner in the OpenTox FP7 project
• Partner in the CADASTER FP7 project
OpenTox project
http://www.opentox.org
• Objective: to develop a framework that provides unified access to
  – toxicity data
  – predictive models
  – procedures supporting validation, and additional information that helps with the interpretation of predicted results
• European Commission, Framework Programme 7, HEALTH-2007-1.3-3
• 11 partners
Why an integration framework for predictive toxicology?
August 22, 2010
• What we would like to do
  – Build, use, validate and compare multiple models
  – Reliably reproduce models from the literature
  – Merge data from different sources (files, databases)
  – Find all models available for a certain endpoint
  – More …
• Challenges
  – Chemical structures
    • Might be ambiguous
    • Might be error-prone or time-consuming to reproduce from publications
  – Data
    • Multiple formats
    • Implicit semantics, often buried in human-readable documentation only
  – Models
    • Tens of thousands available in software or in publications
    • Multiple software solutions, mostly incompatible
    • Reproducing predictions is time-consuming and often hard to achieve
    • Automatic comparison of prediction results is difficult
Framework design rationales
User requirements | Software requirements
Unambiguous data | Formal way of representing information about data
Unambiguous access | Well-defined interfaces
Transparency of computational tools | Formal way of representing information about methods; well-defined interfaces
Variety of user groups | Simplicity and modularity of design
Need to integrate various resources (e.g. databases, prediction methods, models, …) to make meaningful predictions | Distributed architecture; interoperability
Need to integrate biological information | Again, modularity of design; extensibility
The framework
• OpenTox API
  – The way applications talk to each other
  – The way developers talk to applications
  – http://opentox.org/dev/apis/api-1.1
• The basic building blocks
  – data, chemical structures, algorithms and models
• Functionality offered
  – build models
  – apply models
  – validate models
  – access and query data in various ways
• Technologies
  – REST-style web services
  – RDF for description of resources
  – Links to existing and newly developed ontologies (mainly to describe metadata) about resources
[Figure: the OpenTox resources Feature, Compound, Dataset, Ontology, Algorithm, Model, AppDomain, Validation and Report, each supporting GET, POST, PUT and DELETE. Clients (user interface, standalone, browsers, etc.) access data services (all kinds of data), model services, validation services, reporting services and the ontology service (registration, queries) as web services, with security.]
Representational State Transfer (REST)
A software architecture style defined by Roy Fielding in his PhD thesis (2000). Many services worldwide offer a REST API. There are (currently) no standards for RESTful applications, merely design guides.
Design principles
• Resource oriented
  – Every object (resource) is named and addressable (e.g. by an HTTP URL). Examples: http://example.opentox.com/model/myBestModel, http://example.opentox.com/compound/50-00-0
  – RESTful API design starts by identifying the most important objects and groups of objects supported by the software system, and proceeds by defining URL patterns
• Transport protocol
  – HTTP is the most popular choice of transport protocol, but other protocols can be used as well
• Operations
  – All resources (nouns) support the same fixed, universal set of operations (verbs). The HTTP operations (GET, POST, PUT, DELETE) are the common choice when the transport protocol is HTTP
• Hypermedia as the Engine of Application State
  – All resources should be reachable via a single entry point (or a minimal number of entry points) into a RESTful application. Thus, a representation of a resource should return hypermedia links to related resources
• Error codes (for each resource/operation pair)
  – HTTP status codes (e.g. 200 OK, 400 Bad Request, 404 Not Found) are usually used
OpenTox resources (1)
OpenTox considers the following set of entities as essential building blocks:
• Structures of chemical compounds
• Properties and identifiers of chemical compounds
• Datasets of chemical compounds and various properties (measured or calculated)
• Algorithms
  – Data processing algorithms
    • Algorithms generating certain values based on chemical structure (e.g. descriptor calculation)
    • Data preprocessing (e.g. principal component analysis, feature selection)
    • Structure processing (e.g. structure optimization)
    • Algorithms relating a set of structures to another set of structures (e.g. similarity search or metabolite generation)
  – Machine learning algorithms
    • Supervised (e.g. regression, classification)
    • Unsupervised (e.g. clustering)
  – Prediction algorithms defined by experts (e.g. a series of structural alerts defined by human experts, not derived by learning algorithms)
OpenTox resources (2)
• Models are generated by the respective algorithms, given specific parameters
• Statistical models are generated by applying statistical/machine learning algorithms to a specific dataset and parameters
• Models can be other than statistical, e.g. expert-defined rules, quantum mechanical calculations, metabolite generation, etc. The intention of the framework is to be generic enough to accommodate a variety of predictive models
• Validation provides procedures independent of the model-building facilities (e.g. cross-validation) and generates relevant statistics
• Reports
  – Various types of reports might be generated using the building blocks above (e.g. a validation report can be generated using a validation object, a model and a dataset)
• In addition, the following components are introduced
  – Task (asynchronous processing of computationally intensive tasks)
  – Authentication and authorization (ensuring secure access to sensitive resources)
  – Ontology service (provides RDF storage and a SPARQL endpoint for resource registration)
Resources identification
All resources are identified via a unique web address, assigned according to the URL templates below.
Component | Description | URL template (example)
Compound | Representations of chemical compounds | http://host:port/compound/{compoundid}
Feature | Properties and identifiers | http://host:port/feature/{featureid}
Dataset | Encapsulates a set of chemical compounds and their property values | http://host:port/dataset/{datasetid}
Model | OpenTox model services | http://host:port/model/{modelid}
Algorithm | OpenTox algorithm services | http://host:port/algorithm/{algorithmid}
Validation, Report | A validation corresponds to the validation of a model on a test dataset | http://host:port/validation/{validationid}, http://host:port/report/{reportid}
Task | Asynchronous jobs are handled via an intermediate Task resource. A resource submitting an asynchronous job should return the URI of the task | http://host:port/task/{taskid}
Ontology service | Provides storage and SPARQL search functionality for objects defined in OpenTox services and relevant ontologies | http://host:port/ontology
Authentication and authorisation | Granting access to protected resources for authorised users | http://host:port/opensso, http://host:port/opensso-pol
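The URL templates above can be sketched as a small helper. This is only an illustration: the function name `resource_uri`, the default host and port, and the `example.org` host are assumptions, and real deployments may add a path prefix such as `/ambit2`.

```python
# Resource kinds taken from the URL template table; the helper itself is hypothetical.
RESOURCE_KINDS = {
    "compound", "feature", "dataset", "model",
    "algorithm", "validation", "report", "task",
}


def resource_uri(kind, resource_id=None, host="localhost", port=8080):
    """Return the collection URI, or the URI of one resource when an id is given."""
    if kind not in RESOURCE_KINDS:
        raise ValueError(f"unknown resource kind: {kind!r}")
    base = f"http://{host}:{port}/{kind}"
    return base if resource_id is None else f"{base}/{resource_id}"


# An individual model and the collection of all datasets on an assumed host.
model_uri = resource_uri("model", 57, host="example.org")
all_datasets_uri = resource_uri("dataset", host="example.org")
print(model_uri)
print(all_datasets_uri)
```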
OpenTox REST operations
Individual resources (e.g. a dataset or a model)
• URI template: http://host:port/{resource}/{resourceid}, e.g. http://host:port/model/{model_id} or http://host:port/dataset/{dataset_id}
• GET – retrieve a representation of the resource
• PUT – update the representation of the resource
• POST
  – replace the representation of the resource with a new one (e.g. replace the dataset with new content)
  – initiate calculations based on this resource (e.g. submit a dataset URI to an algorithm resource and obtain a model URI as a result)
• DELETE – delete the resource
Collections of resources (e.g. the list of all available models or datasets)
• URI template: http://host:port/{resource} (e.g. http://host:port/model or http://host:port/dataset)
• GET – retrieve a representation of multiple resources (e.g. retrieve all available algorithms)
• PUT – N/A
• POST – create a new resource and return its URI (e.g. create a new dataset by submitting new dataset content to the dataset service)
• DELETE – N/A
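The two operation tables can be summarised in code. The sketch below assumes paths sit directly under the service root (one segment for a collection, two for an individual resource); the names `operations_for` and `is_allowed` are illustrative, not part of the OpenTox API.

```python
# Operations listed on the slide for individual resources and for collections.
INDIVIDUAL_OPERATIONS = {
    "GET": "retrieve a representation of the resource",
    "PUT": "update the representation of the resource",
    "POST": "replace the representation or initiate calculations",
    "DELETE": "delete the resource",
}
COLLECTION_OPERATIONS = {
    "GET": "retrieve a representation of multiple resources",
    "POST": "create a new resource and return its URI",
}


def operations_for(path):
    """Pick the operation table from the number of path segments (an assumption)."""
    segments = [s for s in path.split("/") if s]
    if len(segments) == 1:
        return COLLECTION_OPERATIONS
    if len(segments) == 2:
        return INDIVIDUAL_OPERATIONS
    raise ValueError(f"unsupported path in this sketch: {path!r}")


def is_allowed(method, path):
    """True when the slide lists the method for this kind of path."""
    return method.upper() in operations_for(path)


print(is_allowed("PUT", "/model"))         # PUT is N/A for collections
print(is_allowed("DELETE", "/dataset/9"))  # DELETE applies to one resource
```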
Build a predictive model
[Figure: creating a model. An HTTP POST to an algorithm service (regression, classification, quantum chemistry, descriptors, etc.) runs calculations with the dataset http://host1/dataset/{id} (structures, descriptors, endpoints) held by a dataset service, and returns the model URL http://host1/model/{id} on the model service. The validation service (validation/{id}) and the ontology service (published models, algorithms, ontologies, metadata; model/{id} registered via HTTP POST) complete the picture.]
Uniform approach to model creation
Read data from a web address – process – write to a web address
Dataset (http://myhost.com/dataset/trainingset1) + Algorithm (http://myhost.com/algorithm/neuralnetwork) = Model (http://myhost.com/model/predictivemodel1)
Use an algorithm to build a model
• An algorithm is applied by submitting an HTTP POST to the algorithm URI and providing the required parameters
• A common required parameter is dataset_uri=http://host:port/dataset/{datasetid}, which specifies the dataset to be operated on
• HTTP POST in REST-style services returns the URI of the result, not the content of the result
• The algorithm services are designed to store the results in a dataset service and return the URL of the resulting dataset
• For slow calculations, a Task URI is returned instead of the dataset URI
$ curl -H "Accept:text/uri-list" -X POST \
    -d "dataset_uri=http://apps.ideaconsult.net:8080/ambit2/dataset/1037" \
    -d "prediction_feature=http://apps.ideaconsult.net:8080/ambit2/feature/26701" \
    -d "dataset_service=http://apps.ideaconsult.net:8080/ambit2/dataset" \
    http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48 -iv
* Connected to opentox.informatik.tu-muenchen.de (131.159.28.16) port 8080 (#0)
> POST /OpenTox-dev/algorithm/J48 HTTP/1.1
> Host: opentox.informatik.tu-muenchen.de:8080
> Accept: text/uri-list
> Content-Type: application/x-www-form-urlencoded
< HTTP/1.1 202 Accepted
< Date: Sat, 31 Jul 2010 14:46:38 GMT
< Location: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5
< Accept-Ranges: bytes
< Server: Noelios-Restlet-Engine/1.1.snapshot
< Content-Type: text/uri-list;charset=ISO-8859-1
< Content-Length: 99
<
* Connection #0 to host opentox.informatik.tu-muenchen.de left intact
* Closing connection #0
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5
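The same POST can be prepared from a script. The sketch below only builds the request with the Python standard library and does not send it, since the 2010 hosts may no longer respond. The function name `build_algorithm_request` is an assumption; the parameter names are the ones shown in the curl call.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_algorithm_request(algorithm_uri, dataset_uri, **params):
    """Prepare (but do not send) a form-encoded POST to an algorithm URI."""
    form = {"dataset_uri": dataset_uri, **params}
    body = urlencode(form).encode("ascii")
    return Request(
        algorithm_uri,
        data=body,
        method="POST",
        headers={
            "Accept": "text/uri-list",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


# URIs copied from the curl example above.
req = build_algorithm_request(
    "http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48",
    "http://apps.ideaconsult.net:8080/ambit2/dataset/1037",
    prediction_feature="http://apps.ideaconsult.net:8080/ambit2/feature/26701",
    dataset_service="http://apps.ideaconsult.net:8080/ambit2/dataset",
)
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))
```

Passing `req` to `urllib.request.urlopen` would send it; the response would then carry either the result URI or a task URI, as described in the bullets.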
Resources: The model
$ curl -iv -H "Accept:text/uri-list" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c
* About to connect() to opentox.informatik.tu-muenchen.de port 8080 (#0)
*   Trying 131.159.28.16... connected
* Connected to opentox.informatik.tu-muenchen.de (131.159.28.16) port 8080 (#0)
> GET /OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5 HTTP/1.1
> User-Agent: curl/7.18.2 (x86_64-pc-linux-gnu) libcurl/7.18.2 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.8 libssh2/0.18
> Host: opentox.informatik.tu-muenchen.de:8080
> Accept:text/uri-list
>
< HTTP/1.1 200 OK
< Date: Sat, 31 Jul 2010 14:47:22 GMT
< Location: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48
< Vary: Accept-Charset, Accept-Encoding, Accept-Language, Accept
< Accept-Ranges: bytes
< Server: Noelios-Restlet-Engine/1.1.snapshot
< Content-Type: text/uri-list;charset=ISO-8859-1
< Content-Length: 86
<
* Connection #0 to host opentox.informatik.tu-muenchen.de left intact
* Closing connection #0
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48
• When a task URI is returned, the status code is HTTP 202 Accepted instead of HTTP 200 OK
• This tells the client that processing is not complete and that the client needs to poll the task URI until an OK code is returned
• The final result returned by Example 25 is the URI of the new model: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48
• To obtain prediction results, POST a dataset to the model URI
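The polling rule can be sketched as a loop over status codes. To stay runnable without a live service, the transport is passed in as a `fetch` callable returning `(status_code, uri_from_body)`; that callable, the function name `wait_for_result`, and the retry limits are all assumptions made for this illustration.

```python
import time


def wait_for_result(task_uri, fetch, poll_interval=1.0, max_polls=60, sleep=time.sleep):
    """Poll a task URI until it answers 200 OK, then return the result URI."""
    for _ in range(max_polls):
        status, result_uri = fetch(task_uri)
        if status == 200:        # finished: the body holds the result URI
            return result_uri
        if status != 202:        # anything other than "Accepted" is treated as failure
            raise RuntimeError(f"task {task_uri} failed with HTTP {status}")
        sleep(poll_interval)     # still running: wait and poll again
    raise TimeoutError(f"task {task_uri} still running after {max_polls} polls")


# Fake transport that answers 202 twice and then 200, so no network is needed.
TASK = "http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5"
MODEL = "http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48"
_responses = iter([(202, TASK), (202, TASK), (200, MODEL)])
calls = []


def fake_fetch(uri):
    calls.append(uri)
    return next(_responses)


result = wait_for_result(TASK, fake_fetch, sleep=lambda seconds: None)
print(result)
```

A real `fetch` would issue an HTTP GET with `Accept: text/uri-list` and return the response status together with the URI in the body.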
Uniform approach to data processing (e.g. descriptor calculation)
Read data from a web address – process – write to a web address
Dataset (http://myhost.com/dataset/trainingset1) + Algorithm (http://myhost.com/algorithm/descriptorX) = Dataset (http://myhost.com/dataset/results)
Uniform approach to model validation and report generation
Read data from a web address – process – write to a web address
Dataset (http://myhost.com/dataset/trainingset1) + Model generating predictions (http://myhost.com/model/predictivemodel1) = Validation (http://myhost.com/validation, with predictions in http://myhost.com/dataset/predictedresults1) and Validation report (http://myhost.com/report/1)
Apply predictive models
[Figure: applying a model. The client queries the ontology service (published models, algorithms, ontologies, metadata) with SPARQL over HTTP POST to retrieve available endpoints and model URLs, e.g. http://host1/model/{id}. An HTTP POST to the model service applies the model http://host1/model/{id} to the dataset http://host2/dataset/{id} and returns the results dataset URL http://host/dataset/{id}.]
Uniform approach to model prediction
Read data from a web address – process – write to a web address
Dataset (http://myhost.com/dataset/id1) + Model (http://myhost.com/model/predictivemodel1) = Dataset (http://myhost.com/dataset/results1)
Uniform access to data
Publish your data, retrieve linked data
[Figure: the dataset service (structures, endpoints and predictions), accessed via HTTP GET and POST. Upload data, receive a dataset URL (http://host2/dataset/{id}); find chemical compounds, return a dataset URL (http://host2/compound/{id}). Annotation links the Feature, Compound and Dataset resources to the ontology service (published models, algorithms, ontologies, metadata).]
RDF – Resources representation
• The opentox.owl ontology
  – A common OWL data model of all OpenTox resources
  – Describes OpenTox resources
  – Describes relationships between them
  – Generates RDF representations of objects
• An RDF/XML representation is mandatory for OpenTox resources
• Uniform approach to data representation
  – Calculated and measured properties of chemical compounds are represented in a uniform way
  – Linked to the resource used for data generation
  – Annotated via ontology entries
  – Model representations link to the algorithms and data used
All OpenTox components are defined by an OWL ontology: http://opentox.org/api/1.1/opentox.owl
All resources are subclasses of ot:OpenToxResource
Resources: Chemical compound
Compound
  Provides different representations for chemical compounds with a unique and defined chemical structure: /compound/{id}
Conformer
  /compound/{id}/conformer/{id}
Documentation
  http://opentox.org/dev/apis/api-1.1/structure
Representation
  A subclass of ot:OpenToxResource
  Supports different Chemical MIME formats
  RDF representation only for specifying owl:sameAs links to external resources
Example 1: Retrieve compound as MOL
$ curl -H "Accept:chemical/x-mdl-molfile" http://apps.ideaconsult.net:8080/ambit2/compound/1
CH2O
APtclcactv09040902283D 0   0.00000     0.00000
  4  3  0  0  0  0  0  0  0  0999 V2000
   -0.6004    0.0000    0.0001 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.6072    0.0000   -0.0004 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.1472    0.9353    0.0016 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.1472   -0.9353    0.0016 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  0  0  0  0
  2  4  1  0  0  0  0
Example 2: Retrieve compound as SMILES
$ curl -H "Accept:chemical/x-daylight-smiles" http://apps.ideaconsult.net:8080/ambit2/compound/1
O=C
Example 3: Query compounds
$ curl -H "Accept:chemical-mime" "http://apps.ideaconsult.net:8080/ambit2/query/compound/any-identifier-or-keyword"
$ curl -H "Accept:chemical-mime" "http://apps.ideaconsult.net:8080/ambit2/query/smarts?search=smarts"
Resources: Feature
Feature
  A Feature is a resource representing any kind of property or identifier assigned to a Compound: /feature/{id}
  The feature types are determined via their links to ontologies (Feature ontologies, Descriptor ontologies, Endpoints ontologies)
Documentation
  http://opentox.org/dev/apis/api-1.1/feature
Representation
  ot:Feature, a subclass of ot:OpenToxResource
  Mandatory RDF/XML format
Properties
• Name: defined by dc:title (Dublin Core namespace)
• Units: defined by the ot:units annotation property (OpenTox namespace)
• Creator: defined by the dc:creator annotation property (Dublin Core namespace)
• The origin of the Feature is defined by the ot:hasSource object property (OpenTox namespace) and can be an ot:Algorithm, ot:Model or ot:Dataset
• Relations to other resources that represent the same entity can be established via the owl:sameAs property. This approach can be used, for example, to link the ot:Feature resource to a resource from another ontology (an example follows)
• The subclasses of ot:Feature (namely ot:NumericFeature, ot:StringFeature and ot:NominalFeature) denote whether a feature holds numeric, string or nominal values
Resources: Feature (an example)
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/feature/22114
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns:otee="http://www.opentox.org/echaEndpoints.owl#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns="http://apps.ideaconsult.net:8080/ambit2/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#Algorithm"/>
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#Feature"/>
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#NumericFeature">
    <rdfs:subClassOf rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </owl:Class>
  <ot:NumericFeature rdf:about="feature/22114">
    <ot:hasSource>
      <ot:Algorithm rdf:about="algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor"/>
    </ot:hasSource>
    <owl:sameAs rdf:resource="otee:Octanol-water_partition_coefficient_Kow"/>
    <dc:title>XLogP</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </ot:NumericFeature>
</rdf:RDF>
The example shows an OpenTox feature with the title "XLogP", identified by the URI http://apps.ideaconsult.net:8080/ambit2/feature/22114
• Linked to an entry of a simplified ontology of toxicological endpoints
The algorithm used to generate values for this feature
• Specified by the ot:hasSource property
• Identified by the URI http://apps.ideaconsult.net:8080/ambit2/algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor
• Note: the URI identifies an OpenTox Algorithm resource
• The algorithm URI itself is dereferenceable
• Can be used to initiate calculations of the XLogP descriptor
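A client can pull the title and source algorithm out of such a representation. The sketch below uses the standard library's `xml.etree.ElementTree` on a trimmed copy of the listing; this is an assumption made for brevity, since ElementTree only reads this particular XML shape and is not a general RDF parser. The function name `describe_feature` is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ot": "http://www.opentox.org/api/1.1#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Trimmed copy of the feature/22114 representation shown above.
FEATURE_RDF = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <ot:NumericFeature rdf:about="feature/22114">
    <ot:hasSource>
      <ot:Algorithm rdf:about="algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor"/>
    </ot:hasSource>
    <dc:title>XLogP</dc:title>
  </ot:NumericFeature>
</rdf:RDF>"""


def describe_feature(rdf_xml):
    """Extract URI, title and source algorithm of the first ot:NumericFeature."""
    root = ET.fromstring(rdf_xml)
    about = "{%s}about" % NS["rdf"]
    node = root.find("ot:NumericFeature", NS)
    source = node.find("ot:hasSource/ot:Algorithm", NS)
    return {
        "uri": node.get(about),
        "title": node.findtext("dc:title", namespaces=NS),
        "source": source.get(about) if source is not None else None,
    }


info = describe_feature(FEATURE_RDF)
print(info)
```

The URIs returned here are relative to `xml:base`, exactly as in the listing; a full RDF library would resolve them.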
Resources: Feature (an example)
$ curl -H "Accept:text/n3" http://apps.ideaconsult.net:8080/ambit2/feature/26184
@prefix ot: <http://www.opentox.org/api/1.1#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix : <http://apps.ideaconsult.net:8080/ambit2/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix af: <http://apps.ideaconsult.net:8080/ambit2/feature/> .
ot:hasSource
      a owl:ObjectProperty .
ot:units
      a owl:DatatypeProperty .
ot:Feature
      a owl:Class .
ot:NumericFeature
      a owl:Class ;
      rdfs:subClassOf ot:Feature .
af:26184
      a ot:Feature , ot:NumericFeature ;
      dc:creator "http://ambit.uni-plovdiv.bg:8080/ambit2" ;
      dc:title "TUM_CDK_XLogP" ;
      ot:hasSource <http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem/XLogPDescriptor> ;
      ot:units "" ;
      = otee:Octanol-water_partition_coefficient_Kow .
An example (N3) of another ot:Feature resource: the XLogP descriptor (again)
• Generated by a different implementation
• In this case its name is "TUM_CDK_XLogP", and the algorithm resource used to generate it resides at the Technical University of Munich (TUM): http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem/XLogPDescriptor
  This algorithm URL could also be used to initiate descriptor calculations
• The representation of Algorithm resources refers to the BlueObelisk ontology entry http://www.blueobelisk.org/ontologies/chemoinformatics-algorithms/#xlogP
Toxicity endpoints ontologies
• Derived from the ECHA classification of endpoints published in the REACH guidance documents
• Physicochemical properties and various toxicological endpoints
• The hierarchy doesn't represent the complexity of toxicological assays, but can be used as a first approximation to assign meaning to the data entries and generate a REACH report
• More specific descriptions of toxicological assays can be used as well
• Ontologies for specific toxicity assays are being developed by OpenTox partners
• The ECHA endpoints ontology: http://www.opentox.org/echaEndpoints.owl
Resources: a feature
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/feature/3
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns="http://apps.ideaconsult.net:8080/ambit2/"
    xmlns:af="http://apps.ideaconsult.net:8080/ambit2/feature/"
    …
  xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Feature rdf:about="feature/3">
    <dc:creator/>
    <ot:hasSource>ECHA…</ot:hasSource>
    <owl:sameAs rdf:resource="http://www.opentox.org/api/1.1#EINECS"/>
    <ot:units></ot:units>
    <dc:title>EC</dc:title>
  </ot:Feature>
</rdf:RDF>
An illustration of an ot:Feature imported from a file and not calculated. The example shows a feature representing the EINECS number, imported from the ECHA preregistration list.
Feature summary
• An OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs
• These URIs are dereferenceable
• Features allow different levels of meaning to be assigned by linking to ontology entries (e.g. algorithms or toxicological endpoints), as well as by linking to the algorithms that can be used to generate the property values
• The same approach can be used to denote assays (provided that the assay is defined by an ontology), species, functional groups, etc.
Resources: Dataset
Dataset
  Provides access to chemical compounds and their features (e.g. structural, physical-chemical, biological, toxicological properties)
Operations
  POST – Upload a dataset
  PUT – Update the dataset content
  DELETE – Remove the dataset
Representation
  RDF/XML (mandatory)
• The dataset consists of data entries
• Each entry is associated with exactly one chemical compound, identified by its URI and available via the OpenTox Compound service API
• One and the same compound can be associated with multiple dataset entries
• Every "column" is associated with a Feature; its representation should be available via the OpenTox Feature API
Resources: Dataset
Representation
The dataset services optionally support formats other than RDF:
• text/csv (comma delimited)
• text/x-arff (Weka ARFF)
• application/pdf
• chemical/x-mdl-sdfile
• other Chemical MIME formats
This allows retrieving the same data in a convenient format, but the URL links to compound and feature resources are lost.
@prefix ad: <http://apps.ideaconsult.net:8080/ambit2/dataset/> .
@prefix af: <http://apps.ideaconsult.net:8080/ambit2/feature/> .
@prefix ot: <http://www.opentox.org/api/1.1#> .
…
ad:9  a ot:Dataset ;
      ot:dataEntry
        [ a ot:DataEntry ;
          ot:compound <http://apps.ideaconsult.net:8080/ambit2/compound/413/conformer/409421> ;
          ot:values
            [ a ot:FeatureValue ;
              ot:feature af:21576 ;
              ot:value "3.309999942779541"^^xsd:double
            ] ;
          ot:values
            [ a ot:FeatureValue ;
              ot:feature af:21573 ;
              ot:value "3.0"^^xsd:double
            ]
        ] .
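The data-entry structure in the listing above (one compound per entry, one value per feature) can be sketched as plain Python data and flattened into a table whose columns are feature URIs. The dictionary layout and the name `to_table` are assumptions for illustration; only the URIs and values come from the listing.

```python
AF = "http://apps.ideaconsult.net:8080/ambit2/feature/"

# One data entry, mirroring the ot:DataEntry shown above.
entries = [
    {
        "compound": "http://apps.ideaconsult.net:8080/ambit2/compound/413/conformer/409421",
        "values": {
            AF + "21576": 3.309999942779541,
            AF + "21573": 3.0,
        },
    },
]


def to_table(data_entries):
    """Flatten data entries into a header row and value rows keyed by feature URI."""
    features = sorted({f for e in data_entries for f in e["values"]})
    header = ["compound"] + features
    rows = [
        [e["compound"]] + [e["values"].get(f) for f in features]
        for e in data_entries
    ]
    return header, rows


header, rows = to_table(entries)
print(header)
print(rows[0])
```

Keeping the feature URIs as column headers preserves the links that a plain CSV export with short column names would lose.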
Dataset metadata and features
Description | URI template
Retrieve the entire dataset content; if text/uri-list is requested, retrieve only compound URIs | http://host:port/dataset/{id}
Retrieve a representation of the features (columns) of the dataset | http://host:port/dataset/{id}/feature
Retrieve dataset metadata (name, etc.) | http://host:port/dataset/{id}/metadata
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/dataset/9/metadata
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    ……
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Dataset rdf:about="dataset/9">
    <dc:source>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:source>
    <dc:publisher>somebody</dc:publisher>
    <rdfs:seeAlso>
      <bx:Entry rdf:about="reference/20117">
        <rdfs:seeAlso>http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html</rdfs:seeAlso>
        <dc:title>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:title>
      </bx:Entry>
    </rdfs:seeAlso>
    <dc:title>ISSCAN: Istituto Superiore di Sanita, CHEMICAL CARCINOGENS: STRUCTURES AND EXPERIMENTAL DATA</dc:title>
  </ot:Dataset>
</rdf:RDF>
Data publishing
[Figure: the dataset service (structures, endpoints and predictions), accessed via HTTP GET and POST. Upload data, receive a dataset URL (http://host2/dataset/{id}); find chemical compounds, return a dataset URL (http://host2/compound/{id}); annotation with toxicology-related ontologies and algorithm ontologies.]
1) POST a file with chemical structures and properties to an OpenTox dataset service
   • The structures and data are assigned a dataset URL and become available in multiple formats (RDF, Chemical MIME, CSV, Weka ARFF)
2) Assign metadata
   • PUT /dataset/{id}/metadata
3) Annotate any of the dataset features (/dataset/{id}/feature) by assigning links to relevant ontologies
   • PUT /feature/{id}
Resources: Algorithm
Algorithm
  Provides access to OpenTox algorithms. There are several algorithm services developed by different OpenTox partners. The list of algorithms can be retrieved by an HTTP GET operation at http://host:port/algorithm
$ curl -H "Accept:text/uri-list" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNclassification
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/PLSregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/M5P
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/GaussP
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/FTMsmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/gSpansmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/JOELIB2
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/InfoGainAttributeEval
Resources: Algorithm
Representation
• Multiple types of algorithms
  • descriptor calculation algorithms
  • machine learning procedures
  • data preprocessing
• The representation of algorithms is again defined by the OpenTox ontology, where all algorithms are subclasses of ot:Algorithm
Algorithm types ontology
http://opentox.org/data/documents/development/RDF%20files/AlgorithmTypes
• Provides a classification of algorithm types
• The algorithm type in the RDF representation is set by direct subclassing (rdf:type) of a class from the algorithm types ontology (ota: http://www.opentox.org/algorithms.owl#), e.g. <myalgorithm> rdf:type ota:Classification
Resources: Algorithm
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/algorithm/J48
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ot="http://www.opentox.org/api/1.1#"
    …
    xmlns:ota="http://www.opentox.org/algorithmTypes.owl#">
  <ota:Supervised rdf:about="http://apps.ideaconsult.net:8080/ambit2/algorithm/J48">
    <dc:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Classification: Decision tree J48</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#Classification"/>
    <dc:description rdf:datatype="http://www.w3.org/2001/XMLSchema#string"></dc:description>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#SingleTarget"/>
    <dc:publisher rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">Somebody</dc:publisher>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#EagerLearning"/>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Algorithm"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">Sat Jul 31 17:11:26 EEST 2010</dc:date>
  </ota:Supervised>
</rdf:RDF>
Representation
• The algorithm name is defined by dc:title
• Parameters supported by the algorithm are specified via the object property ot:parameters and should be of class ot:Parameter (as defined in opentox.owl). These entries tell the client which parameters are required in order to run the algorithm; the values themselves should be provided by the client when initiating the calculations via POST
• Algorithm types are distinguished by means of the Algorithm types ontology
Resources: Model
• Representations of predictive models
• A Model is created by an HTTP POST to an ot:Algorithm with specific parameters and/or an input ot:Dataset
Representation
• The Model name is defined by the dc:title property
• The Model creator might be defined by the dc:creator property
• The date of Model creation is defined by the dc:date property
• The Algorithm is defined by the ot:algorithm object property
• The independent variables are instances of ot:Feature, defined by the ot:independentVariables property (can be multiple)
• The dependent variables are instances of ot:Feature, defined by the ot:dependentVariables property (can be multiple)
• The variables where prediction results will be stored are instances of ot:Feature, defined by ot:predictedVariables (can be multiple)
• Parameters are defined by ot:parameters
• The training Dataset is an instance of ot:Dataset, defined by ot:trainingDataset
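The properties listed above map naturally onto a record type. The dataclass below is a hypothetical client-side mirror of those properties, not a class from any OpenTox library; the example values reuse URIs from the earlier curl call, and treating the prediction feature as the dependent variable is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ModelDescription:
    """Client-side mirror of the ot:Model properties named on the slide."""
    title: str                                  # dc:title
    algorithm: str                              # ot:algorithm (URI)
    training_dataset: Optional[str] = None      # ot:trainingDataset (URI)
    creator: Optional[str] = None               # dc:creator
    date: Optional[str] = None                  # dc:date
    independent_variables: List[str] = field(default_factory=list)  # ot:independentVariables
    dependent_variables: List[str] = field(default_factory=list)    # ot:dependentVariables
    predicted_variables: List[str] = field(default_factory=list)    # ot:predictedVariables
    parameters: Dict[str, str] = field(default_factory=dict)        # ot:parameters


model = ModelDescription(
    title="TUMOpenToxModel_j48_48",
    algorithm="http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48",
    training_dataset="http://apps.ideaconsult.net:8080/ambit2/dataset/1037",
    dependent_variables=["http://apps.ideaconsult.net:8080/ambit2/feature/26701"],
)
print(model.title, model.algorithm)
```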
Linked resources: Compound, Algorithm, Model, Dataset, Features
[Figure: linked resources. A model resource, dataset resource, descriptor resource, assay resource and chemical compound are connected through the algorithm service, feature service and compound service. Algorithms (regression, classification, quantum chemistry, descriptors, etc.) link to the Blue Obelisk algorithms ontology and the OpenTox algorithm types ontology; features link to toxicology-related ontologies.]
Make the model available
Register at the OpenTox ontology service
  – RDF triple storage
  – Accepts HTTP POST
  – SPARQL endpoint
$ curl -X POST -d "uri=http://apps.ideaconsult.net:8080/ambit2/model/57" http://apps.ideaconsult.net:8080/ontology
The model becomes visible to applications.
[Figure: the model service registers model/{id} at the ontology service (published models, algorithms, ontologies, metadata) via HTTP POST.]
Services implementation by partner and service type
Service types: Compound, Dataset, Feature, Algorithm (processing), Algorithm (model), Model, Task, Report, Validation, Authentication and Authorisation service, Ontology service
Partner No | Service types implemented
2 | Y Y Y Y Y Y
3 (IDEA) | Y Y Y Y Y Y Y Y
5 | Y Y Y
6 | Y Y Y Y
7 | Y Y Y
10 | Y
• All components are implemented as REST web services
• There could be multiple implementations of the same type of component
• (A subset of) services could be hosted by the same provider or by multiple providers at separate locations
http://apps.ideaconsult.net:8080/ambit2/
OpenTox Is A Framework
Framework
Unified Access
Open Source
• Toxicity data
• Predictive models
• Validation support
• Interpretation aids
• Toxicologists
• Modelers
• API for new algorithm development & integration
• To optimise impact
• To allow inspection and review
• To attract external contributors
Demo applications
OpenTox services can be used to develop specific applications or can be embedded in
workflow systems
• Two end-user-oriented demo applications that make use of OpenTox
web services have been developed and deployed, and are available for
testing: http://toxcreate.org and http://toxpredict.org
• ToxCreate creates models from user-supplied datasets
• ToxPredict uses existing OpenTox models to estimate chemical
compound properties
RDF Lessons learned
• OpenTox specific – it did not start as a Linked Data/RDF project
• The mix of REST and RDF is not (yet) popular… but it is natural to be able to retrieve a (partial)
resource representation described by triples
• Steep learning curve
• Some hard topics
– Data model vs format
– The subject-predicate-object concept vs tabular/hierarchical/other implicit structure
– The recognition of the added value
• XML, JSON, YAML, plain text, etc. vs RDF
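The contrast between the subject-predicate-object concept and an implicit tabular structure can be made concrete with a small sketch. The Python example below assumes a simplified mapping in which the compound is the subject, the feature (the table column) is the predicate and the cell value is the object. OpenTox's own dataset representation is richer (it uses ot:DataEntry and ot:FeatureValue nodes). The URIs, the values and the function name `row_to_triples` are hypothetical.

```python
def row_to_triples(compound_uri, row):
    """Turn one table row into (subject, predicate, object) triples.

    `row` maps a feature URI (the table "column") to the value in that cell.
    """
    return [(compound_uri, feature_uri, value) for feature_uri, value in row.items()]


# Hypothetical compound and feature URIs, for illustration only.
compound = "http://myhost.com/compound/1"
row = {
    "http://myhost.com/feature/XLogP": 3.31,
    "http://myhost.com/feature/activity": "active",
}

for triple in row_to_triples(compound, row):
    print(triple)
```

In the table the meaning of a column lives in its header or in external documentation. In the triple form each statement carries the feature URI itself, which can be dereferenced or linked to an ontology entry.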
RDF Performance
RDF representation is verbose …
… and in-memory RDF libraries are slow …
… lack of streaming parsers/writers …
[Charts: a dataset with 320 chemicals and 60 columns; a dataset with 6500 chemicals and 12 columns]
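To illustrate what a streaming parser avoids, the sketch below reads triples one line at a time instead of building a whole graph in memory. It is written under stated assumptions: it uses the line-oriented N-Triples serialisation (not the RDF/XML that OpenTox mandates, which cannot be streamed this simply), it handles only IRI subjects and predicates with IRI or literal objects (no blank nodes), and the sample data and the name `stream_triples` are hypothetical.

```python
import io
import re

# Simplified N-Triples line: <subject> <predicate> <object-IRI or literal> .
LINE = re.compile(
    r'^\s*<([^>]*)>\s+<([^>]*)>\s+'
    r'(<[^>]*>|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z-]+)?)\s*\.\s*$'
)


def stream_triples(lines):
    """Yield one (subject, predicate, object) tuple per N-Triples line."""
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        match = LINE.match(line)
        if match is None:
            raise ValueError(f"unsupported line: {line!r}")
        yield match.groups()


# Hypothetical sample data standing in for a large file or HTTP response.
SAMPLE = (
    '<http://example.org/compound/1> <http://example.org/feature/XLogP> '
    '"3.31"^^<http://www.w3.org/2001/XMLSchema#double> .\n'
    '<http://example.org/compound/1> <http://www.w3.org/2002/07/owl#sameAs> '
    '<http://example.org/compound/2> .\n'
)

count = 0
for subject, predicate, obj in stream_triples(io.StringIO(SAMPLE)):
    count += 1  # process each triple here; nothing is accumulated
print(count)
```

Because the generator holds only the current line, memory use stays flat regardless of dataset size.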
RDF Wish list
• A convenient explanation of the subject-predicate-
object concept for beginners
• A (high-performance) triple store should not be
a mandatory requirement for publishing RDF data
• Fast streaming parsers and writers
• (Terse) JSON serialisation
• Security
• Synchronisation of distributed RDF content
Thank you
Build an application with the OpenTox REST Web Services API:
http://opentox.org/dev/apis/api-1.1
Download the AMBIT implementation of the OpenTox API
and launch your OpenTox service:
http://ambit.sourceforge.net
Develops and maintains open source software applications
bull Toxtree 210 ndash toxic hazard
estimation 12 modules
bull Toxmatch 106 ndash A chemical
similarity evaluation tool
bull Online QMRF repository
httpqsardbjrcit
bull Ambit Ambit XT
bull httpambitsourceforgenet
bull Partner in OpenTox FP7 project
bull Partner in CADASTER FP7
project
Ideaconsult Ltd2
bull Objective to develop a framework that provides an unified access tondash toxicity data
ndash predictive models
ndash procedures supporting validation and additional information that helps with the interpretation of predicted results
bull European CommisionFramework Programme 7 HEALTH-2007-133
bull 11 partners
OpenTox project
httpwwwopentoxorg
Ideaconsult Ltd3
Why integration framework for predictive
toxicology
August 22 2010 Ideaconsult Ltd4
bull What we would like to do
ndash Build use validate and compare multiple
models
ndash Reliable reproduce models from the literature
ndash Merge data from different sources (files
databases)
ndash Find all models available for certain endpoint
ndash More hellip
bull Challengesndash Chemical structures
bull Might be ambiguous
bull Might be error prone or time consuming to reproduce from publications
ndash Data bull Multiple formats
bull Implicit semantics often buried in human readable documentation only
ndash Modelsbull Tens of thousands available in software or in publications
bull Multiple software solutions mostly incompatible
bull Predictions reproducibility is time consuming and often hard to achieve
bull Automatic comparison of prediction results difficult
Why integration framework for predictive
toxicology
Ideaconsult Ltd5
Framework design rationales
Ideaconsult Ltd6
User Requirements Software Requirements
Umambiguous data formal way of representing information about data
Unambiguous access well-defined interfaces
Transparency of
computational tools
formal way of representing information about
methods well-defined interfaces
Variety of user groups simplicity and modularity of design
Need to integrate various
resources (eg databases
prediction methods
models hellip) to make
meaningful predictions
distributed architecture interoperability
Need to integrate
biological information
again modularity of design extensibility
The framework
bull OpenTox API
ndash The way applications talk to each other
ndash The way developers talk to applications
ndash httpopentoxorgdevapisapi-11
bull The basic building blocks
ndash data chemical structures algorithms
and models
bull Functionality offered
ndash build models
ndash apply models
ndash validate models
ndash access and query data in various ways
bull Technologies
ndash REST style web services
ndash RDF for description of resources
ndash Links to existing and newly developed
ontologies (mainly to describe
metadata) about resources
Ideaconsult Ltd7
Feature
GET
POST
PUT
DELETE
Compound
GET
POST
PUT
DELETE
Dataset
GET
POST
PUT
DELETE
Ontology
GET
POST
PUT
DELETE
Algorithm
GET
POST
PUT
DELETE
Model
GET
POST
PUT
DELETE
AppDomain
GET
POST
PUT
DELETEValidation
GET
POST
PUT
DELETE
Report
GET
POST
PUT
DELETE
Data
service
Models
service
Validatio
n service
Reporting
service
Ontology
Clients (User Interface) Standalone Browsers etc
Data
service
Data
service
all kind of
data
Registrati
on
Queries
Models
service
Reporting
service
Validation
service
Web servicesWeb services
Web services
Web services
WS
WS
Web services
Security
WSWS
Design principles
bull Resource oriented
ndash Every object (resource) is named and addressable (eg HTTP URL ) Example httpexampleopentoxcommodelmyBestModel httpexampleopentoxcomcompound50-00-0
ndash RESTfull API design starts by identifying most important objects and groups of objects supported by
the software system and proceeds by defining URL patterns
bull Transport protocol
ndash HTTP is the most popular choice of transport protocol but other protocols can be used as well
bull Operations
ndash All resources (nouns) support the same fixed and universal number of operations (verbs) HTTP
(GET POST PUT DELETE) operations are the common choice when the transport protocol is
HTTP
bull Hypermedia as the Engine of Application State
ndash All resources should be reachable via a single (or minimum) number of entry points into RESTful
applications Thus a representation of a resource should return hypermedia links to related resources
bull Error codes (for each resourceoperation pair )
ndash HTTP status codes (eg 200 OK 400 Bad Request 404 Not found etc) are usually used
Representational State Transfer (REST)A software architecture style defined by Roy Fielding in his PhD thesis (2000) Many services worldwide offer REST
API There are (currently) no standards for RESTful applications but merely design guides
Ideaconsult LtdAugust 22 20108
OpenTox resources (1)
OpenTox considers the following set of entities as essential building blocks
bull Structures of chemical compounds
bull Properties and identifiers of chemical compounds
bull Datasets of chemical compounds and various properties (measured or calculated)
bull Algorithms
ndash Data processing algorithms
bull Algorithms generating certain values based on chemical structure (eg descriptor calculation)
bull Data preprocessing (eg Principal component analysis feature selection)
bull Structure processing (eg structure optimization)
bull Algorithms relating set of structures to another set of structures (eg similarity search or
metabolite generation)
ndash Machine learning algorithms
bull Supervised (eg Regression Classification)
bull Unsupervised (eg Clustering )
ndash Prediction algorithms defined by experts (eg series of structural alerts defined by human
experts not derived by learning algorithms)
Ideaconsult LtdAugust 22 20109
OpenTox resources (2)
bull Models are generated by respective algorithms given specific parameters
bull Statistical models are generated by applying statisticalmachine learning algorithms
to specific dataset and parameters
bull Models can be other than statistical eg expert defined rules quantum mechanical
calculations metabolite generation etc The intention of the framework is to be
generic enough to accommodate varieties of predictive models
bull Validation provides procedures independent of model building facilities (eg
crossvalidation) and generates relevant statistics
bull Reports
ndash Various types of reports might be generated using building blocks above (eg validation
report can be generated using validation object a model and a dataset)
bull In addition the following components are introduced
ndash Task (asynchronous processing of computationally intensive tasks)
ndash Authentication and authorization (Ensuring secure access to sensitive resources)
ndash Ontology service (provides an RDF storage and SPARQL endpoint for resources
registration)
Ideaconsult LtdAugust 22 201010
Resources identificationAll resources are identified via unique web address assigned according to the URL templates
Ideaconsult Ltd11
Component Description URL Template (example)
Compound Representations of chemical compounds httphostportcompoundcompoundid
Feature Properties and identifiers httphostportfeaturefeatureid
Dataset Encapsulates set of chemical compounds and their property
values
httphostportdatasetdatasetid
Model OpenTox model services httphostportmodelmodeld
Algorithm OpenTox algorithm services httphostportalgorithmalgorithmid
Validation
Report
A validation corresponds to the validation of a model on a
test dataset
httphostportvalidationvalidationid
httphostportreportreportid
Task Asynchronous jobs are handled via an intermediate Task
resource A resource submitting an asynchronous job
should return the URI of the task
httphostporttasktaskid
Ontology service Provides storage and SPARQL search functionality for
objects defined in OpenTox services and relevant
ontologies
httphostportontology
Authentication and
authorisation
Granting access to protected resources for authorised users httphostportopensso
httphostportopensso-pol
OpenTox REST operations
Ideaconsult Ltd12
Individual resources (eg a dataset or a model)bull URI template httphostportresourceresourceid eg
httphostportmodelmodel_id or httphostportdatasetdataset_id
bull GET ndash retrieve representation of the resource
bull PUT ndash update representation of the resource
bull POST
ndash replace representation of the resource with a new one (eg replace the dataset with new
content)
ndash initiate calculations based on this resource (eg submit dataset URI to an algorithm resource and obtain a
model URI as a result)
bull DELETE ndash delete the resource
Collections of resources (eg list of all available models or datasets) bull URI template httphostportresource (eg httphostportmodel or httphostportdataset)
bull GET ndash retrieve representation of multiple resources ( eg retrieve all available algorithms)
bull PUT - NA
bull POST ndash create new resource and return its URI (eg create a new dataset by submitting new dataset
content to the dataset service)
bull DELETE ndash NA
Build a predictive model
Create a model
Run calculations with
dataset
httphost1datasetid
Structures
descriptors
endpoints
Dataset service
Returns the model URL
httphost1modelid
HTTP POST
Build a predictive model
Regression
Classification
Quantum Chemistry
Descriptors etc
validationid
Algorithm service
Validation service
modelid
Published models
Algorithms
Ontologies
metadata
Ontology service
HTTP POST
Model service
Read data from a web address ndash process ndash write to a web address
Uniform approach to models creation
14Ideaconsult Ltd
Feature
GET
POST
PUT
DELETE
Compound
GET
POST
PUT
DELETE
Dataset
GET
POST
PUT
DELETE
Algorithm
GET
POST
PUT
DELETE
Model
GET
POST
PUT
DELETE
+=
httpmyhostcomdatasettrainingset1
httpmyhostcomalgorithmneuralnetwork
httpmyhostcommodelpredictivemodel1
Use an algorithm to build a model
Ideaconsult Ltd15
bullAn algorithm is applied by submitting
HTTP POST to the algorithm URI and
providing required parameters
bullA common required parameter is
dataset_uri=httphostportdatasetda
tasetid which specifies the data set to be
operated on
bullHTTP POST in REST style services
returns URI of the result and not the
content of the result
bullThe algorithm services are designed to
store the results into a dataset service and
return the URL of the resulted dataset
bullIn case of slow calculations a Task URI
instead of the dataset URI is returned
$ curl -H Accepttexturi-list -X POST -d
dataset_uri=httpappsideaconsultnet8080ambit2dataset1037 -d
prediction_feature=httpappsideaconsultnet8080ambit2feature26
701 -d
dataset_service=httpappsideaconsultnet8080ambit2dataset
httpopentoxinformatiktu-muenchende8080OpenTox-
devalgorithmJ48 -iv
Connected to opentoxinformatiktu-muenchende (1311592816) port
8080 (0)
POST OpenTox-devalgorithmJ48 HTTP11
gt Host opentoxinformatiktu-muenchende8080
Accept
gt Content-Type applicationx-www-form-urlencoded
lt HTTP11 202 Accepted
lt Date Sat 31 Jul 2010 144638 GMT
lt Location httpopentoxinformatiktu-muenchende8080OpenTox-
devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5
lt Accept-Ranges bytes
lt Server Noelios-Restlet-Engine11snapshot
lt Content-Type texturi-listcharset=ISO-8859-1
lt Content-Length 99
lt
Connection 0 to host opentoxinformatiktu-muenchende left intact
Closing connection 0
httpopentoxinformatiktu-muenchende8080OpenTox-
devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5
Resources The model
Ideaconsult Ltd16
$ curl -iv -H Accepttexturi-list httpopentoxinformatiktu-
muenchende8080OpenTox-devtaskacdf6eac-d5a2-402c
About to connect() to opentoxinformatiktu-muenchende port 8080 (0)
Trying 1311592816 connected
Connected to opentoxinformatiktu-muenchende (1311592816) port 8080 (0)
gt GET OpenTox-devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5 HTTP11
gt User-Agent curl7182 (x86_64-pc-linux-gnu) libcurl7182 OpenSSL098g
zlib1233 libidn18 libssh2018
gt Host opentoxinformatiktu-muenchende8080
gt Accepttexturi-list
gt
lt HTTP11 200 OK
lt Date Sat 31 Jul 2010 144722 GMT
Date Sat 31 Jul 2010 144722 GMT
lt Location httpopentoxinformatiktu-muenchende8080OpenTox-
devmodelTUMOpenToxModel_j48_48
lt Vary Accept-Charset Accept-Encoding Accept-Language Accept
lt Accept-Ranges bytes
lt Server Noelios-Restlet-Engine11snapshot
lt Content-Type texturi-listcharset=ISO-8859-1
lt Content-Length 86
lt
Connection 0 to host opentoxinformatiktu-muenchende left intact
Closing connection 0
httpopentoxinformatiktu-muenchende8080OpenTox-
devmodelTUMOpenToxModel_j48_48
bullWhen task URI is returned the
returned status code is HTTP 202
Accepted instead of HTTP 200
OK
bullThis tells the client the processing
is not completed and the client need
to poll the task URI until OK code
is returned
bullThe final result returned by
Example 25 is the URI of the new
model httpopentoxinformatiktu-
muenchende8080OpenTox-
devmodelTUMOpenToxModel_j4
8_48
bullTo obtain prediction results POST
a dataset to the model URI
Read data from a web address ndash process ndash write to a web address
Uniform approach to data processing (eg
Descriptors calculation)
17Ideaconsult Ltd
Feature
GET
POST
PUT
DELETE
Compound
GET
POST
PUT
DELETE
Dataset
GET
POST
PUT
DELETE
Algorithm
GET
POST
PUT
DELETE
+ =
httpmyhostcomdatasettrainingset1
httpmyhostcomalgorithmdescriptorX
Feature
GET
POST
PUT
DELETE
Compound
GET
POST
PUT
DELETE
Dataset
GET
POST
PUT
DELETE
=
httpmyhostcomdatasetresults
Read data from a web address ndash process ndash write to a web address
Uniform approach to models validation and
report generation
18Ideaconsult Ltd
Dataset
GET
POST
PUT
DELETE
Model
GET
POST
PUT
DELETE
+
=Validation
GET
POST
PUT
DELETE
Report
GET
POST
PUT
DELETEModel generating
predictions
Validation report
httpmyhostcomreport1
httpmyhostcomdatasettrainingset1
httpmyhostcomdatasetpredictedresults1
httpmyhostcommodelpredictivemodel1
httpmyhostcomvalidation
Apply predictive models
Returns the results
dataset URL
httphostdatasetid
Retrieve available
endpoints and model
URLs eg
httphost1modelid
Published models
Algorithms
Ontologies
metadata
Ontology
service
HTTP POST
SPARQL
HTTP POST
modelidApply the model
httphost1modelid
to dataset
httphost2datasetid
Model service
Read data from a web address ndash process ndash write to a web address
Uniform approach to model prediction
20Ideaconsult LtdAugust 22 2010
Feature
GET
POST
PUT
DELETE
Compound
GET
POST
PUT
DELETE
Dataset
GET
POST
PUT
DELETE
+=
httpmyhostcomdatasetid1
Feature
GET
POST
PUT
DELETE
Compound
GET
POST
PUT
DELETE
Dataset
GET
POST
PUT
DELETE
=
httpmyhostcomdatasetresults1
Model
GET
POST
PUT
DELETE
httpmyhostcommodelpredictivemodel1
Uniform access to data
Upload data receive
dataset URL
httphost2datasetid
Annotation
Find chemical compounds
return dataset URL
httphost2compoundid
HTTP GET POST
Dataset service
Structures
endpoints
amppredictions
Publish your data retrieve linked data
Published models
Algorithms
Ontologies
metadata
Ontology service
Feature
GET
POST
PUT
DELETE
Compound
GET
POST
PUT
DELETE
Dataset
GET
POST
PUT
DELETE
RDF - Resources representation
bull The opentoxowl ontology
ndash A common OWL data model of all
OpenTox resources
ndash Describes OpenTox resources
ndash Describes relationships between them
ndash Generates objects RDF representations
bull RDFXML representation is mandatory for
OpenTox resources
bull Uniform approach to data representation
ndash Calculated and measured properties of
chemical compounds are represented in an
uniform way
ndash Linked to the resource used for data
generation
ndash Annotated via ontology entries
ndash Model representations link to algorithms
and data used
bull Ideaconsult Ltd
22
All OpenTox components are defined by
OWL ontology
httpopentoxorgapi11opentoxowl
All resources are subclasses of
otOpenToxResource
Resources Chemical compound
Ideaconsult Ltd23
$ curl -H Acceptchemicalx-daylight-smiles
httpappsideaconsultnet8080ambit2compound1
O=C
$ curl -H Acceptchemicalx-mdl-molfile
httpappsideaconsultnet8080ambit2compound1
CH2O
APtclcactv09040902283D 0 000000 000000
4 3 0 0 0 0 0 0 0 0999 V2000
-06004 00000 00001 O 0 0 0 0 0 0 0 0 0 0 0 0
06072 00000 -00004 C 0 0 0 0 0 0 0 0 0 0 0 0
11472 09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0
11472 -09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0
1 2 2 0 0 0 0
2 3 1 0 0 0 0
2 4 1 0 0 0 0
CompoundProvides different representations for
chemical compounds with a unique and
defined chemical structurecompoundid
Conformercompoundidconformerid
Documentationhttpopentoxorgdevapisapi-11structure
RepresentationA subclass of otOpenToxResource
Supports different Chemical MIME
formats
RDF representation only for specifying
owlsameAs links to external resources
Example 1 Retrieve compound as MOL
Example 2 Retrieve compound as SMILES
Example 3 Query compounds
$ curl ndashH Acceptchemical-mime ldquo
httpappsideaconsultnet8080ambit2querycompoundany-identifier-or-keyword
$ curl ndashH Acceptchemical-mime ldquo
httpappsideaconsultnet8080ambit2querysmartssearch=smarts
Resources Feature
Ideaconsult Ltd24
FeatureA Feature is a resource representing any
kind of a property or identifier assigned to
a Compound
The feature types are determined via their
links to ontologies (Feature ontologies
Decsriptor ontologies Endpoints
ontologies)featureid
Documentationhttpopentoxorgdevapisapi-11feature
RepresentationotFeature a subclass of
otOpenToxResource
Mandatory RDFXML format
Properties
bullName defined by dctitle (Dublin Core namespace )
bullUnits defined by otunits annotation property (OpenTox
namespace)
bullCreator defined by dccreator annotation property
(Dublin Core namespace)
bullThe origin of the Feature is defined by othasSource
object property (OpenTox namespace) element and can be
otAlgorithm otModel or otDataset
bullRelations to other resources which represent the same
entity could be established via owlsameAs property
This approach can be used for example to link the
otFeature resource to a resource from another ontology
(an example follows)
bullThere are subclasses of otFeature (namely) which are
used otNumericFeature otStringFeature
otNominalFeature denote if a feature holds numeric
nominal or string values
Resources Feature (an example)
Ideaconsult Ltd25
$ curl -H Acceptapplicationrdf+xml
httpappsideaconsultnet8080ambit2feature22114
ltrdfRDF
xmlnsot=httpwwwopentoxorgapi11
xmlnsotee=httpwwwopentoxorgechaEndpointsowl
xmlnsdc=httppurlorgdcelements11
xmlns=httpappsideaconsultnet8080ambit2
xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns
xmlnsowl=httpwwww3org200207owl
xmlnsxsd=httpwwww3org2001XMLSchema
xmlnsrdfs=httpwwww3org200001rdf-schema
xmlbase=httpappsideaconsultnet8080ambit2gt
ltowlClass rdfabout=httpwwwopentoxorgapi11Algorithmgt
ltowlClass rdfabout=httpwwwopentoxorgapi11Featuregt
ltowlClass rdfabout=httpwwwopentoxorgapi11NumericFeaturegt
ltrdfssubClassOf
rdfresource=httpwwwopentoxorgapi11Featuregt
ltowlClassgt
ltotNumericFeature rdfabout=feature22114gt
ltothasSourcegt
ltotAlgorithm
rdfabout=algorithmorgopensciencecdkqsardescriptorsmolecularXLo
gPDescriptorgt
ltothasSourcegt
ltowlsameAs rdfresource=ldquooteeOctanol-
water_partition_coefficient_Kowgt
ltdctitlegtXLogPltdctitlegt
ltrdftype rdfresource=httpwwwopentoxorgapi11Featuregt
ltotNumericFeaturegt
ltrdfRDFgt
The example shows an OpenTox feature with title ldquoXLogPrdquo and identified by the URI httpappsideaconsultnet8080ambit2feature22114
bullLinked to an entry of a simplified ontology of toxicological endpoints
The algorithm used to generate values for this featurebullSpecified by otalgorithm propertybullidentified by the URIhttpappsideaconsultnet8080ambit2algorithmorgopensciencecdkqsardescriptorsmolecularXLogPDescriptor
bullNote the URI identifies an OpenTox Algorithm resourcebullThe algorithm URI itself is dereferensable bullCan be used to initiate calculations of XLogP descriptor
Resources Feature (an example)
Ideaconsult Ltd26
$ curl -H Accepttextn3 httpappsideaconsultnet8080ambit2feature26184
prefix ot lthttpwwwopentoxorgapi11gt
prefix dc lthttppurlorgdcelements11gt
prefix lthttpappsideaconsultnet8080ambit2gt
prefix rdfs lthttpwwww3org200001rdf-schemagt
prefix owl lthttpwwww3org200207owlgt
prefix xsd lthttpwwww3org2001XMLSchemagt
prefix rdf lthttpwwww3org19990222-rdf-syntax-nsgt
prefix af lthttpappsideaconsultnet8080ambit2featuregt
othasSource
a owlObjectProperty
otunits
a owlDatatypeProperty
otFeature
a owlClass
otNumericFeature
a owlClass
rdfssubClassOf otFeature
af26184
a otFeature otNumericFeature
dccreator httpambituni-plovdivbg8080ambit2
dctitle TUM_CDK_XLogP
othasSource lthttpopentoxinformatiktu-muenchende8080OpenTox-
devalgorithmCDKPhysChemXLogPDescriptorgt
otunits
= oteeOctanol-water_partition_coefficient_Kow
An example (N3) of another otFeature
resource XLogP descriptor (again)
bullGenerated by different implementation
bullIn this case its name is
ldquoTUM_CDK_XLogPrdquo and the algorithm
resource used to generate resides at Technical
University of Munich (TUM) premises
httpopentoxinformatiktu-muenchende8080OpenTox-
devalgorithmCDKPhysChemXLogPDescriptor
This algorithm URL could also be used to
initiate descriptor calculations
bullThe representation of Algorithm resources
refers to the BlueObelisk ontology entry httpwwwblueobeliskorgontologieschemoinformatics-
algorithmsxlogP
Toxicity endpoints ontologies
Ideaconsult Ltd27
bullDerived from ECHA classification of
endpoints published in REACH guidance
documents
bullPhysicochemical properties and various
toxicological endpoints
bullThe hierarchy doesnrsquot represent the
complexity of toxicological assays but
can be used as a first approximation to
assign meaning to the data entries and
generate REACH report
bullMore specific description of toxicological
assays can be used as well
bullOntologies for specific toxicity assays are
developed by OpenTox partners
bullThe ECHA endpoints ontology httpwwwopentoxorgechaEndpointsowl
Resources a feature
Ideaconsult Ltd28
curl -H Acceptapplicationrdf+xml
httpappsideaconsultnet8080ambit2feature3
ltrdfRDF
xmlnsot=httpwwwopentoxorgapi11
xmlns=httpappsideaconsultnet8080ambit2
xmlnsaf=httpappsideaconsultnet8080ambit2feature
hellip
xmlbase=httpappsideaconsultnet8080ambit2gt
ltotFeature rdfabout=feature3gt
dccreatorgt
ltothasSourcegtECHAhellipltothasSourcegt
ltowlsameAs rdfresource=httpwwwopentoxorgapi11EINECSgt
ltotunitsgtltotunitsgt
ltdctitlegtECltdctitlegt
ltotFeaturegt
ltrdfRDFgt
An illustration of otFeature
imported from a file and not
calculated
The example shows a feature
representing EINECS
number imported from the
ECHA preregistration list
Feature summary
bull OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs
bull These URIs are dereferencable
bull Allow to assign different levels of meaning by linking to entries to ontologies (eg algorithms or toxicological endpoints ) as well as linking to the algorithms which can be used to generated property values
bull The same approach can be used to denote assays provided that the assay is defined by an ontology species functional groups etc
Ideaconsult Ltd29
Resources Dataset
OperationsPOST ndash Upload a dataset
PUT ndash Update the dataset content
DELETE ndash Remove the dataset
Representation
RDFXML (mandatory)
bullThe dataset consists of data entries
bullEach entry is associated with exactly one
chemical compound identified by its
URI and available via OpenTox
Compound service API
bullOne and the same compound can be
associated with multiple dataset entries
bullEvery ldquocolumnrdquo is associated with a
Feature its representation should be
available via OpenTox Feature API
Ideaconsult Ltd30
DatasetProvides access to chemical compounds and their
features (eg structural physical-chemical
biological toxicological properties)
Resources Dataset
Representation
Ideaconsult Ltd31
The dataset services optionally
support formats other than RDF
bulltextcsv (Comma delimited)
bulltextx-arff (Weka ARFF)
bullapplicationpdf
bullchemicalx-mdl-sdfile
bullother Chemical MIME formats
This allows retrieving the same data
in convenient format but the URL
links to compound and feature
resources are being lost
prefix ad lthttpappsideaconsultnet8080ambit2datasetgt
prefix af lthttpappsideaconsultnet8080ambit2featuregt
prefix ot lthttpwwwopentoxorgapi11gt
hellip
ad9 a otDataset
otdataEntry
[ a otDataEntry
otcompound
lthttpappsideaconsultnet8080ambit2compound413conformer409421gt
otvalues
[ a otFeatureValue
otfeature af21576
otvalue 3309999942779541^^xsddouble
]
otvalues
[ a otFeatureValue
otfeature af21573
otvalue 30^^xsddouble
]
]
Dataset metadata and features
Ideaconsult Ltd32
Description URI Template
Retrieve entire dataset content If uri-list
retrieve only compound URIs
httphostportdatasetid
Retrieve representation of features (columns)
of the dataset
httphostportdatasetidfeature
Retrieves dataset metadata (name etc) httphostportdatasetidmetadata
$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata
ltrdfRDF
xmlnsot=httpwwwopentoxorgapi11
helliphellip
xmlnsrdfs=httpwwww3org200001rdf-schema
xmlbase=httpappsideaconsultnet8080ambit2gt
ltotDataset rdfabout=dataset9gt
ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt
ltdcpublishergtsomebodyltdcpublishergt
ltrdfsseeAlsogt
ltbxEntry rdfabout=reference20117gt
ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt
ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt
ltbxEntrygt
ltrdfsseeAlsogt
ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt
ltotDatasetgt
ltrdfRDFgt
Data publishing
Upload data receive
dataset URL
httphost2datasetid
AnnotationFind chemical compounds
return dataset URL
httphost2compoundid
HTTP GET POST
Dataset service
Structures
endpoints
amppredictions
1)POST a file with chemical structures and properties to
OpenTox dataset servicebullThe structures and data are assigned a dataset URL and
become available by multiple formats (RDF Chemical
MIME CSV Weka ARFF)
2)Assign metadata
bullPUT datasetidmetadata
3)Annotate any of dataset features
datasetidfeature by assigning links to relevant
ontologies
bullPUT featureid
Toxiciology related
ontologies
Algorithms
ontologies
Resources Algorithm
AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by
different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at
httphostportalgorithm
Ideaconsult Ltd34
curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-
devalgorithm
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval
Resources Algorithm
Representation
bull Multiple type of algorithms
bull descriptor calculation algorithms
bull machine learning procedures
bull data preprocessing
bull The representation of algorithms is again defined by
Opentox ontology where all algorithms are subclass of
otAlgorithm
Algorithm types ontology
httpopentoxorgdatadocumentsdevelopmentRDF2
0filesAlgorithmTypes
bull provides a classification of algorithm types
bull Algorithm type in RDF representation is set by direct
subclassing (rdftype) of a class from the algorithm types
ontology (otahttpwwwopentoxorgalgorithmsowl )
eg ltmyalgorithmgt rdftype otaClassification
Ideaconsult Ltd35
Resources Algorithm
Ideaconsult Ltd36
$ curl -H Acceptapplicationrdf+xml
httpappsideaconsultnet8080ambit2algorithmJ48
ltrdfRDF
xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns
xmlnsot=httpwwwopentoxorgapi11
hellip
xmlnsota=httpwwwopentoxorgalgorithmTypesowl
ltotaSupervised
rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt
ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring
gtClassification Decision tree J48ltdctitlegt
ltrdftype
rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt
ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring
gtltdcdescriptiongt
ltrdftype
rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt
ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI
gtSomebodyltdcpublishergt
ltrdftype
rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt
ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt
ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime
gtSat Jul 31 171126 EEST 2010ltdcdategt
ltotaSupervisedgt
ltrdfRDFgt
RepresentationbullAlgorithm name is defined by dctitle
Parameters supported by the algorithm are
specified via object property otparameters
and should be of class otParameter (as
defined in opentoxowl)
These entries serve as a information what
parameters are required in order to run the
algorithm the values itself should be
provided by the client when initiating the
calculations via POST
bullAlgorithm types are distinguished by
means of Algorithm types ontology
Resources: Model
• Representations of predictive models
• A Model is created by an HTTP POST to an ot:Algorithm with specific parameters and/or an input ot:Dataset
Representation
• The Model name is defined by the dc:title property
• The Model creator might be defined by the dc:creator property
• The date of Model creation is defined by the dc:date property
• The Algorithm is defined by the ot:algorithm object property
• The independent variables are instances of ot:Feature, defined by the ot:independentVariables property (can be multiple)
• The dependent variables are instances of ot:Feature, defined by the ot:dependentVariables property (can be multiple)
• The variables where prediction results will be stored are instances of ot:Feature, defined by the ot:predictedVariables property (can be multiple)
• Parameters are defined by ot:parameters
• The training Dataset is an instance of ot:Dataset, defined by ot:trainingDataset
Linked resources: Compound, Algorithm, Model, Dataset, Features
[Diagram labels: Algorithm service; Model resource; Dataset resource; Descriptor resource; Assay resource; Chemical compound; Feature service; Compound service]
[Diagram labels, second view: Dataset resource; Descriptor resource; Assay resource; Chemical compound; algorithms for regression, classification, quantum chemistry, descriptors, etc.; Blue Obelisk algorithms ontology; OpenTox algorithm types ontology; toxicology-related ontologies]
Make the model available
Register at the OpenTox ontology service
– RDF triple storage
– Accepts HTTP POST
– SPARQL endpoint
curl -X POST -d "uri=http://apps.ideaconsult.net:8080/ambit2/model/57" http://apps.ideaconsult.net:8080/ontology
The model becomes visible to applications.
[Diagram labels: Model service (model/{id}); HTTP POST; Ontology service (published models, algorithms, ontologies, metadata)]
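The registration call can also be issued from a program. The sketch below builds the same form-encoded POST as the curl command, using only Python's standard library. The function name `build_registration_request` is hypothetical, and the request is only constructed, not sent, since the availability of the 2010 demo servers is not assumed.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_registration_request(ontology_service, resource_uri):
    """Build the POST that asks the ontology service to register a resource URI."""
    body = urlencode({"uri": resource_uri}).encode("ascii")
    return Request(
        ontology_service,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


request = build_registration_request(
    "http://apps.ideaconsult.net:8080/ontology",
    "http://apps.ideaconsult.net:8080/ambit2/model/57",
)
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
# To actually send it: urllib.request.urlopen(request)
```

Only the model URI is submitted; the ontology service is then expected to dereference that URI itself to obtain the RDF to store.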
Services implementation by partner and service type
Service types (table columns): Compound; Dataset; Feature; Algorithm (processing); Algorithm (model); Model; Task; Report; Validation; Authentication and Authorisation service; Ontology service
Partner No. | Services implemented (Y)
2 | Y Y Y Y Y Y
3 (IDEA) | Y Y Y Y Y Y Y Y
5 | Y Y Y
6 | Y Y Y Y
7 | Y Y Y
10 | Y
• All components are implemented as REST web services
• There could be multiple implementations of the same type of component
• (A subset of) the services could be hosted by the same provider or by multiple providers at separate locations
http://apps.ideaconsult.net:8080/ambit2
OpenTox is a framework
Framework; Unified access; Open source
• Toxicity data
• Predictive models
• Validation support
• Interpretation aids
• Toxicologists
• Modelers
• API for new algorithm development & integration
• To optimise impact
• To allow inspection and review
• To attract external contributors
Demo applications
OpenTox services can be used to develop specific applications or be embedded in workflow systems.
• Two end-user oriented demo applications making use of OpenTox web services have been developed and deployed, and are available for testing: http://toxcreate.org and http://toxpredict.org
• ToxCreate creates models from user-supplied datasets
• ToxPredict uses existing OpenTox models to estimate chemical compound properties
RDF: lessons learned
• OpenTox specific: it hasn't started as a Linked Data/RDF project
• The REST and RDF mix is not (yet) popular… but it is natural to be able to retrieve a (partial) resource representation described by triples
• Steep learning curve
• Some hard topics
– Data model vs. format
– The subject-predicate-object concept vs. tabular/hierarchical/other implicit structure
– The recognition of the added value: XML, JSON, YAML, plain text, etc. vs. RDF
RDF performance
RDF representation is verbose… and in-memory RDF libraries are slow… lack of streaming parsers/writers…
[Charts: a dataset with 320 chemicals and 60 columns; a dataset with 6500 chemicals and 12 columns]
RDF wish list
• A convenient explanation of the subject-predicate-object concept for beginners
• A (high-performance) triple storage should not be a mandatory requirement to publish RDF data
• Fast streaming parsers and writers
• (Terse) JSON serialisation
• Security
• Synchronisation of distributed RDF content
Thank you
Build an application with the OpenTox REST Web Services API:
http://opentox.org/dev/apis/api-1.1
Download the AMBIT implementation of the OpenTox API and launch your OpenTox service:
http://ambit.sourceforge.net
Framework design rationales
User requirements | Software requirements
Unambiguous data | Formal way of representing information about data
Unambiguous access | Well-defined interfaces
Transparency of computational tools | Formal way of representing information about methods; well-defined interfaces
Variety of user groups | Simplicity and modularity of design
Need to integrate various resources (e.g. databases, prediction methods, models, …) to make meaningful predictions | Distributed architecture; interoperability
Need to integrate biological information | Again, modularity of design; extensibility
The framework
• OpenTox API
– The way applications talk to each other
– The way developers talk to applications
– http://opentox.org/dev/apis/api-1.1
• The basic building blocks
– Data, chemical structures, algorithms and models
• Functionality offered
– Build models
– Apply models
– Validate models
– Access and query data in various ways
• Technologies
– REST style web services
– RDF for description of resources
– Links to existing and newly developed ontologies (mainly to describe metadata about resources)
[Architecture diagram: the resources Feature, Compound, Dataset, Ontology, Algorithm, Model, AppDomain, Validation and Report each support GET, POST, PUT and DELETE. They are grouped into a Data service (all kinds of data), a Models service, a Validation service, a Reporting service and an Ontology service (registration, queries), exposed as web services with a security layer to clients (user interfaces, standalone applications, browsers, etc.).]
Representational State Transfer (REST)
A software architecture style defined by Roy Fielding in his PhD thesis (2000). Many services worldwide offer a REST API. There are (currently) no standards for RESTful applications, merely design guides.
Design principles
• Resource oriented
– Every object (resource) is named and addressable (e.g. by an HTTP URL). Examples: http://example.opentox.com/model/myBestModel, http://example.opentox.com/compound/50-00-0
– RESTful API design starts by identifying the most important objects and groups of objects supported by the software system and proceeds by defining URL patterns
• Transport protocol
– HTTP is the most popular choice of transport protocol, but other protocols can be used as well
• Operations
– All resources (nouns) support the same fixed and universal set of operations (verbs). The HTTP operations (GET, POST, PUT, DELETE) are the common choice when the transport protocol is HTTP
• Hypermedia as the Engine of Application State
– All resources should be reachable via a single entry point (or a minimal number of entry points) into a RESTful application. Thus a representation of a resource should return hypermedia links to related resources
• Error codes (for each resource/operation pair)
– HTTP status codes (e.g. 200 OK, 400 Bad Request, 404 Not Found, etc.) are usually used
OpenTox resources (1)
OpenTox considers the following set of entities as essential building blocks:
• Structures of chemical compounds
• Properties and identifiers of chemical compounds
• Datasets of chemical compounds and various properties (measured or calculated)
• Algorithms
– Data processing algorithms
  • Algorithms generating certain values based on chemical structure (e.g. descriptor calculation)
  • Data preprocessing (e.g. principal component analysis, feature selection)
  • Structure processing (e.g. structure optimization)
  • Algorithms relating a set of structures to another set of structures (e.g. similarity search or metabolite generation)
– Machine learning algorithms
  • Supervised (e.g. regression, classification)
  • Unsupervised (e.g. clustering)
– Prediction algorithms defined by experts (e.g. a series of structural alerts defined by human experts, not derived by learning algorithms)
OpenTox resources (2)
• Models are generated by the respective algorithms, given specific parameters
• Statistical models are generated by applying statistical/machine learning algorithms to a specific dataset and parameters
• Models can be other than statistical, e.g. expert-defined rules, quantum mechanical calculations, metabolite generation, etc. The intention of the framework is to be generic enough to accommodate varieties of predictive models
• Validation provides procedures independent of the model building facilities (e.g. cross-validation) and generates relevant statistics
• Reports
– Various types of reports might be generated using the building blocks above (e.g. a validation report can be generated using a validation object, a model and a dataset)
• In addition, the following components are introduced
– Task (asynchronous processing of computationally intensive tasks)
– Authentication and authorization (ensuring secure access to sensitive resources)
– Ontology service (provides an RDF storage and a SPARQL endpoint for registration of resources)
Resources identification
All resources are identified via a unique web address, assigned according to the URL templates.
Component | Description | URL template (example)
Compound | Representations of chemical compounds | http://host:port/compound/{compoundid}
Feature | Properties and identifiers | http://host:port/feature/{featureid}
Dataset | Encapsulates a set of chemical compounds and their property values | http://host:port/dataset/{datasetid}
Model | OpenTox model services | http://host:port/model/{modelid}
Algorithm | OpenTox algorithm services | http://host:port/algorithm/{algorithmid}
Validation, Report | A validation corresponds to the validation of a model on a test dataset | http://host:port/validation/{validationid}, http://host:port/report/{reportid}
Task | Asynchronous jobs are handled via an intermediate Task resource. A resource submitting an asynchronous job should return the URI of the task | http://host:port/task/{taskid}
Ontology service | Provides storage and SPARQL search functionality for objects defined in OpenTox services and relevant ontologies | http://host:port/ontology
Authentication and authorisation | Granting access to protected resources for authorised users | http://host:port/opensso, http://host:port/opensso-pol
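The URL templates translate directly into a small URI builder. This is an illustrative sketch only: the function name `resource_uri` and the set `RESOURCE_KINDS` are hypothetical, and the path segments are taken from the table above.

```python
from urllib.parse import quote

# Resource kinds that follow the http://host:port/{kind}/{id} pattern in the table.
RESOURCE_KINDS = {
    "compound", "feature", "dataset", "model",
    "algorithm", "validation", "report", "task",
}


def resource_uri(base, kind, resource_id=None):
    """Return the collection URI, or the URI of one resource if an id is given."""
    if kind not in RESOURCE_KINDS:
        raise ValueError("unknown resource kind: %r" % kind)
    uri = base.rstrip("/") + "/" + kind
    if resource_id is not None:
        # Percent-encode the identifier so that it stays a single path segment.
        uri += "/" + quote(str(resource_id), safe="")
    return uri


print(resource_uri("http://host:8080", "dataset"))
print(resource_uri("http://host:8080/", "model", 57))
```

Clients should normally treat URIs returned by a service as opaque and use a builder like this only for the entry points.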
OpenTox REST operations
Individual resources (e.g. a dataset or a model)
• URI template: http://host:port/{resource}/{resourceid}, e.g. http://host:port/model/{model_id} or http://host:port/dataset/{dataset_id}
• GET – retrieve a representation of the resource
• PUT – update the representation of the resource
• POST
– replace the representation of the resource with a new one (e.g. replace the dataset with new content)
– initiate calculations based on this resource (e.g. submit a dataset URI to an algorithm resource and obtain a model URI as a result)
• DELETE – delete the resource
Collections of resources (e.g. the list of all available models or datasets)
• URI template: http://host:port/{resource} (e.g. http://host:port/model or http://host:port/dataset)
• GET – retrieve a representation of multiple resources (e.g. retrieve all available algorithms)
• PUT – N/A
• POST – create a new resource and return its URI (e.g. create a new dataset by submitting new dataset content to the dataset service)
• DELETE – N/A
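The operation rules above amount to a lookup on the shape of the path. The sketch below encodes them for the two URI templates listed (a collection path with one segment, an individual resource path with two segments). The names `allowed_methods` and `is_allowed` are hypothetical, and deeper paths such as conformers or dataset metadata are deliberately left out.

```python
ITEM_METHODS = frozenset({"GET", "PUT", "POST", "DELETE"})
COLLECTION_METHODS = frozenset({"GET", "POST"})  # PUT and DELETE are N/A


def allowed_methods(path):
    """Return the HTTP methods defined for a /{resource} or /{resource}/{id} path."""
    segments = [s for s in path.split("/") if s]
    if len(segments) == 1:
        return COLLECTION_METHODS
    if len(segments) == 2:
        return ITEM_METHODS
    raise ValueError("path does not match either URI template: %r" % path)


def is_allowed(method, path):
    """True if the method is defined for the given path."""
    return method.upper() in allowed_methods(path)


print(is_allowed("POST", "/dataset"))     # create a new dataset
print(is_allowed("PUT", "/dataset"))      # not defined for collections
print(is_allowed("DELETE", "/model/57"))  # delete one model
```

A service would typically answer a request that fails this check with HTTP 405 Method Not Allowed.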
Build a predictive model
[Diagram: a model is created by an HTTP POST that runs calculations with a dataset, http://host1/dataset/{id} (structures, descriptors, endpoints; Dataset service), on the Algorithm service (regression, classification, quantum chemistry, descriptors, etc.). The call returns the model URL, http://host1/model/{id} (Model service). Further labels: validation/{id} (Validation service); model/{id} registered by HTTP POST at the Ontology service (published models, algorithms, ontologies, metadata).]
Read data from a web address – process – write to a web address
Uniform approach to model creation
[Diagram: Dataset + Algorithm = Model; the Feature, Compound, Dataset, Algorithm and Model resources each support GET, POST, PUT and DELETE]
http://myhost.com/dataset/trainingset1 + http://myhost.com/algorithm/neuralnetwork = http://myhost.com/model/predictivemodel1
Use an algorithm to build a model
• An algorithm is applied by submitting an HTTP POST to the algorithm URI and providing the required parameters
• A common required parameter is dataset_uri=http://host:port/dataset/{datasetid}, which specifies the dataset to be operated on
• An HTTP POST in REST style services returns the URI of the result and not the content of the result
• The algorithm services are designed to store the results in a dataset service and return the URL of the resulting dataset
• In case of slow calculations, a Task URI is returned instead of the dataset URI
$ curl -H "Accept:text/uri-list" -X POST \
    -d dataset_uri=http://apps.ideaconsult.net:8080/ambit2/dataset/1037 \
    -d prediction_feature=http://apps.ideaconsult.net:8080/ambit2/feature/26701 \
    -d dataset_service=http://apps.ideaconsult.net:8080/ambit2/dataset \
    http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48 -iv
* Connected to opentox.informatik.tu-muenchen.de (131.159.28.16) port 8080 (#0)
> POST /OpenTox-dev/algorithm/J48 HTTP/1.1
> Host: opentox.informatik.tu-muenchen.de:8080
> Accept:text/uri-list
> Content-Type: application/x-www-form-urlencoded
< HTTP/1.1 202 Accepted
< Date: Sat, 31 Jul 2010 14:46:38 GMT
< Location: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5
< Accept-Ranges: bytes
< Server: Noelios-Restlet-Engine/1.1.snapshot
< Content-Type: text/uri-list;charset=ISO-8859-1
< Content-Length: 99
<
* Connection #0 to host opentox.informatik.tu-muenchen.de left intact
* Closing connection #0
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5
Resources: The model
$ curl -iv -H "Accept:text/uri-list" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c
* About to connect() to opentox.informatik.tu-muenchen.de port 8080 (#0)
*   Trying 131.159.28.16... connected
* Connected to opentox.informatik.tu-muenchen.de (131.159.28.16) port 8080 (#0)
> GET /OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5 HTTP/1.1
> User-Agent: curl/7.18.2 (x86_64-pc-linux-gnu) libcurl/7.18.2 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.8 libssh2/0.18
> Host: opentox.informatik.tu-muenchen.de:8080
> Accept:text/uri-list
>
< HTTP/1.1 200 OK
< Date: Sat, 31 Jul 2010 14:47:22 GMT
< Location: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48
< Vary: Accept-Charset, Accept-Encoding, Accept-Language, Accept
< Accept-Ranges: bytes
< Server: Noelios-Restlet-Engine/1.1.snapshot
< Content-Type: text/uri-list;charset=ISO-8859-1
< Content-Length: 86
<
* Connection #0 to host opentox.informatik.tu-muenchen.de left intact
* Closing connection #0
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48
• When a task URI is returned, the status code is HTTP 202 Accepted instead of HTTP 200 OK
• This tells the client that the processing is not completed, and the client needs to poll the task URI until the OK code is returned
• The final result returned in this example is the URI of the new model: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48
• To obtain prediction results, POST a dataset to the model URI
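The polling rule (202 Accepted means "not finished yet", 200 OK means "the body is the result URI") can be sketched as a small loop. This is an illustration under stated assumptions, not the OpenTox reference client: `wait_for_result` is a hypothetical name, and `fetch` is a caller-supplied function that performs a GET with the header Accept: text/uri-list and returns a (status code, body) pair. Injecting `fetch` and `sleep` keeps the loop independent of any particular HTTP library.

```python
import time


def wait_for_result(task_uri, fetch, interval=1.0, max_polls=60, sleep=time.sleep):
    """Poll a task URI until the service answers 200 OK; return the result URI."""
    for _ in range(max_polls):
        status, body = fetch(task_uri)
        if status == 200:
            # Finished: the text/uri-list body holds the URI of the model or dataset.
            return body.strip()
        if status != 202:
            # Anything other than 202 Accepted is treated as a failed task.
            raise RuntimeError("task failed with HTTP status %d" % status)
        sleep(interval)
    raise TimeoutError("task still running after %d polls" % max_polls)


# Usage with a stand-in for the HTTP layer: two 202 answers, then the model URI.
_answers = iter([
    (202, "http://host/task/1\n"),
    (202, "http://host/task/1\n"),
    (200, "http://host/model/1\n"),
])
result = wait_for_result(
    "http://host/task/1",
    fetch=lambda uri: next(_answers),
    sleep=lambda seconds: None,  # skip the waiting in this demonstration
)
print(result)
```

A bounded number of polls is a design choice of this sketch; a long-running descriptor calculation may justify a longer interval or a larger limit.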
Read data from a web address – process – write to a web address
Uniform approach to data processing (e.g. descriptor calculation)
[Diagram: Dataset + Algorithm = Dataset; the Feature, Compound, Dataset and Algorithm resources each support GET, POST, PUT and DELETE]
http://myhost.com/dataset/trainingset1 + http://myhost.com/algorithm/descriptorX = http://myhost.com/dataset/results
Read data from a web address – process – write to a web address
Uniform approach to model validation and report generation
[Diagram: Dataset + Model = Validation and Report; a model generating predictions leads to a validation report; each resource supports GET, POST, PUT and DELETE]
http://myhost.com/dataset/trainingset1
http://myhost.com/dataset/predictedresults1
http://myhost.com/model/predictivemodel1
http://myhost.com/validation
http://myhost.com/report/1
Apply predictive models
[Diagram: retrieve the available endpoints and model URLs, e.g. http://host1/model/{id}, from the Ontology service (published models, algorithms, ontologies, metadata) via SPARQL over HTTP POST. Apply the model http://host1/model/{id} to the dataset http://host2/dataset/{id} by HTTP POST to the Model service, which returns the results dataset URL http://host/dataset/{id}.]
Read data from a web address – process – write to a web address
Uniform approach to model prediction
[Diagram: Dataset + Model = Dataset; the Feature, Compound, Dataset and Model resources each support GET, POST, PUT and DELETE]
http://myhost.com/dataset/id1 + http://myhost.com/model/predictivemodel1 = http://myhost.com/dataset/results1
Uniform access to data
Publish your data, retrieve linked data
[Diagram: the Dataset service (structures, endpoints & predictions) is accessed by HTTP GET and POST. Upload data and receive a dataset URL, http://host2/dataset/{id}; find chemical compounds and receive a dataset URL, http://host2/compound/{id}. Data are annotated via the Ontology service (published models, algorithms, ontologies, metadata). The Feature, Compound and Dataset resources each support GET, POST, PUT and DELETE.]
RDF: resource representation
• The opentox.owl ontology
– A common OWL data model of all OpenTox resources
– Describes OpenTox resources
– Describes relationships between them
– Is used to generate the RDF representations of objects
• An RDF/XML representation is mandatory for OpenTox resources
• Uniform approach to data representation
– Calculated and measured properties of chemical compounds are represented in a uniform way
– Linked to the resource used for data generation
– Annotated via ontology entries
– Model representations link to the algorithms and data used
All OpenTox components are defined by an OWL ontology: http://opentox.org/api/1.1/opentox.owl
All resources are subclasses of ot:OpenToxResource
Resources: Chemical compound
Compound
Provides different representations for chemical compounds with a unique and defined chemical structure.
• Compound: /compound/{id}
• Conformer: /compound/{id}/conformer/{id}
Documentation: http://opentox.org/dev/apis/api-1.1/structure
Representation
• A subclass of ot:OpenToxResource
• Supports different Chemical MIME formats
• RDF representation only for specifying owl:sameAs links to external resources
Example 1: Retrieve compound as MOL
$ curl -H "Accept:chemical/x-mdl-molfile" http://apps.ideaconsult.net:8080/ambit2/compound/1
CH2O
APtclcactv09040902283D 0   0.00000     0.00000

  4  3  0  0  0  0  0  0  0  0999 V2000
   -0.6004    0.0000    0.0001 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.6072    0.0000   -0.0004 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.1472    0.9353    0.0016 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.1472   -0.9353    0.0016 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  0  0  0  0
  2  4  1  0  0  0  0
Example 2: Retrieve compound as SMILES
$ curl -H "Accept:chemical/x-daylight-smiles" http://apps.ideaconsult.net:8080/ambit2/compound/1
O=C
Example 3: Query compounds
$ curl -H "Accept:chemical-mime" "http://apps.ideaconsult.net:8080/ambit2/query/compound/any-identifier-or-keyword"
$ curl -H "Accept:chemical-mime" "http://apps.ideaconsult.net:8080/ambit2/query/smarts?search=smarts"
Resources: Feature
Feature
A Feature is a resource representing any kind of property or identifier assigned to a Compound. The feature types are determined via their links to ontologies (feature ontologies, descriptor ontologies, endpoint ontologies).
• Feature: /feature/{id}
Documentation: http://opentox.org/dev/apis/api-1.1/feature
Representation
ot:Feature, a subclass of ot:OpenToxResource. The RDF/XML format is mandatory.
Properties
• Name: defined by dc:title (Dublin Core namespace)
• Units: defined by the ot:units annotation property (OpenTox namespace)
• Creator: defined by the dc:creator annotation property (Dublin Core namespace)
• The origin of the Feature is defined by the ot:hasSource object property (OpenTox namespace) and can be an ot:Algorithm, ot:Model or ot:Dataset
• Relations to other resources which represent the same entity can be established via the owl:sameAs property. This approach can be used, for example, to link the ot:Feature resource to a resource from another ontology (an example follows)
• The subclasses ot:NumericFeature, ot:StringFeature and ot:NominalFeature denote whether a feature holds numeric, string or nominal values
Resources: Feature (an example)
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/feature/22114
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns:otee="http://www.opentox.org/echaEndpoints.owl#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns="http://apps.ideaconsult.net:8080/ambit2/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#Algorithm"/>
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#Feature"/>
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#NumericFeature">
    <rdfs:subClassOf
      rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </owl:Class>
  <ot:NumericFeature rdf:about="feature/22114">
    <ot:hasSource>
      <ot:Algorithm
        rdf:about="algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor"/>
    </ot:hasSource>
    <owl:sameAs rdf:resource="otee:Octanol-water_partition_coefficient_Kow"/>
    <dc:title>XLogP</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </ot:NumericFeature>
</rdf:RDF>
The example shows an OpenTox feature with the title "XLogP", identified by the URI http://apps.ideaconsult.net:8080/ambit2/feature/22114
• It is linked to an entry of a simplified ontology of toxicological endpoints
• The algorithm used to generate values for this feature is specified by the ot:hasSource property and identified by the URI http://apps.ideaconsult.net:8080/ambit2/algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor
• Note that the URI identifies an OpenTox Algorithm resource
• The algorithm URI itself is dereferenceable
• It can be used to initiate calculations of the XLogP descriptor
Resources: Feature (an example)
$ curl -H "Accept:text/n3" http://apps.ideaconsult.net:8080/ambit2/feature/26184
@prefix ot: <http://www.opentox.org/api/1.1#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix : <http://apps.ideaconsult.net:8080/ambit2/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix af: <http://apps.ideaconsult.net:8080/ambit2/feature/> .
ot:hasSource
    a owl:ObjectProperty .
ot:units
    a owl:DatatypeProperty .
ot:Feature
    a owl:Class .
ot:NumericFeature
    a owl:Class ;
    rdfs:subClassOf ot:Feature .
af:26184
    a ot:Feature , ot:NumericFeature ;
    dc:creator "http://ambit.uni-plovdiv.bg:8080/ambit2" ;
    dc:title "TUM_CDK_XLogP" ;
    ot:hasSource <http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem/XLogPDescriptor> ;
    ot:units "" ;
    = otee:Octanol-water_partition_coefficient_Kow .
An example (N3) of another ot:Feature resource: the XLogP descriptor (again)
• Generated by a different implementation
• In this case its name is "TUM_CDK_XLogP", and the algorithm resource used to generate it resides at the Technical University of Munich (TUM): http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem/XLogPDescriptor
• This algorithm URL could also be used to initiate descriptor calculations
• The representation of Algorithm resources refers to the BlueObelisk ontology entry http://www.blueobelisk.org/ontologies/chemoinformatics-algorithms/#xlogP
Toxicity endpoints ontologies
• Derived from the ECHA classification of endpoints published in the REACH guidance documents
• Covers physicochemical properties and various toxicological endpoints
• The hierarchy doesn't represent the complexity of toxicological assays, but it can be used as a first approximation to assign meaning to the data entries and to generate a REACH report
• More specific descriptions of toxicological assays can be used as well
• Ontologies for specific toxicity assays are developed by OpenTox partners
• The ECHA endpoints ontology: http://www.opentox.org/echaEndpoints.owl
Resources: a feature
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/feature/3
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns="http://apps.ideaconsult.net:8080/ambit2/"
    xmlns:af="http://apps.ideaconsult.net:8080/ambit2/feature/"
    …
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Feature rdf:about="feature/3">
    <dc:creator></dc:creator>
    <ot:hasSource>ECHA…</ot:hasSource>
    <owl:sameAs rdf:resource="http://www.opentox.org/api/1.1#EINECS"/>
    <ot:units></ot:units>
    <dc:title>EC</dc:title>
  </ot:Feature>
</rdf:RDF>
An illustration of an ot:Feature that was imported from a file and not calculated. The example shows a feature representing the EINECS number, imported from the ECHA preregistration list.
Feature summary
• An OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs
• These URIs are dereferenceable
• Features allow different levels of meaning to be assigned by linking to ontology entries (e.g. algorithms or toxicological endpoints), as well as by linking to the algorithms which can be used to generate the property values
• The same approach can be used to denote assays, species, functional groups, etc., provided that they are defined by an ontology
Resources: Dataset
Dataset
Provides access to chemical compounds and their features (e.g. structural, physical-chemical, biological, toxicological properties).
Operations
• POST – upload a dataset
• PUT – update the dataset content
• DELETE – remove the dataset
Representation
RDF/XML (mandatory)
• The dataset consists of data entries
• Each entry is associated with exactly one chemical compound, identified by its URI and available via the OpenTox Compound service API
• One and the same compound can be associated with multiple dataset entries
• Every "column" is associated with a Feature; its representation should be available via the OpenTox Feature API
Resources: Dataset
Representation
The dataset services optionally support formats other than RDF:
• text/csv (comma delimited)
• text/x-arff (Weka ARFF)
• application/pdf
• chemical/x-mdl-sdfile
• other Chemical MIME formats
This allows the same data to be retrieved in a convenient format, but the URL links to the compound and feature resources are lost.
@prefix ad: <http://apps.ideaconsult.net:8080/ambit2/dataset/> .
@prefix af: <http://apps.ideaconsult.net:8080/ambit2/feature/> .
@prefix ot: <http://www.opentox.org/api/1.1#> .
…
ad:9 a ot:Dataset ;
    ot:dataEntry
        [ a ot:DataEntry ;
          ot:compound
              <http://apps.ideaconsult.net:8080/ambit2/compound/413/conformer/409421> ;
          ot:values
              [ a ot:FeatureValue ;
                ot:feature af:21576 ;
                ot:value "3.309999942779541"^^xsd:double
              ] ;
          ot:values
              [ a ot:FeatureValue ;
                ot:feature af:21573 ;
                ot:value "3.0"^^xsd:double
              ]
        ] .
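The data entry structure maps naturally onto a table: one row per data entry, one column per feature URI. The sketch below shows that flattening on plain Python data. It assumes the triples have already been parsed into dictionaries by some RDF library; the dictionary layout and the function name `to_table` are hypothetical, and the values are the ones from the N3 example above.

```python
AF = "http://apps.ideaconsult.net:8080/ambit2/feature/"

# One dictionary per ot:DataEntry: the compound URI plus feature URI -> value.
entries = [
    {
        "compound": "http://apps.ideaconsult.net:8080/ambit2/compound/413/conformer/409421",
        "values": {AF + "21576": 3.309999942779541, AF + "21573": 3.0},
    },
]


def to_table(data_entries):
    """Flatten data entries into a header row and one row per entry."""
    features = []
    for entry in data_entries:
        for feature_uri in entry["values"]:
            if feature_uri not in features:  # keep first-seen column order
                features.append(feature_uri)
    header = ["compound"] + features
    rows = [
        [entry["compound"]] + [entry["values"].get(f) for f in features]
        for entry in data_entries
    ]
    return header, rows


header, rows = to_table(entries)
print(header)
print(rows[0])
```

Keeping the feature URIs as column headers preserves the links that are lost when the same data is retrieved as CSV or ARFF.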
Dataset metadata and features
Description | URI template
Retrieve the entire dataset content; if text/uri-list is requested, retrieve only the compound URIs | http://host:port/dataset/{id}
Retrieve the representation of the features (columns) of the dataset | http://host:port/dataset/{id}/feature
Retrieve the dataset metadata (name, etc.) | http://host:port/dataset/{id}/metadata
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/dataset/9/metadata
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    ……
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Dataset rdf:about="dataset/9">
    <dc:source>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:source>
    <dc:publisher>somebody</dc:publisher>
    <rdfs:seeAlso>
      <bx:Entry rdf:about="reference/20117">
        <rdfs:seeAlso>http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html</rdfs:seeAlso>
        <dc:title>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:title>
      </bx:Entry>
    </rdfs:seeAlso>
    <dc:title>ISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATA</dc:title>
  </ot:Dataset>
</rdf:RDF>
Data publishing
[Diagram: the Dataset service (structures, endpoints & predictions) is accessed by HTTP GET and POST. Upload data and receive a dataset URL, http://host2/dataset/{id}; find chemical compounds and receive a dataset URL, http://host2/compound/{id}. Annotation links the data to toxicology-related ontologies and algorithm ontologies.]
1) POST a file with chemical structures and properties to an OpenTox dataset service
• The structures and data are assigned a dataset URL and become available in multiple formats (RDF, Chemical MIME, CSV, Weka ARFF)
2) Assign metadata
• PUT /dataset/{id}/metadata
3) Annotate any of the dataset features (/dataset/{id}/feature) by assigning links to relevant ontologies
• PUT /feature/{id}
Resources: Algorithm
Algorithm
Provides access to OpenTox algorithms. There are several algorithm services developed by different OpenTox partners. The list of algorithms can be retrieved by an HTTP GET operation at http://host:port/algorithm
$ curl -H "Accept:text/uri-list" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNclassification
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/PLSregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/M5P
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/GaussP
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/FTMsmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/gSpansmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/JOELIB2
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/InfoGainAttributeEval
Resources: Algorithm
Representation
• Multiple types of algorithms
– descriptor calculation algorithms
– machine learning procedures
– data preprocessing
• The representation of algorithms is again defined by the OpenTox ontology, where all algorithms are subclasses of ot:Algorithm
Algorithm types ontology
http://opentox.org/data/documents/development/RDF%20files/AlgorithmTypes
• Provides a classification of algorithm types
• The algorithm type in the RDF representation is set by typing the resource (rdf:type) with a class from the algorithm types ontology (ota: http://www.opentox.org/algorithms.owl#), e.g. <myalgorithm> rdf:type ota:Classification
Resources: Algorithm
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/algorithm/J48
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ot="http://www.opentox.org/api/1.1#"
    …
    xmlns:ota="http://www.opentox.org/algorithmTypes.owl#">
  <ota:Supervised rdf:about="http://apps.ideaconsult.net:8080/ambit2/algorithm/J48">
    <dc:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Classification Decision tree J48</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#Classification"/>
    <dc:description rdf:datatype="http://www.w3.org/2001/XMLSchema#string"></dc:description>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#SingleTarget"/>
    <dc:publisher rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">Somebody</dc:publisher>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#EagerLearning"/>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Algorithm"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">Sat Jul 31 17:11:26 EEST 2010</dc:date>
  </ota:Supervised>
</rdf:RDF>
Representation
• The algorithm name is defined by dc:title.
• Parameters supported by the algorithm are specified via the object property ot:parameters and should be of class ot:Parameter (as defined in opentox.owl). These entries tell the client which parameters are required to run the algorithm; the values themselves should be provided by the client when initiating the calculation via POST.
• Algorithm types are distinguished by means of the Algorithm types ontology.
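As a rough illustration of how a client might read the J48 representation above, the sketch below pulls out the resource URI, the dc:title and the rdf:type values using only Python's standard XML parser. It is not an RDF parser: it assumes the typed-node layout shown in the example (one element under rdf:RDF, types given as rdf:resource attributes), and the function name `describe_algorithm` and the trimmed `SAMPLE` document are made up for this sketch. A real client would more likely use an RDF library.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

# Trimmed copy of the J48 representation (assumption: same layout as the
# slide). In practice the text would come from a GET on the algorithm URI
# with "Accept: application/rdf+xml".
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:ota="http://www.opentox.org/algorithmTypes.owl#">
  <ota:Supervised rdf:about="http://apps.ideaconsult.net:8080/ambit2/algorithm/J48">
    <dc:title>Classification Decision tree J48</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#Classification"/>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Algorithm"/>
  </ota:Supervised>
</rdf:RDF>"""


def describe_algorithm(rdf_xml):
    """Return (uri, title, sorted type URIs) of the first described resource."""
    root = ET.fromstring(rdf_xml)
    node = root[0]  # first node element under rdf:RDF
    uri = node.get(f"{{{RDF}}}about")
    title = node.findtext(f"{{{DC}}}title")
    # A typed node element such as <ota:Supervised> is itself a type statement;
    # ElementTree spells its tag as "{namespace}local".
    types = {node.tag[1:].replace("}", "")}
    types.update(t.get(f"{{{RDF}}}resource") for t in node.findall(f"{{{RDF}}}type"))
    return uri, title, sorted(types)


uri, title, types = describe_algorithm(SAMPLE)
```

Here `types` holds both the ota: classes (Supervised, Classification) and ot:Algorithm, which is how a client could tell a classification algorithm from, say, a descriptor calculation.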
Resources: Model
• Representations of predictive models
• A Model is created by HTTP POST to an ot:Algorithm with specific parameters and/or an input ot:Dataset
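A minimal sketch of the model-creation request, assuming the `dataset_uri` and `prediction_feature` form parameters and the `text/uri-list` media type used in the curl examples of this deck. The helper name `build_model_request` and the feature URI are hypothetical; the request is only prepared, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_model_request(algorithm_uri, dataset_uri, **params):
    """Prepare (but do not send) the POST asking an algorithm to build a model."""
    form = {"dataset_uri": dataset_uri, **params}
    return Request(
        algorithm_uri,
        data=urlencode(form).encode("ascii"),  # form-encoded request body
        headers={"Accept": "text/uri-list"},   # ask for a URI, not content
        method="POST",
    )


req = build_model_request(
    "http://myhost.com/algorithm/neuralnetwork",
    "http://myhost.com/dataset/trainingset1",
    prediction_feature="http://myhost.com/feature/1",  # hypothetical feature URI
)
# urllib.request.urlopen(req) would send it. The response body is expected to
# carry the URI of the new model, or of a task to poll while it is being built.
```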
Representation
• The Model name is defined by the dc:title property
• The Model creator may be defined by the dc:creator property
• The date of Model creation is defined by the dc:date property
• The Algorithm is defined by the ot:algorithm object property
• The independent variables are instances of ot:Feature, defined by the ot:independentVariables property (can be multiple)
• The dependent variables are instances of ot:Feature, defined by the ot:dependentVariables property (can be multiple)
• The variables where prediction results will be stored are instances of ot:Feature, defined by ot:predictedVariables (can be multiple)
• Parameters are defined by ot:parameters
• The training Dataset is an instance of ot:Dataset, defined by ot:trainingDataset
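The property list above can be read as a set of subject-predicate-object triples about one model URI. The sketch below spells that out, assuming the ot: namespace is http://www.opentox.org/api/1.1# and the dc: namespace is Dublin Core elements 1.1. The function `model_triples` is illustrative only, and it does not distinguish literal objects (the title) from URI objects, which a real RDF serialisation must do.

```python
OT = "http://www.opentox.org/api/1.1#"
DC = "http://purl.org/dc/elements/1.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def model_triples(model_uri, title, algorithm, training_dataset,
                  independent=(), dependent=(), predicted=()):
    """Return (subject, predicate, object) triples describing an ot:Model."""
    triples = [
        (model_uri, RDF_TYPE, OT + "Model"),
        (model_uri, DC + "title", title),
        (model_uri, OT + "algorithm", algorithm),
        (model_uri, OT + "trainingDataset", training_dataset),
    ]
    # Each variable property may occur several times, once per ot:Feature URI.
    for prop, features in (("independentVariables", independent),
                           ("dependentVariables", dependent),
                           ("predictedVariables", predicted)):
        triples += [(model_uri, OT + prop, feature) for feature in features]
    return triples


triples = model_triples(
    "http://myhost.com/model/predictivemodel1",
    "Example model",                             # hypothetical title
    "http://myhost.com/algorithm/neuralnetwork",
    "http://myhost.com/dataset/trainingset1",
    independent=["http://myhost.com/feature/1", "http://myhost.com/feature/2"],
    predicted=["http://myhost.com/feature/3"],   # hypothetical feature URIs
)
```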
Linked resources: Compound, Algorithm, Model, Dataset, Features
[Diagram showing: Algorithm service, Model resource, Dataset resource, Descriptor resource, Assay resource, Chemical compound, Feature service, Compound service]
[Diagram showing: Dataset resource, Descriptor resource, Assay resource and Chemical compound; algorithms (Regression, Classification, Quantum Chemistry, Descriptors, etc.); Blue Obelisk algorithms ontology, OpenTox algorithm types ontology, toxicology-related ontologies]
Make the model available
Register at the OpenTox ontology service
– RDF triple storage
– Accepts HTTP POST
– SPARQL endpoint
$ curl -X POST -d "uri=http://apps.ideaconsult.net:8080/ambit2/model/57" http://apps.ideaconsult.net:8080/ontology
The model becomes visible to applications
[Diagram showing: Model service (model/{id}), HTTP POST, Ontology service holding published models, algorithms, ontologies and metadata]
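The registration step can be sketched in Python as follows, mirroring the curl call above: a form field named `uri` is POSTed to the ontology service. The helper name `build_registration_request` is made up for this sketch. Unlike `curl -d`, `urlencode` percent-encodes the value, which should be equivalent for a server that decodes form bodies.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_registration_request(ontology_service, resource_uri):
    """Prepare the POST that registers a resource URI with the ontology service."""
    body = urlencode({"uri": resource_uri}).encode("ascii")
    return Request(ontology_service, data=body, method="POST")


registration = build_registration_request(
    "http://apps.ideaconsult.net:8080/ontology",
    "http://apps.ideaconsult.net:8080/ambit2/model/57",
)
# urllib.request.urlopen(registration) would submit it; nothing is sent here.
```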
Services implementation by partner and service type
Service types (table columns): Compound; Dataset; Feature; Algorithm (processing); Algorithm (model); Model; Task; Report; Validation; Authentication and Authorisation service; Ontology service
Partner No.   Services implemented (one Y per service type)
2             Y Y Y Y Y Y
3 (IDEA)      Y Y Y Y Y Y Y Y
5             Y Y Y
6             Y Y Y Y
7             Y Y Y
10            Y
All components are implemented as REST web services.
There can be multiple implementations of the same type of component.
(A subset of) services can be hosted by the same provider or by multiple providers at separate locations.
http://apps.ideaconsult.net:8080/ambit2
OpenTox Is A Framework
Framework
Unified Access
Open Source
• Toxicity data
• Predictive models
• Validation support
• Interpretation aids
• Toxicologists
• Modelers
• API for new algorithm development & integration
• To optimise impact
• To allow inspection and review
• To attract external contributors
Demo applications
OpenTox services can be used to develop specific applications or be embedded in workflow systems.
• Two end-user oriented demo applications making use of OpenTox web services have been developed and deployed, and are available for testing – http://toxcreate.org and http://toxpredict.org
• ToxCreate creates models from user-supplied datasets
• ToxPredict uses existing OpenTox models to estimate chemical compound properties
RDF: Lessons learned
• OpenTox specific – it did not start as a Linked Data/RDF project
• The REST and RDF mix is not (yet) popular… but it is natural to be able to retrieve a (partial) resource representation described by triples
• Steep learning curve
• Some hard topics
  – Data model vs format
  – The subject-predicate-object concept vs tabular/hierarchical/other implicit structure
  – The recognition of the added value
    • XML, JSON, YAML, plain text, etc. vs RDF
RDF: Performance
The RDF representation is verbose… and in-memory RDF libraries are slow… and there is a lack of streaming parsers/writers…
[Charts: a dataset with 320 chemicals and 60 columns; a dataset with 6500 chemicals and 12 columns]
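To give a feel for the verbosity, here is a back-of-the-envelope count of triples for the two datasets named above. It assumes the entry-by-entry dataset encoding used in OpenTox (an ot:DataEntry node per compound and an ot:FeatureValue node per cell); the per-node triple counts are this sketch's reading of that layout, not measured figures, and the function name is hypothetical.

```python
def dataset_triple_count(n_entries, n_columns):
    """Estimate the number of triples for a dataset encoded entry by entry.

    Assumed encoding:
      dataset:   1 triple  (rdf:type ot:Dataset)
      per entry: 3 triples (ot:dataEntry link, rdf:type ot:DataEntry,
                            ot:compound)
      per value: 4 triples (ot:values link, rdf:type ot:FeatureValue,
                            ot:feature, ot:value)
    """
    return 1 + n_entries * (3 + 4 * n_columns)


small = dataset_triple_count(320, 60)    # 320 chemicals, 60 columns -> 77761
large = dataset_triple_count(6500, 12)   # 6500 chemicals, 12 columns -> 331501
```

Under these assumptions every table cell costs about four triples, so a spreadsheet-sized dataset already yields tens or hundreds of thousands of triples for an in-memory library to hold.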
RDF: Wish list
• A convenient explanation of the subject-predicate-object concept for beginners
• A (high-performance) triple store should not be a mandatory requirement for publishing RDF data
• Fast streaming parsers and writers
• (Terse) JSON serialisation
• Security
• Synchronisation of distributed RDF content
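One way to picture the streaming writer wished for above: emit the dataset one entry at a time in a line-based syntax, so that memory use does not grow with the number of compounds. The sketch assumes N-Triples as the output format (the deck does not name one) and the ot:DataEntry / ot:FeatureValue layout of OpenTox datasets; the function names, blank-node labels and example URIs are hypothetical.

```python
import io

OT = "http://www.opentox.org/api/1.1#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def entry_lines(dataset_uri, index, compound_uri, values):
    """Yield N-Triples lines for one data entry; values maps feature URI -> float."""
    entry = f"_:e{index}"  # blank node for this data entry
    yield f"<{dataset_uri}> <{OT}dataEntry> {entry} ."
    yield f"{entry} <{RDF_TYPE}> <{OT}DataEntry> ."
    yield f"{entry} <{OT}compound> <{compound_uri}> ."
    for j, (feature_uri, value) in enumerate(values.items()):
        node = f"_:e{index}v{j}"  # blank node for one feature value
        yield f"{entry} <{OT}values> {node} ."
        yield f"{node} <{RDF_TYPE}> <{OT}FeatureValue> ."
        yield f"{node} <{OT}feature> <{feature_uri}> ."
        yield f'{node} <{OT}value> "{value}"^^<{XSD_DOUBLE}> .'


def write_dataset(out, dataset_uri, rows):
    """Write rows, an iterable of (compound_uri, values), one entry at a time."""
    out.write(f"<{dataset_uri}> <{RDF_TYPE}> <{OT}Dataset> .\n")
    for i, (compound_uri, values) in enumerate(rows):
        for line in entry_lines(dataset_uri, i, compound_uri, values):
            out.write(line + "\n")


buf = io.StringIO()
write_dataset(
    buf,
    "http://myhost.com/dataset/1",
    [("http://myhost.com/compound/1", {"http://myhost.com/feature/1": 3.31})],
)
lines = buf.getvalue().splitlines()
```

Because `rows` can be any iterable, including a generator reading from a database cursor, no full in-memory graph is built at any point.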
Thank you
Build an application with the OpenTox REST Web Services API
http://opentox.org/dev/apis/api-1.1
Download the AMBIT implementation of the OpenTox API and launch your OpenTox service
http://ambit.sourceforge.net
Why integration framework for predictive
toxicology
August 22 2010 Ideaconsult Ltd4
bull What we would like to do
ndash Build use validate and compare multiple
models
ndash Reliable reproduce models from the literature
ndash Merge data from different sources (files
databases)
ndash Find all models available for certain endpoint
ndash More hellip
bull Challengesndash Chemical structures
bull Might be ambiguous
bull Might be error prone or time consuming to reproduce from publications
ndash Data bull Multiple formats
bull Implicit semantics often buried in human readable documentation only
ndash Modelsbull Tens of thousands available in software or in publications
bull Multiple software solutions mostly incompatible
bull Predictions reproducibility is time consuming and often hard to achieve
bull Automatic comparison of prediction results difficult
Why integration framework for predictive
toxicology
Ideaconsult Ltd5
Framework design rationales
Ideaconsult Ltd6
User Requirements Software Requirements
Umambiguous data formal way of representing information about data
Unambiguous access well-defined interfaces
Transparency of
computational tools
formal way of representing information about
methods well-defined interfaces
Variety of user groups simplicity and modularity of design
Need to integrate various
resources (eg databases
prediction methods
models hellip) to make
meaningful predictions
distributed architecture interoperability
Need to integrate
biological information
again modularity of design extensibility
The framework
bull OpenTox API
ndash The way applications talk to each other
ndash The way developers talk to applications
ndash httpopentoxorgdevapisapi-11
bull The basic building blocks
ndash data chemical structures algorithms
and models
bull Functionality offered
ndash build models
ndash apply models
ndash validate models
ndash access and query data in various ways
bull Technologies
ndash REST style web services
ndash RDF for description of resources
ndash Links to existing and newly developed
ontologies (mainly to describe
metadata) about resources
Ideaconsult Ltd7
Feature
GET
POST
PUT
DELETE
Compound
GET
POST
PUT
DELETE
Dataset
GET
POST
PUT
DELETE
Ontology
GET
POST
PUT
DELETE
Algorithm
GET
POST
PUT
DELETE
Model
GET
POST
PUT
DELETE
AppDomain
GET
POST
PUT
DELETEValidation
GET
POST
PUT
DELETE
Report
GET
POST
PUT
DELETE
Data
service
Models
service
Validatio
n service
Reporting
service
Ontology
Clients (User Interface) Standalone Browsers etc
Data
service
Data
service
all kind of
data
Registrati
on
Queries
Models
service
Reporting
service
Validation
service
Web servicesWeb services
Web services
Web services
WS
WS
Web services
Security
WSWS
Design principles
bull Resource oriented
ndash Every object (resource) is named and addressable (eg HTTP URL ) Example httpexampleopentoxcommodelmyBestModel httpexampleopentoxcomcompound50-00-0
ndash RESTfull API design starts by identifying most important objects and groups of objects supported by
the software system and proceeds by defining URL patterns
bull Transport protocol
ndash HTTP is the most popular choice of transport protocol but other protocols can be used as well
bull Operations
ndash All resources (nouns) support the same fixed and universal number of operations (verbs) HTTP
(GET POST PUT DELETE) operations are the common choice when the transport protocol is
HTTP
bull Hypermedia as the Engine of Application State
ndash All resources should be reachable via a single (or minimum) number of entry points into RESTful
applications Thus a representation of a resource should return hypermedia links to related resources
bull Error codes (for each resourceoperation pair )
ndash HTTP status codes (eg 200 OK 400 Bad Request 404 Not found etc) are usually used
Representational State Transfer (REST)A software architecture style defined by Roy Fielding in his PhD thesis (2000) Many services worldwide offer REST
API There are (currently) no standards for RESTful applications but merely design guides
Ideaconsult LtdAugust 22 20108
OpenTox resources (1)
OpenTox considers the following set of entities as essential building blocks
bull Structures of chemical compounds
bull Properties and identifiers of chemical compounds
bull Datasets of chemical compounds and various properties (measured or calculated)
bull Algorithms
ndash Data processing algorithms
bull Algorithms generating certain values based on chemical structure (eg descriptor calculation)
bull Data preprocessing (eg Principal component analysis feature selection)
bull Structure processing (eg structure optimization)
bull Algorithms relating set of structures to another set of structures (eg similarity search or
metabolite generation)
ndash Machine learning algorithms
bull Supervised (eg Regression Classification)
bull Unsupervised (eg Clustering )
ndash Prediction algorithms defined by experts (eg series of structural alerts defined by human
experts not derived by learning algorithms)
Ideaconsult LtdAugust 22 20109
OpenTox resources (2)
bull Models are generated by respective algorithms given specific parameters
bull Statistical models are generated by applying statisticalmachine learning algorithms
to specific dataset and parameters
bull Models can be other than statistical eg expert defined rules quantum mechanical
calculations metabolite generation etc The intention of the framework is to be
generic enough to accommodate varieties of predictive models
bull Validation provides procedures independent of model building facilities (eg
crossvalidation) and generates relevant statistics
bull Reports
ndash Various types of reports might be generated using building blocks above (eg validation
report can be generated using validation object a model and a dataset)
bull In addition the following components are introduced
ndash Task (asynchronous processing of computationally intensive tasks)
ndash Authentication and authorization (Ensuring secure access to sensitive resources)
ndash Ontology service (provides an RDF storage and SPARQL endpoint for resources
registration)
Ideaconsult LtdAugust 22 201010
Resources identificationAll resources are identified via unique web address assigned according to the URL templates
Ideaconsult Ltd11
Component Description URL Template (example)
Compound Representations of chemical compounds httphostportcompoundcompoundid
Feature Properties and identifiers httphostportfeaturefeatureid
Dataset Encapsulates set of chemical compounds and their property
values
httphostportdatasetdatasetid
Model OpenTox model services httphostportmodelmodeld
Algorithm OpenTox algorithm services httphostportalgorithmalgorithmid
Validation
Report
A validation corresponds to the validation of a model on a
test dataset
httphostportvalidationvalidationid
httphostportreportreportid
Task Asynchronous jobs are handled via an intermediate Task
resource A resource submitting an asynchronous job
should return the URI of the task
httphostporttasktaskid
Ontology service Provides storage and SPARQL search functionality for
objects defined in OpenTox services and relevant
ontologies
httphostportontology
Authentication and
authorisation
Granting access to protected resources for authorised users httphostportopensso
httphostportopensso-pol
OpenTox REST operations
Ideaconsult Ltd12
Individual resources (eg a dataset or a model)bull URI template httphostportresourceresourceid eg
httphostportmodelmodel_id or httphostportdatasetdataset_id
bull GET ndash retrieve representation of the resource
bull PUT ndash update representation of the resource
bull POST
ndash replace representation of the resource with a new one (eg replace the dataset with new
content)
ndash initiate calculations based on this resource (eg submit dataset URI to an algorithm resource and obtain a
model URI as a result)
bull DELETE ndash delete the resource
Collections of resources (eg list of all available models or datasets) bull URI template httphostportresource (eg httphostportmodel or httphostportdataset)
bull GET ndash retrieve representation of multiple resources ( eg retrieve all available algorithms)
bull PUT - NA
bull POST ndash create new resource and return its URI (eg create a new dataset by submitting new dataset
content to the dataset service)
bull DELETE ndash NA
Build a predictive model
Create a model
Run calculations with
dataset
httphost1datasetid
Structures
descriptors
endpoints
Dataset service
Returns the model URL
httphost1modelid
HTTP POST
Build a predictive model
Regression
Classification
Quantum Chemistry
Descriptors etc
validationid
Algorithm service
Validation service
modelid
Published models
Algorithms
Ontologies
metadata
Ontology service
HTTP POST
Model service
Read data from a web address ndash process ndash write to a web address
Uniform approach to models creation
14Ideaconsult Ltd
Feature
GET
POST
PUT
DELETE
Compound
GET
POST
PUT
DELETE
Dataset
GET
POST
PUT
DELETE
Algorithm
GET
POST
PUT
DELETE
Model
GET
POST
PUT
DELETE
+=
httpmyhostcomdatasettrainingset1
httpmyhostcomalgorithmneuralnetwork
httpmyhostcommodelpredictivemodel1
Use an algorithm to build a model
Ideaconsult Ltd15
bullAn algorithm is applied by submitting
HTTP POST to the algorithm URI and
providing required parameters
bullA common required parameter is
dataset_uri=httphostportdatasetda
tasetid which specifies the data set to be
operated on
bullHTTP POST in REST style services
returns URI of the result and not the
content of the result
bullThe algorithm services are designed to
store the results into a dataset service and
return the URL of the resulted dataset
bullIn case of slow calculations a Task URI
instead of the dataset URI is returned
$ curl -H Accepttexturi-list -X POST -d
dataset_uri=httpappsideaconsultnet8080ambit2dataset1037 -d
prediction_feature=httpappsideaconsultnet8080ambit2feature26
701 -d
dataset_service=httpappsideaconsultnet8080ambit2dataset
httpopentoxinformatiktu-muenchende8080OpenTox-
devalgorithmJ48 -iv
Connected to opentoxinformatiktu-muenchende (1311592816) port
8080 (0)
POST OpenTox-devalgorithmJ48 HTTP11
gt Host opentoxinformatiktu-muenchende8080
Accept
gt Content-Type applicationx-www-form-urlencoded
lt HTTP11 202 Accepted
lt Date Sat 31 Jul 2010 144638 GMT
lt Location httpopentoxinformatiktu-muenchende8080OpenTox-
devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5
lt Accept-Ranges bytes
lt Server Noelios-Restlet-Engine11snapshot
lt Content-Type texturi-listcharset=ISO-8859-1
lt Content-Length 99
lt
Connection 0 to host opentoxinformatiktu-muenchende left intact
Closing connection 0
httpopentoxinformatiktu-muenchende8080OpenTox-
devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5
Resources The model
Ideaconsult Ltd16
$ curl -iv -H Accepttexturi-list httpopentoxinformatiktu-
muenchende8080OpenTox-devtaskacdf6eac-d5a2-402c
About to connect() to opentoxinformatiktu-muenchende port 8080 (0)
Trying 1311592816 connected
Connected to opentoxinformatiktu-muenchende (1311592816) port 8080 (0)
gt GET OpenTox-devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5 HTTP11
gt User-Agent curl7182 (x86_64-pc-linux-gnu) libcurl7182 OpenSSL098g
zlib1233 libidn18 libssh2018
gt Host opentoxinformatiktu-muenchende8080
gt Accepttexturi-list
gt
lt HTTP11 200 OK
lt Date Sat 31 Jul 2010 144722 GMT
Date Sat 31 Jul 2010 144722 GMT
lt Location httpopentoxinformatiktu-muenchende8080OpenTox-
devmodelTUMOpenToxModel_j48_48
lt Vary Accept-Charset Accept-Encoding Accept-Language Accept
lt Accept-Ranges bytes
lt Server Noelios-Restlet-Engine11snapshot
lt Content-Type texturi-listcharset=ISO-8859-1
lt Content-Length 86
lt
Connection 0 to host opentoxinformatiktu-muenchende left intact
Closing connection 0
httpopentoxinformatiktu-muenchende8080OpenTox-
devmodelTUMOpenToxModel_j48_48
bullWhen task URI is returned the
returned status code is HTTP 202
Accepted instead of HTTP 200
OK
bullThis tells the client the processing
is not completed and the client need
to poll the task URI until OK code
is returned
bullThe final result returned by
Example 25 is the URI of the new
model httpopentoxinformatiktu-
muenchende8080OpenTox-
devmodelTUMOpenToxModel_j4
8_48
bullTo obtain prediction results POST
a dataset to the model URI
Read data from a web address ndash process ndash write to a web address
Uniform approach to data processing (eg
Descriptors calculation)
17Ideaconsult Ltd
Feature
GET
POST
PUT
DELETE
Compound
GET
POST
PUT
DELETE
Dataset
GET
POST
PUT
DELETE
Algorithm
GET
POST
PUT
DELETE
+ =
httpmyhostcomdatasettrainingset1
httpmyhostcomalgorithmdescriptorX
Feature
GET
POST
PUT
DELETE
Compound
GET
POST
PUT
DELETE
Dataset
GET
POST
PUT
DELETE
=
httpmyhostcomdatasetresults
Read data from a web address ndash process ndash write to a web address
Uniform approach to models validation and
report generation
18Ideaconsult Ltd
Dataset
GET
POST
PUT
DELETE
Model
GET
POST
PUT
DELETE
+
=Validation
GET
POST
PUT
DELETE
Report
GET
POST
PUT
DELETEModel generating
predictions
Validation report
httpmyhostcomreport1
httpmyhostcomdatasettrainingset1
httpmyhostcomdatasetpredictedresults1
httpmyhostcommodelpredictivemodel1
httpmyhostcomvalidation
Apply predictive models
Returns the results
dataset URL
httphostdatasetid
Retrieve available
endpoints and model
URLs eg
httphost1modelid
Published models
Algorithms
Ontologies
metadata
Ontology
service
HTTP POST
SPARQL
HTTP POST
modelidApply the model
httphost1modelid
to dataset
httphost2datasetid
Model service
Read data from a web address ndash process ndash write to a web address
Uniform approach to model prediction
20Ideaconsult LtdAugust 22 2010
Feature
GET
POST
PUT
DELETE
Compound
GET
POST
PUT
DELETE
Dataset
GET
POST
PUT
DELETE
+=
httpmyhostcomdatasetid1
Feature
GET
POST
PUT
DELETE
Compound
GET
POST
PUT
DELETE
Dataset
GET
POST
PUT
DELETE
=
httpmyhostcomdatasetresults1
Model
GET
POST
PUT
DELETE
httpmyhostcommodelpredictivemodel1
Uniform access to data
Upload data receive
dataset URL
httphost2datasetid
Annotation
Find chemical compounds
return dataset URL
httphost2compoundid
HTTP GET POST
Dataset service
Structures
endpoints
amppredictions
Publish your data retrieve linked data
Published models
Algorithms
Ontologies
metadata
Ontology service
Feature
GET
POST
PUT
DELETE
Compound
GET
POST
PUT
DELETE
Dataset
GET
POST
PUT
DELETE
RDF - Resources representation
bull The opentoxowl ontology
ndash A common OWL data model of all
OpenTox resources
ndash Describes OpenTox resources
ndash Describes relationships between them
ndash Generates objects RDF representations
bull RDFXML representation is mandatory for
OpenTox resources
bull Uniform approach to data representation
ndash Calculated and measured properties of
chemical compounds are represented in an
uniform way
ndash Linked to the resource used for data
generation
ndash Annotated via ontology entries
ndash Model representations link to algorithms
and data used
bull Ideaconsult Ltd
22
All OpenTox components are defined by
OWL ontology
httpopentoxorgapi11opentoxowl
All resources are subclasses of
otOpenToxResource
Resources Chemical compound
Ideaconsult Ltd23
$ curl -H Acceptchemicalx-daylight-smiles
httpappsideaconsultnet8080ambit2compound1
O=C
$ curl -H Acceptchemicalx-mdl-molfile
httpappsideaconsultnet8080ambit2compound1
CH2O
APtclcactv09040902283D 0 000000 000000
4 3 0 0 0 0 0 0 0 0999 V2000
-06004 00000 00001 O 0 0 0 0 0 0 0 0 0 0 0 0
06072 00000 -00004 C 0 0 0 0 0 0 0 0 0 0 0 0
11472 09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0
11472 -09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0
1 2 2 0 0 0 0
2 3 1 0 0 0 0
2 4 1 0 0 0 0
CompoundProvides different representations for
chemical compounds with a unique and
defined chemical structurecompoundid
Conformercompoundidconformerid
Documentationhttpopentoxorgdevapisapi-11structure
RepresentationA subclass of otOpenToxResource
Supports different Chemical MIME
formats
RDF representation only for specifying
owlsameAs links to external resources
Example 1 Retrieve compound as MOL
Example 2 Retrieve compound as SMILES
Example 3 Query compounds
$ curl ndashH Acceptchemical-mime ldquo
httpappsideaconsultnet8080ambit2querycompoundany-identifier-or-keyword
$ curl ndashH Acceptchemical-mime ldquo
httpappsideaconsultnet8080ambit2querysmartssearch=smarts
Resources Feature
Ideaconsult Ltd24
FeatureA Feature is a resource representing any
kind of a property or identifier assigned to
a Compound
The feature types are determined via their
links to ontologies (Feature ontologies
Decsriptor ontologies Endpoints
ontologies)featureid
Documentationhttpopentoxorgdevapisapi-11feature
RepresentationotFeature a subclass of
otOpenToxResource
Mandatory RDFXML format
Properties
bullName defined by dctitle (Dublin Core namespace )
bullUnits defined by otunits annotation property (OpenTox
namespace)
bullCreator defined by dccreator annotation property
(Dublin Core namespace)
bullThe origin of the Feature is defined by othasSource
object property (OpenTox namespace) element and can be
otAlgorithm otModel or otDataset
bullRelations to other resources which represent the same
entity could be established via owlsameAs property
This approach can be used for example to link the
otFeature resource to a resource from another ontology
(an example follows)
bullThere are subclasses of otFeature (namely) which are
used otNumericFeature otStringFeature
otNominalFeature denote if a feature holds numeric
nominal or string values
Resources Feature (an example)
Ideaconsult Ltd25
$ curl -H Acceptapplicationrdf+xml
httpappsideaconsultnet8080ambit2feature22114
ltrdfRDF
xmlnsot=httpwwwopentoxorgapi11
xmlnsotee=httpwwwopentoxorgechaEndpointsowl
xmlnsdc=httppurlorgdcelements11
xmlns=httpappsideaconsultnet8080ambit2
xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns
xmlnsowl=httpwwww3org200207owl
xmlnsxsd=httpwwww3org2001XMLSchema
xmlnsrdfs=httpwwww3org200001rdf-schema
xmlbase=httpappsideaconsultnet8080ambit2gt
ltowlClass rdfabout=httpwwwopentoxorgapi11Algorithmgt
ltowlClass rdfabout=httpwwwopentoxorgapi11Featuregt
ltowlClass rdfabout=httpwwwopentoxorgapi11NumericFeaturegt
ltrdfssubClassOf
rdfresource=httpwwwopentoxorgapi11Featuregt
ltowlClassgt
ltotNumericFeature rdfabout=feature22114gt
ltothasSourcegt
ltotAlgorithm
rdfabout=algorithmorgopensciencecdkqsardescriptorsmolecularXLo
gPDescriptorgt
ltothasSourcegt
ltowlsameAs rdfresource=ldquooteeOctanol-
water_partition_coefficient_Kowgt
ltdctitlegtXLogPltdctitlegt
ltrdftype rdfresource=httpwwwopentoxorgapi11Featuregt
ltotNumericFeaturegt
ltrdfRDFgt
The example shows an OpenTox feature with title ldquoXLogPrdquo and identified by the URI httpappsideaconsultnet8080ambit2feature22114
bullLinked to an entry of a simplified ontology of toxicological endpoints
The algorithm used to generate values for this featurebullSpecified by otalgorithm propertybullidentified by the URIhttpappsideaconsultnet8080ambit2algorithmorgopensciencecdkqsardescriptorsmolecularXLogPDescriptor
bullNote the URI identifies an OpenTox Algorithm resourcebullThe algorithm URI itself is dereferensable bullCan be used to initiate calculations of XLogP descriptor
Resources Feature (an example)
Ideaconsult Ltd26
$ curl -H Accepttextn3 httpappsideaconsultnet8080ambit2feature26184
prefix ot lthttpwwwopentoxorgapi11gt
prefix dc lthttppurlorgdcelements11gt
prefix lthttpappsideaconsultnet8080ambit2gt
prefix rdfs lthttpwwww3org200001rdf-schemagt
prefix owl lthttpwwww3org200207owlgt
prefix xsd lthttpwwww3org2001XMLSchemagt
prefix rdf lthttpwwww3org19990222-rdf-syntax-nsgt
prefix af lthttpappsideaconsultnet8080ambit2featuregt
othasSource
a owlObjectProperty
otunits
a owlDatatypeProperty
otFeature
a owlClass
otNumericFeature
a owlClass
rdfssubClassOf otFeature
af26184
a otFeature otNumericFeature
dccreator httpambituni-plovdivbg8080ambit2
dctitle TUM_CDK_XLogP
othasSource lthttpopentoxinformatiktu-muenchende8080OpenTox-
devalgorithmCDKPhysChemXLogPDescriptorgt
otunits
= oteeOctanol-water_partition_coefficient_Kow
An example (N3) of another otFeature
resource XLogP descriptor (again)
bullGenerated by different implementation
bullIn this case its name is
ldquoTUM_CDK_XLogPrdquo and the algorithm
resource used to generate resides at Technical
University of Munich (TUM) premises
httpopentoxinformatiktu-muenchende8080OpenTox-
devalgorithmCDKPhysChemXLogPDescriptor
This algorithm URL could also be used to
initiate descriptor calculations
bullThe representation of Algorithm resources
refers to the BlueObelisk ontology entry httpwwwblueobeliskorgontologieschemoinformatics-
algorithmsxlogP
Toxicity endpoints ontologies
Ideaconsult Ltd27
bullDerived from ECHA classification of
endpoints published in REACH guidance
documents
bullPhysicochemical properties and various
toxicological endpoints
bullThe hierarchy doesnrsquot represent the
complexity of toxicological assays but
can be used as a first approximation to
assign meaning to the data entries and
generate REACH report
bullMore specific description of toxicological
assays can be used as well
bullOntologies for specific toxicity assays are
developed by OpenTox partners
bullThe ECHA endpoints ontology httpwwwopentoxorgechaEndpointsowl
Resources a feature
Ideaconsult Ltd28
curl -H Acceptapplicationrdf+xml
httpappsideaconsultnet8080ambit2feature3
ltrdfRDF
xmlnsot=httpwwwopentoxorgapi11
xmlns=httpappsideaconsultnet8080ambit2
xmlnsaf=httpappsideaconsultnet8080ambit2feature
hellip
xmlbase=httpappsideaconsultnet8080ambit2gt
ltotFeature rdfabout=feature3gt
dccreatorgt
ltothasSourcegtECHAhellipltothasSourcegt
ltowlsameAs rdfresource=httpwwwopentoxorgapi11EINECSgt
ltotunitsgtltotunitsgt
ltdctitlegtECltdctitlegt
ltotFeaturegt
ltrdfRDFgt
An illustration of otFeature
imported from a file and not
calculated
The example shows a feature
representing EINECS
number imported from the
ECHA preregistration list
Feature summary
bull OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs
bull These URIs are dereferencable
bull Allow to assign different levels of meaning by linking to entries to ontologies (eg algorithms or toxicological endpoints ) as well as linking to the algorithms which can be used to generated property values
bull The same approach can be used to denote assays provided that the assay is defined by an ontology species functional groups etc
Ideaconsult Ltd29
Resources Dataset
OperationsPOST ndash Upload a dataset
PUT ndash Update the dataset content
DELETE ndash Remove the dataset
Representation
RDFXML (mandatory)
bullThe dataset consists of data entries
bullEach entry is associated with exactly one
chemical compound identified by its
URI and available via OpenTox
Compound service API
bullOne and the same compound can be
associated with multiple dataset entries
bullEvery ldquocolumnrdquo is associated with a
Feature its representation should be
available via OpenTox Feature API
Ideaconsult Ltd30
DatasetProvides access to chemical compounds and their
features (eg structural physical-chemical
biological toxicological properties)
Resources Dataset
Representation
Ideaconsult Ltd31
The dataset services optionally
support formats other than RDF
bulltextcsv (Comma delimited)
bulltextx-arff (Weka ARFF)
bullapplicationpdf
bullchemicalx-mdl-sdfile
bullother Chemical MIME formats
This allows retrieving the same data
in convenient format but the URL
links to compound and feature
resources are being lost
prefix ad lthttpappsideaconsultnet8080ambit2datasetgt
prefix af lthttpappsideaconsultnet8080ambit2featuregt
prefix ot lthttpwwwopentoxorgapi11gt
hellip
ad9 a otDataset
otdataEntry
[ a otDataEntry
otcompound
lthttpappsideaconsultnet8080ambit2compound413conformer409421gt
otvalues
[ a otFeatureValue
otfeature af21576
otvalue 3309999942779541^^xsddouble
]
otvalues
[ a otFeatureValue
otfeature af21573
otvalue 30^^xsddouble
]
]
Dataset metadata and features
Ideaconsult Ltd32
Description URI Template
Retrieve entire dataset content If uri-list
retrieve only compound URIs
httphostportdatasetid
Retrieve representation of features (columns)
of the dataset
httphostportdatasetidfeature
Retrieves dataset metadata (name etc) httphostportdatasetidmetadata
$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata
ltrdfRDF
xmlnsot=httpwwwopentoxorgapi11
helliphellip
xmlnsrdfs=httpwwww3org200001rdf-schema
xmlbase=httpappsideaconsultnet8080ambit2gt
ltotDataset rdfabout=dataset9gt
ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt
ltdcpublishergtsomebodyltdcpublishergt
ltrdfsseeAlsogt
ltbxEntry rdfabout=reference20117gt
ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt
ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt
ltbxEntrygt
ltrdfsseeAlsogt
ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt
ltotDatasetgt
ltrdfRDFgt
Data publishing
Upload data receive
dataset URL
httphost2datasetid
AnnotationFind chemical compounds
return dataset URL
httphost2compoundid
HTTP GET POST
Dataset service
Structures
endpoints
amppredictions
1)POST a file with chemical structures and properties to
OpenTox dataset servicebullThe structures and data are assigned a dataset URL and
become available by multiple formats (RDF Chemical
MIME CSV Weka ARFF)
2)Assign metadata
bullPUT datasetidmetadata
3)Annotate any of dataset features
datasetidfeature by assigning links to relevant
ontologies
bullPUT featureid
Toxiciology related
ontologies
Algorithms
ontologies
Resources Algorithm
AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by
different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at
httphostportalgorithm
Ideaconsult Ltd34
curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-
devalgorithm
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval
Resources: Algorithm
Representation
• Multiple types of algorithms
  - descriptor calculation algorithms
  - machine learning procedures
  - data preprocessing
• The representation of algorithms is again defined by the OpenTox ontology, where all algorithms are instances of ot:Algorithm
Algorithm types ontology
http://opentox.org/data/documents/development/RDF%20files/AlgorithmTypes
• provides a classification of algorithm types
• The algorithm type in the RDF representation is set by directly typing the resource (rdf:type) with a class from the algorithm types ontology (ota: http://www.opentox.org/algorithms.owl), e.g. <myalgorithm> rdf:type ota:Classification
Resources: Algorithm
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/algorithm/J48
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ot="http://www.opentox.org/api/1.1#"
    …
    xmlns:ota="http://www.opentox.org/algorithmTypes.owl#">
  <ota:Supervised rdf:about="http://apps.ideaconsult.net:8080/ambit2/algorithm/J48">
    <dc:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Classification Decision tree J48</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#Classification"/>
    <dc:description rdf:datatype="http://www.w3.org/2001/XMLSchema#string"></dc:description>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#SingleTarget"/>
    <dc:publisher rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">Somebody</dc:publisher>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#EagerLearning"/>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Algorithm"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">Sat Jul 31 17:11:26 EEST 2010</dc:date>
  </ota:Supervised>
</rdf:RDF>
Representation
• The algorithm name is defined by dc:title
• Parameters supported by the algorithm are specified via the object property ot:parameters and should be of class ot:Parameter (as defined in opentox.owl). These entries indicate which parameters are required in order to run the algorithm; the values themselves should be provided by the client when initiating the calculations via POST
• Algorithm types are distinguished by means of the Algorithm types ontology
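To illustrate how a client might read the title and the algorithm types out of such an RDF/XML representation, here is a small sketch using only the Python standard library. It treats the document as plain XML, which works for the simple shape shown in the J48 example but is not a general RDF parser; the sample document is a trimmed copy of that example, and `describe_algorithm` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:ota="http://www.opentox.org/algorithmTypes.owl#">
  <ota:Supervised rdf:about="http://apps.ideaconsult.net:8080/ambit2/algorithm/J48">
    <dc:title>Classification Decision tree J48</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#Classification"/>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Algorithm"/>
  </ota:Supervised>
</rdf:RDF>"""


def describe_algorithm(rdf_xml):
    """Collect URI, dc:title and all types of the first described resource."""
    root = ET.fromstring(rdf_xml)
    node = root[0]
    # A typed node element such as <ota:Supervised> is itself an rdf:type;
    # ElementTree spells its tag as "{namespace}local".
    types = [node.tag[1:].replace("}", "", 1)]
    types += [t.get(f"{{{RDF}}}resource") for t in node.findall(f"{{{RDF}}}type")]
    return {
        "uri": node.get(f"{{{RDF}}}about"),
        "title": node.findtext(f"{{{DC}}}title"),
        "types": types,
    }
```

For real services a dedicated RDF library would be the safer choice, since the same graph can be serialised in several equivalent XML shapes.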
Resources: Model
• Representations of predictive models
• A Model is created by HTTP POST to an ot:Algorithm with specific parameters and/or an input ot:Dataset
Representation
• The Model name is defined by the dc:title property
• The Model creator might be defined by the dc:creator property
• The date of Model creation is defined by the dc:date property
• The Algorithm is defined by the ot:algorithm object property
• The independent variables are instances of ot:Feature, defined by the ot:independentVariables property (can be multiple)
• The dependent variables are instances of ot:Feature, defined by the ot:dependentVariables property (can be multiple)
• The variables where prediction results will be stored are instances of ot:Feature, defined by ot:predictedVariables (can be multiple)
• Parameters are defined by ot:parameters
• The training Dataset is an instance of ot:Dataset, defined by ot:trainingDataset
Linked resources: Compound, Algorithm, Model, Dataset, Features
[Diagram: the Algorithm, Feature and Compound services link the Model, Dataset, Descriptor, Assay and Chemical compound resources.]
Linked resources: Compound, Algorithm, Model, Dataset, Features
[Diagram: the Dataset, Descriptor, Assay and Chemical compound resources are annotated with algorithm types (Regression, Classification, Quantum Chemistry, Descriptors, etc.) through the Blue Obelisk algorithms ontology, the OpenTox algorithm types ontology and toxicology-related ontologies.]
Make the model available
Register at the OpenTox ontology service
- RDF triple storage
- Accepts HTTP POST
- SPARQL endpoint
$ curl -X POST -d "uri=http://apps.ideaconsult.net:8080/ambit2/model/57" http://apps.ideaconsult.net:8080/ontology
The model becomes visible to applications
[Diagram: the Model service registers /model/{id} with the Ontology service via HTTP POST; the Ontology service holds published models, algorithms, ontologies and metadata.]
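The registration call amounts to a form-encoded POST with a single `uri` field. A minimal sketch of building that request in Python follows; `registration_request` is a hypothetical helper, and the request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def registration_request(ontology_service, resource_uri):
    """Build the POST that announces a resource URI to the ontology service."""
    body = urlencode({"uri": resource_uri}).encode("ascii")
    return Request(
        ontology_service,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Unlike the curl command line, `urlencode` percent-encodes the reserved characters in the model URI, so the value survives form decoding unchanged.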
Services implementation by partner and service type
Service types (columns): Compound, Dataset, Feature, Algorithm (processing), Algorithm (model), Model, Task, Report, Validation, Authentication and Authorisation service, Ontology service
Partner No | Services implemented
2 | Y Y Y Y Y Y
3 (IDEA) | Y Y Y Y Y Y Y Y
5 | Y Y Y
6 | Y Y Y Y
7 | Y Y Y
10 | Y
All components are implemented as REST web services.
There could be multiple implementations of the same type of component.
(A subset of) the services could be hosted by the same provider or by multiple providers at separate locations.
http://apps.ideaconsult.net:8080/ambit2/
OpenTox Is A Framework
Unified Access
• Toxicity data
• Predictive models
• Validation support
• Interpretation aids
Framework
• Toxicologists
• Modelers
• API for new algorithm development & integration
Open Source
• To optimise impact
• To allow inspection and review
• To attract external contributors
Demo applications
OpenTox services can be used to develop specific applications or be embedded in workflow systems
• Two end-user-oriented demo applications making use of OpenTox web services have been developed and deployed, and are available for testing: http://toxcreate.org and http://toxpredict.org
• ToxCreate creates models from user-supplied datasets
• ToxPredict uses existing OpenTox models to estimate chemical compound properties
RDF: Lessons learned
• OpenTox specific: it did not start as a Linked Data/RDF project
• The REST and RDF mix is not (yet) popular… but it is natural to be able to retrieve a (partial) resource representation described by triples
• Steep learning curve
• Some hard topics
  - Data model vs. format
  - The subject-predicate-object concept vs. tabular/hierarchical/other implicit structure
  - The recognition of the added value
    • XML, JSON, YAML, plain text, etc. vs. RDF
RDF: Performance
RDF representation is verbose … and in-memory RDF libraries are slow … lack of streaming parsers/writers …
[Charts: a dataset with 320 chemicals and 60 columns; a dataset with 6500 chemicals and 12 columns]
RDF: Wish list
• A convenient explanation of the subject-predicate-object concept for beginners
• A (high-performance) triple store should not be a mandatory requirement for publishing RDF data
• Fast streaming parsers and writers
• (Terse) JSON serialisation
• Security
• Synchronisation of distributed RDF content
Thank you
Build an application with the OpenTox REST Web Services API: http://opentox.org/dev/apis/api-1.1
Download the AMBIT implementation of the OpenTox API and launch your own OpenTox service: http://ambit.sourceforge.net
Framework design rationales
User Requirements | Software Requirements
Unambiguous data | formal way of representing information about data
Unambiguous access | well-defined interfaces
Transparency of computational tools | formal way of representing information about methods; well-defined interfaces
Variety of user groups | simplicity and modularity of design
Need to integrate various resources (e.g. databases, prediction methods, models, …) to make meaningful predictions | distributed architecture; interoperability
Need to integrate biological information | again, modularity of design; extensibility
The framework
• OpenTox API
  - The way applications talk to each other
  - The way developers talk to applications
  - http://opentox.org/dev/apis/api-1.1
• The basic building blocks
  - data, chemical structures, algorithms and models
• Functionality offered
  - build models
  - apply models
  - validate models
  - access and query data in various ways
• Technologies
  - REST-style web services
  - RDF for the description of resources
  - Links to existing and newly developed ontologies (mainly to describe metadata about resources)
[Diagram: every resource type (Feature, Compound, Dataset, Ontology, Algorithm, Model, AppDomain, Validation, Report) supports GET, POST, PUT and DELETE.]
[Diagram: clients (user interfaces, standalone applications, browsers, etc.) talk over web services (WS) to the data services (all kinds of data), model services, validation service, reporting service and ontology service (registration, queries), with security applied to the services.]
Representational State Transfer (REST)
A software architecture style defined by Roy Fielding in his PhD thesis (2000). Many services worldwide offer a REST API. There are (currently) no standards for RESTful applications, but merely design guides.
Design principles
• Resource oriented
  - Every object (resource) is named and addressable (e.g. by an HTTP URL). Examples: http://example.opentox.com/model/myBestModel, http://example.opentox.com/compound/50-00-0
  - RESTful API design starts by identifying the most important objects and groups of objects supported by the software system, and proceeds by defining URL patterns
• Transport protocol
  - HTTP is the most popular choice of transport protocol, but other protocols can be used as well
• Operations
  - All resources (nouns) support the same fixed and universal set of operations (verbs). The HTTP operations (GET, POST, PUT, DELETE) are the common choice when the transport protocol is HTTP
• Hypermedia as the Engine of Application State
  - All resources should be reachable via a single (or minimal) number of entry points into a RESTful application. Thus a representation of a resource should return hypermedia links to related resources
• Error codes (for each resource/operation pair)
  - HTTP status codes (e.g. 200 OK, 400 Bad Request, 404 Not Found) are usually used
OpenTox resources (1)
OpenTox considers the following set of entities as essential building blocks:
• Structures of chemical compounds
• Properties and identifiers of chemical compounds
• Datasets of chemical compounds and various properties (measured or calculated)
• Algorithms
  - Data processing algorithms
    • Algorithms generating certain values based on chemical structure (e.g. descriptor calculation)
    • Data preprocessing (e.g. principal component analysis, feature selection)
    • Structure processing (e.g. structure optimization)
    • Algorithms relating a set of structures to another set of structures (e.g. similarity search or metabolite generation)
  - Machine learning algorithms
    • Supervised (e.g. regression, classification)
    • Unsupervised (e.g. clustering)
  - Prediction algorithms defined by experts (e.g. a series of structural alerts defined by human experts, not derived by learning algorithms)
OpenTox resources (2)
• Models are generated by the respective algorithms, given specific parameters
• Statistical models are generated by applying statistical/machine learning algorithms to a specific dataset and parameters
• Models can be other than statistical, e.g. expert-defined rules, quantum mechanical calculations, metabolite generation, etc. The intention of the framework is to be generic enough to accommodate a variety of predictive models
• Validation provides procedures independent of the model building facilities (e.g. cross-validation) and generates relevant statistics
• Reports
  - Various types of reports might be generated using the building blocks above (e.g. a validation report can be generated using a validation object, a model and a dataset)
• In addition, the following components are introduced
  - Task (asynchronous processing of computationally intensive tasks)
  - Authentication and authorization (ensuring secure access to sensitive resources)
  - Ontology service (provides RDF storage and a SPARQL endpoint for resource registration)
Resources identification
All resources are identified via a unique web address, assigned according to the URL templates
Component | Description | URL template (example)
Compound | Representations of chemical compounds | http://host:port/compound/{compoundid}
Feature | Properties and identifiers | http://host:port/feature/{featureid}
Dataset | Encapsulates a set of chemical compounds and their property values | http://host:port/dataset/{datasetid}
Model | OpenTox model services | http://host:port/model/{modelid}
Algorithm | OpenTox algorithm services | http://host:port/algorithm/{algorithmid}
Validation, Report | A validation corresponds to the validation of a model on a test dataset | http://host:port/validation/{validationid}, http://host:port/report/{reportid}
Task | Asynchronous jobs are handled via an intermediate Task resource. A resource submitting an asynchronous job should return the URI of the task | http://host:port/task/{taskid}
Ontology service | Provides storage and SPARQL search functionality for objects defined in OpenTox services and relevant ontologies | http://host:port/ontology
Authentication and authorisation | Granting access to protected resources for authorised users | http://host:port/opensso, http://host:port/opensso-pol
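The URL templates lend themselves to a small helper that composes resource addresses. The sketch below is illustrative only: `resource_url` and the `COLLECTIONS` tuple are hypothetical names, and the set of resource types is taken from the table above.

```python
from urllib.parse import quote

# Resource types from the table whose template is /{type}/{id}.
COLLECTIONS = (
    "compound", "feature", "dataset", "model",
    "algorithm", "validation", "report", "task",
)


def resource_url(base, kind, resource_id=None):
    """Return the collection URL, or the individual resource URL if an id is given."""
    if kind not in COLLECTIONS:
        raise ValueError(f"unknown resource type: {kind}")
    url = f"{base.rstrip('/')}/{kind}"
    if resource_id is not None:
        # Percent-encode the identifier so it stays a single path segment.
        url += "/" + quote(str(resource_id), safe="")
    return url
```

For example, `resource_url("http://example.opentox.com", "compound", "50-00-0")` yields the compound address used as an example on the design principles slide.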
OpenTox REST operations
Individual resources (e.g. a dataset or a model)
• URI template: http://host:port/{resource}/{resourceid}, e.g. http://host:port/model/{model_id} or http://host:port/dataset/{dataset_id}
• GET: retrieve a representation of the resource
• PUT: update the representation of the resource
• POST
  - replace the representation of the resource with a new one (e.g. replace the dataset with new content)
  - initiate calculations based on this resource (e.g. submit a dataset URI to an algorithm resource and obtain a model URI as a result)
• DELETE: delete the resource
Collections of resources (e.g. the list of all available models or datasets)
• URI template: http://host:port/{resource} (e.g. http://host:port/model or http://host:port/dataset)
• GET: retrieve a representation of multiple resources (e.g. retrieve all available algorithms)
• PUT: N/A
• POST: create a new resource and return its URI (e.g. create a new dataset by submitting new dataset content to the dataset service)
• DELETE: N/A
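The distinction between individual resources and collections can be expressed as a small lookup. This sketch uses a simple heuristic that is an assumption of the example, not part of the API: a URL whose last path segment is a known resource type is treated as a collection.

```python
from urllib.parse import urlparse

COLLECTION_NAMES = {
    "compound", "feature", "dataset", "model",
    "algorithm", "validation", "report", "task",
}


def supported_operations(url):
    """Return the HTTP verbs listed above for a collection or an individual resource."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if segments and segments[-1] in COLLECTION_NAMES:
        return ("GET", "POST")                      # PUT and DELETE are N/A
    return ("GET", "PUT", "POST", "DELETE")
```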
Build a predictive model
[Diagram: to create a model, the client sends an HTTP POST to the Algorithm service (regression, classification, quantum chemistry, descriptors, etc.) to run calculations with the dataset http://host1/dataset/{id} (structures, descriptors, endpoints) held by the Dataset service. The Model service returns the model URL http://host1/model/{id}; the Validation service exposes /validation/{id}; the model /model/{id} is registered via HTTP POST with the Ontology service (published models, algorithms, ontologies, metadata).]
Uniform approach to models creation
Read data from a web address, process, write to a web address
Dataset (http://myhost.com/dataset/trainingset1) + Algorithm (http://myhost.com/algorithm/neuralnetwork) = Model (http://myhost.com/model/predictivemodel1)
[Diagram: the Feature, Compound, Dataset, Algorithm and Model resources each support GET, POST, PUT and DELETE.]
Use an algorithm to build a model
• An algorithm is applied by submitting an HTTP POST to the algorithm URI and providing the required parameters
• A common required parameter is dataset_uri=http://host:port/dataset/{datasetid}, which specifies the dataset to be operated on
• HTTP POST in REST-style services returns the URI of the result and not the content of the result
• The algorithm services are designed to store the results in a dataset service and return the URL of the resulting dataset
• In case of slow calculations, a Task URI is returned instead of the dataset URI
$ curl -H "Accept:text/uri-list" -X POST -d "dataset_uri=http://apps.ideaconsult.net:8080/ambit2/dataset/1037" -d "prediction_feature=http://apps.ideaconsult.net:8080/ambit2/feature/26701" -d "dataset_service=http://apps.ideaconsult.net:8080/ambit2/dataset" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48 -iv
* Connected to opentox.informatik.tu-muenchen.de (131.159.28.16) port 8080 (#0)
> POST /OpenTox-dev/algorithm/J48 HTTP/1.1
> Host: opentox.informatik.tu-muenchen.de:8080
> Accept:text/uri-list
> Content-Type: application/x-www-form-urlencoded
< HTTP/1.1 202 Accepted
< Date: Sat, 31 Jul 2010 14:46:38 GMT
< Location: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5
< Accept-Ranges: bytes
< Server: Noelios-Restlet-Engine/1.1.snapshot
< Content-Type: text/uri-list;charset=ISO-8859-1
< Content-Length: 99
<
* Connection #0 to host opentox.informatik.tu-muenchen.de left intact
* Closing connection #0
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5
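The body of this POST is an ordinary form-encoded string. As a sketch, the three parameters from the curl call can be assembled in Python as follows; `training_form` is a hypothetical helper name, and other algorithm services may require different parameters.

```python
from urllib.parse import urlencode


def training_form(dataset_uri, prediction_feature, dataset_service):
    """Form-encode the parameters used in the J48 training call."""
    return urlencode({
        "dataset_uri": dataset_uri,                # data to train on
        "prediction_feature": prediction_feature,  # feature (column) to be predicted
        "dataset_service": dataset_service,        # where results should be stored
    })
```

The resulting string would be sent with Content-Type application/x-www-form-urlencoded, matching the request headers in the transcript.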
Resources: The model
$ curl -iv -H "Accept:text/uri-list" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c
* About to connect() to opentox.informatik.tu-muenchen.de port 8080 (#0)
*   Trying 131.159.28.16... connected
* Connected to opentox.informatik.tu-muenchen.de (131.159.28.16) port 8080 (#0)
> GET /OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5 HTTP/1.1
> User-Agent: curl/7.18.2 (x86_64-pc-linux-gnu) libcurl/7.18.2 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.8 libssh2/0.18
> Host: opentox.informatik.tu-muenchen.de:8080
> Accept:text/uri-list
>
< HTTP/1.1 200 OK
< Date: Sat, 31 Jul 2010 14:47:22 GMT
< Location: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48
< Vary: Accept-Charset, Accept-Encoding, Accept-Language, Accept
< Accept-Ranges: bytes
< Server: Noelios-Restlet-Engine/1.1.snapshot
< Content-Type: text/uri-list;charset=ISO-8859-1
< Content-Length: 86
<
* Connection #0 to host opentox.informatik.tu-muenchen.de left intact
* Closing connection #0
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48
• When a task URI is returned, the returned status code is HTTP 202 Accepted instead of HTTP 200 OK
• This tells the client that the processing is not completed, and the client needs to poll the task URI until an OK code is returned
• The final result returned by Example 25 is the URI of the new model: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48
• To obtain prediction results, POST a dataset to the model URI
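The polling behaviour described above can be sketched as a small loop. To keep the example self-contained and testable, the HTTP call is injected as a `fetch` callable returning a `(status_code, uri)` pair; that interface, the function name and the default interval are assumptions of this sketch, not part of the OpenTox API.

```python
import time


def wait_for_result(task_uri, fetch, interval=1.0, max_polls=60, sleep=time.sleep):
    """Poll a task URI until the service answers 200 OK, then return the result URI.

    fetch(uri) must return (status_code, uri_from_response).
    """
    for _ in range(max_polls):
        status, result_uri = fetch(task_uri)
        if status == 200:          # processing finished, result URI is available
            return result_uri
        if status != 202:          # anything but "Accepted" is treated as a failure
            raise RuntimeError(f"task failed with HTTP {status}")
        sleep(interval)            # still running, wait before polling again
    raise TimeoutError(f"task {task_uri} did not complete after {max_polls} polls")
```

Bounding the number of polls avoids waiting forever on a task that never completes.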
Uniform approach to data processing (e.g. descriptor calculation)
Read data from a web address, process, write to a web address
Dataset (http://myhost.com/dataset/trainingset1) + Algorithm (http://myhost.com/algorithm/descriptorX) = Dataset (http://myhost.com/dataset/results)
[Diagram: the Feature, Compound, Dataset and Algorithm resources each support GET, POST, PUT and DELETE.]
Uniform approach to models validation and report generation
Read data from a web address, process, write to a web address
Dataset (http://myhost.com/dataset/trainingset1) + Model generating predictions (http://myhost.com/model/predictivemodel1, results in http://myhost.com/dataset/predictedresults1) = Validation (http://myhost.com/validation) and Validation report (http://myhost.com/report/1)
[Diagram: the Dataset, Model, Validation and Report resources each support GET, POST, PUT and DELETE.]
Apply predictive models
[Diagram: the client queries the Ontology service (published models, algorithms, ontologies, metadata) with SPARQL over HTTP POST to retrieve the available endpoints and model URLs, e.g. http://host1/model/{id}. It then applies the model http://host1/model/{id} to the dataset http://host2/dataset/{id} with an HTTP POST to the Model service (/model/{id}), which returns the results dataset URL http://host/dataset/{id}.]
Uniform approach to model prediction
Read data from a web address, process, write to a web address
Dataset (http://myhost.com/dataset/id1) + Model (http://myhost.com/model/predictivemodel1) = Dataset (http://myhost.com/dataset/results1)
[Diagram: the Feature, Compound, Dataset and Model resources each support GET, POST, PUT and DELETE.]
Uniform access to data
Publish your data, retrieve linked data
[Diagram: data is uploaded to the Dataset service (HTTP GET, POST), which holds structures, endpoints and predictions and returns a dataset URL (http://host2/dataset/{id}). Find chemical compounds, return dataset URL: http://host2/compound/{id}. Annotation links the data to the Ontology service (published models, algorithms, ontologies, metadata). The Feature, Compound and Dataset resources each support GET, POST, PUT and DELETE.]
RDF: Resources representation
• The opentox.owl ontology
  - A common OWL data model of all OpenTox resources
  - Describes OpenTox resources
  - Describes relationships between them
  - Generates RDF representations of objects
• RDF/XML representation is mandatory for OpenTox resources
• Uniform approach to data representation
  - Calculated and measured properties of chemical compounds are represented in a uniform way
  - Linked to the resource used for data generation
  - Annotated via ontology entries
  - Model representations link to the algorithms and data used
All OpenTox components are defined by an OWL ontology: http://opentox.org/api/1.1/opentox.owl
All resources are subclasses of ot:OpenToxResource
Resources: Chemical compound
Compound: Provides different representations for chemical compounds with a unique and defined chemical structure
  /compound/{id}
Conformer
  /compound/{id}/conformer/{id}
Documentation: http://opentox.org/dev/apis/api-1.1/structure
Representation
• A subclass of ot:OpenToxResource
• Supports different Chemical MIME formats
• RDF representation only for specifying owl:sameAs links to external resources
Example 1: Retrieve compound as MOL
$ curl -H "Accept:chemical/x-mdl-molfile" http://apps.ideaconsult.net:8080/ambit2/compound/1
  CH2O
  APtclcactv09040902283D 0   0.00000     0.00000
  4  3  0  0  0  0  0  0  0  0999 V2000
   -0.6004    0.0000    0.0001 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.6072    0.0000   -0.0004 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.1472    0.9353    0.0016 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.1472   -0.9353    0.0016 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  0  0  0  0
  2  4  1  0  0  0  0
Example 2: Retrieve compound as SMILES
$ curl -H "Accept:chemical/x-daylight-smiles" http://apps.ideaconsult.net:8080/ambit2/compound/1
O=C
Example 3: Query compounds
$ curl -H "Accept:chemical-mime" "http://apps.ideaconsult.net:8080/ambit2/query/compound/any-identifier-or-keyword"
$ curl -H "Accept:chemical-mime" "http://apps.ideaconsult.net:8080/ambit2/query/smarts?search=smarts"
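The same compound URL yields different representations depending on the Accept header. A minimal sketch of that content negotiation in Python is shown below; the short format keys and the helper name are hypothetical, while the MIME types are the ones used in this deck. The request is built but not sent.

```python
from urllib.request import Request

# Short names (hypothetical) mapped to Chemical MIME types named in the slides.
CHEMICAL_MIME = {
    "smiles": "chemical/x-daylight-smiles",
    "mol": "chemical/x-mdl-molfile",
    "sdf": "chemical/x-mdl-sdfile",
}


def compound_request(compound_url, fmt):
    """Build a GET request asking for one representation of a compound."""
    try:
        mime = CHEMICAL_MIME[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt}") from None
    return Request(compound_url, headers={"Accept": mime})
```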
Resources: Feature
Feature: A Feature is a resource representing any kind of property or identifier assigned to a Compound. The feature types are determined via their links to ontologies (Feature ontologies, Descriptor ontologies, Endpoints ontologies).
  /feature/{id}
Documentation: http://opentox.org/dev/apis/api-1.1/feature
Representation
ot:Feature, a subclass of ot:OpenToxResource
Mandatory RDF/XML format
Properties
• Name: defined by dc:title (Dublin Core namespace)
• Units: defined by the ot:units annotation property (OpenTox namespace)
• Creator: defined by the dc:creator annotation property (Dublin Core namespace)
• The origin of the Feature is defined by the ot:hasSource object property (OpenTox namespace) and can be an ot:Algorithm, ot:Model or ot:Dataset
• Relations to other resources which represent the same entity can be established via the owl:sameAs property. This approach can be used, for example, to link the ot:Feature resource to a resource from another ontology (an example follows)
• The subclasses of ot:Feature, namely ot:NumericFeature, ot:StringFeature and ot:NominalFeature, denote whether a feature holds numeric, string or nominal values
Resources: Feature (an example)
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/feature/22114
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns:otee="http://www.opentox.org/echaEndpoints.owl#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns="http://apps.ideaconsult.net:8080/ambit2/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#Algorithm"/>
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#Feature"/>
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#NumericFeature">
    <rdfs:subClassOf rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </owl:Class>
  <ot:NumericFeature rdf:about="feature/22114">
    <ot:hasSource>
      <ot:Algorithm rdf:about="algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor"/>
    </ot:hasSource>
    <owl:sameAs rdf:resource="&otee;Octanol-water_partition_coefficient_Kow"/>
    <dc:title>XLogP</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </ot:NumericFeature>
</rdf:RDF>
The example shows an OpenTox feature with the title "XLogP", identified by the URI http://apps.ideaconsult.net:8080/ambit2/feature/22114
• Linked to an entry of a simplified ontology of toxicological endpoints
The algorithm used to generate values for this feature
• Specified by the ot:hasSource property
• Identified by the URI http://apps.ideaconsult.net:8080/ambit2/algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor
• Note: the URI identifies an OpenTox Algorithm resource
• The algorithm URI itself is dereferenceable
• It can be used to initiate calculations of the XLogP descriptor
Resources: Feature (an example)
$ curl -H "Accept:text/n3" http://apps.ideaconsult.net:8080/ambit2/feature/26184
@prefix ot: <http://www.opentox.org/api/1.1#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix : <http://apps.ideaconsult.net:8080/ambit2/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix af: <http://apps.ideaconsult.net:8080/ambit2/feature/> .
ot:hasSource
    a owl:ObjectProperty .
ot:units
    a owl:DatatypeProperty .
ot:Feature
    a owl:Class .
ot:NumericFeature
    a owl:Class ;
    rdfs:subClassOf ot:Feature .
af:26184
    a ot:Feature , ot:NumericFeature ;
    dc:creator "http://ambit.uni-plovdiv.bg:8080/ambit2" ;
    dc:title "TUM_CDK_XLogP" ;
    ot:hasSource <http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem/XLogPDescriptor> ;
    ot:units "" ;
    = otee:Octanol-water_partition_coefficient_Kow .
An example (N3) of another ot:Feature resource: the XLogP descriptor (again)
• Generated by a different implementation
• In this case its name is "TUM_CDK_XLogP", and the algorithm resource used to generate it resides at the Technical University of Munich (TUM): http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem/XLogPDescriptor
  This algorithm URL could also be used to initiate descriptor calculations
• The representation of Algorithm resources refers to the BlueObelisk ontology entry http://www.blueobelisk.org/ontologies/chemoinformatics-algorithms/#xlogP
Toxicity endpoints ontologies
• Derived from the ECHA classification of endpoints published in the REACH guidance documents
• Physicochemical properties and various toxicological endpoints
• The hierarchy doesn't represent the complexity of toxicological assays, but can be used as a first approximation to assign meaning to the data entries and to generate a REACH report
• More specific descriptions of toxicological assays can be used as well
• Ontologies for specific toxicity assays are developed by OpenTox partners
• The ECHA endpoints ontology: http://www.opentox.org/echaEndpoints.owl
Resources: a feature
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/feature/3
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns="http://apps.ideaconsult.net:8080/ambit2/"
    xmlns:af="http://apps.ideaconsult.net:8080/ambit2/feature/"
    …
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Feature rdf:about="feature/3">
    <dc:creator/>
    <ot:hasSource>ECHA…</ot:hasSource>
    <owl:sameAs rdf:resource="http://www.opentox.org/api/1.1#EINECS"/>
    <ot:units></ot:units>
    <dc:title>EC</dc:title>
  </ot:Feature>
</rdf:RDF>
An illustration of an ot:Feature imported from a file and not calculated. The example shows a feature representing the EINECS number, imported from the ECHA preregistration list.
Feature summary
• The OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs
• These URIs are dereferenceable
• They allow assigning different levels of meaning by linking to ontology entries (e.g. algorithms or toxicological endpoints), as well as linking to the algorithms which can be used to generate property values
• The same approach can be used to denote assays (provided that the assay is defined by an ontology), species, functional groups, etc.
Resources: Dataset
Dataset: Provides access to chemical compounds and their features (e.g. structural, physical-chemical, biological, toxicological properties)
Operations
• POST: upload a dataset
• PUT: update the dataset content
• DELETE: remove the dataset
Representation
RDF/XML (mandatory)
• The dataset consists of data entries
• Each entry is associated with exactly one chemical compound, identified by its URI and available via the OpenTox Compound service API
• One and the same compound can be associated with multiple dataset entries
• Every "column" is associated with a Feature; its representation should be available via the OpenTox Feature API
Resources: Dataset
Representation
The dataset services optionally support formats other than RDF
• text/csv (comma delimited)
• text/x-arff (Weka ARFF)
• application/pdf
• chemical/x-mdl-sdfile
• other Chemical MIME formats
This allows retrieving the same data in a convenient format, but the URL links to compound and feature resources are lost
@prefix ad: <http://apps.ideaconsult.net:8080/ambit2/dataset/> .
@prefix af: <http://apps.ideaconsult.net:8080/ambit2/feature/> .
@prefix ot: <http://www.opentox.org/api/1.1#> .
…
ad:9 a ot:Dataset ;
    ot:dataEntry
        [ a ot:DataEntry ;
          ot:compound <http://apps.ideaconsult.net:8080/ambit2/compound/413/conformer/409421> ;
          ot:values
              [ a ot:FeatureValue ;
                ot:feature af:21576 ;
                ot:value "3.309999942779541"^^xsd:double
              ] ;
          ot:values
              [ a ot:FeatureValue ;
                ot:feature af:21573 ;
                ot:value "3.0"^^xsd:double
              ]
        ] .
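The data-entry structure maps naturally onto a table with one row per entry and one column per feature. The sketch below assumes the RDF has already been parsed into plain Python dictionaries (the parsing step and the dictionary layout are assumptions of this example) and shows only the flattening.

```python
def to_table(entries):
    """Flatten data entries into (header, rows).

    entries: list of {"compound": uri, "values": {feature_uri: value}}
    """
    # Every feature URI that occurs in any entry becomes a column.
    columns = sorted({feature for e in entries for feature in e["values"]})
    header = ["compound"] + columns
    # Missing feature values are left as None, since entries may be sparse.
    rows = [[e["compound"]] + [e["values"].get(f) for f in columns] for e in entries]
    return header, rows
```

Keeping the feature URIs as column headers preserves the links that are otherwise lost when the data is exported to formats such as CSV.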
Dataset metadata and features
Description | URI template
Retrieve the entire dataset content. If uri-list, retrieve only the compound URIs | http://host:port/dataset/{id}
Retrieve the representation of the features (columns) of the dataset | http://host:port/dataset/{id}/feature
Retrieve the dataset metadata (name, etc.) | http://host:port/dataset/{id}/metadata
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/dataset/9/metadata
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    ……
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Dataset rdf:about="dataset/9">
    <dc:source>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:source>
    <dc:publisher>somebody</dc:publisher>
    <rdfs:seeAlso>
      <bx:Entry rdf:about="reference/20117">
        <rdfs:seeAlso>http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html</rdfs:seeAlso>
        <dc:title>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:title>
      </bx:Entry>
    </rdfs:seeAlso>
    <dc:title>ISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATA</dc:title>
  </ot:Dataset>
</rdf:RDF>
Data publishing
Upload data receive
dataset URL
httphost2datasetid
AnnotationFind chemical compounds
return dataset URL
httphost2compoundid
HTTP GET POST
Dataset service
Structures
endpoints
amppredictions
1)POST a file with chemical structures and properties to
OpenTox dataset servicebullThe structures and data are assigned a dataset URL and
become available by multiple formats (RDF Chemical
MIME CSV Weka ARFF)
2)Assign metadata
bullPUT datasetidmetadata
3)Annotate any of dataset features
datasetidfeature by assigning links to relevant
ontologies
bullPUT featureid
Toxiciology related
ontologies
Algorithms
ontologies
Resources Algorithm
Algorithm
Provides access to OpenTox algorithms. There are several algorithm services developed by different OpenTox partners. The list of algorithms can be retrieved by an HTTP GET operation at http://host:port/algorithm
$ curl -H "Accept:text/uri-list" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNclassification
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/PLSregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/M5P
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/GaussP
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/FTMsmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/gSpansmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/JOELIB2
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/InfoGainAttributeEval
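A client receiving such a text/uri-list response only has to split it into lines. The helper below is a small sketch (the name `parse_uri_list` is ours, not part of the OpenTox API); it follows the usual uri-list convention that blank lines are ignored and lines starting with `#` are comments.

```python
def parse_uri_list(body: str) -> list[str]:
    """Return the URIs contained in a text/uri-list response body."""
    uris = []
    for line in body.splitlines():
        line = line.strip()
        # skip empty lines and comment lines
        if line and not line.startswith("#"):
            uris.append(line)
    return uris
```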
Resources Algorithm
Representation
• Multiple types of algorithms
  - descriptor calculation algorithms
  - machine learning procedures
  - data preprocessing
• The representation of algorithms is again defined by the OpenTox ontology, where all algorithms are subclasses of ot:Algorithm
Algorithm types ontology
http://opentox.org/data/documents/development/RDF%20files/AlgorithmTypes
• provides a classification of algorithm types
• The algorithm type in the RDF representation is set by assigning (via rdf:type) a class from the algorithm types ontology (ota: http://www.opentox.org/algorithms.owl), e.g. <myalgorithm> rdf:type ota:Classification
Resources Algorithm
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/algorithm/J48
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ot="http://www.opentox.org/api/1.1#"
    …
    xmlns:ota="http://www.opentox.org/algorithmTypes.owl#">
  <ota:Supervised rdf:about="http://apps.ideaconsult.net:8080/ambit2/algorithm/J48">
    <dc:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Classification Decision tree J48</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#Classification"/>
    <dc:description rdf:datatype="http://www.w3.org/2001/XMLSchema#string"></dc:description>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#SingleTarget"/>
    <dc:publisher rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">Somebody</dc:publisher>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#EagerLearning"/>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Algorithm"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">Sat Jul 31 17:11:26 EEST 2010</dc:date>
  </ota:Supervised>
</rdf:RDF>
Representation
• The algorithm name is defined by dc:title
• Parameters supported by the algorithm are specified via the object property ot:parameters and should be of class ot:Parameter (as defined in opentox.owl). These entries tell the client which parameters are required in order to run the algorithm; the values themselves should be provided by the client when initiating the calculations via POST
• Algorithm types are distinguished by means of the Algorithm types ontology
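To illustrate how a client might read the name and the types out of such a representation, here is a minimal sketch using only the Python standard library. It assumes the simple RDF/XML shape shown on the slide (one typed node directly under rdf:RDF); it is not a general RDF parser, and the function name `describe_algorithm` and the shortened `SAMPLE` document are ours.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

# Shortened version of the J48 representation from the slide.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:ota="http://www.opentox.org/algorithmTypes.owl#">
  <ota:Supervised rdf:about="http://apps.ideaconsult.net:8080/ambit2/algorithm/J48">
    <dc:title>Classification Decision tree J48</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#Classification"/>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Algorithm"/>
  </ota:Supervised>
</rdf:RDF>"""


def describe_algorithm(rdf_xml):
    """Return (title, [type URIs]) for the first resource in the document."""
    node = ET.fromstring(rdf_xml)[0]
    # The element name itself is a type: "{namespace}Local" -> namespace + Local
    namespace, _, local = node.tag[1:].partition("}")
    types = [namespace + local]
    # Additional types are given as rdf:type child elements
    types += [t.get(f"{{{RDF}}}resource") for t in node.findall(f"{{{RDF}}}type")]
    return node.findtext(f"{{{DC}}}title"), types
```

A production client would more likely use a dedicated RDF library, since RDF/XML allows many equivalent serialisations of the same triples.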
Resources Model
• Representations of predictive models
• A Model is created by HTTP POST to an ot:Algorithm with specific parameters and/or an input ot:Dataset
Representation
• The Model name is defined by the dc:title property
• The Model creator might be defined by the dc:creator property
• The date of Model creation is defined by the dc:date property
• The Algorithm is defined by the ot:algorithm object property
• The independent variables are instances of ot:Feature, defined by the ot:independentVariables property (can be multiple)
• The dependent variables are instances of ot:Feature, defined by the ot:dependentVariables property (can be multiple)
• The variables where prediction results will be stored are instances of ot:Feature, defined by ot:predictedVariables (can be multiple)
• Parameters are defined by ot:parameters
• The training Dataset is an instance of ot:Dataset, defined by ot:trainingDataset
Linked resources: Compound, Algorithm, Model, Dataset, Features
[Diagram: the Algorithm, Feature and Compound services link Model, Dataset, Descriptor, Assay and Chemical compound resources]
[Diagram: Dataset, Descriptor, Assay and Chemical compound resources linked to the Blue Obelisk algorithms ontology (regression, classification, quantum chemistry, descriptors, etc.), the OpenTox algorithm types ontology and toxicology-related ontologies]
Make the model available
Register at the OpenTox ontology service
- RDF triple storage
- Accepts HTTP POST
- SPARQL endpoint
$ curl -X POST -d "uri=http://apps.ideaconsult.net:8080/ambit2/model/57" http://apps.ideaconsult.net:8080/ontology
Becomes visible for applications
[Diagram: the Model service registers model/{id} at the Ontology service via HTTP POST; the Ontology service holds published models, algorithms, ontologies and metadata]
Services implementation by partner and service type
Columns: Partner No. | Compound | Dataset | Feature | Algorithm (processing) | Algorithm (model) | Model | Task | Report | Validation | Authentication and Authorisation service | Ontology service
2 Y Y Y Y Y Y
3
(IDEA)
Y Y Y Y Y Y Y Y
5 Y Y Y
6 Y Y Y Y
7 Y Y Y
10 Y
All components are implemented as REST web services
There could be multiple implementations of the same type of component
(A subset of) services could be hosted by the same provider or by multiple providers at separate locations
http://apps.ideaconsult.net:8080/ambit2/
OpenTox Is A Framework
Framework
Unified Access
Open Source
• Toxicity data
• Predictive models
• Validation support
• Interpretation aids
• Toxicologists
• Modelers
• API for new algorithm development & integration
• To optimise impact
• To allow inspection, review
• To attract external contributors
Demo applications
OpenTox services can be used to develop specific applications or be embedded in workflow systems
• Two end-user oriented demo applications making use of OpenTox web services have been developed and deployed, and are available for testing: http://toxcreate.org and http://toxpredict.org
• ToxCreate creates models from user-supplied datasets
• ToxPredict uses existing OpenTox models to estimate chemical compound properties
RDF Lessons learned
• OpenTox specific: it hasn't started as a Linked Data/RDF project
• A REST and RDF mix is not (yet) popular… but it is natural to be able to retrieve a (partial) resource representation described by triples
• Steep learning curve
• Some hard topics
  - Data model vs format
  - The subject-predicate-object concept vs tabular/hierarchical/other implicit structure
  - The recognition of the added value
• XML, JSON, YAML, plain text, etc. vs RDF
RDF Performance
RDF representation is verbose … and in-memory RDF libraries are slow … lack of streaming parsers/writers …
[Charts: a dataset with 320 chemicals and 60 columns; a dataset with 6500 chemicals and 12 columns]
RDF Wish list
• A convenient explanation of the subject-predicate-object concept for beginners
• A (high-performance) triple store should not be a mandatory requirement to publish RDF data
• Fast streaming parsers and writers
• (Terse) JSON serialisation
• Security
• Synchronisation of distributed RDF content
Thank you
Build an application with the OpenTox REST Web Services API
http://opentox.org/dev/apis/api-1.1
Download the AMBIT implementation of the OpenTox API and launch your OpenTox service
http://ambit.sourceforge.net
Framework design rationales
User requirements | Software requirements
Unambiguous data | formal way of representing information about data
Unambiguous access | well-defined interfaces
Transparency of computational tools | formal way of representing information about methods; well-defined interfaces
Variety of user groups | simplicity and modularity of design
Need to integrate various resources (e.g. databases, prediction methods, models, …) to make meaningful predictions | distributed architecture; interoperability
Need to integrate biological information | again modularity of design; extensibility
The framework
• OpenTox API
  - The way applications talk to each other
  - The way developers talk to applications
  - http://opentox.org/dev/apis/api-1.1
• The basic building blocks
  - data, chemical structures, algorithms and models
• Functionality offered
  - build models
  - apply models
  - validate models
  - access and query data in various ways
• Technologies
  - REST style web services
  - RDF for description of resources
  - Links to existing and newly developed ontologies (mainly to describe metadata about resources)
[Diagram: OpenTox resources (Feature, Compound, Dataset, Ontology, Algorithm, Model, AppDomain, Validation, Report), each supporting GET, POST, PUT and DELETE. They are exposed as web services (data service for all kinds of data, models service, validation service, reporting service, ontology with registration and queries, security) to clients (user interface, standalone, browsers, etc.)]
Design principles
• Resource oriented
  - Every object (resource) is named and addressable (e.g. by an HTTP URL). Examples: http://example.opentox.com/model/myBestModel, http://example.opentox.com/compound/50-00-0
  - RESTful API design starts by identifying the most important objects and groups of objects supported by the software system and proceeds by defining URL patterns
• Transport protocol
  - HTTP is the most popular choice of transport protocol, but other protocols can be used as well
• Operations
  - All resources (nouns) support the same fixed, universal set of operations (verbs). The HTTP operations (GET, POST, PUT, DELETE) are the common choice when the transport protocol is HTTP
• Hypermedia as the Engine of Application State
  - All resources should be reachable via a single (or minimal) number of entry points into a RESTful application. Thus a representation of a resource should return hypermedia links to related resources
• Error codes (for each resource/operation pair)
  - HTTP status codes (e.g. 200 OK, 400 Bad Request, 404 Not Found) are usually used
Representational State Transfer (REST)
A software architecture style defined by Roy Fielding in his PhD thesis (2000). Many services worldwide offer a REST API. There are (currently) no standards for RESTful applications, merely design guides
OpenTox resources (1)
OpenTox considers the following set of entities as essential building blocks:
• Structures of chemical compounds
• Properties and identifiers of chemical compounds
• Datasets of chemical compounds and various properties (measured or calculated)
• Algorithms
  - Data processing algorithms
    • Algorithms generating certain values based on chemical structure (e.g. descriptor calculation)
    • Data preprocessing (e.g. principal component analysis, feature selection)
    • Structure processing (e.g. structure optimization)
    • Algorithms relating a set of structures to another set of structures (e.g. similarity search or metabolite generation)
  - Machine learning algorithms
    • Supervised (e.g. regression, classification)
    • Unsupervised (e.g. clustering)
  - Prediction algorithms defined by experts (e.g. a series of structural alerts defined by human experts, not derived by learning algorithms)
OpenTox resources (2)
• Models are generated by the respective algorithms, given specific parameters
• Statistical models are generated by applying statistical/machine learning algorithms to a specific dataset and parameters
• Models can be other than statistical, e.g. expert-defined rules, quantum mechanical calculations, metabolite generation, etc. The intention of the framework is to be generic enough to accommodate varieties of predictive models
• Validation provides procedures independent of the model building facilities (e.g. cross-validation) and generates relevant statistics
• Reports
  - Various types of reports might be generated using the building blocks above (e.g. a validation report can be generated using a validation object, a model and a dataset)
• In addition, the following components are introduced
  - Task (asynchronous processing of computationally intensive tasks)
  - Authentication and authorization (ensuring secure access to sensitive resources)
  - Ontology service (provides RDF storage and a SPARQL endpoint for resource registration)
Resources identification
All resources are identified via a unique web address assigned according to the URL templates
Component | Description | URL template (example)
Compound | Representations of chemical compounds | http://host:port/compound/{compoundid}
Feature | Properties and identifiers | http://host:port/feature/{featureid}
Dataset | Encapsulates a set of chemical compounds and their property values | http://host:port/dataset/{datasetid}
Model | OpenTox model services | http://host:port/model/{modelid}
Algorithm | OpenTox algorithm services | http://host:port/algorithm/{algorithmid}
Validation, Report | A validation corresponds to the validation of a model on a test dataset | http://host:port/validation/{validationid}, http://host:port/report/{reportid}
Task | Asynchronous jobs are handled via an intermediate Task resource. A resource submitting an asynchronous job should return the URI of the task | http://host:port/task/{taskid}
Ontology service | Provides storage and SPARQL search functionality for objects defined in OpenTox services and relevant ontologies | http://host:port/ontology
Authentication and authorisation | Granting access to protected resources for authorised users | http://host:port/opensso, http://host:port/opensso-pol
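The URL templates lend themselves to a simple lookup table. The sketch below is illustrative only: the dictionary keys and the function name `resource_url` are our own, and real deployments may add a path prefix (the AMBIT examples in this deck use /ambit2).

```python
# URL templates transcribed from the table; {id} stands for the resource identifier.
URL_TEMPLATES = {
    "compound": "http://{host}:{port}/compound/{id}",
    "feature": "http://{host}:{port}/feature/{id}",
    "dataset": "http://{host}:{port}/dataset/{id}",
    "model": "http://{host}:{port}/model/{id}",
    "algorithm": "http://{host}:{port}/algorithm/{id}",
    "validation": "http://{host}:{port}/validation/{id}",
    "report": "http://{host}:{port}/report/{id}",
    "task": "http://{host}:{port}/task/{id}",
}


def resource_url(kind, host, port, resource_id):
    """Expand the template for one resource kind into a concrete URL."""
    return URL_TEMPLATES[kind].format(host=host, port=port, id=resource_id)
```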
OpenTox REST operations
Individual resources (e.g. a dataset or a model)
• URI template: http://host:port/{resource}/{resourceid}, e.g. http://host:port/model/{model_id} or http://host:port/dataset/{dataset_id}
• GET - retrieve a representation of the resource
• PUT - update the representation of the resource
• POST
  - replace the representation of the resource with a new one (e.g. replace the dataset with new content)
  - initiate calculations based on this resource (e.g. submit a dataset URI to an algorithm resource and obtain a model URI as a result)
• DELETE - delete the resource
Collections of resources (e.g. the list of all available models or datasets)
• URI template: http://host:port/{resource} (e.g. http://host:port/model or http://host:port/dataset)
• GET - retrieve a representation of multiple resources (e.g. retrieve all available algorithms)
• PUT - N/A
• POST - create a new resource and return its URI (e.g. create a new dataset by submitting new dataset content to the dataset service)
• DELETE - N/A
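The distinction between individual resources and collections can be expressed as a small lookup. This is a sketch under a simplifying assumption: the URL path is exactly /{resource} or /{resource}/{id}, with no deployment prefix. The names `OPERATIONS`, `resource_kind` and `is_supported` are illustrative, not part of the OpenTox API.

```python
from urllib.parse import urlparse

# Operations per resource kind, transcribed from the slide.
OPERATIONS = {
    "individual": {"GET", "PUT", "POST", "DELETE"},
    "collection": {"GET", "POST"},
}


def resource_kind(url):
    """Classify a URL as a collection (/resource) or an individual resource (/resource/id)."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return "collection" if len(segments) == 1 else "individual"


def is_supported(method, url):
    """Tell whether the slide lists this HTTP method for this kind of URL."""
    return method.upper() in OPERATIONS[resource_kind(url)]
```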
Build a predictive model
[Diagram: the Dataset service (structures, descriptors, endpoints) provides the dataset http://host1/dataset/{id}; an HTTP POST to the Algorithm service (regression, classification, quantum chemistry, descriptors, etc.) runs calculations with the dataset, creates a model and returns the model URL http://host1/model/{id}. The Model service, the Validation service (validation/{id}) and the Ontology service (published models, algorithms, ontologies, metadata; registration of model/{id} via HTTP POST) complete the workflow]
Read data from a web address - process - write to a web address
Uniform approach to model creation
[Diagram: Dataset (http://myhost.com/dataset/trainingset1) + Algorithm (http://myhost.com/algorithm/neuralnetwork) = Model (http://myhost.com/model/predictivemodel1); the Feature, Compound, Dataset, Algorithm and Model resources each support GET, POST, PUT and DELETE]
Use an algorithm to build a model
• An algorithm is applied by submitting an HTTP POST to the algorithm URI and providing the required parameters
• A common required parameter is dataset_uri=http://host:port/dataset/{datasetid}, which specifies the data set to be operated on
• HTTP POST in REST style services returns the URI of the result, not the content of the result
• The algorithm services are designed to store the results in a dataset service and return the URL of the resulting dataset
• In case of slow calculations, a Task URI is returned instead of the dataset URI
$ curl -H "Accept:text/uri-list" -X POST -d "dataset_uri=http://apps.ideaconsult.net:8080/ambit2/dataset/1037" -d "prediction_feature=http://apps.ideaconsult.net:8080/ambit2/feature/26701" -d "dataset_service=http://apps.ideaconsult.net:8080/ambit2/dataset" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48 -iv
* Connected to opentox.informatik.tu-muenchen.de (131.159.28.16) port 8080 (#0)
> POST /OpenTox-dev/algorithm/J48 HTTP/1.1
> Host: opentox.informatik.tu-muenchen.de:8080
> Accept: text/uri-list
> Content-Type: application/x-www-form-urlencoded
< HTTP/1.1 202 Accepted
< Date: Sat, 31 Jul 2010 14:46:38 GMT
< Location: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5
< Accept-Ranges: bytes
< Server: Noelios-Restlet-Engine/1.1.snapshot
< Content-Type: text/uri-list;charset=ISO-8859-1
< Content-Length: 99
<
* Connection #0 to host opentox.informatik.tu-muenchen.de left intact
* Closing connection #0
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5
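The same POST can be prepared programmatically. The following sketch builds the form-encoded request with the standard library and does not send it; the function name `algorithm_request` is ours, and the parameter names are the ones that appear in the curl command above.

```python
from urllib.parse import urlencode
from urllib.request import Request


def algorithm_request(algorithm_uri, dataset_uri, **extra):
    """Build a POST that asks an algorithm service to process a dataset.

    Extra keyword arguments (e.g. prediction_feature, dataset_service)
    are added to the form as further parameters.
    """
    form = {"dataset_uri": dataset_uri, **extra}
    return Request(
        algorithm_uri,
        data=urlencode(form).encode("ascii"),
        method="POST",
        headers={
            "Accept": "text/uri-list",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

Passing the result to `urllib.request.urlopen` would perform the call; the response is then either the result URI or, for slow calculations, a task URI.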
Resources The model
$ curl -iv -H "Accept:text/uri-list" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c
* About to connect() to opentox.informatik.tu-muenchen.de port 8080 (#0)
*   Trying 131.159.28.16... connected
* Connected to opentox.informatik.tu-muenchen.de (131.159.28.16) port 8080 (#0)
> GET /OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5 HTTP/1.1
> User-Agent: curl/7.18.2 (x86_64-pc-linux-gnu) libcurl/7.18.2 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.8 libssh2/0.18
> Host: opentox.informatik.tu-muenchen.de:8080
> Accept: text/uri-list
>
< HTTP/1.1 200 OK
< Date: Sat, 31 Jul 2010 14:47:22 GMT
< Location: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48
< Vary: Accept-Charset, Accept-Encoding, Accept-Language, Accept
< Accept-Ranges: bytes
< Server: Noelios-Restlet-Engine/1.1.snapshot
< Content-Type: text/uri-list;charset=ISO-8859-1
< Content-Length: 86
<
* Connection #0 to host opentox.informatik.tu-muenchen.de left intact
* Closing connection #0
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48
• When a task URI is returned, the status code is HTTP 202 Accepted instead of HTTP 200 OK
• This tells the client that processing is not completed and that the client needs to poll the task URI until the OK code is returned
• The final result returned in the example is the URI of the new model: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48
• To obtain prediction results, POST a dataset to the model URI
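The polling rule (repeat while the service answers 202, stop at 200) might be coded as follows. This is a sketch: `fetch` is a hypothetical callable supplied by the caller that performs the GET and returns a `(status_code, body)` pair, and the interval and retry limit are arbitrary defaults rather than values prescribed by OpenTox.

```python
import time


def poll_task(fetch, task_uri, interval=1.0, max_polls=60):
    """Poll a task URI until the service reports completion.

    Returns the response body of the final 200 OK answer, which in the
    example above is the URI of the created model.
    """
    for _ in range(max_polls):
        status, body = fetch(task_uri)
        if status == 200:          # finished: body holds the result URI
            return body.strip()
        if status != 202:          # anything but "still running" is an error
            raise RuntimeError(f"task failed with HTTP {status}")
        time.sleep(interval)       # 202 Accepted: wait and ask again
    raise TimeoutError(f"task {task_uri} did not finish after {max_polls} polls")
```

Injecting `fetch` keeps the loop independent of any particular HTTP library and makes it easy to exercise without a live service.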
Read data from a web address - process - write to a web address
Uniform approach to data processing (e.g. descriptor calculation)
[Diagram: Dataset (http://myhost.com/dataset/trainingset1) + Algorithm (http://myhost.com/algorithm/descriptorX) = Dataset (http://myhost.com/dataset/results); the Feature, Compound, Dataset and Algorithm resources each support GET, POST, PUT and DELETE]
Read data from a web address - process - write to a web address
Uniform approach to model validation and report generation
[Diagram: Dataset (http://myhost.com/dataset/trainingset1) + Model generating predictions (http://myhost.com/model/predictivemodel1, predictions in http://myhost.com/dataset/predictedresults1) = Validation (http://myhost.com/validation) and Validation report (http://myhost.com/report/1); the Dataset, Model, Validation and Report resources each support GET, POST, PUT and DELETE]
Apply predictive models
[Diagram: retrieve the available endpoints and model URLs (e.g. http://host1/model/{id}) from the Ontology service (published models, algorithms, ontologies, metadata) via SPARQL over HTTP POST; apply the model http://host1/model/{id} to the dataset http://host2/dataset/{id} by an HTTP POST to the Model service, which returns the results dataset URL http://host/dataset/{id}]
Read data from a web address - process - write to a web address
Uniform approach to model prediction
[Diagram: Dataset (http://myhost.com/dataset/id1) + Model (http://myhost.com/model/predictivemodel1) = Dataset (http://myhost.com/dataset/results1); the Feature, Compound, Dataset and Model resources each support GET, POST, PUT and DELETE]
Uniform access to data
Publish your data, retrieve linked data
[Diagram: the Dataset service (HTTP GET, POST) handles structures, endpoints and predictions. Upload data and receive a dataset URL (http://host2/dataset/{id}); find chemical compounds and receive a dataset URL (http://host2/compound/{id}); annotation links to the Ontology service (published models, algorithms, ontologies, metadata). The Feature, Compound and Dataset resources each support GET, POST, PUT and DELETE]
RDF - Resources representation
• The opentox.owl ontology
  - A common OWL data model of all OpenTox resources
  - Describes OpenTox resources
  - Describes relationships between them
  - Generates RDF representations of objects
• RDF/XML representation is mandatory for OpenTox resources
• Uniform approach to data representation
  - Calculated and measured properties of chemical compounds are represented in a uniform way
  - Linked to the resource used for data generation
  - Annotated via ontology entries
  - Model representations link to the algorithms and data used
All OpenTox components are defined by an OWL ontology: http://opentox.org/api/1.1/opentox.owl
All resources are subclasses of ot:OpenToxResource
Resources Chemical compound
Compound
Provides different representations for chemical compounds with a unique and defined chemical structure: /compound/{id}
Conformer: /compound/{id}/conformer/{id}
Documentation: http://opentox.org/dev/apis/api-1.1/structure
Representation
• A subclass of ot:OpenToxResource
• Supports different Chemical MIME formats
• RDF representation only for specifying owl:sameAs links to external resources
Example 1: Retrieve compound as MOL
$ curl -H "Accept:chemical/x-mdl-molfile" http://apps.ideaconsult.net:8080/ambit2/compound/1
CH2O
APtclcactv09040902283D 0 0.00000 0.00000
4 3 0 0 0 0 0 0 0 0999 V2000
-0.6004 0.0000 0.0001 O 0 0 0 0 0 0 0 0 0 0 0 0
0.6072 0.0000 -0.0004 C 0 0 0 0 0 0 0 0 0 0 0 0
1.1472 0.9353 0.0016 H 0 0 0 0 0 0 0 0 0 0 0 0
1.1472 -0.9353 0.0016 H 0 0 0 0 0 0 0 0 0 0 0 0
1 2 2 0 0 0 0
2 3 1 0 0 0 0
2 4 1 0 0 0 0
Example 2: Retrieve compound as SMILES
$ curl -H "Accept:chemical/x-daylight-smiles" http://apps.ideaconsult.net:8080/ambit2/compound/1
O=C
Example 3: Query compounds
$ curl -H "Accept:chemical-mime" http://apps.ideaconsult.net:8080/ambit2/query/compound/any-identifier-or-keyword
$ curl -H "Accept:chemical-mime" http://apps.ideaconsult.net:8080/ambit2/query/smarts?search=smarts
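The examples rely on HTTP content negotiation: the same compound URI yields a different representation depending on the Accept header. A minimal sketch of that idea, with a format table limited to the two MIME types shown above (the names `COMPOUND_FORMATS` and `compound_request` are ours):

```python
from urllib.request import Request

# Short format names mapped to the Chemical MIME types used in the examples.
COMPOUND_FORMATS = {
    "smiles": "chemical/x-daylight-smiles",
    "mol": "chemical/x-mdl-molfile",
}


def compound_request(compound_uri, fmt):
    """Build a GET request asking for one representation of a compound."""
    return Request(compound_uri, headers={"Accept": COMPOUND_FORMATS[fmt]})
```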
Resources Feature
Feature
A Feature is a resource representing any kind of property or identifier assigned to a Compound: /feature/{id}
The feature types are determined via their links to ontologies (feature ontologies, descriptor ontologies, endpoint ontologies)
Documentation: http://opentox.org/dev/apis/api-1.1/feature
Representation
ot:Feature, a subclass of ot:OpenToxResource
Mandatory RDF/XML format
Properties
• Name: defined by dc:title (Dublin Core namespace)
• Units: defined by the ot:units annotation property (OpenTox namespace)
• Creator: defined by the dc:creator annotation property (Dublin Core namespace)
• The origin of the Feature is defined by the ot:hasSource object property (OpenTox namespace) and can be an ot:Algorithm, ot:Model or ot:Dataset
• Relations to other resources which represent the same entity can be established via the owl:sameAs property. This approach can be used, for example, to link the ot:Feature resource to a resource from another ontology (an example follows)
• The subclasses ot:NumericFeature, ot:StringFeature and ot:NominalFeature denote whether a feature holds numeric, string or nominal values
Resources Feature (an example)
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/feature/22114
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns:otee="http://www.opentox.org/echaEndpoints.owl#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns="http://apps.ideaconsult.net:8080/ambit2/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#Algorithm"/>
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#Feature"/>
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#NumericFeature">
    <rdfs:subClassOf rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </owl:Class>
  <ot:NumericFeature rdf:about="feature/22114">
    <ot:hasSource>
      <ot:Algorithm rdf:about="algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor"/>
    </ot:hasSource>
    <owl:sameAs rdf:resource="otee:Octanol-water_partition_coefficient_Kow"/>
    <dc:title>XLogP</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </ot:NumericFeature>
</rdf:RDF>
The example shows an OpenTox feature with the title "XLogP", identified by the URI http://apps.ideaconsult.net:8080/ambit2/feature/22114
• Linked to an entry of a simplified ontology of toxicological endpoints
The algorithm used to generate values for this feature
• Specified by the ot:hasSource property
• Identified by the URI http://apps.ideaconsult.net:8080/ambit2/algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor
• Note that the URI identifies an OpenTox Algorithm resource
• The algorithm URI itself is dereferenceable
• Can be used to initiate calculations of the XLogP descriptor
Resources Feature (an example)
$ curl -H "Accept:text/n3" http://apps.ideaconsult.net:8080/ambit2/feature/26184
@prefix ot: <http://www.opentox.org/api/1.1#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix : <http://apps.ideaconsult.net:8080/ambit2/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix af: <http://apps.ideaconsult.net:8080/ambit2/feature/> .
ot:hasSource
    a owl:ObjectProperty .
ot:units
    a owl:DatatypeProperty .
ot:Feature
    a owl:Class .
ot:NumericFeature
    a owl:Class ;
    rdfs:subClassOf ot:Feature .
af:26184
    a ot:Feature , ot:NumericFeature ;
    dc:creator "http://ambit.uni-plovdiv.bg:8080/ambit2/" ;
    dc:title "TUM_CDK_XLogP" ;
    ot:hasSource <http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem/XLogPDescriptor> ;
    ot:units "" ;
    = otee:Octanol-water_partition_coefficient_Kow .
An example (N3) of another ot:Feature resource: the XLogP descriptor (again)
• Generated by a different implementation
• In this case its name is "TUM_CDK_XLogP", and the algorithm resource used to generate it resides at the Technical University of Munich (TUM): http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem/XLogPDescriptor
This algorithm URL could also be used to initiate descriptor calculations
• The representation of Algorithm resources refers to the Blue Obelisk ontology entry http://www.blueobelisk.org/ontologies/chemoinformatics-algorithms/#xlogP
Toxicity endpoints ontologies
• Derived from the ECHA classification of endpoints published in the REACH guidance documents
• Physicochemical properties and various toxicological endpoints
• The hierarchy doesn't represent the complexity of toxicological assays, but can be used as a first approximation to assign meaning to the data entries and to generate a REACH report
• More specific descriptions of toxicological assays can be used as well
• Ontologies for specific toxicity assays are developed by OpenTox partners
• The ECHA endpoints ontology: http://www.opentox.org/echaEndpoints.owl
Resources a feature
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/feature/3
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns="http://apps.ideaconsult.net:8080/ambit2/"
    xmlns:af="http://apps.ideaconsult.net:8080/ambit2/feature/"
    …
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Feature rdf:about="feature/3">
    <dc:creator/>
    <ot:hasSource>ECHA…</ot:hasSource>
    <owl:sameAs rdf:resource="http://www.opentox.org/api/1.1#EINECS"/>
    <ot:units></ot:units>
    <dc:title>EC</dc:title>
  </ot:Feature>
</rdf:RDF>
An illustration of an ot:Feature imported from a file rather than calculated
The example shows a feature representing the EINECS number, imported from the ECHA preregistration list
Feature summary
• An OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs
• These URIs are dereferenceable
• Features allow different levels of meaning to be assigned by linking to ontology entries (e.g. algorithms or toxicological endpoints), as well as by linking to the algorithms that can be used to generate property values
• The same approach can be used to denote assays (provided that the assay is defined by an ontology), species, functional groups, etc.
Resources Dataset
Dataset
Provides access to chemical compounds and their features (e.g. structural, physical-chemical, biological, toxicological properties)
Operations
• POST - Upload a dataset
• PUT - Update the dataset content
• DELETE - Remove the dataset
Representation
RDF/XML (mandatory)
• The dataset consists of data entries
• Each entry is associated with exactly one chemical compound, identified by its URI and available via the OpenTox Compound service API
• One and the same compound can be associated with multiple dataset entries
• Every "column" is associated with a Feature; its representation should be available via the OpenTox Feature API
The dataset services optionally support formats other than RDF
• text/csv (comma delimited)
• text/x-arff (Weka ARFF)
• application/pdf
• chemical/x-mdl-sdfile
• other Chemical MIME formats
This allows retrieving the same data in a convenient format, but the URL links to compound and feature resources are lost
@prefix ad: <http://apps.ideaconsult.net:8080/ambit2/dataset/> .
@prefix af: <http://apps.ideaconsult.net:8080/ambit2/feature/> .
@prefix ot: <http://www.opentox.org/api/1.1#> .
…
ad:9 a ot:Dataset ;
    ot:dataEntry
        [ a ot:DataEntry ;
          ot:compound <http://apps.ideaconsult.net:8080/ambit2/compound/413/conformer/409421> ;
          ot:values
              [ a ot:FeatureValue ;
                ot:feature af:21576 ;
                ot:value "3.309999942779541"^^xsd:double
              ] ;
          ot:values
              [ a ot:FeatureValue ;
                ot:feature af:21573 ;
                ot:value "3.0"^^xsd:double
              ]
        ] .
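The relation between data entries and the tabular formats can be made concrete with a small sketch. Here each data entry is modelled as a Python dictionary holding one compound URI and a mapping from feature URI to value; this in-memory structure and the function name `to_csv` are our simplification, not the OpenTox data model itself.

```python
import csv
import io


def to_csv(entries):
    """Flatten data entries into CSV: one row per entry, one column per feature."""
    # Collect every feature URI that occurs in any entry; these become the columns.
    features = sorted({f for entry in entries for f in entry["values"]})
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["compound"] + features)
    for entry in entries:
        # Missing feature values are written as empty cells.
        row = [entry["values"].get(f, "") for f in features]
        writer.writerow([entry["compound"]] + row)
    return out.getvalue()
```

In this sketch the URIs survive as plain strings in the header and first column; a CSV export that replaces them with titles or names is more readable, but then the links to the compound and feature resources can no longer be followed.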
Dataset metadata and features
Ideaconsult Ltd32
Description URI Template
Retrieve entire dataset content If uri-list
retrieve only compound URIs
httphostportdatasetid
Retrieve representation of features (columns)
of the dataset
httphostportdatasetidfeature
Retrieves dataset metadata (name etc) httphostportdatasetidmetadata
$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata
ltrdfRDF
xmlnsot=httpwwwopentoxorgapi11
helliphellip
xmlnsrdfs=httpwwww3org200001rdf-schema
xmlbase=httpappsideaconsultnet8080ambit2gt
ltotDataset rdfabout=dataset9gt
ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt
ltdcpublishergtsomebodyltdcpublishergt
ltrdfsseeAlsogt
ltbxEntry rdfabout=reference20117gt
ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt
ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt
ltbxEntrygt
ltrdfsseeAlsogt
ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt
ltotDatasetgt
ltrdfRDFgt
Data publishing
Upload data receive
dataset URL
httphost2datasetid
AnnotationFind chemical compounds
return dataset URL
httphost2compoundid
HTTP GET POST
Dataset service
Structures
endpoints
amppredictions
1)POST a file with chemical structures and properties to
OpenTox dataset servicebullThe structures and data are assigned a dataset URL and
become available by multiple formats (RDF Chemical
MIME CSV Weka ARFF)
2)Assign metadata
bullPUT datasetidmetadata
3)Annotate any of dataset features
datasetidfeature by assigning links to relevant
ontologies
bullPUT featureid
Toxiciology related
ontologies
Algorithms
ontologies
Resources: Algorithm
Algorithm
Provides access to OpenTox algorithms. There are several algorithm services developed by different OpenTox partners. The list of algorithms can be retrieved by an HTTP GET operation at http://host:port/algorithm
$ curl -H "Accept:text/uri-list" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNclassification
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/PLSregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/M5P
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/GaussP
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/FTMsmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/gSpansmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/JOELIB2
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/InfoGainAttributeEval
Resources: Algorithm
Representation
• Multiple types of algorithms
– descriptor calculation algorithms
– machine learning procedures
– data preprocessing
• The representation of algorithms is again defined by the OpenTox ontology, where all algorithms are subclasses of ot:Algorithm
Algorithm types ontology
http://opentox.org/data/documents/development/RDF%20files/AlgorithmTypes
• Provides a classification of algorithm types
• The algorithm type in the RDF representation is set by assigning (via rdf:type) a class from the algorithm types ontology (ota: http://www.opentox.org/algorithms.owl#), e.g. <myalgorithm> rdf:type ota:Classification
Resources: Algorithm
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/algorithm/J48
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ot="http://www.opentox.org/api/1.1#"
    …
    xmlns:ota="http://www.opentox.org/algorithmTypes.owl#">
  <ota:Supervised rdf:about="http://apps.ideaconsult.net:8080/ambit2/algorithm/J48">
    <dc:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Classification Decision tree J48</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#Classification"/>
    <dc:description rdf:datatype="http://www.w3.org/2001/XMLSchema#string"></dc:description>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#SingleTarget"/>
    <dc:publisher rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">Somebody</dc:publisher>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#EagerLearning"/>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Algorithm"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">Sat Jul 31 17:11:26 EEST 2010</dc:date>
  </ota:Supervised>
</rdf:RDF>
Representation
• The algorithm name is defined by dc:title
• Parameters supported by the algorithm are specified via the object property ot:parameters and should be of class ot:Parameter (as defined in opentox.owl). These entries serve as information about which parameters are required in order to run the algorithm; the values themselves should be provided by the client when initiating the calculations via POST
• Algorithm types are distinguished by means of the Algorithm types ontology
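To illustrate how a client might read the algorithm name and types from such a representation, here is a small sketch using only the Python standard library. It is an assumption-laden illustration: the sample document is a shortened, hand-written version of the J48 response with the dc namespace declared explicitly, the function name is hypothetical, and the code only handles this particular RDF/XML shape (a general client would more likely use a full RDF parser).

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
OTA = "http://www.opentox.org/algorithmTypes.owl#"

# Shortened, hand-written sample modelled on the J48 example (assumption).
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}" xmlns:ota="{OTA}">
  <ota:Supervised rdf:about="http://apps.ideaconsult.net:8080/ambit2/algorithm/J48">
    <dc:title>Classification Decision tree J48</dc:title>
    <rdf:type rdf:resource="{OTA}Classification"/>
    <rdf:type rdf:resource="{OTA}EagerLearning"/>
  </ota:Supervised>
</rdf:RDF>"""


def describe_algorithm(rdf_xml: str) -> dict:
    """Return URI, title and type URIs of the first typed node in the document."""
    root = ET.fromstring(rdf_xml)
    node = root[0]
    # ElementTree writes qualified names as "{namespace}local"; join them into a URI.
    types = [node.tag[1:].replace("}", "", 1)]
    types += [t.get(f"{{{RDF}}}resource") for t in node.findall(f"{{{RDF}}}type")]
    return {
        "uri": node.get(f"{{{RDF}}}about"),
        "title": node.findtext(f"{{{DC}}}title"),
        "types": types,
    }
```

Calling `describe_algorithm(SAMPLE)` yields the algorithm URI, its dc:title and the list of type URIs, with the element's own class (ota:Supervised) first.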
Resources: Model
• Representations of predictive models
• A Model is created by HTTP POST to an ot:Algorithm with specific parameters and/or an input ot:Dataset
Representation
• The Model name is defined by the dc:title property
• The Model creator might be defined by the dc:creator property
• The date of Model creation is defined by the dc:date property
• The Algorithm is defined by the ot:algorithm object property
• The independent variables are instances of ot:Feature and are defined by the ot:independentVariables property (can be multiple)
• The dependent variables are instances of ot:Feature and are defined by the ot:dependentVariables property (can be multiple)
• The variables where prediction results will be stored are instances of ot:Feature and are defined by ot:predictedVariables (can be multiple)
• Parameters are defined by ot:parameters
• The training Dataset is an instance of ot:Dataset and is defined by ot:trainingDataset
Linked resources: Compound, Algorithm, Model, Dataset, Features
[Diagram: Algorithm service, Model resource, Dataset resource, Descriptor resource, Assay resource, Chemical compound, Feature service and Compound service, linked to each other]
[Diagram, continued: Dataset resource, Descriptor resource, Assay resource and Chemical compound linked to ontologies: the Blue Obelisk algorithms ontology (regression, classification, quantum chemistry, descriptors etc.), the OpenTox algorithm types ontology and toxicology-related ontologies]
Make the model available
Register at the OpenTox ontology service
– RDF triple storage
– Accepts HTTP POST
– SPARQL endpoint
$ curl -X POST -d "uri=http://apps.ideaconsult.net:8080/ambit2/model/57" http://apps.ideaconsult.net:8080/ontology
The model becomes visible for applications
[Diagram: the Model service registers model/{id} via HTTP POST at the Ontology service (published models, algorithms, ontologies, metadata)]
Services implementation by partner and service type
Service types (columns): Compound; Dataset; Feature; Algorithm (processing); Algorithm (model); Model; Task; Report; Validation; Authentication and Authorisation service; Ontology service
Partner No | Services implemented
2 | Y Y Y Y Y Y
3 (IDEA) | Y Y Y Y Y Y Y Y
5 | Y Y Y
6 | Y Y Y Y
7 | Y Y Y
10 | Y
All components are implemented as REST web services. There could be multiple implementations of the same type of component. (A subset of) services could be hosted by the same provider or by multiple providers at separate locations.
http://apps.ideaconsult.net:8080/ambit2/
OpenTox Is A Framework
Framework / Unified Access / Open Source
• Toxicity data
• Predictive models
• Validation support
• Interpretation aids
• Toxicologists
• Modelers
• API for new algorithms development & integration
• To optimise impact
• To allow inspection, review
• To attract external contributors
OpenTox services can be used to develop specific applications or embedded in workflow systems
Demo applications
• Two end-user oriented demo applications making use of OpenTox web services have been developed and deployed, and are available for testing: http://toxcreate.org and http://toxpredict.org
• ToxCreate creates models from user-supplied datasets
• ToxPredict uses existing OpenTox models to estimate chemical compound properties
RDF: Lessons learned
• OpenTox specific: it hasn't started as a Linked Data/RDF project
• The REST and RDF mix is not (yet) popular… but it is natural to be able to retrieve a (partial) resource representation described by triples
• Steep learning curve
• Some hard topics
– Data model vs. format
– The subject-predicate-object concept vs. tabular/hierarchical/other implicit structure
– The recognition of the added value
• XML, JSON, YAML, plain text etc. vs. RDF
RDF: Performance
RDF representation is verbose … and in-memory RDF libraries are slow … lack of streaming parsers/writers …
[Charts: a dataset with 320 chemicals, 60 columns; a dataset with 6500 chemicals, 12 columns]
RDF: Wish list
• Convenient explanation of the subject-predicate-object concept for beginners
• A (high-performance) triple store should not be a mandatory requirement to publish RDF data
• Fast streaming parsers and writers
• (Terse) JSON serialisation
• Security
• Synchronisation of distributed RDF content
Thank you
Build an application with the OpenTox REST Web Services API: http://opentox.org/dev/apis/api-1.1
Download the AMBIT implementation of the OpenTox API and launch your OpenTox service: http://ambit.sourceforge.net
The framework
• OpenTox API
– The way applications talk to each other
– The way developers talk to applications
– http://opentox.org/dev/apis/api-1.1
• The basic building blocks
– data, chemical structures, algorithms and models
• Functionality offered
– build models
– apply models
– validate models
– access and query data in various ways
• Technologies
– REST style web services
– RDF for description of resources
– Links to existing and newly developed ontologies (mainly to describe metadata) about resources
[Diagram: each resource type (Feature, Compound, Dataset, Ontology, Algorithm, Model, AppDomain, Validation, Report) supports GET, POST, PUT and DELETE. Clients (user interface, standalone, browsers etc.) talk through web services to the Data service (all kinds of data), the Models service, the Validation service, the Reporting service and the Ontology (registration, queries), with Security applied across the web services]
Representational State Transfer (REST)
A software architecture style defined by Roy Fielding in his PhD thesis (2000). Many services worldwide offer a REST API. There are (currently) no standards for RESTful applications, but merely design guides.
Design principles
• Resource oriented
– Every object (resource) is named and addressable (e.g. by an HTTP URL). Example: http://example.opentox.com/model/myBestModel, http://example.opentox.com/compound/50-00-0
– RESTful API design starts by identifying the most important objects and groups of objects supported by the software system and proceeds by defining URL patterns
• Transport protocol
– HTTP is the most popular choice of transport protocol, but other protocols can be used as well
• Operations
– All resources (nouns) support the same fixed and universal set of operations (verbs). The HTTP operations (GET, POST, PUT, DELETE) are the common choice when the transport protocol is HTTP
• Hypermedia as the Engine of Application State
– All resources should be reachable via a single (or a minimal number of) entry points into a RESTful application. Thus a representation of a resource should return hypermedia links to related resources
• Error codes (for each resource/operation pair)
– HTTP status codes (e.g. 200 OK, 400 Bad Request, 404 Not Found etc.) are usually used
OpenTox resources (1)
OpenTox considers the following set of entities as essential building blocks:
• Structures of chemical compounds
• Properties and identifiers of chemical compounds
• Datasets of chemical compounds and various properties (measured or calculated)
• Algorithms
– Data processing algorithms
  • Algorithms generating certain values based on chemical structure (e.g. descriptor calculation)
  • Data preprocessing (e.g. principal component analysis, feature selection)
  • Structure processing (e.g. structure optimization)
  • Algorithms relating a set of structures to another set of structures (e.g. similarity search or metabolite generation)
– Machine learning algorithms
  • Supervised (e.g. regression, classification)
  • Unsupervised (e.g. clustering)
– Prediction algorithms defined by experts (e.g. series of structural alerts defined by human experts, not derived by learning algorithms)
OpenTox resources (2)
• Models are generated by the respective algorithms, given specific parameters
• Statistical models are generated by applying statistical/machine learning algorithms to a specific dataset and parameters
• Models can be other than statistical, e.g. expert-defined rules, quantum mechanical calculations, metabolite generation etc. The intention of the framework is to be generic enough to accommodate varieties of predictive models
• Validation provides procedures independent of model building facilities (e.g. cross-validation) and generates relevant statistics
• Reports
– Various types of reports might be generated using the building blocks above (e.g. a validation report can be generated using a validation object, a model and a dataset)
• In addition, the following components are introduced
– Task (asynchronous processing of computationally intensive tasks)
– Authentication and authorization (ensuring secure access to sensitive resources)
– Ontology service (provides an RDF storage and SPARQL endpoint for resources registration)
Resources identification
All resources are identified via a unique web address, assigned according to the URL templates
Component | Description | URL template (example)
Compound | Representations of chemical compounds | http://host:port/compound/{compoundid}
Feature | Properties and identifiers | http://host:port/feature/{featureid}
Dataset | Encapsulates a set of chemical compounds and their property values | http://host:port/dataset/{datasetid}
Model | OpenTox model services | http://host:port/model/{modelid}
Algorithm | OpenTox algorithm services | http://host:port/algorithm/{algorithmid}
Validation, Report | A validation corresponds to the validation of a model on a test dataset | http://host:port/validation/{validationid}, http://host:port/report/{reportid}
Task | Asynchronous jobs are handled via an intermediate Task resource. A resource submitting an asynchronous job should return the URI of the task | http://host:port/task/{taskid}
Ontology service | Provides storage and SPARQL search functionality for objects defined in OpenTox services and relevant ontologies | http://host:port/ontology
Authentication and authorisation | Granting access to protected resources for authorised users | http://host:port/opensso, http://host:port/opensso-pol
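As a small illustration of the URL templates, the following sketch builds resource addresses from a service root, a component name and an optional identifier. The function name and the restriction to the listed components are assumptions made for the example; the path segments mirror the component names in the templates.

```python
from urllib.parse import quote

# Component path segments, following the URL templates in the table (assumed spelling).
COMPONENTS = (
    "compound", "feature", "dataset", "model",
    "algorithm", "validation", "report", "task",
)


def resource_uri(base: str, component: str, resource_id=None) -> str:
    """Build a collection URI (no id) or an individual resource URI (with id)."""
    if component not in COMPONENTS:
        raise ValueError(f"unknown component: {component}")
    uri = f"{base.rstrip('/')}/{component}"
    if resource_id is not None:
        # Percent-encode the identifier so it stays a single path segment.
        uri += "/" + quote(str(resource_id), safe="")
    return uri
```

For example, `resource_uri("http://host:8080", "dataset", 9)` gives the address of one dataset, while omitting the identifier gives the address of the dataset collection.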
OpenTox REST operations
Individual resources (e.g. a dataset or a model)
• URI template: http://host:port/{resource}/{resourceid}, e.g. http://host:port/model/{model_id} or http://host:port/dataset/{dataset_id}
• GET – retrieve representation of the resource
• PUT – update representation of the resource
• POST
– replace representation of the resource with a new one (e.g. replace the dataset with new content)
– initiate calculations based on this resource (e.g. submit a dataset URI to an algorithm resource and obtain a model URI as a result)
• DELETE – delete the resource
Collections of resources (e.g. list of all available models or datasets)
• URI template: http://host:port/{resource} (e.g. http://host:port/model or http://host:port/dataset)
• GET – retrieve representation of multiple resources (e.g. retrieve all available algorithms)
• PUT – N/A
• POST – create a new resource and return its URI (e.g. create a new dataset by submitting new dataset content to the dataset service)
• DELETE – N/A
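The operation rules above can be written down as a lookup table. The sketch below is only an illustration: the function name is hypothetical, and it assumes that a path with a single segment addresses a collection while a longer path addresses an individual resource.

```python
INDIVIDUAL = {
    "GET": "retrieve representation of the resource",
    "PUT": "update representation of the resource",
    "POST": "replace the representation or initiate calculations",
    "DELETE": "delete the resource",
}

COLLECTION = {
    "GET": "retrieve representation of multiple resources",
    "POST": "create a new resource and return its URI",
}


def describe_operation(method: str, path: str) -> str:
    """Return the meaning of an HTTP method for a resource path, or raise if N/A."""
    segments = [s for s in path.split("/") if s]
    if not segments:
        raise ValueError("path must name a resource type")
    # Assumption: "/model" is a collection, "/model/57" an individual resource.
    table = COLLECTION if len(segments) == 1 else INDIVIDUAL
    try:
        return table[method.upper()]
    except KeyError:
        raise ValueError(f"{method} is not applicable to {path}") from None
```

PUT and DELETE on a collection raise `ValueError`, which corresponds to the N/A entries in the list.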
Build a predictive model
[Diagram: a dataset (structures, descriptors, endpoints) at http://host1/dataset/{id} in the Dataset service is submitted via HTTP POST to the Algorithm service (regression, classification, quantum chemistry, descriptors etc.), which runs the calculations, creates a model and returns the model URL http://host1/model/{id}. The Model service (model/{id}) and the Validation service (validation/{id}) are shown alongside; models are registered via HTTP POST at the Ontology service (published models, algorithms, ontologies, metadata)]
Uniform approach to models creation
Read data from a web address – process – write to a web address
[Diagram: Dataset (with Feature and Compound resources) + Algorithm = Model, each resource supporting GET, POST, PUT, DELETE: http://myhost.com/dataset/trainingset1 + http://myhost.com/algorithm/neuralnetwork = http://myhost.com/model/predictivemodel1]
Use an algorithm to build a model
• An algorithm is applied by submitting an HTTP POST to the algorithm URI and providing the required parameters
• A common required parameter is dataset_uri=http://host:port/dataset/{datasetid}, which specifies the dataset to be operated on
• HTTP POST in REST style services returns the URI of the result and not the content of the result
• The algorithm services are designed to store the results into a dataset service and return the URL of the resulting dataset
• In case of slow calculations, a Task URI is returned instead of the dataset URI
$ curl -H "Accept:text/uri-list" -X POST -d "dataset_uri=http://apps.ideaconsult.net:8080/ambit2/dataset/1037" -d "prediction_feature=http://apps.ideaconsult.net:8080/ambit2/feature/26701" -d "dataset_service=http://apps.ideaconsult.net:8080/ambit2/dataset" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48 -iv
* Connected to opentox.informatik.tu-muenchen.de (131.159.28.16) port 8080 (#0)
> POST /OpenTox-dev/algorithm/J48 HTTP/1.1
> Host: opentox.informatik.tu-muenchen.de:8080
> Accept:text/uri-list
> Content-Type: application/x-www-form-urlencoded
< HTTP/1.1 202 Accepted
< Date: Sat, 31 Jul 2010 14:46:38 GMT
< Location: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5
< Accept-Ranges: bytes
< Server: Noelios-Restlet-Engine/1.1.snapshot
< Content-Type: text/uri-list;charset=ISO-8859-1
< Content-Length: 99
<
* Connection #0 to host opentox.informatik.tu-muenchen.de left intact
* Closing connection #0
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5
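The same POST can be assembled programmatically. The sketch below mirrors the parameters of the curl call (dataset_uri, prediction_feature, dataset_service) using only the Python standard library; the function name is hypothetical, and the request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def training_request(algorithm_uri, dataset_uri, prediction_feature, dataset_service=None):
    """Build the form-encoded POST that asks an algorithm service to train a model."""
    params = {
        "dataset_uri": dataset_uri,
        "prediction_feature": prediction_feature,
    }
    if dataset_service is not None:
        # Optional: where the service should store result datasets.
        params["dataset_service"] = dataset_service
    body = urlencode(params).encode("ascii")
    return Request(
        algorithm_uri,
        data=body,
        method="POST",
        headers={
            "Accept": "text/uri-list",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call; the response body is then expected to hold either the result URI or, for slow calculations, a task URI.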
Resources: The model
$ curl -iv -H "Accept:text/uri-list" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c
* About to connect() to opentox.informatik.tu-muenchen.de port 8080 (#0)
* Trying 131.159.28.16... connected
* Connected to opentox.informatik.tu-muenchen.de (131.159.28.16) port 8080 (#0)
> GET /OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5 HTTP/1.1
> User-Agent: curl/7.18.2 (x86_64-pc-linux-gnu) libcurl/7.18.2 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.8 libssh2/0.18
> Host: opentox.informatik.tu-muenchen.de:8080
> Accept:text/uri-list
>
< HTTP/1.1 200 OK
< Date: Sat, 31 Jul 2010 14:47:22 GMT
< Location: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48
< Vary: Accept-Charset, Accept-Encoding, Accept-Language, Accept
< Accept-Ranges: bytes
< Server: Noelios-Restlet-Engine/1.1.snapshot
< Content-Type: text/uri-list;charset=ISO-8859-1
< Content-Length: 86
<
* Connection #0 to host opentox.informatik.tu-muenchen.de left intact
* Closing connection #0
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48
• When a task URI is returned, the returned status code is HTTP 202 Accepted instead of HTTP 200 OK
• This tells the client that the processing is not completed and that the client needs to poll the task URI until the OK code is returned
• The final result returned by Example 25 is the URI of the new model: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48
• To obtain prediction results, POST a dataset to the model URI
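The polling behaviour described above (202 Accepted while running, 200 OK when finished) can be sketched as a small loop. This is an illustration under assumptions: the function and parameter names are hypothetical, and the HTTP call is injected as `fetch(uri)` returning a `(status_code, body)` pair so that the logic can be shown without a live service.

```python
import time


def wait_for_result(task_uri, fetch, interval=1.0, max_polls=60, sleep=time.sleep):
    """Poll a task URI until the service answers 200 OK, then return the result URI.

    fetch(uri) must return (status_code, body_text); it stands in for an HTTP GET
    with "Accept: text/uri-list".
    """
    for _ in range(max_polls):
        status, body = fetch(task_uri)
        if status == 200:
            # Finished: the body carries the URI of the result (e.g. a model).
            return body.strip()
        if status != 202:
            raise RuntimeError(f"task failed with HTTP status {status}")
        # 202 Accepted: still running, wait before asking again.
        sleep(interval)
    raise TimeoutError(f"task {task_uri} did not finish after {max_polls} polls")
```

A real client would supply a `fetch` built on an HTTP library; the polling interval and the upper bound on the number of polls are arbitrary choices for the sketch.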
Uniform approach to data processing (e.g. descriptors calculation)
Read data from a web address – process – write to a web address
[Diagram: Dataset (with Feature and Compound resources) + Algorithm = Dataset, each resource supporting GET, POST, PUT, DELETE: http://myhost.com/dataset/trainingset1 + http://myhost.com/algorithm/descriptorX = http://myhost.com/dataset/results]
Uniform approach to models validation and report generation
Read data from a web address – process – write to a web address
[Diagram: Dataset + Model = Validation and Report. A model generating predictions (http://myhost.com/model/predictivemodel1), the datasets http://myhost.com/dataset/trainingset1 and http://myhost.com/dataset/predictedresults1, the validation resource http://myhost.com/validation and the validation report http://myhost.com/report/1]
Apply predictive models
[Diagram: available endpoints and model URLs (e.g. http://host1/model/{id}) are retrieved from the Ontology service (published models, algorithms, ontologies, metadata) via SPARQL over HTTP POST. The model http://host1/model/{id} is applied to the dataset http://host2/dataset/{id} by HTTP POST to the Model service, which returns the results dataset URL http://host/dataset/{id}]
Uniform approach to model prediction
Read data from a web address – process – write to a web address
[Diagram: Dataset + Model = Dataset: http://myhost.com/dataset/id1 + http://myhost.com/model/predictivemodel1 = http://myhost.com/dataset/results1]
Uniform access to data
Publish your data, retrieve linked data
[Diagram: data (structures, endpoints & predictions) is uploaded to the Dataset service via HTTP GET/POST, which returns a dataset URL http://host2/dataset/{id}; annotation and compound search return http://host2/compound/{id}; the Ontology service holds published models, algorithms, ontologies and metadata; Feature, Compound and Dataset resources support GET, POST, PUT, DELETE]
RDF - Resources representation
• The opentox.owl ontology
– A common OWL data model of all OpenTox resources
– Describes OpenTox resources
– Describes relationships between them
– Generates the RDF representations of objects
• RDF/XML representation is mandatory for OpenTox resources
• Uniform approach to data representation
– Calculated and measured properties of chemical compounds are represented in a uniform way
– Linked to the resource used for data generation
– Annotated via ontology entries
– Model representations link to the algorithms and data used
All OpenTox components are defined by an OWL ontology: http://opentox.org/api/1.1/opentox.owl
All resources are subclasses of ot:OpenToxResource
Resources: Chemical compound
Compound
Provides different representations for chemical compounds with a unique and defined chemical structure
• /compound/{id}
Conformer
• /compound/{id}/conformer/{id}
Documentation: http://opentox.org/dev/apis/api-1.1/structure
Representation
• A subclass of ot:OpenToxResource
• Supports different Chemical MIME formats
• RDF representation only for specifying owl:sameAs links to external resources
Example 1: Retrieve compound as MOL
$ curl -H "Accept:chemical/x-mdl-molfile" http://apps.ideaconsult.net:8080/ambit2/compound/1
CH2O
APtclcactv09040902283D 0 0.00000 0.00000
4 3 0 0 0 0 0 0 0 0999 V2000
-0.6004 0.0000 0.0001 O 0 0 0 0 0 0 0 0 0 0 0 0
0.6072 0.0000 -0.0004 C 0 0 0 0 0 0 0 0 0 0 0 0
1.1472 0.9353 0.0016 H 0 0 0 0 0 0 0 0 0 0 0 0
1.1472 -0.9353 0.0016 H 0 0 0 0 0 0 0 0 0 0 0 0
1 2 2 0 0 0 0
2 3 1 0 0 0 0
2 4 1 0 0 0 0
Example 2: Retrieve compound as SMILES
$ curl -H "Accept:chemical/x-daylight-smiles" http://apps.ideaconsult.net:8080/ambit2/compound/1
O=C
Example 3: Query compounds
$ curl -H "Accept:chemical-mime" "http://apps.ideaconsult.net:8080/ambit2/query/compound/any-identifier-or-keyword"
$ curl -H "Accept:chemical-mime" "http://apps.ideaconsult.net:8080/ambit2/query/smarts?search=smarts"
Resources: Feature
Feature
A Feature is a resource representing any kind of property or identifier assigned to a Compound. The feature types are determined via their links to ontologies (Feature ontologies, Descriptor ontologies, Endpoints ontologies)
• /feature/{id}
Documentation: http://opentox.org/dev/apis/api-1.1/feature
Representation
• ot:Feature, a subclass of ot:OpenToxResource
• Mandatory RDF/XML format
Properties
• Name: defined by dc:title (Dublin Core namespace)
• Units: defined by the ot:units annotation property (OpenTox namespace)
• Creator: defined by the dc:creator annotation property (Dublin Core namespace)
• The origin of the Feature is defined by the ot:hasSource object property (OpenTox namespace) and can be an ot:Algorithm, ot:Model or ot:Dataset
• Relations to other resources which represent the same entity can be established via the owl:sameAs property. This approach can be used, for example, to link the ot:Feature resource to a resource from another ontology (an example follows)
• The subclasses ot:NumericFeature, ot:StringFeature and ot:NominalFeature denote whether a feature holds numeric, string or nominal values
Resources: Feature (an example)
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/feature/22114
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns:otee="http://www.opentox.org/echaEndpoints.owl#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns="http://apps.ideaconsult.net:8080/ambit2/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#Algorithm"/>
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#Feature"/>
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#NumericFeature">
    <rdfs:subClassOf rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </owl:Class>
  <ot:NumericFeature rdf:about="feature/22114">
    <ot:hasSource>
      <ot:Algorithm rdf:about="algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor"/>
    </ot:hasSource>
    <owl:sameAs rdf:resource="&otee;Octanol-water_partition_coefficient_Kow"/>
    <dc:title>XLogP</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </ot:NumericFeature>
</rdf:RDF>
The example shows an OpenTox feature with the title "XLogP", identified by the URI http://apps.ideaconsult.net:8080/ambit2/feature/22114
• Linked to an entry of a simplified ontology of toxicological endpoints
The algorithm used to generate values for this feature
• Specified by the ot:hasSource property
• Identified by the URI http://apps.ideaconsult.net:8080/ambit2/algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor
• Note that the URI identifies an OpenTox Algorithm resource
• The algorithm URI itself is dereferenceable
• It can be used to initiate calculations of the XLogP descriptor
Resources: Feature (an example)
$ curl -H "Accept:text/n3" http://apps.ideaconsult.net:8080/ambit2/feature/26184
@prefix ot: <http://www.opentox.org/api/1.1#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix : <http://apps.ideaconsult.net:8080/ambit2/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix af: <http://apps.ideaconsult.net:8080/ambit2/feature/> .
ot:hasSource
    a owl:ObjectProperty .
ot:units
    a owl:DatatypeProperty .
ot:Feature
    a owl:Class .
ot:NumericFeature
    a owl:Class ;
    rdfs:subClassOf ot:Feature .
af:26184
    a ot:Feature , ot:NumericFeature ;
    dc:creator "http://ambit.uni-plovdiv.bg:8080/ambit2" ;
    dc:title "TUM_CDK_XLogP" ;
    ot:hasSource <http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem/XLogPDescriptor> ;
    ot:units "" ;
    = otee:Octanol-water_partition_coefficient_Kow .
An example (N3) of another ot:Feature resource: the XLogP descriptor (again)
• Generated by a different implementation
• In this case its name is "TUM_CDK_XLogP" and the algorithm resource used to generate it resides at the Technical University of Munich (TUM) premises: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem/XLogPDescriptor
• This algorithm URL could also be used to initiate descriptor calculations
• The representation of Algorithm resources refers to the BlueObelisk ontology entry http://www.blueobelisk.org/ontologies/chemoinformatics-algorithms/#xlogP
Toxicity endpoints ontologies
• Derived from the ECHA classification of endpoints published in the REACH guidance documents
• Physicochemical properties and various toxicological endpoints
• The hierarchy doesn't represent the complexity of toxicological assays, but can be used as a first approximation to assign meaning to the data entries and generate a REACH report
• More specific descriptions of toxicological assays can be used as well
• Ontologies for specific toxicity assays are developed by OpenTox partners
• The ECHA endpoints ontology: http://www.opentox.org/echaEndpoints.owl
Resources: a feature
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/feature/3
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns="http://apps.ideaconsult.net:8080/ambit2/"
    xmlns:af="http://apps.ideaconsult.net:8080/ambit2/feature/"
    …
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Feature rdf:about="feature/3">
    <dc:creator/>
    <ot:hasSource>ECHA…</ot:hasSource>
    <owl:sameAs rdf:resource="http://www.opentox.org/api/1.1#EINECS"/>
    <ot:units></ot:units>
    <dc:title>EC</dc:title>
  </ot:Feature>
</rdf:RDF>
An illustration of an ot:Feature imported from a file and not calculated. The example shows a feature representing the EINECS number, imported from the ECHA preregistration list.
Feature summary
• An OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs
• These URIs are dereferenceable
• Features allow different levels of meaning to be assigned by linking to ontology entries (e.g. algorithms or toxicological endpoints), as well as by linking to the algorithms which can be used to generate the property values
• The same approach can be used to denote assays (provided that the assay is defined by an ontology), species, functional groups etc.
Resources: Dataset
Dataset
Provides access to chemical compounds and their features (e.g. structural, physical-chemical, biological, toxicological properties)
Operations
• POST – upload a dataset
• PUT – update the dataset content
• DELETE – remove the dataset
Representation
RDF/XML (mandatory)
• The dataset consists of data entries
• Each entry is associated with exactly one chemical compound, identified by its URI and available via the OpenTox Compound service API
• One and the same compound can be associated with multiple dataset entries
• Every "column" is associated with a Feature; its representation should be available via the OpenTox Feature API
Resources: Dataset
Representation
The dataset services optionally support formats other than RDF
• text/csv (comma delimited)
• text/x-arff (Weka ARFF)
• application/pdf
• chemical/x-mdl-sdfile
• other Chemical MIME formats
This allows retrieving the same data in a convenient format, but the URL links to compound and feature resources are lost
@prefix ad: <http://apps.ideaconsult.net:8080/ambit2/dataset/> .
@prefix af: <http://apps.ideaconsult.net:8080/ambit2/feature/> .
@prefix ot: <http://www.opentox.org/api/1.1#> .
…
ad:9 a ot:Dataset ;
    ot:dataEntry
        [ a ot:DataEntry ;
          ot:compound <http://apps.ideaconsult.net:8080/ambit2/compound/413/conformer/409421> ;
          ot:values
              [ a ot:FeatureValue ;
                ot:feature af:21576 ;
                ot:value "3.309999942779541"^^xsd:double
              ] ;
          ot:values
              [ a ot:FeatureValue ;
                ot:feature af:21573 ;
                ot:value "3.0"^^xsd:double
              ]
        ] .
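To make the data-entry structure concrete, the following sketch flattens entries of this shape into a table with one row per data entry and one column per feature, keeping the compound and feature URIs as cell and header values. The input layout (a list of dictionaries with "compound" and "values" keys) is an assumption made for the example, standing in for whatever an RDF parser would return; the function name is hypothetical.

```python
def to_table(data_entries):
    """Flatten data entries into rows: header first, then one row per entry.

    data_entries: list of {"compound": uri, "values": [{"feature": uri, "value": v}, ...]}
    """
    features = []   # feature URIs in order of first appearance
    rows = []
    for entry in data_entries:
        row = {"compound": entry["compound"]}
        for fv in entry["values"]:
            if fv["feature"] not in features:
                features.append(fv["feature"])
            row[fv["feature"]] = fv["value"]
        rows.append(row)
    header = ["compound"] + features
    # Missing values become None so that every row has the same width.
    return [header] + [[row.get(col) for col in header] for row in rows]


AF = "http://apps.ideaconsult.net:8080/ambit2/feature/"
ENTRIES = [
    {
        "compound": "http://apps.ideaconsult.net:8080/ambit2/compound/413/conformer/409421",
        "values": [
            {"feature": AF + "21576", "value": 3.309999942779541},
            {"feature": AF + "21573", "value": 3.0},
        ],
    }
]
```

Calling `to_table(ENTRIES)` returns a header row followed by one row for the single entry. Unlike a plain CSV export, the header keeps the full feature URIs, so the links to the feature resources are not lost.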
Dataset metadata and features
Ideaconsult Ltd32
Description URI Template
Retrieve entire dataset content If uri-list
retrieve only compound URIs
httphostportdatasetid
Retrieve representation of features (columns)
of the dataset
httphostportdatasetidfeature
Retrieves dataset metadata (name etc) httphostportdatasetidmetadata
$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata
ltrdfRDF
xmlnsot=httpwwwopentoxorgapi11
helliphellip
xmlnsrdfs=httpwwww3org200001rdf-schema
xmlbase=httpappsideaconsultnet8080ambit2gt
ltotDataset rdfabout=dataset9gt
ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt
ltdcpublishergtsomebodyltdcpublishergt
ltrdfsseeAlsogt
ltbxEntry rdfabout=reference20117gt
ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt
ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt
ltbxEntrygt
ltrdfsseeAlsogt
ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt
ltotDatasetgt
ltrdfRDFgt
Data publishing
Upload data receive
dataset URL
httphost2datasetid
AnnotationFind chemical compounds
return dataset URL
httphost2compoundid
HTTP GET POST
Dataset service
Structures
endpoints
amppredictions
1)POST a file with chemical structures and properties to
OpenTox dataset servicebullThe structures and data are assigned a dataset URL and
become available by multiple formats (RDF Chemical
MIME CSV Weka ARFF)
2)Assign metadata
bullPUT datasetidmetadata
3)Annotate any of dataset features
datasetidfeature by assigning links to relevant
ontologies
bullPUT featureid
Toxiciology related
ontologies
Algorithms
ontologies
Resources: Algorithm
Algorithm: provides access to OpenTox algorithms. There are several algorithm services developed by different OpenTox partners. The list of algorithms can be retrieved by an HTTP GET operation at http://host:port/algorithm
$ curl -H "Accept:text/uri-list" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNclassification
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/PLSregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/M5P
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/GaussP
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/FTMsmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/gSpansmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/JOELIB2
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/InfoGainAttributeEval
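The algorithm list is returned as text/uri-list, that is, one URI per line. A minimal Python sketch of reading such a response body follows. The function name is hypothetical, and skipping lines that start with "#" is an assumption based on the general text/uri-list media type definition, not on anything OpenTox-specific.

```python
def parse_uri_list(body):
    """Return the URIs in a text/uri-list body, skipping blank lines and '#' comments."""
    uris = []
    for line in body.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            uris.append(line)
    return uris


# Sample body shaped like the response shown on the slide (shortened to two entries).
sample = (
    "# algorithms\r\n"
    "http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48\r\n"
    "\r\n"
    "http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/M5P\r\n"
)
algorithms = parse_uri_list(sample)
print(algorithms)
```

Each returned URI can then be dereferenced with GET to obtain the RDF representation of the algorithm.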
Resources: Algorithm
Representation
• Multiple types of algorithms
  - descriptor calculation algorithms
  - machine learning procedures
  - data preprocessing
• The representation of algorithms is again defined by the OpenTox ontology, where all algorithms are subclasses of ot:Algorithm
Algorithm types ontology
http://opentox.org/data/documents/development/RDF%20files/AlgorithmTypes
• provides a classification of algorithm types
• The algorithm type in the RDF representation is set by directly typing the algorithm (rdf:type) with a class from the algorithm types ontology (ota: http://www.opentox.org/algorithmTypes.owl#), e.g. <myalgorithm> rdf:type ota:Classification
Resources: Algorithm
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/algorithm/J48
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ot="http://www.opentox.org/api/1.1#"
    …
    xmlns:ota="http://www.opentox.org/algorithmTypes.owl#">
  <ota:Supervised rdf:about="http://apps.ideaconsult.net:8080/ambit2/algorithm/J48">
    <dc:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Classification Decision tree J48</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#Classification"/>
    <dc:description rdf:datatype="http://www.w3.org/2001/XMLSchema#string"></dc:description>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#SingleTarget"/>
    <dc:publisher rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">Somebody</dc:publisher>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#EagerLearning"/>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Algorithm"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">Sat Jul 31 17:11:26 EEST 2010</dc:date>
  </ota:Supervised>
</rdf:RDF>
Representation
• The algorithm name is defined by dc:title
• Parameters supported by the algorithm are specified via the object property ot:parameters and should be of class ot:Parameter (as defined in opentox.owl). These entries tell the client which parameters are required to run the algorithm; the values themselves should be provided by the client when initiating the calculations via POST
• Algorithm types are distinguished by means of the Algorithm types ontology
Resources: Model
• Representations of predictive models
• A Model is created by HTTP POST to an ot:Algorithm with specific parameters and/or an input ot:Dataset
Representation
• The Model name is defined by the dc:title property
• The Model creator might be defined by the dc:creator property
• The date of Model creation is defined by the dc:date property
• The Algorithm is defined by the ot:algorithm object property
• The independent variables are instances of ot:Feature and are defined by the ot:independentVariables property (can be multiple)
• The dependent variables are instances of ot:Feature and are defined by the ot:dependentVariables property (can be multiple)
• The variables where prediction results will be stored are instances of ot:Feature and are defined by ot:predictedVariables (can be multiple)
• Parameters are defined by ot:parameters
• The training Dataset is an instance of ot:Dataset and is defined by ot:trainingDataset
Linked resources: Compound, Algorithm, Model, Dataset, Features
(Diagram labels: Algorithm service; Model resource; Dataset resource; Descriptor resource; Assay resource; Chemical compound; Feature service; Compound service.)
Linked resources: Compound, Algorithm, Model, Dataset, Features
(Diagram labels: Dataset resource; Descriptor resource; Assay resource; Chemical compound; Regression, Classification, Quantum Chemistry, Descriptors etc.; Blue Obelisk algorithms ontology; OpenTox algorithm types ontology; toxicology-related ontologies.)
Make the model available
Register at the OpenTox ontology service
- RDF triple storage
- Accepts HTTP POST
- SPARQL endpoint
$ curl -X POST -d "uri=http://apps.ideaconsult.net:8080/ambit2/model/57" http://apps.ideaconsult.net:8080/ontology
The model becomes visible for applications.
(Diagram labels: Model service, /model/{id}; HTTP POST; Ontology service holding published models, algorithms, ontologies, metadata.)
Services implementation by partner and service type
Service types (table columns): Compound, Dataset, Feature, Algorithm (processing), Algorithm (model), Model, Task, Report, Validation, Authentication and Authorisation service, Ontology service
Partner No. | Service types implemented (Y)
2 | Y Y Y Y Y Y
3 (IDEA) | Y Y Y Y Y Y Y Y
5 | Y Y Y
6 | Y Y Y Y
7 | Y Y Y
10 | Y
• All components are implemented as REST web services
• There could be multiple implementations of the same type of component
• (A subset of) services could be hosted by the same provider or by multiple providers at separate locations
http://apps.ideaconsult.net:8080/ambit2
OpenTox Is A Framework
Framework
Unified Access
Open Source
• Toxicity data
• Predictive models
• Validation support
• Interpretation aids
• Toxicologists
• Modelers
• API for new algorithm development & integration
• To optimise impact
• To allow inspection, review
• To attract external contributors
Demo applications
OpenTox services can be used to develop specific applications or embedded in workflow systems
• Two end-user oriented demo applications making use of OpenTox web services have been developed and deployed, and are available for testing: http://toxcreate.org and http://toxpredict.org
• ToxCreate creates models from user-supplied datasets
• ToxPredict uses existing OpenTox models to estimate chemical compound properties
RDF: Lessons learned
• OpenTox specific: it hasn't started as a Linked Data/RDF project
• The REST and RDF mix is not (yet) popular… but it is natural to be able to retrieve a (partial) resource representation described by triples
• Steep learning curve
• Some hard topics
  - Data model vs format
  - The subject-predicate-object concept vs tabular/hierarchical/other implicit structure
  - The recognition of the added value
    • XML, JSON, YAML, plain text etc. vs RDF
RDF: Performance
RDF representation is verbose…
… and in-memory RDF libraries are slow…
… lack of streaming parsers/writers…
(Charts: a dataset with 320 chemicals, 60 columns; a dataset with 6500 chemicals, 12 columns.)
RDF: Wish list
• Convenient explanation of the subject-predicate-object concept for beginners
• A (high-performance) triple storage should not be a mandatory requirement to publish RDF data
• Fast streaming parsers and writers
• (Terse) JSON serialisation
• Security
• Synchronisation of distributed RDF content
Thank you
Build an application with the OpenTox REST Web Services API
http://opentox.org/dev/apis/api-1.1
Download AMBIT, an implementation of the OpenTox API, and launch your own OpenTox service
http://ambit.sourceforge.net
Representational State Transfer (REST)
A software architecture style defined by Roy Fielding in his PhD thesis (2000). Many services worldwide offer a REST API. There are (currently) no standards for RESTful applications, but merely design guides.
Design principles
• Resource oriented
  - Every object (resource) is named and addressable (e.g. by an HTTP URL). Examples: http://example.opentox.com/model/myBestModel, http://example.opentox.com/compound/50-00-0
  - RESTful API design starts by identifying the most important objects and groups of objects supported by the software system and proceeds by defining URL patterns
• Transport protocol
  - HTTP is the most popular choice of transport protocol, but other protocols can be used as well
• Operations
  - All resources (nouns) support the same fixed and universal set of operations (verbs). The HTTP operations (GET, POST, PUT, DELETE) are the common choice when the transport protocol is HTTP
• Hypermedia as the Engine of Application State
  - All resources should be reachable via a single (or minimal) number of entry points into a RESTful application. Thus a representation of a resource should return hypermedia links to related resources
• Error codes (for each resource/operation pair)
  - HTTP status codes (e.g. 200 OK, 400 Bad Request, 404 Not Found) are usually used
OpenTox resources (1)
OpenTox considers the following set of entities as essential building blocks:
• Structures of chemical compounds
• Properties and identifiers of chemical compounds
• Datasets of chemical compounds and various properties (measured or calculated)
• Algorithms
  - Data processing algorithms
    • Algorithms generating certain values based on chemical structure (e.g. descriptor calculation)
    • Data preprocessing (e.g. principal component analysis, feature selection)
    • Structure processing (e.g. structure optimization)
    • Algorithms relating a set of structures to another set of structures (e.g. similarity search or metabolite generation)
  - Machine learning algorithms
    • Supervised (e.g. regression, classification)
    • Unsupervised (e.g. clustering)
  - Prediction algorithms defined by experts (e.g. series of structural alerts defined by human experts, not derived by learning algorithms)
OpenTox resources (2)
• Models are generated by the respective algorithms, given specific parameters
• Statistical models are generated by applying statistical/machine learning algorithms to a specific dataset and parameters
• Models can be other than statistical, e.g. expert-defined rules, quantum mechanical calculations, metabolite generation, etc. The intention of the framework is to be generic enough to accommodate varieties of predictive models
• Validation provides procedures independent of model building facilities (e.g. cross-validation) and generates relevant statistics
• Reports
  - Various types of reports might be generated using the building blocks above (e.g. a validation report can be generated using a validation object, a model and a dataset)
• In addition, the following components are introduced:
  - Task (asynchronous processing of computationally intensive tasks)
  - Authentication and authorization (ensuring secure access to sensitive resources)
  - Ontology service (provides RDF storage and a SPARQL endpoint for resource registration)
Resources identification
All resources are identified via a unique web address, assigned according to the URL templates
Component | Description | URL template (example)
Compound | Representations of chemical compounds | http://host:port/compound/{compoundid}
Feature | Properties and identifiers | http://host:port/feature/{featureid}
Dataset | Encapsulates a set of chemical compounds and their property values | http://host:port/dataset/{datasetid}
Model | OpenTox model services | http://host:port/model/{modelid}
Algorithm | OpenTox algorithm services | http://host:port/algorithm/{algorithmid}
Validation, Report | A validation corresponds to the validation of a model on a test dataset | http://host:port/validation/{validationid}, http://host:port/report/{reportid}
Task | Asynchronous jobs are handled via an intermediate Task resource. A resource submitting an asynchronous job should return the URI of the task | http://host:port/task/{taskid}
Ontology service | Provides storage and SPARQL search functionality for objects defined in OpenTox services and relevant ontologies | http://host:port/ontology
Authentication and authorisation | Granting access to protected resources for authorised users | http://host:port/opensso, http://host:port/opensso-pol
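The URL templates above lend themselves to a small helper that assembles resource addresses. The following Python sketch is only an illustration: the names RESOURCE_PATHS and resource_uri are hypothetical, and percent-encoding the identifier is an assumption, not something the OpenTox API description states.

```python
from urllib.parse import quote

# Path segments taken from the URL template table; the dict itself is illustrative.
RESOURCE_PATHS = {
    "compound": "compound",
    "feature": "feature",
    "dataset": "dataset",
    "model": "model",
    "algorithm": "algorithm",
    "validation": "validation",
    "report": "report",
    "task": "task",
}


def resource_uri(base, kind, resource_id=None):
    """Return the collection URI, or the individual resource URI when an id is given."""
    uri = base.rstrip("/") + "/" + RESOURCE_PATHS[kind]
    if resource_id is not None:
        # Assumption: identifiers are percent-encoded as a single path segment.
        uri += "/" + quote(str(resource_id), safe="")
    return uri


print(resource_uri("http://host:8080", "dataset"))
print(resource_uri("http://host:8080", "dataset", 9))
```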
OpenTox REST operations
Individual resources (e.g. a dataset or a model)
• URI template: http://host:port/{resource}/{resourceid}, e.g. http://host:port/model/{model_id} or http://host:port/dataset/{dataset_id}
• GET - retrieve a representation of the resource
• PUT - update the representation of the resource
• POST
  - replace the representation of the resource with a new one (e.g. replace the dataset with new content)
  - initiate calculations based on this resource (e.g. submit a dataset URI to an algorithm resource and obtain a model URI as a result)
• DELETE - delete the resource
Collections of resources (e.g. list of all available models or datasets)
• URI template: http://host:port/{resource} (e.g. http://host:port/model or http://host:port/dataset)
• GET - retrieve a representation of multiple resources (e.g. retrieve all available algorithms)
• PUT - N/A
• POST - create a new resource and return its URI (e.g. create a new dataset by submitting new dataset content to the dataset service)
• DELETE - N/A
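The operations listed above can be captured as a small lookup table, which a client might consult before sending a request. This is a sketch only: the names OPERATIONS and describe are hypothetical, and the descriptions simply restate the slide.

```python
# Meaning of each HTTP verb for individual resources and for collections,
# restating the slide; None is returned where the slide says N/A.
OPERATIONS = {
    "resource": {
        "GET": "retrieve a representation of the resource",
        "PUT": "update the representation of the resource",
        "POST": "replace the representation or initiate calculations",
        "DELETE": "delete the resource",
    },
    "collection": {
        "GET": "retrieve a representation of multiple resources",
        "POST": "create a new resource and return its URI",
    },
}


def describe(scope, method):
    """Return the documented meaning of an operation, or None when it is not applicable."""
    return OPERATIONS[scope].get(method.upper())


print(describe("collection", "post"))
print(describe("collection", "put"))
```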
Build a predictive model
(Diagram labels: create a model; run calculations with the dataset http://host1/dataset/{id} (structures, descriptors, endpoints) from the Dataset service; HTTP POST to the Algorithm service (Regression, Classification, Quantum Chemistry, Descriptors etc.), which returns the model URL http://host1/model/{id}; Model service, /model/{id}; Validation service, /validation/{id}; HTTP POST to the Ontology service holding published models, algorithms, ontologies, metadata.)
Uniform approach to model creation
Read data from a web address - process - write to a web address
(Diagram: http://myhost.com/dataset/trainingset1 + http://myhost.com/algorithm/neuralnetwork = http://myhost.com/model/predictivemodel1. The Feature, Compound, Dataset, Algorithm and Model resources are each shown with GET, POST, PUT and DELETE.)
Use an algorithm to build a model
• An algorithm is applied by submitting an HTTP POST to the algorithm URI and providing the required parameters
• A common required parameter is dataset_uri=http://host:port/dataset/{datasetid}, which specifies the dataset to be operated on
• HTTP POST in REST-style services returns the URI of the result and not the content of the result
• The algorithm services are designed to store the results in a dataset service and return the URL of the resulting dataset
• In case of slow calculations, a Task URI is returned instead of the dataset URI
$ curl -H "Accept:text/uri-list" -X POST -d "dataset_uri=http://apps.ideaconsult.net:8080/ambit2/dataset/1037" -d "prediction_feature=http://apps.ideaconsult.net:8080/ambit2/feature/26701" -d "dataset_service=http://apps.ideaconsult.net:8080/ambit2/dataset" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48 -iv
* Connected to opentox.informatik.tu-muenchen.de (131.159.28.16) port 8080 (#0)
> POST /OpenTox-dev/algorithm/J48 HTTP/1.1
> Host: opentox.informatik.tu-muenchen.de:8080
> Accept:text/uri-list
> Content-Type: application/x-www-form-urlencoded
< HTTP/1.1 202 Accepted
< Date: Sat, 31 Jul 2010 14:46:38 GMT
< Location: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5
< Accept-Ranges: bytes
< Server: Noelios-Restlet-Engine/1.1.snapshot
< Content-Type: text/uri-list;charset=ISO-8859-1
< Content-Length: 99
<
* Connection #0 to host opentox.informatik.tu-muenchen.de left intact
* Closing connection #0
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5
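The same POST can be assembled programmatically. The sketch below uses Python's standard urllib to build (but not send) the request, so it runs without network access. The function name build_training_request is hypothetical; the form parameter names dataset_uri, prediction_feature and dataset_service are the ones used in the curl call on the slide.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request


def build_training_request(algorithm_uri, dataset_uri, prediction_feature, dataset_service):
    """Build an HTTP POST that asks an algorithm service to train a model."""
    form = urlencode({
        "dataset_uri": dataset_uri,
        "prediction_feature": prediction_feature,
        "dataset_service": dataset_service,
    }).encode("ascii")
    # Ask for the result (a model URI or a task URI) as text/uri-list.
    return Request(algorithm_uri, data=form,
                   headers={"Accept": "text/uri-list"}, method="POST")


req = build_training_request(
    "http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48",
    "http://apps.ideaconsult.net:8080/ambit2/dataset/1037",
    "http://apps.ideaconsult.net:8080/ambit2/feature/26701",
    "http://apps.ideaconsult.net:8080/ambit2/dataset",
)
print(req.get_method(), req.full_url)
print(parse_qs(req.data.decode("ascii")))
```

Passing the request to urllib.request.urlopen would actually submit it; the services named in the 2010 slides may no longer be reachable.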
Resources: The model
$ curl -iv -H "Accept:text/uri-list" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c
* About to connect() to opentox.informatik.tu-muenchen.de port 8080 (#0)
*   Trying 131.159.28.16... connected
* Connected to opentox.informatik.tu-muenchen.de (131.159.28.16) port 8080 (#0)
> GET /OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5 HTTP/1.1
> User-Agent: curl/7.18.2 (x86_64-pc-linux-gnu) libcurl/7.18.2 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.8 libssh2/0.18
> Host: opentox.informatik.tu-muenchen.de:8080
> Accept:text/uri-list
>
< HTTP/1.1 200 OK
< Date: Sat, 31 Jul 2010 14:47:22 GMT
< Location: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48
< Vary: Accept-Charset, Accept-Encoding, Accept-Language, Accept
< Accept-Ranges: bytes
< Server: Noelios-Restlet-Engine/1.1.snapshot
< Content-Type: text/uri-list;charset=ISO-8859-1
< Content-Length: 86
<
* Connection #0 to host opentox.informatik.tu-muenchen.de left intact
* Closing connection #0
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48
• When a task URI is returned, the status code is HTTP 202 Accepted instead of HTTP 200 OK
• This tells the client that processing is not completed and that the client needs to poll the task URI until the OK code is returned
• The final result returned in this example is the URI of the new model, http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48
• To obtain prediction results, POST a dataset to the model URI
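The polling behaviour described above (202 Accepted while the task is running, 200 OK with the result URI once it is done) can be sketched as a small loop. This is an illustration under assumptions: wait_for_result and the injected fetch callable are hypothetical names, treating any status other than 200 or 202 as an error is a simplification, and the demo uses canned responses instead of a live service.

```python
import time


def wait_for_result(task_uri, fetch, interval=1.0, max_polls=60, sleep=time.sleep):
    """Poll task_uri until the service answers 200 OK, then return the result URI.

    fetch(uri) is a caller-supplied function returning (status_code, body_text).
    """
    for _ in range(max_polls):
        status, body = fetch(task_uri)
        if status == 200:
            return body.strip()          # body is a text/uri-list with the result URI
        if status != 202:
            raise RuntimeError(f"unexpected status {status} for {task_uri}")
        sleep(interval)                  # still running: wait, then poll again
    raise TimeoutError(f"task {task_uri} did not finish after {max_polls} polls")


# Canned responses standing in for a real service: two 202s, then 200 with the model URI.
TASK = "http://host:8080/task/1"
MODEL = "http://host:8080/model/1"
responses = iter([(202, TASK + "\n"), (202, TASK + "\n"), (200, MODEL + "\n")])
result = wait_for_result(TASK, lambda uri: next(responses), sleep=lambda seconds: None)
print(result)
```

Injecting fetch and sleep keeps the loop independent of any particular HTTP library and makes it easy to exercise without a network.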
Uniform approach to data processing (e.g. descriptor calculation)
Read data from a web address - process - write to a web address
(Diagram: http://myhost.com/dataset/trainingset1 + http://myhost.com/algorithm/descriptorX = http://myhost.com/dataset/results. The Feature, Compound, Dataset and Algorithm resources are each shown with GET, POST, PUT and DELETE.)
Uniform approach to model validation and report generation
Read data from a web address - process - write to a web address
(Diagram labels: Dataset + Model = Validation, Report, each shown with GET, POST, PUT and DELETE; model generating predictions; validation report; http://myhost.com/report/1; http://myhost.com/dataset/trainingset1; http://myhost.com/dataset/predictedresults1; http://myhost.com/model/predictivemodel1; http://myhost.com/validation.)
Apply predictive models
(Diagram labels: retrieve available endpoints and model URLs, e.g. http://host1/model/{id}, from the Ontology service (published models, algorithms, ontologies, metadata) via SPARQL over HTTP POST; apply the model http://host1/model/{id} to the dataset http://host2/dataset/{id} via HTTP POST to the Model service, /model/{id}; returns the results dataset URL http://host/dataset/{id}.)
Uniform approach to model prediction
Read data from a web address - process - write to a web address
(Diagram: http://myhost.com/dataset/id1 + http://myhost.com/model/predictivemodel1 = http://myhost.com/dataset/results1. The Feature, Compound, Dataset and Model resources are each shown with GET, POST, PUT and DELETE.)
Uniform access to data
Publish your data, retrieve linked data
(Diagram labels: upload data, receive dataset URL http://host2/dataset/{id}; annotation; find chemical compounds, return dataset URL; http://host2/compound/{id}; HTTP GET, POST; Dataset service holding structures, endpoints & predictions; Ontology service holding published models, algorithms, ontologies, metadata. The Feature, Compound and Dataset resources are each shown with GET, POST, PUT and DELETE.)
RDF - Resources representation
• The opentox.owl ontology
  - A common OWL data model of all OpenTox resources
  - Describes OpenTox resources
  - Describes relationships between them
  - Generates RDF representations of objects
• RDF/XML representation is mandatory for OpenTox resources
• Uniform approach to data representation
  - Calculated and measured properties of chemical compounds are represented in a uniform way
  - Linked to the resource used for data generation
  - Annotated via ontology entries
  - Model representations link to the algorithms and data used
All OpenTox components are defined by an OWL ontology
http://opentox.org/api/1.1/opentox.owl
All resources are subclasses of ot:OpenToxResource
Resources: Chemical compound
Compound
Provides different representations for chemical compounds with a unique and defined chemical structure
/compound/{id}
Conformer
/compound/{id}/conformer/{id}
Documentation
http://opentox.org/dev/apis/api-1.1/structure
Representation
A subclass of ot:OpenToxResource
Supports different Chemical MIME formats
RDF representation only for specifying owl:sameAs links to external resources
Example 1: Retrieve compound as MOL
$ curl -H "Accept:chemical/x-mdl-molfile" http://apps.ideaconsult.net:8080/ambit2/compound/1
CH2O
APtclcactv09040902283D 0   0.00000     0.00000
  4  3  0  0  0  0  0  0  0  0999 V2000
   -0.6004    0.0000    0.0001 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.6072    0.0000   -0.0004 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.1472    0.9353    0.0016 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.1472   -0.9353    0.0016 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  0  0  0  0
  2  4  1  0  0  0  0
Example 2: Retrieve compound as SMILES
$ curl -H "Accept:chemical/x-daylight-smiles" http://apps.ideaconsult.net:8080/ambit2/compound/1
O=C
Example 3: Query compounds
$ curl -H "Accept:chemical-mime" http://apps.ideaconsult.net:8080/ambit2/query/compound/any-identifier-or-keyword
$ curl -H "Accept:chemical-mime" http://apps.ideaconsult.net:8080/ambit2/query/smarts?search=smarts
Resources: Feature
Feature
A Feature is a resource representing any kind of property or identifier assigned to a Compound
The feature types are determined via their links to ontologies (Feature ontologies, Descriptor ontologies, Endpoints ontologies)
/feature/{id}
Documentation
http://opentox.org/dev/apis/api-1.1/feature
Representation
ot:Feature, a subclass of ot:OpenToxResource
Mandatory RDF/XML format
Properties
• Name: defined by dc:title (Dublin Core namespace)
• Units: defined by the ot:units annotation property (OpenTox namespace)
• Creator: defined by the dc:creator annotation property (Dublin Core namespace)
• The origin of the Feature is defined by the ot:hasSource object property (OpenTox namespace) and can be an ot:Algorithm, ot:Model or ot:Dataset
• Relations to other resources which represent the same entity can be established via the owl:sameAs property. This approach can be used, for example, to link the ot:Feature resource to a resource from another ontology (an example follows)
• The subclasses ot:NumericFeature, ot:StringFeature and ot:NominalFeature of ot:Feature denote whether a feature holds numeric, string or nominal values
Resources: Feature (an example)
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/feature/22114
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns:otee="http://www.opentox.org/echaEndpoints.owl#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns="http://apps.ideaconsult.net:8080/ambit2/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#Algorithm"/>
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#Feature"/>
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#NumericFeature">
    <rdfs:subClassOf rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </owl:Class>
  <ot:NumericFeature rdf:about="feature/22114">
    <ot:hasSource>
      <ot:Algorithm rdf:about="algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor"/>
    </ot:hasSource>
    <owl:sameAs rdf:resource="otee:Octanol-water_partition_coefficient_Kow"/>
    <dc:title>XLogP</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </ot:NumericFeature>
</rdf:RDF>
The example shows an OpenTox feature with the title "XLogP", identified by the URI http://apps.ideaconsult.net:8080/ambit2/feature/22114
• Linked to an entry of a simplified ontology of toxicological endpoints
The algorithm used to generate values for this feature
• Specified by the ot:hasSource property
• Identified by the URI http://apps.ideaconsult.net:8080/ambit2/algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor
• Note that the URI identifies an OpenTox Algorithm resource
• The algorithm URI itself is dereferenceable
• It can be used to initiate calculations of the XLogP descriptor
Resources: Feature (an example)
$ curl -H "Accept:text/n3" http://apps.ideaconsult.net:8080/ambit2/feature/26184
@prefix ot: <http://www.opentox.org/api/1.1#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix : <http://apps.ideaconsult.net:8080/ambit2/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix af: <http://apps.ideaconsult.net:8080/ambit2/feature/> .
ot:hasSource
    a owl:ObjectProperty .
ot:units
    a owl:DatatypeProperty .
ot:Feature
    a owl:Class .
ot:NumericFeature
    a owl:Class ;
    rdfs:subClassOf ot:Feature .
af:26184
    a ot:Feature , ot:NumericFeature ;
    dc:creator "http://ambit.uni-plovdiv.bg:8080/ambit2" ;
    dc:title "TUM_CDK_XLogP" ;
    ot:hasSource <http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem/XLogPDescriptor> ;
    ot:units "" ;
    = otee:Octanol-water_partition_coefficient_Kow .
An example (N3) of another ot:Feature resource: the XLogP descriptor (again)
• Generated by a different implementation
• In this case its name is "TUM_CDK_XLogP", and the algorithm resource used to generate it resides at the Technical University of Munich (TUM): http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem/XLogPDescriptor
  This algorithm URL could also be used to initiate descriptor calculations
• The representation of Algorithm resources refers to the BlueObelisk ontology entry http://www.blueobelisk.org/ontologies/chemoinformatics-algorithms/#xlogP
Toxicity endpoints ontologies
• Derived from the ECHA classification of endpoints published in the REACH guidance documents
• Physicochemical properties and various toxicological endpoints
• The hierarchy doesn't represent the complexity of toxicological assays, but can be used as a first approximation to assign meaning to the data entries and to generate a REACH report
• More specific descriptions of toxicological assays can be used as well
• Ontologies for specific toxicity assays are developed by OpenTox partners
• The ECHA endpoints ontology: http://www.opentox.org/echaEndpoints.owl
Resources: a feature
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/feature/3
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns="http://apps.ideaconsult.net:8080/ambit2/"
    xmlns:af="http://apps.ideaconsult.net:8080/ambit2/feature/"
    …
  xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Feature rdf:about="feature/3">
    <dc:creator></dc:creator>
    <ot:hasSource>ECHA…</ot:hasSource>
    <owl:sameAs rdf:resource="http://www.opentox.org/api/1.1#EINECS"/>
    <ot:units></ot:units>
    <dc:title>EC</dc:title>
  </ot:Feature>
</rdf:RDF>
An illustration of an ot:Feature imported from a file and not calculated. The example shows a feature representing the EINECS number, imported from the ECHA preregistration list.
Feature summary
• The OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs
• These URIs are dereferenceable
• Features can carry different levels of meaning, by linking to ontology entries (e.g. algorithms or toxicological endpoints) as well as to the algorithms which can be used to generate the property values
• The same approach can be used to denote assays (provided that the assay is defined by an ontology), species, functional groups, etc.
Resources: Dataset
Dataset
Provides access to chemical compounds and their features (e.g. structural, physical-chemical, biological, toxicological properties)
Operations
POST - upload a dataset
PUT - update the dataset content
DELETE - remove the dataset
Representation
RDF/XML (mandatory)
• The dataset consists of data entries
• Each entry is associated with exactly one chemical compound, identified by its URI and available via the OpenTox Compound service API
• One and the same compound can be associated with multiple dataset entries
• Every "column" is associated with a Feature; its representation should be available via the OpenTox Feature API
Resources: Dataset
Representation
The dataset services optionally support formats other than RDF:
• text/csv (comma delimited)
• text/x-arff (Weka ARFF)
• application/pdf
• chemical/x-mdl-sdfile
• other Chemical MIME formats
This allows retrieving the same data in a convenient format, but the URL links to compound and feature resources are lost
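Choosing among these formats is a matter of setting the HTTP Accept header on a GET request to the dataset URI. The Python sketch below builds such a request without sending it. The names DATASET_FORMATS and dataset_request and the short format keys are hypothetical; the MIME types are the ones listed on the slide, written with their usual punctuation.

```python
from urllib.request import Request

# Short keys are illustrative; the MIME types are those listed for dataset services.
DATASET_FORMATS = {
    "rdf": "application/rdf+xml",
    "csv": "text/csv",
    "arff": "text/x-arff",
    "pdf": "application/pdf",
    "sdf": "chemical/x-mdl-sdfile",
}


def dataset_request(dataset_uri, fmt="rdf"):
    """Build a GET request for a dataset in the requested representation."""
    return Request(dataset_uri, headers={"Accept": DATASET_FORMATS[fmt]})


req_csv = dataset_request("http://apps.ideaconsult.net:8080/ambit2/dataset/9", "csv")
print(req_csv.get_method(), req_csv.full_url, req_csv.get_header("Accept"))
```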
@prefix ad: <http://apps.ideaconsult.net:8080/ambit2/dataset/> .
@prefix af: <http://apps.ideaconsult.net:8080/ambit2/feature/> .
@prefix ot: <http://www.opentox.org/api/1.1#> .
…
ad:9 a ot:Dataset ;
    ot:dataEntry
        [ a ot:DataEntry ;
          ot:compound <http://apps.ideaconsult.net:8080/ambit2/compound/413/conformer/409421> ;
          ot:values
              [ a ot:FeatureValue ;
                ot:feature af:21576 ;
                ot:value "3.309999942779541"^^xsd:double
              ] ;
          ot:values
              [ a ot:FeatureValue ;
                ot:feature af:21573 ;
                ot:value "3.0"^^xsd:double
              ]
        ] .
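The nesting in this representation (dataset, data entries, one compound per entry, one feature/value pair per "column") maps directly onto a table. The Python sketch below flattens an in-memory copy of that structure into a header and rows. It does not parse RDF: the dictionary layout, the function name dataset_to_table and the sample values are illustrative assumptions modelled on the N3 example.

```python
AF = "http://apps.ideaconsult.net:8080/ambit2/feature/"

# In-memory stand-in for the N3 above: one data entry with two feature values.
entries = [
    {
        "compound": "http://apps.ideaconsult.net:8080/ambit2/compound/413/conformer/409421",
        "values": [
            {"feature": AF + "21576", "value": 3.309999942779541},
            {"feature": AF + "21573", "value": 3.0},
        ],
    },
]


def dataset_to_table(entries):
    """Flatten data entries into a header row plus one row per entry."""
    features = []
    for entry in entries:
        for fv in entry["values"]:
            if fv["feature"] not in features:
                features.append(fv["feature"])   # keep first-seen column order
    rows = []
    for entry in entries:
        by_feature = {fv["feature"]: fv["value"] for fv in entry["values"]}
        # Missing values become None so every row has the same width.
        rows.append([entry["compound"]] + [by_feature.get(f) for f in features])
    return ["compound"] + features, rows


header, rows = dataset_to_table(entries)
print(header)
print(rows[0])
```

Because each column header is a feature URI, the table keeps the link back to the feature resource that flat formats such as CSV lose.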
Dataset metadata and features
Description | URI template
Retrieve the entire dataset content. If uri-list, retrieve only compound URIs | http://host:port/dataset/{id}
Retrieve a representation of the features (columns) of the dataset | http://host:port/dataset/{id}/feature
Retrieve dataset metadata (name etc.) | http://host:port/dataset/{id}/metadata
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/dataset/9/metadata
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    ……
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Dataset rdf:about="dataset/9">
    <dc:source>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:source>
    <dc:publisher>somebody</dc:publisher>
    <rdfs:seeAlso>
      <bx:Entry rdf:about="reference/20117">
        <rdfs:seeAlso>http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html</rdfs:seeAlso>
        <dc:title>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:title>
      </bx:Entry>
    </rdfs:seeAlso>
    <dc:title>ISSCAN: Istituto Superiore di Sanita, CHEMICAL CARCINOGENS: STRUCTURES AND EXPERIMENTAL DATA</dc:title>
  </ot:Dataset>
</rdf:RDF>
Data publishing
Upload data receive
dataset URL
httphost2datasetid
AnnotationFind chemical compounds
return dataset URL
httphost2compoundid
HTTP GET POST
Dataset service
Structures
endpoints
amppredictions
1)POST a file with chemical structures and properties to
OpenTox dataset servicebullThe structures and data are assigned a dataset URL and
become available by multiple formats (RDF Chemical
MIME CSV Weka ARFF)
2)Assign metadata
bullPUT datasetidmetadata
3)Annotate any of dataset features
datasetidfeature by assigning links to relevant
ontologies
bullPUT featureid
Toxiciology related
ontologies
Algorithms
ontologies
Resources Algorithm
AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by
different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at
httphostportalgorithm
Ideaconsult Ltd34
curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-
devalgorithm
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval
Resources Algorithm
Representation
bull Multiple type of algorithms
bull descriptor calculation algorithms
bull machine learning procedures
bull data preprocessing
bull The representation of algorithms is again defined by
Opentox ontology where all algorithms are subclass of
otAlgorithm
Algorithm types ontology
httpopentoxorgdatadocumentsdevelopmentRDF2
0filesAlgorithmTypes
bull provides a classification of algorithm types
bull Algorithm type in RDF representation is set by direct
subclassing (rdftype) of a class from the algorithm types
ontology (otahttpwwwopentoxorgalgorithmsowl )
eg ltmyalgorithmgt rdftype otaClassification
Ideaconsult Ltd35
Resources: Algorithm
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/algorithm/J48
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ot="http://www.opentox.org/api/1.1#"
    …
    xmlns:ota="http://www.opentox.org/algorithmTypes.owl#">
  <ota:Supervised rdf:about="http://apps.ideaconsult.net:8080/ambit2/algorithm/J48">
    <dc:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Classification Decision tree J48</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#Classification"/>
    <dc:description rdf:datatype="http://www.w3.org/2001/XMLSchema#string"></dc:description>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#SingleTarget"/>
    <dc:publisher rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">Somebody</dc:publisher>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#EagerLearning"/>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Algorithm"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">Sat Jul 31 17:11:26 EEST 2010</dc:date>
  </ota:Supervised>
</rdf:RDF>
Representation
• The algorithm name is defined by dc:title.
• Parameters supported by the algorithm are specified via the object property ot:parameters and should be of class ot:Parameter (as defined in opentox.owl). These entries only document which parameters are required to run the algorithm; the values themselves should be provided by the client when initiating the calculations via POST.
• Algorithm types are distinguished by means of the Algorithm types ontology.
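As an illustration of how a client might read the algorithm types out of such an RDF/XML representation, here is a minimal Python sketch. It is not part of OpenTox or AMBIT: the function name `algorithm_types` and the `SAMPLE` document are assumptions made for this example, and the code only handles the simple RDF/XML shape shown above (a typed node element with nested rdf:type elements), not RDF/XML in general.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OTA_NS = "http://www.opentox.org/algorithmTypes.owl#"


def algorithm_types(rdf_xml):
    """Collect local names of algorithm-type classes (hypothetical helper)."""
    types = set()
    for node in ET.fromstring(rdf_xml):
        # A typed node element such as <ota:Supervised> is itself an rdf:type.
        namespace, _, local_name = node.tag[1:].partition("}")
        if namespace == OTA_NS:
            types.add(local_name)
        # Additional types are given as nested <rdf:type rdf:resource="..."/>.
        for type_element in node.findall("{%s}type" % RDF_NS):
            resource = type_element.get("{%s}resource" % RDF_NS, "")
            if resource.startswith(OTA_NS):
                types.add(resource[len(OTA_NS):])
    return types


# Illustrative document only; example.org is a placeholder host.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ota="http://www.opentox.org/algorithmTypes.owl#">
  <ota:Supervised rdf:about="http://example.org/algorithm/J48">
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#Classification"/>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#EagerLearning"/>
  </ota:Supervised>
</rdf:RDF>"""

print(sorted(algorithm_types(SAMPLE)))
```

A production client would more likely use a full RDF library, since the same triples can be serialised in several equivalent RDF/XML forms.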
Resources: Model
• Representations of predictive models
• A Model is created by HTTP POST to an ot:Algorithm with specific parameters and/or an input ot:Dataset
Representation
• The model name is defined by the dc:title property.
• The model creator may be defined by the dc:creator property.
• The date of model creation is defined by the dc:date property.
• The algorithm is defined by the ot:algorithm object property.
• The independent variables are instances of ot:Feature, defined by the ot:independentVariables property (can be multiple).
• The dependent variables are instances of ot:Feature, defined by the ot:dependentVariables property (can be multiple).
• The variables where prediction results will be stored are instances of ot:Feature, defined by ot:predictedVariables (can be multiple).
• Parameters are defined by ot:parameters.
• The training dataset is an instance of ot:Dataset, defined by ot:trainingDataset.
Linked resources: Compound, Algorithm, Model, Dataset, Features
[Figure: the Algorithm service, Model resource, Dataset resource, Feature services and Compound service, linked to descriptor resources, assay resources and chemical compounds.]
Linked resources: Compound, Algorithm, Model, Dataset, Features
[Figure: Dataset, descriptor, assay and chemical compound resources annotated via the Blue Obelisk algorithms ontology, the OpenTox algorithm types ontology (regression, classification, quantum chemistry, descriptors, etc.) and toxicology-related ontologies.]
Make the model available
Register at the OpenTox ontology service:
  - RDF triple storage
  - Accepts HTTP POST
  - SPARQL endpoint
$ curl -X POST -d "uri=http://apps.ideaconsult.net:8080/ambit2/model/57" http://apps.ideaconsult.net:8080/ontology
The model becomes visible to applications.
[Figure: the Model service registers model/{id} by HTTP POST at the Ontology service, which holds published models, algorithms, ontologies and metadata.]
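The registration step can be sketched in Python as follows. This is an illustrative sketch only: `registration_request` is a hypothetical helper, the request is built but not sent, and the SPARQL query is an assumed example of what an application might ask the endpoint (how the endpoint expects the query to be submitted is not stated in the slides, so that part is left out).

```python
from urllib.parse import urlencode
from urllib.request import Request


def registration_request(ontology_service_uri, resource_uri):
    """Build the POST that registers resource_uri (hypothetical helper)."""
    # Same form field as in the curl example: uri=<resource URI>.
    body = urlencode({"uri": resource_uri}).encode("ascii")
    return Request(ontology_service_uri, data=body, method="POST")


# Assumed example query: list registered models with their titles.
LIST_MODELS_QUERY = """
PREFIX ot: <http://www.opentox.org/api/1.1#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?model ?title
WHERE { ?model a ot:Model ; dc:title ?title . }
"""

request = registration_request(
    "http://apps.ideaconsult.net:8080/ontology",
    "http://apps.ideaconsult.net:8080/ambit2/model/57",
)
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

Passing the request to `urllib.request.urlopen` would perform the actual registration; it is omitted here so that the sketch has no side effects.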
Services implementation by partner and service type
Service types (columns): Compound | Dataset | Feature | Algorithm (processing) | Algorithm (model) | Model | Task | Report | Validation | Authentication and Authorisation service | Ontology service
Partner No. | Services implemented
2 | Y Y Y Y Y Y
3 (IDEA) | Y Y Y Y Y Y Y Y
5 | Y Y Y
6 | Y Y Y Y
7 | Y Y Y
10 | Y
All components are implemented as REST web services.
There can be multiple implementations of the same type of component.
(A subset of) the services can be hosted by the same provider or by multiple providers at separate locations.
http://apps.ideaconsult.net:8080/ambit2
OpenTox is a framework
Framework
• Toxicity data
• Predictive models
• Validation support
• Interpretation aids
Unified access
• Toxicologists
• Modelers
• API for new algorithm development & integration
Open source
• To optimise impact
• To allow inspection and review
• To attract external contributors
Demo applications
OpenTox services can be used to develop specific applications or be embedded in workflow systems.
• Two end-user-oriented demo applications that use OpenTox web services have been developed and deployed, and are available for testing: http://toxcreate.org and http://toxpredict.org
• ToxCreate creates models from user-supplied datasets.
• ToxPredict uses existing OpenTox models to estimate chemical compound properties.
RDF: Lessons learned
• OpenTox specific: it did not start as a Linked Data/RDF project.
• The mix of REST and RDF is not (yet) popular … but it is natural to be able to retrieve a (partial) resource representation described by triples.
• Steep learning curve
• Some hard topics:
  - Data model vs. format
  - The subject-predicate-object concept vs. tabular/hierarchical/other implicit structure
  - Recognising the added value of RDF over XML, JSON, YAML, plain text, etc.
RDF: Performance
RDF representation is verbose … in-memory RDF libraries are slow … and streaming parsers/writers are lacking …
[Figure: two charts, "A dataset with 320 chemicals, 60 columns" and "A dataset with 6500 chemicals, 12 columns".]
RDF: Wish list
• A convenient explanation of the subject-predicate-object concept for beginners
• A (high-performance) triple store should not be a mandatory requirement for publishing RDF data
• Fast streaming parsers and writers
• (Terse) JSON serialisation
• Security
• Synchronisation of distributed RDF content
Thank you
Build an application with the OpenTox REST Web Services API:
http://opentox.org/dev/apis/api-1.1
Download the AMBIT implementation of the OpenTox API and launch your own OpenTox service:
http://ambit.sourceforge.net
OpenTox resources (1)
OpenTox considers the following set of entities as essential building blocks:
• Structures of chemical compounds
• Properties and identifiers of chemical compounds
• Datasets of chemical compounds and various properties (measured or calculated)
• Algorithms
  - Data processing algorithms
    • Algorithms generating certain values based on chemical structure (e.g. descriptor calculation)
    • Data preprocessing (e.g. principal component analysis, feature selection)
    • Structure processing (e.g. structure optimization)
    • Algorithms relating a set of structures to another set of structures (e.g. similarity search or metabolite generation)
  - Machine learning algorithms
    • Supervised (e.g. regression, classification)
    • Unsupervised (e.g. clustering)
  - Prediction algorithms defined by experts (e.g. a series of structural alerts defined by human experts, not derived by learning algorithms)
OpenTox resources (2)
• Models are generated by the respective algorithms, given specific parameters.
• Statistical models are generated by applying statistical/machine learning algorithms to a specific dataset and parameters.
• Models can be other than statistical, e.g. expert-defined rules, quantum mechanical calculations, metabolite generation, etc. The intention of the framework is to be generic enough to accommodate a variety of predictive models.
• Validation provides procedures independent of the model building facilities (e.g. cross-validation) and generates relevant statistics.
• Reports
  - Various types of reports can be generated using the building blocks above (e.g. a validation report can be generated from a validation object, a model and a dataset).
• In addition, the following components are introduced:
  - Task (asynchronous processing of computationally intensive tasks)
  - Authentication and authorization (ensuring secure access to sensitive resources)
  - Ontology service (provides RDF storage and a SPARQL endpoint for resource registration)
Resources identification
All resources are identified via a unique web address, assigned according to the URL templates.
Component | Description | URL template (example)
Compound | Representations of chemical compounds | http://host:port/compound/{compoundid}
Feature | Properties and identifiers | http://host:port/feature/{featureid}
Dataset | Encapsulates a set of chemical compounds and their property values | http://host:port/dataset/{datasetid}
Model | OpenTox model services | http://host:port/model/{modelid}
Algorithm | OpenTox algorithm services | http://host:port/algorithm/{algorithmid}
Validation, Report | A validation corresponds to the validation of a model on a test dataset | http://host:port/validation/{validationid}, http://host:port/report/{reportid}
Task | Asynchronous jobs are handled via an intermediate Task resource. A resource submitting an asynchronous job should return the URI of the task. | http://host:port/task/{taskid}
Ontology service | Provides storage and SPARQL search functionality for objects defined in OpenTox services and relevant ontologies | http://host:port/ontology
Authentication and authorisation | Granting access to protected resources for authorised users | http://host:port/opensso, http://host:port/opensso-pol
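The URL templates lend themselves to a small lookup helper. The following Python sketch is an assumption-laden illustration, not an OpenTox client library: the names `TEMPLATES`, `resource_uri` and `collection_uri` are invented here, and real deployments may add a path prefix (AMBIT, for instance, serves its resources under /ambit2).

```python
from urllib.parse import quote

# One template per component, following the table of URL templates.
TEMPLATES = {
    "compound": "http://{host}:{port}/compound/{id}",
    "feature": "http://{host}:{port}/feature/{id}",
    "dataset": "http://{host}:{port}/dataset/{id}",
    "model": "http://{host}:{port}/model/{id}",
    "algorithm": "http://{host}:{port}/algorithm/{id}",
    "validation": "http://{host}:{port}/validation/{id}",
    "report": "http://{host}:{port}/report/{id}",
    "task": "http://{host}:{port}/task/{id}",
}


def resource_uri(kind, resource_id, host="localhost", port=8080):
    """Return the URI of a single resource, e.g. one model."""
    # The identifier is percent-encoded so it stays a single path segment.
    safe_id = quote(str(resource_id), safe="")
    return TEMPLATES[kind].format(host=host, port=port, id=safe_id)


def collection_uri(kind, host="localhost", port=8080):
    """Return the URI of the collection, e.g. the list of all models."""
    return resource_uri(kind, "", host, port).rstrip("/")


print(resource_uri("model", 57, host="example.org"))
print(collection_uri("dataset", host="example.org"))
```

Keeping the templates in one table mirrors the design of the API itself: a client only needs the component name and an identifier to address any resource.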
OpenTox REST operations
Individual resources (e.g. a dataset or a model)
• URI template: http://host:port/{resource}/{resourceid}, e.g. http://host:port/model/{model_id} or http://host:port/dataset/{dataset_id}
• GET: retrieve a representation of the resource
• PUT: update the representation of the resource
• POST:
  - replace the representation of the resource with a new one (e.g. replace the dataset with new content)
  - initiate calculations based on this resource (e.g. submit a dataset URI to an algorithm resource and obtain a model URI as a result)
• DELETE: delete the resource
Collections of resources (e.g. the list of all available models or datasets)
• URI template: http://host:port/{resource} (e.g. http://host:port/model or http://host:port/dataset)
• GET: retrieve a representation of multiple resources (e.g. retrieve all available algorithms)
• PUT: N/A
• POST: create a new resource and return its URI (e.g. create a new dataset by submitting new dataset content to the dataset service)
• DELETE: N/A
Build a predictive model
[Figure: Create a model. Calculations are run with the dataset http://host1/dataset/{id} (structures, descriptors, endpoints) from the Dataset service by HTTP POST to the Algorithm service (regression, classification, quantum chemistry, descriptors, etc.); the Model service returns the model URL http://host1/model/{id}. The Validation service produces validation/{id}, and model/{id} is registered by HTTP POST at the Ontology service (published models, algorithms, ontologies, metadata).]
Uniform approach to model creation
Read data from a web address, process, write to a web address.
[Figure: Dataset (http://myhost.com/dataset/trainingset1) + Algorithm (http://myhost.com/algorithm/neuralnetwork) = Model (http://myhost.com/model/predictivemodel1). Feature, Compound, Dataset, Algorithm and Model resources each support GET, POST, PUT and DELETE.]
Use an algorithm to build a model
• An algorithm is applied by submitting an HTTP POST to the algorithm URI and providing the required parameters.
• A common required parameter is dataset_uri=http://host:port/dataset/{datasetid}, which specifies the dataset to be operated on.
• In REST-style services, HTTP POST returns the URI of the result, not the content of the result.
• The algorithm services are designed to store the results in a dataset service and return the URL of the resulting dataset.
• For slow calculations, a Task URI is returned instead of the dataset URI.
$ curl -H "Accept:text/uri-list" -X POST \
    -d "dataset_uri=http://apps.ideaconsult.net:8080/ambit2/dataset/1037" \
    -d "prediction_feature=http://apps.ideaconsult.net:8080/ambit2/feature/26701" \
    -d "dataset_service=http://apps.ideaconsult.net:8080/ambit2/dataset" \
    http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48 -iv
* Connected to opentox.informatik.tu-muenchen.de (131.159.28.16) port 8080 (#0)
> POST /OpenTox-dev/algorithm/J48 HTTP/1.1
> Host: opentox.informatik.tu-muenchen.de:8080
> Accept: text/uri-list
> Content-Type: application/x-www-form-urlencoded
< HTTP/1.1 202 Accepted
< Date: Sat, 31 Jul 2010 14:46:38 GMT
< Location: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5
< Accept-Ranges: bytes
< Server: Noelios-Restlet-Engine/1.1.snapshot
< Content-Type: text/uri-list;charset=ISO-8859-1
< Content-Length: 99
<
* Connection #0 to host opentox.informatik.tu-muenchen.de left intact
* Closing connection #0
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5
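The same POST can be assembled programmatically. Below is a minimal Python sketch using only the standard library; `build_training_request` is a hypothetical helper name, the parameter names are the ones used in the curl call, and the request is constructed but deliberately not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_training_request(algorithm_uri, dataset_uri, prediction_feature,
                           dataset_service=None):
    """Build the form-encoded POST that asks an algorithm to train a model."""
    params = {
        "dataset_uri": dataset_uri,
        "prediction_feature": prediction_feature,
    }
    if dataset_service is not None:
        # Optional: where the algorithm service should store its results.
        params["dataset_service"] = dataset_service
    return Request(
        algorithm_uri,
        data=urlencode(params).encode("ascii"),
        method="POST",
        headers={
            # Ask for the URI of the result, not its content.
            "Accept": "text/uri-list",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


training_request = build_training_request(
    "http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48",
    "http://apps.ideaconsult.net:8080/ambit2/dataset/1037",
    "http://apps.ideaconsult.net:8080/ambit2/feature/26701",
    "http://apps.ideaconsult.net:8080/ambit2/dataset",
)
print(training_request.get_method(), training_request.full_url)
```

Sending it with `urllib.request.urlopen(training_request)` would yield either the result URI or, for slow calculations, a task URI with status 202.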
Resources: The model
$ curl -iv -H "Accept:text/uri-list" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c
* About to connect() to opentox.informatik.tu-muenchen.de port 8080 (#0)
*   Trying 131.159.28.16... connected
* Connected to opentox.informatik.tu-muenchen.de (131.159.28.16) port 8080 (#0)
> GET /OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5 HTTP/1.1
> User-Agent: curl/7.18.2 (x86_64-pc-linux-gnu) libcurl/7.18.2 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.8 libssh2/0.18
> Host: opentox.informatik.tu-muenchen.de:8080
> Accept:text/uri-list
>
< HTTP/1.1 200 OK
< Date: Sat, 31 Jul 2010 14:47:22 GMT
< Location: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48
< Vary: Accept-Charset, Accept-Encoding, Accept-Language, Accept
< Accept-Ranges: bytes
< Server: Noelios-Restlet-Engine/1.1.snapshot
< Content-Type: text/uri-list;charset=ISO-8859-1
< Content-Length: 86
<
* Connection #0 to host opentox.informatik.tu-muenchen.de left intact
* Closing connection #0
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48
• When a task URI is returned, the status code is HTTP 202 Accepted instead of HTTP 200 OK.
• This tells the client that processing is not yet complete; the client needs to poll the task URI until an OK code is returned.
• The final result returned by this example is the URI of the new model: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48
• To obtain prediction results, POST a dataset to the model URI.
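The polling behaviour described above (202 Accepted while the task runs, 200 OK with the result URI when it is done) can be sketched as a short loop. This is an illustrative Python sketch, not OpenTox client code: `wait_for_result` and `http_fetch` are hypothetical names, and the polling interval and retry limit are arbitrary assumptions.

```python
import time
import urllib.request


def http_fetch(uri):
    """GET uri as text/uri-list and return (status_code, body_text)."""
    request = urllib.request.Request(uri, headers={"Accept": "text/uri-list"})
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("iso-8859-1")


def wait_for_result(task_uri, fetch=http_fetch, interval=1.0, max_polls=60,
                    sleep=time.sleep):
    """Poll task_uri until 200 OK; return the result URI from the body."""
    for _ in range(max_polls):
        status, body = fetch(task_uri)
        if status == 200:
            # Finished: the body holds the URI of the model or dataset.
            return body.strip()
        if status != 202:
            raise RuntimeError("unexpected status %d for %s" % (status, task_uri))
        # 202 Accepted: still running, wait before asking again.
        sleep(interval)
    raise TimeoutError("task did not finish after %d polls" % max_polls)
```

The `fetch` and `sleep` arguments are injectable so the loop can be exercised without a network, for example with a stub that answers 202 twice and then 200.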
Uniform approach to data processing (e.g. descriptor calculation)
Read data from a web address, process, write to a web address.
[Figure: Dataset (http://myhost.com/dataset/trainingset1) + Algorithm (http://myhost.com/algorithm/descriptorX) = Dataset (http://myhost.com/dataset/results). Feature, Compound, Dataset and Algorithm resources each support GET, POST, PUT and DELETE.]
Uniform approach to model validation and report generation
Read data from a web address, process, write to a web address.
[Figure: Dataset (http://myhost.com/dataset/trainingset1) + Model (http://myhost.com/model/predictivemodel1, the model generating predictions, with results in http://myhost.com/dataset/predictedresults1) = Validation (http://myhost.com/validation) and Report (the validation report, http://myhost.com/report/1).]
Apply predictive models
[Figure: Available endpoints and model URLs (e.g. http://host1/model/{id}) are retrieved from the Ontology service (published models, algorithms, ontologies, metadata) via SPARQL over HTTP POST. The model http://host1/model/{id} is applied to the dataset http://host2/dataset/{id} by HTTP POST to the Model service, which returns the results dataset URL http://host/dataset/{id}.]
Uniform approach to model prediction
Read data from a web address, process, write to a web address.
[Figure: Dataset (http://myhost.com/dataset/id1) + Model (http://myhost.com/model/predictivemodel1) = Dataset (http://myhost.com/dataset/results1).]
Uniform access to data
[Figure: Publish your data, retrieve linked data. Upload data and receive a dataset URL (http://host2/dataset/{id}); annotate it; find chemical compounds and receive a dataset URL (http://host2/compound/{id}). All operations are HTTP GET/POST against the Dataset service (structures, endpoints & predictions), linked to the Ontology service (published models, algorithms, ontologies, metadata).]
RDF: Resources representation
• The opentox.owl ontology
  - A common OWL data model of all OpenTox resources
  - Describes OpenTox resources
  - Describes the relationships between them
  - Is used to generate the RDF representations of objects
• RDF/XML representation is mandatory for OpenTox resources
• Uniform approach to data representation
  - Calculated and measured properties of chemical compounds are represented in a uniform way
  - Linked to the resource used for data generation
  - Annotated via ontology entries
  - Model representations link to the algorithms and data used
All OpenTox components are defined by an OWL ontology: http://opentox.org/api/1.1/opentox.owl
All resources are subclasses of ot:OpenToxResource.
Resources: Chemical compound
Compound
Provides different representations for chemical compounds with a unique and defined chemical structure.
  /compound/{id}
Conformer
  /compound/{id}/conformer/{id}
Documentation: http://opentox.org/dev/apis/api-1.1/structure
Representation
• A subclass of ot:OpenToxResource
• Supports different Chemical MIME formats
• RDF representation only for specifying owl:sameAs links to external resources
Example 1: Retrieve compound as MOL
$ curl -H "Accept:chemical/x-mdl-molfile" http://apps.ideaconsult.net:8080/ambit2/compound/1
CH2O
  APtclcactv09040902283D 0   0.00000     0.00000

  4  3  0  0  0  0  0  0  0  0999 V2000
   -0.6004    0.0000    0.0001 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.6072    0.0000   -0.0004 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.1472    0.9353    0.0016 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.1472   -0.9353    0.0016 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  0  0  0  0
  2  4  1  0  0  0  0
Example 2: Retrieve compound as SMILES
$ curl -H "Accept:chemical/x-daylight-smiles" http://apps.ideaconsult.net:8080/ambit2/compound/1
O=C
Example 3: Query compounds
$ curl -H "Accept:chemical-mime" http://apps.ideaconsult.net:8080/ambit2/query/compound/any-identifier-or-keyword
$ curl -H "Accept:chemical-mime" http://apps.ideaconsult.net:8080/ambit2/query/smarts?search=smarts
Resources: Feature
Feature
A Feature is a resource representing any kind of property or identifier assigned to a Compound. The feature types are determined via their links to ontologies (feature ontologies, descriptor ontologies, endpoint ontologies).
  /feature/{id}
Documentation: http://opentox.org/dev/apis/api-1.1/feature
Representation
ot:Feature, a subclass of ot:OpenToxResource. The RDF/XML format is mandatory.
Properties
• Name: defined by dc:title (Dublin Core namespace)
• Units: defined by the ot:units annotation property (OpenTox namespace)
• Creator: defined by the dc:creator annotation property (Dublin Core namespace)
• The origin of the Feature is defined by the ot:hasSource object property (OpenTox namespace) and can be an ot:Algorithm, ot:Model or ot:Dataset.
• Relations to other resources that represent the same entity can be established via the owl:sameAs property. This approach can be used, for example, to link the ot:Feature resource to a resource from another ontology (an example follows).
• The subclasses ot:NumericFeature, ot:StringFeature and ot:NominalFeature denote whether a feature holds numeric, string or nominal values.
Resources: Feature (an example)
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/feature/22114
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns:otee="http://www.opentox.org/echaEndpoints.owl#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns="http://apps.ideaconsult.net:8080/ambit2/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#Algorithm"/>
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#Feature"/>
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#NumericFeature">
    <rdfs:subClassOf rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </owl:Class>
  <ot:NumericFeature rdf:about="feature/22114">
    <ot:hasSource>
      <ot:Algorithm rdf:about="algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor"/>
    </ot:hasSource>
    <owl:sameAs rdf:resource="http://www.opentox.org/echaEndpoints.owl#Octanol-water_partition_coefficient_Kow"/>
    <dc:title>XLogP</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </ot:NumericFeature>
</rdf:RDF>
The example shows an OpenTox feature with the title "XLogP", identified by the URI http://apps.ideaconsult.net:8080/ambit2/feature/22114
• Linked to an entry of a simplified ontology of toxicological endpoints
The algorithm used to generate values for this feature:
• Specified by the ot:hasSource property
• Identified by the URI http://apps.ideaconsult.net:8080/ambit2/algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor
• Note that the URI identifies an OpenTox Algorithm resource
• The algorithm URI itself is dereferenceable
• Can be used to initiate calculations of the XLogP descriptor
Resources: Feature (an example)
$ curl -H "Accept:text/n3" http://apps.ideaconsult.net:8080/ambit2/feature/26184
@prefix ot: <http://www.opentox.org/api/1.1#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix : <http://apps.ideaconsult.net:8080/ambit2/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix af: <http://apps.ideaconsult.net:8080/ambit2/feature/> .
ot:hasSource
    a owl:ObjectProperty .
ot:units
    a owl:DatatypeProperty .
ot:Feature
    a owl:Class .
ot:NumericFeature
    a owl:Class ;
    rdfs:subClassOf ot:Feature .
af:26184
    a ot:Feature , ot:NumericFeature ;
    dc:creator "http://ambit.uni-plovdiv.bg:8080/ambit2" ;
    dc:title "TUM_CDK_XLogP" ;
    ot:hasSource <http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem/XLogPDescriptor> ;
    ot:units "" ;
    = otee:Octanol-water_partition_coefficient_Kow .
An example (N3) of another ot:Feature resource: the XLogP descriptor (again)
• Generated by a different implementation
• In this case its name is "TUM_CDK_XLogP", and the algorithm resource used to generate it resides at the Technical University of Munich (TUM): http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem/XLogPDescriptor
  This algorithm URL could also be used to initiate descriptor calculations.
• The representation of Algorithm resources refers to the Blue Obelisk ontology entry http://www.blueobelisk.org/ontologies/chemoinformatics-algorithms/#xlogP
Toxicity endpoints ontologies
• Derived from the ECHA classification of endpoints published in the REACH guidance documents
• Physicochemical properties and various toxicological endpoints
• The hierarchy does not represent the complexity of toxicological assays, but it can be used as a first approximation to assign meaning to the data entries and to generate a REACH report
• More specific descriptions of toxicological assays can be used as well
• Ontologies for specific toxicity assays are being developed by OpenTox partners
• The ECHA endpoints ontology: http://www.opentox.org/echaEndpoints.owl
Resources: a feature
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/feature/3
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns="http://apps.ideaconsult.net:8080/ambit2/"
    xmlns:af="http://apps.ideaconsult.net:8080/ambit2/feature/"
    …
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Feature rdf:about="feature/3">
    <dc:creator></dc:creator>
    <ot:hasSource>ECHA…</ot:hasSource>
    <owl:sameAs rdf:resource="http://www.opentox.org/api/1.1#EINECS"/>
    <ot:units></ot:units>
    <dc:title>EC</dc:title>
  </ot:Feature>
</rdf:RDF>
An illustration of an ot:Feature imported from a file rather than calculated. The example shows a feature representing the EINECS number, imported from the ECHA preregistration list.
Feature summary
• The OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs.
• These URIs are dereferenceable.
• Features can be given different levels of meaning by linking them to ontology entries (e.g. algorithms or toxicological endpoints), as well as to the algorithms that can be used to generate the property values.
• The same approach can be used to denote assays, species, functional groups, etc., provided that they are defined by an ontology.
Resources: Dataset
Dataset
Provides access to chemical compounds and their features (e.g. structural, physical-chemical, biological, toxicological properties).
Operations
• POST: upload a dataset
• PUT: update the dataset content
• DELETE: remove the dataset
Representation
RDF/XML (mandatory)
• The dataset consists of data entries.
• Each entry is associated with exactly one chemical compound, identified by its URI and available via the OpenTox Compound service API.
• One and the same compound can be associated with multiple dataset entries.
• Every "column" is associated with a Feature; its representation should be available via the OpenTox Feature API.
Resources: Dataset
Representation
The dataset services optionally support formats other than RDF:
• text/csv (comma delimited)
• text/x-arff (Weka ARFF)
• application/pdf
• chemical/x-mdl-sdfile
• other Chemical MIME formats
This allows retrieving the same data in a convenient format, but the URL links to compound and feature resources are lost.
@prefix ad: <http://apps.ideaconsult.net:8080/ambit2/dataset/> .
@prefix af: <http://apps.ideaconsult.net:8080/ambit2/feature/> .
@prefix ot: <http://www.opentox.org/api/1.1#> .
…
ad:9 a ot:Dataset ;
    ot:dataEntry
        [ a ot:DataEntry ;
          ot:compound <http://apps.ideaconsult.net:8080/ambit2/compound/413/conformer/409421> ;
          ot:values
              [ a ot:FeatureValue ;
                ot:feature af:21576 ;
                ot:value "3.309999942779541"^^xsd:double
              ] ;
          ot:values
              [ a ot:FeatureValue ;
                ot:feature af:21573 ;
                ot:value "3.0"^^xsd:double
              ]
        ] .
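To make the data-entry structure concrete, the following Python sketch holds entries of the shape shown in the N3 listing (one compound URI plus feature/value pairs) and flattens them into CSV. The in-memory layout and the function name `dataset_to_csv` are assumptions for illustration; they are not the AMBIT implementation. In this sketch the column headers are the feature URIs, so the links survive the flattening; a plain CSV export with human-readable column names would lose them, which is the trade-off noted above.

```python
import csv
import io


def dataset_to_csv(entries):
    """Flatten data entries into CSV text, one row per data entry.

    entries: list of {"compound": uri, "values": {feature_uri: value}}.
    """
    # Every distinct feature URI becomes one column.
    features = sorted({f for entry in entries for f in entry["values"]})
    output = io.StringIO()
    writer = csv.writer(output, lineterminator="\n")
    writer.writerow(["compound"] + features)
    for entry in entries:
        row = [entry["values"].get(feature, "") for feature in features]
        writer.writerow([entry["compound"]] + row)
    return output.getvalue()


FEATURE = "http://apps.ideaconsult.net:8080/ambit2/feature/"
entries = [
    {
        "compound": "http://apps.ideaconsult.net:8080/ambit2/compound/413/conformer/409421",
        "values": {FEATURE + "21576": 3.309999942779541, FEATURE + "21573": 3.0},
    },
]
print(dataset_to_csv(entries))
```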
Dataset metadata and features
Description | URI template
Retrieve the entire dataset content (if text/uri-list is requested, retrieve only compound URIs) | http://host:port/dataset/{id}
Retrieve the representation of the features (columns) of the dataset | http://host:port/dataset/{id}/feature
Retrieve dataset metadata (name, etc.) | http://host:port/dataset/{id}/metadata
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/dataset/9/metadata
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    ……
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Dataset rdf:about="dataset/9">
    <dc:source>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:source>
    <dc:publisher>somebody</dc:publisher>
    <rdfs:seeAlso>
      <bx:Entry rdf:about="reference/20117">
        <rdfs:seeAlso>http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html</rdfs:seeAlso>
        <dc:title>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:title>
      </bx:Entry>
    </rdfs:seeAlso>
    <dc:title>ISSCAN: Istituto Superiore di Sanita, CHEMICAL CARCINOGENS: STRUCTURES AND EXPERIMENTAL DATA</dc:title>
  </ot:Dataset>
</rdf:RDF>
Data publishing
[Figure: Upload data and receive a dataset URL (http://host2/dataset/{id}); annotate it; find chemical compounds and receive a dataset URL (http://host2/compound/{id}). All operations are HTTP GET/POST against the Dataset service (structures, endpoints & predictions), annotated via toxicology-related ontologies and algorithm ontologies.]
1) POST a file with chemical structures and properties to the OpenTox dataset service.
   • The structures and data are assigned a dataset URL and become available in multiple formats (RDF, Chemical MIME, CSV, Weka ARFF).
2) Assign metadata.
   • PUT /dataset/{id}/metadata
3) Annotate any of the dataset features (/dataset/{id}/feature) by assigning links to relevant ontologies.
   • PUT /feature/{id}
Resources: Algorithm
Algorithm
Provides access to OpenTox algorithms. There are several algorithm services developed by different OpenTox partners. The list of algorithms can be retrieved by an HTTP GET operation at http://host:port/algorithm
$ curl -H "Accept:text/uri-list" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNclassification
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/PLSregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/M5P
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/GaussP
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/FTMsmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/gSpansmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/JOELIB2
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/InfoGainAttributeEval
Resources Algorithm
Representation
bull Multiple type of algorithms
bull descriptor calculation algorithms
bull machine learning procedures
bull data preprocessing
bull The representation of algorithms is again defined by
Opentox ontology where all algorithms are subclass of
otAlgorithm
Algorithm types ontology
httpopentoxorgdatadocumentsdevelopmentRDF2
0filesAlgorithmTypes
bull provides a classification of algorithm types
bull Algorithm type in RDF representation is set by direct
subclassing (rdftype) of a class from the algorithm types
ontology (otahttpwwwopentoxorgalgorithmsowl )
eg ltmyalgorithmgt rdftype otaClassification
Ideaconsult Ltd35
Resources Algorithm
Ideaconsult Ltd36
$ curl -H Acceptapplicationrdf+xml
httpappsideaconsultnet8080ambit2algorithmJ48
ltrdfRDF
xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns
xmlnsot=httpwwwopentoxorgapi11
hellip
xmlnsota=httpwwwopentoxorgalgorithmTypesowl
ltotaSupervised
rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt
ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring
gtClassification Decision tree J48ltdctitlegt
ltrdftype
rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt
ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring
gtltdcdescriptiongt
ltrdftype
rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt
ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI
gtSomebodyltdcpublishergt
ltrdftype
rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt
ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt
ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime
gtSat Jul 31 171126 EEST 2010ltdcdategt
ltotaSupervisedgt
ltrdfRDFgt
RepresentationbullAlgorithm name is defined by dctitle
Parameters supported by the algorithm are
specified via object property otparameters
and should be of class otParameter (as
defined in opentoxowl)
These entries serve as a information what
parameters are required in order to run the
algorithm the values itself should be
provided by the client when initiating the
calculations via POST
bullAlgorithm types are distinguished by
means of Algorithm types ontology
Resources Model
bull Representations of predictive models
bull A Model is created by HTTP POST to an
otAlgorithm with specific parameters
andor input otDataset
Ideaconsult Ltd37
Representation
bull Model Name is defined by dctitle property
bull Model creator might be defined by dccreator
property
bull The date of Model creation is defined by dcdate
property
bull The Algorithm defined by otalgorithm object
property
bull The independent variables are instances of
otFeature defined by otindependentVariables
property (can be multiple)
bull The dependent variables are are instances of
otFeature and are defined by
otdependentVariables property (can be multiple)
bull The variables where prediction results will be
stored are are instances of otFeature and are
defined by otpredictedVariables (can be multiple)
bull Parameters are defined by otparameters
bull The training Dataset is an instance of otDataset and
defined by ottrainingDataset
Algorithm
service
Linked resources Compound Algorithm Model Dataset Features
Ideaconsult Ltd38
Model
resourceDataset
Resource
Descriptor
resource
Assay
resource
Chemical
compound
Feature
service
Feature
service
Compoun
d service
Linked resources Compound Algorithm Model Dataset Features
Ideaconsult Ltd39
Dataset
Resource
Descriptor
resource
Assay
resource
Chemical
compound
Regression
Classification
Quantum
Chemistry
Descriptorsetc
Blue Obelisk
algorithms
ontology
OpenTox
algorithm types
ontology
Toxiciology related
ontologies
Make the model available
40Ideaconsult Ltd
Register at OpenTox ontology servicendash RDF tripple storage
ndash Accepts HTTP POST
ndash SPARQL endpoint
Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology
Becomes visible for applications
modelidPublished models
Algorithms
Ontologies
metadata
Ontology service
HTTP POST
Model service
Services implementation by partner and service type
Service types (table columns): Compound, Dataset, Feature, Algorithm (processing), Algorithm (model), Model, Task, Report, Validation, Authentication and Authorisation service, Ontology service
Partner No | Services implemented (Y marks; column positions not recoverable)
2 | Y Y Y Y Y Y
3 (IDEA) | Y Y Y Y Y Y Y Y
5 | Y Y Y
6 | Y Y Y Y
7 | Y Y Y
10 | Y
All components are implemented as REST web services.
There can be multiple implementations of the same type of component.
(A subset of) the services can be hosted by the same provider or by multiple providers at separate locations.
http://apps.ideaconsult.net:8080/ambit2
OpenTox Is A Framework
Framework
Unified Access
Open Source
• Toxicity data
• Predictive models
• Validation support
• Interpretation aids
• Toxicologists
• Modelers
• API for new algorithm development & integration
• To optimise impact
• To allow inspection and review
• To attract external contributors
Demo applications
OpenTox services can be used to develop specific applications or can be embedded in workflow systems.
• Two end-user-oriented demo applications that make use of OpenTox web services have been developed and deployed, and are available for testing: http://toxcreate.org and http://toxpredict.org
• ToxCreate creates models from user-supplied datasets
• ToxPredict uses existing OpenTox models to estimate chemical compound properties
RDF: Lessons learned
• OpenTox-specific: it did not start as a Linked Data/RDF project
• The mix of REST and RDF is not (yet) popular… but it is natural to be able to retrieve a (partial) resource representation described by triples
• Steep learning curve
• Some hard topics
  - Data model vs. format
  - The subject-predicate-object concept vs. tabular/hierarchical/other implicit structure
  - The recognition of the added value: XML, JSON, YAML, plain text, etc. vs. RDF

RDF: Performance
RDF representation is verbose… and in-memory RDF libraries are slow… lack of streaming parsers/writers…
[Charts: a dataset with 320 chemicals and 60 columns; a dataset with 6500 chemicals and 12 columns]

RDF: Wish list
• A convenient explanation of the subject-predicate-object concept for beginners
• A (high-performance) triple store should not be a mandatory requirement for publishing RDF data
• Fast streaming parsers and writers
• (Terse) JSON serialisation
• Security
• Synchronisation of distributed RDF content
Thank you
Build an application with the OpenTox REST Web Services API:
http://opentox.org/dev/apis/api-1.1
Download the AMBIT implementation of the OpenTox API and launch your own OpenTox service:
http://ambit.sourceforge.net
OpenTox resources (2)
• Models are generated by the respective algorithms, given specific parameters
• Statistical models are generated by applying statistical/machine learning algorithms to a specific dataset and parameters
• Models can be other than statistical, e.g. expert-defined rules, quantum mechanical calculations, metabolite generation, etc. The intention of the framework is to be generic enough to accommodate a variety of predictive models
• Validation provides procedures that are independent of the model-building facilities (e.g. cross-validation) and generates relevant statistics
• Reports
  - Various types of reports can be generated using the building blocks above (e.g. a validation report can be generated from a validation object, a model and a dataset)
• In addition, the following components are introduced
  - Task (asynchronous processing of computationally intensive tasks)
  - Authentication and authorisation (ensuring secure access to sensitive resources)
  - Ontology service (provides RDF storage and a SPARQL endpoint for resource registration)
Resources identification
All resources are identified via a unique web address, assigned according to the URL templates
Component | Description | URL template (example)
Compound | Representations of chemical compounds | http://host:port/compound/{compoundid}
Feature | Properties and identifiers | http://host:port/feature/{featureid}
Dataset | Encapsulates a set of chemical compounds and their property values | http://host:port/dataset/{datasetid}
Model | OpenTox model services | http://host:port/model/{modelid}
Algorithm | OpenTox algorithm services | http://host:port/algorithm/{algorithmid}
Validation, Report | A validation corresponds to the validation of a model on a test dataset | http://host:port/validation/{validationid}, http://host:port/report/{reportid}
Task | Asynchronous jobs are handled via an intermediate Task resource. A resource submitting an asynchronous job should return the URI of the task | http://host:port/task/{taskid}
Ontology service | Provides storage and SPARQL search functionality for objects defined in OpenTox services and relevant ontologies | http://host:port/ontology
Authentication and authorisation | Granting access to protected resources for authorised users | http://host:port/opensso, http://host:port/opensso-pol
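To make the URL templates concrete, here is a minimal Python sketch that builds collection and item URIs from a service root. The component names come from the table above; the helper `resource_uri`, the `COMPONENTS` set and the example host are assumptions made for illustration.

```python
from urllib.parse import quote

# Component names taken from the URL template table; the rest is illustrative.
COMPONENTS = {"compound", "feature", "dataset", "model",
              "algorithm", "validation", "report", "task"}


def resource_uri(base, component, resource_id=None):
    """Return the collection URI, or the item URI when resource_id is given."""
    if component not in COMPONENTS:
        raise ValueError(f"unknown OpenTox component: {component}")
    uri = f"{base.rstrip('/')}/{component}"
    if resource_id is not None:
        # Percent-encode the identifier so it stays a single path segment.
        uri += "/" + quote(str(resource_id), safe="")
    return uri
```

For example, `resource_uri("http://host:8080/", "dataset", 9)` gives the item URI of dataset 9, while omitting the identifier gives the collection URI.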
OpenTox REST operations
Individual resources (e.g. a dataset or a model)
• URI template: http://host:port/{resource}/{resourceid}, e.g. http://host:port/model/{model_id} or http://host:port/dataset/{dataset_id}
• GET - retrieve a representation of the resource
• PUT - update the representation of the resource
• POST
  - replace the representation of the resource with a new one (e.g. replace the dataset with new content)
  - initiate calculations based on this resource (e.g. submit a dataset URI to an algorithm resource and obtain a model URI as a result)
• DELETE - delete the resource
Collections of resources (e.g. the list of all available models or datasets)
• URI template: http://host:port/{resource} (e.g. http://host:port/model or http://host:port/dataset)
• GET - retrieve a representation of multiple resources (e.g. retrieve all available algorithms)
• PUT - N/A
• POST - create a new resource and return its URI (e.g. create a new dataset by submitting new dataset content to the dataset service)
• DELETE - N/A
Build a predictive model
[Diagram: a dataset (structures, descriptors, endpoints) at http://host1/dataset/{id} in the Dataset service is submitted via HTTP POST to the Algorithm service (regression, classification, quantum chemistry, descriptors, etc.) to run calculations and create a model; the call returns the model URL http://host1/model/{id} in the Model service. Also shown: the Validation service (validation/{id}) and the Ontology service (published models, algorithms, ontologies, metadata), reached via HTTP POST with model/{id}]

Uniform approach to models creation
Read data from a web address - process - write to a web address
[Diagram: Dataset (http://myhost.com/dataset/trainingset1) + Algorithm (http://myhost.com/algorithm/neuralnetwork) = Model (http://myhost.com/model/predictivemodel1); the Feature, Compound, Dataset, Algorithm and Model resources are each shown with GET, POST, PUT and DELETE]
Use an algorithm to build a model
• An algorithm is applied by submitting an HTTP POST to the algorithm URI and providing the required parameters
• A common required parameter is dataset_uri=http://host:port/dataset/{datasetid}, which specifies the dataset to be operated on
• HTTP POST in REST-style services returns the URI of the result, not the content of the result
• The algorithm services are designed to store the results in a dataset service and return the URL of the resulting dataset
• In case of slow calculations, a Task URI is returned instead of the dataset URI
$ curl -H "Accept:text/uri-list" -X POST \
    -d "dataset_uri=http://apps.ideaconsult.net:8080/ambit2/dataset/1037" \
    -d "prediction_feature=http://apps.ideaconsult.net:8080/ambit2/feature/26701" \
    -d "dataset_service=http://apps.ideaconsult.net:8080/ambit2/dataset" \
    http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48 -iv
* Connected to opentox.informatik.tu-muenchen.de (131.159.28.16) port 8080 (#0)
> POST /OpenTox-dev/algorithm/J48 HTTP/1.1
> Host: opentox.informatik.tu-muenchen.de:8080
> Accept:text/uri-list
> Content-Type: application/x-www-form-urlencoded
< HTTP/1.1 202 Accepted
< Date: Sat, 31 Jul 2010 14:46:38 GMT
< Location: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5
< Accept-Ranges: bytes
< Server: Noelios-Restlet-Engine/1.1.snapshot
< Content-Type: text/uri-list;charset=ISO-8859-1
< Content-Length: 99
<
* Connection #0 to host opentox.informatik.tu-muenchen.de left intact
* Closing connection #0
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5
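The same call can be sketched in Python. The parameter names (`dataset_uri`, `prediction_feature`) and the 202/200 distinction come from the slides; the function names `algorithm_request` and `classify_response` are hypothetical, and nothing is sent over the network in this sketch.

```python
from urllib.parse import urlencode
from urllib.request import Request


def algorithm_request(algorithm_uri, dataset_uri, **extra_params):
    """Build the form-encoded POST that applies an algorithm to a dataset."""
    params = {"dataset_uri": dataset_uri, **extra_params}
    return Request(
        algorithm_uri,
        data=urlencode(params).encode("ascii"),
        method="POST",
        headers={"Accept": "text/uri-list"},
    )


def classify_response(status, body):
    """Interpret the text/uri-list answer: a task URI (202) or a result URI (200)."""
    uri = body.strip()
    if status == 202:
        return ("task", uri)
    if status == 200:
        return ("result", uri)
    raise ValueError(f"unexpected HTTP status {status}")


req = algorithm_request(
    "http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48",
    "http://apps.ideaconsult.net:8080/ambit2/dataset/1037",
    prediction_feature="http://apps.ideaconsult.net:8080/ambit2/feature/26701",
)
```

Keeping request construction separate from response interpretation makes both parts easy to check without a live service.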
Resources: The model
$ curl -iv -H "Accept:text/uri-list" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c
* About to connect() to opentox.informatik.tu-muenchen.de port 8080 (#0)
*   Trying 131.159.28.16... connected
* Connected to opentox.informatik.tu-muenchen.de (131.159.28.16) port 8080 (#0)
> GET /OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5 HTTP/1.1
> User-Agent: curl/7.18.2 (x86_64-pc-linux-gnu) libcurl/7.18.2 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.8 libssh2/0.18
> Host: opentox.informatik.tu-muenchen.de:8080
> Accept:text/uri-list
>
< HTTP/1.1 200 OK
< Date: Sat, 31 Jul 2010 14:47:22 GMT
< Location: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48
< Vary: Accept-Charset, Accept-Encoding, Accept-Language, Accept
< Accept-Ranges: bytes
< Server: Noelios-Restlet-Engine/1.1.snapshot
< Content-Type: text/uri-list;charset=ISO-8859-1
< Content-Length: 86
<
* Connection #0 to host opentox.informatik.tu-muenchen.de left intact
* Closing connection #0
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48
• When a task URI is returned, the status code is HTTP 202 Accepted instead of HTTP 200 OK
• This tells the client that processing is not yet complete and that the client needs to poll the task URI until an OK code is returned
• The final result returned by Example 25 is the URI of the new model: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48
• To obtain prediction results, POST a dataset to the model URI
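The polling behaviour described above can be sketched as a small loop. This is a minimal sketch under stated assumptions: `wait_for_result` and its parameters are hypothetical names, and `fetch` stands for any callable that performs the GET and returns a `(status_code, body_text)` pair, so the loop can be exercised without a live OpenTox service.

```python
import time


def wait_for_result(task_uri, fetch, interval=1.0, max_polls=60, sleep=time.sleep):
    """Poll task_uri until the service answers 200 OK; return the result URI.

    `fetch` is any callable returning (status_code, body_text) for a URI.
    """
    for _ in range(max_polls):
        status, body = fetch(task_uri)
        if status == 200:
            # Processing finished: the body carries the URI of the result.
            return body.strip()
        if status != 202:
            raise RuntimeError(f"unexpected status {status} for {task_uri}")
        # 202 Accepted: still running, wait before asking again.
        sleep(interval)
    raise TimeoutError(f"task {task_uri} did not finish after {max_polls} polls")
```

Injecting `fetch` and `sleep` keeps the loop testable; in real use `fetch` could wrap `urllib.request.urlopen` with an `Accept: text/uri-list` header.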
Uniform approach to data processing (e.g. descriptor calculation)
Read data from a web address - process - write to a web address
[Diagram: Dataset (http://myhost.com/dataset/trainingset1) + Algorithm (http://myhost.com/algorithm/descriptorX) = Dataset (http://myhost.com/dataset/results); the Feature, Compound, Dataset and Algorithm resources are each shown with GET, POST, PUT and DELETE]

Uniform approach to models validation and report generation
Read data from a web address - process - write to a web address
[Diagram: Dataset (http://myhost.com/dataset/trainingset1) + Model generating predictions (http://myhost.com/model/predictivemodel1, http://myhost.com/dataset/predictedresults1) = Validation (http://myhost.com/validation) and Report, the validation report (http://myhost.com/report/1); each resource is shown with GET, POST, PUT and DELETE]
Apply predictive models
[Diagram: available endpoints and model URLs (e.g. http://host1/model/{id}) are retrieved from the Ontology service (published models, algorithms, ontologies, metadata) via SPARQL over HTTP POST; the model http://host1/model/{id} is applied to the dataset http://host2/dataset/{id} via HTTP POST to the Model service, which returns the results dataset URL http://host/dataset/{id}]

Uniform approach to model prediction
Read data from a web address - process - write to a web address
[Diagram: Dataset (http://myhost.com/dataset/id1) + Model (http://myhost.com/model/predictivemodel1) = Dataset (http://myhost.com/dataset/results1); the Feature, Compound, Dataset and Model resources are each shown with GET, POST, PUT and DELETE]

Uniform access to data
Publish your data, retrieve linked data
[Diagram: data (structures, endpoints & predictions) is uploaded to the Dataset service via HTTP GET/POST and a dataset URL http://host2/dataset/{id} is returned; chemical compounds can be found, returning a dataset URL (http://host2/compound/{id}); annotation links to the Ontology service (published models, algorithms, ontologies, metadata); the Feature, Compound and Dataset resources are each shown with GET, POST, PUT and DELETE]
RDF - Resources representation
• The opentox.owl ontology
  - A common OWL data model of all OpenTox resources
  - Describes OpenTox resources
  - Describes relationships between them
  - Generates RDF representations of objects
• RDF/XML representation is mandatory for OpenTox resources
• Uniform approach to data representation
  - Calculated and measured properties of chemical compounds are represented in a uniform way
  - Linked to the resource used for data generation
  - Annotated via ontology entries
  - Model representations link to the algorithms and data used
All OpenTox components are defined by an OWL ontology: http://opentox.org/api/1.1/opentox.owl
All resources are subclasses of ot:OpenToxResource
Resources: Chemical compound
Compound
Provides different representations for chemical compounds with a unique and defined chemical structure
/compound/{id}
Conformer
/compound/{id}/conformer/{id}
Documentation
http://opentox.org/dev/apis/api-1.1/structure
Representation
A subclass of ot:OpenToxResource
Supports different Chemical MIME formats
RDF representation only for specifying owl:sameAs links to external resources

Example 1: Retrieve compound as MOL
$ curl -H "Accept:chemical/x-mdl-molfile" http://apps.ideaconsult.net:8080/ambit2/compound/1
CH2O
  APtclcactv09040902283D 0   0.00000     0.00000
  4  3  0  0  0  0  0  0  0  0999 V2000
   -0.6004    0.0000    0.0001 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.6072    0.0000   -0.0004 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.1472    0.9353    0.0016 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.1472   -0.9353    0.0016 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  0  0  0  0
  2  4  1  0  0  0  0

Example 2: Retrieve compound as SMILES
$ curl -H "Accept:chemical/x-daylight-smiles" http://apps.ideaconsult.net:8080/ambit2/compound/1
O=C

Example 3: Query compounds
$ curl -H "Accept:chemical-mime" "http://apps.ideaconsult.net:8080/ambit2/query/compound/{any-identifier-or-keyword}"
$ curl -H "Accept:chemical-mime" "http://apps.ideaconsult.net:8080/ambit2/query/smarts?search={smarts}"
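The examples above rely on HTTP content negotiation: the same compound URI returns a different representation depending on the Accept header. A minimal Python sketch follows; the MIME types are the ones named in the slides, while the short format keys and the helper `compound_request` are hypothetical. The request is only constructed, not sent.

```python
from urllib.request import Request

# Chemical MIME types that appear in the slides; the short keys are illustrative.
CHEMICAL_MIME = {
    "smiles": "chemical/x-daylight-smiles",
    "mol": "chemical/x-mdl-molfile",
    "sdf": "chemical/x-mdl-sdfile",
}


def compound_request(compound_uri, fmt):
    """Build a GET request asking for one representation of a compound."""
    return Request(compound_uri, headers={"Accept": CHEMICAL_MIME[fmt]})


req = compound_request("http://apps.ideaconsult.net:8080/ambit2/compound/1", "smiles")
```

Switching `fmt` from "smiles" to "mol" changes only the Accept header; the compound URI stays the same.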
Resources: Feature
Feature
A Feature is a resource representing any kind of property or identifier assigned to a Compound
The feature types are determined via their links to ontologies (feature ontologies, descriptor ontologies, endpoint ontologies)
/feature/{id}
Documentation
http://opentox.org/dev/apis/api-1.1/feature
Representation
ot:Feature, a subclass of ot:OpenToxResource
Mandatory RDF/XML format
Properties
• Name: defined by dc:title (Dublin Core namespace)
• Units: defined by the ot:units annotation property (OpenTox namespace)
• Creator: defined by the dc:creator annotation property (Dublin Core namespace)
• The origin of the Feature is defined by the ot:hasSource object property (OpenTox namespace) and can be an ot:Algorithm, ot:Model or ot:Dataset
• Relations to other resources that represent the same entity can be established via the owl:sameAs property. This approach can be used, for example, to link the ot:Feature resource to a resource from another ontology (an example follows)
• There are subclasses of ot:Feature (namely ot:NumericFeature, ot:StringFeature and ot:NominalFeature), which are used to denote whether a feature holds numeric, string or nominal values
Resources: Feature (an example)
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/feature/22114
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns:otee="http://www.opentox.org/echaEndpoints.owl#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns="http://apps.ideaconsult.net:8080/ambit2/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#Algorithm"/>
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#Feature"/>
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#NumericFeature">
    <rdfs:subClassOf rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </owl:Class>
  <ot:NumericFeature rdf:about="feature/22114">
    <ot:hasSource>
      <ot:Algorithm rdf:about="algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor"/>
    </ot:hasSource>
    <owl:sameAs rdf:resource="http://www.opentox.org/echaEndpoints.owl#Octanol-water_partition_coefficient_Kow"/>
    <dc:title>XLogP</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </ot:NumericFeature>
</rdf:RDF>
The example shows an OpenTox feature with the title "XLogP", identified by the URI http://apps.ideaconsult.net:8080/ambit2/feature/22114
• Linked (via owl:sameAs) to an entry of a simplified ontology of toxicological endpoints
The algorithm used to generate values for this feature
• Specified by the ot:hasSource property
• Identified by the URI http://apps.ideaconsult.net:8080/ambit2/algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor
• Note that the URI identifies an OpenTox Algorithm resource
• The algorithm URI itself is dereferenceable
• It can be used to initiate calculations of the XLogP descriptor
Resources: Feature (an example)
$ curl -H "Accept:text/n3" http://apps.ideaconsult.net:8080/ambit2/feature/26184
@prefix ot:   <http://www.opentox.org/api/1.1#> .
@prefix dc:   <http://purl.org/dc/elements/1.1/> .
@prefix :     <http://apps.ideaconsult.net:8080/ambit2/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix af:   <http://apps.ideaconsult.net:8080/ambit2/feature/> .
ot:hasSource
    a owl:ObjectProperty .
ot:units
    a owl:DatatypeProperty .
ot:Feature
    a owl:Class .
ot:NumericFeature
    a owl:Class ;
    rdfs:subClassOf ot:Feature .
af:26184
    a ot:Feature , ot:NumericFeature ;
    dc:creator "http://ambit.uni-plovdiv.bg:8080/ambit2" ;
    dc:title "TUM_CDK_XLogP" ;
    ot:hasSource <http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem/XLogPDescriptor> ;
    ot:units "" ;
    = otee:Octanol-water_partition_coefficient_Kow .
An example (N3) of another ot:Feature resource: the XLogP descriptor (again)
• Generated by a different implementation
• In this case its name is "TUM_CDK_XLogP", and the algorithm resource used to generate it resides at the Technical University of Munich (TUM): http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem/XLogPDescriptor
  This algorithm URL can also be used to initiate descriptor calculations
• The representation of Algorithm resources refers to the Blue Obelisk ontology entry http://www.blueobelisk.org/ontologies/chemoinformatics-algorithms/#xlogP
Toxicity endpoints ontologies
• Derived from the ECHA classification of endpoints published in the REACH guidance documents
• Physicochemical properties and various toxicological endpoints
• The hierarchy doesn't represent the complexity of toxicological assays, but it can be used as a first approximation to assign meaning to the data entries and to generate a REACH report
• More specific descriptions of toxicological assays can be used as well
• Ontologies for specific toxicity assays are being developed by OpenTox partners
• The ECHA endpoints ontology: http://www.opentox.org/echaEndpoints.owl
Resources: a feature
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/feature/3
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns="http://apps.ideaconsult.net:8080/ambit2/"
    xmlns:af="http://apps.ideaconsult.net:8080/ambit2/feature/"
    …
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Feature rdf:about="feature/3">
    <dc:creator/>
    <ot:hasSource>ECHA…</ot:hasSource>
    <owl:sameAs rdf:resource="http://www.opentox.org/api/1.1#EINECS"/>
    <ot:units></ot:units>
    <dc:title>EC</dc:title>
  </ot:Feature>
</rdf:RDF>
An illustration of an ot:Feature imported from a file rather than calculated.
The example shows a feature representing the EINECS number, imported from the ECHA preregistration list.
Feature summary
• The OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs
• These URIs are dereferenceable
• Features allow different levels of meaning to be assigned by linking to ontology entries (e.g. algorithms or toxicological endpoints), as well as by linking to the algorithms that can be used to generate the property values
• The same approach can be used to denote assays (provided that the assay is defined by an ontology), species, functional groups, etc.
Resources: Dataset
Dataset
Provides access to chemical compounds and their features (e.g. structural, physical-chemical, biological, toxicological properties)
Operations
POST - Upload a dataset
PUT - Update the dataset content
DELETE - Remove the dataset
Representation
RDF/XML (mandatory)
• The dataset consists of data entries
• Each entry is associated with exactly one chemical compound, identified by its URI and available via the OpenTox Compound service API
• One and the same compound can be associated with multiple dataset entries
• Every "column" is associated with a Feature; its representation should be available via the OpenTox Feature API
Resources: Dataset
Representation
The dataset services optionally support formats other than RDF
• text/csv (comma delimited)
• text/x-arff (Weka ARFF)
• application/pdf
• chemical/x-mdl-sdfile
• other Chemical MIME formats
This allows the same data to be retrieved in a convenient format, but the URL links to the compound and feature resources are lost
@prefix ad: <http://apps.ideaconsult.net:8080/ambit2/dataset/> .
@prefix af: <http://apps.ideaconsult.net:8080/ambit2/feature/> .
@prefix ot: <http://www.opentox.org/api/1.1#> .
…
ad:9 a ot:Dataset ;
    ot:dataEntry
        [ a ot:DataEntry ;
          ot:compound <http://apps.ideaconsult.net:8080/ambit2/compound/413/conformer/409421> ;
          ot:values
              [ a ot:FeatureValue ;
                ot:feature af:21576 ;
                ot:value "3.309999942779541"^^xsd:double
              ] ;
          ot:values
              [ a ot:FeatureValue ;
                ot:feature af:21573 ;
                ot:value "3.0"^^xsd:double
              ]
        ] .
Dataset metadata and features
Description | URI template
Retrieve the entire dataset content; if text/uri-list is requested, retrieve only the compound URIs | http://host:port/dataset/{id}
Retrieve the representation of the features (columns) of the dataset | http://host:port/dataset/{id}/feature
Retrieve the dataset metadata (name, etc.) | http://host:port/dataset/{id}/metadata
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/dataset/9/metadata
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    ……
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Dataset rdf:about="dataset/9">
    <dc:source>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:source>
    <dc:publisher>somebody</dc:publisher>
    <rdfs:seeAlso>
      <bx:Entry rdf:about="reference/20117">
        <rdfs:seeAlso>http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html</rdfs:seeAlso>
        <dc:title>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:title>
      </bx:Entry>
    </rdfs:seeAlso>
    <dc:title>ISSCAN: Istituto Superiore di Sanita, CHEMICAL CARCINOGENS: STRUCTURES AND EXPERIMENTAL DATA</dc:title>
  </ot:Dataset>
</rdf:RDF>
Data publishing
[Diagram: data (structures, endpoints & predictions) is uploaded to the Dataset service via HTTP GET/POST and a dataset URL http://host2/dataset/{id} is returned; chemical compounds can be found, returning a dataset URL (http://host2/compound/{id}); annotation links to toxicology-related ontologies and algorithm ontologies]
1) POST a file with chemical structures and properties to an OpenTox dataset service
• The structures and data are assigned a dataset URL and become available in multiple formats (RDF, Chemical MIME, CSV, Weka ARFF)
2) Assign metadata
• PUT /dataset/{id}/metadata
3) Annotate any of the dataset features (/dataset/{id}/feature) by assigning links to relevant ontologies
• PUT /feature/{id}
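The three publishing steps can be written down as an ordered list of HTTP calls. The sketch below only plans the calls; it does not send them. The function name `publishing_steps` and the example URIs are hypothetical, and in practice the dataset URI is only known after step 1 returns it.

```python
def publishing_steps(dataset_service, dataset_uri, feature_uris):
    """Return the (method, URI) pairs for the three publishing steps."""
    steps = [("POST", dataset_service)]                            # 1) upload the file
    steps.append(("PUT", dataset_uri.rstrip("/") + "/metadata"))   # 2) assign metadata
    steps.extend(("PUT", uri) for uri in feature_uris)             # 3) annotate features
    return steps


plan = publishing_steps(
    "http://host2/dataset",
    "http://host2/dataset/9",
    ["http://host2/feature/21573", "http://host2/feature/21576"],
)
```

Each feature gets its own PUT because annotations are attached to the feature resource, not to the dataset.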
Resources: Algorithm
Algorithm
Provides access to OpenTox algorithms. There are several algorithm services developed by different OpenTox partners. The list of algorithms can be retrieved by an HTTP GET operation at http://host:port/algorithm
$ curl -H "Accept:text/uri-list" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNclassification
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/PLSregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/M5P
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/GaussP
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/FTMsmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/gSpansmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/JOELIB2
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/InfoGainAttributeEval
Resources: Algorithm
Representation
• Multiple types of algorithms
  - descriptor calculation algorithms
  - machine learning procedures
  - data preprocessing
• The representation of algorithms is again defined by the OpenTox ontology, where all algorithms are subclasses of ot:Algorithm
Algorithm types ontology
http://opentox.org/data/documents/development/RDF%20files/AlgorithmTypes
• Provides a classification of algorithm types
• The algorithm type in the RDF representation is set by directly assigning (via rdf:type) a class from the algorithm types ontology (ota: http://www.opentox.org/algorithms.owl#), e.g. <myalgorithm> rdf:type ota:Classification
Resources: Algorithm
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/algorithm/J48
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ot="http://www.opentox.org/api/1.1#"
    …
    xmlns:ota="http://www.opentox.org/algorithmTypes.owl#">
  <ota:Supervised rdf:about="http://apps.ideaconsult.net:8080/ambit2/algorithm/J48">
    <dc:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Classification: Decision tree J48</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#Classification"/>
    <dc:description rdf:datatype="http://www.w3.org/2001/XMLSchema#string"></dc:description>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#SingleTarget"/>
    <dc:publisher rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">Somebody</dc:publisher>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#EagerLearning"/>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Algorithm"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">Sat Jul 31 17:11:26 EEST 2010</dc:date>
  </ota:Supervised>
</rdf:RDF>
Representation
• The algorithm name is defined by dc:title
• Parameters supported by the algorithm are specified via the object property ot:parameters and should be of class ot:Parameter (as defined in opentox.owl). These entries indicate which parameters are required in order to run the algorithm; the values themselves should be provided by the client when initiating the calculations via POST
• Algorithm types are distinguished by means of the Algorithm types ontology
Resources: Model
• Representations of predictive models
• A Model is created by an HTTP POST to an ot:Algorithm with specific parameters and/or an input ot:Dataset
Resources identificationAll resources are identified via unique web address assigned according to the URL templates
Ideaconsult Ltd11
Component Description URL Template (example)
Compound Representations of chemical compounds httphostportcompoundcompoundid
Feature Properties and identifiers httphostportfeaturefeatureid
Dataset Encapsulates set of chemical compounds and their property
values
httphostportdatasetdatasetid
Model OpenTox model services httphostportmodelmodeld
Algorithm OpenTox algorithm services httphostportalgorithmalgorithmid
Validation
Report
A validation corresponds to the validation of a model on a
test dataset
httphostportvalidationvalidationid
httphostportreportreportid
Task Asynchronous jobs are handled via an intermediate Task
resource A resource submitting an asynchronous job
should return the URI of the task
httphostporttasktaskid
Ontology service Provides storage and SPARQL search functionality for
objects defined in OpenTox services and relevant
ontologies
httphostportontology
Authentication and
authorisation
Granting access to protected resources for authorised users httphostportopensso
httphostportopensso-pol
OpenTox REST operations
Individual resources (e.g. a dataset or a model)
• URI template: http://host:port/{resource}/{resourceid}, e.g. http://host:port/model/{model_id} or http://host:port/dataset/{dataset_id}
• GET - retrieve a representation of the resource
• PUT - update the representation of the resource
• POST
  - replace the representation of the resource with a new one (e.g. replace the dataset with new content)
  - initiate calculations based on this resource (e.g. submit a dataset URI to an algorithm resource and obtain a model URI as a result)
• DELETE - delete the resource
Collections of resources (e.g. the list of all available models or datasets)
• URI template: http://host:port/{resource} (e.g. http://host:port/model or http://host:port/dataset)
• GET - retrieve a representation of multiple resources (e.g. retrieve all available algorithms)
• PUT - N/A
• POST - create a new resource and return its URI (e.g. create a new dataset by submitting new dataset content to the dataset service)
• DELETE - N/A
Build a predictive model
[Diagram: create a model. Run calculations with a dataset (http://host1/dataset/{id}; structures, descriptors, endpoints) held by the dataset service, by HTTP POST to the algorithm service (regression, classification, quantum chemistry, descriptors, etc.), which returns the model URL (http://host1/model/{id}). Also shown: the model service (model/{id}), the validation service (validation/{id}), and the ontology service (published models, algorithms, ontologies, metadata), reached by HTTP POST.]
Uniform approach to models creation
Read data from a web address, process, write to a web address.
[Diagram: Dataset (http://myhost.com/dataset/trainingset1) + Algorithm (http://myhost.com/algorithm/neuralnetwork) = Model (http://myhost.com/model/predictivemodel1). The Feature, Compound, Dataset, Algorithm and Model resources each support GET, POST, PUT and DELETE.]
Use an algorithm to build a model
• An algorithm is applied by submitting an HTTP POST to the algorithm URI and providing the required parameters
• A common required parameter is dataset_uri=http://host:port/dataset/{datasetid}, which specifies the dataset to be operated on
• HTTP POST in REST-style services returns the URI of the result, not the content of the result
• The algorithm services are designed to store their results in a dataset service and return the URL of the resulting dataset
• For slow calculations, a Task URI is returned instead of the dataset URI
$ curl -H "Accept:text/uri-list" -X POST \
    -d "dataset_uri=http://apps.ideaconsult.net:8080/ambit2/dataset/1037" \
    -d "prediction_feature=http://apps.ideaconsult.net:8080/ambit2/feature/26701" \
    -d "dataset_service=http://apps.ideaconsult.net:8080/ambit2/dataset" \
    http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48 -iv
* Connected to opentox.informatik.tu-muenchen.de (131.159.28.16) port 8080 (#0)
> POST /OpenTox-dev/algorithm/J48 HTTP/1.1
> Host: opentox.informatik.tu-muenchen.de:8080
> Accept:text/uri-list
> Content-Type: application/x-www-form-urlencoded
< HTTP/1.1 202 Accepted
< Date: Sat, 31 Jul 2010 14:46:38 GMT
< Location: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5
< Accept-Ranges: bytes
< Server: Noelios-Restlet-Engine/1.1.snapshot
< Content-Type: text/uri-list;charset=ISO-8859-1
< Content-Length: 99
<
* Connection #0 to host opentox.informatik.tu-muenchen.de left intact
* Closing connection #0
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5
Resources: The model
$ curl -iv -H "Accept:text/uri-list" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c
* About to connect() to opentox.informatik.tu-muenchen.de port 8080 (#0)
* Trying 131.159.28.16... connected
* Connected to opentox.informatik.tu-muenchen.de (131.159.28.16) port 8080 (#0)
> GET /OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5 HTTP/1.1
> User-Agent: curl/7.18.2 (x86_64-pc-linux-gnu) libcurl/7.18.2 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.8 libssh2/0.18
> Host: opentox.informatik.tu-muenchen.de:8080
> Accept:text/uri-list
>
< HTTP/1.1 200 OK
< Date: Sat, 31 Jul 2010 14:47:22 GMT
< Location: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48
< Vary: Accept-Charset, Accept-Encoding, Accept-Language, Accept
< Accept-Ranges: bytes
< Server: Noelios-Restlet-Engine/1.1.snapshot
< Content-Type: text/uri-list;charset=ISO-8859-1
< Content-Length: 86
<
* Connection #0 to host opentox.informatik.tu-muenchen.de left intact
* Closing connection #0
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48
• When a task URI is returned, the status code is HTTP 202 Accepted instead of HTTP 200 OK
• This tells the client that processing is not complete; the client needs to poll the task URI until an OK code is returned
• The final result returned by Example 25 is the URI of the new model: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48
• To obtain prediction results, POST a dataset to the model URI
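The polling behaviour described above (202 Accepted while the task runs, 200 OK when it has finished) can be sketched as a small shell loop. This is only an illustration under stated assumptions: the function names `http_status` and `poll_task`, the retry limit and delay, and the `OT_HTTP_STATUS` override hook are all hypothetical and not part of the OpenTox API.

```shell
# Print only the HTTP status code of a GET request on the URI given as $1.
http_status() {
    curl -s -o /dev/null -w '%{http_code}' -H 'Accept: text/uri-list' "$1"
}

# Poll a task URI until the service stops answering 202 Accepted.
# $1 = task URI, $2 = maximum number of attempts (default 30),
# $3 = seconds to wait between attempts (default 2).
# OT_HTTP_STATUS is a hypothetical hook naming the command that fetches the
# status code, so the loop can be exercised without a network.
poll_task() {
    task_uri=$1
    max_tries=${2:-30}
    delay=${3:-2}
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        code=$("${OT_HTTP_STATUS:-http_status}" "$task_uri")
        case $code in
            200) return 0 ;;                 # finished: the result URI can be fetched
            202) sleep "$delay" ;;           # still running: wait and retry
            *)   echo "unexpected status $code" >&2; return 1 ;;
        esac
        tries=$((tries + 1))
    done
    return 2                                 # gave up while the task was still running
}
```

Once `poll_task` returns 0, a further GET on the task URI with `Accept: text/uri-list` yields the result URI, as in the transcript above.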
Uniform approach to data processing (e.g. descriptor calculation)
Read data from a web address, process, write to a web address.
[Diagram: Dataset (http://myhost.com/dataset/trainingset1) + Algorithm (http://myhost.com/algorithm/descriptorX) = Dataset (http://myhost.com/dataset/results). The Feature, Compound, Dataset and Algorithm resources each support GET, POST, PUT and DELETE.]
Uniform approach to models validation and report generation
Read data from a web address, process, write to a web address.
[Diagram: Dataset (http://myhost.com/dataset/trainingset1) + Model (http://myhost.com/model/predictivemodel1, the model generating predictions; results at http://myhost.com/dataset/predictedresults1) = Validation (http://myhost.com/validation), leading to a validation Report (http://myhost.com/report/1). The Dataset, Model, Validation and Report resources each support GET, POST, PUT and DELETE.]
Apply predictive models
[Diagram: retrieve the available endpoints and model URLs (e.g. http://host1/model/{id}) from the ontology service (published models, algorithms, ontologies, metadata) via HTTP POST (SPARQL). Apply the model http://host1/model/{id} to the dataset http://host2/dataset/{id} by HTTP POST to the model service, which returns the URL of the results dataset (http://host/dataset/{id}).]
Uniform approach to model prediction
Read data from a web address, process, write to a web address.
[Diagram: Dataset (http://myhost.com/dataset/id1) + Model (http://myhost.com/model/predictivemodel1) = Dataset (http://myhost.com/dataset/results1). The Feature, Compound, Dataset and Model resources each support GET, POST, PUT and DELETE.]
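Applying a model follows the same "POST a URI, receive a URI" pattern as building one. The sketch below assumes that the model service accepts the same `dataset_uri` form parameter that the slides describe for algorithm services; the function name `ot_predict` and the `OT_CURL` override are hypothetical and exist only for this illustration.

```shell
# Apply a model to a dataset: POST the dataset URI to the model URI and print
# whatever URI the service returns (a result dataset, or a task to poll).
# $1 = model URI, $2 = dataset URI.
# Assumption: the model service takes a dataset_uri form parameter.
# OT_CURL is a hypothetical override so the call can be inspected offline.
ot_predict() {
    model_uri=$1
    dataset_uri=$2
    "${OT_CURL:-curl}" -s -X POST \
        -H 'Accept: text/uri-list' \
        --data-urlencode "dataset_uri=$dataset_uri" \
        "$model_uri"
}
```

With the example addresses from the diagram, the call would be `ot_predict http://myhost.com/model/predictivemodel1 http://myhost.com/dataset/id1`.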
Uniform access to data
[Diagram: publish your data, retrieve linked data. Upload data to the dataset service (structures, endpoints & predictions) via HTTP GET/POST and receive a dataset URL (http://host2/dataset/{id}); find chemical compounds and receive a dataset URL (http://host2/compound/{id}); annotate via the ontology service (published models, algorithms, ontologies, metadata). The Feature, Compound and Dataset resources each support GET, POST, PUT and DELETE.]
RDF: Resources representation
• The opentox.owl ontology
  - A common OWL data model of all OpenTox resources
  - Describes OpenTox resources
  - Describes the relationships between them
  - Used to generate the RDF representations of objects
• An RDF/XML representation is mandatory for OpenTox resources
• Uniform approach to data representation
  - Calculated and measured properties of chemical compounds are represented in a uniform way
  - Linked to the resource used for data generation
  - Annotated via ontology entries
  - Model representations link to the algorithms and data used
All OpenTox components are defined by an OWL ontology: http://opentox.org/api/1.1/opentox.owl
All resources are subclasses of ot:OpenToxResource
Resources: Chemical compound
Compound: provides different representations for chemical compounds with a unique and defined chemical structure: /compound/{id}
Conformer: /compound/{id}/conformer/{id}
Documentation: http://opentox.org/dev/apis/api-1.1/structure
Representation: a subclass of ot:OpenToxResource. Supports different Chemical MIME formats. An RDF representation is used only for specifying owl:sameAs links to external resources.
Example 1: Retrieve compound as MOL
$ curl -H "Accept:chemical/x-mdl-molfile" http://apps.ideaconsult.net:8080/ambit2/compound/1
CH2O
APtclcactv09040902283D 0   0.00000     0.00000
  4  3  0  0  0  0  0  0  0  0999 V2000
   -0.6004    0.0000    0.0001 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.6072    0.0000   -0.0004 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.1472    0.9353    0.0016 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.1472   -0.9353    0.0016 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  0  0  0  0
  2  4  1  0  0  0  0
Example 2: Retrieve compound as SMILES
$ curl -H "Accept:chemical/x-daylight-smiles" http://apps.ideaconsult.net:8080/ambit2/compound/1
O=C
Example 3: Query compounds
$ curl -H "Accept:chemical-mime" "http://apps.ideaconsult.net:8080/ambit2/query/compound/any-identifier-or-keyword"
$ curl -H "Accept:chemical-mime" "http://apps.ideaconsult.net:8080/ambit2/query/smarts?search=smarts"
Resources: Feature
Feature: a resource representing any kind of property or identifier assigned to a Compound. The feature types are determined via their links to ontologies (feature ontologies, descriptor ontologies, endpoint ontologies): /feature/{id}
Documentation: http://opentox.org/dev/apis/api-1.1/feature
Representation: ot:Feature, a subclass of ot:OpenToxResource. The RDF/XML format is mandatory.
Properties
• Name: defined by dc:title (Dublin Core namespace)
• Units: defined by the ot:units annotation property (OpenTox namespace)
• Creator: defined by the dc:creator annotation property (Dublin Core namespace)
• The origin of the Feature is defined by the ot:hasSource object property (OpenTox namespace) and can be an ot:Algorithm, ot:Model or ot:Dataset
• Relations to other resources that represent the same entity can be established via the owl:sameAs property. This approach can be used, for example, to link the ot:Feature resource to a resource from another ontology (an example follows)
• The subclasses of ot:Feature, namely ot:NumericFeature, ot:StringFeature and ot:NominalFeature, denote whether a feature holds numeric, string or nominal values
Resources: Feature (an example)
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/feature/22114
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns:otee="http://www.opentox.org/echaEndpoints.owl#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns="http://apps.ideaconsult.net:8080/ambit2/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#Algorithm"/>
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#Feature"/>
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#NumericFeature">
    <rdfs:subClassOf rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </owl:Class>
  <ot:NumericFeature rdf:about="feature/22114">
    <ot:hasSource>
      <ot:Algorithm rdf:about="algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor"/>
    </ot:hasSource>
    <owl:sameAs rdf:resource="http://www.opentox.org/echaEndpoints.owl#Octanol-water_partition_coefficient_Kow"/>
    <dc:title>XLogP</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </ot:NumericFeature>
</rdf:RDF>
The example shows an OpenTox feature with the title "XLogP", identified by the URI http://apps.ideaconsult.net:8080/ambit2/feature/22114
• It is linked to an entry of a simplified ontology of toxicological endpoints
The algorithm used to generate values for this feature
• Is specified by the ot:hasSource property
• Is identified by the URI http://apps.ideaconsult.net:8080/ambit2/algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor
• Note that the URI identifies an OpenTox Algorithm resource
• The algorithm URI itself is dereferenceable
• It can be used to initiate calculations of the XLogP descriptor
Resources: Feature (an example)
$ curl -H "Accept:text/n3" http://apps.ideaconsult.net:8080/ambit2/feature/26184
@prefix ot: <http://www.opentox.org/api/1.1#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix : <http://apps.ideaconsult.net:8080/ambit2/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix af: <http://apps.ideaconsult.net:8080/ambit2/feature/> .
ot:hasSource
    a owl:ObjectProperty .
ot:units
    a owl:DatatypeProperty .
ot:Feature
    a owl:Class .
ot:NumericFeature
    a owl:Class ;
    rdfs:subClassOf ot:Feature .
af:26184
    a ot:Feature , ot:NumericFeature ;
    dc:creator "http://ambit.uni-plovdiv.bg:8080/ambit2" ;
    dc:title "TUM_CDK_XLogP" ;
    ot:hasSource <http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem/XLogPDescriptor> ;
    ot:units "" ;
    = otee:Octanol-water_partition_coefficient_Kow .
An example (N3) of another ot:Feature resource: the XLogP descriptor (again)
• Generated by a different implementation
• In this case its name is "TUM_CDK_XLogP", and the algorithm resource used to generate it resides at the Technical University of Munich (TUM): http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem/XLogPDescriptor
• This algorithm URL can also be used to initiate descriptor calculations
• The representation of Algorithm resources refers to the Blue Obelisk ontology entry http://www.blueobelisk.org/ontologies/chemoinformatics-algorithms/#xlogP
Toxicity endpoints ontologies
• Derived from the ECHA classification of endpoints published in the REACH guidance documents
• Covers physicochemical properties and various toxicological endpoints
• The hierarchy doesn't represent the complexity of toxicological assays, but can be used as a first approximation to assign meaning to the data entries and to generate a REACH report
• More specific descriptions of toxicological assays can be used as well
• Ontologies for specific toxicity assays are developed by OpenTox partners
• The ECHA endpoints ontology: http://www.opentox.org/echaEndpoints.owl
Resources: a feature
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/feature/3
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns="http://apps.ideaconsult.net:8080/ambit2/"
    xmlns:af="http://apps.ideaconsult.net:8080/ambit2/feature/"
    …
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Feature rdf:about="feature/3">
    <dc:creator/>
    <ot:hasSource>ECHA…</ot:hasSource>
    <owl:sameAs rdf:resource="http://www.opentox.org/api/1.1#EINECS"/>
    <ot:units></ot:units>
    <dc:title>EC</dc:title>
  </ot:Feature>
</rdf:RDF>
An illustration of an ot:Feature that was imported from a file rather than calculated. The example shows a feature representing the EINECS number, imported from the ECHA preregistration list.
Feature summary
• The OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs
• These URIs are dereferenceable
• Features allow different levels of meaning to be assigned by linking to ontology entries (e.g. algorithms or toxicological endpoints), as well as by linking to the algorithms that can be used to generate the property values
• The same approach can be used to denote assays (provided that the assay is defined by an ontology), species, functional groups, etc.
Resources: Dataset
Dataset: provides access to chemical compounds and their features (e.g. structural, physical-chemical, biological, toxicological properties)
Operations
• POST - upload a dataset
• PUT - update the dataset content
• DELETE - remove the dataset
Representation: RDF/XML (mandatory)
• The dataset consists of data entries
• Each entry is associated with exactly one chemical compound, identified by its URI and available via the OpenTox Compound service API
• One and the same compound can be associated with multiple dataset entries
• Every "column" is associated with a Feature; its representation should be available via the OpenTox Feature API
Resources: Dataset representation
The dataset services optionally support formats other than RDF:
• text/csv (comma delimited)
• text/x-arff (Weka ARFF)
• application/pdf
• chemical/x-mdl-sdfile
• other Chemical MIME formats
This allows the same data to be retrieved in a convenient format, but the URL links to the compound and feature resources are lost.
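Choosing one of these formats is a matter of HTTP content negotiation, as in the curl examples elsewhere in these slides. The sketch below maps short format names to the MIME types listed above and sends the matching Accept header; the short names, the function names `ot_accept` and `ot_get`, and the `OT_CURL` override are illustrative assumptions only.

```shell
# Map a short format name (hypothetical) to a MIME type named in the slides.
ot_accept() {
    case $1 in
        rdf)  echo 'application/rdf+xml' ;;
        csv)  echo 'text/csv' ;;
        arff) echo 'text/x-arff' ;;
        pdf)  echo 'application/pdf' ;;
        sdf)  echo 'chemical/x-mdl-sdfile' ;;
        uris) echo 'text/uri-list' ;;
        *)    echo "unsupported format: $1" >&2; return 1 ;;
    esac
}

# GET a resource ($1) in the requested format ($2).
# OT_CURL is a hypothetical override so the request can be inspected offline.
ot_get() {
    accept=$(ot_accept "$2") || return 1
    "${OT_CURL:-curl}" -s -H "Accept: $accept" "$1"
}
```

For instance, `ot_get http://apps.ideaconsult.net:8080/ambit2/dataset/9 csv` would request the dataset as comma-delimited text, while `rdf` keeps the links to compound and feature resources.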
@prefix ad: <http://apps.ideaconsult.net:8080/ambit2/dataset/> .
@prefix af: <http://apps.ideaconsult.net:8080/ambit2/feature/> .
@prefix ot: <http://www.opentox.org/api/1.1#> .
…
ad:9 a ot:Dataset ;
    ot:dataEntry
        [ a ot:DataEntry ;
          ot:compound <http://apps.ideaconsult.net:8080/ambit2/compound/413/conformer/409421> ;
          ot:values
              [ a ot:FeatureValue ;
                ot:feature af:21576 ;
                ot:value "3.309999942779541"^^xsd:double
              ] ;
          ot:values
              [ a ot:FeatureValue ;
                ot:feature af:21573 ;
                ot:value "3.0"^^xsd:double
              ]
        ] .
Dataset metadata and features
Description | URI template
Retrieve the entire dataset content (if text/uri-list is requested, retrieve only the compound URIs) | http://host:port/dataset/{id}
Retrieve a representation of the features (columns) of the dataset | http://host:port/dataset/{id}/feature
Retrieve the dataset metadata (name etc.) | http://host:port/dataset/{id}/metadata
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/dataset/9/metadata
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    ……
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Dataset rdf:about="dataset/9">
    <dc:source>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:source>
    <dc:publisher>somebody</dc:publisher>
    <rdfs:seeAlso>
      <bx:Entry rdf:about="reference/20117">
        <rdfs:seeAlso>http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html</rdfs:seeAlso>
        <dc:title>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:title>
      </bx:Entry>
    </rdfs:seeAlso>
    <dc:title>ISSCAN: Istituto Superiore di Sanita, CHEMICAL CARCINOGENS: STRUCTURES AND EXPERIMENTAL DATA</dc:title>
  </ot:Dataset>
</rdf:RDF>
Data publishing
[Diagram: upload data to the dataset service (structures, endpoints & predictions) via HTTP GET/POST and receive a dataset URL (http://host2/dataset/{id}); find chemical compounds and receive a dataset URL (http://host2/compound/{id}); annotate with toxicology-related ontologies and algorithm ontologies.]
1) POST a file with chemical structures and properties to an OpenTox dataset service
• The structures and data are assigned a dataset URL and become available in multiple formats (RDF, Chemical MIME, CSV, Weka ARFF)
2) Assign metadata
• PUT /dataset/{id}/metadata
3) Annotate any of the dataset features (/dataset/{id}/feature) by assigning links to relevant ontologies
• PUT /feature/{id}
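The first two publishing steps could be scripted roughly as follows. This is a hedged sketch, not the AMBIT upload procedure: it assumes that the dataset service accepts the raw file body with a `chemical/x-mdl-sdfile` content type and that metadata is supplied as an RDF/XML document. The function names and the `OT_CURL` override are hypothetical.

```shell
# Step 1: upload a structure file ($2) to a dataset service ($1).
# The service is expected to answer with the URI of the new dataset (or of a task).
# Assumption: the raw SDF body with this Content-Type is accepted.
ot_upload_sdf() {
    "${OT_CURL:-curl}" -s -X POST \
        -H 'Content-Type: chemical/x-mdl-sdfile' \
        -H 'Accept: text/uri-list' \
        --data-binary "@$2" \
        "$1"
}

# Step 2: assign metadata by PUTting an RDF/XML file ($2) to {dataset URI}/metadata.
# Assumption: the metadata document is RDF/XML.
ot_put_metadata() {
    "${OT_CURL:-curl}" -s -X PUT \
        -H 'Content-Type: application/rdf+xml' \
        --data-binary "@$2" \
        "$1/metadata"
}
```

Step 3 would follow the same shape as `ot_put_metadata`, with the PUT directed at the feature URI instead of the metadata URI.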
Resources: Algorithm
Algorithm: provides access to OpenTox algorithms. Several algorithm services have been developed by different OpenTox partners. The list of algorithms can be retrieved with an HTTP GET on http://host:port/algorithm
$ curl -H "Accept:text/uri-list" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNclassification
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/PLSregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/M5P
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/GaussP
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/FTMsmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/gSpansmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/JOELIB2
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/InfoGainAttributeEval
Resources: Algorithm representation
• Multiple types of algorithms
  - descriptor calculation algorithms
  - machine learning procedures
  - data preprocessing
• The representation of algorithms is again defined by the OpenTox ontology, where all algorithms are subclasses of ot:Algorithm
Algorithm types ontology: http://opentox.org/data/documents/development/RDF%20files/AlgorithmTypes
• Provides a classification of algorithm types
• The algorithm type in the RDF representation is set by assigning (via rdf:type) a class from the algorithm types ontology (ota: http://www.opentox.org/algorithms.owl#), e.g. <myalgorithm> rdf:type ota:Classification
Resources: Algorithm
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/algorithm/J48
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ot="http://www.opentox.org/api/1.1#"
    …
    xmlns:ota="http://www.opentox.org/algorithmTypes.owl#">
  <ota:Supervised rdf:about="http://apps.ideaconsult.net:8080/ambit2/algorithm/J48">
    <dc:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Classification Decision tree J48</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#Classification"/>
    <dc:description rdf:datatype="http://www.w3.org/2001/XMLSchema#string"></dc:description>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#SingleTarget"/>
    <dc:publisher rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">Somebody</dc:publisher>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#EagerLearning"/>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Algorithm"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">Sat Jul 31 17:11:26 EEST 2010</dc:date>
  </ota:Supervised>
</rdf:RDF>
Representation
• The algorithm name is defined by dc:title
• The parameters supported by the algorithm are specified via the object property ot:parameters and should be of class ot:Parameter (as defined in opentox.owl). These entries indicate which parameters are required in order to run the algorithm; the values themselves should be provided by the client when initiating the calculations via POST
• Algorithm types are distinguished by means of the algorithm types ontology
Resources: Model
• Representations of predictive models
• A Model is created by an HTTP POST to an ot:Algorithm with specific parameters and/or an input ot:Dataset
Representation
• The model name is defined by the dc:title property
• The model creator may be defined by the dc:creator property
• The date of model creation is defined by the dc:date property
• The algorithm is defined by the ot:algorithm object property
• The independent variables are instances of ot:Feature, defined by the ot:independentVariables property (can be multiple)
• The dependent variables are instances of ot:Feature, defined by the ot:dependentVariables property (can be multiple)
• The variables in which prediction results will be stored are instances of ot:Feature, defined by ot:predictedVariables (can be multiple)
• Parameters are defined by ot:parameters
• The training dataset is an instance of ot:Dataset, defined by ot:trainingDataset
Linked resources: Compound, Algorithm, Model, Dataset, Features
[Diagram: the algorithm service, feature service and compound service link the model resource, dataset resource, descriptor resource, assay resource and chemical compound.]
[Diagram, continued: the dataset resource, descriptor resource, assay resource and chemical compound are linked to algorithms (regression, classification, quantum chemistry, descriptors, etc.) and to the Blue Obelisk algorithms ontology, the OpenTox algorithm types ontology and toxicology-related ontologies.]
Make the model available
Register the model at the OpenTox ontology service
• RDF triple storage
• Accepts HTTP POST
• SPARQL endpoint
$ curl -X POST -d "uri=http://apps.ideaconsult.net:8080/ambit2/model/57" http://apps.ideaconsult.net:8080/ontology
The model then becomes visible to applications.
[Diagram: the model service sends model/{id} via HTTP POST to the ontology service (published models, algorithms, ontologies, metadata).]
Services implementation by partner and service type
Service types (table columns): Compound; Dataset; Feature; Algorithm (processing); Algorithm (model); Model; Task; Report; Validation; Authentication and Authorisation service; Ontology service
Number of service types marked "Y" per partner: partner 2: 6; partner 3 (IDEA): 8; partner 5: 3; partner 6: 4; partner 7: 3; partner 10: 1
• All components are implemented as REST web services
• There can be multiple implementations of the same type of component
• (A subset of) the services can be hosted by the same provider, or by multiple providers at separate locations
http://apps.ideaconsult.net:8080/ambit2
OpenTox is a framework
Unified access
• Toxicity data
• Predictive models
• Validation support
• Interpretation aids
Framework
• Toxicologists
• Modelers
• API for new algorithm development & integration
Open source
• To optimise impact
• To allow inspection and review
• To attract external contributors
OpenTox services can be used to develop specific applications or be embedded in workflow systems.
Demo applications
• Two end-user-oriented demo applications that make use of OpenTox web services have been developed and deployed, and are available for testing: http://toxcreate.org and http://toxpredict.org
• ToxCreate creates models from user-supplied datasets
• ToxPredict uses existing OpenTox models to estimate chemical compound properties
RDF: Lessons learned
• OpenTox-specific: it did not start as a Linked Data/RDF project
• The mix of REST and RDF is not (yet) popular, but it is natural to be able to retrieve a (partial) resource representation described by triples
• Steep learning curve
• Some hard topics
  - Data model vs. format
  - The subject-predicate-object concept vs. tabular, hierarchical or other implicit structure
  - Recognition of the added value
• XML, JSON, YAML, plain text, etc. vs. RDF
RDF: Performance
The RDF representation is verbose, in-memory RDF libraries are slow, and streaming parsers/writers are lacking.
[Charts: a dataset with 320 chemicals and 60 columns; a dataset with 6500 chemicals and 12 columns]
RDF: Wish list
• A convenient explanation of the subject-predicate-object concept for beginners
• A (high-performance) triple store should not be a mandatory requirement for publishing RDF data
• Fast streaming parsers and writers
• (Terse) JSON serialisation
• Security
• Synchronisation of distributed RDF content
Thank you
Build an application with the OpenTox REST Web Services API: http://opentox.org/dev/apis/api-1.1
Download the AMBIT implementation of the OpenTox API and launch your own OpenTox service: http://ambit.sourceforge.net
OpenTox REST operations
Ideaconsult Ltd12
Individual resources (eg a dataset or a model)bull URI template httphostportresourceresourceid eg
httphostportmodelmodel_id or httphostportdatasetdataset_id
bull GET ndash retrieve representation of the resource
bull PUT ndash update representation of the resource
bull POST
ndash replace representation of the resource with a new one (eg replace the dataset with new
content)
ndash initiate calculations based on this resource (eg submit dataset URI to an algorithm resource and obtain a
model URI as a result)
bull DELETE ndash delete the resource
Collections of resources (eg list of all available models or datasets) bull URI template httphostportresource (eg httphostportmodel or httphostportdataset)
bull GET ndash retrieve representation of multiple resources ( eg retrieve all available algorithms)
bull PUT - NA
bull POST ndash create new resource and return its URI (eg create a new dataset by submitting new dataset
content to the dataset service)
bull DELETE ndash NA
Build a predictive model
Create a model
Run calculations with
dataset
httphost1datasetid
Structures
descriptors
endpoints
Dataset service
Returns the model URL
httphost1modelid
HTTP POST
Build a predictive model
Regression
Classification
Quantum Chemistry
Descriptors etc
validationid
Algorithm service
Validation service
modelid
Published models
Algorithms
Ontologies
metadata
Ontology service
HTTP POST
Model service
Read data from a web address ndash process ndash write to a web address
Uniform approach to models creation
14Ideaconsult Ltd
Feature
GET
POST
PUT
DELETE
Compound
GET
POST
PUT
DELETE
Dataset
GET
POST
PUT
DELETE
Algorithm
GET
POST
PUT
DELETE
Model
GET
POST
PUT
DELETE
+=
httpmyhostcomdatasettrainingset1
httpmyhostcomalgorithmneuralnetwork
httpmyhostcommodelpredictivemodel1
Use an algorithm to build a model
Ideaconsult Ltd15
bullAn algorithm is applied by submitting
HTTP POST to the algorithm URI and
providing required parameters
bullA common required parameter is
dataset_uri=httphostportdatasetda
tasetid which specifies the data set to be
operated on
bullHTTP POST in REST style services
returns URI of the result and not the
content of the result
bullThe algorithm services are designed to
store the results into a dataset service and
return the URL of the resulted dataset
bullIn case of slow calculations a Task URI
instead of the dataset URI is returned
$ curl -H Accepttexturi-list -X POST -d
dataset_uri=httpappsideaconsultnet8080ambit2dataset1037 -d
prediction_feature=httpappsideaconsultnet8080ambit2feature26
701 -d
dataset_service=httpappsideaconsultnet8080ambit2dataset
httpopentoxinformatiktu-muenchende8080OpenTox-
devalgorithmJ48 -iv
Connected to opentoxinformatiktu-muenchende (1311592816) port
8080 (0)
POST OpenTox-devalgorithmJ48 HTTP11
gt Host opentoxinformatiktu-muenchende8080
Accept
gt Content-Type applicationx-www-form-urlencoded
lt HTTP11 202 Accepted
lt Date Sat 31 Jul 2010 144638 GMT
lt Location httpopentoxinformatiktu-muenchende8080OpenTox-
devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5
lt Accept-Ranges bytes
lt Server Noelios-Restlet-Engine11snapshot
lt Content-Type texturi-listcharset=ISO-8859-1
lt Content-Length 99
lt
Connection 0 to host opentoxinformatiktu-muenchende left intact
Closing connection 0
httpopentoxinformatiktu-muenchende8080OpenTox-
devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5
Resources The model
Ideaconsult Ltd16
$ curl -iv -H Accepttexturi-list httpopentoxinformatiktu-
muenchende8080OpenTox-devtaskacdf6eac-d5a2-402c
About to connect() to opentoxinformatiktu-muenchende port 8080 (0)
Trying 1311592816 connected
Connected to opentoxinformatiktu-muenchende (1311592816) port 8080 (0)
gt GET OpenTox-devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5 HTTP11
gt User-Agent curl7182 (x86_64-pc-linux-gnu) libcurl7182 OpenSSL098g
zlib1233 libidn18 libssh2018
gt Host opentoxinformatiktu-muenchende8080
gt Accepttexturi-list
gt
lt HTTP11 200 OK
lt Date Sat 31 Jul 2010 144722 GMT
Date Sat 31 Jul 2010 144722 GMT
lt Location httpopentoxinformatiktu-muenchende8080OpenTox-
devmodelTUMOpenToxModel_j48_48
lt Vary Accept-Charset Accept-Encoding Accept-Language Accept
lt Accept-Ranges bytes
lt Server Noelios-Restlet-Engine11snapshot
lt Content-Type texturi-listcharset=ISO-8859-1
lt Content-Length 86
lt
Connection 0 to host opentoxinformatiktu-muenchende left intact
Closing connection 0
httpopentoxinformatiktu-muenchende8080OpenTox-
devmodelTUMOpenToxModel_j48_48
bullWhen task URI is returned the
returned status code is HTTP 202
Accepted instead of HTTP 200
OK
bullThis tells the client the processing
is not completed and the client need
to poll the task URI until OK code
is returned
bullThe final result returned by
Example 25 is the URI of the new
model httpopentoxinformatiktu-
muenchende8080OpenTox-
devmodelTUMOpenToxModel_j4
8_48
bullTo obtain prediction results POST
a dataset to the model URI
Read data from a web address ndash process ndash write to a web address
Uniform approach to data processing (eg
Descriptors calculation)
17Ideaconsult Ltd
Feature
GET
POST
PUT
DELETE
Compound
GET
POST
PUT
DELETE
Dataset
GET
POST
PUT
DELETE
Algorithm
GET
POST
PUT
DELETE
+ =
httpmyhostcomdatasettrainingset1
httpmyhostcomalgorithmdescriptorX
Feature
GET
POST
PUT
DELETE
Compound
GET
POST
PUT
DELETE
Dataset
GET
POST
PUT
DELETE
=
httpmyhostcomdatasetresults
Read data from a web address ndash process ndash write to a web address
Uniform approach to models validation and
report generation
18Ideaconsult Ltd
Dataset
GET
POST
PUT
DELETE
Model
GET
POST
PUT
DELETE
+
=Validation
GET
POST
PUT
DELETE
Report
GET
POST
PUT
DELETEModel generating
predictions
Validation report
httpmyhostcomreport1
httpmyhostcomdatasettrainingset1
httpmyhostcomdatasetpredictedresults1
httpmyhostcommodelpredictivemodel1
httpmyhostcomvalidation
Apply predictive models
Returns the results
dataset URL
httphostdatasetid
Retrieve available
endpoints and model
URLs eg
httphost1modelid
Published models
Algorithms
Ontologies
metadata
Ontology
service
HTTP POST
SPARQL
HTTP POST
modelidApply the model
httphost1modelid
to dataset
httphost2datasetid
Model service
Read data from a web address ndash process ndash write to a web address
Uniform approach to model prediction
20Ideaconsult LtdAugust 22 2010
Feature
GET
POST
PUT
DELETE
Compound
GET
POST
PUT
DELETE
Dataset
GET
POST
PUT
DELETE
+=
httpmyhostcomdatasetid1
Feature
GET
POST
PUT
DELETE
Compound
GET
POST
PUT
DELETE
Dataset
GET
POST
PUT
DELETE
=
httpmyhostcomdatasetresults1
Model
GET
POST
PUT
DELETE
httpmyhostcommodelpredictivemodel1
Uniform access to data
Publish your data, retrieve linked data
[Diagram: a dataset service (structures, endpoints & predictions) accessed via HTTP GET and POST, linked to an ontology service (published models, algorithms, ontologies, metadata). The Feature, Compound and Dataset resources each support GET, POST, PUT and DELETE.]
• Upload data, receive a dataset URL: http://host2/dataset/{id}
• Annotation
• Find chemical compounds, return a dataset URL: http://host2/compound/{id}
RDF – Resources representation
• The opentox.owl ontology
  – A common OWL data model of all OpenTox resources
  – Describes OpenTox resources
  – Describes relationships between them
  – Generates RDF representations of objects
• RDF/XML representation is mandatory for OpenTox resources
• Uniform approach to data representation
  – Calculated and measured properties of chemical compounds are represented in a uniform way
  – Linked to the resource used for data generation
  – Annotated via ontology entries
  – Model representations link to the algorithms and data used
• All OpenTox components are defined by an OWL ontology: http://opentox.org/api/1.1/opentox.owl
• All resources are subclasses of ot:OpenToxResource
Resources: Chemical compound
Compound: Provides different representations for chemical compounds with a unique and defined chemical structure
URI: /compound/{id}
Conformer: /compound/{id}/conformer/{id}
Documentation: http://opentox.org/dev/apis/api-1.1/structure
Representation:
• A subclass of ot:OpenToxResource
• Supports different Chemical MIME formats
• RDF representation only for specifying owl:sameAs links to external resources

Example 1: Retrieve compound as MOL
$ curl -H "Accept:chemical/x-mdl-molfile" http://apps.ideaconsult.net:8080/ambit2/compound/1
CH2O
  APtclcactv09040902283D 0   0.00000     0.00000
  4  3  0  0  0  0  0  0  0  0999 V2000
   -0.6004    0.0000    0.0001 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.6072    0.0000   -0.0004 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.1472    0.9353    0.0016 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.1472   -0.9353    0.0016 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  0  0  0  0
  2  4  1  0  0  0  0

Example 2: Retrieve compound as SMILES
$ curl -H "Accept:chemical/x-daylight-smiles" http://apps.ideaconsult.net:8080/ambit2/compound/1
O=C

Example 3: Query compounds
$ curl -H "Accept:chemical-mime" "http://apps.ideaconsult.net:8080/ambit2/query/compound/any-identifier-or-keyword"
$ curl -H "Accept:chemical-mime" "http://apps.ideaconsult.net:8080/ambit2/query/smarts?search=smarts"
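The compound examples rely on HTTP content negotiation: the same compound URI returns a different representation depending on the Accept header. A minimal Python sketch of the same idea, using only the standard library, might look as follows. The short format names (`smiles`, `mol`, `sdf`, `rdf`) and the helper `compound_request` are assumptions made for illustration; the MIME types are the ones quoted in the slides.

```python
from urllib.request import Request

# Short names (illustrative only) mapped to MIME types quoted in the slides.
CHEMICAL_MIME = {
    "smiles": "chemical/x-daylight-smiles",
    "mol": "chemical/x-mdl-molfile",
    "sdf": "chemical/x-mdl-sdfile",
    "rdf": "application/rdf+xml",
}


def compound_request(compound_uri, fmt):
    """Build (but do not send) a GET request asking for one representation."""
    if fmt not in CHEMICAL_MIME:
        raise ValueError("unknown format: %r" % fmt)
    return Request(compound_uri, headers={"Accept": CHEMICAL_MIME[fmt]})
```

The request can then be sent with `urllib.request.urlopen(req)`; whether a given service supports a given MIME type depends on that service.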
Resources: Feature
Feature: A Feature is a resource representing any kind of property or identifier assigned to a Compound. The feature types are determined via their links to ontologies (feature ontologies, descriptor ontologies, endpoint ontologies)
URI: /feature/{id}
Documentation: http://opentox.org/dev/apis/api-1.1/feature
Representation: ot:Feature, a subclass of ot:OpenToxResource. RDF/XML format is mandatory
Properties:
• Name: defined by dc:title (Dublin Core namespace)
• Units: defined by the ot:units annotation property (OpenTox namespace)
• Creator: defined by the dc:creator annotation property (Dublin Core namespace)
• The origin of the Feature is defined by the ot:hasSource object property (OpenTox namespace) and can be an ot:Algorithm, ot:Model or ot:Dataset
• Relations to other resources that represent the same entity can be established via the owl:sameAs property. This approach can be used, for example, to link the ot:Feature resource to a resource from another ontology (an example follows)
• The subclasses of ot:Feature (namely ot:NumericFeature, ot:StringFeature and ot:NominalFeature) denote whether a feature holds numeric, string or nominal values
Resources: Feature (an example)
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/feature/22114
<rdf:RDF
  xmlns:ot="http://www.opentox.org/api/1.1#"
  xmlns:otee="http://www.opentox.org/echaEndpoints.owl#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns="http://apps.ideaconsult.net:8080/ambit2/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:owl="http://www.w3.org/2002/07/owl#"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#Algorithm"/>
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#Feature"/>
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#NumericFeature">
    <rdfs:subClassOf rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </owl:Class>
  <ot:NumericFeature rdf:about="feature/22114">
    <ot:hasSource>
      <ot:Algorithm rdf:about="algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor"/>
    </ot:hasSource>
    <owl:sameAs rdf:resource="otee:Octanol-water_partition_coefficient_Kow"/>
    <dc:title>XLogP</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </ot:NumericFeature>
</rdf:RDF>

The example shows an OpenTox feature with the title "XLogP", identified by the URI http://apps.ideaconsult.net:8080/ambit2/feature/22114
• It is linked (via owl:sameAs) to an entry of a simplified ontology of toxicological endpoints
• The algorithm used to generate values for this feature is specified by the ot:hasSource property and identified by the URI http://apps.ideaconsult.net:8080/ambit2/algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor
• Note that the URI identifies an OpenTox Algorithm resource
• The algorithm URI itself is dereferenceable
• It can be used to initiate calculations of the XLogP descriptor
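To show how a client might read the three pieces of information discussed for this feature (title, source algorithm, owl:sameAs link), here is a small sketch using Python's standard XML parser. It is only an illustration: treating RDF/XML as plain XML works for this particular layout but is not a general RDF parser, and the embedded `SAMPLE` is a shortened, hand-written copy of the listing with the owl:sameAs target expanded to a full URI (an assumption).

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ot": "http://www.opentox.org/api/1.1#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "owl": "http://www.w3.org/2002/07/owl#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Shortened, hand-written sample based on the feature listing (illustrative).
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <ot:NumericFeature rdf:about="feature/22114">
    <ot:hasSource>
      <ot:Algorithm rdf:about="algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor"/>
    </ot:hasSource>
    <owl:sameAs rdf:resource="http://www.opentox.org/echaEndpoints.owl#Octanol-water_partition_coefficient_Kow"/>
    <dc:title>XLogP</dc:title>
  </ot:NumericFeature>
</rdf:RDF>"""


def describe_feature(rdf_xml):
    """Extract URI, title, source algorithm and sameAs link of a numeric feature."""
    root = ET.fromstring(rdf_xml)
    feature = root.find("ot:NumericFeature", NS)
    source = feature.find("ot:hasSource/ot:Algorithm", NS)
    same_as = feature.find("owl:sameAs", NS)
    return {
        "uri": feature.get(RDF_ABOUT),
        "title": feature.findtext("dc:title", namespaces=NS),
        "source": source.get(RDF_ABOUT) if source is not None else None,
        "same_as": same_as.get(RDF_RESOURCE) if same_as is not None else None,
    }
```

A production client would more likely use a dedicated RDF library so that equivalent serialisations of the same triples are handled uniformly.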
Resources: Feature (an example)
$ curl -H "Accept:text/n3" http://apps.ideaconsult.net:8080/ambit2/feature/26184
@prefix ot: <http://www.opentox.org/api/1.1#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix : <http://apps.ideaconsult.net:8080/ambit2/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix af: <http://apps.ideaconsult.net:8080/ambit2/feature/> .
ot:hasSource
    a owl:ObjectProperty .
ot:units
    a owl:DatatypeProperty .
ot:Feature
    a owl:Class .
ot:NumericFeature
    a owl:Class ;
    rdfs:subClassOf ot:Feature .
af:26184
    a ot:Feature , ot:NumericFeature ;
    dc:creator "http://ambit.uni-plovdiv.bg:8080/ambit2" ;
    dc:title "TUM_CDK_XLogP" ;
    ot:hasSource <http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChemXLogPDescriptor> ;
    ot:units "" ;
    = otee:Octanol-water_partition_coefficient_Kow .

An example (N3) of another ot:Feature resource: the XLogP descriptor (again)
• Generated by a different implementation
• In this case its name is "TUM_CDK_XLogP", and the algorithm resource used to generate it resides at the Technical University of Munich (TUM): http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChemXLogPDescriptor
• This algorithm URL can also be used to initiate descriptor calculations
• The representation of Algorithm resources refers to the BlueObelisk ontology entry http://www.blueobelisk.org/ontologies/chemoinformatics-algorithms/#xlogP
Toxicity endpoints ontologies
• Derived from the ECHA classification of endpoints published in the REACH guidance documents
• Covers physicochemical properties and various toxicological endpoints
• The hierarchy does not represent the complexity of toxicological assays, but can be used as a first approximation to assign meaning to the data entries and to generate a REACH report
• More specific descriptions of toxicological assays can be used as well
• Ontologies for specific toxicity assays are being developed by OpenTox partners
• The ECHA endpoints ontology: http://www.opentox.org/echaEndpoints.owl
Resources: a feature
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/feature/3
<rdf:RDF
  xmlns:ot="http://www.opentox.org/api/1.1#"
  xmlns="http://apps.ideaconsult.net:8080/ambit2/"
  xmlns:af="http://apps.ideaconsult.net:8080/ambit2/feature/"
  …
  xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Feature rdf:about="feature/3">
    <dc:creator/>
    <ot:hasSource>ECHA…</ot:hasSource>
    <owl:sameAs rdf:resource="http://www.opentox.org/api/1.1#EINECS"/>
    <ot:units></ot:units>
    <dc:title>EC</dc:title>
  </ot:Feature>
</rdf:RDF>

An illustration of an ot:Feature that was imported from a file rather than calculated. The example shows a feature representing the EINECS number, imported from the ECHA preregistration list.
Feature summary
• An OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs
• These URIs are dereferenceable
• Features allow different levels of meaning to be assigned by linking to ontology entries (e.g. algorithms or toxicological endpoints), as well as to the algorithms that can be used to generate property values
• The same approach can be used to denote assays (provided that the assay is defined by an ontology), species, functional groups, etc.
Resources: Dataset
Dataset: Provides access to chemical compounds and their features (e.g. structural, physical-chemical, biological and toxicological properties)
Operations:
• POST – Upload a dataset
• PUT – Update the dataset content
• DELETE – Remove the dataset
Representation: RDF/XML (mandatory)
• The dataset consists of data entries
• Each entry is associated with exactly one chemical compound, identified by its URI and available via the OpenTox Compound service API
• One and the same compound can be associated with multiple dataset entries
• Every "column" is associated with a Feature; its representation should be available via the OpenTox Feature API
Resources: Dataset
Representation
The dataset services optionally support formats other than RDF:
• text/csv (comma delimited)
• text/x-arff (Weka ARFF)
• application/pdf
• chemical/x-mdl-sdfile
• other Chemical MIME formats
This allows the same data to be retrieved in a convenient format, but the URL links to compound and feature resources are lost.

@prefix ad: <http://apps.ideaconsult.net:8080/ambit2/dataset/> .
@prefix af: <http://apps.ideaconsult.net:8080/ambit2/feature/> .
@prefix ot: <http://www.opentox.org/api/1.1#> .
…
ad:9 a ot:Dataset ;
    ot:dataEntry
        [ a ot:DataEntry ;
          ot:compound <http://apps.ideaconsult.net:8080/ambit2/compound/413/conformer/409421> ;
          ot:values
              [ a ot:FeatureValue ;
                ot:feature af:21576 ;
                ot:value "3.309999942779541"^^xsd:double
              ] ;
          ot:values
              [ a ot:FeatureValue ;
                ot:feature af:21573 ;
                ot:value "3.0"^^xsd:double
              ]
        ] .
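The dataset structure above (data entries, each with one compound and a set of feature values) maps naturally onto a table with one row per entry and one column per feature. The following sketch flattens such entries to CSV using only the standard library. The in-memory input shape, a list of `(compound_uri, {feature_uri: value})` pairs, is an assumption chosen for illustration; keeping the feature URIs as column headers is one possible way to avoid losing the links, not something the slides prescribe.

```python
import csv
import io


def dataset_to_csv(entries):
    """Flatten data entries into CSV text.

    `entries` is a list of (compound_uri, {feature_uri: value}) pairs,
    an illustrative in-memory stand-in for ot:DataEntry resources.
    """
    # One column per distinct feature URI, in a stable (sorted) order.
    features = sorted({f for _, values in entries for f in values})
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["compound"] + features)
    for compound_uri, values in entries:
        # Missing values are written as empty cells.
        writer.writerow([compound_uri] + [values.get(f, "") for f in features])
    return out.getvalue()
```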
Dataset metadata and features
URI templates:
• http://host:port/dataset/{id} – Retrieve the entire dataset content; if uri-list is requested, retrieve only the compound URIs
• http://host:port/dataset/{id}/feature – Retrieve the representation of the features (columns) of the dataset
• http://host:port/dataset/{id}/metadata – Retrieve the dataset metadata (name, etc.)

$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/dataset/9/metadata
<rdf:RDF
  xmlns:ot="http://www.opentox.org/api/1.1#"
  ……
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Dataset rdf:about="dataset/9">
    <dc:source>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:source>
    <dc:publisher>somebody</dc:publisher>
    <rdfs:seeAlso>
      <bx:Entry rdf:about="reference/20117">
        <rdfs:seeAlso>http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html</rdfs:seeAlso>
        <dc:title>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:title>
      </bx:Entry>
    </rdfs:seeAlso>
    <dc:title>ISSCAN: Istituto Superiore di Sanita, CHEMICAL CARCINOGENS: STRUCTURES AND EXPERIMENTAL DATA</dc:title>
  </ot:Dataset>
</rdf:RDF>
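The three dataset URI templates can be captured in a small helper. This is a minimal sketch; the function name `dataset_uris` and the dictionary keys are invented for illustration, while the path segments (`dataset`, `feature`, `metadata`) follow the templates in the slides.

```python
def dataset_uris(base, dataset_id):
    """Return the content, feature and metadata URIs of one dataset.

    `base` is the service root, e.g. "http://apps.ideaconsult.net:8080/ambit2".
    """
    root = "%s/dataset/%s" % (base.rstrip("/"), dataset_id)
    return {
        "content": root,
        "features": root + "/feature",
        "metadata": root + "/metadata",
    }
```

With the service root and dataset id used in the curl example, the `metadata` entry reproduces the URL requested there.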
Data publishing
[Diagram: a dataset service (structures, endpoints & predictions) accessed via HTTP GET and POST. Upload data and receive a dataset URL http://host2/dataset/{id}; find chemical compounds and return a dataset URL http://host2/compound/{id}; annotate with toxicology-related ontologies and algorithm ontologies.]
1) POST a file with chemical structures and properties to the OpenTox dataset service
• The structures and data are assigned a dataset URL and become available in multiple formats (RDF, Chemical MIME, CSV, Weka ARFF)
2) Assign metadata
• PUT /dataset/{id}/metadata
3) Annotate any of the dataset features (/dataset/{id}/feature) by assigning links to relevant ontologies
• PUT /feature/{id}
Resources: Algorithm
Algorithm: Provides access to OpenTox algorithms. There are several algorithm services, developed by different OpenTox partners. The list of algorithms can be retrieved with an HTTP GET operation at http://host:port/algorithm

$ curl -H "Accept:text/uri-list" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNclassification
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/PLSregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/M5P
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/GaussP
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/FTMsmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/gSpansmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/JOELIB2
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/InfoGainAttributeEval
Resources: Algorithm
Representation
• Multiple types of algorithms
  – descriptor calculation algorithms
  – machine learning procedures
  – data preprocessing
• The representation of algorithms is again defined by the OpenTox ontology, where all algorithms are subclasses of ot:Algorithm
• Algorithm types ontology: http://opentox.org/data/documents/development/RDF%20files/AlgorithmTypes
  – provides a classification of algorithm types
• The algorithm type in the RDF representation is set by assigning (via rdf:type) a class from the algorithm types ontology (ota: http://www.opentox.org/algorithms.owl#), e.g. <myalgorithm> rdf:type ota:Classification
Resources: Algorithm
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/algorithm/J48
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:ot="http://www.opentox.org/api/1.1#"
  …
  xmlns:ota="http://www.opentox.org/algorithmTypes.owl#">
  <ota:Supervised rdf:about="http://apps.ideaconsult.net:8080/ambit2/algorithm/J48">
    <dc:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Classification Decision tree J48</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#Classification"/>
    <dc:description rdf:datatype="http://www.w3.org/2001/XMLSchema#string"></dc:description>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#SingleTarget"/>
    <dc:publisher rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">Somebody</dc:publisher>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#EagerLearning"/>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Algorithm"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">Sat Jul 31 17:11:26 EEST 2010</dc:date>
  </ota:Supervised>
</rdf:RDF>

Representation
• The algorithm name is defined by dc:title
• Parameters supported by the algorithm are specified via the object property ot:parameters and should be of class ot:Parameter (as defined in opentox.owl). These entries tell the client which parameters are required in order to run the algorithm; the values themselves should be provided by the client when initiating the calculations via POST
• Algorithm types are distinguished by means of the algorithm types ontology
Resources: Model
• Representations of predictive models
• A Model is created by an HTTP POST to an ot:Algorithm with specific parameters and/or an input ot:Dataset
Representation
• The Model name is defined by the dc:title property
• The Model creator may be defined by the dc:creator property
• The date of Model creation is defined by the dc:date property
• The Algorithm is defined by the ot:algorithm object property
• The independent variables are instances of ot:Feature, defined by the ot:independentVariables property (can be multiple)
• The dependent variables are instances of ot:Feature, defined by the ot:dependentVariables property (can be multiple)
• The variables where prediction results will be stored are instances of ot:Feature, defined by ot:predictedVariables (can be multiple)
• Parameters are defined by ot:parameters
• The training Dataset is an instance of ot:Dataset, defined by ot:trainingDataset
Linked resources: Compound, Algorithm, Model, Dataset, Features
[Diagram: Model resource, Dataset resource, Descriptor resource, Assay resource and Chemical compound, together with the Algorithm service, Feature service and Compound service that provide them.]
[Diagram: Dataset resource, Descriptor resource, Assay resource and Chemical compound, together with the ontologies used to annotate them: the Blue Obelisk algorithms ontology, the OpenTox algorithm types ontology (Regression, Classification, Quantum Chemistry, Descriptors, etc.) and toxicology-related ontologies.]
Make the model available
Register at the OpenTox ontology service
– RDF triple storage
– Accepts HTTP POST
– SPARQL endpoint
$ curl -X POST -d "uri=http://apps.ideaconsult.net:8080/ambit2/model/57" http://apps.ideaconsult.net:8080/ontology
The model becomes visible to applications.
[Diagram: the model service registers /model/{id} via HTTP POST at the ontology service (published models, algorithms, ontologies, metadata).]
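The registration step is a single form-encoded POST carrying a `uri` parameter, as in the curl command. A Python equivalent that only builds the request might look like the sketch below. The SPARQL text is an illustrative guess at a query that lists registered models and their titles; the slides do not show an actual query, and how a query is submitted to the ontology service (parameter names, result format) is not specified there, so that part is deliberately left out.

```python
from urllib.parse import urlencode
from urllib.request import Request


def registration_request(ontology_service_uri, resource_uri):
    """Build (but do not send) the POST that registers a resource URI."""
    body = urlencode({"uri": resource_uri}).encode("ascii")
    return Request(ontology_service_uri, data=body, method="POST")


# Illustrative query only: list resources typed as ot:Model with their titles.
MODELS_QUERY = """
PREFIX ot: <http://www.opentox.org/api/1.1#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?model ?title WHERE {
  ?model a ot:Model ;
         dc:title ?title .
}
"""
```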
Services implementation by partner and service type
[Table: one row per partner number, one column per service type: Compound, Dataset, Feature, Algorithm (processing), Algorithm (model), Model, Task, Report, Validation, Authentication and Authorisation service, Ontology service. The column positions of the Y marks are not recoverable; each row lists its marks as extracted.]
Partner 2: Y Y Y Y Y Y
Partner 3 (IDEA): Y Y Y Y Y Y Y Y
Partner 5: Y Y Y
Partner 6: Y Y Y Y
Partner 7: Y Y Y
Partner 10: Y
• All components are implemented as REST web services
• There can be multiple implementations of the same type of component
• (A subset of) the services can be hosted by the same provider or by multiple providers at separate locations
http://apps.ideaconsult.net:8080/ambit2
OpenTox Is A Framework
Framework, Unified Access, Open Source
• Toxicity data
• Predictive models
• Validation support
• Interpretation aids
• Toxicologists
• Modelers
• API for new algorithm development & integration
• To optimise impact
• To allow inspection and review
• To attract external contributors

Demo applications
OpenTox services can be used to develop specific applications or can be embedded in workflow systems
• Two end-user-oriented demo applications making use of OpenTox web services have been developed and deployed, and are available for testing: http://toxcreate.org and http://toxpredict.org
• ToxCreate creates models from user-supplied datasets
• ToxPredict uses existing OpenTox models to estimate chemical compound properties
RDF: Lessons learned
• OpenTox specific – it did not start as a Linked Data/RDF project
• The REST and RDF mix is not (yet) popular … but it is natural to be able to retrieve a (partial) resource representation described by triples
• Steep learning curve
• Some hard topics
  – Data model vs format
  – The subject-predicate-object concept vs tabular/hierarchical/other implicit structure
  – The recognition of the added value
    • XML, JSON, YAML, plain text, etc. vs RDF
RDF: Performance
• RDF representation is verbose …
• … and in-memory RDF libraries are slow …
• … lack of streaming parsers/writers …
[Charts: a dataset with 320 chemicals and 60 columns; a dataset with 6500 chemicals and 12 columns]
RDF: Wish list
• A convenient explanation of the subject-predicate-object concept for beginners
• A (high-performance) triple store should not be a mandatory requirement for publishing RDF data
• Fast streaming parsers and writers
• (Terse) JSON serialisation
• Security
• Synchronisation of distributed RDF content
Thank you
• Build an application with the OpenTox REST Web Services API: http://opentox.org/dev/apis/api-1.1
• Download the AMBIT implementation of the OpenTox API and launch your own OpenTox service: http://ambit.sourceforge.net
Build a predictive model
[Diagram: a dataset service (structures, descriptors, endpoints), an algorithm service (Regression, Classification, Quantum Chemistry, Descriptors, etc.), a model service (/model/{id}), a validation service (/validation/{id}) and an ontology service (published models, algorithms, ontologies, metadata), connected by HTTP POST calls.]
• Create a model: run calculations with the dataset http://host1/dataset/{id}
• The model URL http://host1/model/{id} is returned
Uniform approach to model creation
Read data from a web address – process – write to a web address
[Diagram: Dataset + Algorithm = Model. The Feature, Compound, Dataset, Algorithm and Model resources each support GET, POST, PUT and DELETE.]
• Training dataset: http://myhost.com/dataset/trainingset1
• Algorithm: http://myhost.com/algorithm/neuralnetwork
• Resulting model: http://myhost.com/model/predictivemodel1
Use an algorithm to build a model
• An algorithm is applied by submitting an HTTP POST to the algorithm URI and providing the required parameters
• A common required parameter is dataset_uri=http://host:port/dataset/{datasetid}, which specifies the dataset to be operated on
• An HTTP POST in REST-style services returns the URI of the result, not the content of the result
• The algorithm services are designed to store the results in a dataset service and return the URL of the resulting dataset
• For slow calculations, a Task URI is returned instead of the dataset URI

$ curl -H "Accept:text/uri-list" -X POST -d "dataset_uri=http://apps.ideaconsult.net:8080/ambit2/dataset/1037" -d "prediction_feature=http://apps.ideaconsult.net:8080/ambit2/feature/26701" -d "dataset_service=http://apps.ideaconsult.net:8080/ambit2/dataset" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48 -iv
* Connected to opentox.informatik.tu-muenchen.de (131.159.28.16) port 8080 (#0)
> POST /OpenTox-dev/algorithm/J48 HTTP/1.1
> Host: opentox.informatik.tu-muenchen.de:8080
> Accept: text/uri-list
> Content-Type: application/x-www-form-urlencoded
< HTTP/1.1 202 Accepted
< Date: Sat, 31 Jul 2010 14:46:38 GMT
< Location: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5
< Accept-Ranges: bytes
< Server: Noelios-Restlet-Engine/1.1.snapshot
< Content-Type: text/uri-list;charset=ISO-8859-1
< Content-Length: 99
<
* Connection #0 to host opentox.informatik.tu-muenchen.de left intact
* Closing connection #0
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5
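The curl call above can be mirrored in Python with the standard library. The sketch below only constructs the request and classifies a response; it does not contact any server. The helper names `build_algorithm_request` and `classify_response` are invented for illustration, while the parameter names (`dataset_uri`, `prediction_feature`, `dataset_service`) and the 200/202 status codes are taken from the example.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_algorithm_request(algorithm_uri, dataset_uri, **extra_params):
    """Build (but do not send) a form-encoded POST to an algorithm URI."""
    params = {"dataset_uri": dataset_uri}
    params.update(extra_params)
    body = urlencode(params).encode("ascii")
    return Request(
        algorithm_uri,
        data=body,
        method="POST",
        headers={
            "Accept": "text/uri-list",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def classify_response(status, body):
    """Return ("task", uri) for 202 Accepted and ("result", uri) for 200 OK."""
    uri = body.strip()
    if status == 202:
        return "task", uri
    if status == 200:
        return "result", uri
    raise RuntimeError("unexpected HTTP status %d" % status)
```

Sending the request with `urllib.request.urlopen(req)` and passing `resp.status` and the decoded body to `classify_response` tells the client whether it already holds the result URI or a task URI that still has to be polled.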
Resources: The model
$ curl -iv -H "Accept:text/uri-list" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c
* About to connect() to opentox.informatik.tu-muenchen.de port 8080 (#0)
*   Trying 131.159.28.16... connected
* Connected to opentox.informatik.tu-muenchen.de (131.159.28.16) port 8080 (#0)
> GET /OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5 HTTP/1.1
> User-Agent: curl/7.18.2 (x86_64-pc-linux-gnu) libcurl/7.18.2 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.8 libssh2/0.18
> Host: opentox.informatik.tu-muenchen.de:8080
> Accept:text/uri-list
>
< HTTP/1.1 200 OK
< Date: Sat, 31 Jul 2010 14:47:22 GMT
Date: Sat, 31 Jul 2010 14:47:22 GMT
< Location: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48
< Vary: Accept-Charset, Accept-Encoding, Accept-Language, Accept
< Accept-Ranges: bytes
< Server: Noelios-Restlet-Engine/1.1.snapshot
< Content-Type: text/uri-list;charset=ISO-8859-1
< Content-Length: 86
<
* Connection #0 to host opentox.informatik.tu-muenchen.de left intact
* Closing connection #0
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48

• When a task URI is returned, the status code is HTTP 202 Accepted instead of HTTP 200 OK
• This tells the client that processing is not complete and that the client needs to poll the task URI until an OK code is returned
• The final result returned by Example 25 is the URI of the new model: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48
• To obtain prediction results, POST a dataset to the model URI
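The polling behaviour described in these bullets can be sketched as a small loop: keep requesting the task URI while the service answers 202 Accepted, and stop when it answers 200 OK. In the sketch below, `fetch` is a caller-supplied function returning `(status, body)`; that interface, the default interval and the poll limit are assumptions made for illustration and are not part of the OpenTox API.

```python
import time


def poll_task(fetch, task_uri, interval=2.0, max_polls=30, sleep=time.sleep):
    """Poll a task URI until 200 OK is returned; return the result URI.

    `fetch(uri)` must return a (status, body) tuple. A 202 status means
    "still running"; any status other than 200 or 202 is treated as failure.
    """
    for _ in range(max_polls):
        status, body = fetch(task_uri)
        if status == 200:
            return body.strip()
        if status != 202:
            raise RuntimeError("task failed with HTTP status %d" % status)
        sleep(interval)
    raise TimeoutError("task %s not finished after %d polls" % (task_uri, max_polls))


def urllib_fetch(uri):
    """One possible `fetch`, built on the standard library only."""
    from urllib.request import Request, urlopen

    req = Request(uri, headers={"Accept": "text/uri-list"})
    with urlopen(req) as resp:
        return resp.status, resp.read().decode("iso-8859-1")
```

Calling `poll_task(urllib_fetch, task_uri)` would block until the model URI is available; injecting `fetch` and `sleep` keeps the loop easy to exercise without a live service.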
Read data from a web address ndash process ndash write to a web address
Uniform approach to data processing (eg
Descriptors calculation)
17Ideaconsult Ltd
Feature
GET
POST
PUT
DELETE
Compound
GET
POST
PUT
DELETE
Dataset
GET
POST
PUT
DELETE
Algorithm
GET
POST
PUT
DELETE
+ =
httpmyhostcomdatasettrainingset1
httpmyhostcomalgorithmdescriptorX
Feature
GET
POST
PUT
DELETE
Compound
GET
POST
PUT
DELETE
Dataset
GET
POST
PUT
DELETE
=
httpmyhostcomdatasetresults
Read data from a web address ndash process ndash write to a web address
Uniform approach to models validation and
report generation
18Ideaconsult Ltd
Dataset
GET
POST
PUT
DELETE
Model
GET
POST
PUT
DELETE
+
=Validation
GET
POST
PUT
DELETE
Report
GET
POST
PUT
DELETEModel generating
predictions
Validation report
httpmyhostcomreport1
httpmyhostcomdatasettrainingset1
httpmyhostcomdatasetpredictedresults1
httpmyhostcommodelpredictivemodel1
httpmyhostcomvalidation
Apply predictive models
Returns the results
dataset URL
httphostdatasetid
Retrieve available
endpoints and model
URLs eg
httphost1modelid
Published models
Algorithms
Ontologies
metadata
Ontology
service
HTTP POST
SPARQL
HTTP POST
modelidApply the model
httphost1modelid
to dataset
httphost2datasetid
Model service
Read data from a web address ndash process ndash write to a web address
Uniform approach to model prediction
20Ideaconsult LtdAugust 22 2010
Feature
GET
POST
PUT
DELETE
Compound
GET
POST
PUT
DELETE
Dataset
GET
POST
PUT
DELETE
+=
httpmyhostcomdatasetid1
Feature
GET
POST
PUT
DELETE
Compound
GET
POST
PUT
DELETE
Dataset
GET
POST
PUT
DELETE
=
httpmyhostcomdatasetresults1
Model
GET
POST
PUT
DELETE
httpmyhostcommodelpredictivemodel1
Uniform access to data
Upload data receive
dataset URL
httphost2datasetid
Annotation
Find chemical compounds
return dataset URL
httphost2compoundid
HTTP GET POST
Dataset service
Structures
endpoints
amppredictions
Publish your data retrieve linked data
Published models
Algorithms
Ontologies
metadata
Ontology service
Feature
GET
POST
PUT
DELETE
Compound
GET
POST
PUT
DELETE
Dataset
GET
POST
PUT
DELETE
RDF - Resources representation
bull The opentoxowl ontology
ndash A common OWL data model of all
OpenTox resources
ndash Describes OpenTox resources
ndash Describes relationships between them
ndash Generates objects RDF representations
bull RDFXML representation is mandatory for
OpenTox resources
bull Uniform approach to data representation
ndash Calculated and measured properties of
chemical compounds are represented in an
uniform way
ndash Linked to the resource used for data
generation
ndash Annotated via ontology entries
ndash Model representations link to algorithms
and data used
bull Ideaconsult Ltd
22
All OpenTox components are defined by
OWL ontology
httpopentoxorgapi11opentoxowl
All resources are subclasses of
otOpenToxResource
Resources Chemical compound
Ideaconsult Ltd23
$ curl -H Acceptchemicalx-daylight-smiles
httpappsideaconsultnet8080ambit2compound1
O=C
$ curl -H Acceptchemicalx-mdl-molfile
httpappsideaconsultnet8080ambit2compound1
CH2O
APtclcactv09040902283D 0 000000 000000
4 3 0 0 0 0 0 0 0 0999 V2000
-06004 00000 00001 O 0 0 0 0 0 0 0 0 0 0 0 0
06072 00000 -00004 C 0 0 0 0 0 0 0 0 0 0 0 0
11472 09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0
11472 -09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0
1 2 2 0 0 0 0
2 3 1 0 0 0 0
2 4 1 0 0 0 0
CompoundProvides different representations for
chemical compounds with a unique and
defined chemical structurecompoundid
Conformercompoundidconformerid
Documentationhttpopentoxorgdevapisapi-11structure
RepresentationA subclass of otOpenToxResource
Supports different Chemical MIME
formats
RDF representation only for specifying
owlsameAs links to external resources
Example 1 Retrieve compound as MOL
Example 2 Retrieve compound as SMILES
Example 3 Query compounds
$ curl ndashH Acceptchemical-mime ldquo
httpappsideaconsultnet8080ambit2querycompoundany-identifier-or-keyword
$ curl ndashH Acceptchemical-mime ldquo
httpappsideaconsultnet8080ambit2querysmartssearch=smarts
Resources Feature
Ideaconsult Ltd24
FeatureA Feature is a resource representing any
kind of a property or identifier assigned to
a Compound
The feature types are determined via their
links to ontologies (Feature ontologies
Decsriptor ontologies Endpoints
ontologies)featureid
Documentationhttpopentoxorgdevapisapi-11feature
RepresentationotFeature a subclass of
otOpenToxResource
Mandatory RDFXML format
Properties
bullName defined by dctitle (Dublin Core namespace )
bullUnits defined by otunits annotation property (OpenTox
namespace)
bullCreator defined by dccreator annotation property
(Dublin Core namespace)
bullThe origin of the Feature is defined by othasSource
object property (OpenTox namespace) element and can be
otAlgorithm otModel or otDataset
bullRelations to other resources which represent the same
entity could be established via owlsameAs property
This approach can be used for example to link the
otFeature resource to a resource from another ontology
(an example follows)
bullThere are subclasses of otFeature (namely) which are
used otNumericFeature otStringFeature
otNominalFeature denote if a feature holds numeric
nominal or string values
Resources Feature (an example)
Ideaconsult Ltd25
$ curl -H Acceptapplicationrdf+xml
httpappsideaconsultnet8080ambit2feature22114
ltrdfRDF
xmlnsot=httpwwwopentoxorgapi11
xmlnsotee=httpwwwopentoxorgechaEndpointsowl
xmlnsdc=httppurlorgdcelements11
xmlns=httpappsideaconsultnet8080ambit2
xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns
xmlnsowl=httpwwww3org200207owl
xmlnsxsd=httpwwww3org2001XMLSchema
xmlnsrdfs=httpwwww3org200001rdf-schema
xmlbase=httpappsideaconsultnet8080ambit2gt
ltowlClass rdfabout=httpwwwopentoxorgapi11Algorithmgt
ltowlClass rdfabout=httpwwwopentoxorgapi11Featuregt
ltowlClass rdfabout=httpwwwopentoxorgapi11NumericFeaturegt
ltrdfssubClassOf
rdfresource=httpwwwopentoxorgapi11Featuregt
ltowlClassgt
ltotNumericFeature rdfabout=feature22114gt
ltothasSourcegt
ltotAlgorithm
rdfabout=algorithmorgopensciencecdkqsardescriptorsmolecularXLo
gPDescriptorgt
ltothasSourcegt
ltowlsameAs rdfresource=ldquooteeOctanol-
water_partition_coefficient_Kowgt
ltdctitlegtXLogPltdctitlegt
ltrdftype rdfresource=httpwwwopentoxorgapi11Featuregt
ltotNumericFeaturegt
ltrdfRDFgt
The example shows an OpenTox feature with title ldquoXLogPrdquo and identified by the URI httpappsideaconsultnet8080ambit2feature22114
bullLinked to an entry of a simplified ontology of toxicological endpoints
The algorithm used to generate values for this featurebullSpecified by otalgorithm propertybullidentified by the URIhttpappsideaconsultnet8080ambit2algorithmorgopensciencecdkqsardescriptorsmolecularXLogPDescriptor
bullNote the URI identifies an OpenTox Algorithm resourcebullThe algorithm URI itself is dereferensable bullCan be used to initiate calculations of XLogP descriptor
Resources Feature (an example)
Ideaconsult Ltd26
$ curl -H Accepttextn3 httpappsideaconsultnet8080ambit2feature26184
prefix ot lthttpwwwopentoxorgapi11gt
prefix dc lthttppurlorgdcelements11gt
prefix lthttpappsideaconsultnet8080ambit2gt
prefix rdfs lthttpwwww3org200001rdf-schemagt
prefix owl lthttpwwww3org200207owlgt
prefix xsd lthttpwwww3org2001XMLSchemagt
prefix rdf lthttpwwww3org19990222-rdf-syntax-nsgt
prefix af lthttpappsideaconsultnet8080ambit2featuregt
othasSource
a owlObjectProperty
otunits
a owlDatatypeProperty
otFeature
a owlClass
otNumericFeature
a owlClass
rdfssubClassOf otFeature
af26184
a otFeature otNumericFeature
dccreator httpambituni-plovdivbg8080ambit2
dctitle TUM_CDK_XLogP
othasSource lthttpopentoxinformatiktu-muenchende8080OpenTox-
devalgorithmCDKPhysChemXLogPDescriptorgt
otunits
= oteeOctanol-water_partition_coefficient_Kow
An example (N3) of another otFeature
resource XLogP descriptor (again)
bullGenerated by different implementation
bullIn this case its name is
ldquoTUM_CDK_XLogPrdquo and the algorithm
resource used to generate resides at Technical
University of Munich (TUM) premises
httpopentoxinformatiktu-muenchende8080OpenTox-
devalgorithmCDKPhysChemXLogPDescriptor
This algorithm URL could also be used to
initiate descriptor calculations
bullThe representation of Algorithm resources
refers to the BlueObelisk ontology entry httpwwwblueobeliskorgontologieschemoinformatics-
algorithmsxlogP
Toxicity endpoints ontologies
Ideaconsult Ltd27
bullDerived from ECHA classification of
endpoints published in REACH guidance
documents
bullPhysicochemical properties and various
toxicological endpoints
bullThe hierarchy doesnrsquot represent the
complexity of toxicological assays but
can be used as a first approximation to
assign meaning to the data entries and
generate REACH report
bullMore specific description of toxicological
assays can be used as well
bullOntologies for specific toxicity assays are
developed by OpenTox partners
bullThe ECHA endpoints ontology httpwwwopentoxorgechaEndpointsowl
Resources: a feature
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/feature/3
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns="http://apps.ideaconsult.net:8080/ambit2/"
    xmlns:af="http://apps.ideaconsult.net:8080/ambit2/feature/"
    …
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Feature rdf:about="feature/3">
    <dc:creator/>
    <ot:hasSource>ECHA…</ot:hasSource>
    <owl:sameAs rdf:resource="http://www.opentox.org/api/1.1#EINECS"/>
    <ot:units></ot:units>
    <dc:title>EC</dc:title>
  </ot:Feature>
</rdf:RDF>
An illustration of an ot:Feature that was imported from a file and not calculated.
The example shows a feature representing the EINECS number, imported from the ECHA preregistration list.
Feature summary
• An OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs
• These URIs are dereferenceable
• Features allow assigning different levels of meaning by linking to ontology entries (e.g. algorithms or toxicological endpoints), as well as by linking to the algorithms which can be used to generate the property values
• The same approach can be used to denote assays (provided that the assay is defined by an ontology), species, functional groups, etc.
Resources: Dataset
Dataset: Provides access to chemical compounds and their features (e.g. structural, physical-chemical, biological, toxicological properties)
Operations
POST – Upload a dataset
PUT – Update the dataset content
DELETE – Remove the dataset
Representation
RDF/XML (mandatory)
• The dataset consists of data entries
• Each entry is associated with exactly one chemical compound, identified by its URI and available via the OpenTox Compound service API
• One and the same compound can be associated with multiple dataset entries
• Every "column" is associated with a Feature; its representation should be available via the OpenTox Feature API
Resources: Dataset
Representation
The dataset services optionally support formats other than RDF:
• text/csv (comma delimited)
• text/x-arff (Weka ARFF)
• application/pdf
• chemical/x-mdl-sdfile
• other Chemical MIME formats
This allows retrieving the same data in a convenient format, but the URL links to the compound and feature resources are lost.
@prefix ad: <http://apps.ideaconsult.net:8080/ambit2/dataset/> .
@prefix af: <http://apps.ideaconsult.net:8080/ambit2/feature/> .
@prefix ot: <http://www.opentox.org/api/1.1#> .
…
ad:9 a ot:Dataset ;
    ot:dataEntry
        [ a ot:DataEntry ;
          ot:compound
              <http://apps.ideaconsult.net:8080/ambit2/compound/413/conformer/409421> ;
          ot:values
              [ a ot:FeatureValue ;
                ot:feature af:21576 ;
                ot:value "3.309999942779541"^^xsd:double
              ] ;
          ot:values
              [ a ot:FeatureValue ;
                ot:feature af:21573 ;
                ot:value "3.0"^^xsd:double
              ]
        ] .
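The nesting in the N3 listing (a dataset holds data entries; each entry points to exactly one compound URI and carries values keyed by feature URI) can be mirrored in a small in-memory structure. This is only an illustrative sketch: the class and method names are hypothetical and not part of the OpenTox API, and the URIs and values are copied from the example. It also illustrates why a plain tabular export loses the links: only the bare values survive.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataEntry:
    """One ot:DataEntry: a single compound URI plus its feature values."""
    compound: str
    values: dict[str, float] = field(default_factory=dict)  # feature URI -> value


@dataclass
class Dataset:
    """An ot:Dataset as a list of data entries (hypothetical helper class)."""
    uri: str
    entries: list[DataEntry] = field(default_factory=list)

    def feature_uris(self) -> list[str]:
        """Return the dataset "columns": every feature URI, in first-seen order."""
        seen: dict[str, None] = {}
        for entry in self.entries:
            for feature in entry.values:
                seen.setdefault(feature, None)
        return list(seen)

    def to_rows(self) -> list[list[float | None]]:
        """Flatten to a bare table of values; compound and feature URIs are dropped."""
        columns = self.feature_uris()
        return [[entry.values.get(c) for c in columns] for entry in self.entries]


BASE = "http://apps.ideaconsult.net:8080/ambit2"
dataset = Dataset(
    uri=f"{BASE}/dataset/9",
    entries=[
        DataEntry(
            compound=f"{BASE}/compound/413/conformer/409421",
            values={
                f"{BASE}/feature/21576": 3.309999942779541,
                f"{BASE}/feature/21573": 3.0,
            },
        )
    ],
)
```

Calling dataset.to_rows() yields only numbers, which is the CSV-like view; dataset.feature_uris() is what has to be kept alongside it if the columns are to remain linked to their Feature resources.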
Dataset metadata and features
Description | URI Template
Retrieve the entire dataset content; if uri-list, retrieve only the compound URIs | http://host:port/dataset/{id}
Retrieve the representation of the features (columns) of the dataset | http://host:port/dataset/{id}/feature
Retrieve the dataset metadata (name etc.) | http://host:port/dataset/{id}/metadata
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/dataset/9/metadata
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    ……
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Dataset rdf:about="dataset/9">
    <dc:source>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:source>
    <dc:publisher>somebody</dc:publisher>
    <rdfs:seeAlso>
      <bx:Entry rdf:about="reference/20117">
        <rdfs:seeAlso>http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html</rdfs:seeAlso>
        <dc:title>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:title>
      </bx:Entry>
    </rdfs:seeAlso>
    <dc:title>ISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATA</dc:title>
  </ot:Dataset>
</rdf:RDF>
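The three URI templates for a dataset can be expanded mechanically. The helper below is a minimal sketch; the function name and the dictionary keys are hypothetical, chosen only for illustration.

```python
from __future__ import annotations


def dataset_urls(service_root: str, dataset_id: str) -> dict[str, str]:
    """Expand the dataset URI templates for one dataset id.

    service_root is the part written as http://host:port in the templates.
    """
    root = service_root.rstrip("/")
    dataset = f"{root}/dataset/{dataset_id}"
    return {
        "content": dataset,                  # entire dataset (or compound URIs as uri-list)
        "features": f"{dataset}/feature",    # the dataset "columns"
        "metadata": f"{dataset}/metadata",   # name, source, etc.
    }


urls = dataset_urls("http://apps.ideaconsult.net:8080/ambit2/", "9")
```

Here urls["metadata"] is the address used in the curl call in the example.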
Data publishing
[Diagram: the Dataset service (HTTP GET, POST) holds structures, endpoints & predictions. Upload data, receive a dataset URL (http://host2/dataset/{id}); find chemical compounds, return a dataset URL (http://host2/compound/{id}); annotation links to toxicology related ontologies and algorithms ontologies.]
1) POST a file with chemical structures and properties to an OpenTox dataset service
• The structures and data are assigned a dataset URL and become available in multiple formats (RDF, Chemical MIME, CSV, Weka ARFF)
2) Assign metadata
• PUT /dataset/{id}/metadata
3) Annotate any of the dataset features (/dataset/{id}/feature) by assigning links to relevant ontologies
• PUT /feature/{id}
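The three publishing steps can be sketched as prepared HTTP requests using Python's standard urllib. This is a hedged illustration rather than the services' documented upload procedure: the function names are hypothetical, and sending the file as a raw request body with the chemical/x-mdl-sdfile content type is an assumption (a given service may expect a different upload form). Nothing is sent over the network here; the requests are only constructed.

```python
from urllib.request import Request


def upload_request(dataset_service: str, sdf: bytes) -> Request:
    """Step 1: POST a structure file to the dataset service (assumed raw-body upload)."""
    return Request(dataset_service, data=sdf, method="POST",
                   headers={"Content-Type": "chemical/x-mdl-sdfile"})


def metadata_request(dataset_url: str, rdf_xml: bytes) -> Request:
    """Step 2: PUT metadata to <dataset URL>/metadata."""
    return Request(f"{dataset_url}/metadata", data=rdf_xml, method="PUT",
                   headers={"Content-Type": "application/rdf+xml"})


def annotate_request(feature_url: str, rdf_xml: bytes) -> Request:
    """Step 3: PUT an updated feature representation carrying ontology links."""
    return Request(feature_url, data=rdf_xml, method="PUT",
                   headers={"Content-Type": "application/rdf+xml"})
```

The dataset URL passed to metadata_request is the one returned by the service in response to step 1, so the steps have to be executed in order.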
Resources: Algorithm
Algorithm: Provides access to OpenTox algorithms. There are several algorithm services, developed by different OpenTox partners. The list of algorithms can be retrieved by an HTTP GET operation at http://host:port/algorithm
$ curl -H "Accept:text/uri-list" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNclassification
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/PLSregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/M5P
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/GaussP
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/FTMsmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/gSpansmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/JOELIB2
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/InfoGainAttributeEval
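A text/uri-list response such as the one listed here is one URI per line, so a client can split it with a few lines of code. The sketch below is illustrative; the function names are hypothetical, and skipping lines that start with "#" follows the general text/uri-list convention for comments rather than anything specific to OpenTox.

```python
from __future__ import annotations


def parse_uri_list(body: str) -> list[str]:
    """Split a text/uri-list body into URIs, ignoring blank and comment lines."""
    uris = []
    for line in body.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            uris.append(line)
    return uris


def last_segment(uri: str) -> str:
    """Return the last path segment of a URI, e.g. the algorithm identifier."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


sample = (
    "http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNclassification\r\n"
    "http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48\r\n"
)
algorithms = [last_segment(u) for u in parse_uri_list(sample)]
```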
Resources: Algorithm
Representation
• Multiple types of algorithms
  – descriptor calculation algorithms
  – machine learning procedures
  – data preprocessing
• The representation of algorithms is again defined by the OpenTox ontology, where all algorithms are subclasses of ot:Algorithm
Algorithm types ontology
http://opentox.org/data/documents/development/RDF%20files/AlgorithmTypes
• Provides a classification of algorithm types
• The algorithm type in the RDF representation is set by directly assigning (rdf:type) a class from the algorithm types ontology (ota: http://www.opentox.org/algorithms.owl#), e.g. <myalgorithm> rdf:type ota:Classification
Resources: Algorithm
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/algorithm/J48
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ot="http://www.opentox.org/api/1.1#"
    …
    xmlns:ota="http://www.opentox.org/algorithmTypes.owl#">
  <ota:Supervised rdf:about="http://apps.ideaconsult.net:8080/ambit2/algorithm/J48">
    <dc:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
    >Classification Decision tree J48</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#Classification"/>
    <dc:description rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
    ></dc:description>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#SingleTarget"/>
    <dc:publisher rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI"
    >Somebody</dc:publisher>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#EagerLearning"/>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Algorithm"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime"
    >Sat Jul 31 17:11:26 EEST 2010</dc:date>
  </ota:Supervised>
</rdf:RDF>
Representation
• The algorithm name is defined by dc:title
• Parameters supported by the algorithm are specified via the object property ot:parameters and should be of class ot:Parameter (as defined in opentox.owl)
• These entries indicate which parameters are required in order to run the algorithm; the values themselves should be provided by the client when initiating the calculations via POST
• Algorithm types are distinguished by means of the Algorithm types ontology
Resources: Model
• Representations of predictive models
• A Model is created by an HTTP POST to an ot:Algorithm with specific parameters and/or an input ot:Dataset
Representation
• The Model name is defined by the dc:title property
• The Model creator might be defined by the dc:creator property
• The date of Model creation is defined by the dc:date property
• The Algorithm is defined by the ot:algorithm object property
• The independent variables are instances of ot:Feature, defined by the ot:independentVariables property (can be multiple)
• The dependent variables are instances of ot:Feature, defined by the ot:dependentVariables property (can be multiple)
• The variables where prediction results will be stored are instances of ot:Feature, defined by ot:predictedVariables (can be multiple)
• Parameters are defined by ot:parameters
• The training Dataset is an instance of ot:Dataset, defined by ot:trainingDataset
Linked resources: Compound, Algorithm, Model, Dataset, Features
[Diagram labels: Model resource, Dataset resource, Descriptor resource, Assay resource, Chemical compound; Algorithm service, Feature service, Compound service]
Linked resources: Compound, Algorithm, Model, Dataset, Features
[Diagram labels: Dataset resource, Descriptor resource, Assay resource, Chemical compound; Regression, Classification, Quantum Chemistry, Descriptors etc.; Blue Obelisk algorithms ontology, OpenTox algorithm types ontology, toxicology related ontologies]
Make the model available
Register at the OpenTox ontology service
– RDF triple storage
– Accepts HTTP POST
– SPARQL endpoint
$ curl -X POST -d "uri=http://apps.ideaconsult.net:8080/ambit2/model/57" http://apps.ideaconsult.net:8080/ontology
The model becomes visible for applications
[Diagram: the Model service registers /model/{id} via HTTP POST at the Ontology service, which holds published models, algorithms, ontologies and metadata]
Services implementation by partner and service type
Service types: Compound, Dataset, Feature, Algorithm (processing), Algorithm (model), Model, Task, Report, Validation, Authentication and Authorisation service, Ontology service
Partner No: implemented service types (Y)
2: Y Y Y Y Y Y
3 (IDEA): Y Y Y Y Y Y Y Y
5: Y Y Y
6: Y Y Y Y
7: Y Y Y
10: Y
All components are implemented as REST web services.
There could be multiple implementations of the same type of component.
(A subset of) services could be hosted by the same provider or by multiple providers at separate locations.
http://apps.ideaconsult.net:8080/ambit2/
OpenTox Is A Framework
Framework
• API for new algorithms development & integration
Unified Access
• Toxicity data
• Predictive models
• Validation support
• Interpretation aids
• Toxicologists
• Modelers
Open Source
• To optimise impact
• To allow inspection, review
• To attract external contributors
Demo applications
OpenTox services can be used to develop specific applications or can be embedded in workflow systems.
• Two end-user-oriented demo applications making use of OpenTox web services have been developed and deployed, and are available for testing: http://toxcreate.org and http://toxpredict.org
• ToxCreate creates models from user-supplied datasets
• ToxPredict uses existing OpenTox models to estimate chemical compound properties
RDF: Lessons learned
• OpenTox specific – it didn't start as a Linked Data/RDF project
• A REST and RDF mix is not (yet) popular… but it is natural to be able to retrieve a (partial) resource representation described by triples
• Steep learning curve
• Some hard topics
  – Data model vs. format
  – The subject-predicate-object concept vs. tabular/hierarchical/other implicit structure
  – The recognition of the added value
    • XML, JSON, YAML, plain text etc. vs. RDF
RDF: Performance
RDF representation is verbose… and in-memory RDF libraries are slow… lack of streaming parsers/writers…
[Charts: a dataset with 320 chemicals, 60 columns; a dataset with 6500 chemicals, 12 columns]
RDF: Wish list
• A convenient explanation of the subject-predicate-object concept for beginners
• A (high-performance) triple store should not be a mandatory requirement for publishing RDF data
• Fast streaming parsers and writers
• A (terse) JSON serialisation
• Security
• Synchronisation of distributed RDF content
Thank you
Build an application with the OpenTox REST Web Services API
http://opentox.org/dev/apis/api-1.1
Download the AMBIT implementation of the OpenTox API and launch your own OpenTox service
http://ambit.sourceforge.net
Read data from a web address – process – write to a web address
Uniform approach to model creation
[Diagram: Feature, Compound, Dataset, Algorithm and Model resources, each supporting GET, POST, PUT, DELETE. Dataset http://myhost.com/dataset/trainingset1 + Algorithm http://myhost.com/algorithm/neuralnetwork = Model http://myhost.com/model/predictivemodel1]
Use an algorithm to build a model
• An algorithm is applied by submitting an HTTP POST to the algorithm URI and providing the required parameters
• A common required parameter is dataset_uri=http://host:port/dataset/{datasetid}, which specifies the data set to be operated on
• HTTP POST in REST-style services returns the URI of the result and not the content of the result
• The algorithm services are designed to store the results into a dataset service and return the URL of the resulting dataset
• In case of slow calculations, a Task URI is returned instead of the dataset URI
$ curl -H "Accept:text/uri-list" -X POST \
    -d dataset_uri=http://apps.ideaconsult.net:8080/ambit2/dataset/1037 \
    -d prediction_feature=http://apps.ideaconsult.net:8080/ambit2/feature/26701 \
    -d dataset_service=http://apps.ideaconsult.net:8080/ambit2/dataset \
    http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48 -iv
* Connected to opentox.informatik.tu-muenchen.de (131.159.28.16) port 8080 (#0)
> POST /OpenTox-dev/algorithm/J48 HTTP/1.1
> Host: opentox.informatik.tu-muenchen.de:8080
> Accept:text/uri-list
> Content-Type: application/x-www-form-urlencoded
< HTTP/1.1 202 Accepted
< Date: Sat, 31 Jul 2010 14:46:38 GMT
< Location: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5
< Accept-Ranges: bytes
< Server: Noelios-Restlet-Engine/1.1.snapshot
< Content-Type: text/uri-list;charset=ISO-8859-1
< Content-Length: 99
<
* Connection #0 to host opentox.informatik.tu-muenchen.de left intact
* Closing connection #0
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5
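The same POST can be prepared from a program. The following is a minimal sketch using only Python's standard library; the function name is hypothetical, the parameter names and URLs are the ones from the curl call, and the request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_training_request(algorithm_uri: str, dataset_uri: str,
                           prediction_feature: str, dataset_service: str) -> Request:
    """Prepare the form-encoded POST that asks an algorithm service to build a model."""
    form = urlencode({
        "dataset_uri": dataset_uri,                # data set to operate on
        "prediction_feature": prediction_feature,  # feature holding the values to be predicted
        "dataset_service": dataset_service,        # where results should be stored
    })
    return Request(
        algorithm_uri,
        data=form.encode("ascii"),
        method="POST",
        headers={
            "Accept": "text/uri-list",  # ask for the result (or task) URI only
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


AMBIT = "http://apps.ideaconsult.net:8080/ambit2"
request = build_training_request(
    "http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48",
    dataset_uri=f"{AMBIT}/dataset/1037",
    prediction_feature=f"{AMBIT}/feature/26701",
    dataset_service=f"{AMBIT}/dataset",
)
```

Passing the prepared request to urllib.request.urlopen would submit it; the response body is then expected to contain a single URI, either of the result or of a task.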
Resources: The model
$ curl -iv -H "Accept:text/uri-list" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c
* About to connect() to opentox.informatik.tu-muenchen.de port 8080 (#0)
*   Trying 131.159.28.16... connected
* Connected to opentox.informatik.tu-muenchen.de (131.159.28.16) port 8080 (#0)
> GET /OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5 HTTP/1.1
> User-Agent: curl/7.18.2 (x86_64-pc-linux-gnu) libcurl/7.18.2 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.8 libssh2/0.18
> Host: opentox.informatik.tu-muenchen.de:8080
> Accept:text/uri-list
>
< HTTP/1.1 200 OK
< Date: Sat, 31 Jul 2010 14:47:22 GMT
Date: Sat, 31 Jul 2010 14:47:22 GMT
< Location: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48
< Vary: Accept-Charset, Accept-Encoding, Accept-Language, Accept
< Accept-Ranges: bytes
< Server: Noelios-Restlet-Engine/1.1.snapshot
< Content-Type: text/uri-list;charset=ISO-8859-1
< Content-Length: 86
<
* Connection #0 to host opentox.informatik.tu-muenchen.de left intact
* Closing connection #0
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48
• When a task URI is returned, the returned status code is HTTP 202 Accepted instead of HTTP 200 OK
• This tells the client that the processing is not completed, and the client needs to poll the task URI until the OK code is returned
• The final result returned by Example 25 is the URI of the new model: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48
• To obtain prediction results, POST a dataset to the model URI
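The polling rule described here (keep asking while the answer is 202 Accepted, stop on 200 OK) can be sketched as a small loop. This is an illustration under stated assumptions, not the OpenTox client code: the function name, the delay and the retry limit are hypothetical, and the HTTP call is passed in as a `fetch` callable returning a (status code, body) pair so that the logic can be exercised without a network.

```python
from __future__ import annotations

import time
from typing import Callable


def wait_for_result(task_uri: str,
                    fetch: Callable[[str], tuple[int, str]],
                    delay: float = 1.0,
                    max_polls: int = 60) -> str:
    """Poll a task URI until it reports 200 OK and return the result URI.

    fetch(uri) must perform a GET with Accept: text/uri-list and return
    (HTTP status code, response body).
    """
    for _ in range(max_polls):
        status, body = fetch(task_uri)
        if status == 200:
            # Finished: the body holds the URI of the result (e.g. the new model).
            return body.strip()
        if status != 202:
            raise RuntimeError(f"unexpected HTTP status {status} from {task_uri}")
        # 202 Accepted: still running, wait before asking again.
        time.sleep(delay)
    raise TimeoutError(f"task {task_uri} did not complete after {max_polls} polls")
```

A real `fetch` could wrap urllib.request.urlopen; in tests it can be replaced by a function that replays canned responses.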
Read data from a web address – process – write to a web address
Uniform approach to data processing (e.g. descriptor calculation)
[Diagram: Dataset http://myhost.com/dataset/trainingset1 (with its Feature and Compound resources) + Algorithm http://myhost.com/algorithm/descriptorX = Dataset http://myhost.com/dataset/results (with its Feature and Compound resources); every resource supports GET, POST, PUT, DELETE]
Read data from a web address – process – write to a web address
Uniform approach to model validation and report generation
[Diagram: Dataset + Model = Validation and Report; every resource supports GET, POST, PUT, DELETE. Labels: model generating predictions, validation report. URLs: http://myhost.com/report/1, http://myhost.com/dataset/trainingset1, http://myhost.com/dataset/predictedresults1, http://myhost.com/model/predictivemodel1, http://myhost.com/validation]
Apply predictive models
[Diagram: the Ontology service (published models, algorithms, ontologies, metadata) is queried via HTTP POST / SPARQL to retrieve the available endpoints and model URLs, e.g. http://host1/model/{id}. The model http://host1/model/{id} is applied to the dataset http://host2/dataset/{id} via HTTP POST to /model/{id} at the Model service, which returns the results dataset URL http://host/dataset/{id}]
Read data from a web address – process – write to a web address
Uniform approach to model prediction
[Diagram: Dataset http://myhost.com/dataset/id1 (with its Feature and Compound resources) + Model http://myhost.com/model/predictivemodel1 = Dataset http://myhost.com/dataset/results1 (with its Feature and Compound resources); every resource supports GET, POST, PUT, DELETE]
Uniform access to data
Publish your data, retrieve linked data
[Diagram: the Dataset service (HTTP GET, POST) holds structures, endpoints & predictions as Feature, Compound and Dataset resources, each supporting GET, POST, PUT, DELETE. Upload data, receive a dataset URL (http://host2/dataset/{id}); find chemical compounds, return a dataset URL (http://host2/compound/{id}); annotation links to the Ontology service (published models, algorithms, ontologies, metadata)]
RDF - Resources representation
• The opentox.owl ontology
  – A common OWL data model of all OpenTox resources
  – Describes OpenTox resources
  – Describes the relationships between them
  – Generates the objects' RDF representations
• RDF/XML representation is mandatory for OpenTox resources
• Uniform approach to data representation
  – Calculated and measured properties of chemical compounds are represented in a uniform way
  – Linked to the resource used for data generation
  – Annotated via ontology entries
  – Model representations link to the algorithms and data used
All OpenTox components are defined by an OWL ontology: http://opentox.org/api/1.1/opentox.owl
All resources are subclasses of ot:OpenToxResource
Resources: Chemical compound
Compound: Provides different representations for chemical compounds with a unique and defined chemical structure
/compound/{id}
Conformer: /compound/{id}/conformer/{id}
Documentation: http://opentox.org/dev/apis/api-1.1/structure
Representation
• A subclass of ot:OpenToxResource
• Supports different Chemical MIME formats
• RDF representation only for specifying owl:sameAs links to external resources
Example 1: Retrieve compound as MOL
$ curl -H "Accept:chemical/x-mdl-molfile" http://apps.ideaconsult.net:8080/ambit2/compound/1
CH2O
APtclcactv09040902283D 0 0.00000 0.00000
4 3 0 0 0 0 0 0 0 0999 V2000
-0.6004 0.0000 0.0001 O 0 0 0 0 0 0 0 0 0 0 0 0
0.6072 0.0000 -0.0004 C 0 0 0 0 0 0 0 0 0 0 0 0
1.1472 0.9353 0.0016 H 0 0 0 0 0 0 0 0 0 0 0 0
1.1472 -0.9353 0.0016 H 0 0 0 0 0 0 0 0 0 0 0 0
1 2 2 0 0 0 0
2 3 1 0 0 0 0
2 4 1 0 0 0 0
Example 2: Retrieve compound as SMILES
$ curl -H "Accept:chemical/x-daylight-smiles" http://apps.ideaconsult.net:8080/ambit2/compound/1
O=C
Example 3: Query compounds
$ curl -H "Accept:chemical-mime" "http://apps.ideaconsult.net:8080/ambit2/query/compound/any-identifier-or-keyword"
$ curl -H "Accept:chemical-mime" "http://apps.ideaconsult.net:8080/ambit2/query/smarts?search=smarts"
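The compound examples rely on HTTP content negotiation: the URI stays the same and only the Accept header selects the representation. A minimal sketch with Python's standard library follows; the function name and the short format keys are hypothetical, while the MIME types are the ones used in this deck. The request is only constructed here, not sent.

```python
from urllib.request import Request

# Short, hypothetical keys mapped to the Chemical MIME types named in the slides.
CHEMICAL_MIME = {
    "smiles": "chemical/x-daylight-smiles",
    "mol": "chemical/x-mdl-molfile",
    "sdf": "chemical/x-mdl-sdfile",
}


def compound_request(compound_uri: str, fmt: str) -> Request:
    """Prepare a GET for one compound URI in the requested representation."""
    return Request(compound_uri, headers={"Accept": CHEMICAL_MIME[fmt]})


uri = "http://apps.ideaconsult.net:8080/ambit2/compound/1"
as_smiles = compound_request(uri, "smiles")
as_mol = compound_request(uri, "mol")
```

Both requests address the same resource; passing either one to urllib.request.urlopen would retrieve the corresponding representation from a running service.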
Resources: Feature
Feature: A Feature is a resource representing any kind of property or identifier assigned to a Compound. The feature types are determined via their links to ontologies (Feature ontologies, Descriptor ontologies, Endpoints ontologies).
/feature/{id}
Documentation: http://opentox.org/dev/apis/api-1.1/feature
Representation
ot:Feature, a subclass of ot:OpenToxResource
Mandatory RDF/XML format
Properties
• Name: defined by dc:title (Dublin Core namespace)
• Units: defined by the ot:units annotation property (OpenTox namespace)
• Creator: defined by the dc:creator annotation property (Dublin Core namespace)
• The origin of the Feature is defined by the ot:hasSource object property (OpenTox namespace) and can be an ot:Algorithm, ot:Model or ot:Dataset
• Relations to other resources which represent the same entity can be established via the owl:sameAs property. This approach can be used, for example, to link the ot:Feature resource to a resource from another ontology (an example follows)
• The subclasses of ot:Feature, namely ot:NumericFeature, ot:StringFeature and ot:NominalFeature, denote whether a feature holds numeric, string or nominal values
Resources: Feature (an example)
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/feature/22114
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns:otee="http://www.opentox.org/echaEndpoints.owl#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns="http://apps.ideaconsult.net:8080/ambit2/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#Algorithm"/>
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#Feature"/>
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#NumericFeature">
    <rdfs:subClassOf rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </owl:Class>
  <ot:NumericFeature rdf:about="feature/22114">
    <ot:hasSource>
      <ot:Algorithm rdf:about="algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor"/>
    </ot:hasSource>
    <owl:sameAs rdf:resource="&otee;Octanol-water_partition_coefficient_Kow"/>
    <dc:title>XLogP</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </ot:NumericFeature>
</rdf:RDF>
The example shows an OpenTox feature with the title "XLogP", identified by the URI http://apps.ideaconsult.net:8080/ambit2/feature/22114
• Linked to an entry of a simplified ontology of toxicological endpoints
The algorithm used to generate values for this feature
• Specified by the ot:algorithm property
• Identified by the URI http://apps.ideaconsult.net:8080/ambit2/algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor
• Note that the URI identifies an OpenTox Algorithm resource
• The algorithm URI itself is dereferenceable
• Can be used to initiate calculations of the XLogP descriptor
Services implementation by partner and
service type
Ideaconsult Ltd41
Part
ner
No
Serv
ice t
ype
Com
pound
Data
set
Featu
re
Alg
ori
thm
(pro
cess
ing)
Alg
ori
thm
(model)
Model
Task
Report
Validati
on
Auth
ern
ticati
on
and A
uth
ori
sati
on
serv
ice
Onto
logy s
erv
ice
2 Y Y Y Y Y Y
3
(IDEA)
Y Y Y Y Y Y Y Y
5 Y Y Y
6 Y Y Y Y
7 Y Y Y
10 Y
All components are implemented as REST web services
There could be multiple implementations of same type of
components
(Subset of) services could be hosted by the same provider or
by multiple providers on separate locations
httpappsideaconsultnet8080ambit2
OpenTox Is A Framework
Framework
Unified Access
Open Source
bull Toxicity data
bull Predictive models
bull Validation support
bull Interpretation aids
bull Toxicologists
bull Modelers
bull API for new algorithmsdevelopment amp integration
bull To optimise impact
bull To allow inspection review
bull To attract external contributors
OpenTox services can be used to develop specific applications or embedded in
workflow systems
bull Two end user oriented demo applications making use of OpenTox
webservices have been developed deployed and are available for
testing ndash httptoxcreateorg and httptoxpredictorg
bull ToxCreate creates models from user supplied datasets
bull ToxPredict uses existing OpenTox models to estimate chemical
compound properties
Demo applications
43Ideaconsult LtdAugust 22
2010
RDF Lessons learned
bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project
bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)
resource representation described by triples
bull Steep learning curve
bull Some hard topicsndash Data model vs format
ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure
ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF
Ideaconsult Ltd44
RDF Performance
RDF representation is
verbose hellip
hellip and in-memory RDF
libraries are slow hellip
Ideaconsult Ltd45
A dataset with 320 chemicals 60 columns
A dataset with 6500 chemicals 12 columns
hellip lack of streaming parserswriters hellip
RDF Wish list
bull Convenient explanation of subject-predicate-
object concept for beginners
bull A (high performant) triple storage should not be
a mandatory requirement to publish RDF data
bull Fast streaming parsers and writers
bull (terse) JSON serialisation
bull Security
bull Synchronisation of distributed RDF content
Ideaconsult Ltd46
Thank you
Ideaconsult Ltd47
Build an application with OpenTox REST Web Services API
httpopentoxorgdevapisapi-11
Download AMBIT Implementation of OpenTox API
and launch your OpenTox service
httpambitsourceforgenet
Use an algorithm to build a model
• An algorithm is applied by submitting an HTTP POST to the algorithm URI and providing the required parameters.
• A common required parameter is dataset_uri=http://host:port/dataset/{datasetid}, which specifies the data set to be operated on.
• In REST-style services, an HTTP POST returns the URI of the result, not the content of the result.
• The algorithm services are designed to store the results in a dataset service and return the URL of the resulting dataset.
• For slow calculations, a Task URI is returned instead of the dataset URI.
$ curl -H "Accept:text/uri-list" -X POST \
  -d "dataset_uri=http://apps.ideaconsult.net:8080/ambit2/dataset/1037" \
  -d "prediction_feature=http://apps.ideaconsult.net:8080/ambit2/feature/26701" \
  -d "dataset_service=http://apps.ideaconsult.net:8080/ambit2/dataset" \
  http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48 -iv
* Connected to opentox.informatik.tu-muenchen.de (131.159.28.16) port 8080 (#0)
> POST /OpenTox-dev/algorithm/J48 HTTP/1.1
> Host: opentox.informatik.tu-muenchen.de:8080
> Accept:text/uri-list
> Content-Type: application/x-www-form-urlencoded
< HTTP/1.1 202 Accepted
< Date: Sat, 31 Jul 2010 14:46:38 GMT
< Location: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5
< Accept-Ranges: bytes
< Server: Noelios-Restlet-Engine/1.1.snapshot
< Content-Type: text/uri-list;charset=ISO-8859-1
< Content-Length: 99
<
* Connection #0 to host opentox.informatik.tu-muenchen.de left intact
* Closing connection #0
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5
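The same request can be assembled programmatically. The following is a minimal Python sketch using only the standard library; the helper name `build_training_request` is hypothetical, the parameter names are taken from the curl call above, and the request is only built, not sent, since the 2010 endpoints may no longer be reachable.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request


def build_training_request(algorithm_uri, dataset_uri, prediction_feature, dataset_service):
    """Build (without sending) the form-encoded POST that starts model training."""
    form = urlencode({
        "dataset_uri": dataset_uri,                # data set to operate on
        "prediction_feature": prediction_feature,  # feature the model should predict
        "dataset_service": dataset_service,        # where results should be stored
    }).encode("ascii")
    return Request(
        algorithm_uri,
        data=form,
        method="POST",
        headers={"Accept": "text/uri-list"},       # ask for a URI, not the content
    )


request = build_training_request(
    algorithm_uri="http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48",
    dataset_uri="http://apps.ideaconsult.net:8080/ambit2/dataset/1037",
    prediction_feature="http://apps.ideaconsult.net:8080/ambit2/feature/26701",
    dataset_service="http://apps.ideaconsult.net:8080/ambit2/dataset",
)
print(request.get_method(), request.full_url)
for name, values in parse_qs(request.data.decode("ascii")).items():
    print(name, "=", values[0])
```

Passing the resulting object to `urllib.request.urlopen` would perform the call; the response body would then be expected to hold a dataset, model or task URI, as described in the bullets above.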
Resources: The model
$ curl -iv -H "Accept:text/uri-list" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/task/acdf6eac-d5a2-402c
* About to connect() to opentox.informatik.tu-muenchen.de port 8080 (#0)
* Trying 131.159.28.16... connected
* Connected to opentox.informatik.tu-muenchen.de (131.159.28.16) port 8080 (#0)
> GET /OpenTox-dev/task/acdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5 HTTP/1.1
> User-Agent: curl/7.18.2 (x86_64-pc-linux-gnu) libcurl/7.18.2 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.8 libssh2/0.18
> Host: opentox.informatik.tu-muenchen.de:8080
> Accept:text/uri-list
>
< HTTP/1.1 200 OK
< Date: Sat, 31 Jul 2010 14:47:22 GMT
< Location: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48
< Vary: Accept-Charset, Accept-Encoding, Accept-Language, Accept
< Accept-Ranges: bytes
< Server: Noelios-Restlet-Engine/1.1.snapshot
< Content-Type: text/uri-list;charset=ISO-8859-1
< Content-Length: 86
<
* Connection #0 to host opentox.informatik.tu-muenchen.de left intact
* Closing connection #0
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48
• When a task URI is returned, the status code is HTTP 202 Accepted instead of HTTP 200 OK.
• This tells the client that processing is not yet complete; the client needs to poll the task URI until an OK code is returned.
• The final result returned by Example 25 is the URI of the new model: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/model/TUMOpenToxModel_j48_48
• To obtain prediction results, POST a dataset to the model URI.
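The polling behaviour described above can be sketched as a small loop. This is an illustrative Python sketch, not OpenTox client code: the `poll_task` helper, the injected `fetch` callable and the example.org URIs are assumptions, chosen so the loop can run without a live service.

```python
import time


def poll_task(fetch, task_uri, interval=1.0, max_polls=60, sleep=time.sleep):
    """Poll a task URI until the service answers 200 OK, then return the result URI.

    `fetch(uri)` is a caller-supplied function returning (status_code, body_text).
    """
    for _ in range(max_polls):
        status, body = fetch(task_uri)
        if status == 200:            # finished: the body carries the result URI
            return body.strip()
        if status != 202:            # anything but "still running" is an error here
            raise RuntimeError(f"unexpected status {status} for {task_uri}")
        sleep(interval)              # 202 Accepted: wait and ask again
    raise TimeoutError(f"task {task_uri} did not finish after {max_polls} polls")


# Simulated service: two "202 Accepted" answers, then "200 OK" with a model URI.
_responses = iter([
    (202, "http://example.org/task/1\n"),
    (202, "http://example.org/task/1\n"),
    (200, "http://example.org/model/1\n"),
])
result = poll_task(lambda uri: next(_responses), "http://example.org/task/1",
                   sleep=lambda seconds: None)
print(result)
```

Injecting `fetch` keeps the loop independent of any particular HTTP library; a real client would wrap its own GET call with `Accept: text/uri-list` in that function.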
Read data from a web address – process – write to a web address
Uniform approach to data processing (e.g. descriptor calculation)
[Diagram: Dataset http://myhost.com/dataset/trainingset1 + Algorithm http://myhost.com/algorithm/descriptorX = Dataset http://myhost.com/dataset/results. The Feature, Compound, Dataset and Algorithm resources each support GET, POST, PUT and DELETE.]
Read data from a web address – process – write to a web address
Uniform approach to model validation and report generation
[Diagram: Dataset + Model (generating predictions) = Validation, followed by a Validation report. The Dataset, Model, Validation and Report resources each support GET, POST, PUT and DELETE. URIs shown: http://myhost.com/dataset/trainingset1, http://myhost.com/dataset/predictedresults1, http://myhost.com/model/predictivemodel1, http://myhost.com/validation, http://myhost.com/report/1.]
Apply predictive models
[Diagram: available endpoints and model URLs (e.g. http://host1/model/{id}) are retrieved from the Ontology service (published models, algorithms, ontologies, metadata) via HTTP POST / SPARQL. The model http://host1/model/{id} is applied to the dataset http://host2/dataset/{id} by an HTTP POST to the Model service, which returns the results dataset URL http://host/dataset/{id}.]
Read data from a web address – process – write to a web address
Uniform approach to model prediction
[Diagram: Dataset http://myhost.com/dataset/id1 + Model http://myhost.com/model/predictivemodel1 = Dataset http://myhost.com/dataset/results1. The Feature, Compound, Dataset and Model resources each support GET, POST, PUT and DELETE.]
Uniform access to data
Publish your data, retrieve linked data
[Diagram: the Dataset service (structures, endpoints and predictions) is accessed via HTTP GET and POST. Upload data and receive a dataset URL (http://host2/dataset/{id}); annotate; find chemical compounds and receive a dataset URL (http://host2/compound/{id}). The Ontology service holds published models, algorithms, ontologies and metadata. The Feature, Compound and Dataset resources each support GET, POST, PUT and DELETE.]
RDF: Resources representation
• The opentox.owl ontology
– A common OWL data model of all OpenTox resources
– Describes OpenTox resources
– Describes relationships between them
– Generates the RDF representations of objects
• RDF/XML representation is mandatory for OpenTox resources
• Uniform approach to data representation
– Calculated and measured properties of chemical compounds are represented in a uniform way
– Linked to the resource used for data generation
– Annotated via ontology entries
– Model representations link to the algorithms and data used
All OpenTox components are defined by the OWL ontology http://opentox.org/api/1.1/opentox.owl
All resources are subclasses of ot:OpenToxResource
Resources: Chemical compound
Compound: Provides different representations for chemical compounds with a unique and defined chemical structure
/compound/{id}
Conformer: /compound/{id}/conformer/{id}
Documentation: http://opentox.org/dev/apis/api-1.1/structure
Representation: A subclass of ot:OpenToxResource
Supports different Chemical MIME formats
RDF representation only for specifying owl:sameAs links to external resources
Example 1: Retrieve compound as MOL
$ curl -H "Accept:chemical/x-mdl-molfile" http://apps.ideaconsult.net:8080/ambit2/compound/1
CH2O
  APtclcactv09040902283D 0   0.00000     0.00000
  4  3  0  0  0  0  0  0  0  0999 V2000
   -0.6004    0.0000    0.0001 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.6072    0.0000   -0.0004 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.1472    0.9353    0.0016 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.1472   -0.9353    0.0016 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  0  0  0  0
  2  4  1  0  0  0  0
Example 2: Retrieve compound as SMILES
$ curl -H "Accept:chemical/x-daylight-smiles" http://apps.ideaconsult.net:8080/ambit2/compound/1
O=C
Example 3: Query compounds
$ curl -H "Accept:chemical-mime" "http://apps.ideaconsult.net:8080/ambit2/query/compound/any-identifier-or-keyword"
$ curl -H "Accept:chemical-mime" "http://apps.ideaconsult.net:8080/ambit2/query/smarts?search=smarts"
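The examples above rely on HTTP content negotiation: the same compound URI yields a different representation depending on the Accept header. A minimal Python sketch of that idea follows; the `MIME_TYPES` table and the `representation_request` helper are hypothetical names, and only MIME types that appear in these slides are listed.

```python
from urllib.request import Request

# MIME types named in the slides; the short keys are illustrative only.
MIME_TYPES = {
    "smiles": "chemical/x-daylight-smiles",
    "mol": "chemical/x-mdl-molfile",
    "sdf": "chemical/x-mdl-sdfile",
    "rdfxml": "application/rdf+xml",
    "n3": "text/n3",
    "uris": "text/uri-list",
}


def representation_request(resource_uri, fmt):
    """Build (without sending) a GET request for one representation of a resource."""
    return Request(resource_uri, headers={"Accept": MIME_TYPES[fmt]})


compound = "http://apps.ideaconsult.net:8080/ambit2/compound/1"
for fmt in ("smiles", "mol"):
    req = representation_request(compound, fmt)
    print(req.get_method(), req.full_url, req.get_header("Accept"))
```

The URI stays constant while only the header changes, which is what lets one identifier stand for the compound across formats.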
Resources: Feature
Feature: A Feature is a resource representing any kind of property or identifier assigned to a Compound. The feature types are determined via their links to ontologies (Feature ontologies, Descriptor ontologies, Endpoints ontologies).
/feature/{id}
Documentation: http://opentox.org/dev/apis/api-1.1/feature
Representation: ot:Feature, a subclass of ot:OpenToxResource
RDF/XML format is mandatory
Properties
• Name: defined by dc:title (Dublin Core namespace)
• Units: defined by the ot:units annotation property (OpenTox namespace)
• Creator: defined by the dc:creator annotation property (Dublin Core namespace)
• The origin of the Feature is defined by the ot:hasSource object property (OpenTox namespace) and can be an ot:Algorithm, ot:Model or ot:Dataset
• Relations to other resources which represent the same entity can be established via the owl:sameAs property. This approach can be used, for example, to link the ot:Feature resource to a resource from another ontology (an example follows)
• The subclasses ot:NumericFeature, ot:StringFeature and ot:NominalFeature denote whether a feature holds numeric, string or nominal values
Resources: Feature (an example)
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/feature/22114
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns:otee="http://www.opentox.org/echaEndpoints.owl#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns="http://apps.ideaconsult.net:8080/ambit2/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#Algorithm"/>
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#Feature"/>
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#NumericFeature">
    <rdfs:subClassOf rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </owl:Class>
  <ot:NumericFeature rdf:about="feature/22114">
    <ot:hasSource>
      <ot:Algorithm rdf:about="algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor"/>
    </ot:hasSource>
    <owl:sameAs rdf:resource="otee:Octanol-water_partition_coefficient_Kow"/>
    <dc:title>XLogP</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </ot:NumericFeature>
</rdf:RDF>
The example shows an OpenTox feature with the title "XLogP", identified by the URI http://apps.ideaconsult.net:8080/ambit2/feature/22114
• Linked to an entry of a simplified ontology of toxicological endpoints
The algorithm used to generate values for this feature:
• Specified by the ot:hasSource property
• Identified by the URI http://apps.ideaconsult.net:8080/ambit2/algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor
• Note that the URI identifies an OpenTox Algorithm resource
• The algorithm URI itself is dereferenceable
• It can be used to initiate calculations of the XLogP descriptor
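To show how a client might read such a representation, here is a minimal Python sketch that extracts the title, source algorithm and owl:sameAs link from a simplified RDF/XML feature. The document string and its example.org URIs are made up for illustration, and `describe_feature` is a hypothetical helper. Because RDF/XML allows the same graph to be serialised in many ways, a plain XML parser like this only works for this one layout; a real client would more likely use an RDF library.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OT = "http://www.opentox.org/api/1.1#"
DC = "http://purl.org/dc/elements/1.1/"
OWL = "http://www.w3.org/2002/07/owl#"

# Simplified, hypothetical feature document modelled on the listing above.
FEATURE_RDF = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <ot:NumericFeature rdf:about="http://example.org/feature/1">
    <ot:hasSource>
      <ot:Algorithm rdf:about="http://example.org/algorithm/XLogP"/>
    </ot:hasSource>
    <owl:sameAs rdf:resource="http://example.org/endpoints#Kow"/>
    <dc:title>XLogP</dc:title>
  </ot:NumericFeature>
</rdf:RDF>"""


def describe_feature(rdf_xml):
    """Return the URI, title, source and sameAs link of the first ot:NumericFeature."""
    root = ET.fromstring(rdf_xml)
    node = root.find(f"{{{OT}}}NumericFeature")
    source = node.find(f"{{{OT}}}hasSource/{{{OT}}}Algorithm")
    same_as = node.find(f"{{{OWL}}}sameAs")
    return {
        "uri": node.get(f"{{{RDF}}}about"),
        "title": node.findtext(f"{{{DC}}}title"),
        "source": source.get(f"{{{RDF}}}about"),
        "same_as": same_as.get(f"{{{RDF}}}resource"),
    }


info = describe_feature(FEATURE_RDF)
for key, value in info.items():
    print(key, "=", value)
```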
Resources: Feature (an example)
$ curl -H "Accept:text/n3" http://apps.ideaconsult.net:8080/ambit2/feature/26184
@prefix ot: <http://www.opentox.org/api/1.1#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix : <http://apps.ideaconsult.net:8080/ambit2/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix af: <http://apps.ideaconsult.net:8080/ambit2/feature/> .
ot:hasSource
    a owl:ObjectProperty .
ot:units
    a owl:DatatypeProperty .
ot:Feature
    a owl:Class .
ot:NumericFeature
    a owl:Class ;
    rdfs:subClassOf ot:Feature .
af:26184
    a ot:Feature , ot:NumericFeature ;
    dc:creator "http://ambit.uni-plovdiv.bg:8080/ambit2" ;
    dc:title "TUM_CDK_XLogP" ;
    ot:hasSource <http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem/XLogPDescriptor> ;
    ot:units "" ;
    = otee:Octanol-water_partition_coefficient_Kow .
An example (N3) of another ot:Feature resource, again the XLogP descriptor
• Generated by a different implementation
• In this case its name is "TUM_CDK_XLogP", and the algorithm resource used to generate it resides at the Technical University of Munich (TUM): http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem/XLogPDescriptor
This algorithm URL can also be used to initiate descriptor calculations
• The representation of Algorithm resources refers to the BlueObelisk ontology entry http://www.blueobelisk.org/ontologies/chemoinformatics-algorithms/#xlogP
Toxicity endpoints ontologies
• Derived from the ECHA classification of endpoints published in REACH guidance documents
• Physicochemical properties and various toxicological endpoints
• The hierarchy does not represent the complexity of toxicological assays, but can be used as a first approximation to assign meaning to the data entries and to generate a REACH report
• More specific descriptions of toxicological assays can be used as well
• Ontologies for specific toxicity assays are developed by OpenTox partners
• The ECHA endpoints ontology: http://www.opentox.org/echaEndpoints.owl
Resources: a feature
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/feature/3
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns="http://apps.ideaconsult.net:8080/ambit2/"
    xmlns:af="http://apps.ideaconsult.net:8080/ambit2/feature/"
    …
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Feature rdf:about="feature/3">
    <dc:creator></dc:creator>
    <ot:hasSource>ECHA…</ot:hasSource>
    <owl:sameAs rdf:resource="http://www.opentox.org/api/1.1#EINECS"/>
    <ot:units></ot:units>
    <dc:title>EC</dc:title>
  </ot:Feature>
</rdf:RDF>
An illustration of an ot:Feature imported from a file rather than calculated. The example shows a feature representing the EINECS number, imported from the ECHA preregistration list.
Feature summary
• An OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs
• These URIs are dereferenceable
• Features allow different levels of meaning to be assigned by linking to ontology entries (e.g. algorithms or toxicological endpoints), as well as by linking to the algorithms which can be used to generate the property values
• The same approach can be used to denote assays (provided that the assay is defined by an ontology), species, functional groups, etc.
Resources: Dataset
Dataset: Provides access to chemical compounds and their features (e.g. structural, physical-chemical, biological, toxicological properties)
Operations
POST – Upload a dataset
PUT – Update the dataset content
DELETE – Remove the dataset
Representation
RDF/XML (mandatory)
• The dataset consists of data entries
• Each entry is associated with exactly one chemical compound, identified by its URI and available via the OpenTox Compound service API
• One and the same compound can be associated with multiple dataset entries
• Every "column" is associated with a Feature; its representation should be available via the OpenTox Feature API
Resources: Dataset
Representation
The dataset services optionally support formats other than RDF:
• text/csv (comma delimited)
• text/x-arff (Weka ARFF)
• application/pdf
• chemical/x-mdl-sdfile
• other Chemical MIME formats
This allows retrieving the same data in a convenient format, but the URL links to compound and feature resources are lost
@prefix ad: <http://apps.ideaconsult.net:8080/ambit2/dataset/> .
@prefix af: <http://apps.ideaconsult.net:8080/ambit2/feature/> .
@prefix ot: <http://www.opentox.org/api/1.1#> .
…
ad:9 a ot:Dataset ;
    ot:dataEntry
        [ a ot:DataEntry ;
          ot:compound <http://apps.ideaconsult.net:8080/ambit2/compound/413/conformer/409421> ;
          ot:values
              [ a ot:FeatureValue ;
                ot:feature af:21576 ;
                ot:value "3.309999942779541"^^xsd:double
              ] ;
          ot:values
              [ a ot:FeatureValue ;
                ot:feature af:21573 ;
                ot:value "3.0"^^xsd:double
              ]
        ] .
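The relationship between the nested RDF structure (dataset, data entry, feature value) and a familiar table can be sketched as below. This is an illustrative Python example under stated assumptions: the dictionaries are a hypothetical in-memory form of one ot:DataEntry with shortened URIs, and `to_table` is not part of any OpenTox library.

```python
import csv
import io

# Hypothetical in-memory form of one ot:DataEntry (URIs shortened for readability).
data_entries = [
    {
        "compound": "compound/413/conformer/409421",
        "values": [
            {"feature": "feature/21576", "value": 3.309999942779541},
            {"feature": "feature/21573", "value": 3.0},
        ],
    },
]


def to_table(entries):
    """Flatten data entries into (header, rows): one row per entry, one column per feature."""
    features = []
    for entry in entries:
        for fv in entry["values"]:
            if fv["feature"] not in features:
                features.append(fv["feature"])
    rows = []
    for entry in entries:
        by_feature = {fv["feature"]: fv["value"] for fv in entry["values"]}
        rows.append([entry["compound"]] + [by_feature.get(f, "") for f in features])
    return ["compound"] + features, rows


header, rows = to_table(data_entries)
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)
print(buffer.getvalue())
```

In this sketch the column headers still carry the feature identifiers; a plain CSV export that replaces them with titles would lose the links, which is the trade-off noted above.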
Dataset metadata and features
Description | URI Template
Retrieve entire dataset content (if uri-list is requested, retrieve only compound URIs) | http://host:port/dataset/{id}
Retrieve representation of the features (columns) of the dataset | http://host:port/dataset/{id}/feature
Retrieve dataset metadata (name etc.) | http://host:port/dataset/{id}/metadata
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/dataset/9/metadata
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    ……
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Dataset rdf:about="dataset/9">
    <dc:source>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:source>
    <dc:publisher>somebody</dc:publisher>
    <rdfs:seeAlso>
      <bx:Entry rdf:about="reference/20117">
        <rdfs:seeAlso>http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html</rdfs:seeAlso>
        <dc:title>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:title>
      </bx:Entry>
    </rdfs:seeAlso>
    <dc:title>ISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATA</dc:title>
  </ot:Dataset>
</rdf:RDF>
Data publishing
[Diagram: the Dataset service (structures, endpoints and predictions) is accessed via HTTP GET and POST. Upload data and receive a dataset URL (http://host2/dataset/{id}); annotate; find chemical compounds and receive a dataset URL (http://host2/compound/{id}). Annotations link to toxicology-related ontologies and algorithms ontologies.]
1) POST a file with chemical structures and properties to the OpenTox dataset service
• The structures and data are assigned a dataset URL and become available in multiple formats (RDF, Chemical MIME, CSV, Weka ARFF)
2) Assign metadata
• PUT /dataset/{id}/metadata
3) Annotate any of the dataset features (/dataset/{id}/feature) by assigning links to relevant ontologies
• PUT /feature/{id}
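The three publishing steps can be summarised as a sequence of HTTP operations. The sketch below is a hedged Python illustration: `publishing_plan` is a hypothetical helper that only lists (method, URI) pairs, and it assumes the dataset URI has already been returned by the initial POST.

```python
def publishing_plan(dataset_service, dataset_uri, feature_uris):
    """List the (method, URI) pairs for the three publishing steps."""
    plan = [("POST", dataset_service)]                           # 1) upload the file
    plan.append(("PUT", dataset_uri.rstrip("/") + "/metadata"))  # 2) assign metadata
    plan.extend(("PUT", uri) for uri in feature_uris)            # 3) annotate features
    return plan


steps = publishing_plan(
    dataset_service="http://host2/dataset",
    dataset_uri="http://host2/dataset/9",          # returned by step 1 in practice
    feature_uris=["http://host2/feature/21573"],
)
for method, uri in steps:
    print(method, uri)
```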
Resources: Algorithm
Algorithm: Provides access to OpenTox algorithms. There are several algorithm services developed by different OpenTox partners. The list of algorithms can be retrieved by an HTTP GET operation at http://host:port/algorithm
$ curl -H "Accept:text/uri-list" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNclassification
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/PLSregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/M5P
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/GaussP
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/FTMsmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/gSpansmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/JOELIB2
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/InfoGainAttributeEval
Resources: Algorithm
Representation
• Multiple types of algorithms
– descriptor calculation algorithms
– machine learning procedures
– data preprocessing
• The representation of algorithms is again defined by the OpenTox ontology, where all algorithms are subclasses of ot:Algorithm
Algorithm types ontology
http://opentox.org/data/documents/development/RDF%20files/AlgorithmTypes
• Provides a classification of algorithm types
• The algorithm type in the RDF representation is set by directly assigning (rdf:type) a class from the algorithm types ontology (ota: http://www.opentox.org/algorithms.owl), e.g. <myalgorithm> rdf:type ota:Classification
Resources: Algorithm
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/algorithm/J48
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ot="http://www.opentox.org/api/1.1#"
    …
    xmlns:ota="http://www.opentox.org/algorithmTypes.owl#">
  <ota:Supervised rdf:about="http://apps.ideaconsult.net:8080/ambit2/algorithm/J48">
    <dc:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Classification Decision tree J48</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#Classification"/>
    <dc:description rdf:datatype="http://www.w3.org/2001/XMLSchema#string"></dc:description>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#SingleTarget"/>
    <dc:publisher rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">Somebody</dc:publisher>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#EagerLearning"/>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Algorithm"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">Sat Jul 31 17:11:26 EEST 2010</dc:date>
  </ota:Supervised>
</rdf:RDF>
Representation
• The algorithm name is defined by dc:title
• Parameters supported by the algorithm are specified via the object property ot:parameters and should be of class ot:Parameter (as defined in opentox.owl). These entries indicate which parameters are required in order to run the algorithm; the values themselves should be provided by the client when initiating the calculations via POST
• Algorithm types are distinguished by means of the Algorithm types ontology
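Because the algorithm type is expressed as rdf:type statements, a client can select algorithms by type with a simple filter over triples. The following Python sketch illustrates this with a hand-written list of (subject, predicate, object) tuples; the J48 entries mirror the listing above, while `algorithm/exampleRegression` and the `algorithms_of_type` helper are hypothetical.

```python
RDF_TYPE = "rdf:type"

# Illustrative triples using prefixed names instead of full URIs.
triples = [
    ("algorithm/J48", RDF_TYPE, "ot:Algorithm"),
    ("algorithm/J48", RDF_TYPE, "ota:Supervised"),
    ("algorithm/J48", RDF_TYPE, "ota:Classification"),
    ("algorithm/J48", RDF_TYPE, "ota:SingleTarget"),
    ("algorithm/J48", RDF_TYPE, "ota:EagerLearning"),
    ("algorithm/J48", "dc:title", "Classification Decision tree J48"),
    ("algorithm/exampleRegression", RDF_TYPE, "ot:Algorithm"),   # hypothetical entry
    ("algorithm/exampleRegression", RDF_TYPE, "ota:Regression"),
]


def algorithms_of_type(triples, wanted_type):
    """Return subjects typed both as ot:Algorithm and as the wanted algorithm type."""
    algorithms = {s for s, p, o in triples if p == RDF_TYPE and o == "ot:Algorithm"}
    typed = {s for s, p, o in triples if p == RDF_TYPE and o == wanted_type}
    return sorted(algorithms & typed)


print(algorithms_of_type(triples, "ota:Classification"))
print(algorithms_of_type(triples, "ota:Regression"))
```

Against a triple store, the same selection would normally be written as a SPARQL query rather than a Python set intersection.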
Resources: Model
• Representations of predictive models
• A Model is created by an HTTP POST to an ot:Algorithm with specific parameters and/or an input ot:Dataset
Representation
• The Model name is defined by the dc:title property
• The Model creator may be defined by the dc:creator property
• The date of Model creation is defined by the dc:date property
• The Algorithm is defined by the ot:algorithm object property
• The independent variables are instances of ot:Feature, defined by the ot:independentVariables property (can be multiple)
• The dependent variables are instances of ot:Feature, defined by the ot:dependentVariables property (can be multiple)
• The variables where prediction results will be stored are instances of ot:Feature, defined by ot:predictedVariables (can be multiple)
• Parameters are defined by ot:parameters
• The training Dataset is an instance of ot:Dataset, defined by ot:trainingDataset
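The properties listed above map naturally onto a small record type. The sketch below is a Python illustration only: the `Model` dataclass and its `triples` method are hypothetical, the property names are the ones given in the list, and the example.org URIs are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    uri: str
    title: str
    algorithm: str
    training_dataset: str
    independent_variables: list = field(default_factory=list)
    dependent_variables: list = field(default_factory=list)
    predicted_variables: list = field(default_factory=list)

    def triples(self):
        """Yield (subject, predicate, object) statements describing this model."""
        yield (self.uri, "rdf:type", "ot:Model")
        yield (self.uri, "dc:title", self.title)
        yield (self.uri, "ot:algorithm", self.algorithm)
        yield (self.uri, "ot:trainingDataset", self.training_dataset)
        for feature in self.independent_variables:
            yield (self.uri, "ot:independentVariables", feature)
        for feature in self.dependent_variables:
            yield (self.uri, "ot:dependentVariables", feature)
        for feature in self.predicted_variables:
            yield (self.uri, "ot:predictedVariables", feature)


model = Model(
    uri="http://example.org/model/1",
    title="Example J48 model",
    algorithm="http://example.org/algorithm/J48",
    training_dataset="http://example.org/dataset/1",
    independent_variables=["http://example.org/feature/1", "http://example.org/feature/2"],
    dependent_variables=["http://example.org/feature/3"],
    predicted_variables=["http://example.org/feature/4"],
)
for triple in model.triples():
    print(triple)
```

Multi-valued properties simply repeat the predicate once per feature, which matches the "can be multiple" remarks in the list.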
Linked resources: Compound, Algorithm, Model, Dataset, Features
[Diagram. Services and resources shown: Algorithm service, Model resource, Dataset resource, Feature service, Compound service, Descriptor resource, Assay resource, Chemical compound. Ontologies shown: Blue Obelisk algorithms ontology; OpenTox algorithm types ontology (regression, classification, quantum chemistry, descriptors, etc.); toxicology-related ontologies.]
Make the model available
Register at the OpenTox ontology service
– RDF triple storage
– Accepts HTTP POST
– SPARQL endpoint
$ curl -X POST -d "uri=http://apps.ideaconsult.net:8080/ambit2/model/57" http://apps.ideaconsult.net:8080/ontology
The model becomes visible to applications
[Diagram: the Model service registers /model/{id} with the Ontology service (published models, algorithms, ontologies, metadata) via HTTP POST.]
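Once a model is registered, applications can discover it through the SPARQL endpoint. The Python sketch below builds such a query and its form encoding. It is an assumption-laden illustration: the query shape, the `models_query` and `sparql_form` helpers, and the use of a `query` form parameter (as in the standard SPARQL protocol) are not confirmed by the slides for this particular service.

```python
from urllib.parse import parse_qs, urlencode

OT = "http://www.opentox.org/api/1.1#"
DC = "http://purl.org/dc/elements/1.1/"


def models_query(limit=10):
    """Return a SPARQL query listing registered models and, if present, their titles."""
    return (
        f"PREFIX ot: <{OT}>\n"
        f"PREFIX dc: <{DC}>\n"
        "SELECT ?model ?title WHERE {\n"
        "  ?model a ot:Model .\n"
        "  OPTIONAL { ?model dc:title ?title }\n"
        f"}} LIMIT {int(limit)}"
    )


def sparql_form(query):
    """Form-encode the query for an HTTP POST (parameter name is an assumption)."""
    return urlencode({"query": query})


query = models_query(limit=5)
print(query)
print(sparql_form(query)[:60] + "...")
```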
Services implementation by partner and service type
Service types: Compound; Dataset; Feature; Algorithm (processing); Algorithm (model); Model; Task; Report; Validation; Authentication and Authorisation service; Ontology service
Partner No. 2: Y Y Y Y Y Y
Partner No. 3 (IDEA): Y Y Y Y Y Y Y Y
Partner No. 5: Y Y Y
Partner No. 6: Y Y Y Y
Partner No. 7: Y Y Y
Partner No. 10: Y
All components are implemented as REST web services
There can be multiple implementations of the same type of component
(A subset of) the services can be hosted by the same provider or by multiple providers at separate locations
http://apps.ideaconsult.net:8080/ambit2
OpenTox Is A Framework
Framework
Unified Access
Open Source
• Toxicity data
• Predictive models
• Validation support
• Interpretation aids
• Toxicologists
• Modelers
• API for new algorithms: development & integration
• To optimise impact
• To allow inspection and review
• To attract external contributors
Demo applications
OpenTox services can be used to develop specific applications or be embedded in workflow systems
• Two end-user-oriented demo applications that make use of OpenTox web services have been developed and deployed, and are available for testing: http://toxcreate.org and http://toxpredict.org
• ToxCreate creates models from user-supplied datasets
• ToxPredict uses existing OpenTox models to estimate chemical compound properties
RDF: Lessons learned
• OpenTox-specific: it did not start as a Linked Data/RDF project
• The mix of REST and RDF is not (yet) popular… but it is natural to be able to retrieve a (partial) resource representation described by triples
• Steep learning curve
• Some hard topics
– Data model vs format
– The subject-predicate-object concept vs tabular/hierarchical/other implicit structure
– The recognition of the added value: XML, JSON, YAML, plain text etc. vs RDF
RDF: Performance
RDF representation is verbose … and in-memory RDF libraries are slow … lack of streaming parsers/writers …
[Charts: a dataset with 320 chemicals, 60 columns; a dataset with 6500 chemicals, 12 columns]
RDF: Wish list
• A convenient explanation of the subject-predicate-object concept for beginners
• A (high-performance) triple store should not be a mandatory requirement for publishing RDF data
• Fast streaming parsers and writers
• (Terse) JSON serialisation
• Security
• Synchronisation of distributed RDF content
Thank you
Build an application with the OpenTox REST Web Services API: http://opentox.org/dev/apis/api-1.1
Download the AMBIT implementation of the OpenTox API and launch your OpenTox service: http://ambit.sourceforge.net
Resources The model
Ideaconsult Ltd16
$ curl -iv -H Accepttexturi-list httpopentoxinformatiktu-
muenchende8080OpenTox-devtaskacdf6eac-d5a2-402c
About to connect() to opentoxinformatiktu-muenchende port 8080 (0)
Trying 1311592816 connected
Connected to opentoxinformatiktu-muenchende (1311592816) port 8080 (0)
gt GET OpenTox-devtaskacdf6eac-d5a2-402c-a4e2-06cd7e3ca1b5 HTTP11
gt User-Agent curl7182 (x86_64-pc-linux-gnu) libcurl7182 OpenSSL098g
zlib1233 libidn18 libssh2018
gt Host opentoxinformatiktu-muenchende8080
gt Accepttexturi-list
gt
lt HTTP11 200 OK
lt Date Sat 31 Jul 2010 144722 GMT
Date Sat 31 Jul 2010 144722 GMT
lt Location httpopentoxinformatiktu-muenchende8080OpenTox-
devmodelTUMOpenToxModel_j48_48
lt Vary Accept-Charset Accept-Encoding Accept-Language Accept
lt Accept-Ranges bytes
lt Server Noelios-Restlet-Engine11snapshot
lt Content-Type texturi-listcharset=ISO-8859-1
lt Content-Length 86
lt
Connection 0 to host opentoxinformatiktu-muenchende left intact
Closing connection 0
httpopentoxinformatiktu-muenchende8080OpenTox-
devmodelTUMOpenToxModel_j48_48
bullWhen task URI is returned the
returned status code is HTTP 202
Accepted instead of HTTP 200
OK
bullThis tells the client the processing
is not completed and the client need
to poll the task URI until OK code
is returned
bullThe final result returned by
Example 25 is the URI of the new
model httpopentoxinformatiktu-
muenchende8080OpenTox-
devmodelTUMOpenToxModel_j4
8_48
bullTo obtain prediction results POST
a dataset to the model URI
Read data from a web address ndash process ndash write to a web address
Uniform approach to data processing (eg
Descriptors calculation)
17Ideaconsult Ltd
Feature
GET
POST
PUT
DELETE
Compound
GET
POST
PUT
DELETE
Dataset
GET
POST
PUT
DELETE
Algorithm
GET
POST
PUT
DELETE
+ =
httpmyhostcomdatasettrainingset1
httpmyhostcomalgorithmdescriptorX
Feature
GET
POST
PUT
DELETE
Compound
GET
POST
PUT
DELETE
Dataset
GET
POST
PUT
DELETE
=
httpmyhostcomdatasetresults
Read data from a web address ndash process ndash write to a web address
Uniform approach to models validation and
report generation
18Ideaconsult Ltd
Dataset
GET
POST
PUT
DELETE
Model
GET
POST
PUT
DELETE
+
=Validation
GET
POST
PUT
DELETE
Report
GET
POST
PUT
DELETEModel generating
predictions
Validation report
httpmyhostcomreport1
httpmyhostcomdatasettrainingset1
httpmyhostcomdatasetpredictedresults1
httpmyhostcommodelpredictivemodel1
httpmyhostcomvalidation
Apply predictive models
Returns the results
dataset URL
httphostdatasetid
Retrieve available
endpoints and model
URLs eg
httphost1modelid
Published models
Algorithms
Ontologies
metadata
Ontology
service
HTTP POST
SPARQL
HTTP POST
modelidApply the model
httphost1modelid
to dataset
httphost2datasetid
Model service
Read data from a web address ndash process ndash write to a web address
Uniform approach to model prediction
20Ideaconsult LtdAugust 22 2010
Feature
GET
POST
PUT
DELETE
Compound
GET
POST
PUT
DELETE
Dataset
GET
POST
PUT
DELETE
+=
httpmyhostcomdatasetid1
Feature
GET
POST
PUT
DELETE
Compound
GET
POST
PUT
DELETE
Dataset
GET
POST
PUT
DELETE
=
httpmyhostcomdatasetresults1
Model
GET
POST
PUT
DELETE
httpmyhostcommodelpredictivemodel1
Uniform access to data
Publish your data, retrieve linked data
[Diagram: Dataset service (structures, endpoints & predictions; HTTP GET, POST): upload data, receive dataset URL http://host2/dataset/{id}; find chemical compounds, return dataset URL http://host2/compound/{id}. Annotation via the Ontology service (published models, algorithms, ontologies, metadata). The Feature, Compound and Dataset resources each support GET, POST, PUT and DELETE.]
RDF - Resources representation
• The opentox.owl ontology
  – A common OWL data model of all OpenTox resources
  – Describes OpenTox resources
  – Describes relationships between them
  – Generates objects' RDF representations
• RDF/XML representation is mandatory for OpenTox resources
• Uniform approach to data representation
  – Calculated and measured properties of chemical compounds are represented in a uniform way
  – Linked to the resource used for data generation
  – Annotated via ontology entries
  – Model representations link to algorithms and data used
All OpenTox components are defined by an OWL ontology: http://opentox.org/api/1.1/opentox.owl
All resources are subclasses of ot:OpenToxResource
Resources: Chemical compound
Compound: Provides different representations for chemical compounds with a unique and defined chemical structure
  /compound/{id}
Conformer: /compound/{id}/conformer/{id}
Documentation: http://opentox.org/dev/apis/api-1.1/structure
Representation: A subclass of ot:OpenToxResource
  Supports different Chemical MIME formats
  RDF representation only for specifying owl:sameAs links to external resources
Example 1: Retrieve compound as MOL
$ curl -H "Accept:chemical/x-mdl-molfile" http://apps.ideaconsult.net:8080/ambit2/compound/1
CH2O
APtclcactv09040902283D 0   0.00000     0.00000

  4  3  0  0  0  0  0  0  0  0999 V2000
   -0.6004    0.0000    0.0001 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.6072    0.0000   -0.0004 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.1472    0.9353    0.0016 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.1472   -0.9353    0.0016 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  0  0  0  0
  2  4  1  0  0  0  0
Example 2: Retrieve compound as SMILES
$ curl -H "Accept:chemical/x-daylight-smiles" http://apps.ideaconsult.net:8080/ambit2/compound/1
O=C
Example 3: Query compounds
$ curl -H "Accept:chemical-mime" "http://apps.ideaconsult.net:8080/ambit2/query/compound/any-identifier-or-keyword"
$ curl -H "Accept:chemical-mime" "http://apps.ideaconsult.net:8080/ambit2/query/smarts?search=smarts"
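The same content negotiation can be sketched in Python: one compound URI, with the representation chosen by the Accept header. The two MIME types are the ones used in the curl examples above; the helper name `compound_request` is hypothetical, and the requests are only constructed, not sent.

```python
from urllib.request import Request

# Chemical MIME types taken from the curl examples on this slide.
CHEMICAL_MIME = {
    "smiles": "chemical/x-daylight-smiles",
    "mol": "chemical/x-mdl-molfile",
}


def compound_request(compound_uri: str, fmt: str) -> Request:
    """Build a GET request for one compound in the requested representation."""
    return Request(compound_uri, headers={"Accept": CHEMICAL_MIME[fmt]})


uri = "http://apps.ideaconsult.net:8080/ambit2/compound/1"
as_smiles = compound_request(uri, "smiles")
as_mol = compound_request(uri, "mol")
print(as_smiles.get_method(), as_smiles.full_url, as_smiles.get_header("Accept"))
print(as_mol.get_method(), as_mol.full_url, as_mol.get_header("Accept"))
```

Both requests address the same resource; only the requested representation differs.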
Resources: Feature
Feature: A Feature is a resource representing any kind of property or identifier assigned to a Compound. The feature types are determined via their links to ontologies (feature ontologies, descriptor ontologies, endpoint ontologies)
  /feature/{id}
Documentation: http://opentox.org/dev/apis/api-1.1/feature
Representation: ot:Feature, a subclass of ot:OpenToxResource
  RDF/XML format is mandatory
Properties
• Name: defined by dc:title (Dublin Core namespace)
• Units: defined by the ot:units annotation property (OpenTox namespace)
• Creator: defined by the dc:creator annotation property (Dublin Core namespace)
• The origin of the Feature is defined by the ot:hasSource object property (OpenTox namespace) and can be an ot:Algorithm, ot:Model or ot:Dataset
• Relations to other resources that represent the same entity can be established via the owl:sameAs property. This approach can be used, for example, to link the ot:Feature resource to a resource from another ontology (an example follows)
• The subclasses of ot:Feature, namely ot:NumericFeature, ot:StringFeature and ot:NominalFeature, denote whether a feature holds numeric, string or nominal values
Resources: Feature (an example)
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/feature/22114
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns:otee="http://www.opentox.org/echaEndpoints.owl#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns="http://apps.ideaconsult.net:8080/ambit2/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#Algorithm"/>
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#Feature"/>
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#NumericFeature">
    <rdfs:subClassOf rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </owl:Class>
  <ot:NumericFeature rdf:about="feature/22114">
    <ot:hasSource>
      <ot:Algorithm rdf:about="algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor"/>
    </ot:hasSource>
    <owl:sameAs rdf:resource="&otee;Octanol-water_partition_coefficient_Kow"/>
    <dc:title>XLogP</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </ot:NumericFeature>
</rdf:RDF>
The example shows an OpenTox feature with title "XLogP", identified by the URI http://apps.ideaconsult.net:8080/ambit2/feature/22114
• Linked (via owl:sameAs) to an entry of a simplified ontology of toxicological endpoints
The algorithm used to generate values for this feature
• Specified by the ot:hasSource property
• Identified by the URI http://apps.ideaconsult.net:8080/ambit2/algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor
• Note that the URI identifies an OpenTox Algorithm resource
• The algorithm URI itself is dereferenceable
• Can be used to initiate calculations of the XLogP descriptor
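A client can read such a representation with any XML or RDF toolkit. The sketch below uses only Python's standard `xml.etree.ElementTree` and a trimmed copy of the feature above; it assumes absolute URIs instead of `xml:base`-relative ones (ElementTree does not resolve `xml:base`) and expands the `otee` reference by hand. A real client would more likely use a proper RDF library.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ot": "http://www.opentox.org/api/1.1#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "owl": "http://www.w3.org/2002/07/owl#",
}
RDF = "{%s}" % NS["rdf"]

# Trimmed, hand-expanded copy of the slide's example (absolute URIs assumed).
SAMPLE = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:ot="http://www.opentox.org/api/1.1#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:owl="http://www.w3.org/2002/07/owl#">
  <ot:NumericFeature rdf:about="http://apps.ideaconsult.net:8080/ambit2/feature/22114">
    <ot:hasSource>
      <ot:Algorithm rdf:about="http://apps.ideaconsult.net:8080/ambit2/algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor"/>
    </ot:hasSource>
    <owl:sameAs rdf:resource="http://www.opentox.org/echaEndpoints.owl#Octanol-water_partition_coefficient_Kow"/>
    <dc:title>XLogP</dc:title>
  </ot:NumericFeature>
</rdf:RDF>"""


def describe_feature(rdf_xml: str) -> dict:
    """Extract title, source algorithm and owl:sameAs link of a numeric feature."""
    root = ET.fromstring(rdf_xml)
    feature = root.find("ot:NumericFeature", NS)
    source = feature.find("ot:hasSource/ot:Algorithm", NS)
    same_as = feature.find("owl:sameAs", NS)
    return {
        "uri": feature.get(RDF + "about"),
        "title": feature.findtext("dc:title", namespaces=NS),
        "source": source.get(RDF + "about"),
        "same_as": same_as.get(RDF + "resource"),
    }


info = describe_feature(SAMPLE)
for key, value in info.items():
    print(key, "=", value)
```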
Resources: Feature (an example)
$ curl -H "Accept:text/n3" http://apps.ideaconsult.net:8080/ambit2/feature/26184
@prefix ot: <http://www.opentox.org/api/1.1#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix : <http://apps.ideaconsult.net:8080/ambit2/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix af: <http://apps.ideaconsult.net:8080/ambit2/feature/> .
ot:hasSource
    a owl:ObjectProperty .
ot:units
    a owl:DatatypeProperty .
ot:Feature
    a owl:Class .
ot:NumericFeature
    a owl:Class ;
    rdfs:subClassOf ot:Feature .
af:26184
    a ot:Feature , ot:NumericFeature ;
    dc:creator "http://ambit.uni-plovdiv.bg:8080/ambit2" ;
    dc:title "TUM_CDK_XLogP" ;
    ot:hasSource <http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem/XLogPDescriptor> ;
    ot:units "" ;
    = otee:Octanol-water_partition_coefficient_Kow .
An example (N3) of another ot:Feature resource: the XLogP descriptor (again)
• Generated by a different implementation
• In this case its name is "TUM_CDK_XLogP", and the algorithm resource used to generate it resides at the Technical University of Munich (TUM): http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem/XLogPDescriptor
  This algorithm URL could also be used to initiate descriptor calculations
• The representation of Algorithm resources refers to the BlueObelisk ontology entry http://www.blueobelisk.org/ontologies/chemoinformatics-algorithms/#xlogP
Toxicity endpoints ontologies
• Derived from the ECHA classification of endpoints published in REACH guidance documents
• Physicochemical properties and various toxicological endpoints
• The hierarchy doesn't represent the complexity of toxicological assays, but can be used as a first approximation to assign meaning to the data entries and to generate a REACH report
• More specific descriptions of toxicological assays can be used as well
• Ontologies for specific toxicity assays are developed by OpenTox partners
• The ECHA endpoints ontology: http://www.opentox.org/echaEndpoints.owl
Resources: a feature
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/feature/3
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns="http://apps.ideaconsult.net:8080/ambit2/"
    xmlns:af="http://apps.ideaconsult.net:8080/ambit2/feature/"
    …
  xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Feature rdf:about="feature/3">
    <dc:creator/>
    <ot:hasSource>ECHA…</ot:hasSource>
    <owl:sameAs rdf:resource="http://www.opentox.org/api/1.1#EINECS"/>
    <ot:units></ot:units>
    <dc:title>EC</dc:title>
  </ot:Feature>
</rdf:RDF>
An illustration of an ot:Feature that was imported from a file rather than calculated
The example shows a feature representing the EINECS number, imported from the ECHA preregistration list
Feature summary
• The OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs
• These URIs are dereferenceable
• Features allow different levels of meaning to be assigned by linking to ontology entries (e.g. algorithms or toxicological endpoints), as well as to the algorithms that can be used to generate property values
• The same approach can be used to denote assays (provided that the assay is defined by an ontology), species, functional groups, etc.
Resources: Dataset
Dataset: Provides access to chemical compounds and their features (e.g. structural, physical-chemical, biological, toxicological properties)
Operations
  POST – Upload a dataset
  PUT – Update the dataset content
  DELETE – Remove the dataset
Representation
  RDF/XML (mandatory)
• The dataset consists of data entries
• Each entry is associated with exactly one chemical compound, identified by its URI and available via the OpenTox Compound service API
• One and the same compound can be associated with multiple dataset entries
• Every "column" is associated with a Feature; its representation should be available via the OpenTox Feature API
Resources: Dataset
Representation
The dataset services optionally support formats other than RDF
• text/csv (comma delimited)
• text/x-arff (Weka ARFF)
• application/pdf
• chemical/x-mdl-sdfile
• other Chemical MIME formats
This allows the same data to be retrieved in a convenient format, but the URL links to compound and feature resources are lost
@prefix ad: <http://apps.ideaconsult.net:8080/ambit2/dataset/> .
@prefix af: <http://apps.ideaconsult.net:8080/ambit2/feature/> .
@prefix ot: <http://www.opentox.org/api/1.1#> .
…
ad:9 a ot:Dataset ;
    ot:dataEntry
        [ a ot:DataEntry ;
          ot:compound <http://apps.ideaconsult.net:8080/ambit2/compound/413/conformer/409421> ;
          ot:values
              [ a ot:FeatureValue ;
                ot:feature af:21576 ;
                ot:value "3.309999942779541"^^xsd:double
              ] ;
          ot:values
              [ a ot:FeatureValue ;
                ot:feature af:21573 ;
                ot:value "3.0"^^xsd:double
              ]
        ] .
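The trade-off between the RDF form and the tabular formats can be illustrated with a small sketch. It assumes a simplified in-memory form of the data entries (one compound URI plus a feature-URI to value mapping, which is not an OpenTox data structure) and flattens them to CSV; the column labels keep only the last path segment, so the full URIs to compound and feature resources are dropped, as noted above.

```python
import csv
import io


def short_id(uri: str) -> str:
    """Keep only the last path segment of a URI (the link itself is dropped)."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


def dataset_to_csv(entries) -> str:
    """Flatten data entries into CSV: one row per entry, one column per feature."""
    features = sorted({f for e in entries for f in e["values"]})
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["compound"] + [short_id(f) for f in features])
    for e in entries:
        row = [short_id(e["compound"])]
        row += [e["values"].get(f, "") for f in features]
        writer.writerow(row)
    return out.getvalue()


BASE = "http://apps.ideaconsult.net:8080/ambit2"
# Simplified stand-in for the ot:DataEntry shown in the N3 example.
entries = [
    {
        "compound": BASE + "/compound/413/conformer/409421",
        "values": {
            BASE + "/feature/21576": 3.309999942779541,
            BASE + "/feature/21573": 3.0,
        },
    }
]
text = dataset_to_csv(entries)
print(text)
```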
Dataset metadata and features
Description: URI template
Retrieve the entire dataset content (if uri-list is requested, retrieve only the compound URIs): http://host:port/dataset/{id}
Retrieve the representation of the features (columns) of the dataset: http://host:port/dataset/{id}/feature
Retrieve the dataset metadata (name etc.): http://host:port/dataset/{id}/metadata
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/dataset/9/metadata
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    ……
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Dataset rdf:about="dataset/9">
    <dc:source>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:source>
    <dc:publisher>somebody</dc:publisher>
    <rdfs:seeAlso>
      <bx:Entry rdf:about="reference/20117">
        <rdfs:seeAlso>http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html</rdfs:seeAlso>
        <dc:title>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:title>
      </bx:Entry>
    </rdfs:seeAlso>
    <dc:title>ISSCAN: Istituto Superiore di Sanita, CHEMICAL CARCINOGENS: STRUCTURES AND EXPERIMENTAL DATA</dc:title>
  </ot:Dataset>
</rdf:RDF>
Data publishing
[Diagram: Dataset service (structures, endpoints & predictions; HTTP GET, POST): upload data, receive dataset URL http://host2/dataset/{id}; find chemical compounds, return dataset URL http://host2/compound/{id}. Annotation links to toxicology related ontologies and algorithm ontologies.]
1) POST a file with chemical structures and properties to the OpenTox dataset service
  • The structures and data are assigned a dataset URL and become available in multiple formats (RDF, Chemical MIME, CSV, Weka ARFF)
2) Assign metadata
  • PUT /dataset/{id}/metadata
3) Annotate any of the dataset features (/dataset/{id}/feature) by assigning links to relevant ontologies
  • PUT /feature/{id}
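The three publishing steps can be sketched as three HTTP requests. This is a minimal illustration under stated assumptions: the service root, the identifiers and the payloads are hypothetical placeholders, the Content-Type values are assumed, and in practice the dataset identifier would come from the response to step 1. The requests are built but not sent.

```python
from urllib.request import Request


def publishing_requests(service, dataset_id, feature_id,
                        sdf_bytes, metadata_rdf, feature_rdf):
    """Return the POST / PUT / PUT sequence described in the three steps."""
    rdf = {"Content-Type": "application/rdf+xml"}  # assumed payload type
    return [
        # 1) upload structures and properties
        Request(f"{service}/dataset", data=sdf_bytes, method="POST",
                headers={"Content-Type": "chemical/x-mdl-sdfile"}),
        # 2) assign dataset metadata
        Request(f"{service}/dataset/{dataset_id}/metadata",
                data=metadata_rdf, method="PUT", headers=rdf),
        # 3) annotate one feature with links to ontologies
        Request(f"{service}/feature/{feature_id}",
                data=feature_rdf, method="PUT", headers=rdf),
    ]


steps = publishing_requests(
    "http://host2",            # hypothetical service root
    "9", "21576",              # hypothetical identifiers
    b"...sdf content...",      # placeholder payloads
    b"<rdf:RDF/>", b"<rdf:RDF/>",
)
for r in steps:
    print(r.get_method(), r.full_url)
```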
Resources: Algorithm
Algorithm: Provides access to OpenTox algorithms. There are several algorithm services developed by different OpenTox partners. The list of algorithms can be retrieved by an HTTP GET operation at http://host:port/algorithm
$ curl -H "Accept:text/uri-list" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNclassification
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/PLSregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/M5P
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/GaussP
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/FTMsmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/gSpansmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/JOELIB2
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/InfoGainAttributeEval
Resources: Algorithm
Representation
• Multiple types of algorithms
  • descriptor calculation algorithms
  • machine learning procedures
  • data preprocessing
• The representation of algorithms is again defined by the OpenTox ontology, where all algorithms are subclasses of ot:Algorithm
Algorithm types ontology: http://opentox.org/data/documents/development/RDF%20files/AlgorithmTypes
• Provides a classification of algorithm types
• The algorithm type in the RDF representation is set by directly typing the algorithm (rdf:type) with a class from the algorithm types ontology (ota: http://www.opentox.org/algorithmTypes.owl#), e.g. <myalgorithm> rdf:type ota:Classification
Resources: Algorithm
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/algorithm/J48
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ot="http://www.opentox.org/api/1.1#"
    …
    xmlns:ota="http://www.opentox.org/algorithmTypes.owl#">
  <ota:Supervised rdf:about="http://apps.ideaconsult.net:8080/ambit2/algorithm/J48">
    <dc:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Classification Decision tree J48</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#Classification"/>
    <dc:description rdf:datatype="http://www.w3.org/2001/XMLSchema#string"></dc:description>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#SingleTarget"/>
    <dc:publisher rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">Somebody</dc:publisher>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#EagerLearning"/>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Algorithm"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">Sat Jul 31 17:11:26 EEST 2010</dc:date>
  </ota:Supervised>
</rdf:RDF>
Representation
• The algorithm name is defined by dc:title
• Parameters supported by the algorithm are specified via the object property ot:parameters and should be of class ot:Parameter (as defined in opentox.owl). These entries serve as information about which parameters are required in order to run the algorithm; the values themselves should be provided by the client when initiating the calculations via POST
• Algorithm types are distinguished by means of the Algorithm types ontology
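How a client might read the algorithm types out of such a representation can be sketched with the standard library alone. The sample is a trimmed copy of the J48 description (only the typing statements are kept); the function name `algorithm_types` is hypothetical, and the `ota` namespace is the one declared in the example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OTA_NS = "http://www.opentox.org/algorithmTypes.owl#"

# Trimmed copy of the J48 representation: typing statements only.
SAMPLE = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:ota="http://www.opentox.org/algorithmTypes.owl#">
  <ota:Supervised rdf:about="http://apps.ideaconsult.net:8080/ambit2/algorithm/J48">
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#Classification"/>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#SingleTarget"/>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#EagerLearning"/>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Algorithm"/>
  </ota:Supervised>
</rdf:RDF>"""


def algorithm_types(rdf_xml: str) -> set:
    """Collect the algorithm-type classes assigned to the first described resource."""
    node = ET.fromstring(rdf_xml)[0]
    types = set()
    # A typed node element (<ota:Supervised ...>) is itself an rdf:type statement.
    if node.tag.startswith("{" + OTA_NS + "}"):
        types.add(node.tag.split("}", 1)[1])
    # Explicit rdf:type statements; keep only those from the types ontology.
    for stmt in node.findall("{%s}type" % RDF_NS):
        resource = stmt.get("{%s}resource" % RDF_NS, "")
        if resource.startswith(OTA_NS):
            types.add(resource[len(OTA_NS):])
    return types


types = algorithm_types(SAMPLE)
print(sorted(types))
```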
Resources: Model
• Representations of predictive models
• A Model is created by an HTTP POST to an ot:Algorithm with specific parameters and/or an input ot:Dataset
Representation
• The Model name is defined by the dc:title property
• The Model creator may be defined by the dc:creator property
• The date of Model creation is defined by the dc:date property
• The Algorithm is defined by the ot:algorithm object property
• The independent variables are instances of ot:Feature, defined by the ot:independentVariables property (can be multiple)
• The dependent variables are instances of ot:Feature, defined by the ot:dependentVariables property (can be multiple)
• The variables where prediction results will be stored are instances of ot:Feature, defined by ot:predictedVariables (can be multiple)
• Parameters are defined by ot:parameters
• The training Dataset is an instance of ot:Dataset, defined by ot:trainingDataset
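The property list above maps naturally onto a small record type. The sketch below is an illustration only: the class and its field names are hypothetical, while the predicate names are the ones listed on the slide, and all example URIs and the title are made up. It yields subject-predicate-object triples using prefixed names.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Model:
    uri: str
    title: str                                   # dc:title
    algorithm: str                               # ot:algorithm
    training_dataset: Optional[str] = None       # ot:trainingDataset
    independent_variables: List[str] = field(default_factory=list)
    dependent_variables: List[str] = field(default_factory=list)
    predicted_variables: List[str] = field(default_factory=list)

    def triples(self) -> Iterator[Triple]:
        """Yield (subject, predicate, object) statements describing the model."""
        yield (self.uri, "rdf:type", "ot:Model")
        yield (self.uri, "dc:title", self.title)
        yield (self.uri, "ot:algorithm", self.algorithm)
        if self.training_dataset:
            yield (self.uri, "ot:trainingDataset", self.training_dataset)
        for predicate, uris in (
            ("ot:independentVariables", self.independent_variables),
            ("ot:dependentVariables", self.dependent_variables),
            ("ot:predictedVariables", self.predicted_variables),
        ):
            for feature_uri in uris:
                yield (self.uri, predicate, feature_uri)


# All URIs and the title below are hypothetical.
model = Model(
    uri="http://myhost.com/model/predictivemodel1",
    title="Example model",
    algorithm="http://myhost.com/algorithm/J48",
    training_dataset="http://myhost.com/dataset/trainingset1",
    independent_variables=["http://myhost.com/feature/1",
                           "http://myhost.com/feature/2"],
    dependent_variables=["http://myhost.com/feature/3"],
    predicted_variables=["http://myhost.com/feature/4"],
)
triples = list(model.triples())
for t in triples:
    print(t)
```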
Linked resources: Compound, Algorithm, Model, Dataset, Features
[Diagram: Model resource, Dataset resource, Descriptor resource, Assay resource and Chemical compound, linked to each other and served by the Algorithm service, Feature service and Compound service.]
Linked resources: Compound, Algorithm, Model, Dataset, Features
[Diagram: Dataset resource, Descriptor resource, Assay resource and Chemical compound, linked to ontology entries. Labels: Regression, Classification, Quantum Chemistry, Descriptors etc.; Blue Obelisk algorithms ontology; OpenTox algorithm types ontology; Toxicology related ontologies.]
Make the model available
Register at the OpenTox ontology service
  – RDF triple storage
  – Accepts HTTP POST
  – SPARQL endpoint
$ curl -X POST -d "uri=http://apps.ideaconsult.net:8080/ambit2/model/57" http://apps.ideaconsult.net:8080/ontology
The model becomes visible for applications
[Diagram: Model service (/model/{id}) registers via HTTP POST at the Ontology service (published models, algorithms, ontologies, metadata).]
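Once a model is registered, an application could look it up through the SPARQL endpoint. The sketch below is an assumption-laden illustration: it supposes that the ontology service accepts a standard SPARQL Protocol GET request with a `query` parameter at the same address, and that models are typed as `ot:Model` with a `dc:title`; neither detail is confirmed by the slides. Only the URL is built; nothing is sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Query listing registered models and, where present, their titles.
QUERY = """PREFIX ot: <http://www.opentox.org/api/1.1#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?model ?title WHERE {
  ?model a ot:Model .
  OPTIONAL { ?model dc:title ?title }
}"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as a GET URL ('query' is the SPARQL Protocol parameter)."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url("http://apps.ideaconsult.net:8080/ontology", QUERY)
print(url)
```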
Services implementation by partner and service type
[Table; the column alignment of the Y marks was lost in extraction]
Service types (columns): Compound; Dataset; Feature; Algorithm (processing); Algorithm (model); Model; Task; Report; Validation; Authentication and Authorisation service; Ontology service
Partner No – services implemented (Y)
2 – Y Y Y Y Y Y
3 (IDEA) – Y Y Y Y Y Y Y Y
5 – Y Y Y
6 – Y Y Y Y
7 – Y Y Y
10 – Y
All components are implemented as REST web services
There can be multiple implementations of the same type of component
(A subset of) services can be hosted by the same provider or by multiple providers at separate locations
http://apps.ideaconsult.net:8080/ambit2/
OpenTox Is A Framework
Unified Access
• Toxicity data
• Predictive models
• Validation support
• Interpretation aids
Framework
• Toxicologists
• Modelers
• API for new algorithm development & integration
Open Source
• To optimise impact
• To allow inspection and review
• To attract external contributors
Demo applications
OpenTox services can be used to develop specific applications or be embedded in workflow systems
• Two end-user oriented demo applications making use of OpenTox web services have been developed and deployed, and are available for testing – http://toxcreate.org and http://toxpredict.org
• ToxCreate creates models from user-supplied datasets
• ToxPredict uses existing OpenTox models to estimate chemical compound properties
RDF: Lessons learned
• OpenTox specific – it didn't start as a Linked Data/RDF project
• The REST and RDF mix is not (yet) popular… but it is natural to be able to retrieve a (partial) resource representation described by triples
• Steep learning curve
• Some hard topics
  – Data model vs. format
  – The subject-predicate-object concept vs. tabular/hierarchical/other implicit structure
  – The recognition of the added value
    • XML, JSON, YAML, plain text etc. vs. RDF
RDF: Performance
RDF representation is verbose …
… and in-memory RDF libraries are slow …
… lack of streaming parsers/writers …
[Charts: a dataset with 320 chemicals, 60 columns; a dataset with 6500 chemicals, 12 columns]
RDF: Wish list
• A convenient explanation of the subject-predicate-object concept for beginners
• A (high-performance) triple storage should not be a mandatory requirement for publishing RDF data
• Fast streaming parsers and writers
• (Terse) JSON serialisation
• Security
• Synchronisation of distributed RDF content
Thank you
Build an application with the OpenTox REST Web Services API: http://opentox.org/dev/apis/api-1.1
Download the AMBIT implementation of the OpenTox API and launch your own OpenTox service: http://ambit.sourceforge.net
$ curl -H Acceptapplicationrdf+xml
httpappsideaconsultnet8080ambit2algorithmJ48
ltrdfRDF
xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns
xmlnsot=httpwwwopentoxorgapi11
hellip
xmlnsota=httpwwwopentoxorgalgorithmTypesowl
ltotaSupervised
rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt
ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring
gtClassification Decision tree J48ltdctitlegt
ltrdftype
rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt
ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring
gtltdcdescriptiongt
ltrdftype
rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt
ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI
gtSomebodyltdcpublishergt
ltrdftype
rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt
ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt
ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime
gtSat Jul 31 171126 EEST 2010ltdcdategt
ltotaSupervisedgt
ltrdfRDFgt
RepresentationbullAlgorithm name is defined by dctitle
Parameters supported by the algorithm are
specified via object property otparameters
and should be of class otParameter (as
defined in opentoxowl)
These entries serve as a information what
parameters are required in order to run the
algorithm the values itself should be
provided by the client when initiating the
calculations via POST
bullAlgorithm types are distinguished by
means of Algorithm types ontology
Resources Model
bull Representations of predictive models
bull A Model is created by HTTP POST to an
otAlgorithm with specific parameters
andor input otDataset
Ideaconsult Ltd37
Representation
bull Model Name is defined by dctitle property
bull Model creator might be defined by dccreator
property
bull The date of Model creation is defined by dcdate
property
bull The Algorithm defined by otalgorithm object
property
bull The independent variables are instances of
otFeature defined by otindependentVariables
property (can be multiple)
bull The dependent variables are are instances of
otFeature and are defined by
otdependentVariables property (can be multiple)
bull The variables where prediction results will be
stored are are instances of otFeature and are
defined by otpredictedVariables (can be multiple)
bull Parameters are defined by otparameters
bull The training Dataset is an instance of otDataset and
defined by ottrainingDataset
Algorithm
service
Linked resources Compound Algorithm Model Dataset Features
Ideaconsult Ltd38
Model
resourceDataset
Resource
Descriptor
resource
Assay
resource
Chemical
compound
Feature
service
Feature
service
Compoun
d service
Linked resources Compound Algorithm Model Dataset Features
Ideaconsult Ltd39
Dataset
Resource
Descriptor
resource
Assay
resource
Chemical
compound
Regression
Classification
Quantum
Chemistry
Descriptorsetc
Blue Obelisk
algorithms
ontology
OpenTox
algorithm types
ontology
Toxiciology related
ontologies
Make the model available
40Ideaconsult Ltd
Register at OpenTox ontology servicendash RDF tripple storage
ndash Accepts HTTP POST
ndash SPARQL endpoint
Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology
Becomes visible for applications
modelidPublished models
Algorithms
Ontologies
metadata
Ontology service
HTTP POST
Model service
Services implementation by partner and
service type
Ideaconsult Ltd41
Part
ner
No
Serv
ice t
ype
Com
pound
Data
set
Featu
re
Alg
ori
thm
(pro
cess
ing)
Alg
ori
thm
(model)
Model
Task
Report
Validati
on
Auth
ern
ticati
on
and A
uth
ori
sati
on
serv
ice
Onto
logy s
erv
ice
2 Y Y Y Y Y Y
3
(IDEA)
Y Y Y Y Y Y Y Y
5 Y Y Y
6 Y Y Y Y
7 Y Y Y
10 Y
All components are implemented as REST web services
There could be multiple implementations of same type of
components
(Subset of) services could be hosted by the same provider or
by multiple providers on separate locations
httpappsideaconsultnet8080ambit2
OpenTox Is A Framework
Framework
Unified Access
Open Source
bull Toxicity data
bull Predictive models
bull Validation support
bull Interpretation aids
bull Toxicologists
bull Modelers
bull API for new algorithmsdevelopment amp integration
bull To optimise impact
bull To allow inspection review
bull To attract external contributors
OpenTox services can be used to develop specific applications or embedded in
workflow systems
bull Two end user oriented demo applications making use of OpenTox
webservices have been developed deployed and are available for
testing ndash httptoxcreateorg and httptoxpredictorg
bull ToxCreate creates models from user supplied datasets
bull ToxPredict uses existing OpenTox models to estimate chemical
compound properties
Demo applications
43Ideaconsult LtdAugust 22
2010
RDF Lessons learned
bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project
bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)
resource representation described by triples
bull Steep learning curve
bull Some hard topicsndash Data model vs format
ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure
ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF
Ideaconsult Ltd44
RDF Performance
RDF representation is
verbose hellip
hellip and in-memory RDF
libraries are slow hellip
Ideaconsult Ltd45
A dataset with 320 chemicals 60 columns
A dataset with 6500 chemicals 12 columns
hellip lack of streaming parserswriters hellip
RDF Wish list
bull Convenient explanation of subject-predicate-
object concept for beginners
bull A (high performant) triple storage should not be
a mandatory requirement to publish RDF data
bull Fast streaming parsers and writers
bull (terse) JSON serialisation
bull Security
bull Synchronisation of distributed RDF content
Ideaconsult Ltd46
Thank you
Ideaconsult Ltd47
Build an application with OpenTox REST Web Services API
httpopentoxorgdevapisapi-11
Download AMBIT Implementation of OpenTox API
and launch your OpenTox service
httpambitsourceforgenet
Read data from a web address - process - write to a web address
Uniform approach to model validation and report generation
[Diagram: Dataset + Model = Validation, followed by Report; each of the four resources supports GET, POST, PUT and DELETE. Labels: "Model generating predictions", "Validation report"]
Example URIs shown in the diagram:
http://myhost.com/report/1
http://myhost.com/dataset/trainingset1
http://myhost.com/dataset/predictedresults1
http://myhost.com/model/predictivemodel1
http://myhost.com/validation
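Apply predictive models
• Retrieve the available endpoints and model URLs from the ontology service (SPARQL, HTTP POST), e.g. http://host1/model/{id}
• Apply the model http://host1/model/{id} to the dataset http://host2/dataset/{id} (HTTP POST to the model service)
• The model service returns the results dataset URL: http://host/dataset/{id}
[Diagram: the ontology service holds published models, algorithms, ontologies and metadata; the model service exposes /model/{id}]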
Read data from a web address - process - write to a web address
Uniform approach to model prediction
[Diagram: Dataset (Features and Compounds) at http://myhost.com/dataset/id1 + Model at http://myhost.com/model/predictivemodel1 = Dataset (Features and Compounds) at http://myhost.com/dataset/results1; every resource supports GET, POST, PUT and DELETE]
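A minimal Python sketch of the "read from a web address, process, write to a web address" pattern: it builds, but does not send, the POST request that applies a model to a dataset. The form field name dataset_uri and the helper build_prediction_request are assumptions made for illustration and are not taken from the slides; the URIs are the example addresses from the diagram.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_prediction_request(model_uri: str, dataset_uri: str) -> Request:
    """Build a POST asking the model at model_uri to predict dataset_uri.

    Assumption: the model service accepts a form field named "dataset_uri".
    """
    body = urlencode({"dataset_uri": dataset_uri}).encode("ascii")
    return Request(
        model_uri,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            # Ask for the URI of the results dataset, not its content.
            "Accept": "text/uri-list",
        },
    )


request = build_prediction_request(
    "http://myhost.com/model/predictivemodel1",
    "http://myhost.com/dataset/id1",
)
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

Passing the request to urllib.request.urlopen would send it; under the assumptions above, the response body would carry the URI of the results dataset.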
Uniform access to data
Publish your data, retrieve linked data
• Upload data, receive a dataset URL: http://host2/dataset/{id}
• Find chemical compounds, return a dataset URL: http://host2/compound/{id}
• Annotation
[Diagram: the dataset service (HTTP GET, POST) serves structures, endpoints and predictions through Feature, Compound and Dataset resources, each supporting GET, POST, PUT and DELETE; the ontology service holds published models, algorithms, ontologies and metadata]
RDF - Resources representation
• The opentox.owl ontology
  - A common OWL data model of all OpenTox resources
  - Describes OpenTox resources
  - Describes the relationships between them
  - Generates the RDF representations of objects
• RDF/XML representation is mandatory for OpenTox resources
• Uniform approach to data representation
  - Calculated and measured properties of chemical compounds are represented in a uniform way
  - Linked to the resource used for data generation
  - Annotated via ontology entries
  - Model representations link to the algorithms and data used
All OpenTox components are defined by an OWL ontology: http://opentox.org/api/1.1/opentox.owl
All resources are subclasses of ot:OpenToxResource
Resources: Chemical compound
Compound: /compound/{id}
Provides different representations for chemical compounds with a unique and defined chemical structure.
Conformer: /compound/{id}/conformer/{id}
Documentation: http://opentox.org/dev/apis/api-1.1/structure
Representation
• A subclass of ot:OpenToxResource
• Supports different Chemical MIME formats
• RDF representation is used only for specifying owl:sameAs links to external resources
Example 1: Retrieve compound as MOL
$ curl -H "Accept:chemical/x-mdl-molfile" http://apps.ideaconsult.net:8080/ambit2/compound/1
CH2O
APtclcactv09040902283D 0   0.00000     0.00000
  4  3  0  0  0  0  0  0  0  0999 V2000
   -0.6004    0.0000    0.0001 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.6072    0.0000   -0.0004 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.1472    0.9353    0.0016 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.1472   -0.9353    0.0016 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  0  0  0  0
  2  4  1  0  0  0  0
Example 2: Retrieve compound as SMILES
$ curl -H "Accept:chemical/x-daylight-smiles" http://apps.ideaconsult.net:8080/ambit2/compound/1
O=C
Example 3: Query compounds
$ curl -H "Accept:chemical-mime" "http://apps.ideaconsult.net:8080/ambit2/query/compound/any-identifier-or-keyword"
$ curl -H "Accept:chemical-mime" "http://apps.ideaconsult.net:8080/ambit2/query/smarts?search=smarts"
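The compound examples rely on HTTP content negotiation: the URI stays the same and the Accept header selects the representation. A minimal Python sketch of that idea follows; the helper build_compound_request and the short format keys are illustrative assumptions, while the two MIME types are the ones used in the curl commands above. The requests are built but not sent.

```python
from urllib.request import Request

# Chemical MIME types taken from the curl examples.
CHEMICAL_MIME = {
    "smiles": "chemical/x-daylight-smiles",
    "mol": "chemical/x-mdl-molfile",
}


def build_compound_request(compound_uri: str, fmt: str) -> Request:
    """Build a GET for one compound; the Accept header picks the format."""
    return Request(
        compound_uri,
        headers={"Accept": CHEMICAL_MIME[fmt]},
        method="GET",
    )


compound = "http://apps.ideaconsult.net:8080/ambit2/compound/1"
for fmt in ("smiles", "mol"):
    req = build_compound_request(compound, fmt)
    print(req.get_method(), req.full_url, "Accept:", req.get_header("Accept"))
```

The same request object can be handed to urllib.request.urlopen to fetch the representation, provided the service is still reachable.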
Resources: Feature
Feature: /feature/{id}
A Feature is a resource representing any kind of property or identifier assigned to a Compound.
The feature types are determined via their links to ontologies (feature ontologies, descriptor ontologies, endpoint ontologies).
Documentation: http://opentox.org/dev/apis/api-1.1/feature
Representation
• ot:Feature, a subclass of ot:OpenToxResource
• Mandatory RDF/XML format
Properties
• Name: defined by dc:title (Dublin Core namespace)
• Units: defined by the ot:units annotation property (OpenTox namespace)
• Creator: defined by the dc:creator annotation property (Dublin Core namespace)
• The origin of the Feature is defined by the ot:hasSource object property (OpenTox namespace) and can be an ot:Algorithm, ot:Model or ot:Dataset
• Relations to other resources that represent the same entity can be established via the owl:sameAs property. This approach can be used, for example, to link the ot:Feature resource to a resource from another ontology (an example follows)
• The subclasses ot:NumericFeature, ot:StringFeature and ot:NominalFeature denote whether a feature holds numeric, string or nominal values
Resources: Feature (an example)
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/feature/22114
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns:otee="http://www.opentox.org/echaEndpoints.owl#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns="http://apps.ideaconsult.net:8080/ambit2/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#Algorithm"/>
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#Feature"/>
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#NumericFeature">
    <rdfs:subClassOf rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </owl:Class>
  <ot:NumericFeature rdf:about="feature/22114">
    <ot:hasSource>
      <ot:Algorithm rdf:about="algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor"/>
    </ot:hasSource>
    <owl:sameAs rdf:resource="http://www.opentox.org/echaEndpoints.owl#Octanol-water_partition_coefficient_Kow"/>
    <dc:title>XLogP</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </ot:NumericFeature>
</rdf:RDF>
The example shows an OpenTox feature with the title "XLogP", identified by the URI http://apps.ideaconsult.net:8080/ambit2/feature/22114
• Linked (owl:sameAs) to an entry of a simplified ontology of toxicological endpoints
The algorithm used to generate values for this feature
• Specified by the ot:hasSource property
• Identified by the URI http://apps.ideaconsult.net:8080/ambit2/algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor
• Note that the URI identifies an OpenTox Algorithm resource
• The algorithm URI itself is dereferenceable
• It can be used to initiate calculations of the XLogP descriptor
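To make the reading of the feature representation concrete, here is a small Python sketch that pulls the title, the source algorithm and the owl:sameAs link out of an abridged copy of the RDF/XML above. The helper describe_feature and the SAMPLE string are illustrative assumptions; the sketch matches this one serialisation with the standard library's ElementTree and is not a general RDF parser.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ot": "http://www.opentox.org/api/1.1#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "owl": "http://www.w3.org/2002/07/owl#",
}
RDF = "{%s}" % NS["rdf"]
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# Abridged from the feature/22114 example; URIs restored by assumption.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:NumericFeature rdf:about="feature/22114">
    <ot:hasSource>
      <ot:Algorithm rdf:about="algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor"/>
    </ot:hasSource>
    <owl:sameAs rdf:resource="http://www.opentox.org/echaEndpoints.owl#Octanol-water_partition_coefficient_Kow"/>
    <dc:title>XLogP</dc:title>
  </ot:NumericFeature>
</rdf:RDF>"""


def describe_feature(rdf_xml: str) -> dict:
    """Extract a few properties of the first ot:NumericFeature."""
    root = ET.fromstring(rdf_xml)
    base = root.get(XML_BASE, "")
    node = root.find("ot:NumericFeature", NS)
    source = node.find("ot:hasSource/ot:Algorithm", NS)
    same_as = node.find("owl:sameAs", NS)
    return {
        # Relative rdf:about values are resolved against xml:base.
        "uri": urljoin(base, node.get(RDF + "about")),
        "title": node.findtext("dc:title", namespaces=NS),
        "algorithm": urljoin(base, source.get(RDF + "about")),
        "same_as": same_as.get(RDF + "resource"),
    }


info = describe_feature(SAMPLE)
print(info["title"], info["uri"])
print(info["algorithm"])
```

Because RDF/XML allows many equivalent spellings of the same triples, a production client would normally use a proper RDF library; the sketch only shows which pieces of information the representation carries.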
Resources: Feature (an example)
$ curl -H "Accept:text/n3" http://apps.ideaconsult.net:8080/ambit2/feature/26184
@prefix ot:   <http://www.opentox.org/api/1.1#> .
@prefix dc:   <http://purl.org/dc/elements/1.1/> .
@prefix :     <http://apps.ideaconsult.net:8080/ambit2/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix af:   <http://apps.ideaconsult.net:8080/ambit2/feature/> .
ot:hasSource
    a owl:ObjectProperty .
ot:units
    a owl:DatatypeProperty .
ot:Feature
    a owl:Class .
ot:NumericFeature
    a owl:Class ;
    rdfs:subClassOf ot:Feature .
af:26184
    a ot:Feature , ot:NumericFeature ;
    dc:creator "http://ambit.uni-plovdiv.bg:8080/ambit2" ;
    dc:title "TUM_CDK_XLogP" ;
    ot:hasSource <http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem/XLogPDescriptor> ;
    ot:units "" ;
    = otee:Octanol-water_partition_coefficient_Kow .
An example (N3) of another ot:Feature resource: the XLogP descriptor (again)
• Generated by a different implementation
• In this case its name is "TUM_CDK_XLogP", and the algorithm resource used to generate it resides at the Technical University of Munich (TUM): http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem/XLogPDescriptor
• This algorithm URL could also be used to initiate descriptor calculations
• The representation of Algorithm resources refers to the Blue Obelisk ontology entry http://www.blueobelisk.org/ontologies/chemoinformatics-algorithms/#xlogP
Toxicity endpoints ontologies
• Derived from the ECHA classification of endpoints published in the REACH guidance documents
• Physicochemical properties and various toxicological endpoints
• The hierarchy doesn't represent the complexity of toxicological assays, but can be used as a first approximation to assign meaning to the data entries and to generate a REACH report
• More specific descriptions of toxicological assays can be used as well
• Ontologies for specific toxicity assays are developed by OpenTox partners
• The ECHA endpoints ontology: http://www.opentox.org/echaEndpoints.owl
Resources: a feature
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/feature/3
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns="http://apps.ideaconsult.net:8080/ambit2/"
    xmlns:af="http://apps.ideaconsult.net:8080/ambit2/feature/"
    …
  xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Feature rdf:about="feature/3">
    <dc:creator></dc:creator>
    <ot:hasSource>ECHA…</ot:hasSource>
    <owl:sameAs rdf:resource="http://www.opentox.org/api/1.1#EINECS"/>
    <ot:units></ot:units>
    <dc:title>EC</dc:title>
  </ot:Feature>
</rdf:RDF>
An illustration of an ot:Feature imported from a file rather than calculated. The example shows a feature representing the EINECS number, imported from the ECHA preregistration list.
Feature summary
• An OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs
• These URIs are dereferenceable
• Features allow different levels of meaning to be assigned by linking to ontology entries (e.g. algorithms or toxicological endpoints), as well as by linking to the algorithms that can be used to generate the property values
• The same approach can be used to denote assays (provided that the assay is defined by an ontology), species, functional groups, etc.
Resources: Dataset
Dataset
Provides access to chemical compounds and their features (e.g. structural, physical-chemical, biological, toxicological properties).
Operations
• POST: Upload a dataset
• PUT: Update the dataset content
• DELETE: Remove the dataset
Representation
• RDF/XML (mandatory)
• The dataset consists of data entries
• Each entry is associated with exactly one chemical compound, identified by its URI and available via the OpenTox Compound service API
• One and the same compound can be associated with multiple dataset entries
• Every "column" is associated with a Feature; its representation should be available via the OpenTox Feature API
Resources: Dataset
Representation
The dataset services optionally support formats other than RDF:
• text/csv (comma delimited)
• text/x-arff (Weka ARFF)
• application/pdf
• chemical/x-mdl-sdfile
• other Chemical MIME formats
This allows the same data to be retrieved in a convenient format, but the URL links to the compound and feature resources are lost.
@prefix ad: <http://apps.ideaconsult.net:8080/ambit2/dataset/> .
@prefix af: <http://apps.ideaconsult.net:8080/ambit2/feature/> .
@prefix ot: <http://www.opentox.org/api/1.1#> .
…
ad:9 a ot:Dataset ;
  ot:dataEntry
    [ a ot:DataEntry ;
      ot:compound <http://apps.ideaconsult.net:8080/ambit2/compound/413/conformer/409421> ;
      ot:values
        [ a ot:FeatureValue ;
          ot:feature af:21576 ;
          ot:value "3.309999942779541"^^xsd:double
        ] ;
      ot:values
        [ a ot:FeatureValue ;
          ot:feature af:21573 ;
          ot:value "3.0"^^xsd:double
        ]
    ] .
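The data entry structure above (one compound, several feature/value pairs) maps naturally onto a table with one row per entry and one column per feature. The Python sketch below illustrates that mapping; the DataEntry class and the to_table helper are hypothetical names, and the two numeric values assume that the decimal points in the example read 3.309999942779541 and 3.0.

```python
from dataclasses import dataclass, field


@dataclass
class DataEntry:
    """One ot:DataEntry: a compound URI plus feature-URI -> value pairs."""
    compound: str
    values: dict = field(default_factory=dict)


def to_table(entries):
    """Pivot data entries into (header, rows); missing values become None."""
    features = sorted({f for e in entries for f in e.values})
    header = ["compound"] + features
    rows = [[e.compound] + [e.values.get(f) for f in features] for e in entries]
    return header, rows


FEATURE = "http://apps.ideaconsult.net:8080/ambit2/feature/"
entry = DataEntry(
    compound="http://apps.ideaconsult.net:8080/ambit2/compound/413/conformer/409421",
    values={FEATURE + "21576": 3.309999942779541, FEATURE + "21573": 3.0},
)
header, rows = to_table([entry])
print(header)
print(rows[0])
```

Keeping the feature URIs as column headers preserves the links that a plain CSV or ARFF export would drop.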
Dataset metadata and features
Description | URI template
Retrieve the entire dataset content; if uri-list is requested, retrieve only the compound URIs | http://host:port/dataset/{id}
Retrieve the representation of the features (columns) of the dataset | http://host:port/dataset/{id}/feature
Retrieve the dataset metadata (name etc.) | http://host:port/dataset/{id}/metadata
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/dataset/9/metadata
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    ……
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Dataset rdf:about="dataset/9">
    <dc:source>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:source>
    <dc:publisher>somebody</dc:publisher>
    <rdfs:seeAlso>
      <bx:Entry rdf:about="reference/20117">
        <rdfs:seeAlso>http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html</rdfs:seeAlso>
        <dc:title>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:title>
      </bx:Entry>
    </rdfs:seeAlso>
    <dc:title>ISSCAN: Istituto Superiore di Sanita, CHEMICAL CARCINOGENS: STRUCTURES AND EXPERIMENTAL DATA</dc:title>
  </ot:Dataset>
</rdf:RDF>
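The three dataset URI templates can be expanded mechanically. A small Python sketch under stated assumptions: the helper name dataset_uris is hypothetical, and the path layout (dataset/{id}, with /feature and /metadata appended) is the reading of the templates used on this slide.

```python
def dataset_uris(base: str, dataset_id) -> dict:
    """Expand the dataset URI templates for one dataset id."""
    root = f"{base.rstrip('/')}/dataset/{dataset_id}"
    return {
        "content": root,                 # entire dataset content
        "features": root + "/feature",   # features (columns) of the dataset
        "metadata": root + "/metadata",  # name, source and similar metadata
    }


uris = dataset_uris("http://apps.ideaconsult.net:8080/ambit2/", 9)
for name, uri in uris.items():
    print(name, uri)
```

The "metadata" entry reproduces the address used in the curl command for dataset 9.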
Data publishing
[Diagram: the dataset service (HTTP GET, POST) serves structures, endpoints and predictions. Upload data, receive a dataset URL: http://host2/dataset/{id}. Find chemical compounds, return a dataset URL: http://host2/compound/{id}. Annotation links to toxicology-related ontologies and algorithm ontologies]
1) POST a file with chemical structures and properties to the OpenTox dataset service
   • The structures and data are assigned a dataset URL and become available in multiple formats (RDF, Chemical MIME, CSV, Weka ARFF)
2) Assign metadata
   • PUT /dataset/{id}/metadata
3) Annotate any of the dataset features (/dataset/{id}/feature) by assigning links to relevant ontologies
   • PUT /feature/{id}
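The three publishing steps (upload, assign metadata, annotate a feature) can be sketched as three HTTP requests. The Python below only builds the requests; it does not send them. The content types (chemical/x-mdl-sdfile for the upload, RDF/XML for both PUT bodies), the helper name publishing_requests and the placeholder bodies are assumptions for illustration, not details given in the slides.

```python
from urllib.request import Request

RDF_XML = "application/rdf+xml"


def publishing_requests(service, sdf_bytes, dataset_id, metadata_rdf,
                        feature_id, feature_rdf):
    """Return the upload, metadata and annotation requests, unsent.

    Assumptions: the upload is an SDF file, both PUT bodies are RDF/XML,
    and the ids are known only after the upload has completed.
    """
    service = service.rstrip("/")
    upload = Request(f"{service}/dataset", data=sdf_bytes, method="POST",
                     headers={"Content-Type": "chemical/x-mdl-sdfile"})
    metadata = Request(f"{service}/dataset/{dataset_id}/metadata",
                       data=metadata_rdf, method="PUT",
                       headers={"Content-Type": RDF_XML})
    annotate = Request(f"{service}/feature/{feature_id}",
                       data=feature_rdf, method="PUT",
                       headers={"Content-Type": RDF_XML})
    return [upload, metadata, annotate]


steps = publishing_requests("http://host2/", b"(SDF content)", 9,
                            b"(RDF/XML metadata)", 3, b"(RDF/XML annotation)")
for step in steps:
    print(step.get_method(), step.full_url)
```

In a real client the dataset and feature ids would be read from the response to the first request before the two PUT requests are built.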
Resources: Algorithm
Algorithm: http://host:port/algorithm
Provides access to OpenTox algorithms. There are several algorithm services, developed by different OpenTox partners. The list of algorithms can be retrieved by an HTTP GET operation at http://host:port/algorithm
$ curl -H "Accept:text/uri-list" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNclassification
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/PLSregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/M5P
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/GaussP
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/FTMsmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/gSpansmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/JOELIB2
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/InfoGainAttributeEval
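A text/uri-list response such as the algorithm list is simple to consume: one URI per line, with lines starting with "#" treated as comments. The Python sketch below parses such a body; the helper name parse_uri_list and the two-entry sample body are illustrative assumptions.

```python
def parse_uri_list(text: str) -> list:
    """Parse a text/uri-list body: one URI per line, '#' lines are comments."""
    uris = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            uris.append(line)
    return uris


BASE = "http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/"
body = "# algorithms\r\n" + BASE + "J48\r\n" + BASE + "kNNclassification\r\n"
names = [uri.rsplit("/", 1)[-1] for uri in parse_uri_list(body)]
print(names)  # ['J48', 'kNNclassification']
```

Each returned URI can then be dereferenced with an RDF Accept header to obtain the algorithm's description.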
Resources: Algorithm
Representation
• Multiple types of algorithms
  - descriptor calculation algorithms
  - machine learning procedures
  - data preprocessing
• The representation of algorithms is again defined by the OpenTox ontology, where all algorithms are subclasses of ot:Algorithm
Algorithm types ontology: http://opentox.org/data/documents/development/RDF%20files/AlgorithmTypes
• Provides a classification of algorithm types
• The algorithm type in the RDF representation is set by directly typing the resource (rdf:type) with a class from the algorithm types ontology (ota: http://www.opentox.org/algorithmTypes.owl#), e.g. <myalgorithm> rdf:type ota:Classification
Resources: Algorithm
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/algorithm/J48
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ot="http://www.opentox.org/api/1.1#"
    …
    xmlns:ota="http://www.opentox.org/algorithmTypes.owl#">
  <ota:Supervised rdf:about="http://apps.ideaconsult.net:8080/ambit2/algorithm/J48">
    <dc:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Classification Decision tree J48</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#Classification"/>
    <dc:description rdf:datatype="http://www.w3.org/2001/XMLSchema#string"></dc:description>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#SingleTarget"/>
    <dc:publisher rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">Somebody</dc:publisher>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#EagerLearning"/>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Algorithm"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">Sat Jul 31 17:11:26 EEST 2010</dc:date>
  </ota:Supervised>
</rdf:RDF>
Representation
• The algorithm name is defined by dc:title
• Parameters supported by the algorithm are specified via the object property ot:parameters and should be of class ot:Parameter (as defined in opentox.owl). These entries indicate which parameters are required in order to run the algorithm; the values themselves should be provided by the client when initiating the calculations via POST
• Algorithm types are distinguished by means of the algorithm types ontology
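As an illustration of how a client might tell algorithm types apart, the Python sketch below collects every class assigned to the J48 resource in an abridged copy of its RDF/XML. Note that the typed node element (ota:Supervised) counts as a type statement just like the explicit rdf:type children. The helper algorithm_types and the SAMPLE string are illustrative assumptions, and the pattern matching covers only this serialisation.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OTA = "http://www.opentox.org/algorithmTypes.owl#"

# Abridged from the J48 example; namespace URIs restored by assumption.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ota="http://www.opentox.org/algorithmTypes.owl#">
  <ota:Supervised rdf:about="http://apps.ideaconsult.net:8080/ambit2/algorithm/J48">
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#Classification"/>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#SingleTarget"/>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#EagerLearning"/>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Algorithm"/>
  </ota:Supervised>
</rdf:RDF>"""


def algorithm_types(rdf_xml: str) -> set:
    """Collect the class URIs of the first described resource."""
    node = ET.fromstring(rdf_xml)[0]
    # A typed node element such as <ota:Supervised> is itself a type
    # statement; ElementTree spells its tag as "{namespace}LocalName".
    namespace, local = node.tag[1:].split("}")
    types = {namespace + local}
    for child in node.findall("{%s}type" % RDF_NS):
        types.add(child.get("{%s}resource" % RDF_NS))
    return types


types = algorithm_types(SAMPLE)
print(OTA + "Classification" in types)  # True
```

A client looking for classification algorithms could apply this check to each URI returned by the algorithm service.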
Resources: Model
• Representations of predictive models
• A Model is created by an HTTP POST to an ot:Algorithm, with specific parameters and/or an input ot:Dataset
Representation
• The Model name is defined by the dc:title property
• The Model creator may be defined by the dc:creator property
• The date of Model creation is defined by the dc:date property
• The Algorithm is defined by the ot:algorithm object property
• The independent variables are instances of ot:Feature, defined by the ot:independentVariables property (can be multiple)
• The dependent variables are instances of ot:Feature, defined by the ot:dependentVariables property (can be multiple)
• The variables where prediction results will be stored are instances of ot:Feature, defined by ot:predictedVariables (can be multiple)
• Parameters are defined by ot:parameters
• The training Dataset is an instance of ot:Dataset, defined by ot:trainingDataset
Linked resources: Compound, Algorithm, Model, Dataset, Features
[Diagram: algorithm service, model resource, dataset resource, descriptor resource, assay resource, chemical compound, feature service and compound service, shown as linked resources]
[Diagram: dataset resource, descriptor resource, assay resource and chemical compound linked to ontologies. Labels: Regression, Classification, Quantum Chemistry, Descriptors etc.; Blue Obelisk algorithms ontology; OpenTox algorithm types ontology; toxicology-related ontologies]
Make the model available
Register at the OpenTox ontology service
- RDF triple storage
- Accepts HTTP POST
- SPARQL endpoint
curl -X POST -d "uri=http://apps.ideaconsult.net:8080/ambit2/model/57" http://apps.ideaconsult.net:8080/ontology
The model becomes visible for applications.
[Diagram: the model service (/model/{id}) registers with the ontology service via HTTP POST; the ontology service holds published models, algorithms, ontologies and metadata]
Services implementation by partner and service type
Service types (table columns): Compound; Dataset; Feature; Algorithm (processing); Algorithm (model); Model; Task; Report; Validation; Authentication and Authorisation service; Ontology service
Partner No: Y marks in that partner's row (the column each mark belongs to is not recoverable in this transcript)
2: Y Y Y Y Y Y
3 (IDEA): Y Y Y Y Y Y Y Y
5: Y Y Y
6: Y Y Y Y
7: Y Y Y
10: Y
• All components are implemented as REST web services
• There can be multiple implementations of the same type of component
• (A subset of) the services can be hosted by the same provider or by multiple providers at separate locations
http://apps.ideaconsult.net:8080/ambit2
OpenTox Is A Framework
Framework
Unified Access
Open Source
• Toxicity data
• Predictive models
• Validation support
• Interpretation aids
• Toxicologists
• Modelers
• API for new algorithm development & integration
• To optimise impact
• To allow inspection and review
• To attract external contributors
Demo applications
OpenTox services can be used to develop specific applications or can be embedded in workflow systems.
• Two end-user-oriented demo applications that make use of OpenTox web services have been developed and deployed, and are available for testing: http://toxcreate.org and http://toxpredict.org
• ToxCreate creates models from user-supplied datasets
• ToxPredict uses existing OpenTox models to estimate chemical compound properties
RDF: Lessons learned
• OpenTox specific: it did not start as a Linked Data/RDF project
• The mix of REST and RDF is not (yet) popular… but it is natural to be able to retrieve a (partial) resource representation described by triples
• Steep learning curve
• Some hard topics
  - Data model vs. format
  - The subject-predicate-object concept vs. tabular/hierarchical/other implicit structure
  - The recognition of the added value
    • XML, JSON, YAML, plain text etc. vs. RDF
RDF: Performance
RDF representation is verbose… and in-memory RDF libraries are slow… lack of streaming parsers/writers…
[Figures: a dataset with 320 chemicals and 60 columns; a dataset with 6500 chemicals and 12 columns]
RDF: Wish list
• A convenient explanation of the subject-predicate-object concept for beginners
• A (high-performance) triple store should not be a mandatory requirement for publishing RDF data
• Fast streaming parsers and writers
• (Terse) JSON serialisation
• Security
• Synchronisation of distributed RDF content
Thank you
Build an application with the OpenTox REST Web Services API: http://opentox.org/dev/apis/api-1.1
Download the AMBIT implementation of the OpenTox API and launch your own OpenTox service: http://ambit.sourceforge.net
Apply predictive models
Returns the results
dataset URL
httphostdatasetid
Retrieve available
endpoints and model
URLs eg
httphost1modelid
Published models
Algorithms
Ontologies
metadata
Ontology
service
HTTP POST
SPARQL
HTTP POST
modelidApply the model
httphost1modelid
to dataset
httphost2datasetid
Model service
Read data from a web address ndash process ndash write to a web address
Uniform approach to model prediction
20Ideaconsult LtdAugust 22 2010
Feature
GET
POST
PUT
DELETE
Compound
GET
POST
PUT
DELETE
Dataset
GET
POST
PUT
DELETE
+=
httpmyhostcomdatasetid1
Feature
GET
POST
PUT
DELETE
Compound
GET
POST
PUT
DELETE
Dataset
GET
POST
PUT
DELETE
=
httpmyhostcomdatasetresults1
Model
GET
POST
PUT
DELETE
httpmyhostcommodelpredictivemodel1
Uniform access to data
Upload data receive
dataset URL
httphost2datasetid
Annotation
Find chemical compounds
return dataset URL
httphost2compoundid
HTTP GET POST
Dataset service
Structures
endpoints
amppredictions
Publish your data retrieve linked data
Published models
Algorithms
Ontologies
metadata
Ontology service
Feature
GET
POST
PUT
DELETE
Compound
GET
POST
PUT
DELETE
Dataset
GET
POST
PUT
DELETE
RDF - Resources representation
bull The opentoxowl ontology
ndash A common OWL data model of all
OpenTox resources
ndash Describes OpenTox resources
ndash Describes relationships between them
ndash Generates objects RDF representations
bull RDFXML representation is mandatory for
OpenTox resources
bull Uniform approach to data representation
ndash Calculated and measured properties of
chemical compounds are represented in an
uniform way
ndash Linked to the resource used for data
generation
ndash Annotated via ontology entries
ndash Model representations link to algorithms
and data used
bull Ideaconsult Ltd
22
All OpenTox components are defined by
OWL ontology
httpopentoxorgapi11opentoxowl
All resources are subclasses of
otOpenToxResource
Resources Chemical compound
Ideaconsult Ltd23
$ curl -H Acceptchemicalx-daylight-smiles
httpappsideaconsultnet8080ambit2compound1
O=C
$ curl -H Acceptchemicalx-mdl-molfile
httpappsideaconsultnet8080ambit2compound1
CH2O
APtclcactv09040902283D 0 000000 000000
4 3 0 0 0 0 0 0 0 0999 V2000
-06004 00000 00001 O 0 0 0 0 0 0 0 0 0 0 0 0
06072 00000 -00004 C 0 0 0 0 0 0 0 0 0 0 0 0
11472 09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0
11472 -09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0
1 2 2 0 0 0 0
2 3 1 0 0 0 0
2 4 1 0 0 0 0
CompoundProvides different representations for
chemical compounds with a unique and
defined chemical structurecompoundid
Conformercompoundidconformerid
Documentationhttpopentoxorgdevapisapi-11structure
RepresentationA subclass of otOpenToxResource
Supports different Chemical MIME
formats
RDF representation only for specifying
owlsameAs links to external resources
Example 1 Retrieve compound as MOL
Example 2 Retrieve compound as SMILES
Example 3 Query compounds
$ curl ndashH Acceptchemical-mime ldquo
httpappsideaconsultnet8080ambit2querycompoundany-identifier-or-keyword
$ curl ndashH Acceptchemical-mime ldquo
httpappsideaconsultnet8080ambit2querysmartssearch=smarts
Resources: Feature
Feature
A Feature is a resource representing any kind of property or identifier assigned to a Compound. The feature types are determined via their links to ontologies (Feature ontologies, Descriptor ontologies, Endpoints ontologies)
  Feature: /feature/{id}
Documentation: http://opentox.org/dev/apis/api-1.1/feature
Representation
• ot:Feature, a subclass of ot:OpenToxResource
• Mandatory RDF/XML format
Properties
• Name: defined by dc:title (Dublin Core namespace)
• Units: defined by the ot:units annotation property (OpenTox namespace)
• Creator: defined by the dc:creator annotation property (Dublin Core namespace)
• The origin of the Feature is defined by the ot:hasSource object property (OpenTox namespace) and can be an ot:Algorithm, ot:Model or ot:Dataset
• Relations to other resources that represent the same entity can be established via the owl:sameAs property. This approach can be used, for example, to link the ot:Feature resource to a resource from another ontology (an example follows)
• The subclasses ot:NumericFeature, ot:NominalFeature and ot:StringFeature denote whether a feature holds numeric, nominal or string values
Resources: Feature (an example)
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/feature/22114
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns:otee="http://www.opentox.org/echaEndpoints.owl#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns="http://apps.ideaconsult.net:8080/ambit2/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#Algorithm"/>
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#Feature"/>
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#NumericFeature">
    <rdfs:subClassOf rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </owl:Class>
  <ot:NumericFeature rdf:about="feature/22114">
    <ot:hasSource>
      <ot:Algorithm rdf:about="algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor"/>
    </ot:hasSource>
    <owl:sameAs rdf:resource="&otee;Octanol-water_partition_coefficient_Kow"/>
    <dc:title>XLogP</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </ot:NumericFeature>
</rdf:RDF>
The example shows an OpenTox feature with title "XLogP", identified by the URI http://apps.ideaconsult.net:8080/ambit2/feature/22114
• Linked to an entry of a simplified ontology of toxicological endpoints
The algorithm used to generate values for this feature
• Specified by the ot:hasSource property
• Identified by the URI http://apps.ideaconsult.net:8080/ambit2/algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor
• Note: the URI identifies an OpenTox Algorithm resource
• The algorithm URI itself is dereferenceable
• Can be used to initiate calculations of the XLogP descriptor
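To illustrate how a client might read such a feature representation, here is a minimal sketch that extracts the title and the source algorithm from a trimmed copy of the RDF/XML above. It uses only the Python standard library and assumes this exact nesting; RDF/XML allows many equivalent serializations, so a real client would more likely use an RDF library. The function name describe_feature is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OT = "http://www.opentox.org/api/1.1#"
DC = "http://purl.org/dc/elements/1.1/"

# Trimmed copy of the feature representation shown on the slide.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:ot="{OT}" xmlns:dc="{DC}">
  <ot:NumericFeature rdf:about="feature/22114">
    <ot:hasSource>
      <ot:Algorithm rdf:about="algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor"/>
    </ot:hasSource>
    <dc:title>XLogP</dc:title>
  </ot:NumericFeature>
</rdf:RDF>"""


def describe_feature(rdf_xml):
    """Return the URI, title and source algorithm of the first ot:NumericFeature."""
    root = ET.fromstring(rdf_xml)
    feature = root.find(f"{{{OT}}}NumericFeature")
    algorithm = feature.find(f"{{{OT}}}hasSource/{{{OT}}}Algorithm")
    return {
        "uri": feature.get(f"{{{RDF}}}about"),
        "title": feature.findtext(f"{{{DC}}}title"),
        "source": algorithm.get(f"{{{RDF}}}about"),
    }


info = describe_feature(SAMPLE)
print(info["title"], "computed by", info["source"])
```

The relative URIs returned here would be resolved against the xml:base of the full document.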
Resources: Feature (an example)
$ curl -H "Accept:text/n3" http://apps.ideaconsult.net:8080/ambit2/feature/26184
@prefix ot: <http://www.opentox.org/api/1.1#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix : <http://apps.ideaconsult.net:8080/ambit2/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix af: <http://apps.ideaconsult.net:8080/ambit2/feature/> .
ot:hasSource
    a owl:ObjectProperty .
ot:units
    a owl:DatatypeProperty .
ot:Feature
    a owl:Class .
ot:NumericFeature
    a owl:Class ;
    rdfs:subClassOf ot:Feature .
af:26184
    a ot:Feature , ot:NumericFeature ;
    dc:creator "http://ambit.uni-plovdiv.bg:8080/ambit2" ;
    dc:title "TUM_CDK_XLogP" ;
    ot:hasSource <http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem/XLogPDescriptor> ;
    ot:units "" ;
    = otee:Octanol-water_partition_coefficient_Kow .
An example (N3) of another ot:Feature resource: the XLogP descriptor (again)
• Generated by a different implementation
• In this case its name is "TUM_CDK_XLogP", and the algorithm resource used to generate it resides at Technical University of Munich (TUM) premises:
  http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem/XLogPDescriptor
  This algorithm URL could also be used to initiate descriptor calculations
• The representation of Algorithm resources refers to the BlueObelisk ontology entry http://www.blueobelisk.org/ontologies/chemoinformatics-algorithms/#xlogP
Toxicity endpoints ontologies
• Derived from the ECHA classification of endpoints published in REACH guidance documents
• Physicochemical properties and various toxicological endpoints
• The hierarchy doesn't represent the complexity of toxicological assays, but can be used as a first approximation to assign meaning to the data entries and generate a REACH report
• More specific descriptions of toxicological assays can be used as well
• Ontologies for specific toxicity assays are developed by OpenTox partners
• The ECHA endpoints ontology: http://www.opentox.org/echaEndpoints.owl
Resources: a feature
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/feature/3
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns="http://apps.ideaconsult.net:8080/ambit2/"
    xmlns:af="http://apps.ideaconsult.net:8080/ambit2/feature/"
    …
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Feature rdf:about="feature/3">
    <dc:creator/>
    <ot:hasSource>ECHA…</ot:hasSource>
    <owl:sameAs rdf:resource="http://www.opentox.org/api/1.1#EINECS"/>
    <ot:units></ot:units>
    <dc:title>EC</dc:title>
  </ot:Feature>
</rdf:RDF>
An illustration of an ot:Feature imported from a file and not calculated. The example shows a feature representing the EINECS number, imported from the ECHA preregistration list
Feature summary
• The OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs
• These URIs are dereferenceable
• Features can be assigned different levels of meaning by linking to ontology entries (e.g. algorithms or toxicological endpoints), as well as to the algorithms that can be used to generate property values
• The same approach can be used to denote assays (provided that the assay is defined by an ontology), species, functional groups, etc.
Resources: Dataset
Dataset
Provides access to chemical compounds and their features (e.g. structural, physical-chemical, biological, toxicological properties)
Operations
• POST – Upload a dataset
• PUT – Update the dataset content
• DELETE – Remove the dataset
Representation
• RDF/XML (mandatory)
• The dataset consists of data entries
• Each entry is associated with exactly one chemical compound, identified by its URI and available via the OpenTox Compound service API
• One and the same compound can be associated with multiple dataset entries
• Every "column" is associated with a Feature; its representation should be available via the OpenTox Feature API
Resources: Dataset
Representation
The dataset services optionally support formats other than RDF:
• text/csv (comma delimited)
• text/x-arff (Weka ARFF)
• application/pdf
• chemical/x-mdl-sdfile
• other Chemical MIME formats
This allows retrieving the same data in a convenient format, but the URL links to compound and feature resources are lost
@prefix ad: <http://apps.ideaconsult.net:8080/ambit2/dataset/> .
@prefix af: <http://apps.ideaconsult.net:8080/ambit2/feature/> .
@prefix ot: <http://www.opentox.org/api/1.1#> .
…
ad:9 a ot:Dataset ;
    ot:dataEntry
        [ a ot:DataEntry ;
          ot:compound <http://apps.ideaconsult.net:8080/ambit2/compound/413/conformer/409421> ;
          ot:values
              [ a ot:FeatureValue ;
                ot:feature af:21576 ;
                ot:value "3.309999942779541"^^xsd:double
              ] ;
          ot:values
              [ a ot:FeatureValue ;
                ot:feature af:21573 ;
                ot:value "3.0"^^xsd:double
              ]
        ] .
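The dataset structure above (data entries, each with one compound and a list of feature values) maps naturally onto a table whose columns are feature URIs. The sketch below shows that mapping with plain Python data classes; the class and function names are illustrative only and are not taken from any OpenTox library.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureValue:
    feature: str  # URI of the ot:Feature (the "column")
    value: float


@dataclass
class DataEntry:
    compound: str  # URI of the compound or conformer
    values: list = field(default_factory=list)


def to_rows(entries):
    """Flatten data entries into (header, rows): one row per entry, one column per feature URI."""
    columns = sorted({fv.feature for entry in entries for fv in entry.values})
    rows = []
    for entry in entries:
        by_feature = {fv.feature: fv.value for fv in entry.values}
        # Features missing for an entry become None in that row.
        rows.append([entry.compound] + [by_feature.get(col) for col in columns])
    return ["compound"] + columns, rows


AF = "http://apps.ideaconsult.net:8080/ambit2/feature/"
entry = DataEntry(
    compound="http://apps.ideaconsult.net:8080/ambit2/compound/413/conformer/409421",
    values=[FeatureValue(AF + "21576", 3.309999942779541),
            FeatureValue(AF + "21573", 3.0)],
)
header, rows = to_rows([entry])
print(header)
print(rows[0])
```

Unlike a CSV export, the header here keeps the feature URIs, so each column can still be dereferenced.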
Dataset metadata and features
Description | URI Template
Retrieve entire dataset content; if uri-list, retrieve only compound URIs | http://host:port/dataset/{id}
Retrieve representation of features (columns) of the dataset | http://host:port/dataset/{id}/feature
Retrieve dataset metadata (name etc.) | http://host:port/dataset/{id}/metadata
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/dataset/9/metadata
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    ……
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Dataset rdf:about="dataset/9">
    <dc:source>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:source>
    <dc:publisher>somebody</dc:publisher>
    <rdfs:seeAlso>
      <bx:Entry rdf:about="reference/20117">
        <rdfs:seeAlso>http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html</rdfs:seeAlso>
        <dc:title>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:title>
      </bx:Entry>
    </rdfs:seeAlso>
    <dc:title>ISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATA</dc:title>
  </ot:Dataset>
</rdf:RDF>
Data publishing
1) POST a file with chemical structures and properties to an OpenTox dataset service
   • The structures and data are assigned a dataset URL and become available in multiple formats (RDF, Chemical MIME, CSV, Weka ARFF)
2) Assign metadata
   • PUT /dataset/{id}/metadata
3) Annotate any of the dataset features (/dataset/{id}/feature) by assigning links to relevant ontologies
   • PUT /feature/{id}
[Diagram labels: Upload data, receive dataset URL (http://host2/dataset/{id}); Annotation; Find chemical compounds, return dataset URL (http://host2/compound/{id}); HTTP GET, POST; Dataset service (structures, endpoints & predictions); Toxicology related ontologies; Algorithms ontologies]
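The three publishing steps can be sketched as three HTTP requests. The example below only builds the requests and does not send them. The helper names, the host http://host2 and the feature id are placeholders, and the Content-Type values are assumptions based on the MIME types listed in the slides.

```python
import urllib.request


def upload_request(dataset_service_uri, sdf_bytes):
    """Step 1: POST a file with structures and properties to the dataset service."""
    return urllib.request.Request(
        dataset_service_uri, data=sdf_bytes, method="POST",
        headers={"Content-Type": "chemical/x-mdl-sdfile"})  # assumed upload format


def metadata_request(dataset_uri, rdf_xml):
    """Step 2: PUT the dataset metadata to {dataset}/metadata."""
    return urllib.request.Request(
        dataset_uri.rstrip("/") + "/metadata", data=rdf_xml.encode("utf-8"),
        method="PUT", headers={"Content-Type": "application/rdf+xml"})


def annotate_request(feature_uri, rdf_xml):
    """Step 3: PUT a feature representation that carries the ontology links."""
    return urllib.request.Request(
        feature_uri, data=rdf_xml.encode("utf-8"),
        method="PUT", headers={"Content-Type": "application/rdf+xml"})


# Placeholder URIs; a real dataset URI would come from the response to step 1.
step1 = upload_request("http://host2/dataset", b"...SDF content...")
step2 = metadata_request("http://host2/dataset/1", "<rdf:RDF/>")
step3 = annotate_request("http://host2/feature/1", "<rdf:RDF/>")
for req in (step1, step2, step3):
    print(req.get_method(), req.full_url)
```

Steps 2 and 3 depend on URIs returned by the service, which is why each step is a separate function rather than one fixed script.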
Resources: Algorithm
Algorithm
Provides access to OpenTox algorithms. There are several algorithm services developed by different OpenTox partners. The list of algorithms can be retrieved by an HTTP GET operation at http://host:port/algorithm
$ curl -H "Accept:text/uri-list" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNclassification
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/PLSregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/M5P
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/GaussP
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/FTMsmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/gSpansmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/JOELIB2
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/InfoGainAttributeEval
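A text/uri-list response such as the one above is one URI per line, with lines starting with "#" treated as comments. A minimal parsing sketch follows; the function name parse_uri_list is illustrative, and the sample body is a shortened copy of the listing above with one comment line added.

```python
def parse_uri_list(body):
    """Parse a text/uri-list body: one URI per line, '#' lines are comments."""
    uris = []
    for line in body.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            uris.append(line)
    return uris


SERVICE = "http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm"
sample_body = (
    "# algorithms offered by the service\r\n"
    f"{SERVICE}/kNNclassification\r\n"
    f"{SERVICE}/J48\r\n"
)
algorithms = parse_uri_list(sample_body)
print(algorithms)
```

Each returned URI can then be dereferenced with an RDF Accept header to obtain the algorithm's representation.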
Resources: Algorithm
Representation
• Multiple types of algorithms
  – descriptor calculation algorithms
  – machine learning procedures
  – data preprocessing
• The representation of algorithms is again defined by the OpenTox ontology, where all algorithms are subclasses of ot:Algorithm
Algorithm types ontology
http://opentox.org/data/documents/development/RDF%20files/AlgorithmTypes
• Provides a classification of algorithm types
• The algorithm type in the RDF representation is set by directly typing (rdf:type) the algorithm with a class from the algorithm types ontology (ota: http://www.opentox.org/algorithms.owl#),
  e.g. <myalgorithm> rdf:type ota:Classification
Resources: Algorithm
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/algorithm/J48
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ot="http://www.opentox.org/api/1.1#"
    …
    xmlns:ota="http://www.opentox.org/algorithmTypes.owl#">
  <ota:Supervised rdf:about="http://apps.ideaconsult.net:8080/ambit2/algorithm/J48">
    <dc:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Classification Decision tree J48</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#Classification"/>
    <dc:description rdf:datatype="http://www.w3.org/2001/XMLSchema#string"></dc:description>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#SingleTarget"/>
    <dc:publisher rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">Somebody</dc:publisher>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#EagerLearning"/>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Algorithm"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">Sat Jul 31 17:11:26 EEST 2010</dc:date>
  </ota:Supervised>
</rdf:RDF>
Representation
• The algorithm name is defined by dc:title
• Parameters supported by the algorithm are specified via the object property ot:parameters and should be of class ot:Parameter (as defined in opentox.owl). These entries state which parameters are required to run the algorithm; the values themselves should be provided by the client when initiating the calculations via POST
• Algorithm types are distinguished by means of the Algorithm types ontology
Resources: Model
• Representations of predictive models
• A Model is created by HTTP POST to an ot:Algorithm with specific parameters and/or an input ot:Dataset
Representation
• The Model name is defined by the dc:title property
• The Model creator might be defined by the dc:creator property
• The date of Model creation is defined by the dc:date property
• The Algorithm is defined by the ot:algorithm object property
• The independent variables are instances of ot:Feature and are defined by the ot:independentVariables property (can be multiple)
• The dependent variables are instances of ot:Feature and are defined by the ot:dependentVariables property (can be multiple)
• The variables where prediction results will be stored are instances of ot:Feature and are defined by ot:predictedVariables (can be multiple)
• Parameters are defined by ot:parameters
• The training Dataset is an instance of ot:Dataset and is defined by ot:trainingDataset
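As a rough illustration of "a Model is created by HTTP POST to an ot:Algorithm", the sketch below builds such a request without sending it. The form field names dataset_uri and prediction_feature are assumptions (the slides do not name them), and the choice of feature 21573 as the target is purely hypothetical.

```python
import urllib.parse
import urllib.request


def create_model_request(algorithm_uri, dataset_uri, prediction_feature=None, **parameters):
    """Build a POST to an algorithm URI asking it to train a model on a dataset.

    Field names are assumed for illustration; algorithm-specific parameters
    (those described via ot:parameters) are passed through as extra form fields.
    """
    form = dict(parameters)
    form["dataset_uri"] = dataset_uri
    if prediction_feature is not None:
        form["prediction_feature"] = prediction_feature
    body = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(algorithm_uri, data=body, method="POST")


BASE = "http://apps.ideaconsult.net:8080/ambit2"
model_req = create_model_request(
    f"{BASE}/algorithm/J48",
    dataset_uri=f"{BASE}/dataset/9",
    prediction_feature=f"{BASE}/feature/21573",  # hypothetical target feature
)
model_form = urllib.parse.parse_qs(model_req.data.decode("ascii"))
print(model_req.get_method(), model_req.full_url)
print(model_form)
```

The response to such a request would be expected to carry the URI of the new model, whose RDF representation then exposes the properties listed above.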
Linked resources: Compound, Algorithm, Model, Dataset, Features
[Diagram labels: Algorithm service; Model resource; Dataset resource; Descriptor resource; Assay resource; Chemical compound; Feature service; Compound service]
Linked resources: Compound, Algorithm, Model, Dataset, Features
[Diagram labels: Dataset resource; Descriptor resource; Assay resource; Chemical compound; Regression, Classification, Quantum Chemistry, Descriptors, etc.; Blue Obelisk algorithms ontology; OpenTox algorithm types ontology; Toxicology related ontologies]
Make the model available
Register at the OpenTox ontology service
  – RDF triple storage
  – Accepts HTTP POST
  – SPARQL endpoint
$ curl -X POST -d "uri=http://apps.ideaconsult.net:8080/ambit2/model/57" http://apps.ideaconsult.net:8080/ontology
The model becomes visible for applications
[Diagram labels: Model service (/model/{id}); HTTP POST; Ontology service (published models, algorithms, ontologies, metadata)]
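The registration call above is a plain form POST carrying the model URI in a uri field. A Python equivalent of the curl command might look like the sketch below; the function name register_request is illustrative, and the request is built but not sent.

```python
import urllib.parse
import urllib.request

# Ontology service address taken from the curl example.
ONTOLOGY_SERVICE = "http://apps.ideaconsult.net:8080/ontology"


def register_request(resource_uri, service=ONTOLOGY_SERVICE):
    """Build the POST that registers a resource URI at the ontology service."""
    body = urllib.parse.urlencode({"uri": resource_uri}).encode("ascii")
    return urllib.request.Request(service, data=body, method="POST")


reg_req = register_request("http://apps.ideaconsult.net:8080/ambit2/model/57")
reg_form = urllib.parse.parse_qs(reg_req.data.decode("ascii"))
print(reg_req.get_method(), reg_req.full_url, reg_form)

# To send it (requires the service to be online):
# with urllib.request.urlopen(reg_req) as resp:
#     print(resp.status)
```

Only the URI is posted; the ontology service is then expected to dereference it and store the returned triples.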
Services implementation by partner and service type
Service types: Compound; Dataset; Feature; Algorithm (processing); Algorithm (model); Model; Task; Report; Validation; Authentication and Authorisation service; Ontology service
Partner No. | Service types implemented
2 | Y Y Y Y Y Y
3 (IDEA) | Y Y Y Y Y Y Y Y
5 | Y Y Y
6 | Y Y Y Y
7 | Y Y Y
10 | Y
All components are implemented as REST web services
There could be multiple implementations of the same type of components
(A subset of) services could be hosted by the same provider or by multiple providers at separate locations
http://apps.ideaconsult.net:8080/ambit2
OpenTox Is A Framework
Framework, Unified Access, Open Source
• Toxicity data
• Predictive models
• Validation support
• Interpretation aids
• Toxicologists
• Modelers
• API for new algorithm development & integration
• To optimise impact
• To allow inspection, review
• To attract external contributors
Demo applications
OpenTox services can be used to develop specific applications or embedded in workflow systems
• Two end-user-oriented demo applications making use of OpenTox web services have been developed and deployed, and are available for testing: http://toxcreate.org and http://toxpredict.org
• ToxCreate creates models from user-supplied datasets
• ToxPredict uses existing OpenTox models to estimate chemical compound properties
RDF: Lessons learned
• OpenTox specific – it hasn't started as a Linked Data/RDF project
• The REST and RDF mix is not (yet) popular … but it is natural to be able to retrieve a (partial) resource representation described by triples
• Steep learning curve
• Some hard topics
  – Data model vs. format
  – The subject-predicate-object concept vs. tabular/hierarchical/other implicit structure
  – The recognition of the added value
    • XML, JSON, YAML, plain text, etc. vs. RDF
RDF: Performance
RDF representation is verbose … and in-memory RDF libraries are slow … lack of streaming parsers/writers …
[Charts: a dataset with 320 chemicals, 60 columns; a dataset with 6500 chemicals, 12 columns]
RDF: Wish list
• Convenient explanation of the subject-predicate-object concept for beginners
• A (high-performance) triple storage should not be a mandatory requirement to publish RDF data
• Fast streaming parsers and writers
• (Terse) JSON serialisation
• Security
• Synchronisation of distributed RDF content
Thank you
Build an application with the OpenTox REST Web Services API
http://opentox.org/dev/apis/api-1.1
Download the AMBIT implementation of the OpenTox API and launch your OpenTox service
http://ambit.sourceforge.net
Uniform approach to model prediction
Read data from a web address – process – write to a web address
[Diagram: Dataset (http://myhost.com/dataset/id1) + Model (http://myhost.com/model/predictivemodel1) = Dataset (http://myhost.com/dataset/results1); the Feature, Compound, Dataset and Model resources each support GET, POST, PUT, DELETE]
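The "read from a web address, process, write to a web address" pattern can be sketched as a function that posts an input dataset URI to a model URI and receives the URI of the results dataset. The dataset_uri field name is an assumption, and fake_post is a stand-in transport used only so the sketch runs offline; a real service may also respond asynchronously rather than returning the result URI directly.

```python
def predict(model_uri, dataset_uri, post):
    """POST the input dataset URI to the model; return the URI of the results dataset.

    `post(url, form)` is any callable that sends the form to `url` and returns
    the URI found in the response (the transport is injected on purpose).
    """
    return post(model_uri, {"dataset_uri": dataset_uri})  # field name is assumed


def fake_post(url, form):
    """Offline stand-in for an HTTP client, mimicking the slide's example URIs."""
    if (url == "http://myhost.com/model/predictivemodel1"
            and form.get("dataset_uri") == "http://myhost.com/dataset/id1"):
        return "http://myhost.com/dataset/results1"
    raise ValueError("unexpected request")


result_uri = predict("http://myhost.com/model/predictivemodel1",
                     "http://myhost.com/dataset/id1",
                     fake_post)
print(result_uri)
```

Because both input and output are dataset URIs, the result of one prediction can be fed directly into another model or service.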
Uniform access to data
Publish your data, retrieve linked data
[Diagram labels: Upload data, receive dataset URL (http://host2/dataset/{id}); Annotation; Find chemical compounds, return dataset URL (http://host2/compound/{id}); HTTP GET, POST; Dataset service (structures, endpoints & predictions); Ontology service (published models, algorithms, ontologies, metadata); the Feature, Compound and Dataset resources each support GET, POST, PUT, DELETE]
RDF: Resources representation
• The opentox.owl ontology
  – A common OWL data model of all OpenTox resources
  – Describes OpenTox resources
  – Describes relationships between them
  – Generates RDF representations of objects
• RDF/XML representation is mandatory for OpenTox resources
• Uniform approach to data representation
  – Calculated and measured properties of chemical compounds are represented in a uniform way
  – Linked to the resource used for data generation
  – Annotated via ontology entries
  – Model representations link to algorithms and data used
bull Ideaconsult Ltd
22
All OpenTox components are defined by
OWL ontology
httpopentoxorgapi11opentoxowl
All resources are subclasses of
otOpenToxResource
Resources Chemical compound
Ideaconsult Ltd23
$ curl -H Acceptchemicalx-daylight-smiles
httpappsideaconsultnet8080ambit2compound1
O=C
$ curl -H Acceptchemicalx-mdl-molfile
httpappsideaconsultnet8080ambit2compound1
CH2O
APtclcactv09040902283D 0 000000 000000
4 3 0 0 0 0 0 0 0 0999 V2000
-06004 00000 00001 O 0 0 0 0 0 0 0 0 0 0 0 0
06072 00000 -00004 C 0 0 0 0 0 0 0 0 0 0 0 0
11472 09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0
11472 -09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0
1 2 2 0 0 0 0
2 3 1 0 0 0 0
2 4 1 0 0 0 0
CompoundProvides different representations for
chemical compounds with a unique and
defined chemical structurecompoundid
Conformercompoundidconformerid
Documentationhttpopentoxorgdevapisapi-11structure
RepresentationA subclass of otOpenToxResource
Supports different Chemical MIME
formats
RDF representation only for specifying
owlsameAs links to external resources
Example 1 Retrieve compound as MOL
Example 2 Retrieve compound as SMILES
Example 3 Query compounds
$ curl ndashH Acceptchemical-mime ldquo
httpappsideaconsultnet8080ambit2querycompoundany-identifier-or-keyword
$ curl ndashH Acceptchemical-mime ldquo
httpappsideaconsultnet8080ambit2querysmartssearch=smarts
Resources Feature
Ideaconsult Ltd24
FeatureA Feature is a resource representing any
kind of a property or identifier assigned to
a Compound
The feature types are determined via their
links to ontologies (Feature ontologies
Decsriptor ontologies Endpoints
ontologies)featureid
Documentationhttpopentoxorgdevapisapi-11feature
RepresentationotFeature a subclass of
otOpenToxResource
Mandatory RDFXML format
Properties
bullName defined by dctitle (Dublin Core namespace )
bullUnits defined by otunits annotation property (OpenTox
namespace)
bullCreator defined by dccreator annotation property
(Dublin Core namespace)
bullThe origin of the Feature is defined by othasSource
object property (OpenTox namespace) element and can be
otAlgorithm otModel or otDataset
bullRelations to other resources which represent the same
entity could be established via owlsameAs property
This approach can be used for example to link the
otFeature resource to a resource from another ontology
(an example follows)
bullThere are subclasses of otFeature (namely) which are
used otNumericFeature otStringFeature
otNominalFeature denote if a feature holds numeric
nominal or string values
Resources Feature (an example)
Ideaconsult Ltd25
$ curl -H Acceptapplicationrdf+xml
httpappsideaconsultnet8080ambit2feature22114
ltrdfRDF
xmlnsot=httpwwwopentoxorgapi11
xmlnsotee=httpwwwopentoxorgechaEndpointsowl
xmlnsdc=httppurlorgdcelements11
xmlns=httpappsideaconsultnet8080ambit2
xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns
xmlnsowl=httpwwww3org200207owl
xmlnsxsd=httpwwww3org2001XMLSchema
xmlnsrdfs=httpwwww3org200001rdf-schema
xmlbase=httpappsideaconsultnet8080ambit2gt
ltowlClass rdfabout=httpwwwopentoxorgapi11Algorithmgt
ltowlClass rdfabout=httpwwwopentoxorgapi11Featuregt
ltowlClass rdfabout=httpwwwopentoxorgapi11NumericFeaturegt
ltrdfssubClassOf
rdfresource=httpwwwopentoxorgapi11Featuregt
ltowlClassgt
ltotNumericFeature rdfabout=feature22114gt
ltothasSourcegt
ltotAlgorithm
rdfabout=algorithmorgopensciencecdkqsardescriptorsmolecularXLo
gPDescriptorgt
ltothasSourcegt
ltowlsameAs rdfresource=ldquooteeOctanol-
water_partition_coefficient_Kowgt
ltdctitlegtXLogPltdctitlegt
ltrdftype rdfresource=httpwwwopentoxorgapi11Featuregt
ltotNumericFeaturegt
ltrdfRDFgt
The example shows an OpenTox feature with title ldquoXLogPrdquo and identified by the URI httpappsideaconsultnet8080ambit2feature22114
bullLinked to an entry of a simplified ontology of toxicological endpoints
The algorithm used to generate values for this featurebullSpecified by otalgorithm propertybullidentified by the URIhttpappsideaconsultnet8080ambit2algorithmorgopensciencecdkqsardescriptorsmolecularXLogPDescriptor
bullNote the URI identifies an OpenTox Algorithm resourcebullThe algorithm URI itself is dereferensable bullCan be used to initiate calculations of XLogP descriptor
Resources Feature (an example)
Ideaconsult Ltd26
$ curl -H Accepttextn3 httpappsideaconsultnet8080ambit2feature26184
prefix ot lthttpwwwopentoxorgapi11gt
prefix dc lthttppurlorgdcelements11gt
prefix lthttpappsideaconsultnet8080ambit2gt
prefix rdfs lthttpwwww3org200001rdf-schemagt
prefix owl lthttpwwww3org200207owlgt
prefix xsd lthttpwwww3org2001XMLSchemagt
prefix rdf lthttpwwww3org19990222-rdf-syntax-nsgt
prefix af lthttpappsideaconsultnet8080ambit2featuregt
othasSource
a owlObjectProperty
otunits
a owlDatatypeProperty
otFeature
a owlClass
otNumericFeature
a owlClass
rdfssubClassOf otFeature
af26184
a otFeature otNumericFeature
dccreator httpambituni-plovdivbg8080ambit2
dctitle TUM_CDK_XLogP
othasSource lthttpopentoxinformatiktu-muenchende8080OpenTox-
devalgorithmCDKPhysChemXLogPDescriptorgt
otunits
= oteeOctanol-water_partition_coefficient_Kow
An example (N3) of another otFeature
resource XLogP descriptor (again)
bullGenerated by different implementation
bullIn this case its name is
ldquoTUM_CDK_XLogPrdquo and the algorithm
resource used to generate resides at Technical
University of Munich (TUM) premises
httpopentoxinformatiktu-muenchende8080OpenTox-
devalgorithmCDKPhysChemXLogPDescriptor
This algorithm URL could also be used to
initiate descriptor calculations
bullThe representation of Algorithm resources
refers to the BlueObelisk ontology entry httpwwwblueobeliskorgontologieschemoinformatics-
algorithmsxlogP
Toxicity endpoints ontologies
Ideaconsult Ltd27
bullDerived from ECHA classification of
endpoints published in REACH guidance
documents
bullPhysicochemical properties and various
toxicological endpoints
bullThe hierarchy doesnrsquot represent the
complexity of toxicological assays but
can be used as a first approximation to
assign meaning to the data entries and
generate REACH report
bullMore specific description of toxicological
assays can be used as well
bullOntologies for specific toxicity assays are
developed by OpenTox partners
bullThe ECHA endpoints ontology httpwwwopentoxorgechaEndpointsowl
Resources a feature
Ideaconsult Ltd28
curl -H Acceptapplicationrdf+xml
httpappsideaconsultnet8080ambit2feature3
ltrdfRDF
xmlnsot=httpwwwopentoxorgapi11
xmlns=httpappsideaconsultnet8080ambit2
xmlnsaf=httpappsideaconsultnet8080ambit2feature
hellip
xmlbase=httpappsideaconsultnet8080ambit2gt
ltotFeature rdfabout=feature3gt
dccreatorgt
ltothasSourcegtECHAhellipltothasSourcegt
ltowlsameAs rdfresource=httpwwwopentoxorgapi11EINECSgt
ltotunitsgtltotunitsgt
ltdctitlegtECltdctitlegt
ltotFeaturegt
ltrdfRDFgt
An illustration of otFeature
imported from a file and not
calculated
The example shows a feature
representing EINECS
number imported from the
ECHA preregistration list
Feature summary
bull OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs
bull These URIs are dereferencable
bull Allow to assign different levels of meaning by linking to entries to ontologies (eg algorithms or toxicological endpoints ) as well as linking to the algorithms which can be used to generated property values
bull The same approach can be used to denote assays provided that the assay is defined by an ontology species functional groups etc
Ideaconsult Ltd29
Resources Dataset
OperationsPOST ndash Upload a dataset
PUT ndash Update the dataset content
DELETE ndash Remove the dataset
Representation
RDFXML (mandatory)
bullThe dataset consists of data entries
bullEach entry is associated with exactly one
chemical compound identified by its
URI and available via OpenTox
Compound service API
bullOne and the same compound can be
associated with multiple dataset entries
bullEvery ldquocolumnrdquo is associated with a
Feature its representation should be
available via OpenTox Feature API
Ideaconsult Ltd30
DatasetProvides access to chemical compounds and their
features (eg structural physical-chemical
biological toxicological properties)
Resources Dataset
Representation
Ideaconsult Ltd31
The dataset services optionally
support formats other than RDF
bulltextcsv (Comma delimited)
bulltextx-arff (Weka ARFF)
bullapplicationpdf
bullchemicalx-mdl-sdfile
bullother Chemical MIME formats
This allows retrieving the same data
in convenient format but the URL
links to compound and feature
resources are being lost
prefix ad lthttpappsideaconsultnet8080ambit2datasetgt
prefix af lthttpappsideaconsultnet8080ambit2featuregt
prefix ot lthttpwwwopentoxorgapi11gt
hellip
ad9 a otDataset
otdataEntry
[ a otDataEntry
otcompound
lthttpappsideaconsultnet8080ambit2compound413conformer409421gt
otvalues
[ a otFeatureValue
otfeature af21576
otvalue 3309999942779541^^xsddouble
]
otvalues
[ a otFeatureValue
otfeature af21573
otvalue 30^^xsddouble
]
]
Dataset metadata and features
Ideaconsult Ltd32
Description URI Template
Retrieve entire dataset content If uri-list
retrieve only compound URIs
httphostportdatasetid
Retrieve representation of features (columns)
of the dataset
httphostportdatasetidfeature
Retrieves dataset metadata (name etc) httphostportdatasetidmetadata
$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata
ltrdfRDF
xmlnsot=httpwwwopentoxorgapi11
helliphellip
xmlnsrdfs=httpwwww3org200001rdf-schema
xmlbase=httpappsideaconsultnet8080ambit2gt
ltotDataset rdfabout=dataset9gt
ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt
ltdcpublishergtsomebodyltdcpublishergt
ltrdfsseeAlsogt
ltbxEntry rdfabout=reference20117gt
ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt
ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt
ltbxEntrygt
ltrdfsseeAlsogt
ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt
ltotDatasetgt
ltrdfRDFgt
Data publishing
Upload data receive
dataset URL
httphost2datasetid
AnnotationFind chemical compounds
return dataset URL
httphost2compoundid
HTTP GET POST
Dataset service
Structures
endpoints
amppredictions
1)POST a file with chemical structures and properties to
OpenTox dataset servicebullThe structures and data are assigned a dataset URL and
become available by multiple formats (RDF Chemical
MIME CSV Weka ARFF)
2)Assign metadata
bullPUT datasetidmetadata
3)Annotate any of dataset features
datasetidfeature by assigning links to relevant
ontologies
bullPUT featureid
Toxiciology related
ontologies
Algorithms
ontologies
Resources Algorithm
AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by
different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at
httphostportalgorithm
Ideaconsult Ltd34
curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-
devalgorithm
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval
Resources Algorithm
Representation
bull Multiple type of algorithms
bull descriptor calculation algorithms
bull machine learning procedures
bull data preprocessing
bull The representation of algorithms is again defined by
Opentox ontology where all algorithms are subclass of
otAlgorithm
Algorithm types ontology
httpopentoxorgdatadocumentsdevelopmentRDF2
0filesAlgorithmTypes
bull provides a classification of algorithm types
bull Algorithm type in RDF representation is set by direct
subclassing (rdftype) of a class from the algorithm types
ontology (otahttpwwwopentoxorgalgorithmsowl )
eg ltmyalgorithmgt rdftype otaClassification
Ideaconsult Ltd35
Resources Algorithm
Ideaconsult Ltd36
$ curl -H Acceptapplicationrdf+xml
httpappsideaconsultnet8080ambit2algorithmJ48
ltrdfRDF
xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns
xmlnsot=httpwwwopentoxorgapi11
hellip
xmlnsota=httpwwwopentoxorgalgorithmTypesowl
ltotaSupervised
rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt
ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring
gtClassification Decision tree J48ltdctitlegt
ltrdftype
rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt
ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring
gtltdcdescriptiongt
ltrdftype
rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt
ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI
gtSomebodyltdcpublishergt
ltrdftype
rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt
ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt
ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime
gtSat Jul 31 171126 EEST 2010ltdcdategt
ltotaSupervisedgt
ltrdfRDFgt
RepresentationbullAlgorithm name is defined by dctitle
Parameters supported by the algorithm are
specified via object property otparameters
and should be of class otParameter (as
defined in opentoxowl)
These entries serve as a information what
parameters are required in order to run the
algorithm the values itself should be
provided by the client when initiating the
calculations via POST
bullAlgorithm types are distinguished by
means of Algorithm types ontology
Resources: Model
• Representations of predictive models
• A Model is created by an HTTP POST to an ot:Algorithm, with specific parameters and/or an input ot:Dataset
Representation
• The Model name is defined by the dc:title property
• The Model creator may be defined by the dc:creator property
• The date of Model creation is defined by the dc:date property
• The Algorithm is defined by the ot:algorithm object property
• The independent variables are instances of ot:Feature, defined by the ot:independentVariables property (can be multiple)
• The dependent variables are instances of ot:Feature, defined by the ot:dependentVariables property (can be multiple)
• The variables where prediction results will be stored are instances of ot:Feature, defined by ot:predictedVariables (can be multiple)
• Parameters are defined by ot:parameters
• The training Dataset is an instance of ot:Dataset, defined by ot:trainingDataset
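Model creation by POST to an algorithm URI could look roughly like the sketch below. The request is built but not sent. The form field name dataset_uri, the choice of dataset 9 as training data and the text/uri-list Accept header are assumptions for illustration and are not taken from the slides; build_training_request is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request


def build_training_request(algorithm_uri, dataset_uri, **parameters):
    """Build (but do not send) the POST that asks an algorithm to train a model."""
    # "dataset_uri" is an assumed form field name; extra keyword arguments
    # stand in for the algorithm-specific ot:Parameter values.
    form = {"dataset_uri": dataset_uri, **parameters}
    return Request(
        algorithm_uri,
        data=urlencode(form).encode("ascii"),
        headers={"Accept": "text/uri-list"},
        method="POST",
    )


request = build_training_request(
    "http://apps.ideaconsult.net:8080/ambit2/algorithm/J48",
    "http://apps.ideaconsult.net:8080/ambit2/dataset/9",
)
print(request.get_method(), request.full_url)
print(parse_qs(request.data.decode("ascii")))
```

Passing the request to urllib.request.urlopen would actually send it; the response would then be expected to identify the new model resource.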
Linked resources: Compound, Algorithm, Model, Dataset, Features
[Diagram labels: Algorithm service, Feature service, Compound service; Model resource, Dataset resource, Descriptor resource, Assay resource, Chemical compound]
Linked resources: Compound, Algorithm, Model, Dataset, Features
[Diagram labels: Dataset resource, Descriptor resource, Assay resource, Chemical compound; Regression, Classification, Quantum Chemistry, Descriptors, etc.; Blue Obelisk algorithms ontology; OpenTox algorithm types ontology; toxicology-related ontologies]
Make the model available
Register at the OpenTox ontology service
– RDF triple storage
– Accepts HTTP POST
– SPARQL endpoint
$ curl -X POST -d "uri=http://apps.ideaconsult.net:8080/ambit2/model/57" http://apps.ideaconsult.net:8080/ontology
The model becomes visible to applications
[Diagram labels: Model service (/model/{id}); HTTP POST; Ontology service (published models, algorithms, ontologies, metadata)]
Services implementation by partner and service type
Service types: Compound; Dataset; Feature; Algorithm (processing); Algorithm (model); Model; Task; Report; Validation; Authentication and Authorisation service; Ontology service
Partner No: service types implemented (Y)
2: Y Y Y Y Y Y
3 (IDEA): Y Y Y Y Y Y Y Y
5: Y Y Y
6: Y Y Y Y
7: Y Y Y
10: Y
• All components are implemented as REST web services
• There can be multiple implementations of the same type of component
• (A subset of) the services can be hosted by the same provider or by multiple providers at separate locations
http://apps.ideaconsult.net:8080/ambit2
OpenTox is a framework
[Diagram headings: Framework, Unified Access, Open Source]
• Toxicity data
• Predictive models
• Validation support
• Interpretation aids
• Toxicologists
• Modelers
• API for new algorithm development & integration
• To optimise impact
• To allow inspection and review
• To attract external contributors
Demo applications
OpenTox services can be used to develop specific applications or be embedded in workflow systems
• Two end-user-oriented demo applications that make use of OpenTox web services have been developed and deployed, and are available for testing: http://toxcreate.org and http://toxpredict.org
• ToxCreate creates models from user-supplied datasets
• ToxPredict uses existing OpenTox models to estimate chemical compound properties
RDF: Lessons learned
• OpenTox specific: it did not start as a Linked Data/RDF project
• The mix of REST and RDF is not (yet) popular… but it is natural to be able to retrieve a (partial) resource representation described by triples
• Steep learning curve
• Some hard topics
  – Data model vs. format
  – The subject-predicate-object concept vs. tabular/hierarchical/other implicit structure
  – The recognition of the added value
    • XML, JSON, YAML, plain text, etc. vs. RDF
RDF: Performance
RDF representation is verbose… and in-memory RDF libraries are slow… lack of streaming parsers/writers…
[Charts: a dataset with 320 chemicals and 60 columns; a dataset with 6500 chemicals and 12 columns]
RDF: Wish list
• A convenient explanation of the subject-predicate-object concept for beginners
• A (high-performance) triple store should not be a mandatory requirement for publishing RDF data
• Fast streaming parsers and writers
• (Terse) JSON serialisation
• Security
• Synchronisation of distributed RDF content
Thank you
Build an application with the OpenTox REST Web Services API:
http://opentox.org/dev/apis/api-1.1
Download the AMBIT implementation of the OpenTox API and launch your own OpenTox service:
http://ambit.sourceforge.net
Uniform access to data
[Diagram: Dataset service (structures, endpoints & predictions), accessed via HTTP GET and POST]
• Upload data, receive a dataset URL: http://host2/dataset/{id}
• Annotation
• Find chemical compounds, return a dataset URL: http://host2/compound/{id}
Publish your data, retrieve linked data
[Diagram: Ontology service (published models, algorithms, ontologies, metadata)]
Feature: GET, POST, PUT, DELETE
Compound: GET, POST, PUT, DELETE
Dataset: GET, POST, PUT, DELETE
RDF: Resources representation
• The opentox.owl ontology
  – A common OWL data model of all OpenTox resources
  – Describes OpenTox resources
  – Describes relationships between them
  – Generates the RDF representations of objects
• An RDF/XML representation is mandatory for OpenTox resources
• Uniform approach to data representation
  – Calculated and measured properties of chemical compounds are represented in a uniform way
  – Linked to the resource used for data generation
  – Annotated via ontology entries
  – Model representations link to the algorithms and data used
All OpenTox components are defined by an OWL ontology: http://opentox.org/api/1.1/opentox.owl
All resources are subclasses of ot:OpenToxResource
Resources: Chemical compound
Compound: provides different representations for chemical compounds with a unique and defined chemical structure
  /compound/{id}
Conformer
  /compound/{id}/conformer/{id}
Documentation: http://opentox.org/dev/apis/api-1.1/structure
Representation
• A subclass of ot:OpenToxResource
• Supports different Chemical MIME formats
• RDF representation only for specifying owl:sameAs links to external resources
Example 1: Retrieve compound as MOL
$ curl -H "Accept:chemical/x-mdl-molfile" http://apps.ideaconsult.net:8080/ambit2/compound/1
CH2O
  APtclcactv09040902283D 0   0.00000     0.00000
  4  3  0  0  0  0  0  0  0  0999 V2000
   -0.6004    0.0000    0.0001 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.6072    0.0000   -0.0004 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.1472    0.9353    0.0016 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.1472   -0.9353    0.0016 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  0  0  0  0
  2  4  1  0  0  0  0
Example 2: Retrieve compound as SMILES
$ curl -H "Accept:chemical/x-daylight-smiles" http://apps.ideaconsult.net:8080/ambit2/compound/1
O=C
Example 3: Query compounds
$ curl -H "Accept:chemical-mime" http://apps.ideaconsult.net:8080/ambit2/query/compound/any-identifier-or-keyword
$ curl -H "Accept:chemical-mime" http://apps.ideaconsult.net:8080/ambit2/query/smarts?search=smarts
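The same content negotiation can be driven from code. This is a minimal sketch, assuming only the two MIME types that appear in the curl examples; compound_request is a hypothetical helper, and the requests are built but not sent.

```python
from urllib.request import Request

# MIME types taken from the curl examples on the slide.
CHEMICAL_MIME = {
    "smiles": "chemical/x-daylight-smiles",
    "mol": "chemical/x-mdl-molfile",
}


def compound_request(compound_uri, fmt):
    """Build (but do not send) a GET request for one representation of a compound."""
    return Request(compound_uri, headers={"Accept": CHEMICAL_MIME[fmt]}, method="GET")


uri = "http://apps.ideaconsult.net:8080/ambit2/compound/1"
for fmt in CHEMICAL_MIME:
    req = compound_request(uri, fmt)
    # The URI stays the same; only the Accept header selects the format.
    print(req.get_method(), req.full_url, req.get_header("Accept"))
```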
Resources: Feature
Feature: a Feature is a resource representing any kind of property or identifier assigned to a Compound. The feature types are determined via their links to ontologies (Feature ontologies, Descriptor ontologies, Endpoints ontologies).
  /feature/{id}
Documentation: http://opentox.org/dev/apis/api-1.1/feature
Representation
• ot:Feature, a subclass of ot:OpenToxResource
• Mandatory RDF/XML format
Properties
• Name: defined by dc:title (Dublin Core namespace)
• Units: defined by the ot:units annotation property (OpenTox namespace)
• Creator: defined by the dc:creator annotation property (Dublin Core namespace)
• The origin of the Feature is defined by the ot:hasSource object property (OpenTox namespace) and can be an ot:Algorithm, ot:Model or ot:Dataset
• Relations to other resources which represent the same entity can be established via the owl:sameAs property. This approach can be used, for example, to link the ot:Feature resource to a resource from another ontology (an example follows)
• The subclasses of ot:Feature, namely ot:NumericFeature, ot:StringFeature and ot:NominalFeature, denote whether a feature holds numeric, string or nominal values
Resources: Feature (an example)
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/feature/22114
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns:otee="http://www.opentox.org/echaEndpoints.owl#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns="http://apps.ideaconsult.net:8080/ambit2/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#Algorithm"/>
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#Feature"/>
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#NumericFeature">
    <rdfs:subClassOf rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </owl:Class>
  <ot:NumericFeature rdf:about="feature/22114">
    <ot:hasSource>
      <ot:Algorithm rdf:about="algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor"/>
    </ot:hasSource>
    <owl:sameAs rdf:resource="otee:Octanol-water_partition_coefficient_Kow"/>
    <dc:title>XLogP</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </ot:NumericFeature>
</rdf:RDF>
The example shows an OpenTox feature with the title “XLogP”, identified by the URI http://apps.ideaconsult.net:8080/ambit2/feature/22114
• Linked to an entry of a simplified ontology of toxicological endpoints
The algorithm used to generate values for this feature
• Specified by the ot:hasSource property
• Identified by the URI http://apps.ideaconsult.net:8080/ambit2/algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor
• Note: the URI identifies an OpenTox Algorithm resource
• The algorithm URI itself is dereferenceable
• Can be used to initiate calculations of the XLogP descriptor
Resources: Feature (an example)
$ curl -H "Accept:text/n3" http://apps.ideaconsult.net:8080/ambit2/feature/26184
@prefix ot: <http://www.opentox.org/api/1.1#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix : <http://apps.ideaconsult.net:8080/ambit2/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix af: <http://apps.ideaconsult.net:8080/ambit2/feature/> .
ot:hasSource
    a owl:ObjectProperty .
ot:units
    a owl:DatatypeProperty .
ot:Feature
    a owl:Class .
ot:NumericFeature
    a owl:Class ;
    rdfs:subClassOf ot:Feature .
af:26184
    a ot:Feature , ot:NumericFeature ;
    dc:creator "http://ambit.uni-plovdiv.bg:8080/ambit2" ;
    dc:title "TUM_CDK_XLogP" ;
    ot:hasSource <http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem/XLogPDescriptor> ;
    ot:units "" ;
    = otee:Octanol-water_partition_coefficient_Kow .
An example (N3) of another ot:Feature resource: the XLogP descriptor (again)
• Generated by a different implementation
• In this case its name is “TUM_CDK_XLogP”, and the algorithm resource used to generate it resides at the Technical University of Munich (TUM): http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem/XLogPDescriptor
  This algorithm URL could also be used to initiate descriptor calculations
• The representation of the Algorithm resource refers to the BlueObelisk ontology entry http://www.blueobelisk.org/ontologies/chemoinformatics-algorithms/#xlogP
Toxicity endpoints ontologies
• Derived from the ECHA classification of endpoints published in the REACH guidance documents
• Physicochemical properties and various toxicological endpoints
• The hierarchy does not represent the complexity of toxicological assays, but can be used as a first approximation to assign meaning to the data entries and to generate a REACH report
• More specific descriptions of toxicological assays can be used as well
• Ontologies for specific toxicity assays are developed by OpenTox partners
• The ECHA endpoints ontology: http://www.opentox.org/echaEndpoints.owl
Resources: a feature
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/feature/3
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns="http://apps.ideaconsult.net:8080/ambit2/"
    xmlns:af="http://apps.ideaconsult.net:8080/ambit2/feature/"
    …
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Feature rdf:about="feature/3">
    <dc:creator/>
    <ot:hasSource>ECHA…</ot:hasSource>
    <owl:sameAs rdf:resource="http://www.opentox.org/api/1.1#EINECS"/>
    <ot:units></ot:units>
    <dc:title>EC</dc:title>
  </ot:Feature>
</rdf:RDF>
An illustration of an ot:Feature imported from a file rather than calculated. The example shows a feature representing the EINECS number, imported from the ECHA preregistration list.
Feature summary
• An OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs
• These URIs are dereferenceable
• Features allow different levels of meaning to be assigned by linking to ontology entries (e.g. algorithms or toxicological endpoints), as well as by linking to the algorithms that can be used to generate property values
• The same approach can be used to denote assays (provided that the assay is defined by an ontology), species, functional groups, etc.
Resources: Dataset
Dataset: provides access to chemical compounds and their features (e.g. structural, physical-chemical, biological, toxicological properties)
Operations
• POST – Upload a dataset
• PUT – Update the dataset content
• DELETE – Remove the dataset
Representation: RDF/XML (mandatory)
• The dataset consists of data entries
• Each entry is associated with exactly one chemical compound, identified by its URI and available via the OpenTox Compound service API
• One and the same compound can be associated with multiple dataset entries
• Every “column” is associated with a Feature; its representation should be available via the OpenTox Feature API
Resources: Dataset
Representation
The dataset services optionally support formats other than RDF:
• text/csv (comma delimited)
• text/x-arff (Weka ARFF)
• application/pdf
• chemical/x-mdl-sdfile
• other Chemical MIME formats
This allows the same data to be retrieved in a convenient format, but the URL links to compound and feature resources are lost.
@prefix ad: <http://apps.ideaconsult.net:8080/ambit2/dataset/> .
@prefix af: <http://apps.ideaconsult.net:8080/ambit2/feature/> .
@prefix ot: <http://www.opentox.org/api/1.1#> .
…
ad:9  a ot:Dataset ;
    ot:dataEntry
        [ a ot:DataEntry ;
          ot:compound <http://apps.ideaconsult.net:8080/ambit2/compound/413/conformer/409421> ;
          ot:values
              [ a ot:FeatureValue ;
                ot:feature af:21576 ;
                ot:value "3.309999942779541"^^xsd:double
              ] ;
          ot:values
              [ a ot:FeatureValue ;
                ot:feature af:21573 ;
                ot:value "3.0"^^xsd:double
              ]
        ] .
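The relation between data entries and table "columns" can be made concrete with a small sketch. It assumes the single data entry from the N3 example has already been parsed into plain Python dictionaries (the parsing step is not shown); to_table is a hypothetical helper, not part of any OpenTox library.

```python
AF = "http://apps.ideaconsult.net:8080/ambit2/feature/"

# The single data entry from the N3 example, written as plain Python data.
data_entries = [
    {
        "compound": "http://apps.ideaconsult.net:8080/ambit2/compound/413/conformer/409421",
        "values": [
            {"feature": AF + "21576", "value": 3.309999942779541},
            {"feature": AF + "21573", "value": 3.0},
        ],
    },
]


def to_table(entries):
    """Return (column feature URIs, rows) with one row per data entry."""
    # Every distinct feature URI becomes one column.
    columns = sorted({v["feature"] for e in entries for v in e["values"]})
    rows = []
    for entry in entries:
        by_feature = {v["feature"]: v["value"] for v in entry["values"]}
        # Missing values stay None, so all rows have the same width.
        rows.append([entry["compound"]] + [by_feature.get(c) for c in columns])
    return columns, rows


columns, rows = to_table(data_entries)
print(columns)
print(rows)
```

Unlike a CSV export, the column headers here remain feature URIs, so the links to the Feature resources are preserved.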
Dataset metadata and features
Description and URI template
• Retrieve the entire dataset content (if uri-list is requested, retrieve only compound URIs): http://host:port/dataset/{id}
• Retrieve the representation of the features (columns) of the dataset: http://host:port/dataset/{id}/feature
• Retrieve the dataset metadata (name, etc.): http://host:port/dataset/{id}/metadata
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/dataset/9/metadata
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    ……
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Dataset rdf:about="dataset/9">
    <dc:source>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:source>
    <dc:publisher>somebody</dc:publisher>
    <rdfs:seeAlso>
      <bx:Entry rdf:about="reference/20117">
        <rdfs:seeAlso>http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html</rdfs:seeAlso>
        <dc:title>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:title>
      </bx:Entry>
    </rdfs:seeAlso>
    <dc:title>ISSCAN: Istituto Superiore di Sanita, CHEMICAL CARCINOGENS: STRUCTURES AND EXPERIMENTAL DATA</dc:title>
  </ot:Dataset>
</rdf:RDF>
Data publishing
[Diagram: Dataset service (structures, endpoints & predictions), accessed via HTTP GET and POST: upload data and receive a dataset URL, http://host2/dataset/{id}; annotation; find chemical compounds and return a dataset URL, http://host2/compound/{id}]
1) POST a file with chemical structures and properties to the OpenTox dataset service
   • The structures and data are assigned a dataset URL and become available in multiple formats (RDF, Chemical MIME, CSV, Weka ARFF)
2) Assign metadata
   • PUT /dataset/{id}/metadata
3) Annotate any of the dataset features (/dataset/{id}/feature) by assigning links to relevant ontologies
   • PUT /feature/{id}
[Diagram labels: toxicology-related ontologies; algorithms ontologies]
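The three publishing steps map onto a short sequence of HTTP calls, sketched below as data rather than as live requests. The upload path /dataset is an assumption inferred from the /dataset/{id} URI template, the base URL and identifiers are placeholders, and publishing_plan is a hypothetical helper. In practice the dataset id is only known once step 1 has returned the new dataset URL.

```python
def publishing_plan(base_url, dataset_id, feature_id):
    """List the (method, URI) pairs for the three publishing steps, in order."""
    return [
        # 1) upload a file with structures and properties (assumed upload path)
        ("POST", f"{base_url}/dataset"),
        # 2) assign metadata to the dataset created in step 1
        ("PUT", f"{base_url}/dataset/{dataset_id}/metadata"),
        # 3) annotate one of the dataset features with ontology links
        ("PUT", f"{base_url}/feature/{feature_id}"),
    ]


plan = publishing_plan("http://host:8080", dataset_id=9, feature_id=3)
for method, uri in plan:
    print(method, uri)
```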
Resources: Algorithm
Algorithm: provides access to OpenTox algorithms. There are several algorithm services developed by different OpenTox partners. The list of algorithms can be retrieved by an HTTP GET operation at http://host:port/algorithm
$ curl -H "Accept:text/uri-list" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNclassification
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/PLSregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/M5P
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/GaussP
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/FTMsmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/gSpansmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/JOELIB2
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/InfoGainAttributeEval
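A text/uri-list response such as the one above is simple to consume. The sketch below assumes the body has already been fetched and uses the first three URIs from the slide as sample input; parse_uri_list and algorithm_name are hypothetical helpers.

```python
SAMPLE_BODY = """\
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNclassification
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNregression
"""


def parse_uri_list(body):
    """Parse a text/uri-list body: one URI per line, lines starting with '#' are comments."""
    uris = []
    for line in body.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            uris.append(line)
    return uris


def algorithm_name(uri):
    """Return the last path segment of an algorithm URI."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


algorithm_uris = parse_uri_list(SAMPLE_BODY)
print([algorithm_name(u) for u in algorithm_uris])
```

Each returned URI can then be dereferenced with an RDF Accept header to obtain the algorithm representation.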
Resources: Algorithm
Representation
• Multiple types of algorithms
  – descriptor calculation algorithms
  – machine learning procedures
  – data preprocessing
• The representation of algorithms is again defined by the OpenTox ontology, where all algorithms are subclasses of ot:Algorithm
Algorithm types ontology: http://opentox.org/data/documents/development/RDF%20files/AlgorithmTypes
• Provides a classification of algorithm types
• The algorithm type in the RDF representation is set by typing the resource directly (rdf:type) with a class from the algorithm types ontology (ota: http://www.opentox.org/algorithms.owl#),
  e.g. <myalgorithm> rdf:type ota:Classification
Resources Algorithm
Ideaconsult Ltd36
$ curl -H Acceptapplicationrdf+xml
httpappsideaconsultnet8080ambit2algorithmJ48
ltrdfRDF
xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns
xmlnsot=httpwwwopentoxorgapi11
hellip
xmlnsota=httpwwwopentoxorgalgorithmTypesowl
ltotaSupervised
rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt
ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring
gtClassification Decision tree J48ltdctitlegt
ltrdftype
rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt
ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring
gtltdcdescriptiongt
ltrdftype
rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt
ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI
gtSomebodyltdcpublishergt
ltrdftype
rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt
ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt
ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime
gtSat Jul 31 171126 EEST 2010ltdcdategt
ltotaSupervisedgt
ltrdfRDFgt
RepresentationbullAlgorithm name is defined by dctitle
Parameters supported by the algorithm are
specified via object property otparameters
and should be of class otParameter (as
defined in opentoxowl)
These entries serve as a information what
parameters are required in order to run the
algorithm the values itself should be
provided by the client when initiating the
calculations via POST
bullAlgorithm types are distinguished by
means of Algorithm types ontology
Resources Model
bull Representations of predictive models
bull A Model is created by HTTP POST to an
otAlgorithm with specific parameters
andor input otDataset
Ideaconsult Ltd37
Representation
bull Model Name is defined by dctitle property
bull Model creator might be defined by dccreator
property
bull The date of Model creation is defined by dcdate
property
bull The Algorithm defined by otalgorithm object
property
bull The independent variables are instances of
otFeature defined by otindependentVariables
property (can be multiple)
bull The dependent variables are are instances of
otFeature and are defined by
otdependentVariables property (can be multiple)
bull The variables where prediction results will be
stored are are instances of otFeature and are
defined by otpredictedVariables (can be multiple)
bull Parameters are defined by otparameters
bull The training Dataset is an instance of otDataset and
defined by ottrainingDataset
Algorithm
service
Linked resources Compound Algorithm Model Dataset Features
Ideaconsult Ltd38
Model
resourceDataset
Resource
Descriptor
resource
Assay
resource
Chemical
compound
Feature
service
Feature
service
Compoun
d service
Linked resources Compound Algorithm Model Dataset Features
Ideaconsult Ltd39
Dataset
Resource
Descriptor
resource
Assay
resource
Chemical
compound
Regression
Classification
Quantum
Chemistry
Descriptorsetc
Blue Obelisk
algorithms
ontology
OpenTox
algorithm types
ontology
Toxiciology related
ontologies
Make the model available
40Ideaconsult Ltd
Register at OpenTox ontology servicendash RDF tripple storage
ndash Accepts HTTP POST
ndash SPARQL endpoint
Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology
Becomes visible for applications
modelidPublished models
Algorithms
Ontologies
metadata
Ontology service
HTTP POST
Model service
Services implementation by partner and
service type
Ideaconsult Ltd41
Part
ner
No
Serv
ice t
ype
Com
pound
Data
set
Featu
re
Alg
ori
thm
(pro
cess
ing)
Alg
ori
thm
(model)
Model
Task
Report
Validati
on
Auth
ern
ticati
on
and A
uth
ori
sati
on
serv
ice
Onto
logy s
erv
ice
2 Y Y Y Y Y Y
3
(IDEA)
Y Y Y Y Y Y Y Y
5 Y Y Y
6 Y Y Y Y
7 Y Y Y
10 Y
All components are implemented as REST web services
There could be multiple implementations of same type of
components
(Subset of) services could be hosted by the same provider or
by multiple providers on separate locations
httpappsideaconsultnet8080ambit2
OpenTox Is A Framework
Framework
Unified Access
Open Source
bull Toxicity data
bull Predictive models
bull Validation support
bull Interpretation aids
bull Toxicologists
bull Modelers
bull API for new algorithmsdevelopment amp integration
bull To optimise impact
bull To allow inspection review
bull To attract external contributors
OpenTox services can be used to develop specific applications or embedded in
workflow systems
bull Two end user oriented demo applications making use of OpenTox
webservices have been developed deployed and are available for
testing ndash httptoxcreateorg and httptoxpredictorg
bull ToxCreate creates models from user supplied datasets
bull ToxPredict uses existing OpenTox models to estimate chemical
compound properties
Demo applications
43Ideaconsult LtdAugust 22
2010
RDF Lessons learned
bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project
bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)
resource representation described by triples
bull Steep learning curve
bull Some hard topicsndash Data model vs format
ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure
ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF
Ideaconsult Ltd44
RDF Performance
RDF representation is
verbose hellip
hellip and in-memory RDF
libraries are slow hellip
Ideaconsult Ltd45
A dataset with 320 chemicals 60 columns
A dataset with 6500 chemicals 12 columns
hellip lack of streaming parserswriters hellip
RDF Wish list
bull Convenient explanation of subject-predicate-
object concept for beginners
bull A (high performant) triple storage should not be
a mandatory requirement to publish RDF data
bull Fast streaming parsers and writers
bull (terse) JSON serialisation
bull Security
bull Synchronisation of distributed RDF content
Ideaconsult Ltd46
Thank you
Ideaconsult Ltd47
Build an application with OpenTox REST Web Services API
httpopentoxorgdevapisapi-11
Download AMBIT Implementation of OpenTox API
and launch your OpenTox service
httpambitsourceforgenet
RDF - Resources representation
bull The opentoxowl ontology
ndash A common OWL data model of all
OpenTox resources
ndash Describes OpenTox resources
ndash Describes relationships between them
ndash Generates objects RDF representations
bull RDFXML representation is mandatory for
OpenTox resources
bull Uniform approach to data representation
ndash Calculated and measured properties of
chemical compounds are represented in an
uniform way
ndash Linked to the resource used for data
generation
ndash Annotated via ontology entries
ndash Model representations link to algorithms
and data used
bull Ideaconsult Ltd
22
All OpenTox components are defined by
OWL ontology
httpopentoxorgapi11opentoxowl
All resources are subclasses of
otOpenToxResource
Resources Chemical compound
Ideaconsult Ltd23
$ curl -H Acceptchemicalx-daylight-smiles
httpappsideaconsultnet8080ambit2compound1
O=C
$ curl -H Acceptchemicalx-mdl-molfile
httpappsideaconsultnet8080ambit2compound1
CH2O
APtclcactv09040902283D 0 000000 000000
4 3 0 0 0 0 0 0 0 0999 V2000
-06004 00000 00001 O 0 0 0 0 0 0 0 0 0 0 0 0
06072 00000 -00004 C 0 0 0 0 0 0 0 0 0 0 0 0
11472 09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0
11472 -09353 00016 H 0 0 0 0 0 0 0 0 0 0 0 0
1 2 2 0 0 0 0
2 3 1 0 0 0 0
2 4 1 0 0 0 0
CompoundProvides different representations for
chemical compounds with a unique and
defined chemical structurecompoundid
Conformercompoundidconformerid
Documentationhttpopentoxorgdevapisapi-11structure
RepresentationA subclass of otOpenToxResource
Supports different Chemical MIME
formats
RDF representation only for specifying
owlsameAs links to external resources
Example 1 Retrieve compound as MOL
Example 2 Retrieve compound as SMILES
Example 3 Query compounds
$ curl ndashH Acceptchemical-mime ldquo
httpappsideaconsultnet8080ambit2querycompoundany-identifier-or-keyword
$ curl ndashH Acceptchemical-mime ldquo
httpappsideaconsultnet8080ambit2querysmartssearch=smarts
Resources Feature
Ideaconsult Ltd24
FeatureA Feature is a resource representing any
kind of a property or identifier assigned to
a Compound
The feature types are determined via their
links to ontologies (Feature ontologies
Decsriptor ontologies Endpoints
ontologies)featureid
Documentationhttpopentoxorgdevapisapi-11feature
RepresentationotFeature a subclass of
otOpenToxResource
Mandatory RDFXML format
Properties
bullName defined by dctitle (Dublin Core namespace )
bullUnits defined by otunits annotation property (OpenTox
namespace)
bullCreator defined by dccreator annotation property
(Dublin Core namespace)
bullThe origin of the Feature is defined by othasSource
object property (OpenTox namespace) element and can be
otAlgorithm otModel or otDataset
bullRelations to other resources which represent the same
entity could be established via owlsameAs property
This approach can be used for example to link the
otFeature resource to a resource from another ontology
(an example follows)
bullThere are subclasses of otFeature (namely) which are
used otNumericFeature otStringFeature
otNominalFeature denote if a feature holds numeric
nominal or string values
Resources Feature (an example)
Ideaconsult Ltd25
$ curl -H Acceptapplicationrdf+xml
httpappsideaconsultnet8080ambit2feature22114
ltrdfRDF
xmlnsot=httpwwwopentoxorgapi11
xmlnsotee=httpwwwopentoxorgechaEndpointsowl
xmlnsdc=httppurlorgdcelements11
xmlns=httpappsideaconsultnet8080ambit2
xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns
xmlnsowl=httpwwww3org200207owl
xmlnsxsd=httpwwww3org2001XMLSchema
xmlnsrdfs=httpwwww3org200001rdf-schema
xmlbase=httpappsideaconsultnet8080ambit2gt
ltowlClass rdfabout=httpwwwopentoxorgapi11Algorithmgt
ltowlClass rdfabout=httpwwwopentoxorgapi11Featuregt
ltowlClass rdfabout=httpwwwopentoxorgapi11NumericFeaturegt
ltrdfssubClassOf
rdfresource=httpwwwopentoxorgapi11Featuregt
ltowlClassgt
ltotNumericFeature rdfabout=feature22114gt
ltothasSourcegt
ltotAlgorithm
rdfabout=algorithmorgopensciencecdkqsardescriptorsmolecularXLo
gPDescriptorgt
ltothasSourcegt
ltowlsameAs rdfresource=ldquooteeOctanol-
water_partition_coefficient_Kowgt
ltdctitlegtXLogPltdctitlegt
ltrdftype rdfresource=httpwwwopentoxorgapi11Featuregt
ltotNumericFeaturegt
ltrdfRDFgt
The example shows an OpenTox feature with title ldquoXLogPrdquo and identified by the URI httpappsideaconsultnet8080ambit2feature22114
bullLinked to an entry of a simplified ontology of toxicological endpoints
The algorithm used to generate values for this featurebullSpecified by otalgorithm propertybullidentified by the URIhttpappsideaconsultnet8080ambit2algorithmorgopensciencecdkqsardescriptorsmolecularXLogPDescriptor
bullNote the URI identifies an OpenTox Algorithm resourcebullThe algorithm URI itself is dereferensable bullCan be used to initiate calculations of XLogP descriptor
Resources Feature (an example)
Ideaconsult Ltd26
$ curl -H Accepttextn3 httpappsideaconsultnet8080ambit2feature26184
prefix ot lthttpwwwopentoxorgapi11gt
prefix dc lthttppurlorgdcelements11gt
prefix lthttpappsideaconsultnet8080ambit2gt
prefix rdfs lthttpwwww3org200001rdf-schemagt
prefix owl lthttpwwww3org200207owlgt
prefix xsd lthttpwwww3org2001XMLSchemagt
prefix rdf lthttpwwww3org19990222-rdf-syntax-nsgt
prefix af lthttpappsideaconsultnet8080ambit2featuregt
othasSource
a owlObjectProperty
otunits
a owlDatatypeProperty
otFeature
a owlClass
otNumericFeature
a owlClass
rdfssubClassOf otFeature
af26184
a otFeature otNumericFeature
dccreator httpambituni-plovdivbg8080ambit2
dctitle TUM_CDK_XLogP
othasSource lthttpopentoxinformatiktu-muenchende8080OpenTox-
devalgorithmCDKPhysChemXLogPDescriptorgt
otunits
= oteeOctanol-water_partition_coefficient_Kow
An example (N3) of another otFeature
resource XLogP descriptor (again)
bullGenerated by different implementation
bullIn this case its name is
ldquoTUM_CDK_XLogPrdquo and the algorithm
resource used to generate resides at Technical
University of Munich (TUM) premises
httpopentoxinformatiktu-muenchende8080OpenTox-
devalgorithmCDKPhysChemXLogPDescriptor
This algorithm URL could also be used to
initiate descriptor calculations
bullThe representation of Algorithm resources
refers to the BlueObelisk ontology entry httpwwwblueobeliskorgontologieschemoinformatics-
algorithmsxlogP
Toxicity endpoints ontologies
Ideaconsult Ltd27
bullDerived from ECHA classification of
endpoints published in REACH guidance
documents
bullPhysicochemical properties and various
toxicological endpoints
bullThe hierarchy doesnrsquot represent the
complexity of toxicological assays but
can be used as a first approximation to
assign meaning to the data entries and
generate REACH report
bullMore specific description of toxicological
assays can be used as well
bullOntologies for specific toxicity assays are
developed by OpenTox partners
bullThe ECHA endpoints ontology httpwwwopentoxorgechaEndpointsowl
Resources a feature
Ideaconsult Ltd28
curl -H Acceptapplicationrdf+xml
httpappsideaconsultnet8080ambit2feature3
ltrdfRDF
xmlnsot=httpwwwopentoxorgapi11
xmlns=httpappsideaconsultnet8080ambit2
xmlnsaf=httpappsideaconsultnet8080ambit2feature
hellip
xmlbase=httpappsideaconsultnet8080ambit2gt
ltotFeature rdfabout=feature3gt
dccreatorgt
ltothasSourcegtECHAhellipltothasSourcegt
ltowlsameAs rdfresource=httpwwwopentoxorgapi11EINECSgt
ltotunitsgtltotunitsgt
ltdctitlegtECltdctitlegt
ltotFeaturegt
ltrdfRDFgt
An illustration of otFeature
imported from a file and not
calculated
The example shows a feature
representing EINECS
number imported from the
ECHA preregistration list
Feature summary
bull OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs
bull These URIs are dereferencable
bull Allow to assign different levels of meaning by linking to entries to ontologies (eg algorithms or toxicological endpoints ) as well as linking to the algorithms which can be used to generated property values
bull The same approach can be used to denote assays provided that the assay is defined by an ontology species functional groups etc
Ideaconsult Ltd29
Resources: Dataset
Dataset
Provides access to chemical compounds and their features (e.g. structural, physical-chemical, biological, toxicological properties)
Operations
POST – Upload a dataset
PUT – Update the dataset content
DELETE – Remove the dataset
Representation
RDF/XML (mandatory)
• The dataset consists of data entries
• Each entry is associated with exactly one chemical compound, identified by its URI and available via the OpenTox Compound service API
• One and the same compound can be associated with multiple dataset entries
• Every "column" is associated with a Feature; its representation should be available via the OpenTox Feature API
Resources: Dataset
Representation
The dataset services optionally support formats other than RDF:
• text/csv (comma delimited)
• text/x-arff (Weka ARFF)
• application/pdf
• chemical/x-mdl-sdfile
• other Chemical MIME formats
This allows the same data to be retrieved in a convenient format, but the URL links to compound and feature resources are lost.
@prefix ad: <http://apps.ideaconsult.net:8080/ambit2/dataset/> .
@prefix af: <http://apps.ideaconsult.net:8080/ambit2/feature/> .
@prefix ot: <http://www.opentox.org/api/1.1#> .
…
ad:9 a ot:Dataset ;
  ot:dataEntry
    [ a ot:DataEntry ;
      ot:compound <http://apps.ideaconsult.net:8080/ambit2/compound/413/conformer/409421> ;
      ot:values
        [ a ot:FeatureValue ;
          ot:feature af:21576 ;
          ot:value "3.309999942779541"^^xsd:double
        ] ;
      ot:values
        [ a ot:FeatureValue ;
          ot:feature af:21573 ;
          ot:value "3.0"^^xsd:double
        ]
    ] .
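The relationship between data entries, feature values and table "columns" can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the class names FeatureValue and DataEntry mirror the ot: classes but are plain Python dataclasses, to_table is a hypothetical helper, and the two values are read from the N3 example as 3.309999942779541 and 3.0.

```python
from dataclasses import dataclass, field

FEATURE_NS = "http://apps.ideaconsult.net:8080/ambit2/feature/"


@dataclass
class FeatureValue:
    feature: str  # feature URI, i.e. the "column" identifier
    value: float


@dataclass
class DataEntry:
    compound: str  # compound (or conformer) URI
    values: list = field(default_factory=list)


def to_table(entries):
    """Return (header, rows): one column per distinct feature URI, one row per entry."""
    columns = []
    for entry in entries:
        for fv in entry.values:
            if fv.feature not in columns:
                columns.append(fv.feature)
    rows = []
    for entry in entries:
        by_feature = {fv.feature: fv.value for fv in entry.values}
        # Entries lacking a feature get None in that column.
        rows.append([entry.compound] + [by_feature.get(c) for c in columns])
    return ["compound"] + columns, rows


entry = DataEntry(
    compound="http://apps.ideaconsult.net:8080/ambit2/compound/413/conformer/409421",
    values=[
        FeatureValue(FEATURE_NS + "21576", 3.309999942779541),
        FeatureValue(FEATURE_NS + "21573", 3.0),
    ],
)
header, rows = to_table([entry])
print(header)
print(rows)
```

Unlike a CSV export, the header here keeps the feature URIs, so each column can still be dereferenced.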
Dataset metadata and features
Description | URI template
Retrieve the entire dataset content; if text/uri-list is requested, retrieve only the compound URIs | http://host:port/dataset/{id}
Retrieve the representation of the features (columns) of the dataset | http://host:port/dataset/{id}/feature
Retrieve the dataset metadata (name, etc.) | http://host:port/dataset/{id}/metadata
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/dataset/9/metadata
<rdf:RDF
  xmlns:ot="http://www.opentox.org/api/1.1#"
  ……
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Dataset rdf:about="dataset/9">
    <dc:source>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:source>
    <dc:publisher>somebody</dc:publisher>
    <rdfs:seeAlso>
      <bx:Entry rdf:about="reference/20117">
        <rdfs:seeAlso>http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html</rdfs:seeAlso>
        <dc:title>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:title>
      </bx:Entry>
    </rdfs:seeAlso>
    <dc:title>ISSCAN: Istituto Superiore di Sanita, CHEMICAL CARCINOGENS: STRUCTURES AND EXPERIMENTAL DATA</dc:title>
  </ot:Dataset>
</rdf:RDF>
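The three URI templates can be expanded mechanically. The sketch below assumes a service root that stands in for "http://host:port" (the example deployment also carries an /ambit2 path prefix); the names TEMPLATES and expand are hypothetical.

```python
# Templates from the table, with the service root factored out as {base}.
TEMPLATES = {
    "content": "{base}/dataset/{id}",
    "features": "{base}/dataset/{id}/feature",
    "metadata": "{base}/dataset/{id}/metadata",
}


def expand(name, base, dataset_id):
    """Fill one of the dataset URI templates."""
    return TEMPLATES[name].format(base=base.rstrip("/"), id=dataset_id)


base = "http://apps.ideaconsult.net:8080/ambit2"
for name in TEMPLATES:
    print(name, expand(name, base, 9))
```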
Data publishing
[Diagram: the Dataset service (HTTP GET, POST) holds structures, endpoints & predictions. Upload data, receive a dataset URL http://host2/dataset/{id}. Find chemical compounds, return a dataset URL http://host2/compound/{id}. Annotation links to toxicology-related ontologies and algorithm ontologies.]
1) POST a file with chemical structures and properties to the OpenTox dataset service
• The structures and data are assigned a dataset URL and become available in multiple formats (RDF, Chemical MIME, CSV, Weka ARFF)
2) Assign metadata
• PUT /dataset/{id}/metadata
3) Annotate any of the dataset features /dataset/{id}/feature by assigning links to relevant ontologies
• PUT /feature/{id}
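The three publishing steps can be sketched as HTTP requests built with Python's standard library. Nothing is sent here; the requests are only constructed and printed. The content types are assumptions (the slides do not say which media types the service expects on upload or on PUT), the function names are hypothetical, and in practice the dataset URL used in step 2 would come from the response to step 1.

```python
from urllib.request import Request

# Assumed media type for the PUT bodies.
RDF_XML = {"Content-Type": "application/rdf+xml"}


def upload_request(service, sdf_bytes):
    """Step 1: POST a structure file to the dataset service."""
    return Request(f"{service}/dataset", data=sdf_bytes, method="POST",
                   headers={"Content-Type": "chemical/x-mdl-sdfile"})


def metadata_request(dataset_url, metadata_rdf):
    """Step 2: PUT metadata to <dataset URL>/metadata."""
    return Request(f"{dataset_url}/metadata", data=metadata_rdf,
                   method="PUT", headers=RDF_XML)


def annotation_request(feature_url, feature_rdf):
    """Step 3: PUT a feature representation that carries ontology links."""
    return Request(feature_url, data=feature_rdf, method="PUT", headers=RDF_XML)


service = "http://host2"  # placeholder host, as on the slide
step1 = upload_request(service, b"...SDF content...")
step2 = metadata_request(f"{service}/dataset/9", b"<rdf:RDF/>")
step3 = annotation_request(f"{service}/feature/3", b"<rdf:RDF/>")
for req in (step1, step2, step3):
    print(req.get_method(), req.full_url)
```

Passing each request to urllib.request.urlopen would execute it against a live service.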
Resources: Algorithm
Algorithm
Provides access to OpenTox algorithms. There are several algorithm services, developed by different OpenTox partners. The list of algorithms can be retrieved by an HTTP GET operation at http://host:port/algorithm
$ curl -H "Accept:text/uri-list" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNclassification
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/PLSregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/M5P
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/GaussP
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/FTMsmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/gSpansmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/JOELIB2
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/InfoGainAttributeEval
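A text/uri-list response such as the one above is simple to consume. The following sketch parses a response body into a list of URIs, skipping blank lines and "#" comment lines as the text/uri-list format allows. The sample body is a shortened, hand-written stand-in for a real response, and parse_uri_list is a hypothetical helper name.

```python
def parse_uri_list(body):
    """Return the URIs in a text/uri-list body, ignoring comments and blank lines."""
    uris = []
    for line in body.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            uris.append(line)
    return uris


BASE = "http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm"
body = f"# algorithms\r\n{BASE}/J48\r\n{BASE}/M5P\r\n\r\n"

algorithms = parse_uri_list(body)
# The last path segment serves as a short display name.
names = [uri.rsplit("/", 1)[-1] for uri in algorithms]
print(names)
```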
Resources: Algorithm
Representation
• Multiple types of algorithms
  – descriptor calculation algorithms
  – machine learning procedures
  – data preprocessing
• The representation of algorithms is again defined by the OpenTox ontology, where all algorithms are subclasses of ot:Algorithm
Algorithm types ontology
http://opentox.org/data/documents/development/RDF%20files/AlgorithmTypes
• Provides a classification of algorithm types
• The algorithm type in the RDF representation is set by declaring the algorithm an instance (rdf:type) of a class from the algorithm types ontology (ota: http://www.opentox.org/algorithms.owl), e.g. <myalgorithm> rdf:type ota:Classification
Resources: Algorithm
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/algorithm/J48
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:ot="http://www.opentox.org/api/1.1#"
  …
  xmlns:ota="http://www.opentox.org/algorithmTypes.owl#">
  <ota:Supervised rdf:about="http://apps.ideaconsult.net:8080/ambit2/algorithm/J48">
    <dc:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Classification Decision tree J48</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#Classification"/>
    <dc:description rdf:datatype="http://www.w3.org/2001/XMLSchema#string"></dc:description>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#SingleTarget"/>
    <dc:publisher rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">Somebody</dc:publisher>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#EagerLearning"/>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Algorithm"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">Sat Jul 31 17:11:26 EEST 2010</dc:date>
  </ota:Supervised>
</rdf:RDF>
Representation
• The algorithm name is defined by dc:title
• Parameters supported by the algorithm are specified via the object property ot:parameters and should be of class ot:Parameter (as defined in opentox.owl). These entries indicate which parameters are required in order to run the algorithm; the values themselves should be provided by the client when initiating the calculations via POST
• Algorithm types are distinguished by means of the Algorithm types ontology
Resources: Model
• Representations of predictive models
• A Model is created by an HTTP POST to an ot:Algorithm with specific parameters and/or an input ot:Dataset
Representation
• The Model name is defined by the dc:title property
• The Model creator might be defined by the dc:creator property
• The date of Model creation is defined by the dc:date property
• The Algorithm is defined by the ot:algorithm object property
• The independent variables are instances of ot:Feature, defined by the ot:independentVariables property (can be multiple)
• The dependent variables are instances of ot:Feature and are defined by the ot:dependentVariables property (can be multiple)
• The variables where prediction results will be stored are instances of ot:Feature and are defined by ot:predictedVariables (can be multiple)
• Parameters are defined by ot:parameters
• The training Dataset is an instance of ot:Dataset and is defined by ot:trainingDataset
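Creating a model by POSTing to an algorithm can be sketched as follows. The request is only constructed, not sent. The form parameter name dataset_uri is an assumption that does not appear on the slides, as is the choice of form encoding and the text/uri-list Accept header; training_request is a hypothetical helper.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request


def training_request(algorithm_uri, dataset_uri, extra_params=None):
    """Build a POST to an algorithm URI with a training dataset and optional parameters."""
    params = {"dataset_uri": dataset_uri}  # assumed parameter name
    params.update(extra_params or {})
    return Request(
        algorithm_uri,
        data=urlencode(params).encode("ascii"),
        method="POST",
        headers={"Accept": "text/uri-list"},
    )


req = training_request(
    "http://apps.ideaconsult.net:8080/ambit2/algorithm/J48",
    "http://apps.ideaconsult.net:8080/ambit2/dataset/9",
)
sent = parse_qs(req.data.decode("ascii"))
print(req.get_method(), req.full_url, sent)
```

Which algorithm parameters are required would be read from the ot:parameters entries of the algorithm representation before building the request.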
Linked resources: Compound, Algorithm, Model, Dataset, Features
[Diagram: Model resource, Dataset resource, Descriptor resource, Assay resource and Chemical compound, linked through the Algorithm service, Feature service and Compound service]
Linked resources: Compound, Algorithm, Model, Dataset, Features
[Diagram: Dataset resource, Descriptor resource, Assay resource and Chemical compound, linked to ontologies. Labels: Regression, Classification, Quantum Chemistry, Descriptors etc.; Blue Obelisk algorithms ontology; OpenTox algorithm types ontology; toxicology-related ontologies]
Make the model available
Register at the OpenTox ontology service
– RDF triple storage
– Accepts HTTP POST
– SPARQL endpoint
$ curl -X POST -d "uri=http://apps.ideaconsult.net:8080/ambit2/model/57" http://apps.ideaconsult.net:8080/ontology
The model becomes visible to applications
[Diagram: Model service /model/{id} → HTTP POST → Ontology service holding published models, algorithms, ontologies and metadata]
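The registration call is the same curl command expressed with Python's standard library; the request is built but not sent. The SPARQL text is an assumption about how a client might then list registered models: it relies only on the ot:Model class and the OpenTox namespace, and how the endpoint expects to receive the query is not stated on the slide.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request

ONTOLOGY_SERVICE = "http://apps.ideaconsult.net:8080/ontology"


def registration_request(resource_uri, service=ONTOLOGY_SERVICE):
    """Equivalent of: curl -X POST -d "uri=<resource_uri>" <service>"""
    body = urlencode({"uri": resource_uri}).encode("ascii")
    return Request(service, data=body, method="POST")


# Assumed query; not taken from the slides.
LIST_MODELS_SPARQL = """
PREFIX ot: <http://www.opentox.org/api/1.1#>
SELECT ?model WHERE { ?model a ot:Model }
"""

req = registration_request("http://apps.ideaconsult.net:8080/ambit2/model/57")
form = parse_qs(req.data.decode("ascii"))
print(req.get_method(), req.full_url, form)
```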
Services implementation by partner and service type
Service types (table columns): Compound, Dataset, Feature, Algorithm (processing), Algorithm (model), Model, Task, Report, Validation, Authentication and Authorisation service, Ontology service
Partner No.   Services implemented
2             Y Y Y Y Y Y
3 (IDEA)      Y Y Y Y Y Y Y Y
5             Y Y Y
6             Y Y Y Y
7             Y Y Y
10            Y
All components are implemented as REST web services
There can be multiple implementations of the same type of component
(A subset of) the services can be hosted by the same provider or by multiple providers at separate locations
http://apps.ideaconsult.net:8080/ambit2/
OpenTox Is A Framework
Framework
Unified Access
Open Source
• Toxicity data
• Predictive models
• Validation support
• Interpretation aids
• Toxicologists
• Modelers
• API for new algorithms: development & integration
• To optimise impact
• To allow inspection and review
• To attract external contributors
OpenTox services can be used to develop specific applications or be embedded in workflow systems
Demo applications
• Two end-user-oriented demo applications that make use of OpenTox web services have been developed and deployed, and are available for testing: http://toxcreate.org and http://toxpredict.org
• ToxCreate creates models from user-supplied datasets
• ToxPredict uses existing OpenTox models to estimate chemical compound properties
RDF: Lessons learned
• OpenTox-specific: it did not start as a Linked Data/RDF project
• The mix of REST and RDF is not (yet) popular… but it is natural to be able to retrieve a (partial) resource representation described by triples
• Steep learning curve
• Some hard topics
  – Data model vs. format
  – The subject-predicate-object concept vs. tabular/hierarchical/other implicit structure
  – The recognition of the added value
    • XML, JSON, YAML, plain text, etc. vs. RDF
RDF: Performance
RDF representation is verbose … and in-memory RDF libraries are slow … lack of streaming parsers/writers …
[Charts: a dataset with 320 chemicals, 60 columns; a dataset with 6500 chemicals, 12 columns]
RDF: Wish list
• A convenient explanation of the subject-predicate-object concept for beginners
• A (high-performance) triple store should not be a mandatory requirement for publishing RDF data
• Fast streaming parsers and writers
• (Terse) JSON serialisation
• Security
• Synchronisation of distributed RDF content
Thank you
Build an application with the OpenTox REST Web Services API
http://opentox.org/dev/apis/api-1.1
Download the AMBIT implementation of the OpenTox API and launch your own OpenTox service
http://ambit.sourceforge.net
Resources: Chemical compound
Compound
Provides different representations for chemical compounds with a unique and defined chemical structure
/compound/{id}
Conformer
/compound/{id}/conformer/{id}
Documentation
http://opentox.org/dev/apis/api-1.1/structure
Representation
A subclass of ot:OpenToxResource
Supports different Chemical MIME formats
RDF representation only for specifying owl:sameAs links to external resources
Example 1: Retrieve compound as MOL
$ curl -H "Accept:chemical/x-mdl-molfile" http://apps.ideaconsult.net:8080/ambit2/compound/1
CH2O
APtclcactv09040902283D 0 0.00000 0.00000
4 3 0 0 0 0 0 0 0 0999 V2000
-0.6004 0.0000 0.0001 O 0 0 0 0 0 0 0 0 0 0 0 0
0.6072 0.0000 -0.0004 C 0 0 0 0 0 0 0 0 0 0 0 0
1.1472 0.9353 0.0016 H 0 0 0 0 0 0 0 0 0 0 0 0
1.1472 -0.9353 0.0016 H 0 0 0 0 0 0 0 0 0 0 0 0
1 2 2 0 0 0 0
2 3 1 0 0 0 0
2 4 1 0 0 0 0
Example 2: Retrieve compound as SMILES
$ curl -H "Accept:chemical/x-daylight-smiles" http://apps.ideaconsult.net:8080/ambit2/compound/1
O=C
Example 3: Query compounds
$ curl -H "Accept:chemical-mime" "http://apps.ideaconsult.net:8080/ambit2/query/compound/any-identifier-or-keyword"
$ curl -H "Accept:chemical-mime" "http://apps.ideaconsult.net:8080/ambit2/query/smarts?search=smarts"
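The examples all address the same compound URI and differ only in the Accept header. A minimal sketch of that content negotiation, using Python's standard library and building the requests without sending them, might look like this; the short format keys and the helper name compound_request are hypothetical.

```python
from urllib.request import Request

# Media types that appear on the slides; the short keys are made up here.
ACCEPT = {
    "smiles": "chemical/x-daylight-smiles",
    "mol": "chemical/x-mdl-molfile",
    "rdf": "application/rdf+xml",
}


def compound_request(compound_uri, fmt):
    """Build a GET for one compound URI, asking for a specific representation."""
    return Request(compound_uri, headers={"Accept": ACCEPT[fmt]})


uri = "http://apps.ideaconsult.net:8080/ambit2/compound/1"
requests_by_format = {fmt: compound_request(uri, fmt) for fmt in ACCEPT}
for fmt, req in requests_by_format.items():
    print(fmt, req.get_method(), req.get_header("Accept"))
```

The URI identifies the compound; the representation is chosen per request, which is why the same link can be handed to a cheminformatics tool or to an RDF client.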
Resources: Feature
Feature
A Feature is a resource representing any kind of property or identifier assigned to a Compound
The feature types are determined via their links to ontologies (Feature ontologies, Descriptor ontologies, Endpoints ontologies)
/feature/{id}
Documentation
http://opentox.org/dev/apis/api-1.1/feature
Representation
ot:Feature, a subclass of ot:OpenToxResource
Mandatory RDF/XML format
Properties
• Name: defined by dc:title (Dublin Core namespace)
• Units: defined by the ot:units annotation property (OpenTox namespace)
• Creator: defined by the dc:creator annotation property (Dublin Core namespace)
• The origin of the Feature is defined by the ot:hasSource object property (OpenTox namespace) and can be an ot:Algorithm, ot:Model or ot:Dataset
• Relations to other resources that represent the same entity can be established via the owl:sameAs property. This approach can be used, for example, to link the ot:Feature resource to a resource from another ontology (an example follows)
• The subclasses of ot:Feature, namely ot:NumericFeature, ot:StringFeature and ot:NominalFeature, denote whether a feature holds numeric, string or nominal values
Resources: Feature (an example)
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/feature/22114
<rdf:RDF
  xmlns:ot="http://www.opentox.org/api/1.1#"
  xmlns:otee="http://www.opentox.org/echaEndpoints.owl#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns="http://apps.ideaconsult.net:8080/ambit2/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:owl="http://www.w3.org/2002/07/owl#"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#Algorithm"/>
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#Feature"/>
  <owl:Class rdf:about="http://www.opentox.org/api/1.1#NumericFeature">
    <rdfs:subClassOf rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </owl:Class>
  <ot:NumericFeature rdf:about="feature/22114">
    <ot:hasSource>
      <ot:Algorithm rdf:about="algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor"/>
    </ot:hasSource>
    <owl:sameAs rdf:resource="otee:Octanol-water_partition_coefficient_Kow"/>
    <dc:title>XLogP</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </ot:NumericFeature>
</rdf:RDF>
The example shows an OpenTox feature with the title "XLogP", identified by the URI http://apps.ideaconsult.net:8080/ambit2/feature/22114
• Linked to an entry of a simplified ontology of toxicological endpoints
The algorithm used to generate values for this feature
• Specified by the ot:hasSource property
• Identified by the URI http://apps.ideaconsult.net:8080/ambit2/algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor
• Note that the URI identifies an OpenTox Algorithm resource
• The algorithm URI itself is dereferenceable
• It can be used to initiate calculations of the XLogP descriptor
Resources: Feature (an example)
$ curl -H "Accept:text/n3" http://apps.ideaconsult.net:8080/ambit2/feature/26184
@prefix ot: <http://www.opentox.org/api/1.1#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix : <http://apps.ideaconsult.net:8080/ambit2/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix af: <http://apps.ideaconsult.net:8080/ambit2/feature/> .
ot:hasSource
  a owl:ObjectProperty .
ot:units
  a owl:DatatypeProperty .
ot:Feature
  a owl:Class .
ot:NumericFeature
  a owl:Class ;
  rdfs:subClassOf ot:Feature .
af:26184
  a ot:Feature , ot:NumericFeature ;
  dc:creator "http://ambit.uni-plovdiv.bg:8080/ambit2" ;
  dc:title "TUM_CDK_XLogP" ;
  ot:hasSource <http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem/XLogPDescriptor> ;
  ot:units "" ;
  = otee:Octanol-water_partition_coefficient_Kow .
An example (N3) of another ot:Feature resource: the XLogP descriptor (again)
• Generated by a different implementation
• In this case its name is "TUM_CDK_XLogP", and the algorithm resource used to generate it resides at the Technical University of Munich (TUM): http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem/XLogPDescriptor
This algorithm URL could also be used to initiate descriptor calculations
• The representation of Algorithm resources refers to the BlueObelisk ontology entry http://www.blueobelisk.org/ontologies/chemoinformatics-algorithms/#xlogP
Toxicity endpoints ontologies
• Derived from the ECHA classification of endpoints published in REACH guidance documents
• Physicochemical properties and various toxicological endpoints
bullThe hierarchy doesnrsquot represent the
complexity of toxicological assays but
can be used as a first approximation to
assign meaning to the data entries and
generate REACH report
bullMore specific description of toxicological
assays can be used as well
bullOntologies for specific toxicity assays are
developed by OpenTox partners
bullThe ECHA endpoints ontology httpwwwopentoxorgechaEndpointsowl
Resources a feature
Ideaconsult Ltd28
curl -H Acceptapplicationrdf+xml
httpappsideaconsultnet8080ambit2feature3
ltrdfRDF
xmlnsot=httpwwwopentoxorgapi11
xmlns=httpappsideaconsultnet8080ambit2
xmlnsaf=httpappsideaconsultnet8080ambit2feature
hellip
xmlbase=httpappsideaconsultnet8080ambit2gt
ltotFeature rdfabout=feature3gt
dccreatorgt
ltothasSourcegtECHAhellipltothasSourcegt
ltowlsameAs rdfresource=httpwwwopentoxorgapi11EINECSgt
ltotunitsgtltotunitsgt
ltdctitlegtECltdctitlegt
ltotFeaturegt
ltrdfRDFgt
An illustration of otFeature
imported from a file and not
calculated
The example shows a feature
representing EINECS
number imported from the
ECHA preregistration list
Feature summary
bull OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs
bull These URIs are dereferencable
bull Allow to assign different levels of meaning by linking to entries to ontologies (eg algorithms or toxicological endpoints ) as well as linking to the algorithms which can be used to generated property values
bull The same approach can be used to denote assays provided that the assay is defined by an ontology species functional groups etc
Ideaconsult Ltd29
Resources Dataset
OperationsPOST ndash Upload a dataset
PUT ndash Update the dataset content
DELETE ndash Remove the dataset
Representation
RDFXML (mandatory)
bullThe dataset consists of data entries
bullEach entry is associated with exactly one
chemical compound identified by its
URI and available via OpenTox
Compound service API
bullOne and the same compound can be
associated with multiple dataset entries
bullEvery ldquocolumnrdquo is associated with a
Feature its representation should be
available via OpenTox Feature API
Ideaconsult Ltd30
DatasetProvides access to chemical compounds and their
features (eg structural physical-chemical
biological toxicological properties)
Resources Dataset
Representation
Ideaconsult Ltd31
The dataset services optionally
support formats other than RDF
bulltextcsv (Comma delimited)
bulltextx-arff (Weka ARFF)
bullapplicationpdf
bullchemicalx-mdl-sdfile
bullother Chemical MIME formats
This allows retrieving the same data
in convenient format but the URL
links to compound and feature
resources are being lost
prefix ad lthttpappsideaconsultnet8080ambit2datasetgt
prefix af lthttpappsideaconsultnet8080ambit2featuregt
prefix ot lthttpwwwopentoxorgapi11gt
hellip
ad9 a otDataset
otdataEntry
[ a otDataEntry
otcompound
lthttpappsideaconsultnet8080ambit2compound413conformer409421gt
otvalues
[ a otFeatureValue
otfeature af21576
otvalue 3309999942779541^^xsddouble
]
otvalues
[ a otFeatureValue
otfeature af21573
otvalue 30^^xsddouble
]
]
Dataset metadata and features
Ideaconsult Ltd32
Description URI Template
Retrieve entire dataset content If uri-list
retrieve only compound URIs
httphostportdatasetid
Retrieve representation of features (columns)
of the dataset
httphostportdatasetidfeature
Retrieves dataset metadata (name etc) httphostportdatasetidmetadata
$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata
ltrdfRDF
xmlnsot=httpwwwopentoxorgapi11
helliphellip
xmlnsrdfs=httpwwww3org200001rdf-schema
xmlbase=httpappsideaconsultnet8080ambit2gt
ltotDataset rdfabout=dataset9gt
ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt
ltdcpublishergtsomebodyltdcpublishergt
ltrdfsseeAlsogt
ltbxEntry rdfabout=reference20117gt
ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt
ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt
ltbxEntrygt
ltrdfsseeAlsogt
ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt
ltotDatasetgt
ltrdfRDFgt
Data publishing
Upload data receive
dataset URL
httphost2datasetid
AnnotationFind chemical compounds
return dataset URL
httphost2compoundid
HTTP GET POST
Dataset service
Structures
endpoints
amppredictions
1)POST a file with chemical structures and properties to
OpenTox dataset servicebullThe structures and data are assigned a dataset URL and
become available by multiple formats (RDF Chemical
MIME CSV Weka ARFF)
2)Assign metadata
bullPUT datasetidmetadata
3)Annotate any of dataset features
datasetidfeature by assigning links to relevant
ontologies
bullPUT featureid
Toxiciology related
ontologies
Algorithms
ontologies
Resources Algorithm
AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by
different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at
httphostportalgorithm
Ideaconsult Ltd34
curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-
devalgorithm
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval
Resources Algorithm
Representation
bull Multiple type of algorithms
bull descriptor calculation algorithms
bull machine learning procedures
bull data preprocessing
bull The representation of algorithms is again defined by
Opentox ontology where all algorithms are subclass of
otAlgorithm
Algorithm types ontology
httpopentoxorgdatadocumentsdevelopmentRDF2
0filesAlgorithmTypes
bull provides a classification of algorithm types
bull Algorithm type in RDF representation is set by direct
subclassing (rdftype) of a class from the algorithm types
ontology (otahttpwwwopentoxorgalgorithmsowl )
eg ltmyalgorithmgt rdftype otaClassification
Ideaconsult Ltd35
Resources Algorithm
Ideaconsult Ltd36
$ curl -H Acceptapplicationrdf+xml
httpappsideaconsultnet8080ambit2algorithmJ48
ltrdfRDF
xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns
xmlnsot=httpwwwopentoxorgapi11
hellip
xmlnsota=httpwwwopentoxorgalgorithmTypesowl
ltotaSupervised
rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt
ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring
gtClassification Decision tree J48ltdctitlegt
ltrdftype
rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt
ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring
gtltdcdescriptiongt
ltrdftype
rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt
ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI
gtSomebodyltdcpublishergt
ltrdftype
rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt
ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt
ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime
gtSat Jul 31 171126 EEST 2010ltdcdategt
ltotaSupervisedgt
ltrdfRDFgt
RepresentationbullAlgorithm name is defined by dctitle
Parameters supported by the algorithm are
specified via object property otparameters
and should be of class otParameter (as
defined in opentoxowl)
These entries serve as a information what
parameters are required in order to run the
algorithm the values itself should be
provided by the client when initiating the
calculations via POST
bullAlgorithm types are distinguished by
means of Algorithm types ontology
Resources Model
bull Representations of predictive models
bull A Model is created by HTTP POST to an
otAlgorithm with specific parameters
andor input otDataset
Ideaconsult Ltd37
Representation
bull Model Name is defined by dctitle property
bull Model creator might be defined by dccreator
property
bull The date of Model creation is defined by dcdate
property
bull The Algorithm defined by otalgorithm object
property
bull The independent variables are instances of
otFeature defined by otindependentVariables
property (can be multiple)
bull The dependent variables are are instances of
otFeature and are defined by
otdependentVariables property (can be multiple)
bull The variables where prediction results will be
stored are are instances of otFeature and are
defined by otpredictedVariables (can be multiple)
bull Parameters are defined by otparameters
bull The training Dataset is an instance of otDataset and
defined by ottrainingDataset
Algorithm
service
Linked resources Compound Algorithm Model Dataset Features
Ideaconsult Ltd38
Model
resourceDataset
Resource
Descriptor
resource
Assay
resource
Chemical
compound
Feature
service
Feature
service
Compoun
d service
Linked resources Compound Algorithm Model Dataset Features
Ideaconsult Ltd39
Dataset
Resource
Descriptor
resource
Assay
resource
Chemical
compound
Regression
Classification
Quantum
Chemistry
Descriptorsetc
Blue Obelisk
algorithms
ontology
OpenTox
algorithm types
ontology
Toxiciology related
ontologies
Make the model available
40Ideaconsult Ltd
Register at OpenTox ontology servicendash RDF tripple storage
ndash Accepts HTTP POST
ndash SPARQL endpoint
Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology
Becomes visible for applications
modelidPublished models
Algorithms
Ontologies
metadata
Ontology service
HTTP POST
Model service
Services implementation by partner and
service type
Ideaconsult Ltd41
Part
ner
No
Serv
ice t
ype
Com
pound
Data
set
Featu
re
Alg
ori
thm
(pro
cess
ing)
Alg
ori
thm
(model)
Model
Task
Report
Validati
on
Auth
ern
ticati
on
and A
uth
ori
sati
on
serv
ice
Onto
logy s
erv
ice
2 Y Y Y Y Y Y
3
(IDEA)
Y Y Y Y Y Y Y Y
5 Y Y Y
6 Y Y Y Y
7 Y Y Y
10 Y
All components are implemented as REST web services
There could be multiple implementations of same type of
components
(Subset of) services could be hosted by the same provider or
by multiple providers on separate locations
httpappsideaconsultnet8080ambit2
OpenTox Is A Framework
Framework
Unified Access
Open Source
bull Toxicity data
bull Predictive models
bull Validation support
bull Interpretation aids
bull Toxicologists
bull Modelers
bull API for new algorithmsdevelopment amp integration
bull To optimise impact
bull To allow inspection review
bull To attract external contributors
OpenTox services can be used to develop specific applications or embedded in
workflow systems
bull Two end user oriented demo applications making use of OpenTox
webservices have been developed deployed and are available for
testing ndash httptoxcreateorg and httptoxpredictorg
bull ToxCreate creates models from user supplied datasets
bull ToxPredict uses existing OpenTox models to estimate chemical
compound properties
Demo applications
43Ideaconsult LtdAugust 22
2010
RDF Lessons learned
bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project
bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)
resource representation described by triples
bull Steep learning curve
bull Some hard topicsndash Data model vs format
ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure
ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF
Ideaconsult Ltd44
RDF Performance
RDF representation is
verbose hellip
hellip and in-memory RDF
libraries are slow hellip
Ideaconsult Ltd45
A dataset with 320 chemicals 60 columns
A dataset with 6500 chemicals 12 columns
hellip lack of streaming parserswriters hellip
RDF Wish list
bull Convenient explanation of subject-predicate-
object concept for beginners
bull A (high performant) triple storage should not be
a mandatory requirement to publish RDF data
bull Fast streaming parsers and writers
bull (terse) JSON serialisation
bull Security
bull Synchronisation of distributed RDF content
Ideaconsult Ltd46
Thank you
Ideaconsult Ltd47
Build an application with OpenTox REST Web Services API
httpopentoxorgdevapisapi-11
Download AMBIT Implementation of OpenTox API
and launch your OpenTox service
httpambitsourceforgenet
Resources Feature
Ideaconsult Ltd24
FeatureA Feature is a resource representing any
kind of a property or identifier assigned to
a Compound
The feature types are determined via their
links to ontologies (Feature ontologies
Decsriptor ontologies Endpoints
ontologies)featureid
Documentationhttpopentoxorgdevapisapi-11feature
RepresentationotFeature a subclass of
otOpenToxResource
Mandatory RDFXML format
Properties
bullName defined by dctitle (Dublin Core namespace )
bullUnits defined by otunits annotation property (OpenTox
namespace)
bullCreator defined by dccreator annotation property
(Dublin Core namespace)
bullThe origin of the Feature is defined by othasSource
object property (OpenTox namespace) element and can be
otAlgorithm otModel or otDataset
bullRelations to other resources which represent the same
entity could be established via owlsameAs property
This approach can be used for example to link the
otFeature resource to a resource from another ontology
(an example follows)
bullThere are subclasses of otFeature (namely) which are
used otNumericFeature otStringFeature
otNominalFeature denote if a feature holds numeric
nominal or string values
Resources Feature (an example)
Ideaconsult Ltd25
$ curl -H Acceptapplicationrdf+xml
httpappsideaconsultnet8080ambit2feature22114
ltrdfRDF
xmlnsot=httpwwwopentoxorgapi11
xmlnsotee=httpwwwopentoxorgechaEndpointsowl
xmlnsdc=httppurlorgdcelements11
xmlns=httpappsideaconsultnet8080ambit2
xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns
xmlnsowl=httpwwww3org200207owl
xmlnsxsd=httpwwww3org2001XMLSchema
xmlnsrdfs=httpwwww3org200001rdf-schema
xmlbase=httpappsideaconsultnet8080ambit2gt
ltowlClass rdfabout=httpwwwopentoxorgapi11Algorithmgt
ltowlClass rdfabout=httpwwwopentoxorgapi11Featuregt
ltowlClass rdfabout=httpwwwopentoxorgapi11NumericFeaturegt
ltrdfssubClassOf
rdfresource=httpwwwopentoxorgapi11Featuregt
ltowlClassgt
ltotNumericFeature rdfabout=feature22114gt
ltothasSourcegt
ltotAlgorithm
rdfabout=algorithmorgopensciencecdkqsardescriptorsmolecularXLo
gPDescriptorgt
ltothasSourcegt
ltowlsameAs rdfresource=ldquooteeOctanol-
water_partition_coefficient_Kowgt
ltdctitlegtXLogPltdctitlegt
ltrdftype rdfresource=httpwwwopentoxorgapi11Featuregt
ltotNumericFeaturegt
ltrdfRDFgt
The example shows an OpenTox feature with title ldquoXLogPrdquo and identified by the URI httpappsideaconsultnet8080ambit2feature22114
bullLinked to an entry of a simplified ontology of toxicological endpoints
The algorithm used to generate values for this featurebullSpecified by otalgorithm propertybullidentified by the URIhttpappsideaconsultnet8080ambit2algorithmorgopensciencecdkqsardescriptorsmolecularXLogPDescriptor
bullNote the URI identifies an OpenTox Algorithm resourcebullThe algorithm URI itself is dereferensable bullCan be used to initiate calculations of XLogP descriptor
Resources Feature (an example)
Ideaconsult Ltd26
$ curl -H "Accept:text/n3" http://apps.ideaconsult.net:8080/ambit2/feature/26184
@prefix ot: <http://www.opentox.org/api/1.1#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix : <http://apps.ideaconsult.net:8080/ambit2/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix af: <http://apps.ideaconsult.net:8080/ambit2/feature/> .
ot:hasSource
    a owl:ObjectProperty .
ot:units
    a owl:DatatypeProperty .
ot:Feature
    a owl:Class .
ot:NumericFeature
    a owl:Class ;
    rdfs:subClassOf ot:Feature .
af:26184
    a ot:Feature , ot:NumericFeature ;
    dc:creator "http://ambit.uni-plovdiv.bg:8080/ambit2" ;
    dc:title "TUM_CDK_XLogP" ;
    ot:hasSource <http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem/XLogPDescriptor> ;
    ot:units "" ;
    = otee:Octanol-water_partition_coefficient_Kow .
An example (N3) of another ot:Feature resource: the XLogP descriptor (again)
• Generated by a different implementation
• In this case its name is "TUM_CDK_XLogP", and the algorithm resource used to generate it resides at the Technical University of Munich (TUM): http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem/XLogPDescriptor
• This algorithm URL can also be used to initiate descriptor calculations
• The representation of Algorithm resources refers to the Blue Obelisk ontology entry http://www.blueobelisk.org/ontologies/chemoinformatics-algorithms/#xlogP
Toxicity endpoints ontologies
• Derived from the ECHA classification of endpoints published in the REACH guidance documents
• Covers physicochemical properties and various toxicological endpoints
• The hierarchy doesn't represent the complexity of toxicological assays, but can be used as a first approximation to assign meaning to the data entries and to generate a REACH report
• More specific descriptions of toxicological assays can be used as well
• Ontologies for specific toxicity assays are being developed by OpenTox partners
• The ECHA endpoints ontology: http://www.opentox.org/echaEndpoints.owl
Resources: a feature
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/feature/3
<rdf:RDF
xmlns:ot="http://www.opentox.org/api/1.1#"
xmlns="http://apps.ideaconsult.net:8080/ambit2/"
xmlns:af="http://apps.ideaconsult.net:8080/ambit2/feature/"
…
xml:base="http://apps.ideaconsult.net:8080/ambit2/">
<ot:Feature rdf:about="feature/3">
<dc:creator/>
<ot:hasSource>ECHA…</ot:hasSource>
<owl:sameAs rdf:resource="http://www.opentox.org/api/1.1#EINECS"/>
<ot:units></ot:units>
<dc:title>EC</dc:title>
</ot:Feature>
</rdf:RDF>
An illustration of an ot:Feature that was imported from a file rather than calculated.
The example shows a feature representing the EINECS number, imported from the ECHA preregistration list.
Feature summary
• An OpenTox Feature resource uniquely identifies, via its feature URI, a property or identifier assigned to a compound
• These URIs are dereferenceable
• Features allow different levels of meaning to be assigned by linking to ontology entries (e.g. algorithms or toxicological endpoints), as well as to the algorithms that can be used to generate the property values
• The same approach can be used to denote assays (provided that the assay is defined by an ontology), species, functional groups, etc.
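To make the "dereferenceable feature URI" idea concrete, here is a minimal Python sketch that reads a feature representation and pulls out its title, its source algorithm and its ontology link. It works on an embedded copy of the XLogP feature example rather than on a live service, and the helper name describe_feature is an assumption made for illustration, not part of the OpenTox API.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ot": "http://www.opentox.org/api/1.1#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "owl": "http://www.w3.org/2002/07/owl#",
}
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# Trimmed copy of the feature/22114 example (class declarations omitted).
FEATURE_RDF = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:ot="http://www.opentox.org/api/1.1#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:owl="http://www.w3.org/2002/07/owl#"
  xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:NumericFeature rdf:about="feature/22114">
    <ot:hasSource>
      <ot:Algorithm rdf:about="algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor"/>
    </ot:hasSource>
    <owl:sameAs rdf:resource="http://www.opentox.org/echaEndpoints.owl#Octanol-water_partition_coefficient_Kow"/>
    <dc:title>XLogP</dc:title>
  </ot:NumericFeature>
</rdf:RDF>"""


def describe_feature(rdf_xml):
    """Return the URI, title, source algorithm and ontology link of a feature."""
    root = ET.fromstring(rdf_xml)
    base = root.get(XML_BASE, "")
    about = "{%s}about" % NS["rdf"]
    resource = "{%s}resource" % NS["rdf"]
    feature = root.find("ot:NumericFeature", NS)
    algorithm = feature.find("ot:hasSource/ot:Algorithm", NS)
    return {
        # Relative rdf:about values are resolved against xml:base.
        "uri": urljoin(base, feature.get(about)),
        "title": feature.findtext("dc:title", namespaces=NS),
        "source": urljoin(base, algorithm.get(about)),
        "same_as": feature.find("owl:sameAs", NS).get(resource),
    }


info = describe_feature(FEATURE_RDF)
print(info["title"], info["source"])
```

A general client would use an RDF library instead of plain XML parsing, since the same triples can be serialised in many different XML shapes; the sketch only handles the shape shown in the example.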
Resources: Dataset
Operations
POST – Upload a dataset
PUT – Update the dataset content
DELETE – Remove the dataset
Representation
RDF/XML (mandatory)
• The dataset consists of data entries
• Each entry is associated with exactly one chemical compound, identified by its URI and available via the OpenTox Compound service API
• One and the same compound can be associated with multiple dataset entries
• Every "column" is associated with a Feature; its representation should be available via the OpenTox Feature API
Dataset: provides access to chemical compounds and their features (e.g. structural, physical-chemical, biological, toxicological properties)
Resources: Dataset
Representation
The dataset services optionally support formats other than RDF:
• text/csv (comma delimited)
• text/x-arff (Weka ARFF)
• application/pdf
• chemical/x-mdl-sdfile
• other Chemical MIME formats
This allows the same data to be retrieved in a convenient format, but the URL links to compound and feature resources are lost.
@prefix ad: <http://apps.ideaconsult.net:8080/ambit2/dataset/> .
@prefix af: <http://apps.ideaconsult.net:8080/ambit2/feature/> .
@prefix ot: <http://www.opentox.org/api/1.1#> .
…
ad:9 a ot:Dataset ;
    ot:dataEntry
        [ a ot:DataEntry ;
          ot:compound <http://apps.ideaconsult.net:8080/ambit2/compound/413/conformer/409421> ;
          ot:values
              [ a ot:FeatureValue ;
                ot:feature af:21576 ;
                ot:value "3.309999942779541"^^xsd:double
              ] ;
          ot:values
              [ a ot:FeatureValue ;
                ot:feature af:21573 ;
                ot:value "3.0"^^xsd:double
              ]
        ] .
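The trade-off between the RDF representation and the flat formats can be sketched in a few lines of Python. The dictionary below is a hypothetical in-memory form of the dataset listing above (it is not an OpenTox data structure), the decimal values are read from that listing, and the column names are invented for illustration. Once the entries are flattened to CSV, the feature URIs are replaced by plain column names.

```python
import csv
import io

AMBIT = "http://apps.ideaconsult.net:8080/ambit2"

# Hypothetical in-memory form of the ot:Dataset listing (one data entry).
DATASET = {
    "uri": AMBIT + "/dataset/9",
    "entries": [
        {
            "compound": AMBIT + "/compound/413/conformer/409421",
            "values": {
                AMBIT + "/feature/21576": 3.309999942779541,
                AMBIT + "/feature/21573": 3.0,
            },
        }
    ],
}

# Invented column names; a real service would take them from dc:title.
COLUMN_NAMES = {
    AMBIT + "/feature/21573": "col_21573",
    AMBIT + "/feature/21576": "col_21576",
}


def to_csv(dataset, column_names):
    """Flatten data entries to CSV: one row per entry, one column per feature."""
    features = sorted({f for e in dataset["entries"] for f in e["values"]})
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["compound"] + [column_names[f] for f in features])
    for entry in dataset["entries"]:
        row = [entry["values"].get(f, "") for f in features]
        writer.writerow([entry["compound"]] + row)
    return out.getvalue()


print(to_csv(DATASET, COLUMN_NAMES))
```

The CSV is easier to load into a spreadsheet or a modelling tool, but nothing in it says which Feature resource a column came from.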
Dataset metadata and features
Description | URI template
Retrieve the entire dataset content; if text/uri-list is requested, retrieve only the compound URIs | http://host:port/dataset/{id}
Retrieve the representation of the features (columns) of the dataset | http://host:port/dataset/{id}/feature
Retrieve the dataset metadata (name etc.) | http://host:port/dataset/{id}/metadata
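The three URI templates can be expanded mechanically. The following Python sketch does so with the standard library only; the function names are assumptions made for illustration, and the request object is built but never sent.

```python
from urllib.parse import quote
from urllib.request import Request


def dataset_uris(service_root, dataset_id):
    """Expand the dataset, feature and metadata URI templates for one id."""
    root = service_root.rstrip("/")
    base = "%s/dataset/%s" % (root, quote(str(dataset_id), safe=""))
    return {
        "dataset": base,
        "features": base + "/feature",
        "metadata": base + "/metadata",
    }


def compound_list_request(service_root, dataset_id):
    """Build (without sending) a GET that asks only for the compound URIs."""
    uri = dataset_uris(service_root, dataset_id)["dataset"]
    return Request(uri, headers={"Accept": "text/uri-list"})


uris = dataset_uris("http://apps.ideaconsult.net:8080/ambit2", 9)
request = compound_list_request("http://apps.ideaconsult.net:8080/ambit2", 9)
print(uris["metadata"])
```

Passing the request to urllib.request.urlopen would perform the actual retrieval, assuming the service is reachable.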
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/dataset/9/metadata
<rdf:RDF
xmlns:ot="http://www.opentox.org/api/1.1#"
……
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xml:base="http://apps.ideaconsult.net:8080/ambit2/">
<ot:Dataset rdf:about="dataset/9">
<dc:source>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:source>
<dc:publisher>somebody</dc:publisher>
<rdfs:seeAlso>
<bx:Entry rdf:about="reference/20117">
<rdfs:seeAlso>http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html</rdfs:seeAlso>
<dc:title>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:title>
</bx:Entry>
</rdfs:seeAlso>
<dc:title>ISSCAN: Istituto Superiore di Sanita, CHEMICAL CARCINOGENS: STRUCTURES AND EXPERIMENTAL DATA</dc:title>
</ot:Dataset>
</rdf:RDF>
Data publishing
[Diagram: upload data, receive a dataset URL (http://host2/dataset/{id}); annotation: find chemical compounds, return dataset URL (http://host2/compound/{id}); HTTP GET and POST to the Dataset service; structures, endpoints & predictions; toxicology-related ontologies; algorithm ontologies]
1) POST a file with chemical structures and properties to an OpenTox dataset service
• The structures and data are assigned a dataset URL and become available in multiple formats (RDF, Chemical MIME, CSV, Weka ARFF)
2) Assign metadata
• PUT /dataset/{id}/metadata
3) Annotate any of the dataset features (/dataset/{id}/feature) by assigning links to relevant ontologies
• PUT /feature/{id}
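The three publishing steps map onto plain HTTP requests. The sketch below builds them in Python without sending anything. The host name is the placeholder from the slide, the content types are assumptions (the SDF MIME type is taken from the list of supported formats, and RDF/XML is assumed for the PUT bodies), and both function names are hypothetical.

```python
from urllib.request import Request


def upload_request(dataset_service, sdf_bytes):
    """Step 1: POST a structure file; the response should carry the dataset URL."""
    return Request(
        dataset_service,
        data=sdf_bytes,
        method="POST",
        headers={"Content-Type": "chemical/x-mdl-sdfile",
                 "Accept": "text/uri-list"},
    )


def put_rdf_request(resource_uri, rdf_xml):
    """Steps 2 and 3: PUT an RDF/XML document to a metadata or feature URI."""
    return Request(
        resource_uri,
        data=rdf_xml.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/rdf+xml"},
    )


# Placeholder host and ids; a real dataset URL is returned by step 1.
step1 = upload_request("http://host2/dataset", b"...SDF content...")
step2 = put_rdf_request("http://host2/dataset/1/metadata", "<rdf:RDF/>")
step3 = put_rdf_request("http://host2/feature/1", "<rdf:RDF/>")
for step in (step1, step2, step3):
    print(step.get_method(), step.full_url)
```

Each request could be sent with urllib.request.urlopen; the sketch stops short of that so it runs offline.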
Resources: Algorithm
Algorithm: provides access to OpenTox algorithms. There are several algorithm services, developed by different OpenTox partners. The list of algorithms can be retrieved by an HTTP GET operation at http://host:port/algorithm
$ curl -H "Accept:text/uri-list" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNclassification
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/PLSregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/M5P
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/GaussP
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/FTMsmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/gSpansmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/JOELIB2
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/InfoGainAttributeEval
Resources: Algorithm
Representation
• Multiple types of algorithms
– descriptor calculation algorithms
– machine learning procedures
– data preprocessing
• The representation of algorithms is again defined by the OpenTox ontology, where all algorithms are subclasses of ot:Algorithm
Algorithm types ontology
http://opentox.org/data/documents/development/RDF%20files/AlgorithmTypes
• Provides a classification of algorithm types
• The algorithm type in the RDF representation is set by direct typing (rdf:type) with a class from the algorithm types ontology (ota: http://www.opentox.org/algorithms.owl#), e.g. <myalgorithm> rdf:type ota:Classification
Resources: Algorithm
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/algorithm/J48
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:ot="http://www.opentox.org/api/1.1#"
…
xmlns:ota="http://www.opentox.org/algorithmTypes.owl#">
<ota:Supervised rdf:about="http://apps.ideaconsult.net:8080/ambit2/algorithm/J48">
<dc:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Classification Decision tree J48</dc:title>
<rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#Classification"/>
<dc:description rdf:datatype="http://www.w3.org/2001/XMLSchema#string"></dc:description>
<rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#SingleTarget"/>
<dc:publisher rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">Somebody</dc:publisher>
<rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#EagerLearning"/>
<rdf:type rdf:resource="http://www.opentox.org/api/1.1#Algorithm"/>
<dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">Sat Jul 31 17:11:26 EEST 2010</dc:date>
</ota:Supervised>
</rdf:RDF>
Representation
• The algorithm name is defined by dc:title
• Parameters supported by the algorithm are specified via the object property ot:parameters and should be of class ot:Parameter (as defined in opentox.owl). These entries indicate which parameters are required in order to run the algorithm; the values themselves should be provided by the client when initiating the calculation via POST
• Algorithm types are distinguished by means of the Algorithm types ontology
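As a rough illustration of "values provided by the client via POST", the Python sketch below encodes parameter values as a form body and builds the request for an algorithm URI. The parameter names dataset_uri and prediction_feature, the form encoding and the choice of feature are all assumptions made for the example; the parameters an algorithm actually expects are the ones listed under ot:parameters in its representation.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request


def run_algorithm_request(algorithm_uri, parameters):
    """Build (without sending) a POST that starts a calculation."""
    body = urlencode(parameters).encode("ascii")
    return Request(
        algorithm_uri,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded",
                 "Accept": "text/uri-list"},
    )


AMBIT = "http://apps.ideaconsult.net:8080/ambit2"
request = run_algorithm_request(
    AMBIT + "/algorithm/J48",
    {
        # Assumed parameter names, for illustration only.
        "dataset_uri": AMBIT + "/dataset/9",
        "prediction_feature": AMBIT + "/feature/21573",
    },
)
print(parse_qs(request.data.decode("ascii")))
```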
Resources: Model
• Representations of predictive models
• A Model is created by an HTTP POST to an ot:Algorithm with specific parameters and/or an input ot:Dataset
Representation
• The Model name is defined by the dc:title property
• The Model creator may be defined by the dc:creator property
• The date of Model creation is defined by the dc:date property
• The Algorithm is defined by the ot:algorithm object property
• The independent variables are instances of ot:Feature, defined by the ot:independentVariables property (can be multiple)
• The dependent variables are instances of ot:Feature, defined by the ot:dependentVariables property (can be multiple)
• The variables where prediction results will be stored are instances of ot:Feature, defined by ot:predictedVariables (can be multiple)
• Parameters are defined by ot:parameters
• The training Dataset is an instance of ot:Dataset, defined by ot:trainingDataset
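The property list above can be read as a small record type. The following Python sketch, offered only as an illustration, holds a model description in a dataclass and expands it into (subject, predicate, object) triples using the property names from the list. The class name, the example.org URIs and the restriction to a subset of the properties are assumptions.

```python
from dataclasses import dataclass, field
from typing import List

OT = "http://www.opentox.org/api/1.1#"
DC = "http://purl.org/dc/elements/1.1/"


@dataclass
class ModelDescription:
    uri: str
    title: str
    algorithm: str
    training_dataset: str
    independent_variables: List[str] = field(default_factory=list)
    dependent_variables: List[str] = field(default_factory=list)
    predicted_variables: List[str] = field(default_factory=list)

    def triples(self):
        """Yield one triple per property value; multi-valued ones repeat."""
        yield (self.uri, DC + "title", self.title)
        yield (self.uri, OT + "algorithm", self.algorithm)
        yield (self.uri, OT + "trainingDataset", self.training_dataset)
        multi = (
            ("independentVariables", self.independent_variables),
            ("dependentVariables", self.dependent_variables),
            ("predictedVariables", self.predicted_variables),
        )
        for name, feature_uris in multi:
            for feature_uri in feature_uris:
                yield (self.uri, OT + name, feature_uri)


# Hypothetical model; none of these URIs come from the presentation.
model = ModelDescription(
    uri="http://example.org/model/1",
    title="Example model",
    algorithm="http://example.org/algorithm/J48",
    training_dataset="http://example.org/dataset/1",
    independent_variables=["http://example.org/feature/1",
                           "http://example.org/feature/2"],
    dependent_variables=["http://example.org/feature/3"],
    predicted_variables=["http://example.org/feature/4"],
)
for triple in model.triples():
    print(triple)
```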
Linked resources: Compound, Algorithm, Model, Dataset, Features
[Diagram: Algorithm service, Model resource, Dataset resource, Descriptor resource, Assay resource, Chemical compound, Feature service, Compound service]
Linked resources: Compound, Algorithm, Model, Dataset, Features
[Diagram: Dataset resource, Descriptor resource, Assay resource and Chemical compound linked to the Blue Obelisk algorithms ontology, the OpenTox algorithm types ontology (regression, classification, quantum chemistry, descriptors, etc.) and toxicology-related ontologies]
Make the model available
Register at the OpenTox ontology service
– RDF triple storage
– Accepts HTTP POST
– SPARQL endpoint
curl -X POST -d "uri=http://apps.ideaconsult.net:8080/ambit2/model/57" http://apps.ideaconsult.net:8080/ontology
The model becomes visible to applications
[Diagram: Model service (/model/{id}) sends an HTTP POST to the Ontology service, which holds published models, algorithms, ontologies and metadata]
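The registration call can also be issued from code. This Python sketch mirrors the curl command: it form-encodes the model URI under the key uri and builds a POST to the ontology service. The function name is hypothetical, and the request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def register_request(ontology_service, model_uri):
    """Build the POST that announces a model URI to the ontology service."""
    body = urlencode({"uri": model_uri}).encode("ascii")
    return Request(
        ontology_service,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


request = register_request(
    "http://apps.ideaconsult.net:8080/ontology",
    "http://apps.ideaconsult.net:8080/ambit2/model/57",
)
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

Unlike curl -d, urlencode percent-encodes the colons and slashes in the value; the server decodes both forms to the same model URI.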
Services implementation by partner and service type
Service types: Compound, Dataset, Feature, Algorithm (processing), Algorithm (model), Model, Task, Report, Validation, Authentication and Authorisation service, Ontology service
Partner No. | Service types implemented (Y)
2 | Y Y Y Y Y Y
3 (IDEA) | Y Y Y Y Y Y Y Y
5 | Y Y Y
6 | Y Y Y Y
7 | Y Y Y
10 | Y
All components are implemented as REST web services
There can be multiple implementations of the same type of component
Services (or a subset of them) can be hosted by the same provider, or by multiple providers at separate locations
http://apps.ideaconsult.net:8080/ambit2/
OpenTox Is A Framework
Framework, Unified Access, Open Source
• Toxicity data
• Predictive models
• Validation support
• Interpretation aids
• Toxicologists
• Modelers
• API for new algorithm development & integration
• To optimise impact
• To allow inspection and review
• To attract external contributors
Demo applications
OpenTox services can be used to develop specific applications or be embedded in workflow systems
• Two end-user-oriented demo applications making use of OpenTox web services have been developed and deployed, and are available for testing: http://toxcreate.org and http://toxpredict.org
• ToxCreate creates models from user-supplied datasets
• ToxPredict uses existing OpenTox models to estimate chemical compound properties
RDF: Lessons learned
• OpenTox specific: it didn't start as a Linked Data/RDF project
• The mix of REST and RDF is not (yet) popular… but it is natural to be able to retrieve a (partial) resource representation described by triples
• Steep learning curve
• Some hard topics
– Data model vs. format
– The subject-predicate-object concept vs. tabular/hierarchical/other implicit structure
– The recognition of the added value
• XML, JSON, YAML, plain text, etc. vs. RDF
RDF: Performance
RDF representation is verbose … and in-memory RDF libraries are slow … lack of streaming parsers/writers …
[Charts: a dataset with 320 chemicals and 60 columns; a dataset with 6500 chemicals and 12 columns]
RDF: Wish list
• A convenient explanation of the subject-predicate-object concept for beginners
• A (high-performance) triple store should not be a mandatory requirement for publishing RDF data
• Fast streaming parsers and writers
• (Terse) JSON serialisation
• Security
• Synchronisation of distributed RDF content
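What a streaming writer buys can be shown with a short Python sketch. It emits a dataset as N-Triples one data entry at a time, so memory use does not grow with the number of chemicals. This is an illustration under stated assumptions (N-Triples as the output format, blank-node labels of the sketch's own choosing, example.org URIs), not a description of how any OpenTox service is implemented.

```python
import io

OT = "http://www.opentox.org/api/1.1#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def entry_triples(dataset_uri, index, compound_uri, values):
    """Yield the N-Triples lines for one data entry (blank nodes per entry)."""
    entry = "_:e%d" % index
    yield "<%s> <%sdataEntry> %s ." % (dataset_uri, OT, entry)
    yield "%s <%s> <%sDataEntry> ." % (entry, RDF_TYPE, OT)
    yield "%s <%scompound> <%s> ." % (entry, OT, compound_uri)
    for n, (feature_uri, value) in enumerate(values):
        node = "_:e%dv%d" % (index, n)
        yield "%s <%svalues> %s ." % (entry, OT, node)
        yield "%s <%s> <%sFeatureValue> ." % (node, RDF_TYPE, OT)
        yield "%s <%sfeature> <%s> ." % (node, OT, feature_uri)
        yield '%s <%svalue> "%r"^^<%s> .' % (node, OT, value, XSD_DOUBLE)


def write_dataset(out, dataset_uri, rows):
    """Write rows as they arrive; return the number of triples written."""
    out.write("<%s> <%s> <%sDataset> .\n" % (dataset_uri, RDF_TYPE, OT))
    count = 1
    for index, (compound_uri, values) in enumerate(rows):
        for line in entry_triples(dataset_uri, index, compound_uri, values):
            out.write(line + "\n")
            count += 1
    return count


# Rows come from a generator, so they are never all held in memory.
rows = (
    ("http://example.org/compound/%d" % i,
     [("http://example.org/feature/1", float(i))])
    for i in range(3)
)
buffer = io.StringIO()
print(write_dataset(buffer, "http://example.org/dataset/1", rows))
```

The same loop could write to a socket or an HTTP response instead of a StringIO buffer.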
Thank you
Build an application with the OpenTox REST Web Services API: http://opentox.org/dev/apis/api-1.1
Download the AMBIT implementation of the OpenTox API and launch your OpenTox service: http://ambit.sourceforge.net
Resources Feature (an example)
Ideaconsult Ltd26
$ curl -H Accepttextn3 httpappsideaconsultnet8080ambit2feature26184
prefix ot lthttpwwwopentoxorgapi11gt
prefix dc lthttppurlorgdcelements11gt
prefix lthttpappsideaconsultnet8080ambit2gt
prefix rdfs lthttpwwww3org200001rdf-schemagt
prefix owl lthttpwwww3org200207owlgt
prefix xsd lthttpwwww3org2001XMLSchemagt
prefix rdf lthttpwwww3org19990222-rdf-syntax-nsgt
prefix af lthttpappsideaconsultnet8080ambit2featuregt
othasSource
a owlObjectProperty
otunits
a owlDatatypeProperty
otFeature
a owlClass
otNumericFeature
a owlClass
rdfssubClassOf otFeature
af26184
a otFeature otNumericFeature
dccreator httpambituni-plovdivbg8080ambit2
dctitle TUM_CDK_XLogP
othasSource lthttpopentoxinformatiktu-muenchende8080OpenTox-
devalgorithmCDKPhysChemXLogPDescriptorgt
otunits
= oteeOctanol-water_partition_coefficient_Kow
An example (N3) of another otFeature
resource XLogP descriptor (again)
bullGenerated by different implementation
bullIn this case its name is
ldquoTUM_CDK_XLogPrdquo and the algorithm
resource used to generate resides at Technical
University of Munich (TUM) premises
httpopentoxinformatiktu-muenchende8080OpenTox-
devalgorithmCDKPhysChemXLogPDescriptor
This algorithm URL could also be used to
initiate descriptor calculations
bullThe representation of Algorithm resources
refers to the BlueObelisk ontology entry httpwwwblueobeliskorgontologieschemoinformatics-
algorithmsxlogP
Toxicity endpoints ontologies
Ideaconsult Ltd27
bullDerived from ECHA classification of
endpoints published in REACH guidance
documents
bullPhysicochemical properties and various
toxicological endpoints
bullThe hierarchy doesnrsquot represent the
complexity of toxicological assays but
can be used as a first approximation to
assign meaning to the data entries and
generate REACH report
bullMore specific description of toxicological
assays can be used as well
bullOntologies for specific toxicity assays are
developed by OpenTox partners
bullThe ECHA endpoints ontology httpwwwopentoxorgechaEndpointsowl
Resources: a feature
curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/feature/3
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns="http://apps.ideaconsult.net:8080/ambit2/"
    xmlns:af="http://apps.ideaconsult.net:8080/ambit2/feature/"
    …
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Feature rdf:about="feature/3">
    <dc:creator/>
    <ot:hasSource>ECHA…</ot:hasSource>
    <owl:sameAs rdf:resource="http://www.opentox.org/api/1.1#EINECS"/>
    <ot:units></ot:units>
    <dc:title>EC</dc:title>
  </ot:Feature>
</rdf:RDF>
An illustration of an ot:Feature imported from a file and not calculated.
The example shows a feature representing the EINECS number, imported from the ECHA preregistration list.
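A client can read such a feature representation with a standard XML parser. The following Python sketch parses a trimmed copy of the representation above and extracts the title, the source and the owl:sameAs link. The `dc` and `owl` namespace URIs are assumptions (the slide elides the namespace declarations), and a production client would more likely use a full RDF library.

```python
import xml.etree.ElementTree as ET

# Namespace URIs; "dc" and "owl" are assumed, the slide elides them.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "ot": "http://www.opentox.org/api/1.1#",
}

# Trimmed copy of the feature representation shown on the slide.
FEATURE_RDF = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Feature rdf:about="feature/3">
    <ot:hasSource>ECHA…</ot:hasSource>
    <owl:sameAs rdf:resource="http://www.opentox.org/api/1.1#EINECS"/>
    <dc:title>EC</dc:title>
  </ot:Feature>
</rdf:RDF>"""


def describe_feature(rdf_xml: str) -> dict:
    """Extract a few properties of the first ot:Feature in an RDF/XML document."""
    root = ET.fromstring(rdf_xml)
    feature = root.find("ot:Feature", NS)
    same_as = feature.find("owl:sameAs", NS)
    rdf_ns = NS["rdf"]
    return {
        "about": feature.get("{%s}about" % rdf_ns),
        "title": feature.findtext("dc:title", namespaces=NS),
        "source": feature.findtext("ot:hasSource", namespaces=NS),
        "same_as": None if same_as is None else same_as.get("{%s}resource" % rdf_ns),
    }


info = describe_feature(FEATURE_RDF)
print(info)
```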
Feature summary
• An OpenTox Feature resource uniquely identifies properties and identifiers assigned to a compound via feature URIs
• These URIs are dereferenceable
• Features allow assigning different levels of meaning by linking to entries in ontologies (e.g. algorithms or toxicological endpoints), as well as linking to the algorithms which can be used to generate property values
• The same approach can be used to denote assays (provided that the assay is defined by an ontology), species, functional groups, etc.
Resources: Dataset
Dataset: provides access to chemical compounds and their features (e.g. structural, physical-chemical, biological, toxicological properties)
Operations
POST - Upload a dataset
PUT - Update the dataset content
DELETE - Remove the dataset
Representation
RDF/XML (mandatory)
• The dataset consists of data entries
• Each entry is associated with exactly one chemical compound, identified by its URI and available via the OpenTox Compound service API
• One and the same compound can be associated with multiple dataset entries
• Every “column” is associated with a Feature; its representation should be available via the OpenTox Feature API
Resources: Dataset
Representation
The dataset services optionally support formats other than RDF
• text/csv (Comma delimited)
• text/x-arff (Weka ARFF)
• application/pdf
• chemical/x-mdl-sdfile
• other Chemical MIME formats
This allows retrieving the same data in a convenient format, but the URL links to compound and feature resources are lost
@prefix ad: <http://apps.ideaconsult.net:8080/ambit2/dataset/> .
@prefix af: <http://apps.ideaconsult.net:8080/ambit2/feature/> .
@prefix ot: <http://www.opentox.org/api/1.1#> .
…
ad:9 a ot:Dataset ;
    ot:dataEntry
        [ a ot:DataEntry ;
          ot:compound <http://apps.ideaconsult.net:8080/ambit2/compound/413/conformer/409421> ;
          ot:values
              [ a ot:FeatureValue ;
                ot:feature af:21576 ;
                ot:value "3.309999942779541"^^xsd:double
              ] ;
          ot:values
              [ a ot:FeatureValue ;
                ot:feature af:21573 ;
                ot:value "3.0"^^xsd:double
              ]
        ] .
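The relation between data entries, feature values and table “columns” can be made concrete with a small Python sketch. It models the dataset from the N3 example as plain data classes and flattens it into a table with one column per feature URI. The class and function names are illustrative only and are not part of the OpenTox API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

BASE = "http://apps.ideaconsult.net:8080/ambit2"


@dataclass
class FeatureValue:
    feature: str   # feature URI, i.e. the "column"
    value: float


@dataclass
class DataEntry:
    compound: str  # compound URI, i.e. the "row"
    values: List[FeatureValue] = field(default_factory=list)


def to_table(entries: List[DataEntry]) -> Tuple[List[str], List[List[Optional[object]]]]:
    """Flatten data entries into a header plus rows, one column per feature URI."""
    columns = sorted({fv.feature for entry in entries for fv in entry.values})
    rows = []
    for entry in entries:
        by_feature = {fv.feature: fv.value for fv in entry.values}
        # Missing values stay None so that every row has the same width.
        rows.append([entry.compound] + [by_feature.get(c) for c in columns])
    return ["compound"] + columns, rows


entries = [
    DataEntry(
        compound=BASE + "/compound/413/conformer/409421",
        values=[
            FeatureValue(BASE + "/feature/21576", 3.309999942779541),
            FeatureValue(BASE + "/feature/21573", 3.0),
        ],
    )
]
header, rows = to_table(entries)
print(header)
print(rows)
```

Because the columns are feature URIs rather than bare names, the flattened table keeps the links that the CSV or ARFF export would lose.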
Dataset metadata and features
Description | URI Template
Retrieve entire dataset content. If uri-list, retrieve only compound URIs | http://host:port/dataset/{id}
Retrieve representation of features (columns) of the dataset | http://host:port/dataset/{id}/feature
Retrieve dataset metadata (name etc.) | http://host:port/dataset/{id}/metadata
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/dataset/9/metadata
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    ……
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Dataset rdf:about="dataset/9">
    <dc:source>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:source>
    <dc:publisher>somebody</dc:publisher>
    <rdfs:seeAlso>
      <bx:Entry rdf:about="reference/20117">
        <rdfs:seeAlso>http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html</rdfs:seeAlso>
        <dc:title>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:title>
      </bx:Entry>
    </rdfs:seeAlso>
    <dc:title>ISSCAN: Istituto Superiore di Sanita, CHEMICAL CARCINOGENS: STRUCTURES AND EXPERIMENTAL DATA</dc:title>
  </ot:Dataset>
</rdf:RDF>
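The three URI templates on this slide can be expanded mechanically. A minimal Python sketch, assuming the base service URL is known to the client (the function name is illustrative):

```python
def dataset_uris(base: str, dataset_id: int) -> dict:
    """Expand the dataset URI templates for one dataset id."""
    root = "%s/dataset/%s" % (base.rstrip("/"), dataset_id)
    return {
        "content": root,                  # entire dataset content
        "features": root + "/feature",    # representation of the columns
        "metadata": root + "/metadata",   # name, source, publisher, ...
    }


uris = dataset_uris("http://apps.ideaconsult.net:8080/ambit2/", 9)
for name, uri in uris.items():
    print(name, uri)
```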
Data publishing
1) POST a file with chemical structures and properties to the OpenTox dataset service
• The structures and data are assigned a dataset URL and become available in multiple formats (RDF, Chemical MIME, CSV, Weka ARFF)
2) Assign metadata
• PUT /dataset/{id}/metadata
3) Annotate any of the dataset features (/dataset/{id}/feature) by assigning links to relevant ontologies
• PUT /feature/{id}
[Diagram: upload data, receive dataset URL http://host2/dataset/{id}; annotation; find chemical compounds, return dataset URL http://host2/compound/{id}; HTTP GET and POST against the Dataset service (structures, endpoints & predictions), linked to toxicology-related ontologies and algorithm ontologies]
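The three publishing steps on this slide map onto three HTTP requests. The Python sketch below only constructs them and sends nothing. The host `http://host2` comes from the slide; the ids, payloads and `Content-Type` values are placeholders chosen for illustration, and in practice the dataset id would be read from the response to the first request.

```python
from urllib.request import Request


def publishing_requests(service, sdf_bytes, metadata_rdf, feature_rdf,
                        dataset_id, feature_id):
    """Build the upload, metadata and annotation requests (not sent here)."""
    # Step 1: upload structures and properties to the dataset service.
    upload = Request(
        "%s/dataset" % service, data=sdf_bytes, method="POST",
        headers={"Content-Type": "chemical/x-mdl-sdfile"},
    )
    # Step 2: assign dataset metadata.
    metadata = Request(
        "%s/dataset/%s/metadata" % (service, dataset_id),
        data=metadata_rdf, method="PUT",
        headers={"Content-Type": "application/rdf+xml"},
    )
    # Step 3: annotate a feature with links to ontology entries.
    annotate = Request(
        "%s/feature/%s" % (service, feature_id),
        data=feature_rdf, method="PUT",
        headers={"Content-Type": "application/rdf+xml"},
    )
    return [upload, metadata, annotate]


steps = publishing_requests(
    "http://host2",
    sdf_bytes=b"(SDF file content)",          # placeholder payload
    metadata_rdf=b"(RDF/XML metadata)",       # placeholder payload
    feature_rdf=b"(RDF/XML feature)",         # placeholder payload
    dataset_id=9,                             # hypothetical id
    feature_id=3,                             # hypothetical id
)
for step in steps:
    print(step.get_method(), step.full_url)
```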
Resources: Algorithm
Algorithm: provides access to OpenTox algorithms. There are several algorithm services developed by different OpenTox partners. The list of algorithms can be retrieved by an HTTP GET operation at http://host:port/algorithm
curl -H "Accept:text/uri-list" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNclassification
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/PLSregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/M5P
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/GaussP
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/FTMsmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/gSpansmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/JOELIB2
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/InfoGainAttributeEval
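A text/uri-list response such as the one above is simple to consume: one URI per line, with lines starting with `#` treated as comments. A minimal Python sketch, using three of the listed URIs as sample input (the helper name is illustrative):

```python
def parse_uri_list(body: str) -> list:
    """Return the URIs in a text/uri-list body, skipping blanks and comments."""
    uris = []
    for line in body.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            uris.append(line)
    return uris


SERVICE = "http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm"
sample = "\r\n".join([
    "# algorithms offered by the service",
    SERVICE + "/kNNclassification",
    SERVICE + "/J48",
    "",
    SERVICE + "/kNNregression",
])
algorithms = parse_uri_list(sample)
# The last path segment serves as a short algorithm name.
names = [uri.rsplit("/", 1)[-1] for uri in algorithms]
print(names)
```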
Resources: Algorithm
Representation
• Multiple types of algorithms
  • descriptor calculation algorithms
  • machine learning procedures
  • data preprocessing
• The representation of algorithms is again defined by the OpenTox ontology, where all algorithms are subclasses of ot:Algorithm
Algorithm types ontology
http://opentox.org/data/documents/development/RDF%20files/AlgorithmTypes
• provides a classification of algorithm types
• The algorithm type in the RDF representation is set by typing the resource (rdf:type) with a class from the algorithm types ontology (ota: http://www.opentox.org/algorithms.owl#), e.g. <myalgorithm> rdf:type ota:Classification
Resources: Algorithm
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/algorithm/J48
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ot="http://www.opentox.org/api/1.1#"
    …
    xmlns:ota="http://www.opentox.org/algorithmTypes.owl#">
  <ota:Supervised rdf:about="http://apps.ideaconsult.net:8080/ambit2/algorithm/J48">
    <dc:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Classification: Decision tree J48</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#Classification"/>
    <dc:description rdf:datatype="http://www.w3.org/2001/XMLSchema#string"></dc:description>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#SingleTarget"/>
    <dc:publisher rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">Somebody</dc:publisher>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#EagerLearning"/>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Algorithm"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">Sat Jul 31 17:11:26 EEST 2010</dc:date>
  </ota:Supervised>
</rdf:RDF>
Representation
• The algorithm name is defined by dc:title
• Parameters supported by the algorithm are specified via the object property ot:parameters and should be of class ot:Parameter (as defined in opentox.owl). These entries indicate which parameters are required in order to run the algorithm; the values themselves should be provided by the client when initiating the calculations via POST
• Algorithm types are distinguished by means of the Algorithm types ontology
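Since algorithm types are expressed as rdf:type statements, a client can discover them by collecting every type attached to the algorithm resource. The Python sketch below does this for a trimmed copy of the J48 representation, using only the standard XML parser; it assumes the typed-node form shown on the slide (the element name itself counts as one type) and is not a general RDF/XML reader.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

# Trimmed copy of the J48 representation: only the type information is kept.
J48_RDF = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ota="http://www.opentox.org/algorithmTypes.owl#">
  <ota:Supervised rdf:about="http://apps.ideaconsult.net:8080/ambit2/algorithm/J48">
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#Classification"/>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#SingleTarget"/>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#EagerLearning"/>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Algorithm"/>
  </ota:Supervised>
</rdf:RDF>"""


def algorithm_types(rdf_xml: str) -> list:
    """List the type URIs of the first resource described in the document."""
    node = ET.fromstring(rdf_xml)[0]
    # ElementTree spells a qualified name as "{namespace}local"; joining the
    # two parts gives the URI of the class used as the element name.
    namespace, local = node.tag[1:].split("}")
    types = [namespace + local]
    types += [t.get(RDF + "resource") for t in node.findall(RDF + "type")]
    return types


types = algorithm_types(J48_RDF)
short_names = sorted(uri.split("#")[-1] for uri in types)
print(short_names)
```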
Resources: Model
• Representations of predictive models
• A Model is created by HTTP POST to an ot:Algorithm with specific parameters and/or an input ot:Dataset
Representation
• The Model name is defined by the dc:title property
• The Model creator might be defined by the dc:creator property
• The date of Model creation is defined by the dc:date property
• The Algorithm is defined by the ot:algorithm object property
• The independent variables are instances of ot:Feature, defined by the ot:independentVariables property (can be multiple)
• The dependent variables are instances of ot:Feature and are defined by the ot:dependentVariables property (can be multiple)
• The variables where prediction results will be stored are instances of ot:Feature and are defined by ot:predictedVariables (can be multiple)
• Parameters are defined by ot:parameters
• The training Dataset is an instance of ot:Dataset and is defined by ot:trainingDataset
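The properties listed for a Model can be read as a small set of triples. The Python sketch below assembles them for one model and prints them in an N3-like form. The URIs reuse resources that appear elsewhere in the slides; the model title, the predicted feature id and the class name `ot:Model` are assumptions made for illustration.

```python
def model_triples(model, title, algorithm, training_dataset,
                  independent=(), dependent=(), predicted=()):
    """Return (subject, predicate, object) tuples describing a model."""
    triples = [
        (model, "a", "ot:Model"),   # class name assumed, not shown on the slide
        (model, "dc:title", '"%s"' % title),
        (model, "ot:algorithm", "<%s>" % algorithm),
        (model, "ot:trainingDataset", "<%s>" % training_dataset),
    ]
    # Each variable property may occur several times, once per feature.
    for prop, features in (
        ("ot:independentVariables", independent),
        ("ot:dependentVariables", dependent),
        ("ot:predictedVariables", predicted),
    ):
        triples.extend((model, prop, "<%s>" % uri) for uri in features)
    return triples


BASE = "http://apps.ideaconsult.net:8080/ambit2"
triples = model_triples(
    model=BASE + "/model/57",
    title="Example J48 model",                # hypothetical title
    algorithm=BASE + "/algorithm/J48",
    training_dataset=BASE + "/dataset/9",
    independent=[BASE + "/feature/21576"],
    dependent=[BASE + "/feature/21573"],
    predicted=[BASE + "/feature/99999"],      # hypothetical feature id
)
for subject, predicate, obj in triples:
    print("<%s> %s %s ." % (subject, predicate, obj))
```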
Linked resources: Compound, Algorithm, Model, Dataset, Features
[Diagram: Algorithm service, Model resource, Dataset resource, Descriptor resource, Assay resource and Chemical compound, connected through the Feature services and the Compound service]
[Diagram: Dataset resource, Descriptor resource, Assay resource and Chemical compound, linked to Regression, Classification, Quantum Chemistry, Descriptors etc. through the Blue Obelisk algorithms ontology, the OpenTox algorithm types ontology and toxicology-related ontologies]
Make the model available
Register at the OpenTox ontology service
- RDF triple storage
- Accepts HTTP POST
- SPARQL endpoint
curl -X POST -d "uri=http://apps.ideaconsult.net:8080/ambit2/model/57" http://apps.ideaconsult.net:8080/ontology
The model becomes visible for applications
[Diagram: the Model service (/model/{id}) sends an HTTP POST to the Ontology service, which holds published models, algorithms, ontologies and metadata]
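The registration call on this slide translates directly into a form-encoded POST. The Python sketch below builds the same request as the curl command, with the `uri` form field and both URLs taken from the slide, and leaves sending it to the caller.

```python
from urllib.parse import urlencode
from urllib.request import Request

ONTOLOGY_SERVICE = "http://apps.ideaconsult.net:8080/ontology"
MODEL_URI = "http://apps.ideaconsult.net:8080/ambit2/model/57"


def registration_request(ontology_service: str, resource_uri: str) -> Request:
    """Build the POST that registers a resource URI with the ontology service."""
    body = urlencode({"uri": resource_uri}).encode("ascii")
    return Request(ontology_service, data=body, method="POST")


register = registration_request(ONTOLOGY_SERVICE, MODEL_URI)
print(register.get_method(), register.full_url)
print(register.data.decode("ascii"))
```

Unlike the curl call, `urlencode` percent-encodes the model URI inside the form body, which is the safer choice when the URI contains reserved characters.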
Services implementation by partner and service type
[Table: services implemented (Y) per partner number. Service types: Compound, Dataset, Feature, Algorithm (processing), Algorithm (model), Model, Task, Report, Validation, Authentication and Authorisation service, Ontology service]
Partner 2: Y Y Y Y Y Y
Partner 3 (IDEA): Y Y Y Y Y Y Y Y
Partner 5: Y Y Y
Partner 6: Y Y Y Y
Partner 7: Y Y Y
Partner 10: Y
All components are implemented as REST web services
There could be multiple implementations of the same type of component
(Subsets of) services could be hosted by the same provider or by multiple providers at separate locations
http://apps.ideaconsult.net:8080/ambit2/
OpenTox Is A Framework
Framework
• Toxicity data
• Predictive models
• Validation support
• Interpretation aids
Unified Access
• Toxicologists
• Modelers
• API for new algorithm development & integration
Open Source
• To optimise impact
• To allow inspection, review
• To attract external contributors
Demo applications
OpenTox services can be used to develop specific applications or embedded in workflow systems
• Two end-user oriented demo applications making use of OpenTox web services have been developed and deployed, and are available for testing: http://toxcreate.org and http://toxpredict.org
• ToxCreate creates models from user-supplied datasets
• ToxPredict uses existing OpenTox models to estimate chemical compound properties
RDF: Lessons learned
• OpenTox specific: it did not start as a Linked Data/RDF project
• The REST and RDF mix is not (yet) popular… but it is natural to be able to retrieve a (partial) resource representation described by triples
• Steep learning curve
• Some hard topics
  - Data model vs. format
  - The subject-predicate-object concept vs. tabular/hierarchical/other implicit structure
  - The recognition of the added value
    • XML, JSON, YAML, plain text etc. vs. RDF
RDF: Performance
RDF representation is verbose … and in-memory RDF libraries are slow … lack of streaming parsers/writers …
[Charts: a dataset with 320 chemicals, 60 columns; a dataset with 6500 chemicals, 12 columns]
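The verbosity can be estimated from the dataset structure shown in the N3 example earlier: each data entry contributes three triples of its own (the ot:dataEntry link, its type and its compound), and each feature value contributes four (the ot:values link, its type, the feature and the value). The Python sketch below applies this count to the two datasets named on the slide. It is a back-of-the-envelope estimate under the assumption that every cell is filled, not a measurement from the presentation.

```python
TRIPLES_PER_ENTRY = 3   # ot:dataEntry link, rdf:type, ot:compound
TRIPLES_PER_VALUE = 4   # ot:values link, rdf:type, ot:feature, ot:value


def estimated_triples(compounds: int, columns: int) -> int:
    """Estimate the triples needed for a fully populated dataset."""
    per_entry = TRIPLES_PER_ENTRY + columns * TRIPLES_PER_VALUE
    # One extra triple types the dataset resource itself.
    return 1 + compounds * per_entry


small = estimated_triples(compounds=320, columns=60)
large = estimated_triples(compounds=6500, columns=12)
print(small, large)
# Triples per table cell: the overhead relative to a plain CSV export.
print(round(small / (320 * 60), 2), round(large / (6500 * 12), 2))
```

Under these assumptions a table cell costs a little over four triples, which is why streaming parsers and writers matter for larger datasets.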
RDF: Wish list
• Convenient explanation of the subject-predicate-object concept for beginners
• A (high-performance) triple storage should not be a mandatory requirement to publish RDF data
• Fast streaming parsers and writers
• (terse) JSON serialisation
• Security
• Synchronisation of distributed RDF content
Thank you
Build an application with the OpenTox REST Web Services API: http://opentox.org/dev/apis/api-1.1
Download the AMBIT implementation of the OpenTox API and launch your OpenTox service: http://ambit.sourceforge.net
Services implementation by partner and
service type
Ideaconsult Ltd41
Part
ner
No
Serv
ice t
ype
Com
pound
Data
set
Featu
re
Alg
ori
thm
(pro
cess
ing)
Alg
ori
thm
(model)
Model
Task
Report
Validati
on
Auth
ern
ticati
on
and A
uth
ori
sati
on
serv
ice
Onto
logy s
erv
ice
2 Y Y Y Y Y Y
3
(IDEA)
Y Y Y Y Y Y Y Y
5 Y Y Y
6 Y Y Y Y
7 Y Y Y
10 Y
All components are implemented as REST web services
There could be multiple implementations of same type of
components
(Subset of) services could be hosted by the same provider or
by multiple providers on separate locations
httpappsideaconsultnet8080ambit2
OpenTox Is A Framework
Framework
Unified Access
Open Source
bull Toxicity data
bull Predictive models
bull Validation support
bull Interpretation aids
bull Toxicologists
bull Modelers
bull API for new algorithmsdevelopment amp integration
bull To optimise impact
bull To allow inspection review
bull To attract external contributors
OpenTox services can be used to develop specific applications or embedded in
workflow systems
bull Two end user oriented demo applications making use of OpenTox
webservices have been developed deployed and are available for
testing ndash httptoxcreateorg and httptoxpredictorg
bull ToxCreate creates models from user supplied datasets
bull ToxPredict uses existing OpenTox models to estimate chemical
compound properties
Demo applications
43Ideaconsult LtdAugust 22
2010
RDF Lessons learned
bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project
bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)
resource representation described by triples
bull Steep learning curve
bull Some hard topicsndash Data model vs format
ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure
ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF
Ideaconsult Ltd44
RDF Performance
RDF representation is
verbose hellip
hellip and in-memory RDF
libraries are slow hellip
Ideaconsult Ltd45
A dataset with 320 chemicals 60 columns
A dataset with 6500 chemicals 12 columns
hellip lack of streaming parserswriters hellip
RDF Wish list
bull Convenient explanation of subject-predicate-
object concept for beginners
bull A (high performant) triple storage should not be
a mandatory requirement to publish RDF data
bull Fast streaming parsers and writers
bull (terse) JSON serialisation
bull Security
bull Synchronisation of distributed RDF content
Ideaconsult Ltd46
Thank you
Ideaconsult Ltd47
Build an application with OpenTox REST Web Services API
httpopentoxorgdevapisapi-11
Download AMBIT Implementation of OpenTox API
and launch your OpenTox service
httpambitsourceforgenet
Feature summary
• An OpenTox Feature resource uniquely identifies the properties and identifiers assigned to a compound, via feature URIs
• These URIs are dereferenceable
• Features can be given different levels of meaning by linking them to ontology entries (e.g. algorithms or toxicological endpoints), as well as to the algorithms that can be used to generate the property values
• The same approach can be used to denote assays (provided that the assay is defined by an ontology), species, functional groups, etc.
Resources: Dataset
Dataset
Provides access to chemical compounds and their features (e.g. structural, physical-chemical, biological, toxicological properties)
Operations
• POST – Upload a dataset
• PUT – Update the dataset content
• DELETE – Remove the dataset
Representation
RDF/XML (mandatory)
• The dataset consists of data entries
• Each entry is associated with exactly one chemical compound, identified by its URI and available via the OpenTox Compound service API
• One and the same compound can be associated with multiple dataset entries
• Every "column" is associated with a Feature; its representation should be available via the OpenTox Feature API
Resources: Dataset
Representation
The dataset services optionally support formats other than RDF:
• text/csv (comma delimited)
• text/x-arff (Weka ARFF)
• application/pdf
• chemical/x-mdl-sdfile
• other Chemical MIME formats
This allows the same data to be retrieved in a convenient format, but the URL links to the compound and feature resources are lost
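The optional formats listed here are selected through the HTTP Accept header. A minimal sketch of that content negotiation, assuming Python's standard-library urllib as the client; the helper name `dataset_request` and the short format keys are illustrative only and are not part of the OpenTox API. The request is constructed but not sent.

```python
from urllib.request import Request

# MIME types named on the slide; the short keys are hypothetical labels.
DATASET_FORMATS = {
    "rdf": "application/rdf+xml",   # the mandatory representation
    "csv": "text/csv",
    "arff": "text/x-arff",
    "pdf": "application/pdf",
    "sdf": "chemical/x-mdl-sdfile",
}


def dataset_request(dataset_uri, fmt="rdf"):
    """Build (but do not send) a GET request for one dataset representation."""
    if fmt not in DATASET_FORMATS:
        raise ValueError("unsupported format: %r" % fmt)
    return Request(dataset_uri, headers={"Accept": DATASET_FORMATS[fmt]}, method="GET")


req = dataset_request("http://apps.ideaconsult.net:8080/ambit2/dataset/9", "csv")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Keeping the dataset URI fixed and varying only the Accept header is what lets one resource serve RDF to linked-data clients and CSV or ARFF to modelling tools.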
@prefix ad: <http://apps.ideaconsult.net:8080/ambit2/dataset/> .
@prefix af: <http://apps.ideaconsult.net:8080/ambit2/feature/> .
@prefix ot: <http://www.opentox.org/api/1.1#> .
…
ad:9 a ot:Dataset ;
    ot:dataEntry
        [ a ot:DataEntry ;
          ot:compound
              <http://apps.ideaconsult.net:8080/ambit2/compound/413/conformer/409421> ;
          ot:values
              [ a ot:FeatureValue ;
                ot:feature af:21576 ;
                ot:value "3.309999942779541"^^xsd:double
              ] ;
          ot:values
              [ a ot:FeatureValue ;
                ot:feature af:21573 ;
                ot:value "3.0"^^xsd:double
              ]
        ] .
Dataset metadata and features
Description | URI template
Retrieve the entire dataset content; if text/uri-list is requested, retrieve only the compound URIs | http://host:port/dataset/{id}
Retrieve the representation of the features (columns) of the dataset | http://host:port/dataset/{id}/feature
Retrieve the dataset metadata (name etc.) | http://host:port/dataset/{id}/metadata
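The three URI templates can be expanded mechanically. A small sketch, assuming only the path layout given in the table; the function name `dataset_uris` and the dictionary keys are hypothetical.

```python
def dataset_uris(base, dataset_id):
    """Expand the dataset URI templates for one dataset id."""
    root = "%s/dataset/%s" % (base.rstrip("/"), dataset_id)
    return {
        "content": root,                  # whole dataset, or compound URIs as a uri-list
        "features": root + "/feature",    # the columns of the dataset
        "metadata": root + "/metadata",   # name, source, publisher, ...
    }


uris = dataset_uris("http://apps.ideaconsult.net:8080/ambit2", 9)
for name, uri in uris.items():
    print(name, uri)
```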
$ curl -H "Accept: application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/dataset/9/metadata
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    ……
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Dataset rdf:about="dataset/9">
    <dc:source>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:source>
    <dc:publisher>somebody</dc:publisher>
    <rdfs:seeAlso>
      <bx:Entry rdf:about="reference/20117">
        <rdfs:seeAlso>http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html</rdfs:seeAlso>
        <dc:title>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</dc:title>
      </bx:Entry>
    </rdfs:seeAlso>
    <dc:title>ISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATA</dc:title>
  </ot:Dataset>
</rdf:RDF>
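A client can pull the metadata out of such a response with an ordinary XML parser. The sketch below uses Python's xml.etree.ElementTree on a trimmed copy of the response (the nested rdfs:seeAlso entry is left out). The Dublin Core namespace URI is an assumption, since the slide elides that declaration, and the function name is hypothetical. It only handles this particular RDF/XML shape; a general client would more likely use an RDF library.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ot": "http://www.opentox.org/api/1.1#",
    "dc": "http://purl.org/dc/elements/1.1/",  # assumed; elided on the slide
}
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Dataset rdf:about="dataset/9">
    <dc:publisher>somebody</dc:publisher>
    <dc:title>ISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATA</dc:title>
  </ot:Dataset>
</rdf:RDF>"""


def dataset_metadata(rdf_xml):
    """Return the URI, title and publisher of the first ot:Dataset element."""
    root = ET.fromstring(rdf_xml)
    dataset = root.find("ot:Dataset", NS)
    if dataset is None:
        raise ValueError("no ot:Dataset element found")
    about = dataset.get("{%s}about" % NS["rdf"], "")
    return {
        # rdf:about is relative here, so resolve it against xml:base
        "uri": urljoin(root.get(XML_BASE, ""), about),
        "title": dataset.findtext("dc:title", default="", namespaces=NS),
        "publisher": dataset.findtext("dc:publisher", default="", namespaces=NS),
    }


print(dataset_metadata(SAMPLE))
```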
Data publishing
[Diagram; labels: Structures, endpoints & predictions; HTTP GET, POST; Dataset service; Upload data, receive dataset URL (http://host2/dataset/{id}); Find chemical compounds, return dataset URL (http://host2/compound/{id}); Annotation; Toxicology related ontologies; Algorithms ontologies]
1) POST a file with chemical structures and properties to the OpenTox dataset service
• The structures and data are assigned a dataset URL and become available in multiple formats (RDF, Chemical MIME, CSV, Weka ARFF)
2) Assign metadata
• PUT /dataset/{id}/metadata
3) Annotate any of the dataset features (/dataset/{id}/feature) by assigning links to relevant ontologies
• PUT /feature/{id}
Resources: Algorithm
Algorithm
Provides access to OpenTox algorithms. There are several algorithm services, developed by different OpenTox partners. The list of algorithms can be retrieved by an HTTP GET operation at http://host:port/algorithm
$ curl -H "Accept: text/uri-list" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNclassification
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/PLSregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/M5P
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/GaussP
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/FTMsmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/gSpansmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/JOELIB2
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/InfoGainAttributeEval
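A text/uri-list response such as this one is simple to consume: one URI per line, with lines starting with "#" treated as comments. A minimal parsing sketch; both function names are hypothetical, and the sample body reuses three of the URIs from the listing.

```python
def parse_uri_list(body):
    """Parse a text/uri-list body into a list of URIs."""
    uris = []
    for line in body.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):   # skip blank lines and comments
            uris.append(line)
    return uris


def algorithm_names(uris):
    """Last path segment of each algorithm URI, e.g. 'J48'."""
    return [u.rstrip("/").rsplit("/", 1)[-1] for u in uris]


BASE = "http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm"
body = BASE + "/kNNclassification\r\n" + BASE + "/J48\r\n" + BASE + "/M5P\r\n"
print(algorithm_names(parse_uri_list(body)))
```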
Resources: Algorithm
Representation
• Multiple types of algorithms
  – descriptor calculation algorithms
  – machine learning procedures
  – data preprocessing
• The representation of algorithms is again defined by the OpenTox ontology, where all algorithms are subclasses of ot:Algorithm
Algorithm types ontology
http://opentox.org/data/documents/development/RDF%20files/AlgorithmTypes
• provides a classification of algorithm types
• The algorithm type in the RDF representation is set by directly assigning (rdf:type) a class from the algorithm types ontology (ota: http://www.opentox.org/algorithms.owl),
e.g. <myalgorithm> rdf:type ota:Classification
Resources: Algorithm
$ curl -H "Accept: application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/algorithm/J48
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ot="http://www.opentox.org/api/1.1#"
    …
    xmlns:ota="http://www.opentox.org/algorithmTypes.owl#">
  <ota:Supervised rdf:about="http://apps.ideaconsult.net:8080/ambit2/algorithm/J48">
    <dc:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Classification Decision tree J48</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#Classification"/>
    <dc:description rdf:datatype="http://www.w3.org/2001/XMLSchema#string"></dc:description>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#SingleTarget"/>
    <dc:publisher rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">Somebody</dc:publisher>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#EagerLearning"/>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Algorithm"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">Sat Jul 31 17:11:26 EEST 2010</dc:date>
  </ota:Supervised>
</rdf:RDF>
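In this response the algorithm types appear in two places: the element name itself (ota:Supervised is a typed node, which is shorthand for an rdf:type statement) and the explicit rdf:type children. A sketch of collecting both with Python's standard XML parser, on a trimmed copy of the response that keeps only the type information; the function name is hypothetical, and the approach covers only this RDF/XML shape rather than RDF/XML in general.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns:ota="http://www.opentox.org/algorithmTypes.owl#">
  <ota:Supervised rdf:about="http://apps.ideaconsult.net:8080/ambit2/algorithm/J48">
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#Classification"/>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#SingleTarget"/>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#EagerLearning"/>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Algorithm"/>
  </ota:Supervised>
</rdf:RDF>"""


def resource_types(rdf_xml):
    """Map each described resource URI to the set of its rdf:type URIs."""
    types = {}
    for node in ET.fromstring(rdf_xml):          # top-level node elements
        subject = node.get("{%s}about" % RDF)
        if subject is None:
            continue
        found = types.setdefault(subject, set())
        # A typed node element such as <ota:Supervised> is itself an rdf:type.
        namespace, _, local = node.tag[1:].partition("}")
        if (namespace, local) != (RDF, "Description"):
            found.add(namespace + local)
        for child in node.findall("{%s}type" % RDF):
            resource = child.get("{%s}resource" % RDF)
            if resource:
                found.add(resource)
    return types


for uri in sorted(resource_types(SAMPLE)["http://apps.ideaconsult.net:8080/ambit2/algorithm/J48"]):
    print(uri)
```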
Representation
• The algorithm name is defined by dc:title
• Parameters supported by the algorithm are specified via the object property ot:parameters and should be of class ot:Parameter (as defined in opentox.owl)
• These entries indicate which parameters are required in order to run the algorithm; the values themselves should be provided by the client when initiating the calculations via POST
• Algorithm types are distinguished by means of the Algorithm types ontology
Resources: Model
• Representations of predictive models
• A Model is created by an HTTP POST to an ot:Algorithm, with specific parameters and/or an input ot:Dataset
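Creating a model therefore amounts to a form POST against an algorithm URI. The sketch below only builds such a request with Python's urllib and does not send it. The form field name `dataset_uri` is an assumption that should be checked against the API documentation of the service in use, and the function name is hypothetical; the algorithm and dataset URIs are the ones used in the examples of this deck.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request


def model_training_request(algorithm_uri, dataset_uri, **params):
    """Build (but do not send) the POST that asks an algorithm to create a model."""
    # 'dataset_uri' is an assumed field name; extra keyword arguments stand in
    # for algorithm-specific parameters supplied by the client.
    form = {"dataset_uri": dataset_uri}
    form.update({key: str(value) for key, value in params.items()})
    return Request(
        algorithm_uri,
        data=urlencode(form).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = model_training_request(
    "http://apps.ideaconsult.net:8080/ambit2/algorithm/J48",
    "http://apps.ideaconsult.net:8080/ambit2/dataset/9",
)
print(req.get_method(), req.full_url)
print(parse_qs(req.data.decode("ascii")))
```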
Representation
• The Model name is defined by the dc:title property
• The Model creator might be defined by the dc:creator property
• The date of Model creation is defined by the dc:date property
• The Algorithm is defined by the ot:algorithm object property
• The independent variables are instances of ot:Feature and are defined by the ot:independentVariables property (can be multiple)
• The dependent variables are instances of ot:Feature and are defined by the ot:dependentVariables property (can be multiple)
• The variables where prediction results will be stored are instances of ot:Feature and are defined by ot:predictedVariables (can be multiple)
• Parameters are defined by ot:parameters
• The training Dataset is an instance of ot:Dataset and is defined by ot:trainingDataset
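The properties listed above map naturally onto a small record type on the client side. A sketch in Python, assuming every linked resource is kept as a URI string; the class name, field names and the feature ids in the usage example are hypothetical, and only the property names in the comments come from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelDescription:
    """Plain-Python mirror of the Model properties listed above."""
    uri: str
    title: str                                   # dc:title
    algorithm: str                               # ot:algorithm
    training_dataset: Optional[str] = None       # ot:trainingDataset
    creator: Optional[str] = None                # dc:creator
    date: Optional[str] = None                   # dc:date
    independent_variables: List[str] = field(default_factory=list)  # ot:independentVariables
    dependent_variables: List[str] = field(default_factory=list)    # ot:dependentVariables
    predicted_variables: List[str] = field(default_factory=list)    # ot:predictedVariables
    parameters: List[str] = field(default_factory=list)             # ot:parameters

    def feature_uris(self):
        """All feature URIs the model refers to, in order, without duplicates."""
        seen = []
        for uri in (self.independent_variables + self.dependent_variables
                    + self.predicted_variables):
            if uri not in seen:
                seen.append(uri)
        return seen


BASE = "http://apps.ideaconsult.net:8080/ambit2"
model = ModelDescription(
    uri=BASE + "/model/57",
    title="Example model",
    algorithm=BASE + "/algorithm/J48",
    training_dataset=BASE + "/dataset/9",
    independent_variables=[BASE + "/feature/1", BASE + "/feature/2"],
    dependent_variables=[BASE + "/feature/3"],
    predicted_variables=[BASE + "/feature/4"],
)
print(model.feature_uris())
```

Because every field holds a URI, a client can dereference any of them to obtain the linked algorithm, dataset or feature representation.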
Linked resources: Compound, Algorithm, Model, Dataset, Features
[Diagram; labels: Algorithm service, Model resource, Dataset resource, Descriptor resource, Assay resource, Chemical compound, Feature service, Compound service]
Linked resources: Compound, Algorithm, Model, Dataset, Features
[Diagram; labels: Dataset resource, Descriptor resource, Assay resource, Chemical compound, Regression, Classification, Quantum Chemistry, Descriptors etc., Blue Obelisk algorithms ontology, OpenTox algorithm types ontology, Toxicology related ontologies]
Make the model available
Register at the OpenTox ontology service
– RDF triple storage
– Accepts HTTP POST
– SPARQL endpoint
$ curl -X POST -d "uri=http://apps.ideaconsult.net:8080/ambit2/model/57" http://apps.ideaconsult.net:8080/ontology
The model becomes visible to applications
[Diagram; labels: Model service (/model/{id}), HTTP POST, Ontology service, Published models, Algorithms, Ontologies, metadata]
Services implementation by partner and service type
Service types: Compound, Dataset, Feature, Algorithm (processing), Algorithm (model), Model, Task, Report, Validation, Authentication and Authorisation service, Ontology service
Partner No | Service types implemented (Y)
2 | Y Y Y Y Y Y
3 (IDEA) | Y Y Y Y Y Y Y Y
5 | Y Y Y
6 | Y Y Y Y
7 | Y Y Y
10 | Y
All components are implemented as REST web services
There could be multiple implementations of the same type of component
(A subset of) the services could be hosted by the same provider or by multiple providers at separate locations
http://apps.ideaconsult.net:8080/ambit2/
OpenTox Is A Framework
Unified Access
• Toxicity data
• Predictive models
• Validation support
• Interpretation aids
Framework
• Toxicologists
• Modelers
• API for new algorithm development & integration
Open Source
• To optimise impact
• To allow inspection and review
• To attract external contributors
Demo applications
OpenTox services can be used to develop specific applications or be embedded in workflow systems
• Two end-user-oriented demo applications making use of OpenTox web services have been developed and deployed, and are available for testing: http://toxcreate.org and http://toxpredict.org
• ToxCreate creates models from user-supplied datasets
• ToxPredict uses existing OpenTox models to estimate chemical compound properties
RDF Lessons learned
• OpenTox specific – it did not start as a Linked Data/RDF project
• The REST and RDF mix is not (yet) popular… but it is natural to be able to retrieve a (partial) resource representation described by triples
• Steep learning curve
• Some hard topics
  – Data model vs. format
  – The subject-predicate-object concept vs. tabular/hierarchical/other implicit structure
  – The recognition of the added value
    • XML, JSON, YAML, plain text, etc. vs. RDF
RDF Performance
RDF representation is verbose …
… and in-memory RDF libraries are slow …
… lack of streaming parsers/writers …
[Charts: a dataset with 320 chemicals and 60 columns; a dataset with 6500 chemicals and 12 columns]
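A rough triple count gives a feel for the verbosity. The sketch below assumes the nested data entry structure of the Dataset representation shown earlier in the deck, one data entry per chemical and a value in every column, and it ignores the triples that describe the compounds and features themselves. It is back-of-the-envelope arithmetic and does not reproduce the measurements on the charts.

```python
def dataset_triples(compounds, columns):
    """Estimate the triples in the nested dataset representation.

    Per dataset: 1 (rdf:type ot:Dataset)
    Per entry:   3 (ot:dataEntry link, rdf:type ot:DataEntry, ot:compound)
    Per value:   4 (ot:values link, rdf:type ot:FeatureValue, ot:feature, ot:value)
    """
    return 1 + compounds * (3 + 4 * columns)


for compounds, columns in [(320, 60), (6500, 12)]:
    values = compounds * columns
    triples = dataset_triples(compounds, columns)
    print("%d x %d: %d values -> %d triples" % (compounds, columns, values, triples))
# 320 x 60: 19200 values -> 77761 triples
# 6500 x 12: 78000 values -> 331501 triples
```

Under these assumptions each table cell costs four triples, which is why a dataset that fits comfortably in a CSV file becomes a sizeable in-memory graph.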
RDF Wish list
• A convenient explanation of the subject-predicate-object concept for beginners
• A (high-performance) triple store should not be a mandatory requirement for publishing RDF data
• Fast streaming parsers and writers
• (Terse) JSON serialisation
• Security
• Synchronisation of distributed RDF content
Thank you
Build an application with the OpenTox REST Web Services API
http://opentox.org/dev/apis/api-1.1
Download the AMBIT implementation of the OpenTox API and launch your OpenTox service
http://ambit.sourceforge.net
Resources Dataset
OperationsPOST ndash Upload a dataset
PUT ndash Update the dataset content
DELETE ndash Remove the dataset
Representation
RDFXML (mandatory)
bullThe dataset consists of data entries
bullEach entry is associated with exactly one
chemical compound identified by its
URI and available via OpenTox
Compound service API
bullOne and the same compound can be
associated with multiple dataset entries
bullEvery ldquocolumnrdquo is associated with a
Feature its representation should be
available via OpenTox Feature API
Ideaconsult Ltd30
DatasetProvides access to chemical compounds and their
features (eg structural physical-chemical
biological toxicological properties)
Resources Dataset
Representation
Ideaconsult Ltd31
The dataset services optionally
support formats other than RDF
bulltextcsv (Comma delimited)
bulltextx-arff (Weka ARFF)
bullapplicationpdf
bullchemicalx-mdl-sdfile
bullother Chemical MIME formats
This allows retrieving the same data
in convenient format but the URL
links to compound and feature
resources are being lost
prefix ad lthttpappsideaconsultnet8080ambit2datasetgt
prefix af lthttpappsideaconsultnet8080ambit2featuregt
prefix ot lthttpwwwopentoxorgapi11gt
hellip
ad9 a otDataset
otdataEntry
[ a otDataEntry
otcompound
lthttpappsideaconsultnet8080ambit2compound413conformer409421gt
otvalues
[ a otFeatureValue
otfeature af21576
otvalue 3309999942779541^^xsddouble
]
otvalues
[ a otFeatureValue
otfeature af21573
otvalue 30^^xsddouble
]
]
Dataset metadata and features
Ideaconsult Ltd32
Description URI Template
Retrieve entire dataset content If uri-list
retrieve only compound URIs
httphostportdatasetid
Retrieve representation of features (columns)
of the dataset
httphostportdatasetidfeature
Retrieves dataset metadata (name etc) httphostportdatasetidmetadata
$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata
ltrdfRDF
xmlnsot=httpwwwopentoxorgapi11
helliphellip
xmlnsrdfs=httpwwww3org200001rdf-schema
xmlbase=httpappsideaconsultnet8080ambit2gt
ltotDataset rdfabout=dataset9gt
ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt
ltdcpublishergtsomebodyltdcpublishergt
ltrdfsseeAlsogt
ltbxEntry rdfabout=reference20117gt
ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt
ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt
ltbxEntrygt
ltrdfsseeAlsogt
ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt
ltotDatasetgt
ltrdfRDFgt
Data publishing
Upload data receive
dataset URL
httphost2datasetid
AnnotationFind chemical compounds
return dataset URL
httphost2compoundid
HTTP GET POST
Dataset service
Structures
endpoints
amppredictions
1)POST a file with chemical structures and properties to
OpenTox dataset servicebullThe structures and data are assigned a dataset URL and
become available by multiple formats (RDF Chemical
MIME CSV Weka ARFF)
2)Assign metadata
bullPUT datasetidmetadata
3)Annotate any of dataset features
datasetidfeature by assigning links to relevant
ontologies
bullPUT featureid
Toxiciology related
ontologies
Algorithms
ontologies
Resources Algorithm
AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by
different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at
httphostportalgorithm
Ideaconsult Ltd34
curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-
devalgorithm
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval
Resources Algorithm
Representation
bull Multiple type of algorithms
bull descriptor calculation algorithms
bull machine learning procedures
bull data preprocessing
bull The representation of algorithms is again defined by
Opentox ontology where all algorithms are subclass of
otAlgorithm
Algorithm types ontology
httpopentoxorgdatadocumentsdevelopmentRDF2
0filesAlgorithmTypes
bull provides a classification of algorithm types
bull Algorithm type in RDF representation is set by direct
subclassing (rdftype) of a class from the algorithm types
ontology (otahttpwwwopentoxorgalgorithmsowl )
eg ltmyalgorithmgt rdftype otaClassification
Ideaconsult Ltd35
Resources Algorithm
Ideaconsult Ltd36
$ curl -H Acceptapplicationrdf+xml
httpappsideaconsultnet8080ambit2algorithmJ48
ltrdfRDF
xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns
xmlnsot=httpwwwopentoxorgapi11
hellip
xmlnsota=httpwwwopentoxorgalgorithmTypesowl
ltotaSupervised
rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt
ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring
gtClassification Decision tree J48ltdctitlegt
ltrdftype
rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt
ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring
gtltdcdescriptiongt
ltrdftype
rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt
ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI
gtSomebodyltdcpublishergt
ltrdftype
rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt
ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt
ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime
gtSat Jul 31 171126 EEST 2010ltdcdategt
ltotaSupervisedgt
ltrdfRDFgt
RepresentationbullAlgorithm name is defined by dctitle
Parameters supported by the algorithm are
specified via object property otparameters
and should be of class otParameter (as
defined in opentoxowl)
These entries serve as a information what
parameters are required in order to run the
algorithm the values itself should be
provided by the client when initiating the
calculations via POST
bullAlgorithm types are distinguished by
means of Algorithm types ontology
Resources Model
bull Representations of predictive models
bull A Model is created by HTTP POST to an
otAlgorithm with specific parameters
andor input otDataset
Ideaconsult Ltd37
Representation
bull Model Name is defined by dctitle property
bull Model creator might be defined by dccreator
property
bull The date of Model creation is defined by dcdate
property
bull The Algorithm defined by otalgorithm object
property
bull The independent variables are instances of
otFeature defined by otindependentVariables
property (can be multiple)
bull The dependent variables are are instances of
otFeature and are defined by
otdependentVariables property (can be multiple)
bull The variables where prediction results will be
stored are are instances of otFeature and are
defined by otpredictedVariables (can be multiple)
bull Parameters are defined by otparameters
bull The training Dataset is an instance of otDataset and
defined by ottrainingDataset
Algorithm
service
Linked resources Compound Algorithm Model Dataset Features
Ideaconsult Ltd38
Model
resourceDataset
Resource
Descriptor
resource
Assay
resource
Chemical
compound
Feature
service
Feature
service
Compoun
d service
Linked resources Compound Algorithm Model Dataset Features
Ideaconsult Ltd39
Dataset
Resource
Descriptor
resource
Assay
resource
Chemical
compound
Regression
Classification
Quantum
Chemistry
Descriptorsetc
Blue Obelisk
algorithms
ontology
OpenTox
algorithm types
ontology
Toxiciology related
ontologies
Make the model available
40Ideaconsult Ltd
Register at OpenTox ontology servicendash RDF tripple storage
ndash Accepts HTTP POST
ndash SPARQL endpoint
Curl ndashX POST ndashd rdquouri=httpappsideaconsultnet8080ambit2model57rdquo httpappsideaconsultnet8080ontology
Becomes visible for applications
modelidPublished models
Algorithms
Ontologies
metadata
Ontology service
HTTP POST
Model service
Services implementation by partner and
service type
Ideaconsult Ltd41
Part
ner
No
Serv
ice t
ype
Com
pound
Data
set
Featu
re
Alg
ori
thm
(pro
cess
ing)
Alg
ori
thm
(model)
Model
Task
Report
Validati
on
Auth
ern
ticati
on
and A
uth
ori
sati
on
serv
ice
Onto
logy s
erv
ice
2 Y Y Y Y Y Y
3
(IDEA)
Y Y Y Y Y Y Y Y
5 Y Y Y
6 Y Y Y Y
7 Y Y Y
10 Y
All components are implemented as REST web services
There could be multiple implementations of same type of
components
(Subset of) services could be hosted by the same provider or
by multiple providers on separate locations
httpappsideaconsultnet8080ambit2
OpenTox Is A Framework
Framework
Unified Access
Open Source
bull Toxicity data
bull Predictive models
bull Validation support
bull Interpretation aids
bull Toxicologists
bull Modelers
bull API for new algorithmsdevelopment amp integration
bull To optimise impact
bull To allow inspection review
bull To attract external contributors
OpenTox services can be used to develop specific applications or embedded in
workflow systems
bull Two end user oriented demo applications making use of OpenTox
webservices have been developed deployed and are available for
testing ndash httptoxcreateorg and httptoxpredictorg
bull ToxCreate creates models from user supplied datasets
bull ToxPredict uses existing OpenTox models to estimate chemical
compound properties
Demo applications
43Ideaconsult LtdAugust 22
2010
RDF Lessons learned
bull OpenTox specific ndash it hasnrsquot started as Linked dataRDF project
bull REST and RDF mix is not (yet) popularhellip but is natural to be able to retrieve (partial)
resource representation described by triples
bull Steep learning curve
bull Some hard topicsndash Data model vs format
ndash The subject-predicate-object concept vs tabularhierarchicalother implicit structure
ndash The recognition of the added value bull XML JSON YAML plain text etc vs RDF
Ideaconsult Ltd44
RDF Performance
RDF representation is
verbose hellip
hellip and in-memory RDF
libraries are slow hellip
Ideaconsult Ltd45
A dataset with 320 chemicals 60 columns
A dataset with 6500 chemicals 12 columns
hellip lack of streaming parserswriters hellip
RDF Wish list
bull Convenient explanation of subject-predicate-
object concept for beginners
bull A (high performant) triple storage should not be
a mandatory requirement to publish RDF data
bull Fast streaming parsers and writers
bull (terse) JSON serialisation
bull Security
bull Synchronisation of distributed RDF content
Ideaconsult Ltd46
Thank you
Ideaconsult Ltd47
Build an application with OpenTox REST Web Services API
httpopentoxorgdevapisapi-11
Download AMBIT Implementation of OpenTox API
and launch your OpenTox service
httpambitsourceforgenet
Resources Dataset
Representation
Ideaconsult Ltd31
The dataset services optionally
support formats other than RDF
bulltextcsv (Comma delimited)
bulltextx-arff (Weka ARFF)
bullapplicationpdf
bullchemicalx-mdl-sdfile
bullother Chemical MIME formats
This allows retrieving the same data
in convenient format but the URL
links to compound and feature
resources are being lost
prefix ad lthttpappsideaconsultnet8080ambit2datasetgt
prefix af lthttpappsideaconsultnet8080ambit2featuregt
prefix ot lthttpwwwopentoxorgapi11gt
hellip
ad9 a otDataset
otdataEntry
[ a otDataEntry
otcompound
lthttpappsideaconsultnet8080ambit2compound413conformer409421gt
otvalues
[ a otFeatureValue
otfeature af21576
otvalue 3309999942779541^^xsddouble
]
otvalues
[ a otFeatureValue
otfeature af21573
otvalue 30^^xsddouble
]
]
Dataset metadata and features
Ideaconsult Ltd32
Description URI Template
Retrieve entire dataset content If uri-list
retrieve only compound URIs
httphostportdatasetid
Retrieve representation of features (columns)
of the dataset
httphostportdatasetidfeature
Retrieves dataset metadata (name etc) httphostportdatasetidmetadata
$ curl -H Acceptapplicationrdf+xml httpappsideaconsultnet8080ambit2dataset9metadata
ltrdfRDF
xmlnsot=httpwwwopentoxorgapi11
helliphellip
xmlnsrdfs=httpwwww3org200001rdf-schema
xmlbase=httpappsideaconsultnet8080ambit2gt
ltotDataset rdfabout=dataset9gt
ltdcsourcegtISSCAN_v3a_1153_19Sept081222179139sdfltdcsourcegt
ltdcpublishergtsomebodyltdcpublishergt
ltrdfsseeAlsogt
ltbxEntry rdfabout=reference20117gt
ltrdfsseeAlsogthttpwwwepagovNCCTdsstoxsdf_isscan_externalhtmlltrdfsseeAlsogt
ltdctitlegtISSCAN_v3a_1153_19Sept081222179139sdfltdctitlegt
ltbxEntrygt
ltrdfsseeAlsogt
ltdctitlegtISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATAltdctitlegt
ltotDatasetgt
ltrdfRDFgt
Data publishing
Upload data receive
dataset URL
httphost2datasetid
AnnotationFind chemical compounds
return dataset URL
httphost2compoundid
HTTP GET POST
Dataset service
Structures
endpoints
amppredictions
1)POST a file with chemical structures and properties to
OpenTox dataset servicebullThe structures and data are assigned a dataset URL and
become available by multiple formats (RDF Chemical
MIME CSV Weka ARFF)
2)Assign metadata
bullPUT datasetidmetadata
3)Annotate any of dataset features
datasetidfeature by assigning links to relevant
ontologies
bullPUT featureid
Toxiciology related
ontologies
Algorithms
ontologies
Resources Algorithm
AlgorithmProvides access to OpenTox algorithms There are several algorithm services developed by
different OpenTox partners List of algorithms can be retrieved by HTTP GET operation at
httphostportalgorithm
Ideaconsult Ltd34
curl -H Accepttexturi-list httpopentoxinformatiktu-muenchende8080OpenTox-
devalgorithm
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNclassification
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJ48
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmkNNregression
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmPLSregression
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmM5P
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmGaussP
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmFTMsmiles
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmgSpansmiles
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmCDKPhysChem
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmJOELIB2
httpopentoxinformatiktu-muenchende8080OpenTox-devalgorithmInfoGainAttributeEval
Resources Algorithm
Representation
bull Multiple type of algorithms
bull descriptor calculation algorithms
bull machine learning procedures
bull data preprocessing
bull The representation of algorithms is again defined by
Opentox ontology where all algorithms are subclass of
otAlgorithm
Algorithm types ontology
httpopentoxorgdatadocumentsdevelopmentRDF2
0filesAlgorithmTypes
bull provides a classification of algorithm types
bull Algorithm type in RDF representation is set by direct
subclassing (rdftype) of a class from the algorithm types
ontology (otahttpwwwopentoxorgalgorithmsowl )
eg ltmyalgorithmgt rdftype otaClassification
Ideaconsult Ltd35
Resources Algorithm
Ideaconsult Ltd36
$ curl -H Acceptapplicationrdf+xml
httpappsideaconsultnet8080ambit2algorithmJ48
ltrdfRDF
xmlnsrdf=httpwwww3org19990222-rdf-syntax-ns
xmlnsot=httpwwwopentoxorgapi11
hellip
xmlnsota=httpwwwopentoxorgalgorithmTypesowl
ltotaSupervised
rdfabout=httpappsideaconsultnet8080ambit2algorithmJ48gt
ltdctitle rdfdatatype=httpwwww3org2001XMLSchemastring
gtClassification Decision tree J48ltdctitlegt
ltrdftype
rdfresource=httpwwwopentoxorgalgorithmTypesowlClassificationgt
ltdcdescription rdfdatatype=httpwwww3org2001XMLSchemastring
gtltdcdescriptiongt
ltrdftype
rdfresource=httpwwwopentoxorgalgorithmTypesowlSingleTargetgt
ltdcpublisher rdfdatatype=httpwwww3org2001XMLSchemaanyURI
gtSomebodyltdcpublishergt
ltrdftype
rdfresource=httpwwwopentoxorgalgorithmTypesowlEagerLearninggt
ltrdftype rdfresource=httpwwwopentoxorgapi11Algorithmgt
ltdcdate rdfdatatype=httpwwww3org2001XMLSchemadateTime
gtSat Jul 31 171126 EEST 2010ltdcdategt
ltotaSupervisedgt
ltrdfRDFgt
Representation
• Algorithm name is defined by dc:title
• Parameters supported by the algorithm are specified via the object property ot:parameters and should be of class ot:Parameter (as defined in opentox.owl)
• These entries indicate which parameters are required to run the algorithm; the values themselves should be provided by the client when initiating the calculation via POST
• Algorithm types are distinguished by means of the algorithm types ontology
Resources: Model
• Representations of predictive models
• A Model is created by an HTTP POST to an ot:Algorithm with specific parameters and/or an input ot:Dataset
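The POST that creates a model can be sketched as follows. The request is only built, not sent. The form field name `dataset_uri` and the helper name are assumptions for illustration; the slides say only that parameters and/or a dataset are supplied by the client.

```python
from urllib.parse import urlencode
from urllib.request import Request


def model_training_request(algorithm_uri, dataset_uri, **parameters):
    """Build (but do not send) the POST that asks an algorithm to train a model."""
    # Form fields: the input dataset plus any algorithm-specific parameters.
    # "dataset_uri" is an assumed field name, not confirmed by the slides.
    fields = {"dataset_uri": dataset_uri, **parameters}
    return Request(
        algorithm_uri,
        data=urlencode(fields).encode("ascii"),
        headers={"Accept": "text/uri-list"},
        method="POST",
    )


if __name__ == "__main__":
    req = model_training_request(
        "http://apps.ideaconsult.net:8080/ambit2/algorithm/J48",
        "http://apps.ideaconsult.net:8080/ambit2/dataset/9",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("ascii"))
    # Sending would be urllib.request.urlopen(req); omitted in this sketch.
```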
Representation
• Model name is defined by the dc:title property
• Model creator may be defined by the dc:creator property
• The date of model creation is defined by the dc:date property
• The algorithm is defined by the ot:algorithm object property
• The independent variables are instances of ot:Feature, defined by the ot:independentVariables property (can be multiple)
• The dependent variables are instances of ot:Feature, defined by the ot:dependentVariables property (can be multiple)
• The variables where prediction results will be stored are instances of ot:Feature, defined by ot:predictedVariables (can be multiple)
• Parameters are defined by ot:parameters
• The training dataset is an instance of ot:Dataset, defined by ot:trainingDataset
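The properties listed above map naturally onto a small record type. The following sketch is one possible in-memory mirror of a model description; the class name, the field names and the Dublin Core namespace URI are assumptions, while the `ot:` property names are the ones listed on the slide.

```python
from dataclasses import dataclass, field

OT = "http://www.opentox.org/api/1.1#"
DC = "http://purl.org/dc/elements/1.1/"  # assumed Dublin Core namespace


@dataclass
class ModelDescription:
    """Hypothetical container for the metadata of one model resource."""
    uri: str
    title: str
    algorithm: str
    training_dataset: str
    independent_variables: list = field(default_factory=list)
    dependent_variables: list = field(default_factory=list)
    predicted_variables: list = field(default_factory=list)

    def triples(self):
        """Yield (subject, predicate, object) tuples for this model."""
        yield (self.uri, DC + "title", self.title)
        yield (self.uri, OT + "algorithm", self.algorithm)
        yield (self.uri, OT + "trainingDataset", self.training_dataset)
        # Multi-valued properties: one triple per feature URI.
        for prop, values in (
            ("independentVariables", self.independent_variables),
            ("dependentVariables", self.dependent_variables),
            ("predictedVariables", self.predicted_variables),
        ):
            for feature_uri in values:
                yield (self.uri, OT + prop, feature_uri)
```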
Linked resources: Compound, Algorithm, Model, Dataset, Features
[Figure: diagram with boxes labelled Algorithm service, Model resource, Dataset resource, Descriptor resource, Assay resource, Chemical compound, Feature service and Compound service]
[Figure: the same diagram with ontology labels added: Regression, Classification, Quantum Chemistry, Descriptors etc.; Blue Obelisk algorithms ontology; OpenTox algorithm types ontology; Toxicology related ontologies]
Make the model available
Register at the OpenTox ontology service
– RDF triple storage
– Accepts HTTP POST
– SPARQL endpoint
curl -X POST -d "uri=http://apps.ideaconsult.net:8080/ambit2/model/57" http://apps.ideaconsult.net:8080/ontology
The model becomes visible to applications
[Figure: the Model service (/model/{id}) sends an HTTP POST to the Ontology service, which holds published models, algorithms, ontologies and metadata]
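The registration call shown with curl can be expressed in Python as below, again only building the request. The SPARQL query is an illustration of what an application might ask the endpoint afterwards; the `ot:Model` class name and the Dublin Core namespace are assumptions not spelled out on the slides.

```python
from urllib.parse import urlencode
from urllib.request import Request

ONTOLOGY_SERVICE = "http://apps.ideaconsult.net:8080/ontology"


def registration_request(resource_uri, service=ONTOLOGY_SERVICE):
    """Build the POST that announces a resource URI to the ontology service."""
    body = urlencode({"uri": resource_uri}).encode("ascii")
    return Request(service, data=body, method="POST")


# Illustrative query listing registered models and their titles (assumed vocabulary).
LIST_MODELS = """
PREFIX ot: <http://www.opentox.org/api/1.1#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?model ?title WHERE {
  ?model a ot:Model .
  OPTIONAL { ?model dc:title ?title }
}
"""

if __name__ == "__main__":
    req = registration_request("http://apps.ideaconsult.net:8080/ambit2/model/57")
    print(req.get_method(), req.full_url, req.data.decode("ascii"))
```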
Services implementation by partner and service type
Service types (table columns): Compound, Dataset, Feature, Algorithm (processing), Algorithm (model), Model, Task, Report, Validation, Authentication and Authorisation service, Ontology service
Partner No: implemented service types (one Y per implemented type)
2: Y Y Y Y Y Y
3 (IDEA): Y Y Y Y Y Y Y Y
5: Y Y Y
6: Y Y Y Y
7: Y Y Y
10: Y
• All components are implemented as REST web services
• There can be multiple implementations of the same type of component
• (A subset of) services can be hosted by the same provider or by multiple providers at separate locations
http://apps.ideaconsult.net:8080/ambit2
OpenTox Is A Framework
• Unified Access
  – Toxicity data
  – Predictive models
  – Validation support
  – Interpretation aids
• Framework
  – Toxicologists
  – Modelers
  – API for new algorithm development & integration
• Open Source
  – To optimise impact
  – To allow inspection/review
  – To attract external contributors
Demo applications
OpenTox services can be used to develop specific applications or be embedded in workflow systems.
• Two end-user-oriented demo applications that make use of OpenTox web services have been developed and deployed, and are available for testing: http://toxcreate.org and http://toxpredict.org
• ToxCreate creates models from user-supplied datasets
• ToxPredict uses existing OpenTox models to estimate chemical compound properties
RDF: Lessons learned
• OpenTox specific: it did not start as a Linked Data/RDF project
• The mix of REST and RDF is not (yet) popular... but it is natural to be able to retrieve a (partial) resource representation described by triples
• Steep learning curve
• Some hard topics
  – Data model vs format
  – The subject-predicate-object concept vs tabular/hierarchical/other implicit structure
  – The recognition of the added value: XML, JSON, YAML, plain text etc. vs RDF
RDF: Performance
RDF representation is verbose ... and in-memory RDF libraries are slow ... lack of streaming parsers/writers ...
[Figure: a dataset with 320 chemicals, 60 columns]
[Figure: a dataset with 6500 chemicals, 12 columns]
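To illustrate what streaming buys, the sketch below counts dataset rows in an RDF/XML document with the standard library's incremental XML parser, discarding each entry's subtree once it has been seen instead of building the whole graph in memory. It is an XML-level shortcut, not a general RDF streaming parser, and it assumes each row is serialised as an `ot:DataEntry` element, a name that does not appear in these slides.

```python
import io
import xml.etree.ElementTree as ET

OT = "http://www.opentox.org/api/1.1#"

SAMPLE = b"""<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ot="http://www.opentox.org/api/1.1#">
  <ot:Dataset rdf:about="http://example.org/dataset/1">
    <ot:dataEntry><ot:DataEntry/></ot:dataEntry>
    <ot:dataEntry><ot:DataEntry/></ot:dataEntry>
  </ot:Dataset>
</rdf:RDF>"""


def count_entries(stream, tag=f"{{{OT}}}DataEntry"):
    """Count matching elements without retaining each entry's subtree."""
    count = 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == tag:
            count += 1
            elem.clear()  # free the children and text of the finished entry
    return count


if __name__ == "__main__":
    print(count_entries(io.BytesIO(SAMPLE)))
```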
RDF: Wish list
• A convenient explanation of the subject-predicate-object concept for beginners
• A (high-performance) triple store should not be a mandatory requirement for publishing RDF data
• Fast streaming parsers and writers
• (Terse) JSON serialisation
• Security
• Synchronisation of distributed RDF content
Thank you
Build an application with the OpenTox REST Web Services API
http://opentox.org/dev/apis/api-1.1
Download the AMBIT implementation of the OpenTox API and launch your OpenTox service
http://ambit.sourceforge.net
Dataset metadata and features
Description | URI template
Retrieve entire dataset content; if uri-list, retrieve only compound URIs | http://host:port/dataset/{id}
Retrieve representation of features (columns) of the dataset | http://host:port/dataset/{id}/feature
Retrieve dataset metadata (name etc.) | http://host:port/dataset/{id}/metadata
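The three URI templates in the table above can be expanded mechanically. A minimal sketch, with a hypothetical helper name and dictionary keys chosen for illustration:

```python
def dataset_uris(base, dataset_id):
    """Expand the three dataset URI templates for one dataset (hypothetical helper)."""
    root = f"{base.rstrip('/')}/dataset/{dataset_id}"
    return {
        "content": root,                 # whole dataset, or compound URIs as uri-list
        "features": root + "/feature",   # the dataset's columns
        "metadata": root + "/metadata",  # name, source, etc.
    }


if __name__ == "__main__":
    for name, uri in dataset_uris("http://apps.ideaconsult.net:8080/ambit2", 9).items():
        print(name, uri)
```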
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/dataset/9/metadata
<rdf:RDF
    xmlns:ot="http://www.opentox.org/api/1.1#"
    ……
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xml:base="http://apps.ideaconsult.net:8080/ambit2/">
  <ot:Dataset rdf:about="dataset/9">
    <dc:source>ISSCAN_v3a_1153_19Sept081222179139.sdf</dc:source>
    <dc:publisher>somebody</dc:publisher>
    <rdfs:seeAlso>
      <bx:Entry rdf:about="reference/20117">
        <rdfs:seeAlso>http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html</rdfs:seeAlso>
        <dc:title>ISSCAN_v3a_1153_19Sept081222179139.sdf</dc:title>
      </bx:Entry>
    </rdfs:seeAlso>
    <dc:title>ISSCAN Istituto Superiore di Sanita CHEMICAL CARCINOGENS STRUCTURES AND EXPERIMENTAL DATA</dc:title>
  </ot:Dataset>
</rdf:RDF>
Data publishing
[Figure: structures, endpoints & predictions are sent to the Dataset service over HTTP GET/POST. Upload data, receive dataset URL: http://host2/dataset/{id}. Annotation. Find chemical compounds, return dataset URL: http://host2/compound/{id}]
1) POST a file with chemical structures and properties to the OpenTox dataset service
   • The structures and data are assigned a dataset URL and become available in multiple formats (RDF, Chemical MIME, CSV, Weka ARFF)
2) Assign metadata
   • PUT /dataset/{id}/metadata
3) Annotate any of the dataset features (/dataset/{id}/feature) by assigning links to relevant ontologies
   • PUT /feature/{id}
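The three publishing steps can be sketched as three HTTP requests, built here without being sent. The function names, the `/dataset` upload path and the choice of SDF as the upload format are assumptions for illustration; the PUT targets follow the templates given in the steps.

```python
from urllib.request import Request

RDF_XML = "application/rdf+xml"


def upload_request(service, sdf_bytes):
    """Step 1: POST a structure file to the dataset service (assumed path /dataset)."""
    return Request(
        service.rstrip("/") + "/dataset",
        data=sdf_bytes,
        headers={"Content-Type": "chemical/x-mdl-sdfile"},
        method="POST",
    )


def metadata_request(dataset_uri, rdf_xml):
    """Step 2: PUT metadata to the dataset URL returned by step 1."""
    return Request(
        dataset_uri.rstrip("/") + "/metadata",
        data=rdf_xml,
        headers={"Content-Type": RDF_XML},
        method="PUT",
    )


def annotate_request(feature_uri, rdf_xml):
    """Step 3: PUT an updated feature description carrying ontology links."""
    return Request(
        feature_uri,
        data=rdf_xml,
        headers={"Content-Type": RDF_XML},
        method="PUT",
    )
```

Each function returns a `urllib.request.Request`; passing it to `urllib.request.urlopen` would perform the call against a live service.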
[Figure labels: toxicology-related ontologies; algorithm ontologies]
Resources: Algorithm
Provides access to OpenTox algorithms. Several algorithm services have been developed by different OpenTox partners. The list of algorithms can be retrieved with an HTTP GET at http://host:port/algorithm
$ curl -H "Accept:text/uri-list" http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNclassification
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/kNNregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/PLSregression
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/M5P
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/GaussP
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/FTMsmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/gSpansmiles
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/CDKPhysChem
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/JOELIB2
http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/InfoGainAttributeEval
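A response in the text/uri-list format, like the listing above, is one URI per line. The sketch below parses such a body and extracts the algorithm names; both function names are hypothetical, and the comment handling follows the text/uri-list convention that lines starting with `#` are comments.

```python
def parse_uri_list(body):
    """Parse a text/uri-list body into a list of URIs."""
    uris = []
    for line in body.splitlines():
        line = line.strip()
        # Lines starting with '#' are comments in text/uri-list; skip blanks too.
        if line and not line.startswith("#"):
            uris.append(line)
    return uris


def algorithm_name(uri):
    """Return the last path segment of an algorithm URI."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


if __name__ == "__main__":
    body = (
        "http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/J48\r\n"
        "http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm/M5P\r\n"
    )
    print([algorithm_name(u) for u in parse_uri_list(body)])
```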
Resources Algorithm
Representation
bull Multiple type of algorithms
bull descriptor calculation algorithms
bull machine learning procedures
bull data preprocessing
bull The representation of algorithms is again defined by
Opentox ontology where all algorithms are subclass of
otAlgorithm
Algorithm types ontology
httpopentoxorgdatadocumentsdevelopmentRDF2
0filesAlgorithmTypes
bull provides a classification of algorithm types
bull Algorithm type in RDF representation is set by direct
subclassing (rdftype) of a class from the algorithm types
ontology (otahttpwwwopentoxorgalgorithmsowl )
eg ltmyalgorithmgt rdftype otaClassification
Ideaconsult Ltd35
Resources: Algorithm
$ curl -H "Accept:application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/algorithm/J48
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ot="http://www.opentox.org/api/1.1#"
    …
    xmlns:ota="http://www.opentox.org/algorithmTypes.owl#">
  <ota:Supervised rdf:about="http://apps.ideaconsult.net:8080/ambit2/algorithm/J48">
    <dc:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Classification Decision tree J48</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#Classification"/>
    <dc:description rdf:datatype="http://www.w3.org/2001/XMLSchema#string"></dc:description>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#SingleTarget"/>
    <dc:publisher rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">Somebody</dc:publisher>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#EagerLearning"/>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Algorithm"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">Sat Jul 31 17:11:26 EEST 2010</dc:date>
  </ota:Supervised>
</rdf:RDF>
Representation
• The algorithm name is defined by dc:title.
• Parameters supported by the algorithm are specified via the
object property ot:parameters and should be of class
ot:Parameter (as defined in opentox.owl). These entries tell
the client which parameters are required to run the algorithm;
the values themselves should be provided by the client when
initiating the calculation via POST.
• Algorithm types are distinguished by means of the
algorithm types ontology.
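A client could read these properties back from the RDF/XML with nothing more than a standard XML parser. The sketch below is only an illustration: it embeds a shortened copy of the J48 response, and it assumes that the dc prefix (whose declaration is elided on the slide) is Dublin Core Elements 1.1. The helper name describe_algorithm is hypothetical, and a real client would more likely use a full RDF library.

```python
import xml.etree.ElementTree as ET

# Namespaces as restored from the slide. The dc namespace is an ASSUMPTION
# (Dublin Core Elements 1.1); the slide elides that declaration.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "ota": "http://www.opentox.org/algorithmTypes.owl#",
}

# Shortened copy of the algorithm representation shown on the slide.
RDF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:ota="http://www.opentox.org/algorithmTypes.owl#">
  <ota:Supervised rdf:about="http://apps.ideaconsult.net:8080/ambit2/algorithm/J48">
    <dc:title>Classification Decision tree J48</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#Classification"/>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#SingleTarget"/>
    <rdf:type rdf:resource="http://www.opentox.org/algorithmTypes.owl#EagerLearning"/>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Algorithm"/>
  </ota:Supervised>
</rdf:RDF>"""


def describe_algorithm(rdf_xml: str) -> dict:
    """Return the URI, title and rdf:type URIs of the first typed node."""
    root = ET.fromstring(rdf_xml)
    node = root[0]  # the typed node element, here ota:Supervised
    rdf = NS["rdf"]
    # A typed node element implies one rdf:type: namespace + local name.
    types = [node.tag.replace("{", "").replace("}", "")]
    # Explicit rdf:type children contribute the remaining types.
    types += [t.get(f"{{{rdf}}}resource") for t in node.findall("rdf:type", NS)]
    return {
        "uri": node.get(f"{{{rdf}}}about"),
        "title": node.findtext("dc:title", namespaces=NS),
        "types": types,
    }


info = describe_algorithm(RDF_XML)
print(info["title"])
for type_uri in info["types"]:
    print("  rdf:type", type_uri)
```

The element name itself (ota:Supervised) counts as one of the algorithm's types, which is why the sketch adds it to the list alongside the explicit rdf:type entries.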
Resources: Model
• Representations of predictive models
• A Model is created by HTTP POST to an
ot:Algorithm with specific parameters
and/or an input ot:Dataset
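The model-creation step can be sketched as a form-encoded POST to the algorithm URI. This is a sketch under stated assumptions: the form field names dataset_uri and prediction_feature, the Accept value text/uri-list, and the example.org URIs are assumptions for illustration and do not appear on the slides. The request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Algorithm URI taken from the slides.
ALGORITHM_URI = "http://apps.ideaconsult.net:8080/ambit2/algorithm/J48"


def build_model_request(algorithm_uri: str, dataset_uri: str, **params: str) -> Request:
    """Build (but do not send) the POST that asks an algorithm to train a model.

    The field name "dataset_uri" and any extra parameters are ASSUMED names;
    check the API documentation of the service being called.
    """
    form = {"dataset_uri": dataset_uri, **params}
    return Request(
        algorithm_uri,
        data=urlencode(form).encode("ascii"),
        headers={"Accept": "text/uri-list"},  # assumed: ask for the result URI
        method="POST",
    )


req = build_model_request(
    ALGORITHM_URI,
    "http://example.org/dataset/1",  # hypothetical training dataset
    prediction_feature="http://example.org/feature/1",  # hypothetical parameter
)
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))
```

Passing the finished request to urllib.request.urlopen would send it; that call is left out so the sketch runs without a network connection.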
Representation
• The Model name is defined by the dc:title property
• The Model creator might be defined by the dc:creator
property
• The date of Model creation is defined by the dc:date
property
• The Algorithm is defined by the ot:algorithm object
property
• The independent variables are instances of
ot:Feature, defined by the ot:independentVariables
property (can be multiple)
• The dependent variables are instances of
ot:Feature and are defined by the
ot:dependentVariables property (can be multiple)
• The variables where prediction results will be
stored are instances of ot:Feature and are
defined by ot:predictedVariables (can be multiple)
• Parameters are defined by ot:parameters
• The training Dataset is an instance of ot:Dataset and
is defined by ot:trainingDataset
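The properties listed above map naturally onto subject-predicate-object triples. The following is a minimal sketch, not the AMBIT implementation: the Model class and its field names are hypothetical, the dc namespace is assumed to be Dublin Core Elements 1.1, the example.org URIs are invented, and the sketch does not distinguish literals from URIs.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

OT = "http://www.opentox.org/api/1.1#"
DC = "http://purl.org/dc/elements/1.1/"  # ASSUMED Dublin Core namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = Tuple[str, str, str]


@dataclass
class Model:
    """Hypothetical in-memory view of an ot:Model resource."""
    uri: str
    title: str
    algorithm: str
    training_dataset: str
    independent: List[str] = field(default_factory=list)
    dependent: List[str] = field(default_factory=list)
    predicted: List[str] = field(default_factory=list)

    def triples(self) -> Iterator[Triple]:
        """Yield one (subject, predicate, object) triple per property value."""
        yield (self.uri, RDF_TYPE, OT + "Model")
        yield (self.uri, DC + "title", self.title)
        yield (self.uri, OT + "algorithm", self.algorithm)
        yield (self.uri, OT + "trainingDataset", self.training_dataset)
        groups = (
            ("independentVariables", self.independent),
            ("dependentVariables", self.dependent),
            ("predictedVariables", self.predicted),
        )
        # Each variable property "can be multiple": one triple per feature.
        for prop, features in groups:
            for feature_uri in features:
                yield (self.uri, OT + prop, feature_uri)


model = Model(
    uri="http://apps.ideaconsult.net:8080/ambit2/model/57",
    title="Example model",  # hypothetical title
    algorithm="http://apps.ideaconsult.net:8080/ambit2/algorithm/J48",
    training_dataset="http://example.org/dataset/1",
    independent=["http://example.org/feature/1", "http://example.org/feature/2"],
    dependent=["http://example.org/feature/3"],
    predicted=["http://example.org/feature/4"],
)
model_triples = list(model.triples())
for triple in model_triples:
    print(triple)
```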
Linked resources: Compound, Algorithm, Model, Dataset, Features
[Diagram: Algorithm service, Model resource, Dataset resource, Descriptor resource, Assay resource, Chemical compound, Feature service, Compound service]
Linked resources: Compound, Algorithm, Model, Dataset, Features
[Diagram: Dataset resource, Descriptor resource, Assay resource, Chemical compound; Regression, Classification, Quantum Chemistry, Descriptors, etc.; Blue Obelisk algorithms ontology; OpenTox algorithm types ontology; Toxicology related ontologies]
Make the model available
Register at the OpenTox ontology service
– RDF triple storage
– Accepts HTTP POST
– SPARQL endpoint
curl -X POST -d "uri=http://apps.ideaconsult.net:8080/ambit2/model/57" http://apps.ideaconsult.net:8080/ontology
Becomes visible for applications
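The same registration can be sketched from code. The example below mirrors the curl command by building a form-encoded POST with a single uri field; it does not send the request. The SPARQL query that follows is a hypothetical illustration of how an application might then look up registered models, and its dc prefix is assumed to be Dublin Core Elements 1.1.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Both URIs are taken from the curl command on the slide.
ONTOLOGY_SERVICE = "http://apps.ideaconsult.net:8080/ontology"
MODEL_URI = "http://apps.ideaconsult.net:8080/ambit2/model/57"


def build_registration(model_uri: str, service: str = ONTOLOGY_SERVICE) -> Request:
    """Build (but do not send) the POST that registers a model URI."""
    body = urlencode({"uri": model_uri}).encode("ascii")
    return Request(service, data=body, method="POST")


# Hypothetical query an application could run against the SPARQL endpoint.
LIST_MODELS = """
PREFIX ot: <http://www.opentox.org/api/1.1#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?model ?title
WHERE {
  ?model a ot:Model .
  OPTIONAL { ?model dc:title ?title }
}
"""

registration = build_registration(MODEL_URI)
print(registration.get_method(), registration.full_url)
print(registration.data.decode("ascii"))
```

Unlike the curl command, urlencode percent-encodes the model URI in the form body; a form-decoding server reads the same value either way.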
[Diagram: Model service, HTTP POST (model id), Ontology service: published models, algorithms, ontologies, metadata]
Services implementation by partner and service type
Service types: Compound, Dataset, Feature, Algorithm (processing), Algorithm (model), Model, Task, Report, Validation, Authentication and Authorisation service, Ontology service
Partner No    Services implemented
2             Y Y Y Y Y Y
3 (IDEA)      Y Y Y Y Y Y Y Y
5             Y Y Y
6             Y Y Y Y
7             Y Y Y
10            Y
All components are implemented as REST web services
There can be multiple implementations of the same type of
component
(A subset of) services can be hosted by the same provider or
by multiple providers at separate locations
http://apps.ideaconsult.net:8080/ambit2
OpenTox Is A Framework
Framework
• Toxicity data
• Predictive models
• Validation support
• Interpretation aids
Unified Access
• Toxicologists
• Modelers
• API for new algorithm development & integration
Open Source
• To optimise impact
• To allow inspection, review
• To attract external contributors
OpenTox services can be used to develop specific applications or be embedded in
workflow systems
Demo applications
• Two end-user oriented demo applications making use of OpenTox
web services have been developed and deployed, and are available for
testing: http://toxcreate.org and http://toxpredict.org
• ToxCreate creates models from user-supplied datasets
• ToxPredict uses existing OpenTox models to estimate chemical
compound properties
RDF Lessons learned
• OpenTox specific – it did not start as a Linked Data/RDF project
• The REST and RDF mix is not (yet) popular… but it is natural to be able to retrieve a (partial)
resource representation described by triples
• Steep learning curve
• Some hard topics
  – Data model vs. format
  – The subject-predicate-object concept vs. tabular/hierarchical/other implicit structure
  – The recognition of the added value
    • XML, JSON, YAML, plain text, etc. vs. RDF
RDF Performance
RDF representation is
verbose …
… and in-memory RDF
libraries are slow …
[Charts: a dataset with 320 chemicals, 60 columns; a dataset with 6500 chemicals, 12 columns]
… lack of streaming parsers/writers …
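To illustrate what a streaming writer means here, the sketch below emits N-Triples one line at a time from an iterator, so memory use does not grow with dataset size. It is an illustration only: the function names are hypothetical, the example.org URIs are invented, every value is written as a plain literal, and using the feature URI directly as the predicate is a simplification rather than the OpenTox dataset model.

```python
import io
from typing import Dict, Iterable, Iterator, TextIO, Tuple

Triple = Tuple[str, str, str]


def escape(value: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def dataset_triples(rows: Iterable[Tuple[str, Dict[str, object]]]) -> Iterator[Triple]:
    """Lazily turn (compound URI, {feature URI: value}) rows into triples."""
    for compound_uri, values in rows:
        for feature_uri, value in values.items():
            yield compound_uri, feature_uri, str(value)


def write_ntriples(triples: Iterable[Triple], out: TextIO) -> int:
    """Write each triple as soon as it is produced; return the number written."""
    count = 0
    for subject, predicate, obj in triples:
        out.write(f'<{subject}> <{predicate}> "{escape(obj)}" .\n')
        count += 1
    return count


# Hypothetical two-row dataset; a real one would be read row by row from a source.
rows = [
    ("http://example.org/compound/1", {"http://example.org/feature/logP": 2.1}),
    ("http://example.org/compound/2", {"http://example.org/feature/logP": 0.5}),
]
buffer = io.StringIO()
written = write_ntriples(dataset_triples(rows), buffer)
print(written)
print(buffer.getvalue(), end="")
```

Because both functions work on iterators, the rows could just as well come from a database cursor and the output go to an HTTP response stream, without an intermediate in-memory graph.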
RDF Wish list
• A convenient explanation of the subject-predicate-object
concept for beginners
• A (high-performance) triple store should not be
a mandatory requirement for publishing RDF data
• Fast streaming parsers and writers
• (Terse) JSON serialisation
• Security
• Synchronisation of distributed RDF content
Thank you
Build an application with the OpenTox REST Web Services API
http://opentox.org/dev/apis/api-1.1
Download the AMBIT implementation of the OpenTox API
and launch your own OpenTox service
http://ambit.sourceforge.net