RICORDO
http://www.RICORDO.eu/

Researching Interoperability using Core Reference Datasets and Ontologies for the Virtual Physiological Human

Small or medium-scale focused research project (STREP)
Grant agreement number 248502 (Collaborative Project) within the European Commission FP7 Framework Programme

Deliverable D4.3: Documentation of prototype RICORDO repository and basic web-based query system
Task 4.2: Implementation of a prototype RICORDO repository for the VPH
Work Package: Establishing a resource interoperability plan for the VPH Toolkit Development
Due date of deliverable: 31-Jan-2011
Actual submission date: 31-Jan-2011
Start date of project: 1-Feb-2010
Duration: 24 months
Organisation name of lead contractor for this deliverable: EMBL-EBI
Authors: Sarala Wimalaratne, Pierre Grenon, Robert Hoehndorf, George Gkoutos, and Bernard de Bono

Dissemination Level
• PU: Public (X)
• PP: Restricted to other programme participants (including the Commission Services)
• RE: Restricted to a group specified by the consortium (including the Commission Services)
• CO: Confidential, only for members of the consortium (including the Commission Services)




INTRODUCTION

The RICORDO project aims to establish semantic interoperability between VPH data and models (VPHDMs) by:

i. identifying appropriate dictionaries (i.e. reference databases and ontologies) to create a set of core reference datasets and ontologies (CORDO) that will provide the stable identifiers (IDs) for the annotation of VPHDMs;
ii. organizing a VPHDM interoperability plan (WP2) that is based on the co-ordinated approach of a resource annotation plan (WP3) and a corresponding technical framework (WP4) to enable communal application of and access to such annotations;
iii. designing and prototyping the RICORDO infrastructure that will support efficient CORDO ID-based queries across distributed VPHDM services (WP4);
iv. developing associated prototype look-up repositories to support the use of CORDO dictionary terms to search and integrate VPHDMs efficiently (WP4).

In particular, the latter objective is to co-ordinate the joint development of methodologies to:
• Annotate VPHDM resources using terms in biological ontologies;
• Construct composites to represent complex biological concepts;
• Provide accessibility to such information for the VPH community through development of a software infrastructure.

A prototype of the RICORDO repository and basic web-based query system is discussed in this report. The discussion is focused on development of:
• Repository infrastructure to store VPHDM annotations and ontological resources;
• Software applications for users to query across annotated VPHDMs.

BACKGROUND

The Virtual Physiological Human (VPH) community deals with large collections of anatomical, physiological, and pathological data stored in computer-readable format. These data range from computational models to patient-specific data in clinical databases, and the representations of these VPH data and models (VPHDMs) are correspondingly heterogeneous. Frequently the underlying biological concepts captured in VPHDM metadata are not explicit. It is also common practice to represent metadata as a free-text comment. Only occasionally are the data annotated with terms from biological ontologies. This lack of consistent annotation makes it difficult to share the different types of data. Moreover, the biological concepts covered in VPHDMs span multiple domains, and therefore multiple ontologies. There is no formal definition for integrating terms from multiple ontologies, which makes it difficult to annotate VPHDMs correctly.

As part of the RICORDO project, a communal method is being established to support consistent, structured annotation of VPHDMs that can be processed by machines. This will promote sharing of the body of knowledge contained in the different types of data and models relevant to the VPH community. A formal definition for integrating terms from multiple ontologies, which will facilitate the consistent communal annotation of VPHDMs, is being developed. In this document, we provide an initial assessment and documentation of an infrastructure implemented to support the interoperability requirements for VPHDMs (discussed in RICORDO's D2.3).


DESCRIPTION OF WORK

This deliverable describes the prototype RICORDO repository (section [I]) and the basic web-based query system (section [II]). Further details about the technologies referenced in the text below are listed at the end of this document.

[I] RICORDO Repository

The central module in the RICORDO repository infrastructure is a store of machine-processable, semantic annotations of VPHDM resources. This section details the components of the infrastructure that manage this module, including maintaining the store and querying it with the help of intermediate reasoning over the ontological resources used in the semantic annotations of VPHDM resources.

Overview of the architecture

RICORDO follows a simple three-layer architecture (see Figure 1): a presentation layer that handles user interaction, a logic layer that carries out the core functionality, and a data layer of databases that store the relevant sources. Figure 1 illustrates the basic components and workflow for the query application:
• Web interfaces: basic forms that allow users to query the VPHDMs;
• Core implementation: handles the user requests coming from the web interfaces;
• Data sources: store the ontologies and metadata that are relevant to VPHDMs.

The organization of the current prototype implementing the above architecture is illustrated in Figure 2. This implementation consists of:
• Query service application: allows users to interact with the VPHDMs;
• Application server: deploys the RICORDO query service application, which integrates the infrastructure and exposes it to the community to enable sharing of VPHDMs;
• Reasoner server: supports storage of, and reasoning over, the ontology space;
• Resource Description Framework (RDF) repository: stores VPHDM metadata and makes it available for querying.

Figure 1: Overview of the query application functional architecture. [Diagram layers: Web interfaces, Core implementation, Data sources. Diagram components: Query Interface (provides an interface for querying different types of VPHDMs); Ontology Term Finder (finds terms using term names); OWL Reasoner (finds relevant terms using ontological querying); SPARQL Search Engine (RDF-based metadata search engine); OWL Ontologies (FMA, GO, CHEBI, Composites, and PATO); RDF Annotations (repository of standardized annotations of VPHDMs).]


RDF annotations are statements that refer, on the one hand, to identifiers of elements within VPHDMs and, on the other, to identifiers of terms in OWL ontologies. As a result, the RICORDO application does not require access to the original VPHDM data sources, but only has to point to a relevant set of IDs (this aspect of the system is illustrated in Figure 2 by VPHDMs being depicted outside the red RICORDO 'box'). This is useful if VPHDM providers choose not to publicize their data but limit themselves to providing references to the type of data that they maintain.

Description of the key steps in the architecture

The key workflow steps of the current architecture are illustrated in Figure 3, as follows:
1. A user request originating from the query service (QS) web interface passes to the application server (AS);
2. The AS connects to the reasoner server to find ontology terms;
3. The reasoner server returns the relevant ontological terms to the AS;
4. The AS connects to the RDF triple store to query VPHDM metadata;
5. The RDF store returns the relevant metadata to the AS;
6. The AS provides the results to the QS, which displays the VPHDM results on the interface.

In this particular implementation of the RICORDO infrastructure, the application also accesses external web services, as follows:
• Ontology Lookup Service (OLS): the applications developed in this work require access to a number of large ontological resources. We therefore use OLS to query for ontological terms. This process occurs between steps 1 and 2;
• MIRIAM Services: mapped ontological terms are stored following the MIRIAM URN scheme. We therefore use the MIRIAM web services to resolve MIRIAM URNs and MIRIAM resources. This process occurs between steps 3 and 4.
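As a rough illustration of the MIRIAM URN scheme mentioned above, the following Python sketch converts between an ontology term ID and a URN of the form that appears in the RICORDO annotations (for example `urn:miriam:obo.fma:FMA%3A7286`). It only performs local string handling and does not call the MIRIAM web services; the function names are hypothetical and are not part of the RICORDO code base.

```python
from typing import Tuple
from urllib.parse import quote, unquote

MIRIAM_PREFIX = "urn:miriam:"


def to_miriam_urn(namespace: str, term_id: str) -> str:
    """Build a MIRIAM URN, percent-encoding reserved characters in the term ID."""
    return f"{MIRIAM_PREFIX}{namespace}:{quote(term_id, safe='')}"


def parse_miriam_urn(urn: str) -> Tuple[str, str]:
    """Split a MIRIAM URN into (namespace, decoded term ID)."""
    if not urn.startswith(MIRIAM_PREFIX):
        raise ValueError(f"not a MIRIAM URN: {urn!r}")
    namespace, _, encoded_id = urn[len(MIRIAM_PREFIX):].partition(":")
    if not namespace or not encoded_id:
        raise ValueError(f"malformed MIRIAM URN: {urn!r}")
    return namespace, unquote(encoded_id)


if __name__ == "__main__":
    print(to_miriam_urn("obo.fma", "FMA:7286"))
    print(parse_miriam_urn("urn:miriam:obo.fma:FMA%3A7286"))
```

A conversion of this kind would sit between steps 3 and 4: terms returned by the reasoner are turned into URNs before the RDF store is queried.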

The application server component is developed in Java using the Google Web Toolkit (GWT). GWT is a development toolkit for building and optimizing complex browser-based applications. It is an open source tool developed by Google to enable productive development of high-performance web applications. Java is used as the implementation language to take advantage of other open source developments, which are discussed in the following sections. The sections below describe in detail the interactions between the application and (i) the reasoner server, as well as (ii) the triple store.

Figure 2: The current implementation of the RICORDO infrastructure (enclosed by a red margin). The RICORDO application is deployed in a Tomcat application server. It interacts with (i) the OWL knowledge base, which is deployed in a Pellet OWLlink server, and (ii) the RDF repository, which resides in a Joseki server. RDF annotations carry references to OWL ontology terms and VPHDM elements. [Diagram components: RICORDO query service application; application server (Tomcat); reasoner server (Pellet OWLlink); RDF triple store (Joseki); data sources (OWL ontologies, RDF annotations); VPHDMs.]

Discussion on reasoner server

OWL is a formal language based on description logic, and most biomedical ontologies are available in OWL. A key benefit of using OWL to represent knowledge is the possibility of performing automated reasoning across this knowledge. Reasoning is the process by which statements are automatically inferred from a set of statements that serve as axioms. For example, given that a finger must be part of a hand and a hand must be part of an arm, and given that the part-of relation is transitive, it can be inferred that a finger must necessarily be part of an arm, even if this is not explicitly stated. As the number of classes and relations in an ontology grows, drawing such inferences by hand becomes increasingly complex. Automated reasoning automates this process and thereby enables flexible access to information, the exploration of the consequences of stated knowledge, and the automated detection of contradictions.

One of the requirements discussed in RICORDO deliverable D2.3 was to set up a framework that supports reasoning over a number of large ontologies. OWLlink specifies a standard way to interact with reasoning services by providing an extensible protocol for communication with OWL reasoning systems. The OWLlink protocol allows client applications to configure a reasoner, to transmit ontologies, and to access reasoning services via a set of basic queries. The OWLlink API is a Java implementation of the OWLlink protocol on top of the Java-based OWL API. It enables applications to access remote reasoners, which are referred to as OWLlink servers. It turns any OWL API-aware reasoner, such as Pellet, HermiT, or FaCT++, into an OWLlink server. As shown in Figure 2, we use the Pellet OWLlink server. Requests and responses corresponding to steps 2 and 3 are illustrated in both Figures 3 and 4.
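The finger/hand/arm example can be made concrete with a small Python sketch that computes the transitive closure of a set of asserted part-of statements. This is only a toy illustration of inference over a transitive relation; it is not how Pellet or any description logic reasoner works internally, and the function name is hypothetical.

```python
from collections import defaultdict


def infer_part_of(asserted):
    """Return the transitive closure of a set of (part, whole) pairs."""
    wholes = defaultdict(set)
    for part, whole in asserted:
        wholes[part].add(whole)

    closure = set()
    for start in list(wholes):
        stack = list(wholes[start])
        seen = set()
        while stack:
            whole = stack.pop()
            if whole in seen:
                continue  # already visited; also guards against cycles
            seen.add(whole)
            closure.add((start, whole))
            # Follow the chain upwards: everything 'whole' is part of.
            stack.extend(wholes.get(whole, ()))
    return closure


asserted = {("finger", "hand"), ("hand", "arm")}
inferred = infer_part_of(asserted) - asserted

if __name__ == "__main__":
    print(inferred)
```

Here `inferred` contains the single pair `("finger", "arm")`, the statement that was never asserted but follows from transitivity.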

Figure 3: Key workflow steps in the RICORDO infrastructure implementation. [Diagram: steps 1 to 6 connecting the RICORDO query service application, the application server (Tomcat), the reasoner server (Pellet OWLlink), and the RDF triple store (Joseki).]


RICORDO ontologies are loaded into a knowledge base created in the Pellet OWLlink server (Figure 4). This allows us to query the RICORDO ontologies using OWLlink requests and responses. Currently the RICORDO OWLlink ontology server is deployed at http://bioonto.gen.cam.ac.uk:8081, and applications can query the RICORDO ontologies directly via this address.

Discussion on RDF triple store

The RICORDO RDF store is a repository of machine-processable, semantic annotations of VPHDM resources. In most cases, VPHDM resources (such as mathematical models or database schemas) contain an interpretative biological and biomedical dimension which is largely implicit. In the best cases, some traces of the biomedical meaning of resources and their components are found in free-text annotations of the resources and their parts. In RICORDO, such annotations are replaced, sometimes as a result of conversion, by a machine-processable equivalent. These annotations are statements in a machine-processable language that link identified resources and their components to terms in ontologies that represent elements of biological and biomedical reality. RICORDO uses the Resource Description Framework (RDF) to record and maintain these linkages: metadata represented in VPHDMs are extracted into a centralized repository (in this case, the RDF triple store). This strategy has several advantages, as it:
• hides the complexity of the VPHDM structure and representation;
• allows us to query the metadata independently of the data sources;
• makes it possible to maintain the privacy of the data sources.

A centralized RDF repository is used to store the annotation information (Figure 5). The store contains basic information about the VPHDMs and their elements, together with the mappings to ontological concepts. Storing the data in RDF supports complex querying. SPARQL, a query language for RDF, is used to query the RDF data store. Requests and responses correspond to steps 4 and 5 in Figure 3, respectively.
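To give a feel for what a step 4 request might look like, the sketch below assembles a SPARQL query that finds every annotated element pointing at any of a given set of term URNs. The query shape, the variable names, and the helper function are assumptions made for illustration; the queries actually issued by the RICORDO application are not given in this document. The second URN in the usage example is a made-up placeholder.

```python
def build_annotation_query(term_urns):
    """Return a SPARQL SELECT query matching annotations to any given term URN."""
    urns = list(term_urns)
    if not urns:
        raise ValueError("at least one term URN is required")
    for urn in urns:
        # Reject characters that cannot appear inside a SPARQL IRI reference.
        if any(ch in urn for ch in '<>"{}|^`\\') or any(ch.isspace() for ch in urn):
            raise ValueError(f"invalid IRI: {urn!r}")
    condition = " || ".join(f"?term = <{urn}>" for urn in urns)
    return (
        "SELECT ?element ?qualifier ?term\n"
        "WHERE {\n"
        "  ?element ?qualifier ?term .\n"
        f"  FILTER ({condition})\n"
        "}"
    )


if __name__ == "__main__":
    print(build_annotation_query([
        "urn:miriam:obo.fma:FMA%3A7286",
        "urn:miriam:obo.fma:FMA%3A0000000",  # hypothetical placeholder term
    ]))
```

The list of URNs would typically come from the reasoner (step 3), so that a query for a class also matches annotations to its subclasses.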

Figure 4: Workflow for Reasoner: once the knowledge base is set up in the Pellet OWLlink server, the knowledge can be retrieved using OWLlink requests. This diagram shows two examples, retrieving equivalent classes and subclasses. [Diagram elements: OWLlinkReasoner interface and Pellet OWLlink server. Setting up the knowledge base: CreateKB and LoadOntologies requests (for the OWL ontologies FMA, GO, CHEBI, Composites, and PATO), each answered with OK. Retrieving knowledge (steps 2 and 3): a GetEquivalentClasses request answered with SetOfClasses, and a GetSubClasses request answered with SetOfClassSynsets.]


Joseki is an open source SPARQL server for Jena that makes RDF data available for querying over HTTP. We use a Joseki server to store VPHDM metadata, and the Java-based Jena API to handle RDF triples and to run SPARQL queries against the data store.
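To make the structure of a stored annotation concrete, the following sketch reads the example annotation from Figure 5 and lists it as subject, predicate, object triples. It uses only the Python standard library rather than Jena, and it handles only this simple shape of RDF/XML (descriptions with resource-valued properties), so it is not a general RDF parser. The enclosing `rdf:RDF` element is added here because the figure shows only the description itself, and the `rcmd` namespace URI is an assumption, since the figure does not declare it.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BQBIOL = "http://biomodels.net/biology-qualifiers/"
RCMD = "http://www.ebi.ac.uk/ricordo/model#"  # assumed; not declared in the figure

ANNOTATION = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:bqbiol="{BQBIOL}" xmlns:rcmd="{RCMD}">
  <rdf:Description rdf:about="http://www.ebi.ac.uk/ricordo/ersatz/barcelona/CVM0000000001#_000002">
    <rdf:type rdf:resource="http://www.ebi.ac.uk/ricordo/model#model" />
    <rcmd:modelElementOf rdf:resource="http://www.ebi.ac.uk/ricordo/ersatz/barcelona/CVM0000000001#_000001" />
    <bqbiol:isVersionOf rdf:resource="urn:miriam:obo.fma:FMA%3A7286"/>
  </rdf:Description>
</rdf:RDF>"""


def read_triples(rdf_xml):
    """Return (subject, predicate, object) for each resource-valued property."""
    root = ET.fromstring(rdf_xml)
    result = []
    for description in root.findall(f"{{{RDF}}}Description"):
        subject = description.get(f"{{{RDF}}}about")
        for prop in description:
            # ElementTree tags look like "{namespace}local"; join them into one URI.
            predicate = prop.tag[1:].replace("}", "", 1)
            result.append((subject, predicate, prop.get(f"{{{RDF}}}resource")))
    return result


if __name__ == "__main__":
    for triple in read_triples(ANNOTATION):
        print(triple)
```

The third triple is the one that matters for querying: it links a model element to an ontology term through a biological qualifier.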

[II] Querying

The RICORDO query service application allows users to search across annotated data repositories. Users first find the ontological concepts that are of interest and then search across the VPHDM metadata to find VPHDMs that have mappings to the selected concepts. The application has been tested on all widely used web browsers. The current development version can be accessed at the following URL: http://bioonto.gen.cam.ac.uk:8080/ricordo. The source code is available from http://code.google.com/p/ricordo/.

Figure 6 shows the main page of the RICORDO query application. This particular application requires users to be familiar with the Manchester query syntax that is used to query the ontological knowledge. As shown in the figure, users can type in the Manchester query directly or select the type of Manchester query they would like to construct.

Figure 5: Workflow for the RICORDO repository: the Joseki server is configured to load VPHDM metadata which resides in RDF files. Once the Joseki server is set up, SPARQL queries can be executed using the Jena API. [Diagram elements: Setting up a Joseki server (configure the Joseki server to load VPHDM metadata via RDF files); Retrieving metadata (steps 4 and 5): a SPARQL query request sent through the Jena API and answered with results. Example RDF annotation shown in the diagram:]

    <rdf:Description rdf:about="http://www.ebi.ac.uk/ricordo/ersatz/barcelona/CVM0000000001#_000002">
      <rdf:type rdf:resource="http://www.ebi.ac.uk/ricordo/model#model" />
      <rcmd:modelElementOf rdf:resource="http://www.ebi.ac.uk/ricordo/ersatz/barcelona/CVM0000000001#_000001" />
      <bqbiol:isVersionOf rdf:resource="urn:miriam:obo.fma:FMA%3A7286"/>
    </rdf:Description>


Figure 6: Query page to search for VPHDMs based on their ontological annotations

Using templates to build queries

The query application allows the user to select templates with which to construct query terms. Each template has a particular form to specify the query terms (see Figure 7). Currently we support the following templates:
• <Term>: the Term template allows users to query the RICORDO RDF store for VPHDM resources directly related to the selected term or one of its specializations (i.e. subclasses). For example, if the <term> selected is Cardiac Atrium from the FMA, the query application will return all VPHDM resources bearing annotations to this ontological term or one of its specializations. It will therefore also return VPHDM resources annotated to the FMA's Left Atrium.
• Inheres-in some <term>: this template allows users to query the RICORDO RDF store for VPHDM resources related to a quality that inheres in the term <term>. For example, if the term is Heart from the FMA, the query application will return all VPHDM resources bearing annotations to ontological terms representing qualities of the heart. (Inheres-in is an ontological relation that holds between a quality and its bearer, for example between a volume and the heart that has it.)
• Part_of some <term>: this template allows users to query the RICORDO RDF store for VPHDM resources related to the term <term> or one of its parts. For example, if the term is Heart from the FMA, the query application will return all VPHDM resources bearing annotations to this term as well as to ontological terms representing parts of the heart, such as the left or right atria.
• Part_of some <term> AND part_of some <term>: this template allows users to query the RICORDO RDF store for VPHDM resources that are annotated with a class representing entities which are part of two other entities. For example, a query for part_of some Abdomen AND part_of some Vasculature will return, amongst other things, VPHDM resources annotated with "kidney blood vessel". The "kidney blood vessel" is both a part of the Abdomen and a part of the Vasculature, and is therefore a correct answer to the query.
• <Term> and inheres-in some (part_of some <term>): this template illustrates complex queries that are made by combining simpler ones. It allows users to query the RICORDO RDF store for VPHDM resources related to ontological terms that i) are specializations of an ontological <term> standing for a given kind of quality, for example volume, and ii) are qualities of some part of a second given <term>. For example, if the first term is Volume from PATO and the second term is Heart from the FMA, the query application will return all VPHDM resources bearing annotations to composite ontological terms representing the volume of the heart or of one of its parts.
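The five templates listed above can be sketched as simple string patterns that are filled in with the selected terms. The template keys and the function below are hypothetical and use plain term labels for readability; as Figure 11 indicates, the application itself formats the final query with the appropriate ontology identifiers.

```python
# Each entry maps a (hypothetical) template key to a pattern and its number of terms.
TEMPLATES = {
    "term": ("{0}", 1),
    "inheres_in": ("inheres-in some {0}", 1),
    "part_of": ("part_of some {0}", 1),
    "part_of_and_part_of": ("part_of some {0} and part_of some {1}", 2),
    "term_inheres_in_part_of": ("{0} and inheres-in some (part_of some {1})", 2),
}


def build_manchester_query(template, *terms):
    """Fill the chosen template with the given terms, checking the term count."""
    if template not in TEMPLATES:
        raise KeyError(f"unknown template: {template!r}")
    pattern, arity = TEMPLATES[template]
    if len(terms) != arity:
        raise ValueError(f"template {template!r} needs {arity} term(s), got {len(terms)}")
    return pattern.format(*terms)


if __name__ == "__main__":
    print(build_manchester_query("term_inheres_in_part_of", "volume", "heart"))
    print(build_manchester_query("part_of_and_part_of", "abdomen", "vasculature"))
```

Keeping the patterns in a table means that supporting a further template only requires a new entry, not new control flow.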


Figure 7: Manchester query templates

Building a simple query

Selecting a particular Manchester query type will generate fields to construct a Manchester query, where users can type in terms to build the ontological concept they are interested in. These fields also support auto-completion of ontological terms. Figure 8 illustrates the selection of left atrium.

Figure 8: Constructing a simple query

When the selection is complete, this creates the relevant Manchester query (see Figure 9). Once the Manchester query is constructed users can click on the ‘Search’ button.


Figure 9: Manchester query for left atrium

Using complex templates to build queries

Figure 10 illustrates the construction of a complex query using a template. Each reference to a term requires an ontological concept to be filled in.

Figure 10: Complex query template

Figure 11 shows the construction of a complex query. It applies the Term and inheres-in some (part_of some term) template. In this case, the complex query binds the first term of the template to the ontological term volume from PATO, representing volumes, and the second term of the template to the ontological term heart, representing hearts. The query 'volume and inheres-in some (part_of some heart)' will return VPHDM resources annotated with PATO volume, or one of its specializations (i.e. subclasses), that are volumes of some part of the heart or of the heart itself.


Figure 11: Construction of a complex query: 1) a query template is selected, 2) the relevant fields are filled in, 3) the ontological query term is formatted in Manchester syntax, using the appropriate ontology identifiers.

Interpreting query results

The search will return a list of all the VPHDMs that have annotations to the ontological concept. Figure 12 shows the search results for all VPHDMs that have references to left atrium. It lists the model URL and the frequency of annotations.

Figure 12: Search results

The results table in Figure 12 can be explored further to list the individual annotated elements. Figure 13 lists the individual annotations and shows the variable URL, the property (biological qualifier), and the MIRIAM URN which is associated with the term.
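The two result views described above (models with annotation frequencies, then the individual annotations of one model) can be sketched as a grouping over annotation rows. The row layout, the function names, and the example URLs below are hypothetical; they only illustrate how a flat list of matches could be summarized per model.

```python
from collections import Counter

# Hypothetical rows: (model_url, variable_url, qualifier, term_urn)
ROWS = [
    ("http://example.org/models/A", "http://example.org/models/A#v1",
     "bqbiol:isVersionOf", "urn:miriam:obo.fma:FMA%3A7286"),
    ("http://example.org/models/A", "http://example.org/models/A#v2",
     "bqbiol:isVersionOf", "urn:miriam:obo.fma:FMA%3A7286"),
    ("http://example.org/models/B", "http://example.org/models/B#v1",
     "bqbiol:isVersionOf", "urn:miriam:obo.fma:FMA%3A7286"),
]


def models_by_frequency(rows):
    """Return (model_url, annotation_count) pairs, most frequent first."""
    counts = Counter(model for model, _, _, _ in rows)
    # Sort by descending count, then by URL so the order is deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def annotations_for(rows, model_url):
    """Return (variable_url, qualifier, term_urn) for one model."""
    return [(var, qual, urn) for model, var, qual, urn in rows if model == model_url]


if __name__ == "__main__":
    print(models_by_frequency(ROWS))
    print(annotations_for(ROWS, "http://example.org/models/A"))
```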


Figure 13: Variable annotations

Complete example

Figure 14 illustrates a complete example for the query volume and inheres-in some (part_of some heart).


Conclusion

In summary, the current RICORDO infrastructure provides:
• A browser-based query application for querying across VPHDM metadata;
• A repository infrastructure: an RDF repository for storing annotations of VPHDMs (which we refer to as metadata) and a store of ontological resources containing the subset of ontologies that are specifically used to annotate VPHDMs.

Figure 14: Complete example. The steps shown in the figure are:
1. Main query page.
2. Select a template to construct queries.
3. Fields are generated to construct the query.
4. Auto-complete field(s) to find relevant ontological terms.
5. The Manchester query is formed using the values in the auto-complete fields.
6. The application queries the dataset of annotations for VPHDMs that are related to the term in the query field.
7. Clicking on one of the results of the query will list the individual annotations of VPHDMs.


REFERENCES

OWLlink: http://OWLlink-owlapi.sourceforge.net/
RDF: http://www.w3.org/RDF/
OWL: http://www.w3.org/TR/owl-features/
GO: http://www.geneontology.org/
FMA: http://sig.biostr.washington.edu/projects/fm/AboutFM.html
PATO: http://obofoundry.org/wiki/index.php/PATO:Main_Page
Java: http://www.java.com/en/
GWT: http://code.google.com/webtoolkit/overview.html
Jena: http://jena.sourceforge.net/
MIRIAM: http://www.ebi.ac.uk/miriam/main/
Pellet: http://clarkparsia.com/Pellet/
OLS: http://www.ebi.ac.uk/ontology-lookup/
Manchester query syntax: http://www.w3.org/TR/owl2-manchester-syntax/
Joseki: http://www.joseki.org/
Apache Tomcat: http://tomcat.apache.org/

Grau B, Horrocks I, Motik B, et al. OWL 2: The next step for OWL. Web Semantics: Science, Services and Agents on the World Wide Web. 2008;6(4):309-322. Available at: http://dx.doi.org/10.1016/j.websem.2008.05.001.
Berners-Lee T, Hendler J, Lassila O. The Semantic Web. Scientific American. 2001;284(5):28-37.
Gkoutos GV, Green ECJ, Mallon AM, Hancock JM, Davidson D. Building Mouse Phenotype Ontologies. In: Altman RB, Dunker KA, Hunter L, Jung TA, Klein TE, eds. Proceedings of the 9th Pacific Symposium on Biocomputing (PSB 2004). World Scientific; 2004.
Sirin E, Parsia B, Grau BC, Kalyanpur A, Katz Y. Pellet: A practical OWL-DL reasoner. Web Semant. 2007;5(2):51-53. Available at: http://dx.doi.org/10.1016/j.websem.2007.03.004.
Baader F, Lutz C, Suntisrivaraporn B. Efficient Reasoning in EL+. In: Proceedings of the 2006 International Workshop on Description Logics (DL2006); 2006.
Ashburner M, Ball CA, Blake JA, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25(1):25-29. Available at: http://dx.doi.org/10.1038/75556.


Degtyarenko K, Matos P, Ennis M, et al. ChEBI: a database and ontology for chemical entities of biological interest. Nucleic Acids Research. 2007.
Gennari JH, Neal ML, Carlson BE, Cook DL. Integration of multi-scale biosimulation models via light-weight semantics. Pacific Symposium on Biocomputing. 2008:414-425. Available at: http://view.ncbi.nlm.nih.gov/pubmed/18229704.
Rosse C, Mejino JLV. A reference ontology for biomedical informatics: the foundational model of anatomy. J. of Biomedical Informatics. 2003;36(6):478-500. Available at: http://dx.doi.org/10.1016/j.jbi.2003.11.007.