9

Click here to load reader

The Semantic Web and Knowledge Grids

Embed Size (px)

Citation preview

Page 1: The Semantic Web and Knowledge Grids

TECHNOLOGIES

DRUG DISCOVERY

TODAY

The Semantic Web and KnowledgeGridsCarole Goble*, Robert Stevens, Sean BechhoferSchool of Computer Science, The University of Manchester, Oxford Road, Manchester, UK M13 9PL

Drug Discovery Today: Technologies Vol. 2, No. 3 2005

Editors-in-Chief

Kelvin Lam – Pfizer, Inc., USA

Henk Timmerman – Vrije Universiteit, The Netherlands

Knowledge management

The Semantic Web and the Knowledge Grid are

recently proposed technological solutions to distribu-

ted knowledgemanagement. Early experimental appli-

cations from the Life Science community indicate that

the approaches have promise and suggest that this

community be an appropriate nursery for grounding,

developing and hardening the current, rather imma-

ture, machinery needed to deliver on the technological

visions, which thus far have been dominated by tech-

nological curiosity rather than application-led practi-

cality and relevance. Further necessary developments

in theory, infrastructure, tools, and content manage-

ment should and could be steered opportunistically by

the needs and applications of Life Science.

Introduction: what is the Semantic Web?

The Web has served life scientists well. Many data sets and

tools are published and accessed using web protocols and web

browsers. Sharing data repositories and tool libraries is

straightforward. Widespread collaboration is possible by pub-

lishing a simple web page. Thus, the Web enables individual

scientists to answer simple ‘low volume’ questions over large

but relatively simple data sets without needing a profound

knowledge of computer science. However, standard web

technology is now straining to meet the needs of biologists.

A Web-based distributed information infrastructure is a place

where a person performs complex tasks and computers pre-

*Corresponding author: C. Goble ([email protected])URL: http://www.cs.man.ac.uk/�carole/

1740-6749/$ � 2005 Elsevier Ltd. All rights reserved. DOI: 10.1016/j.ddtec.2005.08.005

Section Editor:Manuel Peitsch – Novartis, Basel, Switzerland

sent and fetch web pages. People have to manually search the

Web for content or just know where to go; interpret and process

page content by reading it and interacting with web pages;

infer crosslinks between information in web pages or other

sites; integrate content from multiple resources and consolidate

the heterogeneous information while preserving the under-

standing of its context. Well-known specialist applications like

SRS and Entrez are designed to overcome these difficulties but

they are not general solutions, and the interpretation of the

information is still buried in the application code. Service

providers still publish resources assuming that a person will

be ‘point-clicking’ at a browser and reading the text; without

a programmatic interface, automatic processing is difficult

and fragile.

The Semantic Web could enable automated processing and

effective reuse of information on the Web that would support

intelligent searching and improved interlinking [1,2]. The

Semantic Web is an extension of the Web in which informa-

tion is given well-defined meaning by being associated with

metadata described in common terms [3]. Ontologies repre-

sent the vocabulary terms, and how they inter-relate, for the

concepts shared by a community. This requires that all kind

of web content be marked up with metadata that encodes its

meaning in a way that is machine-interpretable and hence be

processed by agents, search engines and applications to auto-

mate the content discovery and integration tasks that people

currently do manually. The meaning of information is

embedded in the Web, not the applications, so it can be

unambiguously and reliably shared across many applications.

www.drugdiscoverytoday.com 225

Page 2: The Semantic Web and Knowledge Grids

Drug Discovery Today: Technologies | Knowledge management Vol. 2, No. 3 2005

Thus, the Web could evolve from documents published for

people to read to knowledge published for computer applica-

tions to process. We can think of it as a way of representing

data on the Web or as a globally linked knowledge base.

In practice there will be many high-quality Semantic Webs

linked together through regular Web links and search mechan-

isms or low-quality metadata. Semantic Webs are costly and

difficult to produce and maintain, so only community ‘web-

lets’ or corporate intranets that gain from the significant added

value will bother with the high quality metadata needed.

Semantic Webs for Life Science communities and organisa-

tions involved in drug discovery are obvious candidates. Why?

They are knowledge driven, fragmented, and have valuable

knowledge assets whose contents need to be combined and

used by many applications. The content is diverse, being

structured (databases, electronic lab books), semistructured

(papers, ExcelTM sheets) and unstructured (PowerpointTM

documents, Web blogs, images). Its scale necessitates that

the processing be done automatically. There are many suppli-

ers and consumers of knowledge and a loose coupling between

suppliers and consumers – information is used in unantici-

pated ways by knowledge workers unknown to those who

deposited it. People naturally form communities of practice,

and there is a culture of sharing and knowledge curation. The

explosion of data coupled with the need to innovate means the

problem is mainstream and urgent.

Knowledge Grids

The Semantic Web aims to facilitate machine support for

distributed knowledge management through the provision of

a semantic infrastructure. The Grid aims to support secure

and flexible coordinated resource sharing through the provi-

sion of a middleware platform for advanced distributing

computing [4]. Grid machinery aims to allow the collection

of all kinds of resources – computing, storage, data sets, digital

libraries, scientific instruments, people, among others – to

easily form Virtual Organisations that cross organisational

boundaries and work together to solve a problem. Computa-

tional-file based Grids are the most mature, harnessing avail-

able compute power to support compute-intensive analysis

applications. Whereas Computational Grids present the illu-

sion of a single virtual computer to an application, Data Grids

present a single virtual data store that is really distributed and

multilocated. Portals provide a way for application develo-

pers to submit their compute job or query. On top of these

‘plumbing’ Grids, Application Grids aim to present the illu-

sion that applications work together when they do not. Grid

Computing is supported by the Global Grid Forum (http://

www.ggf.org) and the Enterprise Grid Alliance (http://

www.gridalliance.org/) through a series of standards, core

services and applications. Standardisation routes include

OASIS and IETF. The Grid has a strong industry influence

and major applications pull chiefly from global, large-scale

226 www.drugdiscoverytoday.com

scientific collaborations. The chief emphasis to date has been

on practical deployment of data and building compute grids.

Over the past four years the Open Grid Service Architecture

Grid initiative has revised Grid computing to adopt the

Service Oriented Architecture paradigm using Web Services,

a distributed computing platform from the Web community

and heavily supported by industry. This revision is still in a

state of flux, having had one false start in the OGSI specifica-

tion (now the WSRF specification). OGSA services are still at

the prototype stage. However, OGSA does bring together the

Web and Grid communities to a common platform.

Knowledge Grids are much less established as a concept, and

the term is still controversial [5]. Cannataro and Talia [6] define

the term as an environment for the design and execution of

geographically distributed high-performance knowledge dis-

covery applications, which is pretty much the purpose of all

Grids. Knowledge Grids use knowledge-based methodologies,

including knowledge engineering tools, discovery and analysis

techniques such as data mining and machine learning, intel-

ligent software agents, mathematical modelling, simulation or

planning. Goble and De Roure [7] distinguish a Knowledge

Grid from a Semantic Grid by suggesting that Knowledge Grids

apply to knowledge associated with Grid domain applications

and resources and Semantic Grids to knowledge associated

with grid middleware and entities. Either way, Knowledge

Grids are intended to provide intelligent guidance for decision

makers, through an agreed knowledge representation and the

provision of homogeneous access over heterogeneous sources

of information, metadata and ontologies.

We interpret a Knowledge Grid as a knowledge base to

support distributed knowledge discovery applications. From

this standpoint, a Semantic Web is a Knowledge Grid with the

emphasis on distributed knowledge representation and inte-

gration, and a Knowledge Grid is a platform for distributed

knowledge processing over a Semantic Web. In combination

they seek to support automation by exposing the implicit and

tacit knowledge used by humans and providing an appro-

priate programming infrastructure.

Key Semantic Web machinery

The Semantic Web has the backing of the World Wide Web

Consortium (W3C) standards organisation (World Wide Web

Consortium Semantic Web Activity, http://www.w3.org/

2001/sw/) and has spawned extensive research, development

and standards activity supported by industry and academia. A

range of technologies and machinery is needed to deliver

vision, and these are currently in various states of maturity,

from established standards and commercial products to

things improbable.

There are four key ingredients necessary to deliver a Seman-

tic Web: (a) a universal model of metadata, knowledge and

assertions over the current web. This forms a web of knowl-

edge layered over conventional web (or Grid, resources); (b)

Page 3: The Semantic Web and Knowledge Grids

Vol. 2, No. 3 2005 Drug Discovery Today: Technologies | Knowledge management

semantic content; (c) tools to build and maintain the model

and capture the content; and (d) knowledge-driven applica-

tions that use the semantic infrastructure. To date the com-

puting community has concentrated on (a) and (c). We can

legitimately say the Semantic Web is technology pushed

rather than application pulled. Standardisation activities

are focused on (a); commercial activities by specialist and

major vendors (notably Oracle, IBM and HP) are focused on

(c) and (d).

The Semantic Web model has five main components

(Fig. 1), usually presented as a layered stack. XML is a carrying

syntax for all the languages of the model and for all the

languages of the Web.

Identification

The Universal Resource Identifier (URI) ensures a unique

identity for each Semantic Web entity. The Life Science

Figure 1. The layered architecture of the Semantic Web.

Identifier (LSID) [8] protocol is a domain-specific protocol

that introduces a standardised way of naming data resources,

backed by an OMG standard. LSIDs have been successfully

adopted [9,10] (BioDash http://www.w3.org/2005/04/swls/

BioDash/Demo/), but teething problems include poor sup-

port for versions and some confusion over what data can

change without changing identity. Semantic Web purists

claim that the LSID is unnecessary although it seems not

to have developed applications for life scientists.

Metadata annotation

Resources are annotated by metadata that asserts facts about

and between them and their content in a common, flexible

data model. The Resource Description Framework (RDF;

http://www.w3.org/RDF/) describes objects and relations

between them in a self-describing data model. RDF is a key

to integrated and federated data storage, making this the

www.drugdiscoverytoday.com 227

Page 4: The Semantic Web and Knowledge Grids

Drug Discovery Today: Technologies | Knowledge management Vol. 2, No. 3 2005

‘integration’ layer of the Semantic Web infrastructure. We

have webs of metadata (Fig. 1) as well as webs of pages. The

RDF model is based on ‘subject-predicate-object’ statements

(‘triples’), assembled into graphs that assert facts about and

between resources (Fig. 2(1 and 2)). Facts are commonly held

separate from the resource held in triple stores (http://simile.-

mit.edu/reports/stores/). Oracle Corp is incorporating RDF

support into their regular relational databases (http://www.or-

acle.com/technology/tech/semantic_technologies/). RDF sup-

ports statements about statements, crucial for provenance

(‘this fact is asserted by EMBL-EBI’); timestamping (‘this fact

is asserted on 31-05-2005’) and meta-statements (‘this fact is

untested’). Although the RDF standard is established, the

technologies are still relatively immature. There is a confusing

range of RDF query languages although the SPARQL RDF query

language is being standardised (http://www.w3.org/TR/rdf-

sparql-query/). Performance over medium-large data sets is

disappointing. There is poor support for grouping statements

(‘named graphs’) [11]. Representing RDF within an HTML page

is an issue (http://www.cs.vu.nl/�guus/public/carroll-rdf-

html.pdf). The RDF syntax is designed for machines and

should never be revealed to humans; the same is true for

XML, of course, but RDF is particularly distracting.

Figure 2. The Semantic Bus. Suppliers (resources) expose their contents as RD

provide the glue that ties the data together and the knowledge that helps us to i

contents of resources, instruments and data sources in a uniform fashion – for ex

GSK3beta is associated with diabetes type 2. (2) Graph merging based on identit

GSK3beta and diabetes. (3) Ontologies provide the consensus and shared kno

phosphorylates proteins, which is an enzyme catalysis, which is a kind of control in

reference genes and their products. (4) Ontologies and rules provide an infere

GSK3beta might play in the Insulin Signal Transduction pathway. (5) The Semanti

gathered together and that the knowledge tools and inference engines process

228 www.drugdiscoverytoday.com

Knowledge

A shared interpretation of what the metadata means requires

ontologies for describing controlled vocabularies and back-

ground knowledge. Ontologies – consensual, shared models

in an executable form of concepts, relations and their con-

straints tied to a scaffold of taxonomies [12] – are common in

Life Sciences (http://www.sofg.org and http://obo.sourcefor-

ge.net/). We use ontology terms to assert facts about

resources, like web pages, databases, services, a protein, a

compound, a gene, a person, among others (Fig. 2(3 and

4)). Topic Maps (http://www.topicmaps.org/) have been pro-

posed as ‘indexing’ mechanisms for the Web content. How-

ever, the two standardised languages for exchanging and

representing ontologies are: RDF Schema (RDFS) [13], for

simple taxonomies and OWL [14], a family of Web Ontology

Languages extending RDFS (see [15] in this issue). The Seman-

tic Web Rule Language (SWRL) [16] adds rules to OWL knowl-

edge bases, encoding constraints and supporting further

deduction of knowledge. This is undergoing standardisation

with only prototype implementations and no commercial

support. Specialist vendors such as Ontoprise (http://

www.ontoprise.com) and Cerebra (http://cerebra.com) pro-

vide OWL tool suites.

F, allowing Consumers (applications) to make use of the data. Ontologies

nterpret the data. (1) RDF provides a common data model, exposing the

ample glycogen synthase kinase 3 beta (GSK3beta) is a protein kinase, and

ies from URIs provides links at the syntactic level to link resources about

wledge that ties data together semantically – that a protein kinase

teraction or that chemical entities reference compounds and drug targets

nce apparatus for generating implied knowledge, to infer the role that

c Bus is the collective knowledge base that could be physically or virtually

.

Page 5: The Semantic Web and Knowledge Grids

Vol. 2, No. 3 2005 Drug Discovery Today: Technologies | Knowledge management

Inference and reasoning

Metadata, ontologies and rules combine to make a distributed

knowledge base. The added value is to use the OWL and

SWRL computational reasoning capabilities to infer new

unasserted facts on or between resources and classify

resources based on their descriptions. The design of OWL

has been strongly influenced by its reasoning procedures.

However, concerns include the benefits, performance and

scalability of reasoning mechanisms and whether they can

cope with the incomplete or inconsistent knowledge found

in reality. One practice is to use reasoning ‘offline’, fix the

results and use these in real-time applications [17]. A few

reasoning engines are available commercially (Cerebra Ser-

verTM, RacerProTM http://www.racer-systems.com/) to be

embedded in applications and middleware.

Trust, proof, policy and context

The separation of assertions from the resource, and the

ability to assert facts about facts, is intended to support a

tangled accumulative web of third party assertions over

resources by those other than the resource creators or own-

ers. The Semantic Web is envisioned as a ‘democracy’ where

everyone can annotate resource or an annotation. The

Semantic Web is changeable, inconsistent and will contain

many dubious, outdated or conflicting statements.

Metadata on the metadata ties assertions (inferred or stated)

to a context such as its origin or creation date. This is

important for intellectual property, provenance tracing,

accountability and security as well as untangling contra-

dictions or weighting support for an assertion. Thus, in

addition to webs of metadata we gain webs of meta-meta-

data (Fig. 1). This is the least well-developed layers of the

technology stack. Very little infrastructure exists, the issues

are poorly understood and research is patchy and imma-

ture.

Semantic Web content

Without content, the Semantic Web has no semantics.

Acquiring the ontologies, rules and metadata is the prime

bottleneck to adoption. The Life Sciences have made great

efforts to develop standard domain ontologies for annotating

data sets (e.g. The Gene Ontology) and indexing documents

(e.g. UMLS). Ontologies have also been developed for describ-

ing services [17]. However, a Semantic Web for Life Sciences

also needs ontologies for publications, resources, experi-

ments, hypothesis, analysis, workflows among others as well

as people and organisations. Concerns arise over the com-

plexity of developing and maintaining ontologies. Knowl-

edge engineering tools like Protege-OWL [18], still tend to be

oriented to knowledge engineers not subject specialists, and

there is a paucity of shared best practices. Machine learning

and language processing to automatically generate ontolo-

gies [19] is in its infancy.

Annotation of resources with metadata splits into (a) high

quality manual annotation using tools like OntoMat Anno-

tizer [20] and (b) automatic (or semiautomatic) annotation

using text mining [21], language processing techniques [22]

or other forms of processing for other types of data. Text

mining, UMLS, MeSH and a culture of manual curation make

these approaches feasible. An alternative is to generate meta-

data in RDF directly from the content generating services, like

content management systems, data mining tools or by service

providers [23]. Semantic content generated by service provi-

ders, publishers and instrument suppliers needs to become

universal and ubiquitous to have an impact. UniProt (http://

www.isb-sib.ch/�ejain/rdf/) exports results in RDF, as do

some publishers like Nature [24] but these are exceptions.

Fig. 1 shows that there will be webs of knowledge in

addition to webs of metadata and web pages. Multiple, pos-

sibly overlapping, ontologies need aligning, merging and

linking, multiple metadata even on the same resource that

can potentially conflict multiple rules and multiple assertions

of trust and context. Just how this will work in practice, the

mechanisms needed to cope and the tools and services to

assist applications are still research subjects. The current

reality is either simple versions of the Semantic Web, or these

problems being passed on to the applications.

Semantic Web applications

Knowledge-driven applications consume the semantic infra-

structure. Applications have been mainly confined to corpo-

rate intranets for managing ring-fenced knowledge assets –

semantic islands in the sea of the Web. Current applications

divide into those that emphasise a Web of Semantics and

those that just emphasis the processing of knowledge. We

take the knowledge-oriented applications first.

Ontology development

The inferencing capabilities of OWL have been shown to aid

the building of large and sophisticated ontologies such as The

Gene Ontology [25] and BioPAX (http://www.biopax.org/).

The concept classification is derived and inconsistencies are

automatically identified. The standardisation of a language

significantly helps the exchange of ontologies. However,

there are some problems with the expressivity of OWL for

Life Science, Chemical and Clinical ontologies.

Advanced metadata modelling

The self-describing nature of RDF and OWL models enables

flexible descriptions for data collections, suiting those whose

schemas might evolve and change or whose data types are

hard to fix, like knowledge bases of scientific hypotheses,

provenance records of in silico experiments [26,27] or pub-

lication collections.

Now consider the applications where metadata is asso-

ciated with distributed resources.

www.drugdiscoverytoday.com 229

Page 6: The Semantic Web and Knowledge Grids

Drug Discovery Today: Technologies | Knowledge management Vol. 2, No. 3 2005

Intelligent searching and document discovery

Semantically enabled search engines, like TAP [28] could

support ‘concept-based’ searches over content annotations,

exploiting the structure of the ontology by automatically

narrowing or broadening search terms. The ontology hier-

archy classifies the contents of metadata collections. An

ontology definition for an oncogene including references

to organism, function, locus, and associated diseases anno-

tating a web page about function can be inferred to link to a

paper on locus despite the paper not mentioning it [1].

Semantically powered catalogues and service registries can

find data sets and tools based on ontological descriptions.

Semantic Portals describe the properties, relationships and

classifications of the various information items using an

ontology, for example, PlanetOnto (http://kmi.open.ac.uk/

projects/kmi-planet/) [29] and the SEmantic portAL [30].

Social and knowledge networking

A key knowledge commodity is who knows what and making

connections with similar minded communities. The popular

Friend of a Friend (FOAF) project http://www.foaf-project.org/

creates a Web of machine-readable homepages describing

people, the links between them and the things they create

and do. In addition to providing simple directory services,

FOAF can be used to provide assistance to new entrants in a

community and locate people with similar interests. It is a

simple RDF vocabulary; the power is the links. The big plus

comes when the data is aggregated and can then be explored

and crosslinked. SciFOAF builds a FOAF community mined

from the analysis of authors and publications over PubMed

(http://www.urbigene.com/foaf/).

Advanced content syndication and publication

Syndication is the process by which a web site is able to share

information, such as articles, with other web sites. Scientific

publishers like the Institute of Physics (http://syndication.io-

p.org/) and the International Union of Crystallography

(http://journals.iucr.org/services/rss.html) publish RSS feeds

in RDF using standard RSS, Dublin Core and PRISM (http://

www.prismstandard.org/) RDF vocabularies. Uniprot has an

experimental publication of results in RDF. As with FOAF, the

fun starts when data is aggregated and crosslinked.

Integration, aggregation and crosslinking

Results in RDF graphs yielded from multiple heterogeneous

sources, say, UniProt, Interpro and PubMed, are manipulated

and combined using shared ontological concepts and corre-

sponding URIs as bridges (Figs 2 and 3). The semantic defini-

tions serve as ‘smart glue’ specifying how objects relate to

each other, managing these inferred or explicitly proposed

relations outside any one repository and aggregating geno-

mic, proteomic, cellular, physiological, and chemical data,

even when these data are kept in different databases with

230 www.drugdiscoverytoday.com

different schemas and bridging the structured and unstruc-

tured free texts. Shared ontologies and the common RDF data

model help overcome the data of syntactic and semantic

heterogeneity, forming a global integration ‘Semantic bus’

that knowledge discovery applications build upon. For exam-

ple, YeastHub [31] converts the outputs of a variety of data-

bases into RDF and combines them in a warehouse built over

a native RDF data store. BioDASH (http://www.w3.org/2005/

04/swls/BioDash/Demo/) (Fig. 3) is an experimental Drug

Development Dashboard that uses RDF and OWL to associate

disease, compounds, drug progression stages, molecular biol-

ogy and pathway knowledge for a team of users. Correspon-

dences are not necessarily obvious to detect, requiring

specific rules. BioDASH uses ‘Semantic Lenses’ to filter and

aggregate RDF particularly in bio-orientation – for example, a

pathway lens.

Application interoperability

It is not just web pages that can be annotated; middleware like

the Web and Grid services can also be annotated to make

them accessible to applications [32]. Techniques for plan-

ning, composing, editing, reasoning and analysing about

these descriptions are being investigated and deployed to

resolve semantic interoperability between services within

scalable, open environments. Semantic annotations on appli-

cations and data sets enable them to be intelligently discov-

ered, compared and combined into workflows [33,34] to run

complex applications, integrate data and mine knowledge

(DiscoveryNet http://www.discovery-on-the.net/) [9].

Knowledge Grids and Knowledge mining

The purpose of all this technology is to create a knowledge

substrate for knowledge mining. Examples of Grid-based

Problem Solving Environments that integrate ontology and

workflow approaches for bioinformatics application on the

Grid include Proteus [35], myGrid [9] and DiscoveryNet

(http://www.discovery-on-the.net/).

Concluding remarks

The Semantic Web, and its less well-defined cousin the

Knowledge Grid, promises a paradigm shift for the delivery

of knowledge just as the Web changed how we publish and

find information. Certainly there is a tremendous amount of

activity in the development of standards, machinery and

tools for the semantic infrastructure and research into its

theoretical underpinnings. The technologies being devel-

oped have already demonstrated their value for ontology

development and advanced metadata modelling for general

knowledge-based applications. The first W3C Semantic Web

for Life Science Workshop in 2004 attracted over 100 parti-

cipants with representation from all the major pharmaceu-

tical and drug discovery players. Within a closed enterprise

pharmaceutical environment, such as KSpace in Novartis, or

Page 7: The Semantic Web and Knowledge Grids

Vol. 2, No. 3 2005 Drug Discovery Today: Technologies | Knowledge management

Figure 3. The BioDASH Drug Development Dashboard prototype. BioDASH associates disease, compounds, drug progression stages, molecular biology,

and pathway knowledge for a team of users. It is based on the concept of a therapeutic topic model. Information sources are exposed as RDF – common

vocabularies then tie these resources together. Applications can then make use of knowledge services and inference to support user tasks. The figure shows

an investigation into the therapeutic value of glycogen synthase kinase 3 beta (GSK3beta), a regulatory enzyme associated with multiple diseases, including

diabetes type 2. The BioDASH demonstration is built on Haystack [10], which is an extensible Semantic Web Browser developed by MIT. Data is aggregated

and browsed from OMIM and Uniprot by using LSIDs resolved into RDF and automatically linked with a BioPAX pathway, also in RDF, using common LSIDs.

The pathway view combines the protein and interacting compounds views. The BioPAX model is represented in OWL. Current work uses reasoning to

consistency check nutrient-related analyses for essential compounds, missing essential compounds, and reactions, both fired and unfired.

bounded communities, like the Alzheimer’s Forum we are

beginning to see Semantic Webs as they were envisioned.

For a Semantic Web to flourish, the communities it would

serve needs to be knowledge driven, globally distributed and

able and willing to create and maintain the semantic content.

It is this latter point that is crucial. As Gardner [15] suggests in

this issue, the drug discovery related communities are embra-

cing ontologies. The Life Science world has the desire for

collaboration, a culture of annotation, and act as service

providers that might be persuaded to generate RDF or at least

annotated XML. A Semantic Web is expensive to set up and

maintain and, thus, is only probable to work for communities

where the added value is worthwhile and an ‘open source

data’ philosophy prevails.

However, beware of the hype. Most of the technologies are

yet to be established and many key components – trust,

security, context – are missing. Reusing data in unexpected

and uncontrolled ways might not be desirable. There are

sceptics [36] who argue that XML is enough, and for tightly

defined and static problems they could be right. The best

examples of Semantic Web applications have been when the

emphasis has been on the content and connectivity rather

than elaborate reasoning. Simple ontologies and simple inte-

gration approaches, like FOAF and RSS, have had an impact.

Elaborate reasoning has chiefly been used in knowledge

applications or for building large ontologies, rather than

integrating metadata, despite its centre stage position in

the W3C standards activity. Although Semantic Web advo-

cates like to distance themselves from A.I. the Semantic Web

is still sold with far-fetched A.I. scenarios [3]. In reality it is the

Web aspect – the integration layer and a means to simply link

heterogeneous semistructured data – that will win out rather

than the rich semantics, although the research activity, and

hype, has primarily been the other way about [37]. The

important lesson is that a Semantic Web needs Semantics,

however simple and however scruffy.

Up to now there has been more emphasis on technology

push and not enough on application and service provider

www.drugdiscoverytoday.com 231

Page 8: The Semantic Web and Knowledge Grids

Drug Discovery Today: Technologies | Knowledge management Vol. 2, No. 3 2005

Links

� The Semantic Web portal http://www.semanticweb.org/

� Semantic Web for Life Sciences http://www.biopathways.org/

semweb/

� W3C Semantic Web Activity http://www.w3.org/2001/sw/

� Semantic Grid portal http://www.semanticgrid.org

� Global Grid Forum http://www.ggf.org

� W3C Semantic Web for Life Science Workshop http://

www.w3.org/2004/07/swls-ws.html

� Standards and Ontologies for Functional Genomics http://

www.sofg.org

� Open Biological Ontologies portal http://obo.sourceforge.net/

� RDF Resource Description Framework http://www.w3.org/RDF/

� RDF Schema http://www.w3.org/TR/rdf-schema

� OWL Web Ontology Language http://www.w3.org/2004/OWL/

� LSID Life Science Identifier http://lsid.sourceforge.net/

� Friend of a Friend, FOAF http://www.foaf-project.org/

pull. While waiting for the ‘killer application’, the machinery

for the platform has been built in a partial vacuum. Some of

this infrastructure is cumbersome and hard for application

developers to engage with – the layering of OWL over RDF

lacks elegance and the number of layers and components can

make debugging tricky. Service providers need migration

tools and application developers need client libraries and

simple APIs, and both need appropriate tooling. In an ideal

world, users should never see the Semantic Web at all.

The Web needed to be incubated by a highly motivated

community with an application and a generous spirit – in this

case, High Energy Physics. The Semantic Web needs the

nursery of another community that would benefit hugely

from its capabilities. It is not e-commerce. It should be drug

discovery.

Related articles

Hendler, J. (2003) Science and the Semantic Web. Science 299, 520–521

Berners-Lee, T. et al. (2001) The Semantic Web. Scientific American,

May

Neumann, E. (2005) A Life Science Semantic Web: are we there yet?

Sci. STKE 2005, 10 May

Outstanding issues

� Making it semantic: content acquisition and content management.

� Making it Web: tackling distribution, heterogeneity and

inconsistency of metadata, ontologies and rules.

� Making it understandable: lowering the barriers of entry for

developers and simplifying the complexity of the components and

language layers.

� Making it safe: security, trust, proof and policy management.

� Making it accessible: providing the tooling and machinery for

applications and content providers.

� Making it work: focus on performance and scalability, not just

language expressivity.

232 www.drugdiscoverytoday.com

References1 Hendler, J. (2003) Science and the Semantic Web. Science 299, 520–521

2 Neumann, E. (2005) A Life Science Semantic Web: are we there yet? Sci.

STKE 283, pe22

3 Berners-Lee, T. et al. (2001) The Semantic Web. Scientific American, May

4 Foster, I. and Kesselman, C. (1999) The Grid: Blueprint for a New Computing

Infrastructure. Morgan Kaufmann

5 Goble, C.A. et al. (2000) Enhancing services and applications with

knowledge and semantics. In The Grid: Blueprint for a New Computing

Infrastructure (2nd edn) (Foster, I. and Kesselman, C., eds), Morgan

Kaufman

6 Cannataro, M. and Talia, D. (2004) Semantic and Knowledge Grids:

building the next-generation grid. IEEE Intell. Syst. (ISSI-0095-1203) –

Special Issue on E-Science 19, 56–63

7 Goble, C. and De Roure, D. (2004) The Semantic Grid: building bridges and

busting myths. 16th European conference on Artificial Intelligence ECAI 2004

including Prestigious Applicants of Intelligent Systems, 22–27 August 2004,

Valencia, Spain, PAIS IOS Press (ISBN 1-58603-452-9)

8 Clark, T. et al. (2004) Globally distributed object identification for

biological knowledgebases. Brief. Bioinform. 5.1, 59–70 (http://

lsid.sourceforge.net/)

9 Stevens, R. et al. (2004) Exploring Williams–Beuren Syndrome using

myGrid. Bioinformatics 20, i303–i310. Proceedings of 12th Intelligent Systems

in Molecular Biology (ISMB), 31st July–4th August, Glasgow, UK (myGrid

http://www.mygrid.org.uk)

10 Quan, D. et al. (2003) Haystack: a platform for authoring end user semantic

web applications. Proceedings of the 2nd Intl. SemanticWeb Conference ISWC

2003, Sanibel, FL

11 Carroll, J.J. et al. (2004) Named graphs, provenance and trust. Proceedings of

the 14th International Conference on World Wide Web, Chiba, Japan. pp.

613–622

12 Stevens, R.D. (2000) Ontology-based knowledge representation for

bioinformatics. Brief. Bioinform. 1, 398–414

13 RDF Vocabulary Description Language 1.0: RDF Schema. W3C

Recommendation, 10th February 2004 (http://www.w3.org/TR/rdf-schema)

14 Horrocks, I. et al. (2003) From SHIQ and RDF to OWL: the making of a web

ontology language. J. Web Semant. Sci. Serv. AgentsWorldWideWeb 1, 7–26

15 Gardner, S.P. Ontologies. Drug Discov. Today Technol. (in press)

16 Horrocks, I. et al. (2005) OWL rules: a proposal and prototype

implementation. J. Web Semant. 3, 23–40

17 Lord, P. et al. (2004) Applying semantic web services to bioinformatics:

experiences gained, lessons learnt. Proceedings of the 3rd International

Semantic Web Conference – ISWC 2004, 9–11 November, Hiroshima, Japan

(Springer LNCS 3298)

18 Knublauch, H. et al. (2004) The Protege-OWL Plugin: an open

development environment for semantic web applications. Third

International SemanticWeb Conference – ISWC 2004, November, Hiroshima,

Japan (http://protege.stanford.edu/plugins/owl/)

19 Sabou, M. et al. (2005) Learning domain ontologies for web

service descriptions: an experiment in bioinformatics. Proceedings

of the 17th International Conference on World Wide Web (WWW2005),

May, Japan

20 Handschuh, S. and Staab, S. (2002) Authoring and annotation of web

pages in CREAM. Proceedings of the 11th International World Wide Web

Conference (WWW2002), 7–11 May, Honolulu, Hawaii, USA

21 Ghanem, M. et al. (2005) A grid infrastructure for mixed bioinformatics

data and text mining. Proceedings of the 3rd ACS/IEEE International

Conference on Computer Systems and Applications, January, Cairo, Egypt,

IEEE Computer Society

22 Ciravegna, F. et al. (2004) Learning to harvest information for the

Semantic Web. Proceedings of the 1st European SemanticWeb Symposium, 10–

12 May, Heraklion, Greece

23 Volz, R. et al. (2003) Unveiling the hidden bride: deep annotation for

mapping and migrating legacy data to the Semantic Web. J. Web Semant.

Sci. Serv. Agents World Wide Web 1, 187–206

24 Lund, B. (2004) Science Publishing and the Semantic Web: RSS, RDF and

Urchin, Position Paper W3C Semantic Web for Life Science Workshop

(http://www.w3.org/2004/07/swls-ws.html)

Page 9: The Semantic Web and Knowledge Grids

Vol. 2, No. 3 2005 Drug Discovery Today: Technologies | Knowledge management

25 Wroe, C.J. et al. (2003) A methodology to migrate the gene ontology to a

description logic environment using DAML+OIL. Proceedings of the 8th

Pacific Symposium on Biocomputing (PSB), January, Hawaii

26 Zhao, J. et al. (2004) Using Semantic Web technologies for representing e-

science provenance. Proceedings of the 3rd International Semantic Web

Conference ISWC2004, 9–11 November, Hiroshima, Japan (Springer LNCS

3298)

27 Frey, J. et al. (2004) Less is More: Lightweight Ontologies and User Interfaces for

Smart Labs. The UK e-Science All Hands Meeting 2004

28 Guha, R. and McCool, R. (2003) TAP: a semantic web test-bed. J. Web

Semant. Sci. Serv. Agents World Wide Web 1, 81–87

29 Domingue, J. and Motta, E. (2005) Planet-Onto: from news publishing to

integrated knowledge management support. IEEE Expert Syst. Special Issue

on Knowledge Management and Distribution over the Internet 15, 26–32

30 Maedche, A. et al. (2001) Semantic portal – the SEAL approach. In Creating

the Semantic Web (Fensel, D. et al. eds), MIT Press

31 Cheung, K.H. et al. (2005) YeastHub: a semantic web use case for

integrating data in the life sciences domain. Bioinformatics 21 (Suppl. 1),

i85–i96

32 Nixon, L. and Paslaru, E. State of the Art of Current Semantic Web Services

Initiatives, Deliverable 2.4.ID1 Knowledge Web Network of Excellence

(http://knowledgeweb.semanticweb.org)

33 Lord, P. et al. (2005) Feta: a light-weight architecture for user oriented

semantic service discovery. In European Semantic Web Conference, Lecture

Notes in Computer Science, (Vol. 3532) (Gomez-Perez, A. and Euzenat, J.,

eds) Springer-Verlag

34 Potter, S. and Aitken, S. (2005) A semantic service environment: a case

study in bioinformatics. In European Semantic Web Conference, Lecture

Notes in Computer Science, (Vol. 3532) (Gomez-Perez, A. and Euzenat, J.,

eds) Springer-Verlag

35 Cannataro, M. et al. (2004) Proteus, a grid based problem solving

environment for bioinformatics: architecture and experiments. IEEE

Comput. Intell. Bull. 3, 7–18 (ISSN 1727-5997)

36 Nee, E. (2005) Web Future is Not Semantic, Or Overly Orderly CIO Insight 5

May (http://www.cioinsight.com/article2/0,1540,1815338,00.asp)

37 McBride, B. (2004) Four Steps Towards the Widespread Adoption of the

Semantic Web. Proceedings of the 1st International Semantic Web Conference

ISWC2002, 9–12 June, Sardinia, Italy (Springer LNCS 2342)

www.drugdiscoverytoday.com 233