

T-76.651 SEMINAR ON DISTRIBUTED PRODUCT DEVELOPMENT

Helsinki University of Technology

Distributed Development of Ontologies – Keeping the Architecture Consistent

November 4, 2003

Tuomas Korpilahti

[email protected]

47972U


Abstract

Authors: Tuomas Korpilahti

Name of Report: Distributed Development of Ontologies – Keeping the Architecture Consistent

Date: November 4, 2003 Pages: 24

This paper studies methods that prevent ontology developers from inserting conflicts into ontologies during collaborative, distributed development. We present a set of problems encountered in distributed ontology development, describe current methods for addressing these problems, and evaluate the methods against the problems.

The study proposes a set of methods that can be integrated into a development environment and that contribute most to preventing errors in collaborative, distributed ontology development. These methods cover ontology design criteria, knowledge transfer, ontology architecture, architectural support for collaborative development, synchronization of concurrent development, the development tool user interface, and ontology storage.

Keywords: Distributed development, ontology development, collaborative development, computer-aided development


Table of Contents

Abstract
1. Introduction
1.1. Background
1.2. Research Problem and Objectives
1.3. Study Scope and Study Methods
2. Definition of Terms
3. Background on Ontologies
4. Problems with Collaborative, Distributed Ontology Development
4.1. General Problems
4.2. Problems with Axioms
5. Current Approaches and Techniques
5.1. Design Criteria for Ontologies
5.2. Knowledge Transfer
5.3. Development Environment
5.4. Architectural Strategy
5.5. Synchronization Strategy
5.6. Ontology Storage Strategy
6. Suitability of Approaches to Problems
6.1. Design Criteria for Ontologies
6.2. Knowledge Transfer
6.3. Development Environment
6.4. Architectural Strategy
6.5. Synchronization Strategy
6.6. Ontology Storage Strategy
7. Discussion and Conclusion
8. References


1. Introduction

1.1. Background

Today, more and more software is developed in distributed teams. The emerging semantic web will introduce ontologies and architectures with estimated lifetimes of 10 to 50 years. Ontologies will be extended and combined in unanticipated ways, and their development is likely to be a highly distributed task.

The architectural model of any system should remain clear and consistent until the software is decommissioned. Ontologies and architectures are similar in the sense that both define a framework within which different actors and software components interact with each other and reason about the world. However, as in any logical model, a single conflict in an ontology renders it useless for automated reasoning. Is it possible to provide tool support that combines the requirement of high correctness with distributed, collaborative, evolutionary development?

1.2. Research Problem and Objectives

We seek to find out how it is possible to maintain architecture and model consistency in a changing environment where different people build software in distributed, collaborative teams using incremental or evolutionary software development models.

Our primary study objectives are to

• Find methods to prevent conflicts from being inserted into the system.

• Find methods to minimize the number of inserted conflicts.

Our secondary objective is to

• Seek ways to integrate the methods found into an integrated ontology development environment (IDE).


1.3. Study Scope and Study Methods

We have conducted a literature study on collaborative, distributed ontology development. As this is a relatively new area with few publications, we also draw on ideas from collaborative development in general.

The rest of this paper is structured as follows. We first define some common terms and give background information on ontologies. We then present some major problems in distributed, collaborative ontology development. In section 5 we give an overview of the approaches and practices currently used to answer these challenges. In section 6 we analyze how well current practices and methods suit the problems. Based on our analysis, we introduce a set of propositions to support the development of distributed ontology development environments, and end the paper with a summary of our findings.

2. Definition of Terms

An ontology is a closed-world model of some part of the world around us. It specifies the relevant concepts of that part of the world and the relations between those concepts. It may also include logic rules for reasoning about the concepts and relations. Ontologies are typically written in RDF (Resource Description Framework) and RDF Schema.

An ontology axiom is a logic rule in the ontology. It states what is taken to be fundamentally true within that ontology. For example, an ontology of transportation vehicles such as cars, ships and airplanes might include an axiom saying that a car cannot fly.

In this paper, collaborative development means software development in which several developers actively collaborate while building a common product. One example is pair programming in extreme programming (XP). A less computer-science-oriented example is two construction workers building a house: as one of them holds a piece of wood in place, the other nails it down. In collaborative development the developers need not be co-located; the point is that people actively influence each other's work and interact as they work toward their shared goal.

Distributed development describes software development in which the work is spread across geographically separate locations. The locations may be on different continents or just a few floors apart. The main point is that the developers are not co-located; in other words, they do not sit together while working.

3. Background on Ontologies

As defined previously, an ontology is essentially a domain model that explicitly specifies the domain's concepts and the relations between them. Ontologies are used in knowledge representation. Their advantage is that they make knowledge both human- and machine-readable, which creates new possibilities for applications such as artificial intelligence and context-sensitive information portals. Recent research has investigated their use in improving information indexing and information retrieval on the Internet; this work is carried out in the semantic web research field.

An ontology is a logical model; in other words, it describes a closed world and a set of rules that are in effect in that world. Should the ontology contain a contradiction, the entire model becomes useless for automated reasoning, because in classical logic any statement can be proved from a contradiction (the principle of explosion). This is why, for a complex ontology, a considerable amount of development work must often be invested in ensuring consistency.

Ontology development requires knowledge of the domain that the ontology will model. If the use of ontologies becomes widespread, there is likely to be a serious shortage of experts in knowledge modeling. An ontology development team is therefore likely to be geographically distributed, and its members cannot meet in person to discuss modeling decisions. During development, the experts must be able to use development tools to reach consensus on how they see the world and to reconcile their differing modeling ideas.


Ontologies are meant to be used together with other ontologies to reason about cross-domain issues. An example is the classic producer-seller relationship, where one company makes, for example, cars, and another sells them. Both talk about the same concept, a car, but from very different points of view, and thus use different ontologies. The manufacturer might associate a car with properties such as model number, electrical system provider, engine provider and the factories capable of producing that model. The retailer might prefer properties such as color, power, maximum speed, supplier and unit price. Both need a common concept of a car but think about it differently. This example is obvious, but ontologies may be used in a variety of ways that cannot be anticipated at the time of their creation.

The interaction of ontologies and the high level of expertise needed to develop them imply that they must be developed in a distributed, collaborative fashion. The challenge comes from the fact that ontologies are likely to have relatively long lifetimes. In our projects at the University of Helsinki, we have estimated realistic lifetimes of 10 to 50 years.

4. Problems with Collaborative, Distributed Ontology Development

4.1. General Problems

As we showed in the previous chapter, ontologies are likely to be developed in a distributed fashion over a long period of time. The development is likely to follow an evolutionary and incremental model, as new ontologies are created from existing ones to adapt to different use cases. Ontology development must therefore answer the traditional challenges of distributed software development: concurrent modifications, division of responsibilities, and integration.

As is usually the case when modeling the real world, there are often several ways to represent the same concepts and relations in an ontology. The choice between them usually matters and should be made with the possible use cases of the ontology in mind. Knowledge transfer therefore becomes an issue in ontology development. It is especially crucial when re-using existing ontologies to create new ones. As Aitken (1998) has shown, the quality of the resulting ontology correlates closely with how deeply the developers understand the ontologies they are re-using. To achieve good results, one must know well how the source ontologies work; otherwise one easily introduces semantic errors. We believe the same also applies when referencing concepts from other ontologies.

To connect ontologies, a concept in one ontology can be referenced from another ontology. This brings an additional challenge: if the referenced ontology is modified, the referencing ontology might break. Merging the ontologies instead is not always an option, because ontologies can be too large or too heavily cross-referenced to be integrated into a single new ontology. Farquhar et al. (1996) give an example of this with two large ontologies, a medical ontology and a sports ontology, that cross-reference each other. In the medical ontology we might state, "Roller-blading (a sports ontology concept) is a major cause of wrist fractures." The sports ontology might claim, "Some weight-lifters use anabolic steroids (a medical ontology concept)." We do not want to include all the concepts of the other domain in our ontology, just the useful ones. Any ontology might have the same kinds of cross-references with several other ontologies, for example sports-business, business-medical, medical-law and law-business. Creating one combined ontology for each case would make maintenance impossible.
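A development tool could at least detect when a new version of a referenced ontology breaks existing cross-references. The sketch below assumes a hypothetical encoding in which each cross-reference is a pair of a local concept and an (ontology, concept) target; the renaming of `RollerBlading` in the sports ontology is invented for illustration.

```python
def broken_references(references, current_versions):
    """Return cross-references whose target concept no longer exists.

    references: list of (local_concept, (ontology_name, concept_name)) pairs.
    current_versions: ontology_name -> set of concept names it defines now.
    """
    broken = []
    for local, (ontology, concept) in references:
        if concept not in current_versions.get(ontology, set()):
            broken.append((local, ontology, concept))
    return broken


# Hypothetical cross-references between the medical and sports ontologies.
medical_refs = [("WristFracture", ("sports", "RollerBlading"))]
sports_refs = [("WeightLifter", ("medical", "AnabolicSteroid"))]

# Suppose a new version of the sports ontology renamed RollerBlading.
current = {
    "sports": {"InlineSkating", "WeightLifter"},
    "medical": {"WristFracture", "AnabolicSteroid"},
}
print(broken_references(medical_refs + sports_refs, current))
# [('WristFracture', 'sports', 'RollerBlading')]
```

Running such a check whenever a referenced ontology publishes a new version would turn silent breakage into an explicit list of references to repair.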

Herbsleb and Grinter (1999, p. 69) suggest that in distributed development one should "only split the development of well-understood products where architectures, plans and processes are likely to be stable". The problem with ontologies is that there is no general authority to divide the work, the result is fuzzy at best at the start of the modeling process, and there are a great many users, all of whom have an opinion on how the ontology should support their use case. This instability greatly increases the need for communication.


4.2. Problems with Axioms

Logical knowledge in ontologies is often encoded as logical axioms. Axioms are a very powerful tool for reasoning, but they also introduce axiom-specific sources of problems. Sure et al. (2002) and Sure and Studer (2002) classify the errors related to axioms into three categories:

• Typing errors: variables not bound by a quantifier, typos in concept or relationship names, and so on.

• Semantic errors: the rules do not express the intended meaning.

• Performance issues: axioms defined such that their evaluation takes a lot of time, which users do not always easily recognize.
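Errors of the first category are mechanical enough to be caught automatically. The following sketch lints a single axiom written in a hypothetical tuple notation: each atom is a (predicate, arguments) pair, variables start with `?`, and logical connectives are omitted because only names and variables matter for this check. It reports names not declared in the ontology and variables that no quantifier binds.

```python
# Hypothetical vocabulary of the ontology the axiom belongs to.
CONCEPTS = {"Car", "Airplane", "Vehicle"}
RELATIONS = {"canFly", "hasEngine"}


def lint_axiom(quantified, atoms):
    """Report undeclared names and unbound variables in one axiom.

    quantified: set of variables bound by a quantifier, e.g. {"?x"}.
    atoms: list of (predicate, args) tuples; variables start with "?".
    """
    known = CONCEPTS | RELATIONS
    problems = []
    for predicate, args in atoms:
        if predicate not in known:
            problems.append(f"unknown name: {predicate}")
        for arg in args:
            if arg.startswith("?") and arg not in quantified:
                problems.append(f"variable {arg} not bound by a quantifier")
    return problems


# "For all ?x: Car(?x) implies not canFly(?x)", entered with two typing errors.
print(lint_axiom({"?x"}, [("Carr", ["?x"]), ("canFly", ["?y"])]))
# ['unknown name: Carr', 'variable ?y not bound by a quantifier']
```

A check like this cannot address the other two categories: a well-formed axiom may still say the wrong thing or be slow to evaluate.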

5. Current Approaches and Techniques

In the previous chapter we presented some problems that occur in distributed ontology development. Next we present some approaches and techniques currently used to solve these problems, divided into six categories. Section 5.1 discusses ontology design in general. Section 5.2 discusses issues related to communication, documentation and collaboration between developers. Section 5.3 presents existing enhancements to ontology development environments. Section 5.4 concentrates on the possibilities of dividing and guiding work through the ontology system architecture. Section 5.5 addresses issues related to synchronizing concurrent, collaborative development. Lastly, section 5.6 presents possible solutions for combining the development results into long-term storage.

5.1. Design Criteria for Ontologies

Gruber (1993) has defined design criteria for ontologies. These criteria aim to serve as a basis for delivering clear and expressive ontologies that are easy to use and re-use. They were developed to support the ontology development process in general. We present them here because they provide a strong guideline in the pursuit of better tool support for distributed ontology development.

The design criteria that Gruber defines are:

• Clarity: one should use objective, formal and complete definitions.

• Coherency: the ontology and its use of terms should be internally coherent. The free-text explanations should also be coherent with the model.

• Extendibility: defining new terms should not require revising the original ontology.

• Minimal encoding bias: minimize the dependency on data type specifications and the like.

• Minimal ontological commitment: make as few claims about the world as possible. Use vocabulary consistently.

Clarity and extendibility are fairly obvious criteria. Coherency is notably stricter than one might expect, as it places a requirement even on the free-text explanation fields that are normally left outside the scope of most development methods and tools.

Minimal encoding bias also means much more than one might expect at first glance. Gruber proposes that ontologies should be independent not only of data types but also of concepts such as units of measure or precision. For example, an ontology handling speeds cannot assume that they are measured in kilometers per hour. The same ontology must also be able to handle speeds in any unit one could possibly invent: miles per hour, millimeters per nanosecond, the marching speed of a Roman legion and so on. This makes the ontology more usable, as other users can simply define new units of speed and use them with the original ontology.

Minimal ontological commitment is a very intuitive criterion, since one cannot always foresee how one's ontology will be used and re-used. One should therefore leave as many possibilities for further development as possible. This can be done by stating as little as possible about the world, so that the statements do not prevent further development. In other words, minimal ontological commitment is the principle of creating an ontology with the minimal set of concepts and relations needed to describe the world in its use case.

5.2. Knowledge Transfer

Knowledge transfer is an issue in all distributed development. Currently, most ontology development environments are used in the academic community. They offer very little communication support, and their users seem to rely mostly on methods similar to those of open source projects, where the main media of collaboration are email and chat.

The need for organized knowledge transfer processes and tools has been recognized. Aitken (1998) suggests that an ontology should come with extensive documentation of its design principles, intended use and planned re-use methods. As he has shown, understanding how the source ontologies function is key to successful re-use.

Herbsleb and Grinter (1999) further underline the importance of proper documentation. In distributed development there is no informal communication, so important design decisions do not spread among the parties working on an ontology unless they are explicitly recorded and promoted to the developer community. However, documentation alone does not solve the issue of daily collaboration, although it helps in the longer term. Other solutions are needed for day-to-day knowledge transfer.

OntoEdit, an ontology development tool, provides one solution for knowledge transfer. As Sure and Studer (2002) explain, it facilitates collaborative development through integrated support for mind maps: a plug-in connects it to commercial mind-mapping software, so developers can sketch their ideas in mind maps and share them with other developers. Currently, the relations in the mind map are not connected to the ontology engine; they serve as external documentation of the developers' goals and purposes.


5.3. Development Environment

As we noted in the previous section, most ontology development environments on the market today are results of academic research and intended for the academic community. This often means that they aim to provide freedom of expression at a relatively low development cost. Because the users in this context are experienced, much less has been invested in developing graphical user interfaces, with a few notable exceptions.

Traditionally, development environments have tried to support conflict detection and resolution. An example of this is presented by Farquhar et al. (1996). In the Ontolingua system, when two ontologies are merged, conflicts are detected and the user is prompted to choose from a list of possible solutions. This approach tries to make possible semantic errors easier to solve. The problem is that not all semantic errors can be caught or solved automatically (Klein et al., 2002).
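The general idea of merge-time conflict detection can be sketched as a three-way merge over hypothetical concept-to-superclass maps. This is not Ontolingua's actual algorithm. Structural conflicts, where both developers changed the same concept differently, are reported together with candidate resolutions for the user to choose from; a semantically wrong but non-conflicting change passes through undetected, which illustrates the limitation noted above.

```python
def merge(base, left, right):
    """Three-way merge of concept -> superclass maps.

    Returns the merged map and a list of conflicts. Each conflict lists the
    candidate resolutions a tool could offer to the user.
    """
    merged, conflicts = {}, []
    for concept in sorted(set(base) | set(left) | set(right)):
        b, l, r = base.get(concept), left.get(concept), right.get(concept)
        if l == r:          # both sides agree (or both deleted the concept)
            value = l
        elif l == b:        # only the right side changed it
            value = r
        elif r == b:        # only the left side changed it
            value = l
        else:               # both changed it, in different ways
            conflicts.append((concept, [("keep left", l), ("keep right", r)]))
            continue
        if value is not None:
            merged[concept] = value
    return merged, conflicts


base = {"Car": "Vehicle", "Boat": "Vehicle"}
left = {"Car": "LandVehicle", "Boat": "Vehicle"}
right = {"Car": "MotorVehicle", "Boat": "WaterVehicle"}
merged, conflicts = merge(base, left, right)
print(merged)     # {'Boat': 'WaterVehicle'}
print(conflicts)  # [('Car', [('keep left', 'LandVehicle'), ('keep right', 'MotorVehicle')])]
```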

Although ontology editors have historically been tools for experienced users, there is broad agreement that an ontology development environment should include a graphical user interface (GUI) (Grosso et al., 1999; Sure et al., 2002; Sure & Studer, 2002). An ontology can be seen as a directed graph, and it often has some type of hierarchy that can be used to visualize it. Intuitive visualization together with drag-and-drop editing addresses two problems related to ontology axioms in particular: typing errors and performance issues.

Because the mappings between concepts are created by the computer while the developer clicks on a graphical representation of the ontology, typing errors can be prevented entirely. Optimization algorithms can be used to optimize complex axioms for performance; different algorithms could be used depending on how much the developer knows about the system in which the ontology will be used.

In some cases, ontology axioms can be too complex for drag-and-drop editing. GUI tools also often limit developers' freedom to define relations and axioms. Direct access to the ontology encoding is therefore needed. Sure et al. (2002) and Sure and Studer (2002) suggest that in such cases the development environment should at least provide syntax highlighting to help the developer notice possible typing errors.


5.4. Architectural Strategy

A widely used method for managing large, complex entities in the software industry is to divide them into independent components that communicate via well-specified interfaces. Ontologies, too, can in some cases be divided into independent modules. Modules can interact in two ways. One is to connect a module to other modules by referencing concepts defined in them. The other is to combine the modules at the query level: each module is queried separately, and the results are then combined according to some logic.

According to Ding and Fensel, most systems that aim to facilitate ontology re-use, integration or interconnection have adopted a modular organization of ontologies in their ontology libraries. However, as Stuckenschmidt and Klein (2001) note, ontology modularization requires that an ontology can be divided into independent modules that can be queried individually. Modules should be independent of each other; more specifically, a module should be able to answer a query without querying other modules.
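The independence requirement can be approximated by a simple test: a module whose statements mention only concepts it defines itself can answer queries about those statements on its own. The sketch below applies this test to two hypothetical modules; treating "mentions only its own concepts" as independence is our simplifying assumption, not Stuckenschmidt and Klein's formal definition.

```python
def external_dependencies(module):
    """Return concept names a module mentions but does not define.

    module: {"defines": set of concept names,
             "statements": list of (subject, relation, object) triples}
    An empty result suggests the module can be queried on its own.
    """
    deps = set()
    for subject, _relation, obj in module["statements"]:
        for name in (subject, obj):
            if name not in module["defines"]:
                deps.add(name)
    return deps


# Hypothetical modules of a vehicle ontology library.
vehicles = {
    "defines": {"Car", "Vehicle", "Engine"},
    "statements": [("Car", "subClassOf", "Vehicle"), ("Car", "hasPart", "Engine")],
}
sales = {
    "defines": {"Offer", "Price"},
    "statements": [("Offer", "hasPrice", "Price"), ("Offer", "concerns", "Car")],
}
print(external_dependencies(vehicles))  # set()
print(external_dependencies(sales))     # {'Car'}
```

Here the sales module cannot answer a question about what an offer concerns without consulting the vehicles module, so the two would have to be combined at the query level.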

Another paradigm for organizing ontology libraries is a standard upper-level ontology. It serves as a common framework that domain-specific ontologies extend, so different ontologies are connected via the relations of the upper-level ontology. This can be used to divide work among different groups of specialists when creating the domain-specific ontologies, and to relate cross-domain knowledge when querying ontologies. Ding and Fensel state that standard upper-level ontologies make a very important contribution in these respects.
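The sketch below illustrates the idea with a hypothetical three-concept upper ontology (`PhysicalObject`, `Agent`, `Event`) and two domain ontologies that anchor their concepts to it. Each specialist group's work can be checked independently for anchoring, and a cross-domain query can be answered through the shared upper concepts. All names are illustrative only.

```python
# Hypothetical standard upper-level ontology.
UPPER = {"PhysicalObject", "Agent", "Event"}

# Domain ontologies built by different groups; each concept names its upper concept.
MANUFACTURING = {"Car": "PhysicalObject", "Factory": "PhysicalObject", "Assembly": "Event"}
RETAIL = {"StockItem": "PhysicalObject", "Customer": "Agent", "Sale": "Event"}


def check_extends_upper(domain):
    """Return domain concepts that are not anchored to a known upper concept."""
    return {concept for concept, parent in domain.items() if parent not in UPPER}


def cross_domain(upper_concept, *domains):
    """Collect concepts from all domains that specialise one upper concept."""
    return sorted(
        concept
        for domain in domains
        for concept, parent in domain.items()
        if parent == upper_concept
    )


print(check_extends_upper(MANUFACTURING))            # set()
print(cross_domain("Event", MANUFACTURING, RETAIL))  # ['Assembly', 'Sale']
```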

Independently of ontology modularization and standard upper-level ontologies, current collaborative ontology development systems follow one of two architectural models: they implement either a peer-to-peer or a client/server architecture for storing ontologies and allowing them to be modified. Each model has its strengths and weaknesses. Ding and Fensel suggest that the client/server model seems to be critical for collaborative editing. It allows one to specify a fixed connection point where the ontology can be accessed and where version and concurrency control can be enforced.

Peer-to-peer architectures offer considerable scalability advantages, but to our understanding, collaboration support seems to be somewhat more challenging to implement in them. Existing peer-to-peer systems appear to focus more on distributing large ontologies efficiently, whereas systems based on the client/server model attempt to tackle collaborative development.

5.5. Synchronization Strategy

In this section we discuss different strategies for supporting collaborative editing in everyday tasks and for merging the edits of different developers. One strategy is, of course, not to support collaborative editing at all. This is not necessarily a bad idea. As Sure and Studer (2002) explain, ontology developers can re-use parts of existing ontologies to create a new one and then add the new ontology to the ontology library system. This method does not support collaborative editing, but it is very effective for re-using existing ontologies and is easier to implement than the other methods.

Collaborative editing can be understood as several developers developing the same part of an ontology together. To support this approach, the development environment can implement a laissez-faire collaboration strategy, in which the system does not protect a developer from others' modifications; instead, all modifications are broadcast to all developers in real time, and their editing environments display the changes immediately. The newest version of Protégé, a development tool from Stanford University, implements beta-level collaboration support in this way. Protégé has a client/server architecture in which collaboratively edited ontologies reside on the server and clients access them via a network connection. More detailed information on Protégé can be found in (Grosso et al., 1999), but so far no papers have been published on its collaboration features.
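The following Python sketch models the laissez-faire strategy in a single process: the server accepts every change immediately and pushes it to all connected editors. The class and method names are hypothetical and do not describe Protégé's actual protocol; a real system would send the changes over a network connection.

```python
class OntologyServer:
    """Holds the shared ontology and pushes every change to all clients."""

    def __init__(self):
        self.concepts = {}  # concept name -> superclass
        self.clients = []

    def connect(self, client):
        self.clients.append(client)

    def apply(self, author, concept, superclass):
        # No locking or conflict check: the change is accepted as-is.
        self.concepts[concept] = superclass
        for client in self.clients:
            client.on_change(author, concept, superclass)


class EditorClient:
    """A developer's editor, which mirrors whatever the server broadcasts."""

    def __init__(self, name):
        self.name = name
        self.view = {}

    def on_change(self, author, concept, superclass):
        self.view[concept] = superclass


server = OntologyServer()
alice, bob = EditorClient("alice"), EditorClient("bob")
server.connect(alice)
server.connect(bob)
server.apply("alice", "Car", "Vehicle")
server.apply("bob", "Car", "MotorVehicle")  # silently replaces Alice's edit
print(alice.view, bob.view)  # {'Car': 'MotorVehicle'} {'Car': 'MotorVehicle'}
```

The second edit overwrites the first without warning, which shows the trade-off of this strategy: everyone sees every change at once, but nothing prevents developers from overwriting each other.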

The opposite of the laissez-faire strategy is locking: the system locks parts of an ontology while they are being modified. Different approaches can be taken here. One approach, very similar to open source development, is described by Farquhar et al. (1996) and Karp et al. (1999). Developers edit local copies of the ontology, and when they attempt to commit their changes, the modifications are tested for conflicts. If any conflicts are found, the developers must remove them before the commit can proceed. The system locks the ontology only while integrating the non-conflicting changes into the original ontology. This method aims to minimize the time any part of an ontology is locked, because locking an ontology effectively prevents other developers from working on it.
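A minimal single-process sketch of this commit-time conflict test follows, assuming hypothetical `checkout` and `commit` operations on a concept-to-superclass map. A change counts as a conflict if someone else has modified the same concept since the developer's copy was taken; comparing values per concept is our simplifying assumption. The lock is held only while taking a snapshot and while integrating, never while the developer is editing.

```python
import threading


class SharedOntology:
    """Concept -> superclass map shared by all developers."""

    def __init__(self):
        self.concepts = {}
        self._lock = threading.Lock()  # held only briefly, never during editing

    def checkout(self):
        """Return a private copy for local editing."""
        with self._lock:
            return dict(self.concepts)

    def commit(self, base_copy, local_copy):
        """Integrate local changes unless they conflict with newer commits.

        Returns the sorted list of conflicting concepts; empty means success.
        """
        changed = {c for c in set(base_copy) | set(local_copy)
                   if base_copy.get(c) != local_copy.get(c)}
        with self._lock:
            # Conflict: someone else changed the same concept since checkout.
            conflicts = sorted(c for c in changed
                               if self.concepts.get(c) != base_copy.get(c))
            if conflicts:
                return conflicts  # the developer must resolve these and retry
            for c in changed:
                if c in local_copy:
                    self.concepts[c] = local_copy[c]
                else:
                    self.concepts.pop(c, None)  # concept deleted locally
            return []


onto = SharedOntology()
alice_base = onto.checkout()
bob_base = onto.checkout()
alice_local = {**alice_base, "Car": "Vehicle"}
bob_local = {**bob_base, "Car": "MotorVehicle"}
print(onto.commit(alice_base, alice_local))  # []
print(onto.commit(bob_base, bob_local))      # ['Car']
```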

Another approach would be to lock part of an ontology for the entire duration of development. This would require either short commit intervals or well-defined areas of responsibility, so as not to block other developers and to minimize the number of conflicts at commit time.

Locking requires user and group access control to be implemented. Access rights can be used to manage the changes developers can make by assigning read, overwrite and new-concept-creation rights to users. Ontolingua provides one implementation of access control (Farquhar et al., 1996).
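The three kinds of rights can be sketched as bit flags combined per group. The groups, the users, and the rule that editing an existing concept needs the overwrite right while adding a new one needs the creation right are hypothetical illustrations, not Ontolingua's access control model.

```python
from enum import Flag, auto


class Right(Flag):
    READ = auto()
    OVERWRITE = auto()
    CREATE = auto()  # create new concepts


# Hypothetical groups and memberships.
GROUP_RIGHTS = {
    "reviewers": Right.READ,
    "vehicle-team": Right.READ | Right.OVERWRITE | Right.CREATE,
}
USER_GROUPS = {"alice": {"vehicle-team"}, "bob": {"reviewers"}}


def rights_of(user):
    """Combine the rights of all groups the user belongs to."""
    result = Right(0)
    for group in USER_GROUPS.get(user, ()):
        result |= GROUP_RIGHTS[group]
    return result


def can_edit(user, ontology, concept):
    """Changing an existing concept needs OVERWRITE; adding one needs CREATE."""
    needed = Right.OVERWRITE if concept in ontology else Right.CREATE
    return needed in rights_of(user)


ontology = {"Car": "Vehicle"}
print(can_edit("alice", ontology, "Car"), can_edit("bob", ontology, "Truck"))  # True False
```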

Another way to divide development responsibility is modularization. If an ontology can be modularized, the modules can be used as a basis for assigning work to different developers (Herbsleb & Grinter, 1999). This helps prevent overlapping modifications, but modularization is not always possible.

During development, developers need to be informed of changes to other parts of the ontology when these changes affect the parts they are developing. A notification mechanism can be implemented to alert developers to others' changes (Farquhar et al., 1996).
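A notification mechanism can be as simple as a subscription table. In this sketch, which is our own and not the Ontolingua mechanism, each developer subscribes to the concepts her work depends on and receives a message whenever someone else changes one of them; `ChangeNotifier` and the message format are hypothetical.

```python
from collections import defaultdict


class ChangeNotifier:
    """Hypothetical notifier: developers subscribe to concepts they depend on."""

    def __init__(self):
        self.subscribers = defaultdict(set)  # concept -> developers
        self.inbox = defaultdict(list)       # developer -> pending messages

    def subscribe(self, developer, concept):
        self.subscribers[concept].add(developer)

    def changed(self, concept, author, summary):
        for developer in sorted(self.subscribers[concept]):
            if developer != author:  # the author already knows about her change
                self.inbox[developer].append(f"{author} changed {concept}: {summary}")


notifier = ChangeNotifier()
notifier.subscribe("bob", "Engine")
notifier.subscribe("alice", "Engine")
notifier.changed("Engine", "alice", "added hasCylinderCount")
```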

5.6. Ontology Storage Strategy

Collaborative development can be supported with different ontology storage strategies. The main issue in storing ontologies is version control. As noted in (Ding & Fensel), version control is very important, and most existing ontology development environments could handle it better. Existing version control systems mainly use methods similar to those explained in (Karp et al., 1999), which we discussed in the previous section.

In versioning, it is important to separate the identity of an ontology from the identity of the file in which it is stored, as (Klein et al., 2002) explains. This way an ontology can have a unique version identifier that can be used to refer to that particular ontology. It also allows us to change the ontology's location and name, adapting to changing organizations and correcting possible typing errors.
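The separation of ontology identity from file identity can be illustrated with a small registry. We assume here, for illustration only, that an ontology version is identified by a URI plus a version string; `OntologyVersionId` and `OntologyRegistry` are hypothetical names, not the scheme proposed by Klein et al. Moving or renaming the file changes only the registry entry, so references that use the identifier stay valid.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OntologyVersionId:
    """Hypothetical identity of one ontology version, independent of any file."""
    uri: str
    version: str


class OntologyRegistry:
    """Maps ontology version identifiers to their current storage locations."""

    def __init__(self):
        self._locations = {}

    def register(self, oid, location):
        self._locations[oid] = location

    def move(self, oid, new_location):
        # Relocating or renaming the file touches only this mapping;
        # every reference made through `oid` remains valid.
        if oid not in self._locations:
            raise KeyError(f"unknown ontology version: {oid}")
        self._locations[oid] = new_location

    def resolve(self, oid):
        return self._locations[oid]


registry = OntologyRegistry()
vehicles_v1 = OntologyVersionId("http://example.org/vehicles", "1.0")
registry.register(vehicles_v1, "/srv/ontologies/vehicels.owl")  # file name has a typo
registry.move(vehicles_v1, "/srv/ontologies/vehicles.owl")      # fixed without breaking references
```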

One way to maintain consistency as an ontology evolves is to separate instances from ontologies (Ding & Fensel). This allows a system to use different ontologies to provide different perspectives on the same data. To return to our earlier automotive industry example, the instances would be real, physical cars in stock. A manufacturer could use one ontology to keep track of the different parts in a specific car and their suppliers, whereas a retailer might have another ontology for storing ownership history, condition and price, to provide buyers with a search application that finds the best possible match for their needs.
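The automotive example can be made concrete with a deliberately simplified sketch in which each ontology is reduced to the set of properties it defines over shared instance data. The property names, the sample cars and the functions `view` and `best_match` are invented for illustration; a real system would use an ontology language rather than Python dictionaries.

```python
# Instance data: physical cars in stock, stored independently of any ontology.
CARS = [
    {"vin": "VIN-001", "parts": {"engine": "Supplier A", "gearbox": "Supplier B"},
     "owners": ["first owner"], "condition": "good", "price": 9500},
    {"vin": "VIN-002", "parts": {"engine": "Supplier C"},
     "owners": [], "condition": "new", "price": 21000},
]

# Each "ontology" is simplified here to the properties it gives meaning to.
MANUFACTURER_ONTOLOGY = ("vin", "parts")
RETAILER_ONTOLOGY = ("vin", "owners", "condition", "price")


def view(instances, ontology):
    """Project the shared instances onto the properties one ontology describes."""
    return [{p: inst[p] for p in ontology if p in inst} for inst in instances]


def best_match(instances, max_price, acceptable_conditions):
    """Retailer-side search: cheapest car within budget and in an acceptable condition."""
    candidates = [c for c in view(instances, RETAILER_ONTOLOGY)
                  if c["price"] <= max_price and c["condition"] in acceptable_conditions]
    return min(candidates, key=lambda c: c["price"], default=None)
```

Because neither view copies or owns the instance data, either ontology can evolve without touching the cars themselves.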

6. Suitability of Approaches to Problems

6.1. Design Criteria for Ontologies

In section 5.1 we presented design criteria for ontologies. As we mentioned, the criteria are very general and aim to help the developer organize her thoughts. Therefore they may not integrate completely into an integrated development environment. If the developer is able to follow them, the resulting ontology is likely to contain fewer semantic errors than if the criteria are not followed.

The clarity criterion could be partly integrated by using graphical tools and dialogs to add concepts and relations. At creation time, the system could prompt the developer to fill in the most important details. The development environment should not force her to do so, though. As ontology modeling is a creative task, we do not believe that all information can be entered at once: the developer might still be just drafting her model and might not yet know the exact details.
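A non-blocking prompt of this kind could look like the following sketch. Which details count as most important is our assumption (`label`, `definition` and `parent` here), and `create_concept` is a hypothetical function; the point is that the concept is always created and the missing details are only reported back to the developer as hints.

```python
# Hypothetical choice of the details a creation dialog would prompt for.
RECOMMENDED_DETAILS = ("label", "definition", "parent")


def create_concept(ontology, name, **details):
    """Add `name` to `ontology` even if incomplete; return hints, never refuse."""
    ontology[name] = dict(details)
    return [f"{name}: consider adding a {detail}"
            for detail in RECOMMENDED_DETAILS if not details.get(detail)]


draft = {}
hints = create_concept(draft, "Car", label="Car")
# The draft concept exists; `hints` lists the two missing recommended details.
```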

The extendibility criterion should be fully integrable by using standard ontology description languages. The system could take care of all required namespace and referencing issues, thus preventing many possible errors.
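As an example of the referencing work a tool could take over, the sketch below expands prefixed names against declared namespaces and reports references it cannot resolve, instead of letting a mistyped prefix silently produce a new, unrelated name. The prefix syntax mimics that of RDF-based languages, but the function `resolve_references` and the example namespace are hypothetical.

```python
def resolve_references(qnames, prefixes):
    """Expand names such as 'veh:Car' to full URIs.

    Returns (resolved, unresolved): a dict of successfully expanded names and a
    list of names whose prefix is missing or undeclared.
    """
    resolved, unresolved = {}, []
    for qname in qnames:
        prefix, sep, local = qname.partition(":")
        if sep and prefix in prefixes and local:
            resolved[qname] = prefixes[prefix] + local
        else:
            unresolved.append(qname)
    return resolved, unresolved


PREFIXES = {"veh": "http://example.org/vehicles#"}
resolved, unresolved = resolve_references(["veh:Car", "vhe:Wheel"], PREFIXES)
```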

The other criteria depend greatly on the developer's willingness to commit herself to them. This is why we remain rather skeptical that a development tool could enforce them. However, we do believe that careful user interface design can support them by guiding the user to follow the criteria.

6.2. Knowledge Transfer

Current systems do not seem to support knowledge transfer very well. We believe that a collaborative ontology development environment should strongly support communication channels where developers can present different opinions, discuss design proposals and reach a consensus. One option would be to integrate the IDE with various communication tools via plug-ins, as the mind map case shows. Presence information and real-time communication might also be very useful.

Combining documentation with the ontology is especially important when ontologies have long lifetimes. The development environment should encourage developers to keep the documentation up to date to aid maintenance and re-use at any later point in time. We believe that implementing these strategies would greatly reduce the number of errors inserted into an ontology during development.

6.3. Development Environment

Current publications all agree that graphical user interface tools are needed to support ontology development. When such tools are used, typing errors in concept names and the like no longer introduce conflicts, as the tool creates all relations between concepts and updates them automatically. We believe that good visualization combined with drag-and-drop editing could also reduce semantic errors, as developers can clearly see the results of their modifications. If developers want to modify the ontology description language code directly, syntax highlighting could help them avoid typing errors.
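The reason graphical tools avoid naming conflicts can be shown with a minimal model: relations refer to internal concept identifiers, and names are only display labels, so renaming a concept (for example, to fix a typing error) updates every relation at once. `ConceptGraph` is a hypothetical illustration, not the internal model of any particular tool.

```python
import itertools


class ConceptGraph:
    """Hypothetical editor model: relations point at concept IDs, not typed names."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.names = {}      # concept id -> display name
        self.relations = []  # (subject id, relation name, object id)

    def add_concept(self, name):
        concept_id = next(self._next_id)
        self.names[concept_id] = name
        return concept_id

    def relate(self, subject_id, relation, object_id):
        # A GUI would only offer existing concepts; here that guarantee is explicit.
        if subject_id not in self.names or object_id not in self.names:
            raise KeyError("both ends of a relation must be existing concepts")
        self.relations.append((subject_id, relation, object_id))

    def rename(self, concept_id, new_name):
        self.names[concept_id] = new_name  # all relations follow automatically

    def triples(self):
        """Relations as (subject name, relation, object name) for display or export."""
        return [(self.names[s], r, self.names[o]) for s, r, o in self.relations]


graph = ConceptGraph()
car = graph.add_concept("Car")
wheel = graph.add_concept("Weel")  # typing error made in the dialog
graph.relate(car, "hasPart", wheel)
graph.rename(wheel, "Wheel")       # corrected once; the relation is unaffected
```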

So far there has been a shortage of graphical development environments that allow collaborative and concurrent development. We are happy to see that a lot of effort has lately been invested in this area. Some results are already available, for example in Protégé, and efforts continue to produce even better development tools. Meanwhile, as we are dealing with a relatively complex application area with a lot of expressive power, we suggest that a tool offering syntax highlighting be kept available for dealing with issues that cannot be solved with graphical tools.

Semi-automated conflict identification and resolution seems very useful indeed, and we suggest incorporating it into any development environment. It can provide valuable help in recognizing problems and offering a set of possible causes and solutions. Having said that, we would like to point out that, while highly applicable, conflict resolution tools cannot identify all possible problems (Klein et al., 2002). A combination of knowledge transfer and design criteria support might therefore prove more valuable, as these actually help prevent conflicts from being inserted in the first place.
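As a small illustration of semi-automated conflict identification, the sketch below checks a subclass hierarchy for two problem types (references to undefined parent concepts and subclass cycles) and pairs each finding with a suggested fix. It is our own example with a hypothetical `find_conflicts` function; the tools surveyed detect other kinds of conflicts, and, as noted above, no checker of this kind can find every problem.

```python
def find_conflicts(subclass_of, concepts):
    """Report (problem, suggestion) pairs for a subclass hierarchy.

    subclass_of maps a concept to the set of its direct parents; concepts is the
    set of concepts actually defined. Only two illustrative checks are made.
    """
    report = []
    # Check 1: parents that are referenced but never defined (often typing errors).
    for child, parents in subclass_of.items():
        for parent in sorted(parents - concepts):
            report.append((f"{child} refers to undefined parent {parent}",
                           f"define {parent} or correct the reference"))

    # Check 2: cycles, found by depth-first search with three node states.
    state = {c: "unvisited" for c in concepts | set(subclass_of)}

    def visit(node, path):
        state[node] = "in progress"
        for parent in sorted(subclass_of.get(node, ())):
            if state.get(parent) == "in progress":
                cycle = path[path.index(parent):] + [parent]
                report.append(("subclass cycle: " + " -> ".join(cycle),
                               "remove one of the subclass links in the cycle"))
            elif state.get(parent) == "unvisited":
                visit(parent, path + [parent])
        state[node] = "done"

    for node in sorted(state):
        if state[node] == "unvisited":
            visit(node, [node])
    return report
```

The tool only reports and suggests; choosing which link to remove is left to the developer, in line with the semi-automated approach.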

6.4. Architectural Strategy

We are strongly inclined to believe that standard upper-level ontologies can be used successfully to organize ontology library systems and ontology developers' work. This should result in fewer semantic errors. In addition, they help relate different domain-specific ontologies in ways that allow non-trivial, cross-domain associations between concepts.

Modularity has proven to work well in distributed development. It helps divide work into smaller fragments that are easier to understand, so fewer semantic errors can be expected. Still, we would like to point out that dividing an ontology into completely independent modules might limit the ability to relate concepts together via upper-level concepts. This in turn could limit the potential of finding related concepts through non-trivial relations in the concept hierarchy.
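Both points can be illustrated with a toy concept hierarchy. In the sketch below, two domain ontologies (automotive and industrial machinery) hang under a shared upper-level concept, so a cross-domain association is found through their most specific common ancestor, while a concept from a module that is not attached to the upper level cannot be related to anything. The hierarchy, the single-parent simplification and the function names are our assumptions.

```python
def ancestors(concept, parent_of):
    """Concepts above `concept`, nearest first (single parent, no cycles assumed)."""
    chain = []
    while concept in parent_of:
        concept = parent_of[concept]
        chain.append(concept)
    return chain


def common_upper_concept(a, b, parent_of):
    """Most specific concept that both a and b fall under, or None."""
    above_b = {b, *ancestors(b, parent_of)}
    for candidate in [a, *ancestors(a, parent_of)]:
        if candidate in above_b:
            return candidate
    return None


# Hypothetical hierarchy: two domain ontologies attached to a shared upper level,
# plus a module ("Gearbox" -> "Transmission") that is not attached to it.
PARENT_OF = {
    "Car": "MotorVehicle", "MotorVehicle": "Artifact",  # automotive domain
    "Robot": "Machine", "Machine": "Artifact",          # machinery domain
    "Artifact": "PhysicalObject",                       # upper-level ontology
    "Gearbox": "Transmission",                          # isolated module
}
```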

The client/server model has convinced us of its ability to support collaborative editing. We suggest that it be considered an option for any development environment attempting to implement collaboration protocols. We would like to see it combined with (semi-)automated conflict resolution and version control.

6.5. Synchronization Strategy

We believe that a real synchronization strategy is required to enable developers to collaborate effectively when developing an ontology. Merely allowing developers to re-use existing ontologies to create new ones might decrease the number of modularity and referencing issues, but it would greatly increase the number of ontologies. This would lead to a versioning and maintenance catastrophe before an ontology's lifetime of 50 years has been reached.

Locking effectively prevents others from working on a particular area of an ontology. Still, it does not prevent false assumptions about locked concepts. We believe that locking, together with user access control and notification mechanisms, could be used to divide development work between different organizations. In closely collaborative activities, however, we believe it would impose overly rigid limits.

For collaboration support at the microscopic level we suggest the laissez-faire style. It will certainly cause problems when one user spends a lot of effort modeling a concept and another user deletes it right away, but its strengths in communication outweigh these problems. The laissez-faire strategy does exactly what communication tools aim to do: it allows developers to quickly demonstrate their ideas to others. Another advantage is that laissez-faire keeps all developers' models identical at all times, so fewer conflicts emerge at commit time, which matters because conflict resolution cannot be fully automated. To coordinate the work, users should rely on more traditional communication and versioning techniques.
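The behavior described above, where every edit reaches every developer at once and the latest edit simply wins, can be sketched as follows. This is a hypothetical in-process model rather than the Protégé protocol; the network layer is replaced by direct method calls, and `LaissezFaireServer` and `Client` are invented names.

```python
class Client:
    """Hypothetical editor client holding a replica of the shared model."""

    def __init__(self, name):
        self.name = name
        self.model = {}

    def apply(self, concept, definition):
        # A definition of None means the concept was deleted.
        if definition is None:
            self.model.pop(concept, None)
        else:
            self.model[concept] = definition


class LaissezFaireServer:
    """Hypothetical server: edits are applied at once and pushed to every client."""

    def __init__(self):
        self.model = {}
        self.clients = []

    def connect(self, client):
        client.model = dict(self.model)  # a new client starts from the current state
        self.clients.append(client)

    def edit(self, concept, definition):
        # No locks and no conflict check: whatever arrives last wins.
        if definition is None:
            self.model.pop(concept, None)
        else:
            self.model[concept] = definition
        for client in self.clients:
            client.apply(concept, definition)
```

Deleting a concept someone else just created is as easy as any other edit here, which is exactly the drawback the text accepts in exchange for identical models on every client.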


6.6. Ontology Storage Strategy

As we already mentioned, proper versioning is very important in collaborative, distributed development. Versioning should handle only “finished work”; it should omit the details of the collaborative editing done to turn one version into another. The modification details themselves could be used as hints to facilitate automated conflict resolution.
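One way to keep only finished work in the version history while still using editing details as hints is sketched below. It is a hypothetical design, not taken from the cited systems: the repository stores one snapshot per commit, and the list of concepts touched during the editing session is used only at commit time, to merge the work onto the latest version and to flag concepts that someone else has also changed.

```python
class VersionedRepository:
    """Hypothetical repository that stores only committed snapshots."""

    def __init__(self, initial):
        self.versions = [dict(initial)]  # version number = index in this list

    def head(self):
        return len(self.versions) - 1

    def commit(self, base_version, working_copy, touched):
        """Merge a finished working copy; return concepts that need manual review.

        `touched` lists the concepts edited during the session. It is used as a
        merge hint only and is not kept in the version history.
        """
        base, latest = self.versions[base_version], self.versions[-1]
        changed_since_base = {c for c in base.keys() | latest.keys()
                              if base.get(c) != latest.get(c)}
        suspects = sorted(set(touched) & changed_since_base)
        if suspects:
            return suspects  # nothing is stored until these are resolved
        merged = dict(latest)
        for concept in set(touched):
            if concept in working_copy:
                merged[concept] = working_copy[concept]
            else:
                merged.pop(concept, None)  # the concept was deleted in the session
        self.versions.append(merged)
        return []
```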

Separating instances from ontologies, and the identity of ontologies from the identity of the files in which they are stored, are good decisions. They let the system treat concepts such as ontology, instance and version independently, which allows the physical storage to be freely distributed around the Internet. The same instance data can be viewed through different ontologies to create different views of the same concepts.

Separating instances, ontologies and files can, and in our opinion should, be supported by the development environment. The developer can then concentrate her energy on ontology development itself rather than on the possibly confusing details of storing ontologies, which could introduce further conflicts.

7. Discussion and Conclusion

The most important steps to prevent errors in collaborative ontology design seem to be:

• Sound ontology design principles
• Good knowledge of the domain and of the ontologies being re-used
• The ability to share design ideas with other developers
• Separating modeling from the dirty details of language syntax
• Version control to keep track of the evolution of the ontology

It is difficult to provide tool support for ontology design criteria. Still, we believe that any investment in such support will be highly appreciated by developers, as it is likely to decrease the number of errors inserted during ontology development.


Collaborative development tools should support various communication methods, both asynchronous and synchronous. The exchange of ideas and design sketches is also important, as is the ability to derive initial ontology definitions from the sketches.

Development tools should offer collaboration support that allows multiple authors to edit the same model concurrently, without local copies. The client/server model seems to suit this purpose naturally. Laissez-faire style synchronization can be used to draft ideas quickly and propagate changes to all developers in real time. More rigid conflict identification and resolution schemes should be used when finished parts are committed back to the repository. Conflict checking can also occur on user request; real-time checking might prove too heavy and could hinder rapid drafting.

The ontology repository should implement strong version control to help developers build the ontology incrementally and derive evolutionary branches. If possible, ontologies should be divided in a modular fashion. Careful consideration of the pros and cons is required if modularization would limit the reasoning one is able to do on the ontology.

Locking mechanisms and access right control might help divide work between different organizations and thus prevent errors arising from modeling conceptually overlapping items independently. However, they require short commit intervals so that the development efforts do not drift apart.

Using a development tool with a graphical user interface is a very powerful way to eliminate problems arising from typing errors and syntax mismatches. The tool may also use optimization algorithms to optimize ontology axiom descriptions for performance. When it is not possible to implement a graphical user interface, the development tool should at least provide syntax highlighting to help the user locate typing and syntax errors as she edits code written in the ontology representation language.

Most of the steps described above help prevent problems that arise from syntactic errors or concurrent editing. Semantic errors are harder to prevent, as they depend on the domain and the intended purpose of an ontology. We believe that more research in this field would be beneficial in developing better distributed development tools.


We have presented common problems in collaborative, distributed ontology development and given an overview of the methods currently used to overcome them. We analyzed the suitability of these methods for solving the problems presented and found that most of them can be integrated into a distributed ontology development environment. We noted that some methods concentrate on removing existing conflicts from ontologies instead of trying to prevent them from being inserted. We introduced a set of propositions for distributed development environments, which we believe are the key points in helping users achieve high-quality results when developing ontologies collaboratively. The study suggests that more research on how to prevent semantic errors during collaborative modeling could benefit the development of distributed development tools.


8. References

S. Aitken. Extending the HPKB-Upper-Level Ontology: Experiences and Observations. In Proceedings of the Workshop on Applications of Ontologies and Problem Solving Methods (ECAI'98), Brighton, England, August 1998.

Y. Ding and D. Fensel. Ontology Library Systems: The Key to Successful Ontology Re-use.

A. Farquhar, R. Fikes and J. Rice. The Ontolingua Server: A Tool for Collaborative Ontology Construction. Technical report, Stanford KSL 96-26, 1996.

W. Grosso, H. Eriksson, R. Fergerson, J. Gennari, S. Tu and M. Musen. Knowledge Modeling at the Millennium: The Design and Evolution of Protégé-2000. In Proceedings of the 12th International Workshop on Knowledge Acquisition, Modeling and Management (KAW'99), Banff, Canada, October 1999.

T. R. Gruber. Towards Principles for the Design of Ontologies Used for Knowledge Sharing. In Formal Ontology in Conceptual Analysis and Knowledge Representation, Kluwer Academic Publishers, 1993.

J. D. Herbsleb and R. E. Grinter. Architectures, Coordination, and Distance: Conway's Law and Beyond. IEEE Software, 16(5), September-October 1999, pages 63-70.

P. D. Karp, V. K. Chaudhri and S. M. Paley. A Collaborative Environment for Authoring Large Knowledge Bases. Journal of Intelligent Information Systems, 13(3), pages 155-194, 1999.

M. Klein, D. Fensel, A. Kiryakov and D. Ognyanov. Ontology Versioning and Change Detection on the Web, 2002.

H. Stuckenschmidt and M. Klein. Modularization of Ontologies. WonderWeb: Ontology Infrastructure for the Semantic Web, 2001.


Y. Sure, S. Staab and J. Angele. OntoEdit: Guiding Ontology Development by Methodology and Inferencing. In Proceedings of the International Conference on Ontologies, Databases and Applications of Semantics (ODBASE 2002), 2002.

Y. Sure and R. Studer. On-To-Knowledge Methodology: Final Version. On-To-Knowledge Deliverable 18, Institute AIFB, University of Karlsruhe, 2002.