IOS Press LabelTranslator: A System for Localizing ... · Undeﬁned 1 (2009) 1–5 1 IOS Press LabelTranslator: A System for Localizing Ontologies in Collaborative Settings Mauricio

Undefined 1 (2009) 1–5 1IOS Press

LabelTranslator: A System for LocalizingOntologies in Collaborative SettingsMauricio Espinoza a,c, Asunción Gómez-Pérez a and Eduardo Mena b

a Departamento de Inteligencia Artificial, UPM, 28660 Boadilla del Monte, Spain,E-mail: [email protected], [email protected] IIS Department, Univ. of Zaragoza, María de Luna 1, 50018 Zaragoza, SpainE-mail: [email protected] Facultad de Ingeniería, Universidad de Cuenca, Cdla. Universitaria Av. 12 de Abril s/n, Cuenca, Ecuador

Abstract. LabelTranslator is a system to semi-automatically localize ontologies in different natural languages: it extracts, trans-lates and updates the lexicon of an ontology. Its objective is to support the development of multilingual ontologies, combiningand exploiting the linguistic and semantic knowledge available on the Web to discover the translations of each ontology entity.We present the design of LabelTranslator, which have been guided by the requirements of international institutions as FAO forlocalizing different types of resources such as thesauri or ontologies. In particular, the three main activities handled by Label-Translator are detailed: managing the multilingual content for localization, monitoring and control the localization activity, anddiscovering the more appropriate translations for each ontology entity.

Keywords: Knowledge Engineering, Ontology Localization, Collaborative Ontology Localization

1. Introduction

On the light of the growing interest and use of Se-mantic Web technologies, the need for ontology-basedapplications that manage and interact with differentnatural languages has also arisen. Many are the organi-zations operating at an international level that have tomanage information in several natural languages, andwant to offer their users the possibility of accessingand retrieving data in their own languages.

International institutions with information and re-sources in multiple languages, such as the FAO1 in theAgricultural domain or the WHO2 in the Health area,are currently working in the introduction of semantictechnologies in their information systems. One of theirmain concerns is to account for multilinguality and thelocalized representation of knowledge [1].

Thus, multilinguality and localization become a realneed in the current Semantic Web, and research in this

1Food and Agricultural Organization.2World Health Organization.

sense has focused on the representation of linguisticand multilingual information in ontologies [2,16], andon the automatic localization of ontologies [6,7]. Inother to be able to exploit ontologies in a multilingualsettings, an adaptation of the ontology to different nat-ural languages is required.

Ontology Localization has been defined as the pro-cess of adapting a given ontology to the needs of acertain community, which can be characterized by acommon language, a common culture, or a certaingeo-political environment [3]. The difficulties involvedin the localization of ontologies have been discussedin [8], as well as the strategies that can be followeddepending on the final aim of the localized ontology,and the representation modalities that better complywith each strategy. Additionally, guidelines have alsobeen proposed for the localization of ontologies withthe aim of helping users in this task.

Regarding ontology localization tools, LabelTrans-lator [6,7], implemented as a plugin of the NeOnToolkit [9], supports a semi-automatic localization ofontologies to different natural languages, and in its cur-

0000-0000/09/$00.00 c⃝ 2009 – IOS Press and the authors. All rights reserved

2 M. Espinoza et al. / LabelTranslator: A System for Localizing Ontologies in Collaborative Settings

rent version, it can localize ontologies in English, Ger-man, and Spanish. The main goal of LabelTranslatoris to reduce the cost of translation and minimize thetime to localize an ontology. The first prototype pre-sented in [5] gave us an insight in the potential of thistechnology and it provided a single working scenario,in which only one person was in charge of supervisingall the steps in the ontology localization activity. How-ever, it lacked of functionalities to support collabora-tively the localization of ontologies in those settings inwhich many actors geographically disperse, using di-verse natural spoken languages where involved. Thus,the maintenance cycle of each language will often becarried out separately. To address these limitations, wecreated a workflow-based model for the collaborativelocalization of ontologies in distributed environments.

In this paper, our objective is to describe the ap-proach we propose for the management of a collab-orative ontology localization, and the components re-quired to support it. Our contribution also includes theimplementation of the proposed approach. The rest ofthe paper is structured as follows. In section 2 the mainmodules for supporting the localizing of ontologies indistributed and collaborative environments are intro-duced. A description of the modules of our generic ar-chitecture is presented in sections 3, 4, and 5. Section 6presents a set of experiments used to evaluate the tech-nological aspects of the LabelTranslator system. Fi-nally, conclusions and future work appear in section 7.

2. Architecture of Collaborative LabelTranslator

In this section we present a generic architecture forlocalizing ontologies in distributed and collaborativeenvironments. Both Automated Translation and Col-laboration and Distribution have been the key factorsin the design of the advanced version of LabelTransla-tor. To increase the quality of the translations in auto-matic localization systems we propose a collaborativeand distributed approach to: i) enhance the communi-cation and collaboration among localization stakehold-ers, and ii) increase the participation of ontology usersin the ontology localization process.

LabelTranslator has been designed around threecore activities, each corresponding to a “layer" of itsarchitecture (see Figure 1):

– The Ontology Management module enables on-tology editors and managers to automatically ac-cess and manage multilingual content to be local-ized.

– The Collaborative Localization Managementis the module designed to help manage, monitor,and control the localization activity itself.

– The Ontology Translator module is the core ofthe whole system. It allows to automatically dis-cover the more appropriate translations for eachontology entity.

At a more technical level, these three modulesare hosted on centralized client-server architecture inwhich the server maintains the current state and fullhistory of all localized ontologies. The descriptions ofthe three main tasks of LabelTranslator, both at theconceptual and technical levels, are presented in thefollowing sections.

3. The Ontology Management Module

Ontology Management is the module designed tohelp ontology engineers and managers to managemultilingual content for localization. The main func-tionalities of the Ontology Management componentare: i) development and storage of ontologies that needto be localized, ii) selection of the ontology elements tobe translated into different natural languages, iii) han-dling of different versions of the localized ontologies,and iv) deployment of all multilingual information as-sociated with ontology elements.

Analyzing the state-of-the-art on ontology manage-ment tools3, we observe that the evolution of seman-tic technologies has led to a number of concrete im-plementations to support specific ontology engineeringactivities. Basically, the first two functionalities (abovedescribed) are well supported by ontology manage-ment tools. However, popular tools for ontology devel-opment are limited with respect to how to model mul-tilinguality in ontologies.

In our approach we have implemented the multi-linguality support on NeOn Toolkit, a state-of-the-art,open source multi-platform ontology engineering en-vironment, which provides comprehensive support forthe ontology engineering life-cycle. We have extendedthe architecture of the NeOn Toolkit incorporating twocomponents: the Localization GUI component and theOntology Repository component.

3Readers interested in the state-of-the-art of ontology manage-ment tools can refer to [15].

M. Espinoza et al. / LabelTranslator: A System for Localizing Ontologies in Collaborative Settings 3

Fig. 1. General architecture of the LabelTranslator system.

3.1. Localization GUI component

This component provides additional extension pointsto modify the main components of the NeOn Toolkit,with the aim of controlling the aspects related to the lo-calization activity. For the different localization stake-holders, a whole set of views4 has been developed,which interacts with the different modules of the sys-tem. Along this paper we will show some of viewsused in our tool.

3.2. Ontology Repository component

This component captures the multilingual linguisticinformation associated with ontology elements. Label-Translator supports the Linguistic Information Repos-itory model (LIR) [16] designed for the representationof multilingual information in ontologies. The inclu-sion of the LIR in the system ensures the separationof information that is considered orthogonal in nature(we refer to the ontological and linguistic information).This way of representation is the current trend in theintegration of multilinguality in ontologies [2].

The main advantages of the LIR model rely on itsflexibility by allowing i) the enrichment of any ontol-ogy element with as much linguistic information asneeded by the final application (lexicalizations, defini-tions, sources of provenance, etc.), and ii) the estab-lishment of links among linguistic elements within andacross different natural languages. Within the samelanguage, lexical and terminological variants of the

4In the NeOn Toolkit a view is typically used to navigate a hier-archy of information, open an editor, or display properties for theactive editor.

lexicalization of an ontology element can be accountedfor. Across languages, the LIR also allows to defineequivalence relationships between lexicalizations indifferent languages, as well as account for conceptual-ization mismatches. The rationale underlying the LIRis not to design a lexicon for each language involved inthe localization activity, but to express the knowledgerepresented in the ontology so that it can be understoodand reused by the target culture.

In the Figure 2 we show the Linguistic Informationview implemented in LabelTranslator to allow ontol-ogy users to manage the multilingual information pro-vided by the LIR model. The page shows five sectionsthat correspond to the lexical entries of the selected on-tology element (FAO in our example). For instance, inthis case the concept FAO has three lexical entries, twoin English and one in Spanish.

In our approach, the LIR model is represented as anontology, whose instances represent the lexical knowl-edge. All information managed by the LIR model iscontrolled by a specialized unit of the ontology repos-itory component. This unit provide the following fea-tures:

– It provides a special API to retrieve linguisticknowledge or to update the linguistic model.Also, it acts as a wrapper around any possiblerepresentation of the model.

– It implements both load and save mechanismswhich can serialize the lexical entries associatedwith one ontology.


Fig. 2. Linguistic Information view that support the LIR model.

4. The Collaborative Localization ManagementModule

This section describes in detail the goal and func-tionalities of the Localization Management which isthe key module that allows localization managers tomonitor, manage and control the localization activityin collaborative and distributed settings. The WorkflowLocalization Manager (WLM) is the core componentof this module. The main goals of the CollaborativeLocalization module are:

1. To manage the time flow of the localization ac-tivity from the start to the end,

2. To detect the changes in the Ontology Manage-ment module, and

3. To manage the individual localization tasks per-formed in the Translator module.

4.1. Workflow Localization Manager (WLM)

In order to support the first goal, the WLM includesa collaborative workflow that implements the neces-sary mechanisms to allow ontology stakeholders toperform the activities of the ontology localization ac-tivity. Thus, the collaborative workflow is responsible

for the coordination of who (depending on the userrole) can do what (i.e., what kind of actions) and when(depending on the status of the ontology elements).

From a technical point of view, the collaborativeworkflow is associated with a set of initialization pa-rameters (e.g., user roles, assigned tasks, etc), sourceand target languages, and a partially ordered set of ac-tivities. The WLM stores individually the initializationparameters of each ontology. However, the informa-tion about user, roles and skills is stored in a shareddatabase, which has two benefits:

1. Improved Project Staffing. The Localization Man-agers can see all the information related to a par-ticipant (e.g., language skills). This saves timeand allows for better decisions when staffing anew ontology localization project.

2. Shared Information Across Ontology Projects.The shared information provides a particularbenefit for ontology projects that need to local-ize several ontologies. Maintaining a single userdatabase allows to share users in different ontol-ogy projects.


Fig. 3. User wizard used by the Workflow Localization Manager.

In order to configure the workflow localization pa-rameters, the WLM module extends the NeOn toolkitwith a set of wizards5.

For example the user wizard shown in Figure 3, al-lows managing the profile of each participant of thelocalization activity. The wizard records informationabout the skills of each participant (source and targetlanguages), and it describes the roles, operations andpolicies that apply to a certain ontology. All this infor-mation is used in LabelTranslator to check the userscredentials at login time, and to determine whether auser is allowed to perform a certain operation based onthe policies of the ontology to be localized. In our ap-proach a user can play several roles in the localizationactivity. For example, a given user could play the roleof Translator and Reviewer.

Coming back to the description of the workflow,the activities supported are: selecting the ontology el-ements to be translated, translating the selected ele-ments, reviewing the translations, and updating the on-tology with the linguistic information obtained. Theseactivities summarize the localization tasks commonlyfollowed by different organizations such as FAO.

In the next sections we explain the rest of the asso-ciated components.

5A wizard is a user interface element that presents a user with asequence of dialog boxes that lead the user through a series of well-defined steps. Tasks that are complex, infrequently performed, orunfamiliar may be easier to perform using a wizard.

4.2. Synchronization component

Since our approach follows the current trend in theintegration of multilinguality in ontologies, the Syn-chronization component monitors the changes in theontology model and automatically propagates thosechanges to the linguistic model using synchronizationtechniques.

In order to keep both models, the ontology model(OM) and lexical model (LIR), synchronized, we firstneed to find out exactly what has been changed in theontology model, then find the associated informationin the linguistic model and, only then, start the updat-ing. Basically, the addition of new terms in the ontol-ogy, or deletion of an existing term can be controlledby some mechanism of change. In the NeOn Toolkit,an advanced change tracking based on Resource Delta6

is able to capture changes even when ontological termshave changed their position within the ontology model.By adopting this feature, the synchronization compo-nent can accurately identify the minimal set of changesneeded to adjust the structure of the linguistic model, acritical first step to ensure that a change is made prop-erty in the localized ontology. To correctly update thelinguistic model, the system identifies:

6A resource delta represents changes in the state of a resource treebetween two discrete points in time.


1. any ontology term in the original ontology whoselabel have changed in the updated ontology,

2. any ontology term that has been added to the up-dated ontology,

3. any ontology term which has been removed fromthe original ontology, and

4. any ontology term whose position in the updatedontology differs from that in the original ontol-ogy.

Finding where a translation is required is only partof the problem. We also need to ensure that changes inthe ontology structure are accurately propagated to thelinguistic information of each node in the distributednetwork that maintains a copy of the localized ontol-ogy (translators and reviewers). In our approach, theontology elements whose structure needs to be updatedare clearly flagged in each node. In this sense, the rel-evant structural changes are indicated in a form thatturns updating the translation into a simple process,thus involving minimal work on the part of the transla-tor or reviewer expert.

4.3. Localization Monitoring component

The last goal of the Localization Management mod-ule is supported by the Localization Monitoring com-ponent. This component controls and manages the ac-tivities that the ontology stakeholders are allowed toperform depending on their roles and the status of theontology elements to be localized. The possible tasksin the collaborative workflow (described previously)apply at different levels of abstraction. In our solu-tion we consider two levels: ontology element leveland translation level. Regarding the ontology model,LabelTranslator deals with OWL-DL ontologies. Con-cepts, properties, instances and ontology term com-ments make up the set of ontology elements we aretaking into account for the localization.

The possible states that can be assigned to ontologyelements are:

– In Use: This is the status assigned to any ontol-ogy element when it passes first into the collabo-rative workflow, or when it was localized and thenupdated in the Ontology Repository.

– New: If the ontology element is added to the on-tology once the ontology has been localized, theontology element is passed to the “New"’ status,and remains there until it is localized.

– Changed: If the original label of the ontology el-ement has changed, the element is passed to the

“Changed" status, and remains there until the el-ement is checked again due to localization needs.

– Unused: If the ontology element has been deleted,it is passed to the “Unused" status, and remainsthere until the ontology is synchronized (see syn-chronization component in the previous section).

The localization component controls also the statusof the translations. Figure 4 shows the workflow of thetranslation level. States are denoted by rectangles andactions by arrows. The actors on the figure specify theactions that an ontology stakeholder can perform de-pending on their role. In the following we provide adetailed explanation:

– Not translated: This is the status assigned to anytranslation when it first passes into the collabora-tive workflow or when any change has been per-formed in the element of the ontology under con-sideration.

– To be Translated: Once the Localization Managerselects the translations with not translated status,these translations are passed to the “To be Trans-lated" status, and remain there until a “Translator"translates them.

– Auto translated: If a “Translator" uses the auto-matic translation algorithm provided by the sys-tem, then translations are passed to the “Auto-translated" status, and remain there until the own“Translator" sends them to the “To Be Reviewed(automatic)" status.

– Translated: If a “Translator" discovers manuallya translation, then the translation is passed to the“Translated" status, and remains there until theown “Translator" sends the translation to the “Tobe Reviewed" status.

– To be Reviewed: If a “Reviewer" approves thetranslations send by the translator, they pass tothe “Complete" status. The reviewer knows inadvance if translations have been discovered au-tomatically or manually. For example, the word“automatic" in the message “To Be Reviewed (au-tomatic)" indicates to reviewers that these transla-tions have been obtained automatically. Addition-ally, when the translations reach the “Complete"status, they are automatically updated in the On-tology Repository.

In our implementation, the localization compo-nent performs automatically the actions defined inthe workflow. Thus, this component takes care of en-forcing the constraints imposed by the collaborative


Fig. 4. Workflow to control the status of the translations.

workflow. In detail, whenever a new workflow ac-tion is performed, the component performs the fol-lowing tasks: i) It gets the identity and role of theuser performing the action, ii) It gets the status ofthe ontology element/translation associated to the ac-tion/change, iii) It verifies that the role associatedto the user can perform the requested action whenthe ontology element/translation is in that particu-lar status, iv) if the verification succeeds, it performsthe workflow action (e.g., enabling all correspondingfields in the views); else neither action is performed.

Additionally, LabelTranslator extends some viewsin the Neon toolkit which allow ontology localizationstakeholders: i) to see the appropriate information ofthe translations in the workflow, and ii) to perform theapplicable workflow actions (select, translate, review,etc.), depending on their role.

Figure 5 shows the perspective7 used by Label-Translator in order to support the localization work-flow. The Ontology Navigator in the figure is locatedon the left side of the main view. It contains all on-tologies that need be localized. The localization viewis located on the middle of the main view. This viewis used to add or to update the translations associatedwith the ontology terms that has been selected in theproject tree on the left. Each ontology term is locatedin its own row. The localization view contains severalshortcuts that make work faster, and are enabled ac-cording to user profile. Finally, the filter view is locatedon the right side of the figure. It contains several checkbox that allow user to modify what items are shown inthe localization view.

7A perspective is a visual container for a set of views and contenteditors.

5. The Translator Module

The goal of the Translator Module in LabelTrans-lator is to discover semi-automatically the more ap-propriate translations for each ontology entity. Themain steps given by the translator module are: la-bel pre-processing, label translation and label post-processing; (see Figure 6 which shows the steps pre-sented in Figure 1 in more detail). With the help of thetranslator module, the translators/reviewers can reducethe effort and time spent to localize an ontology man-ually. In the rest of this section we will describe themain steps of the Translator module.

Fig. 6. Details of the Translator Module.


Fig. 5. A perspective of the LabelTranslator system.

5.1. Label Pre-Processing

Ontology label pre-processing is essential in an on-tology localization system, in order to simplify thecore translation processing and make it both qualityand time effective. The ontology labels pose differentchallenges to machine translation, which can be at-tributed to two distinct characteristics:

– Ontology labels differ linguistically and stylisti-cally from written language: phrases are shorterand in some cases poorly structured, also they cancontain ungrammaticality expressions (e.g., Ser-vice_Transport instead of Transport_Service)

– The most used approach for naming the ontologylabels is to use a CamelCase8 approach. There-fore, we cannot rely on the initial uppercase let-

8CamelCase (also spelled camel case, camel-case or medial capi-tals) is the practice of writing compound words or phrases in whichthe elements are joined without spaces, with each element’s initialletter capitalized within the compound, and the first letter is eitherupper or lower case as in “LaBelle", “BackColor", or “iPod". Thename comes from the uppercase “bumps" in the middle of the com-pound word, suggestive of the humps of a camel. The practice isknown by many other names.

ter to identify a phrase initial word or recognizingproper name, since names cannot be identified byan initial capital.

These problematic factors are dealt with a pre-processing pipeline that prepares the input for process-ing by a machine translation module. This approachhas been used in a wider range of applications [21,20]where the task of the pre-processing pipeline is tomake the input amenable to a linguistically-principled,domain independent treatment. In LabelTranslator,this task is accomplished in two ways:

1. By normalizing the input, i.e., removing noise,reducing the input to standard typographical con-ventions, and also restructuring and simplify-ing it, whenever this can be done in a reliable,meaning-preserving way.

2. By annotating the input with linguistic informa-tion, whenever this can be reliably done with ashallow linguistic analysis, to reduce input am-biguity and make a full linguistic analysis moremanageable.

In the following we describe the functionalities ofthe different tasks in more detail:


5.1.1. NormalizationThe label normalization groups two components,

which clean up and tokenize the input. The text-levelnormalization phase performs operations at the stringlevel (ontology term comments by example), such asremoving strange text and punctuation (e.g., brackets,used to mark synonyms or usage context), or removingperiods from abbreviations.

The tokenization phase breaks a ontology labelinto words. The token-level normalization recognizesand annotates tokens belonging to special categories(times, numbers, etc.), recognizes and identifies com-pound words. For annotating the tokens, LabelTrans-lator relies on Treetager system [18]. The process ofidentifying of compound terms uses the Yahoo TermExtraction API.9

5.1.2. TaggingIn the tagging phase a tagger system10 assigns parts

of speech to tokens. Part of speech information is usedby the subsequent pre-processing modules, and alsoin parsing, to prioritize the most likely lexical assign-ments of ambiguous items. Treetagger is used in ourapproach.

5.1.3. Proper name recognitionProper names are ubiquitous in ontology labels, spe-

cially in instance terms. Their recognition is impor-tant for deciding what instances should be translated,avoiding an annoying effect if any instance term is sys-tematically mistranslated (e.g., a sport domain ontol-ogy where the golfer named Tiger Woods is an in-stance systematically referred to as “los bosques deltigre" (in Spanish), lit. “the woods of the tiger").

Name recognition is harder in the ontology domainby the fact that capitalization information is used com-monly for naming all type of ontological terms (con-cepts, properties, and instances), thus making unusableall methods that rely on capitalization as the main wayto identify candidates. Of course, this problem is evenmore complex when no capitalization information isgiven. For instance, an expression like “mark shields",as a possible instance in the ontology, is problematicin the absence of capitalization, as both ‘mark’ and‘shields’ are three-way ambiguous (proper name, com-mon noun, and verb). Our approach does not supportthe proper name recognition for the moment.

9http://developer.yahoo.com/search/content/V1/termExtraction.html10A tagger system is a tool for annotating text with part-of-speech

and lemma information.

5.1.4. SegmentationSegmentation breaks a ontology label into one or

more segments, which are passed separately to subse-quent modules. For our purposes, the translation unitsthat we identify are syntactic units, motivated by cross-linguistic considerations. Each unit is a component thatcan be translated independently. Its translation is inde-pendent to the context in which the unit occurs, and theorder of the units is preserved by translation.

The main motivation for segmenting are the prob-lems found in the conventional online machine trans-lation systems (e.g., GoogleTranslate, BabelFish, etc.).These systems have serious problem in dealing withlong sentences due to the grammar coverage, mem-ory limitation and computational complexity. With-out proper treatment of long phrases, these systems,may fail to produce understandable translations. Al-though in our proposal we did not treat the translationof phrases (as the found in term annotations), we con-sidered this component of utmost importance for fu-ture versions of the system.

In our approach we use a basic segmentation pro-cess to divide the tokens of a compound label. How-ever, for the translation of phrases we devised a seg-mentation component based on machine learning tech-niques [12], syntactic analysis techniques [11], or sup-port vector machines [13].

5.2. Ontology Label Translation

After preparing the ontology element for an ef-fective machine translation processing, the OntologyTranslator invokes the label translation component,which obtains the most probable translation for eachontology entity. This component integrates differentmachine translation approaches, combining the outputby means of different translation combination strate-gies. The output of this component is a ranked set oftranslations for each ontology entity.

We have identified two translation strategies: one forsimple labels and other for compound labels. Thesestrategies are inspired in all empirical studies describedin the literature about multi-engine machine transla-tion architectures which operate by combining outputsfrom different translation engines. In the following wedescribe these translation strategies:

5.2.1. Translating simple labelsThe strategy used for the translation of simple la-

bels is a hybrid composition of two components, seealso figure 7. The first component combines two differ-


ent translation paradigms to discover candidate trans-lations. The second component use an approach basedon ontologies to discover the semantic senses of eachtranslated label. In the figure 7, ovals represent the con-text extracted from lexicon and core ontology, and thetranslation results are represented as concentric circles.

First component - obtaining candidate translationsThe first component takes as input an ontology la-

bel l described in a source language and returns aset of possible translations T = {t1, t2, ..., tn} in atarget language. In order to discover the translationsof each ontology label, each translation method ac-cesses different lexical resources. On the one hand,the terminological-based approach use IATE11. On theother hand, the dictionary-based approach use the mul-tilingual dictionary Wiktionary12. A buffer stores pre-viously translations to avoid accessing the same datatwice.

The algorithm used by the component is summa-rized in the following:

1. If the selected ontology label is already availablein the target language in our buffer, then Label-Translator just displays it, with all the relevantavailable information,

2. If the translation is not stored locally, then eachtranslation method accesses remote repositoriesto retrieve possible translations. A simple disam-biguation process based on term POS tagging isused to avoid a explosion of nuisance candidatetranslations.

3. If no results are obtained from the two previoussteps, then the user can enter his/her own trans-lation (together with the definition).

To combine the output of the different transla-tion methods we use a linear combination. We as-signed a major weight to obtained translations from thedictionary-based approach, because the translationsobtained from these resources in our tests had highquality. In our approach, the translation of an ontol-ogy label denoted by t, is a tuple ⟨trs, senses⟩, wheretrs is translated label in the specific target language,and senses is a list of semantic senses extracted fromdifferent knowledge pools. In the following we brieflydescribe the task of automatically retrieving the possi-ble semantic senses of a translated label (second com-ponent in our translation strategy).

11http://iate.europa.eu/iatediff/SearchByQueryLoad.do?method=load12http://en.wiktionary.org/wiki/

Second component - obtaining semantic sensesIn order to discover the senses of each translated

label (ti), we have considered the ontology-based ap-proach proposed in a previous work [19]. Our systemtakes as input a list of words (each ti), discovers theirsemantics in run-time and obtains a list of senses ex-tracted from different ontology pools: Watson13, whichindexes many ontologies available on the Web, remotelexical resources as EuroWordnet14, and other ontolo-gies not indexed by Watson to find ontological termsthat match those translated labels. We summarize herethe key characteristic of the sense discovering process:

1. To discover the semantic of the input words, thesystem relies on a pool of ontologies instead ofjust a single ontology.

2. The system builds a sense (meaning) with the in-formation retrieved from matching terms in theontology pool.

3. Each sense is represented as a tuple sk = <s,grph, descr>, where s is the list of synonymnames15 of keyword k, grph describes the sensesk by means of the hierarchical graph of hyper-nyms and hyponyms of synonym terms found inone or more ontologies, and descr is a descrip-tion in natural language of such a sense if avail-able.

4. As matching terms could be ontology classes,properties or individuals, three lists of possi-ble senses are associated with each keyword k:Sclassk , Sprop

k , and Sindvk .

5. Each keyword sense is enhanced incrementallywith the synonym senses (which also searchesthe ontology pool).

6. A sense alignment process integrates the key-word sense with those synonym senses represent-ing the same semantics, and discards the syn-onym senses that do not enrich the keywordsense.

A detailed description of this process can be foundin [19]. In order to perform cross-language sense trans-lations, the external resources are limited to those re-sources that have multilingual information like Eu-roWordNet; however other resources can be used too.For example, a specific domain resource for the FAO

13http://watson.kmi.open.ac.uk/WatsonWUI/14www.illc.uva.nl/EuroWordNet/15The system extracts the synonym names of a term by consulting

the synonym relationships defined in the ontology of such a term.


Fig. 7. LabelTranslator strategy for localize concept, attributes and relation terms represented by simple labels.

(Food and Agricultural Organization) is Agrovoc16,which could cover the vocabulary missed in Eu-roWordNet.

Once identified the semantic senses, the ontology-based method uses a ranking method to sort the listof translations according to similarity with the struc-tural context of the label to be translated. The rank-ing method relies on the disambiguation algorithm de-scribed in [19]. Once all the translations are ranked,the method allows two operation modes:

– Semi-automatic mode: It shows a list with allthe possible translations sorted decreasingly. Themethod proposes the most relevant translation tobe selected first although the user can change thisdefault selection.

– Automatic mode: It automatically selects thetranslation with the highest score.

5.2.2. Translating compound labelsCompound labels which have an entry in linguistic

resources such as lexical databases, dictionaries, etc.(for example, “jet lag", “travel agent", and “bed andbreakfast") are treated as single words in our approach.Others like “railroad transportation", which have noentry in the previous resources, are translated using acompositional method (see figure 8).

LabelTranslator uses a hybrid composition of twotranslation components for translating compound la-bels. In the following we describe both components.

First component - obtaining candidate translationsThe first component is similar to the first compo-

nent used for translating simple labels. However, inthis case we have incorporated a method based on on-line machine translation systems in the parallel com-bination. The fundament behind of this approach isthe vocabulary limitation (specially for compound la-

16http://www.fao.org/aims/ag_download.htm

bels) of both terminology and dictionary translationapproaches. The online translation approach uses dif-ferent multilingual systems such as GoogleTranslate17,Babelfish18, and FreeTranslation19.

First, the component splits the label into tokens(“railroad" and “transportation" in the example); theindividual components are translated and then com-bined into a compound label in the target language.Care is taken to combine the components respectingthe word order of the target language (see second com-ponent below).

Second component - compositional methodThe second component relies on a similar approach

to the example-based machine translation techniques(EBMT) [17]. The main idea behind EBMT is that agiven input phrase in the source language is comparedwith the example translations in the given bilingualparallel text to find the closet matching examples canbe used in the translation of that input phrase.

One of the main approaches in the EBMT paradigmis to use pattern matching techniques. First, these ap-proaches collect word sequences from each corpus byfirst using translation patterns to acquire candidatesfor bilingual expressions. Second, a search for pairsof words that satisfy the correspondences of the se-quences is performed.

Two are the main differences in our approach. First,instead of extracting the bilingual translation templatesfrom a simple monolingual corpus or from a paral-lel corpora, we derived these templates from differ-ent ontologies. The second difference concerns to theused method to discover the translations. We do not ex-tract the translations from a corpus, but from differentlinguistic resources. Thus, the compositional method

17http://www.google.com/translate_t18http://babelfish.altavista.com/19http://ets.freetranslation.com


Fig. 8. LabelTranslator strategy for localize concept, attributes, and relations represented by compound labels.

first searches for translation candidates of a given com-posed label and then builds the translations for the can-didates using lexical templates.

The lexical templates derived from different ontolo-gies are used to control the order of translation. Themain steps of the algorithm are:

1. The compound label is normalized, e.g., rewrit-ing in lowercase, hyphens are removed, it is splitinto tokens, etc (see segmentation task in the fig-ure 8).

2. A set of possible translations is obtained foreach token of the compound label using the dif-ferent translation paradigms (first component inour translation strategy). Note that no process ofcombination is executed among different trans-lations obtained from each method. The methoduses all possible combinations of translation ob-tained for each token.

3. Since translations between languages do notkeep the same word order, the algorithm cre-ates candidate translations in the target languageusing lexical templates20. Each lexical templatecontains at least a pair of patterns, namely‘source’ and ‘target’ patterns. A source pattern isa template to be compared with the tagged com-pound label21, described in the source language,while the target pattern is used to generate thelabel in the target language. If no applicable tem-plate is found, the compound label is translatedusing the translation service directly.

20The notion of lexical template proposed in this paper refers totext correlations found between a pair of languages.

21We use TreeTagger in order to annotate the compound labelswith part-of-speech and lemma information.

4. All the candidate labels that fulfill the target pat-tern are returned as candidate translations of thecompound label.

In our approach, we used a semi-automatic processto obtain the lexical templates. As we explained be-fore, each lexical template is composed of source andtarget patterns. The ontology labels used to learn thesource patterns were extracted from different domainontologies expressed in English, German, or Span-ish. Each label was tokenized, and tagged using thelanguage independent part-of-speech tagger proposedin [18]. On the other hand, the labels used to learn thetarget patterns were extracted from the multilingual in-formation associated to each ontological term or bymeans of a manual translation process. The same pro-cess used to annotate part of speech (POS) in the labelsof the source patterns was used to annotate the labelsof the target patterns. A sample list of lexical templateslearned to translate composed labels among English,Spanish and German can be consulted in [5].

5.3. Label Post-Processing

This component show the translations to the user forreview the quality of them. The quality of the transla-tions is measured by means two factors adequacy andfluency. The first factor is used to determine the quan-tity in that the meaning of a correct translation is pre-served. While that the fluency factor is used to deter-mine how good the corresponding translation in thetarget language is.

The checking of the quality of a translation is theonly task of the ontology localization activity in whichthe user interacts necessarily. In the next versions ofour system we will try automatize this component.


6. Experimental Evaluation

In this section we describe a set of experiments thatwere conducted with the objective of evaluating Label-Translator system. In section 6.1 we describe the ex-periments used to evaluate some aspects related to thetranslation ranking techniques, where the task is to se-lect the most appropriate translation for ontology la-bels. Section 6.2 deals with the study used to assessthe usability of the LabelTranslator system for carry-ing out the ontology localization activity.

6.1. Performance Evaluation

To evaluate the quality of translation of the Label-Translator system we conducted in March 2008 a pre-liminary experiment [5] involving ten PhD students.The main goal of the experiment was to evaluate thetranslation ranking techniques used by the system toselect the most appropriate translation for each ontol-ogy label. This was done by comparing the translationsprovided by an expert (gold standard) with the trans-lations provided by the ranking algorithm used in La-belTranslator. The ontology corpus used for the eval-uation was selected from the set of KnowledgeWeb22

ontologies used to manage EU projects. The experi-mental results showed that our system suggested thecorrect translation 72% of the times. Also, the valuesof recall obtained suggested that a high percentage ofcorrect translations were part of the final translationsshown to the user. One of the main limitations wasthe low quality of the translations of compound labels.We implemented some improvements to the algorithmas i) a recursive function that attempts to match thebi/tri-tokens of a compound label with the lexical tem-plates23 stored in the database, or ii) a method thatlearns new lexical templates from the translations sup-plied by the user.

To test the improvements implemented, a new ex-periment was performed in the “Artificial Intelligence(AI)" Master course at the Facultad de Informática(Universidad Politécnica de Madrid) with 17 Masterstudents. We decided to use a questionnaire that al-lowed collecting the assessments of the students aboutthe capacity of the translation algorithm in provid-ing correct translations according to the context. Asa general conclusion of the results obtained, we can

22http://knowledgeweb.semanticweb.org/23The notion of lexical template refers to text correlations found

between a pair of languages.

mention that 33% of the students identified the levelof correctness of the translations greater than 80%.The rest of students believed that the translations ob-tained had a level of correctness greater than 90%.Basically, the main strength provided by this experi-ment was the improvement in the quality of transla-tion of compound labels. However, some weaknesseswere detected: i) the misuse or omission of the defi-nite article; ii) the incorrect translation of acronyms;and iii) the erroneous identification of the part ofspeech (POS) of single words or compound labels. Thefinal aspect is very important, because we use POS tag-ging as a first mechanism of disambiguation to dis-card candidate translations. We are now working tosolve these problems. More details about this experi-ment in [4].

6.2. Usability Evaluation

To asses the usability of the LabelTranslator systemwe conducted an experiment following the SoftwareUsability Measurement Inventory (SUMI) method [14].The SUMI questionnaire includes 50 items for whichthe user selects one of three responses (“agree", “don’tknow", “disagree"). The questionnaire is designed tomeasure the affect, efficiency, learnability, helpfulness,and control of a software product [10]. The experimentinvolved 10 participants, most of whom were PhD stu-dents with a good command on ontology engineer-ing. The experimenters met with all participants for 10minutes to explain the purpose of the evaluation ses-sion and present the methodology of SUMI evaluation.Then, the participants had 20 minutes to translate anontology using the guidelines and LabelTranslator, and10 minutes to fill in the SUMI questionnaire for user-interaction satisfaction. During these two phases of theexperiment users were not allowed to ask questions tothe evaluators.

The most important findings of the experiment arerelated to the high level of learnability shown by La-belTranslator, especially in the case of a novice user.There was only one evidence about the need of makingminor modifications in the LabelTranslator user inter-face to improve affect and efficiency with better nav-igation and informative functions. Additional detailsabout user perception with respect to the goals of eachSUMI dimension can be found in [4].


7. Conclusion

In this paper, we have presented LabelTranslator,our approach to semi-automatically localize ontologiesamong English, Spanish, and German. We first haveexplained the generic architecture of our approach tosupport an automated ontology localization in collabo-rative and distributed environments. In order to definethe infrastructure requirements we took as base the lo-calization requirements of international institutions asFAO and then we compared them with our own obser-vations in the field.

We have included the description of technical de-tails of the main components in the architecture. So,we have discussed the different aspects concerning tothe implementation of the Ontology Repository com-ponent included in the Ontology Management module.A description of the capabilities of the implementedinterfaces has also included. Concerning to the Local-ization Management module we have described theirmain functionalities, that are: synchronize the changesof the ontology and linguistic model and implementthe actions described in the collaborative workflow.Hence, we propose two generic workflows specializedat different levels of abstraction (i.e., ontology elementlevel and translation level). Our proposal includes thedefinition of the role of the workflow in the infrastruc-ture required for the management of the multilingualcontent for localization, and describes its relationshipswith other modules and activities (i.e., synchroniza-tion, label translations, etc.).

We have finished the description of technical detailsof the main components in the architecture by com-menting some implementations details about OntologyTranslator module. We have described here the transla-tion strategies used for localize simple and compoundlabels.

Although we have already evaluated, the technolog-ical aspects of the individual components of our ap-proach [8], the next step is the complete evaluation ofour collaborative approach within FAO and other col-laborative and distributed scenarios.

As future work, we point out current needs that arenot addressed in this work and that will have to be ad-dressed in future versions of LabelTranslator system:

– Evaluation. It is necessary to design extensiveevaluation mechanism of ontology localizationsystems. Beside evaluating systems, it is neces-sary to be able to help users in choosing the ap-propriate translation technique or to combine the

most appropriate techniques for their tasks. Wehave tried in this work to identify some local-ization strategies, but a lot remains to be inves-tigated, for example for the localization of in-stances and ontology term translations.

– Translation techniques. The machine translationfield identifies a wealth of basic techniques whichcan be used to discover the translations of the on-tology elements. In this work we have used onlysome of these techniques to discover the transla-tions. However, further investigation is necessaryin order to incorporate for example corpus-basedand web-based techniques in the localization ac-tivity.

– Translation strategies. In this work we have in-corporated two basic translation strategies for dis-covering the translations of simple and compoundlabels. One of the important issues to deal with isthe proper combination and integration of variouscategories of translators. In particular, the inte-gration of corpus-based (statistical) and ontology-based (semantic) techniques is of high interest.

Acknowledgements

This work was funded by the Neon Project spon-sored under EC grant number (FP6-248458) and itis actually supported by the Monnet Project (FP6-248458) and the CICYT project TIN2007-68091-C02-02.

References

[1] U.N.’s FAO Aims At Agricultural Services Based OnSemantics-Across Multiple Languages. In the Seman-tic Web online journal, March 2010. (Available at:http://www.semanticweb.com/news/unas_fao_aims_at_agricultural_services_based_on_semantics_a_across_multiple_languages_156742.asp).

[2] P. Buitelaar, P. Cimiano, P. Haase, and M. Sintek. Towardslinguistically grounded ontologies. In ESWC’09, Heraklion,Greece., 2009.

[3] P. Cimiano, E. Montiel-Ponsoda, P. Buitelaar, M. Espinoza,and A. Gómez-Pérez. A Note on Ontology Localization. InJournal of Applied Ontology (in press), 2010.

[4] Deliverable 5.6.2. (2009). In http://www.neon-project.org/.[5] M. Espinoza, A. Gómez-Pérez, and E. Mena. Enriching an on-

tology with multilingual information. In Proc. of 5th EuropeanSemantic Web Conference (ESWC’08), Tenerife, (Spain), June2008.


[6] M. Espinoza, A. Gómez-Pérez, and E. Mena. Labeltranslator -automatically localizing an ontology. In Proc. of 5th EuropeanSemantic Web Conference (ESWC’08), Tenerife, (Spain), June2008.

[7] M. Espinoza, A. Gómez-Pérez, and E. Montiel-Ponsoda. Mul-tilingual and localization support for ontologies. In Proc. of6th European Semantic Web Conference (ESWC’09), Herak-lion, (Greece), June 2009.

[8] M. Espinoza, E. Montiel-Ponsoda, and A. Gómez-Pérez. On-tology localization. In Proc. of 5th international conferenceon Knowledge capture (KCAP-09), Redondo Beach, California(USA), ACM, ISBN 978-1-60558-658-8, pp. 33-40,, 2009.

[9] P. Hasse, H. Lewen, R. Studer, and M. Erdmann. The neonontology engineering toolkit. In WWW2008, Beijing, China,2008.

[10] J. Dumas and J. Redish. A Practical Guide to Usability Testing.Exeter, UK: Intellect, 1993.

[11] S. Kim and Y. Kim. Sentence segmentation for efficient en-glish syntactic analysis. Journal of Korea Information ScienceSociety. v24 i8., 1997.

[12] S.-D. Kim, B.-T. Zhang, and Y. T. Kim. Learning-based in-trasentence segmentation for efficient translation of long sen-tences. Machine Translation, 16(3):151–174, 2001.

[13] Y.-S. Kim and Y.-J. Oh. Intra-sentence segmentation based onsupport vector machines in english-korean machine translationsystems. Expert Syst. Appl., 34(4):2673–2682, 2008.

[14] J. Kirakowski and M. Corbett. Sumi: The software usabilitymeasurement inventory. British Journal of Educational Tech-nology, 1993.

[15] H. Martin, P. De Leenheer, A. de Moor, and Y. Sure. Ontol-ogy Management, Semantic Web, Semantic Web Services, andBusiness Applications. Semantic Web and Beyond. Springer-Verlag, Heidelberg, 2008.

[16] E. Montiel-Ponsoda, G. Aguado, A. Gómez-Pérez, and W. Pe-ters. Modelling multilinguality in ontologies. In Coling 2008:Companion volume - Posters and Demonstrations, Manchester,UK, 2008.

[17] M. Nagao. A framework of a mechanical translation betweenjapanese and english by analogy principle. In Proc. of the in-ternational NATO symposium on Artificial and human intelli-gence, pages 173–180, New York, NY, USA, 1984. ElsevierNorth-Holland, Inc.

[18] TreeTagger, 1997. http://www.ims.uni-stuttgart.de/projekte/corplex/.

[19] R. Trillo, J. Gracia, M. Espinoza, and E. Mena. Discoveringthe semantics of user keywords. Journal on Universal Com-puter Science. Special Issue: Ontologies and their Applica-tions, 2007.

[20] D. Turcato, F. Popowich, P. McFetridge, D. Nicholson, andJ. Toole. Pre-processing closed captions for machine transla-tion. In NAACL-ANLP 2000 Workshop on Embedded machinetranslation systems, pages 38–45. Association for Computa-tional Linguistics, 2000.

[21] S. Yuh, K. Lee, and J. Seo. Multilingual closed caption trans-lation system for digital television. IEICE - Trans. Inf. Syst.,E89-D(6):1885–1892, 2006.

Documents

IOS Press LabelTranslator: A System for Localizing ... · Undeﬁned 1 (2009) 1–5 1 IOS Press LabelTranslator: A System for Localizing Ontologies in Collaborative Settings Mauricio