Localisation Focus Vol.10 Issue 1, The International Journal of Localisation

LocConnect: Orchestrating Interoperability in a Service-oriented Localisation Architecture

Asanka Wasala, Ian O'Keeffe and Reinhard Schäler
Centre for Next Generation Localisation, Localisation Research Centre, CSIS Dept., University of Limerick, Limerick, Ireland.
{Asanka.Wasala, Ian.OKeeffe, Reinhard.Schaler}@ul.ie

Abstract
Interoperability is the key to the seamless integration of different entities. However, while it is one of the most challenging problems in localisation, interoperability has not been discussed widely in the relevant literature. This paper describes the design and implementation of a novel environment for the inter-connectivity of distributed localisation components, both open source and proprietary. The proposed solution promotes interoperability through the adoption of a Service Oriented Architecture (SOA) framework based on established localisation standards. We describe a generic use scenario and the architecture of the environment that allows us to study interoperability issues in localisation processes. This environment was successfully demonstrated at the CNGL Public Showcase in Microsoft, Ireland, November 2010.

Keywords: Interoperability, Localisation, SOA, XLIFF, Open Standards

1. Introduction

The term localisation has been defined as the "linguistic and cultural adaptation of digital content to the requirements and locale of a foreign market, and the provision of services and technologies for the management of multilingualism across the digital global information flow" (Schäler 2009). As the definition suggests, localisation is a complex process. It involves many steps: project management, translation, review, quality assurance and so on. It also requires considerable effort, as it involves many languages and the characteristics and challenges unique to each of them, such as the handling of right-to-left scripts, collation, and locale-specific issues. Time-frame is another parameter that affects the complexity of the localisation process: localisation processes must deal with frequent software updates, short software development life cycles and the simultaneous shipment of source and target language versions (simship). A broad spectrum of software is required to handle the process, ranging from project management software to translation software, and a large number of file formats, both open standard and proprietary, are encountered along the way. Localisation processes involve different types of organisations (e.g. translation and localisation service providers) and different professions (e.g. translators, reviewers, and linguists). Localisation constantly has to deal with new challenges, such as those arising in the context of mobile device content or integration with content management systems. In this extremely complex process, the ultimate goal is to maximise quality (translations, user interfaces etc.) and quantity (number of locales, simships etc.) while minimising time and overall cost.

Interoperability is the key to the seamless integration of different technologies and components across the localisation process. The term interoperability has been defined in a number of different ways in the literature. For example, Lewis et al. (2008) define interoperability as: "The ability of a collection of communicating entities to (a) share specified information and (b) operate on that information according to an agreed operational semantics".

The most frequently used definition of the term is the IEEE's: "Interoperability is the ability of two or more systems or components to exchange information and to use the information that has been exchanged." (IEEE, 1991).

However, while interoperability presents one of the most challenging problems in localisation, it has received little attention in the literature. We aim to address this deficit by presenting a novel approach to interoperability across localisation tools through the adoption of a Service Oriented Architecture (SOA) framework based on established localisation standards. We describe a generic use scenario and the architecture of the approach, offering an environment for the study of interoperability issues in localisation process management. To our knowledge, this is the first demonstrator prototype based on SOA and open localisation standards developed as a test bed for exploring interoperability issues in localisation.

The remainder of the paper is organised as follows: Section 2 provides an overview of interoperability in general, and in localisation in particular, in the context of open localisation standards; Section 3 explains the experimental setup, introduces the LocConnect framework, and presents the localisation component interoperability environment developed as part of this research; Section 4 presents the architecture of LocConnect in detail; and Section 5 discusses future work. The paper concludes with a summary of the present work and the contributions made by this study.

2. Background

Software applications are increasingly moving towards a distributed model, and standards are vital for the interoperability of these distributed applications. However, one of the major problems preventing successful interoperability between, and integration of, distributed applications and processes is the lack of (standardised) interfaces between them.

In order to address these issues, workflow interoperability standards have been proposed (Hayes et al 2000) to promote greater efficiency and to reduce cost. The Wf-XML message set defined by the Workflow Management Coalition (WfMC) and the Simple Workflow Access Protocol (SWAP) are examples of such internet-scale workflow standards (Hayes et al 2000). Most of these standards only define the data and metadata structure, while standards such as the Hyper-Text Transfer Protocol (HTTP), the Common Object Request Broker Architecture (CORBA), and the Internet Inter-ORB Protocol (IIOP) focus on the transportation of data structures (Hayes et al 2000).

From a purely functional standpoint, we also have the Web Service Description Language (WSDL), the most recent version being WSDL 2.0 (W3C 2007). WSDL is an XML-based language that defines services as a collection of network endpoints or ports. It is regarded as a simple interface definition language (Bichler and Lin 2006) which specifies neither message sequence nor its constraints on parameters (Halle et al 2010). However, while it does describe the public interface to a web service, it possesses limited descriptive ability and covers only the functional requirements in a machine-readable format. This becomes an issue when defining a non-static workflow, as the interface does not provide enough information to allow a broker to make a value judgement in terms of other qualities that are of considerable interest in the localisation process, such as the quality, quantity, time and cost aspects discussed earlier. These service attributes are much more difficult to define, as they cover the non-functional aspects of a service, e.g. how well it is performed. This contrasts with the more Boolean functional requirements (either a service complies with the support requirements, or it does not). WSDL therefore does not provide sufficient coverage to support our requirements for interoperability.

There are some notable examples of localisation and translation-centric web services, such as those currently offered by Google, Bing and Yahoo!. However, even here we run into interoperability issues, as the interfaces provided do not follow any specific standard, and connecting to these services is still very much a manual process: it requires the intervention of a skilled computer programmer to set up the call to the service, to validate the data sent in terms of string length, language pair, and so on, and then to handle the data that is returned. Some localisation Translation Management Systems (TMS) purport to provide such flexibility, but they tend to be monolithic in their approach, using pre-defined workflows and requiring dedicated developers to incorporate services from other vendors into these workflows through the development of bespoke APIs. What is needed is a unified approach to integrating components, so that any service can be called in any order in an automated manner.

2.1 The XLIFF Standard

The XML-based Localization Interchange File Format (XLIFF) is an open standard for exchanging localisation data and metadata. It has been developed to address various issues related to the exchange of localisation data.


The XLIFF standard was first developed in 2001 by a technical committee formed by representatives of a group of companies, including Oracle, Novell, IBM/Lotus, Sun, Alchemy Software, Berlitz, Moravia-IT, and ENLASO Corporation (formerly the RWS Group). In 2002, the XLIFF specification was formally published by the Organization for the Advancement of Structured Information Standards (OASIS) (XLIFF-TC 2008).

The purpose of XLIFF as described by OASIS is to "store localizable data and carry it from one step of the localization process to the other, while allowing interoperability between tools" (XLIFF-TC 2008). By using this standard, localisation data can be exchanged between different companies, organisations, individuals or tools. Various file formats such as plain text, MS Word, DocBook, HTML, XML etc. can be transformed into XLIFF, enabling translators to isolate the text to be translated from the layout and formatting of the original file format.

The XLIFF standard aims to (Corrigan & Foster 2003):

• Separate translatable text from layout and formatting data;

• Enable multiple tools to work on source strings;

• Store metadata that is helpful in the translation/localisation process.
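The separation named in the first aim above can be illustrated with a minimal sketch: the translatable text lives in <source>/<target> elements, while the original file's identity and format are carried as attributes. The snippet below (an illustration, not code from the paper) builds such a document with Python's standard library.

```python
import xml.etree.ElementTree as ET

# A minimal sketch of an XLIFF 1.2 document: translatable text is held
# separately from the original file's layout and formatting.
NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", NS)

xliff = ET.Element(f"{{{NS}}}xliff", {"version": "1.2"})
file_el = ET.SubElement(xliff, f"{{{NS}}}file", {
    "original": "hello.txt",          # the source file's format stays here
    "source-language": "en",
    "target-language": "fr",
    "datatype": "plaintext",
})
body = ET.SubElement(file_el, f"{{{NS}}}body")
unit = ET.SubElement(body, f"{{{NS}}}trans-unit", {"id": "hi"})
ET.SubElement(unit, f"{{{NS}}}source").text = "Hello world"       # text to translate
ET.SubElement(unit, f"{{{NS}}}target").text = "Bonjour le monde"  # translation

print(ET.tostring(xliff, encoding="unicode"))
```

Because the translatable strings are isolated in this way, any XLIFF-aware tool can work on them without touching the original layout.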

The XLIFF standard is becoming the de facto standard for exchanging localisation data. It is accepted by almost all localisation service providers and is supported by the majority of localisation tools and CAT tools. The XLIFF standard is being continuously developed further by the OASIS XLIFF Technical Committee (2010).

2.2 Localisation Standards and Interoperability Issues

Although the adoption of localisation standards would very likely provide benefits relating to reusability, accessibility, interoperability, and reduced cost, software publishers often refrain from the full implementation of a standard or do not carry out rigorous standard conformance testing. There is still a perceived lack of evidence for improved outcomes, and an associated fear of the high costs of standard implementation and maintenance. One of the biggest problems with regard to tools and technologies today is pair-wise product drift (Kindrick et al 1996), i.e. the need for the output of one tool to be transformed in order to compensate for another tool's non-conforming behaviour. This trait is present within the localisation software industry. Although the successful integration of different software brings enormous benefits, it is still a very arduous task.

Most current CAT tools, while accepting and delivering a range of file formats, maintain their own proprietary data formats within the boundary of the application. This makes sharing data between tools from different software developers very difficult, as conversion between formats often leads to data loss.

XLIFF, as mentioned above, intends to provide a solution to these problems, but true interoperability can only be achieved once the XLIFF standard is implemented in full by the majority of localisation tools providers. Currently, XLIFF compliance seems to be regarded as an addition to the function list of many localisation applications, rather than being used to the full extent of its abilities, and indeed many CAT tools seem to pay mere lip service to the XLIFF specification (Anastasiou and Morado-Vazquez 2010; Bly 2010), outputting just a minor subset of the data contained in their proprietary formats as XLIFF to ensure conformance.

3. Experimental Setup

With advancements in technology, the localisation process of the future can be driven by the successful integration of distributed heterogeneous software components. In this scenario, the components are dynamically integrated and orchestrated depending on the available resources to provide the best possible solution for a given localisation project. However, such an ideal component-based interoperability scenario in localisation is still far from reality. Therefore, in this research, we aim to model this ideal scenario by implementing a series of prototypes. As the initial step, an experimental setup has been designed containing the essential components.

The experimental setup includes multiple interacting components. Firstly, a user creates a localisation project by submitting a source file and supplying some parameters through a user interface component. Next, the data captured by this component is sent to a Workflow Recommender component, which implements the appropriate business process. By analysing the source file content and resource files as well as the parameters provided by the user, the Workflow Recommender offers an optimum workflow for the particular localisation project. Then, a Mapper component analyses this workflow and picks the most suitable components to carry out the tasks specified in it. These components can be web services such as Machine Translation systems, Translation Memory systems, Post-Editing systems etc. The Mapper establishes links with the selected components, and a data container is then circulated among the different components according to the workflow established earlier. As this data container moves through the components, they modify the data. At the end of the project's life cycle, a Converter component transforms the data container into a translated or localised file, which is returned to the user.

Service Oriented Architecture is a key technology that has been widely adopted for integrating such highly dynamic distributed components. Our research revealed that the incorporation of an orchestration engine is essential to realise a successful SOA-based solution for coordinating localisation components. Furthermore, the necessity of a common data layer to enable communication between components became evident. Thus, in order to manage processes as well as data, we incorporated an orchestration engine into the aforementioned experimental setup. This experimental setup, along with the orchestration engine, provides an ideal framework for the investigation of interoperability issues among localisation components.

3.1 LocConnect

At the core of the experimental setup are the orchestration engine and the common data layer, which jointly provide the basis for the exploration of interoperability issues among components. This prototype environment is called LocConnect. The following sections introduce the features of LocConnect and describe its architecture.

3.1.1 Features of LocConnect

LocConnect interconnects localisation components by providing access to an XLIFF-based data layer through an Application Programming Interface (API). This common data layer allows XLIFF-based data to travel between the different localisation components. Key features of the LocConnect testing environment are summarised below.

• Common Data Layer and Application Programming Interface

LocConnect implements a common XLIFF-based datastore (see section 4.5) corresponding to individual localisation projects. Components can access this datastore through a simple API. Furthermore, the common datastore can also hold various supplementary resource files related to a localisation project (see section 4.4). Components can manipulate these resource files through the API.

• Workflow Engine

The orchestration of components is achieved via an integrated workflow engine that executes a localisation workflow generated by another component.

• Live User Interface (UI)

One of the important aspects of a distributed processing scenario is the ability to track progress across the different components. An AJAX-powered UI has been developed to display the status of the components in real time. LocConnect's UI has been developed in a manner that allows it to be easily localised into other languages.

• Built-in post-editing component (XLIFF editor)

In the present architecture, localisation project creation and completion happen within LocConnect. Therefore, an online XLIFF editor was developed and incorporated into LocConnect in order to facilitate the post-editing of content.

• Component Simulator

In the current experimental setup, only a small number of components have been connected up, most of them developed as part of the CNGL research at the University of Limerick and other participating research groups. The Workflow Recommender, Mapper, Leveraging component and a Translation Rating component are among these components. A component simulator was therefore developed to allow for further testing of interoperability issues in an automated localisation workflow using the LocConnect framework.

A single-click installer and an administrator configuration panel for LocConnect were developed as part of this work to allow for easy installation and user-friendly administration.

3.1.2 Business Case

Cloud-based storage and applications are becoming increasingly popular. While the LocConnect environment supports the ad hoc connection of localisation components, it can also serve as cloud-based storage for localisation projects. These and other key advantages of LocConnect from a business point of view are highlighted below.

• Cloud-based XLIFF and resource file storage

LocConnect can simply be used as cloud-based XLIFF storage. Moreover, due to its ability to store resource files (e.g. TMX, SRX etc.), it can be used as a repository for localisation project files. As such, LocConnect offers a central localisation data repository which is easy to back up and maintain.

• Concurrent Versioning System (CVS)

During a project's life cycle, the associated XLIFF data container continuously changes as it travels through the different localisation components. LocConnect keeps track of these changes and stores the different versions of the XLIFF data container, thus acting as a version control system for localisation projects. LocConnect provides the facility to view both the data and the metadata associated with the data container at different stages of a workflow.
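The versioning behaviour described above can be sketched in a few lines: each time a component returns the data container, the previous snapshot is kept rather than overwritten. This is an illustrative sketch only, not LocConnect's actual implementation.

```python
# Illustrative sketch of per-job version tracking: every returned XLIFF
# snapshot is appended to the job's history instead of replacing it.
class VersionStore:
    def __init__(self):
        self._versions = {}                      # job_id -> list of snapshots

    def save(self, job_id, xliff_content):
        self._versions.setdefault(job_id, []).append(xliff_content)

    def latest(self, job_id):
        return self._versions[job_id][-1]        # current state of the container

    def history(self, job_id):
        return list(self._versions[job_id])      # all stages of the workflow

store = VersionStore()
store.save("16674f2698", "<xliff>after Workflow Recommender</xliff>")
store.save("16674f2698", "<xliff>after Mapper</xliff>")
print(len(store.history("16674f2698")))  # 2
```

Keeping every snapshot is what makes it possible to inspect the data and metadata at each stage of the workflow.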

• In-built online XLIFF editor

Using the in-built online XLIFF editor, users can easily edit XLIFF content. The AJAX-based UI allows easy inline editing of content. Furthermore, the online editor shows alternative translations as well as useful metadata associated with each translation unit.

• Access via internet or intranet

With its single-click installer, LocConnect can easily be deployed via the internet or an intranet. LocConnect can also act as a gateway application, where it is connected to the internet while the components safely reside within an intranet.

• Enhanced revenues

The LocConnect-centric architecture increases data-exchange efficiency as well as automation. Due to the increased automation, we would expect lower localisation costs and increased productivity.

3.2 Description of Operation (Use Case)

The following scenario provides a typical use case for LocConnect in the above experimental setup.

A project manager logs into the LocConnect server and creates a LocConnect project (a.k.a. a job) by entering some parameters. The project manager then uploads a source file. The LocConnect server generates an XLIFF file and assigns a unique ID to the job. Next, it stores the parameters captured through its interface in the XLIFF file and embeds the uploaded file in the same XLIFF file as an internal file reference. The Workflow Recommender then picks up the job from LocConnect (see the procedure described in section 4.2.1), retrieves the corresponding XLIFF file and analyses it. The Workflow Recommender generates an optimum workflow for processing the XLIFF file; the workflow describes the other components that the XLIFF file has to go through and the sequence of those components. The Workflow Recommender embeds this workflow information in the XLIFF file. Once the workflow information is attached, the file is returned to the LocConnect server. When LocConnect receives the file from the Workflow Recommender, it decodes the workflow information found in the XLIFF file and initiates the rest of the activities in the workflow. Usually, the next activity is to send the XLIFF file to a Mapper component, which is responsible for selecting the best web services, components etc. for processing the XLIFF file. LocConnect establishes communication with the other specified components according to the workflow and the component descriptions. As such, the workflow is enacted by the LocConnect workflow engine. Once the XLIFF file is fully processed, its content can be edited online using LocConnect's built-in editing component. During the project's life cycle, the project manager can check the status of the components using LocConnect's live project-tracking interface. Finally, the project manager can download the processed XLIFF and the localised files.
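The flow above — a recommender attaches a workflow to the container, and the engine then passes the container through each component in sequence — can be sketched as follows. The component behaviour is faked with plain functions; all names and data shapes are illustrative, not LocConnect's API.

```python
# A highly simplified sketch of the use case: a recommender attaches a
# workflow, then the engine circulates the data container through the
# named components in order, each one modifying it.
def workflow_recommender(container):
    container["workflow"] = ["mapper", "editor"]   # attach the workflow
    return container

def mapper(container):
    container["history"].append("mapper")          # pretend to map services
    return container

def editor(container):
    container["history"].append("editor")          # pretend to post-edit
    return container

COMPONENTS = {"mapper": mapper, "editor": editor}

def run_job(source_text):
    container = {"id": "16674f2698", "source": source_text, "history": []}
    container = workflow_recommender(container)    # step 1: get a workflow
    for name in container["workflow"]:             # step 2: enact it in order
        container = COMPONENTS[name](container)
    return container

result = run_job("Hello world")
print(result["history"])  # ['mapper', 'editor']
```

The key point is that the engine itself is generic: the sequence of components is data carried inside the container, not logic hard-coded in the engine.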

4. Architecture

This section describes the LocConnect architecture in detail.

LocConnect is a web-based, client-server system. The design is based on a three-tier architecture, as depicted in figure 1. The implementation of the system is based on PHP and AJAX technologies.

Figure 1. Three-tier architecture of LocConnect

User interface tier - a client-based graphical user interface that runs in a standard web browser. The user interface provides facilities for project management, administration and tracking.

Middle tier - contains most of the logic and facilitates communication between the tiers. The middle tier mainly consists of a workflow engine and provides an open API with a common set of rules that define the connectivity of components and their input/output (I/O) operations. The components simply deal with this interface in the middle tier.

Data storage tier - uses a relational database for the storage and searching of XLIFF and other resource data. The same database is used to store information about individual projects.
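The paper states that a relational database holds the XLIFF/resource data and the project information, but does not publish a schema. A hypothetical minimal schema, invented here purely for illustration, might look like this:

```python
import sqlite3

# Hypothetical sketch of the data-storage tier (the actual LocConnect
# schema is not given in the paper): one table for project records and
# one for versioned XLIFF/resource content.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projects (
    job_id  TEXT PRIMARY KEY,   -- 10-character alphanumeric job ID
    status  TEXT NOT NULL       -- pending / processing / complete
);
CREATE TABLE documents (
    job_id  TEXT REFERENCES projects(job_id),
    version INTEGER,            -- successive stages of the workflow
    content TEXT                -- XLIFF or resource file body
);
""")
conn.execute("INSERT INTO projects VALUES ('16674f2698', 'pending')")
conn.execute("INSERT INTO documents VALUES ('16674f2698', 1, '<xliff/>')")
row = conn.execute("SELECT status FROM projects").fetchone()
print(row[0])  # pending
```

Separating project records from versioned document bodies is one straightforward way to support both the status tracking and the version history described earlier.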

Each tier is described in more detail below.

4.1 User Interface

Web-based graphical user interfaces were developed for:

1. Capturing project parameters during project creation;

2. Tracking projects (i.e. displaying the current status of projects);

3. Post-editing translations;

4. Configuring the server and localising the interface of LocConnect.

During project creation, a web-based form is presented to the user. This form contains the fields that are required by the Workflow Recommender to generate a workflow. Parameters entered through this interface are stored in the XLIFF file along with the uploaded source file (or source text) and resource files. The project is assigned a unique ID through this interface, and this ID is used throughout the project's life cycle.

The project-tracking interface reflects the project's workflow. It shows the current status of a project, i.e. pending, processing, or complete, in relation to each component, and displays any feedback messages (such as errors, warnings etc.) from components. The current workflow is shown in a graphical representation. Another important feature is a log of activities for the project: changes to the XLIFF file (i.e. changes of metadata) during the different stages of the workflow can be tracked. The project-tracking interface uses AJAX technologies to dynamically update its content frequently (see figure 2).

Figure 2. Project Tracking UI

At the end of a project's life cycle, the user is given the option to post-edit its content using the built-in XLIFF post-editor interface. This interface displays source strings, translations, alternative translations and associated metadata, and translations can be edited through it. The post-editing component also uses AJAX to update XLIFF files in the main datastore (see section 4.4). See figure 3 for a screenshot of the post-editing interface. A preliminary target-file preview mechanism has been developed and integrated into the same UI.


Figure 3. Post-Editing Interface

A password-protected interface has been provided for the configuration of the LocConnect server. Through this interface, various configuration options such as the LocConnect database path, component descriptions etc. can be edited. The same interface can be used to localise the LocConnect server itself (see figure 4 for a screenshot of the administrator's interface).

Figure 4. Administrator's Interface

The user interfaces were implemented in PHP, JavaScript and XHTML, and use the jQuery library for graphical effects and dynamic content updates.

4.2 Middle Tier: Application Programming Interface (API)

The LocConnect server implements a Representational State Transfer (REST)-based interface (Fielding 2000) to send and retrieve resources, localisation data and metadata between components through HTTP GET and HTTP POST operations using proper Uniform Resource Identifiers (URIs). These resources include:

• Localisation projects;

• XLIFF files;

• Resource files (i.e. files such as TBX, TMX, SRX etc.);

• Resource metadata (metadata to describe resource file content).

The LocConnect API provides functions for the following tasks:

1. Retrieving a list of jobs pending for a particular component (list_jobs method);

2. Retrieving the XLIFF file corresponding to a particular job (get_job method);

3. Setting the status of a job; the status can be one of the following: pending, processing, complete (set_status method);

4. Sending a feedback message to the server (send_feedback method);

5. Sending processed XLIFF files to the server (send_output method);

6. Sending a resource file (i.e. a non-XLIFF asset file) to the server (send_resource method);

7. Retrieving a resource file from the server (get_resource method);

8. Retrieving metadata associated with a resource file (get_metadata method).
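From a component author's point of view, the functions listed above amount to a small client wrapper. The sketch below is hedged: only the method names come from the paper; the endpoint paths and parameter names are assumptions, and the HTTP transport is injected so the sketch runs without a live LocConnect server.

```python
from urllib.parse import urlencode

# A hypothetical client-side wrapper for a subset of the API. Endpoint
# paths and parameter names are invented for illustration.
class LocConnectClient:
    def __init__(self, base_url, transport):
        self.base_url = base_url.rstrip("/")
        self.transport = transport               # callable(verb, url, data) -> str

    def _get(self, endpoint, **params):
        url = f"{self.base_url}/{endpoint}?{urlencode(params)}"
        return self.transport("GET", url, None)

    def list_jobs(self, component_id):
        return self._get("list_jobs", component=component_id)

    def get_job(self, component_id, job_id):
        return self._get("get_job", component=component_id, job=job_id)

    def set_status(self, component_id, job_id, status):
        assert status in ("pending", "processing", "complete")
        return self._get("set_status", component=component_id,
                         job=job_id, status=status)

    def send_output(self, component_id, job_id, content):
        # send_output uses HTTP POST, as the method description states
        return self.transport("POST", f"{self.base_url}/send_output",
                              {"component": component_id, "job": job_id,
                               "content": content})

# Example with a dummy transport that records calls instead of sending them:
calls = []
client = LocConnectClient("http://locconnect.example/api",
                          lambda verb, url, data: calls.append((verb, url, data)) or "ok")
client.set_status("WFR", "16674f2698", "processing")
print(calls[0][0])  # GET
```

Injecting the transport also makes such a wrapper easy to unit-test against the component simulator mentioned in section 3.1.1.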

A complete description of each REST-based function is provided below.

Obtaining available jobs: list_jobs method

This method takes a single argument, the component ID, and returns an XML document containing the IDs of the jobs pending for the given component. Job IDs are alphanumeric and consist of 10 characters. The component ID is a string (usually a short form of a component's name, such as WFR for the Workflow Recommender).

This method uses the HTTP GET method to communicate with the LocConnect server.
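The paper specifies only that job IDs are 10-character alphanumeric strings; one plausible way such IDs could be generated (illustrative, not LocConnect's own code) is:

```python
import secrets
import string

# Generate a 10-character alphanumeric job ID, matching the format the
# paper describes (the generation scheme itself is an assumption).
ALPHABET = string.ascii_lowercase + string.digits

def new_job_id(length=10):
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

job_id = new_job_id()
print(len(job_id))  # 10
```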

<jobs>
  <job>16674f2698</job>
  <job>633612fb37</job>
</jobs>
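A component consuming this response only needs a few lines to turn it into a list of job IDs, for example:

```python
import xml.etree.ElementTree as ET

# Parse the list_jobs response shown above into a plain list of job IDs.
response = "<jobs><job>16674f2698</job><job>633612fb37</job></jobs>"
job_ids = [job.text for job in ET.fromstring(response).findall("job")]
print(job_ids)  # ['16674f2698', '633612fb37']
```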

Retrieving the XLIFF file corresponding to a particular job: get_job method

This method takes two arguments: component ID and job ID. It returns the file corresponding to the given job ID and component ID. Usually, the file is an XLIFF file; however, it can be any text-based file. Therefore, the returned content is always enclosed within special XML mark-up: <content>..</content>. The XML declaration of the returned file is omitted in the output (i.e. <?xml version="1.0" ..?> is stripped off from the output).

This method uses the HTTP GET method to communicate with the LocConnect server.

<content><xliff version='1.2'
  xmlns='urn:oasis:names:tc:xliff:document:1.2'>
  <file original='hello.txt' source-language='en'
    target-language='fr' datatype='plaintext'>
    <body>
      <trans-unit id='hi'>
        <source>Hello world</source>
        <target>Bonjour le monde</target>
      </trans-unit>
    </body>
  </file>
</xliff></content>
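A receiving component has to strip the <content> wrapper before it can work with the XLIFF payload. A minimal sketch of that unwrapping, plus extraction of the translation units, might look as follows (helper names are invented for illustration):

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"

def unwrap_content(response_text):
    """Strip the <content>..</content> wrapper from a get_job response
    and return the inner XLIFF document as text."""
    root = ET.fromstring(response_text)
    return ET.tostring(root[0], encoding="unicode")

def extract_units(xliff_text):
    """Return (source, target) text pairs from an XLIFF 1.2 document."""
    root = ET.fromstring(xliff_text)
    pairs = []
    for unit in root.iter(f"{{{XLIFF_NS}}}trans-unit"):
        source = unit.find(f"{{{XLIFF_NS}}}source").text
        target = unit.find(f"{{{XLIFF_NS}}}target")
        pairs.append((source, target.text if target is not None else None))
    return pairs
```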

Setting current status: set_status method
This method takes three arguments: component ID, job ID and status. The status can be 'pending', 'processing' or 'complete'. Initially, the status of a job is set to 'pending' by the LocConnect server to mark that a job is available for pick-up by a certain component. Once the job is picked up, the component will change the status of the job to 'processing'. This ensures that the same job will not be re-allocated to the component. Once the status of a job is set to 'complete', LocConnect will perform the next action specified in the workflow.

This method uses the HTTP GET method to communicate with the LocConnect server.
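The pending/processing/complete lifecycle described above can be expressed as a small transition table. The following guard is a sketch of client-side validation a component might apply before calling set_status; it is not part of the published LocConnect API.

```python
# Legal moves in the job lifecycle: pending -> processing -> complete.
VALID_TRANSITIONS = {
    "pending": {"processing"},
    "processing": {"complete"},
    "complete": set(),
}

def can_set_status(current, new):
    """Return True if calling set_status(new) is a legal move
    from the job's current status."""
    return new in VALID_TRANSITIONS.get(current, set())

print(can_set_status("pending", "processing"))  # True
print(can_set_status("complete", "pending"))    # False
```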

Sending a feedback message: send_feedback method
This method takes three arguments: component ID, job ID and feedback message. Components can send various messages (e.g. error messages, notifications etc.) to the server through this method. These messages will be instantly displayed in the relevant job tracking page of the LocConnect interface. The last feedback message sent to the LocConnect server before the output file is sent will be stored within the LocConnect server and will appear in the activity log of the job. The messages are restricted to 256 words in length.

This method uses the HTTP GET method to communicate with the LocConnect server.

Sending a processed XLIFF file: send_output method
This method takes three arguments: component ID, job ID and content. The content is usually a processed XLIFF file. Once the content is received by LocConnect, it will be stored within the LocConnect datastore. LocConnect will wait for the component to set the status of the job to 'complete' before moving on to the next step of the workflow.

This method uses the HTTP POST method to communicate with the LocConnect server.

Storing a resource file: send_resource method
This method takes two mandatory arguments, resource file and metadata description, and one optional argument, resource ID. The resource file should be in text format. Metadata has to be specified using the following notation:

Metadata notation: 'key1:value1-key2:value2-key3:value3'
e.g. 'language:en-domain:health'

If the optional argument resource ID is not given, LocConnect will generate an ID and assign that ID to the resource file. If the resource ID is given, it will overwrite the current resource file and metadata with the new resource file and metadata.

This method uses the HTTP POST method to communicate with the LocConnect server.
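The dash-separated metadata notation above maps naturally onto a dictionary. The following sketch parses and re-serialises it; note that the notation itself defines no escaping, so keys and values must not contain ':' or '-'.

```python
def parse_metadata(notation):
    """Parse the 'key1:value1-key2:value2' notation into a dict.
    The notation has no escaping, so keys and values must not
    themselves contain ':' or '-'."""
    if not notation:
        return {}
    return dict(pair.split(":", 1) for pair in notation.split("-"))

def format_metadata(metadata):
    """Serialise a dict back into the dash-separated notation."""
    return "-".join(f"{key}:{value}" for key, value in metadata.items())

print(parse_metadata("language:en-domain:health"))
# {'language': 'en', 'domain': 'health'}
```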

Retrieving a stored resource file: get_resource method
This method takes one argument: resource ID. Given the resource ID, the LocConnect server will return the resource associated with the given ID.

This method uses the HTTP GET method to communicate with the LocConnect server.

Retrieving metadata associated with a resource file: get_metadata method
This method takes one argument: resource ID. The LocConnect server will return the metadata associated with the given resource ID as shown in the example below:

<metadata>
  <meta key="language" value="en"/>
  <meta key="domain" value="health"/>
</metadata>

This method uses the HTTP GET method to communicate with the LocConnect server.
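A client can turn this response into a plain dictionary in a few lines. The sketch below assumes the <meta> elements are self-closing so that the response is well-formed XML:

```python
import xml.etree.ElementTree as ET

def parse_metadata_response(xml_text):
    """Turn a get_metadata response into a dict of key-value pairs."""
    root = ET.fromstring(xml_text)
    return {m.get("key"): m.get("value") for m in root.findall("meta")}

response = ('<metadata>'
            '<meta key="language" value="en"/>'
            '<meta key="domain" value="health"/>'
            '</metadata>')
print(parse_metadata_response(response))
# {'language': 'en', 'domain': 'health'}
```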

4.2.1 Component-Server Communication Process
A typical LocConnect component-server communication process includes the following phases.

Step 1: list_jobs

The component calls the list_jobs method to retrieve a list of available jobs for that component by specifying its ID.

Step 2: get_job

The component uses get_job to retrieve the XLIFF file corresponding to the given job ID and the component ID.

A component may either process one job at a time or many jobs at once. However, the get_job method is only capable of returning a single XLIFF file at a time.

Step 3: set_status - Set status to processing

The component sets the status of the selected job to 'processing'.

Step 4: Process file

The component processes the retrieved XLIFF file. It may send feedback messages to the server while processing the XLIFF file. These feedback messages will be displayed in the job tracking interface of LocConnect.

Step 5: send_output

The component sends the processed XLIFF file back to the LocConnect server using the send_output method.

Step 6: set_status

The component sets the status of the selected job to 'complete'. This will trigger the LocConnect server to move to the next stage of the workflow.
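The six steps above can be sketched as a single polling cycle. The transport is left pluggable here (fetch for GET-style calls, send for the POST-style send_output); in a real component these would wrap HTTP requests to the LocConnect server, and the parameter names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def run_component(component_id, fetch, send, process):
    """One polling cycle of a LocConnect component (sketch).

    fetch(method, params) -> response body of a GET-style API call
    send(method, params)  -> POST-style API call (send_output)
    process(xliff_text)   -> the component's own processing step
    """
    jobs_xml = fetch("list_jobs", {"component_id": component_id})    # step 1
    for job in ET.fromstring(jobs_xml).findall("job"):
        job_id = job.text
        xliff = fetch("get_job", {"component_id": component_id,      # step 2
                                  "job_id": job_id})
        fetch("set_status", {"component_id": component_id,           # step 3
                             "job_id": job_id, "status": "processing"})
        result = process(xliff)                                      # step 4
        send("send_output", {"component_id": component_id,           # step 5
                             "job_id": job_id, "content": result})
        fetch("set_status", {"component_id": component_id,           # step 6
                             "job_id": job_id, "status": "complete"})
```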

4.3 Middle tier: Workflow Engine
A simple workflow engine has been developed and incorporated into the LocConnect server to allow for the management and monitoring of individual localisation jobs. The current workflow engine does not support parallel processes or branching. However, it allows the same component to be used several times in a workflow. The engine parses the workflow information found in the XLIFF data container (see section 4.5) and stores the workflow information in the project management datastore. The project management datastore is then used to keep track of individual projects. In the current setup, setting the status of a component to 'complete' will trigger the next action of the workflow.

4.4 LocConnect Datastore
The database design can be logically stratified into three layers:

● Main datastore: holds XLIFF files;

● Project management datastore: holds data about individual projects and their status;

● Resource datastore: holds data and metadata about other resource files.

The main datastore is used to store XLIFF files corresponding to different jobs. It stores different versions of the XLIFF file that correspond to a particular job. Therefore, the LocConnect server also acts as a Concurrent Versions System (CVS) for localisation projects.

The project management datastore is used for storing the information necessary to keep track of individual localisation jobs with respect to localisation workflows. Furthermore, it is used to store various time-stamps, such as job pick-up time, job completion time etc., recorded by different components.


The resource datastore is used to store various asset files associated with localisation projects. The asset files can be of any text-based file format such as TMX, XLIFF, SRX, TBX, XML etc. The components can store any intermediate, temporary or backup files in this datastore. The files can then be accessed at any stage during workflow execution. The resource files (i.e. asset files) can be described further using metadata. The metadata consists of key-value pairs associated with the resource files and can also be stored in the resource datastore.

SQLite was chosen as the default database for implementing the logical data structure in this prototype for a number of reasons. Firstly, it can be easily deployed. It is lightweight and requires virtually no administration. Furthermore, it does not require any configuration.
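The three logical layers can be sketched as an in-memory SQLite schema. The table and column names below are illustrative, not LocConnect's actual schema:

```python
import sqlite3

# In-memory sketch of the three logical layers of the datastore.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE main_datastore (        -- versioned XLIFF files per job
        job_id  TEXT,
        version INTEGER,
        xliff   TEXT,
        PRIMARY KEY (job_id, version));
    CREATE TABLE project_management (    -- job status and time-stamps
        job_id       TEXT PRIMARY KEY,
        status       TEXT,
        picked_up_at TEXT,
        completed_at TEXT);
    CREATE TABLE resources (             -- asset files plus their metadata
        resource_id TEXT PRIMARY KEY,
        content     TEXT,
        metadata    TEXT);
""")
db.execute("INSERT INTO main_datastore VALUES ('16674f2698', 1, '<xliff/>')")
db.execute("INSERT INTO project_management VALUES ('16674f2698', 'pending', NULL, NULL)")
status = db.execute("SELECT status FROM project_management "
                    "WHERE job_id = '16674f2698'").fetchone()[0]
print(status)  # pending
```

Storing each new version of a job's XLIFF file as a fresh (job_id, version) row is what gives the main datastore its version-control behaviour.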

4.5 XLIFF Data Container
The core of this architecture is the XLIFF-based data container defined in this research. Maximum effort has been made to abstain from custom extensions in defining this data container. Different components will access and make changes to this data container as it travels through different components and different phases of the workflow. The typical structure of the data container is given in figure 5.

When a new project is created in LocConnect, it will append parameters captured via the project creation page into the metadata section (see section 2) of the data container. The metadata is stored as key-value pairs. During the workflow execution process, various components may use, append or change the metadata. The source file uploaded by the user will be stored within the XLIFF data container as an internal file reference (see section 1). Any resource files uploaded during project creation will also be stored as external references as shown in section 4.4. The resource files attached to this data container can be identified by their unique IDs and can be retrieved at any stage during the process. Furthermore, the identifier will allow retrieval of the metadata associated with those resources.

After project creation, the generated data container (i.e. the XLIFF file) is sent to the Workflow Recommender component. It analyses the project metadata as well as the original file format to recommend the optimum workflow to process the given source file. If the original file is in a format other than XLIFF, the Workflow Recommender will suggest that the data container be sent to a File Format Converter component. The file format converter will read the original file from the above internal file reference and convert the source file into XLIFF. The converted content will be stored in the same data container using the <body> section and the skeleton sections. The data container with the converted file content is then reanalysed by the Workflow Recommender component in order to propose the rest of the workflow. The workflow information will be stored in section 3 of the data container. When the LocConnect server receives the data container back from the Workflow Recommender component, it will parse the workflow description and execute the rest of the sequence. Once the entire process is completed, the converter can use the data container to build the target file.

In this architecture, a single XLIFF-based data container is used throughout the process. Different workflow phases and associated tools can be identified by standard XLIFF elements such as <phase> and <tool>. Furthermore, tools can include various statistics (e.g. <count-group>) in the same XLIFF file.
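As an illustration, a container that has passed through two workflow phases could record them with the XLIFF 1.2 phase-group mark-up. The phase and tool names below are invented for the example:

```python
import xml.etree.ElementTree as ET

NS = "urn:oasis:names:tc:xliff:document:1.2"

# A minimal file header recording two workflow phases with the
# XLIFF 1.2 <phase-group>/<phase> mark-up.
doc = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
 <file original="hello.txt" source-language="en" datatype="plaintext">
  <header>
   <phase-group>
    <phase phase-name="conv-1" process-name="format-conversion"
           tool-id="file-format-converter"/>
    <phase phase-name="trans-1" process-name="translation"
           tool-id="xliff-editor"/>
   </phase-group>
  </header>
  <body/>
 </file>
</xliff>"""

phases = [p.get("phase-name") for p in ET.fromstring(doc).iter(f"{{{NS}}}phase")]
print(phases)  # ['conv-1', 'trans-1']
```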

The XLIFF data container based architecture resembles the Transmission Control Protocol and the Internet Protocol (TCP/IP) architecture in that the data packet is routed based on its content. However, in this scenario, LocConnect plays several roles, including those of a router, a web server and a file server.


Figure 5. XLIFF-Based Data Container


5. Discussion and Future Work

Savourel (2007) highlights the importance of a "Translation Resource Access API" which facilitates localisation data exchange among different systems in a heterogeneous environment. Like Savourel (2007), we also believe that access to a common data layer through an API would enable interoperability between different localisation components. The development of the prototype has revealed syntactical requirements of such an API as well as of the common data layer. Whilst the prototype provides a test bed for the exploration of interoperability issues among localisation tools, it has a number of limitations.

In the present architecture, metadata is stored as attribute-value pairs within an internal file reference of the XLIFF data container (see section 3 of figure 5). However, according to the current XLIFF specification (XLIFF-TC 2008), XML elements cannot be included within an internal file reference. Doing so results in an invalid XLIFF file. While this could be interpreted as a limitation of the XLIFF standard itself, the current metadata representation mechanism also presents several problems. The metadata is exposed to all the components, yet there might be situations where metadata should only be exposed to certain components. Therefore, some security and visibility mechanisms have to be implemented for the metadata. Moreover, there may be situations where components need to be granted specific permissions to access metadata, e.g. read or write. These problems can be overcome by separating the metadata from the XLIFF data container. That is, the metadata has to be stored in a separate datastore (as in the case of resource files). Then, specific API functions can be implemented for different components to manipulate metadata (e.g. add, delete, modify, retrieve). This provides a secure mechanism to manage metadata.

The Resource Description Framework (RDF) is a framework for describing metadata (Anastasiou 2011). Therefore, it is worthwhile exploring the possibility of representing metadata using RDF. For example, API functions could be implemented to return the metadata required by a component in RDF syntax.

The current API lacks several important functions. Functions should be implemented for deleting projects (and associated XLIFF files), modifying projects, deleting resource files, modifying metadata associated with resource files etc. The current API calls send_output and set_status to 'complete' could be merged (i.e. sending the output by a component would automatically set its status to 'complete'). Furthermore, a mechanism could be implemented for granting proper permissions to components for using the above functions. User management is a significant aspect that we did not pay much attention to when developing the initial test bed. User roles could be designed and implemented so that users with different privileges can assign different permissions to components as well as to different activities managed through the LocConnect server. This way, data security could be achieved to a certain extent. Furthermore, an API key should be introduced for the validation of components as another security measure. This way, components would have to specify the key whenever they use LocConnect API functions in order to access LocConnect data.

The XLIFF data container could contain sensitive data (i.e. source content, translations or metadata) which some components should not be able to access. A mechanism could be implemented to secure the content and to grant permissions to components so that they would only be able to access the relevant data from the XLIFF data container. There are three potential solutions to this problem. The first would be to let the Workflow Recommender (or the Mapper) select only secure and reliable components. The second would be to encrypt content within the XLIFF data container. The third would be to implement API functions to access specific parts of the XLIFF data container. However, the latter mechanism would obviously increase the complexity of the overall communication process due to frequent API calls to the LocConnect server.

Because the XLIFF standard was originally defined as a localisation data exchange format, it has, so far, not been thoroughly assessed with regard to its suitability as a localisation data storage format or as a data container. A systematic evaluation has to be performed on the use of XLIFF as a data container in the context of a full localisation project life cycle, as facilitated by our prototype. For example, during its traversal, an XLIFF-based data container could become cumbersome, causing performance difficulties. Different approaches to addressing likely performance issues could be explored, such as data container compression, support for parallel processing, or the use of multiple XLIFF-based data containers transmitted in a single compressed container. The implications of such strategies would have to be evaluated, such as the need to equip the components with a module to extract and compress the data container.

While the current workflow engine provides essential process management operations, it currently lacks more complex features such as parallel processes and branching. Therefore, the incorporation of a fully-fledged workflow engine into the LocConnect server is desirable. Ideally, the workflow engine should support standard workflow description languages such as the Business Process Execution Language (BPEL) or Yet Another Workflow Language (YAWL). This would allow the LocConnect server to be easily connected to an existing business process, i.e. localisation could be included as a part of an existing workflow. In the current system, the workflow data is included as an internal file reference in the XLIFF data container (see section 3 of figure 5), which invalidates the XLIFF file due to the use of XML elements inside the internal file reference. In future versions, this problem can be easily addressed by simply storing the generated workflow as a separate resource file (e.g. using BPEL) and providing the link to the resource file in the XLIFF data container as an external file reference.

LocConnect implements REST-based services for communication with external components; it is therefore essential to implement our own security measures on top of the REST-based API. Since there are no security measures in the current LocConnect API, well-established and powerful security measures such as XML encryption and API keys would need to be implemented in the API as well as in the data transmission channel (e.g. the use of Secure Socket Layer (SSL) tunnels for REST calls).

Currently, the LocConnect server implements a 'PULL'-based architecture where components have to initiate the data communication process. For example, components must keep checking for new jobs on the LocConnect server and fetch jobs from the server. The implementation of both 'PUSH'- and 'PULL'-based architectures would very likely yield more benefits. Such an architecture would help to minimise communication overhead as well as resource consumption (e.g. the LocConnect server can push a job whenever one is available for a component, rather than the component continuously checking the LocConnect server for jobs). The implementation of both 'PUSH'- and 'PULL'-based architectures would also help to establish the availability of the components prior to assigning a job, and help the LocConnect server to detect component failures. The current architecture lacks this capability of identifying communication failures associated with components. If the LocConnect server could detect communication failures, it could then select substitute components (instead of failed components) to enact a workflow. An architecture similar to the Internet Protocol could be implemented with the help of a Mapper component. For example, whenever the LocConnect server detects a component failure, the data container could be automatically re-routed to another component that can undertake the same task, so that the failure of a component will not affect the rest of the workflow.

The current resource datastore is only capable of storing textual data. It could therefore be enhanced to store binary data too. This would enable the storing of various file formats including Windows executable files, DLL files, video files, images etc. Once the resource datastore is improved to store binary data, the original file can be stored in the resource datastore rather than within the XLIFF file, and a reference to this resource can be included as an external file reference (see section 1 of figure 5).

In the present architecture, the information about components has to be manually registered with the LocConnect server using its administrator interface. However, the architecture should be improved to discover and register ad-hoc components automatically.

5.1 Proposed improvements to the XLIFF based data container and new architecture
By addressing the issues related to the above XLIFF-based data container, a fully XLIFF-compliant data container could be developed to evaluate its effect on improvements in interoperability. A sample XLIFF data container is introduced in figure 6.

This data container differs from the current data container (see figure 5) in the following aspects:

The new container:

● Does not represent additional metadata (i.e. metadata other than that defined in the XLIFF specification) within the data container itself. Instead, this metadata will be stored in a separate metadata store that can be accessed via corresponding API functions.

● Does not represent workflow metadata as an internal file reference. Instead, the workflow metadata will be stored separately in the resource datastore. A link to this workflow will then be included in the XLIFF data container as an external file reference (see section 2 of figure 6).

● Does not store the original file as an internal file reference. It will also be stored separately in the resource datastore. An external file reference will be included in the XLIFF file as shown in section 1 of figure 6.

The new data container does not use any extensions to store additional metadata or data, nor does it use XML syntax within internal-file elements. Thus, the above architecture would provide a fully XLIFF-compliant (i.e. XLIFF strict schema compatible) interoperability architecture. Due to the separation of the original file content, workflow information and metadata from the XLIFF data container, the container itself becomes lightweight and easy to manipulate. The development of a file format converter component based on this data container would also be uncomplicated.
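One way such a lightweight container might look is sketched below: the skeleton and a supporting resource live in the resource datastore and are pointed to with XLIFF 1.2 <external-file> elements. The href values are invented resource IDs, and the exact header layout is an assumption, not a reproduction of figure 6.

```python
import xml.etree.ElementTree as ET

NS = "urn:oasis:names:tc:xliff:document:1.2"

# Sketch of the lightweight container: externally stored files are
# referenced via XLIFF 1.2 <external-file> elements (hrefs invented).
container = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
 <file original="hello.txt" source-language="en" datatype="plaintext">
  <header>
   <skl><external-file href="resource/ab12cd34ef"/></skl>
   <reference><external-file href="resource/0f9e8d7c6b"/></reference>
  </header>
  <body/>
 </file>
</xliff>"""

refs = [e.get("href")
        for e in ET.fromstring(container).iter(f"{{{NS}}}external-file")]
print(refs)  # ['resource/ab12cd34ef', 'resource/0f9e8d7c6b']
```

A component receiving this container would resolve each href against the resource datastore (e.g. via get_resource) instead of unpacking embedded content.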

6. Conclusions

In this paper we presented and discussed a service-oriented framework that was developed and then applied to evaluate interoperability in localisation process management using the XLIFF standard. The use cases, architecture and issues of this approach were discussed. A prototype of the framework was successfully demonstrated at the CNGL Public Showcase in Microsoft, Ireland, in November 2010.

The framework has revealed the additional metadata and related infrastructure services required for linking distributed localisation tools and services. It has also been immensely helpful in identifying prominent issues that need to be addressed when developing a commercial application.

The prototype framework described in this paper is the first to use XLIFF as a data container to address interoperability issues among localisation tools. In our opinion, the successful implementation of this pilot prototype framework suggests the suitability of XLIFF as a full project life-cycle data container that can be used to achieve interoperability in localisation processes. The development of the above prototype has mostly focused on addressing the syntactic interoperability issues in localisation processes. Future work will mainly focus on addressing the semantic interoperability issues of localisation processes by improving the proposed system. The LocConnect framework will serve as a platform for future research on interoperability issues in localisation.


Figure 6. Improved Data Container


Acknowledgement

This research is supported by the Science Foundation Ireland (Grant 07/CE/I1142) as part of the Centre for Next Generation Localisation (www.cngl.ie) at the Localisation Research Centre (Department of Computer Science and Information Systems), University of Limerick, Limerick, Ireland. We would also like to acknowledge the vital contributions of our colleagues and fellow researchers from the CNGL project.

References

Anastasiou, D. and Morado-Vazquez, L. (2010) 'Localisation Standards and Metadata', Proceedings of Metadata and Semantic Research, 4th International Conference (MTSR 2010), Communications in Computer and Information Science, Springer, 255-276.

Anastasiou, D. (2011) 'The Impact of Localisation on Semantic Web Standards', European Journal of ePractice, N. 12, March/April 2011, ISSN 1988-625X, 42-52.

Bichler, M. and Lin, K. J. (2006) 'Service-oriented computing', IEEE Computer, 39(3), 99-101.

Bly, M. (2010) 'XLIFFs in Theory and in Reality' [online], available: http://www.localisation.ie/xliff/resources/presentations/xliff_symposium_micahbly_20100922_clean.pdf [accessed 09 Jun 2011].

Corrigan, J. & Foster, T. (2003) 'XLIFF: An Aid to Localization' [online], available: http://developers.sun.com/dev/gadc/technicalpublications/articles/xliff.html [accessed 22 Jun 2009].

Fielding, R. (2000) 'Architectural Styles and the Design of Network-based Software Architectures' [PhD], University of California, Irvine.

Halle, S., Bultan, T., Hughes, G., Alkhalaf, M. and Villemaire, R. (2010) 'Runtime Verification of Web Service Interface Contracts', Computer, 43(3), 59-66.

Hayes, J. G., Peyrovian, E., Sarin, S., Schmidt, M. T., Swenson, K. D. and Weber, R. (2000) 'Workflow interoperability standards for the Internet', Internet Computing, IEEE, 4(3), 37-45.

IEEE. (1991) 'IEEE Standard Computer Dictionary. A Compilation of IEEE Standard Computer Glossaries', IEEE Std 610, 1.

Kindrick, J. D., Sauter, J. A. and Matthews, R. S. (1996) 'Improving conformance and interoperability testing', StandardView, 4(1), 61-68.

Lewis, G. A., Morris, E., Simanta, S. and Wrage, L. (2008) 'Why Standards Are Not Enough to Guarantee End-to-End Interoperability', in Proceedings of the Seventh International Conference on Composition-Based Software Systems (ICCBSS 2008), IEEE Computer Society, 164-173.

Savourel, Y. (2007) 'CAT tools and standards: a brief summary', MultiLingual, September 2007, 37.

Schäler, R. (2009) 'Communication as a Key to Global Business', in Hayhoe, G., ed., Connecting people with technology: issues in professional communication, Amityville, N.Y.: Baywood Pub., 57-67.

W3C. (2007) 'Web Service Description Language (WSDL) Version 2.0 Part 1: Core Language', W3C Recommendation [online], in Chinnici, R., Moreau, J. J., Ryman, A. and Weerawarana, S., eds., W3C, http://www.w3.org/TR/wsdl20.

XLIFF Technical Committee. (2008) 'XLIFF 1.2 Specification' [online], http://docs.oasis-open.org/xliff/xliff-core/xliff-core.html [accessed 25 Jun 2009].

XLIFF Technical Committee. (2010) 'XLIFF 2.0 / Feature Tracking' [online], http://wiki.oasis-open.org/xliff/XLIFF2.0/FeatureTracking [accessed 23 Jul 2009].
