Journal of Information Science, 2005, 31: 394. DOI: 10.1177/0165551505055403
Dongwon Jeong, Peter Hoh In, Fran Jarnjak, Young-Gab Kim and Doo-Kwon Baik
The online version of this article can be found at: http://jis.sagepub.com/content/31/5/394
Published by: http://www.sagepublications.com, on behalf of the Chartered Institute of Library and Information Professionals
Version of Record: Aug 16, 2005

A message conversion system, XML-based metadata semantics description language and metadata repository


394 Journal of Information Science, 31 (5) 2005, pp. 394–406 © CILIP, DOI: 10.1177/0165551505055403

Dongwon Jeong

Department of Informatics and Statistics, Kunsan National University, Korea

Peter Hoh In, Fran Jarnjak, Young-Gab Kim and Doo-Kwon Baik

Department of Computer Science and Engineering, Korea University, Korea

Received 21 September 2004
Revised 8 April 2005

Abstract.

Metadata can be used to precisely represent data semantics. It can also serve to improve data sharing and exchange. Because the various types of metadata are created in different ways, they can suffer from a problem of inconsistency. Recently, metadata gateway methods have been researched to solve this problem. However, the performance of the existing approaches based on metadata schemas is poor and their maintenance (adaptation to metadata changes) is time-consuming. In this paper, a novel message conversion system is proposed, which functions by separating the heterogeneous mapping information from the mapping rules of the metadata, in order to overcome the drawbacks of the existing metadata gateway methods. The proposed system controls the standardized data elements dynamically based on the Metadata Registry (MDR), which is one of the most important elements of the ISO/IEC 11179 standard. The problems associated with adding supplementary metadata are resolved, since the standard provides for incorporating additional data elements created in the future. A Metadata Semantics Description Language (MSDL) is defined as a protocol which can be used for exchanging messages between heterogeneous systems, and which ensures that all of the systems have their own independent metadata schemas.

Keywords: message conversion; metadata; message exchange; XML; description language; metadata registry; heterogeneity

1. Introduction

Metadata, which are used to represent data semantically, consist of a data structure, representation, and semantics. However, a problem of incompatibility between metadata has been raised due to the fact that there is no standardized rule defining the consistency of metadata representation methods.

Recently metadata gateway methods have been studied which allow for the heterogeneity of the metadata. The existing gateway process for metadata depends on the metadata schema. One of the disadvantages of this method is that the metadata schemas need to be changed when the related systems are restructured.

In order to resolve this issue, in this paper, a message conversion system is proposed, which is designed to support the independence of the mapping information and mapping rules. The proposed system consists of a Metadata Registry (MDR), a Metadata Semantics Description Language (MSDL), and a message converter. The MSDL is used to separate the mapping information used for conversion from the mapping rules defined in the message converter. The MDR controls the standardized data elements by applying the ISO/IEC 11179 standard dynamically, in order to provide a basic data element which is used to convey the real data semantics. This paper is organized as follows: Section 2 describes related research and Section 3 defines the mapping protocol used to resolve the problem of metadata inconsistency. Section 4 proposes the mark-up language, MSDL, which is used to depict the differences between the metadata. In Section 5, the implementation of the system and its evaluation are presented. Our conclusions and future research directions are presented in Section 6.

Correspondence to: Dongwon Jeong. E-mail: [email protected] and Peter Hoh In. E-mail: [email protected]

2. Related work

In this section, the types of metadata inconsistency are defined and classified. The existing approaches which have been used to attempt to resolve this issue are reviewed.

2.1. Classification of inconsistencies between metadata

First, the major terminologies are defined, in order to avoid any confusion regarding metadata-related words and concepts. The term metadata stands for structured data. It is used to explain a set of data elements. A data element is a metadata unit used to show a datum under a general and abstract concept that shows a special feature of the data. The term metadata schema refers to a set of specified data elements and their mutual relations.

The data elements can generally be classified into two kinds: domain-dependent and domain-independent ones. The domain-dependent data elements are used and standardized for a specific domain or field, while the independent data elements are standardized for all domains and can be used as a standard format for the creation of new databases.

Generally, metadata are made up of three elements, viz. the semantics, representation and structure, to convey the information concerning the data, as shown in Table 1 [1]. A consistent description of these elements is required for efficient data sharing. However, discrepancies between the semantics, representation, and structure of the metadata may exist, due to the fact that there is no standardized protocol for the creation of metadata [2]. In an attempt to solve this problem, many researchers have studied the inconsistency of metadata and their classification.

Gio Wiederhold studied the problem of the heterogeneity of the label data which arises when heterogeneous data are integrated [3]. W. Kim and J. Seo defined the schema of the Component Database (CDB) and studied the problem of data heterogeneity by comparing schematic collision and data collision [4]. Parent and Spaccapietra studied the problem of database integration and the associated collisions and presented a solution to this problem [5]. Holowczak made a comparative study of the advantages and disadvantages of the table, rule, ontology, and model, which are the different methods of representing heterogeneous metadata [6].

2.2. Background of the metadata registry

The ISO/IEC JTC 1/SC 32 WG 2 developed the ISO/IEC 11179 standard, in order to enhance the interoperability of databases. The things which make up a database are recognized through their properties, and the data represent these properties. In ISO/IEC 11179, a data element is a unit whose definition, identification, representation, and permissible values are specified by means of a set of attributes. The documented data elements are managed in an MDR [7].

Figure 1 illustrates the idea behind the ISO/IEC 11179 standard. In ISO/IEC 11179, a data element consists of three parts: object class, property, and representation.

The object class is a set of ideas, abstractions, or things in the real world that can be identified with explicit boundaries and a meaning and whose properties and behavior follow the same rules. The property is a peculiarity common to all members of an object class. The representation describes how the data are denoted, and consists of a value domain, data type, and, if necessary, a unit of measure or a character set.

Table 1. Classification of metadata information

| Component | Description |
|---|---|
| Semantic information | Semantic information of data elements |
| Schematic information | Schematic information that shows the relation between each data element |
| Representation information | Representation information such as value area, data type and measuring unit |

Object classes are used to describe the objects for which we wish to collect and store data. For example, object classes can be cars, persons, households, etc. However, it is important to distinguish the actual object class from its name. Properties are what humans use to distinguish or describe objects. Examples of properties are color, model, sex, age, and address.

The most important aspect of the representation part of the data element is the value domain. A value domain is a set of permissible values for a data element. The combination of an object class and a property is a data element concept. A data element concept is described independently of any particular representation and can be represented in the form of a data element.

The data element is the smallest unit of data that is shared and held in common. A data element consists of mandatory and optional attributes that fully describe the data. Documentation of the data elements is accomplished through the standardized registration process, and thus a data element has several registration statuses: submitted, recorded, qualified, standard, preferred standard, and retired. This status corresponds to a field (attribute) in a relational database. In Figure 1, an object class is mapped to an entity in the E-R model and a class in the object-oriented model respectively.
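The relationship between object class, property, data element concept, representation, and data element described in this section can be illustrated with a small data structure. The following is only a sketch: the class and field names, and the sample values, are assumptions made for illustration and are not taken from ISO/IEC 11179 or from the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DataElementConcept:
    """An object class combined with a property, independent of representation."""
    object_class: str   # e.g. "Car"
    property_name: str  # e.g. "Color"


@dataclass(frozen=True)
class Representation:
    """How values are denoted: value domain, data type and optional unit."""
    value_domain: Tuple[str, ...]  # permissible values (illustrative)
    data_type: str
    unit_of_measure: Optional[str] = None


@dataclass(frozen=True)
class DataElement:
    """A data element concept bound to one particular representation."""
    concept: DataElementConcept
    representation: Representation
    registration_status: str = "submitted"  # one of the statuses listed above

    def permits(self, value: str) -> bool:
        # A value is acceptable only if it belongs to the value domain.
        return value in self.representation.value_domain


car_color = DataElement(
    DataElementConcept("Car", "Color"),
    Representation(("red", "blue", "white"), "string"),
)
```

Keeping the concept separate from the representation mirrors the statement above that a data element concept is described independently of any particular representation.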

2.3. Previous research into the resolution of metadata inconsistencies

Previous research into the resolution of metadata inconsistencies can be classified into two types. One is the establishment of a single metadata standard and the other is the metadata gateway approach. In reality, however, it is impossible to define a single metadata standard that can be used for all types of data, so that the single-standard (domain-dependent) approach is impractical [8–11, 12].

Consequently, the metadata gateway method of solving the heterogeneity between the different types of metadata is gaining ground. However, in the existing metadata gateway-based systems (e.g. BizTalk [13]), the mapping information between data elements (i.e. metadata) is dependent on the specific systems. In other words, the other related systems have to be modified whenever the metadata schemas of the specific systems are changed. Thus, systems whose mapped metadata are frequently changed require a great deal of time and effort to maintain. Hence, the independence of the systems is very weak.

To solve this issue, a new method is required to strengthen the independence of the mapping information and mapping rules. In this paper, we propose a new message conversion system which uses an XML-based MSDL and an MDR. The MSDL plays the role of separately controlling the mapping information used for conversion and the mapping rules defined in the message converter. The MDR dynamically manages the standardized data elements and provides a basic data element (domain-independent data elements) which can be used to convey the real semantics of the data. That is, the goal of the proposed system is to guarantee independence between the mapping rules and mapping information in a cost-effective manner that is not time-consuming.

3. The proposed method of resolving metadata inconsistencies

In this section, a metadata registry-based method of establishing the semantic equivalence of data elements is proposed. The mapping rules for the resolution of the representational and structural inconsistencies between data elements are presented.

[Figure 1: diagram relating Object Class, Property, Data Element Concept, Representation, and Data Element, with relationships labeled (1:1) and (1:N).]

Fig. 1. Conceptual data element structure.

3.1. Semantics sharing by MDR and classification of inconsistencies

The MDR, one of the most important components in the ISO/IEC 11179 standard [7], maintains and controls a set of standardized metadata by specifying their semantics.

A data element is a basic unit of data controlled by the MDR and is used to describe the meaning of the data. For this purpose, each data element has a unique identifier. Hence, the MDR ensures the semantic equivalence between metadata by providing a unique and accurate semantic of the data through the concept of the data element.

The inconsistencies in the data elements can be classified into representational and structural inconsistencies. Table 2 summarizes the types of structural and representational inconsistency between metadata which have equivalent semantics. The inconsistency types listed in Table 2 can theoretically exist individually, but in general they are mixed with multiple types.

3.2. Mapping rules for the resolution of representational inconsistencies

For the exchange of semantically equivalent data elements, any representational inconsistencies between related elements need to be resolved. Depending on the feasibility of data type conversion, there are two cases to consider, viz. with and without the conversion of a code set.

3.2.1. Conversion of a code set. In the case where a code set is used, its conversion to a different kind of code set can be accomplished by mapping the target code set to the base code set of the MDR and then substituting the codes. The operation of code set conversion between code sets A and B is shown as follows:

Let Code Set $A = \{x_0, x_1, x_3, \ldots, x_n\}$ and Code Set $B = \{y_0, y_1, y_3, \ldots, y_m\}$.

For every pair of codes $x_i \in A$ and $y_j \in B$ that are semantically identical, $x_i$ is replaced with $y_j$.
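The substitution of semantically identical codes through the base code set of the MDR can be sketched as follows. This is a minimal illustration, not the authors' implementation: the mapping tables, the function name `convert_code`, and the sample country codes are all assumptions chosen for the example.

```python
# Hypothetical mapping tables: each local code points to the MDR base code
# that carries the same meaning. The values are illustrative sample codes.
A_TO_BASE = {"KR": "410", "US": "840"}
B_TO_BASE = {"KOR": "410", "USA": "840"}


def convert_code(code, source_to_base, target_to_base):
    """Replace a source code with the target code that shares its base code."""
    base = source_to_base[code]  # raises KeyError for an unregistered code
    # Invert the target table so the base code can be looked up directly.
    base_to_target = {b: t for t, b in target_to_base.items()}
    return base_to_target[base]


print(convert_code("KR", A_TO_BASE, B_TO_BASE))
```

Routing both code sets through one base code set means each local code set needs only a single mapping table, rather than one table per partner code set.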

3.2.2. Data type and conversion of measurement unit. In the case where no code set is used, the conversion of both the data type and measurement unit is necessary. Data types are categorized as the representation format and the semantics format according to the data type used to represent the difference in their real semantics. The conversion is performed between semantically equivalent data elements. The conversion of the measurement unit is done concurrently with the process of converting the data types, in the case where the numeric data have a measurement unit.

Figure 2 depicts an example of the case where 1.0 cm represented in char type is converted to 10.0 mm also of char type.
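The conversion from a character value in centimetres to a character value in millimetres can be sketched as a chain of type and unit conversions. The unit table and the function name `convert_measure` below are assumptions made for illustration; the sketch covers only the string to float to string path described in the text.

```python
# Millimetres per unit; only the units needed for the illustration.
MM_PER_UNIT = {"mm": 1.0, "cm": 10.0, "m": 1000.0}


def convert_measure(text, target_unit):
    """Convert a string such as '1.0 cm' to a string in the target unit."""
    value_str, unit = text.split()      # "1.0 cm" -> "1.0", "cm"
    value = float(value_str)            # data type conversion: string -> float
    value_mm = value * MM_PER_UNIT[unit]               # normalise to millimetres
    converted = value_mm / MM_PER_UNIT[target_unit]    # measurement unit conversion
    return f"{converted} {target_unit}"  # data type conversion: float -> string


print(convert_measure("1.0 cm", "mm"))
```

The unit conversion is applied while the value is held as a number, which matches the statement above that unit conversion is done together with data type conversion for numeric data.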

3.3. Mapping rules for resolving structural inconsistency

The mapping rules used to resolve structural inconsistencies can be classified into four types: replacement, composition, decomposition, and rearrangement. They are described using the basic statements defined in Table 3. These basic statements are represented using KIF (Knowledge Interchange Format) [14].

Table 2. Categories of inconsistency between metadata with equivalent semantics

| Category | Inconsistency type | Description | Example |
|---|---|---|---|
| Structure | Composition | Unite or compose multiple data elements into one | (FirstName ∪ LastName) → Name |
| Structure | Decomposition | Divide or decompose one data element into multiple ones | Name → (FirstName ∪ LastName) |
| Structure | Rearrangement | Change the order of data elements | FirstName, LastName → LastName, FirstName |
| Representation | Code set | Inconsistency of application code set | ISO 3166-2 ↔ ISO 3166-3 |
| Representation | Measurement unit | Inconsistency of measurement unit | Mile ↔ kilometer |
| Representation | Data type | Inconsistency of data type | Integer ↔ float |

The basic statements listed in Table 3 contain the following semantics. If the semantics of data elements x and y are identical, this is represented as same(x, y). To extract the semantics of an element a from the metadata schema S, the function get_element(S, a) is used. The function set_element(S(x), a) represents the mapping of data element a to data element x of the metadata schema S. When z is created by the composition of x and y, this is represented as z = +(x, y).

Table 4 defines the four mapping rules (functions) used to address the structural inconsistencies.
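The four structural rules can be illustrated on flat records held as dictionaries. The following sketch is an informal reading of the rules, not a translation of the KIF definitions: the function names, the use of a space as the join and split separator, and the sample record are assumptions.

```python
def substitute(record, local_name, mdr_name):
    """Substitution: carry one element over under the standard name."""
    return {mdr_name: record[local_name]}


def compose(record, local_names, mdr_name, sep=" "):
    """Composition: join several local elements into one standard element."""
    return {mdr_name: sep.join(record[n] for n in local_names)}


def decompose(record, local_name, mdr_names, sep=" "):
    """Decomposition: split one local element into several standard elements."""
    parts = record[local_name].split(sep, len(mdr_names) - 1)
    return dict(zip(mdr_names, parts))


def rearrange(record, order):
    """Rearrangement: emit the same elements in a different order."""
    return {name: record[name] for name in order}


author = {"FirstName": "Gil-Dong", "LastName": "Hong"}
print(compose(author, ["FirstName", "LastName"], "AuthorName"))
print(decompose({"Name": "Gil-Dong Hong"}, "Name", ["FirstName", "LastName"]))
print(list(rearrange(author, ["LastName", "FirstName"])))
```

Composition and decomposition are inverses of each other only when the separator does not occur inside the individual values, which is one reason a real converter would need the mapping information to state how values are joined.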

4. An MSDL and message conversion system

In order for them to be independent of any changes in the metadata information exchange, message conversion systems should control the mapping information between the metadata and mapping rules separately. In this section, a semantic representation language of metadata based on eXtensible Mark-up Language (XML), which is referred to as Metadata Semantics Description Language (MSDL), is defined to control the mapping information between the metadata independently. The structure of the message conversion system is described based on the MSDL and the message conversion process.

4.1. MSDL

[Figure 2: conversion chain (string) 1.0 cm → (float) 1.0 cm → (float) 10.0 mm → (int) 10 mm → (string) 10 mm. The steps are labeled, in order, data type conversion, measurement unit conversion, data type conversion, and data type conversion; the string values are marked (Representation) and the numeric values (Semantic).]

Fig. 2. Example of conversion of data type and measurement unit.

Table 3. Basic statements used for the definition of functions

    same(x, y) ← x = y
    get_element(S, a) ← extract a from S
    set_element(S(x), a) ← link a to S(x)
    z = +(x, y) ← z means x and y

Table 4. Mapping rules used for resolving structural inconsistencies

    Function 1. Substitution rule
    set_element(MDR(y), get_element(schema, x))
        ← same(get_element(MDR, y), get_element(schema, x))

    Function 2. Composition rule
    set_element(MDR(z), +(get_element(schema, x), get_element(schema, y)))
        ← same(get_element(schema, z), +(get_element(MDR, x), get_element(schema, y)))

    Function 3. Decomposition rule
    set_element(MDR(x'), x) and set_element(MDR(y'), y)
        ← same(get_element(schema, z), +(get_element(MDR, x'), get_element(MDR, y')))

    Function 4. Rearrangement rule
    set_element(MDR(x), y') and set_element(MDR(y), x')
        ← same(get_element(MDR, x), get_element(schema, y'))
          and same(get_element(MDR, y), get_element(schema, x'))

MSDL is an XML-based language which is used to describe those differences in the semantics, representation, and structure of the metadata, which are necessary for exchanging the semantics between different metadata. The structure of MSDL is shown in Figure 3.

An MSDL message has a root element <MSDL>. The MSDL information is classified into three components, viz. a Name space, a MAP, and a CodeSet. The name space part uniquely represents the target data elements, the <MAP> represents the mapping relation between the data elements and the <CodeSet> represents the mapping of the code set being used. The name space area is used for the standard data elements of the MDR and the data elements in the schema of the target metadata being converted, in order to prevent the possibility of redundancy between the data elements used in the other metadata schemas.

The <MAP> contains the information on the semantics, representation, and structural differences which are used to convey the semantics between data elements. The <MAP> is the basic unit of information used to represent the mapping between data elements and consists of the information required to create a single semantics between them. MSDL describes the mapping relation with regard to all of the data elements of two different heterogeneous metadata schemas through multiple <MAP>s. The information in the <MAP> consists of the standard data element information, the data element information of the target metadata schema being converted, and related inconsistency information.

Both the standard data element and the target data element describe the actual name of a data element. Thus, the identifiers which can be used to identify it uniquely are its data type and the measurement unit of the data element if one exists. Furthermore, in the case where the data type of a data element written in <MAP> is a code set type, the corresponding code set name and catalogue is stored as a sequence in the <CodeSet> forming a mapping array.

Table 5 shows a part of the XML schema used to define the MSDL.
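To make the structure concrete, the following sketch reads a small MSDL-like document and lists its mappings. The element names msdl, mdrNamespace, localNamespace, map, mdrElement, localElement and mappingRule follow the schema fragment in Table 5; everything else is an assumption made for illustration, including the namespace URIs, the element paths, and the nesting of a mappingType element inside mappingRule, whose content the schema fragment does not show.

```python
import xml.etree.ElementTree as ET

# Hypothetical MSDL instance; values and the <mappingType> nesting are assumed.
MSDL_TEXT = """
<msdl>
  <mdrNamespace>urn:example:mdr</mdrNamespace>
  <localNamespace>urn:example:company-a</localNamespace>
  <map>
    <mdrElement>AuthorName</mdrElement>
    <localElement>Product/Book/FirstName</localElement>
    <localElement>Product/Book/LastName</localElement>
    <mappingRule><mappingType>composition</mappingType></mappingRule>
  </map>
  <map>
    <mdrElement>Title</mdrElement>
    <localElement>Product/Book/BookTitle</localElement>
    <mappingRule><mappingType>substitution</mappingType></mappingRule>
  </map>
</msdl>
"""


def read_maps(msdl_text):
    """Return one dictionary per <map>: standard names, local paths, rule name."""
    root = ET.fromstring(msdl_text)
    maps = []
    for m in root.findall("map"):
        maps.append({
            "mdr": [e.text for e in m.findall("mdrElement")],
            "local": [e.text for e in m.findall("localElement")],
            "rule": m.findtext("mappingRule/mappingType"),
        })
    return maps


for entry in read_maps(MSDL_TEXT):
    print(entry["rule"], entry["local"], "->", entry["mdr"])
```

Because the mapping information lives in the document rather than in the converter, a schema change would only require a new MSDL description, which is the separation the section argues for.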

4.2. The structure of the message conversion system

The proposed message conversion system performs automatic conversion when a user acknowledges the data contained in an XML message. There are four major parts of this system, viz. the MDR, the MSDL registry, a user interface, and the message converter.

The overall structure of the message conversion system is shown in Figure 4. The MDR is designed based on the specification described in the ISO/IEC 11179 standard and it registers and controls the standard data elements, which constitute the basic items needing to be converted. A data element in a specified timeframe is distributed in the form of an XML message, which is controlled by the MDR. When users create an MSDL description, the data elements of the MDR distributed at specified timeframes should be used. Furthermore, its version should be explicitly stated, to prevent potential confusion due to some future changes in the standard data elements.

The MSDL registry plays the role of controlling and distributing the MSDL descriptions that contain the mapping information of the standard data element and the data element of the target metadata being converted. In the case where there is a change in the metadata schema, the user should re-register the modified MSDL and XML schema in the MSDL registry so as to reflect this change.

The user interface can be classified into four different kinds: the MDR user interface used for the search of the standard data element, the MSDL generator used for the generation of the MSDL document, the message generator used to convert the user's own data to the XML message format and the message converter interface to exchange data through a message converter.

The message converter contains the mapping rules used to resolve representational and structural inconsistencies. It converts an input source message to a target message and returns the result to the user. The detailed message conversion process is presented in the next section.

    MSDL
      Name space of MDR
      Name space for schema of target metadata to be converted
      MAP (Data element mapping)
        Information of standard data elements defined in MDR
        Information of target data elements to be converted
        Information for resolving inconsistency
      ... // MAP list
      CodeSet (Mapping information between MDR and code set of target metadata schema to be converted)
      ... // CodeSet list

Fig. 3. Structure of MSDL.

4.3. The XML message conversion process

Message conversion is performed using five categories of input information, viz. the source message, the XML schema of the source message, the MSDL of the source message, the XML schema of the target message and the MSDL descriptions of the target message.

The message converter applies the mapping rules in two steps in order to resolve any inconsistencies and the process is described in Figure 5. In step 1, the source message is converted to a standard message that is configured using the standard data elements. The message converter receives the source message that the user inputs through the message converter interface. The MSDL descriptions of the source message and the XML schema needed for conversion are requested from the MSDL registry, as shown in Figure 4.

The message converter maps the data elements contained in each <MAP> of the MSDL to the data elements of the target message. In the case where a structural inconsistency is encountered, the conversion is done by applying the mapping rules defined in Section 3.3 based on the names of the mapping rules listed in the <MAP>. In the case of representational inconsistencies, after the structural inconsistency has been taken care of, the mapping rules in Section 3.2 are applied and the conversion to the format of the standard data elements that are assigned in the <MAP> is performed. Thus, a standard message is generated after the mapping of the data elements in each <MAP> has been completed.

In step 2, the standard message generated in step 1 is converted to the target message that the user initially requested. The standard message is used instead of the source message. The only difference between step 1 and step 2 is that the operations of composition and decomposition are applied in the reverse order when applying the mapping rules. The remainder of this process is similar to that described in step 1.

Table 5. MSDL schema based on XML

    . . .
    <xs:element name="msdl">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="mdrNamespace" type="xs:string"/>
          <xs:element ref="localNamespace"/>
          <xs:element ref="map" maxOccurs="unbounded"/>
          <xs:element ref="CodeSet" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
    . . .
    <xs:element name="map">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="mdrElement" type="elementIDType" maxOccurs="unbounded"/>
          <xs:element name="localElement" type="elementPathType" maxOccurs="unbounded"/>
          <xs:element ref="mappingRule"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
    . . .
    <xs:simpleType name="mappingTypeType">
      <xs:restriction base="xs:string">
        <xs:enumeration value="substitution"/>
        <xs:enumeration value="composition"/>
        <xs:enumeration value="decomposition"/>
      </xs:restriction>
    </xs:simpleType>
    . . .

When step 2 is completed, the target message is returned to the user interface, as shown in Figure 5. The message conversion process resolves structural inconsistencies by applying the mapping rules in a two-step process.

[Figure 4: components MDR, MSDL Registry, Message Converter, MDR User Interface, MSDL Generator, Message Generator, and Converter Interface; labeled flows Data Element Searching, Standard Data Element, MSDL–XML Schema Pair, Source Message, and Target Message.]

Fig. 4. Structure of the proposed message conversion system.

[Figure 5: Source Message (XML) → Message Converter Step 1 → S-Message (Standard Message) → Message Converter Step 2 → Target Message (XML). Step 1 uses the Source Message Schema (XML) and the MSDL between Source Message & S-Message; step 2 uses the Target Message Schema (XML) and the MSDL between S-Message & Target Message.]

Fig. 5. Message conversion process.

4.4. Example of the conversion of an XML message

An example of a simple solution to the problem of structural inconsistencies using the MSDL with a relevant source is presented.

Company A requests a new book from publishing company B in order to sell it. In order to obtain the necessary information on the newly published book, companies A and B agree to exchange their data based on a data standard defined in the MDR. Therefore, they individually prepare the MSDL descriptions for the XML documents to be used for this purpose. For the sake of convenience, let us assume that the message from company A is Message-A, the message converted using the standard data elements of the MDR is S-Message, and the message converted to the format of company B is Message-B. All of these messages contain the title of a book, the author name, and the price information.

The XML schemas of Message-A, S-Message, andMessage-B are shown in Figure 6.

Most of the items can be converted by the substitu-tion operation, which is the basic mapping rule.However, FirstName and LastName need to becombined through the composition operation in orderto derive the name of the author which is to be repre-sented as an AuthorName in the S-Message.

A part of the MSDL descriptions for Message-A isshown in Table 6, in order to demonstrate the processof deriving the author’s name using the compositionrule. If it is mapped to the other data element throughsubstitution, the <mappingType> is described as ‘sub-stitution’. The other parts of the message are omittedsince the process of conversion is similar.

If the instance of Message-A is the same as that of (a) in Figure 6, the first step of the XML message conversion process is performed using the MSDL descriptions of Message-A. Figure 6(b) represents the standard message that is the result of the first stage of conversion. The standard message is temporarily stored until the second stage of message conversion has been completed.

The process of XML message conversion is completed after the second stage of conversion is finished. This step uses the standard message shown in Figure 6(b) and the MSDL descriptions prepared by company B to generate the final message, Message-B, which is shown in Figure 6(c).
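The two-stage process described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the rule tables `A_TO_STANDARD` and `STANDARD_TO_B` are hypothetical stand-ins for the two MSDL descriptions, namespace prefixes are omitted, and the delimiter ", " (comma plus space) is chosen only to match the spacing shown in Figure 6(b).

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for company A's MSDL description:
# (standard element, local element paths, mapping type, delimiter).
A_TO_STANDARD = [
    ("Title", ["Book/BookTitle"], "substitution", ""),
    ("AuthorName", ["Book/FirstName", "Book/LastName"], "composition", ", "),
    ("Price", ["Price"], "substitution", ""),
]

# Hypothetical stand-in for company B's MSDL description:
# (standard element, path of the local element inside Message-B).
STANDARD_TO_B = [
    ("Title", "eBook/BookName"),
    ("AuthorName", "eBook/Name"),
    ("Price", "Price"),
]


def to_standard(source_root, rules):
    """Stage 1: build the S-Message from a source message."""
    standard = ET.Element("Product")
    for name, paths, mapping_type, delimiter in rules:
        values = [source_root.findtext(path, default="") for path in paths]
        if mapping_type == "composition":
            text = delimiter.join(values)      # combine several local elements
        elif mapping_type == "substitution":
            text = values[0]                   # copy a single local element
        else:
            raise ValueError(f"unsupported mapping type: {mapping_type}")
        ET.SubElement(standard, name).text = text
    return standard


def from_standard(standard_root, target_root_tag, rules):
    """Stage 2: build the target message from the S-Message."""
    target = ET.Element(target_root_tag)
    for name, path in rules:
        *parent_tags, leaf_tag = path.split("/")
        parent = target
        for tag in parent_tags:                # create intermediate elements once
            child = parent.find(tag)
            if child is None:
                child = ET.SubElement(parent, tag)
            parent = child
        ET.SubElement(parent, leaf_tag).text = standard_root.findtext(name, default="")
    return target


MESSAGE_A = """
<Product>
  <Book>
    <BookTitle>XML Book</BookTitle>
    <FirstName>Gil-Dong</FirstName>
    <LastName>Hong</LastName>
  </Book>
  <Price>$15.00</Price>
</Product>
""".strip()

s_message = to_standard(ET.fromstring(MESSAGE_A), A_TO_STANDARD)
message_b = from_standard(s_message, "Sample", STANDARD_TO_B)
print(ET.tostring(message_b, encoding="unicode"))
```

Because both rule tables are plain data, a change in either company's schema would only touch the corresponding table, which mirrors the role the paper assigns to the MSDL descriptions.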

5. Implementation and comparative evaluation

This section describes the implementation of the proposed system and presents the results of a comparative evaluation.

5.1. Implementation

The XML message conversion system was implemented with the following capabilities: searching for standard data elements, message creation, and message conversion. ASP.NET was used as the development platform and Windows 2000 as the operating system. SQL Server 2000 was used as the database management system, in order to ensure good compatibility and tight integration with ASP.NET.

The basic data types and code sets of the MDR are provided in XML format. Using a search browser, the XML message is identified and translated. Once the standard data element has been compared with the data element to be converted, the mapping between these two elements is created by the MSDL generator. At the same time, the MSDL description is generated and the XML schema of the metadata schema is registered.

(a) Structure and source of Message-A

Structure:

    Product
      Book
        BookTitle
        FirstName
        LastName
      Price

Source:

    <?xml version="1.0" encoding="UTF-8"?>
    <locala:Product>
      <locala:Book>
        <locala:BookTitle>XML Book</locala:BookTitle>
        <locala:FirstName>Gil-Dong</locala:FirstName>
        <locala:LastName>Hong</locala:LastName>
      </locala:Book>
      <locala:Price>$15.00</locala:Price>
    </locala:Product>

(b) Structure and source of S-Message

Structure:

    Product
      Title
      AuthorName
      Price

Additional labels in the figure: Product title, Author, Price.

Source:

    <?xml version="1.0" encoding="UTF-8"?>
    <Product>
      <Title>XML Book</Title>
      <AuthorName>Kil-Dong, Hong</AuthorName>
      <Price>$15.00</Price>
    </Product>

(c) Structure and source of Message-B

Structure:

    Sample
      eBook
        BookName
        Name
      Price

Source:

    <?xml version="1.0" encoding="UTF-8"?>
    <localb:Sample>
      <localb:eBook>
        <localb:BookName>XML Book</localb:BookName>
        <localb:Name>Kil-Dong, Hong</localb:Name>
      </localb:eBook>
      <localb:Price>$15.00</localb:Price>
    </localb:Sample>

Fig. 6. Structures and sources of Message-A, Standard Message, and Message-B.

Given the format of the target message, the message converter requests the MSDL description and XML schema from the MSDL registry. The requested message is converted using the two-stage process described above. Once the conversion is completed, the system checks the correctness of the message using the XML schema of the target document. The final message is returned to the user. Figure 7 depicts the user interface screen of the message converter that was implemented.
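The system described here checks the converted message against the XML schema of the target document. Python's standard library has no XML Schema validator, so the sketch below swaps in a much weaker structural check: it only confirms that a set of expected element paths is present. The function name `missing_paths` and the list of required paths are illustrative assumptions; a real deployment would run a proper schema validator instead.

```python
import xml.etree.ElementTree as ET


def missing_paths(message_xml, required_paths):
    """Return the required element paths that are absent from the message.

    This is a stand-in for XML Schema validation, not an equivalent of it:
    data types, cardinality and element order are not checked.
    """
    root = ET.fromstring(message_xml)
    return [path for path in required_paths if root.find(path) is None]


# Hypothetical required paths for a Message-B style document (no namespaces).
REQUIRED_B = ["eBook/BookName", "eBook/Name", "Price"]

complete = (
    "<Sample><eBook><BookName>XML Book</BookName>"
    "<Name>Kil-Dong, Hong</Name></eBook><Price>$15.00</Price></Sample>"
)
incomplete = "<Sample><eBook><BookName>XML Book</BookName></eBook></Sample>"

print(missing_paths(complete, REQUIRED_B))
print(missing_paths(incomplete, REQUIRED_B))
```

An empty list corresponds to the "conversion succeeded" status that the interface reports; a non-empty list names what the converted message is missing.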

The source message is loaded using the ‘Load Source Message’ button, and the format of the target message is chosen, with the possible formats being ONIX, Dublin Core or a local standard, i.e. a standard for a particular company. Once the target message is chosen, the converter shows the target message in the text area on the right-hand side of the screen. The user is notified as to whether the conversion process has succeeded or not via the text field ‘Status Information’. Once the correctness of the created target message has been verified, the converted message can be saved by clicking on the ‘Save Target Message’ button.

5.2. Comparative evaluation

To evaluate the performance of the message converter, bibliography information is used as a benchmark for data sharing and exchange. In the field of bibliography information, various standard metadata schemas such as ONIX, Dublin Core and MARC are used.

In this paper, the instances of ONIX, Dublin Core, and a randomly selected metadata schema (Random schema) were targeted and applied to the XML message conversion system. For the selection of the data elements used for the sample, the most frequently used ones in the instances of ONIX and Dublin Core and the sample provided in the ONIX specification were selected. After creating the appropriate MSDL description for each of the metadata schemas, the conversions were performed by creating the instances of the metadata schemas. Table 7 shows the experimental results.

The test results show that data conversion can be achieved successfully between data elements with identical semantics and with the conveyance of the data semantics. However, some data loss may occur in the case where there are data elements that do not exist in the appropriate form in the metadata schema.
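The data loss noted above arises when a source element has no counterpart in the target schema. A minimal sketch of how a converter might report such elements follows. The flat record, the `Publisher` element and the mapping dictionary are hypothetical and are not taken from the experiment.

```python
def convert_flat(source_elements, mapping):
    """Split a flat record into converted elements and names with no mapping."""
    converted = {}
    lost = []
    for name, value in source_elements.items():
        if name in mapping:
            converted[mapping[name]] = value   # element with identical semantics
        else:
            lost.append(name)                  # no counterpart: value is dropped
    return converted, lost


# Hypothetical source record; "Publisher" has no counterpart in the target.
record = {"BookTitle": "XML Book", "Price": "$15.00", "Publisher": "Example Press"}
mapping = {"BookTitle": "Title", "Price": "Price"}

converted, lost = convert_flat(record, mapping)
print(converted)
print(lost)
```

Reporting the unmapped names alongside the converted message makes the loss visible to the user rather than silent.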

The message conversion system proposed in this paper has the following three advantages.

• The data elements of the conversion target are mapped to the standard data elements provided in the MDR. Accordingly, the exchange or conversion of messages is accomplished regardless of the data receiver.

• The mapping relations between the data elements are kept separate from the system implementation. Thus, less maintenance is needed when there is a change in the metadata schema.

• The use of the MSDL allows for multiple data exchanges. These exchanges can be performed whenever the target is changed.


Table 6. MSDL for Message-A

    <msdl>
      . . .
      <map>
        <mdrElement>
          <elementName>AuthorName</elementName>
          <elementID>DE008201</elementID>
          <dataType>
            <nonCodeSetDataType>string</nonCodeSetDataType>
          </dataType>
          <measurementUnitID>MU000000</measurementUnitID>
        </mdrElement>
        <localElement>
          <elementName>FirstName</elementName>
          <elementPath>/Product/Book/FirstName</elementPath>
          <dataType>
            <nonCodeSetDataType>string</nonCodeSetDataType>
          </dataType>
          <measurementUnitID>MU000000</measurementUnitID>
        </localElement>
        <localElement>
          <elementName>LastName</elementName>
          <elementPath>/Product/Book/LastName</elementPath>
          <dataType>
            <nonCodeSetDataType>string</nonCodeSetDataType>
          </dataType>
          <measurementUnitID>MU000000</measurementUnitID>
        </localElement>
        <mappingRule>
          <mappingType>composition</mappingType>
          <delimiter>,</delimiter>
        </mappingRule>
      </map>
      . . .
    </msdl>
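To illustrate how a description of this shape could drive a converter, the sketch below reads one `<map>` entry and applies it to a message. It is an assumption-laden illustration rather than the paper's MSDL processor: only the element names visible in Table 6 are used, the elided parts are left out, namespace prefixes are omitted, and the helper names `read_rules`, `relative_path` and `apply_rule` are invented here. The delimiter is applied exactly as written (a bare comma).

```python
import xml.etree.ElementTree as ET

# One <map> entry in the shape of Table 6, reduced to the fields used below.
MSDL_FRAGMENT = """
<msdl>
  <map>
    <mdrElement>
      <elementName>AuthorName</elementName>
    </mdrElement>
    <localElement>
      <elementPath>/Product/Book/FirstName</elementPath>
    </localElement>
    <localElement>
      <elementPath>/Product/Book/LastName</elementPath>
    </localElement>
    <mappingRule>
      <mappingType>composition</mappingType>
      <delimiter>,</delimiter>
    </mappingRule>
  </map>
</msdl>
""".strip()


def read_rules(msdl_xml):
    """Turn every <map> entry into a small dictionary."""
    rules = []
    for entry in ET.fromstring(msdl_xml).findall("map"):
        rules.append({
            "target": entry.findtext("mdrElement/elementName"),
            "sources": [e.findtext("elementPath") for e in entry.findall("localElement")],
            "type": entry.findtext("mappingRule/mappingType"),
            "delimiter": entry.findtext("mappingRule/delimiter", default=""),
        })
    return rules


def relative_path(element_path, root_tag):
    """Convert '/Product/Book/FirstName' to 'Book/FirstName' for ElementTree,
    which does not accept absolute paths on an element."""
    parts = element_path.strip("/").split("/")
    if parts[0] != root_tag:
        raise ValueError(f"path {element_path!r} does not start at <{root_tag}>")
    return "/".join(parts[1:]) or "."


def apply_rule(rule, message_root):
    """Compute the value of the standard data element named by the rule."""
    values = [
        message_root.findtext(relative_path(path, message_root.tag), default="")
        for path in rule["sources"]
    ]
    if rule["type"] == "composition":
        return rule["delimiter"].join(values)
    if rule["type"] == "substitution":
        return values[0]
    raise ValueError(f"unsupported mapping type: {rule['type']}")


message_a = ET.fromstring(
    "<Product><Book><FirstName>Gil-Dong</FirstName>"
    "<LastName>Hong</LastName></Book></Product>"
)
rules = read_rules(MSDL_FRAGMENT)
print(rules[0]["target"], "=", apply_rule(rules[0], message_a))
```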


The existing systems which are commonly employed for data exchange using metadata are X-MAP [11], proposed by David Wang, and BizTalk [13]. The X-MAP system is a semi-automated system which can be used to connect the semantics between the schema elements of different applications in the case of multiple heterogeneous systems that exchange data in the XML format. In the case of BizTalk, a Microsoft product, the mapping between schemas is created directly using a BizTalk mapper.

However, the message conversion system proposed in this paper only requires changes in the MSDL to reflect related metadata schema changes, and does not require any modification or redesign of the whole system. Consequently, the time and effort required for system maintenance and update are greatly reduced.


Fig. 7. Snapshot of XML message conversion interface.

Table 7. Experiment result

    Classification             Domain-dependent standard                Non-standard
    Kind of metadata           ONIX                 Dublin Core         Random schema
    Number of data elements    44                   9                   7
    Number mapped to others    DC (9)               ONIX (9)            ONIX (7)
    (number of data elements   Random schema (7)    Random schema (6)   Random schema (6)
    with same semantic)
    Conversion result          DC (9)               ONIX (9)            ONIX (7)
    (number of data elements   Random schema (7)    Random schema (6)   Random schema (6)
    with same semantic)


Table 8 shows a comparison of the proposed system and the existing systems. In Table 8, the standardization item indicates whether standardized data elements are used or not. One of the benefits of following such a standard is that it prevents additional data inconsistency issues. The degree of automation is evaluated according to the method of data delivery. As the proposed message conversion system uses XML-based MSDL for the delivery of messages between heterogeneous systems, its flexibility and independency are superior to those of the existing systems.

6. Conclusion

The existing metadata gateways suffer from the disadvantage of being relatively time consuming to maintain when changes occur in the metadata schema, since the system fully depends on the metadata schema. To overcome this disadvantage, a system needs to be developed that separates the mapping information from the mapping protocol, in order to resolve inconsistencies between the metadata.

The MSDL proposed in this paper controls the mapping information between heterogeneous metadata independently in the metadata gateway systems. The proposed system can control the mapping information independently by means of the MSDL description. Hence, in comparison with the existing systems, it is easy to maintain and modify the proposed system, regardless of the changes which occur in the metadata schema. Also, because the MSDL description yields a temporary metadata schema, multiple conversions can be performed: there is no limit on the number of data exchange targets or on the frequency of the data exchanges.

When the mapping relations between the metadata are written in MSDL, the basic standard data element of the MDR specified in the ISO/IEC 11179 standard is in fact used. These standardized data elements, which are created through the metadata registry, can form the basis for data that will be created in the future. Therefore, metadata inconsistencies can be reduced through metadata standardization.

However, the practical registration and control of the standard data elements of the MDR are not discussed here, since they are beyond the scope of this paper. In a future study, a CASE tool which can be used to control the ISO/IEC 11179 MDR will be developed. Also, further research into the possibility of using the proposed system for e-Commerce or knowledge information exchange is envisaged.

References

[1] D.-K. Baik, Trends of information communication and standard technology: data standardization and metadata registry, TTA Journal 71 (2000) 120–7.

[2] C. Blanchi and J. Petrone, Distributed interoperable metadata registry, D-Lib Magazine 7(12) (2001). Available at: www.dlib.org (accessed 8 April 2005).

[3] G. Wiederhold, Intelligent integration of information. In: Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data (Washington D.C., 1993) 434–7.

[4] W. Kim and J. Seo, Classifying schematic and data heterogeneity in multi-database systems, IEEE Computer 24(12) (1991) 12–18.

[5] C. Parent and S. Spaccapietra, Issues and approaches of database integration, Communications of the ACM 41(5) (1998) 166–78.

[6] R.D. Holowczak and W.-S. Li, A survey on attribute correspondence and heterogeneity metadata representation, First IEEE Metadata Conference (IEEE, Silver Spring, 1996).

[7] ISO/IEC JTC 1/SC 32, ISO/IEC 11179: Information Technology – Metadata Registries (MDR) – Part 1–6 (2003). Available at: http://jtc1sc32.org/ (accessed 22 May 2005).

[8] DCMI (Dublin Core Metadata Initiative), Dublin Core Metadata Element Set, Version 1.1: Reference Description (2004). Available at: http://dublincore.org (accessed 8 April 2005).

Table 8. Qualitative comparison

    Comparative item    X-MAP             BizTalk      Proposed system
    Standardization     N/A               N/A          ISO/IEC 11179
    Automation          Semi-automatic    Automatic    Semi-automatic
    XML support         Support           Support      Support
    Flexibility         Middle            Low          Middle
    Independency        Dependent         Dependent    Independent

[9] ONIX (ONline Information eXchange) International, ONIX for Books Product Information Message – Release 2.1 (2003). Available at: www.editeur.org (accessed 8 April 2005).

[10] W3C, eXtensible Markup Language (XML) 1.1 (2003). Available at: www.w3.org/XML/ (accessed 8 April 2005).

[11] D. Wang, Automated Semantic Correlation between Multiple Schema for Information Exchange (unpublished manuscript, MIT, Cambridge, MA, May 2000).

[12] The Library of Congress, MARC 21 Format (2004). Available at: www.loc.gov (accessed 8 April 2005).

[13] Microsoft, Microsoft BizTalk (2004). Available at: www.microsoft.com (accessed 8 April 2005).

[14] M.R. Genesereth, Knowledge Interchange Format (KIF) (1998). Available at: http://logic.stanford.edu/kif/ (accessed 8 April 2005).
