
[IEEE Fifth Annual Conference on Communication Networks and Services Research - Fredericton, NB, Canada (2007.05.14-2007.05.17)] Fifth Annual Conference on Communication Networks and



Toward Assessing Data Quality of Ontology Matching on the Web

Olga Vorochek
Kharkiv National University of Radio-Electronics, Ukraine
[email protected]

Yevgen Biletskiy
University of New Brunswick, Canada
[email protected]

Abstract

The Semantic Web is a leading web technology that enables semantic interoperability between structurally and semantically heterogeneous web sources and web users. Ontologies are the key to semantic interoperability and the main vehicle for the development of the Semantic Web. One of the most challenging and important tasks of ontology engineering is the integration of ontologies with the purpose of building a common ontology for all web sources and consumers in a domain. The present paper describes an approach to assessing the data quality of ontology matching that allows evaluating the correctness of mapping concepts and relationships from one ontological fragment to another. The purpose of correct mapping is to find identical and synonymous concepts and relationships in ontologies, facilitating their integration into a common ontology, which serves as a basis for attaining interoperability.

1. Introduction

The present work is related to achieving semantic interoperability between information sources and consumers by matching their ontological fragments. There are many approaches to achieving semantic interoperability as well as to ontology matching [1, 2, 3, 4, 5]. This work focuses on finding semantically identical and/or synonymous ontological concepts, and the relationships between them, in potentially interoperable ontological fragments. If two fragments of different ontologies are semantically equivalent (identical or synonymous), a link of interoperability is established between them. In order to find truly equivalent ontological fragments, data quality factors are very important because they make it possible to qualify the results of ontology integration.

This work uses the ontological graph (ontograph) as a formalism to represent ontologies [5, 6] as follows:

$$G_o = (V_o, E_o), \tag{1}$$

where $V_o$ is a set of vertices encoding ontological concepts and $E_o$ is a set of edges encoding relationships between these concepts. For example, in a simple ontological fragment (figure 1) describing a product database Product (Name, Price): $V_o = \{\text{Product}, \text{Name}, \text{Price}\}$, $E_o = \{\text{nameOf}, \text{priceOf}\}$.
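As an illustration only, the ontograph of equation (1) can be held in a small data structure. The Python sketch below is not part of the formalism in [5, 6]: the class name `Ontograph`, the choice to store each edge as a (source, label, target) triple, and the direction of the edges are all assumptions made here.

```python
from dataclasses import dataclass, field


@dataclass
class Ontograph:
    """Hypothetical container for G_o = (V_o, E_o)."""
    vertices: set = field(default_factory=set)  # V_o: ontological concepts
    edges: set = field(default_factory=set)     # E_o: (source, label, target) triples

    def add_edge(self, source, label, target):
        # Adding a relationship also registers both concepts as vertices.
        self.vertices.update({source, target})
        self.edges.add((source, label, target))

    def edge_labels(self):
        # The relationship names only, as E_o is written in the text.
        return {label for _, label, _ in self.edges}


# The Product (Name, Price) fragment; the edge direction is an assumption.
g = Ontograph()
g.add_edge("Name", "nameOf", "Product")
g.add_edge("Price", "priceOf", "Product")

print(sorted(g.vertices))       # ['Name', 'Price', 'Product']
print(sorted(g.edge_labels()))  # ['nameOf', 'priceOf']
```

Storing edges as triples rather than bare labels keeps the endpoints available, which later matching steps would need.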

Figure 1 – A fragment of a graph-based ontology (vertices: Product, Name, Price; edges: nameOf, priceOf)

Data Quality (DQ) is a complex multidimensional concept, and each dimension is characterized by an attribute or a set of attributes. There are many dimensions of DQ as well as many methods for their definition and estimation [7, 8]. The present work focuses on the specific task of assessing the DQ of matching graph-based ontologies for the purpose of their integration. DQ in this work is considered in three aspects:

− DQ attributes – single DQ units that are important for graph-based ontologies (accuracy, completeness, consistency, unambiguity, significance, timeliness and reliability);

− DQ dimensions – define an ontology’s properties during various phases of its life cycle, which provides a complex estimation of DQ, minimizes and structures information, and identifies semantic problems that cause a low level of DQ;

− DQ categories – each category characterizes a situation of the most frequent use of a set of DQ dimensions; therefore, there is a hierarchy of DQ.

Fifth Annual Conference on Communication Networks and Services Research (CNSR'07) 0-7695-2835-X/07 $20.00 © 2007

Section 2 of this paper describes the estimation of several DQ attributes, section 3 illustrates the use of these estimations to estimate DQ dimensions and categories, and section 4 concludes this work, which is research in progress.

2. An approach to assessing data quality in graph-based ontologies

The proposed evaluation of data quality attributes considers their meanings in the proposed graph-based ontology matching. Let us define a fragment $O_i$ of an ontology $i$ and the fragment’s potential match $O_j$ of another ontology $j$. In case of similarity between these fragments we can determine a sub-fragment $O_{i \leftrightarrow j}$ that is common to both ontological fragments, meaning that all concepts and relationships from this sub-fragment are equivalent (identical or synonymous) to corresponding concepts and relationships in both fragments $O_i$ and $O_j$. We can therefore define several useful values for the ontological graph representing the fragments mentioned above:

− $\sum V_{O_i}$ and $\sum V_{O_j}$ – number of concepts (vertices of the ontological graph) in the ontological fragments $O_i$ and $O_j$ respectively;

− $\sum E_{O_i}$ and $\sum E_{O_j}$ – number of relationships (edges of the ontological graph) between concepts in the ontological fragments $O_i$ and $O_j$ respectively;

− $\sum V_{O_{i \leftrightarrow j}}$ – number of matched concepts from the ontological fragments $O_i$ and $O_j$;

− $\sum E_{O_{i \leftrightarrow j}}$ – number of matched relationships between concepts from the ontological fragments $O_i$ and $O_j$.
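Once a match between the two fragments is known, the quantities above reduce to counting. The following Python sketch assumes that each fragment is given as a set of (source, label, target) edges and that the match is supplied as dictionaries from concepts and relationship labels of O_i to their equivalents in O_j. This input format, the function name `count_matches`, and the example fragments are hypothetical and not taken from the paper.

```python
def count_matches(edges_i, edges_j, concept_map, label_map):
    """Count vertices and edges of two fragments and of their common sub-fragment.

    edges_i, edges_j: sets of (source, label, target) triples (assumed format).
    concept_map, label_map: O_i name -> equivalent O_j name (assumed to be given).
    Vertices are derived from the edges, so isolated concepts are not counted.
    """
    vertices_i = {v for s, _, t in edges_i for v in (s, t)}
    vertices_j = {v for s, _, t in edges_j for v in (s, t)}

    # A concept is matched if its equivalent exists in the other fragment.
    matched_vertices = {v for v in vertices_i if concept_map.get(v) in vertices_j}

    # An edge is matched if its translated triple exists in the other fragment.
    matched_edges = {
        (s, l, t)
        for s, l, t in edges_i
        if (concept_map.get(s), label_map.get(l), concept_map.get(t)) in edges_j
    }
    return {
        "V_i": len(vertices_i), "V_j": len(vertices_j),
        "E_i": len(edges_i), "E_j": len(edges_j),
        "V_ij": len(matched_vertices), "E_ij": len(matched_edges),
    }


# Hypothetical fragments: Product(Name, Price) and Item(Title, Cost, Weight).
edges_i = {("Name", "nameOf", "Product"), ("Price", "priceOf", "Product")}
edges_j = {("Title", "titleOf", "Item"), ("Cost", "costOf", "Item"),
           ("Weight", "weightOf", "Item")}
concept_map = {"Product": "Item", "Name": "Title", "Price": "Cost"}
label_map = {"nameOf": "titleOf", "priceOf": "costOf"}

counts = count_matches(edges_i, edges_j, concept_map, label_map)
print(counts)
# {'V_i': 3, 'V_j': 4, 'E_i': 2, 'E_j': 3, 'V_ij': 3, 'E_ij': 2}
```

How the equivalence dictionaries are obtained (for example, from a thesaurus or from an expert) is outside the scope of this sketch.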

Accuracy (A) of matching one ontological fragment with another is evaluated as the ratio of the number of relationships between concepts that are accurately mapped from one ontological fragment to the other to the total number of relationships (2). Accuracy of matching a concept is evaluated as the ratio of the number of relationships with other concepts in the ontology to the corresponding number in the match (3).

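$$A_{total} = \frac{\sum E_{O_{i \leftrightarrow j}}}{\sum E_{O_i} + \sum E_{O_j}} \tag{2}$$

$$A_{concept} = \frac{\sum E_{O_i}}{\sum E_{O_j}}, \quad \text{where } O_i \leftrightarrow O_j \tag{3}$$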
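To make the two accuracy measures concrete, the Python sketch below evaluates them for hypothetical counts. It reads equation (2) as the number of matched relationships divided by the sum of the relationships of both fragments, and equation (3) as the number of relationships of a concept in O_i divided by the number of relationships of its match in O_j. The function names and the numbers are assumptions chosen for illustration.

```python
from fractions import Fraction


def accuracy_total(e_matched, e_i, e_j):
    """Equation (2) as read here: matched relationships over all relationships."""
    return Fraction(e_matched, e_i + e_j)


def accuracy_concept(e_i_concept, e_j_concept):
    """Equation (3) as read here: relationships of a concept in O_i over those
    of its match in O_j."""
    return Fraction(e_i_concept, e_j_concept)


# Hypothetical counts: O_i has 2 relationships, O_j has 3, and 2 are matched.
print(accuracy_total(2, 2, 3))  # 2/5
# A concept with 2 relationships in O_i whose match has 3 relationships in O_j.
print(accuracy_concept(2, 3))   # 2/3
```

Exact fractions are used instead of floating-point numbers so that the results can be compared directly with the formulas.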

Completeness (C) represents the degree to which relationships participate in the definition of the properties of ontological concepts. Completeness of the common ontological fragment is evaluated as the ratio of the number of relationships between concepts that have a match in the other ontological fragment to the total number of relationships in each fragment (4). Completeness of a concept is evaluated as the ratio of the number of its relationships with other concepts that have a match in the other ontological fragment to the total number of its relationships with other concepts in the fragment (5, 6).

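$$C_{total} = \frac{\sum E_{O_{i \leftrightarrow j}}}{\sum E_{O_i}} + \frac{\sum E_{O_{i \leftrightarrow j}}}{\sum E_{O_j}} \tag{4}$$

$$C_{concept} = \frac{\sum E_{O_{i \leftrightarrow j}}}{\sum E_{O_i}} \tag{5}$$

$$C_{concept} = \frac{\sum E_{O_{i \leftrightarrow j}}}{\sum E_{O_j}} \tag{6}$$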
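As a worked illustration with hypothetical counts (not taken from any experiment), suppose $\sum E_{O_i} = 2$, $\sum E_{O_j} = 3$ and $\sum E_{O_{i \leftrightarrow j}} = 2$. Reading equation (4) as the sum of the two per-fragment ratios of the forms (5) and (6), the values would be:

```latex
\begin{align*}
\text{(5):}\quad & \frac{\sum E_{O_{i \leftrightarrow j}}}{\sum E_{O_i}} = \frac{2}{2} = 1, \\
\text{(6):}\quad & \frac{\sum E_{O_{i \leftrightarrow j}}}{\sum E_{O_j}} = \frac{2}{3}, \\
\text{(4):}\quad & C_{total} = \frac{2}{2} + \frac{2}{3} = \frac{5}{3}.
\end{align*}
```

Under this reading, $C_{total}$ ranges from 0 to 2, and it reaches 2 only when every relationship of both fragments is matched.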

Conformity (Con) is the degree of overlap between matched concepts in ontological fragments. Conformity of the whole ontological structure is evaluated as the ratio of the number of concepts that have a match in the other ontological fragment to the total number of concepts in each fragment (7). Conformity of a concept is evaluated as the ratio of the number of concepts that have a match in the other ontological fragment to the total number of concepts in the fragments (8).

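$$Con_{total} = \frac{\sum V_{O_{i \leftrightarrow j}}}{\sum V_{O_i} + \sum V_{O_j}} \tag{7}$$

$$Con_{concept} = \frac{\sum V_{O_i}^{link}}{\sum V_{O_i} + \sum V_{O_j}} \tag{8}$$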
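The total conformity can be computed directly from the two vertex sets and a concept match. The Python sketch below covers only equation (7), read as the number of matched concepts divided by the sum of the concept counts of both fragments. The function name `conformity_total`, the dictionary-based match, and the example concepts are assumptions introduced here.

```python
from fractions import Fraction


def conformity_total(vertices_i, vertices_j, concept_map):
    """Equation (7) as read here: matched concepts over all concepts of both fragments.

    concept_map maps a concept of O_i to its equivalent in O_j (assumed to be given).
    """
    matched = {v for v in vertices_i if concept_map.get(v) in vertices_j}
    return Fraction(len(matched), len(vertices_i) + len(vertices_j))


# Hypothetical fragments: three concepts in O_i, four in O_j, three of them matched.
v_i = {"Product", "Name", "Price"}
v_j = {"Item", "Title", "Cost", "Weight"}
mapping = {"Product": "Item", "Name": "Title", "Price": "Cost"}
print(conformity_total(v_i, v_j, mapping))  # 3/7

# Two identical fragments matched concept by concept.
identity = {v: v for v in v_i}
print(conformity_total(v_i, v_i, identity))  # 1/2
```

With this reading, a perfect match of two equally sized fragments yields 1/2 rather than 1, because the denominator counts the concepts of both fragments.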

Unambiguity (U) is the degree of overlap of ontology relationships. Unambiguity of a concept is determined as the ratio of the minimum number of links to the potential concept match to the maximum number of such links (9).

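$$U_{concept} = \frac{\min\left(\sum E_{O_{i \leftrightarrow j} \in i},\ \sum E_{O_{i \leftrightarrow j} \in j}\right)}{\max\left(\sum E_{O_{i \leftrightarrow j} \in i},\ \sum E_{O_{i \leftrightarrow j} \in j}\right)} \tag{9}$$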
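Equation (9) is a min-over-max ratio of two link counts, which can be sketched in a few lines of Python. The function name `unambiguity`, the argument names, and the handling of the case with no links at all are assumptions made for this illustration.

```python
from fractions import Fraction


def unambiguity(links_in_i, links_in_j):
    """Equation (9) as read here: the smaller link count over the larger one.

    links_in_i / links_in_j: number of links of the matched concept on the
    O_i side and on the O_j side (hypothetical argument names).
    """
    low, high = sorted((links_in_i, links_in_j))
    if high == 0:
        # The ratio is undefined when the concept has no links on either side.
        raise ValueError("the concept has no links in either fragment")
    return Fraction(low, high)


print(unambiguity(2, 3))  # 2/3
print(unambiguity(4, 4))  # 1
```

The value is 1 when both sides have the same number of links and approaches 0 as the counts diverge.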

Significance (S) is the degree of influence of relationships on forming the concept properties. Significance of a concept is determined as the ratio of the number of relationships of this concept to other concepts in an ontological fragment to the total number of relationships participating in integration (10).

$$S_{concept} = \frac{\sum E_{O_i}}{\sum E_{O_i} + \sum E_{O_j}} \tag{10}$$
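A short worked example, using hypothetical numbers, may help to read equation (10). Take the concept Product in a fragment $O_i$ where it has two relationships (nameOf and priceOf), and assume that the matched fragment $O_j$ contributes three relationships to the integration:

```latex
\begin{align*}
S_{concept} = \frac{\sum E_{O_i}}{\sum E_{O_i} + \sum E_{O_j}} = \frac{2}{2 + 3} = \frac{2}{5}.
\end{align*}
```

The value grows as the concept accounts for a larger share of the relationships taking part in the integration.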


Timeliness (T), or Currency, and Reliability (R) can be evaluated by an expert; if an expert is not available, the following assumptions can be made:

− Timeliness (T) of the ontological fragment with the shortest update interval is evaluated as 1, and timeliness of the other matched fragment is evaluated as the ratio of its update interval to the update interval of the first fragment;

− Reliability (R) depends on how familiar a user is with the matched ontological fragments: if the user uses (or trusts) both of them equally, then the reliability of both of them is 1; however, if one of them is used rarely, then its reliability is relatively lower.
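The timeliness assumption can be sketched in Python as follows. The sketch follows the wording literally: the fragment with the shortest update interval scores 1, and any other fragment scores its own interval divided by the shortest one. The function name `timeliness`, the use of days as the unit, and the example intervals are hypothetical.

```python
def timeliness(update_intervals):
    """Literal reading of the timeliness assumption.

    update_intervals: fragment name -> update interval, all in the same unit
    (days are assumed in the example below).
    """
    shortest = min(update_intervals.values())
    return {name: interval / shortest
            for name, interval in update_intervals.items()}


# Hypothetical intervals: O_i is updated weekly, O_j every four weeks.
scores = timeliness({"O_i": 7, "O_j": 28})
print(scores)  # {'O_i': 1.0, 'O_j': 4.0}
```

Under this literal reading a larger value indicates a less frequently updated fragment. The text does not say whether the values are normalized further, so none is applied here.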

3. Assessing dimensions and categories of data quality in graph-based ontologies

Combinations of attributes define the dimensions of data quality. These dimensions for ontologies are sets of participating attributes and are defined as follows:

− Functionality (F) – $\{T, R, A_{total}, C_{total}, Con_{total}\}$;

− Contextualization (Ctx) – $\{A_{concept}, C_{concept}, Con_{concept}, U_{concept}\}$;

− Completeness (Cfn) – $\{C_{total}, U_{concept}, S_{concept}\}$;

− Normalization (N) – $\{T, R, A_{total}, C_{total}, Con_{total}, U_{concept}, S_{concept}\}$;

− Interpretability (I) – $\{C_{concept}, U_{concept}\}$;

− Representation (Rp) – $\{A_{concept}, U_{concept}, S_{concept}\}$;

− Timeliness (Tl) – $\{T, R\}$;

− Cohesion (Ch) – $\{A_{concept}, Con_{concept}, R\}$.

The data quality dimensions can be integrated into the following categories:

− Characteristic – {Cfn, Ch, N};

− Contextual – {F, Ctx};

− Descriptive – {F, I, P}.
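The hierarchy of attributes, dimensions and categories can be written down as plain lookup tables, for instance to find out which attributes must be estimated before a given category can be assessed. The Python sketch below is one possible encoding. The identifiers `DIMENSIONS`, `CATEGORIES` and `attributes_for`, and the ASCII spellings of the attribute names (such as `A_total`), are assumptions. The symbol P in the Descriptive category is copied as listed; since no dimension with that symbol is defined, the sketch reports it as unresolved instead of guessing its meaning.

```python
# Dimension symbol -> set of participating attributes, as listed in the text.
DIMENSIONS = {
    "F":   {"T", "R", "A_total", "C_total", "Con_total"},
    "Ctx": {"A_concept", "C_concept", "Con_concept", "U_concept"},
    "Cfn": {"C_total", "U_concept", "S_concept"},
    "N":   {"T", "R", "A_total", "C_total", "Con_total", "U_concept", "S_concept"},
    "I":   {"C_concept", "U_concept"},
    "Rp":  {"A_concept", "U_concept", "S_concept"},
    "Tl":  {"T", "R"},
    "Ch":  {"A_concept", "Con_concept", "R"},
}

# Category name -> list of dimension symbols, as listed in the text.
CATEGORIES = {
    "Characteristic": ["Cfn", "Ch", "N"],
    "Contextual": ["F", "Ctx"],
    "Descriptive": ["F", "I", "P"],  # "P" is kept exactly as written
}


def attributes_for(category):
    """Return (attributes needed, dimension symbols that could not be resolved)."""
    attributes, unresolved = set(), []
    for dim in CATEGORIES[category]:
        if dim in DIMENSIONS:
            attributes |= DIMENSIONS[dim]
        else:
            unresolved.append(dim)
    return attributes, unresolved


attrs, missing = attributes_for("Contextual")
print(sorted(attrs), missing)
```

No aggregation of attribute values into a single score per dimension is shown, because the text defines dimensions only as sets of attributes.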

Contextual and descriptive categories are usually used during the process of building and integrating ontologies, and the characteristic category is usually used during the process of interoperation using the common ontology.

The proposed numerical assessment of data quality attributes, dimensions and categories allows evaluating the overall success of integrating ontologies into a common ontology that is used for achieving interoperability between structurally and semantically heterogeneous information sources and consumers. The quality of a common ontology is an important indicator of the effectiveness of interoperation using this ontology.

4. Conclusion

The present paper described the theoretical foundation of an approach to evaluating data quality attributes, dimensions and categories in graph-based ontologies, which are widely used in the (Semantic) Web. The presented approach evaluates the correctness of mapping concepts and relationships from one graph-based ontological fragment to another, allowing the evaluation of the overall success of integrating these ontologies into a common ontology, an important part of estimating the effectiveness of interoperation using this ontology. Future work on this approach concerns how to select attributes, dimensions and categories for evaluation in a particular scenario; computational complexity; and formalization of an intuitive approach to the evaluation of timeliness and reliability. The proposed approach requires further research, experimentation and evaluation; therefore, it is presented as a short paper (research in progress).

5. References

[1] Bressan, S., Goh, C., Levina, N., Madnick, S., Shah, A., Siegel, M. Context Knowledge Representation and Reasoning in the Context Interchange System. ACM Applied Intelligence, 2000, 13/2, pp. 165-180.

[2] Noy, N. Semantic Integration: A Survey of Ontology-based Approaches. SIGMOD Record, Special Issue on Semantic Integration, 33/4, 2004, pp. 65-70.

[3] Shvaiko, P., Euzenat, J. A Survey of Schema-based Matching Approaches. Journal on Data Semantics, LNCS, Springer, 3730/2005, pp. 146-171.

[4] Gomez-Perez, A., Corcho, O., Fernandez-Lopez, M. Ontological Engineering: with examples from the areas of Knowledge Management, e-Commerce and the Semantic Web (Advanced Information and Knowledge Processing), Springer, 2004, 415 pp.

[5] Biletskiy, Y., Boley, H., Zhu, L. A RuleML-Based Ontology for Interoperation between Learning Objects and Learners. UCFV Research Review, Issue 1, 2006: http://journals.ucfv.ca/ojs/rr/

[6] Mitra, P., Wiederhold, G., Kersten, M.L. A Graph-Oriented Model for Articulation of Ontology Interdependencies. In Proceedings of the International Conference on Extending Database Technology (EDBT), 2000, pp. 86-100.

[7] Wand, Y., Wang, R. Anchoring Data Quality Dimensions in Ontological Foundations. Communications of the ACM, November 1996, pp. 86-95.

[8] Pipino, L.L., Lee, Y.W., Wang, R.Y. Data Quality Assessment. Communications of the ACM, April 2002, vol. 45, pp. 211-218.
