
[IEEE 2006 IEEE Asia-Pacific Conference on Services Computing (APSCC'06) - Guangzhou, China (2006.12.12-2006.12.12)] 2006 IEEE Asia-Pacific Conference on Services Computing (APSCC'06)


Fuzzy Similarity From Conceptual Relations

Song Ling, Ma Jun, Lian Li, Chen Zhumin School of Computer Science & Technology, Shandong University, Jinan, 250061, China

[email protected]

Abstract

Ontology-based semantic similarity between concepts plays a prominent role at the semantic level, providing semantic information for web service discovery and composition. In this work we treat an ontology as a knowledge structure that specifies concepts and their semantic relations. We propose a fuzzy similarity measure that applies not only to atomic concepts linked by the inclusion relation but also to complex concepts linked by other semantic relations. The measure is a weak fuzzy similarity relation, which overcomes the limitations of existing measures that are equivalence relations. Furthermore, because it is based on shared information content, it reflects latent semantic relations between concepts better than those measures.

1. Introduction

Web service discovery and composition is one of the emerging research areas that exploits the service semantic information to reason about the compatibility and functionality of the services. For example, a parameter of service A is a sub-concept of a parameter of service B. In this type of semiautomatic discovery and composition system, the users would prefer to have a selection of “similar” parameters that could be potentially composed rather than one or two exact syntax matches.

Similarity plays an important role in several application fields, such as the discovery and composition of web services, question answering, recommender systems and information retrieval. Current matchmaking methods are inadequate at the semantic level. Several approaches have already been suggested for adding semantics to web services [1-2]. An ontology is a specification of a conceptualization of a knowledge domain: a controlled vocabulary that describes concepts and the relations between them in a formal way, with a grammar for using the concepts to express something meaningful within a specified domain of interest. Ontology-based semantic similarity between concepts therefore plays a prominent role in providing semantic information for matchmaking.

Semantic similarity, semantic relatedness and semantic distance are three frequently used terms [3]. Semantic relatedness covers not only similarity, such as synonymy (car–automobile) and hyponymy (dog–poodle), but also semantic dissimilarity, such as antonymy (wet–dry), and any kind of functional relationship or frequent association, such as meronymy (car–air bag) and cause–effect relationships. Semantic distance, in contrast, denotes the opposite of both semantic similarity and semantic relatedness.

In this paper we aim to derive a semantic similarity between ontological concepts, that is, a measure whose degree of similarity is proportional to how much the concepts x and y share or how close they are. Several existing approaches handle only atomic concepts linked by the inclusion relation. We propose an approach that deals not only with atomic concepts linked by the inclusion relation but also with complex concepts linked by all kinds of semantic relations. To measure the semantic similarity between concepts x and y, we view each concept as a fuzzy set built along its semantic paths. The similarity between x and y is then determined by comparing the two semantically extended fuzzy sets. The advantages of our measure are that it can mine latent semantic relations and that it describes conceptual semantic similarity more precisely and closer to human intuition.

The paper is organized as follows. In Section 2 we introduce a formal model for ontologies and a concept language. In Section 3 we give a brief overview of several relevant similarity measures and analyze their properties and limitations. In Section 4 we put forward a fuzzy similarity measure that is a weak fuzzy similarity relation.

2. Ontological Representations and Concept Language

Proceedings of the 2006 IEEE Asia-Pacific Conference on Services Computing (APSCC'06)0-7695-2751-5/06 $20.00 © 2006

2.1. Ontological Representations

To compare and measure semantic similarity between concepts in an ontology, one may consider semiotic levels. Researchers have defined a frame system as the underlying model for the ontology [4-6]; we therefore work on a slightly revised excerpt of the OKBC knowledge model:
Definition 1. (Core Ontology) A core ontology is a sign system O := (L; F; G; C; ROOT; H), which consists of:

A Lexicon: The lexicon consists of a set of terms (lexical entries) for concepts, LC, and a set of terms for relations, LR. Their union is the lexicon L:= LC∪LR.

Two reference functions: The reference functions F and G, with $F : 2^{L_C} \to 2^{C}$ and $G : 2^{L_R} \to 2^{R}$. F and G link sets of lexical entries {L_i} ⊂ L to the sets of concepts and relations they refer to, respectively. In general, one lexical entry may refer to several concepts or relations, and one concept or relation may be referred to by several lexical entries. Their inverses are $F^{-1}$ and $G^{-1}$.

A set of concepts C (classes in OKBC). For each c ∈ C there exists at least one statement in the ontology.

A particular top concept ROOT. ROOT is not in C, but in the taxonomy it lies above every other concept.

Concept hierarchy structure H: Concepts are taxonomically related by the acyclic, transitive relation $H \subset C \times (C \cup \{ROOT\})$. H(c1, c2) means that c1 is a sub-concept of c2. It holds that $\forall c \in C: H(c, ROOT)$.

Concept non-hierarchical structure R: Concepts are semantically related by relations $R \subset C \times C$. R(c1, c2) means that c1 has some semantic relation to c2.

2.2. Concept Language

ONTOLOG [7-8] is a concept algebra for integration, formalization, representation and reasoning. It defines a set of semantic relations that can be used for “attribution” (feature attachment) to form compound concepts. Formally, assume a set of atomic concepts A and a set of semantic relations R. Then the set of well-formed concepts L of the ONTOLOG language is defined recursively as follows:
Definition 2. (ONTOLOG language)
(1) If x ∈ A then x ∈ L.
(2) If x ∈ L, r_i ∈ R and y_i ∈ L for i = 1, 2, ..., n, then x[r_1:y_1, …, r_n:y_n] ∈ L.
For example, the sentence “the black cat is in the sofa” can be translated into the semantic expression sofa[LOC:cat[CHR:black]], and disorder[CBY:lack[WRT:vitaminD]] describes a “disorder caused by lack of vitamin D”.
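The recursive shape of Definition 2 can be illustrated with a small data structure. The following is only a sketch: the class name `Concept` and the helper `attribute` are hypothetical names introduced here, and the representation simplifies the definition by storing a single atom together with a flat list of attributions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Concept:
    """A well-formed term: an atom plus zero or more relation:value attributions."""
    atom: str
    # Each attribution is a (relation, concept) pair, e.g. ("CBY", Concept("lack")).
    attributions: Tuple[Tuple[str, "Concept"], ...] = ()

    def __str__(self) -> str:
        if not self.attributions:
            return self.atom  # rule (1): a bare atom is a concept
        inner = ",".join(f"{rel}:{value}" for rel, value in self.attributions)
        return f"{self.atom}[{inner}]"


def attribute(base: Concept, relation: str, value: Concept) -> Concept:
    """Rule (2): attach one more relation:value pair to an existing term."""
    return Concept(base.atom, base.attributions + ((relation, value),))


# Build "disorder caused by lack of vitamin D" from the inside out.
lack_of_vitamin_d = attribute(Concept("lack"), "WRT", Concept("vitaminD"))
disorder = attribute(Concept("disorder"), "CBY", lack_of_vitamin_d)
print(disorder)  # disorder[CBY:lack[WRT:vitaminD]]
```

Because attributes are themselves concepts, nesting to any depth comes for free from the recursive type.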

Fig. 1 shows a fragment of an ontology expressed in the ONTOLOG language [9]. The solid edges are ISA relations and the broken edges are other semantic relations; in this example CBY (caused by) and WRT (with respect to) are in use. Each compound concept has broken edges to its attribution concepts.

[Figure 1 (image not reproduced). Node labels: entity; vitamin; disorder; lack; water-soluble vitamin; fat-soluble vitamin; vitaminC; vitaminK; lack[WRT:vitaminC]; lack[WRT:vitaminK]; disorder[CBY:lack]; disorder[CBY:lack[WRT:vitaminC]]; disorder[CBY:lack[WRT:vitaminK]].]

Fig. 1. A fragment of the ontology with semantic relations

3. Previous Work

Intuition tells us that the similarity between objects A and B is related to both their common and their different characteristics. The more commonality they share, the more similar they are; the more differences they have, the less similar they are. A general definition of similarity is:
Definition 3. (Similarity) A similarity measure is a real-valued function sim: U×U→[0,1] on a set U, where sim(x, y) measures the degree of similarity between x and y and U is the universe of discourse.

Tversky [10-11] developed a feature-theoretical approach to the analysis of similarity relations. See equation (1):

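$$S(A,B) = \frac{f(A \cap B)}{f(A \cap B) + \alpha \, f(A - B) + \beta \, f(B - A)} \qquad (1)$$

where A ∩ B denotes the features shared by A and B, A − B the features of A that are not shared by B, and B − A the features of B that are not shared by A. The function f measures the salience of a set of features, and the parameters α, β ≥ 0 weight the two sets of distinctive features.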
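As a rough illustration of equation (1), the sketch below takes f to be plain set cardinality, which is an assumption made here for simplicity; the feature sets are invented for the example.

```python
def tversky(a: set, b: set, alpha: float, beta: float) -> float:
    """Ratio model of equation (1) with f = set cardinality (an assumption)."""
    common = len(a & b)
    denominator = common + alpha * len(a - b) + beta * len(b - a)
    # Two empty feature sets give 0/0; returning 0.0 here is a convention of this sketch.
    return common / denominator if denominator else 0.0


# Hypothetical feature sets.
dog = {"fur", "tail", "barks"}
cat = {"fur", "tail", "meows", "claws"}

print(tversky(dog, cat, alpha=1.0, beta=1.0))  # 2 / (2 + 1 + 2) = 0.4
print(tversky(dog, cat, alpha=1.0, beta=0.0))  # 2 / (2 + 1)
print(tversky(cat, dog, alpha=1.0, beta=0.0))  # 2 / (2 + 2) = 0.5
```

With α = β = 1 the ratio reduces to the Jaccard coefficient, while unequal α and β make the measure asymmetric, as the last two calls show.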

Many measures of similarity between ontological concepts have been proposed in past research [4,12-15]. These measures fall into two main approaches: network distance models and information theoretic models.


3.1. Network distance models

Because of the hierarchical structure of an ontology, one of the most natural approaches to evaluating semantic similarity is to use its graph representation and measure the distance between the nodes corresponding to the concepts being compared. The shorter the path from one node to another, the more similar they are.

The simplest such measure counts edges. It rests, however, on the assumption that edges or links between concepts represent uniform distances. To overcome this limitation, the edges connecting two nodes should be weighted, taking into account the depth, the density of edges at that depth, and the strength of connotation between parent and child nodes [15].

Sussna [16] considered these three factors in an edge weight determination scheme. The weight between two nodes c1 and c2 is calculated as follows:

$$wt(c_1, c_2) = \frac{wt(c_1 \rightarrow_r c_2) + wt(c_2 \rightarrow_{r'} c_1)}{2d} \qquad (2)$$

given

$$wt(x \rightarrow_r y) = \mathrm{max}_r - \frac{\mathrm{max}_r - \mathrm{min}_r}{n_r(x)} \qquad (3)$$

where →_r is a relation of type r, →_{r'} is its reverse, d is the depth of the deeper of the two nodes, max_r and min_r are the maximum and minimum weights possible for relation type r, and n_r(x) is the number of relations of type r leaving node x.
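Equations (2) and (3) translate directly into two small functions. This is a minimal sketch; the function names are hypothetical, and the weight range [0.3, 0.4] is borrowed from the values the paper assumes for the inclusion relation in Section 4.2, applied here to both directions purely for illustration.

```python
def directed_weight(n_r: int, max_r: float, min_r: float) -> float:
    """Equation (3): weight of one relation of type r leaving a node
    that has n_r outgoing relations of that type."""
    return max_r - (max_r - min_r) / n_r


def edge_weight(w_forward: float, w_reverse: float, depth: int) -> float:
    """Equation (2): average of the two directed weights, scaled by depth."""
    return (w_forward + w_reverse) / (2 * depth)


# A node with three outgoing relations of the same type.
w_fwd = directed_weight(3, max_r=0.4, min_r=0.3)
# The reverse relation, assumed here to leave a node with a single such relation.
w_rev = directed_weight(1, max_r=0.4, min_r=0.3)

print(round(w_fwd, 4))                              # 0.3667
print(round(edge_weight(w_fwd, w_rev, depth=2), 4)) # 0.1667
```

The more relations of one type leave a node, the closer each directed weight moves to max_r, and deeper edges receive smaller combined weights.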

Wu and Palmer [16] proposed the following measure of semantic similarity:

$$sim_{Wu\&Palmer}(c_1, c_2) = \frac{2 \times N_3}{N_1 + N_2 + 2 \times N_3} \qquad (4)$$

where N1 and N2 are the lengths of the paths from c1 and c2 to their most specific common super-concept c3, and N3 is the length of the path from c3 to the root of the hierarchy.
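Equation (4) can be checked with a few lines of code. The sketch below takes the three path lengths as inputs; how they are obtained from a particular hierarchy is left out, and the handling of the degenerate case is an assumption of this sketch.

```python
def wu_palmer(n1: int, n2: int, n3: int) -> float:
    """Equation (4): n1, n2 are path lengths from c1, c2 to their most specific
    common super-concept c3; n3 is the path length from c3 to the root."""
    denominator = n1 + n2 + 2 * n3
    # n1 = n2 = n3 = 0 means both concepts are the root itself;
    # treating that case as similarity 1.0 is an assumption of this sketch.
    return 2 * n3 / denominator if denominator else 1.0


# Two siblings whose common parent lies one edge below the root.
print(wu_palmer(1, 1, 1))  # 2 / (1 + 1 + 2) = 0.5
# Two concepts whose only common super-concept is the root.
print(wu_palmer(2, 2, 0))  # 0.0
```

The second call shows a known property of the measure: concepts that meet only at the root always score 0, however close they are to it.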

In an ontology, the concept inclusion relation plays a central role as the ordering relation that binds the ontology into a lattice. Concepts in the direction of the inclusion are called generalizations, while concepts in the opposite direction are called specializations. Concept inclusion intuitively implies strong similarity in the direction opposite to the inclusion, but the direction of the inclusion must also contribute some degree of similarity. That is, moving upwards is generalization and moving downwards is specialization. Henrik [9] used two factors, $\rho_{is}$ and $\rho_{inclusion}$, to express the similarity of immediate specialization and immediate generalization respectively; in Fig. 2, $\rho_{is} = 0.9$ and $\rho_{inclusion} = 0.4$. A simple similarity function can then be defined. If there is a path from node (concept) x to y in the hypernym/hyponym relation, it has the form P = (v_1, …, v_k), where v_i ISA v_{i+1} or v_{i+1} ISA v_i for each i, with x = v_1 and y = v_k.

Given a path P = (v_1, …, v_k) with x = v_1 and y = v_k, let s(P) and g(P) be the numbers of specializations and generalizations along P, respectively: s(P) = |{i | v_i ISA v_{i+1}}| and g(P) = |{i | v_{i+1} ISA v_i}|.

This similarity measure can be regarded as derived from the ontology by transforming it into a directed weighted graph, with the similarity along a path given by the product of the edge weights on that path. If P_1, …, P_m are all the paths connecting x and y, the degree to which y is similar to x is defined as [9]:

$$sim(x, y) = \max_{j=1,\dots,m} \left\{ \rho_{is}^{\,s(P_j)} \cdot \rho_{inclusion}^{\,g(P_j)} \right\} \qquad (5)$$
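The path-product idea behind equation (5) can be sketched on the small animal taxonomy of Fig. 2 (animal above big cat and dog; leopard, tiger and lion below big cat; poodle and alsatian below dog). The sketch assumes that a step from x down to a sub-concept is weighted 0.9 and a step up to a super-concept 0.4, which is the reading that reproduces the worked values quoted in this section (0.36, 0.81 and 0.16). All names in the code are hypothetical.

```python
# Child -> parent (ISA) edges of the Fig. 2 taxonomy.
PARENT = {
    "big cat": "animal", "dog": "animal",
    "leopard": "big cat", "tiger": "big cat", "lion": "big cat",
    "poodle": "dog", "alsatian": "dog",
}
RHO_IS = 0.9         # assumed weight of a step down to a specialization
RHO_INCLUSION = 0.4  # assumed weight of a step up to a generalization


def neighbours(node):
    """Yield (next_node, step_weight) for every ISA edge touching node."""
    if node in PARENT:
        yield PARENT[node], RHO_INCLUSION
    for child, parent in PARENT.items():
        if parent == node:
            yield child, RHO_IS


def sim(x, y):
    """Maximum, over all simple paths from x to y, of the product of step weights."""
    best = 0.0
    stack = [(x, 1.0, {x})]
    while stack:
        node, weight, seen = stack.pop()
        if node == y:
            best = max(best, weight)
            continue
        for nxt, step in neighbours(node):
            if nxt not in seen:
                stack.append((nxt, weight * step, seen | {nxt}))
    return best


print(round(sim("tiger", "leopard"), 2))  # 0.36
print(round(sim("animal", "poodle"), 2))  # 0.81
print(round(sim("poodle", "animal"), 2))  # 0.16
```

In a tree there is exactly one simple path between two nodes, so the maximum is trivial here; the search is written generally so that it also covers ontologies with several connecting paths.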

Fig. 2. A fragment of an ontology, parameterizing specialization and generalization with the inclusion relation
[Figure 2 (image not reproduced): animal has the sub-concepts big cat and dog; big cat has the sub-concepts leopard, tiger and lion; dog has the sub-concepts poodle and alsatian. Every edge is labelled 0.9/0.4.]

In Fig. 2, equation (5) gives sim(tiger, leopard) = 0.36.

3.2. Information Theoretic Models

The information-theoretic models add to the information already present in the hierarchy by using probability theory. Conceptual similarity between two concepts c1 and c2 may be judged by the degree to which they share information: the more information they share, the more similar they are. Following the notation of information theory, the information content (IC) of a concept c can be quantified as follows [15]:

$$IC(c) = -\log p(c) \qquad (6)$$

where p(c) is the probability of encountering an instance of concept c. In the hierarchical structure, p(c) never decreases as one moves up the hierarchy. As a node's probability increases, its information content decreases. If there is a unique top node in the hierarchy, its probability is 1 and hence its information content is 0. Formally,

$$Freq(c) = \sum_{n \in words(c)} count(n) \qquad (7)$$

where words(c) is the set of words subsumed by concept c. Concept probabilities are computed as

$$p(c) = \frac{Freq(c)}{N} \qquad (8)$$

where N is the total number of words observed.

The semantic similarity between two concepts c1 and c2 is not about the concepts themselves but about their instances. For example, when one says that a lemon and an orange are similar, the set of all lemons is not being compared to the set of all oranges. The amount of information contained in the statements x1 ∈ c1 and x2 ∈ c2 is

$$-\log p(c_1) - \log p(c_2) \qquad (9)$$

where p(c1) is the probability that a randomly selected object x1 belongs to c1, and likewise for p(c2). Let c3 be the most specific concept that subsumes both c1 and c2. Guided by the intuition that the similarity between two concepts may be judged by “the extent to which they share information”, the similarity of two concepts can be formally defined as [15]:

$$sim_R(c_1, c_2) = \max_{c \in Sup(c_1, c_2)} [IC(c)] = \max_{c \in Sup(c_1, c_2)} [-\log p(c)] = -\log p(c_3) \qquad (10)$$

where Sup(c1, c2) is the set of concepts that subsume both c1 and c2. To maximize representativeness, the similarity value is the information content of the node whose IC value is the largest among those super-classes. In other words, the node c3 is the “lowest upper bound” among the concepts that subsume both c1 and c2.

Another similarity measure is [3,17]:

$$sim_L(c_1, c_2) = \frac{2 \times \log p(c_3)}{\log p(c_1) + \log p(c_2)} \qquad (11)$$
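Equations (6) to (11) can be tried out on a toy corpus. The sketch below reuses the animal taxonomy of Fig. 2; the word counts are invented for illustration and are not taken from the paper, and the function names are hypothetical.

```python
import math

# Child -> parent edges of the Fig. 2 taxonomy.
PARENT = {
    "big cat": "animal", "dog": "animal",
    "leopard": "big cat", "tiger": "big cat", "lion": "big cat",
    "poodle": "dog", "alsatian": "dog",
}
# Hypothetical corpus counts for each word (they sum to 100).
COUNT = {"animal": 10, "big cat": 5, "dog": 20, "leopard": 5,
         "tiger": 10, "lion": 10, "poodle": 20, "alsatian": 20}
N = sum(COUNT.values())


def subsumers(c):
    """c itself plus every concept above it in the hierarchy."""
    chain = [c]
    while c in PARENT:
        c = PARENT[c]
        chain.append(c)
    return chain


def p(c):
    """Equations (7) and (8): relative frequency of all words subsumed by c."""
    freq = sum(n for word, n in COUNT.items() if c in subsumers(word))
    return freq / N


def ic(c):
    """Equation (6): information content."""
    return -math.log(p(c))


def sim_resnik(c1, c2):
    """Equation (10): IC of the most informative common subsumer."""
    return max(ic(c) for c in set(subsumers(c1)) & set(subsumers(c2)))


def sim_lin(c1, c2):
    """Equation (11), with c3 chosen as the most informative common subsumer."""
    c3 = max(set(subsumers(c1)) & set(subsumers(c2)), key=ic)
    denominator = math.log(p(c1)) + math.log(p(c2))
    # Both concepts being the root gives 0/0; 1.0 is a convention of this sketch.
    return 2 * math.log(p(c3)) / denominator if denominator else 1.0


print(p("animal"), p("dog"), p("big cat"))
print(sim_resnik("poodle", "alsatian"), sim_lin("poodle", "alsatian"))
print(sim_resnik("tiger", "poodle"))
```

With these counts the top node has probability 1 and information content 0, so any pair whose only common subsumer is the top node receives similarity 0 under both measures.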

Interestingly, if the conditional probability P(c_i | pa) of encountering an instance of a child concept c_i given an instance of its parent concept pa is the same for all pairs of concepts, equation (11) coincides with equation (4) [15]. In fact, network distance models and information theoretic models approach semantic similarity from different angles. Both kinds of measure may be viewed as variations of Tversky's parameterized ratio model of similarity (see equation (1)) [3].

The previous similarity measures (except equation (5)) share a common property: each is an equivalence relation. Formally, let U be a universal set of all sets defined on a given domain D [18]:
Definition 4. (Equivalence relation) An equivalence relation is a mapping sim: U×U → [0,1] such that for x, y, z ∈ U,
(a) Reflexivity: sim(x, x) = 1
(b) Symmetry: sim(x, y) = sim(y, x)
(c) Transitivity: sim(x, y) = 1 and sim(y, z) = 1 ⇒ sim(x, z) = 1.

We do not think such an equivalence relation is suitable for semantic similarity between concepts in an ontology. As a counterexample, under the concept inclusion relation, intuition suggests that dogs are strongly similar to animals, whereas animals are similar to dogs only to some degree. That is, the similarity should be anti-symmetric.

Equation (5) solves the anti-symmetry problem. In Fig. 2, equation (5) gives sim(poodle, animal) = 0.4 × 0.4 = 0.16, while sim(animal, poodle) = 0.9 × 0.9 = 0.81.

However, this method has some limitations. First, Henrik [9] did not consider the density of edges. Intuitively, the similarity between two siblings on high-density edges, such as “leopard” and “tiger”, should be higher than the similarity between siblings on lower-density edges at the same level, such as “poodle” and “alsatian”.

Second, in Fig. 2, equation (5) gives sim(poodle, alsatian) = sim(big cat, dog) = 0.36, but intuition tells us that the similarity between siblings at low levels of the ontology, such as “poodle” and “alsatian”, should be higher than the similarity between siblings close to the top, such as “big cat” and “dog”.

To find a reasonable similarity measure that copes with these more complex situations, we introduce fuzzy sets and a weak fuzzy similarity relation.

4. Our Work: A Fuzzy Similarity Measure

First we define anti-symmetric similarity:
Definition 5. (Anti-Symmetry Similarity) A similarity measure is a real-valued function sim: U×U→[0,1] on a set U, where sim(x, y) measures the degree to which y is similar to x and sim(y, x) measures the degree to which x is similar to y, and U is the universe of discourse.

The theory of fuzzy sets proposed by Zadeh [19] has been highly successful in various fields. Recently, researchers have begun to propose the use of fuzzy ontologies for query refinement and information retrieval [3,17,20-21]. A semantic similarity measure may be generalized to a fuzzy semantic similarity measure if the weights on the relation links are replaced by membership degrees indicating the strength of the relationship between parent and child concepts. The structure of the ontology can then be used to define new fuzzy set similarity measures.

4.1. Fuzzy Sets and Weak Fuzzy Similarity Relation

Let U be a universal set of all fuzzy sets defined on a given domain D [18]:
Definition 6. (Fuzzy set membership function) Let D be an ordinary set, a given particular domain of data. An imprecise datum X over domain D, regarded as a fuzzy set X on D, is defined as a mapping from D to the closed interval [0,1], characterized by a membership function $\mu_X : D \to [0,1]$, where X is a label and $\mu_X$ is the membership function of the fuzzy set. If X is finite, it can be expressed in the form

$$X = \{\mu_X(x_1)/x_1, \dots, \mu_X(x_n)/x_n\}, \quad x_i \in D$$

To deal with fuzzy data, Zadeh [18] proposed in 1970 a weaker relation, called the fuzzy similarity relation.
Definition 7. (Fuzzy similarity relation) A fuzzy similarity relation is a mapping sim: U×U → [0,1] such that for x, y, z ∈ U,
(a) Reflexivity: sim(x, x) = 1
(b) Symmetry: sim(x, y) = sim(y, x)
(c) Max-min transitivity: $sim(x, z) \ge \max_{y \in U} \min[sim(x, y), sim(y, z)]$

However, max-min transitivity is still a very restrictive constraint. In fact, we think that the degree of similarity between two concepts in an ontology is neither necessarily symmetric nor necessarily transitive. A weak fuzzy similarity relation has been proposed as a generalization of the fuzzy similarity relation [22].
Definition 8. (Weak fuzzy similarity relation) A weak fuzzy similarity relation is a mapping sim: U×U → [0,1] such that for x, y, z ∈ U,
(a) Reflexivity: sim(x, x) = 1
(b) Conditional symmetry: if sim(x, y) > 0 then sim(y, x) > 0
(c) Conditional transitivity: if sim(x, y) ≥ sim(y, x) > 0 and sim(y, z) ≥ sim(z, y) > 0 then sim(x, z) ≥ sim(z, x)
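The three conditions of Definition 8 can be checked mechanically on a finite universe. The following brute-force checker is a sketch only; the function name and the two small example relations are invented for illustration.

```python
from itertools import product


def is_weak_fuzzy_similarity(sim, universe, tol=1e-9):
    """Check reflexivity, conditional symmetry and conditional transitivity
    of sim(x, y) over every pair and triple of a finite universe."""
    for x in universe:
        if abs(sim(x, x) - 1.0) > tol:
            return False  # (a) reflexivity fails
    for x, y in product(universe, repeat=2):
        if sim(x, y) > 0 and not sim(y, x) > 0:
            return False  # (b) conditional symmetry fails
    for x, y, z in product(universe, repeat=3):
        if sim(x, y) >= sim(y, x) > 0 and sim(y, z) >= sim(z, y) > 0:
            if sim(x, z) < sim(z, x) - tol:
                return False  # (c) conditional transitivity fails
    return True


def from_table(table):
    """Turn a dict of off-diagonal values into a similarity function."""
    return lambda x, y: 1.0 if x == y else table.get((x, y), 0.0)


# An asymmetric relation that satisfies all three conditions.
good = from_table({("a", "b"): 0.8, ("b", "a"): 0.5,
                   ("b", "c"): 0.6, ("c", "b"): 0.3,
                   ("a", "c"): 0.7, ("c", "a"): 0.2})
# Swapping the a/c values breaks conditional transitivity.
bad = from_table({("a", "b"): 0.8, ("b", "a"): 0.5,
                  ("b", "c"): 0.6, ("c", "b"): 0.3,
                  ("a", "c"): 0.2, ("c", "a"): 0.7})

print(is_weak_fuzzy_similarity(good, ["a", "b", "c"]))  # True
print(is_weak_fuzzy_similarity(bad, ["a", "b", "c"]))   # False
```

The first relation is not symmetric, so it would fail Definition 7, yet it passes Definition 8, which is the kind of asymmetric measure the definition is meant to admit.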

From the above discussion, we think that the existing similarity measures cannot completely reflect the structural characteristics of an ontology. Only the weak fuzzy similarity relation of Definition 8 is suitable for fuzzy similarity between two concepts in an ontology. Next we propose a new approach to measuring conceptual similarity that satisfies the weak fuzzy similarity relation.

4.2. A Fuzzy Similarity Measure on Atomic Concepts With Inclusion Relation

We first propose a fuzzy similarity measure on atomic concepts linked only by the inclusion relation. To address the edge-density problem, we borrow the idea of equation (3) to revise the weights. Assume that σ_max = 0.4 and σ_min = 0.3 are the maximum and minimum weights for an inclusion relation (max_r and min_r in equation (3) with r = Inclusion), and that n_Inclusion(x) is the number of inclusion relations leaving node x. In Fig. 2, n_Inclusion(big cat) = 3 and n_Inclusion(dog) = 2, so equation (3) gives:
wt(big cat, tiger) = σ_max − (σ_max − σ_min)/3 ≈ 0.37
wt(dog, poodle) = σ_max − (σ_max − σ_min)/2 = 0.35
After the weights are revised, Fig. 2 becomes Fig. 3.

Fig. 3. Fig. 2 with revised weights
[Figure 3 (image not reproduced): the taxonomy of Fig. 2 with edge labels 0.9/0.35 on animal–big cat, animal–dog, dog–poodle and dog–alsatian, and 0.9/0.37 on big cat–leopard, big cat–tiger and big cat–lion.]

To handle depth and the density of edges at that depth, and to take some other factors into account, we introduce semantic paths and extend an atomic concept to a fuzzy set.
Definition 9. (Extended Fuzzy Set for Atomic Concept With Inclusion) Let c be an atomic concept with H(c, ROOT), such that there is only one path from c to ROOT in ontology O, named path(c, ROOT). Let C' = {c, c_1, c_2, …, ROOT} be the set of nodes along this path. The concept c can be extended to the set C', and the fuzzy set of c is described by:
c+ = {1/c, sim(c,c_1)/c_1, sim(c,c_2)/c_2, …, sim(c,ROOT)/ROOT}
For example, in Fig. 3, leopard+ = {1/leopard, 0.37/big cat, 0.13/animal} and tiger+ = {1/tiger, 0.37/big cat, 0.13/animal}.
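The construction in Definition 9 can be sketched in a few lines. The sketch assumes that the membership degree of an ancestor is the product of the density-adjusted generalization weights (equation (3) with bounds 0.4 and 0.3) along the path to it, which is the reading that reproduces the values 0.37 and 0.13 quoted in the example. The function names are hypothetical, and "animal" plays the role of the top of this fragment.

```python
# Child -> parent edges of the animal taxonomy used in the example.
PARENT = {
    "big cat": "animal", "dog": "animal",
    "leopard": "big cat", "tiger": "big cat", "lion": "big cat",
    "poodle": "dog", "alsatian": "dog",
}
W_MAX, W_MIN = 0.4, 0.3  # weight bounds assumed for the inclusion relation


def n_children(node):
    """Number of inclusion relations leaving node (its direct sub-concepts)."""
    return sum(1 for parent in PARENT.values() if parent == node)


def up_weight(child):
    """Equation (3) applied to the edge between child and its parent."""
    return W_MAX - (W_MAX - W_MIN) / n_children(PARENT[child])


def extend(c):
    """Fuzzy set c+ as a dict {concept: membership degree} along the path to the top."""
    fuzzy = {c: 1.0}
    degree = 1.0
    while c in PARENT:
        degree *= up_weight(c)  # assumed: degrees multiply along the path
        c = PARENT[c]
        fuzzy[c] = degree
    return fuzzy


leopard_plus = extend("leopard")
print({k: round(v, 2) for k, v in leopard_plus.items()})
# {'leopard': 1.0, 'big cat': 0.37, 'animal': 0.13}
```

Under the same assumption, concepts below a parent with fewer children receive slightly smaller degrees, which is how edge density enters the fuzzy set.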

With Definition 9 we can extend any concept in a hierarchical structure to a fuzzy set. In line with the information theoretic models, we should retain the greatest possible shared information of the concepts being compared. Having defined the fuzzy sets, we define fuzzy similarity on them:
Definition 10. (Fuzzy similarity) Let $\mu_X$ and $\mu_Y$ be two membership functions over a given domain D for two labels X and Y of a universe U. A fuzzy similarity relation is a mapping R: U × U → [0,1] defined by:

$$R(X,Y) = \alpha \frac{|X \cap Y|}{|X|} + (1-\alpha) \frac{|X \cap Y|}{|Y|} = \alpha \frac{\sum_{d \in D} \min\{\mu_X(d), \mu_Y(d)\}}{\sum_{d \in D} \mu_X(d)} + (1-\alpha) \frac{\sum_{d \in D} \min\{\mu_X(d), \mu_Y(d)\}}{\sum_{d \in D} \mu_Y(d)} \qquad (12)$$

where R(X,Y) is the degree to which Y is similar to X, $|X| = \sum_{d \in D} \mu_X(d)$ and $|Y| = \sum_{d \in D} \mu_Y(d)$ are the cardinalities of X and Y, and α ∈ [0,1] is a parameter.
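A small numerical check of Definition 10 is sketched below. It assumes that equation (12) has the form R(X,Y) = α·|X∩Y|/|X| + (1−α)·|X∩Y|/|Y|, with fuzzy intersection taken as the pointwise minimum; the function name and the default α = 0.5 are choices made for this sketch. The fuzzy sets are the leopard+ and tiger+ examples given under Definition 9.

```python
def fuzzy_similarity(x: dict, y: dict, alpha: float = 0.5) -> float:
    """Assumed form of equation (12) on fuzzy sets stored as {element: degree}."""
    shared = sum(min(x.get(d, 0.0), y.get(d, 0.0)) for d in set(x) | set(y))
    card_x = sum(x.values())  # |X|; never zero for sets built by Definition 9
    card_y = sum(y.values())  # |Y|
    return alpha * shared / card_x + (1 - alpha) * shared / card_y


leopard_plus = {"leopard": 1.0, "big cat": 0.37, "animal": 0.13}
tiger_plus = {"tiger": 1.0, "big cat": 0.37, "animal": 0.13}

print(round(fuzzy_similarity(tiger_plus, leopard_plus), 2))  # 0.33
print(fuzzy_similarity(tiger_plus, tiger_plus))              # 1.0 up to rounding
```

The two sets share 0.37 + 0.13 = 0.5 and each has cardinality 1.5, so the result is 0.33 for any α, in agreement with the value R(tiger+, leopard+) = 0.33 reported in the text. Pairs with different cardinalities depend on α and on which term it weights, so they are not checked here.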

Equation (12) is based on information theory. In particular, the semantic path and the cardinality $|X \cap Y| = \sum_{d \in D} \min\{\mu_X(d), \mu_Y(d)\}$ are based on shared information content. Moreover, the measure satisfies reflexivity, conditional symmetry and conditional transitivity, so it is a weak fuzzy similarity relation.

With equation (12) we can compute the fuzzy similarity R(c1+, c2+) between c1 and c2:
1) R(big cat+, poodle+) = 0.144 and R(poodle+, big cat+) = 0.086. The measure still satisfies the anti-symmetry property, that is, the “cost” of generalization should be significantly higher than the cost of specialization.
2) R(poodle+, alsatian+) = 0.32 and R(big cat+, dog+) = 0.26. The measure therefore satisfies the “specificity cost property”: the similarity between two siblings at low levels of the ontology, such as “poodle” and “alsatian”, is higher than the similarity between siblings close to the top, such as “big cat” and “dog”.
3) R(tiger+, leopard+) = 0.33 and R(poodle+, alsatian+) = 0.32. The similarity between two siblings on high-density edges, such as “leopard” and “tiger”, is higher than the similarity between two siblings on lower-density edges, such as “poodle” and “alsatian”.

4.3. A Fuzzy Similarity Measure on Complex Concept with Semantic Relations

Finally, we extend the fuzzy similarity measure to complex concepts linked by semantic relations. An explicit way to represent different semantic relations is to introduce a different similarity factor for each of them. Assume that we have k different semantic relations R_1, …, R_k with attached similarity factors ρ_1, …, ρ_k. Given a path P = (p_1, …, p_n), let r_j(P) be the number of R_j edges along P, that is, r_j(P) = |{i | p_i R_j p_{i+1}}|. If P_1, …, P_m are all the paths connecting x and y, the degree to which y is similar to x can be defined as [9]:

$$sim(x, y) = \max_{j=1,\dots,m} \left\{ \rho_1^{\,r_1(P_j)} \cdot \rho_2^{\,r_2(P_j)} \cdots \rho_k^{\,r_k(P_j)} \right\} \qquad (13)$$

For example, to compute the similarity between disorder[CBY:lack[WRT:vitaminC]] and water-soluble vitamin, note that there is more than one path between them; we take the maximal value given by equation (13) as their similarity. To account for the density of a given semantic relation, we again use equation (3) to revise the weights. Fig. 4 shows a fragment of the ontology after the weights are revised.

[Figure 4 (image not reproduced). Node labels: entity; vitamin; disorder; lack; water-soluble vitamin; fat-soluble vitamin; vitaminC; vitaminK; lack[WRT:vitaminC]; lack[WRT:vitaminK]; disorder[CBY:lack]; disorder[CBY:lack[WRT:vitaminC]]; disorder[CBY:lack[WRT:vitaminK]].]

Fig. 4. A fragment of the ontology, parameterized with semantic relations

To extend a complex concept to a fuzzy set, we give the following definition.
Definition 11. (Extended Fuzzy Set for Complex Concept With Semantic Relations) Given a complex concept c = c_0[r_1:c_1, ..., r_n:c_n], where c_0 is the atom attributed in c and c_1, …, c_n are the attributes (which are atoms or further compound concepts), the concept c can be extended to a concept set C = {c_1', c_2', …, ROOT} along the upward direction of the hierarchy structure H. The fuzzy set of c is described by:
c+ = {1/c, sim(c,c_1')/c_1', sim(c,c_2')/c_2', …, sim(c,ROOT)/ROOT}

In Definition 11 there may be more than one path from c to c_i or from c_i to ROOT; the concepts along all these paths are considered to have a semantic relation with c. For example, in Fig. 4,
disorder[CBY:lack[WRT:vitaminC]]+ = {1/disorder[CBY:lack[WRT:vitaminC]], 0.55/disorder[CBY:lack], 0.33/disorder, 0.4/lack[WRT:vitaminC], 0.22/lack, 0.11/vitaminC, 0.066/water-soluble vitamin, 0.0363/vitamin, 0.188/entity}
According to equation (12), the semantic similarity between disorder[CBY:lack[WRT:vitaminC]] and disorder[CBY:lack[WRT:vitaminK]] is:
R(disorder[CBY:lack[WRT:vitaminC]], disorder[CBY:lack[WRT:vitaminK]]) = 0.35

4.4. Evaluation

According to above discussion, we think that our fuzzy similarity measure (definition 10) has three advantages: firstly, it could reflect latent semantic relation between concepts; secondly, it is based on information theory. Especially, it is actually based on shared information content; thirdly, it satisfies reflexivity, conditional symmetry and conditional transitivity of weak fuzzy similarity relation, which is a more general fuzzy similarity relation.

Next we evaluate our fuzzy similarity measure on a general ontology (Fig. 5). To keep the computation simple, this ontology contains only atomic concepts connected by the inclusion relation.

Fig. 5. Illustration of a general ontology

Fig. 6. The comparison of anti-symmetry property. [Plot: x-axis, depth in the ontology (2 to 6); y-axis, similarity (0.25 to 0.43); series: the degree to which a son concept is similar to its parent concept, and the degree to which a parent concept is similar to its son concept.]

Fig. 7. The impact of depth on similarity between sibling concepts. [Plot: x-axis, depth in the ontology with the sibling pair measured at each depth: 2 (c2, c3), 3 (c4, c5), 4 (c8, c9), 5 (c14, c15), 6 (c34, c35); y-axis, similarity (0.25 to 0.45); series: fuzzy similarity between sibling concepts.]

Fig. 8. The impact of density on similarity between sibling concepts at a level. [Plot: x-axis, density of the ontology at a level with the sibling pair measured at each density: 2 (c14, c15), 3 (c16, c17), 4 (c19, c20), 5 (c23, c24), 6 (c28, c29); y-axis, similarity (0.38 to 0.44); series: fuzzy similarity between sibling concepts.]

Fig. 5 illustrates a general ontology, and the similarity between its concepts is computed with our fuzzy similarity measure. Fig. 6 shows that the measure satisfies the anti-symmetry property. Fig. 7 shows that the similarity between two siblings at low levels of the ontology is higher than the similarity between siblings close to the root. Fig. 8 shows that the similarity between two siblings on high-density edges is higher than the similarity between two siblings on lower-density edges. These three characteristics reflect the latent semantic relations between concepts in an ontology.

5. Conclusion

Existing resource description and resource selection in the Grid are highly constrained. Traditional resource matching is done based on symmetric, attribute-based matching. In these systems, the exact matching and coordination between providers and consumers make such systems inflexible and difficult to extend to new characteristics or concepts. Instead of exact syntax matching, our ontology-based approach, a fuzzy semantic similarity measure over conceptual relations, performs semantic matching using terms defined in an ontology, which makes it a flexible and extensible approach to parameter matching for web services. The loose coupling between resource and request descriptions removes the tight coordination requirement between resource providers and consumers.

In this paper we propose a fuzzy similarity measure for not only atomic concepts with the inclusion relation but also complex concepts with semantic relations. It is a semantic similarity measure based on semantic paths and shared information content. The key idea is to regard each concept as a fuzzy set along its semantic paths, so that the semantic similarity between two concepts is determined by comparing their two semantically extended fuzzy sets rather than only the concepts themselves. Our fuzzy similarity measure can therefore reflect the latent semantic relations of concepts better than a comparison of the concepts alone; furthermore, it has the properties of a weak fuzzy similarity relation.

References

[1] Y.Q. Wang and E. Stroulia, “Semantic Structure Matching for Assessing Web-Service Similarity”, Lecture Notes in Computer Science, Springer, Berlin, Heidelberg, 2003, 2910, pp. 194-207.
[2] M. Paolucci, T. Kawamura, T. R. Payne, and K. Sycara, “Semantic Matching of Web Services Capabilities”, http://citeseer.ist.psu.edu/paolucci02semantic.html.
[3] V. Cross, “Fuzzy Semantic Distance Measures Between Ontological Concepts”, Fuzzy Information. 04. IEEE Annual Meeting of the North America, Vol. 2, June 2004, pp. 635-640.
[4] A. Maedche and S. Staab, “Measuring Similarity between Ontologies”, Proceedings of the 13th International Conference on Knowledge Engineering and Knowledge Management, Springer-Verlag, London, UK, 2002, pp. 251-263.
[5] V. K. Chaudhri, A. Farquhar, R. Fikes, P. D. Karp, and J. P. Rice, “OKBC: A Programmatic Foundation for Knowledge Base Interoperability”, Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence, Madison, Wisconsin, United States, 1998, pp. 600-607.
[6] A. Hotho, A. Maedche, and S. Staab, “Ontology-based Text Document Clustering”, http://citeseer.ist.psu.edu/585623.html.
[7] T. Andreasen, H. Bulskov, and R. Knappe, “Similarity From Conceptual Relations”, http://akira.ruc.dk/~knappe/publications/nafips2003.pdf.
[8] J. F. Nilsson, “A Logico-algebraic Framework for Ontologies”, Proceedings of the workshop on Ontology-based interpretation of NP's, Kolding, Denmark, January 17, 2000.
[9] H. Bulskov, R. Knappe, and T. Andreasen, “On Measuring Similarity for Conceptual Querying”, Proceedings of the 5th International Conference on Flexible Query Answering Systems, 2002, pp. 100-111.
[10] A. Tversky, “Features of Similarity”, Psychological Review, 1977, 84(4), pp. 327-352.
[11] A. Tversky and I. Gati, “Studies of Similarity”, http://ruccs.rutgers.edu/forums/seminar1_fall03/Lila2.pdf.
[12] P. Resnik, “Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language”, Journal of Artificial Intelligence Research, 1999, 11, pp. 95-130.
[13] M. A. Rodríguez and M. J. Egenhofer, “Determining Semantic Similarity among Entity Classes from Different Ontologies”, IEEE Transactions on Knowledge and Data Engineering, 2003, 15(2), pp. 442-456.
[14] P. Haase, M. Hefke, and N. Stojanovic, “Similarity for Ontologies - a Comprehensive Framework”, http://citeseer.ist.psu.edu/ehrig04similarity.html.
[15] J. J. Jiang and D. W. Conrath, “Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy”, Proceedings of International Conference Research on Computational Linguistics (ROCLING X), Taiwan, 1997.
[16] M. Sussna, “Word Sense Disambiguation for Free-text Indexing Using a Massive Semantic Network”, Proceedings of the Second International Conference on Information and Knowledge Management, Washington, D.C., United States, 1993, pp. 67-74.
[17] D. H. Widyantoro and J. Yen, “A Fuzzy Ontology-based Abstract Search Engine and Its User Studies”, http://ist.psu.edu/yen/publications/fuzzieee01.pdf.
[18] R. Intan, “Rarity-based Similarity Relations in a Generalized Fuzzy Information System”, Proceedings of the 2004 IEEE Conference on Cybernetics and Intelligent Systems, Singapore, December 1-3, 2004, pp. 462-467.
[19] L. A. Zadeh, “Similarity Relations and Fuzzy Orderings”, Information Sciences, 1970, 3(2), pp. 177-200.
[20] D. Parry, “A fuzzy ontology for medical document retrieval”, Proceedings of the second workshop on Australasian information security, Data Mining and Web Intelligence, and Software Internationalisation, Dunedin, New Zealand, 2004, pp. 121-126.
[21] D. H. Widyantoro and J. Yen, “Using Fuzzy Ontology for Query Refinement in a Personalized Abstract Search Engine”, http://students.cs.tamu.edu/dhw7942/papers/ifsanafips01.pdf.
[22] R. Intan and M. Mukaidono, “A Proposal of Fuzzy Thesaurus Generated by Fuzzy Covering”, Fuzzy Information Processing Society-22nd International Conference of the North American, 2003, pp. 167-172.
