An Ontology for Domain-oriented Semantic Similarity Search on XML Data
Anja Theobald, University of the Saarland, Germany, theobald@cs.uni-sb.de, http://www-dbs.cs.uni-sb.de/
BTW, February 25 – 28, 2003, Leipzig, Germany


AT, 26.02.2003 1

An Ontology for Domain-oriented Semantic Similarity Search on XML Data

Anja Theobald
University of the Saarland, Germany
theobald@cs.uni-sb.de
http://www-dbs.cs.uni-sb.de/

BTW, February 25 – 28, 2003
Leipzig, Germany


AT, 26.02.2003 2

Motivation

Query on Web Data:

Ranking based on content data and structure (XML, …)

Grouping results by their topics

Using ontologies for similarity search

Topics: movie, astronomy, sports


AT, 26.02.2003 3

Outline

0. Why do we need Ranked Retrieval and Ontologies?

1. XXL Search Engine

2. Ontologies - a Linguistic Challenge

3. Graph-based Ontology

4. Quantification: Edge Weights

5. Similarity of Ontology Nodes

6. Ontology-based Query Processing


AT, 26.02.2003 4

XXL Search Engine

[Figure: XXL architecture. Components shown: VisualXXL, WWW, Crawler, PathIndexer, ContentIndexer, NameOntologyIndexer, ContentOntologyIndexer; indexes EPI, ECI, NOI, COI; QueryProcessor with EPIHandler, ECIHandler, NameOntologyHandler, ContentOntologyHandler.]

XXL Query:

    SELECT * FROM INDEX
    WHERE #.~universe AS U
      AND U.#.~appearance AS A
      AND U.#.S ~ "star"

XML Document:

    …
    <galaxy>
      <object>
        <description>sun</>
        <appearance>…light and heat…</>
        <location>…</>
        …
      </object>
      <history> … </>
      …
    </galaxy>
    …


AT, 26.02.2003 5

Ontologies – a linguistic challenge

ontology: ...representational vocabulary of words including hierarchical relationships and associative relationships between these words [Gruber93]...

[Figure: triangle relating word, sense, and object, with edges labeled "symbolized", "stands for", and "refers to".]

word: star

sense: ...a celestial body of hot gases...

object: [image]


AT, 26.02.2003 6

Word – Sense – Synset

words w ∈ Σ*

+ word senses

+ synonym relationship

U = {(w,s) | w ∈ Σ*, s ∈ S: word w has sense s}

synset(s) = { w | (w,s) ∈ U}
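
The word-sense relation U and the synset function can be sketched in a few lines of Python. This is only an illustration: the sense identifiers such as "universe#1" and the sample pairs are assumptions made up for the example and are not taken from the talk.

```python
# Hypothetical word-sense relation U: a set of (word, sense) pairs.
# The sense identifiers ("star#1", ...) are illustrative assumptions.
U = {
    ("star", "star#1"),
    ("sun", "sun#1"),
    ("universe", "universe#1"),
    ("cosmos", "universe#1"),
    ("galaxy", "galaxy#1"),
    ("extragalactic nebula", "galaxy#1"),
}


def synset(sense, relation=U):
    """Return all words w such that (w, sense) is in the relation."""
    return {w for (w, s) in relation if s == sense}


print(sorted(synset("universe#1")))  # ['cosmos', 'universe']
```

Words that share a sense end up in the same synset, which is how the synonym relationship arises from U.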


AT, 26.02.2003 7

Disambiguation: Synset – Category

synset(s) = { w | (w,s) ∈ U}    // U = {(w,s) | word w has sense s}

+ hypernym relationship

category(s) = { synset(s') | synset(s') is hypernym of synset(s)}

sense s: sense 1: (astronomy) a celestial body of hot gases…
synset(s): star
hypernym synsets: celestial body, heavenly body; natural object; object, physical object; entity, physical thing

sense s: sense 4: a plane figure with 5 or more points…
synset(s): star
hypernym synsets (as listed on the slide): plane figure, 2-dim. figure; figure; abstraction; shape, form; attribute
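
A minimal sketch of category(s) for sense 1 of "star" follows. It assumes that the category collects every synset reachable along the hypernym chain (a transitive reading of the definition); the dictionary `HYPERNYM` and the tuple encoding of synsets are hypothetical helpers introduced for this example.

```python
# Hypothetical direct-hypernym map between synsets (tuples of words).
# The chain is the one listed for sense 1 of "star"; treating category(s)
# as the transitive closure of the hypernym relation is an assumption.
HYPERNYM = {
    ("star",): ("celestial body", "heavenly body"),
    ("celestial body", "heavenly body"): ("natural object",),
    ("natural object",): ("object", "physical object"),
    ("object", "physical object"): ("entity", "physical thing"),
}


def category(synset, hypernym=HYPERNYM):
    """Collect every synset reachable by following hypernym links upward."""
    result = []
    current = synset
    while current in hypernym:
        current = hypernym[current]
        result.append(current)
    return result


for s in category(("star",)):
    print(", ".join(s))
```

The second sense of "star" (the plane figure) would get its own chain, so the category is what tells the two senses apart.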




AT, 26.02.2003 9

Example Ontology

[Figure: ontology graph. Each node is written as synset [category].]

entity, physical thing [entity, physical thing]
food [substance, matter]
milk [foodstuff, ...]
cows' milk [milk]
group, grouping [group, grouping]
galaxy, ... [collection, ...]
milky way [galaxy, ...]
natural object [object, ...]
sun [star]
universe, cosmos [collection, ...]
star [celestial body, ...]
Beta Centauri [star]
abstraction [abstraction]
star [plane figure, 2-dim figure]
hexagram [star]

Edge weights shown in the figure: [0.71], [0.83], [0.94]


AT, 26.02.2003 10

Graph-based Ontology

Ontology G = (V, E)

x = (synset(s), category(s)) ∈ V

e = (x, y, type, weight) ∈ E

Construction:

word: ... extracted from a document

category, type: ... extracted from an existing thesaurus (interchangeable!)

weight: ... expresses semantic similarity of connected words

Use:

sim: ... expresses semantic similarity of ontology nodes
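
One possible in-memory representation of G = (V, E) is sketched below. The class names `Node`, `Edge`, and `Ontology`, the edge type string "hypernym", and the weight 0.5 are assumptions chosen for illustration; the talk does not prescribe a concrete data structure.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    synset: tuple    # words sharing one sense
    category: tuple  # hypernym synset taken from the thesaurus


@dataclass(frozen=True)
class Edge:
    x: Node
    y: Node
    type: str        # relationship type, e.g. "hypernym" (assumed label)
    weight: float    # semantic similarity of the connected nodes


@dataclass
class Ontology:
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)

    def add_edge(self, x, y, edge_type, weight):
        """Register both end nodes and append the weighted, typed edge."""
        self.nodes.update((x, y))
        self.edges.append(Edge(x, y, edge_type, weight))


star = Node(("star",), ("celestial body", "heavenly body"))
sun = Node(("sun",), ("star",))

g = Ontology()
g.add_edge(sun, star, "hypernym", 0.5)  # 0.5 is a placeholder weight
print(len(g.nodes), len(g.edges))       # 2 1
```

Because nodes are (synset, category) pairs, the two senses of "star" become two distinct nodes even though they carry the same word.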


AT, 26.02.2003 11

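Quantification: Edge Weight

semantic similarity of connected synsets according to their concepts

vector space measures / probabilistic measures

[Figure: three connected nodes, written as synset [category].]

galaxy, extragalactic nebula [collection, aggregation, accumulation, assemblage]
star [celestial body, heavenly body]
sun [star]

DICE coefficient (…using web search engines for word frequencies…):

\[ \mathrm{Dice}(X, Y) = \frac{2\,|X \cap Y|}{|X| + |Y|} \]

X := (coll ∨ … ∨ ass) ∧ (galaxy ∨ extr…)
Y := (cel ∨ heav) ∧ (star)
X ∩ Y := X ∧ Y

\[ \frac{2\,|X \cap Y|}{|X| + |Y|} = \frac{2 \cdot 7{,}410}{70{,}600 + 15{,}600} \approx 0.172 \]

\[ \frac{2\,|X \cap Y|}{|X| + |Y|} = \frac{2 \cdot 140{,}000}{15{,}600 + 2{,}470{,}000} \approx 0.113 \]

Resulting edge weights: [0.172], [0.113]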
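
The Dice computation can be checked with a short sketch. The function name `dice` and its parameter names are assumptions; the hit counts are the ones that appear in the slide's two worked examples, and in practice they would come from search-engine queries that are not modelled here.

```python
def dice(count_x, count_y, count_xy):
    """Dice coefficient 2|X ∩ Y| / (|X| + |Y|) computed from raw hit counts."""
    total = count_x + count_y
    if total == 0:
        return 0.0  # neither term set occurs at all
    return 2 * count_xy / total


print(round(dice(70_600, 15_600, 7_410), 3))       # 0.172
print(round(dice(15_600, 2_470_000, 140_000), 3))  # 0.113
```

Since the coefficient is symmetric in X and Y, the same weight is obtained regardless of which end of the edge is taken as X.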




AT, 26.02.2003 14

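Similarity of Ontology Nodes

[Figure: ontology graph. Each node is written as synset [category].]

entity [entity]
cows' milk [milk]
milk [liquid]
protein [macromolecule]
group [group]
galaxy [collection]
milky way [galaxy]
sun [star]
universe [collection]
star [celestial body]
Beta Centauri [star]
natural object [object]

Edge weights shown in the figure: [0.2], [0.6], [0.5], [0.8], [0.1], [0.1], [0.3], [0.6]

\[ \mathit{pathsim}_{xy}(p) = \sum_{m=0}^{\mathit{length}(p)-1} \frac{\mathit{length}(p) - m}{\mathit{length}(p)} \cdot \mathit{weight}(n_m, n_{m+1}) \]

\[ \mathit{sim}(x, y) = \frac{\mathit{pathsim}_{xy}(p) + \mathit{pathsim}_{yx}(p)}{2 \cdot \mathit{length}(p)} \]

where p ranges over the paths between x and y.

sim(milky way, sun), |p| = 3:

3/3 · 0.6 + 2/3 · 0.5 + 1/3 · 0.8 = 1.2

3/3 · 0.8 + 2/3 · 0.5 + 1/3 · 0.6 ≈ 1.3

sim(milky way, sun) = 0.42

sim(milky way, cows' milk) = 0.2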
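
The worked example for sim(milky way, sun) can be reproduced with the following sketch. It evaluates a single path given as a non-empty list of edge weights; how several alternative paths between two nodes are combined is not covered here. The function names `pathsim` and `sim_on_path` are assumptions.

```python
def pathsim(weights):
    """Position-weighted sum of edge weights along one direction of a path.

    The m-th edge (counting from 0) contributes (n - m) / n of its weight,
    where n is the path length, so edges near the start count more.
    """
    n = len(weights)
    return sum((n - m) / n * w for m, w in enumerate(weights))


def sim_on_path(weights):
    """Combine both directions and normalise by 2 * length(p)."""
    n = len(weights)
    forward = pathsim(weights)
    backward = pathsim(weights[::-1])
    return (forward + backward) / (2 * n)


p = [0.6, 0.5, 0.8]  # edge weights on the path of length 3
print(round(pathsim(p), 2))        # 1.2
print(round(pathsim(p[::-1]), 2))  # 1.33
print(round(sim_on_path(p), 2))    # 0.42
```

Evaluating the path in both directions makes the result symmetric: sim(x, y) equals sim(y, x) even though the position weighting alone does not.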


AT, 26.02.2003 15

Ontology-based Query Processing

XXL Query:

    ... WHERE #.~universe AS U AND U.#.~appearance AS A AND U.#.S ~ "star"

XXL Query Representation:

[Figure: query graph with nodes ~universe, ~appearance, and ~ "star", and two % wildcards.]

XML Documents:

    …
    <galaxy>
      <object>
        <description>sun</>
        <appearance>…light and heat… </appearance>
        <location>…</>
        …
      </object>
      <history> … </>
      …
    </galaxy>
    …




AT, 26.02.2003 17

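Ontology-based Query Processing

XXL Query:

    ... WHERE #.~universe AS U AND U.#.~appearance AS A AND U.#.S ~ "star"

XXL Query Representation:

[Figure: query graph with nodes ~universe, ~appearance, and ~ "star", and two % wildcards.]

XML Data Graph:

[Figure: element tree with nodes galaxy, object, description, appearance, location, history; content "sun" under description and "…light and heat…" under appearance.]

Local scores annotated between query graph and data graph:

sim(universe, galaxy) = 0.94
sim(app, app) = 1.0
sim(star, sun) * tfidf(sun) = 0.43
1.0 (two further annotations)

Score of the result graph = 0.4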
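
The final value of 0.4 is consistent with multiplying the local scores annotated on the slide. The sketch below assumes this multiplicative combination; the function name `result_score` is hypothetical and the slide does not spell out the combination rule.

```python
from math import prod


def result_score(local_scores):
    """Multiply the local relevance scores of a result graph (assumed rule)."""
    return prod(local_scores)


# 0.94: sim(universe, galaxy); 0.43: sim(star, sun) * tfidf(sun);
# the three 1.0 values are the remaining annotations on the slide.
scores = [0.94, 1.0, 1.0, 1.0, 0.43]
print(round(result_score(scores), 1))  # 0.4
```

Under this reading, exact matches contribute a factor of 1.0, so only the ontology-based matches lower the score of a result.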


AT, 26.02.2003 18

- THE END -

Thank you very much!

Are there any more questions?