Inquiry Optimization Technique for a Topic Map Database Yuki Kuribara (Graduate School of Engineering, Shibaura Institute of Technology) Masaomi Kimura (Information Engineering, Shibaura Institute of Technology)


In this paper the inquiry optimization technique for a topic map database is presented.


Inquiry Optimization Technique for a Topic Map Database

Yuki Kuribara (Graduate School of Engineering, Shibaura Institute of Technology)

Masaomi Kimura (Information Engineering, Shibaura Institute of Technology)

Contents

2010/10/6Data Engineering Lab2

Background

Research contents

Experiment

Conclusion

Topic maps

Recently, many kinds of topic maps have been created

For web portal sites

For application development, and so on

When we target large topic maps, we need to construct databases for them, since a database can handle data larger than the physical memory

[Figure: a topic map that fits in memory ("On memory") versus one that does not ("Out of memory")]

The role of database

Database systems should take responsibility for managing the information of topic maps

Query optimization

Transaction management

Physical data structure hiding

[Figure: a database system receives a query and returns topic map information, providing query optimization, transaction management and physical data structure hiding]

The physical data model for databases

There are several options for the data model of the database

A relational model (tables) and an object-oriented model are mainly used in topic map databases

When we traverse the topic map to retrieve information, an object-oriented model does not need to join tables multiple times, unlike a relational model

We propose to use the object-oriented model for the database

[Figure: a relational model (tables) compared with an object-oriented model (Object A referring to Object B)]

The logical data model for databases

We assumed the topic map data structure defined by the Topic Maps Data Model (TMDM), since topic maps should conform to TMDM

The data model consists of seven types of information items and 19 types of named properties

We implemented these items as classes whose instances hold references to the corresponding information item objects

[Class diagram: TopicMap (+parent, 1) to Topic (+topics, 0..*); TopicMap (+parent, 1) to Association (+associations, 0..*); Topic (+parent, 1) to TopicName (+topicNames, 0..*); Association (+parent, 1) to AssociationRole (+roles, 0..*); Topic (+player, 1) to AssociationRole (+roles, 0..*)]
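
The class structure above can be sketched in Python as follows. This is only an illustration, not the TOME implementation: the attribute names follow the role names in the class diagram, while `name_topic`, `add_topic` and `add_association` are hypothetical names introduced here, and only five of the seven TMDM item types are modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False, repr=False)
class TopicName:
    value: str
    parent: Optional["Topic"] = None


@dataclass(eq=False, repr=False)
class Topic:
    parent: Optional["TopicMap"] = None
    topic_names: List[TopicName] = field(default_factory=list)
    roles: List["AssociationRole"] = field(default_factory=list)  # roles this topic plays


@dataclass(eq=False, repr=False)
class AssociationRole:
    player: Topic
    parent: "Association"


@dataclass(eq=False, repr=False)
class Association:
    name_topic: Topic  # assumption: the association name is held by a topic
    parent: Optional["TopicMap"] = None
    roles: List[AssociationRole] = field(default_factory=list)


@dataclass(eq=False, repr=False)
class TopicMap:
    topics: List[Topic] = field(default_factory=list)
    associations: List[Association] = field(default_factory=list)

    def add_topic(self, name: str) -> Topic:
        """Create a topic with one name and register it in this topic map."""
        topic = Topic(parent=self)
        topic.topic_names.append(TopicName(name, parent=topic))
        self.topics.append(topic)
        return topic

    def add_association(self, name_topic: Topic, *players: Topic) -> Association:
        """Create an association and one role per player, linked in both directions."""
        assoc = Association(name_topic=name_topic, parent=self)
        for player in players:
            role = AssociationRole(player=player, parent=assoc)
            assoc.roles.append(role)
            player.roles.append(role)
        self.associations.append(assoc)
        return assoc


tm = TopicMap()
doyle = tm.add_topic("Conan Doyle")
study = tm.add_topic("A Study in Scarlet")
write = tm.add_association(tm.add_topic("write"), doyle, study)
print(len(tm.topics), len(tm.associations))
```

Because every object holds direct references to its neighbors, moving from a topic to its associations is a pointer traversal rather than a table join.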

The possibility of plural retrieval routes

When we retrieve the information of a topic map, there may be more than one way to retrieve the same objects

How efficiently we can retrieve the objects depends on the search method

The database system needs to select the most suitable retrieval route (query optimization)

[Class diagram of TopicMap, Topic, Association, TopicName and AssociationRole, repeated from the previous slide]

Query optimization

Database systems need to estimate the most suitable execution plan

Without query optimization, the database system may take a very long time to retrieve data

Although some topic map database systems exist, they do not seem to take optimization into consideration

The database should take responsibility for query optimization

Objective

In this presentation, we focus on the retrieval of topic objects that are related to a particular topic by a specific association

e.g., what did Conan Doyle write?

We propose an optimization technique based on the estimation of execution cost

[Figure: 'Conan Doyle' (a particular topic) is linked by 'write' (a specific association) to 'A Study in Scarlet' (the intended topic); the particular topic and the specific association are specified in the query]

Retrieval plan - the association route

e.g., What did Conan Doyle write?

Step 1: We search the association objects ‘write’

Step 2: We search the topic object ‘Conan Doyle’

Step 2: We find the intended topic objects

[Figure: the association ‘write’ (1) between ‘Conan Doyle’ (2) and ‘A Study in Scarlet’ (2)]

Retrieval plan - the topic route

e.g., What did Conan Doyle write?

Step 1: We search the topic object ‘Conan Doyle’

Step 2: We search the association objects ‘write’ referred to by the association role objects

Step 3: We find the intended topics

[Figure: ‘Conan Doyle’ (1), the association ‘write’ (2) and ‘A Study in Scarlet’ (3)]
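
The two retrieval plans can be sketched on a small in-memory object graph as follows. This is a simplified illustration rather than the TOME code: the classes are minimal stand-ins that store names as plain strings, and the function names `association_route` and `topic_route` are hypothetical.

```python
class Topic:
    def __init__(self, name):
        self.name = name
        self.roles = []  # AssociationRole objects played by this topic


class AssociationRole:
    def __init__(self, parent, player):
        self.parent = parent  # owning Association
        self.player = player  # Topic playing the role
        player.roles.append(self)


class Association:
    def __init__(self, name, *players):
        self.name = name
        self.roles = [AssociationRole(self, p) for p in players]


def association_route(associations, assoc_name, topic_name):
    found = []
    for assoc in associations:  # step 1: scan every association for the name
        if assoc.name != assoc_name:
            continue
        players = [role.player for role in assoc.roles]  # step 2: topics on both sides
        if any(p.name == topic_name for p in players):
            found.extend(p for p in players if p.name != topic_name)
    return found


def topic_route(topics, assoc_name, topic_name):
    for topic in topics:  # step 1: scan topics until the unique name matches
        if topic.name != topic_name:
            continue
        found = []
        for role in topic.roles:  # step 2: associations referred to by the roles
            if role.parent.name == assoc_name:
                # step 3: the players of the other roles are the intended topics
                found.extend(r.player for r in role.parent.roles if r is not role)
        return found
    return []


doyle = Topic("Conan Doyle")
study = Topic("A Study in Scarlet")
sign = Topic("The Sign of Four")
topics = [study, doyle, sign]
associations = [Association("write", doyle, study), Association("write", doyle, sign)]

by_assoc = association_route(associations, "write", "Conan Doyle")
by_topic = topic_route(topics, "write", "Conan Doyle")
print(sorted(t.name for t in by_assoc))
print(sorted(t.name for t in by_topic))
```

Both functions return the same topics; they differ only in which objects they touch on the way, which is what the cost estimation has to capture.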

Estimation of execution cost

Systems have to choose the most suitable plan

It is necessary to define a cost that can effectively estimate the retrieval time (cost estimation)

We define estimation formulae for the retrieval cost of each plan

[Figure: a query can reach the information of the topic map via Route A (cost: 10) or Route B (cost: 100)]

Cost of objects - definition of cost

We measured the total execution time and the retrieval time of objects

Object retrieval accounts for more than 99% of the processing time

It is therefore enough to measure the time to retrieve objects in order to evaluate the cost of query processing

Route               Execution time (A) (ns)   Retrieval time of objects (B) (ns)   Ratio of object retrieval time (B/A)
Association route   6.025×10^8                5.991×10^8                           99.44%
Topic route         1.035×10^8                1.033×10^8                           99.81%

[Pie chart: execution time of retrieval; retrieval time of objects: more than 99%; other time: less than 1%]

Cost estimation formula

for the association route

$$C_{assoc\_route} = C_a N + 2\,(C_{ar} + C_t)\,\frac{N}{Q}$$

$C_a N$: we need to retrieve all associations, since multiple associations may have the same name

Factor 2: the cost is doubled, since we retrieve the two topics on both sides of the association

$N/Q$: we approximate the number of associations with the specified name by the average number of associations per unique name

[Figure: the association ‘write’ (1) between ‘Conan Doyle’ (2) and ‘A Study in Scarlet’ (2)]

Cost estimation formula

for the topic route

$$C_{topic\_route} = C_t\,\frac{M}{2} + (C_{ar} + C_a)\,\frac{2N}{M} + C_{ar}\,\frac{2N}{MQ}$$

$M/2$: the average number of topic retrievals (note that each topic must have a unique name)

$2N/M$: the average number of associations per topic

$2N/(MQ)$: the average number of associations that have the name specified by the query

[Figure: ‘Conan Doyle’ (1), the association ‘write’ (2) and ‘A Study in Scarlet’ (3)]
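
As a rough illustration of how a system could use the two formulae to pick a plan, the sketch below evaluates both and returns the cheaper route. N and M are the Pokemon topic map sizes and the cost values are the object size scale factors reported later in the slides; Q = 3 is a hypothetical value, and the function names are introduced here only for illustration.

```python
def assoc_route_cost(n, q, c_a, c_ar, c_t):
    """C_a*N + 2*(C_ar + C_t)*N/Q: scan all associations, then both sides of the matches."""
    return c_a * n + 2 * (c_ar + c_t) * n / q


def topic_route_cost(n, m, q, c_a, c_ar, c_t):
    """C_t*M/2 + (C_ar + C_a)*2N/M + C_ar*2N/(M*Q): find the topic, then follow its roles."""
    return c_t * m / 2 + (c_ar + c_a) * 2 * n / m + c_ar * 2 * n / (m * q)


def choose_plan(n, m, q, c_a, c_ar, c_t):
    """Return the name of the cheaper plan together with both estimated costs."""
    costs = {
        "association route": assoc_route_cost(n, q, c_a, c_ar, c_t),
        "topic route": topic_route_cost(n, m, q, c_a, c_ar, c_t),
    }
    return min(costs, key=costs.get), costs


# n, m: Pokemon topic map sizes from the slides; q = 3 is a hypothetical value.
plan, costs = choose_plan(n=432, m=174, q=3, c_a=2.94, c_ar=1.0, c_t=4.75)
print(plan, costs)
```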

Experiment

In order to demonstrate our method, we applied our technique to TOME

TOME is a prototype topic map database developed by the authors

As target topic maps, we selected the following two, which have different sizes

Rampo Edogawa* topic map

Number of topics: 29 (his name, his works and his hometown)

Number of associations: 15 (his works and his hometown)

Pokemon topic map

Number of topics: 174 (Pokemon names and their attributes)

Number of associations: 432 (evolutional and attribute relationships)

*Rampo Edogawa is a famous mystery writer in Japan.

Evaluation of cost estimation formulae

In order to evaluate our cost estimation formulae, we measured the execution time of a query and compared it with the tendency of the cost values

Topic map                 Average time of query execution (nano sec)   Evaluated cost for each query execution plan
                          Association route   Topic route              Association route   Topic route
Rampo Edogawa topic map   31                  157                      133.2               164.0
Pokemon topic map         297                 31                       2533                697.7

For the Rampo Edogawa topic map, the association route is both faster and cheaper (31 < 157, 133.2 < 164.0); for the Pokemon topic map, the topic route is (297 > 31, 2533 > 697.7)

We can see the tendency: the lower the estimated cost, the shorter the execution time
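
The comparison in the table can be checked mechanically. The short sketch below uses only the four measured times and four evaluated costs from the table and tests whether the plan with the lowest estimated cost is also the plan with the shortest execution time; the helper name `cheapest` is introduced here for illustration.

```python
# Values copied from the evaluation table (times as reported, costs unitless).
results = {
    "Rampo Edogawa": {
        "time": {"association route": 31, "topic route": 157},
        "cost": {"association route": 133.2, "topic route": 164.0},
    },
    "Pokemon": {
        "time": {"association route": 297, "topic route": 31},
        "cost": {"association route": 2533, "topic route": 697.7},
    },
}


def cheapest(values):
    """Return the route with the smallest value."""
    return min(values, key=values.get)


# True when the cost-based choice matches the fastest measured route.
agreement = {
    name: cheapest(r["cost"]) == cheapest(r["time"]) for name, r in results.items()
}

for name, r in results.items():
    print(name, "->", cheapest(r["cost"]), "| agrees with timing:", agreement[name])
```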

Conclusion

We proposed an optimization technique based on the estimation of execution cost

We showed that there may be more than one way to retrieve the same objects

We defined cost estimation formulae for the retrieval cost of each plan

We evaluated our optimization technique

The result of our experiment shows that the retrieval time tends to be proportional to the object size

We can also see the tendency that the estimated cost is small when the execution time is short

Thank you for your kind attention


The effect of buffers

If an object that needs to be loaded already exists in memory, a buffer shortens the retrieval time

The cost estimated by the formulae needs to be modified (reduced) because of the effect of buffers

In our target query, there are two cases in which the buffer is used:

A topic that already exists in memory is loaded from the buffer

A topic for the association name that already exists in memory is also loaded from the buffer

[Figure: ‘Conan Doyle’ connected by ‘Write’ associations to ‘A Study in Scarlet’ and ‘The Sign of Four’]

The coefficients of buffer

In our target query, we need two coefficients:

For retrieval of topics:

$$\frac{M}{2N} + \left(1 - \frac{M}{2N}\right) r$$

where $M/(2N)$ is the probability that the topic does not exist in the buffer

For retrieval of topics for the association names:

$$\frac{Q}{N} + \left(1 - \frac{Q}{N}\right) r$$

where $Q/N$ is the probability that the topic for the association name does not exist in the buffer

r: the effective retrieval cost ratio for the buffer

N: the number of association objects

M: the number of topic objects

Q: the number of unique association names
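
As a worked illustration of the two coefficients, the block below plugs in the Pokemon topic map sizes from the slides (N = 432, M = 174). The values Q = 3 and r = 0.1 are hypothetical, chosen only to show the arithmetic, and the coefficient forms are the ones reconstructed from the slide text.

```latex
% N = 432 and M = 174 are the Pokemon topic map sizes from the slides;
% Q = 3 and r = 0.1 are hypothetical values chosen only for illustration.
\[
\frac{M}{2N} + \Bigl(1 - \frac{M}{2N}\Bigr) r
  = \frac{174}{864} + \Bigl(1 - \frac{174}{864}\Bigr)(0.1)
  \approx 0.2014 + 0.0799 \approx 0.2813
\]
\[
\frac{Q}{N} + \Bigl(1 - \frac{Q}{N}\Bigr) r
  = \frac{3}{432} + \Bigl(1 - \frac{3}{432}\Bigr)(0.1)
  \approx 0.0069 + 0.0993 \approx 0.1062
\]
```

Under these assumed values, a topic retrieval costs on average about 28% of an unbuffered one, and a retrieval of a topic for an association name about 11%.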

The modified cost estimation formulae

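Taking the buffering effect into consideration, we modify the cost estimation formulae as follows

The contribution of loading topic name objects is also taken into consideration

$$C_{assoc\_route} = \left[C_a + \left(\frac{Q}{N} + \left(1 - \frac{Q}{N}\right) r\right)(C_t + C_{tn})\right] N + 2\left[C_{ar} + \left(\frac{M}{2N} + \left(1 - \frac{M}{2N}\right) r\right)(C_t + C_{tn})\right]\frac{N}{Q}$$

$$C_{topic\_route} = (C_t + C_{tn})\,\frac{M}{2} + \left[C_{ar} + C_a + \left(\frac{Q}{N} + \left(1 - \frac{Q}{N}\right) r\right)(C_t + C_{tn})\right]\frac{2N}{M} + \left[C_{ar} + \left(\frac{M}{2N} + \left(1 - \frac{M}{2N}\right) r\right)(C_t + C_{tn})\right]\frac{2N}{MQ}$$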
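
The modified formulae can be evaluated with a short sketch like the one below. It assumes that the buffer coefficients multiply the topic and topic name costs, as reconstructed from the slide annotations. N, M and the cost values (object size scale factors) come from the slides; Q = 3 and r = 0.1 are hypothetical, and the clamp of the miss probability to 1 is a guard added here, not part of the slides.

```python
def buffer_coefficient(p_miss, r):
    """p_miss + (1 - p_miss) * r: full cost on a buffer miss, ratio r on a hit."""
    p_miss = min(1.0, p_miss)  # guard added for this sketch (not in the slides)
    return p_miss + (1 - p_miss) * r


def assoc_route_cost(n, m, q, r, c_a, c_ar, c_t, c_tn):
    name_coef = buffer_coefficient(q / n, r)         # topics for association names
    topic_coef = buffer_coefficient(m / (2 * n), r)  # topics on both sides
    scan = (c_a + name_coef * (c_t + c_tn)) * n
    both_sides = 2 * (c_ar + topic_coef * (c_t + c_tn)) * n / q
    return scan + both_sides


def topic_route_cost(n, m, q, r, c_a, c_ar, c_t, c_tn):
    name_coef = buffer_coefficient(q / n, r)
    topic_coef = buffer_coefficient(m / (2 * n), r)
    find_topic = (c_t + c_tn) * m / 2
    follow_roles = (c_ar + c_a + name_coef * (c_t + c_tn)) * 2 * n / m
    intended = (c_ar + topic_coef * (c_t + c_tn)) * 2 * n / (m * q)
    return find_topic + follow_roles + intended


# Pokemon topic map sizes and size scale factors from the slides;
# q and r are hypothetical values chosen for illustration.
params = dict(n=432, m=174, q=3, r=0.1, c_a=2.94, c_ar=1.0, c_t=4.75, c_tn=2.94)
assoc_cost = assoc_route_cost(**params)
topic_cost = topic_route_cost(**params)
print(round(assoc_cost, 1), round(topic_cost, 1))
```

With these inputs the topic route comes out cheaper, which is the same ordering the slides report for the Pokemon topic map.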

Cost estimation formula

for the association route

We define the cost estimation formula as follows

$$C_{assoc\_route} = \left[C_a + \left(\frac{Q}{N} + \left(1 - \frac{Q}{N}\right) r\right)(C_t + C_{tn})\right] N + 2\left[C_{ar} + \left(\frac{M}{2N} + \left(1 - \frac{M}{2N}\right) r\right)(C_t + C_{tn})\right]\frac{N}{Q}$$

The terms are labeled on the slide as follows:

Retrieval of TopicMap objects

Retrieval of Topic objects

Retrieval of Association objects

Retrieval of Topic objects that are defined as the association name

Retrieval of TopicName objects that are defined as the association name

Retrieval of AssociationRole objects

Retrieval of TopicName objects that are defined as the topic name

N: the number of association objects

M: the number of topic objects

Q: the number of unique association names

TMDM permits the redundant existence of multiple associations that have the same name

We assume that the association roles are uniformly assigned to associations

The formula before modification:

$$C_{assoc\_route} = C_a N + 2\,(C_{ar} + C_t)\,\frac{N}{Q}$$

The accurate cost estimation formula

for the association route

$$C_{assoc\_route} = \left[C_a + \left(\frac{Q}{N} + \left(1 - \frac{Q}{N}\right) r\right)(C_t + C_{tn})\right] N + 2\left[C_{ar} + \left(\frac{M}{2N} + \left(1 - \frac{M}{2N}\right) r\right)(C_t + C_{tn})\right]\frac{N}{Q}$$

First bracket: we have to consider the retrieval cost of topic and topic name objects and the effect of the buffer

Second bracket: we have to consider the retrieval cost of topic name objects and the effect of the buffer

Ca: the retrieval cost of association objects

Car: the retrieval cost of association role objects

Ct: the retrieval cost of topic objects

Ctn: the retrieval cost of topic name objects

N: the number of association objects

M: the number of topic objects

Q: the number of unique association names

Cost estimation formula

for the topic route

We define the cost estimation formula as follows

$$C_{topic\_route} = (C_t + C_{tn})\,\frac{M}{2} + \left[C_{ar} + C_a + \left(\frac{Q}{N} + \left(1 - \frac{Q}{N}\right) r\right)(C_t + C_{tn})\right]\frac{2N}{M} + \left[C_{ar} + \left(\frac{M}{2N} + \left(1 - \frac{M}{2N}\right) r\right)(C_t + C_{tn})\right]\frac{2N}{MQ}$$

The terms are labeled on the slide as follows:

Retrieval of TopicMap objects

Retrieval of Association objects

Retrieval of AssociationRole objects

Retrieval of Topic objects

Retrieval of Topic objects that are defined as the association name

Retrieval of TopicName objects that are defined as the association name

Retrieval of AssociationRole objects

Retrieval of TopicName objects that are defined as the topic name

Retrieval of Topic objects

Retrieval of TopicName objects that are defined as the topic name

$M/2$: TMDM permits only one topic with a given name

$2N/M$: regarding the topic map as a graph, this is equal to the average degree

We assume that the association roles are uniformly assigned to associations

The accurate cost estimation formula

for the topic route

$$C_{topic\_route} = (C_t + C_{tn})\,\frac{M}{2} + \left[C_{ar} + C_a + \left(\frac{Q}{N} + \left(1 - \frac{Q}{N}\right) r\right)(C_t + C_{tn})\right]\frac{2N}{M} + \left[C_{ar} + \left(\frac{M}{2N} + \left(1 - \frac{M}{2N}\right) r\right)(C_t + C_{tn})\right]\frac{2N}{MQ}$$

We have to consider the retrieval cost of topic name objects

We have to consider the retrieval cost of topic objects and topic name objects and the effect of the buffer

We have to consider the retrieval cost of topic name objects and the effect of the buffer

Ca: the retrieval cost of association objects

Car: the retrieval cost of association role objects

Ct: the retrieval cost of topic objects

Ctn: the retrieval cost of topic name objects

N: the number of association objects

M: the number of topic objects

Q: the number of unique association names

The formula before modification:

$$C_{topic\_route} = C_t\,\frac{M}{2} + (C_{ar} + C_a)\,\frac{2N}{M} + C_{ar}\,\frac{2N}{MQ}$$

Result: cost estimation of an object of each class

Normalized values are relative to the association role object (= 1)

Topic map                 Object             Retrieval time (nano sec)   Normalized retrieval time   Object size (byte)   Normalized object size
Rampo Edogawa topic map   Topic              969200                      3.34                        608                  4.75
Rampo Edogawa topic map   Topic name         496700                      1.71                        376                  2.94
Rampo Edogawa topic map   Association role   289900                      1                           128                  1
Rampo Edogawa topic map   Association        562600                      1.94                        376                  2.94
Pokemon topic map         Topic              1053000                     5.5                         608                  4.75
Pokemon topic map         Topic name         501600                      2.62                        376                  2.94
Pokemon topic map         Association role   191400                      1                           128                  1
Pokemon topic map         Association        577700                      3.02                        376                  2.94

We can see a similar tendency between the retrieval time and the object size

Retrieval cost of each object

We measured the retrieval time and the object size of each object

The result tells us that the retrieval time is almost proportional to the object size

Based on this, we define the cost as an object size scale factor (the ratio of the object size to that of association role objects)

Pokemon topic map

Object                    Normalized retrieval time   Object size scale factor
Topic object              5.5                         4.75
Topic name object         2.62                        2.94
Association role object   1                           1
Association object        3.02                        2.94

We can see a similar tendency between the retrieval time and the object size
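
The scale factors in the table follow directly from the measured sizes and times. The sketch below recomputes them from the Pokemon topic map figures in the slides; the function name `scale_factors` is introduced here for illustration.

```python
# Object sizes (bytes) and retrieval times (as reported) for the Pokemon topic map.
sizes = {"topic": 608, "topic name": 376, "association role": 128, "association": 376}
times = {"topic": 1053000, "topic name": 501600, "association role": 191400, "association": 577700}


def scale_factors(values, base="association role"):
    """Normalize every value by the value of the base object."""
    return {name: value / values[base] for name, value in values.items()}


size_factors = scale_factors(sizes)
time_factors = scale_factors(times)

for name in sizes:
    print(f"{name}: size factor {size_factors[name]:.2f}, time factor {time_factors[name]:.2f}")
```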

Future perspective

We will apply our method to other topic maps of much larger size

Our target topic maps have fewer than 1000 topics

We need to confirm the universality of the cost estimation formulae by evaluating them on various topic maps

We will develop a mechanism to measure the size of objects in a topic map

Since the size of objects depends on each topic map, we have to measure it in order to set cost values adequate for evaluating execution plans

Reference

M. Naito: An Introduction to Topic Maps. Tokyo Denki University Press, 2006.

Yuki Kuribara, Takeshi Hosoya, Masaomi Kimura: TOME: The Topic Map Database Extended, 2009.

Ontopia: tolog Language tutorial. http://www.ontopia.net/

ISO/IEC JTC1/SC34: Topic Map – Data Model. http://www.isotopicmaps.org/sam/sam-model/

Pokemon Topic Map. http://www.ontopia.net/omnigator/models/topicmap_complete.jsp?tm=pokemon.ltm

Pajek. http://vlado.fmf.uni-lj.si/pub/networks/pajek/