LoG: A Methodology for Metadata Registry-based Management of Scientific Data

July 5, 2002

Doo-Kwon Baik

baik@software.korea.ac.kr

July 5, 2002, CODATA/DSAO 2002

Content

• Motivation
• Objectives
• Related works
• Overview on the MDR
• The scientific data properties
• User levels and the data property
• Data visibility
• The conceptual model of the LoG
• A LoG Framework
• An Example
• Conclusions and Future work


Motivation

The existing data integration approaches
• focus only on technical research and system development
• do not consider the properties of the domain knowledge


The Domain Knowledge

• The domain knowledge property is a very important factor in data integration.
• Many tasks and services depend on the domain knowledge properties.
  • The quality degree and the quantity scope of data integration are defined by the domain knowledge property.
  • Many other services, such as data services and application services, depend on it.

[Figure: domain knowledge at the center, connected to the quality degree of data integration, the quantity scope of data integration, data services (information providing), and application services.]


Objectives

The objectives of our research are
• to solve the problems of the existing data integration approaches
• to analyze and define the domain knowledge properties
  • In this paper, we focus on scientific data.
• to define the relationships among the domain knowledge properties, users, and metadata
  • i.e., to define the considerations for data integration.
• to create a new methodology that takes the results of the domain knowledge analysis into account
  • We call it LoG (Localization-based Global MDR methodology).
• finally, to design a framework suitable for the methodology.


Related works: Bottom-up approach (1/2)

• The existing data integration approaches are classified into the top-down approach and the bottom-up approach.
• Bottom-up approach
  • is the most general approach.
  • The ontology-based methodology is representative.

[Figure: bottom-up approach. Labels: analyze all factual databases (the number of databases = n); design and create a guideline such as a global view from the specified databases; new databases (the number of them = c); the number of databases = n + c.]


Related works: Bottom-up approach (2/2)

Advantages
• can reach perfect data integration, because it uses a global guideline created by analyzing and designing over all databases.

Disadvantages
• Creating the global guideline takes a great deal of cost and time.
• is not suitable for very large-scale data integration.
• provides only a static integration management mechanism.
  • Whenever a new schema or a new database is added to the integrated database, the previous processes must be repeated.
  • This makes cost and time increase geometrically.
• does not provide a standardized guideline.
  • i.e., it depends on its domain.
  • Each application domain defines and uses its own, different guidelines for integration.


Related works: Top-down approach (1/2)

• Top-down approach
  • aims to solve the problems of the bottom-up approach.
  • MDR (ISO/IEC 11179) is representative.
    • MDR is an international standard.

[Figure: top-down approach. Labels: analyze all factual databases; design and create a guideline such as a global view (metadata elements) from the specified databases; new databases; define the schemas of new databases according to the standardized guideline.]


Related works: Top-down approach (2/2)

Advantages
• reduces cost considerably
  • because it does not require rebuilding the global guideline.
• provides a standardized schema
  • all new databases can be built and managed consistently.

Disadvantages
• It also incurs a high initial cost, like the bottom-up approach,
  • because it requires creating a global view by analyzing all legacy databases.
  • This is hard work for very large-scale integration.


Overview on the MDR: Definition

Definition of MDR
• Metadata Registry
• A system for registering, storing, and managing the specifications (metadata) of data elements
• Evolution of ISO/IEC 11179
• Metamodel of Data Registry: ANSI X3.285

Purpose
• A metadata registry for data standardization
• Support for data search and data specification
• Support for data sharing among systems or organizations
• A supporting system for creating, registering, and managing data elements
• Helps users understand the meaning, representation, and identification of data


Overview on the MDR: Basic concepts

Data Element
• The basic unit of data management
• The unit that specifies the identification, context, and representation of the values of data

Components of Data Element
• Object Class: the data to be collected or stored
• Property: the characteristics needed to identify and explain objects
• Representation: the description of the representational form and value domain of each data element

[Figure: two diagrams. A Data Element Concept is composed of an Object Class and a Property; a Data Element is composed of an Object Class, a Property, and a Representation. Cardinality labels shown on the links: 1:N and 1:1.]
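The composition described on this slide can be sketched as a small data structure. This is only an illustrative sketch, not part of ISO/IEC 11179 or of the LoG framework: the class names mirror the slide, while the field names (`prop`, `data_type`, `max_size`) and the `label` helper are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectClass:
    name: str  # the data to be collected or stored, e.g. "Student"


@dataclass(frozen=True)
class Property:
    name: str  # a characteristic that identifies or explains the object


@dataclass(frozen=True)
class Representation:
    data_type: str  # representational form, e.g. "Numeric" (assumed field)
    max_size: int   # bound on the value domain, in characters (assumed field)


@dataclass(frozen=True)
class DataElement:
    # A data element combines the three components named on the slide.
    object_class: ObjectClass
    prop: Property
    representation: Representation

    def label(self) -> str:
        # Hypothetical naming rule: "<object class>_<property>".
        return f"{self.object_class.name}_{self.prop.name}"


student_id = DataElement(
    ObjectClass("Student"),
    Property("ID"),
    Representation(data_type="Numeric", max_size=12),
)
print(student_id.label())
```

Keeping the three components as separate immutable objects lets the same object class or property be reused by several data elements.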


Overview on the MDR: Specification

Specification of Data Element
• Basic attributes for specifying a data element

Classification    Characteristics
Identification    Identification of the data element
Definition        Description of meaning
Relation          Relation among data elements
Representation    Description of the data element's representation
Administration    Description of data element management


Overview on the MDR: An Example

Definition of a metadata element

Identifying and Definition Attributes
  Data Element Name          Student_ID
  Identifier                 2002020177
  Version                    1
  Synonymous name            Student Number
  Context                    Student’s ID

Definitional Attribute
  Definition                 A unique number assigned to each student

Relational and Representational Attributes
  Type                       Data Element
  Representation Category    Number
  Representation Form        Code
  Data Type                  Numeric
  Min. size                  7
  Max. size                  12
  Representation Layout      N(12)
  Data Domain                Reference of student ID classification

Administrative Attribute
  Registration Authority     KOREA UNIV.
  Registration Status        recorded
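To show how the representational attributes of such an element might be used, here is a minimal sketch that checks a candidate value against the data type and size bounds listed for Student_ID. The dictionary layout, the `conforms` function, and the sample values are assumptions for illustration; the slides do not define a validation procedure.

```python
# Representational attributes copied from the Student_ID example.
# The dict layout itself is an assumption made for this sketch.
STUDENT_ID = {
    "name": "Student_ID",
    "data_type": "Numeric",
    "min_size": 7,
    "max_size": 12,
}


def conforms(value: str, element: dict) -> bool:
    """Return True if value matches the element's data type and size bounds."""
    if element["data_type"] == "Numeric":
        # Accept ASCII digits only.
        if not (value.isascii() and value.isdigit()):
            return False
    return element["min_size"] <= len(value) <= element["max_size"]


# Hypothetical sample values, not taken from the slides.
print(conforms("2002123456", STUDENT_ID))  # 10 digits, inside the 7..12 range
print(conforms("12345", STUDENT_ID))       # shorter than Min. size
print(conforms("2002A23456", STUDENT_ID))  # contains a non-digit
```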


The scientific data properties

Scientific data (knowledge) has the following properties:
• General data
  • Most people can understand and use it easily.
  • Most databases in scientific fields have similar or identical data elements.
• Specialized data
  • is more complicated and detailed.
  • General users cannot understand it.
  • Experts in a specific group are interested in the data and can use it.

※ Building a single MDR for all data as a whole is not necessary.


User levels and the data property

Classification of users
• Users are classified into two groups according to the scientific data property: general users and specialized users.

General users
• use general data at a high level and across many fields.

Specialized users
• are domain experts in a specific field.
• use both general data and specialized data.
• are further differentiated into more detailed fields.

i.e., specialized users are distributed across several groups, and the experts in each group are independently interested in more specialized data.


Data visibility

• The quantity and the degree of specialization of data are differentiated into several levels according to the knowledge property, and each level has an independent data set.

[Figure: user groups and the data sets they use. User groups: all users, general users, specialized users, detailed-specialized users 1 ... n. Data: the whole data set, divided into set 1 ... set 5. Labels: used by all users; used by specialized users; used in an independent expert domain group.]
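The idea of level-dependent visibility can be sketched as a lookup from user level to data sets. The figure does not state which numbered set belongs to which level, so the assignment below (set 1 common, set 2 specialized, sets 3 to 5 for individual expert groups), the group names, and the `visible_sets` function are all assumptions for illustration.

```python
from typing import Optional, Set

# Hypothetical assignment of data sets to visibility levels.
COMMON_SETS = {"set 1"}        # used by all users
SPECIALIZED_SETS = {"set 2"}   # used by specialized users
GROUP_SETS = {                 # used in independent expert domain groups
    "group 1": {"set 3"},
    "group 2": {"set 4"},
    "group 3": {"set 5"},
}

LEVELS = ("general", "specialized", "detailed-specialized")


def visible_sets(level: str, group: Optional[str] = None) -> Set[str]:
    """Return the data sets a user of the given level (and group) can see."""
    if level not in LEVELS:
        raise ValueError(f"unknown user level: {level}")
    if level == "general":
        return set(COMMON_SETS)
    visible = COMMON_SETS | SPECIALIZED_SETS  # new set, constants untouched
    if level == "detailed-specialized" and group is not None:
        visible |= GROUP_SETS.get(group, set())
    return visible


print(sorted(visible_sets("general")))
print(sorted(visible_sets("detailed-specialized", "group 2")))
```

Each expert group sees only its own detailed set in addition to the shared ones, which matches the slide's point that each level has an independent data set.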


The conceptual relation diagram

[Figure: conceptual relation diagram. Elements: General Users 1 ... n; Domain Experts 1 ... n; the Global MDR; Local MDR 1 (Domain 1), Local MDR 2 (Domain 2), ..., Local MDR m (Domain m); and the databases of each domain (DB 11 ... DB 1n, DB 21 ... DB 2n, ..., DB m1 ... DB mn). Relation labels: Localization, Globalization, Specialization, Generalization.]


The conceptual model of the LoG

The LoG methodology has four layers:
• Interface Layer
  • provides the user interface environments for all users.
• Global MDR Layer
  • manages the global MDR for the most generalized and common data, which all users (general and specialized) access and use.
• Local MDR Layer
  • manages the local MDRs for the specialized data that the experts use.
  • The local MDRs may form a hierarchical structure.
• Factual Database Layer
  • manages the low-level factual data.

[Figure: the four layers stacked. User Interface Layer; Global MDR Layer (Generalized Layer); Local MDR Layer (Specialized Layer); Factual Database Layer.]
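One way to read the layering is as a lookup order for metadata element definitions: an expert consults the local MDR of their domain first and falls back to the global MDR, while a general user sees only the global MDR. The sketch below illustrates that reading; the registry contents, the domain names, and the `resolve` function are placeholders invented for the example, and the lookup order itself is an assumption rather than something the slides specify.

```python
from typing import Dict, Optional

# Placeholder registries: element name -> definition text.
GLOBAL_MDR: Dict[str, str] = {
    "common_element": "definition shared by all users",
}
LOCAL_MDRS: Dict[str, Dict[str, str]] = {
    "domain 1": {"domain_1_element": "definition used by domain 1 experts"},
    "domain 2": {"domain_2_element": "definition used by domain 2 experts"},
}


def resolve(element: str, domain: Optional[str] = None) -> Optional[str]:
    """Look up an element definition.

    A general user passes no domain and sees only the global MDR.
    An expert passes their domain; the local MDR is checked first,
    then the global MDR.
    """
    if domain is not None:
        local = LOCAL_MDRS.get(domain, {})
        if element in local:
            return local[element]
    return GLOBAL_MDR.get(element)


print(resolve("common_element"))                # visible to everyone
print(resolve("domain_1_element"))              # None for a general user
print(resolve("domain_1_element", "domain 1"))  # visible to domain 1 experts
```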


A LoG Framework (1/2)

[Figure: the LoG framework, by layer.
 User Interface Layer: Global User Interface (General User Level Interface); Local User Interface (Expert Level Interface).
 Global MDR Layer: General User Level Interface Agent; GMDR Agent (Registration, Classification); GMDR; GMeta Repository.
 Local MDR Layer: Expert Level Interface Agent; LMDR Agent (Registration, Classification, Authorization); LMDRs (LMDR 1, LMDR 2, ..., LMDR n); LMeta Repository (sets of actual metadata).
 Factual DB Layer: DB 11 ... DB 1n (Domain 1); DB 21 ... DB 2n (Domain 2); ...; DB m1 ... DB mn (Domain m).]


A LoG Framework (2/2)

Interface Layer
• Global user interface and local user interface sub-layers

Global MDR layer
• GMDR agent
  • manages the GMDR (global MDR) and the GMeta (global metadata repository).
• GMDR (global MDR)
  • a standardized guideline for general users and experts.
  • the set of metadata elements used commonly in all databases.
• GMeta (global metadata repository)
  • the set of actual metadata.

Local MDR layer
• LMDR agent
  • manages the LMDRs and the LMeta.
• LMDRs (local MDRs)
  • a standardized guideline for the specialized users.
  • a set of metadata elements that generalizes the data in each field or detailed field.


An Example

GMDR

  Name
    definition             the unique object name
    version                1
    registration status    standard
    datatype               character
    format                 character(20)

LMDRs

  Biological Order Name
    definition             The systematic name that represents the biological species
    version                1
    registration status    standard
    datatype               character
    format                 character(50)

  Chemical Molecular Formula Code
    definition             The code that represents the number of atoms of each element in a molecule of a chemical substance
    version                1
    registration status    standard
    datatype               character
    format                 character(100)

[Figure: the GMDR element Name is paired with the LMDR elements Biological Order Name and Chemical Molecular Formula Code.]
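The pairing in this example can be sketched as a link from each local element to the global element it specializes. The three element names and format lengths come from the slide; the `generalizes_to` field, the domain keys "biology" and "chemistry", and the `specializations` function are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class MetadataElement:
    name: str
    format_length: int                    # N in character(N)
    generalizes_to: Optional[str] = None  # assumed link to a GMDR element


# Global registry: elements common to all databases.
GMDR: Dict[str, MetadataElement] = {
    "Name": MetadataElement("Name", 20),
}

# Local registries, keyed by a hypothetical domain label.
LMDRS: Dict[str, MetadataElement] = {
    "biology": MetadataElement("Biological Order Name", 50, "Name"),
    "chemistry": MetadataElement("Chemical Molecular Formula Code", 100, "Name"),
}


def specializations(global_name: str) -> List[str]:
    """List the local elements that specialize the given global element."""
    return sorted(
        element.name
        for element in LMDRS.values()
        if element.generalizes_to == global_name
    )


print(specializations("Name"))
```

With such a link, a query phrased against the global element Name could be routed to the matching local element in each domain.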


Conclusions and Future work

Conclusions
• We considered and defined the domain knowledge property.
• The LoG methodology is proposed on the basis of the knowledge property. It
  • partially provides a dynamic integration mechanism.
  • provides a standardization guideline based on ISO/IEC 11179, the international standard.
  • reduces the unnecessary cost of analyzing and designing over all databases to create a global view.

Future work
• to analyze and define the domain knowledge property in detail
• to implement a prototype based on the framework we described


Q / A

Thanks!