Discovering simple mappings between relational database schemas and ontologies

23/4/21 ISWC2007, Nov. 14.

Wei Hu, Yuzhong Qu
{whu, yzqu}@seu.edu.cn

Institute of Web Science
School of Computer Science and Engineering

Southeast University, China

Outline

Introduction
Our approach
Evaluation
Related work
Summary and future work

Introduction

The popularity of ontologies has been growing rapidly since the emergence of the Semantic Web. Swoogle has collected more than 10,000 ontologies so far, and Falcons has indexed more than 2 million classes/properties.

However, most of the world’s data today are still locked in data stores, and are not published as an open Web of inter-referring resources. [Ref.4. Creating a science of the Web. 2006]

About 77.3% of the data on the current Web is stored in relational databases. [Ref.6. SIGMOD Record. 33(3) (2004)]

So, it is necessary to establish interoperability between (Semantic) Web applications using relational databases and ontologies for creating a Web of data.

Introduction – By an example

[Figure: a relational schema on the left (relations, attributes, primary keys, foreign keys) and an ontology on the right (classes and properties, which are either data valued or object properties).]

Introduction (cont’d)

Manually discovering such simple mappings is tedious and impractical at Web scale.

(Semi-)automatic approaches have therefore been proposed, but they have two limitations.

They do not adequately consider the characteristics of relational data models and ontology models, so the mappings are not accurate enough.

Most of the present approaches cannot construct semantic mappings, although the (missed) semantic mappings are useful in various practical applications.

Introduction – the contribution

We propose a new approach to discovering simple mappings.

It constructs virtual documents for the entities, and discovers mappings by comparing these virtual documents.

It validates mapping consistency, to eliminate certain incorrect mappings.

It explores contextual mappings, which can be transformed directly into view-based mappings with selection conditions and are useful for applications in real-world domains. [Ref. 5. Putting context into schema matching. VLDB'06]

Introduction – Terminology

R denotes a relation, and A denotes an attribute.
type(A): the domain name of A;
rel(A): the relation that specifies A;
pk(R): the attributes that form the primary key of R;
ref(A): the attributes referenced by A.

C represents a class, and P represents a property. PD denotes a data valued property, and PO denotes an object property.
d(P): the domain(s) of P;
r(P): the range(s) of P.

Introduction – Terminology (cont’d)

A mapping m is a 5-tuple < id, u, v, t, f >, where:
id is a unique identifier;
u is an entity in {R} ∪ {A}, and v is an entity in {C} ∪ {P};
t is a relationship, e.g. equivalence or subsumption, holding between u and v;
f is a confidence measure in the range [0, 1].

Examples:
< 1, writes, hasAuthor, , 1.0 >
< 2, id, hasID, , 1.0 >
< 3, Paper, JournalPaper, , 0.8 >
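As a minimal sketch, the 5-tuple can be held in a small record type. The names Mapping and Relationship and the field layout are illustrative assumptions, not Marson's actual data structures. The relationship used in the sample instance is also an assumption made only to have a complete value.

```python
from dataclasses import dataclass
from enum import Enum


class Relationship(Enum):
    """The two relationship kinds named on the slide (hypothetical encoding)."""
    EQUIVALENCE = "equivalence"
    SUBSUMPTION = "subsumption"


@dataclass(frozen=True)
class Mapping:
    id: int            # unique identifier
    u: str             # a relation or an attribute of the relational schema
    v: str             # a class or a property of the ontology
    t: Relationship    # relationship holding between u and v
    f: float           # confidence measure, must lie in [0, 1]

    def __post_init__(self):
        if not 0.0 <= self.f <= 1.0:
            raise ValueError("confidence f must lie in [0, 1]")


# Entities and confidence follow the first example above;
# the relationship value is assumed for illustration.
m = Mapping(1, "writes", "hasAuthor", Relationship.EQUIVALENCE, 1.0)
```

Making the record immutable (frozen) lets mappings be stored in sets, which is convenient when comparing discovered mappings against reference mappings.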

Overview of the approach

Phase 1: Classifying entity types (a preprocessing step). Heuristically classifies entities into different groups and coordinates their different characteristics.

Phase 2: Discovering simple mappings. Constructs virtual documents for entities and calculates confidence measures via the TF/IDF model.

Phase 3: Validating mapping consistency. Uses <relation, class> mappings to validate the consistency of <attribute, property> mappings, and also checks the compatibility between the data types of attributes and data valued properties.

Phase 4: Constructing contextual mappings. <relation, class> mappings + sample instances → contextual mappings.

Phase 1: Classifying entity types

Relation: strong entity relation (SER), weak entity relation (WER), regular relationship relation (RRR), specific relationship relation (SRR).

Attribute: foreign key attribute (FKA), non-foreign key attribute (NFKA).

[Ref.9. Data & Knowledge Engineering. 12 (1994)]

Group 1: {{SER} ∪ {WER}} × {C};
Group 2: {{RRR} ∪ {SRR}} × {PO};
Group 3: {FKA} × {PO};
Group 4: {NFKA} × {{PD} ∪ {PO}}.

Coordinating different characteristics: reifying n-ary relationships (n > 2); others.
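The four groups restrict which schema entities may be compared with which ontology entities. The sketch below illustrates that filter only; it assumes the entity types have already been assigned (the classification heuristics of Ref. 9 are not reproduced), and the function name, the lookup table and the sample type assignments are all hypothetical.

```python
# Ontology entity kinds allowed for each relational entity type (Groups 1 to 4).
ALLOWED_TARGETS = {
    "SER": {"C"},
    "WER": {"C"},
    "RRR": {"PO"},
    "SRR": {"PO"},
    "FKA": {"PO"},
    "NFKA": {"PD", "PO"},
}


def candidate_pairs(schema_entities, ontology_entities):
    """Yield the (schema entity, ontology entity) pairs that fall into one group.

    Both arguments map an entity name to its already-classified type.
    """
    for u, u_type in schema_entities.items():
        for v, v_type in ontology_entities.items():
            if v_type in ALLOWED_TARGETS[u_type]:
                yield (u, v)


# Sample type assignments are assumptions made for this illustration.
schema = {"Paper": "SER", "writes": "RRR", "id": "NFKA"}
ontology = {"JournalPaper": "C", "hasAuthor": "PO", "hasID": "PD"}
pairs = sorted(candidate_pairs(schema, ontology))
```

With these inputs only four of the nine possible pairs remain as candidates, so later phases never compare, for example, a relation with a data valued property.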

Phase 2: Discovering simple mappings

We construct virtual documents for the entities in both the relational schema and the ontology to capture their structural information. A virtual document represents a collection of weighted tokens, which are derived not only from the description of the entity itself, but also from the descriptions of its neighbors. The weights of the tokens indicate their importance, so a virtual document can be viewed as a vector in the TF/IDF model.

Rationale: the semantic information of a relational schema is characterized mainly by its integrity constraints (ICs); an OWL ontology can be mapped to an RDF graph, which also indicates the semantic information in its structure.
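The idea of comparing virtual documents can be sketched as follows. This is an illustration under stated assumptions, not Marson's actual formulas: the neighbor weight of 0.5, the smoothed IDF variant 1 + log(N/df), and all entity and token names are hypothetical.

```python
import math
from collections import Counter


def virtual_document(own_tokens, neighbor_tokens, neighbor_weight=0.5):
    """Weighted token bag: own description plus down-weighted neighbor descriptions."""
    doc = Counter()
    for tok in own_tokens:
        doc[tok] += 1.0
    for tok in neighbor_tokens:
        doc[tok] += neighbor_weight  # assumed weight, not taken from the source
    return doc


def tfidf_vectors(docs):
    """Turn {name: weighted token bag} into {name: TF/IDF vector}."""
    n = len(docs)
    # Document frequency: in how many virtual documents each token occurs.
    df = Counter(tok for doc in docs.values() for tok in doc)
    vectors = {}
    for name, doc in docs.items():
        total = sum(doc.values())
        vectors[name] = {
            tok: (w / total) * (1.0 + math.log(n / df[tok]))
            for tok, w in doc.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors, used here as the confidence measure."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


docs = {
    "rel:Paper": virtual_document(["paper"], ["title", "author"]),
    "cls:JournalPaper": virtual_document(["journal", "paper"], ["title"]),
    "cls:Person": virtual_document(["person"], ["name"]),
}
vecs = tfidf_vectors(docs)
sim_paper = cosine(vecs["rel:Paper"], vecs["cls:JournalPaper"])
sim_person = cosine(vecs["rel:Paper"], vecs["cls:Person"])
```

Here the relation Paper shares tokens with the class JournalPaper, both directly and through a neighbor, and therefore receives a positive similarity, while it shares nothing with Person and scores zero.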

Discovering simple mappings (cont’d.)

[Formulas not reproduced: virtual documents for relations and attributes, and virtual documents for classes and properties.]

Phase 3: Validating mapping consistency

We use <relation, class> mappings to validate the consistency of <attribute, property> mappings. Attributes cannot stand alone without relations. The restriction construct in an OWL ontology specifies local domain and range constraints on the classes.
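One plausible reading of this check is sketched below: an <attribute, property> mapping is kept only if the relation rel(A) is mapped to some class in the domain d(P). This is an assumption about how the validation could work, not a description of Marson's algorithm, and the function name and sample data are hypothetical.

```python
def validate(attr_prop_mappings, rel_class_mappings, rel_of, domains_of):
    """Keep <attribute, property> mappings supported by a <relation, class> mapping.

    attr_prop_mappings: list of (attribute, property, confidence)
    rel_class_mappings: set of (relation, class) pairs already accepted
    rel_of:             attribute -> relation that specifies it, i.e. rel(A)
    domains_of:         property -> set of domain classes, i.e. d(P)
    """
    kept = []
    for attr, prop, conf in attr_prop_mappings:
        relation = rel_of[attr]
        # Consistent only if rel(A) is mapped to at least one class in d(P).
        if any((relation, cls) in rel_class_mappings for cls in domains_of[prop]):
            kept.append((attr, prop, conf))
    return kept


# Hypothetical sample data.
rel_class = {("Paper", "JournalPaper"), ("Author", "Person")}
rel_of = {"Paper.title": "Paper", "Author.name": "Author"}
domains_of = {"hasTitle": {"JournalPaper"}}
candidates = [("Paper.title", "hasTitle", 0.9), ("Author.name", "hasTitle", 0.4)]
kept = validate(candidates, rel_class, rel_of, domains_of)
```

The second candidate is dropped because Author is not mapped to any class in the domain of hasTitle. A data type compatibility check between attributes and data valued properties could be added as a second filter in the same loop.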

Phase 4: Constructing contextual mappings

We focus on a special type of mapping: contextual mappings, which can be translated directly into conditional mappings or view-based mappings.

Evaluation – Data sets

Data sets:

http://www.cs.toronto.edu/~yuana/research/maponto/relational/testData.html [Ref.1. MapOnto]

We implemented our approach in Java as a tool called Marson.

Evaluation – Experimental methodology

Experiment 1. Discovering simple mappings: Marson vs. Simple, VDoc, Valid, and RONTO.

Simple: does not construct virtual documents and does not validate mapping consistency;
VDoc: constructs virtual documents but does not validate mapping consistency;
Valid: does not construct virtual documents but validates mapping consistency;
RONTO: an existing prototype that distinguishes the types of entities and uses I-Sub.

F1-Measure: a combination of precision and recall. We test various thresholds for each approach and select the best one.
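The F1-Measure is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below shows how it could be computed and how a threshold sweep might pick the best cut-off; the function names, the sample confidences and the reference set are hypothetical and are not results from the evaluation.

```python
def f1_measure(found, reference):
    """Harmonic mean of precision and recall for two sets of mappings."""
    correct = len(found & reference)
    if correct == 0:
        return 0.0
    precision = correct / len(found)
    recall = correct / len(reference)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scored, reference, thresholds):
    """Return (threshold, F1) with the highest F1; ties go to the earliest threshold listed."""
    results = []
    for th in thresholds:
        found = {pair for pair, conf in scored.items() if conf >= th}
        results.append((th, f1_measure(found, reference)))
    return max(results, key=lambda r: r[1])


# Hypothetical confidences and reference mappings.
scored = {
    ("Paper", "JournalPaper"): 0.8,
    ("writes", "hasAuthor"): 1.0,
    ("id", "hasAuthor"): 0.3,
}
reference = {("Paper", "JournalPaper"), ("writes", "hasAuthor")}
best = best_threshold(scored, reference, [0.2, 0.5, 0.9])
```

In this toy case the threshold 0.5 removes the one incorrect pair without losing a correct one, so it yields the highest F1.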

Experiment 2. Constructing contextual mappings.

We collect instances from the Web for the first three data sets, with more than 50 instances for each relation and class.

We compare the results with the mappings established by experienced volunteers.

Evaluation – Experiment 1

On an Intel Pentium IV 2.8 GHz processor with 512 MB of DDR2 memory, running Windows XP Professional and Java SE 6, Marson takes about 5 seconds to complete all five tests (including parsing time).

Evaluation – Experiment 2

In Case 1, the contextual mapping <academic_staff, Professor> is missed (Professor is a subclass of Faculty). The reason is that the mapping <academic_staff, Faculty> is not found without background knowledge.

Evaluation – Experiment 2 (cont’d.)

In Case 2, Marson finds a contextual mapping between the relation Event and the class Conference. When the value of the attribute type in Event equals “Research Session” or “Industrial Session”, the subsumption relationship between Event and Conference can be converted to an equivalence relationship.
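A contextual mapping of this kind can be written as a view with a selection condition. The sketch below generates such a view for the Event example; the helper name and the view name Event_as_Conference are hypothetical, identifiers are inserted without quoting, and the SQL is a generic illustration rather than output of Marson.

```python
def contextual_view(view_name, relation, attribute, values):
    """Build a SQL view selecting the rows of `relation` that satisfy the context condition."""
    # Escape single quotes inside the literal values; identifiers are used as given.
    quoted = ", ".join("'" + v.replace("'", "''") + "'" for v in values)
    return (
        f"CREATE VIEW {view_name} AS "
        f"SELECT * FROM {relation} WHERE {attribute} IN ({quoted})"
    )


sql = contextual_view(
    "Event_as_Conference",
    "Event",
    "type",
    ["Research Session", "Industrial Session"],
)
```

The rows returned by such a view are exactly those for which Event can be treated as equivalent to Conference rather than merely related by subsumption.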

Related work

The problem is of interest to both the database and Semantic Web communities. At an early stage, visual toolkits helped users specify mappings manually. At present, the focus is on discovering mappings (semi-)automatically.

Examples include COMA and RONTO, which do not consider the structural differences between the models and do not validate the consistency between mappings.

Other research directions: describing system frameworks, e.g., OntoGrate; defining mapping expression languages, e.g., R2O; extending OWL with ICs; inferring complex mappings, e.g., MapOnto.

Summary and future work

Summary:
An approach to discovering simple mappings;
An algorithm to build contextual mappings;
Experiments to evaluate our approach.

Future work:
Instance matching;
Machine learning techniques for mining semantic mappings;
Others.

Thanks for your attention!

Any comments are welcome!

http://iws.seu.edu.cn/

Tools: Marson, Falcon-AO, OntoSum

Services: Falcons (Searching the SW with CSpaces)
