AnHai Doan, Alon Halevy, Zachary Ives
Chapter 6: General Schema Manipulation Operators
PRINCIPLES OF DATA INTEGRATION
Outline
Introduction to model management and motivation
The Merge operator
The ModelGen operator
The Invert operator
Model Management Operators
We saw operators for creating mappings between pairs of schemas.
But you can imagine other operators on schemas and mappings: merging schemas, composing and inverting mappings, and translating schemas from one data model to another.
In fact, imagine an entire algebra of operators that apply to schemas and to mappings: many common workflows can be formulated as a sequence of such operators [Bernstein, 2000].
Note: “model” = “schema”. More terminology coming soon.
Example of Model Management (1)
In a data integration scenario, you may proceed as follows, beginning with sources S1 and S2:
Use a match operator to create a mapping between S1 and S2.
Use merge to create a merged (mediated) schema of S1 and S2, together with mappings from S1 and S2 to it. Merge creates the minimal schema that includes both S1 and S2.
Example of Model Management (2)
Suppose we have another source S3, which is very similar to S1.
We could first use match to create a mapping between S1 and S3.
Then use compose to create a mapping from S3 to the mediated schema G, by composing the S1/S3 mapping with the mapping from S1 to G.
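The compose step can be illustrated with a minimal sketch, assuming that match yields one-to-one attribute correspondences (which can be read in either direction) and that a mapping is just a dict from source attributes to target attributes. The attribute names and the helpers `reverse` and `compose` are hypothetical; composing general GLAV mappings is considerably harder than this.

```python
# Hypothetical representation: a mapping is a set of one-to-one attribute
# correspondences, stored as a dict {source_attribute: target_attribute}.
Mapping = dict[str, str]


def reverse(m: Mapping) -> Mapping:
    """Read one-to-one correspondences in the opposite direction."""
    return {t: s for s, t in m.items()}


def compose(m_ab: Mapping, m_bc: Mapping) -> Mapping:
    """Compose A->B with B->C; attributes of A that reach no C attribute are dropped."""
    return {a: m_bc[b] for a, b in m_ab.items() if b in m_bc}


# Illustrative output of match between S1 and S3.
m_s1_s3 = {"S1.title": "S3.name", "S1.year": "S3.released", "S1.studio": "S3.producer"}
# Illustrative mapping from S1 into the mediated schema G produced by merge.
m_s1_g = {"S1.title": "G.title", "S1.year": "G.year"}

m_s3_g = compose(reverse(m_s1_s3), m_s1_g)
print(m_s3_g)  # {'S3.name': 'G.title', 'S3.released': 'G.year'}
```

S3.producer disappears because its S1 counterpart, S1.studio, has no place in G.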
Operators
Match: see previous chapters.
Merge: create a merged schema of S1 and S2 w.r.t. a mapping M12.
ModelGen: create an equivalent model, but in a different data model (e.g., relational → XML).
Invert: given M12, create M21.
Diff: find the difference between two models (see bibliography).
Some Terminology
Model: a specific description of a set of data in a given data model.
Meta-model: a data model, such as relational schema, XML DTD, Java class definitions, …
Meta-meta-model: a generic language that is independent of any particular meta-model; usually some graph-based formalism.
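To make the three levels concrete, here is a minimal sketch of a graph-based meta-meta-model. It assumes a hypothetical `ModelGraph` whose nodes carry a "kind" (a meta-model construct) and whose edges encode only containment. A relational model and an XML model are both stored in it, so one generic operator (`attributes_under`, also hypothetical) works on both without knowing which meta-model it is looking at.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ModelGraph:
    """Hypothetical meta-meta-model: typed nodes plus containment edges."""
    nodes: dict[str, str] = field(default_factory=dict)          # node id -> kind
    edges: list[tuple[str, str]] = field(default_factory=list)   # (parent, child)

    def add(self, node_id: str, kind: str, parent: str | None = None) -> None:
        self.nodes[node_id] = kind
        if parent is not None:
            self.edges.append((parent, node_id))

    def children(self, node_id: str) -> list[str]:
        return [c for p, c in self.edges if p == node_id]


def attributes_under(g: ModelGraph, root: str) -> list[str]:
    """Generic operator: depth-first collection of all Attribute nodes below root."""
    found, stack = [], [root]
    while stack:
        n = stack.pop()
        if g.nodes[n] == "Attribute":
            found.append(n)
        stack.extend(reversed(g.children(n)))  # reversed keeps document order
    return found


# A relational model (meta-model: relational schema) ...
relational = ModelGraph()
relational.add("MovieDB", "Schema")
relational.add("Movie", "Relation", parent="MovieDB")
relational.add("Movie.title", "Attribute", parent="Movie")
relational.add("Movie.year", "Attribute", parent="Movie")

# ... and an XML model, encoded in the same generic graph.
xml = ModelGraph()
xml.add("movies", "Element")
xml.add("movie", "Element", parent="movies")
xml.add("movie/@title", "Attribute", parent="movie")
xml.add("movie/@year", "Attribute", parent="movie")

print(attributes_under(relational, "MovieDB"))  # ['Movie.title', 'Movie.year']
print(attributes_under(xml, "movies"))          # ['movie/@title', 'movie/@year']
```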
The Merge Operator
Given:
Two models, M1 and M2
A mapping from M1 to M2
Create:
A merged model M12 that contains the information in M1 and M2 and nothing more, without repeating information that appears in both
Mappings from M1 and M2 to M12
A challenge for many model management operators: can we develop algorithms that are generic, i.e., not specific to particular data models?
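A minimal sketch of the Merge contract on flat models, assuming a model is a list of element names and the input mapping is a list of equality correspondences (e1 in M1, e2 in M2). The function `merge` and all element names are hypothetical. It also assumes that unmapped elements with identical names denote the same thing, and it ignores structure (nesting, keys, types), which a real Merge must handle.

```python
def merge(m1: list[str], m2: list[str], corr: list[tuple[str, str]]):
    """Merge two flat models given equality correspondences (e1 in m1, e2 in m2).

    Returns the merged model plus mappings from m1 and from m2 into it.
    """
    m2_to_m1 = {e2: e1 for e1, e2 in corr}
    merged = list(m1)                      # everything in M1 is kept
    for e in m2:
        # Add M2 elements only if they are not already represented (no repeats).
        if e not in m2_to_m1 and e not in merged:
            merged.append(e)
    map1 = {e: e for e in m1}              # M1 -> merged: identity
    map2 = {e: m2_to_m1.get(e, e) for e in m2}  # M2 -> merged: follow correspondences
    return merged, map1, map2


merged, map1, map2 = merge(
    ["Actor.name", "Actor.birthYear"],
    ["Star.fullName", "Star.agent"],
    [("Actor.name", "Star.fullName")],
)
print(merged)  # ['Actor.name', 'Actor.birthYear', 'Star.agent']
print(map2)    # {'Star.fullName': 'Actor.name', 'Star.agent': 'Star.agent'}
```

Nothing in the sketch refers to a specific meta-model, which is the kind of genericity the challenge above asks for; the price is that it only understands a flat set of element names.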
Merge Challenges: Example
Challenge 1: different attribute representations. Resolution should be part of the input mappings.
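One way an input mapping can carry the resolution is to pair the corresponding elements with a conversion function. The sketch below assumes a hypothetical case where M1 stores a single `name` while M2 stores `firstName` and `lastName`. The `Correspondence` class and `translate` helper are illustrative assumptions, not part of any particular Merge implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Correspondence:
    """Hypothetical mapping entry that records how two representations relate."""
    m1_elements: tuple[str, ...]
    m2_elements: tuple[str, ...]
    m2_to_m1: Callable[..., tuple]  # converts M2 values into M1's representation


name_corr = Correspondence(
    m1_elements=("name",),
    m2_elements=("firstName", "lastName"),
    m2_to_m1=lambda first, last: (f"{first} {last}",),
)


def translate(row: dict, corr: Correspondence) -> dict:
    """Convert the M2 values of one row into M1's representation."""
    values = corr.m2_to_m1(*(row[e] for e in corr.m2_elements))
    return dict(zip(corr.m1_elements, values))


print(translate({"firstName": "Ada", "lastName": "Lovelace"}, name_corr))
# {'name': 'Ada Lovelace'}
```

Because the conversion lives in the mapping, Merge can keep a single `name` element in the merged model and still know how to populate it from M2.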
Merge Challenges: Example
Challenge 2: merging models of different data models. (What if one data model supports sub-attributes and another doesn’t?) See ModelGen.
Merge Challenges: Example
Challenge 3: “fundamental conflicts”. Zipcode is an integer in one model and a string in the other. The merged model cannot have both; solutions depend on the particular conflict and the data models involved.
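For the zipcode conflict, one possible policy (among several) is to widen both types to the narrowest type that can represent values of either. The sketch assumes a hypothetical three-type widening order; a real system would need a richer type lattice and would have to accept that the merged model no longer enforces the integer constraint.

```python
# Hypothetical widening order: a value of the key type can be stored
# losslessly as any type in its list (listed narrowest first).
WIDENING = {
    "integer": ["integer", "decimal", "string"],
    "decimal": ["decimal", "string"],
    "string": ["string"],
}


def resolve_type(t1: str, t2: str) -> str:
    """Pick the narrowest type that both t1 and t2 can be widened to."""
    for candidate in WIDENING[t1]:
        if candidate in WIDENING[t2]:
            return candidate
    raise ValueError(f"no common type for {t1} and {t2}")


print(resolve_type("integer", "string"))  # string
```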
The ModelGen Operator
Transform a schema from one meta-model (e.g., Java object model, relational, XML) to another meta-model.
Main challenge: features that exist in the source meta-model may not exist in the target (e.g., sub-classes and inheritance).
ModelGen is needed very often in practice, and several of the other operators rely on it.
ModelGen Example
Java classes → relational tables
No classes or inheritance in the relational model
ModelGen Strategy
Possible to design specific transformations from one meta-model to another, but we want a generic approach.
Design a super meta-model that has (almost) all features that exist in the meta-models.
The super meta-model knows which features are present in each meta-model.
The algorithm will translate a given model into the super meta-model and from there to the target meta-model.
ModelGen Algorithm
Input: model M1 in meta-model MM1
Output: a model M2 in meta-model MM2 that is equivalent to M1.
Transform M1 into the super-model, yielding M’.
While M’ includes features that are not present in MM2, apply transformations to remove these features (e.g., remove a class hierarchy by translating it into multiple vertically partitioned tables).
Transform M’ into M2.
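The three steps above can be sketched for the Java → relational example, assuming a hypothetical super-model construct `Entity` (a class or table with an optional inheritance edge) and a hypothetical feature catalogue recording which meta-model supports which feature. Only one transformation (vertical partitioning to remove inheritance) and only a relational exporter are sketched; the root class is assumed to declare the key `id`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical super-model construct covering Java classes and relational tables."""
    name: str
    attributes: list[str]
    parent: str | None = None                            # inheritance edge
    references: list[str] = field(default_factory=list)  # "id" foreign keys


# Hypothetical catalogue of which super-model features each meta-model supports.
FEATURES = {"java": {"entities", "inheritance"}, "relational": {"entities"}}


def used_features(model: list[Entity]) -> set[str]:
    feats = {"entities"}
    if any(e.parent is not None for e in model):
        feats.add("inheritance")
    return feats


def remove_inheritance(model: list[Entity]) -> list[Entity]:
    """Vertical partitioning: a subclass keeps its own attributes plus the
    inherited key "id", which becomes a foreign key to its parent's table."""
    return [
        Entity(e.name, ["id"] + e.attributes, None, e.references + [e.parent])
        if e.parent is not None else e
        for e in model
    ]


TRANSFORMATIONS = {"inheritance": remove_inheritance}


def modelgen(java_classes: dict[str, tuple[list[str], str | None]], target: str = "relational"):
    # Step 1: import the source model into the super-model, yielding M'.
    m = [Entity(n, list(attrs), parent) for n, (attrs, parent) in java_classes.items()]
    assert used_features(m) <= FEATURES["java"]
    # Step 2: while M' uses features the target lacks, apply a transformation.
    while unsupported := used_features(m) - FEATURES[target]:
        m = TRANSFORMATIONS[min(unsupported)](m)
    # Step 3: export M' to the target (only a relational exporter is sketched).
    tables = {e.name: e.attributes for e in m}
    fks = [f"{e.name}.id -> {ref}.id" for e in m for ref in e.references]
    return tables, fks


tables, fks = modelgen({"Person": (["id", "name"], None), "Student": (["school"], "Person")})
print(tables)  # {'Person': ['id', 'name'], 'Student': ['id', 'school']}
print(fks)     # ['Student.id -> Person.id']
```

The loop in step 2 is where genericity comes from: adding a new source or target meta-model means adding feature entries and transformations, not a new pairwise translator.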
The Invert Operator
Schema mappings are often directional: They map data in source schema into a target schema.
Natural question: Can we find an inverse mapping?
But what is the right definition of an inverse? We’ll see a couple of failed attempts before we reach a good one.
Note: the algorithms here are not generic; they are highly dependent on the meta-model.
Invert Definition: Attempt 1
Given a mapping M between a source S and a target T, M defines a relation over pairs of instances (I, J) that are consistent with each other, where I is an instance of S and J is an instance of T.
Hence, a natural definition is that $M^{-1}$ should define the relation $\{(J, I) \mid (I, J) \in M\}$.
However, inverses defined this way will not be expressible with tuple-generating dependencies/GLAV mappings.
Why? See next slide.
Attempt #1 Problem Explained
Any relation defined by TGDs is closed up on the right and closed down on the left.
Formally, if $(I, J) \in M$, $I' \subseteq I$, and $J \subseteq J'$, then $(I', J') \in M$.
However, by definition, the inverse M’ would have to be closed up on the left and closed down on the right. Hence, it cannot be defined with TGDs or GLAV mappings.
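The closure argument can be sanity-checked on a tiny finite example. This is not a proof: it assumes a hypothetical copy TGD R(x) → P(x) between unary relations over the domain {1, 2}, enumerates every (I, J) pair that satisfies it, and checks the closure property for M and for the flipped relation.

```python
from itertools import chain, combinations

DOMAIN = (1, 2)


def instances():
    """All instances of a unary relation over DOMAIN, as frozensets of values."""
    return [frozenset(c) for c in chain.from_iterable(
        combinations(DOMAIN, k) for k in range(len(DOMAIN) + 1))]


def satisfies(I, J):
    """(I, J) satisfies the hypothetical tgd R(x) -> P(x) iff every R-fact has a P-fact."""
    return I <= J


M = {(I, J) for I in instances() for J in instances() if satisfies(I, J)}


def closed_down_left_up_right(rel):
    """True if (I, J) in rel, I2 ⊆ I and J ⊆ J2 always imply (I2, J2) in rel."""
    insts = instances()
    return all(
        (I2, J2) in rel
        for (I, J) in rel
        for I2 in insts if I2 <= I
        for J2 in insts if J <= J2
    )


M_inv = {(J, I) for (I, J) in M}       # Attempt 1: flip every pair
print(closed_down_left_up_right(M))      # True
print(closed_down_left_up_right(M_inv))  # False
```

For instance, ({1}, ∅) is in the flipped relation, but shrinking the left side to ∅ and growing the right side to {1} gives (∅, {1}), which is not.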
Invert Definition: Attempt 2
Definition by composition: M composed with M’ should be the identity mapping!
However, it can be shown that under this condition, a mapping has an inverse only if the following holds: if I1 and I2 are two distinct instances of S, then their sets of solutions under M (the instances of T that M associates with each) must be distinct.
This result considerably limits the mappings that have inverses; m1 and m2, for example, won’t have inverses.
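The necessary condition above can be checked by brute force on a small domain. The sketch assumes a hypothetical projection mapping P(x, y) → Q(x) over the domain {1, 2} (not necessarily the slide's m1 or m2). It computes each source instance's set of solutions and looks for two distinct instances with the same set, which shows that this mapping cannot have an inverse in the sense of Attempt 2.

```python
from itertools import chain, combinations, product

DOMAIN = (1, 2)


def subsets(items):
    items = list(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]


SOURCE = subsets(product(DOMAIN, DOMAIN))  # all instances of P(x, y)
TARGET = subsets(DOMAIN)                   # all instances of Q(x)


def solutions(I):
    """Target instances J that satisfy the hypothetical tgd P(x, y) -> Q(x) with I."""
    return frozenset(J for J in TARGET if {x for x, _ in I} <= J)


def unique_solutions(source, sol):
    """Return (True, None) if distinct instances always have distinct solution
    sets, else (False, (I1, I2)) for the first colliding pair found."""
    seen = {}
    for I in source:
        s = sol(I)
        if s in seen:
            return False, (seen[s], I)
        seen[s] = I
    return True, None


ok, witness = unique_solutions(SOURCE, solutions)
print(ok)  # False
```

The projection forgets y, so, for example, {P(1, 1)} and {P(1, 2)} have exactly the same solutions.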
Third Time’s a Charm: Quasi inverses
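Define equivalence between two instances w.r.t. M as: $I_1 \sim_M I_2$ if and only if $\mathrm{Sol}(M, I_1) = \mathrm{Sol}(M, I_2)$, i.e., M associates the same set of target instances with both.
Define M’ to be a quasi-inverse of M if the composition of M and M’ always maps I to an instance I’ such that $I' \sim_M I$.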
Example:
So m is a quasi-inverse of m’
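The simplified definition above can be illustrated with a round trip through canonical solutions (the chase). The sketch assumes a hypothetical projection mapping P(x, y) → Q(x) and a candidate reverse mapping Q(x) → ∃y P(x, y) whose existential values become fresh labeled nulls. The round trip does not recover I, but it does return an instance that is equivalent to I w.r.t. M. The formal definition in the literature is stated more carefully than this, so treat the code as an illustration only.

```python
from itertools import count


def chase_m(I):
    """Canonical solution of I under the hypothetical tgd P(x, y) -> Q(x)."""
    return {x for x, _ in I}


def chase_m_prime(J):
    """Canonical solution under the candidate reverse tgd Q(x) -> exists y P(x, y);
    each existential y becomes a fresh labeled null."""
    fresh = count()
    return {(x, f"N{next(fresh)}") for x in sorted(J)}


def equivalent_wrt_m(I1, I2):
    """Under P(x, y) -> Q(x), two source instances have the same solutions
    exactly when they have the same x-projection."""
    return chase_m(I1) == chase_m(I2)


I = {(1, "a"), (1, "b"), (2, "c")}
I_prime = chase_m_prime(chase_m(I))
print(sorted(I_prime))               # [(1, 'N0'), (2, 'N1')]
print(I_prime == I)                  # False: the y-values are lost
print(equivalent_wrt_m(I, I_prime))  # True
```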
Summary of Chapter 6
Generic model management operators save a lot of repetitive code and can yield several forms of efficiency gains. Employing such operators also forces applications to think carefully about the meaning of what they are doing.
Two main open challenges:
Can the implementation of these operators be described in a meta-model independent fashion?
Is model management a system in itself that should be built or should operator implementations be individual services?