76
Model-independent Schema and Data Translation Paolo Atzeni Dipartimento di Informatica e Automazione Università Roma Tre 29/09/2010

Model-independent Schema and Data Translation

  • Upload
    gianna

  • View
    37

  • Download
    0

Embed Size (px)

DESCRIPTION

Model-independent Schema and Data Translation. Paolo Atzeni Dipartimento di Informatica e Automazione Università Roma Tre 29/09/2010. Content. Part 1: Background: Data integration and transformation Part 2: Core: Model-independent Schema and Data Translation. References. - PowerPoint PPT Presentation

Citation preview

Page 1: Model-independent  Schema and Data Translation

Model-independent Schema and Data Translation

Paolo Atzeni Dipartimento di Informatica e Automazione

Università Roma Tre29/09/2010

Page 2: Model-independent  Schema and Data Translation

Content

• Part 1: Background:– Data integration and transformation

• Part 2: Core:– Model-independent Schema and Data Translation

GII School 29/09/2010 2

Page 3: Model-independent  Schema and Data Translation

References

• P. Atzeni, G. Gianforme, and P. Cappellari. A Universal Metamodel and Its Dictionary. Transactions on Large-Scale Data-and Knowledge-Centered Systems I, 2009

• P. Atzeni, P. Cappellari, R. Torlone, P. A. Bernstein, and G. Gianforme. Model-independent schema translation. VLDB Journal, 2008

• P. Atzeni, P. Cappellari, and P. A. Bernstein. Model independent schema and data translation. EDBT 2006.

• P. Atzeni, L. Bellomarini, F. Bugiotti and G. Gianforme. A runtime approach to model-independent schema and data translation. EDBT 2009

GII School 29/09/2010 3

Page 4: Model-independent  Schema and Data Translation

GII School 29/09/2010 4

Schema and data translation

• Schema translation:– given schema S1 in model M1 and model M2– find a schema S2 in M2 that “corresponds” to S1

• Schema and data translation:– given also a database D1 for S1– find also a database D2 for S2 that “contains the same data”

as D1

Page 5: Model-independent  Schema and Data Translation

GII School 29/09/2010 5

A wider perspective

• (Generic) Model Management:– A proposal by Bernstein et al (2000 +)– Includes a set of operators on

• schemas and mappings between schemas– The main operators

• Match• Merge• Diff• ModelGen (=schema translation)

Page 6: Model-independent  Schema and Data Translation

GII School 29/09/2010 6

A long standing issue

• Translations from a model to another have been studied since the 1970’s

• Whenever a new model is defined, techniques and tools to generate translations are studied

• However, proposals and solutions are usually model specific: – Given an ER schema, find the suitable relational schema

that “implements” it • the original paper (Chen 1976) contains the basics• further discussions by many (e.g. Markowitz and

Shoshani 1989)• illustrated in every textbook

– Similarly with • any other conceptual model and any other logical one• XML and relational (or object)

Page 7: Model-independent  Schema and Data Translation

GII School 29/09/2010 7

 A simple example

• An object relational database, to be translated in a relational one• Source: the OR-model• Target: the relational model

System managed idsused as references

Page 8: Model-independent  Schema and Data Translation

GII School 29/09/2010 8

 Example, 2

• How do we traslate? – Two possibilities. Does the OR model allow for keys? – If YES (EmpNo and Name are keys)

Page 9: Model-independent  Schema and Data Translation

GII School 29/09/2010 9

 Example, 3

• How do we traslate? – Two possibilities. Does the OR model allow for keys? – If NOT

Page 10: Model-independent  Schema and Data Translation

GII School 29/09/2010 10

Many different models (plus variants …)

OR

Relational

OO

ER

XSD

Page 11: Model-independent  Schema and Data Translation

GII School 29/09/2010 11

Heterogeneity

• We need to handle artifacts and data in various models– Data are defined wrt to schemas– Schemas wrt to models– How can models be defined? We need metamodels

Data

Models

Schemas

= metaschemas

= metadata

Page 12: Model-independent  Schema and Data Translation

GII School 29/09/2010 12

A metamodel approach

• The constructs in the various models are rather similar: – can be classified into a few categories (Hull & King 1986):

• Abstract (entity, class, …)• Lexical: set of printable values (domain)• Aggregation: a construction based on (subsets of)

cartesian products (relationship, table)• Function (attribute, property)• Hierarchies• …

• We can fix a set of metaconstructs (each with variants):– abstract, lexical, aggregation, function, ...– the set can be extended if needed, but this will not be frequent

• A model is defined in terms of the metaconstructs it uses

Page 13: Model-independent  Schema and Data Translation

GII School 29/09/2010 13

 The metamodel approach, example

• The ER model:– Abstract (called Entity)– Function from Abstract to Lexical (Attribute)– Aggregation of abstracts (Relationship) – …

• The OR model:– Abstract (Table with ID)– Function from Abstract to Lexical (value-based Attribute)– Function from Abstract to Abstract (reference Attribute)– Aggregation of lexicals (value-based Table)– Component of Aggregation of Lexicals (Column)– …

Page 14: Model-independent  Schema and Data Translation

GII School 29/09/2010 14

 The supermodel

• A model that includes all the meta-constructs (in their most general forms)– Each model is subsumed by the supermodel (modulo

construct renaming) – Each schema for any model is also a schema for the

supermodel (modulo construct renaming)

• In the example, a model that generalizes OR and relational

• Each translation from the supermodel SM to a target model M is also a translation from any other model to M:– given n models, we need n translations, not n2 

Page 15: Model-independent  Schema and Data Translation

GII School 29/09/2010 15

Generic translations

1. Copy

2. Translation

3. Copy

Supermodel

Source model

Target model

Translation:composition 1,2 & 3

Page 16: Model-independent  Schema and Data Translation

GII School 29/09/2010 16

Translations within the supermodel

• We still have too many models:– we have few constructs, but each has several independent

features which give rise to variants• for example, within simple OR model versions,

– Key may be specifiable or not– Generalizations may be allowed or not– Foreign keys may be used or not– Nesting may be used or not

– Combining all these, we get hundreds of models!– The management of a specific translation for each model

would be hopeless

Page 17: Model-independent  Schema and Data Translation

GII School 29/09/2010 17

The metamodel approach, translations

• As we saw, the constructs in the various models are similar: – can be classified according to the metaconstructs– translations can be defined on metaconstructs,

• there are standard, known ways to deal with translations of constructs (or variants theoreof)

• Elementary translation steps can be defined in this way– Each translation step handles a supermodel construct (or a

feature thereof) "to be eliminated" or "transformed"• Then, elementary translation steps to be combined• A translation is the concatenation of elementary translation

steps

Page 18: Model-independent  Schema and Data Translation

GII School 29/09/2010 18

Many different models

OR

Relational

OO

ER

XSD

Page 19: Model-independent  Schema and Data Translation

GII School 29/09/2010 19

Many different models (and variants …) OR

w/ PK, gen, ref, FK

Relational…

OR w/ PK, gen, ref

OR w/ PK, gen, FK

OR

w/ PK, ref, FK

OR w/ gen, ref

OR w/ PK, FK

OR w/ PK, ref

OR w/ ref

Page 20: Model-independent  Schema and Data Translation

GII School 29/09/2010 20

A complex translation, example(0,N) (0,N)

Target: simple object model• Eliminate N-ary relationships • Eliminate attributes from relationships • Eliminate many-to-many relationships• Replace relationships with references• Eliminate generalizations

Page 21: Model-independent  Schema and Data Translation

GII School 29/09/2010 21

Complex translations

Binary ER w/o gen

N-ary ER w/ gen

Binary ERw/ gen

Relational

Bin ER w/ gen w/o attr on rel

OO w/ gen

N-ary ER w/o gen

Bin ER w/o gen w/o attr on rel

Bin ER w/o gen w/o M:N rel

Elim. N-ary relationships Elim. Relationship attr.sElim. M:N relationshipsReplace relationships with

referencesElim OO generalizations Elim ER generalizations

Bin ER w/ gen w/o M:N rel

OO w/o gen…

Page 22: Model-independent  Schema and Data Translation

GII School 29/09/2010 22

 A more complex example

• An object relational database, to be translated in a relational one– Source: an OR-model– Target: the relational model

Page 23: Model-independent  Schema and Data Translation

GII School 29/09/2010 23

A more complex example, 2

ENG

EMP DEPT

Last Name

School

Dept

NameAddress

ID ID

ID

Dept_ID

Emp_ID

Target: relational modelEliminate generalizations Add keysReplace refs with FKs

Page 24: Model-independent  Schema and Data Translation

A more complex example, 3

GII School 29/09/2010 24

EMP

Last Name

ID

Dept_ID

DEPT

NameAddress

ID

ENG

School

ID

Emp_ID

Target: relational modelEliminate generalizations Add keysReplace refs with FKsReplace objects with tables

Page 25: Model-independent  Schema and Data Translation

GII School 29/09/2010 25

Many different models (and variants …) OR

w/ PK, gen, ref, FK

Relational

OR w/ PK, gen, ref

OR w/ PK, gen, FK

OR

w/ PK, ref, FK

OR w/ gen, ref

OR w/ PK, FK

OR w/ PK, ref

OR w/ ref Eliminate generalizations

Add keysReplace refs with FKsReplace objects with tables

Source

Target

Page 26: Model-independent  Schema and Data Translation

GII School 29/09/2010 26

Translations in MIDST (our tool)

• Basic translations are written in a variant of Datalog, with OID invention– We specify them at the schema level– The tool "translates them down" to the data level

(both in off-line and run-time manners, see later)– Some completion or tuning may be needed

Page 27: Model-independent  Schema and Data Translation

GII School 29/09/2010 27

A Multi-Level Dictionary

• Handles models, schemas and data• Has both a model specific and a model independent component• Relational implementation, so Datalog rules can be easily

specified

Page 28: Model-independent  Schema and Data Translation

GII School 29/09/2010 28

A common dictionary (for an ER design tool)

EntityOID Schema Name301 1 Employees

302 1 Departments

201 3 Clerks

202 3 Offices

AttributeOfEntityOID Schema Name isIdent isNullable Type Entity401 1 EmpNo T F Int 301

402 1 Name F F Text 301

404 1 Name T F Char 302

405 1 Address F F Text 302

501 3 Code T F Int 201

… … … … … … …

Relationship

OID Schema Name Entity1 Entity2401 1 Memb 301 302 … …

… … … … … … …

Employees DepartmentsMemb

Name

AddressName

EmpNo

Page 29: Model-independent  Schema and Data Translation

GII School 29/09/2010 29

Multi-Level Dictionary

model

schema

modelgeneric

modelspecific

desc

riptio

n

modelindependence

Supermodel description(mSM)

Model descriptions(mM)

Supermodel schemas(SM)

Model specific schemas(M)

Supermodel instances (i-SM)

Model specific instances(i-M)data

Page 30: Model-independent  Schema and Data Translation

GII School 29/09/2010 30

The supermodel description

MSM-ConstructOID Name IsLex

1 AggregationOfLexicals F

2 ComponentOfAggrOfLex T

3 Abstract F

4 AttributeOfAbstract T

5 BinaryAggregationOfAbstracts F

MSM-PropertyOID Name Constr. Type11 Name 1 String

12 Name 2 String

13 IsKey 2 Boolean

14 IsNullable 2 Boolean

15 Type 2 String

16 Name 3 String

17 Name 4 String

18 IsIdentifier 4 Boolean

19 IsNullable 4 Boolean

20 Type 4 String

21 IsFunct1 5 Boolean

22 IsOptional1 5 Boolean

23 Role1 5 String

24 IsFunct2 5 Boolean

25 IsOptional2 5 Boolean

26 Role2 5 String

MSM-ReferenceOID Name Construct Target30 Aggregation 2 1

31 Abstract 4 3

32 Abstract1 5 3

33 Abstract2 5 3

mSMmMSMM

i-SMi-M

Page 31: Model-independent  Schema and Data Translation

GII School 29/09/2010 31

Model descriptions

MSM-ConstructOID Name IsLex

1 AggregationOfLexicals F

2 ComponentOfAggrOfLex T

3 Abstract F

4 AttributeOfAbstract T

5 BinaryAggregationOfAbstracts F

MM-ConstructOID Model MSM-Constr Name

1 2 3 ER_Entity

2 2 4 ER_Attribute

3 2 5 ER_Relationship

4 1 1 Rel_Table

5 1 2 Rel_Column

6 3 3 OO_Class

MM-ModelOID Name

1 Relational

2 Entity-Relationship

3 Object

MSM-PropertyOID Name Construct Type… … … …

MSM-ReferenceOID Name Construct …… … … …

MM-Property… … … …

MM-Reference… … … …

mSMmMSMM

i-SMi-M

Page 32: Model-independent  Schema and Data Translation

GII School 29/09/2010 32

Schemas in a model

ER-EntityOID Schema Name301 1 Employees

302 1 Departments

201 3 Clerks

202 3 Offices

ER-AttributeOfEntityOID Schema Name isIdent isNullable Type AbstrOID401 1 EmpNo T F Int 301

402 1 Name F F Text 301

404 1 Name T F Char 302

405 1 Address F F Text 302

501 3 Code T F Int 201

… … … … … … …

SM-ConstructOID Name IsLex… … …

3 ER-Entity F

4 ER-AttributeOfEntity T

… … …

ER schemas

mSMmMSMM

i-SMi-M

Employees

Departments

EmpNo

Name

Name

Address

Page 33: Model-independent  Schema and Data Translation

GII School 29/09/2010 33

Schemas in the supermodel

SM-AbstractOID Schema Name301 1 Employees

302 1 Departments

201 3 Clerks

202 3 Offices

SM-AttributeOfAbstractOID Schema Name isIdent isNullable Type AbstrOID401 1 EmpNo T F Int 301

402 1 Name F F Text 301

404 1 Name T F Char 302

405 1 Address F F Text 302

501 3 Code T F Int 201

… … … … … … …

MSM-ConstructOID Name IsLex… … …

3 Abstract F

4 AttributeOfAbstract T

… … …

Supermodel schemas

mSMmMSMM

i-SMi-M

Employees

Departments

EmpNo

Name

Name

Address

Page 34: Model-independent  Schema and Data Translation

GII School 29/09/2010 34

i-SM-AbstractOID AbsOID InstOID3010 301 1

3011 301 1

… … …

3010 301 2

i-SM-AttributeOfAbstractOID AttrOfAbsOID InstOID Value i- AbsOID4010 401 1 75432 3010

4020 402 1 John Doe 3010

4021 402 1 Bob White 3011

… … … … …

Instances in the supermodel

SM-AbstractOID Schema Name301 1 Employees

302 1 Departments

201 3 Clerks

202 3 Offices

SM-AttributeOfAbstractOID Schema Name isIdent isNullable Type AbstrOID401 1 EmpNo T F Int 301

402 1 Name F F Text 301

404 1 Name T F Char 302

405 1 Address F F Text 302

501 3 Code T F Int 201

… … … … … … …Supermodel schemasSupermodel instances

mSMmMSMM

i-SMi-M

Page 35: Model-independent  Schema and Data Translation

GII School 29/09/2010 35

Multi-Level Repository, generation and use

model

schema

modelgeneric

modelspecific

desc

riptio

n

modelindependence

Supermodel description(mSM)

Model descriptions(mM)

Supermodel schemas(SM)

Model specific schemas(M)

Supermodel instances (i-SM)

Model specific instances(i-M)data

Structure fixed, content provided by tool

designers

Structure generated by the tool from the content

of mSM

Structure generated by the tool from the content

of mSM

Structure fixed, content provided by model

designers out of mSMStructure generated by

the tool from the content of mM

Page 36: Model-independent  Schema and Data Translation

GII School 29/09/2010 36

Translations

• Basic translations are written in a variant of Datalog, with OID invention– We specify them at the schema level– The tool "translates them down" to the data level– Some completion or tuning may be needed

Page 37: Model-independent  Schema and Data Translation

GII School 29/09/2010 37

A basic translation

• From (a simple) binary ER model to the relational model– a table for each entity– a column (in the table for E) for each attribute of an entity E– for each M:N relationship

• a table for the relationship• columns …

– for each 1:N and 1:1 relationship:• a column for each attribute of the identifier …

Page 38: Model-independent  Schema and Data Translation

GII School 29/09/2010 38

A basic translation application

DepartmentsName Address

EmployeesEmpNo

Name

Affiliation

Departments

1,1

0,N

Name

Address

Employees

EmpNo Name Affiliation

Page 39: Model-independent  Schema and Data Translation

GII School 29/09/2010 39

A basic translation (in supermodel terms)

• From (a simple) binary ER model to the relational model– an aggregation of lexicals for each abstract– a component of the aggregation for each attribute of abstract– for each M:N aggregation of abstracts …

• …

• From (a simple) binary ER model to the relational model– a table for each entity– a column (in the table for E) for each attribute of an entity E– for each M:N relationship

• a table for the relationship• columns …

– for each 1:N and 1:1 relationship:• a column for each attribute of the identifier …

Page 40: Model-independent  Schema and Data Translation

GII School 29/09/2010 40

"An aggregation of lexicals for each abstract"

SM_AggregationOfLexicals(OID: #aggregationOID_1(OID), Name: name)

SM_Abstract (

OID: OID, Name: name ) ;

Page 41: Model-independent  Schema and Data Translation

GII School 29/09/2010 41

Datalog with OID invention

• Datalog (informally):– a logic programming language with no function symbols and

predicates that correspond to relations in a database– we use a non-positional notation

• Datalog with OID invention:– an extension of Datalog that uses Skolem functions to

generate new identifiers when needed• Skolem functions:

– injective functions that generate "new" values (value that do not appear anywhere else; so different Skolem functions have disjoint ranges

Page 42: Model-independent  Schema and Data Translation

GII School 29/09/2010 42

"An aggregation of lexicals for each abstract"

SM_AggregationOfLexicals(OID: #aggregationOID_1(OID), Name: n)

SM_Abstract (

OID: OID, Name: n ) ;

• the value for the attribute Name is copied (by using variable n)• the value for OID is "invented": a new value for the function

#aggregationOID_1(OID) for each different value of OID, so a different value for each value of SM_Abstract.OID

Page 43: Model-independent  Schema and Data Translation

GII School 29/09/2010 43

"An aggregation of lexicals for each abstract"

SM-AbstractOID Schema Name301 1 Employees

302 1 Departments

… … …

SM-AttributeOfAbstractOID Schema Name isIdent isNullable Type AbstrOID401 1 EmpNo T F Int 301

402 1 Name F F Text 301

… … … … … … …

EmployeesEmpNoName

11

11

Departments1002

Employees1001

SM_AggregationOfLexicals(OID: #aggregationOID_1(OID), Name: n)

SM_Abstract (

OID: OID, Name: n ) ;

3021002

……

1001

SM-AggregationOfLexicalsSchema NameOID

SM-aggregationOID_1_SKabsOIDOID

301

Employees

Page 44: Model-independent  Schema and Data Translation

GII School 29/09/2010 44

"A component of the aggregation for each attribute of abstract"

SM_ComponentOfAggregation… ( OID: #componentOID_1(attOID),Name: name, AggrOID: #aggregationOID_1(absOID),IsNullable: isNullable, IsKey: isIdent, Type : type )

←SM_AttributeOfAbstract(

OID: attOID,Name: name,AbstractOID: absOID, IsIdent: isIdent, IsNullable: isNullable , Type : type ) ;

• Skolem functions– are functions– are injective– have disjoint ranges

• the first function "generates" a new value

• the second "reuses" the value generated by the first rule

Page 45: Model-independent  Schema and Data Translation

GII School 29/09/2010 45

A component of the aggregation for each attribute of abstract"

SM-AbstractOID Schema Name301 1 Employees

302 1 Departments

… … …

EmployeesEmpNoName

11

11

Departments1002

Employees1001

3021002

……

1001

SM-AggregationOfLexicalsSchema NameOID

SM-aggregationOID_1_SKabsOIDOID

301

SM_ComponentOfAggregation… ( OID: #componentOID_1(attOID),Name: name, AggrOID: #aggregationOID_1(absOID),IsNullable: isNullable, IsKey: isIdent, Type : type )

←SM_AttributeOfAbstract(

OID: attOID,Name: name,AbstractOID: absOID, IsIdent: isIdent, IsNullable: isNullable , Type : type ) ;

SM-componentOID_1_SKabsOIDOID

Employees

EmpNo Name

SM-AttributeOfAbstractOID Schema Name isIdent isNullable Type AbstrOID401 1 EmpNo T F Int 301

402 1 Name F F Text 301

… … … … … … …

SM-ComponentOfAggregationOfLexicalsSchema Type AggrOIDisNullableisIdentNameOID

Text 1001FFName111004

1001IntFTEmpNo11

1003 401

1004 402

1003

Page 46: Model-independent  Schema and Data Translation

Many rules, how to choose?

• Models can be described in a compact way – the involved constructs and their properties (boolean

propositions)• Datalog rules can be "summarized" by means of "signatures"

that describe the involved constructs and how they are transformed (in terms of boolean properties)

GII School 29/09/2010 46

Page 47: Model-independent  Schema and Data Translation

GII School 29/09/2010 47

Model Signatures

• MRel = {T(true), C(true)} relational• MRelNoN = {T(true), C(¬N)} relational with no nulls• MER = {E(true), A(true), R(true), A-R(true)} ER• MERsimple = {E(true), A(¬N), R(true)} • MERnoM2N = {E(true), A(true), R(F1 F2), A-R(true)}

• T table• C column; ¬N: no null values allowed• E entity• R relationship; F1 F2 : no many-to-many• A attribute• A-R attribute of relationship

Page 48: Model-independent  Schema and Data Translation

GII School 29/09/201048

Rule signature

• Rule signature– Body - B

• List of signatures of body constructs• Applicability of the rule

– Head - H• Signature of the atom in the head• Sure conditions of the result of the application of the rule

– MAP• Mapping between properties of body constructs and

properties of the head construct• Transfer of values from the body to the head of the rule

Page 49: Model-independent  Schema and Data Translation

GII School 29/09/201049

Rule signature

• Example• Relationship (OID:#relationship_1(eOid,rOid),

Name: eN+rN,isOpt1: false, isFunct1: true, isIdent: true,isOpt2: isOpt, isFunct2: false,Entity1: #entity_1(rOid), Entity2: #entity_0(eOid))

Relationship (OID:,rOid, Name: rN,

isOpt1: isOpt, isFunct1: false, isFunct2: false,Entity1: eOid),

Entity (OID: eOid, Name:eN)

Page 50: Model-independent  Schema and Data Translation

GII School 29/09/201050

Rule signature

– Body• Relationship (OID:#relationship_1(eOid,rOid),

Name: eN+rN,isOpt1: false, isFunct1: true, isIdent: true,isOpt2: isOpt, isFunct2: false,Entity1: #entity_1(rOid), Entity2: #entity_0(eOid))

Relationship (OID:,rOid, Name: rN,

isOpt1: isOpt, isFunct1: false, isFunct2: false,Entity1: eOid),

Entity (OID: eOid, Name:eN)

Page 51: Model-independent  Schema and Data Translation

GII School 29/09/201051

Rule signature

– Head• Relationship (OID:#relationship_1(eOid,rOid),

Name: eN+rN,isOpt1: false, isFunct1: true, isIdent: true,isOpt2: isOpt, isFunct2: false,Entity1: #entity_1(rOid), Entity2: #entity_0(eOid))

Relationship (OID:,rOid, Name: rN,

isOpt1: isOpt, isFunct1: false, isFunct2: false,Entity1: eOid),

Entity (OID: eOid, Name:eN)

Page 52: Model-independent  Schema and Data Translation

GII School 29/09/201052

Rule signature

– MAP• Relationship (OID:#relationship_1(eOid,rOid),

Name: eN+rN,isOpt1: false, isFunct1: true, isIdent: true,isOpt2: isOpt, isFunct2: false,Entity1: #entity_1(rOid), Entity2: #entity_0(eOid))

Relationship (OID:,rOid, Name: rN,

isOpt1: isOpt, isFunct1: false, isFunct2: false,Entity1: eOid),

Entity (OID: eOid, Name:eN)

Page 53: Model-independent  Schema and Data Translation

Reasoning

• Formal System– Compact representation of models and rules

• Based on logical formulas– Reasoning on data models

• Union, intersection, difference of models and schemas• Applicability and application of rules and programs

– Sound and complete with respect to the Datalog programs• let us see

53GII School 29/09/2010

Page 54: Model-independent  Schema and Data Translation

GII School 29/09/2010 54

Reasoning on translations

• M1 = SIG(M1) description of model M1

• rP = SIG(P) signature of Datalog program P• M2 = rP(M1) application of the sig of P to the desc

of M1

S1 M1

PS2 = P(S1)

M2 = rP(M1)M1 = SIG(M1)

rP = SIG(P)

Theorem• Program P applied to schemas of M1 generates schemas (and

somehow all of them) that belong to a model M2 whose

description is M2 = rP(M1)

schema model

program

S2 M2

M2 = SIG(M2)

Page 55: Model-independent  Schema and Data Translation

Reasoning

• Main application– We automatically find a sequence of basic translations to

perform the transformation of a schema from a model to another, under suitable assumptions

• Observations– Few “families” of models

• ER, OO, Relational…• Each family has a progenitor

– Two kinds of Datalog programs• Reduction• Transformation

55GII School 29/09/2010

Page 56: Model-independent  Schema and Data Translation

Reasoning

• Automatic Translation– 3-step transformation

• Reduction within the source family• Transformation from the source family to the target family• Reduction within the target family

M∗

1

M∗

2

M1

M'1

M'2

M2

T

F1

F2

56GII School 29/09/2010

Page 57: Model-independent  Schema and Data Translation

GII School 29/09/2010 57

Correctness

• Usually modelled in terms of information capacity equivalence/dominance (Hull 1986, Miller 1993, 1994)

• Mainly negative results in practical settings that are non-trivial• Probably hopeless to have correctness in general• We follow an "axiomatic" approach:

– We have to verify the correctness of the basic translations, and then infer that of complex ones

Page 58: Model-independent  Schema and Data Translation

The data level

• So far, we have considered only schemas• How do we translate data?• Should we move data or translate it on the fly?

GII School 29/09/2010 58

Page 59: Model-independent  Schema and Data Translation

GII School 29/09/2010 59

Translations: off-line approach

• The dictionary has a third level, for data• Basic translations are "translates them down" to the data level• Some completion or tuning may be needed

Page 60: Model-independent  Schema and Data Translation

GII School 29/09/2010 60

Multi-Level Dictionary

model

schema

modelgeneric

modelspecific

desc

riptio

n

modelindependence

Supermodel description(mSM)

Model descriptions(mM)

Supermodel schemas(SM)

Model specific schemas(M)

Supermodel instances (i-SM)

Model specific instances(i-M)data

Page 61: Model-independent  Schema and Data Translation

GII School 29/09/2010 61

i-SM-AbstractOID AbsOID InstOID3010 301 1

3011 301 1

… … …

3010 301 2

i-SM-AttributeOfAbstractOID AttrOfAbsOID InstOID Value i- AbsOID4010 401 1 75432 3010

4020 402 1 John Doe 3010

4021 402 1 Bob White 3011

… … … … …

Instances in the supermodel

SM-AbstractOID Schema Name301 1 Employees

302 1 Departments

201 3 Clerks

202 3 Offices

SM-AttributeOfAbstractOID Schema Name isIdent isNullable Type AbstrOID401 1 EmpNo T F Int 301

402 1 Name F F Text 301

404 1 Name T F Char 302

405 1 Address F F Text 302

501 3 Code T F Int 201

… … … … … … …Supermodel schemasSupermodel instances

mSMmMSMM

i-SMi-M

Page 62: Model-independent  Schema and Data Translation

GII School 29/09/2010 62

Generating data-level translations

Schema translation

Data translation

• Same environment• Same language• High level translation specification

Supermodel description(mSM)

Supermodel schemas(SM)

Supermodel instances (i-SM)

Page 63: Model-independent  Schema and Data Translation

GII School 29/09/2010 63

Translation rules, data leveli-SM_ ComponentOfAggregation… (

OID: #i-componentOID_1 (i-attOID),i-AggrOID: #i-aggregationOID_1(i-absOID),ComponentOfAggregationOfLexicalsOID:

#componentOID_1(attOID),Value: Value )

←i-SM_AttributeOfAbstract(

OID: i-attOID,i-AbstractOID: i-absOID,AttributeOfAbstractOID: attOID,Value: Value ),

SM_AttributeOfAbstract(OID: attOID,AbstractOID: absOID,Name: attName,IsNullable: isNull,IsID: isIdent,Type: type )

SM_ComponentOfAggregation… ( OID: #componentOID_1(attOID), Name: name, AggrOID:

#aggregationOID_1(absOID),IsNullable: isNullable, IsKey: isIdent, Type : type )

←SM_AttributeOfAbstract(

OID: attOID,Name: name,AbstractOID: absOID, IsIdent: isIdent, IsNullable: isNullable , Type: type ) ;

Page 64: Model-independent  Schema and Data Translation

GII School 29/09/2010 64

i-SM-AbstractOID AbsOID InstOID3010 301 1

3011 301 1

3012 302 1

… … …

i-SM-AttributeOfAbstractOID AttrOfAbsOID InstOID Value i- AbsOID4010 401 1 75432 3010

4020 402 1 John Doe 3010

4021 402 1 Bob White 3011

4022 404 1 CS 3012

… … … … …

Instances in our dictionary

SM-AbstractOID Schema Name301 1 Employees

302 1 Departments

SM-AttributeOfAbstractOID Schema Name isIdent isNullable Type AbstrOID401 1 EmpNo T F Int 301

402 1 Name F F Text 301

404 1 Name T F Char 302

… … … … … … …

mSMmMSMM

i-SMi-M

75432

John Doe

Bob White

CS

Page 65: Model-independent  Schema and Data Translation

GII School 29/09/2010 65

Translation of instances

75432

John Doe

Bob White

CS

EmployeesEmpNo Name Affiliation75432 John Doe CS

Bob White

DepartmentsName

CS

DepartmentsName Address

EmployeesEmpNo

Name

Affiliation

Departments

1,1

0,N

Name

Address

EmployeesEmpNo Name Affiliation

Page 66: Model-independent  Schema and Data Translation

GII School 29/09/2010 66

i-SM-AbstractOID AbsOID InstOID3010 301 1

3011 301 1

3012 302 1

… … …

i-SM-AttributeOfAbstractOID AttrOfAbsOID InstOID Value i- AbsOID

4010 401 1 75432 3010

4020 402 1 John Doe 3010

4022 404 1 CS 3012

… … … … …

Instances in our dictionary75432

John Doe

Bob White

…CS

EmployeesEmpNo Name Affiliation75432 John Doe CS

DepartmentsName

CS

i-SM-AggregationOfLexicals

1002

1001

AggOID

115011

115010

InstOIDOID

5010CS1110054030

5010754321110034010

5010John Doe1110044020

i-SM-ComponentOfAggregationOfLexicalsi- AggOIDValueInstOIDCompOfAggOIDOID

Page 67: Model-independent  Schema and Data Translation

GII School 29/09/2010 67

i-SM-AbstractOID AbsOID InstOID3010 301 1

3011 301 1

3012 302 1

… … …

i-SM-AttributeOfAbstractOID AttrOfAbsOID InstOID Value i- AbsOID

4010 401 1 75432 3010

4020 402 1 John Doe 3010

4022 404 1 CS 3012

… … … … …

Instances in our dictionary

i-SM-AggregationOfLexicals

1002

1001

AggrOID

115011

115010InstOIDOID

5010754321110036010

5010John Doe1110046020

5010CS1110056030

i-SM-ComponentOfAggregationOfLexicals

i- AggrOIDValueInstOIDCompOfAggOIDOID

i-SM_ComponentOfAggregation… (OID: #i-componentOID_1 (i-attOID),i-AggrOID: #i-aggregationOID_1(i-absOID),ComponentOfAggregationOfLexicalsOID: #componentOID_1(attOID),Value: Val )

← i-SM_AttributeOfAbstract( OID: i-attOID, i-AbstractOID: i-absOID,AttributeOfAbstractOID: attOID, Value: Val ),

SM_AttributeOfAbstract( OID: attOID, AbstractOID: absOID)

Page 68: Model-independent  Schema and Data Translation

GII School 29/09/2010 68

Translation rules, data leveli-SM_ Lexical (

OID: #i-lexicalOID_1 (i-attOID),i-AggrOID: #i-aggregationOID_1(i-absOID),LexicalOID: #lexicalOID_1(attOID),Value: Value )

←i-SM_AttributeOfAbstract(

OID: i-attOID,i-AbstractOID: i-absOID,AttributeOfAbstractOID: attOID,Value: Value ),

SM_AttributeOfAbstract(OID: attOID,AbstractOID: absOID,Name: attName,IsNullable: isNull,IsID: isIdent,Type: type )

SM_Lexical( OID: #lexicalOID_1(attOID), Name: name, AggrOID:

#aggregationOID_1(absOID),IsNullable: isNullable, IsKey: isIdent, Type : type )

←SM_AttributeOfAbstract(

OID: attOID,Name: name,AbstractOID: absOID, IsIdent: isIdent, IsNullable: isNullable , Type: type ) ;

Page 69: Model-independent  Schema and Data Translation

Off-line approach

GII School 29/09/2010 69

Operational Systems MIDST

DB Translator

Importer

Exporter

Supermodel

DB

DBDB

Page 70: Model-independent  Schema and Data Translation

GII School 29/09/2010 70

Experiments

• A significant set of models– ER (in many variants and extensions)– Relational– OR– XSD– UML

Page 71: Model-independent  Schema and Data Translation

GII School 29/09/2010 71

XSD

OR noTab ref, FK, nested

Relational

OO ref, nested

OR Tab, ref, FK, nested

OO ref, flat

OR noTab, FK, nested

OR noTab, FK, flat

OR Tab, genref, FK, nested

OR noTab, gen

ref, FK, nestedOR noTab, gen,

FK, nested

37+ Remove generalizations36 Unnest sets (with ref)03 Unnest sets (with FKs)24 Unnest structures (flatten)06 Unnest structures (TT & FKs)01 Unnest structures (TT & ref)43 FKs for references29 Tables for typed tables30 Typed tables for tables13 References for FK31 Nest referenced classes

Page 72: Model-independent  Schema and Data Translation

The off-line approach: drawbacks

• It is highly inefficient, because it requires databases to be moved back and forth

• It does not allow data to be used directly• A "run-time" approach is needed

72GII School 29/09/2010

Page 73: Model-independent  Schema and Data Translation

A run-time alternative: generating views

• Main feature:– generate, from the datalog traslation programs, executable

statements defining views representing the target schema.• How:

– by means an analysis of the datalog schema rules under a new classification of constructs

73GII School 29/09/2010

Page 74: Model-independent  Schema and Data Translation

Runtime translation procedure

Views

DB SupermodelSchemaImporter

ViewGenerator

Translator

MIDSTOperational System

Access via target schema St

Access via source schema Ss

74GII School 29/09/2010

Page 75: Model-independent  Schema and Data Translation

Run-time vs off-line

GII School 29/09/2010 75

Operational Systems MIDST

DB Translator

Importer

Exporter

Supermodel

DB

DB

DB

Views

DB SupermodelSchemaImporter

ViewGenerator

Translator

MIDSTOperational System

Access via target schema

Access via source schema

Page 76: Model-independent  Schema and Data Translation

GII School 29/09/2010 76

Issues

• Reasoning on translations• ModelGen in the model management scenario:

– Round-trip engineering (and more)• Customization of translations• Off-line translation vs run-time• “Correctness” of data translation wrt schema translation:

– compare with data exchange• Schematic heterogeneity and semantic Web framework:

– What if the distinction between schemas and instances is blurred?

• Materialized Skolem function & provenance