

Resolving Schematic Discrepancy in the Integration of Entity-Relationship Schemas

Qi He, Tok Wang Ling

Dept. of Computer Science, School of Computing, National Univ. of Singapore


Outline

• Schema integration – background

• Schematic discrepancy

• Representation of meta information in ER schemas

• Resolution of schematic discrepancy in schema integration

• Related work

• Conclusion


Schema Integration

• In database integration, schema integration produces an integrated view that provides unified access to the heterogeneous data in the source schemas.

• In database design, it produces the global schema of a proposed database by integrating user views.


Challenges in schema integration

• Many types of conflicts among source schemas need to be resolved in schema integration:
  – Naming conflicts
  – Domain mismatch
  – Structural conflicts
  – Cardinality conflicts
  – Local vs. global constraints (e.g., local vs. global functional dependencies)
  – Schematic discrepancy…


Schematic Discrepancy

• Schematic discrepancy occurs when metadata in one database correspond to attribute values in another.

• An example (next page): months and supplier numbers (i.e., S1, …, Sn) are modeled differently, as attribute values or as schema labels (more generally, as metadata, which will be introduced later), in databases DB1, DB2 and DB3.


Motivation Example

[Figure: ER schemas of the supply example in DB1, DB2 and DB3]

DB1: entity types PROD (P#, PNAME), MONTH (MONTH) and SUPPLIER (S#), connected by the ternary m:m:m relationship type SUP with attribute PRICE.
   PROD = product {P# = p#, PNAME = pname}
   SUPPLIER = supplier {S# = s#}
   MONTH = month {MONTH = month}
   SUP = PMS {PRICE = price}

DB2: entity types JAN_PROD, …, DEC_PROD, each with attributes P#, PNAME, S1_PRICE, …, SN_PRICE.
   JAN_PROD = PM[month = 'jan'] {P# = p#, PNAME = pname, S1_PRICE = price[supplier = 's1', inherit ALL], …}

DB3: entity types PROD (P#, PNAME) and SUPPLIER (S#), connected by the m:m relationship types JAN_SUP, …, DEC_SUP, each with attribute PRICE.
   JAN_SUP = PMS[month = 'jan'] {PRICE = price[inherit ALL]}

Here price is an attribute of the ternary relationship type PMS, and PM is a relationship type between product and month.
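The discrepancy can be made concrete with a small sketch. The instance data below (product p1, two suppliers, the prices) are invented for illustration, and the helpers `flatten_db1`, `flatten_db2` and `flatten_db3` are hypothetical; they assume the naming pattern of the example (`JAN_PROD`, `S1_PRICE`, `JAN_SUP`) and read month and supplier back out of the schema labels, so all three layouts yield the same (p#, month, s#, price) facts.

```python
# Hypothetical instances of the same supply facts in the three layouts.
# DB1: month and supplier are attribute values of the ternary SUP relationship.
db1_sup = {
    ("p1", "jan", "s1"): 10,
    ("p1", "jan", "s2"): 12,
    ("p1", "dec", "s1"): 11,
}

# DB2: month is encoded in entity-type names, supplier in attribute names.
# None stands for "no price" (an assumption of this sketch).
db2 = {
    "JAN_PROD": [{"P#": "p1", "PNAME": "bolt", "S1_PRICE": 10, "S2_PRICE": 12}],
    "DEC_PROD": [{"P#": "p1", "PNAME": "bolt", "S1_PRICE": 11, "S2_PRICE": None}],
}

# DB3: month is encoded in relationship-type names, supplier is a value.
db3 = {
    "JAN_SUP": {("p1", "s1"): 10, ("p1", "s2"): 12},
    "DEC_SUP": {("p1", "s1"): 11},
}


def flatten_db1(sup):
    """Return the set of (p#, month, s#, price) facts stored in DB1."""
    return {(p, m, s, price) for (p, m, s), price in sup.items()}


def flatten_db2(db):
    """Read the month from the entity-type label, the supplier from the attribute label."""
    facts = set()
    for etype, rows in db.items():
        month = etype.split("_")[0].lower()               # "JAN_PROD" -> "jan"
        for row in rows:
            for attr, value in row.items():
                if attr.endswith("_PRICE") and value is not None:
                    supplier = attr.split("_")[0].lower()  # "S1_PRICE" -> "s1"
                    facts.add((row["P#"], month, supplier, value))
    return facts


def flatten_db3(db):
    """Read the month from the relationship-type label."""
    facts = set()
    for rtype, pairs in db.items():
        month = rtype.split("_")[0].lower()               # "JAN_SUP" -> "jan"
        for (p, s), price in pairs.items():
            facts.add((p, month, s, price))
    return facts


# Same information, three different schemas.
assert flatten_db1(db1_sup) == flatten_db2(db2) == flatten_db3(db3)
```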


Contexts of schema constructs

• Conceptual modeling is always done within a particular context, which is represented explicitly as a set of meta-attributes with values (called metadata).

• Meta-attributes with values specify the conditions satisfied by the instances of a schema construct (i.e., an entity type, relationship type, or attribute).


Ontology

• A representational vocabulary for a shared domain of discourse which includes the definitions of entity types, relationship types, and attributes.

• We use an ontology to describe the meta information of the ER schemas in the supply example:
  – Entity types: product, supplier, month
  – Attributes of entity types: p#, pname, s#, month
  – Relationship types: PMS (a ternary supply relationship type among product, month and supplier) and PM (a binary relationship type between product and month); PM is a projection of PMS.
  – Attributes of relationship types: price (an attribute of PMS)
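One way to hold this ontology in code is sketched below. The dictionary layout and the helper `is_projection_of` are assumptions for illustration, not part of the authors' formalism; the helper checks only the schema-level precondition for a projection (PM's participants form a proper subset of PMS's) and does not look at instances.

```python
# A minimal encoding of the supply ontology (layout is a hypothetical choice).
ONTOLOGY = {
    "entity_types": {
        "product": ["p#", "pname"],
        "supplier": ["s#"],
        "month": ["month"],
    },
    "relationship_types": {
        # name -> (participating entity types, attributes of the relationship type)
        "PMS": (("product", "month", "supplier"), ["price"]),
        "PM": (("product", "month"), []),
    },
}


def is_projection_of(r, s, ontology=ONTOLOGY):
    """Schema-level necessary condition for r to be a projection of s:
    the participants of r are a proper subset of the participants of s."""
    rels = ontology["relationship_types"]
    return set(rels[r][0]) < set(rels[s][0])


assert is_projection_of("PM", "PMS")
assert not is_projection_of("PMS", "PM")
```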


Example of Context

• In DB2, the entity type JAN_PROD is represented as:

JAN_PROD = PM [month = ‘jan’]

where PM and month are, respectively, a relationship type and an entity type from the ontology.

This means that JAN_PROD is derived from the binary product-month relationship type (i.e., PM) when the month value is 'jan'.

Here month is a meta-attribute, and jan is the metadata of JAN_PROD.
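Read as a selection, the context [month = 'jan'] keeps only the PM instances whose month is jan. A minimal sketch, assuming PM instances are plain dictionaries; `apply_context` and the sample instances are hypothetical.

```python
# Hypothetical instances of the ontology relationship type PM (product, month).
PM = [
    {"product": "p1", "month": "jan"},
    {"product": "p2", "month": "jan"},
    {"product": "p1", "month": "dec"},
]


def apply_context(instances, context):
    """Keep the instances that satisfy every (meta-attribute = metadata) pair."""
    return [inst for inst in instances
            if all(inst.get(attr) == value for attr, value in context.items())]


# JAN_PROD = PM[month = 'jan']
jan_prod = apply_context(PM, {"month": "jan"})
print([inst["product"] for inst in jan_prod])  # ['p1', 'p2']
```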


Inheritance of Context

• Context can be specified at 4 levels:
  – Databases
  – Entity types
  – Relationship types
  – Attributes

• The context of a higher-level schema construct can be inherited by a lower-level schema construct. The inheritance hierarchy of contexts is:

relationship type → attribute of relationship type

database → entity type → attribute of entity type


Example of context inheritance

• In DB2, the attribute S1_PRICE of the entity type JAN_PROD is represented as:

S1_PRICE = price [supplier=’s1’, inherit ALL]

S1_PRICE inherits all of the context of the entity type JAN_PROD ("inherit ALL"), i.e., month = 'jan'.

The representation means that each value of S1_PRICE of the entity type JAN_PROD is a price of a product supplied by supplier s1 in the month of jan.
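Context inheritance can be sketched as a recursive merge up the hierarchy. The `Construct` class and `effective_context` are hypothetical names; the sketch assumes, as the `inherit ALL` annotation suggests, that a construct inherits its parent's context only when it declares so, and that its own meta-attributes win on a clash (the slides do not say how clashes are handled).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Construct:
    """A schema construct: database, entity type, relationship type or attribute."""
    name: str
    context: dict = field(default_factory=dict)  # own meta-attributes with values
    parent: Optional["Construct"] = None         # next higher level in the hierarchy
    inherit_all: bool = False                    # models the "inherit ALL" annotation


def effective_context(c: Construct) -> dict:
    """Return the construct's own context merged with what it inherits."""
    inherited = effective_context(c.parent) if (c.inherit_all and c.parent) else {}
    return {**inherited, **c.context}            # own values win on a clash (assumption)


# DB2: JAN_PROD = PM[month = 'jan'] {S1_PRICE = price[supplier = 's1', inherit ALL]}
jan_prod = Construct("JAN_PROD", {"month": "jan"})
s1_price = Construct("S1_PRICE", {"supplier": "s1"}, parent=jan_prod, inherit_all=True)
print(effective_context(s1_price))  # {'month': 'jan', 'supplier': 's1'}

# DB3: JAN_SUP = PMS[month = 'jan'] {PRICE = price[inherit ALL]}
jan_sup = Construct("JAN_SUP", {"month": "jan"})
price = Construct("PRICE", parent=jan_sup, inherit_all=True)
print(effective_context(price))     # {'month': 'jan'}
```

The effective context of S1_PRICE matches the reading above: each value is a price for supplier s1 in the month of jan.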


Resolution of schematic discrepancy in the integration of ER schemas


• Basic Idea: Remove the contexts of schema constructs by transforming meta-attributes into entity types.

• Only meta-attributes causing schematic discrepancy need to be transformed.

• Schema transformation should keep the semantics (information and constraints) of source schemas.


• Resolve schematic discrepancy for entity types, relationship types, attributes of entity types and attributes of relationship types, in that order (the order follows the hierarchical order of context inheritance).

• The context at database level is handled in the entity types which inherit it.
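The two rules above (transform only the meta-attributes that cause a discrepancy, and process constructs in hierarchy order) can be sketched as a planning step. Everything here is illustrative: the tuple layout, the test "the other schema models the same ontology concept as data" used to detect a discrepancy, and the invented meta-attribute `currency`, included only to show a non-discrepant meta-attribute being skipped.

```python
# Processing order stated on the slide.
ORDER = ["entity type", "relationship type",
         "attribute of entity type", "attribute of relationship type"]

# Hypothetical summary of DB2: (construct, kind, meta-attributes in its own context).
# "currency" is an invented meta-attribute, used only to show one being skipped.
DB2_CONSTRUCTS = [
    ("S1_PRICE", "attribute of entity type", {"supplier", "currency"}),
    ("JAN_PROD", "entity type", {"month"}),
]

# Ontology concepts that DB1 models as entity types or attributes, i.e. as data.
DB1_DATA_CONCEPTS = {"product", "p#", "pname", "month", "supplier", "s#", "price"}


def resolution_plan(constructs, data_concepts):
    """List (construct, meta-attributes to transform) in resolution order.

    A meta-attribute is treated as discrepant when the other schema models the
    same ontology concept as data; all other meta-attributes are left alone.
    """
    plan = []
    for name, kind, meta_attrs in sorted(constructs, key=lambda c: ORDER.index(c[1])):
        discrepant = meta_attrs & data_concepts
        if discrepant:
            plan.append((name, sorted(discrepant)))
    return plan


print(resolution_plan(DB2_CONSTRUCTS, DB1_DATA_CONCEPTS))
# [('JAN_PROD', ['month']), ('S1_PRICE', ['supplier'])]
```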


An example

• Transforming DB2 into DB1 in 2 steps:
  – Step 1: Resolve discrepancies for the entity types JAN_PROD, …, DEC_PROD
    • Step 1.1: Transform meta-attributes into entity types
    • Step 1.2: Merge equivalent entity types, relationship types and attributes
  – Step 2: Resolve discrepancies for the attributes S1_PRICE, …, SN_PRICE

[Figure: the schemas of DB1 and DB2 from the motivation example]


[Figure: Step 1.1 applied to DB2]
Before: entity types JAN_PROD, …, DEC_PROD, each with attributes P#, PNAME, S1_PRICE, …, SN_PRICE, where
   JAN_PROD = PM[month = 'jan'] {S1_PRICE = price[supplier = 's1', inherit ALL], …}
After: for each month, entity types PROD (P#, PNAME) and MONTH (MONTH), connected by the m:m relationship type PM, which carries the attributes S1_PRICE, …, SN_PRICE; dom(MONTH) = {jan} in the schema obtained from JAN_PROD, …, dom(MONTH) = {dec} in the one obtained from DEC_PROD. With the month context removed:
   S1_PRICE = price[supplier = 's1'], …

Step 1.1: Transform the meta-attribute month of the entity type JAN_PROD (the other entity types are similar):

1. Construct an entity type MONTH to model the meta information.

2. JAN_PROD becomes PROD after removing the context

3. Construct a relationship type PM to relate PROD and MONTH

4. Attributes S1_PRICE, …, SN_PRICE are moved to PM, as they inherit the context (i.e., the month) of the entity type JAN_PROD.

PM is a relationship type between product and month
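At the instance level, Step 1.1 can be sketched as follows, assuming two suppliers (N = 2) and made-up rows; `step_1_1` is a hypothetical helper. The month that was implicit in the entity-type name becomes an explicit MONTH value stored in each PM instance, next to the moved price attributes.

```python
# Hypothetical instances of one DB2 entity type (N = 2 suppliers for brevity).
JAN_PROD = [
    {"P#": "p1", "PNAME": "bolt", "S1_PRICE": 10, "S2_PRICE": 12},
    {"P#": "p2", "PNAME": "nut", "S1_PRICE": 3, "S2_PRICE": 4},
]


def step_1_1(rows, month):
    """Remove the context month = <month> from an entity type such as JAN_PROD.

    Returns instances of PROD, MONTH and PM. The S*_PRICE attributes move to PM
    because they inherit the month context of the entity type.
    """
    prod = [{"P#": r["P#"], "PNAME": r["PNAME"]} for r in rows]
    month_entity = [{"MONTH": month}]             # dom(MONTH) = {month}
    pm = [{"P#": r["P#"], "MONTH": month,
           **{a: v for a, v in r.items() if a.endswith("_PRICE")}}
          for r in rows]
    return prod, month_entity, pm


prod, month_entity, pm = step_1_1(JAN_PROD, "jan")
print(pm[0])  # {'P#': 'p1', 'MONTH': 'jan', 'S1_PRICE': 10, 'S2_PRICE': 12}
```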


[Figure: Step 1.2]
Before: one schema per month, each consisting of PROD (P#, PNAME) and MONTH (MONTH) connected by the m:m relationship type PM with attributes S1_PRICE, …, SN_PRICE, with dom(MONTH) = {jan}, …, dom(MONTH) = {dec}.
After: a single schema with the same structure and dom(MONTH) = {jan, …, dec}.

Step 1.2: Merge the equivalent entity types, relationship types and attributes that refer to the same ontology names. Note that the domains of the MONTH attributes are unioned.
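A minimal sketch of the merge, with hypothetical data and a hypothetical helper `step_1_2`. Here merging means taking the union of the instances of constructs that share an ontology name; identifying products by P# is an assumption that presumes PNAME agrees across months.

```python
def step_1_2(per_month):
    """Merge per-month (PROD, MONTH, PM) instance lists into one schema.

    Constructs with the same ontology name are merged by taking the union of
    their instances, so the MONTH domains are unioned as well.
    """
    prod, months, pm = {}, [], []
    for prod_rows, month_rows, pm_rows in per_month:
        for r in prod_rows:
            prod[r["P#"]] = r                     # assumption: P# identifies a product
        for r in month_rows:
            if r["MONTH"] not in months:
                months.append(r["MONTH"])
        pm.extend(pm_rows)
    return list(prod.values()), months, pm


# Hypothetical outputs of Step 1.1 for JAN_PROD and DEC_PROD (one supplier).
jan = ([{"P#": "p1", "PNAME": "bolt"}], [{"MONTH": "jan"}],
       [{"P#": "p1", "MONTH": "jan", "S1_PRICE": 10}])
dec = ([{"P#": "p1", "PNAME": "bolt"}], [{"MONTH": "dec"}],
       [{"P#": "p1", "MONTH": "dec", "S1_PRICE": 11}])

prod, months, pm = step_1_2([jan, dec])
print(months)     # ['jan', 'dec']
print(len(prod))  # 1
print(len(pm))    # 2
```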


An example (cont.)

• Transforming DB2 into DB1 in 2 steps:
  – Step 1: Resolve discrepancies for the entity types JAN_PROD, …, DEC_PROD
  – Step 2: Resolve discrepancies for the attributes S1_PRICE, …, SN_PRICE
    • Step 2.1: Transform meta-attributes into entity types.
    • Step 2.2: Merge equivalent entity types, relationship types and attributes.
    • Step 2.3: Remove redundant relationship types.


[Figure: Step 2.1]
Before: PROD (P#, PNAME) and MONTH (MONTH) connected by the m:m relationship type PM with attributes S1_PRICE, …, SN_PRICE, where S1_PRICE = price[supplier = 's1'], …
After: PROD and MONTH connected by PM; in addition, for each supplier, an entity type SUPPLIER (S#) with dom(S#) = {s1}, …, dom(S#) = {sn}, and a ternary m:m:m relationship type PMS among PROD, MONTH and SUPPLIER with attribute PRICE, where
   SUPPLIER = supplier, PRICE = price

Step 2.1: Transform the meta-attribute supplier of the attribute S1_PRICE (the other attributes are similar):

1. Construct an entity type SUPPLIER to model the meta information.

2. Construct a relationship type PMS to relate PROD, MONTH and SUPPLIER.

3. S1_PRICE becomes PRICE after removing the context, and is moved to PMS.

price is an attribute of the relationship type PMS
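Step 2.1 can be sketched at the instance level for one price attribute. The helper `step_2_1` and the rows are hypothetical; deriving the supplier number from the attribute name and treating `None` as "no price" are assumptions of this sketch.

```python
# Hypothetical PM instances after Step 1 (N = 2 suppliers).
PM = [
    {"P#": "p1", "MONTH": "jan", "S1_PRICE": 10, "S2_PRICE": 12},
    {"P#": "p1", "MONTH": "dec", "S1_PRICE": 11, "S2_PRICE": None},
]


def step_2_1(pm_rows, price_attr):
    """Remove the context supplier = 's<k>' from one attribute such as S1_PRICE.

    Returns the SUPPLIER instances (dom(S#) = {'s<k>'}) and the PMS instances,
    each carrying PRICE together with the P#, MONTH and S# it belongs to.
    """
    supplier = price_attr.split("_")[0].lower()   # "S1_PRICE" -> "s1" (naming assumption)
    supplier_entity = [{"S#": supplier}]
    pms = [
        {"P#": r["P#"], "MONTH": r["MONTH"], "S#": supplier, "PRICE": r[price_attr]}
        for r in pm_rows
        if r.get(price_attr) is not None          # assumption: None means "no price"
    ]
    return supplier_entity, pms


suppliers, pms_s2 = step_2_1(PM, "S2_PRICE")
print(pms_s2)  # [{'P#': 'p1', 'MONTH': 'jan', 'S#': 's2', 'PRICE': 12}]
```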


[Figure: Step 2.2]
Before: PROD (P#, PNAME) and MONTH (MONTH) connected by PM, plus one SUPPLIER entity type (S#) and one ternary relationship type PMS with attribute PRICE per supplier, with dom(S#) = {s1}, …, dom(S#) = {sn}.
After: PROD, MONTH and PM, a single SUPPLIER entity type with dom(S#) = {s1, …, sn}, and a single ternary relationship type PMS with attribute PRICE.

Step 2.2: Merge the equivalent entity types, relationship types and attributes. The domains of the S# attributes are unioned.
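Step 2.2 mirrors Step 1.2, this time for SUPPLIER and PMS. A minimal sketch with hypothetical data and a hypothetical helper `step_2_2`; merging again means taking the union of instances.

```python
def step_2_2(per_supplier):
    """Merge the per-supplier (SUPPLIER, PMS) instance lists.

    SUPPLIER instances are unioned, so dom(S#) becomes {s1, ..., sn}; the PMS
    instances refer to the same ontology relationship type and are concatenated.
    """
    suppliers, pms = [], []
    for supplier_rows, pms_rows in per_supplier:
        for r in supplier_rows:
            if r not in suppliers:
                suppliers.append(r)
        pms.extend(pms_rows)
    return suppliers, pms


# Hypothetical outputs of Step 2.1 for S1_PRICE and S2_PRICE.
s1 = ([{"S#": "s1"}], [{"P#": "p1", "MONTH": "jan", "S#": "s1", "PRICE": 10}])
s2 = ([{"S#": "s2"}], [{"P#": "p1", "MONTH": "jan", "S#": "s2", "PRICE": 12}])

suppliers, pms = step_2_2([s1, s2])
print([r["S#"] for r in suppliers])  # ['s1', 's2']
```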


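[Figure: Step 2.3]
Before: PROD (P#, PNAME), MONTH (MONTH) and SUPPLIER (S#, with dom(S#) = {s1, …, sn}), with the binary relationship type PM between PROD and MONTH and the ternary relationship type PMS with attribute PRICE.
After: PROD (P#, PNAME), MONTH (MONTH) and SUPPLIER (S#), connected only by the ternary m:m:m relationship type PMS with attribute PRICE, which corresponds to the schema of DB1 (SUP = PMS).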

Step 2.3: Remove the redundant relationship type PM that is a projection of PMS.
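Removing PM is only safe when it really is a projection of PMS. The sketch below (hypothetical `is_projection` helper and data) checks this at the instance level before dropping PM; if some product-month pair had no PMS instance, PM would carry extra information and would be kept.

```python
def is_projection(pm, pms, keys=("P#", "MONTH")):
    """True if the PM instances equal the projection of PMS onto (P#, MONTH)."""
    projected = {tuple(r[k] for k in keys) for r in pms}
    return {tuple(r[k] for k in keys) for r in pm} == projected


# Hypothetical instances after Step 2.2.
pm = [{"P#": "p1", "MONTH": "jan"}, {"P#": "p1", "MONTH": "dec"}]
pms = [
    {"P#": "p1", "MONTH": "jan", "S#": "s1", "PRICE": 10},
    {"P#": "p1", "MONTH": "jan", "S#": "s2", "PRICE": 12},
    {"P#": "p1", "MONTH": "dec", "S#": "s1", "PRICE": 11},
]

schema = {"PM": pm, "PMS": pms}
if is_projection(schema["PM"], schema["PMS"]):
    del schema["PM"]       # PM carries no information beyond PMS
print(sorted(schema))      # ['PMS']
```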


Semantic preservation

• Our solution to schematic discrepancy preserves the semantics of the source schemas in schema transformation:
  – Information preservation. An instance of one schema can be losslessly converted into an instance of the other schema, and conversely.
  – Constraint preservation. Cardinality constraints of ER schemas can be preserved in schema transformation, although they take different forms in the source and transformed schemas (an example is given on the next page).
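Information preservation can be illustrated with a round trip on one sample instance. `to_db1` and `to_db2` are hypothetical helpers that assume the naming pattern of the example and a price for every product, month and supplier; a single round trip demonstrates the idea but is not a proof of losslessness.

```python
# Hypothetical DB2 instance (N = 2 suppliers).
db2 = {
    "JAN_PROD": [{"P#": "p1", "PNAME": "bolt", "S1_PRICE": 10, "S2_PRICE": 12}],
    "DEC_PROD": [{"P#": "p1", "PNAME": "bolt", "S1_PRICE": 11, "S2_PRICE": 13}],
}
SUPPLIERS = ["s1", "s2"]


def to_db1(db):
    """DB2 -> DB1: PROD instances (P# -> PNAME) and PMS facts (P#, MONTH, S#, PRICE)."""
    prod, pms = {}, []
    for etype, rows in db.items():
        month = etype.split("_")[0].lower()            # "JAN_PROD" -> "jan"
        for r in rows:
            prod[r["P#"]] = r["PNAME"]
            for s in SUPPLIERS:
                pms.append((r["P#"], month, s, r[f"{s.upper()}_PRICE"]))
    return prod, pms


def to_db2(prod, pms):
    """DB1 -> DB2: regroup PMS facts by month (entity type) and supplier (attribute)."""
    db = {}
    for p, month, s, price in pms:
        rows = db.setdefault(f"{month.upper()}_PROD", {})
        row = rows.setdefault(p, {"P#": p, "PNAME": prod[p]})
        row[f"{s.upper()}_PRICE"] = price
    return {etype: list(rows.values()) for etype, rows in db.items()}


assert to_db2(*to_db1(db2)) == db2   # lossless round trip on this instance
```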


Constraint Preservation (E.g.)

• Functional dependency (FD) is preserved in the transformation from DB2 to DB1.

• Suppose the following FD holds in each entity type JAN_PROD, …, DEC_PROD of DB2:

P# → {S1_PRICE, …, SN_PRICE}

• In DB1, the FD is preserved, but in a different form: {P#, S#, MONTH} → PRICE

• In [3], we gave inference rules to derive FDs in schema transformation.

[Figure: the schemas of DB1 and DB2 from the motivation example]

[3] Qi He and Tok Wang Ling: Extending and inferring functional dependency in schema transformation. CIKM, 2004.
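The correspondence between the two FDs can be checked on sample data with a small FD tester. The rows are invented, `fd_holds` is a hypothetical helper, and passing on one instance only illustrates the claim; the inference rules in [3] are what establish it in general.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        val = tuple(r[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False                          # same lhs value, different rhs value
    return True


# Hypothetical JAN_PROD instances in DB2 (N = 2 suppliers).
jan_prod = [{"P#": "p1", "S1_PRICE": 10, "S2_PRICE": 12},
            {"P#": "p2", "S1_PRICE": 3, "S2_PRICE": 4}]
assert fd_holds(jan_prod, ["P#"], ["S1_PRICE", "S2_PRICE"])

# The same values in DB1's form: one (P#, S#, MONTH, PRICE) row per price.
sup = [{"P#": r["P#"], "S#": a.split("_")[0].lower(), "MONTH": "jan", "PRICE": r[a]}
       for r in jan_prod for a in ("S1_PRICE", "S2_PRICE")]
assert fd_holds(sup, ["P#", "S#", "MONTH"], ["PRICE"])

# A second, conflicting price for p1 violates both forms of the FD.
bad = jan_prod + [{"P#": "p1", "S1_PRICE": 99, "S2_PRICE": 12}]
bad_sup = sup + [{"P#": "p1", "S#": "s1", "MONTH": "jan", "PRICE": 99}]
assert not fd_holds(bad, ["P#"], ["S1_PRICE", "S2_PRICE"])
assert not fd_holds(bad_sup, ["P#", "S#", "MONTH"], ["PRICE"])
```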


Related work

• The definition of context as a set of meta-attributes with values was originally adopted in [2, 9].

• They defined context at the attribute level only.

• We consider contexts at the levels of databases, entity types, relationship types and attributes, as well as the inheritance of context.

[2] C. H. Goh, S. Bressan, S. Madnick, and M. Siegel: Context interchange: new features and formalisms for the intelligent integration of information. TOIS, 1999

[9] E. Sciore, M. Siegel, A. Rosenthal: Using semantic values to facilitate interoperability among heterogeneous information systems, TODS, 1994


Related work

• Existing work in schema integration has focused on the resolution of structural conflicts [1, 7] and constraint conflicts [6, 8].

• Our solution to schematic discrepancy complements those works.

• Schematic discrepancy is resolved first; the resolution of other conflicts follows.

[1] C. Batini, M. Lenzerini: A methodology for data schema integration in the Entity-Relationship model. IEEE Trans. on Software Engineering, 10(6), 1984

[6] Mong Li Lee, Tok Wang Ling: Resolving constraint conflicts in the integration of entity-relationship schemas. ER, 1997

[7] Mong Li Lee, Tok Wang Ling: A methodology for structural conflicts resolution in the integration of entity-relationship schemas. Knowledge and Information Sys., 5, 2003

[8] M. P. Reddy, B.E.Prasad, Amar Gupta: Formulating global integrity constraints during derivation of global schema. Data & Knowledge Engineering, 16, 1995


Related work

• Schematic discrepancy in the relational model is addressed by some multidatabase languages [4, 5].

• They solved a special case of schematic discrepancy: they transform relation names or attribute names into attribute values, or conversely.

• They did not consider constraints in schema transformation.

• Our work solves the general problem and preserves the cardinality constraints of ER schemas in schema transformation.

[4] R. Krishnamurthy, W. Litwin, W. Kent: Language features for interoperability of databases with schematic discrepancies. SIGMOD, 1991

[5] L. V. S. Lakshmanan, F. Sadri, S. N. Subramanian: SchemaSQL—an extension to SQL for multidatabase interoperability. TODS, 2001


Conclusion

• The ER model supports cardinality constraints, which facilitates the derivation of constraints in schema transformation and integration.

• Context is used to explicitly represent meta information of entity types, relationship types and attributes in ER schemas.

• Schematic discrepancy is resolved by removing context.

• The solution to schematic discrepancy preserves information and constraints.


References

[1] C. Batini, M. Lenzerini: A methodology for data schema integration in the Entity-Relationship model. IEEE Trans. on Software Engineering, 10(6), 1984

[2] C. H. Goh, S. Bressan, S. Madnick, and M. Siegel: Context interchange: new features and formalisms for the intelligent integration of information. ACM Transactions on Information Systems, 17(3), 1999, pp 270-293

[3] Qi He and Tok Wang Ling: Extending and inferring functional dependency in schema transformation. CIKM, 2004.

[4] R. Krishnamurthy, W. Litwin, W. Kent: Language features for interoperability of databases with schematic discrepancies. SIGMOD, 1991, pp 40-49

[5] L. V. S. Lakshmanan, F. Sadri, S. N. Subramanian: SchemaSQL—an extension to SQL for multidatabase interoperability. TODS, 2001, pp 476-519

[6] Mong Li Lee, Tok Wang Ling: Resolving constraint conflicts in the integration of entity-relationship schemas. ER, 1997, pp 394-407

[7] Mong Li Lee, Tok Wang Ling: A methodology for structural conflicts resolution in the integration of entity-relationship schemas. Knowledge and Information Sys., 5, 2003, pp 225-247

[8] M. P. Reddy, B.E.Prasad, Amar Gupta: Formulating global integrity constraints during derivation of global schema. Data & Knowledge Engineering, 16, 1995, pp 241-268

[9] E. Sciore, M. Siegel, A. Rosenthal: Using semantic values to facilitate interoperability among heterogeneous information systems, TODS, 19(2), 1994, pp 254-290