Multidatabase and Distributed Manipulations Part 1 Witold Litwin

1

Multidatabase and Distributed Manipulations

Part 1

Witold Litwin

2

Plan

Introduction
Technical problems
Origin of the concept
– Approach: Centralized DB (ANSI-SPARC)
– Approach: DDB (top-down)
– Approach: Global Schema (bottom-up)
Reference architectures
– Multidatabase architecture
– Federated architecture
Autonomy, semantic heterogeneity, common model

3

Multidatabase Model

Single database model (ANSI-SPARC):
– The real universe is modeled as one DB

Multidatabase model: the real universe is modeled as multiple DBs
– Autonomous
– Semantically heterogeneous
– Manipulated through a multidatabase language

Reference: Litwin, W., Abdellatif, A. "Multidatabase interoperability". Multidatabase systems: An advanced solution for global information sharing. Hurson, A. R., Bright, M. W., Pakzad, S. H. (Eds.). IEEE Press, 1993.

4

Multidatabase Model

[Figure: databases a user may want to use together: Courses & students, Library, Employees, Restaurants, My friends, Other DBs on the Internet, Paris 9, Private, Teletel, Folio, Cine]

5

Problems

Reference architecture
Semantic heterogeneity in the presence of local autonomy
Common data model
Functions of the MDB language
Transactions
Protocols & standards
Performance

6

Reference Architectures

Multidatabase architecture
– Generalization of the ANSI-SPARC architecture
Federated DBs architecture
– Generalization of the federated DB architecture
Others

7

ANSI-SPARC Architecture
Centralized Integrated DB (1960-70)

[Figure: the three ANSI-SPARC levels over the real universe: external level (ESs), conceptual level (CS), internal level (PS)]

ES - External Schema
CS - Conceptual Schema
PS - Physical or Internal Schema

8

Distributed DB
Origin of the concept (1970s)
– WAN development (20 kb/s)
– Overload of centralized DBs

9

DDB
Idea: distribution of functions other than local communication ("top-down" approach)
Which ones?
– Distributed execution (OS)
– File access
– The DB
Then, what data model for the CS?
– Network (CODASYL)?
– Relational

10

Relation Fragmentation

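[Figure: the relation Hotels (H#, Ville, Cat, #Chambres) distributed between Paris and Lyon. It is split into the fragments (H#, Ville) and (H#, Cat, #Chambres); each piece is labeled as one fragment.]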
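The fragmentation pictured above can be sketched with plain `SELECT` statements. The snippet below is only an illustration using Python's standard `sqlite3` module: the sample rows, the fragment table names (`Hotels_Paris`, `Hotels_Lyon`, `Hotels_Loc`, `Hotels_Cap`) and the reading of Paris/Lyon as a horizontal split are assumptions, and `H#` and `#Chambres` are written `H_no` and `Chambres` because `#` is not valid in an unquoted SQL identifier.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE Hotels (H_no INTEGER PRIMARY KEY, Ville TEXT, Cat TEXT, Chambres INTEGER)"
)
# Hypothetical sample rows.
con.executemany(
    "INSERT INTO Hotels VALUES (?, ?, ?, ?)",
    [(1, "Paris", "***", 120), (2, "Lyon", "**", 45), (3, "Paris", "*", 30)],
)

# Horizontal fragmentation: one fragment per city (a selection).
con.execute("CREATE TABLE Hotels_Paris AS SELECT * FROM Hotels WHERE Ville = 'Paris'")
con.execute("CREATE TABLE Hotels_Lyon AS SELECT * FROM Hotels WHERE Ville = 'Lyon'")

# Vertical fragmentation: two projections that both keep the key H_no.
con.execute("CREATE TABLE Hotels_Loc AS SELECT H_no, Ville FROM Hotels")
con.execute("CREATE TABLE Hotels_Cap AS SELECT H_no, Cat, Chambres FROM Hotels")

# Reconstruction: union of the horizontal fragments ...
horizontal = con.execute(
    "SELECT * FROM Hotels_Paris UNION ALL SELECT * FROM Hotels_Lyon ORDER BY 1"
).fetchall()
# ... and join of the vertical fragments on the shared key.
vertical = con.execute(
    "SELECT l.H_no, l.Ville, c.Cat, c.Chambres "
    "FROM Hotels_Loc AS l JOIN Hotels_Cap AS c ON l.H_no = c.H_no ORDER BY 1"
).fetchall()
original = con.execute("SELECT * FROM Hotels ORDER BY 1").fetchall()
```

Both reconstructions return the rows of the original relation. That is the property a fragmentation has to preserve, and it is why each vertical fragment keeps the key.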

11

Problems

GS scalability
GS utility for a local user
Query performance (bad case)
Data migration from local DBs
– IMS, IDMS, Socrate
Local applications

12

Problems With the "Bottom-up" Approach

Creation of GS
Semantic heterogeneity
Time for the integration / local restructuring autonomy
In other words: scalability of GS
Updates
Performance
Heterogeneous views

[Figure: GS approach ("bottom-up"): local DBs, each with its own PS and CS, integrated under one GS that supports the ESs]

13

MDB Architecture
Absence of GS

A user may have data in multiple ANSI-SPARC compatible DBs
One may face multiple CSs
» In general, it will be impossible to create a GS

[Figure: the GS approach ("bottom-up") diagram, with PSs, CSs, a GS and ESs]

14

MDB Architecture
Concept of the MDB Language

Language for definition and manipulation of collections of DBs (multidatabases)
» Definition of MDB ESs
  Perhaps GSs
» Definition of MDB dependencies
  Semantics, integrity, security, manipulation...
» Formulation of MDB queries (explicitly)
  Referencing DB names
  With MDB joins...

Example: find in DB michelin and in DB gaumont all '**' restaurants and cinemas on the same street
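The example query above can be approximated with any SQL engine that lets one statement name several databases. The sketch below uses SQLite's `ATTACH DATABASE` from Python as a stand-in for an MDB language; it is not MSQL. The table and column names (`restaurants`, `cinemas`, `name`, `stars`, `street`) and the sample rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Attach two independent in-memory databases under the names used in the example.
con.execute("ATTACH DATABASE ':memory:' AS michelin")
con.execute("ATTACH DATABASE ':memory:' AS gaumont")

# Hypothetical schemas: each database is defined on its own.
con.execute("CREATE TABLE michelin.restaurants (name TEXT, stars TEXT, street TEXT)")
con.execute("CREATE TABLE gaumont.cinemas (name TEXT, street TEXT)")

con.executemany(
    "INSERT INTO michelin.restaurants VALUES (?, ?, ?)",
    [
        ("Chez Paul", "**", "Rue de Rivoli"),
        ("Le Zinc", "*", "Rue de Rivoli"),
        ("La Table", "**", "Rue Monge"),
    ],
)
con.executemany(
    "INSERT INTO gaumont.cinemas VALUES (?, ?)",
    [
        ("Gaumont Rivoli", "Rue de Rivoli"),
        ("Gaumont Opera", "Boulevard des Capucines"),
    ],
)

# Multidatabase join: table names are qualified by database name,
# and the join condition crosses the two databases.
rows = con.execute(
    """
    SELECT r.name, c.name, r.street
    FROM michelin.restaurants AS r
    JOIN gaumont.cinemas AS c ON r.street = c.street
    WHERE r.stars = '**'
    """
).fetchall()
```

The query references DB names explicitly and no global schema is defined first, which are the two features the slide attributes to MDB queries.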

15

MDB Architecture
Multibases

A multidatabase (MDB) is a collection of DBs with an MDB language
– E.g. MSQL

A collection of DBs without an MDB language is not an MDB, but just a collection of DBs
– Just as a collection of flat files (tables) without a DB language, SQL for example, is not a DB

16

Potential Multibases

[Figure: the same collection of databases, seen as potential multibases: Courses & students, Library, Employees, Restaurants, My friends, Other DBs on the Internet, Paris 9, Private, Teletel, Folio, Cine]

17

MDB Architecture
Concept of Internal Logical Sublayer

The legacy data models can be heterogeneous
– Different SQL dialects
– Relational, hierarchical, network
– OO and object-relational

It is preferable to have one model at the MDB layer
– One needs a sub-layer for translations

Also, the local DBA may wish not to show some of the data at the MDB layer

Solution: ILS - internal logical schema
» Unknown to ANSI-SPARC
» Also called a gateway
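One way to picture an ILS is as a set of views placed over a local database: the views present local data under the names of the common model and leave out whatever the local DBA does not want to show. The sketch below does this with a SQLite view from Python. The `employees` table, its columns and the view name `ils_employees` are hypothetical, and a real gateway would also translate between data models or SQL dialects, which this example does not attempt.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Hypothetical local table, with local (French) column names and a private column.
con.execute("CREATE TABLE employees (emp_no INTEGER, nom TEXT, salaire INTEGER)")
con.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "André", 2500), (2, "Marie", 3100)],
)

# ILS-like view: renames columns for the MDB layer and hides the salary.
con.execute(
    "CREATE VIEW ils_employees AS SELECT emp_no AS id, nom AS name FROM employees"
)

cur = con.execute("SELECT * FROM ils_employees ORDER BY id")
columns = [d[0] for d in cur.description]  # column names visible at the MDB layer
exported = cur.fetchall()
```

Queries at the MDB layer would see only `ils_employees`. The base table and its `salaire` column stay under local control.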

18

Multibase Architecture
Result of all this (W. Litwin et al., 1980s)

[Figure: multidatabase architecture. Labels: users, MDB queries, multibase ESs, external MDB layer, conceptual MDB layer (CS, DSs), ILS, internal layer, and the CS, PS and ESs of each local DB]

19

Federated Architecture (Heimbigner & McLeod, 1980s)

Every DB should be autonomous
In general, there will be no GS
– Global integration is against the autonomy

The DBs used in common should form a federation of autonomous DBs

Every DB in a federation should be provided with three schemas:
– ES: export schema
– IS: import schema
– PS: private schema: for all the private data, including that of the ES and of the IS

There should be some federation dictionary (FD)
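The relationship between the three schemas and the federation dictionary can be sketched as a small data structure. This is an illustrative reading of the slide, not Heimbigner and McLeod's design: the class names, the method names, and the rule that a relation can be imported only if its owner exports it are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ComponentDB:
    name: str
    private: set[str] = field(default_factory=set)   # PS: all local relations
    exported: set[str] = field(default_factory=set)  # ES: part of PS offered to others
    imported: set[tuple[str, str]] = field(default_factory=set)  # IS: (db, relation)

    def export(self, relation: str) -> None:
        # A DB decides by itself what it exports (autonomy).
        if relation not in self.private:
            raise ValueError(f"{relation} is not in the private schema of {self.name}")
        self.exported.add(relation)


class FederationDictionary:
    """FD: knows which DBs belong to the federation and what they export."""

    def __init__(self) -> None:
        self.members: dict[str, ComponentDB] = {}

    def join(self, db: ComponentDB) -> None:
        self.members[db.name] = db

    def import_relation(self, importer: str, source: str, relation: str) -> None:
        # Assumed rule: only exported relations may enter an import schema.
        if relation not in self.members[source].exported:
            raise PermissionError(f"{source} does not export {relation}")
        self.members[importer].imported.add((source, relation))


# Hypothetical federation of two DBs.
fd = FederationDictionary()
library = ComponentDB("library", private={"books", "loans"})
courses = ComponentDB("courses", private={"courses", "students"})
fd.join(library)
fd.join(courses)

library.export("books")
fd.import_relation("courses", "library", "books")
```

There is no global schema anywhere in the sketch: each DB sees only its own private data plus what it has chosen to import.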

20

Federated Architecture (Heimbigner & McLeod, 1980s)

[Fig 3. Federated databases architecture: each DB has its PS, ES and IS; the DBs share a federation dictionary (FD)]

21

Comparison

MDB architecture focuses on the concept of MDB language

Federated architecture focuses on the concept of autonomy
– No MDB language
– But there is the notion of autonomy also in the MDB architecture

MDB architecture is more decentralized
– No equivalent of FD
– Several DSs

Both architectures are popular

22

Comparison MDB <-> Fed.

MDB Arch.        Fed. Arch.
Multidatabase    Federation
Autonomy         Autonomy
MDB Lang.        (no MDB language)
¬ GS             ¬ GS
CS               PS
MDB ES           IS
DB ES            ES
ILS              ES
DSs              Fed. Dict.

23

DB Autonomy (local autonomy)

Capability to control local DB data by the DBA
– Naming
– Value types
– Data structures
– Physical structures
– Query execution
– Security
– Priority to local queries

24

Multidatabase Autonomy

Capability to control multiple DBs by a DBA

Same aspects as for the local autonomy
– Naming...

May create a conflict with a DB's autonomy

Priority to local autonomy

[Figure: databases B1, B2, B3]

25

Semantic Heterogeneity

Differences in representations of the same real properties
Names
– André vs. Andrew
Value types
– Representations
– Units of measure: cm/s vs. ft/h
– Precision: 1 g vs. 1 kg
Data structures
– One table in 2NF vs. several tables in 3NF
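Unit and precision differences affect how values from different DBs can be compared. The sketch below converts masses to a common unit (grams) and treats two values as equal when they agree at the coarser of their two precisions. That comparison rule, the `Mass` type and the sample values are assumptions chosen for illustration; other conventions are possible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mass:
    grams: int        # value converted to the common unit (grams)
    precision_g: int  # precision of the original representation, in grams


def rounded(m: Mass, precision_g: int) -> int:
    """Round m to a multiple of precision_g (half up), using integers only."""
    return (m.grams + precision_g // 2) // precision_g


def same(a: Mass, b: Mass) -> bool:
    """Assumed rule: equal if the values agree at the coarser precision."""
    p = max(a.precision_g, b.precision_g)
    return rounded(a, p) == rounded(b, p)


a = Mass(1400, 1)     # 1400 g, stored to the gram
b = Mass(1000, 1000)  # 1 kg, stored to the kilogram
c = Mass(600, 1)      # 600 g, stored to the gram

a_eq_b = same(a, b)   # compared at 1 kg: both round to 1
b_eq_c = same(b, c)   # compared at 1 kg: both round to 1
a_eq_c = same(a, c)   # compared at 1 g: 1400 vs 600
```

Here `a` matches `b` and `b` matches `c`, yet `a` does not match `c`. Under this rule equality is not transitive, so the result of chaining equijoins over such values can depend on the order in which they are evaluated.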

26

Solutions (partial)

More descriptive schemas
More descriptive protocols
Data dictionaries
Thesaurus
Automatic representation conversions
Automatic unit conversion
Implicit joins
Higher-level models and manipulation languages
– IDL (Krishnamurthy, Litwin, Kent, ACM-Sigmod 92)

27

Common Models

Extended relational
– EDA-SQL
– MSQL (research)
– ODBC Microsoft SQL
Object-relational
– SQL 3
CCS language for information retrieval DBs
Numerous gateways towards SQL
– IMS to SQL
– Codasyl to SQL

28

UniSQL/M

[Figure: UniSQL/M connected to Sybase, Oracle, UniSQL and IMS]

29

Other gateways

[Figure: UniSQL/M connected to Sybase, Oracle, UniSQL and IMS]

30

Yet other gateways

[Figure: UniSQL/M connected to Sybase, Oracle, UniSQL and IMS, with EDA-SQL added]

31

The future (ODBC, JDBC, UDA…)

ODBC x

32

Conclusion

Modern DBMSs are in general MDBMSs
– UniSQL/M, Oracle, Sybase, MsAccess, SQL Server, InterBase, DB2...

MDB access requires new functions for the management of
– autonomy
– semantic heterogeneity
– physical distribution of data

33

Conclusion

Technical solutions are based on:
– new reference architectures
  » multidatabase architecture
  » federated architecture
– common data models
  » relational and object-relational
– gateways, which are developing rapidly
  » Any DBMS towards any other using ODBC, JDBC…
  » Relational or XML wrappers

34

Conclusion

MDB languages
– MSQL and SQL-x; x > 2

New transaction models

Protocols and standards
– ODBC, JDBC…

All this is dealt with in more depth
– later on
– in the class books & several others

35

Exercises & Research Problems

All these are already in the text
– Explain the difference between the concepts of a DB, DDB, MDB and FDB.
– What does « reference architecture » mean, as ANSI-SPARC for example?
– What are the differences between the architectures: « top-down », « bottom-up », multibase and federated?
– Comment on the actual architecture of federated DBs in DB2 V. 6 and later (see DB2 Help or RedBooks at the IBM web site).
– Comment on the actual architecture of federated DBs in SQL Server 2000 (see SQL Server 2000 Help or the papers at the MS web site).
– Design SQL SELECT statements realizing the fragmentation of the Hotels DB. Can you then propose how to deal with the updates?
– Comment on the concepts of ILS, gateway and mediator.
– What multidatabase common model is most used today?
– Comment on the concept of local autonomy (what, why, how).
– Provide examples of various types of semantic heterogeneity.
– Prove that the usual associativity of equijoins does not hold if units of measure of the values to join may have different precisions. What are the consequences for relational DBMSs?
– Propose an extension to SQL for units of measure and the corresponding query optimization (Ph.D. thesis).

The End