Distributed DBMSs – Concepts and Design

Chapter 22 in Elmasry book. A copy is available from Miss Fatima Shameem, the RA.


Overview

• Concepts. What is a distributed DBMS? Distributed Processing. Homogeneous vs. Heterogeneous.

• Functions of a DDBMS. Components of a DDBMS. Advantages and Disadvantages.

• DDBMS Design: Fragmentation, Replication, Allocation.

• DDBMS Transparencies. Date’s 12 Rules for a DDBMS.

Concepts

Centralized DBMS: a system with a single logical database located at one site under the control of a single DBMS.

Distributed database: a logically interrelated collection of shared data physically distributed over a computer network.

Applications can be classified into:

• Local applications, which do not require data from other sites.

• Global applications, which require data from other sites.

Distributed DBMS

Distributed DBMS: the software system that:

• manages the distributed database;

• makes the distribution transparent to users;

• allows users to access data at their own site as well as at remote sites.

Transparent distribution is the fundamental principle of a DDBMS.

Characteristics of DDBMS

• A collection of logically related shared data.

• The data is split into a number of fragments.

• Fragments may be replicated.

• Fragments/replicas are allocated to sites.

• The sites are linked by a communications network.

• The data at each site is under the control of a DBMS.

• The DBMS at each site can handle local applications.

• Each DBMS participates in at least one global application.

Distributed DBMS Topology

[Figure: Site 1, Site 2, Site 3 and Site 4 connected by a computer network.]

Data itself is distributed and access to it can be local or remote.

Distributed Processing

[Figure: Site 1 to Site 4 connected by a computer network; the database resides at a single site.]

Data itself is centralized but access to it can be local or remote.

Homogeneous vs. Heterogeneous DDBMS

Homogeneous system: all sites use the same DBMS product.

Heterogeneous system: sites may run different DBMS products and data models.

Possible differences between data in different databases:

• Data type difference.

• Value difference.

• Semantic difference.
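The three kinds of difference can be illustrated with a small sketch that maps rows from two sites into one common form. The sites, their row layouts, and the assumption that site B stores a monthly salary are hypothetical; a heterogeneous DDBMS needs some mapping of this kind before it can treat both rows as the same Staff record.

```python
# Hypothetical rows for the same staff member as stored at two different sites.
SITE_A_ROW = {"staffNo": "SG37", "salary": "12000", "sex": "F"}      # salary held as text
SITE_B_ROW = {"staffNo": "SG37", "salary": 1000.0, "sex": "Female"}  # monthly salary, sex spelled out

# Value mapping assumed for site B's sex codes.
SEX_CODES = {"Male": "M", "Female": "F"}


def normalize_site_a(row):
    """Data type difference: site A stores salary as a string, so convert it to int."""
    return {"staffNo": row["staffNo"], "salary": int(row["salary"]), "sex": row["sex"]}


def normalize_site_b(row):
    """Semantic and value differences: site B is assumed to store a monthly salary
    and spelled-out sex values, so scale to an annual salary and map to one-letter codes."""
    return {
        "staffNo": row["staffNo"],
        "salary": int(row["salary"] * 12),
        "sex": SEX_CODES[row["sex"]],
    }


if __name__ == "__main__":
    print(normalize_site_a(SITE_A_ROW) == normalize_site_b(SITE_B_ROW))
```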

Functions of a DDBMS

• Provide access to remote sites and allow transfer of queries and data among the sites of the network.

• Store data distribution details.

• Distributed data processing.

• Security control.

• Concurrency control.

• Recovery services.

Components of a DDBMS

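[Figure: Site 1 and Site 3 connected by a computer network. Each site runs a DDBMS with a data communication component (DC) and a global system catalog (GSC); one site also has a local DBMS (LDBMS) managing a local database (DB).]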

Advantages of DDBMS

• Reflects organizational structure.

• Improved sharability and local autonomy.

• Improved availability.

• Improved reliability.

• Improved performance.

Disadvantages of DDBMS

• Complexity.

• Cost.

• Security.

• Integrity control more difficult.

• Lack of standards.

• Lack of experience.

• DB design more complex.

Distributed Relational DB Design

We have a group of tables and we want to distribute them between a group of sites.

The design consists of three major steps:

1. Fragmentation: divide a relation into a number of sub-relations (fragments), either horizontally or vertically.

2. Replication: make copies of a fragment.

3. Allocation: decide at which site each fragment and each replica is to be stored.
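The outcome of the three steps can be recorded as a simple allocation schema. The sketch below is a hypothetical illustration, not a real catalog format: the fragment names F1 to F3, the copy counts, and the site names are assumptions. It checks that each fragment is placed at as many distinct sites as it has copies, and lists what each site ends up storing.

```python
from collections import defaultdict

# Hypothetical outcome of the three design steps for a relation R.
FRAGMENTS = ["F1", "F2", "F3"]          # 1. fragmentation: R is split into three fragments
COPIES = {"F1": 1, "F2": 2, "F3": 1}    # 2. replication: F2 is stored twice
ALLOCATION = {                          # 3. allocation: the site chosen for each copy
    "F1": ["Site1"],
    "F2": ["Site1", "Site2"],
    "F3": ["Site3"],
}


def design_is_consistent(fragments, copies, allocation):
    """Each fragment is placed at as many distinct sites as it has copies."""
    for f in fragments:
        sites = allocation.get(f, [])
        if len(sites) != copies[f] or len(set(sites)) != len(sites):
            return False
    return True


def contents_by_site(allocation):
    """Invert the allocation to list the fragments (or replicas) each site stores."""
    by_site = defaultdict(list)
    for fragment, sites in allocation.items():
        for site in sites:
            by_site[site].append(fragment)
    return dict(by_site)


if __name__ == "__main__":
    print(design_is_consistent(FRAGMENTS, COPIES, ALLOCATION))
    print(contents_by_site(ALLOCATION))
```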

Distributed Relational DB Design

When we fragment, replicate and allocate, we try to achieve:

• Locality of reference.

• Improved reliability and availability.

• Good performance.

• Balanced storage capacities and costs.

• Minimal communication costs.

Rules of Fragmentation

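Completeness: nothing (no row or column) is lost when we fragment a table.

Reconstruction: the original table can be rebuilt from its fragments.

Disjointness: no row or column appears in two fragments. The one exception is vertical fragmentation, where the primary key is repeated in every fragment so that the table can be reconstructed with a join.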
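The three rules can be checked mechanically for horizontal fragments, where reconstruction is a union. This minimal sketch uses a cut-down PropertyForRent (propertyNo, type, rent) taken from the example table; the helper names are hypothetical. The first design splits by type and passes all three checks; the second, an overlapping split by type and by rent, loses some rows and repeats others.

```python
from collections import Counter

# A cut-down PropertyForRent relation (propertyNo, type, rent) from the example table.
PROPERTY_FOR_RENT = [
    {"propertyNo": "PA14", "type": "House", "rent": 650},
    {"propertyNo": "PG4", "type": "Flat", "rent": 350},
    {"propertyNo": "PG16", "type": "Flat", "rent": 450},
    {"propertyNo": "PG21", "type": "House", "rent": 600},
    {"propertyNo": "PG36", "type": "Flat", "rent": 375},
    {"propertyNo": "PL94", "type": "Flat", "rent": 400},
]


def as_tuples(relation):
    """Make rows hashable so relations can be compared as sets."""
    return {tuple(sorted(row.items())) for row in relation}


def is_complete(relation, fragments):
    """Completeness: every row of the relation is in at least one fragment."""
    stored = set().union(*(as_tuples(f) for f in fragments))
    return as_tuples(relation) <= stored


def reconstructs(relation, fragments):
    """Reconstruction (horizontal case): the union of the fragments equals the relation."""
    return set().union(*(as_tuples(f) for f in fragments)) == as_tuples(relation)


def is_disjoint(fragments):
    """Disjointness: no row appears in more than one fragment."""
    counts = Counter(row for f in fragments for row in as_tuples(f))
    return all(n == 1 for n in counts.values())


# A correct design (split by type) and a faulty one (overlapping conditions).
GOOD = [[r for r in PROPERTY_FOR_RENT if r["type"] == "House"],
        [r for r in PROPERTY_FOR_RENT if r["type"] == "Flat"]]
BAD = [[r for r in PROPERTY_FOR_RENT if r["type"] == "House"],
       [r for r in PROPERTY_FOR_RENT if r["rent"] >= 450]]

if __name__ == "__main__":
    for name, design in (("GOOD", GOOD), ("BAD", BAD)):
        print(name, is_complete(PROPERTY_FOR_RENT, design),
              reconstructs(PROPERTY_FOR_RENT, design), is_disjoint(design))
```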

Types of Fragmentation

Horizontal fragmentation

Vertical fragmentation

Mixed fragmentation

Original PropertyForRent Table

PropertyNo  Street       City      PostCode  Type   Rooms  Rent  OwnerNo  StaffNo  BranchNo
PA14        16 Holhead   Aberdeen  AB7 5SU   House  6      650   CO46     SA9      B007
PG4         6 Lawrence   Glasgow   G11 9QX   Flat   3      350   CO40     SG14     B003
PG16        5 Novar Dr   Glasgow   G12 9AX   Flat   4      450   CO93     SG14     B003
PG21        18 Dale Rd   Glasgow   G12       House  5      600   CO87     SG37     B003
PG36        2 Manor Rd   Glasgow   G32 4QX   Flat   3      375   CO93     SG37     B003
PL94        6 Argyll St  London    NW2       Flat   4      400   CO87     SL41     B005

Horizontal Fragmentation

Based on the type of property:

P1 = σ Type='House' (PropertyForRent)

P2 = σ Type='Flat' (PropertyForRent)

Fragment P1

PropertyNo  Street       City      PostCode  Type   Rooms  Rent  OwnerNo  StaffNo  BranchNo
PA14        16 Holhead   Aberdeen  AB7 5SU   House  6      650   CO46     SA9      B007
PG21        18 Dale Rd   Glasgow   G12       House  5      600   CO87     SG37     B003

Fragment P2

PropertyNo  Street       City      PostCode  Type   Rooms  Rent  OwnerNo  StaffNo  BranchNo
PL94        6 Argyll St  London    NW2       Flat   4      400   CO87     SL41     B005
PG4         6 Lawrence   Glasgow   G11 9QX   Flat   3      350   CO40     SG14     B003
PG36        2 Manor Rd   Glasgow   G32 4QX   Flat   3      375   CO93     SG37     B003
PG16        5 Novar Dr   Glasgow   G12 9AX   Flat   4      450   CO93     SG14     B003
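Horizontal fragmentation is a selection (σ) per fragment. In this Python sketch each fragment is defined by a predicate, and a hypothetical helper checks that every row satisfies exactly one predicate, which is what makes the fragments complete and disjoint. Rows come out in the order of the original table, so P2 is listed in a different order from the slide.

```python
# PropertyForRent rows from the original table (only the columns this sketch needs).
PROPERTY_FOR_RENT = [
    {"propertyNo": "PA14", "city": "Aberdeen", "type": "House"},
    {"propertyNo": "PG4", "city": "Glasgow", "type": "Flat"},
    {"propertyNo": "PG16", "city": "Glasgow", "type": "Flat"},
    {"propertyNo": "PG21", "city": "Glasgow", "type": "House"},
    {"propertyNo": "PG36", "city": "Glasgow", "type": "Flat"},
    {"propertyNo": "PL94", "city": "London", "type": "Flat"},
]

# Fragment name -> selection predicate (the σ condition that defines it).
FRAGMENT_PREDICATES = {
    "P1": lambda row: row["type"] == "House",
    "P2": lambda row: row["type"] == "Flat",
}


def select(relation, predicate):
    """σ: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def fragment(relation, predicates):
    """Horizontal fragmentation: one fragment per predicate."""
    return {name: select(relation, p) for name, p in predicates.items()}


def predicates_partition(relation, predicates):
    """True when every row satisfies exactly one predicate, so the fragments
    are complete and disjoint and their union rebuilds the relation."""
    return all(sum(p(row) for p in predicates.values()) == 1 for row in relation)


if __name__ == "__main__":
    for name, rows in fragment(PROPERTY_FOR_RENT, FRAGMENT_PREDICATES).items():
        print(name, [r["propertyNo"] for r in rows])
```

A predicate set such as "city is Glasgow" alone would fail the check, because the Aberdeen and London rows would belong to no fragment.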

Original Staff Table

StaffNo  FName  LName  Position    Sex  DOB        Salary  BranchNo
SL21     John   White  Manager     M    1 Oct 93   30000   B005
SG37     Ann    Beech  Assistant   F    10 Nov 60  12000   B003
SG14     David  Ford   Supervisor  M    24 Mar 58  18000   B003
SG5      Susan  Brand  Assistant   F    3 Jun 40   24000   B007

Vertical Fragmentation

S1 = Π staffNo, position, sex, DOB, salary (Staff)

S2 = Π staffNo, fName, lName, branchNo (Staff)

Fragment S1

StaffNo  Position    Sex  DOB        Salary
SL21     Manager     M    1 Oct 93   30000
SG37     Assistant   F    10 Nov 60  12000
SG14     Supervisor  M    24 Mar 58  18000
SG5      Assistant   F    3 Jun 40   24000

Fragment S2

StaffNo  FName  LName  BranchNo
SL21     John   White  B005
SG37     Ann    Beech  B003
SG14     David  Ford   B003
SG5      Susan  Brand  B007
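Vertical fragmentation is a projection (Π), and reconstruction is a join on the primary key that both fragments keep. This Python sketch uses the Staff rows above; the helper names are hypothetical.

```python
STAFF_COLUMNS = ["staffNo", "fName", "lName", "position", "sex", "DOB", "salary", "branchNo"]
STAFF = [
    dict(zip(STAFF_COLUMNS, values))
    for values in [
        ("SL21", "John", "White", "Manager", "M", "1 Oct 93", 30000, "B005"),
        ("SG37", "Ann", "Beech", "Assistant", "F", "10 Nov 60", 12000, "B003"),
        ("SG14", "David", "Ford", "Supervisor", "M", "24 Mar 58", 18000, "B003"),
        ("SG5", "Susan", "Brand", "Assistant", "F", "3 Jun 40", 24000, "B007"),
    ]
]


def project(relation, columns):
    """Π: keep only the given columns, dropping duplicate rows."""
    result = []
    for row in relation:
        projected = {c: row[c] for c in columns}
        if projected not in result:
            result.append(projected)
    return result


def join_on_key(left, right, key):
    """Join two vertical fragments on the primary key they both contain."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left if row[key] in right_by_key]


# staffNo is kept in both fragments: the permitted exception to disjointness.
S1 = project(STAFF, ["staffNo", "position", "sex", "DOB", "salary"])
S2 = project(STAFF, ["staffNo", "fName", "lName", "branchNo"])

if __name__ == "__main__":
    print(join_on_key(S1, S2, "staffNo") == STAFF)
```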

Mixed Fragmentation – Vertical then Horizontal

S1 = Π staffNo, position, sex, DOB, salary (Staff)

S2 = Π staffNo, fName, lName, branchNo (Staff)

S2.1 = σ branchNo='B005' (S2)

S2.2 = σ branchNo='B003' (S2)

S2.3 = σ branchNo='B007' (S2)

Fragment S1

StaffNo  Position    Sex  DOB        Salary
SL21     Manager     M    1 Oct 93   30000
SG37     Assistant   F    10 Nov 60  12000
SG14     Supervisor  M    24 Mar 58  18000
SG5      Assistant   F    3 Jun 40   24000

Fragment S2.1

StaffNo  FName  LName  BranchNo
SL21     John   White  B005

Fragment S2.2

StaffNo  FName  LName  BranchNo
SG37     Ann    Beech  B003
SG14     David  Ford   B003

Fragment S2.3

StaffNo  FName  LName  BranchNo
SG5      Susan  Brand  B007
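The mixed design can be sketched by applying a projection and then selections, and rebuilt in the reverse order: a union of the horizontal pieces, then a join on staffNo. The helper names are hypothetical; the data is the Staff table above.

```python
STAFF_COLUMNS = ["staffNo", "fName", "lName", "position", "sex", "DOB", "salary", "branchNo"]
STAFF = [
    dict(zip(STAFF_COLUMNS, values))
    for values in [
        ("SL21", "John", "White", "Manager", "M", "1 Oct 93", 30000, "B005"),
        ("SG37", "Ann", "Beech", "Assistant", "F", "10 Nov 60", 12000, "B003"),
        ("SG14", "David", "Ford", "Supervisor", "M", "24 Mar 58", 18000, "B003"),
        ("SG5", "Susan", "Brand", "Assistant", "F", "3 Jun 40", 24000, "B007"),
    ]
]


def project(relation, columns):
    """Π: keep only the given columns (the key is included, so no duplicates arise)."""
    return [{c: row[c] for c in columns} for row in relation]


def select(relation, predicate):
    """σ: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


# Step 1: vertical split.  Step 2: horizontal split of S2 by branch.
S1 = project(STAFF, ["staffNo", "position", "sex", "DOB", "salary"])
S2 = project(STAFF, ["staffNo", "fName", "lName", "branchNo"])
S2_1 = select(S2, lambda r: r["branchNo"] == "B005")
S2_2 = select(S2, lambda r: r["branchNo"] == "B003")
S2_3 = select(S2, lambda r: r["branchNo"] == "B007")


def reconstruct(s1, horizontal_fragments):
    """Union the horizontal fragments (they are disjoint), then join with S1 on staffNo."""
    s2 = [row for frag in horizontal_fragments for row in frag]
    s2_by_key = {row["staffNo"]: row for row in s2}
    return [{**row, **s2_by_key[row["staffNo"]]} for row in s1]


if __name__ == "__main__":
    print([r["staffNo"] for r in S2_2])
    print(reconstruct(S1, [S2_1, S2_2, S2_3]) == STAFF)
```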

Derived Horizontal Fragmentation

Derived horizontal fragmentation is the horizontal fragmentation of a table (the child), T1, that follows from the horizontal fragmentation of a related table (the parent), T2.

It is not specified explicitly in the design; it is implied by the fragmentation of T2.

T1 (the child) has a foreign key that references T2 (the parent).

The relationship between T1 and T2 is either 1-to-1 or many-to-1.

The child's fragments are obtained with the semi-join operation: T1i = T1 ⋉f T2i, where T2i is a fragment of the parent and f is the join condition on the foreign key.

Derived Horizontal Fragmentation

Suppose the design requires the Staff table to be fragmented horizontally:

S1 = σ branchNo='B003' (Staff)

S2 = σ branchNo='B005' (Staff)

S3 = σ branchNo='B007' (Staff)

StaffNo  FName  LName  Position    Sex  DOB        Salary  BranchNo
SL21     John   White  Manager     M    1 Oct 93   30000   B005
SG37     Ann    Beech  Assistant   F    10 Nov 60  12000   B003
SG14     David  Ford   Supervisor  M    24 Mar 58  18000   B003
SG5      Susan  Brand  Assistant   F    3 Jun 40   24000   B007

Derived Horizontal Fragmentation

Fragment S1

StaffNo  FName  LName  Position    Sex  DOB        Salary  BranchNo
SG37     Ann    Beech  Assistant   F    10 Nov 60  12000   B003
SG14     David  Ford   Supervisor  M    24 Mar 58  18000   B003

Fragment S2

StaffNo  FName  LName  Position    Sex  DOB        Salary  BranchNo
SL21     John   White  Manager     M    1 Oct 93   30000   B005

Fragment S3

StaffNo  FName  LName  Position    Sex  DOB        Salary  BranchNo
SG5      Susan  Brand  Assistant   F    3 Jun 40   24000   B007

Derived Horizontal Fragmentation

After fragmenting Staff, we find that a related table, PropertyForRent, refers to it: each staff member handles many (N) properties, and each property is handled by one (1) staff member.

[Figure: Staff (1) handles (N) PropertyForRent.]

Because Staff is now fragmented, it makes sense to fragment PropertyForRent too.

S1 = σ branchNo='B003' (Staff)

S2 = σ branchNo='B005' (Staff)

S3 = σ branchNo='B007' (Staff)

Pi = PropertyForRent ⋉staffNo Si, for i = 1, 2, 3

Original PropertyForRent Table

PropertyNo  Street       City      PostCode  Type   Rooms  Rent  OwnerNo  StaffNo  BranchNo
PA14        16 Holhead   Aberdeen  AB7 5SU   House  6      650   CO46     SA9      B007
PG4         6 Lawrence   Glasgow   G11 9QX   Flat   3      350   CO40     SG14     B003
PG16        5 Novar Dr   Glasgow   G12 9AX   Flat   4      450   CO93     SG14     B003
PG21        18 Dale Rd   Glasgow   G12       House  5      600   CO87     SG37     B003
PG36        2 Manor Rd   Glasgow   G32 4QX   Flat   3      375   CO93     SG37     B003
PL94        6 Argyll St  London    NW2       Flat   4      400   CO87     SL41     B005

Derived Horizontal Fragmentation

Fragment P1

PropertyNo  Street       City      PostCode  Type   Rooms  Rent  OwnerNo  StaffNo  BranchNo
PG4         6 Lawrence   Glasgow   G11 9QX   Flat   3      350   CO40     SG14     B003
PG21        18 Dale Rd   Glasgow   G12       House  5      600   CO87     SG37     B003
PG36        2 Manor Rd   Glasgow   G32 4QX   Flat   3      375   CO93     SG37     B003
PG16        5 Novar Dr   Glasgow   G12 9AX   Flat   4      450   CO93     SG14     B003

Fragment P2

PropertyNo  Street       City      PostCode  Type   Rooms  Rent  OwnerNo  StaffNo  BranchNo
PL94        6 Argyll St  London    NW2       Flat   4      400   CO87     SL41     B005

Fragment P3

PropertyNo  Street       City      PostCode  Type   Rooms  Rent  OwnerNo  StaffNo  BranchNo
PA14        16 Holhead   Aberdeen  AB7 5SU   House  6      650   CO46     SA9      B007
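The semi-join can be sketched in Python to derive the PropertyForRent fragments from the Staff fragments. Only the attributes needed are kept, the function names are hypothetical, and the join attribute is a parameter. With the four-row Staff sample, a semi-join on staffNo places PG4, PG16, PG21 and PG36 in P1 but leaves P2 and P3 empty, because PL94 (handled by SL41) and PA14 (handled by SA9) refer to staff members who are not in the sample; the P2 and P3 shown above correspond to matching on BranchNo instead.

```python
# Only the attributes this sketch needs, with values from the tables above.
STAFF = [
    {"staffNo": "SL21", "branchNo": "B005"},
    {"staffNo": "SG37", "branchNo": "B003"},
    {"staffNo": "SG14", "branchNo": "B003"},
    {"staffNo": "SG5", "branchNo": "B007"},
]
PROPERTY_FOR_RENT = [
    {"propertyNo": "PA14", "staffNo": "SA9", "branchNo": "B007"},
    {"propertyNo": "PG4", "staffNo": "SG14", "branchNo": "B003"},
    {"propertyNo": "PG16", "staffNo": "SG14", "branchNo": "B003"},
    {"propertyNo": "PG21", "staffNo": "SG37", "branchNo": "B003"},
    {"propertyNo": "PG36", "staffNo": "SG37", "branchNo": "B003"},
    {"propertyNo": "PL94", "staffNo": "SL41", "branchNo": "B005"},
]


def select(relation, predicate):
    """σ: rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def semi_join(relation, other, attribute):
    """⋉: rows of `relation` whose `attribute` value also occurs in `other`."""
    values = {row[attribute] for row in other}
    return [row for row in relation if row[attribute] in values]


# Primary horizontal fragmentation of the parent (Staff) by branch.
STAFF_FRAGMENTS = {
    "1": select(STAFF, lambda r: r["branchNo"] == "B003"),
    "2": select(STAFF, lambda r: r["branchNo"] == "B005"),
    "3": select(STAFF, lambda r: r["branchNo"] == "B007"),
}


def derived_fragments(attribute):
    """Pi = PropertyForRent ⋉ Si on the given attribute, as lists of property numbers."""
    return {
        "P" + i: [row["propertyNo"] for row in semi_join(PROPERTY_FOR_RENT, s_i, attribute)]
        for i, s_i in STAFF_FRAGMENTS.items()
    }


if __name__ == "__main__":
    print(derived_fragments("staffNo"))
    print(derived_fragments("branchNo"))
```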

Transparencies in a DDBMS

Four main transparencies:

1. Distribution Transparency:

a. Fragmentation. b. Location. c. Replication. d. Local Mapping. e. Naming.

2. Transaction Transparency.

3. Performance Transparency.

4. DBMS Transparency.

1. Distribution Transparency

Allows the user to perceive the DB as a single, logical entity. Types:

a. Fragmentation: the user does not need to know the data is fragmented.

b. Location: the user does not need to know the location of fragments.

c. Replication: the user does not need to know the fragments are replicated.

d. Local Mapping: the user specifies the fragment and its location.

e. Naming: the DDBMS makes sure every item name is unique.

Consider the following distribution of the Staff relation:

S1 = Π staffNo, position, sex, DOB, salary (Staff)

S2 = Π staffNo, fName, lName, branchNo (Staff)

S21 = σ branchNo='B003' (S2)

S22 = σ branchNo='B005' (S2)

S23 = σ branchNo='B007' (S2)

a. Fragmentation Transparency

The highest level of distribution transparency. The user does not need to know that the data is fragmented and treats the DDB like a centralized database. Database accesses are based on the global schema. The fragmentation of the data can be changed without impacting the user.

Example:

SELECT Fname, Lname FROM Staff WHERE position = 'Manager';

b. Location Transparency

The middle level of distribution transparency.

The user must know that the data is fragmented but still does not need to know the location of the data.

Data location can be changed without impact on the user.

Example:

SELECT Fname, Lname FROM S21
WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position = 'Manager')
UNION
SELECT Fname, Lname FROM S22
WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position = 'Manager')
UNION
SELECT Fname, Lname FROM S23
WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position = 'Manager');
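To see that the fragmentation-transparent query and the location-transparent query ask for the same data, both can be evaluated over in-memory copies of the fragments. This is a hypothetical Python sketch using the Staff rows and the fragment definitions S1, S21, S22 and S23 given earlier; it is not how a DDBMS would execute either query.

```python
STAFF_COLUMNS = ["staffNo", "fName", "lName", "position", "sex", "DOB", "salary", "branchNo"]
STAFF = [
    dict(zip(STAFF_COLUMNS, values))
    for values in [
        ("SL21", "John", "White", "Manager", "M", "1 Oct 93", 30000, "B005"),
        ("SG37", "Ann", "Beech", "Assistant", "F", "10 Nov 60", 12000, "B003"),
        ("SG14", "David", "Ford", "Supervisor", "M", "24 Mar 58", 18000, "B003"),
        ("SG5", "Susan", "Brand", "Assistant", "F", "3 Jun 40", 24000, "B007"),
    ]
]


def project(relation, columns):
    """Π: keep only the given columns (the key is included, so no duplicates arise)."""
    return [{c: row[c] for c in columns} for row in relation]


def select(relation, predicate):
    """σ: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


S1 = project(STAFF, ["staffNo", "position", "sex", "DOB", "salary"])
S2 = project(STAFF, ["staffNo", "fName", "lName", "branchNo"])
S21 = select(S2, lambda r: r["branchNo"] == "B003")
S22 = select(S2, lambda r: r["branchNo"] == "B005")
S23 = select(S2, lambda r: r["branchNo"] == "B007")


def query_with_fragmentation_transparency():
    """SELECT Fname, Lname FROM Staff WHERE position = 'Manager' (global schema)."""
    return {(r["fName"], r["lName"]) for r in STAFF if r["position"] == "Manager"}


def query_with_location_transparency():
    """The UNION query: it names the fragments, but never their sites."""
    managers = {r["staffNo"] for r in S1 if r["position"] == "Manager"}
    result = set()
    for fragment in (S21, S22, S23):
        result |= {(r["fName"], r["lName"]) for r in fragment if r["staffNo"] in managers}
    return result


if __name__ == "__main__":
    print(query_with_fragmentation_transparency() == query_with_location_transparency())
```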

c. Replication Transparency

The user is unaware of the replication of fragments and of the location of the data, but knows that the data is fragmented.

It is at the same level as location transparency.

d. Local Mapping Transparency

The lowest level of distribution transparency.

The user knows that the data is fragmented and the location of the data.

Example:

SELECT Fname, Lname FROM S21 AT SITE 3
WHERE staffNo IN
(SELECT staffNo FROM S1 AT SITE 5 WHERE position = 'Manager')
UNION
SELECT Fname, Lname FROM S22 AT SITE 5
WHERE staffNo IN
(SELECT staffNo FROM S1 AT SITE 5 WHERE position = 'Manager')
UNION
SELECT Fname, Lname FROM S23 AT SITE 7
WHERE staffNo IN
(SELECT staffNo FROM S1 AT SITE 5 WHERE position = 'Manager');

e. Naming Transparency

Each item in a distributed database must have a unique name.

The DDBMS must ensure that no two sites create a database object with the same name.

Solutions:

• Create a central name server. Drawbacks: it can become a bottleneck, and it works against local autonomy.

• Prefix each object with the identifier of the site that created it. Drawback: loss of distribution transparency.
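Both naming solutions can be sketched in a few lines. The class and function are hypothetical illustrations: the central server rejects a duplicate name, but every site must go through that one object; the site prefix needs no coordination, but the name now reveals where the object lives, and moving the object to another site would change its name.

```python
class CentralNameServer:
    """Hypothetical central registry that every site asks before creating a name."""

    def __init__(self):
        self.names = set()

    def register(self, name):
        # All sites funnel through this single object, which is why it can become a bottleneck.
        if name in self.names:
            raise ValueError(f"name already in use: {name}")
        self.names.add(name)
        return name


def site_prefixed_name(site_id, local_name):
    """Hypothetical alternative: make a name unique by prefixing the creating site's identifier."""
    return f"{site_id}.{local_name}"


if __name__ == "__main__":
    server = CentralNameServer()
    server.register("Staff")
    try:
        server.register("Staff")  # a second site tries to reuse the name
    except ValueError as err:
        print(err)
    print(site_prefixed_name("Site1", "Staff"), site_prefixed_name("Site3", "Staff"))
```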

2. Transaction Transparency

All transactions must ensure the consistency and integrity of the DDB.

Each transaction that needs to access data at multiple sites is divided into multiple sub-transactions, one for each site it accesses.

Even though the transaction is split, atomicity must be maintained: either all of its sub-transactions commit or none of them do.
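Maintaining atomicity across sub-transactions is commonly done with an atomic commitment protocol such as two-phase commit, which these slides do not describe. The sketch below is a simplified, in-memory version of that idea: the Participant class and run_global_transaction function are hypothetical, and failures, logging and timeouts are ignored.

```python
class Participant:
    """Hypothetical site that runs one sub-transaction of a global transaction."""

    def __init__(self, site, can_commit):
        self.site = site
        self.can_commit = can_commit  # whether the local sub-transaction succeeded
        self.state = "active"

    def prepare(self):
        # Phase 1: vote on whether this sub-transaction can commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def run_global_transaction(participants):
    """Commit every sub-transaction or none of them (simplified two-phase commit)."""
    votes = [p.prepare() for p in participants]  # a list, so every site is asked to vote
    if all(votes):
        for p in participants:  # Phase 2: global commit
            p.commit()
        return "committed"
    for p in participants:  # Phase 2: global abort
        p.abort()
    return "aborted"


if __name__ == "__main__":
    sites = [Participant("Site1", True), Participant("Site2", False)]
    print(run_global_transaction(sites), [p.state for p in sites])
```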

3. Performance Transparency

The DDBMS should perform as if it were a centralized DBMS.

Performance should not suffer because the system is distributed (for example, because of network communication cost).

When a site issues a query, the system must work out the fastest way of executing it.

The Distributed Query Processor (DQP) must determine:

• which fragment to access;

• which copy of a fragment to access (if replication is used);

• where the fragments are located.
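Choosing which copy to read can be sketched as picking, for each fragment, the replica at the cheapest site. The catalog, the replication of S21, and the per-site shipping costs are all hypothetical; a real DQP would also weigh data volume and local processing.

```python
# Hypothetical allocation catalog: the sites holding a copy of each fragment.
# S21 is assumed to be replicated at two sites for this sketch.
COPIES = {
    "S1": ["Site5"],
    "S21": ["Site3", "Site5"],
    "S22": ["Site5"],
    "S23": ["Site7"],
}

# Hypothetical cost of shipping data from each site to the site issuing the query.
SHIPPING_COST = {"Site3": 4, "Site5": 1, "Site7": 9}


def plan_fragment_access(needed_fragments, copies, cost):
    """For each fragment the query needs, choose the copy whose site is cheapest to read from."""
    return {f: min(copies[f], key=lambda site: cost[site]) for f in needed_fragments}


if __name__ == "__main__":
    print(plan_fragment_access(["S1", "S21", "S23"], COPIES, SHIPPING_COST))
```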

3. Performance Transparency

Consider the following distributed DB:

• Property(PropertyNo, city): 10,000 records, stored in London.

• Client(ClientNo, maxPrice): 100,000 records, stored in Glasgow.

• Viewing(PropertyNo, ClientNo): 1,000,000 records, stored in London.

The London site wants to list the properties in Aberdeen that have been viewed by clients whose maximum price limit is greater than 200,000.

SELECT p.propertyNo
FROM Property p INNER JOIN
(Client c INNER JOIN Viewing v ON c.clientNo = v.clientNo)
ON p.propertyNo = v.propertyNo
WHERE p.city = 'Aberdeen' AND
c.maxPrice > 200000;

3. Performance Transparency

After the query is issued, the DDBMS must determine the most cost-effective strategy for executing it.

Strategies:

1. Move the Client table to London and process the query there.

2. Move the Property and Viewing relations to Glasgow, process the query there, then return the result to London.

3. Join Property and Viewing at London, project only the property number and client number, and move the result to Glasgow to join with the clients whose maxPrice > 200,000; then return the result.

4. Select the clients at Glasgow with maxPrice > 200,000, move them to London, and join them there with Viewing and the Aberdeen properties.
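A rough way to compare the four strategies is to count the records each one ships across the network. This sketch uses the cardinalities given above; the number of Aberdeen viewings, the number of clients with maxPrice > 200,000, and the result size are not given, so they are hypothetical parameters. It also assumes strategy 3 applies the city = 'Aberdeen' selection before shipping, and it ignores tuple width and local processing cost.

```python
# Cardinalities from the example; the comments show where each relation is stored.
PROPERTY_RECORDS = 10_000     # London
CLIENT_RECORDS = 100_000      # Glasgow
VIEWING_RECORDS = 1_000_000   # London


def records_shipped(aberdeen_viewings, rich_clients, result_size):
    """Records each strategy sends over the network for a query issued at London.

    All three arguments are hypothetical estimates:
      aberdeen_viewings: (propertyNo, clientNo) pairs from joining Aberdeen properties with Viewing
      rich_clients:      clients with maxPrice > 200,000
      result_size:       rows in the final answer
    """
    return {
        1: CLIENT_RECORDS,                                    # Client to London
        2: PROPERTY_RECORDS + VIEWING_RECORDS + result_size,  # Property, Viewing to Glasgow; result back
        3: aberdeen_viewings + result_size,                   # projected join to Glasgow; result back
        4: rich_clients,                                      # selected clients to London
    }


def cheapest_strategy(costs):
    """Strategy number with the fewest records shipped."""
    return min(costs, key=costs.get)


if __name__ == "__main__":
    costs = records_shipped(aberdeen_viewings=20_000, rich_clients=50_000, result_size=100)
    print(costs, "cheapest:", cheapest_strategy(costs))
```

Changing the hypothetical estimates changes the winner: with fewer qualifying clients, strategy 4 can become the cheapest, which is why the DQP needs statistics about the data.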

4. DBMS Transparency

Hides the fact that different sites may run different local DBMSs.

This applies to heterogeneous DDBMSs.

Date’s 12 Rules for a DDBMS

1. Local autonomy.

2. No reliance on a central site.

3. Continuous operation.

4. Location independence.

5. Fragmentation independence.

6. Replication independence.

7. Distributed query processing.

8. Distributed transaction processing.

9. Hardware independence.

10. Operating system independence.

11. Network independence.

12. Database independence.