Distributed DBMSs – Concepts and Design
Chapter 22 in the Elmasry book. A copy is available from Miss Fatima Shameem, the RA.
Overview
• Concepts. What is a distributed DBMS? Distributed processing. Homogeneous vs. heterogeneous.
• Functions of a DDBMS. Components of a DDBMS. Advantages and disadvantages. DDBMS design.
• Fragmentation. Replication. Allocation.
• DDBMS transparencies. Date’s 12 rules for a DDBMS.
Concepts
Centralized DBMS: a system with a single logical database located at one site under the control of a single DBMS.
Distributed DB: a logically interrelated collection of shared data physically distributed over a computer network.
Applications can be classified into:
• Local applications, which do not require data from other sites.
• Global applications, which do require data from other sites.
Distributed DBMS
Distributed DBMS: the software system that:
• manages the distributed DB;
• makes the distribution transparent to users;
• allows users to access data at their own site as well as at remote sites.
Transparent distribution is the fundamental principle of a DDBMS.
Characteristics of DDBMS
• A collection of logically related shared data.
• The data is split into a number of fragments.
• Fragments may be replicated.
• Fragments/replicas are allocated to sites.
• The sites are linked by a communications network.
• The data at each site is under the control of a DBMS.
• The DBMS at each site can handle local applications.
• Each DBMS participates in at least one global application.
Distributed DBMS Topology
[Figure: Sites 1 to 4 connected by a computer network.]
Data itself is distributed, and access to it can be local or remote.
Distributed Processing
[Figure: Sites 1 to 4 connected by a computer network.]
Data itself is centralized, but access to it can be local or remote.
Homogeneous vs. Heterogeneous DDBMS
Homogeneous system: all sites use the same DBMS product.
Heterogeneous system: sites may run different DBMS products and use different data models.
Possible differences between data in different DBs:
• Data type differences.
• Value differences.
• Semantic differences.
Functions of a DDBMS
• Provide access to remote sites and allow the transfer of queries and data among the network’s sites.
• Store data distribution details.
• Distributed data processing.
• Security control.
• Concurrency control.
• Recovery services.
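Components of a DDBMS
[Figure: Site 1 and Site 3 connected by a computer network. Each site runs a DDBMS, a data communication component (DC) and a global system catalog (GSC); Site 1 also has a local DBMS (LDBMS) managing its database (DB).]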
Advantages of DDBMS
• Reflects organizational structure.
• Improved sharability and local autonomy.
• Improved availability.
• Improved reliability.
• Improved performance.
Disadvantages of DDBMS
• Complexity.
• Cost.
• Security.
• Integrity control more difficult.
• Lack of standards.
• Lack of experience.
• DB design more complex.
Distributed Relational DB Design
We have a group of tables and want to distribute them among a group of sites.
The design consists of three major steps:
1. Fragmentation: divide a relation into a number of sub-relations (fragments), horizontally or vertically.
2. Replication: make copies of a fragment.
3. Allocation: decide at which site each fragment and replica is stored.
Distributed Relational DB Design
When we fragment, replicate and allocate, we try to achieve:
• Locality of reference.
• Improved reliability and availability.
• Good performance.
• Balanced storage capacities and costs.
• Minimal communication costs.
Rules of Fragmentation
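Completeness: nothing (no row or column) gets lost when we fragment.
Reconstruction: we can get back the original table from its fragments (by union for horizontal fragments, by join for vertical fragments).
Disjointness: no row or column appears in two fragments. The one exception is vertical fragmentation, where the primary key is repeated in every fragment so that the table can be reconstructed.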
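The three rules can be checked mechanically. The sketch below is a minimal illustration, not how any particular DDBMS does it: it models a cut-down PropertyForRent table as (propertyNo, type) tuples and checks a horizontal fragmentation by type. The helper names (`fragment`, `is_complete`, and so on) are hypothetical. For fragments built by selection, completeness and reconstruction reduce to the same test (the union must equal the table), but they are kept separate here to mirror the rules; vertical fragments would need a join-based reconstruction check instead.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[str, str]  # (propertyNo, type): a cut-down PropertyForRent row

PROPERTIES: List[Row] = [
    ("PA14", "House"), ("PG4", "Flat"), ("PG16", "Flat"),
    ("PG21", "House"), ("PG36", "Flat"), ("PL94", "Flat"),
]

def fragment(rows: List[Row], predicates: Dict[str, Callable[[Row], bool]]) -> Dict[str, List[Row]]:
    """Horizontal fragmentation: one selection predicate per fragment."""
    return {name: [r for r in rows if pred(r)] for name, pred in predicates.items()}

def union_of(fragments: Dict[str, List[Row]]) -> set:
    """Set union of all fragments (how horizontal fragments are recombined)."""
    result: set = set()
    for rows in fragments.values():
        result |= set(rows)
    return result

def is_complete(rows: List[Row], fragments: Dict[str, List[Row]]) -> bool:
    # Completeness: every row of the table appears in some fragment.
    return set(rows) <= union_of(fragments)

def is_reconstructible(rows: List[Row], fragments: Dict[str, List[Row]]) -> bool:
    # Reconstruction: recombining the fragments yields exactly the original table.
    return union_of(fragments) == set(rows)

def is_disjoint(fragments: Dict[str, List[Row]]) -> bool:
    # Disjointness: no row belongs to two different fragments.
    sets = [set(f) for f in fragments.values()]
    return all(not (a & b) for i, a in enumerate(sets) for b in sets[i + 1:])

frags = fragment(PROPERTIES, {
    "P1": lambda r: r[1] == "House",
    "P2": lambda r: r[1] == "Flat",
})
print(is_complete(PROPERTIES, frags), is_reconstructible(PROPERTIES, frags), is_disjoint(frags))
# True True True
```

Dropping the P2 predicate makes `is_complete` return False, and a second predicate that also matches houses makes `is_disjoint` return False.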
Types of Fragmentation
Horizontal fragmentation
Vertical fragmentation
Mixed fragmentation
Original PropertyForRent Table
PropertyNo | Street | City | PostCode | Type | Rooms | Rent | OwnerNo | StaffNo | BranchNo
PA14 | 16 Holhead | Aberdeen | AB7 5SU | House | 6 | 650 | CO46 | SA9 | B007
PG4 | 6 Lawrence | Glasgow | G11 9QX | Flat | 3 | 350 | CO40 | SG14 | B003
PG16 | 5 Novar Dr | Glasgow | G12 9AX | Flat | 4 | 450 | CO93 | SG14 | B003
PG21 | 18 Dale Rd | Glasgow | G12 | House | 5 | 600 | CO87 | SG37 | B003
PG36 | 2 Manor Rd | Glasgow | G32 4QX | Flat | 3 | 375 | CO93 | SG37 | B003
PL94 | 6 Argyll St | London | NW2 | Flat | 4 | 400 | CO87 | SL41 | B005
Horizontal Fragmentation
Fragmentation based on the type of property:
P1 = σ type='House' (PropertyForRent)
P2 = σ type='Flat' (PropertyForRent)
Fragment P1
PropertyNo | Street | City | PostCode | Type | Rooms | Rent | OwnerNo | StaffNo | BranchNo
PA14 | 16 Holhead | Aberdeen | AB7 5SU | House | 6 | 650 | CO46 | SA9 | B007
PG21 | 18 Dale Rd | Glasgow | G12 | House | 5 | 600 | CO87 | SG37 | B003
Fragment P2
PropertyNo | Street | City | PostCode | Type | Rooms | Rent | OwnerNo | StaffNo | BranchNo
PL94 | 6 Argyll St | London | NW2 | Flat | 4 | 400 | CO87 | SL41 | B005
PG4 | 6 Lawrence | Glasgow | G11 9QX | Flat | 3 | 350 | CO40 | SG14 | B003
PG36 | 2 Manor Rd | Glasgow | G32 4QX | Flat | 3 | 375 | CO93 | SG37 | B003
PG16 | 5 Novar Dr | Glasgow | G12 9AX | Flat | 4 | 450 | CO93 | SG14 | B003
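Horizontal fragmentation is a selection, and reconstruction is a union. A minimal sketch in SQL driven from Python's built-in `sqlite3` module, assuming a single in-memory database stands in for all the sites (in a real DDBMS, P1 and P2 would live at different sites):

```python
import sqlite3

# One in-memory SQLite database stands in for every site (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE PropertyForRent (
    propertyNo TEXT PRIMARY KEY, street TEXT, city TEXT, postcode TEXT,
    type TEXT, rooms INTEGER, rent INTEGER,
    ownerNo TEXT, staffNo TEXT, branchNo TEXT)""")
conn.executemany(
    "INSERT INTO PropertyForRent VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("PA14", "16 Holhead", "Aberdeen", "AB7 5SU", "House", 6, 650, "CO46", "SA9", "B007"),
        ("PG4", "6 Lawrence", "Glasgow", "G11 9QX", "Flat", 3, 350, "CO40", "SG14", "B003"),
        ("PG16", "5 Novar Dr", "Glasgow", "G12 9AX", "Flat", 4, 450, "CO93", "SG14", "B003"),
        ("PG21", "18 Dale Rd", "Glasgow", "G12", "House", 5, 600, "CO87", "SG37", "B003"),
        ("PG36", "2 Manor Rd", "Glasgow", "G32 4QX", "Flat", 3, 375, "CO93", "SG37", "B003"),
        ("PL94", "6 Argyll St", "London", "NW2", "Flat", 4, 400, "CO87", "SL41", "B005"),
    ],
)

# Horizontal fragmentation: each fragment is a selection on the type column.
conn.execute("CREATE TABLE P1 AS SELECT * FROM PropertyForRent WHERE type = 'House'")
conn.execute("CREATE TABLE P2 AS SELECT * FROM PropertyForRent WHERE type = 'Flat'")

# Reconstruction: the union of the fragments gives back the original table.
rebuilt = sorted(conn.execute("SELECT * FROM P1 UNION SELECT * FROM P2").fetchall())
original = sorted(conn.execute("SELECT * FROM PropertyForRent").fetchall())
print(rebuilt == original)  # True
```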
Original Staff Table
StaffNo | FName | LName | Position | Sex | DOB | Salary | BranchNo
SL21 | John | White | Manager | M | 1 Oct 93 | 30000 | B005
SG37 | Ann | Beech | Assistant | F | 10 Nov 60 | 12000 | B003
SG14 | David | Ford | Supervisor | M | 24 Mar 58 | 18000 | B003
SG5 | Susan | Brand | Assistant | F | 3 Jun 40 | 24000 | B007
Vertical Fragmentation
S1 = Π staffNo, position, sex, DOB, salary (Staff)
S2 = Π staffNo, fName, lName, branchNo (Staff)
Fragment S1
StaffNo | Position | Sex | DOB | Salary
SL21 | Manager | M | 1 Oct 93 | 30000
SG37 | Assistant | F | 10 Nov 60 | 12000
SG14 | Supervisor | M | 24 Mar 58 | 18000
SG5 | Assistant | F | 3 Jun 40 | 24000
Fragment S2
StaffNo | FName | LName | BranchNo
SL21 | John | White | B005
SG37 | Ann | Beech | B003
SG14 | David | Ford | B003
SG5 | Susan | Brand | B007
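A vertical fragment is a projection. Because both fragments repeat the primary key staffNo, a join on that key rebuilds the original rows. A minimal Python sketch with rows as dictionaries; `project` and `join_on` are hypothetical helper names:

```python
from typing import Dict, List

Row = Dict[str, object]

STAFF: List[Row] = [
    {"staffNo": "SL21", "fName": "John", "lName": "White", "position": "Manager",
     "sex": "M", "DOB": "1 Oct 93", "salary": 30000, "branchNo": "B005"},
    {"staffNo": "SG37", "fName": "Ann", "lName": "Beech", "position": "Assistant",
     "sex": "F", "DOB": "10 Nov 60", "salary": 12000, "branchNo": "B003"},
    {"staffNo": "SG14", "fName": "David", "lName": "Ford", "position": "Supervisor",
     "sex": "M", "DOB": "24 Mar 58", "salary": 18000, "branchNo": "B003"},
    {"staffNo": "SG5", "fName": "Susan", "lName": "Brand", "position": "Assistant",
     "sex": "F", "DOB": "3 Jun 40", "salary": 24000, "branchNo": "B007"},
]

def project(rows: List[Row], columns: List[str]) -> List[Row]:
    """Projection: keep only the listed columns of every row."""
    return [{c: row[c] for c in columns} for row in rows]

def join_on(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Equi-join two fragments on a shared key column.

    The left fragment is indexed by the key; this assumes the key is unique
    there, which holds because staffNo is the primary key of Staff.
    """
    index = {row[key]: row for row in left}
    return [{**index[row[key]], **row} for row in right if row[key] in index]

# Vertical fragmentation: both projections repeat the primary key staffNo.
S1 = project(STAFF, ["staffNo", "position", "sex", "DOB", "salary"])
S2 = project(STAFF, ["staffNo", "fName", "lName", "branchNo"])

# Reconstruction: joining the fragments on staffNo restores every Staff row.
rebuilt = join_on(S1, S2, "staffNo")
print(rebuilt == STAFF)  # True
```

If staffNo were left out of either projection, there would be no column to join on, which is why the key is the one permitted exception to disjointness.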
Staff is first fragmented vertically into S1 and S2, and S2 is then fragmented horizontally by branch:
S1 = Π staffNo, position, sex, DOB, salary (Staff)
S2 = Π staffNo, fName, lName, branchNo (Staff)
S2.1 = σ branchNo='B005' (S2)
S2.2 = σ branchNo='B003' (S2)
S2.3 = σ branchNo='B007' (S2)
Fragment S1
StaffNo | Position | Sex | DOB | Salary
SL21 | Manager | M | 1 Oct 93 | 30000
SG37 | Assistant | F | 10 Nov 60 | 12000
SG14 | Supervisor | M | 24 Mar 58 | 18000
SG5 | Assistant | F | 3 Jun 40 | 24000
Fragment S2.1
StaffNo | FName | LName | BranchNo
SL21 | John | White | B005
Fragment S2.2
StaffNo | FName | LName | BranchNo
SG14 | David | Ford | B003
SG37 | Ann | Beech | B003
Fragment S2.3
StaffNo | FName | LName | BranchNo
SG5 | Susan | Brand | B007
Mixed Fragmentation – Vertical then Horizontal
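Mixed fragmentation is reconstructed in the reverse order: first union the horizontal pieces of S2, then join the result with S1 on staffNo. A sketch using Python's `sqlite3`, with one in-memory database standing in for all sites (an assumption). The table names `S2_1`, `S2_2` and `S2_3` are stand-ins for S2.1, S2.2 and S2.3, because S2.1 would be parsed as a qualified name in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # one database standing in for all sites
conn.execute("CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, fName TEXT, lName TEXT, "
             "position TEXT, sex TEXT, DOB TEXT, salary INTEGER, branchNo TEXT)")
conn.executemany("INSERT INTO Staff VALUES (?, ?, ?, ?, ?, ?, ?, ?)", [
    ("SL21", "John", "White", "Manager", "M", "1 Oct 93", 30000, "B005"),
    ("SG37", "Ann", "Beech", "Assistant", "F", "10 Nov 60", 12000, "B003"),
    ("SG14", "David", "Ford", "Supervisor", "M", "24 Mar 58", 18000, "B003"),
    ("SG5", "Susan", "Brand", "Assistant", "F", "3 Jun 40", 24000, "B007"),
])

# Step 1 (vertical): two projections that both keep the primary key.
conn.execute("CREATE TABLE S1 AS SELECT staffNo, position, sex, DOB, salary FROM Staff")
conn.execute("CREATE TABLE S2 AS SELECT staffNo, fName, lName, branchNo FROM Staff")

# Step 2 (horizontal): split S2 by branch.
for name, branch in [("S2_1", "B005"), ("S2_2", "B003"), ("S2_3", "B007")]:
    # Both values are fixed constants above, so formatting them in is safe here.
    conn.execute(f"CREATE TABLE {name} AS SELECT * FROM S2 WHERE branchNo = '{branch}'")

# Reconstruction: union the horizontal pieces, then join back with S1.
rebuilt = conn.execute("""
    SELECT h.staffNo, fName, lName, position, sex, DOB, salary, branchNo
    FROM S1 JOIN (SELECT * FROM S2_1 UNION ALL SELECT * FROM S2_2
                  UNION ALL SELECT * FROM S2_3) AS h
    ON S1.staffNo = h.staffNo
""").fetchall()
original = conn.execute("SELECT * FROM Staff").fetchall()
print(sorted(rebuilt) == sorted(original))  # True
```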
Derived Horizontal Fragmentation
Derived horizontal fragmentation is the horizontal fragmentation of a child table T1 that follows from the horizontal fragmentation of a related parent table T2.
It is not explicitly specified in the design but is implied by the fragmentation of T2.
T1 (child) has a foreign key that references T2 (parent).
The relationship between T1 and T2 is either 1-to-1 or many-to-1.
Use the semi-join operation: T1i = T1 ⋉ fk T2i, where T2i is the i-th fragment of T2 and fk is the foreign key. T1i keeps the rows of T1 that match at least one row of T2i.
Derived Horizontal Fragmentation
Suppose the design requires the Staff table to be fragmented horizontally by branch:
S1 = σ branchNo='B003' (Staff)
S2 = σ branchNo='B005' (Staff)
S3 = σ branchNo='B007' (Staff)
Fragment S1
StaffNo | FName | LName | Position | Sex | DOB | Salary | BranchNo
SG37 | Ann | Beech | Assistant | F | 10 Nov 60 | 12000 | B003
SG14 | David | Ford | Supervisor | M | 24 Mar 58 | 18000 | B003
Fragment S2
StaffNo | FName | LName | Position | Sex | DOB | Salary | BranchNo
SL21 | John | White | Manager | M | 1 Oct 93 | 30000 | B005
Fragment S3
StaffNo | FName | LName | Position | Sex | DOB | Salary | BranchNo
SG5 | Susan | Brand | Assistant | F | 3 Jun 40 | 24000 | B007
Derived Horizontal Fragmentation
After fragmenting Staff, we find that a related table, PropertyForRent, references it: each member of staff handles N properties (a 1:N relationship through the foreign key staffNo).
Because Staff is now fragmented, it makes sense to fragment PropertyForRent too, using a semi-join on staffNo:
S1 = σ branchNo='B003' (Staff)
S2 = σ branchNo='B005' (Staff)
S3 = σ branchNo='B007' (Staff)
Pi = PropertyForRent ⋉ staffNo Si   (i = 1, 2, 3)
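Derived Horizontal Fragmentation
The resulting fragments of PropertyForRent:
Fragment P1
PropertyNo | Street | City | PostCode | Type | Rooms | Rent | OwnerNo | StaffNo | BranchNo
PG4 | 6 Lawrence | Glasgow | G11 9QX | Flat | 3 | 350 | CO40 | SG14 | B003
PG21 | 18 Dale Rd | Glasgow | G12 | House | 5 | 600 | CO87 | SG37 | B003
PG36 | 2 Manor Rd | Glasgow | G32 4QX | Flat | 3 | 375 | CO93 | SG37 | B003
PG16 | 5 Novar Dr | Glasgow | G12 9AX | Flat | 4 | 450 | CO93 | SG14 | B003
Fragment P2
PropertyNo | Street | City | PostCode | Type | Rooms | Rent | OwnerNo | StaffNo | BranchNo
PL94 | 6 Argyll St | London | NW2 | Flat | 4 | 400 | CO87 | SL41 | B005
Fragment P3
PropertyNo | Street | City | PostCode | Type | Rooms | Rent | OwnerNo | StaffNo | BranchNo
PA14 | 16 Holhead | Aberdeen | AB7 5SU | House | 6 | 650 | CO46 | SA9 | B007
Note: P2 and P3 assume that staff member SL41 works at branch B005 and SA9 at branch B007. Neither appears in the four sample Staff rows shown earlier; with only those rows, the semi-join on staffNo would leave P2 and P3 empty.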
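The derived fragments can be computed with a semi-join on staffNo. A minimal Python sketch on cut-down rows, with a hypothetical `semi_join` helper. It assumes, as the fragments above imply, that staff SL41 works at branch B005 and SA9 at branch B007, so those two rows are added to Staff.

```python
from typing import Dict, List, Tuple

# Cut-down rows: only the columns the derivation needs.
STAFF: List[Tuple[str, str]] = [  # (staffNo, branchNo)
    ("SL21", "B005"), ("SG37", "B003"), ("SG14", "B003"), ("SG5", "B007"),
    ("SL41", "B005"), ("SA9", "B007"),  # assumed rows, not in the sample Staff table
]
PROPERTY_FOR_RENT: List[Tuple[str, str]] = [  # (propertyNo, staffNo)
    ("PA14", "SA9"), ("PG4", "SG14"), ("PG16", "SG14"),
    ("PG21", "SG37"), ("PG36", "SG37"), ("PL94", "SL41"),
]

def semi_join(child, parent, child_fk: int, parent_key: int):
    """child ⋉ parent: keep child rows whose foreign key matches a parent row.

    Unlike a join, no parent columns are added, so each fragment keeps the
    child's schema.
    """
    keys = {p[parent_key] for p in parent}
    return [c for c in child if c[child_fk] in keys]

# Primary horizontal fragmentation of the parent: Si = σ branchNo=b (Staff).
BRANCHES = ["B003", "B005", "B007"]
S: Dict[int, list] = {i: [s for s in STAFF if s[1] == b] for i, b in enumerate(BRANCHES, start=1)}

# Derived fragmentation of the child: Pi = PropertyForRent ⋉ staffNo Si.
P = {i: semi_join(PROPERTY_FOR_RENT, Si, child_fk=1, parent_key=0) for i, Si in S.items()}
for i, rows in P.items():
    print(f"P{i}:", [r[0] for r in rows])
# P1: ['PG4', 'PG16', 'PG21', 'PG36']
# P2: ['PL94']
# P3: ['PA14']
```

Removing the two assumed Staff rows leaves P2 and P3 empty, which is the inconsistency noted above.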
Transparencies in a DDBMS
Four main transparencies:
1. Distribution transparency:
a. Fragmentation.
b. Location.
c. Replication.
d. Local mapping.
e. Naming.
2. Transaction transparency.
3. Performance transparency.
4. DBMS transparency.
1. Distribution Transparency
Allows the user to perceive the DB as a single, logical entity. Types:
a. Fragmentation: the user does not need to know that the data is fragmented.
b. Location: the user does not need to know the location of the fragments.
c. Replication: the user does not need to know that fragments are replicated.
d. Local mapping: the user must specify both the fragments and their locations.
e. Naming: the DDBMS ensures that every item has a unique name.
Consider the following distribution of the Staff relation:
S1 = Π staffNo, position, sex, DOB, salary (Staff)
S2 = Π staffNo, fName, lName, branchNo (Staff)
S21 = σ branchNo='B003' (S2)
S22 = σ branchNo='B005' (S2)
S23 = σ branchNo='B007' (S2)
a. Fragmentation Transparency
Highest level of distribution transparency. The user does not need to know that the data is fragmented and treats the DDB like a centralized DB. Database accesses are based on the global schema, so the fragmentation of the data can be changed without impacting the user.
Example:
SELECT Fname, Lname FROM Staff WHERE position = 'Manager';
b. Location Transparency
The middle level of distribution transparency.
The user must know that the data is fragmented but does not need to know the location of the data.
Data location can be changed without impact on the user.
Example:
SELECT Fname, Lname FROM S21
WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position = 'Manager')
UNION
SELECT Fname, Lname FROM S22
WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position = 'Manager')
UNION
SELECT Fname, Lname FROM S23
WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position = 'Manager')
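One way to see the difference between these two levels: in the sketch below, a single in-memory SQLite database (Python's `sqlite3`) stands in for all sites, and the global relation Staff is presented as a view over the fragments S1, S21, S22 and S23. The view is an assumption made for illustration, not a claim about how a DDBMS builds its global schema. The fragmentation-transparent query names only Staff; the location-transparent query names the fragments; both return the same manager. The `AT SITE` form used for local mapping has no SQLite equivalent and is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE S1  (staffNo TEXT PRIMARY KEY, position TEXT, sex TEXT, DOB TEXT, salary INTEGER);
CREATE TABLE S21 (staffNo TEXT PRIMARY KEY, fName TEXT, lName TEXT, branchNo TEXT);
CREATE TABLE S22 (staffNo TEXT PRIMARY KEY, fName TEXT, lName TEXT, branchNo TEXT);
CREATE TABLE S23 (staffNo TEXT PRIMARY KEY, fName TEXT, lName TEXT, branchNo TEXT);
INSERT INTO S1 VALUES ('SL21', 'Manager', 'M', '1 Oct 93', 30000),
                      ('SG37', 'Assistant', 'F', '10 Nov 60', 12000),
                      ('SG14', 'Supervisor', 'M', '24 Mar 58', 18000),
                      ('SG5', 'Assistant', 'F', '3 Jun 40', 24000);
INSERT INTO S21 VALUES ('SG37', 'Ann', 'Beech', 'B003'), ('SG14', 'David', 'Ford', 'B003');
INSERT INTO S22 VALUES ('SL21', 'John', 'White', 'B005');
INSERT INTO S23 VALUES ('SG5', 'Susan', 'Brand', 'B007');
-- Hypothetical global schema: a view that hides the fragmentation from users.
CREATE VIEW Staff AS
  SELECT S1.staffNo, fName, lName, position, sex, DOB, salary, branchNo
  FROM S1 JOIN (SELECT * FROM S21 UNION ALL SELECT * FROM S22
                UNION ALL SELECT * FROM S23) AS s2
  ON S1.staffNo = s2.staffNo;
""")

# Fragmentation transparency: the user queries the global relation only.
global_q = "SELECT fName, lName FROM Staff WHERE position = 'Manager'"

# Location transparency: the user names the fragments, but not their sites.
fragment_q = """
SELECT fName, lName FROM S21 WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position = 'Manager')
UNION
SELECT fName, lName FROM S22 WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position = 'Manager')
UNION
SELECT fName, lName FROM S23 WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position = 'Manager')
"""
print(conn.execute(global_q).fetchall())    # [('John', 'White')]
print(conn.execute(fragment_q).fetchall())  # [('John', 'White')]
```

Changing how Staff is fragmented would only require redefining the view; `global_q` would stay the same, while `fragment_q` would have to be rewritten.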
c. Replication Transparency
The user is unaware of replication and of the location of the data, but knows that the data is fragmented.
It is at the same level as location transparency.
d. Local Mapping Transparency
The lowest level of distribution transparency.
The user knows that the data is fragmented and knows the location of the data.
Example (AT SITE is illustrative notation, not standard SQL):
SELECT Fname, Lname FROM S21 AT SITE 3
WHERE staffNo IN
(SELECT staffNo FROM S1 AT SITE 5 WHERE position = 'Manager')
UNION
SELECT Fname, Lname FROM S22 AT SITE 5
WHERE staffNo IN
(SELECT staffNo FROM S1 AT SITE 5 WHERE position = 'Manager')
UNION
SELECT Fname, Lname FROM S23 AT SITE 7
WHERE staffNo IN
(SELECT staffNo FROM S1 AT SITE 5 WHERE position = 'Manager')
e. Naming Transparency
Each item in a distributed database must have a unique name, and the DDBMS must ensure that no two sites create items with the same name.
Solutions:
• Create a central name server. Drawbacks: it can become a bottleneck, and it works against local autonomy.
• Prefix each object with the identifier of the site that created it. Drawback: loss of distribution transparency.
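The two naming solutions can be contrasted in a few lines of Python. The `NameServer` class and the `prefixed_name` helper are hypothetical illustrations, and the site identifiers are invented:

```python
from typing import Dict

class NameServer:
    """Central name server: every site registers new names here.

    All sites depend on this one component, which is why it can become a
    bottleneck and why it works against local autonomy.
    """

    def __init__(self) -> None:
        self._owner: Dict[str, str] = {}  # global name -> site that created it

    def register(self, name: str, site_id: str) -> None:
        if name in self._owner:
            raise ValueError(f"{name!r} is already used by site {self._owner[name]}")
        self._owner[name] = site_id

def prefixed_name(site_id: str, local_name: str) -> str:
    """Site-prefix scheme: unique without coordination, but the site is visible."""
    return f"{site_id}.{local_name}"

ns = NameServer()
ns.register("Staff", "site3")
try:
    ns.register("Staff", "site5")  # a second site picks the same name
except ValueError as err:
    print(err)  # 'Staff' is already used by site site3
print(prefixed_name("site3", "Staff"), prefixed_name("site5", "Staff"))
# site3.Staff site5.Staff
```

The prefixed names never clash, but a user who writes `site3.Staff` now depends on where the data lives, which is the loss of transparency mentioned above.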
2. Transaction Transparency
All distributed transactions must preserve the consistency and integrity of the DDB.
A transaction that needs to access data at multiple sites is divided into multiple sub-transactions, one for each site involved.
Even though the transaction is split, atomicity must be maintained: either all sub-transactions commit or none of them do.
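The slides require atomicity but do not name a protocol. A common choice is two-phase commit (2PC), swapped in here as an assumption: a coordinator asks every sub-transaction to vote, then sends the same commit-or-abort decision to all of them. This minimal Python sketch omits logging, timeouts and site failures, all of which a real implementation must handle:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Participant:
    """A site running one sub-transaction (in-memory stand-in, no real I/O)."""
    name: str
    can_commit: bool = True  # did this site's sub-transaction succeed locally?
    state: str = "active"

    def prepare(self) -> bool:
        # Phase 1: vote. A real site would force a log record to disk before voting yes.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, commit: bool) -> None:
        # Phase 2: apply the coordinator's global decision.
        self.state = "committed" if commit else "aborted"

def two_phase_commit(participants: List[Participant]) -> bool:
    """Coordinator: commit only if every sub-transaction votes yes."""
    votes = [p.prepare() for p in participants]  # phase 1: collect all votes
    decision = all(votes)
    for p in participants:                       # phase 2: same decision everywhere
        p.finish(decision)
    return decision

sites = [Participant("London"), Participant("Glasgow", can_commit=False)]
print(two_phase_commit(sites), [p.state for p in sites])
# False ['aborted', 'aborted']
```

A single "no" vote from Glasgow aborts the London sub-transaction too, so the split transaction stays atomic.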
3. Performance Transparency
The DDBMS performs as if it were a centralized DBMS.
Performance should not suffer because the system is distributed (for example, because of network communication cost).
When a site issues a query, the system must figure out the fastest way of executing it.
The Distributed Query Processor (DQP) must determine:
• Which fragments to access.
• Which copy of a fragment to access (if replication is used).
• Where the fragments are located.
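The three DQP decisions can be sketched as a catalog lookup. In this hypothetical Python example, the catalog, the fragment list and the toy cost function are all invented for illustration. The placements of S1, S21, S22 and S23 follow the local-mapping example; the extra copy of S1 at site 3 is assumed so that there is a replica to choose between.

```python
from typing import Callable, Dict, List

# Hypothetical global system catalog: fragment -> sites holding a copy.
# S1 and S22 at site 5, S21 at site 3 and S23 at site 7 follow the
# local-mapping example; the extra copy of S1 at site 3 is invented.
CATALOG: Dict[str, List[int]] = {"S1": [5, 3], "S21": [3], "S22": [5], "S23": [7]}

# Hypothetical mapping from a global relation to the fragments that make it up.
FRAGMENTS_OF: Dict[str, List[str]] = {"Staff": ["S1", "S21", "S22", "S23"]}

def hop_cost(from_site: int, to_site: int) -> int:
    """Toy cost: local access is free, any remote access costs 1."""
    return 0 if from_site == to_site else 1

def plan_access(relation: str, requesting_site: int,
                cost: Callable[[int, int], int] = hop_cost) -> Dict[str, int]:
    plan = {}
    for frag in FRAGMENTS_OF[relation]:  # which fragments to access
        copies = CATALOG[frag]           # where the fragment's copies are
        # which copy: the one cheapest to reach from the requesting site
        plan[frag] = min(copies, key=lambda s: cost(requesting_site, s))
    return plan

print(plan_access("Staff", requesting_site=3))
# {'S1': 3, 'S21': 3, 'S22': 5, 'S23': 7}
```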
3. Performance Transparency
Consider the following distributed DB:
• Property(propertyNo, city): 10,000 records, stored in London.
• Client(clientNo, maxPrice): 100,000 records, stored in Glasgow.
• Viewing(propertyNo, clientNo): 1,000,000 records, stored in London.
The London site wants to list the properties in Aberdeen that have been viewed by clients whose maximum price limit is greater than 200,000.
SELECT p.propertyNo
FROM Property p INNER JOIN
(Client c INNER JOIN Viewing v ON c.clientNo = v.clientNo)
ON p.propertyNo = v.propertyNo
WHERE p.city = 'Aberdeen' AND
c.maxPrice > 200000;
3. Performance Transparency
After the query is issued, the DDBMS must determine the most cost-effective strategy to execute it.
Strategies:
1. Move the Client table to London and process the query there.
2. Move the Property and Viewing relations to Glasgow, process the query there, then return the result to London.
3. Join Property and Viewing at London, keep only the Aberdeen properties, project propertyNo and clientNo, and move the result to Glasgow to join with the clients whose maxPrice > 200,000; then return the result.
4. Select the clients at Glasgow with maxPrice > 200,000, move them to London, and join them there with Viewing and the Aberdeen properties.
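To compare the strategies, a very simplified cost model counts only the records shipped between London and Glasgow. Strategies 1 and 2 follow directly from the table sizes above. Strategies 3 and 4 depend on how many viewings concern Aberdeen and how many clients have maxPrice > 200,000, which the slides do not give, so those two numbers are hypothetical parameters. The model ignores record width, network speed and the cost of returning the result; a real query optimizer would rely on catalog statistics instead.

```python
from typing import Dict

# Table sizes from the slide.
PROPERTY_RECORDS = 10_000    # Property, stored in London
CLIENT_RECORDS = 100_000     # Client, stored in Glasgow
VIEWING_RECORDS = 1_000_000  # Viewing, stored in London

def records_shipped(aberdeen_viewings: int, rich_clients: int) -> Dict[int, int]:
    """Records moved between sites by each strategy, excluding the final result.

    aberdeen_viewings: assumed number of Viewing rows for Aberdeen properties.
    rich_clients: assumed number of clients with maxPrice > 200,000.
    """
    return {
        1: CLIENT_RECORDS,                      # whole Client table to London
        2: PROPERTY_RECORDS + VIEWING_RECORDS,  # Property and Viewing to Glasgow
        3: aberdeen_viewings,                   # (propertyNo, clientNo) pairs to Glasgow
        4: rich_clients,                        # only the qualifying clients to London
    }

costs = records_shipped(aberdeen_viewings=50_000, rich_clients=5_000)
best = min(costs, key=costs.get)
print(costs, "-> strategy", best)
# {1: 100000, 2: 1010000, 3: 50000, 4: 5000} -> strategy 4
```

With different assumed selectivities the winner changes (for example, few Aberdeen viewings but many high-budget clients favours strategy 3), which is why the DQP must estimate these figures rather than fix one strategy in advance.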
4. DBMS Transparency
Hides the fact that different sites may run different local DBMSs.
This applies to heterogeneous DDBMSs.
Date’s 12 Rules for a DDBMS
1. Local autonomy.
2. No reliance on a central site.
3. Continuous operation.
4. Location independence.
5. Fragmentation independence.
6. Replication independence.
7. Distributed query processing.
8. Distributed transaction processing.
9. Hardware independence.
10. Operating system independence.
11. Network independence.
12. Database independence.