nishant-kumar-narottam
8/7/2019 1---DistributedDBDesign-Basic Concepts
Distributed Database Design
BASICS
Introduction
Distributed database (definition):
A logically interrelated collection of shared data, physically distributed over a computer network.
Distributed DBMS (definition):
The software system that permits the management of the distributed database and makes the distribution transparent to users.
Distributed Systems
Data spread over multiple machines (also referred to as sites or nodes).
Network interconnects the machines.
Data shared by users on multiple machines.
Advantages
Organisational structure - many organisations cover several sites.
Shareability - users at different sites can share data.
Local autonomy - each site is able to retain a degree of control over data stored locally.
Improved availability - node failure will not make the system inoperable.
Improved reliability - replicated data keeps data accessible.
Improved performance - data is located near the site that uses it.
Modular growth - easier expansion.
Disadvantages
Complexity - more complex than a centralised system.
Cost - added network and maintenance costs.
Security - the network must be made secure.
Integrity control - more difficult to ensure proper coordination among sites.
Lack of standards and experience - no established tools or methodologies.
Complex design - database design is more complex.
Classification of DDBMS
Homogeneous DDBMS
Same software/schema on all sites; data may be partitioned among sites.
Goal: provide a view of a single database, hiding details of distribution.
Characteristics:
  All sites use the same DBMS product.
  Much easier to design and manage.
  Approach provides incremental growth and allows increased performance.
Heterogeneous DDBMS
Different software/schema on different sites.
Goal: integrate existing databases to provide useful functionality.
Characteristics:
  Sites may run different DBMS products, with possibly different underlying data models.
  Occurs when sites have implemented their own databases and integration is considered later.
  Translations are required to allow for:
    Different hardware.
    Different DBMS products.
    Different hardware and different DBMS products.
  Typical solution is to use gateways.
Classification (contd.)
Examples of typical applications:

Type of DBMS    LAN network                                   WAN network
Homogeneous     Data management and financial applications    Travel management and financial applications
Heterogeneous   Inter-divisional information systems          Integrated banking and inter-banking systems
Design Issues with DDBMS
In designing a distributed database, the same issues are faced as for a centralized database and, in addition:
Fragmentation:
  A relation may be divided into a number of sub-relations, which are then distributed.
Allocation:
  Each fragment is stored at the site that gives an "optimal" distribution.
Replication:
  A copy of a fragment may be maintained at several sites.
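
The three design steps can be illustrated with a small sketch. It assumes horizontal fragmentation on a single attribute and a hand-written placement; the `EMPLOYEE` relation, the site names, and the helper functions are all hypothetical and only serve to show how fragmentation, allocation, and replication relate to one another.

```python
from collections import defaultdict

# Hypothetical global relation: each row is a dict.
EMPLOYEE = [
    {"id": 1, "name": "Asha", "branch": "Delhi"},
    {"id": 2, "name": "Ben", "branch": "London"},
    {"id": 3, "name": "Chen", "branch": "Delhi"},
]

def fragment_horizontally(rows, key):
    """Fragmentation: split a relation into sub-relations, one per value of `key`."""
    fragments = defaultdict(list)
    for row in rows:
        fragments[row[key]].append(row)
    return dict(fragments)

def allocate(fragments, placement):
    """Allocation: store each fragment at every site listed for it.

    A fragment listed at more than one site is replicated.
    """
    sites = defaultdict(dict)
    for name, rows in fragments.items():
        for site in placement[name]:
            sites[site][name] = list(rows)  # each site holds its own copy
    return dict(sites)

def reconstruct(fragments):
    """The global relation is the union of its horizontal fragments."""
    rows = [row for frag in fragments.values() for row in frag]
    return sorted(rows, key=lambda row: row["id"])

fragments = fragment_horizontally(EMPLOYEE, "branch")
# Assumed placement: the Delhi fragment is replicated, the London fragment is not.
placement = {"Delhi": ["site_delhi", "site_london"], "London": ["site_london"]}
sites = allocate(fragments, placement)
```

Choosing the placement is the hard part in practice; here it is simply given, whereas a real design would derive it from access patterns and costs.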
Functions of DDBMS
Functions of a centralised DBMS plus:
Extended communication to allow the transfer of queries and data among sites.
Extended system catalog to store data distribution details.
Distributed query processing, including query optimisation.
Extended concurrency control to maintain consistency of replicated data.
Extended recovery services to take account of failures of individual sites and communication links.
Component Architecture
Components of a DDBMS
Local DBMS (LDBMS) - responsible for local data; includes the Transaction, Buffer and Recovery Managers and the Scheduler.
Data Communications (DC) component - allows all sites to communicate with each other.
Global system catalog (GSC) - catalog information regarding the fragmentation and allocation schemas.
Distributed DBMS (DDBMS) - controlling unit of the entire system.
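
As a rough illustration of the role of the global system catalog, the sketch below models it as a mapping from fragment names to the sites that hold a copy. The catalog contents, the site names, and the `choose_site` policy (prefer a local copy) are assumptions made for this example, not part of any particular product.

```python
# Hypothetical global system catalog: fragment name -> sites holding a copy.
GSC = {
    "employee_delhi": ["site_a", "site_b"],
    "employee_london": ["site_b"],
}

def sites_for(fragment, catalog=GSC):
    """Return the sites at which a fragment is allocated."""
    if fragment not in catalog:
        raise KeyError(f"unknown fragment: {fragment}")
    return catalog[fragment]

def choose_site(fragment, local_site, catalog=GSC):
    """Pick where to read a fragment: the local copy if one exists,
    otherwise the first remote site listed (an assumed, simplistic policy)."""
    candidates = sites_for(fragment, catalog)
    return local_site if local_site in candidates else candidates[0]
```

A query issued at `site_a` for `employee_london` would be sent to `site_b` through the data communications component, while a query for `employee_delhi` could be answered by the local DBMS.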
Reference Architecture for DDBMS
Due to diversity, there is no accepted architecture equivalent to the ANSI/SPARC 3-level architecture.
A reference architecture consists of:
  Set of global external schemas (GES).
  Global conceptual schema (GCS).
  Fragmentation schema.
  Allocation schema.
  Set of schemas for each local DBMS, conforming to the 3-level ANSI/SPARC architecture.
Some levels may be missing, depending on the levels of transparency supported.
Local and Global Transactions
A local transaction accesses data only in the single site at which the transaction was initiated.
A global transaction either accesses data in a site different from the one at which the transaction was initiated or accesses data in several different sites.
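
The distinction can be stated as a small predicate. This is only a sketch of the definition above; the function name and the representation of sites as strings are assumptions.

```python
def classify_transaction(origin_site, accessed_sites):
    """Return "local" if every access is at the originating site, else "global"."""
    accessed = set(accessed_sites)
    if accessed <= {origin_site}:
        return "local"
    return "global"
```

For example, a transaction started at `site_a` that touches only `site_b` is global, even though it accesses a single site.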
Implementation Issues for Distributed Databases
Atomicity is needed even for transactions that update data at multiple sites.
The two-phase commit protocol (2PC) is used to ensure atomicity.
  Basic idea: each site executes the transaction until just before commit, and then leaves the final decision to a coordinator.
  Each site must follow the decision of the coordinator, even if there is a failure while waiting for the coordinator's decision.
Distributed concurrency control (and deadlock detection) is required.
Data items may be replicated to improve data availability.
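
The basic idea of two-phase commit can be sketched in a few lines. This is a deliberately simplified, in-memory illustration: the `Participant` class and `two_phase_commit` function are hypothetical, and the sketch leaves out everything that makes the real protocol hard (logging, timeouts, site and link failures, recovery).

```python
class Participant:
    """A site taking part in a global transaction (hypothetical model)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        """Phase 1: execute up to just before commit, then vote."""
        self.state = "ready" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, decision):
        """Phase 2: follow the coordinator's decision."""
        self.state = decision


def two_phase_commit(participants):
    """Coordinator: commit only if every participant votes to commit."""
    votes = [p.prepare() for p in participants]          # phase 1: collect votes
    decision = "committed" if all(votes) else "aborted"  # single global decision
    for p in participants:                               # phase 2: broadcast it
        p.finish(decision)
    return decision
```

A single "no" vote is enough to abort the transaction at every site, which is how atomicity across sites is preserved.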
DDBMS Network Types
Local-area networks (LANs) - composed of processors that are distributed over small geographical areas, such as a single building or a few adjacent buildings.
Wide-area networks (WANs) - composed of processors distributed over a large geographical area.
Network Types (contd.)
WANs with continuous connection (e.g. the Internet) are needed for implementing distributed database systems.
Groupware applications such as Lotus Notes can work on WANs with discontinuous connection:
  Data is replicated.
  Updates are propagated to replicas periodically.
  Copies of data may be updated independently.
  Non-serializable executions can thus result.
  Resolution is application dependent.
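
To see why resolution ends up being application dependent, consider a minimal sketch of periodic synchronisation between two replicas that were updated independently. It assumes each replica is a dictionary, that the last agreed state (`base`) is known, and that keys are never deleted; `synchronise` and the `resolve` callback are hypothetical names introduced for this example.

```python
def synchronise(replica_a, replica_b, base, resolve):
    """Merge two replicas that diverged from the common state `base`.

    `resolve(key, old, a, b)` is supplied by the application and is called
    only when both replicas changed the same key to different values.
    """
    merged = {}
    for key in set(base) | set(replica_a) | set(replica_b):
        old, a, b = base.get(key), replica_a.get(key), replica_b.get(key)
        if a == b:
            merged[key] = a          # no conflict: both agree
        elif a == old:
            merged[key] = b          # only replica B changed it
        elif b == old:
            merged[key] = a          # only replica A changed it
        else:
            merged[key] = resolve(key, old, a, b)  # conflicting updates
    return merged


# Example policy for a numeric stock level: apply both changes to the old value.
def add_deltas(key, old, a, b):
    return old + (a - old) + (b - old)

base = {"stock": 10, "note": "x"}
site_a = {"stock": 8, "note": "x"}   # sold 2 while disconnected
site_b = {"stock": 7, "note": "y"}   # sold 3 and edited the note
merged = synchronise(site_a, site_b, base, add_deltas)
```

Adding the deltas is sensible for a counter but meaningless for, say, free text, which is why no single rule can be built into the system.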
Other Categories
Open Database Access and Interoperability
The Open Group has formed a Working Group to provide specifications that will create a database infrastructure environment where there is:
  A common SQL API that allows client applications to be written that do not need to know the vendor of the DBMS they are accessing.
  A common database protocol that enables a DBMS from one vendor to communicate directly with a DBMS from another vendor without the need for a gateway.
  A common network protocol that allows communications between different DBMSs.
The most ambitious goal is to find a way to enable a transaction to span DBMSs from different vendors without use of a gateway.
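
The idea of a common SQL API can be illustrated by analogy with Python's DB-API 2.0 (PEP 249), which is not the Open Group specification but pursues a similar aim at the language level. In the sketch below, `fetch_names` uses only calls defined by DB-API 2.0 (`cursor`, `execute`, `fetchall`, `close`), so it does not need to know which DBMS is behind the connection; the `employee` table and the choice of `sqlite3` as the driver are assumptions for the example.

```python
import sqlite3

def fetch_names(conn):
    """Read employee names through any DB-API 2.0 connection."""
    cur = conn.cursor()
    cur.execute("SELECT name FROM employee ORDER BY name")
    rows = cur.fetchall()
    cur.close()
    return [row[0] for row in rows]

# sqlite3 is used here only because it ships with Python; any other
# DB-API 2.0 driver could supply the connection instead.
conn = sqlite3.connect(":memory:")
setup = conn.cursor()
setup.execute("CREATE TABLE employee (name TEXT)")
setup.execute("INSERT INTO employee VALUES ('Ben')")
setup.execute("INSERT INTO employee VALUES ('Asha')")
conn.commit()
setup.close()

names = fetch_names(conn)
```

Note that drivers still differ in details such as parameter placeholder style, so a shared API reduces vendor dependence without fully removing it.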
Other Categories (contd.)
MultiDatabase System (MDBS)
DDBMS in which each site maintains complete autonomy.
DBMS that resides transparently on top of existing database and file systems and presents a single database to its users.
Allows users to access and share data without requiring physical database integration.
Two categories: unfederated MDBS (no local users) and federated MDBS.