Chap 1 Introduction to DDBMS


  • 8/4/2019 Chap 1 Introduction to DDBMS

    Introduction to DDBMS

    What is a distributed DBMS

    DBMS and Computer Network

    Integration and Centralization

    DDBMS environments

    Transparency

    Advantages

    File Systems

    Figure: programs 1, 2 and 3, each with its own data description (data description 1, 2, 3), accessing File 1, File 2 and File 3.

    Database Management

    Figure: application programs 1, 2 and 3 (with data semantics) access the database through the DBMS, which provides data description, manipulation and control.
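    A minimal sketch of the DBMS approach, using Python's built-in sqlite3 module as a stand-in DBMS: the schema (data description) is declared once inside the DBMS, two hypothetical application programs (`hire` and `titles`) manipulate the same table through SQL without describing its storage format, and the DBMS itself enforces a control rule (the primary key) for every program. The function names are illustrative assumptions, not part of the original material.

```python
import sqlite3

# Data description: the schema is declared once and kept by the DBMS,
# instead of being repeated inside every application program.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE EMP (
           ENO   TEXT PRIMARY KEY,
           ENAME TEXT NOT NULL,
           TITLE TEXT NOT NULL
       )"""
)

def hire(conn, eno, ename, title):
    """Hypothetical application program 1: inserts employees (data manipulation)."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("INSERT INTO EMP VALUES (?, ?, ?)", (eno, ename, title))

def titles(conn):
    """Hypothetical application program 2: reads the same data through the DBMS."""
    return [row[0] for row in conn.execute("SELECT DISTINCT TITLE FROM EMP ORDER BY TITLE")]

hire(conn, "E1", "J. Doe", "Elect. Eng.")
hire(conn, "E2", "M. Smith", "Syst. Anal.")

# Data control: the DBMS rejects a duplicate key no matter which program sends it.
try:
    hire(conn, "E1", "Someone Else", "Programmer")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

print(titles(conn))  # ['Elect. Eng.', 'Syst. Anal.']
```

    Neither program knows how SQLite lays the table out on disk; both depend only on the schema held by the DBMS.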

    Motivation

    Figure: Database Technology (integration) + Computer Networks (distribution) → Distributed Database Systems (integration); integration ≠ centralization.

    Distributed Computing

    • A concept in search of a definition and a name.

    • A number of autonomous processing elements (not necessarily homogeneous) that are interconnected by a computer network and that cooperate in performing their assigned tasks.

    Distributed Computing

    • Synonymous terms:
      - distributed function
      - distributed data processing
      - multiprocessors/multicomputers
      - satellite processing
      - backend processing
      - dedicated/special-purpose computers
      - timeshared systems
      - functionally modular systems

    What is distributed?

    • Processing logic
    • Functions
    • Data
    • Control

    What is not a DDBS?

    • A timesharing computer system

    • A loosely or tightly coupled multiprocessor system

    • A database system that resides at one of the nodes of a network of computers: this is a centralized database on a network node

    Shared-Memory Architecture

    Examples: symmetric multiprocessors (Sequent, Encore) and some mainframes (IBM 3090, Bull's DPS8)

    Figure: processors P1 to Pn share a common memory M and disk D.

    Shared-Disk Architecture

    Examples: DEC's VAXcluster, IBM's IMS/VS Data Sharing

    Figure: processors P1 to Pn, each with its own memory (M1 to Mn), share a common disk D.

    Shared-Nothing Architecture

    Examples: Teradata's DBC, Tandem, Intel's Paragon, NCR's 3600 and 3700

    Figure: processors P1 to Pn, each with its own memory (M1 to Mn) and its own disk (D1 to Dn).
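    The shared-nothing idea can be sketched in Python as a set of nodes that each keep rows in private storage, with every row owned by exactly one node. This sketch assumes hash partitioning on the key (one common placement scheme, not one the slide prescribes); the `Node` and `SharedNothingCluster` classes are hypothetical, and `zlib.crc32` is used because Python's built-in string hash changes between runs.

```python
import zlib

class Node:
    """One shared-nothing node: its own processor, memory and disk (modeled as a private dict)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self._local_store = {}  # never read or written by any other node

    def put(self, key, row):
        self._local_store[key] = row

    def get(self, key):
        return self._local_store.get(key)

    def count(self):
        return len(self._local_store)

class SharedNothingCluster:
    """Assigns every row to exactly one node by hashing its key (hash partitioning)."""

    def __init__(self, n_nodes):
        self.nodes = [Node(i) for i in range(n_nodes)]

    def owner(self, key):
        # crc32 gives the same result on every run, unlike Python's randomized str hash
        return self.nodes[zlib.crc32(key.encode()) % len(self.nodes)]

    def insert(self, key, row):
        self.owner(key).put(key, row)

    def lookup(self, key):
        # Only the owning node is contacted; other nodes' memory and disks are untouched
        return self.owner(key).get(key)

cluster = SharedNothingCluster(n_nodes=3)
for eno, ename in [("E1", "J. Doe"), ("E2", "M. Smith"), ("E3", "A. Lee"), ("E4", "J. Miller")]:
    cluster.insert(eno, ename)

print(cluster.lookup("E3"))                      # A. Lee
print([node.count() for node in cluster.nodes])  # how the four rows were spread over the nodes
```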

    Centralized DBMS on a Network

    Figure: Sites 1 to 5 connected by a communication network.

    Distributed DBMS Environment

    Figure: Sites 1 to 5 connected by a communication network.

    What is a Distributed Database System?

    A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network.

    A distributed database management system (DDBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users.

    Distributed database system (DDBS) = DDB + DDBMS

    Implicit Assumptions

    • Data are stored at a number of sites; each site logically consists of a single processor.

    • Processors at different sites are interconnected by a computer network: no multiprocessors (those are parallel database systems).

    • A distributed database is a database, not a collection of files: the data are logically related, as exhibited in the users' access patterns (relational data model).

    • A D-DBMS is a full-fledged DBMS, not a remote file system.

    Applications

    • Manufacturing, especially multi-plant manufacturing
    • Military command and control
    • EFT (electronic funds transfer)
    • Corporate MIS
    • Airlines
    • Hotel chains
    • Any organization that has a decentralized organization structure

    Distributed DBMS Promises

    • Transparent management of distributed, fragmented, and replicated data
    • Improved reliability/availability through distributed transactions
    • Improved performance
    • Easier and more economical system expansion

    Transparency

    • Transparency is the separation of the higher-level semantics of a system from the lower-level implementation issues.

    • The fundamental issue is to provide data independence in the distributed environment:
      - Network (distribution) transparency
      - Replication transparency
      - Fragmentation transparency
        - horizontal fragmentation: selection
        - vertical fragmentation: projection
        - hybrid
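    Horizontal and vertical fragmentation can be illustrated on the EMP relation from the example that follows. In this sketch, plain Python lists stand in for relations and the fragmentation predicate on TITLE is a hypothetical choice: a horizontal fragment is a selection that keeps whole tuples, a vertical fragment is a projection that keeps the key ENO, and the original relation is rebuilt by union and by join respectively.

```python
# EMP tuples (ENO, ENAME, TITLE) from the example relation
EMP = [
    ("E1", "J. Doe", "Elect. Eng."), ("E2", "M. Smith", "Syst. Anal."),
    ("E3", "A. Lee", "Mech. Eng."), ("E4", "J. Miller", "Programmer"),
    ("E5", "B. Casey", "Syst. Anal."), ("E6", "L. Chu", "Elect. Eng."),
    ("E7", "R. Davis", "Mech. Eng."), ("E8", "J. Jones", "Syst. Anal."),
]

def horizontal_fragment(rows, predicate):
    """Selection: keep the whole tuples that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def vertical_fragment(rows, positions):
    """Projection on the given attribute positions (include position 0, the key ENO)."""
    return [tuple(row[i] for i in positions) for row in rows]

def rebuild_horizontal(*fragments):
    """Reconstruct by union of the horizontal fragments."""
    return [row for fragment in fragments for row in fragment]

def rebuild_vertical(left, right):
    """Reconstruct by joining two vertical fragments on the key in position 0."""
    right_by_key = {row[0]: row[1:] for row in right}
    return [row + right_by_key[row[0]] for row in left]

def is_engineer(row):
    """Hypothetical fragmentation predicate on TITLE."""
    return row[2].endswith("Eng.")

eng = horizontal_fragment(EMP, is_engineer)
non_eng = horizontal_fragment(EMP, lambda row: not is_engineer(row))

names = vertical_fragment(EMP, [0, 1])   # (ENO, ENAME)
titles = vertical_fragment(EMP, [0, 2])  # (ENO, TITLE)

print(len(eng), len(non_eng))                                    # 4 4
print(sorted(rebuild_horizontal(eng, non_eng)) == sorted(EMP))   # True
print(rebuild_vertical(names, titles) == EMP)                    # True
```

    Keeping ENO in every vertical fragment is what makes the join reconstruction lossless; dropping it would leave no way to match names with titles.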

    Example

    EMP
    ENO  ENAME      TITLE
    E1   J. Doe     Elect. Eng.
    E2   M. Smith   Syst. Anal.
    E3   A. Lee     Mech. Eng.
    E4   J. Miller  Programmer
    E5   B. Casey   Syst. Anal.
    E6   L. Chu     Elect. Eng.
    E7   R. Davis   Mech. Eng.
    E8   J. Jones   Syst. Anal.

    ASG
    ENO  PNO  RESP        DUR
    E1   P1   Manager     12
    E2   P1   Analyst     24
    E2   P2   Analyst      6
    E3   P3   Consultant  10
    E3   P4   Engineer    48
    E4   P2   Programmer  18
    E5   P2   Manager     24
    E6   P4   Manager     48
    E7   P3   Engineer    36
    E7   P5   Engineer    23
    E8   P3   Manager     40

    PROJ
    PNO  PNAME              BUDGET
    P1   Instrumentation    150000
    P2   Database Develop.  135000
    P3   CAD/CAM            250000
    P4   Maintenance        310000

    PAY
    TITLE        SAL
    Elect. Eng.  40000
    Syst. Anal.  34000
    Mech. Eng.   27000
    Programmer   24000

    Transparent Access

    SELECT ENAME, SAL
    FROM   EMP, ASG, PAY
    WHERE  DUR > 12
    AND    EMP.ENO = ASG.ENO
    AND    PAY.TITLE = EMP.TITLE

    Figure: sites Paris, Boston, Montreal, New York and Tokyo connected by a communication network. Paris stores Paris projects, Paris employees, Paris assignments and Boston employees; Montreal stores Montreal projects, Paris projects, Montreal employees and Montreal assignments; Boston stores Boston projects, Boston employees and Boston assignments; New York stores Boston projects, New York employees, New York projects and New York assignments.
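    Fragmentation transparency can be imitated in a single SQLite database, assuming two hypothetical EMP fragments (EMP_SITE1 and EMP_SITE2) that stand in for data held at different sites. A view named EMP presents their union, so the query above runs unchanged against the global relation names. This is only a local simulation: a real DDBMS would also ship sub-queries and results across the network.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Two hypothetical EMP fragments, standing in for data stored at different sites
    CREATE TABLE EMP_SITE1 (ENO TEXT PRIMARY KEY, ENAME TEXT, TITLE TEXT);
    CREATE TABLE EMP_SITE2 (ENO TEXT PRIMARY KEY, ENAME TEXT, TITLE TEXT);
    -- The global relation EMP is only a view: the union of its fragments
    CREATE VIEW EMP AS
        SELECT ENO, ENAME, TITLE FROM EMP_SITE1
        UNION ALL
        SELECT ENO, ENAME, TITLE FROM EMP_SITE2;
    CREATE TABLE ASG (ENO TEXT, PNO TEXT, RESP TEXT, DUR INTEGER);
    CREATE TABLE PAY (TITLE TEXT PRIMARY KEY, SAL INTEGER);
    """
)
conn.executemany("INSERT INTO EMP_SITE1 VALUES (?, ?, ?)", [
    ("E1", "J. Doe", "Elect. Eng."), ("E2", "M. Smith", "Syst. Anal."),
    ("E3", "A. Lee", "Mech. Eng."), ("E4", "J. Miller", "Programmer"),
])
conn.executemany("INSERT INTO EMP_SITE2 VALUES (?, ?, ?)", [
    ("E5", "B. Casey", "Syst. Anal."), ("E6", "L. Chu", "Elect. Eng."),
    ("E7", "R. Davis", "Mech. Eng."), ("E8", "J. Jones", "Syst. Anal."),
])
conn.executemany("INSERT INTO ASG VALUES (?, ?, ?, ?)", [
    ("E1", "P1", "Manager", 12), ("E2", "P1", "Analyst", 24), ("E2", "P2", "Analyst", 6),
    ("E3", "P3", "Consultant", 10), ("E3", "P4", "Engineer", 48), ("E4", "P2", "Programmer", 18),
    ("E5", "P2", "Manager", 24), ("E6", "P4", "Manager", 48), ("E7", "P3", "Engineer", 36),
    ("E7", "P5", "Engineer", 23), ("E8", "P3", "Manager", 40),
])
conn.executemany("INSERT INTO PAY VALUES (?, ?)", [
    ("Elect. Eng.", 40000), ("Syst. Anal.", 34000), ("Mech. Eng.", 27000), ("Programmer", 24000),
])

# The user's query names only the global relations; no fragment or site appears in it.
QUERY = """
    SELECT ENAME, SAL
    FROM EMP, ASG, PAY
    WHERE DUR > 12
      AND EMP.ENO = ASG.ENO
      AND PAY.TITLE = EMP.TITLE
"""
result = sorted(conn.execute(QUERY).fetchall())
for ename, sal in result:
    print(ename, sal)
```

    With the example data the query returns eight rows; R. Davis appears twice because E7 has two assignments longer than 12, and J. Doe is absent because E1's only assignment has DUR = 12.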

    Distributed Database: User View

    Figure: the distributed database, seen by its users as a single database.

    Distributed DBMS: Reality

    Figure: several sites, each running DBMS software and serving user queries or user applications, connected through a communication subsystem.

    Potentially Improved Performance

    • A distributed DBMS fragments the conceptual database, enabling data to be stored in close proximity to its points of use (localization):
      - Each site handles only a portion of the database, so contention for CPU and I/O services is reduced.
      - Localization reduces remote access delays.

    • The inherent parallelism of a distributed system may be exploited for:
      - Inter-query parallelism
      - Intra-query parallelism

    Inter- and Intra-Query Parallelism

    • Inter-query parallelism: the ability to execute multiple queries at the same time.

    • Intra-query parallelism: the ability to break up a single query into a number of subqueries, each of which is executed at a different site, accessing a different part of the distributed database.
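    Both kinds of parallelism can be sketched with Python's concurrent.futures, assuming the ASG relation is split into three hypothetical site fragments. `intra_query_count` breaks one query into one sub-query per fragment and sums the partial counts; `run_concurrently` keeps several independent queries in flight at once. Threads only illustrate the structure here; in a distributed DBMS the sub-queries would run on different machines.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical placement of the ASG tuples (ENO, PNO, RESP, DUR) on three sites
ASG_FRAGMENTS = {
    "site1": [("E1", "P1", "Manager", 12), ("E2", "P1", "Analyst", 24),
              ("E2", "P2", "Analyst", 6), ("E3", "P3", "Consultant", 10)],
    "site2": [("E3", "P4", "Engineer", 48), ("E4", "P2", "Programmer", 18),
              ("E5", "P2", "Manager", 24), ("E6", "P4", "Manager", 48)],
    "site3": [("E7", "P3", "Engineer", 36), ("E7", "P5", "Engineer", 23),
              ("E8", "P3", "Manager", 40)],
}

def local_subquery(fragment, min_dur):
    """Sub-query executed at one site: count its assignments with DUR > min_dur."""
    return sum(1 for (_eno, _pno, _resp, dur) in fragment if dur > min_dur)

def intra_query_count(fragments, min_dur):
    """Intra-query parallelism: one query split into per-site sub-queries run concurrently."""
    with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
        partial_counts = pool.map(local_subquery, fragments.values(), [min_dur] * len(fragments))
        return sum(partial_counts)

def run_concurrently(queries):
    """Inter-query parallelism: several independent queries executed at the same time."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(query) for query in queries]
        return [f.result() for f in futures]

print(intra_query_count(ASG_FRAGMENTS, 12))  # 8
print(run_concurrently([
    lambda: intra_query_count(ASG_FRAGMENTS, 12),
    lambda: intra_query_count(ASG_FRAGMENTS, 30),
]))  # [8, 4]
```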

    Parallelism Requirements

    • If user access to the distributed database consists only of read-only access, then much of the database should be replicated.

    • If database accesses are not only read-only but a mix of read and update operations, then the database will require an implementation of concurrency control and commit protocols.
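    Commit protocols are what make mixed read/update access safe across sites. Below is a simplified, in-memory version of the two-phase commit (2PC) protocol, chosen here as the standard example of a commit protocol: the coordinator collects votes in phase 1 and sends one global decision in phase 2. `Participant` is a hypothetical class, and logging, timeouts and failure recovery, which a real implementation needs, are omitted.

```python
class Participant:
    """A hypothetical site taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # whether this site can make its updates durable
        self.state = "active"

    def prepare(self):
        """Phase 1: vote. A site that votes abort may abort unilaterally."""
        if self.can_commit:
            self.state = "ready"
            return "vote-commit"
        self.state = "aborted"
        return "vote-abort"

    def finish(self, decision):
        """Phase 2: apply the coordinator's global decision."""
        self.state = "committed" if decision == "commit" else "aborted"

def two_phase_commit(participants):
    """Coordinator: commit globally only if every participant voted to commit."""
    votes = [p.prepare() for p in participants]                          # phase 1
    decision = "commit" if all(v == "vote-commit" for v in votes) else "abort"
    for p in participants:                                               # phase 2
        p.finish(decision)
    return decision

sites = [Participant("Paris"), Participant("Boston"), Participant("Montreal", can_commit=False)]
print(two_phase_commit(sites))   # abort
print([p.state for p in sites])  # ['aborted', 'aborted', 'aborted']
```

    A single "no" vote forces every site to abort, which is how 2PC keeps all copies of an update either applied everywhere or nowhere.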

    Complicating Factors

    • Complexity of the architecture
    • Cost of implementation
    • Distribution of control
    • Security

    Distributed DBMS Problem Areas

    • Distributed Database Design
      - How to distribute the database
      - Replicated and non-replicated database distribution

    • Query Processing
      - Design algorithms that convert user transactions into data manipulation instructions
      - Optimization problem: min{cost = data transmission + local processing}
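    The optimization objective min{cost = data transmission + local processing} can be made concrete with a toy cost model. Everything here is an assumption for illustration: the `Plan` records, the byte and tuple counts (sized to the EMP and ASG example with hypothetical tuple widths of 40 and 20 bytes), and the unit costs `c_tr` per byte shipped and `c_lp` per tuple processed. The optimizer simply picks the candidate with the lowest total cost.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Plan:
    name: str
    bytes_shipped: int     # data transmission volume
    tuples_processed: int  # local processing volume

def cost(plan, c_tr, c_lp):
    """Total cost = data transmission + local processing."""
    return c_tr * plan.bytes_shipped + c_lp * plan.tuples_processed

def best_plan(plans, c_tr, c_lp):
    """Return the candidate plan with minimum total cost."""
    return min(plans, key=lambda p: cost(p, c_tr, c_lp))

# Hypothetical sizes: EMP = 8 tuples x 40 bytes, ASG = 11 tuples x 20 bytes
candidates = [
    Plan("ship EMP to the ASG site", bytes_shipped=8 * 40, tuples_processed=8 + 11),
    Plan("ship ASG to the EMP site", bytes_shipped=11 * 20, tuples_processed=8 + 11),
    Plan("ship both to the query site", bytes_shipped=8 * 40 + 11 * 20, tuples_processed=8 + 11),
]

chosen = best_plan(candidates, c_tr=1.0, c_lp=0.5)
print(chosen.name, cost(chosen, c_tr=1.0, c_lp=0.5))  # ship ASG to the EMP site 229.5
```

    Because all three plans process the same tuples, the choice here is decided entirely by transmission volume; a realistic model would also vary the processing term (for example when a semijoin reduces a relation before shipping).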

    Distributed DBMS Problem Areas

    • Distributed Directory Management
      - The directory contains information about data items in the database
      - Addresses the problem of how to arrange the directory: global to the entire DDBMS or local to each site; single copy or multiple copies

    • Operating System Support
      - An operating system with proper support for database operations
      - The dichotomy between general-purpose processing requirements and database processing requirements
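    A distributed directory can be pictured as a map from each fragment to the sites holding a copy. In this hypothetical sketch, the fragment names EMP1, EMP2 and ASG and their placement are invented for illustration: `GLOBAL_DIRECTORY` is one directory covering the entire DDBMS, `local_directory` derives the per-site alternative, and `locate` prefers a local copy before falling back to a remote one. Whether the directory itself is kept as a single copy or replicated is a separate choice not modeled here.

```python
# Hypothetical global directory: fragment -> sites holding a copy of that fragment
GLOBAL_DIRECTORY = {
    "EMP1": ["Paris", "Boston"],
    "EMP2": ["Montreal"],
    "ASG": ["New York", "Paris"],
}

def local_directory(site, directory):
    """Per-site alternative: a site records only the fragments it stores itself."""
    return {fragment: [site] for fragment, sites in directory.items() if site in sites}

def locate(fragment, requesting_site, directory):
    """Pick the site to read a fragment from: a local copy if one exists, else the first listed copy."""
    sites = directory.get(fragment)
    if not sites:
        raise KeyError(f"{fragment} is not recorded in the directory")
    return requesting_site if requesting_site in sites else sites[0]

print(locate("EMP1", "Boston", GLOBAL_DIRECTORY))  # Boston (local copy)
print(locate("EMP2", "Paris", GLOBAL_DIRECTORY))   # Montreal (remote copy)
print(local_directory("Paris", GLOBAL_DIRECTORY))  # {'EMP1': ['Paris'], 'ASG': ['Paris']}
```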

    Distributed DBMS Problem Areas

    • Concurrency Control
      - The integrity of the database is maintained
      - The most extensively studied problem in DDBMSs
      - Mutual consistency: all values of the multiple copies of every data item converge to the same value
      - Achieved through locking or timestamping

    • Reliability
      - How to make the system resilient to failures
      - Consistency
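    Timestamping, one of the two concurrency control approaches named above, can be sketched as the basic timestamp-ordering rule: each data item remembers the largest timestamps of the transactions that read and wrote it, and an operation that arrives too late is rejected so its transaction must restart. This minimal single-site sketch omits refinements such as the Thomas write rule and the distributed bookkeeping a DDBMS would need; the class name is hypothetical.

```python
class TimestampOrdering:
    """Basic timestamp-ordering (TO) scheduler over named data items."""

    def __init__(self):
        self.read_ts = {}   # item -> largest timestamp of a transaction that read it
        self.write_ts = {}  # item -> largest timestamp of a transaction that wrote it

    def read(self, ts, item):
        # Reject a read of a value already written by a younger transaction
        if ts < self.write_ts.get(item, 0):
            return False  # the transaction must abort and restart with a new timestamp
        self.read_ts[item] = max(self.read_ts.get(item, 0), ts)
        return True

    def write(self, ts, item):
        # Reject a write if a younger transaction already read or wrote the item
        if ts < self.read_ts.get(item, 0) or ts < self.write_ts.get(item, 0):
            return False
        self.write_ts[item] = ts
        return True

scheduler = TimestampOrdering()
print(scheduler.read(2, "x"))   # True: no younger writer yet
print(scheduler.write(1, "x"))  # False: x was already read by younger transaction 2
print(scheduler.write(3, "x"))  # True
print(scheduler.read(2, "x"))   # False: x was written by younger transaction 3
```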

    Relationship Between Issues

    Figure: relationships among Distribution Design, Directory Management, Query Processing, Concurrency Control, Deadlock Management and Reliability.

    Exercise

    • A bank has 3 branches at different locations. At each branch, a computer controls the teller terminals of the branch and the account database of that branch.

    • Draw two high-level distributed database designs for this bank, one for a LAN and the other for a WAN.

    Distributed database on LAN

    Distributed database on WAN

    Multiprocessor System