Chapter 17Chapter 17
Client-Server Processing and Distributed Databases
OutlineOutline
Overview of Distributed Processing and Distributed Data
Client-Server Database Architectures Web Database Connectivity Architectures for Distributed Database
Management Systems Transparency for Distributed Database
Processing Distributed Database Processing
Evolution of Distributed Evolution of Distributed Processing and Distributed DataProcessing and Distributed Data
Need to share resources across a networkTimesharing (1970s)Remote procedure calls (1980s)Client-server computing (1990s)
Timesharing NetworkTimesharing Network
Database
Terminal
Terminal
Terminal
Mainframe computer
Resource Sharing with a Resource Sharing with a Network of Personal ComputersNetwork of Personal Computers
Database
(a) Remote procedural call
(b) File sharing
Database
Procedural call
Return results
File request
File returned
Client-Server Processing with Client-Server Processing with Distributed processing onlyDistributed processing only
Database
Client
Client
Client
Server
Distributed processing and dataDistributed processing and data
DatabaseDatabase
Client
Client
Client
Client
Server Server
Motivation for Distributed Motivation for Distributed ProcessingProcessing Flexibility: the ease of maintaining and
adapting a systemScalability: the ability to support scalable
growth of hardware and software capacity Interoperability: open standards that allow
two or more systems to exchange and use software and data
Motivation for Distributed DataMotivation for Distributed Data
Data control: locate data to match an organization’s structure
Communication costs: locate data close to data usage to lower communication cost and improve performance
Reliability: increase data availability by replicating data at more than one site
Summary of Distributed Summary of Distributed Processing and DataProcessing and Data
Distributed Processing
Distributed Data
Advantages Flexibility, interoperability, scalability
Local control of data, improved performance, reduced communication costs, improved reliability
Disadvantages High complexity, high development cost, possible interoperability problems
High complexity, additional security concerns
Client-Server Database Client-Server Database ArchitecturesArchitectures Client-Server Architecture is an
arrangement of components (clients and servers) among computers connected by a network.
A client-server architecture supports efficient processing of messages (requests for service) between clients and servers.
Design IssuesDesign Issues
Division of processing: the allocation of tasks to clients and servers.
Process management: interoperability among clients and servers and efficiently processing messages between clients and servers.
• Middleware: software for process management
Tasks to DistributeTasks to Distribute Presentation: code to maintain the
graphical user interfaceValidation: code to ensure the consistency
of the database and user inputs Business logic: code to perform business
functionsWorkflow: code to ensure completion of
business processesData access: code to extract data to answer
queries and modify a database
MiddlewareMiddleware
A software component that performs process management.
Allow clients and servers to exist on different platforms.
Allows servers to efficiently process messages from a large number of clients.
Often located on a dedicated computer.
Client-Server Computing with Client-Server Computing with MiddlewareMiddleware
Middleware
Types of MiddlewareTypes of Middleware
Transaction-processing monitors: relieve the operating system of managing database processes
Message-oriented middleware: maintain a queue of messages
Object-request brokers: provide a high level of interoperability and message intelligence
Data access middleware: provide a uniform interface to relational and non relational data using SQL
Two-Tier ArchitectureTwo-Tier Architecture
Database
SQL statements
Query results
Database server
Two-Tier Client-Server Two-Tier Client-Server ArchitectureArchitecture
A PC client and a database server interact directly to request and transfer data.
The PC client contains the user interface code.
The server contains the data access logic. The PC client and the server share the
validation and business logic.
Three-Tier Architecture Three-Tier Architecture (Middleware Server)(Middleware Server)
Database
SQL statements
Query Results
Middleware server Database server
Three-Tier Architecture Three-Tier Architecture (Application Server)(Application Server)
(b) Application serverDatabase
SQL statements
Query results
Database server Application server
Three-Tier ArchitectureThree-Tier Architecture
To improve performance, the three-tier architecture adds another server layer either by a middleware server or an application server.
• The additional server software can reside on a separate computer.
• Alternatively, the additional server software can be distributed between the database server and PC clients.
Multiple-Tier ArchitectureMultiple-Tier Architecture
A client-server architecture with more than three layers: a PC client, a backend database server, an intervening middleware server, and application servers.
Provides more flexibility on division of processing
The application servers perform business logic and manage specialized kinds of data such as images.
Multiple-Tier ArchitectureMultiple-Tier Architecture
Database
Application server
Application server
Middleware server Database server
Multiple-Tier Architecture with Multiple-Tier Architecture with Software BusSoftware Bus
Database
Software bus
Application server Database server Application server
Web Database ConnectivityWeb Database Connectivity
Internet commerce depends heavily on database access for websites.
Web database connectivity allows a database to be manipulated through a Web page.
A user may use a Web form to change a database or view a report generated from a database.
Internet BasicsInternet Basics
Network of networksUses standard protocols: TCP/IP
– TCP: splits messages into packets– IP: routes messages
Each computer on the Internet has a unique numeric address known as an IP address.
Internet and Intranet RelationshipInternet and Intranet Relationship
Firewall
TCP/IP
TCP/IPTCP/IP
TCP/IP
TCP/IP TCP/IP
Intranet
Internet
World Wide WebWorld Wide Web
Most popular application on the InternetSupports browsing pages located on any
computer on the InternetHypertext Transport Protocol (HTTP)
establishes a session between a browser and a Web server.
Each page has a unique address known as a URL.
Web Page Request CycleWeb Page Request Cycle
User clicks hyperlink.
Web server locatespage.
1
3
Web server sends file.
Browser sends request toweb server.
2
4
Browser displays file.5
XML/XSLXML/XSL
Solutions to HTML limitationseXtensible Markup Language (XML)
– Separates content and structure of a document– Use document type declaration or schema to
specify document structureeXtensible Style Language (XSL):
supports transformation into display languages
Both are extensible languages
The Common Gateway The Common Gateway Interface (CGI)Interface (CGI)CGI is an interface that allows a Web
server to invoke an external program on the same computer.
The external program uses the parameters passed by the Web server to produce output that is sent back to the browser.
Usually, the output contains HTML/XML so that the browser can display it properly.
Straight CGIStraight CGI
Web serverDatabase
serverExternalprogram
Parameters SQL
HTML/XML Results
Database
Hybrid CGIHybrid CGI
Web serverPartnerprogram
Externalprogram
Parameters SQL
Results
Databaseserver
Parameters
Database
HTML/XML HTML/XML
Server-side connectivity Server-side connectivity Server-side connectivity bypasses the
external program needed with the CGI approaches.
Specialized Web server or middleware server is needed
SQL statements and database logic are kept in a web page or external file.
The database code can execute stored procedures on the database server.
Server-Side Connectivity Server-Side Connectivity ApproachApproach
Web serverwith
middlewareextension
Databaseserver
SQL
HTML/XML
SQL statementsand formatting requirements
Database
Server-Side Connectivity with a Server-Side Connectivity with a Middleware ServerMiddleware Server
Web serverMiddleware
serverwith listener
SQL
Results
SQL statementsand formatting requirements
Database
Databaseserver
Databaserequest
HTML/XML
Client-Side ConnectivityClient-Side Connectivity
Client computing capacity can be more fully utilized without storing code on the client.
Provides a more customized interface than permitted by HTML
Supports data buffering by the client to improve performance
Web Page Request Cycle with Web Page Request Cycle with Client-Side ConnectivityClient-Side Connectivity
Web server sends filecontaining HTML/XMLand embedded applet(Java) or binary object(ActiveX).
Browser sends request toweb server.
Summary of Web ConnectivitySummary of Web ConnectivityApproach Architecture Example Product Comments
Straight CGI Two-tier Apache Web server with external PERL program
Inexpensive; portability only limited by external program; limited scalability
Hybrid CGI Two-tier Apache Web server with Cold Fusion server extensions
More expensive than straight CGI; portability depends on partner program; scalable to modest loads
Server-side connectivity
Three-tier and multiple-tier
Internet Information Server with Microsoft Transaction Server
Expensive; Web server dependent; highly scalable
Server-side connectivity (Middleware)
Three-tier and multiple-tier
Oracle Application Server
Expensive; Web server independent; highly scalable
Client-side connectivity
Two- and multiple-tier
Microsoft Remote Data Service
Customized client interface; efficient data buffering; usually works with server-side connectivity approaches
Architectures for Distributed Architectures for Distributed Database Management Systems Database Management Systems
DBMSs need fundamental extensions.Underlying the extensions are a different
component architecture and a different schema architecture.
Component Architecture manages distributed database requests.
Schema Architecture provides additional layers of data description.
Global RequestsGlobal Requests
Customer-order data
Product data
Customer-order data
Product data
Component ArchitectureComponent Architecture
DB
Site 1DDM
LDM
DDM
DDM
LDM
DB
GD
GD
Site 2
Site 3
GD
Schema Architecture ISchema Architecture IExternal
schema 1External
schema 2External
schema n...
Conceptualschema
Fragmentationschema
Allocationschema
Internalschema 1
Internalschema 2
Internalschema m...
m Sites
Schema Architecture IISchema Architecture II Global
externalschema 1
Globalexternal
schema 2
Globalexternal
schema n...
Globalconceptual
schema
Site 1 localmappingschema
Site 1 localschemas
(conceptual,internal,external) ...
m Sites
Site 2 localschemas
(conceptual,internal,external)
Site m localschemas
(conceptual,internal,external)
Site 2 localmappingschema
Site m localmappingschema
...
Transparency for Distributed Transparency for Distributed Database ProcessingDatabase Processing Transparency is related to data independence. With transparency, users can write queries
with no knowledge of the distribution, and distribution changes will not cause changes to existing queries and transactions.
Without transparency, users must reference some distribution details in queries and distribution changes can lead to changes in existing queries.
Motivating ExampleMotivating Example
CustNameCustCityCustStateCustZipCustRegion
CustNo
Customer
OrdDateOrdAmtCustNo
OrdNo
Order
OrdCity
ProdNoOrdNo
OrderLine
ProdNameProdColorProdPrice
ProdNo
Product
QOHWarehouseNoProdNo
StockNo
Inventory1
11
1
8
8
8 8
Fragments Based on the Fragments Based on the CustRegionCustRegion Field Field
CREATE FRAGMENT Western-Customers AS SELECT * FROM Customer WHERE CustRegion = 'West'
CREATE FRAGMENT Western-Orders AS SELECT Order.* FROM Order, Customer WHERE Order.CustNo = Customer.CustNo AND CustRegion = 'West'
CREATE FRAGMENT Western-OrderLines AS SELECT OrderLine.* FROM Customer, OrderLine, Order WHERE OrderLine.OrdNo = Order.OrdNo AND Order.CustNo = Customer.CustNo AND CustRegion = 'West'
CREATE FRAGMENT Eastern-Customers AS SELECT * FROM Customer WHERE CustRegion = 'East'
CREATE FRAGMENT Eastern-Orders AS SELECT Order.* FROM Order, Customer WHERE Order.CustNo = Customer.CustNo AND CustRegion = 'East'
CREATE FRAGMENT Eastern-OrderLines AS SELECT OrderLine.* FROM Customer, OrderLine, Order WHERE OrderLine.OrdNo = Order.OrdNo AND Order.CustNo = Customer.CustNo AND CustRegion = 'East'
Fragments Based on the Fragments Based on the WareHouseNoWareHouseNo Field Field
CREATE FRAGMENT Denver-Inventory AS SELECT * FROM Inventory WHERE WareHouseNo = 1
CREATE FRAGMENT Seattle-Inventory AS SELECT * FROM Inventory WHERE WareHouseNo = 2
Fragmentation TransparencyFragmentation Transparency
Fragmentation transparency provides the highest level of data independence.
Users formulate queries and transactions without knowledge of fragments (locations, or local formats).
If fragments change, queries and transactions are not affected.
Location TransparencyLocation Transparency
Location transparency provides a lesser level of data independence than fragmentation transparency.
Users need to reference fragments in formulating queries and transactions.
However, knowledge of locations and local formats is not necessary.
Local Mapping TransparencyLocal Mapping Transparency
Local mapping transparency provides a lesser level of data independence than location transparency.
Users need to reference fragments at sites in formulating queries and transactions.
However, knowledge of local formats is not necessary.
Distributed Database Distributed Database ProcessingProcessing Distributed data adds considerable
complexity to query processing and transaction processing.
Distributed database processing involves movement of data, remote processing, and site coordination.
Performance implications sometimes cannot be hidden.
Distributed query processingDistributed query processing
Involves both local (intra site) and global (inter site) optimization.
Multiple optimization objectivesThe weighting of communication costs
versus local processing costs depends on network characteristics.
There are many more possible access plans for a distributed query.
Distributed Transaction Distributed Transaction ProcessingProcessing Distributed DBMS provides concurrency
and recovery transparency. Independently operating sites must be
coordinated. New kinds of failures exist because of the
communication network.New protocols are necessary.
Distributed Concurrency ControlDistributed Concurrency Control The simplest scheme involves centralized
coordination.Centralized coordination involves the
fewest messages and the simplest deadlock detection.
The number of messages can be twice as much in distributed coordination.
Primary Copy Protocol is used to reduce overhead with locking multiple copies.
Centralized CoordinationCentralized Coordination
Centralcoordinator
Subtransaction 1at Site x
......Subtransaction 2at Site y
Subtransaction nat Site z
Lockrequest
Lockstatus
Distributed Recovery Distributed Recovery ManagementManagement Distributed DBMSs must contend with
failures of communication links and sites.Detecting failures involves coordination
among sites. The recovery manager must ensure that
different parts of a partitioned network act in unison.
The protocol for distributed recovery is the two phase commit protocol (2PC).
Voting and Decision PhasesVoting and Decision Phases
Voting phase
Coordinator Participant
Write Begin-Commit to log.Send Ready messages.Wait for responses.
Force updates to disk.If no failure, Write Ready-Commit to log. Send Ready vote.Else send Abort vote.
If all sites vote ready before timeout, Write Global Commit record. Send Commit messages. Wait for Acknowledgments.Else send Abort messages.
Decision phase
Write Commit to log.Release locks.Send acknowledgment.
Wait for acknowledgments.Resend Commit messages if necessary.Write global end of transaction.
1
2
3
4
5
SummarySummary
Utilizing distributed processing and data can significantly improve DBMS services but at the cost of new design challenges.
Several client-server architectures provide alternatives among cost, complexity, and benefit levels.
Architectures for distributed DBMSs differ in the integration of the local databases and level of data independence.