
INFORMATION MANAGEMENT: STRATEGY, SYSTEMS, AND TECHNOLOGIES

MASSIVELY PARALLEL PROCESSING: ARCHITECTURE AND TECHNOLOGIES

Prem N. Mehra

INSIDE

System Components, Hardware Configurations, Multiprocessors: Scalability and Throughput, Software Layers, DBMS Architecture

INTRODUCTION

Organizations today require a great deal more processing power to accommodate new kinds of applications, such as an information delivery facility, data mining, and electronic commerce. This need for large computing cycles may be met relatively inexpensively through massively parallel processing.

Although massively parallel processors have been around for some time, their use has been limited primarily to scientific and technical applications, and to applications in the defense industry. Today, parallel processing is moving out to organizations in other industries, which find their own data volumes growing exponentially. The growth is not only from traditional transaction systems but also from new data types, such as text, images, audio, and video. Massively parallel processing is enabling companies to exploit vast amounts of data for both strategic and operational purposes.

Parallel processing technology is potentially very powerful, capable of providing a performance boost two to three orders of magnitude greater than sequential uniprocessing and of scaling up gracefully with data base sizes. However, parallel processing is also easy to misuse. Data base management system (DBMS) and hardware vendors offer a wide variety of products, and the technical features of parallel processing are complex. Therefore, organizations must work from a well-considered vantage point, in terms of application, information, and technical architectures.

PAYOFF IDEA

Massively parallel processing, once exclusively limited to scientific and technical applications, offers organizations the ability to exploit vast amounts of data for both strategic and operational purposes. This article reviews parallel processing concepts, current and potential applications, and the architectures needed to exploit parallel processing — especially massively parallel processing — in the commercial marketplace.

Auerbach Publications © 1998 CRC Press LLC

This article reviews parallel processing concepts, current and potential applications, and the architectures needed to exploit parallel processing — especially massively parallel processing — in the commercial marketplace. Article 3-02-48 focuses on implementation considerations.

PARALLEL PROCESSING CONCEPTS

Massively parallel computing should be placed within the context of a more general arena of parallelism, which has been in use in the commercial marketplace for some time. Many variations and algorithms for parallelism are possible; some of the more common ones are listed here:

• OLTP. Concurrent execution of multiple copies of a transaction program under the management of an online transaction processing (OLTP) manager is one example. A program written to execute serially is enabled to execute in parallel by the facilities provided and managed by the OLTP manager.

• Batch. For many long-running batch processes, it is common to execute multiple streams in parallel against distinct key ranges of files and merge the output to reduce the total elapsed time. Operating system (OS) or DBMS facilities enable the multiple streams to serialize the use of common resources to ensure data integrity.

• Information delivery facility (data warehouse). The use of nonblocking, asynchronous, or pre-fetch input/output (I/O) reduces the elapsed time in batch and information delivery facility environments by overlapping I/O and CPU processing. The overlap, parallelism in a loose sense, is enabled by the DBMS and OS without user program coding specifically to invoke such overlap.
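
As a concrete illustration of this I/O-and-processing overlap, the following sketch (not part of the original article; the block contents and timings are invented for illustration) uses a background thread to pre-fetch the next block of data while the current block is being processed:

    import threading
    import queue
    import time

    def read_block(block_id):
        """Simulate a slow I/O read of one data block."""
        time.sleep(0.05)                  # stand-in for disk latency
        return f"block-{block_id}"

    def process_block(block):
        """Simulate CPU work on one block."""
        time.sleep(0.05)

    def prefetch_reader(num_blocks, buffer):
        """Producer: read blocks ahead of the consumer."""
        for i in range(num_blocks):
            buffer.put(read_block(i))     # blocks if the look-ahead buffer is full
        buffer.put(None)                  # sentinel: no more blocks

    def run(num_blocks=20):
        buffer = queue.Queue(maxsize=2)   # small look-ahead buffer
        reader = threading.Thread(target=prefetch_reader, args=(num_blocks, buffer))
        start = time.time()
        reader.start()
        while True:
            block = buffer.get()
            if block is None:
                break
            process_block(block)          # CPU work overlaps the next read
        reader.join()
        print(f"elapsed with overlap: {time.time() - start:.2f}s "
              f"(vs. ~{num_blocks * 0.10:.2f}s if strictly serial)")

    if __name__ == "__main__":
        run()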

Recent enhancements in DBMS technologies are now enabling user programs, still written in a sequential fashion, to exploit parallelism even further in a transparent way. The DBMS enables user SQL (structured query language) requests to exploit facilities offered by the new massively parallel processors by concurrently executing components of a single SQL statement. This is known as intra-SQL parallelism. Large and well-known data warehouse implementations at retail and telecommunications organizations (such as Sears, Roebuck and Co. and MCI Corp., respectively) are based on exploiting this form of parallelism. The various architectures and techniques used to enable this style of parallelism are covered in this article.
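
To make the idea concrete, here is a small sketch (hypothetical data and code, not drawn from any vendor's product) of what a parallel DBMS does beneath a single SELECT ... SUM(...) GROUP BY statement: the same partial aggregation runs against every data partition concurrently, and a coordinator merges the partial results into the final answer set.

    from concurrent.futures import ThreadPoolExecutor
    from collections import defaultdict

    # Hypothetical partitions of a sales table, as a parallel DBMS might store them.
    PARTITIONS = [
        [("east", 100), ("west", 250), ("east", 75)],
        [("west", 300), ("north", 40)],
        [("north", 60), ("east", 20), ("west", 10)],
    ]

    def partial_aggregate(partition):
        """Worker step: SUM(amount) GROUP BY region over one partition only."""
        totals = defaultdict(int)
        for region, amount in partition:
            totals[region] += amount
        return totals

    def merge(partials):
        """Coordinator step: combine partial results into the final answer set."""
        final = defaultdict(int)
        for part in partials:
            for region, subtotal in part.items():
                final[region] += subtotal
        return dict(final)

    with ThreadPoolExecutor(max_workers=len(PARTITIONS)) as pool:
        partials = list(pool.map(partial_aggregate, PARTITIONS))

    print(merge(partials))   # {'east': 195, 'west': 560, 'north': 100}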

Facets of Parallelism

From a hardware perspective, this article focuses on a style of parallelism that is enabled when multiple processors are arranged in different configurations to permit parallelism for a single request. Examples of such configurations include symmetric multiprocessors (SMP), shared-disk clusters (SDC), and massively parallel processors (MPP).

From an external, high-level perspective, multiple processors in a computer are not necessary to enable parallel processing. Several other techniques can enable parallelism, or at least give the appearance of parallelism, regardless of the processor environment. These styles of parallelism have been in use for some time:

• Multitasking or multiprogramming. This enables several OLTP or batch jobs to concurrently share a uniprocessor, giving the appearance of parallelism.

• Specialized processing. Such functions as I/O managers can be executed on specialized processors to permit the computer's main processor to work in parallel on some other task.

• Distributed processing. Similarly, the peer-to-peer or client/server style of distributed processing permits different components of a transaction to execute in parallel on different computers, displaying yet another facet of parallelism.

Multiprocessors, important as they are, are only one component in enabling commercial application program parallelism. Currently, the key component for exploiting parallelism in most commercial applications is the DBMS, whose newly introduced intra-SQL parallelism facility has propelled it to this key role. This facility permits users to exploit multiprocessor hardware without explicitly coding for such configurations in their programs.

System Components

In general, from the perspective of commercial application programs, such facilities as parallelism are characteristics of the entire system, not merely of its components, which include the hardware, operating system, infrastructure, and DBMS. However, each of the components plays an important role in determining the system characteristics.

Exhibit 1 shows one way to depict a system in a layered fashion. The hardware layer, which consists of a multiple processor configuration, offers the base component. The operating system (OS) and the function and I/O shipping facilities, called infrastructure, influence the programming model that the DBMS layer uses to exhibit the system characteristics, which are exploited by the application program.

This article first discusses the hardware configurations and their characteristics, including scalability. It then discusses the DBMS architectures, namely shared data and buffer, shared data, and partitioned data. The article focuses on the DBMS layer because it is generally recognized that the DBMS has become the key enabler of parallelism in the commercial marketplace. The OS and infrastructure layer will not be explicitly discussed, but their characteristics are interwoven in the hardware and the DBMS discussion.

HARDWARE CONFIGURATIONS

Processors are coupled and offered to commercial users in multiple processor configurations for many reasons. These have to do with a number of factors — performance, availability, technology, and competition — and their complex interaction in the marketplace.

Multiprocessors provide enhanced performance and throughput because more computing resources can be brought to bear on the problem. In addition, they also provide higher data availability. Data can be accessed from another processor in the event of a processor failure. Historically, enhanced data availability has been a key motivation and driver for using multiple processors.

EXHIBIT 1 — System Components


Multiprocessor Classification

A formal and technically precise multiprocessor classification scheme developed by Michael J. Flynn is widely used in technical journals and by research organizations. Flynn introduced a number of programming models, including the multiple instruction multiple data stream (MIMD), the single instruction multiple data stream (SIMD), the single instruction single data stream (SISD), and the multiple instruction single data stream (MISD). This taxonomy, based on the two key characteristics of processing and data, is still used. For the purposes of this article, however, multiprocessors can be classified into three simple types:

1. SMP (symmetric multiprocessors)
2. SDC (shared-disk clusters)
3. MPP (massively parallel processors)

Because this simplification can lead to misunderstandings, it is advisable to clarify these terms early in any discussion to avoid the pitfalls of oversimplification.

How are multiprocessors distinguished from uniprocessors? The basic building block for other configurations, a uniprocessor is a single processor with a high-speed cache and a bus to access main memory and external I/O. Processors are very fast and need data at high speeds to avoid wasting valuable processing cycles. On the other hand, memory, the supplier of data, is made of slow-speed, inexpensive dynamic random access memory (DRAM) chips, which cannot keep up with the demands of these processors. Cache provides a small amount of buffered data at high speeds to the processor to balance the fast data requirements of the processor and the slow speed of DRAM memory. Cache is not as fast as a processor but much faster than main memory, and is made from fast but expensive static random access memory (SRAM) chips.
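
A rough calculation shows why the cache matters. The latencies below are illustrative assumptions only, not figures from the article:

    # Illustrative latencies (nanoseconds); real values vary widely by machine.
    CACHE_LATENCY = 10.0      # SRAM cache access
    DRAM_LATENCY = 100.0      # main memory access

    def average_access_time(hit_rate):
        """Expected memory latency seen by the processor for a given cache hit rate."""
        return hit_rate * CACHE_LATENCY + (1.0 - hit_rate) * DRAM_LATENCY

    for hit_rate in (0.0, 0.80, 0.95, 0.99):
        print(f"hit rate {hit_rate:4.0%}: {average_access_time(hit_rate):6.1f} ns")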

Even though SMPs, clusters, and MPPs are being produced to provide additional processing power, vendors are also continuing to enhance uniprocessor technology. Intel's 386, 486, Pentium, and Pentium Pro processors exemplify this trend. The enhancements in the PowerPC line from the alliance among Apple, IBM, and Motorola are another example. Although research and development costs for each generation of these processors require major capital investment, competitive pressures and technological enhancements continue to drive the vendors to develop faster uniprocessors, which are also useful in reducing the overall elapsed time for those tasks that cannot be performed in parallel. This is a very significant consideration and was highlighted by Amdahl's law.
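
Amdahl's law makes the point precise: if only a fraction p of the work can be parallelized, the speedup on n processors is 1 / ((1 - p) + p / n), so the serial remainder quickly dominates. A small worked sketch (the 90 percent figure is an arbitrary assumption):

    def amdahl_speedup(parallel_fraction, processors):
        """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
        return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / processors)

    # Even with 90% of the work parallelizable, 1,000 processors give only ~10x.
    for n in (10, 100, 1000):
        print(f"{n:5d} processors: {amdahl_speedup(0.90, n):5.2f}x speedup")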

An SMP is a manifestation of a shared-everything configuration (see Exhibit 2). It is characterized by having multiple processors, each with its own cache, but all sharing memory and I/O devices. Here, the processors are said to be tightly coupled.


The SMP configuration is very popular and has been in use for over two decades. The IBM System/370 158MP and 168MP were so configured. Currently, several vendors (including COMPAQ, DEC, HP, IBM, NCR, SEQUENT, SGI, and Sun) offer SMP configurations.

A very important consideration that favors the choice of SMP is the availability of a vast amount of already written software, for example, DBMS, that is based on the programming model that SMP supports. Other alternative platforms, such as loosely coupled multiprocessors, currently do not have as rich a collection of enabling software.

The fact that multiple processors exist is generally transparent to a user-application program. The transparency is enabled by the OS software layer, which hides the complexity introduced by the multiple processors and their caches, and permits increased horsepower exploitation relatively easily. It is also efficient because a task running on any processor can access all the main memory and share data by pointer passing as opposed to messaging.

The subject of how many processors can be tightly coupled together to provide for growth (i.e., scalability) has been a topic of vast interest and research. One of the inhibitors to coupling too many processors is contention for memory access. High-speed cache can reduce the contention because main memory needs to be accessed less frequently; however, high-speed cache introduces another problem: cache coherence. Because each processor has its own cache, coherence among them has to be maintained to ensure that multiple processors are not working with a stale copy of data from main memory.

EXHIBIT 2 — The Symmetric Multiprocessor

Conventional wisdom has held that because of cache coherence and other technical considerations, the practical upper limit for the number of processors that could be tightly coupled in an SMP configuration is between 10 and 20. However, over the years, a number of new schemes, including Central Directory, Snoopy Bus, Snoopy Bus and Switch, Distributed Directories using Scalable Coherent Interconnect (SCI), and the Directory Architecture for Shared Memory (DASH), have been invented and implemented by SMP vendors to overcome the challenge of maintaining cache coherence. Even now, new innovations are being rolled out, so SMPs have not necessarily reached their scalability limits. These newer techniques have made it possible to scale well beyond the conventionally accepted limit, but at an increased cost of research and development. So economics, as well as technology, has become an important issue in SMP scalability. The question, then, is not whether SMP can scale, but whether it can scale economically compared to other alternatives, such as loosely coupled processors (e.g., clusters and MPPs).

Clusters

A cluster is a collection of interconnected whole computers utilized as a single computing resource. Clusters can be configured using either the shared-something or shared-nothing concept. This section discusses the shared-something clusters, and the MPP section covers the shared-nothing clusters.

Shared-something clusters may share memory or I/O devices. Shared I/O clusters are very popular; various DEC clusters and IBM Sysplex are examples. Because of their wide use, people colloquially use the word cluster to mean a shared-disk configuration. However, sharing disks is not a requirement of a cluster; for that matter, no sharing is necessary at all for clustering.

Digital Equipment Corporation has offered the VAXCluster since the early 1980s (see Exhibit 3), and followed it with the OpenVMS Cluster and, more recently, DEC TruClusters.

The IBM Parallel Sysplex configuration uses a shared electronic storage (i.e., a coupling facility) to communicate locking and other common data structures between the cluster members. This helps in system performance.

A shared-disk cluster (SDC) presents a single system image to a user application program. OS, DBMS, and load-balancing infrastructure can hide the presence and complexities of a multitude of processors. Nonetheless, a user application program can target a specific computer if it has an affinity for it for any reason, such as the need for a specialized I/O device.


Shared-disk clusters offer many benefits in the commercial marketplace, including high availability, incremental growth, and scalability. If one processor is down for either a planned or an unplanned outage, other processors in the complex can pick up its workload, thus increasing data availability. Additional processors can be incrementally added to the cluster to match the workload, as opposed to having to purchase a single, large computer to meet the eventual workload. Also, the scalability challenge faced by the SMP is not encountered here because the independent processors do not share common memory.

Clusters face the challenge of additional overhead caused by interprocessor message passing, workload synchronization and balancing, and software pricing. Depending on the pricing model used by the software vendor, the sum of software license costs for equivalent computing power can be higher for a cluster as compared to an SMP, because multiple licenses are required.

EXHIBIT 3 — The VAXCluster Configuration

The Massively Parallel Processor (MPP)

In an MPP configuration, each processor has its own memory and accesses its own external I/O. This is a shared-nothing architecture, and the processors are said to be loosely coupled. An MPP permits the coupling of thousands of relatively inexpensive off-the-shelf processors to provide billions of computing instructions. Because an MPP implies coupling a large number of processors, an interconnection mechanism (e.g., buses, rings, hypercubes, and meshes) is required for the processors to communicate with each other and coordinate their work. Currently, many vendors, including IBM, ICL, NCR, and nCUBE, offer MPP configurations.

From a simplistic I/O and memory sharing point of view, there is a distinction between MPP and SMP architectures. However, as will be discussed later in the article, operating systems (OSs) or other software layers can mask some of these differences, permitting some software written for other configurations, such as shared disk, to be executed on an MPP. For example, the virtual shared-disk feature of IBM's shared-nothing RS/6000 SP permits higher-level programs, for example the Oracle DBMS, to use this MPP as if it were a shared-disk configuration.

Before the mid-1980s, the use of MPPs was essentially limited to scientific and engineering work, which typically requires a large number of computing instructions but limited I/O. Such a task can often be split into many individual subtasks, each executing on a processor and performing minimal I/O. Computational fluid dynamics is an example of this. Subtask results are then combined to prepare the final response. The decomposition of tasks and combination of results required sophisticated programming tools and skills that were found in the scientific and engineering community.
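
The decompose-and-combine pattern can be sketched as follows; the "work" here is a toy computation standing in for a real scientific kernel, and the chunk sizes are arbitrary assumptions:

    from multiprocessing import Pool

    def subtask(chunk):
        """Each processor works on its own slice of the problem with minimal I/O."""
        return sum(x * x for x in chunk)      # stand-in for real numeric work

    def decompose(data, pieces):
        """Split the overall task into roughly equal subtasks."""
        size = (len(data) + pieces - 1) // pieces
        return [data[i:i + size] for i in range(0, len(data), size)]

    if __name__ == "__main__":
        data = list(range(1_000_000))
        chunks = decompose(data, pieces=8)
        with Pool(processes=8) as pool:
            partial_results = pool.map(subtask, chunks)   # scatter the subtasks
        print(sum(partial_results))                       # combine into the final response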

Traditional commercial work, on the other hand, requires relatively fewer computing cycles (though this is changing) but higher I/O bandwidth. In addition, typical programming tools and required skills for exploiting MPPs in commercial environments were not available. The combination of these factors resulted in minimal use of this technology in the commercial arena.

In the late 1980s, offerings such as NCR's DBC/1012 Teradata data base machine started to permit online analytical processing (OLAP) of commercial workloads to benefit from MPP configurations. More recently, innovations introduced by several data base vendors, such as Oracle and Informix, have further accelerated this trend.

An important point is that a key ingredient for the acceptance of MPP in the commercial marketplace is the ease with which parallelism can be exploited by a DBMS application programmer without the programmer needing to think in terms of parallel programs or needing to implement sophisticated and complex algorithms.

The technique used by the DBMSs to exploit parallelism is the same as previously mentioned for the scientific work. A single task (i.e., an SQL request) is split into many individual subtasks, each executing on a processor. Subtask results are then combined to prepare the answer set for the SQL request. In addition, the packaging of processors in an MPP configuration offers several administrative benefits. The multitude of processors can be managed from a single console, and distribution and maintenance of software are simplified because only a single copy is maintained. The system makes the same copy available to the individual nodes in the MPP. This administrative benefit is very attractive and at times is used to justify an MPP solution, even when no parallelism benefits exist.

At times, financial requests for additional hardware may also be less of an administrative concern, because the upgrade (i.e., the addition of processors and I/Os) is made to an existing installed serial number machine and does not require the acquisition of a new one, which may involve more sign-offs and other administrative steps.

Currently, DBMS vendors focus on MPP technology to implement the strategic information processing (e.g., data warehouse) style of applications. They have placed only a limited emphasis on operational applications. As more experience is gained and the technology matures, it is likely that more operational applications, such as OLTP, will find a home on the MPP platform.

However, the current reduction in government grants to scientific and defense organizations, the slow development of the software, and fierce competition in the MPP marketplace have already started a shakedown in the industry. One analysis by the Wall Street Journal has suggested that by the time the commercial marketplace develops, quite a few suppliers will “sputter toward the abyss.”

In addition to MPP, the shared-nothing configuration can also be implemented in a cluster of computers where the coupling is limited to a low number, as opposed to a high number, which is the case with an MPP. In general, this shared-nothing lightly (or modestly) parallel cluster exhibits characteristics similar to those of an MPP.

The distinction between MPP and a lightly parallel cluster is somewhat blurry. The following table shows a comparison of some salient features for distinguishing the two configurations. The most noticeable feature appears to be the arbitrary number of connected processors, which is large for MPP and small for a lightly parallel cluster.

Characteristic            MPP                Lightly Parallel Cluster
Number of processors      Thousands          Hundreds
Node OS                   Homogeneous        Can be different, but usually homogeneous
Inter-node security       None               Generally none
Performance metric        Turnaround time    Throughput and turnaround

From a programming model perspective, given the right infrastructure, an MPP and a shared-nothing cluster should be transparent to an application, such as a DBMS. As discussed in a later section, IBM's DB2 Common Universal Server DBMS can execute both on a shared-nothing, lightly parallel cluster of RS/6000 computers and on IBM's massively parallel hardware, the RS/6000 SP.

Lightly parallel, shared-nothing clusters may become the platform of choice in the future for several reasons, including low cost. However, software currently is not widely available to provide a single system image and tools for performance, capacity, and workload management.

It is expected that Microsoft Corp.'s architecture for scaling Windows NT and SQL Server to meet enterprisewide needs may include this style of lightly parallel, shared-nothing clustering. If this comes to pass, shared-nothing clusters will become very popular and overshadow their big cousin, the MPP.

The various hardware configurations also offer varying levels of scalability (i.e., how much usable processing power is delivered to the users when such additional computing resources as processors, memory, and I/O are added to these configurations). This ability to scale is clearly one of the major considerations in evaluating and selecting a multiprocessor platform for use.

MULTIPROCESSORS: SCALABILITY AND THROUGHPUT

An installation has several hardware options when requiring additional computational power. The choice is made based on many technical, administrative, and financial considerations. From a technical perspective alone, there is considerable debate as to how scalable the various hardware configurations are.

One commonly held opinion is that the uniprocessors and massively parallel processors represent the two ends of the spectrum, with SMP and clusters providing the intermediate scalability design points. Exhibit 4 depicts this graphically.

As discussed in the previous section, a commonly held opinion is that an SMP can economically scale only to between 10 and 20 processors, beyond which alternative configurations, such as clusters and MPPs, become financially more attractive. Whether that is true depends on many considerations, such as hardware technology, software availability and pricing, and technical marketing and support from the vendors. Considering all the factors, one can observe that hardware technology, by itself, plays only a minor role.

SMPs currently enjoy the most software availability and perhaps a software price advantage. Shared-disk clusters, in general, suffer the penalty of higher software prices because each machine is a separate computer requiring an additional license, although the licenses are comparatively less expensive, as the individual machines are smaller than a corresponding equivalent single SMP. However, new pricing schemes based on total computing power, number of concurrent logged-on users, or other factors are slowly starting to alter the old pricing paradigm, which put clusters at a disadvantage.

Clusters and MPPs do not suffer from the cache coherence and memory contention problems that pose SMP design challenges in scaling beyond the 10-to-20 range. Here, each processor has its own cache, so no coherence needs to be maintained, and each has its own memory, so no contention occurs. Therefore, from a hardware viewpoint, scaling is not as challenging. The challenge lies, however, in interconnecting the processors and enabling software so that the work on the separate processors can be coordinated and synchronized in an efficient and effective way.

To provide more computing resources, vendors are including SMPs as individual nodes in cluster and MPP configurations. NCR's WorldMark 5100M is one example in which individual nodes are made of SMPs. Thus, the achievement of huge processing power is a multitiered phenomenon: increasing speeds of uniprocessors, the combination of faster and larger numbers of processors in an SMP configuration, and the inclusion of SMPs in cluster and MPP offerings.

SOFTWARE LAYERS

All the approaches to parallelism discussed to this point have touched on multiprocessing. All the approaches manifest and support parallelism in different ways; however, one underlying theme is common to all. It may not be very obvious, but it needs to be emphasized. In almost all cases, with few exceptions, a commercial application program is written in a sequential fashion, and parallelism is attained by using some external mechanism. Here are some examples to illustrate the point.

EXHIBIT 4 — Scalability


During execution, an OLTP manager spawns separate threads or processes, schedules clones of the user's sequential program, and manages storage and other resources on its behalf to manifest parallelism. In batch, multiple job streams executing an application program are often organized to process different files or partitions of a table to support parallelism. In the client/server model, multiple clients execute the same sequential program in parallel; the mechanism that enables parallelism is the software distribution facility. Another way of looking at client/server parallelism is the execution of a client program in parallel with an asynchronous stored procedure on the server; the facility of remote procedure calls and stored procedures enables parallelism.

In the same way, a single SQL call from an application program can be enabled to execute in parallel by a DBMS. The application program is still written to execute sequentially, requiring neither new compilers nor new programming techniques. This can be contrasted with building parallelism within a program using special constructs (e.g., doall and foreach), in which the FORTRAN compiler generates code to execute subtasks in parallel for the different elements of a vector or matrix. Such constructs, if they were to become widely available in commercial languages, would still require significant retraining of application programmers to think of a problem solution in terms of parallelism. Such facilities are not available. In addition, from an installation's point of view, the approach of a sequential program with DBMS-enabled parallelism is much easier and less expensive to implement. The required new learning can be limited to the designers and supporters of the data bases: data base and system administrators. Similarly, tools acquisition can also be focused toward the tasks performed by such personnel.

Now, SQL statements exhibit varying amounts of workload on a system. At one extreme, calls (generally associated with transaction-oriented work) that retrieve and update a few rows require little computational resource. At the other extreme, statements (generally associated with OLAP work) perform large amounts of work and require significant computational resources. Within this wide range lie the rest. By their very nature, those associated with OLAP work, performed within a data warehouse environment, can benefit most from intra-SQL parallelism, and those are the current primary target of the parallelism.

Because user-application code attains parallelism only through the use of the other techniques listed earlier (e.g., OLTP), it might appear that DBMS-enabled parallelism is limited to OLAP, data warehouse-oriented SQL. This is really not the case. New enhancements to relational data base technology include extensions that permit user-defined functions and data types to be tightly integrated with the SQL language. User-developed application code can then be executed as part of the DBMS using these facilities. In fact, Oracle7 allows execution of business logic in parallel using user-defined SQL functions.


DBMS ARCHITECTURE

As discussed earlier, from an application programming perspective, exploitation of hardware parallelism in the commercial marketplace was limited because of a lack of tools and skills. It is now generally recognized that DBMS vendors have recently stepped up their efforts in an attempt to address both of these challenges. They are starting to become the enablers of parallelism in the commercial arena. Additionally, more and more applications are migrating to DBMSs for storage and retrieval of data. Therefore, it is worthwhile to understand how the DBMS enables parallelism.

This understanding will help in choosing an appropriate DBMS and, to some extent, the hardware configuration. Also, it will help in designing applications and data bases that perform and scale well. Lack of understanding can lead to poorly performing systems and wasted resources.

In a manner similar to the three hardware configurations, DBMS architectures can also be classified into three corresponding categories:

• Shared data and buffer
• Shared data
• Partitioned data

There is a match between these architectures and the characteristics of the respective hardware configurations (SMP, shared-disk clusters, and MPP), but a DBMS does not have to execute only on its corresponding hardware counterpart. For example, a DBMS based on shared-data architecture can execute on a shared-nothing MPP hardware configuration.

When there is a match, the two build on each other's strengths and suffer from each other's weaknesses. However, the picture is much cloudier when a DBMS executes on a mismatched hardware configuration. In these cases, on the one hand, the DBMS is unable to build on and fully exploit the power of the hardware configuration. On the other hand, it can compensate for some of the challenges associated with using the underlying hardware configuration.

Shared Data and Buffer

In this architecture, a single instance of the DBMS executes on a configuration that supports sharing of buffers and data. Multiple threads are initiated to provide an appropriate level of parallelism. As shown in Exhibit 5, all threads have complete visibility to all the data and buffers.

System administration and load balancing are comparatively easier with this architecture because a single instance of the DBMS has full visibility to all the data. This architecture matches the facilities offered by the SMP configuration, on which it is frequently implemented.


When executed on an SMP platform, the system inherits the scalability concerns associated with the underlying hardware platform. The maximum number of processors in an SMP, the cache coherence among processors, and the contention for memory access are some of the reasons that contribute to the scalability concerns.

Informix DSA, Oracle7, and DB2 for MVS are examples of DBMSs that have implemented the shared-data-and-buffer architecture. These DBMSs and their SQL parallelizing features permit intra-SQL parallelism; that is, they can concurrently apply multiple threads and processors to process a single SQL statement.

The algorithms used for intra-SQL parallelism are based on the notion of program and data parallelism. Program parallelism allows such SQL operations as scan, sort, and join to be performed in parallel by passing data from one operation to another. Data parallelism allows these SQL operations to process different pieces of data concurrently.
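
The following sketch (a simplified illustration, not any product's executor) shows both notions at once: a scan operator streams rows to a downstream filter through a queue (program parallelism), and the same two-operator pipeline runs once per partition (data parallelism). The partition contents and predicate are invented for illustration:

    import threading
    import queue

    def scan(partition, out_q):
        """Scan operator: read rows from one data partition and pass them downstream."""
        for row in partition:
            out_q.put(row)
        out_q.put(None)                      # end-of-stream marker

    def filter_and_collect(out_q, results, predicate):
        """Downstream operator: consume rows as they arrive (program parallelism)."""
        while (row := out_q.get()) is not None:
            if predicate(row):
                results.append(row)          # list.append is safe under the GIL

    # Data parallelism: the same scan/filter pipeline runs once per partition.
    partitions = [[1, 5, 8, 12], [3, 9, 14, 2], [7, 11, 4, 6]]
    results = []
    threads = []
    for part in partitions:
        q = queue.Queue()
        threads.append(threading.Thread(target=scan, args=(part, q)))
        threads.append(threading.Thread(target=filter_and_collect,
                                        args=(q, results, lambda r: r > 6)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(sorted(results))    # final sort after the parallel scan/filter stages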

Shared Data

In this architecture, multiple instances of the DBMS execute on different processors. Each instance has visibility and access to all the data, but every instance maintains its own private buffers and updates rows within them.

Because the data is shared by multiple instances of the DBMS, a mechanism is required to serialize the use of resources so that multiple instances do not concurrently update a value and corrupt the modifications made by another instance. This serialization is provided by the global locking facility of the DBMS and is essential for the shared-data architecture. Global locking may be considered an extension of the DBMS local locking facilities, which ensure serialization of resource modification within an instance, to the multiple instances that share data.

EXHIBIT 5 — Shared Data and Buffers

Another requirement, buffer coherence, is also introduced, because each instance of the DBMS maintains a private buffer pool. Without buffer coherence, a resource accessed by one DBMS from disk may not reflect the modification made by another system if the second system has not yet externalized the changes to the disk. Logically, the problem is similar to cache coherence, discussed earlier for the SMP hardware configuration.
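
A minimal sketch of how global locking and a drastically simplified buffer-coherence rule combine to keep shared data correct across instances; the page names, update counts, and coherence rule (always reread from disk) are invented for illustration:

    import threading

    DISK = {"page-1": 0}                          # shared data on disk
    GLOBAL_LOCKS = {"page-1": threading.Lock()}   # global locking facility (simplified)

    class Instance:
        """A DBMS instance with its own private buffer pool."""
        def __init__(self, name):
            self.name = name
            self.buffer = {}                      # private buffers, invisible to other instances

        def increment(self, page, times):
            for _ in range(times):
                with GLOBAL_LOCKS[page]:          # serialize updates across instances
                    # Buffer coherence, reduced to its essence: reread the page
                    # from disk so a stale private copy is never updated.
                    self.buffer[page] = DISK[page]
                    self.buffer[page] += 1
                    DISK[page] = self.buffer[page]    # externalize the change

    a, b = Instance("A"), Instance("B")
    threads = [threading.Thread(target=a.increment, args=("page-1", 1000)),
               threading.Thread(target=b.increment, args=("page-1", 1000))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(DISK["page-1"])    # 2000; without global locking, updates could be lost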

Combined, global locking and buffer coherence ensure data integrity. Oracle Parallel Server (OPS), DB2 for MVS Data Sharing, and Information Management System/Virtual Storage (IMS/VS) are examples of DBMSs that implement this architecture.

It must be emphasized that intraquery parallelism is not being exploited in any of these three product feature examples. The key motivations for the implementation of this architecture are the same as those discussed for the shared-disk hardware configuration, namely data availability, incremental growth, and scalability. However, if one additionally chooses to use other features of these DBMSs in conjunction, the benefits of intraquery parallelism can also be realized.

As can be seen, there is a match between this software architecture and the facilities offered by shared-disk clusters, where this architecture is implemented. Thus, the performance of the DBMS depends not only on the software but also on the facilities provided by the hardware cluster to implement the two key components: global locking and buffer coherence.

In some implementations, maintenance of buffer coherence necessitates writing data to the disks to permit reading by another DBMS instance. This writing and reading is called pinging. Depending on the locking schemes used, pinging may take place even if the two instances are interested in distinct resources that happen to be represented by the same lock. This is known as false pinging. Heavy pinging, false or otherwise, has a negative effect on application performance because it increases both the I/O and the lock management overhead.
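
The sketch below illustrates one way false pinging can arise when the lock granule is coarser than the resource: two distinct rows map to the same lock-table entry, so the instances contend and may ping the page back and forth even though there is no real conflict. The hash-based mapping and lock-table size are assumptions for illustration only:

    # Hypothetical lock-table size; real systems tune this carefully.
    LOCK_TABLE_ENTRIES = 8

    def lock_name(table, row_key):
        """Map a resource to a lock-table entry (coarser than one lock per row)."""
        return f"{table}:{hash(row_key) % LOCK_TABLE_ENTRIES}"

    row_a = ("orders", 1001)     # updated by instance A
    row_b = ("orders", 1009)     # updated by instance B -- a different row
    print(lock_name(*row_a), lock_name(*row_b))
    # The two keys differ by exactly the lock-table size, so they map to the same
    # entry. The instances then serialize on one lock, and the page may be written
    # to disk and reread (a "false ping") with no real data conflict.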

System vendors use many techniques to minimize the impact of pinging. For example, IBM's DB2 for MVS on the Parallel Sysplex reduces the pinging I/O overhead by using the coupling facility, which couples all the nodes in the cluster through the use of high-speed electronics. That is, disk I/Os are replaced by writes and reads from the coupling facility's electronic memory. In addition, hardware facilities are used to invalidate stale data buffers in other instances.

Even if the hardware and DBMS vendors have provided adequate facilities to minimize the adverse impact of pinging, it is still the application architect's and data base administrator's responsibility to design applications and data bases such that the need for sharing data is minimized, to get the best performance.

It is interesting to note that the shared-data software architecture can be implemented on hardware platforms other than shared disk. For example, Oracle Parallel Server, which is based on the shared-data software architecture, is quite common on IBM's RS/6000 SP, an implementation based on the shared-nothing hardware configuration. This is achieved by using the virtual shared disk feature of the RS/6000 SP.

In this case, Oracle7's I/O request for data residing on another node is routed by the RS/6000 SP device drivers to the appropriate node, an I/O is performed, if necessary, and data is returned to the requesting node. This is known as data shipping and contributes to added traffic on the node's interconnect hardware. The internode traffic is a consideration when architecting a solution and acquiring hardware.

In general, for the data base administrators and application architects, it is necessary to understand such features in detail because the application performance depends on the architecture of the DBMS and the OS layers.

Partitioned Data

As the name implies, the data base is partitioned among different instances of the DBMS. In this option, each DBMS owns a portion of the data base, and only that portion may be directly accessed and modified by it.

Each DBMS has its private or local buffer pool, and as there is no sharing of data, the kind of synchronization protocols discussed earlier for shared data (i.e., global locking and buffer coherence) are not required. However, a transaction or SQL statement modifying data in different DBMS instances residing on multiple nodes will need some form of two-phase commit protocol to ensure data integrity.
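
A highly simplified sketch of the two-phase commit idea (no logging, timeouts, or failure recovery), not any product's implementation; the node names and work items are hypothetical:

    class Participant:
        """One DBMS instance owning a partition touched by the transaction."""
        def __init__(self, name):
            self.name = name

        def prepare(self, work):
            # Phase 1: do the work provisionally and promise to commit if asked.
            return self._can_apply(work)

        def commit(self):
            print(f"{self.name}: commit")

        def rollback(self):
            print(f"{self.name}: rollback")

        def _can_apply(self, work):
            return work is not None        # stand-in for real validation and logging

    def two_phase_commit(participants, work_items):
        # Phase 1: the coordinator asks every instance to prepare and collects votes.
        votes = [p.prepare(w) for p, w in zip(participants, work_items)]
        # Phase 2: commit everywhere only if every vote was "yes"; otherwise roll back.
        if all(votes):
            for p in participants:
                p.commit()
        else:
            for p in participants:
                p.rollback()

    nodes = [Participant("node-1"), Participant("node-2"), Participant("node-3")]
    two_phase_commit(nodes, ["update A", "update B", "update C"])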

Each instance controls its own I/O, performs locking, applies the local predicates, extracts the rows of interest, and transfers them to the next stage of processing, which may reside on the same or some other node.

As can be seen, there is a match between the partitioned-data option and the MPP hardware configuration. Additionally, because MPP provides a large amount of processing power and the partitioned-data architecture does not need the synchronization protocols, some argue that this combination offers the highest scalability. Thus, it has been the focus of recent development in the DBMS community. The partitioned-data architecture requires frequent communication among the DBMS instances to exchange messages and transfer results. Therefore, low latency and high bandwidth in the interconnect are required if the system is to scale up with the increased workload.

As mentioned earlier, NCR’s Teradata system was one of the earliestsuccessful commercial products based on this architecture. Recent new

Page 18: INFORMATION MANAGEMENT: STRATEGY, SYSTEMS, AND ... · INFORMATION MANAGEMENT: STRATEGY, SYSTEMS, AND TECHNOLOGIES M ASSIVELY P ... more general arena of parallelism, ... a uniprocessor

UNIX DBMSs offerings from other vendors are also based on this archi-tecture. IBM DB2 Parallel Edition, Informix XPS, and Sybase MPP are ex-amples.

In this architecture, requests for functions to be performed at other DBMS instances are shipped to them; the requesting DBMS instance receives only the results, not a block of data. This concept is called function shipping and is considered to offer better performance characteristics, compared to data shipping, because only the results are transferred to the requesting instance.
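
A small sketch of the function-shipping idea over hash-partitioned data; the partitioning function, row layout, and predicate are invented for illustration:

    # Hypothetical partitioned data base: rows are hash-partitioned across three
    # DBMS instances, and predicates are evaluated where the data lives.
    ROWS = [{"id": i, "balance": b} for i, b in
            [(1, 75), (2, 310), (3, 500), (4, 950), (5, 5), (6, 20)]]

    NUM_NODES = 3

    def owning_node(row):
        """Hash partitioning decides which instance owns a row."""
        return row["id"] % NUM_NODES

    # Load phase: each instance holds only its own partition.
    partitions = {n: [] for n in range(NUM_NODES)}
    for row in ROWS:
        partitions[owning_node(row)].append(row)

    def local_scan(node_id, predicate):
        """Function shipping: the predicate runs at the owning node, so only
        qualifying rows (the results), not raw data blocks, cross the interconnect."""
        return [row for row in partitions[node_id] if predicate(row)]

    results = []
    for node_id in partitions:
        results.extend(local_scan(node_id, lambda r: r["balance"] > 100))
    print(results)   # rows with balance > 100, gathered from every partition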

The partitioned-data architecture uses the notion of data parallelism to get the benefits of parallelism, and data partitioning algorithms play an important role in determining the performance characteristics of the systems. Various partitioning options are discussed in a later section.

The partitioned-data architecture also provides additional flexibility in choosing the underlying hardware platform. As one can observe, partitioned data matches well with the MPP hardware configuration. It also matches with the shared-nothing lightly parallel clusters. The distinction between these clusters and MPP is primarily based on the number of nodes, which is somewhat arbitrary.

For illustration, DB2 Parallel Edition is considered a partitioned-data implementation. As shown in Exhibit 6, Parallel Edition can execute both on the RS/6000 SP, an MPP offering, and on a cluster of RS/6000 computers connected by a LAN. However, for a number of technical and financial reasons, the RS/6000 cluster solution is not marketed actively. On the other hand, there are conjectures in the market that similar system solutions are likely to become more prevalent when Microsoft becomes more active in marketing its cluster offerings.

EXHIBIT 6 — DB2 Parallel Edition as a Partitioned-Data Implementation

THE THREE DBMS ARCHITECTURES: SUMMARY

There is great debate over which of the three software models — shared buffer and data, shared data, or partitioned data — is best for the commercial marketplace. This debate is somewhat similar to the one that revolves around the choice of hardware configurations (i.e., SMP, clusters, or MPP). Although one might assume that making the choice of a software architecture would lead to a straightforward choice of a corresponding hardware configuration, or vice versa, this is not the case.

OS and infrastructure layers permit cohabitation of a DBMS architecture with a non-matching hardware configuration. Because of the mismatch, it is easy to observe that the mismatched components may not fully exploit the facilities of their partners. It is somewhat harder to appreciate, however, that the shortcomings of one may be compensated to some extent by the strengths of another. For example, a shared-data software architecture on an MPP platform avoids the management issues associated with repartitioning of data over time as the data or workload characteristics change.

This variety and flexibility in mix-and-match implementation presents tradeoffs to both the DBMS and hardware vendors and to the application system developers. Even after an installation has made the choice of a DBMS and a hardware configuration, the application architects and the data base and system administrators must still have a good understanding of the tradeoffs involved with the system components to ensure scalability and good performance of the user applications.

CONCLUSION

The future seems extremely promising. Making faster and faster uniprocessors is not only technically difficult but is becoming economically prohibitive. Parallel processing is the answer. In the near future, all three hardware configurations are likely to find applicability:

• SMP on desktops and as departmental servers
• Shared-disk SMP clusters as enterprise servers
• MPPs as servers of choice for strategic information processing and as multimedia servers

Image and video servers requiring large I/O capacity seem ideal for parallel processing. This need can be satisfied by a variety of current MPPs, which emphasize scalability of I/O more than scalability of processing power.

Lightly parallel, shared-nothing clusters based on commodity hardware and ATM or Ethernet interconnect are likely to become popular because of their low cost as soon as software for workload management becomes available.

Making long-range predictions in this business is unwise. However, one can be assured that parallel processing solutions in the commercial marketplace will be driven less by the hardware technology, and more by the software innovations, systems management offerings, pricing schemes, and, most important, by the marketing abilities of the vendors.

Prem N. Mehra is a senior architectural engineer with Microsoft Corp. and a former associate partner in the technology integration services-worldwide organization of Andersen Consulting. This article is adapted from Auerbach's forthcoming book, Network-Centric Computing: Computing, Communications, and Knowledge, by Hugh W. Ryan and associates.