
    Concurrency Control in Distributed Database Systems

    PHILIP A. BERNSTEIN AND NATHAN GOODMAN

    Computer Corporation of America, Cambridge, Massachusetts 02139

    In this paper we survey, consolidate, and present the state of the art in distributed database concurrency control. The heart of our analysis is a decomposition of the concurrency control problem into two major subproblems: read-write and write-write synchronization. We describe a series of synchronization techniques for solving each subproblem and show how to combine these techniques into algorithms for solving the entire concurrency control problem. Such algorithms are called "concurrency control methods." We describe 48 principal methods, including all practical algorithms that have appeared in the literature plus several new ones. We concentrate on the structure and correctness of concurrency control algorithms. Issues of performance are given only secondary treatment.

    Keywords and Phrases: concurrency control, deadlock, distributed database management systems, locking, serializability, synchronization, timestamp ordering, timestamps, two-phase commit, two-phase locking

    CR Categories: 4.33, 4.35

    INTRODUCTION

    The Concurrency Control Problem

    Concurrency control is the activity of coordinating concurrent accesses to a database in a multiuser database management system (DBMS). Concurrency control permits users to access a database in a multiprogrammed fashion while preserving the illusion that each user is executing alone on a dedicated system. The main technical difficulty in attaining this goal is to prevent database updates performed by one user from interfering with database retrievals and updates performed by another. The concurrency control problem is exacerbated in a distributed DBMS (DDBMS) because (1) users may access data stored in many different computers in a distributed system, and (2) a concurrency control mechanism at one computer cannot instantaneously know about interactions at other computers.

    Concurrency control has been actively investigated for the past several years, and the problem for nondistributed DBMSs is well understood. A broad mathematical theory has been developed to analyze the problem, and one approach, called two-phase locking, has been accepted as a standard solution. Current research on nondistributed concurrency control is focused on evolutionary improvements to two-phase locking, detailed performance analysis and optimization, and extensions to the mathematical theory.

    Distributed concurrency control, by contrast, is in a state of extreme turbulence. More than 20 concurrency control algorithms have been proposed for DDBMSs, and several have been, or are being, implemented. These algorithms are usually complex, hard to understand, and difficult to prove correct (indeed, many are incorrect). Because they are described in different terminologies and make different assumptions about the underlying DDBMS environment, it is difficult to compare the many proposed algorithms, even in qualitative terms. Naturally each author proclaims his or her approach as best, but there is little compelling evidence to support the claims.

    Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. 1981 ACM 0010-4892/81/0600-0185 $00.75

    Computing Surveys, Vol. 13, No. 2, June 1981

    CONTENTS

    INTRODUCTION
      The Concurrency Control Problem
      Examples of Concurrency Control Anomalies
      Comparison to Mutual Exclusion Problems
    1. TRANSACTION-PROCESSING MODEL
      1.1 Preliminary Definitions and DDBMS Architecture
      1.2 Centralized Transaction-Processing Model
      1.3 Distributed Transaction-Processing Model
    2. DECOMPOSITION OF THE CONCURRENCY CONTROL PROBLEM
      2.1 Serializability
      2.2 A Paradigm for Concurrency Control
    3. SYNCHRONIZATION TECHNIQUES BASED ON TWO-PHASE LOCKING
      3.1 Basic 2PL Implementation
      3.2 Primary Copy 2PL
      3.3 Voting 2PL
      3.4 Centralized 2PL
      3.5 Deadlock Detection and Prevention
    4. SYNCHRONIZATION TECHNIQUES BASED ON TIMESTAMP ORDERING
      4.1 Basic T/O Implementation
      4.2 The Thomas Write Rule
      4.3 Multiversion T/O
      4.4 Conservative T/O
      4.5 Timestamp Management
    5. INTEGRATED CONCURRENCY CONTROL METHODS
      5.1 Pure 2PL Methods
      5.2 Pure T/O Methods
      5.3 Mixed 2PL and T/O Methods
    6. CONCLUSION
    APPENDIX. OTHER CONCURRENCY CONTROL METHODS
      A1. Certifiers
      A2. Thomas' Majority Consensus Algorithm
      A3. Ellis' Ring Algorithm
    ACKNOWLEDGMENT
    REFERENCES

    To survey the state of the art, we introduce a standard terminology for describing DDBMS concurrency control algorithms and a standard model for the DDBMS environment. For analysis purposes we decompose the concurrency control problem into two major subproblems, called read-write and write-write synchronization. Every concurrency control algorithm must include a subalgorithm to solve each subproblem. The first step toward understanding a concurrency control algorithm is to isolate the subalgorithm employed for each subproblem.

    After studying the large number of proposed algorithms, we find that they are compositions of only a few subalgorithms. In fact, the subalgorithms used by all practical DDBMS concurrency control algorithms are variations of just two basic techniques: two-phase locking and timestamp ordering; thus the state of the art is far more coherent than a review of the literature would seem to indicate.

    Examples of Concurrency Control Anomalies

    The goal of concurrency control is to prevent interference among users who are simultaneously accessing a database. Let us illustrate the problem by presenting two "canonical" examples of interuser interference. Both are examples of an on-line electronic funds transfer system accessed via remote automated teller machines (ATMs). In response to customer requests, ATMs retrieve data from a database, perform computations, and store results back into the database.

    Anomaly 1: Lost Updates. Suppose two customers simultaneously try to deposit money into the same account. In the absence of concurrency control, these two activities could interfere (see Figure 1). The two ATMs handling the two customers could read the account balance at approximately the same time, compute new balances in parallel, and then store the new balances back into the database. The net effect is incorrect: although two customers deposited money, the database only reflects one activity; the other deposit is lost by the system.
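As a rough illustration of Anomaly 1, the Python sketch below replays the interleaving described above and compares it with a serial execution. The one-account dictionary database, the function names, and the starting balance of $500,000 are assumptions made for the example; only the two deposit amounts are taken from the figure.

```python
def interleaved_deposits(balance, deposit_1, deposit_2):
    """Both ATMs read before either writes, as in the lost update anomaly."""
    db = {"balance": balance}            # hypothetical one-account database
    read_1 = db["balance"]               # ATM 1: READ balance
    read_2 = db["balance"]               # ATM 2: READ balance (same value)
    db["balance"] = read_1 + deposit_1   # ATM 1: WRITE result
    db["balance"] = read_2 + deposit_2   # ATM 2: WRITE result, overwriting ATM 1
    return db["balance"]


def serial_deposits(balance, deposit_1, deposit_2):
    """The same two deposits executed one after the other."""
    db = {"balance": balance}
    db["balance"] = db["balance"] + deposit_1
    db["balance"] = db["balance"] + deposit_2
    return db["balance"]


lost = interleaved_deposits(500_000, 1_000_000, 2_000_000)
correct = serial_deposits(500_000, 1_000_000, 2_000_000)
print(lost, correct)
```

The interleaved run ends with a balance that reflects only the second deposit, while the serial run reflects both.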

    Anomaly 2: Inconsistent Retrievals. Suppose two customers simultaneously execute the following transactions.

    Customer 1: Move $1,000,000 from Acme Corporation's savings account to its checking account.

    Customer 2: Print Acme Corporation's total balance in savings and checking.


    Figure 1. Lost update anomaly. (Diagram: two transactions each READ the account balance, add $1,000,000 and $2,000,000 respectively, and WRITE the result back to the database.)

    In the absence of concurrency control these two transactions could interfere (see Figure 2). The first transaction might read the savings account balance, subtract $1,000,000, and store the result back in the database. Then the second transaction might read the savings and checking account balances and print the total. Then the first transaction might finish the funds transfer by reading the checking account balance, adding $1,000,000, and finally storing the result in the database. Unlike Anomaly 1, the final values placed into the database by this execution are correct. Still, the execution is incorrect because the balance printed by Customer 2 is $1,000,000 short.
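The interleaving behind Anomaly 2 can be sketched in a few lines of Python. The dictionary database, the function name, and the opening balances ($2,000,000 in savings, $500,000 in checking) are assumptions chosen for illustration; the transfer amount is the one used in the text.

```python
def transfer_with_interleaved_report(savings, checking, amount):
    """Run the transfer (T1) with the report (T2) squeezed into the middle."""
    db = {"savings": savings, "checking": checking}  # hypothetical database
    db["savings"] = db["savings"] - amount           # T1: READ savings, subtract, WRITE
    printed_total = db["savings"] + db["checking"]   # T2: READ both balances, print sum
    db["checking"] = db["checking"] + amount         # T1: READ checking, add, WRITE
    return printed_total, db


printed, final_db = transfer_with_interleaved_report(2_000_000, 500_000, 1_000_000)
true_total = final_db["savings"] + final_db["checking"]
print(printed, true_total)
```

The final database state is correct, yet the printed total falls short of the true total by exactly the amount in transit.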

    These two examples do not exhaust all possible ways in which concurrent users can interfere. However, these examples are typical of the concurrency control problems that arise in DBMSs.

    Comparison to Mutual Exclusion Problems

    The problem of database concurrency control is similar in some respects to that of mutual exclusion in operating systems. The latter problem is concerned with coordinating access by concurrent processes to system resources such as memory, I/O devices, and CPU. Many solution techniques have been developed, including locks, semaphores, monitors, and serializers [BRIN73, DIJK71, HEWI74, HOAR74].

    The concurrency control and mutual exclusion problems are similar in that both are concerned with controlling concurrent access to shared resources. However, control schemes that work for one do not necessarily work for the other, as illustrated by the following example. Suppose processes P1 and P2 require access to resources R1 and R2 at different points in their execution. In an operating system, the following interleaved execution of these processes is perfectly acceptable: P1 uses R1, P2 uses R1, P2 uses R2, P1 uses R2. In a database, however, this execution is not always acceptable. Assume, for example, that P1 transfers funds by debiting one account (R1), then crediting another (R2). If P2 checks both balances, it will see R1 after it has been debited, but see R2 before it has been credited. Other differences between concurrency control and mutual exclusion are discussed in CHAM74.

    1. TRANSACTION-PROCESSING MODEL

    To understand how a concurrency control algorithm operates, one must understand how the algorithm fits into an overall DDBMS. In this section we present a simple model of a DDBMS, emphasizing how the DDBMS processes user interactions. Later we explain how concurrency control algorithms operate in the context of this model.

    1.1 Preliminary Definitions and DDBMS Architecture

    A distributed database management system (DDBMS) is a collection of sites interconnected by a network [DEPP76, ROTH77]. Each site is a computer running one or both of the following software modules: a transaction manager (TM) or a data manager (DM). TMs supervise interactions between users and the DDBMS while DMs manage the actual database. A network is a computer-to-computer communication system. The network is assumed to be perfectly reliable: if site A sends a message to site B, site B is guaranteed to receive the message without error. In addition, we assume that between any pair of sites the network delivers messages in the order they were sent.

    Figure 2. Inconsistent retrieval anomaly. (Diagram: one transaction performs READ savings balance, Subtract $1,000,000, WRITE result, READ checking balance, Add $1,000,000, WRITE result; the other performs READ savings balance, READ checking balance, Print Sum.)

    From a user's perspective, a database consists of a collection of logical data items, denoted X, Y, Z. We leave the granularity of logical data items unspecified; in practice, they may be files, records, etc. A logical database state is an assignment of values to the logical data items composing a database. Each logical data item may be stored at any DM in the system or redundantly at several DMs. A stored copy of a logical data item is called a stored data item. (When no confusion is possible, we use the term data item for stored data item.) The stored copies of logical data item X are denoted x1, ..., xm. We typically use x to denote an arbitrary stored data item. A stored database state is an assignment of values to the stored data items in a database.

    Users interact with the DDBMS by executing transactions. Transactions may be on-line queries expressed in a self-contained query language, or application programs written in a general-purpose programming language. The concurrency control algorithms we study pay no attention to the computations performed by transactions. Instead, these algorithms make all of their decisions on the basis of the data items a transaction reads and writes, and so details of the form of transactions are unimportant in our analysis. However, we do assume that transactions represent complete and correct computations; each transaction, if executed alone on an initially consistent database, would terminate, produce correct results, and leave the database consistent. The logical readset (correspondingly, writeset) of a transaction is the set of logical data items the transaction reads (or writes). Similarly, stored readsets and stored writesets are the stored data items that a transaction reads and writes.

    Figure 3. DDBMS system architecture.

    The correctness of a concurrency control algorithm is defined relative to users' expectations regarding transaction execution. There are two correctness criteria: (1) users expect that each transaction submitted to the system will eventually be executed; (2) users expect the computation performed by each transaction to be the same whether it executes alone in a dedicated system or in parallel with other transactions in a multiprogrammed system. Realizing this expectation is the principal issue in concurrency control.

    A DDBMS contains four components (see Figure 3): transactions, TMs, DMs, and data. Transactions communicate with TMs, TMs communicate with DMs, and DMs manage the data. (TMs do not communicate with other TMs, nor do DMs communicate with other DMs.)

    TMs supervise transactions. Each transaction executed in the DDBMS is supervised by a single TM, meaning that the transaction issues all of its database operations to that TM. Any distributed computation that is needed to execute the transaction is managed by the TM.

    Four operations are defined at the transaction-TM interface. READ(X) returns the value of X (a logical data item) in the current logical database state. WRITE(X, new-value) creates a new logical database state in which X has the specified new value. Since transactions are assumed to represent complete computations, we use BEGIN and END operations to bracket transaction executions.

    DMs manage the stored database, functioning as backend database processors. In response to commands from transactions, TMs issue commands to DMs specifying stored data items to be read or written. The details of the TM-DM interface constitute the core of our transaction-processing model and are discussed in Sections 1.2 and 1.3. Section 1.2 describes the TM-DM interaction in a centralized database environment, and Section 1.3 extends the discussion to a distributed database setting.

    1.2 Centralized Transaction-Processing Model

    A centralized DBMS consists of one TM and one DM executing at one site. A transaction T accesses the DBMS by issuing BEGIN, READ, WRITE, and END operations, which are processed as follows.

    BEGIN: The TM initializes for T a private workspace that functions as a temporary buffer for values read from and written into the database.

    READ(X): The TM looks for a copy of X in T's private workspace. If the copy exists, its value is returned to T. Otherwise the TM issues dm-read(x) to the DM to retrieve a copy of X from the database, gives the retrieved value to T, and puts it into T's private workspace.

    WRITE(X, new-value): The TM again checks the private workspace for a copy of X. If it finds one, the value is updated to new-value; otherwise a copy of X with the new value is created in the workspace. The new value of X is not stored in the database at this time.

    END: The TM issues dm-write(x) for each logical data item X updated by T. Each dm-write(x) requests that the DM update the value of X in the stored database to the value of X in T's local workspace. When all dm-writes are processed, T is finished executing, and its private workspace is discarded.

    The DBMS may restart T any time before a dm-write has been processed. The effect of restarting T is to obliterate its private workspace and to reexecute T from the beginning. As we will see, many concurrency control algorithms use transaction restarts as a tactic for attaining correct executions. However, once a single dm-write has been processed, T cannot be restarted; each dm-write permanently installs an update into the database, and we cannot permit the database to reflect partial effects of transactions.
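The BEGIN, READ, WRITE, END, and restart behavior described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class and method names are hypothetical, the TM handles a single transaction at a time, and the stored database is a plain dictionary.

```python
class DataManager:
    """Hypothetical DM: holds the stored database as a dictionary."""

    def __init__(self, initial):
        self.stored = dict(initial)

    def dm_read(self, x):
        return self.stored[x]

    def dm_write(self, x, value):
        self.stored[x] = value


class TransactionManager:
    """Hypothetical TM supervising one transaction at a time."""

    def __init__(self, dm):
        self.dm = dm
        self.workspace = None
        self.updated = None

    def begin(self):
        self.workspace = {}       # private workspace for the transaction
        self.updated = set()      # items written by the transaction

    def read(self, x):
        if x not in self.workspace:           # no private copy yet
            self.workspace[x] = self.dm.dm_read(x)
        return self.workspace[x]

    def write(self, x, new_value):
        self.workspace[x] = new_value         # database is not touched yet
        self.updated.add(x)

    def restart(self):
        self.begin()              # obliterate the workspace; caller reexecutes T

    def end(self):
        for x in sorted(self.updated):        # one dm-write per updated item
            self.dm.dm_write(x, self.workspace[x])
        self.workspace = None
        self.updated = None


dm = DataManager({"X": 1, "Y": 2})
tm = TransactionManager(dm)
tm.begin()
tm.write("X", tm.read("X") + 10)
unchanged_before_end = dm.dm_read("X")   # still the old value
tm.restart()                             # allowed: no dm-write processed yet
tm.write("Y", 5)
tm.end()
print(unchanged_before_end, dm.stored)
```

Because writes are buffered in the workspace until END, a restart before the first dm-write leaves the stored database untouched.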

    A DBMS can avoid such partial results by having the property of atomic commitment, which requires that either all of a transaction's dm-writes are processed or none are. The "standard" implementation of atomic commitment is a procedure called two-phase commit [LAMP76, GRAY78].¹ Suppose T is updating data items X and Y. When T issues its END, the first phase of two-phase commit begins, during which the TM issues prewrite commands for X and Y. These commands instruct the DM to copy the values of X and Y from T's private workspace onto secure storage. If the DBMS fails during the first phase, no harm is done, since none of T's updates have yet been applied to the stored database. During the second phase, the TM issues dm-write commands for X and Y which instruct the DM to copy the values of X and Y into the stored database. If the DBMS fails during the second phase, the database may contain incorrect information, but since the values of X and Y are stored on secure storage, this inconsistency can be rectified when the system recovers: the recovery procedure reads the values of X and Y from secure storage and resumes the commitment activity.
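A small Python sketch may help make the two phases and the recovery step concrete. Everything named here is hypothetical: secure storage is modeled as a dictionary assumed to survive failures, the `phase_one_done` flag is an assumed way of recording that all prewrites finished, and a failure is simulated by stopping after a given number of dm-writes.

```python
class RecoverableDM:
    """Hypothetical DM with a stored database and secure storage."""

    def __init__(self, initial):
        self.stored = dict(initial)
        self.secure = {}               # assumed to survive failures
        self.phase_one_done = False    # assumed marker, also on secure storage

    def prewrite(self, x, value):
        self.secure[x] = value         # copy the value onto secure storage

    def dm_write(self, x):
        self.stored[x] = self.secure[x]

    def recover(self):
        if self.phase_one_done:        # failure hit phase 2: resume commitment
            for x in self.secure:
                self.dm_write(x)
        self.secure.clear()            # failure in phase 1: nothing was installed
        self.phase_one_done = False


def two_phase_commit(dm, workspace, fail_after_writes=None):
    """Return True on success, False if the simulated failure occurs."""
    for x, value in workspace.items():         # phase 1: prewrites
        dm.prewrite(x, value)
    dm.phase_one_done = True
    for count, x in enumerate(workspace):      # phase 2: dm-writes
        if fail_after_writes is not None and count == fail_after_writes:
            return False                       # simulated failure mid-phase
        dm.dm_write(x)
    dm.secure.clear()
    dm.phase_one_done = False
    return True


dm = RecoverableDM({"X": 1, "Y": 2})
committed = two_phase_commit(dm, {"X": 10, "Y": 20}, fail_after_writes=1)
partial_state = dict(dm.stored)    # X installed, Y not yet
dm.recover()
print(committed, partial_state, dm.stored)
```

After the simulated failure the stored database holds a partial result; recovery installs the remaining value from secure storage, so the transaction's updates end up applied in full.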

    We emphasize that this is a mathematical model of transaction processing, an approximation to the way DBMSs actually function. While the implementation details of atomic commitment are important in designing a DBMS, they are not central to an understanding of concurrency control. To explain concurrency control algorithms we need a model of transaction execution in which atomic commitment is visible, but not dominant.

    1.3 Distributed Transaction-Processing Model

    Our model of transaction processing in a distributed environment differs from that in a centralized one in two areas: handling private workspaces and implementing two-phase commit.

    ¹ The term "two-phase commit" is commonly used to denote the distributed version of this procedure. However, since the centralized and distributed versions are identical in structure, we use "two-phase commit" to describe both.


    In a centralized DBMS we assumed that (1) private workspaces were part of the TM, and (2) data could freely move between a transaction and its workspace, and between a workspace and the DM. These assumptions are not appropriate in a DDBMS because TMs and DMs may run at different sites and the movement of data between a TM and a DM can be expensive. To reduce this cost, many DDBMSs employ query optimization procedures which regulate (and, it is hoped, reduce) the flow of data between sites. For example, in SDD-1 the private workspace for transaction T is distributed across all sites at which T accesses data [BERN81]. The details of how T reads and writes data in these workspaces are a query optimization problem and have no direct effect on concurrency control.

    The problem of atomic commitment is aggravated in a DDBMS by the possibility of one site failing while the rest of the system continues to operate. Suppose T is updating x, y, z stored at DMx, DMy, DMz, and suppose T's TM fails after issuing dm-write(x), but before issuing the dm-writes for y and z. At this point the database is incorrect. In a centralized DBMS this phenomenon is not harmful because no transaction can access the database until the TM recovers from the failure. However, in a DDBMS, other TMs remain operational and can access the incorrect database.

    To avoid this problem, prewrite commands must be modified slightly. In addition to specifying data items to be copied onto secure storage, prewrites also specify which other DMs are involved in the commitment activity. Then if the TM fails during the second phase of two-phase commit, the DMs whose dm-writes were not issued can recognize the situation and consult the other DMs involved in the commitment. If any DM received a dm-write, the remaining ones act as if they had also received the command. The details of this procedure are complex and appear in HAMM80.

    As in a centralized DBMS, a transaction T accesses the system by issuing BEGIN, READ, WRITE, and END operations. In a DDBMS these are processed as follows.

    BEGIN: The TM creates a private workspace for T. We leave the location and organization of this workspace unspecified.

    READ(X): The TM checks T's private workspace to see if a copy of X is present. If so, that copy's value is made available to T. Otherwise the TM selects some stored copy of X, say xi, and issues dm-read(xi) to the DM at which xi is stored. The DM responds by retrieving the stored value of xi from the database, placing it in the private workspace. The TM returns this value to T.

    WRITE(X, new-value): The value of X in T's private workspace is updated to new-value, assuming the workspace contains a copy of X. Otherwise, a copy of X with the new value is created in the workspace.

    END: Two-phase commit begins. For each X updated by T, and for each stored copy xi of X, the TM issues a prewrite(xi) to the DM that stores xi. The DM responds by copying the value of X from T's private workspace onto secure storage internal to the DM. After all prewrites are processed, the TM issues dm-writes for all copies of all logical data items updated by T. A DM responds to dm-write(xi) by copying the value of xi from secure storage into the stored database. After all dm-writes are installed, T's execution is finished.
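The distributed versions of the four operations can be sketched in the same style. The sketch below is illustrative only: class names are hypothetical, each DM keys its stored copy by the logical item's name rather than by xi, the TM always reads the first listed copy, and failures are not modeled.

```python
class SiteDM:
    """Hypothetical DM at one site, storing copies of some logical items."""

    def __init__(self, name, stored):
        self.name = name
        self.stored = dict(stored)
        self.secure = {}                     # secure storage internal to the DM

    def dm_read(self, x):
        return self.stored[x]

    def prewrite(self, x, value):
        self.secure[x] = value

    def dm_write(self, x):
        self.stored[x] = self.secure.pop(x)  # install from secure storage


class DistributedTM:
    """Hypothetical TM; `copies` maps a logical item to the DMs storing it."""

    def __init__(self, copies):
        self.copies = copies
        self.workspace = None
        self.updated = None

    def begin(self):
        self.workspace = {}
        self.updated = set()

    def read(self, x):
        if x not in self.workspace:
            chosen = self.copies[x][0]       # "some stored copy": here the first
            self.workspace[x] = chosen.dm_read(x)
        return self.workspace[x]

    def write(self, x, new_value):
        self.workspace[x] = new_value
        self.updated.add(x)

    def end(self):
        for x in sorted(self.updated):       # phase 1: prewrite every copy
            for dm in self.copies[x]:
                dm.prewrite(x, self.workspace[x])
        for x in sorted(self.updated):       # phase 2: dm-write every copy
            for dm in self.copies[x]:
                dm.dm_write(x)
        self.workspace = None
        self.updated = None


site_a = SiteDM("A", {"X": 1, "Y": 2})
site_b = SiteDM("B", {"Y": 2, "Z": 3})
tm = DistributedTM({"X": [site_a], "Y": [site_a, site_b], "Z": [site_b]})
tm.begin()
tm.write("Y", tm.read("X") + 41)
tm.end()
print(site_a.stored, site_b.stored)
```

A READ touches one copy, whereas END updates every stored copy of each item the transaction wrote.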

    2. DECOMPOSITION OF THE CONCURRENCY CONTROL PROBLEM

    In this section we review concurrency control theory with two objectives: to define "correct executions" in precise terms, and to decompose the concurrency control problem into more tractable subproblems.

    2.1 Serializability

    Let E denote an execution of transactions T1, ..., Tn. E is a serial execution if no transactions execute concurrently in E; that is, each transaction is executed to completion before the next one begins. Every serial execution is defined to be correct, because the properties of transactions (see Section 1.1) imply that a serial execution terminates properly and preserves database consistency. An execution is serializable if it is computationally equivalent to a serial execution, that is, if it produces the same output and has the same effect on the database as some serial execution. Since serial executions are correct and every serializable execution is equivalent to a serial one, every serializable execution is also correct. The goal of database concurrency control is to ensure that all executions are serializable.

    Transactions:

    T1: BEGIN; READ(X); WRITE(Y); END
    T2: BEGIN; READ(Y); WRITE(Z); END
    T3: BEGIN; READ(Z); WRITE(X); END

    One possible execution of T1, T2, and T3 is represented by the following logs. (Note: ri[x] denotes the operation dm-read(x) issued by Ti; wi[x] denotes a dm-write(x) issued by Ti.)

    Log for DM A: r1[x1] w1[y1] r2[y1] w3[x1]
    Log for DM B: w1[y2] w2[z2]
    Log for DM C: w2[z3] r3[z3]

    Figure 4. Modeling executions as logs.

    The execution modeled in Figure 4 is serial. Each log is itself serial; that is, there is no interleaving of operations from different transactions. At DM A, T1 precedes T2 precedes T3; at DM B, T1 precedes T2; and at DM C, T2 precedes T3. Therefore, T1, T2, T3 is a total order satisfying the definition of serial.

    The following execution is not serial. The logs themselves are not serial.

    DM A: r1[x1] r2[y1] w3[x1] w1[y1]
    DM B: w2[z2] w1[y2]
    DM C: w2[z3] r3[z3]

    The following execution is also not serial. Although each log is serial, there is no total order consistent with all logs.

    DM A: r1[x1] w1[y1] r2[y1] w3[x1]
    DM B: w2[z2] w1[y2]
    DM C: w2[z3] r3[z3]

    Figure 5. Serial and nonserial logs.

    The only operations that access the stored database are dm-read and dm-write. Hence it is sufficient to model an execution of transactions by the execution of dm-reads and dm-writes at the various DMs of the DDBMS. In this spirit we formally model an execution of transactions by a set of logs, each of which indicates the order in which dm-reads and dm-writes are processed at one DM (see Figure 4). An execution is serial if there is a total order of transactions such that if Ti precedes Tj in the total order, then all of Ti's operations precede all of Tj's operations in every log where both appear (see Figure 5). Intuitively, this says that transactions execute serially and in the same order at all DMs.
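The definition of a serial execution can be checked mechanically for small examples. The sketch below is one possible encoding, not part of the paper: logs are written as strings in the ri[x]/wi[x] notation, the function names are hypothetical, and the search simply tries every total order of the transactions, which is only practical for a handful of them.

```python
from itertools import permutations


def parse(log):
    """Turn 'r1[x1] w1[y1]' into [('r', 1, 'x1'), ('w', 1, 'y1')]."""
    ops = []
    for token in log.split():
        kind, rest = token[0], token[1:]
        txn, item = rest.split("[")
        ops.append((kind, int(txn), item.rstrip("]")))
    return ops


def is_serial(logs):
    """True if some total order of transactions is respected by every log."""
    parsed = [parse(log) for log in logs]
    txns = sorted({txn for log in parsed for _, txn, _ in log})
    for order in permutations(txns):
        rank = {txn: position for position, txn in enumerate(order)}
        # Ranks must never decrease along a log: no interleaving and no
        # transaction running ahead of one that precedes it in the order.
        if all(rank[a[1]] <= rank[b[1]]
               for log in parsed for a, b in zip(log, log[1:])):
            return True
    return False


figure_4 = ["r1[x1] w1[y1] r2[y1] w3[x1]", "w1[y2] w2[z2]", "w2[z3] r3[z3]"]
interleaved = ["r1[x1] r2[y1] w3[x1] w1[y1]", "w2[z2] w1[y2]", "w2[z3] r3[z3]"]
no_common_order = ["r1[x1] w1[y1] r2[y1] w3[x1]", "w2[z2] w1[y2]", "w2[z3] r3[z3]"]
print(is_serial(figure_4), is_serial(interleaved), is_serial(no_common_order))
```

The three inputs are the executions of Figures 4 and 5: the first is serial, the second has an interleaved log, and the third has serial logs that disagree on the order of T1 and T2.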

    Two operations conflict if they operate on the same data item and one of the operations is a dm-write. The order in which operations execute is computationally significant if and only if the operations conflict. To illustrate the notion of conflict, consider a data item x and transactions Ti and Tj. If Ti issues dm-read(x) and Tj issues dm-write(x), the value read by Ti will (in general) differ depending on whether the dm-read precedes or follows the dm-write. Similarly, if both transactions issue dm-write(x) operations, the final value of x depends on which dm-write happens last. Those conflict situations are called read-write (rw) conflicts and write-write (ww) conflicts, respectively.
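The conflict rule translates directly into a predicate. In this sketch an operation is a tuple (kind, transaction, item) with kind 'r' for dm-read and 'w' for dm-write; the tuple encoding and the function names are assumptions made for the example.

```python
def conflicts(op_a, op_b):
    """Same data item, and at least one of the two operations is a dm-write."""
    kind_a, _, item_a = op_a
    kind_b, _, item_b = op_b
    return item_a == item_b and "w" in (kind_a, kind_b)


def conflict_type(op_a, op_b):
    """Return 'rw', 'ww', or None when the operations do not conflict."""
    if not conflicts(op_a, op_b):
        return None
    return "ww" if op_a[0] == "w" and op_b[0] == "w" else "rw"


read_x = ("r", 1, "x1")
write_x = ("w", 2, "x1")
other_write_x = ("w", 3, "x1")
write_y = ("w", 2, "y1")
print(conflict_type(read_x, write_x), conflict_type(write_x, other_write_x),
      conflict_type(read_x, write_y))
```

Two reads of the same item never conflict, so their relative order can be ignored when comparing executions.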

    The notion of conflict helps characterize the equivalence of executions. Two executions are computationally equivalent if (1) each dm-read operation reads data item values that were produced by the same dm-writes in both executions; and (2) the final dm-write on each data item is the same in both executions [PAPA77, PAPA79]. Condition (1) ensures that each transaction reads the same input in both executions (and therefore performs the same computation). Combined with (2), it ensures that both executions leave the database in the same final state.

    From this we can characterize serializable executions precisely.

    Theorem 1 [PAPA77, PAPA79, STEA76]

    Let T = {T1, ..., Tm} be a set of transactions and let E be an execution of these transactions modeled by logs {L1, ..., Lm}. E is serializable if there exists a total ordering of T such that for each pair of conflicting operations Oi and Oj from distinct transactions Ti and Tj (respectively), Oi precedes Oj in any log L1, ..., Lm if and only if Ti precedes Tj in the total ordering.

    The total order hypothesized in Theorem 1 is called a serialization order. If the transactions had executed serially in the serialization order, the computation performed by the transactions would have been identical to the computation represented by E.

    To attain serializability, the DDBMS must guarantee that all executions satisfy the condition of Theorem 1, namely, that conflicting dm-reads and dm-writes be processed in certain relative orders. Concurrency control is the activity of controlling the relative order of conflicting operations; an algorithm to perform such control is called a synchronization technique. To be correct, a DDBMS must incorporate synchronization techniques that guarantee the conditions of Theorem 1.

    2.2 A Paradigm for Concurrency Control

    In Theorem 1, rw and ww conflicts are treated together under the general notion of conflict. However, we can decompose the concept of serializability by distinguishing these two types of conflict. Let E be an execution modeled by a set of logs. We define several binary relations on transactions in E, denoted by → with various subscripts. For each pair of transactions, Ti and Tj:

    (1) Ti →rw Tj if in some log of E, Ti reads some data item into which Tj subsequently writes;

    (2) Ti →wr Tj if in some log of E, Ti writes into some data item that Tj subsequently reads;

    (3) Ti →ww Tj if in some log of E, Ti writes into some data item into which Tj subsequently writes;

    (4) Ti →rwr Tj if Ti →rw Tj or Ti →wr Tj;

    (5) Ti → Tj if Ti →rwr Tj or Ti →ww Tj.

    Intuitively, → (with any subscript) means "in any serialization must precede." For example, Ti →rw Tj means "Ti in any serialization must precede Tj." This interpretation follows from Theorem 1: If Ti reads x before Tj writes into x, then the hypothetical serialization in Theorem 1 must have Ti preceding Tj.

    Every conflict between operations in E is represented by an → relationship. Therefore, we can restate Theorem 1 in terms of →. According to Theorem 1, E is serializable if there is a total order of transactions that is consistent with →. This latter condition holds if and only if → is acyclic. (A relation, →, is acyclic if there is no sequence T1 → T2, T2 → T3, ..., Tn-1 → Tn such that T1 = Tn.) Let us decompose → into its components, →rwr and →ww, and restate the theorem using them.

    Theorem 2 [BERN80a]

    Let →rwr and →ww be associated with execution E. E is serializable if (a) →rwr and →ww are acyclic, and (b) there is a total ordering of the transactions consistent with all →rwr and all →ww relationships.

    Theorem 2 is an immediate consequence of Theorem 1. (Indeed, part (b) of Theorem 2 is essentially a restatement of the earlier theorem.) However, this way of characterizing serializability suggests a way of decomposing the problem into simpler parts. Theorem 2 implies that rw and ww conflicts can be synchronized independently except insofar as there must be a total ordering of the transactions consistent with both types of conflicts. This suggests that we can use one technique to guarantee an acyclic →rwr relation (which amounts to read-write synchronization) and a different technique to guarantee an acyclic →ww relation (write-write synchronization). However, in addition to both →rwr and →ww being acyclic, there must also be one serial order consistent with all → relations. This serial order is the cement that binds together the rw and ww synchronization techniques.
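The sufficient condition of Theorem 2 can be checked mechanically on small examples. The sketch below is an illustration under stated assumptions, not the paper's algorithm. It assumes each log is a list of hypothetical `(transaction, kind, item)` tuples in execution order, with kind `"r"` or `"w"`. The helper names are invented here. Because a total order consistent with both relations already implies that each relation is acyclic, the check only looks for an order over their union.

```python
from itertools import combinations

def precedence_edges(logs):
    """Return (rwr, ww): sets of edges (Ti, Tj) meaning Ti must precede Tj."""
    rwr, ww = set(), set()
    for log in logs:
        # combinations() keeps log order, so the first op precedes the second.
        for (t1, k1, x1), (t2, k2, x2) in combinations(log, 2):
            if t1 == t2 or x1 != x2:
                continue
            if k1 == "w" and k2 == "w":
                ww.add((t1, t2))
            elif "w" in (k1, k2):
                rwr.add((t1, t2))
    return rwr, ww

def serial_order(edges):
    """Return a total order consistent with edges, or None if they form a cycle."""
    nodes = {t for edge in edges for t in edge}
    remaining = set(edges)
    order = []
    while nodes:
        # Pick a transaction that no remaining edge forces to come later.
        free = sorted(n for n in nodes if all(b != n for _, b in remaining))
        if not free:
            return None
        order.append(free[0])
        nodes.remove(free[0])
        remaining = {(a, b) for a, b in remaining if a != free[0]}
    return order

def is_serializable(logs):
    rwr, ww = precedence_edges(logs)
    return serial_order(rwr | ww) is not None
```

An execution in which →rwr and →ww are each acyclic but point in opposite directions is reported as not serializable by this check, which is the role of the single serial order described above.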

    Decomposing serializability into rw and ww synchronization is the cornerstone of our paradigm for concurrency control. It will be important hereafter to distinguish algorithms that attain either rw or ww synchronization from algorithms that solve the entire distributed concurrency control problem. We use the term synchronization technique for the former type of algorithm, and concurrency control method for the latter.

    3. SYNCHRONIZATION TECHNIQUES BASED ON TWO-PHASE LOCKING

    Two-phase locking (2PL) synchronizes reads and writes by explicitly detecting and preventing conflicts between concurrent operations. Before reading data item x, a transaction must "own" a readlock on x. Before writing into x, it must "own" a writelock on x. The ownership of locks is governed by two rules: (1) different transactions cannot simultaneously own conflicting locks; and (2) once a transaction surrenders ownership of a lock, it may never obtain additional locks.

    The definition of conflicting lock depends on the type of synchronization being performed: for rw synchronization two locks conflict if (a) both are locks on the same data item, and (b) one is a readlock and the other is a writelock; for ww synchronization two locks conflict if (a) both are locks on the same data item, and (b) both are writelocks.

    The second lock ownership rule causes every transaction to obtain locks in a two-phase manner. During the growing phase the transaction obtains locks without releasing any locks. By releasing a lock the transaction enters the shrinking phase. During this phase the transaction releases locks, and, by rule 2, is prohibited from obtaining additional locks. When the transaction terminates (or aborts), all remaining locks are automatically released.
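The lock-conflict definitions and the two-phase rule can be sketched as follows. This is a minimal illustration only: the class and function names are hypothetical, and lock modes are assumed to be the strings `"read"` and `"write"`.

```python
def locks_conflict(a, b, sync):
    """a and b are (item, mode) pairs; sync is "rw" or "ww"."""
    (item_a, mode_a), (item_b, mode_b) = a, b
    if item_a != item_b:
        return False
    if sync == "rw":
        return {mode_a, mode_b} == {"read", "write"}
    if sync == "ww":
        return mode_a == "write" and mode_b == "write"
    raise ValueError("sync must be 'rw' or 'ww'")

class TwoPhaseTransaction:
    """Tracks one transaction's locks and enforces lock ownership rule 2."""

    def __init__(self, name):
        self.name = name
        self.locks = set()       # (item, mode) pairs currently owned
        self.shrinking = False   # set at the first release

    def acquire(self, item, mode):
        if self.shrinking:
            raise RuntimeError(self.name + ": no new locks after a release")
        self.locks.add((item, mode))

    def release(self, item, mode):
        self.locks.remove((item, mode))
        self.shrinking = True    # the growing phase is over

    def terminate(self):
        self.locks.clear()       # remaining locks are released automatically
        self.shrinking = True
```

Rule 1 (no two transactions own conflicting locks at once) belongs to the scheduler rather than to the transaction, so it is left out of this class.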

    A common variation is to require that transactions obtain all locks before beginning their main execution. This variation is called predeclaration. Some systems also require that transactions hold all locks until termination.

    Two-phase locking is a correct synchronization technique, meaning that 2PL attains an acyclic →rwr (→ww) relation when used for rw (ww) synchronization [BERN79b, ESWA76, PAPA79]. The serialization order attained by 2PL is determined by the order in which transactions obtain locks. The point at the end of the growing phase, when a transaction owns all the locks it ever will own, is called the locked point of the transaction [BERN79b]. Let E be an execution in which 2PL is used for rw (ww) synchronization. The →rwr (→ww) relation induced by E is identical to the relation induced by a serial execution E' in which every transaction executes at its locked point. Thus the locked points of E determine a serialization order for E.

    3.1 Basic 2PL Implementation

    An implementation of 2PL amounts to building a 2PL scheduler, a software module that receives lock requests and lock releases and processes them according to the 2PL specification.

    The basic way to implement 2PL in a distributed database is to distribute the schedulers along with the database, placing the scheduler for data item x at the DM where x is stored. In this implementation readlocks may be implicitly requested by dm-reads and writelocks may be implicitly requested by prewrites. If the requested lock cannot be granted, the operation is placed on a waiting queue for the desired data item. (This can produce a deadlock, as discussed in Section 3.5.) Writelocks are implicitly released by dm-writes. However, to release readlocks, special lock-release operations are required. These lock releases may be transmitted in parallel with the dm-writes, since the dm-writes signal the start of the shrinking phase. When a lock is released, the operations on the waiting queue of that data item are processed in first-in/first-out (FIFO) order.
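A per-DM scheduler of this kind might look like the sketch below. It is a simplified illustration with assumptions that go beyond the text: the class name is hypothetical, locks are requested explicitly rather than implicitly by dm-reads and prewrites, readlocks and writelocks are handled together, and a request is granted whenever it does not conflict with the current owners.

```python
from collections import defaultdict, deque

class Basic2PLScheduler:
    """Lock table for the data items stored at one DM (illustrative only)."""

    def __init__(self):
        self.owners = defaultdict(list)    # item -> [(txn, mode)]
        self.waiting = defaultdict(deque)  # item -> FIFO queue of (txn, mode)

    def _grantable(self, item, txn, mode):
        # Only two readlocks held by different transactions are compatible.
        return all(t == txn or (m == "read" and mode == "read")
                   for t, m in self.owners[item])

    def request(self, txn, item, mode):
        """Grant the lock and return True, or queue the request and return False."""
        if self._grantable(item, txn, mode):
            self.owners[item].append((txn, mode))
            return True
        self.waiting[item].append((txn, mode))
        return False

    def release(self, txn, item):
        """Release txn's locks on item; return transactions granted from the queue."""
        self.owners[item] = [(t, m) for t, m in self.owners[item] if t != txn]
        granted = []
        queue = self.waiting[item]
        # Process waiters in FIFO order, stopping at the first blocked one.
        while queue and self._grantable(item, *queue[0]):
            t, m = queue.popleft()
            self.owners[item].append((t, m))
            granted.append(t)
        return granted
```

A waiting writer is granted only after every reader has released, as the FIFO processing in `release` shows.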

    Notice that this implementation "automatically" handles redundant data correctly. Suppose logical data item X has copies x1, ..., xm. If basic 2PL is used for rw synchronization, a transaction may read any copy and need only obtain a readlock on the copy of X it actually reads. However, if a transaction updates X, then it must update all copies of X, and so must obtain writelocks on all copies of X (whether basic 2PL is used for rw or ww synchronization).

    3.2 Primary Copy 2PL

    Primary copy 2PL is a 2PL technique that pays attention to data redundancy [STON79]. One copy of each logical data item is designated the primary copy; before accessing any copy of the logical data item, the appropriate lock must be obtained on the primary copy.

    For readlocks this technique requires more communication than basic 2PL. Suppose x1 is the primary copy of logical data item X, and suppose transaction T wishes to read some other copy, xi, of X. To read xi, T must communicate with two DMs, the DM where x1 is stored (so T can lock x1) and the DM where xi is stored. By contrast, under basic 2PL, T would only communicate with xi's DM. For writelocks, however, primary copy 2PL does not incur extra communication. Suppose T wishes to update X. Under basic 2PL, T would issue prewrites to all copies of X (thereby requesting writelocks on these data items) and then issue dm-writes to all copies. Under primary copy 2PL the same operations would be required, but only the prewrite(x1) would request a writelock. That is, prewrites would be sent for x1, ..., xm, but the prewrites for x2, ..., xm would not implicitly request writelocks.

    3.3 Voting 2PL

    Voting 2PL (or majority consensus 2PL) is another 2PL implementation that exploits data redundancy. Voting 2PL is derived from the majority consensus technique of Thomas [THOM79] and is only suitable for ww synchronization.

    To understand voting, we must examine it in the context of two-phase commit. Suppose transaction T wants to write into X. Its TM sends prewrites to each DM holding a copy of X. For the voting protocol, the DM always responds immediately. It acknowledges receipt of the prewrite and says "lock set" or "lock blocked." (In the basic implementation it would not acknowledge at all until the lock is set.) After the TM receives acknowledgments from the DMs, it counts the number of "lock set" responses: if the number constitutes a majority, then the TM behaves as if all locks were set. Otherwise, it waits for "lock set" operations from DMs that originally said "lock blocked." Deadlocks aside (see Section 3.5), it will eventually receive enough "lock set" operations to proceed.

    Since only one transaction can hold a majority of locks on X at a time, only one transaction writing into X can be in its second commit phase at any time. All copies of X thereby have the same sequence of writes applied to them. A transaction's locked point occurs when it has obtained a majority of its writelocks on each data item in its writeset. When updating many data items, a transaction must obtain a majority of locks on every data item before it issues any dm-writes.
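The TM's counting step can be sketched as below. The function names and the dictionary layout are assumptions made for illustration: each logical data item maps to the latest response from each DM that holds a copy.

```python
def majority_locked(responses):
    """responses maps each copy of X to "lock set" or "lock blocked"."""
    lock_set = sum(1 for r in responses.values() if r == "lock set")
    return lock_set > len(responses) / 2   # strict majority of the copies

def may_issue_dm_writes(writeset_responses):
    """True once a majority of locks is held on every item in the writeset."""
    return all(majority_locked(r) for r in writeset_responses.values())
```

With three copies, two "lock set" responses are enough. With four copies, two are not, because two transactions could each hold two locks at the same time.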

    In principle, voting 2PL could be adapted for rw synchronization. Before reading any copy of X a transaction requests readlocks on all copies of X; when a majority of locks are set, the transaction may read any copy. This technique works but is overly strong: Correctness only requires that a single copy of X be locked (namely, the copy that is read), yet this technique requests locks on all copies. For this reason we deem voting 2PL to be inappropriate for rw synchronization.

    3.4 Centralized 2PL

    Instead of distributing the 2PL schedulers, one can centralize the scheduler at a single site [ALSB76a, GARC79a]. Before accessing data at any site, appropriate locks must be obtained from the central 2PL scheduler. So, for example, to perform dm-read(x) where x is not stored at the central site, the TM must first request a readlock on x from the central site, wait for the central site to acknowledge that the lock has been set, then send dm-read(x) to the DM that holds x. (To save some communication, one can have the TM send both the lock request and dm-read(x) to the central site and let the central site directly forward dm-read(x) to x's DM; the DM then responds to the TM when dm-read(x) has been processed.) Like primary copy 2PL, this approach tends to require more communication than basic 2PL.


    [Figure 6 (caption not recoverable): panels "Transactions" and "Database"; only the fragment "T1: BEGIN;" survives in the source.]

    [Figure 7 diagram. Edge labels: T1 must wait for T2 to release read-lock on y2; T3 must wait for T1 to release read-lock on x1; T2 must wait for T3 to release read-lock on z3.]

    Figure 7. Waits-for graph for Figure 6.

    ... guarantee that if Ti waits for Tj, then deadlock cannot result. One simple approach is never to let Ti wait for Tj. This trivially prevents deadlock but forces many restarts.

    A better approach is to assign priorities to transactions and to test priorities to decide whether Ti can wait for Tj. For example, we could let Ti wait for Tj if Ti has lower priority than Tj (if Ti and Tj have equal priorities, Ti cannot wait for Tj, or vice versa). This test prevents deadlock because, for every edge (Ti, Tj) in the waits-for graph, Ti has lower priority than Tj. Since a cycle is a path from a node to itself and since Ti cannot have lower priority than itself, no cycle can exist.
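The priority test in this paragraph can be sketched as a filter on attempted waits. The names are hypothetical. The sketch only records which waits are admitted and which are denied; it does not decide which transaction is restarted when a wait is denied.

```python
def admit_waits(priority, attempts):
    """Apply the test "Ti may wait for Tj only if Ti has lower priority than Tj".

    priority maps transaction -> number; attempts is a list of (waiter, holder).
    Returns (edges, denied): admitted waits-for edges and denied attempts.
    """
    edges, denied = [], []
    for waiter, holder in attempts:
        if priority[waiter] < priority[holder]:
            edges.append((waiter, holder))   # edge goes from lower to higher priority
        else:
            denied.append((waiter, holder))  # the wait is not permitted
    return edges, denied
```

Every admitted edge climbs in priority, so a set of attempts that would close a cycle always has at least one attempt denied.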

    One problem with the preceding approach is that cyclic restart is possible: some unfortunate transaction could be continually restarted without ever finishing. To avoid this problem, Rosenkrantz et al. [ROSE78] propose using "timestamps" as priorities. Intuitively, a transaction's timestamp is the time at which it begins executing, so old transactions have higher priority than young ones.

    The technique of ROSE78 requires that each transaction be assigned a unique timestamp by its TM. When a transaction begins, the TM reads the local clock time and appends a unique TM identifier to the low-order bits [THOM79]. The resulting number is the desired timestamp. The TM also agrees not to assign another timestamp until the next clock tick. Thus timestamps assigned by different TMs differ in their low-order bits (since different TMs have different identifiers), while timestamps assigned by the same TM differ in their high-order bits (since the TM does not use the same clock time twice). Hence timestamps are unique throughout the system. Note that this algorithm does not require clocks at different sites to be precisely synchronized.
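The timestamp construction can be sketched as follows. The class name, the `clock` callable, and the choice of 8 identifier bits are assumptions made for illustration. The clock is passed in as a function returning an integer tick count so that the sketch does not depend on any particular clock source.

```python
class TimestampGenerator:
    """Builds timestamps as (clock tick, TM identifier) packed into one integer."""

    def __init__(self, tm_id, clock, id_bits=8):
        if not 0 <= tm_id < (1 << id_bits):
            raise ValueError("tm_id does not fit in id_bits")
        self.tm_id = tm_id
        self.clock = clock        # callable returning the local clock as an int
        self.id_bits = id_bits
        self.last_tick = -1

    def new_timestamp(self):
        tick = self.clock()
        # Do not assign another timestamp until the next clock tick.
        while tick <= self.last_tick:
            tick = self.clock()
        self.last_tick = tick
        # Clock time in the high-order bits, TM identifier in the low-order bits.
        return (tick << self.id_bits) | self.tm_id
```

Two TMs reading the same clock value still produce different timestamps, because their identifiers differ in the low-order bits.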

    Two timestamp-based deadlock prevention schemes are proposed in ROSE78. Wait-Die is the nonpreemptive technique. Suppose transaction Ti tries to wait for Tj. If Ti has lower priority than Tj (i.e., Ti is younger than Tj), then Ti is permitted to wait. Otherwise, it is aborted ("dies") and forced to restart. It is important that Ti not be assigned a new timestamp when it restarts. Wound-Wait is the preemptive counterpart to Wait-Die. If Ti has higher priority than Tj, then Ti waits; otherwise Tj is aborted.

    Both Wait-Die and Wound-Wait avoid cyclic restart. However, in Wound-Wait an old transaction may be restarted many times, while in Wait-Die old transactions never restart. It is suggested in ROSE78 that Wound-Wait induces fewer restarts in total.

    Care must be exercised in using preemptive deadlock prevention with two-phase commit: a transaction must not be aborted once the second phase of two-phase commit has begun. If a preemptive technique wishes to abort Tj, it checks with Tj's TM and cancels the abort if Tj has entered the second phase. No deadlock can result because if Tj is in the second phase, it cannot be waiting for any transactions.

    Preordering of resources is a deadlock avoidance technique that avoids restarts altogether. This technique requires predeclaration of locks (each transaction obtains all its locks before execution). Data items are numbered and each transaction requests locks one at a time in numeric order. The priority of a transaction is the number of the highest numbered lock it owns. Since a transaction can only wait for transactions with higher priority, no deadlocks can occur. In addition to requiring predeclaration, a principal disadvantage of this technique is that it forces locks to be obtained sequentially, which tends to increase response time.
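The ordering rule and the resulting priority can be sketched in a few lines. The function names and the `item_number` mapping are hypothetical.

```python
def acquisition_order(item_number, items):
    """Order in which a transaction requests its predeclared locks."""
    return sorted(items, key=lambda item: item_number[item])

def priority(item_number, owned):
    """Number of the highest numbered lock owned (0 if none is owned yet)."""
    return max((item_number[item] for item in owned), default=0)
```

A transaction waiting for item n owns only items numbered below n, while the current owner of n has priority at least n. Waits therefore always point toward higher priority.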


    Consider the execution illustrated in Figures 6 and 7. Locks are requested at DMs in the following order:

    DM A                   DM B                   DM C
    readlock x1 for T1     readlock y2 for T2     readlock z3 for T3
    writelock y1 for T1    writelock z2 for T2    [entry missing in source]
    *writelock x1 for T3   *writelock y2 for T1   *writelock z3 for T2

    None of the "starred" locks can be granted and the system is in deadlock. However, the waits-for graphs at each DM are acyclic.

    [Local waits-for graphs at DM A, DM B, and DM C; diagrams not recoverable from the source.]

    Figure 8. Multisite deadlock.

    3.5.2 Deadlock Detection

    In deadlock detection, transactions wait for each other in an uncontrolled manner and are only aborted if a deadlock actually occurs. Deadlocks are detected by explicitly constructing the waits-for graph and searching it for cycles. (Cycles in a graph can be found efficiently using, for example, Algorithm 5.2 in AHO75.) If a cycle is found, one transaction on the cycle, called the victim, is aborted, thereby breaking the deadlock. To minimize the cost of restarting the victim, victim selection is usually based on the amount of resources used by each transaction on the cycle.

    The principal difficulty in implementing deadlock detection in a distributed database is constructing the waits-for graph efficiently. Each 2PL scheduler can easily construct the waits-for graph based on the waits-for relationships local to that scheduler. However, these local waits-for graphs are not sufficient to characterize all deadlocks in the distributed system (see Figure 8). Instead, local waits-for graphs must be combined into a more "global" waits-for graph. (Centralized 2PL does not have this problem, since there is only one scheduler.) We describe two techniques for constructing global waits-for graphs: centralized and hierarchical deadlock detection.
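The multisite deadlock of Figure 8 can be reproduced with a small cycle search. This sketch uses a plain depth-first search, not Algorithm 5.2 of AHO75. The function name and the edge encoding `(waiter, holder)` are assumptions. The local edges are read off the starred lock requests: T3 waits for T1 at DM A, T1 waits for T2 at DM B, and T2 waits for T3 at DM C.

```python
def find_cycle(edges):
    """Return a list of transactions forming a cycle in the graph, or None."""
    graph = {}
    for waiter, holder in edges:
        graph.setdefault(waiter, set()).add(holder)
        graph.setdefault(holder, set())
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    color = dict.fromkeys(graph, WHITE)
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in sorted(graph[node]):
            if color[nxt] == GREY:        # back edge closes a cycle
                return path[path.index(nxt):]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        color[node] = BLACK
        path.pop()
        return None

    for node in sorted(graph):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None

# Local waits-for graphs, one per DM, as (waiter, holder) edges.
local_graphs = {
    "DM A": {("T3", "T1")},
    "DM B": {("T1", "T2")},
    "DM C": {("T2", "T3")},
}
global_graph = set().union(*local_graphs.values())
```

Each local graph is acyclic on its own, while the union of the three contains the cycle through T1, T2, and T3.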

    In the centralized approach, one site is designated the deadlock detector for the distributed system [GRAY78, STON79]. Periodically (e.g., every few minutes) each scheduler sends its local waits-for graph to the deadlock detector. The deadlock detector combines the local graphs into a system-wide waits-for graph by constructing the union of the local graphs.

    In the hierarchical approach, the database sites are organized into a hierarchy (or tree), with a deadlock detector at each node of the hierarchy [MENA79]. For example, one might group sites by region, then by country, then by continent. Deadlocks that are local to a single site are detected at that site; deadlocks involving two or more sites of the same region are detected by the regional deadlock detector; and so on.

    Although centralized and hierarchical deadlock detection differ in detail, both involve periodic transmission of local waits-for information to one or more deadlock detector sites. The periodic nature of the process introduces two problems. First, a deadlock may exist for several minutes without being detected, causing response-time degradation. The solution, executing the deadlock detector more frequently, increases the cost of deadlock detection. Second, a transaction T may be restarted for reasons other than concurrency control (e.g., its site crashed). Until T's restart propagates to the deadlock detector, the deadlock detector can find a cycle in the waits-for graph that includes T. Such a cycle is called a phantom deadlock. When the deadlock detector discovers a phantom deadlock, it may unnecessarily restart a transaction other than T. Special precautions are also needed to avoid unnecessary restarts for deadlocks in voting 2PL (see footnote 2).

    2 Suppose logical data item X has copies x1, x2, and x3, and suppose that, using voting 2PL, Ti owns writelocks on x1 and x2 but Ti's lock request for x3 is blocked by Tj. Insofar as x3's scheduler is concerned, Ti is waiting for Tj. However, since Ti has a majority of the copies locked, Ti can proceed without waiting for Tj. This fact should be incorporated into the deadlock resolution scheme to avoid unnecessary restarts.

    A major cost of deadlock detection is the restarting of partially executed transactions. Predeclaration can be used to reduce this cost. By obtaining a transaction's locks before it executes, the system will only restart transactions that have not yet executed. Thus little work is wasted by the restart.

    4. SYNCHRONIZATION TECHNIQUES BASED ON TIMESTAMP ORDERING

    Timestamp ordering (T/O) is a technique whereby a serialization order is selected a priori and transaction execution is forced to obey this order. Each transaction is assigned a unique timestamp by its TM. The TM attaches the timestamp to all dm-reads and dm-writes issued on behalf of the transaction, and DMs are required to process conflicting operations in timestamp order. The timestamp of operation O is denoted ts(O).

    The definition of conflicting operations depends on the type of synchronization being performed and is analogous to conflicting locks. For rw synchronization, two operations conflict if (a) both operate on the same data item, and (b) one is a dm-read and the other is a dm-write. For ww synchronization, two operations conflict if (a) both operate on the same data item, and (b) both are dm-writes.

    It is easy to prove that T/O attains an acyclic →rwr (→ww) relation when used for rw (ww) synchronization. Since each DM processes conflicting operations in timestamp order, each edge of the →rwr (→ww) relation is in timestamp order. Consequently, all paths in the relation are in timestamp order and, since all transactions have unique timestamps, no cycles are possible. In addition, the timestamp order is a valid serialization order.

    4.1 Basic T/O Implementation

    An implementation of T/O amounts to building a T/O scheduler, a software module that receives dm-reads and dm-writes and outputs these operations according to the T/O specification [SHAP77a, SHAP77b]. In practice, prewrites must also be processed through the T/O scheduler for two-phase commit to operate properly. As was the case with 2PL, the basic T/O implementation distributes the schedulers along with the database [BERN80a].

    If we ignore two-phase commit, the basic T/O scheduler is quite simple. At each DM, and for each data item x stored at the DM, the scheduler records the largest timestamp of any dm-read(x) or dm-write(x) that has been processed. These are denoted R-ts(x) and W-ts(x), respectively. For rw synchronization, scheduler S operates as follows. Consider a dm-read(x) with timestamp TS. If TS < W-ts(x), S rejects the dm-read and aborts the issuing transaction. Otherwise S outputs the dm-read and sets R-ts(x) to max(R-ts(x), TS). For a dm-write(x) with timestamp TS, S rejects the dm-write if TS < R-ts(x); otherwise it outputs the dm-write and sets W-ts(x) to max(W-ts(x), TS). For ww synchronization, S rejects a dm-write(x) with timestamp TS if TS < W-ts(x); otherwise it outputs the dm-write and sets W-ts(x) to TS.

    When a transaction is aborted, it is assigned a new and larger timestamp by its TM and is restarted. Restart issues are discussed further below.
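The basic T/O rules, ignoring two-phase commit, translate almost directly into code. The sketch below keeps state for a single data item x. The class names are hypothetical, and timestamps are assumed to be positive integers so that 0 can stand for "no operation processed yet".

```python
class BasicTORw:
    """Basic T/O rw synchronization for one data item x."""

    def __init__(self):
        self.r_ts = 0   # R-ts(x): largest timestamp of any processed dm-read(x)
        self.w_ts = 0   # W-ts(x): largest timestamp of any processed dm-write(x)

    def dm_read(self, ts):
        if ts < self.w_ts:
            return False                  # rejected: issuing transaction aborts
        self.r_ts = max(self.r_ts, ts)
        return True                       # output

    def dm_write(self, ts):
        if ts < self.r_ts:
            return False                  # rejected
        self.w_ts = max(self.w_ts, ts)
        return True                       # output


class BasicTOWw:
    """Basic T/O ww synchronization for one data item x."""

    def __init__(self):
        self.w_ts = 0

    def dm_write(self, ts):
        if ts < self.w_ts:
            return False                  # rejected
        self.w_ts = ts
        return True                       # output
```

A rejected operation returns `False`. Restarting the transaction with a new and larger timestamp is left to the caller.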

    Two-phase commit is incorporated by timestamping prewrites and accepting or rejecting prewrites instead of dm-writes. Once a scheduler accepts a prewrite, it must guarantee to accept the corresponding dm-write no matter when the dm-write arrives. For rw (or ww) synchronization, once S accepts a prewrite(x) with timestamp TS it must not output any dm-read(x) (or dm-write(x)) with timestamp greater than TS until the dm-write(x) is output. The effect is similar to setting a writelock on x for the duration of two-phase commit.

    To implement the above rules, S buffers dm-reads, dm-writes, and prewrites. Let min-R-ts(x) be the minimum timestamp of any buffered dm-read(x), and define min-W-ts(x) and min-P-ts(x) analogously. Rw synchronization is accomplished as follows:

    1 . L e t R b e a d m - r e a d ( x ) . I f t s ( R ) < W -t s (x ) , R i s r e j ec ted . E l se i f t s (R) > min-P- t s (x ) , R i s bu ffe red . E lse R i s ou tpu t .

    Com put ing Su rveys , Vo l . 13, No . 2 , Jun e 1981

  • 8/8/2019 Xu Ly Tuong Tranh

    16/37

    200 P. A. Bernstein and N. Goodman

    Let R = dm-read(x). Let W = dm-write(x).
    R is ready if it precedes the earliest prewrite request: ts(R) < min-P-ts(x).
    W is ready if it precedes the earliest dm-read request: ts(W) < min-R-ts(x).

    When a dm-write(x) arrives, do the following:
      Buffer it.
      Output all ready W's, and debuffer their prewrites. (This may increase min-P-ts(x) and make some R's ready.)
      Output all ready R's. (This may increase min-R-ts(x) and make some W's ready.)

    Figure 9. Buffer emptying for basic T/O rw synchronization.

    2. Let P be a prewrite(x). If ts(P) < R-ts(x), P is rejected. Else P is buffered.

    3. Let W be a dm-write(x). W is never rejected. If ts(W) > min-R-ts(x), W is buffered. (If W were output it would cause a buffered dm-read(x) to be rejected.) Else W is output.

    4. When W is output, the corresponding prewrite is debuffered. If this causes min-P-ts(x) to increase, the buffered dm-reads are retested to see if any of them can be output. If this causes min-R-ts(x) to be increased, the buffered dm-writes are also retested, and so forth. This process is diagramed in Figure 9.
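    The four rw rules above can be sketched for a single data item x as follows. This is a minimal illustration, not the authors' implementation: the class name, method names, and return strings are hypothetical, operations are represented only by their timestamps, and every dm-write is assumed to have had its prewrite accepted earlier.

```python
import math


class BasicTORwScheduler:
    """Sketch of rw rules 1-4 for one data item x (all names are hypothetical)."""

    def __init__(self):
        self.r_ts = -math.inf   # R-ts(x): largest timestamp of any dm-read output so far
        self.w_ts = -math.inf   # W-ts(x): largest timestamp of any dm-write output so far
        self.reads = []         # timestamps of buffered dm-reads
        self.prewrites = []     # timestamps of buffered prewrites
        self.writes = []        # timestamps of buffered dm-writes
        self.output = []        # operations released, in release order

    @staticmethod
    def _min(buffer):
        # min-R-ts / min-P-ts; infinity when nothing is buffered
        return min(buffer, default=math.inf)

    def dm_read(self, ts):
        if ts < self.w_ts:                       # rule 1: too late, reject
            return "rejected"
        if ts > self._min(self.prewrites):       # rule 1: an earlier prewrite is pending
            self.reads.append(ts)
            return "buffered"
        self._output_read(ts)
        return "output"

    def prewrite(self, ts):
        if ts < self.r_ts:                       # rule 2
            return "rejected"
        self.prewrites.append(ts)
        return "buffered"

    def dm_write(self, ts):
        # Assumption: prewrite(ts) was accepted earlier; never rejected (rule 3).
        self.writes.append(ts)
        self._drain()
        return "buffered" if ts in self.writes else "output"

    def _output_read(self, ts):
        self.r_ts = max(self.r_ts, ts)
        self.output.append(("dm-read", ts))

    def _drain(self):
        progress = True
        while progress:                          # rule 4: retest until nothing moves
            progress = False
            for ts in sorted(self.writes):
                if ts <= self._min(self.reads):  # rule 3: not later than any buffered read
                    self.writes.remove(ts)
                    self.prewrites.remove(ts)    # debuffer the corresponding prewrite
                    self.w_ts = max(self.w_ts, ts)
                    self.output.append(("dm-write", ts))
                    progress = True
            for ts in sorted(self.reads):
                if ts <= self._min(self.prewrites):
                    self.reads.remove(ts)
                    self._output_read(ts)
                    progress = True
```

    In this sketch a dm-read with timestamp 7 that arrives after prewrite 5 stays buffered until dm-write 5 is output, which mirrors the write-lock analogy in the text.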

    Ww synchronization is accomplished as follows:

    1. Let P be a prewrite(x). If ts(P) < W-ts(x), P is rejected; else P is buffered.

    2. Let W be a dm-write(x). W is never rejected. If ts(W) > min-P-ts(x), W is buffered; else W is output.

    3. When W is output, the corresponding prewrite is debuffered. If this causes min-P-ts(x) to be increased, the buffered dm-writes are retested to see if any can now be output. See Figure 10.

    As with 2PL, a common variation is to require that transactions predeclare their readsets and writesets, issuing all dm-reads and prewrites before beginning their main execution.³ If all operations are accepted, the transaction is guaranteed to execute without danger of restart. Another variation is to delay the processing of operations to wait for operations with smaller timestamps. The extreme version of this heuristic is conservative T/O, described in Section 4.4.

    ³ These prewrites are nonstandard relative to the definition in Section 1.4. Since new values for the data items in the writeset are not yet known, these prewrites do not instruct DMs to store values on secure storage; instead, prewrite(x) merely "warns" the DM to expect a dm-write(x) in the near future. However, these prewrites are processed by synchronization algorithms exactly as "standard" ones are.

    When a dm-write(x) arrives, do the following:
      Buffer it.
      Output all ready W's and debuffer their prewrites. (This may increase min-P-ts(x) and make some W's ready.)

    Figure 10. Buffer emptying for basic T/O ww synchronization.

    4.2 The Thomas Write Rule

    For ww synchronization the basic T/O scheduler can be optimized using an observation of [THOM79]. Let W be a dm-write(x), and suppose ts(W) < W-ts(x). Instead of rejecting W we can simply ignore it. We call this the Thomas Write Rule (TWR). Intuitively, TWR applies to a dm-write that tries to place obsolete information into the database. The rule guarantees that the effect of applying a set of dm-writes to x is identical to what would have happened had the dm-writes been applied in timestamp order.

    If TWR is used, there is no need to incorporate two-phase commit into the ww synchronization algorithm; the ww scheduler always accepts prewrites and never buffers dm-writes.
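    The Thomas Write Rule can be illustrated with a short sketch. The function name and the (timestamp, value) representation of a dm-write are assumptions made for the example; the only logic taken from the text is that a dm-write with ts(W) < W-ts(x) is ignored rather than rejected.

```python
def apply_dm_writes(writes):
    """Apply dm-writes to one data item x in arrival order under TWR.

    `writes` is a hypothetical list of (timestamp, value) pairs.
    Returns the final (W-ts(x), value of x).
    """
    w_ts, value = float("-inf"), None
    for ts, new_value in writes:
        if ts < w_ts:
            continue            # obsolete write: ignore it, do not reject
        w_ts, value = ts, new_value
    return w_ts, value
```

    Whatever the arrival order, the item ends up holding the value of the dm-write with the largest timestamp, which is what applying the writes in timestamp order would have produced.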

    4.3 Multiversion T/O

    For rw synchronization the basic T/O scheduler can be improved using multiversion data items [REED78]. For each data item x there is a set of R-ts's and a set of (W-ts, value) pairs, called versions. The R-ts's of x record the timestamps of all executed dm-read(x) operations, and the versions record the timestamps and values of all executed dm-write(x) operations. (In practice one cannot store R-ts's and versions forever; techniques for deleting old versions and timestamps are described in Sections 4.5 and 5.2.2.)

    Multiversion T/O accomplishes rw synchronization as follows (ignoring two-phase commit). Let R be a dm-read(x). R is processed by reading the version of x with largest timestamp less than ts(R) and adding ts(R) to x's set of R-ts's; see Figure 11a. R is never rejected. Let W be a dm-write(x), and let interval(W) be the interval from ts(W) to the smallest W-ts(x) > ts(W);⁴ see Figure 11b. If any R-ts(x) lies in interval(W), W is rejected; otherwise W is output and creates a new version of x with timestamp ts(W).
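    A minimal sketch of these two rules for one data item, ignoring two-phase commit as the text does. The class and method names are hypothetical. Two further assumptions are made for the example: the item starts with an initial version whose timestamp is minus infinity, and interval(W) is treated as open at both ends.

```python
import bisect
import math


class MultiversionItem:
    """Versions and R-ts's of one data item x (names are hypothetical)."""

    def __init__(self, initial=None):
        self.w_ts = [-math.inf]    # sorted version timestamps (assumed initial version)
        self.values = [initial]    # values[i] belongs to version w_ts[i]
        self.r_ts = []             # timestamps of all executed dm-reads

    def dm_read(self, ts):
        # Read the version with the largest timestamp less than ts; never rejected.
        i = bisect.bisect_left(self.w_ts, ts) - 1
        self.r_ts.append(ts)
        return self.values[i]

    def dm_write(self, ts, value):
        # interval(W) runs from ts to the smallest W-ts(x) > ts, or to infinity.
        i = bisect.bisect_right(self.w_ts, ts)
        upper = self.w_ts[i] if i < len(self.w_ts) else math.inf
        if any(ts < r < upper for r in self.r_ts):
            return False           # rejected: a read already skipped over this slot
        self.w_ts.insert(i, ts)
        self.values.insert(i, value)
        return True                # output: new version created
```

    For example, after a write at timestamp 10 and a read at timestamp 15, a write at timestamp 12 is rejected because the read at 15 lies in its interval, while a late read at timestamp 5 is still served from the initial version.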

    To prove the correctness of multiversion T/O, we must show that every execution is equivalent to a serial execution in timestamp order [BERN80b]. Let R be a dm-read(x) that is processed "out of order"; that is, suppose R is executed after a dm-write(x) whose timestamp exceeds ts(R). Since R ignores all versions with time-

    ⁴ Interval(W) = (ts(W), ∞) if no W-ts(x) > ts(W) exists.


    Let R = dm-read(x).
    R is ready if ts(R) ∉ interval(P), where P is any buffered prewrite(x).

    When a dm-write arrives, do the following:
      Output it and debuffer its prewrite.
      Output ready R's.

    Figure 12. Buffer emptying for multiversion T/O.

    4.4 Conservative T/O

    Conservative timestamp ordering is a technique for eliminating restarts during T/O scheduling [BERN80a]. When a scheduler receives an operation O that might cause a future restart, the scheduler delays O until it is sure that no future restarts are possible.

    Conservative T/O requires that each scheduler receive dm-reads (or dm-writes) from each TM in timestamp order. For example, if scheduler S_j receives dm-read(x) followed by dm-read(y) from TM_i, then ts(dm-read(x)) ≤ ts(dm-read(y)). Since the network is assumed to be a FIFO channel, this timestamp ordering is accomplished by requiring that TM_i send dm-reads (or dm-writes) to S_j in timestamp order.⁵

    ⁵ This can be implemented by requiring that TMs process transactions serially. Alternatively, we can require that transactions issue all dm-reads before beginning their main execution, and all dm-writes after terminating their main execution. Then transactions can execute concurrently, although they must terminate in timestamp order.

    Conservative T/O buffers dm-reads and dm-writes as part of its normal operation. When a scheduler buffers an operation, it remembers the TM that sent it. Let min-R-ts(TM_i) be the minimum timestamp of any buffered dm-read from TM_i, with min-R-ts(TM_i) = -∞ if no such dm-read is buffered. Define min-W-ts(TM_i) analogously.

    Conservative T/O performs rw synchronization as follows:

    1. Let R be a dm-read(x). If ts(R) > min-W-ts(TM) for any TM in the system, R is buffered. Else R is output.

    2. Let W be a dm-write(x). If ts(W) > min-R-ts(TM) for any TM, W is buffered. Else W is output.

    3. When R or W is output or buffered, this may increase min-R-ts(TM) or min-W-ts(TM); buffered operations are retested to see if they can now be output.

    The effect is that R is output if and only if (a) the scheduler has a buffered dm-write from every TM, and (b) ts(R) < minimum timestamp of any buffered dm-write. Similarly, W is output if and only if (a) there is a buffered dm-read from every TM, and (b) ts(W) < minimum timestamp of any buffered dm-read. Thus R (or W) is output if and only if the scheduler has received every dm-write (or dm-read) with smaller timestamp that it will ever receive.

    Ww synchronization is accomplished as follows:

    1. Let W be a dm-write(x). If ts(W) > min-W-ts(TM) for any TM in the system, W is buffered; else it is output.

    2. When W is buffered or output, this may increase min-W-ts(TM); buffered dm-writes are retested accordingly.

    The effect is that the scheduler waits until it has a buffered dm-write from every TM and then outputs the dm-write with smallest timestamp.
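    The stated effect of conservative ww synchronization can be sketched with one FIFO queue per TM. The class name, the TM identifiers, and the representation of a dm-write by its timestamp alone are assumptions made for illustration; the sketch also assumes at least one TM is registered.

```python
from collections import deque


class ConservativeWwScheduler:
    """Output the smallest buffered dm-write once every TM has one buffered."""

    def __init__(self, tms):
        self.queues = {tm: deque() for tm in tms}          # buffered dm-writes per TM
        self.last_ts = {tm: float("-inf") for tm in tms}   # last timestamp seen per TM
        self.output = []                                    # (tm, ts) in output order

    def dm_write(self, tm, ts):
        if ts < self.last_ts[tm]:
            raise ValueError("each TM must send dm-writes in timestamp order")
        self.last_ts[tm] = ts
        self.queues[tm].append(ts)
        self._drain()

    def _drain(self):
        # While every TM has a buffered dm-write, release the one with the
        # smallest timestamp; stop as soon as some TM's queue is empty.
        while self.queues and all(self.queues.values()):
            tm = min(self.queues, key=lambda t: self.queues[t][0])
            self.output.append((tm, self.queues[tm].popleft()))
```

    With two TMs, dm-writes 5 and 9 from the first TM stay buffered until the second TM sends something; once it sends 7, the scheduler outputs 5 and then 7, and 9 waits again.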

    Two-phase commit need not be tightly integrated into conservative T/O because dm-writes are never rejected. Although prewrites must be issued for all data items updated, these operations are not processed by the conservative T/O schedulers.

    The above implementation of conservative T/O suffers three major problems: (1) If some TM never sends an operation to some scheduler, the scheduler will "get stuck" and stop outputting. (2) To avoid the first problem, every TM must communicate regularly with every scheduler; this is infeasible in large networks. (3) The implementation is overly conservative; the ww algorithm, for instance, processes all dm-writes in timestamp order, not merely conflicting ones. These problems are addressed below.

    A class is defined by a readset and a writeset. For example,

      C1: readset = {x1}, writeset = {y1, y2}
      C2: readset = {x1, y2}, writeset = {y1, y2, z2, z3}
      C3: readset = {y2, z3}, writeset = {x1, z2, z3}

    A transaction is a member of a class if its readset is a subset of the class readset and its writeset is a subset of the class writeset. For example,

      T1: readset = {x1}, writeset = {y1, y2}
      T2: readset = {y2}, writeset = {z2, z3}
      T3: readset = {z3}, writeset = {x1}

    T1 is a member of C1 and C2. T2 is a member of C2 and C3. T3 is a member of C3.

    Figure 13. Transaction classes.

    Null Operations. To solve the first problem, TMs are required to periodically send timestamped null operations to each scheduler in the absence of "real" traffic. A null operation is a dm-read or dm-write whose sole purpose is to convey timestamp information and thereby unblock "real" dm-reads and prewrites. An impatient scheduler can prompt a TM for a null operation by sending a "request message." For example, for rw synchronization suppose scheduler S wants to process a dm-read with timestamp TS, but does not have a buffered dm-write from TM_i. S can send a message to TM_i requesting a null-dm-write with timestamp greater than TS.

    A variation is to use null operations with very large (perhaps infinite) timestamps. For example, if TM_i rarely needs to issue dm-reads to S, TM_i can send S a null-dm-read with infinite timestamp signifying that TM_i does not intend to communicate with S until further notice.

    Transaction Classes. Transaction classes [BERN78a, BERN80d] is a technique for reducing communication in conservative T/O and for supporting a less conservative scheduling policy. As in predeclaration, assume that every transaction's readset and writeset are known in advance. A class is defined by a readset and a writeset (see Figure 13). Transaction T is a member of class C if readset(T) is a subset of readset(C) and writeset(T) is a subset of writeset(C). (Classes need not be disjoint.)
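    The membership test is a pair of subset checks. The sketch below applies it to the classes and transactions of Figure 13; the function name and the dictionary layout are assumptions made for the example.

```python
# Class definitions from Figure 13: name -> (readset, writeset)
CLASSES = {
    "C1": ({"x1"}, {"y1", "y2"}),
    "C2": ({"x1", "y2"}, {"y1", "y2", "z2", "z3"}),
    "C3": ({"y2", "z3"}, {"x1", "z2", "z3"}),
}


def classes_containing(readset, writeset, classes=CLASSES):
    """Return the sorted names of all classes that the transaction is a member of."""
    return sorted(
        name
        for name, (class_reads, class_writes) in classes.items()
        if readset <= class_reads and writeset <= class_writes
    )
```

    Under these definitions, `classes_containing({"x1"}, {"y1", "y2"})` yields C1 and C2 for transaction T1, which agrees with Figure 13 and shows that classes need not be disjoint.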

    Class definitions are not expected to change frequently during normal operation of the system. Changing a class definition is akin to changing the database schema and requires mechanisms beyond the scope of this paper. We assume that class definitions are stored in static tables that are available to any site requiring them.

    Classes are associated with TMs. Every transaction that executes at a TM must be a member of a class associated with the TM. If a transaction is submitted to a TM that has no class containing it, the transaction is forwarded to another TM that does. We assume that every class is associated with exactly one TM, and vice versa. The class associated with TM_i is denoted C_i. To execute transactions that are members of class C at two TMs, we define another class C' with the same definition as C and associate C with one TM and C' with the other. To execute transactions that are members of two classes at one site, we multiprogram two TMs at that site.

    Classes are exploited by conservative T/O schedulers as follows. Consider rw synchronization and suppose scheduler S wants to output a dm-read(x). Instead of waiting for dm-writes with smaller timestamps from all TMs, S need only wait for dm-writes from those TMs whose class writeset contains x. Similarly, to process a dm-write(x), S need only wait for dm-reads with smaller timestamp from those TMs whose class readset contains x. Thus communication requirements are decreased, and the level of concurrency in the system is increased. Ww synchronization proceeds similarly.
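    A small sketch of how a scheduler might compute the set of TMs it has to wait for under rw synchronization. The function name and the TM identifiers are hypothetical; the class definitions are those of Figure 13, with class C_i assumed to be associated with TM_i.

```python
# TM -> (class readset, class writeset), using the classes of Figure 13
CLASS_OF_TM = {
    "TM1": ({"x1"}, {"y1", "y2"}),
    "TM2": ({"x1", "y2"}, {"y1", "y2", "z2", "z3"}),
    "TM3": ({"y2", "z3"}, {"x1", "z2", "z3"}),
}


def tms_to_wait_for(op, x, class_of_tm=CLASS_OF_TM):
    """TMs whose operations can conflict with `op` on data item x (rw case).

    op is "dm-read" or "dm-write".
    """
    if op == "dm-read":
        # A dm-read(x) only waits on TMs whose class writeset contains x.
        return {tm for tm, (_, writes) in class_of_tm.items() if x in writes}
    if op == "dm-write":
        # A dm-write(x) only waits on TMs whose class readset contains x.
        return {tm for tm, (reads, _) in class_of_tm.items() if x in reads}
    raise ValueError("op must be 'dm-read' or 'dm-write'")
```

    Here a dm-read(x1) waits only on TM3, since C3 is the only class whose writeset contains x1, instead of waiting on all three TMs.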

    Conflict Graph Analysis. Conflict graph analysis is a technique for further improving the performance of conservative T/O with classes. A conflict graph is an undirected graph that summarizes potential conflicts between transactions in different classes. For each class C_i the graph contains two nodes, denoted r_i and w_i, which represent the readset and writeset of C_i. The edges of the graph are defined as follows (see Figure 14): (1) For each class C_i there is a vertical edge between r_i and w_i; (2) for each pair of classes C_i and C_j (with i ≠ j) there is a horizontal edge between w_i and w_j if and only if writeset(C_i) intersects writeset(C_j); (3) for each pair of classes C_i and C_j (with i ≠ j) there is a diagonal edge between r_i and w_j if and only if readset(C_i) intersects writeset(C_j).

    Define C1, C2, C3 as in Figure 13.

      C1 readset = {x1}        C2 readset = {x1, y2}              C3 readset = {y2, z3}
      C1 writeset = {y1, y2}   C2 writeset = {y1, y2, z2, z3}     C3 writeset = {x1, z2, z3}

    Figure 14. Conflict graph.

    Intuitively, a horizontal edge indicates that a scheduler S may be forced to delay dm-writes for purposes of ww synchronization. Suppose classes C_i and C_j are connected by a horizontal edge (w_i, w_j), indicating that their class writesets intersect. If S receives a dm-write from C_i, it must delay the dm-write until it receives all dm-writes with smaller timestamps from C_j. Similarly, a diagonal edge indicates that S may need to delay operations for rw synchronization.

    Conflict graph analysis improves the situation by identifying interclass conflicts that cannot cause nonserializable behavior. This corresponds to identifying horizontal and diagonal edges that do not require synchronization. In particular, schedulers need only synchronize dm-writes from C_i and C_j if either (1) the horizontal edge between w_i and w_j is embedded in a cycle of the conflict graph; or (2) portions of the intersection of C_i's writeset and C_j's writeset are stored at two or more DMs [BERN80c]. That is, if conditions (1) and (2) do not hold, scheduler S need not process dm-writes from C_i and C_j in timestamp order. Similarly, dm-reads from C_i and dm-writes from C_j need only be processed in timestamp order if either (1) the diagonal edge between r_i and w_j is embedded in a cycle of the conflict graph; or (2) portions of the intersection of C_i's readset and C_j's writeset are stored at two or more DMs.

    Since classes are defined statically, conflict graph analysis is also performed statically. The analysis produces a table indicating which horizontal and vertical edges require synchronization and which do not. This table, like class definitions, is distributed in advance to all schedulers that need it.
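    The edge definitions and condition (1), the cycle test, can be sketched as a static analysis over the class table. The function names and the node encoding ("r" or "w" paired with a class name) are assumptions made for the example. Condition (2) depends on where data is stored across DMs and is not modeled here.

```python
from itertools import combinations, permutations

# Classes of Figure 13: name -> (readset, writeset)
FIG13 = {
    "C1": ({"x1"}, {"y1", "y2"}),
    "C2": ({"x1", "y2"}, {"y1", "y2", "z2", "z3"}),
    "C3": ({"y2", "z3"}, {"x1", "z2", "z3"}),
}


def conflict_graph(classes):
    """Return the set of undirected edges, each a frozenset of two nodes."""
    edges = set()
    for c in classes:                                   # (1) vertical edges
        edges.add(frozenset({("r", c), ("w", c)}))
    for a, b in combinations(classes, 2):               # (2) horizontal edges
        if classes[a][1] & classes[b][1]:
            edges.add(frozenset({("w", a), ("w", b)}))
    for a, b in permutations(classes, 2):               # (3) diagonal edges
        if classes[a][0] & classes[b][1]:
            edges.add(frozenset({("r", a), ("w", b)}))
    return edges


def on_cycle(edges, edge):
    """True if `edge` lies on a cycle, i.e. its endpoints stay connected without it."""
    u, v = tuple(edge)
    rest = edges - {edge}
    seen, stack = {u}, [u]
    while stack:                                        # depth-first search from u
        node = stack.pop()
        for e in rest:
            if node in e:
                (other,) = e - {node}
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
    return v in seen
```

    For the classes of Figure 13 these definitions give three vertical, two horizontal (w1-w2 and w2-w3), and five diagonal edges. The horizontal edge between w1 and w2 lies on a cycle through r2, so by condition (1) dm-writes from C1 and C2 would still need to be synchronized.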

    4.5 Timestamp Management

    A common criticism of T/O schedulers is that too much memory is needed to store timestamps. This problem can be overcome by "forgetting" old timestamps.

    Timestamps are used in basic T/O to reject operations that "arrive late," for example, to reject a dm-read(x) with timestamp TS1 that arrives after a dm-write(x) with timestamp TS2, where TS1 < TS2. In principle, TS1 and TS2 can differ by an arbitrary amount. However, in practice it is unlikely that these timestamps will differ by more than a few minutes. Consequently, timestamps can be stored in small tables which are periodically purged.

    R-ts's are stored in the R-table with entries of the form (x, R-ts); for any data item x, there is at most one entry. In addition, a variable, R-min, tells the maximum value of any timestamp that has been purged from the table. To find R-ts(x), a scheduler searches the R-table for an (x, TS) entry. If such an entry is found, R-ts(x) = TS; otherwise, R-ts(x) ≤ R-min. To err on the side of safety, the scheduler assumes R-ts(x) = R-min. To update R-ts(x), the scheduler modifies the (x, TS) entry, if one exists. Otherwise, a new entry is created and added to the table. When the R-table is full, the scheduler selects an appropriate value for R-min and deletes all entries from the table with smaller timestamp. W-ts's are managed similarly, and analogous techniques can be devised for multiversion databases.
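    A minimal sketch of such an R-table. The class name, the fixed capacity (assumed to be at least 1), and the purge policy are all assumptions: the text leaves the choice of R-min open, and this example picks the lower median of the stored timestamps. It also deletes entries at or below the cutoff, rather than strictly below, so that a purge always frees at least one slot.

```python
class RTable:
    """Bounded table of (x, R-ts) entries with a purge threshold R-min."""

    def __init__(self, capacity):
        self.capacity = capacity          # assumed >= 1
        self.entries = {}                 # x -> R-ts(x)
        self.r_min = float("-inf")        # max timestamp purged so far

    def r_ts(self, x):
        # If x has no entry, its true R-ts is at most R-min; assume R-min to be safe.
        return self.entries.get(x, self.r_min)

    def update(self, x, ts):
        if x not in self.entries and len(self.entries) >= self.capacity:
            self._purge()
        self.entries[x] = max(ts, self.r_ts(x))

    def _purge(self):
        # Hypothetical policy: cut at the lower median of the stored timestamps.
        stamps = sorted(self.entries.values())
        cutoff = stamps[(len(stamps) - 1) // 2]
        self.r_min = max(self.r_min, cutoff)
        self.entries = {k: v for k, v in self.entries.items() if v > cutoff}
```

    The price of purging is that an item with no entry is treated as if it had been read at time R-min, so a prewrite with a smaller timestamp may be rejected unnecessarily; correctness is preserved because the assumed R-ts is never smaller than the true one.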
