
Page 1: Chapter 15:  Transaction Management

Chapter 15: Transaction Management

Page 2: Chapter 15:  Transaction Management

Chapter 15: Transaction Management – Outline
Transaction Processing Overview, Transaction Concept, Transaction Definition in SQL, Transaction State, ACID Properties, Concurrent Executions, Serializability, Recoverability, Implementation of Isolation, Testing for Serializability.

Page 3: Chapter 15:  Transaction Management

Transaction Processing - Basics
A transaction is a logical unit of database processing.
Transaction processing systems involve large databases and hundreds of concurrent users.
Examples of such systems are: airline reservations, banking, credit card processing, supermarket checkout, and similar systems.

Page 4: Chapter 15:  Transaction Management

Multi-User Database Systems
One way to classify DBMSs is by the number of concurrent users: single-user or multi-user.
The majority of database systems are multi-user. Concurrent (or, from the user's point of view, simultaneous) database usage is possible thanks to computer multiprogramming.
A multiprogramming operating system executes some commands of one process, then suspends that process and executes some commands of another process.
After a while, the execution of the first process is resumed at the point where it was interrupted.
This type of process execution is called interleaving.

Page 5: Chapter 15:  Transaction Management

Interleaved and Parallel Processes
[Figure: a timeline (t1 … t5) contrasting interleaved execution, in which the CPU alternates between process 1 and process 2, with parallel execution, in which the two processes run at the same time.]

Page 6: Chapter 15:  Transaction Management

A Question for You
Are multiprogramming and interleaving more efficient than monoprogramming with serial program execution?
Answers:
a) Yes, but only from a user's point of view.
b) No, because multiprogramming means interrupting and resuming programs, which introduces OS overhead.
c) Yes, from the point of view of both users and computers, because the OS interrupts the execution of a program when it issues an I/O operation, thus avoiding long idle times, and resumes its execution when the I/O has finished.

Page 7: Chapter 15:  Transaction Management

Transaction Concept
A transaction is a single logical unit of database processing that includes one or more database access operations (reads and writes).
A transaction is a unit of program execution that accesses and possibly updates various data items. It may involve one or more operations on the database: it could be as simple as a single SQL command, or as complex as the set of accesses performed by an application program.
E.g., a transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
Here our transaction consists of four database accesses (reads and writes) and two non-database activities (decreasing the value of A and increasing the value of B).
A transaction should always transform the database from one consistent state to another.
If a transaction finishes successfully, all data it has changed are visible to other transactions.
If a transaction fails for any reason, the DBMS has to undo all the changes that the transaction made against the database.

Page 8: Chapter 15:  Transaction Management

Transactions (continued)
In multi-user transaction processing systems, users execute database transactions concurrently. Most often, concurrent means interleaved.
Users can attempt to modify the same database items at the same time, and that is a potential source of database inconsistency.
Checking database integrity constraints is not enough to protect a database from the threats induced by concurrent usage.
There are two main issues to deal with:
• failures of various kinds, such as hardware failures and system crashes, which leave a transaction incomplete
• concurrent execution of multiple transactions

Page 9: Chapter 15:  Transaction Management

Transaction Outcomes
A transaction can have one of two outcomes:
• If it completes successfully, the transaction is said to have committed and the database reaches a new consistent state.
• If it does not execute successfully, the transaction is aborted. In this case, the database must be restored to the consistent state it was in before the transaction started. This is known as roll-back.
Whatever the outcome, the database is in a consistent state at the end of the transaction.

Page 10: Chapter 15:  Transaction Management

Transaction Support
Once a transaction has committed, it cannot be aborted. Thus, if we decide that a committed transaction was a mistake, we must perform another transaction to reverse it.
On the other hand, an aborted transaction can be restarted later and, depending on the cause of the failure, may successfully execute and commit at that time.
A DBMS has no way of knowing which updates are grouped together to form a single logical transaction. Therefore, the user must be provided with a way to indicate the boundaries of each transaction.
For example, there may be keywords such as BEGIN_TRANSACTION, COMMIT, and ROLLBACK to delimit a transaction.
If such delimiters are not used, the whole program is usually treated as a single transaction, with the DBMS automatically performing a COMMIT upon successful termination or a ROLLBACK if not.

Page 11: Chapter 15:  Transaction Management

Transaction Execution with SQL
Transaction support is provided by two SQL statements: COMMIT and ROLLBACK.
The COMMIT statement makes all of the transaction's changes permanent within the database; it indicates the successful end of a transaction.
The ROLLBACK statement aborts all of the transaction's changes and rolls the database back to the previous consistent state; it indicates the unsuccessful end of a transaction.
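As an illustration of these transaction boundaries, here is a minimal sketch of the $50 transfer written against Python's sqlite3 module; the accounts table and its contents are assumptions made for the example, not part of the slides.

import sqlite3

# Set-up for the example: two accounts with assumed starting balances.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 200), ("B", 100)])
conn.commit()

try:
    # read(A), A := A - 50, write(A)
    conn.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'A'")
    # read(B), B := B + 50, write(B)
    conn.execute("UPDATE accounts SET balance = balance + 50 WHERE name = 'B'")
    conn.commit()      # successful end of the transaction
except Exception:
    conn.rollback()    # unsuccessful end: undo all changes made so far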

Page 12: Chapter 15:  Transaction Management

Transaction State
Active – the initial state; the transaction stays in this state while it is executing.
Partially committed – after the final statement has been executed.
Failed – after the discovery that normal execution can no longer proceed.
Aborted – after the transaction has been rolled back and the database restored to its state prior to the start of the transaction. There are two options after a transaction has been aborted:
• restart the transaction (possible only if there was no internal logical error)
• kill the transaction
Committed – after successful completion. The effects of committed transactions cannot be undone.

Page 13: Chapter 15:  Transaction Management

Transaction State (Cont.)
[State transition diagram: Begin Transaction puts the transaction in the Active state; End Transaction moves it to Partially Committed; Commit moves it to Committed; an Abort leads, via the Failed state, to Aborted.]
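One way to read the diagram is as a small state machine; the sketch below records the states and the legal transitions in Python, purely as an illustration (it is not DBMS code).

from enum import Enum, auto

class TxnState(Enum):
    ACTIVE = auto()
    PARTIALLY_COMMITTED = auto()
    FAILED = auto()
    ABORTED = auto()
    COMMITTED = auto()

# Legal transitions taken from the state diagram above.
TRANSITIONS = {
    TxnState.ACTIVE:              {TxnState.PARTIALLY_COMMITTED, TxnState.FAILED},
    TxnState.PARTIALLY_COMMITTED: {TxnState.COMMITTED, TxnState.FAILED},
    TxnState.FAILED:              {TxnState.ABORTED},
    TxnState.ABORTED:             set(),   # terminal (a restart is a new transaction)
    TxnState.COMMITTED:           set(),   # terminal: effects cannot be undone
}

def can_move(src: TxnState, dst: TxnState) -> bool:
    return dst in TRANSITIONS[src]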

Page 14: Chapter 15:  Transaction Management

Transaction Properties (ACID)
To preserve the integrity of data, the database system must ensure:
Atomicity. Either all operations of the transaction are executed and properly reflected in the database, or none are.
Consistency. A transaction transforms the database from one consistent state to another.
Isolation. Although multiple transactions may execute concurrently, each transaction must be unaware of the other concurrently executing transactions. Intermediate transaction results must be hidden from other concurrently executed transactions. That is, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti started, or Tj started execution after Ti finished.
Durability. After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.

Page 15: Chapter 15:  Transaction Management

Example of Fund Transfer
Transaction to transfer $50 from account A to account B:

1. read(A)

2. A := A – 50

3. write(A)

4. read(B)

5. B := B + 50

6. write(B)

Atomicity requirement: if the transaction fails after step 3 and before step 6, money will be "lost", leading to an inconsistent database state. The failure could be due to software or hardware.
The system should ensure that updates of a partially executed transaction are not reflected in the database: either all the operations of the transaction are completed or, if not, the transaction should be aborted.

Page 16: Chapter 15:  Transaction Management

Example of Fund Transfer
Transaction to transfer $50 from account A to account B:

1. read(A)

2. A := A – 50

3. write(A)

4. read(B)

5. B := B + 50

6. write(B)

Durability requirement — once the user has been notified that the transaction has completed (i.e., the transfer of the $50 has taken place), the updates to the database by the transaction must persist even if there are software or hardware failures.

Ensuring durability is the responsibility of the recovery-management component of the DBMS.

Page 17: Chapter 15:  Transaction Management

Example of Fund Transfer (Cont.)

Transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
Consistency requirement: the sum of A and B is unchanged by the execution of the transaction.
A transaction must see a consistent database. Execution of a transaction must leave the database in either a new stable state or revert it to the old stable state.
During transaction execution the database may be temporarily inconsistent; when the transaction completes successfully, the database must be consistent. Erroneous transaction logic can lead to inconsistency.
In general, consistency requirements include:
• explicitly specified integrity constraints, such as primary keys and foreign keys
• implicit integrity constraints, e.g. the sum of balances of all accounts, minus the sum of loan amounts, must equal the value of cash-in-hand

Page 18: Chapter 15:  Transaction Management

Example of Fund Transfer (Cont.)
Isolation requirement: if, between steps 3 and 6, another transaction T2 is allowed to access the partially updated database, it will see an inconsistent database (the sum A + B will be less than it should be).

T1                          T2
1. read(A)
2. A := A – 50
3. write(A)
                            read(A), read(B), print(A+B)
4. read(B)
5. B := B + 50
6. write(B)

Data used during the execution of a transaction cannot be used by a second transaction until the first one has completed.
Isolation can be ensured trivially by running transactions serially, that is, one after the other.
However, executing multiple transactions concurrently has significant benefits, as we will see later.

Page 19: Chapter 15:  Transaction Management

Reasons for Incomplete Transactions
A transaction can be aborted or terminated unsuccessfully because of some problem during execution.
The system may crash while one or more transactions are in progress.
A transaction may encounter an unexpected situation (it reads an unexpected data value, or it is unable to access some disk) and decide to abort.
The DBMS ensures transaction atomicity by undoing the actions of incomplete transactions. It maintains a record, called the log, of all writes to the database.
The log also ensures durability: if the system crashes before the changes made by a completed transaction are written to disk, the log is used to remember and restore those changes when the system restarts.

Page 20: Chapter 15:  Transaction Management

Log File
Typically, a log file contains records with the following contents:

[start_transaction, T ] (*T is transaction id*)

[write_item, T, X, old_value, new_value]

[read_item,T, X ] (*optional*)

[commit, T ]

[abort, T ]
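To make the record formats concrete, here is a small sketch of an in-memory log and an undo routine in Python; the db dictionary, the helper names and the transaction id are assumptions made for the illustration.

db = {"X": 100}
log = []

def write_item(txn, item, new_value):
    # [write_item, T, X, old_value, new_value]
    log.append(("write_item", txn, item, db[item], new_value))
    db[item] = new_value

def undo(txn):
    """Roll back an incomplete transaction by replaying its writes backwards."""
    for record in reversed(log):
        if record[0] == "write_item" and record[1] == txn:
            _, _, item, old_value, _ = record
            db[item] = old_value
    log.append(("abort", txn))

log.append(("start_transaction", "T1"))
write_item("T1", "X", 70)   # e.g. X = X - 30
undo("T1")                  # T1 fails: X is restored to 100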

Page 21: Chapter 15:  Transaction Management

Sources of Database Inconsistency
Uncontrolled execution of database transactions in a multi-user environment can lead to database inconsistency.
There are a number of possible sources of database inconsistency. The typical ones are: the lost update problem, the dirty read problem, and the unrepeatable read problem.

Page 22: Chapter 15:  Transaction Management

Lost Update Problem

T1                      T2
read_item(X)
X = X – N
                        read_item(X)
write_item(X)
                        X = X + M
                        write_item(X)
(time runs downwards)

After termination of T2, X = X + M. T1's update to X is lost because T2 wrote over X.
Generally, the lost update problem is characterized by:
• T2 reads X,
• T1 writes X, and
• T2 writes X.

Page 23: Chapter 15:  Transaction Management

Lost Update Problem
Imagine that a customer wants to withdraw £30 from a bank account. At the same time, the bank is crediting this month's salary of £1000.

Time  T1 (withdrawal)           T2 (credit salary)          Balance
t1                              begin_transaction           100
t2    begin_transaction         read(balance)               100
t3    read(balance)             balance = balance + 1000    100
t4    balance = balance – 30    write(balance)              1100
t5    write(balance)            commit                      70
t6    commit                                                70

Both transactions occur at roughly the same time and read the same initial balance.
The last transaction to commit overwrites the update made by the first: the £1000 salary credit is lost, and the final balance is £70 instead of £1070.
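A small simulation of this interleaving in Python; the local variables stand in for each transaction's private workspace, and the whole thing is purely illustrative.

balance = 100                 # shared database item

# t2/t3: both transactions read the same initial balance
t2_local = balance            # T2: read(balance)
t1_local = balance            # T1: read(balance)

# t3/t4: each computes its update from its own (now stale) copy
t2_local = t2_local + 1000    # T2: credit the salary
t1_local = t1_local - 30      # T1: withdraw 30

# t4/t5: T2 writes first, then T1 overwrites it
balance = t2_local            # balance is now 1100
balance = t1_local            # balance is now 70: T2's update is lost

print(balance)                # 70, but the correct result is 1070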

Page 24: Chapter 15:  Transaction Management

Uncommitted Dependency: Dirty Read Problem

T1                      T2
read_item(X)
X = X – N
write_item(X)
                        read_item(X)
                        X = X + M
                        write_item(X)
read_item(Y)
T1 fails
(time runs downwards)

Generally, the dirty read problem is characterized by:
• T1 writes X,
• T2 reads X, and
• T1 fails.
Since T1 failed, the DBMS is going to undo the changes it made against the database. T2, however, has already read the value X = X – N, and that value is going to be altered by the DBMS back to X.

Page 25: Chapter 15:  Transaction Management

The Uncommitted Dependency: Dirty Read Problem
The uncommitted dependency problem (dirty read) occurs when one transaction sees the intermediate results of another (later aborted) transaction.

Time  T1 (withdrawal)           T2 (credit salary)          Balance
t1    begin_transaction                                     100
t2    read(balance)                                         100
t3    balance = balance – 30                                100
t4    write(balance)            begin_transaction           70
t5                              read(balance)               70
t6    rollback                  balance = balance + 1000    70
t7                              write(balance)              1070
t8                              commit                      1070

For some reason, the withdrawal transaction is aborted, but the salary credit transaction has already seen its update.
When T2 commits, the balance is incorrect (it should be 1100).

Page 26: Chapter 15:  Transaction Management

Unrepeatable Read / Inconsistent Retrieval Problem

T1                      T2
read_item(X)
                        read_item(X)
                        X = X + M
                        write_item(X)
read_item(X)
(time runs downwards)

Transaction T1 gets two different values of X in two subsequent reads, because T2 has changed X in the meantime.
Even if T1 didn't execute the second read command, it would be using a stale X value, and that is another form of the unrepeatable read problem.
Generally, the unrepeatable read problem is characterized by:
• T1 reads X,
• T2 writes X, and
• T1 reads X.

Page 27: Chapter 15:  Transaction Management

Unrepeatable Read / Inconsistent Retrieval Problem
The previous problems involved simultaneous updates to the database. However, problems can also result if a transaction merely reads the results of an uncommitted transaction.
Below, one transaction (T1) is transferring £10 from account Balw to Balz, and at the same time another transaction (T2) is summing all the accounts (Balw, Balx, Baly and Balz). Try to figure out what has gone wrong...

Time  T1 (transfer funds)    T2 (sum accounts)     Balw  Balx  Baly  Balz  Sum
t1                           begin_transaction     100   50    10    25
t2    begin_transaction      sum = 0               100   50    10    25    0
t3    read(Balw)             read(Balw)            100   50    10    25    0
t4    Balw = Balw – 10       sum = sum + Balw      100   50    10    25    100
t5    write(Balw)            read(Balx)            90    50    10    25    100
t6    read(Balz)             sum = sum + Balx      90    50    10    25    150
t7    Balz = Balz + 10       read(Baly)            90    50    10    25    150
t8    write(Balz)            sum = sum + Baly      90    50    10    35    160
t9    commit                 read(Balz)            90    50    10    35    160
t10                          sum = sum + Balz      90    50    10    35    195
t11                          commit                90    50    10    35    195

Here, the £10 transferred by T1 has been counted twice by T2, making its result too large by £10 (195 instead of the correct 185).

Page 28: Chapter 15:  Transaction Management

A Question for You
What is the difference between a dirty read and an unrepeatable read?

Page 29: Chapter 15:  Transaction Management

A Question for You
What is the difference between a dirty read and an unrepeatable read?
Answers:
a) There is no difference.
b) Even if there is a difference, I can't recall what it is.
c) The difference is:
• A dirty read is a consequence of reading updates made by a transaction before it has successfully finished (and that transaction may even fail later).
• An unrepeatable read is a consequence of allowing a transaction to read data that another transaction is altering.

Page 30: Chapter 15:  Transaction Management

Prevention of Concurrency Anomalies
Lost update, dirty read and unrepeatable read are called concurrency anomalies.
The concurrency control part of a DBMS has the task of preventing these problems.
The DBMS is responsible for ensuring that either all operations of a transaction are successfully executed and their effect is permanently stored in the database, or it is as if the transaction had never started.
The effect of a partially executed transaction has to be undone.

Page 31: Chapter 15:  Transaction Management

Advantages of Concurrent Execution of Transactions
Increased throughput and resource utilization: one transaction can be using the CPU while another is reading from or writing to the disk, so the processor and the disk spend less time idle. Because I/O activity can be done in parallel with CPU activity, this leads to better transaction throughput.
Reduced average response time (the average time for a transaction to be completed after it has been submitted): short transactions need not wait behind long ones, and concurrent execution reduces unpredictable delays in running transactions.

Page 32: Chapter 15:  Transaction Management

Implementation of Atomicity and Durability
The recovery-management component of a database system implements the support for atomicity and durability.
E.g. a very simple but extremely inefficient scheme is the shadow-copy scheme:
• all updates are made on a shadow copy of the database
• db_pointer is made to point to the updated shadow copy after the transaction reaches partial commit and all updated pages have been flushed to disk

Page 33: Chapter 15:  Transaction Management

Implementation of Atomicity and Durability (Cont.)
db_pointer always points to the current consistent copy of the database. In case the transaction fails, the old consistent copy pointed to by db_pointer can be used, and the new copy can be deleted.
The shadow-database scheme:
• assumes that only one transaction is active at a time
• assumes that disks do not fail
• is useful for text editors, but extremely inefficient for large databases (why?)
• does not handle concurrent transactions
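A rough sketch of the shadow-copy idea in Python, using a JSON file as the "database" and an atomic rename as the db_pointer switch; the file names and the use of os.replace are illustrative choices, not the textbook's implementation.

import json, os, shutil

DB_FILE = "db.json"   # the copy that db_pointer currently points to

# Set-up for the example: write an initial consistent copy of the "database".
with open(DB_FILE, "w") as f:
    json.dump({"A": 200, "B": 100}, f)

def run_transaction(update):
    """Apply `update` to a shadow copy; switch to it only if everything succeeds."""
    shutil.copyfile(DB_FILE, "db.shadow.json")   # all updates go to a shadow copy
    with open("db.shadow.json") as f:
        data = json.load(f)
    update(data)
    with open("db.shadow.json", "w") as f:
        json.dump(data, f)
        f.flush()
        os.fsync(f.fileno())                     # flush updated pages to disk
    os.replace("db.shadow.json", DB_FILE)        # atomically redirect db_pointer

def transfer(db):                                # the $50 transfer from earlier slides
    db["A"] -= 50
    db["B"] += 50

run_transaction(transfer)

If update() raises an exception, the shadow copy is simply abandoned and the original copy is untouched, which is exactly how the scheme provides atomicity.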

Page 34: Chapter 15:  Transaction Management

Database Architecture
[Diagram: the Transaction Manager on top, communicating with the Scheduler, which sits above the Buffer Manager and the Recovery Manager; these rest on the low-level DBMS.]

Page 35: Chapter 15:  Transaction Management

Database Architecture
The components of a DBMS that manage transactions are as follows:
• The transaction manager coordinates transactions on behalf of application programs.
• It communicates with the scheduler, which implements a particular strategy for concurrency control. The scheduler tries to maximize concurrency without allowing transactions to interfere with one another.
• If a failure occurs during a transaction, the recovery manager ensures that the database is restored to the state it was in before the start of the transaction.
• The buffer manager is responsible for the transfer of data between disk storage and main memory.

Page 36: Chapter 15:  Transaction Management

Transaction Management
Both concurrency control and recovery control are required to protect the database from data inconsistencies and data loss.
Many DBMSs allow users to undertake simultaneous operations on the database. If these operations are not controlled, they can interfere with each other, and the database may become inconsistent.
To overcome this, the DBMS implements concurrency control schemes, i.e. mechanisms to achieve isolation: it controls the interaction among concurrent transactions in order to prevent them from destroying the consistency of the database.
Database recovery is the process of restoring the database to a correct state following a failure. The failure may be the result of a system crash, media failure, software error, or accidental or malicious destruction of data.
Whatever the cause of the failure, the DBMS must be able to restore the database to a consistent state.

Page 37: Chapter 15:  Transaction Management

Concurrency Control
Concurrency control is the process of managing simultaneous operations on the database without having them interfere with one another. Concurrency control is needed because many users are able to access the database simultaneously.
Note that managing concurrent access is easy if all users are only reading data; there is no way that such users can interfere with one another.
However, when two or more users are accessing the database simultaneously, and at least one of them is updating data, there may be interference that can cause inconsistencies.

Page 38: Chapter 15:  Transaction Management

Concurrency Control
[Diagram: transactions T1, T2, …, Tn all operate on the DB (with its consistency constraints). How can harmful interference between transactions be prevented?]
To prevent such problems, a DBMS must implement concurrency control techniques based on:
• locks
• timestamps and validation

Page 39: Chapter 15:  Transaction Management

Schedules
Schedule: a sequence of instructions that specifies the chronological order (possibly interleaved) in which the instructions of concurrent transactions are executed.
Suppose there are n transactions T1, T2, …, Tn. A schedule S of these n transactions is an ordering of their operations such that, for each Ti participating in S, the operations of Ti appear in S in the same order as in Ti itself.
A schedule for a set of transactions must consist of all instructions of those transactions, and it must preserve the order in which the instructions appear in each individual transaction.

Page 40: Chapter 15:  Transaction Management

Complete Schedules
A transaction that successfully completes its execution will have a commit instruction as its last statement; by default, a transaction is assumed to execute a commit instruction as its last step.
A transaction that fails to successfully complete its execution will have an abort instruction as its last statement.
A schedule that contains either an abort or a commit for each transaction whose actions are listed in it is called a complete schedule.

Page 41: Chapter 15:  Transaction Management

Serial Schedules
A schedule S is said to be a serial schedule if, for each Ti in S, all operations of Ti are executed consecutively in S; that is, the actions of different transactions are not interleaved, and the transactions are executed from start to end one by one.
Serial schedules are considered to be correct, i.e. they do not exhibit concurrency anomalies such as lost update, unrepeatable read, or dirty read.
But serial schedules mean no interleaving, and hence they are considered inefficient.

Page 42: Chapter 15:  Transaction Management

Serial Schedules: Schedule 1
Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B.
[Schedule 1, shown as a figure: a serial schedule in which T1 is followed by T2.]

Page 43: Chapter 15:  Transaction Management

Schedule 2
• A serial schedule in which T2 is followed by T1.

Page 44: Chapter 15:  Transaction Management

Schedule 3
Let T1 and T2 be the transactions defined previously. The following schedule is not a serial schedule; it is a concurrent schedule. But it is equivalent to Schedule 1.
In Schedules 1, 2 and 3, the sum A + B is preserved.

Page 45: Chapter 15:  Transaction Management

Schedule 4
The following concurrent schedule does not preserve the value of (A + B).

Page 46: Chapter 15:  Transaction Management

Examples of Legal Schedules (T < S, i.e. T is ordered before S)
1. Avoiding the lost update problem
   T: transfer $100 from A to C:  R(A) W(A) R(C) W(C)
   S: transfer $100 from B to C:  R(B) W(B) R(C) W(C)
2. Avoiding the inconsistent retrievals problem
   T: transfer $100 from A to C:  R(A) W(A) R(C) W(C)
   S: compute total balance for A and C:  R(A) R(C)
3. Avoiding non-repeatable reads
   T: transfer $100 from A to C:  R(A) W(A) R(C) W(C)
   S: check balance and withdraw $100 from A:  R(A) R(A) W(A)

Page 47: Chapter 15:  Transaction Management

Defining the Legal Schedules
1. To be serializable, the conflicting operations of T and S must be ordered as if either T or S had executed first. We only care about the conflicting operations: everything else will take care of itself.
2. Suppose T and S conflict over some shared item(s) x.
3. In a serial schedule, T's operations on x would appear before S's, or vice versa, for every shared item x. As it turns out, this is true for all the operations, but again, we only care about the conflicting ones.
4. A legal (conflict-serializable) interleaved schedule of T and S must exhibit the same property. Either T or S "wins" the race to x; serializability dictates that the "winner takes all".

Page 48: Chapter 15:  Transaction Management

Serializable Schedules
Basic assumption: each transaction preserves database consistency. Thus serial execution of a set of transactions preserves database consistency.
Schedules that allow interleaving to some extent and are correct are called serializable schedules.
A serializable schedule is considered to be correct if it is equivalent to a serial schedule.
There are different forms of schedule equivalence:
1. conflict serializability
2. view serializability

Page 49: Chapter 15:  Transaction Management

Conflict Serializable Schedules
Two operations conflict if:
• they belong to different transactions,
• at least one of them is a write, and
• they access the same item X.
Two schedules are conflict equivalent if the order of any two conflicting operations is the same in both schedules; a schedule is conflict serializable if it is conflict equivalent to some serial schedule.
Simplified view of transactions: we ignore operations other than read and write instructions, and we assume that transactions may perform arbitrary computations on data in local buffers in between reads and writes. Our simplified schedules thus consist of only read and write instructions.

Page 50: Chapter 15:  Transaction Management

Conflicting Instructions
Instructions li and lj of transactions Ti and Tj, respectively, conflict if and only if there exists some item Q accessed by both li and lj, and at least one of these instructions wrote Q.
1. li = read(Q), lj = read(Q): li and lj don't conflict.
2. li = read(Q), lj = write(Q): they conflict.
3. li = write(Q), lj = read(Q): they conflict.
4. li = write(Q), lj = write(Q): they conflict.
Intuitively, a conflict between li and lj forces a (logical) temporal order between them. If li and lj are consecutive in a schedule and they do not conflict, their results would remain the same even if they had been interchanged in the schedule.
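The definition translates directly into a small predicate; the (transaction, action, item) tuple representation of an operation is an assumption made for the illustration.

# Operations are represented as (transaction, action, item) tuples,
# e.g. ("T1", "write", "Q"); this representation is an illustrative choice.
def conflicts(op_i, op_j):
    """True iff the two operations conflict in the sense defined above."""
    txn_i, action_i, item_i = op_i
    txn_j, action_j, item_j = op_j
    return (txn_i != txn_j                          # different transactions
            and item_i == item_j                    # same item Q
            and "write" in (action_i, action_j))    # at least one of them is a write

assert not conflicts(("T1", "read", "Q"), ("T2", "read", "Q"))
assert conflicts(("T1", "read", "Q"), ("T2", "write", "Q"))
assert conflicts(("T1", "write", "Q"), ("T2", "write", "Q"))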

Page 51: Chapter 15:  Transaction Management

Conflict Non-Equivalent Schedules

Serial schedule S1:
T1                      T2
read_item(X)
X = X – N
write_item(X)
                        read_item(X)
                        X = X + M
                        write_item(X)

Schedule S2:
T1                      T2
read_item(X)
X = X – N
                        read_item(X)
write_item(X)
                        X = X + M
                        write_item(X)

(time runs downwards in both schedules)

Schedules S1 and S2 are NOT conflict equivalent, since:
• in S1, write1_item(X) comes before read2_item(X), whereas
• in S2, read2_item(X) comes before write1_item(X).
S2 is not a conflict serializable schedule, because one cannot find a serial schedule that is conflict equivalent to S2.

Page 52: Chapter 15:  Transaction Management

Conflict Serializability
If a schedule S can be transformed into a schedule S' by a series of swaps of non-conflicting instructions, we say that S and S' are conflict equivalent.
We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule.

Page 53: Chapter 15:  Transaction Management

Conflict Serializability (Cont.)
Schedule 3 can be transformed into Schedule 6, a serial schedule where T2 follows T1, by a series of swaps of non-conflicting instructions. Therefore Schedule 3 is conflict serializable.
[Schedules 3 and 6 shown as figures.]

Page 54: Chapter 15:  Transaction Management

Conflict Serializability (Cont.)
Example of a schedule that is not conflict serializable:
We are unable to swap instructions in the above schedule to obtain either the serial schedule <T3, T4> or the serial schedule <T4, T3>.

Page 55: Chapter 15:  Transaction Management

View Serializability
Let S and S' be two schedules with the same set of transactions. S and S' are view equivalent if the following three conditions are met for each data item Q:
1. If in schedule S transaction Ti reads the initial value of Q, then in schedule S' transaction Ti must also read the initial value of Q.
2. If in schedule S transaction Ti executes read(Q), and that value was produced by transaction Tj (if any), then in schedule S' transaction Ti must also read the value of Q that was produced by the same write(Q) operation of transaction Tj.
3. The transaction (if any) that performs the final write(Q) operation in schedule S must also perform the final write(Q) operation in schedule S'.
As can be seen, view equivalence is also based purely on reads and writes.

Page 56: Chapter 15:  Transaction Management

View Serializability (Cont.)
A schedule S is view serializable if it is view equivalent to a serial schedule.
Every conflict serializable schedule is also view serializable, but the reverse is not true.
Below is a schedule which is view serializable but not conflict serializable (it is view equivalent to the serial schedule <T3, T4, T6>).
Every view serializable schedule that is not conflict serializable has blind writes. (A write(Q) performed without first performing a read(Q) is called a blind write.)

Page 57: Chapter 15:  Transaction Management

Other Notions of Serializability
The schedule below produces the same outcome as the serial schedule <T1, T5>, yet it is not conflict equivalent or view equivalent to it.
Determining such equivalence requires analysis of operations other than read and write.

Page 58: Chapter 15:  Transaction Management

The Graph Test for Serializability
To determine whether a schedule is serializable, build a directed graph:
• Add a node for each committed transaction.
• Add an arc from T to S if any equivalent serial schedule must order T before S. T must commit before S iff the schedule orders some operation of T before some operation of S. The schedule only defines such an order for conflicting operations, so this means that a pair of accesses from T and S conflict over some item x, and the schedule says T "wins" the race to x.
The schedule is conflict serializable if the graph has no cycles (winner takes all).

Page 59: Chapter 15:  Transaction Management

Testing for Serializability
Consider some schedule of a set of transactions T1, T2, ..., Tn.
Precedence graph: a directed graph G = (V, E), where the set of vertices V consists of all the transactions and the set of edges E consists of all edges Ti → Tj for which one of three conditions holds:
• Ti executes write(X) before Tj executes read(X),
• Ti executes read(X) before Tj executes write(X), or
• Ti executes write(X) before Tj executes write(X).
If the precedence graph for schedule S has a cycle, then schedule S is not conflict serializable. If the graph contains no cycles, then schedule S is conflict serializable.
[Example 1: a small schedule of R(X)/W(X) operations of two transactions and its precedence graph, shown as a figure.]

Page 60: Chapter 15:  Transaction Management

Testing for Conflict Serializability

Schedule S2:
T1                      T2
read_item(X)
X = X – N
                        read_item(X)
write_item(X)
                        X = X + M
                        write_item(X)
(time runs downwards)

Conflicting pairs in S2 and the resulting precedence graph edges:
• read2_item(X) before write1_item(X): edge T2 → T1
• read1_item(X) before write2_item(X): edge T1 → T2
• write1_item(X) before write2_item(X): edge T1 → T2
The precedence graph contains the cycle T1 → T2 → T1, so S2 is not conflict serializable.

Page 61: Chapter 15:  Transaction Management

Testing for Conflict Serializability

Schedule S3 (T1 also updates item Y; all of T1's operations on X precede T2's operations on X):
T1: read_item(X), X = X – N, write_item(X), read_item(Y), Y = Y + Q, write_item(Y)
T2: read_item(X), X = X + M, write_item(X)

Conflicting pairs in S3 and the resulting precedence graph edges:
• read1_item(X) before write2_item(X): edge T1 → T2
• write1_item(X) before read2_item(X): edge T1 → T2
• write1_item(X) before write2_item(X): edge T1 → T2
All edges go from T1 to T2, so the graph is acyclic and S3 is conflict serializable (equivalent to the serial schedule T1, T2).

Page 62: Chapter 15:  Transaction Management

Test for Conflict Serializability
A schedule is conflict serializable if and only if its precedence graph is acyclic.
Cycle-detection algorithms (such as depth-first search) require on the order of n² operations, where n is the number of vertices in the graph. (Better algorithms take order n + e, where e is the number of edges.)
If the precedence graph is acyclic, the serializability order can be obtained by a topological sort of the graph. This is a linear order consistent with the partial order of the graph.
For example, a serializability order for Schedule A would be Ti, Tj, Tk, Tm or Ti, Tk, Tj, Tm.
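The whole test fits in a few lines of Python: build the precedence graph from a schedule using the three conditions given two slides earlier, then topologically sort it; failure to sort means the graph has a cycle. The (transaction, action, item) representation of a schedule is an illustrative assumption, and the two example schedules mirror S2 and S3 from the previous slides (with the local computation steps omitted).

from collections import defaultdict

def precedence_graph(schedule):
    """Edges Ti -> Tj for conflicting operations where Ti's operation comes first."""
    edges = defaultdict(set)
    txns = {t for t, _, _ in schedule}
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and "write" in (ai, aj):
                edges[ti].add(tj)
    return txns, edges

def serial_order(schedule):
    """Topological sort of the precedence graph; None if it has a cycle."""
    txns, edges = precedence_graph(schedule)
    indegree = {t: 0 for t in txns}
    for t in edges:
        for u in edges[t]:
            indegree[u] += 1
    ready = [t for t in txns if indegree[t] == 0]
    order = []
    while ready:
        t = ready.pop()
        order.append(t)
        for u in edges[t]:
            indegree[u] -= 1
            if indegree[u] == 0:
                ready.append(u)
    return order if len(order) == len(txns) else None   # None => cycle => not serializable

# Schedule S2 from the previous slides: not conflict serializable.
s2 = [("T1", "read", "X"), ("T2", "read", "X"),
      ("T1", "write", "X"), ("T2", "write", "X")]
print(serial_order(s2))     # None

# Schedule S3: conflict serializable, equivalent to the serial schedule T1, T2.
s3 = [("T1", "read", "X"), ("T1", "write", "X"),
      ("T2", "read", "X"), ("T2", "write", "X"),
      ("T1", "read", "Y"), ("T1", "write", "Y")]
print(serial_order(s3))     # ['T1', 'T2']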

Page 63: Chapter 15:  Transaction Management

Test for View Serializability
The precedence graph test for conflict serializability cannot be used directly to test for view serializability. An extension of the test for view serializability has a cost exponential in the size of the precedence graph.
The problem of checking whether a schedule is view serializable falls in the class of NP-complete problems, so the existence of an efficient algorithm is extremely unlikely.
However, practical algorithms that just check some sufficient conditions for view serializability can still be used.

Page 64: Chapter 15:  Transaction Management

Recoverable Schedules
Recoverable schedule: for each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the commit operation of Tj.
The following schedule is not recoverable if T9 commits immediately after its read.
If T8 aborts, T9 would have read (and possibly shown to the user) an inconsistent database state. Hence, the database system must ensure that schedules are recoverable.
We need to address the effect of transaction failures on concurrently running transactions.
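The definition can be checked mechanically on a complete schedule; the sketch below uses the same (transaction, action, item) representation as before, with commits recorded as ("Ti", "commit", None). All of this is an illustrative assumption.

def is_recoverable(schedule):
    """True iff every reader commits after the writer it read from commits."""
    commit_pos = {t: i for i, (t, a, _) in enumerate(schedule) if a == "commit"}
    last_writer = {}                         # item -> transaction that wrote it last
    for t, action, item in schedule:
        if action == "write":
            last_writer[item] = t
        elif action == "read" and item in last_writer and last_writer[item] != t:
            writer = last_writer[item]
            # Tj reads from Ti: Ti must commit before Tj commits.
            if commit_pos.get(writer, float("inf")) > commit_pos.get(t, float("inf")):
                return False
    return True

# T9 reads X written by T8 and commits first: not recoverable.
bad = [("T8", "write", "X"), ("T9", "read", "X"),
       ("T9", "commit", None), ("T8", "commit", None)]
print(is_recoverable(bad))    # False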

Page 65: Chapter 15:  Transaction Management

Cascading Rollbacks
Cascading rollback: a single transaction failure leads to a series of transaction rollbacks. Consider the following schedule, where none of the transactions has yet committed (so the schedule is recoverable).
If T10 fails, T11 and T12 must also be rolled back, because T11 depends on T10 and T12 depends on T11.
This can lead to the undoing of a significant amount of work.

Page 66: Chapter 15:  Transaction Management

Cascadeless Schedules
Cascadeless schedules: schedules in which cascading rollbacks cannot occur; for each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the read operation of Tj.
Every cascadeless schedule is also recoverable.
It is desirable to restrict schedules to those that are cascadeless.

Page 67: Chapter 15:  Transaction Management

Concurrency Control
A database system must provide a mechanism that ensures that all possible schedules are either conflict or view serializable, and are recoverable and preferably cascadeless.
A policy in which only one transaction can execute at a time generates serial schedules, but provides a poor degree of concurrency.
Testing a schedule for serializability after it has executed is a little too late!
Goal: to develop concurrency control protocols that will assure serializability.

Page 68: Chapter 15:  Transaction Management

Concurrency Control vs. Serializability Tests
Concurrency-control protocols allow concurrent schedules, but ensure that the schedules are conflict/view serializable, and are recoverable and cascadeless.
Concurrency-control protocols generally do not examine the precedence graph as it is being created; instead, a protocol imposes a discipline that avoids non-serializable schedules. We study such protocols in the next chapter.
Different concurrency-control protocols provide different trade-offs between the amount of concurrency they allow and the amount of overhead that they incur.
Tests for serializability help us understand why a concurrency-control protocol is correct.

Page 69: Chapter 15:  Transaction Management

Summary
Executing transactions in an interleaved way may bring a database into an inconsistent state.
The transaction anomalies are: lost update, dirty read, and unrepeatable read.
A DBMS is responsible for ensuring that either all operations of a transaction are successfully executed, or the transaction is rolled back.
The log file records all important events (start, read, write, commit).
When a transaction reaches its commit point, everything is safely stored in the database (or the log file).

Page 70: Chapter 15:  Transaction Management

Summary on Serializability
Concurrency anomalies are avoided if a schedule is serializable (equivalent to a serial schedule).
Serializable schedules are desirable because serial schedules are generally inefficient.
A schedule is conflict serializable if there are no cycles in the precedence graph.
Serializability is only a tool to develop an understanding of desirable schedules; it is not tested by the DBMS before executing a series of transactions.
Instead, DBMSs apply protocols that assure the correctness of the schedule. Many of these protocols rely on locking.

Page 71: Chapter 15:  Transaction Management

End of Chapter