Locks and Deadlocks (in Teradata)





1) First of all, a deadlock is a situation where at least two requests lock each other out of one or more database objects in a way that cannot be resolved by any means other than aborting one of them.

2) If not a deadlock situation, then "the system will wait forever on a lock."

(There are two timers in DBC software:

- The deadlock timer

- The hung transaction timer

The hung transaction timer detects transactions hung due to system errors and backs them out.

There are, however, no "time-outs" for transactions on the DBC.)

3) There are four things that can cause a 2631 (deadlock):

a) A pending lock causing a bottleneck. (Item #1)

b) A very congested system.

c) The number of transactions exceeds the machine capacity.

d) Actual software bug. (not important to this discussion)

4) The youngest transaction in a deadlock is aborted, and BTEQ resubmits it. (It is important to keep this in mind as you continue reading since it is not always restated. Also, a developer should note what software detects deadlocks.)

(The DBS software detects deadlocks and aborts transactions.)

5) Some utilities (or all?) cannot detect deadlocks and, as a result, may cause operations to hang forever! ARC is an example of such a utility. (Again, a developer should note what software detects deadlocks.)

ARC is probably the only utility with this problem. ARC will hang forever, and it is possible that the entire system may hang. The only way around this is to restart the system. This problem is not detectable to the user except through the DEBUGGER.

6) Application coding (e.g., PP2 or CLI/V2) that doesn't interpret deadlock return codes will result in transactions not getting applied.

7) As a note, locking modifiers should not be used for OLTP since these are table level locks that incur

significant overhead.


Having reviewed the basic definitions, let us now review the problems of locking as they exist.

1) By way of review, perhaps the best known locking/deadlocking problem is request 1 locking table_a and waiting for a lock on table_b, which is locked by request 2, which in turn is waiting for a lock on table_a. This kind of locking problem is well known and not a concern in this discussion.
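The classic case above can be pictured as a cycle in a "wait-for" graph. The following is a minimal Python sketch of that idea only; it is not how the DBS detects deadlocks internally. The names `find_cycle`, `pick_victim`, and the start-time values are hypothetical and chosen for illustration. It assumes each request waits on at most one other request.

```python
def find_cycle(waits_for):
    """Return the requests forming a cycle in the wait-for graph, or None.

    waits_for maps a blocked request to the request holding the lock it wants.
    """
    for start in waits_for:
        path, node = [], start
        # Follow the chain of "is waiting on" until it ends or repeats.
        while node is not None and node not in path:
            path.append(node)
            node = waits_for.get(node)
        if node is not None:
            # node was seen before: everything from its first visit is the cycle.
            return path[path.index(node):]
    return None


def pick_victim(cycle, start_time):
    """Choose the youngest request (latest start time) as the one to abort."""
    return max(cycle, key=start_time.get)


# request 1 holds table_a and wants table_b; request 2 holds table_b and wants table_a
waits_for = {"request_1": "request_2", "request_2": "request_1"}
start_time = {"request_1": 100, "request_2": 105}  # hypothetical timestamps

cycle = find_cycle(waits_for)
victim = pick_victim(cycle, start_time)
```

Aborting the victim removes its edge from the graph, which breaks the cycle and lets the older request proceed.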

2) Consider two or more transactions accessing a single, NON FALLBACK table whereby the transactions require TABLE LEVEL locks that are incompatible with each other (e.g., read vs. write).

(This scenario applies to fallback as well. Table level locks are applied to both primary and

fallback data concurrently.)


a) The transactions are submitted at nearly the same time.

b) The transactions' lock requests are received by the AMPs so that query x obtains a lock on the table at AMP 1 and waits on query y's lock on the table at AMP 2, while query y is waiting on query x's lock on the table at AMP 1. A global deadlock then occurs and the youngest request is aborted. (And probably retried, based on scenario 3 results.) This can be avoided when mixing selects and updates by using the LOCKING ... FOR ACCESS modifier on the selects. For multiple updates this will be a problem only when transactions are submitted at NEARLY THE SAME TIME. In that case, the one that is aborted due to deadlocking can be re-submitted. These occurrences are very rare.

3) Consider three or more transactions accessing a single FALLBACK table whereby at least two require table level locks and another is an insert or delete by prime index.

Here is the scenario.

Transaction 1 issues a BT;

Transaction 1 issues a SEL * FROM table_a;

(gets all-AMP read lock on table_a, both primary and fallback copy)

Transaction 2 issues an insert, update (via prime index) or delete against table_a.

(write lock on AMP for primary row) (blocks on T1 read lock) (notice how, via a prime index, these

operations lock the primary hash first before locking fallback hash)

Transaction 3 issues a SEL * FROM table_a;

(locks on table_a queue behind write lock on the single AMP)

Transaction 1 issues an ET;

(releases all-AMP read lock on table_a)

Transaction 2 inserts primary row

(holds write lock on primary AMP; its spawned write lock for the fallback AMP blocks on T3's read lock)

A global deadlock occurs and the youngest request (in this case transaction 3) is aborted (canceling the select and releasing the locks) and retried. In my testing, transaction 2 then finished, allowing a successful retry of transaction 3!

This can be avoided when mixing selects with updates by using the locking for access statement.

(In a 4.2 E-tape, rather than waiting for the global deadlock detection routine, DBS automatically backs out any transaction that can't get a spawned row hash fallback lock because it is blocked by a table level lock. This may surprise some users, who see a 2631 message instantly rather than after the usual delay from the global detector, which runs every 4 minutes.)

4) Consider a BT; ET; type transaction of the following type:


SELECT . . . FROM table_a WHERE UPI = x;

UPDATE table_a SET col = value WHERE UPI = x;


If transaction 1 gets the hash read lock for the select and, prior to upgrading it to a write lock,

transaction 2 comes in for the SAME UPI VALUE; transaction 1's upgrade blocks on transaction 2's read

lock. When transaction 2 wants to upgrade to the write lock it now blocks on transaction 1's read lock

which is still being held due to the BT; ET; statements.

A local deadlock occurs and the youngest request is aborted. (And probably retried, based on scenario 3 results.)


(This scenario is often considered as poor programming and would occur in any system, even DB2.)

This can be handled in a program by resubmitting the aborted request after the deadlock is detected. In BTEQ batch requests, special care must be taken for cases like the following:


SELECT . . . FROM table_a WHERE UPI = x;

UPDATE table_a SET col = value WHERE UPI = x;

UPDATE . . .;

SELECT . . .;

. . .;


In cases like these, if the second request fails for any reason, it is backed out along with the first request. However, the transaction is then closed and processing resumes at request 3! In a program this can be controlled. In a BTEQ script it cannot, unless BTEQ ".IF THEN" logic is invoked. (Refer to the BTEQ Guide.)
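As a rough illustration of what "controlled in a program" can look like, here is a minimal Python sketch that resubmits the whole BT;/ET; unit when a deadlock abort is reported, instead of carrying on with request 3. Everything except the 2631 code is an assumption for illustration: `RequestFailed`, `run_transaction`, and the `submit` callable are hypothetical stand-ins for whatever the application interface (PP2, CLI/V2, or another driver) actually provides.

```python
import time

DEADLOCK = 2631  # the deadlock error code named in this note


class RequestFailed(Exception):
    """Hypothetical error raised by submit() when a request fails."""

    def __init__(self, code):
        super().__init__(f"request failed with code {code}")
        self.code = code


def run_transaction(submit, requests, max_attempts=3, delay=0.0):
    """Submit BT; requests...; ET; as one unit.

    On a deadlock abort the whole unit is restarted from BT;, because the
    abort has already backed out the earlier requests. Any other error,
    or a deadlock on the last attempt, is passed to the caller.
    Returns the number of attempts used.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            submit("BT;")
            for request in requests:
                submit(request)
            submit("ET;")
            return attempt
        except RequestFailed as err:
            if err.code != DEADLOCK or attempt == max_attempts:
                raise
            time.sleep(delay)  # optional pause before resubmitting
```

The point of the design is that the retry boundary is the transaction, not the individual request, so no later request runs against half-applied work.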

5) Consider a BT; ET; type transaction of the following type:


LOCKING table_a FOR locklevel

SELECT . . . FROM table_a WHERE UPI = x;

UPDATE table_a SET col = value WHERE UPI = x;


a) The locking modifier on the SELECT statement results in a table level lock even though the request is

via a primary index.

b) The UPDATE statement results in an additional WRITE lock at the row hash level. This is regardless of

whether the locklevel in statement 1 is ACCESS, READ or WRITE.

c) Two locks are requested; the table level ACCESS, READ or WRITE lock and subsequently (after

statement 1 has been processed successfully) the row hash WRITE lock.

d) If the locklevel in statement 1 is READ, then deadlocks will occur, as two occurrences of the transaction could both be granted the table READ lock and then queue on each other's held table lock when subsequently requesting a row hash WRITE lock.

e) However, if the locklevel in statement 1 is ACCESS then deadlocks will not occur as the two

occurrences of the transaction would both be granted the table ACCESS lock and the subsequent

requests for a row hash WRITE lock are compatible with the existing table ACCESS locks.

f) For OLTP, the recommendation is to use row hash accesses (and therefore locking) whenever possible

and when this is not possible then use an ACCESS modifier. Care must be taken to ensure that the loss of

repeatable read transaction integrity is acceptable to the application.
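The reasoning in d) and e) can be checked mechanically with a small compatibility table. The sketch below is a simplified Python model, not Teradata code: it covers only the ACCESS, READ, and WRITE levels discussed here (EXCLUSIVE is left out), and the names `COMPATIBLE`, `can_grant`, and `two_copies_deadlock` are hypothetical.

```python
# Which already-held lock levels a requested level can coexist with.
COMPATIBLE = {
    "ACCESS": {"ACCESS", "READ", "WRITE"},
    "READ": {"ACCESS", "READ"},
    "WRITE": {"ACCESS"},
}


def can_grant(requested, held):
    """True if `requested` is compatible with every lock level in `held`."""
    return all(level in COMPATIBLE[requested] for level in held)


def two_copies_deadlock(locklevel):
    """Model two concurrent copies of the transaction in item 5.

    Each copy first takes a table lock at `locklevel`, then asks for a
    row hash WRITE lock. They deadlock when both can hold the table lock
    at once and that table lock blocks the other copy's WRITE request.
    """
    both_hold_table_lock = can_grant(locklevel, [locklevel])
    write_is_blocked = not can_grant("WRITE", [locklevel])
    return both_hold_table_lock and write_is_blocked


results = {level: two_copies_deadlock(level) for level in ("ACCESS", "READ", "WRITE")}
```

In this model READ is the only level that deadlocks. With WRITE the second copy simply queues behind the first at the table lock, and with ACCESS nothing blocks the row hash WRITE.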

6) Consider the following case:

UPDATE table_a SET col = value WHERE NUSI = xxx;

Since the NUSI update requests a table level lock we have a locking scenario similar to that in items 2 and 3. These locks can be avoided when mixing selects with updates by using the locking for access statement.


7) With the information now provided, one can think through other possible scenarios. For example:


Since queries like "SELECT * FROM ..." lock primary and fallback data, a scenario similar to #2 can occur

when these are run concurrently with prime index requests of


8) One scenario that deadlocks but does not get detected is the following where a table is fallback:

1) Job1 inserts/updates/deletes a row by prime index from table_a

2) Very, VERY shortly afterwards (or at the same time) Job2 running ARC to dump table_a starts and

grabs read locks on all AMPs for table_a.

3) Job1 beats Job2 to AMP 1-0 and grabs a write lock on it for the primary data row. Now it requests a

write lock on the FB data block for AMP 1-2, but it gets queued behind the ARC read lock.

4) ARC is waiting on the write lock held on 1-0 by Job1, which is waiting on the read lock held on 1-2 by ARC.

5) A deadlock occurs that will not get detected. The result is a hang that will wait forever, blocking any write to that table!



Summarizing some general rules:

1) Always push towards locking at the primary index hash level.

The DBC locks PI requests at hash level; this is not to be confused with row level. A UPI does not guarantee row level locking. A UPI without hash collisions (synonyms) does. NUPIs, by their very nature, would always contain several occurrences per hash value.

You can avoid synonyms in at least two ways:

a) A 1 column index is an example (?)

b) For several columns involving integer values, redefine each as DECIMAL with varying decimal places. (Integer values that sum to the same value hash to the same value.)

For example, with a 3-column primary index where each column is defined as INTEGER, each of these rows will hash identically:

Col1  Col2  Col3
2     4     6      (2 + 4 + 6 = 12)
4     4     4      (4 + 4 + 4 = 12)
5     2     5      (5 + 2 + 5 = 12)

If, instead of INTEGER, you define them as INTEGER, DECIMAL(15,2), DECIMAL(15,3), then they will each hash differently:

Col1  Col2  Col3
2     4     6      (2 + 400 + 6000 = 6402)
4     4     4      (4 + 400 + 4000 = 4404)
5     2     5      (5 + 200 + 5000 = 5205)
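The arithmetic in the two tables can be reproduced with a few lines of Python. This sketch models only the additive behaviour described in this note, where each column contributes its stored integer value and a DECIMAL(n,s) value is stored scaled by 10**s. It is not the actual hashing algorithm, and `combined_value` is a hypothetical helper.

```python
rows = [(2, 4, 6), (4, 4, 4), (5, 2, 5)]


def combined_value(row, scales):
    """Sum the stored values, scaling each column by 10**scale.

    A scale of 0 models INTEGER; a scale of s models DECIMAL(15,s).
    """
    return sum(value * 10 ** scale for value, scale in zip(row, scales))


# All three columns INTEGER: every row collapses to the same value.
all_integer = [combined_value(row, (0, 0, 0)) for row in rows]

# INTEGER, DECIMAL(15,2), DECIMAL(15,3): the rows separate.
mixed_decimal = [combined_value(row, (0, 2, 3)) for row in rows]

print(all_integer)    # [12, 12, 12]
print(mixed_decimal)  # [6402, 4404, 5205]
```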

2) Use the "LOCKING ... FOR ACCESS" modifier whenever possible for reads.

3) A consequence of item 1 is to avoid using CHARACTER data types for indexing where synonyms will

result in unacceptable numbers.

4) Updating and selecting rows via NUSIs requests table level locks. (Refer to item 2 in section 1)

5) Inserts/updates/deletes via primary index first acquire primary hash locks, then, when acquired, get

the hash locks for the fallback data hash.

6) Where deadlocking occurs frequently, use application processing (e.g., PP2 or CLI/V2) to be able to re-issue deadlocked requests. Application processing also manages situations that straight BTEQ scripts may not. (Refer to item 4 in section 1)

7) DBC macros have features that make deadlocks easier to handle. Unlike batch BTEQ using BT; ET;, aborted macros not only back out any transactions to the beginning, but also leave the macro without continuing at the point of failure. It is easier to re-execute macros from application code when deadlocks or any other errors occur causing the macro to abort. Since a macro will run its statements in parallel whenever possible, the EXPLAIN feature should be used to verify it will execute as desired.

8) One concern with macros is that locks are sometimes not described well in EXPLAINs. For example:


(INSERT INTO t1 VALUES ( . . . );

SELECT FROM t1 WHERE non-indexed/nusi = x;

UPDATE t2 . . .;

UPDATE t3 . . .; );

Several users executing this macro concurrently will more than likely deadlock. This is because the DBC will get the table read locks first. When one of the executions goes to get the row hash write lock it will queue on the other user's read lock and cause a deadlock.

An EXPLAIN will not reveal the WRITE lock at the row hash level because this lock is requested as part of the AMP processing. As a result, the WRITE lock is actually obtained AFTER THE IFP READ


If this macro is one of many in the application then locking problems can cascade, affecting many other macros, since the write locks on tables t2 and t3 will not be released until the macro holding the initial locks finishes.


Some comments on controlling locking strategies using BT;/ET; and a process control table. First, let's review the process.


BT;

UPDATE control table;                 <= Set time and user id. Locks are held until ET;

Perform operations on other tables;

UPDATE control table;                 <= Reset time and user id to nulls.

ET;                                   <= All locks are released.

This is a very good procedure as outlined. I have tested it and it works as anticipated. Many users executing this script should experience no deadlocks. The idea is to prevent certain concurrent locking conditions from occurring due to all-AMP lock requests. Since the control table update is a single row update and the BT/ET locks will be held throughout, only one (1) set of the 'inner' operations can be executed at a time.

a) The control table must be accessed by a UPI.

b) The date, time, and userid columns must be disjoint from the UPI.

c) The UPI should (I suggest) be an INTEGER constant.
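The serializing effect of the control table can be pictured with an ordinary mutex. The Python sketch below is only an analogy under that assumption: a `threading.Lock` stands in for the row hash write lock on the single control row, and `run_jobs` is a hypothetical helper that reports how many jobs were ever inside the 'inner' operations at once.

```python
import threading


def run_jobs(n_jobs):
    """Run n_jobs concurrent 'transactions' and return the peak overlap."""
    control_row = threading.Lock()  # stands in for the control row's write lock
    state = {"active": 0, "max_active": 0}

    def job():
        with control_row:  # BT; UPDATE control table; (lock held until ET;)
            state["active"] += 1
            state["max_active"] = max(state["max_active"], state["active"])
            # ... operations on other tables would run here ...
            state["active"] -= 1
        # leaving the block models ET;, which releases the lock

    threads = [threading.Thread(target=job) for _ in range(n_jobs)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return state["max_active"]


peak = run_jobs(8)  # never more than one job inside the inner section
```

Because every job takes the same single lock first, the later all-AMP requests never interleave, which is the property the procedure relies on.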

Since the process control table involves only updates (at the index level) chances of deadlocking should be zero because the primary row gets locked first and then the fallback row. All lock requests should



For those who are interested, a good method for testing locking situations is to try variations of the following.

1) On terminal 1 enter via an interactive BTEQ session.

=> BT;


2) Read locks are now held on FALLBACK table TABLE_A.

3) Now on terminal 2 enter via an interactive BTEQ session.

=> UPDATE TABLE_A SET colx = 'xxx' WHERE UPI = 'a';

4) This waits on 1)

5) On terminal 3 enter via an interactive BTEQ session.


6) On terminal 1, now enter: => ET;

7) A deadlock situation now exists. The second select is aborted and resubmitted after the update

finishes. Many variations of this can be performed for many circumstances.

It is still possible to get row hash synonyms even with a single-column UPI. Use of BT; and ET; should be accompanied by .SET RETRY OFF for cases like the following:


UPDATE . . .

UPDATE . . .

. . . . . .


With RETRY set ON for this BT/ET, if the second update should deadlock and be the one to abort, update one gets backed out, and since RETRY is ON update two gets retried! As previously stated, the transaction is closed as if an ET; had been issued, and any following requests inside the BT/ET are executed. (Unless, of course, other BTEQ commands prevent this, or this is issued from a program and handled programmatically.)
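The hazard is easier to see in a toy model. The Python sketch below models only the behaviour described in this note: a deadlock abort backs out everything since BT;, closes the transaction, and later requests then run on their own. It is not BTEQ itself, and `run_batch` and its parameters are hypothetical names.

```python
def run_batch(requests, deadlock_at, retry_on):
    """Toy model of a BTEQ batch of requests between BT; and ET;.

    The request at index deadlock_at is aborted once by a deadlock.
    Returns the requests whose changes end up committed.
    """
    pending = []      # work done inside the explicit BT;/ET; transaction
    committed = []    # work committed as stand-alone requests
    txn_open = True   # BT; has been issued
    already_failed = False
    i = 0
    while i < len(requests):
        if i == deadlock_at and not already_failed:
            already_failed = True
            pending.clear()   # the abort backs out everything since BT;
            txn_open = False  # and the transaction is closed
            if not retry_on:
                i += 1        # RETRY OFF: move on to the next request
            continue          # RETRY ON: resubmit the same request
        (pending if txn_open else committed).append(requests[i])
        i += 1
    return pending + committed  # ET; commits whatever is still pending


updates = ["update 1", "update 2", "update 3"]
with_retry = run_batch(updates, deadlock_at=1, retry_on=True)
without_retry = run_batch(updates, deadlock_at=1, retry_on=False)
```

In this model `with_retry` is `['update 2', 'update 3']` and `without_retry` is `['update 3']`. Either way update one is lost, which is why RETRY OFF needs to be combined with logic that stops or restarts the whole unit.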

As previously described, the potential for deadlocking increases when lock escalation is involved. So this

would be particularly significant for the following BT/ET.


SELECT . . .

UPDATE . . .

UPDATE . . .


Some other information on database locking:

I'm not sure about the details but it goes something like this -

When someone performs an ALTER TABLE XYZ, FALLBACK; odd locking experiences can occur. There is a column in table DBC.DBase named NUMFALLBACKTABLES whose update causes a write lock on this table. Now when anyone accesses ANY TABLE in that database they will be blocked on the table DBC.DBase if the parser for the IFP or COP they attach to does not find the table definition in memory. Because it needs to verify the existence of the table, the parser must read DBC.DBase. Since that table is write locked, the parser can't read it. If, however, the existence of the altered table can be verified because that row in DBC.DBase is in memory, then the DBC table need not be read.

The result is a sporadic locking problem that can be difficult to solve, especially if the ALTER FALLBACK is done on a BIG table.

Both the NumFallBackTables and the NumLogProTables fields in the DBC.DBase table have become obsolete and are no longer maintained since release 4.2.5. As a result, DDL statements such as ALTER TABLE and CREATE TABLE no longer require a rowhash lock on DBC.DBase. The locking problem described in DLM's last note should no longer exist for the ALTER TABLE statement.
