Page 1: Transaction

1

Transaction

Principles of Computer System (2012 Fall)

All-or-nothing

Page 2: Transaction

2

Where are we?

• System Complexity
• Modularity & Naming
• Enforced Modularity
• Network
• Fault Tolerance
• Transaction
  – All-or-nothing
  – Before-or-after

Page 3: Transaction

3

Review

• Goal
  – Building reliable systems out of unreliable components
• Fault, Error, Failure
• MTTF, MTBF, MTTR
  – Bathtub curve
• RAID

Page 4: Transaction

4

Review

• Disk Failure Tolerance
  – Use replication
  – Checksum used to detect faults
  – If one disk cannot be read, read the data from replica(s) instead
  – Weaker (but still very effective): error-correcting codes on disk platter
  – Replication allows masking component failure

Page 5: Transaction

All-or-Nothing and Before-or-After Atomicity

• An action is atomic
  – if there is no way for a higher layer to discover the internal structure of its implementation

Page 6: Transaction

All-or-Nothing and Before-or-After Atomicity

• From the point of view of a procedure that invokes an atomic action
  – The atomic action always appears either to complete as anticipated, or to do nothing
  – This consequence is the one that makes atomic actions useful in recovering from failures

Page 7: Transaction

All-or-Nothing and Before-or-After Atomicity

• From the point of view of a concurrent thread
  – An atomic action acts as though it occurs either completely before or completely after every other concurrent atomic action
  – This consequence is the one that makes atomic actions useful for coordinating concurrent threads
• Atomicity hides
  – not just the details of which steps form the atomic action
  – but the very fact that it has structure

Page 8: Transaction

All-or-Nothing and Before-or-After Atomicity

• 1. Data abstraction
  – Hide the internal structure of data
• 2. Client/server organization
  – Hide the internal structure of major subsystems
• 3. Atomicity
  – Hide the internal structure of an action
• Enforce industrial-strength modularity
  – Guarantee absence of unanticipated interactions among components of a complex system
• The implementer's point of view
  – Painful

Page 9: Transaction

Atomic actions' benevolent side effects

• Audit log
  – atomic actions that run into trouble record
    • the nature of the detected failure and
    • the recovery sequence
  – for later analysis
• Data management system, when inserting a record
  – rearrange the file into a better physical order
• Cache
• Garbage collection
• They are all hidden from upper levels

Page 10: Transaction

Overall system fault tolerance model

• 1. Error-free operation:
  – All work goes according to expectations
  – The user initiates actions and the system confirms the actions by displaying messages to the user
• 2. Tolerated error:
  – The user who has initiated an action notices that the system failed before it confirmed completion of the action
  – When the system is operating again, the user checks to see whether or not it actually performed that action

Page 11: Transaction

Overall system fault tolerance model

• 3. Un-tolerated error:
  – The system fails without the user noticing
  – The user does not realize that he or she should check or retry an action that the system may not have completed

Page 12: Transaction

Disk Storage System Fault Tolerance Model

• A perfect-disk assumption
  – a disk never decays and has no hard errors
  – only one thing can go wrong: a system crash at just the wrong time

Page 13: Transaction

Disk Storage System Fault Tolerance Model

• The fault tolerance model
  – Error-free operation:
    • CAREFUL_GET returns the result of the most recent call to CAREFUL_PUT at sector_number on track, with status = OK.
  – Detectable error:
    • The operating system crashes during a CAREFUL_PUT
    • and corrupts the disk buffer in volatile storage
    • and CAREFUL_PUT writes corrupted data on one sector of the disk.

Page 14: Transaction

ALL_OR_NOTHING_PUT

    procedure ALMOST_ALL_OR_NOTHING_PUT (data, all_or_nothing_sector)
        CAREFUL_PUT (data, all_or_nothing_sector.S1)
        CAREFUL_PUT (data, all_or_nothing_sector.S2)
        CAREFUL_PUT (data, all_or_nothing_sector.S3)

    procedure ALL_OR_NOTHING_GET (reference data, all_or_nothing_sector)
        CAREFUL_GET (data1, all_or_nothing_sector.S1)
        CAREFUL_GET (data2, all_or_nothing_sector.S2)
        CAREFUL_GET (data3, all_or_nothing_sector.S3)
        if (data1 = data2) data ← data1    // S1 and S2 agree: use them
        else data ← data3                  // otherwise fall back to S3

Page 15: Transaction

ALL_OR_NOTHING_PUT

    procedure ALL_OR_NOTHING_PUT (data, all_or_nothing_sector)
        CHECK_AND_REPAIR (all_or_nothing_sector)
        ALMOST_ALL_OR_NOTHING_PUT (data, all_or_nothing_sector)

    procedure CHECK_AND_REPAIR (all_or_nothing_sector)    // Ensure copies match
        CAREFUL_GET (data1, all_or_nothing_sector.S1)
        CAREFUL_GET (data2, all_or_nothing_sector.S2)
        CAREFUL_GET (data3, all_or_nothing_sector.S3)

Page 16: Transaction

ALL_OR_NOTHING_PUT

        if (data1 = data2) and (data2 = data3) return           // State 1 or 7, no repair
        if (data1 = data2)
            CAREFUL_PUT (data1, all_or_nothing_sector.S3) return    // State 5 or 6
        if (data2 = data3)
            CAREFUL_PUT (data2, all_or_nothing_sector.S1) return    // State 2 or 3
        CAREFUL_PUT (data1, all_or_nothing_sector.S2)           // State 4, go to state 5
        CAREFUL_PUT (data1, all_or_nothing_sector.S3)           // State 5, go to state 7
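The three-copy scheme can be tried out in memory. The following is a minimal sketch, not the course's reference code: the class name, the `crash_during` parameter, and the `CORRUPT` sentinel are hypothetical devices for simulating a crash that damages the sector being written. As a simplification, `get` repairs first and then reads S1 instead of comparing the three copies itself.

```python
CORRUPT = object()  # hypothetical marker for a sector damaged by a crash mid-write


class AllOrNothingSector:
    def __init__(self, initial):
        self.copies = [initial, initial, initial]  # S1, S2, S3

    def _almost_put(self, data, crash_during=None):
        # Write S1, S2, S3 in order; a simulated crash corrupts the copy in flight.
        for i in range(3):
            if crash_during == i:
                self.copies[i] = CORRUPT
                return
            self.copies[i] = data

    def check_and_repair(self):
        s1, s2, s3 = self.copies
        if s1 == s2 and s2 == s3:
            return                      # all copies agree, nothing to do
        if s1 == s2:
            self.copies[2] = s1         # S3 is stale or damaged: roll forward
            return
        if s2 == s3:
            self.copies[0] = s2         # S1 is damaged: roll back
            return
        self.copies[1] = s1             # S2 is damaged: finish the put
        self.copies[2] = s1

    def put(self, data, crash_during=None):
        self.check_and_repair()
        self._almost_put(data, crash_during)

    def get(self):
        self.check_and_repair()
        return self.copies[0]


if __name__ == "__main__":
    for k in (0, 1, 2):
        sector = AllOrNothingSector("old")
        sector.put("new", crash_during=k)
        print("crash while writing S%d ->" % (k + 1), sector.get())
```

Whichever copy the simulated crash hits, a later `get` sees either the complete old value or the complete new value, never the damaged sector.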

Page 17: Transaction

Atomicity

17

Page 18: Transaction

Commit

18

Page 19: Transaction

Commit

• Pre-commit
  – identify all the resources needed
  – establish their availability
  – maintain the ability to abort at any instant
    • shared resources, once reserved, cannot be released until the commit point is passed
    • should not do anything externally visible
• Post-commit
  – release reserved resources that are no longer needed
  – perform externally visible actions
  – cannot try to acquire additional resources
• Q: where's commit in ALL_OR_NOTHING_PUT?

Page 20: Transaction

Shadow Copy

• Pre-commit:
  – Create a complete duplicate working copy of the file that is to be modified
  – Make all changes to the working copy
• Commit point:
  – Carefully exchange the working copy with the original
  – Typically this step is bootstrapped
• Post-commit:
  – Release the space that was occupied by the original

The golden rule of atomicity: Never modify the only copy!

Page 21: Transaction

21

Bank account transfer

    xfer(bank, a, b, amt):
        bank[a] = bank[a] - amt
        bank[b] = bank[b] + amt

Page 22: Transaction

22

Bank account transfer

    xfer(bank, a, b, amt):
        bank[a] = bank[a] - amt
        bank[b] = bank[b] + amt

    xfer(bank, a, b, 50):
        <- a=100, b=100    (initial state)
        <- a=50,  b=100    (after the first assignment)
        <- a=50,  b=150    (after the second assignment)

Page 23: Transaction

23

Bank account transfer

    xfer(bank, a, b, amt):
        bank[a] = bank[a] - amt
        bank[b] = bank[b] + amt

    audit(bank):
        sum = 0
        for acct in bank:
            sum = sum + bank[acct]
        return sum

Page 24: Transaction

24

Bank account transfer

    xfer(bank, a, b, amt):
        bank[a] = bank[a] - amt
        bank[b] = bank[b] + amt

    audit(bank):
        sum = 0
        for acct in bank:
            sum = sum + bank[acct]
        return sum

    audit(bank):
        <- sum=200    (before xfer starts)
        <- sum=150    (between xfer's two assignments)
        <- sum=200    (after xfer finishes)

Page 25: Transaction

25

Eventual goal: transactions

    xfer(bank, a, b, amt):
        begin
        bank[a] = bank[a] - amt
        bank[b] = bank[b] + amt
        commit

    audit(bank):
        begin
        sum = 0
        for acct in bank:
            sum = sum + bank[acct]
        commit
        return sum

Page 26: Transaction

26

Strawman implementation

    xfer(bank, a, b, amt):
        bank = read_accounts(bankfile)
        bank[a] = bank[a] - amt
        bank[b] = bank[b] + amt
        write_accounts(bankfile)

Page 27: Transaction

27

Shadow copy

    xfer(bank, a, b, amt):
        bank = read_accounts(bankfile)
        bank[a] = bank[a] - amt
        bank[b] = bank[b] + amt
        write_accounts("#bankfile")
        rename("#bankfile", bankfile)
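For a runnable version of the shadow-copy transfer, here is a minimal sketch. It assumes the accounts live in a JSON file (the slides do not specify a file format) and uses Python's `os.replace`, which performs an atomic rename on POSIX systems, as the commit point. The `#` prefix for the shadow file follows the slide.

```python
import json
import os
import tempfile


def xfer(bankfile, a, b, amt):
    # Pre-commit: read the live file and build the new state in memory.
    with open(bankfile) as f:
        bank = json.load(f)
    bank[a] -= amt
    bank[b] += amt

    # Write the shadow copy next to the original and force it to disk.
    shadow = os.path.join(os.path.dirname(bankfile),
                          "#" + os.path.basename(bankfile))
    with open(shadow, "w") as f:
        json.dump(bank, f)
        f.flush()
        os.fsync(f.fileno())

    # Commit point: the name "bankfile" switches to the new contents.
    os.replace(shadow, bankfile)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "bankfile")
    with open(path, "w") as f:
        json.dump({"a": 100, "b": 100}, f)
    xfer(path, "a", "b", 50)
    with open(path) as f:
        print(json.load(f))
```

A crash before `os.replace` leaves the original file untouched, plus a stray shadow file for recovery to delete; a crash after it leaves the complete new file.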

Page 28: Transaction

28

Rename

• Rename
  – unlink(bankname)
  – link("newfile", bankname)
  – unlink("newfile")

Page 29: Transaction

29

File system data structures

• directory data blocks:
  – filename "bank" → inode 12
  – filename "#bank" → inode 13
• inode 12:
  – data blocks: 3, 4, 5
  – refcount: 1
• inode 13:
  – data blocks: 6, 7, 8
  – refcount: 1

Page 30: Transaction

30

Rename

• What needs to happen during rename?
  – Point "bank" directory entry at inode 13
  – Remove "newfile" directory entry
  – Decrement refcount on inode 12
• First try at rename(x, y):
  – y's dirent gets x's inode #
  – decref(y's original inode)
  – remove x's dirent
• Problems
  – What happens if we crash after modifying y's dirent?
  – Two names point to inode 13, refcount is 1.
  – What if we increment refcount before, and decrement it afterwards?

Page 31: Transaction

31

Increase ref-count before

    rename(x, y):
        newino = lookup(x)
        oldino = lookup(y)

        incref(newino)
        change y's dirent to newino
        decref(oldino)
        remove x's dirent
        decref(newino)

Page 32: Transaction

32

rename(“#bank”, “bank”)

• directory data blocks:
  – filename "bank" → inode 12
  – filename "#bank" → inode 13
• inode 12:
  – data blocks: 3, 4, 5
  – refcount: 1
• inode 13:
  – data blocks: 6, 7, 8
  – refcount: 2

Page 33: Transaction

33

rename(“#bank”, “bank”)

• directory data blocks:
  – filename "bank" → inode 13
  – filename "#bank" → inode 13
• inode 12:
  – data blocks: 3, 4, 5
  – refcount: 1
• inode 13:
  – data blocks: 6, 7, 8
  – refcount: 2

Page 34: Transaction

34

rename(“#bank”, “bank”)

• directory data blocks:
  – filename "bank" → inode 13
  – filename "#bank" → inode 13
• inode 12:
  – data blocks: 3, 4, 5
  – refcount: 0
• inode 13:
  – data blocks: 6, 7, 8
  – refcount: 2

Page 35: Transaction

35

rename(“#bank”, “bank”)

• directory data blocks:
  – filename "bank" → inode 13
  – filename "#bank" → inode 13 (entry removed in this step)
• inode 12:
  – data blocks: 3, 4, 5
  – refcount: 0
• inode 13:
  – data blocks: 6, 7, 8
  – refcount: 2

Page 36: Transaction

36

rename(“#bank”, “bank”)

• directory data blocks:
  – filename "bank" → inode 13
  – filename "#bank" → inode 13 (entry already removed)
• inode 12:
  – data blocks: 3, 4, 5
  – refcount: 0
• inode 13:
  – data blocks: 6, 7, 8
  – refcount: 1

Page 37: Transaction

37

Recovery after crash

    salvage(disk):
        for inode in disk.inodes:
            inode.refcnt = find_all_refs(disk.root_dir, inode)
        if exists("#bank"):
            unlink("#bank")
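To check that rename plus salvage leaves a consistent file system wherever the crash lands, the steps can be simulated in memory. This is a sketch under stated assumptions: the dictionary layout of `fs`, the `crash_after` parameter, and the helper `fresh_fs` are hypothetical, and each step is assumed to be an all-or-nothing sector write.

```python
def rename(fs, x, y, crash_after=None):
    """Run the five rename steps in order; crash_after=k stops after k steps."""
    d, ref = fs["dir"], fs["refcnt"]
    newino, oldino = d[x], d[y]

    def incref_new():
        ref[newino] += 1

    def point_y_at_new():
        d[y] = newino

    def decref_old():
        ref[oldino] -= 1

    def remove_x():
        del d[x]

    def decref_new():
        ref[newino] -= 1

    steps = [incref_new, point_y_at_new, decref_old, remove_x, decref_new]
    for step in steps[:crash_after]:   # steps[:None] runs all five
        step()


def salvage(fs, shadow="#bank"):
    d, ref = fs["dir"], fs["refcnt"]
    # Recompute every refcount from the directory entries that actually exist.
    for ino in ref:
        ref[ino] = sum(1 for target in d.values() if target == ino)
    # A leftover shadow name means the action did not finish: unlink it.
    if shadow in d:
        ref[d[shadow]] -= 1
        del d[shadow]


def fresh_fs():
    return {"dir": {"bank": 12, "#bank": 13}, "refcnt": {12: 1, 13: 1}}


if __name__ == "__main__":
    for k in range(6):
        fs = fresh_fs()
        rename(fs, "#bank", "bank", crash_after=k)
        salvage(fs)
        print(k, fs["dir"], fs["refcnt"])
```

In this simulation "bank" names inode 12 if the crash comes before the dirent change and inode 13 otherwise, and the refcounts always match the surviving directory entries.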

Page 38: Transaction

38

Shadow copy

• Write to a copy of data, atomically switch to new copy
• Switching can be done with one all-or-nothing operation (sector write)
• Requires a small amount of all-or-nothing atomicity from lower layer (disk)
• Main rule: only make one write to current/live copy of data
  – In our example, sector write for rename.
  – Creates a well-defined commit point.

Page 39: Transaction

39

Shadow copy

• Does the shadow copy approach work in general?
  – +: Works well for a single file.
  – -: Hard to generalize to multiple files or directories.
    • Might have to place all files in a single directory, or rename subdirs.
  – -: Requires copying the entire file for any (small) change.
  – -: Only one operation can happen at a time.
  – -: Only works for operations that happen on a single computer, single disk.

Page 40: Transaction

40

Summary

• Not always possible to use replication to avoid failures.
  – To recover from failure, need to reason about what happened to the system
• Atomicity makes it much easier to reason about possible system states:
  – All-or-nothing atomicity
  – Before-or-after atomicity
• Shadow copy: one approach to all-or-nothing atomicity
  – Make all modifications to a shadow copy of data.
  – Replace old copy with new copy with an all-or-nothing write to one sector.
    • -> commit point
  – Rule: never modify live data (unless one sector write is all you need)

Page 41: Transaction

41

Summary

• Two kinds of atomicity: all-or-nothing, before-or-after
  – Shadow copy can provide all-or-nothing atomicity
  – Golden rule of atomicity: never modify the only copy!
• Typical way to achieve all-or-nothing atomicity.
  – Works because you can fall back to the old copy in case of failure.
  – Software can also use all-or-nothing atomicity to abort in case of error.
• Drawbacks of shadow file approach:
  – only works for single file (maybe fixable with shadow dirs)
  – copy the entire file for every all-or-nothing action (harder to avoid)
• Still, shadow copy is a simple and effective design when it suffices.
  – Many Unix applications (e.g., text editors) use it, owing to rename.

Page 42: Transaction

42

All-or-nothing: Logging

• A more general technique for achieving all-or-nothing atomicity
  – Idea: keep a log of all changes, and whether each change commits or aborts
  – We will start out with a simple scheme that's all-or-nothing but slow
  – Then, we will optimize its performance while preserving atomicity

Page 43: Transaction

Journal Storage

• A store operation
  – does not overwrite old data
  – creates a new, tentative version of the data
    • which remains invisible to any reader outside this all-or-nothing action
    • until the action commits

Page 44: Transaction

Journal Storage

44

Page 45: Transaction

Journal Storage

45

Page 46: Transaction

Journal Storage

    procedure NEW_ACTION ()
        id ← NEW_OUTCOME_RECORD ()
        id.outcome_record.state ← PENDING
        return id

    procedure COMMIT (reference id)
        id.outcome_record.state ← COMMITTED

    procedure ABORT (reference id)
        id.outcome_record.state ← ABORTED

Page 47: Transaction

Journal Storage

    procedure READ_CURRENT_VALUE (data_id, caller_id)
        starting at end of data_id repeat until beginning
            v ← previous version of data_id    // Get next older version
            a ← v.action_id                    // Identify the action a that created it
            s ← a.outcome_record.state         // Check action a's outcome record
            if s = COMMITTED then
                return v.value
            else skip v                        // Continue backward search
        signal ("Tried to read an uninitialized variable!")

Page 48: Transaction

Journal Storage

    procedure WRITE_NEW_VALUE (reference data_id, new_value, caller_id)
        if caller_id.outcome_record.state = PENDING
            append new version v to data_id
            v.value ← new_value
            v.action_id ← caller_id
        else signal ("Tried to write outside of an all-or-nothing action!")

Page 49: Transaction

Journal Storage

49

Page 50: Transaction

Journal Storage

    procedure TRANSFER (reference debit_account, reference credit_account, amount)
        my_id ← NEW_ACTION ()
        xvalue ← READ_CURRENT_VALUE (debit_account, my_id)
        xvalue ← xvalue - amount
        WRITE_NEW_VALUE (debit_account, xvalue, my_id)
        yvalue ← READ_CURRENT_VALUE (credit_account, my_id)
        yvalue ← yvalue + amount
        WRITE_NEW_VALUE (credit_account, yvalue, my_id)
        if xvalue > 0 then
            COMMIT (my_id)
        else
            ABORT (my_id)
            signal ("Negative transfers are not allowed.")
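The journal-storage procedures translate almost line for line into Python. The sketch below makes several assumptions: a version history is a plain list of `(value, action)` pairs, an outcome record is a small `Action` object, `account` is a hypothetical helper that creates a history with one committed initial version, and `transfer` returns `False` on abort instead of signalling.

```python
PENDING, COMMITTED, ABORTED = "PENDING", "COMMITTED", "ABORTED"


class Action:
    """Outcome record of one all-or-nothing action."""
    def __init__(self):
        self.state = PENDING


def new_action():
    return Action()


def commit(action):
    action.state = COMMITTED


def abort(action):
    action.state = ABORTED


def read_current_value(history, caller):
    # Scan versions newest-first; return the first one whose creator committed.
    for value, action in reversed(history):
        if action.state == COMMITTED:
            return value
    raise LookupError("Tried to read an uninitialized variable!")


def write_new_value(history, new_value, caller):
    if caller.state != PENDING:
        raise RuntimeError("Tried to write outside of an all-or-nothing action!")
    history.append((new_value, caller))   # tentative until caller commits


def account(balance):
    # Hypothetical helper: a history holding one committed initial version.
    setup = new_action()
    history = []
    write_new_value(history, balance, setup)
    commit(setup)
    return history


def transfer(debit, credit, amount):
    me = new_action()
    x = read_current_value(debit, me) - amount
    write_new_value(debit, x, me)
    y = read_current_value(credit, me) + amount
    write_new_value(credit, y, me)
    if x > 0:
        commit(me)
        return True
    abort(me)     # tentative versions stay in the histories but are never read
    return False
```

After an aborted transfer the tentative versions are still present in both histories, yet readers skip them because their outcome record says ABORTED. Changing that single state field is the commit point.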

Page 51: Transaction

Log

51

Page 52: Transaction

52

Example: Bank Account App

    xfer(bank, a, b, amt):
        copy(bank, tmp)
        tmp[a] = tmp[a] - amt
        tmp[b] = tmp[b] + amt
        rename(tmp, bank)

Page 53: Transaction

53

Shadow copy abort VS. commit

    xfer(bank, a, b, amt):
        copy(bank, tmp)
        tmp[a] = tmp[a] - amt
        tmp[b] = tmp[b] + amt
        if tmp[a] < 0:
            print "Not enough funds"
            unlink(tmp)
        else:
            rename(tmp, bank)

Page 54: Transaction

54

Transaction terminology

    xfer(bank, a, b, amt):
        begin
        bank[a] = bank[a] - amt
        bank[b] = bank[b] + amt
        if bank[a] < 0:
            print "Not enough funds"
            abort
        else:
            commit

Page 55: Transaction

55

Consider our bank account example

• Two accounts: A and B.
  – Accounts start out empty.
  – Run these all-or-nothing actions:

    begin
    A = 100
    B = 50
    commit

    begin
    A = A - 20
    B = B + 20
    commit

    begin
    A = A + 30
    --CRASH--

Page 56: Transaction

A Log Sample

        +-------+------+--------+-------+------+--------+-------+
    TID |  T1   |  T1  |   T1   |  T2   |  T2  |   T2   |   T3  |
    OLD | A=0   | B=0  |        | A=100 | B=50 |        | A=80  |
    NEW | A=100 | B=50 | COMMIT | A=80  | B=70 | COMMIT | A=110 |
        +-------+------+--------+-------+------+--------+-------+

• Begin
• Write variable
• Read variable
• Commit
• Abort

    begin
    A = 100
    B = 50
    commit

    begin
    A = A - 20
    B = B + 20
    commit

    begin
    A = A + 30
    --CRASH--

Page 57: Transaction

Logging

• What happens when a program runs now?
  – begin: allocate a new transaction ID.
  – write variable: append an entry to the log.
  – read variable: scan the log looking for last committed value.
    • As an aside: how to see your own updates?
    • Read uncommitted values from your own tid.

Page 58: Transaction

Read with a log

    read(log, var):
        commits = { }
        for record r in log[len(log)-1] .. log[0]:    # scan newest to oldest
            if (r.type == commit):
                commits = commits + { r.tid }
            if (r.type == update) and (r.tid in commits) and (r.var == var):
                return r.new_val
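The backward scan can be run against the sample log from the bank example. This is a minimal sketch: the `Record` tuple layout is an assumption, and `read` returns `None` when no committed value exists, a case the slide leaves open.

```python
from collections import namedtuple

# Assumed record layout; commit records leave var/old_val/new_val unset.
Record = namedtuple("Record", "tid type var old_val new_val",
                    defaults=(None, None, None))


def read(log, var):
    commits = set()
    for r in reversed(log):               # newest record first
        if r.type == "commit":
            commits.add(r.tid)
        if r.type == "update" and r.tid in commits and r.var == var:
            return r.new_val
    return None                           # no committed value for var


# The sample log: T1 and T2 committed, T3 crashed before committing.
log = [
    Record("T1", "update", "A", 0, 100),
    Record("T1", "update", "B", 0, 50),
    Record("T1", "commit"),
    Record("T2", "update", "A", 100, 80),
    Record("T2", "update", "B", 50, 70),
    Record("T2", "commit"),
    Record("T3", "update", "A", 80, 110),
]

if __name__ == "__main__":
    print(read(log, "A"), read(log, "B"))
```

Because the scan runs newest to oldest, a commit record is always seen before the updates it covers. T3's update of A is skipped since no commit record for T3 follows it in the log.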

Page 59: Transaction

Logging

– Commit: write a commit record
  • As expected, writing a commit record is the "commit point" for the action, because of the way read works (it looks for the commit record)
  • However, writing log records had better be all-or-nothing
    – One approach, from last time: make each record fit within one sector
– Abort:
  • Do nothing or write a record?
    – could write an abort record, but not strictly needed
– Recovery?
  • Recover from a crash: do nothing.

Page 60: Transaction

Performance of Read

• What's the performance of this log-only approach?
  – Write performance is probably good: sequential writes, instead of random.
  – (Since we aren't using the old values yet, we could have skipped the read.)
  – Read performance is terrible: scan the log for every read!
  – Crash recovery is instantaneous: nothing to do.

Page 61: Transaction

Performance Optimization

• How can we optimize read performance?
  – Keep both a log and "cell storage"
  – Log is just as before: authoritative, provides all-or-nothing atomicity
  – Cell storage: provides fast reads, but cannot provide all-or-nothing
• Terminology
  – "log" an update when it's written to the log
  – "install" an update when it's written to cell storage

Page 62: Transaction

Read / write with cell storage

62

Write-ahead-log protocol (WAL)
Log the update before installing it

Page 63: Transaction

63

Read / write with cell storage

    read(var):
        return cell_read(var)

    write(var, value):
        log.append(cur_tid, update, var, read(var), value)
        cell_write(var, value)

Page 64: Transaction

64

Read / write with cell storage

• Two questions we have to answer now:
  – How to update both the log and cell storage when an update happens?
  – How to recover cell storage from the authoritative log after crash?

Page 65: Transaction

Order Matters

• Ordering of logging and installing
  – The two together don't have all-or-nothing atomicity
  – Can crash in-between, so just one of the two might have taken place
• What happens if we install first and then log?
  – If we crash, no idea what happened to cell storage, or how to fix it
  – Bad idea, violates the golden rule ("Never modify the only copy")
• The corresponding rule for logging is the "Write-ahead-log protocol" (WAL)
  – Log the update before installing it
• If we crash, log is authoritative and intact, can repair cell storage
  – Can think of it as not being the only copy, once it's in the log
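To make the ordering concrete, here is a minimal in-memory sketch of write-ahead logging with a recovery pass that repairs cell storage from the log. The `Store` class and its tuple record layout are assumptions for illustration. The recovery strategy shown (undo updates of uncommitted actions in a backward scan, then redo updates of committed ones in a forward scan) is one simple option and assumes actions ran one at a time.

```python
class Store:
    def __init__(self):
        self.log = []     # authoritative, append-only
        self.cell = {}    # fast to read, may be wrong after a crash

    def read(self, var):
        return self.cell.get(var, 0)      # accounts start out empty

    def write(self, tid, var, value):
        # WAL: log the update (with old and new values) before installing it.
        self.log.append((tid, "update", var, self.read(var), value))
        self.cell[var] = value

    def commit(self, tid):
        self.log.append((tid, "commit"))

    def recover(self):
        committed = {r[0] for r in self.log if r[1] == "commit"}
        # Backward scan: undo installs made by actions that never committed.
        for r in reversed(self.log):
            if r[1] == "update" and r[0] not in committed:
                self.cell[r[2]] = r[3]
        # Forward scan: redo installs of committed actions.
        for r in self.log:
            if r[1] == "update" and r[0] in committed:
                self.cell[r[2]] = r[4]


if __name__ == "__main__":
    s = Store()
    s.write("T1", "A", 100); s.write("T1", "B", 50); s.commit("T1")
    s.write("T2", "A", s.read("A") - 20)
    s.write("T2", "B", s.read("B") + 20)
    s.commit("T2")
    s.write("T3", "A", s.read("A") + 30)  # crash: T3 never commits
    s.recover()
    print(s.cell)
```

If the install came first and the crash hit before the log append, cell storage would hold A=110 with no record of the old value 80, and recovery would have nothing to undo it with.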

Page 66: Transaction

Logging Protocol

• CHANGE Record
  – The identity of the all-or-nothing action
  – A component action redo
    • installs the intended value in cell storage
    • After commit, if the system crashes, the recovery procedure can perform the install on behalf of the action
  – A second component action undo
    • reverses the effect on cell storage of the install
    • After an abort or a system crash, it may be necessary for the recovery procedure to reverse the effect

Page 67: Transaction

Logging Protocol

• The application
• NEW_ACTION
  – logs a BEGIN record that contains just the new identity
  – As the all-or-nothing action proceeds through its pre-commit phase, it logs CHANGE records
• To implement COMMIT or ABORT
  – logs an OUTCOME record
  – commit point

Page 68: Transaction

Logging Protocol

    procedure TRANSFER (debit_account, credit_account, amount)
        my_id ← LOG (BEGIN_TRANSACTION)
        dbvalue.old ← GET (debit_account)
        dbvalue.new ← dbvalue.old - amount
        crvalue.old ← GET (credit_account, my_id)
        crvalue.new ← crvalue.old + amount
        LOG (CHANGE, my_id,
            "PUT (debit_account, dbvalue.new)",    // redo action
            "PUT (debit_account, dbvalue.old)")    // undo action

Page 69: Transaction

Logging Protocol

        LOG (CHANGE, my_id,
            "PUT (credit_account, crvalue.new)",    // redo action
            "PUT (credit_account, crvalue.old)")    // undo action
        PUT (debit_account, dbvalue.new)            // install
        PUT (credit_account, crvalue.new)           // install
        if dbvalue.new > 0 then
            LOG (OUTCOME, COMMIT, my_id)
        else
            LOG (OUTCOME, ABORT, my_id)
            signal ("Action not allowed. Would make debit account negative.")
        LOG (END_TRANSACTION, my_id)

Page 70: Transaction

Logging Protocol

70

Page 71: Transaction

Logging Protocol

    procedure ABORT (action_id)
        starting at end of log repeat until beginning
            log_record ← previous record of log
            if log_record.id = action_id then
                if (log_record.type = OUTCOME)
                    then signal ("Can't abort an already completed action.")
                if (log_record.type = CHANGE)
                    then perform undo_action of log_record
                if (log_record.type = BEGIN)
                    then break repeat
        LOG (action_id, OUTCOME, ABORTED)    // Block future undos.
        LOG (action_id, END)

Page 72: Transaction

Logging Protocol: non-volatile cell memory

72

Page 73: Transaction

Logging Protocol: non-volatile cell memory

73

Page 74: Transaction

74

Page 75: Transaction

75

winners: 1067, 1081, 1084, 1083, and 1086

losers: 1082, 1085, and 1087

Page 76: Transaction

Performance now

• Writes might still be OK
  – but we do write twice: log & install
• Reads are fast
  – just look up in cell storage
• Recovery requires scanning the entire log
• Remaining performance problems:
  – We have to write to disk twice.
  – Scanning the log will take longer and longer, as the log grows.

Page 77: Transaction

Optimization: truncate the log

• Current log grows without bound: not practical.
  – What part of the log can be discarded?
    • Must know the outcome of every action in that part of log
    • Cell storage must reflect all of those log records (commits, aborts).
  – Truncating mechanism (assuming no pending actions):
    • Flush all cached updates to cell storage
    • Write a checkpoint record, to save our place in the log.
    • Truncate log prior to checkpoint record
    • (Often log implemented as a series of files, so can delete old log files.)
  – With pending actions, delete before checkpoint & earliest undecided record
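The rule for pending actions can be sketched as a function that finds the earliest record belonging to an undecided action and discards everything before it. This is a simplified illustration: it assumes log records are `(tid, kind)` pairs, that all cached updates have already been flushed to cell storage, and it leaves writing the checkpoint record to the caller.

```python
def truncate(log):
    """Return the suffix of the log that must be kept.

    Assumes every logged update is already installed in cell storage.
    """
    decided = {tid for tid, kind in log if kind in ("commit", "abort")}
    for i, (tid, kind) in enumerate(log):
        if tid not in decided:
            return log[i:]    # keep from the earliest undecided record on
    return []                 # no pending actions: the whole log can go


if __name__ == "__main__":
    log = [(1, "update"), (1, "commit"),
           (2, "update"),                  # action 2 is still pending
           (3, "update"), (3, "commit")]
    print(truncate(log))
```

Records of decided actions that come after the earliest pending record are kept as well, because the log is truncated only from the front.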

Page 78: Transaction

Summary

• Logging is a general technique for achieving all-or-nothing atomicity.
  – Widely used: databases, file systems, ...
• Can achieve reasonable performance with logging.
  – Writes are always fast: sequential.
  – Reads can be fast with cell storage.
  – Key idea 1: write-ahead logging, makes it safe to update cell storage.
  – Key idea 2: recovery protocol, undo losers / redo winners.