Upload
jase-bebb
View
218
Download
4
Tags:
Embed Size (px)
Citation preview
1
CS411Database Systems
12: Recoveryobama and eric schmidt http://www.youtube.com/watch?v=k4RRi_ntQc8sysadmin song http://www.youtube.com/watch?v=udhd9fmOdCs14th century sysadmin http://www.youtube.com/watch?v=8UXAF-CUmIA
2
Bad things happen, but the DB contents must live on regardless.
System crashes are the most common problem.
We’ll worry about media failure later.
3
On restart, some transactions should be aborted, others must be durable.
crash!T1
T2
T3
T4
T5
T1, T2, T3 are should be durable.
T4, T5 should be aborted.
Recovery has a big impact on buffer management.
Force T’s writes to disk at commit time?– Poor response time.– If not, how do we
guarantee durability?
Steal dirty buffer pool pages from uncommitted Tns?– If not, poor throughput.– If so, what about
atomicity?
Force
No Force
No Steal Steal
Trivial
Desired
If T aborts, must undo T’s writes on disk!
The log helps us guarantee atomicity and durability.
Append-only file with all info needed to REDO or UNDO every write
Give it its own disk (why?)
6
Undo Logging
(force, steal)
7
Undo logs don’t need to save after-images
Log record types:<START T>
– transaction T has begun
<COMMIT T> – T has committed
<ABORT T>– T has aborted
<T, X, old_v>– T has updated element X, and its old value was
old_v
8
Undo logging has 2.5 rules.
U1: If T modifies X, then the log record <T, X, old_v> must be on disk before X is written to disk
U2: If T commits, then <COMMIT T> can’t be written to disk until all data changes by T are on disk (“early OUTPUTs”)
There may be many pages
to force, &
other Tns may want
them in memor
y
U2.5: Need to do the right thing when a transaction aborts (what?)
Buffer management
rule, not a logging rule
9
Crash recovery is easy with an undo log.1. Scan log, decide which transactions T
completed. <START T>….<COMMIT T>…. <START T>….<ABORT T>……. <START T>………………………
2. Starting from the end of the log, undo all modifications made by incomplete transactions.
The chance of crashing during recovery is relatively high!
But undo recovery is idempotent: just restart it if it crashes.
10
Detailed algorithm for undo log recovery
From the last entry in the log to the first:– <COMMIT T>: mark T as completed– <ABORT T>: mark T as completed– <T,X,v>: if T is not completed
then write X=v to disk else ignore
– <START T>: ignore
So how should we
handle ordinary
Tn aborts?
11
Undo recovery practice
…<T6,X6,v6>……<START T5><START T4><T1,X1,v1><T5,X5,v5><T4,X4,v4><COMMIT T5><T3,X3,v3><T2,X2,v2>
Which actions do we undo, in which order?
What could go wrong if we undid them in a different order?
12
Scanning a year-long log is SLOW and businesses lose money every minute their DB is down.
Solution: checkpoint the database periodically.
Easy version:
1. Stop accepting new transactions
2. Wait until all current transactions complete
3. Flush log to disk
4. Write a <CKPT> log record, flush
5. Resume transactions
13
During undo recovery, stop at first checkpoint.
……<T9,X9,v9>……(all completed)<CKPT><START T2><START T3<START T5><START T4><T1,X1,v1><T5,X5,v5><T4,X4,v4><COMMIT T5><T3,X3,v3><T2,X2,v2>
T2,T3,T4,T5
other transactions
14
This “quiescent checkpointing” isn’t good enough for 24/7 applications. Instead:
1. Write <START CKPT(T1,…,Tk)>,where T1,…,Tk are all active transactions
2. Continue normal operation
3. When all of T1,…,Tk have completed, write <END CKPT>
15
Example of undo recovery with nonquiescent checkpointing
…………
…<START CKPT T4, T5, T6>…………<END CKPT>………
T4, T5, T6, plus later transactions
earlier transactions plus T4, T5, T5
later transactions
What would go wrong if we didn’t use<END CKPT> ?
What would go wrong if we didn’t use<END CKPT> ?
16
Crash recovery algorithm with undo log, nonquiescent checkpoints.
1. Scan log backwards until the start of the latest completed checkpoint, deciding which transactions T completed. <START T>….<COMMIT T>…. <START T>….<ABORT T>……. <START CKPT {T…}>….<COMMIT T>…. <START CKPT {T…}>….<ABORT T>……. <START T>………………………
2. Starting from the end of the log, undo all modifications made by incomplete transactions.
17
Redo Logging
(no force, no steal)
18
Redo log entries are just slightly different from undo log entries.
<START T>
<COMMIT T>
<ABORT T>
<T, X, new_v> – T has updated element X, and its new value is
new_v
same as before
19
Redo logging has one rule.
R1: If T modifies X, then both <T, X, new_v> and <COMMIT T> must be written to disk before X is written to disk (“late OUTPUT”)
Don’t have to force all those
dirty data pages to disk
before committing!
Don’t steal dirty buffer pages
from uncommitted
tns!
Implicit and reasonable
assumption: log records reach disk in order;
otherwise terrible things will happen.
20
Recovery is easy with an undo log.
1. Decide which transactions T completed. <START T>….<COMMIT T>…. <START T>….<ABORT T>……. <START T>………………………
2. Read log from the beginning, redo all updates of committed transactions.
The chance of crashing during recovery is relatively high!
But REDO recovery is idempotent: just restart it if it crashes.
21
Example of redo recovery
<START T1><T1,X1,v1><START T2><T2, X2, v2><START T3><T1,X3,v3><COMMIT T2><T3,X4,v4><T1,X5,v5>……
Which actions do we redo, in which order?
What could go wrong if we redid them in a different order?
22
Nonquiescent checkpointing is trickier with a redo log than an undo log
1. Write a <START CKPT(T1,…,Tk)>where T1,…,Tk are the active transactions
2. Flush to disk all dirty data pages of transactions committed by the time the checkpoint started, while continuing normal operation
3. After that, write <END CKPT>
dirty = written
23
What exactly does “dirty” mean?
• When you are talking about buffer management and which buffers you can steal, a dirty page is a data page in memory that has been modified but not yet sent back to disk.
• When you are talking about concurrency control, a dirty page is a data page in memory that has been modified but not yet committed. A dirty read is a read of a dirty page.
Either way, the dirty pages are the ones that can get you in trouble.
24
Example of redo recovery with nonquiescent checkpointing
…<START T1>…<COMMIT T1>……<START CKPT T4, T5, T6>……<END CKPT>……<START CKPT T9, T10>…
1. Look forthe last<END CKPT>
2. Redo from <START T>, for committed T in {T4, T5, T6}.
3. Normal redo for committed Tns that started after this point.
All data written by T1 is known
to be on disk
25
But neither undo nor redo logging matches what we would like to have for buffer management
Force
No Force
No Steal Steal
Trivial
Desired
Undo Logging
Redo Logging
Use undo/redo logging to attain this
nirvana
26
Redo/undo logs save both before-images and after-images.
<START T> <COMMIT T> <ABORT T><T, X, old_v, new_v>
– T has written element X; its old value was old_v, and its new value is new_v
Undo/redo recovery has 1.5 rules.
1. Must force the log record for an update to disk before the corresponding data page goes to disk.
As usual, T committed iff <T
commits> is on disk
1.5: Need to do the right thing when a transaction aborts (what?)
Item X can be updated on disk once <T wrote X> is
on disk , before <T
commits> is on disk (i.e., early or late
OUTPUT)
“Write-ahead
logging”
28
Recovery is more complex with undo/redo logging.
1. Redo all committed transactions, starting at the beginning of the log
2. Undo all incomplete transactions, starting from the end of the log
<START T1><T1,X1,v1><START T2><T2, X2, v2><START T3><T1,X3,v3><COMMIT T2><T3,X4,v4><T1,X5,v5>……
REDO
UNDO
“incomplete” = started &
not committed or aborted
How do we know these undos won’t undo some committed
writes?
29
Algorithm for non-quiescent checkpoint for undo/redo
1. Write <start checkpoint, list of all active transactions> to log
2. Flush log to disk3. Write to disk all dirty buffers,
whether or not their transaction has committed(this implies some log records may
need to be written to disk (WAL))
4. Write <end checkpoint> to log
5. Flush log to disk29
Flush dirty
buffer pool
pages
…
<start checkpoint, active Tns are T1, T2, …>
…
<end checkpoint>
…
Active
Tns
Pointers are one of
many tricks to speed up future
undos
UNDO
30
Algorithm for undo/redo recovery with nonquiescent checkpoint 1. Backwards undo pass (end of log to
start of last completed checkpoint)
a. C = transactions that committed after the checkpoint started
b. Undo actions of transactions that (are in A or started after the checkpoint started) and (are not in C)
2. Undo remaining actions by incomplete transactionsa. Follow undo chains for transactions in
(checkpoint active list) – C
3. Forward pass (start of last completed checkpoint to end of log)
a. Redo actions of transactions in C
Active
Tns…
<start checkpoint, A=active Tns>
…<end checkpoint>
…
REDO
S
31
Examples what to do at recovery time?
no <T1 commit>
Undo T1 (undo A, B, C)
…T1 wrote A, ……checkpoint start (T1 active)
…T1 wrote B, ……checkpoint end…T1 wrote C, ……
32
Redo T1: (redo B, C)
…T1 wrote A, ……checkpoint start (T1 active)
…T1 wrote B, ……checkpoint end…T1 wrote C, ……T1 commit
Examples what to do at recovery time?
33
Real world actions
E.g., dispense cash at ATM
Ti = a1…... aj …... an
$
“Solution”:
(1) try to make idempotent
(2) execute real-world actions after commit
Why are these a problem from a
DB perspecti
ve?
34
PHYSICAL DISASTERS
35
These recovery algorithms won’t help you if your disk fails.
Solution: careful replication!
36
Example 1 Triple modular redundancy
Keep 3 copies on separate disks• Output(X) --> three outputs• Input(X) --> three inputs + vote
Copy 1 Copy 2 Copy 3
37
Example 2 Redundant writes, single reads
Keep N copies on separate disks• Output(X) --> N outputs• Input(X) --> Input one copy
- if ok, done
- else try another one
Assumes bad data can be
detected (traditional but false)
Copy 1Copy 1Copy 1Copy 1Copy 1Copy 1
38
Example 3: DB dump + log
backup
databaseactive
databaselog
If active database is lost,– restore active database from backup– bring up-to-date using redo entries in log
39
When can log be discarded?
check-
pointdb
dump
last
needed
undo
not needed for
media recovery
not needed for undo
after system failure
not needed for
redo after system failure
log
time
The real picture: what’s stored where
DB
Data pageseach with a pageLSN
(LSN of last write to that data page)
Xact TablelastLSN
status
Dirty Page TablerecLSN
flushedLSN
RAM
prevLSNXIDtype
lengthpageID
offsetbefore-imageafter-image
LSN (log sequence number)
LogRecords
LOG
Master record
Summary of Logging/Recovery
• Recovery manager guarantees atomicity & durability---two of the ACID properties.
• Redo logging and undo logging are simple but make the system too slow in practice for serious applications.
• Use write-ahead logging with undo/redo logging to speed up the system (by allowing STEAL/NO-FORCE) without sacrificing correctness.
Summary, Cont.
• Checkpointing: A quick way to limit the amount of log to scan on recovery. Nonquiescent checkpoints are especially useful.
• Recovery works in 3 phases:– Analysis: Forward from checkpoint.– Redo: Forward.– Undo: Backward.