After Imaging: The DBA’s Best Friend
A Few Words About The Speaker
• Tom Bascom
• Progress® User since 1987
• White Star Software, LLC
• DBAppraise®, LLC
• Consulting Services related to Progress Databases and Application Architecture.
What is it? and Why Do I Need it?
What is After-Imaging?
• A journal of transaction “notes” that can be replayed against a baseline backup to restore a database to the last completed transaction or a point in time or a specific transaction number.
• This is the same concept that some other databases refer to as the “redo log”.
• Differs from the before image file (undo log) as space is not reused without interaction or scripting.*
* 10.1B AI Archiver improves this.
Why do I need after-imaging?
• Protection from media loss -- such as bad tapes, a crashed disk, a destroyed data center or stolen servers…
I have backups. Do I still need after-imaging?
• With a backup your potential exposure to data loss is the entire time period between backups.
• For example -- if you do nightly backups and your disk crashes at 4:45pm you restore from backup and lose an entire day of work. If you have one or more bad tapes your data loss could be much worse.
• With after-imaging you restore the same backup, roll-forward your archived ai files and lose only uncommitted transactions.
Why else do I need after-imaging?
• Protection from human errors:
• Human error is at least as big a risk as hardware problems.
for each customer:
  delete customer.
end.

$ cd /db
$ rm *

for each order:
  delivered = yes.
end.

$ vi dbname.db
…
:x
Isn’t AI the same as disk mirroring?
• No, disk mirrors will happily delete both copies of your deleted database.
• Or delete all of your customers on both mirrors.
Or an Audit Log?
• No, an audit log cannot be replayed to reconstruct the missing data.
I have OpenEdge Replication. Do I still need after-imaging?
• OE Replication is a super-set of after-imaging. You still must configure and manage after-imaging.
• After-imaging still provides an additional layer of protection – even with OE Replication in place.
• OE Replication is aggressively real-time. You cannot build in a time delay like you can with after-imaging.
What about performance?
• There might be a very small penalty.
• But you can usually only measure it under extremely high loads.
Are there downsides to after-imaging?
• It is not automatically enabled.
• You must manage archived logs.
• Recovery is not automated.
Loss Prevention Strategies
SLA | Data Loss Strategy | Hardware Loss Strategy
Days | Nightly Backups • Simple & Inexpensive | Service contract • Relatively low % of system cost
Hours | Multiple online backups during day • More files to keep track of | Contract with same-day, on-site repair • More expensive, a long time to wait
Many Minutes | After Imaging • Moderately complex scripting • Monitoring becomes more critical • Skilled DBA is helpful | Some redundant HW • SAN with RAID • Spare parts kept onsite
A few Minutes | After Imaging • Complex scripting • Monitoring becomes more critical • Skilled DBA is important | Warm spare server • Twice the cost of production HW • Ideally in a remote facility • Additional DB licensing costs
Seconds | OpenEdge Replication • Much more complex • Skilled DBA is critical • Monitoring extremely critical | Hot spare server & automated fail-over • Twice the cost of production HW • Ideally in a remote facility • Additional DB licensing costs • Additional OS & 3rd party SW costs
Balancing Cost vs Lost Data
[Chart: Hypothetical Relative Costs of Different SLAs. Horizontal axis: Days, Hours, Many Minutes, Few Minutes, Seconds. Vertical axis: $0 to $1,000.]
How Does After-Imaging Work?
How does after-imaging work?
First, make a backup!
probkup dbname dbname.pbk
[Diagram: the database (DB), its BI file and four AI extents, .a1 through .a4.]
Then, enable after-imaging, start the database and start an AI Writer. Extent .a1 will be “busy”.
rfutil dbname -C aimage begin
[Diagram: shared memory with the BIW and AIW attached to the database. Extents: .a1 busy, .a2 empty, .a3 empty, .a4 empty.]
Switch extents. Extent .a1 will be marked “full” and extent .a2 will become “busy”.
rfutil dbname -C aimage new
[Diagram: extents .a1 full, .a2 busy, .a3 empty, .a4 empty.]
Switch extents again. Extent .a2 will be marked “full” and extent .a3 will become “busy”.
rfutil dbname -C aimage new
[Diagram: extents .a1 full, .a2 full, .a3 busy, .a4 empty.]
Once more, switch extents. Extent .a3 will be marked “full” and extent .a4 will become “busy”.
rfutil dbname -C aimage new
[Diagram: extents .a1 full, .a2 full, .a3 full, .a4 busy.]
Switch… Oops! There are no “empty” extents! All after-image extents are either “full” or “busy”!
rfutil dbname -C aimage new
[Diagram: extents .a1 full, .a2 full, .a3 full, .a4 busy.]
Copy full extents… Use the extent sequence number to name them.
[Diagram: extents .a1 full, .a2 full, .a3 full, .a4 busy. Archived copies: .001, .002, .003.]
Mark the full extents as “empty”.
rfutil dbname -C aimage extent empty
[Diagram: extents .a1 empty, .a2 empty, .a3 empty, .a4 busy. Archived copies: .001, .002, .003.]
Switch extents again. Extent .a4 will be marked “full” and the cycle wraps around: extent .a1 becomes “busy”.
rfutil dbname -C aimage new
[Diagram: one extent busy, one full, two empty. Archived copies: .001, .002, .003.]
ai.sweep
[Diagram: one extent busy, one full, two empty. Archived copies: .001, .002, .003, .004.]
ai.new
ai.sweep
[Diagram: extents .a1 full, .a2 busy, .a3 empty, .a4 empty. Archived copies: .001 through .004, with .005 being added.]
…
ai.new
ai.sweep
[Diagram: the cycle continues with one extent busy, one full and two empty. Archived copies: .001 through .005, with .006 being added.]
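The switch-and-archive cycle walked through above can be sketched as a toy model in plain shell. This is only an illustration and calls no Progress utilities: the function names `ai_switch` and `ai_sweep` are hypothetical stand-ins for `rfutil -C aimage new` and for the copy-then-mark-empty step, and the model assumes that a switch always goes to the next extent in order and fails if that extent is not empty.

```shell
#!/bin/sh
# Toy model of the AI extent cycle. No Progress utilities are called.
# EXTENTS holds one state per extent, in order: .a1 .a2 .a3 .a4
EXTENTS="busy empty empty empty"
ARCHIVED=0   # number of extents copied to the archive so far

# ai_switch: mark the busy extent "full" and make the next one "busy".
# Returns 1 (and changes nothing) if the next extent is not "empty".
ai_switch() {
    set -- $EXTENTS
    busy=0; i=1
    for s in "$@"; do
        if [ "$s" = busy ]; then busy=$i; fi
        i=$((i + 1))
    done
    next=$((busy % $# + 1))          # wrap around after the last extent
    eval "next_state=\${$next}"
    if [ "$next_state" != empty ]; then
        echo "cannot switch: no empty extent" >&2
        return 1
    fi
    out=""; i=1
    for s in "$@"; do
        if [ "$i" -eq "$busy" ]; then s=full; fi
        if [ "$i" -eq "$next" ]; then s=busy; fi
        out="$out${out:+ }$s"
        i=$((i + 1))
    done
    EXTENTS=$out
}

# ai_sweep: "archive" every full extent, then mark it empty.
ai_sweep() {
    out=""
    for s in $EXTENTS; do
        if [ "$s" = full ]; then
            ARCHIVED=$((ARCHIVED + 1))
            s=empty
        fi
        out="$out${out:+ }$s"
    done
    EXTENTS=$out
}

ai_switch; ai_switch; ai_switch
echo "$EXTENTS"                       # full full full busy
ai_switch || echo "archive the full extents first"
ai_sweep
ai_switch
echo "$EXTENTS ($ARCHIVED archived)"  # busy empty empty full (3 archived)
```

The fourth switch fails because every other extent is full, which is the “Oops!” situation above. Once the sweep has emptied the full extents, the next switch wraps around to .a1.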
How do I use after-imaging to recover?
• Restore from backup. The preferred method is to restore to a dedicated recovery area. DO NOT DESTROY a damaged database without first backing it up.
• Determine where to recover to (point in time, transaction id, last archived ai extent...)
• Obtain the archived ai extents from the backup point through to the recovery point.
• Roll forward the archived extents:
rfutil dbname -C roll forward [endtime yyyy:mm:dd:hh:mm:ss] -a archiveExtent
ai.roll dbname startExtent [endExtent]
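A minimal sketch of how a range of archived extents might be turned into an ordered list of roll-forward commands. It is not the ai.roll script itself: the function name `roll_plan` is hypothetical, and it assumes archived logs are named `dbname.NNN` (three-digit, zero-padded sequence numbers) in a single directory. It only prints the commands, so it is safe to try without a database.

```shell
#!/bin/sh
# roll_plan DIR DBNAME START [END]
# Print, in sequence order, one roll-forward command per archived log
# DIR/DBNAME.NNN from START through END. Dry run: nothing is executed.
# START and END are plain integers (1, 2, ...), not zero-padded strings.
roll_plan() {
    dir=$1; db=$2; seq=$3; end=${4:-}
    while :; do
        log=$(printf '%s/%s.%03d' "$dir" "$db" "$seq")
        if [ ! -f "$log" ]; then
            # Without END, the first missing file marks the end of the set.
            if [ -z "$end" ]; then return 0; fi
            # With END, a gap in the sequence is an error.
            echo "missing archived extent: $log" >&2
            return 1
        fi
        echo "rfutil $db -C roll forward -a $log"
        if [ -n "$end" ] && [ "$seq" -ge "$end" ]; then return 0; fi
        seq=$((seq + 1))
    done
}
```

For example, `roll_plan /ailogs dbname 1 3` would print the three commands for /ailogs/dbname.001 through /ailogs/dbname.003, or stop with an error if one of them is missing.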
How do I recover using AI?
prorest dbname dbname.pbk < backup.list
rfutil dbname -C roll forward -a /ailogs/dbname.001
[Diagram: the restored database (DB) and BI file; archived logs .001 through .006 … in /ailogs; .001 is applied first.]
rfutil dbname -C roll forward -a /ailogs/dbname.002
[Diagram: .002 is applied next; archived logs .001 through .006 … in /ailogs.]
rfutil dbname -C roll forward -a /ailogs/dbname.003
…
[Diagram: .003 is applied next, and so on through the remaining archived logs in /ailogs.]
Post-recovery…
• Remember to enable after-imaging. It is disabled on the roll-forward target!
What is “Log Based Replication”?
• Log Based Replication is a fancy name for using after-image files (“logs”) to maintain a copy of your database.
• Uses for Log Based Replication:
– Verified Backup – make sure that your archived AI files are valid.
– Reporting Database – use “norecover” to create a reporting database.
– Warm Spare – keep a copy of your database (almost) ready to go in failover mode.
How does Log Based Replication work?
rfutil dbname -C roll forward -a /stg/dbname.001
mv /stg/dbname.001 /arc/dbname.001
[Diagram: log .001 is applied to the target database from /stg, then moved to /arc.]
rfutil dbname -C roll forward -a /stg/dbname.002
mv /stg/dbname.002 /arc/dbname.002
[Diagram: log .002 is applied from /stg, then moved to /arc alongside .001.]
rfutil dbname -C roll forward -a /stg/dbname.seq#
mv /stg/dbname.seq# /arc/dbname.seq#
[Diagram: each subsequent log (.003, .004, .005, .006, …) is applied from /stg and then moved to /arc in the same way.]
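The apply-then-move loop shown above could be scripted roughly as follows. This is a sketch under stated assumptions, not the ai.warm script: the function name `apply_staged` and the `APPLY` variable are hypothetical, logs are assumed to be named `dbname.NNN` with fixed-width, zero-padded sequence numbers so that the shell's sorted glob order matches sequence order, and by default the rfutil call is only printed rather than executed.

```shell
#!/bin/sh
# APPLY is the command used to apply one log. The default only prints the
# rfutil call (dry run); set APPLY=rfutil to run it for real.
APPLY=${APPLY:-echo rfutil}

# apply_staged STG ARC DBNAME
# Apply each staged log in sequence order, then move it to the archive.
# Stops at the first failure so that no log is skipped or moved unapplied.
apply_staged() {
    stg=$1; arc=$2; db=$3
    for log in "$stg/$db".[0-9]*; do
        [ -f "$log" ] || continue      # the glob matched nothing
        $APPLY "$db" -C roll forward -a "$log" || return 1
        mv "$log" "$arc/" || return 1
    done
}
```

Moving a log only after it has been applied successfully means that whatever is still in the staging directory is exactly what remains to be rolled forward.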
What about the New! AI Archiver?
• The ai archiver is a daemon that automates extent switching and archiving.
• New startup parameters allow you to start, stop and configure the ai archiver.
• Does not handle off-site archiving, redundant archiving, compression or purging of archived logs.
• Uses a hideous file naming convention.
• Does not handle recovery.
• Does not handle monitoring or alerting.
AI Archiver (and some other loosely related features)
Command | Purpose
proutil dbname -C enableaiarchiver | Enable ai archiver (offline).
probkup online dbname -enableaiarchiver -aiarcdir dir -aiarcinterval n [-aiarcdircreate] | Enable ai archiver (online).
rfutil dbname -C aiarchiver setarcdir <dir-list> | Set or change archive directory(s).
rfutil dbname -C aiarchiver setinterval # | Set or change archive interval (seconds; 120 to 86400).
prostrct addonline dbname [st-file-name] | Add extents online.
probkup online dbname backupFile -enableai | Enable after-imaging online.
Practical Matters
How often should I switch extents?
• How much data can you afford to lose?
– Can users re-enter 5 minutes of data? 15? 60?
– Can you “replay” external transactions? (EDI interfaces and so forth…)
• Is your workload the same 24x7?
– Do the answers above vary between a “batch window” and “online activity”?
– How about weekends and holidays?
• I often find hourly switches at night and every 15 minutes during the day to be a good starting point.
How should I setup after-imaging?
• Add ai extents:
prostrct add dbname ai.st
-or-
prostrct addonline dbname ai.st
# ai.st
a /ai
a /ai
a /ai
a /ai
• How many extents?
– 4 is the absolute minimum:
  • 1 busy, 1 full, 1 empty (plus 1 “locked” if using OE Replication).
– 8 is my recommended default:
  • The “extras” give you time to react to issues.
– 16 is my suggested maximum – more is just awkward.
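For more than a handful of extents, the structure file can be generated rather than typed. The sketch below assumes the same format as the ai.st shown above, with one `a <directory>` line per extent and no size given. The function name `make_ai_st` is hypothetical.

```shell
#!/bin/sh
# make_ai_st COUNT DIR
# Print a structure file with COUNT AI extent lines of the form "a DIR".
make_ai_st() {
    count=$1; dir=$2
    echo "# ai.st"
    i=1
    while [ "$i" -le "$count" ]; do
        echo "a $dir"
        i=$((i + 1))
    done
}

# Example: the recommended default of eight extents under /ai
make_ai_st 8 /ai
```

Redirecting the output to a file (`make_ai_st 8 /ai > ai.st`) gives something that could then be passed to `prostrct add` or `prostrct addonline`.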
Should I use fixed or variable extents?
• Variable Length
– More flexible.
– Simpler scripting.
– Easier monitoring.
– More time to correct problems.
• Fixed Length
– Many legacy implementations still use them.
– Fixed might be appropriate for very high volume sites.
• Recommendation: Use variable length extents.
How much disk space do I need?
• How much BI space do you use? (How many bi clusters do you close in a period of time?)
• How many archived logs should you keep online?
• Do you keep disk images of backups online?
• What about off-site copies of backups and archived logs?
• Do you plan to recover to dedicated recovery disk space or “on top of” the existing database?
What sort of disks should I use for AI?
• Dedicated disks.
– The primary job of after-imaging is to protect against media failure.
– Storing after-image files on the same disks as the data extents nullifies that protection!
• RAID5 (parity) is probably not your best option:
– After-Imaging is, essentially, write-only.
– RAID5 disks are performance-challenged when writing.
• RAID10 (mirrored stripes) is probably not beneficial:
– After-Imaging writes are sequential.
• RAID1 (mirroring) is the best choice.
AI Implementation Worksheet
Item | FileSystem | Description
Extent Switching Schedule | | M-F, 9-5 Every 15 minutes; hourly otherwise
Number & Type of Extents | | 8, Variable, Dedicated RAID 1 disks
AI Extents | /ai | 8GB (~50 16MB bi clusters per day = 800MB)
Archived Logs | /ailog | 32GB (40 days)
 | /aizip | 16GB, Zipped logs
 | /aistg | 8GB, staging area for logs to be verified from
 | /aiver | 32GB, archive of verified logs
Verified Backup | /aitest | 125GB
Backup Strategy | /backup | 250GB, Backup -norecover from /aitest to disk, then tape
Offsite Archives | /ailog | scp logs to remote server X, 32GB (40 days)
Recovery Strategy | /recover | 250GB (current production db size x 2.5)
Warm Spare Strategy | | X is an offsite mirror of prod, apply offsite logs continuously
Report Server | /reports | 125GB, Restored from /backup nightly
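The sizing arithmetic behind the worksheet's AI figures can be reproduced in a few lines. The numbers are the worksheet's own (about 50 bi clusters of 16MB per day, 40 days of archived logs). The variable names are illustrative, and the calculation assumes, as the worksheet does, that daily after-image volume is roughly the same as the daily volume of closed bi clusters.

```shell
#!/bin/sh
# Worksheet inputs
clusters_per_day=50    # bi clusters closed per day
cluster_mb=16          # bi cluster size in MB
retention_days=40      # days of archived logs kept online

daily_mb=$((clusters_per_day * cluster_mb))
archive_mb=$((daily_mb * retention_days))

echo "daily AI volume: ${daily_mb} MB"                          # 800 MB
echo "archive for ${retention_days} days: ${archive_mb} MB"     # 32000 MB
```

The 32000 MB result is the worksheet's 32GB archive file system, and the 8GB allotted to the AI extents themselves leaves roughly ten days of headroom if archiving stops.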
How do I start after-imaging?
• Backup:
– probkup is simpler because it marks the db as “backed up”.
– OS backups require an extra manual step:
rfutil dbname -C mark backedup
• Enable After Imaging:
rfutil dbname -C aimage begin
• Start an AI Writer (AIW):
proaiw dbname
How do I manage after-imaging?
Script | AI Archiver | Description
ai.new | Yes | Switches to the next available empty extent.
ai.sweep | Partial | Copies full extents to (multiple, redundant and possibly remote) archive locations. (The AI Archiver only copies archived extents to a single location on the same server.)
ai.roll | No | Rolls forward a set of AI logs against a database. Simplifies roll-forward by grouping files and ignoring “wrong extent” warnings.
ai.purge | No | Purges old archived extents.
ai.warm | No | Applies AI logs that appear in a staging directory to a target database. Used to maintain warm spares and verified backup databases.
ai.ready | No | Checks a warm spare or verified backup database to ensure that AI logs are being properly applied.
After-Imaging on UNIX
# crontab (source server)
#
1,16,31,46 * * * * ai.new cs608 base callb callr invpr >> /logs/ai.log 2>&1
#
2,17,32,47 * * * * ai.sweep cs608 base callb callr invpr >> /logs/ai.log 2>&1
#
0 20 * * * ai.purge cs608

# crontab (target server)
#
10,25,40,55 * * * * ai.warm cs608 base > /dev/null
#
0 * * * * ai.ready cs608 base callb callr invpr > /tmp/ai.ready.log
#
0 20 * * * ai.purge cs608
How should I monitor after-imaging?
• After-imaging should be enabled.
• Busy extents should be 1.
• Full extents should be less than or equal to 2.
• Empty extents should be “most of them”.
• The last messages in the .lg file of a replicated database should be:
(662) Roll forward completed.
(334) rfutil -C roll forward session end.
(with appropriately recent date and time stamps.)
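The extent-count rules above lend themselves to a simple automated check. The sketch below reads one extent state per line (busy, full or empty) on standard input and applies the three thresholds. How those states are obtained from the database is deliberately left out, the function name `check_extents` is hypothetical, and “most of them” is interpreted here as “more than half”, which is an assumption.

```shell
#!/bin/sh
# check_extents: read extent states (one per line) from stdin and apply
# the monitoring rules. Prints an ALERT line per violated rule and
# returns 1 if any rule is violated, 0 otherwise.
check_extents() {
    busy=0; full=0; empty=0; total=0
    while read -r state; do
        case $state in
            busy)  busy=$((busy + 1)) ;;
            full)  full=$((full + 1)) ;;
            empty) empty=$((empty + 1)) ;;
            *)     continue ;;          # ignore blank or unknown lines
        esac
        total=$((total + 1))
    done
    rc=0
    if [ "$busy" -ne 1 ]; then
        echo "ALERT: $busy busy extents (expected 1)"; rc=1
    fi
    if [ "$full" -gt 2 ]; then
        echo "ALERT: $full full extents (expected 2 or fewer)"; rc=1
    fi
    if [ $((empty * 2)) -le "$total" ]; then
        echo "ALERT: only $empty of $total extents are empty"; rc=1
    fi
    return $rc
}
```

A healthy eight-extent setup such as `printf '%s\n' full busy empty empty empty empty empty empty | check_extents` passes silently, while the “full full full busy” situation raises two alerts.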
Trouble-shooting
Extents Stop Switching
• You may have disabled cron, the cron job or the ai archiver (if you are using it).
• Or you may have introduced a scripting error.
• You may have run out of disk space somewhere.
• With variable extents in use and “large files” enabled disk space becomes the limiting factor. You have more time to detect, respond to and fix the problem.
• With fixed extents the database may stall or crash much sooner.
• If you are out of ideas try a manual extent switch.
Roll Forward Fails
• You may have guessed the wrong extent – this is harmless. Try another. The message in the .lg file tells you which sequence# you need.
• An archived extent might be missing or damaged – find a valid copy and try again. This is a good reason to make redundant copies of ai logs.
• A more serious error may have occurred. Read the .lg file and check out the error on PSDN if necessary. Use “roll forward retry” after correcting the error.
Opening a Replication Target
• Once you start a server or open a single-user session against a replication target you cannot roll-forward any more logs.
• Even if you change no data.
• You can, however, safely start a -RO session.
• If someone opens the database you will need to re-initialize the replication target.
Forgetting to Enable After-Imaging.
• Usually happens after a conversion or a recovery/fail-over.
• Add extents online (if necessary).
• probkup and enable ai online.
• Re-initialize your replication targets.
(Re-)Initializing a Replication Target
• Move any accumulated staged ai logs to a temporary directory.
• Obtain a backup of the source database.
• Restore the backup on the target server.
• Transfer the 1st needed ai log and all subsequent logs to the staging directory.
– An incorrect log will result in a message in the .lg file that identifies the needed sequence#.
Why re-initialize?
• Failing back from fail-over recovery to your warm spare.
• Someone accidentally opened your replication target.
• After-imaging was deliberately disabled for some reason.
• Dump and load.
Disabling After-Imaging
• There are not many good reasons to disable after-imaging. This should be very rare.
• Among the possible reasons:
– Dumping and loading.
– Large, write-intensive processes that can be restarted.
• If you must disable after-imaging:
– Backup and be prepared to restore.
  • Allowing users to have access in this period is often not compatible with being able to restore from backup.
– Do what needs to be done.
– Re-enable after-imaging.
– Re-initialize any replication targets.
• The actual commands are in the documentation.
Tricks!
• Getting the next “full” extent:
EXTENT=`$DLC/bin/_rfutil ${DB} -C aimage extent full`
• Getting an extent’s sequence number:
SEQ=`rfutil ${DUMMY} -C aimage scan -a ${EXTENT} | grep number | tail -1 | awk '{print $6}'`
• Using the verification database for backups:
probkup dbname dbname.pbk -com -norecover < backup.list
• Using the backed up verification database for reporting:
prorest dbname dbname.pbk < backup.list
Conclusion
After-Imaging Best Practices
• Enable after-imaging on all updateable databases.
• Place after-image extents on separate disks from data extents.
• Use 8 to 16 variable extents with “large files” enabled.
• Run an AIW.
• Switch extents as often as the business needs you to.
• Use the sequence number when naming archived logs.
• Copy archived logs to a remote location ASAP.
• Verify your process by continuously rolling forward.
• Monitor your “empty” and “full” extents.
• Keep at least 30 days of archived after-image logs.
• Establish a dedicated backup and recovery directory.
?