
SYBASE RFI (RECOVERY FAULT ISOLATION)

When the dataserver is restarted, Sybase ASE recovers each database by rolling transactions back or forward to bring it to a consistent state, and then brings the database online.

During normal ASE operation, every change to a database is written first to the transaction log (the syslogs table) and then to the data pages in the data caches. Log pages are written to disk when transactions commit, and eventually the checkpoint process flushes the changed data pages to disk. However, because a checkpoint writes all changed pages to disk, changes can reach the log or data pages on disk even when they are part of an incomplete or uncommitted transaction. If the dataserver crashes after an uncommitted transaction has been written to the log but before the transaction completes, recovery at startup reads the log and rolls back those changes so that no uncommitted work is reflected in the database. Likewise, online recovery rolls forward committed transactions whose changes are recorded in the log but have not yet been flushed to disk, applying them to the data pages and writing them to disk.
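The roll-back/roll-forward behavior can be illustrated with a small toy model. This is a sketch under simplifying assumptions, not ASE's actual recovery code: the `LogRecord` format (one before image and one after image per page change), the in-memory `pages` dictionary and the `recover` function are all hypothetical. Committed transactions are redone in log order, and uncommitted ones are undone in reverse log order.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    """One logged page change (hypothetical format, not the real syslogs layout)."""
    xact: str    # transaction id
    page: int    # page number that was changed
    before: str  # page contents before the change
    after: str   # page contents after the change


def recover(pages: dict[int, str], log: list[LogRecord],
            committed: set[str]) -> dict[int, str]:
    """Return a consistent copy of `pages` after replaying `log`."""
    result = dict(pages)
    # Roll forward: reapply every change made by a committed transaction,
    # in case it never reached disk before the crash.
    for rec in log:
        if rec.xact in committed:
            result[rec.page] = rec.after
    # Roll back: restore before images of uncommitted changes, newest first,
    # in case a checkpoint flushed them to disk.
    for rec in reversed(log):
        if rec.xact not in committed:
            result[rec.page] = rec.before
    return result


# After the crash, T1's committed change to page 2 never reached disk, while
# T2's uncommitted change to page 1 was flushed by a checkpoint.
disk = {1: "x=9", 2: "y=0"}
log = [
    LogRecord("T1", 2, before="y=0", after="y=5"),
    LogRecord("T2", 1, before="x=1", after="x=9"),
]
print(recover(disk, log, committed={"T1"}))  # {1: 'x=1', 2: 'y=5'}
```

The sketch assumes that no two in-flight transactions change the same page, which keeps page-level before images valid; real ASE recovery handles far more cases.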

In prior versions of ASE, partial recovery of a database was not possible. If recovery failed because of corruption, there was no way to recover the uncorrupted portion of the database and bring it online. The only options were to recover from backups or to “suicide” the log.

The “recover from backups” approach has the drawback of not being able to recover up to the minute, since transaction log backups are typically taken every 5 to 15 minutes. The obvious drawback of “suiciding the log” is the possibility of physical and logical data corruption. The corruption may not surface until later, and its relation to the earlier log suicide is not always obvious.

Sybase ASE now implements Recovery Fault Isolation (RFI), an online recovery feature that provides for partial recovery of a database. RFI isolates corruption encountered during recovery to the corrupt pages. This allows the DBA to restore the database to a consistent state by isolating and repairing corruption page by page (and hence object by object), without having to go back to database backups or suicide the log. RFI applies only to corruption in non-system objects; if system tables are corrupt, the entire database has to be restored from backups.

RFI allows the DBA to select the granularity of recovery for each database. A DBA can:

1) Mark the entire database suspect on any recovery failure (the default behavior).

2) Isolate only the corrupt pages and set a threshold for the number of pages that may be taken offline before the entire database is marked suspect. The DBA can also decide whether the online portion of the database is updatable or read-only.

3) Bring a suspect database or suspect pages online for system administrators only, so that the DBA can fix the corruption before the database is opened to all users.
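The interplay between granularity, the threshold and read-only mode in the options above can be summarized as a toy decision function. This is a hypothetical Python sketch, not ASE code: the function name, the outcome strings and the `system_pages` parameter (pages belonging to system tables) are assumptions made for illustration. Following the text, database granularity is the default, corruption in system tables cannot be isolated by RFI (so the sketch marks the whole database suspect), and reaching the threshold escalates page-level isolation to the whole database.

```python
from __future__ import annotations


def recovery_outcome(corrupt: set[int], system_pages: set[int], threshold: int,
                     granularity: str = "database",
                     read_only: bool = False) -> tuple[str, list[int]]:
    """Toy model of how recovery reacts to corrupt pages.

    Returns (state, offline_pages), where state is "online", "read_only"
    or "suspect" (the entire database is unavailable).
    """
    if granularity not in ("database", "page"):
        raise ValueError(f"unknown granularity: {granularity!r}")
    if not corrupt:
        return "online", []
    # Default behavior: any recovery failure marks the whole database suspect.
    if granularity == "database":
        return "suspect", []
    # Assumption of this sketch: system-table corruption is never isolated.
    if corrupt & system_pages:
        return "suspect", []
    # Reaching the threshold escalates page granularity to database granularity.
    if len(corrupt) >= threshold:
        return "suspect", []
    # Otherwise only the corrupt pages go offline; the rest of the database
    # comes online, read-only if that mode was requested.
    return ("read_only" if read_only else "online"), sorted(corrupt)


print(recovery_outcome({101, 102}, system_pages={1, 2}, threshold=5,
                       granularity="page", read_only=True))
# ('read_only', [101, 102])
```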

Page-level granularity allows the server to take the corrupt pages touched by a transaction offline while bringing its other pages online. Because changes to the offline pages are not rolled forward or rolled back from the log, the data can be logically inconsistent: some transactions may be only partially visible because part of their data is offline. There is no way to determine which transactions involved offline pages except by manual examination.
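A toy extension of the recovery model shows how a committed transaction can end up half applied. It is a hypothetical sketch, not ASE internals: the record format, the `corrupt` set and `recover_isolating` are illustrative names. Changes to corrupt pages are skipped and those pages go offline; every other page is recovered normally.

```python
from __future__ import annotations

from typing import NamedTuple


class LogRecord(NamedTuple):
    """One logged page change (hypothetical format)."""
    xact: str
    page: int
    before: str
    after: str


def recover_isolating(pages: dict[int, str], log: list[LogRecord],
                      committed: set[str], corrupt: set[int]
                      ) -> tuple[dict[int, str], set[int]]:
    """Toy page-granularity recovery: returns (online_pages, offline_pages)."""
    offline = set(corrupt)
    # Corrupt pages are taken offline and are not part of the recovered image.
    result = {p: v for p, v in pages.items() if p not in offline}
    # Roll forward committed work, but only on healthy pages.
    for rec in log:
        if rec.page not in offline and rec.xact in committed:
            result[rec.page] = rec.after
    # Roll back uncommitted work, newest first, again only on healthy pages.
    for rec in reversed(log):
        if rec.page not in offline and rec.xact not in committed:
            result[rec.page] = rec.before
    return result, offline


# T1 committed a transfer that changed pages 1 and 2, but page 2 is corrupt.
disk = {1: "acct_a=100", 2: "acct_b=0"}
log = [
    LogRecord("T1", 1, before="acct_a=100", after="acct_a=50"),
    LogRecord("T1", 2, before="acct_b=0", after="acct_b=50"),
]
online, offline = recover_isolating(disk, log, committed={"T1"}, corrupt={2})
print(online, offline)  # {1: 'acct_a=50'} {2}
```

Nothing in the recovered pages records that T1's credit is missing, which is why identifying the affected transactions requires manual examination.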

It is possible to force corrupt pages online. However, doing so without first repairing them results in logically inconsistent data. Whether the database is restored by repairing the offline pages or by restoring the affected objects from a backup, the DBA and the application team must determine the extent to which the logical consistency of the database has been compromised. If the extent of the corruption cannot be determined, it is wiser to restore the database from backups and apply the transaction logs. It is also important to run dbcc tablealloc or dbcc indexalloc on any objects with suspect pages.

How to proceed if online recovery fails

Consider the following options in order, from most to least desirable:

1. Restoring from backups and applying transaction logs
2. Partial recovery using RFI
3. Suiciding the log

Restoring from Backups and applying transaction logs

In earlier versions of ASE, this was the only course of action if recovery failed, the database could not be repaired, and suiciding the log was not desirable. It remains the preferred option for recovering the database after a failure during online recovery if a) the entire database is marked suspect because the suspect-page threshold was reached, or b) system tables are corrupt. It is also the preferred option whenever physical and logical consistency is critical.

Partial recovery using RFI

Implementing RFI gives us an opportunity to recover the database partially before opting to suicide the log.

1. Isolated pages are known and can be examined. The DBA can decide whether to repair the faults or restore from backups. If the isolated pages belong to an index, the index can be dropped and rebuilt. For data pages, the data can possibly be recovered by other means, such as restoring the backup to a development environment and copying the data out with bcp. Depending on how the affected tables are used, the data pages can also safely be left offline.

2. You can set a threshold on the number of page faults: once it is reached, partial recovery is no longer acceptable and the whole database remains unrecovered.

3. You can make the database available to users while conducting repairs. The database can be configured to allow updates or to allow read-only access.

RFI commands / steps

1. Check/alter granularity using sp_setsuspect_granularity:

sp_setsuspect_granularity [dbname [,{"database" | "page"} [, "read_only"]]]

Using read_only mode is encouraged. If a query attempts to access an offline page, the server raises error messages 12716 and 12717.

2. Set the threshold for escalating page level granularity to database granularity using sp_setsuspect_threshold:

sp_setsuspect_threshold [dbname [,threshold ]]

Once the number of offlined pages reaches this threshold value, recovery marks the entire database suspect.

3. You can list the databases and pages that are suspect after recovery using sp_listsuspect_db and sp_listsuspect_page:

sp_listsuspect_db

sp_listsuspect_page [dbname]

4. You can bring a suspect database or suspect pages online using sp_forceonline_db or sp_forceonline_page:

sp_forceonline_db dbname, {"sa_on" | "sa_off" | "all_users"}

sp_forceonline_page dbname, pagenumber, {"sa_on" | "sa_off" | "all_users"}

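sa_on brings the database or page online for system administrators only, and sa_off reverses sa_on, taking it offline again. all_users brings the database or page online for all users.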
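The effect of the three options can be sketched as a small state machine. The state names (`offline`, `sa_only`, `online`), the `force_online` and `read_page` functions and the `OfflinePageError` exception are hypothetical; the error numbers are the ones quoted in step 1. The sketch assumes that a non-administrator touching a page forced online with sa_on gets the same errors as for an offline page, and it rejects transitions the text does not describe (such as sa_off after all_users) rather than guessing.

```python
# (current state, option) -> new state, following the description above.
TRANSITIONS = {
    ("offline", "sa_on"): "sa_only",     # online for system administrators only
    ("sa_only", "sa_off"): "offline",    # sa_off reverses sa_on
    ("offline", "all_users"): "online",  # online for every user
    ("sa_only", "all_users"): "online",
}


class OfflinePageError(Exception):
    """Toy stand-in for the errors raised when a query touches an offline page."""
    errors = (12716, 12717)


def force_online(state: str, option: str) -> str:
    """Apply an sa_on / sa_off / all_users option to a page or database state."""
    try:
        return TRANSITIONS[(state, option)]
    except KeyError:
        raise ValueError(f"{option!r} from state {state!r} is not modeled") from None


def read_page(state: str, is_sa: bool) -> None:
    """Raise OfflinePageError unless the caller may access a page in `state`."""
    # Assumption: an sa_only page behaves like an offline page for other users.
    if state == "online" or (state == "sa_only" and is_sa):
        return
    raise OfflinePageError(f"page is {state}; errors {OfflinePageError.errors}")


state = force_online("offline", "sa_on")
read_page(state, is_sa=True)       # the DBA can examine and repair the page
try:
    read_page(state, is_sa=False)  # ordinary users are still locked out
except OfflinePageError as exc:
    print(exc)                     # page is sa_only; errors (12716, 12717)
state = force_online(state, "all_users")
read_page(state, is_sa=False)      # now accessible to everyone
```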

Suiciding the log

RFI largely eliminates the need to suicide the log.