Upload
stanculeanu
View
224
Download
2
Embed Size (px)
Citation preview
8/14/2019 8.4 NTFS Recovery
1/27
Windows Operating System Internals - by David A. Solomon and Mark E. Russinovich with Andreas PolzeWindows Operating System Internals - by David A. Solomon and Mark E. Russinovich with Andreas Polze
Unit OS8: File SystemUnit OS8: File System
8.4. NTFS Recovery Support8.4. NTFS Recovery Support
8/14/2019 8.4 NTFS Recovery
2/27
2
Copyright NoticeCopyright Notice 2000-2005 David A. Solomon and Mark Russinovich 2000-2005 David A. Solomon and Mark Russinovich
These materials are part of theThese materials are part of the Windows OperatingWindows Operating
System Internals Curriculum Development Kit,System Internals Curriculum Development Kit,
developed by David A. Solomon and Mark E.developed by David A. Solomon and Mark E.
Russinovich with Andreas PolzeRussinovich with Andreas Polze
Microsoft has licensed these materials from DavidMicrosoft has licensed these materials from David
Solomon Expert Seminars, Inc. for distribution toSolomon Expert Seminars, Inc. for distribution to
academic organizations solely for use in academicacademic organizations solely for use in academic
environments (and not for commercial use)environments (and not for commercial use)
8/14/2019 8.4 NTFS Recovery
3/27
3
Roadmap for Section 8.4Roadmap for Section 8.4
The Evolution of File SystemsThe Evolution of File Systems
Recoverable File SystemRecoverable File System
Log File Service OperationLog File Service Operation
NTFS Recovery ProceduresNTFS Recovery Procedures
Fault-Tolerance SupportFault-Tolerance Support
Volume Management -Volume Management -Striped and Spanned VolumesStriped and Spanned Volumes
8/14/2019 8.4 NTFS Recovery
4/27
4
NTFS Recovery SupportNTFS Recovery Support
Transaction-based logging schemeTransaction-based logging scheme
Fast, even for large disksFast, even for large disks
Recovery is limited to file system dataRecovery is limited to file system data
Use transaction processing like SQL server for user dataUse transaction processing like SQL server for user data
Tradeoff: performance versus fully fault-tolerant file systemTradeoff: performance versus fully fault-tolerant file system
Design options for file I/O & caching:Design options for file I/O & caching:Careful writeCareful write: VAX/VMS fs, other proprietary OS fs: VAX/VMS fs, other proprietary OS fs
Lazy writeLazy write: most UNIX fs, OS/2 HPFS: most UNIX fs, OS/2 HPFS
8/14/2019 8.4 NTFS Recovery
5/27
5
Careful Write File SystemsCareful Write File Systems
OS crash/power loss may corrupt file systemOS crash/power loss may corrupt file system
Careful write file system orders write operations:Careful write file system orders write operations:
System crash will produce predictable, nonSystem crash will produce predictable, non--critical inconsistenciescritical inconsistencies
Update to disk is broken in sub operations:Update to disk is broken in sub operations:
Sub operations are written seriallySub operations are written serially
Allocating disk space: first write bits in bitmap indicating usage;Allocating disk space: first write bits in bitmap indicating usage;then allocate space on diskthen allocate space on disk
I/O requests are serialized:I/O requests are serialized:
Allocation of disk space by one process has to be completed beforeAllocation of disk space by one process has to be completed beforeanother process may create a fileanother process may create a file
No interleaving sub operations of the two I/O requestsNo interleaving sub operations of the two I/O requests
Crash: volume stays usable; no need to run repair utilityCrash: volume stays usable; no need to run repair utility
8/14/2019 8.4 NTFS Recovery
6/27
6
Lazy Write File SystemsLazy Write File Systems
Careful fCareful fileile ssystemystem write sacrifices speed for safetywrite sacrifices speed for safety
Lazy write improves performance by write back cachingLazy write improves performance by write back caching
Modifications are written to the cache;Modifications are written to the cache;
Cache flush is an optimized background activityCache flush is an optimized background activity
Less disk writes; buffer can be modified multiple timesLess disk writes; buffer can be modified multiple times
before being written to diskbefore being written to disk
File system can return to caller before op. is completedFile system can return to caller before op. is completed
Inconsistent intermediate states on volume are ignoredInconsistent intermediate states on volume are ignored
Greater risk / user inconvenience if system failsGreater risk / user inconvenience if system fails
8/14/2019 8.4 NTFS Recovery
7/27
7
Recoverable File SystemRecoverable File System(Journaling File System)(Journaling File System)
Safety of careful write fs / performance of lazy write fsSafety of careful write fs / performance of lazy write fs
Log file + fast recovery procedureLog file + fast recovery procedure
Log file imposes some overheadLog file imposes some overhead
Optimization over lazy write: distance between cache flushesOptimization over lazy write: distance between cache flushesincreasedincreased
NTFS supportsNTFS supports cache write-throughcache write-through andand cache flushingcache flushingtriggered by applicationstriggered by applications
No extra disk I/O to update fs data structures necessary:No extra disk I/O to update fs data structures necessary:all changes to fs structure are recorded in log file which can beall changes to fs structure are recorded in log file which can bewritten in a single operationwritten in a single operation
In the future, NTFS may support logging for user files (hooks inIn the future, NTFS may support logging for user files (hooks inplace)place)
8/14/2019 8.4 NTFS Recovery
8/27
8
Log File Service (LFS)Log File Service (LFS)
LFS is designed to provide logging to multipleLFS is designed to provide logging to multiplekernel components (clients)kernel components (clients)
Currently used only by NTFSCurrently used only by NTFS
Cachemanager
I/O manager
NTFS driver
Access the mapped
file or flush the cache
Flush the
log file
Write/flush
the log file
Log file
serviceLog the transaction
Write the
Volume updates
8/14/2019 8.4 NTFS Recovery
9/279
Log File RegionsLog File Regions
NTFS calls LFS to read/write restart areaNTFS calls LFS to read/write restart area
Context info: location of logging area to be used for recoveryContext info: location of logging area to be used for recovery
LFS maintains 2nd copy of restart areaLFS maintains 2nd copy of restart area
Logging area: circularly reusedLogging area: circularly reused
LFS uses logical sequence numbers (LSNs) to identify log recordsLFS uses logical sequence numbers (LSNs) to identify log records
NTFS never reads/writes transactions to log file directlyNTFS never reads/writes transactions to log file directly
During recovery:During recovery:
NTFS calls LFS to read forward; recorded transactions are redoneNTFS calls LFS to read forward; recorded transactions are redone
NTFS calls LFS to read backward; undo all incompletely loggedNTFS calls LFS to read backward; undo all incompletely loggedtransactionstransactions
LFS restart area infinite logging area
Copy 1 Copy 2 Log records
8/14/2019 8.4 NTFS Recovery
10/2710
Operation of the LFS/NTFSOperation of the LFS/NTFS
1.1. NTFS calls LFS to record in (cached) log file anyNTFS calls LFS to record in (cached) log file anytransactions that will modify volume structuretransactions that will modify volume structure
2.2. NTFS modifies the volume (also in the cache)NTFS modifies the volume (also in the cache)
3.3. Cache manager calls LFS to flush log file to diskCache manager calls LFS to flush log file to disk(LFS implements flushing by calling cache manager back, telling(LFS implements flushing by calling cache manager back, tellingwhich page to flush)which page to flush)
4.4. After cash manager flushes log file, it flushes volumeAfter cash manager flushes log file, it flushes volumechangeschanges
->-> Transactions of unsuccessful modifications can beTransactions of unsuccessful modifications can beretrieved from log file and un-/redoneretrieved from log file and un-/redone
Recovery begins automatically the first time a volume isRecovery begins automatically the first time a volume isused after system is rebooted.used after system is rebooted.
8/14/2019 8.4 NTFS Recovery
11/2711
Log Record TypesLog Record TypesUpdate recordsUpdate records (series of ...)(series of ...)
Most common; each record contains:Most common; each record contains:
Redo informationRedo information: how to reapply on subop. of a committed trans.: how to reapply on subop. of a committed trans.
Undo informationUndo information: how to reverse a partially logged sub operation: how to reverse a partially logged sub operation
Last record commits the transactionLast record commits the transaction (not shown here)(not shown here)
Log file records
... T1a T1b T1c
Redo: Allocate/initialize an MFT file record
Undo: Deallocate the file record
Redo: Add the filename to the index
Undo: Remove the filename from the index
Redo: Set bits 3-9 in the bitmap
Undo: Clear bits 3-9 in the bitmap
Recovery: redo committed/undo incompletely logged transact.Recovery: redo committed/undo incompletely logged transact.
8/14/2019 8.4 NTFS Recovery
12/2712
Log Records (contd.)Log Records (contd.)
Physical vs. logical description of redo/undo actions:Physical vs. logical description of redo/undo actions:
Delete file a.dat vs. Delete byte range on diskDelete file a.dat vs. Delete byte range on disk
NTFS writes update records with physical descriptionsNTFS writes update records with physical descriptions
NTFS wNTFS wrrites update records (usually several) for:ites update records (usually several) for:
Creating a fileCreating a file
Deleting a fileDeleting a file
Extending a fileExtending a file
Truncating a fileTruncating a file
Setting file informationSetting file information
Renaming a fileRenaming a file
Changing security applied to a fileChanging security applied to a file
Redo/undo ops. must be idempotentRedo/undo ops. must be idempotent(might be applied twice)(might be applied twice)
8/14/2019 8.4 NTFS Recovery
13/2713
Checkpoint RecordsCheckpoint Records
NTFS periodically writes a checkpoint recordNTFS periodically writes a checkpoint record
Describes, what processing would be necessary to recover aDescribes, what processing would be necessary to recover a
volume if a crash would occur immediatelyvolume if a crash would occur immediately
How far back in the log file must NTFS go to begin recoveryHow far back in the log file must NTFS go to begin recovery
LSN of checkpoint record is stored in restart areaLSN of checkpoint record is stored in restart area
Log file records
... LSN 2058 LSN 2059 LSN 2060
Checkpoint record
NTFS restart
8/14/2019 8.4 NTFS Recovery
14/2714
Log File FullLog File Full
LFS presents log file to NTFS as is it were infinitely largeLFS presents log file to NTFS as is it were infinitely largeWriting checkpoint records usually frees up spaceWriting checkpoint records usually frees up space
LFS tracks several numbers:LFS tracks several numbers:
Available log spaceAvailable log space
Amount of space needed to write an incoming log record and to undo theAmount of space needed to write an incoming log record and to undo the
writewriteAmount of space needed to roll back all active (no committed) transactions,Amount of space needed to roll back all active (no committed) transactions,should that be necessaryshould that be necessary
Insufficient space: Log file full error & NTFS exceptionInsufficient space: Log file full error & NTFS exception
NTFS prevents further transactions on files (block creation/deletion)NTFS prevents further transactions on files (block creation/deletion)
Active transactions are completed or receive log file full exceptionActive transactions are completed or receive log file full exception
NTFS calls cache manager to flush unwritten dataNTFS calls cache manager to flush unwritten data
If data is written, NTFS marks log file empty; resets beginning of log fileIf data is written, NTFS marks log file empty; resets beginning of log file
No effect on executing programs (short I/O pause)No effect on executing programs (short I/O pause)
8/14/2019 8.4 NTFS Recovery
15/2715
Recovery - PrinciplesRecovery - Principles
NTFS performs automatic recoveryNTFS performs automatic recoveryRecovery depends on two NTFS in-memory tables:Recovery depends on two NTFS in-memory tables:
Transaction tableTransaction table: keeps track of active transactions (not completed): keeps track of active transactions (not completed)(sub operations of these transactions must be removed from disk)(sub operations of these transactions must be removed from disk)
Dirty page tableDirty page table: records which pages in cache contain: records which pages in cache contain
modifications to file system structure that have not yet been writtenmodifications to file system structure that have not yet been writtento diskto disk
NTFS writes checkpoint every 5 sec.NTFS writes checkpoint every 5 sec.
Includes copy of transaction table and dirty page tableIncludes copy of transaction table and dirty page table
Checkpoint includes LSNs of the log records containing the tablesCheckpoint includes LSNs of the log records containing the tables
Dirty page
table
Update
record
Transaction
table
Checkpoint
record
Update
record
Update
record
Begin of checkpoint operation End of checkpoint operation
Analysis pass
8/14/2019 8.4 NTFS Recovery
16/2716
Recovery - PassesRecovery - Passes
1.1. Analysis passAnalysis pass NTFS scans forward in log file from beginning of last checkpointNTFS scans forward in log file from beginning of last checkpoint
Updates transaction/dirty page tables it copied in memoryUpdates transaction/dirty page tables it copied in memory
NTFS scans tables for oldest update record of a nonNTFS scans tables for oldest update record of a non--commitcommittted trans.ed trans.
1.1. Redo passRedo pass
NTFS looks for page update records which contain volumeNTFS looks for page update records which contain volumemodification that might not have been flushed to diskmodification that might not have been flushed to disk
NTFS redoes these updates in the cache until it reaches end of log fileNTFS redoes these updates in the cache until it reaches end of log file
Cache manager lazy writer thread begins to flush cache to diskCache manager lazy writer thread begins to flush cache to disk
1.1. Undo passUndo pass
Roll back any transactions that weren'tt committed when system failedRoll back any transactions that weren'tt committed when system failed
After undo pass volume is at consistent stateAfter undo pass volume is at consistent state
Write empty LFS restart area; no recovery is needed if system fails nowWrite empty LFS restart area; no recovery is needed if system fails now
8/14/2019 8.4 NTFS Recovery
17/2717
Undo Pass - ExampleUndo Pass - Example
Transaction 1 was commited before power failureTransaction 1 was commited before power failure
Transaction 2 was still activeTransaction 2 was still active
NTFS must log undo operations in log file!NTFS must log undo operations in log file!
Power might fail again during recovery;Power might fail again during recovery;
NTFS would have to redo its undo operationsNTFS would have to redo its undo operations
LSN
4044
LSN
4045
LSN
4046
LSN
4047
LSN
4048
LSN
4049
Transaction committed record
Redo: Allocate/Initialize an MFT file record
Undo: Deallocate the file record
Redo: Add the filename to the index
Undo: Remove the filename from the index
Redo: Set bits 3-9 in the bitmap
Undo: Clear bits 3-9 in the bitmap
Powerfailure
8/14/2019 8.4 NTFS Recovery
18/2718
NTFS Recovery - ConclusionsNTFS Recovery - Conclusions
Recovery will return volume to some preexisting consistent state (notRecovery will return volume to some preexisting consistent state (not
necessarily state before crash)necessarily state before crash)
Lazy commit algorithm: log file is not immediately flushed when a transactionLazy commit algorithm: log file is not immediately flushed when a transactioncommitted record is writtencommitted record is written
LFS batches records;LFS batches records;
Flush when cache manager calls or check pointing record is writtenFlush when cache manager calls or check pointing record is written
(once every 5 sec)(once every 5 sec)Several parallel transactions might have been active before crashSeveral parallel transactions might have been active before crash
NTFS uses log file mechanisms for error handlingNTFS uses log file mechanisms for error handling
Most I/O errors are not file system errorsMost I/O errors are not file system errors
NTFS might create MFT record and detect that disk is full when allocatingNTFS might create MFT record and detect that disk is full when allocating
space for a file in the bitmapspace for a file in the bitmap
NTFS uses log info to undo changes and returns disk full error to callerNTFS uses log info to undo changes and returns disk full error to caller
8/14/2019 8.4 NTFS Recovery
19/2719
Fault Tolerance SupportFault Tolerance Support
NTFS capabilities are enhanced by the fault-tolerantNTFS capabilities are enhanced by the fault-tolerantvolume managersvolume managers FtDiskFtDisk/DMIO/DMIO
Lies above hard disk drivers in the I/O systems layered driverLies above hard disk drivers in the I/O systems layered driverschemescheme
FtDisk for basic disksFtDisk for basic disks
DMIO for dynamic disksDMIO for dynamic disks
Volume management capabilitiesVolume management capabilities::
Redundant data storageRedundant data storage
Dynamic data recovery from bad sectors on SCSI disksDynamic data recovery from bad sectors on SCSI disks
NTFS itself implements bad-sector recovery forNTFS itself implements bad-sector recovery fornon-SCSI disksnon-SCSI disks
8/14/2019 8.4 NTFS Recovery
20/2720
Volume Management Features Volume Management Features
Spanned VolumesSpanned Volumes
SpannedSpanned VolumeVolumess::
single logical volume composed of a maximum of 32 areas of free spacesingle logical volume composed of a maximum of 32 areas of free spaceon one or more diskson one or more disks
NTFS volume sets can be dynamically increased in sizeNTFS volume sets can be dynamically increased in size(only bitmap file which stores allocation status needs to be extended)(only bitmap file which stores allocation status needs to be extended)
FtDiskFtDisk/DMIO/DMIO hide physical configuration of disks from file systemhide physical configuration of disks from file system
Tool: WindowsTool: Windows Disk Management MMC snap-inDisk Management MMC snap-in
Spanned volumes were called volume sets in Windows NT 4.0Spanned volumes were called volume sets in Windows NT 4.0
C:
(100 MB)
E:
(100 MB)
D:
(100 MB)
D:
(100 MB)
Volume set D:
occupies half of
two disks
8/14/2019 8.4 NTFS Recovery
21/2721
StripeStriped Volumesd Volumes
Series of partitions, one partition per disk (of same size)Series of partitions, one partition per disk (of same size)
Combined into a single logical volumeCombined into a single logical volume
FtDiskFtDisk/DMIO/DMIO optimize data storage and retrieval timesoptimize data storage and retrieval times
Stripes are narrow: 64KBStripes are narrow: 64KB
Data tends to be distributed evenly among disksData tends to be distributed evenly among disks
Multiple pending read/write ops. will operate on different disksMultiple pending read/write ops. will operate on different disks
Latency for disk I/O is often reduced (parallel seek operations)Latency for disk I/O is often reduced (parallel seek operations)
(150 MB) (150 MB) (150 MB)
1 2
4
3
8/14/2019 8.4 NTFS Recovery
22/2722
Fault Tolerant VolumesFault Tolerant Volumes
FtDiskFtDisk/DMIO/DMIO implement redundant storage schemesimplement redundant storage schemesMirror setsMirror sets
Stripe sets with parityStripe sets with parity
Sector sparingSector sparing
ToolToolss:: Windows Disk Management MMC snap-inWindows Disk Management MMC snap-in
Mirrored VolumesMirrored Volumes:: Contents of a partition on one disk are duplicated on another diskContents of a partition on one disk are duplicated on another disk
FtDiskFtDisk/DMIO/DMIO write same data to both locationswrite same data to both locations
Read operations are done simultaneously on both disksRead operations are done simultaneously on both disks
(load balancing)(load balancing)
C:
C:
(mirror)
8/14/2019 8.4 NTFS Recovery
23/2723
RAID-5 VolumesRAID-5 Volumes
Fault tolerant version of a regular stripe setFault tolerant version of a regular stripe set
Parity: logical sum (XOR)Parity: logical sum (XOR)
Parity info is distributed evenly over available disksParity info is distributed evenly over available disks
FtDiskFtDisk/DMIO/DMIO reconstruct missing data by using XOR op.reconstruct missing data by using XOR op.
parity
8/14/2019 8.4 NTFS Recovery
24/2724
Bad Cluster RecoveryBad Cluster Recovery
Sector sparing is supported by FtDiskSector sparing is supported by FtDisk/DMIO/DMIO
Dynamic copying of recovered data to spare sectorsDynamic copying of recovered data to spare sectors
Without intervention from file system / userWithout intervention from file system / user
Works for certain SCSI disksWorks for certain SCSI disks
FtDiskFtDisk/DMIO/DMIO return bad sector warning to NTFSreturn bad sector warning to NTFS
Sector reSector re--mapping is supported by NTFSmapping is supported by NTFS
NTFS will not reuse bad clustersNTFS will not reuse bad clusters
NTFS copies data recovered by FtDiskNTFS copies data recovered by FtDisk/DMIO/DMIO into a new clusterinto a new cluster
NTFS cannot recover data from bad sector without helpNTFS cannot recover data from bad sector without helpfrom FtDiskfrom FtDisk/DMIO/DMIO
NTFS will never write to bad sector (reNTFS will never write to bad sector (re--map before write)map before write)
8/14/2019 8.4 NTFS Recovery
25/2725
Bad-cluster re-mappingBad-cluster re-mapping
NTFS filenameStandard info Security desc. Data
DataData
User file
StartingVCN
StartingLCN
Number ofclusters
0 1355 2
2 1049 1
3 1588 4
VCN 0 1
LCN 1355 1356 DataData
VCN 3 4 5 6
LCN 1588 1589 1590 1591
DataData
VCN 2
LCN 1049
NTFS filenameStandard info Security desc. DataStarting
VCNStarting
LCNNumber of
clusters
0 1357 1
BadBad
VCN 0
LCN 1357
8/14/2019 8.4 NTFS Recovery
26/2726
Further ReadingFurther Reading
Mark E. Russinovich and David A. Solomon,Mark E. Russinovich and David A. Solomon,Microsoft Windows Internals, 4th Edition, MicrosoftMicrosoft Windows Internals, 4th Edition, MicrosoftPress, 2004.Press, 2004.
Chapter 12 - File SystemsChapter 12 - File SystemsNTFS Recovery Support (from pp. 775)NTFS Recovery Support (from pp. 775)
Chapter 10 - Storage ManagementChapter 10 - Storage ManagementVolume Management (from pp. 622)Volume Management (from pp. 622)
8/14/2019 8.4 NTFS Recovery
27/27
Source Code ReferencesSource Code References
Windows Research Kernel sources do notWindows Research Kernel sources do not
include NTFSinclude NTFS
A raw file system driver is included inA raw file system driver is included in
\base\ntos\raw\base\ntos\raw
Also see \base\ntos\fstrl (File System Run-TimeAlso see \base\ntos\fstrl (File System Run-Time
Library)Library)