CSIT 301 (Blum): File Systems


FAT


Too many sectors

• Tied up with the concept of FAT is the notion of clusters.

• The hard drive is organized into sectors, but a large hard drive has a large number of sectors.
  – E.g., a 10 GB drive has approximately 20,000,000 sectors.


FAT Review

• FAT (FAT16) uses up to 16 bits to address data on the hard drive (or a partition thereof).
  – 2^16 = 65,536
  – If you address 65,536 sectors, each holding 512 bytes, then you have

    65,536 × 512 = 33,554,432 bytes
    = 32,768 kilobytes
    = 32 megabytes (MB)


Clusters

• The various sectors must be addressed. Operating systems use an address of limited size, which in turn limits the number of sectors that can be addressed.
  – Early on, partitioning was used to allow hard drives to exceed this limit.

• Another solution to this limitation was to address groups of sectors instead of individual sectors.
  – A set of sectors (4 to 64) grouped together for addressing purposes is known as a cluster.


Cluster

• Clusters are groups of sectors addressed in the FAT system. Within FAT16:

Sectors/Cluster | Cluster Size (KB) | Partition Capacity (MB)
1               | 0.5               | 32
2               | 1                 | 64
4               | 2                 | 128
8               | 4                 | 256
16              | 8                 | 512
32              | 16                | 1024 (= 1 GB)
64              | 32                | 2048 (= 2 GB)
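Each row of the FAT16 table comes from one multiplication: the number of addresses times the cluster size. A minimal Python sketch that regenerates the rows, assuming 512-byte sectors and treating all 2^16 addresses as usable (real FAT16 reserves a few entry values, so actual limits are slightly lower); the function names are illustrative only:

```python
SECTOR_BYTES = 512          # sector size assumed throughout the slides
FAT16_ADDRESSES = 2 ** 16   # 65,536 addressable clusters (reserved values ignored)


def fat16_capacity_mb(sectors_per_cluster: int) -> float:
    """Maximum partition size in MB when every address names one cluster."""
    cluster_bytes = sectors_per_cluster * SECTOR_BYTES
    return FAT16_ADDRESSES * cluster_bytes / 2 ** 20


def fat16_table() -> list[tuple[int, float, float]]:
    """Rows of (sectors per cluster, cluster size in KB, capacity in MB)."""
    rows = []
    for spc in (1, 2, 4, 8, 16, 32, 64):
        cluster_kb = spc * SECTOR_BYTES / 1024
        rows.append((spc, cluster_kb, fat16_capacity_mb(spc)))
    return rows


if __name__ == "__main__":
    for spc, kb, mb in fat16_table():
        print(f"{spc:>3} sectors/cluster  {kb:>5} KB  {mb:>6.0f} MB")
```

Doubling the sectors per cluster doubles the capacity, which is exactly the pattern in the table.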


The bigger the cluster, the more the slack

• The cluster size is the minimum amount of space a file can occupy.

• With 32 sectors per cluster, a cluster was 16 KB, much larger than many of the files that need to be stored on a typical partition.

• The unused portion of all of these clusters is called slack.

• While large clusters allowed for larger partitions, they resulted in unacceptable amounts of slack.
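Slack can be computed per file as the unused tail of its last cluster. A small Python sketch, assuming 512-byte sectors and a hypothetical workload of equally sized files (the helper names are illustrative):

```python
SECTOR_BYTES = 512


def slack_bytes(file_size: int, cluster_bytes: int) -> int:
    """Unused space left in the last cluster allocated to one file."""
    if file_size == 0:
        return 0  # assumption: an empty file is allocated no cluster
    clusters = -(-file_size // cluster_bytes)  # ceiling division
    return clusters * cluster_bytes - file_size


def total_slack(file_sizes: list[int], sectors_per_cluster: int) -> int:
    """Slack summed over many files for a given cluster size."""
    cluster_bytes = sectors_per_cluster * SECTOR_BYTES
    return sum(slack_bytes(size, cluster_bytes) for size in file_sizes)


if __name__ == "__main__":
    files = [2048] * 1000  # hypothetical workload: 1,000 files of 2 KB each
    for spc in (1, 8, 32):
        print(spc, "sectors/cluster:", total_slack(files, spc), "bytes of slack")
```

For this hypothetical workload, 16 KB clusters (32 sectors) waste 14,336,000 bytes, while 512-byte clusters waste none.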


More addresses needed

• To have large capacity partitions without sacrificing much of that capacity to slack, a larger address space is needed.

• FAT32 can devote up to 28 bits to addressing (the other four bits are reserved for other purposes).
  – This allows one to address 2^28 = 268,435,456 things.


2^28 is a lot

• Even if one addressed sectors, theoretically one could have a capacity of

  268,435,456 × 512 bytes = 137,438,953,472 bytes
  = 134,217,728 kilobytes
  = 131,072 megabytes
  = 128 gigabytes

• And that’s if you’re addressing sectors; it’s even larger if you’re addressing clusters.
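The same arithmetic covers both claims on this slide: addressing sectors, and addressing multi-sector clusters. A Python sketch, assuming 512-byte sectors; the clustered figure is only the raw addressing bound, and other on-disk fields may impose lower practical limits:

```python
SECTOR_BYTES = 512


def max_capacity_bytes(address_bits: int, sectors_per_cluster: int = 1) -> int:
    """Addressing limit: 2**address_bits units of sectors_per_cluster sectors."""
    return 2 ** address_bits * sectors_per_cluster * SECTOR_BYTES


def in_binary_units(n_bytes: int) -> dict[str, float]:
    """Express a byte count in bytes, KB, MB and GB (powers of 1024)."""
    units = ["bytes", "KB", "MB", "GB"]
    return {unit: n_bytes / 1024 ** k for k, unit in enumerate(units)}


if __name__ == "__main__":
    print(in_binary_units(max_capacity_bytes(28)))      # addressing sectors
    print(in_binary_units(max_capacity_bytes(28, 64)))  # 64-sector clusters
```

With 28 bits and single sectors this reproduces the 128 GB chain above; with 64-sector clusters the bound grows 64-fold.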


A very fat FAT

• The price one pays for having small clusters (which save on slack) is a large FAT table.

• The FAT table does not take up much room as far as disk space is concerned, but it is something one probably wants in memory (disk cache), and a large FAT table takes up too much space in memory.

• So choosing the partition size, cluster size and FAT table size is a balancing act.


FAT32 Table

The table shows the FAT32 table size for various choices of partition size and cluster size. The size should be compared to the amount of memory, since the FAT table is often held in memory (disk cache).
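Entries in such a table can be estimated as one 4-byte entry per cluster. A rough Python sketch, assuming that estimate, ignoring the two reserved entries at the start of the FAT, and counting only one of the usual two copies:

```python
KB, MB, GB = 2 ** 10, 2 ** 20, 2 ** 30
FAT32_ENTRY_BYTES = 4  # each FAT32 entry occupies 32 bits on disk


def fat32_table_bytes(partition_bytes: int, cluster_bytes: int) -> int:
    """Approximate size of one copy of the FAT: one entry per cluster."""
    clusters = partition_bytes // cluster_bytes
    return clusters * FAT32_ENTRY_BYTES


if __name__ == "__main__":
    for part_gb in (8, 32):
        for cluster_kb in (4, 16, 32):
            size = fat32_table_bytes(part_gb * GB, cluster_kb * KB)
            print(f"{part_gb} GB partition, {cluster_kb} KB clusters: {size / MB:g} MB FAT")
```

A 32 GB partition with 4 KB clusters needs about 32 MB per FAT copy, which shows why small clusters on large partitions strain the disk cache.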


Partitioning helped

• Breaking the drive down into smaller pieces helped since the address only had to identify clusters within a partition. This allowed for smaller clusters and less slack.

• The switch from FAT (FAT16) to FAT32 increased the size of the address used to identify clusters. Thus the cluster size could be reduced without introducing a lot of partitions.


Partitioning can still help

• Although FAT32 allows one to address many more clusters, doing so can have detrimental effects.
  – The size of the file allocation table increases as the number of clusters increases.
  – The file allocation table is something you may read often and thus something you might want to cache. But if it is too big, it will not fit in the cache, or it will take up too much room in the cache.

• The level of cache we are talking about here is holding something in memory so that it can be accessed faster than going to the hard drive.


Cluster size automated

• The number of sectors in a cluster is set automatically within FAT32, based on the size of the partition.
  – < 256 MB: 1 sector/cluster
  – 256 MB to 8 GB: 8 sectors/cluster
  – 8 GB to 16 GB: 16 sectors/cluster
  – 16 GB to 32 GB: 32 sectors/cluster
  – 32 GB to …: 64 sectors/cluster
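The rule on this slide is a simple lookup. A Python sketch that encodes exactly the slide's thresholds (formatting tools apply their own, more detailed tables, so treat these values as the slide's rather than a specification); the boundary handling is an assumption:

```python
MB, GB = 2 ** 20, 2 ** 30

# (exclusive upper bound, sectors per cluster), copied from the slide's list.
# Assumption: a partition exactly on a boundary falls into the larger range.
DEFAULTS = [
    (256 * MB, 1),
    (8 * GB, 8),
    (16 * GB, 16),
    (32 * GB, 32),
]


def default_sectors_per_cluster(partition_bytes: int) -> int:
    """Sectors per cluster the slide's rule would pick for a partition."""
    for upper_bound, sectors in DEFAULTS:
        if partition_bytes < upper_bound:
            return sectors
    return 64  # 32 GB and up
```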


FAT

• The file allocation table (FAT) stores information about clusters.

• The FAT describes how each cluster is being used, for example, which clusters are free and which are being used.
  – Sometimes the operating system indicates that a cluster is being used when it is not. This is called a lost cluster.
  – You can free up disk space by reassigning lost clusters with the ScanDisk utility.

• The FAT also indicates how clusters are chained together to form files.
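Chaining and lost clusters can be made concrete with FAT32 entry values: 0 marks a free cluster, values from 0x0FFFFFF8 up mark the end of a chain, 0x0FFFFFF7 marks a bad cluster, and only the low 28 bits of an entry are used. The Python sketch below treats the FAT as a list; `follow_chain` and `lost_clusters` are hypothetical helpers illustrating the idea behind a ScanDisk-style check, not ScanDisk's actual implementation:

```python
FREE = 0
BAD = 0x0FFFFFF7
END_OF_CHAIN_MIN = 0x0FFFFFF8  # entries >= this value end a chain
MASK_28 = 0x0FFFFFFF           # only the low 28 bits of an entry are used


def follow_chain(fat: list[int], first_cluster: int) -> list[int]:
    """Return the clusters of one file, in order, starting at first_cluster."""
    chain: list[int] = []
    cluster = first_cluster
    while True:
        # Clusters 0 and 1 are reserved; a repeat would mean a loop in the chain.
        if cluster < 2 or cluster >= len(fat) or cluster in chain:
            raise ValueError(f"corrupt chain at cluster {cluster:#x}")
        chain.append(cluster)
        next_cluster = fat[cluster] & MASK_28
        if next_cluster >= END_OF_CHAIN_MIN:
            return chain
        cluster = next_cluster


def lost_clusters(fat: list[int], first_clusters: list[int]) -> set[int]:
    """Clusters marked as in use that no directory entry's chain reaches."""
    in_use = {c for c in range(2, len(fat)) if (fat[c] & MASK_28) not in (FREE, BAD)}
    reachable: set[int] = set()
    for first in first_clusters:
        reachable.update(follow_chain(fat, first))
    return in_use - reachable
```

For example, with `fat = [0x0FFFFFF8, 0x0FFFFFFF, 3, 4, 0x0FFFFFFF, 0, 0x0FFFFFFF, 8, 0x0FFFFFFF, 0]` and directory entries starting at clusters 2 and 6, clusters 7 and 8 form a chain nobody points to, so they are reported as lost.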


Disk scanning

Depending on options selected, this may require a restart.


FAT

• The FAT is located right after the volume boot sector.

• The differences among FAT file systems (such as FAT16 and FAT32) lie in the size of the address and the management of the FAT. (NTFS replaces the FAT with the Master File Table.)

• For example, there are usually two copies of the FAT (the second serving as a backup of the first). FAT16 and FAT32 differ in how they manage this backup process.

• One can determine the file system of a drive by using the chkdsk command.


chkdsk Command (while running)


chkdsk Command (completed)


Chkdsk on a floppy


FAT Comparison

[Comparison table not preserved; surviving annotations: “Not 32 as the name might suggest” and “(2G)”.]


Directories and Folders

• Users think of files as stored in directories (or folders). So in addition to the actual location of the information associated with a file, the disk must also store the logical information about where the user believes the file to be stored: the directory structure.

• To each directory there corresponds a file containing a table with information about which files are in the folder.


Directory entry data

• Each directory table entry has data for
  – Name of the file (and extension)
  – Attribute byte (whether the file is read-only, etc.)
  – Last date/time the file was modified
  – File size
  – Pointer to the first cluster
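On disk, these fields live in a fixed 32-byte record. The Python sketch below decodes one entry, assuming the standard FAT short-entry layout (8-byte name, 3-byte extension, attribute byte at offset 11, modification time and date at offsets 22 and 24, first cluster split into high and low 16-bit halves, size at offset 28) and the usual attribute bit values. The dictionary keys are hypothetical names, and names are decoded as ASCII for simplicity:

```python
import struct
from datetime import datetime

# 32-byte FAT short directory entry, little-endian:
# name, ext, attr, NT reserved, create-time tenths, create time, create date,
# access date, first cluster (high), write time, write date, first cluster (low), size
ENTRY = struct.Struct("<8s3sBBBHHHHHHHI")

ATTRIBUTES = {
    0x01: "read-only",
    0x02: "hidden",
    0x04: "system",
    0x08: "volume label",
    0x10: "directory",
    0x20: "archive",
}


def parse_entry(raw: bytes) -> dict:
    """Decode one 32-byte directory entry into its main fields."""
    (name, ext, attr, _nt, _ctenth, _ctime, _cdate,
     _adate, cluster_hi, wtime, wdate, cluster_lo, size) = ENTRY.unpack(raw)
    # Date bits: year-1980 (7), month (4), day (5); time bits: hour, minute, second/2.
    modified = datetime(
        1980 + (wdate >> 9), (wdate >> 5) & 0x0F, wdate & 0x1F,
        wtime >> 11, (wtime >> 5) & 0x3F, (wtime & 0x1F) * 2,
    )
    return {
        "name": name.decode("ascii").rstrip(),
        "ext": ext.decode("ascii").rstrip(),
        "attributes": [label for bit, label in ATTRIBUTES.items() if attr & bit],
        "modified": modified,
        "size": size,
        "first_cluster": (cluster_hi << 16) | cluster_lo,
    }
```

The attribute byte is decoded bit by bit, so a value of 0x21 means read-only plus archive.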


Attribute Byte


File Properties (Right click on file)


File Properties Dialog Box


Directory Tree

• Files are in directories (folders), and directories can themselves be in other directories. Ultimately every file on a drive is contained, directly or indirectly, in the root directory.

• The root directory plays a special role. In FAT16, the corresponding file is located right after the two copies of the FAT; FAT32 instead stores the root directory in an ordinary cluster chain.


Limited Size?

• FAT (a.k.a. FAT16) limited the size of the root directory.
  – See the table on the next slide.

• FAT32 lifted this restriction.
  – Still, the root directory is a poor place to put too many files.


FAT Limitations on number of entries in directory file


File name size limitations

• Originally, MS-DOS file systems used 11 bytes in the directory table entry for the name (8 bytes) and extension (3 bytes) of the file.
  – Users were stuck with this naming convention.

• Microsoft introduced VFAT in Windows 95 to allow for longer file names.
  – An alias table was set up: a user’s long file name was assigned to a short file name.
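A short alias of the "LONGFI~1.TXT" kind can be generated by truncating and adding a numeric tail. This is a simplified, hypothetical Python sketch, not Windows' actual algorithm: real implementations follow more rules (for instance, they keep a name unchanged when it already fits 8.3, whereas this sketch always appends a tail), and the allowed-character set here is simplified:

```python
import re

# Characters kept in a short name; anything else becomes "_" (simplified set).
INVALID = re.compile(r"[^A-Z0-9!#$%&'()@^_`{}~-]")


def short_alias(long_name: str, existing: set[str]) -> str:
    """Generate an 8.3 alias like 'LONGFI~1.TXT' (simplified, VFAT-style)."""
    base, dot, ext = long_name.rpartition(".")
    if not dot:  # no extension at all
        base, ext = long_name, ""
    # Uppercase, drop spaces and remaining dots, replace other illegal characters.
    base = INVALID.sub("_", base.upper().replace(" ", "").replace(".", ""))
    ext = INVALID.sub("_", ext.upper().replace(" ", ""))[:3]
    n = 1
    while True:
        tail = f"~{n}"
        stem = base[: 8 - len(tail)] + tail  # stem plus numeric tail fits 8 chars
        alias = f"{stem}.{ext}" if ext else stem
        if alias not in existing:
            return alias
        n += 1
```

Passing the aliases already in the directory as `existing` is what makes the tail count up (~1, ~2, …) when two long names truncate to the same stem.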


NTFS

• New Technology File System (NTFS) was built to provide features like:
  – Reliability: introduced ideas like “transactions” (grouping certain updates together to maintain integrity)
  – Security and Access Control: built-in features to manage who can access files and what type of access they have


NTFS Features (Cont.)

• Large-capacity partitions: allows large partitions and even RAID (Redundant Array of Inexpensive Disks, treating multiple disks as one large disk)

• Slack reduction: allocates space differently from FAT

• Allows for long file names (not limited to 8-character names with 3-character extensions)

• Networking: built with networking in mind


A more structured file system

• In NTFS, files are more than just pools of data; they have structure.
  – The difference between FAT and NTFS is somewhat analogous to the difference between a flat file and a database.
  – Just as databases have data and metadata (data about the data), NTFS has metadata files (files that contain data about other files).


Partition/Volume Boot Sector/Record

• One of the first things made when an NTFS partition is created is the volume boot sector, which contains:
  – BIOS parameter block: identifies the partition, how big it is, etc.
  – Volume boot code: code that starts to load the operating system


All else is files

• After the volume boot sector, just about everything else is a file. There are
  – Metadata files: files about files
    • Created automatically when the partition is formatted
    • Placed at the beginning
  – (Actual or real) data files


MFT

• Think of the Master File Table (MFT) as a database containing records about all of the files (both data and metadata, including itself).

• Each file’s record holds the values of its attributes.
  – The actual data in a data file is simply one of its attributes.


The first several records

• The first several records in the MFT are about other important metadata files, including
  – MFT itself
  – MFT Mirror (1st 16 records)
  – Log file (keeps account of transactions)
  – Attribute Definition Table (names file properties and says what they are)
  – Root Directory Folder
  – Bad cluster file
  – Etc.


MFT Zone

• There will be a record in the MFT for every file on the partition.

• Thus the MFT needs room to grow.

• Some space in the partition, called the MFT Zone, is reserved for this purpose.

• If the rest of the partition fills up and more storage is needed, the MFT zone will eventually be used for ordinary files.

• On the other hand, the MFT can grow to be larger than the MFT zone. It then becomes fragmented, which can affect performance.


Resident vs. Non-Resident Attributes

• The MFT’s record size is fixed (between 1 KB and 4 KB), but the attributes may be of any size (especially since a data file’s data is an attribute).
  – Attributes that are contained in the MFT are called resident.
    • A small file may be entirely resident.
  – Attributes that are linked to but not actually contained in the MFT are called non-resident.


Extents

• Small files are contained within the MFT.

• For larger files, the MFT contains a collection of pointers to the data runs, or extents, which actually hold the data.

• If the collection of pointers grows too large, it is placed in a separate file and the MFT points to this file, which in turn points to the data runs.
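The pointers to data runs are stored compactly. Assuming the commonly documented NTFS data-run encoding (a header byte whose low nibble gives the size of the length field and whose high nibble gives the size of the offset field, each offset being signed and relative to the previous run's starting cluster, and a 0x00 header ending the list), a run list can be decoded as in this Python sketch; the function name is hypothetical:

```python
from typing import Optional


def decode_data_runs(runs: bytes) -> list[tuple[Optional[int], int]]:
    """Decode a data-run list into (starting cluster, length in clusters) pairs.

    A run with no offset field is sparse (no clusters allocated on disk);
    its starting cluster is reported as None.
    """
    extents: list[tuple[Optional[int], int]] = []
    pos = 0
    lcn = 0  # starting logical cluster number of the previous run
    while pos < len(runs) and runs[pos] != 0:  # a 0x00 header ends the list
        header = runs[pos]
        len_size, off_size = header & 0x0F, header >> 4
        pos += 1
        length = int.from_bytes(runs[pos:pos + len_size], "little")
        pos += len_size
        if off_size == 0:
            extents.append((None, length))
        else:
            delta = int.from_bytes(runs[pos:pos + off_size], "little", signed=True)
            lcn += delta
            extents.append((lcn, length))
            pos += off_size
    return extents
```

For the bytes `21 18 34 56 11 05 F0 00`, the file occupies 24 clusters starting at cluster 0x5634, then 5 clusters starting 16 clusters earlier; that negative step is why the offset field is signed.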


Some File Attributes

• File name: can be up to 255 characters; allows a file to have aliases

• Standard Information: read-only, hidden, archived, time stamps, etc.

• Security Descriptor: Access Control Lists (ACLs), who owns the file, who has what privilege, etc.

• Data: the actual data


Security

• NTFS was designed with the idea of multiple users and security in mind.

• The features necessary to implement a security policy are built directly into the file system.

• In FAT32 a file may be hidden or read-only, but in NTFS a file can be hidden from user1, read-only to user2 and fully accessible to user3.


Security Concepts

• Ownership: some user owns a file/folder and he or she grants permissions to other users.

• Permissions: what a user can do with a file/folder (read, read-write, delete, etc.)
  – Users are placed in groups (possibly more than one), and permissions are assigned to groups.
  – Permissions can be inherited, e.g., new files get the permissions of the folder they were created in.

• Auditing: tracking information about users’ access to and modification of files


ACLs

• An important security attribute of a file is its Access Control List (ACL).

• The ACL specifies which users can access the file and in what way they can access it.

• There are two types of ACL:
  – System ACL: used for auditing purposes
  – Discretionary ACL: explicit assignment of permissions to users or groups
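The real access check is performed by Windows, not by code like this. As a simplified model of how a discretionary ACL might be evaluated, assuming entries are checked in order, a matching deny entry stops the check, and access is refused unless allow entries grant every requested right, here is a Python sketch. The permission bits, the `Ace` class and `access_check` are hypothetical, and "hidden from user1" from the Security slide is modelled as no access at all:

```python
from dataclasses import dataclass

READ, WRITE, DELETE = 0x1, 0x2, 0x4  # hypothetical permission bits


@dataclass
class Ace:
    allow: bool   # True for an allow entry, False for a deny entry
    trustee: str  # user or group name
    mask: int     # permission bits this entry covers


def access_check(dacl: list[Ace], identities: set[str], requested: int) -> bool:
    """Walk the entries in order; identities holds the user and their groups."""
    granted = 0
    for ace in dacl:
        if ace.trustee not in identities:
            continue
        if not ace.allow and ace.mask & requested & ~granted:
            return False  # a matching deny entry blocks a still-needed right
        if ace.allow:
            granted |= ace.mask & requested
        if granted == requested:
            return True
    return False  # anything not explicitly allowed is denied
```

With entries denying everything to user1, allowing READ to user2 and allowing all three rights to user3, user2 can read but not write, and user3 can write and delete.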


Permissions


Reparse points

• Reparse points: One can associate an action or actions with a file, so that if the file is accessed, the action is performed.
  – Analogous to a trigger in a database
  – Reparse points are very flexible. One example is redirection: sending one to another file or directory, which may be on another drive or may even have been archived.


Other features

• Improved Security and Permissions: one change is from static to dynamic permission inheritance.
  – Static: a child inherits the parent’s permissions when it is created but is unaffected by subsequent changes in the parent’s permissions
  – Dynamic: a change to the parent’s permissions will affect the child’s permissions

• Change Journals: improved auditing (journaling) of file/folder access activity.

• Encryption: Automatic encryption/decryption of files (when accessed by users with the appropriate permissions).


Improvements (NTFS 5 over 4)

• Disk Quotas: Users or groups of users can be limited in the amount of disk space they can use.

• Sparse File Support: A sparse file is one that may be big but hold very little data (relative to its size). NTFS has utilities to help store sparse files more efficiently.

• Disk Defragmenter: strictly speaking part of the operating system rather than the file system, but it affects the file system.


Transactions

• Don’t forget NTFS is pretty much a database.

• Almost any activity involving the drive in any way is going to affect a number of files.

• NTFS introduces the notion of a transaction: the grouping together of various operations to form an “atomic” unit.
  – In other words, these operations should be viewed as “all or nothing” in order to maintain the file system’s integrity.
  – Recall the “ACID test” from databases?


Logging and Committing

• There is a special metafile for logging all activity.

• When all of the components of a transaction are complete, this completion is indicated in the log file and the transaction is said to be committed.

• If something goes wrong (e.g., a power failure) before a transaction is completed, the file system can undo the partially enacted transaction to return the file system to a consistent state. Doing so is called rolling back the transaction.
  – It is also called transaction recovery.


Effect on Performance

• Logging each activity is great for the security and integrity of the file system, but it does have some negative effects on performance.

• Each file access now requires another file access (writing to the log file).

• One way to save on performance, at some risk to integrity, is to cache the activity log changes rather than write them to disk every time.

• The cached log results are written to disk periodically rather than continuously.


Recovery

• Recovery then involves three passes over the log file:
  – Analysis pass: determine the part of the disk affected
  – Redo pass: perform any transaction that was completed since the last “checkpoint”
  – Undo pass: roll back any incomplete transactions
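The three passes can be illustrated with a toy log. This Python sketch is not NTFS's actual log format: the `LogRecord` fields, the dict standing in for the disk, and the checkpoint rule (everything logged before the last checkpoint is assumed to be on disk already) are all assumptions made for illustration:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogRecord:
    kind: str                  # "update", "commit" or "checkpoint"
    txn: Optional[int] = None  # transaction id (None for checkpoints)
    key: Optional[str] = None  # which piece of on-disk state was changed
    old: Optional[str] = None  # value before the update (needed for undo)
    new: Optional[str] = None  # value after the update (needed for redo)


def recover(disk: dict, log: list[LogRecord]) -> dict:
    """Toy three-pass recovery applied to a dict standing in for the disk."""
    # Analysis pass: find where redo starts and which transactions committed.
    redo_start = 0
    for i, rec in enumerate(log):
        if rec.kind == "checkpoint":
            redo_start = i + 1
    committed = {rec.txn for rec in log if rec.kind == "commit"}

    # Redo pass: reapply committed updates logged since the checkpoint,
    # in log order, in case they never reached the disk.
    for rec in log[redo_start:]:
        if rec.kind == "update" and rec.txn in committed:
            disk[rec.key] = rec.new

    # Undo pass: roll back updates from transactions that never committed,
    # newest first, restoring the values they overwrote.
    for rec in reversed(log):
        if rec.kind == "update" and rec.txn not in committed:
            disk[rec.key] = rec.old
    return disk
```

Undo scans the whole log rather than only the part after the checkpoint, because an uncommitted change may have been flushed to disk before the checkpoint was taken.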


Change Journal

• NTFS can record changes to files; these are kept in the Change Journal.

• Each change is assigned an ID, an Update Sequence Number (USN).
  – It records that a file was written to but not what was written. Otherwise it would be gargantuan.
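Because each record carries a USN, a program such as a backup or indexing tool can remember the last USN it processed and later ask only for newer changes. A toy Python model: the class, method names and reason strings are hypothetical, and USNs here are consecutive integers for simplicity (an assumption of this sketch; the property that matters is that they increase):

```python
from dataclasses import dataclass, field


@dataclass
class ChangeJournal:
    """Toy change journal: records *that* a file changed, never the new data."""
    next_usn: int = 1
    records: list[tuple[int, str, str]] = field(default_factory=list)

    def record(self, path: str, reason: str) -> int:
        """Append a (USN, path, reason) record and return its USN."""
        usn = self.next_usn
        self.records.append((usn, path, reason))
        self.next_usn += 1
        return usn

    def changes_since(self, usn: int) -> list[tuple[int, str, str]]:
        """Records newer than a USN the caller has already processed."""
        return [rec for rec in self.records if rec[0] > usn]
```

Each record stays small no matter how much data was written, which is the point made above.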


Fault Tolerance

• NTFS has a fault-tolerant disk driver known as FTDISK.

• That’s where one can find the transaction recovery features.

• It is also where one finds support for RAID (redundant array of inexpensive, or is that independent, disks).

• And it is where you’ll find dynamic bad cluster remapping.
  – Basically, the drive reads immediately after writing to ensure that the cluster written to was OK. If it was not, it writes the data somewhere else and marks the cluster as bad.
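The read-after-write idea behind bad cluster remapping can be simulated on a toy in-memory disk. This Python sketch does not model FTDISK itself; `ToyDisk`, its damaged clusters (which here store flipped bits) and the free-cluster list are hypothetical:

```python
class ToyDisk:
    """In-memory disk on which some clusters silently corrupt what is written."""

    def __init__(self, n_clusters: int, failing: set[int]):
        self.data: list = [None] * n_clusters
        self.failing = failing       # clusters that are physically damaged
        self.bad: set[int] = set()   # clusters the driver has marked bad

    def _raw_write(self, cluster: int, payload: bytes) -> None:
        # A damaged cluster accepts the write but stores flipped bits.
        if cluster in self.failing:
            self.data[cluster] = bytes(b ^ 0xFF for b in payload)
        else:
            self.data[cluster] = payload

    def write_verified(self, cluster: int, payload: bytes, free: list[int]) -> int:
        """Write, read back, and remap to a free cluster if the check fails.

        Returns the cluster that finally holds the data.
        """
        while True:
            self._raw_write(cluster, payload)
            if self.data[cluster] == payload:  # read-after-write check
                return cluster
            self.bad.add(cluster)              # mark the cluster as bad
            if not free:
                raise OSError("no free cluster left for remapping")
            cluster = free.pop(0)              # retry somewhere else
```

Writing to damaged cluster 3 with free clusters `[5, 6]` available lands the data in cluster 5 and records cluster 3 as bad.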


Compression

• NTFS has built-in utilities for file compression.
  – File compression takes advantage of patterns in data to reduce the amount of space required to store it.
  – E.g., instead of ASCII code for text (each character 8 bits), one might use a variable-length code with short codes for common letters like e and longer codes for uncommon letters like q or j. On average the files are much smaller.

• In NTFS one can compress any part of the partition.
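The variable-length code in the compression example can be built with Huffman coding. Note that NTFS's own compression is not Huffman coding; it uses an LZ77-style scheme (LZNT1). This Python sketch only illustrates the slide's idea that frequent letters get shorter codes:

```python
import heapq
from collections import Counter


def huffman_code(text: str) -> dict[str, str]:
    """Build a prefix code in which frequent characters get shorter codes."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:  # a lone symbol still needs a one-bit code
        return {next(iter(freq)): "0"}
    # Heap entries: (total weight, tie-breaker, {char: code built so far}).
    # The integer tie-breaker keeps heapq from ever comparing the dicts.
    heap = [(weight, i, {ch: ""}) for i, (ch, weight) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encoded_bits(text: str, code: dict[str, str]) -> int:
    """Total bits needed to encode text with the given code."""
    return sum(len(code[ch]) for ch in text)
```

For a sample of 20 e's, 10 t's, 5 a's, one q and one j, "e" gets a 1-bit code and "q" a 4-bit code, and the whole sample takes 63 bits instead of the 296 bits that 8-bit characters would need.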


POSIX support

• NTFS offers POSIX support.

• POSIX stands for Portable Operating System Interface for UNIX

• It allows software developers to make sure that their code can be ported to POSIX-compliant operating systems, which include most versions of UNIX.


Supports Encryption

• NTFS supports Encrypting File System (EFS).

• EFS is really part of the operating system (Windows 2000). But the operating system works with the file system to make this feature easy to use.


Disk Quota support

• As a genuinely multi-user file system, NTFS supports disk quotas.
  – A quota can be set for a particular user, on a particular partition, or on the combination.
  – Allows for limits and warnings. The user is warned when he or she exceeds the warning amount. The user is blocked (from writing?) when he or she exceeds the limit amount.
  – Monitor and log events that cause a user to go over the "limit" or "warning" levels.
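The warning and limit levels amount to two threshold checks per write. A hypothetical Python sketch: the `Quota` class and `check_write` are illustrative, and blocking writes that would cross the limit is an assumption mirroring the "(from writing?)" question above:

```python
from dataclasses import dataclass


@dataclass
class Quota:
    warning_bytes: int  # crossing this only produces a warning
    limit_bytes: int    # crossing this blocks the write (assumed behaviour)


def check_write(user: str, used: int, request: int, quota: Quota, events: list[str]) -> bool:
    """Decide whether a write of `request` bytes may proceed, logging quota events."""
    new_total = used + request
    if new_total > quota.limit_bytes:
        events.append(f"{user}: limit exceeded ({new_total} > {quota.limit_bytes})")
        return False
    if new_total > quota.warning_bytes:
        events.append(f"{user}: warning level exceeded ({new_total} > {quota.warning_bytes})")
    return True
```

A user at 70 bytes of an 80-byte warning level and 100-byte limit can still write 20 bytes (with a logged warning), but a user at 90 bytes cannot.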
