03/23/22 CST 352 - Operating Systems 1 Operating Systems CST 352 File Systems

04/20/23 CST 352 - Operating Systems 1

Operating Systems

CST 352

File Systems

Topics

Introduction

File System Considerations: Naming, Structure, Types, Access, Operations

Topics

Directories: Single Level, Hierarchical, Operations

Implementation: Layout, Allocation, Directories, Free Space Management

Introduction

All computer systems need to retrieve and store information.

A machine must be capable of being powered down without losing important information.

Information must also remain available beyond the lifetime of the process that creates or consumes it.

Introduction

Persistence Requirements

Information must transcend process boundaries.

Information must transcend power cycles of a machine.

Multiple processes/threads must be able to have simultaneous access to information.

Stored information must be capable of growing to large quantities.

Introduction

To deal with the aforementioned requirements, the most common solution is to use a magnetic disk.

On the magnetic disk, information needs to be organized in groups, known as files.

Introduction

A file is an abstract way to represent information stored on some persistent medium, such as a magnetic disk.

Using a file, information can be created by a thread in a process and written to a disk, then later read by a thread in a separate process.

File System Considerations

File Naming

To deal with files on a magnetic disk, the process using a file must have some way to refer to the chunk of disk storage that holds it.

File names must conform to the rules set by the operating system.

A typical naming scheme has two parts: a name and an extension.

The OS can use the extension to associate a file with an application program.
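A minimal sketch of the two-part naming scheme: split a name into base name and extension, then look the extension up in an association table. The `ASSOCIATIONS` table and the helper names are hypothetical; a real OS keeps these associations in its own configuration, not in application code.

```python
import os
from typing import Optional, Tuple

# Hypothetical extension-to-application table (illustration only).
ASSOCIATIONS = {
    ".txt": "text editor",
    ".html": "web browser",
    ".py": "Python interpreter",
}


def split_name(filename: str) -> Tuple[str, str]:
    """Split a file name into (name, extension); the extension keeps its dot."""
    return os.path.splitext(filename)


def associated_app(filename: str) -> Optional[str]:
    """Return the application associated with the file's extension, if any."""
    _, ext = split_name(filename)
    return ASSOCIATIONS.get(ext.lower())


print(split_name("archive.tar.gz"))  # ('archive.tar', '.gz'): only the last suffix counts
print(associated_app("Notes.TXT"))   # text editor
```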

File System Considerations

File Naming – Possible components:

protocol (or scheme) – access method (e.g., http, ftp, file, etc.)

host (or network-ID) – host name, IP address, domain name, or LAN network name (e.g., wikipedia.org, 207.142.131.206, \\MYCOMPUTER, SYS:, etc.)

device (or node) – port, socket, drive, root mountpoint, disc, volume (e.g., C:, /, SYSLIB, etc.)

directory (or path) – directory tree (e.g., /usr/bin, \TEMP, [USR.LIB.SRC], etc.)

file – base name of the file

type (format or extension) – indicates the content type of the file (e.g., .txt, .exe, .COM, etc.)

version – revision number of the file

File System Considerations

File Structure

Structure the file as a simple sequence of 8-bit values (bytes).

Most common and most flexible.

The process using the file must interpret its contents.

Structure the file as a sequence of fixed-length records, so that each record can be read as a single block.

Structure the file as a tree of records (e.g., a B-tree) organized on a key field.

Allows fast access and indexing.
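With fixed-length records, record n always starts at byte offset n × record size, so any record can be fetched with one seek and one read. A minimal sketch, assuming a hypothetical 16-byte layout (4-byte key, 12-byte name) and using an in-memory `io.BytesIO` as a stand-in for a disk file:

```python
import io
import struct

# Hypothetical record layout: 4-byte unsigned key + 12-byte ASCII name.
RECORD = struct.Struct("<I12s")  # every record is RECORD.size (16) bytes


def write_records(f, records):
    """Write (key, name) pairs as fixed-length records."""
    for key, name in records:
        f.write(RECORD.pack(key, name.encode("ascii")))  # name is padded/truncated to 12 bytes


def read_record(f, n):
    """Random access: record n starts at byte offset n * RECORD.size."""
    f.seek(n * RECORD.size)
    data = f.read(RECORD.size)
    if len(data) < RECORD.size:
        raise IndexError(f"no record {n}")
    key, raw = RECORD.unpack(data)
    return key, raw.rstrip(b"\0").decode("ascii")


f = io.BytesIO()  # in-memory stand-in for a file on disk
write_records(f, [(7, "alpha"), (3, "beta"), (9, "gamma")])
print(read_record(f, 2))  # (9, 'gamma')
```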

File System Considerations

File Types

Regular Files: Created by users; they contain information relevant to the user processes.

Directories: System files used to manage groupings of files.

Character Special Files: Used by the system to model serial I/O devices (e.g., terminals and printers).

Block Special Files: Used by the system to model disks and other block I/O devices.
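On UNIX-like systems the file type is encoded in the mode field returned by `stat`, and Python's standard `stat` module provides the test macros. A minimal sketch; the helper name `file_type` is hypothetical, and `/dev/null` is only checked where it exists:

```python
import os
import stat


def file_type(path):
    """Classify a path using the file-type bits of its mode field."""
    mode = os.stat(path).st_mode
    if stat.S_ISREG(mode):
        return "regular"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISCHR(mode):
        return "character special"
    if stat.S_ISBLK(mode):
        return "block special"
    return "other"  # e.g., FIFOs and sockets


for p in (".", "/dev/null"):
    if os.path.exists(p):
        print(p, "->", file_type(p))
```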

File System Considerations

File Types – Regular files are of two kinds:

ASCII – The file consists of lines of text and can be dumped directly to the screen; all characters are printable.

Binary – The file contains a sequence of binary information, not necessarily ASCII.

In UNIX, executable binary files start with a “magic number”. The OS uses it to determine whether the file is a true executable.
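A minimal sketch of a magic-number check on the first bytes of a file. It assumes two well-known signatures: `\x7fELF`, which starts ELF executables on Linux and most modern UNIX systems, and `#!`, which marks interpreter scripts. Real systems recognize many more formats; the function names are hypothetical.

```python
ELF_MAGIC = b"\x7fELF"  # first 4 bytes of an ELF executable or object file
SCRIPT_MAGIC = b"#!"    # interpreter scripts, e.g. "#!/bin/sh"


def classify_header(head: bytes) -> str:
    """Classify a file from its leading bytes (its magic number)."""
    if head.startswith(ELF_MAGIC):
        return "ELF binary"
    if head.startswith(SCRIPT_MAGIC):
        return "script"
    return "unknown"


def classify_file(path: str) -> str:
    with open(path, "rb") as f:
        return classify_header(f.read(4))


print(classify_header(b"#!/bin/sh\necho hi\n"))  # script
```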

File System Considerations

File Access

Sequential Access – The bytes in a file must be accessed in order, starting from the beginning. There can be no random seeks through the file.

Random Access – The bytes or records in a file can be accessed in any order. Access is done by position (seeking to a byte offset) or, in record-oriented files, by looking up a “key”.

File System Considerations

File Organization – Sequential Access

Pile

Variable-length records

Variable set of fields

Chronological order

Each record in the file contains a burst of data.

Data is appended to the file as it arrives for writing.

Read access must be done sequentially.

Searching for a particular item requires an exhaustive search.
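A pile can be sketched as length-prefixed records appended to a byte buffer. The 2-byte length prefix (records up to 65,535 bytes) and the helper names are assumptions for illustration. Note that the search has nothing to guide it and must examine records one by one.

```python
import struct

LEN = struct.Struct("<H")  # assumed 2-byte length prefix before every record


def pile_append(pile: bytearray, record: bytes) -> None:
    """Writes only go to the end, so records stay in arrival order."""
    pile.extend(LEN.pack(len(record)))
    pile.extend(record)


def pile_scan(pile: bytearray):
    """Sequential read: yield the records in chronological order."""
    pos = 0
    while pos < len(pile):
        (n,) = LEN.unpack_from(pile, pos)
        pos += LEN.size
        yield bytes(pile[pos:pos + n])
        pos += n


def pile_find(pile: bytearray, predicate):
    """No key or index exists, so finding an item is an exhaustive search."""
    for record in pile_scan(pile):
        if predicate(record):
            return record
    return None


p = bytearray()
for r in (b"temp=21", b"door=open;user=ann", b"x"):
    pile_append(p, r)
print(pile_find(p, lambda r: r.startswith(b"door")))  # b'door=open;user=ann'
```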

File System Considerations

File Organization

Sequential Access – Pile

[Figure: a pile file; Record 1, Record 2, ... Record 6, etc. are stored one after another, each with its own length]

File System Considerations

File Organization

Sequential Access – Sequential File

Fixed-length Records.

Fixed set of fields in fixed order.

Sequential order based on a “key” field.
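Because a sequential file keeps its fixed-length records ordered by the key field, a sequential search can stop as soon as it passes the wanted key. A minimal sketch, assuming a hypothetical 12-byte record (4-byte key, 8 bytes of data):

```python
import struct

# Hypothetical fixed layout: 4-byte key followed by 8 bytes of data.
REC = struct.Struct("<I8s")


def build_sequential_file(records):
    """Lay out fixed-length records in order of their key field."""
    return b"".join(REC.pack(key, data) for key, data in sorted(records))


def sequential_lookup(file_bytes, key):
    """Scan from the start; because keys are ordered, stop once past `key`."""
    for k, data in REC.iter_unpack(file_bytes):
        if k == key:
            return data.rstrip(b"\0")
        if k > key:
            break  # every later record has a larger key
    return None


f = build_sequential_file([(30, b"carol"), (10, b"alice"), (20, b"bob")])
print(sequential_lookup(f, 20))  # b'bob'
```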

File System Considerations

File Organization

Sequential Access – Sequential File

[Figure: a sequential file; each record (Record 1, Record 2, ... Record 6, etc.) holds Key, Data, and Cksum fields]

File System Considerations

File Organization

Random Access – Indexed Sequential File

Characteristics are the same as those of a sequential file.

A “key” based index of access pointers (file pointers) is maintained to give random access points into file records.
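A minimal sketch of an indexed sequential lookup: the index jumps close to the wanted record, and a short sequential scan finishes the job. Two substitutions are assumptions: a sorted array searched with `bisect` stands in for the index tree, and list positions stand in for file pointers. The index is sparse, with one entry every `step` records.

```python
import bisect


class IndexedSequentialFile:
    """Key-ordered records plus a sparse index of (key, position) entries."""

    def __init__(self, records, step=4):
        self.records = sorted(records)  # (key, data); list position stands in for a file offset
        self.index_pos = list(range(0, len(self.records), step))
        self.index_keys = [self.records[p][0] for p in self.index_pos]

    def lookup(self, key):
        # 1) Random access: binary-search the index for the last entry <= key.
        i = bisect.bisect_right(self.index_keys, key) - 1
        if i < 0:
            return None  # key is smaller than every stored key
        # 2) Sequential access: scan forward from that record.
        for j in range(self.index_pos[i], len(self.records)):
            k, data = self.records[j]
            if k == key:
                return data
            if k > key:
                break
        return None


isf = IndexedSequentialFile([(k, f"rec{k}") for k in range(0, 100, 5)])
print(isf.lookup(35))  # rec35
```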

File System Considerations

File Organization

Random Access – Indexed Sequential File

[Figure: an indexed sequential file; an Index Tree points into the sequence of records (Record 1, Record 2, ... Record 6, etc.), each holding Key, Data, and Cksum fields]

File System Considerations

File Organization

Random Access – Indexed File

A tree based index is created to give direct access to file records.

Multiple tree indexes may be deployed to search on different key types.

Records can be variable length.
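A minimal sketch of an indexed file with a primary index (unique key) and a secondary index (a non-unique field). Python dictionaries stand in for the index trees here; an on-disk implementation would use trees such as B-trees, which also support range queries. Class and method names are hypothetical.

```python
from collections import defaultdict


class IndexedFile:
    """Variable-length records reachable through a primary and a secondary index."""

    def __init__(self, secondary_field):
        self.storage = []                   # stand-in for record locations on disk
        self.primary = {}                   # unique key -> record position
        self.secondary = defaultdict(list)  # field value -> positions (may repeat)
        self.secondary_field = secondary_field

    def insert(self, key, record):
        if key in self.primary:
            raise KeyError(f"duplicate primary key {key!r}")
        pos = len(self.storage)
        self.storage.append(record)
        self.primary[key] = pos
        self.secondary[record.get(self.secondary_field)].append(pos)

    def by_key(self, key):
        pos = self.primary.get(key)
        return None if pos is None else self.storage[pos]

    def by_field(self, value):
        return [self.storage[p] for p in self.secondary.get(value, [])]


f = IndexedFile("dept")
f.insert(101, {"name": "Ann", "dept": "CS"})
f.insert(102, {"name": "Bo", "dept": "EE", "phone": "555-0100"})  # records differ in length
f.insert(103, {"name": "Cy", "dept": "CS"})
print([r["name"] for r in f.by_field("CS")])  # ['Ann', 'Cy']
```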

File System Considerations

File Organization

Random Access – Indexed File

[Figure: an indexed file accessed through a Primary Index Tree and a Secondary Index Tree]

File System Considerations

File Organization

Random Access – Hashed Index

Disk records are of variable length.

A hash table is deployed to map a key to the actual disk address.
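A minimal sketch of a hashed index with separate chaining: the key is hashed to a bucket, and the bucket holds `(key, disk_address)` pairs. Using `zlib.crc32` as the hash function is an assumption (chosen because it is stable across runs); the disk addresses in the example are made up.

```python
import zlib


class HashedIndex:
    """Maps a key to the disk address of its (variable-length) record."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        return self.buckets[zlib.crc32(key.encode("utf-8")) % len(self.buckets)]

    def put(self, key, disk_address):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, disk_address)  # update an existing entry
                return
        bucket.append((key, disk_address))      # colliding keys share a chain

    def get(self, key):
        for k, addr in self._bucket(key):
            if k == key:
                return addr
        return None


idx = HashedIndex(4)
for i, name in enumerate(["ann", "bo", "cy", "dee", "eve"]):
    idx.put(name, 1000 + 64 * i)  # made-up disk addresses
print(idx.get("cy"))  # 1128
```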

File System Considerations

File Organization

Random Access – Hashed Index

[Figure: a Hash Table whose entries point to File Nodes]

File System Considerations

File Attributes – Along with each file, the file system stores special attributes (metadata) used to manage the file:

Permission – Which users and processes can and cannot access the file, and how.

Password – File access control.

Creator – What process or user created the file.

Owner – Who is the current owner of the file.

Read/Write Flag – Is the file read-only, or can it also be written?

File System Considerations

File Attributes

Hidden Flag – Is the file shown in directory listings?

System Flag – Is this a system file?

Archive Flag – Has this file been archived?

Binary Flag – Is this a binary file?

Etc.
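Several of these attributes can be read through the standard `os.stat` call. A minimal sketch; the `describe` helper and its dictionary keys are hypothetical. Flags such as hidden, system, and archive are platform-specific (on Windows, Python exposes them as `st_file_attributes`) and are left out here.

```python
import os
import stat
import time


def describe(path):
    """Collect a few of the attributes the file system keeps for a file."""
    st = os.stat(path)
    return {
        "owner_uid": st.st_uid,                    # owner as a numeric user ID (UNIX)
        "permissions": stat.filemode(st.st_mode),  # e.g. '-rw-r--r--'
        "size_bytes": st.st_size,
        "last_modified": time.ctime(st.st_mtime),
        "writable_by_me": os.access(path, os.W_OK),
    }


print(describe("."))
```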

File System Considerations

File Operations

Create – Write a new entry for the file into the file system.

Delete – Remove the file's entry and free up any disk space associated with the file.

Open – Copy the file attributes and disk addresses from the disk into main memory, and initialize the file pointer.

Close – Release the in-memory copy of the file attributes and disk addresses.

File System Considerations

File Operations (cont’d)

Read – Read a sequence of bytes from the file, starting at the current file pointer position.

Write – Write a sequence of bytes to the file at the current file pointer position.

Append – Add a sequence of bytes to the end of a file.

Seek – Move the file pointer to a new position in the file.
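Python's `os` module exposes thin wrappers over the POSIX calls behind these operations, so the file pointer's behavior can be traced directly. A minimal sketch; the temporary file and the helper name `demo_file_operations` are illustrative only.

```python
import os
import tempfile


def demo_file_operations():
    fd, path = tempfile.mkstemp()            # Create + Open; file pointer at 0
    try:
        os.write(fd, b"Hello, file system!")  # Write advances the pointer to 19
        os.lseek(fd, 7, os.SEEK_SET)          # Seek: move the pointer to byte 7
        part = os.read(fd, 4)                 # Read 4 bytes from the pointer
        os.close(fd)                          # Close
        fd = os.open(path, os.O_WRONLY | os.O_APPEND)
        os.write(fd, b" Bye.")                # Append: data always goes to the end
        os.close(fd)
        with open(path, "rb") as f:
            whole = f.read()
        return part, whole
    finally:
        os.remove(path)                       # Delete


print(demo_file_operations())  # (b'file', b'Hello, file system! Bye.')
```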

File System Considerations

File Operations (cont’d)

Get Attributes – Fetch only the attributes of a file.

Set Attributes – Set the attributes of a file.

Rename – Assign a new logical name to the file.
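A minimal sketch combining Rename with Get and Set Attributes: rename a file with `os.rename`, read its permission bits with `os.stat`, and clear the write bits with `os.chmod`. The helper name is hypothetical; on Windows, `os.chmod` can only toggle the read-only flag.

```python
import os
import stat


def rename_and_protect(directory, old_name, new_name):
    """Rename a file, then clear its write-permission bits."""
    old_path = os.path.join(directory, old_name)
    new_path = os.path.join(directory, new_name)
    os.rename(old_path, new_path)                   # Rename: only the name changes
    mode = stat.S_IMODE(os.stat(new_path).st_mode)  # Get Attributes
    no_write = ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH)
    os.chmod(new_path, mode & no_write)             # Set Attributes: make read-only
    return new_path
```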

File System Considerations

File System Generic Layered Architecture

[Figure: generic layered file system architecture, from top to bottom]

Pile | Sequential | Indexed Sequential | Indexed | Hashed

Logical I/O Interface

Basic I/O Subsystem

Basic File System - Low Level Access

Device Drivers - Magnetic Disk, DVD, CDROM, etc.

Physical Device

File System Considerations

File System Generic Layered Architecture

Physical Device – The actual hardware.

Device Drivers – Perform operations on the hardware (e.g. start, stop, read, write, etc.)

Basic File System – Block interface, buffering, read commands, write commands.

Basic I/O Subsystem – File I/O initiation and termination. Management of control structures.

Logical I/O Interface – Presents file I/O to the layers above (the access methods and applications) as records of data.

Directories

Directories provide an abstract method to create a “file of files”.

Directories make it possible to store and organize multiple files in a file system.

Directories

Single Level

There is one “file of files” in the system.

The directory can only contain files, not directories.

Directories

Hierarchical

Any directory can contain other directories as entries, in addition to files, so the directories form a tree.

Directories

Operations

Find – find a file or directory in a directory.

Create File – create a new file entry in this directory.

Delete File – delete a file from this directory.

List – list the files and directories contained in this directory.
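A minimal in-memory sketch of a hierarchical directory with the operations above, plus path lookup one component at a time. Entries map a name either to a nested `Directory` or to file contents (`bytes`); a single-level directory is the special case where the root never contains a `Directory`. All names are hypothetical.

```python
class Directory:
    """A directory as a 'file of files': a table of named entries."""

    def __init__(self):
        self.entries = {}  # name -> Directory or file contents (bytes)

    def find(self, name):
        return self.entries.get(name)

    def create_file(self, name, data=b""):
        if name in self.entries:
            raise FileExistsError(name)
        self.entries[name] = data

    def make_dir(self, name):
        if name in self.entries:
            raise FileExistsError(name)
        self.entries[name] = Directory()
        return self.entries[name]

    def delete_file(self, name):
        if not isinstance(self.entries.get(name), bytes):
            raise FileNotFoundError(name)
        del self.entries[name]

    def list_entries(self):
        return sorted(self.entries)


def lookup(root, path):
    """Resolve an absolute path such as '/usr/bin/ls' component by component."""
    node = root
    for part in path.strip("/").split("/"):
        if not part:
            continue
        if not isinstance(node, Directory):
            return None  # a file cannot contain further entries
        node = node.find(part)
        if node is None:
            return None
    return node


root = Directory()
root.make_dir("usr").make_dir("bin").create_file("ls", b"\x7fELF")
print(lookup(root, "/usr/bin/ls"))  # b'\x7fELF'
```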

Implementation

Allocation Strategies

To manage files, the free space on the disk must be tracked.

Every time a file is created and written to, the file manager must write to a block (group of sectors) on disk.

In addition to handling free and used disk space, the file system must keep track of what blocks go with which files.

Implementation

Allocation Strategies

Contiguous – Keep a file as a contiguous sequence of disk blocks.

Advantages:

Simple implementation – only the starting block and the number of blocks used for the file must be tracked.

Read performance is ideal – a file is located in a continuous sequence of blocks, requiring minimal seeking.

Implementation

Allocation Strategies – Contiguous (cont’d)

Disadvantages:

High (external) fragmentation – Initially, new files are added at the end of the free space. As files are deleted, holes open up. The file system will then reuse this free space, most likely for a file smaller than the one that was freed, leaving a smaller and often unusable hole behind.

To create a file, the file size must be known in advance.

Files that grow larger than their original size must be relocated to a new contiguous area of free blocks on the disk.
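A minimal sketch of contiguous allocation over a list of free extents `(start, length)`. First fit is assumed as the placement policy (the slides do not name one), and freed extents are merged with adjacent holes. The example shows both sides: block addressing is trivial, yet after two deletions 7 of 10 blocks are free and a 5-block file still cannot be placed.

```python
def first_fit(holes, n_blocks):
    """Take n_blocks contiguous blocks from the first hole big enough.
    holes: sorted list of (start, length) free extents, updated in place."""
    for i, (start, length) in enumerate(holes):
        if length >= n_blocks:
            if length == n_blocks:
                del holes[i]
            else:
                holes[i] = (start + n_blocks, length - n_blocks)
            return start
    return None  # no single hole is large enough


def release(holes, start, n_blocks):
    """Return a file's blocks to the free list, merging adjacent holes."""
    holes.append((start, n_blocks))
    holes.sort()
    merged = []
    for s, length in holes:
        if merged and merged[-1][0] + merged[-1][1] == s:
            merged[-1] = (merged[-1][0], merged[-1][1] + length)
        else:
            merged.append((s, length))
    holes[:] = merged


def block_address(start, i):
    """Logical block i of a contiguous file is simply start + i."""
    return start + i


holes = [(0, 10)]  # a 10-block disk, all free
a = first_fit(holes, 3)
b = first_fit(holes, 3)
c = first_fit(holes, 3)
release(holes, a, 3)
release(holes, c, 3)
print(holes)                # [(0, 3), (6, 4)]: 7 blocks free in two holes
print(first_fit(holes, 5))  # None: external fragmentation
```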

Implementation

Allocation Strategies

Linked List – Keep each file as a linked list of disk blocks (free blocks can be chained the same way).

The start of each block holds the address of the next block in its chain.

When a file grows, the next free block is taken from the free list and linked onto the end of the file.

For each file, the file system just needs to keep track of the first block address.

Implementation

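Allocation Strategies – Linked List (cont’d)

Advantages:

No external fragmentation – every free block can be used.

Management is simple.

Disadvantages:

File reading is essentially limited to sequential access.

Random access is very slow: reaching block n of a file requires reading the n blocks before it.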
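A minimal simulation of linked-list allocation. The list `next_block` plays the role of the pointer stored at the start of each block, all blocks begin on one free chain, and a read counter makes the cost of random access visible. The `END` marker and the class name are hypothetical.

```python
END = -1  # hypothetical end-of-chain marker


class LinkedDisk:
    """Simulated disk: next_block[b] is the pointer stored at the start of block b."""

    def __init__(self, n_blocks):
        self.next_block = list(range(1, n_blocks)) + [END]  # one free chain: 0 -> 1 -> ...
        self.data = [None] * n_blocks
        self.free_head = 0
        self.reads = 0  # block reads performed

    def _allocate(self):
        b = self.free_head
        if b == END:
            raise OSError("disk full")
        self.free_head = self.next_block[b]  # unlink b from the free chain
        self.next_block[b] = END
        return b

    def write_file(self, chunks):
        """Store each chunk in its own block, linking the blocks into a chain."""
        first = prev = END
        for chunk in chunks:
            b = self._allocate()
            self.data[b] = chunk
            if prev == END:
                first = b  # the only number the file system must remember
            else:
                self.next_block[prev] = b
            prev = b
        return first

    def read_block(self, first, i):
        """Reaching logical block i means reading every block before it."""
        b = first
        for _ in range(i):
            if b == END:
                raise IndexError(i)
            self.reads += 1  # read block b only to learn where the next one is
            b = self.next_block[b]
        if b == END:
            raise IndexError(i)
        self.reads += 1
        return self.data[b]


disk = LinkedDisk(8)
f = disk.write_file([b"A", b"B", b"C"])
print(disk.read_block(f, 2), disk.reads)  # b'C' 3
```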

Implementation

Free Space Management – Bitmap

Keep a bitmap where each bit corresponds to a block on disk: 1 – allocated, 0 – free.
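A minimal sketch of a free-space bitmap using one bit per block (1 = allocated, 0 = free). Storing block b in bit b % 8 of byte b // 8 is an arbitrary layout choice for illustration, and allocation simply takes the first free block.

```python
class BlockBitmap:
    """One bit per disk block: 1 = allocated, 0 = free."""

    def __init__(self, n_blocks):
        self.n = n_blocks
        self.bits = bytearray((n_blocks + 7) // 8)  # all blocks start free

    def is_allocated(self, b):
        return bool(self.bits[b // 8] & (1 << (b % 8)))

    def allocate(self):
        """Find the first free block, mark it allocated, and return its number."""
        for b in range(self.n):
            if not self.is_allocated(b):
                self.bits[b // 8] |= 1 << (b % 8)
                return b
        return None  # disk full

    def free(self, b):
        self.bits[b // 8] &= ~(1 << (b % 8)) & 0xFF


bm = BlockBitmap(20)  # 20 blocks fit in 3 bytes
print([bm.allocate() for _ in range(3)])  # [0, 1, 2]
```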

Implementation

Popular File Systems

• FAT (File Allocation Table – created by Bill Gates)
• NTFS (New Technology File System – Microsoft)
• XFS (X File System – Silicon Graphics)
• HFS+ (Hierarchical File System Plus – Apple)
• EXT2/3/4 (Linux, Android, others)
• ZFS (originally Sun Microsystems Solaris; also FreeBSD)
• UFS (Unix File System) – not so popular anymore

Implementation

Popular File Systems

File Allocation Table (FAT) – Keep the file pointers in a table in memory (an array implementation of a linked list).

A cluster is a group of contiguous sectors on the disk and is the unit of allocation.

A 16 KB cluster has 32 sectors in it (32 × 512 bytes = 16,384 bytes).

Each cluster has an entry in the FAT.

FAT16 – Limited to 2^16 (16-bit) entries (clusters).

A file's directory entry holds the number of its first cluster, which is where the file's chain in the FAT begins.

Implementation

Popular File Systems

File Allocation Table (FAT) – FAT16 entry structure:

FAT Code Range – Meaning

0000h – Available cluster

0002h-FFEFh – Used; the value is the next cluster in the file

FFF0h-FFF6h – Reserved cluster

FFF7h – Bad cluster

FFF8h-FFFFh – Used; last cluster in the file
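A minimal sketch of following a file's cluster chain through an in-memory FAT16 table using the entry codes above. The table is modeled as a Python list of 16-bit values, with entries 0 and 1 treated as reserved. The cycle check and the helper names are additions for illustration.

```python
FREE = 0x0000
BAD = 0xFFF7


def is_last(entry):
    return 0xFFF8 <= entry <= 0xFFFF


def is_next_cluster(entry, fat):
    return 0x0002 <= entry <= 0xFFEF and entry < len(fat)


def cluster_chain(fat, first_cluster):
    """Follow a file's clusters, starting from the cluster in its directory entry."""
    if not is_next_cluster(first_cluster, fat):
        raise ValueError(f"invalid first cluster {first_cluster}")
    chain = [first_cluster]
    while True:
        entry = fat[chain[-1]]
        if is_last(entry):
            return chain
        if not is_next_cluster(entry, fat) or len(chain) >= len(fat):
            raise ValueError(f"broken chain at cluster {chain[-1]}: entry {entry:#06x}")
        chain.append(entry)


def free_clusters(fat):
    """Entries 0 and 1 are reserved; count the data clusters marked 0000h."""
    return sum(1 for entry in fat[2:] if entry == FREE)


fat = [0xFFF8, 0xFFFF] + [FREE] * 14  # a tiny 16-entry table
fat[2], fat[5], fat[3] = 5, 3, 0xFFFF  # one file occupies clusters 2 -> 5 -> 3
print(cluster_chain(fat, 2))  # [2, 5, 3]
```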

Implementation

Popular File Systems

File Allocation Table (FAT)

Advantages:

File pointer access is fast because the table is in memory.

The entire block is available for data, since no space in it is used for pointers.

A file's chain of clusters can be followed without accessing the disk.

Implementation

Popular File Systems

File Allocation Table (FAT)

Disadvantages:

The entire FAT must be in memory.

A 20 GB disk with 1 KB blocks requires about 20 million entries. With 4-byte entries (FAT32), the table needs about 80 MB of memory.
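The table-size arithmetic, worked through under the assumption that 1 GB = 2^30 bytes and 1 KB = 2^10 bytes. The third line shows that a 60 MB figure corresponds to 3-byte entries, not 4-byte ones:

```latex
\begin{aligned}
N_{\text{entries}} &= \frac{20 \times 2^{30}\ \text{bytes}}{2^{10}\ \text{bytes/block}} = 20 \times 2^{20} \approx 2.1 \times 10^{7} \\
\text{FAT size (4-byte entries)} &= 20 \times 2^{20} \times 4\ \text{bytes} = 80 \times 2^{20}\ \text{bytes} = 80\ \text{MiB} \approx 84\ \text{MB} \\
\text{FAT size (3-byte entries)} &= 20 \times 2^{20} \times 3\ \text{bytes} = 60\ \text{MiB}
\end{aligned}
```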

Implementation

Popular File Systems

NTFS

Physical disk space is divided into clusters (like FAT).

MFT – 12% of the partition is set aside for the Master File Table.

The first 16 MFT records describe special “housekeeping” files for NTFS (called metafiles).

Implementation

Popular File SystemsNTFS

MFT – the common table of files.

The centralized directory of all other files on the disk, and of itself.

The MFT is divided into fixed-size records (usually 1 KB).

Each record describes one file.

Implementation

Popular File Systems

NTFS – Partition Layout

Implementation

Popular File Systems

NTFS – MFT Entry Structure

Implementation

Popular File Systems

NTFS – Metafiles

$MFT – the MFT itself

$MFTMirr – copy of the first 16 MFT records, placed in the middle of the disk

$LogFile – journaling support file

$Volume – housekeeping information: volume label, file system version, etc.

$AttrDef – list of the standard file attributes on the volume

$. – root directory

$Bitmap – volume free-space bitmap

$Boot – boot sector (bootable partition)

$Quota – file where users' disk-space usage rights are recorded (began to work only in NT5)

$Upcase – table mapping between uppercase and lowercase letters in file names on the current volume. It is needed because NTFS stores file names in Unicode, which has about 65 thousand different characters, and finding their uppercase and lowercase equivalents is not trivial.

Implementation

Popular File Systems

NTFS

The $. metafile points to the root directory file.

Directory files are divided into blocks. Each block holds entries containing a file name, base attributes, and a reference to the file's MFT record, which gives the complete information on that element of the directory.

The inner structure of a directory is a B+ tree.

Implementation

Popular File Systems

I-node – Each file has a data structure (an index node) that contains the file's attributes and the address of every block of the file (essentially a small FAT for each file, brought into memory when the file is opened).

Advantage – Typically takes up less memory than the FAT, because only the i-nodes of open files need to be in memory.

Disadvantage – More overhead in file creation.
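Classic UNIX i-nodes keep the addresses of a file's first few blocks directly in the i-node and reach later blocks through an indirect block, which is a disk block full of addresses. A minimal sketch of that mapping with a single indirect level; the block size, address size, and count of direct pointers are assumptions, and double and triple indirect blocks are omitted.

```python
BLOCK_SIZE = 1024                       # assumed block size (bytes)
ADDR_SIZE = 4                           # assumed size of one disk address (bytes)
N_DIRECT = 12                           # assumed number of direct addresses in the i-node
PER_INDIRECT = BLOCK_SIZE // ADDR_SIZE  # addresses per indirect block (256)


class INode:
    def __init__(self, direct, single_indirect=None):
        self.direct = direct                    # disk addresses of the first N_DIRECT blocks
        self.single_indirect = single_indirect  # disk address of a block of addresses


def block_for(inode, logical_block, read_indirect):
    """Map a block number within the file to a disk block address.
    read_indirect(addr) returns the list of addresses stored in that disk block."""
    if logical_block < N_DIRECT:
        return inode.direct[logical_block]
    i = logical_block - N_DIRECT
    if i < PER_INDIRECT and inode.single_indirect is not None:
        return read_indirect(inode.single_indirect)[i]  # costs one extra disk read
    raise IndexError(logical_block)  # double/triple indirect omitted in this sketch


def max_file_size():
    return (N_DIRECT + PER_INDIRECT) * BLOCK_SIZE


disk = {900: list(range(5000, 5000 + PER_INDIRECT))}  # indirect block stored at address 900
ino = INode(direct=list(range(100, 112)), single_indirect=900)
print(block_for(ino, 3, lambda a: disk[a]), block_for(ino, 12, lambda a: disk[a]))  # 103 5000
print(max_file_size())  # 274432
```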

Implementation

Popular File Systems

XFS – B-trees of i-nodes.

Implementation

XFS

Implementation

Popular File Systems

HFS+ – Volume Layout

The volume's metadata structures (such as the catalog file and the extents overflow file) are organized as B-trees.

Implementation

Popular File Systems

HFS+ – B-tree based

Each node contains file information.

Implementation

Popular File Systems (HFS+)

Each B-tree contains a single header node. The header node is always the first node in the B-tree. It contains the information needed to find any other node in the tree.

Map nodes contain map records, which hold any allocation data (a bitmap that describes the free nodes in the B-tree) that overflows the map record in the header node.

Index nodes hold pointer records that determine the structure of the B-tree.

Leaf nodes hold data records that contain the data associated with a given key. The key for each data record must be unique.
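A minimal in-memory sketch of a lookup through these node kinds: start at the header, descend through index nodes by their pointer records, and finish in a leaf that holds data records with unique keys. Map nodes and on-disk node numbering are left out. The header keeps a direct reference to the root here, whereas HFS+ stores a node number; all class names are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class LeafNode:
    keys: list     # sorted, unique keys
    records: list  # data record for each key


@dataclass
class IndexNode:
    keys: list      # smallest key reachable through each child, sorted
    children: list  # pointer records: one child node per key


@dataclass
class HeaderNode:
    root: object  # where the tree starts (HFS+ records a node number here)


def btree_search(header, key):
    """Descend from the root through index nodes to the leaf that may hold `key`."""
    node = header.root
    while isinstance(node, IndexNode):
        i = bisect_right(node.keys, key) - 1  # last child whose first key <= key
        if i < 0:
            return None
        node = node.children[i]
    i = bisect_right(node.keys, key) - 1
    if i >= 0 and node.keys[i] == key:
        return node.records[i]
    return None


leaf1 = LeafNode(["a.txt", "c.txt"], ["record A", "record C"])
leaf2 = LeafNode(["m.txt", "q.txt"], ["record M", "record Q"])
tree = HeaderNode(IndexNode(["a.txt", "m.txt"], [leaf1, leaf2]))
print(btree_search(tree, "q.txt"))  # record Q
```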