Upload
suzan-mccarthy
View
222
Download
0
Tags:
Embed Size (px)
Citation preview
04/20/23 CST 352 - Operating Systems 2
Topics
Introduction
File System ConsiderationsNaming
Structure
Types
Access
Operations
04/20/23 CST 352 - Operating Systems 3
Topics
DirectoriesSingle LevelHierarchicalOperations
ImplementationLayoutAllocationDirectoriesFree Space Management
04/20/23 CST 352 - Operating Systems 4
Introduction
All computer systems need to retrieve and store information.
A machine must be capable of being powered down without losing important information.
Information must also remain viable outside the reference of a creating/consuming process.
04/20/23 CST 352 - Operating Systems 5
Introduction
Persistence RequirementsInformation must transcend process boundaries.Information must transcend power cycles of a machine.Multiple processes/threads must be able to have simultaneous access to information.Stored information must be capable of growing to large quantities.
04/20/23 CST 352 - Operating Systems 6
Introduction
To deal with the aforementioned requirements, the most common solution is to use a magnetic disk.
On the magnetic disk, information needs to be organized in groups, known as files.
04/20/23 CST 352 - Operating Systems 7
Introduction
A File is an abstract way to represent information stored on some persistent media, such as a magnetic disk.
Using a file, information can be created by a thread in a process and written to a disk, then later read by a thread in a separate process.
04/20/23 CST 352 - Operating Systems 8
File System Considerations
File NamingTo deal with files on a magnetic disk, the process using the file must have some way to refer to the chunk of disk storage.File names must conform to standards set by the operating system.A typical naming scheme deals with two parts – a name and an extension.The extension can be used by the OS to associate files with application programs.
04/20/23 CST 352 - Operating Systems 9
File System Considerations
File Naming – Possible components.protocol (or scheme) — access method (e.g., http, ftp, file etc.)
host (or network-ID) — host name, IP address, domain name, or LAN network name (e.g., wikipedia.org, 207.142.131.206, \\MYCOMPUTER, SYS:, etc.)
device (or node) — port, socket, drive, root mountpoint, disc, volume (e.g., C:, /, SYSLIB, etc.)
directory (or path) — directory tree (e.g., /usr/bin, \TEMP, [USR.LIB.SRC], etc.)
file — base name of the file
type (format or extension) — indicates the content type of the file (e.g., .txt, .exe, .COM, etc.)
version — revision number of the file
04/20/23 CST 352 - Operating Systems 10
File System Considerations
File StructureStructure as simply a sequence of 8 bit values.
Most common and most flexible.
Using process must interpret the file.
Structure as a sequence of fixed length records. Allows each record to be structured for block read.
Structure fields in the file using a BTree type structure.
Allows fast access and indexing.
04/20/23 CST 352 - Operating Systems 11
File System Considerations
File TypesRegular Files: These files are created by users and contain information relative to those user processes.Directories: System files used to manage groupings of files.Character Special Files: Used by the system for spooling of serial I/O devices.Block Special Files: Used by the system to model disk I/O.
04/20/23 CST 352 - Operating Systems 12
File System Considerations
File TypesRegular Files are of two types:
ASCII– The file can be dumped directly to the screen. All characters are printable.Binary – The file contains a sequence of binary information, not necessarily ASCII.
In UNIX, binary files start with a “magic number”. The OS uses this to determine if the file is a true executable file.
04/20/23 CST 352 - Operating Systems 13
File System Considerations
File Access
Sequential Access – The bytes in a file must be accessed in a serial fashion. There can be no random seeks through a file.
Random Access – The bytes in a file can be accesses in any order. Access is done based on a “key-value” pair.
04/20/23 CST 352 - Operating Systems 14
File System Considerations
File OrganizationSequential Access
PileVariable length recordsVariable set of fieldsChronological order
Each record in the file contains a burst of data.Data is appended to the file as it shows up for write.Read access must be done sequentially.Search for a particular item must be an exhaustive
search.
04/20/23 CST 352 - Operating Systems 15
File System ConsiderationsFile Organization
Sequential AccessPile
Record 1
etc...Record 6Record 5Record 4Record 3Record 2
Length
04/20/23 CST 352 - Operating Systems 16
File System Considerations
File Organization
Sequential AccessSequential File
Fixed-length Records.
Fixed set of fields in fixed order.
Sequential order based on a “key” field.
04/20/23 CST 352 - Operating Systems 17
File System Considerations
File Organization
Sequential AccessSequential File
Key Record 1
etc...Record 6Record 5Record 4Record 3Record 2
Data Cksum
Key Data Cksum
Key Data Cksum
Key Data Cksum
Key Data Cksum
Key Data Cksum
04/20/23 CST 352 - Operating Systems 18
File System Considerations
File Organization
Random AccessIndexed Sequential File
Characteristics are the same as those of a sequential file.
A “key” based index of access pointers (file pointers) is maintained to give random access points into file records.
04/20/23 CST 352 - Operating Systems 19
File System Considerations
File Organization
Random AccessIndexed Sequential File
Key Record 1
etc...Record 6Record 5Record 4Record 3Record 2
Data Cksum
Key Data Cksum
Key Data Cksum
Key Data Cksum
Key Data Cksum
Key Data Cksum
Index Tree
04/20/23 CST 352 - Operating Systems 20
File System Considerations
File Organization
Random AccessIndexed File
A tree based index is created to give direct access to file records.
Multiple tree indexes may be deployed to search on different key types.
Records can be variable length.
04/20/23 CST 352 - Operating Systems 21
File System Considerations
File OrganizationRandom Access
Indexed File
PrimaryIndexTree
SecondaryIndex Tree
04/20/23 CST 352 - Operating Systems 22
File System Considerations
File Organization
Random AccessHashed Index
Disk records are of variable length.
A hash table is deployed to map a key to the actual disk address.
04/20/23 CST 352 - Operating Systems 23
File System Considerations
File Organization
Random AccessHashed Index Hash
TableFile Nodes
04/20/23 CST 352 - Operating Systems 24
File System Considerations
File AttributesIn the header of a file, special attributes are stored
for management of the file:
Permission – What process can and cannot access the file.
Password – File access control.
Creator – What process or user created the file.
Owner – Who is the current owner of the file.
RW Flag – Readable?
04/20/23 CST 352 - Operating Systems 25
File System Considerations
File AttributesHidden Flag – Can the file be seen by directory listings?
System Flag – Is this a system file?
Archive Flag – Has this file been archived?
Binary Flag – Is this a binary file?
Etc.
04/20/23 CST 352 - Operating Systems 26
File System Considerations
File OperationsCreate – Write a file entry point in the file system.Delete – Free up any disk space associated with the file to be deleted.Open – Get the file attributes and address copied from the disk into main memory. Initialize the file pointer.Close – Remove the file attribute cache from main memory.
04/20/23 CST 352 - Operating Systems 27
File System Considerations
File Operations (cont’d)
Read – Read a sequence of bytes from the file from the current file pointer position.
Write – Write a sequence of bytes to the file based on the current file pointer position.
Append – Add a sequence of bytes to the end of a file.
Seek – Move the file pointer to a new position in the file.
04/20/23 CST 352 - Operating Systems 28
File System Considerations
File Operations (cont’d)
Get Attributes – Fetch only the attributes of a file.
Set Attributes – Set the attributes of a file.
Rename – Assign a new logical name to the file.
04/20/23 CST 352 - Operating Systems 29
File System Considerations
File System Generic Layered Architecture
Device Drivers - Magnetic Disk, DVD, CDROM, etc
Basic File System - Low Level Access
Basic I/O Subsystem
Logical I/O Interface
Pile Seq.Indexed
Seq.Indexed Hashed
Physical Device
04/20/23 CST 352 - Operating Systems 30
File System Considerations
File System Generic Layered Architecture
Physical Device – The actual hardware.
Device Drivers – Perform operations on the hardware (e.g. start, stop, read, write, etc.)
Basic File System – Block interface, buffering, read commands, write commands.
Basic I/O Subsystem – File I/O initiation and termination. Management of control structures.
Logical I/O Interface – Present file I/O to the file system as records of data.
04/20/23 CST 352 - Operating Systems 31
Directories
Directories provide an abstract method to create a “file of files”.
Creating a directory allows the ability to store multiple files in a file system.
04/20/23 CST 352 - Operating Systems 32
Directories
Single Level
There is one “file of files” in the system.
The directory can only contain files, not directories.
04/20/23 CST 352 - Operating Systems 33
Directories
Hierarchical
Allow any directory to contain directories as one of the entries.
A directory that can contain files and other directories.
04/20/23 CST 352 - Operating Systems 34
Directories
OperationsFind – find a file or directory in a directory.
Create File – create a new file entry in this directory.
Delete File – delete a file from this directory.
List – list a directory or file or all directories and file contained in this directory.
04/20/23 CST 352 - Operating Systems 35
Implementation
Allocation StrategiesTo manage files, the free space on the disk must be tracked.
Every time a file is created and written to, the file manager must write to a block (group of sectors) on disk.
In addition to handling free and used disk space, the file system must keep track of what blocks go with which files.
04/20/23 CST 352 - Operating Systems 36
Implementation
Allocation StrategiesContiguous – Keep a file as a contiguous sequence of disk blocks.
Advantages:Simple implementation
Must only keep track of the starting block and the number of blocks used for the file.
Read performance is ideal A file is located in a continuous sequence of
blocks, requiring minimum seek.
04/20/23 CST 352 - Operating Systems 37
Implementation
Allocation StrategiesContiguous (cont’d)
disadvantages:High fragmentation
Initially, new files are added to the end of free space. As files are freed, space will open up. The file system will then use this free space, most
likely for a file smaller than the one freed up.
To create a file, the file size must be known in advance.
Files that grow larger that their original size must be relocated to a new contiguous area of free blocks on the disk.
04/20/23 CST 352 - Operating Systems 38
Implementation
Allocation StrategiesLinked List – Keep a linked list of free disk sectors.
The start of each free block has the address to the next free block.
When a file is created, the next free block will be added to the file descriptor.
The file system just needs to keep track of the first block address.
04/20/23 CST 352 - Operating Systems 39
Implementation
Allocation StrategiesLinked List (cont’d)
AdvantagesThere will be no disk fragmentation
Management is simple
DisadvantagesFile reading is limited to sequential access
Random access is non existent
04/20/23 CST 352 - Operating Systems 40
Implementation
Free Space ManagementBitmap
Keep a bitmap where each bit corresponds to a block on disk.1 – allocated0 – free
04/20/23 CST 352 - Operating Systems 41
Implementation
Popular File Systems• FAT (File Allocation Table – Created by Bill
Gates)• NTFS (New Technology File System – Microsoft)• XFS (X File System – Silicon Graphics)• HFS+ (Hierarchical File System – Apple)• EXT 2/3/4….(Linux, Android, others???)• ZFS (Free BSD) – popular??? I don’t know• UFS (Unix File System) – not so popular any more
04/20/23 CST 352 - Operating Systems 42
Implementation
Popular File SystemsFile Allocation Table (FAT) – Keep the file pointers in a table in memory (an array implementation of a linked list).
A Cluster is a Group of Sectors on the Hard Drive that have information in them.
A 16K Cluster has 32 Sectors in it (512*32=16384).
Each Cluster has an entry in the FAT Table.
FAT 16 – Limited to 216 (16 bit) entries (clusters).
A File name maps to an entry in the FAT table.
04/20/23 CST 352 - Operating Systems 43
Implementation
Popular File SystemsFile Allocation Table (FAT) – Entry Structure:
FAT Code Range Meaning
0000h Available Cluster
0002h-FFEFh Used, Next Cluster in File
FFF0h-FFF6h Reserved Cluster
FFF7h BAD Cluster
FFF8h-FFFFh Used, Last Cluster in File
04/20/23 CST 352 - Operating Systems 44
Implementation
Popular File SystemsFile Allocation Table (FAT) – Keep the file pointers in a table in memory (an array implementation of a linked list).
AdvantagesFile pointer access is fast because it is in memory.
Entire block is available for memory.
File chain may be followed without accessing the disk.
04/20/23 CST 352 - Operating Systems 45
Implementation
Popular File SystemsFile Allocation Table (FAT)
disadvantagesThe entire FAT must be in memory.
20 Gbyte disk with a 1 Kb block requiring 20 million entries. Each entry is 4 bytes (FAT 32). The table will therefore be 60 Mbytes of memory.
04/20/23 CST 352 - Operating Systems 46
Implementation
Popular File SystemsNTFS
Physical disk space is divided into clusters (like FAT).MFT -12% of the partition is set aside for the Master File Table.The first 16 MFT files are special “housekeeping” files for NTFS (called metafiles).
04/20/23 CST 352 - Operating Systems 47
Implementation
Popular File SystemsNTFS
MFT - the common table of files.
The centralized directory of all remaining disk files and itself.
MFT is divided into records of the fixed size (usually 1 KBytes)
Each record corresponds to some file.
04/20/23 CST 352 - Operating Systems 49
Implementation
Popular File SystemsNTFS - MFT Entry Structure
04/20/23 CST 352 - Operating Systems 50
Implementation
Popular File SystemsNTFS – Metafiles
$MFT Itself MFT$MFTmirr copy of the first 16 MFT records placed in the middle of the disk $LogFile journaling support file$Volume housekeeping information - volume label, file system version,
etc. $AttrDef list of standard files attributes on the volume $. root directory $Bitmap volume free space bitmap$Boot boot sector (bootable partition) $Quota file where the users rights on disk space usage are recorded
(began to work only in NT5)$Upcase File - the table of accordance between capital and small
letters in files names on current volume. It is necessary because in NTFS file names are stored in Unicode that makes 65 thousand
various characters and it is not easy to search for their large and small equivalents.
04/20/23 CST 352 - Operating Systems 51
Implementation
Popular File SystemsNTFS
The $. Metafile points to the root directory file.
Directory files are divided into blocksEach block contains: file name, base attributes, reference to the element MFT which gives the complete information on an element of the directory.
The inner structure of the directory is a binary tree.
04/20/23 CST 352 - Operating Systems 52
Implementation
Popular File SystemsI-node – Each file has a data node created for it that contains the address of every block for the file (essentially a FAT for each file, created in memory upon file open).
Advantages – Typically takes up less memory than the FAT.Disadvantage – More overhead in file creation.
04/20/23 CST 352 - Operating Systems 53
Implementation
Popular File Systems
XFS –B-Trees of I-nodes.
04/20/23 CST 352 - Operating Systems 55
Implementation
Popular File Systems
HFS+ - Volume LayoutVolumes are structured as B-Trees.
04/20/23 CST 352 - Operating Systems 56
Implementation
Popular File Systems
HFS+ - B-Tree basedEach node contains file information.
04/20/23 CST 352 - Operating Systems 57
Implementation
Popular File Systems (HFS+)Each B-tree contains a single header node. The header node is always the first node in the B-tree. It contains the information needed to find other any other node in the tree.
Map nodes contain map records, which hold any allocation data (a bitmap that describes the free nodes in the B-tree) that overflows the map record in the header node.
Index nodes hold pointer records that determine the structure of the B-tree.
Leaf nodes hold data records that contain the data associated with a given key. The key for each data record must be unique.