Chapter 10.2: File-System Interface

Silberschatz, Galvin and Gagne ©2005, Operating System Concepts

Chapter 10: File-System Interface

Chapter 10.1 File Concept

Access Methods

Chapter 10.2 Directory Structure - continued

File-System Mounting

Protection


Directory Structure


Directories

Systems may have zero or more file systems and each of these may be of various types used to manage data.

File systems themselves may consist of millions of files, organized (or not well organized) in many ways.

All files must be managed and organized, as files constitute a major component of any computing system.

For files that are organized (again, they don’t have to be…), the principal way of organizing them is by using a directory.

But there are many different directory structures used to organize / manage files.

These various directories can contain different data items too.

We will look at the key ways in which directories are organized.


Directories - 2

While a disk may certainly be dedicated to one file system, it is frequently the case that we have multiple file systems on a single disk. These can be organized in many ways and go by many names.

Disks themselves may be partitioned, and may hold ‘raw disk’ areas, ‘regular’ formatted areas, etc.

Disks can be sliced and diced by manufacturers and vendors in many ways.

For the time being, refer to a storage device holding a file system as a volume.

A volume may be thought of as a virtual disk, because volumes can actually span physical devices.

A disk can store not only data files, program files, and directories (all in a variety of formats), but also a host of other items, such as other operating systems.

A Volume Table of Contents (VTOC), which is a device directory, contains information describing the volume contents.

Simply refer to these structures as ‘directories.’


Directory Structure

A directory can be organized in various ways.

A directory may be considered a table mapping file names to specific files.

A directory may also be considered a collection of nodes containing information about all files; that is, a directory entry not only points to a file but also contains much information about the file.

[Figure: a directory whose entries point to the files F1, F2, F3, F4, …, Fn]

Both the directory structure and the files reside on disk.

Backups of these two structures are often kept on magnetic tapes.


Operations Performed on Directory

Search for a file

Given a file name, we need to be able to search the directory to find the file.

Create a file; delete a file

We need to be able to create / delete a file on disk and hence maintain an appropriate entry in the directory.

List a directory

We need to be able to list the contents of a directory and see the characteristics of the files contained in it.

Rename a file

We often need to rename a file; its name may imply its position in a directory (a full path name…).

Traverse the file system

Here we want to be able to access every directory and every file contained in the directory structure.

We want to be able to do all this very quickly!
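The operations above can be sketched with a directory modeled as a table from file names to entries. This is only a toy, in-memory illustration: the class name Directory, its method names, and the "size" field are assumptions made up for the example, not how any real file system stores its directories.

```python
class Directory:
    """Toy directory: a table mapping file names to directory entries."""

    def __init__(self):
        # file name -> entry holding information about the file (hypothetical field)
        self.entries = {}

    def create(self, name, size=0):
        # Names must be unique within one directory.
        if name in self.entries:
            raise FileExistsError(name)
        self.entries[name] = {"size": size}

    def search(self, name):
        # Return the entry, or None when the name is not present.
        return self.entries.get(name)

    def delete(self, name):
        # Removing the entry stands in for freeing the file's space.
        del self.entries[name]

    def rename(self, old, new):
        if new in self.entries:
            raise FileExistsError(new)
        self.entries[new] = self.entries.pop(old)

    def list(self):
        return sorted(self.entries)


d = Directory()
d.create("pgm1.c", size=120)
d.create("notes.txt")
d.rename("notes.txt", "todo.txt")
d.delete("pgm1.c")
listing = d.list()
```

A hash table makes search, create, delete, and rename fast on average, which is one way to meet the "very quickly" goal; a plain linear list would be simpler but slower to search.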


Single-Level Directory

A single directory for all users – the simplest format. All files in the same directory.

Problems:

Files must have unique names.

This is very difficult (not practical) for multiple users using the same directory.

Not uncommon for a single user to have hundreds of files on a single computing system.

(I know that I do on my local Linux machine!)


Two-Level Directory

Here, we have a separate directory for each user. Each user's directory has a similar structure.

A master file directory (MFD) contains the user name / account number and points to the file directory for that user.

In creating a file, the OS uses the user’s user file directory (UFD) as part of the path name, so file names need to be unique only within that user's directory.

Creation of a new user directory will normally require a system administrator.

Every file has a path name that uniquely defines / locates it.

Other systems also require a volume name as part of the path, as in C:\mydir\pgr1.java.
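A minimal sketch of the two-level idea, assuming the master file directory is a table from user names to user file directories. The user names alice and bob, the file contents, and the '/user/filename' path format are hypothetical choices made for the illustration.

```python
# Master file directory (MFD): user name -> that user's file directory (UFD).
master_file_directory = {
    "alice": {"pgm1.c": "alice's source"},
    "bob": {"pgm1.c": "bob's source"},
}


def lookup(path):
    """Resolve a two-level path of the assumed form '/user/filename'."""
    user, filename = path.strip("/").split("/")
    user_file_directory = master_file_directory[user]  # first level: find the UFD
    return user_file_directory[filename]               # second level: find the file


alice_file = lookup("/alice/pgm1.c")
bob_file = lookup("/bob/pgm1.c")
```

Both users own a file called pgm1.c, yet the two path names differ, so there is no clash.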


Two-Level Directory

It is important to note that system programs, such as loaders, linkers, assemblers, compilers, and various other ‘commands,’ are themselves stored as files; when we invoke one, the file is loaded and executed.

e.g. gcc pgm1.c

This invokes the compiler and passes a file name as a parameter. But where is gcc?

Search Path: Many commonly used files, such as system files, are put in a special directory for system files.

When the user’s directory is searched first, a ‘not-found’ results in a search of this system directory.

The sequence of directories searched when a file is named is called a search path, and it can list many fully-defined directories.

Both Unix and Windows machines use this approach.
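Search-path lookup can be sketched as trying each directory in order and stopping at the first hit. The directory names and their contents below are made up for the illustration, and find_command is a hypothetical helper, not a real shell function.

```python
def find_command(name, search_path, directories):
    """Return the full path of the first directory on the search path holding name."""
    for directory in search_path:
        if name in directories.get(directory, set()):
            return directory + "/" + name
    return None  # 'not-found' in every directory on the path


# Hypothetical contents of a user directory and a system directory.
directories = {
    "/home/alice": {"pgm1.c", "myscript"},
    "/usr/bin": {"gcc", "ls"},
}
search_path = ["/home/alice", "/usr/bin"]  # user's directory is searched first

gcc_location = find_command("gcc", search_path, directories)
script_location = find_command("myscript", search_path, directories)
missing = find_command("nosuchcmd", search_path, directories)
```

Because the search stops at the first match, the order of the directories on the path decides which file wins when two directories hold the same name.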


Tree-Structured Directories

So we’ve seen a two-level directory.

The natural extension to a two-level directory is a tree (inverted tree) of arbitrary height.

A tree, by definition, has one root, and, because it is a tree (not a graph), supports only a single path to each item.

At each level, we have either files or directories / subdirectories leading to lower levels.

Be sure to continue to recognize that a directory is itself simply a file, but directories are used for special ‘things’ and are organized and managed differently than a standard data file, as we shall see.


Tree-Structured Directories - 2

Running Processes: Each running process has a ‘current directory.’

A reference made to a file by a running process causes the OS to search the current directory to locate the file.

If the desired item is NOT in the current directory, then the user must specify a path name, or set up a search path of directories that can alternatively be searched for the desired item.

In Unix / Linux, the current directory is indicated with a dot (.).

Typically, when one logs onto a system, one is in a login shell. The operating system looks for some kind of information identifying this user, perhaps a profile file…

You can edit your profile in various ways. Easiest is $ pico .profile if you don’t mind ‘pico.’ Notice the dot (.) in front of profile. You can set search PATHs in here…

Upon successful login, your login directory typically becomes your current directory.


Tree-Structured Directories – Path Names

Path names can be either absolute or relative.

An absolute path name is the full path: it starts at the root directory and follows a path ‘down’ to the desired file, specifying the directories and subdirectories en route to that item.

A relative path name defines a path starting from the current directory.

Of course, we can change the current directory to be whatever we want whenever we want to do this.

We can issue:

$ cd .. which means go up one level

$ pwd which will print your working directory; in other words, where you are ‘at’ in your directory structure.

Example: $ cd nextdirdown <enter>


Tree-Structured Directories

In the tree-structured directory above, if the current directory is root/spell/mail, then the relative path prt/first refers to the same file as the absolute path root/spell/mail/prt/first.

Note that root, spell, mail, and prt are directories; first is a file.
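The relation between the two kinds of path can be checked with Python's standard posixpath module. This is a small sketch that assumes the root directory is written as "/" (as on Unix), so root/spell/mail becomes /spell/mail.

```python
import posixpath

current_directory = "/spell/mail"   # absolute: starts at the root
relative = "prt/first"              # relative: interpreted from the current directory

# Joining the current directory and the relative path gives the absolute path.
absolute = posixpath.join(current_directory, relative)

# ".." means 'go up one level', which is what cd .. does to the current directory.
parent = posixpath.normpath(posixpath.join(current_directory, ".."))

is_abs_absolute = posixpath.isabs(absolute)
is_abs_relative = posixpath.isabs(relative)
```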

Of course, as a user, we can create directories and subdirectories to organize our files in any way we please.


Tree-Structured Directories (Cont)

Current directory (working directory)

The Linux command cd /spell/mail/prog makes this subdirectory the ‘current directory.’

cd is a command, normally built into the shell itself rather than stored as a separate executable file, that ‘changes our directory’ to the one specified.

prog is a directory with three files in it (see previous slide): list, obj, and spell.

We can also just issue an ls command, which will list the contents of our current directory, wherever we ‘are’ in our directory structure.

Dangers:

Some operating systems will not allow a user to delete a directory while there are ‘entries’ in it, such as other directories, files, etc., perhaps to many levels.

The Windows rmdir command, for example, requires the directory to be empty unless you add the /S option. Inconvenient, but may save your bacon!!

Unix provides the rm command (remove) for files.

There is also rmdir, which removes only an empty directory. To remove an entire directory, use rm -r, but be careful!!! Removing a directory this way in Linux removes all beneath it!!
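The difference between the ‘safe’ and the ‘dangerous’ way to delete a directory can be tried from Python, whose os.rmdir refuses a non-empty directory while shutil.rmtree removes the whole subtree. The sketch below works in a throwaway temporary directory; the names mail, prog, and list are borrowed from the slide's example tree.

```python
import os
import shutil
import tempfile

# Build a small subtree: <tmp>/mail/prog/list
base = tempfile.mkdtemp()
mail = os.path.join(base, "mail")
os.makedirs(os.path.join(mail, "prog"))
with open(os.path.join(mail, "prog", "list"), "w") as f:
    f.write("data")

# Safe deletion: refuses because 'mail' still has entries in it.
try:
    os.rmdir(mail)
    refused = False
except OSError:
    refused = True

# Recursive deletion: removes 'mail' and everything beneath it.
shutil.rmtree(mail)
gone = not os.path.exists(mail)

os.rmdir(base)  # the temporary directory is now empty, so this succeeds
```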


Tree-Structured Directories (cont)

Remember: in our directory system we have both absolute and relative path names.

Creating a new file is done in the current directory, unless we change directories or cite a different directory as part of the creation of the new file.

Delete a file?

rm <file-name>

Creating a new subdirectory is done in the current directory:

mkdir <dir-name>

[Figure: directory mail containing prog, copy, prt, exp, and count]

Deleting “mail” deletes the entire subtree rooted at “mail.” Be careful!!!


Acyclic-Graph Directories

Acyclic-graph directories give us the ability to share subdirectories and files.

Perhaps you wish to share resources with other people working on the same file or same project.

A tree data structure does not permit more than one path to an entry, so we need a different data structure.

An acyclic graph is a graph with no cycles, but unlike a tree, there may be more than one path to a node (file or subdirectory).

Thus this permits the same file / same subdirectory to be in two different higher-level directories.


Acyclic-Graph Directories (Cont.)

Note that sharing does not mean duplication. Quite the contrary!

There is only one copy of the item being shared!

If using an acyclic-graph directory structure, be careful. A file may have multiple absolute path names.

Referencing a file having more than one absolute path can cause problems in accumulating statistics on files or copying files to backup storage, or other issues too, such as accounting…

Deleting a file? With more than one path to a file, do we remove the file whenever anyone deletes it? This may well cause problems for other ‘users’ of this file referencing it by a different path name.

If links are used and a link is deleted, the file may still be present. But if the file itself is deleted, the space is de-allocated and we may well have links with no file!

Your book points out that Unix leaves symbolic links when a file is deleted, and it is then up to the user to realize that the original file is gone. Windows does the same thing.
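Sharing without duplication can be tried on a Unix-like system with a hard link, which gives one file a second name. The sketch assumes a file system that supports hard links; the file names are made up, and st_nlink is the per-file count of names kept by the system.

```python
import os
import tempfile

base = tempfile.mkdtemp()
original = os.path.join(base, "report.txt")
shared = os.path.join(base, "shared_report.txt")

with open(original, "w") as f:
    f.write("one copy")

# Add a second directory entry for the same file (no data is copied).
os.link(original, shared)
links_after_sharing = os.stat(original).st_nlink
same_file = os.path.samefile(original, shared)

# Deleting one name removes only that entry; the file survives under the other.
os.remove(original)
with open(shared) as f:
    still_readable = f.read()
links_after_delete = os.stat(shared).st_nlink

# Clean up: removing the last name lets the system free the file's space.
os.remove(shared)
os.rmdir(base)
```

With hard links the question "do we remove the file whenever anyone deletes it?" is answered by the count: the space is released only when the last name is gone.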


Links

Sharing files and subdirectories is very important and done all the time.

Unix accommodates this need by providing a new kind of directory entry called a link.

A link is effectively a pointer to another file or subdirectory. It can be an absolute or a relative path name.

In practice, when we reference a file, we search the current directory.

If the directory entry is marked as a link, then the name of the real file is included in the link information.

“We resolve the link by using that path name to locate the real file.”

Links are easily identifiable in a directory and are often called indirect pointers.

The operating system ignores these links when traversing directory trees to preserve the acyclic structure of the system.


More on Links – in Unix

In Unix, a symbolic link is also termed a soft link, and is a special kind of file that points to another file, much like a shortcut in Windows.

Unlike a hard link, which refers directly to the file's data, a symbolic link holds only a path name.

It simply points to another entry somewhere in the file system.

This difference gives symbolic links the ability to link to directories, or to files on remote computers networked through a network file system.

Also, when you delete a target file, symbolic links to that file become unusable.
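A dangling symbolic link is easy to produce on a Unix-like system. The sketch below uses Python's os.symlink in a temporary directory; the names target.txt and shortcut are made up, and the example assumes the platform lets an ordinary user create symbolic links.

```python
import os
import tempfile

base = tempfile.mkdtemp()
target = os.path.join(base, "target.txt")
link = os.path.join(base, "shortcut")

with open(target, "w") as f:
    f.write("hello")

# The link stores a path name, not the file's data.
os.symlink(target, link)
stored_path = os.readlink(link)
resolves_before = os.path.exists(link)   # follows the link to the target

# Delete the target: the link entry stays behind but no longer resolves.
os.remove(target)
entry_still_there = os.path.lexists(link)  # looks at the link itself
resolves_after = os.path.exists(link)      # follows the link, finds nothing

os.remove(link)
os.rmdir(base)
```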

(Google search on Unix, links)

http://kb.iu.edu/data/abhm.html


General Graph Directory

Here is a visual for a general graph directory.

You will note that there is a cycle present.

You can see that the graph is no longer acyclic: it is a ‘general graph’ and contains a cycle.


General Graph Directory (Cont.)

How do we guarantee no cycles? This is the main question.

We understand two-level directories and tree-structured directories.

But when we add links to an existing tree-structured directory, we no longer have a tree; we have a graph.

See figure 10.11. Again, note that this graph contains a cycle.

Bottom line is that we want to avoid cycles at all costs, and a general graph, as shown, may (this one does) contain cycles!

Cycles may cause infinite loops in searching and potentially degraded performance.

They cause problems too when we wish to delete a file, and more.

In acyclic graphs, we may keep a reference count for each entry; a count of 0 tells us there are no more references to a file or directory and hence it can be deleted.

In general graphs, however, when cycles are permitted, a reference count may not be 0 even when it is no longer possible to refer to a directory or a file: entries on a cycle keep each other's counts above 0 after the outside links are deleted.
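The reference-count problem can be seen in a tiny simulation. The directory graph below is a made-up example (entries root, a, and b, with b linking back to a); the helper names are hypothetical and the graph lives in memory only.

```python
def reference_counts(links):
    """Count how many directory entries refer to each node."""
    counts = {name: 0 for name in links}
    for targets in links.values():
        for target in targets:
            counts[target] += 1
    return counts


def reachable(links, start):
    """Set of nodes that can still be reached by following links from start."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(links[node])
    return seen


# node -> nodes it links to; a and b form a cycle.
links = {"root": ["a"], "a": ["b"], "b": ["a"]}

# Delete the only link from the rest of the file system into the cycle.
links["root"].remove("a")

counts = reference_counts(links)
still_reachable = reachable(links, "root")
```

After the deletion nothing can reach a or b from the root, yet each still has a count of 1 because they refer to each other, so a count-based scheme would never free them.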

So what to do:


General Graph Directory – Garbage Collection

One approach is to have a garbage collection routine to discover when there are no more references to an entry (hence space may be recovered).

Implementation:

A first pass traverses the entire file system, marking everything that can be accessed.

A second pass collects everything not marked onto a list of free space.

Unfortunately, traversing a file system in attempts to manage references to files that may / may not be deleted is very expensive and often not done.
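The two passes can be sketched as a mark phase and a sweep phase over an in-memory graph. The node names and the links table are made up for the illustration; a real file system would be walking disk structures, which is where the expense comes from.

```python
def mark(links, root):
    """First pass: mark everything that can be accessed from the root."""
    marked = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node not in marked:
            marked.add(node)
            stack.extend(links.get(node, []))
    return marked


def sweep(links, marked):
    """Second pass: collect everything not marked onto a free list."""
    free_list = sorted(node for node in links if node not in marked)
    for node in free_list:
        del links[node]
    return free_list


# node -> nodes it links to; a and b are an unreachable cycle.
links = {
    "root": ["docs"],
    "docs": ["notes"],
    "notes": [],
    "a": ["b"],
    "b": ["a"],
}

marked = mark(links, "root")
free_list = sweep(links, marked)
remaining = sorted(links)
```

The mark set doubles as the visited set, so the traversal terminates even when the graph contains cycles.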


Acyclic Graph – Garbage Collection

We need garbage collection only for a file system that permits cycles.

In acyclic graphs, we can keep a reference count for each entry; a count of 0 tells us there are no more references to a file or directory and hence it can be deleted.

So in an acyclic graph, reclaiming space is much easier to deal with, since no cycles are permitted.

But as we add links, we must be certain that new additions will not result in a cycle, if we are to maintain the acyclic nature of the directory.

We can keep the graph acyclic by using an algorithm that determines when a new link will cause a cycle. But running such an algorithm is very expensive when analyzing a large directory structure on disk.
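One way such a check could work, sketched on an in-memory graph: before adding a link from a directory to a target, test whether the target can already reach that directory; if it can, the new link would close a cycle. The function names and the example tree are assumptions for the illustration, not a description of what any particular operating system does.

```python
def reaches(links, start, goal):
    """True if goal can be reached from start by following existing links."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(links.get(node, []))
    return False


def add_link(links, directory, target):
    """Add directory -> target unless that would create a cycle."""
    if reaches(links, target, directory):
        return False  # target already leads back to directory: refuse
    links.setdefault(directory, []).append(target)
    return True


links = {"root": ["spell"], "spell": ["mail"], "mail": []}

shared_ok = add_link(links, "root", "mail")        # second path to mail, still acyclic
cycle_refused = add_link(links, "mail", "spell")   # spell already reaches mail
```

Every check may walk a large part of the graph, which on disk means many directory reads; that is the cost the slide warns about.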

A simpler approach for directories and links is to bypass any links during directory traversals.

This precludes any possibility of a cycle and costs very little.
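Python's os.walk shows this policy in practice: with followlinks=False (its default) it lists symbolic links to directories but does not descend into them. The sketch builds a deliberate loop in a temporary directory on a Unix-like system; the names a and loop are made up.

```python
import os
import tempfile

base = tempfile.mkdtemp()
os.mkdir(os.path.join(base, "a"))

# A symbolic link inside 'a' that points back up to the top: a cycle if followed.
loop = os.path.join(base, "a", "loop")
os.symlink(base, loop)

# Links are bypassed during the traversal, so it finishes despite the loop.
visited = [
    os.path.relpath(dirpath, base)
    for dirpath, dirnames, filenames in os.walk(base, followlinks=False)
]

# Clean up: remove the link itself, then the now-empty directories.
os.remove(loop)
os.rmdir(os.path.join(base, "a"))
os.rmdir(base)
```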

End of Chapter 10.2
