
File Management

How much does the operating system know?

• Some systems support different file types

– Advantage: prevents you trying to read executable files

– Disadvantage: added complexity; can’t cope with new file types, e.g. MP3

• Both MS-DOS and UNIX don’t care: a file is considered to be a sequence of bytes with no structure

• However, UNIX recognises

– Regular files: text, data, etc.

– Directories

– Char/block: files which refer to devices

– Pipes: FIFO buffers

• MS-DOS only really has attributes

– System

– Archive

– Hidden

– Read-only

• Application packages do the rest


File System Services

In one form or another, all file systems provide applications with the ability to:

• Create a file

• Remove a file

• Open an existing file

• Read from an open file

• Write to an open file

• Close an open file

• Fetch metadata of a file

• Modify metadata of a file

Metadata are data about a file, e.g. file attributes (name, size, data type, etc.), data about records or data structures (length, fields, columns, etc.) and data about data (where it is located, how it is associated, ownership, etc.).
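The services listed above can be tried out in a few lines. The following is only a sketch in Python, using the standard `os` module; the function name `demo_file_services` and the file name `example.dat` are made up for illustration.

```python
import os
import tempfile

def demo_file_services(directory):
    """Use each basic file-system service once on a scratch file."""
    path = os.path.join(directory, "example.dat")  # hypothetical file name

    # Create a file and write to it; leaving the 'with' block closes it.
    with open(path, "wb") as f:
        f.write(b"hello, file system\n")

    # Open the existing file and read from it.
    with open(path, "rb") as f:
        content = f.read()

    # Fetch metadata: the size in bytes.
    size = os.stat(path).st_size

    # Modify metadata: set the access and modification times.
    os.utime(path, (1_000_000_000, 1_000_000_000))
    mtime = os.stat(path).st_mtime

    # Remove the file.
    os.remove(path)
    return content, size, mtime, os.path.exists(path)

with tempfile.TemporaryDirectory() as tmp:
    content, size, mtime, still_exists = demo_file_services(tmp)
```

Note that the data itself (`content`) and the metadata (`size`, `mtime`) are obtained through different calls.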


File Structure

• In the simplest scenario the data is totally unstructured and appears as a stream of bytes.

– The disadvantage of this approach is that each application may treat data structures differently, e.g. one program may treat the fields in a database in a totally different way from another.

• The second way is to store and process data in terms of records of 80 (or some other fixed number of) characters.

– E.g. the first nine characters might be Social Security Number, the next 15 might be the first name etc.

• When dealing with fixed sized records, the record size is usually stored in the file’s metadata.

• However, there are disadvantages in having the operating system know about the file structures.

– The principal of these is the resultant size and complexity of the system.

– Additionally, a new application may require a file structure or access facility not implemented by the supplied system.
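The fixed-size record idea can be sketched with Python's `struct` module. The layout below (a 9-character Social Security Number followed by a 15-character first name, 24 bytes in total rather than 80) and the helper names are assumptions chosen for illustration only.

```python
import struct

# Hypothetical record layout: 9-byte SSN, then 15-byte first name.
RECORD = struct.Struct("9s15s")

def pack_record(ssn, first_name):
    """Build one fixed-size record; short fields are padded with zero bytes."""
    return RECORD.pack(ssn.encode("ascii"), first_name.encode("ascii"))

def read_record(data, index):
    """Fetch record number `index` by computing its byte offset."""
    offset = index * RECORD.size
    ssn, first = RECORD.unpack_from(data, offset)
    return ssn.decode("ascii"), first.rstrip(b"\x00").decode("ascii")

data = pack_record("123456789", "Ada") + pack_record("987654321", "Grace")
second = read_record(data, 1)
```

Because every record has the same size, record `n` always starts at byte `n * RECORD.size`, which is why the record size is worth keeping in the file's metadata.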


In this respect, UNIX adopts an extreme position; files are considered to be sequences of bytes with no structure. UNIX recognises a limited number of ‘file types’, which are described below:

• regular: ‘ordinary’ files such as programs, text, data etc.; in fact any file which is not of the other types.

• directory: a file containing references to other files; described shortly.

• char/block: not ‘true’ files at all but directory entries which refer to devices.

• pipe: a pipe file is used as a queuing buffer which holds the standard output of one process and supplies this data as the standard input of another process.
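On a UNIX-like system the file type is recorded in the mode field returned by the `stat` system call. The sketch below uses Python's standard `stat` module to map a mode value onto the type names used above; the function name `unix_file_type` is made up for this example.

```python
import os
import stat
import tempfile

def unix_file_type(mode):
    """Return the slide's name for the file type encoded in a stat mode."""
    if stat.S_ISREG(mode):
        return "regular"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISCHR(mode):
        return "char"
    if stat.S_ISBLK(mode):
        return "block"
    if stat.S_ISFIFO(mode):
        return "pipe"
    return "other"

with tempfile.TemporaryDirectory() as tmp:
    dir_type = unix_file_type(os.stat(tmp).st_mode)
    path = os.path.join(tmp, "notes.txt")
    with open(path, "w") as f:
        f.write("text")
    file_type = unix_file_type(os.stat(path).st_mode)
```

The same function can be applied to a device entry such as `/dev/null` on a UNIX system, which would normally be reported as a character device.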


Both Microsoft and UNIX use directories, which are notional groupings of files; since directories reside on disk, they can be considered as special files.

With the exception of directories, the nearest that Microsoft comes to having different file types is that files can have certain attributes. The possible attributes are:

• System: assigned to system files such as the operating system files.

• Archive: used by file back-up systems.

• Hidden: a file with this attribute is ignored by many system commands.

• Read-only: the file cannot be written to or deleted.

Attributes are not mutually exclusive; e.g. a ‘read-only’ file can also be ‘hidden’.
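Attributes that are not mutually exclusive are naturally stored as independent bits in one byte. The sketch below assumes the bit values used in FAT directory entries (read-only 0x01, hidden 0x02, system 0x04, archive 0x20); the class name `FileAttr` is illustrative.

```python
from enum import IntFlag

class FileAttr(IntFlag):
    """Attribute bits, using the values found in a FAT directory entry."""
    READ_ONLY = 0x01
    HIDDEN = 0x02
    SYSTEM = 0x04
    ARCHIVE = 0x20

# A file can be read-only and hidden at the same time.
attrs = FileAttr.READ_ONLY | FileAttr.HIDDEN

is_hidden = FileAttr.HIDDEN in attrs
is_system = FileAttr.SYSTEM in attrs

# Clearing one attribute leaves the others untouched.
visible_again = attrs & ~FileAttr.HIDDEN
```

Each attribute is tested, set or cleared with a single bitwise operation, independently of the others.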


File identification

• The original Microsoft Windows naming convention was the 8.3 filename convention:

BASENAME.EXT

• When the Internet first arrived, Windows systems were still restricted to 8.3 filenames and had to create web pages with names ending in .HTM, while Macintosh and Unix systems used the .html filename extension.

• Java posed a similar problem, since source code files must have the extension .java and compiled object code the extension .class.

• Eventually, Windows introduced support for long file names and removed the 8.3 name/extension split in file names. It changed the length restriction to 255 characters and allowed a mix of upper-case and lower-case letters.

• The use of three-character extensions under Microsoft Windows has continued, mainly for backward compatibility (an extension can be longer, as long as the whole name does not exceed 255 characters).

• The characters / \ ? : * < > " | and control characters cannot be used in a filename.
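The naming rules above can be expressed as two small checks. This is a simplified sketch: the function names are made up, and real 8.3 names carry further restrictions (for example on spaces and letter case) that are ignored here.

```python
FORBIDDEN = set('/\\?:*<>"|')

def is_valid_long_name(name):
    """Check the long-file-name rules: at most 255 characters, none forbidden."""
    if not name or len(name) > 255:
        return False
    return not any(ch in FORBIDDEN or ord(ch) < 32 for ch in name)

def is_8_3_name(name):
    """Check the BASENAME.EXT shape: 1-8 characters, optional dot, 0-3 characters."""
    base, _dot, ext = name.partition(".")
    if "." in ext:          # more than one dot cannot fit the 8.3 shape
        return False
    return 1 <= len(base) <= 8 and len(ext) <= 3 and is_valid_long_name(name)

htm_ok = is_8_3_name("INDEX.HTM")
html_ok = is_8_3_name("INDEX.HTML")
```

This shows why early Windows web pages ended in .HTM: a four-character extension simply does not fit the 8.3 shape, although it is a perfectly good long file name.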


• Unix stores the file name as a single string, not split into base name and extension components, with the '.' being just another character. Some applications use suffixes to indicate file types, but suffixes are used less than under Windows; for example, executables and ordinary text files traditionally have no suffixes in their names.


Directories

• Early operating systems ‘lumped together’ all the files on a disk.

• Files belonging to several different users and/or applications could not be readily distinguished, which led to problems with file naming, security and ‘housekeeping’.

– For example, if several people were using the disk, the names of files had to be strictly controlled, either by some person assigned to this task or by enforcing conventions which avoided name conflicts.

– This was not a problem in earlier systems where, in effect, access to the computer was centralised in the data processing department.

• Systems introduced directories as a logical grouping of files, managed by using a special directory file which contains a list of the directory’s member files. The first directory systems were simply two-level; the top level contained user names, each with a pointer to another directory which held all the files for that user.


Simple Two Level Directory Structure


• For each of its component files, the directory will generally hold information pertaining to the file, e.g.:

– filename

– file type, if the system recognises different file types

– file attributes

– information indicating the location of the file on the disk

– access rights, i.e. an indication of who can access the file and how it can be accessed

– file size in bytes

– date information, e.g. date of creation, date of last access, date of last amendment

Note that it is admissible to have two or more files with the same name within the system provided that they are in separate directories.
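A two-level directory can be modelled as a master directory that maps user names to per-user directories. The sketch below is illustrative only: the `DirectoryEntry` fields are a subset of those listed above, and the sizes and block numbers are invented.

```python
from dataclasses import dataclass

@dataclass
class DirectoryEntry:
    filename: str
    size_bytes: int
    first_block: int        # location of the file on disk
    read_only: bool = False

# Master directory: user name -> user directory (filename -> entry).
master = {"ADAMS": {}, "JONES": {}}

def add_file(user, entry):
    """Add an entry to one user's directory; names must be unique within it."""
    user_dir = master[user]
    if entry.filename in user_dir:
        raise FileExistsError(entry.filename)
    user_dir[entry.filename] = entry

# The same name may appear in two different directories.
add_file("ADAMS", DirectoryEntry("PROG1", 1200, first_block=40))
add_file("JONES", DirectoryEntry("PROG1", 300, first_block=97))
```

A file is then identified by the pair (user, filename), so the two PROG1 files never clash.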


Managing file space

• Generally, space is allocated in units of a fixed size, called allocation units or blocks. The block size is a simple multiple of the disk's physical sector size, which is usually 512 bytes. Typical block sizes are 512, 1024 and 2048 bytes.

• In Unix the block size is generally 1 KB (1024 bytes).

• Each disk block has a unique address, the disk block number.
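Because space is handed out in whole blocks, a file usually wastes part of its last block. A small sketch of the arithmetic, with a made-up function name and a default block size of 1024 bytes:

```python
def blocks_needed(file_size, block_size=1024):
    """Return (number of blocks, unused bytes in the last block)."""
    blocks = -(-file_size // block_size)      # ceiling division
    slack = blocks * block_size - file_size
    return blocks, slack

with_1k_blocks = blocks_needed(2500)          # 3 blocks, 572 bytes unused
with_512_blocks = blocks_needed(2500, 512)    # 5 blocks, 60 bytes unused
```

Smaller blocks waste less space per file but mean more block addresses to keep track of.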


The actual representation of the set of free blocks generally takes one of several forms:

• Firstly, there is the free bitmap. In this representation, each block is represented by a single bit, which is 1 if the block is free and 0 if allocated.

• The second representation is a free list, normally implemented as a linked list. Because each link can be stored in the free block itself, the system needs to keep only a single pointer to the head of the list.
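The free bitmap described above can be sketched as follows. For readability this version keeps one list element per block instead of packing the bits into bytes, and the class name `FreeBitmap` is made up.

```python
class FreeBitmap:
    """Free-space map: 1 means the block is free, 0 means allocated."""

    def __init__(self, n_blocks):
        self.bits = [1] * n_blocks

    def allocate(self):
        """Find the first free block, mark it allocated, return its number."""
        for block, bit in enumerate(self.bits):
            if bit == 1:
                self.bits[block] = 0
                return block
        raise OSError("no free blocks")

    def release(self, block):
        """Mark a block as free again."""
        self.bits[block] = 1

bitmap = FreeBitmap(4)
first = bitmap.allocate()
second = bitmap.allocate()
bitmap.release(first)
third = bitmap.allocate()   # reuses the block that was just released
```

A real implementation would pack eight blocks into each byte, so a disk of one million blocks would need a bitmap of about 122 KB.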


• The third representation for free blocks is a simple list of free blocks. If there is at least one free block on the disk then the list can be stored in the free blocks themselves. However, there must be a way to identify other blocks if the entire list doesn’t fit into one block. One approach is to create a linked list of these list blocks, using the last pointer in each block to point to the next block in the list.


Treatment of Devices and Files

• As far as the user is concerned, all sources of input and output in a Unix system are represented as files. Terminals, disk drives, files, and communication mechanisms such as pipes and sockets all look alike to the systems programmer and are treated in the same way.

[Figure: layers of the I/O system, from top to bottom: Users; System Call Interface; I/O Subsystem; Device Driver Interface; Drivers; devices (Terminal, Disk, Network)]
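The uniform treatment can be seen from a program's point of view: the same read and write calls work on any file descriptor. In the sketch below a pipe is the source and a regular file is the destination; the helper name `copy_bytes` and the file name `out.dat` are made up for illustration.

```python
import os
import tempfile

def copy_bytes(src_fd, dst_fd, count):
    """Copy up to `count` bytes between any two file descriptors."""
    data = os.read(src_fd, count)
    os.write(dst_fd, data)
    return data

# Source: a pipe. Destination: a regular file. The calls are identical.
read_end, write_end = os.pipe()
os.write(write_end, b"via pipe")
os.close(write_end)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.dat")
    dst = os.open(path, os.O_WRONLY | os.O_CREAT)
    copied = copy_bytes(read_end, dst, 64)
    os.close(dst)
    os.close(read_end)
    with open(path, "rb") as f:
        saved = f.read()
```

`copy_bytes` never needs to know what kind of object sits behind each descriptor; that decision is made lower down, in the I/O subsystem and the drivers.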


File Management

File naming

MS-DOS – up to 8 character name + dot + 3 character extension

UNIX – traditional limit = 14 characters, but Linux = 255

no structure required – any character except / (and the null character) is allowed

spaces and >, *, ? are best avoided because they have special meaning to the shell

Windows – up to 255 characters, can have spaces

also generates an MS-DOS (8.3) filename

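[Figure: Two Level Directory System. The master directory holds the list of user names (ADAMS, JONES, SMITH); each name points to a user directory. The user directory for ADAMS contains the entry PROG1, which points to file PROG1.]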


File Management

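[Figure: hierarchical directory tree. ROOT is at the top; the labels below it are system and cprogs, then edit, spool, src, include and work, then the files prog1.c, prog2.c, prog1.e and file.dat.]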

A directory normally holds the following information:

• filename

• file type

• file attributes

• location on disk

• access rights

• file size

• date information


File Management

Clusters

[Figure: disk space as an array of clusters, showing an allocated file and the unused portion of its last cluster]

• Cluster sizes range from 512 bytes to 64 KB

• With LBA addressing, each cluster has a 32-bit address

• Therefore there are 2^32 = 4,294,967,296 addresses

• At 512 bytes per cluster: 4,294,967,296 × 512 = 2,199,023,255,552 bytes = 2 terabytes (2^41 bytes)
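The same calculation can be repeated for other cluster sizes. This is just the arithmetic above in code form; the function name is made up, and the 32-bit address width is the one stated on the slide.

```python
ADDRESS_BITS = 32

def max_volume_bytes(cluster_size):
    """Largest volume addressable with 32-bit cluster addresses."""
    return (2 ** ADDRESS_BITS) * cluster_size

for size in (512, 4096, 65536):
    tib = max_volume_bytes(size) / 2 ** 40
    print(f"{size:>6}-byte clusters -> {tib:g} TiB")
```

With 512-byte clusters the limit is 2 TiB, as above; with 64 KB clusters the same 32-bit address reaches 256 TiB, at the cost of more wasted space in the last cluster of each file.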


File Management

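[Figure: Space allocation with chained clusters. A typical allocation of several files across a row of clusters (A A B A A C C C B B): the directory entries point to the first cluster of each file, each cluster is chained to the next cluster of the same file, and the last cluster of File A and of File B carries an end-of-file marker. A free cluster is also shown.]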


File Management (FAT Table)

Directory entry:

ORDERS DAT no attribs 9/12/00 11:23:44 40 11,230

Field 1 – filename
Field 2 – extension, e.g. .txt, .dat
Field 3 – attributes, e.g. hidden, read only, directory
Field 4 – date last modified
Field 5 – time last modified
Field 6 – starting cluster
Field 7 – size in bytes

File Allocation Table (FAT)

Entry#  Value
...     ...
39      EOF
40      41
41      42
42      44
43      Bad
44      102
...     ...
102     103
103     EOF
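Following the chain for ORDERS.DAT is a simple loop: start at the cluster given in the directory entry (40) and keep looking up the next cluster until the end-of-file marker is reached. The sketch below holds the table fragment from the slide in a dictionary; the names `fat` and `cluster_chain` are illustrative.

```python
EOF = "EOF"
BAD = "Bad"

# The FAT fragment shown above: entry number -> value.
fat = {39: EOF, 40: 41, 41: 42, 42: 44, 43: BAD, 44: 102, 102: 103, 103: EOF}

def cluster_chain(fat, start):
    """Return the list of clusters that make up a file, in order."""
    chain = []
    cluster = start
    while cluster != EOF:
        chain.append(cluster)
        cluster = fat[cluster]
    return chain

orders_dat = cluster_chain(fat, 40)   # starting cluster from the directory entry
```

The chain is 40, 41, 42, 44, 102, 103: it skips the bad cluster 43 and jumps from 44 to 102. Six clusters for 11,230 bytes would be consistent with a cluster size of 2048 bytes, although the slide does not state the cluster size.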


Current Windows File Systems

• HPFS (High Performance File System) is used by OS/2 and was supported by early versions of Windows NT. It provides better performance than FAT on larger disk volumes and supports long file names.

• NTFS (New Technology File System) is the standard file system of Windows NT, including its later versions Windows 2000, Windows XP, Windows Server 2003, Windows Server 2008, Windows Vista, and Windows 7.

• NTFS supersedes the FAT file system as the preferred file system for Microsoft’s Windows operating systems. NTFS has several improvements over FAT and HPFS.

• NTFS supports long file names including Unicode filenames, large volumes, data security, and universal file sharing.

• Formatting a volume with the NTFS file system results in the creation of several system files and the Master File Table (MFT), which contains information about all the files and folders on the NTFS volume.


Master File Table

• Logically, the disk consists of allocation units called clusters.

• A cluster is a power-of-two multiple of the physical disk block size. The cluster size is set when the disk is formatted.

• The free list is a bitmap, each of whose bits describes one cluster.

• Clusters on the disk are numbered starting from zero to the maximum number of clusters (minus one). These numbers are called logical cluster numbers (LCN) and are used to name blocks (clusters) on disk.


MFT

Standard information: This attribute includes the information that was standard in the MS-DOS world:

• read/write permissions,

• creation time,

• last modification time,

• count of how many directories point to this file (hard link count).

File Name: This attribute describes the file's name in the Unicode character set.

Security Descriptor: This attribute lists which user owns the file and which users can access it (and how they can access it).

Data: This attribute either contains the actual file data, in the case of a small file, or points to the data.


MFT

• When dealing with large data, the Data attribute contains pointers to the data.

• The pointers to data are actually pointers to sequences of logical clusters on the disk.

• Each sequence is identified by three parts:

– starting cluster in the file, called the virtual cluster number (VCN),

– starting logical cluster (LCN) of the sequence on disk,

– length, counted as the number of clusters.

• The run of clusters is called an extent.
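Translating a position in the file (a VCN) into a position on disk (an LCN) means finding the extent that covers it. The sketch below is a simplified model, not the on-disk NTFS format; the `Extent` type, the function name and the cluster numbers are invented for illustration.

```python
from typing import NamedTuple

class Extent(NamedTuple):
    vcn: int      # first virtual cluster number covered by this run
    lcn: int      # logical cluster number where the run starts on disk
    length: int   # number of clusters in the run

def vcn_to_lcn(extents, vcn):
    """Map a cluster offset within the file to a cluster number on disk."""
    for run in extents:
        if run.vcn <= vcn < run.vcn + run.length:
            return run.lcn + (vcn - run.vcn)
    raise ValueError(f"VCN {vcn} is not mapped")

# A six-cluster file stored as two runs: 4 clusters at LCN 1200, 2 at LCN 3000.
runs = [Extent(0, 1200, 4), Extent(4, 3000, 2)]
third_cluster = vcn_to_lcn(runs, 2)
last_cluster = vcn_to_lcn(runs, 5)
```

Two extents describe all six clusters here, whereas a FAT-style chain needs one table entry per cluster.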


Unix File Systems

boot block - used to boot the operating system.

super block - main function of the super block is to tell the file system how big the various pieces of the file system are.

The super block contains the following information, to keep track of the entire file system.

• Size of the file system

• Number of free blocks on the system

• A list of free blocks

• Index to next free block on the list

• Size of the inode list

• Number of free inodes

• A list of free inodes

• Index to next free inode on the list

• Lock fields for free block and free inode lists

• Flag to indicate modification of super block

inode list - followed by the blocks available for storage. Note that the free space is maintained as a linked list of available blocks.


Inodes


Inode Pointer Structure


END



• Encryption: the Encrypting File System (EFS) provides the core file encryption technology.

• Disk Quotas: disk quotas can be used to monitor and limit disk-space use.

• Reparse Points: similar to Windows shortcuts and Unix symbolic links. For example, a reparse point would allow a folder such as C:\DVD to point to E:, the actual DVD drive.

• Volume Mount Points: suppose you already have one hard disk (Drive 1) mapped as C: and you don't want to map a second disk (Drive 2) as D:. You can get around this by adding a mount point to the directory structure of Drive 1 that references Drive 2.

• Distributed Link Tracking: maintains the integrity of shortcuts to files as well as OLE links within compound documents.

• Sparse Files: sparse files allow programs to create very large files but consume disk space only as needed.


Shared and Exclusive Access

• If a file is already open and another process wants access to it, the operating system has to decide whether to allow this or block it.

– In practice both cases may be desirable.

– For instance, if both processes are only reading the file, sharing is fine; however, if both processes want to write to the file, it may lead to inconsistent data.

– Consequently most file systems allow for both.

Two methods of requesting exclusive access are:

• The system call to open the file is passed a flag to say it is to be opened exclusively – if another process wants to access it, then it will have to wait.

• A system call which has the ability to lock a file or parts of it

– The difference between locking a file and locking an area of memory is that processes declare when they intend to write to a file.
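The second method, a system call that locks a file, can be sketched with `flock`, which Python exposes in the `fcntl` module. This is Unix-only. As a simplifying assumption, two separate opens of the same file within one process stand in for two processes, and the helper name `try_exclusive_lock` is made up.

```python
import fcntl
import os
import tempfile

def try_exclusive_lock(f):
    """Ask for an exclusive lock without waiting; report whether we got it."""
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "shared.dat")
    with open(path, "w") as first, open(path, "r") as second:
        got_first = try_exclusive_lock(first)          # lock is granted
        got_second = try_exclusive_lock(second)        # refused: already locked
        fcntl.flock(first, fcntl.LOCK_UN)              # release the lock
        got_after_unlock = try_exclusive_lock(second)  # now it is granted
```

Without `LOCK_NB` the second request would wait until the lock is released instead of failing. These locks are advisory: they only protect the file from other programs that also ask for the lock.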


Access Patterns

• More often than not a process expects to open a file and begin reading and writing at the beginning.

• Each subsequent read or write continues where the last one left off.

• This type of sequential access requires that the operating system keeps tabs on the current location.

• However there are times when random access is required.

• Random access is usually provided through a rewind or seek operation. Both are available in the ‘C’ programming language (as rewind and fseek).
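Both access patterns can be seen in a short sketch. It is written in Python rather than C, where `seek` plays the role of `fseek` and `seek(0)` the role of `rewind`; the ten-byte scratch file is invented for illustration.

```python
import tempfile

with tempfile.TemporaryFile() as f:
    f.write(b"ABCDEFGHIJ")
    f.seek(0)                 # rewind to the beginning

    # Sequential access: each read continues where the last one left off.
    first = f.read(3)
    position = f.tell()       # the current location kept for this open file
    second = f.read(3)

    # Random access: jump straight to byte 7.
    f.seek(7)
    jumped = f.read(2)

    # Rewind and start again from the beginning.
    f.seek(0)
    again = f.read(1)
```

`tell` exposes the current location that the system tracks for sequential access; `seek` simply overwrites it.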


Uniform treatment of devices and files


File Management

[Figure: a DIRECTORY with entries for File A, File B and File C, each pointing to that file's blocks on a disk whose blocks are numbered 1 to 11]