History
First developed in 1969 by Ken Thompson and Dennis Ritchie of the Research Group at Bell Laboratories; incorporated features of other operating systems, especially MULTICS.
The third version was written in C, which was developed at Bell Labs specifically to support UNIX.
History (cont)
The most influential of the non-Bell Labs and non-AT&T UNIX development groups was the University of California at Berkeley (Berkeley Software Distributions).
– 4BSD UNIX resulted from DARPA funding to develop a standard UNIX system for government use.
– Developed for the VAX, 4.3BSD is one of the most influential versions, and has been ported to many other platforms.
History (cont)
Several standardization projects seek to consolidate the variant flavors of UNIX, leading to one programming interface to UNIX.
[Figure: History of UNIX Versions]
Early Advantages of UNIX
Written in a high-level language.
Distributed in source form.
Provided powerful operating system primitives on an inexpensive platform.
Small size, modular, clean design.
UNIX Design Principles
Designed to be a time-sharing system.
Has a simple standard user interface (shell) that can be replaced.
File system with multilevel tree-structured directories.
Files are supported by the kernel as unstructured sequences of bytes.
UNIX Design Principles (cont)
Supports multiple processes; a process can easily create new processes.
High priority given to making the system interactive, and to providing facilities for program development.
Programmer Interface
Like most computer systems, UNIX consists of two separable parts:
– Kernel: everything below the system call interface and above the physical hardware. Provides the file system, CPU scheduling, memory management, and other OS functions through system calls.
– Systems programs: use the kernel-supported system calls to provide useful functions, such as compilation and file manipulation.
[Figure: 4.3BSD Layer Structure]
System Calls
System calls define the programmer’s interface to UNIX.
The set of system programs commonly available defines the user interface.
The programmer and user interfaces define the context that the kernel must support.
System Calls (cont)
Roughly, there are three categories of system calls in UNIX:
– File manipulation (same system calls also support device manipulation).
– Process control.
– Information manipulation.
File Manipulation
A file is a sequence of bytes; the kernel does not impose a structure on files.
Files are organized in tree-structured directories.
Directories are files that contain information on how to find other files.
File Manipulation (cont)
Path name: identifies a file by specifying a path through the directory structure to the file.
– Absolute path names start at the root of the file system.
– Relative path names start at the current directory.
System calls for basic file manipulation: creat, open, read, write, close, unlink, truncate.
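As a rough illustration of the calls listed above, the following sketch drives them through Python's os module, which exposes thin wrappers over the UNIX system calls. The file name demo.txt and the helper file_roundtrip are hypothetical, and open with O_CREAT stands in for a separate creat call.

```python
import os
import tempfile

def file_roundtrip(directory):
    """Create, write, read back, truncate, and unlink one file."""
    path = os.path.join(directory, "demo.txt")  # hypothetical file name

    # open with O_CREAT | O_TRUNC plays the role of creat here.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    os.write(fd, b"hello, unix\n")
    os.close(fd)

    # Reopen read-only and read the bytes back; the kernel sees only bytes.
    fd = os.open(path, os.O_RDONLY)
    data = os.read(fd, 100)
    os.close(fd)

    # Shorten the file to 5 bytes, then remove its directory entry.
    os.truncate(path, 5)
    size_after_truncate = os.stat(path).st_size
    os.unlink(path)
    return data, size_after_truncate, os.path.exists(path)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(file_roundtrip(d))
```

Every call works on a descriptor or a path name; nothing in the interface depends on what the bytes mean.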
[Figure: Typical UNIX Directory Structure]
Process Control
A process is a program in execution.
Processes are identified by their process identifier, an integer.
Process control system calls:
– fork creates a new process.
– execve is used after a fork to replace one of the two processes’ virtual memory space with a new program.
Process Control (cont)
– exit terminates a process.
– A parent may wait for a child process to terminate; wait provides the process id of a terminated child so that the parent can tell which child terminated.
– wait3 allows the parent to collect performance statistics about the child.
A zombie process is a terminated (defunct) child whose parent has not yet collected its exit status with wait.
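The fork, execve, exit, and wait sequence can be sketched as follows. This is a minimal illustration using Python's os wrappers, not a full shell; run_child is a hypothetical helper, and it uses waitpid rather than plain wait so that it collects only the child it created.

```python
import os
import sys

def run_child(argv):
    """Fork, exec argv[0] in the child, and wait for it in the parent."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the new program.
        try:
            os.execve(argv[0], argv, os.environ)
        finally:
            # Only reached if execve failed; never return into the caller.
            os._exit(127)
    # Parent: block until that child terminates, then decode its status.
    waited_pid, status = os.waitpid(pid, 0)
    return pid, waited_pid, os.WEXITSTATUS(status)

if __name__ == "__main__":
    # The child runs a tiny program that exits with status 7.
    print(run_child([sys.executable, "-c", "import sys; sys.exit(7)"]))
```

Between the child's exit and the parent's waitpid, the child exists only as a defunct entry holding its exit status.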
[Figure: Illustration of Process Control Calls]
Process Control (cont)
Processes communicate via pipes: queues of bytes between two processes that are accessed through file descriptors.
All user processes are descendants of one original process, init.
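A pipe between a parent and a child can be sketched like this, assuming Python's os wrappers for pipe, fork, read, and write; pipe_message is a hypothetical helper name.

```python
import os

def pipe_message(message):
    """Send bytes from a forked child to its parent through a pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: keep only the write end, send the message, and exit.
        try:
            os.close(read_fd)
            os.write(write_fd, message)
            os.close(write_fd)
        finally:
            os._exit(0)
    # Parent: keep only the read end and read until end of file.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks)

if __name__ == "__main__":
    print(pipe_message(b"hello through a pipe"))
```

The parent sees end of file only after every copy of the write end is closed, which is why each side closes the end it does not use.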
Process Control (cont)
init forks a getty process, which initializes terminal line parameters and passes the user’s login name to login.
– login sets the numeric user identifier of the process to that of the user.
– Executes a shell which forks subprocesses for user commands.
Process Control (cont)
The setuid bit on an executable file sets the effective user identifier of the process to the user identifier of the owner of the file, and leaves the real user identifier as it was.
The setuid scheme allows certain processes to have more than ordinary privileges while still being executable by ordinary users.
Signals
Facility for handling exceptional conditions, similar to software interrupts.
The interrupt signal, SIGINT, is used to stop a command before that command completes (usually produced by ^C).
Signals (cont)
Signal use has expanded beyond dealing with exceptional events.
– Start and stop subprocesses on demand.
– SIGWINCH informs a process that the window in which output is being displayed has changed size.
– Deliver urgent data from network connections.
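To make the software-interrupt analogy concrete, here is a small sketch that installs a handler and then signals its own process. It assumes Python's signal module and must run in the main thread; catch_signal is a hypothetical helper, and SIGUSR1 is chosen only because it is harmless to send.

```python
import os
import signal
import time

def catch_signal(signum=signal.SIGUSR1, timeout=1.0):
    """Install a handler, send the signal to ourselves, report what arrived."""
    received = []

    def handler(num, frame):
        # Runs asynchronously with respect to the main flow of control.
        received.append(num)

    previous = signal.signal(signum, handler)
    try:
        os.kill(os.getpid(), signum)  # kill delivers a signal to a process
        deadline = time.monotonic() + timeout
        while not received and time.monotonic() < deadline:
            time.sleep(0.01)
    finally:
        # Restore the earlier disposition if Python knows what it was.
        if previous is not None:
            signal.signal(signum, previous)
    return received

if __name__ == "__main__":
    print(catch_signal())
```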
Process Groups
Set of related processes that cooperate to accomplish a common task.
Only one process group may use a terminal device for I/O at any time.
– The foreground job has the attention of the user on the terminal.
– Background jobs: nonattached jobs that perform their function without user interaction.
Access to the terminal is controlled by process group signals.
Process Groups (cont)
Each job inherits a controlling terminal from its parent.
– If the process group of the controlling terminal matches the group of a process, that process is in the foreground.
– SIGTTIN or SIGTTOU freezes a background process that attempts to perform I/O; if the user foregrounds that process, SIGCONT indicates that the process can now perform I/O.
– SIGSTOP freezes a foreground process.
Information Manipulation
System calls to set and return an interval timer: getitimer/setitimer.
Calls to set and return the current time: gettimeofday/settimeofday.
Processes can ask for:
– Their process identifier: getpid
– Their group identifier: getgid
– The name of the machine on which they are executing: gethostname
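The information calls above can be exercised from Python, whose os, socket, and signal modules wrap getpid, getgid, gethostname, and getitimer/setitimer. The helpers process_info and arm_timer are hypothetical names; time.time() stands in for gettimeofday.

```python
import os
import signal
import socket
import time

def process_info():
    """Collect the identifiers a process can ask the kernel for."""
    return {
        "pid": os.getpid(),
        "gid": os.getgid(),
        "host": socket.gethostname(),
        "time": time.time(),  # seconds since the epoch
    }

def arm_timer(seconds):
    """Set the real-time interval timer, read it back, then disarm it."""
    signal.setitimer(signal.ITIMER_REAL, seconds)
    remaining, interval = signal.getitimer(signal.ITIMER_REAL)
    signal.setitimer(signal.ITIMER_REAL, 0)  # disarm before it can fire
    return remaining, interval

if __name__ == "__main__":
    print(process_info())
    print(arm_timer(60.0))
```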
Library Routines
The system call interface to UNIX is supported and augmented by a large collection of library routines.
Header files provide the definition of complex data structures used in system calls.
Additional library support is provided for mathematical functions, network access, data conversion, etc.
User Interface
Programmers and users mainly deal with already existing systems programs:
– The needed system calls are embedded within the program and do not need to be obvious to the user.
The most common systems programs are file or directory oriented.
– Directory: mkdir, rmdir, cd, pwd
– File: ls, cp, mv, rm
Other programs relate to editors (e.g., emacs, vi), text formatters (e.g., troff, TeX), and other activities.
Shells and Commands
Shell: the user process that executes programs (also called the command interpreter).
It is called a shell because it surrounds the kernel.
The shell indicates its readiness to accept another command by displaying a prompt, and the user types a command on a single line.
Shells and Commands (cont)
A typical command is an executable binary object file.
The shell travels through the search path to find the command file, which is then loaded and executed.
The directories /bin and /usr/bin are almost always in the search path.
Shells and Commands (cont)
Typical search path on a BSD system:
( . /export/home/allan/Bin /usr/local/bin /bin /usr/bin /usr/ucb/bin )
The shell usually suspends its own execution until the command completes.
Can execute shell commands in the background (using &).
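The way a shell walks its search path can be sketched as below. This is an illustrative helper (find_command is a hypothetical name), not the lookup code of any particular shell; it returns the first executable match, so directories earlier in the path take precedence.

```python
import os

def find_command(name, search_path):
    """Return the first executable file called name in search_path, or None."""
    if "/" in name:
        # A name containing a slash is a path name and bypasses the search.
        return name if os.access(name, os.X_OK) else None
    for directory in search_path:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None

if __name__ == "__main__":
    print(find_command("ls", [".", "/usr/local/bin", "/bin", "/usr/bin"]))
```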
Standard I/O
Most processes expect three file descriptors to be open when they start:
– Standard input: program can read what the user types.
– Standard output: program can send output to the user’s screen.
– Standard error: error output.
Most programs can also accept a file (rather than a terminal) for standard input and standard output.
Standard I/O (cont)
The common shells have a simple syntax for changing what files are open for the standard I/O streams of a process: I/O redirection.
Standard I/O Redirection
Command                     Meaning of command
% ls > filea                direct output of ls to file filea
% pr < filea > fileb        input from filea and output to fileb
% lpr < fileb               input from fileb
% make program >& errs      save both standard output and standard error in a file
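Behind a redirection such as ls > filea, a shell typically forks, reopens descriptor 1 on the target file in the child, and then executes the command. The sketch below shows that idea with Python's os wrappers; run_redirected is a hypothetical helper and the real shells differ in detail.

```python
import os

def run_redirected(argv, out_path):
    """Run argv with standard output sent to out_path, like `cmd > file`."""
    pid = os.fork()
    if pid == 0:
        try:
            fd = os.open(out_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            os.dup2(fd, 1)  # descriptor 1 is standard output
            os.close(fd)
            os.execvp(argv[0], argv)  # searches PATH for the command
        finally:
            os._exit(127)  # only reached if something above failed
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)

if __name__ == "__main__":
    print(run_redirected(["echo", "hello"], "filea"))
```

The command itself is unchanged: it writes to descriptor 1 as usual and never learns that the descriptor now refers to a file.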
Pipelines, Filters, and Shell Scripts
Can coalesce individual commands via a vertical bar that tells the shell to pass the previous command’s output as input to the following command:
% ls | pr | lpr
Filter: a command such as pr that passes its standard input to its standard output, performing some processing on it.
Pipelines, Filters, and Shell Scripts (cont)
Writing a new shell with a different syntax and semantics would change the user view, but not change the kernel or programmer interface.
X Window System is a widely accepted graphical interface for UNIX.
Process Management
Representation of processes is a major design problem for operating systems.
UNIX is distinct from other systems in that multiple processes can be created and manipulated with ease.
These processes are represented in UNIX by various control blocks.
– Control blocks associated with a process are stored in the kernel.
– Information in these control blocks is used by the kernel for process control and CPU scheduling.
Process Control Blocks
The most basic data structure associated with processes is the process structure.
– Unique process identifier.
– Scheduling information (e.g., priority).
– Pointers to other control blocks.
The virtual address space of a user process is divided into text (program code), data, and stack segments.
Process Control Blocks (cont)
Every process with sharable text has a pointer from its process structure to a text structure.
– Always resident in main memory.
– Records how many processes are using the text segment.
– Records where the page table for the text segment can be found on disk when it is swapped.
System Data Segment
Most ordinary work is done in user mode; system calls are performed in system mode.
The system and user phases of a process never execute simultaneously.
A kernel stack (rather than the user stack) is used for a process executing in system mode.
The kernel stack and the user structure together compose the system data segment for the process.
[Figure: Finding Parts of a Process Using Process Structure]
Allocating a New Process Structure
fork allocates a new process structure for the child process, and copies the user structure.
– New page table is constructed.
– New main memory is allocated for the data and stack segments of the child process.
– Copying the user structure preserves open file descriptors, user and group identifiers, signal handling, etc.
Allocating a New Process Structure (cont)
vfork does not copy the data and stack to the new process; the new process simply shares the page table with the old one.
– New user structure and a new process structure are still created.
– Commonly used by a shell to execute a command and to wait for its completion.
Allocating a New Process Structure (cont)
A parent process uses vfork to produce a child process; the child uses execve to change its virtual address space, so there is no need for a copy of the parent.
Using vfork with a large parent process saves CPU time, but can be dangerous since any memory change occurs in both processes until execve occurs.
Allocating a New Process Structure (cont)
execve creates no new process or user structure; rather, the text and data of the process are replaced.
CPU Scheduling
Every process has a scheduling priority associated with it; larger numbers indicate lower priority.
Negative feedback in CPU scheduling makes it difficult for a single process to take all the CPU time.
Process aging is employed to prevent starvation.
CPU Scheduling (cont)
When a process chooses to relinquish the CPU, it goes to sleep on an event.
When that event occurs, the system process that knows about it calls wakeup with the address corresponding to the event, and all processes that have done a sleep on the same address are put in the ready queue to be run.
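The sleep and wakeup bookkeeping can be modelled in a few lines. This is a toy user-level model, not kernel code: ToyScheduler and its fields are hypothetical, process names are plain strings, and event addresses are plain integers.

```python
from collections import defaultdict, deque

class ToyScheduler:
    """Toy model of sleeping on an event address and waking all sleepers."""

    def __init__(self):
        self.sleeping = defaultdict(list)  # event address -> sleeping processes
        self.ready = deque()               # processes eligible to run

    def sleep(self, process, event):
        # The process gives up the CPU and is recorded under the event address.
        self.sleeping[event].append(process)

    def wakeup(self, event):
        # Every process sleeping on this address becomes ready, not just one.
        woken = self.sleeping.pop(event, [])
        self.ready.extend(woken)
        return woken

if __name__ == "__main__":
    s = ToyScheduler()
    s.sleep("A", 0x100)
    s.sleep("B", 0x100)
    s.sleep("C", 0x200)
    print(s.wakeup(0x100), list(s.ready))
```

Because wakeup releases every sleeper on the address, each woken process has to recheck the condition it was waiting for before proceeding.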
Memory Management
The initial memory management schemes were constrained in size by the relatively small memory resources of the PDP machines on which UNIX was developed.
Memory Management (cont)
Pre-3BSD systems use swapping exclusively to handle memory contention among processes: if there is too much contention, processes are swapped out until enough memory is available.
Allocation of both main memory and swap space is done first-fit.
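First-fit allocation takes the first free region large enough for the request. A minimal sketch follows, with the free list represented as (start, length) pairs sorted by address; that representation and the name first_fit are assumptions for illustration.

```python
def first_fit(free_list, size):
    """Allocate size units from the first hole that is large enough.

    free_list is a list of (start, length) holes sorted by start address.
    Returns (start, new_free_list), or (None, free_list) if nothing fits.
    """
    for i, (start, length) in enumerate(free_list):
        if length >= size:
            remainder = []
            if length > size:
                # Keep the unused tail of the hole on the free list.
                remainder = [(start + size, length - size)]
            return start, free_list[:i] + remainder + free_list[i + 1:]
    return None, free_list

if __name__ == "__main__":
    holes = [(0, 100), (200, 500), (1000, 300)]
    print(first_fit(holes, 250))
```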
Memory Management (cont)
Sharable text segments do not need to be swapped; this results in less swap traffic and reduces the amount of main memory required for multiple processes using the same text segment.
The scheduler process (or swapper) decides which processes to swap in or out, considering such factors as time idle, time in or out of main memory, size, etc.
Memory Management (cont)
In 4.2BSD, swap space is allocated in pieces whose sizes are a power of 2 times a minimum size, up to a maximum size determined by the size of the swap-space partition on the disk.
Paging
Berkeley UNIX systems depend primarily on paging for memory-contention management, and depend only secondarily on swapping.
Demand paging: when a process needs a page and the page is not there, a page fault to the kernel occurs, a frame of main memory is allocated, and the proper disk page is read into the frame.
Paging (cont)
A pagedaemon process uses a modified second-chance page-replacement algorithm to keep enough free frames to support the executing processes.
If the scheduler decides that the paging system is overloaded, processes are swapped out whole until the overload is relieved.
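For orientation, the basic textbook second-chance (clock) algorithm is sketched below. It is not the modified variant the pagedaemon uses; the frame representation and the name second_chance_victim are assumptions, and the frame list is assumed to be non-empty.

```python
def second_chance_victim(frames, hand):
    """Pick a frame to replace using the basic second-chance (clock) scan.

    frames is a non-empty list of dicts with 'page' and 'referenced' keys.
    Returns (victim_index, new_hand_position).
    """
    n = len(frames)
    while True:
        frame = frames[hand]
        if frame["referenced"]:
            # Recently used: clear the bit and give the page a second chance.
            frame["referenced"] = False
            hand = (hand + 1) % n
        else:
            # Not referenced since the last sweep: replace this page.
            return hand, (hand + 1) % n

if __name__ == "__main__":
    frames = [
        {"page": "A", "referenced": True},
        {"page": "B", "referenced": False},
        {"page": "C", "referenced": True},
    ]
    print(second_chance_victim(frames, 0))
```

If every frame has its reference bit set, the hand clears them all in one sweep and then selects the frame where it started, so the scan always terminates.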
File System
The UNIX file system supports two main objects: files and directories.
Directories are just files with a special format, so the representation of a file is the basic UNIX concept.
Blocks and Fragments
Most of the file system is taken up by data blocks.
4.2BSD uses two block sizes for files which have no indirect blocks:
– All the blocks of a file are of a large block size (such as 8K), except the last.
– The last block is an appropriate multiple of a smaller fragment size (e.g., 1024 bytes) to fill out the file.
– Thus, a file of size 18,000 bytes would have two 8K blocks and one 2K fragment (which would not be filled completely).
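The 18,000-byte example can be checked with a little arithmetic. The sketch below assumes the 8K block and 1K fragment sizes from the example and applies only to files with no indirect blocks; layout is a hypothetical helper.

```python
def layout(file_size, block_size=8192, fragment_size=1024):
    """Return (full_blocks, tail_fragments, wasted_bytes) for a small file."""
    full_blocks, tail = divmod(file_size, block_size)
    fragments = -(-tail // fragment_size)  # ceiling division
    if fragments * fragment_size == block_size:
        # A tail that needs every fragment of a block is just a full block.
        full_blocks, fragments = full_blocks + 1, 0
    allocated = full_blocks * block_size + fragments * fragment_size
    return full_blocks, fragments, allocated - file_size

if __name__ == "__main__":
    # Two 8K blocks plus two 1K fragments (a 2K tail), with 432 bytes unused.
    print(layout(18000))
```

With whole 8K blocks only, the same file would occupy three blocks and leave 6,576 bytes unused, which is the waste that fragments avoid.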
Blocks and Fragments (cont)
The block and fragment sizes are set during file system creation according to the intended use of the file system:
– If many small files are expected, the fragment size should be small.
– If repeated transfers of large files are expected, the basic block size should be large.
The maximum block-to-fragment ratio is 8 : 1; the minimum block size is 4K (typical choices are 4096 : 512 and 8192 : 1024).
Inodes
A file is represented by an inode: a record that stores information about a specific file on the disk.
The inode also contains 15 pointers to the disk blocks containing the file’s data contents.
– First 12 point to direct blocks.
Inodes (cont)
– Next three point to indirect blocks.
  – First indirect block pointer is the address of a single indirect block: an index block containing the addresses of blocks that do contain data.
  – Second is a double-indirect-block pointer, the address of a block that contains the addresses of blocks that contain pointers to the actual data blocks.
  – A triple indirect pointer is not needed; files with as many as 2^32 bytes will use only double indirection.
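A quick calculation shows why double indirection already covers 2^32 bytes. The sketch assumes the minimum 4K block size and 4-byte block addresses; the pointer size is an assumption not stated above, and reachable_bytes is a hypothetical helper.

```python
def reachable_bytes(block_size=4096, pointer_size=4, direct=12):
    """Bytes addressable through direct, single, and double indirect pointers."""
    per_block = block_size // pointer_size      # addresses held by one index block
    direct_bytes = direct * block_size
    single_bytes = per_block * block_size
    double_bytes = per_block * per_block * block_size
    return direct_bytes, single_bytes, double_bytes

if __name__ == "__main__":
    d, s, dbl = reachable_bytes()
    print(d, s, dbl, d + s + dbl >= 2 ** 32)
```

Under these assumptions the double indirect block alone reaches 1024 * 1024 * 4096 = 2^32 bytes, so larger block sizes only widen the margin.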
Directories
The inode type field distinguishes between plain files and directories.
Directory entries are of variable length; each entry contains first the length of the entry, then the file name and the inode number.
Directories (cont)
The user refers to a file by a path name, whereas the file system uses the inode as its definition of a file.
– The kernel has to map the supplied user path name to an inode.
– Directories are used for this mapping.
Directories (cont)
First determine the starting directory:
– If the first character is “/”, the starting directory is the root directory.
– For any other starting character, the starting directory is the current directory.
The search process continues until the end of the path name is reached and the desired inode is returned.
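The lookup described above can be modelled with directories as name-to-inode-number tables. This is a toy in-memory model, not kernel code: the inode table layout and the name namei are used only for illustration, and "." or ".." resolve only if the directory tables contain such entries.

```python
def namei(path, root, cwd, inodes):
    """Map a path name to an inode number in a toy in-memory file system.

    inodes maps inode numbers to either a dict (a directory: name -> inode
    number) or bytes (the contents of a plain file).
    """
    # A leading "/" selects the root directory; anything else starts at cwd.
    current = root if path.startswith("/") else cwd
    for component in path.split("/"):
        if not component:
            continue  # skip empty pieces from leading or repeated slashes
        directory = inodes[current]
        if not isinstance(directory, dict):
            raise NotADirectoryError(component)
        if component not in directory:
            raise FileNotFoundError(path)
        current = directory[component]
    return current

if __name__ == "__main__":
    inodes = {2: {"usr": 3}, 3: {"bin": 5}, 5: {"ls": 7}, 7: b"binary"}
    print(namei("/usr/bin/ls", root=2, cwd=3, inodes=inodes))
```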
Directories (cont)
Once the inode is found, a file structure is allocated to point to the inode.
4.3BSD improved file system performance by adding a directory name cache to hold recent directory-to-inode translations.
Mapping of a File Descriptor to an Inode
System calls that refer to open files indicate the file by passing a file descriptor as an argument.
The file descriptor is used by the kernel to index a table of open files for the current process.
Each entry of the table contains a pointer to a file structure.
Mapping of a File Descriptor to an Inode (cont)
This file structure in turn points to the inode.
Since the open file table has a fixed length which is only settable at boot time, there is a fixed limit on the number of concurrently open files in a system.
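The descriptor, file structure, and inode chain can be observed from user space on a present-day POSIX system. The sketch below is an illustration, not 4.3BSD code. It relies on the fact that the file offset is kept in the file structure: a descriptor made with `dup` shares the offset, while a second `open` of the same path gets its own file structure and offset, and all three descriptors lead to the same inode.

```python
import os
import tempfile


def descriptor_demo():
    """Return (same_inode, offset seen via dup, offset seen via a separate open)."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"abcdef")
        dup_fd = os.dup(fd)                     # new descriptor, same file structure
        other_fd = os.open(path, os.O_RDONLY)   # separate open, new file structure
        try:
            same_inode = (
                os.fstat(fd).st_ino
                == os.fstat(dup_fd).st_ino
                == os.fstat(other_fd).st_ino
            )
            os.lseek(fd, 2, os.SEEK_SET)        # move the offset through fd only
            dup_offset = os.lseek(dup_fd, 0, os.SEEK_CUR)
            other_offset = os.lseek(other_fd, 0, os.SEEK_CUR)
        finally:
            os.close(dup_fd)
            os.close(other_fd)
    finally:
        os.close(fd)
        os.unlink(path)
    return same_inode, dup_offset, other_offset
```

The duplicated descriptor reports the offset set through the original, while the separately opened descriptor still reports its own starting offset.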
File System Control Blocks
Disk Structures
The one file system that a user ordinarily sees may actually consist of several physical file systems, each on a different device.
Partitioning a physical device into multiple file systems has several benefits:
Disk Structures (cont)
– Different file systems can support different uses.
– Reliability is improved.
– Can improve efficiency by varying file system parameters.
– Prevents one program from using all available space for a large file.
– Speeds up searches on backup tapes and restoring partitions from tape.
Disk Structures (cont)
The root file system is always available on a drive.
Other file systems may be mounted, i.e., integrated into the directory hierarchy of the root file system.
The following figure illustrates how a directory structure is partitioned into file systems, which are mapped onto logical devices, which are partitions of physical devices.
Mapping File System to Physical Devices
Implementations
The user interface to the file system is simple and well defined, allowing the implementation of the file system itself to be changed without significant effect on the user.
For Version 7, the size of inodes doubled, the maximum file and file system sizes increased, and the details of free-list handling and superblock information changed.
Implementations (cont)
In 4.0BSD, the size of blocks used in the file system was increased from 512 bytes to 1024 bytes; this increased internal fragmentation but doubled throughput.
4.2BSD added the Berkeley Fast File System, which increased speed and included new features.
– New directory system calls.
– truncate calls.
– Fast File System found in most implementations of UNIX.
Layout and Allocation Policy
The kernel uses a <logical device number, inode number> pair to identify a file.
– The logical device number defines the file system involved.
– The inodes in the file system are numbered in sequence.
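On present-day POSIX systems a similar pair is visible through `stat`, as the `st_dev` and `st_ino` fields. The sketch below assumes such a system and a temporary directory that supports hard links. It shows that two names linked to one file report the same pair, while an unrelated file in the same directory does not.

```python
import os
import tempfile


def file_identity(path):
    """Return the (device number, inode number) pair that stat reports for `path`."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def hard_link_demo():
    """Return (linked names share an identity, unrelated files share an identity)."""
    with tempfile.TemporaryDirectory() as directory:
        first = os.path.join(directory, "first")
        second = os.path.join(directory, "second")
        other = os.path.join(directory, "other")
        for path in (first, other):
            with open(path, "w") as handle:
                handle.write("data")
        os.link(first, second)  # a second directory entry for the same inode
        return (
            file_identity(first) == file_identity(second),
            file_identity(first) == file_identity(other),
        )
```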
Layout and Allocation Policy (cont)
4.3BSD introduced the cylinder group, which allows localization of the blocks in a file.
– Each cylinder group occupies one or more consecutive cylinders of the disk, so that disk accesses within the cylinder group require minimal disk head movement.
– Every cylinder group has a superblock, a cylinder block, an array of inodes, and some data blocks.
4.3BSD Cylinder Group
I/O System
The I/O system hides the peculiarities of I/O devices from the bulk of the kernel.
Consists of a buffer caching system, general device driver code, and drivers for specific hardware devices.
Only the device driver knows the peculiarities of a specific device.
4.3BSD Kernel I/O Structure
Block Buffer Cache
Consists of buffer headers, each of which can point to a piece of physical memory, as well as to a device number and a block number on the device.
The buffer headers for blocks not currently in use are kept in several linked lists:
– Buffers recently used, linked in LRU order (LRU list).
– Buffers not recently used, or without valid contents (AGE list).
– EMPTY buffers with no associated physical memory.
Block Buffer Cache (cont)
When a block is wanted from a device, the cache is searched.
If the block is found it is used, and no I/O transfer is necessary.
If it is not found, a buffer is chosen from the AGE list, or the LRU list if AGE is empty.
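The search-then-replace policy above can be sketched as a toy cache keyed by (device number, block number). This is a simplified model, not the kernel's data structure: the class name, the `read_block` callback, and the use of placeholder entries for the AGE list are assumptions, and writing back modified buffers is left out.

```python
from collections import OrderedDict


class BufferCache:
    """Toy block buffer cache keyed by (device number, block number)."""

    def __init__(self, num_buffers, read_block):
        self.read_block = read_block      # hypothetical callback that does the device read
        self.lru = OrderedDict()          # buffers with valid contents, least recent first
        self.age = [None] * num_buffers   # buffers without valid contents
        self.transfers = 0                # number of actual device reads

    def get(self, dev, blkno):
        key = (dev, blkno)
        if key in self.lru:
            # Found in the cache: use it, no I/O transfer needed.
            self.lru.move_to_end(key)
            return self.lru[key]
        if self.age:
            self.age.pop()                # prefer a buffer from the AGE list
        else:
            self.lru.popitem(last=False)  # otherwise reuse the least recently used
        data = self.read_block(dev, blkno)
        self.transfers += 1
        self.lru[key] = data
        return data
```

With two buffers, reading blocks 1, 2, 1, 3 from one device costs three transfers, and the final miss reuses the buffer that held block 2, since block 1 was touched more recently.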
Block Buffer Cache (cont)
Buffer cache size affects system performance; if it is large enough, the percentage of cache hits can be high and the number of actual I/O transfers low.
Data written to a disk file are buffered in the cache, and the disk driver sorts its output queue according to disk address; these actions allow the disk driver to minimize disk head seeks and to write data at times optimized for disk rotation.
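To see why sorting the queue by disk address helps, the sketch below compares head travel for requests serviced in arrival order with the same requests serviced in a sorted sweep. The sweep policy shown is one simple choice made for illustration, not necessarily the one the 4.3BSD driver uses, and the block addresses are invented.

```python
def head_travel(start, addresses):
    """Total distance the head moves when servicing `addresses` in the given order."""
    travel, position = 0, start
    for address in addresses:
        travel += abs(address - position)
        position = address
    return travel


def sorted_sweep(start, addresses):
    """Requests at or beyond the head in ascending order, then the rest in descending order."""
    ahead = sorted(a for a in addresses if a >= start)
    behind = sorted((a for a in addresses if a < start), reverse=True)
    return ahead + behind


if __name__ == "__main__":
    pending = [95, 10, 60, 20, 80]   # hypothetical block addresses, in arrival order
    print(head_travel(50, pending))                    # 280
    print(head_travel(50, sorted_sweep(50, pending)))  # 130
```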
Raw Device Interfaces
Almost every block device has a character interface, or raw device interface; unlike the block interface, it bypasses the block buffer cache.
Each disk driver maintains a queue of pending transfers.
Raw Device Interfaces (cont)
Each record in the queue specifies:
– Whether it is a read or a write.
– A main memory address for the transfer.
– A device address for the transfer.
– A transfer size.
It is simple to map the information from a block buffer to what is required for this queue.
C-Lists
Terminal drivers use a character buffering system which involves keeping small blocks of characters in linked lists.
A write system call to a terminal enqueues characters on a list for the device. An initial transfer is started, and interrupts cause dequeueing of characters and further transfers.
C-Lists (cont)
Input is similarly interrupt driven.
It is also possible to have the device driver bypass the canonical queue and return characters directly from the raw queue. This is raw mode, used by full-screen editors and other programs that need to react to every keystroke.
Interprocess Communication
Most UNIX systems have not permitted shared memory because the PDP-11 hardware did not encourage it.
The pipe is the IPC mechanism most characteristic of UNIX.
– Permits a reliable unidirectional byte stream between two processes.
– A benefit of pipes' small size is that pipe data are seldom written to disk; they usually are kept in memory by the normal block buffer cache.
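A pipe between two processes can be sketched with Python's `os.pipe` and `os.fork`, which wrap the corresponding system calls on a POSIX system. The function name and the message are illustrative only; the child writes, the parent reads until end of file.

```python
import os


def pipe_demo(message):
    """Send `message` (bytes) from a child process to the parent through a pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the writing end of the unidirectional stream.
        try:
            os.close(read_fd)
            os.write(write_fd, message)  # short messages are written in one call
            os.close(write_fd)
        finally:
            os._exit(0)
    # Parent: the reading end. Closing our copy of the write end lets
    # read() report end of file once the child has closed its copy.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks)
```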
Interprocess Communication (cont)
In 4.3BSD, pipes are implemented as a special case of the socket mechanism, which provides a general interface not only to facilities such as pipes, which are local to one machine, but also to networking facilities.
The socket mechanism can be used by unrelated processes.
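The idea of a pipe built from sockets can be illustrated with a connected pair of UNIX-domain stream sockets. The sketch uses Python's `socket.socketpair`, which wraps the `socketpair` system call. It only shows the user-visible behavior and makes no claim about how any particular kernel implements pipes today. Unlike a pipe, the pair carries data in both directions.

```python
import socket


def socketpair_demo():
    """Exchange one short message each way over a connected pair of stream sockets."""
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with left, right:
        left.sendall(b"ping")
        request = right.recv(4)
        right.sendall(b"pong")
        reply = left.recv(4)
    return request, reply
```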
Sockets
A socket is an endpoint of communication.
An in-use socket is usually bound to an address; the nature of the address depends on the communication domain of the socket.
A characteristic property of a domain is that processes communicating in the same domain use the same address format.
Sockets (cont)
A single socket can communicate in only one domain; the three domains currently implemented in 4.3BSD are:
– UNIX domain (AF_UNIX).
– Internet domain (AF_INET).
– XEROX Network Service (NS) domain (AF_NS).
Socket Types
Stream sockets provide reliable, duplex, sequenced data streams. Supported in Internet domain by the TCP protocol. In UNIX domain, pipes are implemented as a pair of communicating stream sockets.
Sequenced packet sockets provide similar data streams, except that record boundaries are provided. Used in XEROX AF_NS protocol.
Socket Types (cont)
Datagram sockets transfer messages of variable size in either direction. Supported in Internet domain by UDP protocol.
Reliably delivered message sockets transfer messages that are guaranteed to arrive. Currently unsupported.
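Datagram sockets can be tried on the loopback interface. The sketch below sends two UDP messages of different sizes and reads them back; each receive returns one whole message. The loopback address, the kernel-chosen port, and the timeout are assumptions of the example, and UDP itself does not guarantee delivery or ordering.

```python
import socket


def datagram_demo():
    """Send two datagrams over loopback UDP and return the payloads received."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    with receiver, sender:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the kernel pick a free port
        receiver.settimeout(5.0)          # do not block forever if a datagram is lost
        address = receiver.getsockname()
        sender.sendto(b"first", address)
        sender.sendto(b"second message", address)
        # Each recvfrom call returns exactly one datagram, whatever its size.
        return [receiver.recvfrom(1024)[0] for _ in range(2)]
```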
Socket Types (cont)
Raw sockets allow direct access by processes to the protocols that support the other socket types; e.g., in the Internet domain, it is possible to reach TCP, IP beneath that, or a deeper Ethernet protocol. Useful for developing new protocols.
Socket System Calls
The socket call creates a socket; it takes as arguments specifications of the communication domain, socket type, and protocol to be used, and returns a small integer called a socket descriptor.
A name is bound to a socket by the bind system call.
The connect system call is used to initiate a connection.
Socket System Calls (cont)
A server process uses socket to create a socket and bind to bind the well-known address of its service to that socket.
– Uses listen to tell the kernel that it is ready to accept connections from clients.
– Uses accept to accept individual connections.
– Uses fork to produce a new process after the accept to service the client while the original server process continues to listen for more connections.
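The server sequence above can be sketched with Python's `socket` module, whose methods wrap the system calls of the same names. Several things here are assumptions of the example: the loopback address with a kernel-chosen port stands in for a well-known address, the service simply upper-cases what it receives, and the client runs in the same process as the listener so the example is self-contained.

```python
import os
import socket


def handle_one_client(listener):
    """Accept one connection, then fork a child process to service it."""
    connection, _peer = listener.accept()
    pid = os.fork()
    if pid == 0:
        # Child: services this client only, so it drops the listening socket.
        try:
            listener.close()
            data = connection.recv(1024)
            connection.sendall(data.upper())
            connection.close()
        finally:
            os._exit(0)
    # Parent: the child owns the conversation; the parent can go back to accept.
    connection.close()
    return pid


def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # assumed address; port 0 lets the kernel choose
    listener.listen(5)                # ready to queue incoming connections

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(5.0)
    client.connect(listener.getsockname())  # queued by the kernel until accept runs
    client.sendall(b"hello")

    pid = handle_one_client(listener)
    reply = client.recv(1024)
    os.waitpid(pid, 0)
    client.close()
    listener.close()
    return reply
```

A long-running server would call `handle_one_client` in a loop and reap finished children; the single call here keeps the example short.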
Socket System Calls (cont)
The simplest way to terminate a connection and to destroy the associated socket is to use the close system call on its socket descriptor.
The select system call can be used to multiplex data transfers on several file descriptors and/or socket descriptors.
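As a small illustration of multiplexing, the sketch below creates two pipes, writes to one of them, and asks `select` which read ends have data waiting. It uses Python's `select.select`, which wraps the system call; the function name and the one-second timeout are choices made for the example.

```python
import os
import select


def ready_descriptors(write_to):
    """Write one byte to pipe number `write_to` and return the indexes select reports readable."""
    pipes = [os.pipe(), os.pipe()]
    try:
        os.write(pipes[write_to][1], b"x")
        read_ends = [read_fd for read_fd, _write_fd in pipes]
        # Wait up to one second for any of the read ends to become readable.
        readable, _, _ = select.select(read_ends, [], [], 1.0)
        return [read_ends.index(fd) for fd in readable]
    finally:
        for read_fd, write_fd in pipes:
            os.close(read_fd)
            os.close(write_fd)
```

Socket descriptors can be passed to the same call alongside pipe or file descriptors, which is what lets one process watch several connections at once.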
Network Support
Networking support is one of the most important features in 4.3BSD.
The socket concept provides the programming mechanism to access other processes, even across a network.
Sockets provide an interface to several sets of protocols.
Almost all current UNIX systems support UUCP.
Network Support (cont)
4.3BSD supports the DARPA Internet protocols UDP, TCP, IP, and ICMP on a wide range of Ethernet, token-ring, and ARPANET interfaces.
The 4.3BSD networking implementation, and to a certain extent the socket facility, is more oriented toward the ARPANET Reference Model (ARM).
Network Reference Models and Layering