128
Introduction & Architecture Refresher CS 519: Operating System Theory Computer Science, Rutgers University Instructor: Thu D. Nguyen TA: Xiaoyan Li Spring 2002

Introduction & Architecture Refresher CS 519: Operating System Theory Computer Science, Rutgers University Instructor: Thu D. Nguyen TA: Xiaoyan Li Spring

  • View
    221

  • Download
    1

Embed Size (px)

Citation preview

Introduction & Architecture Refresher

CS 519: Operating System Theory

Computer Science, Rutgers University

Instructor: Thu D. Nguyen

TA: Xiaoyan Li

Spring 2002

Computer Science, Rutgers CS 519: Operating System Theory2

Logistics

Instructor: Thu D. NguyenOffice: CoRE 326

Office Hours: TDB

TA: Xiaoyan LiOffice: Hill 411

Office Hours: TDB

Resourceshttp://paul.rutgers.edu/cs519/S02

[email protected]

Computer Science, Rutgers CS 519: Operating System Theory3

Course Overview

GoalsDeeper understanding of OS and the OS/architecture interface/interaction

A little on distributed/parallel OS

PrerequisitesUndergraduate OS and architecture courses

Good programming skills in C and UNIX

Computer Science, Rutgers CS 519: Operating System Theory4

Course Overview

Structure:Each major area

Review basic material

Read and discuss papers to understand advanced issues

Several programming assignments

Significant project

Midterm and final exams

Computer Science, Rutgers CS 519: Operating System Theory5

Textbooks - Required

William Stallings. Operating Systems: Internals and Design Principles, Prentice-Hall, 1998. (Third Edition)

Papers

Most available from the web

Others accessible from the library IRIS Reserve Desk

Computer Science, Rutgers CS 519: Operating System Theory6

Topics

Processes, threads, and synchronization

Virtual memory

I/O and file system

Communication

Processor scheduling

Transactions

Distributed file systems

Parallel and real-time scheduling

Cluster memory management

OS structure and organization

Current research topics

Computer Science, Rutgers CS 519: Operating System Theory7

Project

GoalsLearn to design, implement, and evaluate a significant OS-level software system

Improve systems programming skills: virtual memory, threads, synchronization, sockets

StructureSmall teams (3 students per team)

Written project report

Possibly 15-minute oral presentation of final report

Computer Science, Rutgers CS 519: Operating System Theory8

Grading

Midterm 20%

Final 30%

Programming assignments 20%

Project 30%

Computer Science, Rutgers CS 519: Operating System Theory9

Today

What is an Operating System?Stallings 2.1-2.4

Architecture refresher …

Computer Science, Rutgers CS 519: Operating System Theory10

What is an operating system?

A software layer between the hardware and the application programs/users which provides a virtual machine interface: easy to use (hides complexity) and safe (prevents and handles errors)

Acts as resource manager that allows programs/users to share the hardware resources in a protected way: fair and efficient

hardware

operating system

application (user)

Computer Science, Rutgers CS 519: Operating System Theory11

How does an OS work?

Receives requests from the application: system calls

Satisfies the requests: may issue commands to hardware

Handles hardware interrupts: may upcall the application

OS complexity: synchronous calls + asynchronous events

hardware

OS

application (user) system calls upcalls

commands interrupts

hardware independent

hardware dependent

Computer Science, Rutgers CS 519: Operating System Theory12

Mechanism and Policy

Mechanisms: data structures and operations that implement an abstraction (e.g. the buffer cache)

Policies: the procedures that guide the selection of a certain course of action from among alternatives (e.g. the replacement policy for the buffer cache)

Traditional OS is rigid: mechanism together with policy

hardware

operating system: mechanism+policy

application (user)

Computer Science, Rutgers CS 519: Operating System Theory13

Mechanism-Policy Split

Single policy often not the best for all cases

Separate mechanisms from policies:OS provides the mechanism + some policy

applications contribute to the policy

Flexibility + efficiency require new OS structures and/or new OS interfaces

Architecture Refresher

Computer Science, Rutgers CS 519: Operating System Theory15

Basic computer structure

CPU Memory

memory bus

I/O bus

disk Net interface

Computer Science, Rutgers CS 519: Operating System Theory16

von Neumann Machine

The first computers (late 40’s) were calculators

The advance was the idea of storing the instructions (coded as numbers) along with the data in the same memory

Crux of the split between:Central Processing Unit (CPU) and

Memory

Computer Science, Rutgers CS 519: Operating System Theory17

Conceptual Model

Addresses ofmemory cells

+-*/

+-*/

CPU 012

Memory contents

34

5

76

89

"big byte array"

Computer Science, Rutgers CS 519: Operating System Theory18

Operating System Perspective

A computer is a piece of hardware which runs the fetch-decode-execute loop

Next slides: walk through a very simple computer to illustrate

Machine organizationWhat are the pieces and how they fit together

The basic fetch-decode-execute loop

How higher-level constructs are translated into machine instructions

At its core, the OS builds what looks like a more complex machine on top of this basic hardware

Computer Science, Rutgers CS 519: Operating System Theory19

Fetch-Decode-Execute

Computer as a large, general purpose calculatorwant to program it for multiple functions

All von Neumann computers follow the same loop:Fetch the next instruction from memory

Decode the instruction to figure out what to do

Execute the instruction and store the result

Instructions are simple. Examples:Increment the value of a memory cell by 1

Add the contents of memory cells X and Y and store in Z

Multiply contents of memory cells A and B and store in B

Computer Science, Rutgers CS 519: Operating System Theory20

Instruction Encoding

How to represent instructions as numbers?

operators+: 1-: 2*: 3/: 4

operands destination8 bits 8 bits 8 bits 8 bits

Computer Science, Rutgers CS 519: Operating System Theory21

Example Encoding

Add cell 28 to cell 63 and place result in cell 100:

operator

+: 1-: 2*: 3/: 4

source operands destinationCell 100Cell 63 Cell 28

8 bits 8 bits 8 bits 8 bits

Instruction as a number in: Decimal: 1:28:63:100Binary: 00000001:00011100:00111111:01100100 Hexadecimal: 01:1C:3F:64

Computer Science, Rutgers CS 519: Operating System Theory22

Example Encoding (cont)

How many instructions can this encoding have?

8 bits, 2^8 combinations = 256 instructions

How much memory can this example instruction set support?

Assume each memory cell is a byte (8 bits) wide

Assume operands and destination come from the same memory

8 bits per source/dest = 2^8 combinations = 256 bytes

How many bytes did we use per instruction?

4 bytes per instruction

How could we get more memory without changing the encoding?

Why is this simple encoding not realistic?

Computer Science, Rutgers CS 519: Operating System Theory23

The Program Counter

Where is the “next instruction” held in the machine?

In a special memory cell in the CPU called the “program counter" (the PC)

Special purpose memory in the CPU and devices are called registers

Naïve fetch cycle: Increment the PC by the instruction length (4) after each execute

Assumes all instructions are the same length

Computer Science, Rutgers CS 519: Operating System Theory24

Conceptual Model

+-*/

+-*/

CPU

44Program Counter

Arithmetic Units

01

2

Memory

operator

operand 1

operand 2

destination34

5

76

89

Instruction 0@ memory address 0

Instruction 1@ memory address 4

Computer Science, Rutgers CS 519: Operating System Theory25

Memory Indirection

How do we access array elements efficiently if all we can do is name a cell?

Modify the operand to allow for fetching an operand "through" a memory location

E.g.: LOAD [5], 2 means fetch the contents of the cell whose address is in cell 5 and put it into cell 2

So if cell 5 had the number 100, we would place the contents of cell 100 into cell 2

This is called indirection

Fetch the contents of the cell “pointed to” by the cell in the opcode

Steal an operand bit to signify if an indirection is desired

Computer Science, Rutgers CS 519: Operating System Theory26

Conditionals and Looping

Primitive “computers” only followed linear instructions

Breakthrough in early computing was addition of conditionals and branching

Instructions that modify the Program Counter

Conditional instructionsIf the content of this cell is [positive, not zero, etc.] execute the instruction or not

Branch InstructionsIf the content of this cell is [zero, non zero, etc.], set the PC to this location

jump is an unconditional branch

Computer Science, Rutgers CS 519: Operating System Theory27

Example: While Loop

while (counter > 0) {

sum = sum + Y[counter];

counter–-;

};

Variables to memory cells: counter is cell 1sum is cell 2index is cell 3Y[0]= cell 4, Y[1]=cell 5…

100 LOOP: BNZ 1,END // branch to address of END

// if cell 1 is not 0.104 ADD 2,[3],2 // Add cell 2 and the

value // of the cell pointed

to by// cell 3 then place the // result in cell 2

108 DEC 3 // decrement cell 3 by 1112 DEC 1 // decrement cell 1 by 1116 JUMP LOOP // start executing from

the // address of LOOP120 END: <next code block>

Memory cell

address

Assembler label

Assembler "mnemonic" English

Computer Science, Rutgers CS 519: Operating System Theory28

Registers

Architecture rule: large memories are slow, small ones are fast

But everyone wants more memory!

Solution: Put small amount of memory in the CPU for faster operation

Most programs work on only small chunks of memory in a given time period. This is called locality.

So, if we cache the contents of a small number of memory cells in the CPU memory, we might be able to execute a number of instructions before having to access memory

Small memory in CPU named separately in the instructions from the “main memory”

Small memory in CPU = registers

Large memory = main memory

Computer Science, Rutgers CS 519: Operating System Theory29

Register Machine Model

+,-,*,/+,-,*,/

CPU 01

2

Memory

34

5

76

89

88Program Counter

Arithmetic Units

Logic Units <,>,!=<,>,!=

2424

100100

1818

register 0

register 1

register 2

Computer Science, Rutgers CS 519: Operating System Theory30

Registers (cont)

Most CPUs have 16-32 “general purpose” registers All look the “same”: combination of operators, operands and destinations possible

Operands and destination can be in:Registers only (Sparc, PowerPC, Mips, Alpha)

Registers & 1 memory operand (Intel x86 and clones)

Any combination of registers and memory (Vax)

Only memory operations possible in "register-only" machines are load from and store to memory

Operations 100-1000 times faster when operands are in registers compared to when they are in memory

Save instruction space too Only address 16-32 registers, not GB of memory

Computer Science, Rutgers CS 519: Operating System Theory31

Typical Instructions

Add the contents of register 2 and register 3 and place result in register 5

ADD r2,r3,r5

Add 100 to the PC if register 2 is not zeroRelative branch

BNZ r2,100

Load the contents of memory location whose address is in register 5 into register 6

LDI r5,r6

Computer Science, Rutgers CS 519: Operating System Theory32

Memory Hierarchy

cpu

cache

main memory

word transfer

block transfer

disks

page transfer

decrease cost per bit

decrease frequency of access

increase capacity

increase access time

increase size of transfer unit

Computer Science, Rutgers CS 519: Operating System Theory33

Memory Access Costs

Computer Science, Rutgers CS 519: Operating System Theory34

Memory Caches

Motivated by the mismatch between processor and memory speed

Closer to the processor than the main memory

Smaller and faster than the main memory

Act as “attraction memory”: contains the value of main memory locations which were recently accessed (temporal locality)

Transfer between caches and main memory is performed in units called cache blocks/lines

Caches contain also the value of memory locations which are close to locations which were recently accessed (spatial locality)

Computer Science, Rutgers CS 519: Operating System Theory35

Cache Architecture

CPU

L1

L2

Memory

cache line

associativity

Capacity miss

Conflict miss

Cold miss

Cache line ~32-128Associativity ~2-4

Computer Science, Rutgers CS 519: Operating System Theory36

Cache design issues

Cache size and cache block size

Mapping: physical/virtual caches, associativity

Replacement algorithm: direct or LRU

Write policy: write through/write back

cpu

cache

mainmemory

word transfer

block transfer

Computer Science, Rutgers CS 519: Operating System Theory37

Abstracting the Machine

Bare hardware provides a computation device

How to share this expensive piece of equipment between multiple users?

Sign up during certain hours?

Give program to an operator?

they run it and give you the results

Software to give the illusion of having it all to yourself while actually sharing it with others (time-sharing)!

This software is the Operating System

Need hardware support to “virtualize” machine

Computer Science, Rutgers CS 519: Operating System Theory38

Architecture Features for the OS

Next we'll look at the mechanisms the hardware designers add to allow OS designers to abstract the basic machine in software

Processor modes

Exceptions

Traps

Interrupts

These require modifications to the basic fetch-decode-execute cycle in hardware

Computer Science, Rutgers CS 519: Operating System Theory39

Processor Modes

OS code is stored in memory … von Neumann model, remember?

What if a user program modifies OS code or data?

Introduce modes of operation

Instructions can be executed in user mode or system mode

A special register holds which mode the CPU is in

Certain instructions can only be executed when in system mode

Likewise, certain memory location can only be written when in system mode

Only OS code is executed in system mode

Only OS can modify its memory

The mode register can only be modified in system mode

Computer Science, Rutgers CS 519: Operating System Theory40

Simple Protection Scheme

All addresses < 100 are reserved for operating system use

Mode register providedzero = CPU is executing the OS (in system mode)

one = CPU is executing in user mode

Hardware does this check: On every fetch, if the mode bit is 1 and the address is less than 100, then do not execute the instruction

When accessing operands, if the mode bit is 1 and the operand address is less than 100, do not execute the instruction

Mode register can only be set if mode is 0

Computer Science, Rutgers CS 519: Operating System Theory41

Simple Protection Model

Memory

0

99100101

102

104103

105106

+,-,*,/+,-,*,/

CPU

88Program Counter

Arithmetic Units

Logic Units <,>,!=<,>,!=

Registers 0-31

Mode register 00

OS

User

Computer Science, Rutgers CS 519: Operating System Theory42

Fetch-decode-execute Revised

Fetch:

if (( the PC < 100) && ( the mode register == 1)) then

Error! User tried to access the OS

else

fetch the instruction at the PC

Decode:

if (( destination register == mode) && ( the mode register == 1)) then

Error! User tried to set the mode register

< more decoding >

Execute:

if (( an operand < 100) && ( the mode register == 1) then

error! User tried to access the OS

else

execute the instruction

Computer Science, Rutgers CS 519: Operating System Theory43

Exceptions

What happens when a user program tries to access memory holding the operating system code or data?

Answer: exceptions

An exception occurs when the CPU encounters an instruction which cannot be executed

Modify fetch-decode-execute loop to jump to a known location in the OS when an exception happens

Different errors jump to different places in the OS (are "vectored" in OS speak)

Computer Science, Rutgers CS 519: Operating System Theory44

Fetch-decode-execute with Exceptions

Fetch:

if (( the PC < 100) && ( the mode bit == 1)) then

set the PC = 60

set the mode = 0

fetch the instruction at the PC

Decode:

if (( destination register == mode) && ( the mode register == 1)) then

set the PC = 64

set the mode = 0

goto fetch

< more decoding >

Execute:

< check the operands for a violation>

60 is the well known entry pointfor a memory violation

64 is the well known entry pointfor a mode register violation

Computer Science, Rutgers CS 519: Operating System Theory45

Access Violations

Notice both instruction fetch from memory and data access must be checked

Execute phase must check both operands

Execute phase must check again when performing an indirect load

This is a very primitive memory protection scheme. We'll cover more complex virtual memory mechanisms and policies later in the course

Computer Science, Rutgers CS 519: Operating System Theory46

Recovering from Exceptions

The OS can figure out what caused the exception from the entry point

But how can it figure out where in the user program the problem was?

Solution: add another register, the PC’

When an exception occurs, save the current PC to PC’ before loading the PC with a new value

OS can examine the PC' and perform some recovery action

Stop user program and print an error message: error at address PC'

Run a debugger

Computer Science, Rutgers CS 519: Operating System Theory47

Fetch-decode-execute with Exceptions & Recovery

Fetch:

if (( the PC < 100) && ( the mode bit == 1)) then

set the PC' = PC

set the PC = 60

set the mode = 0

Decode:

if (( destination register == mode) && ( the mode register == 1)) then

set the PC' = PC

set the PC = 64

set the mode = 0

goto fetch

< more decoding >

Execute:

Computer Science, Rutgers CS 519: Operating System Theory48

Traps

Now we know what happens when a user program illegally tries to access OS code or data

How does a user program legitimately access OS services?

Solution: Trap instruction

A trap is a special instruction that forces the PC to a known address and sets the mode into system mode

Unlike exceptions, traps carry some arguments to the OS

Foundation of the system call

Computer Science, Rutgers CS 519: Operating System Theory49

Fetch-decode-execute with traps

Fetch:

if (( the PC < 100) && ( the mode bit == 1)) then

< memory exception>

Decode:

if (the instruction is a trap) then

set the PC' = PC

set the PC = 68

set the mode = 0

goto fetch

if (( destination register == mode) && ( the mode bit == 1)) then

< mode exeception >

Execute:

Computer Science, Rutgers CS 519: Operating System Theory50

How does the OS know which service the user program wants to invoke on a trap?

User program passes the OS a number that encodes which OS service is desired

This example machine could include the trap ID in the instruction itself:

Most real CPUs have a convention for passing the trap code in a set of registers

E.g. the user program sets register 0 with the trap code, then executes the trap instruction

Traps

Trap opcode Trap service ID

Computer Science, Rutgers CS 519: Operating System Theory51

Returning from a Trap

How to "get back" to user mode and the user's code after a trap?

Set the mode register = 0 then set the PC?

But after the mode bit is set to user, exception!

Set the PC, then set the mode bit?

Jump to "user-land", then in kernel mode

Most machines have a "return from exception" instruction

A single hardware instruction:

Swaps the PC and the PC'

Sets the mode bit to user mode

Traps and exceptions use the same mechanism (RTE)

Computer Science, Rutgers CS 519: Operating System Theory52

Interrupts

How can we force a the CPU back into system mode if the user program is off computing something?

Solution: Interrupts

An interrupt is an external event that causes the CPU to jump to a known address

Link an interrupt to a periodic clock

Modify fetch-decode-execute loop to check an external line set periodically by the clock

Computer Science, Rutgers CS 519: Operating System Theory53

Simple Interrupt Model

Clock

+,-,*,/+,-,*,/

CPU

88

PC'

Arithmetic Units

Logic Units <,>,!=<,>,!=

Registers 0-31

Mode register 00

Program Counter

Memory

Interrupt line

Reset line

OSUser

Computer Science, Rutgers CS 519: Operating System Theory54

The Clock

The clock starts counting to 10 milliseconds

The clock sets the interrupt line "high" (e.g. sets it logic 1, maybe +5 volts)

When the CPU toggles the reset line, the clock sets the interrupt line low and starts count to 10 milliseconds again

Computer Science, Rutgers CS 519: Operating System Theory55

Fetch-decode-execute with Interrupts

Fetch:

if (the clock interrupt line == 1) then

set the PC' = PC

set the PC = 72

set the mode = 0

goto fetch

if (( the PC < 100) && ( the mode bit == 1)) then

< memory exception >

fetch next instruction

Decode:

if (the instruction is a trap) then

< trap exception >

if (( destination register == mode) && ( the mode bit == 1)) then

< mode exeception >

<more decoding>

Execute: …

Computer Science, Rutgers CS 519: Operating System Theory56

Entry Points

What are the "entry points" for our little example machine?

60: memory access violation

64: mode register violation

68: User-initiated trap

72: Clock interrupt

Each entry point is typically a jump to some code block in the OS

All real OS’es have a set of entry points for exceptions, traps and interrupts

Sometimes they are combined and software has to figure out what happened.

Computer Science, Rutgers CS 519: Operating System Theory57

Saving and Restoring Context

Recall the processor state:PC, PC', R0-R31, mode register

When an entry to the OS happens, we want to start executing the correct routine then return to the user program such that it can continue executing normally

Can't just start using the registers in the OS!

Solution: save/restore the user context Use the OS memory to save all the CPU state

Before returning to user, reload all the registers and then execute a return from exception instruction

Computer Science, Rutgers CS 519: Operating System Theory58

Input and Output

How can humans get at the data?

How to load programs?

What happens if I turn the machine off?

Can I send the data to another machine?

Solution: add devices to perform these tasks:Keyboards, mice, graphics

Disk drives

Network cards

Computer Science, Rutgers CS 519: Operating System Theory59

A Simple I/O device

Network card has 2 registers: a store into the “transmit” register sends the byte over the wire.

Transmit often is written as TX (E.g. TX register)

a load from the “receive” register reads the last byte which was read from the wire

Receive is often written as RX

How does the CPU access these registers?

Solution: map them into the memory space An instruction that access memory cell 98 really accesses the transmit register instead of memory

An instruction that accesses memory cell 99 really accesses the receive register

These registers are said to be memory-mapped

Computer Science, Rutgers CS 519: Operating System Theory60

Basic Network I/O

Clock

+,-,*,/+,-,*,/

CPU

88

PC'

Arithmetic Units

Logic Units <,>,!=<,>,!=

Registers 0-31

Mode register 00

Program Counter

Memory

Interrupt line

Reset line

0

Network card

9899

Transmit Reg.Transmit Reg.

Receive Reg.Receive Reg.

Computer Science, Rutgers CS 519: Operating System Theory61

Why Memory-Mapped Registers

"Stealing" memory space for device registers has 2 functions:

Allows protected access --- only the OS can access the device.

User programs must trap into the OS to access I/O devices because of the normal protection mechanisms in the processor

Why do we want to prevent direct access to devices by user programs?

OS can control devices and move data to/from devices using regular load and store instructions

No changes to the instruction set are required

This is called programmed I/O

Computer Science, Rutgers CS 519: Operating System Theory62

Status Registers

How does the OS know if a new byte has arrived?

How does the OS know when the last byte has been transmitted? (so it can send another one)

Solution: status registers

A status register holds the state of the last I/O operation

Our network card has 1 status register

To transmit, the OS writes a byte into the TX register and sets bit 0 of the status register to 1. When the card has successfully transmitted the byte, it sets bit 0 of the status register back to 0.

When the card receives a byte, it puts the byte in the RX register and sets bit 1 of the status register to 1. After the OS reads this data, it sets bit 1 of the status register back to 0.

Computer Science, Rutgers CS 519: Operating System Theory63

Polled I/O

To Transmit:While (status register bit 0 == 1); // wait for card to be ready

TX register = data;

Status reg = status reg | 0x1; // tell card to TX (set bit 0 to 1)

Naïve Receive:While (status register bit 1 != 1); // wait for data to arrive

Data = RX register;

Status reg = status reg & 0x01; // tell card got data (clear bit 1)

Cant' stall OS waiting to receive!

Solution: poll after the clock ticks If (status register bit 1 == 1)

Data = RX register

Status reg = status reg & 0x01;

Computer Science, Rutgers CS 519: Operating System Theory64

Interrupt driven I/O

Polling can waste many CPU cyclesOn transmit, CPU slows to the speed of the device

Can't block on receive, so tie polling to clock, but wasted work if no RX data

Solution: use interruptsWhen network has data to receive, signal an interrupt

When data is done transmitting, signal an interrupt.

Computer Science, Rutgers CS 519: Operating System Theory65

Polling vs. Interrupts

Why poll at all?

Interrupts have high overhead:Stop processor

Figure out what caused interrupt

Save user state

Process request

Key factor is frequency of I/O vs. interrupt overhead

Computer Science, Rutgers CS 519: Operating System Theory66

Direct Memory Access (DMA)

Problem with programmed I/O: CPU must load/store all the data into device registers.

The data is probably in memory anyway!

Solution: more hardware to allow the device to read and write memory just like the CPU

Base + bound or base + count registers in the device

Set base + count register

Set the start transmit register

I/O device reads memory from base

Interrupts when done

Computer Science, Rutgers CS 519: Operating System Theory67

PIO vs. DMA

Overhead less for PIO than DMAPIO is a check against the status register, then send or receive

DMA must set up the base, count, check status, take an interrupt

DMA is more efficient at moving dataPIO ties up the CPU for the entire length of the transfer

Size of the transfer becomes the key factor in when to use PIO vs. DMA

Computer Science, Rutgers CS 519: Operating System Theory68

Example of PIO vs. DMA

Given:A load costs 100 CPU "cycles“ (time units)

A store costs 50 cycles

To process an interrupt costs 2000 instructions, each an average each of 2 cycles

To send a packet via PIO costs 1 load + 1 store per byte

To send via DMA costs 4 loads + interrupt

Find the packet size where transmitting via DMA costs less CPU cycles than PIO

Computer Science, Rutgers CS 519: Operating System Theory69

Example PIO vs. DMA

Find the number of bytes were PIO==DMA (cutoff point) cycles per load: L

cycles per store: S

byte in the packet: C

Express simple equation for CPU cycles in terms of cost per byte:

# of cycles for PIO = L + S*B

# of cycles for DMA = setup + interrupt

# of cycles for DMA = 4L + 4000

Set PIO cycles equal to DMA cycles and solve for bytes:L+S*B = 4L+4000

100+50B = 4(100)+4000

B = 86 bytes (cutoff point)

When the packet size is >86 bytes, DMA costs less cycles than PIO.

Computer Science, Rutgers CS 519: Operating System Theory70

Typical I/O devices

Disk drives:Present the CPU a linear array of fixed-sized blocks that are persistent across power cycles

Network cards:Allow the CPU to send and receive discrete units of data (packets) across a wire, fiber or radio

Packet sizes 64-8000 bytes are typical

Graphics adapters:Present the CPU with a memory that is turned into pixels on a screen

Computer Science, Rutgers CS 519: Operating System Theory71

Recap: the I/O design space

Polling vs. interruptsHow does the device notify the processor an event happened?

Polling: Device is passive, CPU must read/write a register

Interrupt: device signals CPU via an interrupt

Programmed I/O vs. DMA How the device sends and receives data

Programmed I/O: CPU must use load/store into the device

DMA: Device reads and writes memory

Computer Science, Rutgers CS 519: Operating System Theory72

Practical: How to boot?

How does a machine start running the operating system in the first place?

The process of starting the OS is called booting

Sequence of hardware + software event form the boot protocol

Boot protocol in modern machines is a 3-stage processCPU starts executing from a fixed address

Firmware loads the boot loader

Boot loader loads the OS

Computer Science, Rutgers CS 519: Operating System Theory73

Boot Protocol

(1) CPU is hard-wired to start executing from a known address in memory

E.g., on x86 this address is 0xFFFF0 (hexadecimal)

This memory address is typically mapped to read-only memory (ROM)

(2) ROM contains the “boot” codeThis kind of read-only software is called firmware

On x86, the starting address corresponds to the BIOS (basic input-output system) boot entry point

This "firmware" code contains only enough enough code to read 1 block from the disk drive. This block is loaded and then executed. This program is the boot loader.

Computer Science, Rutgers CS 519: Operating System Theory74

Boot Protocol (cont)

(3) The boot loader can then load the rest of the operating system from disk. Note that this point the OS still is not running

The boot loader can know about multiple operating systems

The boot loader can know about multiple versions of the OS

Computer Science, Rutgers CS 519: Operating System Theory75

Why Have A Boot Protocol?

Why not just store the OS into ROM?

Separate the OS from the hardwareMultiple OSes or different versions of the OS

Want to boot from different devices E.g. security via a network boot

OS is pretty big (4-8MB). Rather not have it as firmware

Computer Science, Rutgers CS 519: Operating System Theory76

Multiprocessors

CPUMemory

memory bus

I/O bus

disk Net interface

cache

Simple scheme (SMP): more than one processor on the same bus

Memory is shared among processors -- cache coherence

Bus contention increases -- does not scale

Alternative (non-bus) system interconnect – complex and expensive

SMPs naturally support single-image operating systems

CPU

cache

Computer Science, Rutgers CS 519: Operating System Theory77

Cache-Coherent Shared-Memory: UMA

CPU

Memory

CPU CPU CPU

SnoopingCaches

Computer Science, Rutgers CS 519: Operating System Theory78

CC-NUMA Multiprocessors

CPUMemory

memory bus

I/O bus

disk Net interface

cache

CPUMemory

memory bus

I/O bus

diskNet interface

cache

network

• Non-uniform access to different memories• Hardware allows remote memory accesses and maintains cache coherence• Scalable interconnect more scalable than bus-based UMA systems• Also naturally supports single-image operating systems• Complex hardware coherence protocols

Computer Science, Rutgers CS 519: Operating System Theory79

Multicomputers

Network of computers: “share-nothing” -- cheap

Distributed resources: difficult to program

Message passing

Distributed file system

Challenge: build efficient global abstraction in software

CPUMemory

memory bus

I/O bus

disk Net interface

cache

CPUMemory

memory bus

I/O bus

diskNet interface

cache

network

OS Mechanisms and Policies

Computer Science, Rutgers CS 519: Operating System Theory81

Basic computer structure

CPU Memory

memory bus

I/O bus

disk Net interface

Computer Science, Rutgers CS 519: Operating System Theory82

System Abstraction: Processes

A process is a system abstraction:illusion of being the only job in the system

hardware: computer

operating system: process

user: application create, kill processes,inter-process comm.

multiplex resources

Computer Science, Rutgers CS 519: Operating System Theory83

Processes: Mechanism and Policy

Mechanism:

Creation, destruction, suspension, context switch, signaling, IPC, etc.

Policy:

Minor policy questions:

Who can create/destroy/suspend processes?

How many active processes can each user have?

Major policy question that we will concentrate on:

How to share system resources between multiple processes?

Typically broken into a number of orthogonal policies for individual resources such as CPU, memory, and disk.

Computer Science, Rutgers CS 519: Operating System Theory84

Processor Abstraction: Threads

A thread is a processor abstraction: illusion of having 1 processor per execution context

hardware: processor

operating system: thread

application: execution contextcreate, kill, synch.

context switch

Process vs. Thread: Process is the unit of resource ownership,while Thread is the unit of instruction execution.

Computer Science, Rutgers CS 519: Operating System Theory85

Threads: Mechanism and Policy

Mechanism:Creation, destruction, suspension, context switch, signaling, synchronization, etc.

Policy:How to share the CPU between threads from different processes?

How to share the CPU between threads from the same process?

Computer Science, Rutgers CS 519: Operating System Theory86

Threads

Traditional approach: OS uses a single policy (or at most a fixed set of policies) to schedule all threads in the system. Assume two classes of jobs: interactive and batch.

New approaches: application-controlled scheduling, reservation-based scheduling

Computer Science, Rutgers CS 519: Operating System Theory87

Memory Abstraction: Virtual memory

hardware: physical memory

operating system: virtual memory

Virtual memory is a memory abstraction: illusion of large contiguous memory, often more memory than physically available

application: address spacevirtual addresses

physical addresses

Computer Science, Rutgers CS 519: Operating System Theory88

Virtual Memory: Mechanism

Mechanism:

Virtual-to-physical memory mapping, page-fault, etc.

physical memory:

v-to-p memory mappings

processes:

virtual address spacesp1 p2

Computer Science, Rutgers CS 519: Operating System Theory89

Virtual Memory: Policy

Policy:How to multiplex a virtual memory that is larger than the physical memory onto what is available?

How to share physical memory between multiple processes?

Computer Science, Rutgers CS 519: Operating System Theory90

Virtual Memory

Traditional approach: OS provides a sufficiently large virtual address space for each running application, does memory allocation and replacement, and may ensure protection

New approaches: external memory management, huge (64-bit) address space, global virtual address space

Computer Science, Rutgers CS 519: Operating System Theory91

Storage Abstraction: File System

hardware: disk

operating system: files, directories

A file system is a storage abstraction: illusion of structured storage space

application/user: copy file1 file2 naming, protection,operations on files

operations on disk blocks

Computer Science, Rutgers CS 519: Operating System Theory92

File System

Mechanism:File creation, deletion, read, write, file-block-to-disk-block mapping, file buffer cache, etc.

Policy:Sharing vs. protection?

Which block to allocate for new data?

File buffer cache management?

Computer Science, Rutgers CS 519: Operating System Theory93

File System

Traditional approach: OS does disk block allocation and caching (buffer cache) , disk operation scheduling, and management of the buffer cache

New approaches: application-controlled cache replacement, log-based allocation (makes writes fast)

Computer Science, Rutgers CS 519: Operating System Theory94

Traditional file system

application: read/write files

OS: translate file to disk blocks

...buffer cache ...maintains

controls disk accesses: read/write blocks

hardware:

Computer Science, Rutgers CS 519: Operating System Theory95

Application-controlled caching

application: read/write files replacement policy

OS: translate file to disk blocks

...buffer cache ...maintains

controls disk accesses: read/write blocks

hardware:

Computer Science, Rutgers CS 519: Operating System Theory96

Communication Abstraction: Messaging

hardware: network interface

operating system: TCP/IP protocols

Message passing is a communication abstraction: illusion of reliable (sometime ordered) transport

application: socketsnaming, messages

network packets

Computer Science, Rutgers CS 519: Operating System Theory97

Message Passing

Mechanism:Send, receive, buffering, retransmission, etc.

Policy:Congestion control and routing

Multiplexing multiple connections onto a single NIC

Computer Science, Rutgers CS 519: Operating System Theory98

Message Passing

Traditional approach: OS provides naming schemes, reliable transport of messages, packet routing to destination

New approaches: user-level protocols, zero-copy protocols, active messages, memory-mapped communication

Computer Science, Rutgers CS 519: Operating System Theory99

UMA Multiprocessors

CPU

Memory

memory bus

I/O bus

disk Net interface

cache

CPU

cache

Computer Science, Rutgers CS 519: Operating System Theory100

UMA Multiprocessors: OS Issues

ProcessesHow to divide processors among multiple processes? Time sharing vs. space sharing

ThreadsSynchronization mechanisms based on shared memory

How to schedule threads of a single process on its allocated processors?

Affinity scheduling?

Computer Science, Rutgers CS 519: Operating System Theory101

CC-NUMA Multiprocessors

CPUMemory

memory bus

I/O bus

disk Net interface

cache

CPUMemory

memory bus

I/O bus

diskNet interface

cache

network

Hardware allows remote memory accesses and maintains cache coherence through protocol

Computer Science, Rutgers CS 519: Operating System Theory102

CC-NUMA Multiprocessors: OS Issues

Memory locality!!Remote memory access up to an order of magnitude more expensive than local access

Thread migration vs. page migration

Page replication

Affinity scheduling

Computer Science, Rutgers CS 519: Operating System Theory103

Multicomputers

CPUMemory

memory bus

I/O bus

disk Net interface

cache

CPUMemory

memory bus

I/O bus

diskNet interface

cache

network

Share-nothing

Computer Science, Rutgers CS 519: Operating System Theory104

Multicomputers: OS Issues

SchedulingNode allocation? (CPU and memory allocated together)

Process migration?

Software distributed shared-memory (Soft DSM)

Distributed file systems

Low-latency reliable communication

OS Structure

Computer Science, Rutgers CS 519: Operating System Theory106

Traditional OS structure

Monolithic/layered systems

one/N layers all executed in “kernel-mode”

good performance but rigid

OS kernel

hardware

userprocess

filesystem

memorysystem

usersystem calls

Computer Science, Rutgers CS 519: Operating System Theory107

Micro-kernel OS

client-server model, IPC between clients and servers

the micro-kernel provides protected communication

OS functions implemented as user-level servers

flexible but efficiency is the problem

easy to extend for distributed systems

micro-kernel

hardware

clientprocess

fileserver

memoryserver

IPC

user mode

Computer Science, Rutgers CS 519: Operating System Theory108

Extensible OS kernel

User processes can load customized OS services into the kernel

Good performance and flexibility but protection and scalability become problems

extensible kernel

hardware

process

defaultmemoryservice

user modeprocess

mymemoryservice

Computer Science, Rutgers CS 519: Operating System Theory109

Virtual Machines

Old concept which is heavily revived today

the real hardware is “cloned” in several identical virtual machines

OS functionality built on top of the virtual machine

hardware

user

exokernel

allocate resourceOS on virtual machine

Some History …

… and a little bit of the future …

Computer Science, Rutgers CS 519: Operating System Theory111

The UNIX Time-sharing System

FeaturesTime-sharing system

Hierarchical file system

System command language (shell)

File-based device-independent I/O

Versions 1 & 2No multi-programming

Ran on PDP-7,9,11

Computer Science, Rutgers CS 519: Operating System Theory112

More History

Version 4Ran on PDP-11 (hardware costing < $40k)

Took less than 2 man-years to code

~50KB code size (kernel)

Written in C

Computer Science, Rutgers CS 519: Operating System Theory113

File System

Ordinary files (uninterpreted)

DirectoriesFile of files

Organized as a rooted tree

Pathnames (relative and absolute)

Contains links to parent, itself

Multiple links to files can exist

Link - hard OR symbolic

Computer Science, Rutgers CS 519: Operating System Theory114

File System (contd)

Special filesEach I/O device associated with a special file

To provide uniform naming and protection model

Uniform I/O

Computer Science, Rutgers CS 519: Operating System Theory115

Removable File Systems

Tree-structured file hierarchies

Mounted on existing space by using mount

No links between different file systems

Computer Science, Rutgers CS 519: Operating System Theory116

Protection

User iduid marked on files

Ten protection bitsnine - rwx permissions for user, group & other

setuid bit is used to change user id

Super-user has special uidexempt from constraints on access

Computer Science, Rutgers CS 519: Operating System Theory117

Uniform I/O Model

Basic system callsopen, close, creat, read, write, seek

Streams of bytes, no records

No locks visible to the user

Computer Science, Rutgers CS 519: Operating System Theory118

File System Implementation

I-node contains a short description of file

direct, single-indirect and double-indirect pointers to disk blocks

I-listtable of i-nodes, indexed by i-number

pathname scanning to determine i-number

Allows simple and efficient fsck

Buffered data

Write-behind

Computer Science, Rutgers CS 519: Operating System Theory119

Processes

ProcessMemory image, register values, status of open files etc.

Memory image consists of text, data, and stack segments

To create new processes

pid = fork()

process splits into two independently executing processes (parent and child)

Pipes used for communication between related processes

exec(file, arg1, ..., argn) used to start another application

Computer Science, Rutgers CS 519: Operating System Theory120

The Shell

Command-line interpreter

cmd arg1 arg2 ... argn

i/o redirection

<, >

filters & pipes

ls | more

job control

cmd &

simplified shell =>

Computer Science, Rutgers CS 519: Operating System Theory121

Plan 9 From Bell Labs

Move towards distributed computing with PCs brought two problems:

Focus on private machines difficult for networks of machines to serve as seamlessly as old monolithic timesharing systems.

Many machines in physically different locations administrative nightmare

Plan 9build Unix out of lot of little systems, not a system out of lots of little Unixes.

Computer Science, Rutgers CS 519: Operating System Theory122

Plan 9 Core

Client/server modelShared-memory multiprocessors provide computing cycles

File servers provide storage

PCs/workstation provides terminal

Three basic principles:File-like API to name and access all resources

Standard protocol for accessing these resources

Personal hierarchical namespace

Computer Science, Rutgers CS 519: Operating System Theory123

A Plan 9 Installation

Computer Science, Rutgers CS 519: Operating System Theory124

Everything Is A File

All systems export a file system APIOpen, close, read, write, etc.

API defined as set of messages on a communication channelextends easily to a remote environment

Kernel-based service - mount deviceAttach communication channels to remote servers, into the name space

Bind to give a new name for an existing object

Converts system calls on remote resource into 9P messages on the communication channel

Real file systemSingle view of system spread across many disks

Storage hierarchy: memory buffer, disks, WORM jukebox

Computer Science, Rutgers CS 519: Operating System Theory125

Resource Access Protocol

A single resource access protocol - 9PControls file systems

Resolve file names, traverse name hierarchy of filesystems

Computer Science, Rutgers CS 519: Operating System Theory126

Personal Name Space

Personal name space a powerful abstractionAlso used in other systems (e.g. Spring)

Plan 9 example: "my house" vs. the exact address/dev/cons always refers to the user's terminal

Local name space obeys globally understood conventions

Computer Science, Rutgers CS 519: Operating System Theory127

Parallel Programming

Processes (no threads)Can control what resources are shared when forking

New processes created using an rfork system call

argument to rfork is a bit vector which specifies which of the parent's resources should be copied/shared/created anew

resources - name space, the environment, file descriptor table, memory etc.

Basic IPC mechanism: rendezvous

Computer Science, Rutgers CS 519: Operating System Theory128

Protection & Access Control

Authentication mechanism based on DESbuilt into 9P

No super-useruser with special privileges for maintenance

Use of file system interface for most of the services takes care of ownership and permissions problems