
Virtual Memory Input/Output - Duke Computer Science › ... › cps104 › lectures › 2up-input-output.pdf · 2007-04-11 · Virtual Memory Input/Output Computer Science 104 Lecture


Virtual Memory, Input/Output

Computer Science 104, Lecture 21

CPS 104 © Alvin R. Lebeck

Admin

• Projects due next Friday, April 20
• Homework #6 assigned
• Today
  VM
  VM + Caches, I/O
• Reading
  I/O: Chapter 8, Input/Output (primarily 8.4 and 8.5)

Review: Virtual Memory

• Process = virtual address space + thread of control
• Translation
  VA -> PA
  What physical address does virtual address A map to?
  Is VA in physical memory?
• Protection (access control)
  Do you have permission to access it?

[Figure: a virtual address (VA) is translated to a physical address (PA)]

Review: Paged Virtual Memory

• Virtual address (2^32, 2^64) to physical address (2^28) mapping
  virtual page to physical page frame
• Fixed-size units for access control & translation

[Figure: virtual pages mapped to physical page frames, with example addresses from 0x0000 to 0x11000; a virtual address is split into a virtual page number and an offset]

Review: Virtual to Physical Address translation

• Need to translate every access (instruction and data)
• Page size: 4K

[Figure: the virtual address (bits 31-0) is split into a virtual page number (bits 31-12) and a page offset (bits 11-0); the page table maps the virtual page number to a physical frame number (bits 27-12), and the page offset is copied unchanged into bits 11-0 of the physical address]
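The split into page number and offset can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the page table is modeled as a plain dictionary, and the names `translate`, `PageFault`, and the example mapping are hypothetical, not part of the lecture.

```python
PAGE_SIZE = 4096      # 4K pages, as on the slide
OFFSET_BITS = 12      # log2(PAGE_SIZE)


class PageFault(Exception):
    """Raised when the virtual page has no mapping (page not present)."""


def translate(page_table, va):
    """Translate a virtual address using a dict of VPN -> physical frame."""
    vpn = va >> OFFSET_BITS            # virtual page number (upper bits)
    offset = va & (PAGE_SIZE - 1)      # page offset (lower 12 bits)
    if vpn not in page_table:
        raise PageFault(hex(va))
    pfn = page_table[vpn]              # physical frame number
    return (pfn << OFFSET_BITS) | offset   # offset passes through unchanged


# Hypothetical mapping: virtual page 0x6 lives in physical frame 0x2.
example_table = {0x6: 0x2}
pa = translate(example_table, 0x6ABC)
```

Only the page number is looked up; the low 12 bits are never translated, which is why the page size is chosen as a power of 2.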

The Memory Management Unit (MMU)

• Input
  virtual address
• Output
  physical address
  access violation (exception)
• Access Violations
  not present
  user vs. kernel
  write
  read
  execute

Translation Lookaside Buffers (TLB)

• Need to perform address translation on every memory reference
  30% of instructions are memory references
  at least one memory reference per cycle on today’s processors
• Make Common Case Fast, others correct
• If you just accessed a virtual page and hence determined its physical frame, you’re likely to need that same translation again in the near future
  LOCALITY!
• Throw hardware at the problem
• Cache PTEs

Fast Translation: Translation Buffer

• Cache of translated addresses
• 64 entry fully associative

[Figure: the page number of the virtual address is compared against the tag of every entry (fields: v, r, w, tag, phys frame); a 64x1 mux selects the matching physical frame, which is joined with the page offset to form the physical address]

TLB Design

• Must be fast, not increase critical path
• Must achieve high hit ratio
• Generally small # of entries with high associativity
• Mapping change
  page removed from physical memory
  processor must invalidate the TLB entry (special privileged instructions)
• PTE is a per-process entity
  Multiple processes with same virtual addresses
  Context Switches?
• Flush TLB
• Add Address space identifier (ASID or PID)
  part of processor state, must be set on context switch
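A rough sketch of why an ASID removes the need to flush on a context switch: entries are tagged with (ASID, virtual page number), so two processes can use the same virtual page without colliding. The `TLB` class, its LRU replacement policy, and the dictionary page tables are assumptions made for illustration; real TLBs are hardware structures and the slides do not specify a replacement policy.

```python
from collections import OrderedDict


class TLB:
    """Tiny fully associative TLB keyed by (ASID, VPN); LRU is assumed."""

    def __init__(self, entries=64):
        self.entries = entries
        self.map = OrderedDict()          # (asid, vpn) -> physical frame
        self.hits = 0
        self.misses = 0

    def lookup(self, asid, vpn, page_table):
        key = (asid, vpn)
        if key in self.map:
            self.hits += 1
            self.map.move_to_end(key)     # mark as most recently used
            return self.map[key]
        self.misses += 1
        pfn = page_table[vpn]             # miss: consult the page table
        if len(self.map) >= self.entries:
            self.map.popitem(last=False)  # evict least recently used entry
        self.map[key] = pfn
        return pfn

    def invalidate(self, asid, vpn):
        """Drop one entry, e.g. when its page leaves physical memory."""
        self.map.pop((asid, vpn), None)
```

With this tagging, switching from process 0 to process 1 only changes the ASID used for lookups; entries belonging to process 0 stay valid for when it runs again.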

Hardware Managed TLBs

• Hardware Handles TLB miss
• Dictates page table organization
• Complicated state machine to “walk page table”
  Multiple levels for forward mapped
  Linked list for inverted
• Exception only if access violation

[Figure: CPU with TLB and control logic, connected to memory]

Software Managed TLBs

• Software Handles TLB miss
  OS reads translations from Page Table and puts them in TLB
  special instructions
• Flexible page table organization
• Simple Hardware to detect Hit or Miss
• Exception if TLB miss or access violation

[Figure: CPU with TLB and control logic, connected to memory]

Virtual Memory

• Provides illusion of very large memory
  Sum of the memory of many jobs greater than physical memory
  Address space of each job larger than physical memory
• Good utilization of available physical memory.
• Simplifies memory management: code and data movement, protection, ... (main reason today)
• Exploits memory hierarchy to keep average access time low.
• Involves at least two storage levels: main and secondary

Virtual Address -- address used by the programmer
Virtual Address Space -- collection of such addresses
Memory Address -- address in physical memory, also known as “physical address” or “real address”

Paged Virtual Memory: Main Idea

• Divide memory (virtual and physical) into fixed size blocks (Pages, Frames).
  Pages in Virtual space.
  Frames in Physical space.
• Make page size a power of 2: (page size = 2^k)
• All pages in the virtual address space are contiguous.
• Pages can be mapped into any physical Frame
• Some pages in main memory (DRAM), some pages on secondary memory (disk).


Paged Virtual Memory: Main Idea (Cont)

• All programs are written using Virtual Memory Address Space.

• The hardware does on-the-fly translation between virtual and physical address spaces.

• Use a Page Table to translate between Virtual and Physical addresses

• Translation Lookaside Buffer (TLB) expedites address translation

• Must select “good” page size to minimize fragmentation

Mapping the Kernel

• Digital Unix Kseg
  kseg (bit 63 = 1, 62 = 0)
• Kernel has direct access to physical memory
• One VA->PA mapping for entire Kernel
• Lock (pin) TLB entry
  or special HW detection

[Figure: virtual address space from 0 to 2^64-1 holding user code/data, user stack, and kernel regions; the kernel region maps directly onto physical memory]

Segmented Virtual Memory

• Virtual address (2^32, 2^64) to physical address (2^28) mapping
• Each segment
  variable size
  Address = base + offset
  contiguous in both VA and PA

[Figure: variable-size virtual segments mapped to contiguous physical regions, with example addresses from 0x0000 to 0x11000]

Intel Pentium Segmentation

[Figure: the virtual address consists of a segment selector and an offset; the selector picks a segment descriptor from the Global Descriptor Table (GDT), and the descriptor's segment base address plus the offset locates the data in the physical address space]

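Intel Pentium Segmentation + Paging

[Figure: the virtual (logical) address consists of a segment selector and an offset; the selector picks a segment descriptor from the Global Descriptor Table (GDT), and the segment base address plus the offset gives an address in the linear address space; the linear address is split into Dir, Table, and Offset fields that index the page directory and page table to reach the physical address space]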

Review: Fast Translation: Translation Buffer

• Cache of translated addresses
• 64 entry fully associative

[Figure: the page number of the virtual address is compared against the tag of every entry (fields: v, r, w, tag, phys frame); a 64x1 mux selects the matching physical frame, which is joined with the page offset to form the physical address]

Address Translation and Caches

• Where is the TLB with respect to the cache?
• What are the consequences?
• Most of today’s systems have more than 1 cache
• Does the OS need to worry about this?

TLBs and Caches

• Conventional Organization
  CPU -> (VA) -> TB -> (PA) -> $ -> (PA) -> MEM
• Virtually Addressed Cache
  CPU -> (VA) -> $ (VA tags) -> (VA) -> TB -> (PA) -> MEM
  Translate only on miss
  Alias (Synonym) Problem
• Overlap $ access with VA translation
  CPU sends the VA to the $ (PA tags) and the TB at the same time; PA goes on to the L2 $ and MEM
  requires $ index to remain invariant across translation

[Figure: the three organizations drawn side by side]
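The "index must remain invariant across translation" condition can be checked with simple arithmetic: the cache index and block-offset bits together must fit inside the page offset, which holds when the bytes per way do not exceed the page size. The function name and the example cache sizes below are hypothetical and chosen only for illustration.

```python
def index_within_page_offset(cache_bytes, associativity, page_bytes):
    """True if the set index uses only untranslated page-offset bits.

    Index bits + block-offset bits = log2(cache_bytes / associativity),
    so the test reduces to comparing bytes per way with the page size.
    """
    bytes_per_way = cache_bytes // associativity
    return bytes_per_way <= page_bytes


# Hypothetical configurations with 4 KB pages:
direct_mapped_8k = index_within_page_offset(8 * 1024, 1, 4096)   # too big
two_way_8k = index_within_page_offset(8 * 1024, 2, 4096)         # fits
```

This is one reason such caches either stay small or raise associativity as they grow.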

Input / Output

Overview

• I/O devices
  device controller
• Device drivers
• Memory Mapped I/O
• Programmed I/O
• Direct Memory Access (DMA)
• Rotational media (disks)
• I/O Bus technologies
• RAID (if time)

Why I/O?

• Interactive Applications (keyboard, mouse, screen)
• Long term storage (files, data repository)
• Swap for VM
• Many different devices
  character vs. block
  Networks are everywhere!
• 10^6 difference between CPU (10^-9 sec) & I/O (10^-3 sec)
• Response Time vs. Throughput
  Not always another process to execute
• OS hides (some) differences in devices
  same (similar) interface to many devices
• Permits many apps to share one device

I/O Device Examples

Device            Behavior       Partner   Data Rate (KB/sec)
Keyboard          Input          Human          0.01
Mouse             Input          Human          0.02
Line Printer      Output         Human          1.00
Laser Printer     Output         Human        100.00
Graphics Display  Output         Human     30,000.00
Network-LAN       Input/Output   Machine   10,000.00
Floppy disk       Storage        Machine       50.00
Optical Disk      Storage        Machine      500.00
Magnetic Disk     Storage        Machine    5,000.00

I/O Systems

Time(workload) = Time(CPU) + Time(I/O) - Time(Overlap)

[Figure: processor and cache on the memory bus with main memory; an I/O bridge (core chip set) joins the memory bus to the I/O bus, which hosts disk controllers (disks), a graphics controller (graphics), and a network interface (network); the I/O devices signal the processor with interrupts]

Device Drivers

• top-half
  API (open, close, read, write, ioctl)
  I/O Control (IOCTL, device specific arguments)
• bottom-half
  interrupt handler
  communicates with device
  resumes process
• Must have access to user address space and device control registers => runs in kernel mode.

Review: Handling an Interrupt/Exception

• Invoke specific kernel routine based on type of interrupt
  interrupt/exception handler
• Must determine what caused interrupt
• Clear the interrupt
• Return from interrupt
  (RETT, MIPS = RFE, NiosII = eret)

[Figure: a user program (ld, add, st, mul, beq, ld, sub, bne) is interrupted; control passes to the interrupt handler, which calls service routines and returns with RETT]

Processor <-> Device Interface Issues

• Interconnections
  Busses
• Processor interface
  I/O Instructions
  Memory mapped I/O
• I/O Control Structures
  Device Controllers
  Polling/Interrupts
• Data movement
  Programmed I/O / DMA
• Capacity, Access Time, Bandwidth

Device Controllers

• Controller deals with mundane control (e.g., position head, error detection/correction)
• Processor communicates with Controller

[Figure: device controller attached to the bus, with Command, Status (Busy, Done, Error), and Data 0 ... Data n-1 registers; the controller connects to the device and may raise an interrupt]

I/O Instructions

• Independent I/O Bus
• Separate instructions (in, out)

[Figure: CPU connected to memory over the memory bus and to device controllers and their devices over an independent I/O bus]

Memory Mapped I/O

• Issue command through store instruction
• Check status with load instruction
• Caches?

[Figure: the physical address space is divided into ROM, RAM, and I/O regions; the CPU with $ and L2 $ sits on the memory bus, and a bridge (memory bus adapter) connects it to the I/O bus, controller, and device]

Communicating with the processor

• Polling
  can waste time waiting for slow I/O device
  busy wait
  can interleave with useful work
• Interrupts
  interrupt overhead
  interrupt could happen anytime - asynchronous
  no busy wait

Data Movement

• Programmed I/O
  processor has to touch all the data
  too much processor overhead
    » for high bandwidth devices (disk, network)
• DMA
  processor sets up transfer(s)
  DMA controller transfers data
  complicates memory system

Programmed I/O & Polling

• Advantage: CPU totally in control
• Disadvantage: Overhead of polling
  Program must perform check of device, thus can’t do useful work

[Flowchart: Is the data ready? If no, check again; if yes, load data, then store data; done? If no, return to the ready check; if yes, finish]
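The polling loop can be sketched as follows. `FakeDevice` is a made-up stand-in for a device controller (it reports ready on every third status check); a real driver would load from and store to the controller's status and data registers instead.

```python
class FakeDevice:
    """Hypothetical device: ready() is true on every third status check."""

    def __init__(self, data):
        self.data = list(data)
        self.checks = 0

    def ready(self):
        self.checks += 1
        return self.checks % 3 == 0

    def read(self):
        return self.data.pop(0)

    def done(self):
        return not self.data


def polled_read(dev):
    """Programmed I/O with polling: the CPU moves every item itself."""
    buf = []
    wasted_polls = 0
    while not dev.done():
        while not dev.ready():       # busy wait: no useful work happens here
            wasted_polls += 1
        buf.append(dev.read())       # load data from device, store to memory
    return buf, wasted_polls
```

The count of wasted polls is the overhead the slide refers to: it grows with how slow the device is relative to the processor.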

Programmed I/O & Interrupt Driven Data Transfer

• User program progress halted only during actual transfer
• Interrupt overhead can dominate transfer time
• Processor must touch all data…too slow for some devices

[Figure: while the user program runs (add, sub, and, or, nop), (1) an I/O interrupt arrives, (2) the PC is saved, (3) control jumps to the interrupt service address, and (4) the interrupt service routine reads the data, stores it to memory, and returns with rti; the hardware path is CPU, $, L2 $, memory bus, memory bus adapter, I/O bus, controller, device]

Direct Memory Access (DMA)

• CPU delegates responsibility for data transfer to a special controller
  CPU sends a starting address, direction, and length count to DMAC. Then issues "start".
  DMAC provides handshake signals for device controller, and memory addresses and handshake signals for memory.

[Figure: memory-mapped I/O address space from 0 to n holding ROM, RAM, peripherals, and the DMAC; the CPU with $ and L2 $ sits on the memory bus, and a memory bus adapter connects it to the I/O bus, where the DMA controller resides]

I/O and Virtual Caches

• I/O is accomplished with physical addresses
• DMA
  flush pages from cache
  need pa->va reverse translation
  coherent DMA

[Figure: processor with a virtual cache on the memory bus with main memory; an I/O bridge connects to the I/O bus with disk controllers (disks), a graphics controller (graphics), and a network interface (network); devices use physical addresses and signal the processor with interrupts]

Types of Storage Devices

• Magnetic Disks
• Magnetic Tapes
• CD ROM
• Juke Box (automated tape library, robots)

Organization of a Hard Magnetic Disk

• Typical numbers (depending on the disk size):
  500 to 2,000 tracks per surface
  32 to 128 sectors per track
    » A sector is the smallest unit that can be read or written
• Traditionally all tracks have the same number of sectors:
  Constant bit density: record more sectors on the outer tracks
  Recently relaxed: constant bit size, speed varies with track location

[Figure: stack of platters, with one track and one sector marked]

Magnetic Disk Characteristics

• Cylinder: all the tracks under the head at a given point on all surfaces
• Reading/writing data is a three-stage process:
  Seek time: position the arm over the proper track
  Rotational latency: wait for the desired sector to rotate under the read/write head
  Transfer time: transfer a block of bits (sector) under the read/write head
• Average seek time as reported by the industry:
  Typically in the range of 8 ms to 12 ms
  (Sum of the time for all possible seeks) / (total # of possible seeks)
• Due to locality of disk references, the actual average seek time may be only 25% to 33% of the advertised number

[Figure: disk showing platter, head, track, sector, and cylinder]

Typical Numbers of a Magnetic Disk

• Rotational Latency:
  Most disks rotate at 3,600 to 10,000 RPM
  Approximately 16.7 ms to 6 ms per revolution, respectively
  An average latency to the desired information is halfway around the disk: 8 ms at 3600 RPM, 4 ms at 7200 RPM
• Transfer Time is a function of:
  Transfer size (usually a sector): 1 KB / sector
  Rotation speed: 3600 RPM to 7200 RPM
  Recording density: bits per inch on a track
  Diameter: typically ranges from 2.5 to 5.25 in
  Typical values: 2 to 12 MB per second

[Figure: disk showing platter, head, track, sector, and cylinder]

Disk Access

• Access time = queue + seek + rotational + transfer + overhead
• Seek time
  move arm over track
  average is confusing (startup, slowdown, locality of accesses)
• Rotational latency
  wait for sector to rotate under head
  average = 0.5 revolution / (3600 RPM) = 8.3 ms
• Transfer Time
  f(size, BW bytes/sec)

Disk Access Time Example

• Disk Parameters:
  Transfer size is 8K bytes
  Advertised average seek is 12 ms
  Disk spins at 7200 RPM
  Transfer rate is 4 MB/sec
• Controller overhead is 2 ms
• Assume that disk is idle so no queuing delay
• What is Average Disk Access Time for a Sector?
  Ave seek + ave rot delay + transfer time + controller overhead
  12 ms + 0.5 / (7200 RPM / 60) + 8 KB / (4 MB/s) + 2 ms
  12 + 4.17 + 2 + 2 = 20 ms
• Advertised seek time assumes no locality: actual seek is typically 1/4 to 1/3 of the advertised seek time, so access time drops from 20 ms to about 12 ms
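The arithmetic of this example can be reproduced with a short Python sketch. The function name and parameter names are made up for illustration, and decimal units (1 KB = 1000 bytes, 1 MB = 10^6 bytes) are assumed so that the transfer term matches the slide's 2 ms.

```python
def disk_access_ms(seek_ms, rpm, transfer_bytes, rate_bytes_per_s,
                   overhead_ms, queue_ms=0.0):
    """Access time = queue + seek + rotational + transfer + overhead (ms)."""
    rotational_ms = 0.5 / (rpm / 60.0) * 1000.0      # half a revolution
    transfer_ms = transfer_bytes / rate_bytes_per_s * 1000.0
    return queue_ms + seek_ms + rotational_ms + transfer_ms + overhead_ms


# Slide parameters: 12 ms advertised seek, 7200 RPM, 8 KB at 4 MB/s, 2 ms.
advertised = disk_access_ms(12.0, 7200, 8_000, 4_000_000, 2.0)

# With locality, seek is roughly 1/4 to 1/3 of the advertised value.
with_locality = disk_access_ms(12.0 / 3, 7200, 8_000, 4_000_000, 2.0)
```

Here `advertised` comes to about 20.2 ms and `with_locality` to about 12.2 ms, in line with the 20 ms => 12 ms figures above.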

Buses: Connecting I/O to Processor and Memory

• A bus is a shared communication link
• It uses one set of wires to connect multiple subsystems

[Figure: processor (control and datapath), memory, input, and output connected by a bus]

Advantages of Buses

• Versatility:
  New devices can be added easily
  Peripherals can be moved between computer systems that use the same bus standard
• Low Cost:
  A single set of wires is shared in multiple ways

[Figure: processor, memory, and three I/O devices sharing one bus]

Disadvantages of Buses

• The bus creates a communication bottleneck
  Bus bandwidth can limit the maximum I/O throughput
• The maximum bus speed is largely limited by:
  The length of the bus
  The number of devices on the bus
  The need to support a range of devices with:
    » Widely varying latencies
    » Widely varying data transfer rates

[Figure: processor, memory, and three I/O devices sharing one bus]

The General Organization of a Bus

• Control lines:
  Signal requests and acknowledgments
  Indicate what type of information is on the data lines
• Data lines carry information between the source and the destination:
  Data and Addresses
  Complex commands

[Figure: a bus drawn as a set of data lines and a set of control lines]

Master versus Slave

• A bus transaction includes two parts:
  Sending the address
  Receiving or sending the data
• Master is the device that starts the bus transaction by:
  Sending the address
• Slave is the device that responds to the address by:
  Sending data to the master if the master asks for data
  Receiving data from the master if the master wants to send data

[Figure: the bus master sends the address to the bus slave; data can go either way]

Output Operation

• Output: Processor sending data to the I/O device:
  Step 1: Request Memory
    Control: Memory Read Request
    Data: Memory Address
  Step 2: Read Memory
    Data: the data read from memory
  Step 3: Send Data to I/O Device
    Control: Device Write Request
    Data: I/O Device Address and then Data

[Figure: processor, memory, and I/O device (disk) on shared control and data lines, drawn once per step]

Input Operation

• Input is defined as the Processor receiving data from the I/O device:
  Step 1: Request Memory
    Control: Memory Write Request
    Data: Memory Address
  Step 2: Receive Data
    Control: I/O Read Request
    Data: I/O Device Address and then Data

[Figure: processor, memory, and I/O device (disk) on shared control and data lines, drawn once per step]

Types of Buses

• Processor-Memory Bus (design specific)
  Short and high speed
  Only need to match the memory system
    » Maximize memory-to-processor bandwidth
  Connects directly to the processor
• External I/O Bus (industry standard)
  Usually is lengthy and slower
  Need to match a wide range of I/O devices
  Connects to the processor-memory bus or backplane bus
• Backplane Bus (industry standard)
  Backplane: an interconnection structure within the chassis
  Allow processors, memory, and I/O devices to coexist
  Cost advantage: one single bus for all components
• Bit-Serial Buses (New trend: USB, Firewire, ...)
  Use high speed unidirectional point-to-point communication

A Computer System with One Bus: Backplane Bus

• A single bus (the backplane bus) is used for:
  Processor to memory communication
  Communication between I/O devices and memory
• Advantages: Simple and low cost
• Disadvantages: slow and the bus can become a major bottleneck
• Example: Early IBM PC

[Figure: processor, memory, and I/O devices all attached to a single backplane bus]

A Two-Bus System

• I/O buses tap into the processor-memory bus via bus adaptors:
  Processor-memory bus: mainly for processor-memory traffic
  I/O buses: provide expansion slots for I/O devices
• Example: Apple Macintosh-II
  NuBus: Processor, memory, and a few selected I/O devices
  SCSI Bus: the rest of the I/O devices

[Figure: processor and memory on the processor-memory bus; three bus adaptors each connect an I/O bus to it]

A Three-Bus System

• A small number of backplane buses tap into the processor-memory bus
  Processor-memory bus is used for processor-memory traffic
  I/O buses are connected to the backplane bus
• Advantage: loading on the processor bus is greatly reduced

[Figure: processor and memory on the processor-memory bus; a bus adaptor connects it to the backplane bus, and further bus adaptors connect the backplane bus to the I/O buses]

Pentium 4

• I/O Options

[Figure: the Pentium 4 processor connects over the system bus (800 MHz, 6.4 GB/sec) to the memory controller hub (north bridge, 82875P). The north bridge links to main memory DIMMs over two DDR 400 channels (3.2 GB/sec each), to graphics output over AGP 8X (2.1 GB/sec), and to 1 Gbit Ethernet over CSA (0.266 GB/sec). A 266 MB/sec link joins it to the I/O controller hub (south bridge, 82801EB), which serves two disks over Serial ATA (150 MB/sec each), CD/DVD and tape over Parallel ATA (100 MB/sec each), stereo (surround-sound) over AC/97 (1 MB/sec), USB 2.0 (60 MB/sec), 10/100 Mbit Ethernet (20 MB/sec), and the PCI bus (132 MB/sec)]

Synchronous and Asynchronous Bus

• Synchronous Bus:
  Includes a clock in the control lines
  A fixed protocol for communication that is relative to the clock
  Advantage: involves very little logic and can run very fast
  Disadvantages:
    » Every device on the bus must run at the same clock rate
    » To avoid clock skew, bus must be short if it is fast.
• Asynchronous Bus:
  It is not clocked
  It can accommodate a wide range of devices
  It can be lengthened without worrying about clock skew
  It requires a handshaking protocol

A Handshaking Protocol

• Three control lines
  ReadReq: indicates a read request for memory
    Address is put on the data lines at the same time
  DataRdy: indicates the data word is now ready on the data lines
    Data is put on the data lines at the same time
  Ack: acknowledge the ReadReq or the DataRdy of the other party

[Figure: timing diagram of the ReadReq, Data (address, then data), Ack, and DataRdy lines, with the handshake steps numbered 1 through 7]

Increasing the Bus Bandwidth

• Separate versus multiplexed address and data lines:
  Address and data can be transmitted in one bus cycle if separate address and data lines are available
  Cost: (a) more bus lines, (b) increased complexity
• Data bus width:
  By increasing the width of the data bus, transfers of multiple words require fewer bus cycles
  Example: USB vs. 32-bit PCI vs. 64-bit PCI
  Cost: more bus lines
• Block transfers:
  Allow the bus to transfer multiple words in back-to-back bus cycles
  Only one address needs to be sent at the beginning
  The bus is not released until the last word is transferred
  Cost: (a) increased complexity, (b) slower response to other requests while the bus is held

Obtaining Access to the Bus

• One of the most important issues in bus design:
  How is the bus reserved by a device that wishes to use it?
• Chaos is avoided by a master-slave arrangement:
  Only the bus master can control access to the bus:
    » It initiates and controls all bus requests
  A slave responds to read and write requests
• The simplest system:
  Processor is the only bus master
  All bus requests must be controlled by the processor
  Major drawback: the processor is involved in every transaction

[Figure: control: the bus master initiates requests to the bus slave; data can go either way]

Multiple Potential Bus Masters: the Need for Arbitration

• Bus arbitration scheme:
  A bus master wanting to use the bus asserts the bus request
  A bus master cannot use the bus until its request is granted
  A bus master must signal to the arbiter after it finishes using the bus
• Bus arbitration schemes usually try to balance two factors:
  Bus priority: the highest priority device should be serviced first
  Fairness: even the lowest priority device should never be completely locked out from the bus
• Bus arbitration schemes can be divided into four broad classes:
  Distributed arbitration by self-selection: each device wanting the bus places a code indicating its identity on the bus.
  Distributed arbitration by collision detection: Ethernet uses this.
  Daisy chain arbitration: single device with all request lines.
  Centralized, parallel arbitration: see the Centralized Bus Arbitration slide

Centralized Bus Arbitration

[Figure: masters A, B, and C each have their own request line (Request-A, Request-B, Request-C) and grant line (Grant-A, Grant-B, Grant-C) to a central arbitration circuit, plus a shared release line]

The Daisy Chain Bus Arbitration Scheme

• Advantage: simple
• Disadvantages:
  Cannot assure fairness: a low-priority device may be locked out indefinitely
  The use of the daisy chain grant signal also limits the bus speed

[Figure: the bus arbiter's grant signal passes from device 1 (highest priority) through device 2 to device N (lowest priority); all devices share the request and release lines]
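A small Python sketch of the grant chain shows where the unfairness comes from. The function names and the list-of-booleans encoding of request lines are assumptions for illustration; index 0 stands for the device closest to the arbiter.

```python
def daisy_chain_grant(requests):
    """Return the index of the device that keeps the grant, or None.

    requests[i] is True if device i wants the bus. The grant enters at
    device 0 and is passed along only by devices that are not requesting.
    """
    for i, wants_bus in enumerate(requests):
        if wants_bus:
            return i          # this device takes the grant; chain stops here
    return None


def count_grants(request_rounds):
    """Tally which device wins each arbitration round."""
    counts = {}
    for requests in request_rounds:
        winner = daisy_chain_grant(requests)
        if winner is not None:
            counts[winner] = counts.get(winner, 0) + 1
    return counts


# If device 0 requests in every round, device 2 never gets the bus.
starved = count_grants([[True, False, True]] * 5)
```

The sequential pass through each device is also why the grant signal limits bus speed: its delay grows with the number of devices on the chain.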

Summary of Bus Options

Option          High performance                          Low cost
Bus width       Separate address & data lines             Multiplex address & data lines
Data width      Wider is faster (e.g., 64 bits)           Narrower is cheaper (e.g., 8 bits)
Transfer size   Multiple words has less bus overhead      Single-word transfer is simpler
Bus masters     Multiple (requires arbitration)           Single master (no arbitration)
Clocking        Synchronous                               Asynchronous

Designing an I/O System

• CPU 3x10^9 inst/sec, average 100,000 insts in OS per I/O
• Memory bus 1000 MB/sec
• SCSI Ultra320 controller 320 MB/sec, up to 7 disks
• Disk R/W BW 75 MB/sec, average seek+rotational 6 ms
• Workload:
  64KB reads (sequential)
  200,000 user insts per I/O
• Find:
  max sustainable I/O rate
  # of disks & controllers required

I/O System Design

Find bottleneck
• CPU I/O rate = inst rate / (insts per I/O) = 3x10^9 / ((200+100) x 10^3) = 10,000 I/Os / sec
• Memory bus I/O rate = Bus BW / Bytes per I/O = (1000 x 10^6) / (64 x 10^3) = 15,625 I/Os / sec
• The CPU is the bottleneck, so the maximum sustainable rate is 10,000 I/Os / sec

Configure rest of system
• # of disks = CPU I/O rate / disk I/O rate
• Time for 1 disk I/O = seek + rotational + transfer = 6 ms + 64KB / (75 MB/sec) = 6.85 ms (about 6.9 ms)
• I/Os / sec = (1000 ms/sec) / (6.85 ms / I/O) = 146 I/Os / sec
• # disks = 10,000 I/Os / sec / 146 I/Os / sec = 69 disks

I/O system design

• Determine # of controllers
• Can we fill one controller with 7 disks?
  Average transfer rate per disk = 64KB / 6.85 ms, about 9.5 MB/sec
  Total for 7 disks is < 70 MB/sec << 320 MB/sec
• So 69 / 7 ~= 10 SCSI controllers
• What would have to change to require more controllers?
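The sizing steps can be collected into one small Python function. The function and key names are hypothetical, and decimal units (64 KB = 64 x 10^3 bytes) are assumed, matching the bus-rate calculation in the example; rounding up with `math.ceil` reflects that disks and controllers come in whole units.

```python
import math


def size_io_system(cpu_ips=3e9, os_insts=100_000, user_insts=200_000,
                   bus_bytes_per_s=1000e6, io_bytes=64e3,
                   disk_bytes_per_s=75e6, seek_rot_ms=6.0,
                   disks_per_controller=7):
    """Find the bottleneck I/O rate, then size disks and controllers."""
    cpu_ios = cpu_ips / (os_insts + user_insts)     # I/Os per sec the CPU allows
    bus_ios = bus_bytes_per_s / io_bytes            # I/Os per sec the bus allows
    max_ios = min(cpu_ios, bus_ios)                 # bottleneck sets the rate
    disk_ms = seek_rot_ms + io_bytes / disk_bytes_per_s * 1000.0
    disk_ios = 1000.0 / disk_ms                     # I/Os per sec per disk
    disks = math.ceil(max_ios / disk_ios)
    controllers = math.ceil(disks / disks_per_controller)
    return {"cpu_ios": cpu_ios, "bus_ios": bus_ios, "max_ios": max_ios,
            "disk_ms": disk_ms, "disks": disks, "controllers": controllers}


result = size_io_system()
```

Changing a parameter, for example a much higher per-disk transfer rate or larger requests, is a quick way to explore when the controller bandwidth rather than the disk count starts to matter.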

Communication Networks

• Send/receive queues in processor memories
• Network controller copies back and forth via DMA
• No host intervention needed
• Interrupt host when message sent or received

[Figure: a node's processor and memory share the peripheral backplane bus with a network controller (control register interface, network interface to the media, request block, receive block); processor memory holds a list of request blocks pointing to data to be transmitted, a list of receive blocks pointing to data received, and a list of free blocks, all moved by DMA]

Relationship to Processor Architecture

• Virtual memory frustrates DMA
  page faults during DMA?
• Synchronization between controller and CPU
• Caches required for processor performance cause problems for I/O
  Flushing is expensive, I/O pollutes cache
  Solution is borrowed from shared memory multiprocessors: “snooping” (coherent DMA)
• Caches and write buffers
  need uncached accesses and write buffer flush for memory mapped I/O

Summary

• Virtual memory & caches
• I/O
  Processor interface issues
  Memory mapped I/O, Interrupts and polling
• I/O devices
  device controller
• Rotational media (disks)
  Disk access time equation, etc.
• I/O bus ⇔ memory bus
  Standard interfaces, many devices from different vendors
• Buses
  Asynchronous vs. synchronous
  Bandwidth tradeoffs
  Arbitration schemes
• I/O and virtual memory interaction
• Designing an I/O system

SPIM (Homework): Interrupt Handler

• MIPS/SPIM program
• Use memory-mapped I/O
• Use interrupts
• Program should:
  Accept keyboard input
    » interrupts
  Echo input to terminal
    » polling
  Exit if user typed ‘q’
• Programmed I/O

Terminal Control

• Memory mapped I/O
  use LW, SW
• -mapped_io command line option
• Receiver - input
  ready=1 when data valid
• Transmitter
  ready=1 when ready to print next char

[Figure: four memory-mapped registers. Receiver control (0xffff0000): unused upper bits, 1-bit interrupt enable, 1-bit ready. Receiver data (0xffff0004): unused upper bits, 8-bit received byte. Transmitter control (0xffff0008): unused upper bits, 1-bit interrupt enable, 1-bit ready. Transmitter data (0xffff000c): unused upper bits, 8-bit transmitted byte]

Interrupt Driven I/O

• Set Interrupt Enable = 1
  generates a level 0 interrupt when Ready becomes 1
  if interrupt is enabled in Status Register also
• Run spim with -notrap
  allows you to install interrupt handler
  So use both -mapped_io and -notrap

[Figure: Receiver control register (0xffff0000): unused upper bits, 1-bit interrupt enable, 1-bit ready]

Status Register

• Bit 0 = interrupt enable
• Bit 8 = allow level 0 interrupts
  terminal input generates level 0 int.
• Coprocessor 0, register 12
  use mfc0, mtc0
• On interrupt, bits 0-5 are shifted left by 2
  disables interrupts and enters kernel mode
• When done servicing interrupt, use rfe to restore

[Figure: status register layout. Bits 15-8: interrupt mask. Bits 5-4: old kernel/user and interrupt enable. Bits 3-2: previous kernel/user and interrupt enable. Bits 1-0: current kernel/user and interrupt enable]
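The shift of the low six status bits can be modeled with integer arithmetic. This is a Python sketch of the bit manipulation only, not SPIM code; the function names are made up, and the `rfe` behavior shown (previous pair moves to current, old pair moves to previous, old pair left unchanged) is the MIPS I definition rather than something stated on the slide.

```python
LOW6 = 0x3F   # old, previous, and current kernel/user + interrupt-enable pairs


def on_interrupt(status):
    """Shift the three KU/IE pairs left by 2; the current pair becomes 00."""
    low = (status << 2) & LOW6          # 00 = kernel mode, interrupts disabled
    return (status & ~LOW6) | low


def rfe(status):
    """Restore: previous -> current, old -> previous (bits 5-4 unchanged)."""
    low4 = (status >> 2) & 0xF
    return (status & ~0xF) | low4


# Assumed starting state: level 0 interrupts allowed (bit 8), user mode
# (bit 1), interrupts enabled (bit 0).
before = 0x103
during = on_interrupt(before)
after = rfe(during)
```

In this model `during` is 0x10C (current pair cleared, the old user state saved in the previous pair) and `after` returns to 0x103, which is why a handler must finish with rfe before resuming the user program.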

Cause Register

• Code 0000 = external interrupt
  terminal interrupt

[Figure: cause register layout. Bits 15-10: pending interrupts. Bits 5-2: exception code]