Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
Virtual MemoryInput/Output
Computer Science 104Lecture 21
CPS 104 2© Alvin R. Lebeck
Projects Due Next Friday April 20Homework #6 assignedToday
VMVM + Caches,I/O
ReadingI/O
Chapter 8 Input Output (primarily 8.4 and 8.5)
Admin
CPS 104 3© Alvin R. Lebeck
Review: Virtual Memory
• Process = virtual address space + thread of control
• TranslationVA -> PAWhat physical address does virtual address A map toIs VA in physical memory?
• Protection (access control)
Do you have permission to access it?
Virtual Address (VA)
Physical Address (PA)
CPS 104 4© Alvin R. Lebeck
Review: Paged Virtual Memory
• Virtual address (232, 264) to Physical Address mapping (228)
virtual page to physical page frame• Fixed size units for access control & translation
Virtual
Physical0x1000
0x6000
0x9000
0x00000x1000
0x2000
0x11000
Virtual page number Offset
CPS 104 5© Alvin R. Lebeck
31
Page offset
011
Virtual Page Number
Page offset
11 0
Physical Frame Number
27
Page Table
Virtual Address
Physical Address
Page size: 4K
Review: Virtual to Physical Address translation
Need to translate
every access (instruction and data)
CPS 104 6© Alvin R. Lebeck
The Memory Management Unit (MMU)
• Inputvirtual address
• Outputphysical addressaccess violation (exception)
• Access Violationsnot presentuser vs. kernelwritereadexecute
CPS 104 7© Alvin R. Lebeck
Translation Lookaside Buffers (TLB)
• Need to perform address translation on every memory reference
30% of instructions are memory referencesat least one memory reference per cycle on today’s processors
• Make Common Case Fast, others correct• If you just accessed a virtual page and hence
determined its physical frame, you’re likely to need that same translation again in the near future
LOCALITY!
• Throw hardware at the problem• Cache PTEs
CPS 104 8© Alvin R. Lebeck
Fast Translation: Translation Buffer
• Cache of translated addresses• 64 entry fully associative
Page Number
Pageoffset
. . . . . .
v r w tag phys frame
. . .
64x1 mux
1 2
. . .
3
4Physical Address
Virtual Address
CPS 104 9© Alvin R. Lebeck
TLB Design
• Must be fast, not increase critical path• Must achieve high hit ratio• Generally small # of entries with high associative• Mapping change
page removed from physical memoryprocessor must invalidate the TLB entry (special privileged instructions)
• PTE is per process entityMultiple processes with same virtual addressesContext Switches?
• Flush TLB• Add Address space identifier (ASID or PID)
part of processor state, must be set on context switch
CPS 104 10© Alvin R. Lebeck
Hardware Managed TLBs
• Hardware Handles TLB miss
• Dictates page table organization
• Complicated state machine to “walk page table”
Multiple levels for forward mappedLinked list for inverted
• Exception only if access violation
Control
Memory
TLB
CPU
CPS 104 11© Alvin R. Lebeck
Software Managed TLBs
• Software Handles TLB miss
OS reads translations from Page Table and puts them in TLBspecial instructions
• Flexible page table organization
• Simple Hardware to detect Hit or Miss
• Exception if TLB miss or access violation
Control
Memory
TLB
CPU
CPS 104 12© Alvin R. Lebeck
Virtual Memory• Provides illusion of very large memory
Sum of the memory of many jobs greater than physical memoryAddress space of each job larger than physical memory
• Good utilization of available physical memory.• Simplifies memory management: code and data movement,
protection, ... (main reason today)• Exploits memory hierarchy to keep average access time low.• Involves at least two storage levels: main and secondary
Virtual Address -- address used by the programmerVirtual Address Space -- collection of such addressesMemory Address -- address in physical memory also known as
“physical address” or “real address”
CPS 104 13© Alvin R. Lebeck
• Divide memory (virtual and physical) into fixed size blocks (Pages, Frames).
Pages in Virtual space.Frames in Physical space.
• Make page size a power of 2: (page size = 2k)• All pages in the virtual address space are contiguous.• Pages can be mapped into any physical Frame• Some pages in main memory (DRAM), some pages on
secondary memory (disk).
Paged Virtual Memory: Main Idea
CPS 104 14© Alvin R. Lebeck
Paged Virtual Memory: Main Idea (Cont)
• All programs are written using Virtual Memory Address Space.
• The hardware does on-the-fly translation between virtual and physical address spaces.
• Use a Page Table to translate between Virtual and Physical addresses
• Translation Lookaside Buffer (TLB) expedites address translation
• Must select “good” page size to minimize fragmentation
CPS 104 15© Alvin R. Lebeck
Kernel
Mapping the Kernel
• Digital Unix Ksegkseg (bit 63 = 1, 62 = 0)
• Kernel has direct access to physical memory
• One VA->PA mapping for entire Kernel
• Lock (pin) TLB entryor special HW detection
UserStack
Kernel
User Code/Data
PhysicalMemory
0
264-1
CPS 104 16© Alvin R. Lebeck
Segmented Virtual Memory• Virtual address (232,
264) to Physical Address mapping (228)
• Each segmentvariable sizeAddress = base + offsetcontiguous in both VA and PA
Virtual
Physical0x1000
0x6000
0x9000
0x00000x1000
0x2000
0x11000
CPS 104 17© Alvin R. Lebeck
Intel Pentium Segmentation
Seg Selector Offset
Virtual Address
SegmentDescriptor
Global DescriptorTable (GDT)
Segment BaseAddress
Physical Address Space
CPS 104 18© Alvin R. Lebeck
Intel Pentium Segmentation + Paging
Seg Selector Offset
Virtual Address(logical address)
SegmentDescriptor
Global DescriptorTable (GDT)
Segment BaseAddress
Linear Address Space
PageDir
Physical Address Space
Dir OffsetTable
PageTable
CPS 104 19© Alvin R. Lebeck
Review: Fast Translation: Translation Buffer
• Cache of translated addresses• 64 entry fully associative
Page Number
Pageoffset
. . . . . .
v r w tag phys frame
. . .
64x1 mux
1 2
. . .
483
4Physical Address
VirtualAddress
CPS 104 20© Alvin R. Lebeck
Address Translation and Caches
• Where is the TLB wrt the cache?• What are the consequences?
• Most of today’s systems have more than 1 cache
• Does the OS need to worry about this?
CPS 104 21© Alvin R. Lebeck
TLBs and Caches
CPU
TB
$
MEM
VA
PA
PA
ConventionalOrganization
CPU
$
TB
MEM
VA
VA
PA
Virtually Addressed CacheTranslate only on miss
Alias (Synonym) Problem
CPU
$ TB
MEM
VA
PATags
PA
Overlap $ accesswith VA translation:requires $ index to
remain invariantacross translation
VATags
L2 $
Input / Output
CPS 104 23© Alvin R. Lebeck
Overview
• I/O devicesdevice controller
• Device drivers• Memory Mapped I/O• Programmed I/O• Direct Memory Access (DMA)• Rotational media (disks)• I/O Bus technologies• RAID (if time)
CPS 104 24© Alvin R. Lebeck
Why I/O?
• Interactive Applications (keyboard, mouse, screen)• Long term storage (files, data repository)• Swap for VM• Many different devices
character vs. blockNetworks are everywhere!
• 106 difference CPU (10 -9) & I/O (10 -3)• Response Time vs. Throughput
Not always another process to execute
• OS hides (some) differences in devicessame (similar) interface to many devices
• Permits many apps to share one device
CPS 104 25© Alvin R. Lebeck
I/O Device Examples
Device Behavior Partner Data Rate (KB/sec)Keyboard Input Human 0.01Mouse Input Human 0.02Line Printer Output Human 1.00Laser Printer Output Human 100.00Graphics Display Output Human 30,000.00Network-LAN Input/Output Machine 10,000.00Floppy disk Storage Machine 50.00Optical Disk Storage Machine 500.00Magnetic Disk Storage Machine 5,000.00
CPS 104 26© Alvin R. Lebeck
Time(workload) = Time(CPU) + Time(I/O) - Time(Overlap)
I/O Systems
I/O Devices
I/O Bus
Memory Bus
Processor
Cache
MainMemory
DiskController
Disk Disk
GraphicsController
NetworkInterface
Graphics Network
interrupts
I/O Bridge
Core Chip Set
CPS 104 27© Alvin R. Lebeck
Device Drivers
• top-halfAPI (open, close, read, write, ioctl)I/O Control (IOCTL, device specific arguments)
• bottom-halfinterrupt handlercommunicates with deviceresumes process
• Must have access to user address space and device control registers => runs in kernel mode.
CPS 104 28© Alvin R. Lebeck
Review: Handling an Interrupt/Exception
• Invoke specific kernelroutine based on type of interrupt
interrupt/exception handler
• Must determine what caused interrupt
• Clear the interrupt• Return from interrupt
(RETT, MIPS = RFE, NiosII = eret)
ldaddst
mulbeqld
subbne
RETT
User Program
Interrupt Handler
ServiceRoutines
CPS 104 29© Alvin R. Lebeck
Processor <-> Device Interface Issues
• InterconnectionsBusses
• Processor interfaceI/O InstructionsMemory mapped I/O
• I/O Control StructuresDevice ControllersPolling/Interrupts
• Data movementProgrammed I/O / DMA
• Capacity, Access Time, Bandwidth
CPS 104 30© Alvin R. Lebeck
Device Controllers
DeviceController
Command Status Data 0
Data 1
Data n-1
Busy Done Error
Bus
Device
Interrupt?
Controller deals withmundane control(e.g., position head,
error detection/correction)
Processor communicateswith Controller
CPS 104 31© Alvin R. Lebeck
I/O Instructions
Independent I/O BusCPU
Controller Controller
Device Device
Memorymemorybus
Separate instructions (in,out)
CPS 104 32© Alvin R. Lebeck
Memory Mapped I/O
ROM
RAM
I/O
Bridge
Physical Address• Issue command through store instruction• Check status with load instructionCaches?
$
CPU
L2 $
Memory Bus
Memory Bus Adapter
I/O bus
Controller
Device
CPS 104 33© Alvin R. Lebeck
Communicating with the processor
• Pollingcan waste time waiting for slow I/O devicebusy waitcan interleave with useful work
• Interruptsinterrupt overheadinterrupt could happen anytime - asynchronousno busy wait
CPS 104 34© Alvin R. Lebeck
Data Movement
• Programmed I/Oprocessor has to touch all the datatoo much processor overhead
» for high bandwidth devices (disk, network)
• DMAprocessor sets up transfer(s)DMA controller transfers datacomplicates memory system
CPS 104 35© Alvin R. Lebeck
Programmed I/O & Polling
• Advantage: CPU totally in control
• Disadvantage: Overhead of pollingProgram must perform check of device, thus can’t do useful work
yes
Is thedata
ready?
loaddata
storedata
yesno
done? no
CPS 104 36© Alvin R. Lebeck
Programmed I/O & Interrupt Driven Data Transfer
• User program progress halted only during actual transfer
• Interrupt overhead can dominate transfer time
• Processor must touch all data…too slow for some devices
addsubandornop
readstore...rtimemory
userprogram
(1) I/Ointerrupt
(2) save PC
(3) interruptservice addr
interruptserviceroutine(4)
$
CPU
L2 $
Memory Bus
Memory Bus Adapter
I/O bus
Controller
Device
CPS 104 37© Alvin R. Lebeck
Direct Memory Access (DMA)
• CPU delegates responsibility for data transfer to a special controller
CPU sends a starting address, direction, and length count to DMAC. Then issues "start".
DMAC provides handshake signals for devicecontroller, and memory addresses and handshakesignals for memory.
0 ROM
RAM
Peripherals
DMACn
Memory Mapped I/O
$
CPU
L2 $
Memory Bus
Memory Bus Adapter
I/O bus
DMA CNTRL
CPS 104 38© Alvin R. Lebeck
I/O Bus
Memory Bus
Processor
Cache
MainMemory
DiskController
Disk Disk
GraphicsController
NetworkInterface
Graphics Network
interrupts
I/O and Virtual Caches
I/O Bridge
VirtualCache
PhysicalAddresses
I/O is accomplishedwith physical addressesDMA• flush pages from cache• need pa->va reverse
translation• coherent DMA
CPS 104 39© Alvin R. Lebeck
Types of Storage Devices
• Magnetic Disks• Magnetic Tapes• CD ROM• Juke Box (automated tape library, robots)
CPS 104 40© Alvin R. Lebeck
Organization of a Hard Magnetic Disk
• Typical numbers (depending on the disk size):500 to 2,000 tracks per surface32 to 128 sectors per track
» A sector is the smallest unit that can be read or written
• Traditionally all tracks have the same number of sectors:Constant bit density: record more sectors on the outer tracksRecently relaxed: constant bit size, speed varies with track location
Platters
Track
Sector
CPS 104 41© Alvin R. Lebeck
Magnetic Disk Characteristic
• Cylinder: all the tracks under the headat a given point on all surfaces
• Read/write data is a three-stage process:Seek time: position the arm over the proper trackRotational latency: wait for the desired sectorto rotate under the read/write headTransfer time: transfer a block of bits (sector)under the read-write head
• Average seek time as reported by the industry:Typically in the range of 8 ms to 12 ms(Sum of the time for all possible seek) / (total # of possible seeks)
• Due to locality of disk reference, actual average seek time may:
Only be 25% to 33% of the advertised number
SectorTrack
Cylinder
HeadPlatter
CPS 104 42© Alvin R. Lebeck
Typical Numbers of a Magnetic Disk
• Rotational Latency:Most disks rotate at 3,600 to 10,000 RPMApproximately 16ms to 3.5ms per revolution, respectivelyAn average latency to the desiredinformation is halfway around the disk: 8 ms at 3600 RPM, 4 ms at 7200 RPM
• Transfer Time is a function of :Transfer size (usually a sector): 1 KB / sectorRotation speed: 3600 RPM to 7200 RPMRecording density: bits per inch on a trackDiameter typical ranges from 2.5 to 5.25 inTypical values: 2 to 12 MB per second
SectorTrack
Cylinder
HeadPlatter
CPS 104 43© Alvin R. Lebeck
Disk Access
• Access time = queue + seek + rotational + transfer + overhead
• Seek timemove arm over trackaverage is confusing (startup, slowdown, locality of accesses)
• Rotational latencywait for sector to rotate under headaverage = 0.5/(3600 RPM) = 8.3ms
• Transfer Timef(size, BW bytes/sec)
CPS 104 44© Alvin R. Lebeck
Disk Access Time Example
• Disk Parameters:Transfer size is 8K bytesAdvertised average seek is 12 msDisk spins at 7200 RPMTransfer rate is 4 MB/sec
• Controller overhead is 2 ms• Assume that disk is idle so no queuing delay• What is Average Disk Access Time for a Sector?
Ave seek + ave rot delay + transfer time + controller overhead12 ms + 0.5/(7200 RPM/60) + 8 KB/4 MB/s + 2 ms12 + 4.15 + 2 + 2 = 20 ms
• Advertised seek time assumes no locality: typically 1/4 to 1/3 advertised seek time: 20 ms => 12 ms
CPS 104 45© Alvin R. Lebeck
Buses: Connecting I/O to Processor and Memory
• A bus is a shared communication link• It uses one set of wires to connect multiple
subsystems
Control
Datapath
Memory
ProcessorInput
Output
CPS 104 46© Alvin R. Lebeck
Advantages of Buses
• Versatility:New devices can be added easilyPeripherals can be moved between computersystems that use the same bus standard
• Low Cost:A single set of wires is shared in multiple ways
MemoryProcessorI/O
DeviceI/O
DeviceI/O
Device
CPS 104 47© Alvin R. Lebeck
Disadvantages of Buses
• The bus creates a communication bottleneckBus bandwidth can limit the maximum I/O throughput
• The maximum bus speed is largely limited by:The length of the busThe number of devices on the busThe need to support a range of devices with:
» Widely varying latencies » Widely varying data transfer rates
MemoryProcessorI/O
DeviceI/O
DeviceI/O
Device
CPS 104 48© Alvin R. Lebeck
The General Organization of a Bus
• Control lines:Signal requests and acknowledgmentsIndicate what type of information is on the data lines
• Data lines carry information between the source and the destination:
Data and AddressesComplex commands
Data Lines
Control Lines
CPS 104 49© Alvin R. Lebeck
Master versus Slave
• A bus transaction includes two parts:Sending the addressReceiving or sending the data
• Master is the device that starts the bus transaction by:
Sending the address
• Slave is the device that responds to the address by:Sending data to the master if the master ask for dataReceiving data from the master if the master wants to send data
BusMaster
BusSlave
Master send address
Data can go either way
CPS 104 50© Alvin R. Lebeck
Output Operation
• Output: Processor sending data to the I/O device:
Processor
Control (Memory Read Request)
Memory
Step 1: Request Memory
I/O Device (Disk)
Data (Memory Address)
Processor
Control
Memory
Step 2: Read Memory
I/O Device (Disk)
Data
Processor
Control (Device Write Request)
Memory
Step 3: Send Data to I/O Device
I/O Device (Disk)
Data(I/O Device Address
and then Data)
CPS 104 51© Alvin R. Lebeck
Input Operation
° Input is defined as the Processor receiving data from the I/O device:
Processor
Control (Memory Write Request)
Memory
Step 1: Request Memory
I/O Device (Disk)
Data (Memory Address)
Processor
Control (I/O Read Request)
Memory
Step 2: Receive Data
I/O Device (Disk)
Data(I/O Device Address
and then Data)
CPS 104 52© Alvin R. Lebeck
Types of Buses
• Processor-Memory Bus (design specific)Short and high speedOnly need to match the memory system
» Maximize memory-to-processor bandwidthConnects directly to the processor
• External I/O Bus (industry standard)Usually is lengthy and slowerNeed to match a wide range of I/O devicesConnects to the processor-memory bus or backplane bus
• Backplane Bus (industry standard)Backplane: an interconnection structure within the chassisAllow processors, memory, and I/O devices to coexistCost advantage: one single bus for all components
• Bit-Serial Buses (New trend: USB, Firewire, .. )Use High speed unidirectional point-to-point communication
CPS 104 53© Alvin R. Lebeck
A Computer System with One Bus: Backplane Bus
• A single bus (the backplane bus) is used for:Processor to memory communicationCommunication between I/O devices and memory
• Advantages: Simple and low cost• Disadvantages: slow and the bus can become a major
bottleneck• Example: Early IBM PC
Processor Memory
I/O Devices
Backplane Bus
CPS 104 54© Alvin R. Lebeck
A Two-Bus System
• I/O buses tap into the processor-memory bus via bus adaptors:Processor-memory bus: mainly for processor-memory trafficI/O buses: provide expansion slots for I/O devices
• Example: Apple Macintosh-IINuBus: Processor, memory, and a few selected I/O devicesSCCI Bus: the rest of the I/O devices
Processor Memory
I/OBus
Processor Memory Bus
BusAdaptor
BusAdaptor
BusAdaptor
I/OBus
I/OBus
CPS 104 55© Alvin R. Lebeck
A Three-Bus System
• A small number of backplane buses tap into the processor-memory bus
Processor-memory bus is used for processor memory trafficI/O buses are connected to the backplane bus
• Advantage: loading on the processor bus is greatly reduced
Processor MemoryProcessor Memory Bus
BusAdaptor
BusAdaptor
BusAdaptor
I/O BusBackplane Bus
I/O Bus
CPS 104 56© Alvin R. Lebeck
Pentium 4
• I/O Options
Parallel ATA(100 MB/sec)
Parallel ATA(100 MB/sec)
(20 MB/sec)
PCI bus(132 MB/sec)
CSA(0.266 GB/sec)
AGP 8X(2.1 GB/sec)
Serial ATA(150 MB/sec)
Disk
Pentium 4processor
1 Gbit Ethernet
Memorycontroller
hub(north bridge)
82875P
MainmemoryDIMMs
DDR 400(3.2 GB/sec)
DDR 400(3.2 GB/sec)
Serial ATA(150 MB/sec)
Disk
AC/97(1 MB/sec)
Stereo(surround-
sound) USB 2.0(60 MB/sec)
. . .
I/Ocontroller
hub(south bridge)
82801EB
Graphicsoutput
(266 MB/sec)
System bus (800 MHz, 604 GB/sec)
CD/DVD
Tape
10/100 Mbit Ethernet
CPS 104 57© Alvin R. Lebeck
Synchronous and Asynchronous Bus
• Synchronous Bus:Includes a clock in the control linesA fixed protocol for communication that is relative to the clockAdvantage: involves very little logic and can run very fastDisadvantages:
» Every device on the bus must run at the same clock rate» To avoid clock skew, bus must be short if it is fast.
• Asynchronous Bus:It is not clockedIt can accommodate a wide range of devicesIt can be lengthened without worrying about clock skewIt requires a handshaking protocol
CPS 104 58© Alvin R. Lebeck
A Handshaking Protocol
• Three control linesReadReq: indicates a read request for memory
Address is put on the data lines at the same lineDataRdy: indicates the data word is now ready on the data lines
Data is put on the data lines at the same timeAck: acknowledge the ReadReq or the DataRdy of the other party
ReadReq
AddressData Data
Ack
DataRdy
1 2
2
3
4
4
56
6 7
CPS 104 59© Alvin R. Lebeck
Increasing the Bus Bandwidth
• Separate versus multiplexed address and data lines:Address and data can be transmitted in one bus cycleif separate address and data lines are availableCost: (a) more bus lines, (b) increased complexity
• Data bus width:By increasing the width of the data bus, transfers of multiple words require fewer bus cyclesExample: USB vs. 32-bit PCI vs. 64-bit PCICost: more bus lines
• Block transfers:Allow the bus to transfer multiple words in back-to-back bus cyclesOnly one address needs to be sent at the beginningThe bus is not released until the last word is transferredCost: (a) increased complexity
(b) decreased response time for request
CPS 104 60© Alvin R. Lebeck
Obtaining Access to the Bus
• One of the most important issues in bus design:How is the bus reserved by a devices that wishes to use it?
• Chaos is avoided by a master-slave arrangement:Only the bus master can control access to the bus:
» It initiates and controls all bus requestsA slave responds to read and write requests
• The simplest system:Processor is the only bus masterAll bus requests must be controlled by the processorMajor drawback: the processor is involved in every transaction
BusMaster
BusSlave
Control: Master initiates requests
Data can go either way
CPS 104 61© Alvin R. Lebeck
Multiple Potential Bus Masters: the Need for Arbitration
• Bus arbitration scheme:A bus master wanting to use the bus asserts the bus requestA bus master cannot use the bus until its request is grantedA bus master must signal to the arbiter after finish using the bus
• Bus arbitration schemes usually try to balance two factors:
Bus priority: the highest priority device should be serviced firstFairness: Even the lowest priority device should never
be completely locked out from the bus• Bus arbitration schemes can be divided into four
broad classes:Distributed arbitration by self-selection: each device wanting the bus places a code indicating its identity on the bus.Distributed arbitration by collision detection: Ethernet uses this.Daisy chain arbitration: single device with all request lines.Centralized, parallel arbitration: see next-next slide
CPS 104 62© Alvin R. Lebeck
Centralized Bus Arbitration
Master A Master B Master C
Request-C
Request-B
Request-A
Grant-C
Grant-B
Grant-A
ArbitrationCircuit
Release
CPS 104 63© Alvin R. Lebeck
The Daisy Chain Bus Arbitration Scheme
• Advantage: simple• Disadvantages:
Cannot assure fairness:A low-priority device may be locked out indefinitely
The use of the daisy chain grant signal also limits the bus speed
BusArbiter
Device 1HighestPriority
Device NLowestPriority
Device 2
Grant Grant GrantRelease
Request
CPS 104 64© Alvin R. Lebeck
Summary of Bus Options
Option High performance Low cost
Bus width Separate address Multiplex address& data lines & data lines
Data width Wider is faster Narrower is cheaper (e.g., 64 bits) (e.g., 8 bits)
Transfer size Multiple words has Single-word transferless bus overhead is simpler
Bus masters Multiple Single master(requires arbitration) (no arbitration)
Clocking Synchronous Asynchronous
CPS 104 65© Alvin R. Lebeck
Designing an I/O System
• CPU 3x109 inst/sec, average 100,000 insts in OS per I/O
• Memory bus 1000 MB/sec• SCSI Ultra320 controller 320MB/sec, up to 7 disks• Disk R/W BW 75 MB/sec, average seek+rotational 6ms• Workload:
64KB reads (sequential) 200,000 user insts per I/O
• Find: max sustainable I/O rate # of disks & controllers required
CPS 104 66© Alvin R. Lebeck
I/O System Design
Find bottleneck• CPU I/O rate = inst rate/(insts per I/O) =
3x109/(200+100) x103 = 10,000 I/Os / sec• Memory bus I/O rate = Bus BW/Bytes per I/O = 1000
x 106/64x103 = 15,625 I/Os / secConfigure rest of system• # of disks = CPU I/O rate / disk I/O rate• Time for 1 disk I/O = seek + rotational + transfer
= 6ms + 64KB / 75MB/sec = 6.9 ms• I/Os / sec = (1000 ms/sec) / (6.9 ms / I/O) = 146 I/Os /
sec• # disks = 10,0000 I/Os / sec / 146 I/Os / sec = 69 disks
CPS 104 67© Alvin R. Lebeck
I/O system design
• Determine # of controllers• Can we fill one controller with 7 disks?
Average Transfer rate per disk = 64KB / 6.9 ms = 9.56 MB/secTotal is < 70 MB /sec << 320MB /sec
• So 69 / 7 ~= 10 SCSI controllers
• What would have to change to require more controllers?
CPS 104 68© Alvin R. Lebeck
Communication Networks
NodeProcessor
ControlReg. I/F
NetI/F Memory
RequestBlock
ReceiveBlock
Media
Network Controller
Peripheral Backplane Bus
DMA
. . .
Processor MemoryList of request blocks
Data to be transmitted
. . .List of receive blocks
Data receivedDMA
. . .List of free blocks
• Send/receive queues in processor memories• Network controller copies back and forth via DMA• No host intervention needed• Interrupt host when message sent or received
CPS 104 69© Alvin R. Lebeck
Relationship to Processor Architecture
• Virtual memory frustrates DMApage faults during DMA?
• Synchronization between controller and CPU• Caches required for processor performance cause
problems for I/OFlushing is expensive, I/O pollutes cacheSolution is borrowed from shared memory multiprocessors "snooping” (coherent DMA)
• Caches and write buffersneed uncached and write buffer flush for memory mapped I/O
CPS 104 70© Alvin R. Lebeck
Summary
• Virtual memory & caches• I/O
Processor interface issuesMemory mapped I/O, Interrupts and polling
• I/O devicesdevice controller
• Rotational media (disks)Disk access time equation, etc.
• I/O bus ⇔ memory busStandard interfaces, many devices from different vendors
• BusesAsynchronous vs. synchronousBandwidth tradeoffsArbitration schemes
• I/O and virtual memory interaction• Designing an I/O system
CPS 104 71© Alvin R. Lebeck
SPIM (Homework): Interrupt Handler
• MIPS/SPIM program• Use memory-mapped I/O• Use interrupts
• Program should:Accept keyboard input
» interruptsEcho input to terminal
» pollingExit if user typed ‘q’
• Programmed I/O
CPS 104 72© Alvin R. Lebeck
1
Interrupt enable
Ready
1UnusedReceiver control (0xffff0000)
8
Received byte
UnusedReceiver data (0xffff0004)
1
Interrupt enable
Ready
1UnusedTransmitter control (0xffff0008)
Transmitter data (0xffff000c)
8
Transmitted byte
Unused
Terminal Control
• Memory mapped I/O
use LW, SW• -mapped_io
command line option
• Receiver - inputready=1 when data valid
• Transmitterready=1 when ready to print next char
CPS 104 73© Alvin R. Lebeck
Interrupt Driven I/O
• Set Interrupt Enable = 1generates a level 0 interrupt when Ready becomes 1if interrupt is enabled in Status Register also
• Run spim with -notrapallows you to install interrupt handlerSo use both –mapped_io and -notrap
1
Interruptenable
Ready
1UnusedReceiver control(0xffff0000)
CPS 104 74© Alvin R. Lebeck
Status Register
• Bit 0 = interrupt enable• Bit 8 = allow level 0 interrupts
terminal input generates level 0 int.
• Coprocessor 0, register 12use mfc0, mtc0
• On interrupt, bits 0-5 are shifted left by 2disables interrupts and enters kernel mode
• When done servicing interrupt, use rfe to restore15 8 5 4 3 2 1 0
Interrupt mask Old Previous Current
Kern
el/
user Inte
rrupt
en
able
Kern
el/
user Ke
rnel/
us
erInte
rrupt
en
able
Inte
rrupt
en
able
CPS 104 75© Alvin R. Lebeck
• Code 0000 = external interruptterminal interrupt
Cause Register
15 10 5 2
Pendinginterrupts
Exceptioncode