Upload
aubrey-allen
View
215
Download
0
Embed Size (px)
Citation preview
Input/Output
Professor Alvin R. Lebeck
Computer Science 220
Fall 2001
8/18/97 CPS 220 Alvin R. Lebeck 2
Admin
• HW #4 Due November 12
• Projects
8/18/97 CPS 220 Alvin R. Lebeck 3
Review: VM & Complete Memory Hierarchy
• Caches cost-effective memory hierarchy
• VM is very nice for programmers
• TLB speeds up address translation
• Know how to block diagram entire hierarchy– direct-mapped, 2-way, fully-associative
– where is TLB?
– including how to get desired word or byte from cache block
8/18/97 CPS 220 Alvin R. Lebeck 4
I/O Bus
Memory Bus
Processor
Cache
MainMemory
DiskController
Disk Disk
GraphicsController
NetworkInterface
Graphics Network
interrupts
System Organization
I/O Bridge
Core Chip Set
8/18/97 CPS 220 Alvin R. Lebeck 5
Why I/O?
• Interactive Apps
• Long term storage (files)
• Swap for VM
• Networks (next chapter)
• 106 difference CPU (10 -9) & I/O (10 -3)
• Response Time vs Throughput– Not always another process to execute
• Remember Amdahl’s Law
8/18/97 CPS 220 Alvin R. Lebeck 6
Types of Storage Devices
• Magnetic Disks
• Magnetic Tapes
• CD
• DVD
• Flash Memory
• Juke Box (automated tape library, robots)
8/18/97 CPS 220 Alvin R. Lebeck 7
Magnetic Disks
• Long term nonvolatile storage
• Another slower, less expensive level of memory hierarchy
SectorTrack
Cylinder
HeadPlatter
Arm
Alvin R. Lebeck 8
Organization of a Hard Magnetic Disk
• Typical numbers (depending on the disk size):– 500 to 2,000 tracks per surface
– 32 to 128 sectors per track
» A sector is the smallest unit that can be read or written
• Traditionally all tracks have the same number of sectors:– Constant bit density: record more sectors on the outer tracks
– Recently relaxed: constant bit size, speed varies with track location
Platters
Track
Sector
Alvin R. Lebeck 9
Magnetic Disk Characteristic
• Cylinder: all the tracks under the head at a given point on all surfaces
• Read/write data is a three-stage process:– Seek time: position the arm over the proper track
– Rotational latency: wait for the desired sectorto rotate under the read/write head
– Transfer time: transfer a block of bits (sector)under the read-write head
• Average seek time as reported by the industry:– Typically in the range of 8 ms to 12 ms
– (Sum of the time for all possible seek) / (total # of possible seeks)
• Due to locality of disk reference, actual average seek time may:
– Only be 25% to 33% of the advertised number
SectorTrack
Cylinder
HeadPlatter
8/18/97 CPS 220 Alvin R. Lebeck 10
Disk Access
• Access time =
queue + seek + rotational + transfer + overhead
• Seek time– move arm over track
– average is confusing (startup, slowdown, locality of accesses)
• Rotational latency– wait for sector to rotate under head
– average = 0.5/(3600 RPM) = 8.3ms
• Transfer Time– f(size, BW bytes/sec)
8/18/97 CPS 220 Alvin R. Lebeck 11
Disk Time Example
• Disk Parameters:– Transfer size is 8K bytes
– Advertised average seek is 12 ms
– Disk spins at 7200 RPM
– Transfer rate is 4 MB/sec
• Controller overhead is 2 ms
• Assume that disk is idle so no queuing delay
• What is Average Disk Access Time for a Sector?– Ave seek + ave rot delay + transfer time + controller overhead
– 12 ms + 0.5/(7200 RPM/60) + 8 KB/4 MB/s + 2 ms
– 12 + 4.15 + 2 + 2 = 20 ms
• Advertised seek time assumes no locality: typically 1/4 to 1/3 advertised seek time: 20 ms => 12 ms
8/18/97 CPS 220 Alvin R. Lebeck 12
DRAM as Disk
• Solid state disk, Expanded Storage, NVRAM
• Disk is slow, DRAM is fast => replace Disk with battery backed DRAM
• BUT, Disk is cheap, much cheaper than DRAM
8/18/97 CPS 220 Alvin R. Lebeck 13
Alternative Storage
• CD ROM– read only: good for distribution
• CD RW
• FLASH Memory
• Magnetic Tape– Sequential Access
– R-DAT (Rotating Digital Audio Tape)
» Helical Scan (angle to tape, high density ~5GB)
– Tera to peta bytes of storage (NASA EOS)
8/18/97 CPS 220 Alvin R. Lebeck 14
Connecting I/O Devices to CPU/Memory
• Memory Bus– Short
– Fast
– Known set of components
– Proprietary (don’t release design free)
– Ultra doesn’t have traditional bus
• Separate I/O Bus (e.g., PCI)– Standard
– Accept variety of components (w/ different BW performance)
– Long
– Slow
8/18/97 CPS 220 Alvin R. Lebeck 15
Processor Interface Issues
• Interconnections– Busses
• Processor interface– Instructions
– Memory mapped I/O
• I/O Control Structures– Polling
– Interrupts
– DMA
– I/O Controllers
– I/O Processors
• Capacity, Access Time, Bandwidth
Alvin R. Lebeck 16
Device Controllers
DeviceController
Command Status Data 0
Data 1
Data n-1
Busy Done Error
Bus
Device
Interrupt?
Controller deals withmundane control(e.g., position head, error detection/correction)
Processor communicateswith Controller
Alvin R. Lebeck 17
Review: Interrupts and Exceptions
• Unnatural change in control flow
• Interrupt is external event – devices: disk, network, keyboard, etc.
– clock for timeslicing
– these are useful events, must do something when they occur.
• Exception is often potential problem with program– segmentation fault
– bus error
– divide by 0
– don’t want my bug to crash the entire machine
– page fault (virtual memory…)
Alvin R. Lebeck 18
Review: Handling an Interrupt/Exception
• Invoke specific kernel routine based on type of interrupt
– interrupt/exception handler
• Must determine what caused interrupt
– could use software to examine each device
– PC = interrupt_handler
• Vectored Interrupts– PC = interrupt_table[i]
• Clear the interrupt• kernel initializes table at boot
time• May return from interrupt
(RETT) to different process (e.g, context switch)
ldaddst
mulbeqld
subbne
RETT
User Program
Interrupt Handler
ServiceRoutines
Alvin R. Lebeck 19
Device Drivers
• top-half– API (open, close, read, write, ioctl)
– I/O Control (IOCTL, device specific arguments)
• bottom-half– interrupt handler
– communicates with device
– resumes process
• Must have access to user address space and device control registers => runs in kernel mode.
8/18/97 CPS 220 Alvin R. Lebeck 20
I/O Interface
Independent I/O Bus
CPU
Interface Interface
Peripheral Peripheral
Memory
memorybus
Seperate I/O instructions (in,out)
CPU
Interface Interface
Peripheral Peripheral
Memory
Lines distinguish between I/O and memory transferscommon memory
& I/O bus
VME busMultibus-IINubus
40 Mbytes/secoptimistically
10 MIP processorcompletelysaturates the bus!
8/18/97 CPS 220 Alvin R. Lebeck 21
Memory Mapped I/O
Single Memory & I/O BusNo Separate I/O Instructions
CPU
Interface Interface
Peripheral Peripheral
Memory
ROM
RAM
I/O$
CPU
L2 $
Memory Bus
Memory Bus Adaptor
I/O bus
Bridge
Physical Address
Issue command through storeCheck for completion with loadWrite-back cache / Write buffer?
8/18/97 CPS 220 Alvin R. Lebeck 22
Programmed I/O (Polling)
CPU
IOC
device
Memory
Is thedata
ready?
readdata
storedata
yesno
done? no
yes
busy wait loopnot an efficient
way to use the CPUunless the device
is very fast!
but checks for I/O completion can bedispersed amongcomputationallyintensive code
8/18/97 CPS 220 Alvin R. Lebeck 23
Interrupt Driven Data Transfer
addsubandornop
readstore...rti
memory
userprogram(1) I/O
interrupt
(2) save PC
(3) interruptservice addr
interruptserviceroutine(4)
User program progress only halted during actual transfer
Interrupt Overhead can dominate transfer time.1000 xfers of 1000 bytes each: 2usecs for interrupt 98usecs for service
Device xfer rate: 10 MB/s => .1usec/byte => .1ms for 1000 bytes
$
CPU
L2 $
Memory Bus
Memory Bus Adapter
I/O bus
Controller
Device
8/18/97 CPS 220 Alvin R. Lebeck 24
Direct Memory Access
Time to do 1000 x 1000 bytes:
1 DMA set-up sequence @ 50 µsec1 interrupt @ 2 µsec1 interrupt service sequence @ 48 µsec
.0001 second of CPU time
CPU sends a starting address, direction, and length count to DMAC. Then issues "start".
DMAC must “talk” to both memory bus andI/O bus (e.g., PCI).
0ROM
RAM
Peripherals
DMAC n
Memory Mapped I/O
$
CPU
L2 $
Memory Bus
Memory Bus Adapter
I/O bus
DMA CNTRL
8/18/97 CPS 220 Alvin R. Lebeck 25
Input/Output Processors
CPU IOP
Mem
D1
D2
Dn
. . .main memory
bus
I/Obus
CPU
IOP
issues instruction to IOP
interrupts when done(1)
memory
(2)
(3)
(4)
Device to/from memorytransfers are controlledby the IOP directly.
IOP steals memory cycles.
OP Device Address
target devicewhere cmds are
looks in memory for commands
OP Addr Cnt Other
whatto do
whereto putdata
howmuch
specialrequests
8/18/97 CPS 220 Alvin R. Lebeck 26
Relationship to Processor Architecture
• I/O instructions and I/O busses connected directly to processor have largely disappeared (Memory Mapped I/O)
– Some embedded processors still have them (micro-controllers)
• Interrupts:– Stack replaced by shadow registers
– Handler saves registers and re-enables higher priority int's
– Interrupt types reduced in number; handler must query interrupt controller
8/18/97 CPS 220 Alvin R. Lebeck 27
Relationship to Processor Architecture
• Caches required for processor performance cause problems for I/O
– Flushing is expensive, I/O pollutes cache
– Solution is borrowed from shared memory multiprocessors "snooping"
• Virtual memory frustrates DMA
• Load/store architecture at odds with atomic operations
– load locked, store conditional
• Caches and write buffers– need uncached and write buffer flush for memory mapped I/O
• Stateful processors hard to context switch
8/18/97 CPS 220 Alvin R. Lebeck 28
I/O Data Flow
Memory-to-Memory Copy
DMA over Peripheral Bus
Xfer over Disk Channel
Xfer over Serial Interface
Application Address Space
OS Buffers (>10 MByte)
HBA Buffers (1 M - 4 MBytes)
Track Buffers (32K - 256KBytes)
I/O Device
I/O Controller
Embedded Controller
Head/Disk Assembly
Host Processor
Impediment to high performance: multiple copies, complex hierarchy
8/18/97 CPS 220 Alvin R. Lebeck 29
Communication Networks
Performance limiter is memory system, OS overhead, not HW protocols
NodeProcessor
ControlReg. I/F
NetI/F Memory
RequestBlock
ReceiveBlock
Media
Network Controller
Peripheral Backplane Bus
DMA
. . .
Processor MemoryList of request blocks
Data to be transmitted
. . .
List of receive blocks
Data receivedDMA
. . .
List of free blocks
• Send/receive queues in processor OS memory• Network controller copies back and forth via DMA• No host intervention needed• Interrupt host when message sent or received• Memory-to-Memory copy to user space
Alvin R. Lebeck 30
Network Connected Devices
• High speed networks (10Gb Ethernet soon)• How can we eliminate overheads?Page Flipping• OS places aligned data into memory and remaps
pagesRDMA• Idea is to eliminate kernel-to-user copy (User-level
messaging)• Requires “translation” on Network Interface (NI)• Application registers region with OS• OS stores pointer in NI• On Receive, pointer says where data should go
Alvin R. Lebeck 31
Next Time
• Bus designs (connecting components)
• RAID