31
Input/Output Professor Alvin R. Lebeck Computer Science 220 Fall 2001

Input/Output Professor Alvin R. Lebeck Computer Science 220 Fall 2001

Embed Size (px)

Citation preview

Page 1: Input/Output Professor Alvin R. Lebeck Computer Science 220 Fall 2001

Input/Output

Professor Alvin R. Lebeck

Computer Science 220

Fall 2001

Page 2: Input/Output Professor Alvin R. Lebeck Computer Science 220 Fall 2001

8/18/97 CPS 220 Alvin R. Lebeck 2

Admin

• HW #4 Due November 12

• Projects

Page 3: Input/Output Professor Alvin R. Lebeck Computer Science 220 Fall 2001

8/18/97 CPS 220 Alvin R. Lebeck 3

Review: VM & Complete Memory Hierarchy

• Caches cost-effective memory hierarchy

• VM is very nice for programmers

• TLB speeds up address translation

• Know how to block diagram entire hierarchy– direct-mapped, 2-way, fully-associative

– where is TLB?

– including how to get desired word or byte from cache block

Page 4: Input/Output Professor Alvin R. Lebeck Computer Science 220 Fall 2001

8/18/97 CPS 220 Alvin R. Lebeck 4

I/O Bus

Memory Bus

Processor

Cache

MainMemory

DiskController

Disk Disk

GraphicsController

NetworkInterface

Graphics Network

interrupts

System Organization

I/O Bridge

Core Chip Set

Page 5: Input/Output Professor Alvin R. Lebeck Computer Science 220 Fall 2001

8/18/97 CPS 220 Alvin R. Lebeck 5

Why I/O?

• Interactive Apps

• Long term storage (files)

• Swap for VM

• Networks (next chapter)

• 106 difference CPU (10 -9) & I/O (10 -3)

• Response Time vs Throughput– Not always another process to execute

• Remember Amdahl’s Law

Page 6: Input/Output Professor Alvin R. Lebeck Computer Science 220 Fall 2001

8/18/97 CPS 220 Alvin R. Lebeck 6

Types of Storage Devices

• Magnetic Disks

• Magnetic Tapes

• CD

• DVD

• Flash Memory

• Juke Box (automated tape library, robots)

Page 7: Input/Output Professor Alvin R. Lebeck Computer Science 220 Fall 2001

8/18/97 CPS 220 Alvin R. Lebeck 7

Magnetic Disks

• Long term nonvolatile storage

• Another slower, less expensive level of memory hierarchy

SectorTrack

Cylinder

HeadPlatter

Arm

Page 8: Input/Output Professor Alvin R. Lebeck Computer Science 220 Fall 2001

Alvin R. Lebeck 8

Organization of a Hard Magnetic Disk

• Typical numbers (depending on the disk size):– 500 to 2,000 tracks per surface

– 32 to 128 sectors per track

» A sector is the smallest unit that can be read or written

• Traditionally all tracks have the same number of sectors:– Constant bit density: record more sectors on the outer tracks

– Recently relaxed: constant bit size, speed varies with track location

Platters

Track

Sector

Page 9: Input/Output Professor Alvin R. Lebeck Computer Science 220 Fall 2001

Alvin R. Lebeck 9

Magnetic Disk Characteristic

• Cylinder: all the tracks under the head at a given point on all surfaces

• Read/write data is a three-stage process:– Seek time: position the arm over the proper track

– Rotational latency: wait for the desired sectorto rotate under the read/write head

– Transfer time: transfer a block of bits (sector)under the read-write head

• Average seek time as reported by the industry:– Typically in the range of 8 ms to 12 ms

– (Sum of the time for all possible seek) / (total # of possible seeks)

• Due to locality of disk reference, actual average seek time may:

– Only be 25% to 33% of the advertised number

SectorTrack

Cylinder

HeadPlatter

Page 10: Input/Output Professor Alvin R. Lebeck Computer Science 220 Fall 2001

8/18/97 CPS 220 Alvin R. Lebeck 10

Disk Access

• Access time =

queue + seek + rotational + transfer + overhead

• Seek time– move arm over track

– average is confusing (startup, slowdown, locality of accesses)

• Rotational latency– wait for sector to rotate under head

– average = 0.5/(3600 RPM) = 8.3ms

• Transfer Time– f(size, BW bytes/sec)

Page 11: Input/Output Professor Alvin R. Lebeck Computer Science 220 Fall 2001

8/18/97 CPS 220 Alvin R. Lebeck 11

Disk Time Example

• Disk Parameters:– Transfer size is 8K bytes

– Advertised average seek is 12 ms

– Disk spins at 7200 RPM

– Transfer rate is 4 MB/sec

• Controller overhead is 2 ms

• Assume that disk is idle so no queuing delay

• What is Average Disk Access Time for a Sector?– Ave seek + ave rot delay + transfer time + controller overhead

– 12 ms + 0.5/(7200 RPM/60) + 8 KB/4 MB/s + 2 ms

– 12 + 4.15 + 2 + 2 = 20 ms

• Advertised seek time assumes no locality: typically 1/4 to 1/3 advertised seek time: 20 ms => 12 ms

Page 12: Input/Output Professor Alvin R. Lebeck Computer Science 220 Fall 2001

8/18/97 CPS 220 Alvin R. Lebeck 12

DRAM as Disk

• Solid state disk, Expanded Storage, NVRAM

• Disk is slow, DRAM is fast => replace Disk with battery backed DRAM

• BUT, Disk is cheap, much cheaper than DRAM

Page 13: Input/Output Professor Alvin R. Lebeck Computer Science 220 Fall 2001

8/18/97 CPS 220 Alvin R. Lebeck 13

Alternative Storage

• CD ROM– read only: good for distribution

• CD RW

• FLASH Memory

• Magnetic Tape– Sequential Access

– R-DAT (Rotating Digital Audio Tape)

» Helical Scan (angle to tape, high density ~5GB)

– Tera to peta bytes of storage (NASA EOS)

Page 14: Input/Output Professor Alvin R. Lebeck Computer Science 220 Fall 2001

8/18/97 CPS 220 Alvin R. Lebeck 14

Connecting I/O Devices to CPU/Memory

• Memory Bus– Short

– Fast

– Known set of components

– Proprietary (don’t release design free)

– Ultra doesn’t have traditional bus

• Separate I/O Bus (e.g., PCI)– Standard

– Accept variety of components (w/ different BW performance)

– Long

– Slow

Page 15: Input/Output Professor Alvin R. Lebeck Computer Science 220 Fall 2001

8/18/97 CPS 220 Alvin R. Lebeck 15

Processor Interface Issues

• Interconnections– Busses

• Processor interface– Instructions

– Memory mapped I/O

• I/O Control Structures– Polling

– Interrupts

– DMA

– I/O Controllers

– I/O Processors

• Capacity, Access Time, Bandwidth

Page 16: Input/Output Professor Alvin R. Lebeck Computer Science 220 Fall 2001

Alvin R. Lebeck 16

Device Controllers

DeviceController

Command Status Data 0

Data 1

Data n-1

Busy Done Error

Bus

Device

Interrupt?

Controller deals withmundane control(e.g., position head, error detection/correction)

Processor communicateswith Controller

Page 17: Input/Output Professor Alvin R. Lebeck Computer Science 220 Fall 2001

Alvin R. Lebeck 17

Review: Interrupts and Exceptions

• Unnatural change in control flow

• Interrupt is external event – devices: disk, network, keyboard, etc.

– clock for timeslicing

– these are useful events, must do something when they occur.

• Exception is often potential problem with program– segmentation fault

– bus error

– divide by 0

– don’t want my bug to crash the entire machine

– page fault (virtual memory…)

Page 18: Input/Output Professor Alvin R. Lebeck Computer Science 220 Fall 2001

Alvin R. Lebeck 18

Review: Handling an Interrupt/Exception

• Invoke specific kernel routine based on type of interrupt

– interrupt/exception handler

• Must determine what caused interrupt

– could use software to examine each device

– PC = interrupt_handler

• Vectored Interrupts– PC = interrupt_table[i]

• Clear the interrupt• kernel initializes table at boot

time• May return from interrupt

(RETT) to different process (e.g, context switch)

ldaddst

mulbeqld

subbne

RETT

User Program

Interrupt Handler

ServiceRoutines

Page 19: Input/Output Professor Alvin R. Lebeck Computer Science 220 Fall 2001

Alvin R. Lebeck 19

Device Drivers

• top-half– API (open, close, read, write, ioctl)

– I/O Control (IOCTL, device specific arguments)

• bottom-half– interrupt handler

– communicates with device

– resumes process

• Must have access to user address space and device control registers => runs in kernel mode.

Page 20: Input/Output Professor Alvin R. Lebeck Computer Science 220 Fall 2001

8/18/97 CPS 220 Alvin R. Lebeck 20

I/O Interface

Independent I/O Bus

CPU

Interface Interface

Peripheral Peripheral

Memory

memorybus

Seperate I/O instructions (in,out)

CPU

Interface Interface

Peripheral Peripheral

Memory

Lines distinguish between I/O and memory transferscommon memory

& I/O bus

VME busMultibus-IINubus

40 Mbytes/secoptimistically

10 MIP processorcompletelysaturates the bus!

Page 21: Input/Output Professor Alvin R. Lebeck Computer Science 220 Fall 2001

8/18/97 CPS 220 Alvin R. Lebeck 21

Memory Mapped I/O

Single Memory & I/O BusNo Separate I/O Instructions

CPU

Interface Interface

Peripheral Peripheral

Memory

ROM

RAM

I/O$

CPU

L2 $

Memory Bus

Memory Bus Adaptor

I/O bus

Bridge

Physical Address

Issue command through storeCheck for completion with loadWrite-back cache / Write buffer?

Page 22: Input/Output Professor Alvin R. Lebeck Computer Science 220 Fall 2001

8/18/97 CPS 220 Alvin R. Lebeck 22

Programmed I/O (Polling)

CPU

IOC

device

Memory

Is thedata

ready?

readdata

storedata

yesno

done? no

yes

busy wait loopnot an efficient

way to use the CPUunless the device

is very fast!

but checks for I/O completion can bedispersed amongcomputationallyintensive code

Page 23: Input/Output Professor Alvin R. Lebeck Computer Science 220 Fall 2001

8/18/97 CPS 220 Alvin R. Lebeck 23

Interrupt Driven Data Transfer

addsubandornop

readstore...rti

memory

userprogram(1) I/O

interrupt

(2) save PC

(3) interruptservice addr

interruptserviceroutine(4)

User program progress only halted during actual transfer

Interrupt Overhead can dominate transfer time.1000 xfers of 1000 bytes each: 2usecs for interrupt 98usecs for service

Device xfer rate: 10 MB/s => .1usec/byte => .1ms for 1000 bytes

$

CPU

L2 $

Memory Bus

Memory Bus Adapter

I/O bus

Controller

Device

Page 24: Input/Output Professor Alvin R. Lebeck Computer Science 220 Fall 2001

8/18/97 CPS 220 Alvin R. Lebeck 24

Direct Memory Access

Time to do 1000 x 1000 bytes:

1 DMA set-up sequence @ 50 µsec1 interrupt @ 2 µsec1 interrupt service sequence @ 48 µsec

.0001 second of CPU time

CPU sends a starting address, direction, and length count to DMAC. Then issues "start".

DMAC must “talk” to both memory bus andI/O bus (e.g., PCI).

0ROM

RAM

Peripherals

DMAC n

Memory Mapped I/O

$

CPU

L2 $

Memory Bus

Memory Bus Adapter

I/O bus

DMA CNTRL

Page 25: Input/Output Professor Alvin R. Lebeck Computer Science 220 Fall 2001

8/18/97 CPS 220 Alvin R. Lebeck 25

Input/Output Processors

CPU IOP

Mem

D1

D2

Dn

. . .main memory

bus

I/Obus

CPU

IOP

issues instruction to IOP

interrupts when done(1)

memory

(2)

(3)

(4)

Device to/from memorytransfers are controlledby the IOP directly.

IOP steals memory cycles.

OP Device Address

target devicewhere cmds are

looks in memory for commands

OP Addr Cnt Other

whatto do

whereto putdata

howmuch

specialrequests

Page 26: Input/Output Professor Alvin R. Lebeck Computer Science 220 Fall 2001

8/18/97 CPS 220 Alvin R. Lebeck 26

Relationship to Processor Architecture

• I/O instructions and I/O busses connected directly to processor have largely disappeared (Memory Mapped I/O)

– Some embedded processors still have them (micro-controllers)

• Interrupts:– Stack replaced by shadow registers

– Handler saves registers and re-enables higher priority int's

– Interrupt types reduced in number; handler must query interrupt controller

Page 27: Input/Output Professor Alvin R. Lebeck Computer Science 220 Fall 2001

8/18/97 CPS 220 Alvin R. Lebeck 27

Relationship to Processor Architecture

• Caches required for processor performance cause problems for I/O

– Flushing is expensive, I/O pollutes cache

– Solution is borrowed from shared memory multiprocessors "snooping"

• Virtual memory frustrates DMA

• Load/store architecture at odds with atomic operations

– load locked, store conditional

• Caches and write buffers– need uncached and write buffer flush for memory mapped I/O

• Stateful processors hard to context switch

Page 28: Input/Output Professor Alvin R. Lebeck Computer Science 220 Fall 2001

8/18/97 CPS 220 Alvin R. Lebeck 28

I/O Data Flow

Memory-to-Memory Copy

DMA over Peripheral Bus

Xfer over Disk Channel

Xfer over Serial Interface

Application Address Space

OS Buffers (>10 MByte)

HBA Buffers (1 M - 4 MBytes)

Track Buffers (32K - 256KBytes)

I/O Device

I/O Controller

Embedded Controller

Head/Disk Assembly

Host Processor

Impediment to high performance: multiple copies, complex hierarchy

Page 29: Input/Output Professor Alvin R. Lebeck Computer Science 220 Fall 2001

8/18/97 CPS 220 Alvin R. Lebeck 29

Communication Networks

Performance limiter is memory system, OS overhead, not HW protocols

NodeProcessor

ControlReg. I/F

NetI/F Memory

RequestBlock

ReceiveBlock

Media

Network Controller

Peripheral Backplane Bus

DMA

. . .

Processor MemoryList of request blocks

Data to be transmitted

. . .

List of receive blocks

Data receivedDMA

. . .

List of free blocks

• Send/receive queues in processor OS memory• Network controller copies back and forth via DMA• No host intervention needed• Interrupt host when message sent or received• Memory-to-Memory copy to user space

Page 30: Input/Output Professor Alvin R. Lebeck Computer Science 220 Fall 2001

Alvin R. Lebeck 30

Network Connected Devices

• High speed networks (10Gb Ethernet soon)• How can we eliminate overheads?Page Flipping• OS places aligned data into memory and remaps

pagesRDMA• Idea is to eliminate kernel-to-user copy (User-level

messaging)• Requires “translation” on Network Interface (NI)• Application registers region with OS• OS stores pointer in NI• On Receive, pointer says where data should go

Page 31: Input/Output Professor Alvin R. Lebeck Computer Science 220 Fall 2001

Alvin R. Lebeck 31

Next Time

• Bus designs (connecting components)

• RAID