Chapter 8: Part II Storage, Network and Other Peripherals

Page 1:

Chapter 8: Part II

Storage, Network and Other Peripherals

Page 2:

Performance Analysis: Sync. vs. Async.

  • Synchronous bus: clock cycle time = 50 ns; each transaction takes one clock cycle
  • Asynchronous bus: 40 ns per handshake
  • Data portion: 32 bits
  • Question: find the bandwidth of each bus when performing one-word reads from a 200 ns memory.

Page 3:

Sync. vs. Async. Buses (I)

For the synchronous bus:

  1. Send the address to memory: 50 ns
  2. Read the memory: 200 ns
  3. Send the data to the device: 50 ns

Total time = 300 ns, bandwidth = 4 bytes / 300 ns = 13.3 MB/s

Page 4:

Sync. vs. Async. Buses (II)

For the asynchronous bus:

  1. Step 1: 40 ns
  2. Steps 2, 3, 4: max(3 x 40 ns, 200 ns) = 200 ns
  3. Steps 5, 6, 7: 3 x 40 ns = 120 ns

Total time = 360 ns, maximum bandwidth = 4 bytes / 360 ns = 11.1 MB/s
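As a quick check of the two figures above, here is a minimal C sketch (not part of the original slides) that reproduces the arithmetic for both buses from the stated parameters:

```c
#include <stdio.h>

int main(void) {
    double word_bytes = 4.0;               /* 32-bit data portion */

    /* Synchronous bus: address (50 ns) + memory read (200 ns) + data (50 ns). */
    double sync_ns = 50.0 + 200.0 + 50.0;

    /* Asynchronous bus: step 1 (40 ns), steps 2-4 overlapped with the
       200 ns memory read, then steps 5-7 (3 x 40 ns). */
    double step = 40.0, mem = 200.0;
    double async_ns = step + (3.0 * step > mem ? 3.0 * step : mem) + 3.0 * step;

    /* bytes per ns equals GB/s, so multiply by 1000 to report MB/s */
    printf("sync:  %.0f ns, %.1f MB/s\n", sync_ns, word_bytes / sync_ns * 1000.0);
    printf("async: %.0f ns, %.1f MB/s\n", async_ns, word_bytes / async_ns * 1000.0);
    return 0;
}
```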

Page 5:

Increasing Bus Bandwidth

  • Data bus width
  • Separate versus multiplexed address and data lines
  • Block transfers

Page 6:

Performance Analysis of Two Bus Schemes

Given a system with:

  • a memory and bus system supporting block access of 4 to 16 words
  • a 64-bit synchronous bus clocked at 200 MHz, with each 64-bit transfer taking 1 clock cycle and 1 clock cycle needed to send an address to memory
  • two clock cycles needed between each bus operation
  • a memory access that takes 200 ns for the first 4 words, with each additional set of 4 words requiring 20 ns

Page 7:

Question

  • Find the sustained bandwidth and latency for a read of 256 words, for transfers using 4-word blocks and for transfers using 16-word blocks.
  • Find the effective number of bus transactions per second for each case.

Page 8:

4-Word Block Transfer

  • 1 clock cycle to send the address to memory
  • 200 ns / (5 ns/cycle) = 40 cycles to read memory
  • 2 cycles to send the data from memory
  • 2 idle cycles between transfers
  • Total = 45 cycles per block
  • 256 words requires 256/4 = 64 transactions, so 45 x 64 = 2880 cycles

Page 9:

4-Word Block Transfer

  • Latency = 2880 cycles x 5 ns/cycle = 14,400 ns
  • Number of bus transactions = 64 x 1 s / 14,400 ns = 4.44M transactions/s
  • Bandwidth = (256 x 4 bytes) x 1 / 14,400 ns = 71.11 MB/s

Page 10:

16-Word Block Transfer

  • 1 clock cycle to send the address to memory
  • 40 cycles to read the first 4 words from memory
  • 2 cycles to send the data, during which the read of the next 4 words is started
  • 2 idle cycles between transfers, during which the read of the next 4 words is completed
  • The last two steps must be repeated 3 more times to read a total of 16 words

Page 11:

16-Word Block Transfer

  • Total cycles per transaction = 1 + 40 + 4 x (2 + 2) = 57 cycles
  • 256/16 = 16 transactions are required
  • Total number of cycles for 256 words = 16 x 57 = 912 cycles, latency = 912 x 5 ns = 4560 ns
  • Number of bus transactions = 16 x 1 s / 4560 ns = 3.51M transactions/s
  • Bandwidth = (256 x 4 bytes) x 1 / 4560 ns = 224.56 MB/s
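To make the two block-transfer calculations easy to check (and to try other block sizes), here is a small C sketch, not part of the original slides, that reproduces the cycle counts, latency, transaction rate, and bandwidth for both cases:

```c
#include <stdio.h>

/* Parameters from the bus-scheme example (200 MHz bus => 5 ns/cycle). */
#define NS_PER_CYCLE   5.0
#define WORDS_TOTAL    256
#define BYTES_PER_WORD 4

static void block_transfer(int block_words) {
    /* One transaction: 1 address cycle, 40 cycles for the first 4 words
       (200 ns), then (2 send + 2 idle) cycles per 4-word group; for blocks
       larger than 4 words the extra 20 ns reads overlap with those cycles. */
    int groups = block_words / 4;
    int cycles = 1 + 40 + groups * (2 + 2);

    int transactions = WORDS_TOTAL / block_words;
    double total_cycles = (double)transactions * cycles;
    double latency_ns = total_cycles * NS_PER_CYCLE;

    printf("%2d-word blocks: %2d cycles/transaction, %4.0f cycles total, "
           "latency %5.0f ns, %.2fM transactions/s, %.2f MB/s\n",
           block_words, cycles, total_cycles, latency_ns,
           transactions / latency_ns * 1e3,                   /* millions per second */
           WORDS_TOTAL * BYTES_PER_WORD / latency_ns * 1e3);  /* MB/s */
}

int main(void) {
    block_transfer(4);
    block_transfer(16);
    return 0;
}
```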

Page 12:

Bus Arbitration

  • Daisy chain arbitration (not very fair)
  • Centralized arbitration (requires an arbiter), e.g., PCI
  • Self-selection, e.g., NuBus used in the Macintosh
  • Collision detection, e.g., Ethernet

Page 13:

Bus Standards

  • PCI (a general-purpose backplane bus)
  • SCSI (Small Computer System Interface)
  • IEEE 1394 (FireWire)
  • USB 2.0

Characteristic     FireWire (1394)              USB 2.0
Bus width          4                            2
Clocking           asynchronous                 asynchronous
Peak bandwidth     50 MB/s (FireWire 400),      0.2 MB/s, 1.5 MB/s,
                   100 MB/s (FireWire 800)      60 MB/s
Hot pluggable      Yes                          Yes
Max # of devices   63                           127
Max. bus length    4.5 m                        5 m

Page 14:

Interfacing I/O Devices

  • How is a user I/O request transformed into a device command and communicated to the device?
  • How is data actually transferred to or from a memory location?
  • What is the role of the operating system?

Page 15:

Role of the OS

The OS plays a major role in handling I/O because:

  • the I/O system is shared by multiple programs using the processor
  • the I/O system often uses interrupts, which cause a transfer to supervisor mode
  • low-level control of I/O devices is complex

Page 16:

Communications between the OS and I/O Devices

  • The OS must be able to give commands to the I/O devices.
  • The I/O device must be able to notify the OS when an operation has completed or an error has occurred.
  • Data must be transferred between memory and an I/O device.

Page 17:

Giving Commands to I/O

To give a command, the processor must be able to address the device and to supply one or more command words:

  • memory-mapped I/O: portions of the address space are assigned to I/O devices
  • special I/O instructions: dedicated I/O instructions in the processor
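As an illustration of memory-mapped I/O, the sketch below issues a command to a device through ordinary loads and stores. The base address, register offsets, and bit meanings are invented for the example; a real device's datasheet defines the actual layout.

```c
#include <stdint.h>

/* Hypothetical device registers mapped into the address space;
   the base address and offsets below are illustrative only. */
#define DEV_BASE        0x40001000u
#define DEV_STATUS_OFF  0x0u   /* read:  bit 0 = device busy */
#define DEV_DATA_OFF    0x4u   /* write: data for the command */
#define DEV_COMMAND_OFF 0x8u   /* write: command word; starts the operation */

/* 'volatile' forces every access to really reach the device instead of
   being cached or reordered by the compiler. */
#define DEV_REG(off) (*(volatile uint32_t *)(uintptr_t)(DEV_BASE + (off)))

static int dev_issue_command(uint32_t command, uint32_t data)
{
    if (DEV_REG(DEV_STATUS_OFF) & 0x1u)   /* ordinary load reads the status register */
        return -1;                        /* device busy: caller retries later */
    DEV_REG(DEV_DATA_OFF)    = data;      /* ordinary stores become device writes */
    DEV_REG(DEV_COMMAND_OFF) = command;
    return 0;
}
```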

Page 18:

Communicating with the Processor

  • Polling
  • Interrupts
  • DMA

Page 19:

Polling

Polling: the processor periodically checks the status of the I/O device (a minimal polling loop is sketched below).

Overhead of polling in an I/O system:
  • Example 1: mouse
  • Example 2: floppy disk
  • Example 3: hard disk
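Programmed polling usually looks like the loop below: the processor spins on a status register until the device reports data ready, then reads the data register. The register accessors and the READY bit are hypothetical; with memory-mapped I/O they would be volatile loads like the ones in the previous sketch.

```c
#include <stdint.h>

#define STATUS_READY 0x1u   /* hypothetical "data ready" bit in the status register */

/* Reads of the device's status and data registers; with memory-mapped I/O
   these would be volatile loads from fixed device addresses. */
extern uint32_t read_status_reg(void);
extern uint32_t read_data_reg(void);

/* Busy-wait until the device has data, then fetch one word.
   Every iteration of the loop is pure overhead for the processor,
   which is what the next three slides quantify. */
uint32_t poll_and_read(void)
{
    while ((read_status_reg() & STATUS_READY) == 0)
        ;                                  /* spin: the CPU does no useful work here */
    return read_data_reg();
}
```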

Page 20:

Mouse

  • Assume the number of clock cycles for a polling operation (including transferring to the polling routine, accessing the device, and restarting the user program) is 400, with a 500 MHz clock.
  • The mouse must be polled 30 times a second to ensure that no user movement is missed.
  • Fraction of CPU time = 30 x 400 / (500 x 10^6) ≈ 0.002%

Page 21:

Floppy Disk

  • The floppy disk transfers data to the processor in 16-bit units and has a data rate of 50 KB/s.
  • Polling rate = (50 KB/s) / (2 bytes/poll) = 25K polls/sec
  • Fraction of CPU time = 25K x 400 / (500 x 10^6) = 2%

Page 22:

Hard Disk

  • The hard disk transfers data in 4-word blocks at a rate of 4 MB/s.
  • Polling rate = (4 MB/s) / (4 x 4 bytes/poll) = 250K polls/sec
  • Fraction of CPU time = 250K x 400 / (500 x 10^6) = 20%
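The three fractions above all follow the same formula: polls per second x cycles per poll / clock rate. A minimal C sketch of that calculation (not from the slides) for the three devices:

```c
#include <stdio.h>

#define CLOCK_HZ        500e6   /* 500 MHz processor        */
#define CYCLES_PER_POLL 400.0   /* cost of one polling pass */

/* Fraction of CPU time consumed by polling at a given rate. */
static double poll_fraction(double polls_per_sec) {
    return polls_per_sec * CYCLES_PER_POLL / CLOCK_HZ;
}

int main(void) {
    double mouse  = 30.0;                  /* polled 30 times a second       */
    double floppy = 50e3 / 2.0;            /* 50 KB/s, 2 bytes per poll      */
    double disk   = 4e6 / (4.0 * 4.0);     /* 4 MB/s, 4-word (16-byte) polls */

    printf("mouse:  %.3f%%\n", 100.0 * poll_fraction(mouse));
    printf("floppy: %.1f%%\n", 100.0 * poll_fraction(floppy));
    printf("disk:   %.1f%%\n", 100.0 * poll_fraction(disk));
    return 0;
}
```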

Page 23:

Overhead of Polling

  • The polling can be done only when the device is active, which reduces the overhead.
  • However, the overhead is still significant, which motivates another design: interrupt-driven I/O.

Page 24:

Overhead of Interrupt-Driven I/O

  • Assume the overhead for each transfer, including the interrupt, is 500 cycles.
  • Cycles per second for the hard disk (250K transfers/s) = 250K x 500 = 125 x 10^6 cycles/s
  • Fraction of the processor consumed = 125 x 10^6 / (500 x 10^6) = 25%
  • Assuming the disk is transferring data only 5% of the time, the fraction of CPU time consumed on average = 25% x 5% = 1.25%

Page 25:

Direct Memory Access (DMA)

  • If the disk is transferring data most of the time, the overhead of interrupt-driven I/O is still high.
  • For a high-bandwidth device, let the device controller transfer data directly to or from memory without involving the processor; this is known as direct memory access.
  • An interrupt is used to signal the completion of the I/O transfer or an error.
  • Note: how does DMA affect the cache design?

Page 26:

Overhead of I/O Using DMA

  • Assume the initial setup of a DMA transfer takes 1000 cycles, handling the interrupt at DMA completion takes 500 cycles, and the average transfer from the disk is 8 KB.
  • Each DMA transfer takes 8 KB / (4 MB/s) = 2 x 10^-3 s
  • If the disk is constantly transferring data, the processor spends (1000 + 500) / (2 x 10^-3) = 750 x 10^3 cycles per second on DMA overhead
  • Fraction of CPU time = 750 x 10^3 / (500 x 10^6) = 0.15%
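For comparison, a short C sketch (not from the slides) that reproduces the interrupt-driven and DMA overhead fractions for the same 4 MB/s disk on the 500 MHz processor:

```c
#include <stdio.h>

#define CLOCK_HZ 500e6            /* 500 MHz processor          */
#define DISK_BPS 4e6              /* disk transfer rate, 4 MB/s */

int main(void) {
    /* Interrupt-driven I/O: 500 cycles of overhead per 16-byte transfer. */
    double transfers_per_sec = DISK_BPS / 16.0;          /* 250K per second */
    double intr_busy = transfers_per_sec * 500.0 / CLOCK_HZ;
    printf("interrupt-driven, disk always busy: %.1f%%\n", 100.0 * intr_busy);
    printf("interrupt-driven, disk busy 5%% of the time: %.2f%%\n",
           100.0 * intr_busy * 0.05);

    /* DMA: 1000 setup + 500 interrupt cycles per 8 KB block. */
    double seconds_per_block = 8e3 / DISK_BPS;           /* 2 ms per block */
    double dma_cycles_per_sec = (1000.0 + 500.0) / seconds_per_block;
    printf("DMA, disk always busy: %.2f%%\n",
           100.0 * dma_cycles_per_sec / CLOCK_HZ);
    return 0;
}
```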

Page 27:

I/O System Design

  • Latency constraints: ensuring that the latency to complete an I/O operation is bounded
  • Bandwidth constraints
  • Performance analysis techniques: queuing theory, simulation, analysis

Page 28:

I/O System Design: Example

  • CPU: 3 BIPS (billion instructions per second), with an average of 100,000 OS instructions per I/O operation
  • Backplane bus transfer rate: 1000 MB/s
  • SCSI Ultra320 controller with a transfer rate of 320 MB/s, accommodating up to 7 disks
  • Disk bandwidth = 75 MB/s, seek + rotational latency = 6 ms
  • Workload: 64-KB reads; the user program needs 200,000 instructions per I/O

Page 29:

Example

Find:
  • the maximum sustainable I/O rate
  • the number of disks and SCSI controllers required
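The slides leave the calculation to the reader; the sketch below shows one way the given numbers might be combined, assuming the CPU is the limiting resource and ignoring queuing effects and bus contention. Treat the printed figures as an estimate under those assumptions rather than the slides' official answer.

```c
#include <stdio.h>
#include <math.h>

int main(void) {
    /* Given parameters from the example slide. */
    double cpu_ips        = 3e9;            /* 3 BIPS                         */
    double instr_per_io   = 100e3 + 200e3;  /* OS + user instructions per I/O */
    double io_bytes       = 64e3;           /* 64-KB reads                    */
    double disk_bw        = 75e6;           /* 75 MB/s                        */
    double seek_rot_s     = 6e-3;           /* seek + rotational latency      */
    int    disks_per_ctrl = 7;              /* per SCSI Ultra320 controller   */

    /* I/O rate the CPU can sustain. */
    double cpu_ios = cpu_ips / instr_per_io;

    /* I/Os per second a single disk can sustain. */
    double disk_ios = 1.0 / (seek_rot_s + io_bytes / disk_bw);

    int disks = (int)ceil(cpu_ios / disk_ios);
    int ctrls = (int)ceil((double)disks / disks_per_ctrl);

    printf("CPU-limited rate : %.0f I/Os per second\n", cpu_ios);
    printf("per-disk rate    : %.0f I/Os per second\n", disk_ios);
    printf("disks needed     : %d, controllers needed: %d\n", disks, ctrls);
    return 0;
}
```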

Page 30:

Real Stuff: Buses and Networks of the Pentium 4

Page 31:

Intel P4 I/O Chip Sets

Page 32:

A Digital Camera

Page 33:

SoC (System on a chip)