Chapter 8: Storage, Networks, and Other Peripherals

Medical Image Processing Lab., Chuan-Yu Chang Ph.D.

Instructor: Dr. Chuan-Yu Chang (張傳育), Ph.D.

E-mail: [email protected]

Tel: (05)5342601 ext. 4337

Interfacing Processors and Peripherals

• I/O Design affected by many factors (expandability, resilience)

• The performance of an I/O system is harder to characterize than CPU performance:

– Some devices care most about access latency
– Other devices care most about throughput

• I/O system performance depends on many aspects of the system:

– connection between devices and the system

– the memory hierarchy

– the operating system

Example

– Suppose we have a benchmark that executes in 100 seconds of elapsed time, where 90 seconds is CPU time and the rest is I/O time. If CPU time improves by 50 % per year for the next five years but I/O time doesn’t improve, how much faster will our program run at the end of five years?

– Solution: Elapsed time = CPU time + I/O time, so 100 = 90 + I/O time and I/O time = 10 s.

Years later   CPU time (s)    I/O time (s)   Elapsed time (s)   % I/O time
0             90              10             100                10%
1             90/1.5 = 60     10             70                 10/70 = 14%
2             60/1.5 = 40     10             50                 10/50 = 20%
3             40/1.5 = 27     10             37                 10/37 = 27%
4             27/1.5 = 18     10             28                 10/28 = 36%
5             18/1.5 = 12     10             22                 10/22 = 45%

After five years CPU performance has improved by 90/12 = 7.5 times, but elapsed time has improved by only 100/22 = 4.5 times, and the share of elapsed time spent on I/O has grown from 10% to 45%.
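The table above can be reproduced with a short script. This is a minimal sketch; the function name `io_share_over_time` and its parameters are made up for illustration. It does not round the intermediate CPU times the way the slide does, so year 5 comes out as about 21.85 s elapsed (a speedup of about 4.58) rather than the rounded 22 s and 4.5.

```python
def io_share_over_time(cpu_time, io_time, cpu_speedup_per_year, years):
    """Return one (year, cpu_time, elapsed, io_fraction) row per year."""
    rows = []
    for year in range(years + 1):
        elapsed = cpu_time + io_time
        rows.append((year, cpu_time, elapsed, io_time / elapsed))
        cpu_time /= cpu_speedup_per_year  # only the CPU part gets faster
    return rows


rows = io_share_over_time(cpu_time=90.0, io_time=10.0,
                          cpu_speedup_per_year=1.5, years=5)
for year, cpu, elapsed, io_frac in rows:
    print(f"{year}  CPU={cpu:6.2f} s  elapsed={elapsed:6.2f} s  I/O={io_frac:5.1%}")

# Overall improvement in elapsed time after five years.
overall_speedup = rows[0][2] / rows[-1][2]
print(f"elapsed-time speedup: {overall_speedup:.2f}x")
```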

Type and characteristics of I/O Devices

• I/O devices are highly diverse, but three characteristics can be used to classify them:
– behavior (i.e., input vs. output)
– partner (who is at the other end?)
– data rate

Device             Behavior          Partner   Data rate (KB/sec)
Keyboard           input             human     0.01
Mouse              input             human     0.02
Voice input        input             human     0.02
Scanner            input             human     400.00
Voice output       output            human     0.60
Line printer       output            human     1.00
Laser printer      output            human     200.00
Graphics display   output            human     60,000.00
Modem              input or output   machine   2.00-8.00
Network/LAN        input or output   machine   500.00-6000.00
Floppy disk        storage           machine   100.00
Optical disk       storage           machine   1000.00
Magnetic tape      storage           machine   2000.00
Magnetic disk      storage           machine   2000.00-10,000.00

Type and characteristics of I/O Devices

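• Mouse
– The interface between a mouse and the system can take one of two forms:
  • the mouse generates a series of pulses when it moves
  • the mouse increments or decrements counters when it moves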

[Figure: counter changes for movement away from the initial position of the mouse in eight directions: +20 in X; –20 in X; +20 in Y; –20 in Y; and the four diagonal combinations of ±20 in X with ±20 in Y]

I/O Example: Disk Drives

[Figure: disk organization, labeling platters, tracks, and sectors]

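• Components of a hard disk
– Platter
– Track
– Sector
  • The smallest unit that can be read or written.
  • Logical Block Addressing (LBA) makes the block the smallest unit that can be read or written.
  • Traditionally, every track has the same number of sectors.
  • Zone Bit Recording (ZBR) puts more sectors on the outer tracks to increase capacity.
– Cylinder

There is a gap between adjacent sectors.

Each sector contains an error-correcting code (ECC).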

• Disk access time:
– Seek time:
  • move the head to the proper track (8 to 20 ms avg.)
– Rotational latency:
  • wait for the desired sector to rotate under the read/write head
– Transfer time:
  • grab the data (one or more sectors), 2 to 15 MB/sec
– Controller time:
  • the overhead the controller imposes in performing an I/O access
– Disk access time = Seek time + Rotational latency + Transfer time + Controller time

Type and characteristics of I/O Devices

• Compared with a floppy disk, a hard disk has the following advantages:
– The hard disk can be larger because it is rigid.
– The hard disk has higher density because it can be controlled more precisely.
– The hard disk has a higher data rate because it spins faster.
– Hard disks can incorporate more than one platter.

Example

• Disk read time
– What is the average time to read or write a 512-byte sector for a typical disk rotating at 5400 RPM? The advertised average seek time is 12 ms, the transfer rate is 5 MB/sec, and the controller overhead is 2 ms. Assume that the disk is idle so that there is no waiting time.

– Solution:
  • Disk access time = seek time + rotation time + transfer time + controller overhead
  • Disk access time =

\[
12\ \text{ms} + \frac{1}{2}\cdot\frac{1}{5400\ \text{RPM}}\cdot 60\ \frac{\text{sec}}{\text{min}}\cdot 1000\ \frac{\text{ms}}{\text{sec}} + \frac{512\ \text{bytes}}{5\ \text{MB/sec}} + 2\ \text{ms}
= 12 + 5.6 + 0.1 + 2 = 19.7\ \text{ms}
\]
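The same calculation can be written as a small helper. This is a sketch, not part of the original slides: the function name and parameter names are illustrative, and it assumes 1 MB = 10^6 bytes and an average rotational latency of half a revolution.

```python
def disk_access_time_ms(seek_ms, rpm, bytes_transferred,
                        transfer_rate_bytes_per_s, controller_ms):
    """Average disk access time in milliseconds for an idle disk."""
    rotation_ms = 0.5 * 60_000.0 / rpm  # half a revolution on average
    transfer_ms = bytes_transferred / transfer_rate_bytes_per_s * 1000.0
    return seek_ms + rotation_ms + transfer_ms + controller_ms


# Values from the example: 12 ms seek, 5400 RPM, 512 bytes at 5 MB/sec, 2 ms controller.
t = disk_access_time_ms(seek_ms=12.0, rpm=5400, bytes_transferred=512,
                        transfer_rate_bytes_per_s=5e6, controller_ms=2.0)
print(f"{t:.1f} ms")
```

Seek time and rotational latency dominate here; the transfer itself contributes only about 0.1 ms.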

RAID

• Redundant Array of Independent Disks
• Redundant Array of Inexpensive Disks
• 6 levels in common use
• Not a hierarchy
• Set of physical disks viewed as a single logical drive by the O/S
• Data distributed across physical drives
• Can use redundant capacity to store parity information

RAID 0

• No redundancy
• Data striped across all disks
• Round-robin striping
• Increase speed
– Multiple data requests probably not on same disk
– Disks seek in parallel
– A set of data is likely to be striped across multiple disks
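Round-robin striping can be sketched as a simple mapping from a logical strip number to a physical location. The function name `raid0_locate` and the zero-based numbering are assumptions made for this illustration; real controllers also need a strip size and per-disk offsets.

```python
def raid0_locate(logical_strip, n_disks):
    """Map a logical strip number to (disk index, strip index on that disk)."""
    return logical_strip % n_disks, logical_strip // n_disks


# Eight consecutive logical strips on a four-disk array.
layout = [raid0_locate(s, 4) for s in range(8)]
for strip, (disk, row) in enumerate(layout):
    print(f"strip {strip} -> disk {disk}, row {row}")
```

Consecutive strips land on different disks, which is why a large sequential request can be served by all disks in parallel.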

RAID 1

• Mirrored disks
• Data is striped across disks
• 2 copies of each stripe on separate disks
• Read from either
• Write to both
• Recovery is simple
– Swap faulty disk & re-mirror
– No down time
• Expensive

RAID 2

• Disks are synchronized
• Very small stripes
– Often single byte/word
• Error correction calculated across corresponding bits on disks
• Multiple parity disks store Hamming code error correction in corresponding positions
• Lots of redundancy
– Expensive
– Not used

RAID 3

• Similar to RAID 2
• Only one redundant disk, no matter how large the array
• Simple parity bit for each set of corresponding bits
• Data on failed drive can be reconstructed from surviving data and parity info
• Very high transfer rates
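Reconstruction from parity can be illustrated with XOR over byte blocks. This is a sketch with made-up two-byte "disks"; the helper name `parity` is hypothetical. It relies only on the fact that the XOR of all data blocks and the parity block is zero, so any single missing block equals the XOR of the others.

```python
from functools import reduce


def parity(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data disks (illustrative contents) and their parity disk.
data = [b"\x0f\xa0", b"\x33\x0c", b"\xf0\x55"]
p = parity(data)

# Suppose disk 1 fails: rebuild it from the surviving disks plus parity.
rebuilt = parity([data[0], data[2], p])
print(rebuilt == data[1])
```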

RAID 4

• Each disk operates independently
• Good for high I/O request rate
• Large stripes
• Bit-by-bit parity calculated across stripes on each disk
• Parity stored on parity disk

RAID 5

• Like RAID 4
• Parity striped across all disks
• Round-robin allocation for parity stripe
• Avoids RAID 4 bottleneck at parity disk
• Commonly used in network servers
• N.B. "RAID 5" does not mean 5 disks!
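The round-robin parity allocation can be sketched as follows. The rotation order used here (parity starts on the last disk and moves one disk to the left per stripe) is an assumption chosen for illustration; actual implementations use several different rotation schemes, and the function names are hypothetical.

```python
def raid5_parity_disk(stripe, n_disks):
    """Disk that holds parity for a given stripe (assumed rotation order)."""
    return (n_disks - 1 - stripe) % n_disks


def raid5_stripe_layout(stripe, n_disks):
    """Return 'P' for the parity disk and 'D' for data disks in one stripe."""
    p = raid5_parity_disk(stripe, n_disks)
    return ["P" if d == p else "D" for d in range(n_disks)]


for stripe in range(5):
    print(stripe, " ".join(raid5_stripe_layout(stripe, 4)))
```

Because the parity block moves from stripe to stripe, parity writes are spread over all disks instead of queuing at a single parity disk as in RAID 4.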

RAID 6

• Two parity calculations
• Stored in separate blocks on different disks
• User requirement of N disks needs N+2
• High data availability
– Three disks need to fail for data loss
– Significant write penalty

RAID 0, 1, 2

RAID 3 & 4

RAID 5 & 6

Data Mapping For RAID 0

Optical Storage CD-ROM

• Originally for audio
• 650 Mbytes, giving over 70 minutes of audio
• Polycarbonate coated with a highly reflective coat, usually aluminium
• Data stored as pits
• Read by reflecting laser
• Constant packing density
– A CD-ROM contains a single spiral track
– Sectors near the outside of the disk are the same length as those near the inside
– Information is packed evenly across the disk in segments of the same size
• Constant linear velocity
– The disk rotates more slowly for accesses near the outer edge than for those near the center

I/O Example: Buses

• Shared communication link (one or more wires)
• Advantages of a bus:
– versatility and low cost
• Difficult design:
– may be a bottleneck
– length of the bus
– number of devices
– tradeoffs (buffers for higher bandwidth increase latency)
– support for many different devices
– cost

Buses: Connecting I/O Device to Processor and Memory

• Bus transaction
– Read: transfers data from memory
– Write: writes data to memory
– Input: transfers data from the device to memory
– Output: data is read from memory and sent to the device

1. The CPU sends a read control signal and the address to memory.

2. Memory reads the requested data.

3. Memory places the data on the data lines and signals the disk that the data is available.

4. The disk writes the data on the data lines to the disk.

The three steps of an output operation

Buses: Connecting I/O Device to Processor and Memory

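• Input operation (loading the contents of the disk into memory)

1. The CPU sends a write request control signal and the address to memory.

2. The disk is notified that memory is ready.

3. The disk places the data on the data lines.

4. Memory writes the data on the data lines into memory.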

Buses: Connecting I/O Device to Processor and Memory

• Types of buses:
– processor-memory bus
  • short, high speed, to maximize memory-processor bandwidth
– backplane bus
  • allows processor, memory, and I/O devices to coexist on a single bus
  • high speed, often standardized, e.g., PCI
– I/O bus
  • lengthy, different devices, standardized, e.g., SCSI

I/O Bus Standards

• Today we have two dominant bus standards:

Buses: Connecting I/O Device to Processor and Memory

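[Figure: three bus organizations. (a) Processor, memory, and I/O devices share a single backplane bus. (b) Processor and memory are on a processor-memory bus; bus adapters connect it to several I/O buses. (c) Processor and memory are on a processor-memory bus; a bus adapter connects it to a backplane bus, and further bus adapters connect the backplane bus to I/O buses.]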

Buses: Connecting I/O Device to Processor and Memory

• Synchronous and asynchronous buses
• Synchronous
– uses a clock and a synchronous protocol, e.g., a processor-memory bus
– the bus can run very fast and the interface logic will be small
– Disadvantages:
  • every device must operate at the same rate
  • clock skew requires the bus to be short
• Asynchronous
– does not use a clock; uses handshaking instead
– Handshaking: assume that there are three control lines:
  • ReadReq: indicates a read request for memory. The address is put on the data lines at the same time.
  • DataRdy: indicates that the data word is now ready on the data lines.
  • Ack: used to acknowledge the ReadReq or DataRdy signal of the other party.

I/O read a word from memory

[Figure: timing diagram of the ReadReq, Data, Ack, and DataRdy lines, annotated with steps 1 to 7]

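1. The I/O device raises ReadReq and, at the same time, places the address on the data bus.

2. Memory responds by raising Ack and reads the address from the data bus; on seeing Ack, the I/O device releases ReadReq and the data bus.

3. Memory sees ReadReq go low and releases Ack.

4. When memory has the data ready, it places the data on the data bus and raises DataRdy to notify the I/O device.

5. The I/O device sees DataRdy, reads the data from the data bus, and raises Ack to notify memory.

6. Memory sees Ack and releases DataRdy and the data bus.

7. The I/O device sees DataRdy go low and releases Ack. The transfer is complete.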
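The seven steps can be traced as a sequence of line states. This is only an illustrative model, not a hardware description: the dictionary keys mirror the signal names above, the string placeholders `"address"` and `"word"` stand in for real bus contents, and the function name is made up.

```python
def async_read_handshake():
    """Return the state of the bus lines after each of the seven steps."""
    lines = {"ReadReq": 0, "Ack": 0, "DataRdy": 0, "Data": None}
    steps = [
        (1, {"ReadReq": 1, "Data": "address"}),        # I/O requests a read
        (2, {"Ack": 1, "ReadReq": 0, "Data": None}),   # memory acks, I/O releases
        (3, {"Ack": 0}),                               # memory releases Ack
        (4, {"Data": "word", "DataRdy": 1}),           # memory drives the data
        (5, {"Ack": 1}),                               # I/O reads data and acks
        (6, {"DataRdy": 0, "Data": None}),             # memory releases the bus
        (7, {"Ack": 0}),                               # I/O releases Ack
    ]
    trace = []
    for step, changes in steps:
        lines.update(changes)
        trace.append((step, dict(lines)))
    return trace


trace = async_read_handshake()
for step, state in trace:
    print(step, state)
```

After step 7 every line is back in its idle state, so the next transaction can start with the same sequence.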

Example

• Performance analysis of synchronous vs. asynchronous buses
– The synchronous bus has a clock cycle time of 50 ns, and each bus transmission takes 1 clock cycle. The asynchronous bus requires 40 ns per handshake. The data portion of both buses is 32 bits wide. Find the bandwidth for each bus when performing one-word reads from a 200-ns memory.

– Solution: On the synchronous bus each transmission takes one clock cycle (50 ns), so reading one word from memory takes:
  1. Send the address to memory: 50 ns
  2. Read the memory: 200 ns
  3. Send the data to the device: 50 ns
  Total time = 300 ns, so transferring 4 bytes takes 300 ns: 4 bytes / 300 ns = 13.3 MB/sec

– On the asynchronous bus each handshake takes 40 ns, and several of the seven steps can overlap: steps 2 to 4 overlap with the memory access (because the memory access takes longer). Reading one word from memory therefore takes:
  Step 1: 40 ns
  Steps 2, 3, 4: max(3 × 40 ns, 200 ns) = 200 ns
  Steps 5, 6, 7: 3 × 40 ns = 120 ns
  Total time = 360 ns, so transferring 4 bytes takes 360 ns: 4 bytes / 360 ns = 11.1 MB/sec
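The two totals can be checked with a few lines of code. This is a sketch under the example's assumptions (one bus cycle each for address and data on the synchronous bus; steps 2 to 4 overlapped with the memory access on the asynchronous bus); the function names are illustrative and 1 MB is taken as 10^6 bytes.

```python
def sync_read_ns(cycle_ns, memory_ns):
    """Address cycle + memory access + data cycle."""
    return cycle_ns + memory_ns + cycle_ns


def async_read_ns(handshake_ns, memory_ns):
    """Step 1, then steps 2-4 overlapped with memory, then steps 5-7."""
    return handshake_ns + max(3 * handshake_ns, memory_ns) + 3 * handshake_ns


def bandwidth_mb_per_s(bytes_per_read, time_ns):
    """Bytes per nanosecond converted to MB/sec (10^6 bytes)."""
    return bytes_per_read / time_ns * 1000.0


sync_ns = sync_read_ns(cycle_ns=50, memory_ns=200)
async_ns = async_read_ns(handshake_ns=40, memory_ns=200)
print(sync_ns, round(bandwidth_mb_per_s(4, sync_ns), 1))
print(async_ns, round(bandwidth_mb_per_s(4, async_ns), 1))
```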

Buses: Connecting I/O Device to Processor and Memory

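• Increasing the bus bandwidth
– Data bus width
  • Widen the data bus.
– Separate versus multiplexed address and data lines
  • Use separate lines for data and addresses so that an address and data can both be sent in one bus cycle.
– Block transfer
  • Let the bus transfer multiple words back to back without sending an address or releasing the bus in between, which reduces the time needed to transfer a large block.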

Example:

• Performance Analysis of two bus schemes– Suppose we have a system with the following characteristics:

• A memory and bus system supporting block access of 4 to 16 32-bit words.

• A 64-bit synchronous bus clocked at 200MHz, with each 64-bit transfer taking 1 clock cycle, and 1 clock cycle required to send an address to memory.

• Two clock cycles needed between each bus operation. (Assume the bus is idle before an access.)

• A memory access time for the first four words of 200ns; each additional set of four words can be read in 20 ns. Assume that a bus transfer of the most recently read data and a read of the next four words can be overlapped.

– Find the sustained bandwidth and the latency for a read of 256 words for transfers that use 4-word blocks and for transfers that use 16-word blocks. Also compute the effective number of bus transactions per second for each case. Recall that a single bus transaction consists of an address transmission followed by data.

Example:

• Solution:
– Bus clock = 200 MHz, so one clock cycle = 1 / 200 MHz = 5 ns.
– For 4-word block transfers, each block requires:
  • sending the address to memory: 1 clock cycle
  • reading the data from memory: 200 ns / 5 ns = 40 clock cycles
  • transferring the data from memory: 2 clock cycles
  • idle time between transfers: 2 clock cycles
– That is 1 + 40 + 2 + 2 = 45 clock cycles per block, and 256 / 4 = 64 transactions.
– The total is therefore 45 × 64 = 2880 clock cycles = 2880 × 5 ns = 14,400 ns.
– Transactions per second = 64 / 14,400 ns = 4.44M transactions/sec
– Bus bandwidth = (256 × 4 bytes) / 14,400 ns = 71.11 MB/sec
– For 16-word block transfers, each block requires:
  • sending the address to memory: 1 clock cycle
  • reading the data from memory: 200 ns / 5 ns = 40 clock cycles
  • transferring the data from memory: 2 clock cycles × 4 = 8 clock cycles
  • idle time between transfers: 2 clock cycles × 4 = 8 clock cycles
– That is 1 + 40 + 8 + 8 = 57 clock cycles per block, and 256 / 16 = 16 transactions.
– The total is therefore 57 × 16 = 912 clock cycles = 912 × 5 ns = 4560 ns.
– Transactions per second = 16 / 4560 ns = 3.51M transactions/sec
– Bus bandwidth = (256 × 4 bytes) / 4560 ns = 224.56 MB/sec
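Both cases follow the same pattern, so they can be computed by one function. This is a sketch that hard-codes the example's numbers (5 ns cycle, 1 address cycle, 40 cycles for the first memory access, and 2 transfer cycles plus 2 idle cycles per group of four words); the function name and return layout are illustrative.

```python
CYCLE_NS = 5  # 200 MHz bus clock


def block_read(total_words, block_words):
    """Return (cycles per block, latency in ns, MB/sec, million transactions/sec)."""
    groups = block_words // 4                     # memory delivers 4 words at a time
    cycles_per_block = 1 + 40 + groups * (2 + 2)  # address + first access + transfers/idle
    transactions = total_words // block_words
    latency_ns = cycles_per_block * transactions * CYCLE_NS
    bandwidth_mb_s = total_words * 4 / latency_ns * 1000.0
    mtrans_per_s = transactions / latency_ns * 1000.0
    return cycles_per_block, latency_ns, bandwidth_mb_s, mtrans_per_s


four = block_read(256, 4)
sixteen = block_read(256, 16)
print(four)
print(sixteen)
```

Larger blocks amortize the 40-cycle first access over more words, which is where the roughly threefold bandwidth gain comes from.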

Buses: Connecting I/O Device to Processor and Memory

• Obtaining access to the bus
– In a single-master system, all bus requests must be controlled by the processor.

– Disadvantage: the processor must be involved in every bus transaction.

[Figure: three panels (a), (b), (c), each showing the processor, memory, and disks attached to the bus and the bus request lines]

Buses: Connecting I/O Device to Processor and Memory

• Bus arbitration:
– Deciding which bus master gets to use the bus next.
– Arbitration has to take both bus priority and fairness into account.
– Four bus arbitration schemes:
  • Daisy chain arbitration (not very fair)
  • Centralized arbitration (requires an arbiter), e.g., PCI
  • Distributed arbitration by self selection, e.g., NuBus used in Macintosh
  • Distributed arbitration by collision detection, e.g., Ethernet

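[Figure: daisy chain arbitration. The grant line from the bus arbiter passes through Device 1 (highest priority), Device 2, ..., Device n (lowest priority); the devices share the release and request lines to the arbiter.]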
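Why the daisy chain is "not very fair" can be seen in a small model. This sketch assumes device 0 is the one closest to the arbiter; the function name is hypothetical. The grant signal is passed along the chain and is intercepted by the first device that is requesting the bus.

```python
def daisy_chain_grant(requests):
    """Return the index of the device that receives the grant, or None.

    requests[i] is True if device i is requesting the bus; device 0 is
    closest to the arbiter and therefore has the highest priority.
    """
    for device, wants_bus in enumerate(requests):
        if wants_bus:
            return device  # intercepts the grant; later devices never see it
    return None


print(daisy_chain_grant([False, True, True]))
print(daisy_chain_grant([False, False, False]))
```

If a device near the arbiter requests the bus continuously, devices further down the chain can be starved.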

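[Figure: centralized arbitration (requires an arbiter), e.g., PCI. Devices D1 to D4 sit on the bus and connect to the bus controller through bus request (BR1, ...) and bus grant (BG1, ...) lines, with a shared Bus Busy line.]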

• Communicating with the processor:
– Polling
  • The process of periodically checking status bits to see if it is time for the next I/O operation.
– Interrupts
  • Used when an I/O device requires attention from the processor.
– Direct Memory Access (DMA)
  • Off-loading the processor and having the device controller transfer data directly to or from the memory without involving the processor.

Interrupts

• Mechanism by which other modules (e.g. I/O) may interrupt the normal sequence of processing
– Program
  • e.g. overflow, division by zero
– Timer
  • Generated by internal processor timer
  • Used in pre-emptive multi-tasking
– I/O
  • from I/O controller
– Hardware failure
  • e.g. memory parity error

Program Flow Control

Interrupt Cycle

• Added to instruction cycle
• Processor checks for interrupt
– Indicated by an interrupt signal
• If no interrupt, fetch next instruction
• If interrupt pending:
– Suspend execution of current program
– Save context
– Set PC to start address of interrupt handler routine
– Process interrupt
– Restore context and continue interrupted program
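The interrupt check at the end of each instruction cycle can be sketched as a loop. This is a toy model, not a real processor: instructions are just names, the "context" is only the program counter, and the `run` function, its `interrupts` parameter (a set of instruction indices after which an interrupt is pending), and the handler name `"ISR"` are all made up for illustration.

```python
def run(program, interrupts, handler_name="ISR"):
    """Execute a list of instruction names, servicing pending interrupts."""
    log = []
    pc = 0
    while pc < len(program):
        log.append(program[pc])       # fetch and execute the instruction
        executed = pc
        pc += 1
        if executed in interrupts:    # interrupt check ends the cycle
            saved_pc = pc             # save context
            log.append(handler_name)  # jump to and run the handler
            pc = saved_pc             # restore context and continue
    return log


print(run(["i0", "i1", "i2"], interrupts={1}))
```

The interrupted program resumes at the instruction that follows the one that was executing when the interrupt arrived.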

Instruction Cycle (with Interrupts) - State Diagram

Multiple Interrupts

• Disable interrupts
– Processor will ignore further interrupts whilst processing one interrupt
– Interrupts remain pending and are checked after first interrupt has been processed
– Interrupts handled in sequence as they occur
• Define priorities
– Low priority interrupts can be interrupted by higher priority interrupts
– When higher priority interrupt has been processed, processor returns to previous interrupt

Multiple Interrupts - Sequential

Multiple Interrupts - Nested

Concept of DMA

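• Direct Memory Access (DMA)
• DMA is an interface technique in which a DMA controller (DMAC) uses cycle stealing to transfer large amounts of data directly between memory and a peripheral.

• The CPU starts the DMAC by sending it the starting address and the number of words to transfer. While the CPU is decoding and executing an instruction it does not use the system bus, so the DMAC uses these periods: the CPU gives up the system bus, and the DMAC uses it to move data directly between the peripheral and memory without CPU involvement. This technique is called cycle stealing.

• Unlike programmed I/O, DMA does not use the CPU's registers; it steals memory cycles directly to transfer data, which makes it suitable for high-speed peripherals.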

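1. The peripheral device sends a service request to the DMAC.
2. The DMAC requests the system bus from the CPU.
3. The CPU completes its current cycle and responds with HLDA.
4. The DMAC acknowledges the peripheral's service request.
5. The DMAC sends out the starting address of the data to be transferred and begins the transfer between memory and the peripheral.
6. When the transfer is complete, the DMAC disables HRQ and returns control of the system bus to the CPU.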

Example:

• Overhead of polling in an I/O system
– Assume that the number of clock cycles for a polling operation is 400 and the processor executes with a 500 MHz clock. Determine the fraction of CPU time consumed for the following three cases, assuming that you poll often enough so that no data is ever lost and assuming that the devices are potentially always busy:
  • The mouse must be polled 30 times per second to ensure that we do not miss any movement made by the user.
  • The floppy disk transfers data to the processor in 16-bit units and has a data rate of 50 KB/sec. No data transfer can be missed.
  • The hard disk transfers data in four-word chunks and can transfer at 4 MB/sec. Again, no transfer can be missed.

Example:

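• Solution:
– (a) Mouse
  30 × 400 = 12,000 cycles per second
  Fraction of processor time consumed = 12,000 / 500M ≈ 0.002%

– (b) Floppy disk
  The floppy transfers 50 KB per second, 16 bits = 2 bytes at a time, so it needs 50 KB / 2 = 25K polls per second. Each poll takes 400 clock cycles, for a total of 25K × 400 = 10,000K cycles per second.
  Fraction of processor time consumed = 10,000K / 500M = 2%

– (c) Hard disk
  The hard disk transfers 4 MB per second, 4 words = 16 bytes at a time, so it needs 4 MB / 16 = 250K polls per second. Each poll takes 400 clock cycles, for a total of 250K × 400 = 100,000K cycles per second.
  Fraction of processor time consumed = 100,000K / 500M = 20%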
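All three cases are the same formula with a different polling rate. The sketch below assumes, as the solution does, that K and M are decimal (10^3 and 10^6); the function name and default arguments are illustrative.

```python
def polling_fraction(polls_per_second, cycles_per_poll=400, clock_hz=500e6):
    """Fraction of processor cycles spent polling."""
    return polls_per_second * cycles_per_poll / clock_hz


mouse = polling_fraction(30)          # polled 30 times per second
floppy = polling_fraction(50e3 / 2)   # 50 KB/sec in 2-byte units
disk = polling_fraction(4e6 / 16)     # 4 MB/sec in 16-byte units

print(f"mouse  {mouse:.4%}")
print(f"floppy {floppy:.1%}")
print(f"disk   {disk:.1%}")
```

The unrounded mouse figure is 0.0024%, which the slide rounds to 0.002%.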

Example:

• Overhead of interrupt-driven I/O
– Suppose we have the same hard disk and processor we used in the previous example, but we use interrupt-driven I/O. The overhead for each transfer is 500 clock cycles. Find the fraction of the processor consumed if the hard disk is only transferring data 5% of the time.

• Solution
– The hard disk transfers 4 MB per second, 4 words = 16 bytes at a time, so it needs 4 MB / 16 = 250K interrupts per second. Each interrupt takes 500 clock cycles, for a total of 250K × 500 = 125,000K cycles per second.

  Fraction of processor time consumed = 125,000K / 500M = 25%
  Since the hard disk is only transferring data 5% of the time, the fraction of processor time consumed = 25% × 5% = 1.25%
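The interrupt-driven case differs from polling only in that the processor pays the overhead while the disk is actually transferring. A minimal sketch, with illustrative function and parameter names and decimal K/M as in the solution:

```python
def interrupt_fraction(data_rate_bytes_per_s, bytes_per_transfer,
                       cycles_per_interrupt, clock_hz, active_fraction):
    """Fraction of processor cycles spent handling I/O interrupts."""
    interrupts_per_s = data_rate_bytes_per_s / bytes_per_transfer
    busy = interrupts_per_s * cycles_per_interrupt / clock_hz
    return busy * active_fraction  # overhead is paid only while transferring


frac = interrupt_fraction(data_rate_bytes_per_s=4e6, bytes_per_transfer=16,
                          cycles_per_interrupt=500, clock_hz=500e6,
                          active_fraction=0.05)
print(f"{frac:.2%}")
```

With polling the same disk costs 20% of the processor whether or not it is transferring, which is the point of the comparison.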

Example:

• Overhead of I/O using DMA
– Suppose we have the same processor and hard disk we used in the previous example. Assume that the initial setup of a DMA transfer takes 1000 clock cycles for the processor, and assume the handling of the interrupt at DMA completion requires 500 clock cycles for the processor. The hard disk has a transfer rate of 4 MB/sec and uses DMA. If the average transfer from the disk is 8 KB, what fraction of the 500 MHz processor is consumed if the disk is actively transferring 100% of the time?

– Solution: The DMA-driven hard disk transfers at 4 MB/sec and each transfer averages 8 KB, so each transfer takes 8 KB / (4 MB/sec) = 0.002 seconds. If the disk is transferring continuously, the processor spends (1000 + 500) / 0.002 = 750,000 clock cycles per second on it.
  Fraction of processor time consumed = 750,000 / 500M = 0.15%
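The DMA figure follows from the per-transfer setup and completion costs. This sketch uses illustrative names and decimal KB/MB, as the solution does.

```python
def dma_fraction(transfer_bytes, data_rate_bytes_per_s,
                 setup_cycles, completion_cycles, clock_hz):
    """Fraction of processor cycles spent on DMA setup and completion."""
    seconds_per_transfer = transfer_bytes / data_rate_bytes_per_s
    cycles_per_second = (setup_cycles + completion_cycles) / seconds_per_transfer
    return cycles_per_second / clock_hz


frac_dma = dma_fraction(transfer_bytes=8e3, data_rate_bytes_per_s=4e6,
                        setup_cycles=1000, completion_cycles=500,
                        clock_hz=500e6)
print(f"{frac_dma:.2%}")
```

The processor is involved only twice per 8 KB transfer rather than once per 16 bytes, so the overhead drops by more than two orders of magnitude compared with interrupt-driven I/O at full load.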

I/O System Design

• Consider the following computer system:
– A CPU that sustains 3 billion instructions per second and averages 100,000 instructions in the operating system per I/O operation.
– A memory backplane bus capable of sustaining a transfer rate of 1000 MB/sec.
– SCSI Ultra320 controllers with a transfer rate of 320 MB/sec and accommodating up to 7 disks.
– Disk drives with a read/write bandwidth of 75 MB/sec and an average seek plus rotational latency of 6 ms.

• If the workload consists of 64 KB reads (where the block is sequential on a track) and the user program needs 200,000 instructions per I/O operation, find the maximum sustainable I/O rate and the number of disks and SCSI controllers required. Assume that the reads can always be done on an idle disk if one exists (i.e., ignore disk conflicts).

I/O System Design

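• The two fixed components of the system are the memory bus and the CPU. First compute the I/O rate that each can sustain and determine which one is the bottleneck.
– Each I/O takes 200,000 user instructions and 100,000 OS instructions, so
  maximum I/O rate of CPU = Instruction execution rate / Instructions per I/O = (3 × 10^9) / ((200 + 100) × 10^3) = 10,000 I/Os/second
– Each I/O transfers 64 KB, so
  maximum I/O rate of bus = Bus bandwidth / Bytes per I/O = (1000 × 10^6) / (64 × 10^3) = 15,625 I/Os/second
– Because 10,000 I/Os/second < 15,625 I/Os/second, the CPU is the bottleneck.
– Next, determine how many disks are needed to sustain 10,000 I/Os per second.
  • First compute the time each I/O spends at the disk: time per I/O at disk = seek + rotational time + transfer time = 6 ms + 64 KB / (75 MB/sec) = 6.9 ms
  • Each disk can therefore complete 1000 ms / 6.9 ms = 146 I/Os per second.
  • To match the CPU's 10,000 I/Os per second, 10,000 / 146 ≈ 69 disks are needed.
– To determine the number of SCSI buses, check the average transfer rate of each disk.
  • Transfer rate = transfer size / transfer time = 64 KB / 6.9 ms = 9.56 MB/sec
  • A SCSI bus holds at most 7 disks, so its load is at most 7 × 9.56 MB/sec ≈ 67 MB/sec, well below 320 MB/sec, and the bus will not be saturated. With 7 disks per controller, 69 disks require 69 / 7, rounded up, = 10 SCSI controllers.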
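The sizing steps above can be collected into one function. This is a sketch under the example's assumptions, with decimal KB and MB (64 KB = 64 × 10^3 bytes) and no disk conflicts; the function name, parameter names, and the returned dictionary keys are illustrative.

```python
import math


def size_io_system(cpu_ips, instr_per_io, bus_bytes_per_s, io_bytes,
                   disk_bytes_per_s, disk_latency_ms, disks_per_controller):
    """Find the bottleneck I/O rate and the disks and controllers needed."""
    cpu_rate = cpu_ips / instr_per_io                 # I/Os per second the CPU sustains
    bus_rate = bus_bytes_per_s / io_bytes             # I/Os per second the bus sustains
    io_rate = min(cpu_rate, bus_rate)                 # the slower one is the bottleneck
    disk_ms = disk_latency_ms + io_bytes / disk_bytes_per_s * 1000.0
    ios_per_disk = 1000.0 / disk_ms
    disks = math.ceil(io_rate / ios_per_disk)
    controllers = math.ceil(disks / disks_per_controller)
    return {"cpu_rate": cpu_rate, "bus_rate": bus_rate, "io_rate": io_rate,
            "disk_ms": disk_ms, "disks": disks, "controllers": controllers}


result = size_io_system(cpu_ips=3e9, instr_per_io=200_000 + 100_000,
                        bus_bytes_per_s=1000e6, io_bytes=64e3,
                        disk_bytes_per_s=75e6, disk_latency_ms=6.0,
                        disks_per_controller=7)
print(result)
```

The script keeps the unrounded 6.85 ms per disk I/O, which is why it agrees with the slide's 146 I/Os per second per disk and 69 disks.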