198:211 Computer Architecture
Topics: System I/O Buses
Disk-based storage in computers
  Memory/storage hierarchy
    Combining many technologies to balance costs/benefits
    Recall the memory hierarchy and virtual memory lectures
Memory/storage hierarchies
  Balancing performance with cost
    Small memories are fast but expensive
    Large memories are slow but cheap
  Exploit locality to get the best of both worlds
    locality = re-use/nearness of accesses
    allows most accesses to use small, fast memory
  [Figure axes: capacity, performance]
An Example Memory Hierarchy
  L0: registers. CPU registers hold words retrieved from L1 cache.
  L1: on-chip L1 cache (SRAM). L1 cache holds cache lines retrieved from the L2 cache memory.
  L2: off-chip L2 cache (SRAM). L2 cache holds cache lines retrieved from main memory.
  L3: main memory (DRAM). Main memory holds disk blocks retrieved from local disks.
  L4: local secondary storage (local disks). Local disks hold files retrieved from disks on remote network servers.
  L5: remote secondary storage (tapes, distributed file systems, Web servers).
  Toward L0: smaller, faster, and costlier (per byte) storage devices.
  Toward L5: larger, slower, and cheaper (per byte) storage devices.
Persistence
  Storing data for lengthy periods of time
  DRAM/SRAM is “volatile”: contents lost if power is lost
  Disks are “non-volatile”: contents survive power outages
  Disks are block-access devices (read/write whole blocks)
  Conventional magnetic disks
  Newer: solid state disks
What’s Inside A Disk Drive?
  [Figure labels: spindle, arm, actuator, platters, electronics, SCSI connector]
  Image courtesy of Seagate Technology
Disk Electronics
  Control processor
  Cache memory
  Control ASIC
  Connect to disk
  Connect to motor
  Just like a small computer – processor, memory, network interface
Disk “Geometry”
  Disks contain platters, each with two surfaces
  Each surface organized in concentric rings called tracks
  Each track consists of sectors separated by gaps
  [Figure labels: spindle, surface, tracks, track k, sectors, gaps]
Disk Geometry (Multiple-Platter View)
  Aligned tracks form a cylinder
  [Figure: platters 0-2 on a common spindle, surfaces 0-5, cylinder k]
Disk Structure
  [Figure labels: read/write head, upper surface, platter, lower surface, cylinder, track, sector, arm, actuator]
Disk Operation (Single-Platter View)
  The disk surface spins at a fixed rotational rate
  The read/write head is attached to the end of the arm and flies over the disk surface on a thin cushion of air
  By moving radially, the arm can position the read/write head over any track
Disk Operation (Multi-Platter View)
  Read/write heads move in unison from cylinder to cylinder
  [Figure labels: arm, spindle]
Disk Structure - top view of single platter
  Surface organized into tracks
  Tracks divided into sectors
Disk Access
  Head in position above a track
  Rotation is counter-clockwise
Disk Access – Read
  About to read blue sector
  After reading blue sector, red request scheduled next
Disk Access – Seek
  Seek to red’s track
Disk Access – Rotational Latency
  Wait for red sector to rotate around
Disk Access – Read
  Complete read of red
Disk Access – Service Time Components
  Seek, rotational latency, data transfer
Disk Access Time
  Average time to access a specific sector approximated by:
    Taccess = Tavg seek + Tavg rotation + Tavg transfer
  Seek time (Tavg seek)
    Time to position heads over cylinder containing target sector
    Typical Tavg seek = 3-5 ms
  Rotational latency (Tavg rotation)
    Time waiting for first bit of target sector to pass under r/w head
    Tavg rotation = 1/2 x (1/RPM) x (60 sec/1 min)
    e.g., 3 ms for 10,000 RPM disk
  Transfer time (Tavg transfer)
    Time to read the bits in the target sector
    Tavg transfer = (1/RPM) x 1/(avg # sectors/track) x (60 sec/1 min)
    e.g., 0.006 ms for 10,000 RPM disk with 1,000 sectors/track
    given 512-byte sectors, ~85 MB/s data transfer rate
Disk Access Time Example
  Given:
    Rotational rate = 7,200 RPM
    Average seek time = 5 ms
    Avg # sectors/track = 1000
  Derived average time to access random sector:
    Tavg rotation = 1/2 x (60 secs/7200 RPM) x 1000 ms/sec ≈ 4 ms
    Tavg transfer = (60 secs/7200 RPM) x 1/(1000 sectors/track) x 1000 ms/sec ≈ 0.008 ms
    Taccess = 5 ms + 4 ms + 0.008 ms = 9.008 ms
  Time to read sector: 0.008 ms
  Important points:
    Access time dominated by seek time and rotational latency
    First bit in a sector is the most expensive, the rest are free
    SRAM access time is about 4 ns/doubleword, DRAM about 60 ns
    ~100,000 times longer to access a word on disk than in DRAM
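The access-time formula can be checked with a few lines of Python. This is a minimal sketch: the function name `disk_access_time_ms` and its parameter names are illustrative and do not come from the slides. It keeps the unrounded rotational latency, so the total comes out near 9.18 ms rather than the 9.008 ms obtained when the latency is rounded to 4 ms.

```python
def disk_access_time_ms(rpm, avg_seek_ms, sectors_per_track):
    """Return (rotation_ms, transfer_ms, access_ms) for one random sector."""
    ms_per_rev = 60_000.0 / rpm                    # time for one full revolution
    rotation_ms = 0.5 * ms_per_rev                 # on average, wait half a revolution
    transfer_ms = ms_per_rev / sectors_per_track   # one sector passes under the head
    return rotation_ms, transfer_ms, avg_seek_ms + rotation_ms + transfer_ms


rotation, transfer, access = disk_access_time_ms(7200, 5.0, 1000)
print(f"rotation={rotation:.3f} ms transfer={transfer:.4f} ms access={access:.3f} ms")
```

Seek time plus rotational latency make up almost all of the total, which is the first of the important points listed above.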
Disk Scheduling
The operating system is responsible for using hardware efficiently. For the disk drives, this means having a fast access time and high disk bandwidth.
  Access time has two major components
    Seek time is the time for the disk arm to move the heads to the cylinder containing the desired sector.
    Rotational latency is the additional time waiting for the disk to rotate the desired sector to the disk head.
  Minimize seek time
  Seek time ≈ seek distance
  Disk bandwidth is the total number of bytes transferred, divided by the total time between the first request for service and the completion of the last transfer.
Disk Scheduling
Several algorithms exist to schedule the servicing of disk I/O requests.
We illustrate them with a request queue (cylinders 0-199):
  98, 183, 37, 122, 14, 124, 65, 67
  Head pointer 53
FCFS
  Illustration shows total head movement of 640 cylinders.
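The FCFS total can be reproduced by summing the distance between consecutive requests in arrival order. A minimal sketch follows; the function name `fcfs_head_movement` is an illustrative choice, not something defined in the slides.

```python
def fcfs_head_movement(start, requests):
    """Total cylinders travelled when requests are served in arrival order."""
    total, pos = 0, start
    for cyl in requests:
        total += abs(cyl - pos)   # distance from the current head position
        pos = cyl
    return total


queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(fcfs_head_movement(53, queue))  # 640
```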
SSTF
Selects the request with the minimum seek time from the current head position.
SSTF scheduling is a form of SJF scheduling; may cause starvation of some requests.
Illustration shows total head movement of 236 cylinders.
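A minimal sketch of SSTF on the same queue (98, 183, 37, 122, 14, 124, 65, 67 with the head at 53). The function names are illustrative. Ties between equally distant requests are broken by queue order here, which is an assumption, since the slides do not specify a tie-break rule.

```python
def sstf_order(start, requests):
    """Service order when the nearest pending request is always chosen next."""
    pending = list(requests)
    pos, order = start, []
    while pending:
        nearest = min(pending, key=lambda cyl: abs(cyl - pos))
        pending.remove(nearest)
        order.append(nearest)
        pos = nearest
    return order


def head_movement(start, order):
    """Total cylinders travelled when visiting `order` from `start`."""
    total, pos = 0, start
    for cyl in order:
        total += abs(cyl - pos)
        pos = cyl
    return total


order = sstf_order(53, [98, 183, 37, 122, 14, 124, 65, 67])
print(order)                     # [65, 67, 37, 14, 98, 122, 124, 183]
print(head_movement(53, order))  # 236
```

Request 183 is served last even though it arrived second, which illustrates how a steady stream of nearby requests could starve a distant one.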
SCAN
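The disk arm starts at one end of the disk and moves toward the other end, servicing requests until it gets to the other end of the disk, where the head movement is reversed and servicing continues.
Sometimes called the elevator algorithm.
Illustration shows total head movement of 208 cylinders. (Assuming the head first moves toward cylinder 0, that figure corresponds to the arm reversing at the last request, cylinder 14; a sweep that travels all the way to cylinder 0 before reversing covers 53 + 183 = 236 cylinders.)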
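A minimal sketch of SCAN on the example queue, assuming the head at cylinder 53 first sweeps toward cylinder 0 (the slides do not state the initial direction). The function names and the `toward_zero` flag are illustrative. With a full sweep to cylinder 0 the arm travels 236 cylinders; if it reverses at the last request (cylinder 14) instead of the disk edge, the total is 208.

```python
def scan_stops(start, requests, max_cyl=199, toward_zero=True):
    """Cylinders visited by SCAN, including the disk edge where the arm reverses."""
    lower = sorted((c for c in requests if c < start), reverse=True)
    upper = sorted(c for c in requests if c >= start)
    if toward_zero:
        return lower + [0] + upper
    return upper + [max_cyl] + lower


def head_movement(start, stops):
    """Total cylinders travelled when visiting `stops` from `start`."""
    total, pos = 0, start
    for cyl in stops:
        total += abs(cyl - pos)
        pos = cyl
    return total


queue = [98, 183, 37, 122, 14, 124, 65, 67]
stops = scan_stops(53, queue)
print(stops)                     # [37, 14, 0, 65, 67, 98, 122, 124, 183]
print(head_movement(53, stops))  # 236

# Reversing at the last request instead of cylinder 0 (LOOK behaviour):
look_stops = [c for c in stops if c != 0]
print(head_movement(53, look_stops))  # 208
```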
C-SCAN
Provides a more uniform wait time than SCAN.
The head moves from one end of the disk to the other, servicing requests as it goes. When it reaches the other end, however, it immediately returns to the beginning of the disk, without servicing any requests on the return trip.
Treats the cylinders as a circular list that wraps around from the last cylinder to the first one.
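A minimal sketch of C-SCAN on the example queue (98, 183, 37, 122, 14, 124, 65, 67, head at 53), assuming the head sweeps toward the highest cylinder, 199, and that the return trip to cylinder 0 is counted as head movement. Both are assumptions; the slides give no total for C-SCAN. The function names are illustrative.

```python
def c_scan_stops(start, requests, max_cyl=199):
    """Cylinders visited by C-SCAN: sweep up, jump back to 0, sweep up again."""
    upper = sorted(c for c in requests if c >= start)
    lower = sorted(c for c in requests if c < start)
    # max_cyl and 0 are the disk edges touched by the sweep and the return trip.
    return upper + [max_cyl, 0] + lower


def head_movement(start, stops):
    """Total cylinders travelled when visiting `stops` from `start`."""
    total, pos = 0, start
    for cyl in stops:
        total += abs(cyl - pos)
        pos = cyl
    return total


stops = c_scan_stops(53, [98, 183, 37, 122, 14, 124, 65, 67])
print(stops)                     # [65, 67, 98, 122, 124, 183, 199, 0, 14, 37]
print(head_movement(53, stops))  # 382
```

Requests on both sides of the head are served in the same direction, which is the source of the more uniform wait time.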
C-LOOK
  Version of C-SCAN
  Arm only goes as far as the last request in each direction, then reverses direction immediately, without first going all the way to the end of the disk.
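A minimal sketch of C-LOOK on the example queue (98, 183, 37, 122, 14, 124, 65, 67, head at 53), assuming the head sweeps toward higher cylinders and that the jump back to the lowest pending request is counted as head movement. The function names are illustrative and the slides give no total for this case.

```python
def c_look_order(start, requests):
    """Service order for C-LOOK: sweep up to the last request, then wrap around."""
    upper = sorted(c for c in requests if c >= start)
    lower = sorted(c for c in requests if c < start)
    return upper + lower


def head_movement(start, stops):
    """Total cylinders travelled when visiting `stops` from `start`."""
    total, pos = 0, start
    for cyl in stops:
        total += abs(cyl - pos)
        pos = cyl
    return total


order = c_look_order(53, [98, 183, 37, 122, 14, 124, 65, 67])
print(order)                     # [65, 67, 98, 122, 124, 183, 14, 37]
print(head_movement(53, order))  # 322
```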
Selecting a Disk-Scheduling Algorithm
  SSTF is common and has a natural appeal
  SCAN and C-SCAN perform better for systems that place a heavy load on the disk.
  Performance depends on the number and types of requests.
  Requests for disk service can be influenced by the file-allocation method.
  The disk-scheduling algorithm should be written as a separate module of the operating system, allowing it to be replaced with a different algorithm if necessary.
  Either SSTF or LOOK is a reasonable choice for the default algorithm.
Solid state disks
  An array of flash memory devices
  Emulates conventional hard disk drive or HDD
  No moving parts
  Consumes less power than HDD
  Small reads (< 4K) are 20x faster
  Average reads comparable to HDD reads
  Writes are still slow: ½x slower than HDD
  Cost/capacity (today): $0.15/GB for HDD, $2-3/GB for SSD
Solid State Drive
  The interface to the system looks like HDD
  Read, write and erase
  Memory consists of blocks
  Each block contains several pages
  Each page is 2K or 4K in size
  Unit of read/write is the page
  Need to erase before write
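The erase-before-write rule can be illustrated with a toy model of one flash block. This is a sketch under stated assumptions: the class `FlashBlock`, its methods, and the 4-page block size are hypothetical, and the model assumes erasure happens a whole block at a time, as is usual for NAND flash. A real drive hides all of this behind its HDD-like interface.

```python
class FlashBlock:
    """Toy model of one flash block whose pages are erased together."""

    def __init__(self, pages_per_block=4):
        # None marks an erased (writable) page.
        self.pages = [None] * pages_per_block

    def read(self, page):
        return self.pages[page]

    def write(self, page, data):
        # A page that already holds data cannot be overwritten in place.
        if self.pages[page] is not None:
            raise ValueError("page must be erased before it is written again")
        self.pages[page] = data

    def erase(self):
        # Erase clears every page in the block at once.
        self.pages = [None] * len(self.pages)

    def rewrite(self, page, data):
        # Updating one page: save the live pages, erase, write everything back.
        saved = list(self.pages)
        saved[page] = data
        self.erase()
        for index, contents in enumerate(saved):
            if contents is not None:
                self.write(index, contents)


block = FlashBlock()
block.write(0, b"old")
block.write(1, b"keep")
block.rewrite(0, b"new")
print(block.read(0), block.read(1))  # b'new' b'keep'
```

Updating a single page forces every live page in the block to be copied and rewritten, which is one reason writes remain slow.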
I/O or input and output
  In addition to memory, data transfer needs to occur between the CPU and input/output devices
  When accessing memory, a byte or several bytes can be transferred between memory and a register using mov address, %eax or mov %eax, address
  I/O devices also are sources or destinations for bytes of data
  I/O devices can be viewed just as memory
  I/O devices can be viewed as separate from memory
I/O programming
  There are two ways of addressing I/O devices
  Memory mapped I/O
    The address space is divided between memory and I/O devices
    Higher order addresses can refer to device
    Lower order addresses can refer to memory
    mov address, %eax will fetch data from I/O or memory based on the address
    E.g., memory range from 0000 to BFFF, I/O range from C000 to CFFF
    Device or memory selection based on address range
    Different devices can have different addresses in the I/O range
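Address-range selection can be sketched as a small decoder. The ranges 0000-BFFF for memory and C000-CFFF for I/O come from the example above; the function name `decode` and the "unmapped" result for addresses outside both ranges are assumptions made for illustration.

```python
MEMORY_RANGE = range(0x0000, 0xC000)   # 0000-BFFF selects main memory
IO_RANGE = range(0xC000, 0xD000)       # C000-CFFF selects an I/O device


def decode(address):
    """Return which target a memory-mapped address selects."""
    if address in MEMORY_RANGE:
        return "memory"
    if address in IO_RANGE:
        return "io"
    return "unmapped"   # assumption: addresses outside both ranges select nothing


for addr in (0x0000, 0xBFFF, 0xC000, 0xCFFF, 0xD000):
    print(f"{addr:04X} -> {decode(addr)}")
```

The same load or store reaches memory or a device depending only on the address it carries.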
Memory mapped I/O
  Sending or receiving data to/from an I/O device is a memory transfer instruction (mov) with the right address
  Main memory not selected when address is in I/O range
  Advantage
    Uniformity of programming, same mov works for I/O and memory
  Disadvantage
    Memory address space is reduced
  [Figure: CPU connected over the memory-I/O bus to main memory, keyboard, disk, display, and network]
I/O mapped I/O
  Memory and I/O devices use distinct address spaces
  Isolated I/O
  Two separate instructions to address I/O devices
  A separate code or control signal based on the op code will select memory or I/O
    IN for input
    OUT for output
    mov for memory access
  Less flexible for programming
  [Figure: two address spaces, MEM from 0000 to FFFF and I/O from 0000 to 00FF]
Interfacing with I/O
  Many devices, with varying speeds and complexity
  CPU/bus shared among all peripherals and memory
  CPU should be able to select a device and transfer data to the device
  Interpretation of data left to each device
  Unlike memory, a device needs to be ready before a transfer is initiated
  All of this is handled by the I/O module
I/O module
  CPU selects device by means of address
  Data corresponds to instructions for device
  Each device has its own set of commands
  Status of device can be checked by reading status registers
  [Figure: I/O module with data and status registers and I/O logic; address, control, and data lines on the CPU side; links to the devices on the other side]
Data transfer schemes
  There are two schemes
  Programmed data transfers
    CPU transfers data from I/O devices onto registers
    Useful for small data transfers
  Direct memory access or DMA
    Device or I/O module directly transfers data to memory
    Useful for large block transfers
Programmed I/O
  Programmed I/O can be further classified as
    Synchronous transfer
    Asynchronous transfer
    Interrupt driven transfer
  All of the above can be used to interface with different I/O devices
  Require special hardware features in the CPU
Synchronous transfer
  Simplest among three
  CPU and I/O speed match
  Transfer a byte, word, or double word
  Output device
    Memory mapped: mov %eax, 2 (address of device port is 2)
    I/O mapped: mov $2, %dx then out %eax, %dx
  Similarly for input device
    Memory mapped: mov 3, %eax
    I/O mapped: mov $3, %dx then in %dx, %eax
Asynchronous transfer
  I/O devices slower
  Instruct device to be ready
  Wait until device ready
  Device has status flag/register
  Busy waiting
  Waste of CPU resources
  [Flowchart: request device to get ready; check READY; if no, check again; if yes, issue data transfer command]
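The busy-waiting loop can be sketched with a simulated device. Everything named here is hypothetical: `SlowDevice`, its `status_ready` and `read_data` methods, and the choice to model device delay as a fixed number of failed status checks. The point is only to show where the CPU time goes.

```python
class SlowDevice:
    """Simulated device that becomes ready after a fixed number of status checks."""

    def __init__(self, polls_until_ready, data=0x41):
        self._remaining = polls_until_ready
        self._data = data

    def status_ready(self):
        # Models reading the device's status flag/register.
        if self._remaining > 0:
            self._remaining -= 1
            return False
        return True

    def read_data(self):
        return self._data


def polled_read(device):
    """Busy-wait on the status flag, then transfer one word."""
    wasted_polls = 0
    while not device.status_ready():
        wasted_polls += 1   # the CPU does no useful work in this loop
    return device.read_data(), wasted_polls


value, wasted = polled_read(SlowDevice(polls_until_ready=3))
print(hex(value), wasted)  # 0x41 3
```

Each wasted poll stands for CPU time that interrupt-driven I/O would spend on other instructions.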
Interrupt driven I/O
  Processor need not wait for slow device
  Processor continues with other instructions
  Device interrupts processor when ready
  Interrupt Service Routine
    CPU transfers word from device to register
    CPU writes word from register to memory
  [Flowchart: request device; fetch next instruction; execute instruction; if INT is high, call the interrupt service routine; otherwise fetch the next instruction]
DMA or direct memory access
  Bulk data transfers
  Direct device to memory transfer
  CPU and DMA unit contend for the memory bus
  During DMA
    Either CPU is in hold state
    Or cycle stealing: CPU and DMA accesses are interleaved
  [Flowchart: request DMA device; send R/W command, starting address, #bytes; fetch next instruction; execute instruction; if INT is high, handle the DMA interrupt; otherwise fetch the next instruction]
System bus
  A bus is a shared communication link
  Contains address bus, data bus
  Each bus is a set of wires
  Bus can transfer several bits between devices connected by bus
  Bus width determines the number of bits transferred in a cycle
  [Figure: CPU connected over the memory-I/O bus to main memory, keyboard, disk, display, and network]
Characteristics of bus
  Several devices can be connected
  Single bus for all devices – cost sharing
  Devices can be added/removed without affecting others
  I/O devices can be connected to other devices following the same bus standard
  Disadvantages:
    Bus contention
    Speed of I/O devices determined by bus speed
    Bus speed determined by number of devices
    Slower devices impact others
Bus architecture
  Any interaction consists of two steps
    1. Issue command
    2. Transfer data
  Master
    Initiates
    Issues command, starting address, #bytes
  Slave
    Responds
    Sends or receives data as per command from master
  [Figure: bus master and bus slave; master issues command, data can go either way]
Computer buses
  Modern computers have several I/O devices
  Varying speeds
  A simple linear bus will not suffice
  Modern computers have hierarchical buses
  Bus is split into different segments
    CPU-memory: one bus
    CPU-I/O devices: another bus
    CPU-cache: another bus
Backplane bus
  Single bus for memory and I/O
  Cheap
  Slow, and bus becomes bottleneck
  [Figure: system bus shared by memory and I/O]
Two-bus systems
  Processor-memory traffic on one bus
  I/O devices connected by a bridge
  Bridge can connect to different kinds of buses
  Traffic is isolated
  I/O buses can provide expansion slots for devices
  [Figure: processor-memory bus connected through a bus bridge to an I/O bus]
Hierarchical-bus systems
  A single bus bridge connects to the processor-memory bus
  Other I/O buses connected to this bus bridge (tree)
  CPU-memory sees little contention
  Costly
  [Figure: processor-memory bus, bus bridge with two I/O buses, L2 cache on a backside cache bus]
Examples of buses
  ISA bus – Industry Standard Architecture bus
    Old technology
    8 MHz, < 1 byte transfer/cycle, bus B/W 5.3 MB/sec (1 MB = 1048576 B)
  EISA bus – Extended ISA
    Old technology
    8 MHz, 4 byte transfer, bus B/W 32 MB/sec
  PCI bus – Peripheral Component Interconnect
    Speeds up to 132 MB/s
    Bus speed of 33 MHz, 4 bytes/transfer
    PCI popularized Plug and Play
Examples of buses
  PCI-X – extended PCI
    133 MHz, 8 bytes/transfer, 1064 MB/sec or about 1 GB/sec
    Used to connect gigabit Ethernet, high speed disks
  SCSI (Small Computer System Interface)
    Capable of handling internal/external peripherals
    Speed anywhere from 80 – 640 Mb/s
    Many types of SCSI: Fast SCSI, Ultra SCSI, Ultra wide SCSI
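The peak bandwidth figures quoted for the parallel buses follow from clock rate times bytes per transfer. A minimal sketch, assuming one transfer per clock cycle and 1 MB = 10^6 bytes; the function name `peak_bandwidth_mb_per_s` is illustrative.

```python
def peak_bandwidth_mb_per_s(clock_mhz, bytes_per_transfer):
    """Peak bandwidth, assuming one transfer per clock cycle (1 MB = 10**6 bytes)."""
    # MHz x bytes/transfer = millions of bytes per second
    return clock_mhz * bytes_per_transfer


for name, clock_mhz, width in [("EISA", 8, 4), ("PCI", 33, 4), ("PCI-X", 133, 8)]:
    print(name, peak_bandwidth_mb_per_s(clock_mhz, width), "MB/s")
```

The ISA figure does not fit this simple product because ISA moves less than one byte per cycle on average.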
Parallel vs serial (point-to-point) bus
  Parallel bus
    Bus shared among devices
    Bus arbitration is slow
    Examples: PCI, SCSI
  Serial I/O
    Point-to-point links connected directly to CPU
    Requires lots of additional high speed hardware
    Examples: SATA, USB, FireWire
  [Figure: processor-memory bus, bus bridge, I/O bus, and point-to-point links from the CPU to individual I/O devices]
USB 1.0
  Plug-and-play
  Full speed USB devices signal at 12 Mb/s
  Low speed devices use a 1.5 Mb/s subchannel
  Up to 127 devices chained together
  USB 2.0: data rate of 480 megabits per second
FireWire (Apple)
  High speed serial port
  400 Mb/s transfer rate
  About 30 times faster than USB 1.0
  Plug-and-play
Intel Bus
North bridge and South bridge bus
http://www.testbench.in/pcie_sys_2.PNG
http://www.yourdictionary.com/images/computer/CHIPSET.GIF