Operating Systems I/0 devices
Nipun Batra
Motivation
What good is a computer without any I/O devices? - keyboard, display, disks
We want: - H/W that will let us plug in different devices - OS that can interact with different combinations
Largely a communication problem…
2
3
CPU RAM
Memory Bus
System Architecture
4
CPU RAM
Graphics
Memory BusGeneral I/O Bus (e.g., PCI)
System Architecture
5
CPU RAM
Graphics
Memory BusGeneral I/O Bus (e.g., PCI)
Peripheral I/O Bus (e.g., SCSI, SATA, USB)
Why use hierarchical buses?
System Architecture
Canonical Device
6
Status COMMAND DATADevice Registers:
Canonical Device
7
Status COMMAND DATA
OS reads/writes to these
Device Registers:
Canonical Device
8
Status COMMAND DATA
OS reads/writes to these
Hidden Internals: ???
Device Registers:
Canonical Device
9
Status COMMAND DATADevice Registers:
OS reads/writes to these
Hidden Internals:Microcontroller (CPU+RAM) Extra RAM Other special-purpose chips
Example Protocol
while (STATUS == BUSY) ; // spin Write data to DATA register Write command to COMMAND register while (STATUS == BUSY) ; // spin
10
while (STATUS == BUSY) // 1 ; Write data to DATA register // 2 Write command to COMMAND register // 3 while (STATUS == BUSY) // 4 ;
11
CPU:
Disk:
Example
while (STATUS == BUSY) // 1 ; Write data to DATA register // 2 Write command to COMMAND register // 3 while (STATUS == BUSY) // 4 ;
12
A
C
CPU:
Disk:
Example
while (STATUS == BUSY) // 1 ; Write data to DATA register // 2 Write command to COMMAND register // 3 while (STATUS == BUSY) // 4 ;
13
A
C
A wants to do I/OCPU:
Disk:
Example
1
while (STATUS == BUSY) // 1 ; Write data to DATA register // 2 Write command to COMMAND register // 3 while (STATUS == BUSY) // 4 ;
14
A
C
CPU:
Disk:
Example
1
while (STATUS == BUSY) // 1 ; Write data to DATA register // 2 Write command to COMMAND register // 3 while (STATUS == BUSY) // 4 ;
15
2
A
AC
CPU:
Disk:
Example
1
while (STATUS == BUSY) // 1 ; Write data to DATA register // 2 Write command to COMMAND register // 3 while (STATUS == BUSY) // 4 ;
16
2
A
AC
3
CPU:
Disk:
Example
1
while (STATUS == BUSY) // 1 ; Write data to DATA register // 2 Write command to COMMAND register // 3 while (STATUS == BUSY) // 4 ;
17
2 43
A
C A
CPU:
Disk:
Example
1
while (STATUS == BUSY) // 1 ; Write data to DATA register // 2 Write command to COMMAND register // 3 while (STATUS == BUSY) // 4 ;
18
2 43
A
C A
CPU:
Disk:
Example
1
while (STATUS == BUSY) // 1 ; Write data to DATA register // 2 Write command to COMMAND register // 3 while (STATUS == BUSY) // 4 ;
19
2 43
A B
C A
CPU:
Disk:
Example
1
while (STATUS == BUSY) // 1 ; Write data to DATA register // 2 Write command to COMMAND register // 3 while (STATUS == BUSY) // 4 ;
20
2 43
A B
C A
CPU:
Disk:
Example
How to avoid spinning?
1
while (STATUS == BUSY) // 1 ; Write data to DATA register // 2 Write command to COMMAND register // 3 while (STATUS == BUSY) // 4 ;
21
2 43
A B
C A
CPU:
Disk:
Example
How to avoid spinning? Interrupts!
1 2 43
A B
C A
how to avoid spinning? interrupts!
while (STATUS == BUSY) // 1 wait for interrupt; Write data to DATA register // 2 Write command to COMMAND register // 3 while (STATUS == BUSY) // 4 wait for interrupt;
22
CPU:
Disk:
Example
while (STATUS == BUSY) // 1 wait for interrupt; Write data to DATA register // 2 Write command to COMMAND register // 3 while (STATUS == BUSY) // 4 wait for interrupt;
23
23,4
A B
C A
how to avoid spinning? interrupts!
B B AA
1
CPU:
Disk:
Example
Interrupts vs. Polling
Discuss: are interrupts ever worse?
24
Interrupts vs. Polling
Discuss: are interrupts ever worse?
Interrupts can sometimes lead to livelock - e.g., flood of network packets
25
Interrupts vs. Polling
26
Interrupts vs. Polling
• Discuss: are interrupts ever worse?
26
Interrupts vs. Polling
• Discuss: are interrupts ever worse?• Interrupts can sometimes lead to livelock
26
Interrupts vs. Polling
• Discuss: are interrupts ever worse?• Interrupts can sometimes lead to livelock
• e.g., flood of network packets
26
Interrupts vs. Polling
• Discuss: are interrupts ever worse?• Interrupts can sometimes lead to livelock
• e.g., flood of network packets• Techniques:
26
Interrupts vs. Polling
• Discuss: are interrupts ever worse?• Interrupts can sometimes lead to livelock
• e.g., flood of network packets• Techniques:
• hybrid approach
26
Interrupts vs. Polling
• Discuss: are interrupts ever worse?• Interrupts can sometimes lead to livelock
• e.g., flood of network packets• Techniques:
• hybrid approach• Poll for a while, then wait for interrupts
26
Interrupts vs. Polling
• Discuss: are interrupts ever worse?• Interrupts can sometimes lead to livelock
• e.g., flood of network packets• Techniques:
• hybrid approach• Poll for a while, then wait for interrupts
• interrupt coalescing
26
Interrupts vs. Polling
• Discuss: are interrupts ever worse?• Interrupts can sometimes lead to livelock
• e.g., flood of network packets• Techniques:
• hybrid approach• Poll for a while, then wait for interrupts
• interrupt coalescing• Coalesce or combine the delivery of multiple
interrupts
26
Protocol Variants
Status checks: polling vs. interrupts
Data: PIO vs. DMA
Control: special instructions vs. memory-mapped I/O
27
while (STATUS == BUSY) // 1 wait for interrupt; Write data to DATA register // 2 Write command to COMMAND register // 3 while (STATUS == BUSY) // 4 wait for interrupt;
28
23,4
A BCPU:
Disk: C A
B B AA
1
while (STATUS == BUSY) // 1 wait for interrupt; Write data to DATA register // 2 Write command to COMMAND register // 3 while (STATUS == BUSY) // 4 wait for interrupt;
29
23,4
A BCPU:
Disk: C A
B B AA
1
while (STATUS == BUSY) // 1 wait for interrupt; Write data to DATA register // 2 Write command to COMMAND register // 3 while (STATUS == BUSY) // 4 wait for interrupt;
29
23,4
A BCPU:
Disk: C A
B B AA
1
What else can we optimize?
while (STATUS == BUSY) // 1 wait for interrupt; Write data to DATA register // 2 Write command to COMMAND register // 3 while (STATUS == BUSY) // 4 wait for interrupt;
29
23,4
A BCPU:
Disk: C A
B B AA
1
What else can we optimize? Data transfer!
Programmed I/O vs. Direct Memory Access
PIO (Programmed I/O): - CPU directly tells device what data is
DMA (Direct Memory Access): - CPU leaves data in memory - DMA device does copy
30
PIO Flow
31
CPU RAM
Disk
DMA Flow
32
CPU RAM
Disk
while (STATUS == BUSY) // 1 wait for interrupt; Write data to DATA register // 2 Write command to COMMAND register // 3 while (STATUS == BUSY) // 4 wait for interrupt;
33
23,4
A BCPU:
Disk: C A
B B AA
1
while (STATUS == BUSY) // 1 wait for interrupt; initiate DMA transfer // 2a wait for interrupt // 2b Write command to COMMAND register // 3 while (STATUS == BUSY) // 4 wait for interrupt;
34
2b,3,4
A BCPU: B B A
1
DMA:
Disk: C A
A
2a
B
Protocol Variants
Status checks: polling vs. interrupts
Data: PIO vs. DMA
Control: special instructions vs. memory-mapped I/O
35
ACPU:
Disk: C A
B B A
1 3,4
while (STATUS == BUSY) // 1 wait for interrupt; Write data to DATA register // 2 Write command to COMMAND register // 3 while (STATUS == BUSY) // 4 wait for interrupt;
36
ACPU:
Disk: C A
B B A
1 3,4
while (STATUS == BUSY) // 1 wait for interrupt; Write data to DATA register // 2 Write command to COMMAND register // 3 while (STATUS == BUSY) // 4 wait for interrupt;
37
How does OS read and write registers?
Special Instructions vs. Mem-Mapped I/OSpecial instructions - each device has a port - in/out instructions (x86) communicate with device
Memory-Mapped I/O - H/W maps registers into address space - loads/stores sent to device
Doesn’t matter much (both are used).
38
Variety is a ChallengeProblem: - many, many devices - each has its own protocol
How can we avoid writing a slightly different OS for each H/W combination?
39
Solution
Encapsulation!
Write driver for each device.
Drivers are 70% of Linux source code.
40
Solution
Encapsulation!
Write driver for each device.
Drivers are 70% of Linux source code.
Encapsulation also enables us to mix-and-match devices, schedulers, and file systems.
41
Storage Stack
42
applicationfile systemscheduler
driverhard drive
Storage Stack
43
applicationfile systemscheduler
driverhard drive
build common interface on top of all HDDs
Storage Stack
44
applicationfile systemscheduler
driverhard drive
build common interface on top of all HDDs
what about special capabilities?
Hard-disk Basic Interface
45
Hard-disk Basic Interface
Disk has a sector-addressable address space
45
Hard-disk Basic Interface
Disk has a sector-addressable address space(so a disk is like an array of sectors).
45
Hard-disk Basic Interface
Disk has a sector-addressable address space(so a disk is like an array of sectors).
45
Hard-disk Basic Interface
Disk has a sector-addressable address space(so a disk is like an array of sectors).
Sectors are typically 512 bytes or 4096 bytes.
45
Hard-disk Basic Interface
Disk has a sector-addressable address space(so a disk is like an array of sectors).
Sectors are typically 512 bytes or 4096 bytes.
45
Hard-disk Basic Interface
Disk has a sector-addressable address space(so a disk is like an array of sectors).
Sectors are typically 512 bytes or 4096 bytes.
Main operations: reads + writes to sectors (blocks).
45
46
Platter
Hard-disk Basic Interface
46
Platter
Hard-disk Basic Interface
Platter is covered with a magnetic film.
47
Spindle
Hard-disk Basic Interface
48
Surface
Surface
Hard-disk Basic Interface
49
Many platters may be bound to the spindle.
Hard-disk Basic Interface
50
Hard-disk Basic Interface
51
Each surface is divided into rings called tracks. A stack of tracks (across platters) is called a cylinder.
Hard-disk Basic Interface
52
The tracks are divided into numbered sectors.
123
065 4
78
9
1011
1514
1312
16
17
18
19
23
22
21
20
Hard-disk Basic Interface
53
Heads on a moving arm can read from each surface.
123
065 4
78
9
1011
1514
1312
16
17
18
19
23
22
21
20
Hard-disk Basic Interface
54
123
065 4
78
9
1011
1514
1312
16
17
18
19
23
22
21
20
spin
Spindle/platters rapidly spin.
Hard-disk Basic Interface
Let’s Read 12!
56
123
065 4
78
9
1011
1514
1312
16
17
18
19
23
22
21
20
Seek to right track.
57
12
3
065 4
78
9
10
11
1514
1312
16
17
18
19
23
22
21
20
Seek to right track.
58
12
3
065 4
78 9
10
11
15
14
1312
1617
18
19
23
22
2120
Seek to right track.
59
12
3
06
5 4
7
8 910
11
15
1413 12
16 17
18
19
23
22
21 20
Wait for rotation.
60
1 23
06 5 4
78 9
1011
1514
13 12
16
17
1819
2322
21
20
Wait for rotation.
61
1 230
6 547
89 10
11
1514 13
12
16
17 18
19
23
22 21
20
Wait for rotation.
62
12 3
0 654
78
910 11
15 1413
12
16
17
18 19
23 22
21
20
Wait for rotation.
63
12
3
0654
78
9
10
11
1514
1312
16
17
18
19
23
22
21
20
Wait for rotation.
64
12
30
654
789
1011
1514
1312
1617
1819
2322
2120
Wait for rotation.
65
123 0
654 7
8910
11
151413
12
16
1718
19
23
2221
20
Transfer data.
66
123
065
4
7 89
10
11
15
141312
1617
18
19
23
22
2120
Transfer data.
67
123
0654
7 8
91011
151413
12
16
17
1819
2322
21
20
Transfer data.
68
123
0654
7 8
91011
151413
12
16
17
1819
2322
21
20
Yay!
69
123
065
4
78
910
11
1514
13
12
16
17
1819
23
22
21
20
Seek, Rotate, Transfer
70
Seek, Rotate, Transfer
Must accelerate, coast, decelerate, settle
70
Seek, Rotate, Transfer
Must accelerate, coast, decelerate, settle
70
Seek, Rotate, Transfer
Must accelerate, coast, decelerate, settle
Seeks often take several milliseconds!
70
Seek, Rotate, Transfer
Must accelerate, coast, decelerate, settle
Seeks often take several milliseconds!
70
Seek, Rotate, Transfer
Must accelerate, coast, decelerate, settle
Seeks often take several milliseconds!
Settling alone can take 0.5 - 2 ms.
70
Seek, Rotate, Transfer
Must accelerate, coast, decelerate, settle
Seeks often take several milliseconds!
Settling alone can take 0.5 - 2 ms.
70
Seek, Rotate, Transfer
Must accelerate, coast, decelerate, settle
Seeks often take several milliseconds!
Settling alone can take 0.5 - 2 ms.
Entire seek often takes 4 - 10 ms.
70
Seek, Rotate, Transfer
71
Seek, Rotate, Transfer
Depends on rotations per minute (RPM).
71
Seek, Rotate, Transfer
Depends on rotations per minute (RPM). - 7200 RPM is common, 15000 RPM is high end.
71
Seek, Rotate, Transfer
Depends on rotations per minute (RPM). - 7200 RPM is common, 15000 RPM is high end.
71
Seek, Rotate, Transfer
Depends on rotations per minute (RPM). - 7200 RPM is common, 15000 RPM is high end.
1 / 7200 RPM =
71
Seek, Rotate, Transfer
Depends on rotations per minute (RPM). - 7200 RPM is common, 15000 RPM is high end.
1 / 7200 RPM =1 minute / 7200 rotations =
71
Seek, Rotate, Transfer
Depends on rotations per minute (RPM). - 7200 RPM is common, 15000 RPM is high end.
1 / 7200 RPM =1 minute / 7200 rotations = 1 second / 120 rotations =
71
Seek, Rotate, Transfer
Depends on rotations per minute (RPM). - 7200 RPM is common, 15000 RPM is high end.
1 / 7200 RPM =1 minute / 7200 rotations = 1 second / 120 rotations =12 ms / rotation
71
Seek, Rotate, Transfer
Depends on rotations per minute (RPM). - 7200 RPM is common, 15000 RPM is high end.
1 / 7200 RPM = 1 minute / 7200 rotations = 1 second / 120 rotations = 12 ms / rotation
72
so it may take 6 ms on avg to rotate to target (0.5 * 12 ms)
Seek, Rotate, Transfer
73
Seek, Rotate, Transfer
Pretty fast — depends on RPM and sector density.
73
Seek, Rotate, Transfer
Pretty fast — depends on RPM and sector density.
73
Seek, Rotate, Transfer
Pretty fast — depends on RPM and sector density.
100+ MB/s is typical.
73
Seek, Rotate, Transfer
Pretty fast — depends on RPM and sector density.
100+ MB/s is typical.
73
Seek, Rotate, Transfer
Pretty fast — depends on RPM and sector density.
100+ MB/s is typical.
1s / 100 MB = 10 ms / MB = 4.9 us / sector
73
Seek, Rotate, Transfer
Pretty fast — depends on RPM and sector density.
100+ MB/s is typical.
1s / 100 MB = 10 ms / MB = 4.9 us / sector(assuming 512-byte sector)
73
Workload
74
WorkloadSo…
74
WorkloadSo… - seeks are slow
74
WorkloadSo… - seeks are slow - rotations are slow
74
WorkloadSo… - seeks are slow - rotations are slow - transfers are fast
74
WorkloadSo… - seeks are slow - rotations are slow - transfers are fast
74
WorkloadSo… - seeks are slow - rotations are slow - transfers are fast
What kind of workload is fastest for disks?
74
WorkloadSo… - seeks are slow - rotations are slow - transfers are fast
What kind of workload is fastest for disks?
74
WorkloadSo… - seeks are slow - rotations are slow - transfers are fast
What kind of workload is fastest for disks? Sequential: access sectors in order (transfer dominated) Random: access sectors arbitrarily (seek+rotation dominated)
75
Demos: example-rand.csh and example-seq.csh
Disk Spec
76
Cheetah BarracudaCapacity 300 GB 1 TB
RPM 15,000 7,200Avg Seek 4 ms 9 ms
Max Transfer 125 MB/s 105 MB/sPlatters 4 4Cache 16 MB 32 MB
Disk Spec
77
Cheetah BarracudaCapacity 300 GB 1 TB
RPM 15,000 7,200Avg Seek 4 ms 9 ms
Max Transfer 125 MB/s 105 MB/sPlatters 4 4Cache 16 MB 32 MB
Sequential workload: what is throughput for each?
Disk Spec
78
Cheetah BarracudaCapacity 300 GB 1 TB
RPM 15,000 7,200Avg Seek 4 ms 9 ms
Max Transfer 125 MB/s 105 MB/sPlatters 4 4Cache 16 MB 32 MB
Cheeta: 125 MB/s. Barracuda: 105 MB/s.
Disk Spec
79
Cheetah BarracudaCapacity 300 GB 1 TB
RPM 15,000 7,200Avg Seek 4 ms 9 ms
Max Transfer 125 MB/s 105 MB/sPlatters 4 4Cache 16 MB 32 MB
Random workload: what is throughput for each? (what else do you need to know?)
Disk Spec
80
Cheetah BarracudaCapacity 300 GB 1 TB
RPM 15,000 7,200Avg Seek 4 ms 9 ms
Max Transfer 125 MB/s 105 MB/sPlatters 4 4Cache 16 MB 32 MB
Random workload: what is throughput for each? Assume 16-KB reads.
Disk Spec
81
Cheetah BarracudaCapacity 300 GB 1 TB
RPM 15,000 7,200Avg Seek 4 ms 9 ms
Max Transfer 125 MB/s 105 MB/sPlatters 4 4Cache 16 MB 32 MB
Random workload: what is throughput for each? Assume 16-KB reads.
82
Cheetah BarracudaRPM 15,000 7,200
Avg Seek 4 ms 9 msMax Transfer 125 MB/s 105 MB/s
How long does an average 16-KB read take w/ Cheetah?
83
Cheetah BarracudaRPM 15,000 7,200
Avg Seek 4 ms 9 msMax Transfer 125 MB/s 105 MB/s
How long does an average 16-KB read take w/ Cheetah?
avg rotation = 12
1 min15000
84
Cheetah BarracudaRPM 15,000 7,200
Avg Seek 4 ms 9 msMax Transfer 125 MB/s 105 MB/s
How long does an average 16-KB read take w/ Cheetah?
avg rotation = 12
1 min15000
60 sec1 min
1000 ms1 sec
= 2 ms
85
Cheetah BarracudaRPM 15,000 7,200
Avg Seek 4 ms 9 msMax Transfer 125 MB/s 105 MB/s
How long does an average 16-KB read take w/ Cheetah?
transfer = 1 sec
125 MB16 KB
1,000,000 us1 sec
= 125 us
86
Cheetah BarracudaRPM 15,000 7,200
Avg Seek 4 ms 9 msMax Transfer 125 MB/s 105 MB/s
How long does an average 16-KB read take w/ Cheetah?
Cheetah time = 4ms + 2ms + 125us = 6.1ms
87
Cheetah BarracudaRPM 15,000 7,200
Avg Seek 4 ms 9 msMax Transfer 125 MB/s 105 MB/s
How long does an average 16-KB read take w/ Cheetah?
Cheetah time = 4ms + 2ms + 125us = 6.1ms
throughput = 16 KB6.1ms
88
Cheetah BarracudaRPM 15,000 7,200
Avg Seek 4 ms 9 msMax Transfer 125 MB/s 105 MB/s
How long does an average 16-KB read take w/ Cheetah?
Cheetah time = 4ms + 2ms + 125us = 6.1ms
throughput = 16 KB6.1ms
1 MB1024 KB
100 ms1 sec
= 2.5 MB/s
89
Cheetah BarracudaRPM 15,000 7,200
Avg Seek 4 ms 9 msMax Transfer 125 MB/s 105 MB/s
How long does an average 16-KB read take w/ Barracuda?
90
Cheetah BarracudaRPM 15,000 7,200
Avg Seek 4 ms 9 msMax Transfer 125 MB/s 105 MB/s
How long does an average 16-KB read take w/ Barracuda?
avg rotation = 12
1 min7200
60 sec1 min
1000 ms1 sec
= 4.1 ms
91
Cheetah BarracudaRPM 15,000 7,200
Avg Seek 4 ms 9 msMax Transfer 125 MB/s 105 MB/s
How long does an average 16-KB read take w/ Barracuda?
transfer = 1 sec
105 MB16 KB
1,000,000 us1 sec
= 149 us
92
Cheetah BarracudaRPM 15,000 7,200
Avg Seek 4 ms 9 msMax Transfer 125 MB/s 105 MB/s
How long does an average 16-KB read take w/ Barracuda?
throughput = 16 KB13.2ms
Barracuda time = 9ms + 4.1ms + 149us = 13.2ms
1 MB1024 KB
1000 ms1 sec
93
Cheetah BarracudaRPM 15,000 7,200
Avg Seek 4 ms 9 msMax Transfer 125 MB/s 105 MB/s
How long does an average 16-KB read take w/ Barracuda?
throughput = 16 KB13.2ms
Barracuda time = 9ms + 4.1ms + 149us = 13.2ms
1 MB1024 KB
1000 ms1 sec
= 1.2 MB/s