
Lecture 25: Interconnection Networks, Disks

• Topics: flow control, router microarchitecture, RAID


Virtual Channel Flow Control

• Each switch has multiple virtual channels per phys. channel

• Each virtual channel keeps track of the output channel assigned to the head, and pointers to buffered packets

• A head flit must allocate three resources in the next switch before being forwarded: a virtual channel, buffer space, and access to the physical channel

• With multiple virtual channels per physical channel, two different packets can share the channel, so the link is not wasted while one packet is stalled
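A minimal Python sketch of the per-VC bookkeeping described above; the field names and the credit count are illustrative assumptions, not from the lecture:

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class VirtualChannelState:
    """Bookkeeping a router keeps for one VC on one input port (illustrative sketch)."""
    output_port: Optional[int] = None   # output channel assigned when the head flit was routed
    output_vc: Optional[int] = None     # VC allocated on the downstream router
    credits: int = 4                    # free buffer slots we believe exist downstream
    flits: deque = field(default_factory=deque)  # buffered flits of the packet holding this VC

    def can_forward(self) -> bool:
        # a flit may advance only if the route is known and a downstream buffer is free
        return self.output_port is not None and self.credits > 0 and bool(self.flits)
```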


Example

A is going from Node-1 to Node-4; B is going from Node-0 to Node-5; Node-5 is blocked (no free VCs/buffers)

• Wormhole: [figure] B stalls while holding the buffers on the links it shares with A, so A is stuck behind it and the links ahead sit idle

• Virtual channel: [figure] A and B hold separate virtual channels on those same physical links, so A keeps advancing to Node-4 while B waits

Traffic Analogy: B is trying to make a left turn; A is trying to go straight; there is no left-only lane with wormhole, but there is one with VC


Buffer Management

• Credit-based: keep track of the number of free buffers in the downstream node; the downstream node sends back signals to increment the count when a buffer is freed; need enough buffers to hide the round-trip latency

• On/Off: the downstream node sends back an "off" signal when its buffers are close to being full (and an "on" signal as they drain) – reduces upstream signaling and counter overhead, but can waste buffer space
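A minimal sketch of the credit-based scheme from the first bullet; the buffer and round-trip numbers are made-up values:

```python
class CreditCounter:
    """Upstream end of a credit-based link: a flit is sent only if a credit remains."""
    def __init__(self, downstream_buffers: int):
        self.credits = downstream_buffers     # free buffers believed to exist downstream

    def try_send(self) -> bool:
        if self.credits == 0:
            return False                      # downstream looks full: stall the flit
        self.credits -= 1                     # one downstream buffer is now occupied
        return True

    def on_credit_return(self) -> None:
        self.credits += 1                     # downstream freed a buffer and signaled us

# To keep the link busy, buffers must cover the credit round trip,
# e.g. an 8-cycle round trip at 1 flit/cycle needs at least 8 buffers.
link = CreditCounter(downstream_buffers=8)
```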


Deadlock Avoidance with VCs

• VCs provide another way to number the links such that a route always uses ascending link numbers

[Figure: links are numbered in each direction (0-3 one way, 16-19 the other); each VC plane repeats the numbering at a higher offset (100-103 and 116-119 on the next plane, 200-203 and 216-219 on the one after), so a route that would otherwise need a lower-numbered link crosses to a higher-numbered plane and keeps its link numbers ascending]

• Alternatively, use West-first routing on the 1st plane and cross over to the 2nd plane in case you need to go West again (the 2nd plane uses North-last, for example)
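The deadlock-freedom argument above reduces to an ordering check: every route must acquire links in strictly ascending number order, escaping to a higher-numbered VC plane when it cannot. A tiny sketch with made-up routes in the numbering style of the figure:

```python
def ascending(route):
    """True if the route acquires links in strictly increasing number order."""
    return all(a < b for a, b in zip(route, route[1:]))

print(ascending([1, 2, 3, 17]))      # stays on the base plane: allowed
print(ascending([18, 17]))           # would need a lower-numbered link: not allowed
print(ascending([18, 117, 118]))     # escapes to the 100-numbered plane instead: allowed
```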


Router Functions

• Crossbar, buffer, arbiter, VC state and allocation, buffer management, ALUs, control logic

• Typical on-chip network power breakdown: ~30% links, ~30% buffers, ~30% crossbar


Virtual Channel Router

• Buffers and channels are allocated per flit

• Each physical channel is associated with multiple virtual channels – the virtual channels are allocated per packet and the flits of various VCs can be interleaved on the physical channel

• For a head flit to proceed, the router has to first allocate a virtual channel on the next router

• For any flit to proceed (including the head), the router has to allocate the following resources: buffer space in the next router (credits indicate the available space) and access to the physical channel


Router Pipeline

• Four typical stages:
    RC (routing computation): the head flit indicates the VC that it belongs to, the VC state is updated, the headers are examined, and the next output channel is computed (note: this is done for all the head flits arriving on various input channels)
    VA (virtual-channel allocation): the head flits compete for the available virtual channels on their computed output channels
    SA (switch allocation): a flit competes for access to its output physical channel
    ST (switch traversal): the flit is transmitted on the output channel

A head flit goes through all four stages; the other flits do nothing in the first two stages (this is an in-order pipeline and flits cannot jump ahead); a tail flit also de-allocates the VC


Router Pipeline

• Four typical stages:
    RC (routing computation): compute the output channel
    VA (virtual-channel allocation): allocate a VC for the head flit
    SA (switch allocation): compete for the output physical channel
    ST (switch traversal): transfer data on the output physical channel

Cycle         1    2    3    4    5    6    7
Head flit     RC   VA   SA   ST
Body flit 1   --   --   --   SA   ST
Body flit 2   --   --   --   --   SA   ST
Tail flit     --   --   --   --   --   SA   ST

[Second diagram: the same packet when the head flit fails switch allocation once and must repeat SA in the next cycle (STALL); every following flit slips by one cycle]
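A small sketch that reproduces the baseline timing above, assuming no contention and one flit winning SA/ST per cycle (a simplification, not the lecture's model):

```python
def packet_timeline(num_flits: int) -> dict:
    """Cycle in which each flit of one packet occupies each stage of the 4-stage pipeline."""
    rows = {"Head flit": {"RC": 1, "VA": 2, "SA": 3, "ST": 4}}
    for i in range(1, num_flits):
        label = "Tail flit" if i == num_flits - 1 else f"Body flit {i}"
        # body/tail flits skip RC and VA and follow the head one cycle apart through SA/ST
        rows[label] = {"SA": 3 + i, "ST": 4 + i}
    return rows

for flit, stages in packet_timeline(4).items():
    print(f"{flit:12s} {stages}")     # the tail flit leaves in cycle 7, as in the table
```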


Speculative Pipelines

• Perform VA and SA in parallel
• Note that SA only requires knowledge of the output physical channel, not the VC
• If VA fails, the successfully allocated channel goes un-utilized

Cycle         1          2    3    4    5
Head flit     RC+VA+SA   ST
Body flit 1   --         SA   ST
Body flit 2   --         --   SA   ST
Tail flit     --         --   --   SA   ST

• Perform VA, SA, and ST in parallel (can cause collisions and re-tries)
• Typically, VA is the critical path – can possibly perform SA and ST sequentially

• Router pipeline latency is a greater bottleneck when there is little contention
• When there is little contention, speculation will likely work well!
• Single-stage pipeline?

Cycle         1       2    3    4    5    6
Head flit     RC+VA   SA   ST
Body flit 1   --      --   SA   ST
Body flit 2   --      --   --   SA   ST
Tail flit     --      --   --   --   SA   ST
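A rough zero-load comparison of the pipeline options above; the hop count, flit count, and the "latency = hops x depth + serialization" model are simplifying assumptions for illustration:

```python
def zero_load_latency(hops: int, flits: int, depth: int) -> int:
    # head flit pays the full pipeline at every hop; the rest of the packet follows one flit/cycle
    return hops * depth + (flits - 1)

for depth, label in [(4, "baseline RC->VA->SA->ST"),
                     (2, "speculative RC+VA+SA | ST"),
                     (1, "single-stage")]:
    print(f"{label:28s} {zero_load_latency(hops=5, flits=4, depth=depth)} cycles")
```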


Recent Intel Router

Source: Partha Kundu, “On-Die Interconnects for Next-Generation CMPs”, talk at On-Chip Interconnection Networks Workshop, Dec 2006

• Used for a 6x6 mesh
• 16 B, > 3 GHz
• Wormhole with VC flow control


[Two further slides show figures of this router's microarchitecture, from the same Kundu talk cited above]


Magnetic Disks

• A magnetic disk consists of 1-12 platters (metal or glass disks covered with magnetic recording material on both sides), with diameters between 1 and 3.5 inches

• Each platter is comprised of concentric tracks (5-30K) and each track is divided into sectors (100 – 500 per track, each about 512 bytes)

• A movable arm holds the read/write heads for each disk surface and moves them all in tandem – a cylinder of data is accessible at a time
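The geometry multiplies straight out to capacity; a quick illustrative calculation with values picked from the ranges above (not numbers from the lecture):

```python
platters = 4
surfaces = platters * 2             # both sides of each platter are recorded
tracks_per_surface = 20_000         # within the 5-30K range quoted above
sectors_per_track = 400             # within the 100-500 range quoted above
bytes_per_sector = 512

capacity = surfaces * tracks_per_surface * sectors_per_track * bytes_per_sector
print(f"{capacity / 1e9:.1f} GB")   # about 32.8 GB for these particular numbers
```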


Disk Latency

• To read/write data, the arm has to be placed on the correct track – this seek time usually takes 5 to 12 ms on average – can take less if there is spatial locality

• Rotational latency is the time taken to rotate the correct sector under the head – the average is half a rotation, about 2 ms at 15,000 RPM (more for slower disks)

• Transfer time is the time taken to transfer a block of bits out of the disk; transfer rates are typically 3 – 65 MB/second

• A disk controller maintains a disk cache (spatial locality can be exploited) and sets up the transfer on the bus (controller overhead)
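Putting the pieces together for one random access, with representative (made-up) numbers drawn from the ranges above:

```python
seek_ms = 8.0                            # average seek, within the 5-12 ms range
rpm = 15_000
rotation_ms = 0.5 * 60_000 / rpm         # half a rotation on average = 2 ms at 15,000 RPM
block_kb = 4
rate_mb_s = 60                           # within the 3-65 MB/s range
transfer_ms = block_kb / 1024 / rate_mb_s * 1000
controller_ms = 0.2                      # assumed controller overhead

total = seek_ms + rotation_ms + transfer_ms + controller_ms
print(f"total = {total:.2f} ms  (seek and rotation dominate)")
```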


RAID

• Reliability and availability are important metrics for disks

• RAID: redundant array of inexpensive (independent) disks

• Redundancy can deal with one or more failures

• Each sector of a disk records check information that allows it to determine if the disk has an error or not (in other words, redundancy already exists within a disk)

• When the disk read flags an error, we turn elsewhere for correct data


RAID 0 and RAID 1

• RAID 0 has no additional redundancy (a misnomer) – it uses an array of disks and stripes (interleaves) data across the disks to improve parallelism and throughput

• RAID 1 mirrors or shadows every disk – every write happens to two disks

• Reads to the mirror may happen only when the primary disk fails – or, you may try to read both together and the quicker response is accepted

• Expensive solution: high reliability at twice the cost
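A minimal sketch of the two layouts: striping spreads consecutive blocks across the array, mirroring duplicates every block (the disk count is arbitrary):

```python
def raid0_place(block: int, num_disks: int):
    """RAID 0: consecutive logical blocks rotate across the disks."""
    return block % num_disks, block // num_disks        # (disk, offset on that disk)

def raid1_place(block: int):
    """RAID 1: every write goes to both copies; reads may use either."""
    return [("primary", block), ("mirror", block)]

print([raid0_place(b, num_disks=4) for b in range(8)])
# blocks 0-7 land on disks 0,1,2,3,0,1,2,3 -> independent requests can proceed in parallel
```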


RAID 3

• Data is bit-interleaved across several disks and a separate disk maintains parity information for a set of bits

• For example: with 8 disks, bit 0 is in disk-0, bit 1 is in disk-1, …, bit 7 is in disk-7; disk-8 maintains parity for all 8 bits

• For any read, 8 disks must be accessed (as we usually read more than a byte at a time) and for any write, 9 disks must be accessed as parity has to be re-calculated

• High throughput for a single request, low cost for redundancy (overhead: 12.5%), low task-level parallelism
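The parity disk simply holds the XOR of the corresponding bits on the data disks, which is what lets a failed disk be rebuilt from the survivors; a byte-wise sketch with toy contents:

```python
from functools import reduce

def xor_bytes(disks):
    """Byte-wise XOR across a list of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*disks))

data_disks = [bytes([d]) * 4 for d in range(8)]   # 8 data disks, 4 toy bytes each
parity = xor_bytes(data_disks)                    # what disk-8 stores

# rebuild failed disk-3 from the 7 surviving data disks plus the parity disk
survivors = [d for i, d in enumerate(data_disks) if i != 3] + [parity]
assert xor_bytes(survivors) == data_disks[3]
```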


RAID 4 and RAID 5

• Data is block interleaved – this allows us to get all our data from a single disk on a read – in case of a disk error, read all 9 disks

• Block interleaving reduces throughput for a single request (as only a single disk drive is servicing the request), but improves task-level parallelism as other disk drives are free to service other requests

• On a write, we access the disk that stores the data and the parity disk – parity information can be updated simply by checking if the new data differs from the old data
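The "check if the new data differs from the old data" step is the XOR identity new_parity = old_parity XOR old_data XOR new_data, so a small write touches only the data disk and the parity disk; a sketch:

```python
def updated_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """RAID 4/5 read-modify-write: flip exactly the parity bits where old and new data differ."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))

old_parity, old_data, new_data = b"\x33\x55", b"\x0f\x00", b"\x0a\x00"
print(updated_parity(old_parity, old_data, new_data).hex())   # only the differing bits flip
```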


RAID 5

• If we have a single disk for parity, multiple writes cannot happen in parallel (as all writes must update parity info)

• RAID 5 distributes the parity block to allow simultaneous writes
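One way to rotate the parity block across the disks (a left-symmetric-style layout is assumed here for illustration; the lecture only says parity is distributed):

```python
def raid5_parity_disk(stripe: int, num_disks: int) -> int:
    """Parity for each stripe sits on a different disk, so writes to different stripes
    can update parity on different spindles at the same time."""
    return (num_disks - 1 - stripe) % num_disks

for stripe in range(5):
    print(f"stripe {stripe}: parity on disk {raid5_parity_disk(stripe, num_disks=5)}")
# stripes 0-4 place parity on disks 4, 3, 2, 1, 0 and then the pattern repeats
```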


RAID Summary

• RAID 1-5 can tolerate a single fault – mirroring (RAID 1) has a 100% overhead, while parity (RAID 3, 4, 5) has modest overhead

• Can tolerate multiple faults by having multiple check functions – each additional check can cost an additional disk (RAID 6)

• RAID 6 and RAID 2 (memory-style ECC) are not commercially employed
