Upload
henry-perkins
View
223
Download
1
Embed Size (px)
Citation preview
Copyright 2004 Koren & Krishna ECE655/Koren Part.8 .1
UNIVERSITY OF MASSACHUSETTSDept. of Electrical & Computer
Engineering
Fault Tolerant ComputingECE 655
Part 8Networks - 3
Copyright 2004 Koren & Krishna ECE655/Koren Part.8 .2
Hypercube Networks
Hn - An n-dimensional hypercube network - 2 nodes
A 0-dimensional hypercube H0 - a single node
Hn constructed by connecting the corresponding nodes of two Hn-1 networks
The edges added to connect corresponding nodes are called dimension-(n-1) edges
n
Dimension-0 edgeH1
Copyright 2004 Koren & Krishna ECE655/Koren Part.8 .3
Hypercube - Examples
Dimension-0 edges
Dimension-1 edges
Dimension-2 edges
Dimension-3 edges
H4
H1
H2
H3H
3
Copyright 2004 Koren & Krishna ECE655/Koren Part.8 .4
Routing in Hypercubes Specific numbering to simplify routing Number expressed in binary - if nodes i and j are
connected by a dimension-k edge, the names of i and j differ in only the k-th bit position
Example - nodes 0000 and 0010 differ in only the 2 bit position - connected by a dimension-1 edge
Example - a packet needs to travel from node 14=1110 to node 2=0010 in an H network
Possible routings - 1110 0110 (dimension 3) 0010
(dimension 2) 1110 1010 (dimension 2) 0010
(dimension 3)
2 42
Dimension-0 edgesDimension-1 edgesDimension-2 edgesDimension-3 edges
1
Copyright 2004 Koren & Krishna ECE655/Koren Part.8 .5
Routing - General
In general - the distance between source and destination is the number of different bits in addresses
Going from X to Y can be accomplished by traveling once along each dimension in which they differ
X = x ... x ; Y=y ... y Define z = x y -
is the exclusive-or operator Packet must traverse an edge in every
dimension i for which z = 1
n-1 00 n-1
i
i
ii
Copyright 2004 Koren & Krishna ECE655/Koren Part.8 .6
Fault Tolerance in Hypercubes Hn (for n2) can tolerate link failures - multiple
paths from any source to any destination Node failures can disrupt the operation One way is to increase the number of
communication ports of each node from n to n+1 and connecting these extra ports through additional links to one or more spare nodes
Example - two spare nodes - each a spare for 2 nodes of an Hn-1 sub-cube
Spare nodes may require 2 ports - can be reduced by using several crossbar switches whose outputs is connected to the corresponding spare node
Number of ports of the spare node is reduced to n+1 - same as for all other nodes
n-1
n-1
Copyright 2004 Koren & Krishna ECE655/Koren Part.8 .7
An H4 Hypercube with Two Spare Nodes
S S
Copyright 2004 Koren & Krishna ECE655/Koren Part.8 .8
Different Method of Fault-
Tolerance
Duplicating the processor in a few selected nodes
Each additional processor - spare also for any of the processors in the neighboring nodes
Example - nodes 0, 7, 8, 15 in H4 - modified to duplex nodes
Every node now has a spare at a distance no larger than 1
Replacing a faulty processor by a spare results in an additional communication delay
Copyright 2004 Koren & Krishna ECE655/Koren Part.8 .9
Routing in Injured Hypercubes
Routing algorithm must be modified to route around the faulty nodes or links
Basic idea - list the dimensions along which the packet must travel, and traverse them one by one
As edges are traversed and are crossed off the list
If, due to a link or a node failure, the desired link is not available - another edge in the list, if any, is chosen for traversal
If packet arrives at some node to find all dimensions on its list down - it backtracks to the previous node and tries again
Copyright 2004 Koren & Krishna ECE655/Koren Part.8 .10
Formal Routing Algorithm - Notations
TD - list of dimensions that the message has traveled on - in order of traversal; TD - in reversed order
- exclusive-or operation carried out k times, sequentially
Example - a means (a a ) a
D - destination, S - source, d=DS ( - bitwise exclusive-or operation on corresponding bits of D and S)
SC(A) - set of nodes visited if we travel on each of the dimensions listed in set A
Example - at node 0010 - SC(1,3)={0000,1000}
R
k
i
3
1 2 3i=1
i=1
Copyright 2004 Koren & Krishna ECE655/Koren Part.8 .11
Notations - Cont.
e - n-bit vector consisting of a 1 in the i-th bit position and 0 everywhere else
Example - e = 100 Packets are assumed to consist of
(I) d; d=DS (II) Message being transmitted (the ``payload'') (III) List of dimensions taken so far - TD
- append operation TD x - append x to the list TD
transmit(j) - send packet (d e , message, TD j) along the j-th-dimensional link from the present node
i
2
n
j
3
Copyright 2004 Koren & Krishna ECE655/Koren Part.8 .12
Routing Algorithm for Injured Hypercubes
Copyright 2004 Koren & Krishna ECE655/Koren Part.8 .13
Example - H3
H3 with faulty node 011 node 000 wants to send a
packet to 111 At 000, d=111 - sends the
message out on dimension-0, to node 001
At 001, d = 110 and TD=(0) - attempts dimension-1 edge - impossible
Bit 2 of d is also 1 - checks and finds that the dimension-2 edge to 101 is available - message is sent to 101 and then to 111
Exercise - What if both 011 and 101 are down?
Dimension-0 edgesDimension-1 edgesDimension-2 edges
Copyright 2004 Koren & Krishna ECE655/Koren Part.8 .14
Reliability of Point-to-Point Networks
Not necessarily a regular structure - often more than one path between any two nodes
Terminal Reliability - the probability that there exists an operational path between two specific nodes, given the probabilities of link failures
Example - calculating the terminal reliability for the source-sink pair N - N41
Copyright 2004 Koren & Krishna ECE655/Koren Part.8 .15
Terminal Reliability - Example I
Three paths from N to N P ={X ,X } P ={X ,X } P ={X ,X ,X }
p (q ) - probability that link X is good (faulty) Nodes are assumed fault-free - if not, their failure
probability is incorporated into outgoing links Set of paths must be modified to an equivalent set
of mutually exclusive events - otherwise some events will be counted more than once
Mutually exclusive events - (I) P up ; (II) P up and P down ; (III) P up and both P and P down
41
1 1,2 2,4
2
3
3,41,3
1,2 2,3 3,4
i,ji,j i,j
3 211
1 2
Copyright 2004 Koren & Krishna ECE655/Koren Part.8 .16
Calculating Terminal Reliability Terminal reliability of a network with m paths
P , ... , P from source to sink E (E ) - event in which path P is operational
(faulty)
R = P(Operational Path Exists) = P( E ) Set of events can be decomposed into mutually
exclusive events -
O. P. Exists = E (E E ) (E E E ) ... (E E … E )
R = P(E )+P(E E )+P(E E E ) + ... +P(E E … E )
1 m
m
i=1
i i
i
1
i
11
1
2 2
m-1
m
3
1
111 22 3
m
m-1
_
___
__
__
_
_
_
Copyright 2004 Koren & Krishna ECE655/Koren Part.8 .17
Terminal Reliability - Cont.
The last expression can be rewritten using conditional probabilities
R = P(E )+P(E )P(E /E )+P(E )P(E E /E ) + ... +P(E )P(E … E /E )
The problem is calculating the probabilities P(E … E /E )
To identify the links which must must fail so that E occurs but not E ,…, E , conditional sets are used
S = P - P = { x | x P and x P } Identifying disjoint events in the general case is not
always straightforward
1 11 2 22
m
33
1 m-1 m
i-11 i
__
__
___
i 1 i-1
j/i iij j
Copyright 2004 Koren & Krishna ECE655/Koren Part.8 .18
Terminal Reliability - Example II
This six-node network has 9 links - 6 uni-directional and 3 bi-directional
All paths from N1 to N6 -
Paths are ordered from shortest to longest
Copyright 2004 Koren & Krishna ECE655/Koren Part.8 .19
Calculating Terminal Reliability for Example II
The first term for the reliability equation is P(E )=p p p
To calculate the second term in the reliability equation - the conditional set is used
S = P - P = {x , x } At least one of the links in this set must fail so
that P is faulty (while P is operational) The second term in the probability equation -
p p p (1-p p )
1/2 2
1 1,3 3,5
1
2
1,2
3,51,3
5,6
2,5
1
5,6 1,3 3,5
Copyright 2004 Koren & Krishna ECE655/Koren Part.8 .20
Example II - Cont. For calculating other terms in the sum -
intersection of several conditional sets must be considered
Calculating the fourth term - expression for P - the conditional sets are: S ={x }; S ={x ,x ,x }; S ={x ,x }
S is included in S - if S is faulty, S is faulty
S can be ignored The fourth term in the reliability equation - p p p p (1-p )(1-p p )
3/4
1/4
5,6
4
2/4 2,5
1,3
1,25,6
2,43,5 4,5 4,6 5,6 1,2
2,41,2
1/4
1/4
2/4
2/4 2/4
Copyright 2004 Koren & Krishna ECE655/Koren Part.8 .21
Example II - Cont. Calculating the third term S
= {x ,x ,x } ; S = {x ,x } The two conditional sets are not disjoint The event both S and S are faulty
needs to be divided into disjoint events: (I) x is faulty (II) x is operational and both x and
x are faulty (III) Both x and x are up, and
both x and x are faulty Resulting expression for third term p p p (q + p q q + p p q q ) Remaining terms - calculated similarly Terminal reliability is the sum of all thirteen
terms
2/3 1/3 1,3 5,63,5 2,5 5,6
1/3 2/3
5,6
5,6
1,31,2
1,3
1,3
2,5
3,52,5
2,5
5,65,65,6
5,63,5
2,4 4,6 1,3 2,5