
Permutation Realizability and Fault Tolerance Property of the Inside-Out Routing Algorithm

Seung-Woo Seo, Member, IEEE, Tse-yun Feng, Fellow, IEEE, and

Hyoung-Il Lee, Student Member, IEEE

Abstract: In this paper, we analyze ways of realizing permutations in a class of $2\log_2 N$- or $(2\log_2 N - 1)$-stage rearrangeable networks. The analysis is based on the newly developed inside-out routing algorithm [1], and we derive the upper and lower bounds on the number of possible realizations of a permutation. It is shown that the algorithm provides degrees of freedom in realizing a given permutation comparable to those of the well-known looping algorithm, while it can be applied more generally to a class of $2\log_2 N$- or $(2\log_2 N - 1)$-stage rearrangeable networks. In finding a set of complete assignments for the center-stage cycles, alternate realizations of a permutation can be obtained by changing the initial position, changing the assigning direction, or even interchanging the first-level decompositions of the permutation. We also show that these numerous alternate realizations can be utilized to make the networks tolerate some sets of faults, i.e., control faults of SEs, including stuck-at-straight and stuck-at-cross. Various cases of single control faults at the center stages and other stages are examined through examples. These new approaches originate from routing outward from the center stages to the outer stages; therefore, the center stages and the two half networks may be treated separately.

Index Terms: Rearrangeable networks, Omega + Omega network, Benes network, looping algorithm, inside-out routing algorithm, fault tolerance.

1 INTRODUCTION

Multistage interconnection networks (MINs) provide high bandwidth and cost-effective communication between processors and memory modules or among processors in multiple computer systems. Most researchers have focused on $\log_2 N$-stage blocking networks such as baseline, omega, banyan, and flip [2], [3]. These networks are faster than buses and cheaper than crossbar switches. However, they have two drawbacks. First, the path between any input port and any output port is unique, a property that frequently prevents simultaneous communication for certain input-to-output connections. Second, due to this unique-path property, path blockage may be encountered in the presence of a switching element (SE) or a link fault. To overcome the first drawback, some schemes, such as multipassing by feedback or confining the network to frequently used permutations, have been studied [4]. For the second problem, many schemes have been proposed that provide redundant paths by adding extra hardware, such as an extra stage [5] or extra links [6], or by using larger SEs such as $3 \times 3$ or $4 \times 4$ [7].

On the other hand, $2\log_2 N$- or $(2\log_2 N - 1)$-stage MINs are known to have a full input/output connection (or permutation) capability, with which all input/output pairs can be connected simultaneously. However, only a few such networks have been proven theoretically to have these features. These networks are called rearrangeable networks. The Benes network [8], Joel network [9], and Waksman network [10] belong to this category. Among these, the Benes rearrangeable network has been the most widely studied. It has a well-defined recursive structure; an $N \times N$ Benes network $B(N)$ can be decomposed at the first nesting level into two $\frac{N}{2} \times \frac{N}{2}$ subnetworks, $B_0(\frac{N}{2})$ and $B_1(\frac{N}{2})$, and so on. The routing algorithm, known as the looping algorithm [11], [12], also has a recursive structure to match the Benes network.

Recently, we proposed a new routing algorithm [1] for a class of $2\log_2 N$- or $(2\log_2 N - 1)$-stage rearrangeable networks that can be obtained by concatenating two unique-path networks such as baseline, omega, data manipulator, flip, and regular SW-banyan ($S = F = 2$) [2]. This algorithm is called the inside-out algorithm and is more general than the looping algorithm. It covers a large class of symmetric rearrangeable networks regardless of the connection patterns between the two center stages, which can be straight, butterfly, skewed straight, skewed butterfly, or a combination of these. The looping algorithm, meanwhile, is only applicable to the Benes-equivalent networks [13].

In this paper, we analyze the permutation realization capability of the inside-out routing algorithm. The permutation realizability of the Benes network and its equivalent networks has been studied based on the looping algorithm and is well understood. Pradhan [15] also provides some results on the number of realizable permutations under faults in unique-path networks. However, the permutation realizability in an arbitrary $2\log_2 N$- or $(2\log_2 N - 1)$-stage rearrangeable network under the inside-out algorithm has yet to be determined. In rearrangeable networks, identifying the redundant realizations of a given permutation may be helpful in examining the fault tolerance property of a network. While there are many types of faults in MINs [16], we limit our focus to control faults, which correspond to SEs stuck at either straight (zero) or cross (one). This is because the main theme of this paper is to identify the permutation realizability of the inside-out algorithm and to consider the fault tolerance of the network as an application. There are many research results on the fault tolerance of the Benes network that rely on the looping algorithm (for example, see [17]). As will be shown later, the inside-out algorithm can provide degrees of freedom comparable to those of the looping algorithm in realizing a given permutation, while it is more generally applicable to non-Benes class networks. This generality arises because, by following some specified rules, the inside-out algorithm searches comprehensively among the possible candidates to find a set of terminal numbers that can be assigned to the center stages without conflicts (i.e., the precoordination among terminal numbers is performed before the pure assigning procedure).

946 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 10, NO. 9, SEPTEMBER 1999

S.-W. Seo and H.-I. Lee are with the School of Electrical Engineering, Seoul National University, Kwanak PO Box 34, Seoul, Korea 151-742. E-mail: {sseo; hillee}@snu.ac.kr.

T.-y. Feng is with the Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802. E-mail: [email protected].

Manuscript received Dec. 1994; revised 12 Nov. 1997. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number 101055.

1045-9219/99/$10.00 © 1999 IEEE

The remainder of this paper is organized as follows: In Section 2, we introduce some notation used throughout the paper, followed by a discussion of cycles and loops in the inside-out routing algorithm. In Section 3, we enumerate the possible ways of realizing a given permutation in comparison with the looping algorithm. The detailed fault-tolerant properties (single control-fault cases) are discussed in Sections 4 and 5 with some examples. The results of the paper are summarized and some concluding remarks are provided in Section 6.

2 CYCLES AND LOOPS IN INSIDE-OUT ROUTING ALGORITHM

In the subsequent discussion, we use the same terminology as defined in [1] for continuity (see [1] for details). The concatenation of two omega networks, denoted as omega + omega, with $2\log_2 N$ stages for $N = 2^n$, can be converted into a symmetric network with center-stage connections in butterfly patterns, which is denoted as omega$^{-1}$ + omega, where omega$^{-1}$ means reverse omega.

Definition 2.1. The center stages of a $2\log_2 N$-stage network are, from left to right, the $\log_2 N$th and $(\log_2 N + 1)$th SE columns (left- and right-center stages, respectively), each of which is composed of $\frac{N}{2}$ SEs. In a $(2\log_2 N - 1)$-stage network, the center stage is the $\log_2 N$th SE column. The $N$ links departing outward from the left-center stage and the right-center stage are denoted as $L_l$ and $L_r$, respectively. $N$ distinct location addresses (or simply addresses) are assigned such that, from top to bottom, each link has an $n$-bit binary address. (See Fig. 1 for an example.)
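The addressing in Definition 2.1 can be illustrated with a minimal Python sketch. The function name `center_stage_addresses` is hypothetical, and the sketch only assumes that links are numbered from top to bottom starting at zero; it says nothing about how the links are wired.

```python
def center_stage_addresses(n_ports):
    """Return the n-bit binary addresses of the N center-stage links, top to bottom.

    Assumption (not from the paper): the top link has address 0 and addresses
    increase downward, so link k simply gets the n-bit binary form of k.
    """
    if n_ports < 2 or n_ports & (n_ports - 1):
        raise ValueError("n_ports must be a power of two, N = 2^n")
    n_bits = n_ports.bit_length() - 1  # n = log2(N)
    return [format(k, f"0{n_bits}b") for k in range(n_ports)]


if __name__ == "__main__":
    # For a 16 x 16 network, each side of the center stages has 16 4-bit addresses.
    print(center_stage_addresses(16))
```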

There are $N$ addresses at each of the input and output sides of the center stages. Each address can hold a number called a value, which is either an input or an output port terminal number (i.e., values are assigned to addresses). Let $a$ denote an address and $v$ denote the value assigned to that address. Then, each of the $N$ addresses at either $L_l$ or $L_r$ will have a distinct value $v$, where $0 \le v \le N - 1$. The relation between an address and its value can be denoted by a function, indexed by $s \in \{L_l, L_r\}$, that maps an address $a$ to its actual value $v$.

Unlike other algorithms [4], [10], [11], [12], [13], the inside-out algorithm is a select-and-assign algorithm and proceeds outward from the center stages to the outer stages. This means that proper values must first be assigned, through precoordination, to the $2N$ addresses at both the input and the output sides of the center stages (backtracking is often necessary during the search) so that the center stages and the remaining two half networks are routed without conflicts. While this step requires centralized control, the setting of SEs after the assignments can be done in a distributed manner using the binary values of the assigned numbers.

Given a permutation $P$, we first decompose it into two subpermutations $P_0$ and $P_1$ using the decomposition technique of the looping algorithm (called two-way decomposition), i.e.,

\[ (P_0, P_1) = (\{X^U_L\}, \{X^L_U\}) \quad \text{or} \quad (P_0, P_1) = (\{X^U_U\}, \{X^L_L\}). \]

The assignment procedure is basically composed of two phases, called generation and propagation of assignments, and is performed in each of the subpermutations separately. To assign the values to the addresses effectively, a decision chart, as shown in Chart 2 of [1], is used. This chart reflects the connectivity properties between terminal ports and center-stage SEs. In the decision chart, the generation phase corresponds to picking up a terminal number from an adjacent block. These two phases are applied to both the input and output sides, and the repetitive alternation of the two phases results in complete loops in the decision chart that correspond to the center-stage cycles. A cycle and a loop are formally defined as follows:

Definition 2.2. A cycle is defined as a sequence of addresses traversing the left and the right SE columns, and a $c$-element cycle is a cycle in which the number of addresses covered is $c$.

SEO ET AL.: PERMUTATION REALIZABILITY AND FAULT TOLERANCE PROPERTY OF THE INSIDE-OUT ROUTING ALGORITHM 947

Fig. 1. Addresses at the center stages of a $16 \times 16$ omega$^{-1}$ + omega network.


Any 8-element cycle, that is, a sequence of eight addresses at the center stages, can be denoted as follows when the starting address at the input side is $a_0$:

\[ a_0 \xrightarrow{\text{I.G.}} a_1 \rightarrow a_2 \xrightarrow{\text{O.G.}} a_3 \rightarrow a_4 \xrightarrow{\text{I.G.}} a_5 \rightarrow a_6 \xrightarrow{\text{O.G.}} a_7. \]

Here, I.G. and O.G. indicate the input and output generations, respectively, and the unlabeled arrows refer to the propagation phases. A 16-element cycle can be defined analogously. Note that in the above sequence, the input and output terminal numbers of the same pair should be assigned to $a_0$ and $a_7$, respectively, so that they can be connected through the center stage (the same is also true for the other addresses, i.e., $a_1$ and $a_2$, $a_3$ and $a_4$, and $a_5$ and $a_6$). As an example, in Fig. 1, the sequence of addresses

\[ (0000)_{IN} \xrightarrow{\text{I.G.}} (1000)_{IN} \rightarrow (0110)_{OUT} \xrightarrow{\text{O.G.}} (1110)_{OUT} \rightarrow (1100)_{IN} \xrightarrow{\text{I.G.}} (0100)_{IN} \rightarrow (1010)_{OUT} \xrightarrow{\text{O.G.}} (0010)_{OUT} \]

results in an 8-element cycle labeled with I.G.s and O.G.s as in the above notation, where the subscripts IN and OUT mean that the addresses are on the input and output sides, respectively. In this example, a cycle of length 16 may be constructed in a similar fashion.

As $N$ becomes larger, more than one complete cycle exists at the center stages. For example, in the $16 \times 16$ omega$^{-1}$ + omega network, there are a maximum of two 16-element cycles at the center stages; in the $32 \times 32$ case, a maximum of four 16-element cycles exist. However, a $c$-element cycle may be divided into smaller $c'$-element cycles, depending on the characteristics of a given permutation. For instance, in the $16 \times 16$ omega$^{-1}$ + omega network, a 16-element cycle may be divided into two 8-element cycles. This property is closely related to the concept of a loop in the decision chart, which matches the cycle at the center stages.

Definition 2.3. A $t$-input/output (I/O) pair loop is defined as a sequence of $t$ I/O pairs in the decision chart and is denoted as $L_t$.

A 4-I/O pair loop $L_4$ in the decision chart that matches an 8-element cycle in Definition 2.2 is denoted as $(x_1, y_1) \rightarrow (x_2, y_2) \rightarrow (x_3, y_3) \rightarrow (x_4, y_4)$. An 8-I/O pair loop $L_8$ is defined similarly. A loop is constructed so that it matches a center-stage cycle. Depending on the center-stage connection pattern, 2-, 4-, or 8-I/O pair loops may exist in the decision chart, since the maximum number of addresses that can be traversed in a cycle in any network is assumed not to exceed 16. For example, in the $N \times N$ omega$^{-1}$ + omega network, the basic structures at the center stages are 8- or 16-element cycles, matching 4- or 8-I/O pair loops, respectively, when $N \ge 16$. In the Benes network, however, only 4- or 8-element cycles exist, matching 2- or 4-I/O pair loops, respectively, when $N \ge 4$. In summary, the combination of I/O-pair loops is formally described as follows:

Definition 2.4. The number of I/O pairs in $t$-I/O pair loops in the decision chart is denoted as $|L_t|$, where $t \in \{2, 4, 8\}$, depending on the network topology.

Lemma 2.1. In the Benes network for $N \ge 4$, $m_1|L_2| + m_2|L_4| = \frac{N}{2}$ for a subpermutation, either $P_0$ or $P_1$, where $m_1$ and $m_2$ are the numbers of $L_2$ and $L_4$ loops, respectively.

Proof. In the Benes network, the minimum number of elements in a center-stage cycle is four and the maximum number is eight. This implies that the corresponding number of I/O pairs in a loop for the cycle is a minimum of two and a maximum of four. These lead to the above equation. □

Lemma 2.2. In the omega$^{-1}$ + omega network, $m_3|L_4| + m_4|L_8| = \frac{N}{2}$ for a subpermutation, either $P_0$ or $P_1$, for $N \ge 16$, where $m_3$ and $m_4$ are the numbers of $L_4$ and $L_8$ loops, respectively. Moreover, if $N = 8$, the combined number of $|L_2|$ and $|L_4|$ is four for a subpermutation.

Proof. Based on the fact that the basic structure of the center stages in the omega$^{-1}$ + omega network is composed of 8-element and 16-element cycles when $N \ge 16$, the corresponding number of I/O pairs in a loop for a cycle is a minimum of four and a maximum of eight. When $N = 8$, there are only two simple butterflies at the center stages. □
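As a small illustration of the two lemmas, the following Python sketch lists the loop-count combinations that satisfy the stated equations. It assumes that a $t$-I/O pair loop contributes exactly $t$ pairs, i.e., $|L_t| = t$; the function name `loop_count_combinations` is hypothetical.

```python
def loop_count_combinations(n_ports, small_t, large_t):
    """List (m_small, m_large) with m_small*small_t + m_large*large_t == N/2.

    Assumption: a t-I/O pair loop holds exactly t pairs, so |L_t| = t.
    For the Benes network use (small_t, large_t) = (2, 4); for the
    omega^-1 + omega network with N >= 16 use (4, 8).
    """
    target = n_ports // 2  # I/O pairs in one subpermutation
    combos = []
    for m_large in range(target // large_t + 1):
        rest = target - m_large * large_t
        if rest % small_t == 0:
            combos.append((rest // small_t, m_large))
    return combos


if __name__ == "__main__":
    print(loop_count_combinations(16, 2, 4))  # Benes, N = 16
    print(loop_count_combinations(16, 4, 8))  # omega^-1 + omega, N = 16
```

For $N = 16$, this gives three combinations for the Benes network and two for the omega$^{-1}$ + omega network; which of them actually occurs depends on the given permutation.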

3 ENUMERATION OF MULTIPLE REALIZATIONS

In this section, we examine the multiple realizations of a given permutation in $(2\log_2 N - 1)$- and $2\log_2 N$-stage networks. The looping algorithm for the Benes network decomposes a permutation into two subpermutations at each decomposition level for each subnetwork. To date, the looping algorithm is regarded as the most suitable basis for enumerating the possible realizations of a permutation in the Benes network. We will show that the inside-out routing algorithm offers degrees of freedom comparable to those of the looping algorithm.
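The two-way decomposition mentioned above can be sketched in Python as the standard first-level step of the looping algorithm. This is an illustrative sketch, not the decision-chart procedure of [1]. It assumes that terminals $2i$ and $2i+1$ share an outermost SE on both the input and the output side, and the function name `two_way_decompose` is hypothetical.

```python
def two_way_decompose(perm):
    """Split perm (input i -> output perm[i]) into two subpermutations.

    Assumption: inputs 2i and 2i+1 share an SE, and so do outputs 2i and 2i+1.
    Two pairs that share an SE on either side must go to different halves.
    Returns (p0, p1, loops), where loops is the number of independent chains;
    each chain could be flipped, giving 2**loops possible decompositions.
    """
    n = len(perm)
    inverse = [0] * n
    for inp, out in enumerate(perm):
        inverse[out] = inp

    side = [None] * n  # side[i] is 0 or 1 for input i
    loops = 0
    for start in range(n):
        if side[start] is not None:
            continue
        loops += 1
        inp = start
        while side[inp] is None:
            side[inp] = 0
            # The input routed to the neighboring output must use the other half.
            partner = inverse[perm[inp] ^ 1]
            side[partner] = 1
            # The input sharing an SE with that partner returns to half 0.
            inp = partner ^ 1

    p0 = {i: perm[i] for i in range(n) if side[i] == 0}
    p1 = {i: perm[i] for i in range(n) if side[i] == 1}
    return p0, p1, loops


if __name__ == "__main__":
    print(two_way_decompose(list(range(8))))
```

For the identity permutation on $N = 8$ terminals, the sketch finds $N/2 = 4$ independent chains, each of which can be flipped between the two halves.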

According to the concepts of a cycle and a corresponding loop in the decision chart, we assume that at least one set of complete assignments can be found for each permutation. Obtaining a set of complete assignments means that the terminal numbers can be assigned properly at the center stages and the whole network can be routed without any conflict. We investigate alternate methods of assignment in the Benes network and the $N \times N$ omega$^{-1}$ + omega network for a given set of assignments. However, our enumeration scheme can also be applied to other $2\log_2 N$-stage networks in the class.

Once a set of perfectly matched terminal number assignments between center-stage cycles and loops in the decision chart has been obtained, the set can also be matched to the center stages in many different ways by changing the assignments. The process of locating other complete assignments in the center-stage cycles with a given matched set is well illustrated by the matching circles. Fig. 2 shows four sets of matching circles for the case of an 8-element cycle and a 4-I/O pair loop, where addresses are given as in Section 2. The inner circle represents a complete loop in the chart and rotates clockwise or counterclockwise.


The outer circle is the center-stage cycle and is fixed. Hence, the inner circle stops when it finds a perfect match between them. According to the rule, the inner circle can only stop when the input terminal numbers $x_i$ match the input-side addresses in the outer circle and, at the same time, the output terminal numbers $y_i$ match the output-side addresses in the outer circle.

As shown in Fig. 2, a 4-I/O pair loop and an 8-element cycle in the omega$^{-1}$ + omega network can be matched as follows: A 4-I/O pair loop can be made within a quadrant, within two adjacent quadrants, or using all four quadrants in the decision chart (note that quadrants are separated by the double lines). In this case, if we pick $(x_1, y_1)$ as the initial pair, then $x_1$ can be assigned to either $a_0$ or $a_4$. Moreover, we can also proceed in the reverse direction with the initial position at either $a_1$ or $a_5$. As a result, in the maximum case, every 8-element cycle and its corresponding 4-I/O pair loop may be matched in $2^2$ different ways. Similarly, every 16-element cycle and its corresponding 8-I/O pair loop may be matched in at most $2^2$ different ways. However, in the Benes network, a 4-I/O pair loop can be made only by using all four quadrants in the decision chart to match an 8-element cycle because only simple butterflies exist at both sides of the center stage. This is due to the splitting property of upper and lower positions in an SE [1]. Thus, in this case, a 4-I/O pair loop can be matched to an 8-element cycle in only two possible ways (i.e., Figs. 2a and 2b). With these concepts, we now enumerate alternate assignments in the networks.
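The matching-circle argument for the omega$^{-1}$ + omega case can be checked with a short Python sketch. It models only the side constraint (input numbers on input-side addresses, output numbers on output-side addresses) for the 8-element cycle $a_0, \ldots, a_7$ of Section 2, whose sides follow the pattern IN, IN, OUT, OUT, IN, IN, OUT, OUT. Treating increasing address index as "clockwise" is an assumption, and all names are hypothetical. The extra restriction that applies in the Benes network is not modeled.

```python
# Sides of the fixed outer circle a0..a7: I = input side, O = output side.
OUTER_SIDES = "IIOOIIOO"

# Inner circle (the 4-I/O pair loop) laid out so that each pair (x_i, y_i)
# lands on addresses joined through the center stage:
# (a0, a7), (a1, a2), (a3, a4), (a5, a6).
INNER_LOOP = ["x1", "x2", "y2", "y3", "x3", "x4", "y4", "y1"]


def valid_matchings(outer_sides=OUTER_SIDES, inner=INNER_LOOP):
    """Return every (start address index, direction) where the inner circle may stop."""
    size = len(outer_sides)
    matches = []
    for start in range(size):
        for step, name in ((1, "clockwise"), (-1, "counterclockwise")):
            fits = all(
                ("I" if label[0] == "x" else "O")
                == outer_sides[(start + step * k) % size]
                for k, label in enumerate(inner)
            )
            if fits:
                matches.append((start, name))
    return matches


if __name__ == "__main__":
    for start, direction in valid_matchings():
        print(f"x1 at a{start}, proceeding {direction}")
```

Under these assumptions the sketch finds exactly the four placements of Fig. 2: $x_1$ at $a_0$ or $a_4$ proceeding clockwise, and at $a_1$ or $a_5$ proceeding counterclockwise.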

3.1 Benes Network

The maximum case in the Benes network occurs when all of the loops are 2-I/O pair loops and each of the loops matches each of the cycles in two different ways. The overall number of possible assignments, when a matched set is provided, is given in the following theorems. Since the cases of $N \le 4$ are trivial, only the cases of $N \ge 8$ are considered.

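Theorem 3.1. The upper bound on the number of possible realizations of a permutation in the Benes network for $N \ge 8$ is given by

\[ \underbrace{2^{\frac{N}{2}}}_{(1)} \, \underbrace{\left\{\left(\tfrac{N}{8}\right)!\right\}^{4}}_{(2)} \, \underbrace{\left\{\left(\tfrac{N}{8}\right)!\right\}^{4}}_{(3)} \, \underbrace{2^{\frac{N}{2}}}_{(4)} = 2^{N} \left\{\left(\tfrac{N}{8}\right)!\right\}^{8}. \tag{1} \]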

Fig. 2. Examples of matching circles between an 8-element cycle and a 4-I/O pair loop: Assignment starts at (a) $a_0$ and proceeds clockwise; (b) $a_1$ and proceeds counterclockwise; (c) $a_4$ and proceeds clockwise; (d) $a_5$ and proceeds counterclockwise.

Proof. There are four factors in the equation, and each is derived in turn. The four factors come from the steps of the algorithm: (a) "U" and "L" assignments in the decision chart (two-way decomposition); (b) selection of elements to make loops in the decision chart; (c) selection of cycles in the network to match the loops; and (d) assignments of loops within each of the cycles.

a) Factor (1): The "U" and "L" assignments (two-way decomposition) may be made in a maximum of $2^{\frac{N}{2}}$ possible ways in the decision chart (e.g., with the identity permutation, we can have $\frac{N}{2}$ distinct circles in assigning "U" and "L," and each of the circles can be assigned in two possible ways). In other words, a given permutation $P$ can be decomposed into two subpermutations, $P_0$ and $P_1$, in at most $2^{\frac{N}{2}}$ different ways.

b) Factor (2): After the two-way decomposition, loops must be constructed in the decision chart using the generation and propagation procedures. Since $P_0$ and $P_1$ can be treated independently, there is a maximum of $\frac{N}{4}$ input/output pairs at each quadrant in the decision chart for a subpermutation (such as in the identity permutation case). With the $\frac{N}{4}$ input/output pairs, 2-I/O pair loops can be made in $(\frac{N}{8})!$ different ways. This can be considered as a matching problem between two $\frac{N}{8}$-pair sets. Considering also the other $\frac{N}{4}$ input/output pairs and the other subpermutation, the maximum number of possible selections of elements to make loops in the decision chart is $\{(\frac{N}{8})!\}^4$.

c) Factor (3): In the Benes network, there are $\frac{N}{4}$ 8-element cycles at the center stages. In the maximal case, each of the $\frac{N}{4}$ 8-element cycles may be split into two independent 4-element cycles. In either $\{X^U_U\}$ or $\{X^L_L\}$, $\frac{N}{8}$ loops, each of which is a 2-I/O pair loop (a result from Factor (2)), can be matched to $\frac{N}{8}$ cycles in $(\frac{N}{8})!$ different ways ($\frac{N}{8}$ comes from the fact that the terminal number pairs for the upper and lower positions at each SE cannot be mixed). As in Factor (2), considering the other cases, the total number of selections between loops and cycles is $\{(\frac{N}{8})!\}^4$.

d) Factor (4): After the selection of cycles and loops as in Factor (3), each of the input/output numbers in a loop can be assigned to a corresponding cycle in two different ways. Since there are a total of $\frac{N}{2}$ cycles, the possible number of assignments is $2^{\frac{N}{2}}$.

Hence, the above equation is obtained by multiplying all four factors. □
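To give a feel for the size of the bound in Theorem 3.1, the four factors can be multiplied out numerically. The following Python sketch evaluates (1) directly; the function name `benes_realization_upper_bound` is hypothetical, and the sketch only restates the formula.

```python
from math import factorial


def benes_realization_upper_bound(n_ports):
    """Evaluate the upper bound (1) of Theorem 3.1 as a product of its four factors."""
    if n_ports < 8 or n_ports & (n_ports - 1):
        raise ValueError("the bound is stated for N >= 8 with N a power of two")
    half = n_ports // 2
    eighth_factorial = factorial(n_ports // 8)
    factor1 = 2 ** half              # "U"/"L" two-way decompositions
    factor2 = eighth_factorial ** 4  # selections of elements forming loops
    factor3 = eighth_factorial ** 4  # selections of cycles matched to loops
    factor4 = 2 ** half              # assignments of loops within cycles
    return factor1 * factor2 * factor3 * factor4


if __name__ == "__main__":
    for n in (8, 16, 32):
        print(n, benes_realization_upper_bound(n))
```

For $N = 8$ the bound evaluates to $2^8 = 256$, and for $N = 16$ to $2^{16} \cdot (2!)^8 = 2^{24}$.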

The upper bound in the above theorem can be made tighter by considering an additional condition presented as the fourth rule in [14] for the algorithm. To satisfy the fourth rule, the cycles in the network must not be matched to the loops independently, as in Theorem 3.1, but must be carefully selected to avoid conflicts. The fourth rule describes more general conditions for proper routing in the Benes network (and also in the omega$^{-1}$ + omega network), which are applicable when the network size is larger than 16. According to the conditions, the input/output number assignments to the SEs at the center stages, which are connected to the same SEs at the $(\log_2 N + i + 1)$th and $(\log_2 N - i - 1)$th stages for $1 \le i \le \log_2 N - 2$, must be limited to a particular set of numbers to avoid conflicts at those stages. Therefore, to satisfy the conditions at all stages, the loops in the decision chart need to be decomposed (or grouped) stage by stage into subgroups that do not create conflicts at each stage.

Definition 3.1. An $m$-segment is a set of integers that have the same quotient when divided by a positive integer $m$. For example, for the set of integers $\{0, 1, 2, 3, 4, 5, 6, 7\}$, the two 4-segments are $\{0, 1, 2, 3\}$ and $\{4, 5, 6, 7\}$.
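Definition 3.1 amounts to grouping integers by their integer quotient. A minimal Python sketch, with the hypothetical function name `m_segments`:

```python
from collections import defaultdict


def m_segments(values, m):
    """Group integers into m-segments, i.e., by their quotient on division by m."""
    if m <= 0:
        raise ValueError("m must be a positive integer")
    groups = defaultdict(list)
    for v in values:
        groups[v // m].append(v)
    # Return the segments ordered by quotient, each sorted ascending.
    return [sorted(groups[q]) for q in sorted(groups)]


if __name__ == "__main__":
    print(m_segments(range(8), 4))
```

Applied to $\{0, \ldots, 7\}$ with $m = 4$, this reproduces the two 4-segments given in the definition.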

Definition 3.2. If two I/O-pair loops have elements that belong to the same $2^{r+1}$-segment, these loops are said to be $r$-adjacent with each other. A group of $r$-adjacent loops is defined as a set of loops that are $r$-adjacent with each other.

Definition 3.3. An $h$-outer group of cycles in the right-half network is defined as a group of cycles to which the same SE at the $(\log_2 N + h + 1)$th stage is connected through the interconnection links between the $(\log_2 N + 1)$th stage and the $(\log_2 N + h + 1)$th stage. The same can also be defined in the left-half network. As an example, we illustrate a 1-outer group of cycles in the $16 \times 16$ Benes network in Fig. 3. In Fig. 3, the 1-outer group of cycles, all of which are connected to the third SE at the sixth stage (the shaded one), is represented by solid lines.

For proper routing without conflicts, two loops that are $r$-adjacent with each other must not be simultaneously matched to an $r$-outer group of cycles for $1 \le r \le \log_2 N - 2$. As a result of the "U"-"L" decomposition procedure, the loops in the decision chart satisfy the above condition for $r = \log_2 N - 2$. According to the original rules of the inside-out routing algorithm, the I/O pairs in the decision chart are selected so as to also satisfy the condition for $r = 1$. Therefore, we should decompose the $N/8$ loops into subgroups so that the loops in a subgroup are not $r$-adjacent with one another for $2 \le r \le \log_2 N - 3$. (We call a subgroup of loops at the $r$th decomposition step the $r$th subgroup.) After these $\log_2 N - 4$ loop decompositions as precoordination, we can match the loops to the cycles in the network without conflicts. For example, we illustrate the loop decomposition procedure of $N/8$ loops in the $32 \times 32$ Benes network in Fig. 4. Although an identity permutation is assumed in this example, in the general case, the numbers should be either input or output terminal numbers. During this loop decomposition procedure, the subgroups can be decomposed in different ways, and the number of ways depends on how the subgroups are composed, as illustrated in Fig. 4.

Definition 3.4. In an $r$th decomposition step, if all loops of a subgroup are $r$-adjacent with more than one loop in the same subgroup and if the subgroup does not include any loops that are $r$-adjacent with the loops that belong to another subgroup, we call the subgroup an independent subgroup. Since every loop of an $r$th subgroup has at least one loop that is $r$-adjacent with it, the number of loops composing an independent subgroup, called the size of an independent subgroup, is always even. In Fig. 4a, we show an independent subgroup composed of two loops, represented by solid lines.

950 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 10, NO. 9, SEPTEMBER 1999


Definition 3.5. A subgroup composed of $2p$ loops is denoted as $S_r = (k_1, k_2, k_3, \ldots, k_p)$, where $k_i$ is the number of independent subgroups composed of $2i$ loops in the loop decomposition. Here, $p = 2^{r-1}$ for $2 \le r \le \log_2 N - 3$, and $1 \cdot k_1 + 2 \cdot k_2 + \cdots + p \cdot k_p = p$. An instance is defined as a particular composition of loops which satisfies an $S_r$.
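The admissible tuples of Definition 3.5 can be listed mechanically. The following is a minimal sketch, assuming the constraint reads $1 \cdot k_1 + 2 \cdot k_2 + \cdots + p \cdot k_p = p$ with $p = 2^{r-1}$; the function name `enumerate_subgroup_types` is illustrative and does not come from the paper.

```python
def enumerate_subgroup_types(r):
    """List every tuple (k_1, ..., k_p) with 1*k_1 + 2*k_2 + ... + p*k_p == p,
    where p = 2**(r - 1), as in Definition 3.5."""
    p = 2 ** (r - 1)
    found = []

    def extend(i, remaining, prefix):
        # i is the subgroup size index currently being chosen (1..p).
        if i > p:
            if remaining == 0:
                found.append(tuple(prefix))
            return
        for k in range(remaining // i + 1):
            extend(i + 1, remaining - i * k, prefix + [k])

    extend(1, p, [])
    return found


print(sorted(enumerate_subgroup_types(2)))  # the two types for r = 2
print(len(enumerate_subgroup_types(3)))     # number of types for r = 3
```

For $r = 2$ the sketch yields exactly the two types $(2, 0)$ and $(0, 1)$, the ones the paper attaches to Figs. 4a and 4b.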

SEO ET AL.: PERMUTATION REALIZABILITY AND FAULT TOLERANCE PROPERTY OF THE INSIDE-OUT ROUTING ALGORITHM 951

Fig. 3. An example of a 1-outer group of cycles in a $16 \times 16$ Benes network.

Fig. 4. Examples of the loop decomposition procedure of $N/8$ loops in the $32 \times 32$ Benes network: (a) an example case with the maximal degree of freedom; (b) an example case with the minimal degree of freedom.

For example, Figs. 4a and 4b are particular instances of $S_2 = (2, 0)$ and $S_2 = (0, 1)$, respectively. Note that $S_r$ indicates how the independent subgroups are composed of the loops in the decision chart. The number of possible loop decompositions at each step is dependent on the number of independent subgroups of loops. It can also be seen that there exist two possible ways of assignment per independent subgroup, e.g., two decomposed subgroups in Fig. 4b can be assigned to a 1-outer group of cycles in Fig. 3 in two possible ways. Thus, when given an $S_r$, the total number of possible assignments between loops and cycles is given as $2^{K_n}$ in a subpermutation, where $K_n = \sum_{i=1}^{n} k_i$. Moreover, because the number of possible assignments is different depending on the composition of loops in the decision chart, we have to consider every possible instance of $S_r$ to calculate the overall possible number of assignments. The number of instances of an $S_r$ is calculated through two steps. In the first step, we select I/O-pairs in the decision chart for each independent subgroup. In the second step, we determine how to make an independent subgroup with the selected I/O-pairs. The number of possible selections for $4p$ I/O-pairs in the first step, symbolized by $\bar{S}_r$, is as follows:

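$$\bar{S}_r = \prod_{i=1}^{p} \frac{\left[\, {}_{s_i}P_{(k_i \cdot i)} \,/\, (i!)^{k_i} \right]^{2}}{k_i!}, \quad \text{where } s_i = p - \sum_{j=1}^{i-1} j \cdot k_j. \tag{2}$$

In the second step, the number of possible ways to make an independent subgroup with the selected I/O-pairs from the first step, denoted by $\hat{S}_r$, is as follows:

$$\hat{S}_r = \prod_{j=1}^{p} \left[ {}_{j}C_{1} \cdot 2 \cdot \prod_{i=1}^{j-1} \left( {}_{i}C_{1} \cdot 2 \right)^{2} \right]^{k_j}. \tag{3}$$

Therefore, the overall number of possible assignments at the rth step is given as:

$$D_r = \frac{\sum_{S_r} \left( \bar{S}_r \cdot \hat{S}_r \cdot 2^{K_p} \right)}{\left( \dfrac{N}{2^{\log_2 N - r}} \right)!}, \tag{4}$$

where $K_p = \sum_{i=1}^{p} k_i$ and

$$\left( \frac{N}{2^{\log_2 N - r}} \right)! = \sum_{S_r} \left( \bar{S}_r \cdot \hat{S}_r \right),$$

which is the total number of loop decompositions at the rth step.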
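The counts in (2)-(4) can be checked numerically for small r. The sketch below is only one reading of those equations: it assumes (2) is a product of squared permutation counts divided by $k_i!$, (3) is a product of bracketed terms raised to $k_j$, and (4) is the ratio of the $2^{K_p}$-weighted sum to the unweighted sum. All function names are illustrative, not the authors'.

```python
from fractions import Fraction
from math import comb, factorial, perm


def subgroup_types(p):
    """All (k_1, ..., k_p) with 1*k_1 + 2*k_2 + ... + p*k_p == p."""
    found = []

    def extend(i, remaining, prefix):
        if i > p:
            if remaining == 0:
                found.append(tuple(prefix))
            return
        for k in range(remaining // i + 1):
            extend(i + 1, remaining - i * k, prefix + [k])

    extend(1, p, [])
    return found


def selections(ks):
    """Assumed reading of (2): prod_i [P(s_i, i*k_i) / (i!)^k_i]^2 / k_i!."""
    p = len(ks)
    used = 0  # running value of sum_{j<i} j*k_j, so that s_i = p - used
    total = Fraction(1)
    for i, k in enumerate(ks, start=1):
        s_i = p - used
        inner = Fraction(perm(s_i, i * k), factorial(i) ** k)
        total *= inner ** 2 / factorial(k)
        used += i * k
    return total


def groupings(ks):
    """Assumed reading of (3): prod_j [C(j,1)*2 * prod_{i<j} (C(i,1)*2)^2]^k_j."""
    total = 1
    for j, k in enumerate(ks, start=1):
        base = comb(j, 1) * 2
        for i in range(1, j):
            base *= (comb(i, 1) * 2) ** 2
        total *= base ** k
    return total


def decomposition_stats(r):
    """Return (sum of S-bar * S-hat, D_r) for step r, with (4) read as a ratio."""
    p = 2 ** (r - 1)
    plain = Fraction(0)
    weighted = Fraction(0)
    for ks in subgroup_types(p):
        instances = selections(ks) * groupings(ks)
        plain += instances
        weighted += instances * 2 ** sum(ks)
    return plain, weighted / plain


for r in (2, 3):
    total, d_r = decomposition_stats(r)
    print(r, total == factorial(2 ** r), d_r)
```

Under this reading the unweighted sum equals $(2^r)!$ for $r = 2$ and $r = 3$, which is the identity stated after (4), and $D_2$ evaluates to $8/3$.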

Theorem 3.2. The upper bound on the number of possible realizations of a permutation in the Benes network for $N \ge 8$ when the fourth rule is considered is given as

$$\underbrace{2^{N/2}}_{(1)} \; \underbrace{\left\{ \left(\tfrac{N}{8}\right)! \right\}^{4}}_{(2)} \; \underbrace{\left\{ 2^{\log_2 N - 3} \cdot (D_2)^{\log_2 N - 4} \cdots (D_{\log_2 N - 4})^{2} (D_{\log_2 N - 3}) \right\}^{4}}_{(3)} \; \underbrace{2^{N/2}}_{(4)} = 2^{N} \left\{ \left(\tfrac{N}{8}\right)! \cdot \tfrac{N}{8} \cdot \prod_{i=2}^{\log_2 N - 3} (D_i)^{\log_2 N - i - 2} \right\}^{4}. \tag{5}$$

Proof. Among the four factors in Theorem 3.1, factor (3) needs to be modified by considering the fourth rule of the inside-out routing algorithm. Since factor (3) can easily be proven using (4), duplicated proofs are omitted here. The factor $2^{\log_2 N - 3}$ indicates the number of alternate ways that the two loops of the last subgroup are assigned to a 1-outer group of cycles. The other factors are the same as in Theorem 3.1. □
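As a worked check of (5) for the two smallest nontrivial sizes, the evaluation below assumes that $D_2$ is obtained from (4) as a ratio over the two instances $S_2 = (2, 0)$ and $S_2 = (0, 1)$, whose products $\bar{S}_2 \cdot \hat{S}_2$ come out as 8 and 16 under that reading.

```latex
% N = 16: \log_2 N = 4, so the product over i in (5) is empty.
\[
2^{16}\Bigl\{2!\cdot 2\Bigr\}^{4} = 2^{16}\cdot 2^{8} = 2^{24}.
\]
% N = 32: only r = 2 occurs (p = 2), with S_2 \in \{(2,0),\,(0,1)\}.
\[
D_2 = \frac{8\cdot 2^{2} + 16\cdot 2^{1}}{4!} = \frac{64}{24} = \frac{8}{3},
\qquad
2^{32}\Bigl\{4!\cdot 4\cdot D_2\Bigr\}^{4}
  = 2^{32}\bigl(2^{8}\bigr)^{4} = 2^{64}.
\]
```

Both values agree with the figures the paper reports for $N = 16$ and $N = 32$.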

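Theorem 3.3. The lower bound on the number of possible realizations of a permutation in the Benes network for $N \ge 8$ is given as

$$\underbrace{2}_{(1)} \; \underbrace{\left\{ \left(\tfrac{N}{8}\right)! \right\}^{2}}_{(2)} \; \underbrace{2^{N/4}}_{(3)} = 2^{N/4 + 1} \left\{ \left(\tfrac{N}{8}\right)! \right\}^{2}. \tag{6}$$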

Proof. A minimal number of realizations can be obtained by considering only three factors (1), (3), and (4) in Theorem 3.1. In this case, selection of the elements to make loops in the decision chart may not be arbitrary; instead, loops are made by following a certain sequence of numbers without the freedom to match the cycles in the network.

a) Factor (1): The number 2 implies that the "U"-"L" assignment in the decision chart can be made in only two possible ways, making a single circle.

b) Factor (2): The cycles in the network can be selected to match the loops in $\{(N/8)!\}^{2}$ different ways.

c) Factor (3): Within each of the cycles, a loop can be assigned in two different ways, which leads to $2^{N/4}$ possible assignments for $N/4$ cycles. □
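The three factors of (6) are simple enough to evaluate directly. The following is a small sketch, assuming the factors are exactly those listed in the proof; `benes_lower_bound` is an illustrative name.

```python
from math import factorial


def benes_lower_bound(n):
    """Evaluate the three factors of (6) for an n x n Benes network (n a power of two, n >= 8)."""
    if n < 8 or n & (n - 1):
        raise ValueError("n must be a power of two with n >= 8")
    u_l_choices = 2                          # factor (1): two "U"-"L" assignments
    cycle_matchings = factorial(n // 8) ** 2  # factor (2): {(N/8)!}^2
    loop_orientations = 2 ** (n // 4)         # factor (3): two ways per cycle, N/4 cycles
    return u_l_choices * cycle_matchings * loop_orientations


for n in (8, 16, 32):
    print(n, benes_lower_bound(n))
```

For $N = 8$ and $N = 16$ this gives 8 and 128, matching the closed form $2^{N/4+1}\{(N/8)!\}^{2}$.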

We compare the permutation realizability of the inside-out algorithm with that of the well-known looping algorithm. In the looping algorithm, at each level of network decomposition, there are two choices in assigning the first input/output pair in a loop; namely, the looping algorithm has two degrees of freedom in a loop at each nesting level for a given permutation. Since the looping algorithm has to iterate $\log_2 N - 1$ times in the Benes network, the minimal number of realizations of a permutation is given as $2^{1+2+4+\cdots+N/4} = 2^{N/2 - 1}$, while the maximal number is $2^{(N/2)(\log_2 N - 1)}$. The minimal number of permutation realizations occurs when there are $2^{i-1}$ distinct loops at each level $i$ for $1 \le i \le \log_2 N - 1$, while the maximal number occurs when there are $N/2$ loops at each level. This implies that some permutations can be realized in $2^{N/2 - 1}$ different ways, and some other permutations in $2^{(N/2)(\log_2 N - 1)}$ ways.

We compare Theorem 3.2 with the upper bound of the looping algorithm by calculating for some cases, i.e., $N = 8$, 16, 32, and 64. As a result of the calculations, both of the algorithms show equal bounds on the realizability of a permutation. For the above cases, the calculated results are $2^{8}$, $2^{24}$, $2^{64}$, and $2^{160}$, respectively. Thus, the realizability of the inside-out routing algorithm is comparable to that of the looping algorithm for the above cases. For larger-size networks, $N \ge 128$, however, the calculation of the equation in Theorem 3.2 is computationally intensive, so direct comparison is quite challenging.
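The looping-algorithm bounds quoted above are easy to tabulate. The sketch below works with exponents of 2 only; the helper name `looping_bounds_log2` is illustrative.

```python
def looping_bounds_log2(n):
    """Return (min_exp, max_exp) so that the looping algorithm realizes a
    permutation in between 2**min_exp and 2**max_exp ways on an n x n Benes network."""
    if n < 4 or n & (n - 1):
        raise ValueError("n must be a power of two with n >= 4")
    levels = n.bit_length() - 2  # log2(n) - 1 nesting levels
    # Minimal case: 2**(i-1) loops at level i, one binary choice per loop.
    min_exp = sum(2 ** (i - 1) for i in range(1, levels + 1))
    # Maximal case: n/2 loops at every level.
    max_exp = (n // 2) * levels
    return min_exp, max_exp


for n in (8, 16, 32, 64):
    print(n, looping_bounds_log2(n))
```

The maximal exponents for $N = 8$, 16, 32, and 64 are 8, 24, 64, and 160, and the minimal exponent always equals $N/2 - 1$.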

3.2 Omega$^{-1}$ + Omega Network

Now, we consider the realizability of the inside-out routing algorithm in the omega$^{-1}$ + omega network. Considering that the center stages are composed of 8- and 16-element cycles in the omega$^{-1}$ + omega network, we can obtain the bounds as follows:

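Theorem 3.4. The upper bound on the number of possible realizations of a given permutation in the omega$^{-1}$ + omega network for $N \ge 16$ is given as follows:

$$\underbrace{2^{N/2}}_{(1)} \; \underbrace{\left\{ \prod_{i=0}^{N/16 - 1} \left\{ 2 \binom{N/8 - 2i}{2} \right\} \right\}^{4}}_{(2)} \; \underbrace{\left\{ \left(\tfrac{N}{16}\right)! \right\}^{4}}_{(3)} \; \underbrace{(2^{2})^{N/4}}_{(4)} = 2^{N + N/4} \left\{ \left(\tfrac{N}{16}\right)! \right\}^{4} \left\{ \prod_{i=0}^{N/16 - 1} \left\{ 2 \binom{N/8 - 2i}{2} \right\} \right\}^{4}. \tag{7}$$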


Proof. a) Factor (1): The "U" and "L" decomposition may be made in a maximum of $2^{N/2}$ possible ways in the decision chart, which is the same as in the Benes network. b) Factor (2): In a maximal case, all of the loops are 4-I/O pair loops, for which each group of four pairs must be selected at one quadrant. The product term is similar to a factorial. c) Factor (3): In the maximal case, a 16-element cycle can be split into two 8-element cycles, and the 8-element cycles can be arranged in $(N/16)!$ different ways. d) Factor (4): Each loop can be matched to a cycle in $2^{2}$ different ways. □

In a similar manner as in the Benes network, we can get a tighter bound on the possible realizations by considering the fourth rule of the inside-out routing algorithm.

Theorem 3.5. The upper bound on the number of possible realizations of a permutation in the omega$^{-1}$ + omega network for $N \ge 8$ when the fourth rule is considered is modified as

$$\underbrace{2^{N/2}}_{(1)} \; \underbrace{\left\{ \frac{(N/8)!}{(2!)^{\log_2 N - 4}} \right\}^{4}}_{(2)} \; \underbrace{\left\{ (2!)^{\log_2 N - 4} \cdot (D_3)^{\log_2 N - 5} \cdots (D_{\log_2 N - 4})^{2} (D_{\log_2 N - 3}) \right\}^{4}}_{(3)} \; \underbrace{(2^{2})^{N/4}}_{(4)} = 2^{N + N/16} \left\{ \left(\tfrac{N}{8}\right)! \cdot \prod_{i=3}^{\log_2 N - 3} (D_i)^{\log_2 N - i - 2} \right\}^{4}. \tag{8}$$

Proof. To reflect the fourth rule, the factors (2) and (3) in Theorem 3.4 need to be modified. a) Factor (2): For the precoordination of the loops according to the fourth rule, we assume that the loops are 2-I/O pair loops, each of which can be precoordinated, and after that, we can combine two 2-I/O pair loops, making a 4-I/O pair loop to match to an 8-element cycle. Under the above assumptions, factor (2) can be obtained in a similar manner as in the Benes network. Regarding the denominator, we divide the number of selections of 2-I/O pair loops by $(2!)^{\log_2 N - 4}$ to eliminate duplicate cases. b) Factor (3): Precoordinating 4-I/O pair loops requires only $(\log_2 N - 4)$ steps of decomposition, not $(\log_2 N - 3)$ steps as in the Benes network. A factor $(2!)^{\log_2 N - 4}$ indicates that a 4-I/O pair loop can be made by combining two 2-I/O pair loops in two different ways. Note that although the factor $(2!)^{\log_2 N - 4}$ in (b) as well as in (c) is included twice independently, they are cancelled out automatically when calculating the overall equation. The other factors are the same as in Theorem 3.4. □

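Theorem 3.6. The lower bound on the number of possible realizations of a given permutation in the omega$^{-1}$ + omega network for $N \ge 16$ is given as follows:

$$2^{2} \left\{ \left(\tfrac{N}{16}\right)! \right\}^{2} \left\{ (2^{2})^{N/16} \right\}^{2} = 2^{N/4 + 2} \left\{ \left(\tfrac{N}{16}\right)! \right\}^{2}. \tag{9}$$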

Proof. The minimum degree of freedom is obtained when both of the following conditions are met. First, in "U" and "L" assignments, there are just two possibilities. Second, none of the 16-element cycles are split into 8-element cycles. The other parts of the proof are similar to those above. □
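The two sides of (9) can be compared numerically. The following is a minimal sketch, assuming the left-hand side is $2^{2}\{(N/16)!\}^{2}\{(2^{2})^{N/16}\}^{2}$; the function name `omega_lower_bound` is illustrative.

```python
from math import factorial


def omega_lower_bound(n):
    """Evaluate both sides of (9) for an n x n omega^-1 + omega network
    (n a power of two, n >= 16) and return the common value."""
    if n < 16 or n & (n - 1):
        raise ValueError("n must be a power of two with n >= 16")
    cycles = factorial(n // 16)
    factored = 2 ** 2 * cycles ** 2 * ((2 ** 2) ** (n // 16)) ** 2
    closed = 2 ** (n // 4 + 2) * cycles ** 2
    if factored != closed:
        raise AssertionError("the two sides of (9) disagree")
    return closed


for n in (16, 32, 64):
    print(n, omega_lower_bound(n))
```

For $N = 16$ the bound is $2^{6} = 64$, and for $N = 32$ it is $2^{10} \cdot (2!)^{2} = 4096$.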

In particular, when $N = 8$, an 8-element cycle can be divided into two 4-element cycles in the omega$^{-1}$ + omega network. In this case, the upper and lower bounds differ from the cases for $N \ge 16$. Instead, they are similar to those for the Benes network.

4 TOLERANCE OF CONTROL FAULTS AT CENTER STAGES

In this and the following sections, we discuss how the enumerated alternate realizations of a permutation can be used for fault tolerance. We examine the tolerance of control faults that occur at the center and other stages. We first define some fault models. In [11], some fault detection and fault location techniques for the Benes network are presented, and in [16], a general fault model as well as single and multiple fault detection schemes are proposed. In general, faults can occur in SEs or links, and there are two types of faults in SEs: a control fault and an SE fault. With the control fault, an SE is stuck at straight (s-a-s) or at cross (s-a-c) regardless of its control signal, while with the SE fault, an SE cannot be operated. In this paper, we focus on the tolerance of control faults because we are mostly interested in the permutation realizability of rearrangeable networks. In a rearrangeable network, if any of the SEs does not function, the permutation realizability of the network can be broken. In a $2 \times 2$ SE, there are 16 different states as modeled in [16]. However, among these 16 possible states, we only need to consider two states for a control fault, i.e., s-a-s and s-a-c, since the other states are invalid for the proper operation of the network.

Throughout the following part of this paper, we use the same example as in [1] for the $16 \times 16$ omega$^{-1}$ + omega network to illustrate the permutation realizability and fault-tolerant properties of the inside-out algorithm. In the example, the network is routed for a given permutation $P$.

$$P = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 \\ 7 & 2 & 5 & 15 & 3 & 9 & 12 & 8 & 14 & 0 & 4 & 13 & 6 & 11 & 1 & 10 \end{pmatrix}.$$

The network, which is completely routed, is shown in Fig. 11 in [1]. The complete 8-I/O pair loops, which match 16-element cycles at the center stages, are shown in Chart 2 in [1] using the directed arrows.
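For experimenting with the running example, the permutation $P$ can be held as a plain list indexed by input terminal. The following is a minimal sketch; the helper names `is_permutation` and `inverse` are illustrative and not part of the paper.

```python
# Bottom row of P: output terminal assigned to each input terminal 0..15.
P = [7, 2, 5, 15, 3, 9, 12, 8, 14, 0, 4, 13, 6, 11, 1, 10]


def is_permutation(mapping):
    """True when the list maps 0..n-1 one-to-one onto 0..n-1."""
    return sorted(mapping) == list(range(len(mapping)))


def inverse(mapping):
    """Return the list that maps each output terminal back to its input terminal."""
    inv = [0] * len(mapping)
    for src, dst in enumerate(mapping):
        inv[dst] = src
    return inv


print(is_permutation(P))
print(inverse(P))
```

The inverse list is handy when a routing step needs to find which input terminal is paired with a given output terminal.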

Definition 4.1. A finite state $Q$ represents the state of a network and a finite state $F$ represents the state of each SE of the network. In particular, the SEs in the network are represented as 4-tuples $F(i, j, q_0, q)$, where $1 \le i \le 2\log_2 N$, $0 \le j \le N/2 - 1$, and $i, j$ denote the jth SE at the ith stage. Also, $q_0$ is the desired state of an SE that can be either 0 (straight) or 1 (cross), and $q$ is a state after a fault, which can be 0 or 1 or the inoperative state $\phi$. A fault is said to occur if $q_0 \ne q$ under a certain control signal. Thus,

$$Q = \{ F(i, j, q_0, q) \mid 1 \le i \le 2\log_2 N - 1,\ 0 \le j \le N/2 - 1 \},$$

where $q_0 \in \{0, 1\}$ and $q \in \{0, 1, \phi\}$. Depending on the type of faults, the state $q$ may transit to either 0 or 1 when control faults occur and may transit to the inoperative state $\phi$ when an SE fault occurs.
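The 4-tuple of Definition 4.1 maps naturally onto a small record type. The sketch below is an illustration only: the class name `SEState`, its field names, and the use of `None` for the inoperative state are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional

STRAIGHT, CROSS = 0, 1


@dataclass(frozen=True)
class SEState:
    """One F(i, j, q0, q) tuple: SE j at stage i, desired state q0, observed state q."""
    stage: int              # i
    index: int              # j
    desired: int            # q0: STRAIGHT or CROSS
    actual: Optional[int]   # q: STRAIGHT, CROSS, or None for the inoperative state

    def is_fault(self):
        """A fault occurs when the observed state differs from the desired one."""
        return self.actual != self.desired

    def kind(self):
        """Classify the tuple as fault-free, an SE fault, or a stuck-at control fault."""
        if not self.is_fault():
            return "fault-free"
        if self.actual is None:
            return "SE fault"
        return "s-a-s" if self.actual == STRAIGHT else "s-a-c"


print(SEState(2, 6, STRAIGHT, CROSS).kind())   # the fault written F(2, 6, 0, 1)
print(SEState(1, 5, STRAIGHT, None).kind())
```

With this encoding, a tuple such as $F(2, 6, 0, 1)$ is a control fault in which the SE is stuck at cross while straight is required.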

In [18], fault tolerance at the center stage of the Benes network was considered using the alternate realizations in the looping algorithm. Since the direction of routing in the looping algorithm is inward, and the center stage in a $(2\log_2 N - 1)$-stage network cannot be treated by itself, the faulty SE must be projected to the first nesting level, i.e., input and output terminal ports. However, as the direction of the inside-out algorithm is outward, we can separate the center and remaining stages, and thus route them independently. This property enables us to separate the fault tolerance at center stages from a network. To examine that, we first need to investigate the connection properties of the center stages with various types of center-stage links. Among them, two fundamental types of center-stage link patterns are considered: straight and butterfly. These are shown in Fig. 5; other types are variations of these two types. Let us consider some pairs in an arbitrary permutation where $y_i = \pi(x_i)$ for $0 \le i \le N - 1$, and assume that there is no specified order among the $x_i$ and the $y_i$.

$$P = \begin{pmatrix} \cdots & x_p & \cdots & x_q & \cdots & x_r & \cdots & x_s & \cdots \\ \cdots & y_p & \cdots & y_q & \cdots & y_r & \cdots & y_s & \cdots \end{pmatrix}$$

Assume also that location addresses at the center stages are given as in Fig. 5. The addresses $a_i$, $a_j$, $a_k$, and $a_l$ correspond to the input side and the addresses with a prime ($'$) correspond to the output side. As a result of assignments, each address is assumed to have a proper value for conflict-free routing. We only examine the fault tolerance property in a butterfly; the straight connection follows directly.

4.1 Faults at Center Stages with Butterfly Connections

Consider a $2\log_2 N$-stage network where there are $N/4$ butterflies at the center stages. In each of the $N/4$ butterflies, assume that four pairs of values are initially assigned to the corresponding addresses as in Fig. 5b: at $t = L_l$, $(\sigma_t(a_i) = x_p,\ \sigma_t(a_j) = x_q,\ \sigma_t(a_k) = x_r,\ \sigma_t(a_l) = x_s)$ and, at $t = L_r$, $(\sigma_t(a'_i) = y_p,\ \sigma_t(a'_j) = y_q,\ \sigma_t(a'_k) = y_r,\ \sigma_t(a'_l) = y_s)$. Note that this assignment corresponds to the $(P_0, P_1) = (\{X^{U}_{U}\}, \{X^{L}_{L}\})$ decomposition. It should also be noted here that $\mathrm{LSB}(x_p, x_r, y_p, y_r) = 0$ and $\mathrm{LSB}(x_q, x_s, y_q, y_s) = 1$, where $\mathrm{LSB}(x, y)$ denotes the least significant bits in the binary representation of $x$ and $y$. In connecting the center stages through the butterfly, eight different interchanges are possible, and one of them is shown below:

at $t = L_l$: $(\sigma_t(a_i) = x_r,\ \sigma_t(a_j) = x_q,\ \sigma_t(a_k) = x_p,\ \sigma_t(a_l) = x_s)$.

As an example, these eight types of interchanges are shown in Fig. 6, where Fig. 6o is the original assignment. Three different possible initial assignments may also be considered according to the characteristics of a given permutation:

at $t = L_l$: $(\sigma_t(a_i) = x_p,\ \sigma_t(a_j) = x_q,\ \sigma_t(a_k) = x_r,\ \sigma_t(a_l) = x_s)$ and at $t = L_r$: $(\sigma_t(a'_i) = y_q,\ \sigma_t(a'_j) = y_p,\ \sigma_t(a'_k) = y_r,\ \sigma_t(a'_l) = y_s)$;

at $t = L_l$: $(\sigma_t(a_i) = x_p,\ \sigma_t(a_j) = x_q,\ \sigma_t(a_k) = x_r,\ \sigma_t(a_l) = x_s)$ and at $t = L_r$: $(\sigma_t(a'_i) = y_p,\ \sigma_t(a'_j) = y_q,\ \sigma_t(a'_k) = y_s,\ \sigma_t(a'_l) = y_r)$;

at $t = L_l$: $(\sigma_t(a_i) = x_p,\ \sigma_t(a_j) = x_q,\ \sigma_t(a_k) = x_r,\ \sigma_t(a_l) = x_s)$ and at $t = L_r$: $(\sigma_t(a'_i) = y_q,\ \sigma_t(a'_j) = y_p,\ \sigma_t(a'_k) = y_s,\ \sigma_t(a'_l) = y_r)$.

We denote each category as $C_0$, $C_1$, $C_2$, and $C_3$, respectively, where $C_0$ is the original assignment. We did not specify any order in the numbers, and the above four initial assignments have been made by considering only the $(P_0, P_1) = (\{X^{U}_{U}\}, \{X^{L}_{L}\})$ decomposition. Similarly, the initial assignments for $(P_0, P_1) = (\{X^{U}_{L}\}, \{X^{L}_{U}\})$ are also possible.

Fig. 6 demonstrates that any single control fault (either s-a-s or s-a-c) at center stages can be tolerated by applying different interchanges. For rerouting the entire network after a center-stage fault, we use the principle of interchangeable groups shown in [1], i.e., before finalizing assignments, some initial assignments need to be interchanged with each other so that the center-stage butterflies can be routed. In this principle, simultaneous interchanges of the proper values in all $N/4$ butterflies at the center stages of a $2\log_2 N$-stage network, without mixing input and output, do not affect the routing property of the network. This principle consequently allows the omega$^{-1}$ + omega network to tolerate any single center-stage control fault among $N/4$ butterflies.

However, in general, when multiple faults occur in a butterfly, they cannot be tolerated, while in some cases multiple faults can also be tolerated. In a $2\log_2 N$-stage network where the center stages are connected by butterflies, SE setting patterns at the center stages are the combination of those among four categories $C_0$, $C_1$, $C_2$, and $C_3$. It must be noted that the way in which categories $C_0$, $C_1$, $C_2$, and $C_3$ are employed for a combination is entirely up to the permutation itself. In an extreme case, if the patterns are composed of those from only one category, multiple faults cannot be tolerated. If the patterns are a combination from every category, then any type of double SE control faults located in the same butterfly can be tolerated.

Fig. 5. Center-stage patterns: (a) straight, (b) butterfly.

5 TOLERANCE OF CONTROL FAULTS AT OTHER STAGES

Next, we show how the algorithm can provide fault tolerance for the control faults at any stage other than the center stages. Since we are discussing the fault tolerance in inherently symmetric networks (both the center stages and two half-networks are symmetric at the same time), we can confine our discussion to the left-half network. Intuitively, due to the outward routing direction of the algorithm, the problem is related to connectivity or reachability from center-stage addresses to arbitrary SEs at an arbitrary stage. These connectivity properties can be represented by one of the representation schemes, e.g., the topology describing rule given in [2]. For convenience in this section, we consider the omega$^{-1}$ + omega network again. However, similar fault-tolerance schemes can also be applied to any other networks in the class.

5.1 Single Fault at Intermediate Stages

We first demonstrate that a single control fault $F(i, j, q_0, q)$ at an intermediate stage, where $2 \le i \le \log_2 N - 1$ and $0 \le j \le N/2 - 1$, can be tolerated by using an example rather than relying on a formal proof.

In Fig. 7, the tolerance of a single fault $F(2, 6, 0, 1)$ in the $16 \times 16$ omega$^{-1}$ + omega network is demonstrated. This example also shows a simple rerouting scheme when a faulty SE is found at an intermediate stage. Comparing Fig. 7 to the originally routed network in Fig. 11 in [1], we can see that in the left-half network, settings of the SEs (010), (011), (110), (111) at the third stage and the SEs (100), (101), (110), (111) at the second stage, which are connected to the center-stage SEs with position indicator "L," are complemented simultaneously. The same is also true for the right-half network.

5.2 Single Fault at the First Stage

Let us consider $F(1, j, q_0, q)$, i.e., a fault that occurred at the first stage of the omega$^{-1}$ + omega network as shown in Fig. 8. Since the algorithm first decomposes a given permutation $P$ into two subpermutations, $(P_0, P_1) = (\{X^{U}_{U}\}, \{X^{L}_{L}\})$ or $(P_0, P_1) = (\{X^{U}_{L}\}, \{X^{L}_{U}\})$ according to center-stage patterns, it is obvious that a single control fault can be tolerated through the interchange of subpermutations between $P_0$ and $P_1$, i.e., by interchanging the first-level decompositions.

When the first-level decompositions are interchanged, the entire SE setting pattern at the first nesting level is complemented, compared to the original settings. Fig. 8 shows an example of $F(1, 5, 0, 1)$. The example demonstrates that the subpermutation $P_0$ is assigned from the initial address (0010) at the input side and that $P_1$ is assigned from (0000) at the input side as the result of an interchange between them.
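The complementing effect of the interchange can be illustrated in a few lines. This is a sketch under stated assumptions: the list `original` holds hypothetical first-stage settings for a $16 \times 16$ network and is not read from Fig. 8, and both function names are illustrative.

```python
def interchange_first_level(settings):
    """Complement every first-stage SE setting (0 = straight, 1 = cross),
    which is the effect of interchanging P0 and P1."""
    return [1 - s for s in settings]


def usable_with_stuck_se(settings, j, stuck_at):
    """True when SE j is already required to take the state it is stuck at."""
    return settings[j] == stuck_at


# Hypothetical first-stage settings (8 SEs); chosen so that SE 5 should be straight.
original = [1, 0, 0, 1, 1, 0, 1, 0]
j, stuck_at = 5, 1  # fault F(1, 5, 0, 1): SE 5 should be straight but is stuck at cross

print(usable_with_stuck_se(original, j, stuck_at))
print(usable_with_stuck_se(interchange_first_level(original), j, stuck_at))
```

Because the interchange flips every first-stage setting, an SE whose required state conflicts with its stuck state before the interchange agrees with it afterward, which is why a single first-stage control fault can be masked this way.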

6 CONCLUSION

Permutation realizability of the inside-out routing algorithm in a class of $2\log_2 N$- or $(2\log_2 N - 1)$-stage rearrangeable networks has been discussed in this paper. So far, only a class of Benes equivalent networks on this subject have been successfully investigated in the literature. As shown in this paper, the results from the application of the inside-out algorithm are very productive. Besides the generality, upper and lower bounds on the number of permutation realizations obtained by applying the algorithm show that a comparable degree of freedom can be obtained as in the case of the looping algorithm. In addition, since routing proceeds outward, it is now possible to separate the center stages and the remaining half networks so that fault tolerance of the two parts may be dealt with independently. The results show that a single control fault can be tolerated by changing the initial position in the cycle, by changing the rotating direction in the assignment, or by interchanging the first-level two-way decompositions. Although the fault tolerance was discussed mainly with the omega$^{-1}$ + omega network, the same scheme may also be applied to other networks, including the Benes equivalent networks.

Fig. 6. $C_0$: Tolerance of a single fault at center-stage butterflies.

ACKNOWLEDGMENTS

The authors thank the referees for helpful suggestions that improved the quality of this paper. The authors wish to acknowledge the financial support of the Korea Research Foundation made in the program year of 1997.

REFERENCES

[1] T. Feng and S.-W. Seo, "A New Routing Algorithm for a Class of Rearrangeable Networks," IEEE Trans. Computers, vol. 43, no. 11, pp. 1,270-1,280, Nov. 1994.

[2] C.L. Wu and T. Feng, "On a Class of Multistage Interconnection Networks," IEEE Trans. Computers, vol. 29, no. 8, pp. 694-702, Aug. 1980.

[3] D.H. Lawrie, "Access and Alignment of Data in an Array Processor," IEEE Trans. Computers, vol. 24, no. 12, pp. 1,145-1,155, Dec. 1975.

[4] J. Lenfant, "Parallel Permutations of Data: A Benes Network Control Algorithm for Frequently Used Permutations," IEEE Trans. Computers, vol. 27, no. 7, pp. 637-647, July 1978.

[5] G.B. Adams, III and H.J. Siegel, "The Extra Stage Cube: A Fault Tolerant Interconnection Network for Supersystems," IEEE Trans. Computers, vol. 31, no. 5, pp. 443-454, May 1982.


Fig. 7. Rerouting after the fault $F(2, 6, 0, 1)$.

Fig. 8. Rerouting after the fault $F(1, 5, 0, 1)$.


[6] N.-F. Tzeng, P.-C. Yew, and C.-Q. Zhu, "A Fault-Tolerant Scheme for Multistage Interconnection Networks," Proc. Int'l Conf. Parallel Processing, vol. 1, pp. 368-375, 1985.

[7] K. Padmanabhan and D.H. Lawrie, "A Class of Redundant Path Multistage Interconnection Networks," IEEE Trans. Computers, vol. 32, no. 12, pp. 1,099-1,107, Dec. 1983.

[8] V. Benes, Mathematical Theory of Connecting Networks. New York: Academic Press, 1965.

[9] A.E. Joel, Jr., "On Permutation Switching Networks," Bell System Technical J., pp. 813-822, May/June 1968.

[10] A. Waksman, "A Permutation Network," J. ACM, vol. 15, no. 1, pp. 159-163, Jan. 1968.

[11] D.C. Opferman and N.T. Tsao-Wu, "On a Class of Rearrangeable Switching Networks," Bell System Technical J., vol. 50, no. 5, pp. 1,579-1,600, May/June 1971.

[12] S. Andresen, "The Looping Algorithm Extended to Base $2^t$ Rearrangeable Switching Networks," IEEE Trans. Comm., vol. 25, pp. 1,057-1,063, Oct. 1977.

[13] Y.-M. Yeh and T. Feng, "On a Class of Rearrangeable Networks," IEEE Trans. Computers, vol. 41, no. 11, pp. 1,361-1,379, Nov. 1992.

[14] M.K. Kim, H. Yoon, and S.R. Maeng, "On the Correctness of Inside-Out Routing Algorithm," IEEE Trans. Computers, vol. 46, no. 7, pp. 820-823, July 1997.

[15] D.K. Pradhan, "Fault-Tolerant Computing: Theory and Techniques," vol. II, pp. 492-496, chapter 6. Prentice Hall, 1986.

[16] T. Feng and C.L. Wu, "Fault-Diagnosis for a Class of Multistage Interconnection Networks," IEEE Trans. Computers, vol. 30, no. 10, pp. 743-758, Oct. 1981.

[17] Y.M. Yeh and T. Feng, "Fault Tolerant Routing on a Class of Rearrangeable Networks," Proc. Int'l Conf. Parallel Processing, vol. 1, pp. 305-312, 1992.

[18] S.K. Wysham and T. Feng, "On Routing a Faulty Benes Network," Proc. Int'l Conf. Parallel Processing, vol. 1, pp. 351-354, 1990.

Seung-Woo Seo received the BS and MS degrees from Seoul National University, Seoul, Korea, both in electrical engineering, in 1987 and 1989, respectively. He received his PhD degree from the Department of Electrical and Computer Engineering, the Pennsylvania State University, University Park, in 1993. He was on the faculty of the Department of Computer Science and Engineering at the Pennsylvania State University from 1993 to 1994, and was a member of the research staff in the Department of Electrical Engineering at Princeton University, Princeton, New Jersey. Since 1996, he has been an assistant professor with the School of Electrical Engineering, Seoul National University, Seoul, Korea. His research interests include high speed networks, interconnection networks, switching systems, optical communication, and performance evaluation. He was the recipient of the U.S. National Science Foundation Postdoctoral Fellowship in 1995. He is a member of the IEEE.

Tse-yun Feng (S'61-M'67-SM'75-F'80) received the BS degree from the National Taiwan University, Taipei, Taiwan, the MS degree from Oklahoma State University, Stillwater, and the PhD degree from the University of Michigan, Ann Arbor, all in electrical engineering. He was on the faculty of the Department of Electrical and Computer Engineering, Syracuse University, Syracuse, New York, Wayne State University, Detroit, Michigan, and the Department of Computer and Information Science, Ohio State University, Columbus, Ohio. He is now a binder professor of computer engineering at The Pennsylvania State University, University Park. He has extensive technical publications in the areas of parallel and concurrent processors, interconnection networks, computer architecture, and switching theory. Dr. Feng was editor-in-chief of the IEEE Transactions on Computers from 1982-1986 and of the IEEE Transactions on Parallel and Distributed Systems from 1990-1993. He has edited a number of Sagamore Computer Conference on Parallel Processing Proceedings and Interconnection Networks for Parallel and Distributed Processing. In addition, he has also edited special issues for the IEEE Transactions on Computers, ACM Computing Surveys, and the Proceedings of the IEEE. He has been an invited speaker to various organizations and served as a consultant or reviewer to several companies and publishers. He has received a number of awards and honorary recognitions for his technical contributions and scholarship. In 1994, he was inducted as a fellow of the ACM. Dr. Feng has also been active professionally. He was president of the IEEE Computer Society, 1979-1980, chairman of the International Conference on Computers and Applications, among others. He was conference chairman of the International Conference on Parallel Processing for 25 years. He is a fellow of the IEEE.

Hyoung-Il Lee received his BS and MS degrees in the School of Electrical Engineering, both from Seoul National University, Korea, in 1997 and 1999, respectively. He is currently working toward the PhD degree at the same university. His research interests are interconnection networks, high-speed switch architectures, and routing in the Internet.
