A link-disjoint subcube for processor allocation in hypercube computers

ELSEVIER Parallel Computing 22 (1997) 1579-1595

PARALLEL COMPUTING


Jong-Uk Kim ¹, Kyu-Hyun Shim ², Kyu Ho Park *

Department of Electrical Engineering, Korea Advanced Institute of Science and Technology, 373-1, Kusong-Dong, Yusong-Gu, Taejon, 305-701, South Korea

Received 14 August 1995; revised 30 August 1996

Abstract

We propose a new type of subcube, called the link-disjoint subcube (LS), which can be used for the subcube allocation problem in hypercube computers. A link-disjoint subcube is not a contiguous subcube as in previous schemes, but it still shares no communication link with any other subcube. When link-disjoint subcubes are used, the performance degradation caused by non-contiguous processor allocation is below 1.0% in many cases. With link-disjoint subcubes, $\lfloor n/2 \rfloor \binom{n-2}{k-1} 2^{n-k}$ k-dimensional LSs are recognizable in an n-dimensional hypercube. The number of recognizable subcubes under our allocation scheme is $(\lfloor n/2 \rfloor (n-k)k + n(n-1))/(n(n-1))$ times that under previous schemes. For example, in 10-dimensional hypercube computers the number of recognizable subcubes is up to 2.39 times that of contiguous subcubes. Through simulation, the performance of our scheme is measured and compared with previous schemes in terms of processor utilization and waiting delay. The simulation results show that LSs improve the performance of our allocation scheme.

Keywords: Subcube allocation; Link-disjoint subcube; Circuit switching; Worm-hole routing

1. Introduction

A preliminary version of this article appeared in the Proceedings of the 12th Annual International Phoenix Conference on Computers and Communications [6]. Copyright 1993 IEEE.

* Corresponding author. Email: [email protected].

¹ Current address: Computer System Dept., Daewoo Telecom Co., Ltd., 360-3, Daeya-dong, Shihung-si, Kyonggi-do, 429-010, South Korea.

² Email: [email protected].

0167-8191/97/$17.00 © 1997 Published by Elsevier Science B.V. All rights reserved.

PII S0167-8191(96)00057-9


The processor allocation problem is one of the most important research topics for hypercube computers running in a multiuser environment. Since Chen and Shin [1] provided frameworks and algorithms for this problem, much research effort has been devoted to it. In a multiuser environment, an incoming job may request any number of processors up to the size of the hypercube. Several processor allocation policies have been reported in the literature; most of them address how to find an available subcube of the requested size efficiently and quickly.

Studies have shown that the performance of a subcube allocation scheme depends largely on its subcube recognition ability [1], defined as the ability to detect available subcubes. The buddy strategy recognizes only $2^{n-k}$ k-subcubes out of the $\binom{n}{k} \times 2^{n-k}$ k-subcubes in an n-cube computer. The subcube recognition ability of the single Gray code (SGC) strategy is $2^{n-k+1}$, twice that of the buddy strategy, and its extension, the multiple Gray code (MGC) strategy [1], achieves full recognition of available subcubes of any size using $\binom{n}{\lfloor n/2 \rfloor}$ GCs. The modified buddy strategy [1] recognizes $(n-k+1) \times 2^{n-k}$ k-subcubes, i.e., $(n-k+1)/2$ times as many as the SGC strategy. Recently, several approaches with full recognition ability have been reported, such as the maximal set of subcubes (MSS) approach [3], the tree collapsing (TC) strategy [2], the free-list strategy [5], the prime cube graph approach [11] and the MSIS approach [9].

The number of recognizable subcubes in previous work has been limited to $\binom{n}{k} \times 2^{n-k}$ k-subcubes because of the inherent topological properties of a hypercube. These subcubes are contiguous subcubes, whose processors are constrained to be physically adjacent.

In [7], the authors showed that there are $\binom{n+1}{k} \times 2^{n-k}$ k-subcubes in an n-dimensional folded hypercube (FHC). However, an FHC requires additional hardware resources, such as extra communication links and logic, as well as modified software, including a different routing algorithm, a different system kernel and modified user programs.

Naturally, if more subcubes are recognizable, better job response times and processor utilization can be achieved. However, with the same hardware implementation, it is impossible to find more subcubes with the same structure as contiguous subcubes.

In this paper, we propose a new type of subcube called the link-disjoint subcube [6]. A link-disjoint subcube is not a contiguous subcube as in previous schemes, but it still shares no communication link with any other subcube. As a result of this extension of the subcube concept, we can recognize

$$\frac{\lfloor n/2 \rfloor (n-k)k + n(n-1)}{n(n-1)}$$

times as many k-subcubes as schemes that fully recognize all available contiguous subcubes in n-cube computers.

We develop the concept of link-disjoint subcubes assuming that the communication network uses circuit switching or wormhole routing. Several hypercube computers, such as the iPSC-860, iPSC-2, Ncube-2, Intel's Touchstone DELTA and Mark-III, have already adopted some form of circuit-switched or wormhole-routed communication network. We also assume that the e-cube routing strategy [10] is used to route data between any pair of processors.

The rest of the paper is organized as follows. Section 2 introduces the necessary definitions and notation, including the concept of link-disjointness. In Section 3, a link-disjoint subcube is defined formally and its structural properties are described. Section 4 considers the performance degradation caused by non-contiguous processor allocation. An algorithm that recognizes all subcubes, including link-disjoint subcubes, is presented in Section 5. Section 6 presents simulation results for our algorithm and others. Concluding remarks are given in the last section.

2. Definitions and notations

Definition 1 (Contiguous subcube). A k-dimensional contiguous subcube (k-CS) of an n-cube is a k-cube that is a subgraph of the given n-cube graph, where $0 \le k \le n$.

We introduce the term contiguous subcube (CS) to distinguish between the two types of subcubes discussed in this paper. To specify CSs of various sizes, we use the ternary symbol set $\Sigma = \{0, 1, x\}$, where x is a don't-care symbol. Let $\alpha$ be a k-CS of an n-cube; then $\alpha$ can be uniquely represented by an n-symbol string over $\Sigma$ containing k x's. We call such a string the address of $\alpha$. For example, the 2-CS of a 4-cube consisting of nodes {(0100), (0101), (0110), (0111)} is (01xx).

Every edge in an n-cube consists of two opposite unidirectional links. The two links joining a pair of neighboring nodes, a and b, are denoted by l(a, b) and l(b, a), where l(a, b) (l(b, a)) represents the link directed from a (b) to b (a). A path from a to b, denoted by P(a, b), is the sequence of links established from a to b. For example, P(a, b) is represented by (l(a, c) l(c, b)) if the path has exactly one intermediate node, c. A path can also be described as a sequence of paths or as a mixed sequence of paths and links. For example, if P(a, f) is composed of the five links (l(a, b) l(b, c) l(c, d) l(d, e) l(e, f)), the following representations can also be used instead of P(a, f): (l(a, b) P(b, e) l(e, f)), (P(a, b) P(b, d) P(d, f)) and (P(a, d) l(d, e) l(e, f)).

Definition 2 (Hamming distance). Let $\alpha = \alpha_{n-1}\alpha_{n-2}\cdots\alpha_0$ and $\beta = \beta_{n-1}\beta_{n-2}\cdots\beta_0$ be CSs of an n-cube. The Hamming distance, H, between $\alpha$ and $\beta$ is defined as $H(\alpha, \beta) = \sum_{i=0}^{n-1} h(\alpha_i, \beta_i)$, where $h(\alpha_i, \beta_i) = 1$ if ($\alpha_i = 0$ and $\beta_i = 1$) or ($\alpha_i = 1$ and $\beta_i = 0$), and $h(\alpha_i, \beta_i) = 0$ otherwise.

Definition 3 (Exact distance [5]). The exact distance, E, between two CSs $\alpha$, $\beta$ in an n-cube is defined as $E(\alpha, \beta) = \sum_{i=0}^{n-1} e(\alpha_i, \beta_i)$, where $e(\alpha_i, \beta_i) = 0$ if ($\alpha_i = 0$ and $\beta_i = 0$) or ($\alpha_i = 1$ and $\beta_i = 1$) or ($\alpha_i = x$ and $\beta_i = x$), and $e(\alpha_i, \beta_i) = 1$ otherwise.

For example, let $\alpha$ = (0x1x) and $\beta$ = (10xx); then $H(\alpha, \beta) = 1$ and $E(\alpha, \beta) = 3$.
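Both distances can be computed directly from the addresses. The following is a minimal Python sketch, assuming addresses are strings over {0, 1, x} written with the most significant bit first; the function names `hamming` and `exact` are our own, not the paper's.

```python
def hamming(a, b):
    """Definition 2: count positions where one address has 0 and the other 1.

    Positions involving the don't-care symbol 'x' never contribute.
    """
    if len(a) != len(b):
        raise ValueError("addresses must have the same length")
    return sum(1 for p, q in zip(a, b) if {p, q} == {"0", "1"})


def exact(a, b):
    """Definition 3: count positions where the symbols differ ('x' included)."""
    if len(a) != len(b):
        raise ValueError("addresses must have the same length")
    return sum(1 for p, q in zip(a, b) if p != q)


# The example from the text: alpha = 0x1x, beta = 10xx.
print(hamming("0x1x", "10xx"), exact("0x1x", "10xx"))  # 1 3
```

Note that H ignores any position containing x, while E treats x as a third ordinary symbol; this is why the two measures differ in the example.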


Fig. 1. (a) Link-disjoint decomposition: $C_1$ = {000, 011, 100, 111}, $C_2$ = {001, 010}, $C_3$ = {101, 110}; (b) link-conflict decomposition.

The e-cube routing algorithm, a simple and deadlock-free algorithm for routing messages from one node to another in an n-cube, was presented in [10]. It always finds a shortest path. The e-cube routing algorithm routes in either increasing or decreasing order of dimension; for clarity, we assume that it routes messages in increasing order. In this paper, we assume that the path between any two nodes is uniquely determined by the e-cube routing algorithm. In practice, e-cube routing is used in almost all commercial hypercubes, with either store-and-forward or wormhole routing.
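A minimal sketch of e-cube routing as assumed here (increasing order of dimension), with nodes encoded as integers whose bit i is the address bit of dimension i. The helper names are illustrative only and are not taken from [10].

```python
def ecube_path(a, b, n):
    """Links l(u, v) used by e-cube routing from node a to node b in an n-cube.

    Differing address bits are corrected in increasing order of dimension.
    """
    links = []
    cur = a
    for dim in range(n):              # dimension 0 first (increasing order)
        if (cur ^ b) >> dim & 1:      # bit `dim` still differs from the target
            nxt = cur ^ (1 << dim)    # cross dimension `dim`
            links.append((cur, nxt))
            cur = nxt
    return links


def show(links, n):
    """Render links as pairs of n-bit address strings."""
    return [(format(u, f"0{n}b"), format(v, f"0{n}b")) for u, v in links]


print(show(ecube_path(0b100, 0b011, 3), 3))
# [('100', '101'), ('101', '111'), ('111', '011')]
```

Because the dimension order is fixed, the route between any ordered pair of nodes is unique, which is exactly the property the following definitions rely on.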

A set of nodes in an n-cube has constituent links and constituent nodes, defined as follows.

Definition 4 (Constituent links). Let $\theta$ be a set of nodes in an n-cube. The set of links included in the shortest path (as determined by e-cube routing) between any pair of nodes $a, b \in \theta$ is called the constituent links of $\theta$, denoted by $L(\theta)$.

Example 1. A 3-cube can be decomposed into three sets of nodes, $\{C_1, C_2, C_3\}$, as shown in Fig. 1(a). The three types of lines in Fig. 1(a) depict the constituent links of each set of nodes. In Fig. 1(a),

$L(C_1)$ = {l(000, 100), l(100, 000), l(011, 111), l(111, 011), l(000, 001), l(001, 011), l(011, 010), l(010, 000), l(100, 101), l(101, 111), l(111, 110), l(110, 100)},

$L(C_2)$ = {l(001, 000), l(000, 010), l(010, 011), l(011, 001)},

$L(C_3)$ = {l(101, 100), l(100, 110), l(110, 111), l(111, 101)}.

Fig. 1(b) shows another decomposition, with $C_1$ = {000, 011}, $C_2$ = {001, 111}, $C_3$ = {010, 110} and $C_4$ = {100, 101}:

$L(C_1)$ = {l(000, 001), l(001, 011), l(011, 010), l(010, 000)},

$L(C_2)$ = {l(001, 011), l(011, 111), l(111, 101), l(101, 001)},

$L(C_3)$ = {l(010, 110), l(110, 010)},

$L(C_4)$ = {l(100, 101), l(101, 100)}.

Similarly, the set of all the nodes in $\theta$ is called the constituent nodes of $\theta$ and is denoted by $N(\theta)$. In the above example, $N(C_1)$ = {(000), (100), (011), (111)} in Fig. 1(a) and $N(C_4)$ = {(100), (101)} in Fig. 1(b).

Using the constituent links of a set of nodes, we define the following concept.

Definition 5 (Link-disjoint). Let an n-cube, $Q_n$, be decomposed into p ($1 \le p \le 2^n$) sets of nodes, $\{C_1, C_2, \ldots, C_p\}$. Then $Q_n$ is said to be link-disjoint under the given decomposition if $L(C_i) \cap L(C_j) = \emptyset$ for all $1 \le i, j \le p$ with $i \ne j$.

In Fig. 1(a), $L(C_1) \cap L(C_2) = L(C_1) \cap L(C_3) = L(C_2) \cap L(C_3) = \emptyset$, i.e., the 3-cube is link-disjoint under the decomposition $\{C_1, C_2, C_3\}$. However, the 3-cube of Fig. 1(b) is not link-disjoint, because link l(001, 011) is included in both $L(C_1)$ and $L(C_2)$. We call such a link l(001, 011) a conflict link between $C_1$ and $C_2$. When an n-cube is decomposed in an arbitrary pattern, determining whether it is link-disjoint is tedious, because all the constituent links must be listed and compared one by one. In the next section, we propose a type of decomposition that guarantees the link-disjoint property for an n-cube.
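Because every path is fixed by e-cube routing, $L(\theta)$ and the test of Definition 5 can be checked by brute force for small cubes. The following Python sketch uses hypothetical helper names and encodes nodes as integers; it reproduces Example 1: 12 constituent links for $C_1$ in Fig. 1(a), no conflict links in Fig. 1(a), and the single conflict link l(001, 011) in Fig. 1(b).

```python
from itertools import combinations, permutations


def ecube_path(a, b, n):
    """Links of the e-cube route from node a to node b (increasing dimensions)."""
    links, cur = [], a
    for dim in range(n):
        if (cur ^ b) >> dim & 1:
            links.append((cur, cur ^ (1 << dim)))
            cur ^= 1 << dim
    return links


def constituent_links(nodes, n):
    """L(theta): all links on the e-cube paths between ordered pairs of nodes."""
    return {link for a, b in permutations(nodes, 2) for link in ecube_path(a, b, n)}


def conflict_links(decomposition, n):
    """Links shared by L(C_i) and L(C_j) for some i != j (empty => link-disjoint)."""
    sets = [constituent_links(c, n) for c in decomposition]
    return set().union(*(p & q for p, q in combinations(sets, 2)))


def node(bits):
    return int(bits, 2)


fig1a = [[node("000"), node("011"), node("100"), node("111")],
         [node("001"), node("010")],
         [node("101"), node("110")]]
fig1b = [[node("000"), node("011")], [node("001"), node("111")],
         [node("010"), node("110")], [node("100"), node("101")]]

print(len(constituent_links(fig1a[0], 3)))  # 12
print(conflict_links(fig1a, 3))              # set()
print(conflict_links(fig1b, 3))              # {(1, 3)}, i.e. l(001, 011)
```

The cost grows with the square of the set sizes, which illustrates why a structural guarantee such as the one developed next is preferable to exhaustive checking.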

3. Link-disjoint subcube

If an n-cube is decomposed into only CSs, we refer to such a decomposition as a CS-decomposition. It is easy to show that an n-cube is link-disjoint under a CS-decomposition.

Lemma 1. Every shortest path within a k-CS ($0 \le k \le n$) in an n-cube consists only of links whose end nodes are both included in the CS.

Proof. Let $\alpha = \alpha_{n-1}\alpha_{n-2}\cdots\alpha_0$ be a k-CS ($0 \le k \le n$). Any two nodes of $\alpha$ differ only in bit positions where $\alpha$ has an x, so e-cube routing between them corrects only those bits, and every intermediate node also lies in $\alpha$. Therefore every shortest path within $\alpha$ is composed of links whose end nodes are both included in the CS. □

Theorem 1. An n-cube is link-disjoint under a CS-decomposition.


Proof. Let $\alpha$, $\beta$ be two CSs of a CS-decomposition of an n-cube, $Q_n$. By Lemma 1, every link in $L(\alpha)$ ($L(\beta)$) has both end nodes in $N(\alpha)$ ($N(\beta)$). Since $N(\alpha) \cap N(\beta) = \emptyset$, it follows that $L(\alpha) \cap L(\beta) = \emptyset$. The same argument applies to any other pair of CSs in $Q_n$. Therefore, $Q_n$ is link-disjoint under the given CS-decomposition. □

Theorem 1, together with the node-disjointness of a CS-decomposition, is one of the most important reasons for allocating contiguous subcubes to jobs. Previous work has focused on the disjointness of nodes, whereas Theorem 1 concerns the disjointness of links. Suppose that two subcubes $\alpha$ and $\beta$ are allocated to jobs $J_i$ and $J_j$, respectively, and that there is a conflict link between $\alpha$ and $\beta$. Although $\alpha$ and $\beta$ are mutually disjoint with respect to nodes, both may suffer serious performance degradation because of the conflict link: the number of messages that must be sent through it may double. Therefore, the completion times of $J_i$ and $J_j$ will depend largely on the conflict links. If a hypercube computer contains conflict links under some decomposition, the system's utilization may decrease dramatically.

Before continuing our discussion of link-disjoint subcubes, we specify some links related to a pair of CSs.

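Definition 6 (Bridge). Let $\alpha = \alpha_{n-1}\alpha_{n-2}\cdots\alpha_0$ and $\beta = \beta_{n-1}\beta_{n-2}\cdots\beta_0$ be a $k_1$-CS and a $k_2$-CS of an n-cube, respectively, where $0 \le k_1, k_2 \le n-1$. Also, let $H(\alpha, \beta) \ne 0$, $N(\alpha) = \{\alpha^1, \alpha^2, \ldots, \alpha^{2^{k_1}}\}$ and $N(\beta) = \{\beta^1, \beta^2, \ldots, \beta^{2^{k_2}}\}$. All the links that lie on a shortest path between any two nodes $\alpha^i \in N(\alpha)$ and $\beta^j \in N(\beta)$, but are included in neither $L(\alpha)$ nor $L(\beta)$, are called the bridges between $\alpha$ and $\beta$, denoted by $B(\alpha, \beta)$.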

In Fig. 2, B(x10, 00x) and B(x00, x11) are depicted, where

B(x10, 00x) = {l(000, 010), l(010, 000), l(010, 011), l(011, 001), l(110, 100), l(100, 000), l(110, 111), l(111, 101), l(101, 001)}

and

B(x00, x11) = {l(000, 001), l(001, 011), l(100, 101), l(101, 111), l(011, 010), l(010, 000), l(111, 110), l(110, 100)}.

Fig. 2. Bridges: (a) B(00x, x10); (b) B(x00, x11).
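The two bridge sets above can be reproduced mechanically from Definition 6 under the e-cube routing assumption. A minimal Python sketch follows; `nodes_of`, `internal_links` and `bridges` are illustrative names of our own, and addresses are strings over {0, 1, x}, most significant bit first.

```python
from itertools import product


def ecube_path(a, b, n):
    """Links of the e-cube route from node a to node b (increasing dimensions)."""
    links, cur = [], a
    for dim in range(n):
        if (cur ^ b) >> dim & 1:
            links.append((cur, cur ^ (1 << dim)))
            cur ^= 1 << dim
    return links


def nodes_of(address):
    """Expand a CS address over {0, 1, x} into its nodes (as integers)."""
    choices = [("0", "1") if c == "x" else (c,) for c in address]
    return {int("".join(bits), 2) for bits in product(*choices)}


def internal_links(nodes, n):
    """L(alpha): links on e-cube paths between nodes of the same CS."""
    return {link for a in nodes for b in nodes if a != b
            for link in ecube_path(a, b, n)}


def bridges(alpha, beta):
    """B(alpha, beta): links on e-cube paths between a node of alpha and a node
    of beta (either direction) that lie in neither L(alpha) nor L(beta)."""
    n = len(alpha)
    na, nb = nodes_of(alpha), nodes_of(beta)
    crossing = {link for a in na for b in nb
                for link in ecube_path(a, b, n) + ecube_path(b, a, n)}
    return crossing - internal_links(na, n) - internal_links(nb, n)


print(len(bridges("x10", "00x")), len(bridges("x00", "x11")))  # 9 8
```

For two adjacent CSs such as 00x and 01x (H = E = 1), `bridges` returns $4 = 2 \times 2^1$ links, which matches the bridge count used with the recursive definition of a CS.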

Using the concept of bridges between two CSs of the same dimension, a contiguous subcube can be redefined as follows.

Definition 7 (Contiguous subcube (redefined)). A k-dimensional contiguous subcube (k-CS), $\gamma_k$, of an n-cube ($0 \le k \le n$) is defined recursively as follows:

(1) $\gamma_0$ is a trivial graph with one node, and

(2) $\gamma_k$ is the graph with the following nodes and links of two (k - 1)-CSs, $\gamma_{k-1}$ and $\gamma'_{k-1}$:

$$N(\gamma_k) = N(\gamma_{k-1}) \cup N(\gamma'_{k-1}), \qquad L(\gamma_k) = L(\gamma_{k-1}) \cup L(\gamma'_{k-1}) \cup B(\gamma_{k-1}, \gamma'_{k-1}),$$

where $\gamma'_{k-1}$ is a (k - 1)-CS such that $H(\gamma_{k-1}, \gamma'_{k-1}) = 1$ and $E(\gamma_{k-1}, \gamma'_{k-1}) = 1$.

Clearly, $B(\gamma_{k-1}, \gamma'_{k-1})$ contains $2 \times 2^{k-1}$ links. Half of them go from a node in $\gamma_{k-1}$ to a node in $\gamma'_{k-1}$, and the other half go in the opposite direction. Using the same notation, we now define a new type of subcube in an n-cube.
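Combining this bridge count with Definition 7 gives a quick consistency check. Since $N(\gamma_{k-1})$ and $N(\gamma'_{k-1})$ are disjoint, Lemma 1 implies that $L(\gamma_{k-1})$ and $L(\gamma'_{k-1})$ are disjoint, and bridges lie outside both by definition. The short derivation below is our own, not stated in the paper:

$$
\begin{aligned}
|L(\gamma_k)| &= 2\,|L(\gamma_{k-1})| + 2 \times 2^{k-1}, \qquad |L(\gamma_0)| = 0, \\
\frac{|L(\gamma_k)|}{2^k} &= \frac{|L(\gamma_{k-1})|}{2^{k-1}} + 1 \;\Longrightarrow\; |L(\gamma_k)| = k \cdot 2^k .
\end{aligned}
$$

This equals the $k \cdot 2^{k-1}$ edges of a k-cube, each counted as two opposite links, so the recursive definition yields every link of the k-cube exactly once.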

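Definition 8 (Link-disjoint subcube). A k-dimensional link-disjoint subcube (k-LS), $\delta$, of an n-cube ($1 \le k \le n-1$) is defined as the graph with the following nodes and links of two (k - 1)-CSs, $\alpha = \alpha_{n-1}\alpha_{n-2}\cdots\alpha_{2l-1}\alpha_{2l-2}\cdots\alpha_1\alpha_0$ and $\beta = \beta_{n-1}\beta_{n-2}\cdots\beta_{2l-1}\beta_{2l-2}\cdots\beta_1\beta_0$:

$$N(\delta) = N(\alpha) \cup N(\beta), \qquad L(\delta) = L(\alpha) \cup L(\beta) \cup B(\alpha, \beta),$$

where condition [C1] must be satisfied:

[C1] $H(\alpha, \beta) = 2$, $E(\alpha, \beta) = 2$, $\alpha_{2l-1} \ne \beta_{2l-1}$ and $\alpha_{2l-2} \ne \beta_{2l-2}$, for some $1 \le l \le \lfloor n/2 \rfloor$.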

An LS, $\delta$, includes two CSs; for example, $N(\delta)$ = N(000x0) ∪ N(011x0). Such a subcube cannot be addressed with the symbol set $\Sigma$, so we introduce a new symbol, s, and address $\delta$ (= 000x0 ∪ 011x0) as 0ssx0. Another LS, $\gamma$, may include two CSs such as $N(\gamma)$ = N(1xx01) ∪ N(1xx10). To address such a subcube, we introduce another new symbol, t, and address $\gamma$ (= 1xx01 ∪ 1xx10) as 1xxtt. The new symbol set including s and t is defined as follows.

Definition 9 (Quinary symbol set). A quinary symbol set, denoted by $\Sigma'$, is defined as the set {0, 1, x, s, t}, i.e., $\Sigma' = \Sigma \cup \{s, t\}$, for addressing a set of nodes.


An LS, $\delta$, can be addressed with the symbols in $\Sigma'$. Its constituent CSs can be retrieved using the following two rules:

[R1] If there are two s's in the address, substitute them first by two 0's and then by two 1's. The two resulting addresses represent the two subsets of $N(\delta)$.

[R2] If there are two t's, substitute them first by 0 and 1, and then by 1 and 0. The two resulting addresses represent the two subsets of $N(\delta)$.

Also, a k-LS can be uniquely represented with (k - 1) x's and two s's or two t's placed consecutively in bit positions (2l - 1) and (2l - 2), where $1 \le l \le \lfloor n/2 \rfloor$ (satisfying [C1]). We call such a string of symbols in $\Sigma'$ the address of an LS.
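Rules [R1] and [R2] amount to a simple string substitution. The Python sketch below (the name `split_ls` is hypothetical) returns the two constituent CS addresses of an LS address and, as an added sanity check, rejects addresses whose s/t symbols are not at bit positions (2l - 1, 2l - 2), counting bit 0 as the rightmost symbol.

```python
def split_ls(address):
    """Apply [R1]/[R2]: return the two (k-1)-CS addresses encoded by an LS address.

    Raises ValueError unless the address holds exactly two adjacent s's or t's
    at bit positions (2l-1, 2l-2), with bit 0 being the rightmost symbol.
    """
    if address.count("s") == 2 and "t" not in address:
        sym, pairs = "s", ("00", "11")   # [R1]: two 0's, then two 1's
    elif address.count("t") == 2 and "s" not in address:
        sym, pairs = "t", ("01", "10")   # [R2]: 0 and 1, then 1 and 0
    else:
        raise ValueError("expected exactly two s's or exactly two t's")
    i = address.index(sym)
    hi_bit = len(address) - 1 - i        # bit position of the left symbol
    if address[i + 1] != sym or hi_bit % 2 != 1:
        raise ValueError("s/t must occupy bit positions (2l-1, 2l-2)")
    return [address[:i] + p + address[i + 2:] for p in pairs]


print(split_ls("0ssx0"), split_ls("1xxtt"))
# ['000x0', '011x0'] ['1xx01', '1xx10']
```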

Some structural properties of an LS follow directly from the definition and the two rules:

[P1] A k-LS is composed of two (k - 1)-CSs whose Hamming distance is 2.

[P2] The distance between any pair of nodes in a k-LS is at most (k + 1).

[P3] Every bridge of an LS is a link with one end node in the LS and the other end node outside it.

Of the three properties, only [P3] requires proof, which is as follows.

Proof. Let $\alpha$ and $\beta$ be defined as in Definition 8, with $N(\alpha) = \{\alpha^1, \alpha^2, \ldots, \alpha^{2^{k-1}}\}$ and $N(\beta) = \{\beta^1, \beta^2, \ldots, \beta^{2^{k-1}}\}$; $\alpha$ and $\beta$ are the two CSs retrieved from a k-LS. Let $\delta = \delta_{n-1}\delta_{n-2}\cdots\delta_{2l-1}\delta_{2l-2}\cdots\delta_1\delta_0$ be the k-LS, with $\delta_{2l-1} = \delta_{2l-2}$ = (s or t). The shortest path from node $\alpha^i$ to $\beta^j$, for all $1 \le i, j \le 2^{k-1}$, is constituted as follows:

$$P(\alpha^i, \beta^j) = P(\alpha^i, \alpha^m)\, P(\alpha^m, c)\, P(c, \beta^m)\, P(\beta^m, \beta^j),$$

where $\alpha^m_{2l-1} \ne \beta^m_{2l-1}$, $\alpha^m_{2l-2} \ne \beta^m_{2l-2}$, and $\alpha^m_p = \beta^m_p$ for $0 \le p < 2l-2$ and $2l-1 < p \le n-1$. Here $\alpha^m_p$ denotes bit p of $\alpha^m$, and c is the intermediate node between $\alpha^m$ and $\beta^m$, with $c_{2l-1} = \alpha^m_{2l-1}$ and $c_{2l-2} = \beta^m_{2l-2}$. Fig. 3 illustrates the shortest path $P(\alpha^i, \beta^j)$ conceptually. All the links of $P(\alpha^i, \alpha^m)$ and $P(\beta^m, \beta^j)$ are included in $L(\alpha)$ and $L(\beta)$, respectively. Node c is included in neither $N(\alpha)$ nor $N(\beta)$. Since the Hamming distance between $\alpha^m$ and $\beta^m$ is 2, $P(\alpha^m, c) = l(\alpha^m, c)$ and $P(c, \beta^m) = l(c, \beta^m)$. The two links $l(\alpha^m, c)$ and $l(c, \beta^m)$ are therefore the only possible bridges on the path from $\alpha^i$ to $\beta^j$, and both satisfy [P3]. The same argument applies to any shortest path from $\beta^j$ to $\alpha^i$, for all $1 \le i, j \le 2^{k-1}$. Thus every bridge between $\alpha$ and $\beta$ satisfies [P3]. □

Fig. 3. Conceptual description of a shortest path.

Fig. 4. Link-disjoint subcubes (LSs) of a 3-cube: (a) 1-dimensional LSs; (b) 2-dimensional LSs.

The following example lists the LSs in a 3-cube and illustrates the three properties conceptually.

Example 2. There are four 1-LSs in a 3-cube: (0ss), (0tt), (1ss) and (1tt). There are two 2-LSs in a 3-cube: (xss) and (xtt).

In Fig. 4, all the LSs in a 3-cube are depicted.

So far, we have proposed a new type of subcube, the link-disjoint subcube, and examined its structure and properties. However, before allocating a link-disjoint subcube to a job, we must prove that no constituent link of an LS conflicts with those of any other LS or CS in a given configuration of the hypercube. If an n-cube is decomposed into only LSs, we refer to such a decomposition as an LS-decomposition.

Lemma 2. No constituent link of a k-LS ($1 \le k \le n-1$) in an n-cube can be a constituent link of any other CS.

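Proof. Let $\alpha$ and $\beta$ be the two constituent CSs of a k-LS, $\delta$, in an n-cube, $Q_n$. The lemma states that $L(\alpha) \cap L(C_i) = L(\beta) \cap L(C_i) = \emptyset$ and $B(\alpha, \beta) \cap L(C_i) = \emptyset$, where $C_i$ is any CS in $Q_n$ whose nodes are disjoint from $N(\delta)$. The first two equalities follow from Theorem 1. By Lemma 1, every link in $L(C_i)$ has both end nodes in $C_i$, and hence outside $N(\delta)$, whereas by [P3] every bridge between $\alpha$ and $\beta$ has one end node in $\delta$. Therefore no link in $L(C_i)$ can be a bridge between $\alpha$ and $\beta$, which proves the third equality. □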

Theorem 2. An n-cube is link-disjoint under an LS-decomposition.

Proof. Let $\delta$ be a k-LS ($1 \le k \le n-1$) in an n-cube, $Q_n$, and let $\alpha$ and $\beta$ be the two constituent (k - 1)-CSs of $\delta$ retrieved by applying [R1] and [R2]. Let $\delta'$ be any LS in $Q_n$ whose nodes are disjoint from those of $\delta$, and let $\alpha'$ and $\beta'$ be the two constituent CSs of $\delta'$ retrieved by applying [R1] and [R2]. The theorem states that $L(\delta) \cap L(\delta') = \emptyset$. By Theorem 1, $L(\alpha) \cap L(\alpha') = L(\beta) \cap L(\beta') = \emptyset$ and $L(\alpha) \cap L(\beta') = L(\beta) \cap L(\alpha') = \emptyset$. By Lemma 2, $B(\alpha, \beta) \cap L(\alpha') = B(\alpha, \beta) \cap L(\beta') = \emptyset$, and likewise $B(\alpha', \beta') \cap L(\alpha) = B(\alpha', \beta') \cap L(\beta) = \emptyset$. Hence, if $B(\alpha, \beta) \cap B(\alpha', \beta') = \emptyset$, then $L(\delta) \cap L(\delta') = \emptyset$. We now prove that $B(\alpha, \beta) \cap B(\alpha', \beta') = \emptyset$.

Fig. 5. Bridges of a link-disjoint subcube.

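Let $\alpha^i$ and $\beta^i$ be nodes of $\alpha$ and $\beta$, respectively, at distance 2 from each other. Then $P(\alpha^i, \beta^i) = l(\alpha^i, c^1)\, l(c^1, \beta^i)$ and $P(\beta^i, \alpha^i) = l(\beta^i, c^2)\, l(c^2, \alpha^i)$, and these links are bridges between $\alpha^i$ and $\beta^i$. Let $\alpha^i = a^i_{n-1} a^i_{n-2} \cdots a^i_{2l}\, a^i_{2l-1}\, a^i_{2l-2}\, a^i_{2l-3} \cdots a^i_1 a^i_0$. Applying the e-cube routing algorithm, the addresses of $\beta^i$ and of the two intermediate nodes $c^1$ and $c^2$ are

$$
\begin{aligned}
\beta^i &= a^i_{n-1} a^i_{n-2} \cdots a^i_{2l}\, \bar a^i_{2l-1}\, \bar a^i_{2l-2}\, a^i_{2l-3} \cdots a^i_1 a^i_0, \\
c^1 &= a^i_{n-1} a^i_{n-2} \cdots a^i_{2l}\, a^i_{2l-1}\, \bar a^i_{2l-2}\, a^i_{2l-3} \cdots a^i_1 a^i_0, \\
c^2 &= a^i_{n-1} a^i_{n-2} \cdots a^i_{2l}\, \bar a^i_{2l-1}\, a^i_{2l-2}\, a^i_{2l-3} \cdots a^i_1 a^i_0,
\end{aligned}
$$

where $\bar a^i_p$ denotes the complement of $a^i_p$.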

Consider an LS that has $l(\alpha^i, c^1)$ as one of its bridges. This LS has two CSs. By [P3], one CS includes $c^1$ as a constituent node, and the other includes d, a neighbor of $\alpha^i$, as a constituent node. Condition [C1] of Definition 8 forces $d = a^i_{n-1} a^i_{n-2} \cdots a^i_{2l}\, \bar a^i_{2l-1}\, a^i_{2l-2}\, a^i_{2l-3} \cdots a^i_1 a^i_0$; therefore, d must be $c^2$. The links between $c^1$ and $c^2$ are $l(c^1, \alpha^i)$, $l(\alpha^i, c^2)$, $l(c^2, \beta^i)$ and $l(\beta^i, c^1)$, whose directions are the reverse of those in $B(\alpha, \beta)$. An example of an LS including both $c^1$ and $c^2$ is shown in Fig. 5. Therefore, $l(\alpha^i, c^1)$ does not conflict with the links of any other LS. The same argument applies to the remaining three bridges between $\alpha^i$ and $\beta^i$ and to all the bridges between $\alpha$ and $\beta$. Hence no link of $B(\alpha, \beta)$ conflicts with the links of any other LS. □

Theorem 2 is the counterpart of Theorem 1 for link-disjoint subcubes, and the two theorems can be combined into one. If an n-cube is decomposed into CSs and/or LSs, we refer to such a decomposition as a CLS-decomposition.

Theorem 3. An n-cube is link-disjoint under a CLS-decomposition.

Proof. By Theorem 1, no pair of CSs can have a conflict link. By Lemma 2, no pair consisting of a CS and an LS can have a conflict link, and by Theorem 2, no pair of LSs can have one. This completes the proof. □
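Theorem 3 can be spot-checked by brute force on small cubes. The Python sketch below is our own illustration, not part of the paper, and all names in it are hypothetical. It expands CS and LS addresses (applying [R1]/[R2] for s and t), computes each subcube's constituent links from the e-cube paths between all its node pairs (for an LS this set should coincide with $L(\alpha) \cup L(\beta) \cup B(\alpha, \beta)$), and then tests Definition 5. It assumes well-formed addresses.

```python
from itertools import combinations, permutations, product


def ecube_path(a, b, n):
    """Links of the e-cube route from node a to node b (increasing dimensions)."""
    links, cur = [], a
    for dim in range(n):
        if (cur ^ b) >> dim & 1:
            links.append((cur, cur ^ (1 << dim)))
            cur ^= 1 << dim
    return links


def nodes_of(address):
    """Node set of a CS or LS address over {0, 1, x, s, t} (assumed well formed)."""
    parts = [address]
    for sym, pairs in (("s", ("00", "11")), ("t", ("01", "10"))):  # [R1], [R2]
        if sym in address:
            i = address.index(sym)  # first of the two adjacent s/t symbols
            parts = [address[:i] + p + address[i + 2:] for p in pairs]
    nodes = set()
    for part in parts:
        choices = [("0", "1") if c == "x" else (c,) for c in part]
        nodes |= {int("".join(bits), 2) for bits in product(*choices)}
    return nodes


def link_set(address):
    """Constituent links: e-cube links between every ordered pair of nodes."""
    n, nodes = len(address), nodes_of(address)
    return {link for a, b in permutations(nodes, 2) for link in ecube_path(a, b, n)}


def is_link_disjoint(decomposition):
    """Definition 5 for a decomposition given as a list of CS/LS addresses."""
    n = len(decomposition[0])
    node_sets = [nodes_of(a) for a in decomposition]
    if sum(map(len, node_sets)) != 2 ** n or set().union(*node_sets) != set(range(2 ** n)):
        raise ValueError("the subcubes do not partition the n-cube")
    links = [link_set(a) for a in decomposition]
    return all(not (p & q) for p, q in combinations(links, 2))


print(is_link_disjoint(["ssxx", "01xx", "10xx"]))      # True  (one LS, two CSs)
print(is_link_disjoint(["0ss", "0tt", "1ss", "1tt"]))  # True  (LS-decomposition)
```

Such a check only covers the decompositions it is given, so it complements rather than replaces the proof.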

Previous studies on the subcube allocation problem have shown that the performance of a scheme depends largely on its subcube recognition ability, and various schemes have been compared with respect to it. Even schemes that can recognize all the contiguous subcubes in an n-cube are limited to $\binom{n}{k} \times 2^{n-k}$ recognizable k-subcubes. The following lemmas quantify the subcube recognition ability of our approach.

Lemma 3. The number of k-CSs in an n-cube is $\binom{n}{k} \times 2^{n-k}$, where $0 \le k \le n$.

Proof. Each k-CS can be represented by an n-bit address in which (n - k) bits are 0 or 1 and the remaining k bits are x. There are $\binom{n}{k}$ ways to choose the k positions of the x's out of n, and for each such choice there are $2^{n-k}$ ways to label the remaining (n - k) bits. □

Lemma 4. The number of k-LSs in an n-cube is $\lfloor n/2 \rfloor \binom{n-2}{k-1} 2^{n-k}$, where $1 \le k \le n-1$ and $n \ge 2$.

Proof. By Definition 8, each k-LS can be represented by an n-symbol address with (k - 1) x's, two s's (or two t's), and (n - (k - 1) - 2) bits of 0 or 1. There are $\lfloor n/2 \rfloor$ ways to place the s's (t's) and $\binom{n-2}{k-1}$ ways to place the x's among the remaining positions. Thus there are $\lfloor n/2 \rfloor \binom{n-2}{k-1} 2^{n-k-1}$ k-LSs containing two s's. Since there are two types of k-LSs, one containing two s's and the other two t's, the lemma follows. □
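Lemmas 3 and 4 can be checked by enumerating addresses. The following Python sketch (the generator names are hypothetical) builds every k-CS and k-LS address of an n-cube in the way the two proofs describe; counting its output reproduces both closed forms for small n.

```python
from itertools import combinations, product
from math import comb


def cs_addresses(n, k):
    """Every k-CS address of an n-cube (Lemma 3): k x's, the rest 0/1."""
    for xs in combinations(range(n), k):
        rest = [i for i in range(n) if i not in xs]
        for bits in product("01", repeat=n - k):
            addr = ["x"] * n
            for i, bit in zip(rest, bits):
                addr[i] = bit
            yield "".join(addr)


def ls_addresses(n, k):
    """Every k-LS address of an n-cube (Lemma 4): two s's or two t's at bit
    positions (2l-1, 2l-2), 1 <= l <= floor(n/2), plus (k-1) x's elsewhere."""
    for ell in range(1, n // 2 + 1):
        # Convert bit positions to string indices (leftmost symbol = bit n-1).
        pair = (n - 1 - (2 * ell - 1), n - 1 - (2 * ell - 2))
        others = [i for i in range(n) if i not in pair]
        for xs in combinations(others, k - 1):
            rest = [i for i in others if i not in xs]
            for sym in "st":
                for bits in product("01", repeat=len(rest)):
                    addr = ["x"] * n
                    addr[pair[0]] = addr[pair[1]] = sym
                    for i, bit in zip(rest, bits):
                        addr[i] = bit
                    yield "".join(addr)


print(sorted(ls_addresses(3, 1)), sorted(ls_addresses(3, 2)))
# ['0ss', '0tt', '1ss', '1tt'] ['xss', 'xtt']
for n in range(2, 9):
    assert all(len(list(cs_addresses(n, k))) == comb(n, k) * 2 ** (n - k)
               for k in range(n + 1))
    assert all(len(set(ls_addresses(n, k))) == (n // 2) * comb(n - 2, k - 1) * 2 ** (n - k)
               for k in range(1, n))
```

The 3-cube output matches the LSs listed in Example 2.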

Lemma 5. The number of all the recognizable subcubes under a CLS-decomposition is

$$\frac{\lfloor n/2 \rfloor (n-k)k + n(n-1)}{n(n-1)}$$

times that under a CS-decomposition.

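Proof. By Lemmas 3 and 4, the ratio is $1 + \lfloor n/2 \rfloor \binom{n-2}{k-1} \big/ \binom{n}{k}$, and since $\binom{n-2}{k-1} \big/ \binom{n}{k} = k(n-k)/(n(n-1))$, it equals the expression above. □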

Table 1 lists the relative number of all recognizable subcubes (CSs and LSs) compared with CSs for various cube and subcube dimensions. The number of recognizable subcubes in a hypercube is greater than under previous schemes; for example, in 10-dimensional hypercube computers it is up to 2.39 times the number of CSs.

Table 1
Relative numbers of all the recognizable subcubes compared to CSs

k \ n   10     9      8      7      6      5      4      3
1       1.50   1.44   1.50   1.43   1.50   1.40   1.50   1.33
2       1.89   1.78   1.86   1.71   1.80   1.60   1.67   1.33
3       2.17   2.00   2.07   1.86   1.90   1.60   1.50   -
4       2.33   2.11   2.14   1.86   1.80   1.40   -      -
5       2.39   2.11   2.07   1.71   1.50   -      -      -
6       2.33   2.00   1.86   1.43   -      -      -      -
7       2.17   1.78   1.50   -      -      -      -      -
8       1.89   1.44   -      -      -      -      -      -
9       1.50   -      -      -      -      -      -      -

n: dimension of the hypercube; k: dimension of a subcube; relative number = (# of LSs + # of CSs)/(# of CSs).
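Table 1 follows directly from Lemmas 3 to 5. The Python sketch below (our own, using exact fractions) recomputes the relative number both from the two counts and from the closed form of Lemma 5, and prints one line per cube dimension n, i.e., one column of Table 1.

```python
from fractions import Fraction
from math import comb


def relative_number(n, k):
    """(# LSs + # CSs) / # CSs for k-subcubes of an n-cube, from Lemmas 3 and 4."""
    cs = comb(n, k) * 2 ** (n - k)
    ls = (n // 2) * comb(n - 2, k - 1) * 2 ** (n - k)
    return Fraction(cs + ls, cs)


def lemma5(n, k):
    """Closed form of Lemma 5."""
    return Fraction((n // 2) * (n - k) * k + n * (n - 1), n * (n - 1))


# One line per cube dimension n (a column of Table 1), for k = 1, ..., n-1.
for n in range(10, 2, -1):
    assert all(relative_number(n, k) == lemma5(n, k) for k in range(1, n))
    print(n, " ".join(f"{float(relative_number(n, k)):.2f}" for k in range(1, n)))
```

Because $k(n-k)$ is maximized at $k = \lfloor n/2 \rfloor$, the largest gain for a given n occurs at mid-sized subcubes, e.g., 2.39 at k = 5 for n = 10.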

4. Consideration of performance degradation

Since the structure and properties of link-disjoint subcubes are very similar to those of contiguous subcubes, LSs are attractive candidates for job allocation. However, it must first be shown that the time to complete a job on an LS is comparable to that on a CS; only then can we say that a job can be executed on a link-disjoint subcube with little or almost no performance degradation.

Some notation is needed for the analysis. Let the time to complete a job on a k-CS be denoted by $TC_k$, and that on a k-LS by $TL_k$. $TC_k$ ($TL_k$) is divided into two portions: the calculation time, $tc_{calc}$ ($tl_{calc}$), and the communication time, $tc_{comm}$ ($tl_{comm}$). The communication time is further divided into k portions, one for communication along each dimension, denoted by $tc^1_{comm}$ ($tl^1_{comm}$), $tc^2_{comm}$ ($tl^2_{comm}$), ..., $tc^k_{comm}$ ($tl^k_{comm}$); each portion is estimated by averaging over all nodes. Clearly, $TL_k$ will be larger than $TC_k$, because one of the LS's communication paths traverses an external node that is not in the LS. Let this communication path be along dimension k.

Let the ratio of $(TL_k - TC_k)$ to $TC_k$ be the performance degradation factor, denoted by PDF. PDF indicates how much additional time is required to execute a job on a k-LS compared with the time needed on a k-CS:

$$\mathrm{PDF} = \frac{TL_k - TC_k}{TC_k} \times 100\ (\%). \qquad (1)$$


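Because $tl_{calc}$ and $tc_{calc}$ are equal regardless of the hardware implementation or application program, the numerator can also be written as $tl_{comm} - tc_{comm}$. Moreover, $tl^i_{comm} = tc^i_{comm}$ for $1 \le i \le k-1$, since only the dimension-k path passes through an external node. Therefore, PDF can be written as follows:

$$
\begin{aligned}
\mathrm{PDF} &= \frac{tl_{comm} - tc_{comm}}{tc_{calc} + tc_{comm}} \times 100\ (\%) \\
&= \frac{\sum_{i=1}^{k-1} tl^i_{comm} + tl^k_{comm} - \left(\sum_{i=1}^{k-1} tc^i_{comm} + tc^k_{comm}\right)}{tc_{calc} + tc_{comm}} \times 100\ (\%) \\
&= \frac{tl^k_{comm} - tc^k_{comm}}{tc_{calc} + tc_{comm}} \times 100\ (\%). \qquad (2)
\end{aligned}
$$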

Let the ratio of $tl^k_{comm}$ to $tc^k_{comm}$ be denoted by $\mu$ ($\mu > 1$). Dividing the numerator and denominator of Eq. (2) by $tc_{comm}$, PDF becomes

$$\mathrm{PDF} = \frac{(tc^k_{comm}/tc_{comm}) \times (\mu - 1)}{(tc_{calc}/tc_{comm}) + 1} \times 100\ (\%). \qquad (3)$$

Eq. (3) applies only to an environment in which the given hypercube is link-disjoint; otherwise, the analysis becomes much more complex. Several experimental studies [8] of the performance of circuit-switched and wormhole-routed communication networks have been performed. We assume that $\mu$ is less than 1.05, i.e., the additional time needed to send a message to a node at distance 2 is less than 5% of the time required to send it to a neighboring node. This assumption is quite relaxed compared with several experimental results. We also assume that $tc^k_{comm}$ is 1/k of $tc_{comm}$. In [4], various algorithms for hypercube computers are analyzed; the authors state that "the communication overhead is small compared to unity in a practical algorithm of a hypercube." In almost all cases, this overhead is less than 0.5.

From the above discussion, we can roughly estimate that PDF ranges from 1.67% (when $\mu = 1.05$, $tc_{calc}/tc_{comm} = 2.0$ and $k = 1$) down to 0.17% (when $\mu = 1.05$, $tc_{calc}/tc_{comm} = 5.0$ and $k = 5$), for $\mu = 1.05$, $1 \le k \le 5$ and $2 \le tc_{calc}/tc_{comm} \le 5$. In many cases, PDF will be lower than 1.0%. Therefore, this amount of performance degradation may be acceptable if it allows us to use disjoint subcubes that would otherwise remain idle. This analysis is one of the bases of our idea of link-disjoint subcubes.
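The two ends of this range follow from Eq. (3) with $tc^k_{comm}/tc_{comm} = 1/k$, which simplifies to $\mathrm{PDF} = (\mu - 1)/(k\,(tc_{calc}/tc_{comm} + 1)) \times 100$. A minimal Python sketch of that arithmetic follows; the function and parameter names are our own.

```python
def pdf(mu, calc_to_comm, k):
    """Performance degradation factor (%) from Eq. (3).

    mu           -- tl^k_comm / tc^k_comm for the dimension routed through the
                    external node
    calc_to_comm -- tc_calc / tc_comm
    k            -- subcube dimension; tc^k_comm is assumed to be tc_comm / k
    """
    share_k = 1.0 / k  # tc^k_comm / tc_comm under the equal-share assumption
    return share_k * (mu - 1.0) / (calc_to_comm + 1.0) * 100.0


# The two ends of the range quoted in the text.
print(f"{pdf(1.05, 2.0, 1):.2f}% {pdf(1.05, 5.0, 5):.2f}%")  # 1.67% 0.17%

# Worst case over 1 <= k <= 5 and 2 <= tc_calc/tc_comm <= 5 (integer grid).
print(max(pdf(1.05, r, k) for k in range(1, 6) for r in range(2, 6)))
```

PDF decreases both with k (the slowed dimension carries a smaller share of the traffic) and with the calculation-to-communication ratio, so the worst case is a 1-LS running a communication-heavy job.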

5. Algorithm

In this section, we propose an algorithm for subcube allocation in an n-cube. Our scheme is based on the framework of previous work [5], which is able to recognize all available CSs. An allocation and deallocation algorithm that finds both types of subcubes can easily be built on the existing free list strategy; detailed formal descriptions of the free list strategy can be found in [5]. The procedure for our algorithm, called the link-disjoint free list (LFL) strategy, is given in Figs. 6 and 7. By adding step 2 of Fig. 6 to any of the classical strategies [1-5], we obtain the corresponding subcube allocation scheme; deallocation is extended in the same way.

Algorithm 1. Allocation
Input: k-subcube request
Output: allocated subcube
1 Apply the allocation algorithm in [5].
2 If not allocated, find two free (k - 1)-subcubes satisfying [C1] of Definition 8 in the (k - 1)-dimensional free list.
3 If found, allocate them as a k-subcube.

Fig. 6. Link-disjoint free list allocation.

We now analyze the time complexity of LFL, which is related to the list size. In [5], the total list size is shown to be h · n for some constant h, usually in the range of 2 to 3. In allocation, the time complexity of step 2 is $O(h^2) = O(1)$, assuming, as in [5], that a comparison takes one time unit. Therefore, the complexity of LFL allocation and deallocation is the same as that of the free list scheme.
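Step 2 of Algorithm 1 reduces to a pairwise [C1] test over the (k - 1)-dimensional free list. The Python sketch below is a hypothetical illustration, not the implementation from [5] or from this paper: it assumes the free list is a plain list of (k - 1)-CS address strings, and it leaves removing the chosen pair from the list to the caller.

```python
from itertools import combinations


def c1_ls_address(a, b):
    """If (k-1)-CS addresses a and b satisfy [C1] of Definition 8, return the
    address of the k-LS they form; otherwise return None."""
    n = len(a)
    if len(b) != n:
        return None
    diff = [i for i in range(n) if a[i] != b[i]]   # string indices, MSB first
    # E = 2 and H = 2: exactly two positions differ, each as 0 versus 1.
    if len(diff) != 2 or any({a[i], b[i]} != {"0", "1"} for i in diff):
        return None
    i, j = diff
    hi_bit = n - 1 - i                              # bit position of index i
    if j != i + 1 or hi_bit % 2 != 1:               # must be (2l-1, 2l-2)
        return None
    sym = "s" if a[i] == a[j] else "t"              # 00/11 -> s ([R1]); 01/10 -> t ([R2])
    return a[:i] + sym + sym + a[j + 1:]


def find_ls(free_list):
    """Step 2 of Algorithm 1: scan all pairs of a (k-1)-dimensional free list,
    which takes O(h^2) comparisons for a list of length h."""
    for a, b in combinations(free_list, 2):
        addr = c1_ls_address(a, b)
        if addr is not None:
            return a, b, addr
    return None


print(find_ls(["1x000", "000x0", "011x0"]))  # ('000x0', '011x0', '0ssx0')
```

Deallocation (Fig. 7) is the inverse step: the released LS address is split back into its two CSs with [R1] or [R2], and each is returned to the free list.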

6. Simulation results

To study the performance of our LFL strategy, we developed a simulator. The buddy and free list strategies were also simulated for comparison.

The simulation model used for running the three strategies is described below. Initially, the simulated hypercube is empty (or there is no job), and the incoming requests (or jobs) are generated at each time unit during a simulation run. The generation is continued until the predetermined time unit, T, is reached. It is assumed that the dimensions of the subcubes required by the incoming requests and the residence

Algorithm 2. Deallocation

Input: released k-subcube Output: update free list

1 if this subcube is a LS, 2 Apply the deallocation algorithms in [5] to two k - 1 subcubes of the LS. 3 else Apply the deallocation algorithms in [5] to the k-subcube.

Fig. 7. Link-disjoint free list deallocation.

J.-U. Kim et ul./Parallel Compuring 22 (1997) 1579-1595 1593

Table 2
Simulation results on a 5-cube

                        Delay (d)                 Efficiency (E, %)         % of LS (P)
Distribution   T        Buddy    FL      LFL      Buddy    FL      LFL      LS/CLS (%)
Case-1         100      10.94    10.09   9.50     76.47    77.62   78.36    2.8
               200      21.38    19.53   18.33    77.96    79.25   80.04    2.7
               300      31.93    29.02   27.22    78.53    79.89   80.69    2.6
Case-2         100      9.54     8.37    7.83     77.82    79.48   80.09    1.9
               200      18.25    15.67   14.71    80.09    81.91   82.51    1.8
               300      26.78    22.73   21.38    80.78    82.73   83.33    2.0
Case-3         100      11.72    9.60    9.43     71.99    73.92   74.32    1.1
               200      22.93    19.35   18.85    75.23    77.19   77.48    1.1
               300      33.98    28.90   28.09    76.52    78.54   78.88    1.2

[Residence time distribution]:
Case-1: Uniform[3, 7].
Case-2: Uniform[3.83, 7.83].
Case-3: Uniform[8.36, 12.36].

[Cube size distribution]:
Case-1 (Uniform): p0 = p1 = p2 = p3 = p4 = 0.2.
Case-2 (Normal): p0 = p4 = 0.098, p1 = p3 = 0.214, p2 = 0.376.
Case-3 (Biased Normal): p0 = 0.504, p1 = 0.217, p2 = 0.143, p3 = 0.087, p4 = 0.049.

times of jobs follow the given distributions. A request that cannot be satisfied on arrival is queued until a suitable subcube becomes recognizable. When a job is released and requests are waiting in the queue, the job scheduler tries to schedule the queued jobs using the FCFS scheduling discipline. If there are jobs in the queue, a new incoming request must also be queued. This simulation model is based on the methods used in [1] and [5].

Table 3
Simulation results on an 8-cube

                        Delay (d)                  Efficiency (E, %)         % of LS (P)
Distribution   T        Buddy    FL       LFL      Buddy    FL      LFL      LS/CLS (%)
Case-1         100      1.67     1.47     1.38     57.77    57.99   57.98    1.2
               200      2.04     1.77     1.66     59.73    59.84   59.94    1.2
               300      2.10     1.82     1.68     60.25    60.32   60.32    1.2
Case-2         100      0.02     0.01     0.01     45.81    45.81   45.81    0.0
               200      0.02     0.01     0.01     47.04    47.04   47.04    0.0
               300      0.02     0.01     0.01     47.20    47.20   47.21    0.0
Case-3         100      148.21   147.25   146.47   84.04    84.44   84.75    2.6
               200      299.78   297.81   296.24   84.32    84.72   85.05    3.2
               300      451.11   448.25   445.92   84.41    84.80   85.12    3.1

[Residence time distribution]:
Case-1: Uniform[3, 7].
Case-2: Uniform[3.83, 7.83].
Case-3: Uniform[8.36, 12.36].

[Cube size distribution]:
Case-1 (Uniform): p0 = p1 = p2 = p3 = p4 = p5 = p6 = p7 = 0.125.
Case-2 (Normal): p2 = p6 = 0.098, p3 = p5 = 0.214, p4 = 0.376.
Case-3 (Biased Normal): p0 = 0.504, p1 = 0.217, p2 = 0.143, p3 = 0.087, p4 = 0.049.

Under the given simulation model, the following measures were collected or computed, and averaged over 100 independent runs.

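• d: average waiting delay per request.

• E: efficiency of the system, $E = \left(\sum_i 2^{|J_i|}\, t_i\right) / (2^n \cdot T)$, where $2^{|J_i|}$ is the number of processors allocated to job $J_i$, $t_i$ is the residence time of $J_i$, and $n$ is the dimension of the simulated hypercube.

• P: percentage of allocations that use link-disjoint subcubes, over all allocations.

It can be observed from Tables 2 and 3 that the LFL scheme yields an average delay no higher than that of the buddy and FL schemes in every case, and an efficiency at least as high as theirs in all but one case (Case-1, T = 100, on the 8-cube), where it trails FL by 0.01 percentage points.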
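The simulation model and the measures d and E can be sketched as a small event-driven simulator. This is a hedged illustration, not the authors' simulator: only the buddy strategy is implemented (aligned blocks of node addresses), and FL or LFL would replace the hypothetical `buddy_find`. Generating exactly one request per time unit, the helper names `simulate` and `generate`, and taking the denominator time of E to be the moment the last job departs (the text does not say how the drain period after T is counted) are all assumptions.

```python
import heapq
import random


def buddy_find(free, k):
    """Buddy strategy: base address of the first free aligned block of 2**k nodes."""
    size = 1 << k
    for start in range(0, len(free), size):
        if all(free[start:start + size]):
            return start
    return None


def simulate(n, jobs):
    """FCFS, event-driven simulation of an n-cube under buddy allocation.

    jobs: (arrival_time, dimension, residence_time) tuples sorted by arrival,
    with 0 <= dimension <= n. Returns (d, E); E's denominator uses the time
    the last job departs (an assumption).
    """
    assert all(0 <= k <= n for _, k, _ in jobs)
    if not jobs:
        return 0.0, 0.0
    free = [True] * (1 << n)
    queue, running = [], []          # FCFS waiting list; heap of (finish, base, k)
    waits, busy, now, i = [], 0.0, 0.0, 0
    while i < len(jobs) or queue or running:
        next_arr = jobs[i][0] if i < len(jobs) else float("inf")
        next_dep = running[0][0] if running else float("inf")
        if next_dep <= next_arr:     # departure: free the job's block
            now, base, k = heapq.heappop(running)
            free[base:base + (1 << k)] = [True] * (1 << k)
        else:                        # arrival: always join the tail of the queue
            now = next_arr
            queue.append(jobs[i])
            i += 1
        # Serve strictly in FCFS order: stop at the first request that does not fit.
        while queue:
            arrival, k, t = queue[0]
            base = buddy_find(free, k)
            if base is None:
                break
            queue.pop(0)
            free[base:base + (1 << k)] = [False] * (1 << k)
            heapq.heappush(running, (now + t, base, k))
            waits.append(now - arrival)
            busy += (1 << k) * t     # adds 2^|J_i| * t_i to E's numerator
    return sum(waits) / len(waits), busy / ((1 << n) * now)


def generate(probs, T, res_lo, res_hi, rng):
    """One request per time unit t = 0..T-1 (assumed); dimension k drawn with
    probability probs[k], residence time from Uniform[res_lo, res_hi]."""
    dims = range(len(probs))
    return [(t, rng.choices(dims, weights=probs)[0], rng.uniform(res_lo, res_hi))
            for t in range(T)]
```

For instance, `simulate(5, generate([0.2] * 5, 100, 3, 7, random.Random(1)))` runs one replication with the Case-1 distributions of Table 2; the measures in the tables are averages over 100 such runs. Because service stops at the first request that does not fit, a large request at the head of the queue blocks smaller ones behind it, which is what the FCFS rule described above prescribes.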

7. Conclusion

In this paper, we propose a new type of subcube, called the link-disjoint subcube (LS), which can be used for the subcube allocation problem in hypercube computers. The concept of link-disjoint subcubes assumes that circuit switching or wormhole routing is used as the communication method, and that the e-cube routing strategy is used to route data between any pair of processors.
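The e-cube routing assumption is what makes link-disjointness checkable. The sketch below (hypothetical helper names `ecube_links`, `links_of`, `link_disjoint`) computes the links an e-cube route uses and tests whether two allocations, given as sets of node addresses, could ever contend for a link when every pair of processors inside each allocation communicates. It assumes e-cube corrects differing address bits from the lowest dimension upward and treats each physical link as a single undirected resource; it checks the property claimed for LSs but does not construct LSs the way the paper does.

```python
from itertools import permutations


def ecube_links(src: int, dst: int) -> set:
    """Undirected links {u, v} used by the e-cube route src -> dst.

    Differing address bits are corrected in ascending dimension order
    (lowest dimension first, an assumed convention).
    """
    links, node, diff, d = set(), src, src ^ dst, 0
    while diff:
        if diff & 1:                 # bit d differs: cross dimension d
            nxt = node ^ (1 << d)
            links.add(frozenset((node, nxt)))
            node = nxt
        diff >>= 1
        d += 1
    return links


def links_of(nodes) -> set:
    """Union of the links used by all ordered pairs of nodes in an allocation."""
    used = set()
    for u, v in permutations(nodes, 2):
        used |= ecube_links(u, v)
    return used


def link_disjoint(a, b) -> bool:
    """True if no link used inside allocation a is also used inside allocation b."""
    return not (links_of(a) & links_of(b))
```

In a 3-cube, the contiguous 2-subcubes {0, 1, 2, 3} and {4, 5, 6, 7} come out link-disjoint, whereas the node sets {0, 3} and {1, 2} do not: the route from 0 to 3 and the route from 1 to 2 both cross link {0, 1}. If each physical link instead carried two independent directional channels, the check would compare ordered pairs rather than `frozenset`s.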

An LS is not a contiguous subcube as in the previous schemes, but it has structural properties very similar to those of a contiguous subcube. Moreover, an LS shares no communication link with any other subcube. We have shown that when LSs are used, the performance degradation caused by non-contiguous processor allocation is lower than 1.0% in many cases.

With link-disjoint subcubes available, there are $\lfloor n/2 \rfloor \binom{n-2}{k-1} 2^{n-k}$ recognizable k-LSs in an n-cube. The number of all recognizable subcubes under our scheme is $(\lfloor n/2 \rfloor (n-k)k + n(n-1)) / (n(n-1))$ times that under the previous schemes. For example, in 10-dimensional hypercube computers the number of all recognizable subcubes is up to 2.39 times the number of CSs.
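The two closed forms are consistent with each other. As a quick check, assume (as is standard) that an n-cube contains $\binom{n}{k} 2^{n-k}$ contiguous k-subcubes and that "all recognizable subcubes" means CSs plus k-LSs; the sketch below, with hypothetical function names, computes the ratio exactly for $1 \le k \le n-1$.

```python
from fractions import Fraction
from math import comb


def num_cs(n: int, k: int) -> int:
    """Contiguous k-subcubes of an n-cube: choose k free dimensions, fix the rest."""
    return comb(n, k) * 2 ** (n - k)


def num_ls(n: int, k: int) -> int:
    """Recognizable k-LSs in an n-cube, per the count stated above (1 <= k <= n-1)."""
    return (n // 2) * comb(n - 2, k - 1) * 2 ** (n - k)


def ratio(n: int, k: int) -> Fraction:
    """(CS + LS) / CS as an exact fraction."""
    return Fraction(num_cs(n, k) + num_ls(n, k), num_cs(n, k))


def closed_form(n: int, k: int) -> Fraction:
    """The stated ratio (floor(n/2)(n-k)k + n(n-1)) / (n(n-1))."""
    return Fraction((n // 2) * (n - k) * k + n * (n - 1), n * (n - 1))
```

Since $\binom{n-2}{k-1} / \binom{n}{k} = k(n-k) / (n(n-1))$, `ratio` and `closed_form` agree for every valid k. For n = 10, `max(ratio(10, k) for k in range(1, 10))` is `Fraction(43, 18)`, about 2.389, attained at k = 5, which matches the 2.39 quoted above.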

Through simulation, the performance of our scheme was measured and compared with that of previous schemes, which can find only contiguous subcubes. The simulation results show generally higher processor utilization and lower average waiting delay than those of the buddy and FL strategies.

We have also presented the LFL strategy for subcube allocation, which finds both types of subcubes, that is, link-disjoint subcubes and contiguous subcubes. The complexity of LFL allocation and deallocation is the same as that of the free list scheme. The algorithm can easily be built on top of existing strategies for contiguous subcubes.

References

[1] M.-S. Chen and K.G. Shin, Processor allocation in an n-cube multiprocessor using Gray codes, IEEE Trans. Comput. 36 (12) (1987) 1396-1407.

[2] P.-J. Chuang and N.-F. Tzeng, A fast recognition-complete processor allocation strategy for hypercube computers, IEEE Trans. Comput. 41 (4) (1992) 467-479.


[3] S. Dutt and J.P. Hayes, On allocating subcubes in a hypercube multiprocessor, in: Proc. 3rd Conf. on Hypercube Concurrent Computers and Applications (1988) 801-810.

[4] G.C. Fox, M.A. Johnson, G.A. Lyzenga, S.W. Otto, J.K. Salmon and D.W. Walker, Solving Problems on Concurrent Processors, Volume I - General Techniques and Regular Problems (Prentice-Hall, Englewood Cliffs, NJ, 1988).

[5] J. Kim, C.R. Das and W. Lin, A top-down processor allocation scheme for hypercube computers, IEEE Trans. Parallel and Distributed Systems 2 (1) (1991) 20-30.

[6] J.-U. Kim, C.-H. Lee and K.H. Park, Modified subcube for processor allocation in circuit-switched hypercubes, in: Proc. 12th Ann. Internat. Phoenix Conf. on Computers and Communications (1993) 1-8.

[7] S. Latifi, The efficiency of the folded hypercube in subcube allocation, in: Proc. 1990 Internat. Conf. on Parallel Processing, Vol. I (1990) 218-221.

[8] L.M. Ni and P.K. McKinley, A survey of wormhole routing techniques in direct networks, IEEE Comput. 26 (2) (1993) 62-76.

[9] D.D. Sharma and D.K. Pradhan, Fast and efficient strategies for cubic and non-cubic allocation in hypercube multiprocessors, in: Proc. 1993 Internat. Conf. on Parallel Processing (1993) I-118-127.

[10] H. Sullivan and T.R. Bashkow, A large scale homogeneous machine, in: Proc. 4th Ann. Internat. Symp. on Computer Architecture (1977) 105-124.

[11] Q. Yang and H. Wang, A new graph approach to minimizing processor fragmentation in hypercube multiprocessors, IEEE Trans. Parallel and Distributed Systems 4 (10) (1993) 1165-1171.
