
Efficient Mapping and Voltage Islanding Technique for Energy Minimization in NoC under Design Constraints

Pavel Ghosh
Computing, Informatics and Decision Systems Engineering, Arizona State University
Tempe, AZ, USA, 85287
[email protected]

Arunabha Sen
Computing, Informatics and Decision Systems Engineering, Arizona State University
Tempe, AZ, USA, 85287
[email protected]

ABSTRACT
The voltage islanding technique in Network-on-Chip (NoC) can significantly reduce the computational energy consumption by scaling down the voltage levels of the processing elements (PEs). This reduction in energy consumption comes at the cost of the energy consumption of the level shifters between voltage islands. Moreover, from a physical design perspective it is desirable to have a limited number of voltage islands. Considering voltage islanding during mapping of the PEs to the NoC routers can significantly reduce both the computational and the level-shifter energy consumption as well as the communication energy consumption on the NoC links. In this paper, we formulate the problem as an optimization problem with the objective of minimizing the overall energy consumption, constrained by the performance in terms of delay and by the maximum number of voltage islands. We provide the optimal solution to our problem using a Mixed Integer Linear Program (MILP) formulation. We also propose a heuristic based on random greedy selection to solve the problem. Experimental results using E3S benchmark applications and some real applications show that the heuristic finds near-optimal solutions in almost all cases in a very small fraction of the time required to obtain the optimal solution.

Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids - Placement and Routing, Layout; C.3 [Special-purpose and Application-based Systems]; G.1.6 [Numerical Analysis]: Optimization - Linear Programming

General Terms
Algorithms, Design, Experimentation, Theory

Keywords
Multi-Processor System-on-Chip (MPSoC), Network-on-Chip (NoC), voltage islanding, integer linear program, greedy randomized heuristic

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SAC’10 March 22-26, 2010, Sierre, Switzerland.
Copyright 2010 ACM 978-1-60558-638-0/10/03 ...$10.00.

1. INTRODUCTION
In recent years Multi-Processor System-on-Chip (MPSoC) design has become extremely challenging due to the increasing complexities in processor and semiconductor technologies. Due to the growing complexity of consumer embedded products, and the complexity of new communication and multimedia standards, future MPSoCs are predicted to contain several hundreds of processing elements (PEs) communicating among themselves at very high-speed rates. In order to meet the increasing requirements of performance, scalability and flexibility, shared bus based communication infrastructure is no longer adequate for MPSoCs. Network-on-Chip (NoC) provides an alternative to the bus-based on-chip communication that can overcome the problems of performance, scalability and flexibility. In order to handle the increasing complexities in the MPSoC designs, the necessity for NoC has been discussed by several researchers [2, 3, 8].

As the number of PEs on an MPSoC and the data traffic between them continue to grow, minimization of energy consumption subject to the performance constraint has become one of the most important objectives. Power consumption of VLSI circuits can be roughly broken down into two components: static power and dynamic power. While static power mainly relates to the leakage current, dynamic power $P_d$ is a result of the switching activities of the circuit, given by:

\[ P_d = k C V_{dd}^2 f \]

where $k$ is the switching rate, $C$ is the load capacitance, $V_{dd}$ is the supply voltage and $f$ is the clock frequency. As dynamic power is proportional to the square of the supply voltage $V_{dd}$, reducing $V_{dd}$ can significantly reduce the dynamic power consumption. Among the various approaches taken to reduce power consumption of MPSoCs, the use of multi-supply voltages (MSV) has gained popularity among researchers. The performance-critical PEs generally require a high supply voltage to meet the performance requirements, while the non-critical PEs can be slowed down using lower supply voltages, thus reducing power consumption. Utilizing this power-performance tradeoff, the idea of multi-voltage islanding was first proposed in [9]. A voltage island on a chip is defined as a cluster of adjacent PEs all operating at the same voltage level. Although scaling down the voltage levels of PEs is favorable for reducing energy consumption, an excessive number of voltage islands may be detrimental from the perspective of physical design [14], as it fragments the chip into voltage islands and increases the layout complexity of the power delivery network. Therefore, voltage islanding needs to be considered during the mapping of the PEs to the NoC routers. This way computational energy consumption can be reduced without creating an excessive number of voltage islands. Moreover, mapping heavily connected PEs to adjacent locations will reduce communication energy consumption on NoC links.
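As a rough worked illustration of the quadratic dependence of dynamic power on supply voltage, assume that the switching rate $k$, the load capacitance $C$ and the clock frequency $f$ are held fixed. This is an idealization, since in practice the achievable frequency usually drops with $V_{dd}$, and the two voltage values below are illustrative choices only:

```latex
\[
\frac{P_d(1.77\,\mathrm{V})}{P_d(2.5\,\mathrm{V})}
  = \frac{k\,C\,(1.77)^2 f}{k\,C\,(2.5)^2 f}
  = \frac{3.1329}{6.25}
  \approx 0.50
\]
```

Under these assumptions, lowering the supply voltage by about 29% roughly halves the dynamic power of a PE.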

1.1 Related Work
In [10], the authors have developed dynamic programming based voltage island partitioning, level shifter insertion and power network aware floorplanning for power optimization within timing constraints. The authors in [15] have proposed a dynamic programming based approach for voltage selection and island creation to minimize overall SoC power, area and floorplanner runtime. In [12], the authors optimize total power consumption and power network complexity without compromising wirelength and chip area. In [11], the authors have proposed an $\alpha^2$-approximation algorithm for the voltage islanding problem, where $\alpha$ is the ratio of the maximum and minimum voltage values. In [17], the objective of optimizing power consumption with limited design cost and number of level shifters has been considered. The authors in [6] develop a Simulated Annealing based framework with a cost function combining the number of voltage islands, power consumption and area overhead. In [7], the authors propose temperature-aware voltage islanding and floorplanning to minimize the peak and average temperature across the SoC, area, wirelength and power budget. In [13], the authors consider voltage islanding in NoC to minimize energy consumption by solving a non-linear problem formulation. These approaches either allow voltage islanding only on existing placements of PEs, and thus suffer from early design decisions, or do not consider the energy consumption of the level shifters. Also, some of these approaches explore the entire design space exhaustively, and thus become less practical as the number of PEs increases.

It has been mentioned in [13] that the energy consumption of level shifters can be significantly high. In [1], the authors design a voltage level converter circuit with an estimated energy consumption proportional to the difference of the squares of the voltage levels of the two end-points. We follow a similar characterization in this paper. To the best of our knowledge, our paper is the first that considers the voltage islanding problem for NoC at an early phase of design, taking into account all the contributing factors for energy consumption, including the level shifters' overhead.
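The level shifter characterization described above can be sketched as a small function. This is only an illustration of the stated proportionality: the function name and the constant `k_shift` are hypothetical, and no numeric value from [1] is implied.

```python
def level_shifter_energy(v_a: float, v_b: float, k_shift: float = 1.0) -> float:
    """Energy proxy for a level shifter between voltage levels v_a and v_b.

    Assumes energy = k_shift * |v_a^2 - v_b^2|, where k_shift is a
    hypothetical technology-dependent constant (not taken from the paper).
    """
    return k_shift * abs(v_a ** 2 - v_b ** 2)


# Two PEs at the same voltage need no shifter energy; the cost grows
# with the gap between the squared voltage levels.
same = level_shifter_energy(2.5, 2.5)
gap = level_shifter_energy(2.5, 1.77)
```

With this model, neighbouring PEs inside one voltage island contribute nothing, which is why placing PEs with common voltage levels next to each other reduces the level shifter term.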

1.2 Key Contributions
The contributions of our paper can be listed as follows:

• The voltage islanding problem for NoC is considered as part of the mapping of the PEs to the NoC router nodes.

• All three components of energy consumption, namely computational energy consumption, communication energy consumption on links and level shifter energy consumption, have been considered.

• The mapping and voltage islanding problem is formulated as an optimization problem and a Mixed Integer Linear Program (MILP) is developed for the optimal solution of the problem.

• An efficient heuristic based on random greedy selection is provided for good quality solutions.

• Experimental results evaluate the quality of the heuristic as compared to the optimal.

[Figure 1 panels, 2 × 2 mesh placements: (a) A B / C D, mapping with 4 voltage islands; (b) A B / C D, mapping with 2 voltage islands, but higher energy consumption; (c) A D / B C, mapping with 2 voltage islands and lower energy consumption]

Figure 1: The effect of mapping on final voltage islanding and energy consumption. Light green/light gray indicates the lower voltage level V1 and dark green/dark gray indicates the higher voltage level V2.

1.3 Motivational Example
The motivation behind our approach can be shown in the following example. It shows that considering voltage islanding as part of the mapping of the PEs to router nodes can be advantageous in terms of saving energy. Let us consider a simple communication trace graph consisting of 4 PEs, A, B, C and D. The edges in the graph, (A,B), (A,C), (B,D) and (C,D), represent the communication among the PEs. We define the allowable voltage levels for each PE as the set of voltage levels at which the task executing at the PE completes within its performance bound. Let us consider that the allowable voltage levels for PEs A and D are $V_1 = 1.77$ V and $V_2 = 2.5$ V, whereas the only allowable voltage level for PEs B and C is $V_2 = 2.5$ V. Now, let us consider placing the PEs in a regular $2 \times 2$ mesh NoC architecture topology. Without voltage islanding in mind, we may map the communicating PEs to adjacent router nodes of the NoC; this mapping is shown in Fig. 1(a). If we operate all the PEs at the lowest possible voltage levels, there will be 4 voltage islands requiring 4 level shifters, as shown in the figure. If the design constraint specifies the maximum number of voltage islands to be 2, we need to raise the voltage level of either A or D, as shown in Fig. 1(b). This will lead to higher computation energy consumption due to the higher voltage assignment to one of those PEs (shown as D in the figure). On the other hand, if voltage islanding had been considered during the mapping of the PEs, we could have achieved the mapping shown in Fig. 1(c). In this mapping, all the PEs can be operated at their lowest possible voltage levels while still satisfying the design constraint. As shown in Fig. 1(c), the number of voltage islands created is 2, and only 2 level shifters are required.

From this example, it is clear that considering the voltage islanding technique for energy minimization at an early phase of design leads to better energy reduction. In our paper, we have combined all three components of energy consumption (computation, level shifter and link communication) in the objective function, and included the constraint of maximum number of voltage islands.
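The island and level shifter counts of the example above can be checked with a short sketch. It assumes that a voltage island is a connected group of mesh-adjacent PEs at the same voltage, that one level shifter is needed per mesh link joining two different voltages, and that the panel layouts of Fig. 1 read row by row; the function names are hypothetical.

```python
from collections import deque


def count_islands(grid, voltage):
    """Count connected groups of mesh-adjacent PEs sharing one voltage."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    islands = 0
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen:
                continue
            islands += 1
            seen.add((r, c))
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill over equal-voltage neighbours
                cr, cc = queue.popleft()
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and (nr, nc) not in seen
                            and voltage[grid[nr][nc]] == voltage[grid[cr][cc]]):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
    return islands


def count_level_shifters(grid, voltage):
    """Count mesh links whose two endpoints run at different voltages."""
    rows, cols = len(grid), len(grid[0])
    shifters = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols and voltage[grid[r][c]] != voltage[grid[r][c + 1]]:
                shifters += 1
            if r + 1 < rows and voltage[grid[r][c]] != voltage[grid[r + 1][c]]:
                shifters += 1
    return shifters


V1, V2 = 1.77, 2.5
lowest = {"A": V1, "B": V2, "C": V2, "D": V1}  # every PE at its lowest allowed level
raised = {"A": V1, "B": V2, "C": V2, "D": V2}  # D raised to V2, as in Fig. 1(b)

map_ab = [["A", "B"], ["C", "D"]]  # placement of Fig. 1(a) and 1(b)
map_c = [["A", "D"], ["B", "C"]]   # placement of Fig. 1(c)

results = {
    "a": (count_islands(map_ab, lowest), count_level_shifters(map_ab, lowest)),
    "b": (count_islands(map_ab, raised), count_level_shifters(map_ab, raised)),
    "c": (count_islands(map_c, lowest), count_level_shifters(map_c, lowest)),
}
```

Under these assumptions, `results["a"]` gives 4 islands and 4 level shifters, while `results["c"]` gives 2 islands and 2 level shifters with every PE at its lowest allowable voltage, matching the counts stated in the example.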

1.4 System Model and Assumptions
Our targeted system can be classified as an application specific standard product (ASSP) [3]. The traffic patterns and bandwidth requirements are known a priori in this kind of system. We consider the routers to follow static source routing, where the routing of the data paths remains static based on the routing table at the source node. Delay analysis is based on the static analysis technique described in [5]. The characterization of allowable voltage levels can be done independently for each PE as in [6, 7, 12, 15]. In this paper, we consider a regular mesh NoC topology, but our approach can be used for any other regular NoC topology. Each of the routers has 5 ports. One of the ports is used for connecting it to a PE and the other four are used for connection to the neighboring routers. The voltage level shifter energy consumption model is taken from [1]. Link communication energy consumption parameters have been taken from [16].

1.5 Paper Outline
The rest of the paper is organized as follows. Section 2 gives a formal definition of the problem and formulates it as an optimization problem. In section 3 we develop a Mixed Integer Linear Program (MILP) to obtain the optimal solution to the problem. We develop an efficient heuristic to solve the problem in section 4. The experimental results are discussed in section 5. Finally, we conclude the paper in section 6.

2. PROBLEM FORMULATION
In this section, we give the formal definition of the mapping and voltage islanding problem. Considering regular mesh topologies, the problem can be stated as follows: Given,

• A mesh topology for the NoC, $G_R(V_R, E_R)$, with dimension $M \times N$, where $V_R$ is the set of NoC routers and $E_R$ is the set of NoC links.

• Each node of the mesh topology is represented by a tuple $(xc_i, yc_i)$, representing the x-y coordinate of the node, i.e., $\forall u_i \in V_R$, there is an associated $(xc_i, yc_i)$, where $1 \le xc_i \le M$ and $1 \le yc_i \le N$.

• Distance between $u_i, u_j \in V_R$ is denoted by $d_{ij}$ (in number of hops): $d_{ij} = |xc_i - xc_j| + |yc_i - yc_j|$.

• Communication trace graph $G_P(V_P, E_P)$ ($|V_P| \le |V_R|$).

• Each $p_i \in V_P$ represents a Processing Element (PE) and an edge $e_{ij} \in E_P$ represents a communication trace between PEs $p_i$ and $p_j$, where $e_{ij} = (p_i, p_j)$.

• Communication volume (in number of bytes) associated with each $e_{ij} \in E_P$, denoted by $c_{ij}$.

• Each $p_i \in V_P$ is associated with an allowable set of voltage levels, $L_i = \{v_i^1, v_i^2, \ldots, v_i^{n_i}\}$, operating at each of which the task assigned at PE $p_i$ finishes within the specified deadline.

• Communication delay bound $\lambda_{ij}$ for $e_{ij} = (p_i, p_j) \in E_P$.

• Maximum number of allowable voltage islands ($\kappa$).

• $\eta_i^v$ = computation energy consumption of $p_i \in V_P$ at voltage level $v \in L_i$.

• $\alpha_{v_1 v_2}$ = level shifter energy consumption between two adjacent PEs operating at voltage levels $v_1$ and $v_2$.

• $\psi_l$ = power consumption of the links (/mm/Mbps).

The objective is to place the PEs on the nodes of the NoC mesh topology such that the cost function is minimized without violating any design constraints. The cost function for this problem has three components:
(1) Computation energy consumption
(2) Level shifter energy consumption
(3) Communication energy consumption
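The mesh-related inputs above can be sketched in a few lines: router coordinates, the link set $E_R$, the hop distance $d_{ij}$, and a check of the delay bounds when $\lambda_{ij}$ is expressed in hops. The container layouts and function names are illustrative assumptions, not part of the formulation.

```python
from itertools import product


def mesh_nodes(m, n):
    """Router coordinates (xc, yc) with 1 <= xc <= m and 1 <= yc <= n."""
    return list(product(range(1, m + 1), range(1, n + 1)))


def hop_distance(u, v):
    """Manhattan distance d_ij between two router coordinates."""
    return abs(u[0] - v[0]) + abs(u[1] - v[1])


def mesh_links(m, n):
    """Undirected links E_R: pairs of routers exactly one hop apart."""
    nodes = mesh_nodes(m, n)
    return [(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]
            if hop_distance(u, v) == 1]


def violated_edges(placement, delay_bound):
    """Edges e_ij whose endpoints are placed more than lambda_ij hops apart.

    placement maps a PE name to a router coordinate; delay_bound maps a
    PE pair to its hop bound. Both layouts are hypothetical.
    """
    return [e for e, bound in delay_bound.items()
            if hop_distance(placement[e[0]], placement[e[1]]) > bound]


links_3x3 = mesh_links(3, 3)
placement = {"p1": (1, 1), "p2": (3, 2), "p3": (1, 2)}
bad = violated_edges(placement, {("p1", "p2"): 2, ("p1", "p3"): 1})
```

Here `p1` and `p2` are 3 hops apart, so the bound of 2 hops on that edge is reported as violated, while the edge between `p1` and `p3` is within its bound.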

3. OPTIMAL SOLUTION
In this section, we use mathematical programming techniques to solve the mapping and voltage islanding problem. We formulate the problem as a Mixed Integer Linear Program (MILP). The objective function combines all three components of energy consumption. We shall first define the variables used in the formulation, followed by the definition of the objective function and the constraints.

Variables:

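1. $\forall p_i \in V_P, \forall k \in L_i$,
\[ x_{ik} = \begin{cases} 1, & \text{if } p_i \text{ is operating at voltage level } k \\ 0, & \text{otherwise} \end{cases} \]

2. $\forall p_i \in V_P, \forall u_p \in V_R$,
\[ \delta_{ip} = \begin{cases} 1, & \text{if } p_i \text{ is placed at location } u_p \\ 0, & \text{otherwise} \end{cases} \]

3. $\forall p_i, p_j \in V_P, \forall k \in L_i, \forall l \in L_j$,
\[ y_{ijkl} = \begin{cases} 1, & \text{if } x_{ik} = 1 \text{ and } x_{jl} = 1 \\ 0, & \text{otherwise} \end{cases} \]

4. $\forall p_i, p_j \in V_P, \forall u_p, u_q \in V_R$,
\[ \beta_{ijpq} = \begin{cases} 1, & \text{if } \delta_{ip} = 1 \text{ and } \delta_{jq} = 1 \\ 0, & \text{otherwise} \end{cases} \]

5. $\forall p_i, p_j \in V_P, \forall u_p, u_q \in V_R, \forall k \in L_i, \forall l \in L_j$,
\[ \gamma^{kl}_{ijpq} = \begin{cases} 1, & \text{if } y_{ijkl} = 1 \text{ and } \beta_{ijpq} = 1 \\ 0, & \text{otherwise} \end{cases} \]

6. Considering $m = |V_P|$, in the extreme case there can be at most $m$ voltage islands. We number the islands in sequence (this formulation will be utilized to design the constraint on the maximum number of voltage islands). Then, $\forall p_i \in V_P, 1 \le k \le m$,
\[ \theta_{ik} = \begin{cases} 1, & \text{if } p_i \text{ is in island } k \\ 0, & \text{otherwise} \end{cases} \]

7. $\forall k, 1 \le k \le m$,
\[ t_k = \begin{cases} 1, & \text{if there are nonzero elements in island } k \\ 0, & \text{otherwise} \end{cases} \]

8. $\forall p_i, p_j \in V_P$,
\[ \omega_{ij} = \begin{cases} 1, & \text{if } p_i, p_j \text{ are placed at adjacent locations of the mesh} \\ 0, & \text{otherwise} \end{cases} \]

9. $\forall p_i, p_j \in V_P$,
\[ \rho_{ij} = \begin{cases} 1, & \text{if } p_i, p_j \text{ are operating at the same voltage} \\ 0, & \text{otherwise} \end{cases} \]

10. $\forall p_i, p_j \in V_P$,
\[ \zeta_{ij} = \begin{cases} 1, & \text{if } p_i, p_j \text{ reside in the same voltage island} \\ 0, & \text{otherwise} \end{cases} \]

11. $\forall p_i, p_j \in V_P, 1 \le k \le m$,
\[ \theta'_{ijk} = \begin{cases} 1, & \text{if } \theta_{ik} = 1 \text{ and } \theta_{jk} = 1 \\ 0, & \text{otherwise} \end{cases} \]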

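Objective Function:
The objective function consists of three components: the computational energy consumption on the PEs $E_1$, the energy consumption on the level shifters $E_2$ and the communication energy consumption on the NoC links $E_3$. Each of these can be calculated as:

\[ E_1 = \sum_{p_i \in V_P} \sum_{k \in L_i} \eta_i^k \, x_{ik} \]

\[ E_2 = \sum_{(u_p, u_q) \in E_R} \sum_{p_i, p_j \in V_P} \sum_{k \in L_i,\, l \in L_j} \alpha_{kl} \, y_{ijkl} \, \beta_{ijpq}
       = \sum_{(u_p, u_q) \in E_R} \sum_{p_i, p_j \in V_P} \sum_{k \in L_i,\, l \in L_j} \alpha_{kl} \, \gamma^{kl}_{ijpq} \]

\[ E_3 = \sum_{e_{ij} \in E_P} \sum_{u_p, u_q \in V_R} c_{ij} \, d_{pq} \, \beta_{ijpq} \, \psi_l \]

Therefore, the objective function can be written as

\[ \text{obj}: \text{minimize } E_1 + E_2 + E_3 \]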
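For a fixed placement and voltage assignment, the binary variables are all determined and the three sums collapse to simple loops. The following sketch evaluates them directly; the data layout, the function names and the toy numbers are assumptions made for illustration, and each mesh link is counted once in $E_2$.

```python
def hops(u, v):
    """Manhattan hop distance between router coordinates."""
    return abs(u[0] - v[0]) + abs(u[1] - v[1])


def total_energy(placement, level, eta, alpha, volume, psi):
    """Evaluate (E1, E2, E3) for one fixed mapping and voltage assignment.

    placement: PE -> (x, y) router coordinate (the delta variables set to 1)
    level:     PE -> chosen voltage level (the x variables set to 1)
    eta:       (PE, level) -> computation energy
    alpha:     function (level_a, level_b) -> level shifter energy
    volume:    (PE_i, PE_j) -> communication volume c_ij
    psi:       link energy coefficient
    """
    pes = sorted(placement)
    e1 = sum(eta[(p, level[p])] for p in pes)
    # Each mesh link with a PE on both ends is counted once; this is an
    # assumption about how the link set E_R is enumerated.
    e2 = sum(alpha(level[p], level[q])
             for i, p in enumerate(pes) for q in pes[i + 1:]
             if hops(placement[p], placement[q]) == 1)
    e3 = sum(c * hops(placement[p], placement[q]) * psi
             for (p, q), c in volume.items())
    return e1, e2, e3


def alpha(a, b):
    """Toy level shifter model: difference of squared voltage levels."""
    return abs(a * a - b * b)


# Toy instance with made-up numbers, only to exercise the function.
placement = {"A": (1, 1), "B": (1, 2), "C": (2, 2)}
level = {"A": 1.0, "B": 2.0, "C": 2.0}
eta = {("A", 1.0): 3.0, ("B", 2.0): 8.0, ("C", 2.0): 8.0}
volume = {("A", "B"): 10, ("A", "C"): 4}
e1, e2, e3 = total_energy(placement, level, eta, alpha, volume, psi=0.5)
```

An evaluator of this kind is what a heuristic needs in order to compare candidate solutions, whereas the MILP expresses the same quantities through the product variables $\gamma^{kl}_{ijpq}$ and $\beta_{ijpq}$.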

Constraints:

1. Each PE will be assigned to exactly one voltage level in its allowable voltage levels
\[ \forall p_i \in V_P: \sum_{k \in L_i} x_{ik} = 1 \]

2. Each PE will be placed at exactly one location of the mesh and no two PEs are placed at the same location of the mesh, i.e.,
\[ \forall p_i \in V_P: \sum_{u_p \in V_R} \delta_{ip} = 1 \quad \text{and} \quad \forall u_p \in V_R: \sum_{p_i \in V_P} \delta_{ip} \le 1 \]

3. Each PE resides in exactly one voltage island
\[ \forall p_i \in V_P: \sum_{k=1}^{m} \theta_{ik} = 1 \]

4. The number of voltage islands created is within the allowable maximum limit $\kappa$
\[ \sum_{k=1}^{m} t_k \le \kappa \]
From the definition of $t_k$, the LHS of the above inequality is the number of voltage islands created.

5. For a large constant $M$ (which can be set here to be equal to $m = |V_P|$)
\[ \forall k, 1 \le k \le m: \; M \, t_k \ge \sum_{p_i \in V_P} \theta_{ik} \]
The RHS of the above inequality indicates the number of PEs in island $k$. The inequality forces this to be zero in case $t_k$ is zero. Also, to enforce $t_k$ to be 0 when the RHS is 0, we need
\[ \forall k, 1 \le k \le m: \; \sum_{p_i \in V_P} \theta_{ik} \ge t_k \]

6. We can consider the delay bound $\lambda_{ij}$ specified in number of hops. Therefore
\[ \forall e_{ij} = (p_i, p_j) \in E_P: \sum_{u_p, u_q \in V_R} d_{pq} \, \beta_{ijpq} \le \lambda_{ij} \]

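7. The following two constraints are required to ensure that $y_{ijkl} = 1$ if and only if $x_{ik} = 1$ and $x_{jl} = 1$, i.e., $\forall p_i, p_j \in V_P, k \in L_i, l \in L_j$:
\[ x_{ik} + x_{jl} \ge 2 y_{ijkl} \quad \text{and} \quad x_{ik} + x_{jl} - 1 \le y_{ijkl} \]

8. The following two constraints ensure that $\beta_{ijpq} = 1$ if and only if $\delta_{ip} = 1$ and $\delta_{jq} = 1$, i.e., $\forall p_i, p_j \in V_P, u_p, u_q \in V_R$:
\[ \delta_{ip} + \delta_{jq} \ge 2 \beta_{ijpq} \quad \text{and} \quad \delta_{ip} + \delta_{jq} - 1 \le \beta_{ijpq} \]

9. The following two constraints ensure that $\gamma^{kl}_{ijpq} = 1$ if and only if $y_{ijkl} = 1$ and $\beta_{ijpq} = 1$. We formulate $\forall p_i, p_j \in V_P, k \in L_i, l \in L_j, u_p, u_q \in V_R$:
\[ y_{ijkl} + \beta_{ijpq} \ge 2 \gamma^{kl}_{ijpq} \quad \text{and} \quad y_{ijkl} + \beta_{ijpq} - 1 \le \gamma^{kl}_{ijpq} \]

10. This constraint ensures the definition of $\omega_{ij}$.
\[ \forall p_i, p_j \in V_P: \omega_{ij} = \sum_{(u_p, u_q) \in E_R} \beta_{ijpq} \]

11. This constraint ensures the definition of $\rho_{ij}$.
\[ \forall p_i, p_j \in V_P: \rho_{ij} = \sum_{v \in L_i \cap L_j} y_{ijvv} \]

12. Since $\zeta_{ij} = 1$ if and only if $\theta'_{ijk} = 1$ for some $k$, $1 \le k \le m$, we can write this constraint as follows:
\[ \forall p_i, p_j \in V_P: \zeta_{ij} = \sum_{k=1}^{m} \theta'_{ijk} \]

13. At the same time, two PEs can reside in the same island if they are placed at adjacent locations and are operating at the same voltage level, i.e., $\zeta_{ij} = 1$ if $\omega_{ij} = 1$ and $\rho_{ij} = 1$. Also, if $\rho_{ij} = 0$, then $\zeta_{ij}$ has to be 0. This is ensured by the following:
\[ \forall p_i, p_j \in V_P: \omega_{ij} + \rho_{ij} - 1 \le \zeta_{ij} \quad \text{and} \quad \rho_{ij} \ge \zeta_{ij} \]

14. This constraint ensures the definition of $\theta'_{ijk}$, i.e., $\forall p_i, p_j \in V_P, \forall k, 1 \le k \le m$:
\[ \theta_{ik} + \theta_{jk} \ge 2 \theta'_{ijk} \quad \text{and} \quad \theta_{ik} + \theta_{jk} - 1 \le \theta'_{ijk} \]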
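Constraints 7, 8, 9 and 14 all use the same pair of inequalities to make a binary variable equal to the logical AND of two other binary variables. A brute-force check over all binary values confirms that the pair leaves exactly one feasible value; the function name is hypothetical.

```python
from itertools import product


def and_linearization_feasible(a, b, z):
    """Check the two linear constraints used to force z = a AND b."""
    return a + b >= 2 * z and a + b - 1 <= z


# For every binary (a, b), collect the z values the constraints allow.
allowed = {
    (a, b): [z for z in (0, 1) if and_linearization_feasible(a, b, z)]
    for a, b in product((0, 1), repeat=2)
}
```

For each of the four input combinations the only feasible `z` is the product of `a` and `b`, which is why the products $y_{ijkl} \beta_{ijpq}$ in $E_2$ can be replaced by the single variable $\gamma^{kl}_{ijpq}$ while keeping the program linear.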

4. HEURISTIC SOLUTION BASED ON RANDOM GREEDY SELECTION APPROACH
Most of the efforts in developing algorithms for mapping and voltage assignment have been based on heuristics due to the hardness of the problems. In this section, we provide a greedy heuristic to solve the mapping and voltage islanding problem defined in section 2. One drawback of a greedy policy is that it may quite easily get stuck in locally optimal solutions. In order to avoid this, we introduce randomness in our greedy selection. Before we explain the heuristic, we give an overview of the major factors considered in the heuristic:

• Communication volume $c_{ij}$ - The larger the value of $c_{ij}$ for a pair $p_i, p_j \in V_P$ ($e_{ij} \in E_P$), the more desirable it is to place them close to each other in order to reduce the communication energy consumption.

• Common allowable voltage levels $L_i \cap L_j$ - The larger the value of $|L_i \cap L_j|$, the more likely it is that the pair of nodes $p_i, p_j \in V_P$ can be operated at the same lower voltage level when placed close to each other. This reduces both the voltage island fragmentation (and thus the number of voltage islands) and the energy consumption due to computation and level shifters.

• Delay bound $\lambda_{ij}$ - The smaller the value of $\lambda_{ij}$ for an edge $e_{ij} \in E_P$, the more desirable it is to place its endpoints close to each other in order to satisfy the delay bound.

• Distance $d_{ij}$ - The larger the value of $d_{ij}$ between $p_i$ and $p_j$ in the current placement, the more likely it is that they need to be moved closer.

Based on these observations, we define a function $f(i, j)$ for every $e_{ij} \in E_P$, which is directly proportional to $c_{ij}$, $|L_i \cap L_j|$ and $d_{ij}$, and inversely proportional to $\lambda_{ij}$. The higher the value of $f(i, j)$, the more desirable it is to place $p_i$ and $p_j$ closer to each other. Before defining the function, we need to scale all the constituent factors to the same order of magnitude, so that none of them alone dominates the value of $f(i, j)$. It should be noted that only the $d_{ij}$ factor changes from solution to solution, and thus it scales the effect of the other factors in deciding the movement of the PEs. Now, we can define $f(i, j)$ as:

\[ \forall e_{ij} = (p_i, p_j) \in E_P: \quad f(i, j) = \frac{c_{ij} \times |L_i \cap L_j|}{\lambda_{ij}} \times d_{ij} \]
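The definition of $f(i, j)$ can be sketched as follows. The inputs are assumed to be already scaled to comparable magnitudes, and the function names and sample numbers are illustrative only.

```python
def common_level_count(levels_i, levels_j):
    """|L_i ∩ L_j| for two allowable voltage level sets."""
    return len(set(levels_i) & set(levels_j))


def priority(c_ij, common_levels, lambda_ij, d_ij):
    """f(i, j) = c_ij * |L_i ∩ L_j| / lambda_ij * d_ij (inputs pre-scaled)."""
    return c_ij * common_levels / lambda_ij * d_ij


# Same edge evaluated for two placements: endpoints 1 hop and 3 hops apart.
n_common = common_level_count({1.9, 2.3, 2.5}, {2.3, 2.5, 3.3})
near = priority(c_ij=0.8, common_levels=n_common, lambda_ij=2, d_ij=1)
far = priority(c_ij=0.8, common_levels=n_common, lambda_ij=2, d_ij=3)
```

Since only $d_{ij}$ differs between the two calls, the value for the distant placement is three times the value for the adjacent one, so the distant pair is ranked higher as a candidate to be moved closer.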

For the comparison of the solutions we use the following cost function associated with each solution:

\[ C = E_1 + E_2 + E_3 + \phi_{pen} \times DV \]

where $E_1$ is the computation energy consumption, $E_2$ is the level shifter energy consumption, and $E_3$ is the communication energy consumption. $DV$ is the number of delay bound violations in the solution. We set $\phi_{pen}$ to be a constant of considerably higher order of magnitude. This way, solutions with delay bound violations can be easily distinguished from others, as their cost will be of a higher order of magnitude, and they will be less likely to be returned as the final solution.

The details of the algorithm are as follows. In line 1 of the algorithm, the current solution set is initialized as empty. The for-loop (lines 2 to 27) iterates $N_1$ times, each iteration returning a solution $S_i$. In each iteration, we first set the placement of the PEs randomly (line 3). In the inner for-loop (lines 4 to 20), this solution is perturbed $N_2$ times. First we calculate the set $F$ of $f(i, j)$ values for all $e_{ij} \in E_P$, and sort them (lines 5-6). Based on the user-specified parameter $a$, we take the $a \times |F|$ highest values of the set $F$ and select one of them randomly (line 7). For example, if $a = 0.3$, then the highest 30% of the values of $F$ are considered and one of them is selected randomly. We call this selection $f(x, y)$. While $p_x$ and $p_y$ are already adjacently placed, we remove $f(x, y)$ from $F$ and select another pair $p_x, p_y$ corresponding to a new $f(x, y)$ (lines 8-11). Among all neighbors of $p_x$ and $p_y$, the one $p_u$ having the minimum overlapping voltage range with $p_x$ or $p_y$ is chosen (lines 12-13), and it swaps positions with $p_x$ or $p_y$ accordingly (lines 14-18). Now $p_x$ and $p_y$ are adjacently placed, and we set their voltage level to the lowest common one (line 19). After $N_2$ such perturbations are performed, we calculate the number of islands in the current solution, numIslands (line 21). If this value is greater than the maximum limit on the number of voltage islands $\kappa$, we choose the PE which is operating at the lowest voltage level and has the minimum number of neighbors at the same voltage level. We change the voltage level of this PE to the lowest of the higher voltage levels used by its neighbors. This is repeated until numIslands $\le \kappa$ (lines 22-25). The current solution is appended to $X$ as $S_i$ (line 26). At the end of $N_1$ iterations, the solution having the minimum cost value is returned (lines 28-29). It is to be noted here that all $S_i \in X$ are feasible solutions in terms of the maximum allowable voltage islands constraint. Also, the cost function $C$ introduces a high penalty for solutions violating delay bounds, and thus effectively removes them from the final consideration.

Algorithm 1 Random Greedy Selection Heuristic

Input: Given problem formulation in section 2, selection parameter $a$ ($0 \le a \le 1$), and two iteration limits $N_1$ and $N_2$.
Output: Solution $S$ consisting of the physical locations and the voltage level assignments for all PEs $p_i \in V_P$.

 1: Initialize solution set $X$ as empty.
 2: for ($i = 1$ to $N_1$) do
 3:   Randomly place the PEs on distinct locations $u_p \in V_R$.
 4:   for ($j = 1$ to $N_2$) do
 5:     $\forall e_{ij} \in E_P$, calculate the set $F$ of $f(i, j)$ values.
 6:     Sort $F$ according to $f(i, j)$ values.
 7:     From the $a \times |F|$ highest values of $F$ select $f(x, y) \in F$ randomly.
 8:     while ($p_x$ and $p_y$ are already adjacently placed) do
 9:       $F = F \setminus \{f(x, y)\}$
10:       Select a new $f(x, y) \in F$
11:     end while
12:     $\forall p_w \in neighbor(p_x)$ and $\forall p_z \in neighbor(p_y)$, calculate the set $H$ of common voltage cardinality values, i.e., $|L_x \cap L_w|$ and $|L_y \cap L_z|$.
13:     Select the minimum value element from $H$, which is due to neighbor $p_u$.
14:     if ($p_u$ is neighbor of $p_y$) then
15:       Swap position of $p_u$ with $p_x$
16:     else
17:       Swap position of $p_u$ with $p_y$
18:     end if
19:     Set voltage level of $p_x$ and $p_y$ at $v = \min(L_x \cap L_y)$, and do not change them within the inner iteration
20:   end for
21:   Calculate number of islands numIslands
22:   while (numIslands $> \kappa$) do
23:     Find PE $p_x$ operating at the lowest voltage level and with the minimum number of neighbors at the same voltage
24:     Voltage assignment of $p_x$ is increased to the next higher voltage level of neighbors
25:   end while
26:   Save solution as $S_i$, i.e., $X = X \cup \{S_i\}$
27: end for
28: Select $S \in X$ such that $C(S) = \min_{S_i \in X} C(S_i)$
29: return $S$
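The randomized selection of lines 5-7 can be sketched in isolation. This is not the authors' C++ implementation: the function name is hypothetical, and rounding $a \times |F|$ up while always keeping at least one candidate is an assumption, since the paper does not state how fractional pool sizes are handled.

```python
import math
import random


def pick_candidate(f_values, a, rng):
    """Pick one edge at random from the top a*|F| entries of F.

    f_values maps an edge (x, y) to its f(x, y) value. The pool size is
    rounded up and never below 1 (an assumption for small a).
    Returns the chosen edge and the candidate pool it was drawn from.
    """
    ranked = sorted(f_values, key=f_values.get, reverse=True)
    k = max(1, math.ceil(a * len(ranked)))
    pool = ranked[:k]
    return rng.choice(pool), pool


rng = random.Random(0)  # seeded only to make the sketch repeatable
f_values = {("p1", "p2"): 4.0, ("p2", "p3"): 2.5, ("p1", "p4"): 1.0,
            ("p3", "p4"): 0.5, ("p2", "p4"): 0.2}
edge, pool = pick_candidate(f_values, a=0.3, rng=rng)
greedy_edge, greedy_pool = pick_candidate(f_values, a=0.0, rng=rng)
```

With five edges and $a = 0.3$ the pool holds the two highest-ranked edges. Under this rounding, $a = 0$ degenerates to a purely greedy choice and $a = 1$ to a uniformly random one, which shows how the parameter trades greediness against exploration.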


Table 1: Number of Nodes and Edges in the Applications

Application         Nodes   Edges
Auto-Industry       24      21
Consumer            12      12
Networking          13      9
Office-Automation   5       5
MPEG4               12      13
MWD                 12      11
OPD                 16      17

5. EXPERIMENTAL RESULTS
In this section, we present the results obtained from the experiments performed. We analyze the effect of several parameters on the solution costs. The experiments are performed using Communication Trace Graphs (CTGs) for applications (auto-industry, consumer, networking and office-automation) from the E3S benchmark suite [4] and three real applications: MPEG4, MWD (Multi-Window Display) and OPD (Object Plane Decoder). The number of nodes and communication edges of the graphs are shown in Table 1. We used five discrete choices of voltage levels: $V_0 = 3.6$ V, $V_1 = 3.3$ V, $V_2 = 2.5$ V, $V_3 = 2.3$ V and $V_4 = 1.9$ V. Power consumption of the tasks on the PEs is based on the information provided in the benchmark. Using static delay analysis, and using the variation of power consumption with voltage from the processor vendors' datasheets mentioned in the benchmark, we assign allowable voltage levels for the PEs. Delay constraint parameters are varied in a range of small values and in a range of large values. The upper limit on the number of voltage islands is also set at both low (between 3 and 5) and high (between 7 and 9) values. We want to analyze the effects of the variations of the following parameters on the solution cost: $N_1$, $N_2$ - the number of iterations and perturbations per iteration in the heuristic; $a$ - the value of the randomization parameter used by the heuristic; and $\lambda_{ij}$, $\kappa$ - the bounds on delay and number of voltage islands in the problem specification, respectively.

We used values of N1 as 20, 100, 200, 500 and values of N2 as 5, 20, 40, 70. Three different values of the randomization parameter a were used: 0.3, 0.5 and 0.7. All the experiments were performed on a Pentium-4 3.2 GHz processor with 1 GB RAM. The heuristic is implemented in C++. The MILP execution for generating the optimal solution was done using ILOG CPLEX 10.0 Concert Technology on the same machine. Considering the 7 CTGs, 4 possible combinations of λij and κ (low-low, low-high, high-low and high-high), 4 values of the (N1, N2) pair and 3 different values of a, in total 7 × 4 × 4 × 3 = 336 experiments were performed. Since the values of the cost function for different datasets (CTGs) are quite different, we scale them in order to plot them in the same graph.
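The experiment count can be checked by enumerating the parameter grid. The sketch below assumes that N1 and N2 are varied together as four pairs, in the order listed, which is what the stated total of 336 runs implies. The variable names are illustrative only.

```python
from itertools import product

ctgs = ["auto-industry", "consumer", "networking", "office-automation",
        "MPEG4", "MWD", "OPD"]

# (delay bound range, voltage island limit range)
constraint_ranges = [("low", "low"), ("low", "high"),
                     ("high", "low"), ("high", "high")]

# Assumption: N1 and N2 are paired element-wise rather than fully crossed.
n1_n2_pairs = list(zip([20, 100, 200, 500], [5, 20, 40, 70]))

a_values = [0.3, 0.5, 0.7]

# One tuple per experiment: (CTG, constraint range, (N1, N2), a)
experiments = list(product(ctgs, constraint_ranges, n1_n2_pairs, a_values))
print(len(experiments))
```

Crossing N1 and N2 fully would instead give 16 combinations per setting and four times as many runs.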

From Fig. 2, it can be seen that the values of the cost function for all seven test cases follow a similar trend as λij and κ vary. This matches our expectation: in the case (λij, κ) = (low, low), the constraints are strictest, leading to the highest value of the cost function. When (λij, κ) is either (low, high) or (high, low), one of the constraints is relaxed while the other remains strict, and the value of the cost function is smaller than in the previous case. In the case (λij, κ) = (high, high), both constraints are relaxed, leading to the lowest value of the cost function. In Fig. 2, we have plotted the optimal value of the cost function for all seven datasets over the varying range of (λij, κ).

[Plot: x-axis "Range of values for (delay bound, #voltage islands bound) pair" with categories (low, low), (low, high), (high, low), (high, high); y-axis "Scaled Cost"; legend: office-automation, auto-industry, networking, consumer, mpeg4, mwd, opd]

Figure 2: Optimal solution cost (scaled) for all application CTGs with varying range of constraints (delay bound, voltage island limit)

[Plot: x-axis "Application test cases - (delay bound, #voltage island bound) = (low, low)" with categories office-auto, auto-indust, networking, consumer, mpeg4, mwd, opd; y-axis "Scaled Cost"; legend: Optimal, Heuristic]

Figure 3: Optimal vs. Heuristic solution for (delay bound, voltage island bound) = (low, low)

[Plot: x-axis "Application test cases - (delay bound, #voltage island bound) = (low, high)" with categories office-auto, auto-indust, networking, consumer, mpeg4, mwd, opd; y-axis "Scaled Cost"; legend: Optimal, Heuristic]

Figure 4: Optimal vs. Heuristic solution for (delay bound, voltage island bound) = (low, high)

Comparing the value of the cost function at different values of the parameters N1, N2 and a, we observed that with N1 = 500 iterations and N2 = 70 perturbations per iteration, we get very good quality solutions. Although we do not expect our random greedy selection approach to strictly follow any pattern based on the value of a, in the majority of cases it gives good results with a set to 0.5, since this is a good tradeoff between the strictly greedy (when a is very low) and strictly random (when a is very high) approaches.
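The tradeoff controlled by a can be illustrated with a restricted-candidate-list rule of the kind commonly used in greedy randomized search. This is a sketch under assumptions, not necessarily the exact selection rule of the heuristic: the function names and the threshold formula c_min + a (c_max - c_min) are chosen here for illustration.

```python
import random
from typing import Dict, Hashable, List, Optional


def restricted_candidates(costs: Dict[Hashable, float], a: float) -> List[Hashable]:
    """Keep candidates whose cost lies within a fraction `a` of the cost range
    above the minimum. a = 0 keeps only the cheapest; a = 1 keeps everything."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie in [0, 1]")
    c_min, c_max = min(costs.values()), max(costs.values())
    threshold = c_min + a * (c_max - c_min)
    return [k for k, c in costs.items() if c <= threshold]


def random_greedy_pick(costs: Dict[Hashable, float], a: float,
                       rng: Optional[random.Random] = None) -> Hashable:
    """Pick uniformly at random from the restricted candidate list."""
    rng = rng or random.Random()
    return rng.choice(restricted_candidates(costs, a))
```

Under this rule, an intermediate value such as a = 0.5 keeps the cheaper half of the cost range eligible, so the search stays cost-driven while still exploring alternatives.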

[Plot: title "Optimal vs. Heuristic Solution for (delay bound, #voltage island bound) = (high, low)"; x-axis categories office-auto, auto-indust, networking, consumer, mpeg4, mwd, opd; y-axis "Scaled Cost"; legend: Optimal, Heuristic]

Figure 5: Optimal vs. Heuristic solution for (delay bound, voltage island bound) = (high, low)

[Plot: x-axis "Application test cases - (delay bound, #voltage island bound) = (high, high)" with categories office-auto, auto-indust, networking, consumer, mpeg4, mwd, opd; y-axis "Scaled Cost"; legend: Optimal, Heuristic]

Figure 6: Optimal vs. Heuristic solution for (delay bound, voltage island bound) = (high, high)

In Figures 3, 4, 5 and 6, we compare the results obtained by the heuristic with the optimal ones. Following the above observations, we set (N1, N2) to (500, 70) and a to 0.5 while executing our heuristic. It can be seen from the plots in these figures that, for all seven CTGs and for all four values of (λij, κ), the solution cost of the heuristic is very close to that of the optimal. For all test cases, the proposed heuristic finishes execution within a few seconds, whereas the CPLEX solver ran for hours in order to reach the optimal solution. Also, out of all 336 experiments, the heuristic failed to return a feasible solution only once. This was detected by the value of the cost function being an order of magnitude higher, due to the penalty associated with delay violation. Moreover, our proposed heuristic is much more efficient in terms of solution time than traditional local search heuristics such as simulated annealing. In most of the test cases in our experiments, the optimal solution was also found in a reasonable amount of time. However, whereas the maximum mesh size used in our experiments was 5 × 5, MPSoCs are predicted to have hundreds of cores on a single chip very soon. In this scenario, even for the design phase, the use of the optimal solution will become unrealistic due to the exponential growth in solution time.
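The comparison with the optimum and the order-of-magnitude check for an infeasible run can be expressed as two small helpers. This is a hedged sketch: the function names are hypothetical, and the factor of 10 is an assumed reading of "an order of magnitude higher", not a threshold stated in the experiments.

```python
def relative_gap(heuristic_cost: float, optimal_cost: float) -> float:
    """Relative excess of the heuristic cost over the optimal cost."""
    if optimal_cost <= 0:
        raise ValueError("optimal_cost must be positive")
    return (heuristic_cost - optimal_cost) / optimal_cost


def looks_infeasible(heuristic_cost: float, optimal_cost: float,
                     factor: float = 10.0) -> bool:
    """Flag a run whose cost is at least `factor` times the optimum, which
    suggests that the delay-violation penalty was incurred (assumed threshold)."""
    return heuristic_cost >= factor * optimal_cost
```

For example, with purely illustrative scaled costs, `relative_gap(1.05, 1.0)` is about 0.05, while a cost of 25.0 against an optimum of 1.0 would be flagged by `looks_infeasible`.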

6. CONCLUSION

We have considered the mapping and voltage islanding problem for NoCs in a unified fashion. Considering voltage islanding at an early phase of design, during mapping of the PEs onto the NoC routers, can be beneficial in keeping the number of voltage islands within a given limit while also minimizing the overall energy consumption. In this paper, we have formulated the mapping and voltage islanding problem as an optimization problem. We provide both optimal and heuristic solutions to the problem. Experimental results evaluate the quality of the heuristic solution compared with that of the optimal.

7. REFERENCES

[1] T. D. Burd and R. W. Brodersen. Design Issues for Dynamic Voltage Scaling. In Proc. of ISLPED, pages 9–14, Rapallo, Italy, 2000.

[2] W. Dally and B. Towles. Route Packets, Not Wires: On-Chip Interconnection Networks. In Proc. of DAC, pages 684–689, Las Vegas, Nevada, USA, June 2001.

[3] G. De-Micheli and L. Benini. Networks On Chips. Morgan Kaufmann, 2006.

[4] R. Dick. Embedded System Synthesis Benchmarks Suite (E3S).

[5] J. Hu and R. Marculescu. Energy-Aware Mapping for Tile-based NoC Architectures Under Performance Constraints. In Proceedings of ASPDAC Conf., Kitakyushu, Japan, January 2003.

[6] J. Hu, Y. Shin, N. Dhanwada, and R. Marculescu. Architecting Voltage Island in Core-based System-on-a-Chip Designs. In Proc. of ISLPED, pages 180–185, 9-11 Aug. 2004.

[7] W. L. Hung, G. M. Link, Y. Xie, N. Vijaykrishnan, N. Dhanwada, and J. Corner. Temperature-Aware Voltage Islands Architecting in System-on-Chip Designs. In Proc. of ICCAD, pages 689–695, 2-5 Oct. 2005.

[8] A. Jantsch and H. Tenhunen, editors. Networks On Chip. Kluwer Academic Publishers, 2003.

[9] D. E. Lackey, P. S. Zuchowski, and T. R. Bednar. Managing Power and Performance for System-on-Chip Designs using Voltage Islands. In Proc. of ICCAD, pages 195–202, November 10-14 2002.

[10] W. Lee, H. Y. Liu, and Y. W. Chang. Voltage Island Aware Floorplanning for Power and Timing Optimization. In Proc. of ICCAD, pages 389–394, San Jose, CA, November 5-9 2006.

[11] H.-Y. Liu, W.-P. Lee, and Y.-W. Chang. A Provably Good Approximation Algorithm for Power Optimization Using Multiple Supply Voltages. In Proc. of DAC, pages 887–890, San Diego, CA, June 4-8 2007.

[12] W. K. Mak and J. W. Chen. Voltage Island Generation under Performance Requirement for SoC Designs. In Proc. of ASPDAC, pages 798–803, 2007.

[13] U. Y. Ogras, R. Marculescu, P. Choudhary, and D. Marculescu. Voltage-Frequency Island Partitioning for GALS-based Networks-on-Chip. In Proc. of DAC, San Diego, California, USA, June 2007.

[14] M. Popovich, E. G. Friedman, M. Sotman, and A. Kolodny. On-Chip Power Distribution Grids with Multiple Supply Voltages for High Performance Integrated Circuits. In Proc. of GLSVLSI, pages 2–7, Chicago, Illinois, USA, April 2005.

[15] D. Sengupta and R. A. Saleh. Application-Driven Voltage-Island Partitioning for Low-Power System-on-Chip Design. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 28(3):316–326, March 2009.

[16] K. Srinivasan, K. S. Chatha, and G. Konjevod. Linear-Programming-Based Techniques for Synthesis of Network-on-Chip Architectures. IEEE Transactions on VLSI Systems, 14(4):407–420, April 2006.

[17] H. Wu and M. D. F. Wong. Improving Voltage Assignment by Outlier Detection and Incremental Placement. In Proc. of DAC, pages 459–464, San Diego, CA, June 4-8 2007.
