
132 Int. J. High Performance Systems Architecture, Vol. 2, Nos. 3/4, 2010

Copyright © 2010 Inderscience Enterprises Ltd.

Energy efficient mapping and voltage islanding for regular NoC under design constraints

Pavel Ghosh* and Arunabha Sen Computer Science Program, School of Computing, Informatics and Decision Systems Engineering, Arizona State University, Tempe, AZ 85281, USA E-mail: [email protected] E-mail: [email protected] *Corresponding author

Abstract: Computational energy consumption of the processing elements (PEs) of a NoC can be significantly reduced by scaling down their voltage levels. This creates clusters of adjacent PEs operating at the same voltage level, known as voltage islands. Excessive number of voltage islands is undesirable from the physical design perspective and due to the overhead of level shifter energy consumption between adjacent voltage islands. Considering these issues during mapping of the PEs to the NoC routers, can potentially lead to acceptable solutions with reduced overall energy consumption. In this paper, we formulate the mapping problem as an optimisation problem. We present both optimal solution, obtained by solving a mixed integer linear program (MILP), and heuristic solution based on random greedy selection. Experimental results using benchmark and real applications show that the heuristic finds near-optimal solution in almost all cases in a very small fraction of the time required to achieve the optimal solution.

Keywords: energy efficient mapping; voltage islanding; system-on-chip; SoC; network-on-chip; NoC; mixed integer linear programming; MILP; greedy heuristic; randomisation.

Reference to this paper should be made as follows: Ghosh, P. and Sen, A. (2010) ‘Energy efficient mapping and voltage islanding for regular NoC under design constraints’, Int. J. High Performance Systems Architecture, Vol. 2, Nos. 3/4, pp.132–144.

Biographical notes: Pavel Ghosh received his BE in Computer Science and Engineering from Jadavpur University, Kolkata, India in 2003 and his PhD in Computer Science and Engineering from Arizona State University, Arizona, USA in 2009. He is currently a Post-Doctorate Research Assistant in the School of Computing, Informatics and Decision Systems Engineering, Arizona State University. His primary research interest is in system-level power/performance optimisation problems in the system-on-chip (SoC) and network-on-chip (NoC) domains.

Arunabha (Arun) Sen received his BS in Electronics and Telecommunication Engineering from Jadavpur University, India, and the PhD in Computer Science from the University of South Carolina, Columbia, USA. He is currently an Associate Professor in the School of Computing, Informatics and Decision Systems Engineering, Arizona State University. His research interest is in the area of resource optimisation problems in wireless and optical networks. He has published over 100 research papers in peer-reviewed journals and conferences on these topics. His current research is sponsored by the US Army Research Office, Air Force Office of Scientific Research and Defense Threat Reduction Agency.

1 Introduction

In recent years the design of multi-processor system-on-chip (MPSoC)-based systems has become extremely challenging due to the progress in processor and semiconductor technologies. Due to the growing complexity of consumer embedded products, and the complexity of new communication and multimedia standards, future MPSoCs are predicted to contain several hundreds of processing elements (PEs) communicating among themselves at very high data rates. A shared bus based communication infrastructure is no longer adequate for MPSoCs to meet the increasing requirements of performance, scalability and flexibility. Network-on-chip (NoC) provides an alternative to bus-based on-chip communication that can overcome the problems of performance, scalability and flexibility. The necessity of NoC for handling the increasing complexity of MPSoC designs has been discussed by several researchers, such as Benini and De Micheli (2002), Dally and Towles (2001), and Jantsch and Tenhunen (2003).

As the number of PEs on an MPSoC and the data traffic between them continue to grow, minimisation of energy consumption subject to the performance constraint has become one of the most important objectives. Power consumption of VLSI circuits can be roughly broken down into two components: static power and dynamic power. While static power mainly relates to the leakage current, dynamic power $P_d$ is a result of the switching activities of the circuit, given by:

$$P_d = k C V_{dd}^2 f$$

where $k$ is the switching rate, $C$ is the load capacitance, $V_{dd}$ is the supply voltage and $f$ is the clock frequency. As dynamic power is proportional to the square of the supply voltage $V_{dd}$, reducing $V_{dd}$ can significantly reduce the dynamic power consumption. Among the various approaches taken to reduce the power consumption of MPSoCs, the use of multi-supply voltages (MSV) has gained popularity among researchers. The performance-critical PEs generally require a high supply voltage to meet the performance requirements, while the non-critical PEs can be slowed down using lower supply voltages, thus reducing power consumption. Utilising this power-performance trade-off, the idea of multi-voltage islanding was first proposed by Lackey et al. (2002). A voltage island on a chip is defined as a cluster of adjacent PEs operating at the same voltage level. Although scaling down the voltage levels of PEs is favourable for reducing energy consumption, an excessive number of voltage islands may be detrimental from the perspective of physical design (Popovich et al., 2005), as it fragments the chip into voltage islands and increases the complexity of the layout of the power delivery network. In order to gain the advantage of voltage islanding, it needs to be considered during the mapping of the PEs to the NoC routers. This way, computational energy consumption can be reduced without creating an excessive number of voltage islands. Moreover, mapping heavily connected PEs to adjacent locations will also reduce the communication energy consumption on the NoC links.
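The quadratic dependence on the supply voltage can be illustrated numerically. The following is a minimal sketch of the dynamic power formula above; the values chosen for the switching rate, load capacitance and clock frequency are purely illustrative assumptions, and the clock frequency is held fixed so that only the voltage term changes. The two voltages are the ones used in the motivational example of this paper.

```python
def dynamic_power(k, c_load, v_dd, f):
    """Dynamic power P_d = k * C * V_dd^2 * f."""
    return k * c_load * v_dd ** 2 * f

# Assumed, illustrative parameters (not taken from the paper):
# switching rate, load capacitance in farads, clock frequency in hertz.
K, C_LOAD, F = 0.5, 1e-9, 500e6

p_high = dynamic_power(K, C_LOAD, 2.5, F)
p_low = dynamic_power(K, C_LOAD, 1.77, F)

# With k, C and f fixed, the ratio depends only on (1.77 / 2.5) ** 2.
ratio = p_low / p_high
print(f"P_d at 1.77 V is {ratio:.1%} of P_d at 2.5 V")
```

Under these assumptions, lowering the supply from 2.5 V to 1.77 V roughly halves the dynamic power, whatever values are chosen for the other three parameters.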

1.1 Related work

The issues related to voltage islanding and floorplanning for system-on-chips (SoCs) with NoC based communication architecture have been addressed by several researchers. Some of these works fail to consider the level shifter overhead in the case of multiple voltage islands, whereas others incur a potentially very high computational overhead because a computationally intensive floorplanning algorithm is executed over multiple iterations. Moreover, in most cases the voltage islanding and floorplanning problems are solved sequentially, which makes the final solution vulnerable to poor design decisions taken in an early phase and leads to local optimality. In this subsection, we list a few of the most prominent research efforts in this domain and discuss their relative merits and demerits.

Hu et al. (2004) developed a simulated annealing based floorplanning framework with a cost function combining the number of voltage islands, power consumption and area overhead. They do not consider the level shifter energy consumption or the link delays. Hung et al. (2005) proposed temperature-aware voltage islanding and floorplanning to minimise the peak and average temperature across the SoC, as well as area, wirelength and power budget. They consider the area and delay introduced by the level shifters, but not the power overhead. Sengupta and Saleh (2009) proposed a dynamic programming based approach for voltage selection and island creation to minimise overall SoC power, area and floorplanner runtime. Their cost function includes both power and connectivity components. The feasibility requirement in their paper is too strict, as it considers a solution with two non-contiguous voltage islands operating at the same voltage to be infeasible, even if it satisfies the design constraints. They do not consider the level shifter power consumption or the link delays. Besides, performing floorplanning for each of the voltage assignment solutions and then selecting the best among them increases the computation overhead.

Wu et al. (2007) present a two-approximation algorithm and a fast heuristic for standard cell placements, but the approach is applicable only to an existing floorplan and placement of the cells. Also, they do not consider the energy consumption of the level shifters between voltage islands. Lee et al. (2006) developed dynamic programming based voltage island partitioning, level shifter insertion and power network aware floorplanning for power optimisation within timing constraints. They consider the delay and power overhead of level shifters. Interconnection delay is considered by transforming the slack on nets into a bound on wirelength. However, their algorithm consists of multiple iterations, with floorplanning being a part of each iteration. Considering the emerging trend of future MPSoCs with several hundreds of communicating PEs, this approach will lack the efficiency required to provide a solution. Mak and Chen (2007) optimised total power consumption and power network complexity without compromising wirelength and chip area. Although they do not directly constrain the maximum number of voltage islands, they model it by imposing a minimum requirement on the number of neighbours of each PE operating at the same voltage. In their framework, the voltage assignment is performed for all generated floorplans, and thus will be practical only for inputs of limited size. Wu and Wong (2007) considered the objective of optimising power consumption with limited design cost and a limited number of level shifters. Their algorithm is based on the concept of detecting outliers, defined as modules that suffer from a high voltage assignment due to remote placement, followed by incremental placement. Since the incremental placement phase may have to be performed several times in iterations, it may not be practical for large inputs. Furthermore, the convergence of the solution is not guaranteed. Liu et al. (2007) proposed a 2α-approximation algorithm for the voltage islanding problem, where α is the ratio of the maximum and minimum voltage values, but the partitioning is performed on existing floorplans, and thus may suffer from early decisions.

Ogras et al. (2007) considered voltage islanding in NoC to minimise energy consumption. Their approach is based on solving a non-linear problem formulation. Starting with the case where all PEs operate at their respective minimum possible voltage levels, they merge voltage islands based on the reduction of the cost function until the entire chip operates at a single voltage level. From the set of all solutions obtained in the process, they choose the one with minimum energy consumption that satisfies the constraint on the maximum number of islands allowed. This adds a substantial computational requirement. Also, voltage islanding as part of the floorplanning objective has not been considered. The authors mention that the energy consumption of level shifters can be significantly high. Burd and Brodensen (2000) designed a voltage level converter circuit with an estimated energy consumption proportional to the difference of the squares of the voltage levels of the two end-points. We follow a similar characterisation in this paper. To the best of our knowledge, our paper is the first that considers the voltage islanding problem for NoC at an early phase of design, taking into account all the contributing factors for energy consumption, including the level shifters.

Figure 1 The effect of mapping on final voltage islanding and energy consumption: (a) mapping with four voltage islands, (b) mapping with two voltage islands, but higher energy consumption, (c) mapping with two voltage islands and lower energy consumption

Notes: Light grey indicates the lower voltage level V1 and dark grey indicates the higher voltage level V2.

1.2 Key contributions

The contributions of our paper can be listed as follows:

• the voltage islanding problem for NoC is considered as part of mapping of the PEs to the NoC router nodes

• all three components of energy consumption, namely computational energy consumption, link communication energy consumption and level shifter energy consumption, are considered

• the mapping and voltage islanding problem is formulated as an optimisation problem and a mixed integer linear program (MILP) is developed to solve it optimally

• an efficient heuristic based on random greedy selection is provided to obtain good quality solutions

• experimental results investigate the effects of several parameters on the outcome of the heuristic and then evaluate the quality of the heuristic solution as compared to the optimal solution.

1.3 Motivational example

The motivation behind our approach can be shown with the following example, which illustrates that considering voltage islanding as part of the mapping of PEs to router nodes can save energy. Let us consider a simple communication trace graph consisting of four PEs, $A$, $B$, $C$ and $D$. The edges in the graph, $(A, B)$, $(A, C)$, $(B, D)$ and $(C, D)$, represent the communication among the PEs. We define the allowable voltage levels of a PE as the set of voltage levels at each of which the task executing on the PE finishes within its performance bound. Let the allowable voltage levels for PEs $A$ and $D$ be $V_1 = 1.77$ V and $V_2 = 2.5$ V, whereas the only allowable voltage level for PEs $B$ and $C$ is $V_2 = 2.5$ V. Now, let us consider placing the PEs in a regular 2 × 2 mesh NoC topology. Without voltage islanding in mind, we may map the communicating PEs to adjacent router nodes of the NoC, as shown in Figure 1(a). If we operate all the PEs at their lowest possible voltage levels, there will be four voltage islands requiring four level shifters, as shown in the figure. If the design constraint limits the number of voltage islands to two, we need to raise the voltage level of either $A$ or $D$, as shown in Figure 1(b). This leads to higher computation energy consumption due to the higher voltage assigned to one of those PEs ($D$ in the figure). On the other hand, if voltage islanding had been considered during the mapping of the PEs, we could have achieved the mapping shown in Figure 1(c). In this mapping, all the PEs can be operated at their lowest possible voltage levels while still satisfying the design constraint. As shown in the figure, only two voltage islands are created, requiring only two level shifters.

From this example, it is clear that considering the voltage islanding technique for energy minimisation at an early phase of design leads to better energy reduction. In our paper, we have combined all three components of energy consumption (computation, level shifter and link communication) in the objective function, and included the constraint of maximum number of voltage islands.
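The island counts quoted in the example can be reproduced with a short script. The sketch below counts voltage islands as connected groups of mesh-adjacent PEs at the same voltage level. The coordinates chosen for the mappings of Figure 1(a) and Figure 1(c) are assumptions that are consistent with the description in the text (the figure itself does not specify coordinates), and the function name is hypothetical.

```python
from collections import deque

def count_voltage_islands(placement, voltage):
    """Count clusters of mesh-adjacent PEs that share a voltage level.

    placement: dict PE name -> (x, y) mesh coordinate
    voltage:   dict PE name -> assigned voltage level
    """
    at = {xy: pe for pe, xy in placement.items()}
    seen, islands = set(), 0
    for start in placement:
        if start in seen:
            continue
        islands += 1
        seen.add(start)
        queue = deque([start])
        while queue:  # breadth-first search over same-voltage neighbours
            pe = queue.popleft()
            x, y = placement[pe]
            for nxy in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                nb = at.get(nxy)
                if nb is not None and nb not in seen and voltage[nb] == voltage[pe]:
                    seen.add(nb)
                    queue.append(nb)
    return islands

V1, V2 = 1.77, 2.5
lowest = {"A": V1, "B": V2, "C": V2, "D": V1}

# Assumed layout for Figure 1(a): every communicating pair is adjacent,
# which places A and D on a diagonal.
map_a = {"A": (1, 1), "B": (2, 1), "C": (1, 2), "D": (2, 2)}
# Assumed layout for Figure 1(c): A next to D, and B next to C.
map_c = {"A": (1, 1), "D": (2, 1), "B": (1, 2), "C": (2, 2)}

islands_a = count_voltage_islands(map_a, lowest)               # 4
islands_b = count_voltage_islands(map_a, {**lowest, "D": V2})  # 2
islands_c = count_voltage_islands(map_c, lowest)               # 2
print(islands_a, islands_b, islands_c)
```

In this sketch, case (b) reaches two islands only by raising $D$ to $V_2$, whereas case (c) reaches two islands with every PE at its lowest allowable level.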


1.4 System model and assumptions

Our targeted system can be classified as an application specific standard product (ASSP) (De Micheli and Benini, 2006), also called a platform. The traffic patterns and bandwidth requirements are known a priori in this kind of system. We consider routers that follow static source routing, where the routing of the data paths remains static, based on the routing table at the source node. Delay analysis can be performed using the static analysis technique described by Hu and Marculescu (2004). Starting with the application deadline and the variations of energy consumption and execution time of each task for each PE and each voltage level, the slack budgeting technique is used to obtain individual task deadline values. The characterisation of allowable voltage levels can be done independently for each PE, as mentioned by Hu et al. (2004), Hung et al. (2005), Mak and Chen (2007), and Sengupta and Saleh (2009). In this paper, we consider a regular mesh NoC topology, but our approach can be used for any other regular NoC topology as well. Each router has five ports: one connects it to a PE and the other four connect it to the neighbouring routers. The voltage level shifter energy consumption model is taken from Burd and Brodensen (2000). The link communication energy consumption parameters are taken from Srinivasan et al. (2006).

1.5 Paper outline

The rest of the paper is organised as follows. Section 2 gives a formal definition of the problem and formulates it as an optimisation problem. In Section 3, we develop MILP in order to obtain optimal solution to the problem. We develop efficient heuristic to solve the problem in Section 4. The experimental results are discussed in Section 5. Finally, we conclude the paper in Section 6.

2 Problem formulation

In this section, we give the formal definition of the mapping and voltage islanding problem. Although we consider a mesh topology for the NoC architecture, our approach can be used for any other regular NoC topology as well. The problem can be stated as follows. Given:

• A mesh topology for the NoC, $G_R(V_R, E_R)$, with dimension $M \times N$.

• Each node of the mesh topology is represented by a tuple $(x_i, y_i)$, i.e., $\forall v_i \in V_R$ there is an associated $(x_i, y_i)$, where $1 \le x_i \le M$ and $1 \le y_i \le N$.

• The distance between $v_i, v_j \in V_R$ is denoted by $d_{ij}$ (in number of hops) and can be calculated as:

$$d_{ij} = |x_i - x_j| + |y_i - y_j|$$

• $G_P(V_P, E_P)$ is a communication trace graph ($|V_P| \le |V_R|$).

• Each $p_i \in V_P$ represents a PE and an edge $e_{ij} \in E_P$ represents a communication trace between PEs $p_i$ and $p_j$, where $e_{ij} = (p_i, p_j)$.

• A communication volume (in number of bytes) is associated with each $e_{ij} \in E_P$ and denoted by $c_{ij}$.

• Each $p_i \in V_P$ is associated with an allowable set of voltage levels, $L_i = \{v_i^1, v_i^2, \ldots, v_i^{n_i}\}$. The allowable set of voltage levels is defined as the set of voltage levels at each of which the task assigned to PE $p_i$ finishes within the specified deadline.

• Communication delay bound $\lambda_{ij}$ for each $e_{ij} = (p_i, p_j) \in E_P$.

• Maximum number of allowable voltage islands, denoted as $\kappa$.

• $\eta_i^v$ = computation energy consumption of $p_i \in V_P$ at voltage level $v \in L_i$.

• $\alpha_{v_1 v_2}$ = level shifter energy consumption between two adjacent PEs operating at voltage levels $v_1$ and $v_2$, respectively.

• $\psi_l$ = power consumption of the links (/mm/Mbps).

The objective is to map the PEs on the nodes of the NoC mesh topology and assign voltage levels such that the cost function is minimised without violating any design constraints. The cost function for this problem has three components:

• Computation energy consumption – Since the computation energy is directly proportional to the square of the voltage, the voltage assignments have a major impact on this component.

• Level shifter energy consumption – The more voltage islands there are, the more level shifters are needed between adjacent PEs operating at different voltage levels. Therefore, both the voltage assignment and the mapping of the PEs have an impact on this component.

• Communication energy consumption – Communication energy consumption on links is directly proportional to the link length and the communication bandwidth. Therefore, the longer the distance between two communicating PEs, the higher the overall energy consumption on the links of their routing path.
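The three cost components can be evaluated for a given mapping and voltage assignment as sketched below. This is an illustrative sketch rather than the paper's implementation: the function names, the unit coefficient in the computation energy model, the coefficient 0.1 in the level shifter model, the link parameter 0.01, the communication volumes and the coordinates are all assumptions. The level shifter energy is taken to be proportional to the difference of the squares of the two voltages, following the characterisation attributed to Burd and Brodensen (2000) in Section 1.1, and each pair of adjacent PEs at different voltages is counted once.

```python
def manhattan(a, b):
    """Hop distance d_ij between two mesh coordinates."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def energy_components(placement, voltage, comm, eta, alpha, psi_l):
    """Return (E1, E2, E3) for one mapping and voltage assignment.

    placement: PE -> (x, y); voltage: PE -> voltage level
    comm: (PE, PE) -> communication volume c_ij
    eta(pe, v): computation energy; alpha(v1, v2): level shifter energy
    psi_l: link energy per unit volume per hop (unit link length assumed)
    """
    e1 = sum(eta(pe, voltage[pe]) for pe in placement)
    pes = sorted(placement)
    e2 = 0.0
    for idx, pi in enumerate(pes):
        for pj in pes[idx + 1:]:
            adjacent = manhattan(placement[pi], placement[pj]) == 1
            if adjacent and voltage[pi] != voltage[pj]:
                e2 += alpha(voltage[pi], voltage[pj])
    e3 = sum(c * manhattan(placement[pi], placement[pj]) * psi_l
             for (pi, pj), c in comm.items())
    return e1, e2, e3

# Assumed energy models (illustrative coefficients only).
eta = lambda pe, v: v ** 2
alpha = lambda v1, v2: 0.1 * abs(v1 ** 2 - v2 ** 2)

# Four-PE example on a 2 x 2 mesh with assumed coordinates and volumes.
placement = {"A": (1, 1), "D": (2, 1), "B": (1, 2), "C": (2, 2)}
voltage = {"A": 1.77, "B": 2.5, "C": 2.5, "D": 1.77}
comm = {("A", "B"): 10, ("A", "C"): 10, ("B", "D"): 10, ("C", "D"): 10}

e1, e2, e3 = energy_components(placement, voltage, comm, eta, alpha, psi_l=0.01)
print(f"E1={e1:.4f}  E2={e2:.4f}  E3={e3:.4f}")
```

In this configuration two adjacent pairs operate at different voltages, so two level shifter terms contribute to the second component.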

3 Optimal solution

In this section, we use mathematical programming techniques to solve the mapping and voltage islanding problem. We formulate the problem as an MILP. The objective function combines all three components of energy consumption. We first define the variables used in the formulation, followed by the objective function and the constraints.


3.1 Variables

The following binary variables are used in the formulation of the MILP.

1 Indicator variable for the voltage assignment of the PEs:

$$\forall p_i \in V_P, \forall k \in L_i: \quad x_{ik} = \begin{cases} 1, & \text{if } p_i \text{ is operating at voltage level } k \\ 0, & \text{otherwise} \end{cases}$$

2 Indicator variable for the mapping of the PEs on the NoC routers:

$$\forall p_i \in V_P, \forall v_p \in V_R: \quad \delta_{ip} = \begin{cases} 1, & \text{if } p_i \text{ is placed at location } v_p \\ 0, & \text{otherwise} \end{cases}$$

3 Binary variable combining the voltage assignment of two PEs:

$$\forall p_i, p_j \in V_P, \forall k \in L_i, \forall l \in L_j: \quad y_{ijkl} = \begin{cases} 1, & \text{if } x_{ik} = 1 \text{ and } x_{jl} = 1 \\ 0, & \text{otherwise} \end{cases}$$

4 Binary variable combining the mapping of two PEs:

$$\forall p_i, p_j \in V_P, \forall v_p, v_q \in V_R: \quad \beta_{ijpq} = \begin{cases} 1, & \text{if } \delta_{ip} = 1 \text{ and } \delta_{jq} = 1 \\ 0, & \text{otherwise} \end{cases}$$

5 Binary variable combining the voltage assignment and mapping of two PEs:

$$\forall p_i, p_j \in V_P, v_p, v_q \in V_R, k \in L_i, l \in L_j: \quad \gamma_{ijpq}^{kl} = \begin{cases} 1, & \text{if } y_{ijkl} = \beta_{ijpq} = 1 \\ 0, & \text{otherwise} \end{cases}$$

6 Considering $m = |V_P|$, in the extreme case there can be at most $m$ voltage islands. We index the voltage islands with numbers $k$, $1 \le k \le m$. The indicator variable representing the voltage island to which each PE belongs is therefore:

$$\forall p_i \in V_P, 1 \le k \le m: \quad \theta_{ik} = \begin{cases} 1, & \text{if } p_i \text{ is in island } k \\ 0, & \text{otherwise} \end{cases}$$

7 Indicator variable for the islands having non-zero elements. An island with zero elements does not exist:

$$\forall k, 1 \le k \le m: \quad t_k = \begin{cases} 1, & \text{if there are non-zero elements in island } k \\ 0, & \text{otherwise} \end{cases}$$

8 Binary variable indicating the adjacent placement of two PEs:

$$\forall p_i, p_j \in V_P: \quad \omega_{ij} = \begin{cases} 1, & \text{if } p_i, p_j \text{ are placed at adjacent locations of the mesh} \\ 0, & \text{otherwise} \end{cases}$$

9 Binary variable indicating if two PEs are operating at the same voltage level:

$$\forall p_i, p_j \in V_P: \quad \rho_{ij} = \begin{cases} 1, & \text{if } p_i, p_j \text{ are operating at the same voltage} \\ 0, & \text{otherwise} \end{cases}$$

10 Binary variable indicating if two PEs are located in the same voltage island:

$$\forall p_i, p_j \in V_P: \quad \zeta_{ij} = \begin{cases} 1, & \text{if } p_i, p_j \text{ reside in the same voltage island} \\ 0, & \text{otherwise} \end{cases}$$

11 Binary variable combining the voltage island assignment of two PEs:

$$\forall p_i, p_j \in V_P, 1 \le k \le m: \quad \theta'_{ijk} = \begin{cases} 1, & \text{if } \theta_{ik} = 1 \text{ and } \theta_{jk} = 1 \\ 0, & \text{otherwise} \end{cases}$$

3.2 Objective function

The objective function consists of three components: the computational energy consumption of the PEs $E_1$, the energy consumption of the level shifters $E_2$ and the communication energy consumption of the NoC links $E_3$. Each of these can be calculated as:

$$E_1 = \sum_{p_i \in V_P} \sum_{k \in L_i} \eta_i^k x_{ik} \tag{1}$$

$$\begin{aligned} E_2 &= \sum_{p_i, p_j \in V_P} \; \sum_{k \in L_i,\, l \in L_j} \; \sum_{(v_p, v_q) \in E_R} \alpha_{kl}\, y_{ijkl}\, \beta_{ijpq} \\ &= \sum_{p_i, p_j \in V_P} \; \sum_{k \in L_i,\, l \in L_j} \; \sum_{(v_p, v_q) \in E_R} \alpha_{kl}\, \gamma_{ijpq}^{kl} \end{aligned} \tag{2}$$

$$E_3 = \sum_{e_{ij} \in E_P} \; \sum_{v_p, v_q \in V_R} c_{ij}\, d_{pq}\, \beta_{ijpq}\, \psi_l \tag{3}$$

Therefore, the objective function can be written as:

$$\text{obj}: \; \text{minimise} \;\; E_1 + E_2 + E_3$$
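For very small instances, the optimum targeted by this objective can be cross-checked by exhaustive enumeration of all mappings and voltage assignments. The sketch below is a brute-force search, not the MILP itself, and it makes several assumptions that are not taken from the paper: computation energy is modelled as the square of the voltage, level shifter energy as 0.1 times the difference of the squared voltages (counted once per adjacent pair), link energy as 0.01 per unit volume per hop, and delay bounds are omitted. All function names are hypothetical.

```python
from itertools import permutations, product

def hops(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def islands(placement, voltage):
    """Number of clusters of adjacent PEs at the same voltage (union-find)."""
    parent = {pe: pe for pe in placement}
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    pes = list(placement)
    for i, pi in enumerate(pes):
        for pj in pes[i + 1:]:
            if hops(placement[pi], placement[pj]) == 1 and voltage[pi] == voltage[pj]:
                parent[find(pi)] = find(pj)
    return len({find(pe) for pe in pes})

def total_energy(placement, voltage, comm, shifter_coeff, psi_l):
    pes = list(placement)
    e1 = sum(v ** 2 for v in voltage.values())  # assumed model: eta = V^2
    e2 = sum(shifter_coeff * abs(voltage[pi] ** 2 - voltage[pj] ** 2)
             for i, pi in enumerate(pes) for pj in pes[i + 1:]
             if hops(placement[pi], placement[pj]) == 1)
    e3 = sum(c * hops(placement[a], placement[b]) * psi_l
             for (a, b), c in comm.items())
    return e1 + e2 + e3

def exhaustive_optimum(levels, comm, nodes, kappa, shifter_coeff=0.1, psi_l=0.01):
    """Try every placement and voltage choice; keep the cheapest feasible one."""
    pes = sorted(levels)
    best = None
    for spots in permutations(nodes, len(pes)):
        placement = dict(zip(pes, spots))
        for choice in product(*(levels[pe] for pe in pes)):
            voltage = dict(zip(pes, choice))
            if islands(placement, voltage) > kappa:
                continue  # violates the limit on the number of voltage islands
            cost = total_energy(placement, voltage, comm, shifter_coeff, psi_l)
            if best is None or cost < best[0]:
                best = (cost, placement, voltage)
    return best

levels = {"A": [1.77, 2.5], "B": [2.5], "C": [2.5], "D": [1.77, 2.5]}
comm = {("A", "B"): 10, ("A", "C"): 10, ("B", "D"): 10, ("C", "D"): 10}
nodes = [(x, y) for x in (1, 2) for y in (1, 2)]

best_cost, best_placement, best_voltage = exhaustive_optimum(levels, comm, nodes, kappa=2)
print(best_cost, best_placement, best_voltage)
```

Under these assumed models, the search on the four-PE example of Section 1.3 selects a mapping in which $A$ and $D$ are adjacent and both run at 1.77 V. The enumeration grows factorially with the number of PEs, so it is only useful as a sanity check on toy inputs.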

3.3 Constraints

• Each PE is assigned exactly one voltage level from its allowable voltage levels:

$$\forall p_i \in V_P: \quad \sum_{k \in L_i} x_{ik} = 1 \tag{4}$$

• Each PE is placed at exactly one location of the mesh and no two PEs are placed at the same location of the mesh, i.e.:

$$\forall p_i \in V_P: \quad \sum_{v_p \in V_R} \delta_{ip} = 1 \tag{5}$$

and

$$\forall v_p \in V_R: \quad \sum_{p_i \in V_P} \delta_{ip} \le 1 \tag{6}$$

• Each PE resides in exactly one voltage island:

$$\forall p_i \in V_P: \quad \sum_{k=1}^{m} \theta_{ik} = 1 \tag{7}$$

• The number of voltage islands created satisfies the design constraint, specified as the maximum allowable number of voltage islands $\kappa$:

$$\sum_{k=1}^{m} t_k \le \kappa \tag{8}$$

From the definition of $t_k$, the LHS of the above inequality is the number of voltage islands created.

• For a sufficiently large constant $M$:

$$\forall k, 1 \le k \le m: \quad M t_k \ge \sum_{p_i \in V_P} \theta_{ik} \tag{9}$$

The RHS of the above inequality is the number of PEs in island $k$. The inequality forces this number to be zero when $t_k$ is zero. Also, to force $t_k$ to be 0 when the RHS is 0, we need:

$$\forall k, 1 \le k \le m: \quad t_k \le \sum_{p_i \in V_P} \theta_{ik} \tag{10}$$

• We can consider the delay bound $\lambda_{ij}$ to be specified in number of hops. Therefore,

$$\forall e_{ij} = (p_i, p_j) \in E_P: \quad \sum_{v_p, v_q \in V_R} d_{pq}\, \beta_{ijpq} \le \lambda_{ij} \tag{11}$$

• The following two constraints are required to ensure that $y_{ijkl} = 1$ if and only if $x_{ik} = 1$ and $x_{jl} = 1$, i.e.:

$$\forall p_i, p_j \in V_P, k \in L_i, l \in L_j: \quad x_{ik} + x_{jl} \ge 2 y_{ijkl} \tag{12}$$

$$\forall p_i, p_j \in V_P, k \in L_i, l \in L_j: \quad x_{ik} + x_{jl} - 1 \le y_{ijkl} \tag{13}$$

• The following two constraints ensure that $\beta_{ijpq} = 1$ if and only if $\delta_{ip} = 1$ and $\delta_{jq} = 1$, i.e.:

$$\forall p_i, p_j \in V_P, v_p, v_q \in V_R: \quad \delta_{ip} + \delta_{jq} \ge 2 \beta_{ijpq} \tag{14}$$

$$\forall p_i, p_j \in V_P, v_p, v_q \in V_R: \quad \delta_{ip} + \delta_{jq} - 1 \le \beta_{ijpq} \tag{15}$$

• The following two constraints ensure that $\gamma_{ijpq}^{kl} = 1$ if and only if $y_{ijkl} = 1$ and $\beta_{ijpq} = 1$. This is formulated as:

$$\forall p_i, p_j \in V_P, k \in L_i, l \in L_j, v_p, v_q \in V_R: \quad y_{ijkl} + \beta_{ijpq} \ge 2 \gamma_{ijpq}^{kl} \tag{16}$$

$$\forall p_i, p_j \in V_P, k \in L_i, l \in L_j, v_p, v_q \in V_R: \quad y_{ijkl} + \beta_{ijpq} - 1 \le \gamma_{ijpq}^{kl} \tag{17}$$

• This constraint ensures the definition of $\omega_{ij}$, i.e., $p_i$ and $p_j$ are mapped to adjacent locations of the mesh if $p_i$ is mapped to $v_p$, $p_j$ is mapped to $v_q$, and $v_p, v_q$ are located at adjacent positions:

$$\forall p_i, p_j \in V_P: \quad \omega_{ij} = \sum_{(v_p, v_q) \in E_R} \beta_{ijpq} \tag{18}$$

• This constraint ensures the definition of $\rho_{ij}$, i.e., $p_i$ and $p_j$ are assigned the same voltage level if both of them are operating at the same voltage $v \in L_i \cap L_j$:

$$\forall p_i, p_j \in V_P: \quad \rho_{ij} = \sum_{v \in L_i \cap L_j} y_{ijvv} \tag{19}$$

• Since $\zeta_{ij} = 1$ if and only if $\theta'_{ijk} = 1$ for some $k$, $1 \le k \le m$, we can write this constraint as follows:

$$\forall p_i, p_j \in V_P: \quad \zeta_{ij} = \sum_{k=1}^{m} \theta'_{ijk} \tag{20}$$

• At the same time, two PEs reside in the same island if they are placed at adjacent locations and operate at the same voltage level, i.e., $\zeta_{ij} = 1$ if $\omega_{ij} = 1$ and $\rho_{ij} = 1$. Also, if $\rho_{ij} = 0$, then $\zeta_{ij}$ has to be 0. This is ensured by the following:

$$\forall p_i, p_j \in V_P: \quad \omega_{ij} + \rho_{ij} - 1 \le \zeta_{ij} \tag{21}$$

$$\forall p_i, p_j \in V_P: \quad \rho_{ij} \ge \zeta_{ij} \tag{22}$$

• This constraint ensures the definition of $\theta'_{ijk}$, i.e., $\theta'_{ijk} = 1$ if and only if $\theta_{ik} = 1$ and $\theta_{jk} = 1$:

$$\forall p_i, p_j \in V_P, \forall k, 1 \le k \le m: \quad \theta_{ik} + \theta_{jk} \ge 2 \theta'_{ijk} \tag{23}$$

$$\forall p_i, p_j \in V_P, \forall k, 1 \le k \le m: \quad \theta_{ik} + \theta_{jk} - 1 \le \theta'_{ijk} \tag{24}$$
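Two linearisation patterns recur in the constraints above: a pair of inequalities that forces a binary variable to equal the product of two other binary variables, as in (12)-(13), (14)-(15), (16)-(17) and (23)-(24), and a big-M pair that ties an indicator to whether a sum is non-zero, as in (9)-(10). The following sketch checks both patterns by enumerating every binary assignment. The function names are hypothetical, and the constant $M$ is set equal to the number of PEs, which is an assumption that is sufficient because an island can hold at most that many PEs.

```python
from itertools import product

def feasible_products(x1, x2):
    """Binary y values allowed by x1 + x2 >= 2y and x1 + x2 - 1 <= y."""
    return [y for y in (0, 1) if x1 + x2 >= 2 * y and x1 + x2 - 1 <= y]

# The only feasible y is the logical AND (product) of x1 and x2.
for x1, x2 in product((0, 1), repeat=2):
    assert feasible_products(x1, x2) == [x1 * x2]

def feasible_flags(thetas, big_m):
    """Binary t values allowed by M*t >= sum(theta) and t <= sum(theta)."""
    total = sum(thetas)
    return [t for t in (0, 1) if big_m * t >= total and t <= total]

# The only feasible t is 1 exactly when at least one PE is in the island.
m = 3  # assumed number of PEs; also used as the big-M constant
for thetas in product((0, 1), repeat=m):
    assert feasible_flags(thetas, big_m=m) == [1 if any(thetas) else 0]

print("both linearisations admit exactly one consistent value in every case")
```

Because each pair of inequalities leaves exactly one feasible value for the auxiliary variable, the solver has no freedom to set it inconsistently with the variables it summarises.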

4 Heuristic solution based on random greedy selection approach

Most of the research efforts in developing algorithms for mapping and voltage assignment in NoC have been based on heuristics due to the hardness of the problems. In this section, we provide a greedy heuristic to solve the mapping and voltage islanding problem defined in Section 2. One potential drawback of a greedy policy is that it may easily get stuck in a locally optimal solution, which prevents it from reaching the global optimum. We introduce randomness in our greedy heuristic in order to avoid this. We define a user-specified selection parameter $a$ ($0 \le a \le 1$), which determines the relative weight of randomness and the greedy approach. Before explaining the details of the heuristic steps, we give an overview of the important factors considered in the design of our heuristic:

• Communication volume $c_{ij}$ – The higher the value of $c_{ij}$ for a pair $p_i, p_j \in V_P$ ($e_{ij} \in E_P$), the more desirable it is to place them closer to each other in order to reduce the communication energy consumption.

• Common allowable voltage levels $L_i \cap L_j$ – The higher the value of $|L_i \cap L_j|$, the more likely it is that the pair of nodes $p_i, p_j \in V_P$ can be operated at the same lower voltage level when placed close to each other. This will reduce both the voltage island fragmentation (and thus the number of voltage islands) and the energy consumption due to computation and level shifters.

• Delay bound $\lambda_{ij}$ – The lower the value of $\lambda_{ij}$ for an edge $e_{ij} \in E_P$, the more desirable it is to place $p_i$ and $p_j$ closer to each other in order to satisfy the delay bound.

• Distance $d_{ij}$ – The larger the value of $d_{ij}$ between $p_i$ and $p_j$ in the current placement, the more it may be necessary to move them closer.

Based on these observations, we define a function $f(i, j)$ for every $e_{ij} \in E_P$ that is directly proportional to $c_{ij}$, $|L_i \cap L_j|$ and $d_{ij}$, and inversely proportional to $\lambda_{ij}$. The higher the value of $f(i, j)$, the more desirable it is to place $p_i$ and $p_j$ closer to each other. Before defining the function, we scale all the constituent factors to the same order of magnitude, so that no single factor dominates the value of $f(i, j)$. Note that only the $d_{ij}$ factor changes from solution to solution; it thus scales the effect of the other factors in deciding the movement of the PEs. We define $f(i, j)$ as:

$$\forall e_{ij} = (p_i, p_j) \in E_P : \quad f(i, j) = \frac{c_{ij} \times |L_i \cap L_j|}{\lambda_{ij}} \times d_{ij} \qquad (25)$$
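To make equation (25) and its role in the selection parameter $a$ concrete, the following Python sketch evaluates $f(i, j)$ for a few edges and then draws one edge at random from the $a \times |F|$ highest values. It is a minimal illustration, not the authors' C++ implementation: the function names, the sample numbers and the rounding rule (round $a \times |F|$ down but keep at least one candidate) are assumptions, and the scaling of the factors to a common order of magnitude is taken as already done.

```python
import random

def priority(c_ij, common_levels, lambda_ij, d_ij):
    """Equation (25): f(i, j) = (c_ij * |L_i ∩ L_j| / lambda_ij) * d_ij."""
    return c_ij * common_levels / lambda_ij * d_ij

def select_pair(f_values, a, rng=random):
    """Pick one edge at random from the a*|F| highest f values.

    f_values maps an edge (i, j) to f(i, j). Rounding a*|F| down and
    keeping at least one candidate is an assumption of this sketch.
    """
    ranked = sorted(f_values, key=f_values.get, reverse=True)
    pool_size = max(1, int(a * len(ranked)))
    return rng.choice(ranked[:pool_size])

if __name__ == "__main__":
    # Hypothetical, already-scaled inputs per edge:
    # (c_ij, |L_i ∩ L_j|, lambda_ij, d_ij)
    edges = {
        (0, 1): (4.0, 3, 2.0, 1),
        (1, 2): (8.0, 2, 4.0, 3),
        (0, 2): (2.0, 1, 4.0, 2),
    }
    F = {edge: priority(*values) for edge, values in edges.items()}
    print(F)
    print(select_pair(F, a=0.3))
```

With $a$ close to 0 the pool shrinks to the single highest-priority edge (purely greedy); with $a = 1$ every edge is equally likely (purely random).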

In order to compare solutions, we associate the following cost function with each solution:

$$C = (E_1 + E_2 + E_3) + \phi_{pen} \times DV \qquad (26)$$

where $E_1$ is the computation energy consumption, $E_2$ is the level shifter energy consumption, $E_3$ is the communication energy consumption, and $DV$ is the number of delay bound violations in the solution. We set $\phi_{pen}$ to a constant of considerably higher order of magnitude than the energy terms. The final term lets the heuristic ignore delay constraint violations while constructing a solution and discard violating solutions afterwards: because of the magnitude of $\phi_{pen}$, a solution with delay violations has a significantly higher cost than one without, and is therefore unlikely to be returned as the final solution. The details of the algorithm are as follows.
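The effect of the penalty term in equation (26) can be seen in a short Python sketch. The value chosen for $\phi_{pen}$, the function names and the candidate numbers are assumptions made for illustration; the text only requires the penalty to be of much higher order of magnitude than the energy terms.

```python
PHI_PEN = 1e9  # assumed value; it only needs to dwarf the energy terms

def solution_cost(e_comp, e_shift, e_comm, delay_violations, phi_pen=PHI_PEN):
    """Equation (26): C = (E1 + E2 + E3) + phi_pen * DV."""
    return (e_comp + e_shift + e_comm) + phi_pen * delay_violations

def best_solution(candidates):
    """Return the candidate (E1, E2, E3, DV) with the minimum cost."""
    return min(candidates, key=lambda s: solution_cost(*s))

if __name__ == "__main__":
    # Hypothetical candidates: (E1, E2, E3, DV)
    candidates = [
        (120.0, 8.0, 40.0, 0),
        (90.0, 5.0, 30.0, 2),   # lowest energy, but two delay violations
        (110.0, 10.0, 35.0, 0),
    ]
    for s in candidates:
        print(s, solution_cost(*s))
    print(best_solution(candidates))
```

The second candidate consumes the least energy, yet its two violations push its cost far above the others, so the third candidate is selected.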

Algorithm 1 Random greedy selection heuristic

Require: Problem formulation given in Section 2, user-specified selection parameter $a$ ($0 \le a \le 1$), and two iteration bounds $N_1$ and $N_2$
Ensure: Solution $S$ consisting of the mappings and the voltage level assignments for all PEs $p_i \in V_P$

1 Initialise solution set $\chi$ as empty
2 for ($i = 1$ to $N_1$) do
3   Randomly place the PEs on distinct locations $v_p \in V_R$
4   for ($j = 1$ to $N_2$) do
5     $\forall e_{ij} \in E_P$, calculate the set $F$ of $f(i, j)$ values
6     Sort $F$ according to the $f(i, j)$ values
7     From the $a \times |F|$ highest values of $F$, select $f(x, y) \in F$ randomly
8     while ($p_x$ and $p_y$ are already adjacently placed) do
9       $F = F \setminus \{f(x, y)\}$
10      Select a new $f(x, y) \in F$
11    end while
12    $\forall p_w \in neighbour(p_x)$ and $\forall p_z \in neighbour(p_y)$, calculate the set $H$ of common voltage cardinality values, i.e., $|L_x \cap L_w|$ and $|L_y \cap L_z|$
13    Select the minimum value element from $H$, which is due to neighbour $p_u$
14    if ($p_u$ is a neighbour of $p_y$) then
15      Swap position of $p_u$ with $p_x$
16    else
17      Swap position of $p_u$ with $p_y$
18    end if
19    Set the voltage level of $p_x$ and $p_y$ to $v = \min(L_x \cap L_y)$ and fix them as constants within the inner iteration
20  end for
21  Calculate the number of islands numIslands
22  while (numIslands $> \kappa$) do
23    Find the PE $p_x$ operating at the lowest voltage level and having the minimum number of neighbours at the same voltage
24    Increase the voltage assignment of $p_x$ to the next higher voltage level of its neighbours
25  end while
26  Save the solution as $S_i$, i.e., $\chi = \chi \cup S_i$
27 end for
28 Select $S \in \chi$ such that $C(S) = \min_{S_i \in \chi} C(S_i)$
29 return $S$

In line 1 of the algorithm, the solution set $\chi$ is initialised as empty. The outer for-loop (lines 2–27) iterates $N_1$ times, each iteration producing a solution $S_i$. In each iteration, we first place the PEs randomly (line 3). In the inner for-loop (lines 4–20), this solution is perturbed $N_2$ times. We first calculate the set $F$ of $f(i, j)$ values for all $e_{ij} \in E_P$ and sort them (lines 5–6). Based on the user-specified parameter $a$, we take the $a \times |F|$ highest values of $F$ and select one of them randomly (line 7). For example, if $a = 0.3$, then the highest 30% of the values of $F$ are considered and one of them is selected randomly. We call this selection $f(x, y)$. While $p_x$ and $p_y$ are already adjacently placed, we remove $f(x, y)$ from $F$ and select another pair $p_x, p_y$ (lines 8–11). Among all neighbours of $p_x$ and $p_y$, the one, $p_u$, having the minimum overlapping voltage range with $p_x$ or $p_y$ is chosen (lines 12–13) and swaps positions with $p_x$ or $p_y$ accordingly (lines 14–18). Now $p_x$ and $p_y$ are adjacently placed, and we set their voltage level to the lowest common one (line 19). After $N_2$ such perturbations, we calculate the number of islands in the current solution, numIslands (line 21). If this value is greater than the maximum limit on the number of voltage islands $\kappa$, we choose the PE that operates at the lowest voltage level and has the minimum number of neighbours at the same voltage level, and raise its voltage level to the next higher level found among its neighbours. This is repeated until numIslands $\le \kappa$ (lines 22–25). The current solution is added to $\chi$ as $S_i$ (line 26). At the end of $N_1$ iterations, the solution with the minimum cost is returned (lines 28–29). Note that all $S_i \in \chi$ are feasible with respect to the maximum allowable voltage islands constraint, but some of them may violate the delay bound constraint. Solutions with delay violations have a significantly higher cost than those without, due to the high penalty $\phi_{pen}$ in the cost function $C$, and are thus easily removed from final consideration.

The number of islands is calculated in line 21 and inside the while-loop (lines 22–25). Two PEs are said to be in the same island if they share a boundary and operate at the same voltage level, or if they share boundaries with a common empty dead space (i.e., a NoC router to which no PE is mapped) between them and operate at the same voltage level. Using the neighbour information, we build a graph whose nodes are the PEs and the dead-space areas. There is an edge between a pair of nodes if they are adjacent to each other. We assign a colour to each PE node according to its voltage level. In this graph, we calculate the number of connected components such that each node in a component either has the same colour or corresponds to a dead space. These components are an abstract representation of the voltage islands, so the number of such components is the number of voltage islands.
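The island-counting rule can be sketched in Python as follows. This is one possible reading of the rule, not the authors' implementation. It assumes a 2-D mesh with 4-neighbour adjacency, stores a voltage-level index per cell and `None` for a dead space, and uses a union-find structure in place of an explicit graph traversal. Same-voltage PEs are merged when they are adjacent or when they border the same dead-space cell; the name `count_islands` is hypothetical.

```python
from itertools import combinations

def count_islands(grid):
    """Count voltage islands on a mesh.

    grid[r][c] is a voltage-level index, or None for a dead space
    (a router with no PE mapped to it).
    """
    rows, cols = len(grid), len(grid[0])
    # One union-find node per PE; dead spaces are not islands themselves.
    parent = {(r, c): (r, c)
              for r in range(rows) for c in range(cols)
              if grid[r][c] is not None}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    def neighbours(r, c):
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                yield nr, nc

    for r in range(rows):
        for c in range(cols):
            if grid[r][c] is None:
                # Dead space: join same-voltage PEs that border this cell.
                pes = [n for n in neighbours(r, c) if grid[n[0]][n[1]] is not None]
                for p, q in combinations(pes, 2):
                    if grid[p[0]][p[1]] == grid[q[0]][q[1]]:
                        union(p, q)
            else:
                # PE: join with adjacent PEs at the same voltage level.
                for n in neighbours(r, c):
                    if grid[n[0]][n[1]] == grid[r][c]:
                        union((r, c), n)

    return len({find(x) for x in parent})

if __name__ == "__main__":
    # Two level-4 PEs separated by a dead space form a single island.
    print(count_islands([[4, None, 4],
                         [1, 1,    2]]))
```

In the example, the two level-4 PEs are joined through the dead space, the two level-1 PEs are adjacent, and the level-2 PE stands alone, giving three islands.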

4.1 Computational complexity of heuristic

Given $n$ PEs, the initial random mapping of the PEs in each iteration of the outer for-loop (lines 2–27) can be performed in $O(n)$ time. In each iteration of the inner for-loop, the $f(i, j)$ values can be calculated in $O(n^2)$ time, since there are $O(n^2)$ such values. Sorting $F$ (line 6) therefore takes $O(n^2 \log n)$ time. The rest of the operations inside this for-loop can be performed in $O(n^2)$ time. Since the loop iterates $N_2$ times, the overall complexity of the inner for-loop is $O(N_2 n^2 \log n)$. Calculating the number of islands by the method described above takes $O(n^2)$ time. Since we use five distinct voltage levels, and each iteration of the while-loop (lines 22–25) raises the voltage level of at least one PE by at least one step, the while-loop runs $O(n)$ times in the worst case. Therefore, the complexity of the body of the outer for-loop is $O(N_2 n^2 \log n + n^2)$. Since the outer for-loop (lines 2–27) iterates $N_1$ times, the overall complexity of the heuristic is $O(N_1 N_2 n^2 \log n + N_1 n^2)$. For fixed constants $N_1$ and $N_2$, the asymptotic time complexity of the heuristic is $O(n^2 \log n + n^2) = O(n^2 \log n)$.

5 Experimental results

In this section, we present the experimental results. We investigate the effect of several parameters on the cost of the solution. We use the benchmark applications auto-industry, consumer, networking and office-automation from the E3S benchmark suite (Dick, 2000) and three real applications: MPEG4, multi-window display (MWD) and object plane decoder (OPD). The number of nodes and communication edges of the graphs are shown in Table 1. The communication trace graphs (CTGs) are shown in Figure 2 and Figure 3.

Table 1 Number of nodes and edges in the applications

Application         Nodes   Edges
Auto-industry       24      21
Consumer            12      12
Networking          13      9
Office-automation   5       5
MPEG4               12      13
MWD                 12      11
OPD                 16      17

Figure 2 E3S benchmark application CTGs, (a) auto-industry (b) consumer (c) networking (d) office-automation

Figure 3 MPEG4, MWD and OPD application CTGs, (a) MPEG4 (b) MWD (c) OPD

We used five discrete voltage levels: $V_0 = 3.6$ V, $V_1 = 3.3$ V, $V_2 = 2.5$ V, $V_3 = 2.3$ V and $V_4 = 1.9$ V. For the E3S benchmark applications, we used the power consumption estimates of the tasks on the PEs as provided in the benchmark. To obtain power estimates for each task of the real applications MPEG4, MWD and OPD, we used the following approach. The E3S benchmark suite gives the power consumption estimates for each task. Also, the tasks in the E3S benchmark applications can easily be classified as computing-intensive, I/O read-write intensive or memory read-write intensive. We classified the tasks in the real applications in the same way, and used the power consumption values given in the E3S benchmark suite for similar tasks as estimates for the tasks in the MPEG4, MWD and OPD applications. Using static delay analysis, we assign deadline values to each task of the applications. Starting with the application deadline, and based on the variation of the power consumption and execution times for each task on each PE and each voltage level, we used the slack budgeting technique discussed by Hu and Marculescu (2004). Then, using the deadline of each task and the variation of power consumption with voltage from the processor vendors’ datasheets mentioned in the benchmark, we assign allowable voltage levels to the PEs. Delay constraint parameters are varied over a range of small values and a range of large values. The upper limit on the number of voltage islands is likewise set at both low (between three and five) and high (between seven and nine) values. We want to analyse the effects of variations in the following parameters on the solution cost:

• $N_1, N_2$ – With a higher number of iterations, the algorithm explores more solutions (one per iteration), and is thus likely to find better ones.

• $a$ – With a higher value of the randomisation parameter introduced in Section 4, the heuristic is more likely to avoid locally optimal solutions. However, with an extremely high value of $a$, the advantage of greedy selection is lost and the heuristic may also choose poor-quality solutions.

• $\lambda_{ij}, \kappa$ – Higher values of these constraint parameters relax the problem, whereas lower values make the problem specification stricter.

We used 20, 100, 200 and 500 as the values of $N_1$, with corresponding values of $N_2$ of 5, 20, 40 and 70. Three values of the randomisation parameter $a$ were used: 0.3, 0.5 and 0.7. All the experiments were performed on a Pentium-4 3.2 GHz processor with 1 GB RAM. The heuristic is implemented in C++. The MILP for generating the optimal solution was solved using ILOG CPLEX 10.0 Concert Technology on the same machine. Considering the seven CTGs, four possible combinations of $\lambda_{ij}$ and $\kappa$ (low-low, low-high, high-low and high-high), four values of $(N_1, N_2)$ and three values of $a$, a total of 336 experiments were performed. Since the values of the cost function for the different datasets (CTGs) differ considerably in order of magnitude, we plot relative rather than absolute values of the solution cost so that they fit in a single graph. For each dataset, this relative value (shown as scaled cost in all the plots hereafter) is defined as the ratio of the cost for a particular case to the minimum cost over all the cases. As a result, the lowest value in every plot is always 1.
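The experiment count and the scaled cost can be reproduced with a few lines of Python. The helper name `scaled_costs` and the cost values below are purely illustrative and are not taken from the experiments; only the definition of the ratio and the factors of the product come from the text.

```python
def scaled_costs(costs):
    """Divide every cost of one dataset by that dataset's minimum cost."""
    lowest = min(costs.values())
    return {case: cost / lowest for case, cost in costs.items()}

if __name__ == "__main__":
    # CTGs x constraint combinations x (N1, N2) pairs x values of a
    print(7 * 4 * 4 * 3)

    # Illustrative costs for a single dataset (not measured values)
    costs = {"low-low": 300.0, "low-high": 240.0,
             "high-low": 200.0, "high-high": 150.0}
    print(scaled_costs(costs))
```

Because each dataset is normalised by its own minimum, the best case of every dataset maps to 1 and datasets with very different absolute costs can share one axis.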

From Figure 4, it can be seen that the cost function values for all seven test cases follow a similar trend as $\lambda_{ij}$ and $\kappa$ vary. This is in line with our expectation: when $(\lambda_{ij}, \kappa) =$ (low, low), the constraints are strictest, leading to the highest value of the cost function. When $(\lambda_{ij}, \kappa)$ is either (low, high) or (high, low), one of the constraints is relaxed while the other remains strict, and the value of the cost function is smaller than in the previous case. When $(\lambda_{ij}, \kappa) =$ (high, high), both constraints are relaxed, leading to the lowest value of the cost function. Figure 4 plots the optimal value of the cost function for all seven datasets over the varying range of $(\lambda_{ij}, \kappa)$.

Figure 4 Optimal solution cost (scaled) for all application CTGs with varying range of constraints (delay bound, voltage island limit) (see online version for colours)

Figure 5 Effect of number of iterations ($N_1$) and perturbations ($N_2$) on heuristic solution, (a) vary $(N_1, N_2)$, $a = 0.3$ (b) vary $(N_1, N_2)$, $a = 0.5$ (c) vary $(N_1, N_2)$, $a = 0.7$ (see online version for colours)


Figure 6 Effect of randomisation parameter $a$ on heuristic solution, (a) vary $a$, $(N_1, N_2) = (20, 5)$ (b) vary $a$, $(N_1, N_2) = (100, 20)$ (c) vary $a$, $(N_1, N_2) = (200, 40)$ (d) vary $a$, $(N_1, N_2) = (500, 70)$ (see online version for colours)

Figure 7 Optimal vs. heuristic solution for different values of delay bound and bound on the maximum number of voltage islands for benchmark and real applications, (a) (delay bound, voltage island bound) = (low, low) (b) (delay bound, voltage island bound) = (low, high) (c) (delay bound, voltage island bound) = (high, low) (d) (delay bound, voltage island bound) = (high, high) (see online version for colours)


Now we set the value of $(\lambda_{ij}, \kappa)$ to (low, low). We then vary the number of iterations $N_1$ and the number of perturbations in each iteration $N_2$. The four values taken by the $(N_1, N_2)$ pair are (20, 5), (100, 20), (200, 40) and (500, 70). The plots are shown in Figure 5, where the $y$-axis represents the cost function value obtained by our heuristic and the $x$-axis has the four values of $(N_1, N_2)$, with the value of $a$ fixed at 0.3, 0.5 and 0.7 in the three subplots. It can be seen from the plots for all seven datasets that the algorithm performs better with an increasing number of iterations, and a reasonably good solution is achieved around (500, 70).

Next, in Figure 6, we plot three values (0.3, 0.5, 0.7) of $a$ along the $x$-axis and observe its effect on the value of the cost function returned by the heuristic. The value of $(N_1, N_2)$ is fixed at (20, 5), (100, 20), (200, 40) and (500, 70), respectively, in the four subplots. Although we do not expect our random greedy selection approach to strictly follow any pattern based on the value of $a$, in the majority of cases it gives good results for $a = 0.5$, since this is a good trade-off between the strictly greedy (when $a$ is very low) and strictly random (when $a$ is very high) approaches. In both Figure 5 and Figure 6, the $(\lambda_{ij}, \kappa)$ pair is set at (low, low). In the other cases, we observe a similar trend in the cost function, which confirms our expectation of the behaviour of the heuristic.

In Figure 7(a)–Figure 7(d), we compare the results obtained by the heuristic with the optimal ones. Following the above observations, we set $(N_1, N_2)$ to (500, 70) and $a$ to 0.5 when executing our heuristic. It can be seen from the plots that, for all seven CTGs and for all four values of $(\lambda_{ij}, \kappa)$, the solution cost of the heuristic is very close to that of the optimal. For all test cases, the proposed heuristic finishes execution within a few seconds, whereas the CPLEX solver ran for hours to reach the optimal solution. In summary, we have observed how the difficulty of the problem varies with the delay bounds and the maximum limit on the number of voltage islands, developed a heuristic that yields good-quality solutions, and examined the effects of the parameters to be set in the heuristic. Out of all 336 experiments, the heuristic failed to return a feasible solution only once. This was detected by the value of the cost function being an order of magnitude higher due to the penalty associated with delay violation.

6 Conclusions

In this paper, we have considered the mapping and voltage islanding problem for NoCs in a unified fashion. The use of MSV has gained popularity among researchers due to its effectiveness in minimising energy consumption. However, it comes at the cost of the energy consumed by the level shifters. Moreover, an excessive number of voltage islands is undesirable due to the associated complexity of the power network layout. Considering voltage islanding at an early design phase, during the mapping of the PEs to the NoC routers, can help restrict the number of voltage islands to within a certain limit and also minimise the overall energy consumption. We formulated the mapping and voltage islanding problem as an optimisation problem and provided both optimal and heuristic solutions to it. Experimental results evaluate the quality of the heuristic solution against that of the optimal.

References

Benini, L. and De Micheli, G. (2002) ‘Networks on chips: a new SoC paradigm’, IEEE Comp., January, Vol. 35, No. 1, pp.70–78.

Burd, T.D. and Brodersen, R.W. (2000) ‘Design issues for dynamic voltage scaling’, in Proc. of ISLPED, pp.9–14, Rapallo, Italy.

Dally, W. and Towles, B. (2001) ‘Route packets, not wires: on-chip interconnection networks’, in Proc. of DAC, June, pp.684–689, Las Vegas, Nevada, USA.

De-Micheli, G. and Benini, L. (2006) Networks on Chips, Morgan Kaufmann.

Dick, R. (2000) ‘Embedded system synthesis benchmarks suite (E3S)’, available at http://ziyang.eecs.umich.edu/~dickrp/e3s/.

Hu, J. and Marculescu, R. (2004) ‘Energy-aware communication and task scheduling for network-on-chip architectures under real-time constraints’, in Proceedings of Design, Automation and Test in Europe Conf., February, Paris, France.

Hu, J., Shin, Y., Dhanwada, N. and Marculescu, R. (2004) ‘Architecting voltage island in core-based system-on-a-chip designs’, in Proc. of ISLPED, 9–11 August, pp.180–185.

Hung, W.L., Link, G.M., Xie, Y., Vijaykrishnan, N., Dhanwada, N. and Corner, J. (2005) ‘Temperature-aware voltage islands architecting in system-on-chip designs’, in Proc. of ICCAD, 2–5 October, pp.689–695.

Jantsch, A. and Tenhunen, H. (Eds.) (2003) Networks on Chip, Kluwer Academic Publishers.

Lackey, D.E., Zuchowski, P.S. and Bednar, T.R. (2002) ‘Managing power and performance for system-on-chip designs using voltage islands’, in Proc. of ICCAD, 10–14 November, pp.195–202.

Lee, W., Liu, H.Y. and Chan, Y.W. (2006) ‘Voltage island aware floorplanning for power and timing optimization’, in Proc. of ICCAD, 5–9 November, pp.389–394, San Jose, CA.

Liu, H-Y., Lee, W-P. and Chang, Y-W. (2007) ‘A provably good approximation algorithm for power optimization using multiple supply voltages’, in Proc. of DAC, 4–8 June, pp.87–890, San Diego, CA.

Mak, W.K. and Chen, J.W. (2007) ‘Voltage island generation under performance requirement for SoC designs’, in Proc. of ASPDAC, pp.798–803.

Ogras, U.Y., Marculescu, R., Choudhary, P. and Marculescu, D. (2007) ‘Voltage-frequency island partitioning for GALS-based networks-on-chip’, in Proc. of DAC, June, San Diego, California, USA.

Popovich, M., Friedman, E.G., Sotman, M. and Kolodny, A. (2005) ‘On-chip power distribution grids with multiple supply voltages for high performance integrated circuits’, in Proc. of GLSVLSI, April, pp.2–7, Chicago, Illinois, USA.


Sengupta, D. and Saleh, R.A. (2009) ‘Application-driven voltage-island partitioning for low-power system-on-chip design’, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, March, Vol. 28, No. 3, pp.316–326.

Srinivasan, K., Chatha, K.S. and Konjevod, G. (2006) ‘Linear-programming-based techniques for synthesis of network-on-chip architectures’, IEEE Transactions on VLSI Systems, April, Vol. 14, No. 4, pp.407–420.

Wu, H. and Wong, M.D.F. (2007) ‘Improving voltage assignment by outlier detection and incremental placement’, in Proc. of DAC, 4–8 June, pp.459–464, San Diego, CA.

Wu, H., Wong, M.D.F., Liu, I-M. and Wang, Y. (2007) ‘Placement-proximity-based voltage island grouping under performance requirement’, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, July, Vol. 26, No. 7, pp.1256–1269.