

Cluster Comput (2008) 11: 91–113
DOI 10.1007/s10586-007-0044-5

Energy efficient scheduling for parallel applications on mobile clusters

Ziliang Zong · Mais Nijim · Adam Manzanares · Xiao Qin

Received: 2 April 2007 / Accepted: 26 September 2007 / Published online: 8 November 2007
© Springer Science+Business Media, LLC 2007

Abstract During the past decade, cluster computing and mobile communication technologies have been extensively deployed and widely applied because of their great commercial value. Rapid technological advancement makes it feasible to integrate these two technologies, and a revolutionary application called mobile cluster computing is emerging. Mobile cluster computing technology can further enhance the power of our laptops and mobile devices by running parallel applications. However, scheduling parallel applications on mobile clusters is technically challenging due to the significant communication latency and limited battery life of mobile devices. Therefore, shortening schedule length and conserving energy have become two major concerns in designing efficient and energy-aware scheduling algorithms for mobile clusters. In this paper, we propose two novel scheduling strategies aimed at balancing performance and power consumption for parallel applications running on mobile clusters. Our research focuses on scheduling precedence constrained parallel tasks, and thus duplication heuristics are applied to schedule parallel tasks to minimize communication overheads. However, existing duplication algorithms are developed with consideration of schedule lengths only, completely ignoring the energy consumption of clusters. In this regard, we design two energy-aware duplication scheduling algorithms, called EADUS and TEBUS, to schedule precedence constrained parallel tasks with a complexity of $O(n^2)$, where $n$ is the number of tasks in a parallel task set. Unlike the existing duplication-based scheduling algorithms that replicate all the possible predecessors of each task, the proposed algorithms judiciously replicate predecessors of a task only if the duplication can help in conserving energy. Our energy-aware scheduling strategies are conducive to balancing schedule lengths and energy savings of a set of precedence constrained parallel tasks. We conducted extensive experiments using both synthetic benchmarks and real-world applications to compare our algorithms with two existing approaches. Experimental results based on simulated mobile clusters demonstrate the effectiveness and practicality of the proposed duplication-based scheduling strategies. For example, EADUS and TEBUS can reduce energy consumption for the Gaussian Elimination application by averages of 16.08% and 8.1% with merely 5.7% and 2.2% increases in schedule length, respectively.

Keywords Precedence constraints · Parallel applications · Duplication · Scheduling · Energy conservation · Mobile clusters

Z. Zong · A. Manzanares · X. Qin (✉)
Department of Computer Science and Software Engineering, Samuel Ginn College of Engineering, Auburn University, Auburn, AL 36849-5347, USA
e-mail: [email protected]

Z. Zong
e-mail: [email protected]

A. Manzanares
e-mail: [email protected]

M. Nijim
Department of Computer Science, School of Computing, University of Southern Mississippi, Hattiesburg, MS 39406-0001, USA
e-mail: [email protected]

1 Introduction

Nowadays, high-performance clusters are widely used to solve challenging and rigorous engineering tasks in industry and scientific areas such as molecular design, weather modelling, database systems, and complex image rendering, to name just four examples. In the near future, with the rapid advancement and extensive deployment of mobile communication technology, we predict that mobile nodes will join static nodes as contributing entities in cluster computing, which will result in brand new technologies such as mobile computing. Empowered by mobile computing technology, researchers, scientists, engineers and public users can interact with critical data and useful information conveniently, regardless of where they are. However, mobile cluster systems face severe constraints such as limited wireless bandwidth, frequent disconnections and low power resources. These constraints may cause message loss or power exhaustion in real applications, which could lead to catastrophic consequences for end users. Because of the significant transmission overhead and unsatisfactory reliability of wireless networks, mobile clusters may not currently be the best choice to provide high performance service for large scale and grand challenge applications. Nevertheless, we firmly believe that the constraints and weaknesses of mobile clusters will be overcome when next generation wireless technology matures. Further, many real-life applications can benefit from mobile cluster technology. Examples include oil rig sensors, sensors and monitors in earthquake detection/prediction, marketing representatives traveling all over the world with laptops and sales data feeding in, and disaster management systems. Applications of this nature are executed on a single system, but the data sources are physically at distant mobile locations. Most importantly, all of these applications are energy critical because mobile computing nodes usually have limited battery resources. Therefore, it is clearly desirable to implement energy-efficient scheduling algorithms for parallel applications running on mobile clusters. However, designing feasible parallel scheduling algorithms for mobile clusters is highly challenging, because we have to take into account multiple design objectives, including performance (throughput and response times), energy efficiency, quality-of-service (QoS), and reliability. In this paper, we focus on the issues related to performance and power consumption.

Although much attention has recently been paid to processor and memory energy conservation in clusters, saving energy in cluster interconnects for parallel applications remains an open problem. Reducing energy dissipation in interconnects is critically important because a significant amount of the total energy consumption in a cluster is due to the interconnect fabric. For example, it is observed that the interconnect consumes 33 percent of the total energy in an Avici switch [10, 15], whereas routers and links consume 37 percent of the total power budget in a Mellanox server blade [25]. The energy consumption in interconnects becomes even more critical for communication-intensive parallel applications running on mobile clusters, which make extensive use of wireless cluster interconnects to transfer data among precedence constrained parallel tasks. The lack of energy conservation technology for mobile cluster interconnects is a severe problem because, without such technology, reducing the energy consumption caused by communication-intensive parallel applications is most unlikely.

Task partitioning and scheduling strategies play an important role in achieving high performance for parallel applications on mobile clusters. A partitioning algorithm can be employed to partition a parallel application into a set of precedence constrained tasks represented in the form of a directed acyclic graph (DAG), whereas a scheduling algorithm can be used to schedule the DAG onto the mobile computational nodes of a cluster. Scheduling precedence constrained parallel tasks on mobile clusters is difficult, in part because of the high communication overheads exhibited by parallel applications running on mobile clusters. Duplication heuristics have proved to be an efficient strategy for scheduling parallel tasks to minimize communication overhead. However, existing duplication-based scheduling algorithms are not suitable for mobile clusters because they consider only schedule lengths and completely ignore energy consumption. To remedy this deficiency, in this paper we design two energy-aware duplication scheduling algorithms that schedule a set of precedence constrained parallel tasks in a judicious way to improve performance (shorten schedule lengths) while optimizing energy consumption in mobile clusters.

In this research, we first build an energy consumption model used to estimate power dissipation in CPUs and network links. The model is constructed for measuring the energy consumption incurred by a set of precedence constrained parallel tasks on a mobile cluster. Second, we propose two duplication-based scheduling algorithms, called EADUS and TEBUS, to provide energy savings in network links by duplicating tasks on more than one computational node to reduce network traffic. EADUS is designed to aggressively provide the greatest energy savings by using task replicas to eliminate energy-consuming messages, whereas TEBUS aims at making tradeoffs between energy conservation and performance. Hence, the two algorithms are named the Energy-Aware Duplication Scheduling algorithm (or EADUS for short) and the Time-Energy Balanced Duplication Scheduling algorithm (or TEBUS for short). Finally, to demonstrate the effectiveness of the proposed scheduling strategies, we designed and implemented a simulated mobile cluster computing system with wireless interconnects. We compare our approaches against two existing scheduling schemes in terms of energy dissipation and performance, using both synthetic benchmarks and real-world applications.


Fig. 1 Architecture of mobile cluster

The rest of the paper is organized as follows. In Sect. 2, we present the architecture of a mobile cluster and briefly describe related work. Next, Sect. 3 introduces mathematical models including a system model, a task model, and an energy consumption model. In Sect. 4 we present the energy-aware scheduling strategies and qualitatively compare the two strategies with the two existing approaches. Section 5 demonstrates the working of the EADUS and TEBUS algorithms using a sample task graph. The experimental environment and simulation results are analyzed in Sect. 6. Finally, Sect. 7 provides concluding remarks and future research directions.

2 Architecture of mobile cluster and related work

A high performance cluster is a type of parallel processing system, which consists of a collection of interconnected stand-alone computers working cooperatively as a single, integrated computing system. These loosely coupled computers do not share common memory; they communicate with each other by passing messages. The major difference between a high performance cluster and a mobile cluster is that a mobile cluster consists of only mobile nodes or of both mobile and stationary nodes. All the nodes communicate with each other by a wireless network (e.g. cellular network, Bluetooth) and/or a reliable high-speed physical network (e.g. Myrinet, Ethernet). Generally, a mobile cluster system can be divided into three main components: hardware components, network protocols and software components (see Fig. 1). More specifically, hardware components include PCs, workstations, high performance computers and the underlying network equipment (e.g. routers, switches and network interface cards). High-quality network protocols are critical for a mobile cluster since the computing nodes communicate with each other over both wireless and wired networks. Therefore, the mobile cluster communication protocol should provide service for cellular networks, wireless LAN, high-speed laser links and, if needed, satellite communication. In addition, a mobile IP protocol and a roaming protocol are necessary to support mobile computing. Software components consist of a general operating system (responsible for mobility management, channel resource management, etc.), mobile cluster middleware (task partition, task allocation and task synchronization) and a message passing library (which provides the message passing scheme and manages messages). Since mobile clusters are applied to both sequential and parallel applications, a special parallel computing layer should be designed to support parallel computing.

To the best of our knowledge, research on mobile cluster computing systems is still in its infancy. Very few papers have addressed this promising type of computing platform [8, 9, 14, 21]. Zheng et al. defined and analyzed the potential application environment of mobile computing technology and gave a generic architecture of a mobile cluster system [8]. Basit and Chang presented a basic architecture of a mobile cluster that uses IPv6 as its primary protocol [9]. Maluk Mohamed proposed a model for distributed processing on mobile clusters, which includes both static and mobile computing nodes [14], and an anonymous remote mobile cluster computing paradigm was proposed in [21].

In addition, the energy consumption issues of cluster systems have not attracted enough attention either. Before 2000, researchers mainly concentrated on the performance, reliability and security of cluster systems. Recently, people have started to realize that energy consumption is a critical issue, since the energy demands of clusters have been growing steadily over the last six years. A handful of previous studies investigated energy-aware processor and memory design techniques to reduce energy consumption in CPU and memory resources [5, 7, 28, 29]. IBM researchers E. Elnozahy, M. Kistler, and R. Rajamony proposed the Request Batching Policy (RBP) in 2002, in which the servicing of incoming requests is delayed while the web server is kept in a low power state. Incoming requests are accumulated in memory until a request has been kept pending for longer than a specified batching timeout. RBP can save energy because, while requests are being accumulated, the processor can be placed in a lower power state such as deep sleep [1]. Dynamic power management aims at achieving the requested performance with a minimum number of active components or a minimum load on such components [5, 6, 23]. It consists of a collection of energy-efficient techniques that adaptively turn off system components or reduce their performance when the components are idle or only partially utilized. For example, based on the observation of past idle and busy periods, predictive shut-down policies can make power management decisions when a new idle period starts [13, 16, 35].

A large body of research has been conducted into power-aware scheduling [2, 3, 18, 37, 39]. For example, variable voltage scheduling schemes were proposed to achieve minimum energy consumption by adjusting the voltage and frequency of processors [19, 24]. Shin and Choi proposed a scheme to slow down a processor when there is a single task eligible for execution [32]. Yao et al. developed a static off-line scheduling algorithm [38], whereas Hong et al. proposed on-line heuristic scheduling for aperiodic tasks [20]. Very recently, we proposed a task allocation strategy that can minimize overall energy consumption while confining schedule lengths to an ideal range [37]. We also studied a power-aware message scheduling algorithm in the context of real-time wireless networks [2]. However, most of the prior work in the area of energy-aware scheduling has focused on energy consumed by processors, which is not appropriate for communication intensive applications running on mobile clusters. In this study, we aim at developing energy conservation techniques for both the processors and the interconnect of a mobile cluster. As such, our approaches are in sharp contrast to the existing scheduling algorithms for processor energy conservation.

Scheduling strategies deployed in clusters have a large impact on overall system performance. Three broad categories of scheduling schemes are priority based scheduling, cluster based scheduling, and task duplication based scheduling [4, 30]. Priority based scheduling assigns priorities to tasks and then maps those tasks to processors based upon the assigned priorities. These methods trade performance (in terms of total execution time of all scheduled tasks) for simplicity of implementation [33]. Cluster based scheduling algorithms cluster intercommunicating tasks within a single processor, thereby eliminating communication overheads [26]. Duplication based scheduling strategies address the problem that inter-processor communication in parallel and distributed systems accounts for a major portion of total system overhead. The basic idea behind duplication based scheduling is to make use of processor idle times to replicate predecessor tasks. Many researchers have demonstrated that various task duplication strategies are highly effective in reducing the total execution time within a system employing static scheduling [4, 11, 12, 30]. In duplication-based scheduling strategies that exhibit performance improvements over other scheduling methods, redundantly executed tasks either eliminate communication overheads or allow the productive utilization of idle processor times. Depending on a number of metrics, most notably a communication-to-computation ratio involving aspects of the entire system, significant performance gains can be achieved. Our algorithms are fundamentally different from existing duplication-based scheduling approaches in that ours are the first two duplication-based scheduling strategies designed to conserve energy in mobile clusters.


3 System models

In this section, we build mathematical models used to represent a mobile cluster system, precedence constrained parallel tasks, and energy consumption in mobile processors and interconnects.

3.1 Task model

A parallel application with a set of precedence-constrained tasks is represented in the form of a Directed Acyclic Graph (DAG), which throughout this paper is modeled as a pair $(V, E)$. $V = \{v_1, v_2, \ldots, v_n\}$ represents a set of precedence constrained parallel tasks, and $t_i$ is the $i$th task's computation requirement, i.e. the number of time units needed to compute $v_i$, $1 \le i \le n$. It is assumed that all the tasks in $V$ are nonpreemptive and indivisible work units; a similar assumption can be found in related studies [11, 27]. $E$ denotes a set of messages representing communications and precedence constraints among parallel tasks. Thus, $e_{ij} = (v_i, v_j) \in E$ is a message transmitted from task $v_i$ to $v_j$, and $c_{ij}$ is the communication cost of the message $e_{ij} \in E$. We assume in this study that there is one entry task and one exit task for an application with a set of precedence-constrained tasks. The assumption is reasonable because, if multiple entry or exit tasks exist, they can always be connected through a dummy task with zero computation cost and zero communication cost messages.

The communication-to-computation ratio or CCR of a parallel application is defined as the ratio between the average communication cost and the average computation cost of the application on a given cluster. Formally, the CCR of an application $(V, E)$ is given by (1):

$$\mathrm{CCR}(V, E) = \frac{\frac{1}{|E|} \sum_{e_{ij} \in E} c_{ij}}{\frac{1}{|V|} \sum_{i=1}^{|V|} t_i}. \tag{1}$$
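As a small illustration of (1), the following Python sketch computes the CCR of a task graph. The function name `ccr`, the dictionary-based graph encoding, and the four-task diamond DAG with its cost values are assumptions made for this example only; they do not come from the paper.

```python
def ccr(comp_cost, comm_cost):
    """Communication-to-computation ratio of a task graph, following (1).

    comp_cost: dict mapping task id -> computation time t_i
    comm_cost: dict mapping edge (i, j) -> communication cost c_ij
    """
    if not comp_cost or not comm_cost:
        raise ValueError("need at least one task and one message")
    avg_comm = sum(comm_cost.values()) / len(comm_cost)  # numerator of (1)
    avg_comp = sum(comp_cost.values()) / len(comp_cost)  # denominator of (1)
    return avg_comm / avg_comp


# Hypothetical four-task diamond DAG (illustrative values only).
t = {1: 2.0, 2: 3.0, 3: 3.0, 4: 4.0}
c = {(1, 2): 6.0, (1, 3): 6.0, (2, 4): 3.0, (3, 4): 9.0}
ratio = ccr(t, c)  # average c_ij is 6.0, average t_i is 3.0
```

For this hypothetical graph the ratio is 2.0, that is, a communication-intensive application in the sense used above.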

A task allocation matrix (e.g., $X$) is an $n \times m$ binary matrix reflecting a mapping of $n$ precedence constrained parallel tasks to $m$ computational nodes in a cluster. Element $x_{ij}$ in $X$ is 1 if task $v_i$ is assigned to node $p_j$ and is 0 otherwise.

3.2 Cluster model

A mobile cluster system in this study is characterized by a set $P = \{p_1, p_2, \ldots, p_m\}$ of mobile computational nodes (hereinafter referred to as nodes) connected by a wireless interconnect. It is assumed that the computational nodes are homogeneous in nature, meaning that all the computing nodes are identical in their capabilities. Similarly, the underlying interconnection is assumed to be homogeneous and, thus, the communication overhead of a message with a fixed data size between any pair of nodes is considered to be the same. Each node communicates with other nodes through message passing, and the communication time between two precedence constrained tasks assigned to the same node is negligible. In our system model, computation and communication can take place simultaneously. This is reasonable because we assume that each computational node in a modern mobile cluster has a communication coprocessor that can be used to free the processor in the node from communication tasks [22].

To simplify the system model without loss of generality, we assume that the cluster system is fault free and that the page fault service time of each task is integrated into its execution time. With respect to energy conservation, the energy consumption rate of each node in the system is measured in joules per unit time. Each interconnection link is characterized by its energy consumption rate, which relies heavily on the data size and the transmission rate of the link.

3.3 Energy consumption model

We use a bottom-up approach to derive the energy dissipation experienced by a parallel application running on a mobile cluster. In this subsection, we first model the energy consumption exhibited by computational nodes in the cluster. Next, we calculate energy dissipation in the interconnection network of the mobile cluster.

Let $en_i$ be the energy consumption caused by task $v_i$ running on a computational node whose energy consumption rate is $PN_{active}$. The energy dissipation of task $v_i$ can be expressed as

$$en_i = PN_{active} \cdot t_i. \tag{2}$$

Given a parallel application with a task set $V$ and allocation matrix $X$, we can calculate the energy consumed by all the tasks in $V$ using (3).

$$EN_{active} = \sum_{i=1}^{|V|} en_i = \sum_{i=1}^{n} (PN_{active} \cdot t_i) = PN_{active} \sum_{i=1}^{n} t_i. \tag{3}$$

Let $PN_{idle}$ be the energy consumption rate of a computational node when it is inactive, and $f_i$ be the completion time of task $v_i$. The energy consumed by an inactive node is the product of the idle energy consumption rate $PN_{idle}$ and the idle period. Thus, we can use (4) to obtain the energy consumed by the $j$th computational node in a mobile cluster when the node is sitting idle.

$$EN^{j}_{idle} = PN_{idle} \cdot \left( \max_{i=1}^{n}(f_i) - \sum_{i=1}^{n} (x_{ij} \cdot t_i) \right), \tag{4}$$


where $\max_{i=1}^{n}(f_i)$ is the schedule length (also known as the makespan), and $\max_{i=1}^{n}(f_i) - \sum_{i=1}^{n} x_{ij} \cdot t_i$ is the total idle time on the $j$th node. The total energy consumption of all the idle nodes is

$$EN_{idle} = \sum_{j=1}^{m} EN^{j}_{idle} = PN_{idle} \cdot \sum_{j=1}^{m} \left( \max_{i=1}^{n}(f_i) - \sum_{i=1}^{n} (x_{ij} \cdot t_i) \right) = PN_{idle} \cdot \left( m \cdot \max_{i=1}^{n}(f_i) - \sum_{j=1}^{m} \sum_{i=1}^{n} (x_{ij} \cdot t_i) \right). \tag{5}$$

Consequently, the total energy consumption of the parallel application running on the mobile cluster can be derived from (3) and (5) as

$$EN = EN_{active} + EN_{idle} = PN_{active} \sum_{i=1}^{n} t_i + PN_{idle} \times \left( m \cdot \max_{i=1}^{n}(f_i) - \sum_{j=1}^{m} \sum_{i=1}^{n} (x_{ij} \cdot t_i) \right). \tag{6}$$
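The node-side part of the model, (3), (5) and (6), can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `node_energy`, the list-based encoding of $X$, and all numeric values (power rates, completion times) are hypothetical and are not taken from the paper's experiments.

```python
def node_energy(t, x, finish, pn_active, pn_idle):
    """Energy consumed by the computational nodes, following (3), (5), (6).

    t:      list of computation times t_i (length n)
    x:      n-by-m 0/1 allocation matrix, x[i][j] == 1 if task i is on node j
    finish: list of completion times f_i (length n)
    Returns (EN_active, EN_idle, EN).
    """
    n, m = len(t), len(x[0])
    makespan = max(finish)                      # schedule length, max f_i
    active = pn_active * sum(t)                 # equation (3)
    # Busy time of node j is the sum of x_ij * t_i over all tasks.
    busy = [sum(x[i][j] * t[i] for i in range(n)) for j in range(m)]
    idle = pn_idle * sum(makespan - b for b in busy)   # equation (5)
    return active, idle, active + idle          # equation (6)


# Hypothetical example: four tasks on two nodes (illustrative values only).
t = [2.0, 3.0, 3.0, 4.0]
x = [[1, 0], [1, 0], [0, 1], [1, 0]]    # task 2 alone on the second node
finish = [2.0, 5.0, 11.0, 24.0]         # assumed completion times f_i
en_active, en_idle, en_total = node_energy(t, x, finish, 6.0, 1.0)
```

With these assumed inputs the nodes are busy for 9 and 3 time units out of a makespan of 24, so the idle term covers 15 + 21 = 36 time units.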

We denote $el_{ij}$ as the energy consumed by the transmission of message $e_{ij} = (v_i, v_j) \in E$. We can compute the energy consumption of the message as the product of its communication cost and the power $PL_{active}$ of the link when it is active:

$$el_{ij} = PL_{active} \cdot c_{ij}. \tag{7}$$

The interconnection in this study is homogeneous, which implies that all messages are transmitted over the wireless network at the same transmission rate. The energy consumed by a network link between $p_a$ and $p_b$ is the cumulative energy consumption caused by all messages transmitted over the link. Therefore, the link's energy consumption is obtained by (8) as follows, where $L_{ab}$ is the set of messages delivered on the link, and $L_{ab}$ can be expressed as $L_{ab} = \{e_{ij} \in E \mid x_{ia} = 1 \wedge x_{jb} = 1\}$ with $1 \le a, b \le m$.

$$EL^{ab}_{active} = \sum_{e_{ij} \in L_{ab}} el_{ij} = \sum_{e_{ij} \in L_{ab}} (PL_{active} \cdot c_{ij}) = \sum_{i=1}^{n} \sum_{j=1, j \ne i}^{n} (x_{ia} \cdot x_{jb} \cdot PL_{active} \cdot c_{ij}). \tag{8}$$

The energy consumption of the whole wireless network is derived from (8) as the summation of all the links' energy consumption. Thus, we have

$$EL_{active} = \sum_{a=1}^{m} \sum_{b=1, b \ne a}^{m} EL^{ab}_{active} = \sum_{i=1}^{n} \sum_{j=1, j \ne i}^{n} \sum_{a=1}^{m} \sum_{b=1, b \ne a}^{m} (x_{ia} \cdot x_{jb} \cdot PL_{active} \cdot c_{ij}). \tag{9}$$

We can express the energy consumed by a link when it is inactive as the product of the consumption rate and the idle period of the link. Thus, we have

$$EL^{ab}_{idle} = PL_{idle} \cdot \left( \max_{i=1}^{n}(f_i) - \sum_{i=1}^{n} \sum_{j=1, j \ne i}^{n} (x_{ia} \cdot x_{jb} \cdot c_{ij}) \right), \tag{10}$$

where $PL_{idle}$ is the power of the link when it is inactive, and $\max_{i=1}^{n}(f_i) - \sum_{i=1}^{n} \sum_{j=1, j \ne i}^{n} (x_{ia} \cdot x_{jb} \cdot c_{ij})$ is the total idle time of the link. We can express the energy incurred by the whole interconnection network during the idle periods as

$$EL_{idle} = \sum_{a=1}^{m} \sum_{b=1, b \ne a}^{m} EL^{ab}_{idle} = \sum_{a=1}^{m} \sum_{b=1, b \ne a}^{m} PL_{idle} \left( \max_{i=1}^{n}(f_i) - \sum_{i=1}^{n} \sum_{j=1, j \ne i}^{n} (x_{ia} \cdot x_{jb} \cdot c_{ij}) \right). \tag{11}$$

The total energy consumption exhibited by the interconnect is derived from (9) and (11) as

$$EL = EL_{active} + EL_{idle}. \tag{12}$$

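Now, we can compute the energy dissipation experienced by a parallel application on a mobile cluster using (6) and (12). Hence, we can express the total energy consumption of the mobile cluster system executing the application as

$$\begin{aligned}
E = EN + EL ={}& PN_{active} \sum_{i=1}^{n} t_i + PN_{idle} \cdot \left( m \cdot \max_{i=1}^{n}(f_i) - \sum_{j=1}^{m} \sum_{i=1}^{n} (x_{ij} \cdot t_i) \right) \\
&+ \sum_{i=1}^{n} \sum_{j=1, j \ne i}^{n} \sum_{a=1}^{m} \sum_{b=1, b \ne a}^{m} (x_{ia} \cdot x_{jb} \cdot PL_{active} \cdot c_{ij}) \\
&+ \sum_{a=1}^{m} \sum_{b=1, b \ne a}^{m} PL_{idle} \left( \max_{i=1}^{n}(f_i) - \sum_{i=1}^{n} \sum_{j=1, j \ne i}^{n} (x_{ia} \cdot x_{jb} \cdot c_{ij}) \right).
\end{aligned} \tag{13}$$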
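The interconnect terms (9), (11) and (12), which form the last two lines of (13), can be evaluated with a short Python sketch. It treats each ordered node pair $(a, b)$ with $a \ne b$ as one link, mirroring the double sums in (9) and (11). The function name `link_energy`, the data layout, and all numeric values are assumptions for illustration only.

```python
def link_energy(c, x, makespan, pl_active, pl_idle):
    """Energy consumed by the interconnect, following (9), (11), (12).

    c:        dict mapping edge (i, j) -> communication cost c_ij (0-indexed)
    x:        n-by-m 0/1 allocation matrix, x[i][a] == 1 if task i is on node a
    makespan: schedule length, max f_i
    Returns (EL_active, EL_idle, EL).
    """
    m = len(x[0])
    active = idle = 0.0
    for a in range(m):
        for b in range(m):
            if a == b:
                continue  # messages inside one node cost nothing
            # Busy time of link (a, b): costs of messages in L_ab, see (8).
            busy = sum(cij for (i, j), cij in c.items() if x[i][a] and x[j][b])
            active += pl_active * busy              # summand of (9)
            idle += pl_idle * (makespan - busy)     # summand of (11)
    return active, idle, active + idle              # equation (12)


# Hypothetical example: four tasks on two nodes (illustrative values only).
c = {(0, 1): 6.0, (0, 2): 6.0, (1, 3): 3.0, (2, 3): 9.0}
x = [[1, 0], [1, 0], [0, 1], [1, 0]]
el_active, el_idle, el_total = link_energy(c, x, 24.0, 2.0, 0.5)
```

Only messages (0, 2) and (2, 3) cross nodes in this assumed allocation, so the two directed links are busy for 6 and 9 time units. Adding `el_total` to the node energy $EN$ of (6) gives the total $E$ of (13).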

4 Energy-aware duplication strategies

In this section, we present two energy-aware duplication strategies, called EADUS and TEBUS, for scheduling parallel applications with precedence constraints. The objective of the two scheduling strategies is to shorten schedule lengths while optimizing the energy consumption of mobile clusters. The scheduling problem studied in this paper can be shown to be NP-hard by mapping it to a scheduling problem proven to be NP-complete [17]. Therefore, the two proposed scheduling algorithms are heuristic in the sense that they produce suboptimal solutions in polynomial time. The EADUS and TEBUS algorithms consist of three major steps delineated in Sects. 4.1–4.3.

4.1 Generate a task sequence

Precedence constraints of a set of parallel tasks have to be guaranteed by executing predecessor tasks before successor tasks. To achieve this goal, the first step in our algorithms is to construct an ordered task sequence using the concept of level. The level of a task is defined as the length in computation time of the longest path from the task to the exit task. There are alternative ways to generate the task sequence for a DAG, including critical path-based priority schemes [4] and other priority-based schemes [31]. In this study, we use an approach similar to that proposed by Srinivasan and Jha [34] to define the level $L(v_i)$ of task $v_i$ as below

$$L(v_i) = \begin{cases} t_i, & \text{if } \forall 1 \le j \le n : e_{ij} \notin E, \\ t_i + \max_{e_{ij} \in E} (L(v_j) + c_{ij}), & \text{otherwise.} \end{cases} \tag{14}$$

The levels of other tasks can be obtained in a bottom-up fashion by specifying the level of the exit task as its execution time and then recursively applying the second term on the right-hand side of (14) to calculate the levels of all the other tasks. Next, all the tasks are placed in a queue in non-decreasing order of the levels, so that the exit task comes first.
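The level computation and the resulting queue can be sketched as follows. The edge set below is not given explicitly in the text; it is inferred from the worked example in Sect. 5 and from the FP and Level columns of Fig. 2, and communication costs are left at zero because the tabulated levels count computation time only. Both are assumptions of this sketch, as are the function names.

```python
from functools import lru_cache


def task_levels(exec_time, succ, comm):
    """Level of each task per Eq. (14): longest path from the task to the exit task."""
    @lru_cache(maxsize=None)
    def level(i):
        if not succ.get(i):
            return exec_time[i]                      # exit task
        return exec_time[i] + max(level(j) + comm.get((i, j), 0)
                                  for j in succ[i])
    return {i: level(i) for i in exec_time}


def task_queue(levels):
    """Tasks ordered so that the exit task (smallest level) comes first."""
    return sorted(levels, key=lambda i: levels[i])


# Execution times from Table 2 (Example Tree); edges inferred, see lead-in.
exec_time = {1: 3, 2: 3, 3: 4, 4: 2, 5: 1, 6: 10, 7: 20, 8: 7, 9: 5, 10: 8}
succ = {1: [2, 3, 4], 2: [5, 6], 3: [7], 4: [7], 5: [8],
        6: [8, 9], 7: [9], 8: [10], 9: [10], 10: []}
levels = task_levels(exec_time, succ, comm={})
queue = task_queue(levels)
```

Under these assumptions the levels reproduce the Level column of Fig. 2 (for example 40 for task 1 and 8 for task 10), and `queue` is the task list {10, 9, 8, 5, 6, 2, 7, 4, 3, 1} used in Sect. 5.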

4.2 Calculate important parameters

The second phase in the EADUS and TEBUS algorithms is to calculate some important parameters on which the algorithms rely. The important notation and parameters are listed in Table 1. Note that similar notation was used by Darbha and Agrawal in [11].

Table 1 Important notation and parameters

Notation     Definition
EST(v_i)     Earliest start time of task v_i
ECT(v_i)     Earliest completion time of task v_i
FP(v_i)      Favorite predecessor of task v_i
LACT(v_i)    Latest allowable completion time of task v_i
LAST(v_i)    Latest allowable start time of task v_i

The earliest start time of the entry task is 0 (see the first term on the right side of (15)). The earliest start times of all the other tasks can be calculated in a top-down manner by recursively applying the second term on the right side of (15).

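$$
\mathrm{EST}(v_i) =
\begin{cases}
0, & \text{if } \forall\, 1 \le j \le n : e_{ji} \notin E, \\
\min_{e_{ji} \in E} \left( \max_{e_{ki} \in E,\, v_k \neq v_j} \left( \mathrm{ECT}(v_j),\ \mathrm{ECT}(v_k) + c_{ki} \right) \right), & \text{otherwise.}
\end{cases}
\tag{15}
$$

The earliest completion time of task v_i is expressed as the sum of its earliest start time and execution time. Thus, we have

$$
\mathrm{ECT}(v_i) = \mathrm{EST}(v_i) + t_i. \tag{16}
$$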
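Equations (15) and (16) can be sketched as a single top-down pass. The fragment used below (tasks v_1, v_3, v_4, v_7 with c_37 = 20 and c_47 = 10) is taken from the worked example in Sect. 5; the function name and data layout are assumptions of this sketch. The costs c_13 and c_14 are not needed, because a task with a single predecessor starts as soon as that predecessor completes.

```python
def earliest_times(exec_time, pred, comm, order):
    """EST and ECT per Eqs. (15) and (16); `order` must list predecessors first."""
    est, ect = {}, {}
    for i in order:
        preds = pred.get(i, [])
        if not preds:
            est[i] = 0                               # entry task
        else:
            candidates = []
            for j in preds:
                # v_j is taken to run on the same node as v_i, so its message
                # is free; every other predecessor v_k must deliver its message.
                arrivals = [ect[j]] + [ect[k] + comm[(k, i)]
                                       for k in preds if k != j]
                candidates.append(max(arrivals))
            est[i] = min(candidates)
        ect[i] = est[i] + exec_time[i]
    return est, ect


exec_time = {1: 3, 3: 4, 4: 2, 7: 20}
pred = {3: [1], 4: [1], 7: [3, 4]}
comm = {(3, 7): 20, (4, 7): 10}
est, ect = earliest_times(exec_time, pred, comm, order=[1, 3, 4, 7])
```

For this fragment the sketch gives EST(v_7) = 15 and ECT(v_7) = 35, which agrees with the hand calculation in Phase 2.1 of Sect. 5.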

Allocating task v_i and its favorite predecessor FP(v_i) on the same computational node can lead to a shorter schedule length. As such, the favorite predecessor FP(v_i) is defined as follows:

$$
\mathrm{FP}(v_i) = v_j, \quad \text{where } \forall\, e_{ji} \in E,\ e_{ki} \in E,\ j \neq k : \mathrm{ECT}(v_j) + c_{ji} \ge \mathrm{ECT}(v_k) + c_{ki}. \tag{17}
$$

As shown by the first term on the right-hand side of (18), the latest allowable completion time of the exit task equals its earliest completion time. The latest allowable completion times of all the other tasks are calculated in a bottom-up manner, starting from the exit task, by recursively applying the second term on the right-hand side of (18).

$$
\mathrm{LACT}(v_i) =
\begin{cases}
\mathrm{ECT}(v_i), & \text{if } \forall\, 1 \le j \le n : e_{ij} \notin E, \\
\min\left( \min_{e_{ij} \in E,\, v_i \neq \mathrm{FP}(v_j)} \left( \mathrm{LAST}(v_j) - c_{ij} \right),\ \min_{e_{ij} \in E,\, v_i = \mathrm{FP}(v_j)} \mathrm{LAST}(v_j) \right), & \text{otherwise.}
\end{cases}
\tag{18}
$$

The latest allowable start time of task v_i is derived from its latest allowable completion time and execution time. Hence, LAST(v_i) can be written as

$$
\mathrm{LAST}(v_i) = \mathrm{LACT}(v_i) - t_i. \tag{19}
$$
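A minimal sketch of (17)–(19) follows. It reuses the four-task fragment v_1, v_3, v_4, v_7 from Sect. 5 and, as an assumption made only for this illustration, treats v_7 as the exit task of the fragment, so the LACT and LAST values differ from those of the full ten-task graph. Function names and data layout are likewise assumptions.

```python
def favorite_predecessors(pred, comm, ect):
    """FP per Eq. (17): the predecessor whose message would arrive last."""
    fp = {}
    for i, ps in pred.items():
        if len(ps) == 1:
            fp[i] = ps[0]                  # a sole predecessor is the favorite
        elif ps:
            fp[i] = max(ps, key=lambda j: ect[j] + comm[(j, i)])
    return fp


def latest_times(exec_time, succ, comm, ect, fp, reverse_order):
    """LACT and LAST per Eqs. (18) and (19); successors must come first."""
    lact, last = {}, {}
    for i in reverse_order:
        succs = succ.get(i, [])
        if not succs:
            lact[i] = ect[i]               # exit task
        else:
            # No message cost is charged when v_i is the favorite predecessor.
            lact[i] = min(last[j] if fp.get(j) == i else last[j] - comm[(i, j)]
                          for j in succs)
        last[i] = lact[i] - exec_time[i]
    return lact, last


exec_time = {1: 3, 3: 4, 4: 2, 7: 20}
pred = {3: [1], 4: [1], 7: [3, 4]}
succ = {1: [3, 4], 3: [7], 4: [7], 7: []}
comm = {(3, 7): 20, (4, 7): 10}
ect = {1: 3, 3: 7, 4: 5, 7: 35}            # ECT values from Sect. 5
fp = favorite_predecessors(pred, comm, ect)
lact, last = latest_times(exec_time, succ, comm, ect, fp,
                          reverse_order=[7, 4, 3, 1])
```

Here FP(v_7) = v_3, since 7 + 20 exceeds 5 + 10, which matches the FP column in Sect. 5.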

Figure 2 illustrates an example task graph with a set of 10 tasks along with the corresponding important parameters.

4.3 Energy-aware task allocation and duplication phase

4.3.1 The EADUS algorithm

Given a parallel application represented as a DAG, the EADUS algorithm in this phase allocates each parallel task to a computational node so as to aggressively shorten the schedule length of the DAG while conserving energy. The pseudocode in Fig. 3 shows the details of this phase in the EADUS algorithm, which aims to provide the greatest energy savings when it reaches the point of duplicating a task. Most existing duplication-based scheduling schemes merely optimize schedule lengths without addressing the issue of energy conservation. As such, the existing duplication-based approaches tend to yield minimized schedule lengths at the cost of high energy consumption. To make tradeoffs between energy savings and schedule lengths, we design the EADUS algorithm so that task duplications are strictly forbidden if they do not conserve energy (see Steps 9–10). In other words, duplications that result in a significant increase in energy consumption (e.g., an increase exceeding a threshold) are deemed infeasible and are avoided in EADUS. In doing so, the EADUS algorithm ensures that schedule lengths are minimized using task duplication without adversely affecting energy conservation.

Task  Level  EST  ECT  LAST  LACT  FP
1     40     0    3    0     3     –
2     28     3    6    8     11    1
3     37     3    7    3     7     1
4     35     3    5    4     6     1
5     16     6    7    21    22    2
6     25     6    16   11    21    2
7     33     7    27   7     27    3
8     15     16   23   23    30    6
9     13     27   32   27    32    7
10    8      32   40   32    40    9

Fig. 2 An example task graph and its corresponding important parameters

1. v = first waiting task of scheduling queue;
2. i = 0;
3. assign v to P_i;
4. while (not all tasks are allocated to computational nodes) do
5. u = FP(v);
6. if (u has already been assigned to another processor) then
7. if (LAST(v) − LACT(u) < c_vu) then /* if duplicate u, we can shorten the schedule length */
8. moreenergy = en_u − el_vu; /* energy increase */
9. if (moreenergy ≤ threshold h) then /* increased energy less than our threshold */
10. assign u to P_i; /* duplicate u */
11. if v has another predecessor z ≠ u that has not yet been allocated to any node then
12. u = z;
13. else
14. if u is entry task then
15. u = the next task that has not yet been assigned to a node;
16. i++;
17. else
18. for another predecessor z of v, z ≠ u,
19. if (ECT(u) + cc_uv = ECT(z) + cc_zv and z hasn't been allocated) then
20. u = z; /* do not duplicate */
21. else
22. for another predecessor z of v, z ≠ u,
23. if (ECT(u) + cc_uv = ECT(z) + cc_zv and z hasn't been allocated) then
24. u = z; /* do not duplicate */
25. else allocate u to P_i;
26. v = u;
27. if v is entry task then
28. v = the next task that has not yet been allocated to a computational node;
29. i++;
30. assign v to P_i;
31. return schedule list, schedule length and energy consumption;

Fig. 3 Pseudocode of phase 3 in the EADUS algorithm

Before this phase starts, phase 1 sorts all the tasks in a waiting queue, followed by phase 2, which calculates the important parameters. In phase 3, EADUS strives to group communication-intensive parallel tasks together and have them allocated to the same computational node. Once multiple task groups are constructed, each group of tasks is assigned to a different node in the cluster. The process of grouping tasks is repeated from the first task in the queue by performing a depth-first style search, which traces the path from the first task to the entry task. Steps 5 and 6 choose a favorite predecessor if it has not been allocated to a computational node. Otherwise, EADUS may or may not replicate the favorite predecessor on the current node. For example, assume that v_j is the favorite predecessor of the current task v_i, and v_j has been allocated to another node. If duplicating v_j on the current node to which v_i is allocated can improve performance without sacrificing energy conservation, Step 10 makes a duplication of v_j.
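The duplication test of Steps 7–9 in Fig. 3 can be sketched as a small predicate. The function and parameter names are assumptions of this sketch; the numbers are those quoted for v_1 and v_2 in the worked example of Sect. 5 (LAST(v_2) = 3, LACT(v_1) = 34, cc_12 = 15, en_1 = 18, el_12 = 15).

```python
def eadus_should_duplicate(last_v, lact_u, c_uv, en_u, el_uv, threshold):
    """Return True if EADUS would duplicate predecessor u on v's node."""
    shortens_schedule = (last_v - lact_u) < c_uv       # Step 7
    more_energy = en_u - el_uv                         # Step 8: energy increase
    return shortens_schedule and more_energy <= threshold   # Step 9


# v = v_2, u = v_1, threshold h = 1 as in Sect. 5.
decision = eadus_should_duplicate(last_v=3, lact_u=34, c_uv=15,
                                  en_u=18, el_uv=15, threshold=1)
```

With h = 1 the energy increase of 3 exceeds the threshold, so `decision` is False and v_1 is not duplicated on node 2, as in the trace of Sect. 5. Raising the threshold to 3 or more would permit the duplication.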

The generation of a task group terminates once the path reaches the entry task. The next task group starts from the first unassigned task in the queue. If all the tasks are assigned to the computational nodes, then the algorithm terminates.

4.3.2 The TEBUS algorithm

The third phase of the TEBUS algorithm is similar to that of EADUS except that TEBUS seamlessly integrates the approach to minimizing schedule lengths with the process of energy optimization (see Fig. 4). Unlike EADUS, the development of TEBUS is motivated by the need to make the right tradeoff between performance and energy conservation. Thus, the TEBUS algorithm is geared to efficiently reduce schedule lengths while providing the greatest energy savings. Judging whether a duplication is feasible requires the energy consumption incurred by duplicating the task. To facilitate the construction of TEBUS, we introduce the concept of the cost ratio of a duplication, which is defined as the ratio between the energy increase and the schedule length reduction (see Step 10). The energy increase of the duplication is obtained in Step 8, and the reduction in schedule length is computed in Step 9. The TEBUS algorithm is conducive to maintaining cost ratios at a low level, thereby efficiently shortening schedule lengths with low energy consumption. This feature is accomplished by Steps 11–12, which duplicate a task if the cost ratio of the duplication does not exceed a given threshold.
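The cost-ratio test of Steps 7–11 in Fig. 4 might be sketched as below. One point is an assumption on our part: Fig. 4 writes lesstime as LAST(v) − LACT(u) − c_vu, which is negative whenever the Step 7 condition holds, so this sketch uses its magnitude as the schedule length reduction. Function and parameter names are also hypothetical; the numbers are those of v_1 and v_2 in Sect. 5.

```python
def tebus_should_duplicate(last_v, lact_u, c_uv, en_u, el_uv, threshold):
    """Return True if TEBUS would duplicate predecessor u on v's node."""
    slack = last_v - lact_u
    if slack >= c_uv:                  # Step 7 fails: no schedule length gain
        return False
    more_energy = en_u - el_uv         # Step 8: energy increase
    less_time = c_uv - slack           # Step 9, taken as a magnitude (assumption)
    cost_ratio = more_energy / less_time   # Step 10: the smaller the better
    return cost_ratio <= threshold     # Step 11


# v = v_2, u = v_1, threshold h = 1 as in Sect. 5.
decision = tebus_should_duplicate(last_v=3, lact_u=34, c_uv=15,
                                  en_u=18, el_uv=15, threshold=1)
```

Under this reading the cost ratio is 3/46, well below h = 1, so `decision` is True, consistent with the TEBUS schedule in Sect. 5 where v_1 is duplicated on node 2.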

4.4 Properties

In this subsection, we first present the time complexity and major properties of the EADUS and TEBUS algorithms. Then, we qualitatively compare EADUS and TEBUS with two existing scheduling algorithms.

Theorem 1 The time complexity of EADUS and TEBUS is O(|V|^2).

Proof The EADUS and TEBUS algorithms perform the three main phases described in Sects. 4.1–4.3, respectively. In the first phase, EADUS and TEBUS traverse all the tasks of the DAG to compute the levels of the tasks. The time complexity of calculating the levels is O(|E|), where |E| is the number of messages, because all the messages have to be examined in the worst case. It takes O(|V| log |V|) time to sort the tasks in non-decreasing order of the levels, where |V| = n is the number of tasks. Therefore, the time complexity of phase 1 is O(|E| + |V| log |V|).

The second phase is performed to obtain all the important parameters, namely EST, ECT, FP, LACT, and LAST. Phase 2 calculates these parameters by applying depth-first search with a complexity of O(|V| + |E|).

Recall that in phase 3 the tasks are allocated to the computational nodes. First, all the tasks are checked and allocated to one or more nodes in the while loop based on the duplication strategies. In the worst case, all the tasks in the critical path must be duplicated, meaning that the time complexity is O(h|V|), where h is the height of the DAG. Since h is less than or equal to |V|, the complexity of the third phase is O(|V|^2).

Consequently, the overall time complexity of EADUS and TEBUS is O(2|E| + |V|(lg |V| + 1) + |V|^2) = O(|E| + |V|^2). For a dense DAG, the number of messages is proportional to |V|^2. Hence, the time complexity of EADUS and TEBUS is O(|V|^2). □

1. v = first waiting task of scheduling queue;
2. i = 0;
3. assign v to P_i;
4. while (not all tasks are allocated to computational nodes) do
5. u = FP(v);
6. if (u has already been assigned to another node) then
7. if (LAST(v) − LACT(u) < c_vu) then /* if duplicate u, we can shorten the execution time */
8. moreenergy = en_u − el_vu; /* energy increase */
9. lesstime = LAST(v) − LACT(u) − c_vu; /* schedule length is reduced */
10. cost ratio = moreenergy / lesstime; /* value of ratio: the smaller the better */
11. if (cost ratio ≤ threshold h) then /* significantly shorten schedule length */
12. assign u to P_i; /* duplicate u */
13. if v has another predecessor z ≠ u that has not yet been assigned to any node then
14. u = z;
15. else
16. if u is entry task then
17. u = the next task that has not yet been allocated to a computational node;
18. i++;
19. else
20. for another predecessor z of v, z ≠ u,
21. if (ECT(u) + cc_uv = ECT(z) + cc_zv and z has not been allocated) then
22. u = z; /* do not duplicate */
23. else
24. for another predecessor z of v, z ≠ u,
25. if (ECT(u) + cc_uv = ECT(z) + cc_zv and z has not been allocated) then
26. u = z; /* do not duplicate */
27. else assign u to P_i;
28. v = u;
29. if v is entry task then
30. v = the next task that has not yet been allocated;
31. i++;
32. allocate v to P_i;
33. return schedule list, schedule length and energy consumption

Fig. 4 Pseudocode of phase 3 in the TEBUS algorithm

Proposition 1 Let v_j be the favorite predecessor of the current task v_i, and suppose v_j has been allocated to another node. A duplication of v_j is made on v_i's current node if the following three conditions are satisfied:

• LAST(v_i) − LACT(v_j) < c_ji,
• en_u − el_vu < threshold h, where en_u and el_vu are computed by (2) and (7),
• ¬(∃(v_k, v_i) ∈ E, k ≠ j, v_k has not been allocated: ECT(v_k) + c_ki = ECT(v_j) + c_ji).

The first condition ensures that v_j is a critical predecessor of v_i. The second condition guarantees that the increase in energy due to the duplication is kept at a low level. The third condition is used to identify whether any of v_i's other unallocated predecessors can initially be the favorite predecessor. If such an initial favorite predecessor (e.g., v_k) exists, the path to the entry task will be traversed through v_k.

The following three theorems are used to qualitatively compare EADUS and TEBUS with two existing scheduling algorithms: the non-duplication-based scheduling heuristic (NDS) [36], and the task duplication-based scheduling algorithm (TDS) [11]. The NDS and TDS algorithms are briefly described below.

(1) NDS: This non-duplication-based algorithm is also referred to as the static priority-based Modified Critical Path (MCP) algorithm [36], with time complexity O(n^2(log n + m)), where n and m are the numbers of tasks and nodes, respectively. NDS, which does not duplicate any task, performs scheduling based on the critical-path method.

(2) TDS: The TDS algorithm allocates all the tasks that are in the critical path to the same processor. If tasks have already been dispatched to other processors, TDS only duplicates the tasks that can potentially shorten the schedule length. The goal of TDS is to generate a schedule for a DAG with the shortest schedule length.

Theorem 2 Given a task set V = {v_1, v_2, ..., v_m, ..., v_n} to be allocated by the NDS and EADUS algorithms, EADUS outperforms NDS in terms of energy conservation if and only if

$$
\forall \text{ duplicated } v_i \text{ allocated on node } k,\ e_{ij} \in E,\ x_{jk} = 1:\quad (PL_{active} - PL_{idle}) \cdot c_{ij} > (PN_{active} - PN_{idle}) \cdot t_i.
$$

Proof Duplicating task v_i ∈ V means that a copy of v_i has to be executed on another computational node. As such, the energy consumption caused by v_i under EADUS is E_EADUS = 2 PN_active · t_i + PL_idle · c_ij, where t_i is the execution time of v_i. In contrast, the energy consumption of v_i and its corresponding message under NDS is E_NDS = PN_active · t_i + PL_active · c_ij + PN_idle · t_i, where c_ij is the data transmission time of the message. EADUS is superior to NDS in terms of energy conservation if and only if E_NDS > E_EADUS, that is, if and only if

$$
PN_{active} \cdot t_i + PL_{active} \cdot c_{ij} + PN_{idle} \cdot t_i > 2 PN_{active} \cdot t_i + PL_{idle} \cdot c_{ij}. \tag{20}
$$

Inequality (20) can be rewritten as

$$
(PL_{active} - PL_{idle}) \cdot c_{ij} > (PN_{active} - PN_{idle}) \cdot t_i. \tag{21}
$$
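The rearrangement from (20) to (21) is routine; for completeness, it can be spelled out by subtracting $PN_{active} \cdot t_i + PL_{idle} \cdot c_{ij}$ from both sides of (20) and then moving the idle node term to the right:

$$
\begin{aligned}
PL_{active} \cdot c_{ij} - PL_{idle} \cdot c_{ij} + PN_{idle} \cdot t_i &> PN_{active} \cdot t_i \\
(PL_{active} - PL_{idle}) \cdot c_{ij} &> PN_{active} \cdot t_i - PN_{idle} \cdot t_i \\
(PL_{active} - PL_{idle}) \cdot c_{ij} &> (PN_{active} - PN_{idle}) \cdot t_i.
\end{aligned}
$$

Every step is reversible, which is why the condition holds in both directions.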

Consequently, EADUS performs better than NDS in energy conservation if and only if, for every duplicated task v_i allocated on node k where (v_i, v_j) ∈ E and x_jk = 1, (PL_active − PL_idle) · c_ij > (PN_active − PN_idle) · t_i. This completes the proof of Theorem 2. □

We next prove that TEBUS provides more energy savings than NDS under certain conditions. The proof of Theorem 3 is omitted since it is similar to that of Theorem 2.

Theorem 3 Given a task set V = {v_1, v_2, ..., v_m, ..., v_n} to be allocated by the NDS and TEBUS algorithms, TEBUS outperforms NDS in terms of energy conservation if and only if

$$
\forall \text{ duplicated } v_i \text{ allocated on node } k,\ e_{ij} \in E,\ x_{jk} = 1:\quad (PL_{active} - PL_{idle}) \cdot c_{ij} > (PN_{active} - PN_{idle}) \cdot t_i.
$$

In most cases energy consumed by idle resources is negligible and, hence, we have Corollaries 1–4 from Theorems 2 and 3. The proofs of Corollaries 3 and 4 are omitted since they are respectively similar to those of Corollaries 1 and 2.

Corollary 1 In cases where energy consumed by idle resources is negligible, EADUS is superior to NDS in terms of energy conservation if and only if

$$
\forall \text{ duplicated } v_i \text{ allocated on node } k,\ e_{ij} \in E,\ x_{jk} = 1:\quad PL_{active} \cdot c_{ij} > PN_{active} \cdot t_i.
$$

Proof Setting the values of PL_idle and PN_idle to zero, we rewrite inequality (21) as

$$
PL_{active} \cdot c_{ij} > PN_{active} \cdot t_i. \tag{22}
$$

Therefore, EADUS outperforms NDS in energy savings if and only if, for every duplicated task v_i allocated on node k where (v_i, v_j) ∈ E and x_jk = 1, PL_active · c_ij > PN_active · t_i. □

Corollary 2 In cases where energy consumed by idle resources is negligible, EADUS is superior to NDS in terms of energy conservation if and only if

$$
\forall \text{ duplicated } v_i \text{ allocated on node } k,\ e_{ij} \in E,\ x_{jk} = 1:\quad \mathrm{CCR} > \frac{PN_{active}}{PL_{active}}.
$$

Proof Given tasks v_i and v_j where (v_i, v_j) ∈ E, we can approximate the value of CCR using c_ij / t_i. Corollary 1 shows that EADUS is superior to NDS in terms of energy conservation if and only if

$$
\forall \text{ duplicated } v_i \text{ allocated on node } k,\ e_{ij} \in E,\ x_{jk} = 1:\quad \frac{c_{ij}}{t_i} > \frac{PN_{active}}{PL_{active}}. \tag{23}
$$

Expression (23) can be rewritten as

$$
\forall \text{ duplicated } v_i \text{ allocated on node } k,\ e_{ij} \in E,\ x_{jk} = 1:\quad \mathrm{CCR} > \frac{PN_{active}}{PL_{active}}, \tag{24}
$$

which completes the proof of Corollary 2. □
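Corollaries 1 and 2 reduce to a simple break-even check, sketched below with idle power ignored. The function names are assumptions of this sketch. The power values are the ones quoted elsewhere in the paper: 6 mW and 1 mW in Sect. 5, and 6.0 mW and 1.5 mW in Table 2.

```python
def ccr_break_even(pn_active, pl_active):
    """CCR above which a duplication saves energy (Corollary 2, idle power ignored)."""
    return pn_active / pl_active


def duplication_saves_energy(c_ij, t_i, pn_active, pl_active):
    """Inequality (22): the removed message costs more than the extra execution."""
    return pl_active * c_ij > pn_active * t_i


threshold_table2 = ccr_break_even(6.0, 1.5)          # Table 2 rates
# Message (v_1, v_2) of Sect. 5: t_1 = 3, c_12 = 15, PN_active = 6, PL_active = 1.
saves = duplication_saves_energy(c_ij=15, t_i=3, pn_active=6, pl_active=1)
```

With the Table 2 rates the break-even CCR is 4. For the Sect. 5 message the link energy (15) is below the node energy (18), so `saves` is False, in line with the positive energy increase en_1 − el_12 = 3 reported there.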

Corollary 3 In cases where energy consumed by idle resources is negligible, TEBUS outperforms NDS in terms of energy conservation if and only if

$$
\forall \text{ duplicated } v_i \text{ allocated on node } k,\ e_{ij} \in E,\ x_{jk} = 1:\quad PL_{active} \cdot c_{ij} > PN_{active} \cdot t_i.
$$

Corollary 4 In cases where energy consumed by idle resources is negligible, TEBUS outperforms NDS in terms of energy conservation if and only if

$$
\forall \text{ duplicated } v_i \text{ allocated on node } k,\ e_{ij} \in E,\ x_{jk} = 1:\quad \mathrm{CCR} > \frac{PN_{active}}{PL_{active}}.
$$

Theorem 4 Given a task set V = {v_1, v_2, ..., v_m, ..., v_n} to be allocated by the TDS and EADUS algorithms, EADUS outperforms TDS in energy conservation if and only if

$$
\forall \text{ non-duplicated } v_i,\ e_{ij} \in E:\quad (PN_{active} - PN_{idle}) \cdot t_i > (PL_{active} - PL_{idle}) \cdot c_{ij}.
$$

Proof The energy consumption of a non-duplicated v_i and its message (v_i, v_j) ∈ E under EADUS equals E_EADUS = PN_active · t_i + PL_active · c_ij + PN_idle · t_i. The energy consumption caused by v_i under TDS is E_TDS = 2 PN_active · t_i + PL_idle · c_ij. EADUS is superior to TDS in terms of energy saving if and only if E_TDS > E_EADUS. Hence, we have to prove that EADUS outperforms TDS in energy conservation if and only if

$$
2 PN_{active} \cdot t_i + PL_{idle} \cdot c_{ij} > PN_{active} \cdot t_i + PL_{active} \cdot c_{ij} + PN_{idle} \cdot t_i. \tag{25}
$$

We can rewrite the above inequality as follows:

$$
(PN_{active} - PN_{idle}) \cdot t_i > (PL_{active} - PL_{idle}) \cdot c_{ij}. \tag{26}
$$

Consequently, EADUS outperforms TDS in energy conservation if and only if, for every non-duplicated task v_i and its message e_ij ∈ E, (PN_active − PN_idle) · t_i > (PL_active − PL_idle) · c_ij. This completes the proof. □

The following theorem states the condition under which TEBUS reduces energy consumption compared with TDS. The proof of Theorem 5 is similar to that of Theorem 4 and is therefore omitted.

Theorem 5 Given a task set V = {v_1, v_2, ..., v_m, ..., v_n} to be allocated by the TDS and TEBUS algorithms, TEBUS outperforms TDS in energy conservation if and only if

$$
\forall \text{ non-duplicated } v_i,\ e_{ij} \in E:\quad (PN_{active} - PN_{idle}) \cdot t_i > (PL_{active} - PL_{idle}) \cdot c_{ij}.
$$

With Theorems 4 and 5 in place, we can easily prove the correctness of the following corollaries. The proofs of Corollaries 5 and 6 are given as follows. We omit the proofs of Corollaries 7 and 8, which are similar to those of Corollaries 5 and 6.

Corollary 5 In cases where energy consumed by idle resources is negligible, EADUS outperforms TDS in terms of energy conservation if and only if

$$
\forall \text{ non-duplicated } v_i,\ e_{ij} \in E:\quad PN_{active} \cdot t_i > PL_{active} \cdot c_{ij}.
$$

Proof Setting the values of PL_idle and PN_idle to zero, we rewrite inequality (26) as

$$
PN_{active} \cdot t_i > PL_{active} \cdot c_{ij}. \tag{27}
$$

Thus, EADUS outperforms TDS in energy saving if and only if, for every non-duplicated task v_i and e_ij ∈ E, PN_active · t_i > PL_active · c_ij. □

Corollary 6 In cases where energy consumed by idle resources is negligible, EADUS outperforms TDS in terms of energy conservation if and only if

$$
\forall \text{ non-duplicated } v_i,\ e_{ij} \in E:\quad \mathrm{CCR} < \frac{PN_{active}}{PL_{active}}.
$$

Proof Given two tasks v_i and v_j where (v_i, v_j) ∈ E, we approximate the value of CCR as c_ij / t_i. Corollary 5 shows that EADUS is superior to TDS in terms of energy conservation if and only if

$$
\forall \text{ non-duplicated } v_i,\ e_{ij} \in E:\quad \frac{c_{ij}}{t_i} < \frac{PN_{active}}{PL_{active}}. \tag{28}
$$

The corollary is immediate when (28) is rewritten as

$$
\forall \text{ non-duplicated } v_i,\ e_{ij} \in E:\quad \mathrm{CCR} < \frac{PN_{active}}{PL_{active}}, \tag{29}
$$

which completes the proof. □

Corollary 7 In cases where energy consumed by idle resources is negligible, TEBUS is superior to TDS in terms of energy conservation if and only if

$$
\forall \text{ non-duplicated } v_i,\ e_{ij} \in E:\quad PN_{active} \cdot t_i > PL_{active} \cdot c_{ij}.
$$

Corollary 8 In cases where energy consumed by idle resources is negligible, TEBUS is superior to TDS in terms of energy conservation if and only if

$$
\forall \text{ non-duplicated } v_i,\ e_{ij} \in E:\quad \mathrm{CCR} < \frac{PN_{active}}{PL_{active}}.
$$

In what follows, we prove that the energy conservation performance of EADUS is better than or equal to that of TEBUS. Before presenting Lemma 1, we introduce an important definition.

Definition 1 Given a task graph (V, E), e_ij ∈ E is an energy-critical message if the following condition is satisfied: (PL_active − PL_idle) · c_ij > (PN_active − PN_idle) · t_i.

Lemma 1 Given an energy-critical message e_ij ∈ E, we can reduce energy consumption by duplicating v_i on node k to which v_j is allocated.

Proof Since e_ij ∈ E is an energy-critical message, we have the following inequality (see Definition 1)

$$
(PL_{active} - PL_{idle}) \cdot c_{ij} > (PN_{active} - PN_{idle}) \cdot t_i,
$$

from which we can derive the following inequality

$$
PN_{active} \cdot t_i + PL_{active} \cdot c_{ij} + PN_{idle} \cdot t_i > 2 PN_{active} \cdot t_i + PL_{idle} \cdot c_{ij}.
$$

The term on the left-hand side of the above inequality is the energy caused by v_i and its message, whereas the term on the right-hand side is the energy incurred by v_i and its replica. The inequality states that the duplication of v_i helps in reducing energy consumption. □

Two additional definitions are introduced to facilitate the proofs of Lemmas 2–3 and Theorem 6. We omit the proof of Lemma 3, because it is similar to that of Lemma 2.

Definition 2 Given a task graph (V, E), we define a set Ω_EADUS of energy-critical messages as

$$
\Omega_{EADUS} = \{ e_{ij} \in E \mid (PL_{active} - PL_{idle}) \cdot c_{ij} > (PN_{active} - PN_{idle}) \cdot t_i \}.
$$

Definition 3 Given a task graph (V, E), we define a set Ω_TEBUS of energy-critical messages that are eliminated by task duplications. Thus, we have

$$
\Omega_{TEBUS} = \{ e_{ij} \in E \mid e_{ij} \text{ is an energy-critical message and Cost Ratio} \le h \}.
$$

Lemma 2 Compared with NDS, given a set Ω_EADUS of energy-critical messages for a task graph (V, E), the energy conserved by EADUS equals

$$
\sum_{e_{ij} \in \Omega_{EADUS}} \left( (PL_{active} - PL_{idle}) \cdot c_{ij} - (PN_{active} - PN_{idle}) \cdot t_i \right).
$$

Proof EADUS conserves energy by eliminating energy-critical messages using task duplications. Based on Definition 1 and Lemma 1, the energy saved by eliminating an energy-critical message e_ij ∈ Ω_EADUS equals (PL_active − PL_idle) · c_ij − (PN_active − PN_idle) · t_i. The total energy conserved by EADUS is the sum of the energy savings contributed by eliminating each message in Ω_EADUS. Therefore, the energy conserved by EADUS can be expressed as

$$
\sum_{e_{ij} \in \Omega_{EADUS}} \left( (PL_{active} - PL_{idle}) \cdot c_{ij} - (PN_{active} - PN_{idle}) \cdot t_i \right). \qquad \square
$$
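The quantities in Definition 1 and Lemma 2 can be sketched as follows. The message data are hypothetical and chosen only to show one energy-critical and one non-critical message; the power rates are the Table 2 values (6.0 mW and 1.5 mW) with idle power set to zero, and the function names are assumptions of this sketch.

```python
def energy_critical_messages(messages, pn_active, pn_idle, pl_active, pl_idle):
    """Select the messages that satisfy Definition 1.

    messages maps an edge (i, j) to a pair (c_ij, t_i).
    """
    return {edge: (c, t) for edge, (c, t) in messages.items()
            if (pl_active - pl_idle) * c > (pn_active - pn_idle) * t}


def energy_saved(critical, pn_active, pn_idle, pl_active, pl_idle):
    """Sum of per-message savings, as in Lemma 2."""
    return sum((pl_active - pl_idle) * c - (pn_active - pn_idle) * t
               for c, t in critical.values())


# Hypothetical messages: (c_ij, t_i) per edge.
messages = {("a", "b"): (10, 1), ("b", "c"): (2, 5)}
rates = dict(pn_active=6.0, pn_idle=0.0, pl_active=1.5, pl_idle=0.0)
critical = energy_critical_messages(messages, **rates)
saved = energy_saved(critical, **rates)
```

Only the first message is energy-critical (15 > 6), and eliminating it saves 9 units of energy; the second message (3 versus 30) is cheaper to send than to avoid by duplication.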

Lemma 3 Compared with NDS, given a set Ω_TEBUS of energy-critical messages for a task graph (V, E), the energy conserved by TEBUS equals

$$
\sum_{e_{ij} \in \Omega_{TEBUS}} \left( (PL_{active} - PL_{idle}) \cdot c_{ij} - (PN_{active} - PN_{idle}) \cdot t_i \right).
$$

Theorem 6 With respect to energy conservation, the performance of EADUS is better than or equal to that of TEBUS.

Proof The TEBUS algorithm makes tradeoffs between schedule lengths and energy savings. An energy-critical message in Ω_EADUS is not eliminated by TEBUS through a task duplication if (1) the energy consumption of the duplication is too high, (2) the decrease in schedule length is not promising, or (3) both. The decision to eliminate a message can be made by checking whether the corresponding cost ratio is higher than a specified threshold h. Hence, all the messages in Ω_EADUS ought to be eliminated by EADUS, while some messages in Ω_EADUS may not be eliminated by TEBUS. Therefore, Ω_TEBUS is a subset of Ω_EADUS, i.e., Ω_TEBUS ⊆ Ω_EADUS, which proves the correctness of the following inequality

$$
\sum_{e_{ij} \in \Omega_{TEBUS}} \left( (PL_{active} - PL_{idle}) \cdot c_{ij} - (PN_{active} - PN_{idle}) \cdot t_i \right)
\le \sum_{e_{ij} \in \Omega_{EADUS}} \left( (PL_{active} - PL_{idle}) \cdot c_{ij} - (PN_{active} - PN_{idle}) \cdot t_i \right).
$$

Lemma 2 shows that the term on the right-hand side of the inequality is the energy conserved by EADUS, whereas Lemma 3 states that the term on the left-hand side is the energy savings provided by TEBUS. The above inequality proves that the energy-saving performance of EADUS is better than or equal to that of TEBUS. □

5 A concrete example

Now we run the proposed scheduling algorithms using a sample task graph delineated in Fig. 5. Recall that the energy consumption of the task graph is determined by (13), where PN_active and PL_active are set to 6 and 1 mW, respectively. In this study we ignore energy consumed by idle computational nodes and communication links, because it is observed from our experiments that energy consumption caused by idle resources is negligible. Given the task set with precedence constraints (see Fig. 2), we can obtain a new DAG plotted in Fig. 5, where each task is represented by (en_i, t_i) and each message is denoted by (el_ij, c_ij). Recall that en_i and el_ij, computed by (2) and (7), are the energy consumption of task v_i and of the communication between tasks v_i and v_j, respectively.

The running trace of EADUS and TEBUS is given as follows:

Phase 1 Generate a task sequence by computing levels: The levels of tasks can be calculated using (14). For instance, the level of task v_10 is 8, since v_10 is the exit task without any successor. The level of v_8 is 8 + 7 = 15 because v_8 has only one successor task. The level of task v_2 is max{L(v_5) + 3, L(v_6) + 3} = 28, since v_2 has two successors, v_5 and v_6. All the tasks are placed in a queue in non-decreasing order of levels. Thus, we have a list of tasks as {10, 9, 8, 5, 6, 2, 7, 4, 3, 1}.

Fig. 5 Task graph for synthetic

Phase 2 Calculate important parameters:

Phase 2.1 Compute EST and ECT: The EST and ECT values of each task can be computed by applying (15) and (16). For example, task v_1 is the entry task and, therefore, EST(v_1) = 0. In accordance with (16), we have ECT(v_1) = 0 + t_1 = 3. Since v_2, v_3, and v_4 are unable to start until v_1 finishes, we have EST(v_2) = EST(v_3) = EST(v_4) = ECT(v_1) = 3. Similarly, the EST of v_7 is computed as

$$
\begin{aligned}
\mathrm{EST}(v_7) &= \min\{\max(\mathrm{ECT}(v_4), \mathrm{ECT}(v_3) + c_{37}),\ \max(\mathrm{ECT}(v_3), \mathrm{ECT}(v_4) + c_{47})\} \\
&= \min\{\max(5, 7 + 20),\ \max(7, 5 + 10)\} = 15.
\end{aligned}
$$

Correspondingly, the ECT of v_7 is ECT(v_7) = EST(v_7) + t_7 = 15 + 20 = 35.

Phase 2.2 Compute favorite predecessors: The favorite predecessor of a task is determined by using (17). For example, the favorite predecessor of tasks v_2, v_3, and v_4 is v_1, simply because these three tasks have only one predecessor. The favorite predecessor of v_8 is v_6 because ECT(v_6) + c_68 = 16 + 50 = 66 > ECT(v_5) + c_58 = 7 + 5 = 12.

Phase 2.3 Compute LAST and LACT: The LACT and ECT values of the exit task v_10 are both equal to 79 and, thus, we have LAST(v_10) = LACT(v_10) − t_10 = 79 − 8 = 71. In the case of LACT(v_6), we have to consider two successors, namely, v_8 (in critical path) and v_9 (not in critical path). We obtain LACT(v_6) = min{LAST(v_9) − c_69, LAST(v_8)} = min{66 − 50, 29} = 16 and LAST(v_6) = LACT(v_6) − t_6 = 16 − 10 = 6. Here are the final computing results for all parameters:

Task  Level  EST  ECT  LAST  LACT  FP
1     40     0    3    31    34    –
2     28     3    6    3     6     1
3     37     3    7    42    46    1
4     35     3    5    34    36    1
5     16     6    7    23    24    2
6     25     6    16   6     16    2
7     33     15   35   46    66    3
8     15     16   23   29    36    6
9     13     66   71   66    71    7
10    8      71   79   71    79    9

Phase 3 Task allocation and duplication phase:

The EADUS algorithm. Given a threshold h = 1, EADUS generates the first group of tasks by starting from the first task in the task list obtained in Phase 1. The first task group, containing tasks v_1, v_3, v_7, v_9, and v_10, is allocated to node 1. Next, EADUS attempts to allocate the first unassigned task in the list. In this case, the unassigned task is task v_8. Tasks v_8, v_6, and v_2 are allocated to node 2, and the next task to be assigned is task v_1. Since v_1 has been allocated to node 1, EADUS has to decide whether there is an incentive to duplicate v_1 on node 2. The condition in step 7 (see Fig. 3) is satisfied, because we have LAST(v_2) − LACT(v_1) = 3 − 34 < cc_12 = 15. Therefore, duplicating v_1 on node 2 shortens the schedule length. However, the increase in energy consumption is en_1 − el_12 = 18 − 15 = 3 (see step 8 in Fig. 3), which is greater than the threshold. Thus, there is no incentive to duplicate the task due to the high energy overhead, signifying that the duplication of v_1 must be avoided. EADUS assigns task v_5 to node 3, followed by tasks v_2 and v_1, which are not duplicated on node 3 because LAST(v_5) − LACT(v_2) = 23 − 6 = 17 > cc_12 = 15, which means the schedule length would be increased. Task v_4 is the only task allocated to node 4, and v_1 is not duplicated because the increase in energy consumption is greater than the energy threshold. Finally, EADUS generates the following schedule list:

Mobile Node 1: Task 10 → Task 9 → Task 7 → Task 3 → Task 1
Mobile Node 2: Task 8 → Task 6 → Task 2
Mobile Node 3: Task 5
Mobile Node 4: Task 4

The TEBUS algorithm. The behavior of TEBUS is similar to that of EADUS except that energy-performance tradeoffs are determined by a ratio between the energy consumption of replicas and the decrease in schedule length by virtue of replicas. Given the same threshold h = 1, TEBUS generates the following task schedule lists, where task v_1 is duplicated on node 2. The duplication of v_1 is made possible by TEBUS because the replica helps in reducing the schedule length without significantly increasing energy consumption.

Mobile Node 1: Task 10 → Task 9 → Task 7 → Task 3 → Task 1
Mobile Node 2: Task 8 → Task 6 → Task 2 → Task 1
Mobile Node 3: Task 5
Mobile Node 4: Task 4 → Task 1

Table 2 Characteristics of system parameters

Parameters                                Value (fixed)–(varied)
Different trees to be examined            Synthetic benchmarks, Gaussian elimination, Fast Fourier Transform
Execution time of Example Tree            {3, 3, 4, 2, 1, 10, 20, 7, 5, 8}–(randomly generated)
Execution time of Gaussian Tree           {5, 4, 1, 1, 1, 1, 10, 2, 3, 3, 3, 7, 8, 6, 6, 20, 30, 30}–(random)
Execution time of FFT Tree                {15, 10, 10, 8, 8, 1, 1, 20, 20, 40, 40, 5, 5, 3, 3}–(random)
Node energy consumption rate              6.0 mW
Communication energy consumption rate     1.5 mW
CCR set                                   {0.1, 0.2, 0.3, 0.5, 0.7, 0.9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
Threshold h                               {0.5, 2}–{0.02, 0.1, 0.2, 0.3, 0.4, 0.5, 0.8, 1, 5, 10, 20, 30, 100, 500}

6 Performance evaluation

Now we are positioned to evaluate the effectiveness of the proposed energy-aware duplication scheduling algorithms. To demonstrate the strength of our novel scheduling schemes, we conducted extensive experiments using both synthetic task graphs (see Fig. 5) and real-world applications such as Gaussian elimination and Fast Fourier Transform. Furthermore, we compare EADUS and TEBUS with two existing scheduling algorithms: the non-duplication-based scheduling heuristic (NDS) [36], and the task duplication-based scheduling algorithm (TDS) [11].

6.1 Simulation setup

In this subsection we present the experimental setup. Table 2 summarizes the configuration parameters of the simulated mobile cluster systems used in our experiments. On the right-hand side of each row in Table 2, parameters in the first part are fixed, while parameters in the second part are varied or randomly generated using uniform distributions. For instance, the threshold values of EADUS and TEBUS are fixed at 0.5 and 2, respectively, in one experiment, and are varied from 0.02 to 500 in another (see the last row of Table 2). Each mobile node in our simulation is modeled on the chip presented in [40], which operates from a 1.2 V power supply with a throughput of 2M samples/s. We chose this chip because it is designed for mobile computing, especially for narrowband mobile equipment. Wireless connections in our experiments are based on the Bluetooth-equipped sensor network of [42], which has a transmission speed of 10 kbit/s within a range of 10 m [41].

The performance metrics by which we evaluate system performance include:

• Schedule Length: The latest task completion time in the task set represented by a DAG.

• Communication Energy: Energy consumed by data transmissions among parallel tasks in the interconnection network.

• CPU Energy: Energy consumed by the computational nodes.

• Total Energy: Energy consumed by a set of parallel tasks.
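The four metrics can be illustrated with a short Python sketch. It assumes that energy is simply a consumption rate multiplied by busy time, that idle energy is ignored, and that the rates are the two values listed in Table 2. The `Slot` record, the function names, and the sample schedule are hypothetical and only serve to show how the quantities relate.

```python
from dataclasses import dataclass

CPU_RATE = 6.0   # node energy consumption rate (Table 2), per time unit
COMM_RATE = 1.5  # communication energy consumption rate (Table 2), per time unit


@dataclass
class Slot:
    """One task (or replica) placed on a node; a hypothetical record type."""
    task: int
    node: int
    start: float
    finish: float


def schedule_length(slots):
    """Latest completion time over all scheduled tasks and replicas."""
    return max(s.finish for s in slots)


def cpu_energy(slots):
    """Energy spent executing every task copy, replicas included."""
    return CPU_RATE * sum(s.finish - s.start for s in slots)


def communication_energy(message_times):
    """Energy spent on the messages that still cross node boundaries."""
    return COMM_RATE * sum(message_times)


def total_energy(slots, message_times):
    return cpu_energy(slots) + communication_energy(message_times)


# Hypothetical three-task schedule on two nodes with one inter-node message.
slots = [Slot(1, 1, 0, 3), Slot(2, 1, 3, 6), Slot(3, 2, 5, 9)]
messages = [2.0]
print(schedule_length(slots), cpu_energy(slots),
      communication_energy(messages), total_energy(slots, messages))
```

Under this accounting, a replica adds its execution time to the CPU energy but may remove a message from the communication energy, which is the trade-off the duplication criteria weigh.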

6.2 Performance evaluation of synthetic benchmarks

The goal of this experiment is to compare the proposed EADUS and TEBUS algorithms with the NDS and TDS algorithms with respect to energy conservation and schedule length under synthetic benchmarks (see Fig. 5). Although communication costs, task execution times, and Communication-to-Computation Ratios (CCR) are synthetically generated, we examined their impact on energy consumption and performance by controlling these important parameters.

6.2.1 Sensitivity to CCR

CCR is one of the most important workload parameters for performance evaluation. To study the performance impact of CCR, we evaluate energy consumption and schedule length as functions of CCR in Fig. 6. In this experiment, we varied only CCR; the threshold values of EADUS and TEBUS are set to 0.5 and 2, respectively. Figure 6a shows that TDS leads to the highest CPU energy consumption, whereas EADUS and NDS consume the least CPU energy. This is because TDS duplicates all possible predecessors of tasks, NDS never duplicates any task, and EADUS aggressively conserves energy. Thus, the energy conservation of EADUS is almost the same as that of NDS. Unlike EADUS, TEBUS strives to balance time and energy, and is therefore superior to TDS but worse than EADUS. Interestingly, Fig. 6b shows that the schedule lengths produced by all four algorithms are identical for the synthetic benchmarks, because the schedule lengths are determined by the critical paths in the synthetic benchmarks. Note that this observation applies only to extreme cases such as the tested benchmarks; it does not necessarily hold for other benchmarks or real-world applications.

Fig. 6 CCR sensitivity of synthetic benchmarks

Fig. 7 Threshold sensitivity for synthetic benchmarks, CCR = 0.5
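To make the role of CCR concrete, the sketch below computes it for a task graph and rescales the communication costs to reach a target value. It assumes the common definition of CCR as the average communication cost divided by the average computation cost. The function names and the communication costs are hypothetical; the execution times are those of the Example Tree in Table 2.

```python
def ccr(exec_times, comm_times):
    """Average communication cost divided by average computation cost."""
    mean_comp = sum(exec_times) / len(exec_times)
    mean_comm = sum(comm_times) / len(comm_times)
    return mean_comm / mean_comp


def scale_comm_to_ccr(exec_times, comm_times, target_ccr):
    """Rescale every communication cost so the graph reaches target_ccr."""
    factor = target_ccr / ccr(exec_times, comm_times)
    return [c * factor for c in comm_times]


exec_times = [3, 3, 4, 2, 1, 10, 20, 7, 5, 8]   # Example Tree, Table 2
comm_times = [2.0, 4.0, 6.0, 8.0]               # hypothetical edge costs

scaled = scale_comm_to_ccr(exec_times, comm_times, target_ccr=0.5)
print(ccr(exec_times, comm_times), ccr(exec_times, scaled))
```

Sweeping `target_ccr` over the CCR set in Table 2 while keeping the execution times fixed is one way to vary only CCR, as done in this experiment.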

6.2.2 Sensitivity to threshold

The performance of EADUS and TEBUS depends largely on the threshold values (see Step 9 in Fig. 2 and Step 11 in Fig. 3). In this set of experiments, we measured the energy consumption of the benchmarks while varying the threshold values. Figure 7 shows that applications scheduled by TEBUS consume more energy once the threshold reaches 5. In contrast, energy consumption for the benchmarks scheduled by EADUS begins to increase only when the threshold exceeds 20. These results clearly indicate that TEBUS is more sensitive to the threshold than EADUS.

While Fig. 7 plots energy consumption with CCR set to 0.5, Fig. 8 shows energy consumption with CCR fixed at 2.


Fig. 8 Threshold sensitivity for synthetic benchmarks, CCR = 2

Fig. 9 CCR sensitivity for Gaussian elimination

A second intriguing finding from this set of experiments is that with a larger CCR value, both EADUS and TEBUS become more sensitive to the threshold. We attribute this finding to the fact that larger CCR values offer more opportunities for EADUS and TEBUS to make appropriate trade-offs between performance and energy conservation. More importantly, the implication of this observation is that, to reduce energy consumption efficiently, lower threshold values must be chosen for parallel applications with higher CCRs.

6.3 Performance evaluation in real applications

To further compare the performance of the EADUS and TEBUS algorithms with NDS and TDS, we apply them to allocate the parallel tasks of two real-world applications, namely the Gaussian Elimination and Fast Fourier Transform applications. We focus on the energy consumption and schedule length of each application under various CCRs and thresholds.

6.3.1 Gaussian elimination

The experimental results for the energy consumption of the Gaussian Elimination application are shown in Fig. 9. Four observations are evident from this group of experiments. First, the energy consumption of Gaussian Elimination under all four scheduling schemes is very sensitive to CCR. Second, when CCR is greater than 6, energy consumption under NDS is consistently higher than that under the other three algorithms. However, NDS provides the greatest energy savings if CCR is less than 4, because the energy cost in the interconnection network is extremely low with a small CCR value. Third, with respect to energy conservation, EADUS performs as well as NDS with small CCRs and is superior to NDS when CCR is large. These results demonstrate that, regardless of the CCR value, EADUS is the most energy-efficient duplication scheduling algorithm among the four examined schemes. Last, generally speaking, the energy performance of TEBUS lies somewhere between those of EADUS and TDS.

Fig. 10 Threshold sensitivity for Gaussian elimination, CCR = 7

Fig. 11 Threshold sensitivity for Gaussian elimination, CCR = 2

Fig. 12 Threshold sensitivity for Gaussian elimination, CCR = 8

Figures 10–12 illustrate the sensitivity of Gaussian Elimination to the threshold. In each group of experiments, we fixed the CCR and varied the threshold. Comparing the results plotted in Figs. 10–12, we conclude that small CCRs give rise to high sensitivity of the four algorithms to the threshold. More specifically, when CCR is small (e.g., CCR = 0.7), energy consumption is very sensitive to the threshold. For example, for the TEBUS algorithm with CCR set to 0.7, the energy consumption values fall into three threshold ranges: [0, 1], [2, 200], and [200, 500] (see Fig. 10). However, when CCR is increased to 8, the energy consumption values separate into only two threshold ranges, [0, 1] and [2, 500] (see Fig. 12).

Fig. 13 CCR sensitivity for fast Fourier transform

Fig. 14 Threshold sensitivity for fast Fourier transform, CCR = 0.5
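The grouping of thresholds into ranges with equal energy consumption can be extracted mechanically from a threshold sweep. The sketch below is one way to do this; the function name is hypothetical and the energy values are invented placeholders used only to exercise the code, not measurements from our experiments.

```python
def plateau_ranges(thresholds, energies, tol=1e-9):
    """Group consecutive thresholds whose energy values are equal (within tol).

    Returns a list of (first_threshold, last_threshold, energy) tuples.
    Both input lists must be non-empty, equally long, and sorted by threshold.
    """
    ranges = []
    start = prev = thresholds[0]
    level = energies[0]
    for h, e in zip(thresholds[1:], energies[1:]):
        if abs(e - level) > tol:
            # Energy changed: close the current plateau and open a new one.
            ranges.append((start, prev, level))
            start, level = h, e
        prev = h
    ranges.append((start, prev, level))
    return ranges


# Placeholder sweep: three distinct energy levels across six thresholds.
thresholds = [0.02, 0.1, 1, 5, 10, 500]
energies = [100.0, 100.0, 100.0, 120.0, 120.0, 150.0]
for first, last, energy in plateau_ranges(thresholds, energies):
    print(f"h in [{first}, {last}] -> energy {energy}")
```

The number of plateaus returned for a given CCR is a simple indicator of how sensitive an algorithm is to the threshold at that CCR.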

6.3.2 Fast Fourier transform

Fast Fourier Transform (FFT) is a well-known algorithm, used here to implement a three-dimensional Fourier transform. First, we focus on the energy sensitivity of the Fast Fourier Transform application to CCR. Figure 13 plots the CPU and total energy consumption of Fast Fourier Transform under an array of CCR values from 0.1 to 10.

Figure 13 shows that the total energy consumption of Fast Fourier Transform becomes more sensitive to CCR when CCR is less than 1. Comparing the energy consumption results plotted in Figs. 9 and 13, we observe that Fast Fourier Transform is less sensitive to CCR than Gaussian Elimination. The implication of this observation is that Gaussian Elimination can benefit more from the energy savings of EADUS and TEBUS than Fast Fourier Transform can.

Likewise, we conclude from Figs. 14–16 that Fast Fourier Transform is less sensitive to the threshold than Gaussian Elimination, which can be explained by the different communication patterns of the two applications. A second observation from Figs. 14–16 is that small CCRs lead to high sensitivity of energy consumption to the threshold. This result is consistent with that shown in Figs. 10–12. Again, Figs. 14–16 illustrate that EADUS and TEBUS outperform the existing TDS algorithm in terms of energy conservation.

6.4 Overall performance comparisons

Figures 17 and 18 summarize the simulation results used to evaluate the overall performance of the four scheduling algorithms in terms of energy consumption and schedule length. Figure 17 confirms that EADUS and TEBUS are conducive to reducing the energy consumption of parallel applications running on mobile clusters. For Gaussian Elimination, EADUS reduces the total energy by 16.08% and 15.66% when CCR is 0.2 and 8, respectively. Compared with TDS and NDS, TEBUS conserves the total energy by 8.10% and 11.48%, respectively. For Fast Fourier Transform, EADUS improves the energy performance by 13.50% and 4.66% with CCR set to 0.2 and 8, respectively. The TEBUS algorithm achieves energy performance improvements over TDS and NDS of 9.87% and 4.23% when CCR is 0.2 and 8, respectively.

Fig. 15 Threshold sensitivity for fast Fourier transform, CCR = 3

Fig. 16 Threshold sensitivity for fast Fourier transform, CCR = 8

Fig. 17 Energy evaluation for Gaussian elimination and fast Fourier transform

Fig. 18 Schedule length evaluation for Gaussian elimination and fast Fourier transform

Figure 18 shows that EADUS and TEBUS substantially reduce the energy consumption of parallel applications without adversely affecting their performance. For example, on average the schedule lengths of Gaussian Elimination produced by EADUS and TEBUS are merely 5.7% and 2.2% longer, respectively, than that generated by TDS. Similarly, on average the schedule lengths of Fast Fourier Transform yielded by EADUS and TEBUS are only 5.1% and 3.4% longer, respectively, than that of TDS. These results suggest that it is worth trading a marginal degradation in schedule length for a significant reduction in energy dissipation in mobile cluster computing systems.
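One simple way to read this trade-off is the energy saved per percentage point of added schedule length. The sketch below applies this to the average figures reported for Gaussian Elimination (16.08% and 8.1% energy savings against 5.7% and 2.2% schedule-length increases for EADUS and TEBUS, respectively). The ratio and the function name are illustrative assumptions, not metrics defined in this paper.

```python
def saving_per_slowdown(energy_saving_pct, length_increase_pct):
    """Percentage points of energy saved per percentage point of added schedule length."""
    if length_increase_pct <= 0:
        raise ValueError("schedule-length increase must be positive")
    return energy_saving_pct / length_increase_pct


# Average figures reported for Gaussian Elimination.
eadus = saving_per_slowdown(16.08, 5.7)
tebus = saving_per_slowdown(8.1, 2.2)
print(round(eadus, 2), round(tebus, 2))  # 2.82 3.68
```

By this measure EADUS saves more energy in absolute terms, while TEBUS gives up less schedule length for each point of energy it saves, in line with their respective design goals.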

7 Conclusions

In this paper, we addressed the issue of allocating tasks of parallel applications running on mobile clusters with the objective of shortening schedule lengths while conserving energy. Specifically, we proposed two duplication-based scheduling algorithms, namely the Energy-Aware Duplication Scheduling algorithm (EADUS) and the Time-Energy Balanced Duplication Scheduling algorithm (TEBUS). EADUS and TEBUS are designed and implemented to provide energy savings in mobile clusters by duplicating tasks on more than one computational node. While EADUS aggressively pursues the greatest energy savings by using task replicas to eliminate energy-consuming messages, TEBUS aims to make trade-offs between energy conservation and performance.

To facilitate the presentation of EADUS and TEBUS, we built mathematical models to describe a mobile cluster system framework, parallel applications with precedence constraints, and energy consumption. We conducted extensive experiments based on both synthetic benchmarks and real-world applications running on a simulated mobile cluster. Experimental results demonstrate the effectiveness and practicality of the proposed duplication-based scheduling strategies. The empirical results for the Gaussian Elimination and Fast Fourier Transform applications show that EADUS and TEBUS significantly improve performance in terms of energy dissipation and schedule length over two existing allocation schemes, NDS [36] and TDS [11]. EADUS and TEBUS are capable of trading a marginal degradation in schedule length for a significant reduction in energy dissipation in mobile cluster computing systems. For example, EADUS conserves energy for the Gaussian Elimination application by an average of 16.08% with only a 5.7% increase in schedule length. Likewise, TEBUS reduces energy consumption by an average of 8.1% with merely a 2.2% increase in schedule length.

Although simulation results for homogeneous mobile clusters have shown the efficiency of the EADUS and TEBUS algorithms, our research still assumes a relatively idealized environment. A real mobile cluster is more likely to be a heterogeneous system, in which computational nodes may have different processing capabilities and transmission latencies may differ because of varying wireless signal strengths. Therefore, our future work will move towards designing scheduling algorithms for heterogeneous mobile clusters.

Acknowledgements The work reported in this paper was supported by the US National Science Foundation under Grant No. CCF-0742187, Auburn University under a startup grant, the Intel Corporation under Grant No. 2005-04-070, and the Altera Corporation under an equipment grant.


References

1. Elnozahy, E., Kistler, M., Rajamony, R.: Energy-efficient server clusters. In: Proceedings of the Workshop on Power-Aware Computing Systems, February 2002

2. Alghamdi, M., Xie, T., Qin, X.: PARM: A power-aware message scheduling algorithm for real-time wireless networks. In: Proc. ACM Workshop Wireless Multimedia Networking and Performance Modeling, Montreal, Oct. 2005

3. Aydin, H., Melhem, R., Mossé, D., Alvarez, P.M.: Determining optimal processor speeds for periodic real-time tasks with different power characteristics. In: Proc. EuroMicro Conf. Real-Time Systems, Delft, Netherlands, June 2001

4. Bansal, S., Kumar, P., Singh, K.: An improved duplication strategy for scheduling precedence constrained graphs in multiprocessor systems. IEEE Trans. Parallel Distributed Syst. 14(6), 533–544 (2003)

5. Benini, L., De Micheli, G.: Dynamic Power Management: Design Techniques and CAD Tools. Kluwer (1998)

6. Benini, L., Bogliolo, A., Micheli, G.D.: A survey of design techniques for system-level dynamic power management. IEEE Trans. Very Large Scale Integr. Syst. 8(3), 299–316 (2000)

7. Chandrakasan, A.R., Brodersen, R.W.: Low Power Digital CMOS Design. Kluwer, Norwell (1995)

8. Zheng, H., Buyya, R., Bhattacharya, S.: Mobile cluster computing and timeliness issues. Inform. Int. J. Comput. Inform. 23(1) (1999)

9. Basit, A., Chang, C.-C.: Mobile cluster computing using IPV6. In: Linux 2002 Symposium, Ottawa, Canada, June 2002

10. Dally, W., Carvey, P., Dennison, L.: The Avici terabit switch/router. In: Proc. Hot Interconnects 6, pp. 41–50, Aug. 1998

11. Darbha, S., Agrawal, D.P.: Optimal scheduling algorithm for distributed-memory machines. IEEE Trans. Parallel Distributed Syst. 9(1), 87–95 (1998)

12. Darbha, S., Agrawal, D.P.: A task duplication based scalable scheduling algorithm for distributed memory systems. J. Parallel Distributed Comput. 46(1), 15–27 (1997)

13. Douglis, F., Krishnan, P., Bershad, B.: Adaptive disk spin-down policies for mobile computers. In: USENIX Symp. Mobile and Location-Independent Computing, pp. 121–137 (1995)

14. Maluk Mohamed, M.A., Devanathan, V.R., Janaki Ram, D.: A model for mobile cluster computing: design and evaluation. In: International Conference on Computer Science and its Applications (2003)

15. Elnozahy, E.N.M., Kistler, M., Rajamony, R.: Energy-efficient server clusters. In: Proc. Int’l Workshop Power-Aware Computer Systems, Feb. 2002

16. Golding, R., Bosh, P., Wilkes, J.: Idleness is not sloth. In: HP Lab Technical Report, HPL-96-140 (1996)

17. Graham, R.L., Lawler, L.E., Lenstra, J.K., Kan, A.H.: Optimization and approximation in deterministic sequencing and scheduling: a survey. Ann. Discrete Math., 287–326 (1979)

18. Hong, I., Kirovski, D., Qu, G., Potkonjak, M., Srivastava, M.: Power optimization of variable voltage core-based systems. In: Proc. Design Automation Conf. (1998)

19. Hong, I., Potkonjak, M., Srivastava, M.: Synthesis techniques for low-power hard real-time systems on variable voltage processors. In: Proc. IEEE Real-Time System Symp., Dec. 1998

20. Hong, I., Potkonjak, M., Srivastava, M.: On-line scheduling of hard real-time tasks on variable voltage processor. In: Proc. Computer Aided Design, pp. 653–656 (1998)

21. Maluk, M.A., Vijay Srinivas, A., Janakiram, D.: Moset: an anonymous remote mobile cluster computing paradigm. J. Parallel Distributed Comput. (JPDC) 65(10), 1212–1222 (2005)

22. Kuskin, J. et al.: The Stanford FLASH multiprocessor. In: Proc. 21st Int’l Symp. Computer Architecture (1994)

23. Lorch, J., Smith, A.: Software strategies for portable computer energy management. IEEE Personal Commun. 5, 60–73 (1998)

24. Lorch, J.R., Smith, A.J.: Improving dynamic voltage scaling algorithms with PACE. In: Proc. ACM SIGMETRICS Conf., Cambridge, MA, June 2001

25. Mellanox Technologies Inc., Mellanox performance, price, power, volume metric (PPPV), http://www.mellanox.co/products/shared/PPPV.pdf (2004)

26. Pande, S.S., Agrawal, D.P., Mauney, J.: A scalable scheduling method for functional parallelism on distributed memory multiprocessors. IEEE Trans. Parallel Distributed Syst. 6(4), 388–399 (1995)

27. Qin, X., Jiang, H.: A dynamic and reliability-driven scheduling algorithm for parallel real-time jobs on heterogeneous clusters. J. Parallel Distributed Comput. 65(8), 885–900 (2005)

28. Rabaey, J., Pedram, M. (eds.): Low Power Design Methodologies. Kluwer, Norwell (1998)

29. Raghunathan, A., Jha, N.K., Dey, S.: High-Level Power Analysis and Optimization. Kluwer, Norwell (1998)

30. Ranaweera, S., Agrawal, D.P.: A task duplication based scheduling algorithm for heterogeneous systems. In: Proc. Parallel and Distributed Processing Symp., pp. 445–450 (2000)

31. Rewini, H.E., Lewis, T.G., Ali, H.H.: Task Scheduling in Parallel and Distributed Systems. Prentice Hall, New Jersey (1994)

32. Shin, Y., Choi, K.: Power conscious fixed priority scheduling for hard real-time systems. In: Proc. Design Automation Conf. (1999)

33. Sih, G.C., Lee, E.A.: A compile-time scheduling heuristic for interconnection-constrained heterogeneous processor architectures. IEEE Trans. Parallel Distributed Syst. 4(2), 175–187 (1993)

34. Srinivasan, S., Jha, N.K.: Safety and reliability driven task allocation in distributed systems. IEEE Trans. Parallel Distributed Syst. 10(3), 238–251 (1999)

35. Srivastava, M., Chandrakasan, A., Brodersen, R.: Predictive system shutdown and other architectural techniques for energy efficient programmable computation. IEEE Trans. VLSI Syst. 4(1), 42–55 (1996)

36. Wu, M.Y., Gajski, D.D.: Hypertool: a performance aid for message-passing systems. IEEE Trans. Parallel Distributed Syst. 1(3), 330–343 (1990)

37. Xie, T., Qin, X., Nijim, M.: Solving energy-latency dilemma: task allocation for parallel applications in heterogeneous embedded systems. In: Proc. 35th Int’l Conf. Parallel Processing, Columbus, Ohio, Aug. 2006

38. Yao, F., Demers, A., Shenker, S.: A scheduling model for reduced CPU energy. In: Proc. IEEE Annual Foundations of Computer Science, pp. 374–382 (1995)

39. Yu, Y., Prasanna, V.K.: Energy-balanced task allocation for collaborative processing in wireless sensor networks. Mobile Netw. Appl. 10(1–2), 115–131 (2005)

40. Andreani, P., Sundstrom, L.: Chip for wideband digital predistortion RF power amplifier linearization. Electron. Lett. 33(11), 925 (1997)

41. http://www.csee.umbc.edu/~younis/Sensor_Networks/Class_Notes/Lecture_2.pdf

42. Lundberg, M., Eliasson, J., Allan, J., Johansson, J., Lindgren, P.: Power characterization of a bluetooth-equipped sensor. In: Workshop on Real-World Wireless Sensor Networks, June 2005


Ziliang Zong received his B.S. and M.S. in Computer Science from Shandong University of China in 2002 and 2005, respectively. He is currently a PhD student in the Department of Computer Science and Software Engineering at Auburn University, United States. From Oct. 2003 to Oct. 2004, he studied as a research assistant in the Artificial Intelligence Lab of Toyama University, Japan. He is an IEEE student member and a recipient of the 2006 IEEE Technical Committee on Scalable Computing (TCSC) Student Travel Award. His research interests include energy-efficient scheduling, high performance computing, distributed storage systems and embedded systems.

Mais Nijim is an Assistant Professor in the School of Computing at the University of Southern Mississippi. She received her M.S. in Computer Science from New Mexico State University in 2004. She received her PhD in Computer Science from New Mexico Institute of Mining and Technology. Her research interests include parallel and distributed systems, real-time systems, storage systems, security, and performance evaluation.

Adam Manzanares received his B.S. in Computer Science in 2002 from the New Mexico Institute of Mining and Technology, United States. He is currently a PhD student in the Department of Computer Science and Software Engineering at Auburn University. During the summers of 2002–2007 he worked as a student intern at the Los Alamos National Laboratory. His research interests include energy-efficient computing, modeling and simulation, and high performance computing.

Xiao Qin (S’99-M’04) received the BS and MS degrees in Computer Science from Huazhong University of Science and Technology in 1992 and 1999, respectively. He received the PhD degree in Computer Science from the University of Nebraska-Lincoln in 2004. Currently, he is an Assistant Professor in the Department of Computer Science and Software Engineering at Auburn University. Prior to joining Auburn University in 2007, he had been an assistant professor with New Mexico Institute of Mining and Technology (New Mexico Tech) for three years. In 2007, he received an NSF CPA Award and an NSF CSR Award. His research interests include parallel and distributed systems, storage systems, fault tolerance, real-time systems, and performance evaluation. His research is supported by the U.S. National Science Foundation, Auburn University, and Intel Corporation. He served as a subject area editor of IEEE Distributed Systems Online (2000–2001). He has been on the program committees of various international conferences, including IEEE Cluster, IEEE IPCCC, and ICPP.