
IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL.XX, NO.XX, 201X 1

An Application Placement Technique for Concurrent IoT Applications in Edge and Fog Computing Environments

Mohammad Goudarzi, Member, IEEE, Huaming Wu, Member, IEEE, Marimuthu Palaniswami, Fellow, IEEE, and Rajkumar Buyya, Fellow, IEEE

Abstract—Fog/edge computing has emerged as a novel computing paradigm that harnesses resources in the proximity of Internet of Things (IoT) devices so that, alongside cloud servers, services can be provided in a timely manner. However, due to the ever-increasing growth of IoT devices running resource-hungry applications, fog/edge servers with limited resources cannot efficiently satisfy the requirements of IoT applications. Therefore, application placement in the fog/edge computing environment, in which several distributed fog/edge servers and centralized cloud servers are available, is a challenging issue. In this article, we propose a weighted cost model to minimize the execution time and energy consumption of IoT applications in a computing environment with multiple IoT devices, multiple fog/edge servers, and cloud servers. Besides, a new application placement technique based on the Memetic Algorithm is proposed to make batch application placement decisions for concurrent IoT applications. Due to the heterogeneity of IoT applications, we also propose a lightweight pre-scheduling algorithm to maximize the number of parallel tasks for concurrent execution. The performance results demonstrate that our technique significantly improves the weighted cost of IoT applications, by up to 65%, in comparison to its counterparts.

Index Terms—Fog Computing, Edge Computing, Internet of Things (IoT), Application Placement, Optimization, Application Partitioning.


1 INTRODUCTION

DUE to recent advances in hardware and software technologies, the number of Internet of Things (IoT) devices (e.g., smartphones, smart cameras, smart vehicles) has significantly increased, so that IoT devices and their applications have become pervasive in modern digital society. However, the IoT paradigm, in which heterogeneous devices can connect and communicate together, generates a huge amount of data that needs processing and storage. According to Cisco, it is anticipated that by 2030, approximately 500 billion IoT devices will be connected to the Internet [1]. In addition, the number of real-time and latency-sensitive applications, such as smart transportation, smart health-care, augmented reality, and smart buildings, requiring large amounts of computing and network resources has increased significantly [2]. Moreover, performing such resource-hungry applications requires a considerable amount of energy to be consumed, which significantly affects the performance of IoT devices, such as mobile devices and sensors, due to their limited battery lifetime.

As a centralized solution, the cloud computing paradigm is one of the main enablers of the IoT, in which unlimited and elastic resources are available to execute IoT's

• M. Goudarzi and R. Buyya are with the Cloud Computing and Distributed Systems (CLOUDS) Laboratory, School of Computing and Information Systems, The University of Melbourne, Australia.

• M. Palaniswami is with the Department of Electrical and Electronic Engineering, The University of Melbourne, Australia.

• H. Wu is with the Center for Applied Mathematics, Tianjin University, Tianjin 300072, China.
E-mail: [email protected], [email protected], [email protected], [email protected].

computation-intensive applications. The execution time of IoT applications and IoT devices' energy consumption can be reduced by offloading (i.e., application/task placement) all/some of their computation-intensive tasks to different cloud servers [3]. However, IoT devices suffer from low bandwidth and high latency when communicating with cloud servers. This is mainly because IoT devices are connected to cloud servers via a Wide Area Network (WAN), which provides low bandwidth, and because the long distance between cloud servers and IoT devices leads to high latency [4]. Besides, the huge amount of incoming data to the cloud servers and the resource-hungry nature of emerging IoT applications, which require more computing and storage resources, lead to congestion in the cloud servers. Hence, not only can the cloud servers not efficiently satisfy the requirements of emerging resource-hungry IoT applications, but they may also incur more energy consumption for IoT devices due to their low bandwidth.

To reduce the huge amount of incoming data to the cloud servers and to alleviate the high-latency and low-bandwidth problem, a new computing paradigm, called Fog Computing, has emerged. It provides an intermediate computing layer between cloud servers and IoT devices in which several heterogeneous fog servers are distributed. These fog servers have fewer resources (e.g., CPU, RAM) in comparison to cloud servers, while they provide higher bandwidth with less latency for IoT devices, since they can be accessed via a Local Area Network (LAN) [4], [5]. In our view, edge computing harnesses only edge resources, while fog computing harnesses both edge and cloud resources (although some works use these terms interchangeably).


Considering the potential of fog computing, IoT devices can perform their resource-hungry and latency-sensitive applications with improved Quality of Service (QoS) by offloading all/some of their applications to fog or cloud servers based on their QoS requirements [6], [7]. This also leads to less congestion in cloud servers, since distributed fog servers can ease the burden of cloud servers for the processing and storage of incoming data from IoT devices. However, considering the large number of heterogeneous IoT devices whose applications require various levels of QoS, it is challenging to decide whether the execution of such applications on remote servers (whether fog or cloud) is beneficial or not. Besides, the ever-increasing number of IoT devices causes more requests to be forwarded to the fog servers, which may incur congestion due to their limited resources. This, in turn, may result in more execution time and energy consumption for IoT devices.

To address the aforementioned issues, we propose an efficient application placement technique to jointly optimize the execution time and energy consumption of IoT devices in an environment with multiple heterogeneous cloud and fog servers. The main contributions of this paper are as follows.

• We propose a weighted cost model for application placement of multiple IoT devices to minimize their execution time and energy consumption.

• We put forward a dynamic and lightweight pre-scheduling technique to maximize the number of parallel tasks for execution.

• Considering the NP-Complete nature of application placement in fog computing environments, we propose an optimized version of the Memetic Algorithm (MA) to achieve a well-suited solution in a reasonable decision time.

• We embed a fast failure recovery method in our technique to assign failed tasks to appropriate servers in a timely manner.

The rest of the paper is organized as follows. Related work on application placement techniques in fog computing environments is discussed in Section 2. The system model and problem formulation are presented in Section 3. Section 4 presents our proposed application placement technique. We evaluate the performance of our technique and compare it with state-of-the-art techniques in Section 5. Finally, Section 6 concludes the paper and draws future work.

2 RELATED WORK

In this section, related works on application placement techniques in fog computing environments are discussed, where cloud and fog servers work collaboratively to satisfy IoT application requirements. They are divided into independent and dependent categories based on the dependency mode of their IoT applications' constituent parts (e.g., tasks). Each IoT application can be modeled as a set of independent or dependent tasks. The dependent category refers to applications consisting of several dependent tasks, so that each new task runs only when its predecessor tasks have completely finished. In the independent category, the applications' tasks do not have such execution constraints.

2.1 Independent Tasks

Huang et al. [8] proposed a task placement algorithm where multiple mobile devices offload their independent tasks to multiple edge servers and one cloud server. In this technique, each mobile device decides whether each task should be offloaded or not, and in the case of offloading, which edge or cloud server is suited for the execution of each task. An energy-aware cloudlet selection technique was proposed in [9] to meet the latency requirement of incoming tasks from one IoT device. Haber et al. [10] proposed an offloading algorithm deployed in the cloud layer, aiming at minimizing the energy consumption of several mobile devices while satisfying the latency requirements of mobile applications. This is obtained by optimizing the mobile devices' transmission power and the assigned server computation. An offloading algorithm based on Lyapunov optimization was proposed in [11] to reduce the execution time of IoT applications by offloading each task to either a single fog server or one cloud server. Mahmud et al. [12] proposed a Quality of Experience (QoE)-aware application placement technique in which independent tasks of IoT devices are placed on fog or cloud servers. Chen et al. [13] considered a multi-user environment with a single computing access point and a remote cloud server, in which the independent tasks of mobile users can be processed locally, at the computing access point, or at the cloud server. Hong et al. [14] proposed a game-theoretic approach for computation offloading and a multi-hop cooperative-messaging mechanism for IoT devices. It considers that each IoT device decides either to forward its single task to the fog or cloud server if it has access to wireless networks, or to collaborate with other IoT devices that have access to wireless networks for forwarding its task.

2.2 Dependent Tasks

In the dependent category, related works modeled their applications as Directed Acyclic Graphs (DAGs), in which each vertex represents one task of an IoT application, and each edge shows the data flow (i.e., dependency) between two tasks.

Neto et al. [15] and Wu et al. [16] proposed partitioning algorithms for a single mobile device to offload its computation-intensive tasks to a single edge or cloud server. The placement engines of these proposals are located on the mobile device and aim at finding a group of tasks for offloading, by which the execution time of the mobile application and the energy consumption of the mobile device are reduced. The main goal of [17], [18] is to minimize the execution time of IoT applications in an environment in which multiple fog servers and a cloud server are accessible for application placement. Lin et al. [17] considered only one mobile device in their system model for offloading, while Stavrinides et al. [18] attempted to place tasks of multiple users requiring low communication overhead at the cloud server and those tasks that have more communication overhead at the edge layer. Mahmud et al. [19] proposed a latency-aware application placement policy in an environment with multiple fog servers and a single cloud server. Although the above-mentioned techniques consider task placement as their main objective, Bi et al. [20] proposed a solution for the joint optimization of service caching placement and computation offloading.


Table 1: The Qualitative Comparison of the Current Literature
(✓ = considered, × = not considered, – = not recoverable from the source; columns group IoT application properties, architectural properties of the IoT device, edge layer, and cloud layer, and placement engine properties)

| Technique | Dependency Mode | Task Number | Heterogeneity | IoT Device Number | Request Number | Fog Number | Fog Cooperation | Fog Heterogeneity | Cloud Number | Cloud Cooperation | Cloud Heterogeneity | Engine Position | Batch Placement | Time | Energy | Weighted |
|-----------|-----------------|-------------|---------------|-------------------|----------------|------------|-----------------|-------------------|--------------|-------------------|---------------------|-----------------|-----------------|------|--------|----------|
| [8]  | Independent | Multiple | ✓ | Multiple | Different | Multiple | × | × | Single | × | × | IoT device  | No  | ✓ | ✓ | ✓ |
| [9]  | Independent | Single   | ✓ | Single   | Same      | Multiple | × | ✓ | Single | × | × | Edge layer  | No  | ✓ | ✓ | × |
| [10] | Independent | Single   | ✓ | Multiple | Same      | Multiple | × | × | Single | × | × | Cloud layer | No  | ✓ | ✓ | ✓ |
| [11] | Independent | Single   | ✓ | Multiple | –         | Single   | × | × | Single | × | × | Edge layer  | No  | ✓ | × | × |
| [12] | Independent | Single   | ✓ | Multiple | Same      | Multiple | × | ✓ | Single | × | × | Edge layer  | No  | ✓ | × | × |
| [13] | Independent | Multiple | ✓ | Multiple | Same      | Single   | × | × | Single | × | × | Edge layer  | No  | ✓ | ✓ | ✓ |
| [14] | Independent | Single   | ✓ | Multiple | Same      | Multiple | ✓ | ✓ | Single | × | × | Edge layer  | No  | ✓ | ✓ | ✓ |
| [15] | Dependent   | Multiple | ✓ | Single   | Same      | Single   | × | – | Single | × | × | IoT device  | No  | ✓ | ✓ | ✓ |
| [16] | Dependent   | Multiple | ✓ | Single   | Same      | Single   | × | – | Single | × | × | IoT device  | No  | ✓ | ✓ | ✓ |
| [18] | Dependent   | Multiple | ✓ | Single   | Same      | Multiple | ✓ | ✓ | Single | × | × | Edge layer  | No  | ✓ | × | × |
| [20] | Dependent   | Multiple | ✓ | Single   | Same      | Single   | × | × | –      | × | × | –           | No  | ✓ | ✓ | ✓ |
| [17] | Dependent   | Multiple | ✓ | Multiple | –         | Multiple | × | × | Single | × | × | Edge layer  | No  | ✓ | × | × |
| [19] | Dependent   | Multiple | ✓ | Multiple | Different | Multiple | ✓ | ✓ | Single | × | × | Edge layer  | No  | ✓ | × | × |
| [4]  | Dependent   | Multiple | × | Multiple | Different | Single   | × | × | Single | × | × | Edge layer  | Yes | ✓ | ✓ | ✓ |
| Our Technique | Dependent | Multiple | ✓ | Multiple | Different | Multiple | ✓ | ✓ | Multiple | ✓ | ✓ | Edge layer | Yes | ✓ | ✓ | ✓ |

The placement engines proposed in the aforementioned works made application placement decisions for different users at different time slots, or only considered a fraction of each user's tasks at each time slot. However, Xu et al. [4] proposed a batch task placement technique based on a Genetic Algorithm (GA), in which the mobile applications of multiple users are forwarded to a single central edge server for application placement decisions.

2.3 A Qualitative Comparison

Table 1 identifies and compares key elements of related works with ours in terms of their IoT application, architectural, and placement engine properties. In the IoT application section, the dependency mode of each proposal is studied, which can be either independent or dependent. Moreover, we study how each proposal models IoT applications in terms of the number of tasks and heterogeneity. The latter demonstrates whether IoT applications consist of homogeneous or heterogeneous tasks in terms of their computation and data flow. In the architectural section, the attributes of IoT devices, fog/edge servers, and cloud servers are studied. For IoT devices, the overall number of devices and their type of requests are identified. A different request number shows that each device has a different number of requests compared to other IoT devices. In the fog and cloud layers, the number of fog and cloud servers, the cooperation between different fog/cloud servers, and the heterogeneity in terms of servers' specifications are identified, respectively. The position of the placement engine, the capability of batch placement, and the decision parameters are also studied in the placement engine section.

Considering the application placement techniques proposed for fog computing environments, this work proposes a batch application placement technique for an environment consisting of multiple devices in the IoT layer, multiple fog/edge servers in the edge layer, and multiple cloud servers in the cloud layer. To the best of our knowledge, this is the only work that considers the aforementioned fog computing environment and proposes a weighted cost model to jointly minimize the execution time of IoT applications and the energy consumption of IoT devices. Our weighted cost model not only can be applied to our general fog computing environment, but it can also be used for simpler fog computing environments with a single IoT device, a single fog server, a single cloud server, or any combination thereof. In addition, it is important to note that the IoT applications are considered as heterogeneous DAGs (i.e., workflows) with different numbers of tasks and data flows. Hence, we propose a lightweight pre-scheduling algorithm to organize the incoming tasks of different DAGs so that the number of tasks for parallel execution becomes maximized. Then, an optimized version of the Memetic Algorithm (MA) is proposed to perform application placement in a timely manner.
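The concrete pre-scheduling algorithm is specified later in the paper; as a rough illustration of its stated goal only, the following sketch (our own, not the authors' algorithm) groups the tasks of several concurrent DAGs by topological depth, so that all tasks in one group are mutually independent and can execute in parallel:

```python
from collections import defaultdict

def levelize(edges, n_tasks):
    """Topological depth of each task in one DAG: depth 0 = no predecessors."""
    pred = defaultdict(set)
    for i, j in edges:
        pred[j].add(i)
    depth = {}
    while len(depth) < n_tasks:
        for t in range(n_tasks):
            if t not in depth and pred[t] <= depth.keys():
                depth[t] = 1 + max((depth[p] for p in pred[t]), default=-1)
    return depth

def prescheduled_batches(dags):
    """Merge tasks of concurrent DAGs level by level: tasks sharing a
    topological depth have no mutual dependencies, maximizing parallelism.
    Each DAG is given as (edge list, task count); output items are
    (dag index, task index) pairs."""
    batches = defaultdict(list)
    for n, (edges, n_tasks) in enumerate(dags):
        for task, d in levelize(edges, n_tasks).items():
            batches[d].append((n, task))
    return [batches[d] for d in sorted(batches)]

# Two small workflows: a chain 0 -> 1 and a diamond 0 -> {1, 2} -> 3
print(prescheduled_batches([([(0, 1)], 2),
                            ([(0, 1), (0, 2), (1, 3), (2, 3)], 4)]))
```

With these inputs, the first batch holds both root tasks, the second holds the three depth-1 tasks, and the last holds the diamond's sink.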

3 SYSTEM MODEL AND PROBLEM FORMULATION

We consider a framework consisting of multiple IoT devices, multiple fog (i.e., edge) servers, multiple cloud servers, and brokers, in which IoT devices can locally execute their workflows (i.e., DAGs) or completely/partially place them on cloud servers and/or fog servers for execution. Figure 1 represents an overview of our system model.

In this system framework, each broker supports up to N IoT devices, which are distributed in its proximity. The broker (which can be a fog server) receives workflows from different IoT devices and periodically makes task placement decisions based on the requirements of the IoT applications and the current status of the network. According to the result of the application placement decisions, each IoT device learns to which server each constituent part of its workflow should be sent, or whether it should be executed locally on the IoT device.

3.1 Application Workflow

Each IoT application can be partitioned based on different levels of granularity, such as class and task, just to mention a

[Figure 1: An overview of our system model — IoT devices in the IoT layer connect, via brokers, to distributed fog/edge servers in the edge layer and to cloud servers (Cloud #1, #2, #3) in the cloud layer; the edge and cloud layers together form the fog computing environment.]

few [21]. Without loss of generality, we represent the application running on the nth IoT device as a DAG (i.e., workflow) of its tasks $G_n = (V_n, E_n), \forall n \in \{1, 2, \cdots, N\}$, where $V_n = \{v_{n,i} \mid 1 \leq i \leq |V_n|\}$ denotes the set of tasks running on the nth IoT device, and $E_n = \{e_{n,i,j} \mid v_{n,i}, v_{n,j} \in V_n, i \neq j\}$ illustrates the set of data flows between tasks. As an illustration, $e_{n,i,j}$ represents the dependency between $v_{n,i}$ and $v_{n,j}$ of the application running on the nth IoT device.

Considering the number of instructions of each task $v_{n,i}$, its corresponding weight is represented as $v^w_{n,i}$. Besides, the associated weight of each edge, $e^w_{n,i,j}$, shows the amount of data that the task $v_{n,j}$ receives as input from $v_{n,i}$. Since IoT applications are modeled as DAGs, each task $v_{n,i}$ cannot be executed unless all its predecessor tasks, denoted as $\mathcal{P}(v_{n,i})$, finish their execution.
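To make the workflow notation concrete, it can be sketched as a small Python data structure (an illustrative rendering of ours, not the authors' implementation): each task carries an instruction weight $v^w_{n,i}$, each edge a data weight $e^w_{n,i,j}$, and the predecessor sets $\mathcal{P}(v_{n,i})$ are derived from the edge set.

```python
from collections import defaultdict

class Workflow:
    """A DAG G_n = (V_n, E_n): tasks with instruction weights v^w and
    weighted edges e_{n,i,j} carrying the data passed from task i to task j."""

    def __init__(self, task_weights, edge_weights):
        self.vw = dict(task_weights)    # v^w_{n,i}: instructions per task
        self.ew = dict(edge_weights)    # e^w_{n,i,j}: data on edge (i, j)
        self.pred = defaultdict(set)    # P(v_{n,i}): predecessor tasks
        for (i, j) in self.ew:
            self.pred[j].add(i)

    def ready(self, finished):
        """Tasks whose predecessors have all finished (executable now)."""
        return [i for i in self.vw
                if i not in finished and self.pred[i] <= set(finished)]

# A 4-task diamond workflow: 0 -> {1, 2} -> 3 (illustrative weights)
wf = Workflow(task_weights={0: 5e6, 1: 8e6, 2: 3e6, 3: 4e6},
              edge_weights={(0, 1): 2e3, (0, 2): 1e3, (1, 3): 4e3, (2, 3): 2e3})
print(wf.ready(finished=[0]))   # tasks 1 and 2 can now run in parallel
```

The `ready` query encodes the DAG constraint above: a task becomes executable only once $\mathcal{P}(v_{n,i})$ is a subset of the finished tasks.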

3.2 Problem Formulation

We formulate the task placement problem as an optimization problem aiming at minimizing the overall execution time of IoT applications and the energy consumption of IoT devices.

Since different servers are available to run each task $v_{n,i}$, the set of all available servers is represented as $S$, with $|S| = M$. $S_{y,z}$ represents one server, in which $y$ represents the type of server (the IoT device ($y = 0$), fog servers ($y = 1$), and cloud servers ($y = 2$)) and $z$ denotes that server's index. The offloading configuration of the workflow belonging to the nth IoT device is represented as $X_n$, and $x_{n,i}$ denotes the offloading configuration of each task $v_{n,i}$, which is obtained from the following criteria:

$$x_{n,i} = \begin{cases} 0, & s_{y,z} = s_{0,n} \\ 1, & s_{y,z} \in \{s_{1,1}, s_{1,2}, \cdots, s_{1,f}\}, \ |z| = f \\ 2, & s_{y,z} \in \{s_{2,1}, s_{2,2}, \cdots, s_{2,c}\}, \ |z| = c \end{cases} \tag{1}$$

where $x_{n,i} = 0$ depicts that the ith task is assigned to the nth IoT device ($s_{0,n}$) for local execution, and $x_{n,i} = 1$ and $x_{n,i} = 2$ denote that the ith task is assigned to one of the fog servers or cloud servers, respectively, for remote execution. Moreover, $f$ and $c$ show the number of available fog servers and cloud servers, respectively.

3.2.1 Weighted cost model

The goal of the task placement technique is to find the best possible configuration of available servers for each IoT application so that the weighted cost of execution for each IoT device becomes minimized, as depicted in the following:

$$\min_{\psi_\gamma, \psi_\theta \in [0,1]} \Psi(X_n), \quad \forall n \in \{1, 2, \cdots, N\} \tag{2}$$

where

$$\Psi(X_n) = \psi_\gamma \times \frac{\Gamma(X_n)}{\Gamma^{Loc}_n} + \psi_\theta \times \frac{\Theta(X_n)}{\Theta^{Loc}_n} \tag{3}$$

s.t.

$$C1: VM_{fog,i} \leq C_{fog,i}, \quad \forall i \in \{S_{1,1}, \cdots, S_{1,f}\} \tag{4}$$
$$C2: |x_{n,i}| = 1, \quad \forall n \in \{1, 2, \cdots, N\}, \ 1 \leq i \leq |V_n| \tag{5}$$
$$C3: \Psi(\mathcal{P}(v_{n,i})) \leq \Psi(\mathcal{P}(v_{n,i}) + v_{n,i}) \tag{6}$$

where $\Gamma(X_n)$, $\Theta(X_n)$, $\Gamma^{Loc}_n$, and $\Theta^{Loc}_n$ denote the execution time, energy consumption, local execution time, and local energy consumption of the nth IoT device's workflow, respectively. Besides, $\psi_\gamma$ and $\psi_\theta$ are control parameters for execution time and energy consumption, by which the weighted cost model can be tuned according to the users' requirements. Moreover, we assume that each task can be assigned to exactly one Virtual Machine (VM) of one fog or cloud server. C1 denotes that the number of instantiated VMs of the ith fog server, $VM_{fog,i}$, is less than or equal to the maximum capacity of that fog server, $C_{fog,i}$. C2 represents that each task i belonging to the workflow of the nth IoT device can only be assigned to one server in each time slot. In addition, C3 indicates that the predecessor tasks of $v_{n,i}$ should be executed before the execution of the task $v_{n,i}$.
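The weighted cost of Eq. 3 is straightforward to compute once the time and energy of a candidate placement are known; a minimal sketch (the function and argument names are ours, for illustration):

```python
def weighted_cost(time_x, energy_x, time_local, energy_local,
                  psi_gamma=0.5, psi_theta=0.5):
    """Psi(X_n) = psi_gamma * Gamma(X_n)/Gamma_loc + psi_theta * Theta(X_n)/Theta_loc.
    Both terms are normalized by the all-local baselines, so values
    below 1.0 mean the candidate placement beats fully local execution."""
    assert 0.0 <= psi_gamma <= 1.0 and 0.0 <= psi_theta <= 1.0
    return (psi_gamma * time_x / time_local
            + psi_theta * energy_x / energy_local)

# A placement that halves execution time but still costs 80% of local energy:
print(weighted_cost(time_x=0.5, energy_x=0.8, time_local=1.0, energy_local=1.0))
# -> 0.65 with equal weights
```

Setting $\psi_\gamma = 1, \psi_\theta = 0$ (or the reverse) recovers the pure time (or energy) model discussed next.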

3.2.2 Execution time model

Considering Eq. 3, the weighted cost optimization is equal to the execution time model when $\psi_\gamma = 1$ and $\psi_\theta = 0$.

The goal of the execution time optimization model is to find the optimal configuration of the application running on the nth IoT device so that the execution time of that application decreases. The overall execution time of each candidate configuration can be defined as the sum of the latency of task offloading ($\Gamma^{lat}_{X_n}$), the computing time of the workflow's tasks based on their assigned servers ($\Gamma^{exe}_{X_n}$), and the data transmission time between each pair of dependent tasks in each workflow ($\Gamma^{tra}_{X_n}$), as depicted in the following:

$$\Gamma(X_n) = \Gamma^{exe}_{X_n} + \Gamma^{lat}_{X_n} + \Gamma^{tra}_{X_n} \tag{7}$$

The computing execution time that corresponds to the application running on the nth IoT device is calculated by:

$$\Gamma^{exe}_{X_n} = \sum_{x_{n,i} \in X_n} \gamma^{exe}_{x_{n,i}} \tag{8}$$

where $\gamma^{exe}_{x_{n,i}}$ shows the computing time of task $v_{n,i}$, and is calculated based on its corresponding assigned server from the following equation:

$$\gamma^{exe}_{x_{n,i}} = \begin{cases} \frac{v^w_{n,i}}{loc_{cpu}}, & x_{n,i} = 0 \\ \frac{v^w_{n,i}}{SF^f \times loc_{cpu}}, & x_{n,i} = 1 \\ \frac{v^w_{n,i}}{SF^c \times loc_{cpu}}, & x_{n,i} = 2 \end{cases} \tag{9}$$

where $loc_{cpu}$ denotes the computing power of the IoT device, and $SF^f$ and $SF^c$ denote the speedup factors of fog servers and cloud servers, respectively.
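Eqs. 8–9 can be read directly as a lookup on the assignment: local tasks run at the device speed, and offloaded tasks run at the device speed scaled by the chosen server's speedup factor. A minimal sketch, with illustrative values for `loc_cpu` and the speedup factors (these numbers are assumptions of ours, not the paper's parameters):

```python
def task_exec_time(vw, x, loc_cpu, sf_fog, sf_cloud):
    """gamma^exe for one task (Eq. 9): vw instructions on the server chosen
    by x (0 = IoT device, 1 = fog server, 2 = cloud server)."""
    if x == 0:
        return vw / loc_cpu
    if x == 1:
        return vw / (sf_fog * loc_cpu)
    return vw / (sf_cloud * loc_cpu)

def workflow_exec_time(task_weights, placement,
                       loc_cpu=1e9, sf_fog=5, sf_cloud=20):
    """Gamma^exe_{X_n}: sum of per-task computing times (Eq. 8)."""
    return sum(task_exec_time(vw, placement[i], loc_cpu, sf_fog, sf_cloud)
               for i, vw in task_weights.items())

# Task 0 local, task 1 on a fog server, task 2 on a cloud server:
t = workflow_exec_time({0: 1e9, 1: 1e9, 2: 1e9}, {0: 0, 1: 1, 2: 2})
print(t)   # 1 + 0.2 + 0.05 = 1.25 seconds
```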


The offloading latency $\Gamma^{lat}_{X_n}$ of the tasks corresponding to the nth IoT device is calculated based on the tasks' assigned servers:

$$\Gamma^{lat}_{X_n} = \sum_{x_{n,i} \in X_n} \gamma^{lat}_{x_{n,i}} \tag{10}$$

where $\gamma^{lat}_{x_{n,i}}$ illustrates the offloading latency of task $v_{n,i}$, and is calculated according to its corresponding assigned server from the following equation:

$$\gamma^{lat}_{x_{n,i}} = \begin{cases} 0, & x_{n,i} = 0 \\ L_{LAN}, & x_{n,i} = 1 \\ L_{WAN}, & x_{n,i} = 2 \end{cases} \tag{11}$$

where $L_{LAN}$ and $L_{WAN}$ correspond to the latency of the LAN and WAN, respectively.

The tasks' transmission time of the workflow corresponding to the nth IoT device is calculated by:

$$\Gamma^{tra}_{X_n} = \sum_{e_{n,i,j} \in E_n} \gamma^{tra}_{e_{n,i,j}} \tag{12}$$

where the transmission time of each pair of dependent tasks $v_{n,i}$ and $v_{n,j}$ is calculated as follows:

$$\gamma^{tra}_{e_{n,i,j}} = \begin{cases} \frac{e^w_{n,i,j}}{B_{LAN}}, & CT_i = CT_1, CT_3 \\ \frac{e^w_{n,i,j}}{B_{WAN}}, & CT_i = CT_2, CT_4 \\ 0, & CT_i = CT_5 \end{cases} \tag{13}$$

where $B_{LAN}$ and $B_{WAN}$ stand for the bandwidth (i.e., data rate) of the LAN and WAN, respectively. $CT_i$ represents the possible transmission configuration of each edge $e_{n,i,j}$, according to the assigned servers of its tasks $v_{n,i}$ and $v_{n,j}$, used to calculate the transmission time. The possible transmission configurations are defined as:

$$CT_i(e^w_{n,i,j}) = \begin{cases} x_{n,i} \oplus x_{n,j} = 0 \ \& \ x_{n,i} = 1 \ \& \ SI(v_{n,i}) \oplus SI(v_{n,j}) \neq 0, & i = 1 \\ x_{n,i} \oplus x_{n,j} = 0 \ \& \ x_{n,i} = 2 \ \& \ SI(v_{n,i}) \oplus SI(v_{n,j}) \neq 0, & i = 2 \\ x_{n,i} \oplus x_{n,j} = 1, & i = 3 \\ x_{n,i} \oplus x_{n,j} > 1, & i = 4 \\ x_{n,i} \oplus x_{n,j} = 0 \ \& \ SI(v_{n,i}) \oplus SI(v_{n,j}) = 0, & i = 5 \end{cases} \tag{14}$$

where $\oplus$ is the binary XOR operation and $SI(v_{n,i})$ is a function that returns the assigned server's index (i.e., $z$) of the ith task belonging to the nth workflow. $CT_1$ denotes that the invocation is between two tasks $v_{n,i}$ and $v_{n,j}$ that are assigned to two different fog servers, and $CT_2$ represents the configuration in which the two tasks run on two different cloud servers. The invocation between two tasks assigned to the IoT device and one of the fog servers is depicted in $CT_3$. $CT_4$ is used to show two different configurations: the first is when the two tasks are assigned to the IoT device and one of the cloud servers, while the second is when one task is assigned to one of the cloud servers and the other to one of the fog servers. Finally, $CT_5$ refers to the condition that two tasks are assigned to exactly the same server, for which the transmission time is equal to zero.
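The case analysis of Eq. 14 reduces to a few XOR tests on the placement values; the sketch below (ours) classifies an edge and returns its transmission time per Eq. 13, with illustrative bandwidth defaults:

```python
def transmission_config(x_i, x_j, si_i, si_j):
    """Return k such that edge (i, j) falls in configuration CT_k of Eq. 14,
    given placements x in {0, 1, 2} and assigned server indices si."""
    if x_i ^ x_j == 0:                    # same server type
        if si_i ^ si_j == 0:
            return 5                      # same server: no transmission
        return 1 if x_i == 1 else 2       # two fog / two cloud servers
        # (x = 0 with different indices cannot occur: one workflow
        #  belongs to a single IoT device)
    return 3 if x_i ^ x_j == 1 else 4     # device<->fog, else WAN involved

def edge_tx_time(ew, x_i, x_j, si_i, si_j, b_lan=100e6, b_wan=10e6):
    """gamma^tra for one edge (Eq. 13): LAN rate for CT1/CT3, WAN for CT2/CT4."""
    ct = transmission_config(x_i, x_j, si_i, si_j)
    if ct in (1, 3):
        return ew / b_lan
    if ct in (2, 4):
        return ew / b_wan
    return 0.0

print(transmission_config(1, 1, 3, 7))   # two different fog servers -> 1 (CT1)
print(transmission_config(0, 2, 0, 1))   # device and cloud server  -> 4 (CT4)
```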

3.2.3 Energy consumption model

According to Eq. 3, the weighted cost optimization is equal to the energy consumption model when $\psi_\gamma = 0$ and $\psi_\theta = 1$. The energy consumption model aims at finding the best possible configuration of the application's tasks to minimize the energy consumption of the nth IoT device.

The overall energy consumption of each candidate configuration can be defined as the sum of the energy consumed in task offloading ($\Theta^{lat}_{X_n}$), the energy consumption for the computing of tasks ($\Theta^{exe}_{X_n}$), and the energy consumed for the data transmission between each pair of dependent tasks ($\Theta^{tra}_{X_n}$) of that application, as depicted in the following:

$$\Theta(X_n) = \Theta^{exe}_{X_n} + \Theta^{lat}_{X_n} + \Theta^{tra}_{X_n} \tag{15}$$

The amount of energy consumed to compute the application belonging to the nth IoT device is defined as follows:

$$\Theta^{exe}_{X_n} = \sum_{x_{n,i} \in X_n} \theta^{exe}_{x_{n,i}} \tag{16}$$

where $\theta^{exe}_{x_{n,i}}$ represents the energy consumption required to compute the task $v_{n,i}$, as calculated in the following:

$$\theta^{exe}_{x_{n,i}} = \begin{cases} \gamma^{exe}_{x_{n,i}} \times P_{cpu}, & x_{n,i} = 0 \\ \gamma^{idle}_{x_{n,i}} \times P_{idle}, & x_{n,i} = 1, 2 \end{cases} \tag{17}$$

where $P_{cpu}$ is the CPU power of the IoT device on which the task $v_{n,i}$ runs. Since we only consider the energy consumption from the IoT device's perspective, whenever a task is offloaded to the fog servers ($x_{n,i} = 1$) or cloud servers ($x_{n,i} = 2$), the respective energy consumption is equal to the idle time of the IoT device, $\gamma^{idle}_{x_{n,i}}$, multiplied by the power consumption of that device in its idle mode, $P_{idle}$.

The energy consumed to offload the tasks of the application belonging to the nth IoT device, Θ^{lat}_{X_n}, is calculated by:

Θ^{lat}_{X_n} = Σ_{x_{n,i} ∈ X_n} θ^{lat}_{x_{n,i}}        (18)

where θ^{lat}_{x_{n,i}} stands for the offloading energy consumption of the task v_{n,i} and is obtained from:

θ^{lat}_{x_{n,i}} =
    0,                               x_{n,i} = 0
    γ^{lat}_{x_{n,i}} × P_{idle},    x_{n,i} = 1, 2
        (19)

The transmission energy consumption Θ^{tra}_{X_n} corresponding to the nth IoT device is obtained from:

Θ^{tra}_{X_n} = Σ_{x_{n,i} ∈ X_n} θ^{tra}_{x_{n,i}}        (20)

where the transmission energy between each pair of dependent tasks v_{n,i} and v_{n,j} is calculated as follows:

θ^{tra}_{e_{n,i,j}} =
    (e^{w}_{n,i,j} / B_{LAN}) × P_{transfer},    CE_i = CE_1
    (e^{w}_{n,i,j} / B_{WAN}) × P_{transfer},    CE_i = CE_2
    0,                                           CE_i = CE_3
        (21)

where the transmission power of the IoT device is denoted as P_{transfer}, and CE_i shows the transmission configuration


for each edge e_{n,i,j}, based on the assigned servers of its tasks, used to calculate the transmission energy. It is obtained from:

CE_i(e^{w}_{n,i,j}) =
    CE_1,    x_{n,i} ⊕ x_{n,j} = 1
    CE_2,    x_{n,i} ⊕ x_{n,j} = 2
    CE_3,    otherwise
        (22)

where CE_1 denotes that the data flow is between two tasks v_{n,i} and v_{n,j} that are assigned to the IoT device and a fog server. Moreover, CE_2 is used to represent the invocation between two tasks that are assigned to the IoT device and a cloud server. Because the energy consumption is considered from the IoT device's perspective, the transmission energy consumption is equal to zero whenever the edge e^{w}_{n,i,j} does not involve a transmission from or to the IoT device, as represented in CE_3.
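To make the XOR-based classification of Eqs. 21-22 concrete, the following sketch computes the per-edge transmission energy from the IoT device's perspective (an illustrative reconstruction; edge_tx_energy and its parameter names are ours, with bandwidth and power values taken from Table 2):

```python
# Sketch of Eqs. 21-22.  Server types: 0 = IoT device, 1 = fog, 2 = cloud.
B_LAN, B_WAN = 2000.0, 500.0   # KB/s (Table 2)
P_TRANSFER = 1.3               # W, device transmission power (Table 2)

def edge_tx_energy(x_i, x_j, data_kb):
    """XOR of the two server types classifies the edge (Eq. 22), which in
    turn selects the link used to price the transfer (Eq. 21)."""
    code = x_i ^ x_j
    if code == 1:                      # CE1: IoT <-> fog, over the LAN
        return data_kb / B_LAN * P_TRANSFER
    if code == 2:                      # CE2: IoT <-> cloud, over the WAN
        return data_kb / B_WAN * P_TRANSFER
    return 0.0                         # CE3: the IoT device is not involved
```

Note that a fog-to-cloud edge yields 1 ⊕ 2 = 3, which correctly falls into CE_3: no energy is drawn from the IoT device's battery.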

4 A NEW APPLICATION PLACEMENT TECHNIQUE

Our proposed application placement technique is divided into three phases: pre-scheduling, batch application placement, and failure recovery. In the pre-scheduling phase, an algorithm is proposed by which brokers can organize the concurrent IoT devices' workflows. Next, we propose an optimized version of the Memetic Algorithm (MA) for batch application placement to minimize the weighted cost of each IoT device. Besides, to overcome any potential failures at runtime, we embed a lightweight failure recovery method in our technique.

4.1 Pre-scheduling Phase

The broker receives concurrent workflows from IoT devices in its decision time slot and organizes them based on their respective dependencies. Moreover, it calculates the local execution time and energy consumption of IoT devices based on their respective workflows.

Workflows of IoT devices are heterogeneous in terms of the number and weight of tasks, the dependencies, and the amount of dataflow between each pair of dependent tasks. Moreover, the execution order of tasks in each workflow must be sorted so that a new task v_{n,i} cannot be executed unless all tasks in its P(v_{n,i}) have finished their execution.

4.1.1 Algorithmic process

Algorithm 1 demonstrates how the pre-scheduling phase organizes the tasks of each workflow and accordingly creates a list of schedules of concurrent workflows. In Algorithm 1, for each workflow, the local execution time and energy consumption are calculated and stored in LocTime and LocEnergy, respectively (lines 3 and 4). Since DAGs can have several root vertices (i.e., source nodes), the RootFinder method finds all the root vertices of each DAG and stores them in Source_n (line 5). This method checks whether P(v_{n,i}) is null for each task i in the nth workflow, and if so, returns that task as one source root. The SingleRootTransformer method receives WF_n and Source_n and creates a new DAG, called DAG*_n, in which the workflow has only a single source root (line 6). To obtain this, we create a dummy vertex (called DummyRoot_n) and connect it to all source vertices of Source_n obtained from the original DAG. This enables us to run the Breadth-First-Search (BFS) algorithm over DAG*_n starting from DummyRoot_n, by which we can specify a schedule number for each vertex (i.e., the BFS level of each vertex) (line 7). The main outcome of the first loop (lines 2-8) of this algorithm is providing a schedule number for the tasks of each workflow, by which the concurrent tasks of each workflow are specified. Because our proposed batch application placement algorithm concurrently decides for several workflows at each time slot, it is required to combine these workflows based on their respective schedule numbers. To achieve this, the algorithm iterates over all workflows, so that tasks with the same schedule number (either from the same or different workflows) are stored in the respective row of a 2D Arraylist called FinalOrderedList. The get(x) and add(v_{n,i}) methods are used to access a row in the 2D Arraylist (i.e., one schedule) and to add a new entry to a list, respectively (line 12).

Algorithm 1: Pre-scheduling phase
Input: WF: list of all workflows
Output: FinalOrderedList, LocTime, LocEnergy
/* N: number of workflows; WF_n: the nth workflow in WF; LocTime & LocEnergy: lists storing the local execution time and energy consumption of workflows; FinalOrderedList: a 2D Arraylist in which the tasks in each row can be executed in parallel */
1  N = |WF|
2  for n = 1 to N do
3      LocTime.add(CalLocalExeTime(WF_n))
4      LocEnergy.add(CalLocalExeEnergy(WF_n))
5      Source_n = RootFinder(WF_n)
6      DAG*_n = SingleRootTransformer(WF_n, Source_n)
7      BFS(DAG*_n, DummyRoot_n)
8  end
9  for n = 1 to N do
10     for i = 1 to |WF_n| do
11         integer x = CheckOrderNumber(v_{n,i})
12         FinalOrderedList.get(x).add(v_{n,i})
13     end
14 end

4.1.2 Example

Figure 2 demonstrates how the pre-scheduling phase works. Figure 2a represents two workflows with five and eight vertices. The first workflow has one source vertex while the second workflow has three source vertices (represented by gray color). After identifying the source vertices, the SingleRootTransformer method creates a DAG*_n with a single source vertex, as depicted in Fig. 2b. Next, the BFS algorithm is applied on DAG*_n to specify the schedule number of each task, as depicted in Fig. 2c. The latter helps to identify how many tasks can be executed in parallel in each schedule. When the schedule numbers of all tasks in all workflows are identified, the tasks with the same schedule numbers are placed together in a 2D Arraylist (called FinalOrderedList), as depicted in Fig. 2d.
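The steps above (dummy root, BFS leveling, and merging by schedule number) can be sketched as follows (a simplified Python sketch under the assumption that each DAG is an adjacency dictionary mapping a task to its successors; the helper names are ours):

```python
# Sketch of the pre-scheduling phase (Algorithm 1).
from collections import deque

def bfs_levels(dag, sources):
    """Attach a dummy root above all source vertices (SingleRootTransformer)
    and BFS from it; a task's schedule number is its BFS level (line 7)."""
    root = "__dummy__"
    graph = dict(dag)
    graph[root] = list(sources)
    level, seen = {}, {root}
    queue = deque([(root, 0)])
    while queue:
        v, d = queue.popleft()
        for w in graph.get(v, []):
            if w not in seen:
                seen.add(w)
                level[w] = d          # sources get 0, their children 1, ...
                queue.append((w, d + 1))
    return level

def final_ordered_list(workflow_levels):
    """Merge tasks of all workflows that share a schedule number, row by row."""
    schedules = {}
    for levels in workflow_levels:
        for task, s in levels.items():
            schedules.setdefault(s, []).append(task)
    return [schedules[s] for s in sorted(schedules)]
```

Running this on Workflow 1 of Fig. 2 reproduces the schedule numbers of Fig. 2c: the source vertex gets 0, its three children get 1, and the sink gets 2.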

4.2 Batch Application Placement Phase

We propose a batch application placement algorithm in which a Memetic Algorithm (MA) is employed to make placement decisions for the tasks of each schedule. Because tasks in each schedule are either independent tasks in one


Figure 2: An example demonstrating the pre-scheduling phase. (a) Different workflows with identified source vertices; (b) transforming the workflows to single-root DAG*_n via a dummy root and dummy edges; (c) assigning schedule numbers (ScN) to tasks based on BFS; (d) FinalOrderedList: the scheduled tasks of the concurrent workflows (Schedules 1-4).

workflow or tasks from different workflows (which do not have any dependency on each other), they can be executed in parallel.

Algorithm 2: Batch task placement phase
Input: WF: the list of all workflows; FinalOrderedList: the 2D Arraylist containing all schedules
Output: finalConfigs, finalCost
/* N: number of workflows; WF_n: the nth workflow; Q: number of all schedules; MAResultList: a global 2D list container in which each row stores the offloading configuration of one schedule; finalConfigs: a 2D Arraylist container storing the obtained servers' configuration of each workflow; finalCost: an array storing the execution cost of each workflow */
1  MAResultList = null
2  for i = 1 to Q do
3      MAResultList.get(i) = APMA(FinalOrderedList.get(i))
4      finalConfigs = ResultProcessor(MAResultList.get(i))
5  end
6  for n = 1 to N do
7      finalCost[n] = CostCalculator(finalConfigs)
8  end

An overview of the proposed batch application placement phase is presented in Algorithm 2. This phase receives the list of all workflows WF and the schedules FinalOrderedList as input, and outputs the workflows' configuration finalConfigs and the execution cost of all workflows finalCost. Considering the number of schedules, the Application Placement Memetic Algorithm (APMA) is invoked to decide for the tasks of the current schedule while considering the server assignments of previous schedules (line 3). Since tasks in each schedule are from one or several workflows, the ResultProcessor(MAResultList) method receives the task assignments of all schedules MAResultList, organizes the task assignments of each workflow, and stores them in a 2D Arraylist called finalConfigs so that each row

represents one workflow (line 4). When the task assignment of all schedules is finished, the CostCalculator(finalConfigs) method calculates the execution cost of each workflow based on the respective obtained configuration. Since the main function of this phase is the APMA, we illustrate how this algorithm works in detail in what follows.

4.2.1 Application Placement Memetic Algorithm (APMA)

The Memetic Algorithm (MA) is an algorithmic pairing of evolutionary-based search methods such as GA with one or more refinement methods (i.e., local search, individual learning), used for different types of optimization problems such as routing and scheduling [22]. In the MA, each candidate solution is represented by an individual and the solution is extracted from a set of candidate individuals called the population.

We propose an Application Placement Memetic Algorithm (APMA) based on the GA functions, in which local search is applied to the selected individuals of each iteration. The latter helps the APMA converge faster to the best-possible solution. In the APMA, each candidate configuration of servers assigned to the tasks of one schedule is encoded as an individual. The atomic part of each individual is a gene, which represents a task in a schedule and carries a tuple (x, y) illustrating the type of the assigned server x and the index of that server y. The values for each tuple are derived from Eq. 1, in which the values for the type and index of servers are defined. Moreover, the length of the individuals in each schedule depends on the number of genes (i.e., tasks) in that schedule. A sample individual in our technique is depicted in Fig. 3, representing a sample configuration for the tasks in the second schedule of Fig. 2d.

The APMA is made up of five main steps called initialization, selection, crossover, mutation, and local search. The first four steps are among the population-based operations used in GA, while the local search step is used as the refinement


Figure 3: An individual representing a sample server configuration for the second schedule of Fig. 2d. Each gene carries a tuple of the server type and site identifier, e.g., [1,f] for the fth fog server and [2,c] for the cth cloud server.

method. Besides, the utility of each candidate individual is evaluated by a fitness function, enabling the APMA to select the best individuals in each iteration. An overview of the APMA is presented in Algorithm 3.

Algorithm 3: An overview of APMA
Input: scheduleTasks: a set of tasks for one schedule
Output: selectedList_op.get(0)
/* I: maximum iteration number; selectedList: the best individuals of the respective population found in each iteration */
1  selectedList_op = null; selectedList_dp = null
2  Initialization(scheduleTasks)
3  selectedList_op = Selection(OP)
4  selectedList_dp = Selection(DP)
5  for i = 1 to I do
6      Crossover(selectedList_op, selectedList_dp)
7      Mutation(selectedList_op, selectedList_dp)
8      LocalSearch(selectedList_op, selectedList_dp)
9      selectedList_op = Selection(OP)
10     selectedList_dp = Selection(DP)
11 end

4.2.2 Initialization step

In this step, the required parameters for the APMA, including the maximum number of iterations I, the population size PopSize, and the individuals in the population, are initialized. Moreover, alongside the Original Population (OP), a new population is defined to enhance the diversity of solutions, called the Diversity Population (DP). Since the main goal of the APMA is to find the best-possible configuration of servers by which the local execution cost decreases, a pre-defined individual is produced for the OP, in which the tuple values of all genes are set to their respective local servers (i.e., IoT devices). This reduces the number of low-utility individuals because those whose fitness values are worse than the pre-defined individual are not selected in the subsequent iterations. The rest of the individuals in the OP and the individuals in the DP are generated randomly in the initialization step.
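A minimal sketch of the encoding and seeding described above (our naming; server types follow Eq. 1 with 0 = IoT device, 1 = fog, 2 = cloud):

```python
import random

def random_individual(num_tasks, num_fog, num_cloud, rng=random):
    """One gene per task of the schedule: a (server_type, server_index)
    tuple, drawn uniformly over the available servers."""
    indiv = []
    for _ in range(num_tasks):
        stype = rng.choice([0, 1, 2])
        if stype == 1:
            sidx = rng.randrange(num_fog)
        elif stype == 2:
            sidx = rng.randrange(num_cloud)
        else:
            sidx = 0                      # the IoT device itself
        indiv.append((stype, sidx))
    return indiv

def local_individual(num_tasks):
    """The pre-defined all-local individual seeded into the OP."""
    return [(0, 0)] * num_tasks
```

The all-local seed acts as a baseline: any individual whose fitness is worse than local execution is filtered out by selection in later iterations.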

4.2.3 Fitness function

The APMA uses global and local fitness functions for the OP, which are used to evaluate the utility of each individual F^{op}_g(indv) (representing the utility of a servers' configuration for the tasks of one schedule indv), and of each task of one workflow on that schedule F^{op}_l(v_{n,i}) (representing the cumulative utility of the given task plus the utility of the other tasks in that workflow), respectively. The F^{op}_l(v_{n,i}) receives a task v_{n,i} and calculates the local fitness value based on Eq. 2, with the assumption that the execution cost of unassigned tasks in one workflow is equal to zero. Moreover, Algorithm 4 demonstrates how the global fitness of each individual F^{op}_g(indv) is calculated. The F^{op}_g(indv) is the sum of the local fitness F^{op}_l(v_{n,i}) of the tasks on that schedule. However, due to the parallel execution of multiple tasks of one workflow in each schedule, the maximum of the local fitness values F^{op}_l(v_{n,i}) of the tasks belonging to the same workflow, MaxLoc, is first calculated (lines 1-11). The responsibility of finding tasks of the same workflow in one schedule is handled by the ParallelTaskCheck method, which stores the parallel tasks of one workflow in parallelSet (line 3). Then, the local fitness of each task in parallelSet is calculated and the maximum local fitness of the tasks belonging to that workflow is stored in MaxLoc (lines 4-10). Finally, the global fitness value gBest can be obtained by summation over all values of MaxLoc, which stores the maximum local fitness of each workflow up to that schedule (lines 12-14).

Algorithm 4: Global fitness function of OP: F^{op}_g
Input: indv: an individual showing the tasks of one schedule
Output: gBest
/* WF: set of all workflows; parallelSet: a container to store the parallel tasks of one workflow; MaxLoc: a container to store the maximum local fitness of each workflow in the schedule; gBest: the global best fitness value; N = |WF| */
1  for n = 1 to N do
2      parallelSet = null
3      parallelSet = ParallelTaskCheck(indv, WF_n)
4      MaxLoc[n] = F^{op}_l(parallelSet.get(1))
5      for i = 1 to |parallelSet| do
6          tempMax = F^{op}_l(parallelSet_i)
7          if tempMax > MaxLoc[n] then
8              MaxLoc[n] = tempMax
9          end
10     end
11 end
12 for i = 1 to |MaxLoc| do
13     gBest = gBest + MaxLoc.get(i)
14 end
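The max-then-sum logic of Algorithm 4 can be sketched as follows (an illustrative reconstruction in which ParallelTaskCheck is inlined as a grouping by workflow id, and local_fitness is an assumed stand-in for F^{op}_l):

```python
# Sketch of Algorithm 4: tasks of the same workflow run in parallel within a
# schedule, so each workflow contributes the MAX of its tasks' local fitness
# values, and the individual's global fitness is the sum of those maxima.
def global_fitness(schedule_tasks, workflow_of, local_fitness):
    """schedule_tasks: task ids in this schedule; workflow_of: task -> its
    workflow id; local_fitness: task -> cost under this individual."""
    max_loc = {}
    for task in schedule_tasks:               # ParallelTaskCheck, inlined
        n = workflow_of[task]
        cost = local_fitness[task]
        max_loc[n] = max(cost, max_loc.get(n, cost))
    return sum(max_loc.values())              # gBest
```

Taking the maximum per workflow reflects that parallel tasks overlap in time: the slowest (costliest) task of a workflow dominates that workflow's contribution to the schedule.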

The principal goal of the diversity population (DP) is to diversify the individuals in the APMA so that the probability of getting stuck in a local optimum decreases. Hence, the fitness function of the DP, F^{dp}_g(indv), is different from that of the OP and is calculated as follows:

F^{dp}_g(indv^{dp}_r) = Σ_{i=1}^{PopSize} H(indv^{op}_i, indv^{dp}_r)        (23)

where PopSize represents the population size of the OP and DP in the APMA. Individuals of the OP and DP are denoted by indv^{op}_i and indv^{dp}_r, respectively. Besides, H(indv^{op}_i, indv^{dp}_r)

is the Hamming distance function that calculates the difference between the individuals received as its arguments, in terms of the servers assigned to their tasks, and is defined as:

H(indv^{op}_i, indv^{dp}_r) = Σ_{k=1}^{f} df        (24)

where f denotes the size of that individual (i.e., schedule). In Eqs. 23 and 24, to calculate the fitness of one individual of the DP, we calculate its difference from all individuals in the OP, and the individual with a higher difference receives a better fitness value. This helps to maintain individuals with a higher difference in the DP, which better diversifies the individuals in the APMA. Since different types of servers (i.e., IoT, fog, and cloud) with a different number of servers


in each type (i.e., server index) are considered in the system model, a diversity factor df is defined which describes the fitness of each task according to the type and index of its assigned server. It is obtained as follows:

df =
    2,    sgn(|ST(indv^{op}_{i,k}) − ST(indv^{dp}_{r,k})|) = 1
    1,    sgn(|ST(indv^{op}_{i,k}) − ST(indv^{dp}_{r,k})|) = 0  and  sgn(|SI(indv^{op}_{i,k}) − SI(indv^{dp}_{r,k})|) = 1
    0,    sgn(|ST(indv^{op}_{i,k}) − ST(indv^{dp}_{r,k})|) = 0  and  sgn(|SI(indv^{op}_{i,k}) − SI(indv^{dp}_{r,k})|) = 0
        (25)

where the kth task (i.e., gene) of those individuals is denoted as indv^{op}_{i,k} and indv^{dp}_{r,k}, respectively, and sgn is the symbolic function, which is defined as:

sgn(|x − y|) =
    0,    x = y
    1,    x ≠ y
        (26)

According to Eq. 25, if the server type of a task in the DP (i.e., ST(indv^{dp}_{r,k})) is different from the server type of the corresponding task in an individual of the OP (i.e., ST(indv^{op}_{i,k})), it receives a higher fitness value. However, if the server types of these tasks are equal but their server indexes differ, df is set to 1. Moreover, if the two tasks are assigned to exactly the same server (i.e., same server type and server index), the fitness value for that task in the DP is equal to zero.
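Eqs. 23-25 can be sketched as follows (illustrative naming; genes are (server type, server index) tuples as in the APMA encoding):

```python
def diversity_df(gene_op, gene_dp):
    """Eq. 25: per-gene diversity factor between an OP gene and a DP gene."""
    if gene_op[0] != gene_dp[0]:
        return 2            # different server types
    if gene_op[1] != gene_dp[1]:
        return 1            # same type, different server index
    return 0                # exactly the same server

def dp_fitness(indiv_dp, population_op):
    """Eqs. 23-24: sum the gene-wise Hamming distance of a DP individual to
    every individual of the OP; larger means more diverse, hence fitter."""
    return sum(diversity_df(g_op, g_dp)
               for indiv_op in population_op
               for g_op, g_dp in zip(indiv_op, indiv_dp))
```

A DP individual identical to every OP individual scores 0 and is unlikely to survive selection, which is exactly the diversification pressure Eq. 23 encodes.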

4.2.4 Selection step

The goal of selection is to choose the high-utility individuals from both the OP and DP, based on their respective fitness functions, for the next iterations. To achieve this, the individuals of the OP and DP are sorted based on their respective fitness functions, and the top three individuals plus one random individual from each population are selected and stored in selectedList_op and selectedList_dp, respectively.

4.2.5 Crossover and Mutation steps

The goal of the crossover step is to generate new individuals (called offspring) by a combination of the individuals selected in the selection step (called parents). The APMA applies a two-point crossover operation to each pair of selected parents and creates two offspring from them. In each iteration, the total number of new offspring for each population is calculated based on the following equation:

offspringNumber = n! / (n − k)!,    k = 2        (27)

In the two-point crossover, two crossover points are randomly selected from the parents. Then, the genes in between the two crossover points are exchanged between the parent individuals while the rest remain unchanged. Since the APMA uses two populations OP and DP, the crossover between individuals of the same population is called inbreeding, while the crossover between individuals of different populations is called crossbreeding. Crossbreeding provides diversity in individuals, which helps to avoid local optimal values with a higher probability. Besides, the outcomes of crossbreeding are stored in the selected lists of both populations,

selectedList_op and selectedList_dp, while the results of inbreeding are only stored in the selected list of the respective population.

In the APMA, the mutation function, based on a pre-defined probability, modifies several genes of the offspring in the hope of generating individuals with higher utility.
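The two-point crossover and probabilistic mutation described above can be sketched as follows (a simplified sketch; gene_pool and the mutation probability are illustrative assumptions, since the paper does not fix them here):

```python
import random

def two_point_crossover(p1, p2, rng=random):
    """Pick two cut points and swap the genes between them (Sec. 4.2.5)."""
    a, b = sorted(rng.sample(range(len(p1) + 1), 2))
    c1 = p1[:a] + p2[a:b] + p1[b:]
    c2 = p2[:a] + p1[a:b] + p2[b:]
    return c1, c2

def mutate(indiv, gene_pool, prob=0.1, rng=random):
    """With probability `prob`, replace a gene with a random server tuple."""
    return [rng.choice(gene_pool) if rng.random() < prob else g
            for g in indiv]
```

The same crossover routine serves both inbreeding (parents from one population) and crossbreeding (one parent from the OP and one from the DP); only where the offspring are stored differs.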

4.2.6 Local search step

Considering the fact that the crossover points and the genes for mutation are selected randomly, a new function called local search is defined, which works based on the local fitness function of the OP (F^{op}_l(v_{n,i})). It is worth mentioning that the randomness provided by the crossover and mutation functions is essential since it provides the opportunity to jump out of local optimal points with a higher probability. The local search function, alongside those random functions, leads to faster convergence to the global optimal solution. Algorithm 5 demonstrates the process of the local search step.

Algorithm 5: Local search step
Input: selectedList_op: selected list of the OP; selectedList_dp: selected list of the DP
/* tempList: a temporary list container storing the best-found tuple values for each gene in the individual */
1  size = |selectedList_op|
2  tempList = setList(MAXINT)
3  for i = 1 to |indv| do
4      for j = 1 to size do    /* j iterates over selectedList_op */
5          if F^{op}_l(indv^{op}_{j,i}) < tempList.get(i) then
6              tempList[i] = F^{op}_l(indv^{op}_{j,i})
7          end
8      end
9  end
10 selectedList_op.add(CreateNewIndv(tempList))
11 UpdatePop(OP, selectedList_op)
12 UpdatePop(DP, selectedList_dp)

Although the local search function increases the probability of converging faster to the global optimal solution, two problems may occur. First, if the local search function is used solely, the probability of getting stuck in local optimal points increases. Second, for problems with a large solution space, the local search function requires a significant amount of time to visit the search space. Hence, these two factors should be considered while designing a local search function in the APMA. To address the first issue, the crossover and mutation functions, which provide randomness, are kept in the APMA. Moreover, the diversity population DP is created, which ensures diversity in each iteration. To benefit from the local search function while decreasing its searching time, we reduce the search space for local search by only considering the individuals in the selected list of the OP (i.e., selectedList_op) (line 1). The setList(MAXINT) method initializes tempList with an infinite value for all its indexes. Considering the individuals in selectedList_op, genes with the same index number are evaluated in terms of their local fitness values F^{op}_l(indv^{op}_{j,i}), and the best genes are selected and stored in the respective index of tempList (lines 3-9). Since the fitness function is defined according to the execution cost, a lower fitness value means a better assignment (line 5). Afterward, a new individual is created and


stored in selectedList_op (line 10). Finally, the updated selectedList_op from the local search step and selectedList_dp are combined with the OP and DP, respectively, and the top individuals of each population (up to PopSize) are selected for the populations of the next iteration (lines 11-12).

Whenever the APMA reaches its stopping criterion, the best individual of the OP, stored in selectedList_op.get(0), is returned as the result of the APMA.
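The gene-wise refinement of Algorithm 5 can be sketched as follows (illustrative; local_fitness(gene, pos) is an assumed stand-in for F^{op}_l evaluated at a gene position, and the composite individual plays the role of CreateNewIndv(tempList)):

```python
def local_search(selected_op, local_fitness):
    """Sketch of Algorithm 5: for each gene position, keep the gene with the
    lowest local fitness across the selected OP individuals (lower cost is
    better), and append the resulting composite individual."""
    length = len(selected_op[0])
    best = []
    for i in range(length):
        genes = [indiv[i] for indiv in selected_op]
        best.append(min(genes, key=lambda g: local_fitness(g, i)))
    return selected_op + [best]
```

Restricting the search to the already-selected individuals is what keeps this refinement cheap: only |selectedList_op| genes are compared per position instead of the whole solution space.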

4.3 Failure Recovery Phase

Failures can happen in any system, and hence, providing an efficient failure recovery method is of paramount importance. In our system, brokers always keep records of free servers and check whether they are planned to perform a task in the near future or not. Besides, considering the server assigned to each task, they estimate the completion cost of each task based on its local fitness value F^{op}_l(v_{n,i}). So, if the execution of any task fails, the failure recovery method is called to select a surrogate server for that task. The failure recovery method receives the list of currently free servers (including IoT devices) and the failed task as inputs. Then, it calculates the local fitness value F^{op}_l(v_{n,i}) of that task for the free servers. Finally, the task is forwarded to the server with the least F^{op}_l(v_{n,i}) for execution.
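Under the description above, the surrogate-server selection reduces to a single minimization (illustrative naming; local_fitness(task, server) stands in for F^{op}_l):

```python
def recover(failed_task, free_servers, local_fitness):
    """Sketch of the failure recovery phase: re-run the failed task on the
    free server with the lowest estimated local fitness."""
    return min(free_servers, key=lambda s: local_fitness(failed_task, s))
```

Because it only scans the current free-server list, this method stays lightweight enough to be invoked at runtime without re-running the APMA.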

4.4 Complexity Analysis

The Time Complexity (TC) of our technique depends on its three phases. We consider the number of incoming workflows to the broker as N and the maximum number of tasks over all workflows as L. The most time-consuming part of the pre-scheduling phase (Algorithm 1) is the BFS, which requires O(L + |E|) time to visit all tasks of one workflow, in which |E| represents the number of data flows. In a dense DAG, |E| = O(L²). Hence, the TC of the pre-scheduling phase in the worst case is O(N × L²). In addition, in the best-case scenario, if we assume N = 1 and |E| = O(L) for sparse DAGs, the TC is O(L).

The batch task placement phase (Algorithm 2) calls the APMA (Algorithm 3) Q times, where Q represents the number of schedules. To calculate the TC of the second phase, we ignore the iteration size I and the population size PopSize of the APMA since they are constant values. In the APMA, the local fitness function F^{op}_l(v_{n,i}) and ParallelTaskCheck, which are invoked from the global fitness function (Algorithm 4), are the most repeated functions, defining the TC of the batch application placement phase. The TC of ParallelTaskCheck depends on the size of indv, which at most can be N × (L − 1) in the case that each workflow has L − 1 parallel tasks in one schedule. Hence, the TC of ParallelTaskCheck in the worst case is O(Q × N² × L). The maximum length of parallelSet (line 5 of Algorithm 4) is L − 1, and hence, the local fitness function F^{op}_l(v_{n,i}) is called Q × N × (L − 1) times. Moreover, the instructions in F^{op}_l(v_{n,i}) can be executed at most L times since the local fitness function only considers the tasks of one workflow, which are at most L. Finally, the TC of the batch task placement phase (Algorithm 2) in the worst case is O(Q × (N × L² + N² × L)). In addition, in the best-case scenario, if we assume N = 1, the TC is O(Q × L²).

The TC of the failure recovery phase depends on the TC of the local fitness function F^{op}_l(v_{n,i}), which is O(L), and the number of free servers, which at most equals the number of all available servers in the system, M. Hence, the TC of this phase in the worst case is O(M × L). In addition, in the best-case scenario, no failure happens in the system.

Considering that in all cases 2 ≤ Q, the TC of our technique in the worst case is polynomial and is represented as O(Q(NL² + N²L) + ML). Besides, in the best-case scenario, where N = 1, Q = 2, and no failures occur in the system, the TC is O(L²).

5 PERFORMANCE EVALUATION

In this section, the system setup and parameters, and a detailed performance analysis of our technique in comparison to its counterparts (especially [4]), are provided.

5.1 System Setup and Parameters

In our experiments, all techniques are implemented and evaluated using the iFogSim simulator [23]. We used two types of workflows, namely, real workflows of applications and synthetic workflows. For the real workflows, we used the DAGs extracted from the face recognition application [16] (Workflow1) and the QR code recognition application [24] (Workflow2). Moreover, to consider other possible forms of workflows, several synthetic workflows are generated (Workflow3 to Workflow6). We consider an environment in which six IoT devices are available and each IoT device has one specific workflow from Workflow1 to Workflow6. Each group of six IoT devices is connected to one fog broker, and fog brokers have access to six fog servers and three cloud servers. In this setup, each fog server has three VMs while each cloud server is assumed to have 16 VMs. The computing power of IoT devices is considered as 500 MIPS [4], and their power consumption in the processing and idle states is 0.9 W and 0.3 W, respectively. Besides, the transmission power consumption of IoT devices is 1.3 W [25]. We also assume that the computing power of each VM of the fog servers is 6 or 8 times that of the IoT devices [4], [26], while the computing power of each VM of the cloud servers is 10 or 12 times that of the IoT devices [4]. The summary of our evaluation parameters and their respective values is presented in Table 2.

Table 2: Evaluation parameters

Evaluation Parameter                           Value
Number of IoT devices                          6
Number of Fog/Edge servers                     6
Number of Cloud servers                        3
Bandwidth of LAN                               (2000, 4000) KB/s
Bandwidth of WAN                               (500, 1000) KB/s
Delay of LAN                                   0.5 ms
Delay of WAN                                   30 ms
Computing power of IoT devices                 500 MIPS
Speedup factor of Fog/Edge servers' VMs        (6, 8)
Speedup factor of Cloud servers' VMs           (10, 12)
Idle power consumption of IoT devices          0.3 W
CPU power of IoT devices                       0.9 W
Transmission power of IoT devices              1.3 W


Figure 4: Execution cost of workflows with different bandwidth values. (a) Execution time, (b) energy consumption, and (c) weighted cost with LAN: 2000 KB/s, WAN: 500 KB/s; (d) execution time, (e) energy consumption, and (f) weighted cost with LAN: 4000 KB/s, WAN: 1000 KB/s. Each panel compares Local, Only Edge, Only Cloud, Proposed Solution, COM2019, and ULOOF across Workflows 1-6.

5.2 Performance Study

We employed three quantitative parameters, including execution time, energy consumption, and weighted cost, to comprehensively study the behavior of our technique in different experiments. Five experiments are conducted to analyze the efficiency of the techniques in terms of various bandwidths, different iteration sizes, the techniques' decision times, failure recovery, and system size. Both ψ_γ and ψ_θ are set to 0.5, meaning that execution time and energy consumption are weighted equally in the results. However, these parameters can be adjusted based on the users' requirements and network conditions. To analyze the efficiency of our technique, the following methods are implemented for comparison:

• Local: In this method, all tasks of workflows are executed locally on their respective IoT devices, and hence, no parallel execution of tasks can be performed for workflows. The results of this method can be used as a reference point to analyze the gain of application placement techniques.

• Only Edge: In this method, all tasks of workflows are offloaded to the fog/edge servers in the edge layer for execution. If the VMs of all servers are full and there are no free VMs, the remaining tasks have to wait until free computing resources become available.

• Only Cloud: In this method, all tasks of workflows are executed on the cloud servers.

• COM2019: To the best of our knowledge, no prior work considers batch application placement in a scenario with multiple IoT devices, multiple fog servers, and multiple cloud servers. Therefore, we updated the fitness function and chromosome structure of [4], which considers only a single fog server and a single cloud server, to make it compatible with our system model. Afterward, the efficiency of its heuristics and search methods is compared with the other techniques.

• ULOOF: This is an extended version of the user-level online offloading framework [15], modified so that it can consider scenarios with multiple cloud and fog/edge servers for task placement.

The obtained results of each workflow are the average of 10,000 runs with a 95% confidence interval.
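The averaging over repeated runs can be sketched as follows; the normal approximation with z = 1.96 is an assumption for illustration, since the interval method is not stated, though it is reasonable for the large run counts reported.

```python
import math
import statistics

def mean_with_ci95(samples, z=1.96):
    """Mean and 95% confidence half-width via the normal approximation.

    z = 1.96 is an illustrative assumption; with many runs per
    workflow the normal approximation is reasonable.
    """
    m = statistics.mean(samples)
    s = statistics.stdev(samples)  # sample standard deviation
    return m, z * s / math.sqrt(len(samples))

# Hypothetical execution-time samples (seconds) for one workflow.
runs = [2.41, 2.47, 2.39, 2.52, 2.44, 2.46]
mean, half = mean_with_ci95(runs)
print(f"{mean:.3f} ± {half:.3f} s")
```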

5.2.1 Bandwidth Analysis

In this experiment, we study the behavior of the techniques under various bandwidth values, as depicted in Fig. 4. The maximum iteration size I and population size PopSize are set to 100 and 20, respectively.

Figure 4 shows that as the bandwidth increases, the execution time, energy consumption, and weighted cost of workflows decrease, meaning a better application placement gain in comparison to local execution of workflows. Moreover, in most cases, the only edge method outperforms the only cloud method because the fog servers are distributed in the proximity of IoT devices and can be accessed with higher bandwidth and lower latency. However, since the resources of fog servers are limited compared to cloud servers, this method cannot obtain the best possible outcome. This is why COM2019 and ULOOF obtain better results than the only cloud and only edge methods in most scenarios: they use the resources of cloud and fog servers simultaneously, resulting in the parallelization of more tasks. As can be seen, our proposed technique is superior to all other methods for two important reasons. First, similar to COM2019 and ULOOF, it utilizes the resources of fog and cloud servers simultaneously. Second, due to its local fitness function, local search, and the diversity provided by the DP, it escapes locally optimal values with higher probability and converges faster to the optimal solution, and hence outperforms COM2019 and ULOOF.
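The inverse relation between link bandwidth and offloading cost can be illustrated with a minimal bandwidth-bound transfer model; the 2000 KB payload below is a hypothetical value, not a figure from the experiments.

```python
def transfer_time_s(data_kb, bandwidth_kb_s):
    # Simple bandwidth-bound transfer model; propagation latency is
    # omitted for brevity, though it also favors nearby fog servers.
    return data_kb / bandwidth_kb_s

# Hypothetical 2000 KB of task data over the link speeds used in Fig. 4:
lan = transfer_time_s(2000, 4000)   # fog/edge over LAN → 0.5 s
wan = transfer_time_s(2000, 1000)   # cloud over WAN   → 2.0 s
print(lan, wan)
```

Doubling the LAN and WAN bandwidths halves both transfer times, which is why every offloading method in Fig. 4 improves as bandwidth grows.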

It is worth mentioning that in some cases, such as Workflow 5 in Fig. 4c, the weighted cost of the only cloud method is less than that of local execution, even though its execution time in Fig. 4a is far higher than local execution. This is because ψγ and ψθ are set to 0.5, which gives equal


[figure: grouped bars for Proposed and COM2019 at maximum iterations 50/100/150/200, plus Local, Only Edge, Only Cloud, and ULOOF, across Workflows 1–6; y-axes: Execution Time (s), Energy Consumption (J), Weighted Cost]

(a) Execution time (LAN: 2000 KB/s, WAN: 500 KB/s)

(b) Energy consumption (LAN: 2000 KB/s, WAN: 500 KB/s)

(c) Weighted cost (LAN: 2000 KB/s, WAN: 500 KB/s)

Figure 5: Execution cost of workflows with different maximum iteration number values

importance to execution time and energy consumption. Therefore, because the energy consumption of this workflow is low compared to its execution time, the weighted cost shows little gain for the task placement.
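The equal-weight combination discussed above can be sketched as follows; normalizing each term by its local-execution counterpart is an illustrative assumption that keeps both terms dimensionless, and the numbers are hypothetical.

```python
def weighted_cost(exec_time, energy, local_time, local_energy,
                  psi_gamma=0.5, psi_theta=0.5):
    # Normalizing by the local-execution counterparts is an assumption
    # for illustration; it makes the two terms dimensionless so the
    # weights psi_gamma and psi_theta are directly comparable.
    return (psi_gamma * (exec_time / local_time)
            + psi_theta * (energy / local_energy))

# A placement that doubles the local execution time but cuts energy to
# a quarter: under equal weights the energy saving partly masks the
# time penalty, as with Workflow 5 in Fig. 4c.
print(weighted_cost(exec_time=8.0, energy=0.5,
                    local_time=4.0, local_energy=2.0))  # → 1.125
```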

5.2.2 Maximum Iteration Number Analysis

One of the important parameters for comparing evolutionary application placement techniques is the maximum iteration number, through which their convergence speed to the optimal solution can be evaluated. In this experiment, the performance of COM2019 and our technique is studied. Since the solutions of the local execution, only edge, only cloud, and ULOOF methods do not change over iterations, their results are depicted only to better illustrate the efficiency of the other techniques. For this experiment, PopSize, the LAN bandwidth, and the WAN bandwidth are set to 20, 2000 KB/s, and 500 KB/s, respectively.

It can be seen from Fig. 5 that increasing the maximum number of iterations I leads to better solutions for both our technique and COM2019 for all workflows in comparison to the ULOOF, local, only edge, and only cloud methods. However, our technique converges to a better solution in a smaller number of iterations than COM2019. Fig. 5a shows that the results of our technique at I = 50 outperform those of COM2019 even at I = 200 for all workflows. The same trend can be seen in Fig. 5c for the weighted cost of execution, while in Fig. 5b the results of COM2019 and our technique are closer to each other. It is important to note that although better solutions can be found by increasing the maximum number of iterations (if the techniques do not get stuck in locally optimal points), the decision time of the algorithms also increases, which can be critical for some workflows, especially latency-sensitive ones.
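The convergence comparison above can be reproduced in miniature with a generic elitist evolutionary loop that records the best cost at each iteration; this is an illustrative skeleton, not the paper's memetic algorithm, and the bit-string encoding and single-bit mutation are assumptions.

```python
import random

def evolve(fitness, pop_size=20, max_iter=100, seed=0):
    """Minimal elitist evolutionary loop; returns best cost per iteration.

    Illustrative only: candidates are random 16-bit strings, variation
    is a single-bit mutation of the current best, and the worst member
    is replaced each iteration, so the best cost never worsens.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(16)] for _ in range(pop_size)]
    history = []
    for _ in range(max_iter):
        pop.sort(key=fitness)            # lower cost is better
        history.append(fitness(pop[0]))  # record best-of-population
        child = pop[0][:]                # mutate a copy of the best
        child[rng.randrange(len(child))] ^= 1
        pop[-1] = child                  # replace the worst member
    return history

# Toy cost (number of ones): the recorded best cost is non-increasing,
# so plotting `history` for two techniques compares convergence speed.
history = evolve(lambda c: sum(c))
print(history[0], history[-1])
```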

5.2.3 Decision Time Analysis

This experiment analyzes the efficiency of each technique based on the decision time required to obtain a well-suited solution. Although application placement algorithms offer server configurations by which the execution time and energy consumption of IoT applications can be reduced, the time they spend reaching that solution is also important. This is mainly because obtaining good server configurations for IoT applications over a long period of time can negatively

Table 3: Decision time analysis (workflow execution time, in seconds)

Decision Time | Technique | WF1   | WF2   | WF3   | WF4   | WF5   | WF6
100 ms        | Proposed  | 2.412 | 2.467 | 2.758 | 3.638 | 3.837 | 1.649
100 ms        | COM2019   | 4.333 | 2.917 | 3.422 | 6.276 | 6.526 | 3.090
200 ms        | Proposed  | 2.345 | 2.397 | 2.610 | 3.430 | 3.384 | 1.446
200 ms        | COM2019   | 4.073 | 2.707 | 2.984 | 5.344 | 5.109 | 2.529
300 ms        | Proposed  | 2.288 | 2.302 | 2.455 | 2.869 | 3.362 | 1.344
300 ms        | COM2019   | 3.656 | 2.494 | 2.868 | 4.388 | 4.709 | 2.746
400 ms        | Proposed  | 2.229 | 2.204 | 2.403 | 2.587 | 2.870 | 1.304
400 ms        | COM2019   | 3.623 | 2.445 | 2.753 | 3.663 | 4.295 | 2.523

affect the execution time requirements of IoT applications. Another important reason for the decision time analysis, especially for evolutionary algorithms, is that the iteration size analysis alone cannot judge the efficiency of an application placement technique. One technique may reach better solutions in a small number of iterations compared to its counterparts, yet the time spent on each iteration may be far higher than for the other techniques, resulting in a longer decision time. Hence, although the maximum iteration size analysis is required, the decision time analysis acts as a supplementary analysis to ensure the efficiency of a technique. In this experiment, the population size PopSize is set to 20, and the LAN and WAN bandwidths are 2000 KB/s and 500 KB/s, respectively.
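Evaluating placement under a fixed decision-time budget amounts to running the search as an anytime procedure that returns the best solution found when the budget expires; the sketch below is a generic illustration under that assumption, not the paper's algorithm, and the refinement step is hypothetical.

```python
import time

def anytime_placement(step, initial, budget_s):
    """Iterative improvement under a wall-clock decision-time budget.

    A generic anytime sketch: a larger budget (e.g. 400 ms vs. 100 ms)
    permits more refinement steps, so the returned configuration is
    usually better -- mirroring the trend in Table 3.
    """
    best = initial
    deadline = time.monotonic() + budget_s
    while time.monotonic() < deadline:
        candidate = step(best)
        if candidate < best:        # keep only improvements
            best = candidate
    return best

# Hypothetical refinement that trims 0.1% of the cost per step.
result = anytime_placement(lambda x: x * 0.999, 10.0, 0.05)
print(result < 10.0)  # → True
```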

Table 3 presents the obtained execution times of our proposed solution and COM2019 for four different decision times. Since the execution time result of ULOOF does not change with the decision time, its results are not presented in Table 3; its average decision time is roughly 30 ms. As the decision time increases from 100 ms to 400 ms, the execution time of the techniques decreases, meaning that higher-utility results are obtained. The results of our solution decrease gradually from 100 ms to 400 ms, while the results of COM2019 decrease significantly in the ranges 100-200 ms and 200-300 ms and gradually between 300-400 ms, which means that the results of COM2019 approximately converged at 400 ms. It can be clearly seen that our technique not only provides better values than COM2019 at the equivalent decision time, but its results at 100 ms also outperform the results of COM2019 at


[figure: grouped bars for Local, Only Edge, Only Cloud, Proposed Solution, COM2019, and ULOOF at # IoT = 6, 12, 18, 24; y-axes: CET (s), CEC (J), CWC]

(a) Cumulative Execution Time (CET)

(b) Cumulative Energy Consumption (CEC)

(c) Cumulative Weighted Cost (CWC)

Figure 6: System size analysis with different number of IoT devices per fog broker

Table 4: Failure recovery analysis (workflow execution time, in seconds)

Technique          | WF1    | WF2    | WF3    | WF4    | WF5    | WF6
Proposed (FR Mode) | 2.7132 | 2.6243 | 2.8642 | 3.4125 | 3.6321 | 1.4685
Local              | 6.4354 | 10.031 | 5.5194 | 8.9654 | 6.0520 | 8.0180

400 ms. This demonstrates that, regardless of the number of iterations, our technique converges faster to the optimal solution.

5.2.4 Failure Recovery Analysis

This experiment analyzes the effect of the failure recovery method in application placement techniques. Since COM2019 and ULOOF do not have any failure recovery method, we present the results of our technique with failure recovery mode (FR Mode) when the probability of failure occurrence is 5%, in comparison to local execution, as depicted in Table 4. In this experiment, the maximum iteration size I is 100 and the rest of the parameters are set to the same values as in the decision time analysis.

Table 4 shows that the results of our technique with FR mode still outperform local execution for all workflows and achieve an offloading gain. In techniques that ignore failure recovery, failed tasks result in incomplete execution of workflows due to the dependencies among tasks of a workflow. Our technique, by accepting the small overhead of the failure recovery phase, achieves a reasonable gain in comparison to local execution.
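The small recovery overhead can be quantified with a simple re-execution model: if each attempt fails independently with probability 5%, the expected number of attempts per task is 1 / (1 − 0.05) ≈ 1.053, i.e. roughly a 5% overhead. The simulation below is an illustrative sketch under that independence assumption.

```python
import random

def attempts_until_success(p_fail=0.05, rng=None, max_retries=50):
    """Count executions needed when each attempt fails independently.

    With the experiment's 5% failure probability, the expected number
    of attempts follows a geometric distribution with mean
    1 / (1 - p_fail) ≈ 1.053.
    """
    rng = rng or random.Random()
    for attempt in range(1, max_retries + 1):
        if rng.random() >= p_fail:   # this attempt succeeded
            return attempt
    raise RuntimeError("task failed after all retries")

rng = random.Random(7)
mean_attempts = sum(attempts_until_success(rng=rng)
                    for _ in range(10_000)) / 10_000
print(round(mean_attempts, 2))   # close to 1.05
```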

5.2.5 System Size Analysis

In this experiment, we analyze the effect of system size on different application placement techniques. In our system, each fog broker makes application placement decisions for its respective IoT devices. Hence, to analyze the performance of our proposed technique, we increase the number of IoT devices and fog servers per fog broker from 6 to 24 in steps of 6. Moreover, we use the same workflows as in the previous experiments. In addition, the LAN and WAN bandwidths are set to 2000 KB/s and 500 KB/s, respectively, and the rest of the parameters are the same as in Table 2.

Fig. 6 shows the Cumulative Execution Time (CET), Cumulative Energy Consumption (CEC), and Cumulative Weighted Cost (CWC) when different numbers of IoT devices are connected to one fog broker. The term cumulative refers to the aggregate execution cost of all IoT devices (e.g., the CET is the aggregate execution time of all IoT devices in scenarios with different numbers of IoT devices). In Fig. 6, the CET, CEC, and CWC increase as the number of IoT devices increases. In all scenarios, the CET, CEC, and CWC of all methods are lower than the local execution cost; however, our proposed technique outperforms the other methods in all scenarios and results in a lower cost. In addition, the performance of ULOOF and COM2019 is roughly the same in the scenario with six IoT devices, while ULOOF shows better performance in the remaining scenarios. The latter is because ULOOF is independent of the maximum number of iterations, while the performance of COM2019 largely depends on it.
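The cumulative metrics are straightforward sums of the per-device costs; the helper below sketches this aggregation, with the device triples being hypothetical values.

```python
def cumulative_costs(per_device):
    """Sum per-device (time, energy, weighted-cost) triples into the
    CET, CEC, and CWC aggregates plotted in Fig. 6."""
    cet = sum(t for t, _, _ in per_device)  # Cumulative Execution Time
    cec = sum(e for _, e, _ in per_device)  # Cumulative Energy Consumption
    cwc = sum(w for _, _, w in per_device)  # Cumulative Weighted Cost
    return cet, cec, cwc

# Six hypothetical IoT devices behind one fog broker.
devices = [(2.5, 1.0, 0.25)] * 6
print(cumulative_costs(devices))  # → (15.0, 6.0, 1.5)
```

Because the aggregates are plain sums, adding devices raises all three curves even when each device's individual cost is unchanged, which matches the upward trend across the # IoT = 6 to 24 scenarios.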

6 CONCLUSIONS AND FUTURE WORK

We proposed a weighted cost model for optimizing the execution time and energy consumption of IoT devices in a heterogeneous computing environment in which multiple IoT devices, multiple fog servers, and multiple cloud servers are available. We also proposed a batch application placement technique based on the Memetic Algorithm to efficiently place tasks of different workflows on appropriate servers in a timely manner. Besides, a lightweight failure recovery technique is proposed to overcome potential failures in the execution of tasks at runtime. The effectiveness of our technique is analyzed through extensive experiments and comparisons with state-of-the-art techniques in the literature. The obtained results demonstrate that our technique improves upon its counterparts by up to 65% in terms of weighted cost in the bandwidth analysis and by 51% in terms of execution time in the decision time analysis.

As part of future work, we plan to extend our proposed weighted cost model to consider other aspects such as monetary cost. Moreover, we plan to apply mobility models to this scenario and adapt our proposed application placement technique accordingly.

REFERENCES

[1] Internet of things at-a-glance, 2016, [accessed 8 4 2019]. [Online]. Available: https://www.cisco.com/c/dam/en/us/products/collateral/se/internet-of-things/at-a-glance-c45-731471.pdf


[2] P. Hu, S. Dhelim, H. Ning, and T. Qiu, "Survey on fog computing: architecture, key technologies, applications and open issues," Journal of Network and Computer Applications, vol. 98, pp. 27–42, 2017.

[3] M. Goudarzi, M. Zamani, and A. T. Haghighat, "A fast hybrid multi-site computation offloading for mobile cloud computing," Journal of Network and Computer Applications, vol. 80, pp. 219–231, 2017.

[4] X. Xu, Q. Liu, Y. Luo, K. Peng, X. Zhang, S. Meng, and L. Qi, "A computation offloading method over big data for IoT-enabled cloud-edge computing," Future Generation Computer Systems, vol. 95, pp. 522–533, 2019.

[5] M. Goudarzi, M. Palaniswami, and R. Buyya, "A fog-driven dynamic resource allocation technique in ultra dense femtocell networks," Journal of Network and Computer Applications, vol. 145, p. 102407, 2019.

[6] Z. Zhu, T. Liu, Y. Yang, and X. Luo, "BLOT: Bandit learning-based offloading of tasks in fog-enabled networks," IEEE Transactions on Parallel and Distributed Systems, 2019, (in press).

[7] J. Wang, K. Liu, B. Li, T. Liu, R. Li, and Z. Han, "Delay-sensitive multi-period computation offloading with reliability guarantees in fog networks," IEEE Transactions on Mobile Computing, 2019, (in press).

[8] L. Huang, X. Feng, L. Zhang, L. Qian, and Y. Wu, "Multi-server multi-user multi-task computation offloading for mobile edge computing networks," Sensors, vol. 19, no. 6, p. 1446, 2019.

[9] D. G. Roy, D. De, A. Mukherjee, and R. Buyya, "Application-aware cloudlet selection for computation offloading in multi-cloudlet environment," The Journal of Supercomputing, vol. 73, no. 4, pp. 1672–1690, 2017.

[10] E. El Haber, T. M. Nguyen, D. Ebrahimi, and C. Assi, "Computational cost and energy efficient task offloading in hierarchical edge-clouds," in Proceedings of the 29th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC). IEEE, 2018, pp. 1–6.

[11] Y. Nan, W. Li, W. Bao, F. C. Delicato, P. F. Pires, and A. Y. Zomaya, "A dynamic tradeoff data processing framework for delay-sensitive applications in cloud of things systems," Journal of Parallel and Distributed Computing, vol. 112, pp. 53–66, 2018.

[12] R. Mahmud, S. N. Srirama, K. Ramamohanarao, and R. Buyya, "Quality of experience (QoE)-aware placement of applications in fog computing environments," Journal of Parallel and Distributed Computing, vol. 132, pp. 190–203, 2018.

[13] M.-H. Chen, B. Liang, and M. Dong, "Joint offloading and resource allocation for computation and communication in mobile cloud with computing access point," in Proceedings of the IEEE Conference on Computer Communications (INFOCOM). IEEE, 2017, pp. 1–9.

[14] Z. Hong, W. Chen, H. Huang, S. Guo, and Z. Zheng, "Multi-hop cooperative computation offloading for industrial IoT-edge-cloud computing environments," IEEE Transactions on Parallel and Distributed Systems, 2019, (in press).

[15] J. L. D. Neto, S.-Y. Yu, D. F. Macedo, J. M. S. Nogueira, R. Langar, and S. Secci, "ULOOF: a user level online offloading framework for mobile edge computing," IEEE Transactions on Mobile Computing, vol. 17, no. 11, pp. 2660–2674, 2018.

[16] H. Wu, W. Knottenbelt, and K. Wolter, "An efficient application partitioning algorithm in mobile environments," IEEE Transactions on Parallel and Distributed Systems, vol. 30, no. 7, pp. 1464–1480, 2019.

[17] L. Lin, P. Li, X. Liao, H. Jin, and Y. Zhang, "Echo: An edge-centric code offloading system with quality of service guarantee," IEEE Access, vol. 7, pp. 5905–5917, 2018.

[18] G. L. Stavrinides and H. D. Karatza, "A hybrid approach to scheduling real-time IoT workflows in fog and cloud environments," Multimedia Tools and Applications, pp. 1–17, 2018.

[19] R. Mahmud, K. Ramamohanarao, and R. Buyya, "Latency-aware application module management for fog computing environments," ACM Transactions on Internet Technology (TOIT), vol. 19, no. 1, p. 9, 2018.

[20] S. Bi, L. Huang, and Y.-J. A. Zhang, "Joint optimization of service caching placement and computation offloading in mobile edge computing system," arXiv preprint arXiv:1906.00711, 2019.

[21] M. Goudarzi, M. Zamani, and A. Toroghi Haghighat, "A genetic-based decision algorithm for multisite computation offloading in mobile cloud computing," International Journal of Communication Systems, vol. 30, no. 10, p. e3241, 2017.

[22] X. Chen, Y.-S. Ong, M.-H. Lim, and K. C. Tan, "A multi-facet survey on memetic computation," IEEE Transactions on Evolutionary Computation, vol. 15, no. 5, pp. 591–607, 2011.

[23] H. Gupta, A. Vahid Dastjerdi, S. K. Ghosh, and R. Buyya, "iFogSim: A toolkit for modeling and simulation of resource management techniques in the internet of things, edge and fog computing environments," Software: Practice and Experience, vol. 47, no. 9, pp. 1275–1296, 2017.

[24] L. Yang, J. Cao, Y. Yuan, T. Li, A. Han, and A. Chan, "A framework for partitioning and execution of data stream applications in mobile cloud computing," ACM SIGMETRICS Performance Evaluation Review, vol. 40, no. 4, pp. 23–32, 2013.

[25] K. Kumar and Y.-H. Lu, "Cloud computing for mobile users: Can offloading computation save energy?" Computer, no. 4, pp. 51–56, 2010.

[26] L. F. Bittencourt, J. Diaz-Montes, R. Buyya, O. F. Rana, and M. Parashar, "Mobility-aware application scheduling in fog computing," IEEE Cloud Computing, vol. 4, no. 2, pp. 26–35, 2017.

Mohammad Goudarzi is working towards the Ph.D. degree at the Cloud Computing and Distributed Systems (CLOUDS) Laboratory, Department of Computing and Information Systems, the University of Melbourne, Australia. He was awarded the Melbourne International Research Scholarship (MIRS) supporting his studies. His research interests include the Internet of Things (IoT), fog computing, wireless networks, and distributed systems.

Huaming Wu received the B.E. and M.S. degrees from Harbin Institute of Technology, China, in 2009 and 2011, respectively, both in electrical engineering. He received the Ph.D. degree with the highest honor in computer science from Freie Universitat Berlin, Germany, in 2015. He is currently an associate professor in the Center for Applied Mathematics, Tianjin University. His research interests include mobile cloud computing, edge computing, fog computing, the Internet of Things (IoT), and deep learning.

Marimuthu Palaniswami is a Fellow of IEEE and a past distinguished lecturer of the IEEE Computational Intelligence Society. He received his Ph.D. from the University of Newcastle, Australia, before joining the University of Melbourne, where he is a Professor of Electrical Engineering. Previously, he was a Co-Director of the Centre of Expertise on Networked Decision & Sensor Systems. He has published more than 500 refereed journal and conference papers, including 3 books and 10 edited volumes.

Rajkumar Buyya is a Redmond Barry Distinguished Professor and Director of the Cloud Computing and Distributed Systems (CLOUDS) Laboratory at the University of Melbourne, Australia. He has authored over 625 publications and seven textbooks including "Mastering Cloud Computing," published by McGraw Hill, China Machine Press, and Morgan Kaufmann for the Indian, Chinese and international markets, respectively. He is one of the most highly cited authors in computer science and software engineering worldwide (h-index=130, g-index=280, 90,300+ citations).