Research Article
Learning-Based QoS Control Algorithms for Next Generation Internet of Things

Sungwook Kim

Department of Computer Science, Sogang University, 35 Baekbeom-ro (Sinsu-dong), Mapo-gu, Seoul 121-742, Republic of Korea

Correspondence should be addressed to Sungwook Kim; [email protected]

Received 24 July 2015; Revised 2 October 2015; Accepted 20 October 2015

Academic Editor: Yassine Hadjadj-Aoul

Hindawi Publishing Corporation, Mobile Information Systems, Volume 2015, Article ID 605357, 8 pages. http://dx.doi.org/10.1155/2015/605357

Copyright © 2015 Sungwook Kim. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The Internet has become an evolving entity, growing in importance and creating new value through its expansion and added utilization. The Internet of Things (IoT) is a new concept associated with the future Internet and has recently become popular in a dynamic and global network infrastructure. However, in an IoT implementation, it is difficult to satisfy different Quality of Service (QoS) requirements and achieve rapid service composition and deployment. In this paper, we propose a new QoS control scheme for IoT systems. Based on the Markov game model, the proposed scheme can effectively allocate IoT resources while maximizing system performance. In multiagent environments, a game theory approach can provide an effective decision-making framework for resource allocation problems. To verify the results of our study, we perform a simulation and confirm that the proposed scheme can achieve considerably improved system performance compared to existing schemes.

1. Introduction

In the past forty years, the Internet has grown into a network that connects an estimated 1.8 billion users and has attained a global penetration rate of almost 25%. Telecommunications and the Internet are forming an increasingly integrated system for processing, storing, accessing, and distributing information and managing content. This convergence is based on the rapid evolution of digital technology and the diffusion of the concept of the Internet. In recent years, the penetration of digital technologies and the evolution towards integrated telecommunications, information technology, and the electronic media sector have advanced actively. Developments in many different technologies are creating a significant, innovative, technical potential for the production, distribution, and consumption of information services [1, 2].

In 1999, Ashton first presented the concept of the Internet of Things (IoT) [2], a technological revolution that promotes a new ubiquitous connectivity, computing, and communication era. The IoT is a vision wherein the Internet extends into our everyday lives through a wireless network of uniquely identifiable objects. Therefore, the development of the IoT depends on dynamic technical innovations in a number of fields, including wireless sensors and nanotechnology [3]. Furthermore, the IoT service infrastructure is expected to promptly evaluate the Quality of Service (QoS) and provide satisfying services by considering things such as user preferences, device capability, and current network status. However, the definition of QoS in the IoT is not clear because it has been poorly studied. To adaptively manage an IoT system, a new QoS control model is necessary. This model must be able to balance network availability with information accuracy in delivering data [4-8].

A fundamental challenge of QoS management is that relatively scarce network resources must be selected and allocated in a prudent manner to maximize system performance [7, 8]. To adaptively allocate network resources, game theory has been widely applied in mission-critical network management problems. Typically, game theory is used to study strategic situations where players choose different actions in an attempt to maximize their payoffs, depending upon the choices of other individuals. Therefore, game theory provides a framework for modeling and analyzing various interactions between intelligent and rational game players in conflict situations [6].




In traditional game models, it is important to define equilibrium strategies as game solutions. Equilibrium strategies are assumed to be the optimal reaction to others, given full knowledge and observability of the payoffs and actions of the other players. Therefore, most equilibrium concepts require that the payoffs and strategies of the other players be known in advance and observed by all players. However, this is a strong assumption that does not hold in the majority of real-life problems. Players in actual situations have only partial knowledge, or no knowledge at all, regarding their environments and the other players evolving around them [6]. To alleviate this difficulty, van der Wal developed the Markov game model [9]. This approach relaxes the strict game-model assumptions by implementing learning algorithms. Through repeated plays, Markov game players effectively consider their current payoffs and a history of observations regarding the strategies of the other players [9, 10].

The main purpose of this paper is to develop an effective QoS control scheme for IoT systems. Based on the Markov game model, we build an intelligent decision-making process that addresses the critical QoS problem of an IoT system. With a real-time learning feedback mechanism, the proposed scheme adapts well to the dynamic requirements of IoT applications. Through online-oriented strategic decisions, the proposed scheme attempts to attain a self-confirming equilibrium, a new solution concept for real-time network systems.

1.1. Related Work. To improve IoT system performance, several QoS control schemes have been proposed to efficiently and integrally allocate IoT resources. The Time-Controlled Resource Sharing (TCRS) scheme [11] is a scheduling scheme that shares resources between Machine-to-Machine (M2M) and Human-to-Human (H2H) communication traffic services. This scheme analytically focuses solely on resource utilization and the QoS of the M2M and H2H traffic, and derives expressions for the blocking probabilities of the M2M and H2H traffic and the percentage resource utilization [11].

The IoT Service Selection (IoTSS) scheme [12] is a model to select, from many candidate services, the appropriate service that satisfies a user's requirements. This scheme considers three core concepts, device, resource, and service, while specifying their relationships. To dynamically aggregate individual QoS ratings and select physical services, the IoTSS scheme designs a Physical Service Selection (PSS) method that considers a user preference and an absolute dominance relationship among the physical services.

The Approximate Dynamic Programming based Prediction (ADPP) scheme [13] is a novel evaluation approach employing prediction strategies to obtain accurate QoS values. Unlike traditional QoS prediction approaches, the ADPP scheme incorporates an approximate dynamic programming based online parameter tuning strategy into the QoS prediction approach. The Services-oriented QoS-aware Scheduling (SQoSS) scheme [5] is a layered QoS scheduling scheme for the service-oriented IoT. The SQoSS scheme explores optimal QoS-aware service composition using the knowledge of each component service. This scheme can effectively handle the scheduling problem in heterogeneous network environments. The main goal of the SQoSS scheme is to optimize the scheduling performance of the IoT network while minimizing the resource costs [5].

The Intelligent Decision-Making Service (IDMS) scheme [4] constructs a context-oriented QoS model according to the Analytical Hierarchy Process (AHP). Using this hierarchical clustering algorithm, the IDMS scheme can make intelligent decisions while fully considering the users' feedback. These earlier studies have attracted significant attention and introduced unique challenges in efficiently solving the QoS control problem. Compared to these schemes [4, 5, 13], the proposed scheme attains improved performance during IoT system operations.

The remainder of this paper is organized as follows. The proposed game model is formulated in Section 2, where we introduce a Markov decision process to solve the QoS problem and explain the proposed IoT resource allocation algorithm in detail. In Section 3, we verify the effectiveness and efficiency of the proposed scheme from simulation results. We draw conclusions in Section 4.

2. Proposed QoS Control Algorithms for IoT Systems

In this section, we describe the proposed algorithm in detail. The algorithm implements a game theory technique, which appears to be a natural approach to the QoS control problem. Employing a Markov game process, we can effectively model the uncertainties in the current system environment. The proposed algorithm significantly improves the success rate of the IoT services.

2.1. Markov Game Model for IoT Systems. Network services are operated based on the Open Systems Interconnection model (OSI model). In this study, we design the proposed scheme using a three-layered (i.e., application, network, and sensing layers) QoS architecture. At the application layer, an application is selected to establish a connection, and decisions are made by the user and the QoS scheduling engine [5]. At the network layer, the QoS module must allocate network resources to the services that are selected in the application layer. The decision-making process at this layer may involve QoS attributes that are used in traditional QoS mechanisms over networks [5]. At the sensing layer, the decision-making process involves the selection of a basic sensing infrastructure based on sensing ability and the required QoS for applications. The QoS module at the sensing layer is responsible for the selection of the basic sensing devices [5].

In this study, we investigate learning algorithms using uncertain, dynamic, and incomplete information, and develop a new adaptive QoS scheduling algorithm that has an intelligent decision-making process useful in IoT systems. For the interactive decisions of the IoT system agents, we formulate a multiple decision-making process using a game model while studying a multiagent learning approach. Using this technique, the proposed scheme can effectively improve the QoS in IoT systems.

Learning is defined as the capability of making intelligent decisions by self-adapting to the dynamics of the environment, considering experience gained in past and present system states, and using long-term benefit estimations. This approach can be viewed as self-play, where either a single player or a population of players evolves during competitions in a repeated game. During the operation of an IoT system, learning is driven by the amount of information available from every QoS scheduler [14]. As indicated by traditional methods, complete information significantly improves performance with respect to partial observability; however, the control overhead results in a lack of practical implementations. Consequently, a tradeoff must be made, considering that the capability to make autonomous decisions is a desirable property of self-organized IoT systems [5, 14].

The Markov decision-making process is a well-established mathematical framework for solving sequential decision problems using probabilities. It models a decision-making system where an action must be taken in each state. Each action may have different probabilistic outcomes that change the system's state. The goal of the Markov decision process is to determine a policy that dictates the best action to take in each state. By adopting the learning Markov game approach, the proposed model allows distributed QoS schedulers to learn the optimal strategy one step at a time. Within each step, the repeated game strategy is applied to ensure cooperation among the QoS schedulers. The well-known Markov decision process can be extended in a straightforward manner to create multiplayer Markov games. In a Markov game, actions are the result of the joint action selection of all players, and payoffs and state transitions depend on these joint actions. Therefore, payoffs are sensed for combinations of actions taken by different players, and players learn in a product or joint action space. From the obtained data, players can adapt to changing environments, improve performance based on their experience, and make progress in understanding fundamental issues [5, 9, 10].

In the proposed QoS control algorithm, the game model is defined as a tuple ⟨S, N, {A_k}_{1≤k≤N}, {U_k}_{1≤k≤N}, T⟩, where S is the set of all possible states and N is the number of players. In the proposed model, each state is the resource allocation status in the IoT system. A_k = {a_1, a_2, ..., a_m} is the collection of strategies for player k, where m is the number of possible strategies. Actions are the joint result of multiple players choosing a strategy individually. In the proposed Markov game, QoS schedulers are assumed to be the game players, and the collection of strategies for each player is the set of availabilities of system resources. U_k : S × A_1 × A_2 × ⋯ × A_N → ℝ is the utility function, where ℝ represents the set of real numbers. T : S × A_1 × A_2 × ⋯ × A_N → Δ(S) is the state transition function, where Δ(S) is the set of discrete probability distributions over the set S. Therefore, T(s_t, a_1, a_2, ..., a_N, s_{t+1}) is the probability of arriving in state s_{t+1} when each agent takes an action a_i at state s_t, where s_t, s_{t+1} ∈ S [5, 9, 10].

In the developed game model, players seek to choose their strategies independently and self-interestedly to maximize their payoffs. Each strategy represents an amount of system resource, and the utility function measures the outcome of this decision. Therefore, different players can receive different payoffs for the same state transition. By considering the allocated resource amount, delay, and price, the utility function (U) of each player is defined as follows:

    U(x) = ω · exp( (T(x)/T̄)^{0.5−ε} / (τ_M − τ)^{0.5+ε} ) − c(x, ξ),
    s.t. ε ∈ [−0.5, 0.5],                                                (1)

where ω represents the player's willingness to pay for his perceived service worth, T̄ is the system's average throughput, and T(x) is the player's current throughput with the allocated resource x; this is the rate of successful data delivery over a communication channel. τ_M and τ are the maximum delay and the observed delay of the application services, respectively; τ is measured from real network operations. In a real-time online manner, each QoS scheduler actually measures T̄, T(x), and τ. c(x, ξ) is the cost function, and ξ is the price for a resource unit; τ is obtained according to the processing and arrival service rates. In a distributed, self-regarding fashion, each player (i.e., QoS scheduler) is independently interested in the sole goal of maximizing his utility function as follows:

    max_x U(x),  where c(x, ξ) = ((x · ξ) / (x_A · ξ))^q,                (2)

where x is the resource allocated to its own QoS scheduler, x_A is the average resource amount of all QoS schedulers, and q is a cost parameter for the cost function c(x, ξ). The cost function is defined as the ratio of its own obtained resource to the average resource amount of all the QoS schedulers. Therefore, other players' decisions are returned to each player. This iterative feedback procedure continues under IoT system dynamics. In this study, QoS schedulers can modify their actions in an effort to maximize their U(x) in a distributed manner. This approach can significantly reduce the computational complexity and control overheads. Therefore, it is practical and suitable for real-world system implementation.
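As a concrete illustration, the utility and cost computations in (1) and (2) can be sketched in Python. The parameter defaults follow Table 1; the measured quantities (throughput, delay, average allocation) and the simple throughput model in the usage example are hypothetical inputs, not values prescribed by the scheme.

```python
import math

def cost(x, x_avg, q=1.1):
    """Cost function c(x, xi) from (2): the ratio of the scheduler's own
    allocation x to the average allocation x_A, raised to the power q.
    The unit price xi cancels out of the ratio."""
    return (x / x_avg) ** q

def utility(x, throughput_x, throughput_avg, tau, tau_max, x_avg,
            omega=1.2, eps=-0.2, q=1.1):
    """Utility U(x) from (1): an exponential throughput/delay gain term
    scaled by the willingness-to-pay omega, minus the allocation cost."""
    gain = (throughput_x / throughput_avg) ** (0.5 - eps)
    delay_margin = (tau_max - tau) ** (0.5 + eps)
    return omega * math.exp(gain / delay_margin) - cost(x, x_avg, q)

# Each scheduler picks the strategy maximizing its own utility, as in (2).
# The throughput model T(x) = x / x_avg below is purely illustrative.
strategies = [2.5, 3.0, 3.5]  # Mbps, the strategy set used in Section 3
T_avg, tau, tau_max, x_avg = 1.0, 0.2, 1.0, 3.0
best = max(strategies,
           key=lambda x: utility(x, x / x_avg, T_avg, tau, tau_max, x_avg))
```

Because the exponents 0.5 − ε and 0.5 + ε split the weight between the two terms, a larger ε shifts the emphasis from throughput towards the delay margin.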

2.2. Markov Decision Process for QoS Control Problems. In this work, we study the method by which a player (i.e., a QoS scheduler) in a dynamic IoT system learns an uncertain network situation and arrives at a control decision by considering the online feedback mechanism. With an iterative learning process, the players' decision-making mechanism is developed as a Markov game model, which is an effective method for the players' decision mechanism. If players change their strategies, the system state may change. Beyond the immediate payoff (U(S_0, a_i(0))) of the current state S_0 and action a_i(0), players must consider the future payoffs. With the current payoff, player i's long-term expected payoff (V_i(S_0, a_i(0))) is given by [5]


    V_i(S_0, a_i(0)) = max_{a_i(t), 0≤t≤∞} [ U_i(S_0, a_i(0)) + Σ_{t=1}^{∞} β^t · U_i(S_t, a_i(t)) ],
    s.t. a_i(t) ∈ A_i,                                               (3)

where a_i(t) and U_i(S_t, a_i(t)) are player i's action and expected payoff at time t, respectively, and β is a discount factor for the future state. During game operations, each combination of starting state, action choice, and next state has an associated transition probability. Based on the transition probability, (3) can be rewritten in the recursive Bellman equation form given in [5]:

    V_i(S) = max_{a_S} [ U_i(S, a_S) + γ · Σ_{S′∈S} P_i(S′ | S, a_S) · V_i(S′) ],
    s.t. a_S ∈ A_i,                                                  (4)

where S′ represents all possible next states of S, and γ can be regarded as the probability that the player remains at the selected strategy. P_i(S′ | S, a_S) is the state transition probability from state S to state S′; S and S′ are elements of the system state set S. In this study, N is the number of QoS schedulers and m is the number of possible strategies for each scheduler. Therefore, there are in total m^N system states.
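Equation (4) can be solved numerically by standard value iteration; the sketch below, run on a toy two-state model, is only an illustration of the Bellman recursion (the paper's distributed procedure instead learns the transition probabilities online via (5) and (6)).

```python
def value_iteration(states, actions, U, P, gamma=0.3, tol=1e-9):
    """Iterate the Bellman equation (4) for one player until the value
    function stabilizes. U[s][a] is the immediate payoff U_i(S, a_S);
    P[s][a][s2] is the transition probability P_i(S' | S, a_S); the
    discount gamma follows Table 1."""
    V = {s: 0.0 for s in states}
    while True:
        V_next = {
            s: max(U[s][a] + gamma * sum(P[s][a][s2] * V[s2] for s2 in states)
                   for a in actions)
            for s in states
        }
        if max(abs(V_next[s] - V[s]) for s in states) < tol:
            return V_next
        V = V_next

# Toy example: two resource-allocation states, "stay" or "move" actions,
# deterministic transitions; staying is optimal in both states.
states, actions = ["s0", "s1"], ["stay", "move"]
U = {"s0": {"stay": 1.0, "move": 0.0}, "s1": {"stay": 2.0, "move": 0.0}}
P = {"s0": {"stay": {"s0": 1.0, "s1": 0.0}, "move": {"s0": 0.0, "s1": 1.0}},
     "s1": {"stay": {"s0": 0.0, "s1": 1.0}, "move": {"s0": 1.0, "s1": 0.0}}}
V = value_iteration(states, actions, U, P)  # here V(s) = U[s]["stay"] / (1 - gamma)
```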

Determining P(S′ | S, a_S) is a distributed multiplayer probability decision problem. Using the multiplayer learning algorithm, each player independently learns the current IoT system situation to dynamically determine P(S′ | S, a_S). This approach can effectively control a Markov game process with unknown transition probabilities and payoffs. In the proposed algorithm, the players are assumed to be interconnected by allowing them to play a repeated game with the same environment. Assume there is a finite set of strategies A^k(t) = {a^k_1(t), ..., a^k_m(t)} chosen by player k at game iteration t, where m is the number of possible strategies. Correspondingly, U^k(t) = (u^k_1(t), ..., u^k_m(t)) is a vector of specified payoffs for player k. If player k plays action a^k_l (1 ≤ l ≤ m), he earns the payoff u^k_l with probability p^k_l. P^k(t) = {p^k_1(t), ..., p^k_m(t)} is defined as player k's probability distribution.

Actions chosen by the players are input to the environment, and the environmental response to these actions serves as input to each player. Therefore, multiple players are connected in a feedback loop with the environment. When a player selects an action with his respective probability distribution P(·), the environment produces a payoff U(·) according to (1). Therefore, P(·) must be adjusted adaptively to contend with the payoff fluctuation. At every game round, all players update their probability distributions based on the online responses of the environment. If player k chooses a^k_l at time t, this player updates P^k(t+1) as follows:

    p^k_j(t+1) = f( p^k_j(t) + ψ · [ (u^k_l(t) − u^k_l(t−1)) / u^k_l(t−1) ] ),   if j = l,
    p^k_j(t+1) = φ · p^k_j(t),                                                   if j ≠ l,

    s.t. f(χ) = 0 if χ < 0,   f(χ) = χ if 0 ≤ χ ≤ 1,   f(χ) = 1 if χ > 1,        (5)

where φ is a discount factor and ψ is a parameter that controls the learning step size from p(t) to p(t+1). In general, small values of ψ correspond to slower rates of convergence, and vice versa. According to (5), P^k(S′ | S, a_S) is defined based on the Boltzmann distribution:

    P^k(S′ | S, a_S) = exp((1/λ) · p^k_{a_S}(t)) / Σ_{j∈A^k} exp((1/λ) · p^k_j(t)),
    s.t. a_S ∈ A^k(t) = {a^k_1(t), ..., a^k_m(t)},                               (6)

where λ is a control parameter. Strategies are chosen in proportion to their payoffs; however, their relative probability is adjusted by λ. A value of λ close to zero allows minimal randomization, and a large value of λ results in complete randomization.
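Taken together, (5) and (6) form a simple learning loop: reinforce the played action by its relative payoff change, then turn the learned values into selection probabilities. A minimal sketch, with the parameter defaults from Table 1 (ψ = 1, φ = 0.8, λ = 1):

```python
import math

def update_distribution(p, l, u_now, u_prev, psi=1.0, phi=0.8):
    """Update rule (5): the played action l is adjusted by the relative
    payoff change, clipped to [0, 1] by f; other actions are discounted
    by phi."""
    def f(chi):
        return min(max(chi, 0.0), 1.0)
    return [f(pj + psi * (u_now - u_prev) / u_prev) if j == l else phi * pj
            for j, pj in enumerate(p)]

def boltzmann(p, lam=1.0):
    """Boltzmann selection (6): a softmax over the learned values p, with
    the temperature lam controlling the amount of randomization."""
    weights = [math.exp(pj / lam) for pj in p]
    total = sum(weights)
    return [w / total for w in weights]

p = [1/3, 1/3, 1/3]                                     # uniform start
p = update_distribution(p, l=0, u_now=2.0, u_prev=1.0)  # action 0 paid off
probs = boltzmann(p)                                     # action 0 now favored
```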

2.3. The Main Steps of the Proposed Scheme. To allow optimal movement in multischeduler systems, we consider the consequences of using the Markov game model by implementing an adaptive learning algorithm that attempts to learn an optimal action based on past actions and environmental feedback. Although there are learning algorithms to construct a game model, minimal research has been conducted on integrating learning algorithms with the decision-making process where players are uncertain regarding the real world and the influence of their decisions on each other.

In the proposed learning-based Markov decision process, a single QoS scheduler interacts with an environment defined by a probabilistic transition function. From the results of the individual learning experiences, each scheduler can learn how to effectively play under dynamic network situations. As the proposed learning algorithm proceeds and the various actions are tested, the QoS scheduler acquires increasingly more information. That is, the payoff estimation at each game iteration can be used to update P(S′ | S, a_S) in such a manner that those actions with a large payoff are more likely to be chosen again in the next iteration. To maximize their expected payoffs, QoS schedulers adaptively modify their current strategies. This adjustment process is sequentially repeated until the change of expected payoff (V(·)) is within a predefined minimum bound (Δ). When no further strategy modifications are made by all the QoS schedulers, the IoT system has attained a stable status. The proposed algorithm for this approach is described by Pseudocode 1 and the following steps.

Table 1: System parameters used in the simulation experiments.

Traffic class | Message application                   | Bandwidth requirement | Connection duration (average)
--------------+---------------------------------------+-----------------------+------------------------------------
I             | Delay-critical emergency applications | 32 Kbps               | 30 sec (0.5 min)
II            | Event-related applications            | 32 Kbps / 64 Kbps     | 120 sec (2 min) / 180 sec (3 min)
III           | General applications                  | 128 Kbps / 256 Kbps   | 120 sec (2 min) / 180 sec (3 min)
IV            | Multimedia applications               | 384 Kbps / 512 Kbps   | 300 sec (5 min) / 120 sec (2 min)

Parameter | Value | Description
----------+-------+------------------------------------------------------------------
ω         | 1.2   | The player's willingness to pay for his perceived service worth
ε         | −0.2  | The control parameter between throughput and delay
q         | 1.1   | The estimation parameter of the cost function
γ         | 0.3   | The probability that the user keeps staying at the selected strategy
Δ         | 1     | Predefined minimum bound for stable status
ξ         | 1     | The price for a resource unit in the cost function
m         | 3     | The number of strategies for QoS schedulers
ψ         | 1     | A parameter to control the learning size
φ         | 0.8   | A discount factor for the respective probability distribution
λ         | 1     | A control parameter on the Boltzmann distribution

Init():
  (1) p(·) = 1/m
  (2) Control parameter values (ω, ε, q, γ, ξ, Δ, ψ, φ, and λ) are given from Table 1

Main QoS control():
  Start: Init()
  For():
    (3) U(x) is obtained from (1) and (2)
    (4) p(·) is adjusted by using (5)
    (5) P(S′ | S, a_S) is defined by using (6)
    (6) a(t) is selected to maximize V(·) based on (4)
    (7) IF (|V^(t+1)(·) − V^(t)(·)| < Δ) Temp() ELSE continue

Temp():
  (8) For(): IF (|V^(t+1)(·) − V^(t)(·)| < Δ) continue ELSE break

Pseudocode 1: IoT system QoS control procedure.

Step 1. To begin, p(·) is set to be equally distributed (p(·) = 1/m, where m is the number of strategies). This starting guess guarantees that each strategy enjoys the same selection probability at the start of the game.

Step 2. Control parameters ω, ε, q, γ, ξ, Δ, ψ, φ, and λ are provided to each QoS scheduler from the simulation scenario (refer to Table 1).


Step 3. Based on the current IoT situation, each QoS scheduler estimates his utility function (U(x)) according to (1) and (2).

Step 4. Using (5), each QoS scheduler periodically adjusts the p(·) values.

Step 5. Based on the probability distribution P(·), each P(S′ | S, a_S) is defined using the Boltzmann distribution.

Step 6. Iteratively, each QoS scheduler selects a strategy (a(t)) to maximize his long-term expected payoff (V(·)). This sequential learning process is repeatedly executed in a distributed manner.

Step 7. If a QoS scheduler attains a stable status (i.e., |V^(t+1)(·) − V^(t)(·)| < Δ), this scheduler is assumed to have obtained an equilibrium strategy. When all QoS schedulers achieve a stable status, the game process is temporarily stopped.

Step 8. Each QoS scheduler continuously self-monitors the current IoT situation and proceeds to Step 3 for the next iteration.
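The eight steps can be compressed into a single control loop per scheduler. The sketch below is a simplification under two stated assumptions: `payoff` is a hypothetical stand-in for the utility measured in Step 3, and the long-term payoff V(·) of Steps 6-7 is approximated by the best current utility estimate rather than by the full Markov lookahead of (4).

```python
def qos_control(payoff, strategies, psi=1.0, phi=0.8, delta=0.01,
                max_iters=200):
    """Simplified single-scheduler version of Pseudocode 1 (Steps 1-8)."""
    m = len(strategies)
    p = [1.0 / m] * m                      # Steps 1-2: uniform start
    choice, u_prev, v_prev = 0, None, None
    for _ in range(max_iters):
        u = payoff(strategies[choice])     # Step 3: observe own utility
        if u_prev is not None and u_prev != 0:
            adj = psi * (u - u_prev) / abs(u_prev)      # update rule (5)
            p = [min(max(pj + adj, 0.0), 1.0) if j == choice else phi * pj
                 for j, pj in enumerate(p)]             # Step 4
        u_prev = u
        # Steps 5-6: pick the next strategy; here greedily over current
        # utility estimates instead of maximizing V(.) via (4)
        estimates = [payoff(a) for a in strategies]
        choice = max(range(m), key=lambda j: estimates[j])
        v = estimates[choice]
        if v_prev is not None and abs(v - v_prev) < delta:
            break                          # Steps 7-8: stable status
        v_prev = v
    return strategies[choice]
```

For a payoff peaked at 3.0 Mbps, e.g. `qos_control(lambda a: -(a - 3.0) ** 2, [2.5, 3.0, 3.5])`, the loop settles on the 3.0 Mbps strategy.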

3. Performance Evaluation

In this section, we compare the performance of the proposed scheme with other existing schemes [4, 5, 13] and confirm the performance superiority of the proposed approach using a simulation model. Our simulation model is a representation of an IoT system that includes system entities and the behavior and interactions of these entities. To facilitate the development and implementation of our simulator, Table 1 lists the system parameters.

Our simulation results were achieved using MATLAB, which is widely used in academic and research institutions in addition to industrial enterprises. To emulate a real-world scenario, the assumptions of our simulation environment were as follows:

(i) The simulated system consisted of four QoS schedulers for the IoT system.

(ii) In each scheduler coverage area, new service requests arrived according to a Poisson process with rate ρ (services/s), and the offered service load was varied from 0 to 3.0.

(iii) There were three strategies (m) for the QoS schedulers, and each strategy (a_i, 1 ≤ i ≤ m) was a_i ∈ {2.5 Mbps, 3.0 Mbps, 3.5 Mbps}. Therefore, there were in total m^N, that is, 3^4, system states, with S = {(2.5 Mbps, 2.5 Mbps, 2.5 Mbps, 2.5 Mbps), ..., (3.5 Mbps, 3.5 Mbps, 3.5 Mbps, 3.5 Mbps)}.

(iv) The resource of the IoT system was bandwidth (bps), and the total resource amount was 140 Mbps.

(v) Network performance measures, obtained on the basis of 50 simulation runs, were plotted as functions of the offered traffic load.

Figure 1: Resource usability in IoT systems (proposed scheme versus the SQoSS, IDMS, and ADPP schemes, as a function of the service request rate).

(vi) The message size of each application was exponentially distributed, with different means for different message applications.

(vii) For simplicity, we assumed the absence of physical obstacles in the experiments.

(viii) The performance criteria obtained through simulation were resource usability, service availability, and normalized service delay.

(ix) Resource usability was defined as the percentage of the resource actually used.

(x) Service availability was the success ratio of the service requests.

(xi) The normalized service delay was the service delay measured from real network operations, presented in normalized form.
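The state space and arrival process assumed above can be sketched as follows. This is an illustrative reconstruction, not the authors' MATLAB code; the Poisson sampler uses Knuth's standard inverse-transform method.

```python
import itertools
import math
import random

STRATEGIES = [25, 30, 35]   # Mbps, the three strategies per QoS scheduler
N_SCHEDULERS = 4
TOTAL_RESOURCE = 140        # Mbps, total IoT system bandwidth

# All m^N = 3^4 = 81 joint system states,
# from (25, 25, 25, 25) up to (35, 35, 35, 35).
STATES = list(itertools.product(STRATEGIES, repeat=N_SCHEDULERS))

def poisson_arrivals(rho, rng):
    """Number of new service requests in one slot at rate rho (services/s),
    sampled with Knuth's inverse-transform method."""
    limit, k, p = math.exp(-rho), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

print(len(STATES))  # -> 81
print(max(sum(s) for s in STATES) <= TOTAL_RESOURCE)  # -> True
```

Note that even the maximal joint allocation (35 Mbps × 4 = 140 Mbps) fits within the total system bandwidth, so every joint state is feasible.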

In this paper, we compared the performance of the proposed scheme with that of the existing SQoSS [5], IDMS [4], and ADPP [13] schemes. These existing schemes were recently developed as effective IoT management algorithms.

Figure 1 presents the performance comparison of each scheme in terms of resource usability in IoT systems. In this study, resource usability is a measure of how system resources are used. Traditionally, monitoring how resources are used is one of the most critical aspects of IoT management. During the system operations, all schemes produced similar resource usability trends. However, the proposed scheme adaptively allocates resources to the IoT system in an incremental manner while ensuring different requirements. Therefore, the resource usability produced by the proposed scheme was higher than that of the other schemes from low to heavy service load intensities.

Figure 2 represents the service availability of each IoT control scheme. In this study, service availability is defined as the success ratio of the service requests. In general, excellent service availability is a highly desirable property for real

Mobile Information Systems 7

Figure 2: Service availability in IoT systems (proposed scheme versus the SQoSS, IDMS, and ADPP schemes, as a function of the service request rate).

Figure 3: Normalized service delay in IoT systems (proposed scheme versus the SQoSS, IDMS, and ADPP schemes, as a function of the service request rate).

world IoT operations. As indicated in the results, it is clear that the performance trends are similar. As the service request rate increases, it can saturate or exceed the system capacity. Therefore, excessive service requests may lead to system congestion, decreasing the service availability. This is intuitively correct. Under various application service requests, the proposed game-based approach can provide a higher traffic service than the other schemes. From the above results, we conclude that the proposed scheme can provide higher service availability in IoT systems.

The curves in Figure 3 illustrate the normalized service delay for IoT services under different service loads. Typically, service delay is an important QoS metric and can reveal the fitness or unfitness of system protocols for different delay-sensitive applications. Owing to the feedback-based Markov

game approach, the proposed scheme can dynamically adapt to the current situation and has a significantly lower service delay than the other schemes. From the results, we can observe that the proposed approach can support delay-sensitive applications and ensure a latency reduction in IoT services.

The simulation results presented in Figures 1-3 demonstrate the performance of the proposed and other existing schemes and verify that the proposed Markov game-based scheme can provide attractive network performance. The main features of the proposed scheme are as follows: (i) a new Markov game model based on a distributed learning approach is established; (ii) each QoS scheduler learns the uncertain system state according to local information; (iii) schedulers make decisions to maximize their own expected payoff by considering network dynamics; and (iv) when selecting a strategy, schedulers consider not only the immediate payoff but also the subsequent decisions. The proposed scheme constantly monitors the current network conditions for adaptive IoT system management and successfully approximates the optimized performance. As expected, the proposed scheme outperformed the existing schemes [4, 5, 13].

4. Summary and Conclusions

Today, IoT-based services and applications are becoming an integral part of our everyday life. It is foreseeable that the IoT will be a part of the future Internet, where "things" can be wirelessly organized as a global network that provides dynamic services for applications and users. Therefore, IoT technology can bridge the gap between the virtual network and the "real things" world. Innovative uses of IoT techniques on the Internet will not only provide benefits to users in accessing wide ranges of data sources but also generate challenges in accessing heterogeneous application data, especially in the dynamic environment of real-time IoT systems.

This paper addressed a QoS control algorithm for IoT systems. Using the learning-based Markov game model, QoS schedulers iteratively observed the current situation and repeatedly modified their strategies to effectively manage system resources. Using a step-by-step feedback process, the proposed scheme effectively approximated the optimized system performance in an entirely distributed manner. The most important novelties of the proposed scheme are its adaptability and responsiveness to current system conditions. Compared with the existing schemes, the simulation results confirmed that the proposed game-based approach could improve the performance under dynamically changing IoT system environments, whereas the other existing schemes could not offer such attractive performance. Resource usability, service availability in IoT systems, normalized service delay, and accuracy were improved by approximately 5%, 10%, 10%, and 5%, respectively, compared to the existing schemes.

Furthermore, our study opens the door to several interesting extensions. In the future, we plan to design new reinforcement-learning models and develop adaptive online feedback algorithms. This is a potential direction and possible


extension to this study and can further improve the performance of IoT systems. Moreover, it would be interesting to extend the Markov game model to various decision-theoretic frameworks. Under uncertain system environments, this would be an interesting topic for future research.

Conflict of Interests

The author, Sungwook Kim, declares that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This research was supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) Support Program (IITP-2015-H8501-15-1018) supervised by the IITP (Institute for Information & Communications Technology Promotion), and by the Sogang University Research Grant of 2014 (20141002001).

References

[1] G. Sallai, "Chapters of future internet research," in Proceedings of the 4th IEEE International Conference on Cognitive Infocommunications (CogInfoCom '13), pp. 161–166, IEEE, Budapest, Hungary, December 2013.

[2] K. Ashton, "That 'Internet of Things' thing: in the real world things matter more than ideas," RFID Journal, 2009, http://www.rfidjournal.com/article/print/4986.

[3] Q. Wu, G. Ding, Y. Xu et al., "Cognitive internet of things: a new paradigm beyond connection," IEEE Internet of Things Journal, vol. 1, no. 2, pp. 129–143, 2014.

[4] Q. Zhang and D. Peng, "Intelligent decision-making service framework based on QoS model in the internet of things," in Proceedings of the 11th International Symposium on Distributed Computing and Applications to Business, Engineering and Science (DCABES '12), pp. 103–107, Guilin, China, October 2012.

[5] L. Li, S. Li, and S. Zhao, "QoS-aware scheduling of services-oriented internet of things," IEEE Transactions on Industrial Informatics, vol. 10, no. 2, pp. 1497–1507, 2014.

[6] S. Kim, "Adaptive ad-hoc network routing scheme by using incentive-based model," Ad Hoc & Sensor Wireless Networks, vol. 15, no. 2, pp. 107–125, 2012.

[7] G. Pujolle, "Metamorphic networks," Journal of Computing Science and Engineering, vol. 7, no. 3, pp. 198–203, 2013.

[8] I. Jang, D. Pyeon, S. Kim, and H. Yoon, "A survey on communication protocols for wireless sensor networks," Journal of Computing Science and Engineering, vol. 7, no. 4, pp. 231–241, 2013.

[9] J. van der Wal, "Discounted Markov games: successive approximation and stopping times," International Journal of Game Theory, vol. 6, no. 1, pp. 11–22, 1977.

[10] P. Vrancx, K. Verbeeck, and A. Nowé, "Decentralized learning in Markov games," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 38, no. 4, pp. 976–981, 2008.

[11] K. Edemacu and T. Bulega, "Resource sharing between M2M and H2H traffic under time-controlled scheduling scheme in LTE networks," in Proceedings of the 8th International Conference on Telecommunication Systems, Services, and Applications (TSSA '14), pp. 1–6, Kuta, Indonesia, October 2014.

[12] X. Jin, S. Chun, J. Jung, and K.-H. Lee, "IoT service selection based on physical service model and absolute dominance relationship," in Proceedings of the 7th IEEE International Conference on Service-Oriented Computing and Applications (SOCA '14), pp. 65–72, Matsue, Japan, November 2014.

[13] X. Luo, H. Luo, and X. Chang, "Online optimization of collaborative web service QoS prediction based on approximate dynamic programming," in Proceedings of the International Conference on Identification, Information and Knowledge in the Internet of Things (IIKI '14), pp. 80–83, IEEE, Beijing, China, October 2014.

[14] A. Imran, M. Bennis, and L. Giupponi, "Use of learning, game theory and optimization as biomimetic approaches for self-organization in macro-femtocell coexistence," in Proceedings of the IEEE Wireless Communications and Networking Conference Workshops (WCNCW '12), pp. 103–108, IEEE, Paris, France, April 2012.


In traditional game models, it is important to define equilibrium strategies as game solutions. Equilibrium strategies are assumed to be the optimal reaction to others, given full knowledge and observability of the payoffs and actions of the other players. Therefore, most equilibrium concepts require that the payoffs and strategies of the other players be known in advance and observed by all players. However, this is a strong assumption that does not hold in the majority of real-life problems. Players in actual situations have only partial knowledge, or no knowledge at all, regarding their environments and the other players evolving around them [6]. To alleviate this difficulty, van der Wal developed the Markov game model [9]. This approach relaxes the strict game model assumptions by implementing learning algorithms. Through repeated plays, Markov game players effectively consider their current payoffs and a history of observations regarding the strategies of the other players [9, 10].

The main purpose of this paper is to develop an effective QoS control scheme for IoT systems. Based on the Markov game model, we build an intelligent decision-making process that addresses the critical QoS problem of an IoT system. With a real-time learning feedback mechanism, the proposed scheme adapts well to the dynamic requirements of IoT applications. Through online-oriented strategic decisions, the proposed scheme attempts to attain a self-confirming equilibrium, a new solution concept for real-time network systems.

1.1. Related Work. To improve IoT system performance, several QoS control schemes have been proposed to efficiently and integrally allocate IoT resources. The Time-Controlled Resource Sharing (TCRS) scheme [11] is a scheduling scheme that shares resources between Machine-to-Machine (M2M) and Human-to-Human (H2H) communication traffic services. This scheme focuses solely on resource utilization and the QoS of the M2M and H2H traffic, and analytically derives expressions for the blocking probabilities of the M2M and H2H traffic and the percentage resource utilization [11].

The IoT Service Selection (IoTSS) scheme [12] is a model for selecting, from many candidate services, the appropriate service that satisfies a user's requirements. This scheme considers three core concepts, device, resource, and service, while specifying their relationships. To dynamically aggregate individual QoS ratings and select physical services, the IoTSS scheme designs a Physical Service Selection (PSS) method that considers a user preference and an absolute dominance relationship among the physical services.

The Approximate Dynamic Programming based Prediction (ADPP) scheme [13] is a novel evaluation approach employing prediction strategies to obtain accurate QoS values. Unlike traditional QoS prediction approaches, the ADPP scheme is realized by incorporating an approximate dynamic programming based online parameter tuning strategy into the QoS prediction approach. The Services-oriented QoS-aware Scheduling (SQoSS) scheme [5] is a layered QoS scheduling scheme for the service-oriented IoT. The SQoSS scheme explores optimal QoS-aware service composition using the knowledge of each component service. This scheme

can effectively handle the scheduling problem in heterogeneous network environments. The main goal of the SQoSS scheme is to optimize the scheduling performance of the IoT network while minimizing the resource costs [5].

The Intelligent Decision-Making Service (IDMS) scheme [4] constructs a context-oriented QoS model according to the Analytical Hierarchy Process (AHP). Using a hierarchical clustering algorithm, the IDMS scheme can make intelligent decisions while fully considering the users' feedback. These earlier studies have attracted significant attention and introduced unique approaches to efficiently solve the QoS control problem. Compared to these schemes [4, 5, 13], the proposed scheme attains improved performance during IoT system operations.

The remainder of this paper is organized as follows. The proposed game model is formulated in Section 2, where we introduce a Markov decision process to solve the QoS problem and explain the proposed IoT resource allocation algorithm in detail. In Section 3, we verify the effectiveness and efficiency of the proposed scheme through simulation results. We draw conclusions in Section 4.

2. Proposed QoS Control Algorithms for IoT Systems

In this section, we describe the proposed algorithm in detail. The algorithm implements a game theory technique, which appears to be a natural approach to the QoS control problem. Employing a Markov game process, we can effectively model the uncertainties in the current system environment. The proposed algorithm significantly improves the success rate of the IoT services.

2.1. Markov Game Model for IoT Systems. Network services are operated based on the Open Systems Interconnection model (OSI model). In this study, we design the proposed scheme using a three-layered (i.e., application, network, and sensing layers) QoS architecture. At the application layer, an application is selected to establish a connection, and decisions are made by the user and the QoS scheduling engine. In general, the QoS module must allocate network resources to the services that are selected in the application layer [5]. At the network layer, the QoS module allocates network resources to the selected services. The decision-making process at this layer may involve QoS attributes that are used in traditional QoS mechanisms over networks [5]. At the sensing layer, the decision-making process involves the selection of a basic sensing infrastructure based on sensing ability and the required QoS for applications. The QoS module at the sensing layer is responsible for the selection of the basic sensing devices [5].

In this study, we investigate learning algorithms using uncertain, dynamic, and incomplete information, and develop a new adaptive QoS scheduling algorithm with an intelligent decision-making process useful in IoT systems. For the interactive decisions of the IoT system agents, we formulate a multiple decision-making process using a game model while studying a multiagent learning approach. Using this


technique, the proposed scheme can effectively improve the QoS in IoT systems.

Learning is defined as the capability of making intelligent decisions by self-adapting to the dynamics of the environment, considering experience gained in past and present system states, and using long-term benefit estimations. This approach can be viewed as self-play, where either a single player or a population of players evolves during competitions in a repeated game. During the operation of an IoT system, learning is driven by the amount of information available from every QoS scheduler [14]. As indicated in the traditional methods, complete information significantly improves performance with respect to partial observability; however, the control overhead results in a lack of practical implementations. Consequently, a tradeoff must be made, considering that the capability to make autonomous decisions is a desirable property of self-organized IoT systems [5, 14].

The Markov decision-making process is a well-established mathematical framework for solving sequential decision problems using probabilities. It models a decision-making system where an action must be taken in each state. Each action may have different probabilistic outcomes that change the system's state. The goal of the Markov decision process is to determine a policy that dictates the best action to take in each state. By adopting the learning Markov game approach, the proposed model allows distributed QoS schedulers to learn the optimal strategy one step at a time. Within each step, the repeated game strategy is applied to ensure cooperation among the QoS schedulers. The well-known Markov decision process can be extended in a straightforward manner to create multiplayer Markov games. In a Markov game, actions are the result of the joint action selection of all players, and payoffs and state transitions depend on these joint actions. Therefore, payoffs are sensed for combinations of actions taken by different players, and players learn in a product or joint action space. From the obtained data, players can adapt to changing environments, improve performance based on their experience, and make progress in understanding fundamental issues [5, 9, 10].
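To make the joint-action structure concrete, the following toy sketch samples a next state from a transition function T that depends on the players' joint action. The state names, actions, and probabilities here are invented for illustration and are not taken from the paper.

```python
import random

# (state, joint_action) -> {next_state: probability}; toy numbers only.
TRANSITIONS = {
    ("low", ("a1", "a1")): {"low": 0.8, "high": 0.2},
    ("low", ("a1", "a2")): {"low": 0.4, "high": 0.6},
    ("low", ("a2", "a1")): {"low": 0.4, "high": 0.6},
    ("low", ("a2", "a2")): {"low": 0.1, "high": 0.9},
}

def step(state, joint_action, rng):
    """Sample s_{t+1} ~ T(s_t, a_1, ..., a_N): the transition depends on the
    joint action of all players, not on any single player's choice."""
    dist = TRANSITIONS[(state, joint_action)]
    r, acc = rng.random(), 0.0
    for nxt, prob in dist.items():
        acc += prob
        if r <= acc:
            return nxt
    return nxt  # guard against floating-point round-off

print(step("low", ("a2", "a2"), random.Random(0)))
```

Notice that changing either player's action changes the distribution over next states, which is exactly what distinguishes a Markov game from a single-agent Markov decision process.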

In the proposed QoS control algorithm, the game model is defined as a tuple ⟨S, N, {A_k}_{1≤k≤N}, {U_k}_{1≤k≤N}, T⟩, where S is the set of all possible states and N is the number of players. In the proposed model, each state is the resource allocation status in the IoT system. A_k = {a_1, a_2, ..., a_m}, 1 ≤ k ≤ N, is the collection of strategies for player k, where m is the number of possible strategies. Actions are the joint result of multiple players choosing a strategy individually. In the proposed Markov game, QoS schedulers are assumed to be the game players, and the collection of strategies for each player is the set of availabilities of system resources. U_k: S × A_1 × A_2 × ··· × A_N → ℝ, 1 ≤ k ≤ N, is the utility function, where ℝ represents the set of real numbers. T: S × A_1 × A_2 × ··· × A_N → Δ(S) is the state transition function, where Δ(S) is the set of discrete probability distributions over the set S. Therefore, T(s_t, a_1, a_2, ..., a_N, s_{t+1}) is the probability of arriving in state s_{t+1} when each agent i takes an action a_i at state s_t, where s_t, s_{t+1} ∈ S [5, 9, 10].

In the developed game model, players seek to choose their strategy independently and self-interestedly to maximize

their payoffs. Each strategy represents an amount of system resource, and the utility function measures the outcome of this decision. Therefore, different players can receive different payoffs for the same state transition. By considering the allocated resource amount, delay, and price, the utility function (U) of each player is defined as follows:

$$U(x) = \omega \cdot \exp\left(\left(\frac{T(x)}{\mathcal{T}}\right)^{0.5-\epsilon} \cdot \left(\tau_{M}-\tau\right)^{0.5+\epsilon}\right) - c(x,\xi), \quad \text{s.t. } \epsilon \in [-0.5,\, 0.5] \tag{1}$$

where ω represents the player's willingness to pay for his perceived service worth, 𝒯 is the system's average throughput, and T(x) is the player's current throughput with the allocated resource x, that is, the rate of successful data delivery over a communication channel. τ_M and τ are the maximum delay and the observed delay of the application services, respectively; τ is measured from real network operations. In a real-time online manner, each QoS scheduler actually measures 𝒯, T(x), and τ. c(x, ξ) is the cost function, and ξ is the price for a resource unit. τ is obtained according to the processing and arrival service rates. In a distributed, self-regarding fashion, each player (i.e., QoS scheduler) is independently interested in the sole goal of maximizing his utility function as follows:

$$\max_{x} U(x), \quad \text{where } c(x,\xi) = \left(\frac{x\,\xi}{x_{A}\,\xi}\right)^{q} \tag{2}$$

where x is the resource allocated to the scheduler's own domain, x_A is the average resource amount of all QoS schedulers, and q is a cost parameter for the cost function c(x, ξ). The cost function is defined as the ratio of a scheduler's own obtained resource to the average resource amount of all the QoS schedulers. Therefore, the other players' decisions are fed back to each player. This iterative feedback procedure continues under IoT system dynamics. In this study, QoS schedulers can modify their actions in an effort to maximize their U(x) in a distributed manner. This approach can significantly reduce the computational complexity and control overheads. Therefore, it is practical and suitable for real-world system implementation.
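A direct transcription of (1) and (2) with the Table 1 parameter values might look as follows. This is a sketch only: the throughput, delay, and resource arguments are hypothetical inputs that a real scheduler would measure online.

```python
import math

# Parameter values from Table 1: omega, epsilon, q, xi.
OMEGA, EPS, Q, XI = 1.2, -0.2, 1.1, 1.0

def cost(x, x_avg, xi=XI, q=Q):
    """Cost function c(x, xi) in (2): own resource relative to the average."""
    return ((x * xi) / (x_avg * xi)) ** q

def utility(x, tput_x, tput_avg, tau_max, tau, x_avg):
    """Utility U(x) in (1), trading throughput gain against observed delay."""
    gain = (tput_x / tput_avg) ** (0.5 - EPS) * (tau_max - tau) ** (0.5 + EPS)
    return OMEGA * math.exp(gain) - cost(x, x_avg)
```

For example, a scheduler whose throughput equals the system average and whose delay margin is one unit earns 1.2·e − 1 ≈ 2.26, and taking more than the average resource share (x > x_A) raises the cost term superlinearly because q > 1.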

2.2. Markov Decision Process for QoS Control Problems. In this work, we study the method that a player (i.e., QoS scheduler) in a dynamic IoT system uses to learn an uncertain network situation and arrive at a control decision by considering the online feedback mechanism. With an iterative learning process, the players' decision-making mechanism is developed as a Markov game model, which is an effective method for the players' decision mechanism. If players change their strategies, the system state may change. Based on the immediate payoff (U(S_0, a_i(0))) of the current state S_0 and action a_i(0), players must consider the future payoffs. With the current payoff, player i's long-term expected payoff (V_i(S_0, a_i(0))) is given by [5]:


$$V_{i}(S_{0}, a_{i}(0)) = \max_{a_{i}(t),\, 0 \le t \le \infty}\left[U_{i}(S_{0}, a_{i}(0)) + \sum_{t=1}^{\infty}\beta^{t}\, U_{i}(S_{t}, a_{i}(t))\right], \quad \text{s.t. } a_{i}(t) \in A_{i} \tag{3}$$

where a_i(t) and U_i(S_t, a_i(t)) are player i's action and expected payoff at time t, respectively, and β is a discount factor for the future state. During game operations, each combination of starting state, action choice, and next state has an associated transition probability. Based on the transition probability, (3) can be rewritten in the recursive Bellman equation form given in [5]:

$$V_{i}(S) = \max_{a_{S}}\left[U_{i}(S, a_{S}) + \gamma \sum_{S' \in \mathbf{S}} P_{i}(S' \mid S, a_{S})\, V_{i}(S')\right], \quad \text{s.t. } a_{S} \in A_{i} \tag{4}$$

where S′ represents all possible next states of S, and γ can be regarded as the probability that the player remains at the selected strategy. P_i(S′ | S, a_S) is the state transition probability from state S to state S′; S and S′ are elements of the system state set S. In this study, N is the number of QoS schedulers and m is the number of possible strategies for each scheduler. Therefore, there are in total m^N system states.
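The Bellman recursion in (4) can be solved by standard value iteration. The sketch below applies it to a toy two-state, two-action model with invented payoffs and transition probabilities, purely to illustrate the recursion; the paper's actual state space has m^N states and learned transition probabilities.

```python
GAMMA = 0.3  # gamma from Table 1

# Toy model: payoffs U[s][a] and transition probabilities P[s][a][s'] are
# invented for illustration, not taken from the paper.
U = {0: {0: 1.0, 1: 0.0},
     1: {0: 0.0, 1: 2.0}}
P = {0: {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}},
     1: {0: {0: 0.5, 1: 0.5}, 1: {0: 0.1, 1: 0.9}}}

def value_iteration(n_sweeps=100):
    """Repeatedly apply the Bellman recursion (4) until numerical convergence."""
    V = {s: 0.0 for s in U}
    for _ in range(n_sweeps):
        V = {s: max(U[s][a] + GAMMA * sum(p * V[s2] for s2, p in P[s][a].items())
                    for a in U[s])
             for s in U}
    return V

V = value_iteration()
print(V[1] > V[0] > 0)  # -> True: state 1 offers the larger long-term payoff
```

Because γ < 1, each sweep is a contraction, so the value function converges to the unique fixed point of (4) regardless of the initial guess.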

P(S′ | S, a_S) is a distributed multiplayer probability decision problem. Using the multiplayer-learning algorithm, each player independently learns the current IoT system situation to dynamically determine P(S′ | S, a_S). This approach can effectively control a Markov game process with unknown transition probabilities and payoffs. In the proposed algorithm, the players are assumed to be interconnected by allowing them to play a repeated game in the same environment. Assume there is a finite set of strategies A_k(t) = {a_1^k(t), ..., a_m^k(t)}, 1 ≤ k ≤ N, chosen by player k at game iteration t, where m is the number of possible strategies. Correspondingly, U^k(t) = (u_1^k(t), ..., u_m^k(t)) is a vector of specified payoffs for player k. If player k plays action a_l^k (1 ≤ l ≤ m), he earns a payoff u_l^k with probability p_l^k. P^k(t) = {p_1^k(t), ..., p_m^k(t)} is defined as player k's probability distribution.

Actions chosen by the players are input to the environment, and the environmental response to these actions serves as input to each player. Therefore, multiple players are connected in a feedback loop with the environment. When a player selects an action with his respective probability distribution P(·), the environment produces a payoff U(·) according to (1). Therefore, P(·) must be adjusted adaptively to contend with payoff fluctuations. At every game round, all players update their probability distributions based on the online responses of the environment. If player k chooses a_l^k at time t, this player updates P^k(t + 1) as follows:

$$p_{j}^{k}(t+1) = \begin{cases} f\left(p_{j}^{k}(t) + \psi\left[\dfrac{u_{l}^{k}(t) - u_{l}^{k}(t-1)}{u_{l}^{k}(t-1)}\right]\right) & \text{if } j = l \\ \varphi\, p_{j}^{k}(t) & \text{if } j \ne l \end{cases} \qquad \text{s.t. } f(\chi) = \begin{cases} 0 & \text{if } \chi < 0 \\ \chi & \text{if } 0 < \chi < 1 \\ 1 & \text{if } \chi > 1 \end{cases} \tag{5}$$

where φ is a discount factor and ψ is a parameter to control the learning step size from p(t) to p(t + 1). In general, small values of ψ correspond to slower rates of convergence, and vice versa. According to (5), P^k(S′ | S, a_S) is defined based on the Boltzmann distribution:

$$P^{k}(S' \mid S, a_{S}) = \frac{\exp\left((1/\lambda)\, p_{a_{S}}^{k}(t)\right)}{\sum_{j \in A^{k}} \exp\left((1/\lambda)\, p_{j}^{k}(t)\right)}, \quad \text{s.t. } a_{S} \in A^{k}(t) = \{a_{1}^{k}(t), \ldots, a_{m}^{k}(t)\} \tag{6}$$

where λ is a control parameter. Strategies are chosen in proportion to their payoffs; however, their relative probability is adjusted by λ. A value of λ close to zero allows minimal randomization, whereas a large value of λ results in complete randomization.
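Equations (5) and (6) together define the learning update. The following sketch is a direct transcription with the parameter values from Table 1; the payoff inputs are hypothetical, and the update assumes a positive previous payoff so that the relative change in (5) is well defined.

```python
import math

PSI, PHI, LAMBDA = 1.0, 0.8, 1.0  # psi, phi, lambda from Table 1

def clip01(chi):
    """f(chi) in (5): clip the updated probability to the unit interval."""
    return min(max(chi, 0.0), 1.0)

def update_probs(p, chosen, u_now, u_prev):
    """One application of (5): reinforce the chosen action by its relative
    payoff change and discount the other actions by phi."""
    return [clip01(pj + PSI * (u_now - u_prev) / u_prev) if j == chosen
            else PHI * pj
            for j, pj in enumerate(p)]

def boltzmann(p):
    """Transition probabilities as in (6): a softmax of p(.) scaled by 1/lambda."""
    weights = [math.exp(pj / LAMBDA) for pj in p]
    total = sum(weights)
    return [w / total for w in weights]

p = update_probs([1/3, 1/3, 1/3], chosen=1, u_now=1.2, u_prev=1.0)
print(p)  # the chosen action is reinforced, the others are discounted
```

With λ = 1 the Boltzmann step only mildly sharpens the distribution; shrinking λ would make the selection nearly greedy, matching the randomization behavior described above.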

2.3. The Main Steps of the Proposed Scheme. To allow optimal movement in multischeduler systems, we consider the consequences of using the Markov game model by implementing an adaptive learning algorithm that attempts to learn an optimal action based on past actions and environmental feedback. Although there are learning algorithms to construct a game model, minimal research has been conducted on integrating learning algorithms with the decision-making process where players are uncertain regarding the real world and the influence of their decisions on each other.

In the proposed learning-based Markov decision process, a single QoS scheduler interacts with an environment defined by a probabilistic transition function. From the results of the individual learning experiences, each scheduler can learn how to play effectively under dynamic network situations. As the proposed learning algorithm proceeds and the various


Table 1: System parameters used in the simulation experiments.

Traffic class | Message application                   | Bandwidth requirement | Connection duration (average)
------------- | ------------------------------------- | --------------------- | -----------------------------
I             | Delay-critical emergency applications | 32 Kbps               | 30 sec (0.5 min)
II            | Event-related applications            | 32 Kbps               | 120 sec (2 min)
              |                                       | 64 Kbps               | 180 sec (3 min)
III           | General applications                  | 128 Kbps              | 120 sec (2 min)
              |                                       | 256 Kbps              | 180 sec (3 min)
IV            | Multimedia applications               | 384 Kbps              | 300 sec (5 min)
              |                                       | 512 Kbps              | 120 sec (2 min)

Parameter | Value | Description
--------- | ----- | -----------
ω         | 1.2   | The player's willingness to pay for his perceived service worth
ε         | −0.2  | The control parameter between throughput and delay
q         | 1.1   | The estimation parameter of the cost function
γ         | 0.3   | The probability that the user keeps staying at the selected strategy
Δ         | 1     | Predefined minimum bound for stable status
ξ         | 1     | The price for a resource unit in the cost function
m         | 3     | The number of strategies for QoS schedulers
ψ         | 1     | A parameter to control the learning size
φ         | 0.8   | A discount factor for the respective probability distribution
λ         | 1     | A control parameter of the Boltzmann distribution

Init():
    (1) p(·) = 1/m
    (2) Control parameter values (ω, ε, q, γ, p, Δ, ψ, φ, and λ) are given from Table 1

Main QoS control():
    Start: Init()
    For():
        (3) U(x) is obtained from (1) and (2)
        (4) p(·) is adjusted by using (5)
        (5) P(S′ | S, a_S) is defined by using (6)
        (6) a(t) is selected to maximize V(·) based on (4)
        (7) IF (V^(t+1)(·) − V^(t)(·) < Δ) Temp() ELSE continue

Temp():
    (8) For(): IF (V^(t+1)(·) − V^(t)(·) < Δ) continue ELSE break

Pseudocode 1: IoT system QoS control procedure.

actions are tested, the QoS scheduler acquires increasingly more information. That is, the payoff estimation at each game iteration can be used to update P(S′ | S, a_S) in such a manner that actions with a large payoff are more likely to be chosen again in the next iteration. To maximize their expected payoffs, QoS schedulers adaptively modify their current strategies. This adjustment process is sequentially repeated until the change of the expected payoff (V(·)) is within a predefined minimum bound (Δ). When no further strategy modifications are made by any of the QoS schedulers, the IoT system has attained a stable status. The proposed algorithm for this approach is described by Pseudocode 1 and the following steps.

Step 1. To begin, p(·) is set to be equally distributed (p(·) = 1/m, where m is the number of strategies). This starting guess guarantees that each strategy enjoys the same selection probability at the start of the game.

Step 2. Control parameters ω, ε, q, γ, p, Δ, ψ, φ, and λ are provided to each QoS scheduler from the simulation scenario (refer to Table 1).


Step 3. Based on the current IoT situation, each QoS scheduler estimates his utility function (U(x)) according to (1) and (2).

Step 4. Using (5), each QoS scheduler periodically adjusts the p(·) values.

Step 5. Based on the probability distribution P(·), each P(S′ | S, a_S) is defined using the Boltzmann distribution.

Step 6. Iteratively, each QoS scheduler selects a strategy (a(t)) to maximize his long-term expected payoff (V(·)). This sequential learning process is repeatedly executed in a distributed manner.

Step 7. If a QoS scheduler attains a stable status (i.e., V^(t+1)(·) − V^(t)(·) < Δ), this scheduler is assumed to have obtained an equilibrium strategy. When all QoS schedulers achieve a stable status, the game process is temporarily stopped.

Step 8. Each QoS scheduler continuously self-monitors the current IoT situation and proceeds to Step 3 for the next iteration.
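Steps 1–8 can be read as a per-scheduler control loop. The sketch below is a minimal Python rendition of that loop under simplifying assumptions: `measured_payoff` stands in for the real utility measurements of (1) and (2), the stopping bound and payoff numbers are invented for illustration, and the long-term payoff V(·) is approximated crudely rather than solved via the Bellman recursion; it shows the shape of the algorithm, not the paper's exact implementation.

```python
import math
import random

M = 3           # number of strategies per scheduler (Table 1)
PSI = 1.0       # learning-size parameter psi
PHI = 0.8       # probability discount factor phi
LAM = 1.0       # Boltzmann control parameter lambda
DELTA = 0.05    # stability bound (a small toy value, not Table 1's)

def measured_payoff(strategy):
    # Placeholder for U(x) from (1)-(2): a noisy payoff favoring larger
    # strategies.  A real scheduler measures throughput, delay, and cost.
    return 2.0 * strategy + random.uniform(0.0, 1.0)

def boltzmann(p):
    # Map the preference vector p to selection probabilities (cf. (6)).
    weights = [math.exp(pj / LAM) for pj in p]
    total = sum(weights)
    return [w / total for w in weights]

def run(max_iters=200):
    p = [1.0 / M] * M               # Step 1: uniform start
    last_u = [1.0] * M              # last observed payoff per strategy
    v_prev = float("inf")
    for _ in range(max_iters):
        probs = boltzmann(p)                            # Step 5
        l = random.choices(range(M), weights=probs)[0]  # Step 6
        u = measured_payoff(l)                          # Step 3
        for j in range(M):                              # Step 4: eq. (5)
            if j == l:
                p[j] = min(1.0, max(0.0, p[j] + PSI * (u - last_u[l]) / last_u[l]))
            else:
                p[j] = PHI * p[j]
        last_u[l] = u
        v = sum(pr * lu for pr, lu in zip(probs, last_u))  # crude V(.) estimate
        if abs(v - v_prev) < DELTA:                     # Step 7: stable status
            break
        v_prev = v
    return p, boltzmann(p)

prefs, sel_probs = run()
```

In a multischeduler deployment, each scheduler would run this loop independently, with Step 8 restarting it whenever the monitored IoT situation changes.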

3. Performance Evaluation

In this section, we compare the performance of the proposed scheme with other existing schemes [4, 5, 13] and confirm the performance superiority of the proposed approach using a simulation model. Our simulation model is a representation of an IoT system that includes system entities and the behavior and interactions of these entities. To facilitate the development and implementation of our simulator, Table 1 lists the system parameters.

Our simulation results were obtained using MATLAB, which is widely used in academic and research institutions as well as industrial enterprises. To emulate a real-world scenario, the assumptions of our simulation environment were as follows.

(i) The simulated system consisted of four QoS schedulers for the IoT system.

(ii) In each scheduler coverage area, new service requests arrived according to a Poisson process with rate ρ (services/s), and the offered service load was varied from 0 to 3.0.

(iii) There were three strategies (m) for the QoS schedulers, and each strategy (a_i, 1 ≤ i ≤ m) was a_i ∈ {25 Mbps, 30 Mbps, 35 Mbps}. Therefore, there were in total m^N, that is, 3^4 = 81, system states, such as S = {(25 Mbps, 25 Mbps, 25 Mbps, 25 Mbps), ..., (35 Mbps, 35 Mbps, 35 Mbps, 35 Mbps)}.

(iv) The resource of the IoT system was bandwidth (bps), and the total resource amount was 140 Mbps.

(v) Network performance measures, obtained based on 50 simulation runs, were plotted as a function of the offered traffic load.

Figure 1: Resource usability in IoT systems (y-axis: resource usability, 0.5–1.3; x-axis: service request rates, 0.5–3; curves: proposed scheme, SQoSS scheme, IDMS scheme, and ADPP scheme).

(vi) The message size of each application was exponentially distributed, with different means for different message applications.

(vii) For simplicity, we assumed the absence of physical obstacles in the experiments.

(viii) The performance criteria obtained through simulation were resource usability, service availability, and normalized service delay.

(ix) Resource usability was defined as the percentage of the resources actually used.

(x) Service availability was the success ratio of the service requests.

(xi) The normalized service delay was the service delay, measured from real network operations and normalized.
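The core of the setup in assumptions (i)–(iii) is easy to reproduce: Poisson service arrivals per scheduler and a joint state space of m^N = 3^4 = 81 strategy combinations. A small illustrative sketch, assuming nothing beyond the stated parameters (the helper names and the rate/horizon values are ours):

```python
import itertools
import random

STRATEGIES_MBPS = [25, 30, 35]   # per assumption (iii)
N_SCHEDULERS = 4                 # per assumption (i)

# Joint system states: every combination of per-scheduler strategies,
# from (25, 25, 25, 25) up to (35, 35, 35, 35).
states = list(itertools.product(STRATEGIES_MBPS, repeat=N_SCHEDULERS))

def poisson_arrival_times(rho, horizon_s):
    """Arrival times of a Poisson process with rate rho over [0, horizon_s)
    (assumption (ii)), generated from exponential inter-arrival gaps."""
    t, times = 0.0, []
    while True:
        t += random.expovariate(rho)
        if t >= horizon_s:
            return times
        times.append(t)

arrivals = poisson_arrival_times(rho=2.0, horizon_s=60.0)
```

Averaging a performance metric over 50 such independent runs, as in assumption (v), then reduces to repeating this generation with fresh random draws.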

In this paper, we compared the performance of the proposed scheme with the existing SQoSS [5], IDMS [4], and ADPP [13] schemes. These existing schemes were recently developed as effective IoT management algorithms.

Figure 1 presents the performance comparison of each scheme in terms of resource usability in IoT systems. In this study, resource usability is a measure of how system resources are used. Traditionally, monitoring how resources are used is one of the most critical aspects of IoT management. During the system operations, all schemes produced similar resource usability. However, the proposed scheme adaptively allocates resources to the IoT system in an incremental manner while ensuring the different requirements. Therefore, the resource usability produced by the proposed scheme was higher than that of the other schemes from low to heavy service-load intensities.

Figure 2 represents the service availability of each IoT control scheme. In this study, service availability is defined as the success ratio of the service requests. In general, excellent service availability is a highly desirable property for real-


Figure 2: Service availability in IoT systems (y-axis: service availability, 0.3–1.3; x-axis: service request rates, 0.5–3; curves: proposed scheme, SQoSS scheme, IDMS scheme, and ADPP scheme).

Figure 3: Normalized service delay in IoT systems (y-axis: normalized service delay, 0–0.3; x-axis: service request rates, 0.5–3; curves: proposed scheme, SQoSS scheme, IDMS scheme, and ADPP scheme).

world IoT operations. As indicated in the results, it is clear that the performance trends are similar. As the service request rate increases, it can saturate or exceed the system capacity. Therefore, excessive service requests may lead to system congestion, decreasing the service availability. This is intuitively correct. Under various application service requests, the proposed game-based approach can provide a higher traffic service than the other schemes. From the above results, we conclude that the proposed scheme can provide a higher service availability in IoT systems.

The curves in Figure 3 illustrate the normalized service delay for IoT services under different service loads. Typically, service delay is an important QoS metric and can reveal the fitness or unfitness of system protocols for different delay-sensitive applications. Owing to the feedback-based Markov game approach, the proposed scheme can dynamically adapt to the current situation and has a significantly lower service delay than the other schemes. From the results, we can observe that the proposed approach can support delay-sensitive applications and ensure a latency reduction in IoT services.

The simulation results presented in Figures 1–3 demonstrate the performance of the proposed and other existing schemes and verify that the proposed Markov game-based scheme can provide attractive network performance. The main features of the proposed scheme are as follows: (i) a new Markov game model based on a distributed learning approach is established; (ii) each QoS scheduler learns the uncertain system state according to local information; (iii) schedulers make decisions to maximize their own expected payoff by considering network dynamics; and (iv) when selecting a strategy, schedulers consider not only the immediate payoff but also the subsequent decisions. The proposed scheme constantly monitors the current network conditions for adaptive IoT system management and successfully approximates the optimized performance. As expected, the proposed scheme outperformed the existing schemes [4, 5, 13].

4. Summary and Conclusions

Today, IoT-based services and applications are becoming an integral part of our everyday life. It is foreseeable that the IoT will be a part of the future Internet, where "things" can be wirelessly organized as a global network that can provide dynamic services for applications and users. Therefore, IoT technology can bridge the gap between the virtual network and the "real things" world. Innovative uses of IoT techniques on the Internet will not only provide benefits to users in accessing wide ranges of data sources but also generate challenges in accessing heterogeneous application data, especially in the dynamic environment of real-time IoT systems.

This paper addressed a QoS control algorithm for IoT systems. Using the learning-based Markov game model, QoS schedulers iteratively observed the current situation and repeatedly modified their strategies to effectively manage system resources. Using a step-by-step feedback process, the proposed scheme effectively approximated the optimized system performance in an entirely distributed manner. The most important novelties of the proposed scheme are its adaptability and responsiveness to current system conditions. Compared with the existing schemes, the simulation results confirmed that the proposed game-based approach could improve the performance under dynamically changing IoT system environments, whereas the other existing schemes could not offer such an attractive performance. Resource usability, service availability in IoT systems, normalized service delay, and accuracy were improved by approximately 5%, 10%, 10%, and 5%, respectively, compared to the existing schemes.

Furthermore, our study opens the door to several interesting extensions. In the future, we plan to design new reinforcement-learning models and develop adaptive online feedback algorithms. This is a potential direction and possible


extension to this study and can further improve the performance of IoT systems. Moreover, it would be interesting to extend the Markov game model to various decision-theoretic frameworks. Under uncertain system environments, this would be an interesting topic for future research.

Conflict of Interests

The author, Sungwook Kim, declares that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This research was supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) Support Program (IITP-2015-H8501-15-1018) supervised by the IITP (Institute for Information & Communications Technology Promotion), and by the Sogang University Research Grant of 2014 (20141002001).

References

[1] G. Sallai, "Chapters of future internet research," in Proceedings of the 4th IEEE International Conference on Cognitive Infocommunications (CogInfoCom '13), pp. 161–166, IEEE, Budapest, Hungary, December 2013.

[2] K. Ashton, "That 'Internet of Things' thing: in the real world things matter more than ideas," RFID Journal, 2009, http://www.rfidjournal.com/article/print/4986.

[3] Q. Wu, G. Ding, Y. Xu, et al., "Cognitive internet of things: a new paradigm beyond connection," IEEE Internet of Things Journal, vol. 1, no. 2, pp. 129–143, 2014.

[4] Q. Zhang and D. Peng, "Intelligent decision-making service framework based on QoS model in the internet of things," in Proceedings of the 11th International Symposium on Distributed Computing and Applications to Business, Engineering and Science (DCABES '12), pp. 103–107, Guilin, China, October 2012.

[5] L. Li, S. Li, and S. Zhao, "QoS-aware scheduling of services-oriented internet of things," IEEE Transactions on Industrial Informatics, vol. 10, no. 2, pp. 1497–1507, 2014.

[6] S. Kim, "Adaptive ad-hoc network routing scheme by using incentive-based model," Ad Hoc & Sensor Wireless Networks, vol. 15, no. 2, pp. 107–125, 2012.

[7] G. Pujolle, "Metamorphic networks," Journal of Computing Science and Engineering, vol. 7, no. 3, pp. 198–203, 2013.

[8] I. Jang, D. Pyeon, S. Kim, and H. Yoon, "A survey on communication protocols for wireless sensor networks," Journal of Computing Science and Engineering, vol. 7, no. 4, pp. 231–241, 2013.

[9] J. van der Wal, "Discounted Markov games: successive approximation and stopping times," International Journal of Game Theory, vol. 6, no. 1, pp. 11–22, 1977.

[10] P. Vrancx, K. Verbeeck, and A. Nowé, "Decentralized learning in Markov games," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 38, no. 4, pp. 976–981, 2008.

[11] K. Edemacu and T. Bulega, "Resource sharing between M2M and H2H traffic under time-controlled scheduling scheme in LTE networks," in Proceedings of the 8th International Conference on Telecommunication Systems Services and Applications (TSSA '14), pp. 1–6, Kuta, Indonesia, October 2014.

[12] X. Jin, S. Chun, J. Jung, and K.-H. Lee, "IoT service selection based on physical service model and absolute dominance relationship," in Proceedings of the 7th IEEE International Conference on Service-Oriented Computing and Applications (SOCA '14), pp. 65–72, Matsue, Japan, November 2014.

[13] X. Luo, H. Luo, and X. Chang, "Online optimization of collaborative web service QoS prediction based on approximate dynamic programming," in Proceedings of the International Conference on Identification, Information and Knowledge in the Internet of Things (IIKI '14), pp. 80–83, IEEE, Beijing, China, October 2014.

[14] A. Imran, M. Bennis, and L. Giupponi, "Use of learning, game theory and optimization as biomimetic approaches for self-organization in macro-femtocell coexistence," in Proceedings of the IEEE Wireless Communications and Networking Conference Workshops (WCNCW '12), pp. 103–108, IEEE, Paris, France, April 2012.




technique, the proposed scheme can effectively improve the QoS in IoT systems.

Learning is defined as the capability of making intelligent decisions by self-adapting to the dynamics of the environment, considering experience gained in past and present system states and using long-term benefit estimations. This approach can be viewed as self-play, where either a single player or a population of players evolves during competitions in a repeated game. During the operation of an IoT system, learning is driven by the amount of information available from every QoS scheduler [14]. As indicated in the traditional methods, complete information significantly improves performance with respect to partial observability; however, the control overhead results in a lack of practical implementations. Consequently, a tradeoff must be made, considering that the capability to make autonomous decisions is a desirable property of self-organized IoT systems [5, 14].

The Markov decision-making process is a well-established mathematical framework for solving sequential decision problems using probabilities. It models a decision-making system where an action must be taken in each state. Each action may have different probabilistic outcomes that change the system's state. The goal of the Markov decision process is to determine a policy that dictates the best action to take in each state. By adopting the learning Markov game approach, the proposed model allows distributed QoS schedulers to learn the optimal strategy one step at a time. Within each step, the repeated game strategy is applied to ensure cooperation among the QoS schedulers. The well-known Markov decision process can be extended in a straightforward manner to create multiplayer Markov games. In a Markov game, actions are the result of the joint action selection of all players, and payoffs and state transitions depend on these joint actions. Therefore, payoffs are sensed for combinations of actions taken by different players, and players learn in a product or joint action space. From the obtained data, players can adapt to changing environments, improve performance based on their experience, and make progress in understanding fundamental issues [5, 9, 10].

In the proposed QoS control algorithm, the game model is defined as a tuple ⟨S, N, {A_k}_{1≤k≤N}, {U_k}_{1≤k≤N}, T⟩, where S is the set of all possible states and N is the number of players. In the proposed model, each state is the resource allocation status in the IoT system. A_k = {a_1, a_2, ..., a_m} is the collection of strategies for player k, where m is the number of possible strategies. Actions are the joint result of multiple players choosing a strategy individually. In the proposed Markov game, QoS schedulers are assumed to be the game players, and the collection of strategies for each player is the set of availabilities of system resources. U_k : S × A_1 × A_2 × ⋯ × A_N → R is the utility function, where R represents the set of real numbers. T : S × A_1 × A_2 × ⋯ × A_N → Δ(S) is the state transition function, where Δ(S) is the set of discrete probability distributions over the set S. Therefore, T(s_t, a_1, a_2, ..., a_N, s_{t+1}) is the probability of arriving in state s_{t+1} when each agent i takes an action a_i at state s_t, where s_t, s_{t+1} ∈ S [5, 9, 10].

In the developed game model, players seek to choose their strategies independently and self-interestedly to maximize their payoffs. Each strategy represents an amount of system resource, and the utility function measures the outcome of this decision. Therefore, different players can receive different payoffs for the same state transition. By considering the allocated resource amount, delay, and price, the utility function (U) of each player is defined as follows:

$$U(x) = \omega \exp\!\left( \left( \frac{T(x)}{\bar{T}} \right)^{0.5-\epsilon} \left( \tau_M - \tau \right)^{0.5+\epsilon} \right) - c(x, \xi), \quad \text{s.t. } \epsilon \in [-0.5, 0.5], \tag{1}$$

where ω represents the player's willingness to pay for his perceived service worth, \bar{T} is the system's average throughput, and T(x) is the player's current throughput with the allocated resource x; this is the rate of successful data delivery over a communication channel. τ_M and τ are the maximum delay and the observed delay of the application services, respectively; τ is measured from real network operations. In a real-time online manner, each QoS scheduler actually measures \bar{T}, T(x), and τ. c(x, ξ) is the cost function, and ξ is the price for a resource unit. τ is obtained according to the processing and arrival service rates. In a distributed self-regarding fashion, each player (i.e., QoS scheduler) is independently interested in the sole goal of maximizing his utility function as follows:

$$\max_{x} U(x), \quad \text{where } c(x, \xi) = \left( \frac{x\,\xi}{x_A\,\xi} \right)^{q}, \tag{2}$$

where x is the allocated resource in its own QoS scheduler, x_A is the average resource amount of all QoS schedulers, and q is a cost parameter for the cost function c(x, ξ). The cost function is defined as the ratio of its own obtained resource to the average resource amount of all the QoS schedulers. Therefore, the other players' decisions are returned to each player. This iterative feedback procedure continues under IoT system dynamics. In this study, QoS schedulers can modify their actions in an effort to maximize their U(x) in a distributed manner. This approach can significantly reduce the computational complexity and control overheads. Therefore, it is practical and suitable for real-world system implementation.
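With the definitions above, (1) and (2) can be evaluated directly once \bar{T}, T(x), τ_M, τ, and the Table 1 parameters are known. A minimal sketch, where the function name and all measurement inputs are illustrative stand-ins rather than values from the paper:

```python
import math

OMEGA, EPS, Q, XI = 1.2, -0.2, 1.1, 1.0   # omega, epsilon, q, xi from Table 1

def utility(x, throughput_x, avg_throughput, tau_max, tau, x_avg):
    """U(x) from eq. (1), including the cost term c(x, xi) from eq. (2)."""
    gain = (throughput_x / avg_throughput) ** (0.5 - EPS)   # (T(x)/T̄)^(0.5-eps)
    delay_margin = (tau_max - tau) ** (0.5 + EPS)           # (tau_M - tau)^(0.5+eps)
    cost = ((x * XI) / (x_avg * XI)) ** Q                   # c(x, xi)
    return OMEGA * math.exp(gain * delay_margin) - cost

# Example: 30 Mbps allocated at the average allocation, throughput slightly
# above the system average, observed delay well under the maximum.
u = utility(x=30.0, throughput_x=1.1, avg_throughput=1.0,
            tau_max=0.5, tau=0.1, x_avg=30.0)
```

Note that with ξ fixed and x = x_A the cost term reduces to 1, so the utility is then driven entirely by the throughput/delay gain.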

2.2. Markov Decision Process for QoS Control Problems. In this work, we study the method by which a player (i.e., QoS scheduler) in a dynamic IoT system learns an uncertain network situation and arrives at a control decision by considering the online feedback mechanism. With an iterative learning process, the players' decision-making mechanism is developed as a Markov game model, which is an effective method for the players' decision mechanism. If players change their strategies, the system state may change. Based on the immediate payoff (U(S_0, a_i(0))) of the current state S_0 and action a_i(0), players must consider the future payoffs. With the current payoff, player i's long-term expected payoff (V_i(S_0, a_i(0))) is given by [5]:


$$V_i(S_0, a_i(0)) = \max_{a_i(t),\, 0 \le t \le \infty} \left[ U_i(S_0, a_i(0)) + \sum_{t=1}^{\infty} \beta^{t}\, U_i(S_t, a_i(t)) \right], \quad \text{s.t. } a_i(t) \in A_i, \tag{3}$$

where a_i(t) and U_i(S_t, a_i(t)) are player i's action and expected payoff at time t, respectively, and β is a discount factor for the future state. During game operations, each combination of starting state, action choice, and next state has an associated transition probability. Based on the transition probability, (3) can be rewritten in the recursive Bellman equation form given in [5]:

$$V_i(S) = \max_{a_S} \left[ U_i(S, a_S) + \gamma \sum_{S' \in \mathbf{S}} P_i(S' \mid S, a_S)\, V_i(S') \right], \quad \text{s.t. } a_S \in A_i, \tag{4}$$

where S′ represents all possible next states of S, and γ can be regarded as the probability that the player remains at the selected strategy. P_i(S′ | S, a_S) is the state transition probability from state S to state S′; S and S′ are elements of the system state set S. In this study, N is the number of QoS schedulers, and m is the number of possible strategies for each scheduler. Therefore, there are in total m^N system states.
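Equation (4) is the standard discounted Bellman recursion, so given known P and U it can be solved by value iteration over the state set. A toy sketch with an invented two-state, two-action example (the transition and payoff numbers are illustrative, not from the paper; only γ comes from Table 1):

```python
# States S = {0, 1}; actions A = {0, 1}.
# P[s][a] = transition distribution over next states; U[s][a] = immediate payoff.
P = {0: {0: [0.9, 0.1], 1: [0.2, 0.8]},
     1: {0: [0.5, 0.5], 1: [0.1, 0.9]}}
U = {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 2.0}}
GAMMA = 0.3   # gamma from Table 1

def value_iteration(tol=1e-10):
    """Repeatedly apply the Bellman operator of eq. (4) until V converges."""
    V = [0.0, 0.0]
    while True:
        V_new = [max(U[s][a] + GAMMA * sum(P[s][a][s2] * V[s2] for s2 in (0, 1))
                     for a in (0, 1))
                 for s in (0, 1)]
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            return V_new
        V = V_new

V = value_iteration()
```

In the paper's setting, the key difference is that P(S′ | S, a_S) is not known in advance; it is learned online via (5) and (6) instead of being tabulated as here.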

Determining P(S′ | S, a_S) is a distributed multiplayer probability decision problem. Using the multiplayer-learning algorithm, each player independently learns the current IoT system situation to dynamically determine P(S′ | S, a_S). This approach can effectively control a Markov game process with unknown transition probabilities and payoffs. In the proposed algorithm, the players are assumed to be interconnected by allowing them to play a repeated game with the same environment. Assume there is a finite set of strategies A^k(t) = {a^k_1(t), ..., a^k_m(t)} chosen by player k at game iteration t, where m is the number of possible strategies. Correspondingly, U^k(t) = (u^k_1(t), ..., u^k_m(t)) is a vector of specified payoffs for player k. If player k plays action a^k_l (1 ≤ l ≤ m), he earns a payoff u^k_l with probability p^k_l. P^k(t) = {p^k_1(t), ..., p^k_m(t)} is defined as player k's probability distribution.

Actions chosen by the players are input to the environment, and the environmental response to these actions serves as input to each player. Therefore, multiple players are connected in a feedback loop with the environment. When a player selects an action with his respective probability distribution P(·), the environment produces a payoff U(·) according to (1). Therefore, P(·) must be adjusted adaptively to contend with the payoff fluctuation. At every game round, all players update their probability distributions based on the online responses of the environment. If player k chooses a^k_l at time t, this player updates P^k(t + 1) as follows:

$$p_j^k(t+1) = \begin{cases} f\!\left( p_j^k(t) + \psi \left[ \dfrac{u_l^k(t) - u_l^k(t-1)}{u_l^k(t-1)} \right] \right), & \text{if } j = l, \\[2mm] \varphi\, p_j^k(t), & \text{if } j \neq l, \end{cases} \tag{5}$$

$$\text{s.t. } f(\chi) = \begin{cases} 0, & \text{if } \chi < 0, \\ \chi, & \text{if } 0 < \chi < 1, \\ 1, & \text{if } \chi > 1, \end{cases}$$

where φ is a discount factor and ψ is a parameter that controls the learning size from p(t) to p(t + 1). In general, small values of ψ correspond to slower rates of convergence, and vice versa. According to (5), P^k(S′ | S, a_S) is defined based on the Boltzmann distribution:

$$P^k(S' \mid S, a_S) = \frac{\exp\!\left( (1/\lambda)\, p_{a_S}^k(t) \right)}{\sum_{j \in A^k} \exp\!\left( (1/\lambda)\, p_j^k(t) \right)}, \quad \text{s.t. } a_S \in A^k(t) = \{a_1^k(t), \ldots, a_m^k(t)\}, \tag{6}$$

where λ is a control parameter. Strategies are chosen in proportion to their payoffs; however, their relative probability is adjusted by λ. A value of λ close to zero allows minimal randomization, and a large value of λ results in complete randomization.
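Equations (5) and (6) can be exercised in a few lines of code. The sketch below updates a preference vector after one payoff observation and maps it to Boltzmann transition probabilities; the function names and the payoff values (a 20% gain for strategy 2) are our own illustrative choices, while ψ, φ, and λ follow Table 1:

```python
import math

PSI, PHI, LAM = 1.0, 0.8, 1.0   # psi, phi, lambda from Table 1

def clip01(chi):
    """The function f in eq. (5): clamp to the interval [0, 1]."""
    return min(1.0, max(0.0, chi))

def update_probabilities(p, chosen, u_now, u_prev):
    """Eq. (5): reinforce the chosen strategy, decay the rest by phi."""
    return [clip01(pj + PSI * (u_now - u_prev) / u_prev) if j == chosen
            else PHI * pj
            for j, pj in enumerate(p)]

def boltzmann(p):
    """Eq. (6): Boltzmann (softmax) transition probabilities from p."""
    weights = [math.exp(pj / LAM) for pj in p]
    total = sum(weights)
    return [w / total for w in weights]

p = [1/3, 1/3, 1/3]                                           # uniform start
p = update_probabilities(p, chosen=2, u_now=6.0, u_prev=5.0)  # 20% payoff gain
probs = boltzmann(p)
```

After the update, the reinforced strategy carries the largest probability mass, while the decayed strategies remain selectable, which is exactly the exploration/exploitation balance λ is meant to tune.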

23 The Main Steps of Proposed Scheme To allow optimalmovement inmultischeduler systems we consider the conse-quences of using the Markov game model by implementingthe adaptive learning algorithm that attempts to learn anoptimal action based on past actions and environmentalfeedback Although there are learning algorithms to con-struct a game model minimal research has been conductedon integrating learning algorithms with the decision-makingprocess where players are uncertain regarding the real worldand the influence of their decisions on each other

In the proposed learning-basedMarkov decision processa single QoS scheduler interacts with an environment definedby a probabilistic transition function From the result of theindividual learning experiences each scheduler can learnhow to effectively play under the dynamic network situationsAs the proposed learning algorithm proceeds and the various

Mobile Information Systems 5

Table 1 System parameters used in the simulation experiments

Traffic class Message application Bandwidth requirement Connection durationaveragesec

I Delay-critical emergency applications 32Kbps 30 sec (05min)

II Event-related applications 32Kbps 120 sec (2min)64Kbps 180 sec (3min)

III General applications 128Kbps 120 sec (2min)256Kbps 180 sec (3min)

IV Multimedia applications 384Kbps 300 sec (5min)512 Kbps 120 sec (2min)

Parameter Value Description120596 12 The playerrsquos willingness to pay for his perceived service worth120598 minus02 The control parameter between throughput and delay119902 11 The estimation parameters of the cost function120574 03 A probability that the user keeps staying at the selected strategyΔ 1 Predefined minimum bound for stable status120585 1 The price for resource unit in the cost functionm 3 The number of strategies for QoS schedulers120595 1 A parameter to control the learning size120593 08 A discount factor for the respective probability distribution120582 1 A control parameter on the Boltzmann distribution

Init ( ) 1 119901(sdot) = 1119898

2 Control parameter values (120596 120598 119902 120574 119901 Δ 120595 120593 and 120582) are given from Table 1

Main QoS control ( ) Start Init ( )

For ( ) 3 119880(119909) is obtained from (1) and (2)4 119901(sdot) is adjusted by using (5)5 119875(1198781015840 | 119878 119886119878) is defined by using (6)6 119886(119905) is selected to maximize 119881(sdot) based on (4)7 IF (119881(119905+1)(sdot) minus 119881(119905)(sdot) lt Δ) Temp ( )ELSE continue

Temp ( ) 8 For ( ) IF (119881(119905+1)(sdot) minus 119881(119905)(sdot) lt Δ) continue ELSE break

Pseudocode 1 IoT system QoS control procedure

actions are tested the QoS scheduler acquires increasinglymore informationThat is the payoff estimation at each gameiteration can be used to update 119875(1198781015840 | 119878 119886119878) in such amanner that those actions with a large payoff are more likelyto be chosen again in the next iteration To maximize theirexpected payoffs QoS schedulers adaptively modify theircurrent strategies This adjustment process is sequentiallyrepeated until the change of expected payoff (119881(sdot)) is within apredefined minimum bound (Δ) When no further strategymodifications are made by all the QoS schedulers the IoTsystem has attained a stable status The proposed algorithm

for this approach is described by Pseudocode 1 and thefollowing steps

Step 1 To begin 119901(sdot) is set to be equally distributed (119901(sdot) =1119898 where 119898 is the number of strategies) This startingguess guarantees that each strategy enjoys the same selectionprobability at the start of the game

Step 2 Control parameters 120596 120598 119902 120574 119901 Δ 120595 120593 and 120582 areprovided to eachQoS scheduler from the simulation scenario(refer to Table 1)

6 Mobile Information Systems

Step 3 Based on the current IoT situation each QoS sched-uler estimates his utility function (119880(119909)) according to (1) and(2)

Step 4 Using (5) eachQoS scheduler periodically adjusts the119901(sdot) values

Step 5 Based on the probability distribution P(sdot) each 119875(1198781015840 |119878 119886119878) is defined using the Boltzmann distribution

Step 6 Iteratively each QoS scheduler selects a strategy(119886(119905)) to maximize his long-term expected payoff (119881(sdot))This sequential learning process is repeatedly executed in adistributed manner

Step 7. If a QoS scheduler attains a stable status (i.e., V^(t+1)(·) − V^(t)(·) < Δ), this scheduler is assumed to have obtained an equilibrium strategy. When all QoS schedulers achieve a stable status, the game process is temporarily stopped.

Step 8. Each QoS scheduler continuously self-monitors the current IoT situation and returns to Step 3 for the next iteration.
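Steps 1–8 above can be sketched for a single scheduler as follows. This is only a minimal illustration, not the paper's full multischeduler implementation: the utility function, its payoff values, and the crude V(·) estimate are hypothetical stand-ins for (1), (2), and (4).

```python
import math
import random

random.seed(7)

M = 3                            # number of strategies (Table 1)
DELTA = 1.0                      # stability bound Δ (Table 1)
PSI, PHI, LAM = 1.0, 0.8, 1.0    # ψ, φ, λ (Table 1)

def utility(strategy):
    # Stand-in for the utility functions (1) and (2); the real U(x)
    # depends on the current IoT traffic situation, not modeled here.
    return [1.0, 1.3, 0.9][strategy] + random.uniform(-0.05, 0.05)

def boltzmann(p):
    # Strategy-selection probabilities via the Boltzmann distribution (6).
    weights = [math.exp(x / LAM) for x in p]
    total = sum(weights)
    return [w / total for w in weights]

def run_scheduler(max_rounds=200):
    p = [1.0 / M] * M            # Step 1: uniform starting guess
    u_prev = [1.0] * M           # last observed payoff per strategy
    v_prev = 0.0
    for t in range(1, max_rounds + 1):
        probs = boltzmann(p)                             # Step 5
        a = random.choices(range(M), weights=probs)[0]   # Step 6
        u = utility(a)                                   # Step 3
        # Step 4: update rule (5) -- reinforce a, discount the others
        p = [min(max(p[j] + PSI * (u - u_prev[a]) / u_prev[a], 0.0), 1.0)
             if j == a else PHI * p[j] for j in range(M)]
        u_prev[a] = u
        v = sum(pr * up for pr, up in zip(probs, u_prev))  # crude V(·)
        if abs(v - v_prev) < DELTA:                      # Step 7: stable
            return t, p
        v_prev = v
    return max_rounds, p

rounds, p_final = run_scheduler()
```

In the full scheme, each of the N schedulers runs this loop concurrently and the game pauses once all of them satisfy the Δ bound (Step 7), then resumes from Step 3 (Step 8).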

3. Performance Evaluation

In this section, we compare the performance of the proposed scheme with other existing schemes [4, 5, 13] and confirm the performance superiority of the proposed approach using a simulation model. Our simulation model is a representation of an IoT system that includes system entities and the behavior and interactions of these entities. To facilitate the development and implementation of our simulator, Table 1 lists the system parameters.

Our simulation results were obtained using MATLAB, which is widely used in academic and research institutions as well as in industrial enterprises. To emulate a real-world scenario, the assumptions of our simulation environment were as follows.

(i) The simulated system consisted of four QoS schedulers for the IoT system.

(ii) In each scheduler's coverage area, new service requests arrived according to a Poisson process with rate ρ (services/s), and the offered service load was varied from 0 to 3.0.

(iii) There were three strategies (m = 3) for the QoS schedulers, and each strategy (a_i, 1 ≤ i ≤ m) satisfied a_i ∈ {25 Mbps, 30 Mbps, 35 Mbps}. Therefore, there were a total of m^N, that is, 3^4, system states, such as S = {(25 Mbps, 25 Mbps, 25 Mbps, 25 Mbps), ..., (35 Mbps, 35 Mbps, 35 Mbps, 35 Mbps)}.

(iv) The resource of the IoT system was bandwidth (bps), and the total resource amount was 140 Mbps.

(v) Network performance measures, obtained from 50 simulation runs, were plotted as a function of the offered traffic load.

Figure 1: Resource usability in IoT systems. [Plot: resource usability (y-axis, 0.5 to 1.3) versus service request rates (x-axis, 0.5 to 3) for the proposed scheme and the SQoSS, IDMS, and ADPP schemes.]

(vi) The message size of each application was exponentially distributed, with different means for different message applications.

(vii) For simplicity, we assumed the absence of physical obstacles in the experiments.

(viii) The performance criteria obtained through simulation were resource usability, service availability, and normalized service delay.

(ix) Resource usability was defined as the percentage of the resources actually used.

(x) Service availability was the success ratio of the service requests.

(xi) The normalized service delay was the service delay measured from real network operations, in normalized form.
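As a quick check of assumption (iii), the system state set can be enumerated directly; the variable names below are illustrative only, and this mirrors the stated setup rather than the paper's simulator.

```python
from itertools import product

strategies_mbps = [25, 30, 35]   # the three per-scheduler strategies a_i
N = 4                            # number of QoS schedulers

# A system state assigns one strategy to each scheduler: m^N states.
states = list(product(strategies_mbps, repeat=N))

n_states = len(states)                       # 3^4 = 81
peak_demand = max(sum(s) for s in states)    # 4 x 35 Mbps = 140 Mbps
```

Note that the maximal state (35 Mbps for all four schedulers) exactly matches the 140 Mbps total resource of assumption (iv), so every state is feasible.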

In this paper, we compared the performance of the proposed scheme with the existing SQoSS [5], IDMS [4], and ADPP [13] schemes. These existing schemes were recently developed as effective IoT management algorithms.

Figure 1 presents the performance comparison of each scheme in terms of resource usability in the IoT systems. In this study, resource usability is a measure of how system resources are used. Traditionally, monitoring how resources are used is one of the most critical aspects of IoT management. During the system operations, all schemes produced similar resource usability. However, the proposed scheme adaptively allocates resources to the IoT system in an incremental manner while ensuring the different requirements. Therefore, the resource usability produced by the proposed scheme was higher than that of the other schemes from low to heavy service load intensities.

Figure 2 represents the service availability of each IoT control scheme. In this study, service availability is defined as the success ratio of the service requests. In general, excellent service availability is a highly desirable property for real-


Figure 2: Service availability in IoT systems. [Plot: service availability (y-axis, 0.3 to 1.3) versus service request rates (x-axis, 0.5 to 3) for the proposed scheme and the SQoSS, IDMS, and ADPP schemes.]

Figure 3: Normalized service delay in IoT systems. [Plot: normalized service delay (y-axis, 0 to 0.3) versus service request rates (x-axis, 0.5 to 3) for the proposed scheme and the SQoSS, IDMS, and ADPP schemes.]

world IoT operations. As indicated in the results, it is clear that the performance trends are similar. As the service request rate increases, it can saturate or exceed the system capacity. Therefore, excessive service requests may lead to system congestion, decreasing the service availability; this is intuitively correct. Under various application service requests, the proposed game-based approach can provide a higher traffic service than the other schemes. From the above results, we conclude that the proposed scheme can provide higher service availability in IoT systems.

The curves in Figure 3 illustrate the normalized service delay for IoT services under different service loads. Typically, service delay is an important QoS metric and can reveal the fitness or unfitness of system protocols for different delay-sensitive applications. Owing to the feedback-based Markov game approach, the proposed scheme can dynamically adapt to the current situation and has a significantly lower service delay than the other schemes. From the results, we can observe that the proposed approach can support delay-sensitive applications and ensure a latency reduction in IoT services.

The simulation results presented in Figures 1–3 demonstrate the performance of the proposed and other existing schemes and verify that the proposed Markov game-based scheme can provide attractive network performance. The main features of the proposed scheme are as follows: (i) a new Markov game model based on a distributed learning approach is established; (ii) each QoS scheduler learns the uncertain system state according to local information; (iii) schedulers make decisions to maximize their own expected payoffs by considering network dynamics; and (iv) when selecting a strategy, schedulers consider not only the immediate payoff but also the subsequent decisions. The proposed scheme constantly monitors the current network conditions for adaptive IoT system management and successfully approximates the optimized performance. As expected, the proposed scheme outperformed the existing schemes [4, 5, 13].

4. Summary and Conclusions

Today, IoT-based services and applications are becoming an integral part of our everyday life. It is foreseeable that the IoT will be a part of the future Internet, where "things" can be wirelessly organized into a global network that provides dynamic services for applications and users. Therefore, IoT technology can bridge the gap between the virtual network and the "real things" world. Innovative uses of IoT techniques on the Internet will not only provide benefits to users in accessing wide ranges of data sources but also generate challenges in accessing heterogeneous application data, especially in the dynamic environment of real-time IoT systems.

This paper addressed a QoS control algorithm for IoT systems. Using the learning-based Markov game model, QoS schedulers iteratively observed the current situation and repeatedly modified their strategies to effectively manage system resources. Using a step-by-step feedback process, the proposed scheme effectively approximated the optimized system performance in an entirely distributed manner. The most important novelties of the proposed scheme are its adaptability and responsiveness to current system conditions. Compared with the existing schemes, the simulation results confirmed that the proposed game-based approach could improve performance under dynamically changing IoT system environments, whereas the other existing schemes could not offer such attractive performance. Resource usability, service availability, normalized service delay, and accuracy were improved by approximately 5%, 10%, 10%, and 5%, respectively, compared to the existing schemes.

Furthermore, our study opens the door to several interesting extensions. In the future, we plan to design new reinforcement-learning models and develop adaptive online feedback algorithms. This is a potential direction and a possible


extension of this study and can further improve the performance of IoT systems. Moreover, it would be interesting to extend the Markov game model to various decision-theoretic frameworks under uncertain system environments; this would be an interesting topic for future research.

Conflict of Interests

The author, Sungwook Kim, declares that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This research was supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) Support Program (IITP-2015-H8501-15-1018), supervised by the IITP (Institute for Information & Communications Technology Promotion), and by the Sogang University Research Grant of 2014 (20141002001).

References

[1] G. Sallai, "Chapters of future internet research," in Proceedings of the 4th IEEE International Conference on Cognitive Infocommunications (CogInfoCom '13), pp. 161–166, IEEE, Budapest, Hungary, December 2013.

[2] K. Ashton, "That 'Internet of Things' thing: in the real world things matter more than ideas," RFID Journal, 2009, http://www.rfidjournal.com/article/print/4986.

[3] Q. Wu, G. Ding, Y. Xu et al., "Cognitive internet of things: a new paradigm beyond connection," IEEE Internet of Things Journal, vol. 1, no. 2, pp. 129–143, 2014.

[4] Q. Zhang and D. Peng, "Intelligent decision-making service framework based on QoS model in the internet of things," in Proceedings of the 11th International Symposium on Distributed Computing and Applications to Business, Engineering and Science (DCABES '12), pp. 103–107, Guilin, China, October 2012.

[5] L. Li, S. Li, and S. Zhao, "QoS-aware scheduling of services-oriented internet of things," IEEE Transactions on Industrial Informatics, vol. 10, no. 2, pp. 1497–1507, 2014.

[6] S. Kim, "Adaptive ad-hoc network routing scheme by using incentive-based model," Ad Hoc & Sensor Wireless Networks, vol. 15, no. 2, pp. 107–125, 2012.

[7] G. Pujolle, "Metamorphic networks," Journal of Computing Science and Engineering, vol. 7, no. 3, pp. 198–203, 2013.

[8] I. Jang, D. Pyeon, S. Kim, and H. Yoon, "A survey on communication protocols for wireless sensor networks," Journal of Computing Science and Engineering, vol. 7, no. 4, pp. 231–241, 2013.

[9] J. van der Wal, "Discounted Markov games: successive approximation and stopping times," International Journal of Game Theory, vol. 6, no. 1, pp. 11–22, 1977.

[10] P. Vrancx, K. Verbeeck, and A. Nowé, "Decentralized learning in Markov games," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 38, no. 4, pp. 976–981, 2008.

[11] K. Edemacu and T. Bulega, "Resource sharing between M2M and H2H traffic under time-controlled scheduling scheme in LTE networks," in Proceedings of the 8th International Conference on Telecommunication Systems Services and Applications (TSSA '14), pp. 1–6, Kuta, Indonesia, October 2014.

[12] X. Jin, S. Chun, J. Jung, and K.-H. Lee, "IoT service selection based on physical service model and absolute dominance relationship," in Proceedings of the 7th IEEE International Conference on Service-Oriented Computing and Applications (SOCA '14), pp. 65–72, Matsue, Japan, November 2014.

[13] X. Luo, H. Luo, and X. Chang, "Online optimization of collaborative web service QoS prediction based on approximate dynamic programming," in Proceedings of the International Conference on Identification, Information and Knowledge in the Internet of Things (IIKI '14), pp. 80–83, IEEE, Beijing, China, October 2014.

[14] A. Imran, M. Bennis, and L. Giupponi, "Use of learning, game theory and optimization as biomimetic approaches for self-organization in macro-femtocell coexistence," in Proceedings of the IEEE Wireless Communications and Networking Conference Workshops (WCNCW '12), pp. 103–108, IEEE, Paris, France, April 2012.




\[
V_i\bigl(S_0, a_i(0)\bigr) = \max_{a_i(t),\,0 \le t \le \infty} \Bigl[ U_i\bigl(S_0, a_i(0)\bigr) + \sum_{t=1}^{\infty} \beta^{t}\, U_i\bigl(S_t, a_i(t)\bigr) \Bigr],
\quad \text{s.t. } a_i(t) \in \mathbb{A}_i,
\tag{3}
\]

where a_i(t) and U_i(S_t, a_i(t)) are player i's action and expected payoff at time t, respectively, and β is a discount factor for the future state. During game operations, each combination of starting state, action choice, and next state has an associated transition probability. Based on the transition probability, (3) can be rewritten in the recursive Bellman equation form given in [5]:

\[
V_i(S) = \max_{a_S} \Bigl[ U_i(S, a_S) + \gamma \sum_{S' \in \mathbb{S}} P_i\bigl(S' \mid S, a_S\bigr)\, V_i(S') \Bigr],
\quad \text{s.t. } a_S \in \mathbb{A}_i,
\tag{4}
\]

where S′ ranges over all possible next states of S, and γ can be regarded as the probability that the player remains at the selected strategy. P_i(S′ | S, a_S) is the state transition probability from state S to state S′; S and S′ are elements of the system state set 𝕊. In this study, N is the number of QoS schedulers and m is the number of possible strategies for each scheduler. Therefore, there are a total of m^N system states.
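As a concrete illustration of the Bellman recursion in (4), the following sketch runs value iteration on a toy two-state, two-action game. All states, utilities, and transition probabilities here are hypothetical stand-ins, and the transition probabilities are held fixed, whereas the proposed scheme learns them online via (5) and (6).

```python
# Illustrative value iteration for the Bellman form in (4).
# States, utilities, and transitions are hypothetical, not from the paper.

GAMMA = 0.3   # probability of staying at the selected strategy (Table 1)

states = ["S0", "S1"]
actions = ["a25", "a30"]

U = {("S0", "a25"): 1.0, ("S0", "a30"): 0.5,
     ("S1", "a25"): 0.2, ("S1", "a30"): 0.8}

# P[(S, a)][S'] = transition probability from S to S' under action a
P = {("S0", "a25"): {"S0": 0.9, "S1": 0.1},
     ("S0", "a30"): {"S0": 0.4, "S1": 0.6},
     ("S1", "a25"): {"S0": 0.5, "S1": 0.5},
     ("S1", "a30"): {"S0": 0.2, "S1": 0.8}}

def value_iteration(tol=1e-6):
    """Iterate (4) until the value function changes by less than tol."""
    V = {s: 0.0 for s in states}
    while True:
        V_new = {}
        for s in states:
            V_new[s] = max(
                U[(s, a)] + GAMMA * sum(P[(s, a)][s2] * V[s2] for s2 in states)
                for a in actions)
        if max(abs(V_new[s] - V[s]) for s in states) < tol:
            return V_new
        V = V_new

V = value_iteration()
```

Because γ < 1, the recursion is a contraction and the loop converges to the unique fixed point of (4).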

Determining P(S′ | S, a_S) is a distributed multiplayer probability decision problem. Using the multiplayer-learning algorithm, each player independently learns the current IoT system situation to dynamically determine P(S′ | S, a_S). This approach can effectively control a Markov game process with unknown transition probabilities and payoffs. In the proposed algorithm, the players are assumed to be interconnected by allowing them to play a repeated game in the same environment. Assume there is a finite set of strategies A^k(t) = {a^k_1(t), ..., a^k_m(t)}, 1 ≤ k ≤ N, chosen by player k at game iteration t, where m is the number of possible strategies. Correspondingly, U^k(t) = (u^k_1(t), ..., u^k_m(t)) is a vector of specified payoffs for player k. If player k plays action a^k_l, 1 ≤ l ≤ m, he earns a payoff u^k_l with probability p^k_l. P^k(t) = {p^k_1(t), ..., p^k_m(t)} is defined as player k's probability distribution.

Actions chosen by the players are input to the environment, and the environmental response to these actions serves as input to each player. Therefore, multiple players are connected in a feedback loop with the environment. When a player selects an action with his respective probability distribution P(·), the environment produces a payoff U(·) according to (1). Therefore, P(·) must be adjusted adaptively to contend with the payoff fluctuation. At every game round, all players update their probability distributions based on the online responses of the environment. If player k chooses a^k_l at time t, this player updates P^k(t + 1) as follows:

\[
p^k_j(t+1) =
\begin{cases}
f\!\left( p^k_j(t) + \psi \left[ \dfrac{u^k_l(t) - u^k_l(t-1)}{u^k_l(t-1)} \right] \right), & \text{if } j = l,\\[6pt]
\varphi\, p^k_j(t), & \text{if } j \neq l,
\end{cases}
\qquad
\text{s.t. }
f(\chi) =
\begin{cases}
0, & \text{if } \chi < 0,\\
\chi, & \text{if } 0 < \chi < 1,\\
1, & \text{if } \chi > 1,
\end{cases}
\tag{5}
\]

where φ is a discount factor and ψ is a parameter that controls the learning step size from p(t) to p(t + 1). In general, small values of ψ correspond to slower rates of convergence, and vice versa. Based on (5), P^k(S′ | S, a_S) is defined using the Boltzmann distribution:

\[
P^k\bigl(S' \mid S, a_S\bigr) = \frac{\exp\bigl((1/\lambda)\, p^k_{a_S}(t)\bigr)}{\sum_{j \in \mathbb{A}^k} \exp\bigl((1/\lambda)\, p^k_j(t)\bigr)},
\quad \text{s.t. } a_S \in \mathbb{A}^k(t) = \{a^k_1(t), \ldots, a^k_m(t)\},
\tag{6}
\]

where λ is a control parameter. Strategies are chosen in proportion to their payoffs; however, their relative probability is adjusted by λ. A value of λ close to zero allows minimal randomization, whereas a large value of λ results in complete randomization.
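The update rule (5) and the Boltzmann mapping (6) can be sketched as follows; the payoff values in the usage example at the bottom are hypothetical, and the parameter values are taken from Table 1.

```python
import math

PSI = 1.0    # learning-size parameter ψ (Table 1)
PHI = 0.8    # discount factor φ (Table 1)
LAM = 1.0    # Boltzmann control parameter λ (Table 1)

def clip01(x):
    """The bounding function f(χ) in (5)."""
    return min(max(x, 0.0), 1.0)

def update_probabilities(p, l, u_now, u_prev):
    """Update rule (5): reinforce the chosen strategy l in proportion to
    its relative payoff change and discount the other strategies."""
    return [clip01(p_j + PSI * (u_now - u_prev) / u_prev) if j == l
            else PHI * p_j
            for j, p_j in enumerate(p)]

def boltzmann(p):
    """Boltzmann distribution (6) over the strategy scores p."""
    weights = [math.exp(p_j / LAM) for p_j in p]
    total = sum(weights)
    return [w / total for w in weights]

# One learning round for a scheduler with three strategies (m = 3):
p = [1 / 3, 1 / 3, 1 / 3]                                # uniform start
p = update_probabilities(p, l=0, u_now=1.2, u_prev=1.0)  # payoff rose 20%
probs = boltzmann(p)                                     # biased toward l = 0
```

After a single positive payoff change for strategy 0, the Boltzmann mapping already favors it while the other two strategies keep equal, nonzero probabilities.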

2.3. The Main Steps of the Proposed Scheme. To allow optimal movement in multischeduler systems, we consider the consequences of using the Markov game model by implementing an adaptive learning algorithm that attempts to learn an optimal action based on past actions and environmental feedback. Although there are learning algorithms to construct a game model, minimal research has been conducted on integrating learning algorithms with the decision-making process where players are uncertain regarding the real world and the influence of their decisions on each other.

In the proposed learning-based Markov decision process, a single QoS scheduler interacts with an environment defined by a probabilistic transition function. From the results of the individual learning experiences, each scheduler can learn how to play effectively under dynamic network situations. As the proposed learning algorithm proceeds and the various actions are tested, the QoS scheduler acquires increasingly more information.


Table 1: System parameters used in the simulation experiments.

Traffic class | Message application | Bandwidth requirement | Connection duration (average)
I | Delay-critical emergency applications | 32 Kbps | 30 sec (0.5 min)
II | Event-related applications | 32 Kbps / 64 Kbps | 120 sec (2 min) / 180 sec (3 min)
III | General applications | 128 Kbps / 256 Kbps | 120 sec (2 min) / 180 sec (3 min)
IV | Multimedia applications | 384 Kbps / 512 Kbps | 300 sec (5 min) / 120 sec (2 min)

Parameter | Value | Description
ω | 1.2 | The player's willingness to pay for his perceived service worth
ε | −0.2 | The control parameter between throughput and delay
q | 1.1 | The estimation parameter of the cost function
γ | 0.3 | The probability that the user keeps staying at the selected strategy
Δ | 1 | Predefined minimum bound for stable status
ξ | 1 | The price for a resource unit in the cost function
m | 3 | The number of strategies for QoS schedulers
ψ | 1 | A parameter to control the learning size
φ | 0.8 | A discount factor for the respective probability distribution
λ | 1 | A control parameter on the Boltzmann distribution
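The traffic classes of Table 1 can be encoded as a small request generator for simulation input. Treating the listed connection-duration averages as the means of exponential distributions, and choosing uniformly between a class's bandwidth options, are assumptions for illustration (the paper only states the averages).

```python
import random

# Table 1 traffic classes: (bandwidth_kbps, mean_duration_s) options.
TRAFFIC_CLASSES = {
    "I":   [(32, 30)],
    "II":  [(32, 120), (64, 180)],
    "III": [(128, 120), (256, 180)],
    "IV":  [(384, 300), (512, 120)],
}

def sample_request(traffic_class, rng=random):
    """Draw one service request for the given class: a bandwidth option
    chosen uniformly, and a duration drawn from an exponential
    distribution with the listed mean (an assumption)."""
    bw_kbps, mean_dur = rng.choice(TRAFFIC_CLASSES[traffic_class])
    return bw_kbps, rng.expovariate(1.0 / mean_dur)

bw, dur = sample_request("IV")   # e.g., a multimedia request
```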

Init ( )
  1. p(·) = 1/m
  2. Control parameter values (ω, ε, q, γ, p, Δ, ψ, φ, and λ) are given from Table 1

Main QoS control ( )
  Start Init ( )
  For ( )
    3. U(x) is obtained from (1) and (2)
    4. p(·) is adjusted by using (5)
    5. P(S′ | S, a_S) is defined by using (6)
    6. a(t) is selected to maximize V(·) based on (4)
    7. IF (V^(t+1)(·) − V^(t)(·) < Δ) Temp ( ) ELSE continue

Temp ( )
  8. For ( ) IF (V^(t+1)(·) − V^(t)(·) < Δ) continue ELSE break

Pseudocode 1: IoT system QoS control procedure.


[12] X Jin S Chun J Jung and K-H Lee ldquoIoT service selectionbased on physical service model and absolute dominancerelationshiprdquo in Proceedings of the 7th IEEE International Con-ference on Service-Oriented Computing and Applications (SOCArsquo14) pp 65ndash72 Matsue Japan November 2014

[13] X Luo H Luo and X Chang ldquoOnline optimization of col-laborative web service QoS prediction based on approximatedynamic programmingrdquo in Proceedings of the InternationalConference on Identification Information and Knowledge in theInternet of Things (IIKI rsquo14) pp 80ndash83 IEEE Beijing ChinaOctober 2014

[14] A Imran M Bennis and L Giupponi ldquoUse of learning gametheory and optimization as biomimetic approaches for Self-Organization inmacro-femtocell coexistencerdquo in Proceedings ofthe IEEE Wireless Communications and Networking ConferenceWorkshops (WCNCW rsquo12) pp 103ndash108 IEEE Paris FranceApril 2012


Mobile Information Systems 5

Table 1: System parameters used in the simulation experiments.

Traffic class | Message application                   | Bandwidth requirement | Connection duration (average, sec)
I             | Delay-critical emergency applications | 32 Kbps               | 30 sec (0.5 min)
II            | Event-related applications            | 32 Kbps               | 120 sec (2 min)
              |                                       | 64 Kbps               | 180 sec (3 min)
III           | General applications                  | 128 Kbps              | 120 sec (2 min)
              |                                       | 256 Kbps              | 180 sec (3 min)
IV            | Multimedia applications               | 384 Kbps              | 300 sec (5 min)
              |                                       | 512 Kbps              | 120 sec (2 min)

Parameter | Value | Description
ω         | 1.2   | The player's willingness to pay for his perceived service worth
ε         | -0.2  | The control parameter between throughput and delay
q         | 1.1   | The estimation parameter of the cost function
γ         | 0.3   | The probability that the user keeps staying at the selected strategy
Δ         | 1     | Predefined minimum bound for stable status
ξ         | 1     | The price per resource unit in the cost function
m         | 3     | The number of strategies for QoS schedulers
ψ         | 1     | A parameter to control the learning size
φ         | 0.8   | A discount factor for the respective probability distribution
λ         | 1     | A control parameter on the Boltzmann distribution

Init():
  (1) p(·) = 1/m
  (2) Control parameter values (ω, ε, q, γ, p, Δ, ψ, φ, and λ) are given from Table 1

Main QoS control():
  Start: Init()
  For ( ):
    (3) U(x) is obtained from (1) and (2)
    (4) p(·) is adjusted by using (5)
    (5) P(S′ | S, a_S) is defined by using (6)
    (6) a(t) is selected to maximize V(·) based on (4)
    (7) IF (V^(t+1)(·) − V^(t)(·) < Δ) Temp() ELSE continue

Temp():
  (8) For ( ): IF (V^(t+1)(·) − V^(t)(·) < Δ) continue ELSE break

Pseudocode 1: IoT system QoS control procedure.

actions are tested, the QoS scheduler acquires increasingly more information. That is, the payoff estimation at each game iteration can be used to update P(S′ | S, a_S) in such a manner that those actions with a large payoff are more likely to be chosen again in the next iteration. To maximize their expected payoffs, QoS schedulers adaptively modify their current strategies. This adjustment process is sequentially repeated until the change of the expected payoff (V(·)) falls within a predefined minimum bound (Δ). When no further strategy modifications are made by any of the QoS schedulers, the IoT system has attained a stable status. The proposed algorithm for this approach is described by Pseudocode 1 and the following steps.

Step 1. To begin, p(·) is set to be equally distributed (p(·) = 1/m, where m is the number of strategies). This starting guess guarantees that each strategy enjoys the same selection probability at the start of the game.

Step 2. Control parameters ω, ε, q, γ, p, Δ, ψ, φ, and λ are provided to each QoS scheduler from the simulation scenario (refer to Table 1).
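Steps 1 and 2 amount to a few lines of setup. The following Python sketch is our own illustration, not code from the paper: the dictionary keys are our naming for the Table 1 parameters, and the uniform prior realizes the equally distributed starting guess.

```python
# Control parameter values from Table 1 (our own grouping of them).
PARAMS = {
    "omega": 1.2,   # player's willingness to pay for perceived service worth
    "eps":  -0.2,   # control parameter between throughput and delay
    "q":     1.1,   # estimation parameter of the cost function
    "gamma": 0.3,   # probability of keeping the selected strategy
    "delta": 1.0,   # predefined minimum bound for stable status
    "xi":    1.0,   # price per resource unit in the cost function
    "m":     3,     # number of strategies for QoS schedulers
    "psi":   1.0,   # learning-size control parameter
    "phi":   0.8,   # discount factor
    "lam":   1.0,   # control parameter on the Boltzmann distribution
}

def init_strategy_probs(m):
    """Step 1: equally distributed starting guess, p(.) = 1/m."""
    return [1.0 / m] * m

p = init_strategy_probs(PARAMS["m"])
```

Every strategy then starts with the same selection probability, so no action is ruled out before any payoff has been observed.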


Step 3. Based on the current IoT situation, each QoS scheduler estimates his utility function (U(x)) according to (1) and (2).

Step 4. Using (5), each QoS scheduler periodically adjusts the p(·) values.
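Equation (5) is not reproduced in this excerpt, so the following is a hypothetical stand-in consistent with the description above: a learning-automata-style update that shifts probability mass toward the strategy whose observed payoff was large, with ψ controlling the learning size.

```python
def adjust_probs(p, payoffs, chosen, psi=1.0):
    """Hypothetical stand-in for (5): reinforce the chosen strategy in
    proportion to its normalized payoff, then renormalize.

    p       -- current selection probabilities over the m strategies
    payoffs -- most recent payoff estimate for each strategy
    chosen  -- index of the strategy just played
    psi     -- learning-size parameter (Table 1)
    """
    total = sum(payoffs) or 1.0
    reward = payoffs[chosen] / total      # normalized payoff in [0, 1]
    step = psi * reward
    # Shrink every probability, then give the freed mass to the winner.
    p = [(1.0 - step) * pi for pi in p]
    p[chosen] += step
    s = sum(p)
    return [pi / s for pi in p]
```

Repeated applications make high-payoff actions increasingly likely to be chosen again, which is exactly the feedback behavior the text attributes to the update of P(S′ | S, a_S).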

Step 5. Based on the probability distribution P(·), each P(S′ | S, a_S) is defined using the Boltzmann distribution.

Step 6. Iteratively, each QoS scheduler selects a strategy (a(t)) to maximize his long-term expected payoff (V(·)). This sequential learning process is repeatedly executed in a distributed manner.

Step 7. If a QoS scheduler attains a stable status (i.e., V^(t+1)(·) − V^(t)(·) < Δ), this scheduler is assumed to have obtained an equilibrium strategy. When all QoS schedulers achieve a stable status, the game process is temporarily stopped.

Step 8. Each QoS scheduler continuously self-monitors the current IoT situation and proceeds to Step 3 for the next iteration.
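The steps above can be sketched as one loop per scheduler. This is only an illustrative simplification: the real scheme couples the schedulers through the Markov state S and equations (1)–(6), which this excerpt does not reproduce, so the `utility` callable here is a hypothetical stand-in for the U(x) estimation of Step 3.

```python
import math
import random

def run_scheduler(utility, m=3, lam=1.0, phi=0.8, delta=1e-3,
                  max_iters=500, rng=None):
    """Sketch of the Step 1-8 loop for a single QoS scheduler.

    utility -- callable mapping a strategy index to an observed payoff
               (stand-in for U(x) in (1) and (2))
    lam     -- Boltzmann control parameter; phi -- discount factor
    delta   -- stability bound on the change of the expected payoff
    """
    rng = rng or random.Random(0)
    V = [0.0] * m                         # per-strategy payoff estimates
    prev_best = 0.0
    for _ in range(max_iters):
        # Step 5: Boltzmann distribution over the current estimates.
        mx = max(V)
        w = [math.exp(lam * (v - mx)) for v in V]
        probs = [x / sum(w) for x in w]
        # Step 6: select a strategy according to that distribution.
        a = rng.choices(range(m), weights=probs)[0]
        # Steps 3-4: observe the payoff and discount it into the estimate.
        V[a] = (1.0 - phi) * V[a] + phi * utility(a)
        # Step 7: stop once the expected payoff changes by less than delta.
        best = max(V)
        if abs(best - prev_best) < delta:
            break
        prev_best = best
    return max(range(m), key=V.__getitem__)   # equilibrium strategy index
```

Note that this single-agent sketch can stop prematurely if the value estimate happens to stall; the paper's scheme avoids this by requiring all schedulers to reach a stable status before the game process is paused.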

3. Performance Evaluation

In this section, we compare the performance of the proposed scheme with other existing schemes [4, 5, 13] and confirm the performance superiority of the proposed approach using a simulation model. Our simulation model is a representation of an IoT system that includes system entities and the behavior and interactions of these entities. To facilitate the development and implementation of our simulator, Table 1 lists the system parameters.

Our simulation results were achieved using MATLAB, which is widely used in academic and research institutions in addition to industrial enterprises. To emulate a real-world scenario, the assumptions of our simulation environment were as follows.

(i) The simulated system consisted of four QoS schedulers for the IoT system.

(ii) In each scheduler coverage area, new service requests arrived according to a Poisson process with rate ρ (services/s), and the offered service load was varied from 0 to 3.0.

(iii) There were three strategies (m) for the QoS schedulers, and each strategy (a_i, 1 ≤ i ≤ m) satisfied a_i ∈ {25 Mbps, 30 Mbps, 35 Mbps}. Therefore, there were in total m^N, that is, 3^4 = 81, system states, such as S = (25 Mbps, 25 Mbps, 25 Mbps, 25 Mbps), ..., (35 Mbps, 35 Mbps, 35 Mbps, 35 Mbps).

(iv) The resource of the IoT system was bandwidth (bps), and the total resource amount was 140 Mbps.

(v) Network performance measures obtained on the basis of 50 simulation runs were plotted as a function of the offered traffic load.

Figure 1: Resource usability in IoT systems (y-axis: resource usability; x-axis: service request rates; curves: the proposed scheme, the SQoSS scheme, the IDMS scheme, and the ADPP scheme).

(vi) The message size of each application was exponentially distributed, with different means for different message applications.

(vii) For simplicity, we assumed the absence of physical obstacles in the experiments.

(viii) The performance criteria obtained through simulation were resource usability, service availability, and normalized service delay.

(ix) Resource usability was defined as the percentage of the resource actually used.

(x) Service availability was the success ratio of the service requests.

(xi) The normalized service delay was the service delay measured from real network operations, normalized.
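The simulation setup in items (ii)–(iv) and the metrics in (ix)–(x) can be sketched directly from the assumptions above. This is our own minimal Python sketch, not the MATLAB simulator: the state enumeration, rates, and capacity values come from the list, while the helper names are our own (normalized service delay is omitted, since its measurement procedure is not specified in this excerpt).

```python
import itertools
import random

STRATEGIES = [25, 30, 35]     # Mbps; the three strategies per scheduler
N_SCHEDULERS = 4
TOTAL_CAPACITY = 140          # Mbps; total IoT system resource

# One strategy per scheduler gives m^N = 3^4 = 81 system states.
STATES = list(itertools.product(STRATEGIES, repeat=N_SCHEDULERS))

def poisson_arrivals(rho, horizon, rng=None):
    """Service-request arrival times on [0, horizon]: a Poisson process
    of rate rho (services/s) built from exponential inter-arrival gaps."""
    rng = rng or random.Random(42)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rho)
        if t > horizon:
            return times
        times.append(t)

def resource_usability(used, allocated):
    """(ix) Percentage of the allocated resource actually used."""
    return 100.0 * used / allocated

def service_availability(accepted, requested):
    """(x) Success ratio of the service requests."""
    return accepted / requested if requested else 1.0
```

For example, a state in which every scheduler plays 35 Mbps exactly saturates the 140 Mbps system capacity, which is why higher request rates push the system toward congestion in Figures 1–3.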

In this paper, we compared the performance of the proposed scheme with the existing schemes SQoSS [5], IDMS [4], and ADPP [13]. These existing schemes were recently developed as effective IoT management algorithms.

Figure 1 presents the performance comparison of each scheme in terms of resource usability in the IoT systems. In this study, resource usability is a measure of how system resources are used. Traditionally, monitoring how resources are used is one of the most critical aspects of IoT management. During the system operations, all schemes produced similar resource usability. However, the proposed scheme adaptively allocates resources to the IoT system in an incremental manner while ensuring different requirements. Therefore, the resource usability produced by the proposed scheme was higher than that of the other schemes from low to heavy service load intensities.

Figure 2 represents the service availability of each IoT control scheme. In this study, service availability is defined as the success ratio of the service requests. In general, excellent service availability is a highly desirable property for real-world IoT operations. As indicated in the results, it is clear that the performance trends are similar. As the service request rate increases, it can saturate or exceed the system capacity. Therefore, excessive service requests may lead to system congestion, decreasing the service availability. This is intuitively correct. Under various application service requests, the proposed game-based approach can provide a higher traffic service than the other schemes. From the above results, we conclude that the proposed scheme can provide higher service availability in IoT systems.

Figure 2: Service availability in IoT systems (y-axis: service availability; x-axis: service request rates; curves: the proposed scheme, the SQoSS scheme, the IDMS scheme, and the ADPP scheme).

Figure 3: Normalized service delay in IoT systems (y-axis: normalized service delay; x-axis: service request rates; curves: the proposed scheme, the SQoSS scheme, the IDMS scheme, and the ADPP scheme).

The curves in Figure 3 illustrate the normalized service delay for IoT services under different service loads. Typically, service delay is an important QoS metric and can reveal the fitness or unfitness of system protocols for different delay-sensitive applications. Owing to the feedback-based Markov game approach, the proposed scheme can dynamically adapt to the current situation and has a significantly lower service delay than the other schemes. From the results, we can observe that the proposed approach can support delay-sensitive applications and ensure a latency reduction in IoT services.

The simulation results presented in Figures 1–3 demonstrate the performance of the proposed and other existing schemes and verify that the proposed Markov game-based scheme can provide attractive network performance. The main features of the proposed scheme are as follows: (i) a new Markov game model based on a distributed learning approach is established; (ii) each QoS scheduler learns the uncertain system state according to local information; (iii) schedulers make decisions to maximize their own expected payoffs by considering network dynamics; and (iv) when selecting a strategy, schedulers consider not only the immediate payoff but also the subsequent decisions. The proposed scheme constantly monitors the current network conditions for adaptive IoT system management and successfully approximates the optimized performance. As expected, the proposed scheme outperformed the existing schemes [4, 5, 13].

4. Summary and Conclusions

Today, IoT-based services and applications are becoming an integral part of our everyday life. It is foreseeable that the IoT will be a part of the future Internet, where "things" can be wirelessly organized as a global network that provides dynamic services for applications and users. Therefore, IoT technology can bridge the gap between the virtual network and the world of "real things." Innovative uses of IoT techniques on the Internet will not only benefit users by providing access to a wide range of data sources but also generate challenges in accessing heterogeneous application data, especially in the dynamic environment of real-time IoT systems.

This paper addressed a QoS control algorithm for IoT systems. Using the learning-based Markov game model, QoS schedulers iteratively observed the current situation and repeatedly modified their strategies to effectively manage system resources. Through a step-by-step feedback process, the proposed scheme effectively approximated the optimized system performance in an entirely distributed manner. The most important novelties of the proposed scheme are its adaptability and responsiveness to current system conditions. The simulation results confirmed that the proposed game-based approach improves performance under dynamically changing IoT system environments, whereas the existing schemes cannot offer such attractive performance. Resource usability, service availability in IoT systems, normalized service delay, and accuracy were improved by approximately 5%, 10%, 10%, and 5%, respectively, compared to the existing schemes.

Furthermore, our study opens the door to several interesting extensions. In the future, we plan to design new reinforcement-learning models and develop adaptive online feedback algorithms; this is a promising extension of this study and can further improve the performance of IoT systems. Moreover, it would be interesting to extend the Markov game model to various decision-theoretic frameworks under uncertain system environments; this would be an interesting topic for future research.

Conflict of Interests

The author, Sungwook Kim, declares that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This research was supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2015-H8501-15-1018) supervised by the IITP (Institute for Information & Communications Technology Promotion), and by the Sogang University Research Grant of 2014 (20141002001).

References

[1] G. Sallai, "Chapters of future internet research," in Proceedings of the 4th IEEE International Conference on Cognitive Infocommunications (CogInfoCom '13), pp. 161–166, IEEE, Budapest, Hungary, December 2013.

[2] K. Ashton, "That 'Internet of Things' thing: in the real world things matter more than ideas," RFID Journal, 2009, http://www.rfidjournal.com/article/print/4986.

[3] Q. Wu, G. Ding, Y. Xu et al., "Cognitive internet of things: a new paradigm beyond connection," IEEE Internet of Things Journal, vol. 1, no. 2, pp. 129–143, 2014.

[4] Q. Zhang and D. Peng, "Intelligent decision-making service framework based on QoS model in the internet of things," in Proceedings of the 11th International Symposium on Distributed Computing and Applications to Business, Engineering and Science (DCABES '12), pp. 103–107, Guilin, China, October 2012.

[5] L. Li, S. Li, and S. Zhao, "QoS-aware scheduling of services-oriented internet of things," IEEE Transactions on Industrial Informatics, vol. 10, no. 2, pp. 1497–1507, 2014.

[6] S. Kim, "Adaptive ad-hoc network routing scheme by using incentive-based model," Ad Hoc & Sensor Wireless Networks, vol. 15, no. 2, pp. 107–125, 2012.

[7] G. Pujolle, "Metamorphic networks," Journal of Computing Science and Engineering, vol. 7, no. 3, pp. 198–203, 2013.

[8] I. Jang, D. Pyeon, S. Kim, and H. Yoon, "A survey on communication protocols for wireless sensor networks," Journal of Computing Science and Engineering, vol. 7, no. 4, pp. 231–241, 2013.

[9] J. van der Wal, "Discounted Markov games: successive approximation and stopping times," International Journal of Game Theory, vol. 6, no. 1, pp. 11–22, 1977.

[10] P. Vrancx, K. Verbeeck, and A. Nowé, "Decentralized learning in Markov games," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 38, no. 4, pp. 976–981, 2008.

[11] K. Edemacu and T. Bulega, "Resource sharing between M2M and H2H traffic under time-controlled scheduling scheme in LTE networks," in Proceedings of the 8th International Conference on Telecommunication Systems, Services, and Applications (TSSA '14), pp. 1–6, Kuta, Indonesia, October 2014.

[12] X. Jin, S. Chun, J. Jung, and K.-H. Lee, "IoT service selection based on physical service model and absolute dominance relationship," in Proceedings of the 7th IEEE International Conference on Service-Oriented Computing and Applications (SOCA '14), pp. 65–72, Matsue, Japan, November 2014.

[13] X. Luo, H. Luo, and X. Chang, "Online optimization of collaborative web service QoS prediction based on approximate dynamic programming," in Proceedings of the International Conference on Identification, Information and Knowledge in the Internet of Things (IIKI '14), pp. 80–83, IEEE, Beijing, China, October 2014.

[14] A. Imran, M. Bennis, and L. Giupponi, "Use of learning, game theory and optimization as biomimetic approaches for self-organization in macro-femtocell coexistence," in Proceedings of the IEEE Wireless Communications and Networking Conference Workshops (WCNCW '12), pp. 103–108, IEEE, Paris, France, April 2012.


[10] P Vrancx K Verbeeck and A Nowe ldquoDecentralized learningin Markov gamesrdquo IEEE Transactions on Systems Man andCybernetics Part B Cybernetics vol 38 no 4 pp 976ndash981 2008

[11] K Edemacu and T Bulega ldquoResource sharing between M2Mand H2H traffic under time-controlled scheduling scheme in

LTE networksrdquo in Proceedings of the 8th International Confer-ence on Telecommunication Systems Services and Applications(TSSA rsquo14) pp 1ndash6 Kuta Indonesia October 2014

[12] X Jin S Chun J Jung and K-H Lee ldquoIoT service selectionbased on physical service model and absolute dominancerelationshiprdquo in Proceedings of the 7th IEEE International Con-ference on Service-Oriented Computing and Applications (SOCArsquo14) pp 65ndash72 Matsue Japan November 2014

[13] X Luo H Luo and X Chang ldquoOnline optimization of col-laborative web service QoS prediction based on approximatedynamic programmingrdquo in Proceedings of the InternationalConference on Identification Information and Knowledge in theInternet of Things (IIKI rsquo14) pp 80ndash83 IEEE Beijing ChinaOctober 2014

[14] A Imran M Bennis and L Giupponi ldquoUse of learning gametheory and optimization as biomimetic approaches for Self-Organization inmacro-femtocell coexistencerdquo in Proceedings ofthe IEEE Wireless Communications and Networking ConferenceWorkshops (WCNCW rsquo12) pp 103ndash108 IEEE Paris FranceApril 2012


Mobile Information Systems 7

[Figure 2: Service availability in IoT systems, plotted against the service request rate (0.5–3), comparing the proposed scheme with the SQoSS, IDMS, and ADPP schemes.]

[Figure 3: Normalized service delay in IoT systems, plotted against the service request rate (0.5–3), comparing the proposed scheme with the SQoSS, IDMS, and ADPP schemes.]

world IoT operations. As indicated in the results, the performance trends are similar. As the service request rate increases, it can saturate or exceed the system capacity; excessive service requests may therefore lead to system congestion, decreasing service availability. This is intuitively correct. Under various application service requests, the proposed game-based approach provides higher traffic service than the other schemes. From the above results, we conclude that the proposed scheme can provide higher service availability in IoT systems.
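The saturation effect described above can be illustrated with a toy capacity model: once the offered request rate exceeds the system capacity, the fraction of requests that can be admitted (a rough proxy for service availability) falls off. This is only an illustrative sketch, not the paper's simulation model; the capacity value and the rates below are assumptions.

```python
# Illustrative only: how service availability degrades once the offered
# request rate exceeds system capacity. The capacity value is an
# arbitrary assumption, not a parameter from the paper's simulator.

def service_availability(request_rate: float, capacity: float = 2.0) -> float:
    """Fraction of offered requests the system can admit."""
    if request_rate <= capacity:
        return 1.0                      # below saturation, every request is served
    return capacity / request_rate      # beyond saturation, the excess is dropped

for rate in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0):
    print(f"rate={rate:.1f}  availability={service_availability(rate):.2f}")
```

As in Figure 2, availability stays flat until the request rate reaches capacity and then declines monotonically as congestion grows.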

The curves in Figure 3 illustrate the normalized service delay for IoT services under different service loads. Typically, service delay is an important QoS metric and can reveal the fitness or unfitness of system protocols for different delay-sensitive applications. Owing to the feedback-based Markov game approach, the proposed scheme can dynamically adapt to the current situation and has a significantly lower service delay than the other schemes. From the results, we observe that the proposed approach can support delay-sensitive applications and ensure a latency reduction in IoT services.

The simulation results presented in Figures 1–3 demonstrate the performance of the proposed and other existing schemes and verify that the proposed Markov game-based scheme can provide attractive network performance. The main features of the proposed scheme are as follows: (i) a new Markov game model based on a distributed learning approach is established; (ii) each QoS scheduler learns the uncertain system state according to local information; (iii) schedulers make decisions to maximize their own expected payoff by considering network dynamics; and (iv) when selecting a strategy, schedulers consider not only the immediate payoff but also subsequent decisions. The proposed scheme constantly monitors the current network conditions for adaptive IoT system management and successfully approximates the optimized performance. As expected, the proposed scheme outperforms the existing schemes [4, 5, 13].
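The four features listed above amount to a decentralized reinforcement-learning loop: each scheduler observes a local state, chooses a strategy that trades off immediate payoff against discounted future payoffs, and updates its value estimates from feedback. The sketch below shows one plausible per-scheduler Q-learning update of this kind; the state representation, payoff model, and parameter values are hypothetical illustrations, not the paper's exact formulation.

```python
import random

# Hedged sketch of a decentralized learning QoS scheduler in a Markov game.
# All states, actions, payoffs, and parameters here are illustrative
# assumptions; the paper defines its own state space and payoff functions.

class QoSScheduler:
    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.actions = actions   # candidate resource-allocation strategies
        self.alpha = alpha       # learning rate for feedback updates
        self.gamma = gamma       # discount on subsequent decisions (feature iv)
        self.epsilon = epsilon   # exploration probability
        self.q = {}              # Q[(state, action)] -> estimated expected payoff

    def choose(self, state):
        # Explore occasionally; otherwise pick the strategy with the best
        # estimated payoff for the locally observed state (features ii-iii).
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q.get((state, a), 0.0))

    def update(self, state, action, payoff, next_state):
        # Learn from locally observed feedback only (features i-ii): the
        # target is the immediate payoff plus the discounted value of the
        # best strategy in the next state.
        best_next = max(self.q.get((next_state, a), 0.0) for a in self.actions)
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + self.alpha * (payoff + self.gamma * best_next - old)
```

Each scheduler runs this observe-decide-update loop independently; the other schedulers influence it only through the observed state transitions and payoffs, which is what makes the overall system a Markov game rather than a single-agent decision process.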

4. Summary and Conclusions

Today, IoT-based services and applications are becoming an integral part of our everyday life. It is foreseeable that the IoT will be a part of the future Internet, where "things" can be wirelessly organized as a global network that provides dynamic services for applications and users. Therefore, IoT technology can bridge the gap between the virtual network and the world of "real things". Innovative uses of IoT techniques on the Internet will not only provide benefits to users in accessing wide ranges of data sources but also generate challenges in accessing heterogeneous application data, especially in the dynamic environment of real-time IoT systems.

This paper addressed a QoS control algorithm for IoT systems. Using the learning-based Markov game model, QoS schedulers iteratively observe the current situation and repeatedly modify their strategies to effectively manage system resources. Through this step-by-step feedback process, the proposed scheme effectively approximates the optimized system performance in an entirely distributed manner. The most important novelties of the proposed scheme are its adaptability and responsiveness to current system conditions. The simulation results confirmed that the proposed game-based approach can improve performance under dynamically changing IoT system environments, whereas the existing schemes cannot offer such attractive performance. Resource usability, service availability in IoT systems, normalized service delay, and accuracy were improved by approximately 5%, 10%, 10%, and 5%, respectively, compared to the existing schemes.

Furthermore, our study opens the door to several interesting extensions. In the future, we plan to design new reinforcement-learning models and develop adaptive online feedback algorithms. This is a potential direction and possible extension to this study that can further improve the performance of IoT systems. Moreover, it would be interesting to extend the Markov game model to various decision-theoretic frameworks; under uncertain system environments, this would be an interesting topic for future research.

Conflict of Interests

The author, Sungwook Kim, declares that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This research was supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) Support Program (IITP-2015-H8501-15-1018) supervised by the IITP (Institute for Information & Communications Technology Promotion), and by the Sogang University Research Grant of 2014 (20141002001).

References

[1] G. Sallai, "Chapters of future internet research," in Proceedings of the 4th IEEE International Conference on Cognitive Infocommunications (CogInfoCom '13), pp. 161–166, IEEE, Budapest, Hungary, December 2013.

[2] K. Ashton, "That 'Internet of Things' thing: in the real world things matter more than ideas," RFID Journal, 2009, http://www.rfidjournal.com/article/print/4986.

[3] Q. Wu, G. Ding, Y. Xu et al., "Cognitive internet of things: a new paradigm beyond connection," IEEE Internet of Things Journal, vol. 1, no. 2, pp. 129–143, 2014.

[4] Q. Zhang and D. Peng, "Intelligent decision-making service framework based on QoS model in the internet of things," in Proceedings of the 11th International Symposium on Distributed Computing and Applications to Business, Engineering and Science (DCABES '12), pp. 103–107, Guilin, China, October 2012.

[5] L. Li, S. Li, and S. Zhao, "QoS-aware scheduling of services-oriented internet of things," IEEE Transactions on Industrial Informatics, vol. 10, no. 2, pp. 1497–1507, 2014.

[6] S. Kim, "Adaptive ad-hoc network routing scheme by using incentive-based model," Ad Hoc & Sensor Wireless Networks, vol. 15, no. 2, pp. 107–125, 2012.

[7] G. Pujolle, "Metamorphic networks," Journal of Computing Science and Engineering, vol. 7, no. 3, pp. 198–203, 2013.

[8] I. Jang, D. Pyeon, S. Kim, and H. Yoon, "A survey on communication protocols for wireless sensor networks," Journal of Computing Science and Engineering, vol. 7, no. 4, pp. 231–241, 2013.

[9] J. van der Wal, "Discounted Markov games: successive approximation and stopping times," International Journal of Game Theory, vol. 6, no. 1, pp. 11–22, 1977.

[10] P. Vrancx, K. Verbeeck, and A. Nowe, "Decentralized learning in Markov games," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 38, no. 4, pp. 976–981, 2008.

[11] K. Edemacu and T. Bulega, "Resource sharing between M2M and H2H traffic under time-controlled scheduling scheme in LTE networks," in Proceedings of the 8th International Conference on Telecommunication Systems Services and Applications (TSSA '14), pp. 1–6, Kuta, Indonesia, October 2014.

[12] X. Jin, S. Chun, J. Jung, and K.-H. Lee, "IoT service selection based on physical service model and absolute dominance relationship," in Proceedings of the 7th IEEE International Conference on Service-Oriented Computing and Applications (SOCA '14), pp. 65–72, Matsue, Japan, November 2014.

[13] X. Luo, H. Luo, and X. Chang, "Online optimization of collaborative web service QoS prediction based on approximate dynamic programming," in Proceedings of the International Conference on Identification, Information and Knowledge in the Internet of Things (IIKI '14), pp. 80–83, IEEE, Beijing, China, October 2014.

[14] A. Imran, M. Bennis, and L. Giupponi, "Use of learning, game theory and optimization as biomimetic approaches for self-organization in macro-femtocell coexistence," in Proceedings of the IEEE Wireless Communications and Networking Conference Workshops (WCNCW '12), pp. 103–108, IEEE, Paris, France, April 2012.


