

Optimal Adaptive Modulation for QoS Constrained Wireless Networks with Renewable Energy Sources

Abhishek Agarwal and Aditya K. Jagannatham

Abstract—In this paper we derive the optimal policy to minimize the average number of dropped packets for a delay constrained wireless node with a renewable energy source. The proposed framework employs adaptive modulation for transmission of the optimal number of packets towards satisfying the Quality of Service (QoS) constraints. This framework is formulated as a Markov Decision Process (MDP) which minimizes the long term packet drop rate. Further, we demonstrate that the optimal policy is monotone with respect to each of the joint state components of the MDP. Simulation results are presented to validate the monotone structure and superior performance of the proposed policy with respect to the existing schemes.

Index Terms—Markov decision process, adaptive modulation, renewable energy source, quality of service.

I. INTRODUCTION

MODERN wireless communication networks are rapidly progressing towards the transmission of delay-sensitive rich multimedia content over packet switched networks. In such scenarios, satisfying the associated Quality of Service (QoS) constraints with respect to delay, jitter, and throughput under erratic channel conditions requires large amounts of energy. Hence, in energy limited battery powered systems, it is critical to optimally allocate energy for QoS satisfaction. In this scenario, to ease the system energy demands, renewable energy sources (such as solar and wind) can be utilized to augment the battery and hence enhance the performance of mobile nodes. The random charging nature of such renewable sources calls for efficient packet scheduling policies for performance maximization.

Towards this end, we propose a framework for optimal packet scheduling in this delay and QoS constrained wireless scenario employing renewable energy resources. This problem is formulated as a Markov Decision Process (MDP) with a maximum bit-error rate (BER) QoS constraint. The state space of the MDP consists of the wireless channel state, renewable battery energy, and data queue length at the transmitting wireless node. The optimal policy of the MDP aims to minimize the long term packet drop percentage through optimal selection of the transmit power and data rate parameters. Further, we demonstrate that the proposed policy has desirable monotone properties with respect to each state component, leading to a computationally feasible packet scheduling scheme suitable for nodes with limited processing capabilities.

Manuscript received September 20, 2012. The associate editor coordinating the review of this letter and approving it for publication was D. Niyato.

The authors are with the Department of EE, IIT Kanpur, 208016, India (e-mail: [email protected]).

Digital Object Identifier 10.1109/WCL.2012.112012.120696

Recently, resource allocation for renewable energy wireless nodes has attracted significant research interest. In [1] the authors present a scheme for long term capacity maximization through efficient transmit power allocation. However, the authors therein assume infinitely backlogged data at the transmitter and a transmission rate that is continuously adaptable as a function of the transmit power. Other works such as [2], [3], [4], [5] consider similar scheduling problems for single and multiple users without renewable energy sources. The authors in [2], [3] consider data scheduling with the aim of minimizing the average transmit power while limiting the average delay, assuming that an instantaneous error-free feedback channel is available. The scheme in [5] uses multiple packet transmissions to counter channel fading. A significant drawback of the scheme therein is the absence of delay constraints, rendering it unsuitable for multimedia transmission in practical scenarios. Other related studies such as [6], [7] consider average power constrained systems without renewable energy sources.

In this paper we study the data scheduling problem in a practical context, with a finite buffer length and data transmission constrained to finite adaptive M-ary Quadrature Amplitude Modulation (M-QAM) constellations. The joint state MDP captures the trade-off between the packet drop rate and the power consumed through the Markovian state transitions. Our novel low complexity monotone policy describes optimal data scheduling for delay and QoS constrained systems with renewable energy sources, for long term efficient power utilization. The proposed policy minimizes the average packet drop rate by employing insights into the convexity properties of the system transmit power cost function. Results demonstrate the superior performance of the proposed scheme compared to the existing optimal policy for renewable energy sources derived in [1].

This paper is organized as follows. Section II provides an overview of the system and formulates the MDP for the above scenario. Section III presents the central results that establish the monotonicity properties of the optimal policy for the MDP. Simulation results are presented in Section IV and Section V concludes the paper.

II. SYSTEM MODEL AND MDP FORMULATION

We consider a renewable energy source based wireless mobile user (MU) with a finite data buffer. The energy recharge of the renewable battery is modeled as a random process, independent across different time slots, to capture the variation in the charging phenomenon, similar to [1]. The wireless channel between the MU and the base station (BS) is assumed to be Rayleigh fading. The MU employs adaptive modulation to vary its data rate with the channel conditions such that the maximum BER is constrained by the QoS restrictions. This requires the MU to transmit a given number of bits (the constellation size of the modulation scheme) at a minimum power. Hence, the transmit power required for a QAM constellation of order $2^a$ is given from [8] as

$$P\left(a, \|h\|^2\right) = \frac{\Gamma\left(2^a - 1\right)}{\|h\|^2}, \qquad (1)$$

where $h$ is the Rayleigh fading channel coefficient and $\Gamma = \frac{N_0}{3}\left[Q^{-1}\left(P_e/4\right)\right]^2$ is a constant dependent on the noise power spectral density $N_0$ and the QoS based maximum tolerable BER constraint $P_e$. Each time slot comprises $N$ symbol transmissions of the adaptively chosen constellation belonging to the class of M-QAM constellations. Hence, without loss of generality, we consider a model normalized by $N$.
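To make (1) concrete, the following minimal Python sketch (ours, not the authors') evaluates the required transmit power; the values of $N_0$ and $P_e$ are illustrative assumptions.

```python
# Illustrative sketch of the transmit power in (1); N0 and Pe are assumed values.
from scipy.stats import norm

def tx_power(a: int, g: float, N0: float = 1.0, Pe: float = 1e-3) -> float:
    """Minimum power for a 2^a-point QAM at channel gain g = ||h||^2 under BER Pe."""
    q_inv = norm.ppf(1.0 - Pe / 4.0)   # inverse Gaussian Q-function, Q^{-1}(Pe/4)
    gamma = (N0 / 3.0) * q_inv ** 2    # the constant Gamma in (1)
    return gamma * (2.0 ** a - 1.0) / g
```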

We formulate this paradigm as an infinite horizon discounted Markov Decision Process with $s_n$, the state at time $n$, defined as
$$s_n = (g_n, e_n, b_n) \in \mathcal{S},$$
where $g_n = \|h_n\|^2$, $e_n$ is the battery energy available at time $n$, $b_n$ is the data queue length, and $\mathcal{S}$ denotes the state space. The optimal stationary policy for the above MDP gives the appropriate choice of the adaptive QAM constellation size $a_n$ at time instant $n$ towards long term packet drop minimization. The action space at time $n$, $\mathcal{A}(s_n) = \{0, 1, \ldots, a_{\max}(s_n)\}$, is restricted by the amount of energy available and the bits present in the buffer. Hence, $P(a_{\max}(s_n), g_n) \le e_n$ and $a_{\max}(s_n) \le b_n$. Due to the strict delay constraint, the untransmitted data is rendered redundant after one time epoch. Hence, the reward at time $n$ can be defined as

$$r_n(b_n, a_n) = -(b_n - a_n), \qquad (2)$$

which essentially imposes a penalty proportional to the number of packets lost. The MU objective of long term average reward maximization is captured by the average discounted reward function defined as

$$R^{\pi}(s_n) = r(s_n, \pi(s_n)) + \beta \sum_{s_{n+1} \in \mathcal{S}} q(s_{n+1} \mid s_n, a_n)\, R^{\pi}(s_{n+1}), \qquad (3)$$
where $\pi : \mathcal{S} \to \cup_{s \in \mathcal{S}} \mathcal{A}(s)$ is the MU adaptive modulation policy, $q(s_{n+1} \mid s_n, a_n)$ is the transition probability, and $\beta$ is the discount factor. The transition from one state to another can be defined as

$$s_{n+1} = \left(g_{n+1},\; e_n - P(a_n, g_n) + \xi_n,\; b_{n+1}\right),$$

where $g_{n+1}$ is the uncorrelated exponentially distributed channel gain, $\xi_n$ is an independent random variable denoting the amount of energy recharged, $b_{n+1}$ is an independent data arrival random variable, and $a_n$ is the action chosen at time $n$. The transition probability $q(s_{n+1} \mid s_n, a_n)$ can be computed from the probability densities of the random variables $g_{n+1}$, $\xi_n$, and $b_{n+1}$ as follows,

$$\begin{aligned} q(s_{n+1} \mid s_n, a_n) &= q(g_{n+1}) \times q(e_{n+1} \mid e_n, g_n, a_n) \times q(b_{n+1}) \\ &= q(g_{n+1}) \times q\left(\xi_n = e_{n+1} - e_n + P(a_n, g_n)\right) \times q(b_{n+1}). \end{aligned} \qquad (4)$$
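As a hedged illustration of the factorization in (4), the sketch below evaluates $q(s_{n+1} \mid s_n, a_n)$ for the quantized model from user-supplied probability mass functions; the names `p_g`, `p_xi`, `p_b` and the cost function `P` are placeholders of our own, not the authors' code.

```python
# Sketch of the factored transition probability (4); p_g, p_xi, p_b are
# placeholder pmfs (dicts mapping value -> probability), P is the cost in (1).
def transition_prob(s_next, s, a, p_g, p_xi, p_b, P):
    g, e, b = s
    g_next, e_next, b_next = s_next
    xi = e_next - e + P(a, g)          # recharge amount needed to land on e_next
    return p_g.get(g_next, 0.0) * p_xi.get(xi, 0.0) * p_b.get(b_next, 0.0)
```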

Let $g_n \in \{0, 1, \ldots, H_m\}$, $e_n \in \{0, 1, \ldots, E_m\}$, and $b_n \in \{0, 1, \ldots, L\}$ denote the quantized state components, where $H_m$ denotes the size of the channel state space, $E_m$ the number of energy units in the battery, and $L$ the data buffer size. Let the maximum constellation size of the M-QAM be $2^M$. The action space is limited by the maximum energy that can be used in one time epoch, $E_{\max}$. As shown in Section IV, even for small parameter values such as $M = 4$, $E_m = 27$, $E_{\max} = 9$, $L = M = 4$, the size of the state space is $|\mathcal{S}| \approx 6000$, which renders the standard value and policy iteration based techniques for optimal policy computation impractical. Hence, in the next section we prove key properties of the above MDP, for arbitrary distributions of the data arrival and energy recharge processes, such that the computation of an optimal policy becomes tractable.

III. MONOTONICITY PROPERTIES OF THE OPTIMAL POLICY

In this section we employ the convexity properties of $P(a, g)$ in (1) to derive the key monotonicity properties of the optimal policy for the above MDP. We denote the channel, energy, and buffer state components by $g$, $e$, and $b$, respectively, dropping the time index. The action is denoted by the variable $a$.

A. Concavity of Average Reward in energy state, $e$

The action space is limited by the amount of battery energy, $P(a_{\max}(s), g) \le e$, and the data queue, $a_{\max}(s) \le b$. Hence, increasing the battery energy enlarges the action space, i.e., if $e^1 \le e^2$ and $\mathcal{A}^1, \mathcal{A}^2$ are the action spaces associated with the states $(g, e^1, b)$ and $(g, e^2, b)$, respectively, then $\mathcal{A}^2 \supseteq \mathcal{A}^1$. Thus, the optimal value of the function $R(g, e, b)$ must be non-decreasing in $e$, i.e., $R(g, e^1, b) \le R(g, e^2, b)$. Secondly, it can be readily seen from (1) that the function $P(a, g)$ is convex in $a$. Thus, for two different actions $a^1$ and $a^2$ we have

$$\alpha P\left(a^1, g\right) + (1 - \alpha) P\left(a^2, g\right) \ge P(a, g), \qquad (5)$$
where $0 \le \alpha \le 1$ and $a = \alpha a^1 + (1 - \alpha) a^2$. We now prove the concavity of $R(g, e, b)$ in $e$ in the following lemma.

Lemma 1. Let $e$ be defined as $e = \alpha e^1 + (1 - \alpha) e^2$ for two different energy states $e^1$ and $e^2$. It then follows that
$$\alpha R\left(g, e^1, b\right) + (1 - \alpha) R\left(g, e^2, b\right) \le R\left(g, e, b\right).$$

Proof: We prove the above result using induction. For ease of illustration, we replace $R_k(s) = R_k(g, e, b)$ by $R_k(e)$ in this section. In [9], it has been demonstrated that the optimal value function $R(s)$ under value iteration satisfies
$$R_{k+1}(e) = \max_{a}\left\{ r(b, a) + \beta \, \mathbb{E}_{g', \xi, b'}\left[ R_k\left(e - P(g, a) + \xi\right) \right] \right\}. \qquad (6)$$
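For concreteness, a minimal sketch of the backup in (6) on the quantized model is given below; the enumeration over pmf dictionaries and the battery clipping at $E_m$ are our assumptions rather than the paper's implementation.

```python
# Illustrative value-iteration backup for (6); R_k is a dict over states
# (g, e, b), actions(s) enumerates A(s), p_g, p_xi, p_b are assumed pmfs.
def backup(R_k, s, actions, p_g, p_xi, p_b, P, beta=0.85, Em=27):
    g, e, b = s
    best = float("-inf")
    for a in actions(s):
        tail = 0.0                               # E_{g',xi,b'}[R_k(e - P + xi)]
        for g2, pg in p_g.items():
            for xi, pxi in p_xi.items():
                e2 = min(Em, e - P(a, g) + xi)   # battery clipped at capacity (assumption)
                for b2, pb in p_b.items():
                    tail += pg * pxi * pb * R_k[(g2, e2, b2)]
        best = max(best, -(b - a) + beta * tail) # stage reward (2) plus discounted tail
    return best
```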

Under the inductive assumption, let $R_k(s)$ be concave in $e$. Now, to prove the lemma above we have to demonstrate
$$\alpha R_{k+1}\left(e^1\right) + (1 - \alpha) R_{k+1}\left(e^2\right) \le R_{k+1}(e). \qquad (7)$$

Let the optimal action at state $\left(g, e^i, b\right)$ be given by $a^i$, $i \in \{1, 2\}$. Thus, $R_{k+1}\left(e^i\right)$ is given as
$$R_{k+1}\left(e^i\right) = r\left(b, a^i\right) + \beta \, \mathbb{E}_{g', \xi, b'}\left[ R_k\left(e^i - P\left(g, a^i\right) + \xi\right) \right], \qquad (8)$$


where $i = 1, 2$. From the concavity of $R_k$ we have
$$\begin{aligned} \alpha R_k\left(e^1 - P\left(g, a^1\right) + \xi\right) &+ (1 - \alpha) R_k\left(e^2 - P\left(g, a^2\right) + \xi\right) \\ &\le R_k\left(e - \alpha P\left(g, a^1\right) - (1 - \alpha) P\left(g, a^2\right) + \xi\right) \\ &\le R_k\left(e - P(g, a) + \xi\right), \end{aligned} \qquad (9)$$

where the last inequality follows from (5) and the non-decreasing nature of $R(g, e, b)$ in $e$. Thus, employing (9) and the linearity of the stage reward $r(b, a)$ in $a$, we have
$$\begin{aligned} \alpha R_{k+1}\left(e^1\right) + (1 - \alpha) R_{k+1}\left(e^2\right) &\le r(b, a) + \beta \, \mathbb{E}\left[ R_k\left(e - P(g, a) + \xi\right) \right] \\ &\le R_{k+1}(e), \end{aligned}$$
where the last inequality follows from the fact that the best action for $R_{k+1}$ in (6) cannot be worse than $a$, thus proving (7).

We now prove the monotonicity of the optimal policy with respect to the channel gain $g$.

B. Monotonicity of the optimal policy in channel state, $g$

Let the optimal policy in state $s^i = \left(g^i, e, b\right)$ be $a^i$, $i \in \{1, 2\}$. We need to demonstrate that $g^1 < g^2 \implies a^1 \le a^2$. Assume instead that $g^1 < g^2$ and $a^1 > a^2$. Consider the argument of the maximization in (6) for action $a$, defined for state $s = (g, e, b)$ as
$$R(s, a) = r(b, a) + \beta \, \mathbb{E}\left[ R\left(g', e - P(g, a) + \xi, b'\right) \right]. \qquad (10)$$
We have
$$R\left(s^1, a^1\right) \ge R\left(s^1, a^2\right), \qquad (11)$$
$$R\left(s^2, a^2\right) \ge R\left(s^2, a^1\right). \qquad (12)$$

Adding inequalities (11) and (12) and employing the relation in (10), we have
$$E_R\left(p_{1,1}\right) + E_R\left(p_{2,2}\right) \ge E_R\left(p_{1,2}\right) + E_R\left(p_{2,1}\right), \qquad (13)$$
where $p_{i,j} = e - P\left(a^i, g^j\right)$, $1 \le i, j \le 2$, and $E_R(p) = \mathbb{E}_{g', \xi, b'}\left[ R\left(g', p + \xi, b'\right) \right]$. From the definition of $P(a, g)$ in (1) it can be seen that

$$p_{1,1} + p_{2,2} < p_{1,2} + p_{2,1}. \qquad (14)$$
Further, we have either $p_{1,1} \le p_{1,2} \le p_{2,1} \le p_{2,2}$ or $p_{1,1} \le p_{2,1} \le p_{1,2} \le p_{2,2}$. Thus, from the concavity and monotonicity of $R(g, e, b)$ in $e$ and (14), we have
$$E_R\left(p_{1,1}\right) + E_R\left(p_{2,2}\right) \le E_R\left(p_{1,2}\right) + E_R\left(p_{2,1}\right),$$
which contradicts (13). Thus, given $g^1 < g^2$ we must have $a^1 \le a^2$, which completes the proof.
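A quick numeric sanity check of (14), with illustrative action and gain values satisfying $a^1 > a^2$ and $g^1 < g^2$ (our example, not from the paper):

```python
# Check of (14): with a1 > a2 and g1 < g2, P(a1,g1) + P(a2,g2) exceeds
# P(a1,g2) + P(a2,g1), hence p11 + p22 < p12 + p21. Values are illustrative.
Gamma = 0.19
P = lambda a, g: Gamma * (2 ** a - 1) / g
a1, a2, g1, g2 = 3, 1, 0.5, 2.0
assert P(a1, g1) + P(a2, g2) > P(a1, g2) + P(a2, g1)
```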

Using similar arguments we can prove the monotonicity of the optimal policy in the energy state $e$. This proof is omitted for the sake of brevity. We now prove the monotonicity of the optimal policy in $b$.

C. Monotonicity of the optimal policy in buffer state, $b$

Consider the states $s^i = \left(g, e, b^i\right)$ for $i \in \{1, 2\}$ such that $b^1 < b^2$. Let the optimal actions in $s^i$ be $a^i$, $i \in \{1, 2\}$. One can show that $b^1 < b^2 \implies \mathcal{A}\left(s^1\right) \subseteq \mathcal{A}\left(s^2\right)$. From (10) it can be observed that the second term in $R\left(s^1, a\right)$ and $R\left(s^2, a\right)$ is identical for all $a$. The first term $r(b, a)$ differs only by a constant, $\left(b^1 - b^2\right)$. Thus, if $a^2 \in \mathcal{A}\left(s^1\right)$ we must have $a^1 = a^2$. Otherwise, if $a^2 \in \mathcal{A}\left(s^2\right) - \mathcal{A}\left(s^1\right)$, we necessarily have $a^2 > a^1$. Hence, the optimal action is monotone in $b$.

Further, we can show that if $a^2 \in \mathcal{A}\left(s^2\right) - \mathcal{A}\left(s^1\right)$ we must have $a^1 = \max \mathcal{A}\left(s^1\right)$. The first term in (10) is linearly increasing in $a$. From the convexity of $P(a, g)$ and the concavity of $R(g, e, b)$ in $e$, it can be readily seen that the second term is concave decreasing in $a$. Therefore, if $a^2 \in \mathcal{A}\left(s^2\right) - \mathcal{A}\left(s^1\right)$, restricting the action space to $\mathcal{A}\left(s^1\right)$, it follows that $a^1 = \max \mathcal{A}\left(s^1\right)$.

IV. SIMULATION RESULTS

In our simulation setup for the above delay constrained wireless scheduling scenario, we consider M-QAM constellations with maximum modulation size $2^M = 2^4$. The buffer length considered is $L = M = 4$. The maximum number of energy units is $E_m = 27$. We assume $\mathbb{E}[g] = \mathbb{E}[\|h\|^2] = 1$ and $\Gamma = 0.19$. The action space is restricted to use a maximum of $E_{\max} = 9$ energy units in one time epoch. The data arrival process $b_n$ is assumed to be Poisson distributed with mean $\lambda$. The probability of data loss due to buffer overflow is neglected. Hence, we consider a finite support $\{0, 1, \ldots, L\}$ for $b_n$. The discount factor is $\beta = 0.85$. The energy renewal process $\xi_n$ is considered to be an independent process, similar to [1], with an exponential distribution with support $\{0, 1, \ldots, \xi_{\max}\}$, $\xi_{\max} = 9$, and mean $\mu$.
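A possible construction of these two distributions is sketched below; the truncation and renormalization choices are our assumptions, not a prescription from the paper.

```python
# Sketch of the arrival and recharge pmfs from the setup: Poisson arrivals
# truncated to {0,...,L} and an exponential recharge quantized to {0,...,9}.
import numpy as np
from scipy.stats import expon, poisson

L, xi_max, lam, mu = 4, 9, 2.0, 0.992

b_support = np.arange(L + 1)
p_b = poisson.pmf(b_support, lam)
p_b /= p_b.sum()                 # buffer overflow neglected: mass renormalized on {0..L}

xi_support = np.arange(xi_max + 1)
p_xi = expon.pdf(xi_support, scale=mu)
p_xi /= p_xi.sum()               # quantized exponential recharge with mean near mu
```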

For a practical wireless system scenario, the energy and channel states have to be quantized. Hence, the associated energy function $P(a, g)$ is adapted to satisfy the QoS constraints as
$$P(a, g) = \left\lceil \frac{\Gamma\left(2^a - 1\right)}{g} \times \frac{E_m}{E_{\max}} \right\rceil,$$
where $\lceil x \rceil$ denotes the ceiling function. For the transmission of $a$ bits, the channel gain can be divided into $E_m + 1$ intervals according to the cost required in each interval. Thus,
$$H_m + 1 \le E_{\max}(M + 1) + 1. \qquad (15)$$
The actual value of $H_m$ will be slightly lower than $E_{\max}(M + 1) + 1$ due to the overlapping nature of the interval boundaries. Hence, the size of the state space $|\mathcal{S}| = (E_m + 1) \times (H_m + 1) \times (L + 1)$ can be bounded using (15) as follows,
$$|\mathcal{S}| \le (E_m + 1) \times \left(E_{\max}(M + 1) + 1\right) \times (L + 1) = (27 + 1) \times (9 \times 5 + 1) \times (4 + 1) = 6440. \qquad (16)$$
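The quantized cost and the bound in (16) can be reproduced directly; the helper below is an illustrative sketch, not the authors' implementation.

```python
# Quantized energy cost from Section IV and a check of the bound in (16).
import math

def P_quantized(a, g, Gamma=0.19, Em=27, Emax=9):
    """Ceiling-quantized version of (1) in integer energy units."""
    return math.ceil(Gamma * (2 ** a - 1) / g * Em / Emax)

M, Em, Emax, L = 4, 27, 9, 4
S_bound = (Em + 1) * (Emax * (M + 1) + 1) * (L + 1)
print(S_bound)                   # 28 * 46 * 5 = 6440, matching (16)
```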

Thus, even for moderate values of the MDP parameters $M$, $E_m$, $L$, the size of the state space is intractably large. Hence, the monotone properties derived in Section III are crucial for the computation of the optimal policy. In Figs. 1(a), 2(a), and 3(a) we plot the optimal policy of the above MDP as a function of the data queue length $b$, channel gain $g$, and battery energy $e$, respectively.


[Fig. 1 graphic: (a) optimal number of Tx bits vs. data queue length $b$ for the $(g, e)$ pairs $(28, 8)$, $(28, 3)$, $(12, 8)$, $(12, 3)$; (b) packet drop rate $R(g, e, b)$ vs. $b$ at $M = 4$, $\lambda = 2$, comparing the proposed policy with the scheme in [1] and the greedy policy.]

Fig. 1. Optimal number of transmitted bits vs. data queue length $b$ (left) at $M = 4$, and packet drop rate vs. $b$ (right) at $g = 28$ and $e = 8$. The means of the random variables $b_n$ and $\xi_n$ are $\lambda = 2$ and $\mu = 0.992$, respectively.

[Fig. 2 graphic: (a) optimal number of Tx bits vs. channel gain $g$; (b) packet drop rate $R(g, e, b)$ vs. $g$ at $b = 4$, $e = 8$, $M = 4$, $\lambda = 2$, comparing the proposed policy with the scheme in [1] and the greedy policy.]

Fig. 2. Optimal number of transmitted bits vs. channel gain $g$ (left) at $M = 4$, and packet drop rate vs. $g$ (right) at $b = 4$ and $e = 8$. The means of the random variables $b_n$ and $\xi_n$ are $\lambda = 2$ and $\mu = 0.992$, respectively.

From the plots we observe that the optimal policy for the quantized MDP is monotone in each of the MDP joint state components. The stronger properties of the optimal policy proved in Section III-C can also be readily seen from these plots. Thus, one can employ the monotone policy iteration algorithm in [9] (Section 6.11.2) to compute the optimal policy. This significantly reduces the computations required to arrive at the optimal adaptive modulation scheduling policy, since the size of the optimal action set monotonically decreases with each iteration of the algorithm.
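As a hedged sketch of how this monotone structure prunes the search, in the spirit of the monotone policy iteration of [9, Sec. 6.11.2] rather than a transcription of it, the improvement step along the buffer dimension can restrict each state's action search to actions no smaller than the optimum found at the previous state; `Q` and `actions` are placeholder names.

```python
# Sketch: policy improvement along b exploiting the monotonicity of Sec. III-C.
# Q(s, a) evaluates the bracketed term in (6)/(10); actions(s) enumerates A(s).
def improve_along_b(Q, g, e, b_values, actions):
    policy, a_floor = {}, 0
    for b in sorted(b_values):
        s = (g, e, b)
        feasible = [a for a in actions(s) if a >= a_floor]  # prune below previous optimum
        best_a = max(feasible, key=lambda a: Q(s, a))
        policy[s] = best_a
        a_floor = best_a          # monotone in b: the optimal action never decreases
    return policy
```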

In Figs. 1(b), 2(b), and 3(b) we plot the performance of the optimal policy, in terms of the long term packet drop rate $R(g, e, b)$ in (3), with respect to the state components $b$, $g$, and $e$. This policy is compared with the sub-optimal policy computed using the approach in [1] and the greedy policy, which uses the maximum allowed battery energy at each time epoch. As expected, the optimal policy outperforms both policies. Note that for a difference of one unit in $R(g, e, b)$, the policies drop an average of $1 - \beta = 0.15$ more packets at every decision epoch. Thus, both sub-optimal policies would drop a large number of packets even over small time intervals.

[Fig. 3 graphic: (a) optimal number of Tx bits vs. battery energy $e$ for the $(b, g)$ pairs $(4, 23)$, $(4, 29)$, $(4, 12)$; (b) packet drop rate $R(g, e, b)$ vs. $e$ at $M = 4$, $\lambda = 2$, comparing the proposed policy with the scheme in [1] and the greedy policy.]

Fig. 3. Optimal number of transmitted bits vs. battery energy $e$ (left) at $M = 4$, and packet drop rate vs. $e$ (right) at $b = 4$ and $g = 23$. The means of the random variables $b_n$ and $\xi_n$ are $\lambda = 2$ and $\mu = 0.992$, respectively.

V. CONCLUSION

In this paper we formulated the problem of variable rate, variable power data transmission for a QoS constrained wireless user with renewable battery energy as a Markov Decision Process. It was demonstrated that the high dimensionality of the problem makes it prohibitively complex to employ straightforward techniques such as policy iteration for optimal policy computation. Thus, employing the convexity properties of the transmit power function, we derived key monotonicity properties of the optimal transmission policy for the above MDP. These properties significantly reduce the computation required to obtain the optimal policy for long term packet drop minimization. Simulation results have been presented to illustrate the significantly lower long term packet drop rate of the proposed scheme compared to the existing ones.

REFERENCES

[1] A. Sinha and P. Chaporkar, "Optimal power allocation for a renewable energy source," in Proc. 2012 National Conference on Communications, pp. 1–5.

[2] M. H. Ngo and V. Krishnamurthy, "On optimality of monotone channel-aware transmission policies: a constrained Markov decision process approach," in Proc. 2007 International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp. 621–624.

[3] N. Salodkar, A. Karandikar, and V. Borkar, "A stable online algorithm for energy-efficient multiuser scheduling," IEEE Trans. Mobile Comput., vol. 9, no. 10, pp. 1391–1406, Oct. 2010.

[4] Z. Ye, A. A. Abouzeid, and J. Ai, "Optimal stochastic policies for distributed data aggregation in wireless sensor networks," IEEE/ACM Trans. Networking, vol. 17, no. 5, pp. 1494–1507, Oct. 2009.

[5] B. Xiong, W. Yan, C. Lin, and F. Ren, "Cross-layer optimal policies for stochastic reliable transmission in wireless sensor networks," in Proc. 2010 International Congress on Ultra Modern Telecommunications and Control Systems and Workshops, pp. 944–951.

[6] A. T. Hoang and M. Motani, "Buffer and channel adaptive modulation for transmission over fading channels," in Proc. 2003 IEEE International Conference on Communications, vol. 4, pp. 2748–2752.

[7] A. Hoang and M. Motani, "Buffer and channel adaptive transmission over fading channels with imperfect channel state information," in Proc. 2004 IEEE Wireless Communications and Networking Conference, vol. 3, pp. 1891–1896.

[8] C. Y. Wong, R. Cheng, K. Lataief, and R. Murch, "Multiuser OFDM with adaptive subcarrier, bit, and power allocation," IEEE J. Sel. Areas Commun., vol. 17, no. 10, pp. 1747–1758, Oct. 1999.

[9] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1st edition. John Wiley & Sons, Inc., 1994.