
Channel Access Framework For Cognitive Radio-based
Wireless Sensor Networks Using Reinforcement Learning

    J. A. Abolarinwa, N. M. Abdul Latiff, S. K. Syed Yusof

    Faculty of Electrical Engineering

    Universiti Teknologi Malaysia, UTM-MIMOS Center of Excellence

    Johor, Malaysia

Abstract—Cognitive radio-based wireless sensor network is a

    new paradigm in sensor networks research. It is considered to

    revolutionize next generation sensor networks. Therefore, it is of

    paramount importance to develop an efficient channel access

    technique suitable for cognitive radio-based wireless sensor

network. In this paper we have proposed a channel access framework for cognitive radio-based wireless sensor networks which is based on the reinforcement learning technique. We have used the Q-learning approach to develop a simple access algorithm.

    We have analyzed the effect of sensing time on the probability of

    detection, probability of misdetection and probability of false

alarm. These parameters were compared for different detection threshold values, and the significant simulation results are discussed.

Keywords—Cognitive-radio; Reinforcement; Channel; Q-learning; Sensing; Energy-efficiency.

    I. INTRODUCTION

Recently, there has been an increase in research work in the field of wireless sensor networks (WSNs). This is due to several applications

    of sensor networks such as, surveillance, environmental monitoring,

    intelligent health care systems, intelligent building, battle field control

and many others. WSNs operate in the industrial, scientific and medical (ISM) frequency band. This is an unlicensed band that is open to other communication applications such as Wi-Fi systems, wireless microphones, Bluetooth and microwave ovens. As a result of the numerous applications operating in this band, there is a spectrum

scarcity problem. In addition to spectrum scarcity, there is the problem of interference among dissimilar wireless applications using this band. According to [1], in some locations, the occupancy of the 2.4 GHz frequency band has reached 90%.

As a result of spectrum scarcity and interference in both the licensed and unlicensed bands, a new, efficient spectrum utilization paradigm is being proposed. This paradigm is called cognitive radio (CR). With CR, communication devices are able to adaptively and dynamically utilize the limited spectrum in an intelligent manner. CR

    has the capability to sense its radio environment, intelligently adapt

    its communication parameters and reconfigure itself accordingly.

    With these characteristics of CRs, the problem of spectrum scarcity

    and interference can easily be mitigated.

Leveraging the numerous advantages attributed to CR, the authors in [2] proposed a new sensor networking paradigm called the cognitive radio sensor network. This type of sensor network incorporates the CR functionalities into the traditional WSN. It is possible for sensor networks to operate in the licensed band in an opportunistic manner by periodically sensing the spectrum for available channels. This

    operation is what is called dynamic spectrum access (DSA), which is

    the core of cognitive radio technology. The paradigm of cognitive

radio-based wireless sensor networks (CRWSN) is more challenging than the traditional wireless sensor networks and cognitive radio networks [3]. This is because a CRWSN combines the features of WSN and CR in one network. Among other challenges are energy and processing constraints. CRWSNs are also deployed in remote

    locations and they are battery driven. Battery recharge is not possible

    in most cases. Also, as a result of miniaturization of sensor nodes,

computational complexity has to be kept low. In addition to these problems, simple antennas and radios are to be used in order to mitigate the cost of deployment. In this paper, we focus on an energy-efficient channel access framework for CRWSN using reinforcement learning.

There are a few works in the literature that propose different frameworks for ad hoc networks and sensor networks based on a cognitive radio approach. The approaches are either multi-radio or multi-channel based. An example of such is the configurable medium access control (CMAC) protocol. However, in this paper, we propose a framework for channel access in cognitive radio-based wireless sensor networks using reinforcement learning. We adopted the Q-learning approach. The justification for our approach is that, with Q-learning, a future action can be determined based on the experience of past actions. Hence, a more rewarding action can be taken by an agent learning from its environment.

II. RELATED WORKS

    In [3], the authors presented recent developments and open

    research issues in spectrum management based on CR

networks. Specifically, their discussion was focused on the development of CR networks that do not require

    modification of existing networks. This work failed to address

    the issue of coexistence and interference.


    2013 IEEE Student Conference on Research and Development (SCOReD), 16 -17 December 2013, Putrajaya, Malaysia

978-1-4799-2656-5/13/$31.00 ©2013 IEEE


    As a result of energy constraint in CRWSN, it is highly

    imperative to develop a simple, energy-efficient channel

    decision strategy for CRWSN operating in the ISM band. The

    authors in [4] proposed a spectrum decision method that

    utilizes the information about the primary user (PU) as the

    input to the decision making process. However, the authors

    failed to address the implementation of their methods in both

    the PHY and MAC layers. In a bid to increase communication

reliability, [5] proposed a multi-radio architecture for sensor nodes. This was done on the Incident Reporting Information

    System (IRIS) software platform and IEEE 802.15.4 in order

to operate with two radios, at 900 MHz and 2.4 GHz

    frequencies. Even though the result of the experiment shows

    improved link stability and delivery rate, the radios were used

independently. Hence, cognitive radio functions were absent. The authors in [6] proposed a channel sensing order for a CR secondary

    user (SU) in a multi-channel network without prior knowledge

    of the PU activities. However, they did not consider the

    inherent problem of energy efficiency peculiar to sensor

networks. Although the authors in [7] used reinforcement learning to dynamically search for the optimal sensing order, their

    method was not considered for CRWSN. The work in [8]

    proposed optimal sensing and access mechanisms, but not

    specifically adaptable to sensor networks. In [9], sensing time

    and PU activities were used to maximize the SU throughput

and to keep the probability of collision below a certain threshold. Nothing was done in terms of energy efficiency, and

    no decision strategy was applied. The authors in [10]

    considered joint source and channel sensing as parameters on

    which they based their energy efficiency. However, there was

    no channel decision approach used for channel access.

While attempts have been made in the literature to study energy efficiency in cognitive radio networks, little has been done in the area of energy-efficient decision strategies for

    channel access in CRWSN. Therefore, in this paper, our focus

is to use reinforcement learning to achieve simple, energy-efficient decisions for channel access in CRWSN. From the results in this paper, using the reinforcement learning technique yields energy-efficient channel access in terms of maintaining a balance between sensing duration and transmission time.

The rest of this paper is organized as follows: section III describes the network topology and model, and section IV describes the proposed channel access framework. In section V we present the results of our simulation, and we conclude the paper in section VI.

    III. NETWORK TOPOLOGY AND SYSTEM MODEL

Cognitive radio-based wireless sensor networks (CRWSN) are wireless sensor networks that employ cognition to dynamically use the available channels in the ISM band to communicate [11]. In this section, we consider a CRWSN with a star topology, that is, a cluster-based CRWSN. This model was chosen in order to analyse the energy issue more conveniently. A cluster-based CR sensor network topology is considered energy efficient.

    A. Cognitive Radio-based Wireless Sensor Networks Model

A cluster-based, multi-channel CRWSN is considered in this paper. In each cluster, there is a cluster head (CH) and several cluster member nodes (MNs), which are assumed to be static or to have negligible mobility within their cluster range. This is illustrated in Fig. 1. The CH is endowed with the cognitive radio capabilities of spectrum sensing and channel allocation among member nodes within its cluster. The CH also acts as the central controller

    of the network.

    Fig. 1. Network model of cluster-based cognitive radio-based wireless sensor

    networks

    B. Cognitive Radio-based Wireless Sensor Networks

    Operation

In Fig. 2 we show the per-frame operation of the CRWSN. This is a time-slotted system.

    Fig. 2. Sensing, channel access and switching operations of cognitive radio-

    based wireless sensor networks



Each frame is divided into two time slots: a sensing slot and a transmission slot. The sensing operation is carried out periodically for a duration t1. During the transmission slot, the CR transmits packets for a duration t2. A local common control channel (LCCC) is introduced for the exchange of control information. In each cluster, there are C channels and only one LCCC. A channel is considered available when there is no primary user activity in the channel during the sensing and transmission slots respectively, and the channel condition is suitable for data transmission by the secondary user. Depending on the outcome of the sensing operation, the CH of the CRWSN at the decision state proceeds to one of the following states: the transmit/receive state or the handoff state.

As a result of the radio frequency (RF) front-end hardware limitations and the energy constraints of sensor networks, energy detection spectrum sensing is considered. With the energy detection spectrum sensing technique, prior knowledge of the primary user activity is not required. Hence, the CH does not have to keep a statistical record of the primary user activity, which is random in nature.

    C. Primary user behavior model

The primary user (PU) activity is modeled as a two-state, time-homogeneous discrete Markov process [8]. This model was chosen in order to capture the dependence of the present state on the previous state. The two-state Markov process is shown in Fig. 3. We consider a spectrum band consisting of C channels, each having bandwidth BW. The PU activity in each channel can either be in the occupied state (that is, 0, or busy, or the ON state) or the available state (that is, 1, or idle, or the OFF state) at any given time. When the PU occupies a channel, the channel is not available for the CR user to transmit. Otherwise, the CR user transmits within the available channel, assuming that other channel conditions such as fading and noise are favourable for packet transmission.

    Fig. 3. Two state Markov model for primary user channel behavior

The PU occupies the channel for an exponentially distributed random duration p; this corresponds to the ON state, when the channel is busy. Let the OFF state have random duration q, during which the PU does not occupy the channel. The probabilities of PU channel occupancy and PU absence are the probabilities of ON and OFF respectively. These probabilities are derived as:

P_ON = p / (p + q)    (1)

P_OFF = q / (p + q)    (2)
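As an illustrative check (not from the paper), the stationary ON/OFF probabilities in (1) and (2) can be reproduced by simulating the two-state channel as a slotted Markov chain; the slot width and the mean busy/free durations below are hypothetical values chosen only for the sketch.

```python
import random

def simulate_pu_occupancy(mean_busy, mean_free, slot, n_slots, rng):
    """Slotted two-state (ON/OFF) Markov chain for a PU channel: in each
    slot the channel leaves its current state with probability
    slot / mean_duration, giving approximately exponential holding times."""
    busy = False
    busy_slots = 0
    for _ in range(n_slots):
        if busy:
            busy_slots += 1
            if rng.random() < slot / mean_busy:
                busy = False
        elif rng.random() < slot / mean_free:
            busy = True
    return busy_slots / n_slots

p, q = 0.06, 0.12   # illustrative mean busy / free durations in seconds
frac_busy = simulate_pu_occupancy(p, q, slot=0.001, n_slots=200_000,
                                  rng=random.Random(42))
print(round(p / (p + q), 3))  # stationary P_ON from (1) → 0.333
```

The empirical busy fraction returned by the simulation approaches p/(p + q) as the number of slots grows, matching (1).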

Assuming energy detection spectrum sensing, the probability of detection Pd and the probability of false alarm Pfa are given as (3) and (4) respectively.

Pd(λd) = Q( (λd − t1·W·(σn² + σp²)) / √(4·t1·W·(σn² + σp²)) )    (3)

Pfa(λfa) = Q( (λfa − t1·W·σn²) / √(4·t1·W·σn²) )    (4)

where λd and λfa are the detection and false alarm decision threshold values respectively, σn² and σp² are the noise and primary signal variances, W is the sampling frequency, and t1 is the sensing time of the SU. The energy consumed by sensing is a function of the sensing time. Q(x) is the tail probability of the standard normal distribution, i.e., the probability that a normal (Gaussian) random variable takes a value larger than x standard deviations above the mean, otherwise known as the Q-function.

An accurate probability of detection and high spectrum sensing efficiency, in terms of a near-zero misdetection probability, are two important aims of spectrum sensing. Sensing efficiency is defined as the ratio of the transmission time to the total CR operation time. From Fig. 2 we can determine the spectrum sensing efficiency SSeff as:

SSeff = t2 / (t1 + t2)    (5)

where t1 and t2 correspond to the sensing and transmission time slots respectively.
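The sensing statistics above can be sketched numerically. The code below assumes the standard Gaussian approximation for an energy detector with N = t1·W samples (the functional form is a common textbook expression, not necessarily the paper's exact one), and the numeric settings are illustrative rather than the paper's parameters.

```python
import math

def qfunc(x):
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def prob_detection(thr, t1, W, noise_var, sig_var):
    """P_d of an energy detector with N = t1*W samples (Gaussian approx.)."""
    n = t1 * W
    mean = n * (noise_var + sig_var)       # mean test statistic, PU present
    return qfunc((thr - mean) / math.sqrt(4 * n * (noise_var + sig_var)))

def prob_false_alarm(thr, t1, W, noise_var):
    """P_fa of the same detector when only noise is present."""
    n = t1 * W
    return qfunc((thr - n * noise_var) / math.sqrt(4 * n * noise_var))

def sensing_efficiency(t1, t2):
    """SS_eff = t2 / (t1 + t2): fraction of the frame spent transmitting."""
    return t2 / (t1 + t2)

# Illustrative numbers: unit noise and signal variance, 1 MHz sampling,
# 1 ms sensing slot, threshold between the noise-only and PU-present means.
pd = prob_detection(2050, 1e-3, 1e6, 1.0, 1.0)
pfa = prob_false_alarm(2050, 1e-3, 1e6, 1.0)
print(pd > pfa, sensing_efficiency(2, 8))  # → True 0.8
```

For any threshold between the two means, the detection probability exceeds the false alarm probability, which is the operating regime a sensing policy aims for.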

    IV. ENERGY EFFICIENT CHANNEL ACCESS FRAMEWORK

    USING REINFORCEMENT LEARNING

Here, we describe a simple, energy-efficient channel access scheme for the CR sensor network using the reinforcement learning (RL) technique. The advantage of using RL is that it does not require prior knowledge of the channel availability or of the estimated channel quality through its average SNR. Also important is the fact that, using RL, the CR user can easily adapt to changes in channel characteristics, since it can learn from previous actions. Hence, the RL technique shields the CR sensor network from the effect of changes in the PU activity pattern on the channels. In this section we present RL fundamentals and the channel access framework using RL.

    A. Reinforcement learning theory

Reinforcement learning provides a simple means of training an agent to interact properly with its environment (in this case, the radio environment) to achieve a given objective. Through the use of rewards and numerous trials in an environment, the agent learns the proper action for each state. For simplicity, the Q-learning variant of RL has been chosen in this work among other variants of RL. Fig. 4 shows the algorithm flow of the Q-learning used.



    Fig. 4. Generic Q-learning flow diagram

The Q-learning update function is given as:

Q(S, A) ← Q(S, A) + α [ ra + γ E(Y) − Q(S, A) ]

where Q(S, A) is the Q value of state S for which action A is taken; α is the learning rate, a constant value that defines how important newly learned information is to the agent; γ is another constant, called the discount factor, which determines whether new information is more important than the previous, or the other way round. S is the initial state while Y is the subsequent state. The reward for transiting from S to Y is ra, and E(Y) is the maximum Q value for state Y.
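The update rule above can be written as a one-line function; the learning rate and discount factor values used in the example are hypothetical defaults, not the paper's settings.

```python
def q_update(q_sa, reward, max_q_next, alpha=0.5, gamma=0.9):
    """One Q-learning step: Q(S,A) <- Q(S,A) + alpha*(ra + gamma*E(Y) - Q(S,A))."""
    return q_sa + alpha * (reward + gamma * max_q_next - q_sa)

# Starting from Q = 0, a reward of 1 with no future value (E(Y) = 0) moves
# the estimate halfway toward the target when alpha = 0.5.
print(q_update(0.0, 1.0, 0.0))  # → 0.5
```

Repeated applications of this step move Q(S, A) toward the running target ra + γE(Y), which is what drives the convergence described next.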

The Q value determines the amount of reward (ra) for any given state-action pair, and an agent decides which action to take based on the Q values it has estimated. At every decision point, the agent weighs its current state-action pair and chooses an action, which results in another state-action pair. Based on this action, the agent receives a reward (ra) and the Q value of the new state is updated. Once the Q value is updated, the agent transitions to the new state and the entire process starts over. This iterative process must converge to a maximum reward value after a finite number of iterations, depending on the set termination point.

The decision-making strategy used in Q-learning is the probabilistic greedy action selection strategy, often regarded as P-greedy (Pg). An action is selected probabilistically based on the Q value associated with the action: the Q value is directly proportional to the probability that the action will be selected. Therefore, as the Q value of an action increases, which leads to a greater ra for the agent, the probability that the action will be selected increases. After learning action selection and how to maximize reward using the P-greedy method, the agent then uses the maximum Q value to determine which action to take in a given state. This is called the greedy action selection method. It complements the P-greedy method by enabling the agent to maximize its reward as it explores and exploits its environment.
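A minimal sketch of these two selection rules, assuming selection probabilities simply proportional to (positively shifted) Q values as the text describes; the Q values themselves are hypothetical.

```python
def p_greedy_probs(q_values):
    """P-greedy sketch: selection probabilities proportional to the
    (shifted, strictly positive) Q values."""
    shift = min(q_values)
    weights = [v - shift + 1e-6 for v in q_values]   # keep all weights > 0
    total = sum(weights)
    return [w / total for w in weights]

def greedy_action(q_values):
    """Pure greedy selection: the action with the maximum Q value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])

q = [0.2, 0.9, 0.5]           # hypothetical Q values for three channels
probs = p_greedy_probs(q)
print(greedy_action(q))        # → 1
```

Under P-greedy the higher-valued channel is still chosen most often but lower-valued channels retain a nonzero probability, which is the exploration behavior the text relies on.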

    B. Reinforcement learning-based framework

Fig. 5 shows the algorithm of the reinforcement learning-based framework for channel access. The Q-table is initialized with state, action, and Q value set to zero. The CR learns from its radio environment and randomly selects an action between 0 and 1, which corresponds to a possible channel state. For a given channel Ca sensed as free, the reward obtained for this action is an increase, while for a channel sensed as busy, the reward is a decrease.
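A toy version of this loop is sketched below. It collapses the framework to a single state (so the discount term drops out), uses epsilon-greedy exploration, and draws the PU state from per-channel availability probabilities; all of these choices and numbers are hypothetical simplifications, not the paper's exact flow.

```python
import random

def run_channel_access(free_prob, frames, alpha=0.05, eps=0.1, seed=1):
    """Toy Q-learning channel access loop: each frame the CH picks a
    channel (epsilon-greedy), 'senses' it by drawing the simulated PU
    state, earns +1 if free / -1 if busy, and updates that channel's Q."""
    rng = random.Random(seed)
    q = [0.0] * len(free_prob)
    for _ in range(frames):
        if rng.random() < eps:
            a = rng.randrange(len(q))                   # explore
        else:
            a = max(range(len(q)), key=lambda i: q[i])  # exploit
        reward = 1.0 if rng.random() < free_prob[a] else -1.0
        q[a] += alpha * (reward - q[a])                 # one-step update
    return q

# Channel 1 is free 90% of the time, so it should earn the highest Q value.
q = run_channel_access([0.2, 0.9, 0.5], frames=20_000)
print(q.index(max(q)))
```

After enough frames the Q table ranks the channels by their observed availability, so the greedy choice settles on the channel with the most idle time, mirroring the intended behavior of the framework.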

    Fig. 5. Proposed channel access framework algorithm based on Q-learning

[Fig. 5 flowchart: Start → initialize the Q-table (s = a = Q = 0) → take a random number x ∈ [0, 1] → compute ra for the sensed channel Ca and update its Q value → if the Q value reaches Qmax, use channel Ca; otherwise repeat → End.]


    V. SIMULATION RESULT ANALYSIS

To validate the proposed algorithm, we used MATLAB to simulate the effect of varying the sensing time and the channel state probabilities. Table I shows the simulation parameters and their descriptions. The results evaluate the effect of the sensing time (a major factor in spectrum sensing and decision) on the three important channel state probabilities associated with CRWSN: the probability of detection, the probability of misdetection and the probability of false alarm.

    TABLE I. SET OF SIMULATION PARAMETERS

Parameter   Description                   Value
BW          Channel bandwidth             1000 kHz
ISM band    Operating frequency band      2.4 GHz
λ           Energy detection threshold    1 and 5
d           Distance between PU and SU    50 m
q           Average channel free time     0.12 s
p           Average channel busy time     0.06 s
η           Path loss component           2.5
P(ON)       Probability of channel busy   0.33
P(OFF)      Probability of channel free   0.66
No          Noise power                   1.38x10^-22
P(PU)       PU transmission power         10 dB
Im          Maximum interference ratio    0.1

In Fig. 6 the variation of the probability of detection with sensing time is shown. As the sensing time decreases, there is a slow but steady decrease in the probability of detection. This is because a small sensing time does not guarantee fast detection of the primary user in the channel. However, a longer sensing time is energy inefficient. An optimal sensing time has to be determined in order to strike a balance between longer and shorter sensing times, in relation to both the probability of detection and energy efficiency.

Also, the effect of varying the energy detection threshold value becomes significant at low sensing times. It was observed from Fig. 6 that, at a very small sensing time, there is a sharp drop in the probability of detection when the threshold is held at 5, compared to what is obtained when the threshold is at the lower value of 1. Hence, it is reasonable to conclude that, to maintain the probability of detection at a reasonable value, the detection threshold should be set at a lower value.

    Fig. 6. Variation of probability of detection with sensing time

Fig. 7 shows the variation of the probability of misdetection with sensing time. Misdetection occurs when the secondary user cognitive radio sensor network does not detect the presence of a primary user in the channel even though the primary user is still transmitting within the channel. The sensing time within the sensing slot of the SU frame has a significant effect on the probability of misdetection: an exponential decrease in the probability of misdetection is observed as the sensing time increases. This means that, if the SU senses for a longer time, the probability of misdetection is reduced. In order to avoid collision due to interference with the PU as a result of misdetection by the SU sensor network, a sufficiently long sensing duration should be used. However, this duration ought to be balanced against the energy efficiency of the CR sensor network.

    Fig. 7. Variation of probability of misdetection with sensing time

Comparing the two threshold values, it can be seen that the higher threshold gives a higher probability of misdetection. As a result, a lower energy detection threshold value is most suitable in order to obtain a lower probability of misdetection.

There is a significant observable variation in the effect of varying the energy detection threshold on the probability of false alarm. This is shown in Fig. 8. At the higher threshold of 5, the probability of false alarm increases exponentially with sensing time, but with a smaller exponential factor than the corresponding increase when the threshold is at the lower value of 1.

A false alarm is a situation in which the CR SU reports to other nodes in the network, as the outcome of its sensing, that the PU is presently using the channel when that is not the actual situation. From Fig. 8, the probability of false alarm tends to increase as the sensing time increases. This will ultimately reduce the reward rate of channel access for the SU.



    Fig. 8. Variation of probability of false alarm with sensing time

    VI. CONCLUSION

In this paper we have proposed a channel access framework for cognitive radio-based wireless sensor networks based on the reinforcement learning technique. We have used the Q-learning approach to develop

    a simple access algorithm. We have analyzed the effect of

    sensing time on the probability of detection, probability of

    misdetection and probability of false alarm. These parameters

    were compared using different detection threshold values.

    From our findings, we discovered the significant role of

    maintaining the sensing time at an optimal value in order to

    balance between energy efficiency and fast channel

availability and access. For future work, we shall analyze how to ensure an optimal tradeoff between sensing time, energy efficiency and channel access.

    ACKNOWLEDGMENT

The authors acknowledge the support of the Telematic Research Group of Universiti Teknologi Malaysia under the vote number QJ130000-2623-08J66. The Federal Government of Nigeria, through the Tertiary Education Trust Fund, and the Federal University of Technology Minna also provided fellowship support for this work.

    REFERENCES

[1] L. H. A. Correia, E. E. Oliveira, D. F. Macedo, P. M. Moura, A. A. F. Loureiro, and J. S. Silva, "A framework for cognitive radio wireless sensor networks," IEEE Symposium on Computers and Communications, pp. 611-616, 2012.

[2] O. Akan, O. Karli, and O. Ergul, "Cognitive radio sensor networks," IEEE Network, vol. 23, pp. 34-40, 2009.

[3] I. F. Akyildiz, L. Won-Yeol, M. C. Vuran, and S. Mohanty, "A survey on spectrum management in cognitive radio networks," IEEE Communications Magazine, vol. 46, pp. 40-48, 2008.

[4] G. Yuming, S. Yi, L. Shan, and E. Dutkiewicz, "ADSD: An automatic distributed spectrum decision method in cognitive radio networks," First International Conference on Future Information Networks, pp. 253-258, 2009.

[5] B. Kusy, C. Richter, H. Wen, M. Afanasyev, R. Jurdak, M. Brunig, D. Abbott, Cong Huynh, and D. Ostry, "Radio diversity for reliable communication in wireless sensor networks," 10th International Conference on Information Processing in Sensor Networks, pp. 270-281, 2011.

[6] C. Ho Ting and Z. Weihua, "Simple channel sensing order in cognitive radio networks," IEEE Journal on Selected Areas in Communications, vol. 29, pp. 676-688, 2011.

[7] A. C. Mendes, C. H. P. Augusto, M. W. R. da Silva, R. M. Guedes, and J. F. de Rezende, "Channel sensing order for cognitive radio networks using reinforcement learning," 36th IEEE Conference on Local Computer Networks, pp. 546-553, 2011.

[8] S. Wang, W. Yue, J. P. Coon, and A. Doufexi, "Energy efficient spectrum sensing and access for cognitive radio networks," IEEE Transactions on Vehicular Technology, vol. 61, pp. 906-912, 2012.

[9] P. Yiyang, H. Anh Tuan, and L. Ying-Chang, "Sensing-throughput tradeoff in cognitive radio networks: how frequently should spectrum sensing be carried out?," 18th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, pp. 1-5, 2007.

[10] Z. Huazi, Z. Zhaoyang, C. Xiaoming, and Y. Rui, "Energy efficient joint source and channel sensing in cognitive radio sensor networks," IEEE International Conference on Communications (ICC), pp. 1-6, 2011.

[11] J. A. Abolarinwa, N. M. Abdul Latiff, and S. K. Syed Yusof, "Energy-constrained packet size optimization for cluster-based cognitive radio-based wireless sensor networks," Australian Journal of Basic and Applied Sciences, vol. 7, pp. 138-150, 2013.
