Channel Access Framework For Cognitive Radio-
based Wireless Sensor Networks Using
Reinforcement Learning
J. A. Abolarinwa, N. M. Abdul Latiff, S. K. Syed Yusof
Faculty of Electrical Engineering
Universiti Teknologi Malaysia, UTM-MIMOS Center of Excellence
Johor, Malaysia
Abstract—Cognitive radio-based wireless sensor networks are a new paradigm in sensor networks research and are expected to revolutionize next-generation sensor networks. It is therefore of paramount importance to develop an efficient channel access technique suitable for cognitive radio-based wireless sensor networks. In this paper we propose a channel access framework for cognitive radio-based wireless sensor networks based on the reinforcement learning technique. We use a Q-learning approach to develop a simple access algorithm, and we analyze the effect of sensing time on the probability of detection, the probability of misdetection and the probability of false alarm. These parameters were compared for different detection threshold values, and significant simulation results are discussed.
Keywords—Cognitive radio; Reinforcement; Channel; Q-learning; Sensing; Energy-efficiency.
I. INTRODUCTION
Recently, there has been an increase in research in the field of wireless sensor networks (WSNs), driven by the many applications of sensor networks, such as surveillance, environmental monitoring, intelligent health care systems, intelligent buildings, battlefield control and many others. WSNs operate in the industrial, scientific and medical (ISM) frequency band. This is an unlicensed band that is open to other communication applications such as Wi-Fi systems, wireless microphones, Bluetooth and microwave ovens. With so many applications operating in this band, there is a spectrum scarcity problem, in addition to interference among dissimilar wireless applications sharing the band. According to [1], in some locations the occupancy of the 2.4 GHz frequency band has reached 90%.
As a result of spectrum scarcity and interference in both the licensed and unlicensed bands, a new, more efficient spectrum utilization paradigm has been proposed: cognitive radio (CR). With CR, communication devices are able to adaptively and dynamically utilize the limited spectrum in an intelligent manner. A CR can sense its radio environment, intelligently adapt its communication parameters and reconfigure itself accordingly. With these characteristics, the problems of spectrum scarcity and interference can be mitigated.
Leveraging the numerous advantages of CR, the authors in [2] proposed a new sensor networking paradigm called the cognitive radio sensor network, which incorporates CR functionalities into the traditional WSN. Sensor networks can then operate in the licensed band in an opportunistic manner by periodically sensing the spectrum for available channels. This operation is called dynamic spectrum access (DSA) and is the core of cognitive radio technology. The paradigm of cognitive radio-based wireless sensor networks (CRWSN) is more challenging than either traditional wireless sensor networks or cognitive radio networks [3], because a CRWSN combines the features of WSN and CR in one network. Among the challenges are energy and processing constraints: CRWSNs are often deployed in remote locations and are battery driven, and battery recharge is not possible in most cases. Also, because sensor nodes are miniaturized, computational complexity has to be kept low, and simple antennas and radios must be used to keep the cost of deployment down. In this paper, we focus on an energy-efficient channel access framework for CRWSN using reinforcement learning.

There are a few works in the literature that propose frameworks for ad hoc networks and sensor networks based on a cognitive radio approach. These approaches are either multi-radio or multi-channel based; an example is the configurable medium access control (CMAC) protocol. In this paper, by contrast, we propose a framework for channel access in cognitive radio-based wireless sensor networks using reinforcement learning, adopting a Q-learning approach. The justification for this approach is that with Q-learning, future actions can be determined based on the experience of past actions; hence, a more rewarding action can be taken by an agent learning from its environment.
II. RELATED WORKS
In [3], the authors presented recent developments and open research issues in spectrum management for CR networks. Specifically, their discussion focused on the development of CR networks that do not require modification of existing networks. This work did not address the issues of coexistence and interference.
2013 IEEE Student Conference on Research and Development (SCOReD), 16 -17 December 2013, Putrajaya, Malaysia
978-1-4799-2656-5/13/$31.00 ©2013 IEEE
Because of the energy constraints in CRWSN, it is imperative to develop a simple, energy-efficient channel decision strategy for CRWSN operating in the ISM band. The authors in [4] proposed a spectrum decision method that uses information about the primary user (PU) as input to the decision-making process; however, they did not address the implementation of their method at the PHY and MAC layers. To increase communication reliability, [5] proposed a multi-radio architecture for sensor nodes, implemented on the Incident Reporting Information System (IRIS) software platform with IEEE 802.15.4 radios operating at 900 MHz and 2.4 GHz. Even though the experimental results show improved link stability and delivery rate, the two radios were used independently, so cognitive radio functions were absent. [6] proposed a channel sensing order for CR secondary users (SUs) in a multi-channel network without prior knowledge of PU activity, but did not consider the energy-efficiency problem peculiar to sensor networks. Although the authors in [7] used reinforcement learning to dynamically search for the optimal sensing order, their method was not designed for CRWSN. The work in [8] proposed optimal sensing and access mechanisms, but they are not specifically adaptable to sensor networks. In [9], sensing time and PU activity were used to maximize SU throughput while keeping the probability of collision below a certain threshold, but energy efficiency was not considered and no decision strategy was applied. The authors in [10] based their energy-efficiency analysis on joint source and channel sensing, but used no channel decision approach for channel access.
While attempts have been made in the literature to study energy efficiency in cognitive radio networks, little has been done on energy-efficient decision strategies for channel access in CRWSN. Therefore, our focus in this paper is to use reinforcement learning to achieve simple, energy-efficient channel access decisions in CRWSN. The results in this paper show that the reinforcement learning technique yields energy-efficient channel access by maintaining a balance between sensing duration and transmission time.

The rest of this paper is organized as follows: Section III describes the network topology and model, and Section IV describes the proposed channel access framework. In Section V we present our simulation results, and we conclude the paper in Section VI.
III. NETWORK TOPOLOGY AND SYSTEM MODEL
Cognitive radio-based wireless sensor networks (CRWSN) are wireless sensor networks that employ cognition to dynamically use available channels in the ISM band for communication [11]. In this section, we consider a cluster-based CRWSN with a star topology. This model was chosen to make the energy analysis more convenient; a cluster-based CR sensor network topology is considered energy efficient.
A. Cognitive Radio-based Wireless Sensor Networks Model
A cluster-based, multi-channel CRWSN is considered in this paper. Each cluster has a cluster head (CH) and several cluster member nodes (MNs), which are assumed to be static or to have negligible mobility within their cluster range, as illustrated in Fig. 1. The CH is endowed with the cognitive radio capabilities of spectrum sensing and channel allocation among the member nodes within its cluster; it also acts as the central controller of the network.
Fig. 1. Network model of cluster-based cognitive radio-based wireless sensor
networks
B. Cognitive Radio-based Wireless Sensor Networks
Operation
Fig. 2 shows the per-frame operation of the CRWSN, which is a time-slotted system.
Fig. 2. Sensing, channel access and switching operations of cognitive radio-
based wireless sensor networks
Each frame is divided into two time slots: a sensing slot and a transmission slot. The sensing operation is carried out periodically for a duration t1; during the transmission slot, the CR transmits packets for a duration t2. A local common control channel (LCCC) is introduced for control information. In each cluster there are C channels and only one LCCC. A channel is considered available when there is no primary user activity in the channel during the sensing and transmission slots, and the channel condition is suitable for data transmission by the secondary user. Depending on the outcome of the sensing operation, the CH of the CRWSN proceeds from the decision state to one of the following states: the transmit/receive state or the handoff state.

Because of the radio frequency (RF) front-end hardware limitations and energy constraints of sensor networks, energy detection spectrum sensing is considered. With the energy detection spectrum sensing technique, prior knowledge of the primary user activity is not required; hence, the CH does not have to keep statistical records of the primary user activity, which is random in nature.
C. Primary user behavior model
The primary user (PU) activity is modeled as a two-state, time-homogeneous discrete Markov process [8]. This model was chosen to capture the dependence of the present state on the previous state. The two-state Markov process is shown in Fig. 3. We consider a spectrum band consisting of C channels, each with bandwidth BW. At any given time, the PU activity in each channel is either the occupied state (that is, 0, busy, or ON) or the available state (that is, 1, idle, or OFF). When the PU occupies a channel, the channel is not available for the CR user to transmit; otherwise, the CR user transmits in the available channel, assuming that other channel conditions such as fading and noise are favorable for packet transmission.
Fig. 3. Two state Markov model for primary user channel behavior
The PU occupancy of a channel is an exponentially distributed random duration with mean p; this corresponds to the ON state, when the channel is busy. Let the OFF state, during which the PU does not occupy the channel, be an exponentially distributed random duration with mean q. The probabilities of PU channel occupancy (ON) and PU absence (OFF) are then

P_{ON} = \frac{p}{p+q}  (1)

P_{OFF} = \frac{q}{p+q}  (2)
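The stationary occupancy implied by this exponential ON/OFF model can be sanity-checked with a short Monte Carlo simulation. The sketch below is our own illustration (not from the paper): it alternates exponentially distributed busy and free periods using the Table I values p = 0.06 s and q = 0.12 s, for which (1) gives P_ON = 0.06/0.18 = 1/3. The function name and structure are our own.

```python
import random

def simulate_pu_occupancy(mean_busy, mean_free, total_time, seed=1):
    """Alternate exponentially distributed ON (busy) and OFF (free)
    periods and return the fraction of time the channel is busy."""
    rng = random.Random(seed)
    t, busy_time = 0.0, 0.0
    busy = False  # start in the free (OFF) state
    while t < total_time:
        dur = rng.expovariate(1.0 / (mean_busy if busy else mean_free))
        dur = min(dur, total_time - t)  # clip the last period at total_time
        if busy:
            busy_time += dur
        t += dur
        busy = not busy
    return busy_time / total_time

# Table I values: p = 0.06 s busy, q = 0.12 s free -> P_ON = p/(p+q) = 1/3
frac_busy = simulate_pu_occupancy(0.06, 0.12, 10_000.0)
```

Over a long horizon the simulated busy fraction approaches p/(p+q), matching the probability-of-channel-busy entry (0.33) in Table I.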
Assuming energy detection spectrum sensing, the probability of detection P_d and the probability of false alarm P_{fa} are given by (3) and (4) respectively:

P_d(t_1) = Q\left(\frac{\lambda_d - (\sigma_s^2 + \sigma_n^2)}{\sqrt{4(\sigma_s^2 + \sigma_n^2)^2 / (t_1 W)}}\right)  (3)

P_{fa}(t_1) = Q\left(\frac{\lambda_{fa} - \sigma_n^2}{\sqrt{4\sigma_n^4 / (t_1 W)}}\right)  (4)

where \lambda_d and \lambda_{fa} are the detection and false alarm decision threshold values respectively, \sigma_n^2 and \sigma_s^2 are the noise and primary signal variances, W is the sampling frequency, and t_1 is the sensing time of the SU. The energy consumed by sensing is a function of the sensing time. Q(x) is the tail probability of the standard normal distribution, i.e. the probability that a normal (Gaussian) random variable takes a value more than x standard deviations above the mean, otherwise known as the Q-function.

Accurate detection and high spectrum sensing efficiency, in terms of near-zero misdetection probability, are two important aims of spectrum sensing. Sensing efficiency is defined as the ratio of the transmission time to the total CR operation time. From Fig. 2 we can determine the spectrum sensing efficiency SS_{eff} as
SS_{eff} = \frac{t_2}{t_1 + t_2}  (5)

where t_1 and t_2 correspond to the sensing and transmission time slots respectively.
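These expressions are straightforward to evaluate numerically. The sketch below is an illustrative evaluation (our own, not the paper's MATLAB code) assuming the Gaussian-approximation energy-detection forms of (3)-(5); the parameter values are hypothetical, and the Q-function is implemented via the complementary error function.

```python
import math

def q_func(x):
    """Gaussian Q-function: tail probability of the standard normal."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def p_detect(t1, W, lam_d, sn2, ss2):
    """Probability of detection, equation (3) (Gaussian approximation)."""
    return q_func((lam_d - (ss2 + sn2)) / math.sqrt(4 * (ss2 + sn2) ** 2 / (t1 * W)))

def p_false_alarm(t1, W, lam_fa, sn2):
    """Probability of false alarm, equation (4): same form with sigma_s^2 = 0."""
    return q_func((lam_fa - sn2) / math.sqrt(4 * sn2 ** 2 / (t1 * W)))

def sensing_efficiency(t1, t2):
    """Spectrum sensing efficiency SS_eff = t2 / (t1 + t2), equation (5)."""
    return t2 / (t1 + t2)

# Hypothetical values: unit noise variance, weak primary signal
W, sn2, ss2, lam = 1000.0, 1.0, 0.5, 1.2
pd_short = p_detect(0.01, W, lam, sn2, ss2)
pd_long = p_detect(0.1, W, lam, sn2, ss2)
```

With these formulas, lengthening the sensing slot t_1 sharpens the detector (P_d rises, P_{fa} falls for a threshold above the noise variance), but by (5) it directly lowers SS_{eff}; this is the sensing-time/efficiency tradeoff the paper studies.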
IV. ENERGY EFFICIENT CHANNEL ACCESS FRAMEWORK
USING REINFORCEMENT LEARNING
Here, we describe a simple, energy-efficient channel access scheme for the CR sensor network using the reinforcement learning (RL) technique. The advantage of RL is that it does not require prior knowledge of channel availability or of the estimated channel quality through its average SNR. Just as important, with RL the CR user can easily adapt to changes in channel characteristics, since it learns from its previous actions. Hence, the RL technique shields the CR sensor network from the effect of changes in the PU activity pattern on the channels. In this section we present RL fundamentals and the channel access framework using RL.
A. Reinforcement learning theory
Reinforcement learning provides a simple means of training an agent to interact properly with its environment (in this case, the radio environment) to achieve a given objective. Through rewards accumulated over numerous trials in the environment, the agent learns the proper action for each state. For simplicity, the Q-learning variant of RL has been chosen in this work over other variants. Fig. 4 shows the flow of the Q-learning algorithm used.
Fig. 4. Generic Q-learning flow diagram
The Q-learning update function is given as

Q(S, A) \leftarrow Q(S, A) + \alpha \left[ r_a + \gamma E(Y) - Q(S, A) \right]

where Q(S, A) is the Q value of state S when action A is taken; \alpha is the learning rate, a constant that defines how important newly learned information is to the agent; and \gamma is another constant, the discount factor, which determines whether new information is more important than previous information or the other way round. S is the initial state and Y the subsequent state. The reward for transiting from S to Y is r_a, and E(Y) is the maximum Q value over the actions available in state Y.
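The update rule above can be written as a few lines of code. The sketch below is a minimal illustration of our own (not the paper's implementation), using a dictionary-of-dictionaries Q-table.

```python
def q_update(q_table, s, a, reward, y, alpha=0.5, gamma=0.9):
    """One Q-learning update:
    Q(S,A) <- Q(S,A) + alpha * (r_a + gamma * E(Y) - Q(S,A)),
    where E(Y) is the maximum Q value over the actions in state Y."""
    e_y = max(q_table[y].values()) if q_table[y] else 0.0
    q_table[s][a] += alpha * (reward + gamma * e_y - q_table[s][a])

# Two states, two actions, all Q values initialized to zero
q = {0: {0: 0.0, 1: 0.0}, 1: {0: 0.0, 1: 0.0}}
q_update(q, 0, 1, reward=1.0, y=1)  # Q(0,1) becomes 0.5 * (1.0 + 0.9*0 - 0) = 0.5
```

Since all Q values start at zero, the first update moves Q(0,1) halfway toward the received reward, exactly as the learning rate \alpha = 0.5 dictates.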
Once the Q value is updated, the agent transits to the new state and the entire process starts over. This is an iterative process that converges to a maximum reward value after a finite number of iterations, depending on the set termination point. The Q value determines the amount of reward (r_a) for any given state-action pair. An agent decides which action to take based on the Q values it has estimated: at every decision point, the agent weighs its current state-action pair and chooses an action that leads to another state-action pair. Based on this action, the agent receives a reward (r_a) and the Q value in the new state is updated.

The decision-making strategy used in Q-learning is the probabilistic greedy action selection strategy, often called P-greedy (Pg). An action is selected probabilistically based on its Q value: the probability that an action is selected is directly proportional to its Q value, so as the Q value of an action increases, leading to greater r_a for the agent, the probability that the action is selected increases. Having learned action selection and reward maximization with the P-greedy method, the agent then uses the maximum Q value to determine which action to take in a given state. This is called the greedy action selection method; it complements the P-greedy method by enabling the agent to maximize its reward as it explores and exploits its environment using the best strategy.
B. Reinforcement learning-based framework
Fig. 5 shows the algorithm of the reinforcement learning-based framework for channel access. The Q-table is initialized with state, action and Q value set to zero. The CR learns from its radio environment and randomly selects a number between 0 and 1, corresponding to a possible channel state. For a given channel C_a sensed as free, the reward obtained for the action is an increase in Q value; for a channel sensed as busy, the reward is a decrease.
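To make the framework concrete, here is an end-to-end sketch of our own construction (not the paper's code): each channel keeps one Q value, sensing a free channel yields reward +1 and a busy channel -1, and channels are chosen P-greedily with selection probability increasing in their Q value (softmax weighting is used here as one way to realize that). The channel free-probabilities, step count and learning rate are hypothetical, and the discount factor is taken as 0 as a stateless, bandit-style simplification.

```python
import math
import random

def channel_access_sim(p_free, steps=2000, alpha=0.1, seed=7):
    """Q-learning channel-access sketch: one Q value per channel.
    Reward is +1 when the selected channel is sensed free, -1 when busy."""
    rng = random.Random(seed)
    q = [0.0] * len(p_free)
    for _ in range(steps):
        # P-greedy selection: selection probability grows with the Q value
        weights = [math.exp(v) for v in q]
        r = rng.random() * sum(weights)
        c = 0
        while r > weights[c]:
            r -= weights[c]
            c += 1
        free = rng.random() < p_free[c]                    # simulated sensing outcome
        q[c] += alpha * ((1.0 if free else -1.0) - q[c])   # Q update, gamma = 0
    return q

# Hypothetical channels: channel 0 is free 90% of the time
q_values = channel_access_sim([0.9, 0.3, 0.5])
best_channel = q_values.index(max(q_values))  # greedy exploitation after learning
```

After learning, the mostly-free channel accumulates the highest Q value, so the final greedy step of the framework settles on it, mirroring the "use channel C_a if its Q value is maximal" decision in Fig. 5.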
Fig. 5. Proposed channel access framework algorithm based on Q-learning
V. SIMULATION RESULT ANALYSIS
To validate the proposed algorithm, we used MATLAB to simulate the effect of varying the sensing time and the channel state probabilities. Table I lists the simulation parameters and their descriptions. The results evaluate the effect of sensing time (a major factor in spectrum sensing and decision) on the three important channel state probabilities associated with CRWSN: the probability of detection, the probability of misdetection and the probability of false alarm.
TABLE I. SET OF SIMULATION PARAMETERS

Parameter   Description                        Value
BW          Channel bandwidth                  1000 kHz
ISM band    Operating frequency band           2.4 GHz
λ           Energy detection threshold         1 and 5
d           Distance between PU and SU         50 m
q           Average channel free time          0.12 s
p           Average channel busy time          0.06 s
—           Path loss exponent                 2.5
P_ON        Probability of channel busy        0.33
P_OFF       Probability of channel free        0.66
No          Noise power                        1.38x10^-22
P_PU        PU transmission power              10 dB
Im          Maximum interference ratio         0.1
Fig. 6 shows the variation of the probability of detection with sensing time. As the sensing time decreases, there is a slow but steady decrease in the probability of detection, because a smaller sensing time does not guarantee fast detection of a primary user in the channel. However, a longer sensing time is energy inefficient. An optimal sensing time has to be determined to strike a balance between longer and shorter sensing times with respect to both the probability of detection and energy efficiency.

The effect of varying the energy detection threshold becomes significant at lower sensing times. It was observed from Fig. 6 that, at a very small sensing time, there is a sharp drop in the probability of detection when the threshold is held at 5, compared with what is obtained at the lower threshold value of 1. Hence, it is reasonable to conclude that, to maintain the probability of detection at a reasonable value, the detection threshold should be set to a lower value.
Fig. 6. Variation of probability of detection with sensing time
Fig. 7 shows the variation of the probability of misdetection with sensing time. Misdetection occurs when the secondary-user cognitive radio sensor network fails to detect the presence of a primary user in a channel even though the primary user is still transmitting in that channel. The sensing time within the sensing slot of the SU frame has a significant effect on the probability of misdetection: an exponential decrease in the probability of misdetection is observed as the sensing time increases. This means that if the SU senses for a longer time, the probability of misdetection is reduced. To avoid collisions caused by interference with the PU as a result of misdetection by the SU sensor network, a sufficiently long sensing duration is recommended; however, this duration must be balanced against the energy efficiency of the CR sensor network.
Fig. 7. Variation of probability of misdetection with sensing time
Comparing the two threshold values, a lower threshold gives a higher probability of misdetection. Consequently, a higher energy detection threshold value is most suitable for keeping the probability of misdetection low.
There is a significant observable variation in the effect of the energy detection threshold on the probability of false alarm, as shown in Fig. 8. At the higher threshold of 5, the probability of false alarm increases exponentially with sensing time, but with a smaller exponential factor than the corresponding increase at the lower threshold of 1.

A false alarm is the situation in which the CR secondary user reports to the other nodes in the network that the PU is presently using the channel when that is not the actual situation. This behavior can be understood from the learning of the SUs: based on past learning outcomes, the probability of false alarm tends to increase as the sensing time increases, which ultimately reduces the reward rate of channel access for the SU.
Fig. 8. Variation of probability of false alarm with sensing time
VI. CONCLUSION
In this paper we have proposed a channel access framework for cognitive radio-based wireless sensor networks based on the reinforcement learning technique. We used the Q-learning approach to reinforcement learning to develop a simple access algorithm, and we analyzed the effect of sensing time on the probability of detection, the probability of misdetection and the probability of false alarm, comparing these parameters for different detection threshold values. Our findings show the significant role of keeping the sensing time at an optimal value in order to balance energy efficiency against fast channel availability and access. In future work, we shall analyze how to ensure an optimal tradeoff between sensing time, energy efficiency and channel access.
ACKNOWLEDGMENT
The authors acknowledge the support of the Telematic Research Group of Universiti Teknologi Malaysia under vote number QJ130000-2623-08J66. The Federal Government of Nigeria, through the Tertiary Education Trust Fund and the Federal University of Technology Minna, also provided a fellowship to support this work.
REFERENCES
[1] L. H. A. Correia, E. E. Oliveira, D. F. Macedo, P. M. Moura, A. A. F. Loureiro, and J. S. Silva, "A framework for cognitive radio wireless sensor networks," IEEE Symposium on Computers and Communications, pp. 611-616, 2012.
[2] O. Akan, O. Karli, and O. Ergul, "Cognitive radio sensor networks," IEEE Network, vol. 23, pp. 34-40, 2009.
[3] I. F. Akyildiz, L. Won-Yeol, M. C. Vuran, and S. Mohanty, "A survey on spectrum management in cognitive radio networks," IEEE Communications Magazine, vol. 46, pp. 40-48, 2008.
[4] G. Yuming, S. Yi, L. Shan, and E. Dutkiewicz, "ADSD: An automatic distributed spectrum decision method in cognitive radio networks," First International Conference on Future Information Networks, pp. 253-258, 2009.
[5] B. Kusy, C. Richter, H. Wen, M. Afanasyev, R. Jurdak, M. Brunig, D. Abbott, Cong Huynh, and D. Ostry, "Radio diversity for reliable communication in wireless sensor networks," 10th International Conference on Information Processing in Sensor Networks, pp. 270-281, 2011.
[6] C. Ho Ting and Z. Weihua, "Simple channel sensing order in cognitive radio networks," IEEE Journal on Selected Areas in Communications, vol. 29, pp. 676-688, 2011.
[7] A. C. Mendes, C. H. P. Augusto, M. W. R. da Silva, R. M. Guedes, and J. F. de Rezende, "Channel sensing order for cognitive radio networks using reinforcement learning," 36th IEEE Conference on Local Computer Networks, pp. 546-553, 2011.
[8] S. Wang, W. Yue, J. P. Coon, and A. Doufexi, "Energy efficient spectrum sensing and access for cognitive radio networks," IEEE Transactions on Vehicular Technology, vol. 61, pp. 906-912, 2012.
[9] P. Yiyang, H. Anh Tuan, and L. Ying-Chang, "Sensing-throughput tradeoff in cognitive radio networks: how frequently should spectrum sensing be carried out?," 18th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, pp. 1-5, 2007.
[10] Z. Huazi, Z. Zhaoyang, C. Xiaoming, and Y. Rui, "Energy efficient joint source and channel sensing in cognitive radio sensor networks," IEEE International Conference on Communications (ICC), pp. 1-6, 2011.
[11] J. A. Abolarinwa, N. M. Abdul Latiff, and S. K. Syed Yusof, "Energy constrained packet size optimization for cluster-based cognitive radio-based wireless sensor networks," Australian Journal of Basic and Applied Sciences, vol. 7, pp. 138-150, 2013.