
Channel Access Framework For Cognitive Radio-based
Wireless Sensor Networks Using Reinforcement Learning

    J. A. Abolarinwa, N. M. Abdul Latiff, S. K. Syed Yusof

    Faculty of Electrical Engineering

    Universiti Teknologi Malaysia, UTM-MIMOS Center of Excellence

    Johor, Malaysia

Abstract—Cognitive radio-based wireless sensor network is a

    new paradigm in sensor networks research. It is considered to

    revolutionize next generation sensor networks. Therefore, it is of

    paramount importance to develop an efficient channel access

    technique suitable for cognitive radio-based wireless sensor

network. In this paper we have proposed a channel access framework for cognitive radio-based wireless sensor networks which is based on the reinforcement learning technique. We have used the Q-learning approach to develop a simple access algorithm.

    We have analyzed the effect of sensing time on the probability of

    detection, probability of misdetection and probability of false

alarm. These parameters were compared for different detection threshold values, and the significant simulation results are discussed.

Keywords—Cognitive-radio; Reinforcement; Channel; Q-learning; Sensing; Energy-efficiency.

    I. INTRODUCTION

Recently, there has been an increase in research work in the field of wireless sensor networks (WSNs). This is due to several applications

    of sensor networks such as, surveillance, environmental monitoring,

    intelligent health care systems, intelligent building, battle field control

and many others. WSNs operate in the industrial, scientific and medical (ISM) frequency band. This is an unlicensed band that is open to other communication applications such as Wi-Fi systems, wireless microphones, Bluetooth and microwave ovens. As a result of the numerous applications operating in this band, there is a spectrum

scarcity problem. In addition to spectrum scarcity, there is the problem of interference among dissimilar wireless applications using this band. According to [1], in some locations, the occupancy of the 2.4 GHz frequency band has reached 90%.

As a result of spectrum scarcity and interference in both the licensed and unlicensed bands, a new, efficient spectrum utilization paradigm is being proposed. This paradigm is called cognitive radio (CR). With CR, communication devices are able to adaptively and dynamically utilize the limited spectrum in an intelligent manner. CR

    has the capability to sense its radio environment, intelligently adapt

    its communication parameters and reconfigure itself accordingly.

    With these characteristics of CRs, the problem of spectrum scarcity

    and interference can easily be mitigated.

Leveraging the numerous advantages attributed to CR, the authors in [2] proposed a new sensor networking paradigm called the cognitive radio sensor network. This type of sensor network incorporates the CR functionalities into the traditional WSN. It is possible for sensor networks to operate in the licensed band in an opportunistic manner by periodically sensing the spectrum for available channels. This

    operation is what is called dynamic spectrum access (DSA), which is

    the core of cognitive radio technology. The paradigm of cognitive

radio-based wireless sensor networks (CRWSN) is more challenging than the traditional wireless sensor networks and cognitive radio networks [3]. This is because a CRWSN combines the features of WSN and CR in one network. Among other challenges are energy and processing constraints. CRWSNs are also deployed in remote

    locations and they are battery driven. Battery recharge is not possible

    in most cases. Also, as a result of miniaturization of sensor nodes,

computational complexity has to be kept low. In addition to these problems, simple antennas and radios are to be used in order to mitigate the cost of deployment. In this paper, we focus on an energy-efficient channel access framework for CRWSN using reinforcement learning.

There are a few works in the literature that propose different frameworks for ad hoc networks and sensor networks based on a cognitive radio approach. The approaches are either multi-radio or multi-channel based. An example of such is the configurable medium access control (CMAC) protocol. However, in this paper, we propose a framework for channel access in cognitive radio-based wireless sensor networks using reinforcement learning. We adopted the Q-learning approach. The justification for our approach is that, with Q-learning, a future action can be determined based on the experience of past actions. Hence, a more rewarding action can be taken by an agent learning from its environment.

II. RELATED WORKS

    In [3], the authors presented recent developments and open

    research issues in spectrum management based on CR

networks. Specifically, their discussion was focused on the development of CR networks that do not require

    modification of existing networks. This work failed to address

    the issue of coexistence and interference.


    2013 IEEE Student Conference on Research and Development (SCOReD), 16 -17 December 2013, Putrajaya, Malaysia

978-1-4799-2656-5/13/$31.00 ©2013 IEEE


    As a result of energy constraint in CRWSN, it is highly

    imperative to develop a simple, energy-efficient channel

    decision strategy for CRWSN operating in the ISM band. The

    authors in [4] proposed a spectrum decision method that

    utilizes the information about the primary user (PU) as the

    input to the decision making process. However, the authors

    failed to address the implementation of their methods in both

    the PHY and MAC layers. In a bid to increase communication

reliability, [5] proposed a multi-radio architecture for sensor nodes. This was done on the Incident Reporting Information

    System (IRIS) software platform and IEEE 802.15.4 in order

to operate with two radios, at 900 MHz and 2.4 GHz

    frequencies. Even though the result of the experiment shows

    improved link stability and delivery rate, the radios were used

independently. Hence, cognitive radio functions were absent. The authors in [6] proposed a channel sensing order for a CR secondary

    user (SU) in a multi-channel network without prior knowledge

    of the PU activities. However, they did not consider the

    inherent problem of energy efficiency peculiar to sensor

networks. Although the authors in [7] used reinforcement learning to dynamically search for the optimal sensing order, their

    method was not considered for CRWSN. The work in [8]

    proposed optimal sensing and access mechanisms, but not

    specifically adaptable to sensor networks. In [9], sensing time

    and PU activities were used to maximize the SU throughput

and to keep the probability of collision below a certain threshold. Nothing was done in terms of energy efficiency, and

    no decision strategy was applied. The authors in [10]

    considered joint source and channel sensing as parameters on

    which they based their energy efficiency. However, there was

    no channel decision approach used for channel access.

While attempts have been made in the literature to study energy efficiency in cognitive radio networks, little has been done in the area of energy-efficient decision strategies for

    channel access in CRWSN. Therefore, in this paper, our focus

is to use reinforcement learning to achieve simple, energy-efficient decisions for channel access in CRWSN. From the results in this paper, using the reinforcement learning technique yields energy-efficient channel access in terms of maintaining a balance between sensing duration and transmission time.

The rest of this paper is organized as follows: section III describes the network topology and model, and section IV describes the proposed channel access framework. In section V we present the results of our simulation, and we conclude the paper in section VI.

    III. NETWORK TOPOLOGY AND SYSTEM MODEL

Cognitive radio-based wireless sensor networks (CRWSN) are wireless sensor networks that employ cognition to dynamically use the available channels in the ISM band to communicate [11]. In this section, we consider a CRWSN with a star topology, that is, a cluster-based CRWSN. This model was chosen in order to analyse the energy issue more conveniently. A cluster-based CR sensor network topology is considered energy efficient.

    A. Cognitive Radio-based Wireless Sensor Networks Model

A cluster-based, multi-channel CRWSN is considered in this paper. In each cluster, there is a cluster head (CH) and several cluster member nodes (MNs), which are assumed to be static or to have negligible mobility within their cluster range. This is illustrated in Fig. 1. The CH is endowed with the cognitive radio capabilities of spectrum sensing and channel allocation among member nodes within its cluster. The CH also acts as the central controller

    of the network.

    Fig. 1. Network model of cluster-based cognitive radio-based wireless sensor

    networks

    B. Cognitive Radio-based Wireless Sensor Networks

    Operation

In Fig. 2 we show the per-frame operation of the CRWSN. This is a time-slotted system.

    Fig. 2. Sensing, channel access and switching operations of cognitive radio-

    based wireless sensor networks



Each frame is divided into two time slots: a sensing slot and a transmission slot. The sensing operation is carried out periodically for a duration t1. During the transmission slot, the CR transmits packets for a duration t2. A local common control channel (LCCC) is introduced for the exchange of control information. In each cluster, there are C channels and only one LCCC. A channel is considered available when there is no primary user activity in the channel during the sensing and transmission slots respectively, and the channel condition is suitable for data transmission by the secondary user. Depending on the outcome of the sensing operation, the CH of the CRWSN at the decision state proceeds to one of the following states: the transmit/receive state or the handoff state.

As a result of the radio frequency (RF) front-end hardware limitations and the energy constraints of sensor networks, energy detection spectrum sensing is considered. With the energy detection spectrum sensing technique, prior knowledge of the primary user activity is not required. Hence, the CH does not have to keep a statistical record of the primary user activity, which is random in nature.

    C. Primary user behavior model

The primary user (PU) activity is modeled as a two-state, time-homogeneous discrete Markov process [8]. This model was chosen in order to capture the dependence of the present state on the previous state. The two-state Markov process is shown in Fig. 3. We consider a spectrum band consisting of C channels, each having bandwidth BW. The PU activity in each channel can either be in the occupied state (that is, 0, or busy, or the ON state) or the available state (that is, 1, or idle, or the OFF state) at any given time. When the PU occupies a channel, the channel is not available for the CR user to transmit. Otherwise, the CR user transmits within the available channel, assuming that other channel conditions such as fading and noise are favourable for packet transmission.

    Fig. 3. Two state Markov model for primary user channel behavior

The PU occupies the channel for an exponentially distributed random duration p; this corresponds to the ON state, when the channel is busy. Let the OFF state have random duration q, during which the PU does not occupy the channel. The probabilities of PU channel occupancy and PU absence are the probabilities of ON and OFF respectively. These probabilities are derived as:

P_ON = p / (p + q)    (1)

P_OFF = q / (p + q)    (2)
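As an illustrative check (not from the paper), the stationary ON/OFF probabilities in (1) and (2) can be reproduced by simulating the two-state channel as a slotted Markov chain; the slot width and the mean busy/free durations below are hypothetical values chosen only for the sketch.

```python
import random

def simulate_pu_occupancy(mean_busy, mean_free, slot, n_slots, rng):
    """Slotted two-state (ON/OFF) Markov chain for a PU channel: in each
    slot the channel leaves its current state with probability
    slot / mean_duration, giving approximately exponential holding times."""
    busy = False
    busy_slots = 0
    for _ in range(n_slots):
        if busy:
            busy_slots += 1
            if rng.random() < slot / mean_busy:
                busy = False
        elif rng.random() < slot / mean_free:
            busy = True
    return busy_slots / n_slots

p, q = 0.06, 0.12   # illustrative mean busy / free durations in seconds
frac_busy = simulate_pu_occupancy(p, q, slot=0.001, n_slots=200_000,
                                  rng=random.Random(42))
print(round(p / (p + q), 3))  # stationary P_ON from (1) → 0.333
```

The empirical busy fraction returned by the simulation approaches p/(p + q) as the number of slots grows, matching (1).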

Assuming energy detection spectrum sensing, the probability of detection Pd and the probability of false alarm Pfa are given as (3) and (4) respectively.

Pd(λd) = Q( (λd − t1·W·(σn² + σp²)) / √(4·t1·W·(σn² + σp²)) )    (3)

Pfa(λfa) = Q( (λfa − t1·W·σn²) / √(4·t1·W·σn²) )    (4)

where λd and λfa are the detection and false alarm decision threshold values respectively, σn² and σp² are the noise and primary signal variances, W is the sampling frequency, and t1 is the sensing time of the SU. The energy consumed by sensing is a function of the sensing time. Q(x) is the tail probability of the standard normal distribution, i.e., the probability that a normal (Gaussian) random variable takes a value larger than x standard deviations above the mean, otherwise known as the Q-function.

An accurate probability of detection and high spectrum sensing efficiency, in terms of a near-zero misdetection probability, are two important aims of spectrum sensing. Sensing efficiency is defined as the ratio of the transmission time to the total CR operation time. From Fig. 2 we can determine the spectrum sensing efficiency SSeff as:

SSeff = t2 / (t1 + t2)    (5)

where t1 and t2 correspond to the sensing and transmission time slots respectively.
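The sensing statistics above can be sketched numerically. The code below assumes the standard Gaussian approximation for an energy detector with N = t1·W samples (the functional form is a common textbook expression, not necessarily the paper's exact one), and the numeric settings are illustrative rather than the paper's parameters.

```python
import math

def qfunc(x):
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def prob_detection(thr, t1, W, noise_var, sig_var):
    """P_d of an energy detector with N = t1*W samples (Gaussian approx.)."""
    n = t1 * W
    mean = n * (noise_var + sig_var)       # mean test statistic, PU present
    return qfunc((thr - mean) / math.sqrt(4 * n * (noise_var + sig_var)))

def prob_false_alarm(thr, t1, W, noise_var):
    """P_fa of the same detector when only noise is present."""
    n = t1 * W
    return qfunc((thr - n * noise_var) / math.sqrt(4 * n * noise_var))

def sensing_efficiency(t1, t2):
    """SS_eff = t2 / (t1 + t2): fraction of the frame spent transmitting."""
    return t2 / (t1 + t2)

# Illustrative numbers: unit noise and signal variance, 1 MHz sampling,
# 1 ms sensing slot, threshold between the noise-only and PU-present means.
pd = prob_detection(2050, 1e-3, 1e6, 1.0, 1.0)
pfa = prob_false_alarm(2050, 1e-3, 1e6, 1.0)
print(pd > pfa, sensing_efficiency(2, 8))  # → True 0.8
```

For any threshold between the two means, the detection probability exceeds the false alarm probability, which is the operating regime a sensing policy aims for.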

    IV. ENERGY EFFICIENT CHANNEL ACCESS FRAMEWORK

    USING REINFORCEMENT LEARNING

Here, we describe a simple, energy-efficient channel access scheme for the CR sensor network using the reinforcement learning (RL) technique. The advantage of using RL is that it does not require prior knowledge of the channel availability or of the estimated channel quality through its average SNR. Also important is the fact that, using RL, the CR user can easily adapt to changes in channel characteristics, since it can learn from previous actions. Hence, the RL technique shields the CR sensor network from the effect of changes in the PU activity pattern on the channels. In this section we present RL fundamentals and the channel access framework using RL.

    A. Reinforcement learning theory

Reinforcement learning provides a simple means of training an agent to interact properly with its environment (in this case, the radio environment) to achieve a given objective. Through the use of rewards and numerous trials in an environment, the agent learns the proper action for each state. For simplicity, the Q-learning variant of RL has been chosen in this work among other variants of RL. Fig. 4 shows the algorithm flow of the Q-learning used.



    Fig. 4. Generic Q-learning flow diagram

The Q-learning update function is given as:

Q(S, A) ← Q(S, A) + α [ ra + γ E(Y) − Q(S, A) ]

where Q(S, A) is the Q value of state S for which action A is taken; α is the learning rate, a constant value that defines how important newly learned information is to the agent; γ is another constant, called the discount factor, which determines whether new information is more important than the previous, or the other way round. S is the initial state while Y is the subsequent state. The reward for transiting from S to Y is ra, and E(Y) is the maximum Q value for state Y.
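The update rule above can be written as a one-line function; the learning rate and discount factor values used in the example are hypothetical defaults, not the paper's settings.

```python
def q_update(q_sa, reward, max_q_next, alpha=0.5, gamma=0.9):
    """One Q-learning step: Q(S,A) <- Q(S,A) + alpha*(ra + gamma*E(Y) - Q(S,A))."""
    return q_sa + alpha * (reward + gamma * max_q_next - q_sa)

# Starting from Q = 0, a reward of 1 with no future value (E(Y) = 0) moves
# the estimate halfway toward the target when alpha = 0.5.
print(q_update(0.0, 1.0, 0.0))  # → 0.5
```

Repeated applications of this step move Q(S, A) toward the running target ra + γE(Y), which is what drives the convergence described next.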

The Q value determines the amount of reward (ra) for any given state-action pair, and an agent decides which action to take based on the Q values it has estimated. At every decision point, the agent weighs its current state-action pair and chooses an action, which results in another state-action pair. Based on this action, the agent receives a reward (ra) and the Q value of the new state is updated. Once the Q value is updated, the agent transitions to the new state and the entire process starts over. This iterative process must converge to a maximum reward value after a finite number of iterations, depending on the set termination point.

The decision-making strategy used in Q-learning is the probabilistic greedy action selection strategy, often regarded as P-greedy (Pg). An action is selected probabilistically based on the Q value associated with the action: the Q value is directly proportional to the probability that the action will be selected. Therefore, as the Q value of an action increases, which leads to a greater ra for the agent, the probability that the action will be selected increases. After learning action selection and how to maximize reward using the P-greedy method, the agent then uses the maximum Q value to determine which action to take in a given state. This is called the greedy action selection method. It complements the P-greedy method by enabling the agent to maximize its reward as it explores and exploits its environment.
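A minimal sketch of these two selection rules, assuming selection probabilities simply proportional to (positively shifted) Q values as the text describes; the Q values themselves are hypothetical.

```python
def p_greedy_probs(q_values):
    """P-greedy sketch: selection probabilities proportional to the
    (shifted, strictly positive) Q values."""
    shift = min(q_values)
    weights = [v - shift + 1e-6 for v in q_values]   # keep all weights > 0
    total = sum(weights)
    return [w / total for w in weights]

def greedy_action(q_values):
    """Pure greedy selection: the action with the maximum Q value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])

q = [0.2, 0.9, 0.5]           # hypothetical Q values for three channels
probs = p_greedy_probs(q)
print(greedy_action(q))        # → 1
```

Under P-greedy the higher-valued channel is still chosen most often but lower-valued channels retain a nonzero probability, which is the exploration behavior the text relies on.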

    B. Reinforcement learning-based framework

Fig. 5 shows the algorithm of the reinforcement learning-based framework for channel access. The Q-table is initialized with state, action, and Q value set to zero. The CR learns from its radio environment and randomly selects an action between 0 and 1, which corresponds to a possible channel state. For a given channel Ca sensed as free, the reward obtained for this action is an increase, while for a channel sensed as busy, the reward is a decrease.
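A toy version of this loop is sketched below. It collapses the framework to a single state (so the discount term drops out), uses epsilon-greedy exploration, and draws the PU state from per-channel availability probabilities; all of these choices and numbers are hypothetical simplifications, not the paper's exact flow.

```python
import random

def run_channel_access(free_prob, frames, alpha=0.05, eps=0.1, seed=1):
    """Toy Q-learning channel access loop: each frame the CH picks a
    channel (epsilon-greedy), 'senses' it by drawing the simulated PU
    state, earns +1 if free / -1 if busy, and updates that channel's Q."""
    rng = random.Random(seed)
    q = [0.0] * len(free_prob)
    for _ in range(frames):
        if rng.random() < eps:
            a = rng.randrange(len(q))                   # explore
        else:
            a = max(range(len(q)), key=lambda i: q[i])  # exploit
        reward = 1.0 if rng.random() < free_prob[a] else -1.0
        q[a] += alpha * (reward - q[a])                 # one-step update
    return q

# Channel 1 is free 90% of the time, so it should earn the highest Q value.
q = run_channel_access([0.2, 0.9, 0.5], frames=20_000)
print(q.index(max(q)))
```

After enough frames the Q table ranks the channels by their observed availability, so the greedy choice settles on the channel with the most idle time, mirroring the intended behavior of the framework.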

    Fig. 5. Proposed channel access framework algorithm based on Q-learning

[Fig. 5 flowchart: Start → initialize the Q-table (s = a = Q = 0) → take a random number x ∈ [0, 1] → compute ra for the sensed channel Ca and update its Q value → if the Q value reaches Qmax, use channel Ca; otherwise repeat → End.]


    V. SIMULATION RESULT ANALYSIS

To validate the proposed algorithm, we used MATLAB to simulate the effect of varying the sensing time and the channel state probabilities. Table I shows the simulation parameters and their descriptions. The results evaluate the effect of the sensing time (a major factor in spectrum sensing and decision) on the three important channel state probabilities associated with CRWSN: the probability of detection, the probability of misdetection and the probability of false alarm.

    TABLE I. SET OF SIMULATION PARAMETERS

Parameter   Description                   Value
BW          Channel bandwidth             1000 kHz
ISM band    Operating frequency band      2.4 GHz
λ           Energy detection threshold    1 and 5
d           Distance between PU and SU    50 m
q           Average channel free time     0.12 s
p           Average channel busy time     0.06 s
η           Path loss component           2.5
P(ON)       Probability of channel busy   0.33
P(OFF)      Probability of channel free   0.66
No          Noise power                   1.38x10^-22
P(PU)       PU transmission power         10 dB
Im          Maximum interference ratio    0.1

In Fig. 6 the variation of the probability of detection with sensing time is shown. As the sensing time decreases, there is a slow but steady decrease in the probability of detection. This is because a small sensing time does not guarantee fast detection of the primary user in the channel. However, a longer sensing time is energy inefficient. An optimal sensing time has to be determined in order to strike a balance between longer and shorter sensing times, in relation to both the probability of detection and energy efficiency.

Also, the effect of varying the energy detection threshold value becomes significant at low sensing times. It was observed from Fig. 6 that, at a very small sensing time, there is a sharp drop in the probability of detection when the threshold is held at 5, compared to what is obtained when the threshold is at the lower value of 1. Hence, it is reasonable to conclude that, to maintain the probability of detection at a reasonable value, the detection threshold should be set at a lower value.

    Fig. 6. Variation of probability of detection with sensing time

Fig. 7 shows the variation of the probability of misdetection with sensing time. Misdetection occurs when the secondary user cognitive radio sensor network does not detect the presence of a primary user in the channel even though the primary user is still transmitting within the channel. The sensing time within the sensing slot of the SU frame has a significant effect on the probability of misdetection: an exponential decrease in the probability of misdetection is observed as the sensing time increases. This means that, if the SU senses for a longer time, the probability of misdetection is reduced. In order to avoid collision due to interference with the PU as a result of misdetection by the SU sensor network, a sufficiently long sensing duration should be used. However, this duration ought to be balanced against the energy efficiency of the CR sensor network.

    Fig. 7. Variation of probability of misdetection with sensing time

Comparing the two threshold values, it can be seen that the higher threshold gives a higher probability of misdetection. As a result, a lower energy detection threshold value is most suitable in order to obtain a lower probability of misdetection.

There is a significant observable variation in the effect of varying the energy detection threshold on the probability of false alarm. This is shown in Fig. 8. At the higher threshold of 5, the probability of false alarm increases exponentially with sensing time, but with a smaller exponential factor than the corresponding increase when the threshold is at the lower value of 1.

A false alarm is a situation in which the CR SU reports to other nodes in the network, as the outcome of its sensing, that the PU is presently using the channel when that is not the actual situation. From Fig. 8, the probability of false alarm tends to increase as the sensing time increases. This will ultimately reduce the reward rate of channel access for the SU.



    Fig. 8. Variation of probability of false alarm with sensing time

    VI. CONCLUSION

In this paper we have proposed a channel access framework for cognitive radio-based wireless sensor networks based on the reinforcement learning technique. We have used the Q-learning approach to develop

    a simple access algorithm. We have analyzed the effect of

    sensing time on the probability of detection, probability of

    misdetection and probability of false alarm. These parameters

    were compared using different detection threshold values.

    From our findings, we discovered the significant role of

    maintaining the sensing time at an optimal value in order to

    balance between energy efficiency and fast channel

availability and access. For future work, we shall analyze how to ensure an optimal tradeoff between sensing time, energy efficiency and channel access.

    ACKNOWLEDGMENT

The authors acknowledge the support of the Telematic Research Group of Universiti Teknologi Malaysia under the vote number QJ130000-2623-08J66. The Federal Government of Nigeria, through the Tertiary Education Trust Fund, and the Federal University of Technology Minna also provided fellowship support for this work.

    REFERENCES

[1] L. H. A. Correia, E. E. Oliveira, D. F. Macedo, P. M. Moura, A. A. F. Loureiro, and J. S. Silva, "A framework for cognitive radio wireless sensor networks," IEEE Symposium on Computers and Communications, pp. 611-616, 2012.

[2] O. Akan, O. Karli, and O. Ergul, "Cognitive radio sensor networks," IEEE Network, vol. 23, pp. 34-40, 2009.

[3] I. F. Akyildiz, L. Won-Yeol, M. C. Vuran, and S. Mohanty, "A survey on spectrum management in cognitive radio networks," IEEE Communications Magazine, vol. 46, pp. 40-48, 2008.

[4] G. Yuming, S. Yi, L. Shan, and E. Dutkiewicz, "ADSD: An automatic distributed spectrum decision method in cognitive radio networks," First International Conference on Future Information Networks, pp. 253-258, 2009.

[5] B. Kusy, C. Richter, H. Wen, M. Afanasyev, R. Jurdak, M. Brunig, D. Abbott, Cong Huynh, and D. Ostry, "Radio diversity for reliable communication in wireless sensor networks," 10th International Conference on Information Processing in Sensor Networks, pp. 270-281, 2011.

[6] C. Ho Ting and Z. Weihua, "Simple channel sensing order in cognitive radio networks," IEEE Journal on Selected Areas in Communications, vol. 29, pp. 676-688, 2011.

[7] A. C. Mendes, C. H. P. Augusto, M. W. R. da Silva, R. M. Guedes, and J. F. de Rezende, "Channel sensing order for cognitive radio networks using reinforcement learning," 36th IEEE Conference on Local Computer Networks, pp. 546-553, 2011.

[8] S. Wang, W. Yue, J. P. Coon, and A. Doufexi, "Energy efficient spectrum sensing and access for cognitive radio networks," IEEE Transactions on Vehicular Technology, vol. 61, pp. 906-912, 2012.

[9] P. Yiyang, H. Anh Tuan, and L. Ying-Chang, "Sensing-throughput tradeoff in cognitive radio networks: how frequently should spectrum sensing be carried out?," 18th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, pp. 1-5, 2007.

[10] Z. Huazi, Z. Zhaoyang, C. Xiaoming, and Y. Rui, "Energy efficient joint source and channel sensing in cognitive radio sensor networks," IEEE International Conference on Communications (ICC), pp. 1-6, 2011.

[11] J. A. Abolarinwa, N. M. Abdul Latiff, and S. K. Syed Yusof, "Energy-constrained packet size optimization for cluster-based cognitive radio-based wireless sensor networks," Australian Journal of Basic and Applied Sciences, vol. 7, pp. 138-150, 2013.
