A Reinforcement Learning Based Routing Protocol with QoS Support for Biomedical Sensor Networks

Authors:

Xuedong Liang

Ilangko Balasingham

Sang-Seon Byun

The Interventional Center, Rikshospitalet University Hospital, Oslo, Norway N-0027

Dept. of Informatics, University of Oslo, Oslo, Norway N-0316

Dept. of Electronics and Telecommunications, Norwegian University of Science and Technology, Trondheim, Norway N-7491

Presented by: Iffat Anjum (Roll: 16), Nazia Alam (Roll: 28), 15th Batch.

Date: 26th April, 2012

Green Networking Research Group, Dept. of Computer Science and Engineering, University of Dhaka

Slide 1

Contents

• Contribution
• Problem Definition
• Related works
  • Biomedical Sensor Networks
  • Reinforcement Learning
  • Q-learning
• Design of RL-QRP
  • Local Information Exchange
  • Q-learning Implementation
  • Learning-Based Routing Algorithm
• Performance Evaluation
• Limitation


Slide 2


Contributions

In RL-QRP, optimal routing policies can be found through experience and rewards, without the need to maintain precise network state information.

Considering the impact of network traffic load and sensor node mobility on network performance, RL-QRP fits well in dynamic environments.

RL-QRP performs well in terms of a number of QoS metrics and energy efficiency in various medical scenarios.


Slide 3


Slide 4

Problem Definition

The main function of biomedical sensor networks is ensuring that data packets can be sensed and delivered to the medical server reliably and efficiently.


Related works

A number of QoS support routing protocols have recently been proposed for wireless sensor networks:

• INSIGNIA supports QoS in mobile ad hoc networks; its framework is based on in-band signaling and soft-state resource management. It is not suitable for biomedical sensor networks because of the inflexible nature of its resource reservation scheme.

• CEDAR is a core-extraction distributed ad hoc routing algorithm for QoS routing in ad hoc network environments. However, the core could become the bottleneck of the network, and the selection and maintenance of the core use extra network resources.

• AdaR adaptively learns an optimal strategy to achieve multiple optimization goals. However, how to map diverse QoS requirements into concrete Q-values is not defined.

Most of the previous QoS support routing protocols suffer from:

• Heavy communication overhead.
• Computation burden of complicated algorithms.


Slide 5

Related works



A biomedical sensor network is deployed in a certain area. Sensor nodes are implanted in or attached to patients' bodies, and sink nodes are deployed in fixed positions.

Biomedical sensor networks have the following features:

• Dynamic network topology: sensor nodes may leave, join, or die (run out of battery).
• Time-varying wireless channel with serious electrical interference.
• Each sensor node has different QoS requirements, duty cycle, packet arrival rate and forwarding willingness.


Slide 6

Biomedical Sensor Networks



Mobile nodes are aware of their geographic locations, either using the global positioning system (GPS) or distributed localization services.

Each node is aware of its immediate neighbors (within its radio range) and their locations using beacon exchanges.

Mobile sensor nodes move according to the Random Waypoint Mobility Model (RWMM).
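As a rough illustration of the Random Waypoint Mobility Model, the Python sketch below moves a single node toward a randomly chosen waypoint at a randomly chosen speed, pauses on arrival, and then picks a new waypoint. The area size, speed range, pause time, time step and all names are assumptions made for the example; the slides do not give the simulation parameters.

```python
import math
import random


def random_waypoint(rng, width, height, v_min, v_max, pause, steps, dt=1.0):
    """Yield (x, y) positions of one node under a simple random waypoint model.

    Every parameter here is an illustrative assumption, not a value from the paper.
    """
    x, y = rng.uniform(0, width), rng.uniform(0, height)
    pause_left = 0.0
    target, speed = None, 0.0
    for _ in range(steps):
        if pause_left > 0:
            # The node is resting at its last waypoint.
            pause_left -= dt
        else:
            if target is None:
                # Pick the next waypoint and a travel speed.
                target = (rng.uniform(0, width), rng.uniform(0, height))
                speed = rng.uniform(v_min, v_max)
            dx, dy = target[0] - x, target[1] - y
            dist = math.hypot(dx, dy)
            step = speed * dt
            if dist <= step:
                # Waypoint reached: stop there and start pausing.
                x, y = target
                target = None
                pause_left = pause
            else:
                # Move one step along the straight line to the waypoint.
                x += dx / dist * step
                y += dy / dist * step
        yield (x, y)


if __name__ == "__main__":
    rng = random.Random(0)
    for pos in random_waypoint(rng, 100.0, 100.0, 0.5, 2.0, pause=3.0, steps=5):
        print(pos)
```

Passing in a seeded `random.Random` keeps a run reproducible, which is convenient when comparing routing behavior across mobility levels.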

This paper focuses on two types of QoS requirements:

• Packet delivery ratio.
• End-to-end delay.
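The two metrics can be computed from per-packet timestamps. The sketch below is one plausible way to do so, assuming a hypothetical log that maps packet ids to send times at the source and arrival times at the sink; the function name and data layout are not from the paper.

```python
def qos_metrics(sent_times, recv_times):
    """Return (packet delivery ratio, average end-to-end delay).

    sent_times: packet id -> time the packet left its source (assumed log format).
    recv_times: packet id -> time the packet arrived at the sink.
    The average delay is None when no packet was delivered.
    """
    delivered = [p for p in sent_times if p in recv_times]
    pdr = len(delivered) / len(sent_times) if sent_times else 0.0
    delays = [recv_times[p] - sent_times[p] for p in delivered]
    avg_delay = sum(delays) / len(delays) if delays else None
    return pdr, avg_delay
```

Only delivered packets contribute to the average delay, so the two numbers should be read together: a low delay with a low delivery ratio may simply mean that slow packets were lost.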


Slide 7

Biomedical Sensor Networks




Slide 8

Reinforcement Learning


Figure: A reinforcement learning model.


Reinforcement learning is built on the concept of the Markov Decision Process (MDP).

An MDP models an agent with a tuple (S, A, P, R):

• S is the set of states,
• A is the set of actions,
• P(s' | s, a) is the transition model that describes the probability of entering state s' after executing action a at state s,
• R(s, a, s') is the reward obtained when the agent executes a at s and enters s'.

The goal of solving an MDP is to find an optimal policy, π : S → A, that maps states to actions such that the cumulative reward is maximized.
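To make the (S, A, P, R) tuple and the notion of an optimal policy concrete, here is a minimal Python sketch that solves a small MDP by value iteration. Value iteration is a standard textbook method for an MDP whose model is known; it is shown only for illustration and is not the method used by RL-QRP, which relies on model-free Q-learning. The discount factor `gamma` and the two-state toy problem are assumptions made for the example.

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-8):
    """Return (V, policy) for a finite MDP.

    P[(s, a)] is a dict {s_next: probability}; R(s, a, s_next) is the reward.
    gamma (discount factor) and tol (stopping threshold) are illustrative choices.
    """
    def backup(V, s, a):
        # Expected reward plus discounted value of the successor state.
        return sum(p * (R(s, a, s2) + gamma * V[s2]) for s2, p in P[(s, a)].items())

    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(backup(V, s, a) for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # The greedy policy with respect to the converged values.
    policy = {s: max(actions, key=lambda a: backup(V, s, a)) for s in states}
    return V, policy


# Hypothetical toy problem: being in state "near" is rewarded.
toy_states = ["far", "near"]
toy_actions = ["stay", "move"]
toy_P = {
    ("far", "stay"): {"far": 1.0},
    ("far", "move"): {"near": 1.0},
    ("near", "stay"): {"near": 1.0},
    ("near", "move"): {"far": 1.0},
}


def toy_reward(s, a, s_next):
    return 1.0 if s_next == "near" else 0.0


V, policy = value_iteration(toy_states, toy_actions, toy_P, toy_reward)
print(policy)
```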


Slide 9

Reinforcement Learning




Slide 10

Q-learning


Q-learning is a model-free method that computes the function Q(s, a) to find an optimal decision policy.

Each time an action a is executed, the agent receives an immediate reward r from the environment.

• Q(s, a) denotes the quality of action a at state s. α is the learning rate, and γ models the weight of future rewards.

• Q(s', a') is the expected future reward at state s' by taking action a'.
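The update formula itself is not present in the slide text. The sketch below uses the standard textbook form of the tabular Q-learning update, Q(s, a) ← (1 − α) Q(s, a) + α [r + γ max Q(s', a')], with the symbols defined above; the exact form on the original slide may differ in detail. The function name, the dictionary-based table and the default values of α and γ are assumptions.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, next_actions, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update to Q[(s, a)] and return the new value.

    alpha is the learning rate and gamma weights future rewards;
    both defaults are illustrative, not values from the paper.
    """
    # Best estimated future value from the next state (0 if no action is available).
    future = max((Q[(s_next, a2)] for a2 in next_actions), default=0.0)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * future)
    return Q[(s, a)]


Q = defaultdict(float)  # unseen (state, action) pairs start at 0
q_update(Q, "s0", "right", r=1.0, s_next="s1", next_actions=["left", "right"])
```

Because the update needs only the observed reward and the current table, no transition model P is required, which is what "model-free" refers to.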

Design of RL-QRP


Slide 11


QoS route computation and selection are based on a distributed reinforcement learning algorithm.

Each sensor node calculates routes independently and individually.

The Q-value Q(s, a) stands for the quality (the progress that has been made) of the action a at state s.

Figure: Reinforcement learning based routing model.



Slide 12


QoS Support Consideration

Each node checks the QoS requirement of the data packet and its Q-value table. The node then checks whether it can make a certain progress for the data packet. If so, it forwards the packet to the neighboring node with the highest Q-value; if not, the packet is dropped or sent with 'best effort'.
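The forwarding decision described above can be sketched as follows. This is a simplified reading, not the protocol's actual procedure: the QoS requirement is reduced to a single hypothetical threshold `required_progress`, and the best-effort branch is assumed to still use the neighbor with the highest Q-value, which the slides do not specify.

```python
def choose_next_hop(q_values, required_progress, best_effort=True):
    """Pick a next hop from a node's Q-value table.

    q_values maps neighbor id -> Q-value for forwarding to that neighbor.
    required_progress is a hypothetical threshold standing in for the
    packet's QoS requirement.
    Returns (neighbor, mode), where mode is "qos", "best_effort" or "drop".
    """
    if not q_values:
        return None, "drop"  # no known neighbor to forward to
    best = max(q_values, key=q_values.get)
    if q_values[best] >= required_progress:
        return best, "qos"
    if best_effort:
        # Assumption: best effort still uses the highest-Q neighbor.
        return best, "best_effort"
    return None, "drop"


print(choose_next_hop({"n1": 0.2, "n2": 0.7}, required_progress=0.5))
```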

Local Information Exchange

Local information exchange is facilitated by beacon exchanges with 1-hop neighboring sensor nodes. It consists of:

• Position information exchange.
• Q-value exchange.
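One way to picture the beacon exchange is a small neighbor table that each node refreshes whenever a beacon arrives. The sketch below is hypothetical: the field names, the keying of Q-values and the expiry rule for silent neighbors are assumptions for illustration, not the beacon format used by RL-QRP.

```python
from dataclasses import dataclass, field


@dataclass
class Beacon:
    """Hypothetical beacon payload: sender id, position and advertised Q-values."""
    node_id: str
    position: tuple  # (x, y) in arbitrary units
    q_values: dict = field(default_factory=dict)


@dataclass
class NeighborTable:
    """Per-node view of its 1-hop neighbors, refreshed by beacons."""
    entries: dict = field(default_factory=dict)  # node id -> (position, q_values, last_seen)

    def on_beacon(self, beacon, now):
        # Store a copy so later changes at the sender do not leak in.
        self.entries[beacon.node_id] = (beacon.position, dict(beacon.q_values), now)

    def expire(self, now, max_age):
        # Assumption: neighbors that stay silent longer than max_age are dropped,
        # which is one simple way to cope with nodes leaving or dying.
        stale = [n for n, (_, _, seen) in self.entries.items() if now - seen > max_age]
        for n in stale:
            del self.entries[n]
        return stale


table = NeighborTable()
table.on_beacon(Beacon("n1", (0.0, 1.0), {"sink": 0.4}), now=0.0)
```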



Slide 13


State: S = {si}, i = 1, 2, ..., N, where N is the number of sensor nodes. Each node is a state s ∈ S.

Action: A = {a(sj | si)}, si, sj ∈ S. Executing a(sj | si) means that a packet is forwarded from state si to sj, provided si and sj are within each other's communication range.

Reward function: R = prg(Pn).

Rn is the reward for executing the action; it describes the progress made in forwarding data packet Pn.

Q-learning Implementation



Slide 14


Q-learning Implementation

Tsisj is the experienced delay between nodes si and sj.

The reward of an action is implemented using an ACK scheme. When node sj receives a packet from node si, sj acknowledges the packet by sending an ACK packet.

By calculating the 1-hop delay and the ratio of the number of ACKs received to the number of data packets sent, si can estimate the link properties between si and sj.
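The ACK-based link estimation can be sketched as a small per-link counter kept at node si. The class below is an assumption-laden illustration: it measures delay as the time between sending a packet and receiving its ACK, and it ignores duplicate ACKs; the slides do not state how the 1-hop delay is actually measured, and the class and method names are invented for the example.

```python
class LinkEstimator:
    """Track 1-hop delay and ACK ratio for one link si -> sj (illustrative sketch)."""

    def __init__(self):
        self.sent = 0
        self.acked = 0
        self.total_delay = 0.0
        self._pending = {}  # packet id -> send time

    def on_send(self, packet_id, now):
        self.sent += 1
        self._pending[packet_id] = now

    def on_ack(self, packet_id, now):
        sent_at = self._pending.pop(packet_id, None)
        if sent_at is None:
            return  # duplicate or unknown ACK; ignore it
        self.acked += 1
        # Assumption: delay is measured from send time to ACK arrival.
        self.total_delay += now - sent_at

    @property
    def ack_ratio(self):
        # ACKs received divided by data packets sent.
        return self.acked / self.sent if self.sent else 0.0

    @property
    def mean_delay(self):
        # None until at least one ACK has been received.
        return self.total_delay / self.acked if self.acked else None


link = LinkEstimator()
link.on_send(1, now=0.0)
link.on_ack(1, now=0.5)
print(link.ack_ratio, link.mean_delay)
```

A node could keep one such estimator per neighbor and feed the two estimates into its reward computation.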



Slide 15


Learning-Based Routing Algorithm



Slide 16


Learning-Based Routing Algorithm

Performance Evaluation


Slide 17


Fig: Average end-to-end delay to the sink node.

Fig: Average packet delivery ratio to the sink node.

Performance Evaluation


Slide 18


Fig: The impact of node mobility on average packet delivery ratio.

Fig: The impact of network traffic load on average end-to-end delay.

Limitation


Slide 19


RL-QRP neglects many common QoS requirements, such as network lifetime, throughput and connectivity.

Each sensor node does not consider the interactions between itself and other sensor nodes; this approach is not sufficient to achieve global optimization.

• Sensor nodes should consider the interactions with both the environment and the other nodes in the network, and cooperatively calculate the QoS routes in the context of multi-agent reinforcement learning (MaRL) framework.


THANK YOU