29
Application of Reinforcement Learning in Network Routing By By Chaopin Zhu Chaopin Zhu

Applying Reinforcement Learning for Network Routing

  • Upload
    butest

  • View
    864

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Applying Reinforcement Learning for Network Routing

Application of Reinforcement Learning in Network Routing

ByBy

Chaopin ZhuChaopin Zhu

Page 2: Applying Reinforcement Learning for Network Routing

Machine Learning

Supervised LearningSupervised Learning Unsupervised LearningUnsupervised Learning Reinforcement LearningReinforcement Learning

Page 3: Applying Reinforcement Learning for Network Routing

Supervised Learning

Feature: Learning with a teacherFeature: Learning with a teacher PhasesPhases

• Training phaseTraining phase

• Testing phaseTesting phase ApplicationApplication

• Pattern recognitionPattern recognition

• Function approximationFunction approximation

Page 4: Applying Reinforcement Learning for Network Routing

Unsupervised Leaning

FeatureFeature

• Learning without a teacherLearning without a teacher ApplicationApplication

• Feature extractionFeature extraction

• Other preprocessingOther preprocessing

Page 5: Applying Reinforcement Learning for Network Routing

Reinforcement Learning

Feature: Learning with a criticFeature: Learning with a critic ApplicationApplication

• OptimizationOptimization

• Function approximationFunction approximation

Page 6: Applying Reinforcement Learning for Network Routing

Elements ofReinforcement Learning

AgentAgent EnvironmentEnvironment PolicyPolicy Reward functionReward function Value functionValue function Model of environment (optional)Model of environment (optional)

Page 7: Applying Reinforcement Learning for Network Routing

Reinforcement Learning Problem

Environment

Agent

action an

reward rn

state xn

xn+1

Rn+1

Page 8: Applying Reinforcement Learning for Network Routing

Markov Decision Process (MDP)

Definition:Definition:

A reinforcement learning task that satisfies A reinforcement learning task that satisfies the Markov propertythe Markov property

Transition probabilitiesTransition probabilities

nnnayx axyxP n

n,|Pr 1

Page 9: Applying Reinforcement Learning for Network Routing

An Example of MDP

High Low

search wait

recharge

search wait

Page 10: Applying Reinforcement Learning for Network Routing

Markov Decision Process (cont.)

ParametersParameters

Value functionsValue functionspolicy

discount

rewardr

actiona

statex

n

n

n

0

0

,|,

|

knnkn

knn

knkn

k

axrEaxQ

xxrExV

Page 11: Applying Reinforcement Learning for Network Routing

Elementary Methods forReinforcement Learning Problem

Dynamic programmingDynamic programming Monte Carlo MethodsMonte Carlo Methods Temporal-Difference LearningTemporal-Difference Learning

Page 12: Applying Reinforcement Learning for Network Routing

Bellman’s Equations

aaxxaxQrEaxQ

aaxxxVrExV

nnna

n

nnnna

,|,max,

,|max

'1

*1

*

1*

1*

'

Page 13: Applying Reinforcement Learning for Network Routing

Dynamic Programming Methods

Policy evaluationPolicy evaluation

Policy improvementPolicy improvement

xxxVrExV nnknk |111

aaxxxVrEaxQ

axQx

nnnn

a

,|,

,maxarg'

11

Page 14: Applying Reinforcement Learning for Network Routing

Dynamic Programming (cont.)

E ---- policy evaluationE ---- policy evaluation

I ---- policy improvementI ---- policy improvement Policy IterationPolicy Iteration Value IterationValue Iteration

**210

10 VVVEIEIEIE

Page 15: Applying Reinforcement Learning for Network Routing

Monte Carlo Methods

FeatureFeature• Learning from experienceLearning from experience• Do not need complete transition Do not need complete transition

probabilitiesprobabilities IdeaIdea

• Partition experience into episodesPartition experience into episodes• Average sample returnAverage sample return• Update at episode-by-episode baseUpdate at episode-by-episode base

Page 16: Applying Reinforcement Learning for Network Routing

Temporal-Difference Learning

Features Features (Combination of Monte Carlo and DP ideas)(Combination of Monte Carlo and DP ideas)

• Learn from experience (Monte Learn from experience (Monte Carlo)Carlo)

• Update estimates based in part on Update estimates based in part on other learned estimates (DP)other learned estimates (DP)

TD(TD() algorithm seemlessly integrates TD ) algorithm seemlessly integrates TD and Monte Carlo Methodsand Monte Carlo Methods

Page 17: Applying Reinforcement Learning for Network Routing

TD(0) LearningInitialize V(x) arbitrarilyInitialize V(x) arbitrarily to the policy to be evaluatedto the policy to be evaluatedRepeat (for each episode):Repeat (for each episode):

Initialize xInitialize xRepeat (for each step of episode)Repeat (for each step of episode)

aaaction given by action given by for x for xTake action a; observe reward r and next state x’Take action a; observe reward r and next state x’

xxx’x’until x is terminaluntil x is terminal

)]()'([)()( xVxVrxVxV

Page 18: Applying Reinforcement Learning for Network Routing

Q-Learning

Initialize Q(x,a) arbitrarilyInitialize Q(x,a) arbitrarilyRepeat (for each episode)Repeat (for each episode)

Initialize xInitialize xRepeat (for each step of episode):Repeat (for each step of episode):

Choose a from x using policy derived from QChoose a from x using policy derived from QTake action a, observe r, x’Take action a, observe r, x’

xxx’x’until x is terminaluntil x is terminal

)],(),(max[),(),( ''

'axQaxQraxQaxQ

a

Page 19: Applying Reinforcement Learning for Network Routing

Q-Routing

QQxx(y,d)----estimated time that a packet would (y,d)----estimated time that a packet would

take to reach the destination node d from take to reach the destination node d from current node x via x’s neighbor node ycurrent node x via x’s neighbor node y

TTyy(d) ------y’s estimate for the time remaining (d) ------y’s estimate for the time remaining

in the tripin the trip

qqyy ---------queuing time in node y ---------queuing time in node y

TTxyxy --------transmission time between x and y --------transmission time between x and y

dzQdT yyNz

y ,min)()(

Page 20: Applying Reinforcement Learning for Network Routing

Algorithm of Q-Routing

1.1. Set initial Q-values for each nodeSet initial Q-values for each node2.2. Get the first packet from the packet queue of Get the first packet from the packet queue of

node xnode x3.3. Choose the best neighbor node and forward Choose the best neighbor node and forward

the packet to node bythe packet to node by4.4. Get the estimated valueGet the estimated value from node from node5.5. Update Update 6.6. Go to 2.Go to 2.

y

y dyQy xxNy

,minargˆ)(

yy qdT ˆˆ

dyQdTqtdyQdyQ xyyyxxx ,ˆ,ˆ,ˆ ˆˆˆ

Page 21: Applying Reinforcement Learning for Network Routing

Dual Reinforcement Q-Routing

s

x y

d

Qx(y,d) Qy(x,s) Packet

Backward Exploration

Forward Exploration

Page 22: Applying Reinforcement Learning for Network Routing

Network Model

Subnet1 Subnet2

Page 23: Applying Reinforcement Learning for Network Routing

Network Model (cont.)

16

13

10

7

4

1

17

14

11

8

5

2

18

15

12

9

6

3

19

22

25

28

31

34

20

23

26

29

32

35

21

24

27

30

33

36

Page 24: Applying Reinforcement Learning for Network Routing

Node Model

Packet Generator

Packet Destroyer

Routing Controller

Input Queue

Input Queue

Input Queue

Input Queue

Output Queue

Output Queue

Output Queue

Output Queue

Page 25: Applying Reinforcement Learning for Network Routing

Routing Controller

Init

Idle

Arrival Departure

Arrival Depart

Default

Page 26: Applying Reinforcement Learning for Network Routing

Initialization/ Termination Procedures

InitilizationInitilization Initialize and / or register global variableInitialize and / or register global variable Initialize routing tableInitialize routing table

TerminationTermination Destroy routing tableDestroy routing table Release memoryRelease memory

Page 27: Applying Reinforcement Learning for Network Routing

Arrival Procedure

Data packet arrivalData packet arrival Update routing tableUpdate routing table Route it with control information or Route it with control information or

destroy the packet if it reaches the destroy the packet if it reaches the destinationdestination

Control information packet arrivalControl information packet arrival Update routing tableUpdate routing table Destroy the packetDestroy the packet

Page 28: Applying Reinforcement Learning for Network Routing

Departure Procedure

Set all fields of the packetSet all fields of the packet Get a shortest routeGet a shortest route Send the packet according to the routeSend the packet according to the route

Page 29: Applying Reinforcement Learning for Network Routing

References

[1] Richard S. Sutton and Andrew G. Barto, [1] Richard S. Sutton and Andrew G. Barto, Reinforcement Learning—An IntroductionReinforcement Learning—An Introduction

[2] Chengan Guo, Applications of [2] Chengan Guo, Applications of Reinforcement Learning in Sequence Reinforcement Learning in Sequence Detection and Network RoutingDetection and Network Routing

[3] Simon Haykin, Neural Networks– A [3] Simon Haykin, Neural Networks– A Comprehensive FoundationComprehensive Foundation