Event-Based Stochastic
Learning and Optimization
Xi-Ren Cao
Hong Kong University of Science & Technology
Example 1: Admission Control in Communication

[Figure: open queueing network with arrival rate λ; an arriving customer is admitted with probability b(n) and rejected with probability 1-b(n); routing probabilities q0i (entering server i) and qij (server i to server j)]

n: total number of customers in the network
ni: number of customers at server i
n = (n1, …, nM): state; N: capacity

How do we choose the admission probability b(n)?
The Problem
- Non-standard: not a Markov Decision Process (MDP)
  The action depends only on the population n; different states n with the same population must take the same action
- Not a Partially Observable MDP (POMDP)
  When a customer arrives, we know something about the next state (the population will not decrease)
- Additional new features
  - Control is applied only when a customer arrives
  - Relatively small policy space: population n -> action, vs. state n -> action
- New formulation and approach
  - Event-based, with a sensitivity view
  - Not the traditional MDP with constraints
  - Generally applicable to many other problems
[Overview diagram: a sensitivity view of learning and optimization, linking Perturbation Analysis (i. queueing systems, ii. Markov systems), Markov Decision Processes (policy iteration), and Event-Based Optimization (i. PA, ii. policy iteration), with applications]
Table of Contents

- Formulation of the Event-Based Optimization
  - Formulating events
  - Event-based optimization
- Introduction to a Sensitivity-Based View of Learning and Optimization
  - The Problem
  - Two fundamental sensitivity-based approaches
  - Limitations of state-based approaches
- Solutions to the Event-Based Optimization
- Examples
- Summary and Discussion
Event-Based Approaches
The Markov Model

- X = {Xn, n = 1, 2, …}, ergodic, Xn in S = {1, 2, …, M}
- Transition probability matrix P = [p(i,j)], i, j = 1, …, M
- Steady-state probabilities: π = (π(1), π(2), …, π(M))
- Performance function: f = (f(1), …, f(M))^T
- Steady-state performance: η = πf = Σi π(i) f(i)

Pe = e, π(I - P) = 0, πe = 1, where e = (1, 1, …, 1)^T and I is the identity matrix
[Figure: a three-state Markov chain (states 1, 2, 3) with transition probabilities p(1,2), p(2,3), p(1,3), p(3,1), p(2,1), p(3,2)]
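As a quick numerical illustration of these definitions, π and η can be computed by power iteration. The three-state chain below is an arbitrary example of my own, not one from the talk.

```python
# Illustrative sketch (made-up numbers): compute the steady-state
# distribution pi of an ergodic chain by power iteration, pi <- pi P,
# then the steady-state performance eta = pi f.
P = [[0.0, 0.5, 0.5],
     [0.7, 0.0, 0.3],
     [0.4, 0.6, 0.0]]   # transition probability matrix P = [p(i,j)]
f = [1.0, 2.0, 3.0]     # performance function f(i)

pi = [1 / 3] * 3
for _ in range(500):    # iterate pi <- pi P until convergence
    pi = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]

eta = sum(pi[i] * f[i] for i in range(3))   # eta = sum_i pi(i) f(i)
```

The same π could of course be obtained by solving π(I - P) = 0, πe = 1 directly; power iteration keeps the sketch dependency-free.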
Formulating Events

[Figure: the admission-control network again, with arrival rate λ, admission probability b(n), rejection probability 1-b(n)]

An event may contain information about both the current state and the next state after a transition.
A customer arrival: the population is not reduced after the transition.

A transition: <i, j>. An event: a set of transitions.

For n = (n1, …, nM), write n+i = (n1, …, ni+1, …, nM) and n-i = (n1, …, ni-1, …, nM).

- An arriving customer accepted: a+ = {<n, n+i>, all n and i}
- A customer departure: d- = {<n, n-i>, all n and i}
- An arriving customer rejected: a- = {<n, n>, all n}
- A customer arrival: a = a+ ∪ a-
- An arriving customer finds population n: the event an
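These definitions can be made concrete in a few lines. The toy sketch below (my own illustration, not from the talk) classifies a transition <n, n'> by how it changes the state; the "internal" label for a customer moving between servers is my addition, since such a transition belongs to none of the three events above.

```python
# Toy sketch: an event is a set of transitions <n, n'>; classify a single
# transition by the population change and by whether the state changed.
def classify(n, n_next):
    """Return the event containing the transition <n, n_next>."""
    pop, pop_next = sum(n), sum(n_next)
    if pop_next == pop + 1:
        return "a+"       # arriving customer accepted: <n, n+i>
    if pop_next == pop - 1:
        return "d-"       # customer departure: <n, n-i>
    if n_next == n:
        return "a-"       # arriving customer rejected: self-loop <n, n>
    return "internal"     # customer transfers between servers (population
                          # unchanged but state changed; my own label)

# n = (n1, n2, n3): customers at each of three servers
print(classify((0, 1, 0), (0, 1, 1)))  # a+
```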
Three Types of Events
Observable Event:
1. We may observe the event
2. It contains some information

[Figure: state transition diagram of a 3-server, 2-customer example; states n = (n1, n2, n3) grouped by population n = 0, 1, 2, with arrival (a1), departure (d), and b-labeled transitions]
Controllable Event:
- An arriving customer accepted: an,+ = {<n, n+i>, all i, n in Sn}
- Sn = {n : Σi ni = n}
- An arriving customer rejected: an,- = {<n, n>, n in Sn}
- We can control the probabilities of these events: b(n) = P(an,+ | an)
- an = an,+ ∪ an,-

[Figure: the same transition diagram, with the arrival event a1 split into the accepted part a1,+ and the rejected part a1,-]
Natural Transition Event:
- A customer entering server i: an,+i = {<n, n+i>, n in Sn}
- Nature determines this randomly

[Figure: the same transition diagram, with the accepted arrival a1,+ split by destination server into a1,+1, a1,+2, a1,+3]
Logical Relations in Events

[Flowchart: observable event -> action -> controllable event -> natural transition event. Is this an arrival? If no, do nothing; if yes, observe the number of customers n and determine b(n). The customer is accepted with probability b(n) or rejected with probability 1-b(n); an accepted customer joins server j, a rejected customer leaves]

The three events (observable, controllable, and natural transition) happen simultaneously in time, but have a logical order.
⇒ A state-dependent Markov model does not fit well.
Event-Based Optimization
- An underlying Markov process {Xn, n = 0, 1, …} determines the state transitions
- At time n, we observe an observable event an, or b, d (not Xn)
- Determine an action L(an), which controls the probability of the controllable event
- A controllable event occurs: an,+ or an,-
- A natural transition event follows
- Goal: find a policy L to optimize the performance
Partially Observable MDP (POMDP)
- An underlying Markov process {Xn, n = 0, 1, …}
- At time n, we observe a random variable Yn (not Xn)
- Determine an action L(Y0, …, Yn), which controls the state transition probabilities
- A state transition follows
- Goal: find a policy L to optimize the performance
How to Find a Solution?
A Sensitivity-Based View of
Learning and Optimization
[Figure: a dynamic system with internal state x, inputs a_l, l = 0, 1, … (actions), outputs y_l, l = 0, 1, … (observations), and system performance η]

Policy: action = function of information, a = L(y)
L may depend on parameters θ
Optimization: determine a policy L that maximizes η_L
Learning and Optimization
Learning:
1. Observing and analyzing sample paths (with or without estimating system parameters)
2. Studying the behavior of a system under one policy, to learn how to improve the system performance

Difficulties:
1. The policy space is too large (with N states and M actions, there are M^N policies)
2. The system is too complicated; its structure/parameters are unknown
A Philosophical Boundary
§ We can study/observe only one policy at a time
§ By studying or observing the behavior of one policy, in general we cannot obtain the performance of other policies

⇒ We can do very little!

[Diagram: the two sides of the boundary: continuous parameters vs. a discrete policy space]
Two Basic Approaches

- Continuous parameters θ and θ + Δθ (policy gradient):

    dη/dθ = π (dP/dθ) g

- Discrete policies P and P' (policy iteration):

    η' - η = π' Q g,  with Q = P' - P

We can only start with these two sensitivity formulas!
Performance Difference Formula

For two Markov chains (P, f) and (P', f), with steady-state performance η, η' and steady-state distributions π, π', let Q = P' - P.

Poisson equation and performance potentials g:

    (I - P + eπ) g = f

Multiplying by π:   πg = πf = η
Multiplying by π':  η' - η = π'(P' - P)g = π'Qg

The potentials g can be estimated from a sample path of (P, f):

    g(i) = E{ Σ_{k=0}^{∞} [f(Xk) - η] | X0 = i }
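The sample-path estimate above can be sketched as a Monte Carlo procedure. This is an illustrative sketch with an arbitrary three-state chain of my own (horizon and run counts are also arbitrary): η is first estimated from one long path, then the truncated centered sum is averaged over many paths started at state i.

```python
# Monte Carlo sketch (made-up chain, not from the talk) of estimating the
# potential g(i) = E{ sum_k [f(X_k) - eta] | X_0 = i }, truncating the
# infinite sum at a finite horizon.
import random

P = [[0.0, 0.5, 0.5],
     [0.7, 0.0, 0.3],
     [0.4, 0.6, 0.0]]
f = [1.0, 2.0, 3.0]

def step(x, rng):
    # sample the next state from row P[x]
    return rng.choices(range(3), weights=P[x])[0]

def estimate_eta(steps=100_000, seed=1):
    # long-run average of f along one sample path
    rng, x, acc = random.Random(seed), 0, 0.0
    for _ in range(steps):
        acc += f[x]
        x = step(x, rng)
    return acc / steps

def estimate_g(i, eta, horizon=100, runs=2000, seed=2):
    # average the truncated centered sum over many paths started at i
    rng, total = random.Random(seed), 0.0
    for _ in range(runs):
        x, acc = i, 0.0
        for _ in range(horizon):
            acc += f[x] - eta
            x = step(x, rng)
        total += acc
    return total / runs
```

Note the truncation introduces a small bias that vanishes as the horizon grows, since the centered terms decay geometrically for an ergodic chain.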
Policy Iteration
    η' - η = π'Qg = π'(P' - P)g

1. η' > η if P'g ≥ Pg componentwise, with strict inequality somewhere
2. Policy iteration: at every state, find a policy P' with P'g ≥ Pg
3. Multi-chain (average performance): a simple, intuitive approach without discounting
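The improvement step in item 2 can be sketched in code. This is a toy illustration with made-up numbers (three states, two candidate transition rows per state, f fixed): at every state we pick the action whose row maximizes (P'g)(i), which by the comparison result above cannot decrease η.

```python
# Toy policy-improvement step (illustrative numbers, not from the talk):
# actions change only the transition rows; pick, per state, the row that
# maximizes (P'g)(i), so the new policy satisfies P'g >= Pg componentwise.
actions = {
    0: [[0.0, 0.5, 0.5], [0.0, 0.9, 0.1]],
    1: [[0.7, 0.0, 0.3], [0.1, 0.0, 0.9]],
    2: [[0.4, 0.6, 0.0], [0.8, 0.2, 0.0]],
}

def improve(policy, g):
    """One improvement step; ties keep the current action."""
    new = {}
    for i, rows in actions.items():
        cur = rows[policy[i]]
        best, best_val = policy[i], sum(p * gi for p, gi in zip(cur, g))
        for a, row in enumerate(rows):
            val = sum(p * gi for p, gi in zip(row, g))
            if val > best_val + 1e-12:
                best, best_val = a, val
        new[i] = best
    return new

print(improve({0: 0, 1: 0, 2: 0}, g=[0.0, 1.0, 2.0]))  # {0: 0, 1: 1, 2: 0}
```

A full policy iteration would re-solve the Poisson equation for g under the new policy and repeat until no state changes its action; only the improvement step is shown here.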
Limitations of State-Based Approaches
- Curse of dimensionality: the state space is too large
- Independent actions: actions at different states must be chosen independently
- Structural properties are lost
Event-Based Optimization
Event-based policy: an -> b(n)

Problem: find a policy that maximizes the steady-state performance.
Method: start from a performance difference formula (PDF).

How to derive the PDF? A construction method with performance potentials as building blocks.

Two policies: b(n) and b'(n).
Potential aggregation, with π(n) the probability of the event an (an arrival finding population n):

    d(n) = Σ_{n in Sn} π(n | an) { Σ_{i=1}^{M} q0i [ g(n+i) - g(n) ] }

1. Performance difference formula (PDF):

    η' - η = Σ_{n=0}^{N-1} π'(n) [ b'(n) - b(n) ] d(n)

2. Performance derivative:

    dη_θ/dθ = Σ_{n=0}^{N-1} π(n) [ db_θ(n)/dθ ] d(n)

1 ⇒ policy iteration: reduced computation (N values d(n), linear in the system size); the action depends on the population, not the full state
2 ⇒ gradient-based optimization (perturbation analysis)
Performance Difference Formula
3. Learning and on-line implementation: d(n) can be estimated with the same amount of computation as g(n)!
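The difference formula in item 1 suggests a simple improvement step: since π'(n) > 0 for every n, increasing b(n) where d(n) > 0 and decreasing it where d(n) < 0 guarantees η' ≥ η. A toy sketch with made-up d(n) values:

```python
# Toy event-based improvement step (illustrative values, not from the
# talk): push each admission probability b(n) toward its bound in the
# direction of sign(d(n)), which makes every term of the difference
# formula nonnegative when eta is to be maximized.
def improve_admission(b, d, b_min=0.0, b_max=1.0):
    """Greedy improvement: b'(n) -> b_max if d(n) > 0, b_min if d(n) < 0,
    unchanged if d(n) = 0."""
    return [b_max if dn > 0 else (b_min if dn < 0 else bn)
            for bn, dn in zip(b, d)]

b = [0.5, 0.5, 0.5, 0.5]        # current admission probabilities b(n)
d = [1.2, 0.4, -0.1, -0.8]      # estimated aggregated potentials d(n)
print(improve_admission(b, d))  # [1.0, 1.0, 0.0, 0.0]
```

As with state-based policy iteration, the d(n) would then be re-estimated under the new policy b'(n) and the step repeated.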
Summary:

Why?
- Actions are taken when an event happens
- The same event may happen at different states, i.e., the action must be the same for those states

How?
- Event-based policies
- A performance difference formula for event-based policies
- Utilize the system structure (aggregation)

Benefit?
- Event-based policy iteration is possible
- Reduced computation (e.g., linear in the system size)
- Intuitive, with a clear physical meaning
Other Examples
Example 1: Admission Control in Communication

[Figure: the admission-control network again: arrival rate λ, admission probability b(n), rejection probability 1-b(n), routing probabilities q0i, qij, qi0]

n: total number of customers in the network
ni: number of customers at server i
n = (n1, …, nM): state; N: capacity

Events: a customer arrives finding a population n.
How do we choose the admission probability b(n)?
Example 2: Manufacturing
M products: i = 1, 2, …, M. The operation on product i consists of Ni phases, j = 1, …, Ni.
State: (i, j)

Two-level hierarchical control: when product i is finished, how do we choose the next product i' (with probability c(i, i'))?

Events: product i is finished, with transition probabilities c(i, i')
Example 3: Control with Lebesgue Sampling
Riemann Sampling vs. Lebesgue Sampling

[Figure: a signal f(x) over time t, sampled at fixed time points (Riemann sampling, X = (-inf, +inf)) vs. sampled when it crosses the levels x1, x2, x3, x4 (Lebesgue sampling, X = {x1, x2, x3, x4})]

Lebesgue-sampling control (LSC) is more efficient than Riemann-sampling control (RSC) (Astrom et al., for X = {x1}).

How do we develop an optimal control policy for LSC with X = {x1, …}?

Events: the system reaches a state in X = {x1, …}
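The two sampling schemes can be contrasted in a few lines of code. This is a toy illustration of my own (made-up signal and levels, not from the talk): Riemann sampling records at fixed time steps, while Lebesgue sampling records only when the signal crosses a level in X, i.e., when an event occurs.

```python
# Toy comparison of Riemann vs. Lebesgue sampling on a discrete-time
# signal (illustrative data, not from the talk).
def riemann_samples(signal, period):
    # record the signal at fixed time steps 0, period, 2*period, ...
    return [(t, x) for t, x in enumerate(signal) if t % period == 0]

def lebesgue_samples(signal, levels):
    # record only when the signal crosses one of the fixed levels
    out = []
    for t in range(1, len(signal)):
        lo, hi = sorted((signal[t - 1], signal[t]))
        if any(lo < lv <= hi for lv in levels):
            out.append((t, signal[t]))
    return out

signal = [0.0, 0.2, 0.6, 1.1, 1.0, 0.4]
print(lebesgue_samples(signal, levels=[0.5, 1.0]))  # [(2, 0.6), (3, 1.1), (5, 0.4)]
```

On this signal, Lebesgue sampling produces three samples where per-step Riemann sampling would produce six, which is the efficiency gain the slide alludes to.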
Example 4: Large Communication Networks
[Figure: a large communication network composed of interconnected subnets 1, 2, 3, 4]
Conclusion and Discussion
- Learning and optimization are based on two performance sensitivity formulas
- They lead to two fundamental approaches: gradient-based (PA) and policy-iteration-based
- The state-based model has limitations: the state space is too large, actions must be independent, and structure is lost
- Event-based optimization
  - Event: a set of transitions, an extension of a set of states
  - Three types of events, logically related, capturing structural properties
  - Solution based on the performance difference formula (PDF)
  - Reduced computation, possibly linear in the system size
  - On-line implementation
  - Widely applicable (examples: some do not fit the Markov model, others utilize special features)
Thank You!