33
Event-Based Stochastic Learning and Optimization Xi-Ren Cao Hong Kong Uni. of Science & Tech.

Event-Based Stochastic Learning and Optimizationvhosts.eecs.umich.edu/wodes2006/Images/XirenTalk.pdf · 2006. 8. 14. · Learning and optimization are bases on two performance sensitivity

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Event-Based Stochastic Learning and Optimizationvhosts.eecs.umich.edu/wodes2006/Images/XirenTalk.pdf · 2006. 8. 14. · Learning and optimization are bases on two performance sensitivity

Event-Based Stochastic

Learning and Optimization

Xi-Ren Cao

Hong Kong Uni. of Science & Tech.

Page 2: Event-Based Stochastic Learning and Optimizationvhosts.eecs.umich.edu/wodes2006/Images/XirenTalk.pdf · 2006. 8. 14. · Learning and optimization are bases on two performance sensitivity

Example 1: Admission Control in Communication

l b(n)

1-b(n)

q0i

qij

n: number of all customers in networkni: number of customers at server i n=(n1,…,nM): state, N: Capacity

How do we choose the admission probability b(n)?

Page 3: Event-Based Stochastic Learning and Optimizationvhosts.eecs.umich.edu/wodes2006/Images/XirenTalk.pdf · 2006. 8. 14. · Learning and optimization are bases on two performance sensitivity

The Problem

n Non-standard n Not a Markov Decision Process (MDP)

Action depends on the population n,Different state n need to take the same action

n Not a Partially Observable MDP (POMDP)When a customer arrives, know something about the next state (population will not decrease)

n Additional New featuresn Control only applied when a customer arrivesn Relatively small policy space: n -> a vs n -> a

n New formulation and approachn Event based and sensitivity view

Not traditional: MDP with constraintsn Generally applicable to many other problems

Page 4: Event-Based Stochastic Learning and Optimizationvhosts.eecs.umich.edu/wodes2006/Images/XirenTalk.pdf · 2006. 8. 14. · Learning and optimization are bases on two performance sensitivity

Perturbation Analysis

i. Queuing sys.ii. Markov sys.

Markov Decision Process:

Policy Iteration

Event-Based Optimization:i. PAii. Policy Iterat

A sensitivityview of

Learning &Optimization

APPLIC

ATIO

NS

Page 5: Event-Based Stochastic Learning and Optimizationvhosts.eecs.umich.edu/wodes2006/Images/XirenTalk.pdf · 2006. 8. 14. · Learning and optimization are bases on two performance sensitivity

Table of Contents

n Formulation of the Event-based Optimizationn Formulating events

n Event-based optimization

n Introduction to a Sensitivity-Based View of Learning and Optimizationn The Problem

n Two fundamental sensitivity-based approaches

n Limitations of state-based approaches

n Solutions to the Event-Based Optimization

n Examples

n Summary and Discussion

Page 6: Event-Based Stochastic Learning and Optimizationvhosts.eecs.umich.edu/wodes2006/Images/XirenTalk.pdf · 2006. 8. 14. · Learning and optimization are bases on two performance sensitivity

Event-Based Approaches

Page 7: Event-Based Stochastic Learning and Optimizationvhosts.eecs.umich.edu/wodes2006/Images/XirenTalk.pdf · 2006. 8. 14. · Learning and optimization are bases on two performance sensitivity

n X={Xn, n=1,2,…..}, ergodic, Xn in S={1,2,…,M}.

n Transition prob. Matrix P=[p(i,j)]ij=1,..,M

n Steady-state probability: p=(p(1), p(2),..,p(M)).

n Performance function: f=(f(1), …, f(M))T

n Steady-state performance: h = pf = Si p(i)f(i)

Pe=e, p(I-P)=0, pe=1. e=(1,1,…,1)T, I: identity

The Markov ModelThe Markov Model

1

32

p(1,3)

p(3,1)

p(2,1)

p(3,2)

p(1,2)p(2,3)

Page 8: Event-Based Stochastic Learning and Optimizationvhosts.eecs.umich.edu/wodes2006/Images/XirenTalk.pdf · 2006. 8. 14. · Learning and optimization are bases on two performance sensitivity

Formulating Events

lb(n)

1-b(n)

An event may contain information about current state, and

next state after transition

A customer arrival: population not reduced after transition

A transition: <i,j>; An event: a set of transitions

n=(n1,…,nM): n+i=(n1,…,ni+1,…,nM) ; n-i=(n1,…,ni-1,…,nM)

An arrival customer accepted: a+ = {< n, n+i >, all n & i}

A customer departure: d- = {< n, n-i>, all n & i}

An arrival customer rejected: a- = {< n, n>, all n}

A customer arrival: a = a+U a-

Page 9: Event-Based Stochastic Learning and Optimizationvhosts.eecs.umich.edu/wodes2006/Images/XirenTalk.pdf · 2006. 8. 14. · Learning and optimization are bases on two performance sensitivity

Arriving customer finds population n:

Three Types of Events

Observable Event:

1. We may observe the event2. Contains some information

0 0 2

0 1 1 1 0 1

0 2 0 1 1 0 2 0 0

0 0 1

0 1 0 1 0 00 0 0

n=2

n=0

a1

d

bb

n=1

3 svr, 2 cust.

a1

a1

Page 10: Event-Based Stochastic Learning and Optimizationvhosts.eecs.umich.edu/wodes2006/Images/XirenTalk.pdf · 2006. 8. 14. · Learning and optimization are bases on two performance sensitivity

Controllable Event: Arriving customer accepted: an,+ = {< n, n+i >, all i & n in Sn}

Sn = {n, with S ni = n }

Arriving customer rejected: an,- = {< n, n>, n in Sn}

We can control the probabilities of these events: b(n) = p(an,+|an)

0 0 2

0 1 1 1 0 1

0 2 0 1 1 0 2 0 0

0 0 1

0 1 0 1 0 00 0 0

n=2

n=0

a1

d

bb

n=1a1-

a1 a1

a1+

a=a1+ U a1-

Page 11: Event-Based Stochastic Learning and Optimizationvhosts.eecs.umich.edu/wodes2006/Images/XirenTalk.pdf · 2006. 8. 14. · Learning and optimization are bases on two performance sensitivity

Natural Transition Event: A customer entering server i: an,+i = {< n, n+i >, n in Sn}

Nature determines randomly

0 0 2

0 1 1 1 0 1

0 2 0 1 1 0 2 0 0

0 0 1

0 1 0 1 0 00 0 0

n=2

n=0

a1

d

bb

n=1a1-

a1 a1

a1+

a1,+3a1,+2

a1,+1

Page 12: Event-Based Stochastic Learning and Optimizationvhosts.eecs.umich.edu/wodes2006/Images/XirenTalk.pdf · 2006. 8. 14. · Learning and optimization are bases on two performance sensitivity

Natural transitionEvent

Controllable Event

Action

Observable Event

Is this an arrival?No

Do nothing

yes

No. of customer n?

Determine b(n)

Accept Reject

b(n) 1-b(n)

Customerjoins sv j

Customerleaves

Logical Relations in Events

Page 13: Event-Based Stochastic Learning and Optimizationvhosts.eecs.umich.edu/wodes2006/Images/XirenTalk.pdf · 2006. 8. 14. · Learning and optimization are bases on two performance sensitivity

Three events, observable, controllable, and naturaltransition, happens simultaneously in time, but have a logical order

è State dependent Markov model does not fit well

Page 14: Event-Based Stochastic Learning and Optimizationvhosts.eecs.umich.edu/wodes2006/Images/XirenTalk.pdf · 2006. 8. 14. · Learning and optimization are bases on two performance sensitivity

Event-Based Optimization

DeterminesState

transition

Underlying Markov process {Xn , n=0,1,…}

At time n, we observe an observable event an or b, d (not Xn )

Determine an action L(an) - controls the prob. of

Controllable event occurs - an+ or an-

Natural transition event follows

Goal: find a policy L to optimize the performance.

Partially Observable MDP (POMDP)

Underlying Markov process {Xn , n=0,1,…}

At time n, we observe a random variable Yn (not Xn )

Determine an action L(Y0..Yn) - controls state transition prob.

State transition follows

Goal: find a policy L to optimize the performance.

Page 15: Event-Based Stochastic Learning and Optimizationvhosts.eecs.umich.edu/wodes2006/Images/XirenTalk.pdf · 2006. 8. 14. · Learning and optimization are bases on two performance sensitivity

HOWTo Find a Solution?

Page 16: Event-Based Stochastic Learning and Optimizationvhosts.eecs.umich.edu/wodes2006/Images/XirenTalk.pdf · 2006. 8. 14. · Learning and optimization are bases on two performance sensitivity

A Sensitivity-Based View of

Learning and Optimization

Page 17: Event-Based Stochastic Learning and Optimizationvhosts.eecs.umich.edu/wodes2006/Images/XirenTalk.pdf · 2006. 8. 14. · Learning and optimization are bases on two performance sensitivity

The System

(state x)

Inputs al , l=0,1,…(actions)

Output yl , l=0,1,…

(Observations)

System Performance

Dynamic system

h

Policy: action= f(information) , a =L(y)

L may depend on parameters q

Page 18: Event-Based Stochastic Learning and Optimizationvhosts.eecs.umich.edu/wodes2006/Images/XirenTalk.pdf · 2006. 8. 14. · Learning and optimization are bases on two performance sensitivity

Optimizatiom: Determine a policy L that maximizes hL

Learning and Optimization

Learning:

1. Observing /analyzing sample paths (with or without estimating system parameters).

2. Study the behavior of a system under one policy

to learn how to improve system performance

Difficulties:

1. Policy space too large (# states: N, # actions: M, # policies: MN)

2. System too complicated, structure/parameters unknown

Page 19: Event-Based Stochastic Learning and Optimizationvhosts.eecs.umich.edu/wodes2006/Images/XirenTalk.pdf · 2006. 8. 14. · Learning and optimization are bases on two performance sensitivity

A Philosophical Boundary

§ We can study/observe

only one policy at a time

We can do very little!

§ By studying or observingthe behavior of one policy,

in general we cannot

obtain the performance of

other policies

Page 20: Event-Based Stochastic Learning and Optimizationvhosts.eecs.umich.edu/wodes2006/Images/XirenTalk.pdf · 2006. 8. 14. · Learning and optimization are bases on two performance sensitivity

§ Continuous Parameters § Discrete Policy Space

Two Basic Approaches

q+Dq

q

(policy gradient) (policy iteration)

Qgd

dp

q

h= Qg'' phh =-

We can only start with these two sensitivity formulas!

Page 21: Event-Based Stochastic Learning and Optimizationvhosts.eecs.umich.edu/wodes2006/Images/XirenTalk.pdf · 2006. 8. 14. · Learning and optimization are bases on two performance sensitivity

Performance Difference Formula

For two Markov chains P, f, h, p and P’, f, h’, p’, let Q=P’-P

Poisson equation and performance potentials g :

fgePI =+- )( p

hpp == fg:p´ f'' ph =

:'p´ QggPP ')'('' pphh =-=-

Potentials g can be estimated from a sample path of (P,f)

å¥

=

=-=0

0 }|])([{)(k

k iXXfEig h

Page 22: Event-Based Stochastic Learning and Optimizationvhosts.eecs.umich.edu/wodes2006/Images/XirenTalk.pdf · 2006. 8. 14. · Learning and optimization are bases on two performance sensitivity

Policy Iteration

gPPQg )'(''' -==- pphh

1. h’>h if P’g>Pg

2. Policy iteration:

At any state find a policy P’ with P’g>Pg

3. Multi-chain (average performance): A simple, intuitive approach without discounting

Page 23: Event-Based Stochastic Learning and Optimizationvhosts.eecs.umich.edu/wodes2006/Images/XirenTalk.pdf · 2006. 8. 14. · Learning and optimization are bases on two performance sensitivity

Limitations of State-Based Approaches

n Curse of dimensionality: state space too large

n Independent actions: action at different statesneed to be independent

n Structural property lost

Page 24: Event-Based Stochastic Learning and Optimizationvhosts.eecs.umich.edu/wodes2006/Images/XirenTalk.pdf · 2006. 8. 14. · Learning and optimization are bases on two performance sensitivity

Event-Based Optimization

Event-based policy: an à b(n)

Problem: Find a policy that maximizes steady-state perf.

Method: Start with perf. difference formula

How to derive PDF?

Construction method with performance potentialsas building blocks.

Page 25: Event-Based Stochastic Learning and Optimizationvhosts.eecs.umich.edu/wodes2006/Images/XirenTalk.pdf · 2006. 8. 14. · Learning and optimization are bases on two performance sensitivity

Two policies: b(n) and b’(n)

Potential aggregation:

p(n): prob. of event an , arrival finding population n

åå ååå

-=== =

+

nn

M

i nn

ii

ii

ngnpngnpqn

nd )}()()]()([{)(

1)(

10

p

å-

=

-=-1

0

)}()]()(')[('{'N

n

ndnbnbnphh1. PDF:

å-

=

=1

0

)}()(

)({N

n

ndd

ndbn

d

d

qp

q

h q2. Performance deriv:

1è policy iteration: Reduced computation: N linear in sizeaction depends on states

2è gradient-based optimization (Perturbation analysis)

Performance Difference Formula

3. Learning and On-line implementation:

d(n) estimated with same amount of computation as g(n)!

Page 26: Event-Based Stochastic Learning and Optimizationvhosts.eecs.umich.edu/wodes2006/Images/XirenTalk.pdf · 2006. 8. 14. · Learning and optimization are bases on two performance sensitivity

n Action taken when an event happens

n Same event may happen at different states, i.e., actions may be the same for different states

Why ?

n Event-based policy

n Perf. diff. formula for event based policiesn Utilize system structure (aggregation)

How ?

Benefit ?n Event-based policy iteration possible

n Reduce computation (e.g, linear in system size)

n Intuitive, clear physical meaning

Summary:

Page 27: Event-Based Stochastic Learning and Optimizationvhosts.eecs.umich.edu/wodes2006/Images/XirenTalk.pdf · 2006. 8. 14. · Learning and optimization are bases on two performance sensitivity

Other Examples

Page 28: Event-Based Stochastic Learning and Optimizationvhosts.eecs.umich.edu/wodes2006/Images/XirenTalk.pdf · 2006. 8. 14. · Learning and optimization are bases on two performance sensitivity

Example 1: Admission Control in Communication

l b(n)

1-b(n)

q0i

qij

n: number of all customers in networkni: number of customers at server i n=(n1,…,nM): state, N: Capacity

Events: A customer arrives finding a population n

How do we choose the admission probability b(n)?

qi i=0

Page 29: Event-Based Stochastic Learning and Optimizationvhosts.eecs.umich.edu/wodes2006/Images/XirenTalk.pdf · 2006. 8. 14. · Learning and optimization are bases on two performance sensitivity

Example 2: Manufacturing

M products: i=1,2,…, M Operation of product i consists of Ni phases, j=1,…,Ni

State: (i,j)

Two level hierarchical control When a product i is finishes, how do we choose

The next product i’, (probability c(i,i’) )?

Events: product i finished

c(i, i’)

Page 30: Event-Based Stochastic Learning and Optimizationvhosts.eecs.umich.edu/wodes2006/Images/XirenTalk.pdf · 2006. 8. 14. · Learning and optimization are bases on two performance sensitivity

Example 3: Control with Lebesgue Sampling

Riemann Sampling Lebesgue Sampling

t t

x xf(x) f(x)

x1

x2

x3

x4

LSC More efficient than RSC (Astrom, et al, for X={x1})

X= {-inf, +inf} X= {x1, x2, x3 , x4}

How to develop an optimal control policy for LSC with X= {x1, … } ?

Events: System reaches a state in X= {x1, … }

Page 31: Event-Based Stochastic Learning and Optimizationvhosts.eecs.umich.edu/wodes2006/Images/XirenTalk.pdf · 2006. 8. 14. · Learning and optimization are bases on two performance sensitivity

Example 4: Large Communication Networks

Subnet 1

Subnet 2

Subnet 3

Subnet 4

Page 32: Event-Based Stochastic Learning and Optimizationvhosts.eecs.umich.edu/wodes2006/Images/XirenTalk.pdf · 2006. 8. 14. · Learning and optimization are bases on two performance sensitivity

Conclusion and Discussion

n Learning and optimization are bases on

two performance sensitivity formulas

n They lead to two fundamental approaches: gradient based (PA) and policy iteration based

n State-based model has limitations:State space too large, action independent, no structure

n Event-based optimizationn Event: a set of transitions, extension of set of states

n Three types of events, logically related, structure properties

n Solution based on PDF

n Computation reduced, maybe linear in system size

n On-lime implementation

n Widely applicable

(Exs: some not fit Markov model, others utilize special features)

Page 33: Event-Based Stochastic Learning and Optimizationvhosts.eecs.umich.edu/wodes2006/Images/XirenTalk.pdf · 2006. 8. 14. · Learning and optimization are bases on two performance sensitivity

Thank You!