Presentation by Dayu Huang, based on the paper of the same name in Proc. of the 48th IEEE Conference on Decision and Control, December 16-18, 2009.
https://netfiles.uiuc.edu/meyn/www/spm_files/TD5552009/TD555.html
Approximate Dynamic Programming using Fluid and Diffusion Approximations with Applications to Power Management
Wei Chen, Dayu Huang, Ankur A. Kulkarni, Jayakrishnan Unnikrishnan, Quanyan Zhu, Prashant Mehta, Sean Meyn, and Adam Wierman
Coordinated Science Laboratory, UIUC; Dept. of IESE, UIUC; Dept. of CS, California Inst. of Tech.
Speaker: Dayu Huang
National Science Foundation (ECS-0523620 and CCF-0830511), ITMANET DARPA RK 2006-07284, and Microsoft Research
Introduction
MDP model
i.i.d.
Control
Cost
Minimize average cost
Generator
Average Cost Optimality Equation (ACOE)
Solve ACOE and Find
Relative value function
Generator
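The equations on this slide are images in the original; as a sketch in standard notation (symbols assumed, not taken from the slide), with state X_n, input U_n, one-step cost c, relative value function h, and optimal average cost η*, the ACOE reads:

```latex
\min_{u}\Bigl\{ c(x,u) + \mathcal{D}_{u}h\,(x) \Bigr\} = \eta^{*},
\qquad
\mathcal{D}_{u}h\,(x) := \mathbb{E}\bigl[h(X_{n+1}) - h(X_n) \mid X_n = x,\ U_n = u\bigr],
```

where D_u is the generator of the controlled chain.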
TD Learning
The “curse of dimensionality”: the complexity of solving the ACOE grows exponentially with the dimension of the state space.
Approximate within a finite-dimensional function class
Criterion: minimize the mean-square error
solved by stochastic approximation algorithms
Problem: How to select the basis functions ?
key to the success of TD learning
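As an illustration of the scheme above (a minimal sketch, not the paper's experiment: the chain, basis, cost, and step sizes are all assumed for this example), average-cost TD(0) with a linear basis can be written as:

```python
import numpy as np

rng = np.random.default_rng(0)

def psi(x):
    # Illustrative basis: constant, linear, quadratic (normalized to [0, 1]).
    return np.array([1.0, x / 20.0, (x / 20.0) ** 2])

def step(x):
    # Toy queue under a fixed policy: serve one job per slot, Poisson arrivals.
    return min(max(x - 1, 0) + rng.poisson(0.8), 20)

cost = lambda x: float(x)          # one-step cost

theta = np.zeros(3)                # coefficients of h_theta = theta . psi
eta = 0.0                          # running estimate of the average cost
x = 0
for n in range(1, 200_000):
    x_next = step(x)
    # Temporal-difference error for the average-cost criterion.
    d = cost(x) - eta + theta @ psi(x_next) - theta @ psi(x)
    theta += 0.01 * d * psi(x)     # stochastic-approximation update
    eta += (cost(x) - eta) / n
    x = x_next

print(round(eta, 2), np.round(theta, 2))
```

The fitted h_theta can only approximate the relative value function within the span of psi, which is exactly why the choice of basis functions is the key question raised on this slide.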
[Figure: the fluid value function and the relative value function plotted against the state]
Fluid value function: total cost for an associated deterministic model
The fluid value function is a tight approximation to the relative value function, and can be used as a part of the basis
Related Work
Multiclass queueing networks:
network scheduling and routing: Veatch 2004; Moallemi, Kumar and Van Roy 2006
simulation: Henderson et al. 2003
optimal control: Chen and Meyn 1999; Meyn 1997, Meyn 1997b
Control Techniques for Complex Networks: Meyn 2007
other approaches: Mannor, Menache and Shimkin 2005; Tsitsiklis and Van Roy 1997
Taylor series approximation: this work
Power Management via Speed Scaling
Single processor: control the processing speed to balance delay and energy costs
Processing rate determined by the current power (Kaxiras and Martonosi 2008)
Processor design: polynomial cost (Bansal, Kimbrel and Pruhs 2007; Wierman, Andrew and Tang 2009)
We also consider a cost model for wireless communication applications (Mannor, Menache and Shimkin 2005)
This talk: job arrivals
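A hedged simulation sketch of this setup (the dynamics, arrival rate, and weight beta below are illustrative assumptions, not the paper's exact parameters): a queue X_{n+1} = X_n - U_n + A_{n+1}, with the speed U_n chosen online and a polynomial cost x + beta*u^2 trading delay against energy:

```python
import numpy as np

rng = np.random.default_rng(1)
beta = 1.0                             # illustrative energy weight

def avg_cost(policy, steps=100_000):
    """Average of x + beta*u^2 along one trajectory of the queue."""
    x, total = 0.0, 0.0
    for _ in range(steps):
        u = min(x, policy(x))          # cannot serve more work than is queued
        total += x + beta * u * u
        x = x - u + rng.poisson(0.9)   # i.i.d. job arrivals (assumed Poisson)
    return total / steps

# A constant-speed policy versus a state-dependent one.
print(round(avg_cost(lambda x: 1.0), 2))
print(round(avg_cost(lambda x: np.sqrt(x)), 2))
```

The state-dependent square-root policy is only an example of speed scaling; deriving policies from the fluid value function is the subject of the slides that follow.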
Total Cost
Fluid Model
Fluid model:
Total Cost Optimality Equation (TCOE) for the fluid model:
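In standard notation (a sketch; symbols are assumed since the slide's equations are images), with α the mean arrival rate, the fluid model and its TCOE for this single-queue example are:

```latex
\frac{d}{dt}x(t) = -u(t) + \alpha,
\qquad
J^{*}(x) = \min_{\mathbf{u}} \int_{0}^{\infty} c\bigl(x(t), u(t)\bigr)\,dt,
\quad x(0) = x,
\\[4pt]
\min_{u}\bigl\{ c(x,u) + (\alpha - u)\,\nabla J^{*}(x) \bigr\} = 0 .
```

The second equation says that along an optimal fluid trajectory, the instantaneous cost exactly balances the decrease of the value function.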
MDP
Why Fluid Model?
First order Taylor series approximation
Simple but powerful idea!
The fluid value function almost solves the ACOE
TCOE
ACOE
Policy
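The first-order Taylor argument can be sketched as follows (notation assumed): applying the generator to the fluid value function J* and expanding to first order,

```latex
\mathcal{D}_{u}J^{*}(x)
= \mathbb{E}\bigl[J^{*}(X_{n+1}) - J^{*}(X_n) \mid X_n = x,\ U_n = u\bigr]
\approx \mathbb{E}\bigl[X_{n+1} - X_n \mid X_n = x,\ U_n = u\bigr]\,\nabla J^{*}(x)
= (\alpha - u)\,\nabla J^{*}(x),
```

so the left-hand side of the ACOE with h = J* reduces, to first order, to the left-hand side of the TCOE, which is zero. J* therefore almost solves the ACOE, with an error given by the neglected higher-order terms.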
[Figure: the stochastic optimal policy and the myopic policy plotted against x, together with their difference]
Value Iteration
[Figure: convergence of value iteration against the iteration index n, under two initializations]
Initialization: V_0 ≡ 0
Initialization: V_0 = J (the fluid value function)
(See also [Chen and Meyn 1999])
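The comparison can be sketched numerically (a toy truncated queue with assumed parameters, not the paper's model): run value iteration from V_0 = 0 and from a fluid-style initialization, and track the span of successive increments, which vanishes as the iterates converge.

```python
import numpy as np

N, U = 21, 4                      # truncated states 0..20, speeds 0..3
arr = [0.4, 0.4, 0.2]             # assumed arrival distribution P(A = 0, 1, 2)

def bellman(V):
    """One step of value iteration for the n-stage minimal cost."""
    Vn = np.empty(N)
    for x in range(N):
        best = np.inf
        for u in range(min(x, U - 1) + 1):
            ev = sum(p * V[min(x - u + a, N - 1)] for a, p in enumerate(arr))
            best = min(best, x + u * u + ev)
        Vn[x] = best
    return Vn

def spans(V, k=60):
    """Span seminorm of V_{n+1} - V_n at each iteration."""
    out = []
    for _ in range(k):
        V2 = bellman(V)
        out.append(float(np.ptp(V2 - V)))
        V = V2
    return out

x = np.arange(N, dtype=float)
flat = spans(np.zeros(N))
fluid = spans(x ** 1.5)           # fluid-style initialization (illustrative shape)
print(round(flat[-1], 4), round(fluid[-1], 4))
```

V_{n+1} - V_n converges to the constant η*, so the span of the increment tends to zero in both cases; the point of the slide (and of Chen and Meyn 1999) is that a fluid-style initialization accelerates this convergence.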
Approximation of the Cost Function
Error Analysis: is the error term constant?
Bounds on the error term?
Surrogate cost: the fluid value function solves the ACOE exactly for a surrogate cost that approximates the true cost
Structural Results on the Fluid Solution
Lower Bound
Convexity of
Upper Bound
Approach Based on Fluid and Diffusion Models
Value function of the �uid model
[Figure: the fluid value function and the relative value function plotted against the state]
this talk: fluid model
TD Learning Experiment
Estimates of the coefficients for the case of quadratic cost
[Figure: the approximate relative value function and the fluid value function plotted against the state]
[Figure: coefficient estimates over 10^5 TD-learning iterations; approximation of the relative value function]
Basis functions:
TD Learning with Policy Improvement
Nearly optimal after just a few iterations
Average cost at stage n
[Figure: average cost over the first 25 stages of policy iteration]
Need the value of the optimal policy
Conclusions
The fluid value function can be used as a part of the basis for TD learning.
Motivated by analysis using a Taylor series expansion:
The fluid value function almost solves the ACOE. In particular, it solves the ACOE for a slightly different cost function, and the error term can be estimated.
TD learning with policy improvement gives a near-optimal policy in a few iterations, as shown by experiments.
Application in power management for processors.
References
[1] W. Chen, D. Huang, A. Kulkarni, J. Unnikrishnan, Q. Zhu, P. Mehta, S. Meyn, and A. Wierman. Approximate dynamic programming using fluid and diffusion approximations with applications to power management. In Proc. 48th IEEE Conference on Decision and Control, December 16-18, 2009.
[2] P. Mehta and S. Meyn. Q-learning and Pontryagin's Minimum Principle. In Proc. 48th IEEE Conference on Decision and Control, December 16-18, 2009.
[3] R.-R. Chen and S. P. Meyn. Value iteration and optimization of multiclass queueing networks. Queueing Syst. Theory Appl., 32(1-3):65-97, 1999.
[4] S. G. Henderson, S. P. Meyn, and V. B. Tadic. Performance evaluation and policy selection in multiclass networks. Discrete Event Dynamic Systems: Theory and Applications, 13(1-2):149-189, 2003.
[5] S. P. Meyn. The policy iteration algorithm for average reward Markov decision processes with general state space. IEEE Trans. Automat. Control, 42(12):1663-1680, 1997.
[6] S. P. Meyn. Control Techniques for Complex Networks. Cambridge University Press, Cambridge, 2007.
[7] C. Moallemi, S. Kumar, and B. Van Roy. Approximate and data-driven dynamic programming for queueing networks. Preprint, http://moallemi.com/ciamac/research-interests.php, 2008.