
Approximate dynamic programming using fluid and diffusion approximations with applications to power management


DESCRIPTION

https://netfiles.uiuc.edu/meyn/www/spm_files/TD5552009/TD555.html Presentation by Dayu Huang, based on the paper of the same name in Proc. of the 48th IEEE Conference on Decision and Control, December 16-18, 2009.


Page 1: Approximate dynamic programming using fluid and diffusion approximations with applications to power management

Approximate Dynamic Programming using Fluid and Diffusion Approximations with Applications to Power Management

Wei Chen, Dayu Huang, Ankur A. Kulkarni, Jayakrishnan Unnikrishnan, Quanyan Zhu, Prashant Mehta, Sean Meyn, and Adam Wierman

Coordinated Science Laboratory, UIUC; Dept. of IESE, UIUC; Dept. of CS, California Inst. of Tech.

Speaker: Dayu Huang

National Science Foundation (ECS-0523620 and CCF-0830511), ITMANET DARPA RK 2006-07284, and Microsoft Research

[Figures: plot of J against x for 0 ≤ x ≤ 20, and a second plot of estimates over 0 to 10 × 10^4 iterations]

Page 2: Approximate dynamic programming using fluid and diffusion approximations with applications to power management

Introduction

MDP model

i.i.d

Control

Cost

Minimize average cost

Page 3: Approximate dynamic programming using fluid and diffusion approximations with applications to power management

Introduction

MDP model

i.i.d

Control

Cost

Minimize average cost

Generator

Page 4: Approximate dynamic programming using fluid and diffusion approximations with applications to power management

Introduction

MDP model

i.i.d

Control

Cost

Minimize average cost

Average Cost Optimality Equation (ACOE)

Solve the ACOE and find the relative value function.

Generator
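For reference, the ACOE on this slide takes the standard form below (symbols assumed here, since the slide's equations are rendered as images: h is the relative value function, η* the optimal average cost, c the one-step cost):

\min_{u} \Big\{ c(x,u) + \mathrm{E}\big[\, h(X_{n+1}) \mid X_n = x,\ U_n = u \,\big] \Big\} \;=\; h(x) + \eta^* .

The "generator" refers to the operator \mathcal{D}_u h\,(x) = \mathrm{E}[\, h(X_{n+1}) - h(x) \mid X_n = x, U_n = u \,], so the ACOE can equivalently be written \min_u \{ c(x,u) + \mathcal{D}_u h\,(x) \} = \eta^*.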

Page 5: Approximate dynamic programming using fluid and diffusion approximations with applications to power management

TD Learning

The “curse of dimensionality”:

Approximate within a finite-dimensional function class

Criterion: minimize the mean-square error

solved by stochastic approximation algorithms

Complexity of solving ACOE grows exponentially with the dimension of the state space.

Page 6: Approximate dynamic programming using fluid and diffusion approximations with applications to power management

TD Learning

The “curse of dimensionality”:

Approximate within a finite-dimensional function class

Criterion: minimize the mean-square error

solved by stochastic approximation algorithms

Complexity of solving ACOE grows exponentially with the dimension of the state space.

Problem: How to select the basis functions?

This is key to the success of TD learning.
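To make the recipe concrete, here is a minimal sketch of average-cost TD(0) with a linear function class h_θ(x) = θᵀψ(x), solved by stochastic approximation. The simulator step(), policy(), cost(), and basis psi() are placeholders introduced for illustration; they are not objects defined in the talk.

```python
# Minimal sketch (assumptions noted above): average-cost TD(0) with a linear
# function class h_theta(x) = theta . psi(x), driven by a simulated trajectory.
import numpy as np

def td0_average_cost(step, policy, cost, psi, x0, n_iters=100_000):
    theta = np.zeros(len(psi(x0)))   # basis-function weights
    eta = 0.0                        # running estimate of the average cost
    x = x0
    for n in range(1, n_iters + 1):
        u = policy(x)
        x_next = step(x, u)          # simulate one transition of the MDP
        gamma, beta = 1.0 / n, 1.0 / n   # diminishing step sizes
        # temporal-difference term for the relative value function
        d = cost(x, u) - eta + theta @ psi(x_next) - theta @ psi(x)
        theta = theta + gamma * d * psi(x)
        eta = eta + beta * (cost(x, u) - eta)
        x = x_next
    return theta, eta
```

The choice of the basis ψ is exactly the design question raised on this slide; the fluid value function discussed next is one natural basis element.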

Page 7: Approximate dynamic programming using fluid and diffusion approximations with applications to power management

[Figure: the fluid value function and the relative value function plotted for 0 ≤ x ≤ 20]

The fluid value function is a tight approximation to the relative value function, and can be used as a part of the basis.

The fluid value function is the total cost for an associated deterministic model.
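In symbols (standard notation assumed, since the slide's formulas are images), the fluid value function is the total cost for the associated deterministic model:

J(x) \;=\; \min_{\mathbf{u}} \int_0^\infty c\big(x(t), u(t)\big)\, dt, \qquad \frac{d}{dt} x(t) = f\big(x(t), u(t)\big), \quad x(0) = x,

where f denotes the mean drift of the MDP.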

Page 8: Approximate dynamic programming using fluid and diffusion approximations with applications to power management

Related Work

Multiclass queueing network

Network scheduling and routing: Veatch 2004; Moallemi, Kumar and Van Roy 2006

Simulation: Henderson et al. 2003

Optimal control: Chen and Meyn 1999; Meyn 1997, Meyn 1997b

Control Techniques for Complex Networks: Meyn 2007

Other approaches: Mannor, Menache and Shimkin 2005; Tsitsiklis and Van Roy 1997

Page 9: Approximate dynamic programming using fluid and diffusion approximations with applications to power management

Related Work

Multiclass queueing network

Network scheduling and routing: Veatch 2004; Moallemi, Kumar and Van Roy 2006

Simulation: Henderson et al. 2003

Optimal control: Chen and Meyn 1999; Meyn 1997, Meyn 1997b

Control Techniques for Complex Networks: Meyn 2007

Other approaches: Mannor, Menache and Shimkin 2005; Tsitsiklis and Van Roy 1997

Taylor series approximation: this work

Page 10: Approximate dynamic programming using fluid and diffusion approximations with applications to power management

Power Management via Speed Scaling

Single processor

Control the processing speed to balance delay and energy costs

processing rate determined by the current power

Processor design: polynomial cost

We also consider a cost model for wireless communication applications

Bansal, Kimbrel and Pruhs 2007

Wierman, Andrew and Tang 2009

This talk

job arrivals

Kaxiras and Martonosi 2008; Wierman, Andrew and Tang 2009

Mannor, Menache and Shimkin 2005
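As a concrete, purely illustrative instance of this setup, the sketch below simulates a single speed-scaling queue with i.i.d. job arrivals and a polynomial power cost. The specific dynamics X_{n+1} = X_n − U_n + A_{n+1}, the Poisson arrivals, and the cost x + βu^ρ are assumptions chosen for the example, not taken verbatim from the slides.

```python
# Illustrative speed-scaling queue: queue length X_n, i.i.d. arrivals A_n,
# chosen processing rate U_n, and a cost trading off delay against energy.
import numpy as np

rng = np.random.default_rng(0)

def arrivals():
    return rng.poisson(lam=1.0)        # i.i.d. job arrivals (unit mean, assumed)

def cost(x, u, beta=1.0, rho=2):
    return x + beta * u**rho           # delay cost plus polynomial energy cost

def simulate(policy, x0=0, horizon=10_000):
    x, total = x0, 0.0
    for _ in range(horizon):
        u = min(policy(x), x)          # cannot serve more work than is queued
        total += cost(x, u)
        x = x - u + arrivals()         # queue update: served work out, new work in
    return total / horizon             # empirical average cost

# Example: a fluid-style policy u(x) ~ sqrt(x), a reasonable shape for quadratic energy cost
avg_cost = simulate(policy=lambda x: int(np.sqrt(x)))
```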

Page 11: Approximate dynamic programming using fluid and diffusion approximations with applications to power management

Total Cost

Fluid Model

Fluid model:

Total Cost Optimality Equation (TCOE) for the fluid model:

MDP
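Written out generically (notation assumed; for the speed-scaling example the drift is f(x,u) = −u + α, with α the mean arrival rate), the TCOE for the fluid model reads

\min_{u} \big\{ c(x,u) + \nabla J(x) \cdot f(x,u) \big\} \;=\; 0 ,

where J is the fluid value function defined above.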

Page 12: Approximate dynamic programming using fluid and diffusion approximations with applications to power management

Why Fluid Model?

First order Taylor series approximation

MDP

Page 13: Approximate dynamic programming using fluid and diffusion approximations with applications to power management

Why Fluid Model?

First order Taylor series approximation

MDP

Simple but powerful idea!

almost solves the ACOE

TCOE

ACOE
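The Taylor-series step can be spelled out as follows (standard notation assumed). Substituting the fluid value function J for h in the ACOE and expanding to first order,

\mathrm{E}\big[\, J(X_{n+1}) - J(x) \mid X_n = x, U_n = u \,\big] \;\approx\; \nabla J(x) \cdot \mathrm{E}\big[\, X_{n+1} - x \mid X_n = x, U_n = u \,\big] \;=\; \nabla J(x) \cdot f(x,u),

so the minimization in the ACOE becomes approximately \min_u \{ c(x,u) + \nabla J(x)\cdot f(x,u) \}, which equals zero by the TCOE. In this sense the fluid value function almost solves the ACOE, up to the higher-order terms of the expansion.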

Page 14: Approximate dynamic programming using fluid and diffusion approximations with applications to power management

Policy

[Figure: the stochastic optimal policy, the myopic policy, and their difference, plotted for 0 ≤ x ≤ 20]

Page 15: Approximate dynamic programming using fluid and diffusion approximations with applications to power management

Value Iteration

[Figure: value-iteration iterates plotted against n for the two initializations below]

Initialization: V0 ≡ 0

Initialization: V0 = J (the fluid value function)

(See also [Chen and Meyn 1999])
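A minimal sketch of this comparison, under illustrative assumptions (a truncated queue, Poisson arrivals, cost x + u², and a small action set): relative value iteration that can be started either from V0 = 0 or from a fluid value function.

```python
# Relative value iteration on a truncated queue (all modeling choices here are
# illustrative assumptions, not taken from the slides).
import math
import numpy as np

N = 200                                      # queue-length truncation
alpha = 1.0                                  # mean arrival rate
actions = np.arange(0, 11)                   # admissible service rates

ks = np.arange(21)
PA = np.exp(-alpha) * alpha**ks / np.array([math.factorial(k) for k in ks])
PA = PA / PA.sum()                           # truncated Poisson arrival pmf

def bellman_step(V):
    Vn = np.empty_like(V)
    for x in range(N):
        best = np.inf
        for u in actions[actions <= x]:      # cannot serve more than is queued
            nxt = np.clip(x - u + ks, 0, N - 1)
            best = min(best, x + u**2 + PA @ V[nxt])
        Vn[x] = best
    return Vn - Vn[0]                        # renormalize: relative value iteration

V = np.zeros(N)                              # alternative: V = J_fluid evaluated on 0..N-1
for _ in range(50):
    V = bellman_step(V)
```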

Page 16: Approximate dynamic programming using fluid and diffusion approximations with applications to power management

Approximation of the Cost Function

Error Analysis

constant?

Page 17: Approximate dynamic programming using fluid and diffusion approximations with applications to power management

Approximation of the Cost Function

Error Analysis

constant?

Bounds on ?

approximates

Surrogate cost
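The surrogate-cost idea can be stated generically (notation assumed). Define the Bellman error of the fluid value function,

\mathcal{E}(x) \;:=\; \min_{u} \big\{ c(x,u) + \mathrm{E}\big[\, J(X_{n+1}) - J(x) \mid X_n = x, U_n = u \,\big] \big\} .

By construction, J then solves the ACOE exactly for the surrogate cost c_J(x,u) := c(x,u) − \mathcal{E}(x), with average cost zero; bounds on \mathcal{E} therefore quantify how closely the surrogate cost approximates the original cost, and hence how well J approximates a solution of the original ACOE.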

Page 18: Approximate dynamic programming using fluid and diffusion approximations with applications to power management

Structural Results on the Fluid Solution

Page 19: Approximate dynamic programming using fluid and diffusion approximations with applications to power management

Lower Bound

Convexity of

Page 20: Approximate dynamic programming using fluid and diffusion approximations with applications to power management

Lower Bound

Convexity of

Page 21: Approximate dynamic programming using fluid and diffusion approximations with applications to power management

Upper Bound

Page 22: Approximate dynamic programming using fluid and diffusion approximations with applications to power management

Upper Bound

Page 23: Approximate dynamic programming using fluid and diffusion approximations with applications to power management

Upper Bound

Page 24: Approximate dynamic programming using fluid and diffusion approximations with applications to power management

Approach Based on Fluid and Diffusion Models

Value function of the fluid model

[Figure: the fluid value function and the relative value function plotted for 0 ≤ x ≤ 20]

This talk: fluid model.

The fluid value function is a tight approximation to the relative value function, and can be used as a part of the basis.

The fluid value function is the total cost for an associated deterministic model.

Page 25: Approximate dynamic programming using fluid and diffusion approximations with applications to power management

TD Learning Experiment

Estimates of Coefficients for the case of quadratic cost

[Figure: the approximate relative value function and the fluid value function, plotted for 0 ≤ x ≤ 20]

[Figure: coefficient estimates over 0 to 10 × 10^4 iterations of TD learning (relative value function approximation)]

Basis functions:

Page 26: Approximate dynamic programming using fluid and diffusion approximations with applications to power management

TD Learning with Policy Improvement

Nearly optimal after just a few iterations

[Figure: average cost at each policy-improvement stage, for stages 0 to 25]

Need the value of the optimal policy
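A hedged sketch of this loop: each stage evaluates the current policy with TD learning (for instance, the td0_average_cost sketch given earlier) and then improves the policy greedily against the fitted relative value function. The helper names td_evaluate, expected_next_value, psi, cost, and actions are illustrative placeholders, not part of the talk.

```python
# Approximate policy iteration: TD-learning evaluation + greedy improvement.
import numpy as np

def greedy_policy(theta, psi, cost, expected_next_value, actions):
    def policy(x):
        # one-step lookahead using the fitted h_theta(y) = theta . psi(y)
        q = [cost(x, u) + expected_next_value(x, u, lambda y: theta @ psi(y))
             for u in actions]
        return actions[int(np.argmin(q))]
    return policy

def approximate_policy_iteration(td_evaluate, initial_policy, psi, cost,
                                 expected_next_value, actions, n_stages=10):
    policy = initial_policy
    for stage in range(n_stages):
        theta, eta = td_evaluate(policy)     # policy evaluation by TD learning
        policy = greedy_policy(theta, psi, cost, expected_next_value, actions)
        print(f"stage {stage}: estimated average cost {eta:.3f}")
    return policy
```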

Page 27: Approximate dynamic programming using fluid and diffusion approximations with applications to power management

Conclusions

The fluid value function can be used as a part of the basis for TD-learning.

Motivated by analysis using Taylor series expansion:

The fluid value function almost solves the ACOE. In particular, it solves the ACOE for a slightly different cost function, and the error term can be estimated.

TD learning with policy improvement gives a near optimal policy in a few iterations, as shown by experiments.

Application in power management for processors.

Page 28: Approximate dynamic programming using fluid and diffusion approximations with applications to power management

References

[1] W. Chen, D. Huang, A. Kulkarni, J. Unnikrishnan, Q. Zhu, P. Mehta, S. Meyn, and A. Wierman. Approximate dynamic programming using fluid and diffusion approximations with applications to power management. Accepted for inclusion in the 48th IEEE Conference on Decision and Control, December 16-18, 2009.

[2] P. Mehta and S. Meyn. Q-learning and Pontryagin's Minimum Principle. To appear in Proceedings of the 48th IEEE Conference on Decision and Control, December 16-18, 2009.

[3] R.-R. Chen and S. P. Meyn. Value iteration and optimization of multiclass queueing networks. Queueing Syst. Theory Appl., 32(1-3):65–97, 1999.

[4] S. G. Henderson, S. P. Meyn, and V. B. Tadic. Performance evaluation and policy selection in multiclass networks. Discrete Event Dynamic Systems: Theory and Applications, 13(1-2):149–189, 2003. Special issue on learning, optimization and decision making (invited).

[5] S. P. Meyn. The policy iteration algorithm for average reward Markov decision processes with general state space. IEEE Trans. Automat. Control, 42(12):1663–1680, 1997.

[6] S. P. Meyn. Control Techniques for Complex Networks. Cambridge University Press, Cambridge, 2007.

[7] C. Moallemi, S. Kumar, and B. Van Roy. Approximate and data-driven dynamic programming for queueing networks. Preprint available at http://moallemi.com/ciamac/research-interests.php, 2008.