Approximate dynamic programming using fluid and diffusion approximations with applications to power management



DESCRIPTION

https://netfiles.uiuc.edu/meyn/www/spm_files/TD5552009/TD555.html Presentation by Dayu Huang, based on the paper of the same name in Proc. of the 48th IEEE Conference on Decision and Control, December 16-18, 2009.


Approximate Dynamic Programming using Fluid and Diffusion Approximations with Applications to Power Management

Wei Chen, Dayu Huang, Ankur A. Kulkarni, Jayakrishnan Unnikrishnan, Quanyan Zhu, Prashant Mehta, Sean Meyn, and Adam Wierman

Coordinated Science Laboratory, UIUC; Dept. of IESE, UIUC; Dept. of CS, California Inst. of Tech.

Speaker: Dayu Huang

National Science Foundation (ECS-0523620 and CCF-0830511), ITMANET DARPA RK 2006-07284, and Microsoft Research


[Title-slide figures: a value function J plotted against the state x_n (0 to 20), and a trace over roughly 10 × 10^4 iterations.]

Introduction

MDP model: the state evolves under an i.i.d. disturbance and a control, with a per-stage cost; the objective is to minimize the average cost. The controlled transition law defines the generator.

Average Cost Optimality Equation (ACOE): solve the ACOE and find the relative value function.
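For readability, here is a hedged reconstruction of the standard average-cost setup behind these slide labels; the exact symbols used in the talk are not recoverable from this extraction, so the notation below (state X_n, control U_n, i.i.d. disturbance A_{n+1}, cost c) is an assumption.

```latex
% Assumed notation: state X_n, control U_n, i.i.d. disturbance A_{n+1}, cost c(x,u).
% Average cost to be minimized:
\eta := \limsup_{N\to\infty} \frac{1}{N}\sum_{n=0}^{N-1} c(X_n, U_n).
% Average Cost Optimality Equation (ACOE), with relative value function h^* and
% optimal average cost \eta^*:
\min_{u}\Bigl\{ c(x,u) + \mathsf{E}\bigl[h^*(X_{n+1}) \mid X_n = x,\, U_n = u\bigr]\Bigr\}
   \;=\; h^*(x) + \eta^*.
```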

TD Learning

The "curse of dimensionality": the complexity of solving the ACOE grows exponentially with the dimension of the state space.

Approach: approximate the relative value function within a finite-dimensional function class. Criterion: minimize the mean-square error, which is solved by stochastic approximation algorithms (a minimal sketch follows below).

Problem: how to select the basis functions? This choice is key to the success of TD learning.
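As an illustration of "approximate within a finite-dimensional function class, minimize the mean-square error, solved by stochastic approximation," here is a minimal TD(0)-style sketch in Python. It is not the authors' code: the simulator interface, the basis functions, and the step-size schedule are assumptions made for illustration.

```python
import numpy as np

def td0_average_cost(simulate_step, psi, theta0, x0, n_steps=100_000):
    """TD(0)-style stochastic approximation for the average-cost setting.

    Fits h_theta(x) = theta . psi(x) to the relative value function by
    minimizing mean-square error along a simulated trajectory.

    simulate_step(x) -> (x_next, cost) : one step under a fixed policy (assumed)
    psi(x)           -> np.ndarray     : basis-function vector (assumed)
    """
    theta = np.asarray(theta0, dtype=float)
    eta = 0.0                      # running estimate of the average cost
    x = x0
    for n in range(1, n_steps + 1):
        x_next, cost = simulate_step(x)
        # Temporal-difference error for the average-cost Bellman equation
        d = cost - eta + theta @ psi(x_next) - theta @ psi(x)
        gamma = 1.0 / n            # diminishing step size
        theta += gamma * d * psi(x)
        eta += gamma * (cost - eta)
        x = x_next
    return theta, eta
```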

[Figure: fluid value function and relative value function plotted against the state x.]

The fluid value function is the total cost for an associated deterministic model. It is a tight approximation to the relative value function and can be used as a part of the basis.
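For concreteness, the "total cost for an associated deterministic model" can be written as follows; this is a hedged reconstruction in assumed notation, with f denoting the deterministic drift of the fluid model.

```latex
% Fluid model (assumed form): deterministic dynamics  \dot{x}(t) = f(x(t), u(t)).
% Fluid value function = total cost starting from x(0) = x:
J^*(x) \;:=\; \min_{u(\cdot)} \int_0^{\infty} c\bigl(x(t), u(t)\bigr)\, dt .
```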

Related Work

Multiclass queueing networks:
- network scheduling and routing: Veatch 2004; Moallemi, Kumar and Van Roy 2006
- simulation: Henderson et al. 2003
- optimal control: Chen and Meyn 1999; Meyn 1997, Meyn 1997b
- Control Techniques for Complex Networks: Meyn 2007

Other approaches: Mannor, Menache and Shimkin 2005; Tsitsiklis and Van Roy 1997

Taylor series approximation: this work

Power Management via Speed Scaling

Single processor with random job arrivals: control the processing speed to balance delay and energy costs; the processing rate is determined by the current power.

Processor design: polynomial cost; this is the case considered in this talk. We also consider a cost suited to wireless communication applications.

(Bansal, Kimbrel and Pruhs 2007; Wierman, Andrew and Tang 2009; Kaxiras and Martonosi 2008; Mannor, Menache and Shimkin 2005)
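A minimal simulation sketch of a speed-scaling model of this kind is given below; the queue dynamics, the Poisson arrivals, the cost weights, and the simple test policy are illustrative assumptions, not the exact model used in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def step(x, u, arrival_rate=0.9):
    """One transition of an assumed speed-scaling queue: backlog x, speed u."""
    a = rng.poisson(arrival_rate)          # i.i.d. job arrivals (assumed Poisson)
    x_next = max(x - u, 0.0) + a           # serve at rate u, then add new arrivals
    return x_next

def cost(x, u, beta=1.0):
    """Delay cost plus polynomial power cost (processor-design case, assumed)."""
    return x + beta * u ** 2

# Example: estimate the average cost of a simple (hypothetical) policy u = sqrt(x).
x, total, horizon = 5.0, 0.0, 1000
for _ in range(horizon):
    u = np.sqrt(x)
    total += cost(x, u)
    x = step(x, u)
print("average cost estimate:", total / horizon)
```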

Fluid Model

The fluid model is an associated deterministic model of the MDP; the relevant criterion is the total cost, characterized by the Total Cost Optimality Equation (TCOE) for the fluid model.
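A hedged reconstruction of the TCOE, in the assumed fluid notation introduced above (drift f, fluid value function J*):

```latex
% Total Cost Optimality Equation (TCOE) for the fluid model (assumed form):
\min_{u}\Bigl\{ c(x,u) \;+\; \nabla J^{*}(x)\cdot f(x,u) \Bigr\} \;=\; 0 .
```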

Why Fluid Model?

A first-order Taylor series approximation links the fluid model to the MDP: a simple but powerful idea. The fluid value function almost solves the ACOE (compare the TCOE with the ACOE).
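The first-order Taylor step behind "almost solves the ACOE" can be sketched as follows; this is a sketch of the reasoning in the assumed notation, not the talk's exact derivation.

```latex
% Substitute h = J^* into the ACOE and expand to first order around x:
\mathsf{E}\bigl[J^*(X_{n+1}) - J^*(x)\mid X_n = x,\, U_n = u\bigr]
  \;\approx\; \nabla J^*(x)\cdot\mathsf{E}\bigl[X_{n+1}-x \mid X_n = x,\, U_n = u\bigr]
  \;=\; \nabla J^*(x)\cdot f(x,u),
% so the TCOE implies  \min_u\{c(x,u)+\mathsf{E}[J^*(X_{n+1}) - J^*(x)]\} \approx 0,
% i.e. J^* satisfies the ACOE up to higher-order remainder terms.
```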

Policy

0 2 4 6 8 10 12 14 16 18 20−20

0

20

40

60

80

100

120

140

160

180

Stochastic optimal policy

myopic policy

Di erence

x

Value Iteration

[Figure: value iteration over iterations n, comparing the initialization V0 = 0 with an alternative choice of V0; see also [Chen and Meyn 1999].]
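A minimal relative-value-iteration sketch on a truncated state space, initialized with a candidate V0 such as the fluid value function sampled on the grid; the finite-state truncation and the control grid are assumptions made for illustration.

```python
import numpy as np

def value_iteration(P_u, c_u, V0, n_iters=50):
    """Relative value iteration on a finite (truncated) state space.

    P_u[k] : transition matrix under control k (assumed, shape [X, X])
    c_u[k] : cost vector under control k       (assumed, shape [X])
    V0     : initialization, e.g. the fluid value function sampled on the grid
    """
    V = np.asarray(V0, dtype=float)
    for _ in range(n_iters):
        # Bellman update: minimize over the finite control grid
        Q = np.stack([c_u[k] + P_u[k] @ V for k in range(len(P_u))])
        V = Q.min(axis=0)
        V -= V[0]          # renormalize so the iterates stay bounded (relative VI)
    return V
```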

Approximation of the Cost Function

Error Analysis: is the error term a constant? Can it be bounded? The fluid value function approximates the relative value function; the error term defines a surrogate cost.
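One way to make the surrogate-cost statement precise, as a hedged reconstruction (the talk's exact definition is not recoverable from this extraction): absorb the Bellman error of the fluid value function into the cost.

```latex
% Assumed construction: define the Bellman error of J^* under the MDP dynamics,
\mathcal{E}(x) \;:=\; \min_{u}\Bigl\{ c(x,u)
   + \mathsf{E}\bigl[J^*(X_{n+1}) - J^*(x)\mid X_n = x,\, U_n = u\bigr]\Bigr\},
% and the surrogate cost  c^{J^*} := c - \mathcal{E}.  By construction J^* solves the
% ACOE for c^{J^*} exactly, and bounds on \mathcal{E} quantify the approximation error.
```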

Structural Results on the Fluid Solution

Lower bound, based on a convexity argument; upper bound.

Approach Based on Fluid and Diffusion Models

Value function of the fluid model (this talk: the fluid model).

[Figure: fluid value function and relative value function plotted against the state x.]

As above: the fluid value function is the total cost for an associated deterministic model, is a tight approximation to the relative value function, and can be used as a part of the basis.

TD Learning Experiment

Estimates of the coefficients for the case of quadratic cost.

[Figure: approximate relative value function and fluid value function plotted against the state x; relative value function estimates over roughly 10 × 10^4 iterations.]

Basis functions: the fluid value function is included in the basis (a sketch follows below).
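A sketch of how the fluid value function might enter the basis for this experiment; the remaining basis elements here are hypothetical choices, since the exact basis is not recoverable from this extraction.

```python
import numpy as np

def make_basis(J_fluid):
    """Basis vector psi(x) with the fluid value function as one element.

    J_fluid : callable x -> J^*(x), assumed available in closed form or on a grid.
    The other two components (x and a constant) are hypothetical choices.
    """
    def psi(x):
        return np.array([J_fluid(x), float(x), 1.0])
    return psi

# Usage with the TD(0) sketch above (hypothetical fluid value function):
# psi = make_basis(lambda x: x ** 1.5)
# theta, eta = td0_average_cost(simulate_step, psi, theta0=np.zeros(3), x0=0.0)
```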

TD Learning with Policy Improvement

Nearly optimal after just a few iterations.

[Figure: average cost at each stage, plotted over roughly 25 stages; assessing near-optimality needs the value of the optimal policy.]
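A sketch of the loop described on this slide, alternating TD-based policy evaluation with a greedy improvement step; the interfaces (a simulator factory and a greedy-policy constructor) are assumptions, and the evaluation step reuses the TD(0) sketch above.

```python
import numpy as np

def td_policy_improvement(simulator_for, greedy_policy_from, psi, theta0, x0,
                          n_rounds=5):
    """Alternate TD(0) policy evaluation with greedy policy improvement.

    simulator_for(policy)     -> simulate_step function for td0_average_cost (assumed)
    greedy_policy_from(theta) -> policy that is greedy w.r.t. h_theta (assumed)
    """
    theta = np.asarray(theta0, dtype=float)
    policy = greedy_policy_from(theta)
    history = []
    for _ in range(n_rounds):
        theta, eta = td0_average_cost(simulator_for(policy), psi, theta, x0)
        history.append(eta)                  # average cost at this stage
        policy = greedy_policy_from(theta)   # improvement step
    return policy, theta, history
```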

Conclusions

The fluid value function can be used as a part of the basis for TD learning.

This is motivated by an analysis using a Taylor series expansion: the fluid value function almost solves the ACOE. In particular, it solves the ACOE for a slightly different cost function, and the error term can be estimated.

TD learning with policy improvement gives a near-optimal policy in a few iterations, as shown by experiments.

Application: power management for processors.

References

[1] W. Chen, D. Huang, A. Kulkarni, J. Unnikrishnan, Q. Zhu, P. Mehta, S. Meyn, and A. Wierman. Approximate dynamic programming using fluid and diffusion approximations with applications to power management. Accepted for inclusion in the 48th IEEE Conference on Decision and Control, December 16-18, 2009.

[2] P. Mehta and S. Meyn. Q-learning and Pontryagin's Minimum Principle. To appear in Proceedings of the 48th IEEE Conference on Decision and Control, December 16-18, 2009.

[3] R.-R. Chen and S. P. Meyn. Value iteration and optimization of multiclass queueing networks. Queueing Syst. Theory Appl., 32(1-3):65-97, 1999.

[4] S. G. Henderson, S. P. Meyn, and V. B. Tadic. Performance evaluation and policy selection in multiclass networks. Discrete Event Dynamic Systems: Theory and Applications, 13(1-2):149-189, 2003. Special issue on learning, optimization and decision making (invited).

[5] S. P. Meyn. The policy iteration algorithm for average reward Markov decision processes with general state space. IEEE Trans. Automat. Control, 42(12):1663-1680, 1997.

[6] S. P. Meyn. Control Techniques for Complex Networks. Cambridge University Press, Cambridge, 2007.

[7] C. Moallemi, S. Kumar, and B. Van Roy. Approximate and data-driven dynamic programming for queueing networks. Preprint available at http://moallemi.com/ciamac/research-interests.php, 2008.
