Approximate Dynamic Programming for High-Dimensional Problems in Energy Modeling (© 2008 Warren B. Powell)
- Slide 1
- Slide 1 Approximate Dynamic Programming for High-Dimensional Problems in Energy Modeling. Ohio State University, October 7, 2009. Warren Powell, CASTLE Laboratory, Princeton University, http://www.castlelab.princeton.edu. © 2009 Warren B. Powell, Princeton University.
- Slide 2
- Slide 2 Goals for an energy policy model. Policy questions: How do we design policies to achieve energy goals (e.g., 20% renewables by 2015) with a given probability? How does the imposition of a carbon tax change the likelihood of meeting this goal? What might happen if ethanol subsidies are reduced or eliminated? What is the impact of a breakthrough in batteries? Energy economics: What is the best mix of energy generation technologies? How is the economic value of wind affected by the presence of storage? What is the best mix of storage technologies? How would climate change impact our ability to use hydroelectric reservoirs as a regulating source of power?
- Slide 3
- Slide 3 Goals for an energy policy model Designing energy
supply and storage portfolios to work with wind: The marginal value
of wind and solar farms depends on the ability to work with
intermittent supply. The impact of intermittent supply will be
mitigated by the use of storage. Different storage technologies
(batteries, flywheels, compressed air, pumped hydro) are each
designed to serve different types of variations in supply and
demand. The need for storage (and the value of wind and solar)
depends on the entire portfolio of energy producing
technologies.
- Slide 4
- Slide 4 Intermittent energy sources Wind speed Solar
energy
- Slide 5
- Slide 5 Wind (plots: wind over 30 days and over 1 year)
- Slide 6
- Slide 6 Storage Batteries Ultracapacitors Flywheels
Hydroelectric
- Slide 7
- Slide 7 Long-term uncertainties (timeline, 2010 through 2030): tax policy, batteries, solar panels, carbon capture and sequestration, price of oil, climate change.
- Slide 8
- Slide 8 Goals for an energy policy model. Model capabilities we are looking for: Multiscale: multiple time scales (hourly, daily, seasonal, annual, decade); multiple spatial scales; multiple technologies (different coal-burning technologies, new wind turbines, ...); multiple markets, including transportation (commercial, commuter, home activities) and electricity use (heavy industrial, light industrial, business, residential). Stochastic (handles uncertainty): hourly fluctuations in wind, solar and demands; daily variations in prices and rainfall; seasonal changes in weather; yearly changes in supplies, technologies and policies.
- Slide 9
- Slide 9 Outline: Modeling stochastic resource allocation problems; An introduction to ADP; ADP and the post-decision state variable; A blood management example; The SMART energy policy model
- Slide 10
- Slide 10 A resource allocation model Attribute vectors:
- Slide 11
- Slide 11 A resource allocation model Modeling resources: The attributes of a single resource: The resource state vector: The information process:
- Slide 12
- Slide 12 A resource allocation model Modeling demands: The attributes of a single demand: The demand state vector: The information process:
- Slide 13
- Slide 13 Energy resource modeling The system state:
- Slide 14
- Slide 14 Energy resource modeling The decision variables:
- Slide 15
- Slide 15 Energy resource modeling Exogenous information:
- Slide 16
- Slide 16 Energy resource modeling The transition function, known variously as the: transition function, transfer function, system model, plant model, or simply the model.
- Slide 17
- Slide 17 Energy resource modeling Demands Resources
- Slide 18
- Slide 18 Energy resource modeling (network diagram spanning times t, t+1, t+2)
- Slide 19
- Slide 19 Energy resource modeling (times t, t+1, t+2): optimizing at a point in time vs. optimizing over time
- Slide 20
- Slide 20 Energy resource modeling. The objective function: how do we find the best policy? Candidates: myopic policies, rolling horizon policies, simulation-optimization, dynamic programming. (Labeled in the equation: the decision function (policy), the state variable, the contribution function; finding the best policy means optimizing an expectation over all random outcomes.)
- Slide 21
- Slide 21 Outline: Modeling stochastic resource allocation problems; An introduction to ADP; ADP and the post-decision state variable; A blood management example; The SMART energy policy model
- Slide 22
- Slide 22 Introduction to dynamic programming. Bellman's optimality equation: assume the next-period value function is known; compute the value for each state S.
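The equation on this slide is an image in the original deck; as a reconstruction in Powell's usual notation (the symbols, including the discount factor $\gamma$ and the transition function $S^M$, are assumed), Bellman's optimality equation reads:

```latex
V_t(S_t) \;=\; \max_{x_t \in \mathcal{X}_t}
   \Big( C_t(S_t, x_t) \;+\; \gamma\, \mathbb{E}\big[\, V_{t+1}(S_{t+1}) \,\big|\, S_t \big] \Big),
\qquad S_{t+1} = S^M(S_t, x_t, W_{t+1}).
```

Here $V_{t+1}$ is the piece assumed known, and $V_t(S_t)$ is what must be computed for every state $S_t$.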
- Slide 23
- Slide 23 Introduction to dynamic programming. Bellman's optimality equation: the problem is the curse of dimensionality; in fact there are three curses: the state space, the outcome space, and the action space (feasible region).
- Slide 24
- Slide 24 Introduction to dynamic programming The computational
challenges: How do we find ? How do we compute the expectation? How
do we find the optimal solution?
- Slide 25
- Slide 25 Introduction to ADP. Classical ADP: most applications of ADP focus on the challenge of handling multidimensional state variables. Start with Bellman's equation, then replace the value function with some sort of approximation, which may draw from the entire field of statistics/machine learning.
- Slide 26
- Slide 26 Introduction to ADP. Other statistical methods: Regression trees (combine regression with techniques for discrete variables); Data mining (good for categorical data); Neural networks (engineers like these for low-dimensional continuous problems); Kernel/locally polynomial regression (approximates portions of the value function locally using simple functions); Dirichlet mixture models (aggregate portions of the function and fit approximations around these aggregations).
- Slide 27
- Slide 27 Introduction to ADP. But this does not solve our problem: assume we have an approximate value function. We still have to solve a one-step problem that embeds it, which means we still face a maximization problem (possibly a linear, nonlinear or integer program) containing an expectation.
- Slide 28
- Slide 28 Outline: Modeling stochastic resource allocation problems; An introduction to ADP; ADP and the post-decision state variable; A blood management example; The SMART energy policy model
- Slide 29
- Slide 29 (Decision tree) First decision: use the weather report or not. The report issues a forecast (sunny .6, cloudy .3, rain .1), which revises the probabilities of rain, clouds and sun; the branch probability sets extracted from the figure are .8/.2/.0, .1/.5/.4 and .1/.2/.7 after a forecast, and .2/.3/.5 without the report. At each decision node: schedule the game (payoffs -$2000 if rain, $1000 if clouds, $5000 if sun) or cancel it (-$200 regardless). Legend: decision nodes and outcome nodes; the sequence is information, action, information, action, which defines the state.
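Rolling this tree back by expected value can be sketched in a few lines. The payoffs and forecast probabilities are from the slide; the pairing of each forecast with its conditional outcome probabilities is an assumption (the extracted figure jumbles the layout, but this pairing is roughly Bayes-consistent with the .2/.3/.5 prior):

```python
# Expected-value rollback of the weather/game decision tree on Slide 29.
PAYOFF = {"schedule": {"rain": -2000, "clouds": 1000, "sun": 5000},
          "cancel":   {"rain": -200,  "clouds": -200, "sun": -200}}

def best_action_value(p):
    """Value of the best decision given outcome probabilities p."""
    return max(sum(p[w] * PAYOFF[a][w] for w in p) for a in PAYOFF)

# Without the report: act on the prior outcome probabilities.
prior = {"rain": 0.2, "clouds": 0.3, "sun": 0.5}
v_no_report = best_action_value(prior)          # schedule the game

# With the report: decide after seeing the forecast, then average over forecasts.
forecast_prob = {"sunny": 0.6, "cloudy": 0.3, "rainy": 0.1}
cond = {"sunny":  {"rain": 0.1, "clouds": 0.2, "sun": 0.7},   # assumed pairing
        "cloudy": {"rain": 0.1, "clouds": 0.5, "sun": 0.4},
        "rainy":  {"rain": 0.8, "clouds": 0.2, "sun": 0.0}}
v_report = sum(forecast_prob[f] * best_action_value(cond[f])
               for f in forecast_prob)
```

With these numbers the report is worth 2770 - 2400 = $370 in expected value.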
- Slide 30
- Slide 30 The post-decision state New concept: The pre-decision
state variable: Same as a decision node in a decision tree. The
post-decision state variable: Same as an outcome node in a decision
tree.
- Slide 31
- Slide 31 The post-decision state An inventory problem: Our
basic inventory equation: Using pre- and post-decision states:
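As a sketch of the split (the function and variable names are assumptions, not the slide's notation): the post-decision inventory is known deterministically once we order, while the next pre-decision inventory requires the random demand.

```python
# Pre- and post-decision states for a simple inventory problem (a sketch).
def post_decision(R, x):
    """Inventory after ordering x units: deterministic given the decision."""
    return R + x

def next_pre_decision(Rx, demand):
    """Next pre-decision inventory once the random demand arrives."""
    return max(0, Rx - demand)

Rx = post_decision(5, 3)           # hold 5 units, order 3: post-decision state 8
R_next = next_pre_decision(Rx, 6)  # a demand of 6 arrives: 2 units remain
```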
- Slide 32
- Slide 32 The post-decision state Pre-decision, state-action,
and post-decision Pre-decision state State Action Post-decision
state
- Slide 33
- Slide 33 The post-decision state Pre-decision: resources and
demands
- Slide 34
- Slide 34 The post-decision state
- Slide 35
- Slide 35 The post-decision state
- Slide 36
- Slide 36 The post-decision state
- Slide 37
- Slide 37 The post-decision state. Classical form of Bellman's equation: Bellman's equations around pre- and post-decision states: the optimization problem (making the decision); note: this problem is deterministic! The expectation problem (incorporating uncertainty):
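The equations on this slide are images in the original; reconstructed in the same notation (an assumed rendering of the standard post-decision decomposition, with $S_t^x$ the post-decision state), Bellman's equation splits into two steps:

```latex
V_t(S_t) \;=\; \max_{x_t}\Big( C_t(S_t, x_t) + V_t^x(S_t^x) \Big)
   \quad\text{(making the decision: deterministic)},
\qquad
V_t^x(S_t^x) \;=\; \mathbb{E}\big[\, V_{t+1}(S_{t+1}) \,\big|\, S_t^x \big]
   \quad\text{(incorporating uncertainty)}.
```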
- Slide 38
- Slide 38 Introduction to ADP We first use the value function
around the post-decision state variable, removing the expectation:
We then replace the value function with an approximation that we
estimate using machine learning techniques:
- Slide 39
- Slide 39 The post-decision state. Value function approximations: Linear (in the resource state): Piecewise linear, separable: Indexed piecewise linear, separable:
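A piecewise linear, separable approximation can be sketched as follows (the attribute names and slope values are invented for illustration): each attribute gets its own concave piecewise linear function, stored as nonincreasing marginal values, and the total approximation is their sum.

```python
# Piecewise linear, separable value function approximation (a sketch).
def pwl_value(slopes, r):
    """Value of holding r integer units; slopes[i] is the marginal value
    of the (i+1)-st unit, nonincreasing for concavity."""
    return sum(slopes[i] for i in range(min(r, len(slopes))))

def separable_value(slopes_by_attr, R):
    """Separable approximation: V(R) = sum over attributes a of V_a(R_a)."""
    return sum(pwl_value(slopes_by_attr[a], R[a]) for a in R)

slopes = {"hydro": [10, 7, 4, 1], "battery": [8, 3]}   # assumed numbers
v = separable_value(slopes, {"hydro": 3, "battery": 2})  # (10+7+4) + (8+3)
```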
- Slide 40
- Slide 40 The post-decision state Value function approximations:
Ridge regression (Klabjan and Adelman) Benders cuts
- Slide 41
- Slide 41 Making decisions: following an ADP policy
- Slide 42
- Slide 42 Making decisions: following an ADP policy
- Slide 43
- Slide 43 Making decisions: following an ADP policy
- Slide 44
- Slide 44 Making decisions: following an ADP policy
- Slide 45
- Slide 45 Approximate dynamic programming: with luck, the objective function will improve steadily
- Slide 46
- Slide 46 The post-decision state. Comparison to other methods: classical MDP (value iteration) and classical ADP (pre-decision state) both require an expectation; updating around the post-decision state requires no expectation.
- Slide 47
- Slide 47 Approximate dynamic programming. Step 1: start with a pre-decision state. Step 2: solve the deterministic optimization using an approximate value function to obtain a decision. Step 3: update the value function approximation. Step 4: obtain a Monte Carlo sample of the new information and compute the next pre-decision state. Step 5: return to Step 1. (Step 2 is deterministic optimization, Step 3 recursive statistics, Step 4 simulation.)
- Slide 48
- Slide 48 Approximate dynamic programming. Step 1: start with a pre-decision state. Step 2: solve the deterministic optimization using an approximate value function to obtain a decision. Step 3: update the value function approximation. Step 4: obtain a Monte Carlo sample of the new information and compute the next pre-decision state. Step 5: return to Step 1. (Step 2 is deterministic optimization, Step 3 recursive statistics, Step 4 simulation.)
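The five-step loop can be sketched on a toy buy/sell storage problem (the problem, the lookup-table value function, the step size and the optimistic initialization are all assumptions for illustration):

```python
import random

random.seed(1)
CAP, PRICE, COST, GAMMA, ALPHA = 5, 2.0, 1.0, 0.9, 0.1
vbar = [PRICE * r for r in range(CAP + 1)]   # optimistic VFA on post-decision state

def decide(R, D):
    """Step 2: deterministic optimization with the current VFA.
    Sell s <= min(R, D) at PRICE, buy x at COST, carry Rx = R - s + x."""
    best_val, best_Rx = None, None
    for s in range(min(R, D) + 1):
        for x in range(CAP - (R - s) + 1):
            Rx = R - s + x
            val = PRICE * s - COST * x + GAMMA * vbar[Rx]
            if best_val is None or val > best_val:
                best_val, best_Rx = val, Rx
    return best_val, best_Rx

R, prev_Rx = 2, None
for n in range(200):
    D = random.randint(0, 4)                  # Step 4: Monte Carlo demand sample
    v_hat, Rx = decide(R, D)                  # Steps 1-2
    if prev_Rx is not None:                   # Step 3: recursive smoothing
        vbar[prev_Rx] = (1 - ALPHA) * vbar[prev_Rx] + ALPHA * v_hat
    R, prev_Rx = Rx, Rx                       # Step 5: next pre-decision state
```

The optimistic initialization matters here: starting the VFA at zero, a myopic policy never buys and the iteration gets stuck, which is exactly the exploration issue raised later in the talk.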
- Slide 49
- Slide 49 Outline: Modeling stochastic resource allocation problems; An introduction to ADP; ADP and the post-decision state variable; A blood management example; The SMART energy policy model
- Slide 50
- Slide 50 Blood management Managing blood inventories
- Slide 51
- Slide 51 Blood management Managing blood inventories over time
t=0 Week 1 Week 2 Week 3 t=1 t=2 t=3 Week 0
- Slide 52
- Slide 52 (Blood network) Supplies of each blood type and age (AB+,0 through AB+,3; O-,0 through O-,3) either satisfy a demand of a compatible type (AB+, AB-, A+, A-, B+, B-, O+, O-) or are held in inventory.
- Slide 53
- Slide 53 (Blood network, extended: supplies AB+,0 through AB+,3 and O-,0 through O-,3 across two periods.)
- Slide 54
- Slide 54 (Blood network: holdings AB+,0 through AB+,2 and O-,0 through O-,2 age into AB+,0 through AB+,3 and O-,0 through O-,3.)
- Slide 55
- Slide 55 (The same blood network.) Solve this as a linear program.
- Slide 56
- Slide 56 (The same blood network.) Dual variables give the value of an additional unit of blood.
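The arcs of this network come from standard donor/recipient compatibility. The chart below is the usual compatibility table; the greedy pass is only an illustration of "satisfy a demand or hold" (the slides solve the real assignment as a linear program, whose duals price the blood).

```python
# Standard blood-type compatibility: donor type -> recipient types it can serve.
COMPATIBLE = {
    "O-":  {"O-", "O+", "A-", "A+", "B-", "B+", "AB-", "AB+"},  # universal donor
    "O+":  {"O+", "A+", "B+", "AB+"},
    "A-":  {"A-", "A+", "AB-", "AB+"},
    "A+":  {"A+", "AB+"},
    "B-":  {"B-", "B+", "AB-", "AB+"},
    "B+":  {"B+", "AB+"},
    "AB-": {"AB-", "AB+"},
    "AB+": {"AB+"},
}

def greedy_assign(supply, demand):
    """Serve each demand from exact-match stock first, then any compatible
    donor; unused units are held (the 'hold' arcs on the slide)."""
    supply = dict(supply)
    served = 0
    for rtype, need in demand.items():
        for donor in [rtype] + [d for d in COMPATIBLE if d != rtype]:
            if need == 0:
                break
            if rtype in COMPATIBLE[donor] and supply.get(donor, 0) > 0:
                use = min(need, supply[donor])
                supply[donor] -= use
                need -= use
                served += use
    return served, supply   # units served, units held by type

served, held = greedy_assign({"O-": 10, "AB+": 5}, {"O-": 4, "AB+": 8})
```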
- Slide 57
- Slide 57 Updating the value function approximation Estimate the
gradient at
- Slide 58
- Slide 58 Updating the value function approximation Update the
value function at
- Slide 59
- Slide 59 Updating the value function approximation Update the
value function at
- Slide 60
- Slide 60 Updating the value function approximation Update the
value function at
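Slides 57-60 can be sketched as follows (a SPAR-style update; the step size and the pairwise-averaging projection are assumptions, and the exact projection onto nonincreasing slopes would use pooled adjacent violators): smooth the observed dual into the slope at the visited resource level, then restore concavity.

```python
# Updating a piecewise linear VFA from a dual-variable observation (a sketch).
# slopes[i] is the current marginal value of the (i+1)-st unit; concavity
# requires the slopes to be nonincreasing.
def update_slope(slopes, r, dual, alpha=0.2):
    """Smooth the observed dual into the slope at level r, then restore
    concavity by averaging any violated neighbours."""
    slopes = list(slopes)
    slopes[r] = (1 - alpha) * slopes[r] + alpha * dual
    i = r                                   # fix violations above r
    while i > 0 and slopes[i] > slopes[i - 1]:
        avg = 0.5 * (slopes[i] + slopes[i - 1])
        slopes[i] = slopes[i - 1] = avg
        i -= 1
    i = r                                   # fix violations below r
    while i + 1 < len(slopes) and slopes[i] < slopes[i + 1]:
        avg = 0.5 * (slopes[i] + slopes[i + 1])
        slopes[i] = slopes[i + 1] = avg
        i += 1
    return slopes

new = update_slope([10.0, 7.0, 4.0, 1.0], r=2, dual=9.0)  # 0.8*4 + 0.2*9 = 5
```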
- Slide 61
- Slide 61 Outline: Modeling stochastic resource allocation problems; An introduction to ADP; ADP and the post-decision state variable; A blood management example; The SMART energy policy model
- Slide 62
- Slide 62 SMART: A Stochastic, Multiscale Allocation model for energy Resources, Technology and policy. Stochastic: able to handle different types of uncertainty, both fine-grained (daily fluctuations in wind, solar, demand, prices, ...) and coarse-grained (major climate variations, new government policies, technology breakthroughs). Multiscale: able to handle different levels of detail in time scales (hourly to yearly), spatial scales (aggregate to fine-grained disaggregate), activities (different types of demand patterns) and decisions (hourly dispatch decisions to yearly investment decisions). Takes as input parameters characterizing government policies, performance of technologies, and assumptions about climate.
- Slide 63
- Slide 63 The annual investment problem 2008 2009
- Slide 64
- Slide 64 The hourly dispatch problem Hourly electricity
dispatch problem
- Slide 65
- Slide 65 The hourly dispatch problem Hourly model Decisions at
time t impact t+1 through the amount of water held in the
reservoir. Hour t Hour t+1
- Slide 66
- Slide 66 The hourly dispatch problem Hourly model Decisions at
time t impact t+1 through the amount of water held in the
reservoir. Value of holding water in the reservoir for future time
periods. Hour t
- Slide 67
- Slide 67 The hourly dispatch problem
- Slide 68
- Slide 68 The hourly dispatch problem (timeline: hours 1, 2, 3, 4, ..., 8760 of 2008, then 2009)
- Slide 69
- Slide 69 The hourly dispatch problem (timeline: hours 1, 2, 3, 4, ..., 8760 of 2008, then 2009)
- Slide 70
- Slide 70 SMART-Stochastic, multiscale model 2008 2009
- Slide 71
- Slide 71 SMART-Stochastic, multiscale model 2008 2009
- Slide 72
- Slide 72 SMART-Stochastic, multiscale model (timeline: 2008, 2009, 2010, 2011, ..., 2038)
- Slide 73
- Slide 73 SMART-Stochastic, multiscale model (timeline: 2008, 2009, 2010, 2011, ..., 2038)
- Slide 74
- Slide 74 SMART-Stochastic, multiscale model (timeline: 2008, 2009, 2010, 2011, ..., 2038; ~5 seconds)
- Slide 75
- Slide 75 SMART-Stochastic, multiscale model. Use statistical methods to learn the value of resources in the future. Resources may be: stored energy (hydro, flywheel energy); storage capacity (batteries, flywheels, compressed air); energy transmission capacity (transmission lines, gas lines, shipping capacity); energy production sources (wind mills, solar panels, nuclear power plants). (Plot: value vs. amount of resource.)
- Slide 76
- Slide 76 SMART-Stochastic, multiscale model. Approximating continuous functions: the algorithm performs a very fine discretization over the small range of the function that is visited most often.
- Slide 77
- Slide 77 SMART-Stochastic, multiscale model. Benchmarking: compare ADP to the optimal LP for a deterministic problem. Annual model: 8,760 hours over a single year, focusing on the ability to match hydro storage decisions. 20-year model: 24-hour time increments over 20 years, focusing on investment decisions. Comparisons on the stochastic model: stochastic rainfall analysis (how does the ADP solution compare to the LP?) and carbon tax policy analysis (demonstrate nonanticipativity).
- Slide 78
- Slide 78 Benchmarking on hourly dispatch (plot: ADP objective function relative to optimal LP vs. iterations; converges to 0.06% over optimal).
- Slide 79
- Slide 79 Benchmarking on hourly dispatch: optimal from linear program (reservoir level, demand, rainfall)
- Slide 80
- Slide 80 Benchmarking on hourly dispatch: ADP solution (reservoir level, demand, rainfall)
- Slide 81
- Slide 81 Benchmarking on hourly dispatch: optimal from linear program (reservoir level, demand, rainfall)
- Slide 82
- Slide 82 Benchmarking on hourly dispatch: ADP solution (reservoir level, demand, rainfall)
- Slide 83
- Slide 83 Multidecade energy model: optimal vs. ADP daily model over 20 years; 0.24% over optimal.
- Slide 84
- Slide 84 Energy policy modeling: traditional optimization models tend to produce all-or-nothing solutions. (Plot: investment in IGCC against the cost differential between IGCC and pulverized coal, with regions where pulverized coal is cheaper and where IGCC is cheaper; traditional optimization is compared with approximate dynamic programming.)
- Slide 85
- Slide 85 Stochastic rainfall (plot: precipitation sample paths over time)
- Slide 86
- Slide 86 Stochastic rainfall (plot: reservoir level over time, comparing the optimal solutions for individual scenarios with the ADP solution)
- Slide 87
- Slide 87 Energy policy modeling: following sample paths (demands, prices, weather, technology, policies, ...). (Plot: a metric such as % renewable out to 2030; the goal is achieved with probability 0.70.) Need to consider both fine-grained noise (wind, rain, demand, prices, ...) and coarse-grained noise (technology, policy, climate, ...).
- Slide 88
- Slide 88 Energy policy modeling. Policy study: what is the effect of a potential (but uncertain) carbon tax in year 8? (Plot: carbon tax by year, years 1 through 9.)
- Slide 89
- Slide 89 Energy policy modeling Renewable technologies
Carbon-based technologies No carbon tax
- Slide 90
- Slide 90 Energy policy modeling With carbon tax Carbon-based
technologies Renewable technologies Carbon tax policy unknown
Carbon tax policy determined
- Slide 91
- Slide 91 Energy policy modeling With carbon tax Carbon-based
technologies Renewable technologies
- Slide 92
- Slide 92 Conclusions. Capabilities: SMART can handle problems with over 300,000 time periods, so it can model hourly variations in a long-term energy investment model. It can simulate virtually any form of uncertainty, either provided through an exogenous scenario file or sampled from a probability distribution. Accurate modeling of climate, technology and markets requires access to exogenously provided scenarios. It properly models storage processes over time. Current tests are on an aggregate model, but the modeling framework (and library) is set up for spatially disaggregate problems.
- Slide 93
- Slide 93 Conclusions Limitations More research is needed to
test the ability of the model to use multiple storage technologies.
Extension to spatially disaggregate model will require significant
engineering and data. Run times will start to become an issue for a
spatially disaggregate model. Value function approximations capture
the resource state vector, but are limited to very simple exogenous
state variations.
- Slide 94
- Slide 94 Outline: Modeling stochastic resource allocation problems; An introduction to ADP; ADP and the post-decision state variable; A blood management example; The SMART energy policy model; Merging machine learning and optimization
- Slide 95
- Slide 95 Merging machine learning and optimization. The challenge of coarse-grained uncertainty: fine-grained uncertainty can generally be modeled as memoryless (even if it is not). Coarse-grained uncertainty affects what might be called the "state of the world", and the value of a resource depends on the state of the world. Is there a carbon tax? What is the state of battery research? Have there been major new oil discoveries? What is the price of oil? Did the international community adopt strict limits on carbon emissions? Have there been advances in our understanding of climate change?
- Slide 96
- Slide 96 Merging machine learning and optimization. Modeling the state of the world: we can use powerful machine learning algorithms to overcome these new curses of dimensionality. Instead of one piecewise linear value function for each resource and time period, we need one for each state of the world, and there can be thousands of these.
- Slide 97
- Slide 97 Merging machine learning and optimization. Strategy 1: locally polynomial regression. Widely used in statistics: approximate complex functions locally using simple functions; the estimate of the function is a weighted sum of these local approximations. But it cannot handle categorical variables.
- Slide 98
- Slide 98 Merging machine learning and optimization Strategy 2:
Dirichlet process mixtures of generalized linear models
- Slide 99
- Slide 99 Merging machine learning and optimization Strategy 3:
Hierarchical learning models Estimate piecewise constant functions
at different levels of aggregation:
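One way to sketch the idea (the weighting rule and all numbers are assumptions for illustration; hierarchical estimation of this kind typically weights each aggregation level by the inverse of its variance plus squared bias):

```python
# Combine estimates of the same quantity made at several aggregation levels.
def combine(estimates):
    """estimates: list of (value, variance, bias) tuples, one per level.
    Weight each level inversely to its total deviation (variance + bias^2)."""
    weights = [1.0 / (var + bias ** 2) for _, var, bias in estimates]
    total = sum(weights)
    return sum(w * v for w, (v, _, _) in zip(weights, estimates)) / total

# Disaggregate level: noisy but unbiased; coarse level: precise but biased.
levels = [(12.0, 4.0, 0.0),    # fine level
          (10.0, 1.0, 1.0),    # mid level
          (9.0, 0.25, 3.0)]    # coarse level
v = combine(levels)            # lands between the fine and mid estimates
```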
- Slide 100
- Slide 100 Merging machine learning and optimization Next steps:
We need to transition these machine learning techniques into an ADP
setting: Can they be adapted to work within a linear or nonlinear
optimization algorithm? All three methods are asymptotically
unbiased, but this depends on unbiased observations. In an ADP
algorithm, observations are biased. We need to design an effective
exploration strategy so that the solution does not become stuck.
Other issues Will the methods provide fast, robust solutions for
effective policy analysis?
- Slide 101
- Slide 101
- Slide 102
- Slide 102
- Slide 103
- Slide 103
- Slide 104
- Slide 104 Demand modeling (plot: commercial electric demand over 7 days)