Stochastic processes for hydrological optimization problems

Geoffrey Pritchard, University of Auckland

Prologue: Iterated Function Systems

Consider the following Markov process in the plane:

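$$\begin{pmatrix} X_{t+1} \\ Y_{t+1} \end{pmatrix} = \begin{cases} A_1 \begin{pmatrix} X_t \\ Y_t \end{pmatrix} + b_1 & \text{chance 50\%} \\[4pt] A_2 \begin{pmatrix} X_t \\ Y_t \end{pmatrix} + b_2 & \text{chance 50\%} \end{cases}$$

where $A_1, A_2$ are fixed $2 \times 2$ matrices and $b_1, b_2$ are fixed vectors.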

(Each step is randomly chosen from a finite list of affine transformations, independently of previous steps.)

(Is this useful for anything, besides (maybe) computer graphics?)
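A minimal simulation sketch of such a process. The two maps in `MAPS` are hypothetical contractions chosen for illustration; they are not the coefficients from the slide, and the names `step` and `simulate` are likewise illustrative.

```python
import random

# Hypothetical coefficients, chosen only so that both maps are contractions;
# they are NOT the values from the slide.
MAPS = [
    # (A as two rows, b)
    (((0.5, 0.0), (0.0, 0.5)), (0.0, 0.0)),
    (((0.5, -0.3), (0.3, 0.5)), (0.5, 0.2)),
]

def step(point, rng):
    """Apply one affine map p -> A p + b, chosen with chance 50% each."""
    a, b = rng.choice(MAPS)
    x, y = point
    return (a[0][0] * x + a[0][1] * y + b[0],
            a[1][0] * x + a[1][1] * y + b[1])

def simulate(n_steps, start=(0.0, 0.0), seed=0):
    """Return the path (X_0, Y_0), ..., (X_n, Y_n); each step is independent of the past."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(n_steps):
        path.append(step(path[-1], rng))
    return path

path = simulate(1000)
```

Because both maps shrink distances, the path stays in a bounded region of the plane whatever the sequence of random choices.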

Hydro scheduling – an optimal control problem

random inflows

state variables: stored energy

control variables: outflows

Control water releases over time to maximize value.

– As a workably competitive market would do.

Hydro scheduling – an optimal control problem

How not to do it:

1. Develop a stochastic model of inflows. (statistics)

2. Optimize releases versus the given inflow-generating process. (OR)

Why develop a model of inflows?

Why not just use the historical data non-parametrically?

• Small dataset.

e.g. autumn 2014:

- Mar ~ 1620 MW

- Apr ~ 2280 MW

- May ~ 4010 MW

Past years (if any) with this exact sequence are not a reliable forecast for June 2014.

• A model allows events to be more extreme than anything in the data.

The worst event ever observed is not the worst possible.

Time/information structure

Week t-1 | Week t | Week t+1

Week t subproblem:

min (present cost) + E[ future cost ]

s.t. satisfy demand, etc. with stored energy + random inflow $X_t$

• Each stage subproblem is a random optimization problem.

• Stage t subproblem is solved with knowledge of Xt, but not of the future.

– Weekly stages might be good, continuous time worse.


Optimization

$$g_{t-1}(y) = \mathbb{E}\Big[\min_u\ \big\{(\text{present cost})(u) + g_t(u)\big\}\Big] \quad \text{with } (\text{stored energy})(y) + \text{random inflow } X_t$$

• Let $g_t(u)$ = expected cost of consequences after week t of doing u in week t.

• Essential observation is that the map $g_t \mapsto g_{t-1}$ is convexity-preserving.

– so all optimizations can be of convex functions, i.e. tractable.

• Computationally, convex subproblems -> linear programs.

Model me this...

[Map of the upper Waitaki catchment, with stations Tekapo A, Tekapo B, Ohau A, Ohau B, Ohau C, Benmore, Aviemore and Waitaki]

• ~ 20% of NZ electricity derives from precipitation in this region

• Rainfall + summer snowmelt from Southern Alps.

Inflow data

Waitaki catchment above Benmore dam, weekly, 1948-2010

Strong seasonal dependence

– 3:1 ratio between midsummer high/midwinter low.

Serial dependence

• Weather patterns persist

– increases probability of shortage/spill.

• Typical correlation length ~ several weeks (but varying seasonally).

– convenient for optimization (cf. e.g. Brazil).

Extreme values

Hydro-scheduling is sensitive to extremes of inflow (in both tails).

• Low inflow -> reservoirs run dry (the most momentous thing that can happen)

• High inflow -> economic loss (spill); removes risk of shortage.

• Beware discrete approximations to the distribution!

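De-seasonalization

$$Q_t = \frac{X_t}{\gamma_t}$$

where $X_t$ is the inflow and the seasonal factor $\gamma_t$ is obtained via regression:

$$\log X_t = \log \gamma_t + \text{residual}_t$$

A convenient normalization, but does not make $(Q_t)$ stationary!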
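A minimal sketch of the de-seasonalization step. The slide does not say which regressors are used; this sketch assumes one indicator variable per week of the year, in which case the least-squares fit of $\log \gamma$ is simply the per-week mean of the log inflow. The function names and the synthetic data are illustrative only.

```python
import math

def seasonal_factors(inflows, period=52):
    """Estimate gamma for each week of the year.

    Regressing log X_t on one indicator per week of the year (least squares)
    gives the per-week mean of log X_t, so that is what is computed here.
    """
    sums = [0.0] * period
    counts = [0] * period
    for t, x in enumerate(inflows):
        sums[t % period] += math.log(x)
        counts[t % period] += 1
    return [math.exp(s / c) for s, c in zip(sums, counts)]

def deseasonalize(inflows, gamma):
    """Return Q_t = X_t / gamma_t, cycling gamma through the year."""
    return [x / gamma[t % len(gamma)] for t, x in enumerate(inflows)]

# Synthetic, purely seasonal inflows (MW) over two years of weekly data,
# with a 3:1 ratio between the seasonal high and low.
inflows = [900.0 * (1.0 + 0.5 * math.cos(2 * math.pi * t / 52)) for t in range(104)]
gamma = seasonal_factors(inflows)
q = deseasonalize(inflows, gamma)
```

For this purely seasonal input every normalized value is 1 up to rounding; with real data the spread of `q` around 1 is what the later models describe.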

The AR(1) model

Suggestion: an autoregressive model

$$\log Q_t = \phi_t \log Q_{t-1} + \varepsilon_t$$

that is,

$$Q_t = Q_{t-1}^{\phi_t} \exp(\varepsilon_t)$$

seems reasonable.

(Life should be so simple.)

[Figure: ACF of $(Q_t)$]
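A short simulation sketch of the AR(1) model on log-inflows. For simplicity it holds the coefficient and noise scale constant (the model on the slide lets them vary with the week), and it assumes normal innovations; `phi`, `sigma` and the starting value are illustrative numbers.

```python
import math
import random

def simulate_ar1(q0, phi, sigma, n_weeks, seed=0):
    """Simulate log Q_t = phi * log Q_{t-1} + eps_t, i.e. Q_t = Q_{t-1}**phi * exp(eps_t).

    phi and sigma are held constant here; eps_t ~ Normal(0, sigma^2) is an
    assumed error distribution.
    """
    rng = random.Random(seed)
    q = [q0]
    for _ in range(n_weeks):
        eps = rng.gauss(0.0, sigma)
        q.append(q[-1] ** phi * math.exp(eps))
    return q

path = simulate_ar1(q0=0.56, phi=0.8, sigma=0.3, n_weeks=35)
```

With `sigma=0` the path relaxes deterministically towards 1, the de-seasonalized norm, at a rate set by `phi`.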

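Stagewise independence

$$g_{t-1}(y) = \mathbb{E}\Big[\min_u\ \big\{(\text{present cost})(u) + g_t(u)\big\}\Big] \quad \text{with } (\text{stored energy})(y) + \text{random inflow } X_t$$

• Stage t subproblem is solved with knowledge of $X_t$, but not of the future.

– assumes stagewise independence, i.e. $(X_t)$ an independent sequence.

• Inflows are not stagewise independent.

– Suggested model is Markov.

From independent to Markov inflows

• Make inflow a function of

– what happened last week (y)

– a random innovation $W_t$, with $(W_t)$ independent

$$g_{t-1}(y) = \mathbb{E}\Big[\min_u\ \big\{(\text{present cost})(u) + g_t(u)\big\}\Big] \quad \text{with } (\text{stored energy})(y) + (\text{inflow})(y, W_t)$$

That’ll work – if we can express it as a linear program.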

LP-compatible autoregressive processes

• We’re allowed a process with

$$Q_t = H_t(Q_{t-1}, W_t)$$

with $(W_t)$ an independent sequence, and $H_t(\cdot, W_t)$ a linear function.

• But what we had in mind was

$$Q_t = Q_{t-1}^{\phi_t} \exp(\varepsilon_t)$$

which is nonlinear.

• It’s concave, though, so admits a piecewise linear approximation.

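One linear piece

Approximate the model

$$Q_t = Q_{t-1}^{\phi_t} \exp(\varepsilon_t)$$

by linearizing $q \mapsto q^{\phi_t}$ about $q = 1$:

$$Q_t = \big(1 + \phi_t (Q_{t-1} - 1)\big) \exp(\varepsilon_t)$$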
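A quick numerical check of the one-piece linearization, as a sketch. The value `phi = 0.8` is hypothetical; for any coefficient between 0 and 1 the power map is concave, so its tangent at $q = 1$ lies on or above it.

```python
def power_map(q, phi):
    """The nonlinear dependence q -> q**phi."""
    return q ** phi

def tangent_at_one(q, phi):
    """First-order Taylor expansion of q**phi about q = 1."""
    return 1.0 + phi * (q - 1.0)

phi = 0.8  # hypothetical value
grid = [0.1 * k for k in range(1, 41)]  # q from 0.1 to 4.0
gaps = [tangent_at_one(q, phi) - power_map(q, phi) for q in grid]
```

The gap is zero at $q = 1$ and grows in both directions, so the single linear piece is most accurate near normal inflows and overstates next week's inflow in very dry or very wet weeks.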

How to fit this?

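Inference

1. Auto-regression on log-inflows:

$$\log Q_t = \phi_t \log Q_{t-1} + \varepsilon_t$$

(ignores the linear approximation step)

2. Auto-regression on $Q_{t-1}$:

$$Q_t = 1 + \phi_t (Q_{t-1} - 1) + \varepsilon_t$$

(ignores the structure of errors)

3. Or we could do it right: a max-likelihood fit on the actual model:

$$Q_t = \big(1 + \phi_t (Q_{t-1} - 1)\big) \exp(\varepsilon_t)$$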
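Fits 1 and 2 can be sketched as one-parameter least-squares problems. This sketch assumes a single constant coefficient (no seasonal variation) and no intercept; the function names and the noise-free test series are illustrative.

```python
import math

def fit_log_ar(q):
    """Method 1: least-squares slope (no intercept) of log Q_t on log Q_{t-1}."""
    x = [math.log(v) for v in q[:-1]]
    y = [math.log(v) for v in q[1:]]
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def fit_linear_ar(q):
    """Method 2: least-squares slope of (Q_t - 1) on (Q_{t-1} - 1)."""
    x = [v - 1.0 for v in q[:-1]]
    y = [v - 1.0 for v in q[1:]]
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Noise-free synthetic series, one from each model, with coefficient 0.7.
log_series = [2.0]
lin_series = [2.0]
for _ in range(20):
    log_series.append(log_series[-1] ** 0.7)
    lin_series.append(1.0 + 0.7 * (lin_series[-1] - 1.0))
```

Each estimator recovers the coefficient exactly on data generated by its own model; on real data the two generally disagree, which is the motivation for the likelihood fit in item 3.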






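Stochastic dual dynamic programming (SDDP)

• The leading algorithm for problems of this type.

• Essential step (backward pass):

$$g_{t-1}(y) = \mathbb{E}\Big[\min_u\ \big\{(\text{present cost})(u) + g_t(u)\big\}\Big] \quad \text{with } (\text{stored energy})(y) + (\text{inflow})(y, W_t)$$

– evaluate the expectation for given y, using current estimate of $g_t$.

– use dual variables from optimization to form a cut (linear lower bound), which improves estimate of $g_{t-1}$.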
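A toy illustration of what a cut is, not of SDDP itself. The stage problem here is hypothetical (a shortfall penalty with three inflow scenarios), and the slope is computed analytically as a subgradient; in SDDP it would come from the dual variables of the stage linear programs.

```python
# Hypothetical stage: stored energy y plus inflow w must cover DEMAND;
# any shortfall costs PENALTY per unit. Inflow has a discrete distribution.
SCENARIOS = [(0.25, 0.0), (0.5, 1.0), (0.25, 2.0)]  # (probability, inflow)
DEMAND, PENALTY = 3.0, 10.0

def expected_cost_and_slope(y):
    """Expected shortfall cost at y and one subgradient with respect to y."""
    value, slope = 0.0, 0.0
    for p, w in SCENARIOS:
        shortfall = DEMAND - y - w
        if shortfall > 0:
            value += p * PENALTY * shortfall
            slope -= p * PENALTY
    return value, slope

def make_cut(y_hat):
    """Cut through (y_hat, g(y_hat)): returns (intercept, slope) of a linear lower bound."""
    value, slope = expected_cost_and_slope(y_hat)
    return value - slope * y_hat, slope

def lower_bound(cuts, y):
    """Current estimate of g: the pointwise maximum of the cuts."""
    return max(a + b * y for a, b in cuts)

cuts = [make_cut(y_hat) for y_hat in (0.5, 1.5, 2.5)]
```

Each cut touches the true convex function at its trial point and lies below it elsewhere, so adding cuts can only tighten the estimate from below.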





The importance of being discrete

• If random elements have a discrete joint distribution:

$$g_{t-1}(y) = \sum_s p_s \,\big[\text{optimal value of the stage problem in scenario } s\big]$$

– solve the optimization problem for each atom.

• Otherwise, need a (Monte Carlo?) discrete approximation

– with not too many discrete scenarios, please (computation time is (at least) proportional to number of scenarios)

Sample average approximation (SAA)

Objective function is an expectation, over a continuous distribution.

Only way to evaluate it is by Monte Carlo sampling.

Fix a sample, optimize the resulting approximation.
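A minimal sketch of sample average approximation on a one-variable problem. The lognormal inflow distribution and the cost function are hypothetical stand-ins chosen for illustration.

```python
import random

def draw_sample(n, seed=0):
    """Fix a sample of n inflows from a continuous (here lognormal, assumed) distribution."""
    rng = random.Random(seed)
    return [rng.lognormvariate(0.0, 0.5) for _ in range(n)]

def cost(u, w):
    """Hypothetical stage cost: shortfall (u > w) costs 50 per unit, surplus costs 5 per unit."""
    return 50.0 * max(u - w, 0.0) + 5.0 * max(w - u, 0.0)

def saa_objective(u, sample):
    """Sample average approximation of E[cost(u, W)]."""
    return sum(cost(u, w) for w in sample) / len(sample)

def saa_minimizer(sample):
    """The SAA objective is piecewise linear and convex with kinks at the
    sample points, so a minimizer can be found among them."""
    return min(sample, key=lambda u: saa_objective(u, sample))

sample = draw_sample(200)
u_star = saa_minimizer(sample)
```

The answer `u_star` is optimal for the fixed sample, not for the underlying distribution; a different seed gives a different answer, which is the approximation error in question.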

A catalogue of errors

Our efforts to model inflows have incurred

• model mis-specification error

– inflows might not really be AR-1

• inferential sampling error (finite data)

– parameters may be wrong

• sample average approximation error

– optimization is vs. a discrete approximation of inflow process

discrete (data) -> continuous (AR-1 model) -> discrete (SAA approx to transition kernel)

Can we obtain, in one step, a good representation of the data by a model of the final form required?

The final form required

Model for inflow $Q_t$ in week t:

$$Q_t = R_t + S_t Q_{t-1}$$

- where $(R_t, S_t)$ is chosen at random from a small collection of (seasonally-varying) scenarios.

A linear iterated function system (IFS) Markov process.

Or more generally

$$Q_t = H_t(Q_{t-1}, W_t) \qquad (W_t \text{ discretely distributed})$$

Linear IFS Markov inflow model
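A simulation sketch of a linear IFS inflow process. The three scenarios in `SCENARIOS` are hypothetical numbers for a single week; a fitted model would carry a separate collection for each week of the year.

```python
import random

# Hypothetical scenarios (probability, R, S) for one week; a real model would
# have a separate, seasonally varying collection for each week t.
SCENARIOS = [
    (0.2, 0.10, 0.60),   # dry
    (0.6, 0.25, 0.70),   # typical
    (0.2, 0.60, 0.80),   # wet
]

def simulate_ifs(q0, n_weeks, seed=0):
    """Simulate Q_t = R_t + S_t * Q_{t-1} with (R_t, S_t) drawn independently each week."""
    rng = random.Random(seed)
    weights = [p for p, _, _ in SCENARIOS]
    q = [q0]
    for _ in range(n_weeks):
        _, r, s = rng.choices(SCENARIOS, weights=weights, k=1)[0]
        q.append(r + s * q[-1])
    return q

path = simulate_ifs(q0=0.56, n_weeks=35)
```

Since every scenario is a linear function of last week's inflow and the distribution is already discrete, a model of this form can go into the stage linear programs without further approximation.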

Fitting a model to data: quantile regression

• Have data $x_i$ and $y_i$ for i = 1, …, n

• Want to represent the distribution of y|x by finitely many scenarios.

• Quantile regression: choose scenario $s_k(\cdot)$ to minimize $\sum_i \rho_k(y_i - s_k(x_i))$ for a suitable loss function $\rho_k(\cdot)$.

[Figure: scatter plot of the data, y against x]

Quantile regression fitting

• For a scenario at quantile $\tau$, the loss function is

$$\rho_\tau(r) = \begin{cases} \tau r, & r \ge 0 \\ (\tau - 1)\, r, & r < 0 \end{cases}$$

• For each scenario, the quantile regression problem is a linear program.
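A small sketch of fitting one scenario line by quantile regression. To stay dependency-free it does not call an LP solver; it uses the fact that the linear program has an optimal vertex at which the line passes through two data points, and tries every such pair. This brute-force search is a stand-in for the LP, practical only for small datasets, and the function names are illustrative.

```python
def pinball(residual, tau):
    """Check (pinball) loss: tau*r for r >= 0, (tau - 1)*r for r < 0."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual

def total_loss(r, s, xs, ys, tau):
    """Sum of pinball losses of the line y = r + s*x over the data."""
    return sum(pinball(y - (r + s * x), tau) for x, y in zip(xs, ys))

def fit_quantile_line(xs, ys, tau):
    """Brute-force stand-in for the LP: an optimal line can be taken through
    two data points (a vertex of the LP), so try every pair with distinct x."""
    best = None
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            if xs[i] == xs[j]:
                continue
            s = (ys[j] - ys[i]) / (xs[j] - xs[i])
            r = ys[i] - s * xs[i]
            loss = total_loss(r, s, xs, ys, tau)
            if best is None or loss < best[0]:
                best = (loss, r, s)
    return best[1], best[2]

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [1.0, 2.5, 2.0, 4.5, 3.0, 6.5, 5.0, 9.0]
r_hat, s_hat = fit_quantile_line(xs, ys, tau=0.9)
```

Repeating the fit for each quantile level gives one (intercept, slope) pair per scenario.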

Fitting a model to data: quantile autoregression

For each of a fixed collection of quantiles, fit a scenario $(R_t, S_t)$ by quantile regression:

$$X_t = R_t + S_t X_{t-1}$$

Quantiles (τ)            0.02   0.1    0.2    0.35   0.5    0.65   0.8    0.9    0.98

Scenario probabilities   0.06   0.09   0.125  0.15   0.15   0.15   0.125  0.09   0.06

Scenario probabilities determined from quantiles; can be unequal.
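The slide does not state the rule that turns quantile levels into scenario probabilities. One natural rule, assumed here, gives each scenario the probability mass between the midpoints of neighbouring quantile levels; the function name is illustrative.

```python
def scenario_probabilities(quantiles):
    """Give each scenario the probability mass between the midpoints of
    neighbouring quantile levels (with 0 and 1 as the outer boundaries)."""
    edges = [0.0]
    edges += [(a + b) / 2.0 for a, b in zip(quantiles[:-1], quantiles[1:])]
    edges.append(1.0)
    return [hi - lo for lo, hi in zip(edges[:-1], edges[1:])]

quantiles = [0.02, 0.1, 0.2, 0.35, 0.5, 0.65, 0.8, 0.9, 0.98]
probs = scenario_probabilities(quantiles)
```

Under this rule the probabilities always sum to 1, and unevenly spaced quantile levels produce unequal probabilities, with the least weight on the extreme scenarios.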

• Scenarios should not cross.

• Dependence (slope St) can vary across the probability distribution.

High-flow scenarios differ in intercept (current rainfall). Low-flow scenarios differ mainly in slope.

Extreme scenarios have their own dependence structure.


Continuous ranked probability score (CRPS)

• A method of judging the merit of a prediction made in the form of a probability distribution.

• Given prediction distribution F and actual outcome y,

$$\mathrm{CRPS}(F, y) = \int \big(F(x) - \mathbf{1}_{x \ge y}\big)^2 \, dx$$

[Figure: prediction distribution F and outcome y]
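For a forecast made of finitely many scenarios, the CRPS integral has a closed form through the standard identity CRPS = E|S - y| - ½ E|S - S'|, where S and S' are independent draws from the forecast. The sketch below computes it both ways; the function names are illustrative.

```python
def crps_discrete(scenarios, probs, y):
    """CRPS of a discrete forecast (atoms `scenarios`, weights `probs`) against
    outcome y, using the identity CRPS = E|S - y| - 0.5 * E|S - S'|."""
    term1 = sum(p * abs(s - y) for s, p in zip(scenarios, probs))
    term2 = sum(p * q * abs(s - t)
                for s, p in zip(scenarios, probs)
                for t, q in zip(scenarios, probs))
    return term1 - 0.5 * term2

def crps_by_integration(scenarios, probs, y, lo, hi, n=20000):
    """Midpoint-rule evaluation of the integral of (F(x) - 1{x >= y})^2 over [lo, hi].

    [lo, hi] must contain y and all scenarios, since the integrand vanishes outside.
    """
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        cdf = sum(p for s, p in zip(scenarios, probs) if s <= x)
        step = 1.0 if x >= y else 0.0
        total += (cdf - step) ** 2 * h
    return total
```

A forecast concentrated on a single scenario scores the absolute error, so CRPS can be read as a generalization of absolute error to distributional forecasts; lower is better.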

Fitting a model to data: CRPS M-estimation

CRPS can also be used as an estimation method for multi-scenario regression.

• Given x, scenarios for y are s1(x) … sm(x) with probabilities p1 ... pm

• Choose $s_k(\cdot)$ and $p_k$ to

$$\text{minimize } \sum_j \mathrm{CRPS}\Big(\sum_k p_k \, \delta_{s_k(x_j)},\ y_j\Big)$$

• This is the most computationally challenging method (global optimization, not LP or least-squares).

Fitting a model to data: CRPS M-estimation

$$X_t = R_t + S_t X_{t-1}$$

• Scenarios may cross.

• Scenario probabilities are optimized in model fitting, instead of being arbitrarily chosen

• Quite small numbers of scenarios seem possible.

Multivariate inflow models

• Need to capture spatial as well as temporal correlations.

• Generalize models:

– Autoregressive: need discrete approx. to multivariate error.

– Quantile regression: no natural generalization of quantile.

– CRPS M-estimation: generalization to energy score.

$$X_t = R_t + S_t X_{t-1}, \qquad \text{where } X_t = \begin{pmatrix} \text{North Island inflow} \\ \text{South Island inflow} \end{pmatrix}$$

A test problem

Challenging fictional system based on Waitaki catchment inflows.

• Storage capacity 1000 GWh (cf. real Waitaki lakes 2800 GWh)

• Generation capacity 1749 MW hydro, 900 MW thermal

• Demand 1550 MW, constant

• Thermal fuel $50 / MWh, VOLL $1000 / MWh

Test problem: a dry winter.

• 35 weeks (2 April – 2 December)

• Initial storage 336 GWh

• Initial inflow 500 MW (~56% of average)

Solved with Doasa 2.0 (EPOC’s SDDP code).
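A back-of-envelope check of why this start is dry, using only the figures listed above. It assumes, purely for illustration, that inflow stays at its initial 500 MW and that thermal plant runs at full capacity throughout; neither is part of the test problem itself.

```python
STORAGE_GWH = 336.0      # initial storage
DEMAND_MW = 1550.0
THERMAL_MW = 900.0
INFLOW_MW = 500.0        # initial inflow, assumed here to persist unchanged
HOURS_PER_WEEK = 168.0

def weeks_until_empty(storage_gwh, demand_mw, thermal_mw, inflow_mw):
    """Weeks until storage is exhausted if thermal runs at capacity and inflow stays constant."""
    drawdown_mw = demand_mw - thermal_mw - inflow_mw
    if drawdown_mw <= 0:
        return float("inf")
    hours = storage_gwh * 1000.0 / drawdown_mw   # GWh -> MWh, then divide by MW
    return hours / HOURS_PER_WEEK

weeks = weeks_until_empty(STORAGE_GWH, DEMAND_MW, THERMAL_MW, INFLOW_MW)
implied_average_inflow_mw = INFLOW_MW / 0.56   # 500 MW is about 56% of average
```

Under these assumptions storage is drawn down at 150 MW and lasts roughly 13 of the 35 weeks, so avoiding lost load depends on inflows recovering.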

Inflow model                        No. scenarios   Lost load           Spill               Energy price
                                    per stage       (MW, probability)   (MW, probability)   ($/MWh)
quantile regression                 16              9.4, 28%            2.9, 6%             251
autoregressive, resampled errors    63              13.3, 37%           17.1, 15%           296
autoregressive, lognormal errors    63              6.8, 23%            6.0, 9%             207
independent (uncorrected)           16              1.59, 9%            0.14, 1%            112

(Quantities are expected averages over full time horizon; probabilities are for any shortage/spill within time horizon.)

Results – optimal strategy
