46
High-dimensional modeling and forecasting for wind power generation Jakob Messner * , Pierre Pinson * , Yongning Zhao ,* * Technical University of Denmark, China Agricultural University (authors in alphabetical order) Contact - email: [email protected] - webpage: www.pierrepinson.com YEQT Winter School on Energy Systems - 13 December 2017 1 / 46

High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

High-dimensional modeling and forecasting for windpower generation

Jakob Messner∗, Pierre Pinson∗, Yongning Zhao†,∗

∗Technical University of Denmark, †China Agricultural University(authors in alphabetical order)

Contact - email: [email protected] - webpage: www.pierrepinson.com

YEQT Winter School on Energy Systems - 13 December 2017

1 / 46

Page 2: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

Outline

Motivations for high-dimension learning and forecasting

General sparsity control for VAR models

Online sparse and adaptive learning for VAR models

Distributed learning

Outlook

2 / 46

Page 3: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

1 From single wind farms to entire regions (1000s)

3 / 46

Page 4: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

A traditional view on wind power forecasting

The wind power forecasting problem is defined for a single location...

... or, if several locations, by considering each of them individually

(Note that, for simplicity, we will only look at very short-term forecasting in this talk, i.e., from a few mins to

1-hour ahead)

4 / 46

Page 5: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

Wind farms as a network of sensors

Many works showed that forecast quality could be significantly improved:

by using data at offsite locations (i.e., other wind farms)based on spatio-temporal modelling (and the likes)

improvement of 1-hour ahead forecast RMSE

A Danish example...

Accounting for spatio-temporal effects allowsfor the correction of aggregated powerforecasts for horizons up to 8 hours ahead

Largest improvements at horizons of 2-5hours ahead

5 / 46

Page 6: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

Scaling it up

Ultimately, we would like to predict all wind power generation, also solar and load, at thescale of a continental power system, e.g. the European one

CoalFuel OilHydroLignite

Natural GasNuclearUnknownCoal

Fuel OilHydroLignite

Natural GasNuclearUnknown

RE-Europe dataset, available at zenodo.org, descriptor in Nature, Scientific Data 6 / 46

Page 7: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

The big picture...

The “grand forecasting challenge”: predict renewable power generation, dynamicuncertainties and space-time dependencies at once for the whole Europe...!

Linkage with future electricity markets:

Monitoring and forecasting of the complete “Energy Weather” over EuropeProvides all necessary information for coupling of various existing markets (e.g.,day-ahead, balancing), and deciding upon optimal cross-border exchanges

7 / 46

Page 8: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

2 A proposal for general sparsity control (not online though)

8 / 46

Page 9: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

Sparsity-controlled vector autoregressive (SC-VAR) model

Traditional LASSO-VAR can only provide overall sparse solutions, but not allow forfine-tuning different aspects of sparsity, e.g. :

overall number of nonzero coefficients of VAR (SA), i.e. the LASSO-VARnumber of explanatory wind farms used in VAR to explain target wind farm i (S i

F )number of past observations of each explanatory wind farm to explain target wind farm i(S i

P )

number of nonzero coefficients to explain target wind farm i (S iN ).

k = 1 k = 2

These aspects can be used to control the sparse structure of the solution as needed,especially when prior knowledge on spatio-temporal characteristics of wind farms areavailable for sparsity-control and expected to improve the forecasting.

9 / 46

Page 10: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

Sparsity-controlled vector autoregressive (SC-VAR) model

How to freely control the sparse structure... [E. Carrizosa, et al. 2017]

Introducing binary control variables γ ij and δi

jk

γ ij controls whether wind farm j is used to explain target wind farm i .

δijk controls whether the coefficient αi

jk is zero or not.

Reformulating the VAR estimation as a constrained mixed integer non-linearprogramming (MINLP) problem.

For example: N = 3 wind farms, VAR(2) with p = 2 lagsγ11 γ1

2 γ13

γ21 γ2

2 γ23

γ31 γ3

2 γ33

=

1 0 10 1 01 0 1

⇐⇒ A =

α111 0 α1

31 α112 0 α1

32

0 α221 0 0 α2

22 0α3

11 0 α331 α3

12 0 α332

If additionally with control variable δ3

11 = 0, then

A =

α111 0 α1

31 α112 0 α1

32

0 α221 0 0 α2

22 00 0 α3

31 α312 0 α3

32

That is:

γ ij = 0⇔

p∑k=1

δijk = 0 δi

jk = 0⇔ αijk = 0

10 / 46

Page 11: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

Sparsity-controlled vector autoregressive (SC-VAR) model

minα,δ,γ

N∑i=1

T∑t=p

(yi,t+1 −

N∑j=1

p∑k=1

αijkyj,t−k+1

)2

subject to δijk ≤ γ i

j ,∀k ∈ K, i , j ∈ I

N∑j=1

γ ij ≤ S i

F , ∀i ∈ I

p∑k=1

γ ij δ

ijk ≤ S i

P , ∀i , j ∈ I

N∑i=1

N∑j=1

p∑k=1

δijk ≤ SA,∀k ∈ K, i , j ∈ I

N∑j=1

p∑k=1

δijk ≤ S i

N , ∀i ∈ I∣∣∣αijk

∣∣∣ ≥ ηij δ

ijk , ∀k ∈ K, i , j ∈ I

αijk (1− δi

jk ) = 0, ∀k ∈ K, i , j ∈ I

δijk , γ

ij ∈ 0, 1, ∀k ∈ K, i , j ∈ I

I = 1, 2, · · · ,N

K = 1, 2, · · · , p

SA- overall number of nonzerocoefficients of VAR

S iF - number of explanatory wind

farms used in VAR to explain targetwind farm i

S iP - number of past observations of

each explanatory wind farm toexplain target wind farm i

S iN - number of nonzero coefficients

to explain target wind farm i

ηij - a threshold requires that only

coefficients with absolute valuegreater than or equal to ηi

j are

effective otherwise will be zero.

11 / 46

Page 12: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

Pros and cons of SC-VAR model

Pros

allows for fully controlling the sparsity from different aspects.

can be directly solved by off-the-shelf standard MINLP solvers.

Cons

SC-VAR allows for sparsity-control but doesn’t tell how to control. Noinformation is available for setting so many parameters, which are practicallyintractable when dealing with high dimensional wind power forecasting.

The constraint∑p

k=1 γij δ

ijk ≤ S i

P is nonlinear.

The constraints are redundant: S iF + S i

P = S iN ,∑

i∈I SiN = SA

The constraint∑∑∑

δijk ≤ SA makes the optimization problem

non-decomposable, which slows down the computation.

Too many variables to be optimized: VAR coefficients αijk , binary control

variables γ ij and δi

jk .

(Note that, though∣∣∣αi

jk

∣∣∣ ≥ ηij δ

ijk and αi

jk (1− δijk ) = 0 are also nonlinear, [E. Carrizosa, et al. 2017] provides

linearized reformulation for them.)

12 / 46

Page 13: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

Correlation-constrained SC-VAR (CCSC-VAR) model

Incorporate explicit spatial correlation information into the constraints!

minα,δ

N∑i=1

T∑t=p

(yi,t+1 −

N∑j=1

p∑k=1

αijkyj,t−k+1

)2

subject to δijk ≤ λi

j , ∀k ∈ K, i , j ∈ Ip∑

k=1

δijk ≥ λi

j , ∀i , j ∈ I

N∑j=1

p∑k=1

δijk ≤ S i

N , ∀i ∈ I∣∣∣αijk

∣∣∣ ≤ M · δijk , ∀k ∈ K, i , j ∈ I

δijk , γ

ij ∈ 0, 1, ∀k ∈ K, i , j ∈ I

whereλi

j =

1, φi

j ≥ τ

0, φij < τ∣∣∣αi

jk

∣∣∣ ≤ M · δijk ⇔

−M ≤ αi

jk ≤ M, δijk = 1

αijk = 0, δi

jk = 0

Notations:

φij is the Pearson correlation between

wind farms i and j .

M is a positive constant number(Generally M < 2).

τ and S iN are used to control sparsity.

Improvements: (simpler but better!)

Less parameters need to be tuned whilethe sparsity-control ability is preserved.

More capable of characterizing the trueinter-dependencies between wind farms.

Less variables to be optimized.

All constraints are linear.

The model is decomposable.

13 / 46

Page 14: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

Application and case study

25 wind farms randomly chosenover western Denmark

15-minute resolution

20.000 data points for each windfarm

Compared Models:Local forecasting models

Persistence method

Auto-Regressive model

Spatio-temporal models

VAR model

LASSO-VAR model

SC-VAR model

CCSC-VAR model

Performance Metrics:Root Mean Square Error (RMSE)

Mean Absolute Error (MAE)

Sparsity for spatial models

14 / 46

Page 15: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

Application and case study

Table: The average RMSE and MAE for all 25 wind farms for different forecasting models

Metrics Persistence AR VAR LASSO-VAR SC-VAR CCSC-VAR

Average RMSE 0.34843 0.34465 0.33156 0.33100 0.33080 0.33058

Average MAE 0.22158 0.22718 0.22631 0.22557 0.22490 0.22408

Model Sparsity n/a n/a 0 0.9248 0.8100 0.7504

RMSE improvement over Persistence method

From the Table and boxplot:

All of the spatio-temporal models significantlyoutperform the local models.

LASSO-VAR has highest sparsity but lowestaccuracy among sparse models.

CCSC-VAR model has lowest sparsity

CCSC-VAR model has lowest average RMSEerror for 25 wind farms

The minimum, maximum and averageimprovements of CCSC-VAR are highestamong these models.

15 / 46

Page 16: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

3 Online sparse and adaptive learning for VAR models

16 / 46

Page 17: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

(Lasso) vector auto regression

Power output depends on previous outputs at the wind farm itself and other wind farms:

yn =L∑

l=1

Al yn−l + εn

MinimizeT∑

t=1

νN−n

||L∑

l=1

(Al yn−l )− yn||22

+ λ

L∑l=1

||Al ||

sparse coefficient matrices Al

time-adaptive coefficients

17 / 46

Page 18: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

(Lasso) vector auto regression

Power output depends on previous outputs at the wind farm itself and other wind farms:

yn =L∑

l=1

Al yn−l + εn

MinimizeT∑

t=1

νN−n

||L∑

l=1

(Al yn−l )− yn||22

+ λ

L∑l=1

||Al ||

sparse coefficient matrices Al

time-adaptive coefficients

18 / 46

Page 19: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

(Lasso) vector auto regression

Power output depends on previous outputs at the wind farm itself and other wind farms:

yn =L∑

l=1

Al yn−l + εn

MinimizeT∑

t=1

νN−n

||L∑

l=1

(Al yn−l )− yn||22 + λL∑

l=1

||Al ||

sparse coefficient matrices Al

time-adaptive coefficients

19 / 46

Page 20: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

(Lasso) vector auto regression

Power output depends on previous outputs at the wind farm itself and other wind farms:

yn =L∑

l=1

Al yn−l + εn

MinimizeT∑

t=1

νN−n||L∑

l=1

(Al yn−l )− yn||22 + λL∑

l=1

||Al ||

sparse coefficient matrices Al

time-adaptive coefficients

0.0

0.4

0.8

wei

ght

past data −−>

20 / 46

Page 21: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

VAR Estimation

Cyclic coordinate descent algorithm:cyclically update coefficients until convergence:

Al [i , j ]←sign(KN )(|KN | − λ)+

LN

KN =N∑

n=1

νN−nyn−l [j ](yn[i ]− yn[i ] + Al [i , j ]yn−l [j ])

= νKN−1 + yN−l [j ](yN [i ]− yN [i ] + Al [i , j ]yN−l [j ])

LN =N∑

n=1

νN−nyn−l [j ]2

= νLN−1 + yN−l [j ]2

→ data need not to be stored

initialize coordinate descent with previous estimates

→ fast convergence

21 / 46

Page 22: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

VAR Estimation

Cyclic coordinate descent algorithm:cyclically update coefficients until convergence:

Al [i , j ]←sign(KN )(|KN | − λ)+

LN

KN =N∑

n=1

νN−nyn−l [j ](yn[i ]− yn[i ] + Al [i , j ]yn−l [j ])

= νKN−1 + yN−l [j ](yN [i ]− yN [i ] + Al [i , j ]yN−l [j ])

LN =N∑

n=1

νN−nyn−l [j ]2

= νLN−1 + yN−l [j ]2

→ data need not to be stored

initialize coordinate descent with previous estimates

→ fast convergence

22 / 46

Page 23: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

VAR Estimation

Cyclic coordinate descent algorithm:cyclically update coefficients until convergence:

Al [i , j ]←sign(KN )(|KN | − λ)+

LN

KN =N∑

n=1

νN−nyn−l [j ](yn[i ]− yn[i ] + Al [i , j ]yn−l [j ])

= νKN−1 + yN−l [j ](yN [i ]− yN [i ] + Al [i , j ]yN−l [j ])

LN =N∑

n=1

νN−nyn−l [j ]2

= νLN−1 + yN−l [j ]2

→ data need not to be stored

initialize coordinate descent with previous estimates

→ fast convergence

23 / 46

Page 24: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

Simulation study

1st-order VAR time-series with coefficient matrix

A =

0.9 0 0.1 0 0 0 0 0 0 00 a1 0 0 0 0 0 0 0 00 0 0.8 0 0 0 0 0 0 00 0 a2 0.9 0 0 0 0.2 0 0

0.1 0 0 0 a3 0 0 0 0 00 0 0 a4 0 0.9 0 0 0 −0.10 0 0 0 0 0 0.8 0 0 00 0 0 0 0 0 0 0.7 0 00 0 −0.1 0 0 0 0 0 0.9 00 0 0 0 0 0 0 0 0 0.9

and a white multivariate Gaussian noise.

→ The interesting aspect is that a1, a2, a3, a4 are time varying...

24 / 46

Page 25: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

Simulation study

10000 15000 20000 25000 30000 35000

−0.

20.

20.

61.

0

a 1

10000 11000 12000 13000 14000 15000

−0.

20.

20.

61.

0

a 2

10000 11000 12000 13000 14000 15000

−0.

20.

20.

61.

0

time step

a 3

10000 11000 12000 13000 14000 15000

−0.

20.

20.

61.

0

time step

a 4

25 / 46

Page 26: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

Simulation study

10000 15000 20000 25000 30000 35000

−0.

20.

20.

61.

0

a 1

10000 11000 12000 13000 14000 15000

−0.

20.

20.

61.

0

a 2

10000 11000 12000 13000 14000 15000

−0.

20.

20.

61.

0

time step

a 3

10000 11000 12000 13000 14000 15000

−0.

20.

20.

61.

0

time step

a 4

truefittedbatch

Sparsity: 49% (true: 83%)26 / 46

Page 27: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

Denmark data

100 wind farms (out of 349), 15-min resolutionlogistic transformation2011 (35.036 time steps)batch VAR estimation: first 20.000 datasorted from West to East

Transformed data

transformed power

Fre

quen

cy

−4 −2 0 2 4

010

0020

0030

00

27 / 46

Page 28: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

Results

Lag−1 coefficient matrix

site

site

20

40

60

80

20 40 60 80

−0.2

0.0

0.2

0.4

0.6

0.8

1.0

Lag−2 coefficient matrix

site

site

20

40

60

80

20 40 60 80

−0.2

0.0

0.2

0.4

0.6

0.8

1.0

Lag−3 coefficient matrix

site

site

20

40

60

80

20 40 60 80

−0.2

0.0

0.2

0.4

0.6

0.8

1.0

Lag−4 coefficient matrix

site

site

20

40

60

80

20 40 60 80

−0.2

0.0

0.2

0.4

0.6

0.8

1.0

28 / 46

Page 29: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

Results

Model

RM

SE

Impr

ovem

ent o

ver

Per

sist

ence

0.00

0.05

0.10

0.15

0.20

onlin

e AR

batc

h VA

R

onlin

e VA

R

15 min ahead

onlin

e AR

batc

h VA

R

onlin

e VA

R

30 min ahead

onlin

e AR

batc

h VA

R

onlin

e VA

R

45 min ahead

onlin

e AR

batc

h VA

R

onlin

e VA

R

60 min ahead

the VAR model with batch learning outperformed AR models with online learningonline sparse learning for the VAR model yields substantial extra gains

29 / 46

Page 30: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

France data

172 wind farms, 10-min resolution

subset 2013 (52.561 time steps)

logistic transformation

batch VAR estimation: first 20.000 time steps

sorted from West to East

Transformed data

transformed power

Fre

quen

cy

−4 −2 0 2 4

020

0040

0060

00

30 / 46

Page 31: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

Results

Lag−1 coefficient matrix

site

site

50

100

150

50 100 150

−0.2

0.0

0.2

0.4

0.6

0.8

1.0

Lag−2 coefficient matrix

site

site

50

100

150

50 100 150

−0.2

0.0

0.2

0.4

0.6

0.8

1.0

Lag−3 coefficient matrix

site

site

50

100

150

50 100 150

−0.2

0.0

0.2

0.4

0.6

0.8

1.0

31 / 46

Page 32: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

Results

Model

RM

SE

Impr

ovem

ent o

ver

Per

sist

ence

0.00

0.05

0.10

0.15

onlin

e AR

batc

h VA

R

onlin

e VA

R

10 min ahead

onlin

e AR

batc

h VA

R

onlin

e VA

R

20 min ahead

onlin

e AR

batc

h VA

R

onlin

e VA

R

30 min ahead

onlin

e AR

batc

h VA

R

onlin

e VA

R

40 min ahead

the results obtained on the Danish data are confirmed with the French dataset...

32 / 46

Page 33: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

Comparison with CCSC-VAR

Model

RM

SE

Impr

ovem

ent o

ver

Per

sist

ence

0.02

0.04

0.06

0.08

onlin

e−AR

batc

h−VA

R

SC−V

AR

onlin

e−VA

R

the CCSC-VAR outperforms (slightly) the basic VAR with batch learning

the online sparse VAR estimator does even better

33 / 46

Page 34: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

4 Distributed learning

34 / 46

Page 35: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

Data sharing... or not!

To my knowlegde, most players do not want to share their data - even thoughmodels and forecasts would highly benefit from that!

one may design distributed learning algorithms that are privacy-preserving

Example setup, with a central and contracted agents:

Distributed learning, optimization, etc. is to play a key role in future energyanalytics

35 / 46

Page 36: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

Data sharing... or not!

To my knowlegde, most players do not want to share their data - even thoughmodels and forecasts would highly benefit from that!

one may design distributed learning algorithms that are privacy-preserving

Example setup, with a central and contracted agents:

Distributed learning, optimization, etc. is to play a key role in future energyanalytics

36 / 46

Page 37: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

Our mathematical setup

Wind power generation measurements xj,t are being collected at sites sj ,j = 1, . . . ,m (with t the time index)

Out of the overall set of wind farms Ω,a central agent is interested in a subset ofwind farms Ωp (dim. mp)contracted agents relate to another subset ofwind farms Ωa (dim. ma)

Write yt the wind power production the centralagent is interested in predicting

3 possible cases in practice:

a wind farm operator contracting neighbouringwind farms (mp = 1)a portfolio manager contracting other windfarms (mp > 1)a system operator interested in the aggregateproduction of all wind farms (mp = m)

37 / 46

Page 38: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

AR models with offsite information

Since restricting ourselves to the very short term, Auto-Regressive (AR) models withoffsite information are sufficient

Such a model reads as

yt = β0 +∑

sj∈Ωp

l∑τ=1

βj,τxj,t−τ +∑

sj∈Ωa

l∑τ=1

βj,τxj,t−τ + εt

where τ is a lag variable (τ = 1, . . . , l)

In a compact form:yt = βxt + εt

As the number of coefficients may be large, we use a Lasso-type estimator, i.e.,

β = argminβ

1

2‖y − Aβ‖2

2 + λ‖β‖1

After estimating β a forecast is given by

yt+1|t = βxt+1

38 / 46

Page 39: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

Distributed learning with ADMM

The Alternating Direction Method of Multipliers (ADMM), is a widely useddecomposition approach that allows to split a learning problem among features

The Lasso estimation problem is first reformulated as

min1

2‖y − Aβ‖2

2 + λ‖z‖1

s.t. β − z = 0

It is then split among agents by setting

β = [β1,β2, . . . ,βma+mp]

A = [A1 A2 . . . Ama+mp ]

The iterative solving approach is then defined such that, at iteration k,

(agent j) βkj = argmin

βj

(‖Ajβj − yk−1

j ‖22 +

ρ‖βj‖1

)(central agent) zk =

1

(l + 1)(ma + mp) + ρ

(y + Aβ

kρuk−1

)uk = ρuk−1 + Aβ

k − zk

(where yk−1j = Ajβ

k−1j − Aβ

k−1+ zk−1 − uk−1, and Aβ

k=∑ma+mp

j=1 Ajβj )

39 / 46

Page 40: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

Case studies for application

Australia

Data from Australian Electricity MarketOperator (AEMO)

Data is public and shared by Uni. Strathclyde(Jethro Browell) and DTU

22 wind farms over a period of 2 years

5-minute resolution coarsened to 30 minutes

France

Data from Enedis (formerly EDF Distribution)

Data is confidential!

187 wind farms over a period of 3 years (only 85 used here)

10-minute resolution coarsened to 60 minutes

Only out-of-sample evaluation of genuine 1-step ahead forecasting!40 / 46

Page 41: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

Case 1: Wind farm operator

Using Australian test case for a simple illustration at a single wind farm

Comparison of persistence benchmark, local model (AR), and distributed learningmodel (ARX)

Table: Comparative results for distributed learning (ARX model), as well as persistence and ARbenchmarks, at an Australian wind farm (wind farm no. 8) for 30-min ahead forecasting.

Persistence AR ARX (dist. learning)RMSE [% nom. capacity] 3.60 3.57 3.52

Improvement [%] - 0.8 2.2

The improvement is modest, but significant

This is while the central agent (wind farm 8) never had access to data of contractedwind farms

Thanks to L1-penalization, the number of contracted wind farm is very limited

41 / 46

Page 42: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

Case 1: Wind farm operator (2)

Extensive analysis based on the French dataset

Improvement of distributed learning over local model only, in terms of RMSE

−2

02

46

810

RM

SE

impr

ovem

ent [

%] Improvement is nearly always

there

It ranges from modest tosubstantial

This obviously depends on thewind farm location

42 / 46

Page 43: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

Case 2: Portfolio manager

Using French test case

We randomly pick 8 wind farms to build a portfolio

Comparison of persistence benchmark, local model (AR), and distributed learningmodel (ARX)

Table: Comparative results for distributed learning (ARX model), as well as persistence and ARbenchmarks, for a portfolio of 8 wind farms of the French dataset (randomly chosen) for 1-hourahead forecasting.

Persistence AR ARX (dist. learning)RMSE [% nom. capacity] 3.99 3.67 3.38

Improvement [%] - 8.2 15.3

The improvement is substantial

Again, thanks to L1-penalization, the number of contracted wind farm is very limited

Simulation studies may allow to look at how improvement relates to portfolio size,wind farm distribution, etc.

43 / 46

Page 44: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

Case 3: System operator

Using French test case

The system operator aims to predict the aggregate of all wind farms, though neveraccessing the wind farm data(!)

Comparison of persistence benchmark, local model (AR), and distributed learningmodel (ARX)

Table: Comparative results for distributed learning (ARX model), as well as persistence and ARbenchmarks, for the aggregate of all 85 French wind farms for 1-hour ahead forecasting.

Persistence AR ARX (dist. learning)RMSE [% nom. capacity] 2.88 2.10 2.05

Improvement [%] - 27.1 28.8

The improvement is modest, since an AR model obviously does very well foraggregated wind power production

Though, the practical interest is huge, since data does not need to eb exchanged

More complex models (e.g., regime-switching) may yield higher improvements

44 / 46

Page 45: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

Concluding thoughts

High-dimensional and distributed learning have a bright future in energyanalytics, since

high quantity of distributed data is being collected

data-driven and expert input to reveal and maintain sparsity

most actors do not want to share their data (unless forced to do so)

Some interesting future developments:

online distributed learning (i.e., merger of ideas persented), to lighten computationalcosts and exchange/communication needs

broaden the applicability to a wide class of models, e.g., with regime switching andregression on input weather forecasts

design distributed computation and data sharing markets!

45 / 46

Page 46: High-dimensional modeling and forecasting for wind power ...€¦ · PPP i jk S A makes the optimization problem non-decomposable, which slows down the computation. Too many variables

Thanks for your attention!

46 / 46