7
An Unsupervised Deep Learning Approach for Scenario Forecasts Yize Chen * , Xiyu Wang , and Baosen Zhang * * Department of Electrical Engineering, University of Washington, Seattle, WA, USA {yizechen, zhangbao}@uw.edu Department of Electrical Engineering, Tsinghua University, Beijing, China {xiyu-wan14}@mails.tsinghua.edu.cn Abstract—In this paper, we propose a novel scenario forecasts approach which can be applied to a broad range of power system operations (e.g., wind, solar, load) over various forecasts horizons and prediction intervals. This approach is model-free and data- driven, producing a set of scenarios that represent possible future behaviors based only on historical observations and point forecasts. It first applies a newly-developed unsupervised deep learning framework, the generative adversarial networks, to learn the intrinsic patterns in historical renewable generation data. Then by solving an optimization problem, we are able to quickly generate large number of realistic future scenarios. The proposed method has been applied to a wind power generation and forecasting dataset from national renewable energy laboratory. Simulation results indicate our method is able to generate scenarios that capture spatial and temporal correlations. Our code and simulation datasets are freely available online. Index Terms—deep learning, generative models, renewable scenario forecasting I. I NTRODUCTION The integration of high penetration of renewable generation into power systems calls for a growing need to model the uncertain and intermittent characteristics of these resources. An important method used in characterizing the behavior of renewable resources is scenario generation, where a set of possible future realizations are provided for the system opera- tor. Compared to deterministic point forecasts or probabilistic forecasts [1], scenario forecasts could not only inform users of the uncertainty about the future, but also reflect the temporal dependence of renewable power generation [2], [3]. The in- formation provided by these generated scenarios is valuable for a host of decision-making and stochastic optimization problems, such as the economic dispatch of renewables [4], [5], unit commitment [6], [7] and many others. Therefore, in recent years, many algorithms have been introduced for various applications, from load forecasting to wind and solar power generations. One of the biggest challenges of scenario forecasts is the difficulty of modeling and learning the underlying stochastic processes that drives renewable power generation [8]. Previous statistical or physical methods like first-order autoregressive model [9], ensemble methods and Gaussian Copula [2], [10], [11] either required strong statistical assumptions or detailed physical measurements and modeling. What’s more, most of these methods focus on capturing the marginal distribution of each individual time slots of the forecasting horizon, while paying less attention to the temporal correlations in the scenarios [12]. 0 50 100 150 200 250 0 2 4 6 8 10 12 14 16 Generator 0 50 100 150 200 250 0 0.005 0.01 0.015 0.02 0.025 0.03 0.035 0.04 0.045 0.05 0 50 100 150 200 250 0 2 4 6 8 10 12 14 16 0 50 100 150 200 250 0 2 4 6 8 10 12 14 16 0 50 100 150 200 250 0 2 4 6 8 10 12 14 16 Discriminator ... ... ... ... ... ... ... ... ... ... ... ... Time-series records 0 2 4 6 8 10 12 14 16 ... ... Noise input Generated samples Output prediction real/fake 0 100 20 40 60 80 1 0 Historical data Generator ... ... ... ... ... ... (a) (b) GAN training Optimization 0 20 40 60 80 100 1 0 Group of Optimal inputs Group of Generated scenarios 0 50 100 150 200 250 Realized Scenarios 0 100 200 300 400 500 0 2 4 6 8 10 12 14 16 ? Historical Data minimize loss Power value[MW] Power value[MW] Power value[MW] 0 2 4 6 8 10 12 14 16 Figure 1. Model architecture for our proposed method. During the training process (a), the generator transforms noise vectors into generated time-series, and the discriminator is fed with both historical time-series and generated time-series, and try to discriminate the source of the input. Once training completes, we are able to optimize over the noise vectors to find the future scenarios from generator output conditioned on historical observations (b). In [13], an unsupervised machine learning algorithm using Generative Adversarial Networks (GANs) [14] was introduced to directly generate realistic scenarios based only on historical data, without the need to fit an explicit model. In this paper, we extend the algorithm to the scenario forecasting problem. Compared to the work in [13], this paper focuses on gener- ating scenarios conditioned on a given central forecast. Our proposed method is entirely model-free and data driven. Based on deep learning, the GANs used in our proposed method are unsupervised learners who can directly learn and generate arXiv:1711.02247v2 [math.OC] 19 Mar 2018

An Unsupervised Deep Learning Approach for Scenario Forecasts · driven, producing a set of scenarios that represent possible future behaviors based only on historical observations

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: An Unsupervised Deep Learning Approach for Scenario Forecasts · driven, producing a set of scenarios that represent possible future behaviors based only on historical observations

An Unsupervised Deep Learning Approach forScenario Forecasts

Yize Chen∗, Xiyu Wang†, and Baosen Zhang∗∗Department of Electrical Engineering, University of Washington, Seattle, WA, USA

{yizechen, zhangbao}@uw.edu†Department of Electrical Engineering, Tsinghua University, Beijing, China

{xiyu-wan14}@mails.tsinghua.edu.cn

Abstract—In this paper, we propose a novel scenario forecastsapproach which can be applied to a broad range of power systemoperations (e.g., wind, solar, load) over various forecasts horizonsand prediction intervals. This approach is model-free and data-driven, producing a set of scenarios that represent possiblefuture behaviors based only on historical observations and pointforecasts. It first applies a newly-developed unsupervised deeplearning framework, the generative adversarial networks, to learnthe intrinsic patterns in historical renewable generation data.Then by solving an optimization problem, we are able to quicklygenerate large number of realistic future scenarios. The proposedmethod has been applied to a wind power generation andforecasting dataset from national renewable energy laboratory.Simulation results indicate our method is able to generatescenarios that capture spatial and temporal correlations. Ourcode and simulation datasets are freely available online.

Index Terms—deep learning, generative models, renewablescenario forecasting

I. INTRODUCTION

The integration of high penetration of renewable generationinto power systems calls for a growing need to model theuncertain and intermittent characteristics of these resources.An important method used in characterizing the behavior ofrenewable resources is scenario generation, where a set ofpossible future realizations are provided for the system opera-tor. Compared to deterministic point forecasts or probabilisticforecasts [1], scenario forecasts could not only inform users ofthe uncertainty about the future, but also reflect the temporaldependence of renewable power generation [2], [3]. The in-formation provided by these generated scenarios is valuablefor a host of decision-making and stochastic optimizationproblems, such as the economic dispatch of renewables [4],[5], unit commitment [6], [7] and many others. Therefore,in recent years, many algorithms have been introduced forvarious applications, from load forecasting to wind and solarpower generations.

One of the biggest challenges of scenario forecasts is thedifficulty of modeling and learning the underlying stochasticprocesses that drives renewable power generation [8]. Previousstatistical or physical methods like first-order autoregressivemodel [9], ensemble methods and Gaussian Copula [2], [10],[11] either required strong statistical assumptions or detailedphysical measurements and modeling. What’s more, most ofthese methods focus on capturing the marginal distributionof each individual time slots of the forecasting horizon,

while paying less attention to the temporal correlations in thescenarios [12].

0 50 100 150 200 2500246810121416

Generator

0 50 100 150 200 2500

0.0050.010.0150.020.0250.030.0350.040.0450.05

0 50 100 150 200 2500246810121416

0 50 100 150 200 2500246810121416

0 50 100 150 200 2500246810121416

Discriminator...

...

...

...

...

...

...

...

...

...

...

...

Time-seriesrecords

0246810121416

...

...

Noise input

Generatedsamples

Output predictionreal/fake

0 10020 40 60 80

1

0

Historical data

Generator

...

...

...

...

...

...

(a)

(b)

GAN training

Optimization

0 20 40 60 80 100

1

0

Group ofOptimal inputs

Group ofGenerated scenarios

0 50 100 150 200 250

Realized

Scenarios

0 100 200 300 400 5000246810121416

?Historical

Dataminimize loss

Pow

er v

alue

[MW

]

Pow

er v

alue

[MW

]

Pow

er v

alue

[MW

]

0246810121416

Figure 1. Model architecture for our proposed method. Duringthe training process (a), the generator transforms noise vectors intogenerated time-series, and the discriminator is fed with both historicaltime-series and generated time-series, and try to discriminate thesource of the input. Once training completes, we are able to optimizeover the noise vectors to find the future scenarios from generatoroutput conditioned on historical observations (b).

In [13], an unsupervised machine learning algorithm usingGenerative Adversarial Networks (GANs) [14] was introducedto directly generate realistic scenarios based only on historicaldata, without the need to fit an explicit model. In this paper,we extend the algorithm to the scenario forecasting problem.Compared to the work in [13], this paper focuses on gener-ating scenarios conditioned on a given central forecast. Ourproposed method is entirely model-free and data driven. Basedon deep learning, the GANs used in our proposed methodare unsupervised learners who can directly learn and generate

arX

iv:1

711.

0224

7v2

[m

ath.

OC

] 1

9 M

ar 2

018

Page 2: An Unsupervised Deep Learning Approach for Scenario Forecasts · driven, producing a set of scenarios that represent possible future behaviors based only on historical observations

0

0 12060

hour

mins

(a)

(b)

Lead Time=4 Hours Lead Time=6 Hours Lead Time=12 Hours

Win

d Po

wer

[MW

]A

utoc

orre

latio

n

0 12060mins

0 12060mins

1684 12 0 hour1684 12 0 hour1684 120

2

4

6

8

10

12

14

0

2

4

6

8

10

12

14

0

2

4

6

8

10

12

14

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

-0.6

-0.4

-0.2

0

0.2

0.4

0.6

0.8

1

-0.5

0

0.5

1

Figure 2. In (a) we show a group of 20 generated scenarios (red) with different scale of prediction lead time corresponding to the truemeasurements (blue). In (b) we show the associated autocorrelation plots for both measured values and generated scenarios.

time-series which hold same properties as the training time-series. The following optimization step would help us findgroup of scenarios based on point forecasts. Our approachcan be used for a variety of scenario forecasts problems, e.g.,wind and solar generation, and is easy to adjust the forecasthorizon (e.g. ranging from 1-2 hours to 1-2 days) with littletuning. Specifically, we make the following contributions:

1) Based on any provided point forecast method along withhistorical observations, our method is able to generatea group of short-term forecasting scenarios representingthe temporal correlations and fluctuation distribution.

2) The proposed approach can generate scenarios withoutforecast horizon or number restrictions.

3) The proposed approach is free of statistical assumptionsand ready to use in real power generation processes.

Our proposed method for scenarios forecasts contains thefollowing two components:

1) Generative Adversarial Networks: A generative adver-sarial networks (GANs) is composed of two deep neuralnetworks, the generator and the discriminator, who play azero-sum game. GANs are provided with past observations ofrenewables generation processes. Suppose these samples aredrawn from an unknown underlying “true” distribution. Thegenerator has access to a well-defined noise distribution (e.g.,Gaussian) and can draw i.i.d. samples from this distribution.The generator’s goal is to find a function that transforms avector from the known noise distribution to a sample followingthe same distribution as past observations. The discriminator’sgoal is to distinguish the generated data and the true historicaldata. By training the generator and the discriminator to anequilibrium, the discriminator can no longer distinguish be-tween generated and historical data, which means the generatorcan produce realistic time-series samples as if they are comingfrom the true distribution [14], [15]. Section. II describes this

approach in more detail.2) Optimization of scenarios forecasts: We are interested

in forecasting a group of scenarios which could inform systemoperators the possible future realizations of power generationprocess. In Section. III we detail the setup of the optimizationproblem with a pre-trained GANs model. We also show howthe optimization problem can be solved iteratively to obtainhigh-quality scenario forecasts. Some generated scenarios withdifferent forecast horizons are shown in Fig. 2.

Numerical simulation are performed and evaluated inSection.IV. We will show our generated scenarios not onlysatisfy the needs of reliability and accurateness as a fore-casting tool, they also capture the temporal dynamics ofpower generation. We also make our code for the proposedmethod freely available1, which can meet the needs for anefficient computation tool for generating reliable and accuratescenarios.

II. GENERATIVE ADVERSARIAL NETWORKS

In this section, we describe the setup for GANs [14]. Wefirst formulate the training objectives for the discriminator andthe generator respectively, and show GANs is a good fit togenerate a potentially unlimited number of renewable powerproduction time-series. In Section III we illustrate how thistime-series producer can be served in an optimization problemto find desired scenarios forecasts.

The architecture of GANs we use is shown in Fig. 1a.Assume observations xt

j for times t ∈ T of renewable powerproduction are available for each power plant j, j = 1, ...,N.Denote the true distribution of the observation as PX , whichis unknown and maybe difficult to model because of complexspatial and temporal correlations. Suppose we have access toa group of noise vector input z under a known distribution

1https://github.com/chennnnnyize/Scenario-Forecasts-GAN

Page 3: An Unsupervised Deep Learning Approach for Scenario Forecasts · driven, producing a set of scenarios that represent possible future behaviors based only on historical observations

Z ∼ PZ that is easily sampled from (e.g., jointly Gaussianor uniform). Given a sample z drawn from PZ , our objectiveis to find a function G such that after transformation, G(z)follows PX . This is accomplished by simultaneously trainingtwo deep neural networks: the generator network G(z;θ (G))and the discriminator network D(x;θ (D)). Here, θ (G) and θ (D)

denote the weights of two neural networks, respectively. Forconvenience, we sometimes suppress the symbol θ .Generator: During the training process, the generator istrained to take a batch of inputs from the noisy distribution PZ ,and by taking a series of up-sampling operations by neurons ofdifferent functions, and to output realistic time-series samples.Ideally, they should appear as if drawn from PX . Therefore,after training finishes, the mapping G(z;θ (G)) should followthe true data distribution PX .Discriminator: The discriminator is trained simultaneouslywith the generator. It takes input samples either comingfrom real historical data or coming from the generator. Bytaking a series of operations of down-sampling using anotherdeep neural network, it outputs a continuous value preal thatmeasures to what extent the input samples belong to PX . Thediscriminator can be expressed as D(x;θ (D)), where x maycome from PX or PZ . The discriminator is trained to learn todistinguish between PX from PZ , and thus to maximize thedifference between D(X) (X from real data) and D(G(Z)).

With the objectives for the discriminator and the generatordefined, we can now formulate loss function LG for thegenerator and LD for the discriminator to train to optimize theperformance of them (i.e., update neural networks’ weightsbased on the losses). A small LG reflects that G(z) is asrealistic as possible from the discriminator’s perspective, e.g.,the generated scenarios are “looking like” historical scenariosto the discriminator. Similarly, a small LD indicates discrim-inator is good at telling the difference between generatedscenarios and historical scenarios, which means there is a largedifference between PG and PX . Following this guideline andthe loss defined in [15], we define LD and LG as:

LG =−EZ [D(G(Z))] (1a)LD =−EX [D(X)]+EZ [D(G(Z))]. (1b)

In the above, the expectations are taken as empirical averagesbased either on the historical data or on the generated data.Note the functions D and G are parametrized by the weightsof two distinct deep neural networks.

We can now combine (1a) and (1b) to construct the minimaxgame value function V (G,D) for these two players:

minG

maxD

V (G,D) = EX [D(X)]−EZ [D(G(Z))] (2)

where V (G,D) is the negative of LD.During first few training iterations, G just generates time-

series samples G(z) totally different from samples in PX ,and after learning from those samples coming from PX , thediscriminator is able to reject G(z) with high confidence.In that case, LD is small, and LG, V (G,D) are both large.The generator gradually learns to generate more realistically

looking samples, while at the same time the discriminator isalso trained to distinguish these newly fed generated samplesfrom G. As training moves on and moves close the theequilibrium, G is able to generate samples that look as realisticas real power generation time-series corresponding to a smallLG value, while D is unable to distinguish G(z) from PX withlarge LD. Eventually, we are able to learn an unsupervisedrepresentation of the probability distribution of renewablestime-series. By sampling z from known distribution, we getG(z) that appears “as if” it was sampled from the truedistribution.

More formally, the minimax objective (2) of the game canbe interpreted as the dual of the so-called Wasserstein dis-tance (Earth-Mover distance) [16]. The Wasserstein distancebetween two distribution X and Y measures the effort (or“cost”) needed to transport X to Y. It is shown in [15] thatwe are precisely trying to get two distributions, Px and PG(z)to be close to each other by defined loss for G and D in (1a)and (1b) respectively.

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

Dis

crim

inat

or’s

out

put Real

Gaussian CopulaGenerated

0x 104Iterations

0.2 0.4 0.6 0.8 1.0 1.2

Figure 3. The evolution of discriminator D’s output during trainingprocess. Generated time-series(red) are indistinguishable from realmeasurements (blue). As a comparison, we also feed a batch ofscenarios(green) generated by Gaussian Copula method, which iseasy to distinguish from the discriminator’s perspective.

Note that unlike previous approaches for generating sce-narios given historical observations, which all involve themodeling of renewables generation stochastic processes [2],[10], [11], by using GANs we bypass the step of learningor modeling Px explicitly. It also avoids complicated latentmodeling in Variational Autoencoders [17]. The training al-gorithm of GANs for generating renewables time-series issummarized in Algorithm 1. In Fig. 3, we show the evolutionof the loss function of the discriminator during training processand how the generated samples learn to mimic real historicaldata. In comparison, we show that the discriminator can alwaysreliably detect that the scenarios generated using a Gaussiancopula method [2] from the true realizations.

III. SCENARIOS FORECASTS USING GANS

In this Section, we show by formulating the scenario fore-casts as an optimization problem, a trained GANs can be used

Page 4: An Unsupervised Deep Learning Approach for Scenario Forecasts · driven, producing a set of scenarios that represent possible future behaviors based only on historical observations

Algorithm 1 GANs for Time-Series GenerationInput: Learning rate η , clipping parameter c, batch size

m, Number of iterations for discriminator per generatoriteration ndiscri

Initialize: Initial weights θ (D), θ (G)

while θ (D) has not converged dofor t = 0, ...,ndiscri do

# Update parameter for DiscriminatorSample batch from historical data:{x(i)}m

i=1 ∼ PXSample batch from Uniform distribution:{z(i)}m

i=1 ∼Uni f (−1,1)Update discriminator nets using gradient descent:gθ (D) ← ∇θ (D) [− 1

m ∑mi=1 D(x(i))+ 1

m ∑mi=1 D(G(z(i)))]

θ (D)← θ (D)−η ·RMSProp(θ (D),gθ (D))

θ (D)← clip(θ (D),−c,c)end for# Update parameter for GeneratorUpdate generator nets using gradient descent:gθ (G) ← ∇θ (G)

1m ∑m

i=1 D(G(z(i)))θ (G)← θ (G)−η ·RMSProp(θ (G),gθ (G))

end while

to generate a group of scenarios given past observations andpoint forecasts.

A. Mathematical Formulation

For a typical renewable power generation site, assume attimestep t, we have records for actual past power outputsphist ∈ Rh+1 with phist = [pt . . . pt−h]

T . Meanwhile, we havesome forecasting method to obtain the point forecasts pt+ifor each look-ahead time i = 1, ...,k given phist . This forecastis denoted by ppred = [pt+1 . . . pt+k]

T , where k is the fore-casting horizon. Based on the historical information and thepoint forecast, we are interested in generating a group of Nscenarios S= {s1, ...,sN}, si ∈Rk, which represent the possiblevariations around the point forecast and accurately reflect thetemporal dynamics of future generation. Note we focus on thescenario forecasting problem, so the central point forecast canbe provided by any method.

Assume we have trained a GANs model based on the setof observations. Given some input noise z, G(z) generatesa possible realization without regarding to the historicallyobserved data phist and the point forecast ppred . Therefore, weneed to constrain the possible G(z) to satisfy two conditions:1) the part of G(z) from time index t−h to t should be closeto the historical data phist ; 2) the part of G(z) from time indext+1 to t+k should be realistic and respect the point forecast.

To describe these conditions, we introduce two projectionoperators that separate a vector into two parts. Given v ∈Rh+k+1, we denote two projection operations Phist( f (v)) =[v1, ...,vh+1]

T and Ppred( f (v)) = [vh+2, ...,vh+k+1]T to extract

former h+1 and latter k dimensions of v, respectively.Meanwhile, we want to constrain generated scenarios do

not conflict with the information provided by point fore-casts ppred (e.g., information from numerical weather predic-

tion (NWP)). Then we can constrain latter part of G(z) so thatthey do not conflict with the given ppred . Then to ensure thatthe first part of G(z) resembles Phist , we use the following costfunction:

||Phist(G(z))−phist ||2. (3)

To ensure the generated scenarios are realistic, we add a lossterm −D(G(z)) where D is the discriminator output (recalllarger discriminator ouput indicates more realistic samples).Finally, we use the point forecast ppred by defining a predictioninterval that the generated scenarios should lie in [18]. Wedescribe this interval with an upper bound Uα(ppred) and alower bound Lα(ppred), controlled by a parameter α (can beinterpreted as the prediction confidence):

Lα(ppred) =1α

ppred , Uα(ppred) = α ppred (4)

Using all of the objectives and constraints above, given theobservation and forecast vector pair phist , ppred , and GANspre-trained model G, D, the scenario forecasts problem canbe formulated as a constrained optimization problem:

minz||Phist(G(z))−phist ||2− γ D(G(z)) (5)

s.t. z ∈ Z (5a)Lα(ppred)≤ Ppred(G(z))≤Uα(ppred). (5b)

where γ is a weighting parameter; (5a) constrains z to bewithin the domain of G(z), which we take to be a hypercubeZ = [−1,1]h+k+1; (5b) constrains the generators’ output to bewithin the given prediction intervals given ppred . By solvingabove optimization problem, we can obtain a forecastingscenario Ppred(G(z∗)).

Since both of the objective and constraints in (5) are non-convex, to deal with the inequality constraints (5b), we proposeto substitute it into the main objective with two log barriers.Then the optimization problem is reformulated as

minz||Phist(G(z))−phist ||2−β log(Ppred(G(z))−Lα(ppred))−β log(Uα(ppred)−Ppred(G(z)))− γ D(G(z)) (6a)

s.t. z ∈ Z . (6b)

where β is the weighting parameter for log barriers.In next subsection, we will illustrate how we are able to

find a group of solutions z for problem defined in (6) giventhe fixed, differentiable, yet non-convex function G(·).

B. Forecasting Scenarios with Pre-trained GANs

Because of the highly non-convex nature of G(·) and D(·),there exist many local optima in (5). The key to finding agroup of solutions to (5) exploits that fact. Figure 4 showsthe landscape of solutions in a one-dimensional illustration. Toensure that we reach a good local optimum, we add momentumto the gradient descents algorithm [19] to skip from saddlepoints and shallow local optima.2

2There is a growing body of literature on the local optima of non-convexfunctions and interested readers can refer to [19] and the references within.

Page 5: An Unsupervised Deep Learning Approach for Scenario Forecasts · driven, producing a set of scenarios that represent possible future behaviors based only on historical observations

1x

2x 3x

1*x 2*x 3*x

1locx

( )f x

Figure 4. Several optimal points exist on the 1-D curve, whiledifferent initial points and momentum gradient descents can helpreach these optimal points by getting over the local optimal points.

Since there are multiple local optima to (5), we can startat different initial points zi ∈ Z and find distinct forecastingscenario Ppred(G(z∗i )) by solving (6) using gradient descentswith momentum (MomentumGD). As the training loss definedin (2) incurs G to generate diverse modes given different z,we are able to obtain a group of distinct yet realistic scenarioswith different initial starting values.

In order to obtain good starting points for z whichPpred(G(z))) do not fall outside of the log barriers in (6), wefirst solve the following subsidiary problem:

minz||Ppred(G(z))−pinitial ||2 (7)

s.t. z ∈ Z (7a)Lα(ppred)≤ pinitial ≤Uα(ppred). (7b)

where pinitial is sampled uniformly at random from[Lα(ppred), Uα(ppred)]. In order that we can always obtaina good initial z, we set α in (7) to be slightly smaller than αin main objective function (6). In Algorithm 2 we summarizeour approach for generating a group of scenarios providedwith a pre-trained GANs weights as well as pairing historicalmeasurements and point forecasts.

IV. SIMULATION RESULTS

In this section, we study the performance of the proposedmethod for scenario forecasts over various forecasting hori-zons. We focus on wind power production, and show thescenarios generated by our method not only conform to thestatistical properties of real measurements, but also capture thespatial and temporal correlations. We also show that our ap-proach is flexible to generate scenarios for varying predictionintervals as well as prediction horizons. All experiments areimplemented using Python 2.7 with deep learning open-sourcepackage TensorFlow [20]. The GANs training procedure isaccelerated by two Nvidia Geforce GTX TITAN X GPUs.

Both the generator and the discriminator deep neural net-works are composed of two convolutional layers and twofully-connected layers. All models in this paper are trainedusing RmsProp optimizer [21], which is a learning-rate self-adaptive gradient descent algorithm. Weights for neurons inboth neural networks were initialized from a centered normaldistribution with standard deviation of 0.02. Batch normal-ization is adopted before every layer except the input layerto stabilize learning by normalizing the input of every layer

Algorithm 2 Forecasting Scenarios with GANs

Input: Pre-trained GANs model weights θ (G), θ (D)

Input: Measurements phist , point forecast ppredInput: PI level α , initial PI level αsub, weighting parameters

γ,β , initial point finder iterations ninit , scenario finderiterations nscen, learning rate η , scenario number N

Initialize: Generated scenarios S ← /0for scenario = 0, ...,N do

Sample pinitial ∼Uni f (Lαsub(ppred),Uαsub(ppred))Sample z∼Uni f (−1,1)# Find good initial zfor iteration = 0, ...,ninit do

Update z using gradient descent:gz← ∇zLsub #Lsub is defined by 7z← z− z ·MomentumGD(z,gz)z← clip(z,−1,1)

end for# Find forecasting scenariosfor iteration = 0, ...,nscen do

Update z using gradient descent:gz← ∇zLmain #Lmain is defined by 6z← θ (D)−η ·MomentumGD(z,gz)z← clip(z,−1,1)

end forS.insert(G(z))

end for

to have zero mean and unit variance. With exception of theoutput layer, rectified linear units (ReLU) has been used as theactivation function in the generator, while Leaky-ReLU is usedin the discriminator. We observed in Algorithm 1, a setting ofndiscri = 4 could get the fastest convergence rate for D. Oncethe discriminator has converged to similar outputs value forD(G(Z)) and D(X), the generator was able to generate realisticpower generation samples.

A. Description of Data

In order to test the performance of our proposed frameworkfor scenario generation, we set up our numerical simulationsbased on wind power data published by the NREL WindIntegration National Data Set (WIND) Toolkit 3. Actual powermeasurements have a resolution of 5 minutes. The datasetalso contains deterministic, day-ahead forecasts along withestimated 10% and 90% forecast quantiles. The detailed NWP-based forecasts method is described in [22]. We construct ourdataset by aggregating 20 wind turbines’ records from Jan.1st,2007 to Dec. 31st, 2013. All these wind turbines are locatedin WA, USA and are of geographical proximity. Selected windturbines have a nominal capacity of 16MW . In total there are14,728,320 measurements, and we split 90% of daily samplesas the real training data for our GANs model, while theremaining 10% samples are only used to test the performanceof the proposed scenario forecasts method. All wind powermeasurements and forecasts are normalized to [0,1].

3https://www.nrel.gov/grid/wind-integration-data.html

Page 6: An Unsupervised Deep Learning Approach for Scenario Forecasts · driven, producing a set of scenarios that represent possible future behaviors based only on historical observations

B. Validation FrameworkThe validation of the quality of generated scenarios is more

complex than evaluating the performance of point forecasts.On the one hand, generated scenarios should be realisticenough to reflect the interdependence structure of forecastingvalues at different prediction horizons; on the other hand theyshould represent all the possible future realizations given pastobservations at certain wind farm.

50 100 150 200 250

50

100

150

200

250

50 100 150 200 250

50

100

150

200

2500.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

Horizon (5 mins) Horizon (5 mins)

0 0

00

Hor

izon

(5 m

ins)

Hor

izon

(5 m

ins)

(a) Measurements (b) Scenarios

Figure 5. The Pearsons correlation matrix for a group of historicalrealizations (left) and scenarios forecasts (right) indicates our scenarioforecasts method capture the linear correlation for forcasting leadtime varying from 5 minutes to 1 day.

First, we examine generated scenarios’ temporal statistics.We calculate and compare samples’ autocorrelation with re-spect to look-ahead time k:

R(k) =E[(st −µ)(st+k−µ)]

σ2 (8)

where s∈ S represents sample either of generated scenarios orrealizations with mean µ and variance σ .

We also make use of the Pearson’s correlation coefficient,which is a standard method to evaluate the linear relationshipof time-series at various look-ahead times. Given the set ofgenerated scenarios or realizations S, each term ρi, j in thePearson’s correlation matrix denotes the Pearson correlationfor lead time i and j, and is calculated by

ρi, j =Cov(Si,S j)

σSiσS j

(9)

where Cov(Si,S j) is the covariance of Si and S j.In order to verify the group of generated scenarios are able

to represent possible future realizations, the scenarios shouldbe able to cover the actual value of power generation (reliable),while at the same time distance between generated scenariosshould be small (sharp). We make use of the ContinuousRanked Probability Score (CPRS) [23], which is a negatively-oriented score (smaller scores are better). It is a comprehensivemetric that jointly evaluates the reliability and sharpness ofgenerated scenarios. The score at lead time k is defined as

CPRSk =1N

N

∑t=1

∫ 1

0(Ft+k|t(p)− I(p≥ pt+k))

2d p (10)

where N is the total number of evaluated scenarios, Ft+k|t(p)is the cumulative distribution for normalized generated sce-narios’ value at lead time k, and I(p ≥ pt+k) is the indicator

RealizedPredictedScenarios

0 50 100 150 200 2500

2

4

6

8

10

12

14

16RealizedPredictedScenarios

0

2

4

6

8

10

12

14

16

0 50 100 150 200 250

0 50 100 150 200 2500

2

4

6

8

10

12

14

16RealizedPredictedScenarios

(a)

(b)

(c)Po

wer

val

ue[M

W]

Pow

er v

alue

[MW

]Po

wer

val

ue[M

W]

Horizon (5 mins)

Horizon (5 mins)

Horizon (5 mins)Figure 6. Plot (a) (b) (c) correspond to group of 10 day-aheadscenarios (red) with varying PIs of 1.5, 2 and 3 respectively.

function to compare the normalized scenarios and measure-ments. Since we are not using quantile statistics to calculateFt+k|t(p), we use the discrete-valued Ft+k|t(p) to calculate (10).

C. Simulation Results

Recall that Fig. 3 showed the output evolution of D,where D(x) and D(G(z)) converged after about 6,000 trainingiterations. In addition, we evaluate the quality of a generatedtime-series from empirical Gaussian Copula method [2]. Inthis case, D is able to distinguish the generated samples fromreal measurements. This observation suggests that eventhoughGaussian Copula method tries to model the interdependencestructure for time-series, the generated scenarios are stilldifferent from actual realizations.

We then validate if scenarios coming from our method havesimilar temporal correlation as the actual wind power values.In Fig. 5 we plot the colormap for the covariance matrix ofa group of 32 wind turbines’ 24−hour actual measurements,along with 1,600 forecasting scenarios with 50 scenarios foreach realization. The x− and y− axes are for the predictionhorizon k. Similar covariance matrix element values indicate

Page 7: An Unsupervised Deep Learning Approach for Scenario Forecasts · driven, producing a set of scenarios that represent possible future behaviors based only on historical observations

that without any model assumptions being made during train-ing process, our proposed scenario generation method is ableto capture the temporal dependency accurately.

Fig. 2 shows a group of 20 generated scenarios withforecasts lead time ranging from 4h to 12h. We show thatby only changing projection length k, our approach is able toconveniently generate reliable and sharp scenarios for differentforecast horizons. Meanwhile, these scenarios’ autocorrelationplots cover the realizations, which indicate the generatedscenarios are able to represent the temporal dependence ofany length.

0 5 10 15 2002468

1012141618

CPR

S [%

]

Lead Time [Hours]

Proposed MethodGaussian Copula

24

Figure 7. CPRS of scenarios generated by proposed method andempirical Gaussian Copula [2].

In Fig. 6 we specifically select one 24−hour sample whosepoint forecast is deviating a lot from the actual measure-ments. By selecting different prediction intervals, our proposedmethod could reflect the trade-off between reliability andsharpness. When the interval level is α = 1.5, generatedscenarios are close to point forecasts, yet fail to cover the re-alizations; while when α = 3, generated scenarios could coverthe actual power production values, but are less concentrated.

The performance of the proposed method is also demon-strated by the CPRS score. Results for our approach andGaussian copula method are plotted in Fig. 7. Both approachesuse the same training dataset to get the time-series generatoror to find the estimate of the covariance matrix, and are testedon the stand-alone testing samples. The proposed methodhas better performance at different lead time compared toGaussian Copula. Since point forecasts normally accumulatelarger errors with longer forecast horizons, both methods havegrowing CPRS values with respect to forecasting horizons.

V. CONCLUSION

In this paper we proposed a data-driven unsupervised ma-chine learning approach for forecasting scenarios of renew-ables power generation processes. The proposed method isflexible and easily implemented in problems with high pene-tration of renewables. Numerical results show that comparingwith existing scenario generation approaches, the proposedmethod is able to generate realistic, high quality scenarioscapturing spatiotemporal behaviors of renewables without anyexplicit model construction.

REFERENCES

[1] M. Lei, L. Shiyan, J. Chuanwen, L. Hongling, and Z. Yan, “A reviewon the forecasting of wind speed and generated power,” Renewable andSustainable Energy Reviews, vol. 13, no. 4, pp. 915–920, 2009.

[2] P. Pinson, H. Madsen, H. A. Nielsen, G. Papaefthymiou, and B. Klockl,“From probabilistic forecasts to statistical scenarios of short-term windpower production,” Wind energy, vol. 12, no. 1, pp. 51–62, 2009.

[3] Y. Wang, Y. Liu, and D. S. Kirschen, “Scenario reduction with submod-ular optimization,” IEEE Transactions on Power Systems, vol. 32, no. 3,pp. 2479–2480, 2017.

[4] C. Tang, Y. Wang, J. Xu, Y. Sun, and B. Zhang, “Economic dispatchconsidering spatial and temporal correlations of multiple renewablepower plants,” arXiv preprint arXiv:1707.00237, 2017.

[5] Y. Gu and L. Xie, “Stochastic look-ahead economic dispatch withvariable generation resources,” IEEE Transactions on Power Systems,vol. 32, no. 1, pp. 17–29, 2017.

[6] Y. Feng, I. Rios, S. M. Ryan, K. Spurkel, J.-P. Watson, R. J.-B. Wets,and D. L. Woodruff, “Toward scalable stochastic unit commitment. part1: load scenario generation,” Energy Systems, vol. 6, no. 3, pp. 309–329,2015.

[7] G. Osorio, J. Lujano-Rojas, J. Matias, and J. Catalao, “A new scenariogeneration-based method to solve the unit commitment problem withhigh penetration of renewable energies,” International Journal of Elec-trical Power & Energy Systems, vol. 64, pp. 1063–1072, 2015.

[8] G. Giebel, R. Brownsword, G. Kariniotakis, M. Denhard, and C. Draxl,“The state-of-the-art in short-term prediction of wind power: A literatureoverview,” ANEMOS. plus, Tech. Rep., 2011.

[9] R. Barth, L. Soder, C. Weber, H. Brand, and D. J. Swider, “Methodologyof the scenario tree tool,” Wilmar Deliverable, vol. 6, 2006.

[10] S. Delikaraoglou and P. Pinson, “High-quality wind power scenarioforecasts for decision-making under uncertainty in power systems,” in13th International Workshop on Large-Scale Integration of Wind Powerinto Power Systems as well as on Transmission Networks for OffshoreWind Power (WIW 2014), 2014.

[11] T. Wang, H.-D. Chiang, and R. Tanabe, “Toward a flexible scenariogeneration tool for stochastic renewable energy analysis,” in PowerSystems Computation Conference (PSCC), 2016. IEEE, 2016, pp. 1–7.

[12] X.-Y. Ma, Y.-Z. Sun, and H.-L. Fang, “Scenario generation of windpower based on statistical uncertainty and variability,” IEEE Transac-tions on Sustainable Energy, vol. 4, no. 4, pp. 894–904, 2013.

[13] Y. Chen, Y. Wang, D. S. Kirschen, and B. Zhang, “Model-free renew-able scenario generation using generative adversarial networks,” IEEETransactions on Power Systems, 2018.

[14] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley,S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” inAdvances in neural information processing systems, 2014, pp. 2672–2680.

[15] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein GAN,” arXivpreprint arXiv:1701.07875, 2017.

[16] C. Villani, Optimal transport: old and new. Springer Science &Business Media, 2008, vol. 338.

[17] D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” arXivpreprint arXiv:1312.6114, 2013.

[18] C. Wan, Z. Xu, P. Pinson, Z. Y. Dong, and K. P. Wong, “Probabilisticforecasting of wind power generation using extreme learning machine,”IEEE Transactions on Power Systems, vol. 29, no. 3, pp. 1033–1044,2014.

[19] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, “On the importanceof initialization and momentum in deep learning,” in Internationalconference on machine learning, 2013, pp. 1139–1147.

[20] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S.Corrado, A. Davis, J. Dean, M. Devin et al., “Tensorflow: Large-scalemachine learning on heterogeneous distributed systems,” arXiv preprintarXiv:1603.04467, 2016.

[21] T. Tieleman and G. Hinton, “Lecture 6.5-rmsprop: Divide the gradientby a running average of its recent magnitude,” COURSERA: Neuralnetworks for machine learning, vol. 4, no. 2, pp. 26–31, 2012.

[22] B.-M. Hodge, “Final report on the creation of the wind integrationnational dataset (wind) toolkit and api: October 1, 2013-september 30,2015,” NREL (National Renewable Energy Laboratory (NREL), Golden,CO (United States)), Tech. Rep., 2016.

[23] P. Pinson and R. Girard, “Evaluating the quality of scenarios of short-term wind power generation,” Applied Energy, vol. 96, pp. 12–20, 2012.