Analysis of day-ahead electricity data
Zita Marossy & Márk Szenes (ColBud)
MANMADE workshop
January 21, 2008
Topics
Stylized facts of electricity price data
  Modeling variable: price
  Autocorrelation structure
  Persistence
  Price distribution
  Seasonality
Time series modeling
  Neural network
  SETAR
Main results
Persistence analysis
  Underlying variable: price, not price change
  Results: H = 0.7-0.97 (0.8)
Price distribution
  Generalized extreme value distribution vs. Lévy distribution
Design of a seasonal filter
  Filtering the intra-weekly seasonality
Performance evaluation of an ANN model
  Reasonable for short-run forecasts
SETAR model for determining price spikes
Data: EEX, hourly day-ahead prices
Autocorrelation structure
Seasonality: the effect of intra-weekly seasonality is strong
The autocorrelation (AC) decays slowly
[Figure: Sample Partial Autocorrelation Function; x-axis: Lag (0 to 350), y-axis: Sample Partial Autocorrelations]
[Figure: Sample Autocorrelation Function (ACF); x-axis: Lag (0 to 350), y-axis: Sample Autocorrelation]
[Figure: Sample Autocorrelation Function (ACF); x-axis: Lag (0 to 5 x 10^4), y-axis: Sample Autocorrelation]
Modeling prices, not price changes
1. The price process has no unit root, so there is no need to difference the time series
2. Electricity cannot be stored: ‘return’ has no direct meaning
3. Differencing causes spurious patterns in the ACF:
[Figure: two autocorrelation plots; x-axis: lag (1 to 97)]
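The third point can be illustrated on synthetic data. The sketch below is not the authors' computation: it uses Gaussian white noise (an assumption, chosen because its true ACF is known) and shows that first differencing alone introduces a lag-1 autocorrelation close to -0.5. The helper names `sample_acf` and `difference` are illustrative.

```python
import random


def sample_acf(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


def difference(x):
    """First difference: x[t] - x[t-1]."""
    return [b - a for a, b in zip(x, x[1:])]


# Synthetic white noise (assumption: stands in for a series with no
# autocorrelation; it is not electricity price data).
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(5000)]

acf_level = sample_acf(noise, 3)
acf_diff = sample_acf(difference(noise), 3)

print("lag-1 ACF, levels:     ", round(acf_level[1], 3))
print("lag-1 ACF, differenced:", round(acf_diff[1], 3))
```

The differenced noise is a non-invertible MA(1) process, whose theoretical lag-1 autocorrelation is -0.5, so the negative spike is produced by the differencing step and not by the data.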
Persistence analysis
Calculating the Hurst exponent of prices, without differencing the time series
Hurst exponent – classical usage (with differencing the time series first):
  > 0.5: persistent process (a high ‘return’ shock is followed by a high ‘return’)
  = 0.5: random walk (‘return’ is white noise)
  < 0.5: antipersistent process (mean reversion)
Hurst exponent – without differencing:
  > 0.5: persistent process (a high price is followed by a high price; are high prices persistent?)
  = 0.5: white noise
  < 0.5: antipersistent process
Hurst exponent: estimation results
Method                               Estimated Hurst exponent
Aggregated variance                  0.872
Differenced aggregated variance      0.702
Aggregated absolute values/means     0.924
Fractal dimension (Higuchi)          0.967
Residuals of regression (Peng)       0.811
R/S                                  0.835
Periodogram                          0.891
Modified periodogram                 0.770
Wavelet                              0.839
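As an illustration of the first method in the table, here is a minimal sketch of an aggregated-variance estimator. It is not the authors' implementation. It relies on the standard scaling relation that the variance of block means of size m behaves like m^(2H-2), so H is recovered from the slope of a log-log regression. The block sizes and the white-noise test series are assumptions made for the example.

```python
import math
import random


def hurst_aggregated_variance(x, block_sizes):
    """Estimate H from the slope of log Var(block means) against log m."""
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = len(x) // m
        means = [sum(x[i * m:(i + 1) * m]) / m for i in range(n_blocks)]
        grand = sum(means) / n_blocks
        var = sum((v - grand) ** 2 for v in means) / (n_blocks - 1)
        log_m.append(math.log(m))
        log_var.append(math.log(var))
    # Ordinary least squares slope of log_var on log_m.
    mx = sum(log_m) / len(log_m)
    my = sum(log_var) / len(log_var)
    slope = sum((a - mx) * (b - my) for a, b in zip(log_m, log_var)) / sum(
        (a - mx) ** 2 for a in log_m
    )
    # Var ~ m^(2H - 2)  =>  slope = 2H - 2  =>  H = 1 + slope / 2.
    return 1.0 + slope / 2.0


# Synthetic white noise (assumption): without differencing, H should be near 0.5.
rng = random.Random(42)
series = [rng.gauss(0.0, 1.0) for _ in range(4096)]
h_noise = hurst_aggregated_variance(series, [2, 4, 8, 16, 32, 64])
print("estimated H for white noise:", round(h_noise, 2))
```

Applied to the price level rather than to price changes, a value near 0.5 corresponds to white noise, in line with the interpretation given for the undifferenced case.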
Price distribution
Two estimated distributions: Lévy
Generalized extreme value
[Figure: Density; x-axis: Data, y-axis: Density; series: EEX (daily), GEV (EEX)]
[Figure: Cumulative probability; x-axis: Data, y-axis: Cumulative probability; series: EEX (daily), GEV (EEX)]
$$F_{k,\mu,\sigma}(x) = \exp\left\{-\left[1 + k\,\frac{x-\mu}{\sigma}\right]^{-1/k}\right\}$$
Comparison
Kolmogorov test:
  Test statistic: Lévy: 0.0141, GEV: 0.0262
  Mean of absolute differences: Lévy: 8.07*10^-4, GEV: 7.18*10^-4
[Figure: Differences in CDF; x-axis: price (EUR), y-axis: F_emp - F_est; series: Lévy, GEV]
$$D = \sup_x \left| F_{\mathrm{emp}}(x) - F_{\mathrm{est}}(x) \right|$$
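The two comparison measures can be sketched as follows. The Kolmogorov distance is the largest gap between the empirical cdf and a candidate cdf. How the slides compute the mean of absolute differences is not stated, so evaluating it at the data points is an assumption here, and the function names are illustrative.

```python
def kolmogorov_distance(data, cdf):
    """D = sup_x |F_emp(x) - F_est(x)|, checked on both sides of each jump."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # F_emp jumps from (i - 1) / n to i / n at x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def mean_abs_cdf_difference(data, cdf):
    """Mean of |F_emp(x_i) - F_est(x_i)| over the data points (assumed definition)."""
    xs = sorted(data)
    n = len(xs)
    return sum(abs(i / n - cdf(x)) for i, x in enumerate(xs, start=1)) / n


def uniform_cdf(x):
    """Candidate cdf for the toy example: uniform on [0, 1]."""
    return min(max(x, 0.0), 1.0)


sample = [0.25, 0.5, 0.75]  # toy data, not EEX prices
d_stat = kolmogorov_distance(sample, uniform_cdf)
mad = mean_abs_cdf_difference(sample, uniform_cdf)
print(d_stat, mad)
```

A smaller value of either measure indicates a closer fit, which is how the Lévy and GEV figures above can be read against each other.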
Seasonality
Seasonality:
  Intradaily
  Weekly
Spectral decomposition
  Periodogram of prices
  Periodogram of ACF
Filtering
  Median or average week
  Differencing
  Moving average technique
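Of the filtering options listed, the median week is the simplest to sketch. The example below is one plausible reading of that technique, not the authors' code: it assumes hourly data with a period of 168 hours and a series that starts at the first hour of a week, takes the median price for each hour of the week, and subtracts that profile.

```python
from statistics import median

HOURS_PER_WEEK = 168  # assumption: hourly data, weekly period


def median_week(prices, period=HOURS_PER_WEEK):
    """Median price for each position within the period."""
    return [median(prices[h::period]) for h in range(period)]


def remove_weekly_profile(prices, period=HOURS_PER_WEEK):
    """Subtract the median-week profile from every observation."""
    profile = median_week(prices, period)
    return [p - profile[t % period] for t, p in enumerate(prices)]


# Toy series with period 4 (for brevity) and one spike in the second cycle.
toy = [10, 20, 30, 40, 10, 20, 90, 40, 10, 20, 30, 40]
filtered = remove_weekly_profile(toy, period=4)
print(filtered)
```

Using the median instead of the mean keeps isolated spikes from distorting the seasonal profile, so the spike survives in the filtered series while the regular pattern is removed.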
[Figure: Power Spectral Density Estimate via Periodogram (Variable: price); x-axis: Normalized Frequency (×π rad/sample), y-axis: Power/frequency (dB/rad/sample)]
[Figure: Power Spectral Density Estimate via Periodogram (Variable: ACF); x-axis: Normalized Frequency (×π rad/sample), y-axis: Power/frequency (dB/rad/sample)]
[Figure: Sample week; x-axis: time, y-axis: price (EUR)]
Need for new seasonal filter
The type of distribution changes
[Figure: Quantile dependence; x-axis: mean, y-axis: high quantile]
Suggested filter
‘GEV filter’
1. Separately estimate a GEV distribution for each hour and day i: $F_{1,i}$
2. Transform the prices: $F_2^{-1}(F_{1,i}(x))$
   $F_2$: lognormal cdf (parameters: entire distribution)
3. Model the filtered prices
4. Forecast
5. Transform the forecasts back into GEV
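Steps 2 and 5 amount to a probability integral transform and its inverse. The sketch below assumes the GEV and lognormal parameters have already been estimated elsewhere; the numeric values are invented for illustration and the function names are hypothetical. It uses the usual GEV parametrisation with shape k, location mu, and scale sigma, for k != 0.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def gev_cdf(x, k, mu, sigma):
    """GEV cdf exp(-(1 + k (x - mu) / sigma) ** (-1 / k)), for k != 0."""
    z = 1.0 + k * (x - mu) / sigma
    if z <= 0.0:  # x lies outside the support
        return 0.0 if k > 0 else 1.0
    return math.exp(-(z ** (-1.0 / k)))


def gev_ppf(p, k, mu, sigma):
    """Inverse of gev_cdf for 0 < p < 1."""
    return mu + sigma / k * ((-math.log(p)) ** (-k) - 1.0)


def lognormal_cdf(x, m, s):
    return _STD_NORMAL.cdf((math.log(x) - m) / s)


def lognormal_ppf(p, m, s):
    return math.exp(m + s * _STD_NORMAL.inv_cdf(p))


def gev_filter(price, gev_params, lognormal_params):
    """Step 2: map a price through its hour-specific GEV cdf, then the lognormal quantile."""
    return lognormal_ppf(gev_cdf(price, *gev_params), *lognormal_params)


def gev_unfilter(value, gev_params, lognormal_params):
    """Step 5: map a filtered value (e.g. a forecast) back to the GEV scale."""
    return gev_ppf(lognormal_cdf(value, *lognormal_params), *gev_params)


# Hypothetical parameters for one hour of the week (not estimates from the slides).
hour_gev = (0.1, 30.0, 10.0)   # k, mu, sigma
overall_lognormal = (3.4, 0.5)  # mean and std of log price

filtered_price = gev_filter(45.0, hour_gev, overall_lognormal)
restored_price = gev_unfilter(filtered_price, hour_gev, overall_lognormal)
print(round(filtered_price, 2), round(restored_price, 2))
```

Because both maps are monotone, the ranking of prices within an hour is preserved. A price outside the fitted GEV support maps to probability 0 or 1, for which `inv_cdf` raises an error, so such cases would need separate handling.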
Empirical results
Figures: periodogram of ACF (original prices), ACF (filtered data)
Intraweekly filtering successful
[Figure: Power Spectral Density Estimate via Periodogram; x-axis: Normalized Frequency (×π rad/sample), y-axis: Power/frequency (dB/rad/sample)]
[Figure: Power Spectral Density Estimate via Periodogram; x-axis: Normalized Frequency (×π rad/sample), y-axis: Power/frequency (dB/rad/sample)]
Estimated GEV parameters
[Figure: GEV parameters; x-axis: k, y-axis: mu]
Distributions with high scale parameter
[Figure: Distributions with high shape parameters; x-axis: Hour, y-axis: mu]
Conclusion
Different hours of the week behave differently
There are a few hours with fatter tails
These are more sensitive to price spikes
We can model fat tails and forecasting separately
Performance evaluation of an ANN
Short-term price forecasting (a few hours to days)
ANN: simple but flexible tool
Architecture: standard feedforward type
Layers: 168 – 15 – 1
Input: historical data
Training set: 42 days
Prediction horizon: from 1 hour to 1 week
Performance evaluation of an ANN
Measuring error by MAPE
Testing against naive method
Averaged over 50 runs: 50 consecutive weeks from Nov. 2005 to Nov. 2006
Results:
  NN performs well in day-ahead forecasting
  But it fails to compete with the naive method over longer time horizons
Improvements: Exogenous variables
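The error measure and benchmark can be sketched as follows. MAPE is the standard mean absolute percentage error. The slides do not define the naive method, so the version here (repeat the value observed one week earlier) is an assumption, and the function names are illustrative.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent (undefined if an actual value is 0)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def naive_forecast(history, horizon, period=168):
    """Assumed naive benchmark: repeat the last observed period (one week of hours)."""
    last = history[-period:]
    return [last[h % period] for h in range(horizon)]


# Toy example with period 4 instead of 168, for brevity.
history = [10, 20, 30, 40, 12, 22, 32, 42]
actual = [12, 20, 32, 40]
benchmark = naive_forecast(history, horizon=4, period=4)
benchmark_mape = mape(actual, benchmark)
print(benchmark, benchmark_mape)
```

Computing the same `mape` for the network's forecasts at each horizon gives the comparison described above: the model is useful at a given horizon only if its error is below that of the benchmark.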
TAR (Threshold AR)
SETAR
Aim: identifying the limit (C) between high and low prices
2-state SETAR model
On daily price
Threshold: 44.26
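$$y_t = \begin{cases} \phi_1 y_{t-1} + \varepsilon_t & \text{if } y_{t-1} \le C \\ \phi_2 y_{t-1} + \varepsilon_t & \text{if } y_{t-1} > C \end{cases}$$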
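One common way to identify the threshold C in a two-regime model is a grid search: for each candidate threshold, fit each regime by least squares and keep the candidate with the smallest total sum of squared residuals. The sketch below is a simplified illustration on simulated data, not the authors' procedure. It uses one lag with an intercept per regime, and the simulation parameters, `min_obs`, and function names are assumptions.

```python
import random


def ols_line(xs, ys):
    """Fit y = a + b * x by least squares; return (a, b, ssr)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx > 0 else 0.0
    a = my - b * mx
    ssr = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return a, b, ssr


def fit_setar(y, min_obs=10):
    """Grid search over thresholds on y[t-1]; each regime is a separate AR(1) fit."""
    lagged, current = y[:-1], y[1:]
    best = None
    for c in sorted(set(lagged)):
        low = [(x, v) for x, v in zip(lagged, current) if x <= c]
        high = [(x, v) for x, v in zip(lagged, current) if x > c]
        if len(low) < min_obs or len(high) < min_obs:
            continue  # skip thresholds that leave a regime nearly empty
        fit_low = ols_line(*zip(*low))
        fit_high = ols_line(*zip(*high))
        ssr = fit_low[2] + fit_high[2]
        if best is None or ssr < best["ssr"]:
            best = {"threshold": c, "low": fit_low[:2], "high": fit_high[:2], "ssr": ssr}
    return best


# Simulated two-regime series (all numbers invented for the example).
rng = random.Random(1)
y = [30.0]
for _ in range(300):
    prev = y[-1]
    if prev <= 40.0:
        y.append(12.0 + 0.7 * prev + rng.gauss(0.0, 3.0))
    else:
        y.append(30.0 + 0.2 * prev + rng.gauss(0.0, 3.0))

result = fit_setar(y)
pooled_ssr = ols_line(y[:-1], y[1:])[2]
print("estimated threshold:", round(result["threshold"], 2))
print("two-regime SSR:", round(result["ssr"], 1), "pooled SSR:", round(pooled_ssr, 1))
```

The two-regime fit can never have a larger residual sum than the single pooled regression, so the size of the reduction, not its sign, is what indicates whether a threshold is worth modelling.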
Dependent Variable: P
Method: Least Squares
Date: 11/02/07   Time: 17:28
Sample (adjusted): 16 2499
Included observations: 2484 after adjustments

Variable        Coefficient   Std. Error   t-Statistic   Prob.
IND1               2.027058     0.776760      2.609634   0.0091
IND1*P(-1)         0.473548     0.026531     17.84886    0.0000
IND1*P(-2)        -0.057900     0.026815     -2.159224   0.0309
IND1*P(-3)         0.135780     0.027550      4.928506   0.0000
IND1*P(-4)        -0.063816     0.027579     -2.313925   0.0208
IND1*P(-5)         0.038770     0.032474      1.193888   0.2326
IND1*P(-6)         0.068898     0.037351      1.844597   0.0652
IND1*P(-7)         0.485232     0.042672     11.37114    0.0000
IND1*P(-8)        -0.159181     0.030646     -5.194250   0.0000
IND1*P(-9)        -0.050625     0.027950     -1.811270   0.0702
IND1*P(-10)       -0.028025     0.029898     -0.937356   0.3487
IND1*P(-11)       -0.024351     0.028674     -0.849228   0.3958
IND1*P(-12)       -0.031132     0.032490     -0.958191   0.3381
IND1*P(-13)        0.031935     0.029685      1.075779   0.2821
IND1*P(-14)        0.166089     0.032485      5.112824   0.0000
IND1*P(-15)       -0.043086     0.025717     -1.675384   0.0940
IND2               6.776519     2.197847      3.083253   0.0021
IND2*P(-1)         0.410020     0.028457     14.40836    0.0000
IND2*P(-2)         0.392799     0.034373     11.42748    0.0000
IND2*P(-3)        -0.060909     0.032771     -1.858627   0.0632
IND2*P(-4)         0.177400     0.033091      5.360928   0.0000
IND2*P(-5)        -0.073694     0.029019     -2.539503   0.0112
IND2*P(-6)         0.137620     0.026011      5.290871   0.0000
IND2*P(-7)         0.050451     0.028216      1.788034   0.0739
IND2*P(-8)         0.025822     0.031253      0.826232   0.4088
IND2*P(-9)        -0.183298     0.035174     -5.211262   0.0000
IND2*P(-10)       -0.043402     0.030573     -1.419632   0.1558
IND2*P(-11)        0.000582     0.031882      0.018240   0.9854
IND2*P(-12)       -0.033339     0.027918     -1.194172   0.2325
IND2*P(-13)        0.072398     0.031071      2.330104   0.0199
IND2*P(-14)        0.051356     0.030162      1.702676   0.0888
IND2*P(-15)       -0.007765     0.030624     -0.253556   0.7999

R-squared             0.677215    Mean dependent var      31.72947
Adjusted R-squared    0.673134    S.D. dependent var      18.46424
S.E. of regression    10.55641    Akaike info criterion   7.564143
Sum squared resid     273245.5    Schwarz criterion       7.639088
Log likelihood       -9362.665    Durbin-Watson stat      1.993690