Bell Labs Technical Journal 18(4), 135–158 (2014) © 2014 Alcatel-Lucent. Published by Wiley Periodicals, Inc. Published online in Wiley Online Library (wileyonlinelibrary.com) • DOI: 10.1002/bltj.21650
Demand Forecasting in Smart Grids
Piotr Mirowski, Sining Chen, Tin Kam Ho, and Chun-Nam Yu
Data analytics in smart grids can be leveraged to channel the data downpour from individual meters into knowledge valuable to electric power utilities and end-consumers. Short-term load forecasting (STLF) can address issues vital to a utility but it has traditionally been done mostly at system (city or country) level. In this case study, we exploit rich, multi-year, and high-frequency annotated data collected via a metering infrastructure to perform STLF on aggregates of power meters in a mid-sized city. For smart meter aggregates complemented with geo-specific weather data, we benchmark several state-of-the-art forecasting algorithms, including kernel methods for nonlinear regression, seasonal and temperature-adjusted auto-regressive models, exponential smoothing and state-space models. We show how STLF accuracy improves at larger meter aggregation (at feeder, substation, and system-wide level). We provide an overview of our algorithms for load prediction and discuss system performance issues that impact real time STLF. © 2014 Alcatel-Lucent.
Introduction

Smart grid deployments carry the promise of allowing better control and balance of energy supply and demand through near real time, continuous visibility into detailed energy generation and consumption patterns. Methods to extract knowledge from near real time and accumulated observations are hence critical to the extraction of value from the infrastructure investment. On the demand side, widespread deployment of smart meters that provide frequent readings allows insight into continuous traces of usage patterns that are unique to each premise and each aggregate at different levels of the distribution hierarchy. This in turn enables better designs and triggers of demand response actions and pricing strategies, and provides input to the planning for growth and changes in the distribution network. Customers may also gain better awareness of their own consumption patterns.

In this paper we report a study on demand prediction, where we analyzed near real time power consumption monitored by tens of thousands of smart meters in a medium-sized U.S. city. We developed and adapted several short-term forecasting methods for predicting the load at several levels of aggregation. In this context, short-term load forecasting (STLF) refers to the prediction of power consumption levels in the next hour, next day, or up to a week ahead. Within this time scope, one can have reliable weather forecasts, which provide important input to the prediction, as historically the load in this city is highly influenced by weather because electricity is used for both heating and cooling.
Scenario of the Study

In our study, the meters are deployed at customer locations and their readings are sampled every 15 minutes. Each meter's network description includes its geographical location (latitude, longitude); date of installation and planned removal; type of customer served; as well as which pole, which feeder section, and which substation the meter is connected to. Weather data is collected by the utility company at the substation level, and consists of hourly temperature, wind speed, and wind chill temperature. Weather data made available by the National Climatic Data Center (NCDC) and the National Oceanic and Atmospheric Administration (NOAA) provides additional measurements, such as humidity or sky cover at the location of the city airport, and hourly weather forecasts up to seven days ahead.
The load prediction algorithms that we have
investigated and implemented are embedded in a
module of a data analytic system being developed for
the utility company. The module receives meter mea-
surements and converts them to power usage values,
and aggregates usage at different levels: individual
meters, feeder sections, distribution substations, and
at the system level. It then generates load forecasts at
prediction horizons that range from 60 minutes
(next-hour predictions) to 24 hours (next-day pre-
dictions), or even 168 hours (next-week predictions).
As we will detail in a later section, the load forecasts
operate independently for each meter and meter
aggregate, communicating through a limited set of
inputs/outputs with a database to read the latest
weather forecasts and per-meter usage history and
return corresponding load forecasts. This procedure
can be parallelized, which enables some degree of
asynchronous behavior within the prediction time-
frame (the load forecasts are made with a granularity
of one hour).
Short-term load forecasts generated at the meter
(customer premise) level will provide the utility
company with customer-level smart grid capabilities
and help the company communicate with the cus-
tomer about energy saving and billing issues. STLF
generated at higher levels of aggregation (from feeder
section to city-wide) will help in planning and opera-
tion of the relevant components of the electric grid.
Panel 1. Abbreviations, Acronyms, and Terms

ACF—Auto-correlation function
ARIMA—Auto-regressive integrated moving average
ARIMAX—Auto-regressive integrated moving average with external inputs
ARMA—Auto-regressive moving average
DASARIMA—Dummy-adjusted seasonal auto-regressive integrated moving average
ENEL—Ente Nazionale d'Electricita
GARCH—Generalized auto-regressive conditional heteroscedastic
GDP—Gross domestic product
HWT—Holt-Winters model
i.i.d.—Independently and identically distributed
KPSS—Kwiatkowski, Phillips, Schmidt, and Shin
LOESS—Locally-weighted scatterplot smoothing
LSE—Least square error
LTLF—Long-term load forecast
MAPE—Mean absolute percentage error
ML—Machine learning
MTLF—Medium-term load forecast
NCDC—National Climatic Data Center
NOAA—National Oceanic and Atmospheric Administration
PACF—Partial ACF
RAM—Random access memory
SARIMA—Seasonal auto-regressive integrated moving average
SARIMAX—Seasonal auto-regressive integrated moving average with external inputs
SSM—State-space models
STLF—Short-term load forecast
SVM—Support vector machine
SVR—Support vector regression
Wh—Watt hour
WKR—Weighted kernel regression
State-of-the-Art in Load Forecasting

Electric load forecasting is a mature field of investigation and the statistical methodologies have been implemented and deployed in industrial applications. Several meta-review papers provide a good overview of the demand prediction literature [13, 17, 25, 27] and identify three sub-fields, depending on the prediction horizon.
The prominent sub-field of investigation, short-term load forecasting (STLF), handles prediction horizons of one hour up to one week and typically relies on time series analysis and modeling. Daily, weekly, and sometimes yearly seasonality can be explicitly modeled. These methods consider variables such as date (e.g., day of week and hour of the day), temperature (including weather forecasts), humidity, temperature-humidity index, wind-chill index and, most importantly, historical load. Residential versus commercial or industrial uses are rarely specified.
Representative algorithms for STLF include time
series models of linear dynamic systems involving
load and weather regressors, typically relying on
auto-regressive models such as the auto-regressive
moving average (ARMA) [15] and the seasonal auto-
regressive integrated moving average (SARIMA)
[35]. State-space models offer further refinement to linear dynamics by defining additional (so-called "hidden" or "latent") state variables representing underlying load dynamics and seasonality, either by explicit variables as in the exponential smoothing methods [37] or in spline representations of daily load [19]. An alternative approach to modeling load
and weather dynamics is to consider nonlinear mod-
els and a machine learning approach. A popular class
of algorithms for STLF, which we do not report here
but which has been used by several electric compa-
nies for system-wide predictions, is neural networks
[22]. We focused instead on so-called kernel meth-
ods, starting from simple weighted kernel regression
[6] all the way up to support vector machines [7] and
kernel ridge regression. The section on Short-Term
Load Forecasting Methodology provides more details
about the methods that we implemented and inves-
tigated for this comparative study.
The remaining two fields of investigation, not covered in this paper, are medium-term load forecasting (MTLF), handling horizons of one week up to
one year, and long-term load forecasting (LTLF),
with predictions at horizons of multiple years. These
methods typically proceed by the regression on input
variables, which, in addition to historic load and cli-
mate forecasts, typically incorporate demographic
and economic factors such as the gross domestic
product (GDP), real estate statistics, or population
growth projections, as well as estimated demands of
electric equipment.
Our key finding was that most of the research focused on large aggregated load data, typically at
city level or even at country level, where most indi-
vidual variations are averaged out by the effect of the
law of large numbers. These methods were seldom
tried on individual meters or at meter aggregate lev-
els such as distribution feeders and substations, with
a few exceptions such as recent work on STLF [3] in
non-residential buildings or a clustering analysis of
individual meters and aggregate load forecasting on
feeder sections in a neighborhood of Seoul [32].
In this paper, we propose to continue bridging this gap by systematically evaluating when state-of-the-art STLF algorithms break down, i.e., how their performance degrades as the number of aggregated meters decreases.
Other Datasets With Individual Meters

The specificity of the unique dataset that we investigated is that it contains energy consumption data from the system-wide (city-wide) level down to the level of individual meters. To our knowledge, few such complex datasets [32] have been investigated for short-term load forecasting, even though a few localized (e.g., building-specific) smart meter datasets have been studied [3].
The Italian energy provider Enel has deployed
over 32 million smart meters. Remote monitoring is
done by sending the readings from each customer’s
location [31] through a low-bandwidth network to
data aggregators located at substations. Data is sam-
pled and stored at 15 minute frequency. The readings
are sent about every two weeks or every month. The
motivation for the utility is the ability to leverage
customized hourly-based tariffs [11] to price services
for its customers. Although the individual meter data collected by ENEL has been used in studies on grouping individual customers based on clustering load profiles [16], we are unaware of analyses of these data from the perspective of load aggregates.
At the individual home level, there are several
studies on peak load prediction [33] and on energy
disaggregation of individual appliances in house-
holds [23]. However, these datasets are much smaller
(typically < 100 meters) than our current dataset.
The frequency of load measurement is also much
higher (one measurement every few seconds or mil-
liseconds) and is not typical of smart meters currently
under deployment.
The rest of the paper is organized as follows. We begin by explaining the structure of our unique, hierarchical dataset of load consumption coming from a mid-sized U.S. city. We then provide an overview of key algorithms for short-term load forecasting that exploit both historical load and weather data. The section titled "Short-Term Load Forecasting Results" details the essentially state-of-the-art STLF results that we obtain at the system (city) level and how STLF performance depends on the size of the load aggregate. We conclude with a discussion of performance, parallelism, and runtime issues raised by performing STLF at all levels of load aggregation, and introduce ensemble prediction that leverages multiple STLF algorithms for improved predictions.
Smart Grid Data

The specificity of our study on short-term load forecasting is in its unique dataset consisting of hundreds of thousands of individual meters interconnected in a hierarchy of feeders and substations. We provide details on how the meter data is aggregated and how we associate it with weather data.
System Hierarchy of Meters, Feeders and Substations

This study actually exploits two sets of data, collected in a mid-sized U.S. city (population of about 200,000 inhabitants) over the course of several years:
• System-wide data representing total city consumption (residential and industrial), collected over the course of 2007, 2008, and 2009 at hourly intervals. This dataset is typical of classic STLF studies.
• Individual meter readings coming from over a hundred thousand meters installed at customer locations. Out of this rich dataset, we use 32,000 mostly residential meters that satisfied a number of consistency conditions detailed below. This data was collected between January 2011 and June 2012.
The individual meters measure consumption (in Watt hours, Wh) at 15 minute intervals and are referenced in a meter−pole−feeder section−substation−district hierarchy. Meter measurements included in our analysis are those from the residential and small business customers (with contract demand under 5000 kW). Load predictions are made at these levels:
1. Single customer. Load measurements are derived from meter measurements by differentiation, to obtain the value increment within a sampling time interval, divided by the duration of the sampling interval. Single-customer STLF performance and methods are not the object of this paper.
2. Feeder section. We define a feeder section as a subset of transformers connected to a feeder (such as serving a neighborhood). The system network
topology considered in this paper consists of
about 300 unique feeders employed in the time
period 2011−2012. The historical load measure-
ments at a given time and at the level of a feeder
section are based on aggregating the load derived
from all the meters connected to that feeder
section.
3. Substation. Each substation serves a small geo-
graphical area. There were about 100 unique
substations in the distribution network over the
time period we consider. Aggregation at the sub-
station level works in the same way as aggrega-
tion at the feeder section level.
4. System-wide. This highest level of aggregation
comprises all the residential and small business
meters indexed in the distribution hierarchy.
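The derivation and aggregation steps above can be sketched as follows (the cumulative readings and the feeder topology below are toy values, not data from the study):

```python
import numpy as np

# Toy cumulative meter readings (Wh), sampled every 15 minutes:
# shape (n_meters, n_samples). Values are illustrative only.
readings = np.array([
    [100.0, 130.0, 170.0, 200.0],   # meter 0
    [ 50.0,  60.0,  80.0, 110.0],   # meter 1
    [200.0, 240.0, 270.0, 330.0],   # meter 2
])

interval_h = 0.25  # 15-minute sampling interval, in hours

# 1. Single customer: differentiate the cumulative readings to get the
#    increment per interval, then divide by the interval duration to
#    obtain average power over each interval.
load_w = np.diff(readings, axis=1) / interval_h

# 2./3. Feeder section and substation: sum the loads of all meters
#       connected to the same feeder; here feeder 0 = meters {0, 1}
#       and feeder 1 = meter {2} (a hypothetical topology).
feeders = {0: [0, 1], 1: [2]}
feeder_load = {f: load_w[idx].sum(axis=0) for f, idx in feeders.items()}

# 4. System-wide: sum over all meters indexed in the hierarchy.
system_load = load_w.sum(axis=0)
```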
As explained in the next section, weather data are geo-located at the average (center) location of the meters connected to each feeder section or substation.
From Consumer Meters to Consistent Load Aggregates

The main problem in building aggregates from meter data is the lack of consistency, across time, of the constituents of each aggregate. Fortunately, our dataset contained, in addition to meter readings, periodically updated metadata that described each meter, its connection to the feeder and substation, its geographical (latitude, longitude) coordinates as well as customer-specific data. To obtain a consistent dataset for method evaluation, we used this metadata to discard meters that were disconnected and reconnected, keeping only meters that satisfied several consistency requirements: same owner, same feeder connection, and same geographical location throughout the evaluation period.
Although our per-meter dataset lists nearly a hundred thousand meters, only a subset of about 32,000 meters satisfies the consistency requirements and contains non-zero meter readings. We concentrate on the load aggregates derived from these 32,000 meters. Load aggregates at feeder level are basically
obtained by summing up the 15-minute load from all
the meters connected to that feeder. Similarly, the
load aggregates at a substation are determined by
adding up loads at all the feeders connected to that
substation. Each aggregate’s load is then down-sam-
pled to hourly time intervals.
While aggregating the loads at individual meters,
we had to handle non-aligned time stamps, meter
reading resets, missing values or repeated readings,
sometimes resorting to linear interpolation of the load.
All processing for this 18 month dataset, representing
about 100 GB of data, was done using Perl* and shell
scripts. As shown in Figure 1, we were able to reconstruct a smooth load profile at the system-wide level.
Geo-Specific Weather Data

The area covered by the individual meters in our mid-sized U.S. city dataset encompasses a gently hilly area of about 40 km by 60 km, traversed by a river and subject to micro-climatic variations. The weather data (temperature and wind speed) are measured hourly at 22 substations across that area. It is common to measure a difference of 15 degrees (F) in temperature between weather substations.
Because the STLF methods detailed in the next section are temperature-dependent, we are prompted to interpolate the temperatures at the locations of all meters and all meter aggregates. This interpolation is done through the simple Kriging algorithm [14]. "Kriging" refers here to temperature interpolation based on regression against observed temperature values at a set of surrounding locations, each of them weighted according to spatial covariance. We employed the mGstat Matlab* toolbox for geo-statistics [18] and performed simple Kriging for each hour independently, using the latitude and longitude coordinates of about 400 feeder and substation meter load aggregates and the geographical coordinates and temperatures of 22 weather substations. A similar procedure was adopted for wind speed. The final result is illustrated in Figure 2, which shows the temperature interpolation at feeder and substation aggregate levels at two times of the year, and illustrates the large temperature variations.
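For illustration, simple Kriging with a known mean reduces to solving one linear system in the station-to-station covariances per target point. The NumPy sketch below uses a Gaussian covariance model with hypothetical sill/range parameters (these are assumptions for the example, not the mGstat settings used in the study):

```python
import numpy as np

def simple_kriging(stations, temps, targets, mean, sill=9.0, rng=0.5):
    """Simple Kriging with a known mean, under an assumed Gaussian
    covariance model C(h) = sill * exp(-(h / rng)**2).

    stations: (n, 2) lat/lon of weather substations
    temps:    (n,)   observed temperatures at the stations
    targets:  (m, 2) lat/lon of feeder/substation aggregates
    mean:     known process mean used by simple Kriging
    """
    def cov(a, b):
        d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
        return sill * np.exp(-(d / rng) ** 2)

    C = cov(stations, stations)   # station-station covariances
    c = cov(stations, targets)    # station-target covariances
    w = np.linalg.solve(C, c)     # Kriging weights, one column per target
    return mean + w.T @ (temps - mean)

# Hypothetical example: three stations and one target point between them.
stations = np.array([[35.0, -85.3], [35.2, -85.1], [35.4, -85.0]])
temps = np.array([51.0, 54.0, 57.0])
targets = np.array([[35.2, -85.2]])
est = simple_kriging(stations, temps, targets, mean=temps.mean())
```

The interpolated value is a covariance-weighted combination of the surrounding observations, so nearby stations dominate the estimate.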
Short-Term Load Forecasting Methodology

Time series modeling for short-term load forecasting (STLF) has been widely used over the last 30 years and a myriad of approaches have been developed. Kyriakides and Polycarpou [25] summarized these methods as follows:
1. Regression models that represent electricity load as
a linear combination of variables related to
weather factors, day type, and customer class.
2. Linear time series-based methods including the
ARMA model, autoregressive integrated moving
average (ARIMA) model, auto regressive moving
average with external inputs (ARIMAX) model,
generalized auto-regressive conditional het-
eroscedastic (GARCH) model and state-space
models.
3. State-space models (SSMs) typically relying on a filtering-based (e.g., Kalman) technique and a characterization of dynamical systems.
4. Nonlinear time series modeling through machine
learning methods such as nonlinear regression.
Principles of Statistical Learning for Time Series

In the sections that follow, we will discuss the temperature regression and load residual, linear time series approaches, state-space models, and nonlinear time series models. Before delving into more detailed descriptions of the learning algorithms, we begin by outlining their commonalities.
Supervised learning of the predictor. Supervised learning consists of fitting a predictive model to a training dataset (X, L), which consists of pairs (x_i, L_i) of data points or samples x_i and of associated target values L_i. In the case of load forecasting, samples x_i represent historical values of electric load, weather, or other types of data, collected over a short time interval (e.g., one day). The target labels L_i correspond to the electric load at the prediction horizon. The objective is to optimize a function f such that for each data point x_i, the prediction f(x_i) is as close as possible to the ground truth target L_i. The discrepancy between all the predictions and the target labels is quantified here by the mean absolute percentage error (MAPE), whose formula is given in "Short-Term Load Forecasting Results."
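To make the notation concrete, here is a minimal sketch on synthetic data: building pairs (x_i, L_i) from a 24-hour load history, fitting a simple linear predictor by least squares (a stand-in for the paper's models, not one of them), and scoring with the standard MAPE definition:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic hourly load: a daily cycle plus noise (illustrative only).
hours = np.arange(24 * 60)
load = 20.0 + 5.0 * np.sin(2.0 * np.pi * hours / 24.0)
load = load + rng.normal(0.0, 0.3, hours.size)

# Samples x_i = the 24 previous hourly loads; targets L_i = the load
# one hour ahead (prediction horizon h = 1).
H = 24
X = np.stack([load[i:i + H] for i in range(load.size - H)])
L = load[H:]

# Split into a training (in-sample) set and a test (out-of-sample)
# set, then fit a simple linear predictor f by least squares.
split = 24 * 50
f = np.linalg.lstsq(X[:split], L[:split], rcond=None)[0]
pred = X[split:] @ f

# Mean absolute percentage error on the held-out set.
mape = 100.0 * np.mean(np.abs((pred - L[split:]) / L[split:]))
```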
Figure 1. System load aggregated from about 32,000 individual meters over 18 months. Panel (a) covers January 2011 through June 2012 and panel (b) August 2011 through September 2011; both show system load in MW per 15 min.
Training, validation and test sets. Good statistical learning algorithms are capable of extrapolating knowledge and of generalizing it on unseen data points. For this reason, we separate the known data points into a training (in-sample) set, used to define model f, and a test (out-of-sample) set, used exclusively to quantify the predictive power of f.

In the experiments reported in the section on short-term load forecasting results, we use one year of data for training and we test the model on the calendar month immediately following. When evaluating STLF on the 2007−2009 data, we retrain the model 24 times and provide predictions for January 2008 through December 2009. Using the 18-month aggregate dataset from January 2011 through June 2012, we trained six different STLF models for predicting results for January through June 2012.
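This retraining schedule amounts to a rolling window over the series. A simplified sketch (30-day months and a placeholder load series, whereas the actual experiments use calendar months):

```python
import numpy as np

# Rolling retraining sketch: train on the preceding 12 months, test on
# the following month, then slide forward by one month. Months are
# simplified to 30 days of hourly readings; the series is a placeholder.
hours_per_month = 30 * 24
n_months = 18                                  # e.g., Jan 2011 - Jun 2012
load = np.zeros(n_months * hours_per_month)    # placeholder series

windows = []
for test_month in range(12, n_months):         # six test months -> six models
    lo = (test_month - 12) * hours_per_month
    hi = test_month * hours_per_month
    train = load[lo:hi]                        # one year of history
    test = load[hi:hi + hours_per_month]       # the month that follows
    windows.append((train.size, test.size))
```

Each retrained model thus sees exactly one year of history and is evaluated on the single month that follows it.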
Direct prediction versus iterated prediction in time series. In a time series prediction problem, as represented in Figure 3, the variable of interest (here, the load) might be present at the same time in the targets (output predictions) of the system and in the inputs, particularly when that variable is serially correlated or when it is produced by a dynamic system (e.g., the weather/climate model or a model for human activities). Knowing the history of immediately previous time samples of that variable helps in that prediction.
In our study, we consider hourly load and
weather data, and are interested in making load fore-
casts at prediction horizons ranging from h = 1 hour
(next hour) to h = 168 hours (next week). Predictions
at all these different horizons can be achieved in two
different ways, through direct prediction and iterated
prediction. Let t denote the current time and assume that we have access to historical load up to time t, as well as to weather forecasts up to time t + 168.
• Direct prediction. This predictor takes all the data known up to time t, for instance load values in the past 24 hours (L_{t−23}, L_{t−22}, …, L_{t−1}, L_t) and temperature forecasts at any horizon h, namely T_{t+h}, and directly predicts the load L_{t+h} that will occur h hours ahead (see Figure 3b). Direct prediction has a high computational cost, because a different predictor needs to be trained for each prediction horizon (168 in our case).
• Iterated prediction. This predictor is simply designed to make one-step-ahead predictions, at horizon h = 1. As the predictive model moves forward in time, the outputs of the predictor (here, load at time t + h) can in turn become its inputs (see Figure 3a), albeit introducing the prediction error directly into the model. This iterated prediction can be seen as the discretization of a dynamic system.

Figure 2. Example of temperature variations and of spatial temperature interpolation using Kriging, at two different times of the year: temperatures (21k meters and weather stations), plotted against latitude and longitude, (a) on 01-Jan-2011 05:45:00 and (b) on 30-Jun-2011 00:00:00.
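A minimal sketch of the iterated scheme, with a placeholder persistence rule standing in for the one-step predictor (it is not one of the paper's models, and it ignores the temperature input):

```python
def iterate_forecast(history, temps_ahead, one_step, horizon):
    """Iterated prediction: repeatedly apply a one-step-ahead (h = 1)
    model, feeding each prediction back into the 24-hour input window.

    history:     past hourly load values, most recent last
    temps_ahead: temperature forecasts for t+1 .. t+horizon
    one_step:    f(window, temp) -> predicted load one hour ahead
    """
    window = list(history[-24:])        # 24-hour input window
    preds = []
    for h in range(horizon):
        p = one_step(window, temps_ahead[h])
        preds.append(p)
        window = window[1:] + [p]       # the prediction becomes an input
    return preds

# Placeholder one-step model (persistence on the latest value), used
# here only to exercise the loop.
one_step = lambda window, temp: window[-1]
out = iterate_forecast(list(range(48)), [70.0] * 168, one_step, horizon=168)
```

A single trained one-step model thus serves every horizon from 1 to 168 hours, at the price of compounding its own prediction errors.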
Temperature Regression and Load Residual

The simplest method for load forecasting relates the load to temperature. This is particularly relevant for residential and business-related consumption, where a significant portion of power usage might be due to electric heating in the winter and/or air conditioning in the summer.

In our data set, electricity was used to both heat and cool many buildings, in addition to gas heating. The total load decreases with temperature first and then increases, the minimum occurring at or around 66 degrees Fahrenheit. We observed that this relationship varies slightly throughout the day. We investigated two approaches for load regression.
Figure 3. Direct prediction versus iterated prediction in a time series: (a) iterated prediction on load with a 24-hour history of load values and the temperature at the prediction horizon; (b) direct prediction on load at horizon h = 3.
The first approach used local polynomial regression, locally-weighted scatterplot smoothing (LOESS) [8], to fit a surface of load on temperature and time of day (see Figure 4). Specifically, for the fit at point x, a polynomial surface of degree 1 or 2 is made using points in a neighborhood of x, weighted by their distance from x, to minimize the least square error (LSE). The size of the neighborhood is controlled by a parameter α, chosen to be 0.2 in this situation for a balance between smoothness and goodness of fit. The MAPE for this fit is between six and seven percent for system-wide prediction with an average load of approximately 0.7M kWh (2007−2009 system-wide load) when the surface is fitted to the previous full year's data. The fitted model is:
log(L) = s(T,H) + ε
where L is the hourly load, T is the temperature, H is
the hour of day, and ε is the residual. The log transfor-
mation is used here to make the distribution more
Gaussian-like and to stabilize the variance, such that the
subsequent modeling assumptions hold. Note that
the residuals ε are not independently and identically
distributed (i.i.d.) and will continue to exhibit a daily
cyclic pattern. In the SARIMA, SSM, and Holt-Winters
model (HWT) methods detailed in the following sec-
tions, those methods are applied to the residuals ε, not
to the load time series.
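As an illustration of the LOESS idea, here is a minimal one-dimensional sketch of a degree-1 locally weighted fit with tricube weights on synthetic data (the paper's actual fit is a two-dimensional surface over temperature and hour of day, via the LOESS implementation of [8]):

```python
import numpy as np

def loess_point(x0, x, y, alpha=0.2):
    """Degree-1 LOESS fit evaluated at x0: a locally weighted linear
    least-squares fit using the fraction `alpha` of nearest points,
    weighted by the tricube function of their distance from x0."""
    k = max(2, int(np.ceil(alpha * len(x))))
    d = np.abs(x - x0)
    idx = np.argsort(d)[:k]                # k nearest neighbours of x0
    h = d[idx].max()                       # neighbourhood radius
    w = (1.0 - (d[idx] / h) ** 3) ** 3     # tricube weights
    # Weighted least squares for a local line a + b * (x - x0).
    A = np.stack([np.ones(k), x[idx] - x0], axis=1)
    AtW = A.T * w                          # apply point weights to the design
    coef = np.linalg.solve(AtW @ A, AtW @ y[idx])
    return coef[0]                         # local fit evaluated at x0

# Hypothetical example: a smooth U-shaped load/temperature relation
# with its minimum at 66 degrees F, recovered by the local fit.
temps = np.linspace(30.0, 100.0, 200)
load = 10.0 + 0.01 * (temps - 66.0) ** 2
fit = loess_point(66.0, temps, load)
```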
A second method relies on fitting a cubic polynomial directly on the temperature values, using 24 sets of coefficients {a_i(H)}, i = 0, …, 3, one for each hour H of the day. Temperature regression using cubic polynomials is a simple benchmark for STLF [21]:

L = a_0(H) + a_1(H) T + a_2(H) T^2 + a_3(H) T^3 + ε
Note that we may use the apparent temperature, or the wind-chill temperature, or an average of both, instead of the raw temperature. The apparent temperature (temperature taking into account the nonlinear "heat index" due to humidity) may improve the fit in some cases, particularly during the hot and humid summer season [36]. Similarly, the wind-speed-dependent wind chill temperature may help for winter load forecasts. We make our choices based on cross-validation performance.
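A sketch of the second method on synthetic data: 24 independent cubic fits of load on temperature, one per hour of the day (the data-generating function and its coefficients below are hypothetical, chosen to be exactly cubic in T):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic (temperature, hour, load) samples; illustrative only.
temps = rng.uniform(20.0, 100.0, 5000)
hours = rng.integers(0, 24, 5000)
load = 15.0 + 0.5 * hours + 0.0004 * (temps - 66.0) ** 2 * (temps - 40.0)
load = load + rng.normal(0.0, 0.1, 5000)

# Fit L = a0(H) + a1(H) T + a2(H) T^2 + a3(H) T^3: one cubic
# polynomial per hour of the day, i.e., 24 sets of 4 coefficients.
coefs = {H: np.polyfit(temps[hours == H], load[hours == H], deg=3)
         for H in range(24)}

# Evaluate hour 12's polynomial at 80 degrees F.
pred = np.polyval(coefs[12], 80.0)
```

Swapping the raw temperatures for apparent or wind-chill temperatures only changes the regressor array, not the fitting procedure.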
Hobby et al. [20] study the residential energy consumption measured at an aggregate of all residential meters by separating the weather- and illumination-dependent load consumption from the residual consumption. To fit the weather- and illumination-dependent component, they use 24 cubic spline surfaces, one per hour of the day, indexed by apparent temperature and illumination. They observe a strong cubic dependency of load on temperature and an almost negligible linear term due to illumination.
Linear Time Series Approaches

Linear time series models directly exploit the historical values of the load, and enable us to make iterated load forecasts thanks to previously observed load values. Gross and Galiana [15] wrote the reference paper on short-term load forecasting using statistical linear time series models, in particular the auto-regressive moving average (ARMA) model. These models have been later extended to cope with seasonality and non-stationarity in so-called seasonal auto-regressive integrated moving average (SARIMA) models.

Figure 4. Dependency among the temperature, the time of day, and the load, modeled as a smooth surface. Load is expressed on the logarithmic scale (log(load + 1)) and the temperature is taken one hour prior to the load value.

Further extensions have been made in the
work of Soares and Medeiros [35], who compared a two-level seasonal auto-regressive model and a dummy-adjusted seasonal auto-regressive integrated moving average (DASARIMA) model on Brazilian electric load data.
Seasonal Auto-Regressive Integrated Moving Average Models
In seasonal auto-regressive integrated moving average (SARIMA) models, the seasonality component comes from the daily load cyclic pattern. In this paper we apply the SARIMA model to residuals from the LOESS fit (we refer to this method as "residual SARIMA"). We also considered the SARIMAX model, i.e., SARIMA with "exogenous" variables, namely temperature. However, the temperature coefficient is difficult to interpret and the model offers poor prediction accuracy compared to residual SARIMA. In contrast, residual SARIMA explicitly models the relationship between the time series and the exogenous variable. It is especially appealing when changes in exogenous variable(s) are concurrent with changes in the original time series, which is the case with temperature and power usage.
A SARIMA model has seven order parameters. We can write the model SARIMA(p, d, q) × (P, D, Q)_S as:

Φ_P(B^S) φ_p(B) (1 − B)^d (1 − B^S)^D X_t = Θ_Q(B^S) θ_q(B) ε_t

where B is the lag operator that satisfies:

B^i(X_t) = X_{t−i}

and Φ_P(B^S), Θ_Q(B^S), and (1 − B^S)^D are the corresponding auto-regressive, moving average, and differencing parts for the seasonal component, while φ_p(B), θ_q(B), and (1 − B)^d are the corresponding auto-regressive, moving average, and differencing parts for the non-seasonal component. S is the period length (S = 24 with hourly load readings and a daily cyclic pattern).
The procedure for determining the order parameters follows the Box-Jenkins methodology, examining the auto-correlation function (ACF) and partial ACF (PACF) of the differenced and original time series. Investigating the order parameters on the one-year training data, we concluded that d = 1, D = 1, p = 0, and P = 0, while q = 1 and Q = 1, essentially ignoring the auto-regressive component. Stationarity of the differenced data was checked using the Kwiatkowski, Phillips, Schmidt, and Shin (KPSS) test [2, 24]. The p-value was greater than 0.1, suggesting stationarity of the differenced data.

Note that for the residual SARIMA, shortening the training period for estimating the parameters of the model, from one year down to the last month immediately preceding the prediction (test) period, offered a better fit.
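The chosen orders d = 1, D = 1 with S = 24 correspond to applying the operator (1 − B)(1 − B^24) to the series. A small NumPy sketch on a synthetic hourly series (illustrative only) shows how this double differencing removes both a linear trend and a deterministic daily cycle:

```python
import numpy as np

# Synthetic hourly series with a linear trend plus a deterministic
# daily (period-24) cycle; values are illustrative only.
t = np.arange(24 * 14, dtype=float)
x = 0.1 * t + 5.0 * np.sin(2.0 * np.pi * t / 24.0)

# d = 1: the non-seasonal difference (1 - B) x_t removes the trend,
# leaving a constant plus a period-24 component.
dx = np.diff(x)

# D = 1, S = 24: the seasonal difference (1 - B^24) applied on top
# removes the daily cycle; the result is (numerically) zero.
sdx = dx[24:] - dx[:-24]
```

On real load data the doubly differenced series is of course not zero; it is the stationary remainder that the MA(1) × seasonal MA(1) part of the model then captures.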
State-Space ModelsThe state-space model (SSM) is an online adap-
tive method for forecasting. SSMs introduce hidden
(unknown) variables representing the quantity to be
estimated. The main state-space model used across
scientifi c disciplines is the Kalman fi lter. In their
review paper, Pigazo and Moreno [30] described how
the Kalman filter can predict electric load values
from the previous load measurements, and then
update that prediction using other regressors such as
temperature data. Harvey and Koopman [19] mod-
eled load time series through cubic spline interpola-
tion on intra-daily and intra-weekly patterns, where
the spline coefficients were time-varying and updated
using a Kalman filter. Dordonnat et al. [10] defined a
custom state-space that took into account calendar
days and used it to predict nationwide French electric
load. Taylor and McSharry [37] reformulated the
state-space model as a multi-level linear time series
model, which can handle weekly and daily seasonal-
ity in electric load.
State-space model on the spline fit of load residuals. The SSM in [19] does not require offline training
and updates the model parameters in real time as
each reading comes in. This method has been suc-
cessfully applied to the online monitoring of time-
varying network streams [4].
In that SSM, the computation for each update is
inexpensive thanks to Kalman filtering, making it an
ideal method for online forecasting. It uses B-splines
to model the daily cyclic pattern, as the nonlinear
DOI: 10.1002/bltj Bell Labs Technical Journal 145
trends in the load time series can be transformed into
a linear model with respect to the spline basis.
Moreover, a cyclic spline basis ensures the periodic
constraint (namely, the daily cyclic pattern of the
load). We place K equally spaced knots, or K−1 spline
bases to cover a full day (here K = 8 for 24 hourly
load readings on a given day).
The state space model consists of two equations:
the observation equation, which generates the load
data from the hidden variable, and the state equation,
which explains the dynamics of the hidden (spline
coefficient) variables. The observation equation is:
ε_t = Bα_t + u_t,   u_t ∼ N(0, σ_u I)

Here ε_t is the one-day time series of the 24 hourly
load residuals on day t; B is a 24 × K matrix of B-spline
bases, each column corresponding to one spline; α_t is
the vector of coefficients for the splines; u_t is a vector
of i.i.d. Gaussian white noise with standard deviation
σ_u. The vector α_t characterizes the daily pattern on
day t.
To accommodate day-to-day variations in the
daily pattern α_t, we use a random walk for the spline
coefficients, specified by the state equation:

α_t = α_{t−1} + v_t,   v_t ∼ N(0, σ_v I)

where the spline coefficients on day t are equal to
those on day t−1, plus i.i.d. Gaussian white noise
with standard deviation σ_v.
The above SSM is fitted online with a Kalman
filter, such that the updating is done for each incoming
data point. This ensures that forecasts are made in
an online fashion. Hyper-parameters are estimated
empirically by fitting them to spline coefficients for
individual days.
We also applied this approach directly to the log-
transformed load without the regression on temper-
ature (results not reported here). The performance is
slightly worse than using the residuals but still rea-
sonable. This approach would work well if tempera-
ture forecasts were unavailable or unreliable.
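Because the state equation is a random walk with i.i.d. noise on each spline coefficient, the per-coefficient Kalman update takes the familiar scalar form. A self-contained sketch, with hypothetical noise variances q and r standing in for the empirically fitted hyper-parameters:

```python
def kalman_random_walk(observations, q, r, x0=0.0, p0=1.0):
    """Kalman filter for a scalar random walk x_t = x_{t-1} + v_t
    (variance q), observed as z_t = x_t + u_t (variance r).

    Returns the sequence of filtered state estimates, one per
    incoming observation, so forecasting stays fully online.
    """
    x, p = x0, p0
    estimates = []
    for z in observations:
        p = p + q              # predict: random walk grows the variance
        k = p / (p + r)        # Kalman gain
        x = x + k * (z - x)    # correct the estimate with the innovation
        p = (1.0 - k) * p      # posterior variance
        estimates.append(x)
    return estimates
```

Each update costs a handful of arithmetic operations, which is why the SSM approach remains inexpensive enough for per-reading updates.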
Holt-Winters double seasonal exponential smoothing. The HWT model [37] is a variation on the state-
space model designed specifi cally for data that have
two seasonalities: an intra-day (24 h) seasonality, and
an intra-week (168 h) seasonality. The state equa-
tions involve three state variables, essentially corre-
sponding to the smoothing, daily and weekly effect in
the data.
ŷ_t(k) = l_t + d_{t−m1+k1} + w_{t−m2+k2} + φ^k e_t
e_t = y_t − (l_{t−1} + d_{t−m1} + w_{t−m2})
l_t = l_{t−1} + α e_t
d_t = d_{t−m1} + δ e_t
w_t = w_{t−m2} + ω e_t
In the above equations, ŷ is the estimated value
of the load, l is the exponentially smoothed first-order
auto-regressive component of the load, d is the
intra-day seasonal component of the load (m1 = 24
hours), and w is the intra-week seasonal component
of the load (m2 = 168 hours); finally, e is the exponentially
decaying error term. The state variables are
initialized by running the model on about one month
of data. The four coefficients α, δ, ω, and φ are fitted
by least squares optimization (i.e., by minimizing the
error between the actual observed load and the
predicted load); we use a simple heuristic search
based on genetic algorithms to find their optimal values.
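The HWT update equations above can be sketched directly in code. This is a deliberately simplified illustration: the crude one-cycle initialization and fixed coefficients below stand in for the month-long warm-up and genetic-algorithm search described in the text:

```python
def hwt_fit_forecast(y, alpha, delta, omega, phi, m1=24, m2=168, k=1):
    """Additive double seasonal Holt-Winters (HWT) smoothing.

    y is the hourly load history (len(y) >= m2).  Returns the
    k-step-ahead forecast made at the end of the series.
    """
    level = sum(y[:m2]) / m2                 # smoothed level l_t
    day = [y[i] - level for i in range(m1)]  # intra-day seasonal d_t (cyclic buffer)
    week = [0.0] * m2                        # intra-week seasonal w_t (cyclic buffer)
    e = 0.0
    for t in range(m2, len(y)):
        # innovation: e_t = y_t - (l_{t-1} + d_{t-m1} + w_{t-m2})
        e = y[t] - (level + day[t % m1] + week[t % m2])
        level += alpha * e         # l_t = l_{t-1} + alpha * e_t
        day[t % m1] += delta * e   # d_t = d_{t-m1} + delta * e_t
        week[t % m2] += omega * e  # w_t = w_{t-m2} + omega * e_t
    t = len(y)
    # y_t(k) = l_t + d_{t-m1+k} + w_{t-m2+k} + phi^k * e_t
    return (level + day[(t + k - 1) % m1]
            + week[(t + k - 1) % m2] + (phi ** k) * e)
```

Note that each hourly update touches only one slot of each seasonal buffer, which is what makes HWT forecasts essentially instantaneous at runtime.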
Nonlinear Time Series Models

Machine learning (ML) techniques focus on
learning a prediction function that takes as input the
historical load and other data such as weather, and
outputs the predicted load. Unlike the statistical meth-
ods reviewed in the previous section, the ML methods
chosen in our study enable us to learn a nonlinear
prediction function. Parametric machine learning
techniques focus on tuning the parameters of the load
prediction function. Khotanzad et al. [22] described
a state-of-the-art implementation of neural networks
for load forecasting that has been used by several
electrical companies. Fan and Chen [12] employed
self-organizing maps to cluster the load and weather
data into several regimes, before using them as inputs
to a nonlinear regression function.
We focus in this paper on kernel-based methods,
learning the relationship between data samples: in
this case, each sample corresponds to a pair of his-
torical load and weather data, taken over a short
time interval, and the electric load at the next time
point. We compared three standard, proven tech-
niques: weighted kernel regression (WKR) [6], sup-
port vector regression (SVR) [7], and kernel ridge
regression with learnable feature coefficients.
In addition to kernel methods, we investigated
simple neural network models with one hidden
layer. Although the latter achieved good perfor-
mance at one-hour prediction horizons, they would
perform poorly on iterated forecasts and the error
would rapidly increase after a few iterations of the
neural network predictor (results not reported).
Research on modeling dynamic systems using one
hidden-layer neural networks showed indeed that
these nonlinear models are very sensitive to noise
and that they can generate predictions that diverge
from the training set patterns. More complex neural
network models that provide stable iterated predic-
tions and are capable of learning long-term depen-
dencies [1] are beyond the scope of this paper. In
parallel, it has been shown experimentally that kernel
methods such as SVR provide more stable iterated
predictions on highly nonlinear time series than
the basic embodiment of neural networks [26].
While they do not model long-term dependencies,
they at least provide a solution that is bounded and
stays within the patterns seen in the training set. This
statement does not apply to more complex neural
network architectures (that involve state space mod-
els and learning hidden representations of time
series).
Weighted kernel regression. Weighted kernel
regression (WKR) [28] is the simplest among the
non-parametric regression algorithms. It consists of computing the Euclidean distance metric between
the input sample x and each data point sample y(t) at
time t in the training set and then using it in a Gaussian
kernel function k(x,y(t)) that can be seen as a measure
of symmetric “similarity” between the two samples x
and y(t). The Gaussian kernel takes a value equal to one
when x and y(t) are identical and therefore when their
distance is equal to zero. The kernel function takes
decreasing values down to zero as the input sample x
becomes “dissimilar” from the training point y(t) and
therefore as their distance increases.
k(x, y(t)) = exp( −(1/2) Σ_{k=1}^{K} (x_k − y_k(t))² / σ² )
The kernel function is used as the weight of data
point y(t) in the decision function (equation 2). The
decision function is a weighted interpolation over
the entire training dataset.
L̂(x) = Σ_t L_t k(x, y(t)) / Σ_t k(x, y(t))
WKR assumes smoothness within the input data,
controlled through a "spread" coefficient σ that
depends on the dataset and is fitted by n-fold cross-validation
on the training data. We resorted to five-fold
cross-validation on five non-overlapping sets.
More specifically, for each choice of hyperparameters,
we used 80 percent of the training data to fit the model
and the remaining 20 percent to compute the prediction
performance, and repeated that step five times.
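The WKR predictor reduces to a kernel-weighted average of the training loads (the Nadaraya-Watson estimator [28]). A minimal sketch with hypothetical helper names, where σ would in practice be chosen by the five-fold cross-validation described above:

```python
from math import exp

def gaussian_kernel(x, y, sigma):
    """k(x, y) = exp(-(1/2) * sum_k (x_k - y_k)^2 / sigma^2):
    equal to 1 when x and y are identical, decaying toward 0
    as their Euclidean distance grows."""
    return exp(-0.5 * sum((a - b) ** 2 for a, b in zip(x, y)) / sigma ** 2)

def wkr_predict(x, train_x, train_loads, sigma):
    """Weighted kernel regression: the predicted load is the
    kernel-weighted interpolation over the entire training set."""
    weights = [gaussian_kernel(x, y, sigma) for y in train_x]
    return sum(w, l_ := 0) if not weights else \
        sum(w * l for w, l in zip(weights, train_loads)) / sum(weights)
```

A small σ makes predictions collapse onto the nearest training sample; a large σ averages broadly, which is the smoothness trade-off the cross-validation resolves.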
Support vector regression. Support vector
machines (SVMs) [9, 34] are a popular and efficient
statistical learning tool that can be qualified as mostly
non-parametric. SVMs are also called maximum margin
classifiers, because their decision boundary is, by
construction, as far as possible from the training data
points, so that they remain well separated according
to their labels. Maximum margin training enables better
generalization of the classifier to unseen examples.
The work on support vector regression (SVR) by
Chen et al. [7] was the winning entry in a
competition on the prediction of electric load and can
be considered a state-of-the-art method. SVR relies
on the definition of a kernel function k(x, y(t)) and on
a decision function f(x) for a sample x that is
defined in terms of the kernel function between x
and the data points in the training set, but involves only
a minimal, sparse set of support vectors S = {y(t)},
each given a weight α_t. Learning in an SVM corresponds
to finding a minimal set S of support vectors
that minimizes the error on the training labels:

L̂(x) = Σ_{t∈S} α_t L_t k(x, y(t))
We cross-validated the SVM's regularization
coefficient C as well as the Gaussian spread coefficient
using five-fold cross-validation.
Kernel ridge regression. Kernel ridge regression is
a generalized version of support vector regression.
One can see it as a simple extension of SVR in which
the Gaussian spread coefficient is tuned for each
input regressor (feature) separately, using a gradient-descent
optimization procedure and cross-validation
[5]. This method, which we call sigma-SVR, differs
from SVR only in its kernel function:

k(x, y(t)) = exp( −(1/2) Σ_{k=1}^{K} (x_k − y_k(t))² / σ_k² )
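Since the only change relative to SVR is the per-feature spread, the sigma-SVR kernel itself is a one-liner; a sketch (the function name is illustrative, not from the paper):

```python
from math import exp

def per_feature_kernel(x, y, sigmas):
    """Gaussian kernel with one spread sigma_k per feature, as in
    sigma-SVR: k(x, y) = exp(-(1/2) sum_k (x_k - y_k)^2 / sigma_k^2).
    A large sigma_k effectively switches feature k off."""
    return exp(-0.5 * sum(((a - b) / s) ** 2 for a, b, s in zip(x, y, sigmas)))
```

Learning the vector of σ_k values is what lets the model weigh, say, temperature forecasts more heavily than humidity without any manual feature selection.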
Short-Term Load Forecasting Results

In our investigations, we used the standard
demand prediction metric, mean absolute percentage
error (MAPE), which, for a set of N load values
L_t (e.g., in Watt-hours, Wh) and associated load forecasts
L̂_t, is defined as:

MAPE = (1/N) Σ_{t=1}^{N} |L_t − L̂_t| / |L_t|
In previously published STLF studies on city-wide
and country-wide load forecasting, the MAPE
typically ranged from a one to three percent error at
next-hour horizons to about a four percent error at
next-day horizons.
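The metric itself is a one-liner; a minimal sketch (assuming no zero load values in the denominator), reported as a percentage:

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent:
    (100/N) * sum_t |L_t - Lhat_t| / |L_t|."""
    n = len(actual)
    return 100.0 / n * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))
```

Because MAPE is scale-free, it allows the comparison below of aggregates whose peak loads differ by several orders of magnitude.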
System-Wide Predictions

In a first series of experiments, we compared the
performance of three iterated predictors relying on
nonlinear time series models based on kernel meth-
ods: weighted kernel regression (WKR), support vec-
tor regression (SVR), and sigma-SVR (sSVR) on
system-wide load from 2007 to 2009. We would train
the predictors on one year of load and weather fore-
casts, and make predictions for the following month,
repeating this procedure 24 times for January 2008
through December 2009, averaging the MAPE per-
formance, for each prediction horizon, over all 24
months. Our approach essentially simulated an STLF
system retrained every month to fit mid- to long-
term evolutions of the city-wide load consumption
and of the climate.
Unsurprisingly, as reported in Figure 5, the
more complex kernel method, namely sSVR, which
enabled us both to weigh each input feature (e.g.,
load at a specific time, time of day, temperature or
humidity forecast) individually and to select the
support vectors, achieved the best results: MAPE = 1.2
percent at the one-hour horizon and MAPE = 4.7
percent after h = 24 hours. The Steadman apparent
temperature slightly outperformed raw temperature
(decreasing the MAPE).
We then compared the performance of iterated
sSVR to direct prediction using sSVR, as well as to
the remaining, linear models, namely Holt-Winters
double seasonal exponential smoothing (HWT),
state-space models with a B-spline fit on the load
residual (SSM), and seasonal auto-regressive integrated
moving average (SARIMA), all operating on the load
residual after fitting the load on temperature and
hour of the day (see "Temperature Regression and
Load Residual").
As can be seen in Figure 6 and Figure 7, which pro-
vide details on the system-wide aggregated load from
2012, the overall best algorithms were HWT and
sSVR. HWT achieved MAPE = 4 percent performance
at h = 24 on the 2008 − 2009 dataset, slightly out-
performing sSVR. The performance on the aggre-
gated (2012) dataset was worse, because the set of
meters considered (32,000) was only a subset of the
total city load. Figure 8 and Figure 9 show how
these predictions actually look, at h = 1 and at h = 24
respectively.
Performance on Meter Aggregates

We observed that the load forecasting perfor-
mance seemed to worsen for lower level aggregates
and tried to verify the hypothesis that, independently
of the method, aggregates with large forecast errors
are those with very few meters. As can be seen in
Figure 10, we trained about 400 STLF predictors
on different meter aggregates (feeders, substations,
and system-wide) and plotted the performance
(MAPE at h = 1) versus the size of the meter aggregate
(which we can measure, for instance, as the number
of meters interconnected to that aggregate, or as the
peak hourly load measured at that meter aggregate).
The MAPE would decrease as a function of meter
aggregate size (the more meters in an aggregate, the
better the MAPE). We hypothesize that aggregates
connected to more meters tend to behave in a more
predictable way: the effect of weather (temperature)
is prominent and there is an averaging effect due to
the large sample (hundreds or thousands) of meters.
Some meter aggregates (see Figure 11) can nevertheless
be predicted relatively well, despite their
small size (here, 12 meters).
At the substation or system level, accurate fore-
casts can be useful input to strategic cost-saving deci-
sions. At the level of individual meters, the utility is
not interested in predicting precisely how much elec-
tricity will be used every hour, but rather in detecting
large spikes of abnormal activity. Such abnormal
usage spikes could be indicative of a system failure in
the home (e.g., a malfunctioning heat pump), and
could be useful information to the customer. Accurate
forecasts can serve as baselines for detecting such
anomalies.
Discussion

In this section, we discuss the practical consider-
ations for the implementation and deployment of a
load forecasting system, including modularity and
parallelization, running time considerations,
and robustness of the forecasts.
Independent STLF for Each Meter Aggregate

As explained previously, the meters, feeders, and
substations considered in this study of a mid-sized
Figure 5. System-wide load forecasts using kernel methods for nonlinear time series modeling. These curves are the average of monthly MAPE performance over two years (2008-2009). Curves: sigma-SVR (24 h load + temperature; + Steadman temperature; + temperature and humidity), WKR (8 h load + temperature; + Steadman temperature), and SVR (8 h load + temperature; + Steadman temperature); axes: MAPE (%) versus prediction horizon (1 to 24 h). (MAPE: mean absolute percentage error; SVR: support vector regression; WKR: weighted kernel regression.)
U.S. city are interconnected in a hierarchical distri-
bution network.
Such a rich hierarchy invites a study of the cor-
relations or even interdependencies among all
metered electrical components. The obvious advan-
tage is in exploiting redundancies among all the
meters (as households in the same urban area and
under identical climatic conditions might present
similar load consumption profiles).
From a systems perspective, it may be desirable
to make the load prediction component as modular
as possible and to forecast load independently for
each meter or load aggregate. In this study, all the
predictions at the same level of aggregation are
considered independent from the point of view of
load forecasting, despite the correlations between
each feeder connected to a given substation and the
substation itself.
There are several justifications for this approach.
First of all, the meters in our system often are updated
asynchronously or even suffer downtimes, not nec-
essarily related to power outages. It could therefore
be very detrimental, for the operation of the entire
system, to make it wait for synchronous meter
updates. Here, we allow for asynchronous data
updates and load forecasts within the prediction
timeframe, which happens at a granularity of one
hour.
Figure 6. System-wide load forecasts using various families of prediction algorithms, using the total load consumption of a mid-sized U.S. city. The curves represent average monthly MAPE performance over two years from 2008 to 2009. Curves: HWT on residual load from apparent temperature fit; iterated sigma-SVR using apparent temperature, wind chill temperature, and load; direct sigma-SVR using apparent temperature, wind chill temperature, and load; SSM on residual load from temperature fit at h−1; SARIMA on residual load from temperature fit at h−1. Axes: MAPE (%) versus prediction horizon (1 to 24 h). (HWT: Holt-Winters model; MAPE: mean absolute percentage error; SARIMA: seasonal auto-regressive integrated moving average; SSM: state-space model; SVR: support vector regression.)
Secondly, enforcing independence at each level
of aggregation enables us to trivially parallelize the
operation of the STLF modules for all the aggregates.
Each module’s only points of input/output are data-
base accesses to read the latest meter historical load
data as well as the associated geo-specific weather data
and to return forecasts at different horizons.
Running Time and Performance

The parallelism that is enabled by the indepen-
dent STLF operations facilitates the implementation
of our system in a multi-threaded environment.
Essentially, the process for generating hourly load
forecasts at an aggregate level can be run as soon
as all the hourly weather and meter data for the
aggregate components have been collected. The
system does not need to wait for the completion of
all the prediction processes, each of which takes
care of updating the database with forecasts
independently.
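This per-aggregate independence means the hourly forecasting jobs can be fanned out with any worker pool; a sketch using a standard thread pool, where forecast_aggregate is a hypothetical stand-in for a full STLF module reading from and writing to the database:

```python
from concurrent.futures import ThreadPoolExecutor

def forecast_aggregate(aggregate_id):
    """Placeholder STLF module (hypothetical): a real module would
    read this aggregate's latest load and weather readings from the
    database and return its hourly forecasts."""
    return aggregate_id, 0.0

def forecast_all(aggregate_ids, workers=8):
    # Each aggregate is forecast independently, so jobs can start as
    # soon as that aggregate's data arrive, with no synchronization
    # barrier across the roughly 400 aggregates.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(forecast_aggregate, aggregate_ids))
```

A slow or offline meter then delays only its own aggregate's forecast, never the whole system, which is exactly the asynchrony argument made above.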
For model development purposes, we have been
using a 16-core 2.3 GHz Intel Xeon* Linux* server
with 24 GB random access memory (RAM), running
Ubuntu*. The deployment system is a 32-core, 128
GB RAM, Linux system running Red Hat. The sigma-
SVR and HWT algorithms are implemented in Matlab
Figure 7. Comparison of different load forecasting algorithms for 32,000-meter load aggregates. The curves represent average monthly MAPE performance over the six months from January to June 2012. Curves: HWT on residual load from apparent temperature fit at h−1; iterated sigma-SVR using apparent temperature, wind chill temperature, and load; direct sigma-SVR using apparent temperature, wind chill temperature, and load; SSM on residual load from temperature fit at h−1; SARIMA on residual load from temperature fit at h−1. Axes: MAPE (%) versus prediction horizon (1 to 24 h). (HWT: Holt-Winters model; MAPE: mean absolute percentage error; SARIMA: seasonal auto-regressive integrated moving average; SSM: state-space model; SVR: support vector regression.)
(or its open source clone, Octave) and the SARIMA
and SSM methods run in R.
Our system avoids major computational bottle-
necks at runtime. The SARIMA, SSM, and HWT
methods can make essentially instantaneous fore-
casts on the 400 or so meter aggregates. The kernel
methods-based predictions by the sigma-SVR algo-
rithm require, for each meter aggregate and for each
prediction horizon (up to 168), a few matrix multi-
plications, with matrix dimensions on the order of
10,000. The latter can bring the computational time
to several minutes, once per hour.
The largest computational requirements are due
to training the prediction algorithms, which, as we
explained, happens once a month. While the
SARIMA and SSM methods are, again, negligible in
terms of training time, it typically takes a few hours
to cross-validate the state parameters of the HWT
model and about one day to learn the feature and
Lagrange coeffi cients of the sigma-SVR predictor.
This is currently handled by scheduling learning for
all the models over several days.
Ensemble Prediction

Given that we have four different prediction
algorithms (HWT, sSVR, SARIMA, SSM), we can
study methods for combining their predictions for
potentially better accuracy and robustness to noise
Figure 8. Predictions and prediction errors by four algorithms and a simple weather fit model (in gray) over one week in 2008 (February 5-11). These plots show (a) the predictions and (b) the prediction errors, in kWh, at horizon h = 1 hour. Curves: observed load, regression on temperature, HWT, sSVR, SARIMA, and SSM. (HWT: Holt-Winters model; SARIMA: seasonal auto-regressive integrated moving average; SSM: state-space model; sSVR: sigma-SVR; SVR: support vector regression.)
and random errors. We conjectured this after
observing that the predictions of the four algorithms
have largely uncorrelated errors, as visible in
Figure 8 for an example of system-wide load forecasts
over one week at the one-hour horizon, and in
Figure 9, on the same data and time period, at a
24-hour horizon.
Systematically generated ensembles are used
extensively in numerical weather forecasting [29].
Our approach, on the other hand, needs to work
with a small ensemble, each member of which independently
achieves a certain level of accuracy. In
this case, simple combination strategies are desirable.
We considered five simple schemes for combining
the predictions:
1. Mean of four predictions,
2. Median of four predictions,
3. Switching among four predictions, using the one
with the smallest absolute error at the time when
the prediction is made,
4. Mean of HWT and sSVR, and
5. Switching between HWT and sSVR, using the
one with the smallest absolute error at the time
when the prediction is made.
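These combination schemes amount to a few lines each; a sketch with hypothetical model names as dictionary keys (schemes 4 and 5 apply the same mean and switch rules to the HWT/sSVR pair alone):

```python
from statistics import median

def combine(preds, last_errors, scheme):
    """Combine per-model predictions for one target hour.

    preds maps model name -> current prediction; last_errors maps
    model name -> absolute error observed at the time the prediction
    is made (used only by the switching scheme).
    """
    if scheme == "mean":
        return sum(preds.values()) / len(preds)
    if scheme == "median":
        return median(preds.values())
    if scheme == "switch":
        best = min(last_errors, key=last_errors.get)  # recently best model
        return preds[best]
    raise ValueError(f"unknown scheme: {scheme}")
```

The mean and median need no feedback at all, while switching only requires remembering each model's most recent error, so all five schemes add negligible runtime cost.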
We summarize in Figure 12 the performance of
these algorithms and of the combined predictions for
the system-wide aggregates from 2008 to 2009. The
mean of the HWT and sSVR predictions on the
system-wide data reaches a MAPE of about 3 percent
at a 24-hour
Figure 9. Predictions and prediction errors by four algorithms and a simple weather fit model (in gray) over one week in 2008 (February 5-11). These plots show (a) the predictions and (b) the prediction errors, in kWh, at horizon h = 24 hours. Curves: observed load, regression on temperature, HWT, sSVR, SARIMA, and SSM. (HWT: Holt-Winters model; SARIMA: seasonal auto-regressive integrated moving average; SSM: state-space model; sSVR: sigma-SVR; SVR: support vector regression.)
prediction horizon, down from about four percent
achieved by HWT alone. We can see that by most of
the performance criteria considered, either the mean
or the median of the four predictors gives the best
performance, and it is better than the best individual
method except for the horizon of one hour ahead
(which is best done by sSVR). Further investigation
will examine to what extent this observation gener-
alizes to smaller meter aggregates.
Conclusion

We methodically evaluated state-of-the-art STLF
methods on a unique dataset consisting of load
aggregates from individual meters, and showed a
dependency of the load forecasting performance on
the size of the aggregate. In this study, we considered
load forecasting at each meter aggregate as an inde-
pendent task, and did not fully exploit the pyramidal
structure of the meter-feeder-substation network.
Future investigations could explore such hierarchical
time series prediction.
Acknowledgements

The authors wish to acknowledge the help and
contribution of former and current members of
Alcatel-Lucent Bell Labs: Gary Atkinson, Kenneth
Budka, Jayant Deshpande, Frank Feather, Zhi He,
Marina Thottan and Kim Young Jin, as well as the
Figure 10. Relationship between load forecasting accuracy and the size of the load aggregate (i.e., the number of meters connected to the electrical structure). The monthly MAPE performance has been averaged over six months, from January to June 2012. The plot shows 1-hour-ahead MAPE (%) versus the number of meters connected to the aggregate, both on log10 scales, for HWT and sigma-SVR; a selected feeder and the system-wide aggregate are highlighted. (HWT: Holt-Winters model; MAPE: mean absolute percentage error; SVR: support vector regression.)
Figure 11. Predictions by two algorithms and a simple weather fit model (in gray) over three weeks in 2012 for a selected feeder connected to 12 individual meters. These plots show the predictions at horizon h = 1 hour: panels (a), (b), and (c) cover the weeks of February 12, February 19, and February 26, 2012, respectively, with load in Wh. Curves: observed load, regression on temperature, HWT, and sSVR. (HWT: Holt-Winters model; sSVR: sigma-SVR; STLF: short-term load forecasting; SVR: support vector regression.)
Figure 12. Performance of the four predictors and their combinations, benchmarked against a predictor using only regression on temperature, for prediction horizons from one to 24 hours ahead. The panels plot, versus horizon (hours): mean absolute percentage error (%), mean absolute error (kWh), root mean squared error (kWh), worst overshoot (kWh), and worst undershoot (kWh). Curves: Regression On Temp, HWT, sSVR, SARIMA, SSM, MEAN, MEDIAN, SWITCH, MEAN_HWT_sSVR, and SWITCH_HWT_sSVR. (HWT: Holt-Winters model; SARIMA: seasonal auto-regressive integrated moving average; sSVR: sigma-SVR; SVR: support vector regression.)
utility company for providing the meter and weather
dataset used in this study.
*Trademarks
Linux is a trademark of Linus Torvalds.
Matlab is a registered trademark of The Mathworks, Inc.
Perl is a trademark of the Perl Foundation.
Ubuntu is a registered trademark of Canonical Limited.
Xeon is a registered trademark of Intel Corporation.
References
[1] Y. Bengio, P. Simard, and P. Frasconi, "Learning Long-Term Dependencies with Gradient Descent Is Difficult," IEEE Trans. Neural Networks, 5:2 (1994), 157–166.
[2] A. Bhargava, “On the Theory of Testing for Unit Roots in Observed Time Series,” Rev. Econom. Stud., 53:3 (1986), 369–384.
[3] C. E. Borges, Y. K. Penya, and I. Fernández, “Optimal Combined Short-Term Building Load Forecasting,” Proc. IEEE PES Innovative Smart Grid Technol. Asia Conf. (ISGT ’11) (Perth, Aus., 2011).
[4] J. Cao, A. Chen, T. Bu, and A. Buvaneswari, “Monitoring Time-Varying Network Streams Using State-Space Models,” Proc. 28th IEEE Internat. Conf. on Comput. Commun. (INFOCOM ’09) (Rio de Janeiro, Bra., 2009), pp. 2721–2725.
[5] O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee, “Choosing Multiple Parameters for Support Vector Machines,” Mach. Learn., 46:1-3 (2002), 131–159.
[6] W. Charytoniuk, M. S. Chen, and P. Van Olinda, “Nonparametric Regression Based Short-Term Load Forecasting,” IEEE Trans. Power Syst., 13:3 (1998), 725–730.
[7] B.-J. Chen, M.-W. Chang, and C.-J. Lin, “Load Forecasting Using Support Vector Machines: A Study on EUNITE Competition 2001,” IEEE Trans. Power Syst., 19:4 (2004), 1821–1830.
[8] W. S. Cleveland and S. J. Devlin, “Locally-Weighted Regression: An Approach to Regression Analysis by Local Fitting,” J. Amer. Statist. Assoc., 83:403 (1988), 596–610.
[9] C. Cortes and V. Vapnik, “Support-Vector Networks,” Mach. Learn., 20:3 (1995), 273–297.
[10] V. Dordonnat, S. J. Koopman, M. Ooms, A. Dessertaine, and J. Collet, “An Hourly Periodic State Space Model for Modelling French National Electricity Load,” Internat. J. Forecasting, 24:4 (2008), 566–587.
[11] Enel, “Smart Metering System,” <http://www.enel.com/en-GB/innovation/project_technology/zero_emission_life/smart_networks/smart_meters.aspx>, accessed Mar. 5, 2012.
[12] S. Fan and L. Chen, “Short-Term Load Forecasting Based on an Adaptive Hybrid Method,” IEEE Trans. Power Syst., 21:1 (2006), 392–401.
[13] E. A. Feinberg and D. Genethliou, “Load Forecasting,” Applied Mathematics for Restructured Electric Power Systems: Optimization, Control, and Computational Intelligence (J. H. Chow, F. F. Wu, and J. A. Momoh, eds.), Springer, New York, 2005, pp. 269–285.
[14] P. Goovaerts, Geostatistics for Natural Resources Evaluation, Oxford University Press, New York, 1997.
[15] G. Gross and F. D. Galiana, “Short-Term Load Forecasting,” Proc. IEEE, 75:12 (1987), 1558–1573.
[16] F. Gullo, G. Ponti, A. Tagarelli, S. Iiritano, M. Ruffolo, and D. Labate, "Low-Voltage Electricity Customer Profiling Based on Load Data Clustering," Proc. 13th Internat. Database Eng. and Applications Symp. (IDEAS '09) (Cetraro, Calabria, Ita., 2009), pp. 330–333.
[17] H. Hahn, S. Meyer-Nieberg, and S. Pickl, “Electric Load Forecasting Methods: Tools for Decision Making,” European J. Oper. Res., 199:3 (2009), 902–907.
[18] T. M. Hansen, “mGstat: A Geostatistical Matlab Toolbox,” <http://mgstat.sourceforge.net/>.
[19] A. Harvey and S. J. Koopman, “Forecasting Hourly Electricity Demand Using Time-Varying Splines,” J. Amer. Statist. Assoc., 88:424 (1993), 1228–1236.
[20] J. D. Hobby, A. Shoshitaishvili, and G. H. Tucci, “Analysis and Methodology to Segregate Residential Electricity Consumption in Different Taxonomies,” IEEE Trans. Smart Grid, 3:1 (2012), 217–224.
[21] T. Hong, P. Wang, and H. L. Willis, “A Naïve Multiple Linear Regression Benchmark for Short Term Load Forecasting,” Proc. IEEE Power and Energy Soc. Gen. Meeting (Detroit, MI, 2011).
[22] A. Khotanzad, R. Afkhami-Rohani, and D. Maratukulam, "ANNSTLF—Artificial Neural Network Short-Term Load Forecaster—
Generation Three,” IEEE Trans. Power Syst., 13:4 (1998), 1413–1422.
[23] J. Z. Kolter and M. J. Johnson, “REDD: A Public Data Set for Energy Disaggregation Research,” Proc. KDD Workshop on Data Mining Applications in Sustainability (SustKDD ’11) (San Diego, CA, 2011).
[24] D. Kwiatkowski, P. C. B. Phillips, P. Schmidt, and Y. Shin, “Testing the Null Hypothesis of Stationarity Against the Alternative of a Unit Root,” J. Econometrics, 54:1-3 (1992), 159–178.
[25] E. Kyriakides and M. Polycarpou, “Short Term Electric Load Forecasting: A Tutorial,” Trends in Neural Computation (K. Chen and L. Wang, eds.), Springer, Berlin, New York, 2007, pp. 391–418.
[26] D. Mattera and S. Haykin, “Support Vector Machines for Dynamic Reconstruction of a Chaotic System,” Advances in Kernel Methods: Support Vector Learning (B. Schölkopf, C. J. C. Burges, and A. J. Smola, eds.), MIT Press, Cambridge, MA, 1999, pp. 211–242.
[27] A. Muñoz, E. F. Sánchez-Úbeda, A. Cruz, and J. Marín, “Short-Term Forecasting in Power Systems: A Guided Tour,” Handbook of Power Systems II: Energy Systems (S. Rebennack, P. M. Pardalos, M. V. F. Pereira, and N. A. Iliadis, eds.), Springer, Heidelberg, New York, 2010, pp. 129–160.
[28] E. A. Nadaraya, “On Estimating Regression,” Theory Probab. Appl., 9:1 (1964), 141–142.
[29] T. N. Palmer, G. J. Shutts, R. Hagedorn, F. J. Doblas-Reyes, T. Jung, and M. Leutbecher, “Representing Model Uncertainty in Weather and Climate Prediction,” Annual Review of Earth and Planetary Sciences, Volume 33 (R. Jeanloz, A. L. Albee, and K. C. Burke, eds.), Annual Reviews, Palo Alto, CA, May 2005, pp. 163–193.
[30] A. Pigazo and V. M. Moreno, “Estimation of Electrical Power Quantities by Means of Kalman Filtering,” Kalman Filter: Recent Advances and Applications (A. Pigazo and V. M. Moreno, eds.), In-Tech, Rijeka, Cro., Apr. 2009, pp. 375–396.
[31] S. Rogai, “Keynote I. Telegestore Project Progresses and Results,” IEEE Internat. Symp. on Power Line Commun. and Its Applications (ISPLC ’07) (Pisa, Ita., 2007).
[32] J.-H. Shin, B.-J. Yi, Y.-I. Kim, H.-G. Lee, and K. H. Ryu, “Spatiotemporal Load-Analysis Model for Electric Power Distribution Facilities Using Consumer Meter-Reading Data,” IEEE Trans. Power Delivery, 26:2 (2011), 736–743.
[33] R. P. Singh, P. X. Gao, and D. J. Lizotte, “On Hourly Home Peak Load Prediction,” Proc. 3rd IEEE Internat. Conf. on Smart Grid Commun. (SmartGridComm ’12) (Tainan, Twn., 2012), pp. 163–168.
[34] A. J. Smola and B. Schölkopf, “A Tutorial on Support Vector Regression,” Stat. Comput., 14:3 (2004), 199–222.
[35] L. J. Soares and M. C. Medeiros, “Modeling and Forecasting Short-Term Electricity Load: A Comparison of Methods with an Application to Brazilian Data,” Internat. J. Forecasting, 24:4 (2008), 630–644.
[36] R. G. Steadman, “A Universal Scale of Apparent Temperature,” J. Climate Applied Meteorology, 23:12 (1984), 1674–1687.
[37] J. W. Taylor and P. E. McSharry, “Short-Term Load Forecasting Methods: An Evaluation Based on European Data,” IEEE Trans. Power Syst., 22:4 (2007), 2213–2219.
(Manuscript approved October 2013)
PIOTR MIROWSKI is a member of technical staff at Bell Labs in Murray Hill, New Jersey. He obtained his Ph.D. in computer science at the Courant Institute of Mathematical Sciences at New York University, New York City, with a thesis in machine learning under the supervision of Prof. Yann LeCun.
He also has a master’s degree in computer science from École Nationale Supérieure ENSEEIHT in Toulouse, France. Prior to joining Bell Labs he worked as a research engineer in geology at Schlumberger Research. During his Ph.D. studies, he interned at the NYU Medical Center (investigating epileptic seizure prediction from EEG), at Google, at the Quantitative Analytics department of Standard & Poor’s and at AT&T Labs Research. His current research focuses on machine learning methods for text analysis and query ranking, on computer vision and simultaneous localization and mapping for robotics, on indoor localization, on time series modeling and load forecasting for smart grids, and on deep learning.
SINING CHEN is a member of technical staff in the Statistics and Learning Research Department at Bell Labs in Murray Hill, New Jersey. She received a B.S. in applied mathematics from Tsinghua University, Beijing, China and a Ph.D. in statistics from Duke University, Durham, North Carolina. After completing her doctorate, Dr. Chen worked at Johns Hopkins University, first as a postdoctoral fellow in the School of Medicine and then as an assistant professor in the School of Public Health. Prior to joining Bell Labs, she was an associate professor in the Department of Biostatistics, University of Medicine and Dentistry of New Jersey. Her current research interests include Bayesian methods and forecasting.
TIN KAM HO leads the Statistics of Communication Systems Research Activity in Bell Labs at Murray Hill. She pioneered research in multiple classifier systems, random decision forests, and data complexity analysis, and pursued applications of
automatic learning in many areas of science and engineering. She also led major efforts on modeling and monitoring large-scale optical transmission systems. Recently she has worked on wireless geo-location, video surveillance, smart grid data mining, and customer experience modeling. Her contributions were recognized by a Bell Labs President’s Gold Award and two Bell Labs Teamwork Awards, a Young Scientist Award in 1999, and the 2008 Pierre Devijver Award for Statistical Pattern Recognition. She is an elected Fellow of the IAPR (International Association for Pattern Recognition) and the IEEE, and served as editor-in-chief of the journal Pattern Recognition Letters from 2004 to 2010. She received a Ph.D. in computer science from the State University of New York (SUNY), Buffalo.
CHUN-NAM YU is a member of technical staff in the Statistics and Learning Department at Bell Labs in Murray Hill, New Jersey. He received a B.A. degree in mathematics and computer science from Oxford University, United Kingdom, and an M.S. degree and Ph.D. in computer science from Cornell
University, Ithaca, New York. Prior to joining Bell Labs, he was a postdoctoral fellow at the Alberta Innovates Centre of Machine Learning (AICML) at the University of Alberta in Edmonton, Alberta, Canada. His research interests include structured output learning, graphical models, kernel methods, optimization, and biomedical applications. ◆