
AN ANALYSIS OF SHORT-TERM LOAD FORECASTING ON RESIDENTIAL

BUILDINGS USING DEEP LEARNING MODELS

SREERAG SURESH

THESIS SUBMITTED TO THE FACULTY OF THE VIRGINIA POLYTECHNIC

INSTITUTE AND STATE UNIVERSITY IN PARTIAL FULFILLMENT OF THE

REQUIREMENTS FOR THE DEGREE OF

MASTER OF SCIENCE

IN

ENVIRONMENTAL ENGINEERING

FARROKH JAZIZADEH KARIMI, CHAIR

LINSEY C MARR

GABRIEL ISAACMAN-VANWERTZ

MAY 21ST, 2020

BLACKSBURG, VIRGINIA

KEYWORDS: LOAD FORECASTING, BUILDING ENERGY, CNN, DEEP

LEARNING, LSTM

Copyright © 2020, Sreerag Suresh

AN ANALYSIS OF SHORT-TERM LOAD FORECASTING ON RESIDENTIAL

BUILDINGS USING DEEP LEARNING MODELS

SREERAG SURESH

ABSTRACT

Building energy load forecasting is becoming an increasingly important task with the rapid deployment of smart homes, the integration of renewables into the grid and the advent of decentralized energy systems. Residential load forecasting has been a challenging task since the residential load is highly stochastic. Deep learning models have shown tremendous promise on time-series and sequential data and have been successfully used for short-term load forecasting at the building level. Although other studies have looked at using deep learning models for building energy forecasting, most of those studies have looked at a limited number of homes or an aggregate load of a collection of homes. This study aims to address this gap and serves as an investigation into selecting the better deep learning model architecture for short-term load forecasting on three communities of residential buildings. The deep learning models CNN and LSTM have been used in the study. For 15-min ahead forecasting on a collection of homes, it was found that homes with higher variance were better predicted by CNN models, while LSTM showed better performance for homes with lower variance. The effect of adding weather variables on 24-hour ahead forecasting was also studied, and it was observed that adding weather parameters did not improve forecasting performance. In all the homes, the deep learning models are shown to outperform the simple ANN model.

AN ANALYSIS OF DEEP LEARNING MODELS FOR SHORT TERM LOAD

FORECASTING ON RESIDENTIAL BUILDINGS

SREERAG SURESH

GENERAL AUDIENCE ABSTRACT

Building energy load forecasting is becoming an increasingly important task with the rapid deployment of smart homes, the integration of renewables into the grid and the advent of decentralized energy systems. Residential load forecasting has been a challenging task since residential load is highly stochastic. Deep learning models have shown tremendous promise on time-series and sequential data and have been successfully used for short-term load forecasting. Although other studies have looked at using deep learning models for building energy forecasting, most of those studies have looked at only a single home or an aggregate load of a collection of homes. This study aims to address this gap and serves as an analysis of short-term load forecasting on three communities of residential buildings. A detailed analysis of model performance across all homes has been carried out. Deep learning models have been used in this study, and their efficacy is measured against a simple ANN model.

ACKNOWLEDGMENTS

I would like to express my sincere gratitude to my advisor, Dr. Farrokh Jazizadeh, for his constant support throughout this study. I would not have been able to complete this research without his guidance. I would also like to express my gratitude to Dr. Linsey Marr and Dr. Gabriel Isaacman-VanWertz for serving on my committee and providing their valuable insights.

I am extremely grateful to all my colleagues at Virginia Tech, especially all members of the INFORM lab, for their constant support. I would also like to express my sincere gratitude to Cristiano Ronaldo, Kobe Bryant (RIP Mamba) and Uzumaki Naruto. They have been an immense source of inspiration and constantly motivate me to do my best.

I am thankful to my friends Manu Krishnan, Amal Shaj, Nevedita Sankararaman, Venkatesh Modi and Prachi Jain for providing their valuable feedback and helping me improve my thesis.

I would like to dedicate this thesis to my parents, Mr. Suresh Babu and Mrs. Deepa Suresh, for their unconditional love and support.

Table of Contents

1. Introduction
2. Literature Review
3. Methodology
   3.1 Problem Formulation
   3.2 Theoretical Background – Deep Learning
   3.3 Theoretical Background – Long Short-Term Memory Networks (LSTMs)
   3.4 Theoretical Background – Convolutional Neural Networks (CNNs)
   3.5 Theoretical Background – Multi Layered Perceptron (MLPs)
   3.5.1 Optimization Algorithm
   3.5.2 Regularization
4. Evaluation Method
   4.1 Data Collection and Characteristics
   4.1.1 Residential Homes Data
   4.1.2 Weather Data
   4.2 Data Description and Pre-processing
   4.2.1 Austin
   4.2.2 California
   4.2.3 New York
   4.3 Feature Engineering
   4.4 Implementation Setup
   4.5 Evaluation Metric
5. Results and Discussion
   5.1 Introduction
   5.2 Multiple Home Analysis
   5.2.1 Austin and California
   5.2.2 New York
   5.2.3 Variation of RMSE with Variance
   5.3 Multistep ahead forecasting
   5.3.1 Grid Data Forecasting
   5.3.2 Use Data Forecasting
   5.3.3 Variation with Lookback
6. Conclusions
References

1. Introduction

According to the US Energy Information Administration, global energy demand is projected to increase by 50%, mostly led by developing economies in Asia. This increasing demand would put a significant load on the present energy infrastructure and could also deteriorate global environmental health through increased emissions of greenhouse gases from conventional power sources[1]. Across the United States and Europe, it is estimated that 39% and 40% of electricity consumption, and 38% and 36% of CO2 emissions, respectively, come from the building sector alone[2]. Therefore, regulating and managing building-sector energy demand is an important task for a transition to a more sustainable use of our limited energy resources. Increasing the use of renewable sources of energy in buildings, along with improving building designs so that they are more energy efficient, are two ways of reducing building energy demand[1]. Energy load forecasting is another method that can be very useful in regulating building energy demand.

Energy load forecasting has advantages both economically and with respect to the energy infrastructure. Predicting future consumption enables utility companies to plan ahead and make economically feasible decisions with respect to resource planning and future generation. Large amounts of money are involved in energy budgets, and therefore providing reliable forecasts is of significant importance to engineers[3]. For buildings, load forecasting is instrumental in efficient building load management, commissioning load systems by noticing system faults, building load operation and avoiding blackouts[4]. Short-term load forecasting at an individual household level is important for implementing demand-side management, improving energy use, improving cost savings and ultimately reducing environmental impact.

In the last decade, there has been an uptick in the adoption of renewables and distributed generation sources in the grid, along with growing development and deployment of smart grids and buildings to meet the growing energy demand in an effective way. Integrating these distributed energy resources without causing disturbances in the grid requires accurate load forecasts across different time horizons[5].

Although load forecasting is a fairly mature field, there is a shortage of studies using data-driven methods for load forecasting at an individual building level in the US[6]. Load forecasting at an individual building level is a challenging task compared to aggregate building load forecasting, as individual building loads are highly stochastic, with multiple factors affecting energy consumption. Building energy consumption depends on factors such as occupant behaviour, weather parameters, use of appliances, the location of the building and the structural characteristics of the building[7].

Our overarching goal in this study is to use deep learning models, i.e., 1-D Convolutional Neural Networks (CNNs) and Long Short-Term Memory networks (LSTMs), for short-term load forecasting. 1-D CNNs have been successfully used in studying sequential data such as audio recordings and in analyzing time series of sensor data[8]. The strength of a CNN lies in its capacity to learn relevant features effectively from raw data without preprocessing[9]. Both CNNs and LSTMs have proved to be useful tools for load forecasting, showing promising results. In this study, we aim to answer the following questions with regard to short-term load forecasting:

• For a collection of homes, is there any value in setting up an individual deep learning forecasting model architecture for each home?

• For 24-hour ahead forecasts, is there any benefit in adding weather-based and date-based features to improve performance?

2. Literature Review

Load forecasting can save utilities up to $100,000 USD annually for a city[10], giving utilities an incentive to put focused effort toward improving forecasting accuracy. The economic and energy-infrastructure advantages of load forecasting have motivated approaches as early as 1956, where the authors tried to predict the daily peak load 24 hours ahead[11].

In this study, a systematic literature review was done initially to identify trends in load forecasting research. It was found that an increased number of research papers focused on the system level, whereas the number of studies focusing on individual buildings or low aggregations was comparatively lower. This was identified as a gap in the literature. The figure below shows the number of studies classified into 5 categories based on the geographic scope of prediction: (i) State level – load forecasting done at a state, province or national level, (ii) City level – forecasting done at a city level, (iii) Building level – load forecasting done at an individual building level, (iv) Neighborhood level – load forecasting done for an aggregate number of homes, and (v) Review Papers, denoted below as N/A.

Studies classified as ‘state level’ looked at forecasting system-level loads, such as loads for the state of California, New South Wales, Chandigarh, Hubei province in China, Singapore etc., using traditional machine learning models such as SVM, SVR and simple ANN, deep learning models, and hybrid optimization models to good effect ([12],[13],[14],[15]). City-level studies looked at forecasting loads of cities such as Johor, Langfang, Rome and Sydney using methods such as CNNs, general regression neural networks, and echo state networks with PCA decomposition ([16],[17],[18],[19]). At the neighborhood scale, forecasting was studied for collections of homes, mostly using the CER-Ireland and UK-DALE datasets, which contain a few hundred homes ([20],[21],[22]). The review studies on forecasting discuss the different methodologies, applications and challenges faced in load forecasting ([23],[24]).

Figure 1 : Preliminary literature review categorized based on the geographic scope of prediction.

Electric load forecasting can be categorized based on the time horizon of forecasting. Mocanu et al. categorized it into (i) short-term forecasting, which involves a prediction horizon in the range of one hour to one week, (ii) medium-term forecasting, which involves a range of one week up to one year, and (iii) long-term forecasting, which involves a prediction horizon greater than a year. Short-term forecasting is useful for demand-side management (DSM), generation capacity scheduling, renewable energy source (RES) integration and energy storage system applications[25, 26]. In this study, the prediction time horizon in focus is short-term load forecasting.

Building-level load forecasting methods mainly comprise two types: (i) physical modelling approaches and (ii) data-driven approaches. Physical models, or white-box models, depend on thermodynamic laws for energy modelling and analysis. Software packages that utilize physical models for building energy simulation include EnergyPlus and Ecotect. These tools use comprehensive building and environmental parameters such as building construction details; operation schedules; HVAC design information; and climate, sky, solar and shading information to calculate building energy consumption[2]. Such detailed data on the buildings may not always be available, resulting in poor performance during simulation.

Data-driven forecasting methods, on the other hand, do not require such a comprehensive collection of features but instead learn from historical data for prediction. Data-driven forecasting models include statistical models, hybrid models and machine learning models. Traditionally, statistical models like ARIMA, SARIMA and SARIMAX have been used for short-term load forecasting[27],[28],[29]. But with the popularity of machine learning, improvements in computing power and more data available, forecasting has shifted to more computational models. Machine learning models like SVR[30], SVM[31] and k-NN[32] have been used for forecasting energy loads to good effect. The development of intelligent optimization technologies has led to different types of smart optimization algorithms being applied to the field of building energy forecasting[7]. Hybrid models combining these optimization algorithms with machine learning models have also been used to good effect[4],[33].

In the past couple of years, deep learning methods have achieved tremendous success in handling complex sequential data[34],[35]. Deep learning methods have therefore also found application in load forecasting and have been shown to be capable of surpassing various benchmark models, such as simple ANNs and traditional statistical time series methods such as ARIMA and SARIMA[9]. With increased computational power, larger datasets and higher granularity of data available, deep learning models are poised to dominate the space of load forecasting. The electricity load dataset, forecast horizon and evaluation metric used in each deep learning study at the building level are described in the table below.

Table 1 : Deep learning studies at a building level, datasets used and their prediction horizons.

Deep Learning Model        Dataset Used                                       Forecast Horizon   Evaluation Metric
LSTM-S2S[36]               UCI Single Household                               60 hr.             RMSE
CNN[37]                    UCI Single Household                               1 hr., 60 hr.      RMSE
RBM[25]                    UCI Single Household                               1 d, 1 wk.         RMSE
Autoencoders and GAN[38]   Educational Building in Hong Kong                  1 hr.              RMSE
GRU, LSTM[39]              Educational Building in Hong Kong                  1 d                RMSE
Gated CNN and RNN[40]      BEMOSS project and EnerNOC buildings               1 d                CV-RMSE
LSTM[26]                   Canadian Household                                 1 hr.              RMSE
LSTM[41]                   48 non-residential Chinese buildings/industries    1 d                RMSE
LSTM[42]                   CER Ireland (920 homes)                            1 d                RMSE
RBM[43]                    40 industrial customers of KEPCO Korea             1 d                RMSE
CNN[44]                    Pecan Street (220 homes)                           6 hr., 1 d         RMSE
CNN[9]                     Pecan Street (220 homes)                           1 d                CV-RMSE
LSTM-S2S[5]                Pecan Street (30 homes), Single Building in Utah   1 wk., 1 yr.       CV-RMSE

3. Methodology

The CNN, LSTM and MLP models used in this study for short-term load forecasting all belong to the time series category. Being time series models, they do not require additional time-indexing parameters. As a result, these models are capable of revealing time dependencies intrinsically embedded in the input data and circumvent possible problems brought by incorrect time-index labelling[40]. The deep learning models compared in this study have been successfully used in the study of sequential and time-series data.

3.1 Problem Formulation

All the models used in the study are formulated in the same supervised learning framework to ensure a fair comparison in answering the following research questions:

i. Is there any value in setting up individual deep learning forecasting models for each home?

ii. Does adding weather-based and time-based features to the load data show an improvement in forecasting performance for multi-step ahead predictions?

For all the experiments conducted in this study, the input matrix X consists of the historical load profile along with weather-based and time-based features, and the output vector Y refers to the predicted load profile. Both the input historical load window and the output vector (prediction horizon) are configurable.
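The supervised framing above can be sketched as a sliding window over the load series. This is a minimal illustration: the lookback and horizon values below (one day of 15-min readings in, one hour out) are assumptions for the example, not the study's exact configuration.

```python
import numpy as np

def make_windows(load, lookback, horizon):
    """Frame a univariate load series as supervised pairs:
    X holds `lookback` past readings, Y the next `horizon` readings."""
    X, Y = [], []
    for t in range(len(load) - lookback - horizon + 1):
        X.append(load[t : t + lookback])
        Y.append(load[t + lookback : t + lookback + horizon])
    return np.array(X), np.array(Y)

# e.g. 96 past 15-min readings (one day) -> next 4 readings (one hour)
series = np.arange(200, dtype=float)
X, Y = make_windows(series, lookback=96, horizon=4)
print(X.shape, Y.shape)  # (101, 96) (101, 4)
```

Additional features (weather, calendar) would simply be stacked as extra columns of each window.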

There are three approaches for multi-step ahead time series forecasting: (i) the direct approach, (ii) the recursive approach and (iii) the Multi-Input Multi-Output (MIMO) approach. The recursive approach may suffer from error accumulation, while the direct approach requires more computational power than the recursive approach. The MIMO approach, on the other hand, circumvents the error accumulation drawback of the recursive method and also overcomes the conditional independence assumption used in the direct approach[39]. For the day-ahead (multi-step ahead) load prediction in this study, the MIMO approach has been used to forecast the load at each hour of the next day.

Figure 2 : MIMO approach for multi-step ahead predictions [39]
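The contrast between the recursive and MIMO strategies can be sketched with toy stand-in models. The persistence forecasters below are hypothetical placeholders for illustration only, not the trained models used in the thesis.

```python
import numpy as np

def recursive_forecast(one_step_model, history, horizon):
    """Feed each prediction back as input: errors can accumulate."""
    window = list(history)
    preds = []
    for _ in range(horizon):
        yhat = one_step_model(np.array(window))
        preds.append(yhat)
        window = window[1:] + [yhat]   # slide window, append prediction
    return np.array(preds)

def mimo_forecast(multi_output_model, history):
    """One forward pass emits the entire horizon at once."""
    return multi_output_model(np.array(history))

# toy stand-ins: a persistence one-step model and a persistence MIMO model
one_step = lambda w: w[-1]
mimo = lambda w: np.repeat(w[-1], 24)

hist = [2.0] * 96
print(recursive_forecast(one_step, hist, 24).shape)  # (24,)
print(mimo_forecast(mimo, hist).shape)               # (24,)
```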

3.2 Theoretical Background – Deep Learning

According to Yann LeCun et al., “Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction”[45]. Although the idea of ‘deep learning’ has been floating around for decades, it had often been considered a fancy concept rather than a feasible technology. This was mainly due to three constraints: (i) lack of sufficient training data, (ii) lack of computing power and (iii) lack of efficient training algorithms[42]. With the advancements in the semiconductor industry resulting in powerful graphics processing units (GPUs) and the rapid digitalization of the world, these constraints are now taken care of. Moreover, Geoffrey Hinton's[46] leap in developing efficient neural network training made deep learning implementations feasible. In the past few years, deep learning models have been extremely popular in computer vision, speech recognition, machine translation and board game programs, where they have produced results equivalent to expert human performance or sometimes even exceeding it[47].

The huge advantage that deep learning models have over traditional machine learning models is that they learn high-level features from data incrementally, which removes the requirement for subject knowledge and laborious feature extraction[48]. The main rationale for using deep learning models in this study lies in their superior ability compared to traditional neural networks: (i) to learn highly non-linear relationships and (ii) to learn shared uncertainties.

3.3 Theoretical Background – Long Short-Term Memory Networks (LSTMs)

LSTMs, or Long Short-Term Memory networks, belong to a class of networks called recurrent neural networks that can learn the order dependence between items in a sequence. Recurrent neural networks (RNNs) are specifically designed for dealing with sequential data and have been effectively used in machine translation, speech synthesis and time series prediction[49]. Traditional RNNs often suffer from the problem of vanishing gradients, which reduces their efficacy in dealing with long data sequences. LSTMs are able to partially mitigate this problem with the help of gates that control the flow of information, making them ideal for time series data with long temporal dependencies.

An LSTM unit memory cell comprises three gates, i.e., an input gate, an output gate and a forget gate, which regulate the flow of information within the unit cell. The gates contain sigmoid activation functions squashing values between 0 and 1, which is useful for updating or forgetting information. The forget gate decides what information should be kept or forgotten, taking as inputs the previous hidden state and the current input passed through the activation function. The input gate is then used to update the cell state, and the output gate is used to compute the new hidden state of the LSTM cell[50]. This mechanism of forgetting and keeping information within a cell makes the LSTM ideal for dealing with sequential data.

Figure 3 : LSTM Unit Cell and LSTM architecture used for time series forecasting.
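The gate mechanics described above can be sketched as a single LSTM cell update in NumPy. The hidden size and random parameters below are illustrative assumptions; a trained network would learn W, U and b.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM cell update. W, U, b stack the parameters for the
    forget (f), input (i), candidate (g) and output (o) transforms."""
    z = W @ x_t + U @ h_prev + b          # shape (4*n,)
    n = h_prev.size
    f = sigmoid(z[0*n:1*n])               # forget gate: keep/discard memory
    i = sigmoid(z[1*n:2*n])               # input gate: admit new information
    g = np.tanh(z[2*n:3*n])               # candidate cell state
    o = sigmoid(z[3*n:4*n])               # output gate: expose memory
    c_t = f * c_prev + i * g              # updated cell state
    h_t = o * np.tanh(c_t)                # new hidden state
    return h_t, c_t

n_in, n_hid = 1, 4
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * n_hid, n_in))
U = rng.normal(size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)

h = c = np.zeros(n_hid)
for x in [0.5, 0.7, 0.6]:                 # a tiny load sequence
    h, c = lstm_step(np.array([x]), h, c, W, U, b)
print(h.shape)  # (4,)
```

In practice a framework layer (e.g. Keras' LSTM) performs exactly this recurrence over the whole input window.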

3.4 Theoretical Background – Convolutional Neural Networks (CNNs)

Convolutional Neural Networks belong to a class of deep learning networks that are used for processing data with a grid-like topology[49]. This includes time-series data and image data, which can be thought of as 1-D and 2-D data grids respectively. They have been successfully used in computer vision, human activity recognition, natural language processing, drug discovery, time series forecasting etc. ([51],[8],[52],[53],[40]). A CNN uses a specialized linear mathematical operation called convolution in at least one of its layers[49]. In CNNs, the convolution operation is performed by repeated application of filters, or kernels, on the input data to obtain a feature map.

Three different operations take place in the convolutional layer. The first operation, described above, produces the feature map. The second is activation of the elements in the feature map using a nonlinear activation function, most commonly ReLU, the rectified linear activation function[49]. In the third step, a pooling operation is used to smooth and reduce the dimensions of the feature map output. The max pooling method is used in this study; it returns an array of the maximum values within a rectangular neighborhood of the previous layer[49]. A CNN may consist of one or more convolutional layers. The output of the convolutional layers is then received by the hidden, or fully connected, layers. The output layer is positioned after the hidden layers and performs a role identical to that of an output layer in a conventional neural network[37].

Page 20: Sreerag Thesis Word - Virginia Tech

14

Figure 4 : CNN Architecture for time series data2
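The three operations can be sketched for a univariate series. The kernel below is a hypothetical difference filter chosen for illustration; in a real CNN the kernel weights are learned from the data.

```python
import numpy as np

def conv1d_relu_maxpool(x, kernel, pool=2):
    """The three convolutional-layer operations described above:
    (1) slide the kernel over the series to get a feature map,
    (2) apply ReLU activation, (3) max-pool to shrink the map."""
    k = len(kernel)
    fmap = np.array([np.dot(x[i:i + k], kernel)
                     for i in range(len(x) - k + 1)])   # feature map
    fmap = np.maximum(fmap, 0.0)                        # ReLU
    pooled = np.array([fmap[i:i + pool].max()
                       for i in range(0, len(fmap) - pool + 1, pool)])
    return pooled

x = np.array([0.2, 0.5, 0.1, 0.9, 0.4, 0.3, 0.8, 0.6])
edge_kernel = np.array([1.0, -1.0])   # illustrative difference filter
out = conv1d_relu_maxpool(x, edge_kernel)
print(out)  # approx. [0.4 0.5 0.1]
```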

3.5 Theoretical Background – Multi Layered Perceptron (MLPs)

Multilayer perceptrons, also known as feedforward or artificial neural networks, are the archetypal deep learning models. Multilayer perceptrons are powerful machine learning models used for learning non-linear relationships within data and are highly flexible universal approximators[54]. They are extremely useful to machine learning practitioners and form the basis for many commercial machine learning applications[49]. They have been successfully used in load forecasting and other time series applications[2].

At a high level, a simple neural network consists of an input layer, a hidden layer and an output layer. Unlike recurrent neural networks, they have no feedback connections in which outputs of the model are fed back into itself[49]. Networks with just one hidden layer, also known as vanilla artificial neural networks, are used in this study. Detailed information on the workings of the multilayer perceptron can be found in the literature[49].

Figure 5 : Simple Multi Layered Perceptron1
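A one-hidden-layer forward pass can be sketched as follows; the layer sizes (a day of 15-min readings in, a day-ahead hourly profile out) are illustrative assumptions rather than the configuration used in the study.

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """Vanilla one-hidden-layer network: input -> hidden (ReLU) -> output.
    There are no feedback connections, unlike an RNN."""
    hidden = np.maximum(W1 @ x + b1, 0.0)   # hidden layer with ReLU
    return W2 @ hidden + b2                 # linear output layer

rng = np.random.default_rng(1)
n_in, n_hid, n_out = 96, 32, 24             # illustrative layer sizes
W1, b1 = rng.normal(size=(n_hid, n_in)), np.zeros(n_hid)
W2, b2 = rng.normal(size=(n_out, n_hid)), np.zeros(n_out)

y = mlp_forward(rng.random(n_in), W1, b1, W2, b2)
print(y.shape)  # (24,)
```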

3.5.1 Optimization Algorithm

In this study, all the models use the ADAM optimization algorithm for optimizing the weights of each layer. This adaptive learning rate optimization algorithm shows quicker convergence than traditional SGD[55]. It is a first-order gradient-based optimization algorithm that is intuitive, computationally efficient and tailor-made for optimizing models that involve a large set of parameters. Unlike stochastic gradient descent, which naively updates the weights with a constant learning rate, the ADAM optimization algorithm computes individual adaptive learning rates from the moments of the gradients. Further details about the ADAM optimization algorithm can be found in the literature[55].
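A minimal sketch of one ADAM step, following the Kingma and Ba update rule with its standard default hyperparameters; the toy quadratic objective is only for illustration.

```python
import numpy as np

def adam_update(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One ADAM step: per-parameter learning rates from the first and
    second moments of the gradient, with bias correction, instead of
    SGD's single constant rate."""
    m = b1 * m + (1 - b1) * grad            # first moment (mean)
    v = b2 * v + (1 - b2) * grad**2         # second moment (uncentered)
    m_hat = m / (1 - b1**t)                 # bias-corrected moments
    v_hat = v / (1 - b2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# minimize f(w) = w^2 starting from w = 1; gradient is 2w
w, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
for t in range(1, 501):
    w, m, v = adam_update(w, 2 * w, m, v, t)
print(float(w[0]) < 0.6)  # steadily decreasing toward 0
```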

1,2 These figures are generated using the web application: http://alexlenail.me/NN-SVG/index.html

3.5.2 Regularization

Machine learning models often suffer from overfitting, which results in testing errors much worse than the errors on the training data. This occurs when the model fits the training data too well, resulting in high variance and low bias. Strategies that decrease the testing error, sometimes at the cost of increased training error, are known as regularization[49]. In this study, weight decay regularization is used to address the problem of overfitting. The regularization parameter lambda is selected as 0.01, which is the default lambda value in Keras[56].
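The effect of weight decay can be sketched on a least-squares toy problem: the penalty term added to the loss is lambda times the squared weight norm, which shrinks the weights and trades a little training error for less overfitting. The data below is synthetic and purely illustrative.

```python
import numpy as np

lam = 0.01  # the lambda used in the study

def ridge_loss_grad(w, X, y):
    """Mean squared error plus an L2 weight-decay penalty, and the
    gradient of that penalized loss."""
    err = X @ w - y
    loss = np.mean(err**2) + lam * np.sum(w**2)
    grad = 2 * X.T @ err / len(y) + 2 * lam * w   # penalty adds 2*lam*w
    return loss, grad

rng = np.random.default_rng(2)
X, y = rng.normal(size=(50, 3)), rng.normal(size=50)
w = np.zeros(3)
for _ in range(200):                       # plain gradient descent
    loss, grad = ridge_loss_grad(w, X, y)
    w -= 0.1 * grad
print(np.round(w, 3))
```

In Keras, the equivalent is passing a regularizer such as `kernel_regularizer` with l2 penalty 0.01 to each layer.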

4. Evaluation Method

4.1 Data Collection and Characteristics

4.1.1 Residential Homes Data

The residential building data used for the study is obtained from the Pecan Street Inc. Dataport[57] for Austin, New York and California. The dataset is publicly available and was downloaded using the free student license. One year of 15-min frequency load data is used for the study, with 25 homes from Austin, 24 homes from New York and 23 homes from California selected. All the homes in Austin cover the time range 1 January 2018 to 1 January 2019, whereas the homes in California consist of yearly data anywhere between 1 January 2014 and 1 January 2019. In the case of New York, the data is partial-year, ranging from 1 January 2019 to 31 October 2019.

4.1.2 Weather Data

Weather data is used in the study only for the city of Austin. One year of weather data, from 1 January 2018 to 1 January 2019, is obtained from openweathermap.org[58]. The weather data consists of temperature, humidity and atmospheric pressure.

4.2 Data Description and Pre-processing

A total of 72 (25, 24 and 23) homes from Austin, New York and California are initially selected. All 72 homes are checked for missing values, since the deep learning models would not run with missing values present in the dataset. It is found that none of the homes in California have missing values, whereas several homes in Austin contain missing values. In the case of New York, large sections of the data are missing, and that collection of homes is analysed as a separate case study.

4.2.1 Austin

The home types and the missing-value percentages for the city of Austin are provided in Table 2. Homes in Austin with missing values greater than 0.5 % (approximately 170 missing readings out of 35,000) are omitted, and homes with missing values present but below 0.5 % are filled in by linear interpolation. The houses omitted from Austin are those with missing values above 0.5 % in Table 2. This leaves 20 homes from Austin (after interpolation). All the homes in Austin belong to the same building type, with a few homes having solar generation capacity.
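The screening rule described above can be sketched with pandas; the function name, the toy series and the synthetic gap positions are illustrative assumptions, not the thesis code.

```python
import numpy as np
import pandas as pd

def screen_and_fill(load: pd.Series, threshold_pct: float = 0.5):
    """Return the gap-filled series, or None if the home should be omitted.

    Homes whose share of missing 15-min readings exceeds `threshold_pct`
    percent are dropped; smaller gaps are filled by linear interpolation.
    """
    pct_missing = 100.0 * load.isna().sum() / len(load)
    if pct_missing > threshold_pct:
        return None  # omit this home, as done for 5 of the 25 Austin homes
    return load.interpolate(method="linear")

# Toy example: 1000 readings at 15-min frequency with 3 missing values (0.3 %)
idx = pd.date_range("2018-01-01", periods=1000, freq="15min")
s = pd.Series(np.random.rand(1000), index=idx)
s.iloc[[10, 500, 900]] = np.nan
filled = screen_and_fill(s)  # below the 0.5 % threshold, so interpolated
```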


Table 2 : List of residential buildings from Austin. Source: Pecan Street Inc, Dataport[57]

House ID % Missing Values Building Type Solar Available
661 0.95 Single-Family Home Yes
1642 0.52 Single-Family Home Yes
2335 0 Single-Family Home Yes
2361 0 Single-Family Home Yes
2818 0 Single-Family Home Yes
3039 1.5 Single-Family Home No
3456 0.01 Single-Family Home Yes
3538 0 Single-Family Home Yes
4031 0.01 Single-Family Home Yes
4373 0.01 Single-Family Home Yes
4767 0 Single-Family Home Yes
5746 0.38 Single-Family Home No
6139 0 Single-Family Home Yes
7536 0.01 Single-Family Home Yes
7719 0 Single-Family Home Yes
7800 0.06 Single-Family Home Yes
7901 0.01 Single-Family Home No
7951 0 Single-Family Home No
8156 0.01 Single-Family Home Yes
8386 0.01 Single-Family Home No
8565 0 Single-Family Home No
9019 0 Single-Family Home Yes
9160 0 Single-Family Home Yes
9278 7.08 Single-Family Home No
9922 4.26 Single-Family Home Yes


Figure 6 : 1 year of load data for a home (with solar, ID=2361) in Austin.

Figure 7 : 1-week of load data for a home (with solar, ID=2361) in Austin.

4.2.2 California

All 23 homes in California contain no missing values, but they consist of different building types, i.e. single-family homes, town homes and apartments. Only one of the homes in California has solar generation capacity. The data for these homes is given in Table 3.


Table 3 : List of residential buildings from California. Source: Pecan Street Inc, Dataport[57]

House ID Missing Values Building Type Solar Available

203 0 Single-Family Home No

1450 0 Town Home No

1524 0 Single-Family Home No

1731 0 Town Home No

2606 0 Town Home No

3687 0 Town Home No

3864 0 Town Home No

3938 0 Apartment No

4495 0 Apartment No

4934 0 Town Home No

5938 0 Town Home No

6377 0 Apartment No

6547 0 Town Home No

7062 0 Town Home No

7114 0 Town Home No

8061 0 Town Home No

8342 0 Town Home No

8574 0 Apartment No

8733 0 Apartment No

9213 0 Apartment No

9612 0 Town Home No

9775 0 Apartment No

9836 0 Town Home Yes


Figure 8 : 1 year of load data for a home (no solar, ID = 1450) in California

Figure 9 : 1 week of load data for a home (no solar, ID = 1450) in California

4.2.3 New York

In the case of the New York dataset, it was found that for all 24 homes data is present for the period between January 1, 2019 and October 31, 2019, with large portions of the data missing over similar date ranges. This can be observed in Figure 10 for home ID=914; other homes show a similar pattern. All the homes are of the single-family home type, with a few homes having solar generation capacity.


Table 4 : List of residential buildings from New York. Source: Pecan Street Inc, Dataport[57]

House ID Missing Values Building Type Solar Available

27 0 Single-Family Home Yes

387 0 Single-Family Home Yes

558 0 Single-Family Home No

914 0 Single-Family Home Yes

950 0 Single-Family Home Yes

1222 0 Single-Family Home Yes

1240 0 Single-Family Home No

1417 0 Single-Family Home No

2096 0 Single-Family Home No

2318 0 Single-Family Home No

2358 0 Single-Family Home No

3000 0 Single-Family Home Yes

3488 0 Single-Family Home Yes

3517 0 Single-Family Home Yes

3700 0 Single-Family Home No

3996 0 Single-Family Home No

4283 0 Single-Family Home No

4550 0 Single-Family Home No

5058 0 Single-Family Home Yes

5587 0 Single-Family Home Yes

5679 0 Single-Family Home Yes

5982 0 Single-Family Home No

5997 0 Single-Family Home Yes

9053 0 Single-Family Home No


Figure 10 : 1 year of load data for a home (with solar, ID=914) in New York

Figure 11 : 1-week load data for a home (with solar, ID = 914) in New York


4.3 Feature Engineering

To test the efficacy of the multistep (24-hour ahead) forecasting models with different combinations of features (multivariate forecasting), features had to be manually added to the load data. This experiment is carried out for a single home in Austin (ID=2361) only, and 24-hour ahead prediction is done for both grid and use data. Both the grid and use data are rescaled from 15-min frequency to 1-hour frequency, as the weather data is only available at the latter frequency. Weather-based features (temperature, humidity and pressure) and time-based features (day of the week, weekend/day and holiday) are used in this study. All these features cover the time range of 1 January 2018 to 1 January 2019. The weather and holiday data, which are obtained from external sources, are manually appended to the load data. The 'day of the week' and 'weekend/day' features are constructed from the datetime index in the load file. For the 'day of the week' feature, values of 0 to 6 are assigned for the days Monday to Sunday. For the 'weekend/day' feature, values of 1 and 0 are assigned for weekdays and weekends, respectively.

The above-mentioned features are added to the load data after rescaling, and the Pearson correlations between the load data and the features are computed. It is observed that the time-based features, i.e. 'day of the week' and 'weekend/day', show close to zero correlation with the load data. Similarly, the pressure variable also shows minimal correlation with the load data. Thus, these features are not considered for the multivariate multistep forecast study.
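The time-based feature construction and the Pearson correlation check described above might be sketched as follows with pandas; the column names and the random stand-in load values are assumptions, not actual Pecan Street data.

```python
import numpy as np
import pandas as pd

# Illustrative hourly frame; the 'grid' values are random stand-ins.
rng = np.random.default_rng(0)
idx = pd.date_range("2018-01-01", periods=24 * 365, freq="h")
df = pd.DataFrame({"grid": rng.random(len(idx))}, index=idx)

# Time-based features built from the datetime index of the load file
df["day_of_week"] = df.index.dayofweek                    # 0 = Monday ... 6 = Sunday
df["weekend_day"] = (df.index.dayofweek < 5).astype(int)  # 1 = weekday, 0 = weekend

# Pearson correlation of each feature with the load
corr = df.corr(method="pearson")["grid"].drop("grid")
```

With a random stand-in load, both time-based features show near-zero correlation, mirroring the observation that led to dropping them from the multivariate study.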


4.4 Implementation Setup

All the models are developed using the Keras API running on a TensorFlow 1.0 backend [59]. The analysis is done using Google Colab (Colaboratory), which provides access to external graphics processing units. Seven different models are used in this study to carry out the multiple-home analysis, comprising a multi-layer perceptron and several CNN and LSTM networks. The models are named MLP-1, CNN-1, CNN-2, CNN-3, CNN-4, LSTM-1 and LSTM-2. For all the models, 70 % of the data is used for training and 30 % for testing. The MLP-1 model consists of 3 layers, with 32 units in the input layer, 16 units in the hidden layer and a single output unit. The architectures of the other models are provided in Tables 8 and 9.
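The lookback windowing and the 70/30 chronological split used for all models could be set up as below; the function name and toy series are illustrative, not taken from the thesis code.

```python
import numpy as np

def make_windows(series: np.ndarray, lookback: int = 24):
    """Frame a load series as (samples, lookback) inputs and next-step targets.

    A 6-hour lookback at 15-min resolution corresponds to lookback = 24,
    matching the single-step setup used in the multiple-home analysis.
    """
    X, y = [], []
    for i in range(len(series) - lookback):
        X.append(series[i : i + lookback])
        y.append(series[i + lookback])
    return np.array(X), np.array(y)

series = np.arange(100, dtype=float)  # stand-in for one home's load
X, y = make_windows(series, lookback=24)

# Chronological 70/30 train/test split, as used for all 7 models
split = int(0.7 * len(X))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]
```

The split is chronological rather than shuffled, so the test period always follows the training period, as is standard for time-series forecasting.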

Table 5 : Correlation with 'Grid' data

Feature Correlation with 'Grid'
Temperature 0.424
Humidity -0.335
Pressure -0.169
'Day of the week' -0.0049
'Weekend/day' -0.0038

Table 6 : Correlation with 'Use' data

Feature Correlation with 'Use'
Temperature 0.479
Humidity -0.313
Pressure -0.168
'Day of the week' -0.0068
'Weekend/day' -0.0054


Table 8 : CNN model architectures used for multiple home analysis.

CNN Model Filters Kernel Sizes Pooling Filters Hidden Layers

1 [32] [[3]] [2] [1]

2 [64,32] [[3,3]] [2,2] [1]

3 [32,32,64,64,32,32] [[3,3,3,3,3,3]] [2,2] [1]

4 [32,8] [[4,4]] [3] [1]

Table 9 : LSTM architectures used for multiple home analysis.

LSTM Model LSTM Units Dropout Hidden Layers

1 [30] 0.2 1

2 [30,15] 0.2,0.2 1

Figure 12 : Code block of CNN-4 architecture compiled in Python in Google Colab
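Figure 12 itself is not reproduced here, but based on the Table 8 entry for CNN-4 (filters [32, 8], kernel size 4, pooling 3, one hidden layer), a Keras sketch of the architecture might look as follows. The hidden-layer width (16), the activations and the input shape are assumptions, not taken from the original code block.

```python
from tensorflow import keras
from tensorflow.keras import layers

lookback = 24  # 6-hour input window at 15-min resolution (an assumption)

model = keras.Sequential([
    keras.Input(shape=(lookback, 1)),      # univariate load window
    layers.Conv1D(32, kernel_size=4, activation="relu"),
    layers.Conv1D(8, kernel_size=4, activation="relu"),
    layers.MaxPooling1D(pool_size=3),
    layers.Flatten(),
    layers.Dense(16, activation="relu"),   # the single hidden layer (width assumed)
    layers.Dense(1),                       # single-step load forecast
])
model.compile(optimizer="adam", loss="mse")
```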


4.5 Evaluation Metric

To assess the efficacy of the models, the root mean squared error (RMSE) is used as the accuracy metric. RMSE is a scale-dependent accuracy metric:

RMSE = \sqrt{\frac{\sum_{i=1}^{n} \left( y_{pred,i} - y_{act,i} \right)^{2}}{n}}

where,

y_pred: the predicted values

y_act: the actual values

n: the number of samples.
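A minimal implementation of this metric (the function name is illustrative):

```python
import numpy as np

def rmse(y_pred, y_act) -> float:
    """Root mean squared error between predicted and actual load values."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_act = np.asarray(y_act, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_act) ** 2)))

# e.g. rmse([1.0, 2.0], [1.0, 4.0]) = sqrt((0 + 4) / 2) = sqrt(2)
```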


5. Results and Discussion

5.1 Introduction

This study has been conducted to answer the research questions listed at the end of Section 1. This section describes the results of the experiments for the two major research questions, i.e. evaluating the need for an individual deep learning model for each home in a collection of homes, and evaluating multistep-ahead (24-hour ahead) forecasting for an individual home using different features.

5.2 Multiple Home Analysis

In this experiment, 7 different models comprising ANN, CNN and LSTM architectures are used for single-step (15-min ahead) univariate forecasting across all the homes in the 3 locations, i.e. Austin, California and New York, from the Pecan Street data. The models are trained and tested on individual homes, and the forecasting performance is evaluated using RMSE values. To answer the research question, the RMSE values of all the homes for all the models are tabulated, and the overall best model is identified as the one with the minimum average RMSE. Then, for each home, the overall best model's RMSE is compared with every other model's RMSE to detect any significant difference from the best model. A significant difference in RMSE would indicate a need for a separate model for that particular home. The multiple-home analysis is done as 2 separate case studies. One of the case studies is for Austin and California,


which after preprocessing contain no missing values, and the second case study is for New York, where all the homes contain significant chunks of missing data.
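The model-selection procedure described above can be sketched as follows with pandas, using three of the Austin RMSE values from Table 10 (homes 2335, 3456 and 8156, for three of the seven models) as a toy input; the variable names are illustrative.

```python
import pandas as pd

# RMSE table (kW): rows = homes, columns = models (values from Table 10)
rmse = pd.DataFrame(
    {"MLP-1": [0.897, 0.555, 1.001],
     "CNN-3": [0.842, 0.547, 0.981],
     "LSTM-2": [0.854, 0.533, 0.906]},
    index=[2335, 3456, 8156],
)

# Best overall model = lowest average RMSE across homes
best_model = rmse.mean(axis=0).idxmin()

# Per-home % difference between the best overall model and that home's minimum
per_home_min = rmse.min(axis=1)
pct_diff = 100.0 * (rmse[best_model] - per_home_min) / per_home_min

# Homes whose difference exceeds 5 % would warrant their own model
needs_own_model = pct_diff[pct_diff > 5.0].index.tolist()
```

On this three-home subset the best overall model is LSTM-2 and no home crosses the 5 % threshold, consistent with the full-table analysis below.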

5.2.1 Austin and California

5.2.1.1 Austin

The 7 different model architectures are trained and run on each of the 20 homes in Austin. It is observed that the LSTM-2 architecture shows the best overall performance, whereas MLP-1 shows the worst performance amongst the 7 models. It is also observed that increasing model complexity by adding more layers, for both the CNN and LSTM models, does not yield any significant improvement in forecasting performance. Table 10 shows the test RMSE values obtained for all 7 models on the Austin homes for single-step (15-min ahead) predictions. The best overall model (LSTM-2) is compared to every other model's RMSE for each home, and the percentage difference between the best model's RMSE and the minimum RMSE for that home is noted in the table. In the case of the Austin dataset, only 3 homes show a significant difference (> 5 % difference in RMSE) between the best overall model and the minimum RMSE for that home.


Table 10 : Test RMSE (in kW) values of 20 homes in Austin.

House ID MLP-1 CNN-1 CNN-2 CNN-3 CNN-4 LSTM-1 LSTM-2 Variance of load (kW²) Min RMSE for the home (kW) % Difference between best model and min RMSE

2335 0.897 0.877 0.88 0.842 0.866 0.857 0.854 3.48 0.842 1.41

2361 0.756 0.753 0.753 0.733 0.763 0.761 0.743 3.57 0.733 1.35

2818 0.509 0.487 0.488 0.516 0.493 0.512 0.492 2.33 0.487 1.02

3456 0.555 0.564 0.548 0.547 0.536 0.552 0.533 1.82 0.533 0

3538 0.505 0.483 0.489 0.469 0.464 0.47 0.466 1.11 0.464 0.43

4031 0.651 0.628 0.651 0.639 0.636 0.628 0.62 1.94 0.62 0

4373 0.925 0.917 0.928 0.905 0.913 0.904 0.903 4.33 0.903 0

4767 1.014 0.995 1.003 0.908 0.992 0.974 0.975 5.22 0.908 6.87

5746 0.355 0.328 0.334 0.333 0.323 0.33 0.328 0.74 0.323 1.52

6139 0.821 0.813 0.811 0.809 0.815 0.811 0.81 2.36 0.809 0.12

7536 0.892 0.819 0.779 0.842 0.808 0.802 0.79 2.28 0.779 1.39

7719 1.08 1.034 1.101 0.93 1.054 1.039 1.033 3.63 0.93 9.97

7800 0.801 0.72 0.745 0.69 0.727 0.704 0.696 2 0.69 0.86

7901 0.667 0.588 0.597 0.551 0.583 0.618 0.612 1.24 0.551 9.97

7951 1.13 1.108 1.106 1.082 1.11 1.091 1.095 4.21 1.082 1.19

8156 1.001 0.941 0.909 0.981 0.914 0.937 0.906 4.64 0.906 0

8386 0.421 0.422 0.419 0.4 0.422 0.41 0.409 0.52 0.4 2.2

8565 0.523 0.532 0.522 0.53 0.504 0.508 0.5 1.06 0.5 0

9019 0.466 0.462 0.475 0.442 0.464 0.46 0.455 0.7 0.442 2.86

9160 0.541 0.503 0.504 0.6 0.493 0.513 0.496 1.17 0.493 0.6

Avg 0.7255 0.6987 0.7021 0.6874 0.6939 0.694 0.686

Table 11 : Homes showing significant and moderate differences in RMSE values w.r.t best model.

Homes with significant RMSE difference (> 5 %) 3

Homes with a moderate RMSE difference (2-5 %) 2

Total homes in Austin dataset 20


Figures 13, 14 and 15 show the actual and predicted load values for MLP-1, CNN-3 and LSTM-2 over 200 timesteps. Only a slight variation in performance is visible in the figures; although the differences are hard to see at first glance, on closer inspection CNN-3 and LSTM-2 predict the peaks more accurately than MLP-1.

Figure 13 : Predicted and actual values for MLP-1 for a home in Austin (ID=4031)

Figure 14 :Predicted and actual values for CNN-3 for a home in Austin (ID=4031).


Figure 15 : Predicted and actual values for LSTM-2 for a home in Austin (ID=4031).

5.2.1.2 California

In the California homes dataset, as with Austin, LSTM-2 and MLP-1 show the best and worst performance respectively, although CNN-4 shows the best performance amongst the CNN models. As with the Austin dataset, increasing model complexity does not improve performance; in fact, LSTM-2 and CNN-4 perform better than their more complex counterparts. The overall average RMSE is significantly lower than for the Austin dataset, which can be explained by the fact that only one of the homes in California has solar generation capacity. After comparing the best overall model (LSTM-2) with the minimum RMSE for each home, it is found that only 2 homes show a significant difference from the best overall model.


Table 12 : Test RMSE (in kW) values of 23 homes in California

House ID MLP-1 CNN-1 CNN-2 CNN-3 CNN-4 LSTM-1 LSTM-2 Variance of load (kW²) Min RMSE for the home (kW) % Difference between best model and min RMSE

203 0.43 0.393 0.396 0.381 0.392 0.405 0.396 0.24 0.381 3.79

1450 0.674 0.659 0.68 0.614 0.659 0.671 0.664 0.85 0.614 7.53

1524 0.486 0.47 0.479 0.477 0.461 0.433 0.433 0.47 0.433 0

1731 0.222 0.23 0.195 0.194 0.237 0.199 0.195 0.3 0.194 0.51

2606 0.609 0.601 0.58 0.539 0.594 0.579 0.585 0.76 0.539 7.86

3687 0.387 0.389 0.42 0.403 0.377 0.386 0.382 0.53 0.377 1.31

3864 0.388 0.371 0.384 0.429 0.377 0.369 0.381 0.3 0.369 3.15

3938 0.195 0.187 0.192 0.211 0.189 0.188 0.187 0.08 0.187 0

4495 0.107 0.105 0.106 0.108 0.105 0.104 0.106 0.03 0.104 1.89

4934 0.16 0.159 0.156 0.153 0.157 0.16 0.159 0.16 0.153 3.77

5938 0.162 0.159 0.166 0.171 0.161 0.176 0.158 0.04 0.158 0

6377 0.164 0.167 0.166 0.167 0.168 0.165 0.164 0.11 0.164 0

6547 0.355 0.367 0.375 0.373 0.372 0.369 0.368 0.17 0.355 3.53

7062 0.408 0.393 0.399 0.388 0.398 0.39 0.392 0.35 0.388 1.02

7114 0.402 0.396 0.398 0.477 0.384 0.373 0.373 0.35 0.373 0

8061 0.323 0.335 0.324 0.328 0.337 0.326 0.322 0.3 0.322 0

8342 0.253 0.258 0.258 0.256 0.251 0.258 0.258 0.31 0.251 2.71

8574 0.272 0.268 0.277 0.277 0.264 0.268 0.267 0.2 0.264 1.12

8733 0.253 0.252 0.248 0.237 0.25 0.246 0.244 0.39 0.237 2.87

9213 0.188 0.191 0.201 0.197 0.192 0.196 0.187 0.13 0.187 0

9612 0.279 0.29 0.279 0.277 0.285 0.282 0.283 0.49 0.277 2.12

9775 0.238 0.231 0.227 0.226 0.228 0.226 0.226 0.21 0.226 0

9836 .192 0.186 0.192 0.221 0.198 0.195 0.193 0.45 0.186 3.63

Average 0.3102 0.307 0.3086 0.3086 0.3049 0.3027 0.3020


Table 13 : List of homes showing significant and moderate differences in RMSE values w.r.t best model.

Homes with significant RMSE difference (> 5 %) 2

Homes with a moderate RMSE difference (2-5 %) 8

Total homes in California dataset 23

Figures 16, 17 and 18 show the actual and predicted load values of MLP-1, CNN-3 and LSTM-2 for a home in California. From the three figures, it is apparent that the deep learning models outperform the simple multilayer perceptron model.

Figure 16 : Predicted and actual values for MLP-1 for a home in California (ID=1731)


Figure 17 : Predicted and actual values for CNN-3 for a home in California (ID=1731)

Figure 18 : Predicted and actual values for LSTM-2 for a home in California (ID=1731)


5.2.2 New York

In the case of the New York dataset, all 24 homes contain large chunks of missing data in the first half of the year, as shown in Figure 10. The 7 models are run on these homes on the reasonable assumption that the missing values should not affect single-step prediction, since the lookback window is only 6 hours, or 24 timesteps. Interestingly, CNN-3 shows the best overall performance on the New York dataset. After comparing the best overall model for New York (CNN-3) with the minimum RMSE for each home, it is found that 7 homes show a significant difference from the best overall model.

From Table 14, it can be observed that the CNN-3 model tends to do better for homes with high variances, while the LSTM-2 model tends to perform best for homes with low variances. It is also observed that, despite the large chunks of missing data in the initial months, the models produce RMSE values comparable to those observed for the Austin dataset. This could be because, for single-step ahead prediction with a 6-hour lookback window, the large missing chunks of data are not a significant enough factor to affect performance.


Table 14 : Test RMSE (in kW) values of 24 homes in New York.

House ID MLP-1 CNN-1 CNN-2 CNN-3 CNN-4 LSTM-1 LSTM-2 Variance of load (kW²) Min RMSE for the home (kW) % Difference between best model and min RMSE

27 0.855 0.926 0.958 0.793 0.836 0.815 0.945 10.36 0.793 0

387 0.316 0.316 0.308 0.307 0.31 0.313 0.34 0.69 0.307 0

558 0.351 0.352 0.355 0.352 0.345 0.35 0.557 0.75 0.345 1.99

914 0.704 0.654 0.683 0.657 0.666 0.705 0.896 6.69 0.654 0.46

950 2.133 2.039 2.086 1.926 2.017 2.036 2.136 23.5 1.926 0

1222 1.106 1.059 1.055 1.05 1.061 1.069 1.027 4.67 1.027 2.19

1240 0.514 0.504 0.51 0.516 0.498 0.499 0.377 0.4 0.377 26.94

1417 0.4 0.395 0.391 0.389 0.397 0.394 0.481 0.57 0.389 0

2096 0.583 0.549 0.554 0.573 0.549 0.555 0.54 0.35 0.54 5.76

2318 0.402 0.396 0.398 0.402 0.397 0.395 0.385 0.4 0.385 4.23

2358 0.366 0.355 0.356 0.36 0.361 0.36 0.278 0.16 0.278 22.78

3000 0.665 0.62 0.61 0.602 0.642 0.639 0.677 3.83 0.602 0

3488 0.518 0.491 0.505 0.496 0.502 0.503 0.684 1.46 0.491 1.01

3517 0.424 0.396 0.399 0.389 0.391 0.384 0.475 1.53 0.384 1.29

3700 0.163 0.166 0.16 0.159 0.164 0.164 0.13 0.06 0.13 18.24

3996 0.701 0.699 0.695 0.71 0.701 0.702 0.706 0.71 0.695 2.11

4283 0.526 0.503 0.502 0.512 0.504 0.501 0.469 0.67 0.469 8.4

4550 0.348 0.343 0.344 0.351 0.344 0.346 0.284 0.23 0.284 19.09

5058 0.46 0.459 0.451 0.446 0.467 0.464 0.436 1.47 0.436 2.24

5587 0.666 0.651 0.661 0.646 0.656 0.667 0.648 2.76 0.646 0

5679 0.857 0.826 0.847 0.815 0.828 0.837 0.827 3.19 0.815 0

5982 0.43 0.429 0.426 0.42 0.428 0.425 0.472 0.76 0.42 0

5997 0.761 0.751 0.861 0.744 0.796 0.749 0.795 5.67 0.744 0

9053 0.402 0.398 0.397 0.402 0.401 0.403 0.347 0.61 0.347 13.68

Average 0.6104 0.5948 0.6033 0.5853 0.5942 0.5947 0.5928

Table 15 : List of homes showing significant and moderate differences in RMSE values w.r.t best model.

Homes with significant RMSE difference (> 5 %) 7

Homes with a moderate RMSE difference (2-5 %) 4

Total homes in New York dataset 24


Figure 19 : Predicted and actual values for MLP-1 for a home in New York (ID=27)

Figure 20 : Predicted and actual values for CNN-3 for a home in New York (ID=27)


Figure 21 : Predicted and actual values for LSTM-2 for a home in New York (ID=27)


5.2.3 Variation of RMSE with Variance

The minimum RMSE for each of the homes in the Austin and California datasets (43 in total) is plotted against the variance of the home's load. The New York dataset is not considered for this analysis as it contains large missing chunks of data for all the homes. A linear relationship is observed between the variance of a home's load and its minimum RMSE: higher variance in the load corresponds to a higher (poorer) minimum RMSE.

Figure 22 : Variance vs Min. RMSE for all the homes in Austin and California

The variance, minimum RMSE and the corresponding best model for all the homes in Austin and California are listed in Table 16, arranged in increasing order of variance. Although these results are not conclusive, it is observed that the CNN models tend to perform better for homes with higher variance, while the LSTM models tend to work better for homes with lower variance.
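The R² of the linear fit shown in Figure 22 can be computed as follows; the function name is illustrative, and the handful of (min RMSE, variance) pairs below are taken from Table 16 purely for illustration, so the resulting R² will not match the full 43-home fit.

```python
import numpy as np

def linear_r2(x: np.ndarray, y: np.ndarray) -> float:
    """R² of an ordinary least-squares line fit of y on x."""
    slope, intercept = np.polyfit(x, y, deg=1)
    y_hat = slope * x + intercept
    ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

# A few homes from Table 16: min RMSE (kW) vs variance of load (kW²)
min_rmse = np.array([0.104, 0.264, 0.433, 0.690, 0.930])
variance = np.array([0.03, 0.20, 0.47, 2.00, 3.63])
r2 = linear_r2(min_rmse, variance)
```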

Figure 22 plots the variance of the load (in kW², y-axis) against the minimum RMSE (in kW, x-axis) for each home, with a linear trendline of R² = 0.8382.


Table 16 : Best models for each home in Austin and California with their variances

Min RMSE (in kW) Variance (in kW², lowest to highest) Best Model Best Model Type
0.104 0.03 LSTM-1 LSTM
0.158 0.04 LSTM-2 LSTM
0.187 0.08 LSTM-2 LSTM
0.164 0.11 LSTM-2 LSTM
0.187 0.13 LSTM-2 LSTM
0.153 0.16 CNN-3 CNN
0.355 0.17 LSTM-2 LSTM
0.264 0.2 CNN-4 CNN
0.226 0.21 LSTM-2 LSTM
0.381 0.24 CNN-3 CNN
0.194 0.3 CNN-3 CNN
0.369 0.3 LSTM-1 LSTM
0.322 0.3 LSTM-2 LSTM
0.251 0.31 CNN-4 CNN
0.388 0.35 CNN-3 CNN
0.373 0.35 LSTM-2 LSTM
0.237 0.39 CNN-3 CNN
0.186 0.45 CNN-1 CNN
0.433 0.47 LSTM-2 LSTM
0.277 0.49 CNN-3 CNN
0.4 0.52 CNN-3 CNN
0.377 0.53 CNN-4 CNN
0.442 0.7 CNN-3 CNN
0.323 0.74 CNN-4 CNN
0.539 0.76 CNN-3 CNN
0.614 0.85 CNN-3 CNN
0.5 1.06 LSTM-2 LSTM
0.464 1.11 CNN-4 CNN
0.493 1.17 CNN-4 CNN
0.551 1.24 CNN-3 CNN
0.533 1.82 LSTM-2 LSTM
0.62 1.94 LSTM-2 LSTM
0.69 2 CNN-3 CNN
0.779 2.28 CNN-2 CNN
0.487 2.33 CNN-1 CNN
0.809 2.36 CNN-3 CNN
0.842 3.48 CNN-3 CNN
0.733 3.57 CNN-3 CNN
0.93 3.63 CNN-3 CNN
1.082 4.21 CNN-3 CNN
0.903 4.33 LSTM-2 LSTM
0.906 4.64 LSTM-2 LSTM
0.908 5.22 CNN-3 CNN


5.3 Multistep ahead forecasting

In this experiment, the effect of adding weather-based variables on multistep-ahead (24-hour ahead) forecasting is studied. The experiments are carried out on a single home in Austin (ID=2361) with solar generation capacity, using 1-hour frequency data. The 24-hour ahead forecasting is done for both the 'grid' and 'electricity use' data for the home. After trial and error, a best CNN model is identified for the multistep forecast, and the effects of adding the weather features and changing the length of the sliding window are studied in this section.
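The sliding-window framing for multivariate, multistep (24-hour ahead) forecasting might be set up as follows; the function name, the array layout (load in column 0, weather features in the remaining columns) and the random stand-in values are assumptions, not the thesis code.

```python
import numpy as np

def make_multistep_windows(data: np.ndarray, lookback: int, horizon: int = 24):
    """Frame a (timesteps, features) array for multistep forecasting.

    Inputs are `lookback` hours of all features; targets are the next
    `horizon` hours of the load, assumed to be in column 0.
    """
    X, y = [], []
    for i in range(len(data) - lookback - horizon + 1):
        X.append(data[i : i + lookback])
        y.append(data[i + lookback : i + lookback + horizon, 0])
    return np.array(X), np.array(y)

# Toy hourly data: load + temperature + humidity (3 features)
data = np.random.rand(500, 3)
X, y = make_multistep_windows(data, lookback=168, horizon=24)  # 7-day lookback
```

Changing the `lookback` argument reproduces the sliding-window-length experiment described later in Section 5.3.3.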

5.3.1 Grid Data Forecasting

The CNN model is used for forecasting grid data for the home (ID=2361). Using different combinations of features for 24-hour ahead forecasting, it is found that adding weather-based features such as temperature and humidity to the grid data does not yield any performance improvement over using only the grid data. This could be because temperature and humidity do not show a strong correlation with the grid data. It is also observed that the model with all the features combined shows the worst results, which could be due to overfitting.

Table 17 : RMSE values with different combination of features for 'grid' data

Features Used Test RMSE Values (in kW)

Only Grid 0.9120

Grid with Temperature 0.9367

Grid with Humidity 0.9410

Grid with Temperature & Humidity 0.9841


Figure 23 : 24-hr ahead forecast with 'Only grid'

Figure 24 : 24-hr ahead forecast with 'Grid' and 'Temperature'

Figure 25 : 24-hr ahead forecast with 'Grid' and 'Humidity'


Figure 26 : 24-hr ahead forecast with all the features.

5.3.2 Use Data Forecasting

The same CNN model used for grid forecasting is used for forecasting the 24-hour ahead 'electricity use' data. Similar to the results obtained for the grid data, it is found that adding the weather-based features temperature and humidity does not improve forecasting performance. Although the correlation of temperature with 'use' is higher than with 'grid', as shown in Tables 5 and 6, it is still not strong enough to improve the forecasting performance.

Table 18 : RMSE values with different combination of features for 'use' data

Features Used Test RMSE Values (in kW)

Only Use 0.5090

Use with Temperature 0.5115

Use with Humidity 0.5244

Use with Temperature & Humidity 0.5419


Figure 27 : 24-hr ahead forecast for 'Use'

Figure 28 : 24-hr ahead forecast for Use and 'Temperature'

Figure 29 : 24-hr ahead forecast for 'Use' and 'Humidity'


Figure 30 : 24-hr ahead forecast for all the features.

5.3.3 Variation with Lookback

In this experiment, the effect of different 'lookback' values, i.e. the number of timesteps (hours in this case) used as input to the forecast, is analyzed for predicting the 'electricity use' of a home in Austin (ID=2361). It is observed that the ideal lookback for a 24-hour ahead forecast lies in the range of 4-8 days (96-192 hours). This behaviour is observed in 3 experiments, forecasting using 'Only Use', 'Use and Temperature' and 'Use and Humidity', as shown in Figures 31, 32 and 33 below. Similar results are obtained for 24-hour ahead forecasting using grid data. Thus, a lookback of 4-8 days gives the best results for 24-hour ahead forecasts.


Figure 31 : RMSE vs Lookback for 24-hr forecast using 'Only Use'

Figure 32 : RMSE vs Lookback for 24-hr forecast using 'Use' and 'Temperature'

Figure 33 : RMSE vs Lookback for 24-hr forecast using 'Use' and 'Humidity'


6. Conclusions

With the growth of smart grids, smart homes and decentralized energy production, load forecasting at the individual building level becomes an increasingly important task. Deep learning models have been shown to surpass traditional statistical as well as hybrid models, and with more data available and increased computational power, their performance is only set to improve. This study reinforces the use of deep learning models for building energy forecasting. While other studies focus on models for an individual building or an aggregate building load, this study addresses that gap by serving as an analysis of short-term load forecasting on a community of residential buildings. The following conclusions are drawn from this study:

1. It was observed that the deep learning models outperform the ANN model in all cases.

2. For the multiple-home analysis, it is found that LSTM-2 was the best overall model for Austin and California. Only 5 homes out of a total of 43 (Austin and California combined) show a significant difference (> 5 % in RMSE) from the best performing overall model. This indicates that there is no pressing need for an individual model for each home; the best overall model can be applied across all homes with satisfactory results.

3. For the multiple-home analysis, it was also found that the CNN models give better performance for homes with higher variance, while the LSTM models perform better for homes with lower variance.


4. In the case of the New York dataset, all the homes have large chunks of missing data in the initial months of the year. It is found that the missing data does not affect performance for single-step forecasting, possibly because the lookback used is only 6 hours.

5. For 24-hour ahead forecasting, it is found that adding weather-based features such as temperature and humidity does not improve forecasting performance, possibly because the load does not correlate strongly enough with these features. The literature [40] shows that adding temperature can improve forecasting performance when the correlation between temperature and load is as high as 0.74.

6. For the 24-hour forecast, it is also observed that performance depends on the lookback window used; a lookback window in the range of 4 to 8 days is shown to give the best results.

7. A forecasting competition on a public residential dataset could be used to compare the different models already published, as most studies focus on only an individual home and do not report results on more than a single home.


References

1. Omer, A.M., Energy use and environmental impacts: A general review. Journal of renewable and Sustainable Energy, 2009. 1(5): p. 053101.

2. Amasyali, K. and N.M. El-Gohary, A review of data-driven building energy consumption prediction studies. Renewable and Sustainable Energy Reviews, 2018. 81: p. 1192-1205.

3. Namlı, E., H. Erdal, and H.I. Erdal, Artificial Intelligence-Based Prediction Models for Energy Performance of Residential Buildings, in Recycling and Reuse Approaches for Better Sustainability. 2019, Springer. p. 141-149.

4. Li, K., et al., A hybrid teaching-learning artificial neural network for building electrical energy consumption prediction. Energy and Buildings, 2018. 174: p. 323-334.

5. Rahman, A., V. Srikumar, and A.D. Smith, Predicting electricity consumption for commercial and residential buildings using deep recurrent neural networks. Applied energy, 2018. 212: p. 372-385.

6. Dong, B., et al., A hybrid model approach for forecasting future residential electricity consumption. Energy and Buildings, 2016. 117: p. 341-351.

7. Li, K., et al., Building's electricity consumption prediction using optimized artificial neural networks and principal component analysis. Energy and Buildings, 2015. 108: p. 106-113.

8. Um, T.T., V. Babakeshizadeh, and D. Kulić. Exercise motion classification from large-scale wearable sensor data using convolutional neural networks. in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2017. IEEE.

9. Voß, M., C. Bender-Saebelkampf, and S. Albayrak. Residential short-term load forecasting using convolutional neural networks. in 2018 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm). 2018. IEEE.

10. American Public Power Association. More accurate load forecasts help utilities save. 2018, September 25; Available from: https://www.publicpower.org/periodical/article/more-accurate-load-forecasts-help-utilities-save.

11. Gillies, D., B. Bernholtz, and P. Sandiford, A New Approach to Forecasting Daily Peak Loads. Transactions of the American Institute of Electrical Engineers. Part III: Power Apparatus and Systems, 1956. 75(3): p. 382-387.

12. Che, J. and J. Wang, Short-term load forecasting using a kernel-based support vector regression combination model. Applied energy, 2014. 132: p. 602-609.

13. Bedi, J. and D. Toshniwal, Deep learning framework to forecast electricity demand. Applied Energy, 2019. 238: p. 1312-1326.

14. He, F., et al., A hybrid short-term load forecasting model based on variational mode decomposition and long short-term memory networks considering relevant factors with Bayesian optimization algorithm. Applied energy, 2019. 237: p. 103-116.

15. Wu, Z., et al., A hybrid model based on modified multi-objective cuckoo search algorithm for short-term load forecasting. Applied energy, 2019. 237: p. 896-909.

16. Sadaei, H.J., et al., Short-term load forecasting by using a combined method of convolutional neural networks and fuzzy time series. Energy, 2019. 175: p. 365-377.

17. Liang, Y., D. Niu, and W.-C. Hong, Short term load forecasting based on feature extraction and improved general regression neural network model. Energy, 2019. 166: p. 653-663.

18. Bianchi, F.M., et al., Short-term electric load forecasting using echo state networks and PCA decomposition. IEEE Access, 2015. 3: p. 1931-1943.

19. Johannesen, N.J., M. Kolhe, and M. Goodwin, Relative evaluation of regression tools for urban area electrical energy demand forecasting. Journal of cleaner production, 2019. 218: p. 555-564.

20. Hayes, B., J. Gruber, and M. Prodanovic. Short-term load forecasting at the local level using smart meter data. in 2015 IEEE Eindhoven PowerTech. 2015. IEEE.

21. Sauter, P.S., et al. Load Forecasting in Distribution Grids with High Renewable Energy Penetration for Predictive Energy Management Systems. in 2018 IEEE PES Innovative Smart Grid Technologies Conference Europe (ISGT-Europe). 2018. IEEE.

22. Ebrahim, A.F. and O.A. Mohammed, Pre-Processing of Energy Demand Disaggregation Based Data Mining Techniques for Household Load Demand Forecasting. Inventions, 2018. 3(3): p. 45.

23. Wang, Y., et al., Review of smart meter data analytics: Applications, methodologies, and challenges. IEEE Transactions on Smart Grid, 2018. 10(3): p. 3125-3148.

24. Srivastava, A., A.S. Pandey, and D. Singh. Short-term load forecasting methods: A review. in 2016 International Conference on Emerging Trends in Electrical Electronics & Sustainable Energy Systems (ICETEESES). 2016. IEEE.

25. Mocanu, E., et al., Deep learning for estimating building energy consumption. Sustainable Energy, Grids and Networks, 2016. 6: p. 91-99.

26. Kong, W., et al., Short-term residential load forecasting based on resident behaviour learning. IEEE Transactions on Power Systems, 2017. 33(1): p. 1087-1088.

27. Alberg, D. and M. Last, Short-term load forecasting in smart meters with sliding window-based ARIMA algorithms. Vietnam Journal of Computer Science, 2018. 5(3-4): p. 241-249.

28. Chakhchoukh, Y., P. Panciatici, and P. Bondon. Robust estimation of SARIMA models: Application to short-term load forecasting. in 2009 IEEE/SP 15th Workshop on Statistical Signal Processing. 2009. IEEE.

29. Bercu, S. and F. Proïa, A SARIMAX coupled modelling applied to individual load curves intraday forecasting. Journal of Applied Statistics, 2013. 40(6): p. 1333-1348.

30. Hu, Z., Y. Bao, and T. Xiong, Electricity load forecasting using support vector regression with memetic algorithms. The Scientific World Journal, 2013. 2013.

31. Jain, A. and B. Satish. Clustering based short term load forecasting using support vector machines. in 2009 IEEE Bucharest PowerTech. 2009. IEEE.

32. Voß, M., A. Haja, and S. Albayrak. Adjusted feature-aware k-nearest neighbors: Utilizing local permutation-based error for short-term residential building load forecasting. in 2018 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm). 2018. IEEE.

33. Li, K., H. Su, and J. Chu, Forecasting building energy consumption using neural networks and hybrid neuro-fuzzy system: A comparative study. Energy and Buildings, 2011. 43(10): p. 2893-2899.

34. Baccouche, M., et al. Sequential deep learning for human action recognition. in International workshop on human behavior understanding. 2011. Springer.

35. Yu, D. and L. Deng, Deep learning and its applications to signal and information processing. IEEE Signal Processing Magazine, 2010. 28(1): p. 145-154.

36. Marino, D.L., K. Amarasinghe, and M. Manic. Building energy load forecasting using deep neural networks. in IECON 2016-42nd Annual Conference of the IEEE Industrial Electronics Society. 2016. IEEE.

37. Amarasinghe, K., D.L. Marino, and M. Manic. Deep neural networks for energy load forecasting. in 2017 IEEE 26th International Symposium on Industrial Electronics (ISIE). 2017. IEEE.

38. Fan, C., et al., Deep learning-based feature engineering methods for improved building energy prediction. Applied energy, 2019. 240: p. 35-45.

39. Fan, C., et al., Assessment of deep recurrent neural network-based strategies for short-term building energy predictions. Applied energy, 2019. 236: p. 700-710.

40. Cai, M., M. Pipattanasomporn, and S. Rahman, Day-ahead building-level load forecasts using deep learning vs. traditional time-series techniques. Applied Energy, 2019. 236: p. 1078-1088.

41. Jiao, R., et al., Short-term non-residential load forecasting based on multiple sequences LSTM recurrent neural network. IEEE Access, 2018. 6: p. 59438-59448.

42. Shi, H., M. Xu, and R. Li, Deep learning for household load forecasting—A novel pooling deep RNN. IEEE Transactions on Smart Grid, 2017. 9(5): p. 5271-5280.

43. Ryu, S., J. Noh, and H. Kim, Deep neural network based demand side short term load forecasting. Energies, 2017. 10(1): p. 3.

44. Elvers, A., M. Voß, and S. Albayrak. Short-term probabilistic load forecasting at low aggregation levels using convolutional neural networks. in 2019 IEEE Milan PowerTech. 2019. IEEE.

45. LeCun, Y., Y. Bengio, and G. Hinton, Deep learning. Nature, 2015. 521(7553): p. 436-444.

46. Hinton, G.E., S. Osindero, and Y.-W. Teh, A fast learning algorithm for deep belief nets. Neural computation, 2006. 18(7): p. 1527-1554.

47. Silver, D., et al., Mastering the game of go without human knowledge. Nature, 2017. 550(7676): p. 354-359.

48. Cao, Y., et al., Predicting Long-Term Health-Related Quality of Life after Bariatric Surgery Using a Convolutional Neural Network: A Study Based on the Scandinavian Obesity Surgery Registry. Journal of clinical medicine, 2019. 8(12): p. 2149.

49. Goodfellow, I., Y. Bengio, and A. Courville, Deep learning. 2016: MIT press.

50. Olah, C. Understanding LSTM Networks. 2015 [cited 2020; Available from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/.

51. Garg, R., et al. Unsupervised cnn for single view depth estimation: Geometry to the rescue. in European Conference on Computer Vision. 2016. Springer.

52. Zhang, Y., S. Roller, and B. Wallace, MGNC-CNN: A simple approach to exploiting multiple word embeddings for sentence classification. arXiv preprint arXiv:1603.00968, 2016.

53. Gawehn, E., J.A. Hiss, and G. Schneider, Deep learning in drug discovery. Molecular informatics, 2016. 35(1): p. 3-14.

54. Pao, J.J. and D.S. Sullivan, Time Series Sales Forecasting. Final Year Project, 2017.

55. Kingma, D.P. and J. Ba, Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

56. Chollet, F. Keras: Deep Learning for humans. 2015 [cited 2020; Available from: https://github.com/keras-team/keras.

57. Pecan Street Inc. Dataport: Researcher access to Pecan Street's groundbreaking energy and water data. [cited 2020; Available from: https://www.pecanstreet.org/dataport/.

58. OpenWeather Ltd. OpenWeather Map. 2020 [cited March 20, 2020; Available from: https://openweathermap.org/.

59. TensorFlow. An end-to-end open source machine learning platform. [cited 2019; Available from: https://www.tensorflow.org/.