Test of Several Whole-Building Baseline Models
Phillip Price, Jessica Granderson, Lawrence Berkeley National Laboratory, September 2012


Page 1

1

Phillip Price, Jessica Granderson

Lawrence Berkeley National Laboratory

September 2012

Test of Several Whole-Building Baseline Models

Page 2

Preliminary Baseline Evaluation Study

• Premise: statistical performance metrics can be used to evaluate automated whole-building baseline methods
– To determine and compare accuracy of both proprietary and ‘open’ methods

• Objective performance assessment methodology can provide a win/win
– Allow vendors to retain proprietary IP underlying the algorithms

– Allow users to gauge performance of the tool/approach

– Give industry confidence needed for scaled deployment, widespread adoption

2

Page 3

Motivation: Industry Need

• To sustain and increase savings, utilities must move to a whole-building focus
– Small savings are not sufficient; must ‘show’ savings in WB site energy use

• Industry is not yet able to deliver cost effective whole-building M&V at scale, with accuracy and rigor

– IPMVP and ASHRAE Guideline 14 are not specific enough, and still require experts to implement

– Not all practitioners have the expertise; those that do produce differing estimates

• EMIS hold promise to automate and reduce M&V costs, but vendors do not yet commonly disclose the error/accuracy of the baselines used to determine WB savings

Site-level M&V is a common step in both gross and net savings, and important to owners

3

Page 4

Motivation: Automated M&V

• Baseline model types in commercial tools
– Regression, bin, neural nets, weighted averaging

• Advantages: streamline, reduce M&V time, costs

• Disadvantages
– Lack of transparency in proprietary methods

– How to make adjustments if needed

• Standard performance metrics are commonly used
– Manufacturing: mean time to failure, Cpk process quality control

– Building systems: SEER, COP, W/sf, lumens/W

– Why not for energy performance tracking tools?

4

Page 5

Goals

5

• Compare performance of Pulse Adaptive Model (PAM) to that of typical models currently in use for M&V, for a variety of buildings. (Study was funded by Pulse Energy).

• To the extent possible, determine whether the PAM complies with ASHRAE Guideline 14, which lists requirements for models used for M&V:

– No “net determination bias” when applied to existing data.

– Maximum nRMSE of 0.20 = 20% for (monthly) energy use when less than 12 months of post-retrofit data are available.

nRMSE = “normalized Root Mean Squared Error” nRMSE = 0.20 means the prediction for a typical month is wrong by about 20% of the average month’s energy consumption
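The definition above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the study's code: it assumes the normalization used is RMSE divided by the mean of the actual values, which matches the "about 20% of the average month's energy consumption" reading of nRMSE = 0.20.

```python
import math

def nrmse(actual, predicted):
    """Normalized root-mean-squared error: RMSE divided by the mean of
    the actual values, so 0.20 means a typical prediction is off by
    about 20% of average consumption."""
    n = len(actual)
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n
    return math.sqrt(mse) / (sum(actual) / n)

# Hypothetical monthly energy use (kWh) vs. a model's baseline prediction
actual = [100.0, 120.0, 110.0, 90.0]
predicted = [110.0, 115.0, 100.0, 95.0]
print(round(nrmse(actual, predicted), 3))  # 0.075
```

A value of 0.075 here would be well inside the Guideline 14 limit of 0.20.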

Page 6

Models Compared

6

1. Pulse Adaptive Model (PAM)
2. Mean Week = every week is identical to the average week (MW)
3. Change-point = Temperature is the only explanatory variable (CP)
4. Day, Time, and Temperature = daily, hour-of-day, and temperature effects (DTT)
5. LBNL Regression Model = hour-of-week and temperature effects (LBNL)

All of the comparison models (2-5) meet Guideline 14’s bias requirement.
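Of these, the Mean Week model is simple enough to sketch in full: every hour-of-week in the prediction period is forecast as the average of that hour-of-week over the training period. The code below is an illustrative reconstruction of that idea, not Pulse's or LBNL's implementation; the data layout (a list of `(hour_of_week, kWh)` readings) is assumed.

```python
from collections import defaultdict

def fit_mean_week(readings):
    """readings: (hour_of_week, kWh) pairs from the training period,
    with hour_of_week in 0..167. Returns a per-hour-of-week baseline:
    every future week is predicted to look like the average week."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for how, kwh in readings:
        totals[how] += kwh
        counts[how] += 1
    return {how: totals[how] / counts[how] for how in totals}

# Two training weeks: hour-of-week 0 consumed 5 then 7 kWh,
# so the baseline for hour 0 of any future week is 6 kWh.
baseline = fit_mean_week([(0, 5.0), (0, 7.0), (1, 4.0)])
print(baseline[0])  # 6.0
```

Because it ignores weather entirely, MW serves as a useful floor when comparing the temperature-aware models.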

Page 7

Situations Considered

1. Use 6, 9, or 12 months of data to “train” the models
2. Predict daily, weekly, or monthly energy consumption
3. Compare results using several metrics, including:

• Normalized Root-Mean-Squared Error (nRMSE)
• Median Absolute Relative Total Error
• Correlation of Predictions to Data
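The second metric can be read as: for each building, take the absolute difference between total predicted and total actual energy over the prediction period, relative to the actual total; then take the median across buildings. That reading is an assumption on our part (the report defines the metric precisely), but it can be sketched as:

```python
import statistics

def abs_relative_total_error(actual, predicted):
    """|total predicted - total actual| / total actual for one
    building, over the whole prediction period."""
    return abs(sum(predicted) - sum(actual)) / sum(actual)

# One value per building (hypothetical data); summarize the
# portfolio with the median, which resists outlier buildings.
per_building = [
    abs_relative_total_error([100, 120], [105, 118]),  # building A
    abs_relative_total_error([80, 90], [70, 95]),      # building B
    abs_relative_total_error([50, 60], [52, 61]),      # building C
]
print(round(statistics.median(per_building), 4))  # 0.0273
```

Unlike nRMSE, this metric rewards errors that cancel over the period, which is the quantity that matters for estimating total savings.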

Page 8

Example: Pulse Adaptive Model, 6 months training data, predict daily energy use.

Top: black shows hourly power, red is daily average power, orange is forecast

Bottom: hourly data not shown, red is daily average power, orange is forecast

Page 9

Example 2: Pulse Adaptive Model, 6 months training data, predict daily energy use.

Top: black shows hourly power, red is daily average power, orange is forecast

Bottom: hourly data not shown, red is daily average power, orange is forecast

Page 10

Comparing Models: Example

Compare Pulse model to Change-point model:
• 9 months of training data
• Predict weekly energy consumption (integrated load)
• Plot normalized root mean squared error from each model, for each building
• Each building generates one data point on this plot

Pulse model performs better than change-point model in all but a few buildings.

Page 11

PAM nRMSE (y-axis) compared to LBNL regression model (x-axis)

PAM was better (LBNL regression model was worse), for buildings plotted in lower right of each plot.

LBNL regression was better (PAM was worse) for buildings plotted in upper left of each plot.

6, 9, or 12 months training (columns)

Predicting Days, Weeks, or Months (rows)

Bottom row shows relative total error for the entire predicted period. This is related to Guideline 14 compliance.

Building-by-building model comparison

Page 12

Summarize Model Performance

12

Summarize the goodness-of-fit for
• each model (PAM, MW, CP, …),
• each predicted quantity (daily, weekly, monthly),
• and each training period

Boil it down to one number for each combination of (model, quantity, training period). Example: what was the median nRMSE over all of the buildings, for a given model and predicted quantity?

nRMSE = “normalized Root Mean Squared Error” nRMSE = 0.20 means the prediction for a typical month is wrong by about 20% of the average month’s energy consumption

Page 13

Median nRMSE by training period and predicted quantity, for each model

Page 14

Observations About Model Results

14

No single model worked best for all training periods and all buildings.

Usually PAM, LBNL, and DTT were best.
Usually MW and CP were worst.

Small quantitative differences don’t mean much: with a slightly different dataset the numbers, and the order of the models, might change a bit.

Page 15

Study Validity

Black dots show buildings where Pulse could have cheated (i.e. they had the data for the prediction period).

Blue dots show buildings where Pulse could not have cheated.

PAM performance was very similar to LBNL regression model for both types of buildings. There is no evidence that Pulse artificially improved their predictions for cases for which they had data.

Page 16

PAM compliance with ASHRAE Guideline 14

LBNL cannot independently guarantee compliance. However:

1. When applied to data that were not used to fit the model, PAM shows no tendency toward bias in either direction. Also, the pattern of errors is very similar to that of the LBNL regression model, which is Guideline 14 compliant.

2. Even when applied to data that were not used to fit the model, PAM achieves the Guideline 14 accuracy requirement in most buildings, even though that requirement applies only to the data used to fit the model.

The bias requirement for Guideline 14 is very strict (less than 0.005% net determination bias), so it is possible that PAM fails to meet this requirement even though, with these data, PAM was not biased at a practically important level.
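Roughly speaking, net determination bias measures how far the model's total prediction drifts from total measured energy over the fitting period, as a percentage. The sketch below assumes that reading and a particular sign convention; the exact Guideline 14 formula is in the standard.

```python
def net_determination_bias(actual, predicted):
    """Percent by which the model's total prediction misses total
    measured energy over the fitting period (illustrative sketch of
    the Guideline 14 quantity; sign convention assumed)."""
    return 100.0 * (sum(actual) - sum(predicted)) / sum(actual)

# A model that over-predicts some periods and under-predicts others
# can still net out to zero bias, even with nonzero nRMSE:
bias = net_determination_bias([100.0, 120.0], [105.0, 115.0])
print(bias)  # 0.0
```

This is why a model can satisfy the bias requirement while failing the accuracy requirement, or vice versa: bias lets errors cancel, nRMSE does not.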

Based on this sample of buildings, it is likely that PAM easily meets the Guideline 14 accuracy requirement in most buildings.

Page 17

Conclusions

The Pulse Adaptive Model, LBNL regression model, and Day-Time-Temperature models all performed roughly equally when applied to the available data.

The Change-point-only model performed very badly compared to the others.

PAM appears to be compliant or nearly compliant with ASHRAE Guideline 14 (we cannot independently rule out a small bias).

12 months of training data aren’t always needed: good models can be created with as little as 6 months of data (or even less) for some buildings.

More months of data don’t always lead to better models.

An LBNL report with detailed results has been released: Granderson J and Price PN, “Evaluation of the Predictive Accuracy of Five Whole-Building Baseline Models,” Lawrence Berkeley National Laboratory, LBNL-5886E, August 2012.