High Resolution Energy Modeling that Scales with Apache Spark 2.0 – Spark Summit East talk by Jonathan Farland


High Resolution Energy Modeling that Scales with Apache Spark 2.0

Jonathan Farland, Consultant | Data Scientist, DNV GL

About me
• Data Scientist & Technical Consultant for DNV GL's Policy Advisory and Research Group.

• Background in Econometrics, Forecasting, Machine Learning and Optimization.

• Working with Big Data for 3+ years

Agenda
• Introduction to DNV GL
• Energy Data Science using Spark
  – Data Scales and the DGP
  – Application 1 – Princeton Scorekeeping Method (PRISM)
  – Application 2 – Hourly Predictive Modelling with Distributed Energy Resources
• Next Steps with Spark and Databricks

Introduction to DNV GL

Jonathan Farland, Consultant | Data Scientist, DNV GL

Energy Data Science: Data Scales and the DGP

Jonathan Farland, Consultant | Data Scientist, DNV GL

Metering Data: Historical measured quantities of electricity usage for a site or meter over a particular time period.

- Analogue in origin, requiring a physical reading of the meter on a specific cycle.

- Typically used by utility companies to bill customers for their usage

- Advanced metering technologies and machine learning now allow for millisecond readings and disaggregation down to the end use / appliance level.

Weather Data:

- Actual Weather: Records of temperature, humidity, cloud cover, solar irradiance, etc.

- Typical Weather: 30-year / 10-year averages that define “normal” weather conditions

Data Generating Process

[Diagram] Electricity Distribution Grid: Generation → Transmission → Distribution → Consumer. Wind farms and photovoltaics connect at aggregated utility scale (2–50 MW), utility scale (100 kW–2 MW), and distributed scale (25–100 kW); bulk storage (> 50 MW) sits on the bulk system, while residential and commercial & industrial consumers sit on the distribution system.

The Rise of The Smart Grid

Data Scales

The embarrassingly parallel ‘Primary Modeling Unit’:
I. Temporal: Sub-hourly, hourly, daily, monthly, annually
II. Cross-Sectional: Clusters/Segments, Geography, System Hierarchy
III. Hybrid: Structure- and year-specific

Databricks: Rapid deployment and development of the existing analytics pipeline

Spark 2.0: SparkR allows for UDFs and Partition-Based Model Learning
- gapply, dapply, lapply

Spark 2.1:
- Enable installing third-party packages on workers using spark.addFile
- SPARK-7159: Multiclass Logistic Regression in DataFrame-based API
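For illustration, a minimal sketch of that Spark 2.1 pattern: the driver ships a source package to every worker with spark.addFile, and each worker installs it on first use inside a spark.lapply call. The tarball path and package name here are hypothetical.

```r
library(SparkR)
sparkR.session()

# Ship a source package tarball to every worker node (hypothetical path and package)
spark.addFile("/dbfs/libs/myforecastpkg_0.1.0.tar.gz")

versions <- spark.lapply(1:4, function(i) {
  # Resolve the worker-local copy of the shipped file and install it once
  tarball <- spark.getSparkFiles("myforecastpkg_0.1.0.tar.gz")
  if (!requireNamespace("myforecastpkg", quietly = TRUE)) {
    install.packages(tarball, repos = NULL, type = "source")
  }
  as.character(packageVersion("myforecastpkg"))
})
```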

Analytical Solution

Energy Data Science: Princeton Scorekeeping Method (PRISM)

Jonathan Farland, Consultant | Data Scientist, DNV GL

PRISM Algorithm


- Decomposes energy usage into its weather-driven and baseload components.

- Site-level modelling that combines both full and reduced-form models

- Grid search over possible heating and cooling reference temperatures

- Rich development history based on fundamental structural engineering principles

- Origin: Miriam Goldberg's dissertation "A Geometrical Approach to Non-differentiable Regression Models as Related to Methods for Assessing Residential Energy Conservation."

Just a little math…
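In its standard form (notation here is illustrative), the PRISM specification regresses metered usage in period $t$ on heating and cooling degree-days evaluated at candidate reference temperatures:

$$E_t = \beta_0 + \beta_H \,\mathrm{HDD}_t(\tau_H) + \beta_C \,\mathrm{CDD}_t(\tau_C) + \varepsilon_t$$

$$\mathrm{HDD}_t(\tau_H) = \max(\tau_H - T_t,\ 0), \qquad \mathrm{CDD}_t(\tau_C) = \max(T_t - \tau_C,\ 0)$$

Here $E_t$ is metered usage, $T_t$ is outdoor temperature, $\beta_0$ is the baseload, and the heating and cooling reference temperatures $\tau_H$ and $\tau_C$ are chosen by the grid search to maximize goodness of fit.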


Explained Visually


SparkR – gapply, dapply, lapply

[Diagram] Local Native R
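A minimal sketch of partition-based model learning with SparkR's gapply: one PRISM-style fit per site, run in parallel across partitions rather than sequentially in local native R. The input (a local data.frame readings_df with columns site_id, temp, and usage) and the candidate temperature grids are hypothetical.

```r
library(SparkR)
sparkR.session()

meters <- createDataFrame(readings_df)   # hypothetical long table: site_id, temp, usage

# Schema of the data.frame each group returns
schema <- structType(
  structField("site_id",  "string"),
  structField("tau_h",    "double"),
  structField("tau_c",    "double"),
  structField("baseload", "double"),
  structField("beta_h",   "double"),
  structField("beta_c",   "double"))

fit_prism <- function(key, pdf) {
  # pdf is an ordinary R data.frame holding every row for one site
  best <- NULL
  for (tau_h in seq(50, 65, by = 1)) {      # candidate heating reference temperatures
    for (tau_c in seq(65, 80, by = 1)) {    # candidate cooling reference temperatures
      pdf$hdd <- pmax(tau_h - pdf$temp, 0)
      pdf$cdd <- pmax(pdf$temp - tau_c, 0)
      fit <- lm(usage ~ hdd + cdd, data = pdf)
      r2  <- summary(fit)$r.squared
      if (is.null(best) || r2 > best$r2) {
        best <- list(r2 = r2, tau_h = tau_h, tau_c = tau_c, b = coef(fit))
      }
    }
  }
  data.frame(site_id  = as.character(key[[1]]),
             tau_h    = best$tau_h,
             tau_c    = best$tau_c,
             baseload = unname(best$b[1]),
             beta_h   = unname(best$b["hdd"]),
             beta_c   = unname(best$b["cdd"]),
             stringsAsFactors = FALSE)
}

# One model per site, fitted in parallel across the cluster
prism_coefs <- gapply(meters, "site_id", fit_prism, schema)
head(collect(prism_coefs))
```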

Energy Data Science: Predictive Modeling with Distributed Energy Resources

Jonathan Farland, Consultant | Data Scientist, DNV GL

Load Shifting: Electric Vehicles

[Chart] Demand (kW) by hour ending (1–24), comparing a standard rate with an electric vehicle rate.

Load Reduction: Demand Response

[Chart] Load (kWh) by hour ending (1–24): forecasted DR baseline, forecasted DR reduction, forecasted DR-impacted load, and actual DR reduction.

Cluster Sizes: 1 – 10,495; 2 – 4,513; 3 – 1,127; 4 – 9,823

Digitalization: Scalable Cluster Computing (Spark, Python, R)

Data Science: Machine Learning Algorithms (Spectral Clustering and K-means)

Predictive Analytics (Semiparametric Regression)

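A minimal sketch of that clustering-then-prediction flow in R, assuming hourly load shapes have already been pivoted into a site-by-hour matrix load_shapes, an hourly long table hourly_data exists, and the mgcv package is available (all of these names are hypothetical):

```r
set.seed(42)

# Segment sites by their normalized 24-hour load shape (k-means; spectral clustering is analogous)
km <- kmeans(load_shapes, centers = 4, nstart = 25)
table(km$cluster)   # cluster sizes, cf. the counts listed above

# Semiparametric regression per cluster: smooth temperature response plus calendar effects
library(mgcv)
fits <- lapply(1:4, function(k) {
  sites   <- rownames(load_shapes)[km$cluster == k]
  train_k <- subset(hourly_data, site_id %in% sites)
  gam(load_kwh ~ s(temperature) + factor(hour) + factor(day_of_week), data = train_k)
})
```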

How well did it work?

[Chart] Cluster/Site Predictions for Cluster 1 and Cluster 4.

[Chart] Cluster/Site Tech Simulations – kW over the forecast horizon: load forecast, adjusted load forecast, PV production, and storage discharging.

Conclusions

Jonathan Farland, Consultant | Data Scientist, DNV GL

Spark 2.0 / 2.1 has allowed DNV GL’s existing expertise and code base to scale

Databricks has provided an environment that supports existing codebases and enables rapid new development

- Analytical contexts, prediction goals, and model selection processes define the Primary Modeling Unit (PMU) in any Energy Data Science Application.

- The distributed computing framework must be able to scale with the appropriate Primary Modeling Unit for any Energy Data Science Application.

Take Home Message

Modeling Additional Fuels
- Natural Gas (Therms)
- Water (Liters / Gallons)
- Hybrid (British Thermal Units)

Climate Change Simulations
- DNV GL's BayTown System Dynamics Model

Electricity Grid Optimization with Distributed Energy Resource Assets

The Future!

Thank You. Jonathan Farland | jon.farland@dnvgl.com | https://github.com/jfarland
