High Resolution Energy Modeling that Scales with Apache Spark 2.0 Jonathan Farland Consultant | Data Scientist, DNV GL

High Resolution Energy Modeling that Scales with Apache Spark 2.0 Spark Summit East talk by Jonathan Farland




About me

• Data Scientist & Technical Consultant for DNV GL’s Policy Advisory and Research Group.

• Background in Econometrics, Forecasting, Machine Learning and Optimization.

• Working with Big Data for 3+ years


Agenda

• Introduction to DNV GL

• Energy Data Science using Spark

  – Data Scales and the DGP

  – Application 1 – Princeton Score Keeping Method (PRISM)

  – Application 2 – Hourly Predictive Modelling with Distributed Energy Resources

• Next Steps with Spark and Databricks


Introduction to DNV GL


Energy Data Science: Data Scales and the DGP


Data Generating Process

Metering Data: Historical measured quantities of electricity usage for a site or meter over a particular period.

- An analogue origin requiring a physical reading of the meter on a specific cycle.

- Typically used by utility companies to bill customers for their usage.

- Advanced metering technologies and machine learning now allow for millisecond readings and disaggregation down to the end-use / appliance level.

Weather Data:

- Actual Weather: Records of temperature, humidity, cloud cover, solar irradiance, etc.

- Typical Weather: 30-year / 10-year averages that define “normal” weather conditions.
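The "typical weather" idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `typical_weather`, the tuple layout, and the temperature values are hypothetical, and real normals would be built from 10 or 30 years of observations rather than three.

```python
from collections import defaultdict
from statistics import mean

def typical_weather(records):
    """Average temperature per calendar day (month, day) across all years.

    `records` is an iterable of (year, month, day, temperature) tuples.
    """
    by_day = defaultdict(list)
    for _year, month, day, temp in records:
        by_day[(month, day)].append(temp)
    return {key: mean(values) for key, values in by_day.items()}

# Hypothetical actual-weather readings for two calendar days over three years.
records = [
    (2014, 1, 1, 30.0), (2015, 1, 1, 34.0), (2016, 1, 1, 32.0),
    (2014, 7, 1, 80.0), (2015, 7, 1, 84.0), (2016, 7, 1, 82.0),
]
normals = typical_weather(records)
```

Here `normals[(1, 1)]` is the multi-year average for 1 January, which is the kind of "normal" value a model would be evaluated against instead of any single year's actual weather.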


Electricity Distribution Grid

[Diagram: Generation, Transmission, Distribution, Consumer (Residential; Commercial & Industrial). Labeled elements: Bulk System; Distribution System; Wind Farms; Photovoltaic; Bulk Storage; > 50 MW; Aggregated Utility Scale, 2-50 MW; Utility Scale, 100 kW-2 MW; Distributed Scale, 25 kW-100 kW.]


The Rise of The Smart Grid


Data Scales


Analytical Solution

The embarrassingly parallel ‘Primary Modeling Unit’:

I. Temporal: Sub-hourly, hourly, daily, monthly, annually

II. Cross-Sectional: Clusters/Segments, Geography, System Hierarchy

III. Hybrid: Structure and Year specific

Databricks: Rapid deployment and development of existing analytics pipeline

Spark 2.0: SparkR allows for UDFs and Partition-Based Model Learning

- gapply, dapply, spark.lapply

Spark 2.1: Enable installing third-party packages on workers using spark.addFile

- SPARK-7159: Multiclass Logistic Regression in DataFrame-based API
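The partition-based pattern behind `gapply` (split the data by a Primary Modeling Unit key, fit one independent model per group, collect the results) can be illustrated in plain Python. This is a sketch of the pattern only, not SparkR or Spark code: the `group_apply` and `fit_line` names, the `site` key, the straight-line model, and the sample rows are all assumptions made for illustration, and a local thread pool stands in for the cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def fit_line(rows):
    """Least-squares fit of usage = intercept + slope * temp for one group."""
    n = len(rows)
    mean_x = sum(r["temp"] for r in rows) / n
    mean_y = sum(r["usage"] for r in rows) / n
    sxx = sum((r["temp"] - mean_x) ** 2 for r in rows)
    sxy = sum((r["temp"] - mean_x) * (r["usage"] - mean_y) for r in rows)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def group_apply(rows, key, func):
    """Group rows by row[key] and apply func to each group independently."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    # Each group is fitted on its own, so the work is embarrassingly parallel.
    with ThreadPoolExecutor() as pool:
        fitted = list(pool.map(func, groups.values()))
    return dict(zip(groups.keys(), fitted))

# Hypothetical readings for two sites (the Primary Modeling Unit here is the site).
rows = [
    {"site": "A", "temp": 0.0, "usage": 1.0},
    {"site": "A", "temp": 1.0, "usage": 3.0},
    {"site": "A", "temp": 2.0, "usage": 5.0},
    {"site": "B", "temp": 0.0, "usage": 10.0},
    {"site": "B", "temp": 1.0, "usage": 9.0},
    {"site": "B", "temp": 2.0, "usage": 8.0},
]
models = group_apply(rows, "site", fit_line)
```

Because no group depends on any other, changing the key (site, cluster, site-year) changes the Primary Modeling Unit without touching the model code.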


Energy Data Science: Princeton Score Keeping Method (PRISM)


PRISM Algorithm

   

- Decomposes energy usage into its weather-driven and baseload components.

- Site-level modelling that combines both full and reduced-form models.

- Grid search over possible heating and cooling reference temperatures.

- Rich development history based on fundamental structural engineering principles.

- Origin: Miriam Goldberg's dissertation "A Geometrical Approach to Non-differentiable Regression Models as Related to Methods for Assessing Residential Energy Conservation."
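The grid search over reference temperatures can be sketched as follows. This is a simplified, heating-only illustration under stated assumptions, not the talk's implementation: it uses daily rather than billing-period data, picks the reference temperature by minimum sum of squared errors, and every name and number (`prism_heating_fit`, the temperatures, the synthetic usage) is hypothetical.

```python
def heating_degree_days(temps, tau):
    """Degrees by which each temperature falls below the reference tau."""
    return [max(tau - t, 0.0) for t in temps]

def ols(x, y):
    """Return (intercept, slope, sse) for y = intercept + slope * x.

    Returns None when x has no variation, since the slope is undefined.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        return None
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, sse

def prism_heating_fit(temps, usage, candidate_taus):
    """Try each candidate reference temperature and keep the best-fitting one."""
    best = None
    for tau in candidate_taus:
        fit = ols(heating_degree_days(temps, tau), usage)
        if fit is None:
            continue
        intercept, slope, sse = fit
        if best is None or sse < best["sse"]:
            best = {"tau": tau, "baseload": intercept,
                    "heating_slope": slope, "sse": sse}
    return best

# Synthetic site: baseload of 10 plus 0.5 per degree below a 65-degree reference.
temps = [30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
usage = [10.0 + 0.5 * max(65.0 - t, 0.0) for t in temps]
best = prism_heating_fit(temps, usage, range(55, 76))
```

The intercept is the baseload component and the slope times the degree-days is the weather-driven component, which is the decomposition the first bullet describes. A cooling term would be added the same way with a second reference temperature.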


Just a little math…
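As a point of reference, a degree-day regression of the kind PRISM-style methods fit is commonly written in the form below. The notation is an assumption chosen for illustration and is not taken from the talk: $E_i$ is average daily usage in period $i$, $N_i$ the number of days in that period, $T_d$ the mean temperature on day $d$, $\alpha$ the baseload, and $\tau_h$, $\tau_c$ the heating and cooling reference temperatures selected by grid search.

```latex
\begin{align}
E_i &= \alpha + \beta_h H_i(\tau_h) + \beta_c C_i(\tau_c) + \varepsilon_i \\
H_i(\tau_h) &= \frac{1}{N_i} \sum_{d \in i} \max(\tau_h - T_d,\, 0) \\
C_i(\tau_c) &= \frac{1}{N_i} \sum_{d \in i} \max(T_d - \tau_c,\, 0)
\end{align}
```

For fixed $\tau_h$ and $\tau_c$ the model is linear in $\alpha$, $\beta_h$, and $\beta_c$, so each grid point reduces to an ordinary least-squares fit.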

 

 

     


Explained Visually

   


SparkR – gapply, dapply, spark.lapply


Local Native R


Energy Data Science: Predictive Modeling with Distributed Energy Resources


Load Shifting: Electric Vehicles

[Figure: demand (kW) by hour ending (1-24); series: Standard Rate, Electric Vehicle Rate]


Load Reduction: Demand Response

[Figure: load (kWh) by hour ending (1-24); series: Forecasted - DR Reduction, Forecasted - DR Baseline, Forecasted - DR Impacted Load, Actual DR - Reduction]


Cluster Sizes:

1 – 10,495

2 – 4,513

3 – 1,127

4 – 9,823

Digitalization: Scalable Cluster Computing (Spark, Python, R)

Data Science: Machine Learning Algorithms (Spectral Clustering and K-means)

Predictive Analytics (Semiparametric Regression)
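The clustering step can be sketched with a small K-means on normalized load shapes. This is a minimal illustration under stated assumptions, not the pipeline used in the talk: it covers K-means only (not spectral clustering), seeds the centroids with the first k profiles, uses 4-point profiles instead of 24 hourly values for brevity, and all names and numbers are hypothetical.

```python
def normalize(profile):
    """Scale a load profile by its peak so that clusters reflect shape, not size."""
    peak = max(profile)
    return [value / peak for value in profile]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=50):
    """Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each profile to its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append([sum(col) / len(members)
                                      for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return labels, centroids

# Hypothetical site loads (kW): two late-peaking and two early-peaking sites.
raw_profiles = [
    [1.0, 1.5, 5.0, 4.5],
    [10.0, 9.0, 2.0, 3.0],
    [0.5, 0.6, 1.9, 2.0],
    [18.0, 20.0, 6.0, 4.0],
]
profiles = [normalize(p) for p in raw_profiles]
labels, centroids = kmeans(profiles, k=2)
```

Normalizing first means a small site and a large site with the same daily shape land in the same cluster, after which one predictive model can be fitted per cluster.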



How well did it work?

[Figure panels: Cluster 1, Cluster 4]


Cluster Site Predictions


Cluster Site Tech Simulations

[Figure: kW over the forecast horizon (1-139); series: Load Forecast, Adjusted Load Forecast, PV Production, Storage Discharging]


Conclusions


Take Home Message

Spark 2.0 / 2.1 has allowed DNV GL’s existing expertise and code base to scale.

Databricks has provided an environment that supports the existing codebase as well as rapid new development.

- Analytical contexts, prediction goals, and model selection processes define the Primary Modeling Unit (PMU) in any Energy Data Science application.

- The distributed computing framework must be able to scale with the appropriate Primary Modeling Unit for any Energy Data Science application.


The Future!

Modeling Additional Fuels

- Natural Gas (Therms)

- Water (Liters / Gallons)

- Hybrid (British Thermal Units)

Climate Change Simulations

- DNV GL’s BayTown System Dynamics Model

Electricity Grid Optimization with Distributed Energy Resource Assets


Thank You.

Jonathan Farland

[email protected]

github.com/jfarland