High Resolution Energy Modeling that Scales with Apache Spark 2.0 – Spark Summit East talk by Jonathan Farland


High Resolution Energy Modeling that Scales with Apache Spark 2.0

Jonathan Farland, Consultant | Data Scientist, DNV GL

About me
• Data Scientist & Technical Consultant for DNV GL's Policy Advisory and Research Group.

• Background in Econometrics, Forecasting, Machine Learning and Optimization.

• Working with Big Data for 3+ years

Agenda
• Introduction to DNV GL
• Energy Data Science using Spark
  – Data Scales and the DGP
  – Application 1 – Princeton Scorekeeping Method (PRISM)
  – Application 2 – Hourly Predictive Modelling with Distributed Energy Resources
• Next Steps with Spark and Databricks

Introduction to DNV GL

Jonathan Farland, Consultant | Data Scientist, DNV GL

Energy Data Science: Data Scales and the DGP

Jonathan Farland, Consultant | Data Scientist, DNV GL

Metering Data: Historical measured quantities of electricity usage for a site or meter over a particular time period.

- Analogue in origin, requiring a physical reading of the meter on a specific cycle.

- Typically used by utility companies to bill customers for their usage

- Advanced metering technologies and machine learning now allow for millisecond readings and disaggregation down to the end use / appliance level.

Weather Data:

- Actual Weather: Records of temperature, humidity, cloud cover, solar irradiance, etc.

- Typical Weather: 30-year / 10-year averages that define “normal” weather conditions

Data Generating Process

[Diagram] Electricity Distribution Grid: Generation → Transmission → Distribution → Consumer. Wind farms and photovoltaics connect at aggregated utility scale (2–50 MW), utility scale (100 kW–2 MW), and distributed scale (25–100 kW); bulk storage (> 50 MW) sits on the bulk system, while residential and commercial & industrial consumers sit on the distribution system.

The Rise of The Smart Grid

Data Scales

The embarrassingly parallel ‘Primary Modeling Unit’:
I. Temporal: Sub-hourly, hourly, daily, monthly, annually
II. Cross-Sectional: Clusters/Segments, Geography, System Hierarchy
III. Hybrid: Structure- and year-specific

Databricks: Rapid deployment and development of the existing analytics pipeline

Spark 2.0: SparkR allows for UDFs and Partition-Based Model Learning
- gapply, dapply, lapply

Spark 2.1:
- Enable installing third-party packages on workers using spark.addFile
- SPARK-7159: Multiclass Logistic Regression in DataFrame-based API
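For illustration, a minimal sketch of that Spark 2.1 pattern: the driver ships a source package to every worker with spark.addFile, and each worker installs it on first use inside a spark.lapply call. The tarball path and package name here are hypothetical.

```r
library(SparkR)
sparkR.session()

# Ship a source package tarball to every worker node (hypothetical path and package)
spark.addFile("/dbfs/libs/myforecastpkg_0.1.0.tar.gz")

versions <- spark.lapply(1:4, function(i) {
  # Resolve the worker-local copy of the shipped file and install it once
  tarball <- spark.getSparkFiles("myforecastpkg_0.1.0.tar.gz")
  if (!requireNamespace("myforecastpkg", quietly = TRUE)) {
    install.packages(tarball, repos = NULL, type = "source")
  }
  as.character(packageVersion("myforecastpkg"))
})
```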

Analytical Solution

Energy Data Science: Princeton Scorekeeping Method (PRISM)

Jonathan Farland, Consultant | Data Scientist, DNV GL

PRISM Algorithm


- Decomposes energy usage into its weather-driven and baseload components.

- Site-level modelling that combines both full and reduced-form models

- Grid search over possible heating and cooling reference temperatures

- Rich development history based on fundamental structural engineering principles

- Origin: Miriam Goldberg's dissertation "A Geometrical Approach to Non-differentiable Regression Models as Related to Methods for Assessing Residential Energy Conservation."

Just a little math…
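In its standard form (notation here is illustrative), the PRISM specification regresses metered usage in period $t$ on heating and cooling degree-days evaluated at candidate reference temperatures:

$$E_t = \beta_0 + \beta_H \,\mathrm{HDD}_t(\tau_H) + \beta_C \,\mathrm{CDD}_t(\tau_C) + \varepsilon_t$$

$$\mathrm{HDD}_t(\tau_H) = \max(\tau_H - T_t,\ 0), \qquad \mathrm{CDD}_t(\tau_C) = \max(T_t - \tau_C,\ 0)$$

Here $E_t$ is metered usage, $T_t$ is outdoor temperature, $\beta_0$ is the baseload, and the heating and cooling reference temperatures $\tau_H$ and $\tau_C$ are chosen by the grid search to maximize goodness of fit.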


Explained Visually


SparkR – gapply, dapply, lapply

[Diagram] Local Native R
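A minimal sketch of partition-based model learning with SparkR's gapply: one PRISM-style fit per site, run in parallel across partitions rather than sequentially in local native R. The input (a local data.frame readings_df with columns site_id, temp, and usage) and the candidate temperature grids are hypothetical.

```r
library(SparkR)
sparkR.session()

meters <- createDataFrame(readings_df)   # hypothetical long table: site_id, temp, usage

# Schema of the data.frame each group returns
schema <- structType(
  structField("site_id",  "string"),
  structField("tau_h",    "double"),
  structField("tau_c",    "double"),
  structField("baseload", "double"),
  structField("beta_h",   "double"),
  structField("beta_c",   "double"))

fit_prism <- function(key, pdf) {
  # pdf is an ordinary R data.frame holding every row for one site
  best <- NULL
  for (tau_h in seq(50, 65, by = 1)) {      # candidate heating reference temperatures
    for (tau_c in seq(65, 80, by = 1)) {    # candidate cooling reference temperatures
      pdf$hdd <- pmax(tau_h - pdf$temp, 0)
      pdf$cdd <- pmax(pdf$temp - tau_c, 0)
      fit <- lm(usage ~ hdd + cdd, data = pdf)
      r2  <- summary(fit)$r.squared
      if (is.null(best) || r2 > best$r2) {
        best <- list(r2 = r2, tau_h = tau_h, tau_c = tau_c, b = coef(fit))
      }
    }
  }
  data.frame(site_id  = as.character(key[[1]]),
             tau_h    = best$tau_h,
             tau_c    = best$tau_c,
             baseload = unname(best$b[1]),
             beta_h   = unname(best$b["hdd"]),
             beta_c   = unname(best$b["cdd"]),
             stringsAsFactors = FALSE)
}

# One model per site, fitted in parallel across the cluster
prism_coefs <- gapply(meters, "site_id", fit_prism, schema)
head(collect(prism_coefs))
```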

Energy Data Science: Predictive Modeling with Distributed Energy Resources

Jonathan Farland, Consultant | Data Scientist, DNV GL

Load Shifting: Electric Vehicles

[Chart] Demand (kW) by hour ending (1–24), comparing a standard rate with an electric vehicle rate.

Load Reduction: Demand Response

[Chart] Load (kWh) by hour ending (1–24): forecasted DR baseline, forecasted DR reduction, forecasted DR-impacted load, and actual DR reduction.

Cluster Sizes: 1 – 10,495; 2 – 4,513; 3 – 1,127; 4 – 9,823

Digitalization: Scalable Cluster Computing (Spark, Python, R)

Data Science: Machine Learning Algorithms (Spectral Clustering and K-means)

Predictive Analytics (Semiparametric Regression)

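A minimal sketch of that clustering-then-prediction flow in R, assuming hourly load shapes have already been pivoted into a site-by-hour matrix load_shapes, an hourly long table hourly_data exists, and the mgcv package is available (all of these names are hypothetical):

```r
set.seed(42)

# Segment sites by their normalized 24-hour load shape (k-means; spectral clustering is analogous)
km <- kmeans(load_shapes, centers = 4, nstart = 25)
table(km$cluster)   # cluster sizes, cf. the counts listed above

# Semiparametric regression per cluster: smooth temperature response plus calendar effects
library(mgcv)
fits <- lapply(1:4, function(k) {
  sites   <- rownames(load_shapes)[km$cluster == k]
  train_k <- subset(hourly_data, site_id %in% sites)
  gam(load_kwh ~ s(temperature) + factor(hour) + factor(day_of_week), data = train_k)
})
```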

How well did it work?

[Chart] Cluster/Site Predictions for Cluster 1 and Cluster 4.

[Chart] Cluster/Site Tech Simulations – kW over the forecast horizon: load forecast, adjusted load forecast, PV production, and storage discharging.

Conclusions

Jonathan Farland, Consultant | Data Scientist, DNV GL

Spark 2.0 / 2.1 has allowed DNV GL’s existing expertise and code base to scale

Databricks has provided an environment that supports existing codebases and enables rapid new development

- Analytical contexts, prediction goals, and model selection processes define the Primary Modeling Unit (PMU) in any Energy Data Science Application.

- The distributed computing framework must be able to scale with the appropriate Primary Modeling Unit for any Energy Data Science Application.

Take Home Message

Modeling Additional Fuels
- Natural Gas (Therms)
- Water (Liters / Gallons)
- Hybrid (British Thermal Units)

Climate Change Simulations
- DNV GL's BayTown System Dynamics Model

Electricity Grid Optimization with Distributed Energy Resource Assets

The Future!

Thank You. Jonathan Farland | jon.farland@dnvgl.com | https://github.com/jfarland
