High Resolution Energy Modeling that Scales with Apache Spark 2.0
Jonathan Farland, Consultant | Data Scientist, DNV GL
About me
• Data Scientist & Technical Consultant for DNV GL’s Policy Advisory and Research Group.
• Background in Econometrics, Forecasting, Machine Learning and Optimization.
• Working with Big Data for 3+ years
Agenda
• Introduction to DNV GL
• Energy Data Science using Spark
– Data Scales and the DGP
– Application 1 – Princeton Score Keeping Method (PRISM)
– Application 2 – Hourly Predictive Modelling with Distributed Energy Resources
• Next Steps with Spark and Databricks
Introduction to DNV GL
Energy Data Science: Data Scales and the DGP
Metering Data: Historical measured quantities of electricity usage for a site or meter over a particular period.
- An analogue origin requiring a physical reading of the meter on a specific cycle.
- Typically used by utility companies to bill customers for their usage.
- Advanced metering technologies and machine learning now allow for millisecond readings and disaggregation down to the end-use / appliance level.
Weather Data:
- Actual Weather: Records of temperature, humidity, cloud cover, solar irradiance, etc.
- Typical Weather: 30-year / 10-year averages that define “normal” weather conditions
Data Generating Process
Electricity Distribution Grid
[Figure: electricity grid diagram. Labels: Generation, Transmission, Distribution, Consumer (Residential, Commercial & Industrial); Bulk System, Distribution System; Wind Farms, Photovoltaic; Aggregated Utility Scale, 2-50 MW; Utility Scale, 100 kW-2 MW; Distributed Scale, 25 kW-100 kW; Bulk Storage, > 50 MW.]
The Rise of The Smart Grid
Data Scales
The embarrassingly parallel ‘Primary Modeling Unit’:
I. Temporal: Sub-hourly, hourly, daily, monthly, annually
II. Cross-Sectional: Clusters/Segments, Geography, System Hierarchy.
III. Hybrid: Structure and Year specific
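As a minimal sketch of why the Primary Modeling Unit is embarrassingly parallel: each unit (here a site) is modeled independently, so the work can be split by unit and fanned out. The record layout, the site names, and the stand-in "model" (a mean) are assumptions made for illustration, and a thread pool stands in for the cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical readings: (site_id, hour, kWh). Values are made up.
readings = [
    ("site_a", 1, 1.0), ("site_a", 2, 3.0),
    ("site_b", 1, 10.0), ("site_b", 2, 14.0), ("site_b", 3, 12.0),
]

def group_by_unit(rows):
    """Split the data by modeling unit (here: one group per site)."""
    groups = defaultdict(list)
    for site_id, hour, kwh in rows:
        groups[site_id].append((hour, kwh))
    return dict(groups)

def fit_unit(item):
    """Stand-in model for one unit: the mean load. No other unit's data is needed."""
    site_id, observations = item
    mean_kwh = sum(kwh for _, kwh in observations) / len(observations)
    return site_id, mean_kwh

def fit_all(rows, max_workers=4):
    """Fit every unit independently; units never communicate with each other."""
    groups = group_by_unit(rows)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(fit_unit, groups.items()))

results = fit_all(readings)
print(results)
```

The same split-apply-combine shape is what partition-based model learning distributes across workers; only the grouping key changes when the unit is temporal, cross-sectional, or hybrid.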
Databricks: Rapid deployment and development of the existing analytics pipeline
Spark 2.0: SparkR allows for UDFs and partition-based model learning
- gapply, dapply, spark.lapply
Spark 2.1: Enables installing third-party packages on workers using spark.addFile
- SPARK-7159: Multiclass Logistic Regression in DataFrame-based API
Analytical Solution
Energy Data Science: Princeton Score Keeping Method (PRISM)
PRISM Algorithm
- Decomposes energy usage into its weather-driven and baseload components.
- Site-level modelling that combines both full and reduced form models
- Grid search over possible heating and cooling reference temperatures
- Rich development history based on fundamental structural engineering principles
- Origin: Miriam Goldberg's dissertation, "A Geometrical Approach to Non-differentiable Regression Models as Related to Methods for Assessing Residential Energy Conservation."
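The grid search over reference temperatures can be sketched as follows. This is a simplified, heating-only illustration and not the code from the talk: the function names, the candidate grid, and the synthetic data are all assumptions. For each candidate reference temperature it builds heating degree days, fits a straight line by ordinary least squares, and keeps the candidate with the smallest sum of squared errors.

```python
def heating_degree_days(temps, tau):
    """Degrees below the reference temperature tau, floored at zero."""
    return [max(tau - t, 0.0) for t in temps]

def ols_line(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope, sse)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0.0:
        return None  # no variation in degree days at this tau; slope is undefined
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, sse

def fit_heating_only(temps, usage, candidate_taus):
    """Grid search: keep the reference temperature with the lowest SSE."""
    best = None
    for tau in candidate_taus:
        fit = ols_line(heating_degree_days(temps, tau), usage)
        if fit is None:
            continue
        intercept, slope, sse = fit
        if best is None or sse < best["sse"]:
            best = {"tau": tau, "baseload": intercept, "heating_slope": slope, "sse": sse}
    return best

# Synthetic site: baseload 10, plus 0.5 per degree below a reference of 60.
temps = [30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
usage = [10.0 + 0.5 * max(60.0 - t, 0.0) for t in temps]

best = fit_heating_only(temps, usage, candidate_taus=range(50, 71, 5))
print(best["tau"], best["baseload"], best["heating_slope"])
```

The intercept plays the role of the baseload component and the slope the weather-driven component. A cooling term would add a second regressor and a second grid dimension.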
Just a little math…
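A common way to write a PRISM-style heating and cooling model is given below. The notation is an assumption chosen for illustration, not taken from the talk: $E_i$ is usage in period $i$, $T_d$ is the temperature on day $d$, and $\tau_h$, $\tau_c$ are the heating and cooling reference temperatures.

```latex
E_i = \alpha + \beta_h H_i(\tau_h) + \beta_c C_i(\tau_c) + \varepsilon_i,
\qquad
H_i(\tau_h) = \sum_{d \in i} \max(\tau_h - T_d,\, 0),
\qquad
C_i(\tau_c) = \sum_{d \in i} \max(T_d - \tau_c,\, 0)
```

Here $\alpha$ is the baseload, and $\beta_h$, $\beta_c$ scale the heating and cooling degree days. Dropping the heating or the cooling term gives a reduced form. For fixed $(\tau_h, \tau_c)$ the model is linear in $\alpha$, $\beta_h$, $\beta_c$, which is why the reference temperatures can be chosen by grid search.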
Explained Visually
SparkR – gapply, dapply, lapply
Local Native R
Energy Data Science: Predictive Modeling with Distributed Energy Resources
Load Shifting: Electric Vehicles
[Figure: Demand (kW) by Hour Ending (1-24); series: Standard Rate, Electric Vehicle Rate.]
Load Reduction: Demand Response
[Figure: Load (kWh) by Hour Ending (1-24); series: Forecasted - DR Reduction, Forecasted - DR Baseline, Forecasted - DR Impacted Load, Actual DR - Reduction.]
Cluster Sizes:
1 – 10,495
2 – 4,513
3 – 1,127
4 – 9,823
Digitalization: Scalable Cluster Computing (Spark, Python, R)
Data Science: Machine Learning Algorithms (Spectral Clustering and K-means)
Predictive Analytics (Semiparametric Regression)
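To illustrate the clustering step, here is a minimal K-means sketch on load shapes. It covers K-means only (not spectral clustering), uses fixed starting centroids so the result is reproducible, and runs on four made-up, four-period profiles. None of the names or numbers come from the talk.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two load profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_profile(profiles):
    """Element-wise mean of a list of equal-length profiles."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]

def kmeans(profiles, initial_centroids, max_iter=100):
    """Plain K-means: assign to the nearest centroid, recompute, repeat until stable."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = mean_profile(members)
    return labels, centroids

# Hypothetical normalized load shapes: two late-peaking and two early-peaking sites.
profiles = [
    [0.1, 0.2, 0.9, 0.8],
    [0.2, 0.1, 0.8, 0.9],
    [0.9, 0.8, 0.2, 0.1],
    [0.8, 0.9, 0.1, 0.2],
]

labels, centroids = kmeans(profiles, initial_centroids=[profiles[0], profiles[2]])
print(labels)
```

Once sites are grouped, a separate predictive model can be fit per cluster, so the cluster becomes the modeling unit.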
How well did it work?
[Figure: panels Cluster 1 and Cluster 4; labels: Cluster, Site, Predictions.]
[Figure: kW over the Forecast Horizon; series: Load Forecast, Adjusted Load Forecast, PV Production, Storage Discharging; labels: Cluster, Site, Tech Simulations.]
Conclusions
Spark 2.0 / 2.1 has allowed DNV GL’s existing expertise and code base to scale
Databricks has provided an environment that accommodated existing codebases and supported additional rapid development
- Analytical contexts, prediction goals, and model selection processes define the Primary Modeling Unit (PMU) in any Energy Data Science Application.
- The distributed computing framework must be able to scale with the appropriate Primary Modeling Unit for any Energy Data Science Application
Take Home Message
Modeling Additional Fuels
- Natural Gas (Therms)
- Water (Liters / Gallons)
- Hybrid (British Thermal Units)
Climate Change Simulations
- DNV GL’s BayTown System Dynamics Model
Electricity Grid Optimization with Distributed Energy Resource Assets
The Future!
Thank You.
Jonathan Farland
jon.farland@dnvgl.com
https://github.com/jfarland