18
Spark for Behavior Analysis Research Ling Jin, Sam Borgeson, Anna Spurlock, Annika Todd Doris Lee, Alex Sim, John Wu Lawrence Berkeley National Lab

Spark for Behavioral Analytics Research: Spark Summit East talk by John W u

Embed Size (px)

Citation preview

Page 1: Spark for Behavioral Analytics Research: Spark Summit East talk by John W u

Spark for Behavior Analysis Research Ling Jin, Sam Borgeson, Anna Spurlock, Annika ToddDoris Lee, Alex Sim, John Wu

Lawrence Berkeley National Lab

Page 2: Spark for Behavioral Analytics Research: Spark Summit East talk by John W u

www.lbl.gov

Page 3: Spark for Behavioral Analytics Research: Spark Summit East talk by John W u

Adam Smith’s Most Famous Books?

Behavioral Analysis Research 3

Page 4: Spark for Behavioral Analytics Research: Spark Summit East talk by John W u

One-minute Behavioral Economics

Simpler to analyze and predict Harder to predict -- Need massive data and computers

Behavioral Analysis Research 4

Page 5: Spark for Behavioral Analytics Research: Spark Summit East talk by John W u

How Are Blackjack and Electric Power Grid Similar?Bust when demand is larger than supply Bust when over 21

Behavioral Analysis Research 5

Page 6: Spark for Behavioral Analytics Research: Spark Summit East talk by John W u

• Residential electricity records: 100,000 households, hourly electricity usage

• A region where electricity usage peaks in the summer time

• Randomized controlled trial of new rate structures -households are randomly placed in different groups• Control: using existing fixed rate for electricity• Active: households opted in to Time-of-Use Pricing• Passive: households defaulted in to Time-of-Use Pricing

• Data collected hourly over three years, one pre-rate, two after (labeled T-1, T, T+1)

6

Reducing Peak Demands Through Pricing

Behavioral Analysis Research

Page 7: Spark for Behavioral Analytics Research: Spark Summit East talk by John W u

Goal Method Policy ImplicationBetter baseline models of energy use

Gradient tree boosting Better program evaluation

Define relevant household characteristics:

Classify and segment households using easily accessible data

• Define representative load shapes

Adaptive K-means clustering

• Estimate household-specific cooling change points (AC set point)

Piecewise linear regression, bootstrapping

• Characterize customers into “Lifestyle Groups”

Blend behavioral theory with machine learning techniques

• Define relevant household energy characteristics

Simple feature algorithms (e.g.,mean, min, max, peak usage; variance; entropy, etc.)

7

Research Examples

Behavioral Analysis Research

Page 8: Spark for Behavioral Analytics Research: Spark Summit East talk by John W u

Baselines are Critical for Measuring Changes

Behavioral Analysis Research 8

Current Standard: randomized control group

Page 9: Spark for Behavioral Analytics Research: Spark Summit East talk by John W u

Predictions from new Baseline TechniqueActive group Passive group

9

Group PT PT +1 MT −PT MT+1 −PT+1

Control 1.960 1.957 0.069 -0.020Passive 1.849 1.897 -0.027 -0.080Active 1.860 1.903 -0.164 -0.164

Observation: the Active group reduced their uses of electricity consistently over the two yeas of study during the peak-demand hours

P: predictedM: measured

Behavioral Analysis Research

Page 10: Spark for Behavioral Analytics Research: Spark Summit East talk by John W u

Goal Method Policy ImplicationBetter baseline models of energy use

Gradient tree boosting Better program evaluation

Define relevant household characteristics:

Classify and segment households using easily accessible data

• Define representative load shapes

Adaptive K-means clustering

• Estimate household-specific cooling change points (AC set point)

Piecewise linear regression, bootstrapping

• Characterize customers into “Lifestyle Groups”

Blend behavioral theory with machine learning techniques

• Define relevant household energy characteristics

Simple feature algorithms (e.g.,mean, min, max, peak usage; variance; entropy, etc.)

10

Research Examples

Behavioral Analysis Research

Page 11: Spark for Behavioral Analytics Research: Spark Summit East talk by John W u

Daily Load Shape: DefinitionObjective / Definition• 24-hour electricity usage

pattern• To capture hour variations:

don’t care about constant usage

• Want to use load shape to study mechanism that might affect electricity usage, therefore concentrate on discretionary demand

Samples

10/26/2016

2

• A load shape is the pattern created by 24 hours of demand data.

• Shapes are produced by patterns of occupancy, equipment ownership, and behavior

5

Defining Load Shapes

Flat/unoccupied Solar Morn/work/eve

Out for lunch? AC dominated?

Active at night?

6

Clustering approach

Normalization(load shapes)

Hourly smart meter data

(“customer-day”)

De-min

Dictionaryload shapes

Truncation

Adaptive K-means

Hierarchical Merging

Suitable parameters: • Error threshold• Violation rate30M customer-days

99 representative clusters/shapes

7

Top 12 Clusters

8

Cluster coverage

Behavioral Analysis Research 11

Page 12: Spark for Behavioral Analytics Research: Spark Summit East talk by John W u

Clustering Process

Behavioral Analysis Research 12

Load data preprocessing

Load shape clustering

Post clustering processing

Smart meter customer-day load shape data

(24 hourly observations per customer-day)

Subsample of 100,000

customer-day load shapes

De-min*

Normalize

Eliminate load shapes with missing data or extremely low power

demand (below 0.2kW)

Adaptive k-means algorithm

(parameter θ chosen)

Hierarchical merging

Iterative truncation algorithm*

(parameter V chosen)

Definition of final dictionary of load shapes

Assign full dataset of customer daily load shapes to the closest dictionary load shape

* Innovations we introduce

Page 13: Spark for Behavioral Analytics Research: Spark Summit East talk by John W u

Components of Clustering Process• Data cleaning and normalization• Remove households with very low demand (<0.2kW)• Normalization: compute hourly usage to hourly contribution to daily usage• Clustering• Centroid-based methods: Kmeans, Adaptive Kmeans• Hierarchical clustering: distance metric, linkage• Density-based clustering: DBSCAN• Model-based clustering: GMM• Judging cluster quality• Compactness: MIA, VRSE• Distinctness: SMI• Combined: DBI, Silhouette

13

Page 14: Spark for Behavioral Analytics Research: Spark Summit East talk by John W u

Clustering Quality Index

14

Page 15: Spark for Behavioral Analytics Research: Spark Summit East talk by John W u

Clustering Quality

15

More compactMore distinct

Puzzling??

Best choice?

Page 16: Spark for Behavioral Analytics Research: Spark Summit East talk by John W u

Kmeans Produces More Balanced Clusters

16

Sizes of clusters as fractions of total number of samples

Behavioral Analysis Research

Page 17: Spark for Behavioral Analytics Research: Spark Summit East talk by John W u

SummaryØ Examined a good number of clustering methods for identifying

daily usage profilesØ Short observation:– Centroid-based method (Kmeans) works the best on this set of dataØ Longer observations:– There are too many choices and no (really) clear winner– Centroid-based methods produce more compact clusters– Centroid-based methods produce more balanced cluster– DBSCAN declare many (up to 90%) data points as background– Our observation is different from previous published results (Chicco

2012)Ø Future work– Maybe a different clustering quality metric would work better?– Should examine alternative profile generation methods, other than

clustering Behavioral Analysis Research 17

Page 18: Spark for Behavioral Analytics Research: Spark Summit East talk by John W u

Thank You.John WuEmail: [email protected]