Spark for Behavior Analysis Research
Ling Jin, Sam Borgeson, Anna Spurlock, Annika Todd, Doris Lee, Alex Sim, John Wu
Lawrence Berkeley National Lab
www.lbl.gov
Adam Smith’s Most Famous Books?
Behavioral Analysis Research 3
One-minute Behavioral Economics
• Simpler to analyze and predict
• Harder to predict: need massive data and computers
How Are Blackjack and the Electric Power Grid Similar?
• Blackjack: bust when over 21
• Electric power grid: bust when demand is larger than supply
• Residential electricity records: 100,000 households, hourly electricity usage
• A region where electricity usage peaks in the summer time
• Randomized controlled trial of new rate structures: households are randomly placed in different groups
  – Control: using existing fixed rate for electricity
  – Active: households opted in to Time-of-Use Pricing
  – Passive: households defaulted in to Time-of-Use Pricing
• Data collected hourly over three years, one pre-rate, two after (labeled T-1, T, T+1)
Reducing Peak Demands Through Pricing
Research Examples
• Goal: better baseline models of energy use
  – Method: gradient tree boosting
  – Policy implication: better program evaluation
• Goal: define relevant household characteristics
  – Policy implication: classify and segment households using easily accessible data
  – Define representative load shapes. Method: adaptive K-means clustering
  – Estimate household-specific cooling change points (AC set point). Method: piecewise linear regression, bootstrapping
  – Characterize customers into “Lifestyle Groups”. Method: blend behavioral theory with machine learning techniques
  – Define relevant household energy characteristics. Method: simple feature algorithms (e.g., mean, min, max, peak usage; variance; entropy, etc.)
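The "simple feature algorithms" named above can be sketched in a few lines. This is only an illustration, not the authors' code: the function name `load_features`, the 16:00 to 20:00 peak window, and the reading of "peak usage" as mean usage inside that window are all assumptions.

```python
import math

def load_features(hourly_kwh, peak_hours=range(16, 20)):
    """Summary features for one day of hourly readings (24 values).

    peak_hours is a hypothetical peak window, not taken from the slides.
    """
    n = len(hourly_kwh)
    total = sum(hourly_kwh)
    mean = total / n
    variance = sum((x - mean) ** 2 for x in hourly_kwh) / n
    # Shannon entropy of the hourly shares of daily usage (natural log);
    # hours with zero usage contribute nothing.
    shares = [x / total for x in hourly_kwh if x > 0] if total > 0 else []
    entropy = -sum(p * math.log(p) for p in shares)
    peak = [hourly_kwh[h] for h in peak_hours]
    return {
        "mean": mean,
        "min": min(hourly_kwh),
        "max": max(hourly_kwh),
        "peak_usage": sum(peak) / len(peak),
        "variance": variance,
        "entropy": entropy,
    }

features = load_features([1.0] * 24)  # a perfectly flat day
```

A flat day has zero variance and the maximum possible entropy, log(24); a day dominated by a single evening spike has high variance and low entropy.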
Baselines are Critical for Measuring Changes
Current standard: randomized control group
[Figure: predictions from the new baseline technique; panels: Active group, Passive group]
Group     P_T     P_T+1   M_T − P_T   M_T+1 − P_T+1
Control   1.960   1.957    0.069       -0.020
Passive   1.849   1.897   -0.027       -0.080
Active    1.860   1.903   -0.164       -0.164
P: predicted; M: measured. T and T+1 are the two years after the rate change.
Observation: the Active group consistently reduced its electricity use during peak-demand hours over the two years of the study.
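One way to read the measured-minus-predicted (M − P) values is to compare each treated group against the control group's own deviation, so that any baseline-model bias shared by all groups cancels. That subtraction is an assumption made here for illustration; the slides do not say the authors computed it. The sketch below only redoes arithmetic on the numbers reported in the table.

```python
# M - P values copied from the table (year T and year T+1).
deviation = {
    "Control": {"T": 0.069, "T+1": -0.020},
    "Passive": {"T": -0.027, "T+1": -0.080},
    "Active": {"T": -0.164, "T+1": -0.164},
}

def net_of_control(group, year):
    """Deviation of a group minus the control group's deviation (assumed reading)."""
    return round(deviation[group][year] - deviation["Control"][year], 3)

for group in ("Passive", "Active"):
    for year in ("T", "T+1"):
        print(group, year, net_of_control(group, year))
# Passive T -0.096
# Passive T+1 -0.06
# Active T -0.233
# Active T+1 -0.144
```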
Daily Load Shape: Definition
Objective / Definition
• 24-hour electricity usage pattern
• To capture hourly variations: don’t care about constant usage
• Want to use load shape to study mechanisms that might affect electricity usage, therefore concentrate on discretionary demand
Samples
10/26/2016
• A load shape is the pattern created by 24 hours of demand data.
• Shapes are produced by patterns of occupancy, equipment ownership, and behavior
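A minimal sketch of how one customer-day might be turned into a load shape, assuming that "de-min" means subtracting the day's minimum hourly reading (to drop constant usage) and that normalization divides each hour by the daily total. The slides name both steps but do not give formulas, so both definitions and the function names are assumptions.

```python
def de_min(load):
    """Subtract the day's minimum so only usage above the constant base remains."""
    base = min(load)
    return [x - base for x in load]

def normalize(load):
    """Scale a day so the hourly values sum to 1 (share of daily usage)."""
    total = sum(load)
    if total <= 0:
        raise ValueError("cannot normalize a day with no usage above the base")
    return [x / total for x in load]

# Toy customer-day: 0.5 kW overnight, 1.5 kW for the other twelve hours.
day = [0.5] * 12 + [1.5] * 12
shape = normalize(de_min(day))
```

After both steps the shape depends only on when the discretionary usage occurs, not on how large the household's total demand is.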
Defining Load Shapes
[Figure: example load shapes, labeled: Flat/unoccupied; Solar; Morn/work/eve; Out for lunch?; AC dominated?; Active at night?]
Clustering approach
• Input: hourly smart meter data (“customer-day”), 30M customer-days
• De-min
• Normalization (load shapes)
• Adaptive K-means
• Hierarchical merging
• Truncation
• Output: dictionary load shapes, 99 representative clusters/shapes
Suitable parameters:
• Error threshold
• Violation rate
Top 12 Clusters
Cluster coverage
Clustering Process
Stages: load data preprocessing; load shape clustering; post clustering processing
Steps:
• Smart meter customer-day load shape data (24 hourly observations per customer-day)
• Subsample of 100,000 customer-day load shapes
• De-min*
• Normalize
• Eliminate load shapes with missing data or extremely low power demand (below 0.2 kW)
• Adaptive k-means algorithm (parameter θ chosen)
• Hierarchical merging
• Iterative truncation algorithm* (parameter V chosen)
• Definition of final dictionary of load shapes
• Assign full dataset of customer daily load shapes to the closest dictionary load shape
* Innovations we introduce
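The adaptive k-means step can be illustrated with a small pure-Python sketch. The slides only say that a threshold parameter θ is chosen, so the splitting rule used here (split any cluster whose worst member has squared distance to the centroid above `theta`) is an assumption, as are all function names. Hierarchical merging and iterative truncation are not shown.

```python
import random

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]

def assign(points, centers):
    """Group points by their nearest center."""
    groups = [[] for _ in centers]
    for p in points:
        j = min(range(len(centers)), key=lambda i: sqdist(p, centers[i]))
        groups[j].append(p)
    return groups

def kmeans(points, centers, iters=50):
    """Plain Lloyd iterations from the given starting centers."""
    for _ in range(iters):
        groups = assign(points, centers)
        new = [mean(g) if g else c for g, c in zip(groups, centers)]
        if new == centers:
            break
        centers = new
    return centers

def adaptive_kmeans(points, k0, theta, seed=0, max_rounds=20):
    """Grow the number of clusters until no member violates the threshold.

    Assumed rule: a cluster is split in two when its worst member has
    squared distance to the centroid greater than theta.
    """
    rng = random.Random(seed)
    centers = kmeans(points, rng.sample(points, k0))
    for _ in range(max_rounds):
        next_centers, split = [], False
        for c, g in zip(centers, assign(points, centers)):
            if not g:
                continue  # drop empty clusters
            far = max(g, key=lambda p: sqdist(p, c))
            if sqdist(far, c) > theta and len(g) > 1:
                # 2-means split seeded with the centroid and the worst member
                next_centers.extend(kmeans(g, [c, far]))
                split = True
            else:
                next_centers.append(c)
        centers = kmeans(points, next_centers)
        if not split:
            break
    return centers

def closest_shape(shape, dictionary):
    """Index of the nearest dictionary load shape (the final assignment step)."""
    return min(range(len(dictionary)), key=lambda i: sqdist(shape, dictionary[i]))
```

The caller fixes the acceptable error rather than the number of clusters, so the size of the resulting dictionary follows from the data and from θ.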
Components of Clustering Process
• Data cleaning and normalization
  – Remove households with very low demand (<0.2 kW)
  – Normalization: convert hourly usage to hourly contribution to daily usage
• Clustering
  – Centroid-based methods: Kmeans, Adaptive Kmeans
  – Hierarchical clustering: distance metric, linkage
  – Density-based clustering: DBSCAN
  – Model-based clustering: GMM
• Judging cluster quality
  – Compactness: MIA, VRSE
  – Distinctness: SMI
  – Combined: DBI, Silhouette
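As an illustration of a combined quality index, here is a minimal sketch of DBI, read here as the Davies-Bouldin index in its standard definition: for each cluster, take the worst ratio of summed within-cluster scatter to centroid separation, then average over clusters. Lower values mean more compact and more distinct clusters. The function names are hypothetical, and the sketch assumes at least two clusters with distinct centroids.

```python
import math

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]

def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of points)."""
    cents = [centroid(c) for c in clusters]
    # Scatter: mean distance from each member to its cluster centroid.
    scatter = [sum(euclid(p, m) for p in c) / len(c)
               for c, m in zip(clusters, cents)]
    k = len(clusters)
    worst = [
        max((scatter[i] + scatter[j]) / euclid(cents[i], cents[j])
            for j in range(k) if j != i)
        for i in range(k)
    ]
    return sum(worst) / k

tight = [[[0, 0], [0, 2]], [[10, 0], [10, 2]]]
print(davies_bouldin(tight))  # 0.2
```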
Clustering Quality Index
Clustering Quality
[Figure: clustering quality indices by method; annotations: More compact; More distinct; Puzzling??; Best choice?]
Kmeans Produces More Balanced Clusters
Sizes of clusters as fractions of total number of samples
Summary
• Examined a good number of clustering methods for identifying daily usage profiles
• Short observation:
  – Centroid-based method (Kmeans) works the best on this set of data
• Longer observations:
  – There are too many choices and no (really) clear winner
  – Centroid-based methods produce more compact clusters
  – Centroid-based methods produce more balanced clusters
  – DBSCAN declares many (up to 90%) data points as background
  – Our observation is different from previously published results (Chicco 2012)
• Future work
  – Maybe a different clustering quality metric would work better?
  – Should examine alternative profile generation methods, other than clustering
Thank You.
John Wu
Email: [email protected]