George Mason University
NOVEC Customer Segmentation Analysis
Anita Ahn Mesele Aytenifsu Bryan Barfield Daniel Kim
Department of Systems Engineering and Operations Research
Introduction
• NOVEC: Northern Virginia Electric Cooperative, a locally based electric distribution system
• Serves an area of 651 square miles
• 6,880 miles of power lines
• Provides electricity to more than 155,000 homes and businesses
• Stretches over multiple counties: Fairfax, Loudoun, Prince William, Stafford, Fauquier
• Well-known clients: Potomac Mills Mall, Verizon, AT&T
Problem Statement
NOVEC wants to determine if the sample it has can be used to segment its customers by their contribution towards NOVEC’s peak demand and total energy purchases and how well those customer segments represent NOVEC’s system.
*To scope the problem into more manageable parts, customer segmentation focuses on the month of July for the years 2011-2015.
Goals for this Project
Successful clustering for July data
• Using NOVEC’s data on July energy consumption for years 2011 - 2015, segment the customers into groups that accurately reflect NOVEC’s total peak consumption
Successful clustering for all months data
• Using the same clustering technique, segment the customers for all the other months in 2011 - 2015 to accurately reflect NOVEC’s total peak consumption
Customer Segmentation Implementation
• Using these cluster groups of NOVEC’s customers, input the segmentation into a geospatial analysis model to accurately predict NOVEC’s total peak consumption
*The project team will focus mostly on Criteria 1 & 2. Implementation of customer segmentation will be done by NOVEC separately.
Agenda
• Initial Analysis
• Data Terminology
• Cluster Analysis
  • SAS Analysis (Cluster and Correlation)
  • WEKA Analysis (Cluster)
  • R Analysis
• Milestone
• Difficulties/ Challenges
• Way Ahead
Initial Exploratory Analysis on Customer’s Peak Usage
• Each customer type has a different peak usage time
• Given the differences in electricity usage amounts and usage behavior, the project team decided to split the analysis into different customer segment groups
Total Customers in NOVEC
Customer Type   July 2011          July 2012          July 2013          July 2014          July 2015
Residential     135,407 (92.33%)   137,819 (92.30%)   140,806 (92.34%)   144,488 (92.36%)   147,652 (92.36%)
Large Company   94 (0.06%)         96 (0.06%)         110 (0.07%)        116 (0.07%)        121 (0.08%)
Small Company   11,143 (7.60%)     11,379 (7.62%)     11,551 (7.58%)     11,820 (7.56%)     12,083 (7.56%)
Street Light    16 (0.01%)         17 (0.01%)         17 (0.01%)         17 (0.01%)         17 (0.01%)
Total           146,660            149,311            152,484            156,441            159,873
Table: Number of Accounts by Customer Type
Overall, the composition of customer types is consistent over the years.
Dataset Sample
Customer Type   July 2011      July 2012      July 2013      July 2014      July 2015
Residential     389 (45.98%)   420 (43.43%)   465 (42.94%)   421 (42.18%)   365 (40.20%)
Church          25 (2.96%)     36 (3.72%)     36 (3.32%)     32 (3.21%)     30 (3.30%)
Large Company   258 (30.50%)   316 (32.68%)   362 (33.43%)   348 (34.87%)   346 (38.11%)
Small Company   174 (20.57%)   195 (20.17%)   220 (20.31%)   197 (19.74%)   167 (18.39%)
Total           846            967            1,083          998            908
Table: Number of Accounts by Customer Type
The number of residential accounts is underrepresented, while the number of large and small companies is overrepresented in the dataset provided by the client. It was later noted that the client provided misclassified data, and the most recent dataset still contains errors in customer type classification.
Note: Customers can change billing classifications over the years depending on how they want to be billed or on other special requirements that do not necessarily have anything to do with their electric usage.
A new customer classification appears in the sample: Church.
Customer type will not be used in this analysis.
Terminology used in Analysis
July Peak: The customer’s maximum recorded electricity usage in July
July Consumption: Total electricity consumption in July
July Avg: Hourly Average electricity usage by customer
Peak System Load: Maximum electricity usage for NOVEC’s entire system in July
Demand Factor: July Peak / Peak System Load
Ranges from 0 to 1. Measures how much the customer’s electricity usage contributes to the entire system’s electricity usage.
Terminology used in Analysis
Coincident Peak: Customer kWh usage at the time NOVEC’s system peaked
Load Factor: Customer’s Avg Energy Use / Customer’s Peak Use
Ranges from 0 to 1. Shows how much the customer’s energy usage varies relative to its peak.
Load Factor (CP): Customer’s Avg Energy Use / Customer’s Coincident Peak Use
Measures how significantly a particular customer contributes to NOVEC’s peak. Can be greater than 1.
Coincident to Peak Ratio: Customer’s Coincident Peak Use / Customer’s Peak Use
Ranges from 0 to 1. Measures how close the customer’s usage at NOVEC’s system peak is to the customer’s own peak usage.
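As a rough illustration of the ratios defined above, the sketch below computes them for one customer from hourly readings. The function name, the input layout (two equal-length lists of hourly values), and the sample numbers are assumptions made for this example; the slides do not show how the team actually computed them.

```python
def customer_metrics(usage_kwh, system_load_kwh):
    """Compute the terminology-slide ratios for one customer.

    Both arguments are equal-length lists of hourly July values
    (a hypothetical layout, not NOVEC's actual data format).
    Assumes the customer's peak and coincident usage are nonzero.
    """
    if len(usage_kwh) != len(system_load_kwh) or not usage_kwh:
        raise ValueError("need two non-empty series of equal length")

    july_peak = max(usage_kwh)
    july_avg = sum(usage_kwh) / len(usage_kwh)
    peak_system_load = max(system_load_kwh)
    # Hour at which the whole system peaked (first such hour if tied).
    peak_hour = system_load_kwh.index(peak_system_load)
    coincident_peak = usage_kwh[peak_hour]

    return {
        "demand_factor": july_peak / peak_system_load,
        "load_factor": july_avg / july_peak,
        "load_factor_cp": july_avg / coincident_peak,
        "coincident_to_peak_ratio": coincident_peak / july_peak,
    }


# Made-up readings: the system peaks in hour 0, when this customer uses little.
usage = [2.0, 4.0, 8.0, 6.0]
system = [200.0, 150.0, 180.0, 100.0]
metrics = customer_metrics(usage, system)
```

In this made-up case the customer's coincident usage (2.0) is below its average (5.0), so Load Factor (CP) comes out at 2.5, which shows how that ratio can exceed 1.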
Provided Variables
Account        Unique customer identifier
Map Location   Geospatial identifier
Group          Customer billing classification (RES, LGCOM, SMCOM, CHRCH)
Usage          Energy expenditure in kilowatt-hours (kWh)
DateTime       MM-DD-YYYY 00:00 (24-hour)
Useful Variables
Account        Unique customer identifier
Map Location   Geospatial identifier
Group          Customer billing classification (RES, LGCOM, SMCOM, CHRCH)
Usage          Energy expenditure in kilowatt-hours (kWh)
DateTime       MM-DD-YYYY 00:00 (24-hour)
Cluster Analysis with SAS
Cluster Results for Residential Customers
[Two slides of cluster result figures; images not recoverable from the extracted text]
Cluster Results for Small Company Customers
[Two slides of cluster result figures; images not recoverable from the extracted text]
Cluster Results for Large Company Customers
[Two slides of cluster result figures; images not recoverable from the extracted text]
Cluster Results for Residential Customers
[Figure: cluster plot; axes: Coincident Peak and Load Factor]
Cluster Results for Small Company Customers
[Figure: cluster plot; axes: Coincident Peak and Load Factor]
Cluster Results for Large Company Customers
[Figure: cluster plot; axes: Coincident Peak and Load Factor]
Cluster Analysis using R
Derived Variables
Year                     Year of electricity usage
July Peak                Maximum recorded electricity usage by customer
July Consumption         Total July electricity consumption by customer
July Avg                 Hourly average electricity usage by customer
Peak System Load         Maximum recorded NOVEC system electricity usage in July
Demand Factor            July Peak / Peak System Load
Load Factor              July Avg / July Peak
Coincident Usage         Customer electricity usage at time of NOVEC system peak
Coincident Usage Ratio   Coincident Usage / Peak System Load
Coincident Peak Ratio    Coincident Usage / July Peak
Variables in boxes were used for clustering analysis.
Data Cleaning
Year    Customer Accounts in Original Data   Customer Accounts in Final Data
2011    846                                  811
2012    966                                  932
2013    1,082                                1,044
2014    997                                  957
2015    908                                  869
Total   4,799                                4,613
Table: Removed accounts with zero values for July Peak, July Avg, and Coincident Usage. About 96% of the original data was retained for analysis after cleaning.
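A minimal sketch of the cleaning rule described in the table caption, assuming each account is a dictionary and that a zero in any one of the three fields removes the account (the slides do not say whether "any" or "all" was meant). The field names and sample rows are hypothetical.

```python
def remove_zero_accounts(rows):
    """Drop rows with a zero in July Peak, July Avg, or Coincident Usage.

    Assumption: a zero in ANY of the three fields removes the account.
    """
    required = ("july_peak", "july_avg", "coincident_usage")  # hypothetical field names
    return [row for row in rows if all(row[name] != 0 for name in required)]


# Made-up rows for illustration only.
rows = [
    {"account": "A", "july_peak": 5.2, "july_avg": 1.9, "coincident_usage": 3.1},
    {"account": "B", "july_peak": 0.0, "july_avg": 0.0, "coincident_usage": 0.0},
    {"account": "C", "july_peak": 7.4, "july_avg": 2.5, "coincident_usage": 0.0},
]
cleaned = remove_zero_accounts(rows)
kept_accounts = [row["account"] for row in cleaned]

# Retention rate from the table totals: 4,613 of 4,799 accounts.
retention_pct = round(100 * 4613 / 4799, 1)
```

Removing zeros also avoids division by zero in ratios such as Load Factor, which divide by July Peak or Coincident Usage.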
Demand Factor Exploration
The histogram for demand factor shows a heavily right-skewed
distribution. However, Log and Ln transformations each show two distinct
peaks, suggestive of two unique customer populations.
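One point worth making explicit: assuming "Log" here means base 10, the Log and Ln transforms differ only by a constant factor, so both histograms have the same shape on a rescaled axis. The sketch below checks this on made-up demand factor values (not NOVEC data).

```python
import math

# Illustrative values only; real demand factors come from the cleaned dataset.
demand_factors = [0.00002, 0.00005, 0.0001, 0.0009, 0.0017]

log10_values = [math.log10(x) for x in demand_factors]
ln_values = [math.log(x) for x in demand_factors]

# ln(x) = log10(x) * ln(10), so every ratio equals the same constant (~2.3026).
ratios = [ln / lg for ln, lg in zip(ln_values, log10_values)]
```

Either transform therefore reveals the same peaks; plotting both is a consistency check rather than two independent views.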
Coincident Usage Ratio Exploration
The histogram for coincident usage ratio shows a heavily right-skewed
distribution. However, Log and Ln transformations each show two distinct
peaks, suggestive of two unique customer populations.
Coincident Peak Ratio Exploration
The histogram for coincident peak ratio shows 3 peaks suggestive of three
unique customer populations. There is no need for Log or Ln
transformation.
Load Factor Exploration
The histogram for load factor shows one distinct peak suggestive of a single
customer population. There is no need for Log or Ln transformation.
Correlation between Variables
Strong correlation between Demand Factor and Coincident Usage Ratio.
The other variable pairs show weak positive correlations.
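The slides do not say which correlation coefficient was used. Assuming the common Pearson coefficient, a minimal sketch looks like this; the sample values are made up and only mimic two variables that move together.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Illustrative values only (not NOVEC data).
demand_factor = [0.0001, 0.0002, 0.0006, 0.0019]
coincident_usage_ratio = [0.0001, 0.0001, 0.0006, 0.0018]
r = pearson(demand_factor, coincident_usage_ratio)
```

Both variables share the same denominator (Peak System Load), which is one reason a strong correlation between them is plausible.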
Clustering Algorithms
The k-means algorithm assigns each observation to the cluster with the nearest center
(i.e., centroid), where the centroid is the mean of the points assigned to that cluster.
The Partitioning Around Medoids (PAM) algorithm is based on the search
for medoids among the observations of the dataset. The goal is to find k
representative objects which minimize the sum of the dissimilarities of the
observations to their closest representative object.
Both algorithms require the user to choose the number of clusters to be
generated beforehand.
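To make the medoid idea concrete, here is a simplified sketch of a PAM-style swap search. It is not the R implementation used by the team: it starts from the first k observations instead of PAM's usual build phase, uses Euclidean distance as the dissimilarity, and is only suitable for small inputs. The sample points are made up.

```python
import math


def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k):
    """Simplified PAM: swap medoids with non-medoids while the cost drops."""
    medoids = list(range(k))  # naive start: first k observations (assumption)
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in range(len(points)):
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best - 1e-12:  # small tolerance avoids cycling on ties
                    medoids, best = trial, cost
                    improved = True
    labels = [
        min(range(k), key=lambda j: math.dist(p, points[medoids[j]]))
        for p in points
    ]
    return medoids, labels


# Two made-up groups of (load factor, coincident peak ratio) pairs.
points = [(0.10, 0.10), (0.20, 0.20), (0.15, 0.15),
          (0.80, 0.80), (0.90, 0.90), (0.85, 0.85)]
medoids, labels = pam(points, k=2)
```

Unlike a k-means centroid, each medoid is an actual observation (here, the middle point of each group), which makes the result less sensitive to outliers.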
Determining Optimal Cluster Size Using K-Means
Using the “elbow criterion”, the optimal number of clusters is 6.
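The slides do not state which quantity was plotted against k. A common choice for the elbow criterion is the within-cluster sum of squares (WCSS), which the sketch below computes for a given assignment; the points, labels, and centers are made up for illustration.

```python
def wcss(points, labels, centers):
    """Within-cluster sum of squared Euclidean distances to assigned centers."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centers[lab]))
        for p, lab in zip(points, labels)
    )


# Four made-up points forming two obvious groups.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

# One cluster centered on the overall mean versus two clusters on group means.
one_cluster = wcss(points, [0, 0, 0, 0], [(5.0, 1.0)])
two_clusters = wcss(points, [0, 0, 1, 1], [(0.0, 1.0), (10.0, 1.0)])
```

WCSS always falls as k grows; the "elbow" is the value of k after which further increases yield only small reductions, so the choice remains a visual judgment.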
K-means Clustering Result for 2011 - 2015
These two principal components explain 82.3% of the variability.
Determining Optimal Cluster Size Using PAM
Using the “elbow criterion”, the optimal number of clusters is 8.
PAM Clustering Result for 2011 - 2015
These two principal components explain 82.3% of the variability.
Cluster Size Results
Cluster   Size
1         470
2         839
3         493
4         13
5         1502
6         1296
Table: K-means Algorithm
Cluster   Size
1         461
2         379
3         818
4         1031
5         686
6         490
7         737
8         11
Table: PAM Algorithm
Cluster Analysis with Weka
K-means (disjoint sets): how does it work?
1. Specify k, the desired number of clusters.
2. Choose k points at random as cluster centers.
3. Assign all instances to their closest cluster center.
4. Calculate the centroid (i.e., mean) of instances in each cluster. These centroids are the new cluster centers.
5. Repeat steps 3 and 4 until the cluster centers don’t change.
Minimizes the total squared distance from instances to their cluster centers. Local, not global, minimum!
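The steps above can be sketched in a few lines of plain Python. This is an illustrative version, not Weka's implementation; the function name, the seed parameter, and the sample points are assumptions for the example.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Basic k-means following the listed steps; points are tuples of floats."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # choose k points at random as centers
    for _ in range(max_iter):
        # Assign every instance to its closest cluster center.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Recompute each center as the mean of its assigned instances.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:  # stop when the centers no longer change
            break
        centers = new_centers
    return centers, labels


# Two made-up groups of (load factor, coincident peak ratio) pairs.
points = [(0.10, 0.10), (0.20, 0.20), (0.15, 0.15),
          (0.80, 0.80), (0.90, 0.90), (0.85, 0.85)]
centers, labels = kmeans(points, k=2)
```

Because the result depends on the random starting centers, the algorithm can settle in a local minimum; rerunning with several seeds and keeping the lowest total squared distance is a common safeguard.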
K-means Clustering algorithm on cleaned dataset
Relation: NOVEC_Cleaned_Dataset
Instances: 4613
Attributes: 12
Demand_Factor
Load_Factor
Coincident_Usage_Ratio
Coincident_Peak_Ratio
Ignored:
account
type
year
July_Peak
July_Consumption
July_Avg
Peak_System_Load
Coincident_Usage
Test mode: evaluate on training data
K-means Clustering algorithm (k=6) 2011-2015
Cluster number
Attribute Full Data 0 1 2 3 4 5
(4613) (716-16%) (452-10%) (995-22%) (749-16%) (683-15%) (1018-22%)
===================================================================================
Demand_Factor 0.0002 0.0001 0.0002 0.0001 0.0006 0.0001 0
Load_Factor 0.4067 0.4348 0.5404 0.2825 0.7417 0.2492 0.3081
Coincident_Usage_Ratio 0.0001 0.0001 0.0001 0 0.0006 0 0
Coincident_Peak_Ratio 0.5812 0.8802 0.5333 0.4007 0.9035 0.1175 0.6424
K-means cluster plot 2011-2015, k=6
[Figure: cluster plot; axes: Coincident peak ratio and Load factor. Legend: Cluster 0: 16%, Cluster 1: 10%, Cluster 2: 22%, Cluster 3: 16%, Cluster 4: 15%, Cluster 5: 22%]
K-means Clustering algorithm (k=8) 2011 - 2015
Cluster number
Attribute Full Data 0 1 2 3 4 5 6 7
(4613.0) (559-12%) (415-9%) (941-20%) (431-9%) (367-8%) (510-11%) (795-17%) (595-13%)
===========================================================================================
Demand_Factor 0.0002 0.0002 0.0002 0 0.0009 0.0002 0.0001 0 0.0001
Load_Factor 0.4067 0.5777 0.5617 0.3107 0.8109 0.4687 0.3328 0.2535 0.2264
Coincident_Usage_Ratio 0.0001 0.0002 0.0001 0 0.0008 0.0001 0 0 0
Coincident_Peak_Ratio 0.5812 0.9162 0.6496 0.6083 0.9159 0.3404 0.8409 0.3961 0.1066
K-means cluster plot 2011-2015, k=8
[Figure: cluster plot; axes: Coincident peak ratio and Load factor. Legend: Cluster 0: 12%, Cluster 1: 9%, Cluster 2: 20%, Cluster 3: 9%, Cluster 4: 8%, Cluster 5: 11%, Cluster 6: 17%, Cluster 7: 13%]
K-means Clustering algorithm on cleaned dataset
Relation: NOVEC_Cleaned_Dataset_2015
Instances: 869
Attributes: 12
Demand_Factor
Load_Factor
Coincident_Usage_Ratio
Coincident_Peak_Ratio
Ignored:
account
type
year
July_Peak
July_Consumption
July_Avg
Peak_System_Load
Coincident_Usage
Test mode: evaluate on training data
K-means Clustering algorithm (k=6) 2015
Cluster number
Attribute Full Data 0 1 2 3 4 5
(869.0) (149-17%) (178-20%) (179-21%) (117-13%) (54-6%) (192-22%)
==============================================================================================
Demand_Factor 0.0002 0.0002 0.0002 0 0.0002 0.0019 0
Load_Factor 0.4049 0.6166 0.2698 0.263 0.4965 0.8497 0.3173
Coincident_Usage_Ratio 0.0002 0.0002 0 0 0.0001 0.0018 0
Coincident_Peak_Ratio 0.5488 0.8462 0.1344 0.4651 0.4287 0.9496 0.7406
K-means cluster plot 2015, k=6
[Figure: cluster plot; axes: Coincident peak ratio and Load factor. Legend: Cluster 0: 17%, Cluster 1: 20%, Cluster 2: 21%, Cluster 3: 13%, Cluster 4: 6%, Cluster 5: 22%]
K-means Clustering algorithm (k=8) 2015
Attribute Full Data 0 1 2 3 4 5 6 7
(869.0) (90-10%) (138-16%) (169-19%) (107-12%) (62-7%) (88-10%) (77-9%) (138-16%)
==================================================================================================
Demand_Factor 0.0002 0.0002 0.0002 0 0.0002 0.0017 0 0.0002 0
Load_Factor 0.4049 0.6016 0.2633 0.313 0.4722 0.8386 0.299 0.6033 0.2406
Coincident_Usage_Ratio 0.0002 0.0002 0 0 0.0001 0.0016 0 0.0001 0
Coincident_Peak_Ratio 0.5488 0.9176 0.104 0.6145 0.3585 0.9314 0.8404 0.677 0.3905
K-means cluster plot 2015, k=8
[Figure: cluster plot; axes: Coincident peak ratio and Load factor. Legend: Cluster 0: 10%, Cluster 1: 16%, Cluster 2: 19%, Cluster 3: 12%, Cluster 4: 7%, Cluster 5: 10%, Cluster 6: 9%, Cluster 7: 16%]
Milestone
[Timeline figure spanning September, October, November, December]
Status: On time!
• Problem Def. Presentation
• Proj. Proposal Presentation
• Proj. Proposal Report
• In Progress Presentation I
• In Progress Presentation II
• Working Group Meeting (Dec)
• Final Report Draft Due
• Final Dry Run Presentation
• Final Presentation and submission of deliverables
Difficulties/Challenges
• Data available from NOVEC is not clean: the grouping of customers is inconsistent
• Because the data overrepresents heavy users, it will be difficult to accurately represent NOVEC’s total population
Way Ahead
• Continue to check whether cleaner data is available from NOVEC
• Continue cluster analysis using different metrics to determine the best metric or combination of metrics to segment customers
• Continue communication with the client to confirm the project is headed in the right direction
• Continue report writing