ENTROPY Consortium
Title: D4.1. Deployment of Energy Data Analytics Reporting and Visualization Framework
Document Version: 1.0
Project Number: 649849
Project Acronym: ENTROPY
Project Title: Design of an Innovative Energy-Aware IT Ecosystem for Motivating Behavioural Changes Towards the Adoption of Energy Efficient Lifestyles
Contractual Delivery Date: 31/10/2017
Actual Delivery Date: 31/10/2017
Deliverable Type* – Security**: P – PU
* Type: P – Prototype, R – Report, D – Demonstrator, O – Other
** Security Class: PU – Public, PP – Restricted to other programme participants (including the Commission), RE – Restricted to a group defined by the consortium (including the Commission), CO – Confidential, only for members of the consortium (including the Commission)
Responsible and Editor/Author: Organization: Contributing WP:
Vassilis Nikolopoulos Intelen WP4
Authors (organizations):
Vassilis Nikolopoulos (INT), Anastasios Zafeiropoulos (UBI), John Papagiannis (INT), Eleni
Fotopoulou (UBI), Antonio Skarmeta (UMU), Aurora Gonzalez Vidal (UMU)
Abstract:
This document describes the prototype toolbox of the ENTROPY analytics algorithms and the underlying technical framework.
Keywords:
Prototype, analytics, algorithms
649849 ENTROPY D4.1. Deployment of Energy Data Analytics Reporting & Visualization Framework
31/10/2017 – Final 1.0 Page 2 of 20
Revision History
The following table describes the main changes made to the document since it was created.
Revision Date Description Author (Organization)
v0.1 30/10/2017 Initial draft document by INTELEN Vassilis Nikolopoulos (INT)
John Papagiannis (INT)
v0.2 30/10/2017 Technical architecture and algorithmics Anastasios Zafeiropoulos (UBI)
Eleni Fotopoulou (UBI)
Aurora Gonzalez Vidal (UMU)
Vassilis Nikolopoulos (INT)
v0.3 31/10/2017 Document Review Antonio Skarmeta (UMU)
Aurora Gonzalez Vidal (UMU)
Pedro J. Fernández Ruiz (UMU)
Aristotelis Agianniotis (HESSO)
v1.0 31/10/2017 Document Final Version Anastasios Zafeiropoulos (UBI)
Eleni Fotopoulou (UBI)
Aurora Gonzalez Vidal (UMU)
Vassilis Nikolopoulos (INT)
Executive Summary
The document provides a basic manual/guide for the ENTROPY Analytics toolbox. The technical analysis and outputs are based on the WP2 results and the theoretical algorithms that were developed. The basic UI of the ENTROPY algorithmic dashboard/tool is presented, including the backend architecture and the description of the various algorithms. The analysis and the analytics toolbox will be used on real ENTROPY data sets and behavioral digital interactions generated by the mobile apps.
Disclaimer
This project has received funding from the European Union’s Horizon 2020 research and
innovation programme under grant agreement No 649849, but this document only reflects the
consortium’s view. The European Commission is not responsible for any use that may be made
of the information it contains.
Table of Contents
1. Introduction ........................................................................................................................ 5
2. Data Mining and Analysis Services ................................................................................... 7
2.1 Algorithms and Architecture ....................................................................................... 7
3. Integrated Algorithms in ENTROPY Platform .............................................................. 10
3.1 EntArima ...................................................................................................................... 10
3.2 Cooling & Heating Degree Days ................................................................................ 10
3.3 GBDT4consumption.................................................................................................... 11
3.4 StreamQ ....................................................................................................................... 12
3.5 Energy Comparison between modular Periods ........................................................ 13
3.6 EntPast Forecasting .................................................................................................... 13
3.7 HeatMap with End User Behavioral Characteristics .............................................. 14
3.8 HeatMaps on Energy and Digital Behavioral Interactions (Apps)......................... 14
3.9 Behavioral Indications calculations ........................................................................... 15
3.10 Clustering HeatMap algorithm .................................................................................. 17
4. Conclusions ...................................................................................................................... 19
Bibliography – References ........................................................................................................... 20
1. INTRODUCTION
This document presents the overall architecture and the various functionalities of the ENTROPY Analytics and Reporting dashboard (prototype). The framework is based on the OpenCPU system for embedding algorithms and functions written in R or Python. The flexibility that OpenCPU offers is of great importance, since ENTROPY can connect to any external algorithmic package or function and execute it on real ENTROPY data sets (energy or behavioural interactions).
[Figure placeholder: the diagram depicts input data sets (questionnaires/surveys, user inputs, digital interactions, behavioral KPIs, energy data from ENTROPY, building descriptive data, various 3rd-party data such as weather) feeding algorithms and analytics in R/Python (clustering, classification, correlation, regression, PCA, predictions), with admin tuning/selection through the ENTROPY Admin Panel; execution produces visualization and results, including optimal algorithmic selection based on ENTROPY value propositions and engagement objectives, Measurement and Verification (M&V), recommender tuning, ENTROPY statistics/trends and EU engagement frameworks.]
Figure 1. ENTROPY Analytics overall Functional Procedure including data inputs
Various algorithms are presented with details on their input and output fields, along with some reporting snapshots obtained from the ENTROPY toolbox. Details are also given on the behavioral KPIs and how we calculate them using ENTROPY's data and mobile apps.
On the ENTROPY platform, various algorithms can be executed (in remote mode, based on the OpenCPU approach) through the Administration panel. The general process of the algorithmic flow for all algorithms is presented in Figure 1.
The ENTROPY analytics engine will take into account all inputs from individual questionnaires, from IoT sensor streams (pilot sites), from behavioural interactions (mobile apps) and from other external data sources, and will use algorithmic functions to calculate results and produce reports. Algorithms will be executed remotely, which will enable the ENTROPY platform and the specific toolbox to become a "Software as a Service (SaaS) toolbox" that will enable various business models on the platform (Analytics as a Service).
2. DATA MINING AND ANALYSIS SERVICES
2.1 Algorithms and Architecture
Within the ENTROPY platform, a data mining and analysis service is implemented that supports a set of big data mining and analysis techniques for the extraction of energy and behavioral analytics [7]. The insights provided regarding energy usage in smart buildings, as well as the behavioral characteristics of the occupants, may lead on one hand to an increase in their energy awareness and on the other hand to targeted recommendations for reducing energy consumption.
The supported set of analytics processes concerns descriptive, predictive, classification, clustering, and prescriptive analytics [1]. Descriptive analytics provide summary information regarding energy usage, as well as other environmental or behavioral attributes. Predictive analytics provide estimates of energy usage for the upcoming period, and examine the relationship between energy consumption and a set of parameters, such as average temperature, heating or cooling degree days, day of the week, etc.
The workflow followed for the support of data mining and analysis techniques is depicted in Figure 2. An analysis process is based on the selection of an analysis template and the selection of the queries to be executed for providing the input datasets (training and/or evaluation datasets). Each analysis template represents a specific algorithm and gives the user the flexibility to adjust the relevant configuration parameters. Such parameters include input parameters for the algorithm along with their description and their default value, as well as output parameters along with their type (text, image, data, html). An indicative analysis template for the calculation of heating or cooling degree days per day [2] for a monthly period is depicted in Figure 3. A set of such analysis templates can be made available and used for initiating an analysis. It should be noted that an analysis process is also associated with a set of execution parameters that denote whether an analysis should be realized in a manual or automated way, as well as the periodicity factor for the latter case.
The design of queries for obtaining the input datasets for the analysis is based on the development of a query builder over MongoDB, enabling end users to easily prepare their input datasets. Two categories of queries are supported: queries for fetching data collected by sensor data streams (e.g., energy consumption, humidity, and indoor temperature data per hour for a specific room) and queries for fetching data related to the set of users participating in the energy efficiency campaign (e.g., the set of users with an educational level corresponding to a Master's degree). Upon the execution of the queries, streams of the input training or evaluation datasets are provided to the analysis toolkits.
In ENTROPY, the R Project for Statistical Computing [3] and the Apache Spark engine for large-scale data processing [4] are used for this purpose. Depending on the analysis needs in terms of big data management and performance, the optimal tool per case may be selected. Interconnection of the ENTROPY components with the analysis toolkits is based on the OpenCPU system for embedded scientific computing, which provides a reliable and interoperable HTTP API for data analysis based on R. In the case of large-scale data processing and the need for a big data analysis framework, the Apache Spark engine is used, where the analysis process is realized in a set of worker nodes, each of which hosts an Apache Spark OpenCPU Executor [5]. The worker nodes form a cluster orchestrated by a cluster manager.
Upon the completion of an analysis, the produced results (output dataset) are made available through a set of URLs providing access to the results, as defined in the output parameters of the analysis template. It should be noted that the analysis results are also semantically mapped to the ENTROPY semantic models, based on the adoption of the LDAO ontology [6].
At the current phase, a set of initial algorithms is considered; however, the overall implementation facilitates the incremental addition of further analysis mechanisms.
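As an illustration of the interconnection with the analysis toolkits, an R function exposed by an OpenCPU server can be invoked over HTTP. The sketch below is indicative only: the endpoint URL and the package/function names ("entropyAnalytics", "degree_days") are hypothetical placeholders, while the URL pattern (/ocpu/library/{package}/R/{function}, with a /json suffix for a JSON result) follows the OpenCPU API.

```python
# Minimal sketch of calling an R function through the OpenCPU HTTP API.
# The endpoint and package/function names below are hypothetical placeholders.
import json
import urllib.request

OCPU = "https://ocpu.example.org/ocpu"  # hypothetical OpenCPU endpoint


def ocpu_url(base, package, function):
    # OpenCPU exposes R functions at /ocpu/library/{pkg}/R/{fn};
    # the /json suffix requests the result as JSON in a single call.
    return f"{base}/library/{package}/R/{function}/json"


def run_analysis(package, function, **params):
    """POST the analysis parameters and return the decoded JSON result."""
    req = urllib.request.Request(
        ocpu_url(OCPU, package, function),
        data=json.dumps(params).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)

# e.g. run_analysis("entropyAnalytics", "degree_days", base_temp=24)
```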
Figure 2. Data mining and analysis workflow
Figure 3. Indicative algorithm analysis template
3. INTEGRATED ALGORITHMS IN ENTROPY PLATFORM
3.1 EntArima
ARIMA models are a popular and flexible class of forecasting models that utilize historical information to make predictions. This type of model is a basic forecasting technique that can be used as a foundation for more complex models. Within the ENTROPY project, the specific package focuses on examining time series of any sensor-enabled attribute (e.g. temperature, CO2, energy consumption), fitting an ARIMA model, and creating a basic forecast. The execution of the specific package is a general-purpose analytic process with no need for extra configuration: it takes as input an hourly time series of sensor data and returns predictions for the following 24 hours.
EntArima is used for forecasting a quantity into the future and explaining its historical patterns, depending on the kind of attribute analyzed. Some of the insights obtained upon the execution of the algorithm are the seasonal patterns in energy consumption, the expected room occupancy, the configured temperature of each HVAC unit in each building area, and the estimated effect of a new campaign on energy consumption. The results of the specific package are provided in the form of a plot, so that they can be easily interpreted by the platform administrator and the campaign managers.
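The input/output contract described above can be illustrated with a minimal sketch. Note that the toolbox itself fits ARIMA models in R via OpenCPU; the pure-Python AR(1) model below (the simplest ARIMA(1,0,0) case) is only a didactic stand-in showing the idea of fitting on an hourly series and forecasting the next 24 hours.

```python
# Didactic sketch of the EntArima idea: fit an AR(1) model, i.e. ARIMA(1,0,0),
# to an hourly series by ordinary least squares and forecast 24 hours ahead.
# The production toolbox delegates ARIMA fitting to R via OpenCPU.

def ar1_forecast(series, horizon=24):
    """Fit x[t] = c + phi * x[t-1] by least squares, then iterate the fitted
    equation 'horizon' steps ahead, returning the list of predictions."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    phi = cov / var if var else 0.0
    c = my - phi * mx
    preds, last = [], series[-1]
    for _ in range(horizon):
        last = c + phi * last
        preds.append(last)
    return preds

# e.g. ar1_forecast(hourly_temperature) -> 24 hourly predictions
```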
3.2 Cooling & Heating Degree Days
This data mining and analysis process regards the calculation of heating degree days (hdd) and cooling degree days (cdd) per area or subarea of the registered buildings. The following steps are followed:
● For each area, we calculate the average temperature per hour for indoor and outdoor conditions, denoted as avg_temp_in and avg_temp_out.
● The base temperature is denoted as base_temp (for winter, where the comfort level is considered to be between 20-24 C, base_temp = 24; for summer, where the comfort level is considered to be between 23-26 C, base_temp = 23).
● The period for measuring the average values is 1 hour.
We consider two periods for the calculation of the heating and cooling degree days, namely the cold period ranging from 16/10 to 15/04 with base_temp = 24 C and the warm period ranging from 16/04 to 15/10 with base_temp = 23 C. The following process is realized.
For the cold period (averages per hour):
For the cold period (averages per hour):
● If avg_temp_out < base_temp AND avg_temp_in > base_temp, then hdd = (base_temp - avg_temp_out) * 1/24. In this case there is energy waste related to temp_diff = avg_temp_in - base_temp, or temp_diff (%) = 100 * (avg_temp_in - base_temp) / (avg_temp_in - avg_temp_out). Based on the calculated temp_diff, the associated indicative energy waste can be estimated. Such energy waste is calculated based on an indicator for the rate of transfer of heat per square meter. Such a rate depends on the energy characteristics of the building; accurate values will be provided per building category. Indicatively, a common office has a rate of around 1.5 W/(m2K). As an example, for an office room of 10 square meters and temp_diff = 5 C, the energy waste is 10 m2 * 5 C * 1.5 W/(m2K) = 75 W, which for a one-hour period corresponds to 75 Wh.
● If avg_temp_out < base_temp AND avg_temp_in < base_temp, then hdd = (base_temp - avg_temp_out) * 1/24 AND energy_waste = 0.
● If avg_temp_out > base_temp, then hdd = 0.
For the warm period (averages per hour):
● If avg_temp_out > base_temp AND avg_temp_in < base_temp, then cdd = (avg_temp_out - base_temp) * 1/24. In this case there is energy waste related to temp_diff = base_temp - avg_temp_in, or temp_diff (%) = 100 * (base_temp - avg_temp_in) / (avg_temp_out - avg_temp_in). Based on the calculated temp_diff, the associated indicative energy waste can be estimated. Such energy waste is calculated based on an indicator for the rate of transfer of heat per square meter.
● If avg_temp_out > base_temp AND avg_temp_in > base_temp, then cdd = (avg_temp_out - base_temp) * 1/24 AND energy_waste = 0.
● If avg_temp_out < base_temp, then cdd = 0.
In all cases, we calculate the hdd/energy_consumption and cdd/energy_consumption indicators (degree days/kWh) on a daily or weekly basis. Such indicators are very relevant for comparisons among similar buildings (in terms of size, floors, etc.). For the examination of energy consumption and energy waste, a linear regression model is applied to examine the correlation between degree days and kWh. Once the formula of the regression line is available, we can use it to calculate the baseline, or expected, energy consumption from the degree days. Hence, we can compare these energy indicators with the actual energy consumption for that period, to determine whether more energy was used than expected.
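The hourly rules above can be sketched as a small function (cold period shown). The 10 m2 area and the 1.5 W/(m2K) rate are the indicative values from the example in the text, not measured parameters.

```python
# Sketch of the hourly degree-day rules for the cold period. K_RATE is the
# indicative office heat-transfer rate of 1.5 W/(m^2*K) mentioned in the text.

K_RATE = 1.5  # W per square metre per degree, indicative value for an office


def hourly_hdd(avg_temp_out, avg_temp_in, base_temp=24.0, area_m2=10.0):
    """Return (hdd, energy_waste_Wh) for one hourly sample in the cold period."""
    if avg_temp_out >= base_temp:
        return 0.0, 0.0                       # no heating degree days this hour
    hdd = (base_temp - avg_temp_out) / 24.0   # contribution of this hour
    if avg_temp_in > base_temp:
        temp_diff = avg_temp_in - base_temp   # overheating above comfort level
        waste = area_m2 * temp_diff * K_RATE  # W sustained for one hour -> Wh
        return hdd, waste
    return hdd, 0.0

# Example from the text: 10 m^2 office, temp_diff = 5 C -> 75 Wh of waste.
```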
3.3 GBDT4consumption
GBDT4consumption is a customized implementation of the gradient boosting algorithm aimed at gaining insights into how energy consumption is related to other available environmental variables. These variables depend on each pilot installation and can be freely selected. Gradient boosting is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees.
GBDT4consumption is adaptable, easy to interpret, and produces highly accurate models. However, like most implementations today, it is computationally expensive and requires all training data to be in main memory. As training data becomes ever larger, the ENTROPY project makes use of a distributed implementation of the GBDT algorithm that parallelizes decision tree training.
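The gradient boosting principle behind GBDT4consumption can be illustrated with a minimal, single-feature sketch: each round fits a depth-1 tree (a stump) to the residuals of the current ensemble. This is a didactic toy, not the distributed implementation used in the platform.

```python
# Didactic sketch of gradient boosting for regression with decision stumps:
# each boosting round fits a one-split tree to the residuals of the ensemble.

def fit_stump(xs, residuals):
    """Find the single-feature split that minimises squared error on residuals."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, threshold, lmean, rmean)
    return best[1:]  # (threshold, left_value, right_value)


def gbdt_fit(xs, ys, rounds=50, lr=0.1):
    """Boost 'rounds' stumps with learning rate 'lr'; return (base, stumps)."""
    base = sum(ys) / len(ys)
    stumps, preds = [], [base] * len(ys)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lv, rv = fit_stump(xs, residuals)
        stumps.append((t, lv, rv))
        preds = [p + lr * (lv if x <= t else rv) for x, p in zip(xs, preds)]
    return base, stumps


def gbdt_predict(model, x, lr=0.1):
    base, stumps = model
    return base + sum(lr * (lv if x <= t else rv) for t, lv, rv in stumps)
```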
3.4 StreamQ
Data quality assessment is a crucial process in engineering systems where gathered sensor data are transmitted into a centralized repository [8]. Defining the quality of the sensor data is important because it will have an impact on the selection of the model, the estimation of the parameters and, consequently, on the forecasts. To this end, we follow the approach described in [9] for the automatic detection of outliers in time series. Innovational outliers (IO), additive outliers (AO), level shifts (LS), temporary changes (TC) and seasonal level shifts (SLS) are considered. The algorithms are implemented in the tsoutliers R package [10].
The proposed procedure may be applied to general seasonal and nonseasonal ARMA processes. The different types of outliers are counted and then the proportion of outliers with respect to the length of the stream is calculated as our quality measure.
The data quality indicator on streams can be seen in Figure 4:
PercentageOutliers = num(outliers) / length(stream)
Q = 1 - PercentageOutliers
A brief description of each type of outlier is given here:
Additive outliers (AO): An AO affects the level of the observed time series only at the point where it occurred.
Innovational outliers (IO): An IO is characterized by an extraordinary impact whose effects persist over succeeding observations. The influence of the outlier may increase as time proceeds. If an IO occurs at the end of the series, several further observations are needed before it can be labelled as an IO [11].
Level shifts (LS): In a level shift, all observations appearing after the outlier move to a new level. In contrast to an additive outlier, a level shift affects many observations and has a permanent effect.
Temporary changes (TC): A TC is similar to a level shift, but the effect of the outlier diminishes exponentially over the subsequent observations. Eventually, the series returns to its normal level.
Seasonal level shifts (SLS): A seasonal additive outlier or seasonal level shift appears as a surprisingly large or small value occurring repeatedly at regular intervals.
The result of the algorithm process is a data quality rate between 0 and 1 that directly feeds the stream quality tags within the ENTROPY platform.
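The quality formula above is straightforward once the outliers have been detected (in the toolbox, by the tsoutliers R package). A small helper makes the computation explicit:

```python
# The StreamQ quality indicator: Q = 1 - num(outliers) / length(stream).

def stream_quality(stream_length, outlier_indices):
    """Return Q in [0, 1]; a stream with no detected outliers scores 1.0."""
    if stream_length == 0:
        return 0.0  # an empty stream carries no usable information
    pct = len(set(outlier_indices)) / stream_length
    return 1.0 - pct

# e.g. 100 hourly samples with 5 flagged outliers -> Q = 0.95
```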
Figure 4. Data Streams Quality Indicator
3.5 Energy Comparison between modular Periods
This data mining and analysis process supports the realization of a set of comparisons with regards to energy consumption per main area or subarea, taking into account the area characteristics. Comparison of energy consumption, as well as power, is realized at an hourly, daily, weekly or monthly level. A comparison may regard the average/min/max values obtained, while comparisons taking into account the overall surface of the area, the number of occupants, etc. are also supported. Modularity in the definition of the comparison periods is supported through the design and execution of appropriate queries through the implemented Query Builder.
3.6 EntPast Forecasting
Instead of performing a typical forecast, EntPastForecasting aims to measure the impact of a campaign intervention on the energy consumption of ENTROPY registered building areas. EntPastForecasting takes as input all the historical sensor stream data related to energy consumption in a place, and splits the pool of data into a training and a testing dataset.
The date of initialization of a campaign is taken as the point at which the data are split into training and testing data. A forecasting model is fitted to the training dataset. Then, the predicted values that come out of the forecasting model are compared with the real past values. The difference between the predicted values and the real ones is interpreted as the proven impact of the campaign intervention. Given the high computational needs for generating the forecasting model, we make use of the random forest model available in the Spark computational framework.
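The split-and-compare flow can be sketched as follows. A simple hour-of-day mean profile stands in for the Spark random forest model used in the platform; the function only demonstrates the splitting at the campaign start date and the interpretation of the prediction gap as campaign impact.

```python
# Sketch of the EntPastForecasting flow: split the stream at the campaign
# start, fit a model on the 'before' part, and read the gap between predicted
# and observed post-campaign consumption as the campaign impact. A seasonal
# (hour-of-day) mean predictor stands in for the random forest model here.
from datetime import datetime


def campaign_impact(samples, campaign_start):
    """samples: list of (datetime, kWh). Returns mean(predicted - actual)
    over the post-campaign period; a positive value means consumption
    dropped relative to the pre-campaign pattern."""
    train = [(t, v) for t, v in samples if t < campaign_start]
    test = [(t, v) for t, v in samples if t >= campaign_start]
    # Hour-of-day mean profile learned from the pre-campaign period.
    buckets = {}
    for t, v in train:
        buckets.setdefault(t.hour, []).append(v)
    profile = {h: sum(vs) / len(vs) for h, vs in buckets.items()}
    overall = sum(v for _, v in train) / len(train)
    diffs = [profile.get(t.hour, overall) - v for t, v in test]
    return sum(diffs) / len(diffs)

# e.g. campaign_impact(stream, datetime(2017, 10, 1)) -> mean kWh saved/hour
```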
3.7 HeatMap with End User Behavioral Characteristics
A heat map (or heatmap) is a graphical representation of data where the individual values contained in a matrix are represented as colors. The data mining and analysis process supported here regards the creation of heatmaps in the dashboard of the campaign manager, depicting a summary of the behavioral characteristics of the end users per campaign.
Different heatmaps are provided per outcome table made available through the processing of the campaign initiation questionnaires, as well as the campaign evaluation questionnaires. The processing of the questionnaires regards the execution of a script realizing a profiling analysis per end user. Comparison of the initial and final profiling analyses can lead to meaningful insights with regards to the behavioral change achieved per end user.
3.8 HeatMaps on Energy and Digital Behavioral Interactions (Apps)
In the ENTROPY project, different data are available from the pilot sites. These data include demographics, building data (i.e. building type, building size, etc.), psychographics, room sensor data and building sensor data (i.e. temperature, humidity, etc.).
ENTROPY heatmaps (shown in Figure 5) will be used to visualize energy consumption patterns. This type of visualization makes it easier to identify anomalies in the data and to detect specific patterns based on the time, the day or even the part of the year.
HeatMap reports will be used to highlight differences in energy consumption among different floors or buildings. Furthermore, typical energy consumption diagrams and reports will be generated based on the average consumption for each weekday or month. More specifically, those results will be particularly useful if also considered together with the average temperature in order to detect peak hours, which could be analysed and explored further.
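The matrix behind such a heatmap can be derived by pivoting the energy stream into an hour-of-day by day-of-week grid of average consumption, which the dashboard then renders as colours. The sketch below illustrates the pivoting step only:

```python
# Sketch of the pivoting step behind an energy heatmap: average consumption
# per (hour of day, day of week) cell, ready to be rendered as colours.
from collections import defaultdict
from datetime import datetime


def heatmap_matrix(samples):
    """samples: list of (datetime, kWh). Returns a 24x7 nested list where
    cell [hour][weekday] holds the average consumption for that slot
    (None where no data exists)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, v in samples:
        key = (t.hour, t.weekday())
        sums[key] += v
        counts[key] += 1
    return [[(sums[(h, d)] / counts[(h, d)]) if counts[(h, d)] else None
             for d in range(7)] for h in range(24)]

# e.g. heatmap_matrix(stream)[9][0] -> average kWh on Mondays at 09:00
```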
Figure 5. HeatMap with Typical consumption report on energy data from entropy streams
3.9 Behavioral Indications calculations
This data mining and analysis process regards the calculation of a set of indicators related to the
profile of end users with regards to energy efficiency. Such indicators include Engagement,
Knowledge and Effectiveness. Calculation of indicators is realized in periodical points during
time, taking into account the interaction of end users with the mobile app and the serious game
accordingly. Based on crowd-sensing feedback received through the mobile apps, app-specific as
well as generic behavioral metrics are produced. The evolution of such metrics is also monitored.
Even the use of a single KPI could serve our needs for measuring user’s interaction with the
platform itself. By measuring users’ three basic metrics:
“Engagement” (KPI). “Knowledge” (KPI), “Effectiveness” (KPI) with the mobile app,
important conclusions can be drawn that will afterwards drive recommender decision making. To
achieve so, the digital mobile app interactions listed below, should be considered:
● Registration – the user registers as a user in the web/mobile app.
● Login – the user logs in to the web/mobile app.
● Content read/selected – the user completes a search within the web/mobile app.
● Quiz/Questionnaire taken – the user completes a survey or quiz/questionnaire.
● Level/Grade Achieved – the user completes a level/grade in the web/mobile app.
● Content View – the user views particular content.
● Comment – the user leaves a comment on a particular action/content/feature or registers a fault.
These digital interactions will be stored in the main database, based on Table 1 below.
Update calculation period: engagement metrics are calculated/re-evaluated every day, while some aggregated KPIs are computed every week.
ID | Field | Type | Description
1 | Player ID | int | The player's id (positive int)
2 | Tips read | int | Positive int or Boolean
3 | Total tips sent | int | Positive int
4 | Time spent on reading a tip | time | Timestamp difference (from tip sent to on-click tip read)
5 | No of questions sent (quiz) | int/time | Positive int with timestamp
6 | Questions read | int | Positive int
7 | Time spent on questions | time | Timestamp difference (from quiz open to quiz close)
8 | No of questions answered correctly | int | Correct answers on quiz
9 | Total questions in quiz | int | Positive int
16 | Faults registered/Player ID | int | Positive int
17 | Total faults from all users | int | Positive int
18 | Full player's ID click journey | int | Full click journey (store all clicks on specific mobile app features with timestamps)
19 | Push notifications read | int/time | When reading a push notification, with timestamp
20 | Push notifications sent | int |
Table 1: Behavioral metrics/KPIs
The quantification of the occupants' participation in the educational challenges of the ENTROPY-designed perso app constitutes a separate behavioural metric and is undoubtedly one of the most critical ones. Participation is intended to track the actions occupants perform and confirm their active involvement with the challenges. Participation is first calculated at an individual level and is then aggregated to provide a robust summary of the overall performance of the occupants.
Based on the interactions above, the three main behavioural KPIs will be calculated using specific mathematical equations. The calculation will take place at the ENTROPY back-end and the ENTROPY admin panel.
Based on the interaction metrics and the behavioural KPI calculation, we will have a very clear and solid view and analysis of each user's behavioural profile inside the mobile app. Each of the basic objectives (knowledge, engagement and effectiveness) will be under continuous recursive calculation, in order to feed the recommender. The recommender will also further tune and personalize the content per user group, per individual profile, per user selection and per user persona. This will be initially tuned by the questionnaire input and general demographics.
The three behavioral KPIs will be calculated as Table 2 below shows:
Behavioral Metric | Formula | Description
Engagement | Y = 0.4 * (logins per user over the last 30 days / top player's logins) + 0.4 * (content interactions per user over the last 30 days / top player's content interactions) + 0.2 * (faults registered / top player's faults) | Measures the interaction of the player with the app and the content
Knowledge | Y = 0.2 * (tips read / sent) + 0.3 * (correct questions / read) + 0.3 * (commitments fulfilled / engaged) + 0.2 * (faults per player resolved / total resolved) | Measures the knowledge level of the user, acquired from his interaction with the content
Effectiveness | Y = 0.4 * (average timestamp of read / content item) + 0.1 * (tips read / sent) + 0.1 * (questions read / sent) + 0.1 * (correct answers / read) + 0.1 * (commitments fulfilled / engaged) + 0.2 * (faults registered) | Measures the effectiveness and speed of the user's interaction with the content and the app
Table 2: Calculation formulas of behavioral metrics (weighted)
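The weighted-ratio structure of these formulas can be made concrete with the Engagement KPI. The helper below follows the Engagement row of Table 2 directly; the parameter names are illustrative, and each term is normalised by the corresponding top player's value so that the KPI lies in [0, 1].

```python
# The Engagement KPI from Table 2: a weighted sum of per-user counts over the
# last 30 days, each normalised by the top player's value for that metric.

def engagement(logins, content_interactions, faults,
               top_logins, top_interactions, top_faults):
    """Y = 0.4*(logins ratio) + 0.4*(content interactions ratio)
         + 0.2*(faults registered ratio)."""
    def ratio(x, top):
        return x / top if top else 0.0  # guard against a zero denominator
    return (0.4 * ratio(logins, top_logins)
            + 0.4 * ratio(content_interactions, top_interactions)
            + 0.2 * ratio(faults, top_faults))

# The top player by every metric scores 1.0.
```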
3.10 Clustering HeatMap algorithm
Clustering or cluster analysis is a form of unsupervised learning, which means that the class
labels of the input data are unknown. The aim of clustering is to detect groups in the data, called
clusters. The input data points should be partitioned into a number of clusters, in such a way that
the points belonging to the same cluster are more similar to each other than to points belonging
to other clusters.
The algorithm will take as input data from various ENTROPY streams (kWh, temperature, etc.), and the data will be clustered against other data sets in order to identify common data sets and similarities during the analysis process.
Figure 6. Clustering HeatMap Report with color codes
The output clustering report will be visualized with the use of a hierarchical heatmap, as seen in Figure 6. Hierarchical clustering algorithms are useful for observing the hierarchical structure of data and datasets; they use the Euclidean distance as a distance metric, while different agglomerative linkage methods are available.
In the context of the ENTROPY Analytics toolbox, clustering algorithms will be used in order to identify clusters/groups of occupants and of data sets/mobile interactions. This will enable the ENTROPY mobile app to be personalized according to the occupant groups, and the ENTROPY administrator to perform structural analysis on the data sets.
Clustering results will also make it possible to identify special characteristics of the groups, associated with demographic data and their energy consumption.
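The agglomerative idea behind the hierarchical heatmap can be sketched in a few lines: start from singleton clusters and repeatedly merge the closest pair. Single linkage with Euclidean distance is shown here as one of the possible agglomerative methods; it is a didactic toy, not the toolbox's rendering pipeline.

```python
# Didactic sketch of agglomerative clustering: repeatedly merge the two
# closest clusters (single linkage, Euclidean distance) until the requested
# number of clusters remains.
from math import dist


def agglomerate(points, n_clusters):
    """points: list of equal-length numeric tuples. Returns a list of
    clusters, each a list of the original points."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: distance between closest members
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # merge the closest pair
    return clusters
```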
4. CONCLUSIONS
In this deliverable, the prototype toolbox of the ENTROPY Analytics and Reporting framework was presented in the form of a technical manual. The algorithms were tested on real ENTROPY energy data (streams) and behavioural interactions (mobile app data).
BIBLIOGRAPHY – REFERENCES
1. González-Vidal, A.; Moreno-Cano, V.; Terroso-Sáenz, F.; Skarmeta, A.F. Towards Energy Efficiency
Smart Buildings Models Based on Intelligent Data Analytics. Proced. Comput. Sci. 2016, 83, 994–999.
2. American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE). Chapter 14,
Climatic Design Information. In 2017 ASHRAE Handbook—Fundamentals; ASHRAE Handbook Series;
American Society of Heating, Refrigerating and Air-Conditioning Engineers: Atlanta, GA, USA, 2017;
ISBN 978-1939200587.
3. R Project for Statistical Computing. Available online: https://www.r-project.org/.
4. Apache Spark. Available online: http://spark.apache.org/.
5. Apache Spark OpenCPU Executor (ROSE). Available online: https://github.com/onetapbeyond/opencpu-spark-executor.
6. Fotopoulou, E.; Zafeiropoulos, A.; Papaspyros, D.; Hasapis, P.; Tsiolis, G.; Bouras, T.; Mouzakitis, S.;
Zanetti, N. Linked Data Analytics in Interdisciplinary Studies: The Health Impact of Air Pollution in Urban
Areas. IEEE Access 2016, 4, 149–164.
7. Fotopoulou, E.; Zafeiropoulos, A.; Terroso-Sáenz, F.; Şimşek, U.; González-Vidal, A.; Tsiolis, G.; Gouvas,
P.; Liapis, P.; Fensel, A.; Skarmeta, A. Providing Personalized Energy Management and Awareness
Services for Energy Efficiency in Smart Buildings. Sensors 2017, 17, 2054.
8. Hill, David J, and Barbara S Minsker. 2010. “Anomaly Detection in Streaming Environmental Sensor Data:
A Data-Driven Modeling Approach.” Environmental Modelling & Software 25 (9). Elsevier: 1014–22.
9. Chen, Chung, and Lon-Mu Liu. 1993. "Joint Estimation of Model Parameters and Outlier Effects in Time Series." Journal of the American Statistical Association 88 (421). Taylor & Francis Group: 284–97.
10. López-de-Lacalle, J. (2016). tsoutliers: R Package for Detection of Outliers in Time Series.
11. Box, George EP, Gwilym M Jenkins, Gregory C Reinsel, and Greta M Ljung. 2015. Time Series Analysis:
Forecasting and Control. John Wiley & Sons.