ENTROPY Consortium

Title: D4.1. Deployment of Energy Data Analytics Reporting and Visualization Framework

Document Version: 1.0

Project Number: 649849

Project Acronym: ENTROPY

Project Title: Design of an Innovative Energy-Aware IT Ecosystem for Motivating Behavioural Changes Towards the Adoption of Energy Efficient Lifestyles

Contractual Delivery Date: 31/10/2017

Actual Delivery Date: 31/10/2017

Deliverable Type* - Security**: P - PU

* Type: P - Prototype, R - Report, D - Demonstrator, O - Other

** Security Class: PU - Public, PP - Restricted to other programme participants (including the Commission), RE - Restricted to a group defined by the consortium (including the Commission), CO - Confidential, only for members of the consortium (including the Commission)

Responsible and Editor/Author: Vassilis Nikolopoulos

Organization: Intelen

Contributing WP: WP4

Authors (organizations): Vassilis Nikolopoulos (INT), Anastasios Zafeiropoulos (UBI), John Papagiannis (INT), Eleni Fotopoulou (UBI), Antonio Skarmeta (UMU), Aurora Gonzalez Vidal (UMU)

Abstract: This document describes the prototype toolbox of ENTROPY algorithmics and the technical framework

Keywords: Prototype, analytics, algorithms

649849 ENTROPY D4.1. Deployment of Energy Data Analytics Reporting & Visualization Framework

31/10/2017 – Final 1.0 Page 2 of 20

Revision History

The following table describes the main changes made to the document since its creation.

Revision | Date | Description | Author (Organization)

v0.1 | 30/10/2017 | Initial draft document by INTELEN | Vassilis Nikolopoulos (INT), John Papagiannis (INT)

v0.2 | 30/10/2017 | Technical architecture and algorithmics | Anastasios Zafeiropoulos (UBI), Eleni Fotopoulou (UBI), Aurora Gonzalez Vidal (UMU), Vassilis Nikolopoulos (UMU)

v0.3 | 31/10/2017 | Document Review | Antonio Skarmeta (UMU), Aurora Gonzalez Vidal (UMU), Pedro J. Fernández Ruiz (UMU), Aristotelis Agianniotis (HESSO)

v1.0 | 31/10/2017 | Document Final Version | Anastasios Zafeiropoulos (UBI), Eleni Fotopoulou (UBI), Aurora Gonzalez Vidal (UMU), Vassilis Nikolopoulos (UMU)

Executive Summary

The document provides a basic manual/guide for the ENTROPY Analytics toolbox. The

technical analysis and outputs are based on WP2 results and the theoretical algorithms that were

developed. The basic UI of the ENTROPY algorithmic dashboard/algorithmic tool is presented,

including the backend architecture and the description of the various algorithms. The analysis

and the analytics toolbox will be used on real ENTROPY data sets and behavioral digital

interactions, generated by the mobile apps.

Disclaimer

This project has received funding from the European Union’s Horizon 2020 research and

innovation programme under grant agreement No 649849, but this document only reflects the

consortium’s view. The European Commission is not responsible for any use that may be made

of the information it contains.

Table of Contents

1. Introduction

2. Data Mining and Analysis Services

2.1 Algorithms and Architecture

3. Integrated Algorithms in ENTROPY Platform

3.1 EntArima

3.2 Cooling & Heating Degree Days

3.3 GBDT4consumption

3.4 StreamQ

3.5 Energy Comparison between modular Periods

3.6 EntPass Forecasting

3.7 HeatMap with End User Behavioral Characteristics

3.8 HeatMaps on Energy and Digital Behavioral Interactions (Apps)

3.9 Behavioral Indications calculations

3.10 Clustering HeatMap algorithm

4. Conclusions

Bibliography - References

1. INTRODUCTION

This document presents the overall architecture and the functionalities of the ENTROPY Analytics and Reporting dashboard (prototype). The framework is based on the OpenCPU system, which is used for embedding algorithms and functions written in R or Python. The flexibility that OpenCPU offers is of great importance, since ENTROPY can connect to any external algorithmic package or function and execute it on real ENTROPY data sets (energy or behavioral interactions).

[Figure 1 diagram. Labels: data sets (questionnaires, surveys, digital interactions, behavioral KPIs, energy data from ENTROPY, building descriptive data, various 3rd party data, e.g. weather); user inputs; algorithms (analytics in R/Python: clustering, classification, correlation, regression, PCA, predictions); ENTROPY Admin Panel (admin tuning/selection); execution; visualization and results; optimal algorithmic selection based on ENTROPY value propositions and engagement objectives (Measurement and Verification - M&V, recommender tuning, ENTROPY statistics/trends, EU engagement frameworks).]

Figure 1. ENTROPY Analytics overall Functional Procedure including data inputs

Various algorithms are presented, with details on their input and output fields and some reporting snapshots obtained from the ENTROPY toolbox. Details are also given on the behavioral KPIs and on how they are calculated using ENTROPY’s data and mobile apps.

On the ENTROPY platform, algorithms can be executed remotely (following the OpenCPU approach) through the Administration panel. The general algorithmic flow, common to all algorithms, is presented in Figure 1.

The ENTROPY analytics engine takes into account inputs from individual questionnaires, from IoT sensor streams (pilot sites), from behavioural interactions (mobile apps) and from other external data sources, and uses algorithmic functions to calculate results and produce reports. Because algorithms are executed remotely, the ENTROPY platform and its toolbox can be offered as a “SaaS (Software as a Service) toolbox”, enabling various business models on the platform (Analytics as a Service).

2. DATA MINING AND ANALYSIS SERVICES

2.1 Algorithms and Architecture

Within the ENTROPY platform, a data mining and analysis service supports a set of big data mining and analysis techniques for extracting energy and behavioral analytics [7]. Insights into energy usage in smart buildings, as well as into the behavioral characteristics of the occupants, may lead on the one hand to an increase in the occupants’ energy awareness and on the other hand to targeted recommendations for reducing energy consumption.

The supported analytics processes cover descriptive, predictive, classification, clustering, and prescriptive analytics [1]. Descriptive analytics provide summary information on energy usage, as well as on other environmental or behavioral attributes. Predictive analytics provide estimates of energy usage for the upcoming period and examine the relationship between energy consumption and a set of parameters, such as average temperature, heating or cooling degree days, day of the week, etc.

The workflow followed for the support of data mining and analysis techniques is depicted in Figure 2. An analysis process is based on the selection of an analysis template and of the queries to be executed to provide the input datasets (training and/or evaluation datasets). Each analysis template represents a specific algorithm and gives the user the flexibility to adjust the relevant configuration parameters. These include input parameters for the algorithm, along with their description and default value, as well as output parameters along with their type (text, image, data, html). An indicative analysis template for the calculation of heating or cooling degree days per day [2] over a monthly period is depicted in Figure 3. A set of analysis templates can be made available, as in Figure 3, and be used for initiating an analysis. An analysis process is also associated with a set of execution parameters that denote whether the analysis should be run manually or automatically, as well as the periodicity in the latter case.
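The structure of an analysis template described above can be sketched as a small data model. This is only an illustration under stated assumptions: the class and field names (`AnalysisTemplate`, `InputParameter`, `OutputParameter`, `period_hours`) are hypothetical and are not taken from the ENTROPY code base; only the four output types and the manual/automated distinction come from the text.

```python
from dataclasses import dataclass, field
from typing import Optional

OUTPUT_TYPES = {"text", "image", "data", "html"}  # output types named in the text


@dataclass
class InputParameter:
    name: str
    description: str
    default: object


@dataclass
class OutputParameter:
    name: str
    type: str

    def __post_init__(self):
        if self.type not in OUTPUT_TYPES:
            raise ValueError(f"unsupported output type: {self.type}")


@dataclass
class AnalysisTemplate:
    algorithm: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)
    automated: bool = False              # manual execution unless set
    period_hours: Optional[int] = None   # periodicity, used only for automated runs

    def __post_init__(self):
        if self.automated and self.period_hours is None:
            raise ValueError("automated analyses need a periodicity")


# Hypothetical template for the degree-days analysis, re-run once per day.
degree_days = AnalysisTemplate(
    algorithm="degree_days",
    inputs=[InputParameter("base_temp", "Base temperature in C", 24)],
    outputs=[OutputParameter("daily_values", "data"), OutputParameter("chart", "image")],
    automated=True,
    period_hours=24,
)
```

Validating the output type and the periodicity at construction time keeps malformed templates from reaching the execution step.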

The design of queries for obtaining the input datasets is based on a query builder developed over MongoDB, which lets end users easily prepare their input datasets. Two categories of queries are supported: queries for fetching data collected from sensor data streams (e.g., energy consumption, humidity, and indoor temperature data per hour for a specific room) and queries for fetching data related to the users participating in the energy efficiency campaign (e.g., the set of users with an educational level corresponding to a Master’s degree). Upon execution of the queries, streams of the input training or evaluation datasets are provided to the analysis toolkits.
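As a rough illustration of the two query categories, the sketch below builds MongoDB-style filter documents as plain Python dictionaries. The field names (`room_id`, `attribute`, `timestamp`, `education_level`) and the example values are assumptions made for this sketch; the actual ENTROPY collections and schema are not described in this document. Only the `$in`, `$gte` and `$lt` operators are standard MongoDB query operators.

```python
from datetime import datetime


def sensor_query(room_id, attributes, start, end):
    """Filter for sensor stream observations of one room in [start, end)."""
    return {
        "room_id": room_id,                         # hypothetical field name
        "attribute": {"$in": list(attributes)},     # hypothetical field name
        "timestamp": {"$gte": start, "$lt": end},   # hypothetical field name
    }


def user_query(education_level):
    """Filter for campaign participants with a given educational level."""
    return {"education_level": education_level}     # hypothetical field name


october = sensor_query(
    "room-101",
    ["energy_consumption", "humidity", "indoor_temperature"],
    datetime(2017, 10, 1),
    datetime(2017, 11, 1),
)
masters = user_query("master")
```

Filters of this shape could be passed to a collection's `find` call by a MongoDB driver; the sketch stops at building them so that it runs without a database.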

In ENTROPY, the R Project for Statistical Computing [3] and Apache Spark, a fast and general engine for large-scale data processing [4], are used as analysis toolkits. Depending on the analysis needs in terms of big data management and performance, the most suitable tool can be selected per case. The ENTROPY components are interconnected with the analysis toolkits through the OpenCPU system for embedded scientific computing, which provides a reliable and interoperable HTTP API for data analysis based on R. When large-scale data processing calls for a big data analysis framework, the Apache Spark engine is used: the analysis process runs on a set of worker nodes, each of which hosts an Apache Spark OpenCPU Executor [5]. The worker nodes form a cluster orchestrated by a cluster manager.
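To make the remote execution step more concrete, the sketch below prepares (but does not send) an HTTP request against an OpenCPU server. It relies on the OpenCPU convention of calling an R function by POSTing its arguments to `/ocpu/library/{package}/R/{function}/json`. The server address, the package name `entarima` and the function name `forecast24h` are placeholders invented for this example, not actual ENTROPY identifiers.

```python
import json
from urllib.request import Request


def build_opencpu_call(base_url, package, function, args):
    """Prepare a POST request that calls an R function via the OpenCPU HTTP API."""
    url = f"{base_url.rstrip('/')}/ocpu/library/{package}/R/{function}/json"
    body = json.dumps(args).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder server, package and function names (assumptions for illustration).
request = build_opencpu_call(
    "http://analytics.example.org/",
    "entarima",
    "forecast24h",
    {"values": [1.2, 1.4, 1.1]},
)
```

Passing the prepared request to `urllib.request.urlopen` would execute the call against a running server; the sketch leaves that step out so that it stays runnable offline.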

Once an analysis has been carried out, the produced results (output dataset) are made available through a set of URLs, as defined in the output parameters of the analysis template. Analysis results are also semantically mapped to the ENTROPY semantic models, based on the adoption of the LDAO ontology [6].

At the current phase, an initial set of algorithms is considered; however, the overall implementation facilitates the incremental addition of further analysis mechanisms.

Figure 2. Data mining and analysis workflow

Figure 3. Indicative algorithm analysis template

3. INTEGRATED ALGORITHMS IN ENTROPY PLATFORM

3.1 EntArima

ARIMA models are a popular and flexible class of forecasting models that use historical information to make predictions. This type of model is a basic forecasting technique that can be used as a foundation for more complex models. Within the ENTROPY project, the package focuses on examining time series for any sensor-enabled attribute (e.g. temperature, CO2, energy consumption), fitting an ARIMA model, and creating a basic forecast. The package is a general-purpose analytic process that needs no extra configuration: it takes hourly time series sensor data as input and returns predictions for the following 24 hours.

EntArima is used for forecasting a quantity into the future and explaining its historical patterns, depending on the kind of attribute analyzed. Insights obtained from the algorithm include the seasonal patterns in energy consumption, the expected room occupancy, the configured temperature of each HVAC unit in each building area, and an estimate of the effect of a new campaign on energy consumption. The results are provided as a plot so that they can be easily interpreted by the platform administrator and campaign managers.

3.2 Cooling & Heating Degree Days

This data mining and analysis process regards the calculation of heating degree days (hdd) and cooling degree days (cdd) per area or subarea of the registered buildings. The following steps are followed:

● For each area, we calculate the average temperature per hour for indoor and outdoor conditions, denoted as avg_temp_in and avg_temp_out.

● The base temperature is denoted as base_temp. For winter, when the comfort level is considered to be 20-24 C, base_temp = 24; for summer, when the comfort level is considered to be 23-26 C, base_temp = 23.

● The period for measuring the average values is 1 hour.

We consider two periods for the calculation of the heating and cooling degree days: the cold period, from 16/10 to 15/04, with base_temp = 24 C, and the warm period, from 16/04 to 15/10, with base_temp = 23 C. The following process is applied.

For the cold period (averages per hour):

● If avg_temp_out < base_temp AND avg_temp_in > base_temp then hdd = (base_temp - avg_temp_out) * 1/24. In this case there is energy waste that is related to temp_diff = avg_temp_in - base_temp, or temp_diff (%) = 100 * (avg_temp_in - base_temp) / (avg_temp_in - avg_temp_out). Based on the calculated temp_diff, the associated indicative energy waste can be estimated, using an indicator for the rate of heat transfer per square meter. This rate depends on the energy characteristics of the building; accurate values are to be provided per building category. Indicatively, a common office should have around 1.5 W/(m2K). As an example, for an office room of 10 square meters and temp_diff = 5 C, the energy waste is 10 m2 * 5 K * 1.5 W/(m2K) = 75 W, which over a one-hour period corresponds to 75 Wh.

● If avg_temp_out < base_temp AND avg_temp_in < base_temp then hdd = (base_temp -

avg_temp_out) * 1/24 AND energy_waste=0

● If avg_temp_out > base_temp then hdd = 0

For the warm period (averages per hour):

● If avg_temp_out > base_temp AND avg_temp_in < base_temp then cdd = (avg_temp_out -

base_temp) * 1/24. In this case there is energy waste that is related to temp_diff = base_temp -

avg_temp_in, or temp_diff (%) = 100 * (base_temp - avg_temp_in) / (avg_temp_out - avg_temp_in).

Based on the calculated temp_diff, the associated indicative energy waste can be estimated. Such

energy waste is calculated based on an indicator for the rate of transfer of heat per square meter.

● If avg_temp_out > base_temp AND avg_temp_in > base_temp then cdd = (avg_temp_out - base_temp) * 1/24 AND energy_waste=0

● If avg_temp_out < base_temp then cdd = 0
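The hourly rules for both periods can be summarized in a short sketch. It is an illustration under stated assumptions rather than the ENTROPY implementation: the function name and signature are hypothetical, the energy waste is set to zero whenever the degree-day value is zero, and the boundary cases where a temperature equals base_temp (which the rules above do not cover) are treated as "no degree days" and "no waste" respectively.

```python
BASE_TEMP = {"cold": 24.0, "warm": 23.0}  # degrees C, as stated in the text


def hourly_degree_days(avg_temp_in, avg_temp_out, period, area_m2, u_value=1.5):
    """Return (degree_days, energy_waste_wh) for one hourly average.

    period is "cold" (heating degree days) or "warm" (cooling degree days).
    u_value is the indicative heat transfer rate in W/(m2K); 1.5 is the
    office example given in the text.
    """
    base = BASE_TEMP[period]
    if period == "cold":
        outdoor_gap = base - avg_temp_out    # heating is needed when positive
        indoor_excess = avg_temp_in - base   # room is overheated when positive
    else:
        outdoor_gap = avg_temp_out - base    # cooling is needed when positive
        indoor_excess = base - avg_temp_in   # room is overcooled when positive
    if outdoor_gap <= 0:
        return 0.0, 0.0                      # assumption: no waste counted here
    degree_days = outdoor_gap / 24           # one hour is 1/24 of a day
    temp_diff = max(indoor_excess, 0.0)
    energy_waste_wh = area_m2 * temp_diff * u_value  # watts sustained for 1 hour
    return degree_days, energy_waste_wh


# Office example from the text: 10 m2 room, indoor temperature 5 C above base.
hdd, waste_wh = hourly_degree_days(29.0, 10.0, "cold", area_m2=10.0)
```

Summing the hourly degree-day values over 24 hours gives the daily figure used by the indicators that follow.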

In all cases, we calculate the hdd/energy_consumption and cdd/energy_consumption indicators (days/kWh) on a daily or weekly basis. Such indicators are well suited to comparisons among similar buildings (in terms of size, number of floors, etc.). To examine energy consumption and energy waste, a linear regression model is applied to examine the correlation between degree days and kWh. Once the formula of the regression line is available, it can be used to calculate the baseline (expected) energy consumption from the degree days. We can then compare this baseline with the actual energy consumption for the period, to determine whether more energy was used than expected.
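The baseline comparison can be sketched with an ordinary least-squares line fitted on (degree days, kWh) pairs. The numbers below are made up for illustration and the helper names are hypothetical; the sketch only shows the arithmetic of fitting the line and comparing actual consumption against the expected value.

```python
def fit_line(degree_days, kwh):
    """Ordinary least-squares fit of kwh = intercept + slope * degree_days."""
    n = len(degree_days)
    mean_x = sum(degree_days) / n
    mean_y = sum(kwh) / n
    sxx = sum((x - mean_x) ** 2 for x in degree_days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(degree_days, kwh))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def excess_energy(slope, intercept, degree_days, actual_kwh):
    """Actual minus expected consumption; positive means more than expected."""
    expected_kwh = intercept + slope * degree_days
    return actual_kwh - expected_kwh


# Made-up weekly history: degree days and the kWh consumed in the same weeks.
history_dd = [1.0, 2.0, 3.0, 4.0]
history_kwh = [12.0, 14.0, 16.0, 18.0]
slope, intercept = fit_line(history_dd, history_kwh)

# A new week with 5 degree days and 23 kWh of actual consumption.
excess = excess_energy(slope, intercept, 5.0, 23.0)
```

With this history, the intercept plays the role of the weather-independent load and the slope that of the kWh needed per degree day.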

3.3 GBDT4consumption

GBDT4consumption is a customized implementation of the gradient boosting algorithm, used to gain insights into how energy consumption is related to other available environmental variables. These variables depend on each pilot installation and can be freely selected. Gradient boosting is a machine learning technique for regression and classification problems that produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees.

GBDT4consumption is adaptable, easy to interpret, and produces highly accurate models. However, like most implementations today, it is computationally expensive and requires all training data to be in main memory. As training data grows ever larger, the ENTROPY project makes use of a distributed implementation of the GBDT algorithm that parallelizes the training of the decision trees.
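The idea of an ensemble of weak learners can be illustrated with a toy gradient boosting regressor for squared error, in which each weak learner is a one-split decision stump fitted to the current residuals. This is a teaching sketch only: it uses a single feature, runs on one machine, and is not the GBDT4consumption implementation or its distributed variant. The data values are invented, and the input must contain at least two distinct feature values.

```python
def fit_stump(xs, residuals):
    """Best single-split regression stump (least squares) on one feature."""
    best = None
    for threshold in sorted(set(xs))[:-1]:   # keeps both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.3):
    """Fit stumps to successive residuals; returns (base, learning_rate, stumps)."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        preds = [p + learning_rate * (left_mean if x <= threshold else right_mean)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + sum(learning_rate * (left if x <= threshold else right)
                      for threshold, left, right in stumps)


# Invented data: outdoor temperature (C) against hourly consumption (kWh).
outdoor_temp = [0, 5, 10, 15, 20, 25, 30]
consumption = [50.0, 42.0, 35.0, 30.0, 33.0, 41.0, 52.0]
model = gradient_boost(outdoor_temp, consumption)
```

Each round corrects part of the error left by the previous rounds, which is why many shallow trees together can capture the U-shaped relation in this example.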

3.4 StreamQ

Data quality assessment is a crucial process in engineering systems where gathered sensor data are transmitted to a centralized repository [8]. Determining the quality of the sensor data is important because it has an impact on the selection of the model, the estimation of parameters and, consequently, on forecasts. To do so, we follow the approach described in [9] for the automatic detection of outliers in time series. Innovational outliers (IO), additive outliers (AO), level shifts (LS), temporary changes (TC) and seasonal level shifts (SLS) are considered. The algorithms are implemented in the outliers R package [10].

The procedure may be applied to general seasonal and non-seasonal ARMA processes. The different types of outliers are counted, and the proportion of outliers with respect to the length of the stream is then calculated as our quality measure.

The data quality indicator on streams is shown in Figure 4.

PercentageOutliers = num(outliers) / length(stream)

Q = 1- PercentageOutliers
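The two formulas above amount to a few lines of code once the outliers have been detected. The sketch below takes the per-type outlier counts as given (the detection step itself is out of scope here); the function name and the example counts are hypothetical.

```python
def stream_quality(outlier_counts, stream_length):
    """Q = 1 - num(outliers) / length(stream), for counts keyed by outlier type."""
    if stream_length <= 0:
        raise ValueError("stream must contain at least one observation")
    percentage_outliers = sum(outlier_counts.values()) / stream_length
    return 1 - percentage_outliers


# Example: a stream of 200 hourly observations with 5 detected outliers.
quality = stream_quality({"AO": 3, "LS": 1, "TC": 1}, stream_length=200)
```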

A brief description of each type of outlier is given here:

Additive outliers (AO): An additive outlier affects the level of the observed time series only at the point where it occurs.

Innovational outliers (IO): An innovational outlier is characterized by an extraordinary impact whose effect persists over succeeding observations. The influence of the outlier may increase as time proceeds. If such outliers occur at the end of the series, several observations are needed before they can be labelled as IO. [4]

Level shifts (LS): In a level shift outlier, all observations appearing after the outlier move to a new level. In contrast to additive outliers, a level shift outlier affects many observations and has a permanent effect.

Temporary change (TC): Temporary change outliers are similar to level shift outliers, but the effect of the outlier diminishes exponentially over the subsequent observations. Eventually, the series returns to its normal level.

Seasonal level shifts (SLS): A seasonal additive outlier or seasonal level shift appears as a surprisingly large or small value occurring repeatedly at regular intervals.

The result of the algorithm is a data quality rate between 0 and 1 that directly feeds the stream quality tags within the ENTROPY platform.

Figure 4. Data Streams Quality Indicator

3.5 Energy Comparison between modular Periods

This data mining and analysis process supports a set of comparisons of energy consumption per main area or subarea, taking into account the area characteristics. Energy consumption and power are compared at an hourly, daily, weekly or monthly level. Comparisons may concern the average/min/max values obtained, and comparisons that take into account the overall surface of the area, the number of occupants, etc. are also supported. The comparison periods can be defined flexibly through the design and execution of appropriate queries in the implemented Query Builder.

3.6 EntPass Forecasting

Instead of performing typical forecasting, EntPastForecasting aims to measure the impact of a campaign intervention on the energy consumption of ENTROPY registered building areas. EntPastForecasting takes as input all the historical sensor stream data related to energy consumption in a given place, and splits this pool of data into a training dataset and a testing dataset.

The start date of a campaign is taken as the point at which the data are split into training and testing data. A forecasting model is built from the training dataset. The values predicted by the forecasting model are then compared with the real past values. The difference between the predicted values and the real ones is interpreted as the demonstrated impact of the campaign intervention. Given the high computational needs of generating the forecasting model, we make use of the random forest model available in the Apache Spark computational framework.
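The split-and-compare logic can be sketched as follows. To keep the example self-contained, a simple hour-of-day average stands in for the forecasting model; this is a deliberate simplification and not the Spark random forest mentioned in the text. The function names and the synthetic readings are hypothetical, and the stand-in forecaster assumes that the training data cover every hour of the day.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hourly_profile_forecast(train, timestamps):
    """Stand-in forecaster: mean training consumption for the same hour of day."""
    by_hour = defaultdict(list)
    for ts, kwh in train:
        by_hour[ts.hour].append(kwh)
    return [sum(by_hour[ts.hour]) / len(by_hour[ts.hour]) for ts in timestamps]


def campaign_impact(readings, campaign_start, forecast=hourly_profile_forecast):
    """Total predicted-minus-actual consumption after the campaign start."""
    train = [(ts, kwh) for ts, kwh in readings if ts < campaign_start]
    test = [(ts, kwh) for ts, kwh in readings if ts >= campaign_start]
    predicted = forecast(train, [ts for ts, _ in test])
    return sum(p - actual for p, (_, actual) in zip(predicted, test))


# Synthetic data: 10 kWh per hour for two days, then 8 kWh per hour for one day.
first_reading = datetime(2017, 10, 1)
campaign_start = first_reading + timedelta(days=2)
readings = [
    (first_reading + timedelta(hours=h), 10.0 if h < 48 else 8.0)
    for h in range(72)
]
saved_kwh = campaign_impact(readings, campaign_start)
```

A positive result means that less energy was used than the pre-campaign model predicted; any forecaster with the same two-argument interface can be passed in place of the stand-in.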

3.7 HeatMap with End User Behavioral Characteristics

A heat map (or heatmap) is a graphical representation of data in which the individual values contained in a matrix are represented as colors. This data mining and analysis process creates heatmaps in the dashboard of the campaign manager, summarizing the behavioral characteristics of end users per campaign.

A separate heatmap is provided for each outcome table produced by processing the campaign initiation questionnaires and the campaign evaluation questionnaires. The questionnaires are processed by a script that performs a profiling analysis per end user. Comparing the initial and final profiling analyses can lead to meaningful insights into the behavioral change achieved per end user.

3.8 HeatMaps on Energy and Digital Behavioral Interactions (Apps)

In the ENTROPY project, different data are available from the pilot sites. These data include demographics, building data (e.g. building type, building size), psychographics, room sensor data and building sensor data (e.g. temperature, humidity).

ENTROPY heatmaps (shown in Figure 5) will be used to visualize energy consumption patterns. This type of visualization makes it easier to spot anomalies in the data and to detect specific patterns based on the time of day, the day of the week or even the part of the year.

HeatMap reports will be used to highlight differences in energy consumption among different floors or buildings. Furthermore, typical energy consumption diagrams and reports will be generated based on the average consumption for each weekday or month. These results are most useful when considered together with the average temperature, in order to detect peak hours, which can then be analysed and explored further.
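A typical-consumption heatmap of this kind is backed by a weekday-by-hour matrix of average values. The sketch below shows one way to build such a matrix from timestamped readings; the function name and the sample readings are assumptions for illustration, and rendering the matrix as colors is left to the plotting layer.

```python
from collections import defaultdict
from datetime import datetime


def weekday_hour_matrix(readings):
    """Average consumption per (weekday, hour); rows are Monday..Sunday.

    Cells with no readings are None so that the plot can leave them blank.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, kwh in readings:
        key = (ts.weekday(), ts.hour)
        sums[key] += kwh
        counts[key] += 1
    return [
        [sums[(day, hour)] / counts[(day, hour)] if counts[(day, hour)] else None
         for hour in range(24)]
        for day in range(7)
    ]


# Two Monday readings in the 09:00 hour, one Tuesday reading in the 14:00 hour.
matrix = weekday_hour_matrix([
    (datetime(2017, 10, 30, 9, 0), 4.0),
    (datetime(2017, 11, 6, 9, 30), 6.0),
    (datetime(2017, 10, 31, 14, 0), 7.5),
])
```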

Figure 5. HeatMap with Typical consumption report on energy data from entropy streams

3.9 Behavioral Indications calculations

This data mining and analysis process regards the calculation of a set of indicators related to the profile of end users with regard to energy efficiency. Such indicators include Engagement, Knowledge and Effectiveness. The indicators are calculated at periodic points in time, taking into account the interaction of end users with the mobile app and the serious game. Based on crowd-sensing feedback received through the mobile apps, app-specific as well as generic behavioral metrics are produced. The evolution of these metrics is also monitored.

Even a single KPI could serve our needs for measuring users’ interaction with the platform itself. By measuring three basic metrics of users’ interaction with the mobile app, “Engagement” (KPI), “Knowledge” (KPI) and “Effectiveness” (KPI), important conclusions can be drawn that will afterwards drive recommender decision making. To achieve this, the digital mobile app interactions listed below should be considered:

Registration - the user registers as a user in the web/mobile app.

Login - the user logs in to the web/mobile app.

Content read/selected - the user completes a search within the web/mobile app.

Quiz/Questionnaire taken - the user completes a survey or quiz/questionnaire.

Level/Grade Achieved - the user completes a level/grade in the web/mobile app.

Content View - the user views particular content.

Comment - the user leaves a comment on a particular action/content/feature or registers a fault.

These digital interactions will be stored in the main database, using the fields listed in Table 1 below.

Update calculation period: engagement metrics are calculated/re-evaluated every day, and some aggregated KPIs are computed every week.

ID | Field | Type | Description
1 | Player ID | Int | The player’s id (positive int)
2 | Tips read | Int | Positive int or Boolean
3 | Total tips sent | Int | Positive int
4 | Time spent on reading a tip | Time | Timestamp difference (from tip sent to on-click tip to read)
5 | No of questions sent (quiz) | Int/Time | Positive int with timestamp
6 | Questions read | Int | Positive int
7 | Time spent on questions | Time | Timestamp difference (from quiz open to quiz close)
8 | No of questions answered correctly | Int | Correct answered on quiz
9 | Total questions in quiz | Int | Positive int
16 | Faults registered/Player ID | Int | Positive int
17 | Total faults from all users | Int | Positive int
18 | Full Player’s ID click journey | Int | Full click journey (store all clicks on specific mobile app features with timestamps)
19 | Push notifications read | Int/Time | When reading a push notification, with timestamp
20 | Push notifications sent | Int |

Table 1: Behavioral metrics/KPIs
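A minimal sketch of how one row of Table 1 might be represented, and how the "timestamp difference" fields could be derived, is given below. The class name `PlayerMetrics`, its attribute names and the helper `time_spent` are hypothetical. Treating zero as a valid counter value is an assumption, since Table 1 only states "positive int".

```python
from dataclasses import dataclass, fields
from datetime import datetime, timedelta


@dataclass
class PlayerMetrics:
    """Counters from Table 1 for a single player (attribute names are hypothetical)."""
    player_id: int
    tips_read: int = 0
    total_tips_sent: int = 0
    questions_sent: int = 0
    questions_read: int = 0
    questions_correct: int = 0
    total_questions_in_quiz: int = 0
    faults_registered: int = 0
    push_notifications_read: int = 0
    push_notifications_sent: int = 0

    def __post_init__(self):
        # Table 1 asks for a positive player ID; counters may be zero (assumption).
        if self.player_id <= 0:
            raise ValueError("player_id must be a positive integer")
        for f in fields(self):
            if getattr(self, f.name) < 0:
                raise ValueError(f"{f.name} must not be negative")


def time_spent(start: datetime, end: datetime) -> timedelta:
    """Timestamp difference, e.g. from 'tip sent' to 'tip opened'."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start


# Made-up values, for illustration only.
row = PlayerMetrics(player_id=7, tips_read=1, total_tips_sent=3)
tip_sent = datetime(2017, 10, 2, 9, 0, 0)
tip_opened = datetime(2017, 10, 2, 9, 5, 30)
elapsed = time_spent(tip_sent, tip_opened)
print(elapsed.total_seconds())  # 330.0
```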

The quantification of the occupants' participation in the educational challenges of the ENTROPY-designed personalised app constitutes a separate behavioural metric and is one of the most critical ones. Participation tracking is intended to record the actions occupants perform and to confirm their active involvement in the challenges. Participation is first calculated on an individual level and is then aggregated to provide a robust summary of the overall performance of occupants.


Based on the interactions listed above, the three main behavioural KPIs will be calculated using specific mathematical equations. The calculation will take place at the ENTROPY back-end and the ENTROPY admin panel.

Based on the interaction metrics and the behavioural KPI calculation, we will have a clear and solid view and analysis of every user's behavioural profile inside the mobile app. Each of the basic objectives (knowledge, engagement and effectiveness) will be continuously recalculated in order to feed the recommender. The recommender will also further tune and personalize the content per user group, per individual profile, per user selection, and per user persona. This personalization will initially be tuned by the questionnaire input and general demographics.

The three behavioural KPIs will be calculated as shown in Table 2 below:

Behavioral Metric | Formula | Description
Engagement | Y = 0.4 (logins per user of last 30 days / top player’s logins) + 0.4 (content interactions per user of last 30 days / top player’s content interactions) + 0.2 (faults registered / top player’s faults) | Measures the interaction of the player with the app and the content
Knowledge | Y = 0.2 (tips read / sent) + 0.3 (correct questions / read) + 0.3 (commitments fulfilled / engaged) + 0.2 (faults per player resolved / total resolved) | Measures the knowledge level of the user, acquired from their interaction with the content
Effectiveness | Y = 0.4 (average timestamp or read / content item) + 0.1 (tips read / sent) + 0.1 (questions read / sent) + 0.1 (correct answers / read) + 0.1 (commitments fulfilled / engaged) + 0.2 (faults registered) | Measures the effectiveness and speed of user interaction with the content and app

Table 2: Calculation formulas of behavioral Metrics (weighted)
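The weighted sums for Engagement and Knowledge in Table 2 can be sketched as follows. This is an illustration under stated assumptions, not the back-end code. The function and parameter names are hypothetical, and returning 0 for a ratio whose denominator is 0 is an assumption the source does not specify. Effectiveness is omitted because Table 2 does not state how its timestamp and fault terms are normalised.

```python
def ratio(numerator, denominator):
    """numerator / denominator, or 0.0 when the denominator is 0 (an assumption)."""
    return numerator / denominator if denominator else 0.0


def engagement(logins, top_logins, interactions, top_interactions,
               faults, top_faults):
    """Engagement from Table 2; player values are relative to the top player."""
    return (0.4 * ratio(logins, top_logins)
            + 0.4 * ratio(interactions, top_interactions)
            + 0.2 * ratio(faults, top_faults))


def knowledge(tips_read, tips_sent, correct, questions_read,
              commitments_fulfilled, commitments_engaged,
              faults_resolved, total_faults_resolved):
    """Knowledge from Table 2 as a weighted sum of four ratios."""
    return (0.2 * ratio(tips_read, tips_sent)
            + 0.3 * ratio(correct, questions_read)
            + 0.3 * ratio(commitments_fulfilled, commitments_engaged)
            + 0.2 * ratio(faults_resolved, total_faults_resolved))


# Made-up inputs, for illustration only.
e = engagement(logins=10, top_logins=20, interactions=30,
               top_interactions=40, faults=1, top_faults=2)
k = knowledge(tips_read=8, tips_sent=10, correct=6, questions_read=8,
              commitments_fulfilled=2, commitments_engaged=4,
              faults_resolved=1, total_faults_resolved=5)
print(round(e, 3), round(k, 3))  # 0.6 0.575
```

Because the weights of each formula sum to 1 and every ratio is at most 1 when the numerator does not exceed the denominator, both scores stay within the range 0 to 1, with the top player reaching an Engagement of 1.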

3.10 Clustering HeatMap algorithm

Clustering or cluster analysis is a form of unsupervised learning, which means that the class

labels of the input data are unknown. The aim of clustering is to detect groups in the data, called

clusters. The input data points should be partitioned into a number of clusters, in such a way that

the points belonging to the same cluster are more similar to each other than to points belonging

to other clusters.

The algorithm will take as input data from various ENTROPY streams (kWh, temperature, etc.). During the analysis process, each data set will be clustered against the other data sets in order to identify similarities and common patterns between them.


Figure 6. Clustering HeatMap Report with color codes

The output clustering report will be visualized using a hierarchical heatmap, as seen in Figure 6. Hierarchical clustering algorithms are useful for observing the hierarchical structure of data sets. The Euclidean distance is used as the distance metric, and several different agglomerative hierarchical methods are available.
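To make the idea concrete, the sketch below implements a small agglomerative clustering with Euclidean distance in plain Python. The source names only the distance metric and the agglomerative family, so the choice of average linkage, the function names and the sample readings are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def average_linkage(a, b, points):
    """Mean Euclidean distance over all pairs drawn from clusters a and b."""
    total = sum(dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerate(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain.

    Returns a list of clusters, each a sorted list of indices into points.
    """
    if not 1 <= n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and len(points)")
    clusters = [[i] for i in range(len(points))]  # start: one cluster per point
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        _, a, b = min(
            (average_linkage(clusters[i], clusters[j], points), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


# Hypothetical readings: (kWh, temperature in degrees Celsius).
readings = [(1.0, 20.0), (1.2, 21.0), (5.0, 30.0), (5.1, 29.5), (1.1, 20.4)]
groups = agglomerate(readings, 2)
print(groups)  # [[0, 1, 4], [2, 3]]
```

Recording the order of the merges, instead of stopping at a fixed number of clusters, would give the tree (dendrogram) from which the rows of a hierarchical heatmap can be ordered. In practice the features would usually be scaled first, since Euclidean distance is sensitive to differing units.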

In the context of the ENTROPY Analytics toolbox, clustering algorithms will be used to identify clusters/groups of occupants and of data sets/mobile interactions. This will allow the ENTROPY mobile app to be personalized according to the occupant groups, and the ENTROPY administrator to perform structural analysis on data sets.

Clustering results will also make it possible to identify special characteristics of the groups, associated with demographic data and their energy consumption.


4. CONCLUSIONS

In this deliverable, the prototype toolbox of the ENTROPY Analytics and Reporting framework was presented in the form of a technical manual. The algorithms were tested on real ENTROPY energy data (streams) and behavioural interactions (mobile app data).

