Description of Available Components for SW Functions

SUCCESS D4.5 v1.0

Page 1 (61)

SUCCESS

D4.5 v1.0

Description of Available Components for SW Functions, Infrastructure and Related Documentation, V2

The research leading to these results has received funding from the European Union’s Horizon 2020 Research and Innovation Programme, under Grant Agreement no 700416.

Project Name SUCCESS

Contractual Delivery Date: July 31st 2017

Actual Delivery Date: July 31st 2017

Contributors: RWTH, EDD, P3E, P3C, LMF, ENG, REC

Workpackage: WP4 – Securing Smart Infrastructure

Security: PU

Nature: R

Version: 1.0

Total number of pages: 61

Abstract: This document describes the functionality of the components of the Pan-European Security Monitoring Centre part of the SUCCESS Security Monitoring Solution and the interfaces in the SUCCESS Security Monitoring Solution. In addition, it defines a set of security countermeasures to threats and we describe the way that the countermeasures are executed by means of the components in the SUCCESS Security Monitoring Solution. Hence, this deliverable maps the SUCCESS countermeasures to functionality of the components and the interfaces between the components.

Keyword list: Security, communication, Utility, Architecture, Threat, Countermeasure, Security Monitoring Solution

Disclaimer: All information provided reflects the status of the SUCCESS project at the time of writing and may be subject to change.

Ref. Ares(2017)3837433 - 31/07/2017

SUCCESS D4.5 v1.0

Page 2 (61)

Executive Summary

This document describes the components in the SUCCESS Security Monitoring Solution and describes how these components are used to implement countermeasures to cyber-attacks and physical attacks. SUCCESS is developing a Pan-European Security Monitoring Centre (ESMC), which develops a a comprehensive view of the security status of critical infrastructures, and in the context of SUCCESS, particularly the electrical grid and its communication network. ESMC is a distributed system, consisting of a central part, namely the Security Monitoring and Information System (E-SMIS), and a distributed part, consisting of individual instances distributed across Europe, namely the DE-SMIS instances. The DE-SMIS instances gather information in the local areas where they are located, both from the DSOs themselves and from other sources. E-SMIS, similarly, gathers information from the DE-SMIS instances and from other sources ESMC uses the gathered information to detect security incidents by examining the gathered data for patterns related to cyber-attacks or physical attacks and to alert DSOs or TSOs about the security incidents. SUCCESS applies a defense-in-depth approach to detection of security incidents and the application of countermeasures. On the top of the detection pyramid is the ESMC, which detects security incidents but does not itself automatically initiate countermeasures, due to the large possible consequences, which are considered to warrant a conscious decision from the responsible authorities. The SUCCESS Security Monitoring Solution components whose sphere of action is more limited are, however, able to autonomously initiate countermeasures: these components are the DSO Security Monitoring Centre (DSOSMC) and the Breakout Gateway (BR-GW). SUCCESS is also developing the DSO Security Monitoring Centre (DSOSMC), which acts as an agent of the DE-SMIS in the area of an individual DSO, providing DE-SMIS with information. Additionally, DSOSMC can, based on data gathered from the distribution grid, identify security incidents and apply countermeasures autonomously. In the SUCCESS infrastructure, data are provided by the NORM (the Smart Meter Gateway being developed by SUCCESS) to the DSOSMC. DSOSMC provides its local DE-SMIS instance with information on applied countermeasures and grid status. DE-SMIS processes this information by interacting with E-SMIS and by resorting to further data sources. Countermeasures are also implemented through the Breakout Gateway (BR-GW), which is a new 5G mobile communications node being developed in SUCCESS, which allows mobile core network functionality to be implemented on the edge of the network (e.g. in a cloud system located on the radio mast). The Breakout Gateway supports Data Centric Security (checking of packet integrity without decoding the packets) and Generic Bootstrap Architecture (authentication performed locally on the BR-GW). This deliverable outlines a set of countermeasures that can be applied to security incidents resulting from cyber-attacks and physical attacks on the SUCCESS infrastructure, together with a mapping onto the components and functions that implement the countermeasures. Focus is put on the countermeasures related to the use cases to be demonstrated in the SUCCESS field trials. This version of this deliverable documents the current status of the design of the SUCCESS Security Monitoring Solution. It is the second version of this deliverable, updating and deprecating the D4.4. This D4.5 version provides more details of the components of SUCCESS Security Monitoring Solution and their interfaces. Note to authors: this D4.5 is this later version, so we should add more details.

SUCCESS D4.5 v1.0

Page 3 (61)

Authors

Partner Name e-mail

RWTH Aachen University (RWTH) Padraic McKeever [email protected] Gianluca Lipari [email protected] OY L M ERICSSON AB (LMF) Patrik Salmela [email protected] ERICSSON GmbH (EDD) Dhruvin Patel [email protected] Zain Mehdi [email protected] Frank Sell [email protected] Robert Farac [email protected] P3 ENERGY & STORAGE GmbH (P3E) Manuel Allhoff [email protected] Engineering – Ingegneria Informatica SPA (ENG) Antonello Corsi [email protected] Giampaolo Fiorentino [email protected] P3 Communications GmbH (P3C) Panagiotis Paschalidis [email protected] Centrul Roman al Energiei (CRE) Mihai Paun [email protected] Mihai Sanduleac [email protected]

mailto:[email protected]












SUCCESS D4.5 v1.0

Page 4 (61)

Table of Contents

1. Introduction ................................................................................................. 6

1.1 How to Read This Document ......................................................................................... 6

2. List of Security Incidents and Outline of Countermeasures ................... 8

3. Mapping of Threats to Security Incidents and Countermeasures ........ 13

4. SUCCESS Components and Interfaces Description .............................. 16

4.1 SUCCESS Components .............................................................................................. 16 4.2 Communication Components....................................................................................... 18

4.2.1 State-of-the-Art of Mobile Networks (4G/LTE) ..................................................... 18 4.2.2 The SUCCESS Solution based on 5G Mobile Networks ..................................... 19 4.2.3 The Communication Solution for SUCCESS ....................................................... 19 4.2.4 Break-out Gateway (BR-GW) ............................................................................... 20 4.2.5 5G communication Test Setup ............................................................................. 25

4.3 Security Monitoring Components ................................................................................. 26 4.3.1 DSO Security Monitoring Centre .......................................................................... 26 4.3.2 Pan-European Security Monitoring Centre .......................................................... 26 4.3.3 Relationship between DE-SMIS and DSOSMC ................................................... 29

4.4 SUCCESS API ............................................................................................................. 29 4.4.1 Definition and Motivation ...................................................................................... 30 4.4.2 Documentation of the SUCCESS API as a RESTful API ..................................... 30 4.4.3 Discussion on Data Models describing Cyber-Security Incidents ........................ 31

4.5 Interfaces in the SUCCESS Security Monitoring Solution ........................................... 32 4.5.1 I1 between NORM and BR-GW or DSOSMC ...................................................... 33 4.5.2 I2 between BR-GW and DSOSMC ...................................................................... 33 4.5.3 I3 between DSO Security Monitoring Centre and DE-SMIS ................................ 34 4.5.4 I4 between DE-SMIS instances ........................................................................... 36 4.5.5 I5 between DE-SMIS and E-SMIS ....................................................................... 38 4.5.6 I6 between E-SMIS and External Data Sources .................................................. 38 4.5.7 I7 between DE-SMIS and internal data sources .................................................. 38 4.5.8 I8 between DE-SMIS and Edge Cloud ................................................................. 39

5. Conclusion ................................................................................................ 39

6. References ................................................................................................. 40

7. List of Abbreviations ................................................................................ 41

A. Descriptions of Security Incidents and Countermeasures ................... 42

A.1 Incident/countermeasure CS-1 .................................................................................... 42 A.1.1 Description ........................................................................................................... 42 A.1.2 Related SW Functions.......................................................................................... 45 A.1.3 Related Infrastructure ........................................................................................... 45

A.2 Incident-countermeasure CS-2 .................................................................................... 47 A.2.1 Description ........................................................................................................... 47 A.2.2 Related SW Functions.......................................................................................... 48 A.2.3 Related Infrastructure ........................................................................................... 48



A.5 Incident-countermeasure CS-5 .................................................................................... 52 A.5.1 Description ........................................................................................................... 52

SUCCESS D4.5 v1.0

Page 5 (61)

A.5.2 Related SW Functions.......................................................................................... 52 A.5.3 Related Infrastructure ........................................................................................... 53


A.7 Incident-countermeasure PS-1 .................................................................................... 55 A.7.1 Description ........................................................................................................... 55 A.7.2 Related SW Functions.......................................................................................... 55 A.7.3 Related Infrastructure ........................................................................................... 55



A.10 Incident-countermeasure PS-4 .................................................................................... 58 A.10.1 Description ................................................................................................. 58 A.10.2 Related SW Functions ............................................................................... 58 A.10.3 Related Infrastructure ................................................................................ 58

A.11 Incident-countermeasure PS-5 .................................................................................... 59 A.11.1 Description ................................................................................................. 59 A.11.2 Related SW Functions ............................................................................... 59 A.11.3 Related Infrastructure ................................................................................ 59

B. Interfaces ................................................................................................... 60

B.1 Example for a RESTful Service of the SUCCESS API: Attack/Incident found (I3-1) .. 60 B.1.1 Set a cyber-attack or incident [POST] .................................................................. 60

SUCCESS D4.5 v1.0

Page 6 (61)

1. Introduction

The SUCCESS architecture and the SUCCESS Security Monitoring Solution are described in D4.2 [3], which also fills in the background of SUCCESS and motivates the need for the SUCCESS Security Monitoring Solution. D4.2 describes the SUCCESS architecture from the perspectives of the Utilities, communications and security. This conceptual architecture is not specific to the SUCCESS project but may be instantiated in different particular systems. The SUCCESS Security Monitoring Solution is the instantiation of the SUCCESS architecture in the infrastructure developed by the SUCCESS project itself, where the components of the SUCCESS infrastructure work together to detect security threats and incidents to the Electricity Distribution Grid’s management and communication systems and execute countermeasures which mitigate these threats or incidents. Several of the components used in the SUCCESS Security Monitoring Solution are described in this document:

1) the pan-European Security Monitoring Centre (ESMC), which comprises two sub-

components, the decentralised European Security Monitoring and Information System

(DE-SMIS), which locally collect data on DSO/TSO level, and the European Security

Monitoring and Information System (E-SMIS) which collects data from several DE-SMIS

instances across Europe.

2) the DSO Security Monitoring Centre, which detects security incidents and applies

countermeasures on DSO level;

3) the Breakout Gateway (BR-GW) which is located at the edge of the mobile

communications network (in the eNodeB, i.e. radio base station, see Ch. 4.2.1 below for

details explaining the mobile communications netwwork) and implements mobile core

network functionality there, as well as detecting cyber-attacks on the data

communications and implementing countermeasures.

This document also describes the interfaces between all the SUCCESS components. In addition, this document defines a set of security countermeasures to security incidents resulting from identified threats and describes the way that the countermeasures are executed by means of the components in the SUCCESS infrastructure. Hence, this deliverable maps the SUCCESS countermeasures onto the components’ functionality and the interfaces between the components. The SUCCESS field trials will implement use cases to demonstrate countermeasures and verify that the SUCCESS Security Monitoring Solution mitigate cyber-threats to smart grid infrastructures. All the use cases fall into the category of incident and countermeasure CS-1 (Device behaving suspiciously in terms of communication pattern or content), discussed in Annex A.1.

1.1 How to Read This Document

Before reading this document, the reader should read the D4.2 [3], which motivates the SUCCESS project, describes the SUCCESS architecture, and describes the SUCCESS Security Monitoring Solution, which instantiates the SUCCESS Architecture. The analysis of cyber-security threats to the critical infrastructures made in D1.2 [1] should be consulted to understand the purpose of the countermeasures presented in this document. Additionally, further details of the DSOSMC, which is briefly described in this document, are also available in D3.4 [4] and the NORM Smart Meter Gateway is described in D3.7 [5].

This document is an output of Task 4.2 of SUCCESS, which is concerned with architecture

definition. It is the second release of this document.

Note on terminology: SUCCESS uses the following terms:

• A threat is a way to perform an attack (cyber-attack or physical-attack) on the system

(where the system being considered in SUCCESS is the electrical grid and its related

systems, i.e. the grid itself and its electrical components, the associated communications

networks and the systems managing the grid and the communications). In the context of

SUCCESS D4.5 v1.0

Page 7 (61)

SUCCESS, the components in the SUCCESS Security Monitoring Solution can

themselves be the subject of attacks and are also therefore, part of the system which is

under threat.

• An attack is the manifestation of a threat, i.e. the hypothetical threat is made real by an

attacker and becomes a reality.

• A security incident is a suspected attack (cyber- or physical) which has been detected by

the SUCCESS Security Monitoring Solution.

• A countermeasure is the procedure followed to mitigate the security incident.

The structure of this document is as follows:

• The rest of the document comprises a list of security incidents and applicable Countermeasures (Ch.2). Ch. 3 describes the mapping from the threats identified in D1.2 [1] to the identified incidents and their countermeasures.

• Ch.4 describes the environment of the SUCCESS project and the components used, with sub-chapters detailing the communication components, the DSOSMC and the Pan-European Security Monitoring.

• SUCCESS is concentrating on how mobile 5G communications can be applied to Utility communications to provide enhanced security functionality. The communication features used by SUCCESS are described in Ch.4.2.

• The DSO Security Monitoring Centre (DSOSMC) and the Pan-European Security Monitoring Centre (ESMC) are described in Chs. 4.3.1 and 4.3.2.

o The DSOSMC monitors and generates countermeasures for particular local electricity distribution grids.

o ESMC is intended to monitor a large area and to alert DSOs or TSOs in case of security threats ESMC comprises a central instance (E-SMIS) and several decentralised instances (DE-SMIS), which interwork with the local DSOSMCs. In contrast to the DSOSMC, ESMC will not directly initiate countermeasures. Rather, the cyber-attacks detected by ESMC will be notified to the DSO or TSO who is responsible for initiating the countermeasures. The supporting DSO/TSO SCADA system will not be included in the scope of the SUCCESS project.

• The interfaces between the components of the SUCCESS Security Monitoring Solution are outlined in Ch. 4.4.

• Annex A includes a set of sub-chapters describing in detail the security incidents and outlining countermeasures for them and how they will be implemented by the components of the SUCCESS Monitoring Solution. The countermeasures dealt with in this Annex are introduced in Ch. 3.

• Annex B gives detailed definitions of the SUCCESS API interfaces described in Ch. 4. In this version of the deliverable (D4.5), only an example outlining the way the interfaces will be defined is included. The detailed definitions will be included in the next, final version of this deliverable (D4.6), which will be released in M24.

The document covers two different, but related, broad subjects: 1) details of the components and interfaces in the SUCCESS Security Monitoring Solution and 2) details of the threats and countermeasures. The reader interested in getting details about the SUCCESS Security Monitoring Solution, its components and interfaces is recommended to begin by reading Chapter 4. Readers who are interested in the details of the threats and countermeasures are directed to Chapters 2 and 3.

SUCCESS D4.5 v1.0

Page 8 (61)

2. List of Security Incidents and Outline of Countermeasures

SUCCESS deliverable D4.2 [3] discusses security configuration and measures that should be taken to provide a good basic security for the smart grid and which can already eliminate many of the attacks and threats identified in deliverable D1.2 [1] on the communication, the network and the physical devices. However, any such precautionary measures cannot be considered sufficient to prevent attacks. The ability to detect attacks and initiate countermeasures is also needed. The combination of the basic security configuration and measures with active attack detection and mitigation complement each other to provide defence in depth, and making it more difficult for the attacker to succeed. The basic requirement to identify security incidents is monitoring of the network and the nodes in it. Effectively, the power grid can only be monitored through the communications network, so that the network being monitored is a hybrid containing both the power grid and communications network. Hence, the network in question here comprises both the power grid and the communications network it uses, and the nodes comprise equipment in the power grid and equipment in the communications network. Indications of potential security incidents includes devices not responding or not operating according to what is expected, e.g. exceptions to regular communication patterns or message type and content. However, identification of the root cause of the incident might not be trivial as multiple threats could result in similar type of results, such as a node not responding due to a physical attack on the node, a network based attack/hack on the node (e.g. Denial of Service (DoS) attack) or the network itself or a natural disaster or accident that has destroyed the node or rendered it useless. Once a security incident has been detected, it should be mitigated by following a pre-defined procedure, consisting of a number of steps or parts. This is referred to as a countermeasure, which is used for responding to the incident and minimising the effect of the incident. This deliverable outlines multiple incidents and the countermeasures that could be applied to them to solve the situation. Some of these will also be implemented in trial sites, with the focus being on the incident and countermeasure types outlined in CS-1 described in Annex A.1, where there are trial specific subsections describing the particular scenario. Countermeasures will be chosen as a combination of atomic actions, depending on the exact nature of the security incident. Normally, countermeasures are implemented sequentially, depending on the specific nature of the incident. Therefore, we perform a categorisation of the major types of security incident in Ch. 3 below. This categorisation will be further detailed in the project deliverable D1.4 [2]. The approach taken where a set of security incidents and countermeasures associated with the security incidents enables the countermeasures to be updated and revised as needed. The set of security incidents is more generically defined and is not expected to need modification so often. The background to this is that cyber-attacks constantly changing their nature, but the effect they have on ICT systems (as represented by the security incident) remains constant, e.g. loss of service. Hence, we focus on the incidents rather than the use of specific individual countermeasures. Furthermore, the defined countermeasures are more of a blueprint and might in many cases need to be adjusted for the particular incident at hand. This means that the countermeasure library will grow over time as new variations of the incidents occur and suitable countermeasures are defined. Table 1 and Table 2 contain countermeasures to incidents related to cyber security and physical security, respectively. The tables provide a high-level description of the countermeasure, while a more detailed description is given in Annex A. The last column of the tables identifies the threat groups from deliverable D1.2 [1] to which the incident and countermeasure applies. The term “device” is meant to be understood as any physical device in the system, such as NORM, as any device on which a (potentially virtualised) function is being run, or as any part of a bigger system that either sends, receives or handles communication.

SUCCESS D4.5 v1.0

Page 9 (61)

The scope of the countermeasures described in this chapter and detailed in Annex A is the whole of the SUCCESS Security Monitoring Solution, i.e. including components and functions which are not described in Ch. 2 above, e.g. NORM or Double Virtualisation.

SUCCESS D4.5 v1.0

Page 10 (61)

Table 1: Cyber-security related incidents and countermeasures

Incident Label

Incident Description

Countermeasure Targeted threat groups

CS-1 Device behaving suspiciously

1. Move the data or applications to another

physical/logical zone

2. Perform remote attestation to verify device

state

3. If state is OK, red-flag the device

a. Temporarily disconnect device from

the grid

b. Investigate

T000, T100, T200, T300, T400, T410, T420, T700

CS-2 Remote attestation fails (after step 2 of CS-1 above)

1. Disconnect device from the grid.

2. Send maintenance unit to location

3. Reset/Reinstall device & re-bootstrap (new

credentials & revoke old ones)

T000, T100, T200, T300, T400, T410, T420, T700

CS-3 Unauthorised messages 1. Identify device or network segment where

data is originating from

2. If device, perform CS-1

3. If network segment, it means there is an

unauthorised node in the network segment

4. Isolate network segment

5. Investigate

T000, T100, T200, T300,

CS-4 Virus detected in device

1. Re-deploy VMs

2. Isolate device/functions from network

3. Enable backup device if available

4. Reinstall device to remove malware

5. Verify peers not infected

6. Update malware definitions in all nodes as

soon as possible

T000, T200, T700

CS-5 DoS suspicions 1. Block DoS traffic at edge of network by

updating firewall rules

2. Move VMs to other location

3. Enable backup node if available

4. Do load-balancing if possible

5. Analyse suspected DoS traffic, verify attack

T000, T300, T410, T700

CS-6 Security algorithm deemed insecure

1. Remotely configure affected nodes to

deprecate insecure algorithm and enable

alternative algorithm

2. Optionally select and review proper

alternative algorithm

T100, T200, T300

SUCCESS D4.5 v1.0

Page 11 (61)

Table 2: Physical security related incidents and countermeasures

Incident Label

Incident Description

Countermeasure Targeted threat groups

PS-1 Perimeter breached 1. Send security personnel to the location to

investigate and repair breach (infrastructure

device)

T100, T300, T500

PS-2 Device casing breached 1. Send security personnel to the location to

investigate and repair breach (infrastructure

device)

2. Perform remote attestation of device state

3. Send maintenance unit to location

a. Reset device & re-bootstrap (new

credentials & revoke old ones)

b. Repair device

or

c. Replace device

T100, T300, T400, T500

PS-3 Communication link unavailable

2. Re-configure network to route device via

secondary access (if available)

3. At least if 1) not possible: Move data and/or

applications to another physical/logical zone

4. Send maintenance unit to location to

a. Reset network connection & unit

b. Repair network connection & unit

or

c. Replace network connection unit

T300, T500

PS-4 Device power unavailable 1. Enable backup power

2. At least if 1) not possible: Move data and/or

applications to another physical/logical zone


a. Reset power supply unit

b. Repair power supply unit

or

c. Replace power-supply unit

T300, T500

PS-5 Device Unavailable 1. Enable backup node if available

2. Move data and/or applications to another

physical/logical zone (to backup node if

available)


a. Reset device

b. Repair device

or

c. Replace device

T300, T500

SUCCESS D4.5 v1.0

Page 12 (61)

A more detailed description of the incident-countermeasure pairs is given in Annex A. The given countermeasures are the generic countermeasure for that specific type of incident. However, the countermeasures act as a template and should be adapted to the real use-cases and specific incidents for actual use. Concrete countermeasures for incident implemented in the trial sites is given in subchapters where applicable. All security incidents should also be propagated upwards to DE-SMIS to help provide a better understanding of security incidents on a pan-European level. This includes providing the DE-SMIS with all relevant information pertaining to the incident so that E-SMIS can correlate potential distributed attacks in a wide area and also providing DSOs with information on individual ongoing or recent incidents. This way, the DSOs can be prepared and learn about possibly future incident types and patterns. A mapping showing how the countermeasures map to the threats identified in deliverable D1.2 [1] is shown in Table 3 in Ch. 3.

SUCCESS D4.5 v1.0

Page 13 (61)

3. Mapping of Threats to Security Incidents and Countermeasures

Table 1 in deliverable D1.2 [1] contains the identified threats to the system. The following table indicates how those threats are mapped to the identified incidents and their countermeasures. The table is provided to give an overview of how the threats are covered by the defined countermeasures and the system security that should be in place.

• Green means the threat is covered by the countermeasure, orange that the threat is partially

covered while white indicates that it is not e.g. a DoS related countermeasure would not cover

an eavesdropping threat.

• In addition, physical security threats are generally not applicable to cyber security incidents

and countermeasures and vice versa. This is indicated by N/A (Not Applicable).

• Some N/A cases might still also be usable against some attacks and might thus also have

colour coding indicating this. The threat label is also colour-coded to indicate how well the

threat is covered by the countermeasures.

• The “Covered by default security” column is there to indicate which threats should be covered

by standard ICT security for smart-grid systems and thus should not be an issue (indicated

by ♦) or where it is not clear cut (indicated by ●).

However, as each system is individually installed and configured misconfigurations, human error and negligence could of course result in less than secure systems. Likewise, new security threats and e.g. protocol vulnerabilities might over time lessen the system security. An example is man-in-the-middle attacks, which should not have an impact (of course with the exception of performing DoS when being on the path), since all communicating entities should have certificates issued by a trusted CA, possibly the DSO or TSO CA, meaning an attacker should not be able to obtain a valid certificate, which would be needed to successfully become a man-in-the-middle. Likewise, eavesdropping should not be an issue as all communication should be sufficiently protected. Of course, this does not mean that one should not consider these threats as unimportant.

SUCCESS D4.5 v1.0

Page 14 (61)

Table 3: Threat to Countermeasure Mapping

Threat Covered by Default Security

CS-1

CS-2

CS-3

CS-4

CS-5

CS-6

PS-1

PS-2

PS-3

PS-4

PS-5

T001 N/A N/A N/A N/A N/A





T101 ♦ N/A N/A N/A N/A N/A






T107 ● N/A N/A N/A N/A N/A


























T307 N/A N/A N/A N/A N/A N/A









T502 ● N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A






T602 ♦ N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A


SUCCESS D4.5 v1.0

Page 15 (61)












Exceptions and special cases amongst the threats: T107: If the system is properly secured only frequency and potentially message size can be analysed. This can give some information to the attacker, but it is limited. T108: Threat of war driving depends on used access technology. 4G/5G should not have issues, while WLAN might have. T420: Could potentially be marked as green, but it depends on how the threat “Compromising confidential information” is interpreted. CS-1, CS-2 and PS-2 cover both cases when data is sent over the network (unusual behaviour) and accessed via the physical device (device casing breached) so those oranges could be seen as resulting in an overall green. T502-T504: When an employee steals information from the employer it is not something that can easily be fully protected against as has been seen with even the US National Security Agency (NSA) having its information leaked. Therefore, it is good to understand that the key to preventing or minimising such incidents is with proper security guidelines, authorisation and access control. We do not present a specific countermeasure to these types of incidents as they are more or less the same regardless of industry; trying to minimise the spread of the leaked/stolen information and charging the perpetrator. T601: Very much related to T502-T504 above. Employee stealing information is a general issue in all areas and as such difficult to fully protect against. T602, T603, T607: These are results of human error and like T601 quite generic problems in all fields. T604: An unreliable source would not be connected to the system. (it depends on how one interprets the meaning of the threat) T605, T606: Critical data should always be backed up, this is a non-issue. T607: A device that gets lost would have its credentials revoked from the system T701, T705-T707: Once more, related to stealing or human error.

SUCCESS D4.5 v1.0

Page 16 (61)

4. SUCCESS Components and Interfaces Description

The environment of SUCCESS comprises electricity grids, communications infrastructures and management applications. The SUCCESS project will perform field trials in Italy, Romania and Ireland, testing the SUCCESS Security Monitoring Solution in the real electricity grids in these countries. In addition, simulated grids will be used for tests in laboratory environments. Some details of these grids are given in D5.1 [7]. The field trials and laboratory setups are instantiations of the SUCCESS Security Monitoring Solution described in D4.2 [1]. The focus of this chapter is on the components developed by the SUCCESS project, which will be part of the field trial and laboratory infrastructures, and the interfaces between the components. The underlying electricity grids, communications networks and cloud systems hosting these components will depend on the particular instantiation of the SUCCESS Security Monitoring Solution and are therefore not described in detail.

4.1 SUCCESS Components

Figure 1: SUCCESS Components in their Environment

SUCCESS D4.5 v1.0

Page 17 (61)

The components being used by the SUCCESS project are shown in Figure 1. The components are located in, and form part of, the electricity grid, the communications network or in cloud systems. The components which are being used by SUCCESS are:

• the New-generation Open Real-time Smart Meter (NORM), which is produced by SUCCESS and will be described in detail in deliverable D3.7 [5].

• a new Phasor Measurement Unit, which is produced by SUCCESS and will run as one of the components integrated in the NORM and is described in detail in D3.7 [5].

• Smart Meters of various types which will be present in the field trials and laboratory (if they are connected to the Smart Meter Gateway of NORM and are also considered as components of NORM),

• Electric Vehicles (EVs) and corresponding charging stations which will be part of the Irish field trial; the EV Charging station hosts the NORM.

SUCCESS is focussing on the security capabilities offered by mobile communications, particularly 5G communications. This means SUCCESS reuses the security features of the 5G mobile network (described in Ch. 4.2 below) but also extends them with the Breakout Gateway. Please note, however, that 5G mobile communications is just one infrastructural component in the SUCCESS Security Monitoring Solution described in Ch. 4.2 below and that the SUCCESS Security Monitoring Solution itself does not require 5G mobile communications but other IP-based communications networks (public or private), albeit with the consequence that no 5G-Mobile-specific functions will then be available. Indeed, because 5G technologies are still under development and are not yet commercially available, the SUCCESS trial sites are expected to use communications solutions other than 5G mobile and it is expected that the use of 5G mobile technologies in SUCCESS will be limited to laboratory tests. SUCCESS is developing a new Breakout Gateway (BR-GW) which allows mobile core network functionality and application functionality to be executed in the mobile radio network, reducing communication latency and enabling dedicated security functionality. Moreover, BR-GW has the potential to increase the resilience of local microgrids, for which there is communication support during local power outages as well as during massive blackouts. Additionally, with respect to securing the NORM-DSOSMC Centre communications and guaranteeing authentication, while incurring only a minimal communications overhead, a solution based on using a Physically Unclonable Function (PUF) has been considered. By introducing PUF, SUCCESS is providing NORMs with hardware security features which inherently guarantee that the NORM’s identity cannot be tampered with or faked, as PUFs act to uniquely identify the device, its uniqueness being bound to their specific hardware construction characteristics, also considering environmental or explicitly-introduced randomness. In contrast to Trusted Platform Module (TPM) [25]] which is a standard for a secure crypto processor used to secure hardware by integrating cryptographic keys into devices, PUFs (particularly the ones relying on intrinsic randomness) can be added in the NORM infrastructure without requiring any change to the manufacturing/design process of the NORM. Further details on the PUF implementation of SUCCESS are provided in deliverable D3.7 [5]. The components grouped as Applications in Figure 1 perform functions related to managing the underlying electricity grids. Figure 1 shows the Applications being developed in SUCCESS. In SUCCESS, security attacks or incidents will be detected and countermeasures initiated by DSO Security Monitoring Centres, which gather data from a set of NORMs and searches for specific patterns with machine learning methods. The components of the SUCCESS infrastructure work together to implement a holistic security monitoring solution, called the SUCCESS Security Monitoring Solution, which is described in D4.2 [1], with a focus on the vulnerabilities introduced by smart devices in electrical distribution grids.

SUCCESS D4.5 v1.0

Page 18 (61)

4.2 Communication Components

This chapter describes the hardware and software elements of the communications network essential aiming for maximum security when transmitting information and commands between the nodes of the electrical network. Figure 2 shows the current cellular network technology (LTE) that can be utilised for machine type communications, in our case smart grid communication (for example wide area monitoring and control). LTE is the 4th generation of mobile communication technology and has been standardised by 3rd Generation partnership project (3GPP) community. Initial standards were specified in release 8, aiming to meet the demands of data traffic over mobile communication networks. Further enhancements are being made in LTE to satisfy new requirements for machine type communication.

Figure 2: LTE for Smart Grid Communication

4.2.1 State-of-the-Art of Mobile Networks (4G/LTE)

Today’s mobile communication networks consist of two major components, the core network, and the radio access network. The Radio network consists of the Evolved Universal Terrestrial Radio Access Network (E-UTRAN) and User Equipment (UE), which is a device which provides authentication of end users. E-UTRAN is the air interface part of the LTE network. It consists of the eNodeBs (base stations), which provide radio related functions. These functions include radio resource management, scheduling, QoS, ciphering/deciphering of the user plane and control plane data and compression and decompression of the user plane packet headers [9]. The core network is generally geographically located far away from the radio network and it enables authentication, mobility, and connection to other non-3GPP networks. In LTE, the core network is known as the Evolved Packet Core (EPC) and it includes the following components.

The Mobility Management Entity (MME): The MME is the key control-node for the LTE access network controlling the high-level operations such as authentication by means of signalling messages. It is responsible for user authentication and for regulating security parameters. It is also involved in the bearer activation/deactivation process. Serving Gateway (S-GW): The S-GW routes and forwards user data packets. It is the node that terminates the interface towards E-UTRAN and acts as the mobility anchor for the user plane during inter-eNodeB handovers and is responsible for compatibility between LTE and other 3GPP technologies. Packet Data Network Gateway (P-GW): The P-GW provides connectivity to the UE to external Packet Data Networks (PDNs) using the SGi interface. The UE may be simultaneously connected with more than one P-GW for accessing multiple PDNs. This has two slightly different implementations, namely S5 if the two devices are in the same network, and S8 if they are in different networks.

Power Grid

Radio

access

Control &

Monitoring

Applications

SCADA

Core Network

Application data

plane

Internet/VPNWireline access

eNB

HSSS-GW P-GWMME

SUCCESS D4.5 v1.0

Page 19 (61)

Home Subscriber Service (HSS): It is a centralised database containing the information about all the mobile subscriber in the network. MME interacts with HSS to authenticate all the 3GPP device in the network. The current mobile network infrastructure in Figure 2 shows that the traffic which is generated by, and sent to the end users are transmitted through radio network and core network.

4.2.2 The SUCCESS Solution based on 5G Mobile Networks

The SUCCESS communication solution is based on the evolving LTE infrastructure and 5G next generation networks as explained in D4.2 [3]. This chapter provides a detailed description of different components which enable local edge processing. The main driving force behind development of 5G is to support a wide range of use-cases in business areas such as augmented reality, logistic tracking, energy meters, cell automation, smart factory, connected vehicles, and transportation. In the future, smart grids and industry use-cases will require a sophisticated ICT infrastructure supporting ultra-low latency, high availability, maximum reliability, and devices with significantly longer battery life. These requirements set the scene for the next generation of wireless access – 5G systems that are set for commercial availability from 2019. The road to 5G is an evolution, and LTE is adopting and implementing many of the requirements of the new use cases while the development is approaching towards 5G standard. Key enablers for 5G, including Network Slicing, end-to-end security, enhanced Radio Access and Software Defined Networking will enable the use of the 5G technology in SUCCESS solution. These components are future enhancements to LTE networks which have not yet been deployed.

• Network Slicing: dedicated network resources in mobile radio and core networks ensure

necessary priority for the specific performance requirements of the use cases.

• End-to-end Security: security services including identity management and pervasive

integrity protection over the entire network.

• Network Function Virtualisation: networking hardware virtualization via software, allows

to operate software at the network edge or the network core, and caters for the

maximization of the scalability and reliability of the communication network

• Enhanced Radio Access: new spectrum, new encoding methods, and adaptable

bandwidth offerings support use cases with huge number of devices and extremely short

round-trip times.

• Software-Defined Networking: software-based configuration of mobile transport and core

networks including capacity provisioning enable the mobile network operator to configure

the network depending on service requirements, and independently of the underlying

physical and logical network architecture.

This next generation cellular infrastructure aims to fulfil the communication requirements for machine-to-machine (M2M) type communications involving use cases such as smart grid communications. As explained in in D4.2 [3], smart grid communications can be broadly divided into two categories, massive M2M type communication (Advanced Metering Infrastructure, Wide area monitoring) and critical M2M type communication (real time monitoring and grid control, FLISR applications). Enhancements of the current LTE network are being investigated to support both the above use cases.

4.2.3 The Communication Solution for SUCCESS

This and the following sub-chapters provide a detailed description of the communication solution including the functional components. A Breakout Gateway function (BR-GW) is described which enables local edge processing and allows implementation of real-time countermeasures. The 5G communication solution aims to provide seamless connectivity while at the same time ensuring the security of the communication and can interwork with any non-3GPP access network (non-cellular network), which might be the communication network in certain smart grid scenarios. Considering that, 3GPP technology such as Generic Bootstrapping Architecture (GBA) [12] can be utilised to authenticate applications running over 3GPP and non 3GPP network by leveraging

SUCCESS D4.5 v1.0

Page 20 (61)

trust of SIM card subscription data, Figure 3 shows an overall view of the 3GPP and non 3GPP network domain. In SUCCESS, Data Centric Security is applied for enabling end-to-end validation of data integrity against data manipulation and false data injection threats. These threats are deemed critical for the smart grid applications such as power grid state estimation [11] and voltage control [8] [14]. A detailed description of these components as part of BR-GW is provided later in the current chapter in Section 4.2.4.2.

Figure 3: 3GPP domain and extension of 3GPP security features

4.2.4 Break-out Gateway (BR-GW)

The BR-GW is a new functional unit being proposed, designed, and implemented as part of this project. The breakout functionality diverts the user plane traffic from going towards the core network and process it near the base station on a virtualized instance.

4.2.4.1 Motivation

Based on the requirements of the upcoming smart grid infrastructure with a huge number of smart meters streaming data in real-time, the ICT infrastructure must be designed to support such functionality. One of the major requirements is to enable local edge processing for realisation of real-time countermeasures. Current mobile networks are not designed to support local edge processing. Data traffic generated by, or sent to, end users are transmitted through core network functions which are usually not placed near radio networks. This induces more latency in the reaction time of any services running on it. To address the use cases described above in Ch. 4.2.2, current 5G network technology enablers are under thorough investigation [10]. Specifically, this project is developing breakout functionality enabling edge processing for real time applications. This will be hosted in a cloud environment located at the edge of the communications network. Such a breakout gateway can enable and

Mobile Network

BSF NAF

SCADA/Utility

Control Centre

Sensor

Network

3GPP

Gateway

BSF – Bootstrapping Server Function NAF – Network Application Function

3GPP enabled

Power System

Meters

Extended 3GPP Security Features (SIM+GBA)

Existing 3GPP Features (SIM)Smart Meters

based on Non

3GPP Network

SUCCESS D4.5 v1.0

Page 21 (61)

host distributed data processing and will perform real time countermeasures at the network edge, reducing the impact of local failures and reducing response time.

Figure 4: Breakout Gateway

The BR-GW acts as a local gateway based on Software Defined Networking (SDN), which is an integral part of the 5G network architecture [10]. Advantages of this Network Layout include:

• Data can be processed on decentralized computing nodes, close to the sensors and

actuators of the network.

• Higher data speeds between measurement and control units of the smart grid system, as

this solution bypasses many other network elements of the mobile system.

• Better support for the huge numbers of IoT devices in future applications, as most of their

communication can be managed without the mobile core network.

Power Grid

Radio

access

Edge Cloud

Internet/

VPN

Core Network

Control &

Monitoring

SDN Control

Plane

BR-GW

eNB

SDN Controller HSS

BR-GW

SUCCESS D4.5 v1.0

Page 22 (61)

4.2.4.2 Functionality of the Breakout Gateway

Figure 5: Conceptual diagram of Breakout Gateway

The conceptual diagram in Figure 5 shows the essential functions of the BR-GW. These include the local breakout based on SDN technology, GBA, Data-Centric Security, and other vital security applications such as DSOSMC in SUCCESS. The Functional impact of BR-GW on the communication network is detailed in the following description of the functional enhancements in 5G networks. Local Breakout In 5G mobile networks, the user plane (also called data plane) refers to the user data traffic and control plane refers to the signalling traffic, which is used for routing the user plane traffic. The Breakout Gateway is based on the principle of separating the user Plane from the control Plane. This is realised by the use of SDN technology allowing dynamic configuration of the network based on the application requirement. Rather than being a physical node, BR-GW is a virtualised instance which will enable new edge processing. Software-Defined Networking (SDN) SDN is based on the principle of separating the control plane and the user plane. It aims to provide a logically centralised network intelligence and a standardised interface for software development to control network resources and the flow of network traffic. Basic SDN components include the SDN controller, network elements and applications used to control network resources. The SDN controller interacts with applications via standardised Application Programming Interfaces (APIs) and on other side with network elements with standard protocols such as OpenFlow [15]. The applications can implement network services like routing, security, and bandwidth management, with different behaviour for traffic in different flows, i.e. from different sources, responding to real-time demand changes in the network. Data Centric Security (DCS) DCS enables data integrity checking of the end to end information flow in any communication network. Smart meter critical data that are used for power grid state estimation or voltage control [8][11][14] could, if manipulated, disrupt the output of the power grid applications. Such attacks are commonly referred to as False Data Injection attacks. Data integrity of the measurements being communicated over any network should be protected and checked at the entry and exit points. DCS technology is used to ensure the validity of the message data sent between a message sender and a message receiver. The procedure is shown in Figure 6. First, the message sender generates a hash (Hash 1) of the message data using the SHA-256 algorithm and sends the hash

Breakout Gateway

Data Centric

Security

GBASDN for local

breakout

Ericsson Cloud

Other Security

Applications

SUCCESS D4.5 v1.0

Page 23 (61)

value to a DCS Signing Authority that generates a Keyless Signature using a technology based on the blockchain principle and returns the signature to the message sender. Then the message sender sends the message data and the Keyless Signature to the message receiver. The message receiver generates a hash (Hash 2) of the received message data and sends it, along with the Keyless Signature, to the DCS Signing Authority. The DCS Signing Authority verifies the Keyless Signature and the received Hash 2 with the blockchain publication file and authenticates the transaction, informing the message receiver of the result. This procedure uses the Keyless Signature to prove that the transaction has taken place, that the data has not been tampered with and that the sender is the true sender of the data. In order to optimise the performance, a distributed DCS Signing Authority architecture, where versions of the blockchain publication file can be kept locally and periodically synchronised with the central version, can be deployed. In the SUCCESS Security Monitoring Solution, the message sender can be situated in the NORM and the message receiver in the DSOSMC, which is instantiated as an application in the BR-GW. In addition, the same procedure is also used to verify the integrity of the NORM firmware. In this case, the NORM (periodically or on request of the DSOSMC) generates a hash value based on its firmware and, using the same procedure as above, authenticates the firmware with the DSOSMC.

Figure 6: Data Centric Security

Generic Bootstrapping Architecture (GBA) Generic Bootstrapping Architecture (GBA) is a 3GPP authentication architecture. GBA is standardised in 3GPP standard TS 33.220 [12]. It uses the mobile SIM based authentication to provide security functions even outside the mobile network domain. One example could be authentication towards enterprise server for mobile device.

SenderDCS Signing

AuthorityReceiver

Data

Hash

OK, NOT OK

Hash (hash-1)

Signature

Signature + Data

Signature + Hash (hash-2)

SUCCESS D4.5 v1.0

Page 24 (61)

Figure 7: GBA functions applied at Breakout Gateway

The reason for using GBA for authentication is to leverage the trust of mobile communication operators to enable authentication of smart grid applications running over 3GPP or non 3GPP network. Compared to state of art solutions such as Public Key Infrastructure and digital certificates, GBA provides a novel solution to authenticate a power grid application running over remote measurement and control devices. GBA provides a means to use mobile network subscriber data - which a mobile network operator keeps in his mobile network - to authenticate mobile network users for application services. These other services need not necessarily be under the control of the mobile network operator. In fact, the ability to use mobile network subscriber data to authenticate 3rd party services is a key feature of GBA. Further, GBA supports interoperator authentication. GBA can be applied to local breakout services, so that security functions can be provided as well to localised enterprise functions. Figure 3 and Figure 7 shows the setup of the GBA in mobile network. Network Application Function (NAF) and Bootstrapping Server Function (BSF) are the two main new components included in GBA. In mobile communication networks, HSS is main database containing information used to authenticate User Equipment in the network. BSF sits in front of the HSS and protect the HSS from direct access by 3rd parties and provides necessary information for GBA to enable authentication of application/user. NAF is connected to BSF via Zn interface [10]. NAF’s main function is to verify the authentication of application messages generated from mobile phones (UEs). NAF acts as message authenticator for any service application. This authentication is done by utilising subscriber information stored in the HSS via BSF. NAF functionality can be embedded to authenticate smart grid application generating messages from smart meters connected via 3GPP or non 3GPP networks. Figure 7 shows the realisation of NAF function in BR-GW. BR-GW in Two Domains: BR-GW will be a virtual instance in the distributed cloud, which enables the connection to a local instance of DSO security function and the DSOSMC. Conceptually BR-GW is explained above; however its functionality can be divided into two domains, the Telco domain and the DSO domain, as shown in Figure 8. The functionality in the Telco domain is controlled using SDN mechanisms, and is responsible for local breakout which provides connectivity layer functionality. The DSO domain will have Services / applications installed on the BR-GW virtual machine to enable security functions for threat detection and countermeasure implementation. The encrypted traffic can be decrypted only in the DSO domain in order to analyse the traffic data. The DSO domain will be part of DSO and Telco will have no control or will not be able to access the data from DSO domain. However, there will be information exchange between both domains in order to enable countermeasure implementation. This information exchange is done via I2 interface. The applications / services running in the DSO domain will be able to communicate with the external DSOSMC on an encrypted connection. This will allow reporting of incidents to DSOSMC which can take appropriate countermeasures against attacks.

Enterprise

Application

NAFCore Network

HSS

P-GWBSF

MME

eNB BR-GW

SUCCESS D4.5 v1.0

Page 25 (61)

D4.2 applies the concepts of the Trusted Execution Environment (TEE) and Trusted Platform Module (TPM). TEE provides isolated execution of code, and ensures integrity and confidentially of critical applications. In the proposed BR-GW, TEE can be utilized to host and process services which require access the security critical data of the meters. Thus, these services could potentially have access to DSO session keys that they can use to access the protected meter data of NORMs in a secure and isolated way with the use of TEE. In order to store these keys, utilised for secure communication, a TPM module can be used. Recently work has been done toward virtualised Trusted Platform Modules (vTPM) [16], which enable TPMs also for virtualised entities. These concepts, combined with the BR-GW function in a virtualised environment, could provide a secure solution for the hosting service to enable security functions.

Figure 8: Breakout gateway logical domains

4.2.5 5G communication Test Setup

Figure 9: BR-GW test lab setup

A live LTE network comprising of the above described components is integrated with the state-of-art research facility at RWTH University’s ACS lab. BR-GW functionality is developed and integrated at the RWTH ACS lab with existing LTE network infrastructure. Testing of the functionality is done with hardware-in-the-loop simulation for a local grid control application. Interface functionality is realised in the laboratory. This test setup was shown in the project review with running live demonstration. The next step is the integration with other components of SUCCESS and testing of the interfaces I1, I2 defined in this deliverable in Ch. 4.5. It is planned in the trials to connect the Irish trial site consisting of Smart Electrical Vehicle charging system with the ACS lab at RWTH University which includes the 5G ready base station and BR-GW.

Breakout Gateway

DSOSMC

external

VPN enabled / encrypted

Telco domain

SDN to enable

Local breakout

DSO domain

Services to

enable security

functions

VPN enabled / encrypted

NORM

SUCCESS D4.5 v1.0

Page 26 (61)

Hence, the test setup incorporates the following components: 1) Standard LTE components: E-UTRAN and core networks. 2) LTE Modems and background traffics generation using laptops. 3) A set of NORM devices, 4) DSOSMC instantiation 5) Raspberry Pi cards to realise the smart meters The test lab setup involves the RTDS connected to the Raspberry Pi’s, which are connected via LTE Modems with the LTE base station. The eNodeB is connected with the BR-GW to provide edge cloud services running DCS, GBA, DSOSMC services. The test cases are described in detail in D4.8 [6].

4.3 Security Monitoring Components

4.3.1 DSO Security Monitoring Centre

The DSO Security Monitoring Centre (DSOSMC) implements a set of countermeasures to mitigate security incidents specific to smart meters. DSOSMC forwards the data it receives from NORM (directly or via the BR-GW) to the DE-SMIS after filtering and aggregating them as described in the sub-chapters below. As DSOSMC and DE-SMIS are located at the same TSO/DSO within the same network, they can share bidirectional data without it being anonymised. In addition, DSOSMC informs DE-SMIS of incidents which have been identified by DSOSMC and the selected countermeasures. Details of the DSOSMC are available in deliverable D3.4 [4].

4.3.2 Pan-European Security Monitoring Centre

The pan-European Security Monitoring Centre (ESMC) is a novel approach to monitor the security status of critical energy infrastructures. ESMC gathers data from sources distributed across Europe to obtain a comprehensive view of the security status of critical infrastructures. ESMC obtains information provided by DSOs and TSOs across Europe, identifies common patterns describing cyberattacks, and then shares these patterns with the DSOs and TSOs. Thereby, the DSOs and TSOs obtain information that cannot be derived from their local, non-European level.

4.3.2.1 Infrastructure

ESMC is shown in Figure 10 and consists of two components:

• Several instances of a decentralised European Security Monitoring and Information

System (DE-SMIS), which locally collect data on DSO/TSO level, and the

• European Security Monitoring and Information System (E-SMIS) which collects data from

several DE-SMIS instances across Europe. E-SMIS (1) evaluates the data with regard to

common patterns describing cyberattacks and (2) shares the information with DE-SMIS

instances.

See Figure 1 for an overview of the SUCCESS infrastructure and Figure 11 on p.32 for an overview of the SUCCESS Security Monitoring Solution, that is, how the components in the SUCCESS infrastructure interwork. Figure 10 depicts the concept of ESMC. ESMC receives data from the DSOSMC by Interface I3. The DSOSMC is hosted at the DSO/TSO sites (see Ch. 4.3.1 for more details about DSOSMC) and resorts to the breakout gateway to obtain data from NORM. DE-SMIS is also hosted at the DSO/TSO sites and receive data from:

• E-SMIS by Interface I5,

• DSOSMC by Interface I3,

• other DE-SMIS instances by Interface I4 (see Ch. 4.3.2.4 for more details), and

• further sources by Interface I7.

SUCCESS D4.5 v1.0

Page 27 (61)

Figure 10: Pan-European Security Monitoring Centre (ESMC) Functional Description

In particular, DSOSMC is only one data source among many for DE-SMIS. Figure 10 gives an overview of the infrastructure of ESMC and its interplay with other components for all trail sites in the SUCCESS project. Further data sources received by Interface I7 stem from the trial site infrastructure.

4.3.2.2 Functional Description

E-SMIS is responsible for the pan-European view of the critical infrastructure. At this level, external data sources, such as weather data or social media data, are incorporated to the monitoring centre for a comprehensive view of the critical infrastructure. For example, it can be evaluated whether twitter tweets can be automatically classified towards appeals for similar behaviour effecting the energy grid. One example for such an appeal is given by the hashtag “earthhour”. People following this hashtag want to make a mark for the environment by switching off lights for a particular hour. This coherent behaviour can cause instabilities in the energy grid. By identifying such kind of trends in social media, the energy provider can prepare for appropriate counteractions in advance. E-SMIS obtains information from several decentralised instances (DE-SMIS) which lie on the level of DSO/TSO. The rationale for establishing DE-SMIS instances is that they:

• collect, aggregate and analyse locally bounded data. Hence, DE-SMIS reduces traffic by

ensuring that only significant security-related information is shared with E-SMIS.

• make the collected data anonymous. Typically, the data at DSO/TSO level have various

legal restrictions due to privacy issues, such that an anonymisation step is crucial for data

sharing.

DE-SMIS instances share information, such that the E-SMIS can produce an aggregate analysis on a European level. Thus, it becomes possible to detect patterns which cannot be evaluated on local DSO/TSO level. E-SMIS in return shares the analysis results with DE-SMIS instances.

SUCCESS D4.5 v1.0

Page 28 (61)

Thereby, DE-SMIS instances can confirm the patterns by comparing them with the data available in their own network. ESMC searches for unexpected and significant patterns in data which stem from various sources. If such a pattern has been detected successfully, ESMC:

• precisely visualises the results for reasonable human-machine interactions. The

visualisation concept depends on the structure of the data (e.g. streaming or event based

data transmission) which will be transmitted to the ESMC and will take place at E-SMIS.

Particularly, time resolution and the amount of data will determine the underlying

visualisation approach.

• notifies the responsible authorities such that proper countermeasures can be initiated in

a downstream process by the controller.

Differently from ESMC, which does not itself initiated countermeasures, DSOSMC autonomously performs countermeasure selection and initiation based on algorithms and the Countermeasures Knowledgebase Database (see Ch. 4.3.1). Moreover, DSOSMC will visualise the results at DSO/TSO level.

4.3.2.3 Organisational Aspects and Relationship to Directives stated by the European Union

ESMC will have potentially a significant impact on the information sharing strategies of countries of the European Union about their critical infrastructures. It will affect both the communication between member states and member states to authorities of the European Union. Ongoing work on directives of the European Union which affect ESMC is in progress. In fact, ESMC can leverage protecting Critical European Infrastructure (ECI) as it is defined in the “Council Directive 2008/114/EC of 8th December 2008 on the identification and designation of European critical infrastructure and the assessment of the need to improve their protection.” For example, the Directive states that each member state should call a “Security Liaison Officer” who serves as the point of contact for security related issues between the owner/operator of the ECI and the relevant member state authority. As ESMC covers and describes the security status of European DSOs/TSOs, technical reports from ESMC can serve as a basis for the Security Liaison Officer to fulfil his/her duties. Thereby, the following statement of the Directive can be covered: “Each Member State shall implement an appropriate communication mechanism between the relevant Member State authority and the Security Liaison Officer or equivalent with the objective of exchanging relevant information concerning identified risks and threats in relation to the ECI concerned”.

However, an open issue is the definition of the organisational aspects of ESMC. There is ongoing work for answering the question who will run and maintain ESMC. Besides maintaining its computational infrastructure, dedicated personal should run ESMC to serve as a basis contact point for its potential customers, that is, the DSOs/TSOs across Europa. Also, it is not defined yet who will implement and configured the DE-SMIS instances at the DSOs/TSOs. For that reason, there is also ongoing work on possible business cases for ESMC to evaluate whether private companies, state institutions or a mixture of both can be in charge of ESMC.

4.3.2.4 Communication

Local DE-SMIS instances communicate with agents. Agents are linked to exactly one data source and ensure a secure data sharing with DE-SMIS. Data sources can be IT-related systems such as firewall, SIEM (Security Information and Event Management) systems and IDS (Intrusion Detecting System) or energy-related systems such as SCADA system. In particular, the DSO Security Monitoring Centre is an agent of DE-SMIS which provides pre-processing information about threats, incidents and countermeasures for critical infrastructure. See Ch. 4.3.1 for details about the DSO Security Monitoring Centre. Moreover, DE-SMIS instances shall be able to communicate with each other. This communication can be established at a local or national level and serves multiple purposes. The rationale is that threat patterns and attacks identified by a DE-SMIS instance based on its own data are directly shared with other DE-SMIS instances. DE-SMIS instances shall be able to increase the efficiency

SUCCESS D4.5 v1.0

Page 29 (61)

of their search since they will be able to test their own (private) data on the threat patterns from the adjacent DE-SMIS. This kind of communication does not require any significant resource allocation for data processing for the receiving DE-SMIS, therefore the analysis and the comparisons can be conducted quickly and efficiently. These immediate notifications and early warnings sent by a DE-SMIS instance that has identified a threat pattern can prevent geographically correlated attacks to adjacent DE-SMIS instances which is usually the case in recent attacks on multiple DSOs. The latter is stated at the study of the Ecossian Project, supporting the need for communication between local Critical Infrastructures [13]. A fallback mode for the communication between the DE-SMIS instances may be triggered in case the E-SMIS is inaccessible due to malfunction, maintenance or an attack. In this mode, a DE-SMIS shares all its public data with the DE-SMIS instances, as would do with the E-SMIS in the normal mode. The receiving DE-SMIS instances compare these data with their own private data and identify common threat patterns and attacks on regional level. The DE-SMIS communication functionality and the data processing on the DE-SMIS level resemble, in this case, the communication between DE-SMIS and E-SMIS. Although time- and resource-inefficient for the DE-SMIS, this mode mitigates the impact of an attack or a malfunction on the E-SMIS, since the DE-SMIS are still able to communicate with each other and identify threat patterns on a large scale. Therefore, the existence of this mode is necessary within the concept of enhanced security and availability at the E-SMIS level.

4.3.2.5 Data Sources

ESMC analyses energy-related data sources such as data from DSOSMC about incidents and counter measurements, and IT-related data sources such as firewall log files from the DSO/TSO. Additional data sources for the threat detection are open data. Open data can be integrated to strengthen the statement of the data analysis. Potential open data sources are:

• energy data: netzdaten-berlin.de offers 189 historical data sets. Moreover, the European

Network of Transmission System Operators for Electricity (ENTSO-E) offers energy

related open data. Both data sources can be used to train a model that serves as a gold

standard in the downstream evaluation.

• weather data: plenty of providers offer historical, current as well as predictive data of the

weather. A correlation to energy-related data is well-known.

• social network data: further data can stem, for example from Facebook, twitter or

Instagram. We can evaluate a potential correlation between the social network and

energy data, which may lead to an information advantage for the authorities, covering for

example political discussions in the energy sector or novel identified threats to critical

infrastructures.

It has to be evaluated how open data can be incorporated in the data analysis strategy of ESMC.

4.3.3 Relationship between DE-SMIS and DSOSMC

DE-SMIS obtains data from various data sources, including IT-related data such as firewall and SCADA log files and energy-related data such as log files from the control room indicating, for example, a critical exceeding of thresholds for certain components, which are defined by the user, in the field. In particular, DE-SMIS receives data from the local DSOSMC instance at the DSO/TSO level which contains information about the identified threats and incidents, the initiated countermeasures and measurements from the NORM devices (Interface 3, see Ch. 4.5.3). Data provided by DSOSMC is only one data source among many for DE-SMIS. DE-SMIS anonymises the data provided by DSOSMC. At the DSO/TSO level of the trail sites, DE-SMIS and DSOSMC work in a 1:1 relationship. The data received by DE-SMIS from DSOSMC is anonymised by the DSOSMC.

4.4 SUCCESS API

The SUCCESS API allows information exchange between DSO level and pan-European level, that is, the pan-European Security Monitoring Centre (ESMC).

SUCCESS D4.5 v1.0

Page 30 (61)

4.4.1 Definition and Motivation

As the following interfaces directly connect the DSO/TSO level to ESMC, the SUCCESS API includes:

• I3 (interface between DSOSMC and DE-SMIS),

• I7 (interface between DE-SMIS and internal data sources), and

• I8 (interface between DE-SMIS and the edge cloud).

Chapter 4.5 offers a detailed description of the interfaces. End user of ESMC, that is, DSOs and TSOs, will use these interfaces to communicate with ESMC. On the contrary, I4, I5, and I6 do not describe the communication between ESMC and its end users, but only cover the communication between components within ESMC. Consequently, these APIs are not part of the SUCCESS API. Interface I1 and I2 are not linked to ESMC and are therefore not part of the SUCCESS API as well.

The SUCCESS API supports passing information about (1) security incidents, (2) triggering and advising of countermeasures, and (3) further payload between the SUCCESS components. Payload data is data about the grid status (obtained by NORM) or the IT infrastructure (obtained by e.g. firewall log files).

The SUCCESS API is published for use by DSOs/TSOs. The main purpose in exposing the SUCCESS API is to allow interoperability of different components and a flexible, downstream implementation of additional threats and countermeasures. The API definition will be made openly available. However, the actual data passed on the API will be restricted and subject to security controls. No data related to private persons will be passed on the SUCCESS API. Data will be anonymised and private data protected.

The SUCCESS API provides unified definitions of how grid-state data can be made available between different DSOs or between DSOs, organisations operating a pan-European monitoring centre (E-SMIS) and national or local monitoring centres (DE-SMIS). Analysis and comparison of these data at the different levels can reveal abnormalities which may be caused by physical or cyber-attacks. Therefore, the communication via the SUCCESS APIs implements the holistic security approach of SUCCESS, which includes multiple tiers for the detection of a security incident and the initiation of countermeasures.

Chapter 4.5 gives an informal overview of SUCCESS API’s features. Each subchapter of Chapter 4.5 motives the interface for the considered software component. In the paragraphs “Communication Channels” a more detailed description of each interface is given. Here the interfaces are distinguished in particular by the sender, receiver and the format. Moreover, a more detailed description of the interface is given. Appendix B.1 gives an example for a formal definition of one RESTful service of the SUCCESS API.

Interfaces I4, I5 and I6 describe interfaces between components of ESMC and are therefore not part of the official, public available SUCCESS API. Therefore, there are not subparagraphs “Communication Channel” available for these interfaces in this document. However, the SUCCESS consortium plans to make a documentation, similar to the remaining interfaces, available for internal usage by an additional confidential deliverable. Thereby, the SUCCESS consortiums ensures that the ESMC components can be used by all consortium members. By not making interfaces I4, I5 and I6 public available, we provide a quality assurance for all services that are provided by ESMC, as we ensure that only proprietary data processing services offer data to ESMC. Hence, we provide data sovereignty for the end-users by ESMC.

4.4.2 Documentation of the SUCCESS API as a RESTful API

The SUCCESS API follows the REST architecture. REST services typically resort to HTTP, such that the services of the SUCCESS API are triggered by the HTTP methods GET, POST, PUT and DELETE. For each method, the services answer with a well-defined response: with GET a response object is called, with POST, a resource is created, with PUT a resource is altered and with DELETE a resource is removed. Hence, the documentation of the SUCCESS API differentiates between the HTTP verbs and gives the request’s header containing meta information about the REST call as well as the request’s body containing the payload. For the SUCCESS API, the payload usually is given in JSON or XML format. The documentation

SUCCESS D4.5 v1.0

Page 31 (61)

encompasses the response message of the service as well. The response message contains at any rate the HTML status code of the request. Examples for the HTML status code are 200 for OK, 201 for Created and 202 for Accepted. Appendix B.1 gives an example for a formal definition of a service of the SUCCESS API.

In the next version of this deliverable, we plan to populate the SUCCESS API documentation with further formal definitions. Up to now, we resort to a more informal description given by Chapter 4.5. In particular, the subparagraphs “Communication Channels” of Chapter 4.5. will be used as a basis to populate the SUCCESS API documentation.

As the SUCCESS API follows REST, it gives the following attributes:

• Client-Server-Model,

• Stateless,

• Caching,

• Uniform Interface,

• Layered System, and

• Code on Demand.

These attributes lead to various advantages for the SUCCESS API. By encapsulating, the independent development of client and server becomes possible. In particular, the server usually gets a lighter implementation. By using a stateless communication, that is, each operation is self-contained, the API of the services becomes clear and well-defined. REST is a resource-oriented architecture, that is, altering the resource leads change in the client’s state. Each resource has to be identifiable by and Uniform Resource Identifier (URI). Usually, JSON or XML formats are used for REST services.

All standard security features, such as SSL, PGP or OAuth, can be used the secure a REST service. See Appendix B.1 for an example of a documentation of a RESTful API. In the following Deliverable D4.6, all APIs will be documented that way.

4.4.3 Discussion on Data Models describing Cyber-Security Incidents

To support interoperability, to provide transparency to DSOs/TSOs and thereby to increase the acceptance by DSOs/TSOs, for the SUCCESS API, ESMC makes use of widely adopted standards for describing cyber-security incidents and threats. Various data models have been proposed by the research community [19]. For example, Structured Threat Information eXpression (STIX) is a community-driven structured language to define cyber threat information. For that, STIX provides an architecture tying together cyber threat information including Cyber Observables, Indicators, Incidents, Adversary Tactics, Exploit Targets, Course of Action, Cyber Attack Campaigns and Cyber Threat Actors. Moreover, Cyber Observable eXpression (CybOX) is a standardised schema for specifying events that are observable in the operational domain. CybOX resorts to objects describing cyber information. These objects can be for example E-Mail Message objects, Unix Process Objects or Disk Objects. Both, STIX and CybOX, are developed by the United States Department of Homeland Security (DHS) and are used by various research projects such as CES-21 (see [22] for more details) or the ECOSSIAN project (see [17] for more details). The Incident Object Description and Exchange Format (IODEF) is a standard developed by IETF and compiled in Request for Comments (RFC) 5070 [17][18]. IODEF is based on XML and is used to share incidents information of common interest for Computer Emergency Response Teams (CERTs) which typically have an operational responsibility for watch-and-warning over organizations. The data model encompasses information about hosts, networks and the services running on systems. Moreover, it provides information about the attack methodology and can contain forensic evidences [19]. IODEF is used in a number of projects and products [18].

The cyber-security research community has developed strategies and protocols for sharing the above mentioned standardized formats. Usually, STIX resorts to the Trusted Automated Exchange of Indicator Information (TAXII) [21] transport mechanism and IODEF resorts to Real-time network Defense (RID) [23]. However, ESMC established an own transport mechanism between DE-SMIS and E-SMIS to share IODEF and further messages. The mechanism is based

SUCCESS D4.5 v1.0

Page 32 (61)

on Kafka [24] for streaming of data and focuses on a fast, secure and safe communication between the components.

IODEF has been chosen as the core data model used by ESMC for collecting structured event-based incident data. The rationale is that IODEF is an open standard evaluated by various CERTs, vendor neutral in origin and flexible by resorting the XML that allows for extensions and the grouping of events. The XML representation leads to a light-weighted architecture.

Moreover, for an ad-hoc communication proprietary JSON based formats are established in the SUCCESS API. See the detailed description of each interface for more details about the specific JSON messages.

4.5 Interfaces in the SUCCESS Security Monitoring Solution

As depicted in Figure 11, the SUCCESS Security Monitoring Solution (see D4.2 [1] for more details) comprises various devices which exchange information through the depicted interfaces. These interfaces define which data is shared.

Figure 11: SUCCESS Security Monitoring Solution. Interfaces between the elements are denoted as I1…I8

SUCCESS D4.5 v1.0

Page 33 (61)

4.5.1 I1 between NORM and BR-GW or DSOSMC

The I1 interface provides connectivity between NORM and BR-GW or, in configurations without BR-GW, between NORM and DSOSMC. All the information between the rest of the other components (E-SMIS, power system applications) and NORM will be exchanged over this interface. The DSOSMC will handle the real-time information on various metrics that are measured from the smart meter and are proxied or processed by NORM, such as voltage levels, angles between voltages, frequency. This data will be also sent through the BR-GW. Details of the data is included in the D3.7 [5]. Moreover, NORM will send PMU data towards DSOSMC. This data will be sent to the BR-GW and the BR-GW will transmit the data further to the DSOSMC. The NORM will send metadata information and security related data such as tampering detection reports, access control statistics, credentials, challenge / response updates to the BR-GW which will be processed in BR-GW functions and further in DSOSMC and E-SMIS.

4.5.1.1 Communication Channels

4.5.1.1.1 Channel 1

Sender: NORM

Receiver: BR-GW

Short Description: metadata information

Format: JSON/XML

Content: NORM will send its metadata information to BR-GW

4.5.1.1.2 Channel 2

Sender: NORM

Receiver: BR-GW

Short Description: Data Centric Security check

Format: JSON/XML

Content: NORM will send firmware and data signatures over this channel to BR-GW so that the BR-GW can detect any anomalies.

4.5.1.1.3 Channel 3

Sender: BR-GW

Receiver: NORM

Short Description: Action taken on NORM

Format: JSON/XML

Content: BR-GW will take action / send commands to the NORM over this channel.

4.5.2 I2 between BR-GW and DSOSMC

Once the incident countermeasures analysis has been accomplished by DSOSMC and the part of the network and the devices with suspicious behaviour have been identified with some

SUCCESS D4.5 v1.0

Page 34 (61)

probability, the DSOSMC component oversees selecting the best countermeasure among the available ones to solve as much as possible the identified violations. Possible countermeasures are to communicate information on compromised devices and detected anomalies in pattern to the BR-GW. This information is communicated to the BR-GW. The same is possible in the other direction, that BR-GW can detect security anomalies and send alarms on them to the DSOSMC. Hence, there is a bidirectional exchange of data over interface I2 as shown in Figure 11. The functions described in Ch. 4.2.4 such as Data Centric Security will notify DSOSMC over this particular interface.


4.5.2.1.1 Channel 1

Sender: BR-GW

Receiver: DSOSMC

Short Description: Security breach notification

Format: JSON/XML

Content: DCS running on BR-GW will notify DSOSMC of any detected anomalies on NORM. This type of information will be send to DSOSMC over this channel.

4.5.2.1.2 Channel 2

Sender: BR-GW

Receiver: DSOSMC

Short Description: Information Forward

Format: JSON/XML

Content: BR-GW will be able to forward any data / messages received from the NORM to DSOSMC.

4.5.3 I3 between DSO Security Monitoring Centre and DE-SMIS

4.5.3.1 Description

Countermeasures related to attacks through the Neighbourhood Area Network (NAN) zone are addressed by DSOSMC. DSOSMC matches and correlates potential new and old incidents (with high impact and risks detected within WP1), with the most suitable security actions and countermeasures for the specific application scenario related to NAN-level smart devices. DSOSMC selects countermeasures from a countermeasures repository that will be incrementally populated. DSOSMC will share information about the detected threats and incidents, the initiated countermeasures as well as pre-processed data received from NORM with DE-SMIS. At TSO/DSO level, DSOSMC and DE-SMIS interwork in a 1:1 relationship where DSOSMC provides one among many data sources for DE-SMIS. Going further into detail, the data exchanged between DSOSMC and DE-SMIS are bidirectional. On the one hand, DSOSMC analyses grid related measurements (voltages and frequencies) and sends potential anomalies to DE-SMIS. This information allows a meta-analysis to be carried out by ESMC at pan-European level, which may potentially detect also the location of the detected security incident if NORM data is associated with its position obtained from a Geographic Information System (GIS) of the DSO. On the other hand, ESMC analyses data obtained from DSOs/TSOs across Europe and combines them with external data sources such as social data (i.e. twitter streams) to identify new trends in the energy sector. This information is shared with DSOSMC. Also, if ESMC marks a specific set

SUCCESS D4.5 v1.0

Page 35 (61)

of NORMs as being corrupted, then E-SMIS informs the DSOSMC about these compromised NORMs. Interface I3 will be part of the SUCCESS API along with I7.


4.5.3.2.1 Channel 1

Sender: DE-SMIS

Receiver: DSOSMC

Short Description: Attack/Incident found

Format: IODEF/XML

Content: DE-SMIS has detected an incident or an IT attack either by its own logic (for example for the analysis of the edge cloud’s hardware resources) or by a message obtained from E-SMIS which is then forwarded to DSOSMC. Each incident can be described by an IODEF message. The IODEF message contains at least the following protocol specific features: IncidentID, DetectedTime, StartTime, EndTime, Description, Assessment. The IODEF message is exchanged by the HTTP protocol.

4.5.3.2.2 Channel 2

Sender: DE-SMIS

Receiver: DSOSMC

Short Description: Twitter results

Format: JSON

Content: DE-SMIS receives energy-grid related trends from social media twitter channel and forwards them to DSOSMC. DSOSMC informs its authorities via its dashboard about the finding. The finding is described in JSON format including at least a free text description of the social media trend.

4.5.3.2.3 Channel 3

Sender: DE-SMIS

Receiver: DSOSMC

Short Description: Countermeasure description

Format: JSON

Content: DE-SMIS has a list of countermeasures (such as handling attacks on the edge cloud) which DE-SMIS enrolls at DSOSMC. The enrollment is taking place by sending a JSON based message encompassing a ID, a title and a short description of the countermeasure.

SUCCESS D4.5 v1.0

Page 36 (61)

4.5.3.2.4 Channel 4

Sender: DSOSMC

Receiver: DE-SMIS

Short Description: Appeal to initiate countermeasure

Format: JSON

Content: DSOSMC sends a JSON message to DE-SMIS such that DE-SMIS calls a certain countermeasure. The JSON message contains at least the ID of the countermeasure.

4.5.3.2.5 Channel 5

Sender: DSOSMC

Receiver: DE-SMIS

Short Description: Threat/Incident detected

Format: IODEF/XML

Content: DSOSMC informs DE-SMIS about all detected threat or incidents which DSOSMC finds in the field and in particular among the monitored NORM devices. These incidents are described in IODEF containing at least the protocol specific features IncidentID, ReportedTime, StartTime, EndTime, Description, Assessment.

4.5.3.2.6 Channel 6

Sender: DSOSMC

Receiver: DE-SMIS

Short Description: Countermeasure applied

Format: JSON

Content: DSOSMC informs DE-SMIS about all applied countermeasures via a JSON message. The message contains at least the time the countermeasure was applied and the countermeasures’ ID.

4.5.4 I4 between DE-SMIS instances

The interface I4 between the different DE-SMIS shall comprise information and attack patterns that the DE-SMIS has detected as threat relevant, so that an immediate response can be initiated by all DE-SMIS that could potentially be under attack, as explained in Ch. 4.3.2.4. The communication is realised with the use of Kafka platform, which can act as a powerful messaging system. Data transfer specifications are given in the table below, for a normal equipment setup.

Table 4: Interface I4 Specifications

Description of Specification Typical Value(s)

Throughput of transmitting messages (without encryption)

40 – 75 MB/s (depending on the kind of message replication)

Throughput of receiving messages (without encryption)

90 MB/s

Maximum end-to-end latency 15 ms

Median end-to-end latency 2 ms

SUCCESS D4.5 v1.0

Page 37 (61)

Encryption TLS (enabled by Kafka)

Authentication TLS or SASL (enabled by Kafka)

The criticality of a DSO’s response to an imminent attack makes the communication between the DE-SMIS and the DSOSMC instances very important, since the data process at the E-SMIS domain might not meet the stringent time requirements. The time for a cyber-attack to spread can be in the order of magnitude of fractions of seconds. Given that the communication of the DE-SMIS with the E-SMIS is also performed through Kafka, the time that would be required for the transmission of a message through the E-SMIS could be several milliseconds, depending on the load being transmitted. Furthermore, the data processing time of the E-SMIS, which expects a huge load from all European DE-SMIS instances and the external data sources, can last up to a second, increasing excessively the total transmission time. Therefore, the interface I4 provides a direct communication path, which enables the transmission of an attack pattern within the time limits of a few milliseconds. The interface I4 also enables the DE-SMIS which receive the attack patterns to compare these patterns with their own private data and draw more efficiently conclusions about their current status. The interface I4 can be established at a local or national level, as shown in Figure 12. DE-SMIS instances from a same region can communicate, but they do not communicate with a DE-SMIS from another region. Note that each DE-SMIS shares with the E-SMIS instance a larger amount of data than the rest of the DE-SMIS instances. This is because the E-SMIS uses the DE-SMIS public data to detect threats on a larger level than what each DE-SMIS could possibly do by combining its private data and the public data of any other DE-SMIS.

Figure 12: SUCCESS Security Monitoring Solution. I4 between DE-SMIS instances located in the same region

Furthermore, the interface between the DE-SMIS could substitute interface I5 in the emergency case when the E-SMIS may not be operational, which would initiate the fallback mode described in Ch. 4.3.2.4 and shown in Figure 13. In this case, each DE-SMIS would receive the public data from all the other DE-SMIS. This mode also stresses the importance of interface I4 within the SUCCESS solution, as it provides a decentralised and more fault-tolerant approach to the security monitoring system.

SUCCESS D4.5 v1.0

Page 38 (61)

Figure 13: SUCCESS Security Monitoring Solution. Fallback mode in case of E-SMIS unavailability: I4 between all DE-SMIS

Similarly, as in the interface 5, the information shared through I4 needs anonymization towards creating a privacy firewall between the different DE-SMIS entities.

4.5.5 I5 between DE-SMIS and E-SMIS

Interface 5 (I5) in Figure 11 comprises the exchange between E-SMIS and its corresponding DE-SMIS instances. As described in Ch.4.3.2.1, the data on the DE-SMIS are in the domain of the DSO/TSO. Thus, the processing of these data can include information the DSO/TSO has from his own network. As part of the interface I5, the DE-SMIS performs an anonymisation and any further necessary aggregative processing to create a privacy barrier/firewall to ensure that the E-SMIS (and the rest of the DE-SMIS instances at the respective interfaces I4) does not have access and cannot deduce any secondary/private information regarding the underlying data. Interface 5 thus separates the E-SMIS data domain from the DE-SMIS data domain. The E-SMIS data domain can be considered as “public” within the ESMC system, since the E-SMIS aggregates and processes the data and patterns from all DE-SMIS, including open data from external sources. On the contrary, the DE-SMIS data domain is the “private” data domain for each DSO/TSO. The data that the DE-SMIS shares with the E-SMIS includes pattern-specific data, even if the DSOSMC cannot detect any significant abnormalities and of course any further identified attacks/threats. The E-SMIS will share through I5 any identified patterns, weighted with the results from external data sources (see Ch. 4.5.6). Since these data comprise information that is “public”, the E-SMIS part of I5 does not need a further anonymisation process.

4.5.6 I6 between E-SMIS and External Data Sources

Interface 6 in Figure 11 comprises the communication content between E-SMIS and external data sources. External data sources may vary significantly in the type of data that they may provide. For example, at the current stage of development, it is being examined whether social media can provide meaningful data. Such messages could be analysed at the pan-European level concerning issues that are relevant for IT-Security in critical infrastructures. Interface 6 must ensure that the data from these sources are processed and filtered adequately. Hence, social media analysis is performed by E-SMIS and the results are shared with DE-SMIS instances by Interface 5 (see Ch. 4.5.5).

4.5.7 I7 between DE-SMIS and internal data sources

DE-SMIS can further receive data from internal data sources by Interface 7 in Figure 11. This data is called internal since they are within the DSO/TSO data domain, thus they are DE-SMIS private data. Such data can potentially stem from (1) energy-related sources such as SCADA systems and control room software; and (2) IT-security sources such as SIEM systems, firewalls or antivirus scanners. The functionality of interface 7 focuses on extracting the necessary content, similar to the functionality of interface 6. Interface I7 will form part of SUCCESS API along with I3. See Deliverable 4.2 and succeeding versions for more details about the SUCCESS API. Please note that the description of the SUCCESS API in Deliverable 4.2 [3] is updated by this document.

SUCCESS D4.5 v1.0

Page 39 (61)


4.5.7.1.1 Channel 1

Sender: Control Room Software

Receiver: DE-SMIS

Short Description: Control room event found

Format: CSV

Content: The control room software, such as the one from PSI, typically contain algorithm which log security-related events such as log-in/log-out of privileged users or the creating of a new privileged user. Such events are send to DE-SMIS.

4.5.8 I8 between DE-SMIS and Edge Cloud

DE-SMIS can receive the following info from the Edge Cloud:

• Cloud resources status;

• Virtual functions topology, describing the relationships among virtual functions and

physical grid resources (physical network segments).

DE-SMIS can send instructions for the virtual instances relocation to the Edge Cloud. E.g. which virtual instance will be migrated, shutdown etc.


4.5.8.1.1 Channel 1

Sender: Edge Cloud

Receiver: DE-SMIS

Short Description: Edge Cloud events

Format: JSON

Content: The edge cloud continuously monitors its CPU resources and sends it to DE-SMIS.

5. Conclusion

This deliverable describes several components in the SUCCESS Security Monitoring Solution: the pan-European Security Monitoring Centre (ESMC), the DSO Security Monitoring Centre and the Breakout Gateway. All of these components are involved in the detection of security incidents and the implementation of countermeasures to mitigate the incidents. The set of countermeasures to the cyber-security threats applicable for electrical grids has been outlined in this deliverable. The SUCCESS Security Monitoring Solution will be demonstrated in three field trials in electrical grids in Ireland, Italy and Romania. Each field trial implements use cases where cyber-attacks will be emulated and the relevant countermeasure applied using the components of the SUCCESS Security Monitoring Solution. The implementation and testing of the components will be performed in the coming months, as will the field trials. The new knowledge and experiences learned will be fed back into the two further versions of this document which will be prepared later in the SUCCESS project.

SUCCESS D4.5 v1.0

Page 40 (61)

6. References

1. SUCCESS D1.2 v1.0, “Identification of existing threats V1”, April 2017 2. SUCCESS D1.4 v1.0, “Threat Classification and Risk Analysis”, October 2017 3. SUCCESS D4.2 v1.0, “Solution Architecture and Solution Description, V2”, April 2017 4. SUCCESS D3.4 v1.0, “Information Security Management Components and Documentation,

V1”, January 2017 5. SUCCESS D3.7 v1.0, “Next Generation Smart Meter, V1”, April 2017 6. SUCCESS D4.8 v1.0, “Integration and Validation Plan - Test and certification specifications,

V2”, July 2017 7. SUCCESS D5.1 v1.0, “Trial Site Planning”, April 2017 8. Angioni, Andrea, et al. "Coordinated voltage control in distribution grids with LTE based

communication infrastructure”, Environment and Electrical Engineering (EEEIC), 2015 IEEE 15th International Conference on. IEEE, 2015.

9. Dahlman, Erik, Stefan Parkvall, and Johan Skold, “4G: LTE/LTE-advanced for mobile broadband”, Academic press, 2013.

10. Dohler, Mischa, and Takehiro Nakamura, “5G Mobile and Wireless Communications Technology”, Eds. Afif Osseiran, et al. Cambridge University Press, 2016.

11. Pau, Marco, et al. "Low voltage system state estimation based on smart metering infrastructure”, Applied Measurements for Power Systems (AMPS), 2016 IEEE International Workshop on. IEEE, 2016.

12. 3rd Generation Partnership Project “Technical Specification Group Services and System Aspects; Generic Authentication Architecture (GAA); Generic Bootstrapping Architecture (GBA)” Release 13, 2016

13. ECOSSIAN D1.2 V1.0,” Requirements Report “, March 2015, stored at http://ecossian.eu/downloads/D1.2-Requirements-PU-M09.pdf (accessed 20161223)

14. Teixeira, André, Dán, György, Sandberg, Henrik, Berthier, Robin, Bobba, Rakesh B. and Valdes, Alfonso, “Security of Smart Distribution Grids: Data Integrity Attacks on Integrated Volt/VAR Control and Countermeasures',' in Proc. of American Control Conference (ACC), Jun. 2014

15. Nick McKeown, Tom Anderson, Hari Balakrishnan, Guru Parulkar, Larry Peterson, Jennifer Rexford, Scott Shenker, and Jonathan Turner. 2008. OpenFlow: enabling innovation in campus networks. SIGCOMM Comput. Commun. Rev. 38, 2 (March 2008), 69-74. DOI=http://dx.doi.org/10.1145/1355734.1355746

16. Perez, Ronald, Reiner Sailer, and Leendert van Doorn. "vTPM: virtualizing the trusted platform module." Proc. 15th Conf. on USENIX Security Symposium. 2006

17. Danyliw, R., Meijer, J. and Demchenko Y.: The Incident Object Description Exchange Format, 2007. https://www.ietf.org/rfc/rfc5070.txt, last access: 19th June 2017

18. Farnham, G. and Leune, K.: Tools and Standards for Cyber Threat Intelligence Projects. GIAC (GCPM) Gold Certification, 2013.

19. ENISA (European Union Agency for Network and Information Security): Detect, SHARE, Protect – Solutions for Improving Threat and Data Exchange among CERTs, 2013.

20. Settanni, G. , Skopik, F. , Shovgenya, Y. , Fiedler, R. , Carolan, M. , Conroy, D. , Boettinger, K. ,Gall, M. , Brost, G. , Ponchel, C. , Haustein, M. , Kaufmann, H. , Theuerkauf, K. , Olli, P. “A collaborative cyber incident management system for European interconnected critical infrastructures” Journal of Information Security and Applications, May 2016

21. https://stixproject.github.io/, last access: 19th June 2017 22. http://nationalinterest.org/blog/the-buzz/how-california-protecting-its-critical-infrastructure-

cyber-18366, last access: 19th June 2017 23. Moriarty, K.: Real-time Inter-network Defense (RID), 2012, https://tools.ietf.org/html/rfc6545,

last access: 19th June 2017 24. Kreps, J., Narkhede, N. and Rao, J.: Kafka: a Distributed Messaging System for Log

Processing, Proceedings of 6th International Workshop on Networking Meets Databases (NetDB), 2012

25. http://www.trustedcomputinggroup.org/trusted-platform-module-tpm-summary

http://ecossian.eu/downloads/D1.2-Requirements-PU-M09.pdf

https://www.ietf.org/rfc/rfc5070.txt

https://stixproject.github.io/

http://nationalinterest.org/blog/the-buzz/how-california-protecting-its-critical-infrastructure-cyber-18366

http://nationalinterest.org/blog/the-buzz/how-california-protecting-its-critical-infrastructure-cyber-18366

https://tools.ietf.org/html/rfc6545

SUCCESS D4.5 v1.0

Page 41 (61)

7. List of Abbreviations

3GPP 3rd Generation Partnership Project, mobile communications standardisation body 4G 4th Generation mobile communications system 5G 5th Generation mobile communications system API Application Programming Interface BR-GW Breakout Gateway BSF Bootstrapping Server Function CA Certificate Authority CS Cyber-security DCS Data Centric Security DoS Denial of Service DDoS Distributed Denial of Service DE-SMIS Distributed part of European Security Monitoring and Information System DSO Distribution System Operator DSOSMC DSO Security Monitoring Centre ESMC Pan-European Security Monitoring Centre E-SMIS Central part of European Security Monitoring and Information System EPC Evolved Packet Core, core network of LTE system E-UTRAN Evolved Universal Terrestrial Radio Access Network, part of EPC network EV Electric Vehicle FLISR Fault Location, Isolation, and Service Restoration FIWARE Future Internet Ware, open source components, development environment, platform GBA Generic Bootstrapping Architecture GE Generic Enabler HSS Home Subscriber Server, part of EPC network ICT Information and Communication Technology LTE Long Term Evolution (4th Generation mobile communications system) M2M Machine-to-Machine MME Mobility Management Entity, part of EPC network NAF Network Application Function NAN Neighbourhood Area Network NFV Network Function Virtualisation NORM New-generation Open Real-time Smart Meter P-GW Packet Data Network Gateway, part of EPC network PMU Phasor Measurement Unit PUF Physically Unclonable Function PS Physical security QoS Quality of Service SCADA Supervisory Control and Data Acquisition SDN Software Defined Network S-GW Serving Gateway, part of EPC network TEE Trusted Execution Environment TPM Trusted Platform Module TSO Transmission System Operator UE User Equipment, handset in EPC network USM Unbundled Smart Meter UUID Universally Unique Identifier VM Virtual Machine

SUCCESS D4.5 v1.0

Page 42 (61)

A. Descriptions of Security Incidents and Countermeasures

A.1 Incident/countermeasure CS-1

A.1.1 Description

Incident name: Device behaving suspiciously in terms of communication pattern or content Since both suspicious traffic patterns and suspicious message content result in the same countermeasure being used, both types of incidents are covered by one countermeasure, CS-1. However, as the incidents of the two differ, the description in this chapter will be divided to two sub-chapters accordingly. This type of incidents and corresponding countermeasures will also be implemented at trial sites and there are trial site use case specific sub-chapters in this chapter covering those cases.

A.1.1.1 Suspicious communication patterns

Incident characteristics: The threat, or incident, to which this countermeasure is applied is when a device in the network behaves suspiciously with respect to communication patterns. In practice, this means that the communication pattern does not match the typical one for a device of this type e.g. with respect to the frequency of messages, size of message or communication peer, i.e. too many or too few packets, possibly sent too often or seldom either to expected or unexpected peers. Root cause: The cause for these suspicious behaviours, if indeed erroneous, stem from the device malfunctioning, which can be a result of a bug, configuration error or the device has been tampered with. Furthermore, an attacker on the path between the communication end-points could modify the traffic pattern as could an attacker that has compromised the device and modified its configurations. The software of devices in the network should be updated regularly to remove found vulnerabilities or other unwanted features and to add new functionality to add value. Software here means anything from firmware to operating system, drivers, applications and configurations. Even when these are updated/modified with good intent, it is possible that it results in unwanted behaviour due to bugs or erroneous configurations. Also, hardware bugs/breakdowns are possible and can result in similar behaviour. Incident identification: Identification of incidents where the communication pattern is modified is done by having the monitoring centre verifying traffic in the network against accepted/expected traffic patterns. This might be done as a collaborative effort, with nodes in the system reporting on suspicious behaviour. When the traffic differs from the expected more than a set threshold it is an indication that the incident has occurred and the appropriate countermeasures should be taken. An example of traffic pattern being suspicious could be if a high number of commands are received at the corresponding local devices (NORM). This would be an indication that the orders may be malicious and that they have to be invalidated before being executed.

A.1.1.2 Suspicious message content

Incident characteristics: The threat, or incident, to which this countermeasure is applied is when a device in the network behaves suspiciously with respect to the content of the communication. This can either be by having the device provide data that is outside the normal range and thresholds defined for the device type or even by having the data be of unexpected type. This can also mean that the data pattern acquired by the device does not match in terms of data consistency with the data acquired from other devices in the power grid, e.g. the frequency, voltage angles or voltage level range acquired from the NORM device from a specific network location is not consistent with the data acquired from other different power network places. Furthermore, messages with invalid integrity or confidentiality protection are also a sign of something being wrong. It should be noted that to avoid manipulation of private data, NORM sends in principle only non-private data, usually grid-related and not personal-related data. Root cause: The cause for these suspicious behaviours, if indeed erroneous, stems from the node or its credentials have been compromised. This means that an attacker has managed to gain control of the device or its credentials and fake data is now transmitted to the different actors,

SUCCESS D4.5 v1.0

Page 43 (61)

e.g. to the DSO. This is typically catered for by having security flaws in the device software that the attacker can abuse for gaining unauthorised access. This then links back to the previous sub-chapter and the need to update the software of the device. The alternative is that the attacker generates traffic without being able to provide valid integrity and/or confidentiality protection and authentication of the data and its end-point. Unauthenticated or unauthorised messages, with invalid integrity or confidentiality protection is trivial to identify, and acts as an alarm that someone is trying to attack the system. It is a signal of either an attacker generating own traffic or modifying traffic sent in the network. This specific attack type is dealt with in CS-3. Assuming the communications and physical security measures discussed in deliverable D4.2 [3] have been implemented the only way an attacker can produce legitimate looking data is by gaining access to the device and/or its credentials. With end-to-end security applied, an attacker will not be able to inject data or generate own valid looking messages without approved credentials. For cases where the attacker has access to valid credentials identifying the incident/actions might be anything from easy to impossible depending on how aggressive the attacker is. Hardware faults and malfunction can also result in similar behaviour, as can also situations where the validity time of a certificate has expired, but is not the main focus of this work as we are here dealing with attacks rather than device failures. Incident identification: When the attacker has access to the target node and its credentials he can either modify the behaviour of the node, as discussed earlier or modify the content of the communication. In this case, the monitoring centre will notice the incident by either analysing the received data against the expected or by comparing consistency of data acquired from different electrical grid points, which may include also commands such as order of local breaker disconnection. When statistical deviations of selected parameters (frequency, voltage level and voltage angles pattern) differs from thresholds (signalisation and alarm levels), it is an indication that the incident has occurred and the appropriate countermeasures should be taken.

A.1.1.3 Countermeasure

The analysis of the devices’ traffic is performed by DSOSMC and DE-SMIS at the DSO/TSO level. As described in Ch. 4.3.1 and Ch. 4.4, DSOSMC provides certain data features such as RAM usage to DE-SMIS. DE-SMIS correlates this information with information from other sources (see Chs. 4.5.5 and 4.5.6) to extract new findings about the system’s status. If the affected device is part of the (edge) cloud, the VMs deployed on it as part of the double virtualisation solution should as a precaution be moved to another location to maintain the functionality they provide and prevent the attack from spreading. Functional VMs should be re-deployed, i.e. they will start with a clean state without any potentially harmful modifications that exist in the suspiciously behaving device. Data VMs should be migrated to not lose any data they store. In addition, the network needs to be re-configured to route the associated traffic via the new VM instances/locations rather than the original, potentially infected, ones. After the incident identification, remote attestation should be performed on the suspiciously behaving device. This will verify the state of the device and can find indications of compromising of devices. In CS-1, we assume the attestation does not fail, i.e. the device state is OK. Thus, the device should be red-flagged, indicating there is an unresolved issue with it, and it (preferably only the affected functional VM) should be isolated from the network pending further investigation.

A.1.1.4 Relation to threats

T000: The incident and countermeasure are applicable when the source of the DoS attack is a node in the system. Attacks by external sources of DoS are covered by CS-5. T100: All threats except T111, which is more related to physical security (EM), can be handled. T107 is not covered here, but should not be a big issue in a system with proper security configuration. Only the traffic pattern and message sizes are available to the attacker if the content is deemed to need confidentiality protection. T200: Whenever a malware modifies the behaviour of the targeted host it will be covered by this incident/countermeasure. Also, malware that has a control channel to the attacker for gathering data or reporting back can be identified based on untypical communication patterns.

SUCCESS D4.5 v1.0

Page 44 (61)

T300: From this threat group, the communication needed to the target device for modifying SW will be noticed as untypical communication. In addition, T301 will modify data and be caught either here or in CS-3. T307 relates to physical security and is handled there. In addition, the HW manipulation might lead to suspicious network behaviour and could be caught here as well. T400: Most threats in this group result in changed traffic patterns and will be caught here. A solution for T405 is being worked on in the project, but could potentially also be caught by comparing values reported from across the network and identifying discrepancies. T410: This will be identified as an abnormal traffic pattern. In addition, password protected systems should as default have limits for re-try attempts to counter these types of attacks. T420: If the data is sent over the network, it will be identified as unusual traffic pattern. In case the data is accessed physically from the device, the physical security incident/countermeasures will be alerted to the attack and device logs should be able to provide information of actions in the device. T700: Only some of the threats are covered here; T701 if data is sent out from the system, T702 potentially if the virus is transferred over the network, T703, T704 assuming the device starts to behave strangely and T705-T707 if the information is sent over the network.

A.1.1.5 Concrete use case from Irish trial

The incident description that would result in countermeasure CS-1 to be applied covers a lot of different threats and attacks. One concrete example is given in deliverable 5.1, in Ch. 2.1.2, regarding the Irish trial. The use case is an attacker that wants to destabilise the network by issuing a TSO command for load interrupt of EV chargers, possibly in massive scale. To increase the effect the attacker could even toggle (on-off-on-….) the command to create big fluctuations. When this type of commands, regardless if they are genuine or not, are issued the DSO must react to them instantaneously, there is no time for human interaction but the response must be automated. The countermeasure to this type of attack would be for the DSO to verify that the network indeed is in a state that the received commands are warranted. Basically, the DSO verifies from its own network state information that the TSO has sent the commands based on a real issue in the network. The NORMs either provide, or can be queried by the DSO about, grid frequency evolution, from which aspects related to grid stability at distribution level can be inferred. The DSO can thus verify that the received command is indeed a response to instability in the network and that the command should be executed. If the NORMs do not report any instability (by analysing the level of grid frequency measured locally by NORM), the DSO can decide to block the received commands. One thing to note is that this countermeasure would be applied always when the TSO requests, which could have a significant impact on the traffic level received at the DSO. The commands could be spoofed, i.e. the attacker does not have access to the TSO and its systems. In this case, the DSO will notice this from the fact that the message does not have a valid signature, is not coming from an authenticated end-point or is not encrypted with the TSO/session key or all of the above. However, if the command is received from the TSO, e.g. because a disgruntled employee with suitable access rights sent it, it would not be possible to detect the attack from the message itself. In this case, the solution described above would be used for detecting the attack. The countermeasure to the attack, would also contain additional steps to the blocking of the malicious commands. An alarm should be raised at the DSO that the TSO has issued a malicious command. This would result in an investigation of where the command originated and who issued/authorised it. This partly correlates to the remote attestation step and the investigation of the generic countermeasure in CS-1. In addition, DE-SMIS should be made aware of the attack attempt as it could be a part of a larger scale attack targeting multiple network. E.g. two neighbouring networks that get targeted with this attack in suitable synchronisation could be even more devastating than a single occurrence. The identified source of the command, if provided via double virtualisation, could be re-deployed at a new location to counter cases where malware or an attacker in the device is the issuer of the command. Remote attestation could also be performed to rule out malicious configurations and alternations of the suspected node. However,

SUCCESS D4.5 v1.0

Page 45 (61)

as the source of the attack is outside the DSO domain, the attestation of the node should be done by the TSO after being informed of the incident by the DSO.

A.1.1.6 Concrete use case from Romanian and Italian trials

The correct measurement and reporting of grid data is essential for any system to be able to use it efficiently for control and optimisation. In case of cyber-penetration at the level of measurement data, the message content might get modified. The incident description that would result in countermeasure CS-1 being applied, which covers a lot of different threats and attacks. One concrete example is the use case when an attacker that wants to destabilise the network by sending malicious / wrong grid data to the DSO. When this type of suspicious reports is received by the DSOSMC, they are assessed for consistency with such data received from other NORM devices, and an abnormal evolution can lead to alarm flags being raised. The particular data which is scrutinised for grid consistency are the voltage levels and frequency values, which are both essential for the grid assessment and are non-private by nature, appropriate to be analysed on a massive scale, meaning from any smart meter, e.g. from each NORM device.

A.1.2 Related SW Functions

From the communication perspective, the data centric security function in BR-GW described in Ch. 4.2.4 can be utilised to verify the state of the device. During device deployment, a signature based on device parameters is generated from data centric security function. This signature is taken as base for verifying state of device in real-time. Whenever the signature differs from the base signature taken during deployment, the BR-GW can act to block the traffic originating from that device. Also, it can further be used to notify DSOSMC regarding status of infected device. DSOSMC performs for online monitoring of network traffic patterns, deduction of usual behaviour and detection of suspicious behaviour (e.g., through threshold alerting when exceeding normal traffic levels deduced from historic monitoring data). DSOSMC performs for online monitoring of consistency of acquired data from the local devices (with focus on NORM devices) of the power network, deduction of usual behaviour and detection of suspicious behaviour (e.g., through threshold alerting when exceeding a certain inconsistency levels deduced from historic monitoring data or a suspicions number of commands which may disconnect the prosumer). When suspicious behaviour is detected this information is addressed to DE-SMIS which executes a meta-analysis. DSOSMC should itself also potentially react to this incident by moving VMs according to double virtualisation, performing remote attestation on suspiciously behaving nodes etc.

A.1.3 Related Infrastructure

Once DSOSMC has detected a security violation, it can first apply VM moving according to double virtualisation if possible. It tells BR-GW that one or more NORMs are compromised and must be disconnected and optionally their data has to be discarded. DE-SMIS at TSO/DSO level receives data from DSOSMC and combines them with further data sources such as firewall logs (see Figure 11) to extract new findings about the systems status. Therefore, DE-SMIS can generate a holistic view of the system to red-flag devices which cannot be trusted any more. After red-flagging the device, DE-SMIS shares the potential finding with E-SMIS. E-SMIS then searches on a pan-European level for significant patterns which indicate an attack. All new findings obtained by E-SMIS or DE-SMIS can potentially lead to a new strategy how to red-flag devices. However, how to proceed with these findings and how to implement a new red-flagging strategy is outside the scope of the success project. The cloud environment that applies the Double Virtualisation logic can be used to move the attacked virtual instances to another physical or logical entity. The moving process can be fully automatised. E.g. attacked data can be moved to another server or IP network. The implementation method used for the Double Virtualisation infrastructure considers also decoupling of applications and data. In fact, the data coming from the field devices is stored in a dedicated virtual instance managing only the data storage. In this way, the migration of both

SUCCESS D4.5 v1.0

Page 46 (61)

applications and virtual devices will be easier and faster, since it will not require also the migration of a large number of historical datasets.

SUCCESS D4.5 v1.0

Page 47 (61)

A.2 Incident-countermeasure CS-2

A.2.1 Description

Incident name: Remote attestation fails Incident characteristics: This incident is an alternative branch of the flow of CS-1, and relates to the situation where the remote attestation fails for the suspiciously behaving device. This indicates that there is indeed something seriously wrong with the node. Root cause: The fact that the remote attestation fails means that the state of the node is not acceptable, i.e. the node has been modified in some unacceptable way. This could be by malicious action or fault/bug in the system. Incident identification & countermeasure: Once the remote attestation fails the node is deemed to be compromised and it needs to be immediately isolated from the network. A maintenance unit needs to be sent to the node to reset and re-bootstrap the device. Re-bootstrapping means the device will get new credentials, which is necessary since it is possible that the device has been compromised by an attacker and the credentials can have been revealed. The old credentials of the device need to be revoked from the system. Also, the credentials of the VMs potentially run on the device need to be updated.


The same threats are covered as for CS-1 as this one is an extension of it. T000: The incident and countermeasure are applicable when the source of the DoS attack is a node in the system. Attacks by external sources of DoS are covered by CS-5 T100: All threats except T111, which is more related to physical security (EM), can be handled. T107 is not covered here, but should not be a big issue in a system with proper security configuration. Only the traffic pattern and message sizes are available to the attacker if the content is deemed to need confidentiality protection. T200: Whenever a malware modifies the behaviour of the targeted host it will be covered by this incident/countermeasure. Also, malware that has a control channel to the attacker for gathering data or reporting back can be identified based on untypical communication patterns. T300: From this threat group, the communication needed to the target device for modifying SW will be noticed as untypical communication. In addition, T301 will modify data and be caught either here or in CS-3. T307 relates to physical security and is handled there. In addition, the HW manipulation might lead to suspicious network behaviour and could be caught here as well. T400: Most threats in this group result in changed traffic patterns and will be caught here. A solution for T405 is being worked on in the project, but could potentially also be caught by comparing values reported from across the network and identifying discrepancies. T410: This will be identified as an abnormal traffic pattern. In addition, password protected systems should as default have limits for re-try attempts to counter these types of attacks. T420: If the data is sent over the network, it will be identified as unusual traffic pattern. In case the data is accessed physically from the device, the physical security incident/countermeasures will be alerted to the attack and device logs should be able to provide information of actions in the device. T700: Only some of the threats are covered here; T701 if data is sent out from the system, T702 potentially if the virus is transferred over the network, T703, T704 assuming the device starts to behave strangely and T705-T707 if the information is sent over the network.

SUCCESS D4.5 v1.0

Page 48 (61)


In case of failed attestation, BR-GW can put the device in its black list and will not accept traffic from the device anymore. DSOSMC performs remote attestation. If remote attestation fails, BR- GW informs DSOSMC about the failed node attestation and BR-GW blacklists the device.


Breakout Gateway, DSOSMC. DSOSMC shows to the DSO operator a set of possible countermeasures that can be put in place: • Send a maintenance unit to reset and re-bootstrap the device. • Disconnect the device from the grid sending information to BR-GW

SUCCESS D4.5 v1.0

Page 49 (61)


A.3.1 Description

Incident name: Unauthorised messages Incident characteristics: Network sees unauthorised messages, meaning the messages might not have a (valid) signature or might be of wrong type considering who is sending/receiving the messages. Root cause: Someone tries to send unauthorised messages, possibly with the intent to disrupt the system or gain information about the system. Incident identification & countermeasure: Either the monitoring centre notices these unauthorised messages or a node in the system can report to the monitoring centre about the unauthorised messages it sees. The first step is to try to identify the source of these messages, which could be either a node registered in the system or an unauthorised node attached to the system. Based on the source address of the unauthorised messages the node and network segment in which the node resides can be identified. For attacks including spoofing of source address a more detailed analysis of network behaviour and the suspicious flow need to be done to identify the network segment and/or node from where the messages are coming. If the messages are originating from a node in the system, countermeasure CS-1 should be applied on that node to verify its state. If the source of the messages is not part of the system, it means there is an unauthorised node in the network and the network segment from which the node is sending the messages should be isolated to minimise the damage. Isolation might here mean isolating just the traffic of the one unauthorised device so as to not disrupt the operation of the rest of the system which resides in or close to the network segment where the unauthorised device is operating. Finally, the network segment should be investigated to find the unauthorised node, or its point of entry. The result of the investigation should be identification of the node and/or point of entry to the network, and removing the node from the network as well as pre-emptive operations to prevent similar future attacks.


T000: Depending on DoS attack, the DoS messages will be categorised as unauthorised messages, e.g. due to being unexpected type or from unexpected source. T100: Since mutual authentication should be used an attacker on the path will be noticed if he tries to do a man in the middle or similar type of attack where he impersonates a peer. T107 is not covered here, but should not be a big issue in a system with proper security configuration. Only the traffic pattern and message sizes are available to the attacker if the content is deemed to need confidentiality protection. Also, T111 is not covered here as discussed in CS-1 and CS-2. T200: If the malware is capable of using the credentials of the device it will be able to send authorised messages so it will not be caught here. However, CS-1, and CS-2 at least partly covers those cases. T300: T301 will result in unauthorised messages and will be caught. Also, T301-T307 might be identified here depending on how they are executed towards the device, i.e. is the device expecting authenticated messages.


The GBA function of the Breakout Gateway, described in Ch.4.2.4, can be utilised for the authenticating the application messages based on SIM card authentication done by cellular network provider. Furthermore, NMF function can check for the unauthorised transmission of packet on a network address. Based on detection of such events, action can be taken by BR-GW to isolate the device from network.


Breakout Gateway, DSOSMC, DE-SMIS.

SUCCESS D4.5 v1.0

Page 50 (61)

If an attack is detected by DSOSMC, it can attempt to move VMs according to double virtualisation. If this does not succeed, an alarm will be invoked at the DSOSMC and DSOSMC will inform BR-GW to block certain devices/traffic. If DSOSMC cannot detect the attack, it will be left for DE-SMIS to make the decision. DE-SMIS has a wider scope of knowledge and with machine learning algorithm can decide if disconnect the node.

SUCCESS D4.5 v1.0

Page 51 (61)


A.4.1 Description

Incident name: Virus detected in node. Incident characteristics: A virus is found in a node either based on antivirus software in the node identifying the virus or monitoring centre identifying communication pattern of a node matching the pattern of a virus infection. Alternatively, the result from investigation in countermeasure CS-1 might show that there is a new malware on the loose, which has affected the node. Root cause: Malware has found its way to a node. Incident identification & countermeasure: If the affected device is part of the (edge) cloud, the VMs deployed on it as part of the double virtualisation solution should be re-deployed to prevent the malware from spreading. Both functional and data VMs might have the virus so just moving them is not an option, but instead fresh instances need to be initialised. If the malware is suspected to reside in the physical host, then the VMs should be re-deployed in another physical device. The device (if suspecting device contamination), and/or the infected VMs (regardless of if device or VM contamination) should be isolated from the network to minimise risk for further infections and damage. If available, a backup node should be enabled to maintain full system operation. The infected device and/or VMs should be cleaned of the malware, which is most securely done by reinstalling the device and terminating the VMs. For data VMs, the stored data could optionally be backed up so that it could be taken into use after verifying it is not contaminated. After the device is cleaned and the VMs terminated and re-deployed, also the peers need to be verified to not have been infected. If the malware is new, i.e. not part of the known malware definitions, it should be added and the definitions should be updated across the system.


T000: If the DoS is initiated by a malware in the device it will be caught here. T200: T204, T206, T208, T209, T210, T213 might not get detected by anti-virus or similar tools. However, device logs should record at least some of these types of events. T700: CS-4 mainly covers T702, T704.


If NORM detects any kind of virus or any suspicious activity, then it can inform BR-GW so that BR-GW can take some action on it accordingly. In addition, the Network Monitoring Function described in the Ch. 4.2.4 can detect DoS attacks initiated from a device in the system. Once such event is detected, BR-GW can block traffic originating from that device. DoS attacks originating outside the devices of the system are covered in CS-5. Furthermore, action can be notified to the DSOSMC. If a successful infection is detected by DSOSMC, it should initiate the countermeasure and report the incident to DE-SMIS for information sharing purposes.


Breakout Gateway, DSOSMC, DE-SMIS.

SUCCESS D4.5 v1.0

Page 52 (61)


A.5.1 Description

Incident name: DoS suspicion Incident characteristics: The system, or a node in the system is targeted in a (D)DoS attack. This typically means that a huge amount of service requests (anything from an echo request (ping) to service specific API requests) are sent to the victim to exhaust its resources. This can be identified by analysing the traffic (abnormal amount of (similar) requests to the target) or the state of the node (resources exhausted). The node can itself report on this or some other node in the network can report on the abnormal traffic amount. Root cause: An attacker wants to disable parts of or the whole system and launches a (D)DoS attack. Incident identification & countermeasure: When the (D)DoS attack has been identified the associated flows and flow types (e.g. based on message type, protocol, source address and port destination address and port) should be blocked, if possible (it might not be possible to distinguish attack flows from legitimate flows), at the edge of the network so that the attack flow does not have to be processed and forwarded by more nodes than necessary. This can be done by configuring firewalls to block those flows/flow types. In addition, if the affected node is part of the (edge) cloud, the VMs deployed on it as part of the double virtualisation solution should be moved to another location to maintain the functionality they provide and prevent the attack from taking down that part of the service/network. The VMs should be migrated to not lose any data they store. In addition, the network needs to be re-configured to route the associated traffic, except the DoS traffic, via the new VM locations rather than the original attacked one. Especially if the attacked node has services not running on VMs, i.e. services that cannot be migrated, a backup node for the attacked node should be enabled to maintain operation of the system. Furthermore, depending on the attack, the load in the network should be distributed to maintain operation. The attack flows should be analysed to verify it is actually an attack (natural disaster or other similar event could result in a lot of traffic being generated in the network).


T000: CS-5 deals with external DoS sources while CS-1, CS-2 and CS-4 also deal with cases where the DoS source is part of the system. T300: T305 might be identified as a DoS attack. T410: Brute forcing access to a system can be seen as a form of DoS attack as it means the system is bombarded with access requests. T700: CS-5 mainly relates to T703


The Data centric security function described in Ch. 4.2.4 can enable detection of malware insertion at device level. The BR-GW can verify the device integrity by regular checks of the device’s log; it can enable check of any unauthorised software insertion at device level (malware). Furthermore, signatures can be updated based for the firmware updates on the device. The BR-GW, after detecting changes in the device’s log using the Data centric security function, can block the traffic originating from device. After such an incident occurs, peers of infected device can be verifying by data centric security function. In case of a DoS attack on BR-GW itself, SDN technology can handle it and will block the traffic. The DSOSMC can gather information about traffic from NORM and BR-GW. This information then can be analysed using the analytic module, allowing construction of an immune based memory which allows classification of, and dynamical reflection of changes in, the analysed data. It sends

SUCCESS D4.5 v1.0

Page 53 (61)

this information to DE-SMIS to eventually put in place a countermeasure to block the illegitimate traffic flows. If the E-SMIS is targeted in a (D)DoS attack, it is incapable of receiving the public data from the DE-SMIS and external sources and sending back data with identified threat patterns. The fallback operation mode of the DE-SMIS, explained in Ch. 4.3.2.4, should be triggered in this case, so that the information exchange between them is not compromised. The DE-SMIS send all their public data to their adjacent DE-SMIS and an aggregate analysis at DE-SMIS level is possible. If a DE-SMIS is under (D)DoS attack, it red-flags itself to clarify to other DE-SMIS instances that they should avoid sending data. Only the communication path with the E-SMIS is preserved and the data exchange with it may become feasible. The DE-SMIS is able to receive indirectly through the E-SMIS information and threat patterns of their adjacent DE-SMIS instances.


Breakout gateway, DSOSMC, DE-SMIS. Double Virtualisation infrastructure in the cloud environment.

SUCCESS D4.5 v1.0

Page 54 (61)


A.6.1 Description

Incident name: Security algorithm deemed insecure Incident characteristics: Academia, standardisation organisations or other authority states that a security algorithm deployed and used in the network is insecure and should be deprecated. Root cause: Some entity has found a weakness in either an algorithm or a specific implementation of an algorithm. Incident identification & countermeasure: Reliable publication of affected algorithm and proof of weakness. If the weakness is serious and could affect operations, the affected algorithm needs to be deprecated system wide and replaced by another one. Already when designing and deploying a system, three should be defined multiple accepted security algorithms so that there are alternatives to fall back on in these types of scenarios. In addition to deprecating the affected algorithm and configuring the system to use one of the existing alternatives, a suitable new replacement algorithm could optionally also be identified and analysed. Once the new algorithm is approved, it can be installed in all places where the old (vulnerable) algorithm was used.


T100: CS-6 deals with T100 types of threats where the attacker tries to exploit security algorithm weaknesses to get access to information or even modify it, i.e. everything except T110 and T111. T200: If the malware or the injection of it takes advantage of weak security algorithms the threats are covered here. T300: If the manipulation is based on broken security algorithms the threats are covered here.


The cryptographic library that implements and provides the affected vulnerable algorithm needs to be updated and/or the node(s) using it needs to be re-configured to not use the affected algorithm and the new replacing algorithm needs to be taken into use.


Any node in the system that uses cryptographic functions might be affected depending on if they have been configured to use the affected library or vulnerable algorithm. For NORM, in specific cases when deployed security algorithm need more computational power or need changes in the HW which assist the security functionality (e.g. related to hardware needed to implement the PUF), the new algorithms may ask also for hardware upgrades. The Unbundled Smart Meter (USM) concept allows that both hardware and software upgrades can be made without affecting the hardware and functionality of the Smart Meter and of the PMU, as components which communicate with the Energy Gateway.

SUCCESS D4.5 v1.0

Page 55 (61)

A.7 Incident-countermeasure PS-1

A.7.1 Description

Incident name: Perimeter breached Incident characteristics: Unauthorised access to a site of an infrastructure device or other company premises. The site surveillance equipment, e.g. motion sensor, notices non-approved physical action or presence. Root cause: An unauthorised person accesses the site and triggers at least one of the monitoring equipment surveilling the site. Incident identification & countermeasure: An alarm is raised at the monitoring centre indicating the site and what has been observed. Security personnel is dispatched to the site to verify that everything is OK, or handle any situation that has occurred. Any breached or broken protection mechanism (doors, windows, locks etc.) need to be repaired or replaced.


T100: If the attacker tries to gain access to a communication channel by physically inserting himself on the path it can be covered here. T300: PS-1 mainly relates to T307, but could also be applied for T301-T303, T305 and T306 if the exploit requires physical access to the device. T500: This is the main threat area covered by PS-1. Depends on if the attacker has access rights or not if the alarms will be triggered.


None.


Any device that is being monitored against unauthorised access.

SUCCESS D4.5 v1.0

Page 56 (61)


A.8.1 Description

Incident name: Device casing breached Incident characteristics: The casing of a node or end-device is opened without authorisation. A sensor inside the device monitors the state of the case and signals the monitoring centre whenever it is opened. Root cause: Someone tries to open the case, possibly to try to modify the device e.g. to report wrongful meter readings. Incident identification & countermeasure: An alarm is raised at the monitoring centre indicating the node where the casing has been breached. If the incident is at an infrastructure node, security personnel is dispatched to the site as in PS-1. For end-devices this step can be skipped. Then, remote attestation is performed to verify the device state. Once security personnel have verified site integrity and safety, a maintenance person can be sent to the site if deemed necessary (based on visual observation and/or remote attestation result). The maintenance person should reset and re-bootstrap the node to generate new credentials for it and get those credentials properly configured to the backend/system and have the old credentials revoked. If needed, the device should be repaired or even replaced if required and the casing sensor should be re-applied. The same maintenance steps should be taken also for end-devices. Maintenance fees should fall on the owner of the end-device if the device located in owner premises.


T100: Here we mainly deal with T111, and the EM part of it (assuming the casing is protecting against EM radiation). Also, T105 and T112 are relevant when considering attempts to report modified usage statistics etc. T300: PS-1 mainly relates to T307, but could also be applied for T301-T303, T304, T305 and T306 in the case the exploit requires physical access to the device. T400: Relating to cases where the attacker would need physical access to the device to alter its measurements; T404. T500: PS-2 deals mainly with T501 and T504, but could also cover T506.


At the level of NORM, a micro-switch will be considered to monitor the housing of NORM, which should be read by the Energy Gateway of NORM and sent as alert message to the Security Agent running in the Energy Gateway, event which need to be sent to the DSOSMC.


As presented in Ch. A.8.2, the NORM housing will have a micro-switch to monitor eventual opening and intrusion inside for having NORM hardware access. The DSOSMC will get the alarm and initiate the countermeasure.

SUCCESS D4.5 v1.0

Page 57 (61)


A.9.1 Description

Incident name: Communication link unavailable Incident characteristics: The node’s communication link is unavailable, i.e. the node cannot be reached and it cannot communicate with others. Looks like node is not available and as such it is not maybe possible right away to identify the incident as being communication link unavailable. Root cause: For this incident characteristics, multiple incidents are possible (PS-3 – PS-5), and the reason for the device not being available could be either HW based or a result of a cyber-attack. The latter should be covered and protected against by cyber security countermeasures discussed earlier with respect to suspicious behaviour, DoS and malware incidents. For PS-3, the reason for not being reachable is that the communication link of the node is not functioning e.g. due to a physical attack on it or a breakdown. Incident identification & countermeasure: The node does not communicate according to communication pattern (i.e. if we expect to see a message and we do not see it) and cannot be connected to e.g. to perform remote attestation. If available, an attempt can be made to enable secondary/backup communication link of the node. If a secondary access is available, but cannot be enabled, it is likely that the incident is either related to PS-4 or PS-5 and the countermeasures related to those incidents could be tried. Since the node is not reachable it means that its services cannot be utilised. Therefore, if the node is part of the (edge) cloud, the VMs deployed on it as part of the double virtualisation solution should be re-deployed in another location to maintain the functionality they provide. These new instances of the VMs will naturally not have all the up-to-date data as that has been “lost” with the currently unavailable VMs. In addition to re-deploying the VMs, the network needs to be re-configured to route the associated traffic via the new VM instances rather than the original, now unavailable, ones. Next, a maintenance unit should be sent to the site to fix the issue. Initially a reset of the communications unit could help, otherwise it needs to be repaired or even replaced.


T300: Mainly relating to T301, T306 and T307. T500: PS-3 relates to T501 and T506.


DSOSMC will issue alert that the device is unavailable.


Virtual resources (application or data) related to the device with an unavailable link can be migrated and the new link can be established. Any device in the system might have the connectivity issue.

SUCCESS D4.5 v1.0

Page 58 (61)


A.10.1 Description

Incident name: Device power unavailable. Incident characteristics: Node’s power is not available, node cannot operate. Looks like node is not available and as such it is not maybe possible right away to identify the incident as being device power unavailable. Root cause: For this incident characteristics multiple incidents are possible (PS-3 – PS-5) and the reason for the device not being available could be either HW based or a result of a cyber-attack. The latter should be covered and protected against by cyber security countermeasures discussed earlier with respect to suspicious behaviour, DoS and malware incidents. For PS-4, the reason for not being reachable is that the node has lost its power e.g. due to a physical attack on it or a breakdown. Incident identification & countermeasure: Node does not communicate according to communication pattern (i.e. if we expect to see a message and we do not see it) and cannot be connected to e.g. to perform remote attestation. As the power is out, enabling the secondary communications link as in PS-3 will not work. Next, enabling of possibly available backup power can be attempted. If backup power is available but cannot be enabled or enabling it does not solve the problem, the incident is most likely of type PS-5 and could be handled according to it. If backup power is not available, or the device does not recover when enabling it the services running on the node cannot be utilised. Therefore, if the node is part of the (edge) cloud, the VMs deployed on it as part of the double virtualisation solution should be re-deployed in another location to maintain the functionality they provide. These new instances of the VMs will naturally not have all the up-to-date data as that has been “lost” with the currently unavailable VMs. In addition to re-deploying the VMs, the network needs to be re-configured to route the associated traffic via the new VM instances rather than the original, now unavailable, ones. Next, a maintenance unit should be sent to the site to fix the issue. Initially a reset of the power supply unit could help, otherwise it needs to be repaired or even replaced.


T300: Mainly relating to T306 and T307. T500: PS-3 relates to T501 and T506.




At NORM level, due to economic reasons, usually there is no backup supply, to allow functioning when network voltages are missing. This backup solution may exist only in resilient nodes, where power supply for critical loads is in place, based on UPS or other resilient architectures include energy storage. In case of power loss of the Cloud servers, virtual resources can be migrated to another server/zone.

SUCCESS D4.5 v1.0

Page 59 (61)


A.11.1 Description

Incident name: Device unavailable. Incident characteristics: Node is malfunctioning and will not operate. Looks like node is not available. However, the exact cause of why the device is unavailable might not be clear without further investigation. Root cause: For this incident characteristics, multiple incidents are possible (PS-3 – PS-5) and the reason for the device not being available could be either HW based or a result of a cyber-attack. The latter should be covered and protected against by cyber security countermeasures discussed earlier with respect to. suspicious behaviour, DoS and malware incidents. For PS-5, the reason for not being reachable is that the node is malfunctioning. Incident identification & countermeasure: Node does not communicate according to communication pattern (i.e. if we expect to see a message and we do not see it) and cannot be connected to e.g. to perform remote attestation. As the node is malfunctioning, enabling the secondary communications link as in PS-3 or backup power as in PS-4 will not work. To maintain optimal network operation a backup node should be enabled if available to take on the role of the faulty unit. If the unavailable node is part of the (edge) cloud, the VMs deployed on it as part of the double virtualisation solution should be re-deployed in another location to maintain the functionality they provide. If a backup node is available and has successfully been enabled, that is a natural location to re-deploy the VMs. These new instances of the VMs will naturally not have all the up-to-date data as that has been “lost” with the currently unavailable VMs. In addition to re-deploying the VMs, the network needs to be re-configured to route the associated traffic via the new VM instances rather than the original, now unavailable, ones. Next, a maintenance unit should be sent to the site to take care of the node. An attempt to reset and re-bootstrap the node could be done, but if not successfully recovered, the node would have to be repaired or even replaced.


T300: Mainly relating to T306 and T307. T500: PS-3 relates to T501 and T506.




In case of cloud host malfunctioning, virtual resources can be migrated to another server/zone. The incident could happen to any node in the system.

SUCCESS D4.5 v1.0

Page 60 (61)

B. Interfaces

B.1 Example for a RESTful Service of the SUCCESS API: Attack/Incident found (I3-1)

This chapter gives an example for a formal documentation of one service (I3-1, see Error! Reference source not found.) of the RESTful SUCCESS API.

Endpoint URL: [/{version}/incident]

Pushes a cyber-attack or incident after its identification.

Parameters

Name Type Example Description

version String v0.1 The version of the API to use.

Since only a POST request is foreseen from the service, a relevant body request should be defined. Next, the class diagram of the request object is presented, followed by the class diagram of the response, in Error! Reference source not found..

B.1.1 Set a cyber-attack or incident [POST]

Request

• Headers

1. Accept: application/xml 2. Content-Type: application/xml

• Body

3. <?xml version="1.0"?> 4. <IODEF-Document lang="en" xsi:schemaLocation="urn:ietf:params:xml:schem

a:iodef-1.0" version="1.00" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:ietf:params:xml:ns:iodef-1.0"><Incident lang="en" purpose="reporting">

5. <IncidentID name="CSIRT-X">1</IncidentID> 6. <ReportTime>2017-05-19T10:51:36+01:00</ReportTime> 7. <StartTime>2017-05-19T10:51:36+01:00</StartTime> 8. <Description>DDoS attack</Description> 9. <History/> 10. <Assessment occurence="actual"> 11. <Impact type="dos" completion="failed">DDoS attack on double vi

rtualisation system</Impact> 12. </Assessment> 13. <EventData> 14. <Flow> 15. <System category="source"> 16. <Node> 17. <Address category="ipv4-addr">source_host</Address> 18. </Node> 19. </System> 20. <System category="target"> 21. <Node> 22. <Address category="ipv4-addr">target_host</Address> 23. </Node>

SUCCESS D4.5 v1.0

Page 61 (61)

24. </System> 25. </Flow> 26. </EventData> 27. </Incident> 28. </IODEF-Document>

Response 201 (application/json) Created

• Body

1. { 2. "metadata":{ 3. "api_version":"0.1", 4. "created":"2015-01-20T16:49:30+02:00", 5. "updated":"2015-01-20T16:49:30+02:00" 6. }, 7. }

Documents

Description of Available Components for SW Functions