
SUCCESS

D2.4 v1.0

The Resilience by Design Concept V1

The research leading to these results has received funding from the European Union’s Horizon 2020 Research and Innovation Programme, under Grant Agreement no 700416.

Project Name SUCCESS

Contractual Delivery Date: April 30, 2017

Actual Delivery Date: April 30, 2017

Contributors: EDD

Workpackage: WP2 - Security, Resilience and Survivability by Design

Security: PU

Nature: R

Version: 1.0

Total number of pages: 30

Abstract: With the integration of increasing numbers of decentralised power generation stations based on renewable energy sources, automation of the power distribution network is also becoming decentralised. In such situations, resilience becomes a key characteristic to be considered while designing such automated distribution network systems. A combined approach to improved resilience is being investigated in SUCCESS based on adapting Virtualisation techniques to the use case of decentralised automation functions and is termed Double Virtualisation in the SUCCESS project. This allows the automation system to continuously dynamically reallocate the specific controlling functions in use and also enables scaling of the solution with respect to computational power. This report describes how we realised our Double virtualisation concept in the laboratory setup. We report on how Substation Automation Functions have been virtualized, run over the Double Virtualisation implementation and tested on cloud platforms.

Keyword list: Security, Communication, Threat, Countermeasure, Double Virtualisation, Cloud Computing, Virtual Instance

Disclaimer: All information provided reflects the status of the SUCCESS project at the time of writing and may be subject to change.

Executive Summary

Resilience is one of the key characteristics to consider when designing a critical infrastructure system. Next-generation power systems, in which power generation based on renewable energy sources is increasingly integrated and in which the automation architecture is being decentralised to support the functioning of the system, will become vulnerable to failures and cyber-attacks. There is a need to provide resilience functionality to such systems. This is the challenge we address with our work. SUCCESS applies a new double virtualisation concept to power network operations to increase their resilience. A joint reconfiguration logic, based on separating decentralised distribution automation functionality from the physical systems the applications run on, has been implemented by virtualising the decentralised distribution automation functionality and then dynamically relocating it to different physical systems, so that an attacker has to know on which hardware the functionality is currently running in order to attack it. This dynamic reallocation also makes the power system more resilient against single points of failure and cyber-attacks.

Authors

Partner: ERICSSON GmbH (EDD)
Name: Robert Farac
e-mail: [email protected]

Table of Contents

1. Introduction

2. How to read this document

3. Threats and Countermeasures

4. Solution Approaches
  4.1 Cloud Based Solutions
    4.1.1 Cloud Platforms
      4.1.1.1 OpenStack
      4.1.1.2 Proprietary Cloud Platforms
  4.2 Comparison of Approaches

5. Triggering Virtual Control Relocation
    5.1.1 Relocation Triggering by DSO Entity
    5.1.2 Relocation Triggered by the Edge Cloud

6. Implementation
  6.1 Double Virtualisation Concept
  6.2 Use Case: Grid State Estimation
  6.3 Triggering Migration of Virtual Substation Automation Unit
  6.4 Migration of the Virtual Substation Automation Unit
  6.5 Migration of Virtual Devices

7. Conclusions

8. References

9. List of Abbreviations

A. Annex
  A.1 Live Migration on VMware Platform
  A.2 Migration of Virtual Devices

1. Introduction

This report describes the realisation of the Double Virtualisation concept in power systems to enable resilience against cyber-attacks.

The resilience of a system describes its ability to continue functioning even after a failure of one or more of its components. In the current transformation of power grids, power generation is becoming more decentralised, and the automation architecture should adapt by also becoming more decentralised. In such a scenario, resilience becomes a key characteristic for seamless functioning. The power network is built from the bottom up by aggregating control cells, each a combination of virtualised automation functions and power infrastructure. This allows reconfiguration at any time to enable resilience in the system, which acts as a countermeasure against cyber-attacks.

Cyber-attacks are increasingly performed by highly sophisticated groups. The attacks can be directed at both power and communication resources.

The resilience technique explored in the SUCCESS project is mainly based on the cloud computing paradigm. In principle, system resilience is ensured by enabling fast relocation of cloud virtual resources when a security incident (attack) is identified.

With respect to communication technologies, system resilience can be improved by separating the functional layer from the data layer in the virtual environment. The functional layer consists of the decentralised distribution automation functions, and the data layer consists of the database that stores the measurement values on which these automation functions operate. In this way the effectiveness of attacks is reduced: combined attacks targeting both layers simultaneously are complex to execute, since they require different approaches and methodologies to be applied at the same time to different targets. This approach is named Double Virtualisation in SUCCESS.

The technologies used in our work have been well known in the ICT domain for almost a decade. Our innovation is to apply these techniques to the use case of power network resilience. The IoT industrial platform by General Electric ([6]) is an example of a cloud-based infrastructure that is offered to the power sector as well as to other industrial sectors. Double Virtualisation technology itself has been studied in all-electric ship power systems by the US Navy [5].

2. How to read this document

This document builds on the concepts of resilience, among other concepts such as security and survivability, which are introduced in D4.2 [13]. Therefore, the reader of this document should be familiar with the contents of D4.2 before reading this document.

The structure of this document is as follows:

• Threats to which Double Virtualisation is applied as a countermeasure are listed in Chapter 3.
• Several Double Virtualisation solutions are introduced and compared in Chapter 4.
• The triggering mechanism for migrating the virtualised power grid control functions, and the interface between the DSO and the Edge Cloud, are described in Chapter 5.
• How Double Virtualisation is realised in SUCCESS is described in Chapter 6.
• Results and conclusions of the lab test, as well as the future work planned for the second phase of the project, are described in Chapter 7.

3. Threats and Countermeasures

Today, cyber-attacks are directed at both power and communication resources. Deliverable 1.2 [3] contains the identified threats to the system, particularly in Chapter 3.2 “Success Centric Threats”. Those threats are mapped to Double Virtualisation as a countermeasure in Deliverable 4.4 [4], in Chapter 15 “Threat and Countermeasure Mapping”. The threats shown in Table 1 are fully or partially covered by Double Virtualisation as a countermeasure.

D4.4 also lists the incidents to which Double Virtualisation can be applied as a countermeasure. The relevant incidents are labelled as cyber-security incidents (CS-1, CS-3 and CS-5) and physical-security incidents (PS-3, PS-4 and PS-5). How Double Virtualisation mitigates the listed incidents is explained in detail in D4.4.

Table 1 Threats where Double Virtualisation can be applied as a countermeasure

| Threat No. | Threat |
|---|---|
| T001 | Distributed DoS |
| T002 | Smurf attack |
| T003 | TCP/SYN flooding |
| T004 | UDP flooding |
| T005 | Teardrop attack |
| T101 | Man-in-the-middle |
| T102 | Eavesdropping |
| T103 | Masquerade |
| T105 | User Impersonation |
| T106 | Service spoofing |
| T107 | Traffic analysis |
| T109 | Disclosure |
| T110 | Replay attacks |
| T112 | Deception |
| T113 | Session hijacking |
| T201 | Malware |
| T202 | Virus/Worms |
| T203 | Trojan Horse |
| T204 | Trapdoor |
| T205 | Rootkits |
| T206 | Elevation of privileges |
| T207 | Spyware |
| T208 | Fake SSL certificate |
| T209 | Signed malware |
| T307 | Rogue hardware |
| T401 | IP hijacking |
| T403 | DNS poisoning |
| T404 | Falsification of record |
| T405 | Time synchronization attack |
| T410 | Brute Force |
| T501 | Theft of fixed hardware |
| T505 | Unauthorized physical access |
| T605 | Loss of information in the cloud |
| T701 | Stealing sensitive data |
| T702 | Injection of Viruses |
| T703 | Disturbing availability |
| T704 | Compromised actors |
| T705 | Accidental leaks/sharing of data by employees (naive insiders) |
| T706 | Malicious insiders (saboteur, disloyal employees) |
| T707 | Tech savvy insiders |

4. Solution Approaches

This section gives a description of the solution designed to realise the concept of double virtualisation. The solution consists of cloud based computing platforms, in which fast relocation of the virtualised control functions is triggered, at the time of occurrence of an incident. The incident can occur due to system failure or due to a cyber-attack. The virtual control functions are realised with virtual machines [16].

4.1 Cloud Based Solutions

In the Double Virtualisation concept (see Figure 1), the functions, the data and the devices’ representations on the cloud platform infrastructure are deployed in separate logical layers, namely the Functional Layer and the Data Layer. The Functional Layer contains the virtualised control functions, in which the power system monitoring and control applications are deployed. The Data Layer hosts the grid data and the virtual devices, which are virtual representations of measurement devices installed in the field, e.g., Smart Meters (SM) or Phasor Measurement Units (PMU).

Figure 1: Double Virtualisation concept

Note that in terms of the SUCCESS infrastructure model, the Double Virtualisation concept is applicable on the application layer. When applying Double Virtualisation, a system on the Application layer is separated into functional and data-related parts.

Virtual control functions can be migrated between isolated locations (zones) within data centres. At least two zones can be defined, and they can be logically and/or physically separated. For example, zones can belong to different networks or be deployed on different physical servers.

The virtualisation of the power system applications in the Functional Layer aims at reinforcing the resilience of the SUCCESS architecture by allowing the applications to be relocated to a new safe zone when a security incident or a failure is detected in the hosting zone (the zone on which the virtual control functions are deployed). A safe zone is a location in which no component failure has been observed.

The virtualisation of the field devices’ representation has the main purpose of hiding the physical devices installed in the field behind their virtual representations. By doing this, the only exposed entities will be the virtual devices and any attack aiming at the devices will impact these virtual devices. When an attack is detected, it can be isolated by deactivating the connection between the physical and the virtual device. A new virtual device control function is then started up in a new zone and the connection with the physical device is restored, while the attack is neutralised by switching off the compromised virtual control function.
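The isolation sequence described above can be sketched in a few lines of Python. This is only an illustration of the order of the steps: the classes `VirtualDevice` and `PhysicalDevice`, the zone labels and the function `isolate_and_relocate` are hypothetical names introduced here and are not taken from the SUCCESS implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VirtualDevice:
    """Virtual representation of a field device (hypothetical model)."""
    name: str
    zone: str
    running: bool = True


@dataclass
class PhysicalDevice:
    """Field device that is only reachable through its virtual device."""
    name: str
    connected_to: Optional[VirtualDevice] = None


def isolate_and_relocate(device: PhysicalDevice, safe_zone: str) -> VirtualDevice:
    """Apply the four steps from the text to a compromised virtual device."""
    compromised = device.connected_to
    if compromised is None:
        raise ValueError("device has no virtual representation to isolate")
    # 1. Isolate the attack: cut the link between physical and virtual device.
    device.connected_to = None
    # 2. Start a fresh virtual device in a different (safe) zone.
    replacement = VirtualDevice(name=compromised.name, zone=safe_zone)
    # 3. Restore the connection with the physical device.
    device.connected_to = replacement
    # 4. Neutralise the attack by switching off the compromised instance.
    compromised.running = False
    return replacement


old = VirtualDevice(name="vSM-1", zone="zone-A")
meter = PhysicalDevice(name="SM-1", connected_to=old)
new = isolate_and_relocate(meter, safe_zone="zone-B")
print(new.zone, old.running)  # zone-B False
```

The physical device is never exposed directly in this model; only the reference to its current virtual device changes.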

When a cyber-attack is detected, either from an external or an internal network, the virtual control functions are immediately migrated to another zone. The data stream is not affected if only the control function is migrated. However, if the database has to be migrated, the data stream must be redirected without data loss. During the migration process the data may be delayed, but it should not be lost.

The functional and data layers, which contain the functional and data entities, are logically separated. If the layers are attacked separately, the relevant entities are migrated. If an attack targets only a single layer, its effectiveness is reduced. Combined attacks targeting both layers simultaneously are complex to execute, since they require different approaches and methodologies to be applied at the same time to different targets.

4.1.1 Cloud Platforms

A wide range of cloud platforms that can host the Double Virtualisation solution is available today. The concept can be realised on proprietary cloud platforms as well as on open-source ones. In SUCCESS the following platforms are considered:

• OpenStack (open source)
• Proprietary platforms: VMware and Red Hat

4.1.1.1 OpenStack

OpenStack [9] is one of the most popular open source cloud operating options today. It is getting more stable, feature rich and easier to use.

The OpenStack platform is used mainly in the SUCCESS project laboratory work. On the OpenStack platform virtual machines are used to host the functional units and the data. Migration of the virtual machines is done without interrupting the execution of the virtualised power grid functions. This is achieved by means of live migration of the virtual machines.

In the SUCCESS project the live migration is implemented by separating the control function from the cloud infrastructure. Practically this means that migration of the function is done by migrating the function processes from the source to the target virtual control functions.

In OpenStack, live-migration actions are performed through the command line interface (CLI). Migrating a virtual machine therefore requires the designer of the function to be familiar with the OpenStack CLI, and is accordingly more time-consuming than on the VMware cloud platform, which provides a graphical user interface for live migration.

4.1.1.2 Proprietary Cloud Platforms

In the SUCCESS project two proprietary cloud platforms are considered: VMware [18] and Red Hat [19].

VMware Cloud Platform

VMware is one of the oldest players in the virtualisation market and has an established record of performance and reliability. The VMware cloud platform supports live migration of virtual resources.

There are many migration configuration possibilities; here we mention only a few of them:

• Migration between hosts separated by high network round-trip latency,
• Moving the virtual disks or configuration file of a powered-on virtual machine to a new data store,
• Migration of a suspended virtual machine, and
• Moving virtual machines manually, or setting up a scheduled task to perform the cold migration.

Because the migration is done at the infrastructure level, virtualised applications do not need to be adapted. Annex A.1 demonstrates live migration of virtual machines. During the migration the virtual machine is pinged, and at the same time it pings an external machine. The demonstration shows that none of the pings is lost during the migration.
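A check of the kind used in this ping test can be sketched as follows. The sketch assumes that the sequence numbers of the ping replies observed during the migration have already been collected; how they are collected is outside its scope. The function name `lost_sequence_numbers` and the sample values are illustrative and are not measurements from the report.

```python
def lost_sequence_numbers(received, first, last):
    """Return the sequence numbers in [first, last] for which no reply was seen."""
    expected = set(range(first, last + 1))
    return sorted(expected - set(received))


# Illustrative values only: replies seen while a migration was in progress.
replies_without_loss = [1, 2, 3, 4, 5, 6]
replies_with_loss = [1, 2, 5, 6]

print(lost_sequence_numbers(replies_without_loss, 1, 6))  # []
print(lost_sequence_numbers(replies_with_loss, 1, 6))     # [3, 4]
```

An empty result for both ping directions corresponds to the outcome reported for the demonstration.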

Red Hat Cloud Platform

Red Hat cloud platform is Red Hat’s supported distribution of OpenStack.

Live migration has the following limitations:

• No operations can be performed on a virtual machine that is being live migrated,
• Virtual machines under heavy memory load are hard to live migrate if the memory change rate is high,
• Live migration generates excessive network traffic if the rate of change of the memory is high, or if the rate of change of the storage is high and the migration operates in block-storage mode,
• By default, live migration is not allowed to a target node supporting fewer CPU instructions than the source node, and
• Live migration does not use memory oversubscription.

The Manager automatically initiates live migration of virtual machines in the following situations:

• When a host is being moved into maintenance mode, and
• In order to keep load balancing or power saving levels in the cluster in line with the cluster policy.

4.2 Comparison of Approaches

We have chosen the open-source cloud platform (OpenStack) to host the Double Virtualisation logic because it provides the required features and is cost-effective. The proprietary cloud platform (VMware), on the other hand, provides more features, a simple user interface and a more stable environment.

The live migration feature of a cloud platform (vMotion on the VMware platform, live-migration in OpenStack) enables migration of the virtual functions without the need to adapt the functions. The cloud infrastructure itself achieves this, as the platform separately migrates every virtual function process running on it. In this way no additional effort has to be invested in adapting specific power grid applications.

Docker containers, which will be considered in the next phases of the project, are an efficient solution for hosting of virtual devices because containers are much lighter in terms of required resources by their nature than virtual machines. As an example, tens or hundreds of virtual devices, e.g., virtual smart meters, which are in principle simple logical units, can be run on a single host.

5. Triggering Virtual Control Relocation

We envision that the Double Virtualisation concept is implemented in the Edge Cloud [4]. The overall security monitoring solution as well as the interfaces between its components is described in deliverable D4.4 v2.0 [4].

The following two fragments of the overall security monitoring solution are the most relevant for security incident identification:

• local Distribution System Operator (DSO) entities, and
• the Edge Cloud itself.

Relevant DSO entities in the context of security incident detection are:

• the Distributed instance of the European Security Monitoring and Information System (DE-SMIS), and
• the DSO Security Monitoring Centre (DSOSMC) ([4]).

A security incident can be identified in either of those fragments, initially by monitoring the network and the nodes in it, which is essentially done in the Edge Cloud. The DSO entities are either informed about the security incident by the Edge Cloud or identify it themselves; in both cases a warning flag is raised and the DSO operator is informed, e.g., by issuing an alarm.

In the SUCCESS security monitoring solution, a DSO operator manual action is expected, which will trigger the migration of the virtual instances as a countermeasure, although the complete process from the incident identification to the virtual instance migration triggering could be fully automated. DE-SMIS will distribute the incident indication further to the upper instance European Security Monitoring and Information System (E-SMIS).

5.1.1 Relocation Triggering by DSO Entity

DE-SMIS or DSOSMC will continuously receive the following information from the Edge Cloud:

• Cloud resources status
• Virtual functions topology
  o Virtual function topology describes relations among virtual functions and physical grid resources (physical network segments)

If DE-SMIS or DSOSMC identifies an incident, it will issue a notification to the DSO (e.g. alarm or ticket). Accordingly, two cases are foreseen:

• The DSO operator decides which action to take, and
• DE-SMIS or DSOSMC sends instructions to the Edge Cloud specifying which virtual instance should be migrated or shut down.

The DE-SMIS entity will distribute the incident indication further to the upper instance of the European Security Monitoring and Information System (E-SMIS).

5.1.2 Relocation Triggered by the Edge Cloud

In case of a cyber-attack or malfunction of physical equipment monitored by the Edge Cloud, the Edge Cloud might take an internal action and will inform the DE-SMIS or DSOSMC.

The internal action taken by the Edge Cloud can be fully automatic or controlled by the Edge Cloud operator.

Regardless of whether the Edge Cloud takes immediate action, it will send the following information to DE-SMIS or to DSOSMC:

• The status of the Edge Cloud virtual resources and of the controlled physical network resources,
• Virtual function topology,
  o Virtual function topology describes relations among virtual functions and physical grid resources (physical network segments)
• Any internal action taken by the Edge Cloud.

Upon receiving the information from the Edge Cloud, the DSO entity will issue a notification to the DSO operator (e.g. alarm or ticket).

The DSO entity will distribute the incident indication further to the upper instance of the European Security Monitoring and Information System (E-SMIS).
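The information exchanged between the Edge Cloud and the DSO entity can be pictured as a small data structure. The following Python sketch is one possible shape, given purely for illustration: the class `EdgeCloudReport`, its field names, the status strings and the function `build_notification` are assumptions made here, since the report does not define a message format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class EdgeCloudReport:
    """Information sent by the Edge Cloud to DE-SMIS or DSOSMC (assumed layout)."""
    # Status per virtual function, e.g. "ok" or "under_attack" (assumed values).
    resource_status: Dict[str, str]
    # Virtual function topology: virtual function -> physical network segments.
    function_topology: Dict[str, List[str]]
    # Internal action already taken by the Edge Cloud, if any.
    internal_action: Optional[str] = None


def build_notification(report: EdgeCloudReport) -> dict:
    """Derive an operator notification (alarm or info) from a report."""
    affected = sorted(
        name for name, state in report.resource_status.items() if state != "ok"
    )
    # Use the topology to find the grid segments behind the affected functions.
    segments = sorted(
        {seg for name in affected for seg in report.function_topology.get(name, [])}
    )
    return {
        "kind": "alarm" if affected else "info",
        "affected_functions": affected,
        "affected_segments": segments,
        "internal_action": report.internal_action,
    }


report = EdgeCloudReport(
    resource_status={"vDEMS1": "under_attack", "vDEMS2": "ok"},
    function_topology={"vDEMS1": ["feeder-1"], "vDEMS2": ["feeder-2"]},
    internal_action="vSAU of vDEMS1 migrated to zone-B",
)
notification = build_notification(report)
print(notification["kind"], notification["affected_segments"])  # alarm ['feeder-1']
```

The topology is what lets the DSO entity translate an affected virtual function into the physical grid segments the operator needs to know about.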

6. Implementation

We have implemented a use case scenario on an OpenStack cloud platform. The scenario virtualises a power grid monitoring function known as a Substation Automation Unit (SAU) [15]. The SAU is an automation unit placed in the electrical substation to monitor and control part of the grid, e.g., the feeders, customers and distributed generators connected to the substation. Its role is to locally manage a portion of the distribution grid. The SAU was developed in the FP7 project IDE4L [11]. The SAU function uses data received from Smart Meters, which is stored in a database. Virtual devices receive the data streams from the smart meters and store the data in the database. Finally, a cyber-attack is emulated. When the incident is identified, the SAU function is migrated without being interrupted.

6.1 Double Virtualisation Concept

In the Double Virtualisation solution deployed on the OpenStack cloud platform, the functions and the database with the devices’ representations are deployed in separate logical layers (Figure 2).

Figure 2 Double Virtualisation implementation concept

Two zones are defined that are logically separated. Zones could belong to different networks or they can be deployed in different physical servers.

6.2 Use Case: Grid State Estimation

An example of an application to which Double Virtualisation can be applied is a Distribution Energy Management System (DEMS), in which measurements of the voltage levels and phase angles at various points in the electrical grid are used to estimate the electrical grid’s state. Such a DEMS application would belong to the Application layer of Figure 2. In a Double Virtualisation context, the State Estimation would be realised with functional and data-related parts running on separate Virtual Machines, which can be moved as a countermeasure in response to a security attack.

In this scenario, the physical power grid network can be divided in the segments shown in Figure 3. Every network segment is a feeder in a low voltage network, and is controlled by one virtual function, the so called virtual DEMS application (vDEMS).

Figure 3 Power Network Segmentation

The vDEMS application consists of the following components as shown in Figure 4:

• Virtual devices representing Smart Meters (vSM),
• Database, and
• a Virtual Substation Automation Unit (vSAU).

The simple 1:1 relation between vDEMS and network segment is shown in Figure 4. Different relationships such as n:1 and 1:n are also possible.

Virtual devices (vSM) are protocol units responsible for receiving a data stream from the smart meter and writing it to the database. Every physical device (smart meter) has a corresponding vSM in the virtual function vDEMS. Both the virtual devices and the database belong to the Data Layer.

The vSAU runs on the Functional Layer. It reads the data streams of the different smart meters from the databases, estimates the grid state and writes the results back to the database.
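The division of work between the two layers can be illustrated with a minimal in-memory Python sketch. All class and field names are hypothetical, the database is a plain dictionary, and the "estimation" is a deliberately trivial placeholder (latest reading per meter and their mean); it is not the state estimation algorithm of the SAU.

```python
from statistics import mean


class Database:
    """Data Layer: stores measurements and estimation results (in memory here)."""

    def __init__(self):
        self.measurements = {}  # meter id -> list of voltage readings
        self.results = {}


class VirtualSmartMeter:
    """Data Layer: receives the stream of one smart meter and writes it to the database."""

    def __init__(self, meter_id, database):
        self.meter_id = meter_id
        self.database = database

    def on_reading(self, voltage):
        self.database.measurements.setdefault(self.meter_id, []).append(voltage)


class VirtualSAU:
    """Functional Layer: reads measurements, estimates, writes results back."""

    def __init__(self, database):
        self.database = database

    def estimate(self):
        # Placeholder only: a real state estimator is far more involved.
        latest = {m: v[-1] for m, v in self.database.measurements.items() if v}
        values = list(latest.values())
        self.database.results = {
            "latest_voltage": latest,
            "mean_voltage": mean(values) if values else None,
        }
        return self.database.results


db = Database()
vsm1 = VirtualSmartMeter("SM-1", db)
vsm2 = VirtualSmartMeter("SM-2", db)
vsm1.on_reading(229.0)
vsm1.on_reading(230.0)
vsm2.on_reading(232.0)
result = VirtualSAU(db).estimate()
print(result["mean_voltage"])  # 231.0
```

Because the vSAU object holds no measurement data of its own in this sketch, a second instance attached to the same database produces the same result, which is the property that makes relocating the functional part straightforward.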

Figure 4 Grid State Estimation before Attack

In the case of an attack or malfunction in one network segment, the virtual instances will be migrated from one virtual function to another. In Figure 5 migration of the virtual instances is done from the first to the second virtual function vDEMS after vDEMS1 is attacked.

Figure 5 Grid State Estimation after Attack
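The relocation of virtual instances between two vDEMS functions can be pictured with a small data model. The following is only a minimal sketch and not the SUCCESS implementation: the class `VDems`, the function `migrate_instances` and all instance names are hypothetical, and "migration" is reduced to moving instance names between two in-memory objects.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VDems:
    """Hypothetical model of one vDEMS virtual function."""
    name: str
    vsms: List[str] = field(default_factory=list)   # virtual smart meters
    vsaus: List[str] = field(default_factory=list)  # virtual substation automation units


def migrate_instances(origin: VDems, target: VDems) -> None:
    """Move all virtual instances from the affected vDEMS to the target vDEMS."""
    target.vsms.extend(origin.vsms)
    target.vsaus.extend(origin.vsaus)
    origin.vsms.clear()
    origin.vsaus.clear()


vdems1 = VDems("vDEMS1", vsms=["vSM-1a", "vSM-1b"], vsaus=["vSAU-1"])
vdems2 = VDems("vDEMS2", vsms=["vSM-2a"], vsaus=["vSAU-2"])

# vDEMS1 is attacked: its instances are taken over by vDEMS2.
migrate_instances(vdems1, vdems2)
print(vdems2.vsms)   # ['vSM-2a', 'vSM-1a', 'vSM-1b']
print(vdems2.vsaus)  # ['vSAU-2', 'vSAU-1']
```

After the call, the second vDEMS hosts the instances of both network segments, which corresponds to a 1:n relation between virtual function and network segments.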

Note that the database does not need to be migrated if the topology data of all segments are stored in every database. In that case, however, the topology data in all databases must be kept synchronized at all times.

Another option is that the database is migrated as well.

During the migration of the virtual devices (vSMs), the data streams should be redirected without losing data. This can be achieved by using the Message Queuing Telemetry Transport (MQTT) protocol between the virtual devices and the smart meters.

6.3 Triggering Migration of Virtual Substation Automation Unit

In the SUCCESS project we simulate a cyber-attack that is detected based on the CPU load of the virtual machine (high load could be the result of a Distributed Denial of Service (DDoS) attack). The CPU load of the virtual machine hosting the vSAU is monitored, and if it stays at 100% for at least 10 seconds, virtual machine migration is triggered.
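The trigger rule can be sketched as a small detector over time-stamped CPU samples. This is an illustrative sketch only: the function name `migration_trigger_time` and the sample format are assumptions, and how the samples are collected (for example from the cloud platform's telemetry) is not shown. Only the 100% threshold and the 10-second hold time come from the text.

```python
from typing import Iterable, Optional, Tuple

CPU_THRESHOLD = 100.0  # percent, as stated in the text
HOLD_SECONDS = 10.0    # seconds, as stated in the text


def migration_trigger_time(
    samples: Iterable[Tuple[float, float]],
    threshold: float = CPU_THRESHOLD,
    hold: float = HOLD_SECONDS,
) -> Optional[float]:
    """Return the timestamp at which migration would be triggered, or None.

    `samples` is a time-ordered sequence of (timestamp_seconds, cpu_percent).
    """
    saturated_since = None
    for timestamp, cpu in samples:
        if cpu >= threshold:
            if saturated_since is None:
                saturated_since = timestamp  # saturation starts here
            if timestamp - saturated_since >= hold:
                return timestamp
        else:
            saturated_since = None  # load dropped, restart the timer
    return None


# One sample per second: normal load for 5 s, then sustained saturation.
trace = [(t, 35.0) for t in range(5)] + [(t, 100.0) for t in range(5, 20)]
print(migration_trigger_time(trace))  # 15
```

Resetting the timer whenever the load drops below the threshold means that short load spikes do not trigger a migration.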

Figure 6 Red Button

6.4 Migration of the Virtual Substation Automation Unit

In computing, a process is an instance of a computer program that is being executed. Some cloud platforms, such as VMware and Red Hat, provide live migration of computing processes: all processes running on the migrating virtual machine are moved along with it without being lost, although they are paused briefly during the migration.

However, OpenStack did not provide such a live migration feature at the time the migration was performed in the SUCCESS project. Therefore, we migrated the vSAU virtual function processes ‘manually’, by stopping the processes on the origin virtual machine and starting identical processes on the target virtual machine.

During the migration of the vSAU virtual function processes, both the origin and the target virtual machines were running. Their performance charts are shown in the following figures. Figure 7 shows the virtual CPU usage of the origin and target virtual machines during the migration.

Figure 7 Origin and target VM virtual CPU usage during migration

Figure 8 shows the virtual disk usage of the origin and target virtual machines during the migration of the vSAU virtual function processes, in particular the number of disk write operations.

Figure 8 Origin and target VM virtual disk usage during migration

Finally, Figure 9 shows the network activity of the origin and target virtual machines, in particular the number of packets sent to the network from each virtual machine.

Figure 9 Origin and target VM virtual network activity during migration

The performance graphs above show that both the origin and the target virtual machine operate normally during the migration.

To demonstrate and evaluate the migration of the vSAU virtual function processes, the number of vSAU read and write operations towards the database was monitored and shown in a real-time graph. The vSAU reads the data from the database, analyzes it and returns the results of the analysis to the database. Figure 10 shows the number of database transmissions per second (tps). The tps value is calculated on both the origin and the target virtual machine. The tps values are provided to the ELK stack (the Elasticsearch, Logstash and Kibana tool stack [12]) for visualization. A new value is plotted every second.

Figure 10 vSAU migration line chart

The left side of the graph shows the tps values collected from the origin virtual machine. When the incident is identified, the vSAU process on the origin virtual machine is killed and, immediately afterwards, a new vSAU process is started on the target virtual machine. As the graph shows, this takes less than a second.

Note that the line is discontinuous because the chart actually shows two graphs, whose values come from two data sources (the two virtual machines). The tool connects the once-per-second points of a single graph from one source, but it does not draw a line between points that belong to different sources.
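The per-second counting behind such a chart, with one series per source, can be sketched as follows. This is a hedged illustration and not the ELK configuration used in the project: the function `tps_per_source`, the event format and the sample timestamps are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tps_per_source(events: Iterable[Tuple[str, float]]) -> Dict[str, Dict[int, int]]:
    """Count database operations per whole second, separately for each source VM.

    `events` holds (source_name, timestamp_seconds) pairs, one per read or
    write operation issued by the vSAU.
    """
    series: Dict[str, Counter] = {}
    for source, timestamp in events:
        series.setdefault(source, Counter())[int(timestamp)] += 1
    return {source: dict(sorted(counts.items())) for source, counts in series.items()}


# Illustrative data: the origin VM issues operations until t = 3 s,
# then the target VM takes over.
events = (
    [("origin", 0.1), ("origin", 0.6), ("origin", 1.2), ("origin", 2.4), ("origin", 2.9)]
    + [("target", 3.3), ("target", 3.8), ("target", 4.1)]
)
for source, counts in tps_per_source(events).items():
    print(source, counts)
# origin {0: 2, 1: 1, 2: 2}
# target {3: 2, 4: 1}
```

Because the two series are kept apart, a plotting tool that connects points only within one series leaves a gap between the last origin value and the first target value.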

During the migration, both virtual machines are active. Soon after the migration is completed, the origin virtual machine is shut down.

6.5 Migration of Virtual Devices

Migration of the virtual devices will be considered, and probably implemented in a laboratory trial, for the next project deliverable. The idea and the planned implementation are described in Annex A.2.

7. Conclusions

In the future, information and communication infrastructure will play a key role in the functioning of power systems, and resilience will be a key characteristic to address when designing such hybrid systems.

This document described how the innovative concept of Double Virtualisation is realised in the test lab setup at RWTH in Aachen, Germany. In the demonstration, Substation Automation Functions are constantly relocated to improve the resilience of the distribution grid's operation during a failure of the physical computing infrastructure or when a cyber-attack has been detected. A cloud computing platform based on OpenStack is used to realise the concept and to investigate the performance of the virtual control functions.

Our results show that implementing the concept of Double Virtualisation for power network management applications involving database updates was achievable with a modest use of project resources. Applying the concept increased the resilience of the application and added a further countermeasure against several types of cyber-attack (for example, attacks on a single point of failure) to the portfolio available to power network operators. The results also indicate that this technique has the potential to be deployed on a commercial scale in power networks in the near future, as it can be configured using available products.

Future investigation of this topic in SUCCESS will consider environments, such as Docker, that provide light-weight virtual instances on which applications can be deployed, together with automated application deployment. This could significantly improve the deployment and orchestration of virtualized power grid applications in a cloud environment. In addition, virtualized grid applications running in light-weight Docker containers use fewer computational resources. Orchestration components such as Kubernetes [17] may be used for software integration.

8. References

[1] M. Shahidehpour and Y. Wang. “Communication and Control in Electric Power Systems”. John Wiley and Sons, 2003

[2] S. Zhang and V. Vittal, "Design of Wide-Area Power System Damping Controllers Resilient to Communication Failures," IEEE Transactions on Power Systems, vol. 28, no. 4, pp. 4292-4300, Nov. 2013

[3] SUCCESS D1.2, “Identification of Existing Threats, V2”, April 2017

[4] SUCCESS D4.4 v2.0, “Description of Available Components for SW Functions, Infrastructure and Related Documentation, V1”, January 2017

[5] N. Doerry, "Naval Power Systems: Integrated power systems for the continuity of the electrical power supply," IEEE Electrification Magazine, vol. 3, no. 2, pp. 12-21, June 2015

[6] Predix, The Industrial IoT Platform

[7] What is Docker Container? Part 1: The Docker Open Source Project

[8] Open Source release of IoT app environment Calvin

[9] OpenStack, Wikipedia

[10] Ericsson RBS6501

[11] IDE4L Project Page

[12] ELK Stack

[13] SUCCESS D4.2, “Solution Architecture and Solution Description, V2”, April 2017

[14] Introducing the MQTT Security Fundamentals

[15] A. Angioni et al., "Design and Implementation of a Substation Automation Unit," in IEEE Transactions on Power Delivery, vol. 32, no. 2, pp. 1133-1142, April 2017

[16] Virtual machine, Wikipedia

[17] Kubernetes, Production-Grade Container Orchestration

[18] VMware Cloud Computing

[19] Red Hat Cloud Computing

9. List of Abbreviations

ACS: Institute for Automation of Complex Power Systems
CPU: Central Processing Unit
DE-SMIS: Distributed instance of European Security Monitoring and Information System
DEMS: Decentralised Energy Management System
DSO: Distribution System Operator
DSOSMC: DSO Security Monitoring Centre
DV: Double Virtualisation
E-SMIS: European Security Monitoring and Information System
FLISR: Fault Location, Isolation, and Service Restoration
ICT: Information and Communication Technology
MQTT: Message Queuing Telemetry Transport
PMU: Phasor Measurement Unit
SM: Smart Meter
vDEMS: Virtual Distribution Energy Management System
VM: Virtual Machine
vSAU: Virtual Substation Automation Unit
vSM: Virtual Smart Meter
WP: Work Package

A. Annex

A.1 Live Migration on VMware Platform

Live migration is a mature feature of the VMware platform, enabled by vMotion. Here we demonstrate the migration of a virtual machine using the graphical interface. Live migration is possible without downtime of the virtual machine. To show this, we run two ping processes during the migration: one towards the migrating virtual machine and one from it.

There are different ways to migrate. Here we describe the migration of a virtual machine between two hosts that share external central storage. The physical and logical resources are shown in Figure 11.

Figure 11 VM Migration

The VM NSXManager-003 is migrated from the host hpc7000-003-bl1-oam.de.eld.ericsson.se to hpc7000-003-bl12-oam.de.eld.ericsson.se. Both hosts share the central storage (vmax machine). Storage resources are migrated from the file system VC-003-vmax-valid-040C-ds to VC-003-vmax-PoC-lab-0245.

Note that only the VM processes are placed on the host. VM resources such as RAM, disk and swap memory are actually files stored on the central storage file system deployed on the LUN.

Accordingly, during live migration files are copied between LUNs, and processes are transferred between hosts.

In this case the migration is triggered manually via the GUI; see Figure 12.

Figure 12 Triggering Migration

The following items can be configured before migration is started:

Destination compute resources, Storage, Network, Priority.

The migration configuration items are shown in Figure 13.

Figure 13 Migration Configuration Items

Before the migration we started two ping processes:

Ping from an external machine to the migrating VM, Ping from the migrating VM to an external machine.

Figure 14 shows that no pings are lost, although some take a few seconds longer because the processes were paused during the migration.

Figure 14 Ping during the Migration

A.2 Migration of Virtual Devices

In the case of certain security attacks, the virtual device vSM can be migrated. During the migration, the data stream that the vSM receives from the physical Smart Meter should be redirected without data loss. Smooth data stream redirection can be achieved by:

Deploying the vSM on a virtual machine and live migrating the VM, Implementing a specific mechanism that ensures data stream redirection, e.g. the MQTT protocol.

The goal of the MQTT protocol is to provide a lightweight and easy-to-use communication protocol for the Internet of Things. For this reason, the protocol itself clearly specifies only a few security mechanisms. Common implementations, however, use other state-of-the-art security standards, such as SSL/TLS for transport security. Security of the MQTT protocol ([14]) in common implementations is achieved on several layers, i.e., on the network layer, the above-mentioned transport layer and the application layer.

Different functions, including a vSM virtual device deployed on a virtual machine, can be live migrated without adapting the function, which makes this an efficient solution to implement. However, a virtual machine requires too many computational resources to handle one simple protocol unit such as a vSM. Note that every SM will have its own virtual device clone. Besides virtual machines, Docker containers can also host virtual device functions, but live migration of Docker containers is still in a trial phase at the moment.

For these reasons, we will implement a specific mechanism using the MQTT protocol that enables data stream redirection without data loss. The basic idea of the solution is shown in Figure 15.

Figure 15 Data Rerouting Solution

The virtual device vSM will be deployed in a Docker container, which in turn will be deployed on a virtual machine. The smart meter will send data to the vSM using the MQTT protocol. The MQTT broker will be installed on the Smart Meter¹. All MQTT clients will subscribe to MQTT topics. There will be two topics:

“signaling” will be used for infrastructure events, e.g., the target container can take over the data stream,

“data” will be used for transferring smart meter data.

When a security incident is identified, the data stream going via Container-1 will be rerouted via Container-2. The detailed data flow is shown in Figure 16.

Figure 16 Data Flow

The MQTT clients on the SM and the vSM will subscribe to the topics “signaling” and “data”. The SM will publish data on the topic “data” whenever it collects new measurements. The vSM client will forward the data to the database.

When a security incident is identified, the cloud manager will create a new vSM on a different virtual machine. The MQTT client on the newly created vSM will subscribe to the topics “data” and “signaling”, and publish on the topic “signaling” that it is taking over the data stream from the SM. As a consequence, the MQTT client on the origin vSM will disconnect from the MQTT broker. The origin vSM can then be shut down. Afterwards, the data stream will be handled by the target (new) vSM container.

¹ At the moment, instead of a physical Smart Meter, we will use a data streamer simulator deployed on a virtual machine. A real Smart Meter might be used in later project phases.
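The handover between the origin and the target vSM can be sketched with a toy, in-memory stand-in for the MQTT broker. This is an assumption-laden illustration, not the planned implementation: `ToyBroker`, `VSM`, the `takeover:<name>` payload format and the list used as a database are all hypothetical, and a real setup would use an actual MQTT broker and client library instead.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str, str], None]


class ToyBroker:
    """In-memory stand-in for an MQTT broker (topic -> subscriber callbacks)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def unsubscribe(self, topic: str, handler: Handler) -> None:
        if handler in self._subscribers[topic]:
            self._subscribers[topic].remove(handler)

    def publish(self, topic: str, payload: str) -> None:
        # Iterate over a copy: handlers may unsubscribe while being called.
        for handler in list(self._subscribers[topic]):
            handler(topic, payload)


class VSM:
    """Hypothetical vSM client that forwards smart meter data to a database."""

    def __init__(self, name: str, broker: ToyBroker, database: List[str]) -> None:
        self.name, self.broker, self.database = name, broker, database
        broker.subscribe("data", self.on_message)
        broker.subscribe("signaling", self.on_message)

    def on_message(self, topic: str, payload: str) -> None:
        if topic == "data":
            self.database.append(f"{self.name}:{payload}")
        elif payload != f"takeover:{self.name}":
            # Another vSM announced a takeover: disconnect from the broker.
            self.broker.unsubscribe("data", self.on_message)
            self.broker.unsubscribe("signaling", self.on_message)

    def take_over(self) -> None:
        self.broker.publish("signaling", f"takeover:{self.name}")


broker, db = ToyBroker(), []
origin = VSM("vSM-origin", broker, db)
broker.publish("data", "m1")            # smart meter publishes a measurement

target = VSM("vSM-target", broker, db)  # created after the incident
target.take_over()                      # origin vSM disconnects
broker.publish("data", "m2")
print(db)  # ['vSM-origin:m1', 'vSM-target:m2']
```

In this sketch a measurement published between the target's subscription and its takeover announcement would be forwarded by both vSMs, so no data is lost but duplicates are possible. Handling such duplicates is outside the scope of the example.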