Anomaly Detection in Industrial Networks
using a
Resource-Constrained Edge Device
Anton Eliasson
Computer Science and Engineering, master's level
2019
Luleå University of Technology
Department of Computer Science, Electrical and Space Engineering
ABSTRACT
The detection of false data-injection attacks in industrial networks is a growing challenge for the industry because it requires knowledge of application- and protocol-specific behaviors. Profinet is a communication standard widely used in the industry and is susceptible to this type of attack. This motivates an examination of whether a machine learning solution focused on anomaly detection can be implemented and used to detect abnormal data in Profinet packets. Previous work has investigated this topic; however, no solution is available on the market yet. Any solution that aims to be adopted by the industry must detect abnormal data at the application level and run the analytics on a resource-constrained device. This thesis presents an implementation that aims to detect abnormal data in Profinet packets represented as online data streams generated in real-time. The implemented unsupervised learning approach is validated on data from a simulated industrial use-case scenario. The results indicate that the method manages to detect all abnormal behaviors in an industrial network.
PREFACE
First of all, I want to thank my family for the support you have given me, not only during
this thesis but also during my entire study period at LTU.
This thesis work was carried out in collaboration with HMS Networks, where I especially
would like to thank my supervisor, Henrik Arleving, for your ideas, help, and feedback
throughout the project. I also want to thank the rest of the members of the company
who helped me, especially Mattias Svensson who assisted me with technical problems
with the software and hardware at the beginning of the project.
I would also like to thank my supervisor at LTU, Sergio Martin Del Campo Barraza.
Your quick responses and feedback have been amazing and have helped me a lot in finishing the report. Your knowledge in machine learning has been really valuable as well, and you have taught me a lot during these months.
I also want to take the opportunity to thank the team at Stream Analyze for guiding
me with the analytics platform, especially Johan Risch. Your help has been incredible.
Anton Eliasson
CONTENTS
Chapter 1 – Introduction
1.1 Background
1.1.1 Attacks on industrial systems
1.1.2 Intrusion detection in Industrial Networks
1.1.3 Network Analytics
1.2 Related work
1.3 Problem formulation
1.3.1 Delimitations
1.4 Thesis outline
Chapter 2 – Theory
2.1 Profinet
2.1.1 Profinet system model
2.1.2 Profinet Cyclic IO Data Exchange
2.1.3 Abnormality in Profinet packets
2.2 Stream Processing
2.3 Feature Engineering
2.3.1 Features
2.3.2 Selection of features
2.4 Anomaly Detection
2.5 Machine learning algorithm for anomaly detection
2.5.1 DBSCAN
Chapter 3 – Methodology
3.1 Tools
3.1.1 Hardware
3.1.2 Software
3.2 Architecture
3.3 Generation of network data
3.4 Data processing
3.4.1 Data collection
3.4.2 Pre-processing
3.4.3 Feature extraction
3.5 Creation of training and validation sets
3.6 Implementation of machine learning algorithm
3.7 Validation method
3.7.1 Validation of clustering model
3.7.2 Validation of the online data stream case
Chapter 4 – Results
4.1 Clustering method
4.1.1 Scenario A
4.1.2 Scenario B
4.1.3 Online data stream case
Chapter 5 – Discussions, Conclusions and Future Work
5.1 Discussions and Conclusions
5.2 Future Work
Appendix A – Tables
CHAPTER 1
Introduction
There is a pressing need in the industry to protect industrial control systems from attacks. False data-injection attacks, where application data in network packets is modified with the aim of interrupting the normal behavior of an industrial process, are one type of attack that can cause major problems if not detected in time. Intrusion detection systems need to detect these attacks as early as possible; therefore, the analysis of network traffic must occur over continuous sequences of data in real-time. Moreover, it is difficult to obtain knowledge about all kinds of existing attacks, especially unknown attacks that have never been seen before. For these reasons, a solution based on anomaly detection and unsupervised machine learning is implemented in this project, which aims to detect abnormal data in industrial network packets. The performance of the anomaly detection solution is investigated, and conclusions are drawn to determine whether it is possible to detect abnormal data in an industrial network using deep packet inspection and machine learning. The examination of the proposed solution is extended to evaluate the performance of the machine learning implementation when it runs on a resource-constrained edge device.
1.1 Background
1.1.1 Attacks on industrial systems
During the development of industrial control systems, security has historically not been a priority. The reason is that factory floors have been isolated from the outside world, so there was little need to secure the systems against attacks and intrusions. However, interest in security and malware detection capabilities is increasing considerably due to the growing demand for devices that are part of the Industrial Internet-of-Things (IIoT).
One example of real-world malware is the well-known computer worm Stuxnet, which attacked Iran's nuclear plant in 2010 [1]. Stuxnet targeted industrial controllers, specifically Siemens programmable logic controllers (PLCs), used to control industrial processes such as the centrifuges for the separation of nuclear material. The program that controls and monitors the PLCs, called Step 7, uses a specific library for the communication with the PLCs. The worm managed to take control of this library and was able to manipulate requests sent to the PLCs. Malicious code was then created and introduced to target specific PLCs. The payload of the data sent from the PLCs to the centrifuges was modified to set the speed much lower or higher than during regular operation. A slow speed causes the uranium enrichment process to run inefficiently, while a high speed can potentially destroy the centrifuges. Before the attack was carried out, the malware recorded the centrifuges' speed during normal operation. The recorded data was then fed to the monitoring program, WinCC, which prevented the system from alerting on anomalous behavior and made the abnormal speed difficult to detect [2].
Other attacks such as Denial-of-Service (DoS), replay attacks, and deception attacks
have recently received attention in industrial systems as well. A DoS attack aims to drain the resources of the network to make them unavailable and to prevent the devices on the network from communicating with each other [3]. Replay attacks intercept a valid data transmission and then repeat the transmission with the purpose of fetching the same information as the original data transmission [3]. In deception attacks, also known as false data-injection attacks, the data in the transmitted packets is modified, violating its integrity. Stuxnet is one example of such an attack [3]. False data-injection attacks are harder to detect and are not as frequently investigated as DoS attacks [4]. Other types of existing attacks on industrial systems are eavesdropping, Man-in-the-Middle attacks, and various types of worms, trojans, and viruses [5]. Detection of these attacks in an industrial environment is crucial in order to have a secure system. Depending on the type of attack, the detection method may differ. This work focuses on the detection of false data-injection attacks.
1.1.2 Intrusion detection in Industrial Networks
Intrusion detection systems for industrial systems can be divided into two categories: network-based and host-based [6, 7]. Network-based systems collect and analyze the entire network communication, while host-based solutions identify intrusive behavior on each individual node. As network-based intrusion detection systems only need to be installed at one point in a network and can analyze the inbound and outbound traffic of all configured devices on the network, they are often more suitable for automation networks [7]. In addition, host-based intrusion detection systems require additional memory and computing resources, which can affect the industrial process.
Each of these two methods can, in turn, be either misuse-based or anomaly-based.
Misuse-based intrusion detection systems compare the incoming data with predefined signatures, which is why they are also known as signature-based intrusion detection systems. Anomaly-based intrusion detection systems, in contrast, compare the current behavior against a learned normal behavior and try to identify abnormal patterns [6]. The main difference between signature-based and anomaly-based systems lies in the concepts of attack and anomaly. An attack is an operation that aims to put the security of a system at risk [8], while an anomaly, in a network security context, is a behavior suspected to pose a security risk because it deviates from the historical behavior. Signature-based systems are very good at detecting predefined attacks but lack the ability to detect unknown and unseen behavior. Anomaly-based detection techniques have the potential to detect these unseen events. Due to the increase in new unknown attacks on modern industrial control systems, there is great interest in analyzing network traffic with an anomaly detection procedure, which permits the identification of not only known threats but also abnormal behavior that has not been seen before.
False data-injection attacks can only be detected with deep packet inspection, where the payload of the packets sent over the network is explored [9]. Since industrial systems include multiple different communication protocols and standards, each protocol needs its own analysis, and the packet inspection may therefore differ depending on the protocol. Profinet is one of the communication standards used in the industry and is one of the emerging protocols found in many industrial applications today [7]. The protocol communicates over Industrial Ethernet and is one of the leading Industrial Ethernet standards in the market [10]. The Profinet application data in the packets sent over the network varies depending on the application. Returning to the example of the centrifuges in Iran's nuclear program, the actual application data of the Profinet packets could be the desired speed of the operating motor.
Anomaly-based intrusion detection systems in information technology (IT) systems are
not so common in practice. The reason is the dynamic behavior of regular IT systems,
which makes it difficult to define a proper model for normal behavior. For industrial
networks, however, communication is often much more structured and steady [7]. These
considerations motivate our work on intrusion detection systems that can inspect indus-
trial communication protocols on the application level with deep packet inspection and
detect anomalies in the network data.
1.1.3 Network Analytics
Where the analytics of network data, and data in general, is run matters for several concerns. Latency and network bandwidth, for example, are affected differently depending on where the analytics runs.
Virtually unlimited resources in the cloud have resulted in the emergence of many different cloud services. Running analytics in the cloud requires that the data to be
analyzed is sent from the data source to the server continuously. An alternative is to
run analytics in the local network, closer to the data source. Edge computing is the concept of running analytics directly at the network edge [11]. Ahmed [11] compares the pros and cons of cloud computing and edge computing. Cloud computing has the benefit that hardware capabilities are scalable, resulting in practically unlimited resource capabilities. The disadvantage of cloud-based solutions is that latency and jitter are often high, and sending continuous data to the cloud requires a constantly high bandwidth. Edge-based solutions, on the other hand, can maintain low latency, since the traffic to the cloud can be reduced. Edge computing is therefore suitable for applications that require analytics on great volumes of data in real-time. The drawback is that edge devices are often resource-constrained.
Detecting anomalies in network data requires analytics on large data volumes in real-time. This thesis work investigates an edge-based solution, where the detection of anomalies in network traffic is made on a resource-constrained edge device.
1.2 Related work
Numerous studies have been made on information security and intrusion detection systems in general [12]. Wressnegger et al. [13] discuss the need for protocol specifications to analyze the content of the data in industrial networks and present a content-based anomaly detection framework for binary protocols. Their method manages to detect 97.1 % of the attacks in their dataset with only 2 false alarms out of 100,000 messages.
Most of the research within network-based intrusion detection systems on industrial net-
works is focused on detecting anomalies in the traffic flow characteristics, such as through-
put, port number, and IP addresses. Mantere et al. [14] analyze possible features for use
in a machine learning-based anomaly detection system. Their research is, however, limited to IP traffic and does not consider the actual payload of the traffic.
A few papers focus specifically on the Profinet standard. Sestito et al. [15] present a method for detecting anomalies in Profinet networks. Their method uses an Artificial Neural Network (ANN) to classify the incoming data into four different classes, one of which is normal operation. The authors derive 16 different traffic-related features used for the ANN and conclude that their methodology may be successful for anomaly detection in any Profinet network. Their method is, however, based on supervised learning and requires labeled data for classification. Schuster et al. [9] also present an approach for anomaly detection in Profinet. Unlike the method above, their model analyzes all network data, including flow information, application data, and packet sequences, by performing deep packet inspection. They extract features from the actual packet data, such as MAC address, packet type, and packet payload. Schuster et al. [16] present the results of applying the one-class SVM algorithm for detecting anomalies in Profinet packets.
The work in this thesis focuses on similar approaches, with deep packet inspection and unsupervised anomaly detection methods. Additionally, the method presented in this work is based on a stream-based approach where the machine learning algorithm performs the analytics online on the incoming data. Mulinka and Casas [17] compare different stream-based machine learning algorithms for the case of detecting abnormal network traffic in an online manner.
1.3 Problem formulation
Analyzing industrial networks with deep packet inspection is not completely straightforward. One of the key challenges is the need for specific knowledge about each individual protocol. Today, no solution exists on the market that can inspect industrial network packets at the payload level, detect malware with an anomaly detection approach, and, in addition, run the analytics on a resource-constrained device. Being able to detect unknown malware while running the analytics locally on the edge would have many benefits.
HMS Networks AB, a Swedish company from Halmstad, supplies products and solutions
for industrial communication and the Industrial Internet-of-Things. The company has
a large amount of experience in industrial communications and is a leader in providing
software that connects industrial devices to different industrial networks and IoT systems.
HMS constantly strives to prototype new solutions and explore possibilities in the market. With the new technology in edge computing and machine learning, the company wishes to examine the possibility of designing and implementing a machine learning-based solution running on a resource-constrained edge device from HMS.
Detection of anomalies in an industrial protocol is an interesting use case for both
the company and the industry. Since HMS has deep knowledge about specific industrial
protocols, the company wants to examine the possibility of using machine learning to
detect anomalies in industrial network packets. This project is narrowed down to the
PROFINET standard because the way of detecting anomalies is not the same for all
protocols. Therefore, the goal of this project will be to implement and evaluate a machine
learning solution to detect anomalies in Profinet packets. The solution should run on
one of HMS resource-constrained edge devices. In summary, the aim of the project is to
investigate and answer the following research questions:
Q1 Can abnormal data in an industrial network be detected using Deep Packet Inspec-
tion and Machine Learning?
Q2 To what extent, as it relates to performance, speed, and system footprint, can an
unsupervised anomaly detection algorithm be useful on a resource-constrained edge
device?
Q3 Is the implementation of the solution feasible on an existing device available on the market?
1.3.1 Delimitations
As described in the problem formulation, the intention of the thesis is to investigate whether it is possible to detect abnormal data in industrial networks. Unfortunately, it will not be possible to test and validate the implemented solution on network data generated from an actual industrial environment. Therefore, data will be simulated. Real-world network data would be needed to deploy the implementation as a complete solution in production.
1.4 Thesis outline
This thesis is structured as follows: Chapter 2 provides the background theory about
the different elements used in the proposed solution, such as theoretical information
about Profinet, stream processing, feature engineering, anomaly detection, and machine
learning. Chapter 3 describes the method used in the implementation of this work, the tools used, and how they fit together. This chapter also provides
a description of how the different parts in the solution are generated, implemented and
validated. Chapter 4 presents the results of the proposed solution, while Chapter 5
provides conclusions to the research questions stated in the problem formulation and
future work.
CHAPTER 2
Theory
Machine learning has existed for several decades; however, only recently has it gained widespread popularity, and many of its methods have become possible to implement in real-world applications. Machine learning can be described as the process of using
algorithms that learn from data to predict outcomes on later observations [42]. This is
in contrast to algorithms that are explicitly programmed by humans.
A predictive machine learning model requires data for its training. When the model has been trained and is provided with inputs, it makes predictions based on the data it was trained with. In the case where detecting abnormal data in industrial
networks is the goal, the model is trained with data from network traffic, and the model
determines whether new incoming data is normal or abnormal. The inputs to the model, also
known as features, must be extracted from the data source. This process is called feature
extraction and is a crucial step in the design of a machine learning solution.
The first step in the design of a machine learning solution is data collection, which
in this project is data in the form of Profinet packets. Data processing includes data
collection, pre-processing and feature extraction. The processed data can then be used
as input to train a model, which finally can make a decision based on future inputs.
2.1 Profinet
Industrial Ethernet technology has recently increased in popularity by offering higher speed, longer connection distances and the ability to connect more nodes than the traditional serial Fieldbus protocols on the factory floor [21]. Among several Industrial
Ethernet standards, Profinet is one of the most common in the industry today, used in
solutions such as factory automation, process automation, and motion control applica-
tions. Depending on the type of functionality and requirements for the data transmission
over the network, Profinet offers two variants of functionality. The first one, defined as
Profinet CBA (Component Based Automation), is suitable for component-based machine-to-machine communication via TCP/IP. The other variant is Profinet IO, used for data
exchange between controllers and devices. This thesis will focus only on Profinet IO.
2.1.1 Profinet system model
A Profinet IO system consists of the following different device classes that communicate
with each other:
IO-Controller A Profinet IO-Controller is typically the Programmable Logic Controller
(PLC) where the control program runs. The IO-Controller exchanges information with the IO-Devices in the network, acting as a provider of output instructions to the devices and a consumer of input data from the devices [18, 20].
IO-Device Profinet IO-Devices are distributed I/O field devices that can exchange data
with one or several IO-Controllers [20].
IO-Supervisor An IO-Supervisor can be a personal computer (PC), programming device (PG), or a human-machine interface (HMI). The purpose of an IO-Supervisor can be commissioning or diagnostics [20].
A communication path between an IO-Controller and an IO-Device must be established before they can exchange data, which is done during system startup [20].
When the IO-Controller is initialized, it sets up a connection, called Application Relation
(AR), to each IO-Device using Distributed Computing Environment / Remote Procedure
Calls (DCE RPC) [33]. The AR specifies Communication Relations (CR) where specific
types of data are sent. The different CRs that exist in an AR are Record data CR,
IO data CR, and Alarm CR. Figure 2.1 illustrates the application and communication
relations.
Figure 2.1: AR and CR between IO-Controller and IO-Device. Picture taken from [20].
2.1.2 Profinet Cyclic IO Data Exchange
Profinet provides services such as the cyclic transmission of I/O data (RT and IRT),
acyclic transmissions of data (parameters, detailed diagnostics, etc.), acyclic transmission
of alarms, and address resolution [20]. However, this project only deals with the cyclic IO data exchange, where data is sent from an IO-Device to an IO-Controller. The cyclic transmission of I/O data in Profinet IO occurs in the IO data CR, where cyclic data is sent between an IO-Controller and an IO-Device. The data is always transmitted in real-time according to the definitions in IEEE and IEC for high-performance data exchange of
I/O data [20]. The real-time communication in Profinet IO is separated into four classes,
as illustrated in Table 2.1.
RT CLASS 1 is used for unsynchronized communication within a subnet, whereas
RT CLASS 2 can be used for either unsynchronized or synchronized communication.
RT CLASS 3 supports Isochronous Real-Time with clock rates of under 1 ms and jit-
ter below 1 µs. The last class, RT CLASS UDP, uses unsynchronized communication
between different subnets. This project will deal with Profinet IO real-time class 1.
Profinet communication occurs in the data link layer, using the Ethernet protocol, according to the Open Systems Interconnection model (OSI model). An Ethernet frame in Profinet, illustrated in Figure 2.2, consists of a 16-byte header block containing the destination address, source address, Ethertype, and Frame ID. The Ethertype is set to 0x8892, which indicates that the protocol used in the payload is Profinet. The Frame ID differentiates the Profinet IO service used; for cyclic data exchange with real-time class 1, the values are between 0x8000 and 0xBBFF. The payload, normally with a size between 40 and 1500 bytes, is the application data sent between an IO-Controller and IO-Device. The cycle information, called the cycle counter, sets the update time of the cyclic data sent from the provider [18]. The frame also includes status information, used for validation of the data status and transfer status in the cyclic exchange.
Table 2.1: Real-time classes in Profinet IO.
Real-time classes in Profinet IO
Class Functionality
RT CLASS 1 Unsynchronized communication within a subnet.
RT CLASS 2 Unsynchronized or synchronized communication.
RT CLASS 3 Isochronous Real-Time communication.
RT CLASS UDP Unsynchronized communication between different subnets.
Figure 2.2: Profinet frame. Picture taken from [19].
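To make the frame layout concrete, the following minimal Python sketch parses the 16-byte header described above and checks whether a raw Ethernet frame carries cyclic Profinet IO data (real-time class 1). It is an illustration only; the field names are chosen for readability, and the constants follow the values given in this section.

    ETHERTYPE_PROFINET = 0x8892                     # Ethertype indicating Profinet
    RT_CLASS1_MIN, RT_CLASS1_MAX = 0x8000, 0xBBFF   # Frame ID range for RT class 1

    def parse_profinet_header(frame: bytes) -> dict:
        # Split the 16-byte header block into its fields (Figure 2.2).
        return {
            "dst": frame[0:6],                       # destination MAC address
            "src": frame[6:12],                      # source MAC address
            "ethertype": int.from_bytes(frame[12:14], "big"),
            "frame_id": int.from_bytes(frame[14:16], "big"),
            "payload": frame[16:],                   # application data + trailer
        }

    def is_cyclic_io_frame(frame: bytes) -> bool:
        # True if the frame carries Profinet cyclic IO data (RT class 1).
        if len(frame) < 16:
            return False
        h = parse_profinet_header(frame)
        return (h["ethertype"] == ETHERTYPE_PROFINET
                and RT_CLASS1_MIN <= h["frame_id"] <= RT_CLASS1_MAX)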
2.1.3 Abnormality in Profinet packets
The meaning of abnormal data in Profinet packets depends on the application and requires a definition of normal behavior. A common element of all false data-injection attacks is that the contents of the packets differ from the normal case. The bytes in the payload are modified to cause damage or to interfere with the regular industrial process.
The main challenge in detecting anomalies is that the detection system does not know the significance of each byte in the packet. Expanding on the Stuxnet example, where the speed of the centrifuges was changed, the speed is represented by a specific number of bytes in the payload. Detecting the abnormal behavior of the speed requires that the detection system knows where in the payload the speed data is located. This is not possible if the detection system is supposed to work for the general case with many different applications. Therefore, a more general approach is studied in this project, where the aim is to detect deviations in the behavior of the packets.
An industrial process is often quite static in normal operation. As long as no unexpected behavior occurs in the process, the inputs and outputs of a PLC stay within a limited interval. During an attack, however, the static operation is disrupted and the interval of the input and output values is likely to widen. As a result, the range of value combinations that the bytes in the payload can take increases, meaning more variation in the payload. Therefore, we make two assumptions for this project:
1. The payload of the packets varies little during normal operation resulting in a
limited number of combinations.
2. The payload of the packets varies more during an attack resulting in additional
variations in the data that would not appear during normal operation.
These assumptions are backed by experts at the company. The detection method will use them to detect abnormal behavior. Abnormal behavior at the packet level in an industrial process is thus related to how much the content varies during a time period.
One drawback of anomaly-based intrusion detection systems is the high number of false alarms. Not all anomalies are equivalent to an attack. For example, a fast increase in user activity in a network may be the result of a DoS or worm attack, but it can also result from the installation of new software during normal network operation [34]. Furthermore, abnormal data in Profinet does not necessarily mean that there is an attack either. It could be due to physical disturbances in the sensors that disrupt the control loop in a PLC.
However, the aim of the system in this project is to detect abnormal behavior. Further analysis, outside the scope of this project, is required to decide whether the detected abnormal behavior is an attack or not.
2.2 Stream Processing
A data stream can be described as a continuously growing sequence of data items [24].
Data streams exist in many different shapes in the real world, where some examples are
sensor data, network traffic, and sound waves. Many applications today require analyzing incoming data streams in an online manner without actually storing the data.
Anomaly detection in network traffic by collecting all data and storing it in a traditional
database for offline analytics is not practical or sufficient when the goal is to detect the
anomalies in real-time. Furthermore, it is not always possible to store the large volume
of data that is required for processing because of constraints in capacity. The continuous
data used in many applications is often massive, unbounded and evolves over time [27].
Storing all incoming data in the memory of the device where the analytics takes place
may not always be feasible, especially if the device is resource-constrained. A tool that makes it possible to query data directly on the incoming streams, without storing all the data, is often necessary. A Data Stream Management System (DSMS) is software that can process queries over continuous incoming data streams, in contrast to a DBMS (database management system), which works only on static data. The queries are often written in a scripting language similar to SQL and can be used to perform various kinds of filtering, calculations, and statistics on the streaming data. The queries over
the data streams are continuous queries (CQs), which means they run continuously until
stopped by the user and produce a streaming result as long as the queries are active [25].
This is in contrast to traditional queries on databases where the queries are executed
until the requested data is delivered.
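To give a rough feel for the continuous-query idea, the sketch below models a CQ in Python as a generator pipeline: the query is attached to an unbounded stream and produces results for as long as the stream delivers items. This is only a conceptual analogy to a DSMS, not the OSQL interface used later in this work; the stream of fake packet sizes is invented for the example.

    import random
    import time
    from typing import Iterator

    def packet_stream() -> Iterator[int]:
        # An unbounded data stream; here, fabricated packet sizes arriving forever.
        while True:
            yield random.randint(40, 1500)
            time.sleep(0.01)

    def continuous_query(stream: Iterator[int], window: int) -> Iterator[float]:
        # A continuous query: emit the mean packet size over each full window.
        buf = []
        for item in stream:
            buf.append(item)
            if len(buf) == window:
                yield sum(buf) / window   # one result per window, indefinitely
                buf.clear()

    # Runs until stopped by the user, producing a streaming result.
    for mean_size in continuous_query(packet_stream(), window=100):
        print(f"mean packet size of last 100 packets: {mean_size:.1f}")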
2.3 Feature Engineering
Anomaly detection methods based on a machine learning model use measurable infor-
mation as input to decide whether the incoming data follows a regular pattern or not.
How to represent the input to a machine learning algorithm, also known as the features, is an important element of a machine learning project [26]. Often, the collected raw data cannot be used directly as input; instead, derived features need to be constructed from the data source. Feature engineering is the task of choosing the features to be used by the machine learning algorithm, where the choice is often based on domain knowledge from experts within the specific field.
2.3.1 Features
Tran [41] proposes two different categories of network-related features, with respective subcategories and examples:
• Packet traffic features:
Packet traffic-related features inspect individual packets and extract useful informa-
tion from their header and content. The author divides the packet traffic features
into four subcategories.
– Basic packet features: The simplest category is basic packet features; basic header fields such as source port, destination port, header length, and various status flags are some examples.
– Content-based packet features: Content-based packet features are derived from the actual content of the packets.
– Time-based packet features: Time-based packet features measure the number of occurrences of a certain variable during a time period. One example of a time-based packet feature is the number of frames sent to the same destination from a given source in the last t seconds.
– Connection-based packet features: The last subcategory, connection-based packet features, are those that identify characteristics between the sender and receiver. The number of frames to a unique destination in the last n packets from the same source is one example of a connection-based packet feature.
• Network flow traffic features:
The other category is network flow traffic-related features. The main difference between packet traffic features and network flow traffic features is that the latter inspect the flow, meaning a sequence of packets, between source and destination. Analyzing sequences of packets is useful as it enables the identification of patterns that might otherwise be hidden at the individual packet level.
– Basic features: An example of a basic traffic flow feature is the length of the
flow (in seconds).
– Time-window features: The author mentions the number of flows to unique
destination IP addresses inside the network in the last t seconds from the same
source as an example of a time-window feature.
– Connection-based features: The number of flows to a unique destination IP
address in the last n flows from the same source is an example of a connection-
based traffic flow feature.
The ideas discussed by Tran [41] are applied to TCP and IP traffic; however, they remain the same for Profinet traffic. In summary, features can be constructed in various domains, such as the time domain, connection domain, and frequency domain. Which features to select depends on the detection goal and on what the abnormal behavior might look like.
2.3.2 Selection of features
The selection of the features used in this project is based on domain knowledge and intuition about which measurements show clear distinctions between normal and abnormal data. The constructed features build on the assumption stated in Section 2.1.3 that the payload varies more during abnormal operation. Since the goal is to use deep packet inspection and detect abnormality in the content of the packets, content-based packet features are used. The features also have to take connections into consideration, because in an industrial environment the data sent between an IO-Controller and different IO-Devices might not be the same. The IO-Controller can have separate data transmissions to different devices, resulting in diverse data depending on the source and destination connection. Therefore, the machine learning model needs inputs that separate the connections. Another aspect to take into account is that abnormal behavior may not be detectable by a simple analysis of each individual packet; detecting variations in data might require examining several packets over a window or sequence of packets. Schuster et al. [16] construct feature vectors based on sequences of multiple packets. A similar approach is used in this work. The selected features are:
• Standard deviation of the payloads from source to destination for the n last packets
• Number of distinct payloads from source to destination for the n last packets
2.4 Anomaly Detection
The goal of anomaly detection, also referred to as outlier detection, is to identify patterns in data that differ from the expected behavior [35]. The expected behavior
depends on the underlying distribution of the data. Anomalies are those behaviors or
objects that are not considered to be normal. Anomaly detection has been studied
since the early 19th century by the statistics community and finds use in several applications, among them intrusion detection. In intrusion detection systems and network
security, the aim is to find known or unknown anomalies that indicate an attack or
a virus. Garcia-Teodoro et al. [8] divide anomaly-based network intrusion detection
systems into three categories:
• Statistical-based
• Knowledge-based
• Machine learning-based
Statistical-based methods fit a statistical model representing the stochastic behavior of
the system and assume that normal data belongs to the higher probability regions of
the model whereas anomalies lie in the lower probability regions. Incoming data is
compared with the trained model to estimate whether it is an anomaly or not. The
anomaly decision is often based on an anomaly score with a predefined threshold: if the score exceeds the threshold, the system flags the incoming data as an anomaly [8]. Knowledge-based intrusion detection systems are formed by human experts and are often defined by rules describing normal system behavior. The main advantage of knowledge-based anomaly detection systems is the ability to relate the acquired information to the knowledge in the model. Another advantage is that the number of false alarms is often low. The third approach listed above is the machine learning-based method, which is the focus of this thesis.
Machine learning-based anomaly detection uses machine learning algorithms to classify the data as either normal or abnormal. A feature vector can be described as $X(t) \in \mathbb{R}^n$ at time $t \in [0, T]$. Consider two states that define normal or abnormal operation, $w_q$, $q = 0, 1$, where $w_0 = 0$ indicates normal data and $w_1 = 1$ stands for anomalous data. In machine learning, a mapping between $X$ and $w_q$ is made by learning from historic measurements. Consider a data set $D$ of $m$ measurements, where each measurement is an observation of $X$. With $x_i(t)$ denoting the $i$-th observation, also called a training sample in machine learning, $D$ can be described as $D = \{x_i(t)\}_{i=1}^{m}$. Furthermore, the output model set can be described as $D_l = \{y_q\}_{l=1}^{k}$, with $k$ measurements, where the $y_q$ are individual samples of $w_q$, called labels, with $q$ set to either 0 or 1. A pair $(x_i(t), y_i)$ is a labeled measurement. For example, $(x_i(t), 0)$ is a sample of normal operation and $(x_i(t), 1)$ stands for an abnormal sample. There are therefore three types of data sets of measurements:
• Normal data: $D_n = \{x_i(t), 0\}_{i=1}^{k-u}$
• Undefined data: $D = \{x_j(t)\}_{j=1}^{m}$
• Anomalous data: $D_l = \{x_r(t), 1\}_{r=1}^{u}$
The equations are based on work by Thottan et al. [34]. In this project, $D$ is the raw measurement data of Profinet packets. $D_n$ contains the measurements taken while the system is running in normal mode, that is, when no attacks occur. $D_l$ corresponds to observations made while an attack is happening, which is abnormal behavior. Anomaly detection learns a mapping from a training set, consisting of measurements, to the operation state $w_q$. This learning can then be used to classify new incoming events as either normal or anomalous [34].
A training set contains combinations of $D_n$, $D$ and $D_l$. When $D_l$ is included, the learning is said to be supervised, since labels are included in the training set. Although supervised learning methods can provide higher accuracy than unsupervised learning, there are some drawbacks. Labels are often very difficult and time-consuming to obtain in practice, and the knowledge of how the network and its packets behave is often too limited to set proper labels. These concerns apply to network behavior in general but also to the packet level in particular. Another disadvantage is that labeling all possible attacks is not feasible, particularly for new attacks that have never been seen before [36]. When only $D$ is included in the training set, the learning is unsupervised. In unsupervised learning, the goal is to detect anomalies by only looking at the properties of and relations between the data elements in the data set. No labels are required for either normal or abnormal data, which makes unsupervised anomaly detection a viable approach, since no predefined labels are needed to construct a model. For the reasons stated above, this work considers unsupervised learning.
2.5 Machine learning algorithm for anomaly detection
This section describes the anomaly detection algorithm used in this work. Clustering is a common method for anomaly detection. Clustering is an unsupervised classification method used to separate data into groups (clusters), where each cluster has similar characteristics [28]. In clustering, the available data to be grouped is not labeled beforehand, as is the case for supervised learning methods. Instead, the method tries to group a collection of unlabeled patterns into meaningful categories that are obtained only from the data itself [28]. The selection of suitable features for the clustering method is important in order to recognize patterns among the different clusters [29]. Clustering methods can be divided into four categories [30]:
• Partitioning methods
• Density-based methods
• Hierarchical methods
• Grid-based methods
Partitioning methods divide the data into k partitions, where each partition is a cluster. A well-known partitioning-based method is the k-means algorithm, which uses the mean as the function for deciding which cluster an observation belongs to. Density-based algorithms divide data into groups based on density: points that lie close to each other form high-density regions and are grouped into the same cluster, while points in low-density regions are considered outliers. Hierarchical methods group objects into hierarchical structures, while grid-based methods divide objects into grid structures. Chen and Tu [31] describe density-based methods as natural and attractive for data streams, as they can find arbitrarily shaped clusters and need to examine the data only once. They can also handle noise well, and they do not need a prior specification of the number of clusters, unlike the partitioning-based k-means algorithm, where the number of clusters k has to be defined in advance. Examples of density-based methods are DBSCAN, OPTICS, and DENSTREAM. DENSTREAM is based on DBSCAN and has additional features that enable the algorithm to be used with evolving data streams [40]. The algorithm studied in more depth in this work is DBSCAN. The reason DBSCAN is selected over other clustering methods is that it can find arbitrarily shaped clusters, and consequently there is no need to define the number of clusters beforehand. The reason clustering is used in this project instead of another unsupervised machine learning method is its ability to group observations into several groups. As described in Section 2.3.2, the features take connections into account, which means that the model might divide observations into groups related to the data sent between each connection. Other methods besides clustering, namely the one-class support vector machine and the Local Outlier Factor, were intended to be tested in this thesis as well, but this was not possible due to time constraints.
2.5.1 DBSCAN
A well-known density-based clustering algorithm is DBSCAN (Density-Based Spatial Clustering of Applications with Noise). DBSCAN takes two user-defined input parameters: the neighborhood distance Eps and the minimum number of points MinPts. Given a set of points to be clustered, each point can be classified as a core point, a reachable point, or an outlier. A core point is a point with at least MinPts neighbors (including the point itself) within the radius Eps, where the distance can be measured with an arbitrary distance measure such as the Euclidean distance. Each neighbor of the core point within the Eps radius is called a directly density-reachable point and belongs to the same cluster as the core point. These neighbors can themselves be core points; in that case, the points in their neighborhoods are also included in the same cluster, where each such point is a density-reachable point. Non-core points that are density-reachable are called border points. All other points, which are not density-reachable from any other point, are called outliers or noise points and are not included in any cluster [32].
Figure 2.3: DBSCAN. Picture taken from [32].
Figure 2.3 shows an illustration of the DBSCAN model. The MinPts parameter is set to 4, and Eps is visualized by the circles. Point A and all the other red points are core points, since each has at least four points within its neighborhood. The yellow points B and C are not core points, but since they are density-reachable from point A, they still belong to the same cluster and are defined as border points. Point N is not reachable from any other point and is thus considered an outlier. The task of the DBSCAN algorithm is to compute the clusters and find the outliers, or anomalies, as illustrated in the model. Figure 2.4 shows the pseudocode for DBSCAN.
Figure 2.4: Pseudocode of DBSCAN. Picture taken from [32].
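As a complement to the pseudocode, the following is a minimal Python sketch of DBSCAN as described above. It uses Euclidean distance and a naive neighborhood search that scans all points; it is an illustration rather than the implementation used in this work.

    from collections import deque

    def region_query(points, i, eps):
        # Indices of all points within distance eps of points[i], including i.
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    def dbscan(points, eps, min_pts):
        # Assign each point a cluster id (0, 1, ...) or -1 for noise.
        UNVISITED, NOISE = None, -1
        labels = [UNVISITED] * len(points)
        cluster = 0
        for i in range(len(points)):
            if labels[i] is not UNVISITED:
                continue
            neighbors = region_query(points, i, eps)
            if len(neighbors) < min_pts:
                labels[i] = NOISE                  # may later become a border point
                continue
            labels[i] = cluster                    # i is a core point: new cluster
            seeds = deque(neighbors)
            while seeds:
                j = seeds.popleft()
                if labels[j] == NOISE:             # noise reachable from a core point
                    labels[j] = cluster            # ...becomes a border point
                if labels[j] is not UNVISITED:
                    continue
                labels[j] = cluster
                j_neighbors = region_query(points, j, eps)
                if len(j_neighbors) >= min_pts:    # j is itself a core point:
                    seeds.extend(j_neighbors)      # expand the cluster through it
            cluster += 1
        return labels

    # Two dense groups and one isolated point (labeled -1 as noise):
    pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (9.0, 9.0)]
    print(dbscan(pts, eps=0.5, min_pts=2))   # [0, 0, 0, 1, 1, -1]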
CHAPTER 3
Methodology
This chapter describes the methodology behind the implementation of the project. The various software and hardware tools that are used, and the architecture of how they fit together, are described. The chapter also describes how self-generated network data is processed to create the selected features for the machine learning model, and how the model is implemented and validated. In short, network data is generated according to a simulated use case. The network traffic is sniffed and collected on an edge device where a machine learning platform is installed. The final solution consists of a machine learning algorithm that runs on the device and classifies the incoming data as either normal or abnormal.
3.1 Tools
3.1.1 Hardware
Edge device One of the goals of the project is to run the analysis on a resource-
constrained edge device. The edge device used is a development board from HMS called Beck DK151 (DB150). The board has an embedded controller called SC145 with 128 megabytes (MB) of working memory, 64 MB of flash memory, and an ARM Cortex-A7 processor. The device has a built-in Linux-based operating system called RTOS-LNX. The machine learning program runs on this device.
Profinet IO Controller A Siemens S7-1200 PLC is used as a Profinet IO Controller.
The PLC had to be configured, using the Windows program TIA Portal, to exchange cyclic data with the desired IO device before system startup. The PLC can provide output data to IO devices and also consume data from IO devices. For this project, it is configured to consume incoming data from an IO-device (Anybus X-gateway). It can handle communication at up to 10/100 Mbps.
Profinet IO Device An Anybus X-gateway from HMS Networks with Modbus TCP to Profinet IRT translation is used as a Profinet IO Device. The device permits sending cyclic I/O data between Modbus TCP networks and Profinet: the gateway translates Modbus data into Profinet cyclic I/O data, so that Profinet frames can be sent from the X-gateway to the PLC. The reason a Modbus TCP gateway is included in the project is that it enables generating Profinet data in a very flexible way, as described further below.
Network sniffer A network tap is used to sniff the network traffic for monitoring. The tap sits on the connection between the IO Controller and the IO Device and forwards the traffic to the edge device. It should be clarified that the edge device is not a Profinet IO Device; it only collects the traffic sniffed by the network tap.
Visual Analyzer A personal computer (PC) is used for data visualization and deploy-
ing queries to the data streams.
3.1.2 Software
Packet generator The initial idea was to program the PLC to send Profinet data from the PLC to an IO device. However, after discussions with employees at HMS, it was decided to change the architecture due to the complexity of generating the desired Profinet data. Instead, a packet generator was written in Python to send the desired cyclic data from the Anybus X-gateway to the PLC. The Modbus/TCP client library pyModbusTCP is used in the Python program to write data over Modbus TCP to the gateway. The data is then transferred by the gateway over Profinet to the PLC. This architecture makes it very flexible to try different structures for the data sent over the Profinet network, and changes can be made very quickly. The operation of the Python program is described in Section 3.3.
Packet collector The cyclic Profinet traffic is sniffed and collected on the edge device. The software application that reads the incoming traffic on the Ethernet port of the device is written in the programming language C. The board includes an application programming interface (API) called Packet API that provides functions for the reception of Ethernet packets. The application reads all incoming traffic and filters for cyclic Profinet I/O packets; a Python analogy of this filtering loop is sketched at the end of this subsection.
Stream engine The analysis is performed with the help of a stream processing and analysis system. The platform is called sa.engine and is provided by a company named Stream Analyze. The platform supports online analysis of data streams, including the deployment of statistical and machine learning models to resource-constrained edge devices. The largest configuration of the software requires only 7 MB of storage and was installed on the board without issues. The platform permits the creation of data streams in the programming language C. Since C is used for the data collection, the collection and the creation of the stream are merged into the same program. The stream consists of arrays, where each array contains the data sources intended for analysis. Once the initialization of the streams is completed, they are used by the analysis tool in the platform to run continuous queries against the streams. For the analyst, the platform provides a Visual Analyzer running on a PC. The Visual Analyzer consists of a graphical user interface where queries can be written and deployed to edge devices. The queries are written in a language similar to SQL, called OSQL. The CQs analyze the data streams, and the results are sent back to the Visual Analyzer, where they can be visualized either as text or as appropriate graphical plots. The communication between the Visual Analyzer and the edge device occurs over TCP. sa.engine also has support for developing machine learning models in the Visual Analyzer. After training, a model can be deployed on the edge device where the online analytics runs.
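The Packet API is specific to the board, but the collector's filtering loop can be illustrated with a Python analogy using a raw socket (Linux-only, and it requires elevated privileges). Here is_cyclic_io_frame is the filter sketched in Section 2.1.2, the interface name eth0 is an assumption, and handle is a hypothetical placeholder for downstream processing.

    import socket

    ETH_P_ALL = 0x0003   # capture all Ethertypes
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
    sock.bind(("eth0", 0))                # assumed capture interface

    while True:
        frame, _ = sock.recvfrom(65535)   # one raw Ethernet frame
        if is_cyclic_io_frame(frame):     # keep only cyclic Profinet IO packets
            handle(frame)                 # hypothetical downstream processing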
3.2 Architecture
The overall architecture of the project is described in Figure 3.1. The Python script writes data over Modbus TCP to the Anybus X-gateway. The gateway then translates the data into Profinet and sends the Profinet frames via the network tap to the PLC. The PLC acts as a Profinet IO Controller, which receives input from the IO Device (Anybus X-gateway). In a real-world setting, this input could be a sensor value acting as an input to a control loop in the PLC, or a simple monitoring measurement. The meaning of the Profinet traffic is not relevant for this project, nor is the direction of the traffic, since the goal is to find anomalies in the packets. While the traffic is being generated between the IO Device and the IO Controller, the network tap sniffs the packets and sends them to the edge device, where the anomaly detection algorithm runs. Data visualization occurs on the laptop via sa.engine's Visual Analyzer.
3.3 Generation of network data
The aim of this project is to detect anomalies in Profinet packets. An already running system setup generating real data is not available, nor is actual data generated from a real attack. Therefore, a simulated use case needs to be constructed, where both normal and abnormal data is generated. This simulated case, written in Python, strives to be as realistic as possible. It should also be generic, meaning that the detection method should work for the general case and not be focused on one specific attack scenario. As stated in Section 2.1.3, normal data is more static than abnormal data. This is taken into consideration in the script, which sticks to the idea that a PLC normally operates in strict patterns with small changes in inputs and outputs.
Figure 3.1: Overall architecture of hardware and data transmission.
Taking an industrial robot as an example, the movement of the robot is fixed; the same holds for a conveyor belt. The idea and structure of the generated data were defined in conjunction with experts from HMS. Although the use case is supposed to be generic, it requires grounding in real-world logic. In an industrial setting that uses Profinet, a PLC sends all the information needed in the cyclic I/O payload. Taking the example of an electric motor drive, information such as speed and direction is embedded into the payload block. The use case is therefore modeled on a motor drive scenario, where speed and direction are randomized and embedded into the Profinet packets. The script generates normal data and abnormal data separately. The generation of data for normal operation follows this procedure:
1. Randomize a time t (0-5 seconds).
2. Randomize a speed s (0-30).
3. Randomize a direction d (0 or 1).
4. Write the speed and direction into registers in the Anybus X-gateway for t seconds.
5. Profinet cyclic I/O data is sent to the PLC.
6. Repeat steps 1 to 5 until the program is stopped by the user.
For abnormal data generation, the procedure is the same. The only difference is the speed range: s is randomized between 0 and 1000, so the abnormal data has a larger variance.
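A minimal sketch of how such a generator could look with pyModbusTCP is shown below. The gateway address and the register layout (speed in one holding register, direction in the next) are assumptions for illustration; the actual mapping depends on the X-gateway configuration.

    import random
    import time
    from pyModbusTCP.client import ModbusClient

    GATEWAY_HOST = "192.168.0.10"   # assumed address of the Anybus X-gateway
    SPEED_REG = 0                   # assumed layout: speed, then direction
    client = ModbusClient(host=GATEWAY_HOST, port=502, auto_open=True)

    def generate(max_speed):
        # Randomize speed/direction and hold each value pair for 0-5 seconds.
        while True:
            t = random.uniform(0, 5)
            s = random.randint(0, max_speed)
            d = random.randint(0, 1)
            client.write_multiple_registers(SPEED_REG, [s, d])
            time.sleep(t)        # the gateway keeps sending the data cyclically

    generate(max_speed=30)       # normal operation: s in [0, 30]
    # generate(max_speed=1000)   # abnormal operation: s in [0, 1000]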
3.4 Data processing
The collected data needs to be processed to extract useful information from the raw traffic sniffed from the network. The raw network traffic needs to be filtered and structured to create proper features for the detection method. The steps involved in the data processing stage are filtering, cleaning, normalization, and feature extraction.
3.4.1 Data collection
Since no external data is available beforehand, all data is collected directly from the traffic
in the network. After the production of the Profinet traffic, as described in Section 3.3,
the network data is ready to be sniffed and captured. Raw Profinet packets are collected
directly on the edge device as described in Section 3.1.
3.4.2 Pre-processing
Useful information has to be produced from the raw network traffic in order to detect abnormal
data. The network traffic is filtered to consider only Profinet frames, which in turn
are filtered to consider only cyclic I/O data. Therefore, frames with a frame ID between
0x8000 and 0xBBFF are used. This filtering is performed in the C application running on the
edge device. To ensure that there are no duplicates or missing values, Wireshark is used
to record a sample of the traffic on the PC. The recorded traffic is relayed to the edge
device, and the recorded traffic file is compared with the output on the device to verify
that the traffic is identical. This comparison revealed neither missing packets nor duplicates.
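For illustration, the filter logic can be sketched in Python as follows; the actual implementation is in C. The offsets assume an untagged Ethernet II frame, in which Profinet real-time traffic carries EtherType 0x8892 followed by a two-byte frame ID.

```python
import struct

PROFINET_ETHERTYPE = 0x8892  # EtherType used by Profinet real-time frames

def is_cyclic_io(frame: bytes) -> bool:
    """Return True if a raw Ethernet frame is a Profinet cyclic I/O frame.

    Layout assumed: dst MAC (6 bytes) + src MAC (6 bytes) + EtherType
    (2 bytes) + frame ID (2 bytes); VLAN-tagged frames would shift these
    offsets and are not handled in this sketch.
    """
    if len(frame) < 16:
        return False
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != PROFINET_ETHERTYPE:
        return False
    (frame_id,) = struct.unpack("!H", frame[14:16])
    return 0x8000 <= frame_id <= 0xBBFF  # cyclic I/O range used in this work
```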
When the filtering and cleaning stage is completed, the next step is to generate data streams
containing the data sources. The data streams are created with sa.engine in the same application
running on the device. Each element in the data stream is represented as an array
containing all the data sources intended for the stream. The collected frames include
useful information, as indicated in Figure 2.2, from which data sources can be
fetched. The following data sources, all based on each individual packet, are used:
• Timestamp
• Packet size
• MAC source address
• MAC destination address
• IO Data size
• Frame ID
• Cycle counter
• IO Data
Each item in the stream is represented by these packet characteristics. When the data
stream is ready, it is queried from the visual analyzer and the elements in the
stream are saved to a .json file on the PC. This procedure is carried out separately for
normal operation and abnormal operation, resulting in two data sets: one containing
only normal data and one containing only abnormal traffic.
The file with normal data consists of 78349 rows (packets) and 8 columns (data sources),
whereas the file with abnormal data contains 46376 rows. Having the data stored
in files provides greater flexibility to experiment with the data.
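To make the element layout concrete, the following sketch shows one stream element as an array of the eight data sources, appended to a .json file; the field values and file name are illustrative only.

```python
import json

# One stream element: an array of the eight data sources listed above.
element = [
    1556012345.123,        # timestamp
    60,                    # packet size (bytes)
    "00:30:11:0a:0b:0c",   # MAC source address
    "00:30:11:0d:0e:0f",   # MAC destination address
    4,                     # IO data size (bytes)
    0x8001,                # frame ID
    4711,                  # cycle counter
    [18, 0, 1, 128],       # IO data payload bytes
]

# Appending elements line by line yields a file that can later be
# replayed as a stream for offline experiments.
with open("normal_data.json", "a") as f:
    f.write(json.dumps(element) + "\n")
```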
3.4.3 Feature extraction
The collected data sources need to be converted into useful features that can be used in
the machine learning model. The individual data sources in the stream cannot be used
directly as input to the machine learning algorithm; a data transformation is required
to enable their use. Thus, a data set containing the features to be used by the machine
learning algorithm is constructed. The feature engineering process is carried out offline,
and the feature extraction procedure operates on the data sources stored in the .json files.
The feature construction is implemented in a function called feature_extraction(windows)
using the OSQL language. The function takes a stream of windows of specific size and
stride. The windows are created by the built-in function winagg(s, sz, str) in sa.engine,
which forms a stream of windows of size sz and stride str over a stream s. The stream s
is created by calling the function read_stream(file), which creates a stream containing the
elements in a .json file. The stream contains the data sources listed in the previous
subsection. The selected features described in Section 2.3.2 are thereby constructed over
windows of the data sets. An explanation of how the selected features are extracted is
provided below.
Standard deviation of the payloads from source to destination for the n last packets
This feature is created by taking the standard deviation of the bytes in all payloads
in a window of size n. The bytes in the payload are first divided by 256 (the number
of possible byte values) to obtain normalized values between 0 and 1.
Number of distinct payloads from source to destination for the n last packets
This feature is created by counting the number of distinct variants of IO Data
contents per window of size n. Many variations in the payload from packet to
packet result in a high count, and vice versa. The feature is normalized by dividing
by the window size, the maximum possible number of variations in a window,
yielding a feature that varies between 0 and 1.
Since features are created per window and per connection (source to destination), the
result is one feature vector for each connection in each window.
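The following Python sketch mirrors the OSQL feature extraction under these definitions. The packet field names are illustrative; winagg-style windowing is emulated by a simple size-and-stride slicer. Whether the thesis uses the population or sample standard deviation is not stated, so the population form is assumed here.

```python
import statistics
from collections import defaultdict

def window_stream(packets, size, stride):
    """Emulate winagg(s, sz, str): windows of `size` with step `stride`."""
    for i in range(0, len(packets) - size + 1, stride):
        yield packets[i:i + size]

def extract_features(window):
    """Compute the two features per connection over one window.

    Each packet is assumed to be a dict with 'src', 'dst' and 'payload'
    (a list of byte values 0-255)."""
    by_conn = defaultdict(list)
    for pkt in window:
        by_conn[(pkt["src"], pkt["dst"])].append(bytes(pkt["payload"]))

    features = {}
    for conn, payloads in by_conn.items():
        # Feature 1: std of all payload bytes, normalized by dividing by 256.
        values = [b / 256 for p in payloads for b in p]
        stdev = statistics.pstdev(values) if values else 0.0
        # Feature 2: distinct payload variants, normalized by the window size.
        distinct = len(set(payloads)) / len(window)
        features[conn] = [stdev, distinct]
    return features
```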
Once proper features have been selected and validated and the system runs online,
the feature processing is performed online on the data stream instead of on a static file. The
input to the feature extraction function then becomes the online real-time stream originating
from the edge device. Furthermore, both the feature extraction and the machine learning
algorithm run on the edge device when the system is online, instead of on the PC
(the visual analyzer). See Figure 3.2 for an illustration of the process that goes
from packet collection to feature extraction on the data stream. The window size in
the figure is 5; however, different window sizes are tested during validation to examine to
what extent the window size influences the algorithm's accuracy.
Figure 3.2: Data processing.
3.5 Creation of training and validation sets
The training of the clustering model is divided into two different scenarios. In the real
world, it is impossible to obtain data resulting from an attack before the attack has
happened, which makes it difficult to create a training set based on abnormal data. If
the aim is to detect unseen attacks that result in behavior never observed before, a sound
approach is to train the model on normal data only and treat everything that deviates
from normal behavior as an anomaly [16]. On the other hand, it is interesting to see the
results of a clustering model trained with both normal and abnormal data. Since abnormal
examples of traffic are simulated and available for this project, it is possible to use that
abnormal data for training. Having described these two situations, this project considers
two different scenarios.
Scenario A Clustering model is trained using features based on both normal and abnormal data.
Scenario B Clustering model is trained using features based on normal data only.
In scenario A, where the model is trained using features extracted from both normal and
abnormal operation, the validation set also consists of feature vectors extracted from
normal and abnormal data. Figure 3.3 describes how the training set and validation set
are generated. Features are extracted in the format described in Section 3.4.3, where 90%
of the features are derived from normal data and 10% from abnormal data. Each feature
vector carries a label describing the data as normal or abnormal. Note that the training-set
labels are not used during training; they serve only to validate the model.
Figure 3.3: Training set and validation set for scenario A.
In scenario B, the distribution of the training and validation sets differs from scenario A.
A feature data set is created containing only features from normal data, of which 70%
go into the training set. The validation set is more involved, since it should include
features labeled both normal and abnormal. The remaining 30% of the data set described
above go into the validation set and are labeled as normal. To also have observations
labeled abnormal in the validation set, features are generated from a mix of x % normal
data and (100 - x) % abnormal data. Even though this mix contains observations from
normal data, it is still labeled abnormal. The motivation reflects a real-world attack:
the traffic may be malicious as long as it contains some abnormal data, regardless of
the proportion. Figure 3.4 illustrates the distribution of the training and validation
sets when the machine learning model is trained using only features labeled normal; a
sketch of the set construction follows the figure.
Figure 3.4: Training set and validation set for scenario B.
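A minimal sketch of the scenario B set construction is given below; the list handling and the shuffling step are assumptions, since the thesis does not specify how the 70/30 split is drawn.

```python
import random

def scenario_b_sets(normal_features, abnormal_mix_features, train_frac=0.7):
    """Build the scenario B training and validation sets.

    normal_features:       feature vectors extracted from normal data only.
    abnormal_mix_features: feature vectors from traffic mixing x% normal
                           with (100 - x)% abnormal data; labeled abnormal
                           regardless of the mix.
    """
    shuffled = list(normal_features)
    random.shuffle(shuffled)
    split = int(train_frac * len(shuffled))

    training_set = shuffled[:split]  # unlabeled; used to fit the model
    validation_set = (
        [(v, "normal") for v in shuffled[split:]]
        + [(v, "abnormal") for v in abnormal_mix_features]
    )
    return training_set, validation_set
```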
3.6 Implementation of machine learning algorithm
The machine learning part of this project is implemented using the same stream processing
engine, sa.engine, which has built-in support for various machine learning algorithms,
including DBSCAN. The step after feature extraction, as described in Section
3.4.3, is to train the DBSCAN model. The model is trained with the feature vectors from
the generated training set described in Section 3.5. The first step is to store the feature
vectors that are to be clustered by calling the built-in populate_data(s) method, which
takes a stream s of feature vectors from the training set as an argument. Once this is done,
the model is trained with the stored feature vectors by calling conn_dbscan(eps, minPts)
with the defined Eps and MinPts as arguments. Prediction results are generated by the
method conn_dbscan_classify(v, eps, minPts), where the argument v is a row in the data
set, i.e., a feature vector.
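The internals of sa.engine's DBSCAN are not documented here, but the same train-then-classify pattern can be sketched offline with scikit-learn. The classify step below assigns a new vector to the cluster of the nearest core point within Eps, or marks it as an outlier (-1) otherwise; this mirrors what conn_dbscan_classify is used for, not its actual implementation.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def train_dbscan(training_vectors, eps, min_pts):
    """Offline analogue of populate_data(s) followed by conn_dbscan(eps, minPts):
    fit DBSCAN and keep the core points with their cluster labels."""
    X = np.asarray(training_vectors, dtype=float)
    model = DBSCAN(eps=eps, min_samples=min_pts).fit(X)
    core = model.core_sample_indices_
    return X[core], model.labels_[core]

def classify(v, core_points, core_labels, eps):
    """Analogue of conn_dbscan_classify(v, eps, minPts): return the cluster
    of the nearest core point within eps, or -1 for an outlier."""
    if len(core_points) == 0:
        return -1
    dists = np.linalg.norm(core_points - np.asarray(v, dtype=float), axis=1)
    i = int(np.argmin(dists))
    return int(core_labels[i]) if dists[i] <= eps else -1
```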
3.7 Validation method
3.7.1 Validation of clustering model
Evaluation of the implemented system is based on how well the clustering algorithm
manages to classify abnormal and normal data. Evaluating clustering algorithms, and unsupervised
methods in general, is not completely straightforward. The aim of clustering
is to group patterns without predefined labels; yet, to measure how well the
model categorizes normal and abnormal traffic, labels are needed, not for training the
model but for validation. As described in Section 3.5, the clustering model is trained
in two different ways. Therefore, the validation methodology also differs between the
two scenarios.
Figure 3.5: Confusion matrix describing relation between predicted and actual values.
The validation metrics are based on how well the clustering model distinguishes normal
data from abnormal data. Four concepts need to be introduced here: true positives (TP),
true negatives (TN), false positives (FP), and false negatives (FN). In intrusion detection
systems, these concepts describe the relation between what the system detects for an
analyzed event (normal or intrusion) and the event's actual nature (innocuous or malicious),
as described in Figure 3.5. A TP is a malicious event correctly classified as
an intrusion. A TN is an innocuous event classified as normal. An FP is an
innocuous event classified as an intrusion, in other words a false alarm. An FN occurs
when a detection system classifies a malicious event as normal. The objective of an intrusion
detection system is a low number of FPs and FNs and a high number of TPs and
TNs [8].
For scenario A, where the model is trained with both normal and abnormal data, the
concepts signify the following:
• TP: Abnormal labeled features end up in an abnormal cluster or as an outlier.
• TN: Normal labeled features end up in a normal cluster.
• FP: Normal labeled features end up in an abnormal cluster or as an outlier.
• FN: Abnormal labeled features end up in a normal cluster.
A cluster is considered normal if at least 70% of its points are labeled normal in the
training set; otherwise, it is considered abnormal. This threshold is not precise and
can be adjusted based on intuition, but 70% is judged a reasonable value. Because
clusters in the trained model may contain points originating from abnormal data, the
clusters must be labeled, using the labels from the training set, before the validation
can be performed.
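A sketch of the 70% labeling rule, assuming cluster ids as produced by DBSCAN (-1 for outliers) and string labels from the training set:

```python
from collections import Counter

def label_clusters(cluster_ids, training_labels, threshold=0.7):
    """Label a cluster 'normal' if at least `threshold` of its points are
    labeled normal in the training set; otherwise 'abnormal'. Outliers
    (cluster id -1) are always treated as abnormal."""
    normal_counts, totals = Counter(), Counter()
    for cid, label in zip(cluster_ids, training_labels):
        totals[cid] += 1
        if label == "normal":
            normal_counts[cid] += 1
    return {
        cid: "normal"
        if cid != -1 and normal_counts[cid] / totals[cid] >= threshold
        else "abnormal"
        for cid in totals
    }
```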
For scenario B, the model is trained using only normal data. The definitions of TP, TN,
FP, and FN are therefore slightly different in this case. The following definitions apply:
• TP: Abnormal labeled features end up as an outlier.
• TN: Normal labeled features end up in a cluster.
• FP: Normal labeled features end up as an outlier.
• FN: Abnormal labeled features end up in a cluster.
A normal feature vector is a feature vector constructed from normal data, while an
abnormal feature vector is one that is built on x % normal data and (100-x) % abnormal
data, as illustrated in Figure 3.4.
Three metrics are used to measure the performance of the clustering model: Recall,
Precision, and F1-score. Recall measures the TP rate and indicates how well the model
classifies incoming malicious observations. The formula for Recall is:

Recall = TP / (TP + FN)    (3.1)

Looking at the formula, a high recall means that the model is good at classifying
abnormal data correctly, since FN will then be low.
Precision, on the other hand, indicates how well the model handles normal observations,
i.e., how rarely it raises false alarms. The formula for Precision is:

Precision = TP / (TP + FP)    (3.2)
A low number of FPs results in high precision, and vice versa. The ideal for an
intrusion detection system is recall and precision that are both as high as possible. Since one of
the problem formulations in this thesis asks whether it is possible to detect abnormal data,
recall is a relevant metric. In addition, since false alarms are a major problem in intrusion
detection systems, precision must be considered as well.
The last metric is the F1-score, described by the following equation:

F1 = 2 · (Precision · Recall) / (Precision + Recall)    (3.3)

The F1-score is the harmonic mean of recall and precision; the best value is 1 and
the worst is 0.
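A direct translation of Equations 3.1–3.3 into Python, including the NaN case for precision when TP + FP = 0:

```python
def recall(tp, fn):
    # Equation 3.1: share of malicious observations that are detected.
    return tp / (tp + fn) if (tp + fn) > 0 else float("nan")

def precision(tp, fp):
    # Equation 3.2: NaN when TP + FP = 0, i.e. when every observation
    # was classified as normal and no alarms were raised.
    return tp / (tp + fp) if (tp + fp) > 0 else float("nan")

def f1_score(p, r):
    # Equation 3.3: harmonic mean of precision and recall.
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0
```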
The validation will be split into the two scenarios A and B. In both scenarios, the
DBSCAN model will be validated with different values of Eps, MinPts, and feature
window size. Eps and MinPts will be varied to see what impact these values have
on DBSCAN; see Section 2.5.1 for a description of Eps and MinPts.
For scenario B, different mixes of normal and abnormal data will be used for the
validation set (the value x in Figure 3.4). This will measure the model's ability to
detect abnormal traffic when the data deviates only slightly from normal; in other words, how
much abnormal data an attack must contain for the method to detect it.
3.7.2 Validation of the online data stream case
One of the problem formulations is whether a machine learning algorithm can run on
a resource-constrained edge device. Therefore, the goal is to run the clustering model
on the edge device and detect abnormal traffic in the form of online data streams. The
validation will be based on how well the model classifies generated abnormal traffic as
malicious; thus, recall will be measured. The network traffic will be generated in the same
way as described in Section 3.3, with the only difference that normal data will be mixed
with abnormal data at x % normal and (100 - x) % abnormal. Features will be
extracted directly on the online data stream, as shown in Figure 3.2. The model will be
trained as in scenario B with only normal data. The training occurs on a PC, and the
trained clustering model is then deployed on the edge device. The selected parameters for
the clustering will be the ones that generate the best results for scenario B.
CHAPTER 4
Results
This chapter describes the results from the validation of the clustering model, both
when the model is trained with abnormal and normal data and when it is trained
with normal data only. Results when the solution runs on online data streams are also
presented.
4.1 Clustering method
4.1.1 Scenario A
Figure 4.1 presents the results from the validation of the clustering model when the model
has been trained with both normal and abnormal features. As can be seen, the choice
of Eps and MinPts is of great importance when designing the model. One
interesting observation is that precision becomes worse with smaller Eps and greater
MinPts when extracting features with window sizes of 1000, 2000, and 4000 packets; that is,
the number of false alarms increases when Eps is low and MinPts is high. This might be
because, with such parameters, nearly every observation becomes an outlier regardless of its label.
The same pattern can be seen for 500-packet windows, but not as clearly. Comparing
window sizes, precision is better with a smaller window size. An
interruption in a line indicates a NaN (not a number). Precision is NaN when TP + FP
is zero, causing a division by zero in the precision equation; this happens when all incoming
observations are classified as normal, so that no TPs or FPs occur. A high value of Eps
can lead to all points falling into normal clusters.
Recall is not affected by Eps and MinPts to the same extent. On the other hand,
it improves as the window size grows. With window sizes of 2000 and 4000, recall
reaches 100%, leaving no undetected malicious behavior. Window sizes greater than
4000 are not tested in this project, because a too large window size would yield too few
feature vectors, given the relatively low number of rows in the data set. This investigation
can, however, continue in future work.
All the results for scenario A are available as tables in Appendix A.
4.1.2 Scenario B
The results from the model trained with only normal data can be seen in
Figures 4.2 and 4.3. Figure 4.2 shows Recall, Precision, and F1-score when x% normal data
and (100 - x)% abnormal data are used in the feature vectors labeled abnormal; refer
again to Figure 3.4 for an illustration of these definitions. The features are extracted from
windows of size 4000. As can be seen, precision is 100% in most cases when
Eps is low. As in scenario A, NaN means that every observation is classified as
normal: no abnormal-labeled features are classified as outliers, nor do any normal-labeled
features become outliers. All abnormal features then land in a cluster, producing many
FNs, which is undesirable. When 30% of the abnormal-labeled traffic comes from abnormal
data, precision is 100% regardless of Eps and MinPts, meaning that no false alarms occur.
The recall results are not promising when only 5% of the abnormal-labeled features come
from abnormal data; the best recall is then 32.5%. For 10% abnormal data it is
85.71%, still not fully reliable. However, with 20% abnormal data,
the clustering method achieves 100% recall with Eps 0.008 and MinPts 8. This
means that when at least 20% of all Profinet packets in a network are malicious, the
detection method detects the attack in every case. Since the detection method
considers connections and analyzes all traffic on the network, this could correspond either
to an attacker completely hijacking 20 out of 100 machines and sending 100% malicious data
to each of them, or to the Profinet traffic to every machine containing 20% packets from
the attacker and 80% packets from normal operation.
Figure 4.3 shows the importance of selecting a proper window size: it compares
results when the window size is varied and the ratio of abnormal to normal data is
fixed. As can be noticed, the bigger the window size, the better the recall, meaning that the
model is better at detecting attacks with larger windows. As a visualization of
this, Figure 4.4 shows a scatter plot of the selected features for window sizes 500 and 4000
when 90% of the abnormal-labeled traffic comes from normal data. The distinction
between normal and abnormal is cleaner for the larger window size, which is why the model
then classifies abnormal traffic more accurately.
Comparing scenarios A and B in the best case, with window size 4000 and suitably chosen
Eps and MinPts, there is no difference between precision and recall: both are 100%.
All the results for scenario B are available as tables in Appendix A.
Table 4.1: Results for Scenario B on online data streams when changing x% normal traffic.
Window size: 4000 packets
          95 % normal   90 % normal   80 % normal
Recall    57,1 %        71,4 %        100 %
4.1.3 Online data stream case
The stream engine platform, which includes support for machine learning, was installed
on the device without issues; the device had no problems handling the 7 MB
footprint. The trained model was deployed on the device, where the analytics ran
on online data streams without obstacles. The recall results when the clustering model ran
online on the edge device are shown in Table 4.1. The selected
parameters for the clustering model are Eps = 0.008 and MinPts = 8, as these
gave the most promising results in scenario B. The recall is rather similar to when
the model ran offline. When the generated traffic consists of 20% abnormal data,
the detection method classifies the traffic as an attack 100 times out of 100, which is a
promising result. Only window size 4000 is presented, since it gives the best recall.
With smaller windows, however, the detection system can send a faster response, in the form
of an alarm, since calculating features over smaller windows in the data stream takes
less time.
Figure 4.1: Results for Scenario A.
Figure 4.2: Results for Scenario B when changing x% normal traffic.
Figure 4.3: Results for Scenario B when changing window size.
Figure 4.4: Scatter plot of the selected features over different window sizes, normalized between
0 and 1.
CHAPTER 5
Discussions, Conclusions and Future Work
The goal of this project was to implement and evaluate a machine learning solution
that aims to detect abnormal application data in industrial networks. In addition, the
solution should run locally on a resource-constrained edge device from HMS. Running the
analytics close to the industrial floor reduces the network traffic that would otherwise be
required if the analytics ran in the cloud.
5.1 Discussions and Conclusions
The investigated questions in this thesis were related to anomaly detection in industrial
networks using a resource-constrained edge device. This section answers the research
questions stated in Section 1.3.
Q1 Can abnormal data in an industrial network be detected using Deep Packet Inspection and Machine Learning?
The results show that when features are extracted with a proper window size, it is
possible to separate normal from abnormal data, thereby permitting the detection
of abnormal observations. In this project, the network data is self-constructed,
including the appearance of both normal and abnormal data. This may explain why
the recall and precision scores are high in the best-case scenario. Because this work
was carried out using simulated data, it is not possible to ensure the performance
of the approach in all operational cases; investigating additional cases requires
a solution that is trained and validated on data generated from a real industrial
environment. The goal of this implementation was that the detection system
should operate in the general case, meaning that the method should work not only
for one attack scenario but for all false-data injection attacks in any kind of industrial
application. The unsupervised model should thus learn what is normal for a general
use case and detect abnormal behavior solely from how the incoming application
data is classified by the model. This requires a solution that works for any case
without prior knowledge of the meaning of the network data. If the assumptions in
Section 2.1.3 hold true in the real world, and the generated traffic in this project
is realistic, it can be concluded that it is possible to detect abnormal data in an
industrial network using deep packet inspection and machine learning. This
conclusion applies only to the Profinet protocol.
Q2 To what extent, as it relates to performance, speed, and system footprint, can an
unsupervised anomaly detection algorithm be useful on a resource-constrained edge
device?
Detecting anomalies in real time, independent of the use case, requires that the
processing of the incoming data occurs as close as possible to the data source.
The results show no significant difference in performance between the model running
online on the edge device and running offline, provided adequate clustering
parameters and window size are selected. This indicates that the selected unsupervised
anomaly detection method can retain its high detection rate even when the system
runs on a resource-constrained edge device with online data streams. The edge device
had no problems handling the required footprint of the stream engine platform, including
the machine learning tools. As described in the results, the detection system provides
more frequent prediction responses when the window size is small, indicating that the
number of packets from which features are extracted affects how quickly the system
can respond. Extracting features from a small number of packets at a time also requires
less processing power than extracting over bigger windows. There is thus a trade-off when
deciding how many packets to analyze at a time, because bigger windows at the same
time produce better prediction performance. An investigation of whether a supervised
anomaly detection method would have performed better or worse was not part of this
thesis. However, it can be concluded that, with respect to storage, an anomaly-based
approach is preferable to a signature-based system, since it does not need to store
signatures for all known attacks but instead makes decisions based only on the observed
data.
Q3 Is the implementation of the solution feasible in an existing device available in the market?
The requirements on a complete solution like the one implemented in this project
are high if it is to run in a real-world setting on devices available in the market.
The system must have a good detection rate of abnormal data in order not to miss
any potential attacks. At the same time, the system should not generate too many
false alarms, which would reduce confidence in the system. The physical
performance is also important: the system should run without crashes or
interruptions. Since the results show that the detection rate is good when proper
parameters are selected, and the physical performance is good in terms of speed
and required storage space, this thesis work demonstrates that it is feasible to
implement a machine learning solution that runs on an existing resource-constrained
device available in the market. In addition, this solution is able to detect anomalies
in industrial networks, per the answer to Q1. However, a full judgment
of a real-world scenario requires validation on real-world data, which was not
available for this thesis work.
5.2 Future Work
Although the implemented solution achieved good scores in the validation of the detection
method and performance, some future work remains to improve the proposed solution.
The investigation of different anomaly detection methods was not exhaustive; the only
algorithm implemented and tested was DBSCAN. Some research papers bring up the
one-class support vector machine as a potential alternative for unsupervised anomaly
detection [16], and it would be interesting to evaluate whether it generates better results.
DenStream is another algorithm that would be interesting to evaluate. Both of these
algorithms are supported in the platform used in this work, but time constraints did not
permit testing them. Another aspect left for future work is to train and validate
the model on a real-world Profinet network, to see whether the validation measurements
provide similar results. This is required to conclude whether the implemented
solution is sufficient for use in a real-world environment. The results from a real-world
case would also confirm whether the assumptions stated in Section 3.3 apply to real-world
environments. As discussed in Section 2.1.3, the anomaly-based intrusion detection
system may identify anomalies that are not actual attacks, even though they are
anomalies according to the model. A future study should therefore measure how many of
the correctly classified anomalies are actually attacks; this would require data generated
from real-world attacks, which was not available for this thesis work. Future work also
includes an investigation of greater window sizes for feature extraction, to see whether
the results improve further or eventually begin to degrade. Finally, it could be useful to
start building a database of labeled data from different types of attacks, in order to test
supervised machine learning algorithms and compare the results with those in this work.
APPENDIX A
Tables
Table A.1: Results for Scenario A with window size: 500 packets.
Window size: 500 packets
Precision Recall F-score Eps MinPts
NaN 0 % 0 % 0,05 2
100 % 87,5 % 93,3 % 0,025 2
100 % 100 % 100 % 0,01 2
100 % 100 % 100 % 0,005 2
100 % 100 % 100 % 0,003 2
100 % 100 % 100 % 0,002 2
100 % 12,5 % 22,2 % 0,05 4
100 % 87,5 % 93,3 % 0,025 4
100 % 100 % 100 % 0,01 4
100 % 100 % 100 % 0,005 4
100 % 100 % 100 % 0,003 4
100 % 100 % 100 % 0,002 4
100 % 25 % 40 % 0,05 8
100 % 87,5 % 93,3 % 0,025 8
100 % 100 % 100 % 0,01 8
100 % 100 % 100 % 0,005 8
100 % 100 % 100 % 0,003 8
80 % 100 % 88,9 % 0,002 8
100 % 37,5 % 54,5 % 0,05 16
100 % 87,5 % 93,3 % 0,025 16
80 % 100 % 88,9 % 0,01 16
80 % 100 % 88,9 % 0,005 16
80 % 100 % 88,9 % 0,003 16
72,7 % 100 % 84,2 % 0,002 16
Table A.2: Results for Scenario A with window size: 1000 packets.
Window size: 1000 packets
Precision Recall F-score Eps MinPts
NaN 0 % 0 % 0,05 2
100 % 100 % 100 % 0,025 2
100 % 100 % 100 % 0,01 2
66,7 % 100 % 80 % 0,005 2
66,7 % 100 % 80 % 0,003 2
66,7 % 100 % 80 % 0,002 2
NaN 0 % 0 % 0,05 4
100 % 100 % 100 % 0,025 4
66,7 % 100 % 80 % 0,01 4
66,7 % 100 % 80 % 0,005 4
66,7 % 100 % 80 % 0,003 4
66,7 % 100 % 80 % 0,002 4
NaN 0 % 0 % 0,05 8
100 % 100 % 100 % 0,025 8
40 % 100 % 57,1 % 0,01 8
40 % 100 % 57,1 % 0,005 8
40 % 100 % 57,1 % 0,003 8
40 % 100 % 57,1 % 0,002 8
100 % 100 % 100 % 0,05 16
100 % 100 % 100 % 0,025 16
13,8 % 100 % 24,2 % 0,01 16
13,8 % 100 % 24,2 % 0,005 16
13,8 % 100 % 24,2 % 0,003 16
11,4 % 100 % 20,5 % 0,002 16
Table A.3: Results for Scenario A with window size: 2000 packets.
Window size: 2000 packets
Precision Recall F-score Eps MinPts
100 % 100 % 100 % 0,05 2
100 % 100 % 100 % 0,025 2
100 % 100 % 100 % 0,01 2
50 % 100 % 66,7 % 0,005 2
50 % 100 % 66,7 % 0,003 2
50 % 100 % 66,7 % 0,002 2
100 % 100 % 100 % 0,05 4
100 % 100 % 100 % 0,025 4
100 % 100 % 100 % 0,01 4
50 % 100 % 66,7 % 0,005 4
50 % 100 % 66,7 % 0,003 4
50 % 100 % 66,7 % 0,002 4
100 % 100 % 100 % 0,05 8
100 % 100 % 100 % 0,025 8
50 % 100 % 66,7 % 0,01 8
12,5 % 100 % 22,2 % 0,005 8
12,5 % 100 % 22,2 % 0,003 8
12,5 % 100 % 22,2 % 0,002 8
100 % 100 % 100 % 0,05 16
100 % 100 % 100 % 0,025 16
33,3 % 100 % 50 % 0,01 16
7,7 % 100 % 14,3 % 0,005 16
7,7 % 100 % 14,3 % 0,003 16
7,7 % 100 % 14,3 % 0,002 16
Table A.4: Results for Scenario A with window size: 4000 packets.
Window size: 4000 packets
Precision Recall F-score Eps MinPts
100 % 100 % 100 % 0,05 2
100 % 100 % 100 % 0,025 2
100 % 100 % 100 % 0,01 2
100 % 100 % 100 % 0,005 2
100 % 100 % 100 % 0,003 2
20 % 100 % 33,3 % 0,002 2
100 % 100 % 100 % 0,05 4
100 % 100 % 100 % 0,025 4
100 % 100 % 100 % 0,01 4
100 % 100 % 100 % 0,005 4
66,7 % 100 % 80 % 0,003 4
16,7 % 100 % 28,6 % 0,002 4
100 % 100 % 100 % 0,05 8
100 % 100 % 100 % 0,025 8
100 % 100 % 100 % 0,01 8
25 % 100 % 40 % 0,005 8
20 % 100 % 33,3 % 0,003 8
14,3 % 100 % 25 % 0,002 8
100 % 100 % 100 % 0,05 16
100 % 100 % 100 % 0,025 16
25 % 100 % 40 % 0,01 16
14,3 % 100 % 25 % 0,005 16
14,3 % 100 % 25 % 0,003 16
14,3 % 100 % 25 % 0,002 16
Table A.5: Results for Scenario B with window size: 4000 packets and 95 % normal traffic.
Window size: 4000 packets — 95 % normal
Precision Recall F-score Eps MinPts
NaN 0 % 0 % 0,05 2
NaN 0 % 0 % 0,025 2
100 % 15 % 26,1 % 0,01 2
100 % 25 % 40 % 0,008 2
NaN 0 % 0 % 0,05 4
NaN 0 % 0 % 0,025 4
100 % 15 % 26,1 % 0,01 4
100 % 25 % 40 % 0,008 4
NaN 0 % 0 % 0,05 6
NaN 0 % 0 % 0,025 6
100 % 15 % 26,1 % 0,01 6
100 % 25 % 40 % 0,008 6
NaN 0 % 0 % 0,05 8
NaN 0 % 0 % 0,025 8
100 % 20 % 33,3 % 0,01 8
100 % 32,5 % 49,1 % 0,008 8
Table A.6: Results for Scenario B with window size: 4000 packets and 90 % normal traffic.
Window size: 4000 packets — 90 % normal
Precision Recall F-score Eps MinPts
NaN 0 % 0 % 0,05 2
100 % 2,4 % 4,7 % 0,025 2
100 % 47,6 % 64,5 % 0,01 2
100 % 71,4 % 83,3 % 0,008 2
NaN 0 % 0 % 0,05 4
100 % 4,8 % 9,1 % 0,025 4
100 % 47,6 % 64,5 % 0,01 4
100 % 71,4 % 83,3 % 0,008 4
NaN 0 % 0 % 0,05 6
100 % 4,8 % 9,1 % 0,025 6
100 % 57,1 % 72,7 % 0,01 6
100 % 76,2 % 86,5 % 0,008 6
NaN 0 % 0 % 0,05 8
100 % 4,8 % 9,1 % 0,025 8
100 % 61,9 % 76,5 % 0,01 8
100 % 85,7 % 92,3 % 0,008 8
Table A.7: Results for Scenario B with window size: 4000 packets and 80 % normal traffic.
Window size: 4000 packets — 80 % normal
Precision Recall F-score Eps MinPts
NaN 0 % 0 % 0,05 2
100 % 29,2 % 45,2 % 0,025 2
100 % 91,7 % 95,7 % 0,01 2
100 % 95,8 % 97,9 % 0,008 2
NaN 0 % 0 % 0,05 4
100 % 29,2 % 45,2 % 0,025 4
100 % 91,7 % 95,7 % 0,01 4
100 % 95,8 % 97,9 % 0,008 4
NaN 0 % 0 % 0,05 6
100 % 33,3 % 50 % 0,025 6
100 % 91,7 % 95,7 % 0,01 6
100 % 95,8 % 97,9 % 0,008 6
NaN 0 % 0 % 0,05 8
100 % 33,3 % 50 % 0,025 8
100 % 95,8 % 97,9 % 0,01 8
100 % 100 % 100 % 0,008 8
Table A.8: Results for Scenario B with window size: 4000 packets and 70 % normal traffic.
Window size: 4000 packets — 70 % normal
Precision Recall F-score Eps MinPts
100 % 7,4 % 13,8 % 0,05 2
100 % 77,8 % 87,5 % 0,025 2
100 % 100 % 100 % 0,01 2
100 % 100 % 100 % 0,008 2
100 % 7,4 % 13,8 % 0,05 4
100 % 77,8 % 87,5 % 0,025 4
100 % 100 % 100 % 0,01 4
100 % 100 % 100 % 0,008 4
100 % 7,4 % 13,8 % 0,05 6
100 % 81,5 % 89,8 % 0,025 6
100 % 100 % 100 % 0,01 6
100 % 100 % 100 % 0,008 6
100 % 7,4 % 13,8 % 0,05 8
100 % 81,5 % 89,8 % 0,025 8
100 % 100 % 100 % 0,01 8
100 % 100 % 100 % 0,008 8
Table A.9: Results for Scenario B with window size: 4000 packets and 50 % normal traffic.
Window size: 4000 packets — 50 % normal
Precision Recall F-score Eps MinPts
100 % 39,1 % 56,3 % 0,05 2
100 % 100 % 100 % 0,025 2
100 % 100 % 100 % 0,01 2
100 % 100 % 100 % 0,008 2
100 % 43,5 % 60,6 % 0,05 4
100 % 100 % 100 % 0,025 4
100 % 100 % 100 % 0,01 4
100 % 100 % 100 % 0,008 4
100 % 43,5 % 60,6 % 0,05 6
100 % 100 % 100 % 0,025 6
100 % 100 % 100 % 0,01 6
100 % 100 % 100 % 0,008 6
100 % 47,8 % 64,7 % 0,05 8
100 % 100 % 100 % 0,025 8
100 % 100 % 100 % 0,01 8
100 % 100 % 100 % 0,008 8
Table A.10: Results for Scenario B with window size: 4000 packets and 10 % normal traffic.
Window size: 4000 packets — 10 % normal
Precision Recall F-score Eps MinPts
100 % 100 % 100 % 0,05 2
100 % 100 % 100 % 0,025 2
100 % 100 % 100 % 0,01 2
100 % 100 % 100 % 0,008 2
100 % 100 % 100 % 0,05 4
100 % 100 % 100 % 0,025 4
100 % 100 % 100 % 0,01 4
100 % 100 % 100 % 0,008 4
100 % 100 % 100 % 0,05 6
100 % 100 % 100 % 0,025 6
100 % 100 % 100 % 0,01 6
100 % 100 % 100 % 0,008 6
100 % 100 % 100 % 0,05 8
100 % 100 % 100 % 0,025 8
100 % 100 % 100 % 0,01 8
100 % 100 % 100 % 0,008 8
Table A.11: Results for Scenario B with window size: 500 packets and 90 % normal traffic.
Window size: 500 packets — 90 % normal
Precision Recall F-score Eps MinPts
NaN 0 % 0 % 0,05 2
100 % 9,5 % 17,3 % 0,025 2
100 % 42,2 % 59,4 % 0,01 2
100 % 52,3 % 68,7 % 0,008 2
NaN 0 % 0 % 0,05 4
100 % 9,8 % 17,8 % 0,025 4
100 % 44 % 61,1 % 0,01 4
100 % 53,2 % 69,4 % 0,008 4
NaN 0 % 0 % 0,05 6
100 % 10,3 % 18,8 % 0,025 6
100 % 44,5 % 61,6 % 0,01 6
100 % 54,9 % 70,9 % 0,008 6
NaN 0 % 0 % 0,05 8
100 % 10,9 % 19,7 % 0,025 8
100 % 45,1 % 62,2 % 0,01 8
100 % 55,2 % 71,1 % 0,008 8
Table A.12: Results for Scenario B with window size: 1000 packets and 90 % normal traffic.
Window size: 1000 packets — 90 % normal
Precision Recall F-score Eps MinPts
NaN 0 % 0 % 0,05 2
100 % 5,7 % 10,9 % 0,025 2
97,8 % 52,3 % 68,2 % 0,01 2
98,2 % 62,1 % 76,1 % 0,008 2
NaN 0 % 0 % 0,05 4
100 % 5,7 % 10,9 % 0,025 4
97,8 % 53,4 % 69,1 % 0,01 4
98,2 % 62,6 % 76,5 % 0,008 4
NaN 0 % 0 % 0,05 6
100 % 5,7 % 10,9 % 0,025 6
97,9 % 54,6 % 70,1 % 0,01 6
98,2 % 64,4 % 77,8 % 0,008 6
NaN 0 % 0 % 0,05 8
100 % 6,3 % 11,9 % 0,025 8
94,2 % 56,3 % 70,5 % 0,01 8
95,1 % 66,7 % 78,4 % 0,008 8
Table A.13: Results for Scenario B with window size: 2000 packets and 90 % normal traffic.
Window size: 2000 packets — 90 % normal
Precision Recall F-score Eps MinPts
NaN 0 % 0 % 0,05 2
NaN 0 % 0 % 0,025 2
100 % 44,2 % 61,3 % 0,01 2
100 % 55,8 % 71,6 % 0,008 2
NaN 0 % 0 % 0,05 4
100 % 1,2 % 2,3 % 0,025 4
100 % 51,2 % 67,7 % 0,01 4
100 % 58,1 % 73,5 % 0,008 4
NaN 0 % 0 % 0,05 6
100 % 1,2 % 2,3 % 0,025 6
100 % 53,5 % 69,7 % 0,01 6
100 % 62,8 % 77,1 % 0,008 6
NaN 0 % 0 % 0,05 8
100 % 4,7 % 8,9 % 0,025 8
96 % 55,8 % 70,6 % 0,01 8
96,7 % 67,4 % 79,5 % 0,008 8
REFERENCES
[1] J. Kennedy, “Stuxnet worm hits Iran nuclear plant staff computers.” https://www.siliconrepublic.com/enterprise/cyber-attack-stuxnet-worm-hits-iranian-nuclear-plant, January 2010. Accessed: 2019-04-16.
[2] P. Mueller and B. Yadegari, “The Stuxnet worm,” Department of Computer Science, University of Arizona, 2012.
[3] D. Ding, Q. Han, Y. Xiang, X. Ge, and X. M. Zhang, “A survey on security con-
trol and attack detection for industrial cyber-physical systems,” Neurocomputing,
vol. 275, pp. 1674–1683, 2018.
[4] Y. Mo and B. Sinopoli, “False data injection attacks in control systems,” Preprints
of the 1st workshop on Secure Control Systems, pp. 1–6, 2010.
[5] A. Meshram and C. Haas, “Anomaly detection in industrial networks using machine
learning: A roadmap,” in Machine Learning for Cyber Physical Systems, (Berlin),
pp. 65–72, Springer Vieweg, 2016.
[6] Y. Hu, A. Yang, and H. Li, “A survey of intrusion detection on industrial control
systems,” International Journal of Distributed Sensor Networks, vol. 14, 2018.
[7] A. Paul, F. Schuster, and H. König, “Towards the protection of industrial con-
trol systems - conclusions of a vulnerability analysis of Profinet IO,” in Detection
of Intrusions and Malware, and Vulnerability Assessment, (Berlin), pp. 160–176,
Springer, 2013.
[8] P. García-Teodoro, J. Díaz-Verdejo, G. Maciá-Fernández, and E. Vázquez,
“Anomaly-based network intrusion detection: Techniques, systems and challenges,”
Computers & Security, vol. 28, pp. 18–28, 2009.
[9] F. Schuster, A. Paul, and H. König, “Towards learning normality for anomaly de-
tection in industrial control networks,” in Emerging Management Mechanisms for
the Future Internet, (Berlin), pp. 61–72, Springer, 2013.
[10] “Profinet - the leading industrial ethernet standard.”
https://www.profibus.com/technology/profinet/. Accessed: 2019-04-16.
[11] E. Ahmed, A. Ahmed, I. Yaqoob, J. Shuja, A. Gani, M. Imran, and M. Shoaib,
“Bringing computation closer towards user network: Is edge computing the solu-
tion?,” IEEE Communications Magazine, 2017.
[12] H. J. Liao, C. H. R. Lin, Y. C. Lin, and K. Tung, “Intrusion detection system:
A comprehensive review,” Journal of Network and Computer Applications, vol. 36,
pp. 16–24, 2013.
[13] C. Wressnegger, A. Kellner, and K. Rieck, “Zoe: Content-based anomaly detection
for industrial control systems,” in 2018 48th Annual IEEE/IFIP International Con-
ference on Dependable Systems and Networks (DSN), (Luxembourg City), pp. 127–
138, 2018.
[14] M. Mantere, M. Sailio, and S. Noponen, “Network traffic features for anomaly de-
tection in specific industrial control system network,” Future Internet, pp. 460–473,
2013.
[15] G. S. Sestito, A. C. Tucato, A. L. Dias, M. S. Rocha, M. M. da Silva, P. Ferrari,
and D. Brandao, “A method for anomalies detection in real-time ethernet data
traffic applied to profinet,” IEEE Transactions on Industrial Informatics, vol. 14,
pp. 2171–2180, 2018.
[16] F. Schuster, A. Paul, R. Rietz, and H. König, “Potentials of using one-class SVM for
detecting protocol-specific anomalies in industrial networks,” in 2015 IEEE Symposium
Series on Computational Intelligence, (Cape Town), 2015.
[17] P. Mulinka and P. Casas, “Stream-based machine learning for network security and
anomaly detection,” in Proceedings of the 2018 Workshop on Big Data Analytics and
Machine Learning for Data Communication Networks, (New York), pp. 1–7, 2018.
[18] Profinet, “Profinet technology and application - system description.” https://www.profibus.com/index.php?eID=dumpFile&t=f&f=82430&token=7cbb78f5ba6b3e17762ab594f803f1901eb24fdf, November 2018. Accessed: 2019-04-18.
[19] P. Thomas, “An introduction to profinet frame analysis using wireshark.”
https://profibusgroup.files.wordpress.com/2013/01/w4-profinet-frame-analysis-
peter-thomas.pdf, May 2013. Accessed: 2019-04-18.
[20] Profinet, “Profinet system description - open solution for the world of automation.” https://www2.mmu.ac.uk/media/mmuacuk/content/documents/ascent/B01_PROFINET_system_en.pdf, September 2010. Accessed: 2019-04-18.
[21] Z. Lin and S. Pearson, “An inside look at industrial ethernet communication pro-
tocols.” http://www.ti.com/lit/wp/spry254b/spry254b.pdf, July 2018. Accessed:
2019-04-19.
[22] J. Scheible and A. Lu, “Anomaly detection on the edge,” in MILCOM 2017 - 2017
IEEE Military Communications Conference (MILCOM), (Baltimore), IEEE, 2017.
[23] J. Zhang and M. Zulkernine, “Anomaly based network intrusion detection with
unsupervised outlier detection,” in 2006 IEEE International Conference on Com-
munications, (Istanbul), 2006.
[24] L. Golab and M. T. Ozsu, “Issues in data stream management,” ACM Sigmod
Record, vol. 32, pp. 5–14, 2003.
[25] C. Xu, Scalable Validation of Data Streams. PhD thesis, Uppsala Universitet, Upp-
sala, 2016.
[26] P. Domingos, “A few useful things to know about machine learning,” Communica-
tions of the ACM, pp. 78–87, 2012.
[27] I. Souiden, Z. Brahmi, and H. Toumi, “A survey on outlier detection in the context of
stream mining: review of existing approaches and recommendations,” in International
Conference on Intelligent Systems Design and Applications, pp. 372–383, Springer,
Cham, 2016.
[28] A. K. Jain, M. N. Murty, and P. J. Flynn, “Data clustering: a review,” ACM
Computing Surveys, pp. 264–323, 1999.
[29] R. Xu and D. Wunsch, “Survey of clustering algorithms,” IEEE Transactions on
Neural Networks, vol. 16, pp. 645–678, 2005.
[30] K. Leung and C. Leckie, “Unsupervised anomaly detection in network intrusion de-
tection using clusters,” in Proceedings of the Twenty-eighth Australasian conference
on Computer Science, vol. 38, (Newcastle), pp. 333–342, 2005.
[31] Y. Chen and L. Tu, “Density-based clustering for real-time stream data,” in Pro-
ceedings of the 13th ACM SIGKDD international conference on Knowledge discovery
and data mining, (New York), pp. 133–142, ACM, 2007.
[32] E. Schubert, J. Sander, M. Ester, H. P. Kriegel, and X. Xu, “Dbscan revisited, revis-
ited: Why and how you should (still) use dbscan,” ACM Transactions on Database
Systems, vol. 42, pp. 1–21, 2017.
[33] M. Baud and M. Felser, “Profinet IO-device emulator based on the man-in-the-middle
attack,” in Emerging Technologies and Factory Automation, (Prague), 2006.
[34] M. Thottan, G. Liu, and C. Ji, “Anomaly detection approaches for communication
networks,” in Algorithms for Next Generation Networks, (London), pp. 239–261,
2010.
[35] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: A survey,” ACM
computing surveys (CSUR), p. 15, 2009.
[36] P. Laskov, P. Düssel, C. Schäfer, and K. Rieck, “Learning intrusion detection: Super-
vised or unsupervised?,” in Image Analysis and Processing - ICIAP 2005, (Berlin),
pp. 50–57, 2005.
[37] M. Goldstein and S. Uchida, “A comparative evaluation of unsupervised anomaly
detection algorithms for multivariate data,” PloS one, p. 15, 2016.
[38] S. Omar, A. Ngadi, and H. H. Jebur, “Machine learning techniques for anomaly
detection: an overview,” International Journal of Computer Applications, vol. 79,
2013.
[39] R. H. Moulton, H. L. Viktor, N. Japkowicz, and J. Gama, “Clustering in the presence
of concept drift,” in Joint European Conference on Machine Learning and Knowledge
Discovery in Databases, pp. 339–355, 2018.
[40] F. Cao, M. Estert, W. Qian, and A. Zhou, “Density-based clustering over an evolving
data stream with noise,” in Proceedings of the 2006 SIAM international conference
on data mining, pp. 328–339, Society for Industrial and Applied Mathematics, 2006.
[41] A. T. Tran, “Network anomaly detection,” 2017.
[42] J. Hurwitz and D. Kirsch, Machine Learning For Dummies, IBM Limited Edition.
Wiley, 2018.